mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-29 22:00:58 +02:00

7 Commits

Author SHA1 Message Date
Andreas Rheinhardt
4d7128be9a avfilter/x86/vf_yadif: Remove obsolete MMXEXT functions
The only systems which benefit from these are truly ancient
32-bit x86s, as all other systems use at least the SSE2 versions
(this includes all x64 CPUs, which is why this code is restricted
to x86-32).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-06-22 13:37:48 +02:00
James Almer
ddea3b7106 x86/yadif-10: remove duplicate ABS macro
And use the x86util ones instead, which are optimized for mmxext/sse2.
About a 1% increase in performance on pre-SSSE3 processors.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-10 21:06:51 +02:00
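
The pre-SSSE3 distinction matters because SSSE3 adds a dedicated packed absolute-value instruction (pabsw), so older instruction sets have to build |x| out of other operations. The Python sketch below models one common way to do that on signed 16-bit lanes, namely max(x, -x). It is an illustration under that assumption, not a transcription of the x86util macros, and the helper names (wrap16, abs_via_max, abs_lanes) are hypothetical.

```python
def wrap16(x: int) -> int:
    """Wrap an integer to a signed 16-bit lane, as packed word arithmetic does."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def abs_via_max(lane: int) -> int:
    """Absolute value of one signed 16-bit lane computed as max(x, -x).

    Assumption: this mirrors a "negate, then take the signed maximum"
    sequence; it is not copied from the real macro.
    """
    negated = wrap16(0 - lane)  # a packed subtract from zero wraps on overflow
    return max(lane, negated)   # signed maximum of the two lanes


def abs_lanes(lanes):
    """Apply the lane-wise absolute value to every element of a register."""
    return [abs_via_max(wrap16(v)) for v in lanes]
```

Note that the most negative value, -32768, maps to itself in this model, because its negation does not fit in a signed 16-bit lane.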
Christophe Gisquet
9107612818 x86util: add and use RSHIFT/LSHIFT macros
Those macros take a byte number as shift argument, as this argument
differs between MMX and SSE2 instructions.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-15 13:19:27 +02:00
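
To illustrate the unit mismatch this commit describes: the MMX whole-register shifts (psrlq, psllq) take a count in bits, while the SSE2 whole-register shifts (psrldq, pslldq) take a count in bytes. The Python sketch below models registers as integers and shows how a wrapper that always takes a byte count could dispatch on register size. The function names rshift and lshift and the mmsize parameter are chosen for illustration; this is a model of the idea, not the macro source.

```python
MASK64 = (1 << 64) - 1    # one 64-bit MMX register
MASK128 = (1 << 128) - 1  # one 128-bit SSE2 register


def psrlq(reg: int, bits: int) -> int:
    """Model of an MMX logical right shift: the count is in bits."""
    return (reg & MASK64) >> bits


def psllq(reg: int, bits: int) -> int:
    """Model of an MMX logical left shift: the count is in bits."""
    return (reg << bits) & MASK64


def psrldq(reg: int, nbytes: int) -> int:
    """Model of an SSE2 whole-register right shift: the count is in bytes."""
    return (reg & MASK128) >> (8 * nbytes)


def pslldq(reg: int, nbytes: int) -> int:
    """Model of an SSE2 whole-register left shift: the count is in bytes."""
    return (reg << (8 * nbytes)) & MASK128


def rshift(reg: int, nbytes: int, mmsize: int) -> int:
    """Shift right by nbytes bytes, whatever the register size (in bytes)."""
    if mmsize > 8:
        return psrldq(reg, nbytes)
    return psrlq(reg, 8 * nbytes)  # convert the byte count to bits for MMX


def lshift(reg: int, nbytes: int, mmsize: int) -> int:
    """Shift left by nbytes bytes, whatever the register size (in bytes)."""
    if mmsize > 8:
        return pslldq(reg, nbytes)
    return psllq(reg, 8 * nbytes)
```

With a wrapper of this shape, calling code can request a shift of two bytes without knowing which instruction set it is being assembled for.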
Robert Krüger
194ef56ba7 Change license of yadif from GPL to LGPL
Signed-off-by: Robert Krüger <krueger@lesspain.de>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-14 14:19:15 +01:00
James Darnley
c9a51c29fc yadif: remove an 'm' from the LOAD macro definition
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-03-16 22:33:49 +01:00
James Darnley
1d3b14cac2 yadif: remove repeated check on width
The filter already checks that width (and height) are greater than 3.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-03-16 22:33:30 +01:00
James Darnley
0a5814c9ba yadif: x86 assembly for 9 to 14-bit samples
These smaller samples do not need to be unpacked to double words,
allowing the code to process more pixels every iteration (still 2 in MMX
but 6 in SSE2).  It also avoids emulating the missing double word
instructions on older instruction sets.

Like with the previous code for 16-bit samples this has been tested on
an Athlon64 and a Core2Quad.

Athlon64:
1809275 decicycles in C,    32718 runs, 50 skips
 911675 decicycles in mmx,  32727 runs, 41 skips, 2.0x faster
 495284 decicycles in sse2, 32747 runs, 21 skips, 3.7x faster

Core2Quad:
 921363 decicycles in C,     32756 runs, 12 skips
 486537 decicycles in mmx,   32764 runs,  4 skips, 1.9x faster
 293296 decicycles in sse2,  32759 runs,  9 skips, 3.1x faster
 284910 decicycles in ssse3, 32759 runs,  9 skips, 3.2x faster

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-03-16 22:32:54 +01:00
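
The "x faster" figures in the benchmark above are the C decicycle count divided by the count for each optimized version, rounded to one decimal place. The Python sketch below recomputes them from the numbers quoted in the commit message; the dictionary and function names are chosen for illustration only.

```python
# Decicycle counts as quoted in the commit message above.
ATHLON64 = {"C": 1809275, "mmx": 911675, "sse2": 495284}
CORE2QUAD = {"C": 921363, "mmx": 486537, "sse2": 293296, "ssse3": 284910}


def speedup(baseline: int, optimized: int) -> float:
    """Ratio of baseline cost to optimized cost (higher means faster)."""
    return baseline / optimized


def speedups(results: dict) -> dict:
    """Speedup of every non-C entry relative to the C entry, to one decimal."""
    base = results["C"]
    return {
        name: round(speedup(base, cycles), 1)
        for name, cycles in results.items()
        if name != "C"
    }


if __name__ == "__main__":
    print("Athlon64:", speedups(ATHLON64))
    print("Core2Quad:", speedups(CORE2QUAD))
```

The recomputed ratios agree with the figures stated in the commit message for both machines.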