FFmpeg

virtualenv/FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-07 11:13:41 +02:00

Commit Graph

Author

SHA1

Message

Date

James Darnley

c9a51c29fc

yadif: remove an 'm' from the LOAD macro definition

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

2013-03-16 22:33:49 +01:00

James Darnley

1d3b14cac2

yadif: remove repeated check on width

The filter already checks that width (and height) are greater than 3.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

2013-03-16 22:33:30 +01:00

James Darnley

0a5814c9ba

yadif: x86 assembly for 9 to 14-bit samples

These smaller samples do not need to be unpacked to double words
allowing the code to process more pixels every iteration (still 2 in MMX
but 6 in SSE2).  It also avoids emulating the missing double word
instructions on older instruction sets.

Like with the previous code for 16-bit samples this has been tested on
an Athlon64 and a Core2Quad.

Athlon64:
1809275 decicycles in C,    32718 runs, 50 skips
 911675 decicycles in mmx,  32727 runs, 41 skips, 2.0x faster
 495284 decicycles in sse2, 32747 runs, 21 skips, 3.7x faster

Core2Quad:
 921363 decicycles in C,     32756 runs, 12 skips
 486537 decicycles in mmx,   32764 runs,  4 skips, 1.9x faster
 293296 decicycles in sse2,  32759 runs,  9 skips, 3.1x faster
 284910 decicycles in ssse3, 32759 runs,  9 skips, 3.2x faster

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>

2013-03-16 22:32:54 +01:00
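
The "x faster" figures in the benchmark for 0a5814c9ba can be reproduced by dividing the C decicycle count by the count for each assembly variant. The following is a minimal sketch of that arithmetic, not part of the commit itself; the names `BENCHMARKS` and `speedups` are hypothetical, and the numbers are copied from the commit message above.

```python
# Decicycle counts copied from the commit message for 0a5814c9ba.
BENCHMARKS = {
    "Athlon64": {"C": 1809275, "mmx": 911675, "sse2": 495284},
    "Core2Quad": {"C": 921363, "mmx": 486537, "sse2": 293296, "ssse3": 284910},
}


def speedups(timings, baseline="C"):
    """Return {variant: baseline_time / variant_time}, rounded to one decimal."""
    base = timings[baseline]
    return {
        name: round(base / cycles, 1)
        for name, cycles in timings.items()
        if name != baseline
    }


for cpu, timings in BENCHMARKS.items():
    for name, factor in speedups(timings).items():
        print(f"{cpu}: {name} is {factor}x faster than C")
```

Rounded to one decimal place, the ratios agree with the factors quoted in the commit message (2.0x and 3.7x on the Athlon64; 1.9x, 3.1x and 3.2x on the Core2Quad).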

3 Commits