James Almer
d5b3077ecf
x86/pixelutils: add missing preprocessor wrapper to the AVX2 functions
...
Should fix compilation with old yasm/nasm
Signed-off-by: James Almer <jamrial@gmail.com>
2018-07-31 22:14:42 -03:00
Jun Zhao
d36b8394f4
avutil/pixelutils: sad_32x32 sse2/avx2 optimizations.
...
add ff_pixelutils_sad_32x32_sse2, ff_pixelutils_sad_{a,u}_32x32_sse2,
ff_pixelutils_sad_32x32_avx22, ff_pixelutils_sad_{a,u}_32x32_avx2
use perf record/report profiling, get instructions:u for avx2 sad_32x32:
72.05% pixelutils pixelutils [.] block_sad_32x32_c
18.50% pixelutils pixelutils [.] block_sad_16x16_c
4.78% pixelutils pixelutils [.] block_sad_8x8_c
2.69% pixelutils pixelutils [.] block_sad_4x4_c
0.89% pixelutils pixelutils [.] block_sad_2x2_c
0.16% pixelutils pixelutils [.] ff_pixelutils_sad_32x32_avx2
0.16% pixelutils pixelutils [.] ff_pixelutils_sad_u_32x32_avx2
0.12% pixelutils pixelutils [.] ff_pixelutils_sad_a_32x32_avx2
sse2 sad_32x32 instructions:u like:
71.86% pixelutils pixelutils [.] block_sad_32x32_c
18.42% pixelutils pixelutils [.] block_sad_16x16_c
4.81% pixelutils pixelutils [.] block_sad_8x8_c
2.68% pixelutils pixelutils [.] block_sad_4x4_c
0.88% pixelutils pixelutils [.] block_sad_2x2_c
0.29% pixelutils pixelutils [.] ff_pixelutils_sad_32x32_sse2
0.26% pixelutils pixelutils [.] ff_pixelutils_sad_u_32x32_sse2
0.23% pixelutils pixelutils [.] ff_pixelutils_sad_a_32x32_sse2
Signed-off-by: Jun Zhao <mypopydev@gmail.com>
2018-07-31 19:17:51 +08:00
Jun Zhao
09628cb1b4
avutil/pixelutils: correct the function name in comments
...
Signed-off-by: Jun Zhao <mypopydev@gmail.com>
2018-07-11 20:12:33 +08:00
Henrik Gramner
f0b7882ceb
x86inc: Drop SECTION_TEXT macro
...
The .text section is already 16-byte aligned by default on all supported
platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
2015-08-04 20:13:09 +02:00
Clément Bœsch
554d819062
avutil/pixelutils: faster pixelutils_sad_16x16
...
501 to 439 decicycles.
See 45c7f3997ea11c3d1007b2126b1c0049a8c27105.
2014-08-23 20:12:56 +02:00
Clément Bœsch
45c7f3997e
avutil/pixelutils: faster pixelutils_sad_[au]_16x16
...
~560 → ~500 decicycles
This is following the comments from Michael in
https://ffmpeg.org/pipermail/ffmpeg-devel/2014-August/160599.html
Using 2 registers for accumulator didn't help. On the other hand,
some re-ordering between the movs and psadbw allowed going ~538 to ~500.
2014-08-23 10:18:53 +02:00
Clément Bœsch
28a2107a8d
avutil: add pixelutils API
2014-08-05 21:05:52 +02:00