1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-18 03:19:31 +02:00
Commit Graph

9 Commits

Author SHA1 Message Date
Andreas Rheinhardt
ea043cc53e avutil/x86/pixelutils: Remove obsolete MMX(EXT) functions
x64 always has MMX, MMXEXT, SSE and SSE2 and this means
that some functions for MMX, MMXEXT, SSE and 3dnow are always
overridden by other functions (unless one e.g. explicitly
disables SSE2). So given that the only systems which benefit
from the 8x8 MMX (overridden by MMXEXT) or the 16x16 MMXEXT
(overridden by SSE2) are truely ancient 32bit x86s they are removed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-06-22 13:36:44 +02:00
Alexander Kanavin
91326dc942 libavutil: include assembly with full path from source root
Otherwise nasm writes the full host-specific paths into .o
output, which breaks binary reproducibility.

Signed-off-by: Alexander Kanavin <alex.kanavin@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2022-02-08 10:42:26 +01:00
James Almer
d5b3077ecf x86/pixelutils: add missing preprocessor wrapper to the AVX2 functions
Should fix compilation with old yasm/nasm

Signed-off-by: James Almer <jamrial@gmail.com>
2018-07-31 22:14:42 -03:00
Jun Zhao
d36b8394f4 avutil/pixelutils: sad_32x32 sse2/avx2 optimizations.
add ff_pixelutils_sad_32x32_sse2, ff_pixelutils_sad_{a,u}_32x32_sse2,
ff_pixelutils_sad_32x32_avx22, ff_pixelutils_sad_{a,u}_32x32_avx2

use perf record/report profiling, get instructions:u for avx2 sad_32x32:

  72.05%  pixelutils  pixelutils     [.] block_sad_32x32_c
  18.50%  pixelutils  pixelutils     [.] block_sad_16x16_c
   4.78%  pixelutils  pixelutils     [.] block_sad_8x8_c
   2.69%  pixelutils  pixelutils     [.] block_sad_4x4_c
   0.89%  pixelutils  pixelutils     [.] block_sad_2x2_c
   0.16%  pixelutils  pixelutils     [.] ff_pixelutils_sad_32x32_avx2
   0.16%  pixelutils  pixelutils     [.] ff_pixelutils_sad_u_32x32_avx2
   0.12%  pixelutils  pixelutils     [.] ff_pixelutils_sad_a_32x32_avx2

sse2 sad_32x32 instructions:u like:

  71.86%  pixelutils  pixelutils     [.] block_sad_32x32_c
  18.42%  pixelutils  pixelutils     [.] block_sad_16x16_c
   4.81%  pixelutils  pixelutils     [.] block_sad_8x8_c
   2.68%  pixelutils  pixelutils     [.] block_sad_4x4_c
   0.88%  pixelutils  pixelutils     [.] block_sad_2x2_c
   0.29%  pixelutils  pixelutils     [.] ff_pixelutils_sad_32x32_sse2
   0.26%  pixelutils  pixelutils     [.] ff_pixelutils_sad_u_32x32_sse2
   0.23%  pixelutils  pixelutils     [.] ff_pixelutils_sad_a_32x32_sse2

Signed-off-by: Jun Zhao <mypopydev@gmail.com>
2018-07-31 19:17:51 +08:00
Jun Zhao
09628cb1b4 avutil/pixelutils: correct the function name in comments
Signed-off-by: Jun Zhao <mypopydev@gmail.com>
2018-07-11 20:12:33 +08:00
Henrik Gramner
f0b7882ceb x86inc: Drop SECTION_TEXT macro
The .text section is already 16-byte aligned by default on all supported
platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
2015-08-04 20:13:09 +02:00
Clément Bœsch
554d819062 avutil/pixelutils: faster pixelutils_sad_16x16
501 to 439 decicycles.

See 45c7f3997e.
2014-08-23 20:12:56 +02:00
Clément Bœsch
45c7f3997e avutil/pixelutils: faster pixelutils_sad_[au]_16x16
~560 → ~500 decicycles

This is following the comments from Michael in
https://ffmpeg.org/pipermail/ffmpeg-devel/2014-August/160599.html

Using 2 registers for accumulator didn't help. On the other hand,
some re-ordering between the movs and psadbw allowed going ~538 to ~500.
2014-08-23 10:18:53 +02:00
Clément Bœsch
28a2107a8d avutil: add pixelutils API 2014-08-05 21:05:52 +02:00