1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-19 05:49:09 +02:00

248 Commits

Author SHA1 Message Date
Martin Vignali
53a03b5c8c avfilter/x86/vf_blend : add 16 bit version for BLEND_SIMPLE, phoenix, difference for SSE and AVX2 (x86_64) 2018-02-24 21:44:19 +01:00
Martin Vignali
6c6c9d14a8 avfilter/x86/vf_blend : indent 2018-02-24 21:44:16 +01:00
Martin Vignali
7590d58b61 avfilter/x86/vf_blend : reorganize init in order to add 16 bit version 2018-02-24 21:44:13 +01:00
Martin Vignali
3a230ce5fa avfilter/x86/vf_blend : avfilter/x86/vf_blend : add AVX2 version for each func except divide
and optimize average, grainextract, multiply, screen, grain merge
2018-01-28 20:21:32 +01:00
Marton Balint
4d95c6d5d7 avfilter/vf_framerate: add SIMD functions for frame blending
Blend function speedups on x86_64 Core i5 4460:

ffmpeg -f lavfi -i allyuv -vf framerate=60:threads=1 -f null none

C:     447548411 decicycles in Blend,    2048 runs,      0 skips
SSSE3: 130020087 decicycles in Blend,    2048 runs,      0 skips
AVX2:  128508221 decicycles in Blend,    2048 runs,      0 skips

ffmpeg -f lavfi -i allyuv -vf format=yuv420p12,framerate=60:threads=1 -f null none

C:     228932745 decicycles in Blend,    2048 runs,      0 skips
SSE4:  123357781 decicycles in Blend,    2048 runs,      0 skips
AVX2:  121215353 decicycles in Blend,    2048 runs,      0 skips

Signed-off-by: Marton Balint <cus@passwd.hu>
2018-01-28 18:50:52 +01:00
Martin Vignali
b94cd55155 avfilter/x86/vf_interlace : add AVX2 version 2018-01-11 21:03:19 +01:00
James Almer
8e0e4384b0 Revert "avfilter/vf_interlace : add AVX2 for lowpass_line 8 and 16"
This reverts commits 1a5865b6dcc97754a1d7eedc130fb58237d2a715 and
8fb1d63d919286971b8e6afad372730d6d6f25c8.

They made fate interlace tests fail when AVX2 was used.

Signed-off-by: James Almer <jamrial@gmail.com>
2017-12-19 19:04:25 -03:00
Martin Vignali
3df6e61dad avfilter/x86/vf_hflip : indent
based on patch by Paul B Mahol
2017-12-19 21:10:12 +01:00
Martin Vignali
f181648176 avfilter/x86/vf_hflip : add avx2 version for hflip_byte and hflip_short 2017-12-19 21:10:09 +01:00
Martin Vignali
a4a4179e83 avfilter/x86/vf_hflip : merge hflip byte and hflip short to one macro 2017-12-19 21:10:05 +01:00
Martin Vignali
8fb1d63d91 avfilter/vf_tinterlace : add AVX2 func for lowpass_line 8 and 16 2017-12-19 20:59:59 +01:00
Martin Vignali
1a5865b6dc avfilter/vf_interlace : add AVX2 for lowpass_line 8 and 16 2017-12-19 20:59:54 +01:00
Martin Vignali
d31770d9a6 avfilter/vf_interlace : move func init in ff_interlace_init and add depth arg for ff_interlace_init_x86 2017-12-19 20:59:47 +01:00
Martin Vignali
3c6dc27035 avfilter/x86/vf_interlace : avfilter/x86/vf_interlace : fix crash when using unaligned data in low_pass complex
related to ticket 6491
2017-12-15 11:28:29 +01:00
Martin Vignali
49dced9fd0 avfilter/x86/vf_interlace : avoid crash when data are unaligned
ticket 6491
2017-12-15 11:28:25 +01:00
Martin Vignali
869efbf971 avfilter/x86/vf_threshold : add threshold16 SIMD (SSE4 and AVX2) 2017-12-09 14:47:09 +01:00
James Almer
f2aa0ce5a0 x86/vf_hflip: use xor to zero initialize registers
Signed-off-by: James Almer <jamrial@gmail.com>
2017-12-07 19:34:12 -03:00
James Almer
dc33fe1d00 x86/vf_hflip: don't load the width argument twice
Signed-off-by: James Almer <jamrial@gmail.com>
2017-12-07 19:34:12 -03:00
James Almer
cc2ba526d4 x86/vf_threshold: make threshold8 functions work on x86_32
Signed-off-by: James Almer <jamrial@gmail.com>
2017-12-04 15:46:09 -03:00
Paul B Mahol
5ff0d2acae avfilter/x86/vf_hflip.asm: fix building on x32
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2017-12-04 15:08:43 +01:00
Paul B Mahol
86fda8be3f avfilter: add hflip x86 SIMD
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2017-12-04 09:58:25 +01:00
James Almer
b73304f79e x86vf_threshold/: use the PBLENDVB macro
Fixes building with yasm

Tested-by: stevenliu
Signed-off-by: James Almer <jamrial@gmail.com>
2017-12-04 02:22:30 -03:00
Martin Vignali
6e3e696591 avfilter/x86/vf_threshold : cosmetic indent 2017-12-03 19:17:28 +01:00
Martin Vignali
9719d57b34 avfilter/x86/vf_threshold : add avx2 version for threshold 8 2017-12-03 19:17:23 +01:00
Martin Vignali
51345cb1d5 avfilter/x86/vf_threshold : make macro for threshold8 in order to add avx2 version 2017-12-03 19:17:19 +01:00
Paul B Mahol
bbfcb1b7c8 avfilter/vf_threshold: add x86 SIMD
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2017-12-02 14:58:56 +01:00
James Almer
2904db9045 Merge commit '994c4bc10751e39c7ed9f67ffd0c0dea5223daf2'
* commit '994c4bc10751e39c7ed9f67ffd0c0dea5223daf2':
  x86util: Port all macros to cpuflags

See d5f8a642f6eb1c6e305c41dabddd0fd36ffb3f77

Merged-by: James Almer <jamrial@gmail.com>
2017-10-21 12:15:57 -03:00
Thomas Mundt
40bfaa190c avfilter/interlace: add support for 10 and 12 bit
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Thomas Mundt <tmundt75@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2017-09-23 16:19:58 -03:00
Thomas Mundt
a7f6bfdc18 avfilter/interlace: prevent over-sharpening with the complex low-pass filter
The complex vertical low-pass filter slightly over-sharpens the picture. This becomes visible when several transcodings are cascaded and the error potentises, e.g. some generations of HD->SD SD->HD.
To prevent this behaviour the destination pixel must not exceed the source pixel when the average of the pixels above and below is less than the source pixel. And the other way around.

Tested and approved in a visual transcoding cascade test by video professionals.
SSIM/PSNR test with the first generation of an HD->SD file as a reference against the 6th generation(3 x SD->HD HD->SD):
Results without the patch:
SSIM Y:0.956508 (13.615881) U:0.991601 (20.757750) V:0.993004 (21.551382) All:0.974405 (15.918463)
PSNR y:31.838009 u:48.424280 v:48.962711 average:34.759466 min:31.699297 max:40.857847
Results with the patch:
SSIM Y:0.970051 (15.236232) U:0.991883 (20.905857) V:0.993174 (21.658049) All:0.981290 (17.279202)
PSNR y:34.412108 u:48.504454 v:48.969496 average:37.264644 min:34.310637 max:42.373392

Signed-off-by: Thomas Mundt <tmundt75@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-09-15 22:40:21 +02:00
Paul B Mahol
f8d0689d3f avfilter/vf_blend: rename addition128 and difference128 to grainmerge and grainextract 2017-08-24 14:45:52 +02:00
James Almer
5688fd77b5 x86/vf_limiter: make limiter functions work on x86_32
Signed-off-by: James Almer <jamrial@gmail.com>
2017-07-13 18:17:17 -03:00
Paul B Mahol
01e545d046 avfilter: add limiter filter
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2017-07-08 11:49:54 +02:00
James Almer
d2ef9e6e7f x86/vf_blend: use ABS2 macro 2017-06-27 20:45:55 -03:00
James Almer
0daa1cf073 x86/vf_blend: optimize difference and negation functions
Process more pixels per loop.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-27 13:17:23 -03:00
James Almer
fa50d9360b x86/vf_blend: add sse and ssse3 extremity functions
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-27 13:17:23 -03:00
Ronald S. Bultje
97f7f83169 vf_spp: only assign function pointers if permutation matches expectations. 2017-06-24 07:53:15 -04:00
Diego Biurrun
fd502f4f5f build: Generalize yasm/nasm-related variable names
None of them are specific to the YASM assembler.

(Cherry-picked from libav commit 39e208f4d4756367c7cd2d581847e0c1b8a429c1)

Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-21 17:00:29 -03:00
Paul B Mahol
49bbfb9d13 avfilter: add arbitrary audio FIR filter
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2017-05-09 20:47:52 +02:00
Thomas Mundt
2da5bf4c2f avfilter/interlace: add complex vertical low-pass filter
This complex (-1 2 6 2 -1) filter slightly less reduces interlace 'twitter' but better retain detail and subjective sharpness impression compared to the linear (1 2 1) filter.

Signed-off-by: Thomas Mundt <tmundt75@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2017-05-02 14:09:50 -03:00
Thomas Mundt
207e6debf8 avfilter/interlace: change lowpass_line function prototype
Signed-off-by: Thomas Mundt <tmundt75@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-04-22 20:12:15 +02:00
Diego Biurrun
39e208f4d4 build: Generalize yasm/nasm-related variable names
None of them are specific to the YASM assembler.
2017-03-01 10:18:15 +01:00
Paul B Mahol
c6c888e996 avfilter/vf_w3fdif: add >8 but <16 bit support
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2016-12-25 09:50:36 +01:00
Diego Biurrun
6be7944ee2 x86: Add missing colons after assembly labels
This fixes many warnings of the sort
warning: label alone on a line without a colon might be in error
2016-10-17 16:31:26 +02:00
James Almer
a8e3833a61 x86/avf_showcqt: use the FMULADD_PS x86util macro
Signed-off-by: James Almer <jamrial@gmail.com>
2016-08-20 02:12:33 -03:00
Matthieu Bouron
9eb3da2f99 asm: FF_-prefix internal macros used in inline assembly
See merge commit '39d6d3618d48625decaff7d9bdbb45b44ef2a805'.
2016-06-27 17:21:18 +02:00
Hendrik Leppkes
c142dc203e Merge commit 'dc40a70c5755bccfb1a1349639943e1f408bea50'
* commit 'dc40a70c5755bccfb1a1349639943e1f408bea50':
  Drop unnecessary libavutil/x86/asm.h #includes

Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
2016-06-26 15:53:00 +02:00
James Almer
172af20852 x86/showcqt: use three operand format for some instructions
Fixes failures with yasm 1.1.0 and older

Signed-off-by: James Almer <jamrial@gmail.com>
2016-06-08 19:37:08 -03:00
James Almer
7d7fdd6532 x86/showcqt: add missing preprocessor checks
Old yasm/nasm versions don't support some of these

Signed-off-by: James Almer <jamrial@gmail.com>
2016-06-08 19:34:43 -03:00
James Almer
99b899483e avutil/x86util: move haddps sse emulation from showcqt
Signed-off-by: James Almer <jamrial@gmail.com>
2016-06-08 14:18:00 -03:00
Muhammad Faiz
1e69ac9246 avfilter/avf_showcqt: cqt_calc optimization on x86
on x86_64:
        time    PSNR
plain   3.303   inf
SSE     1.649   107.087535
SSE3    1.632   107.087535
AVX     1.409   106.986771
FMA3    1.265   107.108437

on x86_32 (PSNR compared to x86_64 plain):
        time    PSNR
plain   7.225   103.951979
SSE     1.827   105.859282
SSE3    1.819   105.859282
AVX     1.533   105.997661
FMA3    1.384   105.885377

FMA4 test is not available

Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
2016-06-08 16:09:43 +07:00