Paul B Mahol
dcae5ba322
avfilter: add anlmdn filter x86 SIMD optimizations
2019-01-10 21:49:47 +01:00
James Almer
ef67af31ff
x86/af_afir: use three operand form for some instructions
...
Fixes compilation with old yasm versions.
Signed-off-by: James Almer <jamrial@gmail.com>
2019-01-03 23:36:19 -03:00
James Almer
5402c1886b
x86/af_afir: add ff_fcmul_add_avx()
...
fcmul_add_c: 1228.8
fcmul_add_sse3: 334.3
fcmul_add_avx: 186.3
Tested on a Core i5 4460 @ 3.2GHz
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2019-01-03 10:12:19 -03:00
James Almer
82043dfd2e
avfilter/af_afir: split off fcmul_add into a DSP context
...
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2019-01-03 10:12:18 -03:00
James Almer
9b5bd665e1
x86/af_afir: fix processing the last element
...
ff_fcmul_add_sse3() is now identical to the C version.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2019-01-03 10:12:18 -03:00
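The af_afir entries above all concern fcmul_add, a complex multiply-accumulate over float buffers. The log does not show the function body, so the following is only a minimal scalar sketch under stated assumptions: the name fcmul_add_sketch is hypothetical, the buffers are assumed to hold interleaved (re, im) pairs, and the single real-only element after the last pair is an assumed layout that would explain why a "last element" needs separate handling.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar reference: sum += t * c for len complex values.
 * Assumed layout: len interleaved (re, im) float pairs followed by one
 * extra real-only element, so each buffer holds 2 * len + 1 floats. */
void fcmul_add_sketch(float *sum, const float *t, const float *c, ptrdiff_t len)
{
    ptrdiff_t n;

    assert(len >= 0);

    for (n = 0; n < len; n++) {
        const float cre = c[2 * n];
        const float cim = c[2 * n + 1];
        const float tre = t[2 * n];
        const float tim = t[2 * n + 1];

        /* (tre + i*tim) * (cre + i*cim), accumulated into sum */
        sum[2 * n]     += tre * cre - tim * cim;
        sum[2 * n + 1] += tre * cim + tim * cre;
    }

    /* Assumed trailing real-only element: plain multiply-accumulate. */
    sum[2 * n] += t[2 * n] * c[2 * n];
}
```

A SIMD version that handles several pairs per iteration has to treat that trailing element separately to match a scalar loop like this one, which is the kind of mismatch the "fix processing the last element" entry describes.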
James Almer
3913d6f734
x86/scene_sad: fix link errors when HAVE_X86ASM is not defined
...
Reviewed-by: Haihao Xiang <haihao.xiang@intel.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2018-11-21 22:26:07 -03:00
Paul B Mahol
c98a32e4ad
avfilter/vf_blend: add 10bit support
2018-11-15 14:44:24 +01:00
Philip Langdale
1096614c42
avfilter/vf_bwdif: Use common yadif frame management logic
...
After adding field type management to the common yadif logic, we can
remove the duplicate copy of that logic from bwdif.
2018-11-14 17:41:01 -08:00
Marton Balint
6c2a7a8e9a
avfilter/vf_framerate: factorize SAD functions which compute SAD for a whole frame
...
Also add SIMD which works on lines because it is faster than calculating it on
8x8 blocks using pixelutils.
Signed-off-by: Marton Balint <cus@passwd.hu>
2018-11-11 20:30:50 +01:00
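As an illustration of the line-based approach in the entry above, here is a minimal scalar sketch of a whole-frame sum of absolute differences (SAD) for 8-bit samples. The function name and signature are hypothetical and are not taken from the FFmpeg sources; the point is only that each row is a contiguous run of bytes, which is the shape a SIMD loop can consume directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: SAD of two 8-bit planes, computed row by row.
 * stride1 and stride2 are the distances in bytes between row starts;
 * bytes beyond width in each row (padding) are ignored. */
uint64_t frame_sad_sketch(const uint8_t *src1, ptrdiff_t stride1,
                          const uint8_t *src2, ptrdiff_t stride2,
                          ptrdiff_t width, ptrdiff_t height)
{
    uint64_t sad = 0;
    ptrdiff_t x, y;

    assert(width >= 0 && height >= 0);

    for (y = 0; y < height; y++) {
        /* One contiguous line per iteration. */
        for (x = 0; x < width; x++)
            sad += (uint64_t)abs(src1[x] - src2[x]);
        src1 += stride1;
        src2 += stride2;
    }
    return sad;
}
```

Working on full lines avoids the per-block call overhead and edge handling that an 8x8 block scheme needs when the frame size is not a multiple of 8.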
Paul B Mahol
0f0d468fbc
avfilter/vf_overlay: exclude nv12/nv21 formats from x86 asm check
...
They are yet to be supported.
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2018-05-03 09:22:28 +02:00
Paul B Mahol
6d7c63588c
avfilter/vf_overlay: add x86 SIMD
...
Specifically for the yuv444, yuv422 and yuv420 formats when the main stream has no
alpha and the overlay alpha is straight.
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2018-05-02 23:58:21 +02:00
Vasile Toncu
9c01cdb94e
avfilter/vf_interlace: remove duplicate code with same functionality
2018-04-23 23:48:30 +02:00
Martin Vignali
f3df42e81d
avfilter/x86/vf_blend : add SIMD for 16 bit version of
...
grainextract
grainmerge
average
extremity
negation
2018-04-05 21:46:16 +02:00
Martin Vignali
8eb0bb1108
avfilter/x86/vf_blend : reorganize DIFFERENCE macro to reduce line duplication between 8bit and 16 bit version
2018-04-05 21:46:11 +02:00
Martin Vignali
53a03b5c8c
avfilter/x86/vf_blend : add 16 bit version for BLEND_SIMPLE, phoenix, difference for SSE and AVX2 (x86_64)
2018-02-24 21:44:19 +01:00
Martin Vignali
6c6c9d14a8
avfilter/x86/vf_blend : indent
2018-02-24 21:44:16 +01:00
Martin Vignali
7590d58b61
avfilter/x86/vf_blend : reorganize init in order to add 16 bit version
2018-02-24 21:44:13 +01:00
Martin Vignali
3a230ce5fa
avfilter/x86/vf_blend : add AVX2 version for each func except divide
...
and optimize average, grainextract, multiply, screen, grainmerge
2018-01-28 20:21:32 +01:00
Marton Balint
4d95c6d5d7
avfilter/vf_framerate: add SIMD functions for frame blending
...
Blend function speedups on x86_64 Core i5 4460:
ffmpeg -f lavfi -i allyuv -vf framerate=60:threads=1 -f null none
C: 447548411 decicycles in Blend, 2048 runs, 0 skips
SSSE3: 130020087 decicycles in Blend, 2048 runs, 0 skips
AVX2: 128508221 decicycles in Blend, 2048 runs, 0 skips
ffmpeg -f lavfi -i allyuv -vf format=yuv420p12,framerate=60:threads=1 -f null none
C: 228932745 decicycles in Blend, 2048 runs, 0 skips
SSE4: 123357781 decicycles in Blend, 2048 runs, 0 skips
AVX2: 121215353 decicycles in Blend, 2048 runs, 0 skips
Signed-off-by: Marton Balint <cus@passwd.hu>
2018-01-28 18:50:52 +01:00
Martin Vignali
b94cd55155
avfilter/x86/vf_interlace : add AVX2 version
2018-01-11 21:03:19 +01:00
James Almer
8e0e4384b0
Revert "avfilter/vf_interlace : add AVX2 for lowpass_line 8 and 16"
...
This reverts commits 1a5865b6dcc97754a1d7eedc130fb58237d2a715 and
8fb1d63d919286971b8e6afad372730d6d6f25c8.
They made fate interlace tests fail when AVX2 was used.
Signed-off-by: James Almer <jamrial@gmail.com>
2017-12-19 19:04:25 -03:00
Martin Vignali
3df6e61dad
avfilter/x86/vf_hflip : indent
...
based on patch by Paul B Mahol
2017-12-19 21:10:12 +01:00
Martin Vignali
f181648176
avfilter/x86/vf_hflip : add avx2 version for hflip_byte and hflip_short
2017-12-19 21:10:09 +01:00
Martin Vignali
a4a4179e83
avfilter/x86/vf_hflip : merge hflip byte and hflip short to one macro
2017-12-19 21:10:05 +01:00
Martin Vignali
8fb1d63d91
avfilter/vf_tinterlace : add AVX2 func for lowpass_line 8 and 16
2017-12-19 20:59:59 +01:00
Martin Vignali
1a5865b6dc
avfilter/vf_interlace : add AVX2 for lowpass_line 8 and 16
2017-12-19 20:59:54 +01:00
Martin Vignali
d31770d9a6
avfilter/vf_interlace : move func init in ff_interlace_init and add depth arg for ff_interlace_init_x86
2017-12-19 20:59:47 +01:00
Martin Vignali
3c6dc27035
avfilter/x86/vf_interlace : fix crash when using unaligned data in low_pass complex
...
related to ticket 6491
2017-12-15 11:28:29 +01:00
Martin Vignali
49dced9fd0
avfilter/x86/vf_interlace : avoid crash when data are unaligned
...
ticket 6491
2017-12-15 11:28:25 +01:00
Martin Vignali
869efbf971
avfilter/x86/vf_threshold : add threshold16 SIMD (SSE4 and AVX2)
2017-12-09 14:47:09 +01:00
James Almer
f2aa0ce5a0
x86/vf_hflip: use xor to zero initialize registers
...
Signed-off-by: James Almer <jamrial@gmail.com>
2017-12-07 19:34:12 -03:00
James Almer
dc33fe1d00
x86/vf_hflip: don't load the width argument twice
...
Signed-off-by: James Almer <jamrial@gmail.com>
2017-12-07 19:34:12 -03:00
James Almer
cc2ba526d4
x86/vf_threshold: make threshold8 functions work on x86_32
...
Signed-off-by: James Almer <jamrial@gmail.com>
2017-12-04 15:46:09 -03:00
Paul B Mahol
5ff0d2acae
avfilter/x86/vf_hflip.asm: fix building on x32
...
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2017-12-04 15:08:43 +01:00
Paul B Mahol
86fda8be3f
avfilter: add hflip x86 SIMD
...
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2017-12-04 09:58:25 +01:00
James Almer
b73304f79e
x86/vf_threshold: use the PBLENDVB macro
...
Fixes building with yasm
Tested-by: stevenliu
Signed-off-by: James Almer <jamrial@gmail.com>
2017-12-04 02:22:30 -03:00
Martin Vignali
6e3e696591
avfilter/x86/vf_threshold : cosmetic indent
2017-12-03 19:17:28 +01:00
Martin Vignali
9719d57b34
avfilter/x86/vf_threshold : add avx2 version for threshold 8
2017-12-03 19:17:23 +01:00
Martin Vignali
51345cb1d5
avfilter/x86/vf_threshold : make macro for threshold8 in order to add avx2 version
2017-12-03 19:17:19 +01:00
Paul B Mahol
bbfcb1b7c8
avfilter/vf_threshold: add x86 SIMD
...
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2017-12-02 14:58:56 +01:00
James Almer
2904db9045
Merge commit '994c4bc10751e39c7ed9f67ffd0c0dea5223daf2'
...
* commit '994c4bc10751e39c7ed9f67ffd0c0dea5223daf2':
x86util: Port all macros to cpuflags
See d5f8a642f6eb1c6e305c41dabddd0fd36ffb3f77
Merged-by: James Almer <jamrial@gmail.com>
2017-10-21 12:15:57 -03:00
Thomas Mundt
40bfaa190c
avfilter/interlace: add support for 10 and 12 bit
...
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Thomas Mundt <tmundt75@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2017-09-23 16:19:58 -03:00
Thomas Mundt
a7f6bfdc18
avfilter/interlace: prevent over-sharpening with the complex low-pass filter
...
The complex vertical low-pass filter slightly over-sharpens the picture. This becomes visible when several transcodings are cascaded and the error compounds, e.g. over some generations of HD->SD and SD->HD conversion.
To prevent this behaviour, the destination pixel must not exceed the source pixel when the average of the pixels above and below is less than the source pixel, and must not fall below the source pixel in the opposite case.
Tested and approved in a visual transcoding cascade test by video professionals.
SSIM/PSNR test with the first generation of an HD->SD file as a reference against the 6th generation (3 x SD->HD HD->SD):
Results without the patch:
SSIM Y:0.956508 (13.615881) U:0.991601 (20.757750) V:0.993004 (21.551382) All:0.974405 (15.918463)
PSNR y:31.838009 u:48.424280 v:48.962711 average:34.759466 min:31.699297 max:40.857847
Results with the patch:
SSIM Y:0.970051 (15.236232) U:0.991883 (20.905857) V:0.993174 (21.658049) All:0.981290 (17.279202)
PSNR y:34.412108 u:48.504454 v:48.969496 average:37.264644 min:34.310637 max:42.373392
Signed-off-by: Thomas Mundt <tmundt75@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2017-09-15 22:40:21 +02:00
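The limiting rule described in the entry above can be sketched for a single 8-bit pixel as follows. This is a hypothetical helper, not the filter code itself: it takes an already low-pass filtered value as input and only applies the clamp, and its treatment of the case where the neighbour average equals the source pixel is an assumption, since the commit message does not spell that case out.

```c
#include <stdint.h>

/* Hypothetical helper: clamp a filtered pixel so that the low-pass filter
 * cannot push it past the source pixel, away from its vertical neighbours.
 * filtered: output of the low-pass filter for this pixel
 * src:      source pixel at the same position
 * above, below: source pixels on the lines above and below */
uint8_t limit_sharpening_sketch(uint8_t filtered, uint8_t src,
                                uint8_t above, uint8_t below)
{
    /* Compare above + below with 2 * src instead of dividing by two. */
    const int neighbours = above + below;
    const int src_x2     = 2 * src;

    if (neighbours > src_x2) {
        /* Neighbour average is above src: result must not drop below src. */
        return filtered < src ? src : filtered;
    }
    /* Neighbour average is at or below src (equality is an assumption):
     * result must not exceed src. */
    return filtered > src ? src : filtered;
}
```

With src = 100 and neighbours of 80 and 90, a filtered value of 120 would be pulled back to 100, while a filtered value of 90 passes through unchanged.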
Paul B Mahol
f8d0689d3f
avfilter/vf_blend: rename addition128 and difference128 to grainmerge and grainextract
2017-08-24 14:45:52 +02:00
James Almer
5688fd77b5
x86/vf_limiter: make limiter functions work on x86_32
...
Signed-off-by: James Almer <jamrial@gmail.com>
2017-07-13 18:17:17 -03:00
Paul B Mahol
01e545d046
avfilter: add limiter filter
...
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2017-07-08 11:49:54 +02:00
James Almer
d2ef9e6e7f
x86/vf_blend: use ABS2 macro
2017-06-27 20:45:55 -03:00
James Almer
0daa1cf073
x86/vf_blend: optimize difference and negation functions
...
Process more pixels per loop.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-27 13:17:23 -03:00
James Almer
fa50d9360b
x86/vf_blend: add sse and ssse3 extremity functions
...
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2017-06-27 13:17:23 -03:00
Ronald S. Bultje
97f7f83169
vf_spp: only assign function pointers if permutation matches expectations.
2017-06-24 07:53:15 -04:00