mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-19 05:49:09 +02:00

272 Commits

Author SHA1 Message Date
James Almer
1dbd3c6116 avfilter/vf_eq: fix compilation with x86 asm disabled
Signed-off-by: James Almer <jamrial@gmail.com>
2019-09-26 12:19:43 -03:00
Ting Fu
4f589d668e avfilter/x86/vf_eq: add SSE2 version
Signed-off-by: Ting Fu <ting.fu@intel.com>
2019-09-26 08:12:36 +08:00
Ting Fu
6aff2042d6 avfilter/x86/vf_eq: Change inline assembly into nasm code
Signed-off-by: Ting Fu <ting.fu@intel.com>
2019-09-26 08:11:13 +08:00
Paul B Mahol
921eb21b1d avfilter/x86/vf_v360: add most of >8 depth asm 2019-09-16 10:21:16 +02:00
James Almer
4857688732 x86/vf_v360: use a faster horizontal add in remap4_8bit_line_avx2
Signed-off-by: James Almer <jamrial@gmail.com>
2019-09-06 12:11:46 -03:00
James Almer
2200cf1aca x86/vf_v360: make remap{1,2}_8bit_line_avx2 work on x86_32
Signed-off-by: James Almer <jamrial@gmail.com>
2019-09-06 11:11:45 -03:00
Paul B Mahol
058bbf48c6 avfilter/vf_v360: x86 SIMD for interpolations 2019-09-06 14:10:37 +02:00
Ruiling Song
98e419cbf5 avfilter/vf_convolution: add x86 SIMD for filter_3x3()
Tested using a simple command (apply edge enhance):
./ffmpeg_g -i ~/Downloads/bbb_sunflower_1080p_30fps_normal.mp4 \
 -vf convolution="0 0 0 -1 1 0 0 0 0:0 0 0 -1 1 0 0 0 0:0 0 0 -1 1 0 0 0 0:0 0 0 -1 1 0 0 0 0:5:1:1:1:0:128:128:128" \
 -an -vframes 1000 -f null /dev/null

The fps increases from 151 to 270 on my local machine.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
2019-08-07 14:31:28 +08:00
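
As a rough illustration of what a 3x3 convolution pass computes for the test command above, here is a minimal Python sketch. It assumes the positional options are four matrices, then four rdiv multipliers, then four bias values, so the first plane uses the kernel `0 0 0 -1 1 0 0 0 0` with rdiv 5 and bias 0. The function name, the border clamping, and the rounding are assumptions made for this example and are not taken from the FFmpeg source.

```python
def filter_3x3(plane, kernel, rdiv=1.0, bias=0.0):
    """Apply a 3x3 kernel to a 2D list of 8-bit samples.

    Assumptions (illustrative only): borders are handled by clamping
    coordinates, and the result is rounded and clipped to 0..255.
    """
    height, width = len(plane), len(plane[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    # Clamp neighbour coordinates to stay inside the plane.
                    sy = min(max(y + ky - 1, 0), height - 1)
                    sx = min(max(x + kx - 1, 0), width - 1)
                    acc += plane[sy][sx] * kernel[ky * 3 + kx]
            value = int(acc * rdiv + bias + 0.5)
            out[y][x] = min(max(value, 0), 255)
    return out


# Kernel and rdiv for the first plane of the command above.
kernel = [0, 0, 0, -1, 1, 0, 0, 0, 0]
plane = [[10, 20, 30]] * 3
result = filter_3x3(plane, kernel, rdiv=5, bias=0)
```

With this kernel each output sample is five times the difference between a pixel and its left neighbour, which is why flat regions map to 0.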
James Almer
b8f1542dcb avfilter/vf_gblur: add missing preprocessor check
Fixes compilation on x86_32

Signed-off-by: James Almer <jamrial@gmail.com>
2019-06-12 10:54:59 -03:00
Ruiling Song
83f9da7768 avfilter/vf_gblur: add x86 SIMD optimizations
The horizontal pass gets ~2x performance with the patch
under single thread.

Tested overall performance using the command (avx2 enabled):
./ffmpeg -i 1080p.mp4 -vf gblur -f null /dev/null
./ffmpeg -i 1080p.mp4 -vf gblur=threads=1 -f null /dev/null
For single thread, the fps improves from 43 to 60, about 40%.
For multi-thread, the fps improves from 110 to 130, about 20%.

Signed-off-by: Ruiling Song <ruiling.song@intel.com>
2019-06-12 08:53:11 +08:00
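
The percentages quoted in the gblur message can be rechecked from the reported fps figures. The helper below is a hypothetical one written for this check, not part of FFmpeg.

```python
def percent_gain(before_fps, after_fps):
    """Relative throughput improvement, in percent."""
    return (after_fps - before_fps) / before_fps * 100.0


single = percent_gain(43, 60)    # single thread, figures from the message
multi = percent_gain(110, 130)   # multi-thread, figures from the message
print(f"single thread: {single:.1f}%")  # single thread: 39.5%
print(f"multi-thread:  {multi:.1f}%")   # multi-thread:  18.2%
```

Both results are consistent with the rounded "about 40%" and "about 20%" in the message.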
Paul B Mahol
dcae5ba322 avfilter: add anlmdn filter x86 SIMD optimizations 2019-01-10 21:49:47 +01:00
James Almer
ef67af31ff x86/af_afir: use three operand form for some instructions
Fixes compilation with old yasm versions.

Signed-off-by: James Almer <jamrial@gmail.com>
2019-01-03 23:36:19 -03:00
James Almer
5402c1886b x86/af_afir: add ff_fcmul_add_avx()
fcmul_add_c: 1228.8
fcmul_add_sse3: 334.3
fcmul_add_avx: 186.3

Tested on a Core i5 4460 @ 3.2GHz

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2019-01-03 10:12:19 -03:00
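
For orientation, the kind of operation benchmarked above can be sketched in Python. This assumes that `fcmul_add` is a complex multiply-accumulate over interleaved (re, im) float pairs, as the name suggests. The signature and data layout here are assumptions, and any special handling of a trailing element is not modeled.

```python
def fcmul_add(total, t, c, length):
    """Accumulate the complex products t[n] * c[n] into total, in place.

    All three lists hold interleaved (re, im) pairs; `length` is the
    number of complex values. Layout is an assumption for this sketch.
    """
    for n in range(length):
        t_re, t_im = t[2 * n], t[2 * n + 1]
        c_re, c_im = c[2 * n], c[2 * n + 1]
        total[2 * n] += t_re * c_re - t_im * c_im      # real part
        total[2 * n + 1] += t_re * c_im + t_im * c_re  # imaginary part
    return total


# (1 + 2i) * (3 + 4i) = -5 + 10i
acc = fcmul_add([0.0, 0.0], [1.0, 2.0], [3.0, 4.0], 1)
```

Each iteration is independent of the others, which is what makes this loop a natural fit for SIMD.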
James Almer
82043dfd2e avfilter/af_afir: split off fcmul_add into a DSP context
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2019-01-03 10:12:18 -03:00
James Almer
9b5bd665e1 x86/af_afir: fix processing the last element
ff_fcmul_add_sse3() is now identical to the C version.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2019-01-03 10:12:18 -03:00
James Almer
3913d6f734 x86/scene_sad: fix link errors when HAVE_X86ASM is not defined
Reviewed-by: Haihao Xiang <haihao.xiang@intel.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2018-11-21 22:26:07 -03:00
Paul B Mahol
c98a32e4ad avfilter/vf_blend: add 10bit support 2018-11-15 14:44:24 +01:00
Philip Langdale
1096614c42 avfilter/vf_bwdif: Use common yadif frame management logic
After adding field type management to the common yadif logic, we can
remove the duplicate copy of that logic from bwdif.
2018-11-14 17:41:01 -08:00
Marton Balint
6c2a7a8e9a avfilter/vf_framerate: factorize SAD functions which compute SAD for a whole frame
Also add SIMD which works on lines because it is faster than calculating it on
8x8 blocks using pixelutils.

Signed-off-by: Marton Balint <cus@passwd.hu>
2018-11-11 20:30:50 +01:00
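
The line-based approach described above can be illustrated with a minimal Python sketch of a whole-frame sum of absolute differences (SAD) accumulated one line at a time. The function name and the list-of-rows frame representation are assumptions for this example. The real code works on raw plane buffers with strides.

```python
def frame_sad(frame_a, frame_b):
    """Sum of absolute differences between two equally sized frames.

    Frames are lists of rows of integer samples; each row is handled
    as one unit, mirroring a per-line SIMD routine.
    """
    total = 0
    for row_a, row_b in zip(frame_a, frame_b):
        total += sum(abs(a - b) for a, b in zip(row_a, row_b))
    return total


sad = frame_sad([[1, 2], [3, 4]], [[2, 2], [0, 8]])  # 1 + 0 + 3 + 4
```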
Paul B Mahol
0f0d468fbc avfilter/vf_overlay: exclude nv12/nv21 formats from x86 asm check
They are yet to be supported.

Signed-off-by: Paul B Mahol <onemda@gmail.com>
2018-05-03 09:22:28 +02:00
Paul B Mahol
6d7c63588c avfilter/vf_overlay: add x86 SIMD
Specifically for yuv444, yuv422 and yuv420 formats when the main stream has no alpha, and alpha
is straight.

Signed-off-by: Paul B Mahol <onemda@gmail.com>
2018-05-02 23:58:21 +02:00
Vasile Toncu
9c01cdb94e avfilter/vf_interlace: remove duplicate code with same functionality 2018-04-23 23:48:30 +02:00
Martin Vignali
f3df42e81d avfilter/x86/vf_blend : add SIMD for 16 bit version of
grainextract
grainmerge
average
extremity
negation
2018-04-05 21:46:16 +02:00
Martin Vignali
8eb0bb1108 avfilter/x86/vf_blend : reorganize DIFFERENCE macro to reduce line duplication between 8bit and 16 bit version 2018-04-05 21:46:11 +02:00
Martin Vignali
53a03b5c8c avfilter/x86/vf_blend : add 16 bit version for BLEND_SIMPLE, phoenix, difference for SSE and AVX2 (x86_64) 2018-02-24 21:44:19 +01:00
Martin Vignali
6c6c9d14a8 avfilter/x86/vf_blend : indent 2018-02-24 21:44:16 +01:00
Martin Vignali
7590d58b61 avfilter/x86/vf_blend : reorganize init in order to add 16 bit version 2018-02-24 21:44:13 +01:00
Martin Vignali
3a230ce5fa avfilter/x86/vf_blend : add AVX2 version for each func except divide
and optimize average, grainextract, multiply, screen, grain merge
2018-01-28 20:21:32 +01:00
Marton Balint
4d95c6d5d7 avfilter/vf_framerate: add SIMD functions for frame blending
Blend function speedups on x86_64 Core i5 4460:

ffmpeg -f lavfi -i allyuv -vf framerate=60:threads=1 -f null none

C:     447548411 decicycles in Blend,    2048 runs,      0 skips
SSSE3: 130020087 decicycles in Blend,    2048 runs,      0 skips
AVX2:  128508221 decicycles in Blend,    2048 runs,      0 skips

ffmpeg -f lavfi -i allyuv -vf format=yuv420p12,framerate=60:threads=1 -f null none

C:     228932745 decicycles in Blend,    2048 runs,      0 skips
SSE4:  123357781 decicycles in Blend,    2048 runs,      0 skips
AVX2:  121215353 decicycles in Blend,    2048 runs,      0 skips

Signed-off-by: Marton Balint <cus@passwd.hu>
2018-01-28 18:50:52 +01:00
Martin Vignali
b94cd55155 avfilter/x86/vf_interlace : add AVX2 version 2018-01-11 21:03:19 +01:00
James Almer
8e0e4384b0 Revert "avfilter/vf_interlace : add AVX2 for lowpass_line 8 and 16"
This reverts commits 1a5865b6dcc97754a1d7eedc130fb58237d2a715 and
8fb1d63d919286971b8e6afad372730d6d6f25c8.

They made fate interlace tests fail when AVX2 was used.

Signed-off-by: James Almer <jamrial@gmail.com>
2017-12-19 19:04:25 -03:00
Martin Vignali
3df6e61dad avfilter/x86/vf_hflip : indent
based on patch by Paul B Mahol
2017-12-19 21:10:12 +01:00
Martin Vignali
f181648176 avfilter/x86/vf_hflip : add avx2 version for hflip_byte and hflip_short 2017-12-19 21:10:09 +01:00
Martin Vignali
a4a4179e83 avfilter/x86/vf_hflip : merge hflip byte and hflip short to one macro 2017-12-19 21:10:05 +01:00
Martin Vignali
8fb1d63d91 avfilter/vf_tinterlace : add AVX2 func for lowpass_line 8 and 16 2017-12-19 20:59:59 +01:00
Martin Vignali
1a5865b6dc avfilter/vf_interlace : add AVX2 for lowpass_line 8 and 16 2017-12-19 20:59:54 +01:00
Martin Vignali
d31770d9a6 avfilter/vf_interlace : move func init in ff_interlace_init and add depth arg for ff_interlace_init_x86 2017-12-19 20:59:47 +01:00
Martin Vignali
3c6dc27035 avfilter/x86/vf_interlace : fix crash when using unaligned data in low_pass complex
related to ticket 6491
2017-12-15 11:28:29 +01:00
Martin Vignali
49dced9fd0 avfilter/x86/vf_interlace : avoid crash when data are unaligned
ticket 6491
2017-12-15 11:28:25 +01:00
Martin Vignali
869efbf971 avfilter/x86/vf_threshold : add threshold16 SIMD (SSE4 and AVX2) 2017-12-09 14:47:09 +01:00
James Almer
f2aa0ce5a0 x86/vf_hflip: use xor to zero initialize registers
Signed-off-by: James Almer <jamrial@gmail.com>
2017-12-07 19:34:12 -03:00
James Almer
dc33fe1d00 x86/vf_hflip: don't load the width argument twice
Signed-off-by: James Almer <jamrial@gmail.com>
2017-12-07 19:34:12 -03:00
James Almer
cc2ba526d4 x86/vf_threshold: make threshold8 functions work on x86_32
Signed-off-by: James Almer <jamrial@gmail.com>
2017-12-04 15:46:09 -03:00
Paul B Mahol
5ff0d2acae avfilter/x86/vf_hflip.asm: fix building on x32
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2017-12-04 15:08:43 +01:00
Paul B Mahol
86fda8be3f avfilter: add hflip x86 SIMD
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2017-12-04 09:58:25 +01:00
James Almer
b73304f79e x86/vf_threshold: use the PBLENDVB macro
Fixes building with yasm

Tested-by: stevenliu
Signed-off-by: James Almer <jamrial@gmail.com>
2017-12-04 02:22:30 -03:00
Martin Vignali
6e3e696591 avfilter/x86/vf_threshold : cosmetic indent 2017-12-03 19:17:28 +01:00
Martin Vignali
9719d57b34 avfilter/x86/vf_threshold : add avx2 version for threshold 8 2017-12-03 19:17:23 +01:00
Martin Vignali
51345cb1d5 avfilter/x86/vf_threshold : make macro for threshold8 in order to add avx2 version 2017-12-03 19:17:19 +01:00
Paul B Mahol
bbfcb1b7c8 avfilter/vf_threshold: add x86 SIMD
Signed-off-by: Paul B Mahol <onemda@gmail.com>
2017-12-02 14:58:56 +01:00