FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-21 10:55:51 +02:00

Author	SHA1	Message	Date
Andreas Rheinhardt	24936a9fbb	avfilter/vf_gblur: Move ff_gblur_init into a header. This removes a dependency of checkasm on lavfi/vf_gblur.o and also allows ff_gblur_init() to be inlined irrespective of interposing. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-05-06 05:19:49 +02:00
Wu Jianhua	4041c1029b	libavfilter/x86/vf_gblur: add localbuf and ff_horiz_slice_avx2/512(). We introduced ff_horiz_slice_avx2/512(), based on a new algorithm. In a nutshell, the new algorithm does three things: gathering data from 8/16 rows, blurring the data, and scattering the data back to the image buffer. Here we used a customized 8x8/16x16 transpose to avoid the huge overhead of gather and scatter instructions; it depends on a newly added temporary buffer called localbuf. Performance data: ff_horiz_slice_avx2(old): 109.89; ff_horiz_slice_avx2(new): 666.67; ff_horiz_slice_avx512: 1000. Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com> Co-authored-by: Jin Jun <jun.i.jin@intel.com> Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>	2021-08-29 19:58:33 +02:00
Wu Jianhua	68a2722aee	libavfilter/x86/vf_gblur: add ff_verti_slice_avx2/512(). The new vertical slice with AVX2/512 acceleration can significantly improve the performance of the 2D Gaussian filter. Performance data: ff_verti_slice_c: 32.57; ff_verti_slice_avx2: 476.19; ff_verti_slice_avx512: 833.33. Co-authored-by: Cheng Yanfei <yanfei.cheng@intel.com> Co-authored-by: Jin Jun <jun.i.jin@intel.com> Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>	2021-08-29 19:58:33 +02:00
Paul B Mahol	058db59e16	avfilter/vf_gblur: factor out postscale function	2021-02-16 21:12:11 +01:00
Paul B Mahol	34922dffca	avfilter/vf_gblur: add float format support	2021-02-12 21:09:51 +01:00
Ruiling Song	83f9da7768	avfilter/vf_gblur: add x86 SIMD optimizations. The horizontal pass gets ~2x performance with the patch in a single thread. Overall performance was tested with these commands (AVX2 enabled): ./ffmpeg -i 1080p.mp4 -vf gblur -f null /dev/null and ./ffmpeg -i 1080p.mp4 -vf gblur=threads=1 -f null /dev/null. For a single thread, the fps improves from 43 to 60, about 40%. For multiple threads, the fps improves from 110 to 130, about 20%. Signed-off-by: Ruiling Song <ruiling.song@intel.com>	2019-06-12 08:53:11 +08:00

6 Commits
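
The gather / blur / scatter scheme described in commit 4041c1029b can be illustrated with a small scalar sketch. This is not the FFmpeg assembly: it is plain Python, and it assumes the blur step is a simple first-order recursive filter run left-to-right and then right-to-left along each row. The function names (`blur_row`, `blur_rows_batched`) and the parameters `steps`, `nu`, `bscale` and `lanes` are hypothetical, chosen only for illustration. The point it tries to show is that copying a batch of rows into a transposed local buffer lets every step of the recursion update all rows of the batch at once, which is the access pattern a SIMD register would want.

```python
def blur_row(row, steps, nu, bscale):
    """Reference version: filter a single row in place (assumed recursion)."""
    width = len(row)
    for _ in range(steps):
        row[0] *= bscale
        for x in range(1, width):            # filter rightwards
            row[x] += nu * row[x - 1]
        row[width - 1] *= bscale
        for x in range(width - 1, 0, -1):    # filter leftwards
            row[x - 1] += nu * row[x]


def blur_rows_batched(image, steps, nu, bscale, lanes=8):
    """Gather up to `lanes` rows, blur them together, scatter them back."""
    height = len(image)
    width = len(image[0])
    for y0 in range(0, height, lanes):
        rows = image[y0:y0 + lanes]          # references to the real rows
        n = len(rows)

        # Gather with transpose: localbuf[x] holds column x of the batch,
        # so one "vector" covers the same pixel position in every row.
        localbuf = [[rows[r][x] for r in range(n)] for x in range(width)]

        for _ in range(steps):
            localbuf[0] = [v * bscale for v in localbuf[0]]
            for x in range(1, width):        # rightwards, all rows at once
                localbuf[x] = [c + nu * p
                               for c, p in zip(localbuf[x], localbuf[x - 1])]
            localbuf[-1] = [v * bscale for v in localbuf[-1]]
            for x in range(width - 1, 0, -1):  # leftwards, all rows at once
                localbuf[x - 1] = [c + nu * p
                                   for c, p in zip(localbuf[x - 1], localbuf[x])]

        # Scatter with transpose: write the batch back into the image rows.
        for r in range(n):
            for x in range(width):
                rows[r][x] = localbuf[x][r]
```

In this sketch each list `localbuf[x]` plays the role of one vector register, and the two transposing copies stand in for the customized 8x8/16x16 transpose that the commit message says replaces hardware gather and scatter instructions. Because the per-element arithmetic is the same as in `blur_row`, both functions should produce identical results on the same input.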