FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-28 20:53:54 +02:00

Author	SHA1	Message	Date
Paul B Mahol	047c362d3c	avfilter/vf_nlmeans: add x86 SIMD	2021-11-11 21:54:46 +01:00
Mark Reid	716b396740	avfilter/vf_lut3d: add x86-optimized tetrahedral interpolation I spotted an interesting pattern that I didn't see before that leads to the implementation being faster. The bit shifting table I was using before is no longer needed, and was able to remove quite a few lines. I also add use of FMA on the AVX2 version. f32 1920x1080 1 thread with prelut c impl 1434012700 UNITS in lut3d->interp, 1 runs, 0 skips 1434035335 UNITS in lut3d->interp, 2 runs, 0 skips 1423615347 UNITS in lut3d->interp, 4 runs, 0 skips 1426268863 UNITS in lut3d->interp, 8 runs, 0 skips sse2 905484420 UNITS in lut3d->interp, 1 runs, 0 skips 905659010 UNITS in lut3d->interp, 2 runs, 0 skips 915167140 UNITS in lut3d->interp, 4 runs, 0 skips 915834222 UNITS in lut3d->interp, 8 runs, 0 skips avx 574794860 UNITS in lut3d->interp, 1 runs, 0 skips 581035090 UNITS in lut3d->interp, 2 runs, 0 skips 584116720 UNITS in lut3d->interp, 4 runs, 0 skips 581460290 UNITS in lut3d->interp, 8 runs, 0 skips avx2 301698880 UNITS in lut3d->interp, 1 runs, 0 skips 301982880 UNITS in lut3d->interp, 2 runs, 0 skips 306962430 UNITS in lut3d->interp, 4 runs, 0 skips 305472025 UNITS in lut3d->interp, 8 runs, 0 skips gbrap16 1920x1080 1 thread with prelut c impl 1480894840 UNITS in lut3d->interp, 1 runs, 0 skips 1502922990 UNITS in lut3d->interp, 2 runs, 0 skips 1496114307 UNITS in lut3d->interp, 4 runs, 0 skips 1492554551 UNITS in lut3d->interp, 8 runs, 0 skips sse2 980777180 UNITS in lut3d->interp, 1 runs, 0 skips 986121520 UNITS in lut3d->interp, 2 runs, 0 skips 986489840 UNITS in lut3d->interp, 4 runs, 0 skips 998832248 UNITS in lut3d->interp, 8 runs, 0 skips avx 622212360 UNITS in lut3d->interp, 1 runs, 0 skips 622981160 UNITS in lut3d->interp, 2 runs, 0 skips 645396315 UNITS in lut3d->interp, 4 runs, 0 skips 641057075 UNITS in lut3d->interp, 8 runs, 0 skips avx2 321336400 UNITS in lut3d->interp, 1 runs, 0 skips 321268920 UNITS in lut3d->interp, 2 runs, 0 skips 323459895 UNITS in lut3d->interp, 4 runs, 0 skips 324949967 UNITS in lut3d->interp, 8 runs, 0 skips	2021-10-10 22:23:48 +02:00
Paul B Mahol	ac0f5f4c17	avfilter/vf_maskedclamp: add x86 SIMD	2019-10-23 16:20:21 +02:00
Paul B Mahol	ccd9bca15a	avfilter/vf_transpose: add x86 SIMD	2019-10-21 20:37:51 +02:00
Paul B Mahol	295d99b439	avfilter/vf_atadenoise: add x86 SIMD	2019-10-17 19:44:11 +02:00
James Almer	1dbd3c6116	avfilter/vf_eq: fix compilation with x86 asm disabled Signed-off-by: James Almer <jamrial@gmail.com>	2019-09-26 12:19:43 -03:00
Ting Fu	6aff2042d6	avfilter/x86/vf_eq: Change inline assembly into nasm code Signed-off-by: Ting Fu <ting.fu@intel.com>	2019-09-26 08:11:13 +08:00
Paul B Mahol	058bbf48c6	avfilter/vf_v360: x86 SIMD for interpolations	2019-09-06 14:10:37 +02:00
Ruiling Song	98e419cbf5	avfilter/vf_convolution: add x86 SIMD for filter_3x3() Tested using a simple command (apply edge enhance): ./ffmpeg_g -i ~/Downloads/bbb_sunflower_1080p_30fps_normal.mp4 \ -vf convolution="0 0 0 -1 1 0 0 0 0:0 0 0 -1 1 0 0 0 0:0 0 0 -1 1 0 0 0 0:0 0 0 -1 1 0 0 0 0:5:1:1:1:0:128:128:128" \ -an -vframes 1000 -f null /dev/null The fps increased from 151 to 270 on my local machine. Signed-off-by: Ruiling Song <ruiling.song@intel.com>	2019-08-07 14:31:28 +08:00
Ruiling Song	83f9da7768	avfilter/vf_gblur: add x86 SIMD optimizations The horizontal pass gets ~2x performance with the patch under single thread. Tested overall performance using the command (avx2 enabled): ./ffmpeg -i 1080p.mp4 -vf gblur -f null /dev/null ./ffmpeg -i 1080p.mp4 -vf gblur=threads=1 -f null /dev/null For single thread, the fps improves from 43 to 60, about 40%. For multi-thread, the fps improves from 110 to 130, about 20%. Signed-off-by: Ruiling Song <ruiling.song@intel.com>	2019-06-12 08:53:11 +08:00
Paul B Mahol	dcae5ba322	avfilter: add anlmdn filter x86 SIMD optimizations	2019-01-10 21:49:47 +01:00
Marton Balint	6c2a7a8e9a	avfilter/vf_framerate: factorize SAD functions which compute SAD for a whole frame Also add SIMD which works on lines because it is faster than calculating it on 8x8 blocks using pixelutils. Signed-off-by: Marton Balint <cus@passwd.hu>	2018-11-11 20:30:50 +01:00
Paul B Mahol	6d7c63588c	avfilter/vf_overlay: add x86 SIMD Specifically for yuv444, yuv422, yuv420 format when main stream has no alpha, and alpha is straight. Signed-off-by: Paul B Mahol <onemda@gmail.com>	2018-05-02 23:58:21 +02:00
Vasile Toncu	9c01cdb94e	avfilter/vf_interlace: remove duplicate code with same functionality	2018-04-23 23:48:30 +02:00
Marton Balint	4d95c6d5d7	avfilter/vf_framerate: add SIMD functions for frame blending Blend function speedups on x86_64 Core i5 4460: ffmpeg -f lavfi -i allyuv -vf framerate=60:threads=1 -f null none C: 447548411 decicycles in Blend, 2048 runs, 0 skips SSSE3: 130020087 decicycles in Blend, 2048 runs, 0 skips AVX2: 128508221 decicycles in Blend, 2048 runs, 0 skips ffmpeg -f lavfi -i allyuv -vf format=yuv420p12,framerate=60:threads=1 -f null none C: 228932745 decicycles in Blend, 2048 runs, 0 skips SSE4: 123357781 decicycles in Blend, 2048 runs, 0 skips AVX2: 121215353 decicycles in Blend, 2048 runs, 0 skips Signed-off-by: Marton Balint <cus@passwd.hu>	2018-01-28 18:50:52 +01:00
Paul B Mahol	86fda8be3f	avfilter: add hflip x86 SIMD Signed-off-by: Paul B Mahol <onemda@gmail.com>	2017-12-04 09:58:25 +01:00
Paul B Mahol	bbfcb1b7c8	avfilter/vf_threshold: add x86 SIMD Signed-off-by: Paul B Mahol <onemda@gmail.com>	2017-12-02 14:58:56 +01:00
Paul B Mahol	01e545d046	avfilter: add limiter filter Signed-off-by: Paul B Mahol <onemda@gmail.com>	2017-07-08 11:49:54 +02:00
Diego Biurrun	fd502f4f5f	build: Generalize yasm/nasm-related variable names None of them are specific to the YASM assembler. (Cherry-picked from libav commit `39e208f4d4`) Signed-off-by: James Almer <jamrial@gmail.com>	2017-06-21 17:00:29 -03:00
Paul B Mahol	49bbfb9d13	avfilter: add arbitrary audio FIR filter Signed-off-by: Paul B Mahol <onemda@gmail.com>	2017-05-09 20:47:52 +02:00
Muhammad Faiz	1e69ac9246	avfilter/avf_showcqt: cqt_calc optimization on x86 on x86_64: time PSNR plain 3.303 inf SSE 1.649 107.087535 SSE3 1.632 107.087535 AVX 1.409 106.986771 FMA3 1.265 107.108437 on x86_32 (PSNR compared to x86_64 plain): time PSNR plain 7.225 103.951979 SSE 1.827 105.859282 SSE3 1.819 105.859282 AVX 1.533 105.997661 FMA3 1.384 105.885377 FMA4 test is not available Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>	2016-06-08 16:09:43 +07:00
Ronald S. Bultje	5ce703a6bf	vf_colorspace: x86-64 SIMD (SSE2) optimizations.	2016-04-12 16:42:48 -04:00
Thomas Mundt	5024a82e95	avfilter/vf_bwdif: add x86 SIMD Signed-off-by: Thomas Mundt <loudmax@yahoo.de>	2016-03-13 10:06:21 +01:00
Paul B Mahol	5740dc27e1	avfilter/vf_w3fdif: add x86 SIMD Signed-off-by: Paul B Mahol <onemda@gmail.com>	2015-10-10 17:33:43 +02:00
Paul B Mahol	ac74e857a2	avfilter/vf_stereo3d: add x86 SIMD for anaglyph outputs Signed-off-by: Paul B Mahol <onemda@gmail.com>	2015-10-06 21:01:24 +02:00
Paul B Mahol	9762554dd0	avfilter/vf_blend: add x86 SIMD for some modes Signed-off-by: Paul B Mahol <onemda@gmail.com>	2015-10-03 21:26:17 +02:00
Paul B Mahol	160556c9ad	avfilter/vf_maskedmerge: add SIMD for maskedmerge with 8 bit depth input Signed-off-by: Paul B Mahol <onemda@gmail.com>	2015-10-02 17:40:57 +02:00
James Darnley	bff7242608	avfilter/vf_removegrain: add x86 and x86_64 SSE2 functions Speed of all modes increased by a factor between 7.4 and 19.8 largely depending on whether bytes are unpacked into words. Modes 2, 3, and 4 have been sped-up by a factor of 43 (thanks quick sort!) All modes are available on x86_64 but only modes 1, 10, 11, 12, 13, 14, 19, 20, 21, and 22 are available on x86 due to the number of SIMD registers used. With a contribution from James Almer <jamrial@gmail.com>	2015-07-14 23:50:50 +00:00
Ronald S. Bultje	ae4c9ddebc	vf_psnr: sse2 optimizations for sum-squared-error. The internal line accumulator for 16bit can overflow, so I changed that from int to uint64_t in the C code. The matching assembly looks a little weird but output looks correct. (avx2 should be trivial to add later.) Reviewed-by: Paul B Mahol <onemda@gmail.com> Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2015-07-14 17:57:14 +02:00
Ronald S. Bultje	dfc58584b4	vf_ssim: x86 simd for ssim_4x4xN and ssim_endN. Both are 2-2.5x faster than their C counterpart. Reviewed-by: Paul B Mahol <onemda@gmail.com> Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2015-07-14 05:07:07 +02:00
Arwa Arif	4c38e960d0	avfilter: Port mp=eq/eq2 to lavfi Code adapted from James Darnley's port Some fixes from Paul B Mahol <onemda@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-01-26 00:14:04 +01:00
James Almer	da02ee127a	x86/vf_pp7: port dctB_mmx to yasm Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2015-01-09 20:02:27 -03:00
Arwa Arif	a299cd5ab3	lavfi: port mp=pp7 to libavfilter The only difference with mp=pp7 is that default mode is "medium", as stated in the MPlayer docs, rather than "hard". Signed-off-by: Stefano Sabatini <stefasab@gmail.com>	2015-01-09 17:26:31 +01:00
James Almer	466e32bf25	x86/vf_fspp: port inline asm to yasm Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-12-26 15:39:51 -03:00
Arwa Arif	bdc4db0ee3	lavfi: port mp=fspp to a native libavfilter filter Signed-off-by: Stefano Sabatini <stefasab@gmail.com>	2014-12-24 16:29:18 +01:00
Michael Niedermayer	fb3eb57369	avfilter/tinterlace: add Support for ff_lowpass_line_avx() & ff_lowpass_line_sse2() Based-on: `2e1704059a` by Kieran Kunhya Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-11-15 04:02:33 +01:00
Michael Niedermayer	6f373d75e8	Merge commit '2e1704059ae8625beda2ffde847ad22c5ba416dc' * commit '2e1704059ae8625beda2ffde847ad22c5ba416dc': vf_interlace: Add SIMD for lowpass filter Conflicts: libavfilter/vf_interlace.c libavfilter/x86/Makefile Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-11-15 02:39:49 +01:00
Kieran Kunhya	2e1704059a	vf_interlace: Add SIMD for lowpass filter Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2014-11-15 00:35:31 +01:00
James Almer	864f9326fb	x86/vf_noise: move asm code to a separate file Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-10-17 00:44:35 -03:00
skal	406a9ccffe	avfilter/vf_idet: MMX/MMXEXT/SSE2 implementation of idet's filter_line() integration by Neil Birkbeck, with help from Vitor Sessak. core SSE2 loop by Skal (pascal.massimino@gmail.com) Reviewed-by: Clément Bœsch <u@pkh.me> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-09-04 22:19:00 +02:00
Robert Krüger	4a38eeec38	Revert "Revert "vf_yadif: move x86 init code to x86/yadif.c"" This reverts commit `975110a85e`. Signed-off-by: Robert Krüger <krueger@lesspain.de> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-14 14:19:14 +01:00
Michael Niedermayer	975110a85e	Revert "vf_yadif: move x86 init code to x86/yadif.c" This reverts commit `a87b17f328`. This reduces the amount of non LGPL code, making a relicensing to LGPL easier Conflicts: libavfilter/vf_yadif.c libavfilter/x86/yadif.c libavfilter/x86/yadif_template.c libavfilter/yadif.h Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-12-01 20:26:26 +01:00
Michael Niedermayer	1ea28ffc4d	Merge commit '0e730494160d973400aed8d2addd1f58a0ec883e' * commit '0e730494160d973400aed8d2addd1f58a0ec883e': avfilter: x86: Port gradfun filter optimizations to yasm Conflicts: libavfilter/x86/vf_gradfun_init.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-24 10:35:39 +02:00
Daniel Kang	0e73049416	avfilter: x86: Port gradfun filter optimizations to yasm Signed-off-by: Diego Biurrun <diego@biurrun.de>	2013-10-23 14:50:27 +02:00
Paul B Mahol	9c774459a9	avfilter: port pullup filter from libmpcodecs Signed-off-by: Paul B Mahol <onemda@gmail.com>	2013-09-17 17:03:36 +00:00
Clément Bœsch	a2c547ffec	lavfi: add spp filter.	2013-06-14 01:27:22 +02:00
James Darnley	0a5814c9ba	yadif: x86 assembly for 9 to 14-bit samples These smaller samples do not need to be unpacked to double words allowing the code to process more pixels every iteration (still 2 in MMX but 6 in SSE2). It also avoids emulating the missing double word instructions on older instruction sets. Like with the previous code for 16-bit samples this has been tested on an Athlon64 and a Core2Quad. Athlon64: 1809275 decicycles in C, 32718 runs, 50 skips 911675 decicycles in mmx, 32727 runs, 41 skips, 2.0x faster 495284 decicycles in sse2, 32747 runs, 21 skips, 3.7x faster Core2Quad: 921363 decicycles in C, 32756 runs, 12 skips 486537 decicycles in mmx, 32764 runs, 4 skips, 1.9x faster 293296 decicycles in sse2, 32759 runs, 9 skips, 3.1x faster 284910 decicycles in ssse3, 32759 runs, 9 skips, 3.2x faster Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-03-16 22:32:54 +01:00
James Darnley	17e7b49501	yadif: x86 assembly for 16-bit samples This is a fairly dumb copy of the assembly for 8-bit samples but it works and produces identical output to the C version. The options have been tested on an Athlon64 and a Core2Quad. Athlon64: 1810385 decicycles in C, 32726 runs, 42 skips 1080744 decicycles in mmx, 32744 runs, 24 skips, 1.7x faster 818315 decicycles in sse2, 32735 runs, 33 skips, 2.2x faster Core2Quad: 924025 decicycles in C, 32750 runs, 18 skips 623995 decicycles in mmx, 32767 runs, 1 skips, 1.5x faster 406223 decicycles in sse2, 32764 runs, 4 skips, 2.3x faster 387842 decicycles in ssse3, 32767 runs, 1 skips, 2.4x faster 307726 decicycles in sse4, 32763 runs, 5 skips, 3.0x faster Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-03-16 22:32:34 +01:00
Diego Biurrun	e66240f22e	avfilter: x86: consistent filenames for filter optimizations	2013-02-04 15:00:47 +01:00
Diego Biurrun	76d90125cd	vf_hqdn3d: x86: Add proper arch optimization initialization	2013-02-01 13:11:45 +01:00

56 Commits