FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-23 12:43:46 +02:00

Author	SHA1	Message	Date
Muhammad Faiz	de1308429a	swresample/x86/resample: extend resample_double to support avx and fma3 benchmark: sse2 10.670s avx 8.763s fma3 8.380s Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>	2017-03-19 12:24:41 +07:00
Muhammad Faiz	06f94149c6	swresample/resample: optimize exact_rational=on:linear_interp=on case separate dsp.resample to dsp.resample_common and dsp.resample_linear and choose to call faster resample_common even when linear_interp=on when c->frac and c->dst_incr_mod are both zero speed up resampling when exact_rational and linear_interp are both enabled because exact_rational force c->frac and c->dst_incr_mod to be zero when soft compensation does not happen benchmark on exact_rational=on:linear_interp=on old new real 8.432s 5.097s user 7.679s 4.989s sys 0.125s 0.107s Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>	2016-11-25 03:22:04 +07:00
Muhammad Faiz	6031e5d1af	swresample/x86: add support for exact_rational phase_shift and phase_mask is removed generally exact_rational=on is faster than exact_rational=off Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>	2016-06-21 05:18:21 +07:00
Muhammad Faiz	b8c6e5a661	swresample: add exact_rational option give high quality resampling as good as with linear_interp=on as fast as without linear_interp=on tested visually with ffplay ffplay -f lavfi "aevalsrc='sin(10000*t*t)', aresample=osr=48000, showcqt=gamma=5" ffplay -f lavfi "aevalsrc='sin(10000*t*t)', aresample=osr=48000:linear_interp=on, showcqt=gamma=5" ffplay -f lavfi "aevalsrc='sin(10000*t*t)', aresample=osr=48000:exact_rational=on, showcqt=gamma=5" slightly speed improvement for fair comparison with -cpuflags 0 audio.wav is ~ 1 hour 44100 stereo 16bit wav file ffmpeg -i audio.wav -af aresample=osr=48000 -f null - old new real 13.498s 13.121s user 13.364s 12.987s sys 0.131s 0.129s linear_interp=on old new real 23.035s 23.050s user 22.907s 22.917s sys 0.119s 0.125s exact_rational=on real 12.418s user 12.298s sys 0.114s possibility to decrease memory usage if soft compensation is ignored Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>	2016-06-13 12:36:01 +07:00
James Almer	70d685a77f	x86: use the new helper macros where useful Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: James Almer <jamrial@gmail.com>	2016-02-14 20:00:21 -03:00
James Almer	acdd672506	x86/audio_convert: fix clobbering of xmm registers Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2015-10-01 22:40:50 -03:00
James Almer	5750d6c5e9	x86: move XOP emulation code back to x86inc Only two functions that use xop multiply-accumulate instructions where the first operand is the same as the fourth actually took advantage of the macros. This further reduces differences with x264's x86inc. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2015-08-03 17:11:13 -03:00
James Almer	f37a5dcb55	swresample/x86: add missing colon to labels Silences warnings with Nasm Signed-off-by: James Almer <jamrial@gmail.com>	2015-07-26 02:51:13 -03:00
James Almer	c16e99e3b3	x86: check for AV_CPU_FLAG_AVXSLOW where useful Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-06-01 00:15:35 +02:00
Michael Niedermayer	c0e3b46118	swresample: add av_cold to init functions Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-21 00:33:09 +01:00
James Almer	f7ed997a6d	x86/swr: make pack_8ch functions work with compilers without aligned stack Signed-off-by: James Almer <jamrial@gmail.com>	2015-02-15 13:57:37 -03:00
Michael Niedermayer	b74ecb82fa	swresample/x86/rematrix_init: Check av_malloc* return codes, forward errors Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-09 10:15:56 +01:00
Michael Niedermayer	48ffaaaaef	swresample/x86/rematrix_init: Use av_mallocz_array() Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-09 10:15:56 +01:00
James Almer	59ac93f6af	x86/swr: add SSE/AVX unpack_6ch functions int32/float only Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2015-01-12 15:40:03 -03:00
James Almer	6abf00d615	x86/swr: load constants outside the loop in pack_6ch functions Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2015-01-11 01:11:46 -03:00
James Almer	975ff6a3c6	x86/swr: disable pack_8ch functions on msvc/icl x86_32 Until a proper fix is committed. Signed-off-by: James Almer <jamrial@gmail.com>	2014-12-31 16:38:33 -03:00
James Almer	5f14f9e984	x86/swr: add missing alignment check to pack_6ch functions Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-12-31 13:35:11 -03:00
James Almer	37b35feb64	x86/swr: add SSE2/AVX pack_8ch functions Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2014-12-30 23:05:27 -03:00
James Almer	edff061fb0	x86/swr: add ff_float_to_int32_a_avx2 13797 decicycles in ff_float_to_int32_a_sse2, 32768 runs, 0 skips 8603 decicycles in ff_float_to_int32_a_avx2, 32766 runs, 2 skips Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2014-11-07 15:01:35 -03:00
James Almer	b385c4c6a3	x86/swr: replace sse4 instructions in pack_6ch with sse ones There's no benefit from using blendps here except on CPUs with AVX, where it's faster than shufps according to Intel's documentation. As such, rename the sse4 functions to sse/sse2 and use shufps instead. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-11-06 20:54:00 -03:00
James Almer	9937362c54	x86/swr: use lavu helper macros to check CPU extensions Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-04 02:12:16 +02:00
James Almer	8279a15284	x86/swr: split audioconvert and rematrix DSP into separate files Also rename resample_x86_dsp.c to resample_init.c Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-04 02:00:11 +02:00
James Almer	857cd1f33b	swr: initialize only the necessary resample dsp functions Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-04 01:37:41 +02:00
James Almer	b5f0eac068	swr: rename swresample_dsp init functions to swri_resample_dsp The swresample_ prefix is not for internal functions Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-02 13:18:30 +02:00
James Almer	c45b7f0d80	x86/swr: add ff_resample_{common, linear}_int16_xop Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-02 01:11:20 +02:00
James Almer	1a69224f44	x86/swr: add ff_resample_{common, linear}_float_fma Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-02 01:09:53 +02:00
James Almer	dd2c9034b1	x86/swr: convert resample_{common, linear}_double_sse2 to yasm Signed-off-by: James Almer <jamrial@gmail.com> 312531 -> 311528 dezicycles Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-07-01 17:57:36 +02:00
Ronald S. Bultje	847bb638c0	swr: convert resample_common/linear_int16_mmx2/sse2 to yasm. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-06-30 20:11:50 +02:00
Ronald S. Bultje	faa1471ffc	swr: rewrite resample_common/linear_float_sse/avx in yasm. Linear interpolation goes from 63 (llvm) or 58 (gcc) to 48 (yasm) cycles/sample on 64bit, or from 66 (llvm/gcc) to 52 (yasm) cycles/sample on 32bit. Non-linear goes from 43 (llvm) or 38 (gcc) to 32 (yasm) cycles/sample on 64bit, or from 46 (llvm) or 44 (gcc) to 38 (yasm) cycles/sample on 32bit (all testing on OSX 10.9.2, llvm 5.1 and gcc 4.8/9). Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-06-28 17:06:47 +02:00
Ronald S. Bultje	083cd3d1f7	swr: compile mmx2 s16p functions only on x86-32. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-06-15 13:34:53 +02:00
James Almer	7f4dfbd080	swr: add prototypes for resample dsp functions Should fix compilation failures with MSVC and any other compiler without inline asm support. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-06-15 01:33:17 +02:00
Ronald S. Bultje	ada8f9c046	swr: remove obsolete function prototypes. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-06-15 00:07:25 +02:00
Ronald S. Bultje	7128a35f8c	swr: split out DSP functions. DSP bits of swri_resample go into their own mini-DSP functions; DSP init goes from a per-call branch in multiple_resample to a proper DSP init routine; x86 bits go into x86/; swri_resample() moves out of resample_template.c into resample.c because it's independent of DSP code or sample type; multiple_resample() is simplified. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-06-14 20:21:39 +02:00
James Almer	a9bf713d35	swresample: add swri_resample_float_avx Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-16 05:27:03 +02:00
Matt Oliver	1898c2f49d	inline asm: fix arrays as named constraints. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-07 15:02:45 +02:00
James Almer	4cdea92976	swresample/resample: add missing xmm clobbers Might fix fate-swr on ICL Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-07 01:32:40 +02:00
James Almer	cdac3ab59f	swresample: add swri_resample_double_sse2 Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-25 16:46:07 +02:00
James Almer	63dbba655e	swresample/resample: sse float linear interpolation About two times faster Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-24 02:34:02 +01:00
James Almer	fa25c4c400	swresample/resample: mmx2/sse2 int16 linear interpolation About three times faster Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-24 02:33:16 +01:00
James Almer	32291ba6ea	swresample: add swri_resample_float_sse At least two times faster than the C version. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-20 06:01:06 +01:00
Matt Oliver	8236747511	Automatically change MANGLE() into named inline asm operands when direct symbol reference in inline asm are not supported. This is part of the patch-set for intel C inline asm on windows support Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-18 23:39:30 +01:00
James Almer	7c8bf09edd	swresample: change COMMON_CORE_INT16 asm from SSSE3 to SSE2 pshuf+paddd is slightly faster than phaddd. The real gain is in pre-ssse3 processors like AMD K8 and K10, which get a big boost in performance compared to the mmxext version Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-18 15:00:50 +01:00
Martin Storsjö	3dd04cbcf7	swresample: Add arm&x86 clobber tests Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-01-18 18:38:57 +01:00
Reimar Döffinger	cbeaf67888	Avoid using empty macro arguments. These are not supported by all compilers (gcc 2.95 but also older SPARC compilers, see gcc bug #33304 for example), and there is no real need for them. One use of this feature remains in libavdevice/v4l2.c which can't be replaced quite as easily. Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>	2013-12-31 12:19:59 +01:00
Ronald S. Bultje	ad75d2b590	x86: Fix compilation with nasm on PPC & OS/2 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-08 12:36:19 +02:00
Michael Niedermayer	ca2818b881	swresample/x86/audio_convert: add emms to CONV Might fix Ticket1874 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-06-18 02:26:36 +02:00
Michael Niedermayer	4cfc92081d	swr: add native_simd_one Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-06-04 23:50:45 +02:00
Michael Niedermayer	3174616f59	Merge commit '6860b4081d046558c44b1b42f22022ea341a2a73' * commit '6860b4081d046558c44b1b42f22022ea341a2a73': x86: include x86inc.asm in x86util.asm cng: Reindent some incorrectly indented lines cngdec: Allow flushing the decoder cngdec: Make the dbov variable have the right unit cngdec: Fix the memset size to cover the full array cngdec: Update the LPC coefficients after averaging the reflection coefficients configure: fix print_config() with broke awks Conflicts: libavcodec/x86/ac3dsp.asm libavcodec/x86/dct32.asm libavcodec/x86/deinterlace.asm libavcodec/x86/dsputil.asm libavcodec/x86/dsputilenc.asm libavcodec/x86/fft.asm libavcodec/x86/fmtconvert.asm libavcodec/x86/h264_chromamc.asm libavcodec/x86/h264_deblock.asm libavcodec/x86/h264_deblock_10bit.asm libavcodec/x86/h264_idct.asm libavcodec/x86/h264_idct_10bit.asm libavcodec/x86/h264_intrapred.asm libavcodec/x86/h264_intrapred_10bit.asm libavcodec/x86/h264_weight.asm libavcodec/x86/vc1dsp.asm libavcodec/x86/vp3dsp.asm libavcodec/x86/vp56dsp.asm libavcodec/x86/vp8dsp.asm Merged-by: Michael Niedermayer <michaelni@gmx.at>	2012-10-31 13:43:33 +01:00
Michael Niedermayer	31a797eb28	swr: add av_cold to init/free functions Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-09-09 02:26:20 +02:00
Carl Eugen Hoyos	a26789cf9f	Fix compilation with yasm-0.6.2.	2012-09-01 10:59:16 +02:00


93 Commits