FFmpeg

Mirror of https://github.com/FFmpeg/FFmpeg.git (synced 2024-12-12 19:18:44 +02:00)

Author	SHA1	Message	Date
James Almer	3b06208a57	x86/float_dsp: remove duplicated code from vector_dmul_scalar Use the xm# and ym# aliases as they remain in sync with m# after a SWAP. No actual changes to the assembly. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-19 14:21:51 +02:00
James Almer	11b36b1ee0	x86/float_dsp: unroll loop in vector_fmac_scalar ~6% faster SSE2 performance. AVX/FMA3 are unaffected. Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-16 18:36:52 +02:00
James Almer	3b808900af	x86/float_dsp: use SWAP in vector_fmac_scalar Win64 The mova is unnecessary Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-16 15:46:21 +02:00
James Almer	7d7487e85c	x86/float_dsp: add ff_vector_{fmul_add, fmac_scalar}_fma3 ~7% faster than AVX Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-13 04:34:05 +01:00
Christophe Gisquet	133b34207c	x86: float dsp: unroll SSE versions vector_fmul and vector_fmac_scalar are guaranteed that they can process in batch of 16 elements, but their SSE versions only does 8 at a time. Therefore, unroll them a bit. 299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-15 18:54:21 +01:00
Michael Niedermayer	e91339cde2	Merge commit '566b7a20fd0cab44d344329538d314454a0bcc2f' * commit '566b7a20fd0cab44d344329538d314454a0bcc2f': x86: float dsp: butterflies_float SSE Conflicts: libavutil/x86/float_dsp.asm Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-05-03 11:57:59 +02:00
Christophe Gisquet	566b7a20fd	x86: float dsp: butterflies_float SSE 97c -> 49c Some codecs could benefit from more unrolling, but AAC doesn't.	2013-05-03 08:08:02 +02:00
Michael Niedermayer	92218aad00	butterflies_float: replace 2 lea by 2 add adds are simpler instructions and should be faster or equally fast on all cpus Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-04-17 00:10:06 +02:00
Christophe Gisquet	1a4007964c	x86: float dsp: butterflies_float SSE 97c -> 49c Some codecs could benefit from more unrolling, but AAC doesn't. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-04-17 00:03:25 +02:00
Michael Niedermayer	8102f27b5b	Merge commit '73b704ac609d83e0be124589f24efd9b94947cf9' * commit '73b704ac609d83e0be124589f24efd9b94947cf9': arm: Add some missing header #includes floatdsp: move scalarproduct_float from dsputil to avfloatdsp. Conflicts: libavcodec/acelp_pitch_delay.c libavcodec/amrnbdec.c libavcodec/amrwbdec.c libavcodec/ra288.c libavcodec/x86/dsputil_mmx.c libavutil/x86/float_dsp.asm Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-01-23 14:31:55 +01:00
Michael Niedermayer	6e6e170898	Merge commit '42d324694883cdf1fff1612ac70fa403692a1ad4' * commit '42d324694883cdf1fff1612ac70fa403692a1ad4': floatdsp: move vector_fmul_reverse from dsputil to avfloatdsp. Conflicts: libavcodec/arm/dsputil_init_vfp.c libavcodec/arm/dsputil_vfp.S libavcodec/dsputil.c libavcodec/ppc/float_altivec.c libavcodec/x86/dsputil.asm libavutil/x86/float_dsp.asm Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-01-23 14:04:50 +01:00
Michael Niedermayer	b1b870fbd7	Merge commit '55aa03b9f8f11ebb7535424cc0e5635558590f49' * commit '55aa03b9f8f11ebb7535424cc0e5635558590f49': floatdsp: move vector_fmul_add from dsputil to avfloatdsp. Conflicts: libavcodec/dsputil.c libavcodec/x86/dsputil.asm Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-01-23 13:54:34 +01:00
Ronald S. Bultje	42d3246948	floatdsp: move vector_fmul_reverse from dsputil to avfloatdsp. Now, nellymoserenc and aacenc no longer depends on dsputil. Independent of this patch, wmaprodec also does not depend on dsputil, so I removed it from there also.	2013-01-22 11:55:42 -08:00
Ronald S. Bultje	55aa03b9f8	floatdsp: move vector_fmul_add from dsputil to avfloatdsp.	2013-01-22 11:55:42 -08:00
Ronald S. Bultje	d56668bd80	floatdsp: move scalarproduct_float from dsputil to avfloatdsp. This makes the aac decoder and all voice codecs independent of dsputil.	2013-01-22 11:55:42 -08:00
Michael Niedermayer	5c076205a6	Merge remote-tracking branch 'qatar/master' * qatar/master: golomb: use unsigned arithmetics in svq3_get_ue_golomb() x86: float_dsp: fix loading of the len parameter on x86-32 takdec: fix initialisation of LOCAL_ALIGNED array takdec: fix initialisation of LOCAL_ALIGNED array Conflicts: libavcodec/rv30.c libavcodec/svq3.c libavcodec/takdec.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2012-12-08 16:36:47 +01:00
Justin Ruggles	1c012e6bfb	x86: float_dsp: fix loading of the len parameter on x86-32	2012-12-07 21:19:29 -05:00
Michael Niedermayer	af164d7d9f	Merge commit 'c25fc5c2bb6ae8c93541c9427df3e47206d95152' * commit 'c25fc5c2bb6ae8c93541c9427df3e47206d95152': fate: dpcm: Add dependencies SBR DSP x86: implement SSE sbr_hf_gen AAC SBR: use AVFloatDSPContext's vector_fmul fate: image: Add dependencies Changelog: add an entry for deprecating the avconv -vol option x86: float_dsp: fix compilation of ff_vector_dmul_scalar_avx() on x86-32 Conflicts: Changelog libavutil/x86/float_dsp.asm tests/fate/image.mak Merged-by: Michael Niedermayer <michaelni@gmx.at>	2012-12-07 15:21:41 +01:00
Michael Niedermayer	15784c2bab	Merge commit '9d5c62ba5b586c80af508b5914934b1c439f6652' * commit '9d5c62ba5b586c80af508b5914934b1c439f6652': lavu/opt: do not filter out the initial sign character except for flags eval: treat dB as decibels instead of decibytes float_dsp: add vector_dmul_scalar() to multiply a vector of doubles Conflicts: libavutil/eval.c tests/ref/fate/eval Merged-by: Michael Niedermayer <michaelni@gmx.at>	2012-12-06 14:33:38 +01:00
Justin Ruggles	ecc8b02194	x86: float_dsp: fix compilation of ff_vector_dmul_scalar_avx() on x86-32 Signed-off-by: Janne Grunau <janne-libav@jannau.net>	2012-12-06 14:11:15 +01:00
Justin Ruggles	ac7eb4cb20	float_dsp: add vector_dmul_scalar() to multiply a vector of doubles Include x86-optimized versions for SSE2 and AVX.	2012-12-05 11:23:36 -05:00
Michael Niedermayer	b4d4e51027	Merge commit '3c370f5abc55739a261534b9f9bdc739cedbbbb9' * commit '3c370f5abc55739a261534b9f9bdc739cedbbbb9': riff: only warn on a bad INFO chunk code size instead of failing configure: Add separate list for libraries and use where appropriate x86: float_dsp: add SSE version of vector_fmul_scalar() Conflicts: configure libavformat/riff.c libavutil/x86/float_dsp.asm Merged-by: Michael Niedermayer <michaelni@gmx.at>	2012-11-27 14:10:05 +01:00
Justin Ruggles	947f933687	x86: float_dsp: add SSE version of vector_fmul_scalar()	2012-11-26 11:30:19 -05:00
Diego Biurrun	2b479bcab0	build: Drop AVX assembly ifdefs An assembler able to cope with AVX instructions is now required.	2012-11-11 20:43:28 +01:00
Michael Niedermayer	3174616f59	Merge commit '6860b4081d046558c44b1b42f22022ea341a2a73' * commit '6860b4081d046558c44b1b42f22022ea341a2a73': x86: include x86inc.asm in x86util.asm cng: Reindent some incorrectly indented lines cngdec: Allow flushing the decoder cngdec: Make the dbov variable have the right unit cngdec: Fix the memset size to cover the full array cngdec: Update the LPC coefficients after averaging the reflection coefficients configure: fix print_config() with broke awks Conflicts: libavcodec/x86/ac3dsp.asm libavcodec/x86/dct32.asm libavcodec/x86/deinterlace.asm libavcodec/x86/dsputil.asm libavcodec/x86/dsputilenc.asm libavcodec/x86/fft.asm libavcodec/x86/fmtconvert.asm libavcodec/x86/h264_chromamc.asm libavcodec/x86/h264_deblock.asm libavcodec/x86/h264_deblock_10bit.asm libavcodec/x86/h264_idct.asm libavcodec/x86/h264_idct_10bit.asm libavcodec/x86/h264_intrapred.asm libavcodec/x86/h264_intrapred_10bit.asm libavcodec/x86/h264_weight.asm libavcodec/x86/vc1dsp.asm libavcodec/x86/vp3dsp.asm libavcodec/x86/vp56dsp.asm libavcodec/x86/vp8dsp.asm Merged-by: Michael Niedermayer <michaelni@gmx.at>	2012-10-31 13:43:33 +01:00
Diego Biurrun	6860b4081d	x86: include x86inc.asm in x86util.asm This is necessary to allow refactoring some x86util macros with cpuflags.	2012-10-31 00:37:42 +01:00
Michael Niedermayer	7beadfe1f7	Merge remote-tracking branch 'qatar/master' * qatar/master: mov_chan: Only set the channel_layout if setting it to a nonzero value mov_chan: Reindent an incorrectly indented line mp2 muxer: mark as AVFMT_NOTIMESTAMPS. x86: float_dsp: fix ff_vector_fmac_scalar_avx() on Win64 x86: more specific checks for availability of required assembly capabilities x86: avcodec: Drop silly "_mmx" suffix from dsputil template names fate: Drop redundant setting of FUZZ to 1 cavsdsp: set idct permutation independently of dsputil x86: allow using add_hfyu_median_prediction_cmov on any cpu with cmov Conflicts: libavcodec/x86/dsputil_mmx.c libavformat/mp3enc.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2012-09-08 12:53:44 +02:00
Justin Ruggles	7327525997	x86: float_dsp: fix ff_vector_fmac_scalar_avx() on Win64 The SWAP macro does not work for explicit xmm/ymm usage, so instead just move the scalar value from xmm2 to xmm0.	2012-09-07 14:49:10 -04:00
Michael Niedermayer	c617bed34f	Merge remote-tracking branch 'qatar/master' * qatar/master: MSS1 and MSS2: set final pixel format after common stuff has been initialised MSS2 decoder configure: handle --disable-asm before check_deps x86: Split inline and external assembly #ifdefs configure: x86: Separate inline from standalone assembler capabilities pktdumper: Use a custom define instead of PATH_MAX for buffers pktdumper: Use av_strlcpy instead of strncpy pktdumper: Use sizeof(variable) instead of the direct buffer length Conflicts: Changelog configure libavcodec/allcodecs.c libavcodec/avcodec.h libavcodec/codec_desc.c libavcodec/dct-test.c libavcodec/imgconvert.c libavcodec/mss12.c libavcodec/version.h libavfilter/x86/gradfun.c libswscale/x86/yuv2rgb.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2012-08-31 13:34:32 +02:00
Diego Biurrun	17337f54c0	x86: Split inline and external assembly #ifdefs	2012-08-31 01:53:25 +02:00
Michael Niedermayer	2fc7c818cb	Merge remote-tracking branch 'qatar/master' * qatar/master: x86: fix build with nasm 2.08 x86: use nop cpu directives only if supported x86: fix rNmp macros with nasm build: add trailing / to yasm/nasm -I flags x86: use 32-bit source registers with movd instruction x86: add colons after labels Conflicts: Makefile libavutil/x86/x86inc.asm Merged-by: Michael Niedermayer <michaelni@gmx.at>	2012-08-07 23:04:55 +02:00
Mans Rullgard	a3df4781f4	x86: add colons after labels nasm prints a warning if the colon is missing. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-07 15:20:56 +01:00
Michael Niedermayer	c6963a220d	Merge remote-tracking branch 'qatar/master' * qatar/master: proresdsp: port x86 assembly to cpuflags. lavr: x86: improve non-SSE4 version of S16_TO_S32_SX macro lavfi: better channel layout negotiation alac: check for truncated packets alac: reverse lpc coeff order, simplify filter lavr: add x86-optimized mixing functions x86: add support for fmaddps fma4 instruction with abstraction to avx/sse tscc2: fix typo in array index build: use COMPILE template for HOSTOBJS build: do full flag handling for all compiler-type tools eval: fix printing of NaN in eval fate test. build: Rename aandct component to more descriptive aandcttables mpegaudio: bury inline asm under HAVE_INLINE_ASM. x86inc: automatically insert vzeroupper for YMM functions. rtmp: Check the buffer length of ping packets rtmp: Allow having more unknown data at the end of a chunk size packet without failing rtmp: Prevent reading outside of an allocate buffer when receiving server bandwidth packets Conflicts: Makefile configure libavcodec/x86/proresdsp.asm libavutil/eval.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2012-07-27 23:42:19 +02:00
Ronald S. Bultje	30b45d9c38	x86inc: automatically insert vzeroupper for YMM functions.	2012-07-26 13:43:16 -07:00
Michael Niedermayer	cabbd271a5	Merge remote-tracking branch 'qatar/master' * qatar/master: (24 commits) flvdec: remove incomplete, disabled seeking code mem: add support for _aligned_malloc() as found on Windows lavc: Extend the documentation for avcodec_init_packet flvdec: remove incomplete, disabled seeking code http: replace atoll() with strtoll() mpegts: remove unused/incomplete/broken seeking code af_amix: allow float planar sample format as input af_amix: use AVFloatDSPContext.vector_fmac_scalar() float_dsp: add x86-optimized functions for vector_fmac_scalar() float_dsp: Move vector_fmac_scalar() from libavcodec to libavutil lavr: Add x86-optimized function for flt to s32 conversion lavr: Add x86-optimized function for flt to s16 conversion lavr: Add x86-optimized functions for s32 to flt conversion lavr: Add x86-optimized functions for s32 to s16 conversion lavr: Add x86-optimized functions for s16 to flt conversion lavr: Add x86-optimized function for s16 to s32 conversion rtpenc: Support packetizing iLBC rtpdec: Add a depacketizer for iLBC Implement the iLBC storage file format mov: Support muxing/demuxing iLBC ... Conflicts: Changelog configure libavcodec/avcodec.h libavcodec/dsputil.c libavcodec/version.h libavformat/movenc.c libavformat/mpegts.c libavformat/version.h libavutil/mem.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2012-06-19 20:53:27 +02:00
Justin Ruggles	82b2df9790	float_dsp: add x86-optimized functions for vector_fmac_scalar()	2012-06-18 18:01:14 -04:00
Michael Niedermayer	f0313e9022	x86/float_dsp.asm: restore author attribution The attribution was removed by libav while moving the code to libavutil The original code is from commit `eb4825b5d4` Author: Loren Merritt <lorenm@u.washington.edu> Date: Thu Aug 10 19:06:25 2006 +0000 sse and 3dnow implementations of float->int conversion and mdct windowing. 15% faster vorbis. and commit `069720565c` Author: Loren Merritt <lorenm@u.washington.edu> Date: Fri Aug 11 18:19:37 2006 +0000 vorbis simd tweaks Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2012-06-09 16:09:11 +02:00
Justin Ruggles	d5a7229ba4	Add a float DSP framework to libavutil Move vector_fmul() from DSPContext to AVFloatDSPContext.	2012-06-08 13:14:38 -04:00

38 Commits
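Most of the commits above add or tune x86 versions of `AVFloatDSPContext` members: `vector_fmul`, `vector_fmac_scalar`, `vector_fmul_scalar`, `vector_dmul_scalar`, `vector_fmul_add`, `vector_fmul_reverse`, `butterflies_float` and `scalarproduct_float`. As a reading aid, the following is a minimal plain-C sketch of what each operation computes per element. It is based on our reading of the doc comments in `libavutil/float_dsp.h`, not copied from FFmpeg, and the `ref_*` and `unrolled_*` function names are hypothetical. The last function restructures `vector_fmac_scalar` around the batch-of-16 guarantee cited in commit 133b34207c, which is the property that lets the SSE code process 16 elements per loop iteration.

```c
#include <assert.h>

/* vector_fmul: dst[i] = src0[i] * src1[i] */
static void ref_vector_fmul(float *dst, const float *src0, const float *src1, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[i];
}

/* vector_fmac_scalar: dst[i] += src[i] * mul (multiply-accumulate) */
static void ref_vector_fmac_scalar(float *dst, const float *src, float mul, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] += src[i] * mul;
}

/* vector_fmul_scalar: dst[i] = src[i] * mul */
static void ref_vector_fmul_scalar(float *dst, const float *src, float mul, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src[i] * mul;
}

/* vector_dmul_scalar: same as above, but on doubles */
static void ref_vector_dmul_scalar(double *dst, const double *src, double mul, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src[i] * mul;
}

/* vector_fmul_add: dst[i] = src0[i] * src1[i] + src2[i] */
static void ref_vector_fmul_add(float *dst, const float *src0, const float *src1,
                                const float *src2, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[i] + src2[i];
}

/* vector_fmul_reverse: src1 is read back to front */
static void ref_vector_fmul_reverse(float *dst, const float *src0, const float *src1, int len)
{
    for (int i = 0; i < len; i++)
        dst[i] = src0[i] * src1[len - 1 - i];
}

/* butterflies_float: v1 receives the sum, v2 the difference (old v1 - old v2) */
static void ref_butterflies_float(float *v1, float *v2, int len)
{
    for (int i = 0; i < len; i++) {
        float t = v1[i] - v2[i];
        v1[i] += v2[i];
        v2[i] = t;
    }
}

/* scalarproduct_float: dot product of two float vectors */
static float ref_scalarproduct_float(const float *v1, const float *v2, int len)
{
    float sum = 0.0f;
    for (int i = 0; i < len; i++)
        sum += v1[i] * v2[i];
    return sum;
}

/* vector_fmac_scalar restructured around 16-element batches. Each
 * iteration handles four independent groups of 4 floats, roughly one
 * group per SSE register in an unrolled asm loop. Same per-element
 * result as ref_vector_fmac_scalar. */
static void unrolled_vector_fmac_scalar(float *dst, const float *src, float mul, int len)
{
    assert(len % 16 == 0); /* caller-side guarantee assumed here */
    for (int i = 0; i < len; i += 16) {
        for (int j = 0; j < 4; j++) dst[i +  0 + j] += src[i +  0 + j] * mul;
        for (int j = 0; j < 4; j++) dst[i +  4 + j] += src[i +  4 + j] * mul;
        for (int j = 0; j < 4; j++) dst[i +  8 + j] += src[i +  8 + j] * mul;
        for (int j = 0; j < 4; j++) dst[i + 12 + j] += src[i + 12 + j] * mul;
    }
}
```

The asm versions may additionally assume buffer alignment and length multiples that this sketch does not check, and a SIMD `scalarproduct_float` sums in a different order than the serial loop, so its result can differ from this reference in the last bits.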