FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-03 05:10:03 +02:00

Author	SHA1	Message	Date
Hendrik Leppkes	7b865c222e	Merge commit '5d14cf199990cd378904a2618b5c72c4b02290f6' * commit '5d14cf199990cd378904a2618b5c72c4b02290f6': mpegvideo: Make sure mpegutils.h is included where needed Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>	2015-09-16 11:23:40 +02:00
Vittorio Giovara	5d14cf1999	mpegvideo: Make sure mpegutils.h is included where needed	2015-09-13 17:34:45 +02:00
James Almer	d5f8a642f6	x86: port PSIGNW to cpuflags Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2015-09-11 23:27:03 -03:00
Ronald S. Bultje	4b66274a86	vp9: save one (PSIGNW) instruction in iadst16_1d sse2/ssse3.	2015-09-11 20:36:51 -04:00
Ronald S. Bultje	fd8b90f5f6	vp9: fix overflow in 8x8 topleft 32x32 idct ssse3 version. Also disable the mmx/iwht optimization when the bitexact flag is set. With synthetically coded coefficients (i.e. these that lead to a residual well outside the [-255,255] range), our optimizations will overflow. It doesn't make sense to fix the overflows, since they can only occur on synthetic input, not on real fwht-generated input. Thus, add a bitexact flag that disables this optimization.	2015-09-10 07:51:16 -04:00
Hendrik Leppkes	5d8e836d0e	Replace all remaining occurances of step/depth_minus1 and offset_plus1	2015-09-08 17:10:48 +02:00
Ronald S. Bultje	f12093fffd	vp9: fix integer overflows in sse2 version of iadst4.	2015-09-06 15:07:19 -04:00
Michael Niedermayer	8d860f9a77	avcodec/x86/w64xmmtest: Fix another build failure Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2015-09-05 22:15:53 +02:00
Ronald S. Bultje	086c9b78d4	vp9: fix rounding error in idct_8x8_ssse3.	2015-09-05 15:50:02 -04:00
Hendrik Leppkes	41194f065c	Merge commit 'cad40a3833ad81a352e7657ec6f7d637cea3b798' * commit 'cad40a3833ad81a352e7657ec6f7d637cea3b798': lavc: Drop deprecated deinterlace module Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>	2015-09-05 17:06:14 +02:00
Vittorio Giovara	cad40a3833	lavc: Drop deprecated deinterlace module Deprecated in 03/2013.	2015-08-28 16:04:19 +02:00
Ganesh Ajjanagadde	6638e4a950	avcodec/x86/mpegaudiodsp: correct asm guards Fixes -Wunused-function warnings when compiling with --disable-yasm on x86. Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2015-08-23 02:39:21 +02:00
Ganesh Ajjanagadde	907373ea9d	avcodec/x86/v210-init: fix unused variable warning Fixes a -Wunused-variable while compiling with --disable-yasm on x86 Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2015-08-21 17:06:27 +02:00
Ronald S. Bultje	e3b7298aed	lavc: fix compilation with FF_API_XVMC.	2015-08-18 12:05:57 -04:00
Henrik Gramner	ab43beefab	x86inc: Drop SECTION_TEXT macro The .text section is already 16-byte aligned by default on all supported platforms so `SECTION_TEXT` isn't any different from `SECTION .text`. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2015-08-11 11:12:01 +02:00
Henrik Gramner	9f1245eb96	x86inc: Support arbitrary stack alignments Change ALLOC_STACK to always align the stack before allocating stack space for consistency. Previously alignment would occur either before or after allocating stack space depending on whether manual alignment was required or not. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2015-08-11 11:04:11 +02:00
Henrik Gramner	4a53c758d2	x86: dcadsp: Avoid SSE2 instructions in SSE functions Signed-off-by: Anton Khirnov <anton@khirnov.net>	2015-08-11 09:22:46 +02:00
James Almer	9c0407e856	x86/sbrdsp: remove an unnecessary mova in sbr_autocorrelate Signed-off-by: James Almer <jamrial@gmail.com>	2015-08-06 23:42:19 -03:00
Henrik Gramner	f0b7882ceb	x86inc: Drop SECTION_TEXT macro The .text section is already 16-byte aligned by default on all supported platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.	2015-08-04 20:13:09 +02:00
Henrik Gramner	826790f596	x86inc: Support arbitrary stack alignments Change ALLOC_STACK to always align the stack before allocating stack space for consistency. Previously alignment would occur either before or after allocating stack space depending on whether manual alignment was required or not.	2015-08-04 20:13:09 +02:00
James Almer	5750d6c5e9	x86: move XOP emulation code back to x86inc Only two functions that use xop multiply-accumulate instructions where the first operand is the same as the fourth actually took advantage of the macros. This further reduces differences with x264's x86inc. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2015-08-03 17:11:13 -03:00
Hendrik Leppkes	1ce298dac5	Merge commit 'ebaf571aca2dd6ce3caeeeec4210a3fccd47e7db' * commit 'ebaf571aca2dd6ce3caeeeec4210a3fccd47e7db': x86: dct: Disable dct32_float_sse on x86-64 Conflicts: libavcodec/x86/dct32.asm libavcodec/x86/dct_init.c Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>	2015-08-02 12:31:39 +02:00
Henrik Gramner	ebaf571aca	x86: dct: Disable dct32_float_sse on x86-64 There is an SSE2 implementation so the SSE version is never used. The "SSE" version also happens to contain SSE2 instructions on x86-64. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2015-08-02 08:41:45 +02:00
James Almer	9dcaae70f2	x86/aacpsdsp: add SSE and SSE3 optimized functions Between 1.5 and 2.5 times faster Reviewed-by: Michael Niedermayer <michael@niedermayer.cc> Signed-off-by: James Almer <jamrial@gmail.com>	2015-07-30 19:01:15 -03:00
Michael Niedermayer	29d147c94d	Merge commit '059a934806d61f7af9ab3fd9f74994b838ea5eba' * commit '059a934806d61f7af9ab3fd9f74994b838ea5eba': lavc: Consistently prefix input buffer defines Conflicts: doc/examples/decoding_encoding.c libavcodec/4xm.c libavcodec/aac_adtstoasc_bsf.c libavcodec/aacdec.c libavcodec/aacenc.c libavcodec/ac3dec.h libavcodec/asvenc.c libavcodec/avcodec.h libavcodec/avpacket.c libavcodec/dvdec.c libavcodec/ffv1enc.c libavcodec/g2meet.c libavcodec/gif.c libavcodec/h264.c libavcodec/h264_mp4toannexb_bsf.c libavcodec/huffyuvdec.c libavcodec/huffyuvenc.c libavcodec/jpeglsenc.c libavcodec/libxvid.c libavcodec/mdec.c libavcodec/motionpixels.c libavcodec/mpeg4videodec.c libavcodec/mpegvideo.c libavcodec/noise_bsf.c libavcodec/nuv.c libavcodec/nvenc.c libavcodec/options.c libavcodec/parser.c libavcodec/pngenc.c libavcodec/proresenc_kostya.c libavcodec/qsvdec.c libavcodec/svq1enc.c libavcodec/tiffenc.c libavcodec/truemotion2.c libavcodec/utils.c libavcodec/utvideoenc.c libavcodec/vc1dec.c libavcodec/wmalosslessdec.c libavformat/adxdec.c libavformat/aiffdec.c libavformat/apc.c libavformat/apetag.c libavformat/avidec.c libavformat/bink.c libavformat/cafdec.c libavformat/flvdec.c libavformat/id3v2.c libavformat/isom.c libavformat/matroskadec.c libavformat/mov.c libavformat/mpc.c libavformat/mpc8.c libavformat/mpegts.c libavformat/mvi.c libavformat/mxfdec.c libavformat/mxg.c libavformat/nutdec.c libavformat/oggdec.c libavformat/oggparsecelt.c libavformat/oggparseflac.c libavformat/oggparseopus.c libavformat/oggparsespeex.c libavformat/omadec.c libavformat/rawdec.c libavformat/riffdec.c libavformat/rl2.c libavformat/rmdec.c libavformat/rtpdec_latm.c libavformat/rtpdec_mpeg4.c libavformat/rtpdec_qdm2.c libavformat/rtpdec_svq3.c libavformat/sierravmd.c libavformat/smacker.c libavformat/smush.c libavformat/spdifenc.c libavformat/takdec.c libavformat/tta.c libavformat/utils.c libavformat/vqf.c libavformat/westwood_vqa.c libavformat/xmv.c libavformat/xwma.c libavformat/yop.c Merged-by: Michael Niedermayer <michael@niedermayer.cc>	2015-07-27 23:15:19 +02:00
Michael Niedermayer	94d68a41fa	Merge commit '7c6eb0a1b7bf1aac7f033a7ec6d8cacc3b5c2615' * commit '7c6eb0a1b7bf1aac7f033a7ec6d8cacc3b5c2615': lavc: AV-prefix all codec flags Conflicts: doc/examples/muxing.c ffmpeg.c ffmpeg_opt.c ffplay.c libavcodec/aacdec.c libavcodec/aacenc.c libavcodec/ac3dec.c libavcodec/ac3enc_float.c libavcodec/atrac1.c libavcodec/atrac3.c libavcodec/atrac3plusdec.c libavcodec/dcadec.c libavcodec/ffv1enc.c libavcodec/h264.c libavcodec/h264_loopfilter.c libavcodec/h264_mb.c libavcodec/imc.c libavcodec/libmp3lame.c libavcodec/libtheoraenc.c libavcodec/libtwolame.c libavcodec/libvpxenc.c libavcodec/libxavs.c libavcodec/libxvid.c libavcodec/mpeg12dec.c libavcodec/mpeg12enc.c libavcodec/mpegaudiodec_template.c libavcodec/mpegvideo.c libavcodec/mpegvideo_enc.c libavcodec/mpegvideo_motion.c libavcodec/nellymoserdec.c libavcodec/nellymoserenc.c libavcodec/nvenc.c libavcodec/on2avc.c libavcodec/options_table.h libavcodec/opus_celt.c libavcodec/pngenc.c libavcodec/ra288.c libavcodec/ratecontrol.c libavcodec/twinvq.c libavcodec/vc1_block.c libavcodec/vc1_loopfilter.c libavcodec/vc1_mc.c libavcodec/vc1dec.c libavcodec/vorbisdec.c libavcodec/vp3.c libavcodec/wma.c libavcodec/wmaprodec.c libavcodec/x86/hpeldsp_init.c libavcodec/x86/me_cmp_init.c Merged-by: Michael Niedermayer <michael@niedermayer.cc>	2015-07-27 22:10:35 +02:00
Vittorio Giovara	7c6eb0a1b7	lavc: AV-prefix all codec flags Convert doxygen to multiline and express bitfields more simply. Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2015-07-27 15:24:58 +01:00
James Almer	844bef578e	avcodec/x86: add missing colon to labels Silences warnings with Nasm Signed-off-by: James Almer <jamrial@gmail.com>	2015-07-26 02:50:14 -03:00
Michael Niedermayer	52b6d96268	Merge commit 'a344e5d094ebcf9a23acf3a27c56cbbbc829db42' * commit 'a344e5d094ebcf9a23acf3a27c56cbbbc829db42': x86: bswapdsp: Don't treat 32-bit integers as 64-bit Conflicts: libavcodec/x86/bswapdsp.asm Merged-by: Michael Niedermayer <michael@niedermayer.cc>	2015-07-17 23:20:14 +02:00
Michael Niedermayer	115a9b5091	Merge commit 'd42191c78befc1983f23b1899b2dda513b72f1ed' * commit 'd42191c78befc1983f23b1899b2dda513b72f1ed': configure: Factor out vp8dsp module Conflicts: configure libavcodec/Makefile libavcodec/x86/Makefile Merged-by: Michael Niedermayer <michael@niedermayer.cc>	2015-07-17 22:45:34 +02:00
Michael Niedermayer	fd29dd432c	Merge commit '5cb4bdb2a03c3643f8f1e7d21d7094e61e0a4418' * commit '5cb4bdb2a03c3643f8f1e7d21d7094e61e0a4418': configure: Factor out rv34dsp module Conflicts: libavcodec/Makefile libavcodec/x86/Makefile Merged-by: Michael Niedermayer <michael@niedermayer.cc>	2015-07-17 22:21:36 +02:00
Henrik Gramner	a344e5d094	x86: bswapdsp: Don't treat 32-bit integers as 64-bit The upper halves are not guaranteed to be zero in x86-64. Also use `test` instead of `and` when the result isn't used for anything other than as a branch condition, this allows some register moves to be eliminated. Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2015-07-17 20:02:28 +02:00
Vittorio Giovara	d42191c78b	configure: Factor out vp8dsp module	2015-07-17 18:46:24 +01:00
Vittorio Giovara	5cb4bdb2a0	configure: Factor out rv34dsp module	2015-07-17 18:46:24 +01:00
Michael Niedermayer	b8c438e762	videodsp: assert that linesize is larger than width Suggested-by: Andreas Cadhalpun <andreas.cadhalpun@googlemail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-07-08 01:32:04 +02:00
Andreas Cadhalpun	28efeb6502	doc: avoid incorrect phrase 'allows to' Also fix typo found by Lou Logan: Sacrifying -> Sacrificing Reviewed-by: Lou Logan <lou@lrcd.com> Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>	2015-06-16 21:48:51 +02:00
James Almer	9f815bc2c2	avcodec/jpeg200dsp: add ff_rct_int_{sse2,avx2} Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2015-06-13 16:53:31 -03:00
James Almer	7912a6830d	avcodec/jpeg200dsp: add ff_ict_float_{sse,avx} Original intrinsics version by Nicolas Bertrand. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2015-06-13 16:53:27 -03:00
Michael Niedermayer	63b0356274	Merge commit 'b7a4127a45b780d76e6b09427a3d0197c4bc1cdb' * commit 'b7a4127a45b780d76e6b09427a3d0197c4bc1cdb': h264_qpel: Use the correct header Merged-by: Michael Niedermayer <michaelni@gmx.at>	2015-06-12 21:55:40 +02:00
Michael Niedermayer	b68b5ec513	Merge commit '5e87080f2c73186066df0b9c43877b4af0beef3a' * commit '5e87080f2c73186066df0b9c43877b4af0beef3a': h264_weight: Fix SSSE3 biweight code with weights of 128 Conflicts: libavcodec/x86/h264_weight.asm See: `e100966575` See: `fb2288834b` Merged-by: Michael Niedermayer <michaelni@gmx.at>	2015-06-12 21:47:01 +02:00
Vittorio Giovara	b7a4127a45	h264_qpel: Use the correct header	2015-06-12 17:02:48 +01:00
Michael Niedermayer	5e87080f2c	h264_weight: Fix SSSE3 biweight code with weights of 128 CC: libav-stable@libav.org Sample-Id: test_bref.mp4 Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2015-06-12 17:02:48 +01:00
Michael Niedermayer	e100966575	avcodec/x86/h264_weight: handle weight1=128 Fix ticket4596 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-06-09 05:11:09 +02:00
James Almer	c16e99e3b3	x86: check for AV_CPU_FLAG_AVXSLOW where useful Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-06-01 00:15:35 +02:00
James Almer	d68c05380c	x86: check for AV_CPU_FLAG_AVXSLOW where useful Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2015-05-31 12:07:11 +02:00
Michael Niedermayer	b666e81c13	Merge commit 'e4610300de6869bd6b3b00e76cfeabb6d7653dcd' * commit 'e4610300de6869bd6b3b00e76cfeabb6d7653dcd': x86: cavs: Remove an unneeded scratch buffer Conflicts: libavcodec/x86/cavsdsp.c See: `d79f7bf0d6` Merged-by: Michael Niedermayer <michaelni@gmx.at>	2015-05-28 22:12:41 +02:00
Michael Niedermayer	e4610300de	x86: cavs: Remove an unneeded scratch buffer Simplifies the code and makes it build on certain compilers running out of registers on x86. CC: libav-stable@libav.org Reported-By: mudler	2015-05-28 18:40:40 +02:00
Timothy Gu	2b388e6dde	Revert "Move struc FFTContext below SECTION_RODATA" This reverts commit `599888a480`. The commit does not silence the warning on ELF-based systems, and will be fixed in the subsequent commit. Conflicts: libavcodec/x86/fft_mmx.asm Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-05-28 00:08:32 +02:00
Michael Niedermayer	d9b264bc73	Merge commit '848e86f74d3e6e87fa592ee8ba8c184cc5fd9a42' * commit '848e86f74d3e6e87fa592ee8ba8c184cc5fd9a42': mpegvideo: Drop flags and flags2 Conflicts: libavcodec/mpeg12dec.c libavcodec/mpeg12enc.c libavcodec/mpegvideo.c libavcodec/mpegvideo_enc.c libavcodec/mpegvideo_motion.c libavcodec/ratecontrol.c libavcodec/vc1_block.c libavcodec/vc1_loopfilter.c libavcodec/vc1_mc.c libavcodec/vc1dec.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2015-05-22 20:24:41 +02:00
Vittorio Giovara	848e86f74d	mpegvideo: Drop flags and flags2 They are just duplicates of AVCodecContext members so use those instead.	2015-05-22 15:34:39 +01:00
Michael Niedermayer	451be676f3	Merge remote-tracking branch 'rbultje/vp9-bugfixes' * rbultje/vp9-bugfixes: vp9: match another find_ref_mvs() bug in libvpx. vp9: fix scaled motion vector clipping for sub8x8 blocks. vp9: improve signbias check. vp9: don't allow compound references if error_resilience is enabled. vp9: clamp segmented lflvl before applying ref/mode deltas. vp9: reset loopfilter mode/ref deltas on keyframe. vp9: fix crash when playing back 440/440 content with width%64<56. vp9: extend loopfilter workaround for vp9 h/v mix-up to work for 422. vp9: clip motion vectors in the same way as libvpx does. vp9: set skip flag if the block had no coded coefficients. vp9: apply mv scaling workaround only when subsampling is enabled. vp9: read all 4x4 blocks in sub8x8 blocks individually with scalability. vp9: fix segmentation map referencing upon framesize change. vp9: disable more pmulhrsw optimizations in idct16/32. vp9: disable all pmulhrsw in 8/16 iadst x86 optimizations. Merged-by: Michael Niedermayer <michaelni@gmx.at>	2015-05-18 02:35:16 +02:00
Carl Eugen Hoyos	e609cfd697	lavc/flac: Fix encoding and decoding with high lpc. Based on an analysis by trac user lvqcl. Fixes ticket #4421, reported by Chase Walker.	2015-05-17 02:08:58 +02:00
Ronald S. Bultje	d32d0593f1	vp9: disable more pmulhrsw optimizations in idct16/32. For idct16, only when called from a adst16x16 variant, so impact is minor. For idct32, for all, so relatively major impact.	2015-05-14 14:15:27 -04:00
Ronald S. Bultje	96d30c3495	vp9: disable all pmulhrsw in 8/16 iadst x86 optimizations. They all overflow in various samples that are considered valid input.	2015-05-14 13:39:37 -04:00
Michael Niedermayer	cc77bb09e4	avcodec/x86/vp9dsp_init: Fix mix of declaration and statement Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-05-07 14:33:10 +02:00
Ronald S. Bultje	b224b165cb	vp9: add keyframe profile 2/3 support.	2015-05-06 15:10:41 -04:00
Michael Niedermayer	6ef3426d90	avcodec/x86/deinterlace: use INIT_MMX like other asm code does too	2015-05-05 02:41:15 +02:00
Michael Niedermayer	dfc0708e23	avcodec/x86/dct-test: Use uint8_t for idct_simple_mmx_perm The table contains no element outside the unsigned 8bit range Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-05-02 13:43:15 +02:00
Michael Niedermayer	270e647adc	avcodec/x86/dct-test: Make static table const Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-05-02 13:42:46 +02:00
Ronald S. Bultje	3de13d5212	vp9: remove another optimization branch in iadst16 which causes overflows. See sample vp90-2-14-resize-fp-tiles-16-8.webm from the vp9 test vector set to reproduce the issue.	2015-04-24 16:54:31 +02:00
Ronald S. Bultje	d02d04a18f	vp9: remove one optimization branch in iadst16 which causes overflows. See sample vp90-2-14-resize-fp-tiles-16-8-4-2-1.webm from the vp9 test vector set which reproduces the issue. This probably costs a few cycles, but I don't think there's an easy way to workaround that. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-04-22 21:37:10 +02:00
Michael Niedermayer	0245abc7c1	avcodec/x86/hpeldsp_init: Put CONFIG_* first in if() This is more consistent and may fix a build failure Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-03-26 15:41:27 +01:00
James Almer	6b940b8c99	x86/xvididct: add some yasm guards Should fix compilation on compilers with less-than-ideal dead code elimination Signed-off-by: James Almer <jamrial@gmail.com>	2015-03-20 02:38:20 -03:00
James Almer	b0fea4ad7e	x86/xvididct: remove obsolete function prototypes Signed-off-by: James Almer <jamrial@gmail.com>	2015-03-20 02:38:14 -03:00
Michael Niedermayer	1eb28479da	Merge commit '48aef27f5232794e70ecef0d347b9f65e27a9bad' * commit '48aef27f5232794e70ecef0d347b9f65e27a9bad': x86: Put COPY3_IF_LT under HAVE_6REGS Conflicts: libavcodec/x86/mathops.h See: `b38910c979` Merged-by: Michael Niedermayer <michaelni@gmx.at>	2015-03-17 20:25:47 +01:00
Luca Barbato	48aef27f52	x86: Put COPY3_IF_LT under HAVE_6REGS It uses 6 registers, unbreaks building on hardened x86 system. Bug-Id: gentoo/541930 CC: libav-stable@libav.org	2015-03-17 12:31:04 +01:00
Michael Niedermayer	d79f7bf0d6	avcodec/x86/cavsdsp: remove incorrect LOCAL_ALIGN tmp This is faster and simpler as well Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-03-16 14:51:51 +01:00
James Almer	e8374d7202	x86/proresdsp: remove ff_prores_idct_put_10_sse4 It's exactly the same as the sse2 version. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2015-03-16 01:52:44 -03:00
James Almer	bdd179c8cb	x86/proresdsp: remove unused macro Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2015-03-16 01:49:34 -03:00
Christophe Gisquet	238db7cc56	x86: lavc: use LOCAL_ALIGNED instead of DECLARE_ALIGNED The later may yield incorrect code for on-stack variables. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-03-14 20:06:47 +01:00
Christophe Gisquet	15ce160183	x86: xvid_idct: SSE2 merged add version Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-03-14 13:36:47 +01:00
Christophe Gisquet	decd5193e1	x86: xvid_idct: merged idct_put SSE2 versions Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-03-14 13:36:29 +01:00
Christophe Gisquet	8200575d84	x86: dct-test: evaluate prores idct avx version Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-03-14 13:23:27 +01:00
Christophe Gisquet	4eb4451be1	x86: dct-test: fix compilation for prores When the decoder is deactivated, the x86-optimized versions are not compiled, resulting in a link error. The C version is unaffected, as it is part of the idctdsp subsystem. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-03-14 13:23:06 +01:00
Christophe Gisquet	c3bf52713a	x86: xvid_idct: port MMX iDCT to yasm Also reduce the table duplication with SSE2 code, remove duplicated macro parameters. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-03-14 11:45:11 +01:00
Christophe Gisquet	2999bd7da2	x86: xvid_idct: port SSE2 iDCT to yasm The main difference consists in renaming properly labels, and letting yasm select the gprs for skipping 1D transforms. Previous-version-reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-03-13 01:04:52 +01:00
James Almer	5c8f747085	x86/hevc_sao: use unaligned movs for sao_{band,filter} with width 8 Suggested-by: Christophe Gisquet <christophe.gisquet@gmail.com> Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2015-03-01 20:02:43 -03:00
Michael Niedermayer	7fce8c752d	Merge commit '71f1ad37d858b810b71a4af1c25771beaa50b27b' * commit '71f1ad37d858b810b71a4af1c25771beaa50b27b': lavc: do not compile fmtconvert unconditionally Conflicts: configure libavcodec/ppc/Makefile libavcodec/x86/Makefile Merged-by: Michael Niedermayer <michaelni@gmx.at>	2015-03-01 00:06:42 +01:00
Michael Niedermayer	5c17377e28	Merge commit 'd74a8cb7e42f703be5796eeb485f06af710ae8ca' * commit 'd74a8cb7e42f703be5796eeb485f06af710ae8ca': fmtconvert: drop unused functions Conflicts: libavcodec/arm/fmtconvert_vfp_armv6.S libavcodec/x86/fmtconvert.asm libavcodec/x86/fmtconvert_init.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-28 23:58:29 +01:00
Anton Khirnov	71f1ad37d8	lavc: do not compile fmtconvert unconditionally Only ac3dec and dcadec use it.	2015-02-28 21:51:24 +01:00
Anton Khirnov	d74a8cb7e4	fmtconvert: drop unused functions	2015-02-28 21:51:24 +01:00
Michael Niedermayer	23a90768a8	avcodec/v210dec: Add ff prefix to v210_x86_init() Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-27 19:08:09 +01:00
Michael Niedermayer	0e699676f9	avcodec/snow: mark dwt init as av_cold Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-27 16:53:37 +01:00
Carl Eugen Hoyos	36a6fb989b	hevc_deblock: Fix compilation with nasm CC: libav-stable@libav.org Bug-Id: 795 Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2015-02-22 22:34:20 +00:00
Michael Niedermayer	03f39fbb2a	avcodec/x86/mlpdsp_init: Simplify mlp_filter_channel_x86() Based on patch by Francisco Blas Izquierdo Riera Commit message partly taken from carl fixes a compilation error in mlpdsp_init.c with -fstack-check and some gcc compilers (I reproduced the issue with gcc 4.7.3) by simplifying the code. See also https://bugs.gentoo.org/show_bug.cgi?id=471756 $ make libavcodec/x86/mlpdsp_init.o libavcodec/x86/mlpdsp_init.c: In function ‘mlp_filter_channel_x86’: libavcodec/x86/mlpdsp_init.c:142:5: error: can’t find a register in class ‘GENERAL_REGS’ while reloading ‘asm’ libavcodec/x86/mlpdsp_init.c:142:5: error: ‘asm’ operand has impossible constraints 4551 -> 4509 dezicycles Reviewed-by: Ramiro Polla <ramiro.polla@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-21 16:05:41 +01:00
Christophe Gisquet	398f531915	x86: hevc_mc: fewer xmm regs used in epel h/v 11 xmm regs seem only required for avx2. Reviewed-by: Mickaël Raulet <mraulet@insa-rennes.fr Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-17 15:19:19 +01:00
Christophe Gisquet	89cb4995fa	x86: hevc_mc: save 1 gpr in epel filter loading The 3*stride value stored in r3src can be loaded much later, so use r3src instead of a dedicated gpr when possible. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-16 21:53:51 +01:00
James Almer	03adafb318	x86/g722dsp: add ff_g722_apply_qmf_sse2 Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2015-02-16 00:41:21 -03:00
Christophe Gisquet	b533949813	x86: hevc: remove a parameter to WP internals The second stride is always the internal buffer one, MAX_PB_SIZE (times 2 to get the value in bytes). Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-14 17:22:50 +01:00
James Almer	1679d68dbf	x86/hevc_mc: optimize AVX2 mc functions Before 40766 decicycles in ff_hevc_put_hevc_qpel_h64_8_avx2, 8192 runs, 0 skips After 37975 decicycles in ff_hevc_put_hevc_qpel_h64_8_avx2, 8192 runs, 0 skips Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2015-02-12 13:21:58 -03:00
James Almer	14b44c1614	x86/hevc_sao: make sao_edge_filter_{10,12} work on x86_32 Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2015-02-12 13:21:30 -03:00
James Almer	06fe6dfe12	x86/hevc_sao: make sao_band_filter work on x86_32 Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2015-02-09 20:41:21 -03:00
Christophe Gisquet	b61b9e4919	x86: hevc_mc: remove lea in EPEL_LOAD The second parameter to the macro is always an immediate address, so no lea is needed. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-08 22:19:35 +01:00
Christophe Gisquet	4919b38421	x86: hevc_mc: fewer gpr autoloads for _v filters In that case, it's just to load my, but mx/r3src is not used. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-08 22:19:34 +01:00
James Almer	92d903afaa	x86/vp9dsp: fix clobbering of xmm6 on IDCT sse2 functions Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2015-02-08 00:50:39 -03:00
Christophe Gisquet	626d6184ce	x86: lavc/hevc_mc: fix comments The width parameter is now completely at the back, and actually never used. This helps understanding the actual parameter list. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-07 20:52:03 +01:00
Christophe Gisquet	ed450d4acf	x86: lavc: share more constant through defines Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-07 17:48:14 +01:00
Christophe Gisquet	691b7f5e9e	lavc/lossless_audiodsp: revert various commits Their intent was to make the DSP work with wmalossless pro. The later was fixed to work with the DSP. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-07 15:15:19 +01:00
Christophe Gisquet	9dc45d1f42	x86: lavc: share more constants Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-06 23:35:02 +01:00
Mickaël Raulet	6ecc3fd612	x86/hevc_mc: use aligned loads Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-06 21:38:00 +01:00
James Almer	383fddeec6	x86/lossless_audiodsp: fix compilation with --disable-yasm Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2015-02-06 17:30:17 -03:00
James Almer	aea29a891f	x86/hevc_sao: fix loading of RIP address pb_eo must be handled as a rip relative address for MSVC64, so an intermediate register is needed. Should fix link failures. Suggested by Hendrik Leppkes and Christophe Gisquet. Tested-By: Hendrik Leppkes <h.leppkes@gmail.com> Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2015-02-06 15:06:15 -03:00
Mickaël Raulet	bcb0925115	x86/hevc: use CLIPW macro when possible Conflicts: libavcodec/x86/hevc_mc.asm Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-06 17:38:47 +01:00
Christophe Gisquet	5eedd36df1	x86: hevc_mc: use epel_hv 16-wide function The epel_hv functions were still relying on only epel_hv 8-wide being the maximum width instanciated. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-06 17:37:56 +01:00
Pierre Edouard Lepere	a0d1300f71	x86: hevc_mc: add AVX2 optimizations before 33304 decicycles in luma_bi_1, 523066 runs, 1222 skips 38138 decicycles in luma_bi_2, 523427 runs, 861 skips 13490 decicycles in luma_uni, 516138 runs, 8150 skips after 20185 decicycles in luma_bi_1, 519970 runs, 4318 skips 24620 decicycles in luma_bi_2, 521024 runs, 3264 skips 10397 decicycles in luma_uni, 515715 runs, 8573 skips Conflicts: libavcodec/x86/hevc_mc.asm libavcodec/x86/hevcdsp_init.c Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-06 17:20:47 +01:00
Michael Niedermayer	a6c2c8fe3f	Revert "avcodec/x86/lossless_audiodsp: Make scalarproduct_and_madd_int16 prototypes more similar" This reverts commit `3b4ffba3af`. Unbreaks the SSSE3 code on mingw32 Conflicts: libavcodec/x86/lossless_audiodsp.asm Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-06 02:31:45 +01:00
Michael Niedermayer	f1214763af	avcodec/x86/lossless_audiodsp: Move order&8 fallback into C code This is simpler and more robust, and fixes mismatching XMM save restore mismatches Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-06 02:18:54 +01:00
Michael Niedermayer	3b4ffba3af	avcodec/x86/lossless_audiodsp: Make scalarproduct_and_madd_int16 prototypes more similar This is needed as the mmx code is used as fallback from the ssse3 code Suggested-by: jamrial Tested-by: wm4 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2015-02-06 00:20:59 +01:00
James Almer	15574c505b	x86/hevcdsp: add ff_hevc_sao_edge_filter_{10,12}_{sse2,avx2} Original x86 intrinsics code by Pierre-Edouard Lepere. Yasm port, refactoring and optimizations by James Almer. Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U Width 32 342694 decicycles in sao_edge_filter_10, 16384 runs, 0 skips 29476 decicycles in ff_hevc_sao_edge_filter_32_10_ssse3, 16384 runs, 0 skips 13996 decicycles in ff_hevc_sao_edge_filter_32_10_avx2, 16381 runs, 3 skips Width 64 581163 decicycles in sao_edge_filter_10, 8192 runs, 0 skips 59774 decicycles in ff_hevc_sao_edge_filter_64_10_ssse3, 8192 runs, 0 skips 28383 decicycles in ff_hevc_sao_edge_filter_64_10_avx2, 8191 runs, 1 skips Signed-off-by: James Almer <jamrial@gmail.com>	2015-02-05 15:02:33 -03:00
James Almer	042c1159fc	x86/hevcdsp: add ff_hevc_sao_edge_filter_8_{ssse3,avx2} Original x86 intrinsics code and initial yasm port by Pierre-Edouard Lepere. Refactoring and optimizations by James Almer. Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U Width 32 158583 decicycles in edge, sao_edge_filter_8 runs, 0 skips 5205 decicycles in ff_hevc_sao_edge_filter_32_8_ssse3, 32767 runs, 1 skips 2942 decicycles in ff_hevc_sao_edge_filter_32_8_avx2, 32767 runs, 1 skips Width 64 705639 decicycles in sao_edge_filter_8, 262144 runs, 0 skips 19224 decicycles in ff_hevc_sao_edge_filter_64_8_ssse3, 262111 runs, 33 skips 10433 decicycles in ff_hevc_sao_edge_filter_64_8_avx2, 262115 runs, 29 skips Signed-off-by: James Almer <jamrial@gmail.com>	2015-02-05 15:02:27 -03:00
James Almer	aa945dc112	x86/hevcdsp: add missing vzeroupper in ff_hevc_sao_band_filter_48_*_avx2 Signed-off-by: James Almer <jamrial@gmail.com>	2015-02-02 00:01:35 -03:00
James Almer	71e2cb4706	x86/hevcdsp: add missing guards to ff_hevc_sao_band_filter_avx2 Signed-off-by: James Almer <jamrial@gmail.com>	2015-02-01 21:45:52 -03:00
Christophe Gisquet	bff7feb328	x86: hevc/sao: aligned source buffers Usefull for at least band filter, for which: - Band filter call only: 32 64 Before: 16556 54015 After: 16497 52355 - Whole case: 32 64 Before: 37031 103008 After: 32045 93952	2015-02-01 20:22:54 -03:00
James Almer	fa3eccb4f9	x86/hevc: add ff_hevc_sao_band_filter_{8,10,12}_{sse2,avx,avx2} Original x86 intrinsics code and initial 8bit yasm port by Pierre-Edouard Lepere. 10/12bit yasm ports, refactoring and optimizations by James Almer Benchmarks of BQTerrace_1920x1080_60_qp22.bin with an Intel Core i5-4200U width 32 40338 decicycles in sao_band_filter_0_8, 2048 runs, 0 skips 8056 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 2048 runs, 0 skips 7458 decicycles in ff_hevc_sao_band_filter_8_32_avx, 2048 runs, 0 skips 4504 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 2048 runs, 0 skips width 64 136046 decicycles in sao_band_filter_0_8, 16384 runs, 0 skips 28576 decicycles in ff_hevc_sao_band_filter_8_32_sse2, 16384 runs, 0 skips 26707 decicycles in ff_hevc_sao_band_filter_8_32_avx, 16384 runs, 0 skips 14387 decicycles in ff_hevc_sao_band_filter_8_32_avx2, 16384 runs, 0 skips Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2015-02-01 20:22:35 -03:00
Christophe Gisquet	7aeafacfd0	x86/sbrdsp: Use different mem moves Before 2843 decicycles in ff_sbr_autocorrelate_sse3, 262086 runs, 58 skips After 2693 decicycles in ff_sbr_autocorrelate_sse3, 262117 runs, 27 skips Signed-off-by: James Almer <jamrial@gmail.com>	2015-01-25 18:20:43 -03:00
James Almer	449b21bfab	x86/sbrdsp: add ff_sbr_autocorrelate_{sse,sse3} 2 to 2.5 times faster. Signed-off-by: James Almer <jamrial@gmail.com>	2015-01-25 18:20:39 -03:00
James Almer	08810a8895	x86/flacdsp: remove unneeded ifdeffery x86inc can translate r*m into a register or stack on its own Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2015-01-05 16:29:28 -03:00
James Almer	37b35feb64	x86/swr: add SSE2/AVX pack_8ch functions Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2014-12-30 23:05:27 -03:00
Ronald S. Bultje	3aefca68ca	vp9/x86: add myself to copyright holders for loopfilter assembly.	2014-12-27 16:55:16 -05:00
Ronald S. Bultje	afd8c464b7	vp9/x86: make filter_16_h work on 32-bit.	2014-12-27 16:55:16 -05:00
Ronald S. Bultje	b26bc3520f	vp9/x86: make filter_48/84/88_h work on 32-bit.	2014-12-27 16:55:15 -05:00
Ronald S. Bultje	8a1cff1c35	vp9/x86: make filter_44_h work on 32-bit.	2014-12-27 16:55:15 -05:00
Ronald S. Bultje	047088b8c6	vp9/x86: make filter_16_v work on 32-bit.	2014-12-27 16:55:14 -05:00
Ronald S. Bultje	0cc9c23ea1	vp9/x86: make filter_48/84_v work on 32-bit.	2014-12-27 16:55:14 -05:00
Ronald S. Bultje	6433a9133f	vp9/x86: make filter_88_v work on 32-bit.	2014-12-27 16:55:14 -05:00
Ronald S. Bultje	75f8e52089	vp9/x86: make filter_44_v work on 32-bit.	2014-12-27 16:55:13 -05:00
Ronald S. Bultje	7f80c3344c	vp8/x86: save one register in SIGN_ADD/SUB.	2014-12-27 16:55:13 -05:00
Ronald S. Bultje	8ea2194ebb	vp9/x86: store unpacked intermediates for filter6/14 on stack. filter16 goes from 508 to 482 (h) or 346 to 314 (v) cycles; filter88 goes from 240 to 238 (h) or 174 to 165 (v) cycles, measured on TOS.	2014-12-27 16:55:13 -05:00
Ronald S. Bultje	e42409479f	vp8/x86: move variable assigned inside macro branch. The value is not used outside the branch.	2014-12-27 16:55:12 -05:00
Ronald S. Bultje	418c202c63	vp9/x86: simplify ABSSUM_CMP by inverting the comparison meaning.	2014-12-27 16:55:12 -05:00
Ronald S. Bultje	d1c55654e1	vp8/x86: remove unused register from ABSSUB_CMP macro.	2014-12-27 16:55:12 -05:00
Ronald S. Bultje	e59bd08986	vp9/x86: slightly simplify 44/48/84/88 h stores.	2014-12-27 16:55:11 -05:00
Ronald S. Bultje	8132629bd5	vp9/x86: make cglobal statement more conservative in register allocation.	2014-12-27 16:55:11 -05:00
Ronald S. Bultje	c013ca58c5	vp9/x86: save one register in loopfilter surface coverage.	2014-12-27 16:55:11 -05:00
James Almer	32c836cb11	x86/vp9: remove duplicate function prototypes Fixes "redundant redeclaration" warnings. Signed-off-by: James Almer <jamrial@gmail.com>	2014-12-23 00:56:51 -03:00
James Almer	7696e429c7	x86/vp3dsp: port put_vp_no_rnd_pixels8_l2_mmx to yasm Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-12-20 13:25:43 +01:00
James Almer	a4d62f7775	x86/constants: fix alignment of pw_255 Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-12-19 20:21:34 +01:00
Ronald S. Bultje	bdc1e3e3b2	vp9/x86: intra prediction sse2/32bit support. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-12-19 14:07:19 +01:00
Ronald S. Bultje	b6e1711223	vp9/x86: invert hu_ipred left array ordering. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-12-19 14:07:18 +01:00
Ronald S. Bultje	0a7964dca5	vp9/x86: save one register on 32bit idct32x32. Fixes build on win32. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-12-16 02:51:26 +01:00
Ronald S. Bultje	cae893f692	vp9/x86: sse2 MC assembly. Also a slight change to the ssse3 code, which prevents a theoretical overflow in the sharp filter. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-12-15 02:34:05 +01:00
Ronald S. Bultje	fd77fbb390	vp9/x86: 32bit and sse2 support for vp9 inverse transform assembly Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-12-15 00:38:05 +01:00
Michael Niedermayer	a03f72e744	avcodec/x86/hevc_mc: fix sse register counts These fix failures of --enable-xmm-clobber-test It would be better to change the code to use fewer registers, but until someone does the used register count must not be too small Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-12-11 13:17:26 +01:00
Michael Niedermayer	d43d5c5707	avcodec/x86/hevc_mc: remove dead branch from EPEL_FILTER Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-12-10 07:34:49 +01:00
Michael Niedermayer	ed9be7dd47	avcodec/x86/pngdsp: fix off by 1 error This fixes artifacts in the last pixel of rows with some widths and pixel formats Found-by: Dominique Leroux <Dominique.Leroux@autodesk.com> Tested-by: Dominique Leroux <Dominique.Leroux@autodesk.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-12-08 18:24:40 +01:00
Michael Niedermayer	1d048f762d	Merge commit '9a738c27dceb4b975784b23213a46f5cb560d1c2' * commit '9a738c27dceb4b975784b23213a46f5cb560d1c2': v210enc: Add SIMD optimised 8-bit and 10-bit encoders Conflicts: libavcodec/v210enc.c libavcodec/v210enc.h libavcodec/x86/Makefile libavcodec/x86/v210enc.asm libavcodec/x86/v210enc_init.c tests/ref/vsynth/vsynth1-v210 tests/ref/vsynth/vsynth2-v210 See: `36091742d1` Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-12-06 01:54:10 +01:00
Kieran Kunhya	9a738c27dc	v210enc: Add SIMD optimised 8-bit and 10-bit encoders Signed-off-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>	2014-12-05 13:03:49 +00:00
Reimar Döffinger	49d9cbe55d	h264_i386: Fix operand size Fixes fate failure on macosx clang x86-64 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-12-03 23:03:13 +01:00
Christophe Gisquet	9fa056ba75	pngdsp x86: use unaligned access For test images manually generated to contain only up prediction, timing results: 8380x3032 255x185 before: 138635 1992 after: 139232 1996 Actually jumping to the proper version depending on the alignment: 8380x3032: 138767 A 0.5% speed improvement for gigantic images is not worth the code duplication. Fixes ticket #4148 Signed-off-by: Christophe Gisquet <christophe.gisquet@gmail.com> Tested-by: Benoit Fouet <benoit.fouet@free.fr> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-12-03 11:56:22 +01:00
Kieran Kunhya	36091742d1	v210enc: Add SIMD optimised 8-bit and 10-bit encoders Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-11-26 20:30:47 +01:00
Michael Niedermayer	ea41e6d637	Merge commit '9c12c6ff9539e926df0b2a2299e915ae71872600' * commit '9c12c6ff9539e926df0b2a2299e915ae71872600': motion_est: convert stride to ptrdiff_t Conflicts: libavcodec/me_cmp.c libavcodec/ppc/me_cmp.c libavcodec/x86/me_cmp_init.c See: `9c669672c7` Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-11-24 12:13:00 +01:00
Vittorio Giovara	9c12c6ff95	motion_est: convert stride to ptrdiff_t CC: libav-stable@libav.org Bug-Id: CID 700556 / CID 700557 / CID 700558	2014-11-24 01:30:10 +00:00
Carl Eugen Hoyos	600e38f563	Fix standalone compilation of the apng decoder on x86.	2014-11-23 13:21:29 +01:00
Michael Niedermayer	65ce8f8895	avcodec/x86/Makefile: fix order Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-11-23 01:49:04 +01:00
Michael Niedermayer	d3512a0e89	avcodec/x86/lossless_audiodsp: fix fallback code for 32bit Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-11-22 21:08:38 +01:00
Michael Niedermayer	4327088da3	avcodec/x86/lossless_audiodsp: support len %16 == 8 in scalarproduct_and_madd_int16() Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-11-22 20:40:36 +01:00
Reimar Döffinger	478c61ccb2	h264_i386: Optimize decode_significance_8x8_x86 for 64 bit. 11674 -> 10877 decicycles on my Phenom II. Overall speedup was unfortunately within measurement error. Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>	2014-11-22 14:06:48 +01:00
James Almer	3cec54b7d7	x86/flacdsp: add SSE2 and AVX decorrelate functions Two to four times faster depending on instruction set, block size and channel count.	2014-11-13 13:47:55 -03:00
James Almer	84ccc317ce	x86/flacdsp: separate decoder and encoder dsp initialization Signed-off-by: James Almer <jamrial@gmail.com>	2014-11-12 14:41:45 -03:00
James Almer	7292b0477a	x86/hpeldsp: fix loop in {avg,avg_no_rnd}_pixels16_x2_mmx Handle it inside the __asm__() block. Fixes fate-vc1_ilaced_twomv when using the gcc-usan toolchain. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-10-23 13:11:05 -03:00
Michael Niedermayer	3c1378ce0a	Merge commit '2d91abade29e43bb45c881d45909b8ee77e904e2' * commit '2d91abade29e43bb45c881d45909b8ee77e904e2': x86: h264_intrapred: Don't treat 32-bit integers as 64-bit Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-10-08 11:48:58 +02:00
Henrik Gramner	2d91abade2	x86: h264_intrapred: Don't treat 32-bit integers as 64-bit The upper halves are not guaranteed to be zero in x86-64. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2014-10-08 08:15:52 +00:00
Mickaël Raulet	4ba6371a83	x86/hevc: get rid off packusdw for ssse3 compatibility cherry picked from commit df8ebe304df453f26c28ff8f11d607f49b90a4c2 Fixes out of array access Fixes: asan_stack-oob_1046454_9_asan_stack-oob_15a9e7c_170_WP_MAIN10_B_Toshiba_3.bit Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-10-04 21:14:15 +02:00
James Almer	0de1d6287e	x86/mlpdec: add ff_mlp_rematrix_channel_{sse4,avx2} 2x to 2.5x faster than the C version. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-10-02 22:11:55 -03:00
James Almer	acebff8e5d	x86/mpegvideoencdsp: improve ff_pix_sum16_sse2 ~15% faster. Also add an mmxext version that takes advantage of the new code, and build it alongside with the mmx version only on x86_32. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-10-01 13:07:22 -03:00
Michael Niedermayer	d22e88d120	avcodec/x86/fmtconvert: Fix operand size in ff_int32_to_float_fmul_array8_sse* Fixes acodec-dca2 fate failure Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-09-28 19:04:06 +02:00
James Almer	26cd7b1e1a	x86/fmtconvert: add ff_int32_to_float_fmul_array8_{sse,sse2} About two times faster than the c wrapper. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-26 20:48:40 -03:00
Carl Eugen Hoyos	c0f9df30dd	lavc/x86/idctdsp.h: Fix make checkheaders.	2014-09-25 22:18:25 +02:00
James Almer	a829870b2f	avcodec/svq1enc: align buffer used by simd functions Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-25 16:00:20 -03:00
James Almer	4b892e469b	x86/cavsdsp: fix buffer alignment in cavs_idct8_add_mmx() It may be used by ff_add_pixels_clamped_sse2(). Should fix fate-cavs failures on some systems. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-25 16:00:16 -03:00
James Almer	4f4f08e6f0	x86/idctdsp: port {put,add}_pixels_clamped to yasm Also add sse2 versions for both. put_pixels_clamped port and sse2 version originally written by Timothy Gu. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-24 21:52:13 -03:00
James Almer	c99a882814	avcodec/idctdsp: change {put,add}_pixels_clamped to ptrdiff_t line_size Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-24 21:43:19 -03:00
James Almer	ad26e83f9c	avcodec/x86: use function pointers for {put,add}_pixels_clamped Same behavior as in simple_idct. This way the best optimized versions available will be used instead. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-24 18:52:32 -03:00
James Almer	70277d1d23	x86/videodsp: add ff_emu_edge_{hfix,hvar}_avx2 ~15% faster than sse2. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-24 16:12:55 -03:00
James Almer	164d6c7f5b	x86/videodsp: fix warning about discarded 'const' qualifier Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-23 19:59:20 -03:00
James Almer	6b2caa321f	x86/vp9: add AVX and AVX2 MC Roughly 25% faster MC than ssse3 for blocksizes 32 and 64. Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-22 22:35:03 -03:00
James Almer	33c752be51	x86/me_cmp: port mmxext vsad functions to yasm Also add mmxext versions of vsad8 and vsad_intra8, and sse2 versions of vsad16 and vsad_intra16. Since vsad8 and vsad16 are not bitexact, they are accordingly marked as approximate. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-19 20:50:20 -03:00
James Almer	77f9a81cca	x86/me_cmp: combine sad functions into a single macro No point in having the sad8 functions separate now that the loop is no longer unrolled. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-17 23:52:36 -03:00
Michael Niedermayer	41d82b85ab	avcodec/x86/vp9lpf: Always include x86util.asm Fixes executable stack Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-09-17 23:37:46 +02:00
Michael Niedermayer	85f2c0124d	avcodec/x86/me_cmp: fix sad8xh This adds back support for 8x4 and 8x16 it does not support 8x2, i think nothing uses that Found-by: ubitux Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-09-17 14:08:24 +02:00
James Almer	0456d169c4	x86/me_cmp: port mmxext and sse2 sad functions to yasm Also add a missing c->pix_abs[0][0] initialization, and sse2 versions of sad16_x2, sad16_y2 and sad16_xy2 (15% to 20% faster than mmxext). Since the _xy2 versions are not bitexact, they are accordingly marked as approximate. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-09-17 11:12:50 +02:00
James Almer	52ec81c67d	x86/hevc_res_add: add missing guards to hevc_transform_add32_8_avx2 Should fix compilation with old Yasm/Nasm versions. Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-04 23:34:01 -03:00
James Almer	c3d2426cca	x86/hevc_res_add: add ff_hevc_transform_add32_8_avx2 ~20% faster than AVX. Reviewed-by: Michael Niedermayer <michaelni@gmx.at> Signed-off-by: James Almer <jamrial@gmail.com>	2014-09-04 20:21:29 -03:00
James Darnley	46ef45ab59	lavc/x86/v210: give cpuflag to INIT macro This lets the cglobal macro automatically append a suffix to the function name. This means that INIT_XMM avx must be used rather than INIT_AVX. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-09-05 00:35:07 +02:00
Michael Niedermayer	5b58d79a99	Merge commit '7a1d6ddd2c6b2d66fbc1afa584cf506930a26453' * commit '7a1d6ddd2c6b2d66fbc1afa584cf506930a26453': xvid: Add C IDCT Conflicts: libavcodec/dct-test.c libavcodec/xvididct.c See: `298b3b6c1f` Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-09-03 04:09:38 +02:00
Michael Niedermayer	5db23c07a3	Merge commit '95c0cec03acec0a80cc1c7db48f3b2355d9e767b' * commit '95c0cec03acec0a80cc1c7db48f3b2355d9e767b': idctdsp: Add global function pointers for {add\|put}_pixels_clamped functions Conflicts: libavcodec/arm/idctdsp_init_arm.c libavcodec/dct.h libavcodec/idctdsp.c libavcodec/jrevdct.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-09-03 03:19:40 +02:00
Pascal Massimino	7a1d6ddd2c	xvid: Add C IDCT Thanks to Pascal Massimino and Michael Militzer for relicensing as LGPL. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2014-09-02 14:41:13 -07:00
Diego Biurrun	95c0cec03a	idctdsp: Add global function pointers for {add\|put}_pixels_clamped functions These function pointers already existed in the ARM code. Adding them globally allows calls to the function pointers to access arch-optimized versions of the functions transparently.	2014-09-02 14:41:13 -07:00
Reimar Döffinger	d9e2aceb7f	Add missing "const" all over the place. Only "./configure --enable-gpl" on x86 was tested. Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>	2014-08-29 18:57:25 +02:00
Michael Niedermayer	5403a288a7	Merge commit '8d27bf1cff35be406b0fd89d832e1852d4c573bc' * commit '8d27bf1cff35be406b0fd89d832e1852d4c573bc': x86: xvid: K&R formatting cosmetics Conflicts: libavcodec/x86/xvididct_sse2.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-27 21:20:39 +02:00
Michael Niedermayer	b3b05a11d3	Merge commit 'dcb7c868ec7af7d3a138b3254ef2e08f074d8ec5' * commit 'dcb7c868ec7af7d3a138b3254ef2e08f074d8ec5': cosmetics: Make naming scheme of Xvid IDCT consistent with other IDCTs Conflicts: libavcodec/mpeg4videodec.c libavcodec/x86/Makefile libavcodec/x86/dct-test.c libavcodec/x86/xvididct_sse2.c libavcodec/xvididct.c Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-27 21:09:30 +02:00
Michael Niedermayer	3ff5ca89fc	Merge commit '1f156af4274dc72d588620f6bedb4e9e66023c92' * commit '1f156af4274dc72d588620f6bedb4e9e66023c92': x86: xvid_idct: Drop unused definitions Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-27 21:01:54 +02:00
Diego Biurrun	8d27bf1cff	x86: xvid: K&R formatting cosmetics	2014-08-27 05:58:04 -07:00
Diego Biurrun	dcb7c868ec	cosmetics: Make naming scheme of Xvid IDCT consistent with other IDCTs	2014-08-27 04:54:05 -07:00
Diego Biurrun	1f156af427	x86: xvid_idct: Drop unused definitions	2014-08-27 04:36:41 -07:00
Christophe Gisquet	3e892b2bcd	x86: hevc_mc: split differently calls In some cases, 2 or 3 calls are performed to functions for unusual widths. Instead, perform 2 calls for different widths to split the workload. The 8+16 and 4+8 widths for respectively 8 and more than 8 bits can't be processed that way without modifications: some calls use unaligned buffers, and having branches to handle this was resulting in no micro-benchmark benefit. For block_w == 12 (around 1% of the pixels of the sequence): Before: 12758 decicycles in epel_uni, 4093 runs, 3 skips 19389 decicycles in qpel_uni, 8187 runs, 5 skips 22699 decicycles in epel_bi, 32743 runs, 25 skips 34736 decicycles in qpel_bi, 32733 runs, 35 skips After: 11929 decicycles in epel_uni, 4096 runs, 0 skips 18131 decicycles in qpel_uni, 8184 runs, 8 skips 20065 decicycles in epel_bi, 32750 runs, 18 skips 31458 decicycles in qpel_bi, 32753 runs, 15 skips Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-24 12:05:33 +02:00
Christophe Gisquet	38e2aa3759	x86: hevc_mc: correct unneeded use of SSE4 code Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-24 11:43:33 +02:00
Christophe Gisquet	2346f2b5db	x86: hevcdsp: use compilation-time-fixed constant The stride for some buffers is known. Reviewed-by: Mickaël Raulet <mraulet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-22 16:26:30 +02:00
Christophe Gisquet	dad7f15567	hevcdsp: remove more instances of compile-time-fixed parameters Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-22 15:22:42 +02:00
Christophe Gisquet	d4f44b66d3	hevcdsp: remove compilation-time-fixed parameter The dststride parameter is always MAX_PB_SIZE. Reviewed-by: Mickaël Raulet <mraulet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-08-22 14:57:37 +02:00

2156 Commits