FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-18 03:19:31 +02:00

Author	SHA1	Message	Date
Clément Bœsch	d0e132bab6	Merge commit '1bd890ad173d79e7906c5e1d06bf0a06cca4519d' * commit '1bd890ad173d79e7906c5e1d06bf0a06cca4519d': hevc: Separate adding residual to prediction from IDCT This commit should be a noop but isn't because of the following renames: - transform_add → add_residual - transform_skip → dequant - idct_4x4_luma → transform_4x4_luma Merged-by: Clément Bœsch <cboesch@gopro.com>	2017-01-31 15:31:34 +01:00
James Almer	6d4c9f2ade	lossless_videodsp: rename add_hfyu_left_pred_int16 to add_left_pred_int16 Signed-off-by: James Almer <jamrial@gmail.com>	2017-01-12 22:53:05 -03:00
James Almer	47f212329e	huffyuvdsp: move functions only used by huffyuv from lossless_videodsp Signed-off-by: James Almer <jamrial@gmail.com>	2017-01-12 22:53:05 -03:00
James Almer	cf9ef83960	huffyuvencdsp: move shared functions to a new lossless_videoencdsp context Signed-off-by: James Almer <jamrial@gmail.com>	2017-01-12 22:53:04 -03:00
James Almer	30c1f27299	huffyuvencdsp: move functions only used by huffyuv from lossless_videodsp Signed-off-by: James Almer <jamrial@gmail.com>	2017-01-12 22:53:04 -03:00
James Almer	5ac1dd8e23	lossless_videodsp: move shared functions from huffyuvdsp Several codecs other than huffyuv use them. Signed-off-by: James Almer <jamrial@gmail.com>	2017-01-12 22:53:04 -03:00
Michael Niedermayer	aa95292043	avcodec/x86/vc1dsp_mc: Fix build with NASM 2.09.10 make fate passes Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-01-02 22:37:55 +01:00
John Comeau	d06518752b	avcodec/x86/imdct36: fix building with nasm 2.11.05 fixes `operation size not specified` errors as described here: http://stackoverflow.com/questions/36854583/compiling-ffmpeg-for-kali-linux-2 I rebuilt again with yasm and made sure it didn't break that. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-01-02 20:44:16 +01:00
Paul B Mahol	6d09d6edbc	avcodec/magicyuv: add 10 bit support Signed-off-by: Paul B Mahol <onemda@gmail.com>	2016-12-20 13:32:15 +01:00
James Darnley	acdd2d805d	avcodec/h264: resolve assert being triggered when stack is not aligned 32-bit msvc.	2016-12-07 22:32:19 +01:00
James Darnley	728651df06	avcodec/h264: mmx2, sse2, avx 10-bit 4:2:2 h chroma deblock/loop filter Yorkfield: - mmx2: 2.53x (504 vs. 199 cycles) - sse2: 3.83x (504 vs. 131 cycles) Nehalem: - mmx2: 2.42x (365 vs. 151 cycles) - sse2: 3.56x (365 vs. 103 cycles) Skylake: - mmx2: 1.81x (308 vs. 170 cycles) - sse2: 2.84x (308 vs. 108 cycles) - avx: 2.93x (308 vs. 105 cycles)	2016-12-07 00:29:13 +01:00
James Darnley	add21d0bb3	avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter Yorkfield: - mmx2: 2.45x (279 vs. 114 cycles) - sse2: 3.36x (279 vs. 83 cycles) Nehalem: - mmx2: 2.10x (192 vs. 92 cycles) - sse2: 2.84x (192 vs. 68 cycles) Skylake: - mmx2: 1.75x (170 vs. 97 cycles) - sse2: 2.47x (170 vs. 69 cycles) - avx: 2.47x (170 vs. 69 cycles)	2016-12-07 00:29:13 +01:00
James Darnley	58ca2ef62e	whitespace changes after last commit	2016-12-07 00:29:13 +01:00
James Darnley	f33714a694	avcodec/h264: clean up and expand x86 function definitions	2016-12-07 00:29:13 +01:00
James Darnley	13d71c28cc	avcodec/h264: sse2 and avx 4:2:2 idct add8 10-bit functions Yorkfield: - sse2: - complex: 4.13x faster (1514 vs. 367 cycles) - simple: 4.38x faster (1836 vs. 419 cycles) Skylake: - sse2: - complex: 3.61x faster ( 936 vs. 260 cycles) - simple: 3.97x faster (1126 vs. 284 cycles) - avx (versus sse2): - complex: 1.07x faster (260 vs. 244 cycles) - simple: 1.03x faster (284 vs. 274 cycles)	2016-11-30 22:58:28 +01:00
James Darnley	1dae7ffa0b	avcodec/h264: mmx 4:2:2 idct add8 function 2.87 times faster (1830 vs. 638 cycles)	2016-11-30 22:58:27 +01:00
James Darnley	815ea8c6cc	avcodec/h264: mmxext 4:2:2 chroma intra deblock/loop filter 2.1 times faster (401 vs. 194 cycles)	2016-11-30 22:58:27 +01:00
James Almer	2de1c79b61	x86/vp9itxfm: add missing AVX2 guards Fixes compilation with Yasm 1.1.0 and older. Signed-off-by: James Almer <jamrial@gmail.com>	2016-11-18 17:01:11 -03:00
Ronald S. Bultje	83a139e3d8	vp9: add avx2 iadst16 implementations. Also a small cosmetic change to the avx2 idct16 version to make it explicit that one of the arguments to the write-out macros is unused for >=avx2 (it uses pmovzxbw instead of punpcklbw).	2016-11-15 11:01:36 -05:00
Hendrik Leppkes	db854c6c4a	Merge commit '4a081f224e12f4227ae966bcbdd5384f22121ecf' * commit '4a081f224e12f4227ae966bcbdd5384f22121ecf': libavcodec: fix constness in clobber test avcodec_open2() wrappers Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>	2016-11-13 17:30:33 +01:00
Andreas Cadhalpun	c8a6eb58d7	doc: fix spelling errors Thanks to Mathieu Malaterre <malat@debian.org> for reporting the Que/Queue typo. (https://bugs.debian.org/839542) Reviewed-by: Lou Logan <lou@lrcd.com> Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>	2016-10-21 23:58:47 +02:00
Rostislav Pehlivanov	d2ae5f77c6	aacenc: add SIMD optimizations for abs_pow34 and quantization Performance improvements: quant_bands: with: 681 decicycles in quant_bands, 8388453 runs, 155 skips without: 1190 decicycles in quant_bands, 8388386 runs, 222 skips Around 42% for the function Twoloop coder: abs_pow34: with/without: 7.82s/8.17s Around 4% for the entire encoder Both: with/without: 7.15s/8.17s Around 12% for the entire encoder Fast coder: abs_pow34: with/without: 3.40s/3.77s Around 10% for the entire encoder Both: with/without: 3.02s/3.77s Around 20% faster for the entire encoder Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com> Tested-by: Michael Niedermayer <michael@niedermayer.cc> Reviewed-by: James Almer <jamrial@gmail.com>	2016-10-18 21:41:18 +01:00
James Almer	42111e8543	avcodec: fix arguments on xmm/neon clobber test wrappers Signed-off-by: James Almer <jamrial@gmail.com>	2016-10-02 02:15:47 -03:00
James Almer	449f263f9f	avcodec: add missing xmm/neon clobber test wrappers for the new encode API Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2016-10-01 14:08:50 -03:00
Hendrik Leppkes	5ae0ad001a	x86/h264_weight: use appropriate register size for weight parameters Fixes trac 5579 Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Acked-by: Michael Niedermayer <michael@niedermayer.cc>	2016-09-23 16:40:57 +02:00
Michael Niedermayer	bc26fe8927	avcodec/h264: Use ptrdiff_t for (bi)weight functions Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2016-09-23 04:10:44 +02:00
James Almer	d950279cbf	avcodec/ttadsp: cosmetics Clean some header includes and use the same naming scheme as in ttaencdsp Signed-off-by: James Almer <jamrial@gmail.com>	2016-08-06 18:27:01 -03:00
James Almer	efc9d5c4bc	x86/ttaenc: add ff_ttaenc_filter_process_{ssse3,sse4} Signed-off-by: James Almer <jamrial@gmail.com>	2016-08-02 15:48:04 -03:00
Clément Bœsch	15b26e88cb	Merge commit '9df889a5f116c1ee78c2f239e0ba599c492431aa' * commit '9df889a5f116c1ee78c2f239e0ba599c492431aa': h264: rename h264.[ch] to h264dec.[ch] Merged-by: Clément Bœsch <u@pkh.me>	2016-07-29 11:01:36 +02:00
Ronald S. Bultje	a4edaa0270	vp9: add mxext versions of the single-block (w=8,npx=8) h/v loopfilters. Each takes about 0.1% of runtime in my profiles, and they didn't have any SIMD yet so far (we only had simd for npx=16 double-block versions).	2016-07-26 15:59:07 -04:00
Ronald S. Bultje	7ca422bb1b	vp9: add mxext versions of the single-block (w=4,npx=8) h/v loopfilters. Each takes about 0.5% of runtime in my profiles, and they didn't have any SIMD yet so far (we only had simd for npx=16 double-block versions).	2016-07-26 15:59:07 -04:00
Ronald S. Bultje	726501a34e	vp9: add 32x32 idct AVX2 implementation. About 1.8x speedup compared to AVX version for full IDCT. Other sub-IDCT scenarios also see speedups. Full --bench output for idct_32x32_add_{bpp}_${subidct}_${opt} (50k cycles): nop: 16.5 vp9_inv_dct_dct_32x32_add_8_1_c: 2284.4 vp9_inv_dct_dct_32x32_add_8_1_sse2: 145.0 vp9_inv_dct_dct_32x32_add_8_1_ssse3: 137.4 vp9_inv_dct_dct_32x32_add_8_1_avx: 137.1 vp9_inv_dct_dct_32x32_add_8_1_avx2: 73.2 vp9_inv_dct_dct_32x32_add_8_2_c: 14680.8 vp9_inv_dct_dct_32x32_add_8_2_sse2: 2617.2 vp9_inv_dct_dct_32x32_add_8_2_ssse3: 982.9 vp9_inv_dct_dct_32x32_add_8_2_avx: 958.5 vp9_inv_dct_dct_32x32_add_8_2_avx2: 704.2 vp9_inv_dct_dct_32x32_add_8_4_c: 14443.1 vp9_inv_dct_dct_32x32_add_8_4_sse2: 2717.1 vp9_inv_dct_dct_32x32_add_8_4_ssse3: 965.7 vp9_inv_dct_dct_32x32_add_8_4_avx: 1000.7 vp9_inv_dct_dct_32x32_add_8_4_avx2: 717.1 vp9_inv_dct_dct_32x32_add_8_8_c: 14436.4 vp9_inv_dct_dct_32x32_add_8_8_sse2: 2671.8 vp9_inv_dct_dct_32x32_add_8_8_ssse3: 1038.5 vp9_inv_dct_dct_32x32_add_8_8_avx: 983.0 vp9_inv_dct_dct_32x32_add_8_8_avx2: 729.4 vp9_inv_dct_dct_32x32_add_8_16_c: 14614.7 vp9_inv_dct_dct_32x32_add_8_16_sse2: 2701.7 vp9_inv_dct_dct_32x32_add_8_16_ssse3: 1334.4 vp9_inv_dct_dct_32x32_add_8_16_avx: 1276.7 vp9_inv_dct_dct_32x32_add_8_16_avx2: 719.5 vp9_inv_dct_dct_32x32_add_8_32_c: 14363.6 vp9_inv_dct_dct_32x32_add_8_32_sse2: 2575.6 vp9_inv_dct_dct_32x32_add_8_32_ssse3: 2633.9 vp9_inv_dct_dct_32x32_add_8_32_avx: 2539.6 vp9_inv_dct_dct_32x32_add_8_32_avx2: 1395.0	2016-07-26 15:59:07 -04:00
James Almer	7a15cf42ee	x86/diracdsp: make ff_put_signed_rect_clamped_10_sse4 work on x86_32 Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2016-07-20 13:43:38 -03:00
Rostislav Pehlivanov	df1dc52195	diracdsp_init: add missing ARCH_X86_64 check That SIMD is still x86_64 only for now. Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>	2016-07-12 00:39:12 +01:00
Rostislav Pehlivanov	bd61f3c6bf	diracdsp: add SIMD for the 10 bit version of put_signed_rect_clamped Signed-off-by: Rostislav Pehlivanov <rpehlivanov@obe.tv>	2016-07-11 23:33:24 +01:00
Rostislav Pehlivanov	80721cc1ff	diracdsp: add dequantization SIMD Currently unused, to be used in the following commits. Signed-off-by: Rostislav Pehlivanov <rpehlivanov@obe.tv>	2016-07-11 23:30:11 +01:00
Ronald S. Bultje	f0a2b6249b	vp9: add 16x16 idct avx2 (8-bit). checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows that it's about 1.65x as fast as the AVX version for the full IDCT, and similar speedups for the sub-IDCTs: nop: 24.6 vp9_inv_dct_dct_16x16_add_8_1_c: 6444.8 vp9_inv_dct_dct_16x16_add_8_1_sse2: 638.6 vp9_inv_dct_dct_16x16_add_8_1_ssse3: 484.4 vp9_inv_dct_dct_16x16_add_8_1_avx: 661.2 vp9_inv_dct_dct_16x16_add_8_1_avx2: 311.5 vp9_inv_dct_dct_16x16_add_8_2_c: 6665.7 vp9_inv_dct_dct_16x16_add_8_2_sse2: 646.9 vp9_inv_dct_dct_16x16_add_8_2_ssse3: 455.2 vp9_inv_dct_dct_16x16_add_8_2_avx: 521.9 vp9_inv_dct_dct_16x16_add_8_2_avx2: 304.3 vp9_inv_dct_dct_16x16_add_8_4_c: 7022.7 vp9_inv_dct_dct_16x16_add_8_4_sse2: 647.4 vp9_inv_dct_dct_16x16_add_8_4_ssse3: 467.1 vp9_inv_dct_dct_16x16_add_8_4_avx: 446.1 vp9_inv_dct_dct_16x16_add_8_4_avx2: 297.0 vp9_inv_dct_dct_16x16_add_8_8_c: 6800.4 vp9_inv_dct_dct_16x16_add_8_8_sse2: 598.6 vp9_inv_dct_dct_16x16_add_8_8_ssse3: 465.7 vp9_inv_dct_dct_16x16_add_8_8_avx: 440.9 vp9_inv_dct_dct_16x16_add_8_8_avx2: 290.2 vp9_inv_dct_dct_16x16_add_8_16_c: 6626.6 vp9_inv_dct_dct_16x16_add_8_16_sse2: 599.5 vp9_inv_dct_dct_16x16_add_8_16_ssse3: 475.0 vp9_inv_dct_dct_16x16_add_8_16_avx: 469.9 vp9_inv_dct_dct_16x16_add_8_16_avx2: 286.4	2016-07-11 10:14:58 -04:00
Clément Bœsch	84ecbbfb27	Merge commit 'f1a9eee41c4b5ea35db9ff0088ce4e6f1e187f2c' * commit 'f1a9eee41c4b5ea35db9ff0088ce4e6f1e187f2c': x86: Add missing movsxd for the int stride parameter Merged-by: Clément Bœsch <u@pkh.me>	2016-07-09 14:52:23 +02:00
James Almer	645489cf90	x86/dcadsp: optimize lfe_fir0_float_fma3 on x86_32 About 10% faster. Signed-off-by: James Almer <jamrial@gmail.com>	2016-07-05 17:48:20 -03:00
James Almer	293484fa5e	avcodec: add missing xmm/neon clobber test wrappers for the new decode API Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2016-07-03 18:04:30 -03:00
Matthieu Bouron	9eb3da2f99	asm: FF_-prefix internal macros used in inline assembly See merge commit '39d6d3618d48625decaff7d9bdbb45b44ef2a805'.	2016-06-27 17:21:18 +02:00
Clément Bœsch	4a081f224e	libavcodec: fix constness in clobber test avcodec_open2() wrappers Signed-off-by: Martin Storsjö <martin@martin.st>	2016-06-26 21:34:04 +03:00
Hendrik Leppkes	c142dc203e	Merge commit 'dc40a70c5755bccfb1a1349639943e1f408bea50' * commit 'dc40a70c5755bccfb1a1349639943e1f408bea50': Drop unnecessary libavutil/x86/asm.h #includes Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>	2016-06-26 15:53:00 +02:00
Clément Bœsch	5d48e4eafa	Merge commit 'a6a750c7ef240b72ce01e9653343a0ddf247d196' * commit 'a6a750c7ef240b72ce01e9653343a0ddf247d196': tests: Move all test programs to a subdirectory Merged-by: Clément Bœsch <clement@stupeflix.com>	2016-06-22 13:44:34 +02:00
Clément Bœsch	8ef57a0d61	Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb' * commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb': cosmetics: Fix spelling mistakes Merged-by: Clément Bœsch <u@pkh.me>	2016-06-21 21:55:34 +02:00
Anton Khirnov	9df889a5f1	h264: rename h264.[ch] to h264dec.[ch] This is more consistent with the naming of other decoders.	2016-06-21 11:11:26 +02:00
Martin Storsjö	f1a9eee41c	x86: Add missing movsxd for the int stride parameter Signed-off-by: Martin Storsjö <martin@martin.st>	2016-06-17 00:11:21 +03:00
James Almer	ede4ec1f8f	x86/aacpsdsp: optimize add_squares loop Signed-off-by: James Almer <jamrial@gmail.com>	2016-06-14 12:41:23 -03:00
James Almer	82dbfccaf0	x86/aacdec: use HADDPS macro Signed-off-by: James Almer <jamrial@gmail.com>	2016-06-08 14:18:18 -03:00
Diego Biurrun	dc40a70c57	Drop unnecessary libavutil/x86/asm.h #includes	2016-05-28 19:18:26 +02:00

2183 Commits