FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-08 13:22:53 +02:00

Author	SHA1	Message	Date
Paul B Mahol	6d09d6edbc	avcodec/magicyuv: add 10 bit support Signed-off-by: Paul B Mahol <onemda@gmail.com>	2016-12-20 13:32:15 +01:00
James Darnley	acdd2d805d	avcodec/h264: resolve assert being triggered when stack is not aligned 32-bit msvc.	2016-12-07 22:32:19 +01:00
James Darnley	728651df06	avcodec/h264: mmx2, sse2, avx 10-bit 4:2:2 h chroma deblock/loop filter Yorkfield: - mmx2: 2.53x (504 vs. 199 cycles) - sse2: 3.83x (504 vs. 131 cycles) Nehalem: - mmx2: 2.42x (365 vs. 151 cycles) - sse2: 3.56x (365 vs. 103 cycles) Skylake: - mmx2: 1.81x (308 vs. 170 cycles) - sse2: 2.84x (308 vs. 108 cycles) - avx: 2.93x (308 vs. 105 cycles)	2016-12-07 00:29:13 +01:00
James Darnley	add21d0bb3	avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter Yorkfield: - mmx2: 2.45x (279 vs. 114 cycles) - sse2: 3.36x (279 vs. 83 cycles) Nehalem: - mmx2: 2.10x (192 vs. 92 cycles) - sse2: 2.84x (192 vs. 68 cycles) Skylake: - mmx2: 1.75x (170 vs. 97 cycles) - sse2: 2.47x (170 vs. 69 cycles) - avx: 2.47x (170 vs. 69 cycles)	2016-12-07 00:29:13 +01:00
James Darnley	58ca2ef62e	whitespace changes after last commit	2016-12-07 00:29:13 +01:00
James Darnley	f33714a694	avcodec/h264: clean up and expand x86 function definitions	2016-12-07 00:29:13 +01:00
James Darnley	13d71c28cc	avcodec/h264: sse2 and avx 4:2:2 idct add8 10-bit functions Yorkfield: - sse2: - complex: 4.13x faster (1514 vs. 367 cycles) - simple: 4.38x faster (1836 vs. 419 cycles) Skylake: - sse2: - complex: 3.61x faster ( 936 vs. 260 cycles) - simple: 3.97x faster (1126 vs. 284 cycles) - avx (versus sse2): - complex: 1.07x faster (260 vs. 244 cycles) - simple: 1.03x faster (284 vs. 274 cycles)	2016-11-30 22:58:28 +01:00
James Darnley	1dae7ffa0b	avcodec/h264: mmx 4:2:2 idct add8 function 2.87 times faster (1830 vs. 638 cycles)	2016-11-30 22:58:27 +01:00
James Darnley	815ea8c6cc	avcodec/h264: mmxext 4:2:2 chroma intra deblock/loop filter 2.1 times faster (401 vs. 194 cycles)	2016-11-30 22:58:27 +01:00
James Almer	2de1c79b61	x86/vp9itxfm: add missing AVX2 guards Fixes compilation with Yasm 1.1.0 and older. Signed-off-by: James Almer <jamrial@gmail.com>	2016-11-18 17:01:11 -03:00
Ronald S. Bultje	83a139e3d8	vp9: add avx2 iadst16 implementations. Also a small cosmetic change to the avx2 idct16 version to make it explicit that one of the arguments to the write-out macros is unused for >=avx2 (it uses pmovzxbw instead of punpcklbw).	2016-11-15 11:01:36 -05:00
Hendrik Leppkes	db854c6c4a	Merge commit '4a081f224e12f4227ae966bcbdd5384f22121ecf' * commit '4a081f224e12f4227ae966bcbdd5384f22121ecf': libavcodec: fix constness in clobber test avcodec_open2() wrappers Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>	2016-11-13 17:30:33 +01:00
Andreas Cadhalpun	c8a6eb58d7	doc: fix spelling errors Thanks to Mathieu Malaterre <malat@debian.org> for reporting the Que/Queue typo. (https://bugs.debian.org/839542) Reviewed-by: Lou Logan <lou@lrcd.com> Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>	2016-10-21 23:58:47 +02:00
Rostislav Pehlivanov	d2ae5f77c6	aacenc: add SIMD optimizations for abs_pow34 and quantization Performance improvements: quant_bands: with: 681 decicycles in quant_bands, 8388453 runs, 155 skips without: 1190 decicycles in quant_bands, 8388386 runs, 222 skips Around 42% for the function Twoloop coder: abs_pow34: with/without: 7.82s/8.17s Around 4% for the entire encoder Both: with/without: 7.15s/8.17s Around 12% for the entire encoder Fast coder: abs_pow34: with/without: 3.40s/3.77s Around 10% for the entire encoder Both: with/without: 3.02s/3.77s Around 20% faster for the entire encoder Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com> Tested-by: Michael Niedermayer <michael@niedermayer.cc> Reviewed-by: James Almer <jamrial@gmail.com>	2016-10-18 21:41:18 +01:00
James Almer	42111e8543	avcodec: fix arguments on xmm/neon clobber test wrappers Signed-off-by: James Almer <jamrial@gmail.com>	2016-10-02 02:15:47 -03:00
James Almer	449f263f9f	avcodec: add missing xmm/neon clobber test wrappers for the new encode API Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2016-10-01 14:08:50 -03:00
Hendrik Leppkes	5ae0ad001a	x86/h264_weight: use appropriate register size for weight parameters Fixes trac 5579 Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Acked-by: Michael Niedermayer <michael@niedermayer.cc>	2016-09-23 16:40:57 +02:00
Michael Niedermayer	bc26fe8927	avcodec/h264: Use ptrdiff_t for (bi)weight functions Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2016-09-23 04:10:44 +02:00
James Almer	d950279cbf	avcodec/ttadsp: cosmetics Clean some header includes and use the same naming scheme as in ttaencdsp Signed-off-by: James Almer <jamrial@gmail.com>	2016-08-06 18:27:01 -03:00
James Almer	efc9d5c4bc	x86/ttaenc: add ff_ttaenc_filter_process_{ssse3,sse4} Signed-off-by: James Almer <jamrial@gmail.com>	2016-08-02 15:48:04 -03:00
Clément Bœsch	15b26e88cb	Merge commit '9df889a5f116c1ee78c2f239e0ba599c492431aa' * commit '9df889a5f116c1ee78c2f239e0ba599c492431aa': h264: rename h264.[ch] to h264dec.[ch] Merged-by: Clément Bœsch <u@pkh.me>	2016-07-29 11:01:36 +02:00
Ronald S. Bultje	a4edaa0270	vp9: add mxext versions of the single-block (w=8,npx=8) h/v loopfilters. Each takes about 0.1% of runtime in my profiles, and they didn't have any SIMD yet so far (we only had simd for npx=16 double-block versions).	2016-07-26 15:59:07 -04:00
Ronald S. Bultje	7ca422bb1b	vp9: add mxext versions of the single-block (w=4,npx=8) h/v loopfilters. Each takes about 0.5% of runtime in my profiles, and they didn't have any SIMD yet so far (we only had simd for npx=16 double-block versions).	2016-07-26 15:59:07 -04:00
Ronald S. Bultje	726501a34e	vp9: add 32x32 idct AVX2 implementation. About 1.8x speedup compared to AVX version for full IDCT. Other sub-IDCT scenarios also see speedups. Full --bench output for idct_32x32_add_{bpp}_${subidct}_${opt} (50k cycles): nop: 16.5 vp9_inv_dct_dct_32x32_add_8_1_c: 2284.4 vp9_inv_dct_dct_32x32_add_8_1_sse2: 145.0 vp9_inv_dct_dct_32x32_add_8_1_ssse3: 137.4 vp9_inv_dct_dct_32x32_add_8_1_avx: 137.1 vp9_inv_dct_dct_32x32_add_8_1_avx2: 73.2 vp9_inv_dct_dct_32x32_add_8_2_c: 14680.8 vp9_inv_dct_dct_32x32_add_8_2_sse2: 2617.2 vp9_inv_dct_dct_32x32_add_8_2_ssse3: 982.9 vp9_inv_dct_dct_32x32_add_8_2_avx: 958.5 vp9_inv_dct_dct_32x32_add_8_2_avx2: 704.2 vp9_inv_dct_dct_32x32_add_8_4_c: 14443.1 vp9_inv_dct_dct_32x32_add_8_4_sse2: 2717.1 vp9_inv_dct_dct_32x32_add_8_4_ssse3: 965.7 vp9_inv_dct_dct_32x32_add_8_4_avx: 1000.7 vp9_inv_dct_dct_32x32_add_8_4_avx2: 717.1 vp9_inv_dct_dct_32x32_add_8_8_c: 14436.4 vp9_inv_dct_dct_32x32_add_8_8_sse2: 2671.8 vp9_inv_dct_dct_32x32_add_8_8_ssse3: 1038.5 vp9_inv_dct_dct_32x32_add_8_8_avx: 983.0 vp9_inv_dct_dct_32x32_add_8_8_avx2: 729.4 vp9_inv_dct_dct_32x32_add_8_16_c: 14614.7 vp9_inv_dct_dct_32x32_add_8_16_sse2: 2701.7 vp9_inv_dct_dct_32x32_add_8_16_ssse3: 1334.4 vp9_inv_dct_dct_32x32_add_8_16_avx: 1276.7 vp9_inv_dct_dct_32x32_add_8_16_avx2: 719.5 vp9_inv_dct_dct_32x32_add_8_32_c: 14363.6 vp9_inv_dct_dct_32x32_add_8_32_sse2: 2575.6 vp9_inv_dct_dct_32x32_add_8_32_ssse3: 2633.9 vp9_inv_dct_dct_32x32_add_8_32_avx: 2539.6 vp9_inv_dct_dct_32x32_add_8_32_avx2: 1395.0	2016-07-26 15:59:07 -04:00
James Almer	7a15cf42ee	x86/diracdsp: make ff_put_signed_rect_clamped_10_sse4 work on x86_32 Reviewed-by: Rostislav Pehlivanov <atomnuker@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2016-07-20 13:43:38 -03:00
Rostislav Pehlivanov	df1dc52195	diracdsp_init: add missing ARCH_X86_64 check That SIMD is still x86_64 only for now. Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com>	2016-07-12 00:39:12 +01:00
Rostislav Pehlivanov	bd61f3c6bf	diracdsp: add SIMD for the 10 bit version of put_signed_rect_clamped Signed-off-by: Rostislav Pehlivanov <rpehlivanov@obe.tv>	2016-07-11 23:33:24 +01:00
Rostislav Pehlivanov	80721cc1ff	diracdsp: add dequantization SIMD Currently unused, to be used in the following commits. Signed-off-by: Rostislav Pehlivanov <rpehlivanov@obe.tv>	2016-07-11 23:30:11 +01:00
Ronald S. Bultje	f0a2b6249b	vp9: add 16x16 idct avx2 (8-bit). checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows that it's about 1.65x as fast as the AVX version for the full IDCT, and similar speedups for the sub-IDCTs: nop: 24.6 vp9_inv_dct_dct_16x16_add_8_1_c: 6444.8 vp9_inv_dct_dct_16x16_add_8_1_sse2: 638.6 vp9_inv_dct_dct_16x16_add_8_1_ssse3: 484.4 vp9_inv_dct_dct_16x16_add_8_1_avx: 661.2 vp9_inv_dct_dct_16x16_add_8_1_avx2: 311.5 vp9_inv_dct_dct_16x16_add_8_2_c: 6665.7 vp9_inv_dct_dct_16x16_add_8_2_sse2: 646.9 vp9_inv_dct_dct_16x16_add_8_2_ssse3: 455.2 vp9_inv_dct_dct_16x16_add_8_2_avx: 521.9 vp9_inv_dct_dct_16x16_add_8_2_avx2: 304.3 vp9_inv_dct_dct_16x16_add_8_4_c: 7022.7 vp9_inv_dct_dct_16x16_add_8_4_sse2: 647.4 vp9_inv_dct_dct_16x16_add_8_4_ssse3: 467.1 vp9_inv_dct_dct_16x16_add_8_4_avx: 446.1 vp9_inv_dct_dct_16x16_add_8_4_avx2: 297.0 vp9_inv_dct_dct_16x16_add_8_8_c: 6800.4 vp9_inv_dct_dct_16x16_add_8_8_sse2: 598.6 vp9_inv_dct_dct_16x16_add_8_8_ssse3: 465.7 vp9_inv_dct_dct_16x16_add_8_8_avx: 440.9 vp9_inv_dct_dct_16x16_add_8_8_avx2: 290.2 vp9_inv_dct_dct_16x16_add_8_16_c: 6626.6 vp9_inv_dct_dct_16x16_add_8_16_sse2: 599.5 vp9_inv_dct_dct_16x16_add_8_16_ssse3: 475.0 vp9_inv_dct_dct_16x16_add_8_16_avx: 469.9 vp9_inv_dct_dct_16x16_add_8_16_avx2: 286.4	2016-07-11 10:14:58 -04:00
Clément Bœsch	84ecbbfb27	Merge commit 'f1a9eee41c4b5ea35db9ff0088ce4e6f1e187f2c' * commit 'f1a9eee41c4b5ea35db9ff0088ce4e6f1e187f2c': x86: Add missing movsxd for the int stride parameter Merged-by: Clément Bœsch <u@pkh.me>	2016-07-09 14:52:23 +02:00
James Almer	645489cf90	x86/dcadsp: optimize lfe_fir0_float_fma3 on x86_32 About 10% faster. Signed-off-by: James Almer <jamrial@gmail.com>	2016-07-05 17:48:20 -03:00
James Almer	293484fa5e	avcodec: add missing xmm/neon clobber test wrappers for the new decode API Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2016-07-03 18:04:30 -03:00
Matthieu Bouron	9eb3da2f99	asm: FF_-prefix internal macros used in inline assembly See merge commit '39d6d3618d48625decaff7d9bdbb45b44ef2a805'.	2016-06-27 17:21:18 +02:00
Clément Bœsch	4a081f224e	libavcodec: fix constness in clobber test avcodec_open2() wrappers Signed-off-by: Martin Storsjö <martin@martin.st>	2016-06-26 21:34:04 +03:00
Hendrik Leppkes	c142dc203e	Merge commit 'dc40a70c5755bccfb1a1349639943e1f408bea50' * commit 'dc40a70c5755bccfb1a1349639943e1f408bea50': Drop unnecessary libavutil/x86/asm.h #includes Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>	2016-06-26 15:53:00 +02:00
Clément Bœsch	5d48e4eafa	Merge commit 'a6a750c7ef240b72ce01e9653343a0ddf247d196' * commit 'a6a750c7ef240b72ce01e9653343a0ddf247d196': tests: Move all test programs to a subdirectory Merged-by: Clément Bœsch <clement@stupeflix.com>	2016-06-22 13:44:34 +02:00
Clément Bœsch	8ef57a0d61	Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb' * commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb': cosmetics: Fix spelling mistakes Merged-by: Clément Bœsch <u@pkh.me>	2016-06-21 21:55:34 +02:00
Anton Khirnov	9df889a5f1	h264: rename h264.[ch] to h264dec.[ch] This is more consistent with the naming of other decoders.	2016-06-21 11:11:26 +02:00
Martin Storsjö	f1a9eee41c	x86: Add missing movsxd for the int stride parameter Signed-off-by: Martin Storsjö <martin@martin.st>	2016-06-17 00:11:21 +03:00
James Almer	ede4ec1f8f	x86/aacpsdsp: optimize add_squares loop Signed-off-by: James Almer <jamrial@gmail.com>	2016-06-14 12:41:23 -03:00
James Almer	82dbfccaf0	x86/aacdec: use HADDPS macro Signed-off-by: James Almer <jamrial@gmail.com>	2016-06-08 14:18:18 -03:00
Diego Biurrun	dc40a70c57	Drop unnecessary libavutil/x86/asm.h #includes	2016-05-28 19:18:26 +02:00
Diego Biurrun	1e9c5bf4c1	asm: FF_-prefix internal macros used in inline assembly These warnings conflict with system macros on Solaris, producing truckloads of warnings about macro redefinition.	2016-05-28 19:18:26 +02:00
Diego Biurrun	a6a750c7ef	tests: Move all test programs to a subdirectory	2016-05-13 14:55:56 +02:00
Christophe Gisquet	9630b3fc06	x86: lossless audio: SSE4 madd 32bits The unique user so far is wmalossless 24bits. The few samples tested show an order of 8, so more unrolling or an avx2 version do not make sense. Timings: 68 -> 49 cycles Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2016-05-07 23:28:48 +02:00
Vittorio Giovara	41ed7ab45f	cosmetics: Fix spelling mistakes Signed-off-by: Diego Biurrun <diego@biurrun.de>	2016-05-04 18:16:21 +02:00
Derek Buitenhuis	d496d52d02	Merge commit '73ff983e8dd22ccee166403d0bbbc9c1cd543622' * commit '73ff983e8dd22ccee166403d0bbbc9c1cd543622': fft: x86: cosmetics: Drop silly comments, add comment, whitespace Merged-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2016-04-12 15:42:21 +01:00
Diego Biurrun	01621202aa	build: miscellaneous cosmetics Restore alphabetical order in lists, break overly long lines, do some prettyprinting, add some explanatory section comments, group parts together that belong together logically.	2016-04-07 15:26:08 +02:00
Michael Niedermayer	305344d89e	avcodec/fft: Add revtab32 for FFTs with more than 65536 samples x86 optimizations are used only for the cases they support (<=65536 samples) Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2016-03-04 16:05:47 +01:00
Michael Niedermayer	ae76b84221	avcodec: Extend fft to size 2^17 Asked-for-by: durandal_1707 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2016-03-04 13:51:42 +01:00

1 2 3 4 5 ...

2175 Commits