FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-03 05:10:03 +02:00

Author	SHA1	Message	Date
Mirage Abeysekara	5eb4f95bef	h264pred: added AVX2 implementation for tm_vp8 16x16. checkasm --bench results with 5000 runs pred16x16_tm_vp8_c: 302.8 pred16x16_tm_vp8_mmx: 101.4 pred16x16_tm_vp8_mmxext: 95.5 pred16x16_tm_vp8_sse2: 95.1 pred16x16_tm_vp8_avx2: 38.2 Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2017-03-20 09:45:42 -04:00
James Almer	6966a5e4d7	Merge commit '721d57e608dc4fd6c86f27c5ae76ef559d646220' * commit '721d57e608dc4fd6c86f27c5ae76ef559d646220': vp56: Separate VP5 and VP6 dsp initialization Merged-by: James Almer <jamrial@gmail.com>	2017-03-19 17:15:24 -03:00
James Almer	663640d745	Merge commit '3fd22538bc0e0de84b31335266b4b1577d3d609e' * commit '3fd22538bc0e0de84b31335266b4b1577d3d609e': prores: Change type of stride parameters to ptrdiff_t Merged-by: James Almer <jamrial@gmail.com>	2017-03-19 15:30:13 -03:00
James Almer	aec42ebc27	Merge commit 'f81be06cf614919d71ded29b8f595bef40123ad8' * commit 'f81be06cf614919d71ded29b8f595bef40123ad8': cavs: Change type of stride parameters to ptrdiff_t Merged-by: James Almer <jamrial@gmail.com>	2017-03-19 15:23:52 -03:00
James Almer	4e4dfcac58	Merge commit '802727b538b484e3f9d1345bfcc4ab24cfea8898' * commit '802727b538b484e3f9d1345bfcc4ab24cfea8898': vp8: Update some assembly comments left unchanged in `bd66f073fe` Merged-by: James Almer <jamrial@gmail.com>	2017-03-19 15:18:31 -03:00
James Almer	4004d33fcb	Merge commit 'd9d26a3674f31f482f54e936fcb382160830877a' * commit 'd9d26a3674f31f482f54e936fcb382160830877a': vp56: Change type of stride parameters to ptrdiff_t Merged-by: James Almer <jamrial@gmail.com>	2017-03-19 14:54:25 -03:00
Clément Bœsch	6a42a54b9d	Merge commit '6892df9294d93322d43255ada299507465bc93c8' * commit '6892df9294d93322d43255ada299507465bc93c8': vp3: Change type of stride parameters to ptrdiff_t Merged-by: Clément Bœsch <u@pkh.me>	2017-03-19 18:41:26 +01:00
Clément Bœsch	8695ce73ca	Merge commit 'e2b9993558b6adee42dcc6eb385a14943aaca974' * commit 'e2b9993558b6adee42dcc6eb385a14943aaca974': simple_idct: x86: Drop disabled IDCT implementation Merged-by: Clément Bœsch <u@pkh.me>	2017-03-19 16:11:11 +01:00
Clément Bœsch	8286c359ad	Merge commit 'e99ecda55082cb9dde8fd349361e169dc383943a' * commit 'e99ecda55082cb9dde8fd349361e169dc383943a': checkasm: add vp9 MC tests. vp9mc/x86: sse2 MC assembly. vp9mc/x86: add AVX and AVX2 MC vp9mc/x86: rename ff_* to ff_vp9_* vp9mc/x86: rename ff_avg[48]_sse to ff_avg[48]_mmxext vp9mc/x86: simplify a few inits. vp9mc/x86: add 16px functions (64bit only). Noop (aside from a formatting comment in vp9mc.asm). We already have all of this. We should consider making a final diff between the two projects when the dust comes down. Merged-by: Clément Bœsch <u@pkh.me>	2017-03-16 20:25:39 +01:00
Clément Bœsch	a4f5e79f7c	Merge commit '89466de4aeaf5e359489b81b8a9920a2bc7936d6' * commit '89466de4aeaf5e359489b81b8a9920a2bc7936d6': vp9/x86: rename vp9dsp to vp9mc File was already renamed, only the top description is updated. Merged-by: Clément Bœsch <u@pkh.me>	2017-03-16 20:10:47 +01:00
James Almer	e632fe9bab	Merge commit '3c504bc3599f00bfc5923adc114beef34bce11d0' * commit '3c504bc3599f00bfc5923adc114beef34bce11d0': x86: deduplicate some constants Merged-by: James Almer <jamrial@gmail.com>	2017-03-15 22:07:28 -03:00
Michael Niedermayer	835d9f299c	avcodec/x86/cavsdsp: Put MMX code under mmx check Without this the FPU state becomes trashed and causes mysterious fate failures with cpuflags=0 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-03-06 16:47:17 +01:00
James Darnley	33de0fee2c	avcodec/h264: enable sse2 chroma deblock/loop filter functions Between 1.00 and 1.16 times faster on Intel Yorkfield Core 2 Quad. Between 1.11 and 1.39 times faster on Intel Kaby Lake Pentium.	2017-02-27 13:22:06 +01:00
James Darnley	cd893b9307	avcodec/h264: add avx 8-bit 4:2:2 chroma h intra deblock/loop filter ~1.37x faster (147 vs. 108 cycles) compared to mmxext function	2017-02-27 13:22:06 +01:00
James Darnley	0e16b3e2be	avcodec/h264: add avx 8-bit 4:2:0 chroma h intra deblock/loop filter ~1.10x faster (69 vs. 63 cycles) compared to mmxext function	2017-02-27 13:22:06 +01:00
James Darnley	987ffe4b8d	avcodec/h264: add avx 8-bit chroma v intra deblock/loop filter ~1.14x faster (90 vs 78 cycles) compared with mmxext	2017-02-27 13:22:06 +01:00
James Darnley	88307b3eec	avcodec/h264: add avx 8-bit 4:2:2 chroma h deblock/loop filter ~1.21x faster (68 vs. 56 cycles) compared with mmxext function	2017-02-27 13:22:06 +01:00
James Darnley	ac096fc82d	avcodec/h264: add avx 8-bit 4:2:0 chroma h deblock/loop filter ~1.14x faster (93 vs. 81 cycles) compared with mmxext function	2017-02-27 13:22:06 +01:00
James Darnley	5c56758843	avcodec/h264: add avx 8-bit chroma v deblock/loop filter ~1.24x faster (101 vs. 81 cycles) compared with mmxext function	2017-02-27 13:22:06 +01:00
James Darnley	5336887867	avcodec/h264: sse2, avx h luma mbaff deblock/loop filter x86-64 only Yorkfield: - sse2: ~2.17x (434 vs. 200 cycles) Nehalem: - sse2: ~2.94x (409 vs. 139 cycles) Skylake: - sse2: ~3.10x (370 vs. 119 cycles) - avx: ~3.29x (370 vs. 112 cycles)	2017-02-18 20:26:52 +01:00
James Darnley	e18bc2114f	avcodec/h264: add named parameters to x86 function	2017-02-18 20:26:50 +01:00
James Darnley	9d815b7424	avcodec/x86: deduplicate PASS8ROWS macro	2017-02-18 20:26:49 +01:00
James Almer	c8467abbad	x86/rv34dsp: add ff_rv34_idct_dc_add_sse2 Also disable ff_rv34_idct_dc_add_mmx on x86_64 as the presence of sse2 is guaranteed in such builds. Signed-off-by: James Almer <jamrial@gmail.com>	2017-02-02 17:51:21 -03:00
James Almer	ab5c4d006d	x86/vp8dsp: add ff_vp8_idct_dc_add_sse2 Also disable ff_vp8_idct_dc_add_mmx on x86_64 as the presence of sse2 is guaranteed in such builds. Signed-off-by: James Almer <jamrial@gmail.com>	2017-02-02 17:18:58 -03:00
Michael Niedermayer	536ac72f46	Revert "Merge commit '0a39c9ac0bfd7345fe676b4e2707d9cec3cbb553'" The assumption this is based on is wrong, the code is not always run with bitexact flags This reverts commit `a956164e1e`, reversing changes made to `f6005907fd`. Approved-by: James Almer <jamrial@gmail.com>	2017-02-01 02:01:07 +01:00
James Almer	ba5d089381	Merge commit 'd06dfaa5cbdd20acfd2364b16c0f4ae4ddb30a65' * commit 'd06dfaa5cbdd20acfd2364b16c0f4ae4ddb30a65': x86: huffyuv: Use EXTERNAL_SSSE3_FAST convenience macro where appropriate Merged-by: James Almer <jamrial@gmail.com>	2017-01-31 15:36:49 -03:00
James Almer	ac774cfa57	Merge commit '4efab89332ea39a77145e8b15562b981d9dbde68' * commit '4efab89332ea39a77145e8b15562b981d9dbde68': x86: Use _FAST/_SLOW CPU feature detection macros where appropriate Merged-by: James Almer <jamrial@gmail.com>	2017-01-31 15:08:19 -03:00
James Almer	a956164e1e	Merge commit '0a39c9ac0bfd7345fe676b4e2707d9cec3cbb553' * commit '0a39c9ac0bfd7345fe676b4e2707d9cec3cbb553': x86: hpeldsp: Don't check for bitexact flag when initializing VP3-specific code Merged-by: James Almer <jamrial@gmail.com>	2017-01-31 14:59:29 -03:00
James Almer	f6005907fd	Merge commit '95c1df929b92d81454656c222a35ec5f7db576b4' * commit '95c1df929b92d81454656c222a35ec5f7db576b4': x86: hpeldsp: Drop unused function parameters Merged-by: James Almer <jamrial@gmail.com>	2017-01-31 14:56:11 -03:00
James Almer	4d0e89ce27	Merge commit 'c3e83ad3b7d75f3597f47ada2616ba4479665009' * commit 'c3e83ad3b7d75f3597f47ada2616ba4479665009': x86: hpeldsp: Use EXTERNAL_SSE2_FAST where appropriate Merged-by: James Almer <jamrial@gmail.com>	2017-01-31 14:53:27 -03:00
James Almer	ca8a3978e5	Merge commit '1dfc3cf89d0eb026af28be46294b85d79499ffb5' * commit '1dfc3cf89d0eb026af28be46294b85d79499ffb5': x86: hpeldsp: Split off VP3-specific bits into a separate file Merged-by: James Almer <jamrial@gmail.com>	2017-01-31 14:49:29 -03:00
Clément Bœsch	7c300a8ed4	lavc/hevc: remove a few random spaces to reduce diff with libav	2017-01-31 17:02:24 +01:00
Clément Bœsch	78d16eb452	Merge commit 'fca3c3b61952aacc45e9ca54d86a762946c21942' * commit 'fca3c3b61952aacc45e9ca54d86a762946c21942': hevc: Add AVX2 DC IDCT Mostly noop as we already have that code. In the ASM, code is merged with the exception of SECTION which is kept uppercase for consistency with the rest of the codebase. Still in the ASM, the prototype comment is fixed to honor the '_' added from the original commit. idct_dc_proto() is dropped as it's not used anymore here. Merged-by: Clément Bœsch <cboesch@gopro.com>	2017-01-31 16:53:37 +01:00
Clément Bœsch	d0e132bab6	Merge commit '1bd890ad173d79e7906c5e1d06bf0a06cca4519d' * commit '1bd890ad173d79e7906c5e1d06bf0a06cca4519d': hevc: Separate adding residual to prediction from IDCT This commit should be a noop but isn't because of the following renames: - transform_add → add_residual - transform_skip → dequant - idct_4x4_luma → transform_4x4_luma Merged-by: Clément Bœsch <cboesch@gopro.com>	2017-01-31 15:31:34 +01:00
Anton Khirnov	b4a911c189	mpegvideoenc: make a table const	2017-01-19 09:52:21 +01:00
James Almer	6d4c9f2ade	lossless_videodsp: rename add_hfyu_left_pred_int16 to add_left_pred_int16 Signed-off-by: James Almer <jamrial@gmail.com>	2017-01-12 22:53:05 -03:00
James Almer	47f212329e	huffyuvdsp: move functions only used by huffyuv from lossless_videodsp Signed-off-by: James Almer <jamrial@gmail.com>	2017-01-12 22:53:05 -03:00
James Almer	cf9ef83960	huffyuvencdsp: move shared functions to a new lossless_videoencdsp context Signed-off-by: James Almer <jamrial@gmail.com>	2017-01-12 22:53:04 -03:00
James Almer	30c1f27299	huffyuvencdsp: move functions only used by huffyuv from lossless_videodsp Signed-off-by: James Almer <jamrial@gmail.com>	2017-01-12 22:53:04 -03:00
James Almer	5ac1dd8e23	lossless_videodsp: move shared functions from huffyuvdsp Several codecs other than huffyuv use them. Signed-off-by: James Almer <jamrial@gmail.com>	2017-01-12 22:53:04 -03:00
Michael Niedermayer	aa95292043	avcodec/x86/vc1dsp_mc: Fix build with NASM 2.09.10 make fate passes Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-01-02 22:37:55 +01:00
John Comeau	d06518752b	avcodec/x86/imdct36: fix building with nasm 2.11.05 fixes `operation size not specified` errors as described here: http://stackoverflow.com/questions/36854583/compiling-ffmpeg-for-kali-linux-2 I rebuilt again with yasm and made sure it didn't break that. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2017-01-02 20:44:16 +01:00
Paul B Mahol	6d09d6edbc	avcodec/magicyuv: add 10 bit support Signed-off-by: Paul B Mahol <onemda@gmail.com>	2016-12-20 13:32:15 +01:00
James Darnley	acdd2d805d	avcodec/h264: resolve assert being triggered when stack is not aligned 32-bit msvc.	2016-12-07 22:32:19 +01:00
James Darnley	728651df06	avcodec/h264: mmx2, sse2, avx 10-bit 4:2:2 h chroma deblock/loop filter Yorkfield: - mmx2: 2.53x (504 vs. 199 cycles) - sse2: 3.83x (504 vs. 131 cycles) Nehalem: - mmx2: 2.42x (365 vs. 151 cycles) - sse2: 3.56x (365 vs. 103 cycles) Skylake: - mmx2: 1.81x (308 vs. 170 cycles) - sse2: 2.84x (308 vs. 108 cycles) - avx: 2.93x (308 vs. 105 cycles)	2016-12-07 00:29:13 +01:00
James Darnley	add21d0bb3	avcodec/h264: mmx2, sse2, avx 10-bit h chroma deblock/loop filter Yorkfield: - mmx2: 2.45x (279 vs. 114 cycles) - sse2: 3.36x (279 vs. 83 cycles) Nehalem: - mmx2: 2.10x (192 vs. 92 cycles) - sse2: 2.84x (192 vs. 68 cycles) Skylake: - mmx2: 1.75x (170 vs. 97 cycles) - sse2: 2.47x (170 vs. 69 cycles) - avx: 2.47x (170 vs. 69 cycles)	2016-12-07 00:29:13 +01:00
James Darnley	58ca2ef62e	whitespace changes after last commit	2016-12-07 00:29:13 +01:00
James Darnley	f33714a694	avcodec/h264: clean up and expand x86 function definitions	2016-12-07 00:29:13 +01:00
Diego Biurrun	0a35f128f3	cabac: x86: Give optimizations header a more meaningful name	2016-12-01 08:23:54 +01:00
James Darnley	13d71c28cc	avcodec/h264: sse2 and avx 4:2:2 idct add8 10-bit functions Yorkfield: - sse2: - complex: 4.13x faster (1514 vs. 367 cycles) - simple: 4.38x faster (1836 vs. 419 cycles) Skylake: - sse2: - complex: 3.61x faster ( 936 vs. 260 cycles) - simple: 3.97x faster (1126 vs. 284 cycles) - avx (versus sse2): - complex: 1.07x faster (260 vs. 244 cycles) - simple: 1.03x faster (284 vs. 274 cycles)	2016-11-30 22:58:28 +01:00
James Darnley	1dae7ffa0b	avcodec/h264: mmx 4:2:2 idct add8 function 2.87 times faster (1830 vs. 638 cycles)	2016-11-30 22:58:27 +01:00
James Darnley	815ea8c6cc	avcodec/h264: mmxext 4:2:2 chroma intra deblock/loop filter 2.1 times faster (401 vs. 194 cycles)	2016-11-30 22:58:27 +01:00
James Almer	2de1c79b61	x86/vp9itxfm: add missing AVX2 guards Fixes compilation with Yasm 1.1.0 and older. Signed-off-by: James Almer <jamrial@gmail.com>	2016-11-18 17:01:11 -03:00
Ronald S. Bultje	83a139e3d8	vp9: add avx2 iadst16 implementations. Also a small cosmetic change to the avx2 idct16 version to make it explicit that one of the arguments to the write-out macros is unused for >=avx2 (it uses pmovzxbw instead of punpcklbw).	2016-11-15 11:01:36 -05:00
Hendrik Leppkes	db854c6c4a	Merge commit '4a081f224e12f4227ae966bcbdd5384f22121ecf' * commit '4a081f224e12f4227ae966bcbdd5384f22121ecf': libavcodec: fix constness in clobber test avcodec_open2() wrappers Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>	2016-11-13 17:30:33 +01:00
Diego Biurrun	0361e4dcb4	h264_qpel: x86: Move function with only one instance out of template macro libavcodec/x86/h264_qpel.c:392:785: warning: unused function 'ff_avg_h264_qpel8or16_hv1_lowpass_mmxext' [-Wunused-function]	2016-11-08 17:21:02 +01:00
Diego Biurrun	3cba09e522	x86: Drop stray semicolons after function definitions libavcodec/x86/rv40dsp_init.c:97:2: warning: ISO C does not allow extra ‘;’ outside of a function [-Wpedantic] libavcodec/x86/vp9dsp_init.c:94:40: warning: ISO C does not allow extra ‘;’ outside of a function [-Wpedantic]	2016-11-05 12:41:45 +01:00
Martin Storsjö	2e55e26b40	vp9: Flip the order of arguments in MC functions This makes it match the pattern already used for VP8 MC functions. This also makes the signature match ffmpeg's version of these functions, easing porting of code in both directions. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-11-03 09:12:02 +02:00
Pierre Edouard Lepere	6d5636ad9a	hevc: x86: Add add_residual() SIMD optimizations Initially written by Pierre Edouard Lepere <Pierre-Edouard.Lepere@insa-rennes.fr>, extended by James Almer <jamrial@gmail.com>. Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>	2016-10-22 17:33:35 +02:00
Andreas Cadhalpun	c8a6eb58d7	doc: fix spelling errors Thanks to Mathieu Malaterre <malat@debian.org> for reporting the Que/Queue typo. (https://bugs.debian.org/839542) Reviewed-by: Lou Logan <lou@lrcd.com> Signed-off-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>	2016-10-21 23:58:47 +02:00
Diego Biurrun	788544ff0e	audiodsp: x86: Remove pointless header file Its single forward declaration can be moved to the only place it is used, like is done for all other dsp init files.	2016-10-19 15:20:41 +02:00
Diego Biurrun	b89804da9b	x86: videodsp: Add parentheses to expression to work around warning libavcodec/x86/videodsp.asm:128: warning: signed dword value exceeds bounds	2016-10-19 10:13:34 +02:00
Rostislav Pehlivanov	d2ae5f77c6	aacenc: add SIMD optimizations for abs_pow34 and quantization Performance improvements: quant_bands: with: 681 decicycles in quant_bands, 8388453 runs, 155 skips without: 1190 decicycles in quant_bands, 8388386 runs, 222 skips Around 42% for the function Twoloop coder: abs_pow34: with/without: 7.82s/8.17s Around 4% for the entire encoder Both: with/without: 7.15s/8.17s Around 12% for the entire encoder Fast coder: abs_pow34: with/without: 3.40s/3.77s Around 10% for the entire encoder Both: with/without: 3.02s/3.77s Around 20% faster for the entire encoder Signed-off-by: Rostislav Pehlivanov <atomnuker@gmail.com> Tested-by: Michael Niedermayer <michael@niedermayer.cc> Reviewed-by: James Almer <jamrial@gmail.com>	2016-10-18 21:41:18 +01:00
Diego Biurrun	6be7944ee2	x86: Add missing colons after assembly labels This fixes many warnings of the sort warning: label alone on a line without a colon might be in error	2016-10-17 16:31:26 +02:00
Alexandra Hájková	112cee0241	hevc: Add SSE2 and AVX IDCT Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-11 18:21:04 +02:00
Anton Khirnov	e4128c08d7	Revert "hevc: x86: Refactor IDCT macro declarations" This reverts commit `d9dccc0389`. There were outstanding objections to this commit.	2016-10-06 15:24:04 +02:00
Diego Biurrun	5801f9ed24	h264_intrapred: x86: Update comments left behind in `95c89da36e`	2016-10-06 12:32:34 +02:00
Diego Biurrun	d9dccc0389	hevc: x86: Refactor IDCT macro declarations	2016-10-06 12:32:34 +02:00
Ronald S. Bultje	715f139c9b	vp9lpf/x86: make filter_16_h work on 32-bit. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:09 +02:00
Ronald S. Bultje	8915320db9	vp9lpf/x86: make filter_48/84/88_h work on 32-bit. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:09 +02:00
Ronald S. Bultje	725a216481	vp9lpf/x86: make filter_44_h work on 32-bit. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:09 +02:00
Ronald S. Bultje	5bfa96c4b3	vp9lpf/x86: make filter_16_v work on 32-bit. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:09 +02:00
Ronald S. Bultje	b905e8d2fe	vp9lpf/x86: make filter_48/84_v work on 32-bit. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Ronald S. Bultje	37637e6590	vp9lpf/x86: make filter_88_v work on 32-bit. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Ronald S. Bultje	be10834bd9	vp9lpf/x86: make filter_44_v work on 32-bit. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Ronald S. Bultje	7c62891efe	vp9lpf/x86: save one register in SIGN_ADD/SUB. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Ronald S. Bultje	c6375a83d1	vp9lpf/x86: store unpacked intermediates for filter6/14 on stack. filter16 goes from 508 to 482 (h) or 346 to 314 (v) cycles; filter88 goes from 240 to 238 (h) or 174 to 165 (v) cycles, measured on TOS. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Ronald S. Bultje	4ce8ba72f9	vp9lpf/x86: move variable assigned inside macro branch. The value is not used outside the branch. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Ronald S. Bultje	e4961035b2	vp9lpf/x86: simplify ABSSUM_CMP by inverting the comparison meaning. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Ronald S. Bultje	683da2788e	vp9lpf/x86: remove unused register from ABSSUB_CMP macro. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Ronald S. Bultje	6e74e9636b	vp9lpf/x86: slightly simplify 44/48/84/88 h stores. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Ronald S. Bultje	6411c328a2	vp9lpf/x86: make cglobal statement more conservative in register allocation. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Ronald S. Bultje	a6e288d624	vp9lpf/x86: save one register in loopfilter surface coverage. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Clément Bœsch	0ed21bdc9e	vp9lpf/x86: add ff_vp9_loop_filter_[vh]_44_16_{sse2,ssse3,avx}. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Clément Bœsch	f2e3d706a1	vp9lpf/x86: add ff_vp9_loop_filter_h_{48,84}_16_{sse2,ssse3,avx}(). Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
James Almer	92d47550ea	vp9lpf/x86: add an SSE2 version of vp9_loop_filter_[vh]_88_16 Similar gains as the ssse3 version once again Additional improvements by Clément Bœsch <u@pkh.me>. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Clément Bœsch	6bea478158	vp9lpf/x86: add ff_vp9_loop_filter_[vh]_88_16_{ssse3,avx}. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
James Almer	1f451eed60	vp9lpf/x86: add ff_vp9_loop_filter_[vh]_16_16_sse2(). Similar gains in performance as the SSSE3 version Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
Clément Bœsch	a692724c58	vp9lpf/x86: add x86 SSSE3/AVX SIMD for vp9_loop_filter_[vh]_16_16. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2016-10-04 10:54:08 +02:00
James Almer	42111e8543	avcodec: fix arguments on xmm/neon clobber test wrappers Signed-off-by: James Almer <jamrial@gmail.com>	2016-10-02 02:15:47 -03:00
James Almer	449f263f9f	avcodec: add missing xmm/neon clobber test wrappers for the new encode API Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2016-10-01 14:08:50 -03:00
Justin Ruggles	b57e38f52c	ac3dsp: x86: Replace inline asm for in-decoder downmixing with standalone asm Adds a wrapper function for downmixing which detects channel count changes and updates the selected downmix function accordingly. Simplification and porting to current x86inc infrastructure by Diego Biurrun. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2016-10-01 00:46:25 +02:00
Justin Ruggles	43717469f9	ac3dsp: Reverse matrix in/out order in downmix() Also use (float *) instead of (float ()[2]). This matches the matrix layout in libavresample so we can reuse assembly code between the two. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2016-10-01 00:45:55 +02:00
Hendrik Leppkes	8d1267932c	x86/h264_weight: use appropriate register size for weight parameters This fixes decoding corruption on 64 bit windows. Signed-off-by: Martin Storsjö <martin@martin.st>	2016-09-30 12:18:22 +03:00
Diego Biurrun	2caa93b813	mpegaudiodsp: Change type of array stride parameters to ptrdiff_t This avoids SIMD-optimized functions having to sign-extend their stride argument manually to be able to do pointer arithmetic.	2016-09-29 17:54:24 +02:00
Diego Biurrun	e4a94d8b36	h264chroma: Change type of stride parameters to ptrdiff_t This avoids SIMD-optimized functions having to sign-extend their stride argument manually to be able to do pointer arithmetic.	2016-09-29 14:48:04 +02:00
Diego Biurrun	2ec9fa5ec6	idct: Change type of array stride parameters to ptrdiff_t ptrdiff_t is the correct type for array strides and similar.	2016-09-29 14:48:03 +02:00
Diego Biurrun	009adfd4fb	x86: fpel: Remove unnecessary sign extend	2016-09-29 14:47:41 +02:00
Anton Khirnov	de2ae3c1fa	lavc: add clobber tests for the new encoding/decoding API	2016-09-28 10:01:52 +02:00
Hendrik Leppkes	5ae0ad001a	x86/h264_weight: use appropriate register size for weight parameters Fixes trac 5579 Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com> Acked-by: Michael Niedermayer <michael@niedermayer.cc>	2016-09-23 16:40:57 +02:00
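
Several commits in this log (`2caa93b813`, `e4a94d8b36`, `2ec9fa5ec6`, and the merged vp3, vp56, cavs and prores changes) switch stride parameters to `ptrdiff_t`. Their stated rationale is that SIMD code then no longer has to sign-extend the stride by hand before doing pointer arithmetic. The following is a minimal C sketch of that idea, not code from the FFmpeg tree: `sum_column` and its parameters are hypothetical names chosen for illustration. It shows a stride that is already signed and pointer-sized being added to a pointer directly, including the negative-stride case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an FFmpeg function): sums one column of an
 * 8-bit picture. `stride` is the distance in bytes between the start of
 * consecutive rows. It may be negative for bottom-up layouts, so it is a
 * signed, pointer-sized ptrdiff_t rather than an int. */
static unsigned sum_column(const uint8_t *src, ptrdiff_t stride, int rows)
{
    unsigned sum = 0;

    assert(rows >= 0);
    for (int y = 0; y < rows; y++) {
        sum += *src;
        /* Pointer plus ptrdiff_t: the operand already has pointer width,
         * so no widening of the stride is needed here. */
        src += stride;
    }
    return sum;
}
```

With a 32-bit `int` stride on a 64-bit target, hand-written assembly would typically have to widen the argument itself before using it as an address offset; passing `ptrdiff_t` moves that concern out of the assembly. Called on a 4-byte-wide buffer, `sum_column(pic, 4, 3)` walks down a column and `sum_column(pic + 8, -4, 3)` walks the same column upwards.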

2336 Commits