FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-23 12:43:46 +02:00

Author	SHA1	Message	Date
Christophe Gisquet	9107612818	x86util: add and use RSHIFT/LSHIFT macros Those macros take a byte number as shift argument, as this argument differs between MMX and SSE2 instructions. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-06-15 13:19:27 +02:00
James Almer	85065d2a7c	x86/float_dsp: add missing femms It was lost during the port. Should fix fate on 3dnowext machines. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-06-08 20:06:28 +02:00
James Almer	dcaf9660b6	x86/float_dsp: port vector_fmul_window to yasm Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-06-08 12:41:32 +02:00
James Almer	fc8db12a73	x86/vp9: initial AVX2 intra_pred. tos3k-vp9-b10000.webm on a Core i5-4200U @1.6GHz: 1219 decicycles in ff_vp9_ipred_dc_32x32_ssse3, 131070 runs, 2 skips; 439 decicycles in ff_vp9_ipred_dc_32x32_avx2, 131070 runs, 2 skips; 3570 decicycles in ff_vp9_ipred_dc_top_32x32_ssse3, 4096 runs, 0 skips; 2494 decicycles in ff_vp9_ipred_dc_top_32x32_avx2, 4096 runs, 0 skips; 1419 decicycles in ff_vp9_ipred_dc_left_32x32_ssse3, 16384 runs, 0 skips; 717 decicycles in ff_vp9_ipred_dc_left_32x32_avx2, 16384 runs, 0 skips; 2737 decicycles in ff_vp9_ipred_tm_32x32_avx, 1024 runs, 0 skips; 2088 decicycles in ff_vp9_ipred_tm_32x32_avx2, 1024 runs, 0 skips; 3090 decicycles in ff_vp9_ipred_v_32x32_avx, 512 runs, 0 skips; 2226 decicycles in ff_vp9_ipred_v_32x32_avx2, 512 runs, 0 skips; 1565 decicycles in ff_vp9_ipred_h_32x32_avx, 1024 runs, 0 skips; 922 decicycles in ff_vp9_ipred_h_32x32_avx2, 1024 runs, 0 skips. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-06-08 02:37:20 +02:00
Christophe Gisquet	2267003981	x86: hpeldsp: better factorization Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-29 21:47:40 +02:00
James Almer	561bfc85eb	x86/dsputilenc: implement SSE2 versions of pix_{sum16, norm1} Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-28 23:29:34 +02:00
Matt Oliver	1898c2f49d	inline asm: fix arrays as named constraints. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-05-07 15:02:45 +02:00
James Almer	3b06208a57	x86/float_dsp: remove duplicated code from vector_dmul_scalar Use the xm# and ym# aliases as they remain in sync with m# after a SWAP. No actual changes to the assembly. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-19 14:21:51 +02:00
James Almer	76ed71a72b	x86: move horizontal add macros to x86util Also port relevant AVX2/XOP optimizations from x264 with permission to relicense to LGPL from the corresponding authors Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-17 14:15:09 +02:00
James Almer	11b36b1ee0	x86/float_dsp: unroll loop in vector_fmac_scalar ~6% faster SSE2 performance. AVX/FMA3 are unaffected. Signed-off-by: James Almer <jamrial@gmail.com> Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-16 18:36:52 +02:00
James Almer	3b808900af	x86/float_dsp: use SWAP in vector_fmac_scalar Win64 The mova is unnecessary Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-04-16 15:46:21 +02:00
James Almer	2d9821a208	x86/cpu: check for OS support before enabling AVX2 AV_CPU_FLAG_AVX is enabled at this point only if there's OS support. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-25 17:56:43 +01:00
Matt Oliver	8236747511	Automatically change MANGLE() into named inline asm operands when direct symbol references in inline asm are not supported. This is part of the patch set for Intel C inline asm support on Windows. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-18 23:39:30 +01:00
James Almer	7d7487e85c	x86/float_dsp: add ff_vector_{fmul_add, fmac_scalar}_fma3 ~7% faster than AVX Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-13 04:34:05 +01:00
Michael Niedermayer	4159f702a7	avutil/timer: Fix units for x86 after `c708b54033` Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-03-09 15:22:02 +01:00
James Almer	3f3d748cab	x86: Move XOP emulation to x86util We need the emulation to support the cases where the first argument is the same as the fourth. To achieve this a fifth argument working as a temporary may be needed. Emulation that doesn't obey the original instruction semantics can't be in x86inc. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-24 08:30:19 +01:00
Michael Niedermayer	bd8d73ea8b	Merge remote-tracking branch 'qatar/master' * qatar/master: x86: add detection for Bit Manipulation Instruction sets Conflicts: libavutil/x86/cpu.c See: `0bc3de19ff` Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-23 22:52:58 +01:00
Michael Niedermayer	d9574069c1	Merge commit '1b932eb1508f550fac9e911923a0383efda53aa3' * commit '1b932eb1508f550fac9e911923a0383efda53aa3': x86: add detection for FMA3 instruction set Conflicts: configure libavutil/cpu.h libavutil/x86/cpu.c See: `a2af8eddab` Merged-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-23 22:43:08 +01:00
James Almer	d59fcdaff3	x86: add detection for Bit Manipulation Instruction sets Based on x264 code Signed-off-by: James Almer <jamrial@gmail.com>	2014-02-23 15:29:36 +01:00
James Almer	1b932eb150	x86: add detection for FMA3 instruction set Based on x264 code Signed-off-by: James Almer <jamrial@gmail.com>	2014-02-23 15:29:36 +01:00
James Almer	10b0161d78	x86: add missing XOP checks and macros Signed-off-by: James Almer <jamrial@gmail.com>	2014-02-23 15:29:36 +01:00
James Almer	0bc3de19ff	x86: add detection for Bit Manipulation Instruction sets Based on x264 code Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-22 17:26:00 +01:00
James Almer	a2af8eddab	x86: add detection for FMA3 instruction set Based on x264 code Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-22 17:25:52 +01:00
Christophe Gisquet	996697e266	x86: float dsp: unroll SSE versions vector_fmul and vector_fmac_scalar are guaranteed to process batches of 16 elements, but their SSE versions only do 8 at a time. Therefore, unroll them a bit. 299 to 261 cycles for 256 elements in vector_fmac_scalar on Arrandale/Win64. Signed-off-by: Janne Grunau <janne-libav@jannau.net>	2014-02-20 14:18:05 +01:00
Christophe Gisquet	133b34207c	x86: float dsp: unroll SSE versions vector_fmul and vector_fmac_scalar are guaranteed to process batches of 16 elements, but their SSE versions only do 8 at a time. Therefore, unroll them a bit. 299 to 261 cycles for 256 elements in vector_fmac_scalar on Arrandale/Win64. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-15 18:54:21 +01:00
James Almer	23a8c63452	x86inc: Extend FMA_INSTR functionality Support the cases where the first and last operand of the XOP instruction are the same. Also add vpmacsdql emulation. Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-13 22:14:24 +01:00
James Almer	6c12b1de06	x86: add missing XOP checks and macros Signed-off-by: James Almer <jamrial@gmail.com> Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2014-02-11 03:46:52 +01:00
Loren Merritt	b7d0d10a1d	x86inc: Speed up assembling with Yasm Work around Yasm's inefficiency with handling large numbers of variables in the global scope. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2014-01-26 18:40:08 +01:00
Loren Merritt	4d55fe7204	x86inc: speed up compilation with yasm Work around yasm's inefficiency with handling large numbers of variables in the global scope.	2014-01-18 01:19:16 +01:00
Michael Niedermayer	c3814ab654	rename new lls code to lls2 to avoid conflict with the old one, which has a different ABI. Also remove the failed attempt at a compatibility layer; the code simply cannot work. Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-11-17 16:41:08 +01:00
Michael Niedermayer	bbe66ef912	avutil: rename lls to lls2 Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-11-17 16:30:23 +01:00
Michael Niedermayer	a665704402	Merge commit '4d6ee0725553a43ba88d6f8327ebcf8f1c5ae8d4' * commit '4d6ee0725553a43ba88d6f8327ebcf8f1c5ae8d4': libavutil: x86: Add AVX2 capable CPU detection. Conflicts: libavutil/cpu.c libavutil/cpu.h libavutil/x86/cpu.c See: `865b70bc5d` Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-26 02:36:36 +02:00
Kieran Kunhya	865b70bc5d	Add AVX2 capable CPU detection. Patch based on x264's AVX2 detection Signed-off-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-26 02:34:22 +02:00
Kieran Kunhya	4d6ee07255	libavutil: x86: Add AVX2 capable CPU detection. Patch based on x264's AVX2 detection Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2013-10-25 19:36:55 +01:00
Michael Niedermayer	f9bef2bec9	Merge remote-tracking branch 'qatar/master' * qatar/master: x86: more AVX2 framework Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-14 16:13:57 +02:00
Michael Niedermayer	e3e0e3d0c9	Merge commit 'c6908d6b4b377a04a5d055ba874bdbcf06c80497' * commit 'c6908d6b4b377a04a5d055ba874bdbcf06c80497': x86inc: FMA3/4 Support Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-14 16:06:22 +02:00
Michael Niedermayer	9ac124c889	Merge commit '206895708ea2b464755d340e44501daf9a07c310' * commit '206895708ea2b464755d340e44501daf9a07c310': x86inc: Remove our FMA4 support Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-14 15:54:23 +02:00
Michael Niedermayer	12e4493f9c	Merge commit 'c108ba0175d4fc3a3253a8b0f782fbfb96ba5098' * commit 'c108ba0175d4fc3a3253a8b0f782fbfb96ba5098': x86inc: Use VEX-encoded instructions in AVX functions Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-14 15:48:34 +02:00
Jason Garrett-Glaser	a3fabc6cb3	x86: more AVX2 framework Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2013-10-14 12:41:56 +01:00
Jason Garrett-Glaser	c6908d6b4b	x86inc: FMA3/4 Support Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2013-10-14 12:41:54 +01:00
Derek Buitenhuis	206895708e	x86inc: Remove our FMA4 support This is so we can sync to x264's version of FMA4 support. This partially reverts commit `79687079a9`. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2013-10-14 12:39:29 +01:00
Henrik Gramner	c108ba0175	x86inc: Use VEX-encoded instructions in AVX functions Automatically use VEX encoding in AVX/AVX2/XOP/FMA3/FMA4 functions for all instructions that exist in a VEX-encoded version. This change makes it easier to extend existing code to use AVX2. Also add support for AVX emulation of a few instructions that were missing before. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2013-10-14 12:36:11 +01:00
Michael Niedermayer	31d0d35560	Merge remote-tracking branch 'qatar/master' * qatar/master: x86inc: Remove .rodata kludges Conflicts: libavutil/x86/x86inc.asm Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-09 14:29:42 +02:00
Henrik Gramner	ad7d7d4f6a	x86inc: Remove .rodata kludges The Mach-O bug was fixed in yasm 0.8.0 and we don't support versions that old anymore. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2013-10-09 07:44:30 -04:00
Michael Niedermayer	19c3890819	Merge commit '3e2fa991db7ef172579422accd61624d52777e5a' * commit '3e2fa991db7ef172579422accd61624d52777e5a': x86inc: remove misaligned cpu flag Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-08 12:02:21 +02:00
Michael Niedermayer	31d9aa6b2e	Merge commit '71155665414b551ad350622d5abed20e58371fbf' * commit '71155665414b551ad350622d5abed20e58371fbf': x86inc: various minor backports from x264 Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-08 11:57:39 +02:00
Michael Niedermayer	3f965ab95d	Merge commit '47f9d7ce5493e119e09d1227d017414feaaf8d97' * commit '47f9d7ce5493e119e09d1227d017414feaaf8d97': x86inc: Check for __OUTPUT_FORMAT__ having a value of "x64" Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-08 11:37:22 +02:00
Michael Niedermayer	1f17619fe4	Merge commit 'bbe4a6db44f0b55b424a5cc9d3e89cd88e250450' * commit 'bbe4a6db44f0b55b424a5cc9d3e89cd88e250450': x86inc: Utilize the shadow space on 64-bit Windows Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-08 11:23:00 +02:00
Michael Niedermayer	17d9c7c208	Merge commit '3fb78e99a04d0ed8db834d813d933eb86c37142a' * commit '3fb78e99a04d0ed8db834d813d933eb86c37142a': x86inc: create xm# and ym#, analogous to m# Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-08 11:15:17 +02:00
Michael Niedermayer	3352fdb292	Merge commit '49ebe3f9fe02174ae7e14548001fd146ed375cc2' * commit '49ebe3f9fe02174ae7e14548001fd146ed375cc2': x86inc: fix some corner cases of SWAP Merged-by: Michael Niedermayer <michaelni@gmx.at>	2013-10-08 11:07:03 +02:00


331 Commits