FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-26 19:01:44 +02:00

Author	SHA1	Message	Date
Jason Garrett-Glaser	a3fabc6cb3	x86: more AVX2 framework Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2013-10-14 12:41:56 +01:00
Jason Garrett-Glaser	c6908d6b4b	x86inc: FMA3/4 Support Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2013-10-14 12:41:54 +01:00
Derek Buitenhuis	206895708e	x86inc: Remove our FMA4 support This is so we can sync to x264's version of FMA4 support. This partially reverts commit `79687079a9`. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2013-10-14 12:39:29 +01:00
Henrik Gramner	c108ba0175	x86inc: Use VEX-encoded instructions in AVX functions Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4 functions for all instructions that exist in a VEX-encoded version. This change makes it easier to extend existing code to use AVX2. Also add support for AVX emulation of a few instructions that were missing before. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2013-10-14 12:36:11 +01:00
Henrik Gramner	ad7d7d4f6a	x86inc: Remove .rodata kludges The Mach-O bug was fixed in yasm 0.8.0 and we don't support versions that old anymore. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2013-10-09 07:44:30 -04:00
Henrik Gramner	3e2fa991db	x86inc: remove misaligned cpu flag Prevents a crash if the misaligned exception mask bit is cleared for some reason. Misaligned SSE functions are only used on AMD Phenom CPUs and the benefit is minuscule. They also require modifying the MXCSR control register and by removing those functions we can get rid of that complexity altogether. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2013-10-07 06:27:38 -04:00
Jason Garrett-Glaser	7115566541	x86inc: various minor backports from x264 Small backports that sneaked into other asm commits in x264. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2013-10-07 06:27:22 -04:00
Derek Buitenhuis	47f9d7ce54	x86inc: Check for __OUTPUT_FORMAT__ having a value of "x64" This is also a valid value for WIN64. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2013-10-07 06:27:08 -04:00
Henrik Gramner	bbe4a6db44	x86inc: Utilize the shadow space on 64-bit Windows Store XMM6 and XMM7 in the shadow space in functions that clobber them. This way we don't have to adjust the stack pointer as often, reducing the number of instructions as well as code size. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2013-10-07 06:25:35 -04:00
Loren Merritt	3fb78e99a0	x86inc: create xm# and ym#, analogous to m# For when we want to mix simd sizes within one function. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2013-10-07 06:25:19 -04:00
Loren Merritt	49ebe3f9fe	x86inc: fix some corner cases of SWAP SWAP with >=3 named (rather than numbered) args PERMUTE followed by SWAP with 2 named args used to produce the wrong permutation Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2013-10-07 06:25:06 -04:00
Henrik Gramner	63f0d62310	x86inc: Use SSE instead of SSE2 for copying data Reduces code size because movaps/movups is one byte shorter than movdqa/movdqu. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2013-10-07 06:24:33 -04:00
Henrik Gramner	ad76e6e7e1	x86inc: Set ELF hidden visibility for global constants Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2013-10-07 06:24:13 -04:00
Loren Merritt	25cb0c1a1e	x86inc: activate REP_RET automatically Now RET checks whether it immediately follows a branch, so the programmer doesn't have to keep track of that condition. REP_RET is still needed manually when it's a branch target, but that's much rarer. The implementation involves lots of spurious labels, but that's OK because we strip them. Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>	2013-10-07 06:17:59 -04:00
Alex Smith	08fa828b3f	avutil: Fix compilation with inline asm disabled on mingw Because of -Werror=implicit-function-declaration the build will fail. Signed-off-by: Martin Storsjö <martin@martin.st>	2013-09-22 00:50:32 +03:00
Diego Biurrun	79aec43ce8	x86: Add and use more convenience macros to check CPU extension availability	2013-08-29 13:07:37 +02:00
Diego Biurrun	8410d6e93c	avutil: Refactor CPU extension availability macros	2013-08-28 23:54:14 +02:00
Diego Biurrun	b78b10c4b7	avutil: Move internal CPU detection function declarations to private header	2013-08-28 23:54:14 +02:00
Diego Biurrun	3ac7fa81b2	Consistently use "cpu_flags" as variable/parameter name for CPU flags	2013-07-18 00:31:35 +02:00
Loren Merritt	c8b920a9b7	lls/x86: use 3-operator vaddpd in ADDPD_MEM Fixes build with yasm-1.1 Signed-off-by: Anton Khirnov <anton@khirnov.net>	2013-07-02 10:15:09 +02:00
Loren Merritt	1221bb6239	x86: lpc: fix a segfault in av_evaluate_lls_sse2()	2013-06-30 23:11:19 +00:00
Loren Merritt	b545179fdf	x86: lpc: simd av_evaluate_lls 1.5x-1.8x faster on sandybridge Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2013-06-29 13:23:57 +02:00
Loren Merritt	502ab21af0	x86: lpc: simd av_update_lls 4x-6x faster on sandybridge Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2013-06-29 13:23:57 +02:00
Diego Biurrun	1fda184a85	avutil: Add av_cold attributes to init functions missing them	2013-05-04 22:48:05 +02:00
Christophe Gisquet	566b7a20fd	x86: float dsp: butterflies_float SSE 97c -> 49c Some codecs could benefit from more unrolling, but AAC doesn't.	2013-05-03 08:08:02 +02:00
Ronald S. Bultje	b93b27edb0	dsputil: Make dsputil selectable Signed-off-by: Martin Storsjö <martin@martin.st>	2013-04-10 11:04:05 +03:00
Christophe Gisquet	2e81acc687	x86inc: Fix number of operands for cmp* instructions cmp{p,s}{s,d} instructions do take an imm8 operand. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2013-04-09 23:55:30 +02:00
Diego Biurrun	b6649ab503	cosmetics: Remove unnecessary extern keywords from function declarations	2013-03-27 14:21:45 +01:00
Ronald S. Bultje	0c0828ecc5	x86: Use simple nop codes for <= sse (rather than <= mmx) The "CentaurHauls family 6 model 9 stepping 8" family of CPUs (flags: fpu vme de pse tsc msr cx8 sep mtrr pge mov pat mmx fxsr sse up rng rng_en ace ace_en) SIGILLs on long nop codes. Signed-off-by: Martin Storsjö <martin@martin.st>	2013-02-19 22:33:19 +02:00
Diego Biurrun	4db96649ca	avutil: Ensure that emms_c is always defined, even on non-x86	2013-02-14 19:29:04 +01:00
Diego Biurrun	ab441e20ff	avutil: Move emms code to x86-specific header	2013-02-14 17:37:34 +01:00
Ronald S. Bultje	d56668bd80	floatdsp: move scalarproduct_float from dsputil to avfloatdsp. This makes the aac decoder and all voice codecs independent of dsputil.	2013-01-22 11:55:42 -08:00
Ronald S. Bultje	42d3246948	floatdsp: move vector_fmul_reverse from dsputil to avfloatdsp. Now, nellymoserenc and aacenc no longer depend on dsputil. Independent of this patch, wmaprodec also does not depend on dsputil, so I removed it from there also.	2013-01-22 11:55:42 -08:00
Ronald S. Bultje	55aa03b9f8	floatdsp: move vector_fmul_add from dsputil to avfloatdsp.	2013-01-22 11:55:42 -08:00
Martin Storsjö	f4facd2ce7	x86: Add a Yasm-based emms() replacement This provides a fallback when building with Yasm enabled, but neither inline assembly, nor the _mm_empty intrinsic are available or enabled. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2013-01-18 22:02:13 +01:00
Diego Biurrun	d633d12b2c	x86inc: Add cvisible macro for C functions with public prefix This allows defining externally visible library symbols. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2013-01-18 22:02:03 +01:00
Diego Biurrun	ef5d41a553	x86inc: Rename "program_name" to "private_prefix" The new name is more descriptive and will allow defining a separate public prefix for externally visible library symbols. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2013-01-18 20:29:53 +01:00
Martin Storsjö	973b4d44f1	float_dsp: Add #ifdef HAVE_INLINE_ASM around vector_fmul_window This fixes builds on 64bit MSVC. Signed-off-by: Martin Storsjö <martin@martin.st>	2013-01-17 19:07:35 +02:00
Justin Ruggles	e034cc6c60	lavc: Move vector_fmul_window to AVFloatDSPContext Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2013-01-16 10:45:45 +01:00
Diego Biurrun	dae1d507af	x86: Add PAVGB macro to abstract pavgb/pavgusb instruction via cpuflags	2013-01-15 17:29:43 +01:00
Diego Biurrun	320e1d0df3	x86: ABSB2: port to cpuflags	2013-01-15 11:18:51 +01:00
Diego Biurrun	094a7405e5	x86: ABSB: port to cpuflags	2013-01-15 11:18:51 +01:00
Diego Biurrun	51969a652c	x86: ABS2: port to cpuflags	2013-01-14 21:56:55 +01:00
Diego Biurrun	5b4dfbffc2	x86: ABS1: port to cpuflags	2013-01-06 13:57:01 +01:00
Ronald S. Bultje	a34d9ad969	lavc: merge latest x86inc.asm fixes with x264 Unbreak NASM support. Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2012-12-19 07:27:33 +01:00
Janne Grunau	0995ad8db4	x86inc: fully concatenate tokens to fix macro expansion for nasm Fixes build errors with nasm introduced in `6f40e9f070` for stack memory alignment. Noticed by BugMaster.	2012-12-13 23:57:09 +01:00
Ronald S. Bultje	140367aff9	x86inc: fix stack alignment on win64 Signed-off-by: Martin Storsjö <martin@martin.st>	2012-12-12 21:30:49 +02:00
Ronald S. Bultje	6f40e9f070	x86inc: support stack mem allocation and re-alignment in PROLOGUE Use this in VP8/H264-8bit loopfilter functions so they can be used if there is no aligned stack (e.g. MSVC 32bit or ICC 10.x). Signed-off-by: Luca Barbato <lu_zero@gentoo.org>	2012-12-12 05:23:46 +01:00
Justin Ruggles	1c012e6bfb	x86: float_dsp: fix loading of the len parameter on x86-32	2012-12-07 21:19:29 -05:00
Justin Ruggles	ecc8b02194	x86: float_dsp: fix compilation of ff_vector_dmul_scalar_avx() on x86-32 Signed-off-by: Janne Grunau <janne-libav@jannau.net>	2012-12-06 14:11:15 +01:00
Justin Ruggles	b30a363331	x86: af_volume: add SSE2/SSSE3/AVX-optimized s32 volume scaling	2012-12-05 11:23:37 -05:00
Justin Ruggles	ac7eb4cb20	float_dsp: add vector_dmul_scalar() to multiply a vector of doubles Include x86-optimized versions for SSE2 and AVX.	2012-12-05 11:23:36 -05:00
Diego Biurrun	490df522c7	x86: cpu: Drop unused HAVE_RWEFLAGS condition The test for rweflags was dropped in a previous commit.	2012-11-28 00:28:09 +01:00
Justin Ruggles	947f933687	x86: float_dsp: add SSE version of vector_fmul_scalar()	2012-11-26 11:30:19 -05:00
Diego Biurrun	87af05c575	x86: SPLATD: port to cpuflags	2012-11-18 18:34:05 +01:00
Diego Biurrun	26301caaa1	x86: mmx2 ---> mmxext in asm constructs	2012-11-14 00:58:51 +01:00
Diego Biurrun	2b479bcab0	build: Drop AVX assembly ifdefs An assembler able to cope with AVX instructions is now required.	2012-11-11 20:43:28 +01:00
Diego Biurrun	f0d124f005	x86inc: Set program_name outside of x86inc.asm This reduces the local difference to the x264 upstream version.	2012-11-11 11:06:19 +01:00
Diego Biurrun	4b60fac419	x86: PALIGNR: port to cpuflags	2012-11-09 21:31:31 +01:00
Diego Biurrun	dbb37e7711	x86: PABSW: port to cpuflags	2012-11-05 14:51:10 +01:00
Diego Biurrun	0a7a94f2e5	x86: Refactor PSWAPD fallback implementations and port to cpuflags	2012-11-02 17:05:29 +01:00
Diego Biurrun	26f01bd106	x86: PMINUB: port to cpuflags	2012-11-02 15:38:15 +01:00
Diego Biurrun	61bc2bc7d4	x86util: Add cpuflags_mmxext alias for cpuflags_mmx2 "mmxext" is a more sensible name and more common in outside projects.	2012-11-02 15:22:34 +01:00
Diego Biurrun	012f73e271	x86inc: Only define program_name if the macro is unset This allows overriding the value from outside of the file.	2012-11-02 14:38:00 +01:00
Dave Yeo	9c167914a1	x86: Fix assembly with NASM Unlike YASM, NASM only looks for include files in the current directory, not in the directory that included files reside in. Signed-off-by: Diego Biurrun <diego@biurrun.de>	2012-10-31 10:20:35 +01:00
Diego Biurrun	588fafe7f3	x86: MMX2 ---> MMXEXT in macro names	2012-10-31 01:04:55 +01:00
Diego Biurrun	6860b4081d	x86: include x86inc.asm in x86util.asm This is necessary to allow refactoring some x86util macros with cpuflags.	2012-10-31 00:37:42 +01:00
Ronald S. Bultje	08b028c18d	Remove INIT_AVX from x86inc.asm.	2012-10-29 14:51:14 -07:00
Diego Biurrun	a7329e5fc2	x86: get_cpu_flags: add necessary ifdefs around function body ff_get_cpu_flags_x86() requires cpuid(), which is conditionally defined elsewhere in the file. Surrounding the function body with ifdefs allows building even when cpuid is not defined. An empty cpuflags mask is returned in this case.	2012-10-04 19:29:14 +02:00
Diego Biurrun	f6fbce761e	x86: Drop CPU detection intrinsics Now that there is CPU detection in YASM, there will always be one of inline or external assembly enabled, which obviates the need to fall back on CPU detection through compiler intrinsics.	2012-10-04 19:29:14 +02:00
Diego Biurrun	1f6d86991f	x86: Add YASM implementations of cpuid and xgetbv from x264 This allows detecting CPU features with builds that have neither gcc inline assembly nor the right compiler intrinsics enabled.	2012-10-04 19:29:14 +02:00
Diego Biurrun	54b243141e	x86: cpu: Break out test for cpuid capabilities into separate function	2012-10-04 18:09:21 +02:00
Diego Biurrun	cc5e9e5ff0	x86: ff_get_cpu_flags_x86(): Avoid a pointless variable indirection	2012-10-04 17:58:42 +02:00
Diego Biurrun	e0c6cce447	x86: Replace checks for CPU extensions and flags by convenience macros This separates code relying on inline from that relying on external assembly and fixes instances where the coalesced check was incorrect.	2012-09-08 18:18:34 +02:00
Justin Ruggles	7327525997	x86: float_dsp: fix ff_vector_fmac_scalar_avx() on Win64 The SWAP macro does not work for explicit xmm/ymm usage, so instead just move the scalar value from xmm2 to xmm0.	2012-09-07 14:49:10 -04:00
Diego Biurrun	f82c4fb27f	x86: Add convenience macros to check for CPU extensions and flags	2012-09-04 01:44:59 +02:00
Diego Biurrun	17337f54c0	x86: Split inline and external assembly #ifdefs	2012-08-31 01:53:25 +02:00
Diego Biurrun	a886b279a0	x86: cosmetics: Comment some #endifs for better readability	2012-08-30 18:50:33 +02:00
Loren Merritt	7a1944b907	vf_hqdn3d: x86 asm 13% faster on penryn, 16% on sandybridge, 15% on bulldozer Not simd; a compiler should have generated this, but gcc didn't.	2012-08-26 10:49:14 +00:00
Justin Ruggles	6092dafb5a	lavr: x86: optimized 6-channel s16 to fltp conversion	2012-08-23 20:10:57 -04:00
Mans Rullgard	5b170c0bea	x86: remove FASTDIV inline asm GCC 4.3 and later do the right thing with the plain C code. Earlier versions in 32-bit mode generate one extra instruction, needlessly zeroing what would be the high half of the shifted value. At least two gcc configurations miscompile the inline asm in some situations. In 64-bit mode, all gcc versions generate imul r64, r64 followed by shr. On Intel i7 and later, this imul is faster than 32-bit mul. On older Intel and all AMD, it is slightly slower. On Atom it is much slower. Considering where the FASTDIV macro is used, any overall negative performance impact of this change should be negligible. If anyone cares, they should file a bug against gcc and get the instruction selection fixed. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-22 14:29:10 +01:00
Martin Storsjö	33e112847d	Add more missing includes after removing the implicit common.h Signed-off-by: Martin Storsjö <martin@martin.st>	2012-08-16 10:49:54 +03:00
Martin Storsjö	70766c2182	Add some more missing includes after removing the implicit common.h Signed-off-by: Martin Storsjö <martin@martin.st>	2012-08-15 23:48:48 +03:00
Mans Rullgard	070a402b60	x86: move MANGLE() and related macros to libavutil/x86/asm.h These x86-specific macros do not belong in generic code. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-09 00:58:20 +01:00
Mans Rullgard	c318626ce2	x86: rename libavutil/x86_cpu.h to libavutil/x86/asm.h This puts x86-specific things in the x86/ subdirectory where they belong. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-09 00:58:20 +01:00
Mans Rullgard	edd8226795	x86: fix build with nasm 2.08 It appears that something goes wrong in old nasm versions when the %+ operator is used in the last argument of a macro invocation and this argument is tested with %ifdef within the macro. This patch rearranges the macro arguments such that the %+ operator is never used in the last argument.	2012-08-07 15:24:34 +01:00
Mans Rullgard	180d43bc67	x86: use nop cpu directives only if supported nasm does not support 'CPU foonop' directives. This adds a configure test for the directive and uses it only if supported. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-07 15:22:20 +01:00
Mans Rullgard	7238265052	x86: fix rNmp macros with nasm For some reason, nasm requires this. No harm done to yasm. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-07 15:21:58 +01:00
Mans Rullgard	a3df4781f4	x86: add colons after labels nasm prints a warning if the colon is missing. Signed-off-by: Mans Rullgard <mans@mansr.com>	2012-08-07 15:20:56 +01:00
Diego Biurrun	239fdf1b4a	x86: build: replace mmx2 by mmxext Refactoring mmx2/mmxext YASM code with cpuflags will force renames. So switching to a consistent naming scheme beforehand is sensible. The name "mmxext" is more official and widespread and also the name of the CPU flag, as reported e.g. by the Linux kernel.	2012-08-03 22:51:05 +02:00
Diego Biurrun	ca844b7be9	x86: Use consistent 3dnowext function and macro name suffixes Currently there is a wild mix of 3dn2/3dnow2/3dnowext. Switching to "3dnowext", which is a more common name of the CPU flag, as reported e.g. by the Linux kernel, unifies this.	2012-08-03 14:00:47 +02:00
Loren Merritt	f8d8fe255d	x86inc: clip num_args to 7 on x86-32. This allows us to unconditionally set the cglobal num_args parameter to a bigger value, thus making writing yasm code even easier than before. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2012-07-28 08:29:45 -07:00
Ronald S. Bultje	96c9cc1094	x86inc: sync to latest version from x264.	2012-07-28 08:29:44 -07:00
Justin Ruggles	79687079a9	x86: add support for fmaddps fma4 instruction with abstraction to avx/sse	2012-07-27 11:25:48 -04:00
Ronald S. Bultje	30b45d9c38	x86inc: automatically insert vzeroupper for YMM functions.	2012-07-26 13:43:16 -07:00
Jason Garrett-Glaser	85a3c19ed1	dsputil: x86: add SHUFFLE_MASK_W macro Simplifies pshufb masks that operate on words.	2012-07-22 16:56:58 -04:00
Ronald S. Bultje	358d854df8	x86/cpu: implement get/set_eflags using intrinsics Signed-off-by: Diego Biurrun <diego@biurrun.de> Signed-off-by: Martin Storsjö <martin@martin.st>	2012-07-10 14:33:32 +03:00
Ronald S. Bultje	c0ee695bd7	x86/cpu: implement support for cpuid through intrinsics Signed-off-by: Martin Storsjö <martin@martin.st>	2012-07-10 14:33:24 +03:00
Ronald S. Bultje	3f150ffba3	x86/cpu: implement support for xgetbv through intrinsics Signed-off-by: Martin Storsjö <martin@martin.st>	2012-07-10 14:33:17 +03:00
Ronald S. Bultje	07b287020c	x86/timer: implement an intrinsic-based version for rdtsc (AV_READ_TIME).	2012-07-07 13:35:07 -07:00

206 Commits