Henrik Gramner
c108ba0175
x86inc: Use VEX-encoded instructions in AVX functions
Automatically use VEX encoding in AVX/AVX2/XOP/FMA3/FMA4
functions for all instructions that exist in a VEX-encoded
version.
This change makes it easier to extend existing code to use AVX2.
Also add support for AVX emulation of a few instructions that
were missing before.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:36:11 +01:00
Henrik Gramner
ad7d7d4f6a
x86inc: Remove .rodata kludges
The Mach-O bug was fixed in yasm 0.8.0 and we don't
support versions that old anymore.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-09 07:44:30 -04:00
Henrik Gramner
3e2fa991db
x86inc: remove misaligned cpu flag
Prevents a crash if the misaligned exception mask bit is
cleared for some reason.
Misaligned SSE functions are only used on AMD Phenom CPUs
and the benefit is minuscule. They also require modifying
the MXCSR control register and by removing those functions
we can get rid of that complexity altogether.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:27:38 -04:00
Jason Garrett-Glaser
7115566541
x86inc: various minor backports from x264
Small backports that sneaked into other asm commits in x264.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:27:22 -04:00
Derek Buitenhuis
47f9d7ce54
x86inc: Check for __OUTPUT_FORMAT__ having a value of "x64"
This is also a valid value for WIN64.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:27:08 -04:00
Henrik Gramner
bbe4a6db44
x86inc: Utilize the shadow space on 64-bit Windows
Store XMM6 and XMM7 in the shadow space in functions that
clobber them. This way we don't have to adjust the stack
pointer as often, reducing the number of instructions as
well as code size.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:25:35 -04:00
Loren Merritt
3fb78e99a0
x86inc: create xm# and ym#, analogous to m#
For when we want to mix simd sizes within one function.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:25:19 -04:00
Loren Merritt
49ebe3f9fe
x86inc: fix some corner cases of SWAP
SWAP with >=3 named (rather than numbered) args, and
PERMUTE followed by SWAP with 2 named args, both
used to produce the wrong permutation.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:25:06 -04:00
Henrik Gramner
63f0d62310
x86inc: Use SSE instead of SSE2 for copying data
Reduces code size because movaps/movups is one byte
shorter than movdqa/movdqu.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:24:33 -04:00
Henrik Gramner
ad76e6e7e1
x86inc: Set ELF hidden visibility for global constants
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:24:13 -04:00
Loren Merritt
25cb0c1a1e
x86inc: activate REP_RET automatically
Now RET checks whether it immediately follows a branch, so the
programmer doesn't have to keep track of that condition. REP_RET
is still needed manually when it's a branch target, but that's
much rarer.
The implementation involves lots of spurious labels, but that's OK
because we strip them.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:17:59 -04:00
Alex Smith
08fa828b3f
avutil: Fix compilation with inline asm disabled on mingw
Because of -Werror=implicit-function-declaration, the build would otherwise fail.
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-09-22 00:50:32 +03:00
Diego Biurrun
79aec43ce8
x86: Add and use more convenience macros to check CPU extension availability
2013-08-29 13:07:37 +02:00
Diego Biurrun
8410d6e93c
avutil: Refactor CPU extension availability macros
2013-08-28 23:54:14 +02:00
Diego Biurrun
b78b10c4b7
avutil: Move internal CPU detection function declarations to private header
2013-08-28 23:54:14 +02:00
Diego Biurrun
3ac7fa81b2
Consistently use "cpu_flags" as variable/parameter name for CPU flags
2013-07-18 00:31:35 +02:00
Loren Merritt
c8b920a9b7
lls/x86: use 3-operand vaddpd in ADDPD_MEM
Fixes build with yasm-1.1
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2013-07-02 10:15:09 +02:00
Loren Merritt
1221bb6239
x86: lpc: fix a segfault in av_evaluate_lls_sse2()
2013-06-30 23:11:19 +00:00
Loren Merritt
b545179fdf
x86: lpc: simd av_evaluate_lls
1.5x-1.8x faster on sandybridge
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-06-29 13:23:57 +02:00
Loren Merritt
502ab21af0
x86: lpc: simd av_update_lls
4x-6x faster on sandybridge
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-06-29 13:23:57 +02:00
Diego Biurrun
1fda184a85
avutil: Add av_cold attributes to init functions missing them
2013-05-04 22:48:05 +02:00
Christophe Gisquet
566b7a20fd
x86: float dsp: butterflies_float SSE
97 -> 49 cycles
Some codecs could benefit from more unrolling, but AAC doesn't.
2013-05-03 08:08:02 +02:00
Ronald S. Bultje
b93b27edb0
dsputil: Make dsputil selectable
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-04-10 11:04:05 +03:00
Christophe Gisquet
2e81acc687
x86inc: Fix number of operands for cmp* instructions
cmp{p,s}{s,d} instructions do take an imm8 operand.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-04-09 23:55:30 +02:00
Diego Biurrun
b6649ab503
cosmetics: Remove unnecessary extern keywords from function declarations
2013-03-27 14:21:45 +01:00
Ronald S. Bultje
0c0828ecc5
x86: Use simple nop codes for <= sse (rather than <= mmx)
The "CentaurHauls family 6 model 9 stepping 8" family of CPUs
(flags: fpu vme de pse tsc msr cx8 sep mtrr pge mov pat mmx fxsr sse
up rng rng_en ace ace_en) SIGILLs on long nop codes.
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-02-19 22:33:19 +02:00
Diego Biurrun
4db96649ca
avutil: Ensure that emms_c is always defined, even on non-x86
2013-02-14 19:29:04 +01:00
Diego Biurrun
ab441e20ff
avutil: Move emms code to x86-specific header
2013-02-14 17:37:34 +01:00
Ronald S. Bultje
d56668bd80
floatdsp: move scalarproduct_float from dsputil to avfloatdsp.
This makes the aac decoder and all voice codecs independent of dsputil.
2013-01-22 11:55:42 -08:00
Ronald S. Bultje
42d3246948
floatdsp: move vector_fmul_reverse from dsputil to avfloatdsp.
Now, nellymoserenc and aacenc no longer depend on dsputil. Independent
of this patch, wmaprodec also does not depend on dsputil, so I removed
it from there also.
2013-01-22 11:55:42 -08:00
Ronald S. Bultje
55aa03b9f8
floatdsp: move vector_fmul_add from dsputil to avfloatdsp.
2013-01-22 11:55:42 -08:00
Martin Storsjö
f4facd2ce7
x86: Add a Yasm-based emms() replacement
This provides a fallback when building with Yasm enabled, but neither
inline assembly nor the _mm_empty intrinsic is available or enabled.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-01-18 22:02:13 +01:00
Diego Biurrun
d633d12b2c
x86inc: Add cvisible macro for C functions with public prefix
This allows defining externally visible library symbols.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-01-18 22:02:03 +01:00
Diego Biurrun
ef5d41a553
x86inc: Rename "program_name" to "private_prefix"
The new name is more descriptive and will allow defining a separate
public prefix for externally visible library symbols.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-01-18 20:29:53 +01:00
Martin Storsjö
973b4d44f1
float_dsp: Add #ifdef HAVE_INLINE_ASM around vector_fmul_window
This fixes builds on 64-bit MSVC.
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-01-17 19:07:35 +02:00
Justin Ruggles
e034cc6c60
lavc: Move vector_fmul_window to AVFloatDSPContext
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-01-16 10:45:45 +01:00
Diego Biurrun
dae1d507af
x86: Add PAVGB macro to abstract pavgb/pavgusb instruction via cpuflags
2013-01-15 17:29:43 +01:00
Diego Biurrun
320e1d0df3
x86: ABSB2: port to cpuflags
2013-01-15 11:18:51 +01:00
Diego Biurrun
094a7405e5
x86: ABSB: port to cpuflags
2013-01-15 11:18:51 +01:00
Diego Biurrun
51969a652c
x86: ABS2: port to cpuflags
2013-01-14 21:56:55 +01:00
Diego Biurrun
5b4dfbffc2
x86: ABS1: port to cpuflags
2013-01-06 13:57:01 +01:00
Ronald S. Bultje
a34d9ad969
lavc: merge latest x86inc.asm fixes with x264
Unbreak NASM support.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2012-12-19 07:27:33 +01:00
Janne Grunau
0995ad8db4
x86inc: fully concatenate tokens to fix macro expansion for nasm
Fixes build errors with nasm introduced in 6f40e9f070 for stack
memory alignment. Noticed by BugMaster.
2012-12-13 23:57:09 +01:00
Ronald S. Bultje
140367aff9
x86inc: fix stack alignment on win64
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-12-12 21:30:49 +02:00
Ronald S. Bultje
6f40e9f070
x86inc: support stack mem allocation and re-alignment in PROLOGUE
Use this in VP8/H264-8bit loopfilter functions so they can be used if
there is no aligned stack (e.g. MSVC 32-bit or ICC 10.x).
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2012-12-12 05:23:46 +01:00
Justin Ruggles
1c012e6bfb
x86: float_dsp: fix loading of the len parameter on x86-32
2012-12-07 21:19:29 -05:00
Justin Ruggles
ecc8b02194
x86: float_dsp: fix compilation of ff_vector_dmul_scalar_avx() on x86-32
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2012-12-06 14:11:15 +01:00
Justin Ruggles
b30a363331
x86: af_volume: add SSE2/SSSE3/AVX-optimized s32 volume scaling
2012-12-05 11:23:37 -05:00
Justin Ruggles
ac7eb4cb20
float_dsp: add vector_dmul_scalar() to multiply a vector of doubles
Include x86-optimized versions for SSE2 and AVX.
2012-12-05 11:23:36 -05:00
Diego Biurrun
490df522c7
x86: cpu: Drop unused HAVE_RWEFLAGS condition
The test for rweflags was dropped in a previous commit.
2012-11-28 00:28:09 +01:00