James Almer
e229df9478
x86/aacpsdsp: add ff_ps_hybrid_synthesis_deint_{sse,sse4}
About 2x faster than the C version.
2017-06-18 22:33:27 -03:00
Henrik Gramner
aad1b6786e
x86inc: Add some additional cpuflag relations
Simplifies writing assembly code that depends on available instructions.
LZCNT implies SSE2
BMI1 implies AVX+LZCNT
AVX2 implies BMI2
2017-06-12 11:41:25 +02:00
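A rough C model of how implication relations like these can be encoded, assuming (as a simplification) that each composite flag constant ORs in the bits of everything it implies, so a single mask test covers the whole chain. The bit positions, the `BIT_*`/`CPUFLAGS_*` names and `has_feature()` are hypothetical; only the three relations listed above come from the commit, while "AVX implies SSE2" and "BMI2 implies BMI1" are assumptions added to keep the example closed.

```c
#include <stdbool.h>

/* Hypothetical bit positions, one per instruction set extension. */
enum {
    BIT_SSE2  = 1u << 0,
    BIT_AVX   = 1u << 1,
    BIT_LZCNT = 1u << 2,
    BIT_BMI1  = 1u << 3,
    BIT_BMI2  = 1u << 4,
    BIT_AVX2  = 1u << 5,
};

/* Each constant includes the constants of the flags it implies,
 * so implications are transitive without extra bookkeeping. */
#define CPUFLAGS_SSE2  (BIT_SSE2)
#define CPUFLAGS_AVX   (BIT_AVX   | CPUFLAGS_SSE2)                 /* assumed, simplified */
#define CPUFLAGS_LZCNT (BIT_LZCNT | CPUFLAGS_SSE2)                 /* LZCNT implies SSE2 */
#define CPUFLAGS_BMI1  (BIT_BMI1  | CPUFLAGS_AVX | CPUFLAGS_LZCNT)  /* BMI1 implies AVX+LZCNT */
#define CPUFLAGS_BMI2  (BIT_BMI2  | CPUFLAGS_BMI1)                 /* assumed */
#define CPUFLAGS_AVX2  (BIT_AVX2  | CPUFLAGS_BMI2)                 /* AVX2 implies BMI2 */

/* True if code built for `target` may use instructions guarded by `feature_bit`. */
static bool has_feature(unsigned target, unsigned feature_bit)
{
    return (target & feature_bit) != 0;
}
```

With this layout, an AVX2 function can test for LZCNT directly (`has_feature(CPUFLAGS_AVX2, BIT_LZCNT)` is true) without spelling out the intermediate BMI2 and BMI1 steps.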
Anton Mitrofanov
d991b3e8a8
x86inc: Remove argument from WIN64_RESTORE_XMM
The use of rsp was effectively hardcoded there, and passing any other register
probably didn't work with stack_size > 0.
2017-06-09 13:43:01 +02:00
Henrik Gramner
cd4ca82459
x86inc: Prefer r14/r15 over r12/r13 on x86-64
Due to a peculiarity in the ModR/M addressing encoding, the r12 and r13
registers sometimes require an additional byte when used as a base register.
r14 and r15 don't have that issue, so prefer using them.
2017-06-09 13:43:00 +02:00
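To make the encoding detail concrete, here is a small C sketch that counts the ModR/M, SIB and displacement bytes needed for a `[base + disp]` memory operand in 64-bit mode. It follows the standard x86-64 rules: a base whose low three register bits are `100` (rsp, r12) forces a SIB byte, and one whose low bits are `101` (rbp, r13) cannot use the no-displacement form, because that encoding means RIP-relative addressing in 64-bit mode, so a zero disp8 must be emitted instead. REX prefixes, opcodes and index registers are ignored, and `addr_bytes()` is a made-up helper name.

```c
#include <stdint.h>

/* Register numbers as used in x86-64 encoding (REX.B supplies bit 3):
 * rax=0, ..., rsp=4, rbp=5, rsi=6, rdi=7, r8=8, ..., r12=12, r13=13, r14=14, r15=15. */

/* ModR/M + SIB + displacement bytes for a [base + disp] operand. */
static int addr_bytes(int base_reg, int32_t disp)
{
    int low = base_reg & 7;      /* only the low 3 bits land in ModR/M.rm */
    int bytes = 1;               /* the ModR/M byte itself */

    if (low == 4)                /* rm=100 means "SIB byte follows": rsp, r12 */
        bytes += 1;

    if (disp == 0 && low != 5)   /* mod=00: no displacement */
        return bytes;
    if (disp >= -128 && disp <= 127)
        return bytes + 1;        /* mod=01: disp8 (rbp/r13 need it even for 0) */
    return bytes + 4;            /* mod=10: disp32 */
}
```

For example, `addr_bytes(14, 0)` is 1 while `addr_bytes(12, 0)` and `addr_bytes(13, 0)` are 2, matching `mov eax, [r14]` (3 bytes total) versus `mov eax, [r12]` (4 bytes total). With a nonzero 8-bit displacement r13 no longer costs extra, but r12 still needs its SIB byte, which is why the penalty applies only "sometimes".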
Henrik Gramner
88dcdfad09
x86inc: Make REP_RET identical to RET in SSSE3+ functions
There's no point in emitting a rep prefix before ret on modern CPUs.
2017-06-09 13:43:00 +02:00
Henrik Gramner
406e0ddc0b
x86inc: Fix call with memory operands
We overload the `call` instruction with a macro, but it would misbehave when
the macro argument wasn't a valid identifier. Fix it by explicitly checking
if the argument is an identifier.
2017-06-09 13:43:00 +02:00
James Almer
0fbc7a2169
x86/float_dsp: remove usage of integer instructions
2017-05-12 23:34:49 -03:00
James Almer
f1d80bc630
x86/float_dsp: add ff_vector_fmul_reverse_avx2
~20% faster than AVX.
Signed-off-by: James Almer <jamrial@gmail.com>
2017-04-11 21:35:35 -03:00
James Almer
ed9b25a148
x86/float_dsp: add ff_vector_dmac_scalar_{sse2,avx,fma3}
2017-04-10 12:18:55 -03:00
Clément Bœsch
f291a9a1ad
Merge commit '99434f4df81b6801b2b535d5b9143305595784f6'
* commit '99434f4df81b6801b2b535d5b9143305595784f6':
float_dsp: Have implementation match function pointer prototype
Merged-by: Clément Bœsch <cboesch@gopro.com>
2017-03-30 10:23:25 +02:00
James Almer
c97e986e90
Merge commit '7911186ed616ae81dd8617d6d0e8b08c818db9d8'
* commit '7911186ed616ae81dd8617d6d0e8b08c818db9d8':
emms: Give avpriv_emms_yasm() a more general name
Merged-by: James Almer <jamrial@gmail.com>
2017-03-23 18:28:56 -03:00
James Almer
29db87af52
Merge commit '6be7944ee2ec2f045e6eb9a93237e992c8b20ac4'
* commit '6be7944ee2ec2f045e6eb9a93237e992c8b20ac4':
x86: Add missing colons after assembly labels
Merged-by: James Almer <jamrial@gmail.com>
2017-03-23 18:05:27 -03:00
James Almer
d8962ffbd8
avutil/x86util: don't use movss in VBROADCASTSS macro when src and dst args are the same
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2017-03-21 19:15:00 -03:00
Clément Bœsch
3898e346b3
Merge commit '07e1f99a1bb41d1a615676140eefc85cf69fa793'
* commit '07e1f99a1bb41d1a615676140eefc85cf69fa793':
x86util: Document SBUTTERFLY macro
Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 18:38:07 +01:00
Clément Bœsch
8200b16a9c
Merge commit 'd7bc52bf456deba0f32d9fe5c288ec441f1ebef5'
* commit 'd7bc52bf456deba0f32d9fe5c288ec441f1ebef5':
imgutils: add a function for copying image data from GPU mapped memory
Merged-by: Clément Bœsch <u@pkh.me>
2017-03-20 08:34:10 +01:00
James Darnley
5336887867
avcodec/h264: sse2, avx h luma mbaff deblock/loop filter
x86-64 only
Yorkfield:
- sse2: ~2.17x (434 vs. 200 cycles)
Nehalem:
- sse2: ~2.94x (409 vs. 139 cycles)
Skylake:
- sse2: ~3.10x (370 vs. 119 cycles)
- avx: ~3.29x (370 vs. 112 cycles)
2017-02-18 20:26:52 +01:00
James Darnley
7627df15d4
x86util: import MOVHL macro
Originally committed to x264 in 1637239a by Henrik Gramner who has
agreed to re-license it as LGPL. Original commit message follows.
x86: Avoid some bypass delays and false dependencies
A bypass delay of 1-3 clock cycles may occur on some CPUs when transitioning
between int and float domains, so try to avoid that if possible.
2017-02-18 20:26:51 +01:00
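As a C intrinsics sketch of the idea (x86-64 with SSE2 assumed, via `<emmintrin.h>`): moving the high 64 bits of an integer vector into the low half can stay in the integer domain with `punpckhqdq` or `pshufd`, instead of going through the float-domain `movhlps`. The function names are illustrative, and this is not the MOVHL macro itself.

```c
#include <emmintrin.h>
#include <stdint.h>

/* High qword -> low qword with PUNPCKHQDQ (integer domain). */
static __m128i high_to_low_unpack(__m128i v)
{
    return _mm_unpackhi_epi64(v, v);
}

/* Same result with PSHUFD: dwords 2,3 copied into positions 0,1 (integer domain). */
static __m128i high_to_low_pshufd(__m128i v)
{
    return _mm_shuffle_epi32(v, _MM_SHUFFLE(3, 2, 3, 2));
}

/* Float-domain equivalent (MOVHLPS). The casts emit no code, but feeding the
 * result back into integer instructions may cost a bypass delay on some CPUs. */
static __m128i high_to_low_movhlps(__m128i v)
{
    __m128 f = _mm_castsi128_ps(v);
    return _mm_castps_si128(_mm_movehl_ps(f, f));
}

/* Read the low 64 bits of a vector. */
static uint64_t low_qword(__m128i v)
{
    return (uint64_t)_mm_cvtsi128_si64(v);
}
```

All three functions compute the same value; the point of the commit is that the first two keep integer data in the integer domain.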
James Darnley
9d815b7424
avcodec/x86: deduplicate PASS8ROWS macro
2017-02-18 20:26:49 +01:00
James Almer
8d5df204d0
Merge commit '8e9cd81d291b1010c625b2766058aadf4affb537'
* commit '8e9cd81d291b1010c625b2766058aadf4affb537':
x86: cpu: Detect Conroe CPUs and their slow shuffle unit
Merged-by: James Almer <jamrial@gmail.com>
2017-01-31 15:20:54 -03:00
James Almer
2eab48177d
Merge commit '7d7355aa92bb36ca0765c49a569a999bcb96f332'
* commit '7d7355aa92bb36ca0765c49a569a999bcb96f332':
x86: Add SSSE3_SLOW CPU flag and related convenience macros
Merged-by: James Almer <jamrial@gmail.com>
2017-01-31 15:17:19 -03:00
Henrik Gramner
cd09e3b349
x86inc: Avoid using eax/rax for storing the stack pointer
When allocating stack space with an alignment requirement that is larger
than the current stack alignment we need to store a copy of the original
stack pointer in order to be able to restore it later.
If we choose to use another register for this purpose, we should not pick
eax/rax, since it can be overwritten by the return value.
2017-01-09 16:00:29 +01:00
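The arithmetic behind such a prologue can be modelled in C. In this sketch stack addresses are plain integers, the "saved register" is just a struct field, and `align` is assumed to be a power of two; `struct frame`, `alloc_aligned()` and `restore_sp()` are hypothetical names. After allocating and rounding the pointer down, the original value can no longer be derived from the new one, which is why a copy must live somewhere the function body, including its return value, won't clobber.

```c
#include <stdint.h>

/* Model of an over-aligned stack allocation. */
struct frame {
    uintptr_t saved_sp;   /* copy of the incoming stack pointer */
    uintptr_t sp;         /* new, aligned stack pointer */
};

static struct frame alloc_aligned(uintptr_t sp, uintptr_t size, uintptr_t align)
{
    struct frame f;
    f.saved_sp = sp;                      /* must survive until the epilogue */
    f.sp = (sp - size) & ~(align - 1);    /* grow down, then round down */
    return f;
}

/* Epilogue: restore from the saved copy; f->sp + size is not the original value. */
static uintptr_t restore_sp(const struct frame *f)
{
    return f->saved_sp;
}
```

For an incoming pointer of 0x7ff8, a 40-byte allocation with 32-byte alignment lands at 0x7fc0, and 0x7fc0 + 40 is not 0x7ff8, so only the saved copy gets the original pointer back.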
Diego Biurrun
99434f4df8
float_dsp: Have implementation match function pointer prototype
libavutil/x86/float_dsp_init.c(144) : warning C4028: formal parameter 1 different from declaration
libavutil/x86/float_dsp_init.c(144) : warning C4028: formal parameter 2 different from declaration
2016-11-03 17:43:55 +01:00
Michael Niedermayer
051517648b
avutil/x86/emms: Document the emms_c() vs alloc/free relation.
Reviewed-by: Andreas Cadhalpun <andreas.cadhalpun@googlemail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2016-10-23 13:02:37 +02:00
Diego Biurrun
7911186ed6
emms: Give avpriv_emms_yasm() a more general name
2016-10-18 13:09:09 +02:00
Diego Biurrun
6be7944ee2
x86: Add missing colons after assembly labels
This fixes many warnings of the sort
warning: label alone on a line without a colon might be in error
2016-10-17 16:31:26 +02:00
Alexandra Hájková
07e1f99a1b
x86util: Document SBUTTERFLY macro
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2016-09-19 10:02:43 +02:00
Anton Khirnov
d7bc52bf45
imgutils: add a function for copying image data from GPU mapped memory
...
See https://software.intel.com/en-us/articles/copying-accelerated-video-decode-frame-buffers
2016-08-31 08:15:47 +02:00
Fiona Glaser
8e9cd81d29
x86: cpu: Detect Conroe CPUs and their slow shuffle unit
2016-07-20 18:43:28 +02:00
Diego Biurrun
7d7355aa92
x86: Add SSSE3_SLOW CPU flag and related convenience macros
2016-07-20 18:43:28 +02:00
James Almer
fd5e6a095f
x86util: Extend SPLATW for avx2
Integration into Libav by Josh de Kock <josh@itanimul.li>.
Signed-off-by: Alexandra Hájková <alexandra@khirnov.net>
2016-07-18 15:27:13 +02:00
Ronald S. Bultje
f0a2b6249b
vp9: add 16x16 idct avx2 (8-bit).
checkasm --bench, 10k runs, for *_add_${bpc}_${sub_idct}_${opt}, shows
that it's about 1.65x as fast as the AVX version for the full IDCT, and
similar speedups for the sub-IDCTs:
nop: 24.6
vp9_inv_dct_dct_16x16_add_8_1_c: 6444.8
vp9_inv_dct_dct_16x16_add_8_1_sse2: 638.6
vp9_inv_dct_dct_16x16_add_8_1_ssse3: 484.4
vp9_inv_dct_dct_16x16_add_8_1_avx: 661.2
vp9_inv_dct_dct_16x16_add_8_1_avx2: 311.5
vp9_inv_dct_dct_16x16_add_8_2_c: 6665.7
vp9_inv_dct_dct_16x16_add_8_2_sse2: 646.9
vp9_inv_dct_dct_16x16_add_8_2_ssse3: 455.2
vp9_inv_dct_dct_16x16_add_8_2_avx: 521.9
vp9_inv_dct_dct_16x16_add_8_2_avx2: 304.3
vp9_inv_dct_dct_16x16_add_8_4_c: 7022.7
vp9_inv_dct_dct_16x16_add_8_4_sse2: 647.4
vp9_inv_dct_dct_16x16_add_8_4_ssse3: 467.1
vp9_inv_dct_dct_16x16_add_8_4_avx: 446.1
vp9_inv_dct_dct_16x16_add_8_4_avx2: 297.0
vp9_inv_dct_dct_16x16_add_8_8_c: 6800.4
vp9_inv_dct_dct_16x16_add_8_8_sse2: 598.6
vp9_inv_dct_dct_16x16_add_8_8_ssse3: 465.7
vp9_inv_dct_dct_16x16_add_8_8_avx: 440.9
vp9_inv_dct_dct_16x16_add_8_8_avx2: 290.2
vp9_inv_dct_dct_16x16_add_8_16_c: 6626.6
vp9_inv_dct_dct_16x16_add_8_16_sse2: 599.5
vp9_inv_dct_dct_16x16_add_8_16_ssse3: 475.0
vp9_inv_dct_dct_16x16_add_8_16_avx: 469.9
vp9_inv_dct_dct_16x16_add_8_16_avx2: 286.4
2016-07-11 10:14:58 -04:00
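The speedups quoted above follow directly from the benchmark figures: each ratio is the AVX time divided by the AVX2 time for the same `sub_idct` value. A small C check over the numbers copied from the table (the array and function names are illustrative):

```c
#include <stddef.h>

/* AVX and AVX2 timings from the checkasm output above, per sub_idct value. */
static const int    sub_idct[] = { 1, 2, 4, 8, 16 };
static const double t_avx[]    = { 661.2, 521.9, 446.1, 440.9, 469.9 };
static const double t_avx2[]   = { 311.5, 304.3, 297.0, 290.2, 286.4 };

/* Speedup of AVX2 over AVX for entry i (higher is better). */
static double speedup(size_t i)
{
    return t_avx[i] / t_avx2[i];
}
```

For the full IDCT (`sub_idct` = 16) this gives 469.9 / 286.4, roughly 1.64, in line with the "about 1.65x" above; the sub-IDCT ratios range from roughly 1.50 (`sub_idct` = 4) to roughly 2.12 (`sub_idct` = 1).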
Matthieu Bouron
9eb3da2f99
asm: FF_-prefix internal macros used in inline assembly
See merge commit '39d6d3618d48625decaff7d9bdbb45b44ef2a805'.
2016-06-27 17:21:18 +02:00
Clément Bœsch
8ef57a0d61
Merge commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb'
* commit '41ed7ab45fc693f7d7fc35664c0233f4c32d69bb':
cosmetics: Fix spelling mistakes
Merged-by: Clément Bœsch <u@pkh.me>
2016-06-21 21:55:34 +02:00
Matt Oliver
5ca44ebd99
lavu/intmath.h: fix compilation with msvc10.
Signed-off-by: Matt Oliver <protogonoi@gmail.com>
2016-06-13 13:49:24 +10:00
James Almer
172af20852
x86/showcqt: use three operand format for some instructions
Fixes failures with yasm 1.1.0 and older
Signed-off-by: James Almer <jamrial@gmail.com>
2016-06-08 19:37:08 -03:00
James Almer
99b899483e
avutil/x86util: move haddps sse emulation from showcqt
Signed-off-by: James Almer <jamrial@gmail.com>
2016-06-08 14:18:00 -03:00
Diego Biurrun
1e9c5bf4c1
asm: FF_-prefix internal macros used in inline assembly
These macros conflict with system macros on Solaris, producing
truckloads of warnings about macro redefinition.
2016-05-28 19:18:26 +02:00
Anton Mitrofanov
2fb1d17a5a
x86inc: Enable AVX emulation in additional cases
Allows emulation to work when dst is equal to src2 as long as the
instruction is commutative, e.g. `addps m0, m1, m0`.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-05-16 10:31:24 +02:00
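A C model of the decision such an emulation has to make for a three-operand `op dst, src1, src2` on a CPU that only has the two-operand SSE forms. The instruction text is produced as strings purely for illustration; `emulate3()` and its output format are hypothetical, not x86inc's actual macro, and this sketch simply reports failure for the non-commutative `dst == src2` case.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Emulate "op dst, src1, src2" with two-operand forms, writing the resulting
 * instruction sequence into out. Returns false if it cannot be expressed
 * without a scratch register. */
static bool emulate3(const char *op, const char *dst, const char *src1,
                     const char *src2, bool commutative,
                     char *out, size_t outlen)
{
    if (strcmp(dst, src1) == 0) {
        /* Already in two-operand shape. */
        snprintf(out, outlen, "%s %s, %s", op, dst, src2);
    } else if (strcmp(dst, src2) == 0) {
        /* Copying src1 into dst first would destroy src2... */
        if (!commutative)
            return false;
        /* ...but a commutative op can just swap its sources. */
        snprintf(out, outlen, "%s %s, %s", op, dst, src1);
    } else {
        /* General case: copy src1, then operate in place. */
        snprintf(out, outlen, "movaps %s, %s; %s %s, %s",
                 dst, src1, op, dst, src2);
    }
    return true;
}
```

With this logic, `addps m0, m1, m0` becomes the single instruction `addps m0, m1`, while the same operand pattern for `subps` cannot be handled by a copy plus one instruction.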
Anton Mitrofanov
300fb0df84
x86inc: Improve handling of %ifid with multi-token parameters
The yasm/nasm preprocessor only checks the first token, which means that
parameters such as `dword [rax]` are treated as identifiers, which is
generally not what we want.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-05-16 10:31:20 +02:00
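The difference between checking only the first token and checking the whole parameter can be shown with a small C sketch. The helper names are hypothetical, tokens are split on spaces only, and the identifier rule is simplified to letters, digits, underscores and dots rather than NASM's full grammar. `dword [rax]` passes a first-token test because `dword` is an identifier, but fails once the entire string has to be one.

```c
#include <ctype.h>
#include <stdbool.h>

static bool is_ident_start(char c) { return isalpha((unsigned char)c) || c == '_' || c == '.'; }
static bool is_ident_char(char c)  { return isalnum((unsigned char)c) || c == '_' || c == '.'; }

/* Mimics a preprocessor that only inspects the first token. */
static bool first_token_is_ident(const char *s)
{
    if (!is_ident_start(*s))
        return false;
    for (s++; *s && *s != ' '; s++)
        if (!is_ident_char(*s))
            return false;
    return true;
}

/* Requires the whole parameter to be a single identifier. */
static bool whole_param_is_ident(const char *s)
{
    if (!is_ident_start(*s))
        return false;
    for (s++; *s; s++)
        if (!is_ident_char(*s))
            return false;
    return true;
}
```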
Anton Mitrofanov
8d02579fae
x86inc: Fix AVX emulation of some instructions
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-05-16 10:31:17 +02:00
Henrik Gramner
ba3eb745cc
x86inc: Fix AVX emulation of scalar float instructions
Those instructions are not commutative since they only change the first
element in the vector and leave the rest unmodified.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2016-05-16 10:31:13 +02:00
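A plain C model of `addss` semantics (four-float vectors as arrays; `vec4` and `addss()` are hypothetical names) shows why swapping the sources is unsafe: the sum in lane 0 is the same either way, but lanes 1 to 3 are copied from whichever operand comes first, so `addss a, b` and `addss b, a` produce different vectors.

```c
typedef struct { float v[4]; } vec4;

/* ADDSS-style scalar add: only lane 0 is computed; lanes 1..3 are taken from a. */
static vec4 addss(vec4 a, vec4 b)
{
    vec4 r = a;
    r.v[0] = a.v[0] + b.v[0];
    return r;
}
```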
Vittorio Giovara
41ed7ab45f
cosmetics: Fix spelling mistakes
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2016-05-04 18:16:21 +02:00
Anton Mitrofanov
e428f3b30c
x86inc: Enable AVX emulation in additional cases
Allows emulation to work when dst is equal to src2 as long as the
instruction is commutative, e.g. `addps m0, m1, m0`.
2016-04-20 19:16:22 +02:00
Anton Mitrofanov
4bd5583ace
x86inc: Improve handling of %ifid with multi-token parameters
The yasm/nasm preprocessor only checks the first token, which means that
parameters such as `dword [rax]` are treated as identifiers, which is
generally not what we want.
2016-04-20 19:16:22 +02:00
Anton Mitrofanov
42be240ad6
x86inc: Fix AVX emulation of some instructions
2016-04-20 19:16:22 +02:00
Henrik Gramner
8dd3ee9ddd
x86inc: Fix AVX emulation of scalar float instructions
Those instructions are not commutative since they only change the first
element in the vector and leave the rest unmodified.
2016-04-20 19:16:22 +02:00
James Almer
70d685a77f
x86: use the new helper macros where useful
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-02-14 20:00:21 -03:00
James Almer
73a4589d4b
x86: add some more helper macros to check for slow cpuflags
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-02-14 20:00:17 -03:00
James Almer
be22bd32fe
x86/cpu: set avxslow cpuflag on btver2 CPUs
They are also slow when using 256-bit wide registers.
Reviewed-by: Hendrik Leppkes <h.leppkes@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-02-07 16:39:21 -03:00
James Almer
b3b0ecee15
x86/emms: empty the mmx state unconditionally on supported targets
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
2016-02-04 01:49:01 -03:00