1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-18 03:19:31 +02:00
Commit Graph

482 Commits

Author SHA1 Message Date
Anton Mitrofanov
b114d28a18 x86inc: warn when instructions incompatible with current cpuflags are used
Signed-off-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2015-08-11 11:07:18 +02:00
Henrik Gramner
9f1245eb96 x86inc: Support arbitrary stack alignments
Change ALLOC_STACK to always align the stack before allocating stack space for
consistency. Previously alignment would occur either before or after allocating
stack space depending on whether manual alignment was required or not.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2015-08-11 11:04:11 +02:00
Anton Mitrofanov
8c75ba55a4 x86inc: warn if XOP integer FMA instruction emulation is impossible
Emulation requires a temporary register if arguments 1 and 4 are the same; this
doesn't obey the semantics of the original instruction, so we can't emulate
that in x86inc.

Also add pmacsdql emulation.

Signed-off-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2015-08-11 11:02:27 +02:00
Anton Mitrofanov
8db0f71b49 x86inc: warn if XOP integer FMA instruction emulation is impossible
Signed-off-by: Henrik Gramner <henrik@gramner.com>
2015-08-05 16:15:40 +02:00
Henrik Gramner
f0b7882ceb x86inc: Drop SECTION_TEXT macro
The .text section is already 16-byte aligned by default on all supported
platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
2015-08-04 20:13:09 +02:00
Henrik Gramner
826790f596 x86inc: Support arbitrary stack alignments
Change ALLOC_STACK to always align the stack before allocating stack space for
consistency. Previously alignment would occur either before or after allocating
stack space depending on whether manual alignment was required or not.
2015-08-04 20:13:09 +02:00
James Almer
5750d6c5e9 x86: move XOP emulation code back to x86inc
Only two functions that use xop multiply-accumulate instructions where the
first operand is the same as the fourth actually took advantage of the macros.

This further reduces differences with x264's x86inc.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-08-03 17:11:13 -03:00
Henrik Gramner
127203ba5a x86inc: Various minor backports from x264
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-08-03 04:08:33 +02:00
Henrik Gramner
f151fbd9e5 x86inc: Disable vpbroadcastq workaround in newer yasm versions
The bug was fixed in 1.3.0, so only perform the workaround in earlier versions.

Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-08-03 03:13:20 +02:00
James Almer
4d2c014a8f x86/float_dsp: add missing colon to labels
Silences warnings with Nasm

Signed-off-by: James Almer <jamrial@gmail.com>
2015-07-26 02:51:08 -03:00
James Almer
bd48764532 avutil/x86/bswap: force inline asm versions with ICC
Recent ICC versions that define GCC as >= 4.5 (like ICC 13) apparently can't
optimize the generic C versions of av_bswap*() on their own.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-07-18 20:48:09 -03:00
Michael Niedermayer
2ecbf44f21 Merge commit 'd1a6cb195f610978ba5d2351e60f938f7f261d59'
* commit 'd1a6cb195f610978ba5d2351e60f938f7f261d59':
  x86: Serialize rdtsc in read_time()

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-07-09 12:28:09 +02:00
Henrik Gramner
d1a6cb195f x86: Serialize rdtsc in read_time()
Improves the accuracy of measurements, especially in short sections.

To quote the Intel 64 and IA-32 Architectures Software Developer's Manual:
"The RDTSC instruction is not a serializing instruction. It does not necessarily
wait until all previous instructions have been executed before reading the counter.
Similarly, subsequent instructions may begin execution before the read operation
is performed. If software requires RDTSC to be executed only after all previous
instructions have completed locally, it can either use RDTSCP (if the processor
supports that instruction) or execute the sequence LFENCE;RDTSC."

SSE2 is a requirement for lfence so only use it on SSE2-capable systems.
Prefer lfence;rdtsc over rdtscp since rdtscp is supported on fewer systems.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-07-09 00:10:13 +02:00
James Almer
93e7b7fb34 avutil/x86/intmath: add missing check for inline assembly
Signed-off-by: James Almer <jamrial@gmail.com>
2015-06-27 14:33:53 -03:00
James Almer
1e51e517be avutil/x86/intmath: use bzhi gcc builtin in av_mod_uintp2()
Signed-off-by: James Almer <jamrial@gmail.com>
2015-06-27 12:56:55 -03:00
James Almer
c16e99e3b3 x86: check for AV_CPU_FLAG_AVXSLOW where useful
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-06-01 00:15:35 +02:00
Michael Niedermayer
16c430e8ef Merge commit 'cae39851201b7781f1262e1c23627b45e6e80bb4'
* commit 'cae39851201b7781f1262e1c23627b45e6e80bb4':
  x86: Add helper macros to check for slow cpuflags

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-31 23:59:48 +02:00
James Almer
cae3985120 x86: Add helper macros to check for slow cpuflags
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-05-31 12:07:11 +02:00
James Almer
d68c05380c x86: check for AV_CPU_FLAG_AVXSLOW where useful
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-05-31 12:07:11 +02:00
James Almer
f7cafb5d02 x86: add AV_CPU_FLAG_AVXSLOW flag
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-05-31 12:07:11 +02:00
Timothy Gu
dd4d709be7 x86inc: Clear __SECT__
Silences warning(s) like:

    libavcodec/x86/fft.asm:93: warning: section flags ignored on
    section redeclaration

The cause of this warning is that because `struc` and `endstruc`
attempts to revert to the previous section state [1].

The section state is stored in the macro __SECT__, defined by
x86inc.asm to be `.note.GNU-stack ...`, through the `SECTION`
directive [2].

Thus, the `.note.GNU-stack` section is defined twice
(once in x86inc.asm, once during `endstruc`), causing the warning.

That is the first part of the commit: using the primitive `[section]` format
for .note.GNU-stack etc., which does not update `__SECT__` [2].

That fixes only half of the problem. Even without any `SECTION` directives,
`__SECT__` is predefined as `.text`, which conflicting with the later
`SECTION_TEXT` (which expands to `.text align=16`).

[1]: http://www.nasm.us/doc/nasmdoc6.html#section-6.4
[2]: http://www.nasm.us/doc/nasmdoc6.html#section-6.3

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-05-28 11:40:15 +02:00
Timothy Gu
204b228a1d x86inc: Clear __SECT__
This commit silences warning(s) like:

    libavcodec/x86/fft.asm:93: warning: section flags ignored on section
    redeclaration

The cause of this warning is that because `struc` and `endstruc` attempts to
revert to the previous section state [1]. The section state is stored in the
macro __SECT__, defined by x86inc.asm to be `.note.GNU-stack ...`, through the
`SECTION` directive [2].  Thus, the `.note.GNU-stack` section is defined twice
(once in x86inc.asm, once during `endstruc`), causing the warning.

That is the first part of the commit: using the primitive `[section]` format
for .note.GNU-stack etc., which does not update `__SECT__` [2].

That fixes only half of the problem. Even without any `SECTION` directives,
`__SECT__` is predefined as `.text`, which conflicting with the later
`SECTION_TEXT` (which expands to `.text align=16`).

[1]: http://www.nasm.us/doc/nasmdoc6.html#section-6.4
[2]: http://www.nasm.us/doc/nasmdoc6.html#section-6.3

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-28 00:08:37 +02:00
James Almer
c312bfac4c x86/cpu: add AV_CPU_FLAG_AVXSLOW flag
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-05-27 03:31:11 -03:00
Michael Niedermayer
d630f38f47 avutil/x86/Makefile: fix conditional x86/emms.o build
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-04-09 01:12:51 +02:00
Ronald S. Bultje
b926f02e81 avutil/x86/Makefile: Make building and linking of emms.c conditional
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-04-08 17:25:35 +02:00
James Almer
60b9373dbd libavutil: add bmi2 optimized av_mod_uintp2
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-03-20 15:47:43 -03:00
Peter Cordes
9e5687adf2 pixelutils: Comment on (lack of) sad_8x8_sse2
Signed-off-by: Peter Cordes <peter@cordes.ca>
2015-03-04 21:58:53 +01:00
James Almer
bc65abc8d7 libavutil: add x86 optimized av_popcount
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-02-25 19:58:00 -03:00
Christophe Gisquet
d9293c776e x86inc: Correctly warn on use of SSE2 instructions in SSE functions
SSE2 instructions that are XMM-implementations of pre-existing MMX/MMX2
instructions did not issue warnings when used in SSE functions. Handle
it by also checking the register type when such instructions are used.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-17 12:35:58 +01:00
Christophe Gisquet
e93d3a22cb x86: lavu/x264asm: fix ymm register instantiation
This mimicks what is done for the other instruction sets.

Tested-by: James Almer <jamrial@gmail.com>
Tested-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-04 00:18:29 +01:00
James Darnley
12120174ce lavu/x86/x86inc: deprecate INIT_AVX
The same can be done with INIT_XMM avx

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-02 01:09:16 +01:00
Anton Mitrofanov
a1684311b3 x264asm: warn when inappropriate instruction used in function with specified cpuflags
Requested-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Requested-by: "Ronald S. Bultje" <rsbultje@gmail.com>
2015-02-02 00:06:14 +01:00
James Almer
37b35feb64 x86/swr: add SSE2/AVX pack_8ch functions
Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2014-12-30 23:05:27 -03:00
Kieran Kunhya
9a738c27dc v210enc: Add SIMD optimised 8-bit and 10-bit encoders
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
2014-12-05 13:03:49 +00:00
Kieran Kunhya
36091742d1 v210enc: Add SIMD optimised 8-bit and 10-bit encoders
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-11-26 20:30:47 +01:00
Michael Niedermayer
579a0fdc21 avutil/lls: Make unchanged function arguments const
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-28 19:32:07 +02:00
lvqcl
e58fc44649 avutil/x86/cpu: fix cpuid sub-leaf selection
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-27 13:21:31 +02:00
Henrik Gramner
f629705b02 x86inc: Make INIT_CPUFLAGS support an arbitrary number of cpuflags
Previously there was a limit of two cpuflags.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-09-09 02:00:25 -07:00
Loren Merritt
ec217218c2 x86inc: Free up variable name "n" in global namespace
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-09-09 02:00:19 -07:00
Henrik Gramner
176a0fca3f x86inc: Make ym# behave the same way as xm#
This makes more sense for future implementations of templates with zmm registers.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-09-09 01:45:14 -07:00
Henrik Gramner
428aa14a48 x86inc: Make INIT_CPUFLAGS support an arbitrary number of cpuflags
Previously there was a limit of two cpuflags.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-05 14:06:03 +02:00
Henrik Gramner
720c21d11f x86inc: Make ym# behave the same way as xm#
This makes more sense for future implementations of templates with zmm registers.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-05 01:55:28 +02:00
Loren Merritt
a4dbabc8b3 x86inc: free up variable name "n" in global namespace
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-05 01:41:50 +02:00
Clément Bœsch
554d819062 avutil/pixelutils: faster pixelutils_sad_16x16
501 to 439 decicycles.

See 45c7f3997e.
2014-08-23 20:12:56 +02:00
Clément Bœsch
45c7f3997e avutil/pixelutils: faster pixelutils_sad_[au]_16x16
~560 → ~500 decicycles

This is following the comments from Michael in
https://ffmpeg.org/pipermail/ffmpeg-devel/2014-August/160599.html

Using 2 registers for accumulator didn't help. On the other hand,
some re-ordering between the movs and psadbw allowed going ~538 to ~500.
2014-08-23 10:18:53 +02:00
Michael Niedermayer
70b8668fb5 drop LLS1, rename LLS2 to LLS
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-09 23:20:31 +02:00
Clément Bœsch
28a2107a8d avutil: add pixelutils API 2014-08-05 21:05:52 +02:00
James Almer
d0f56ca071 x86/hevc_deblock: improve 8bit transpose store macros
Up to four instructions less depending on function and instruction set.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-08-03 04:24:15 +02:00
James Almer
1ace9573dc x86/hevc_idct: replace old and unused idct functions
Only 8-bit and 10-bit idct_dc() functions are included (adding others should be trivial).

Benchmarks on an Intel Core i5-4200U:

idct8x8_dc
       SSE2   MMXEXT  C
cycles 22     26      57

idct16x16_dc
       AVX2   SSE2    C
cycles 27     32      249

idct32x32_dc
       AVX2   SSE2    C
cycles 62     126     1375

Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-26 18:00:11 +02:00
Michael Niedermayer
8d0c7031a8 Merge commit '79793f833784121d574454af4871866576c0749d'
* commit '79793f833784121d574454af4871866576c0749d':
  Update Fiona's name in copyright statements.

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-01 15:43:40 +02:00
Diego Biurrun
79793f8337 Update Fiona's name in copyright statements. 2014-07-01 03:26:51 -07:00
Christophe Gisquet
9107612818 x86util: add and use RSHIFT/LSHIFT macros
Those macros take a byte number as shift argument, as this argument
differs between MMX and SSE2 instructions.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-15 13:19:27 +02:00
James Almer
85065d2a7c x86/float_dsp: add missing femms
It was lost during the port.
Should fix fate on 3dnowext machines.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-08 20:06:28 +02:00
James Almer
dcaf9660b6 x86/float_dsp: port vector_fmul_window to yasm
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-08 12:41:32 +02:00
James Almer
fc8db12a73 x86/vp9: inital AVX2 intra_pred
tos3k-vp9-b10000.webm on a Core i5-4200U @1.6GHz

1219 decicycles in ff_vp9_ipred_dc_32x32_ssse3, 131070 runs, 2 skips
439 decicycles in ff_vp9_ipred_dc_32x32_avx2, 131070 runs, 2 skips

3570 decicycles in ff_vp9_ipred_dc_top_32x32_ssse3, 4096 runs, 0 skips
2494 decicycles in ff_vp9_ipred_dc_top_32x32_avx2, 4096 runs, 0 skips

1419 decicycles in ff_vp9_ipred_dc_left_32x32_ssse3, 16384 runs, 0 skips
717 decicycles in ff_vp9_ipred_dc_left_32x32_avx2, 16384 runs, 0 skips

2737 decicycles in ff_vp9_ipred_tm_32x32_avx, 1024 runs, 0 skips
2088 decicycles in ff_vp9_ipred_tm_32x32_avx2, 1024 runs, 0 skips

3090 decicycles in ff_vp9_ipred_v_32x32_avx, 512 runs, 0 skips
2226 decicycles in ff_vp9_ipred_v_32x32_avx2, 512 runs, 0 skips

1565 decicycles in ff_vp9_ipred_h_32x32_avx, 1024 runs, 0 skips
922 decicycles in ff_vp9_ipred_h_32x32_avx2, 1024 runs, 0 skips

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-06-08 02:37:20 +02:00
Christophe Gisquet
2267003981 x86: hpeldsp: better factorization
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-29 21:47:40 +02:00
James Almer
561bfc85eb x86/dsputilenc: implement SSE2 versions of pix_{sum16, norm1}
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-28 23:29:34 +02:00
Matt Oliver
1898c2f49d inline asm: fix arrays as named constraints.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-07 15:02:45 +02:00
James Almer
3b06208a57 x86/float_dsp: remove duplicated code from vector_dmul_scalar
Use the xm# and ym# aliases as they remain in sync with m# after a SWAP.
No actual changes to the assembly.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-19 14:21:51 +02:00
James Almer
76ed71a72b x86: move horizontal add macros to x86util
Also port relevant AVX2/XOP optimizations from x264 with permission
to relicense to LGPL from the corresponding authors

Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-17 14:15:09 +02:00
James Almer
11b36b1ee0 x86/float_dsp: unroll loop in vector_fmac_scalar
~6% faster SSE2 performance. AVX/FMA3 are unaffected.

Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-16 18:36:52 +02:00
James Almer
3b808900af x86/float_dsp: use SWAP in vector_fmac_scalar Win64
The mova is unnecessary

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-04-16 15:46:21 +02:00
James Almer
2d9821a208 x86/cpu: check for OS support before enabling AVX2
AV_CPU_FLAG_AVX is enabled at this point only if there's OS support.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-25 17:56:43 +01:00
Matt Oliver
8236747511 Automatically change MANGLE() into named inline asm operands when direct symbol reference in inline asm are not supported.
This is part of the patch-set for intel C inline asm on windows support

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-18 23:39:30 +01:00
James Almer
7d7487e85c x86/float_dsp: add ff_vector_{fmul_add, fmac_scalar}_fma3
~7% faster than AVX

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-13 04:34:05 +01:00
Michael Niedermayer
4159f702a7 avutil/timer: Fix units for x86 after c708b54033
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-09 15:22:02 +01:00
James Almer
3f3d748cab x86: Move XOP emulation to x86util
We need the emulation to support the cases where the first
argument is the same as the fourth. To achieve this a fifth
argument working as a temporary may be needed.
Emulation that doesn't obey the original instruction semantics
can't be in x86inc.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-24 08:30:19 +01:00
Michael Niedermayer
bd8d73ea8b Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86: add detection for Bit Manipulation Instruction sets

Conflicts:
	libavutil/x86/cpu.c

See: 0bc3de19ff
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-23 22:52:58 +01:00
Michael Niedermayer
d9574069c1 Merge commit '1b932eb1508f550fac9e911923a0383efda53aa3'
* commit '1b932eb1508f550fac9e911923a0383efda53aa3':
  x86: add detection for FMA3 instruction set

Conflicts:
	configure
	libavutil/cpu.h
	libavutil/x86/cpu.c

See: a2af8eddab
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-23 22:43:08 +01:00
James Almer
d59fcdaff3 x86: add detection for Bit Manipulation Instruction sets
Based on x264 code

Signed-off-by: James Almer <jamrial@gmail.com>
2014-02-23 15:29:36 +01:00
James Almer
1b932eb150 x86: add detection for FMA3 instruction set
Based on x264 code

Signed-off-by: James Almer <jamrial@gmail.com>
2014-02-23 15:29:36 +01:00
James Almer
10b0161d78 x86: add missing XOP checks and macros
Signed-off-by: James Almer <jamrial@gmail.com>
2014-02-23 15:29:36 +01:00
James Almer
0bc3de19ff x86: add detection for Bit Manipulation Instruction sets
Based on x264 code

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-22 17:26:00 +01:00
James Almer
a2af8eddab x86: add detection for FMA3 instruction set
Based on x264 code

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-22 17:25:52 +01:00
Christophe Gisquet
996697e266 x86: float dsp: unroll SSE versions
vector_fmul and vector_fmac_scalar are guaranteed that they can process in
batch of 16 elements, but their SSE versions only does 8 at a time.

Therefore, unroll them a bit.
299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.

Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2014-02-20 14:18:05 +01:00
Christophe Gisquet
133b34207c x86: float dsp: unroll SSE versions
vector_fmul and vector_fmac_scalar are guaranteed that they can process in
batch of 16 elements, but their SSE versions only does 8 at a time.

Therefore, unroll them a bit.
299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-15 18:54:21 +01:00
James Almer
23a8c63452 x86inc: Extend FMA_INSTR functionality
Support the cases where the first and last operand of
the XOP instruction are the same.

Also add vpmacsdql emulation.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-13 22:14:24 +01:00
James Almer
6c12b1de06 x86: add missing XOP checks and macros
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-11 03:46:52 +01:00
Loren Merritt
b7d0d10a1d x86inc: Speed up assembling with Yasm
Work around Yasm's inefficiency with handling large numbers of variables
in the global scope.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-01-26 18:40:08 +01:00
Loren Merritt
4d55fe7204 x86inc: speed up compilation with yasm
Work around yasm's inefficiency with handling large numbers of variables
in the global scope.
2014-01-18 01:19:16 +01:00
Michael Niedermayer
c3814ab654 rename new lls code to lls2 to avoid conflict with the old which has a different ABI
also remove failed attempt at a compatibility layer, the code simply cannot work

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-11-17 16:41:08 +01:00
Michael Niedermayer
bbe66ef912 avutil: rename lls to lls2
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-11-17 16:30:23 +01:00
Michael Niedermayer
a665704402 Merge commit '4d6ee0725553a43ba88d6f8327ebcf8f1c5ae8d4'
* commit '4d6ee0725553a43ba88d6f8327ebcf8f1c5ae8d4':
  libavutil: x86: Add AVX2 capable CPU detection.

Conflicts:
	libavutil/cpu.c
	libavutil/cpu.h
	libavutil/x86/cpu.c

See: 865b70bc5d
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-26 02:36:36 +02:00
Kieran Kunhya
865b70bc5d Add AVX2 capable CPU detection. Patch based on x264's AVX2 detection
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-26 02:34:22 +02:00
Kieran Kunhya
4d6ee07255 libavutil: x86: Add AVX2 capable CPU detection.
Patch based on x264's AVX2 detection

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-25 19:36:55 +01:00
Michael Niedermayer
f9bef2bec9 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86: more AVX2 framework

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-14 16:13:57 +02:00
Michael Niedermayer
e3e0e3d0c9 Merge commit 'c6908d6b4b377a04a5d055ba874bdbcf06c80497'
* commit 'c6908d6b4b377a04a5d055ba874bdbcf06c80497':
  x86inc: FMA3/4 Support

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-14 16:06:22 +02:00
Michael Niedermayer
9ac124c889 Merge commit '206895708ea2b464755d340e44501daf9a07c310'
* commit '206895708ea2b464755d340e44501daf9a07c310':
  x86inc: Remove our FMA4 support

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-14 15:54:23 +02:00
Michael Niedermayer
12e4493f9c Merge commit 'c108ba0175d4fc3a3253a8b0f782fbfb96ba5098'
* commit 'c108ba0175d4fc3a3253a8b0f782fbfb96ba5098':
  x86inc: Use VEX-encoded instructions in AVX functions

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-14 15:48:34 +02:00
Jason Garrett-Glaser
a3fabc6cb3 x86: more AVX2 framework
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:41:56 +01:00
Jason Garrett-Glaser
c6908d6b4b x86inc: FMA3/4 Support
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:41:54 +01:00
Derek Buitenhuis
206895708e x86inc: Remove our FMA4 support
This is so we can sync to x264's version of FMA4 support.

This partialy reverts commit 79687079a9.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:39:29 +01:00
Henrik Gramner
c108ba0175 x86inc: Use VEX-encoded instructions in AVX functions
Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4
functions for all instructions that exists in a VEX-encoded
version.

This change makes it easier to extend existing code to use AVX2.

Also add support for AVX emulation of a few instructions that
were missing before.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:36:11 +01:00
Michael Niedermayer
31d0d35560 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86inc: Remove .rodata kludges

Conflicts:
	libavutil/x86/x86inc.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-09 14:29:42 +02:00
Henrik Gramner
ad7d7d4f6a x86inc: Remove .rodata kludges
The Mach-O bug was fixed in yasm 0.8.0 and we don't
support versions that old anymore.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-09 07:44:30 -04:00
Michael Niedermayer
19c3890819 Merge commit '3e2fa991db7ef172579422accd61624d52777e5a'
* commit '3e2fa991db7ef172579422accd61624d52777e5a':
  x86inc: remove misaligned cpu flag

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 12:02:21 +02:00
Michael Niedermayer
31d9aa6b2e Merge commit '71155665414b551ad350622d5abed20e58371fbf'
* commit '71155665414b551ad350622d5abed20e58371fbf':
  x86inc: various minor backports from x264

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:57:39 +02:00
Michael Niedermayer
3f965ab95d Merge commit '47f9d7ce5493e119e09d1227d017414feaaf8d97'
* commit '47f9d7ce5493e119e09d1227d017414feaaf8d97':
  x86inc: Check for __OUTPUT_FORMAT__ having a value of "x64"

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:37:22 +02:00
Michael Niedermayer
1f17619fe4 Merge commit 'bbe4a6db44f0b55b424a5cc9d3e89cd88e250450'
* commit 'bbe4a6db44f0b55b424a5cc9d3e89cd88e250450':
  x86inc: Utilize the shadow space on 64-bit Windows

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:23:00 +02:00
Michael Niedermayer
17d9c7c208 Merge commit '3fb78e99a04d0ed8db834d813d933eb86c37142a'
* commit '3fb78e99a04d0ed8db834d813d933eb86c37142a':
  x86inc: create xm# and ym#, analagous to m#

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:15:17 +02:00
Michael Niedermayer
3352fdb292 Merge commit '49ebe3f9fe02174ae7e14548001fd146ed375cc2'
* commit '49ebe3f9fe02174ae7e14548001fd146ed375cc2':
  x86inc: fix some corner cases of SWAP

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:07:03 +02:00
Michael Niedermayer
006c0fcfea Merge commit '63f0d623100bdb0c6081456127f4b6713e83d3db'
* commit '63f0d623100bdb0c6081456127f4b6713e83d3db':
  x86inc: Use SSE instead of SSE2 for copying data

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:01:40 +02:00
Michael Niedermayer
faafffaf82 Merge commit 'ad76e6e7e193b98e7335156422d35467816f9ef1'
* commit 'ad76e6e7e193b98e7335156422d35467816f9ef1':
  x86inc: Set ELF hidden visibility for global constants

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 10:52:51 +02:00
Michael Niedermayer
c1488fab3d Merge commit '25cb0c1a1e66edacc1667acf6818f524c0997f10'
* commit '25cb0c1a1e66edacc1667acf6818f524c0997f10':
  x86inc: activate REP_RET automatically

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 10:27:30 +02:00
Henrik Gramner
3e2fa991db x86inc: remove misaligned cpu flag
Prevents a crash if the misaligned exception mask bit is
cleared for some reason.

Misaligned SSE functions are only used on AMD Phenom CPUs
and the benefit is miniscule. They also require modifying
the MXCSR control register and by removing those functions
we can get rid of that complexity altogether.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:27:38 -04:00
Jason Garrett-Glaser
7115566541 x86inc: various minor backports from x264
Small backports that sneaked into other asm commits in x264.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:27:22 -04:00
Derek Buitenhuis
47f9d7ce54 x86inc: Check for __OUTPUT_FORMAT__ having a value of "x64"
This is also a valid value for WIN64.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:27:08 -04:00
Henrik Gramner
bbe4a6db44 x86inc: Utilize the shadow space on 64-bit Windows
Store XMM6 and XMM7 in the shadow space in functions that
clobbers them. This way we don't have to adjust the stack
pointer as often, reducing the number of instructions as
well as code size.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:25:35 -04:00
Loren Merritt
3fb78e99a0 x86inc: create xm# and ym#, analagous to m#
For when we want to mix simd sizes within one function.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:25:19 -04:00
Loren Merritt
49ebe3f9fe x86inc: fix some corner cases of SWAP
SWAP with >=3 named (rather than numbered) args
PERMUTE followed by SWAP with 2 named args
used to produce the wrong permutation

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:25:06 -04:00
Henrik Gramner
63f0d62310 x86inc: Use SSE instead of SSE2 for copying data
Reduces code size because movaps/movups is one byte
shorter than movdqa/movdqu.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:24:33 -04:00
Henrik Gramner
ad76e6e7e1 x86inc: Set ELF hidden visibility for global constants
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:24:13 -04:00
Loren Merritt
25cb0c1a1e x86inc: activate REP_RET automatically
Now RET checks whether it immediately follows a branch, so the
programmer dosen't have to keep track of that condition. REP_RET
is still needed manually when it's a branch target, but that's
much rarer.

The implementation involves lots of spurious labels, but that's OK
because we strip them.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:17:59 -04:00
Ronald S. Bultje
c07ac8d467 VP9 MC (ssse3) optimizations.
Decoding time of ped1080p.webm goes from 20.7sec to 11.3sec.
2013-10-02 21:03:15 -04:00
Michael Niedermayer
361bc70731 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  avutil: Fix compilation with inline asm disabled on mingw

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-09-22 11:51:38 +02:00
Alex Smith
08fa828b3f avutil: Fix compilation with inline asm disabled on mingw
Because of -Werror=implicit-function-declaration the build will fail.

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-09-22 00:50:32 +03:00
Thilo Borgmann
d814a839ac Reinstate proper FFmpeg license for all files. 2013-08-30 15:47:38 +00:00
Michael Niedermayer
f0a3562382 Merge commit '79aec43ce813a3e270743ca64fa3f31fa43df80b'
* commit '79aec43ce813a3e270743ca64fa3f31fa43df80b':
  x86: Add and use more convenience macros to check CPU extension availability

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-08-30 11:57:35 +02:00
Michael Niedermayer
2a60666d1d Merge commit '8410d6e93c2e074881f1c7b7e4cdefd2e497d52e'
* commit '8410d6e93c2e074881f1c7b7e4cdefd2e497d52e':
  avutil: Refactor CPU extension availability macros

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-08-29 14:15:10 +02:00
Michael Niedermayer
c83d794936 Merge commit 'b78b10c4b78b696927f2801cf2d9f193b4eff28b'
* commit 'b78b10c4b78b696927f2801cf2d9f193b4eff28b':
  avutil: Move internal CPU detection function declarations to private header

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-08-29 14:05:15 +02:00
Diego Biurrun
79aec43ce8 x86: Add and use more convenience macros to check CPU extension availability 2013-08-29 13:07:37 +02:00
Diego Biurrun
8410d6e93c avutil: Refactor CPU extension availability macros 2013-08-28 23:54:14 +02:00
Diego Biurrun
b78b10c4b7 avutil: Move internal CPU detection function declarations to private header 2013-08-28 23:54:14 +02:00
Michael Niedermayer
9d01bf7d66 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  Consistently use "cpu_flags" as variable/parameter name for CPU flags

Conflicts:
	libavcodec/x86/dsputil_init.c
	libavcodec/x86/h264dsp_init.c
	libavcodec/x86/hpeldsp_init.c
	libavcodec/x86/motion_est.c
	libavcodec/x86/mpegvideo.c
	libavcodec/x86/proresdsp_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-07-18 09:53:47 +02:00
Diego Biurrun
3ac7fa81b2 Consistently use "cpu_flags" as variable/parameter name for CPU flags 2013-07-18 00:31:35 +02:00
Michael Niedermayer
a478e99a60 avutil/x86: reenable ff_update_lls_avx()
The bug has been fixed in c8b920a9b7 by Loren Merritt

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-07-02 12:02:08 +02:00
Michael Niedermayer
d1fa671895 Merge commit 'c8b920a9b7fa534a6141695ace4e8c2dfcd56cee'
* commit 'c8b920a9b7fa534a6141695ace4e8c2dfcd56cee':
  lls/x86: use 3-operator vaddpd in ADDPD_MEM

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-07-02 11:40:44 +02:00
Loren Merritt
c8b920a9b7 lls/x86: use 3-operator vaddpd in ADDPD_MEM
Fixes build with yasm-1.1

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2013-07-02 10:15:09 +02:00
Michael Niedermayer
a6e46ed51a Revert "avutil/x86: disable ff_evaluate_lls_sse2() for 32bit"
This reverts commit 247425241c.
2013-07-01 02:27:47 +02:00
Michael Niedermayer
4e488ac5f5 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86: lpc: fix a segfault in av_evaluate_lls_sse2()

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-07-01 02:26:22 +02:00
Loren Merritt
1221bb6239 x86: lpc: fix a segfault in av_evaluate_lls_sse2() 2013-06-30 23:11:19 +00:00
Michael Niedermayer
247425241c avutil/x86: disable ff_evaluate_lls_sse2() for 32bit
It just segfaults on 32bit, thus its disabled until someone fixes it.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-06-30 19:03:57 +02:00
Michael Niedermayer
6e76e6a05a Merge commit 'b545179fdff1ccfbbb9d422e4e9720cb6c6d9191'
* commit 'b545179fdff1ccfbbb9d422e4e9720cb6c6d9191':
  x86: lpc: simd av_evaluate_lls

Conflicts:
	libavutil/x86/lls.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-06-30 12:15:12 +02:00
Michael Niedermayer
a285079bc7 lls.asm: disable ff_update_lls_avx
The code doesnt build with yasm from ubuntu 12.04

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-06-30 12:12:11 +02:00
Michael Niedermayer
0b40c50508 lls.asm: put avx code under if HAVE_AVX_EXTERNAL
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-06-30 12:12:01 +02:00
Michael Niedermayer
78b5479633 Merge commit '502ab21af0ca68f76d6112722c46d2f35c004053'
* commit '502ab21af0ca68f76d6112722c46d2f35c004053':
  x86: lpc: simd av_update_lls

The versions are bumped due to changes in lls.h which is used across
libraries affecting intra library ABI
(This version bump also covers changes to lls.h in the immedeatly previous
 commits)

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-06-30 11:35:52 +02:00
Loren Merritt
b545179fdf x86: lpc: simd av_evaluate_lls
1.5x-1.8x faster on sandybridge

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-06-29 13:23:57 +02:00
Loren Merritt
502ab21af0 x86: lpc: simd av_update_lls
4x-6x faster on sandybridge

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-06-29 13:23:57 +02:00
Michael Niedermayer
3c200aa693 Merge commit '1fda184a85178cfd7b98d9e308d18e1ded76a511'
* commit '1fda184a85178cfd7b98d9e308d18e1ded76a511':
  avutil: Add av_cold attributes to init functions missing them

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-05-05 12:53:50 +02:00
Diego Biurrun
1fda184a85 avutil: Add av_cold attributes to init functions missing them 2013-05-04 22:48:05 +02:00
Michael Niedermayer
e91339cde2 Merge commit '566b7a20fd0cab44d344329538d314454a0bcc2f'
* commit '566b7a20fd0cab44d344329538d314454a0bcc2f':
  x86: float dsp: butterflies_float SSE

Conflicts:
	libavutil/x86/float_dsp.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-05-03 11:57:59 +02:00
Christophe Gisquet
566b7a20fd x86: float dsp: butterflies_float SSE
97c -> 49c
Some codecs could benefit from more unrolling, but AAC doesn't.
2013-05-03 08:08:02 +02:00
Michael Niedermayer
92218aad00 butterflies_float: replace 2 lea by 2 add
adds are simpler instructions and should be faster or equally fast
on all cpus

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-04-17 00:10:06 +02:00
Christophe Gisquet
1a4007964c x86: float dsp: butterflies_float SSE
97c -> 49c
Some codecs could benefit from more unrolling, but AAC doesn't.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-04-17 00:03:25 +02:00
Ronald S. Bultje
b93b27edb0 dsputil: Make dsputil selectable
Signed-off-by: Martin Storsjö <martin@martin.st>
2013-04-10 11:04:05 +03:00
Christophe Gisquet
2e81acc687 x86inc: Fix number of operands for cmp* instructions
cmp{p,s}{s,d} instructions do take an imm8 operand.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-04-09 23:55:30 +02:00
Christophe Gisquet
0b467a6e83 x264asm: fix cmp* number of arguments
cmp{p,s}{s,d} instructions do take an imm8 operand.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-04-05 16:42:12 +02:00
Michael Niedermayer
63a97d5674 Merge commit 'b6649ab5037fb55f78c2606f3d23cea0867cdeaa'
* commit 'b6649ab5037fb55f78c2606f3d23cea0867cdeaa':
  cosmetics: Remove unnecessary extern keywords from function declarations

Conflicts:
	libswscale/x86/swscale.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-03-28 11:20:41 +01:00
Diego Biurrun
b6649ab503 cosmetics: Remove unnecessary extern keywords from function declarations 2013-03-27 14:21:45 +01:00
Ronald S. Bultje
6a701306db dsputil: make selectable.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-03-12 19:56:58 +01:00