mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-18 03:19:31 +02:00
Commit Graph

173 Commits

Author SHA1 Message Date
Anton Mitrofanov
8c75ba55a4 x86inc: warn if XOP integer FMA instruction emulation is impossible
Emulation requires a temporary register if arguments 1 and 4 are the same; this
doesn't obey the semantics of the original instruction, so we can't emulate
that in x86inc.

Also add pmacsdql emulation.

Signed-off-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2015-08-11 11:02:27 +02:00
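The aliasing problem described above can be shown with a small Python model. This is an illustrative sketch, not x86inc code: it assumes the non-XOP emulation of `pmacsdd dst, a, b, c` (dst = a*b + c) is a multiply into dst followed by an add of c, and it treats registers as plain integers (32-bit wraparound is ignored).

```python
# Toy model: register names mapped to plain Python ints
# (32-bit wraparound of pmacsdd is ignored for simplicity).

def xop_pmacsdd(regs, dst, a, b, c):
    """Reference semantics: dst = a * b + c, with every source read first."""
    regs[dst] = regs[a] * regs[b] + regs[c]


def emulated_pmacsdd(regs, dst, a, b, c):
    """Two-instruction emulation: pmulld dst, a, b; then paddd dst, c."""
    regs[dst] = regs[a] * regs[b]    # writes dst, clobbering c if dst is c
    regs[dst] = regs[dst] + regs[c]  # reads c after the write above


def run(fn, dst, c):
    regs = {"m0": 3, "m1": 4, "m2": 5, "m3": 10}
    fn(regs, dst, "m1", "m2", c)
    return regs[dst]


print(run(xop_pmacsdd, "m0", "m3"), run(emulated_pmacsdd, "m0", "m3"))  # 30 30
print(run(xop_pmacsdd, "m0", "m0"), run(emulated_pmacsdd, "m0", "m0"))  # 23 40
```

When arguments 1 and 4 differ, both agree; when they alias, the first step destroys c. Fixing that needs an extra temporary register, which changes the instruction's operand list, hence a warning instead of emulation.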
Anton Mitrofanov
8db0f71b49 x86inc: warn if XOP integer FMA instruction emulation is impossible
Signed-off-by: Henrik Gramner <henrik@gramner.com>
2015-08-05 16:15:40 +02:00
Henrik Gramner
f0b7882ceb x86inc: Drop SECTION_TEXT macro
The .text section is already 16-byte aligned by default on all supported
platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
2015-08-04 20:13:09 +02:00
Henrik Gramner
826790f596 x86inc: Support arbitrary stack alignments
Change ALLOC_STACK to always align the stack before allocating stack space for
consistency. Previously alignment would occur either before or after allocating
stack space depending on whether manual alignment was required or not.
2015-08-04 20:13:09 +02:00
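The "align first, then allocate" order can be sketched as plain pointer arithmetic. This Python model assumes a downward-growing stack, a power-of-two alignment and a size rounded up to that alignment; the real ALLOC_STACK also has to preserve the original stack pointer, which is omitted here.

```python
def align_then_allocate(sp: int, size: int, align: int) -> int:
    """New stack pointer after aligning sp down and then reserving size bytes.

    Assumes a downward-growing stack and a power-of-two alignment.
    """
    if align <= 0 or align & (align - 1):
        raise ValueError("alignment must be a power of two")
    aligned_sp = sp & ~(align - 1)              # round down: stack grows downward
    padded = (size + align - 1) & ~(align - 1)  # round size up to a multiple of align
    return aligned_sp - padded


print(hex(align_then_allocate(0x1008, 40, 32)))  # 0xfc0
```

Because the pointer is aligned before the (padded) space is subtracted, the result is aligned for any incoming stack pointer and any power-of-two alignment.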
James Almer
5750d6c5e9 x86: move XOP emulation code back to x86inc
Only two functions that use xop multiply-accumulate instructions where the
first operand is the same as the fourth actually took advantage of the macros.

This further reduces differences with x264's x86inc.

Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
2015-08-03 17:11:13 -03:00
Henrik Gramner
127203ba5a x86inc: Various minor backports from x264
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-08-03 04:08:33 +02:00
Henrik Gramner
f151fbd9e5 x86inc: Disable vpbroadcastq workaround in newer yasm versions
The bug was fixed in 1.3.0, so only perform the workaround in earlier versions.

Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2015-08-03 03:13:20 +02:00
Timothy Gu
dd4d709be7 x86inc: Clear __SECT__
Silences warning(s) like:

    libavcodec/x86/fft.asm:93: warning: section flags ignored on
    section redeclaration

This warning occurs because `struc` and `endstruc`
attempt to revert to the previous section state [1].

The section state is stored in the macro __SECT__, defined by
x86inc.asm to be `.note.GNU-stack ...`, through the `SECTION`
directive [2].

Thus, the `.note.GNU-stack` section is defined twice
(once in x86inc.asm, once during `endstruc`), causing the warning.

That is the first part of the commit: using the primitive `[section]` format
for .note.GNU-stack etc., which does not update `__SECT__` [2].

That fixes only half of the problem. Even without any `SECTION` directives,
`__SECT__` is predefined as `.text`, which conflicts with the later
`SECTION_TEXT` (which expands to `.text align=16`).

[1]: http://www.nasm.us/doc/nasmdoc6.html#section-6.4
[2]: http://www.nasm.us/doc/nasmdoc6.html#section-6.3

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2015-05-28 11:40:15 +02:00
Timothy Gu
204b228a1d x86inc: Clear __SECT__
This commit silences warning(s) like:

    libavcodec/x86/fft.asm:93: warning: section flags ignored on section
    redeclaration

This warning occurs because `struc` and `endstruc` attempt to
revert to the previous section state [1]. The section state is stored in the
macro __SECT__, defined by x86inc.asm to be `.note.GNU-stack ...`, through the
`SECTION` directive [2].  Thus, the `.note.GNU-stack` section is defined twice
(once in x86inc.asm, once during `endstruc`), causing the warning.

That is the first part of the commit: using the primitive `[section]` format
for .note.GNU-stack etc., which does not update `__SECT__` [2].

That fixes only half of the problem. Even without any `SECTION` directives,
`__SECT__` is predefined as `.text`, which conflicts with the later
`SECTION_TEXT` (which expands to `.text align=16`).

[1]: http://www.nasm.us/doc/nasmdoc6.html#section-6.4
[2]: http://www.nasm.us/doc/nasmdoc6.html#section-6.3

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-05-28 00:08:37 +02:00
Christophe Gisquet
d9293c776e x86inc: Correctly warn on use of SSE2 instructions in SSE functions
SSE2 instructions that are XMM-implementations of pre-existing MMX/MMX2
instructions did not issue warnings when used in SSE functions. Handle
it by also checking the register type when such instructions are used.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-17 12:35:58 +01:00
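The idea of judging the required instruction set from the register operand can be modelled in a few lines. This Python sketch is hypothetical: the op table is a small illustrative subset and the ISA ordering is simplified; it only shows why the same mnemonic needs MMX with an `mm` register but SSE2 with an `xmm` register.

```python
# Illustrative subset: integer ops that exist in both MMX and SSE2 (XMM) form.
DUAL_FORM_OPS = {"paddw", "psubw", "pmullw", "pand", "por", "pxor", "punpcklbw"}
ISA_ORDER = ["mmx", "mmx2", "sse", "sse2"]


def required_isa(op: str, reg: str) -> str:
    """ISA level an instruction needs, judged from its register operand."""
    if op in DUAL_FORM_OPS:
        return "sse2" if reg.startswith("xmm") else "mmx"
    raise KeyError(f"{op} is not in this toy table")


def should_warn(function_isa: str, op: str, reg: str) -> bool:
    """True if op on reg needs a newer ISA than the function was declared for."""
    return ISA_ORDER.index(required_isa(op, reg)) > ISA_ORDER.index(function_isa)


print(should_warn("sse", "paddw", "mm0"))   # False: MMX form is fine in an SSE function
print(should_warn("sse", "paddw", "xmm0"))  # True: XMM form requires SSE2
```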
Christophe Gisquet
e93d3a22cb x86: lavu/x264asm: fix ymm register instantiation
This mimics what is done for the other instruction sets.

Tested-by: James Almer <jamrial@gmail.com>
Tested-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-04 00:18:29 +01:00
James Darnley
12120174ce lavu/x86/x86inc: deprecate INIT_AVX
The same can be done with INIT_XMM avx

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2015-02-02 01:09:16 +01:00
Anton Mitrofanov
a1684311b3 x264asm: warn when inappropriate instruction used in function with specified cpuflags
Requested-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Requested-by: "Ronald S. Bultje" <rsbultje@gmail.com>
2015-02-02 00:06:14 +01:00
Henrik Gramner
f629705b02 x86inc: Make INIT_CPUFLAGS support an arbitrary number of cpuflags
Previously there was a limit of two cpuflags.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-09-09 02:00:25 -07:00
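A rough model of cumulative cpuflags helps explain what combining "an arbitrary number" of them means. In this Python sketch the bit positions and implication chains are illustrative assumptions, not copied from x86inc; the `cpuflag()` check is modelled on the `(cpuflags & cpuflags_x) == cpuflags_x` style of test.

```python
# Illustrative bit layout: each flag ORs in the flags it implies, so checking
# for an older feature succeeds when a newer one is enabled.
CPUFLAGS = {}


def define_flag(name, bit, *implied):
    value = 1 << bit
    for dep in implied:
        value |= CPUFLAGS[dep]
    CPUFLAGS[name] = value


define_flag("mmx", 0)
define_flag("mmx2", 1, "mmx")
define_flag("sse", 2, "mmx2")
define_flag("sse2", 3, "sse")
define_flag("ssse3", 4, "sse2")
define_flag("avx", 5, "ssse3")
define_flag("lzcnt", 6)
define_flag("bmi1", 7)


def init_cpuflags(*names):
    """Combine any number of flags (the old limit was two)."""
    flags = 0
    for name in names:
        flags |= CPUFLAGS[name]
    return flags


def cpuflag(flags, name):
    """True if every bit of the named flag (including implied ones) is set."""
    return (flags & CPUFLAGS[name]) == CPUFLAGS[name]


active = init_cpuflags("ssse3", "lzcnt", "bmi1")
print(cpuflag(active, "sse2"), cpuflag(active, "bmi1"), cpuflag(active, "avx"))  # True True False
```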
Loren Merritt
ec217218c2 x86inc: Free up variable name "n" in global namespace
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-09-09 02:00:19 -07:00
Henrik Gramner
176a0fca3f x86inc: Make ym# behave the same way as xm#
This makes more sense for future implementations of templates with zmm registers.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-09-09 01:45:14 -07:00
Henrik Gramner
428aa14a48 x86inc: Make INIT_CPUFLAGS support an arbitrary number of cpuflags
Previously there was a limit of two cpuflags.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-05 14:06:03 +02:00
Henrik Gramner
720c21d11f x86inc: Make ym# behave the same way as xm#
This makes more sense for future implementations of templates with zmm registers.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-05 01:55:28 +02:00
Loren Merritt
a4dbabc8b3 x86inc: free up variable name "n" in global namespace
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-09-05 01:41:50 +02:00
Michael Niedermayer
8d0c7031a8 Merge commit '79793f833784121d574454af4871866576c0749d'
* commit '79793f833784121d574454af4871866576c0749d':
  Update Fiona's name in copyright statements.

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-07-01 15:43:40 +02:00
Diego Biurrun
79793f8337 Update Fiona's name in copyright statements. 2014-07-01 03:26:51 -07:00
James Almer
3f3d748cab x86: Move XOP emulation to x86util
We need the emulation to support the cases where the first
argument is the same as the fourth. To achieve this a fifth
argument working as a temporary may be needed.
Emulation that doesn't obey the original instruction semantics
can't be in x86inc.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-24 08:30:19 +01:00
James Almer
23a8c63452 x86inc: Extend FMA_INSTR functionality
Support the cases where the first and last operand of
the XOP instruction are the same.

Also add vpmacsdql emulation.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-13 22:14:24 +01:00
Loren Merritt
b7d0d10a1d x86inc: Speed up assembling with Yasm
Work around Yasm's inefficiency with handling large numbers of variables
in the global scope.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2014-01-26 18:40:08 +01:00
Loren Merritt
4d55fe7204 x86inc: speed up compilation with yasm
Work around yasm's inefficiency with handling large numbers of variables
in the global scope.
2014-01-18 01:19:16 +01:00
Michael Niedermayer
f9bef2bec9 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86: more AVX2 framework

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-14 16:13:57 +02:00
Michael Niedermayer
e3e0e3d0c9 Merge commit 'c6908d6b4b377a04a5d055ba874bdbcf06c80497'
* commit 'c6908d6b4b377a04a5d055ba874bdbcf06c80497':
  x86inc: FMA3/4 Support

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-14 16:06:22 +02:00
Michael Niedermayer
9ac124c889 Merge commit '206895708ea2b464755d340e44501daf9a07c310'
* commit '206895708ea2b464755d340e44501daf9a07c310':
  x86inc: Remove our FMA4 support

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-14 15:54:23 +02:00
Michael Niedermayer
12e4493f9c Merge commit 'c108ba0175d4fc3a3253a8b0f782fbfb96ba5098'
* commit 'c108ba0175d4fc3a3253a8b0f782fbfb96ba5098':
  x86inc: Use VEX-encoded instructions in AVX functions

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-14 15:48:34 +02:00
Jason Garrett-Glaser
a3fabc6cb3 x86: more AVX2 framework
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:41:56 +01:00
Jason Garrett-Glaser
c6908d6b4b x86inc: FMA3/4 Support
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:41:54 +01:00
Derek Buitenhuis
206895708e x86inc: Remove our FMA4 support
This is so we can sync to x264's version of FMA4 support.

This partially reverts commit 79687079a9.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:39:29 +01:00
Henrik Gramner
c108ba0175 x86inc: Use VEX-encoded instructions in AVX functions
Automatically use VEX encoding in AVX/AVX2/XOP/FMA3/FMA4
functions for all instructions that exist in a VEX-encoded
version.

This change makes it easier to extend existing code to use AVX2.

Also add support for AVX emulation of a few instructions that
were missing before.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-14 12:36:11 +01:00
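The general shape of emulating an AVX-style three-operand instruction on legacy SSE can be sketched as a tiny lowering function. This Python model is an assumption-based illustration, not x86inc's macro: it copies src1 into dst (written as `mova`, x86inc's aligned-move name) and then applies the two-operand op, swaps sources for commutative ops when dst aliases src2, and simply raises an error for the non-commutative aliased case, whose handling the commit does not describe.

```python
def lower_avx_3op(op, dst, src1, src2, commutative=False):
    """Lower 'op dst, src1, src2' (AVX-style) to 2-operand SSE instructions."""
    if dst == src1:
        return [f"{op} {dst}, {src2}"]           # already in 2-operand form
    if dst == src2:
        if commutative:
            return [f"{op} {dst}, {src1}"]       # swap sources: same result
        raise ValueError(f"{op}: copying src1 into dst would clobber src2")
    return [f"mova {dst}, {src1}", f"{op} {dst}, {src2}"]  # copy, then operate


print(lower_avx_3op("paddw", "m0", "m1", "m2"))  # ['mova m0, m1', 'paddw m0, m2']
```

In a VEX-encoded function the three-operand form is emitted directly, so none of this lowering is needed there.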
Michael Niedermayer
31d0d35560 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86inc: Remove .rodata kludges

Conflicts:
	libavutil/x86/x86inc.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-09 14:29:42 +02:00
Henrik Gramner
ad7d7d4f6a x86inc: Remove .rodata kludges
The Mach-O bug was fixed in yasm 0.8.0 and we don't
support versions that old anymore.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-09 07:44:30 -04:00
Michael Niedermayer
19c3890819 Merge commit '3e2fa991db7ef172579422accd61624d52777e5a'
* commit '3e2fa991db7ef172579422accd61624d52777e5a':
  x86inc: remove misaligned cpu flag

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 12:02:21 +02:00
Michael Niedermayer
31d9aa6b2e Merge commit '71155665414b551ad350622d5abed20e58371fbf'
* commit '71155665414b551ad350622d5abed20e58371fbf':
  x86inc: various minor backports from x264

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:57:39 +02:00
Michael Niedermayer
3f965ab95d Merge commit '47f9d7ce5493e119e09d1227d017414feaaf8d97'
* commit '47f9d7ce5493e119e09d1227d017414feaaf8d97':
  x86inc: Check for __OUTPUT_FORMAT__ having a value of "x64"

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:37:22 +02:00
Michael Niedermayer
1f17619fe4 Merge commit 'bbe4a6db44f0b55b424a5cc9d3e89cd88e250450'
* commit 'bbe4a6db44f0b55b424a5cc9d3e89cd88e250450':
  x86inc: Utilize the shadow space on 64-bit Windows

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:23:00 +02:00
Michael Niedermayer
17d9c7c208 Merge commit '3fb78e99a04d0ed8db834d813d933eb86c37142a'
* commit '3fb78e99a04d0ed8db834d813d933eb86c37142a':
  x86inc: create xm# and ym#, analagous to m#

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:15:17 +02:00
Michael Niedermayer
3352fdb292 Merge commit '49ebe3f9fe02174ae7e14548001fd146ed375cc2'
* commit '49ebe3f9fe02174ae7e14548001fd146ed375cc2':
  x86inc: fix some corner cases of SWAP

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:07:03 +02:00
Michael Niedermayer
006c0fcfea Merge commit '63f0d623100bdb0c6081456127f4b6713e83d3db'
* commit '63f0d623100bdb0c6081456127f4b6713e83d3db':
  x86inc: Use SSE instead of SSE2 for copying data

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 11:01:40 +02:00
Michael Niedermayer
faafffaf82 Merge commit 'ad76e6e7e193b98e7335156422d35467816f9ef1'
* commit 'ad76e6e7e193b98e7335156422d35467816f9ef1':
  x86inc: Set ELF hidden visibility for global constants

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 10:52:51 +02:00
Michael Niedermayer
c1488fab3d Merge commit '25cb0c1a1e66edacc1667acf6818f524c0997f10'
* commit '25cb0c1a1e66edacc1667acf6818f524c0997f10':
  x86inc: activate REP_RET automatically

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-10-08 10:27:30 +02:00
Henrik Gramner
3e2fa991db x86inc: remove misaligned cpu flag
Prevents a crash if the misaligned exception mask bit is
cleared for some reason.

Misaligned SSE functions are only used on AMD Phenom CPUs
and the benefit is minuscule. They also require modifying
the MXCSR control register and by removing those functions
we can get rid of that complexity altogether.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:27:38 -04:00
Jason Garrett-Glaser
7115566541 x86inc: various minor backports from x264
Small backports that sneaked into other asm commits in x264.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:27:22 -04:00
Derek Buitenhuis
47f9d7ce54 x86inc: Check for __OUTPUT_FORMAT__ having a value of "x64"
This is also a valid value for WIN64.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:27:08 -04:00
Henrik Gramner
bbe4a6db44 x86inc: Utilize the shadow space on 64-bit Windows
Store XMM6 and XMM7 in the shadow space in functions that
clobber them. This way we don't have to adjust the stack
pointer as often, reducing the number of instructions as
well as code size.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:25:35 -04:00
Loren Merritt
3fb78e99a0 x86inc: create xm# and ym#, analagous to m#
For when we want to mix simd sizes within one function.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:25:19 -04:00
Loren Merritt
49ebe3f9fe x86inc: fix some corner cases of SWAP
SWAP with >=3 named (rather than numbered) args, and
PERMUTE followed by SWAP with 2 named args, used to
produce the wrong permutation.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:25:06 -04:00
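To make "the wrong permutation" concrete, here is a Python model of the intended chain behaviour. It assumes that `SWAP a, b, c, ...` acts as successive pairwise swaps (a,b), then (b,c), and so on, over a mapping from logical names to physical registers; it models only the straightforward case, not the named-argument corner cases this commit fixes.

```python
def swap_chain(mapping, *names):
    """Apply SWAP a, b, c, ... as a series of pairwise swaps (a,b), (b,c), ..."""
    for a, b in zip(names, names[1:]):
        mapping[a], mapping[b] = mapping[b], mapping[a]
    return mapping


regs = {"m0": "xmm0", "m1": "xmm1", "m2": "xmm2"}
print(swap_chain(regs, "m0", "m1", "m2"))
# {'m0': 'xmm1', 'm1': 'xmm2', 'm2': 'xmm0'}
```

A three-argument SWAP therefore rotates the mapping; resolving names in the wrong order partway through the chain would yield a different rotation.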
Henrik Gramner
63f0d62310 x86inc: Use SSE instead of SSE2 for copying data
Reduces code size because movaps/movups is one byte
shorter than movdqa/movdqu.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:24:33 -04:00
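The one-byte difference comes from the mandatory `66`/`F3` prefix on the integer moves. The byte strings below are hand-assembled legacy (non-VEX) register-to-register encodings of `xmm0 <- xmm1` (ModRM `C1`) from the standard opcode tables; the Python is only there to print and measure them.

```python
# Register-to-register forms of xmm0 <- xmm1 (ModRM 0xC1), non-VEX encoding.
ENCODINGS = {
    "movaps xmm0, xmm1": bytes([0x0F, 0x28, 0xC1]),        # 0F 28 /r
    "movups xmm0, xmm1": bytes([0x0F, 0x10, 0xC1]),        # 0F 10 /r
    "movdqa xmm0, xmm1": bytes([0x66, 0x0F, 0x6F, 0xC1]),  # 66 0F 6F /r
    "movdqu xmm0, xmm1": bytes([0xF3, 0x0F, 0x6F, 0xC1]),  # F3 0F 6F /r
}

for insn, code in ENCODINGS.items():
    print(f"{insn}: {code.hex(' ')} ({len(code)} bytes)")
```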
Henrik Gramner
ad76e6e7e1 x86inc: Set ELF hidden visibility for global constants
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:24:13 -04:00
Loren Merritt
25cb0c1a1e x86inc: activate REP_RET automatically
Now RET checks whether it immediately follows a branch, so the
programmer doesn't have to keep track of that condition. REP_RET
is still needed manually when it's a branch target, but that's
much rarer.

The implementation involves lots of spurious labels, but that's OK
because we strip them.

Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
2013-10-07 06:17:59 -04:00
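Background not stated in the commit: `rep ret` is a two-byte return used to avoid a branch-prediction penalty on some AMD processors when a one-byte `ret` directly follows a conditional jump or is a branch target. The Python toy below models only the rule the commit automates (a RET immediately after a conditional branch becomes `rep ret`); it tracks just the previous instruction and ignores branch targets, which the commit says still need REP_RET by hand.

```python
# Conditional jumps only; an unconditional jmp never falls through to the RET.
CONDITIONAL_BRANCHES = {
    "je", "jne", "jz", "jnz", "jl", "jle", "jg", "jge",
    "jb", "jbe", "ja", "jae", "jc", "jnc", "js", "jns",
}


def lower_rets(instructions):
    """Emit 'rep ret' for a RET that directly follows a conditional branch."""
    out = []
    prev = None
    for insn in instructions:
        mnemonic = insn.split()[0]
        if mnemonic == "ret" and prev in CONDITIONAL_BRANCHES:
            out.append("rep ret")
        else:
            out.append(insn)
        prev = mnemonic
    return out


print(lower_rets(["dec ecx", "jnz .loop", "ret"]))  # ['dec ecx', 'jnz .loop', 'rep ret']
```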
Ronald S. Bultje
c07ac8d467 VP9 MC (ssse3) optimizations.
Decoding time of ped1080p.webm goes from 20.7sec to 11.3sec.
2013-10-02 21:03:15 -04:00
Christophe Gisquet
2e81acc687 x86inc: Fix number of operands for cmp* instructions
cmp{p,s}{s,d} instructions do take an imm8 operand.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-04-09 23:55:30 +02:00
Christophe Gisquet
0b467a6e83 x264asm: fix cmp* number of arguments
cmp{p,s}{s,d} instructions do take an imm8 operand.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-04-05 16:42:12 +02:00
Ronald S. Bultje
0c0828ecc5 x86: Use simple nop codes for <= sse (rather than <= mmx)
The "CentaurHauls family 6 model 9 stepping 8" family of CPUs
(flags: fpu vme de pse tsc msr cx8 sep mtrr pge mov pat mmx fxsr sse
up rng rng_en ace ace_en) SIGILLs on long nop codes.

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-02-19 22:33:19 +02:00
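The difference between "simple" and "long" NOP padding can be sketched as a byte-level padding generator. In this Python sketch the long forms (`0F 1F ...`) are the multi-byte NOP encodings from Intel's documentation, while `90` and `66 90` run on any x86 CPU; which CPUs lack the long forms is taken from the commit, and the exact sequences an assembler emits for ALIGN may differ.

```python
# Recommended multi-byte NOPs, indexed by length in bytes.
LONG_NOPS = {
    1: bytes.fromhex("90"),
    2: bytes.fromhex("6690"),
    3: bytes.fromhex("0f1f00"),
    4: bytes.fromhex("0f1f4000"),
    5: bytes.fromhex("0f1f440000"),
    6: bytes.fromhex("660f1f440000"),
    7: bytes.fromhex("0f1f8000000000"),
    8: bytes.fromhex("0f1f840000000000"),
}


def nop_padding(n: int, long_nops_ok: bool) -> bytes:
    """Return n bytes of NOP padding."""
    if not long_nops_ok:
        return b"\x90" * n          # one-byte NOPs: safe on every x86 CPU
    out = bytearray()
    while n > 0:
        k = min(n, 8)               # use the longest available NOP first
        out += LONG_NOPS[k]
        n -= k
    return bytes(out)


print(nop_padding(11, True).hex(" "))
print(nop_padding(11, False).hex(" "))
```

Long NOPs decode as fewer instructions, which is why they are preferred where supported.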
Ronald S. Bultje
b582af1ed7 Use simple nop codes for <= sse (rather than <= mmx).
The "CPU: CentaurHauls family 6 model 9 stepping 8" family of CPUs
(flags: fpu vme de pse tsc msr cx8 sep mtrr pge mov pat mmx fxsr sse
up rng rng_en ace ace_en) SIGILLs on long nop codes.

Change-Id: I7e7c52a2191006df30a9aadbc40d481a1db89106
2013-02-11 23:38:57 +01:00
Michael Niedermayer
b45e0c2573 Merge commit 'd633d12b2cc999cee3ac25bf9a810fe7ff03726d'
* commit 'd633d12b2cc999cee3ac25bf9a810fe7ff03726d':
  x86inc: Add cvisible macro for C functions with public prefix

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-19 13:11:41 +01:00
Michael Niedermayer
1b03e09198 Merge commit 'ef5d41a5534b65f03d02f2e11a503ab8416bfc3b'
* commit 'ef5d41a5534b65f03d02f2e11a503ab8416bfc3b':
  x86inc: Rename "program_name" to "private_prefix"
  configure: Run SHFLAGS through ldflags_filter()

Conflicts:
	configure

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-19 13:01:06 +01:00
Diego Biurrun
d633d12b2c x86inc: Add cvisible macro for C functions with public prefix
This allows defining externally visible library symbols.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-01-18 22:02:03 +01:00
Diego Biurrun
ef5d41a553 x86inc: Rename "program_name" to "private_prefix"
The new name is more descriptive and will allow defining a separate
public prefix for externally visible library symbols.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-01-18 20:29:53 +01:00
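A sketch of how a prefix, a function name and a CPU suffix might combine into an exported symbol. This Python model rests on assumptions not stated in these commits: that symbols take the form `<prefix>_<name>_<cpuname>`, that FFmpeg uses `ff` as its private prefix and `avpriv` as its public prefix, and that some platforms (for example Mach-O or 32-bit Windows) add a leading underscore. `example_func` is a hypothetical name.

```python
def asm_symbol(prefix, name, cpuname=None, leading_underscore=False):
    """Build a symbol like <prefix>_<name>_<cpuname>, optionally '_'-prefixed."""
    symbol = f"{prefix}_{name}"
    if cpuname:
        symbol += f"_{cpuname}"
    return ("_" if leading_underscore else "") + symbol


print(asm_symbol("ff", "example_func", "sse2"))           # ff_example_func_sse2
print(asm_symbol("avpriv", "example_func", "avx", True))  # _avpriv_example_func_avx
```

Keeping the private and public prefixes separate lets internal helpers and externally visible library symbols be named independently.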
Michael Niedermayer
7e90053822 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  mpegvideo: increase edge_emu_buffer size for VC1
  lavc: merge latest x86inc.asm fixes with x264

Conflicts:
	libavcodec/mpegvideo.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-12-20 02:51:35 +01:00
Ronald S. Bultje
a34d9ad969 lavc: merge latest x86inc.asm fixes with x264
Unbreak NASM support.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2012-12-19 07:27:33 +01:00
Michael Niedermayer
a01fe55077 Merge commit 'c0dc57f1264dad1e121772d03abdb9e14ed8857f'
* commit 'c0dc57f1264dad1e121772d03abdb9e14ed8857f':
  asyncts: merge two conditions
  x86inc: fully concatenate tokens to fix macro expansion for nasm
  h264: initialize frame-mt context copies properly

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-12-14 15:43:46 +01:00
Janne Grunau
0995ad8db4 x86inc: fully concatenate tokens to fix macro expansion for nasm
Fixes build errors with nasm introduced in 6f40e9f070 for stack
memory alignment. Noticed by BugMaster.
2012-12-13 23:57:09 +01:00
Michael Niedermayer
7897919a88 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  aacdec: Fix an off-by-one overwrite when switching to LTP profile from MAIN.
  x86inc: fix stack alignment on win64
  rtpproto: Remove unused defines

Conflicts:
	libavcodec/aacdec.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-12-13 12:23:48 +01:00
Ronald S. Bultje
140367aff9 x86inc: fix stack alignment on win64
Signed-off-by: Martin Storsjö <martin@martin.st>
2012-12-12 21:30:49 +02:00
Ronald S. Bultje
ce58642ed0 x86inc: support stack mem allocation and re-alignment in PROLOGUE.
Use this in VP8/H264-8bit loopfilter functions so they can be used if
there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-12-12 10:37:52 +01:00
Ronald S. Bultje
6f40e9f070 x86inc: support stack mem allocation and re-alignment in PROLOGUE
Use this in VP8/H264-8bit loopfilter functions so they can be used if
there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2012-12-12 05:23:46 +01:00
Michael Niedermayer
54a71f2e6c Merge commit 'b519298a1578e0c895d53d4b4ed8867b1c031a56'
* commit 'b519298a1578e0c895d53d4b4ed8867b1c031a56':
  pixdesc: fix yuva 10bit bit depth
  avconv: deprecate the -vol option
  x86: af_volume: add SSE2/SSSE3/AVX-optimized s32 volume scaling
  x86: af_volume: add SSE2-optimized s16 volume scaling

Conflicts:
	ffmpeg.c
	tests/ref/lavfi/pixdesc
	tests/ref/lavfi/pixfmts_copy
	tests/ref/lavfi/pixfmts_null
	tests/ref/lavfi/pixfmts_scale
	tests/ref/lavfi/pixfmts_vflip

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-12-06 15:55:47 +01:00
Justin Ruggles
b30a363331 x86: af_volume: add SSE2/SSSE3/AVX-optimized s32 volume scaling 2012-12-05 11:23:37 -05:00
Michael Niedermayer
da501ea857 Merge commit '802713c4e7b41bc2deed754d78649945c3442063'
* commit '802713c4e7b41bc2deed754d78649945c3442063':
  mss2: prevent potential uninitialized reads
  mss2: reindent after last commit
  mss2: fix handling of unmasked implicit WMV9 rectangles
  configure: add lavu dependency to lavr/lavfi .pc files
  x86inc: Set program_name outside of x86inc.asm

Conflicts:
	configure

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-11-12 10:57:06 +01:00
Diego Biurrun
f0d124f005 x86inc: Set program_name outside of x86inc.asm
This reduces the local difference to the x264 upstream version.
2012-11-11 11:06:19 +01:00
Michael Niedermayer
1dad486714 Merge commit '9ce02e14f01de50fcc6f7f459544b140be66d615'
* commit '9ce02e14f01de50fcc6f7f459544b140be66d615':
  x86: ac3dsp: port to cpuflags
  x86util: Add cpuflags_mmxext alias for cpuflags_mmx2
  x86inc: Only define program_name if the macro is unset

Conflicts:
	libavcodec/x86/ac3dsp.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-11-03 13:38:38 +01:00
Diego Biurrun
012f73e271 x86inc: Only define program_name if the macro is unset
This allows overriding the value from outside of the file.
2012-11-02 14:38:00 +01:00
Ronald S. Bultje
08b028c18d Remove INIT_AVX from x86inc.asm. 2012-10-29 14:51:14 -07:00
Michael Niedermayer
17106a7c90 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  audio_frame_queue: Clean up ff_af_queue_log_state debug function
  dwt: Remove unused code.
  cavs: convert cavsdata.h to a .c file
  cavs: Move inline functions only used in one file out of the header
  cavs: Move data tables used in only one place to that file
  fate: Add a single symbol Ut Video decoder test
  vf_hqdn3d: x86 asm
  vf_hqdn3d: support 16bit colordepth
  avconv: prefer user-forced input framerate when choosing output framerate

Conflicts:
	ffmpeg.c
	libavcodec/audio_frame_queue.c
	libavcodec/dwt.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-08-26 22:40:02 +02:00
Loren Merritt
7a1944b907 vf_hqdn3d: x86 asm
13% faster on penryn, 16% on sandybridge, 15% on bulldozer
Not simd; a compiler should have generated this, but gcc didn't.
2012-08-26 10:49:14 +00:00
Michael Niedermayer
c794acc44e x86inc.asm: remove redundant ifdef __YASM_VER__
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-08-08 01:14:18 +02:00
Michael Niedermayer
2fc7c818cb Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86: fix build with nasm 2.08
  x86: use nop cpu directives only if supported
  x86: fix rNmp macros with nasm
  build: add trailing / to yasm/nasm -I flags
  x86: use 32-bit source registers with movd instruction
  x86: add colons after labels

Conflicts:
	Makefile
	libavutil/x86/x86inc.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-08-07 23:04:55 +02:00
Mans Rullgard
edd8226795 x86: fix build with nasm 2.08
It appears that something goes wrong in old nasm versions when the
%+ operator is used in the last argument of a macro invocation and
this argument is tested with %ifdef within the macro.  This patch
rearranges the macro arguments such that the %+ operator is never
used in the last argument.
2012-08-07 15:24:34 +01:00
Mans Rullgard
180d43bc67 x86: use nop cpu directives only if supported
nasm does not support 'CPU foonop' directives.  This adds a configure
test for the directive and uses it only if supported.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-07 15:22:20 +01:00
Mans Rullgard
7238265052 x86: fix rNmp macros with nasm
For some reason, nasm requires this.  No harm done to yasm.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2012-08-07 15:21:58 +01:00
Michael Niedermayer
a7acab6cda Merge remote-tracking branch 'qatar/master'
* qatar/master:
  vc1dec: Remove separate scaling function for interlaced field MVs
  vc1dec: Invoke edge_emulation regardless of MV precision
  x86: Use consistent 3dnowext function and macro name suffixes
  g723_1: scale output as supposed for the case with postfilter disabled
  g723_1: increase excitation storage by 4
  g723_1: fix upper bound parameter from inverse maximum autocorrelation
  g723_1: make scale_vector() behave like the reference
  g723_1: fix off-by-one error in normalize_bits()
  g723_1: save/restore excitation with offset to store LPC history
  wmapro: prevent division by zero when sample rate is unspecified
  x86: proresdsp: improve SIGNEXTEND macro comments
  x86: h264dsp: K&R formatting cosmetics
  LICENSE: Document all GPL files

Conflicts:
	libavcodec/g723_1.c
	libavcodec/wmaprodec.c
	libavcodec/x86/h264dsp_mmx.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-08-03 23:13:06 +02:00
Diego Biurrun
ca844b7be9 x86: Use consistent 3dnowext function and macro name suffixes
Currently there is a wild mix of 3dn2/3dnow2/3dnowext.  Switching to
"3dnowext", which is a more common name of the CPU flag, as reported
e.g. by the Linux kernel, unifies this.
2012-08-03 14:00:47 +02:00
Michael Niedermayer
706bd8ea19 Merge remote-tracking branch 'qatar/master'
* qatar/master: (35 commits)
  h264_idct_10bit: port x86 assembly to cpuflags.
  x86inc: clip num_args to 7 on x86-32.
  x86inc: sync to latest version from x264.
  fft: rename "z" to "zc" to prevent name collision.
  wv: return meaningful error codes.
  wv: return AVERROR_EOF on EOF, not EIO.
  mp3dec: forward errors for av_get_packet().
  mp3dec: remove a pointless local variable.
  mp3dec: remove commented out cruft.
  lavfi: bump minor to mark stabilizing the ABI.
  FATE: add tests for yadif.
  FATE: add a test for delogo video filter.
  FATE: add a test for amix audio filter.
  audiogen: allow specifying random seed as a commandline parameter.
  vc1dec: Override invalid macroblock quantizer
  vc1: avoid reading beyond the last line in vc1_draw_sprites()
  vc1dec: check that coded slice positions and interlacing match.
  vc1dec: Do not ignore ff_vc1_parse_frame_header_adv return value
  configure: Move parts that should not be user-selectable to CONFIG_EXTRA
  lavf: remove commented out cruft in avformat_find_stream_info()
  ...

Conflicts:
	Makefile
	configure
	libavcodec/vc1dec.c
	libavcodec/x86/h264_deblock.asm
	libavcodec/x86/h264_deblock_10bit.asm
	libavcodec/x86/h264dsp_mmx.c
	libavfilter/version.h
	libavformat/mp3dec.c
	libavformat/utils.c
	libavformat/wv.c
	libavutil/x86/x86inc.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-07-29 02:16:26 +02:00
Loren Merritt
f8d8fe255d x86inc: clip num_args to 7 on x86-32.
This allows us to unconditionally set the cglobal num_args
parameter to a bigger value, thus making writing yasm code
even easier than before.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2012-07-28 08:29:45 -07:00
Ronald S. Bultje
96c9cc1094 x86inc: sync to latest version from x264. 2012-07-28 08:29:44 -07:00
Michael Niedermayer
c6963a220d Merge remote-tracking branch 'qatar/master'
* qatar/master:
  proresdsp: port x86 assembly to cpuflags.
  lavr: x86: improve non-SSE4 version of S16_TO_S32_SX macro
  lavfi: better channel layout negotiation
  alac: check for truncated packets
  alac: reverse lpc coeff order, simplify filter
  lavr: add x86-optimized mixing functions
  x86: add support for fmaddps fma4 instruction with abstraction to avx/sse
  tscc2: fix typo in array index
  build: use COMPILE template for HOSTOBJS
  build: do full flag handling for all compiler-type tools
  eval: fix printing of NaN in eval fate test.
  build: Rename aandct component to more descriptive aandcttables
  mpegaudio: bury inline asm under HAVE_INLINE_ASM.
  x86inc: automatically insert vzeroupper for YMM functions.
  rtmp: Check the buffer length of ping packets
  rtmp: Allow having more unknown data at the end of a chunk size packet without failing
  rtmp: Prevent reading outside of an allocate buffer when receiving server bandwidth packets

Conflicts:
	Makefile
	configure
	libavcodec/x86/proresdsp.asm
	libavutil/eval.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-07-27 23:42:19 +02:00
Justin Ruggles
79687079a9 x86: add support for fmaddps fma4 instruction with abstraction to avx/sse 2012-07-27 11:25:48 -04:00
Ronald S. Bultje
30b45d9c38 x86inc: automatically insert vzeroupper for YMM functions. 2012-07-26 13:43:16 -07:00
Clément Bœsch
7073174551 x86inc: put basicnop under ifdef to prevent compile failure.
This should fix the NASM box.

Reviewed-by: Michael Niedermayer <michaelni@gmx.at>
2012-07-07 22:48:43 +02:00
Michael Niedermayer
dc12f7d4ec x86inc: try to put amdnop under ifdef to prevent compile failure
based on similar amdnop usage in ffmpeg

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-07-07 20:16:56 +02:00
Michael Niedermayer
24823a761c Merge remote-tracking branch 'qatar/master'
* qatar/master:
  qdm2: remove broken and disabled dump_context() debug function
  x86: h264_intrapred: use newly introduced SPLAT* and PSHUFLW macros
  x86inc: add SPLATB_LOAD, SPLATB_REG, PSHUFLW macros
  x86inc: modify ALIGN to not generate long nops on i586
  x86: h264_intrapred: port to cpuflag macros
  avplay: update input filter pointer when the filtergraph is reset.
  avconv: fix parsing of -force_key_frames option.
  h264: use templates to avoid excessive inlining
  xtea: Make the count parameter match the documentation
  blowfish: Make the count parameter match the documentation
  mpegvideo: Don't use ff_mspel_motion() for vc1
  xtea: invert branch and loop precedence
  blowfish: invert branch and loop precedence
  flvdec: optionally trust the metadata
  avconv: Set audio filter time base to the sample rate
  vp8: Add ifdef guards around the sse2 loopfilter in the sse2slow branch too

Conflicts:
	ffmpeg.c
	ffplay.c
	libavcodec/h264.c
	libavcodec/mpegvideo_common.h

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-07-05 21:55:31 +02:00
Loren Merritt
2cd1f5cadc x86inc: modify ALIGN to not generate long nops on i586
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2012-07-05 17:37:11 +02:00
Reimar Döffinger
9b1f776d75 Fix compilation with NASM.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
2012-04-20 21:16:12 +02:00
Michael Niedermayer
2a976debc1 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  dv: Initialize encoder tables during encoder init.
  dv: Replace some magic numbers by the appropriate #define.
  FATE: pass the decoded output format and audio source file to enc_dec_pcm
  FATE: specify the input format when decoding in enc_dec_pcm()
  x86inc: support AVX abstraction for 2-operand instructions
  configure: detect PGI compiler and set suitable flags
  avconv: check for an incompatible changing channel layout
  avio: make AVIOContext.av_class pointer to const
  nutdec: add malloc check and fix const to non-const conversion warnings

Conflicts:
	ffmpeg.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-19 21:23:52 +02:00
Nico Weber
a4a88fd42c Remove .rodata alignment kludge for Mach-O if a recent enough yasm is used.
Yasm was fixed in its r2161 and yasm 0.8.0 (Apr 2010) contained this fix.
Nasm was fixed in 2.06 (Jun 2009):
https://groups.google.com/group/alt.lang.asm/browse_thread/thread/fcc85bbc3745d893

I tested with yasm 0.7.99 and yasm 1.2.0.7, where this works fine.

I also tested with nasm. The nasm shipping with Xcode is too old to understand
ffmpeg's assembly, before and after the patch. Nasm 2.10 fails to compile
fft_mmx.asm on trunk with

  libavcodec/x86/fft_mmx.asm:88: panic: section ".text" has already been specified with alignment 32, conflicts with new alignment of 16

but builds fine if I change the two alignment "16"s in x86inc.asm to "32". With this patch,
nasm 2.10 fails with

  libavcodec/x86/fft_mmx.asm:39: panic: section ".rodata" has already been specified with alignment 32, conflicts with new alignment of 16

instead, but again builds fine with s/16/32/.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2012-04-19 17:14:47 +02:00
Loren Merritt
705f3d4759 x86inc: support AVX abstraction for 2-operand instructions
Add cvtdq2ps and cvtps2dq to the AVX instruction list.

Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>
2012-04-18 21:14:32 -04:00