Emulation requires a temporary register if arguments 1 and 4 are the same; this
doesn't obey the semantics of the original instruction, so we can't emulate
that in x86inc.
Also add pmacsdql emulation.
Signed-off-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Change ALLOC_STACK to always align the stack before allocating stack space for
consistency. Previously alignment would occur either before or after allocating
stack space depending on whether manual alignment was required or not.
Only two functions that use xop multiply-accumulate instructions where the
first operand is the same as the fourth actually took advantage of the macros.
This further reduces differences with x264's x86inc.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
The bug was fixed in 1.3.0, so only perform the workaround in earlier versions.
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Silences warning(s) like:
libavcodec/x86/fft.asm:93: warning: section flags ignored on
section redeclaration
The cause of this warning is that because `struc` and `endstruc`
attempts to revert to the previous section state [1].
The section state is stored in the macro __SECT__, defined by
x86inc.asm to be `.note.GNU-stack ...`, through the `SECTION`
directive [2].
Thus, the `.note.GNU-stack` section is defined twice
(once in x86inc.asm, once during `endstruc`), causing the warning.
That is the first part of the commit: using the primitive `[section]` format
for .note.GNU-stack etc., which does not update `__SECT__` [2].
That fixes only half of the problem. Even without any `SECTION` directives,
`__SECT__` is predefined as `.text`, which conflicting with the later
`SECTION_TEXT` (which expands to `.text align=16`).
[1]: http://www.nasm.us/doc/nasmdoc6.html#section-6.4
[2]: http://www.nasm.us/doc/nasmdoc6.html#section-6.3
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
This commit silences warning(s) like:
libavcodec/x86/fft.asm:93: warning: section flags ignored on section
redeclaration
The cause of this warning is that because `struc` and `endstruc` attempts to
revert to the previous section state [1]. The section state is stored in the
macro __SECT__, defined by x86inc.asm to be `.note.GNU-stack ...`, through the
`SECTION` directive [2]. Thus, the `.note.GNU-stack` section is defined twice
(once in x86inc.asm, once during `endstruc`), causing the warning.
That is the first part of the commit: using the primitive `[section]` format
for .note.GNU-stack etc., which does not update `__SECT__` [2].
That fixes only half of the problem. Even without any `SECTION` directives,
`__SECT__` is predefined as `.text`, which conflicting with the later
`SECTION_TEXT` (which expands to `.text align=16`).
[1]: http://www.nasm.us/doc/nasmdoc6.html#section-6.4
[2]: http://www.nasm.us/doc/nasmdoc6.html#section-6.3
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
SSE2 instructions that are XMM-implementations of pre-existing MMX/MMX2
instructions did not issue warnings when used in SSE functions. Handle
it by also checking the register type when such instructions are used.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This mimicks what is done for the other instruction sets.
Tested-by: James Almer <jamrial@gmail.com>
Tested-by: Mickaël Raulet <mraulet@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
We need the emulation to support the cases where the first
argument is the same as the fourth. To achieve this a fifth
argument working as a temporary may be needed.
Emulation that doesn't obey the original instruction semantics
can't be in x86inc.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Support the cases where the first and last operand of
the XOP instruction are the same.
Also add vpmacsdql emulation.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'c108ba0175d4fc3a3253a8b0f782fbfb96ba5098':
x86inc: Use VEX-encoded instructions in AVX functions
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This is so we can sync to x264's version of FMA4 support.
This partialy reverts commit 79687079a9.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4
functions for all instructions that exists in a VEX-encoded
version.
This change makes it easier to extend existing code to use AVX2.
Also add support for AVX emulation of a few instructions that
were missing before.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
* commit '47f9d7ce5493e119e09d1227d017414feaaf8d97':
x86inc: Check for __OUTPUT_FORMAT__ having a value of "x64"
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'bbe4a6db44f0b55b424a5cc9d3e89cd88e250450':
x86inc: Utilize the shadow space on 64-bit Windows
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '63f0d623100bdb0c6081456127f4b6713e83d3db':
x86inc: Use SSE instead of SSE2 for copying data
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'ad76e6e7e193b98e7335156422d35467816f9ef1':
x86inc: Set ELF hidden visibility for global constants
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Prevents a crash if the misaligned exception mask bit is
cleared for some reason.
Misaligned SSE functions are only used on AMD Phenom CPUs
and the benefit is miniscule. They also require modifying
the MXCSR control register and by removing those functions
we can get rid of that complexity altogether.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Store XMM6 and XMM7 in the shadow space in functions that
clobbers them. This way we don't have to adjust the stack
pointer as often, reducing the number of instructions as
well as code size.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
SWAP with >=3 named (rather than numbered) args
PERMUTE followed by SWAP with 2 named args
used to produce the wrong permutation
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Now RET checks whether it immediately follows a branch, so the
programmer dosen't have to keep track of that condition. REP_RET
is still needed manually when it's a branch target, but that's
much rarer.
The implementation involves lots of spurious labels, but that's OK
because we strip them.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
The "CentaurHauls family 6 model 9 stepping 8" family of CPUs
(flags: fpu vme de pse tsc msr cx8 sep mtrr pge mov pat mmx fxsr sse
up rng rng_en ace ace_en) SIGILLs on long nop codes.
Signed-off-by: Martin Storsjö <martin@martin.st>
The "CPU: CentaurHauls family 6 model 9 stepping 8" family of CPUs
(flags: fpu vme de pse tsc msr cx8 sep mtrr pge mov pat mmx fxsr sse
up rng rng_en ace ace_en) SIGILLs on long nop codes.
Change-Id: I7e7c52a2191006df30a9aadbc40d481a1db89106
* commit 'd633d12b2cc999cee3ac25bf9a810fe7ff03726d':
x86inc: Add cvisible macro for C functions with public prefix
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'ef5d41a5534b65f03d02f2e11a503ab8416bfc3b':
x86inc: Rename "program_name" to "private_prefix"
configure: Run SHFLAGS through ldflags_filter()
Conflicts:
configure
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The new name is more descriptive and will allow defining a separate
public prefix for externally visible library symbols.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
* qatar/master:
aacdec: Fix an off-by-one overwrite when switching to LTP profile from MAIN.
x86inc: fix stack alignment on win64
rtpproto: Remove unused defines
Conflicts:
libavcodec/aacdec.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Use this in VP8/H264-8bit loopfilter functions so they can be used if
there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Use this in VP8/H264-8bit loopfilter functions so they can be used if
there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* commit '9ce02e14f01de50fcc6f7f459544b140be66d615':
x86: ac3dsp: port to cpuflags
x86util: Add cpuflags_mmxext alias for cpuflags_mmx2
x86inc: Only define program_name if the macro is unset
Conflicts:
libavcodec/x86/ac3dsp.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
audio_frame_queue: Clean up ff_af_queue_log_state debug function
dwt: Remove unused code.
cavs: convert cavsdata.h to a .c file
cavs: Move inline functions only used in one file out of the header
cavs: Move data tables used in only one place to that file
fate: Add a single symbol Ut Video decoder test
vf_hqdn3d: x86 asm
vf_hqdn3d: support 16bit colordepth
avconv: prefer user-forced input framerate when choosing output framerate
Conflicts:
ffmpeg.c
libavcodec/audio_frame_queue.c
libavcodec/dwt.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
x86: fix build with nasm 2.08
x86: use nop cpu directives only if supported
x86: fix rNmp macros with nasm
build: add trailing / to yasm/nasm -I flags
x86: use 32-bit source registers with movd instruction
x86: add colons after labels
Conflicts:
Makefile
libavutil/x86/x86inc.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
It appears that something goes wrong in old nasm versions when the
%+ operator is used in the last argument of a macro invocation and
this argument is tested with %ifdef within the macro. This patch
rearranges the macro arguments such that the %+ operator is never
used in the last argument.
nasm does not support 'CPU foonop' directives. This adds a configure
test for the directive and uses it only if supported.
Signed-off-by: Mans Rullgard <mans@mansr.com>
* qatar/master:
vc1dec: Remove separate scaling function for interlaced field MVs
vc1dec: Invoke edge_emulation regardless of MV precision
x86: Use consistent 3dnowext function and macro name suffixes
g723_1: scale output as supposed for the case with postfilter disabled
g723_1: increase excitation storage by 4
g723_1: fix upper bound parameter from inverse maximum autocorrelation
g723_1: make scale_vector() behave like the reference
g723_1: fix off-by-one error in normalize_bits()
g723_1: save/restore excitation with offset to store LPC history
wmapro: prevent division by zero when sample rate is unspecified
x86: proresdsp: improve SIGNEXTEND macro comments
x86: h264dsp: K&R formatting cosmetics
LICENSE: Document all GPL files
Conflicts:
libavcodec/g723_1.c
libavcodec/wmaprodec.c
libavcodec/x86/h264dsp_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Currently there is a wild mix of 3dn2/3dnow2/3dnowext. Switching to
"3dnowext", which is a more common name of the CPU flag, as reported
e.g. by the Linux kernel, unifies this.
* qatar/master: (35 commits)
h264_idct_10bit: port x86 assembly to cpuflags.
x86inc: clip num_args to 7 on x86-32.
x86inc: sync to latest version from x264.
fft: rename "z" to "zc" to prevent name collision.
wv: return meaningful error codes.
wv: return AVERROR_EOF on EOF, not EIO.
mp3dec: forward errors for av_get_packet().
mp3dec: remove a pointless local variable.
mp3dec: remove commented out cruft.
lavfi: bump minor to mark stabilizing the ABI.
FATE: add tests for yadif.
FATE: add a test for delogo video filter.
FATE: add a test for amix audio filter.
audiogen: allow specifying random seed as a commandline parameter.
vc1dec: Override invalid macroblock quantizer
vc1: avoid reading beyond the last line in vc1_draw_sprites()
vc1dec: check that coded slice positions and interlacing match.
vc1dec: Do not ignore ff_vc1_parse_frame_header_adv return value
configure: Move parts that should not be user-selectable to CONFIG_EXTRA
lavf: remove commented out cruft in avformat_find_stream_info()
...
Conflicts:
Makefile
configure
libavcodec/vc1dec.c
libavcodec/x86/h264_deblock.asm
libavcodec/x86/h264_deblock_10bit.asm
libavcodec/x86/h264dsp_mmx.c
libavfilter/version.h
libavformat/mp3dec.c
libavformat/utils.c
libavformat/wv.c
libavutil/x86/x86inc.asm
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This allows us to unconditionally set the cglobal num_args
parameter to a bigger value, thus making writing yasm code
even easier than before.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
* qatar/master:
proresdsp: port x86 assembly to cpuflags.
lavr: x86: improve non-SSE4 version of S16_TO_S32_SX macro
lavfi: better channel layout negotiation
alac: check for truncated packets
alac: reverse lpc coeff order, simplify filter
lavr: add x86-optimized mixing functions
x86: add support for fmaddps fma4 instruction with abstraction to avx/sse
tscc2: fix typo in array index
build: use COMPILE template for HOSTOBJS
build: do full flag handling for all compiler-type tools
eval: fix printing of NaN in eval fate test.
build: Rename aandct component to more descriptive aandcttables
mpegaudio: bury inline asm under HAVE_INLINE_ASM.
x86inc: automatically insert vzeroupper for YMM functions.
rtmp: Check the buffer length of ping packets
rtmp: Allow having more unknown data at the end of a chunk size packet without failing
rtmp: Prevent reading outside of an allocate buffer when receiving server bandwidth packets
Conflicts:
Makefile
configure
libavcodec/x86/proresdsp.asm
libavutil/eval.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
qdm2: remove broken and disabled dump_context() debug function
x86: h264_intrapred: use newly introduced SPLAT* and PSHUFLW macros
x86inc: add SPLATB_LOAD, SPLATB_REG, PSHUFLW macros
x86inc: modify ALIGN to not generate long nops on i586
x86: h264_intrapred: port to cpuflag macros
avplay: update input filter pointer when the filtergraph is reset.
avconv: fix parsing of -force_key_frames option.
h264: use templates to avoid excessive inlining
xtea: Make the count parameter match the documentation
blowfish: Make the count parameter match the documentation
mpegvideo: Don't use ff_mspel_motion() for vc1
xtea: invert branch and loop precedence
blowfish: invert branch and loop precedence
flvdec: optionally trust the metadata
avconv: Set audio filter time base to the sample rate
vp8: Add ifdef guards around the sse2 loopfilter in the sse2slow branch too
Conflicts:
ffmpeg.c
ffplay.c
libavcodec/h264.c
libavcodec/mpegvideo_common.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
dv: Initialize encoder tables during encoder init.
dv: Replace some magic numbers by the appropriate #define.
FATE: pass the decoded output format and audio source file to enc_dec_pcm
FATE: specify the input format when decoding in enc_dec_pcm()
x86inc: support AVX abstraction for 2-operand instructions
configure: detect PGI compiler and set suitable flags
avconv: check for an incompatible changing channel layout
avio: make AVIOContext.av_class pointer to const
nutdec: add malloc check and fix const to non-const conversion warnings
Conflicts:
ffmpeg.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Yasm was fixed in its r2161 and yasm 0.8.0 (Apr 2010) contained this fix.
Nasm was fixed in 2.06 (Jun 2009):
https://groups.google.com/group/alt.lang.asm/browse_thread/thread/fcc85bbc3745d893
I tested with yasm 0.7.99 and yasm 1.2.0.7, where this works fine.
I also tested with nasm. The nasm shipping with Xcode is too old to understand
ffmpeg's assembly, before and after the patch. Nasm 2.10 fails to compile
fft_mmx.asm on trunk with
libavcodec/x86/fft_mmx.asm:88: panic: section ".text" has already been specified with alignment 32, conflicts with new alignment of 16
but builds fine if I change the two alignment "16"s in x86inc.asm to "32". With this patch,
nasm 2.10 fails with
libavcodec/x86/fft_mmx.asm:39: panic: section ".rodata" has already been specified with alignment 32, conflicts with new alignment of 16
instead, but again builds fine with s/16/32/.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>