Up to ~4 times faster on x86_64, ~8 times on x86_32 if compiling using x87 fp math.
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Using rNm and x86inc's stack allocation with a negative value at the same
time isn't supported, and caused the original stack pointer to be clobbered
when using a compiler that doesn't support stack alignment.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Some debuggers/profilers use this metadata to determine which function a
given instruction is in; without it they get can confused by local labels
(if you haven't stripped those). On the other hand, some tools are still
confused even with this metadata. e.g. this fixes `gdb`, but not `perf`.
Currently only implemented for ELF.
* commit '8563f9887194b07c972c3475d6b51592d77f73f7':
x86: use emms after ff_int32_to_float_fmul_scalar_sse
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
* commit 'f4f27e4cf1013c55b2c7df359ce8d58ee922662c':
x86: zero extend the 32-bit length in int32_to_float_fmul_scalar implicitly
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
* commit '2008f76054906e9ff6bf744800af0e5a5bfe61be':
dca: remove unused decode_hf function and quant_d tables
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
Intel's Instruction Set Reference (as of September 2015) clearly states
that cvtpi2ps switches to MMX state. Actual CPUs do not switch if the
source is a memory location. The Instruction Set Reference from 1999
(Order Number 243191) describes this behaviour but all later versions
I've seen have make no distinction whether MMX registers or memory is
used as source.
The documentation for the matching SSE2 instruction to convert to double
(cvtpi2pd) was fixed (see the valgrind bug
https://bugs.kde.org/show_bug.cgi?id=210264).
It will take time to get a clarification and fixes in place. In the
meantime it makes sense to change ff_int32_to_float_fmul_scalar_sse to
be correct according to the documentation. The vast majority of users
will have SSE2 so a change to the SSE version has little effect.
Fixes fate-checkasm on x86 valgrind targets.
Valgrind 'bug' reported as https://bugs.kde.org/show_bug.cgi?id=357059
This fixes extra semicolons that clang 3.7 on GNU/Linux warns about.
These were trigggered when built under -Wpedantic, which essentially
checks for strict ISO compliance in numerous ways.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Ganesh Ajjanagadde <gajjanagadde@gmail.com>
This function is only used within other inline asm functions, hence the
HAVE_MMX_INLINE guard. Per recent discussions, we should not worry about
the performance of inline asm-only builds.