GCC 4.3 and later do the right thing with the plain C code. Earlier
versions in 32-bit mode generate one extra instruction, needlessly
zeroing what would be the high half of the shifted value. At least
two gcc configurations miscompile the inline asm in some situations.
In 64-bit mode, all gcc versions generate imul r64, r64 followed by
shr. On Intel i7 and later, this imul is faster than 32-bit mul. On
older Intel and all AMD, it is slightly slower. On Atom it is much
slower.
Considering where the FASTDIV macro is used, any overall negative
performance impact of this change should be negligible. If anyone
cares, they should file a bug against gcc and get the instruction
selection fixed.
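
For reference, the plain C form discussed above amounts to a widening
multiply by a scaled reciprocal followed by a 32-bit right shift. The
following is a minimal sketch only, not the actual macro: the names
inverse_sketch and fastdiv_sketch are hypothetical, and the reciprocal is
computed on the fly here, whereas real code would presumably read it from
a precomputed table.

```c
#include <stdint.h>

/* Hypothetical helper: reciprocal of b scaled by 2^32, rounded up.
 * Kept in 64 bits so that b == 1 (reciprocal 2^32) does not overflow. */
static inline uint64_t inverse_sketch(uint32_t b)
{
    return ((UINT64_C(1) << 32) + b - 1) / b;
}

/* Division by multiplication: widen to 64 bits, multiply, and keep the
 * high half of the product. Exact for b != 0 as long as a * b < 2^32. */
static inline uint32_t fastdiv_sketch(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * inverse_sketch(b)) >> 32);
}
```

The 64-bit multiply followed by the shift is the pattern the compiler has
to recognise; whether it ends up as a 32-bit mul or a 64-bit imul is the
instruction selection question described above.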
Signed-off-by: Mans Rullgard <mans@mansr.com>
There is no point in having the user disable any fastdiv macros.
Besides, the conditional implementation was broken: it only disabled
the C implementation, not the platform-specific assembly versions.
Pass pointer to sample buffer instead of channel number to various
functions called from decode_subframe(). Also simplify a few
expressions within this function.
The failures on various architectures and compilers on the RGB(A)
tests seem to have been because of off-by-one YCbCr->RGB conversion
results. This should make the conversion results match on most if
not all code paths.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Currently it takes a mask and value, and selects the options for which
(flags & mask) == value.
Change it to take required flags and forbidden flags instead. This is
shorter and simpler to understand.
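
The difference between the two forms can be sketched as follows. This is
an illustration only; the function and parameter names (match_mask_value,
match_req_rej, req_flags, rej_flags) are hypothetical and not taken from
the code being changed.

```c
#include <stdbool.h>

/* Old semantics: a single masked comparison. The caller has to fold
 * both the wanted and the unwanted bits into mask, and the wanted bits
 * into value. */
static bool match_mask_value(int flags, int mask, int value)
{
    return (flags & mask) == value;
}

/* New semantics: every required bit must be set and no forbidden bit
 * may be set. */
static bool match_req_rej(int flags, int req_flags, int rej_flags)
{
    return (flags & req_flags) == req_flags && !(flags & rej_flags);
}
```

Assuming the required and forbidden sets do not overlap, the old call with
mask = req_flags | rej_flags and value = req_flags selects the same
options, but the caller no longer has to build the mask by hand.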
Replace mpz_random by mpz_urandomb with a random state initialization in
order to improve the randomness.
Signed-off-by: Martin Storsjö <martin@martin.st>
The h264_vdpau decoder crashed if output colorspace was not 8-bit 420.
Add a check to error out instead (current hardware does not support
other colorspaces, so successful decoding is not possible).
Signed-off-by: Martin Storsjö <martin@martin.st>
The way this bit is decoded was accidentally flipped in b70feb405,
leading to warnings "Encountered a bad or corrupted frame" for each
decoded frame.
Signed-off-by: Martin Storsjö <martin@martin.st>
There used to be one test for Altivec intrinsics support and a
separate test to determine which of two possible syntaxes to use
for vector literals. Since 2008, we have only supported the more common
of these, so the split test no longer makes sense.
This combines the tests into one and also changes the hard error on
failure to a warning. The test can reasonably fail if no --cpu flag
is provided (or is provided with an unknown CPU) and the compiler
default target does not support Altivec. Aborting in this case is
probably over-reacting.
Signed-off-by: Mans Rullgard <mans@mansr.com>