Technically _tzcnt* intrinsics are only available when the BMI
instruction set is present. However the instruction encoding
degrades to "rep bsf" on older processors.
Clang for Windows debatably restricts the _tzcnt* instrinics behind
the __BMI__ architecture define, so check for its presence or
exclude the usage of these intrinics when clang is present.
See also:
https://ffmpeg.org/pipermail/ffmpeg-devel/2015-November/183404.htmlhttps://bugs.llvm.org/show_bug.cgi?id=30506http://lists.llvm.org/pipermail/cfe-dev/2016-October/051034.html
Signed-off-by: Dale Curtis <dalecurtis@chromium.org>
Reviewed-by: Matt Oliver <protogonoi@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
It seems to miscompile them
Should fix fate-ra-288 and fate-twinvq
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Reviewed-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
ICC versions older than atleast 12.1.6 dont have the tzcnt intrinsics.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Matt Oliver <protogonoi@gmail.com>
GCC 4.3 and later do the right thing with the plain C code. Earlier
versions in 32-bit mode generate one extra instruction, needlessly
zeroing what would be the high half of the shifted value. At least
two gcc configurations miscompile the inline asm in some situations.
In 64-bit mode, all gcc versions generate imul r64, r64 followed by
shr. On Intel i7 and later, this imul is faster 32-bit mul. On
older Intel and all AMD, it is slightly slower. On Atom it is much
slower.
Considering where the FASTDIV macro is used, any overall negative
performance impact of this change should be negligible. If anyone
cares, they should file a bug against gcc and get the instruction
selection fixed.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This reduces the number of false dependencies on header files and
speeds up compilation.
Originally committed as revision 22407 to svn://svn.ffmpeg.org/ffmpeg/trunk