Modelled after the AArch64 code.
On Cortex-A8, the s16 and s32 code is about 2x faster and the
float code about 7x faster.
Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
Signed-off-by: Martin Storsjö <martin@martin.st>
* commit '5bcbb516f2ff45290ef7995b081762e668693672':
arm: Add X() around all references to extern symbols
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
x86: dsputil: Move Xvid IDCT put/add functions to a more suitable place
trasher: Include all the necessary headers
x86: Remove some leftover declarations for non-existent functions
ARM: libavresample: NEON optimised generic fltp to s16 conversion
ARM: libavresample: NEON optimised stereo fltp to s16 conversion
ARM: libavresample: NEON optimised flat float to s16 conversion
Conflicts:
	libavcodec/x86/dsputil_mmx.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>