Timings for Arrandale:
           C   SSE
win32:  2108   334
win64:  1152   322
Factoring the inner loop out behind a call/jmp costs more than 15 cycles,
even with the jmp destination aligned.
Unrolling for ARCH_X86_64 gains 20 cycles.
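The unrolling trade-off noted above can be illustrated with a minimal C
sketch. It is not the SSE assembly this commit touches: scale_simple and
scale_unrolled4 are hypothetical names, and the element-wise scaling is only
a stand-in for the real inner loop. The unrolled variant runs a quarter as
many loop iterations, at the price of a larger body.

```c
#include <assert.h>
#include <stddef.h>

/* Baseline: one element per iteration (hypothetical helper). */
static void scale_simple(float *dst, const float *src, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i] * k;
}

/* Unrolled by four: same result, fewer loop-counter updates and branches.
 * Assumption for this sketch: n is a multiple of 4, so no tail loop. */
static void scale_unrolled4(float *dst, const float *src, float k, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        dst[i + 0] = src[i + 0] * k;
        dst[i + 1] = src[i + 1] * k;
        dst[i + 2] = src[i + 2] * k;
        dst[i + 3] = src[i + 3] * k;
    }
}
```

Whether unrolling pays off depends on the CPU and the compiler, so any gain
has to be measured, as the timings above were.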
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This removes the rather pointless wrappers (one not even inline)
for calling the fft_calc and related function pointers.
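As a rough sketch of the pattern being removed: DemoFFTContext and the
demo_* functions below are hypothetical stand-ins, not the actual FFmpeg
definitions. Only the idea of a context carrying an fft_calc function
pointer is taken from the message above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context holding a function pointer chosen at init time. */
typedef struct DemoFFTContext {
    int nbits;
    void (*fft_calc)(struct DemoFFTContext *s, float *z);
} DemoFFTContext;

/* Stand-in implementation: negates 2^nbits values. Not a real FFT. */
static void demo_fft_calc_c(DemoFFTContext *s, float *z)
{
    size_t n = (size_t)1 << s->nbits;
    for (size_t i = 0; i < n; i++)
        z[i] = -z[i];
}

/* A wrapper of the kind described: it only forwards to the pointer. */
static void demo_fft_calc_wrapper(DemoFFTContext *s, float *z)
{
    s->fft_calc(s, z);
}

/* Init selects the implementation; a real init might pick a SIMD version. */
static void demo_fft_init(DemoFFTContext *s, int nbits)
{
    assert(nbits >= 0 && nbits < 16);
    s->nbits = nbits;
    s->fft_calc = demo_fft_calc_c;
}
```

Without the wrapper, a caller invokes the pointer directly, for example
s.fft_calc(&s, z), which saves one call level when the wrapper is not
inlined.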
Signed-off-by: Mans Rullgard <mans@mansr.com>
This is not an attempt to emulate indent -kr behavior down to the finest
fine print: first, it would not be worth the work; second, it would be less
readable; third, the result would then be indent -kr rather than K&R.
Originally committed as revision 20416 to svn://svn.ffmpeg.org/ffmpeg/trunk