1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-24 13:56:33 +02:00
Ben Avison 5c22e8e4ad armv6: Accelerate ff_imdct_half for general case (mdct_bits != 6)
The previous implementation targeted DTS Coherent Acoustics, which only
requires mdct_bits == 6. This relatively small size lent itself to
unrolling the loops a small number of times, and encoding offsets
calculated at assembly time within the load/store instructions of each
iteration.

In the more general case (codecs such as AAC and AC3) much larger arrays
are used - mdct_bits == [8, 9, 11]. The old method does not scale for
these cases, so more integer registers are used with non-unrolled versions
of the loops (and with some stack spillage). The postrotation filter loop
is still unrolled by a factor of 2 to permit the double-buffering of some
VFP registers to facilitate overlap of neighbouring iterations.

I benchmarked the result by measuring the number of gperftools samples
that hit anywhere in the AAC decoder (starting from aac_decode_frame())
or specifically in ff_imdct_half_c / ff_imdct_half_vfp, for the same
example AAC stream:

                  Before          After
                  Mean   StdDev   Mean   StdDev  Confidence  Change
aac_decode_frame  2368.1 35.8     2117.2 35.3    100.0%      +11.8%
ff_imdct_half_*   457.5  22.4     251.2  16.2    100.0%      +82.1%

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-07-18 01:34:08 +03:00
2014-07-09 10:14:12 -04:00
2014-06-18 14:55:28 +02:00
2014-04-06 21:18:49 +02:00
2014-05-11 15:00:03 +02:00
2013-07-07 21:43:23 +02:00
2014-06-18 14:55:28 +02:00
2014-03-13 11:59:34 +01:00
2014-02-12 13:13:17 +00:00
2014-06-18 14:55:28 +02:00
2014-03-13 08:24:11 -04:00

Libav README
------------

1) Documentation
----------------

* Read the documentation in the doc/ directory.

2) Licensing
------------

* See the LICENSE file.
Languages
C 90.3%
Assembly 7.8%
Makefile 1.3%
C++ 0.2%
Objective-C 0.2%
Other 0.1%