mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-21 10:55:51 +02:00
FFmpeg/libavcodec/arm
Ben Avison 5c22e8e4ad armv6: Accelerate ff_imdct_half for general case (mdct_bits != 6)
The previous implementation targeted DTS Coherent Acoustics, which only
requires mdct_bits == 6. This relatively small size lent itself to
unrolling the loops a small number of times, and encoding offsets
calculated at assembly time within the load/store instructions of each
iteration.

In the more general case (codecs such as AAC and AC3) much larger arrays
are used, with mdct_bits of 8, 9 or 11. The old method does not scale for
these cases, so more integer registers are used with non-unrolled versions
of the loops (and with some stack spillage). The postrotation filter loop
is still unrolled by a factor of 2 to permit the double-buffering of some
VFP registers to facilitate overlap of neighbouring iterations.
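
The contrast between the two loop styles can be sketched in C. This is a
hypothetical illustration, not FFmpeg code: the names scale_fixed,
scale_general and FIXED_N are invented here, and a plain element-wise multiply
stands in for the real rotation arithmetic. The only detail taken from the text
above is the size, 1 << mdct_bits with mdct_bits == 6.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size for illustration: 1 << mdct_bits with mdct_bits == 6. */
enum { FIXED_N = 1 << 6 };

/* Fixed-size variant: the trip count is a compile-time constant, so the
 * loop can be fully unrolled and every index folded into a constant
 * load/store offset. */
static void scale_fixed(float *dst, const float *src, const float *tw)
{
    for (size_t i = 0; i < FIXED_N; i++)
        dst[i] = src[i] * tw[i];
}

/* General variant: n is only known at run time, so the index is kept in
 * a register. The body is unrolled by a factor of 2 so that two
 * independent multiplies are available to overlap. n must be even. */
static void scale_general(float *dst, const float *src, const float *tw,
                          size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        float a = src[i]     * tw[i];
        float b = src[i + 1] * tw[i + 1];
        dst[i]     = a;
        dst[i + 1] = b;
    }
}
```

Both functions produce the same output when n == FIXED_N. The general one
trades constant offsets for flexibility in size.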

I benchmarked the result by measuring the number of gperftools samples
that hit anywhere in the AAC decoder (starting from aac_decode_frame())
or specifically in ff_imdct_half_c / ff_imdct_half_vfp, for the same
example AAC stream:

                  Before          After
                  Mean   StdDev   Mean   StdDev  Confidence  Change
aac_decode_frame  2368.1 35.8     2117.2 35.3    100.0%      +11.8%
ff_imdct_half_*   457.5  22.4     251.2  16.2    100.0%      +82.1%
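
The Change column appears to be the ratio of the mean sample counts, expressed
as a percentage speedup. The sketch below assumes that formula, and the helper
name speedup_percent is invented for illustration. Applied to the rounded means
printed in the table, it gives roughly 11.85% and 82.1%, which agrees with the
table to within rounding.

```c
#include <assert.h>

/* Assumed formula: (before / after - 1) * 100, where both arguments are
 * mean profiler sample counts (fewer samples means less time spent). */
static double speedup_percent(double mean_before, double mean_after)
{
    assert(mean_after > 0.0);
    return (mean_before / mean_after - 1.0) * 100.0;
}
```

For example, speedup_percent(457.5, 251.2) corresponds to the ff_imdct_half_*
row.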

Signed-off-by: Martin Storsjö <martin@martin.st>
2014-07-18 01:34:08 +03:00
aac.h
aacpsdsp_init_arm.c
aacpsdsp_neon.S
ac3dsp_arm.S
ac3dsp_armv6.S
ac3dsp_init_arm.c
ac3dsp_neon.S
apedsp_init_arm.c dsputil: Move APE-specific bits into apedsp 2014-05-29 06:41:15 -07:00
apedsp_neon.S dsputil: Move APE-specific bits into apedsp 2014-05-29 06:41:15 -07:00
asm-offsets.h mpegvideo: move the MpegEncContext fields used from arm asm to the beginning 2014-04-29 14:49:42 +02:00
audiodsp_arm.h dsputil: Split audio operations off into a separate context 2014-06-22 06:20:15 -07:00
audiodsp_init_arm.c dsputil: Split audio operations off into a separate context 2014-06-22 06:20:15 -07:00
audiodsp_init_neon.c dsputil: Split audio operations off into a separate context 2014-06-22 06:20:15 -07:00
audiodsp_neon.S dsputil: Split audio operations off into a separate context 2014-06-22 06:20:15 -07:00
blockdsp_arm.h dsputil: Split clear_block*/fill_block* off into a separate context 2014-06-18 14:07:23 -07:00
blockdsp_init_arm.c dsputil: Split clear_block*/fill_block* off into a separate context 2014-06-18 14:07:23 -07:00
blockdsp_init_neon.c dsputil: Split clear_block*/fill_block* off into a separate context 2014-06-18 14:07:23 -07:00
blockdsp_neon.S dsputil: Split clear_block*/fill_block* off into a separate context 2014-06-18 14:07:23 -07:00
cabac.h arm: get_cabac inline asm 2014-03-09 00:45:34 +01:00
dca.h
dcadsp_init_arm.c arm: dcadsp: implement decode_hf as external NEON asm 2014-02-28 13:12:19 +01:00
dcadsp_neon.S arm: dcadsp: implement decode_hf as external NEON asm 2014-02-28 13:12:19 +01:00
dcadsp_vfp.S
fft_fixed_init_arm.c
fft_fixed_neon.S
fft_init_arm.c
fft_neon.S
fft_vfp.S
flacdsp_arm.S
flacdsp_init_arm.c
fmtconvert_init_arm.c
fmtconvert_neon.S
fmtconvert_vfp_armv6.S
fmtconvert_vfp.S
h264chroma_init_arm.c
h264cmc_neon.S h264: avoid using uninitialized memory in NEON chroma mc 2014-06-23 16:32:15 +02:00
h264dsp_armv6.S
h264dsp_init_arm.c arm: Avoid using the 'setend' instruction on ARMv7 and newer 2014-07-08 12:09:09 +03:00
h264dsp_neon.S
h264idct_neon.S
h264pred_init_arm.c On2 VP7 decoder 2014-04-04 04:00:11 +02:00
h264pred_neon.S
h264qpel_init_arm.c
h264qpel_neon.S
hpeldsp_arm.h
hpeldsp_arm.S Update dsputil- and SIMD-related comments to match reality more closely 2014-03-13 05:50:29 -07:00
hpeldsp_armv6.S arm: hpeldsp: fix put_pixels8_y2_{,no_rnd_}armv6 2014-03-08 18:31:57 +01:00
hpeldsp_init_arm.c dsputil: Refactor duplicated CALL_2X_PIXELS / PIXELS16 macros 2014-03-22 06:17:29 -07:00
hpeldsp_init_armv6.c
hpeldsp_init_neon.c
hpeldsp_neon.S
idctdsp_arm.h dsputil: Split off IDCT bits into their own context 2014-06-30 07:58:46 -07:00
idctdsp_arm.S dsputil: Split off IDCT bits into their own context 2014-06-30 07:58:46 -07:00
idctdsp_armv6.S dsputil: Split off IDCT bits into their own context 2014-06-30 07:58:46 -07:00
idctdsp_init_arm.c dsputil: Split off IDCT bits into their own context 2014-06-30 07:58:46 -07:00
idctdsp_init_armv5te.c dsputil: Split off IDCT bits into their own context 2014-06-30 07:58:46 -07:00
idctdsp_init_armv6.c dsputil: Split off IDCT bits into their own context 2014-06-30 07:58:46 -07:00
idctdsp_init_neon.c dsputil: Split off IDCT bits into their own context 2014-06-30 07:58:46 -07:00
idctdsp_neon.S dsputil: Split off IDCT bits into their own context 2014-06-30 07:58:46 -07:00
int_neon.S dsputil: Move APE-specific bits into apedsp 2014-05-29 06:41:15 -07:00
jrevdct_arm.S
Makefile dsputil: Split motion estimation compare bits off into their own context 2014-07-17 09:07:10 -07:00
mathops.h
mdct_fixed_neon.S
mdct_neon.S
mdct_vfp.S armv6: Accelerate ff_imdct_half for general case (mdct_bits != 6) 2014-07-18 01:34:08 +03:00
me_cmp_armv6.S dsputil: Split motion estimation compare bits off into their own context 2014-07-17 09:07:10 -07:00
me_cmp_init_arm.c dsputil: Split motion estimation compare bits off into their own context 2014-07-17 09:07:10 -07:00
mlpdsp_armv5te.S truehd: add hand-scheduled ARM asm version of ff_mlp_rematrix_channel. 2014-03-26 19:54:10 +02:00
mlpdsp_armv6.S truehd: add hand-scheduled ARM asm version of ff_mlp_pack_output. 2014-03-26 19:54:32 +02:00
mlpdsp_init_arm.c truehd: add hand-scheduled ARM asm version of ff_mlp_pack_output. 2014-03-26 19:54:32 +02:00
mpegaudiodsp_fixed_armv6.S
mpegaudiodsp_init_arm.c
mpegvideo_arm.c lavu: add CHK_OFFS as AV_CHECK_OFFSET to check struct member offsets 2014-04-24 18:28:26 +02:00
mpegvideo_arm.h
mpegvideo_armv5te_s.S
mpegvideo_armv5te.c
mpegvideo_neon.S
mpegvideoencdsp_armv6.S dsputil: Move pix_sum, pix_norm1, shrink function pointers to mpegvideoenc 2014-07-06 14:26:53 -07:00
mpegvideoencdsp_init_arm.c dsputil: Move pix_sum, pix_norm1, shrink function pointers to mpegvideoenc 2014-07-06 14:26:53 -07:00
neon.S
neontest.c
pixblockdsp_armv6.S dsputil: Split off pixel block routines into their own context 2014-07-09 08:05:26 -07:00
pixblockdsp_init_arm.c dsputil: Split off pixel block routines into their own context 2014-07-09 08:05:26 -07:00
rdft_neon.S
rv34dsp_init_arm.c
rv34dsp_neon.S
rv40dsp_init_arm.c
rv40dsp_neon.S
sbrdsp_init_arm.c
sbrdsp_neon.S
simple_idct_arm.S
simple_idct_armv5te.S
simple_idct_armv6.S
simple_idct_neon.S
synth_filter_neon.S
synth_filter_vfp.S
vc1dsp_init_arm.c
vc1dsp_init_neon.c arm: check if AS supports .dn 2014-06-03 14:23:03 +02:00
vc1dsp_neon.S arm: check if AS supports .dn 2014-06-03 14:23:03 +02:00
vc1dsp.h
videodsp_arm.h
videodsp_armv5te.S Update dsputil- and SIMD-related comments to match reality more closely 2014-03-13 05:50:29 -07:00
videodsp_init_arm.c
videodsp_init_armv5te.c
vorbisdsp_init_arm.c
vorbisdsp_neon.S
vp3dsp_init_arm.c Remove a number of unnecessary dsputil.h #includes 2014-04-04 19:08:05 +02:00
vp3dsp_neon.S
vp6dsp_init_arm.c
vp6dsp_neon.S
vp8_armv6.S
vp8.h arm: asm decode_block_coeffs_internal is vp8 specific 2014-04-04 10:39:29 +02:00
vp8dsp_armv6.S armv6: vp8: use explicit labels in motion compensation asm 2014-03-12 15:06:05 +01:00
vp8dsp_init_arm.c On2 VP7 decoder 2014-04-04 04:00:11 +02:00
vp8dsp_init_armv6.c On2 VP7 decoder 2014-04-04 04:00:11 +02:00
vp8dsp_init_neon.c On2 VP7 decoder 2014-04-04 04:00:11 +02:00
vp8dsp_neon.S
vp8dsp.h On2 VP7 decoder 2014-04-04 04:00:11 +02:00
vp56_arith.h