FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-03-03 14:32:16 +02:00

Ben Avison 5c22e8e4ad armv6: Accelerate ff_imdct_half for general case (mdct_bits != 6)

The previous implementation targeted DTS Coherent Acoustics, which only
requires mdct_bits == 6. This relatively small size lent itself to
unrolling the loops a small number of times, and encoding offsets
calculated at assembly time within the load/store instructions of each
iteration.

In the more general case (codecs such as AAC and AC3) much larger arrays
are used - mdct_bits == [8, 9, 11]. The old method does not scale for
these cases, so more integer registers are used with non-unrolled versions
of the loops (and with some stack spillage). The postrotation filter loop
is still unrolled by a factor of 2 to permit the double-buffering of some
VFP registers to facilitate overlap of neighbouring iterations.
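
For a sense of scale, the sizes implied by these mdct_bits values can be tabulated. This is a minimal sketch in Python, not FFmpeg code. It assumes the convention used by the C reference (ff_imdct_half_c): the transform size is n = 1 << mdct_bits, the half IMDCT writes n/2 output samples around an n/4-point complex FFT, and the post-rotation loop runs n/8 times. The helper name imdct_half_sizes is hypothetical.

```python
def imdct_half_sizes(mdct_bits):
    """Return the array and loop sizes implied by mdct_bits.

    Assumed convention (not taken from the commit itself):
    n = 1 << mdct_bits, n/2 output samples, n/4 complex FFT points,
    n/8 post-rotation iterations.
    """
    n = 1 << mdct_bits
    return {
        "n": n,
        "output_samples": n // 2,
        "fft_points": n // 4,
        "postrotation_iterations": n // 8,
    }


if __name__ == "__main__":
    # 6 is the DTS case; 8, 9 and 11 are the sizes the commit message lists.
    for bits in (6, 8, 9, 11):
        sizes = imdct_half_sizes(bits)
        print(bits, sizes)
```

Under these assumptions, mdct_bits == 6 needs only 8 post-rotation iterations, which is small enough to unroll with offsets fixed at assembly time, while mdct_bits == 11 needs 256.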

I benchmarked the result by measuring the number of gperftools samples
that hit anywhere in the AAC decoder (starting from aac_decode_frame())
or specifically in ff_imdct_half_c / ff_imdct_half_vfp, for the same
example AAC stream:

                  Before          After
                  Mean   StdDev   Mean   StdDev  Confidence  Change
aac_decode_frame  2368.1 35.8     2117.2 35.3    100.0%      +11.8%
ff_imdct_half_*   457.5  22.4     251.2  16.2    100.0%      +82.1%
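
The Change column appears to be the speedup expressed as before/after - 1. The following Python sketch recomputes it from the rounded means in the table. The formula is an assumption, since the commit message does not state it, and the helper name percent_change is hypothetical. Because the published means are rounded, the recomputed values can differ from the table in the last digit.

```python
def percent_change(before_mean, after_mean):
    """Speedup as a percentage, assuming Change = before / after - 1."""
    return (before_mean / after_mean - 1.0) * 100.0


if __name__ == "__main__":
    # Sample counts (means) copied from the table above.
    rows = {
        "aac_decode_frame": (2368.1, 2117.2),
        "ff_imdct_half_*": (457.5, 251.2),
    }
    for name, (before, after) in rows.items():
        print(f"{name}: {percent_change(before, after):+.2f}%")
```

Fewer samples means less time spent in the function, so a positive value here is an improvement.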

Signed-off-by: Martin Storsjö <martin@martin.st>

2014-07-18 01:34:08 +03:00

aac.h

…

aacpsdsp_init_arm.c

…

aacpsdsp_neon.S

…

ac3dsp_arm.S

…

ac3dsp_armv6.S

…

ac3dsp_init_arm.c

dsputil: Move apply_window_int16 to ac3dsp

2013-12-08 17:57:15 +01:00

ac3dsp_neon.S

dsputil: Move apply_window_int16 to ac3dsp

2013-12-08 17:57:15 +01:00

apedsp_init_arm.c

dsputil: Move APE-specific bits into apedsp

2014-05-29 06:41:15 -07:00

apedsp_neon.S

dsputil: Move APE-specific bits into apedsp

2014-05-29 06:41:15 -07:00

asm-offsets.h

mpegvideo: move the MpegEncContext fields used from arm asm to the beginning

2014-04-29 14:49:42 +02:00

audiodsp_arm.h

dsputil: Split audio operations off into a separate context

2014-06-22 06:20:15 -07:00

audiodsp_init_arm.c

dsputil: Split audio operations off into a separate context

2014-06-22 06:20:15 -07:00

audiodsp_init_neon.c

dsputil: Split audio operations off into a separate context

2014-06-22 06:20:15 -07:00

audiodsp_neon.S

dsputil: Split audio operations off into a separate context

2014-06-22 06:20:15 -07:00

blockdsp_arm.h

dsputil: Split clear_block*/fill_block* off into a separate context

2014-06-18 14:07:23 -07:00

blockdsp_init_arm.c

dsputil: Split clear_block*/fill_block* off into a separate context

2014-06-18 14:07:23 -07:00

blockdsp_init_neon.c

dsputil: Split clear_block*/fill_block* off into a separate context

2014-06-18 14:07:23 -07:00

blockdsp_neon.S

dsputil: Split clear_block*/fill_block* off into a separate context

2014-06-18 14:07:23 -07:00

cabac.h

arm: get_cabac inline asm

2014-03-09 00:45:34 +01:00

dca.h

dcadec: simplify decoding of VQ high frequencies

2014-02-28 13:03:22 +01:00

dcadsp_init_arm.c

arm: dcadsp: implement decode_hf as external NEON asm

2014-02-28 13:12:19 +01:00

dcadsp_neon.S

arm: dcadsp: implement decode_hf as external NEON asm

2014-02-28 13:12:19 +01:00

dcadsp_vfp.S

dcadec: remove scaling in lfe_interpolation_fir

2014-02-28 13:00:47 +01:00

fft_fixed_init_arm.c

Rename CONFIG_FFT_FLOAT ---> FFT_FLOAT

2014-01-06 19:12:48 +01:00

fft_fixed_neon.S

…

fft_init_arm.c

arm: dcadsp: Move synth filter initialization to dcadsp file

2013-08-29 11:24:14 +02:00

fft_neon.S

…

fft_vfp.S

arm: Add VFP-accelerated version of fft16

2013-07-22 10:15:41 +03:00

flacdsp_arm.S

…

flacdsp_init_arm.c

…

fmtconvert_init_arm.c

arm: fmtconvert: Split armv6 fmtconvert code off from vfp code

2013-08-29 11:24:14 +02:00

fmtconvert_neon.S

arm: Add X() around all references to extern symbols

2014-02-07 15:13:58 +02:00

fmtconvert_vfp_armv6.S

arm: fmtconvert: Split armv6 fmtconvert code off from vfp code

2013-08-29 11:24:14 +02:00

fmtconvert_vfp.S

arm: fmtconvert: Split armv6 fmtconvert code off from vfp code

2013-08-29 11:24:14 +02:00

h264chroma_init_arm.c

…

h264cmc_neon.S

h264: avoid using uninitialized memory in NEON chroma mc

2014-06-23 16:32:15 +02:00

h264dsp_armv6.S

arm: Use the matching endfunc macro instead of the assembler directive directly

2014-01-04 13:53:08 +02:00

h264dsp_init_arm.c

arm: Avoid using the 'setend' instruction on ARMv7 and newer

2014-07-08 12:09:09 +03:00

h264dsp_neon.S

…

h264idct_neon.S

arm: Add X() around all references to extern symbols

2014-02-07 15:13:58 +02:00

h264pred_init_arm.c

On2 VP7 decoder

2014-04-04 04:00:11 +02:00

h264pred_neon.S

…

h264qpel_init_arm.c

…

h264qpel_neon.S

…

hpeldsp_arm.h

arm: Use full filenames as multiple inclusion guards

2014-01-14 00:04:52 +01:00

hpeldsp_arm.S

Update dsputil- and SIMD-related comments to match reality more closely

2014-03-13 05:50:29 -07:00

hpeldsp_armv6.S

arm: hpeldsp: fix put_pixels8_y2_{,no_rnd_}armv6

2014-03-08 18:31:57 +01:00

hpeldsp_init_arm.c

dsputil: Refactor duplicated CALL_2X_PIXELS / PIXELS16 macros

2014-03-22 06:17:29 -07:00

hpeldsp_init_armv6.c

arm: hpeldsp: Move half-pel assembly from dsputil to hpeldsp

2013-04-19 23:19:08 +03:00

hpeldsp_init_neon.c

arm: hpeldsp: Move half-pel assembly from dsputil to hpeldsp

2013-04-19 23:19:08 +03:00

hpeldsp_neon.S

arm: hpeldsp: Move half-pel assembly from dsputil to hpeldsp

2013-04-19 23:19:08 +03:00

idctdsp_arm.h

dsputil: Split off IDCT bits into their own context

2014-06-30 07:58:46 -07:00

idctdsp_arm.S

dsputil: Split off IDCT bits into their own context

2014-06-30 07:58:46 -07:00

idctdsp_armv6.S

dsputil: Split off IDCT bits into their own context

2014-06-30 07:58:46 -07:00

idctdsp_init_arm.c

dsputil: Split off IDCT bits into their own context

2014-06-30 07:58:46 -07:00

idctdsp_init_armv5te.c

dsputil: Split off IDCT bits into their own context

2014-06-30 07:58:46 -07:00

idctdsp_init_armv6.c

dsputil: Split off IDCT bits into their own context

2014-06-30 07:58:46 -07:00

idctdsp_init_neon.c

dsputil: Split off IDCT bits into their own context

2014-06-30 07:58:46 -07:00

idctdsp_neon.S

dsputil: Split off IDCT bits into their own context

2014-06-30 07:58:46 -07:00

int_neon.S

dsputil: Move APE-specific bits into apedsp

2014-05-29 06:41:15 -07:00

jrevdct_arm.S

…

Makefile

dsputil: Split motion estimation compare bits off into their own context

2014-07-17 09:07:10 -07:00

mathops.h

…

mdct_fixed_neon.S

…

mdct_neon.S

arm: Add X() around all references to extern symbols

2014-02-07 15:13:58 +02:00

mdct_vfp.S

armv6: Accelerate ff_imdct_half for general case (mdct_bits != 6)

2014-07-18 01:34:08 +03:00

me_cmp_armv6.S

dsputil: Split motion estimation compare bits off into their own context

2014-07-17 09:07:10 -07:00

me_cmp_init_arm.c

dsputil: Split motion estimation compare bits off into their own context

2014-07-17 09:07:10 -07:00

mlpdsp_armv5te.S

truehd: add hand-scheduled ARM asm version of ff_mlp_rematrix_channel.

2014-03-26 19:54:10 +02:00

mlpdsp_armv6.S

truehd: add hand-scheduled ARM asm version of ff_mlp_pack_output.

2014-03-26 19:54:32 +02:00

mlpdsp_init_arm.c

truehd: add hand-scheduled ARM asm version of ff_mlp_pack_output.

2014-03-26 19:54:32 +02:00

mpegaudiodsp_fixed_armv6.S

…

mpegaudiodsp_init_arm.c

…

mpegvideo_arm.c

lavu: add CHK_OFFS as AV_CHECK_OFFSET to check struct member offsets

2014-04-24 18:28:26 +02:00

mpegvideo_arm.h

arm: Use full filenames as multiple inclusion guards

2014-01-14 00:04:52 +01:00

mpegvideo_armv5te_s.S

…

mpegvideo_armv5te.c

…

mpegvideo_neon.S

arm: Add X() around all references to extern symbols

2014-02-07 15:13:58 +02:00

mpegvideoencdsp_armv6.S

dsputil: Move pix_sum, pix_norm1, shrink function pointers to mpegvideoenc

2014-07-06 14:26:53 -07:00

mpegvideoencdsp_init_arm.c

dsputil: Move pix_sum, pix_norm1, shrink function pointers to mpegvideoenc

2014-07-06 14:26:53 -07:00

neon.S

…

neontest.c

arm: Add an option for making sure NEON registers aren't clobbered

2014-01-11 00:03:00 +02:00

pixblockdsp_armv6.S

dsputil: Split off pixel block routines into their own context

2014-07-09 08:05:26 -07:00

pixblockdsp_init_arm.c

dsputil: Split off pixel block routines into their own context

2014-07-09 08:05:26 -07:00

rdft_neon.S

…

rv34dsp_init_arm.c

…

rv34dsp_neon.S

…

rv40dsp_init_arm.c

arm: Drop unnecessary ff_ name prefixes from static functions

2013-04-30 16:02:03 +02:00

rv40dsp_neon.S

…

sbrdsp_init_arm.c

…

sbrdsp_neon.S

…

simple_idct_arm.S

arm: Add a missing endfunc macro call

2014-01-04 13:53:02 +02:00

simple_idct_armv5te.S

…

simple_idct_armv6.S

…

simple_idct_neon.S

…

synth_filter_neon.S

…

synth_filter_vfp.S

arm: Mangle external symbols properly in new vfp assembly files

2013-07-22 14:48:30 +03:00

vc1dsp_init_arm.c

vc1: arm: Add NEON assembly

2013-12-20 14:53:39 +02:00

vc1dsp_init_neon.c

arm: check if AS supports .dn

2014-06-03 14:23:03 +02:00

vc1dsp_neon.S

arm: check if AS supports .dn

2014-06-03 14:23:03 +02:00

vc1dsp.h

vc1: arm: Add NEON assembly

2013-12-20 14:53:39 +02:00

videodsp_arm.h

…

videodsp_armv5te.S

Update dsputil- and SIMD-related comments to match reality more closely

2014-03-13 05:50:29 -07:00

videodsp_init_arm.c

…

videodsp_init_armv5te.c

…

vorbisdsp_init_arm.c

…

vorbisdsp_neon.S

…

vp3dsp_init_arm.c

Remove a number of unnecessary dsputil.h #includes

2014-04-04 19:08:05 +02:00

vp3dsp_neon.S

arm: Add a missing # as prefix for an immediate constant

2014-01-07 19:30:13 +02:00

vp6dsp_init_arm.c

vp56: Mark VP6-only optimizations as such.

2013-08-23 14:42:19 +02:00

vp6dsp_neon.S

vp56: Mark VP6-only optimizations as such.

2013-08-23 14:42:19 +02:00

vp8_armv6.S

…

vp8.h

arm: asm decode_block_coeffs_internal is vp8 specific

2014-04-04 10:39:29 +02:00

vp8dsp_armv6.S

armv6: vp8: use explicit labels in motion compensation asm

2014-03-12 15:06:05 +01:00

vp8dsp_init_arm.c

On2 VP7 decoder

2014-04-04 04:00:11 +02:00

vp8dsp_init_armv6.c

On2 VP7 decoder

2014-04-04 04:00:11 +02:00

vp8dsp_init_neon.c

On2 VP7 decoder

2014-04-04 04:00:11 +02:00

vp8dsp_neon.S

vp8: Use 2 registers for dst_stride and src_stride in neon bilin filter

2014-02-06 09:32:26 +02:00

vp8dsp.h

On2 VP7 decoder

2014-04-04 04:00:11 +02:00

vp56_arith.h

…