mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-07 11:13:41 +02:00
FFmpeg/libavcodec/arm
Martin Storsjö 5eb5aec475 arm: vp9itxfm: Do a simpler half/quarter idct16/idct32 when possible
This work is sponsored by, and copyright, Google.

This avoids loading and calculating coefficients that we know will
be zero, and avoids filling the temp buffer with zeros in places
where we know the second pass won't read.
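The skipping idea can be illustrated with a small plain-C sketch. This is not the NEON code in vp9itxfm_neon.S and not FFmpeg's C reference. It assumes that, because of the coefficient scan order, a small end-of-block value guarantees that only the top-left k x k coefficients can be nonzero. Under that assumption the first (row) pass transforms only the first k rows and loads only their first k inputs. The second (column) pass reads only the first k rows of the temp buffer, so the remaining rows never need to be zeroed. The half and quarter variants presumably correspond to k = N/2 and k = N/4; the exact eob thresholds the assembly uses are not reproduced here. To keep the example exact and free of libm, it swaps in a naturally ordered Walsh-Hadamard basis for the VP9 integer DCT. The skipping logic is the same for any separable linear transform. The names `basis`, `inv1d` and `inv2d` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define N 16 /* transform size, as in idct16 */

/* Entry (x, u) of a naturally ordered N-point Walsh-Hadamard matrix:
 * -1 if x and u share an odd number of set bits, +1 otherwise.
 * Hypothetical stand-in for the VP9 DCT basis in this sketch. */
static int basis(int x, int u)
{
    int v = x & u, parity = 0;
    while (v) {
        parity ^= v & 1;
        v >>= 1;
    }
    return parity ? -1 : 1;
}

/* 1-D inverse transform producing N outputs. Only the first n_in inputs
 * are loaded; inputs at index >= n_in are known to be zero. */
static void inv1d(const int32_t *in, int in_stride,
                  int32_t *out, int out_stride, int n_in)
{
    for (int x = 0; x < N; x++) {
        int32_t sum = 0;
        for (int u = 0; u < n_in; u++)
            sum += basis(x, u) * in[u * in_stride];
        out[x * out_stride] = sum;
    }
}

/* 2-D inverse transform of a row-major N x N block whose nonzero
 * coefficients all lie in the top-left k x k corner (k == N: full case). */
static void inv2d(const int32_t *coef, int32_t *dst, int k)
{
    int32_t tmp[N * N]; /* rows k..N-1 are never written or read */

    assert(k >= 1 && k <= N);

    /* Pass 1: rows k..N-1 are all zero, so their output would be zero;
     * skip them and load only the first k coefficients of each row. */
    for (int r = 0; r < k; r++)
        inv1d(&coef[r * N], 1, &tmp[r * N], 1, k);

    /* Pass 2: every column of tmp is nonzero only in rows 0..k-1. */
    for (int c = 0; c < N; c++)
        inv1d(&tmp[c], N, &dst[c], N, k);
}
```

Calling `inv2d(coef, dst, 4)` or `inv2d(coef, dst, 8)` on a block whose nonzero coefficients lie in the top-left 4x4 gives the same output as `inv2d(coef, dst, N)`. It skips all first-pass work on rows k..N-1 and reads only k of the N inputs in every 1-D transform.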

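This gives a substantial speedup for the smaller subpartitions: the 16x16 sub2/sub4 timings drop by roughly 34-43% and the 32x32 sub2/sub4 timings by roughly 28-31%, depending on the core.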

The code size increases from 12388 bytes to 19784 bytes.

The idct16/32_end macros are moved above the individual functions; the
instructions themselves are unchanged, but since new functions are added
at the same place where the code is moved from, the diff looks rather
messy.

Before:                              Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub1_add_neon:     273.0    189.5    212.0    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    2102.1   1521.7   1736.2   1265.8
vp9_inv_dct_dct_16x16_sub4_add_neon:    2104.5   1533.0   1736.6   1265.5
vp9_inv_dct_dct_16x16_sub8_add_neon:    2484.8   1828.7   2014.4   1506.5
vp9_inv_dct_dct_16x16_sub12_add_neon:   2851.2   2117.8   2294.8   1753.2
vp9_inv_dct_dct_16x16_sub16_add_neon:   3239.4   2408.3   2543.5   1994.9
vp9_inv_dct_dct_32x32_sub1_add_neon:     758.3    456.7    864.5    553.9
vp9_inv_dct_dct_32x32_sub2_add_neon:   10776.7   7949.8   8567.7   6819.7
vp9_inv_dct_dct_32x32_sub4_add_neon:   10865.6   8131.5   8589.6   6816.3
vp9_inv_dct_dct_32x32_sub8_add_neon:   12053.9   9271.3   9387.7   7564.0
vp9_inv_dct_dct_32x32_sub12_add_neon:  13328.3  10463.2  10217.0   8321.3
vp9_inv_dct_dct_32x32_sub16_add_neon:  14176.4  11509.5  11018.7   9062.3
vp9_inv_dct_dct_32x32_sub20_add_neon:  15301.5  12999.9  11855.1   9828.2
vp9_inv_dct_dct_32x32_sub24_add_neon:  16482.7  14931.5  12650.1  10575.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17589.5  15811.9  13482.8  11333.4
vp9_inv_dct_dct_32x32_sub32_add_neon:  18696.2  17049.2  14355.6  12089.7

After:                               Cortex A7       A8       A9      A53
vp9_inv_dct_dct_16x16_sub1_add_neon:     273.0    189.5    211.7    235.8
vp9_inv_dct_dct_16x16_sub2_add_neon:    1203.5    998.2   1035.3    763.0
vp9_inv_dct_dct_16x16_sub4_add_neon:    1203.5    998.1   1035.5    760.8
vp9_inv_dct_dct_16x16_sub8_add_neon:    1926.1   1610.6   1722.1   1271.7
vp9_inv_dct_dct_16x16_sub12_add_neon:   2873.2   2129.7   2285.1   1757.3
vp9_inv_dct_dct_16x16_sub16_add_neon:   3221.4   2520.3   2557.6   2002.1
vp9_inv_dct_dct_32x32_sub1_add_neon:     753.0    457.5    866.6    554.6
vp9_inv_dct_dct_32x32_sub2_add_neon:    7554.6   5652.4   6048.4   4920.2
vp9_inv_dct_dct_32x32_sub4_add_neon:    7549.9   5685.0   6046.9   4925.7
vp9_inv_dct_dct_32x32_sub8_add_neon:    8336.9   6704.5   6604.0   5478.0
vp9_inv_dct_dct_32x32_sub12_add_neon:  10914.0   9777.2   9240.4   7416.9
vp9_inv_dct_dct_32x32_sub16_add_neon:  11859.2  11223.3   9966.3   8095.1
vp9_inv_dct_dct_32x32_sub20_add_neon:  15237.1  13029.4  11838.3   9829.4
vp9_inv_dct_dct_32x32_sub24_add_neon:  16293.2  14379.8  12644.9  10572.0
vp9_inv_dct_dct_32x32_sub28_add_neon:  17424.3  15734.7  13473.0  11326.9
vp9_inv_dct_dct_32x32_sub32_add_neon:  18531.3  17457.0  14298.6  12080.0
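The before/after tables can be turned into per-core speedup ratios with a few lines of C. The sketch below copies a handful of rows verbatim from the tables, with the columns Cortex-A7, A8, A9 and A53. It divides the old timing by the new one, so values above 1.0 mean the new code is faster. The struct and function names are hypothetical and not part of FFmpeg. The units are whatever the benchmark reported, and only the ratios are meaningful.

```c
#include <assert.h>
#include <stdio.h>

/* Rows copied from the Before/After tables; columns are A7, A8, A9, A53. */
struct bench {
    const char *name;
    double before[4];
    double after[4];
};

static const struct bench rows[] = {
    { "16x16_sub2",  {  2102.1,  1521.7,  1736.2,  1265.8 }, {  1203.5,   998.2,  1035.3,   763.0 } },
    { "16x16_sub8",  {  2484.8,  1828.7,  2014.4,  1506.5 }, {  1926.1,  1610.6,  1722.1,  1271.7 } },
    { "16x16_sub16", {  3239.4,  2408.3,  2543.5,  1994.9 }, {  3221.4,  2520.3,  2557.6,  2002.1 } },
    { "32x32_sub2",  { 10776.7,  7949.8,  8567.7,  6819.7 }, {  7554.6,  5652.4,  6048.4,  4920.2 } },
    { "32x32_sub16", { 14176.4, 11509.5, 11018.7,  9062.3 }, { 11859.2, 11223.3,  9966.3,  8095.1 } },
    { "32x32_sub32", { 18696.2, 17049.2, 14355.6, 12089.7 }, { 18531.3, 17457.0, 14298.6, 12080.0 } },
};

#define NUM_ROWS (sizeof(rows) / sizeof(rows[0]))

static const char *const cores[4] = { "A7", "A8", "A9", "A53" };

/* Old timing divided by new timing on one core: > 1.0 means faster now. */
static double speedup(const struct bench *b, int core)
{
    assert(core >= 0 && core < 4);
    return b->before[core] / b->after[core];
}

/* Print one line per row with the speedup on each core. */
static void print_speedups(void)
{
    for (size_t i = 0; i < NUM_ROWS; i++) {
        printf("%-12s", rows[i].name);
        for (int c = 0; c < 4; c++)
            printf("  %s %.2fx", cores[c], speedup(&rows[i], c));
        printf("\n");
    }
}
```

For example, 16x16 sub2 on Cortex-A7 comes out at about 1.75x (2102.1 / 1203.5). The 16x16 sub16 and 32x32 sub32 rows stay within 5% of 1.0x on every core. That is presumably because those cases still go through the full transforms.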

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-09 12:32:00 +02:00
aac.h
aacpsdsp_init_arm.c
aacpsdsp_neon.S
ac3dsp_arm.S
ac3dsp_armv6.S
ac3dsp_init_arm.c
ac3dsp_neon.S
apedsp_init_arm.c
apedsp_neon.S
asm-offsets.h
audiodsp_arm.h
audiodsp_init_arm.c
audiodsp_init_neon.c audiodsp: reorder arguments for vector_clipf 2016-09-22 09:47:52 +02:00
audiodsp_neon.S audiodsp: reorder arguments for vector_clipf 2016-09-22 09:47:52 +02:00
blockdsp_arm.h blockdsp: drop the high_bit_depth parameter 2016-09-22 09:47:52 +02:00
blockdsp_init_arm.c blockdsp: drop the high_bit_depth parameter 2016-09-22 09:47:52 +02:00
blockdsp_init_neon.c blockdsp: drop the high_bit_depth parameter 2016-09-22 09:47:52 +02:00
blockdsp_neon.S
cabac.h
dca.h
dcadsp_init_arm.c dca: remove unused decode_hf function and quant_d tables 2015-12-24 13:58:18 +01:00
dcadsp_neon.S dca: remove unused decode_hf function and quant_d tables 2015-12-24 13:58:18 +01:00
dcadsp_vfp.S
fft_fixed_init_arm.c fft: Split MDCT bits off from FFT 2016-03-01 10:18:28 +01:00
fft_fixed_neon.S
fft_init_arm.c fft: Split MDCT bits off from FFT 2016-03-01 10:18:28 +01:00
fft_neon.S
fft_vfp.S
flacdsp_arm.S
flacdsp_init_arm.c
fmtconvert_init_arm.c
fmtconvert_neon.S
fmtconvert_vfp.S
g722dsp_init_arm.c
g722dsp_neon.S
h264chroma_init_arm.c h264chroma: Change type of stride parameters to ptrdiff_t 2016-09-29 14:48:04 +02:00
h264cmc_neon.S h264chroma: Change type of stride parameters to ptrdiff_t 2016-09-29 14:48:04 +02:00
h264dsp_init_arm.c
h264dsp_neon.S
h264idct_neon.S
h264pred_init_arm.c
h264pred_neon.S
h264qpel_init_arm.c
h264qpel_neon.S
hpeldsp_arm.h
hpeldsp_arm.S hpeldsp: arm: Update comments left behind in 25841dfe80 2016-09-29 14:48:03 +02:00
hpeldsp_armv6.S
hpeldsp_init_arm.c
hpeldsp_init_armv6.c
hpeldsp_init_neon.c
hpeldsp_neon.S
idct.h idct: Change type of array stride parameters to ptrdiff_t 2016-09-29 14:48:03 +02:00
idctdsp_arm.h
idctdsp_arm.S idct: Change type of array stride parameters to ptrdiff_t 2016-09-29 14:48:03 +02:00
idctdsp_armv6.S
idctdsp_init_arm.c idct: Change type of array stride parameters to ptrdiff_t 2016-09-29 14:48:03 +02:00
idctdsp_init_armv5te.c
idctdsp_init_armv6.c idct: Change type of array stride parameters to ptrdiff_t 2016-09-29 14:48:03 +02:00
idctdsp_init_neon.c
idctdsp_neon.S
int_neon.S
jrevdct_arm.S
Makefile arm: vp9: Add NEON loop filters 2016-11-11 14:16:42 +02:00
mathops.h
mdct_fixed_init_arm.c fft: Split MDCT bits off from FFT 2016-03-01 10:18:28 +01:00
mdct_fixed_neon.S
mdct_init_arm.c fft: Split MDCT bits off from FFT 2016-03-01 10:18:28 +01:00
mdct_neon.S
mdct_vfp.S
me_cmp_armv6.S
me_cmp_init_arm.c
mlpdsp_armv5te.S
mlpdsp_armv6.S cosmetics: Fix spelling mistakes 2016-05-04 18:16:21 +02:00
mlpdsp_init_arm.c
mpegaudiodsp_fixed_armv6.S
mpegaudiodsp_init_arm.c
mpegvideo_arm.c
mpegvideo_arm.h
mpegvideo_armv5te_s.S
mpegvideo_armv5te.c cosmetics: Fix spelling mistakes 2016-05-04 18:16:21 +02:00
mpegvideo_neon.S
mpegvideoencdsp_armv6.S
mpegvideoencdsp_init_arm.c
neon.S
neontest.c lavc: add clobber tests for the new encoding/decoding API 2016-09-28 10:01:52 +02:00
pixblockdsp_armv6.S
pixblockdsp_init_arm.c pixblockdsp: Change type of stride parameters to ptrdiff_t 2016-09-14 14:12:36 +02:00
rdft_init_arm.c rdft: arm: Split RDFT initialization into a separate file 2016-02-26 14:34:58 +01:00
rdft_neon.S
rv34dsp_init_arm.c
rv34dsp_neon.S
rv40dsp_init_arm.c
rv40dsp_neon.S
sbrdsp_init_arm.c
sbrdsp_neon.S
simple_idct_arm.S cosmetics: Fix spelling mistakes 2016-05-04 18:16:21 +02:00
simple_idct_armv5te.S simple_idct: arm: Drop disabled code variant 2016-08-17 12:21:54 +02:00
simple_idct_armv6.S idct: Change type of array stride parameters to ptrdiff_t 2016-09-29 14:48:03 +02:00
simple_idct_neon.S idct: Change type of array stride parameters to ptrdiff_t 2016-09-29 14:48:03 +02:00
startcode_armv6.S
startcode.h
synth_filter_neon.S
synth_filter_vfp.S
vc1dsp_init_arm.c
vc1dsp_init_neon.c h264chroma: Change type of stride parameters to ptrdiff_t 2016-09-29 14:48:04 +02:00
vc1dsp_neon.S idct: Change type of array stride parameters to ptrdiff_t 2016-09-29 14:48:03 +02:00
vc1dsp.h
videodsp_arm.h
videodsp_armv5te.S
videodsp_init_arm.c
videodsp_init_armv5te.c
vorbisdsp_init_arm.c
vorbisdsp_neon.S
vp3dsp_init_arm.c vp3: Change type of stride parameters to ptrdiff_t 2016-08-26 11:36:26 +02:00
vp3dsp_neon.S
vp6dsp_init_arm.c vp56: Separate VP5 and VP6 dsp initialization 2016-08-26 11:50:22 +02:00
vp6dsp_neon.S
vp8_armv6.S
vp8.h
vp8dsp_armv6.S vp8: Update some assembly comments left unchanged in bd66f073fe 2016-08-26 11:36:53 +02:00
vp8dsp_init_arm.c
vp8dsp_init_armv6.c
vp8dsp_init_neon.c
vp8dsp_neon.S arm: Fix a typo in a comment 2016-07-06 22:58:51 +03:00
vp8dsp.h
vp9dsp_init_arm.c arm: vp9: Add NEON loop filters 2016-11-11 14:16:42 +02:00
vp9itxfm_neon.S arm: vp9itxfm: Do a simpler half/quarter idct16/idct32 when possible 2017-02-09 12:32:00 +02:00
vp9lpf_neon.S arm: vp9: Add NEON loop filters 2016-11-11 14:16:42 +02:00
vp9mc_neon.S arm: vp9mc: Fix vertical alignment of operands 2017-01-03 14:15:45 +02:00
vp56_arith.h