1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-03 05:10:03 +02:00
FFmpeg/libavcodec/aarch64
Martin Storsjö a63da4511d aarch64: vp9itxfm: Do separate functions for half/quarter idct16 and idct32
This work is sponsored by, and copyright, Google.

This avoids loading and calculating coefficients that we know will
be zero, and avoids filling the temp buffer with zeros in places
where we know the second pass won't read.

This gives a pretty substantial speedup for the smaller subpartitions.

The code size increases from 14740 bytes to 24292 bytes.

The idct16/32_end macros are moved above the individual functions; the
instructions themselves are unchanged, but since new functions are added
at the same place where the code is moved from, the diff looks rather
messy.

Before:
vp9_inv_dct_dct_16x16_sub1_add_neon:     236.7
vp9_inv_dct_dct_16x16_sub2_add_neon:    1051.0
vp9_inv_dct_dct_16x16_sub4_add_neon:    1051.0
vp9_inv_dct_dct_16x16_sub8_add_neon:    1051.0
vp9_inv_dct_dct_16x16_sub12_add_neon:   1387.4
vp9_inv_dct_dct_16x16_sub16_add_neon:   1387.6
vp9_inv_dct_dct_32x32_sub1_add_neon:     554.1
vp9_inv_dct_dct_32x32_sub2_add_neon:    5198.5
vp9_inv_dct_dct_32x32_sub4_add_neon:    5198.6
vp9_inv_dct_dct_32x32_sub8_add_neon:    5196.3
vp9_inv_dct_dct_32x32_sub12_add_neon:   6183.4
vp9_inv_dct_dct_32x32_sub16_add_neon:   6174.3
vp9_inv_dct_dct_32x32_sub20_add_neon:   7151.4
vp9_inv_dct_dct_32x32_sub24_add_neon:   7145.3
vp9_inv_dct_dct_32x32_sub28_add_neon:   8119.3
vp9_inv_dct_dct_32x32_sub32_add_neon:   8118.7

After:
vp9_inv_dct_dct_16x16_sub1_add_neon:     236.7
vp9_inv_dct_dct_16x16_sub2_add_neon:     640.8
vp9_inv_dct_dct_16x16_sub4_add_neon:     639.0
vp9_inv_dct_dct_16x16_sub8_add_neon:     842.0
vp9_inv_dct_dct_16x16_sub12_add_neon:   1388.3
vp9_inv_dct_dct_16x16_sub16_add_neon:   1389.3
vp9_inv_dct_dct_32x32_sub1_add_neon:     554.1
vp9_inv_dct_dct_32x32_sub2_add_neon:    3685.5
vp9_inv_dct_dct_32x32_sub4_add_neon:    3685.1
vp9_inv_dct_dct_32x32_sub8_add_neon:    3684.4
vp9_inv_dct_dct_32x32_sub12_add_neon:   5312.2
vp9_inv_dct_dct_32x32_sub16_add_neon:   5315.4
vp9_inv_dct_dct_32x32_sub20_add_neon:   7154.9
vp9_inv_dct_dct_32x32_sub24_add_neon:   7154.5
vp9_inv_dct_dct_32x32_sub28_add_neon:   8126.6
vp9_inv_dct_dct_32x32_sub32_add_neon:   8127.2

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-09 12:32:03 +02:00
..
asm-offsets.h
cabac.h
dcadsp_init.c dca: remove unused decode_hf function and quant_d tables 2015-12-24 13:58:18 +01:00
dcadsp_neon.S dca: remove unused decode_hf function and quant_d tables 2015-12-24 13:58:18 +01:00
fft_init_aarch64.c fft: Split MDCT bits off from FFT 2016-03-01 10:18:28 +01:00
fft_neon.S
fmtconvert_init.c arm64: int32_to_float_fmul neon asm 2015-12-14 16:45:02 +01:00
fmtconvert_neon.S arm64: int32_to_float_fmul neon asm 2015-12-14 16:45:02 +01:00
h264chroma_init_aarch64.c h264chroma: Change type of stride parameters to ptrdiff_t 2016-09-29 14:48:04 +02:00
h264cmc_neon.S h264chroma: Change type of stride parameters to ptrdiff_t 2016-09-29 14:48:04 +02:00
h264dsp_init_aarch64.c
h264dsp_neon.S
h264idct_neon.S aarch64: h264idct: Use the offset parameter to movrel 2016-11-10 11:18:22 +02:00
h264pred_init.c
h264pred_neon.S
h264qpel_init_aarch64.c
h264qpel_neon.S
hpeldsp_init_aarch64.c
hpeldsp_neon.S
imdct15_init.c
imdct15_neon.S
Makefile aarch64: vp9: Implement NEON loop filters 2016-11-14 00:10:13 +02:00
mdct_init.c fft: Split MDCT bits off from FFT 2016-03-01 10:18:28 +01:00
mdct_neon.S
mpegaudiodsp_init.c mpegaudiodsp: aarch64: Adjust function prototype after 2caa93b813 2016-11-10 00:13:48 +01:00
mpegaudiodsp_neon.S mpegaudiodsp: Change type of array stride parameters to ptrdiff_t 2016-09-29 17:54:24 +02:00
neon.S aarch64: Make transpose_4x4H do a regular transpose 2016-03-26 21:25:56 +02:00
neontest.c lavc: add clobber tests for the new encoding/decoding API 2016-09-28 10:01:52 +02:00
rv40dsp_init_aarch64.c h264chroma: Change type of stride parameters to ptrdiff_t 2016-09-29 14:48:04 +02:00
synth_filter_neon.S arm64: replace 'bic' with immediate with 'and' with inverted immediate 2016-12-14 21:53:05 +01:00
vc1dsp_init_aarch64.c h264chroma: Change type of stride parameters to ptrdiff_t 2016-09-29 14:48:04 +02:00
videodsp_init.c
videodsp.S
vorbisdsp_init.c
vorbisdsp_neon.S
vp9dsp_init_aarch64.c aarch64: vp9dsp: Fix vertical alignment in the init file 2017-01-03 14:15:58 +02:00
vp9itxfm_neon.S aarch64: vp9itxfm: Do separate functions for half/quarter idct16 and idct32 2017-02-09 12:32:03 +02:00
vp9lpf_neon.S aarch64: vp9: loop filter: replace 'orr; cbn?z' with 'adds; b.{eq,ne}; 2016-11-16 09:05:18 +01:00
vp9mc_neon.S aarch64: vp9mc: Fix a comment to refer to a register with the right name 2017-01-03 14:16:10 +02:00