1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-23 12:43:46 +02:00
FFmpeg/libavcodec/aarch64
Martin Storsjö 2905657b90 aarch64: vp9itxfm: Avoid reloading the idct32 coefficients
The idct32x32 function actually pushed d8-d15 onto the stack even
though it didn't clobber them; there are plenty of registers that
can be used to allow keeping all the idct coefficients in registers
without having to reload different subsets of them at different
stages in the transform.

After this, we still can skip pushing d12-d15.

Before:
vp9_inv_dct_dct_32x32_sub32_add_neon: 8128.3
After:
vp9_inv_dct_dct_32x32_sub32_add_neon: 8053.3

This is cherrypicked from libav commit
65aa002d54.

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-03-11 13:14:51 +02:00
..
asm-offsets.h
cabac.h
fft_init_aarch64.c
fft_neon.S
fmtconvert_init.c
fmtconvert_neon.S
h264chroma_init_aarch64.c
h264cmc_neon.S
h264dsp_init_aarch64.c
h264dsp_neon.S
h264idct_neon.S aarch64: h264idct: Use the offset parameter to movrel 2016-12-08 18:11:07 +01:00
h264pred_init.c
h264pred_neon.S
h264qpel_init_aarch64.c
h264qpel_neon.S
hpeldsp_init_aarch64.c
hpeldsp_neon.S
Makefile aarch64: Add NEON optimizations for 10 and 12 bit vp9 loop filter 2017-01-24 22:36:11 +02:00
mdct_neon.S
mpegaudiodsp_init.c
mpegaudiodsp_neon.S
neon.S
neontest.c
rv40dsp_init_aarch64.c
synth_filter_init.c
synth_filter_neon.S
vc1dsp_init_aarch64.c
videodsp_init.c
videodsp.S
vorbisdsp_init.c
vorbisdsp_neon.S
vp9dsp_init_10bpp_aarch64.c aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC 2017-01-24 22:36:05 +02:00
vp9dsp_init_12bpp_aarch64.c aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC 2017-01-24 22:36:05 +02:00
vp9dsp_init_16bpp_aarch64_template.c aarch64: Add NEON optimizations for 10 and 12 bit vp9 loop filter 2017-01-24 22:36:11 +02:00
vp9dsp_init_aarch64.c aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC 2017-01-24 22:36:05 +02:00
vp9dsp_init.h aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC 2017-01-24 22:36:05 +02:00
vp9itxfm_16bpp_neon.S aarch64: Add NEON optimizations for 10 and 12 bit vp9 itxfm 2017-01-24 22:36:08 +02:00
vp9itxfm_neon.S aarch64: vp9itxfm: Avoid reloading the idct32 coefficients 2017-03-11 13:14:51 +02:00
vp9lpf_16bpp_neon.S aarch64: Add NEON optimizations for 10 and 12 bit vp9 loop filter 2017-01-24 22:36:11 +02:00
vp9lpf_neon.S aarch64: vp9lpf: Use dup+rev16+uzp1 instead of dup+lsr+dup+trn1 2017-03-11 13:14:50 +02:00
vp9mc_16bpp_neon.S aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC 2017-01-24 22:36:05 +02:00
vp9mc_neon.S aarch64: vp9mc: Calculate less unused data in the 4 pixel wide horizontal filter 2017-03-11 13:14:48 +02:00