FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-23 12:43:46 +02:00


Martin Storsjö ceb36b8178 aarch64: Add NEON optimizations for 10 and 12 bit vp9 itxfm		2017-01-24 22:36:08 +02:00

This work is sponsored by, and copyright, Google.

Compared to the arm version, on aarch64 we can keep the full 8x8
transform in registers, and for 16x16 and 32x32, we can process
it in slices of 4 pixels instead of 2.

Examples of runtimes vs the 32 bit version, on a Cortex A53:
                                                ARM  AArch64
vp9_inv_adst_adst_4x4_sub4_add_10_neon:       111.0    109.7
vp9_inv_adst_adst_8x8_sub8_add_10_neon:       914.0    733.5
vp9_inv_adst_adst_16x16_sub16_add_10_neon:   5184.0   3745.7
vp9_inv_dct_dct_4x4_sub1_add_10_neon:          65.0     65.7
vp9_inv_dct_dct_4x4_sub4_add_10_neon:         100.0     96.7
vp9_inv_dct_dct_8x8_sub1_add_10_neon:         111.0    119.7
vp9_inv_dct_dct_8x8_sub8_add_10_neon:         618.0    494.7
vp9_inv_dct_dct_16x16_sub1_add_10_neon:       295.1    284.6
vp9_inv_dct_dct_16x16_sub2_add_10_neon:      2303.2   1883.9
vp9_inv_dct_dct_16x16_sub8_add_10_neon:      2984.8   2189.3
vp9_inv_dct_dct_16x16_sub16_add_10_neon:     3890.0   2799.4
vp9_inv_dct_dct_32x32_sub1_add_10_neon:      1044.4   1012.7
vp9_inv_dct_dct_32x32_sub2_add_10_neon:     13333.7   9695.1
vp9_inv_dct_dct_32x32_sub16_add_10_neon:    18531.3  12459.8
vp9_inv_dct_dct_32x32_sub32_add_10_neon:    24470.7  16160.2
vp9_inv_wht_wht_4x4_sub4_add_10_neon:          83.0     79.7

The larger transforms are significantly faster than the
corresponding ARM versions.

The speedup vs C code is smaller than in 32 bit mode, probably
because the 64 bit intermediates in the C code can be expressed
more efficiently in aarch64.

Signed-off-by: Martin Storsjö <martin@martin.st>
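
The runtimes quoted in the commit message can be turned into ARM-to-AArch64 speedup ratios to see how the gain grows with transform size. The following is only an illustrative sketch: the `RUNTIMES` table and the `speedup` helper are hypothetical names introduced here, the figures are copied from a subset of the rows in the commit message, and the commit does not state the unit of the measurements.

```python
# Runtimes copied from the commit message (Cortex A53), as (ARM, AArch64).
# The unit is not stated in the commit; only the ratio is used here.
RUNTIMES = {
    "vp9_inv_dct_dct_4x4_sub4_add_10_neon": (100.0, 96.7),
    "vp9_inv_dct_dct_8x8_sub1_add_10_neon": (111.0, 119.7),
    "vp9_inv_dct_dct_8x8_sub8_add_10_neon": (618.0, 494.7),
    "vp9_inv_dct_dct_16x16_sub16_add_10_neon": (3890.0, 2799.4),
    "vp9_inv_dct_dct_32x32_sub32_add_10_neon": (24470.7, 16160.2),
}


def speedup(arm: float, aarch64: float) -> float:
    """Return how many times faster the AArch64 version ran (hypothetical helper)."""
    return arm / aarch64


if __name__ == "__main__":
    for name, (arm, aarch64) in RUNTIMES.items():
        print(f"{name}: {speedup(arm, aarch64):.2f}x")
```

A ratio above 1 means the AArch64 version was faster in that measurement; a ratio below 1 (as for the 8x8 `sub1` row) means it was slower.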
asm-offsets.h
cabac.h
fft_init_aarch64.c
fft_neon.S
fmtconvert_init.c
fmtconvert_neon.S
h264chroma_init_aarch64.c
h264cmc_neon.S
h264dsp_init_aarch64.c
h264dsp_neon.S
h264idct_neon.S	aarch64: h264idct: Use the offset parameter to movrel	2016-12-08 18:11:07 +01:00
h264pred_init.c
h264pred_neon.S
h264qpel_init_aarch64.c
h264qpel_neon.S
hpeldsp_init_aarch64.c
hpeldsp_neon.S
Makefile	aarch64: Add NEON optimizations for 10 and 12 bit vp9 itxfm	2017-01-24 22:36:08 +02:00
mdct_neon.S
mpegaudiodsp_init.c
mpegaudiodsp_neon.S
neon.S
neontest.c	avcodec: fix arguments on xmm/neon clobber test wrappers	2016-10-02 02:15:47 -03:00
rv40dsp_init_aarch64.c
synth_filter_init.c
synth_filter_neon.S
vc1dsp_init_aarch64.c
videodsp_init.c
videodsp.S
vorbisdsp_init.c
vorbisdsp_neon.S
vp9dsp_init_10bpp_aarch64.c	aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC	2017-01-24 22:36:05 +02:00
vp9dsp_init_12bpp_aarch64.c	aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC	2017-01-24 22:36:05 +02:00
vp9dsp_init_16bpp_aarch64_template.c	aarch64: Add NEON optimizations for 10 and 12 bit vp9 itxfm	2017-01-24 22:36:08 +02:00
vp9dsp_init_aarch64.c	aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC	2017-01-24 22:36:05 +02:00
vp9dsp_init.h	aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC	2017-01-24 22:36:05 +02:00
vp9itxfm_16bpp_neon.S	aarch64: Add NEON optimizations for 10 and 12 bit vp9 itxfm	2017-01-24 22:36:08 +02:00
vp9itxfm_neon.S	aarch64: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32	2017-01-14 21:13:32 +01:00
vp9lpf_neon.S	aarch64: vp9: loop filter: replace 'orr; cbn?z' with 'adds; b.{eq,ne};	2017-01-14 21:13:10 +01:00
vp9mc_16bpp_neon.S	aarch64: Add NEON optimizations for 10 and 12 bit vp9 MC	2017-01-24 22:36:05 +02:00
vp9mc_neon.S	aarch64: vp9mc: Fix a comment to refer to a register with the right name	2017-01-14 21:13:43 +01:00