FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-03 05:10:03 +02:00

Martin Storsjö a63da4511d aarch64: vp9itxfm: Do separate functions for half/quarter idct16 and idct32

This work is sponsored by, and copyright, Google.

This avoids loading and calculating coefficients that we know will be zero, and avoids filling the temp buffer with zeros in places where we know the second pass won't read.

This gives a pretty substantial speedup for the smaller subpartitions.

The code size increases from 14740 bytes to 24292 bytes.

The idct16/32_end macros are moved above the individual functions; the instructions themselves are unchanged, but since new functions are added at the same place where the code is moved from, the diff looks rather messy.

Before:
vp9_inv_dct_dct_16x16_sub1_add_neon: 236.7
vp9_inv_dct_dct_16x16_sub2_add_neon: 1051.0
vp9_inv_dct_dct_16x16_sub4_add_neon: 1051.0
vp9_inv_dct_dct_16x16_sub8_add_neon: 1051.0
vp9_inv_dct_dct_16x16_sub12_add_neon: 1387.4
vp9_inv_dct_dct_16x16_sub16_add_neon: 1387.6
vp9_inv_dct_dct_32x32_sub1_add_neon: 554.1
vp9_inv_dct_dct_32x32_sub2_add_neon: 5198.5
vp9_inv_dct_dct_32x32_sub4_add_neon: 5198.6
vp9_inv_dct_dct_32x32_sub8_add_neon: 5196.3
vp9_inv_dct_dct_32x32_sub12_add_neon: 6183.4
vp9_inv_dct_dct_32x32_sub16_add_neon: 6174.3
vp9_inv_dct_dct_32x32_sub20_add_neon: 7151.4
vp9_inv_dct_dct_32x32_sub24_add_neon: 7145.3
vp9_inv_dct_dct_32x32_sub28_add_neon: 8119.3
vp9_inv_dct_dct_32x32_sub32_add_neon: 8118.7

After:
vp9_inv_dct_dct_16x16_sub1_add_neon: 236.7
vp9_inv_dct_dct_16x16_sub2_add_neon: 640.8
vp9_inv_dct_dct_16x16_sub4_add_neon: 639.0
vp9_inv_dct_dct_16x16_sub8_add_neon: 842.0
vp9_inv_dct_dct_16x16_sub12_add_neon: 1388.3
vp9_inv_dct_dct_16x16_sub16_add_neon: 1389.3
vp9_inv_dct_dct_32x32_sub1_add_neon: 554.1
vp9_inv_dct_dct_32x32_sub2_add_neon: 3685.5
vp9_inv_dct_dct_32x32_sub4_add_neon: 3685.1
vp9_inv_dct_dct_32x32_sub8_add_neon: 3684.4
vp9_inv_dct_dct_32x32_sub12_add_neon: 5312.2
vp9_inv_dct_dct_32x32_sub16_add_neon: 5315.4
vp9_inv_dct_dct_32x32_sub20_add_neon: 7154.9
vp9_inv_dct_dct_32x32_sub24_add_neon: 7154.5
vp9_inv_dct_dct_32x32_sub28_add_neon: 8126.6
vp9_inv_dct_dct_32x32_sub32_add_neon: 8127.2

Signed-off-by: Martin Storsjö <martin@martin.st>
2017-02-09 12:32:03 +02:00
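
The idea in the commit message above (skip work on coefficients known to be zero by providing separate half and quarter variants) can be illustrated with a small C sketch. This is not the code in vp9itxfm_neon.S: the toy basis, the function names (`toy_transform_full`, `toy_transform_half`, `toy_transform_quarter`, `toy_transform`) and the dispatch thresholds are all hypothetical and chosen only to show the principle on a 16-point transform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N 16 /* toy transform size; stands in for idct16 */

/* Hypothetical integer basis; NOT the VP9 IDCT coefficients. */
static int32_t toy_basis(size_t i, size_t j)
{
    return (int32_t)(((i + 1) * (j + 2)) % 7) - 3;
}

/* Shared kernel: only the first `used` inputs are read and multiplied. */
static void toy_transform_n(const int16_t *in, int32_t *out, size_t used)
{
    for (size_t i = 0; i < N; i++) {
        int32_t acc = 0;
        for (size_t j = 0; j < used; j++)
            acc += toy_basis(i, j) * in[j];
        out[i] = acc;
    }
}

/* Full variant: reads all N inputs. */
void toy_transform_full(const int16_t *in, int32_t *out)
{
    toy_transform_n(in, out, N);
}

/* Half variant: assumes in[N/2..N-1] are zero and never touches them. */
void toy_transform_half(const int16_t *in, int32_t *out)
{
    toy_transform_n(in, out, N / 2);
}

/* Quarter variant: assumes in[N/4..N-1] are zero and never touches them. */
void toy_transform_quarter(const int16_t *in, int32_t *out)
{
    toy_transform_n(in, out, N / 4);
}

/* Pick the cheapest variant that is still exact, given how many leading
 * inputs may be nonzero (the thresholds here are an assumption). */
void toy_transform(const int16_t *in, int32_t *out, size_t nonzero)
{
    if (nonzero <= N / 4)
        toy_transform_quarter(in, out);
    else if (nonzero <= N / 2)
        toy_transform_half(in, out);
    else
        toy_transform_full(in, out);
}

/* Self-check: the reduced variant must match the full one whenever the
 * skipped inputs really are zero. */
void toy_check(void)
{
    const int16_t in[N] = { 5, -3, 7, 1 }; /* remaining entries are zero */
    int32_t full[N], fast[N];

    toy_transform_full(in, full);
    toy_transform(in, fast, 4);
    for (size_t i = 0; i < N; i++)
        assert(full[i] == fast[i]);
}
```

The reduced variants give the same output as the full one only when the skipped inputs are truly zero, which is why the caller has to know how many coefficients are populated before dispatching. The trade-off matches the one the commit describes: more code in exchange for less work on sparse blocks.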
..
asm-offsets.h
cabac.h
dcadsp_init.c	dca: remove unused decode_hf function and quant_d tables	2015-12-24 13:58:18 +01:00
dcadsp_neon.S	dca: remove unused decode_hf function and quant_d tables	2015-12-24 13:58:18 +01:00
fft_init_aarch64.c	fft: Split MDCT bits off from FFT	2016-03-01 10:18:28 +01:00
fft_neon.S
fmtconvert_init.c	arm64: int32_to_float_fmul neon asm	2015-12-14 16:45:02 +01:00
fmtconvert_neon.S	arm64: int32_to_float_fmul neon asm	2015-12-14 16:45:02 +01:00
h264chroma_init_aarch64.c	h264chroma: Change type of stride parameters to ptrdiff_t	2016-09-29 14:48:04 +02:00
h264cmc_neon.S	h264chroma: Change type of stride parameters to ptrdiff_t	2016-09-29 14:48:04 +02:00
h264dsp_init_aarch64.c
h264dsp_neon.S
h264idct_neon.S	aarch64: h264idct: Use the offset parameter to movrel	2016-11-10 11:18:22 +02:00
h264pred_init.c
h264pred_neon.S
h264qpel_init_aarch64.c
h264qpel_neon.S
hpeldsp_init_aarch64.c
hpeldsp_neon.S
imdct15_init.c
imdct15_neon.S
Makefile	aarch64: vp9: Implement NEON loop filters	2016-11-14 00:10:13 +02:00
mdct_init.c	fft: Split MDCT bits off from FFT	2016-03-01 10:18:28 +01:00
mdct_neon.S
mpegaudiodsp_init.c	mpegaudiodsp: aarch64: Adjust function prototype after `2caa93b813`	2016-11-10 00:13:48 +01:00
mpegaudiodsp_neon.S	mpegaudiodsp: Change type of array stride parameters to ptrdiff_t	2016-09-29 17:54:24 +02:00
neon.S	aarch64: Make transpose_4x4H do a regular transpose	2016-03-26 21:25:56 +02:00
neontest.c	lavc: add clobber tests for the new encoding/decoding API	2016-09-28 10:01:52 +02:00
rv40dsp_init_aarch64.c	h264chroma: Change type of stride parameters to ptrdiff_t	2016-09-29 14:48:04 +02:00
synth_filter_neon.S	arm64: replace 'bic' with immediate with 'and' with inverted immediate	2016-12-14 21:53:05 +01:00
vc1dsp_init_aarch64.c	h264chroma: Change type of stride parameters to ptrdiff_t	2016-09-29 14:48:04 +02:00
videodsp_init.c
videodsp.S
vorbisdsp_init.c
vorbisdsp_neon.S
vp9dsp_init_aarch64.c	aarch64: vp9dsp: Fix vertical alignment in the init file	2017-01-03 14:15:58 +02:00
vp9itxfm_neon.S	aarch64: vp9itxfm: Do separate functions for half/quarter idct16 and idct32	2017-02-09 12:32:03 +02:00
vp9lpf_neon.S	aarch64: vp9: loop filter: replace 'orr; cbn?z' with 'adds; b.{eq,ne};	2016-11-16 09:05:18 +01:00
vp9mc_neon.S	aarch64: vp9mc: Fix a comment to refer to a register with the right name	2017-01-03 14:16:10 +02:00