FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-03-03 14:32:16 +02:00


Martin Storsjö 8b11a89c06 aarch64: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32

This work is sponsored by, and copyright, Google.

Previously all subpartitions except the eob=1 (DC) case ran with
the same runtime:

vp9_inv_dct_dct_16x16_sub16_add_neon:   1373.2
vp9_inv_dct_dct_32x32_sub32_add_neon:   8089.0

By skipping individual 8x16 or 8x32 pixel slices in the first pass,
we reduce the runtime of these functions like this:

vp9_inv_dct_dct_16x16_sub1_add_neon:     235.3
vp9_inv_dct_dct_16x16_sub2_add_neon:    1036.7
vp9_inv_dct_dct_16x16_sub4_add_neon:    1036.7
vp9_inv_dct_dct_16x16_sub8_add_neon:    1036.7
vp9_inv_dct_dct_16x16_sub12_add_neon:   1372.1
vp9_inv_dct_dct_16x16_sub16_add_neon:   1372.1
vp9_inv_dct_dct_32x32_sub1_add_neon:     555.1
vp9_inv_dct_dct_32x32_sub2_add_neon:    5190.2
vp9_inv_dct_dct_32x32_sub4_add_neon:    5180.0
vp9_inv_dct_dct_32x32_sub8_add_neon:    5183.1
vp9_inv_dct_dct_32x32_sub12_add_neon:   6161.5
vp9_inv_dct_dct_32x32_sub16_add_neon:   6155.5
vp9_inv_dct_dct_32x32_sub20_add_neon:   7136.3
vp9_inv_dct_dct_32x32_sub24_add_neon:   7128.4
vp9_inv_dct_dct_32x32_sub28_add_neon:   8098.9
vp9_inv_dct_dct_32x32_sub32_add_neon:   8098.8

In other words, the full-subpartition cases see at most a very minor
overhead from the additional cmp instructions. The cases that only need
to process a small part of the input data get a significant speedup.

This is cherry-picked from libav commits
cad42fadcd2c2ae1b3676bb398844a1f521a2d7b and
a0c443a3980dc22eb02b067ac4cb9ffa2f9b04d2.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>

2017-01-14 21:13:32 +01:00

asm-offsets.h

…

cabac.h

…

fft_init_aarch64.c

…

fft_neon.S

…

fmtconvert_init.c

…

fmtconvert_neon.S

…

h264chroma_init_aarch64.c

…

h264cmc_neon.S

…

h264dsp_init_aarch64.c

…

h264dsp_neon.S

…

h264idct_neon.S

aarch64: h264idct: Use the offset parameter to movrel

2016-12-08 18:11:07 +01:00

h264pred_init.c

…

h264pred_neon.S

…

h264qpel_init_aarch64.c

…

h264qpel_neon.S

…

hpeldsp_init_aarch64.c

…

hpeldsp_neon.S

…

Makefile

imdct15: remove the AArch64 assembly

2017-01-05 22:32:02 +00:00

mdct_neon.S

…

mpegaudiodsp_init.c

…

mpegaudiodsp_neon.S

…

neon.S

…

neontest.c

avcodec: fix arguments on xmm/neon clobber test wrappers

2016-10-02 02:15:47 -03:00

rv40dsp_init_aarch64.c

…

synth_filter_init.c

…

synth_filter_neon.S

…

vc1dsp_init_aarch64.c

…

videodsp_init.c

…

videodsp.S

…

vorbisdsp_init.c

…

vorbisdsp_neon.S

…

vp9dsp_init_aarch64.c

aarch64: vp9: Implement NEON loop filters

2016-11-15 15:10:03 -05:00

vp9itxfm_neon.S

aarch64: vp9itxfm: Skip empty slices in the first pass of idct_idct 16x16 and 32x32

2017-01-14 21:13:32 +01:00

vp9lpf_neon.S

aarch64: vp9: loop filter: replace 'orr; cbn?z' with 'adds; b.{eq,ne}'

2017-01-14 21:13:10 +01:00

vp9mc_neon.S

aarch64: vp9: Add NEON optimizations of VP9 MC functions

2016-11-15 15:10:03 -05:00