mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-28 20:53:54 +02:00
FFmpeg/libavcodec/aarch64
Lynne 4d2f62150d aarch64/opusdsp: implement NEON accelerated postfilter and deemphasis
153372 UNITS in postfilter_c,   65536 runs,      0 skips
73164 UNITS in postfilter_neon,   65536 runs,      0 skips -> 2.1x speedup

80591 UNITS in deemphasis_c,  131072 runs,      0 skips
43969 UNITS in deemphasis_neon,  131072 runs,      0 skips -> 1.83x speedup

Total decoder speedup: ~15% on a Raspberry Pi 3 (from 28.1x to 33.5x realtime)

Deemphasis SIMD based on the following unrolling:
const float c1 = CELT_EMPH_COEFF, c2 = c1*c1, c3 = c2*c1, c4 = c3*c1;
float state = coeff;

for (int i = 0; i < len; i += 4) {
    y[0] = x[0] + c1*state;
    y[1] = x[1] + c2*state + c1*x[0];
    y[2] = x[2] + c3*state + c1*x[1] + c2*x[0];
    y[3] = x[3] + c4*state + c1*x[2] + c2*x[1] + c3*x[0];

    state = y[3];
    y += 4;
    x += 4;
}

Unlike the x86 version, this implementation uses duplication instead of
pslldq, so the structure and tables are different.
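
The unrolling above can be checked against the plain one-sample-at-a-time recursion y[n] = x[n] + c1*y[n-1]. The following is a minimal C sketch of that check, not the code from opusdsp_neon.S or the C reference in the tree. The function names deemphasis_scalar, deemphasis_unrolled4 and max_abs_diff are hypothetical, the numeric value of CELT_EMPH_COEFF is an assumption, and len is assumed to be a multiple of 4.

```c
#include <assert.h>

/* Assumed value of the CELT emphasis coefficient; check the tree for the real one. */
#define CELT_EMPH_COEFF 0.8500061035f

/* Scalar recursion: each output depends on the previous output.
 * Returns the final filter state (the last output sample). */
static float deemphasis_scalar(float *y, const float *x, float state, int len)
{
    for (int i = 0; i < len; i++)
        state = y[i] = x[i] + CELT_EMPH_COEFF * state;
    return state;
}

/* Unrolled by four, following the commit message: every output in a group
 * is expressed through the state at the start of the group plus the inputs,
 * so the four outputs no longer depend on each other. */
static float deemphasis_unrolled4(float *y, const float *x, float state, int len)
{
    const float c1 = CELT_EMPH_COEFF, c2 = c1*c1, c3 = c2*c1, c4 = c3*c1;

    assert(len % 4 == 0); /* assumption of this sketch */

    for (int i = 0; i < len; i += 4) {
        y[0] = x[0] + c1*state;
        y[1] = x[1] + c2*state + c1*x[0];
        y[2] = x[2] + c3*state + c1*x[1] + c2*x[0];
        y[3] = x[3] + c4*state + c1*x[2] + c2*x[1] + c3*x[0];

        state = y[3];
        y += 4;
        x += 4;
    }
    return state;
}

/* Largest absolute element-wise difference between two buffers. */
static float max_abs_diff(const float *a, const float *b, int len)
{
    float max = 0.0f;
    for (int i = 0; i < len; i++) {
        float d = a[i] - b[i];
        if (d < 0.0f)
            d = -d;
        if (d > max)
            max = d;
    }
    return max;
}
```

The two versions are algebraically identical but round differently, so a comparison should use a small tolerance rather than exact equality.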
2019-04-10 01:08:54 +02:00
aacpsdsp_init_aarch64.c
aacpsdsp_neon.S
asm-offsets.h
cabac.h
fft_init_aarch64.c
fft_neon.S
fmtconvert_init.c
fmtconvert_neon.S
h264chroma_init_aarch64.c
h264cmc_neon.S
h264dsp_init_aarch64.c Merge commit '186bd30aa3b6c2b29b4dbf18278700b572068b1e' 2019-03-14 16:29:41 -03:00
h264dsp_neon.S Merge commit '186bd30aa3b6c2b29b4dbf18278700b572068b1e' 2019-03-14 16:29:41 -03:00
h264idct_neon.S
h264pred_init.c
h264pred_neon.S
h264qpel_init_aarch64.c
h264qpel_neon.S
hpeldsp_init_aarch64.c
hpeldsp_neon.S
idct.h
idctdsp_init_aarch64.c
Makefile aarch64/opusdsp: implement NEON accelerated postfilter and deemphasis 2019-04-10 01:08:54 +02:00
mdct_neon.S
mpegaudiodsp_init.c
mpegaudiodsp_neon.S
neon.S
neontest.c
opusdsp_init.c aarch64/opusdsp: implement NEON accelerated postfilter and deemphasis 2019-04-10 01:08:54 +02:00
opusdsp_neon.S aarch64/opusdsp: implement NEON accelerated postfilter and deemphasis 2019-04-10 01:08:54 +02:00
rv40dsp_init_aarch64.c
sbrdsp_init_aarch64.c
sbrdsp_neon.S
simple_idct_neon.S
synth_filter_init.c
synth_filter_neon.S
vc1dsp_init_aarch64.c
videodsp_init.c
videodsp.S
vorbisdsp_init.c
vorbisdsp_neon.S
vp8dsp_init_aarch64.c Merge commit 'e39a9212ab37a55b346801c77487d8a47b6f9fe2' 2019-03-14 16:18:42 -03:00
vp8dsp_neon.S Merge commit '7e42d5f0ab2aeac811fd01e122627c9198b13f01' 2019-03-14 16:22:29 -03:00
vp8dsp.h Merge commit 'e39a9212ab37a55b346801c77487d8a47b6f9fe2' 2019-03-14 16:18:42 -03:00
vp9dsp_init_10bpp_aarch64.c
vp9dsp_init_12bpp_aarch64.c
vp9dsp_init_16bpp_aarch64_template.c
vp9dsp_init_aarch64.c
vp9dsp_init.h
vp9itxfm_16bpp_neon.S
vp9itxfm_neon.S
vp9lpf_16bpp_neon.S
vp9lpf_neon.S
vp9mc_16bpp_neon.S
vp9mc_neon.S