mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-21 10:55:51 +02:00
FFmpeg/libavcodec/aarch64
Martin Storsjö 68a03f6424 aarch64: me_cmp: Switch from uabd to uabal in ff_pix_abs16_xy2_neon
Using absolute-difference-accumulate (uabal) requires twice as many
absolute-difference instructions, but avoids the need for the uaddl
and add instructions, reducing the total number of instructions
by 3.
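The trade-off above can be modeled in scalar C. This sketch emulates the per-lane behavior of UABD (8-bit absolute difference) and UABAL/UABAL2 (absolute difference, widened to 16 bits and accumulated). All function names are hypothetical. The "old" path is only one plausible shape of a uabd + uaddl + add sequence; the register layout of the actual FFmpeg code is not reproduced. Both paths add the same absolute differences into the same 16-bit lanes, so the final sum is identical. In this model the new path needs two operations per 16-byte row instead of three.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar emulations of AArch64 NEON lane semantics. */

/* UABD Vd.16B: per-byte absolute difference; the result stays 8-bit. */
static void emu_uabd_16b(uint8_t d[16], const uint8_t a[16], const uint8_t b[16])
{
    for (int i = 0; i < 16; i++)
        d[i] = (uint8_t)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
}

/* UABAL Vd.8H, Vn.8B, Vm.8B (UABAL2 does the same on the upper halves):
 * widen |a - b| to 16 bits and accumulate it into the lanes of acc. */
static void emu_uabal_8b(uint16_t acc[8], const uint8_t a[8], const uint8_t b[8])
{
    for (int i = 0; i < 8; i++)
        acc[i] = (uint16_t)(acc[i] + (a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]));
}

/* "Old" shape (illustrative): 8-bit absolute difference, a widening sum of
 * the two halves (standing in for uaddl), then a separate add into acc. */
static void row_old(uint16_t acc[8], const uint8_t a[16], const uint8_t b[16])
{
    uint8_t d[16];
    uint16_t wide[8];
    emu_uabd_16b(d, a, b);
    for (int i = 0; i < 8; i++)
        wide[i] = (uint16_t)(d[i] + d[i + 8]);   /* widen + sum halves */
    for (int i = 0; i < 8; i++)
        acc[i] = (uint16_t)(acc[i] + wide[i]);   /* add into accumulator */
}

/* "New" shape: two absolute-difference-accumulate steps, nothing else. */
static void row_new(uint16_t acc[8], const uint8_t a[16], const uint8_t b[16])
{
    emu_uabal_8b(acc, a, b);          /* UABAL  on bytes 0..7  */
    emu_uabal_8b(acc, a + 8, b + 8);  /* UABAL2 on bytes 8..15 */
}

/* Apply one row routine to h rows of 16 bytes, then reduce the 8 lanes. */
static uint32_t sad16(void (*row)(uint16_t *, const uint8_t *, const uint8_t *),
                      const uint8_t *a, const uint8_t *b, int stride, int h)
{
    uint16_t acc[8] = {0};
    uint32_t sum = 0;
    /* Each lane gains at most 2 * 255 per row, so h <= 128 cannot overflow. */
    assert(h >= 0 && h <= 128);
    for (int y = 0; y < h; y++)
        row(acc, a + y * stride, b + y * stride);
    for (int i = 0; i < 8; i++)       /* final horizontal add */
        sum += acc[i];
    return sum;
}
```

For any 16x4 inputs, `sad16(row_old, a, b, 16, 4)` and `sad16(row_new, a, b, 16, 4)` return the same value. Only the number of operations per row differs.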

These can be interleaved with the rest of the calculation to avoid
tight dependencies at the end. Unfortunately, this is marginally
slower on Cortex A53, but faster on A72 and A73.

Before:             Cortex A53    A72    A73   Graviton 3
pix_abs_0_3_neon:        175.7  109.2   92.0         41.2
After:
pix_abs_0_3_neon:        179.7   96.7   87.5         41.2

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-07-16 17:25:54 +03:00
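For context, ff_pix_abs16_xy2_neon computes a sum of absolute differences (SAD) between a 16-pixel-wide block and a reference block sampled at a half-pixel offset in both x and y. Below is a scalar sketch of that computation. It assumes each interpolated pixel is the rounded average of its four neighbours, (a + b + c + d + 2) >> 2, which is our understanding of FFmpeg's generic C code. The name `sad16_xy2_ref` and its simplified signature are hypothetical. In particular, the sketch omits the context pointer that FFmpeg's comparison functions receive.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Rounded average of four pixels for the (x + 1/2, y + 1/2) position. */
static inline int avg4(int a, int b, int c, int d)
{
    return (a + b + c + d + 2) >> 2;
}

/* Hypothetical scalar reference: SAD between the 16-wide block at pix1 and
 * the xy2-interpolated block built from pix2. Reads 17 columns and h + 1
 * rows of pix2; both buffers share the same stride. */
static int sad16_xy2_ref(const uint8_t *pix1, const uint8_t *pix2,
                         ptrdiff_t stride, int h)
{
    const uint8_t *pix3 = pix2 + stride;   /* row below pix2 */
    int s = 0;

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < 16; x++)
            s += abs(pix1[x] - avg4(pix2[x], pix2[x + 1], pix3[x], pix3[x + 1]));
        pix1 += stride;
        pix2 += stride;
        pix3 += stride;
    }
    return s;
}
```

The four-tap average, plus the subtraction and absolute value for each pixel, is the work that the NEON version vectorizes. The uabd/uabal change above only affects how the per-pixel absolute differences are accumulated.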
aacpsdsp_init_aarch64.c Include attributes.h directly 2021-04-19 14:34:10 +02:00
aacpsdsp_neon.S
asm-offsets.h
cabac.h
fft_init_aarch64.c
fft_neon.S arm64: Fix wrong BTI landing pad 2022-04-26 10:26:49 +03:00
fmtconvert_init.c
fmtconvert_neon.S
h264chroma_init_aarch64.c
h264cmc_neon.S configure: Use a separate config_components.h header for $ALL_COMPONENTS 2022-03-16 14:12:49 +02:00
h264dsp_init_aarch64.c lavc/aarch64: h264, add chroma loop filters for 10bit 2021-08-21 00:06:26 +03:00
h264dsp_neon.S aarch64: h264dsp: Fix incorrectly indented code 2022-02-11 10:49:12 +02:00
h264idct_neon.S aarch64: Add Armv8.5-A BTI support 2021-11-16 13:43:56 +02:00
h264pred_init.c lavc/aarch64: add pred functions for 10-bit 2021-08-21 00:06:26 +03:00
h264pred_neon.S lavc/aarch64: add pred functions for 10-bit 2021-08-21 00:06:26 +03:00
h264qpel_init_aarch64.c
h264qpel_neon.S aarch64: h264qpel: Do vertical filtering without transposing 2021-10-18 14:27:58 +03:00
hevcdsp_idct_neon.S aarch64: hevc_idct: Fix overflows in idct_dc 2021-05-22 00:08:03 +03:00
hevcdsp_init_aarch64.c lavc/aarch64: add hevc sao edge 8x8 2022-05-25 08:04:46 +02:00
hevcdsp_sao_neon.S lavc/aarch64: hevc_sao reschedule slightly 2022-05-26 08:10:41 +02:00
hpeldsp_init_aarch64.c
hpeldsp_neon.S
idct.h avcodec/aarch64/idct: Add missing stddef 2022-02-21 13:10:04 +01:00
idctdsp_init_aarch64.c avcodec/idctdsp: Arm 64-bit NEON block add and clamp fast paths 2022-04-01 10:03:34 +03:00
idctdsp_neon.S avcodec/idctdsp: Arm 64-bit NEON block add and clamp fast paths 2022-04-01 10:03:34 +03:00
Makefile lavc/aarch64: motion estimation functions in neon 2022-06-28 00:51:39 +03:00
mdct_neon.S arm64: Add Armv8.3-A PAC support to assembly files 2022-03-09 15:04:25 +02:00
me_cmp_init_aarch64.c lavc/aarch64: Add pix_abs16_x2 neon implementation 2022-07-13 23:25:22 +03:00
me_cmp_neon.S aarch64: me_cmp: Switch from uabd to uabal in ff_pix_abs16_xy2_neon 2022-07-16 17:25:54 +03:00
mpegaudiodsp_init.c
mpegaudiodsp_neon.S
neon.S lavc/aarch64: move transpose_4x8H to neon.S 2021-08-21 00:06:26 +03:00
neontest.c avcodec: Remove deprecated old encode/decode APIs 2021-04-27 10:43:12 -03:00
opusdsp_init.c Include attributes.h directly 2021-04-19 14:34:10 +02:00
opusdsp_neon.S
pixblockdsp_init_aarch64.c
pixblockdsp_neon.S
rv40dsp_init_aarch64.c
sbrdsp_init_aarch64.c
sbrdsp_neon.S
simple_idct_neon.S aarch64: Use ret x<n> instead of br x<n> where possible 2021-11-16 13:43:56 +02:00
synth_filter_init.c
synth_filter_neon.S arm64: Add Armv8.3-A PAC support to assembly files 2022-03-09 15:04:25 +02:00
vc1dsp_init_aarch64.c avcodec/vc1: Arm 64-bit NEON unescape fast path 2022-04-01 10:03:34 +03:00
vc1dsp_neon.S avcodec/vc1: Arm 64-bit NEON unescape fast path 2022-04-01 10:03:34 +03:00
videodsp_init.c
videodsp.S lavc/aarch64: fix relocation out of range error 2021-09-25 21:55:29 +03:00
vorbisdsp_init.c
vorbisdsp_neon.S
vp8dsp_init_aarch64.c
vp8dsp_neon.S
vp8dsp.h
vp9dsp_init_10bpp_aarch64.c
vp9dsp_init_12bpp_aarch64.c
vp9dsp_init_16bpp_aarch64_template.c
vp9dsp_init_aarch64.c
vp9dsp_init.h
vp9itxfm_16bpp_neon.S aarch64: Use ret x<n> instead of br x<n> where possible 2021-11-16 13:43:56 +02:00
vp9itxfm_neon.S aarch64: Use ret x<n> instead of br x<n> where possible 2021-11-16 13:43:56 +02:00
vp9lpf_16bpp_neon.S aarch64: Use ret x<n> instead of br x<n> where possible 2021-11-16 13:43:56 +02:00
vp9lpf_neon.S aarch64: Use ret x<n> instead of br x<n> where possible 2021-11-16 13:43:56 +02:00
vp9mc_16bpp_neon.S
vp9mc_aarch64.S
vp9mc_neon.S