FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-11-23 21:54:53 +02:00

Files

Krzysztof Pyrkosz f9b8f30680 avcodec/aarch64/vvc: Optimize vvc_avg{8, 10, 12}

This patch replaces integer widening with a halving addition, and the
multi-step "emulated" rounding shift with a single asm instruction that
does exactly that.
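
The arithmetic behind this change can be illustrated with a scalar C model. This is only a sketch, not the NEON code from the patch: the function names are hypothetical, and the 8-bit parameters (14-bit intermediates, final shift of 15 - 8 = 7) are assumptions made for illustration. On AArch64, a halving add and a rounding right shift correspond to instructions such as SHADD and SRSHR; which instructions the patch actually uses is not stated in the message above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed for illustration: 8-bit output from 14-bit intermediates,
 * giving a final shift of 15 - 8 = 7. */
enum { SHIFT = 7 };

int clip_u8(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* Widening form: add in 32 bits, add the rounding offset by hand,
 * shift, then clip. Right shifts of negative values are assumed to be
 * arithmetic, as on common compilers. */
uint8_t avg8_widening(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b + (1 << (SHIFT - 1));
    return (uint8_t)clip_u8(sum >> SHIFT);
}

/* Scalar model of a halving add: (a + b) >> 1 always fits in 16 bits,
 * so the vector code would not need to widen its lanes. */
int16_t halving_add(int16_t a, int16_t b)
{
    return (int16_t)(((int32_t)a + (int32_t)b) >> 1);
}

/* Scalar model of a rounding right shift by n (n >= 1). */
int rounding_shift_right(int v, int n)
{
    return (v + (1 << (n - 1))) >> n;
}

/* Halving form: one bit of the shift is absorbed by the halving add,
 * so the rounding shift is by SHIFT - 1. */
uint8_t avg8_halving(int16_t a, int16_t b)
{
    return (uint8_t)clip_u8(rounding_shift_right(halving_add(a, b), SHIFT - 1));
}

/* Compare both forms over a strided sweep of the int16_t range. */
long count_mismatches(void)
{
    long mismatches = 0;
    for (int a = INT16_MIN; a <= INT16_MAX; a += 37)
        for (int b = INT16_MIN; b <= INT16_MAX; b += 41)
            if (avg8_widening((int16_t)a, (int16_t)b) !=
                avg8_halving((int16_t)a, (int16_t)b))
                mismatches++;
    return mismatches;
}

/* Aborts if the two forms ever disagree on the sampled inputs. */
void check_avg8_equivalence(void)
{
    assert(count_mismatches() == 0);
}
```

The two forms agree because flooring twice, first by 2 and then by 2^(SHIFT-1), gives the same result as flooring once by 2^SHIFT, and the rounding offset 2^(SHIFT-1) is even, so it survives the first halving exactly.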

Benchmarks (for each CPU, the first block is before and the second is after):
A78
avg_8_64x64_neon:                                     2686.2 ( 6.12x)
avg_8_128x128_neon:                                  10734.2 ( 5.88x)
avg_10_64x64_neon:                                    2536.8 ( 5.40x)
avg_10_128x128_neon:                                 10079.0 ( 5.22x)
avg_12_64x64_neon:                                    2548.2 ( 5.38x)
avg_12_128x128_neon:                                 10133.8 ( 5.19x)

avg_8_64x64_neon:                                      897.8 (18.26x)
avg_8_128x128_neon:                                   3608.5 (17.37x)
avg_10_32x32_neon:                                     444.2 ( 8.51x)
avg_10_64x64_neon:                                    1711.8 ( 8.00x)
avg_12_64x64_neon:                                    1706.2 ( 8.02x)
avg_12_128x128_neon:                                  7010.0 ( 7.46x)

A72
avg_8_64x64_neon:                                     5823.4 ( 3.88x)
avg_8_128x128_neon:                                  17430.5 ( 4.73x)
avg_10_64x64_neon:                                    5228.1 ( 3.71x)
avg_10_128x128_neon:                                 16722.2 ( 4.17x)
avg_12_64x64_neon:                                    5379.1 ( 3.51x)
avg_12_128x128_neon:                                 16715.7 ( 4.17x)

avg_8_64x64_neon:                                     2006.5 (10.61x)
avg_8_128x128_neon:                                   9158.7 ( 8.96x)
avg_10_64x64_neon:                                    3357.7 ( 5.60x)
avg_10_128x128_neon:                                 12411.7 ( 5.56x)
avg_12_64x64_neon:                                    3317.5 ( 5.67x)
avg_12_128x128_neon:                                 12358.5 ( 5.58x)

A53
avg_8_64x64_neon:                                     8327.8 ( 5.18x)
avg_8_128x128_neon:                                  31631.3 ( 5.34x)
avg_10_64x64_neon:                                    8783.5 ( 4.98x)
avg_10_128x128_neon:                                 32617.0 ( 5.25x)
avg_12_64x64_neon:                                    8686.0 ( 5.06x)
avg_12_128x128_neon:                                 32487.5 ( 5.25x)

avg_8_64x64_neon:                                     6032.3 ( 7.17x)
avg_8_128x128_neon:                                  22008.5 ( 7.69x)
avg_10_64x64_neon:                                    7738.0 ( 5.68x)
avg_10_128x128_neon:                                 27813.8 ( 6.14x)
avg_12_64x64_neon:                                    7844.5 ( 5.60x)
avg_12_128x128_neon:                                 26999.5 ( 6.34x)

Signed-off-by: Martin Storsjö <martin@martin.st>

2025-03-07 15:51:20 +02:00

h26x

aarch64: h26x: Fix the indentation of one function

2024-09-26 13:42:11 +03:00

vvc

avcodec/aarch64/vvc: Optimize vvc_avg{8, 10, 12}

2025-03-07 15:51:20 +02:00

aacencdsp_init.c

avcodec/aarch64/aacencdsp: NEON implementation

2025-01-28 10:44:40 +02:00

aacencdsp_neon.S

avcodec/aarch64/aacencdsp: NEON implementation

2025-01-28 10:44:40 +02:00

aacpsdsp_init_aarch64.c

…

aacpsdsp_neon.S

…

ac3dsp_init_aarch64.c

…

ac3dsp_neon.S

avcodec/aarch64/ac3dsp_neon.S: Optimize ac3_sum_square_butterfly_int32_neon

2025-03-02 01:17:53 +02:00

cabac.h

…

fdct.h

…

fdctdsp_init_aarch64.c

…

fdctdsp_neon.S

…

fmtconvert_init.c

…

fmtconvert_neon.S

…

h264chroma_init_aarch64.c

…

h264cmc_neon.S

…

h264dsp_init_aarch64.c

…

h264dsp_neon.S

…

h264idct_neon.S

…

h264pred_init.c

…

h264pred_neon.S

lavc/aarch64: Fix ff_pred16x16_plane_neon_10

2024-12-17 14:50:29 +02:00

h264qpel_init_aarch64.c

…

h264qpel_neon.S

…

hevcdsp_deblock_neon.S

…

hevcdsp_idct_neon.S

aarch64/hevcdsp_idct_neon: Add implementation for idct dc 12

2025-03-04 17:01:58 +08:00

hevcdsp_init_aarch64.c

aarch64/hevcdsp_idct_neon: Add implementation for idct dc 12

2025-03-04 17:01:58 +08:00

hpeldsp_init_aarch64.c

…

hpeldsp_neon.S

…

idct.h

…

idctdsp_init_aarch64.c

…

idctdsp_neon.S

…

Makefile

avcodec/aarch64/aacencdsp: NEON implementation

2025-01-28 10:44:40 +02:00

me_cmp_init_aarch64.c

…

me_cmp_neon.S

…

mpegaudiodsp_init.c

…

mpegaudiodsp_neon.S

…

mpegvideoencdsp_init.c

avcodec/mpegvideoencdsp: convert stride parameters from int to ptrdiff_t

2024-09-01 13:42:30 +02:00

mpegvideoencdsp_neon.S

avcodec/mpegvideoencdsp: convert stride parameters from int to ptrdiff_t

2024-09-01 13:42:30 +02:00

neon.S

…

neontest.c

…

opusdsp_init.c

lavc/opus*: move to opus/ subdir

2024-09-02 11:56:53 +02:00

opusdsp_neon.S

avcodec/aarch64/opusdsp_neon: Simplify opus_postfilter_neon

2025-02-10 14:55:16 +02:00

pixblockdsp_init_aarch64.c

…

pixblockdsp_neon.S

…

rv40dsp_init_aarch64.c

…

sbrdsp_init_aarch64.c

…

sbrdsp_neon.S

…

simple_idct_neon.S

…

synth_filter_init.c

…

synth_filter_neon.S

…

vc1dsp_init_aarch64.c

…

vc1dsp_neon.S

…

videodsp_init.c

…

videodsp.S

…

vorbisdsp_init.c

…

vorbisdsp_neon.S

…

vp8dsp_init_aarch64.c

…

vp8dsp_neon.S

…

vp8dsp.h

…

vp9dsp_init_10bpp_aarch64.c

…

vp9dsp_init_12bpp_aarch64.c

…

vp9dsp_init_16bpp_aarch64_template.c

…

vp9dsp_init_aarch64.c

…

vp9dsp_init.h

…

vp9itxfm_16bpp_neon.S

…

vp9itxfm_neon.S

…

vp9lpf_16bpp_neon.S

…

vp9lpf_neon.S

…

vp9mc_16bpp_neon.S

…

vp9mc_aarch64.S

…

vp9mc_neon.S

aarch64: vp9mc: Load only 12 pixels in the 4 pixel wide horizontal filter

2025-01-03 17:53:46 -05:00