mirror of
https://github.com/FFmpeg/FFmpeg.git
synced 2024-11-21 10:55:51 +02:00
1e9cfa5bb0
Add arm64 neon implementations for hscale 8 to 19 with filter sizes 4, 4X and 8.

Both implementations are based on very similar ones dedicated to hscale 8 to 15. The major change is in how the data is stored: the result is written as int32_t instead of int16_t.

These functions are heavily inspired by patches provided by J. Swinney and M. Storsjö for hscale8to15, which were slightly adapted for hscale8to19.

The tests and benchmarks were run on AWS Graviton 2 instances. The results from the checkasm tool are shown below.

hscale_8_to_19__fs_4_dstW_512_c: 5663.2
hscale_8_to_19__fs_4_dstW_512_neon: 1259.7
hscale_8_to_19__fs_8_dstW_512_c: 9306.0
hscale_8_to_19__fs_8_dstW_512_neon: 2020.2
hscale_8_to_19__fs_12_dstW_512_c: 12932.7
hscale_8_to_19__fs_12_dstW_512_neon: 2462.5
hscale_8_to_19__fs_16_dstW_512_c: 16844.2
hscale_8_to_19__fs_16_dstW_512_neon: 4671.2
hscale_8_to_19__fs_32_dstW_512_c: 32803.7
hscale_8_to_19__fs_32_dstW_512_neon: 5474.2
hscale_8_to_19__fs_40_dstW_512_c: 40948.0
hscale_8_to_19__fs_40_dstW_512_neon: 6669.7

Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>

Name | ||
---|---|---|
aarch64 | ||
arm | ||
loongarch | ||
ppc | ||
riscv | ||
tests | ||
x86 | ||
alphablend.c | ||
bayer_template.c | ||
gamma.c | ||
half2float.c | ||
hscale_fast_bilinear.c | ||
hscale.c | ||
input.c | ||
libswscale.v | ||
log2_tab.c | ||
Makefile | ||
options.c | ||
output.c | ||
rgb2rgb_template.c | ||
rgb2rgb.c | ||
rgb2rgb.h | ||
slice.c | ||
swscale_internal.h | ||
swscale_unscaled.c | ||
swscale.c | ||
swscale.h | ||
swscaleres.rc | ||
utils.c | ||
version_major.h | ||
version.c | ||
version.h | ||
vscale.c | ||
yuv2rgb.c |
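
The commit message above says the 8-to-19 horizontal scaler differs from the 8-to-15 one mainly in writing int32_t results instead of int16_t. The following is a minimal scalar sketch of that idea, not the code from this tree: the function name `hscale_8_to_19_sketch` is hypothetical, and the 14-bit fixed-point coefficients, the right shift by 3, and the clamp to 2^19 - 1 are assumptions that should be checked against the C reference in the scaler sources.

```c
#include <stdint.h>

/*
 * Hypothetical scalar sketch of an 8-bit to 19-bit horizontal scaler.
 * Assumptions (not confirmed by the listing above):
 *   - filter coefficients are 14-bit fixed point (a full-weight tap is 1 << 14),
 *   - the accumulated sum is shifted right by 3 to reach 19 bits,
 *   - the result is clamped to (1 << 19) - 1.
 * The point illustrated is the int32_t destination: a 19-bit value
 * does not fit into the int16_t used by the 8-to-15 variant.
 */
static void hscale_8_to_19_sketch(int32_t *dst, int dst_w,
                                  const uint8_t *src,
                                  const int16_t *filter,
                                  const int32_t *filter_pos,
                                  int filter_size)
{
    const int32_t max19 = (1 << 19) - 1;

    for (int i = 0; i < dst_w; i++) {
        int32_t val = 0;
        int src_pos = filter_pos[i];

        /* Multiply-accumulate filter_size taps for output sample i. */
        for (int j = 0; j < filter_size; j++)
            val += (int32_t)src[src_pos + j] * filter[filter_size * i + j];

        val >>= 3;
        dst[i] = val < max19 ? val : max19;
    }
}
```

A NEON version would vectorize the inner multiply-accumulate across taps and output samples. The scalar form is shown only to make the difference in store width concrete.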