FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-28 20:53:54 +02:00


Swinney, Jonathan	0ea61725b1	swscale/aarch64: add hscale specializations	2022-05-28 01:09:05 +03:00

This patch adds code to support specializations of the hscale function and adds a specialization for filterSize == 4.

ff_hscale8to15_4_neon is a complete rewrite. The main bottleneck here is loading the data from src. This data is therefore loaded a whole block ahead and stored back to the stack, to be loaded again with ld4. This arranges the data for the most efficient use of the vector instructions and removes the need for completion adds (horizontal reductions) at the end.

The number of C-loop iterations handled per iteration of the assembly loop increases from 4 to 8. Because of the prefetching, a separate code path without prefetching is needed when dstW < 16.

This improves speed on Graviton 2 (Neoverse N1) dramatically in cases where fs=8 would previously have been required:

before: hscale_8_to_15__fs_8_dstW_512_neon: 1962.8
after : hscale_8_to_15__fs_4_dstW_512_neon: 1220.9

Signed-off-by: Jonathan Swinney <jswinney@amazon.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
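The commit message describes a data layout and loop structure rather than a formula. The portable C sketch below illustrates both; it is not the FFmpeg source and not the NEON code.

The reference loop follows the general shape of swscale's generic 8-to-15-bit horizontal scaler: a sum of products per output pixel, shifted right by 7 and clamped above at 2^15 - 1. The four-tap variant mirrors the structure the message describes:

- It handles eight outputs per iteration.
- Taps are staged tap-major, as an ld4 de-interleave would leave them, so each lane accumulates exactly one output and no completion adds are needed.
- It looks ahead by one block, staging the next block while the current one is computed.
- Rows with dstW < 16 take a separate path without the look-ahead.

The names hscale8to15_4_sketch, gather8, compute8, clamp15, pick_hscale and self_check are hypothetical. The scalar fallback for dstW % 8 leftovers is an assumption; the commit does not state how those are handled.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Clamp above at 2^15 - 1 (no lower clamp), as in the generic 8-to-15 path. */
static int16_t clamp15(int v) { return (int16_t)(v < 32767 ? v : 32767); }

/* Generic reference: output i is a filterSize-tap dot product of 8-bit
 * samples starting at src[filterPos[i]], shifted right by 7. */
static void hscale8to15_ref(int16_t *dst, int dstW, const uint8_t *src,
                            const int16_t *filter, const int32_t *filterPos,
                            int filterSize)
{
    for (int i = 0; i < dstW; i++) {
        int val = 0;
        for (int j = 0; j < filterSize; j++)
            val += src[filterPos[i] + j] * filter[filterSize * i + j];
        dst[i] = clamp15(val >> 7);
    }
}

/* Stage outputs base..base+7 tap-major: px[t][p] is tap t of output p,
 * the layout an ld4 de-interleave produces. */
static void gather8(uint8_t px[4][8], int16_t co[4][8], const uint8_t *src,
                    const int16_t *filter, const int32_t *filterPos, int base)
{
    for (int p = 0; p < 8; p++)
        for (int t = 0; t < 4; t++) {
            px[t][p] = src[filterPos[base + p] + t];
            co[t][p] = filter[4 * (base + p) + t];
        }
}

/* Lane-wise multiply-accumulate: each acc[p] is already a finished output,
 * so no horizontal reduction is needed at the end. */
static void compute8(int16_t *dst, uint8_t px[4][8], int16_t co[4][8])
{
    int acc[8] = {0};
    for (int t = 0; t < 4; t++)
        for (int p = 0; p < 8; p++)
            acc[p] += px[t][p] * co[t][p];
    for (int p = 0; p < 8; p++)
        dst[p] = clamp15(acc[p] >> 7);
}

/* Hypothetical fs == 4 kernel: 8 outputs per iteration, staging block b + 1
 * before computing block b; rows with dstW < 16 skip the look-ahead. */
static void hscale8to15_4_sketch(int16_t *dst, int dstW, const uint8_t *src,
                                 const int16_t *filter,
                                 const int32_t *filterPos, int filterSize)
{
    uint8_t px[2][4][8];
    int16_t co[2][4][8];
    int blocks = dstW / 8;
    assert(filterSize == 4);
    if (dstW >= 16) {
        gather8(px[0], co[0], src, filter, filterPos, 0);
        for (int b = 0; b + 1 < blocks; b++) {
            gather8(px[(b + 1) & 1], co[(b + 1) & 1], src, filter, filterPos, 8 * b + 8);
            compute8(dst + 8 * b, px[b & 1], co[b & 1]);
        }
        compute8(dst + 8 * (blocks - 1), px[(blocks - 1) & 1], co[(blocks - 1) & 1]);
    } else {
        for (int b = 0; b < blocks; b++) {
            gather8(px[0], co[0], src, filter, filterPos, 8 * b);
            compute8(dst + 8 * b, px[0], co[0]);
        }
    }
    /* Assumption: leftover outputs (dstW % 8) fall back to the reference. */
    hscale8to15_ref(dst + 8 * blocks, dstW - 8 * blocks, src,
                    filter + 32 * blocks, filterPos + 8 * blocks, 4);
}

typedef void (*hscale_fn)(int16_t *, int, const uint8_t *, const int16_t *,
                          const int32_t *, int);
/* Hypothetical selector: use the specialized kernel when filterSize == 4. */
static hscale_fn pick_hscale(int filterSize)
{
    return filterSize == 4 ? hscale8to15_4_sketch : hscale8to15_ref;
}

/* Compare the selected kernel with the reference (dstW <= 512, fs <= 8). */
static int self_check(int dstW, int filterSize)
{
    static uint8_t src[1032];
    static int16_t filter[512 * 8], a[512], b[512];
    static int32_t filterPos[512];
    for (int i = 0; i < 1032; i++) src[i] = (uint8_t)rand();
    for (int i = 0; i < dstW; i++) {
        filterPos[i] = 2 * i;
        for (int j = 0; j < filterSize; j++)
            filter[filterSize * i + j] = (int16_t)(rand() % 8192 - 2048);
    }
    hscale8to15_ref(a, dstW, src, filter, filterPos, filterSize);
    pick_hscale(filterSize)(b, dstW, src, filter, filterPos, filterSize);
    return memcmp(a, b, (size_t)dstW * sizeof a[0]) == 0;
}
```

A caller would fetch a kernel once with pick_hscale(filterSize) and then call it per row. For example, self_check(512, 4) compares the four-tap sketch with the generic loop on random data. In portable C the look-ahead is purely structural. It mirrors the assembly, where staging the next block early hides the src load latency that the commit identifies as the bottleneck.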
File	Last commit	Date
hscale.S	swscale/aarch64: add hscale specializations	2022-05-28 01:09:05 +03:00
Makefile	swscale: aarch64: Add a NEON implementation of interleaveBytes	2020-05-15 23:38:17 +03:00
output.S	swscale: aarch64: Don't clobber callee-saved registers v8-v15	2020-04-21 23:41:13 +03:00
rgb2rgb_neon.S	swscale: aarch64: Add a NEON implementation of interleaveBytes	2020-05-15 23:38:17 +03:00
rgb2rgb.c	swscale: aarch64: Add a NEON implementation of interleaveBytes	2020-05-15 23:38:17 +03:00
swscale_unscaled.c	sws: rename SwsContext.swscale to convert_unscaled	2021-07-03 15:57:53 +02:00
swscale.c	swscale/aarch64: add hscale specializations	2022-05-28 01:09:05 +03:00
yuv2rgb_neon.S	aarch64/yuv2rgb_neon: fix return value	2020-07-09 10:33:14 +01:00