mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-13 21:28:01 +02:00
FFmpeg/libswscale/aarch64
Sebastian Pop bd83191271 swscale/aarch64: use multiply accumulate and increase vector factor to 4
This patch implements ff_hscale_8_to_15_neon with NEON fused multiply accumulate
and bumps the vectorization factor from 2 to 4.
The speedup is 25% on Graviton1 A1 instances based on Cortex-A72 CPUs:

$ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
before: t:0.040303 avg:0.040287 max:0.040371 min:0.039214
after:  t:0.032168 avg:0.032215 max:0.033081 min:0.032146

The speedup is 39% on Graviton2 m6g instances based on Neoverse-N1 CPUs:
$ ffmpeg -nostats -f lavfi -i testsrc2=4k:d=2 -vf bench=start,scale=1024x1024,bench=stop -f null -
before: t:0.019446 avg:0.019423 max:0.019493 min:0.019181
after:  t:0.014015 avg:0.014096 max:0.015018 min:0.013971

Tested with `make check` on aarch64-linux.
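
To illustrate what raising the vectorization factor from 2 to 4 means, here is a minimal C sketch; it is not the NEON assembly from hscale.S. It assumes the usual swscale 8-bit to 15-bit horizontal scaling contract: each output is a sum of source bytes times 16-bit filter coefficients, shifted right by 7 and clipped to 15 bits. The function names `hscale_8_to_15_ref`, `hscale_8_to_15_x4` and `demo_outputs_match` are hypothetical and exist only for this example. The second variant keeps four independent accumulators per outer iteration, which is the loop shape that multiply-accumulate instructions can exploit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar reference: one output sample per outer iteration. */
static void hscale_8_to_15_ref(int16_t *dst, int dstW, const uint8_t *src,
                               const int16_t *filter, const int32_t *filterPos,
                               int filterSize)
{
    for (int i = 0; i < dstW; i++) {
        int val = 0;
        for (int j = 0; j < filterSize; j++)
            val += src[filterPos[i] + j] * filter[filterSize * i + j];
        val >>= 7;                              /* drop 7 fractional bits */
        dst[i] = val < 32767 ? val : 32767;     /* clip to 15 bits */
    }
}

/* Four output samples per outer iteration, each with its own accumulator.
 * Assumption for this sketch: dstW is a multiple of 4. */
static void hscale_8_to_15_x4(int16_t *dst, int dstW, const uint8_t *src,
                              const int16_t *filter, const int32_t *filterPos,
                              int filterSize)
{
    assert(dstW % 4 == 0);
    for (int i = 0; i < dstW; i += 4) {
        int acc[4] = { 0, 0, 0, 0 };
        for (int j = 0; j < filterSize; j++)
            for (int k = 0; k < 4; k++)         /* one multiply-accumulate per lane */
                acc[k] += src[filterPos[i + k] + j] *
                          filter[filterSize * (i + k) + j];
        for (int k = 0; k < 4; k++) {
            int val = acc[k] >> 7;
            dst[i + k] = val < 32767 ? val : 32767;
        }
    }
}

/* Runs both variants on made-up data and reports whether they agree. */
static int demo_outputs_match(void)
{
    enum { DST_W = 8, FILTER_SIZE = 4, SRC_LEN = 32 };
    static const int16_t taps[FILTER_SIZE] = { -1024, 9216, 9216, -1024 };
    uint8_t src[SRC_LEN];
    int16_t filter[DST_W * FILTER_SIZE];
    int32_t filterPos[DST_W];
    int16_t a[DST_W], b[DST_W];

    for (int i = 0; i < SRC_LEN; i++)
        src[i] = (uint8_t)(i * 37 + 11);
    for (int i = 0; i < DST_W; i++) {
        filterPos[i] = 2 * i;                   /* 2:1 downscale positions */
        memcpy(&filter[i * FILTER_SIZE], taps, sizeof(taps));
    }
    hscale_8_to_15_ref(a, DST_W, src, filter, filterPos, FILTER_SIZE);
    hscale_8_to_15_x4(b, DST_W, src, filter, filterPos, FILTER_SIZE);
    return memcmp(a, b, sizeof(a)) == 0;
}
```

Both variants perform the same arithmetic in the same order per output, so they should produce identical results; the unrolled form only exposes four independent accumulation chains per iteration.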

Signed-off-by: Sebastian Pop <spop@amazon.com>
Reviewed-by: Jean-Baptiste Kempf <jb@videolan.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2019-12-17 23:41:47 +01:00
hscale.S swscale/aarch64: use multiply accumulate and increase vector factor to 4 2019-12-17 23:41:47 +01:00
Makefile sws/aarch64: add ff_yuv2planeX_8_neon 2016-04-11 16:27:19 +02:00
output.S sws/aarch64: add ff_yuv2planeX_8_neon 2016-04-11 16:27:19 +02:00
swscale_unscaled.c
swscale.c sws/aarch64: add ff_yuv2planeX_8_neon 2016-04-11 16:27:19 +02:00
yuv2rgb_neon.S sws/aarch64/yuv2rgb: honor iOS calling convention 2016-04-08 17:58:43 +02:00