1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-02-09 14:14:39 +02:00
Martin Storsjö 4136405c86 aarch64: me_cmp: Don't do uaddlv once per iteration
The max height is currently documented as 16; the max difference per
pixel is 255, and a .8h element can easily contain 16*255, thus keep
accumulating in two .8h vectors, and just do the final accumulationat the
end. This should work for heights up to 256.

This requires a minor register renumbering in ff_pix_abs16_xy2_neon.

Before:       Cortex A53    A72    A73   Graviton 3
pix_abs_0_0_neon:   97.7   47.0   37.5   22.7
pix_abs_0_1_neon:  154.0   59.0   52.0   25.0
pix_abs_0_3_neon:  179.7   96.7   87.5   41.2
After:
pix_abs_0_0_neon:   96.0   39.2   31.2   22.0
pix_abs_0_1_neon:  150.7   59.7   46.2   23.7
pix_abs_0_3_neon:  175.7   83.7   81.7   38.2

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-07-16 17:26:17 +03:00
..
2022-04-26 10:26:49 +03:00
2021-04-19 14:34:10 +02:00