1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-29 22:00:58 +02:00
Martin Storsjö 7e42d5f0ab aarch64: vp8: Optimize vp8_idct_add_neon for aarch64
The previous version was a pretty exact translation of the arm
version. This version does do some unnecessary arithemetic (it does
more operations on vectors that are only half filled; it does 4
uaddw and 4 sqxtun instead of 2 of each), but it reduces the overhead
of packing data together (which could be done for free in the arm
version).

This gives a decent speedup on Cortex A53, a minor speedup on
A72 and a very minor slowdown on Cortex A73.

Before:        Cortex A53    A72    A73
vp8_idct_add_neon:   79.7   67.5   65.0
After:
vp8_idct_add_neon:   67.7   64.8   66.7

Signed-off-by: Martin Storsjö <martin@martin.st>
2019-02-19 11:46:28 +02:00
..