1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-18 03:19:31 +02:00
Commit Graph

5 Commits

Author SHA1 Message Date
Martin Storsjö
9b2ccafb48 aarch64: Add missing sign extension in ff_h264_idct8_add_neon
Signed-off-by: Martin Storsjö <martin@martin.st>
2016-10-10 14:57:53 +03:00
Martin Storsjö
cdb1665f70 aarch64: Make transpose_4x4H do a regular transpose
Previously, ff_h264_idct_add_neon (originally in the arm version) used
a non-regular transpose in order to be able to use more instructions
that deal with registers as 128 bit register pairs. The aarch64
translation doesn't do it to the same extent, but brought along the
same structure since it was a straight translation.

This reshuffles ff_h264_idct_add_neon, bringing it closer to
the C implementation, making the transpose_4x4H macro do a regular
transpose, usable for other algorithms as well.

Previously, the third and fourth output from transpose_4x4H were
swapped, and prior to cc29d96d5a, the same inputs as well. In
addition to just swapping the outputs, also renumber the intermediate
registers for better readability (making the register order match
transpose_4x8B).

This runs with the same number of cycles as before.

Signed-off-by: Martin Storsjö <martin@martin.st>
2016-03-26 21:25:56 +02:00
Janne Grunau
cc29d96d5a arm64: fix inverted register order in transpose_4x4H
Fix related register order issue in ff_h264_idct_add_neon.

Found-by: zjh8890 <243186085@qq.com>
2015-12-21 13:44:20 +01:00
Janne Grunau
9c029f67ca aarch64: use EXTERN_ASM consistently for exported symbols
Based on e3fec3f095 for arm.
2014-02-20 15:24:35 +01:00
Janne Grunau
8438b3f09f aarch64: h264 idct NEON assembler optimizations
Ported from ARMv7 NEON.
2014-01-15 12:13:41 +01:00