binutils 2.43 has stricter validation for labels[1] and results in errors
when building ffmpeg for armv5:
src/libavcodec/arm/mlpdsp_armv5te.S:232: Error: junk at end of line, first unrecognized character is `0'
Remove the leading zero in the "01" label to resolve this error.
[1] https://sourceware.org/git/?p=binutils-gdb.git;a=commit;h=226749d5a6ff0d5c607d6428d6c81e1e7e7a994b
Signed-off-by: Ross Burton <ross.burton@arm.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
(cherry picked from commit 654bd47716)
When compiling FFmpeg with GCC-9, some very random segfaults were
observed in code which had previously called down into the SBC encoder
NEON assembly routines. This was caused by these functions clobbering
some of the vfp callee saved registers (d8 - d15 aka q4 - q7). GCC was
using these registers to save local variables, but after these
functions returned, they would contain garbage.
Fix by reallocating the registers in the two affected functions in
the following way:
ff_sbc_analyze_4_neon: q2-q5 => q8-q11, then q1-q4 => q8-q11
ff_sbc_analyze_8_neon: q2-q9 => q8-q15
The reason for using these replacements is to keep closely related
sets of registers consecutively numbered which hopefully makes the
code more easy to follow. Since this commit only reallocates
registers, it should have no performance impact.
Signed-off-by: James Cowgill <jcowgill@debian.org>
Signed-off-by: Martin Storsjö <martin@martin.st>
(cherry picked from commit 50a4dff69f)
Signed-off-by: Martin Storsjö <martin@martin.st>
No longer used by anything.
Unfortunately the old FFT_FLOAT/FFT_FIXED_32 is left as-is. It's
simply too much work for code meant to be all removed anyway.
Fix an occasional crash for hevc decoder in ARM 32 platform, the
root cause is the memory over read(read cross the memory boundary)
in SAO NENO functions ff_hevc_sao_band_filter_neon_8 and
ff_hevc_sao_edge_filter_neon_8.
After this fix, the crash disapper in the massive Android phone
test.
Signed-off-by: qoroliang <qoroliang@tencent.com>
* commit '0676de935b1e81bc5b5698fef3e7d48ff2ea77ff':
arm: Implement a NEON version of 422 h264_h_loop_filter_chroma
Merged-by: James Almer <jamrial@gmail.com>
This makes it similar to put_epel16_v6, and gives a 10-25%
speedup of this function.
Before: Cortex A7 A8 A9 A53 A72
vp8_put_epel16_h6v6_neon: 3058.0 2218.5 2459.8 2183.0 1572.2
After:
vp8_put_epel16_h6v6_neon: 2670.8 1934.2 2244.4 1729.4 1503.9
Signed-off-by: Martin Storsjö <martin@martin.st>
When targeting darwin, clang requires commas between arguments,
while the no-comma form is allowed for other targets.
Since Xcode 9.3, the bundled clang supports altmacro and doesn't
require using gas-preprocessor any longer.
Signed-off-by: Martin Storsjö <martin@martin.st>
Clang supports the macro expansion counter (used for making unique
labels within macro expansions), but not when targeting darwin.
Convert uses of the counter into normal local labels, as used
elsewhere.
Since Xcode 9.3, the bundled clang supports altmacro and doesn't
require using gas-preprocessor any longer.
Signed-off-by: Martin Storsjö <martin@martin.st>
When targeting darwin, clang requires commas between arguments,
while the no-comma form is allowed for other targets.
Since Xcode 9.3, the bundled clang supports altmacro and doesn't
require using gas-preprocessor any longer.
Signed-off-by: Martin Storsjö <martin@martin.st>
Compilation error "out of range" fixed for armeabi-v7a. Compilation failed
trying to build libvlc.aar for ARM7 android on ubuntu 16.04 host. Error
messages is "Offset out of range". The reason of the error is assembler LDR
directives in function "ff_hevc_transform_luma_4x4_neon_8" need local storage
in range <1k, but no such storage provided.
Based on a patch by Ihor Bobalo <bob@eleks.com>
Suggested-by: wbs
Signed-off-by: James Almer <jamrial@gmail.com>
* commit '9dde6ab06c48f9447cd16f39bee33569cddb7be4':
arm: Fix SIGBUS on ARM when compiled with binutils 2.29
Merged-by: James Almer <jamrial@gmail.com>
* commit '118dd4a321a2d67f67c21b076abd0b4d939ab642':
hevc: 16x16 NEON idct: Use the right element size for loads/stores
Merged-by: James Almer <jamrial@gmail.com>
* commit 'e1c2453a4fac1f7116244d0d05310935c20887e6':
arm: hevc_idct: Tune the add_res_8x8 and add_res_32x32 functions
Merged-by: James Almer <jamrial@gmail.com>
* commit '358adef0305618219522858e471edf7e0cb4043e':
hevc: Add NEON IDCT DC functions for bitdepth 8
See 03cecf45c1
Merged-by: James Almer <jamrial@gmail.com>
In binutils 2.29, the behavior of the ADR instruction changed so that 1 is
added to the address of a Thumb function (previously nothing was added). This
allows the loaded address to be passed to a BLX instruction and the correct
mode change will occur.
See: https://sourceware.org/bugzilla/show_bug.cgi?id=21458
By using adr with a label that isn't annotated as a thumb function,
we avoid the new behaviour in binutils 2.29 and get the same behaviour
as in prior releases, and as in other assemblers (ms armasm.exe,
clang's built in assembler) - an idea that Janne Grunau came up with.
Signed-off-by: Martin Storsjö <martin@martin.st>
It is redundant with costable. The first half of sintable is
identical with the second half of costable. The second half
of sintable is negative value of the first half of sintable.
The computation is changed to handle sign of sin values, in
C code and ARM assembly code.
Signed-off-by: Muhammad Faiz <mfcc64@gmail.com>
The code originally pre-multiply by 2 the steps, causing the running sum
of the h factors to drift away due to the lack of precision. It quickly
causes an inaccuracy > 0.01.
I tried diverse approaches such as multiply by 2.0 (instead of adding
the value itself) without success.
I'm unable to bench the impact of this change, feel free to compare.
This commit fixes the incoming aacpsdsp tests.
Following is an alternative simplified function (matching the incoming
AArch64 code) that may be used:
function ff_ps_stereo_interpolate_neon, export=1
vld1.32 {q0}, [r2]
vld1.32 {q1}, [r3]
ldr r12, [sp]
vmov.f32 q8, q0
vmov.f32 q9, q1
vzip.32 q8, q0
vzip.32 q9, q1
1:
vld1.32 {d4}, [r0,:64]
vld1.32 {d6}, [r1,:64]
vadd.f32 q8, q8, q9
vadd.f32 q0, q0, q1
vmov.f32 d5, d4
vmov.f32 d7, d6
vmul.f32 q2, q2, q8
vmla.f32 q2, q3, q0
vst1.32 {d4}, [r0,:64]!
vst1.32 {d5}, [r1,:64]!
subs r12, r12, #1
bgt 1b
bx lr
endfunc
clang now (in the upcoming 5.0 version) is capable of building our
arm assembly without relying on gas-preprocessor, although clang/LLVM
doesn't support .dn register aliases.
The VC1 MC assembly was only built and used if the chosen assembler
supported the .dn directives though. This was supported as long as
gas-preprocessor was used.
This means that VC1 decoding got a speed regression on clang 5.0,
unless the user manually chose using gas-preprocessor again.
By avoiding using the .dn register aliases, we can build the VC1 MC
assembly with the latest clang version.
Support for the .dn/.qn directives in clang/LLVM isn't actively planned,
see https://bugs.llvm.org/show_bug.cgi?id=18199.
This partially reverts 896a5bff64.
Signed-off-by: Martin Storsjö <martin@martin.st>