When compiling FFmpeg with GCC-9, some very random segfaults were
observed in code which had previously called down into the SBC encoder
NEON assembly routines. This was caused by these functions clobbering
some of the vfp callee saved registers (d8 - d15 aka q4 - q7). GCC was
using these registers to save local variables, but after these
functions returned, they would contain garbage.
Fix by reallocating the registers in the two affected functions in
the following way:
ff_sbc_analyze_4_neon: q2-q5 => q8-q11, then q1-q4 => q8-q11
ff_sbc_analyze_8_neon: q2-q9 => q8-q15
The reason for using these replacements is to keep closely related
sets of registers consecutively numbered which hopefully makes the
code more easy to follow. Since this commit only reallocates
registers, it should have no performance impact.
Signed-off-by: James Cowgill <jcowgill@debian.org>
Signed-off-by: Martin Storsjö <martin@martin.st>
It is not needed on x64, because the AV_COPY* and AV_ZERO*
macros never use MMX on x64.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Provide optimized implementation for vsse_intra16 for arm64.
Performance tests are shown below.
- vsse_4_c: 155.2
- vsse_4_neon: 36.2
Benchmarks and tests are run with checkasm tool on AWS Graviton 3.
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Provide optimized implementation for vsad_intra16 function for arm64.
Performance comparison tests are shown below.
- vsad_4_c: 177.5
- vsad_4_neon: 23.5
Benchmarks and tests are run with checkasm tool on AWS Gravtion 3.
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Provide optimized implementation of vsse16 for arm64.
Performance comparison tests are shown below.
- vsse_0_c: 257.7
- vsse_0_neon: 59.2
Benchmarks and tests are run with checkasm tool on AWS Graviton 3.
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Provide optimized implementation of vsad16 function for arm64.
Performance comparison tests are shown below.
- vsad_0_c: 285.2
- vsad_0_neon: 39.5
Benchmarks and tests are run with checkasm tool on AWS Graviton 3.
Co-authored-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Hubert Mazur <hum@semihalf.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Add "slice" intra refresh type to h264_qsv and hevc_qsv. This type means
horizontal refresh by slices without overlapping. Also update the doc.
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
This duration is equal to the longest duration in all track's tkhd atoms, which
may be comprised of the sum of all edit lists in each track. Empty edit lists
in tracks represent start_time, and the actual media duration is stored in the
mdhd atom.
This change lets the generic demux code derive the longest track duration taken
from mdhd atoms, so the correct duration and start_time combination will be
reported.
Should fix ticket #9775.
Reviewed-by: zhilizhao(赵志立) <quinkblack@foxmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
These macros are definitions, not only declarations and therefore
should not contain a semicolon. Such a semicolon is actually
spec-incompliant, but compilers happen to accept them.
Reviewed-by: Philip Langdale <philipl@overt.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>