The x86 runs short on registers because numerous elements are not static.
In addition, splitting them allows more optimized code, at least for x86.
Arm asm changes by Janne Grunau.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
For the callable function (as opposed to the inline one):
C SSE SSE2 SSE4
Win32: 47 42 29 26
Win64: 30 33 25 23
The SSE version is neither compiled nor set for ARCH_X86_64, as the
inlinable function takes over.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
It is currently declared as a macro who is set to inlinable functions,
among which a Neon and a default C implementations.
Add a DSP parameter to each inline function, unused except by the
default C implementation which calls a function from the DSP context.
On an Arrandale CPU, gain for an inlined SSE2 function vs. a call:
- Win32: 29 to 26 cycles
- Win64: 25 to 23 cycles
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
This makes the generated assembly more internally consistent,
avoiding declaring two labels for the same function (for cases
where EXTERN_ASM is empty) and not declaring a separate unprefixed
label in other cases.
This also makes sure the .func and .type delcarations have the same
prefix. They have previously not been used on the platforms
that have prefixed symbols on arm (iOS), but gas-preprocessor
has recently started using the .func declarations for adding
.thumb_func declarations for such functions.
Signed-off-by: Martin Storsjö <martin@martin.st>
The original test without a forced idct is still useful since it tests
the switching of the idct algorithm/permutation on x86 with MMX. MMXext
or SSE2. Make sure the test runs only if MMX inline asm is available and
force -cpuflags to all.
Add the required bitexact flag for both tests.
Also adjust header #include order and some comments.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Currently ff_interleave_packet_per_dts() waits until it gets a frame for
each stream before outputting packets in interleaved order.
Sparse streams (i.e. streams with much fewer packets than the other
streams, like subtitles or audio with DTX) tend to add up latency and in
specific cases end up allocating a large amount of memory.
Emit the top packet from the packet_buffer if it has a time delta
larger than a specified threshold.
Original report of the issue and initial proposed solution by
mus.svz@gmail.com.
Bug-id: 31
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Prevents using GetBitContexts with data from previous calls.
Fixes access to freed memory.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC:libav-stable@libav.org
With cli usage the decoder might have not set the colorspace during
encoder init, manual colorspace override might be needed in such
cases.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
A dependent slice cannot have address 0.
Prevent an out of array bound load in ff_hevc_cabac_init().
Sample-Id: 00001406-google
Reported-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
According to my understanding of T-REC-H.265-2013044 chapter 8.6.1.
Sample-Id: 00001438-google
Reported-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Makes it easier to recreate an AVCodecContext for ATRAC3+ decoding,
which is needed in multimedia frameworks, as well as in general cases
where demuxing and decoding are separate entities.
Should fix crashes or corrupt output on pre-SSE2 CPUs when they were
using SSE2-code (e.g. AMD Athlon XP 2400+ or Intel Pentium III) in
hfix or hvar single-edge (left/right) extension functions.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
Fixes dependency file generation with gas-preprocessor.pl and clang.
Flags copied from GCC and tested with Apple's clang from Xcode 5 and
5.1 and clang 3.2, 3.3, 3.4 on Linux.