The x86 runs short on registers because numerous elements are not static.
In addition, splitting them allows more optimized code, at least for x86.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
It is currently declared as a macro who is set to inlinable functions,
among which a Neon and a default C implementations.
Add a DSP parameter to each inline function, unused except by the
default C implementation which calls a function from the DSP context.
On an Arrandale CPU, gain for an inlined SSE2 function vs. a call:
- Win32: 29 to 26 cycles
- Win64: 25 to 23 cycles
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'e3fec3f095ab5ea08ee662942d98526aaf5e3635':
arm: Add EXTERN_ASM to the .func and .type declarations for exported symbols
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '5bcbb516f2ff45290ef7995b081762e668693672':
arm: Add X() around all references to extern symbols
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Fixes use of uninitialized memory
Fixes: 93728afd9aa074ba14a09bfd93a632fd-asan_static-oob_124a17d_1445_cov_1021181966_DBLK_D_VIXS_1.bit
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The x86 runs short on registers because numerous elements are not static.
In addition, splitting them allows more optimized code, at least for x86.
Arm asm changes by Janne Grunau.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
For the callable function (as opposed to the inline one):
C SSE SSE2 SSE4
Win32: 47 42 29 26
Win64: 30 33 25 23
The SSE version is neither compiled nor set for ARCH_X86_64, as the
inlinable function takes over.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
It is currently declared as a macro who is set to inlinable functions,
among which a Neon and a default C implementations.
Add a DSP parameter to each inline function, unused except by the
default C implementation which calls a function from the DSP context.
On an Arrandale CPU, gain for an inlined SSE2 function vs. a call:
- Win32: 29 to 26 cycles
- Win64: 25 to 23 cycles
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
This makes the generated assembly more internally consistent,
avoiding declaring two labels for the same function (for cases
where EXTERN_ASM is empty) and not declaring a separate unprefixed
label in other cases.
This also makes sure the .func and .type delcarations have the same
prefix. They have previously not been used on the platforms
that have prefixed symbols on arm (iOS), but gas-preprocessor
has recently started using the .func declarations for adding
.thumb_func declarations for such functions.
Signed-off-by: Martin Storsjö <martin@martin.st>
Fixes use of uninitialized memory
Fixes out of array read
Fixes assertion failure
Fixes part of cb307d24befbd109c6f054008d6777b5/asan_static-oob_124a175_1445_cov_2355279992_DBLK_D_VIXS_1.bit
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Fixes inconsistencies
Fixes use of uninitilaized memory
Fixes part of cb307d24befbd109c6f054008d6777b5/asan_static-oob_124a175_1445_cov_2355279992_DBLK_D_VIXS_1.bit
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The DCBZL instruction is not available for the e500v1 and e500v2
architectures, but may still be recognized by the toolchain, so we need to
remove the test for it explicitly for these architectures.
References: PowerPC™ e500 Core Family Reference Manual (Freescale)
Found-by: Ståle Kristoffersen <staalebk@ifi.uio.no>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Inspired by GCC r86635.
This is more consistent with other man pages. For example in `man git`,
all the "git-help(1)" kind of cross refs are bold.
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
vp8: Use 2 registers for dst_stride and src_stride in neon bilin filter
Conflicts:
libavcodec/arm/vp8dsp_neon.S
Merged-by: Michael Niedermayer <michaelni@gmx.at>
benchmarked on sandybridge x86_64:
1358232 decicycles in flac_lpc_32_c
1244575 decicycles in flac_lpc_32_sse4, James Almer's patch
650045 decicycles in flac_lpc_32_sse4, this patch
I haven't tested the edgecases such as odd block lengths
odd block length tested-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Enable compilation on machines with an old libfdk-aac.
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
fate: force the simple idct for xvid custom matrix test
Conflicts:
tests/fate/xvid.mak
tests/ref/fate/xvid-custom-matrix
See: ef034cbf18
Merged-by: Michael Niedermayer <michaelni@gmx.at>
If a special comment packet shows up in the middle of the stream, we
should extract it out into the vorbis stream metadata dictionary.
Also, if there is metadata in the packet on the way in, it might linger
since we only add data to the dictionary causing stale metadata to be
inserted into the stream. Instead, clear it to remove any doubt about
what is new and old.
Signed-off-by: Ben Boeckel <mathstuf@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Currently, if there are multiple 'performer' tags, the last one is the
only one which appears. Instead, join them with a semicolon.
Signed-off-by: Ben Boeckel <mathstuf@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Setting seek_preroll value in AVCodecContext for Opus streams
embedded in ogg container.
Signed-off-by: Vignesh Venkatasubramanian <vigneshv@google.com>
Reviewed-by: Nicolas George <george@nsup.org>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The original test without a forced idct is still useful since it tests
the switching of the idct algorithm/permutation on x86 with MMX. MMXext
or SSE2. Make sure the test runs only if MMX inline asm is available and
force -cpuflags to all.
Add the required bitexact flag for both tests.
* cus/stable:
ffplay: flush subtitle codecs as well with null packets
ffplay: reorder the filters to ensure that inputs of the custom filters are merged first
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '9ecb858775483a76c137e8e1ad45a95e318bca61':
doxy: Format @code blocks so they render properly
Merged-by: Michael Niedermayer <michaelni@gmx.at>