Ported from arm NEON and added vector_dmul_scalar.
Functions between 1.5 and 5 times faster than the C implementations
using Apple's clang-503.0.19 on A7.
* commit '1481d24c3a0abf81e1d7a514547bd5305232be30':
RGBA64 pixel formats
Conflicts:
doc/APIchanges
libavutil/pixdesc.c
libavutil/pixfmt.h
libavutil/version.h
libswscale/utils.c
See: 9569a3c9f4
See: 92afb43162, as well as others
Note: the enum values added in libav are incompatible/different to what ffmpeg used since 3 years
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '831a1180785a786272cdcefb71566a770bfb879e':
Update dsputil- and SIMD-related comments to match reality more closely
Conflicts:
libavcodec/x86/hpeldsp.asm
libavutil/arm/float_dsp_init_arm.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'c708b5403346255ea5adc776645616cc7c61f078':
timer: use mach_absolute_time as high resolution clock on darwin
Conflicts:
configure
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Not guaranteed to be in nanosecond resolution. On iOS 7 the duration
of one tick is 125/3 ns which is still more than an order of magnitude
better then microseconds.
Replace decicycles with the neutral UNITS. Decicycles is strange but
tenths of a nanosecond and unspecific "deci"-ticks for mach_absolute_time
is just silly.
* commit 'a18ef7a76c735bcf78ed4825e33ad7f9f6f77a54':
doc: fix a couple of typos in frame.h
Conflicts:
libavutil/frame.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
new function allows to unref buffer and obtain its data.
Signed-off-by: Lukasz Marek <lukasz.m.luki@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '1155fd02ae7bac215acab316e847c6bb25f74fc3':
frame: add a convenience function for copying AVFrame data
Conflicts:
doc/APIchanges
libavutil/frame.c
libavutil/version.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
We need the emulation to support the cases where the first
argument is the same as the fourth. To achieve this a fifth
argument working as a temporary may be needed.
Emulation that doesn't obey the original instruction semantics
can't be in x86inc.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '9c029f67ca82147ddfa83a1546ee1e109e11fbd4':
aarch64: use EXTERN_ASM consistently for exported symbols
Merged-by: Michael Niedermayer <michaelni@gmx.at>
vector_fmul and vector_fmac_scalar are guaranteed that they can process in
batch of 16 elements, but their SSE versions only does 8 at a time.
Therefore, unroll them a bit.
299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
* commit '874c751cc5b99cd68932e21c2c3a0d21134207e0':
threads: Check w32threads dependencies at the configure stage
Merged-by: Michael Niedermayer <michaelni@gmx.at>
If linking in an object file without this attribute set, the
linker will assume that an executable stack might be needed.
Signed-off-by: Martin Storsjö <martin@martin.st>
vector_fmul and vector_fmac_scalar are guaranteed that they can process in
batch of 16 elements, but their SSE versions only does 8 at a time.
Therefore, unroll them a bit.
299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Support the cases where the first and last operand of
the XOP instruction are the same.
Also add vpmacsdql emulation.
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '25a1ba814ad80056247fd357ec4c6911324a3f66':
log: Have function parameter names match between .c and .h file
Conflicts:
libavutil/log.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'e3fec3f095ab5ea08ee662942d98526aaf5e3635':
arm: Add EXTERN_ASM to the .func and .type declarations for exported symbols
Merged-by: Michael Niedermayer <michaelni@gmx.at>
This makes the generated assembly more internally consistent,
avoiding declaring two labels for the same function (for cases
where EXTERN_ASM is empty) and not declaring a separate unprefixed
label in other cases.
This also makes sure the .func and .type delcarations have the same
prefix. They have previously not been used on the platforms
that have prefixed symbols on arm (iOS), but gas-preprocessor
has recently started using the .func declarations for adding
.thumb_func declarations for such functions.
Signed-off-by: Martin Storsjö <martin@martin.st>
* commit '9ecb858775483a76c137e8e1ad45a95e318bca61':
doxy: Format @code blocks so they render properly
Merged-by: Michael Niedermayer <michaelni@gmx.at>
Without this a developer would have to add a include every time he
wants to benchmark some code, this is a moderate inconvenience.
This reverts the specific hunk from fb0c9d41d6
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit '045654f422e74be8ed09a0819d39051d67633a09':
doxy: Document better the available AVFrame flags
Merged-by: Michael Niedermayer <michaelni@gmx.at>
NEON and VFP are currently mandatory for all ARMv8 profiles. Both are
handled as extensions as far as cpuflags are concerned. This is
consistent with handling x86_64 which always has SSE2, but still
handles it as an extension.
* qatar/master:
arm: Add an option for making sure NEON registers aren't clobbered
Conflicts:
configure
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '5dae4872357613a0b51120b54a4c5221e0ec3f69':
arm: Allow overriding the alignment set in the function macro
Merged-by: Michael Niedermayer <michaelni@gmx.at>
The function macro always sets .align 2 before declaring the
function label (since 5c5e1ea3) and always sets the section to
.text (since 278caa6a).
The .align 5 before certain functions, added in fc252eba, were added
before .text and .align were added to the function macro and thus
became useless/unused when the function macro got them.
This restores the original intention, to align the loop entry
points.
Signed-off-by: Martin Storsjö <martin@martin.st>
The new code is faster and reuses the previous state in case of
multiple calls.
The previous code could easily end up in near-infinite loops,
if the difference between two clock() calls never was larger than
1.
This makes fate-parseutils finish in finite time when run in wine,
if CryptGenRandom isn't available (which e.g. isn't available if
targeting Windows RT/metro).
Patch originally by Michael Niedermayer but with some modifications
by Martin Storsjö.
Signed-off-by: Martin Storsjö <martin@martin.st>