Not guaranteed to be in nanosecond resolution. On iOS 7 the duration
of one tick is 125/3 ns which is still more than an order of magnitude
better then microseconds.
Replace decicycles with the neutral UNITS. Decicycles is strange but
tenths of a nanosecond and unspecific "deci"-ticks for mach_absolute_time
is just silly.
vector_fmul and vector_fmac_scalar are guaranteed that they can process in
batch of 16 elements, but their SSE versions only does 8 at a time.
Therefore, unroll them a bit.
299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
If linking in an object file without this attribute set, the
linker will assume that an executable stack might be needed.
Signed-off-by: Martin Storsjö <martin@martin.st>
This makes the generated assembly more internally consistent,
avoiding declaring two labels for the same function (for cases
where EXTERN_ASM is empty) and not declaring a separate unprefixed
label in other cases.
This also makes sure the .func and .type delcarations have the same
prefix. They have previously not been used on the platforms
that have prefixed symbols on arm (iOS), but gas-preprocessor
has recently started using the .func declarations for adding
.thumb_func declarations for such functions.
Signed-off-by: Martin Storsjö <martin@martin.st>
NEON and VFP are currently mandatory for all ARMv8 profiles. Both are
handled as extensions as far as cpuflags are concerned. This is
consistent with handling x86_64 which always has SSE2, but still
handles it as an extension.
The function macro always sets .align 2 before declaring the
function label (since 5c5e1ea3) and always sets the section to
.text (since 278caa6a).
The .align 5 before certain functions, added in fc252eba, were added
before .text and .align were added to the function macro and thus
became useless/unused when the function macro got them.
This restores the original intention, to align the loop entry
points.
Signed-off-by: Martin Storsjö <martin@martin.st>
The new code is faster and reuses the previous state in case of
multiple calls.
The previous code could easily end up in near-infinite loops,
if the difference between two clock() calls never was larger than
1.
This makes fate-parseutils finish in finite time when run in wine,
if CryptGenRandom isn't available (which e.g. isn't available if
targeting Windows RT/metro).
Patch originally by Michael Niedermayer but with some modifications
by Martin Storsjö.
Signed-off-by: Martin Storsjö <martin@martin.st>
Commit 41578f70cf changed the LLS API, which was
called from libavcodec. Thus using an old libavcodec with a new libavutil will
break.
All scheduled API changes are deferred to the next bump.
XvMC has long ago been superseded by newer acceleration APIs, such as
VDPAU, and few downstreams still support it. Furthermore XvMC is not
implemented within the hwaccel framework, but requires its own specific
code in the MPEG-1/2 decoder, which is a maintenance burden.
This can be optionally disabled whith the "output_corrupt" flags
option. When in "output_corrupt" mode, incomplete frames are
signalled through AVFrame.flags FRAME_FLAG_INCOMPLETE_FRAME.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
It does not make sense in the vast majority of use cases, no currently
defined AV_OPT_TYPE_FLAGS options in Libav set the range to anything
nontrivial, and many of those get it wrong (the "correct" range is
INT_MIN to INT_MAX so that the builtin constant "all" works).
This makes sure that pointers from av_strdup are reallocable,
which is used in av_dict_set if the AV_DICT_APPEND flag is set.
Nothing should rely on pointers from av_strdup being aligned.
Signed-off-by: Martin Storsjö <martin@martin.st>
This is so we can sync to x264's version of FMA4 support.
This partialy reverts commit 79687079a9.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4
functions for all instructions that exists in a VEX-encoded
version.
This change makes it easier to extend existing code to use AVX2.
Also add support for AVX emulation of a few instructions that
were missing before.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Prevents a crash if the misaligned exception mask bit is
cleared for some reason.
Misaligned SSE functions are only used on AMD Phenom CPUs
and the benefit is miniscule. They also require modifying
the MXCSR control register and by removing those functions
we can get rid of that complexity altogether.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Store XMM6 and XMM7 in the shadow space in functions that
clobbers them. This way we don't have to adjust the stack
pointer as often, reducing the number of instructions as
well as code size.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
SWAP with >=3 named (rather than numbered) args
PERMUTE followed by SWAP with 2 named args
used to produce the wrong permutation
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Now RET checks whether it immediately follows a branch, so the
programmer dosen't have to keep track of that condition. REP_RET
is still needed manually when it's a branch target, but that's
much rarer.
The implementation involves lots of spurious labels, but that's OK
because we strip them.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Prior to this on msvc/icl there was no handling of deprecated functions
and the deprecated warning was disabled.
After enabling there are a number of warnings relating to the CRT and
the use of the non-secure versions of several functions. Defining
_CRT_SECURE_NO_WARNINGS silences these warnings.
Signed-off-by: Martin Storsjö <martin@martin.st>