vector_fmul and vector_fmac_scalar are guaranteed to be able to process
batches of 16 elements, but their SSE versions only do 8 at a time.
Therefore, unroll them a bit.
299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
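A scalar C sketch of what the unrolling amounts to. It assumes vector_fmac_scalar computes dst[i] += src[i] * mul and that len is a multiple of 16. The function name fmac_scalar_unrolled is hypothetical, and the actual change is made in the SSE assembly, not in C.

```c
#include <assert.h>

/* Hypothetical scalar illustration of an unrolled fmac: dst[i] += src[i] * mul.
 * Because len is guaranteed to be a multiple of 16, each iteration can
 * handle 16 elements with no remainder loop. */
static void fmac_scalar_unrolled(float *dst, const float *src, float mul, int len)
{
    assert(len % 16 == 0);
    for (int i = 0; i < len; i += 16) {
        /* first 8 elements: what one iteration covered before unrolling */
        for (int j = 0; j < 8; j++)
            dst[i + j] += src[i + j] * mul;
        /* next 8 elements, handled in the same iteration */
        for (int j = 8; j < 16; j++)
            dst[i + j] += src[i + j] * mul;
    }
}
```

Halving the iteration count reduces loop overhead (counter update and branch) per element.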
If linking in an object file without this attribute set, the
linker will assume that an executable stack might be needed.
Signed-off-by: Martin Storsjö <martin@martin.st>
This makes the generated assembly more internally consistent,
avoiding declaring two labels for the same function (for cases
where EXTERN_ASM is empty) and not declaring a separate unprefixed
label in other cases.
This also makes sure the .func and .type declarations have the same
prefix. They have previously not been used on the platforms
that have prefixed symbols on arm (iOS), but gas-preprocessor
has recently started using the .func declarations for adding
.thumb_func declarations for such functions.
Signed-off-by: Martin Storsjö <martin@martin.st>
NEON and VFP are currently mandatory for all ARMv8 profiles. Both are
handled as extensions as far as cpuflags are concerned. This is
consistent with handling x86_64 which always has SSE2, but still
handles it as an extension.
The function macro always sets .align 2 before declaring the
function label (since 5c5e1ea3) and always sets the section to
.text (since 278caa6a).
The .align 5 directives before certain functions, added in fc252eba, were added
before .text and .align were added to the function macro and thus
became useless/unused when the function macro got them.
This restores the original intention, to align the loop entry
points.
Signed-off-by: Martin Storsjö <martin@martin.st>
The new code is faster and reuses the previous state in case of
multiple calls.
The previous code could easily end up in near-infinite loops
if the difference between two clock() calls was never larger than
1.
This makes fate-parseutils finish in finite time when run in wine,
if CryptGenRandom isn't available (which, e.g., is the case when
targeting Windows RT/metro).
Patch originally by Michael Niedermayer but with some modifications
by Martin Storsjö.
Signed-off-by: Martin Storsjö <martin@martin.st>
Commit 41578f70cf changed the LLS API, which was
called from libavcodec. Thus using an old libavcodec with a new libavutil will
break.
All scheduled API changes are deferred to the next bump.
XvMC has long ago been superseded by newer acceleration APIs, such as
VDPAU, and few downstreams still support it. Furthermore XvMC is not
implemented within the hwaccel framework, but requires its own specific
code in the MPEG-1/2 decoder, which is a maintenance burden.
This can be optionally disabled with the "output_corrupt" flags
option. When in "output_corrupt" mode, incomplete frames are
signalled through AVFrame.flags FRAME_FLAG_INCOMPLETE_FRAME.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
It does not make sense in the vast majority of use cases; no currently
defined AV_OPT_TYPE_FLAGS options in Libav set the range to anything
nontrivial, and many of those get it wrong (the "correct" range is
INT_MIN to INT_MAX so that the builtin constant "all" works).
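A small C sketch of why a range like 0..INT_MAX breaks "all". It assumes that "all" evaluates to every bit set (~0), which is -1 as a signed int. The helper flags_in_range and the constant FLAGS_ALL are hypothetical and are not the AVOptions implementation.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical min/max check, as an option setter might apply to a value. */
static int flags_in_range(int value, int min, int max)
{
    return value >= min && value <= max;
}

/* Assumed value of "all": every bit set, i.e. -1 on two's complement. */
static const int FLAGS_ALL = ~0;
```

With a range of [0, INT_MAX], flags_in_range(FLAGS_ALL, 0, INT_MAX) is false, so "all" would be rejected. Only [INT_MIN, INT_MAX] accepts every bit pattern.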
This makes sure that pointers from av_strdup are reallocable,
which is used in av_dict_set if the AV_DICT_APPEND flag is set.
Nothing should rely on pointers from av_strdup being aligned.
Signed-off-by: Martin Storsjö <martin@martin.st>
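A minimal sketch of why the duplicate must come from a plain allocator. It assumes an "append" operation grows the stored string with realloc(). A block from an aligned allocator such as _aligned_malloc on Windows must be resized with _aligned_realloc instead and cannot be passed to realloc(). The helpers dup_str and append_str are hypothetical, not av_strdup or av_dict_set.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Duplicate s with plain malloc() so that realloc() may be used on it later. */
static char *dup_str(const char *s)
{
    size_t len = strlen(s) + 1;
    char *p = malloc(len);
    if (p)
        memcpy(p, s, len);
    return p;
}

/* Grow a heap string in place and append suffix. On allocation failure
 * the old string is freed and NULL is returned. */
static char *append_str(char *s, const char *suffix)
{
    assert(s != NULL && suffix != NULL);
    size_t a = strlen(s), b = strlen(suffix);
    char *p = realloc(s, a + b + 1);
    if (!p) {
        free(s);
        return NULL;
    }
    memcpy(p + a, suffix, b + 1); /* copies the terminating NUL too */
    return p;
}
```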
This is so we can sync to x264's version of FMA4 support.
This partially reverts commit 79687079a9.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Automatically use VEX-encoding in AVX/AVX2/XOP/FMA3/FMA4
functions for all instructions that exist in a VEX-encoded
version.
This change makes it easier to extend existing code to use AVX2.
Also add support for AVX emulation of a few instructions that
were missing before.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Prevents a crash if the misaligned exception mask bit is
cleared for some reason.
Misaligned SSE functions are only used on AMD Phenom CPUs
and the benefit is minuscule. They also require modifying
the MXCSR control register and by removing those functions
we can get rid of that complexity altogether.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Store XMM6 and XMM7 in the shadow space in functions that
clobber them. This way we don't have to adjust the stack
pointer as often, reducing the number of instructions as
well as code size.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
SWAP with >=3 named (rather than numbered) args
PERMUTE followed by SWAP with 2 named args
used to produce the wrong permutation
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Now RET checks whether it immediately follows a branch, so the
programmer doesn't have to keep track of that condition. REP_RET
is still needed manually when it's a branch target, but that's
much rarer.
The implementation involves lots of spurious labels, but that's OK
because we strip them.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Prior to this on msvc/icl there was no handling of deprecated functions
and the deprecated warning was disabled.
After enabling there are a number of warnings relating to the CRT and
the use of the non-secure versions of several functions. Defining
_CRT_SECURE_NO_WARNINGS silences these warnings.
Signed-off-by: Martin Storsjö <martin@martin.st>
Add one copy of the function into each of the libraries, similarly
to what we do for log2_tab. When using static libs, only one
copy of the file_open.o object file gets included, while when
using shared libraries, each of them gets a copy of its own.
This fixes DLL builds with a statically linked C runtime, in which
each DLL effectively has its own instance of the C runtime and
file descriptors can't be shared across runtimes.
On systems not using msvcrt, the function is not duplicated.
Signed-off-by: Martin Storsjö <martin@martin.st>
This used to only be necessary in static builds (when using the
dynamically linked C runtime), since the _imp prefixed symbols do
exist when linking to the actual DLL. When building testprogs,
however, the current library (e.g. libavutil for some of the testprogs)
is linked statically.
This fixes make fate on DLL builds when using the dynamically
linked C runtime.
Signed-off-by: Martin Storsjö <martin@martin.st>
When libavformat was changed to use the new avpriv_open function
in 51eb213d00, this silently bypassed the existing wrapper for
win32. Move the win32 wrapper into libavutil/file.c to make sure
it gets called everywhere (not just in the libavformat case).
This makes sure that non-ASCII file names get opened properly
(where file names internally are stored as utf8, but they get
converted to wchar_t and opened with _wsopen).
Signed-off-by: Martin Storsjö <martin@martin.st>
AVIOContext has got an av_class member that only gets set if
opening the context using avio_open2, but not if allocating a
custom IO context. A caller that wants to read AVOptions from
an AVIOContext (recursively using AV_OPT_SEARCH_CHILDREN) may
not know if the AVIOContext actually has got a class set or not.
Signed-off-by: Martin Storsjö <martin@martin.st>
Use this for enabling the ppc timer.h implementation only on
assemblers that support labels in the inline assembly.
Signed-off-by: Martin Storsjö <martin@martin.st>
This matches the other eabi attribute in the same file. This is
required in order to build for arm/hardfloat with other object
file formats than ELF.
Signed-off-by: Martin Storsjö <martin@martin.st>
The check `src > dst' in the form `&c->out[-back] > c->out' invokes
pointer overflow, which is undefined behavior in C.
Remove the check. Also replace `&c->out[-back] < c->out_start' with
a safe form `c->out - c->out_start < back' to avoid overflow.
CC: libav-stable@libav.org
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
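A C sketch of the safe form of the bounds check. It assumes out_start <= out and that both point into the same output buffer. The struct and function names are hypothetical stand-ins for the decoder context.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoder state: out_start is the start of the output buffer,
 * out is the current write position (out >= out_start). */
typedef struct {
    uint8_t *out_start;
    uint8_t *out;
} BackrefCtx;

/* Nonzero if a back-reference of `back` bytes would reach before out_start.
 * Forming &c->out[-back] first could create a pointer outside the buffer,
 * which is undefined behavior. Subtracting two pointers into the same
 * buffer and comparing the distance is always well defined. */
static int backref_invalid(const BackrefCtx *c, int back)
{
    assert(back > 0);
    return c->out - c->out_start < back;
}
```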
The mingw win32 atomics appear to be faulty, so they should not be used
if the gcc ones are available.
Signed-off-by: Martin Storsjö <martin@martin.st>
On the current code, armcc will fail with:
"libavutil/atomic_gcc.h", line 52: Error: #2771: first argument must be
a pointer to integer or enumeration type
Not all gcc configurations have an implementation of all the atomic
operations, and some gcc configurations have some atomic builtins
implemented but not all.
Thus check for the most essential function, whose presence should
indicate that all others are present as well, since it can be used
to implement all the other ones.
Signed-off-by: Martin Storsjö <martin@martin.st>
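A sketch of how a single compare-and-swap primitive can implement another atomic operation. It assumes the essential function is a compare-and-swap such as GCC's __sync_val_compare_and_swap, which the text above does not name. atomic_add_int is a hypothetical name, and the code requires GCC or a compatible compiler.

```c
#include <assert.h>

/* Atomic add-and-fetch built only from __sync_val_compare_and_swap.
 * The builtin stores old + inc only if *ptr still equals old, and returns
 * the value it actually found. */
static int atomic_add_int(volatile int *ptr, int inc)
{
    int old, seen;
    assert(ptr != 0);
    do {
        old  = *ptr;
        seen = __sync_val_compare_and_swap(ptr, old, old + inc);
    } while (seen != old); /* another thread changed *ptr in between: retry */
    return old + inc;      /* the new value */
}
```

The same retry loop works for other read-modify-write operations (and, or, exchange) by changing the computed new value.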
This makes them pass standalone compilation tests. Previously,
they included atomic.h which included themselves again, leading to
double definitions.
Signed-off-by: Martin Storsjö <martin@martin.st>
These could be used for reference counting, or for keeping track of
decoding progress in references in multithreaded decoders.
Support is provided by gcc/msvc/suncc intrinsics, with a fallback using
pthread mutexes.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
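A minimal sketch of the pthread mutex fallback idea. It uses one global lock to serialize every "atomic" access, which is correct but slower than real intrinsics. The function names are hypothetical, not the names of the actual API, and the usage in the test mimics a reference count.

```c
#include <assert.h>
#include <pthread.h>

/* One global lock protects all fallback atomic operations. */
static pthread_mutex_t atomic_lock = PTHREAD_MUTEX_INITIALIZER;

static int atomic_int_get(volatile int *ptr)
{
    int v;
    assert(ptr != 0);
    pthread_mutex_lock(&atomic_lock);
    v = *ptr;
    pthread_mutex_unlock(&atomic_lock);
    return v;
}

/* Add inc to *ptr and return the new value, e.g. for ref/unref. */
static int atomic_int_add_and_fetch(volatile int *ptr, int inc)
{
    int v;
    assert(ptr != 0);
    pthread_mutex_lock(&atomic_lock);
    v = (*ptr += inc);
    pthread_mutex_unlock(&atomic_lock);
    return v;
}
```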
The "CentaurHauls family 6 model 9 stepping 8" family of CPUs
(flags: fpu vme de pse tsc msr cx8 sep mtrr pge mov pat mmx fxsr sse
up rng rng_en ace ace_en) SIGILLs on long nop codes.
Signed-off-by: Martin Storsjö <martin@martin.st>
On recent android versions, /proc/self/auxv is unreadable
(unless the process is running under the shell uid or
in debuggable mode, which makes it hard to notice). See
http://b.android.com/43055 and
https://android-review.googlesource.com/51271 for more information
about the issue.
This makes sure e.g. neon optimizations are enabled at runtime in
android apps even when built in release mode, if configured to
use the runtime detection.
CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
This makes sure that the restrict keyword is mapped to whatever
keyword the compiler prefers/supports. This fixes building on MSVC
(and possibly on GCC 2.x as well).
Signed-off-by: Martin Storsjö <martin@martin.st>
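A sketch of mapping a project-wide restrict macro onto whatever the compiler accepts. In a real build, configure would detect the right keyword. The macro name my_restrict and the function add_arrays are hypothetical. The branches assume that older MSVC versions accept __restrict but not the C99 keyword, and that GCC accepts __restrict__ in pre-C99 modes.

```c
#include <assert.h>
#include <stddef.h>

#if defined(_MSC_VER)
#   define my_restrict __restrict       /* older MSVC: no C99 keyword */
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
#   define my_restrict restrict         /* C99 and later */
#elif defined(__GNUC__)
#   define my_restrict __restrict__     /* GCC in pre-C99 modes */
#else
#   define my_restrict                  /* unknown compiler: drop the hint */
#endif

/* The qualifier promises the compiler that dst, a and b do not alias. */
static void add_arrays(int *my_restrict dst, const int *my_restrict a,
                       const int *my_restrict b, size_t n)
{
    size_t i;
    assert(n == 0 || (dst && a && b));
    for (i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```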
Now, nellymoserenc and aacenc no longer depend on dsputil. Independent
of this patch, wmaprodec also does not depend on dsputil, so I removed
it from there also.
This provides a fallback when building with Yasm enabled, but neither
inline assembly nor the _mm_empty intrinsic is available or enabled.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
The new name is more descriptive and will allow defining a separate
public prefix for externally visible library symbols.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Use this in VP8/H264-8bit loopfilter functions so they can be used if
there is no aligned stack (e.g. MSVC 32bit or ICC 10.x).
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
This allows compiling optimised functions for features not enabled
in the core build and selecting these at runtime if the system has
the necessary support.
Signed-off-by: Mans Rullgard <mans@mansr.com>
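A sketch of the runtime-selection pattern. A context holds function pointers that default to C code and are swapped for optimised versions only if the CPU reports support. The flag HYPO_CPU_FLAG_SIMD, the context type and all function names are hypothetical. The "optimised" function is plain C here, standing in for code that would be compiled separately with extra instruction-set flags.

```c
#include <assert.h>

#define HYPO_CPU_FLAG_SIMD (1 << 0) /* hypothetical feature bit */

typedef void (*scale_fn)(float *buf, float s, int len);

typedef struct {
    scale_fn scale;
} DSPContextSketch;

/* Portable reference implementation, always safe to call. */
static void scale_c(float *buf, float s, int len)
{
    for (int i = 0; i < len; i++)
        buf[i] *= s;
}

/* Stand-in for an optimised version; assumes len is a multiple of 4. */
static void scale_simd(float *buf, float s, int len)
{
    assert(len % 4 == 0);
    for (int i = 0; i < len; i += 4) {
        buf[i] *= s; buf[i + 1] *= s; buf[i + 2] *= s; buf[i + 3] *= s;
    }
}

/* Select implementations from the runtime CPU flags. */
static void dsp_init_sketch(DSPContextSketch *c, int cpu_flags)
{
    c->scale = scale_c;                 /* default: core build */
    if (cpu_flags & HYPO_CPU_FLAG_SIMD) /* only when the CPU supports it */
        c->scale = scale_simd;
}
```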
This is consistent with usual ARM nomenclature as well as with the
VFPV3 and NEON symbols which both lack the ARM prefix.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Not all versions of windows have the console color functions,
while io.h might be needed for isatty (which can be found in
unistd.h or io.h).
Signed-off-by: Martin Storsjö <martin@martin.st>
The existence of MapViewOfFile isn't linked to the existence of
io.h.
Not all versions of windows have MapViewOfFile (in particular,
Windows Phone 8 and the "metro" windows 8 API subset don't),
while they still have io.h (and need it for open/read/close).
Signed-off-by: Martin Storsjö <martin@martin.st>
Preventing the use of discouraged or 'insecure' external functions
through defines in an internal header is not a good solution. The
header is not guaranteed to be included universally which makes
overlooking bad use of said functions during review more likely.
There are cases where those functions either are the most
straightforward solution or even have to be used. Using malloc or free is
required if the allocation or release is done by other libraries.
- Add special cases for offsets of 2, 3, or 4 bytes. This means the
offset is always >4 in the generic case, allowing 32-bit copies to
be used there.
- Don't use memcpy() for sizes less than 16 bytes.
Signed-off-by: Mans Rullgard <mans@mansr.com>
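A simplified C sketch of an overlapping back-reference copy in the spirit of the change above. It is not the actual implementation. Offset 1 becomes a memset. Offsets 2 to 4 fall back to a byte loop here, whereas the commit adds dedicated paths for them. Because small offsets are peeled off, the generic path always has an offset greater than 4, so each 4-byte chunk never overlaps its own source. The name copy_backptr_sketch is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy cnt bytes to dst from dst - back; the regions may overlap. */
static void copy_backptr_sketch(uint8_t *dst, int back, int cnt)
{
    const uint8_t *src = dst - back;
    assert(back > 0 && cnt >= 0);

    if (back == 1) {                /* run of one repeated byte */
        memset(dst, *src, cnt);
        return;
    }
    if (back > 4) {                 /* generic case: 32-bit copies */
        while (cnt >= 4) {
            uint32_t v;
            memcpy(&v, src, 4);     /* fixed-size copies compile to a */
            memcpy(dst, &v, 4);     /* single load and store on most compilers */
            src += 4;
            dst += 4;
            cnt -= 4;
        }
    }
    /* offsets 2..4 and any tail: byte copies reproduce the repeating pattern */
    while (cnt-- > 0)
        *dst++ = *src++;
}
```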