When allocating stack space with a larger alignment than the known stack
alignment a temporary register is used for storing the stack pointer.
Ensure that this isn't one of the registers used for passing arguments.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
* Correctly handle FMA instructions with memory operands.
* Print a warning if FMA instructions are used without the correct cpuflag.
* Simplify the instantiation code.
* Clarify documentation.
Only the last operand in FMA3 instructions can be a memory operand. When
converting FMA4 instructions to FMA3 instructions we can utilize the fact
that multiply is a commutative operation and reorder operands if necessary
to ensure that a memory operand is used only as the last operand.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This fixes builds with --disable-vfp.
Checking for the armv6 cpu flag is incorrect, since vfpv2 isn't
armv6 specific.
Signed-off-by: Martin Storsjö <martin@martin.st>
The vector mode was deprecated in ARMv7-A/VFPv3 and various cpu
implementations do not support it in hardware. Vector mode code will
depending the OS either be emulated in software or result in an illegal
instruction on cpus which does not support it. This was not really
problem in practice since NEON implementations of the same functions are
preferred. It will however become a problem for checkasm which tests
every cpu flag separately.
Since this is a cpu feature newer cpu do not support anymore the
behaviour of this flag differs from the other flags. It can be only
activated by runtime cpu feature selection.
The ISB (instruction synchronization barrier) might be too heavy for
START/STOPTIMER use but should be more accurate in checkasm where the
timing overhead is subtracted.
Include macros.h explicitly in common.h so that external code using
FFALIGN does not break. It was already implicitly included through
version.h. Include macros.h in lls.h and internal.h for FFALIGN.
lls.h was including common.h only for FFALIGN and internal.h was
missing the include for FFALIGN. `make checkheaders` did not catch it
because it's an internal header.
They're short enough that inlining them actually reduces code size due to
all the overhead associated with making a function call.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
These field are difficult to interpret, and are provided by a single
encoder (mpegvideoenc). In general they do not belong to a structure
containing raw data only, so remove them from AVFrame.
Mpegvideoenc now uses a private field in Picture for its internal
computations.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
MIPS R6 supports unaligned memory access and does not have
the load/store-left/right family of instructions.
Signed-off-by: Vicente Olivert Riera <Vincent.Riera at imgtec.com>
Signed-off-by: Luca Barbato <lu_zero at gentoo.org>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Ensure that the components are ordered consistently, ie. always
RGB(A) and YUV(A). This allows to identify a specific plane on a given
pixel format without hard-coding knowledge of the plane order.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
There is no practical benefit in having this structure elements
bit packed given the size of the structure and its usage.
Change types from uint16_t (packed) to plain int in order to simplify
modifying the structure and accessing its fields.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
The .text section is already 16-byte aligned by default on all supported
platforms so `SECTION_TEXT` isn't any different from `SECTION .text`.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Change ALLOC_STACK to always align the stack before allocating stack space for
consistency. Previously alignment would occur either before or after allocating
stack space depending on whether manual alignment was required or not.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Emulation requires a temporary register if arguments 1 and 4 are the same; this
doesn't obey the semantics of the original instruction, so we can't emulate
that in x86inc.
Also add pmacsdql emulation.
Signed-off-by: Henrik Gramner <henrik@gramner.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Also replace custom tests for MD5 with those published in RFC 2202
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Improves the accuracy of measurements, especially in short sections.
To quote the Intel 64 and IA-32 Architectures Software Developer's Manual:
"The RDTSC instruction is not a serializing instruction. It does not necessarily
wait until all previous instructions have been executed before reading the counter.
Similarly, subsequent instructions may begin execution before the read operation
is performed. If software requires RDTSC to be executed only after all previous
instructions have completed locally, it can either use RDTSCP (if the processor
supports that instruction) or execute the sequence LFENCE;RDTSC."
SSE2 is a requirement for lfence so only use it on SSE2-capable systems.
Prefer lfence;rdtsc over rdtscp since rdtscp is supported on fewer systems.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
This returns something like "v12_dev0-1332-g333a27c". This is much more
useful than the individual library versions, of which there are too
many, and which are very hard to map back to releases or git commits.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
The C runtime C99 compatibility had been improved a lot and it now
rejects some of the compatibility defines provided for the older
versions.
Many thanks to Ray for the time spent testing.
Bug-Id: 864
CC: libav-stable@libav.org
This was probably broken some time ago. The breakage is now part of the
ABI. For example, we have:
AV_PIX_FMT_XYZ12BE
AV_PIX_FMT_NV16
AV_PIX_FMT_NV20LE
AV_PIX_FMT_NV20LE is wrong. It has the value 113, but as little-endian
format it should be even. This must have been quite obvious when these
formats were added (because of the AV_PIX_FMT_XYZ12BE entry), but
nobody cared or knew about this.
The future libavutil major bump will also break this additionally,
because disabling FF_API_VDPAU will remove an odd number of entries from
the middle of the enum.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Silences warning(s) like:
libavcodec/x86/fft.asm:93: warning: section flags ignored on
section redeclaration
The cause of this warning is that because `struc` and `endstruc`
attempts to revert to the previous section state [1].
The section state is stored in the macro __SECT__, defined by
x86inc.asm to be `.note.GNU-stack ...`, through the `SECTION`
directive [2].
Thus, the `.note.GNU-stack` section is defined twice
(once in x86inc.asm, once during `endstruc`), causing the warning.
That is the first part of the commit: using the primitive `[section]` format
for .note.GNU-stack etc., which does not update `__SECT__` [2].
That fixes only half of the problem. Even without any `SECTION` directives,
`__SECT__` is predefined as `.text`, which conflicting with the later
`SECTION_TEXT` (which expands to `.text align=16`).
[1]: http://www.nasm.us/doc/nasmdoc6.html#section-6.4
[2]: http://www.nasm.us/doc/nasmdoc6.html#section-6.3
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Useful to understand where and in what execution state a certain message
is generated. It is enabled only when optimizations are disabled, since
function names are not printed otherwise.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
This will allow to copy the matrix as is and it is just cleaner to keep
the matrix in the same order specified by the mov standard (which is
also explicitly described in the documentation).
In order to preserve compatibility, flip the angle sign in the display API
av_display_rotation_set() and av_display_rotation_get(), and improve the
documentation mentioning the rotation direction.
When all the codepaths using manually set .arch/.fpu code is
behind runtime detection, the elf attributes should be suppressed.
This allows tools to know that the final built binary doesn't
strictly require these extensions.
Signed-off-by: Martin Storsjö <martin@martin.st>
add ARM code for implementing av_clip_intp2 using the ssat instruction
on Cortex-A8, av_clip_intp2_arm() is faster than av_clip_intp2_c() and
the generic av_clip(), about -19%
Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
there already is a function, av_clip_uintp2() that clips a signed integer
to an unsigned power-of-two range, i.e. 0,2^p-1
this patch adds a function av_clip_intp2() that clips a signed integer
to a signed power-of-two range, i.e. -(2^p),(2^p-1)
the new function can be used as a special case for av_clip(), e.g.
av_clip(x, -8192, 8191) can be rewritten as av_clip_intp2(x, 13)
there are ARM instructions, usat and ssat resp., which map nicely to these
functions (see next patch)
Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
This uses explicit memory copying to read and write pointer to pointers
of arbitrary object types. This works provided that the architecture
uses the same representation for all pointer types (the previous code
made that assumption already anyway).
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Move the lavc/imgconvert functions and rename them as follows:
avpicture_get_size -> av_image_get_buffer_size()
avpicture_fill -> av_image_fill_arrays()
avpicture_layout -> av_image_copy_to_buffer()
The new functions have an align parameter, which allows to define the
linesize alignment assumed in the buffer (which is set or read).
The names of the functions are consistent with the lavu/samples API
(av_samples_get_buffer_size(), av_samples_fill_arrays()).
A redundant check has been dropped from av_image_fill_arrays().
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
The buffer pool has to atomically add and remove entries from the linked
list of available buffers. This was done by removing the entire list
with a CAS operation, working on it, and then setting it back again
(using a retry-loop in case another thread was doing the same thing).
This could effectively cause memory leaks: while a thread was working on
the buffer list, other threads would allocate new buffers, increasing
the pool's total size. There was no real leak, but since these extra
buffers were not needed, but not free'd either (except when the buffer
pool was destroyed), this had the same effects as a real leak. For some
reason, growth was exponential, and could easily kill the process due
to OOM in real-world uses.
Fix this by using a mutex to protect the list operations. The fancy
way atomics remove the whole list to work on it is not needed anymore,
which also avoids the situation which was causing the leak.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Also add no-op fallbacks when threading is disabled.
This helps keeping the code clean if Libav is compiled for targets
without threading. Since we assume that no threads of any kind are used
in such configurations, doing nothing is ok by definition.
Based on a patch by wm4 <nfxjfg@googlemail.com>.
This doesn't add any dependency on library internals, since this
only is a static inline function that gets built into each of the
calling functions - this is only to reduce the code duplication.
Signed-off-by: Martin Storsjö <martin@martin.st>
gmtime isn't thread safe in general. In msvcrt (which lacks gmtime_r),
the buffer used by gmtime is thread specific though.
One call to localtime is left in avconv_opt.c, where thread safety
shouldn't matter (instead of making avconv depend on the libavutil
internal header).
Signed-off-by: Martin Storsjö <martin@martin.st>
This allows writing most code as if they always are is available.
These are ok to use from other libraries even though it's not a
public header, since they only provide an inline declaration, and
doesn't add an actual dependency on lavu internals. (This can be
considered more a build system compatibility fallback than a
libavutil feature.)
Signed-off-by: Martin Storsjö <martin@martin.st>
Since av_gettime() is used in a number of places where actual
real time clock is required, the monotonic clock introduced in
ebef9f5a5 would have consequences that are hard to handle. Instead
split it into a separate function that can be used in the cases
where only relative time is desired.
On platform where no monotonic clock is available, the difference
between the two av_gettime functions is not clear, and one could
mistakenly use the relative clock where an absolute one is
required. Therefore add an offset, to make it evident that the
time returned from av_gettime_relative never is actual current
real time, even though it is based on av_gettime.
Based on a patch by Olivier Langlois.
Signed-off-by: Martin Storsjö <martin@martin.st>
In order to support metadata being set as an option, it's necessary to be able
to set dictionaries as values.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The rationale is that you have a packed format in form
<greyscale sample> <alpha sample> <greyscale sample> <alpha sample>
and shortening greyscale to 'G' might make one thing about Greenscale instead.
An alias pixel format and color space name are provided for compatibility.
libavutil/cpu-test prints raw and effective cpu flags to STDERR. Detected
cpu flags can be useful for debugging fate errors.
No comparison of the result against a expected result since that would
require fate config specific references.
I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in butterflies_float_c() / ff_butterflies_float_vfp() for the
same sample AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
Audio decode 1542.8 43.7 1470.5 41.5 100.0% +4.9%
butterflies_float 130.0 11.9 70.2 12.1 100.0% +85.2%
Signed-off-by: Martin Storsjö <martin@martin.st>
I benchmarked the result by measuring the number of gperftools samples that
hit anywhere in the AAC decoder (starting from aac_decode_frame()) or
specifically in vector_fmul_window_c() / ff_vector_fmul_window_vfp() for the
same sample AAC stream:
Before After
Mean StdDev Mean StdDev Confidence Change
Audio decode 1598.2 47.4 1529.2 25.4 100.0% +4.5%
vector_fmul_window 244.0 22.1 188.9 22.3 100.0% +29.2%
Signed-off-by: Martin Storsjö <martin@martin.st>
Categorize the enum and functions as "audio-related".
Signed-off-by: Timothy Gu <timothygu99@gmail.com>
Signed-off-by: Diego Biurrun <diego@biurrun.de>
When running on a 64 bit kernel, /proc/cpuinfo lists different
optional features than on 32 bit kernels (because some of them
are mandatory in the 64 bit implemenations).
The kernel does list the old features properly if they are queried
via /proc/self/auxv though - however this file is not always readable
(e.g. on most android systems). The getauxval function could also
provide the same info as /proc/self/auxv even if this file isn't
readable, but this function is not always available (and thus would
need to be loaded with dlsym for compatibility with older android
versions).
The android cpufeatures library does this slightly differently,
by assuming that these are available if the "CPU architecture"
line is >= 8, see [1] for details.
It has been suggested to include the old, non-optional features in
/proc/cpuinfo as well, but that suggested patch never was merged.
See [2] for the discussion around this suggestion.
[1] https://android-review.googlesource.com/91380
[2] http://marc.info/?l=linux-arm-kernel&m=139087240101974
Signed-off-by: Martin Storsjö <martin@martin.st>
Both gnu as and clang treat lines starting with '#' as comments if they
aren't consumed by the C-style preprocessor.
Using '//' does not work with clang since comments are removed before
macro expansion.
Blackfin is a painful platform to work with, no test machines are available
and the range of multimedia applications is dubious. Thus it only represents
a maintenance burden.
This fixes building in PIC mode with gas. The examples in the gas
manual showed using a # here even though gas itself actually didn't
support that syntax (and the gas test suite only tests it without
the extra hash sign).
CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
Add AV_PKT_DATA_DISPLAYMATRIX and AV_FRAME_DATA_DISPLAYMATRIX as stream and
frame side data (respectively) to describe a display transformation matrix
for linear transformation operations on the decoded video.
Add functions to easily extract a rotation angle from a matrix and
conversely to setup a matrix for a given rotation angle.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This fixes usage of AV_TIME_BASE_Q in C++ applications, which
cannot use compound literals directly in their code.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
According to the ReplayGain spec, the peak amplitude may overflow and may result
in peak amplitude values greater than 1.0 with psychoacoustically coded audio,
such as MP3. Fully compliant decoders must allow peak overflows.
Additionally, having peak values in the 0<->UINT32_MAX scale makes it more
difficult for applications to actually use the peak values (e.g. when
implementing clipping prevention) since values have to be rescaled down.
This patch corrects the peak parsing by removing the rescaling of the decoded
values between 0 and UINT32_MAX and the 1.0 upper limit.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
And provide extended coloring capabilities for debugging.
The default colors do not change in 256 more to keep
supporting people using Black on White, White on Black and
Solarized terminals.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Ported from arm NEON and added vector_dmul_scalar.
Functions between 1.5 and 5 times faster than the C implementations
using Apple's clang-503.0.19 on A7.
Not guaranteed to be in nanosecond resolution. On iOS 7 the duration
of one tick is 125/3 ns which is still more than an order of magnitude
better then microseconds.
Replace decicycles with the neutral UNITS. Decicycles is strange but
tenths of a nanosecond and unspecific "deci"-ticks for mach_absolute_time
is just silly.
vector_fmul and vector_fmac_scalar are guaranteed that they can process in
batch of 16 elements, but their SSE versions only does 8 at a time.
Therefore, unroll them a bit.
299 to 261c for 256 elements in vector_fmac_scalar on Arrandale/Win64.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
If linking in an object file without this attribute set, the
linker will assume that an executable stack might be needed.
Signed-off-by: Martin Storsjö <martin@martin.st>
This makes the generated assembly more internally consistent,
avoiding declaring two labels for the same function (for cases
where EXTERN_ASM is empty) and not declaring a separate unprefixed
label in other cases.
This also makes sure the .func and .type delcarations have the same
prefix. They have previously not been used on the platforms
that have prefixed symbols on arm (iOS), but gas-preprocessor
has recently started using the .func declarations for adding
.thumb_func declarations for such functions.
Signed-off-by: Martin Storsjö <martin@martin.st>
NEON and VFP are currently mandatory for all ARMv8 profiles. Both are
handled as extensions as far as cpuflags are concerned. This is
consistent with handling x86_64 which always has SSE2, but still
handles it as an extension.
The function macro always sets .align 2 before declaring the
function label (since 5c5e1ea3) and always sets the section to
.text (since 278caa6a).
The .align 5 before certain functions, added in fc252eba, were added
before .text and .align were added to the function macro and thus
became useless/unused when the function macro got them.
This restores the original intention, to align the loop entry
points.
Signed-off-by: Martin Storsjö <martin@martin.st>
The new code is faster and reuses the previous state in case of
multiple calls.
The previous code could easily end up in near-infinite loops,
if the difference between two clock() calls never was larger than
1.
This makes fate-parseutils finish in finite time when run in wine,
if CryptGenRandom isn't available (which e.g. isn't available if
targeting Windows RT/metro).
Patch originally by Michael Niedermayer but with some modifications
by Martin Storsjö.
Signed-off-by: Martin Storsjö <martin@martin.st>
Commit 41578f70cf changed the LLS API, which was
called from libavcodec. Thus using an old libavcodec with a new libavutil will
break.
All scheduled API changes are deferred to the next bump.