Instead of inlining everything into ff_h264_hl_decode_mb(), use
explicit templating to create versions of the called functions
with constant parameters filled in. This greatly speeds up
compilation of h264.c and reduces the code size without any
measurable impact on performance.
Compilation time for h264.c on an i7 goes from 30s to 5.5s.
Code size is reduced by 430kB.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Previously it was interpreted as number of bytes, while the
documentation stated that it was the number of 8 byte blocks.
This makes it behave similarly to the existing AES code.
Signed-off-by: Martin Storsjö <martin@martin.st>
Previously it was interpreted as number of bytes, while the
documentation stated that it was the number of 8 byte blocks.
This makes it behave similarly to the existing AES code.
Signed-off-by: Martin Storsjö <martin@martin.st>
Using ff_mspel_motion assumes that s (a MpegEncContext
poiinter) really is a Wmv2Context.
This fixes crashes in error resilience on vc1/wmv3 videos.
CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
If the output frame size is smaller than the input sample rate,
and the input stream time base corresponds exactly to the input
frame size (getting input packet timestamps like 0, 1, 2, 3, 4 etc),
the output timestamps from the filter will be like
0, 1, 2, 3, 4, 4, 5 ..., leadning to non-monotone timestamps later.
A concrete example is input mp3 data having frame sizes of 1152
samples, transcoded to aac with 1024 sample frames.
By setting the audio filter time base to the sample rate, we will
get sensible timestamps for all output packets, regardless of
the ratio between the input and output frame sizes.
Signed-off-by: Martin Storsjö <martin@martin.st>
This tool uses lavfi internal symbols not accessible in shared
libraries. TESTPROGS are linked statically to allow them use of
library internals not normally exported.
Signed-off-by: Mans Rullgard <mans@mansr.com>
For reading from normal files on disk, the queue limits for
demuxed data work fine, but for reading data from realtime
streams, they mean we're not reading from the input stream
at all once the queue limit has been reached. For TCP streams,
this means that writing to the socket from the peer side blocks
(potentially leading to the peer dropping data), and for UDP
streams it means that our kernel might drop data.
For some protocols/servers, the server initially sends a
large burst with data to fill client side buffers, but once
filled, we should keep reading to avoid dropping data.
For all realtime streams, it IMO makes sense to just buffer
as much as we get (rather in buffers in avplay.c than in
OS level buffers). With this option set, the input thread
should always be blocking waiting for more input data,
never sleeping waiting for the decoder to consume data.
Signed-off-by: Martin Storsjö <martin@martin.st>
The buffers are only allocated once, although it can happen from
any of a few different places, so there is no need to use realloc.
Using av_malloc() ensures they are aligned suitably for SIMD
optimisations.
Signed-off-by: Mans Rullgard <mans@mansr.com>
In Smooth Streaming, the fragments are addressed by time, and
the manifest only stores one list of time offests for all streams,
so all streams need to have identical fragment offsets. Warn if
this isn't the case, so that the user can fix the files instead of
getting failures at runtime when the fragments can't be found.
Signed-off-by: Martin Storsjö <martin@martin.st>
Currently, --enable-small turns av_always_inline into plain inline,
which is more or less ignored by the compiler. While the intent of
this is probably to reduce code size by avoiding some inlining, it
has more far-reaching effects.
We use av_always_inline in two situations:
1. The body of a function is smaller than the call overhead.
Instances of these are abundant in libavutil, the bswap.h
functions being good examples.
2. The function is a template relying on constant propagation
through inlined calls for sane code generation. These are
often found in motion compensation code.
Both of these types of functions should be inlined even if targeting
small code size.
Although GCC has heuristics for detecting the first of these types,
it is not always reliable, especially when the function uses inline
assembler, which is often the reason for having those functions in
the first place, so making it explicit is generally a good idea.
The size increase from inlining template-type functions is usually
much smaller than it seems due to different branches being mutually
exclusive between the different invocations. The dead branches can,
however, only be removed after inlining and constant propagation have
been performed, which means the initial cost estimate for inlining
these is much higher than is actually the case, resulting in GCC
often making bad choices if left to its own devices.
Furthermore, the GCC inliner limits how much it allows a function to
grow due to automatic inlining of calls, and this appears to not take
call overhead into account. When nested inlining is used, the limit
may be hit before the innermost level is reached. In some cases, this
has prevented inlining of type 1 functions as defined above, resulting
in significant performance loss.
Signed-off-by: Mans Rullgard <mans@mansr.com>