It is only used for error resilience. This allows building the
h264 decoder without dsputil, if error resilience is disabled.
Signed-off-by: Martin Storsjö <martin@martin.st>
The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700
to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb
(in the decode_slice loop) goes from 1759 to 1733 cycles on the clip
tested (cathedral), i.e. almost 30 cycles per mb faster.
Signed-off-by: Martin Storsjö <martin@martin.st>
Allows use of AVHWAccel based decoders with frame based multithreading.
The decoders will be forced into an non-concurrent mode by delaying
ff_thread_finish_setup() calls after decoding of the current frame
is finished.
This wastes memory by unnecessarily using multiple threads and thus
copies of the decoder context but allows seamless switching between
hardware accelerated and frame threaded software decoding when the
hardware decoder does not support the stream.
Error resilience is enabled by the h264 decoder, unless explicitly
disabled. --disable-everything --enable-decoder=h264 will produce
a h264 decoder with error resilience enabled, while
--disable-everything --enable-decoder=h264 --disable-error-resilience
will produce a h264 decoder with error resilience disabled.
Signed-off-by: Martin Storsjö <martin@martin.st>
AVCodecContext.bits_per_raw_sample is updated from the previous thread
in the generic update function before the codec specific update_thread
function is called. The check for reinitialization of dsp functions uses
bits_per_raw_sample. When called from update_thread_context it will be
already at the current value and the dsp functions aren't updated if
only the bit depth changes.
This ensures the hwaccel privdata does not leak when a frame buffer could
not be allocated (and toggle the assert when the frame is re-used).
Having no frame buffer available is quite common when using the DXVA2
hwaccel in situations where the DXVA2 renderer is being re-allocated, for
example when moving between displays.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This makes the decoder independent of mpegvideo.
This copy of the draw_horiz_band code is simplified compared to
the "generic" mpegvideo one which still has a number of special
cases for different codecs.
Signed-off-by: Martin Storsjö <martin@martin.st>
Not all hwaccels implement all codecs, so using one single list for
multiple such codecs means some codecs will be represented in the list,
even though they don't actually handle that codec. Copying specific
lists in each codec fixes that.
Signed-off-by: Martin Storsjö <martin@martin.st>
The decoder assumes a single bit depth for all the planes
while the specification allows different bit depths for luma
and chroma.
Avoid the possible problems described in CVE-2013-2277
CC: libav-stable@libav.org
Instead, only extend edges on-demand when the motion vector actually
crosses the visible decoded area using ff_emulated_edge_mc(). This
changes decoding time for cathedral from 8.722sec to 8.706sec, i.e.
0.2% faster overall. More generally (VP8 uses this also), low-motion
content gets significant speed improvements, whereas high-motion content
tends to decode in approximately the same time.
Signed-off-by: Martin Storsjö <martin@martin.st>
These functions are mostly H264-specific (the only other user I can
spot is bink), and this allows us to special-case some functionality
for H264. Also remove the 16-bit-coeff with >8bpp versions (unused)
and merge the duplicate 32-bit-coeff for >8bpp (identical).
Signed-off-by: Martin Storsjö <martin@martin.st>
Most of the changes are just trivial are just trivial replacements of
fields from MpegEncContext with equivalent fields in H264Context.
Everything in h264* other than h264.c are those trivial changes.
The nontrivial parts are:
1) extracting a simplified version of the frame management code from
mpegvideo.c. We don't need last/next_picture anymore, since h264 uses
its own more complex system already and those were set only to appease
the mpegvideo parts.
2) some tables that need to be allocated/freed in appropriate places.
3) hwaccels -- mostly trivial replacements.
for dxva, the draw_horiz_band() call is moved from
ff_dxva2_common_end_frame() to per-codec end_frame() callbacks,
because it's now different for h264 and MpegEncContext-based
decoders.
4) svq3 -- it does not use h264 complex reference system, so I just
added some very simplistic frame management instead and dropped the
use of ff_h264_frame_start(). Because of this I also had to move some
initialization code to svq3.
Additional fixes for chroma format and bit depth changes by
Janne Grunau <janne-libav@jannau.net>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The sh4 optimizations are removed, because the code is
100% identical to the C code, so it is unlikely to
provide any real practical benefit.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
ref_list is constructed from other fields per slice when needed, so do
not copy it for both frame and slice threading.
default_ref_list is constructed per frame and still needs to be copied
to per-slice contexts for slice threading, but a copy is not needed for
frame threading.
If the motion vector is at a subpixel position, we need 3 pixels below
the motion vector's wholepel position available, not 2, since the MC
filter is a sixtap filter for the hpel position, and then a bilin filter
for the qpel position.
This patch fixes highly irreproducible (0.1%) fate failures in frame 2
and 4 of h264-conformance-cama2_vtc_b (e.g. first P-frame, first field,
last line of MB x=40,y=2 and second field and last lines of MBs x=39-40,
y=3). These used pre-loopfilter instead of post-loopfilter data because
the await_progress() waited for one line too little in that field, and
the motion vector of these particular MBs happened to align exactly to a
position where that demonstrates the bug.
CC: libav-stable@libav.org
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Clobbering these tables will temporarily clobber the template used
as a basis for other threads to start decoding from. If the other
decoding thread updates from the template right at that moment,
subsequent threads will get invalid (or, usually, none at all) mmco
tables. This leads to invalid reference lists and subsequent decode
failures.
Therefore, instead, decode the mmco tables only for the first slice in
a field or frame. For other slices, decode the bits and ensure they
are identical to the mmco tables in the first slice, but don't ever
clobber the context state. This prevents other threads from using a
clobbered/invalid template as starting point for decoding, and thus
fixes decoding in these cases.
This fixes occasional (~1%) failures of h264-conformance-mr1_bt_a with
frame-multithreading enabled.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Fixes null pointer dereference later, since if this function failed,
a positive return value was returned to the caller.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Martin Storsjö <martin@martin.st>
Comparing AVCodecContext.pix_fmt against the get_pixel_format() return
value has the side effect of calling the get_format() callback on each
slice. Users of the callback will probably handle hardware accelerator
initialization in the callback.
Move some functions from dsputil. The idea is that videodsp contains
functions that are useful for a large and varied set of video decoders.
Currently, it contains emulated_edge_mc() and prefetch().
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Since we can't know which stride a custom get_buffer() implementation is
going to use we have to allocate this scratch buffers after the linesize
is known. It was pretty safe for 8 bit per pixel pixel formats since we
always allocated memory for up to 16 bits per pixel. It broke hoever
with cmdutis.c's alloc_buffer() and high pixel bit depth since it
allocated larger edges than mpegvideo expected.
Fixes fuzzed sample nasa-8s2.ts_s244342.
It is not posible to call get_buffer during frame-mt codec
initialization. Libavformat might pass huge amounts of data as
extradata after parsing broken files. The 'extradata' for the fuzzed
sample sample_varPAR_s5374_r001-02.avi is 2.8M large and contains
multiple slices.
This does not seem to have an effect currently. Fate-h264 passes with
THREADS=1..16 and both threading types as before. It fixes however a
segfault during error resilience with my adaptive-frame-mt patchset.
A picture in use during error resilience gets realloced in another
thread in the fuzzed sample sample_varPAR.avi_s226019.
Dropping frames is undesirable but that is the only way by which the
decoder could return to low delay mode. Instead emit a warning and
continue with delayed frames.
Fixes a crash in fuzzed sample nasa-8s2.ts_s20033 caused by a larger
than expected has_b_frames value. Low delay keeps getting re-enabled
from a presumely broken SPS.
CC: libav-stable@libav.org
s->mb_x is reset to zero a couple of lines above. It does not make
sense to call ff_er_add_slice() with 0 as endx when the end of the
macroblock row was reached. Fixes unnecessary and counterproductive
error resilience in https://bugzilla.libav.org/show_bug.cgi?id=394.
CC: libav-stable@libav.org
Some invocations include a verb in the log message, others do not. Yet
av_log_missing_feature expects callers to provide a verb. Change the
function to include a verb instead and update the callers accordingly.
The result is a more natural function API and correct English in the
function invocations.