Several decoders disable those anyway and they are not measurably faster
on x86. They might be somewhat faster on other platforms due to missing
emu edge SIMD, but the gain is not large enough (and those decoders
relevant enough) to justify the added complexity.
Allow supporting files for which the image stride is smaller than
the maximum block size + number of subpel mc taps, e.g. a 64x64 VP9
file or a 16x16 VP8 file with -fflags +emu_edge.
If the height is zero, the decompression will probably end up
failing due to not fitting into the allocated buffer later
anyway, so this doesn't need any more elaborate check.
Reported-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
Also pass on any returned error code.
Reported-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Signed-off-by: Martin Storsjö <martin@martin.st>
The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700
to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb
(in the decode_slice loop) goes from 1759 to 1733 cycles on the clip
tested (cathedral), i.e. almost 30 cycles per mb faster.
Signed-off-by: Martin Storsjö <martin@martin.st>
Instead, only extend edges on-demand when the motion vector actually
crosses the visible decoded area using ff_emulated_edge_mc(). This
changes decoding time for cathedral from 8.722sec to 8.706sec, i.e.
0.2% faster overall. More generally (VP8 uses this also), low-motion
content gets significant speed improvements, whereas high-motion content
tends to decode in approximately the same time.
Signed-off-by: Martin Storsjö <martin@martin.st>
Most of the changes are just trivial are just trivial replacements of
fields from MpegEncContext with equivalent fields in H264Context.
Everything in h264* other than h264.c are those trivial changes.
The nontrivial parts are:
1) extracting a simplified version of the frame management code from
mpegvideo.c. We don't need last/next_picture anymore, since h264 uses
its own more complex system already and those were set only to appease
the mpegvideo parts.
2) some tables that need to be allocated/freed in appropriate places.
3) hwaccels -- mostly trivial replacements.
for dxva, the draw_horiz_band() call is moved from
ff_dxva2_common_end_frame() to per-codec end_frame() callbacks,
because it's now different for h264 and MpegEncContext-based
decoders.
4) svq3 -- it does not use h264 complex reference system, so I just
added some very simplistic frame management instead and dropped the
use of ff_h264_frame_start(). Because of this I also had to move some
initialization code to svq3.
Additional fixes for chroma format and bit depth changes by
Janne Grunau <janne-libav@jannau.net>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Move some functions from dsputil. The idea is that videodsp contains
functions that are useful for a large and varied set of video decoders.
Currently, it contains emulated_edge_mc() and prefetch().
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
This prevents undefined behaviour of signed left shift if the coded
value is larger than 2^31. Large values are most likely invalid and
caused errors or by feeding random.
Validate every use of svq3_get_ue_golomb() and changed the place there
the return value was compared with negative numbers. dirac.c was clean,
fixed rv30 and svq3.
Fixes:
libavcodec/svq3.c:661:9: warning: passing argument 2 of 'svq3_decode_block' from incompatible pointer type
libavcodec/svq3.c:208:19: note: expected 'DCTELEM *' but argument is of type 'DCTELEM (*)[32]'
Signed-off-by: Mans Rullgard <mans@mansr.com>
Also break some long lines, remove codec function placeholder comments
and add spaces in sample/pixel format lists.
Signed-off-by: Martin Storsjö <martin@martin.st>
Results of IDCT can by far outreach the range of ff_cropTbl[], leading
to overreads and potentially crashes.
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
Conversion of the luma intra prediction mode to one of the constrained
("alzheimer") ones can happen by crafting special bitstreams, causing
a crash because we'll call a NULL function pointer for 16x16 block intra
prediction, since constrained intra prediction functions are only
implemented for chroma (8x8 blocks).
Found-by: Mateusz "j00ru" Jurczyk and Gynvael Coldwind
CC: libav-stable@libav.org
FF_COMMON_FRAME holds the contents of the AVFrame structure and is also copied
to struct Picture. Replace by an embedded AVFrame structure in struct Picture.
If done before, some parameters aren't known yet.
With svq3/rtp, initializing before some parameters are known
can lead to calling av_malloc(0), which on OS X currently returns
broken pointers.
No speed improvement, but necessary for some future stuff.
Also opens up the possibility of asm chroma dc idct/dequant.
Originally committed as revision 26349 to svn://svn.ffmpeg.org/ffmpeg/trunk
Doesn't help speed as there isn't an asm implementation yet, but consistency
is a good thing.
Originally committed as revision 26348 to svn://svn.ffmpeg.org/ffmpeg/trunk
About 2.5x the speed.
NOTE: the way that the asm code handles large qmuls is a bit suboptimal.
If x264-style dequant was used (separate shift and qmul values), it might
be possible to get some extra speed.
Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk
svq3 still doesn't support multithreading, but it's simpler for clients if
they can enable threading for all codecs by default.
Originally committed as revision 26015 to svn://svn.ffmpeg.org/ffmpeg/trunk
Passing an explicit filename to this command is only necessary if the
documentation in the @file block refers to a file different from the
one the block resides in.
Originally committed as revision 22921 to svn://svn.ffmpeg.org/ffmpeg/trunk
svq3 decoder does not work yet though but i didnt want to keep compilation
broken longer
Originally committed as revision 22060 to svn://svn.ffmpeg.org/ffmpeg/trunk
called once per MB in worst case and doesnt seem to benefit from static inline.
Actually the code might be a hair faster now (0.1% according to my benchmark but
this could be random noise)
Originally committed as revision 21173 to svn://svn.ffmpeg.org/ffmpeg/trunk
functions called more than per mb are moved into the header, scan8 is also
as it must be known at compiletime.
The code after this patch duplicates h264data.h, this has been done to minimize
the changes in this step and allow more fine grained benchmarking.
Speedwise this is 1% faster on my pentium dual core with diegos cursed cathedral
sample.
Originally committed as revision 21157 to svn://svn.ffmpeg.org/ffmpeg/trunk
AVPacket argument rather than a const uint8_t *buf + int buf_size. This allows
passing of packet-specific flags from demuxer to decoder, such as the keyframe
flag, which appears necessary to playback corePNG P-frames.
Patch by Thilo Borgmann thilo.borgmann googlemail com, see also the thread
"Google Summer of Code participation" on the mailinglist.
Originally committed as revision 18351 to svn://svn.ffmpeg.org/ffmpeg/trunk
svq3_decode_slice_header() modifies the buffer used by the bitstream
reader. Some of the bitstream readers cache a few bytes of data, which
must be flushed after such a modification. Calling skip_bits_long(gb, 0)
achieves this.
Originally committed as revision 17680 to svn://svn.ffmpeg.org/ffmpeg/trunk
Otherwise doxygen complains about ambiguous filenames when files exist
under the same name in different subdirectories.
Originally committed as revision 16912 to svn://svn.ffmpeg.org/ffmpeg/trunk