Force termination when the overlay stream ends. Simplify scripting logic,
for example when an infinite source is used to generate a background for
a composite video.
Avoid to write more than one cuepoint per track and PTS in
mkv_write_cues(). This avoids a later assertion failure on "(bytes >=
needed_bytes)" in put_ebml_num() called from end_ebml_master(), in case
there are several cuepoints per track with the same PTS.
This may happen with files containing packets with duplicated PTS in the
same track.
* commit '8a11ce43d08352f7a290355ebb5b29c495ad9609':
build: Ensure that output directories for header objects are created
h264: Get rid of unnecessary casts
Conflicts:
common.mak
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit '7ebfb466aec2c4628fcd42a72b29034efcaba4bc':
h264: Don't store intra pcm samples in h->mb
get_bits: Return pointer to buffer that is the result of the alignment
Conflicts:
libavcodec/h264_mb_template.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'e5ffffe48d20642acc079166f0fa7d93a6a9f594':
h264chroma: Remove duplicate 9/10 bit functions
x86: Use simple nop codes for <= sse (rather than <= mmx)
vp56: Remove clear_blocks call, and clear alpha plane U/V DC only
Merged-by: Michael Niedermayer <michaelni@gmx.at>
change the treatment of the strip y coordinates which previously did
not follow the description (nor did it behave like the binary decoder
on files with absolute strip offsets).
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The new code is also faster and more robust.
As for the performance:
old decoder + conversion to rgb: fps = 2618
old decoder, without converting to rgb: fps = 4012
new decoder, producing rgb: fps = 4502
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This gets rid of a number of warnings about casts discarding
qualifiers from the pointer target, present since 7ebfb466a.
Signed-off-by: Martin Storsjö <martin@martin.st>
Instead, only extend edges on-demand when the motion vector actually
crosses the visible decoded area using ff_emulated_edge_mc(). This
changes decoding time for cathedral from 8.722sec to 8.706sec, i.e.
0.2% faster overall. More generally (VP8 uses this also), low-motion
content gets significant speed improvements, whereas high-motion content
tends to decode in approximately the same time.
Signed-off-by: Martin Storsjö <martin@martin.st>
Instead, keep them in the bitstream buffer until we read them verbatim,
this saves a memcpy() and a subsequent clearing of the target buffer.
decode_cabac+decode_mb for a sample file (CAPM3_Sony_D.jsv) goes from
6121.4 to 6095.5 cycles, i.e. 26 cycles faster.
Signed-off-by: Martin Storsjö <martin@martin.st>
This allows more transparent mixing of get_bits and whole-byte access
without having to touch get_bits internals.
Signed-off-by: Martin Storsjö <martin@martin.st>
These functions are mostly H264-specific (the only other user I can
spot is bink), and this allows us to special-case some functionality
for H264. Also remove the 16-bit-coeff with >8bpp versions (unused)
and merge the duplicate 32-bit-coeff for >8bpp (identical).
Signed-off-by: Martin Storsjö <martin@martin.st>
The "CentaurHauls family 6 model 9 stepping 8" family of CPUs
(flags: fpu vme de pse tsc msr cx8 sep mtrr pge mov pat mmx fxsr sse
up rng rng_en ace ace_en) SIGILLs on long nop codes.
Signed-off-by: Martin Storsjö <martin@martin.st>
The non-alpha and alpha-Y planes are cleared in the idct_put/add()
calls. For the alpha U/V planes, we only care about the DC for entropy
context prediction purposes, the rest of the data is unused.
Signed-off-by: Martin Storsjö <martin@martin.st>
The non-intra-pcm branch in hl_decode_mb (simple, 8bpp) goes from 700
to 672 cycles, and the complete loop of decode_mb_cabac and hl_decode_mb
(in the decode_slice loop) goes from 1759 to 1733 cycles on the clip
tested (cathedral), i.e. almost 30 cycles per mb faster.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>