Move the loop counter decrement further from the branch instruction,
this hides the latency of the decrement.
In loops that first load, then store (the horizontal prediction cases),
do the decrement after the load (where the next instruction would
stall a bit anyway, waiting for the result of the load).
In loops that store twice using the same destination register,
also do the decrement between the two stores (as the second store
would need to wait for the updated destination register from the
first instruction).
In loops that store twice to two different destination registers,
do the decrement before both stores, to do it as soon before the
branch as possible.
This gives minor (1-2 cycle) speedups in most cases (modulo measurement
noise), but the horizontal prediction functions get a rather notable
speedup on the Cortex A53.
Before: Cortex A53 A72 A73
pred8x8_dc_8_neon: 60.7 46.2 39.2
pred8x8_dc_128_8_neon: 30.7 18.0 14.0
pred8x8_horizontal_8_neon: 42.2 29.2 18.5
pred8x8_left_dc_8_neon: 52.7 36.2 32.2
pred8x8_mad_cow_dc_0l0_8_neon: 48.2 27.7 25.7
pred8x8_mad_cow_dc_0lt_8_neon: 52.5 33.2 34.7
pred8x8_mad_cow_dc_l0t_8_neon: 52.5 31.7 33.2
pred8x8_mad_cow_dc_l00_8_neon: 43.2 27.0 25.5
pred8x8_plane_8_neon: 112.2 86.2 88.2
pred8x8_top_dc_8_neon: 40.7 23.0 21.2
pred8x8_vertical_8_neon: 27.2 15.5 14.0
pred16x16_dc_8_neon: 91.0 73.2 70.5
pred16x16_dc_128_8_neon: 43.0 34.7 30.7
pred16x16_horizontal_8_neon: 86.0 49.7 44.7
pred16x16_left_dc_8_neon: 87.0 67.2 67.5
pred16x16_plane_8_neon: 236.0 175.7 173.0
pred16x16_top_dc_8_neon: 53.2 39.0 41.7
pred16x16_vertical_8_neon: 41.7 29.7 31.0
After:
pred8x8_dc_8_neon: 59.0 46.7 42.5
pred8x8_dc_128_8_neon: 28.2 18.0 14.0
pred8x8_horizontal_8_neon: 34.2 29.2 18.5
pred8x8_left_dc_8_neon: 51.0 38.2 32.7
pred8x8_mad_cow_dc_0l0_8_neon: 46.7 28.2 26.2
pred8x8_mad_cow_dc_0lt_8_neon: 55.2 33.7 37.5
pred8x8_mad_cow_dc_l0t_8_neon: 51.2 31.7 37.2
pred8x8_mad_cow_dc_l00_8_neon: 41.7 27.5 26.0
pred8x8_plane_8_neon: 111.5 86.5 89.5
pred8x8_top_dc_8_neon: 39.0 23.2 21.0
pred8x8_vertical_8_neon: 27.2 16.0 14.0
pred16x16_dc_8_neon: 85.0 70.2 70.5
pred16x16_dc_128_8_neon: 42.0 30.0 30.7
pred16x16_horizontal_8_neon: 66.5 49.5 42.5
pred16x16_left_dc_8_neon: 81.0 66.5 67.5
pred16x16_plane_8_neon: 235.0 175.7 173.0
pred16x16_top_dc_8_neon: 52.0 39.0 41.7
pred16x16_vertical_8_neon: 40.2 33.2 31.0
Despite this, a number of these functions still are slower than
what e.g. GCC 7 generates - this shows the relative speedup of the
neon codepaths over the compiler generated ones:
Cortex A53 A72 A73
pred8x8_dc_8_neon: 0.86 0.65 1.04
pred8x8_dc_128_8_neon: 0.59 0.44 0.62
pred8x8_horizontal_8_neon: 1.51 0.58 1.30
pred8x8_left_dc_8_neon: 0.72 0.56 0.89
pred8x8_mad_cow_dc_0l0_8_neon: 0.93 0.93 1.37
pred8x8_mad_cow_dc_0lt_8_neon: 1.37 1.41 1.68
pred8x8_mad_cow_dc_l0t_8_neon: 1.21 1.17 1.32
pred8x8_mad_cow_dc_l00_8_neon: 1.24 1.19 1.60
pred8x8_plane_8_neon: 3.36 3.58 3.76
pred8x8_top_dc_8_neon: 0.97 0.99 1.43
pred8x8_vertical_8_neon: 0.86 0.78 1.18
pred16x16_dc_8_neon: 1.20 1.06 1.49
pred16x16_dc_128_8_neon: 0.83 0.95 0.99
pred16x16_horizontal_8_neon: 1.78 0.96 1.59
pred16x16_left_dc_8_neon: 1.06 0.96 1.32
pred16x16_plane_8_neon: 5.78 6.49 7.19
pred16x16_top_dc_8_neon: 1.48 1.53 1.94
pred16x16_vertical_8_neon: 1.39 1.34 1.98
In particular, on Cortex A72, many of these functions are slower
than the compiler generated code, while they're more beneficial on
e.g. the Cortex A73.
Signed-off-by: Martin Storsjö <martin@martin.st>
Currently, the DASH demuxer omits the final segment for a non-live
stream (using SegmentTemplate) if it is shorter than the other segments.
Correct calc_max_seg_no to round up when calulating the number of
segments instead of rounding down to resolve this issue.
Signed-off-by: Matt Robinson <git@nerdoftheherd.com>
The MJPEG encoder supports some pixel format/color range combinations
only when strictness is set to unofficial or less. Before commit
059fc2d9da said encoder's pix_fmts array
only included the pixel formats supported with default strictness.
When strictness was <= unofficial, fftools/ffmpeg_filter.c used
an extended list of pixel formats instead of the encoder's including
the pixel formats only supported when strictness <= unofficial.
Said commit turned the logic around: The encoder's pix_fmts array now
included all pixel formats and fftools/ffmpeg_filter.c instead used
a small list of all pixel formats supported when strictness is >
unofficial and the encoder's pixel formats instead. In particular,
the codec's pix_fmt is not used when strictness is normal.
This works for the mjpeg encoder; yet it did not work for other
(hardware-based) mjpeg encoders, because the check for whether one is
using the MJPEG encoder is wrong: It just checks the codec id.
So if one used strict unofficial with a hardware-accelerated MJPEG
encoder before commit 059fc2d9da, the unofficial (non-hardware)
pixel formats of the MJPEG encoder would be used; since said commit
the codec's pixel formats are overridden at ordinary strictness
by the ordinary MJPEG pixel formats. This leads to format conversion
errors lateron which were reported in #9186.
The solution to this is to check AVCodec.name instead of its id.
Fixes ticket #9186.
Tested-by: Eoff, Ullysses A <ullysses.a.eoff@intel.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This encoder has AVCodec.pix_fmts set, so ff_encode_preinit() already
checks for this.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The value zero for AVPacket.duration means that the duration is unknown,
which in practice means "play this subtitle until overridden by the next
subtitle". Yet for Matroska a BlockGroup with duration zero means
that the subtitle really has a duration zero. "Display until overridden"
is achieved by not setting a duration on the container level at all and
this is achieved by using a SimpleBlock or a BlockGroup without
duration. This commit implements this.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes: left shift of negative value -224
Fixes: 32144/clusterfuzz-testcase-minimized-ffmpeg_dem_MVI_fuzzer-4971479323246592
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Apparently for various image sequences libavformat/utils.c can
calculate rather fancy r_frame_rate values, such as `186/1921`,
and since ffmpeg.c utilizes r_frame_rate for the filter chain
time base, this can quite deteriorate the output frame timing - even
though the user has requested the image sequence to be interpreted
at a specific, constant frame rate.
Most of the codecs just need everything zeroed. Those that don't
are either handled inline during decode, or pull state from
extradata.
Move state reset/init functionality into adpcm_flush(), and
invoke it from adpcm_decode_init().
Signed-off-by: Zane van Iperen <zane@zanevaniperen.com>
The block size is hardcoded, so the buffer size is always known.
Statically allocate the buffer on the stack.
Signed-off-by: Zane van Iperen <zane@zanevaniperen.com>
MPEG-1/2/4 are the only mpegvideo based encoders that support bframes;
yet even the encoders not supporting bframes have options that only make
sense for an encoder that supports bframes; setting any of these options
for such an encoder has no impact on the encoded outcome (but setting
b_strategy to two slows down encoding considerably). So deprecate these
options.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The MPEG-2 encoder is the only mpegvideo-based encoder that supports
embedding a53 side data.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
mpeg_quant may only be set for MPEG-4 and MPEG-2, yet for the latter
it is no option as the code acts as if it were always set.
So deprecate the option for all codecs for which it makes no sense.
Furthermore, given that the code already errors out if the option is set
for a codec that doesn't support it we can restrict the range of
the option for all these codecs without breaking something. This means
that the checks for whether mpeg_quant is set for these codecs can be
removed as soon as AVCodecContext.mpeg_quant is removed.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This has the advantage that one does not waste some allocations
if one errors out because of these checks.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Currently said list contains only the pixel formats that are always
supported irrespective of the range and the value of
strict_std_compliance. This makes the MJPEG encoder an outlier as all
other codecs put all potentially supported pixel formats into said list
and error out if the chosen pixel format is unsupported. This commit
brings it therefore in line with the other encoders.
The behaviour of fftools/ffmpeg_filter.c has been preserved. A more
informed decision would be possible if colour range were available
at this point, but it isn't.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The documentation for AV_PIX_FMT_YUVJ420P reads:
"planar YUV 4:2:0, 12bpp, full scale (JPEG), deprecated in favor of
AV_PIX_FMT_YUV420P and setting color_range"
Yet the LJPEG encoder only accepts full scale yuv420p when strictness is
set to unofficial or lower; with default strictness it emits a nonsense
error message that says that limit range YUV is unofficial. This has
been changed to allow full range yuv420p, yuv422p and yuv444p irrespective
of the level of strictness.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
All encoders using ff_mpv_encode_init() already have pix_fmts set
so that the pixel format is already checked in ff_encode_preinit().
The one exception to this is MJPEG whose check remains.
(Btw: The AVCodec.pix_fmts check for AMV is stricter than the check
in ff_mpv_encode_init().)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Using optimal Huffman tables is not supported for AMV and always
disabled by ff_mpv_encode_init(); therefore one can build
the AMV encoder without mjpegenc_huffman if one adds the necessary
compile-time checks.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Up until now the relevant checks all checked for the existence of the
MJPEG encoder only.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The ProRes encoder allocates huge worst-case buffers just to be safe;
and for huge resolutions (8k in this case) these can be so big that the
number of bits does no longer fit into a (signed 32-bit) int; this means
that one must no longer use the parts of the PutBits API that deal with
bit counters. Yet proresenc_kostya did it, namely for a check about
whether we are already beyond the end. Yet this check is unnecessary
nowadays, because the PutBits API comes with automatic checks (with
a log message and a av_assert2() in put_bits() and an av_assert0() in
flush_put_bits()), so this is unnecessary. So simply remove the check.
Fixes ticket #9173.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>