This proved beneficial for performance: For the sample [1] the number
of decicycles in one decode call decreased from 155851561 to 108158037
for Clang 10 and from 168270467 to 128847479 for GCC 9.3. For x86-32
compiled with GCC 9.3 and run on an x64 Haswell the number increased
from 158405517 to 202215769, so that the cached bitstream reader is only
enabled if HAVE_FAST_64BIT is set. These values are the average of 10
runs each looping five times over the input.
[1]: samples.ffmpeg.org/ffmpeg-bugs/trac/ticket2593/fraps_flv1_decoding_errors.avi
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
The fraps decoder already checked for overreads manually (and errored
out in this scenario), yet it still enabled implicit checks, leading to
worse performance and more code size.
This commit disables the implicit bitstream reader checks. For the
sample [1] this improves performance from 195105896 to 155851561
decicycles for Clang 10 and from 222801887 to 168270467 decicycles when
compiled with GCC 9.3. These values are the average of 10 runs each
looping ten times over the input.
[1]: samples.ffmpeg.org/ffmpeg-bugs/trac/ticket2593/fraps_flv1_decoding_errors.avi
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Unused since 3594788b713e76449eda0bc9d64b38258c86a594.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
The Ut video format uses Huffman trees which are only implicitly coded
in the bitstream: Only the lengths of the codes are coded, the rest has
to be inferred by the decoder according to the rule that the longer
codes are to the left of shorter codes in the tree and on each level the
symbols are descending from left to right.
Because longer codes are to the left of shorter codes, one needs to know
how many non-leaf nodes there are on each level in order to know the
code of the next left-most leaf (which belongs to the highest symbol on
that level). The current code does this by sorting the entries to be
ascending according to length and (for entries with the same length)
ascending according to their symbols. This array is then traversed in
reverse order, so that the lowest level is dealt with first, so that the
number of non-leaf nodes of the next higher level is known when
processing said level.
But this can also be calculated without sorting: Simply count how many
leaf nodes there are on each level. Then one can calculate the number of
non-leaf nodes on each level iteratively from the lowest level upwards:
It is just half the number of nodes of the level below.
This improves performance: For the sample from ticket #4044 the amount
of decicycles for one call to build_huff() decreased from 1055489 to
446310 for Clang 10 and from 1080306 to 535155 for GCC 9.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
The Ut Video format stores Huffman tables in its bitstream by coding
the length of a given symbol; it does not code the actual code directly,
instead this is to be inferred by the rule that a symbol is to the left
of every shorter symbol in the Huffman tree and that for symbols of the
same length the symbol is descending from left to right. With one
exception, this is also what our de- and encoder did.
The exception only matters when there are codes of length 32, because
in this case the first symbol of this length did not get the code 0,
but 1; this is tantamount to pretending that there is a (nonexistent)
leaf of length 32. This is simply false. The reference software agrees
with this [1].
[1]: 2700a471a7/utv_core/HuffmanCode.cpp (L280)
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Now that the HuffEntries are no longer sorted by the MagicYUV decoder,
their symbols are trivial: The symbol of the element with index i is i.
They can therefore be removed. Furthermore, despite the length of the
codes being in the range 1..32 bits, the actual value of the codes is
<= 4096 (for 12 bit content). The reason for this is that the longer
codes are on the left side of the tree, so that the higher bits of
these codes are simply zero. By using an uint16_t for the codes and
removing the symbols entry, the size of each HuffEntry is decreased from
eight to four, saving 16KB of stack space.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
The MagicYUV format stores Huffman tables in its bitstream by coding
the length of a given symbol; it does not code the actual code directly,
instead this is to be inferred by the rule that a symbol is to the left
of every shorter symbol in the Huffman tree and that for symbols of the
same length the symbol is ascending from left to right.
Our decoder implemented this by first sorting the array containing
length and symbol of each element according to descending length and
for equal length, according to ascending symbol. Afterwards, the current
state in the tree got encoded in a variable code; if the next array entry
had length len, then the len most significant bits of code contained
the code of this entry. Whenever an entry of the array of length
len was processed, code was incremented by 1U << (32 - len). So two
entries of length len have the same effect as incrementing code by
1U << (32 - (len - 1)), which corresponds to the parent node of length
len - 1 of the two nodes of length len etc.
This commit modifies this to avoid sorting the entries before
calculating the codes. This is done by calculating how many non-leaf
nodes there are on each level of the tree before calculating the codes.
Afterwards every leaf node on this level gets assigned the number of
nodes already on this level as code. This of course works only because
the entries are already sorted by their symbol initially, so that this
algorithm indeed gives ascending symbols from left to right on every
level.
This offers both speed- as well as (obvious) codesize advantages. With
Clang 10 the number of decicycles for build_huffman decreased from
1561987 to 1228405; for GCC 9 it went from 1825096 decicyles to 1429921.
These tests were carried out with a sample with 150 frames that was
looped 13 times; and this was iterated 10 times. The earlier reference
point here is from the point when the loop generating the codes was
traversed in reverse order (as the patch reversing the order led to
performance penalties).
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
The MagicYUV format stores Huffman tables in its bitstream by coding
the length of a given symbol; it does not code the actual code directly,
instead this is to be inferred by the rule that a symbol is to the left
of every shorter symbol in the Huffman tree and that for symbols of the
same length the symbol is ascending from left to right. With one
exception, this is also what our decoder did.
The exception only matters when there are codes of length 32, because
in this case the first symbol of this length did not get the code 0,
but 1; e.g. if there were exactly two nodes of length 32, then they
would get assigned the codes 1 and 2 and a node of length 31 will get
the 31-bit code 1 which is a prefix of the 32 bit code 2, making the
Huffman table invalid. On the other hand, if there were only one symbol
with the length 32, the earlier code would accept this un-Huffman-tree.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
The MagicYUV decoder currently sets both the length and the symbol field
of an array of HuffEntries; hereby the symbol of the ith entry (0-based)
is just i. Then said array gets sorted so that entries with greater
length are at the end and entries with the same length are ordered so
that those with smaller symbols are at the end. Afterwards the newly
sorted array is traversed in reverse order. This commit instead inverts
the ordering and traverses the array in its ordinary order in order to
simplify understanding.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Every plane of each slice has to contain at least two bytes for flags
and the type of prediction used.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Forgotten in ca3c6c981aa5b0af8a5576020b79fdd3cdf9ae9e.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Fixes: left shift of negative value -768
Fixes: 25574/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DXTORY_fuzzer-6012596027916288
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: left shift of negative value -256
Fixes: 25460/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DXTORY_fuzzer-5073252341514240
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: out of array read
Fixes: 25455/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_DXTORY_fuzzer-6327985731534848
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
AV1 decoder is supported on Tiger Lake+ platforms since libmfx 1.34
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Signed-off-by: Zhong Li <zhongli_dev@126.com>
Fixes: shift exponent -2 is negative
Fixes: 25683/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_MOBICLIP_fuzzer-6434808492982272
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Since release 4.2, FFmpeg fails to detect the correct streams in an RTMP
stream that contains a |RtmpSampleAccess AMF object prior to the
onMetaData AMF object. In the debug log it would show "[flv] Unknown
type |RtmpSampleAccess".
This functionality broke in commit d7638d8dfc3c4ffd0dc18a64937a5a07ed67b354
as unknown metadata packets now result in an opaque data stream, and the
|RtmpSampleAccess packet was an "unknown" metadata packet type.
With this change the RTMP streams are correctly detected when there
is a |RtmpSampleAccess object prior to the onMetaData object.
Signed-off-by: Peter van der Spek <p.vanderspek@bluebillywig.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fix the issue: https://github.com/intel/media-driver/issues/317
the root cause is update_dimensions will be called multple times
when decoder thread number is not only 1, but update_dimensions
call get_pixel_format in each decode thread will trigger the
hwaccel_uninit/hwaccel_init more than once. But only one hwaccel
should be shared with all decode threads.
in current context,
there are 3 situations in the update_dimensions():
1. First time calling. No matter single thread or multithread,
get_pixel_format() should be called after dimensions were
set;
2. Dimention changed at the runtime. Dimention need to be
updated when macroblocks_base is already allocated,
get_pixel_format() should be called to recreate new frames
according to updated dimension;
3. Multithread first time calling. After decoder init, the
other threads will call update_dimensions() at first time
to allocate macroblocks_base and set dimensions.
But get_pixel_format() is shouldn't be called due to low
level frames and context are already created.
In this fix, we only call update_dimensions as need.
Signed-off-by: Wang, Shaofei <shaofei.wang@intel.com>
Reviewed-by: Jun, Zhao <jun.zhao@intel.com>
Reviewed-by: Haihao Xiang <haihao.xiang@intel.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
As it was brought up that the current documentation leaves things
as specific to YCbCr only, ICtCp and RGB are now mentioned.
Additionally, the specifications on which these definitions of
narrow and full range are defined are mentioned.
This way, the documentation of AVColorRange should now match how
most people seem to read interpret it at this point, and thus
flagging RGB AVFrames as full range is valid not only according to
common sense, but also the enum definition.
The earlier code would first attempt to allocate two buffers, then
attempt to allocate an AVIOContext, using one of the new buffers I/O
buffer, then check the allocations. On success, a z_stream that is used
in the AVIOContext's read_packet callback is initialized afterwards.
There are two problems with this: In case the allocation of the I/O
buffer fails avio_alloc_context() will be given a NULL read buffer
with a size > 0. This works right now, but it is fragile. The second
problem is that the z_stream used in the read_packet callback is not
functional when avio_alloc_context() is allocated (it might be that
avio_alloc_context() might already fill the buffer in the future). This
commit fixes both of these problems by reordering the operations.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Causes some error as the ADPCM predictors aren't known, but
the difference is negligible and not audible.
Signed-off-by: Zane van Iperen <zane@zanevaniperen.com>
If the average bit rate cannot be calculated, such as in the case
of streamed fragmented mp4, utilize various available parameters
in priority order.
Tests are updated where the esds or btrt or ISML manifest boxes'
output changes.
This is utilized by various media ingests to figure out the bit
rate of the content you are pushing towards it, so write it for
video, audio and subtitle tracks in case at least one nonzero value
is available. It is only mentioned for timed metadata sample
descriptions in QTFF, so limit it only to ISOBMFF (MODE_MP4) mode.
Updates the FATE tests which have their results changed due to the
20 extra bytes being written per track.
for some cases (for example, super resolution), the DNN model changes
the frame size which impacts the filter behavior, so the filter needs
to know the out frame size at very beginning.
Currently, the filter reuses DNNModule.execute_model to query the
out frame size, it is not clear from interface perspective, so add
a new explict interface DNNModel.get_output for such query.
suppose we have a detect and classify filter in the future, the
detect filter generates some bounding boxes (BBox) as AVFrame sidedata,
and the classify filter executes DNN model for each BBox. For each
BBox, we need to crop the AVFrame, copy data to DNN model input and do
the model execution. So we have to save the in_frame at DNNModel.set_input
and use it at DNNModule.execute_model, such saving is not feasible
when we support async execute_model.
This patch sets the in_frame as execution_model parameter, and so
all the information are put together within the same function for
each inference. It also makes easy to support BBox async inference.