Only warn if the advanced_editlist option is enabled (it is enabled
by default though) so we don't print one warning for each track, and
demote the warning to AV_LOG_LEVEL_VERBOSE; this message does get
generated whenever parsing a fragmented MP4 file, regardless of
whether the file actually uses multiple edits or not.
Later when parsing the mov structures, the demuxer does warn if
the file did contain multiple edits which would require the
advanced_editlist option enabled for decoding correctly.
Adjust the warning message for the case when the file seemed like it
actually would have needed handling of advanced edit lists, to
reflect the fact that it doesn't help to try set the option as
it has been automatically disabled.
Signed-off-by: Martin Storsjö <martin@martin.st>
Advanced edit list support is entirely broken for fragmented MP4s,
currently. mov_fix_index is never run in mov_build_index, since
in fragmented MP4s the stco, stsz, stts, and stsc boxes have zero
entries, with the index being filled in as each fragment's trun
box is seen.
The result of this is that the skip samples is never set properly,
since half the code thinks it doesn't need to, as advanced_editlist
is enabled, but as mov_fix_index is never called, it doesnt get set.
This means that any edits for e.g. priming are not properly applied
as skip samples side data.
This also means remuxing to fragmented MP4 from progressive MP4 with
lavf will quietly drop the edit list, currently.
Example:
$ ffmpeg -loglevel quiet -advanced_editlist 1 -i non_fragmented.mp4 -f md5 -
MD5=d02d929f8eb4edef624758a298d5f7c6
$ ffmpeg -loglevel quiet -advanced_editlist 0 -i non_fragmented.mp4 -f md5 -
MD5=d02d929f8eb4edef624758a298d5f7c6
$ ffmpeg -loglevel quiet -advanced_editlist 1 -i fragmented.mp4 -f md5 -
MD5=e38b110f586fa886ff94e0ca98a95d59 <-- wrong, extra samples are output instead of being skipped
$ ffmpeg -loglevel quiet -advanced_editlist 0 -i fragmented.mp4 -f md5 -
MD5=d02d929f8eb4edef624758a298d5f7c6
We cannot call mov_fix_index after reading a trun box
since mov_fix_index seems to assume it is only called once, on a
fully complete index, an multiple calls to it don't seem like
they'd work, so the "best" option seems to be disabling advanced
edit list support entirely for the time being, as it is broken
for these types of files.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Fixes some MP4F files which have duration in mdhd set to UINT_MAX instead of zero.
Signed-off-by: Sasi Inguva <isasi@google.com>
Signed-off-by: James Almer <jamrial@gmail.com>
When determining whether a packet should be decrypted,
should use the stsd_id of the fragment where the current packet is located.
Reviewed-by: Zhao Zhili <zhilizhao@tencent.com>
Signed-off-by: Wang Yaqiang <wangyaqiang03@kuaishou.com>
This duration is equal to the longest duration in all track's tkhd atoms, which
may be comprised of the sum of all edit lists in each track. Empty edit lists
in tracks represent start_time, and the actual media duration is stored in the
mdhd atom.
This change lets the generic demux code derive the longest track duration taken
from mdhd atoms, so the correct duration and start_time combination will be
reported.
Should fix ticket #9775.
Reviewed-by: zhilizhao(赵志立) <quinkblack@foxmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
The mov demuxer only returns DV audio, video packets are discarded.
It first reads the data to be parsed into a packet. Then both this
packet and the pointer to its data are passed together to
avpriv_dv_produce_packet(), which parses the data and partially
overwrites the packet. This is confusing and potentially dangerous, so
just pass NULL and avoid pointless packet modification.
Fixes: ffmpeg.md
Fixes: Out of array access
Fixes: CVE-2022-2566
Found-by: Andy Nguyen <theflow@google.com>
Found-by: 3pvd <3pvd@google.com>
Reviewed-by: Andy Nguyen <theflow@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Some muxers, such as GPAC, create files with only one sidx, but two streams
muxed into the same fragments pointed to by this sidx.
Prevously, in such a case, when we seeked in such files, we fell back
to, for example, using the sidx associated with the video stream, to
seek the audio stream, leaving the seekhead in the wrong place.
We can still do this, but we need to take care to compare timestamps
in the same time base.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
frag_stream_info->index_entry isn't the first sample/trun index.
cenc.frag_index_entry_base failed to catch the case since
current_index > 0.
Fix ticket #9807.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
frag_index.current is used by cenc_filter, and is updated inside
mov_read_moof. It can out of sync regarding to mov_read_packet.
Partly fix ticket #9807.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Stores the item ids of all the items found in the file and
processes the primary item at the end of the meta box. This patch
does not change any behavior. It sets up the code for parsing
alpha channel (and possibly images with 'grid') in follow up
patches.
Reviewed-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: Vignesh Venkatasubramanian <vigneshv@google.com>
Signed-off-by: James Zern <jzern@google.com>
Streams with all zero sample_delta in 'stts' have all zero dts.
They have higher chance be chose by mov_find_next_sample(), which
leads to seek again and again.
For example, GoPro created a 'GoPro SOS' stream:
Stream #0:4[0x5](eng): Data: none (fdsc / 0x63736466), 13 kb/s (default)
Metadata:
creation_time : 2022-06-21T08:49:19.000000Z
handler_name : GoPro SOS
With 'ffprobe -show_frames http://example.com/gopro.mp4', ffprobe
blocks until all samples in 'GoPro SOS' stream are consumed first.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Data does not have to be decrypted in 16-byte blocks for AES-CTR mode, so
existing buggy code can be hugely simplified.
Fixes ticket #9829.
Signed-off-by: Marton Balint <cus@passwd.hu>
In order to not generate 0 sized packets or create a huge index table
needlessly.
Fixes: Timeout
Fixes: 43717/clusterfuzz-testcase-minimized-ffmpeg_IO_DEMUXER_fuzzer-5206008287330304
Fixes: 45738/clusterfuzz-testcase-minimized-ffmpeg_IO_DEMUXER_fuzzer-6142535657979904
Signed-off-by: Marton Balint <cus@passwd.hu>
Update the still AVIF parser to only read the primary item. With this
patch, AVIF still images with exif/icc/alpha channel will no longer
fail to parse.
For example, this patch enables parsing of files in:
https://github.com/AOMediaCodec/av1-avif/tree/master/testFiles/Microsoft
Adding two fate tests:
1) demuxing of still image with 1 item - this test will pass regardless
of this patch.
2) demuxing of still image with 2 items - this test will fail without
this patch and will pass with patch applied.
Partially fixes trac ticket #7621
Signed-off-by: Vignesh Venkatasubramanian <vigneshv@google.com>
Signed-off-by: James Zern <jzern@google.com>
For ipcm and fpcm streams, big-endian format is the default, but it can be changed
with additional 'pcmC' sub-atom of audio sample description.
Details can be found in ISO/IEC 23003-5:2020
Fixes ticket #9763.
Fixes ticket #9790.
Patch simplified by Marton Balint.
Signed-off-by: Marton Balint <cus@passwd.hu>
Fixes: signed integer overflow: 536870913 * 536870913 cannot be represented in type 'int'
Fixes: 45862/clusterfuzz-testcase-minimized-ffmpeg_dem_MOV_fuzzer-4730373768085504
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
ff_codec_get_id loops over ff_codec_movvideo_tags (which is a large
array) two times. The result is unused most of the cases.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
The stsc_index is checked and updated for the next sample. If the
next sample needs to update stsd_index and stsc_index, then only
stsc_index is updated, which leads to a missing
AV_PKT_DATA_NEW_EXTRADATA. For example, the sample in the second
chunk needs to update both.
entry[0]
first_chunk = 1
samples_per_chunk = 3
sample_description_index = 1
entry[1]
first_chunk = 2
samples_per_chunk = 1
sample_description_index = 2
entry[2]
first_chunk = 3
samples_per_chunk = 8
sample_description_index = 2
The fix is simple: first check and update stsd_index for current
sample, then check and update stsc_index for the next.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
AVIF still and animations are now supported by the MOV parser.
Add the "avif" extension to the list of supported extensions to
AVInputFormat.
Signed-off-by: Vignesh Venkatasubramanian <vigneshv@google.com>
This patch supports AVIF still images conforming to the
final specification that have exactly one item (i.e. no alpha channel).
The iloc box is parsed and the mov index populated.
Partially fixes#7621.
Signed-off-by: Vignesh Venkatasubramanian <vigneshv@google.com>
Signed-off-by: Gyan Doshi <ffmpeg@gyani.pro>
60 fps content have "Number of Frames" set to 30 in the tmcd atom, but the
frame duration / timescale reflects the original video frame rate.
Therefore we multiply the frame count with the quotient of the rounded timecode
frame rate and the "Number of Frames" per second to get a frame count in the original
(higher) frame rate.
Note that the frames part in the timecode will be in high frame rate which will
make the timecode different to e.g. MediaInfo which seems to show the 30 fps
timecode even for 120 fps content.
Regression since 428b4aacb1.
Fixes ticket #9710.
Fixes ticket #9492.
Signed-off-by: Marton Balint <cus@passwd.hu>
This avoids unnecessary rebuilds of most source files if only the
list of enabled components has changed, but not the other properties
of the build, set in config.h.
Signed-off-by: Martin Storsjö <martin@martin.st>
As defined by Google's Spatial Audio RFC.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: James Almer <jamrial@gmail.com>
This was tested with medias recorded from an iPhone XR and an iPhone 13.
Here is how a typical stream looks like in coding order:
┌────────┬─────┬─────┬──────────┐
│ sample | PTS | DTS | keyframe |
├────────┼─────┼─────┼──────────┤
┊ ┊ ┊ ┊ ┊
│ 53 │ 560 │ 510 │ No │
│ 54 │ 540 │ 520 │ No │
│ 55 │ 530 │ 530 │ No │
│ 56 │ 550 │ 540 │ No │
│ 57 │ 600 │ 550 │ Yes │
│ * 58 │ 580 │ 560 │ No │
│ * 59 │ 570 │ 570 │ No │
│ * 60 │ 590 │ 580 │ No │
│ 61 │ 640 │ 590 │ No │
│ 62 │ 620 │ 600 │ No │
┊ ┊ ┊ ┊ ┊
In composition/display order:
┌────────┬─────┬─────┬──────────┐
│ sample | PTS | DTS | keyframe |
├────────┼─────┼─────┼──────────┤
┊ ┊ ┊ ┊ ┊
│ 55 │ 530 │ 530 │ No │
│ 54 │ 540 │ 520 │ No │
│ 56 │ 550 │ 540 │ No │
│ 53 │ 560 │ 510 │ No │
│ * 59 │ 570 │ 570 │ No │
│ * 58 │ 580 │ 560 │ No │
│ * 60 │ 590 │ 580 │ No │
│ 57 │ 600 │ 550 │ Yes │
│ 63 │ 610 │ 610 │ No │
│ 62 │ 620 │ 600 │ No │
┊ ┊ ┊ ┊ ┊
Sample/frame 58, 59 and 60 are B-frames which actually depends on the
key frame (57). Here the key frame is not an IDR but a "CRA" (Clean
Random Access).
Initially, I thought I could rely on the sdtp box (independent and
disposable samples), but unfortunately:
sdtp[54] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0
sdtp[55] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
sdtp[56] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
sdtp[57] is_leading:0 sample_depends_on:2 sample_is_depended_on:0 sample_has_redundancy:0
sdtp[58] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0
sdtp[59] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
sdtp[60] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
sdtp[61] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0
sdtp[62] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0
The information that might have been useful here would have been
is_leading, but all the samples are set to 0 so this was unusable.
Instead, we need to rely on sgpd/sbgp tables. In my case the video track
contained 3 sgpd tables with the following grouping types: tscl, sync
and tsas. In the sync table we have the following 2 entries (only):
sgpd.sync[1]: sync nal_unit_type:0x14
sgpd.sync[2]: sync nal_unit_type:0x15
(The count starts at 1 because 0 carries the undefined semantic, we'll
see that later in the reference table).
The NAL unit types presented here correspond to:
libavcodec/hevc.h: HEVC_NAL_IDR_N_LP = 20,
libavcodec/hevc.h: HEVC_NAL_CRA_NUT = 21,
In parallel, the sbgp sync table contains the following:
┌────┬───────┬─────┐
│ id │ count │ gdi │
├────┼───────┼─────┤
│ 0 │ 1 │ 1 │
│ 1 │ 56 │ 0 │
│ 2 │ 1 │ 2 │
│ 3 │ 59 │ 0 │
│ 4 │ 1 │ 2 │
│ 5 │ 59 │ 0 │
│ 6 │ 1 │ 2 │
│ 7 │ 59 │ 0 │
│ 8 │ 1 │ 2 │
│ 9 │ 59 │ 0 │
│ 10 │ 1 │ 2 │
│ 11 │ 11 │ 0 │
└────┴───────┴─────┘
The gdi column (group description index) directly refers to the index in
the sgpd.sync table. This means the first frame is an IDR, then we have
batches of undefined frames interlaced with CRA frames. No IDR ever
appears again (tried on a 30+ seconds sample).
With that information, we can build an heuristic using the presentation
order.
A few things needed to be introduced in this commit:
1. min_sample_duration is extracted from the stts: we need the minimal
step between sample in order to PTS-step backward to a valid point
2. In order to avoid a loop over the ctts table systematically during a
seek, we build an expanded list of sample offsets which will be used
to translate from DTS to PTS
3. An open_key_samples index to keep track of all the non-IDR key
frames; for now it only supports HEVC CRA frames. We should probably
add BLA frames as well, but I don't have any sample so I prefered to
leave that for later
It is entirely possible I missed something obvious in my approach, but I
couldn't come up with a better solution. Also, as mentioned in the diff,
we could optimize is_open_key_sample(), but the linear scaling overhead
should be fine for now since it only happens in seek events.
Fixing this issue prevents sending broken packets to the decoder. With
FFmpeg hevc decoder the frames are skipped, with VideoToolbox the frames
are glitching.
sgpd means Sample Group Description Box.
For now, only the sync grouping type is parsed, but the function can
easily be adjusted to support other flavours.
The sbgp (Sample to Group Box) sync_group table built in previous commit
contains references to this table through the group_description_index
field.
It appears this is not allowed "Each Segment Index box documents how a (sub)segment is divided into one or more subsegments
(which may themselves be further subdivided using Segment Index boxes)."
Fixes: Null pointer dereference
Fixes: Ticket9517
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: -9223372036854775808 - 8 cannot be represented in type 'long'
Fixes: 43542/clusterfuzz-testcase-minimized-ffmpeg_dem_MOV_fuzzer-5237670148702208
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
MOVAtom.type is always read as a little-endian number
(despite MOV/ISOBMFF being big-endian).
Fixes the matroska-dovi-write-config8 FATE-test on big-endian
arches (which runs into the "index out of range" warning message).
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is small (16 B) and therefore the overhead of exporting it more
than outweighs the size savings from not having duplicated symbols:
When the symbol is no longer avpriv, one saves twice the size of
the string containing the symbols name (2x30 byte), two entries
in .dynsym (24 bytes each on x64), one entry in the importing libraries
.got and .rela.dyn (8 + 24 bytes on x64) and two entries for the
symbol version (2 bytes each) and one hash value in the exporting
library (4 bytes).
(The exact numbers are of course different for other platforms
(e.g. when using dlls), but given that the strings saved alone
more than outweigh the array size it can be presumed that this
is beneficial for all platforms.)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
To avoid duplicating code. The implementation in dovi_isom is identical.
Signed-off-by: quietvoid <tcChlisop0@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Very high stts sample deltas may occasionally be intended but usually
they are written in error or used to store a negative value for dts correction
when treated as signed 32-bit integers.
This option lets the user set an upper limit, beyond which the delta is clamped to 1.
Values greater than the limit if negative when cast to int32 are used to adjust onward dts.
Unit is the track time scale. Default is UINT_MAX - 48000*10 which
allows upto a 10 second dts correction for 48 kHz audio streams while
accommodating 99.9% of uint32 range.
Signed-off-by: Gyan Doshi <ffmpeg@gyani.pro>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>