Parse iprp and iinf boxes and its child boxes to get the actual codec used
(AV1 for avif, HEVC for heic), and properly export extradata and other
properties in a generic way.
The avif tests reference files are updated as the extradata is now exported.
Based on a patch by Swaraj Hota
Co-authored-by: Swaraj Hota <swarajhota353@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
They are similar to AVIF images (both use the HEIF container).
The only additional work needed is to parse the hvcC box and put
it in the extradata.
With this patch applied, ffmpeg (when built with an HEVC decoder)
is able to decode the files in
https://github.com/nokiatech/heif/tree/gh-pages/content/images
Also add a couple of fate tests with samples from
https://github.com/nokiatech/heif_conformance/tree/master/conformance_files
Partially fixes trac ticket #6521.
Signed-off-by: Vignesh Venkatasubramanian <vigneshv@google.com>
Signed-off-by: James Almer <jamrial@gmail.com>
For badly interleaved files, interleave packets from multiple tracks
at the demuxer level can trigger seeking back and forth, which can be
dramatically slow depending on the protocol. Demuxer level interleave
can be useless sometimes, e.g., reading mp4 via http and then
transcoding/remux to DASH. Disable this option when you don't need the
demuxer level interleave, and want to avoid the IO penalizes.
Co-authored-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Only warn if the advanced_editlist option is enabled (it is enabled
by default though) so we don't print one warning for each track, and
demote the warning to AV_LOG_LEVEL_VERBOSE; this message does get
generated whenever parsing a fragmented MP4 file, regardless of
whether the file actually uses multiple edits or not.
Later when parsing the mov structures, the demuxer does warn if
the file did contain multiple edits which would require the
advanced_editlist option enabled for decoding correctly.
Adjust the warning message for the case when the file seemed like it
actually would have needed handling of advanced edit lists, to
reflect the fact that it doesn't help to try set the option as
it has been automatically disabled.
Signed-off-by: Martin Storsjö <martin@martin.st>
When determining whether a packet should be decrypted,
should use the stsd_id of the fragment where the current packet is located.
Reviewed-by: Zhao Zhili <zhilizhao@tencent.com>
Signed-off-by: Wang Yaqiang <wangyaqiang03@kuaishou.com>
frag_stream_info->index_entry isn't the first sample/trun index.
cenc.frag_index_entry_base failed to catch the case since
current_index > 0.
Fix ticket #9807.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Stores the item ids of all the items found in the file and
processes the primary item at the end of the meta box. This patch
does not change any behavior. It sets up the code for parsing
alpha channel (and possibly images with 'grid') in follow up
patches.
Reviewed-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: Vignesh Venkatasubramanian <vigneshv@google.com>
Signed-off-by: James Zern <jzern@google.com>
Update the still AVIF parser to only read the primary item. With this
patch, AVIF still images with exif/icc/alpha channel will no longer
fail to parse.
For example, this patch enables parsing of files in:
https://github.com/AOMediaCodec/av1-avif/tree/master/testFiles/Microsoft
Adding two fate tests:
1) demuxing of still image with 1 item - this test will pass regardless
of this patch.
2) demuxing of still image with 2 items - this test will fail without
this patch and will pass with patch applied.
Partially fixes trac ticket #7621
Signed-off-by: Vignesh Venkatasubramanian <vigneshv@google.com>
Signed-off-by: James Zern <jzern@google.com>
This patch supports AVIF still images conforming to the
final specification that have exactly one item (i.e. no alpha channel).
The iloc box is parsed and the mov index populated.
Partially fixes#7621.
Signed-off-by: Vignesh Venkatasubramanian <vigneshv@google.com>
Signed-off-by: Gyan Doshi <ffmpeg@gyani.pro>
60 fps content have "Number of Frames" set to 30 in the tmcd atom, but the
frame duration / timescale reflects the original video frame rate.
Therefore we multiply the frame count with the quotient of the rounded timecode
frame rate and the "Number of Frames" per second to get a frame count in the original
(higher) frame rate.
Note that the frames part in the timecode will be in high frame rate which will
make the timecode different to e.g. MediaInfo which seems to show the 30 fps
timecode even for 120 fps content.
Regression since 428b4aacb1.
Fixes ticket #9710.
Fixes ticket #9492.
Signed-off-by: Marton Balint <cus@passwd.hu>
This was tested with medias recorded from an iPhone XR and an iPhone 13.
Here is how a typical stream looks like in coding order:
┌────────┬─────┬─────┬──────────┐
│ sample | PTS | DTS | keyframe |
├────────┼─────┼─────┼──────────┤
┊ ┊ ┊ ┊ ┊
│ 53 │ 560 │ 510 │ No │
│ 54 │ 540 │ 520 │ No │
│ 55 │ 530 │ 530 │ No │
│ 56 │ 550 │ 540 │ No │
│ 57 │ 600 │ 550 │ Yes │
│ * 58 │ 580 │ 560 │ No │
│ * 59 │ 570 │ 570 │ No │
│ * 60 │ 590 │ 580 │ No │
│ 61 │ 640 │ 590 │ No │
│ 62 │ 620 │ 600 │ No │
┊ ┊ ┊ ┊ ┊
In composition/display order:
┌────────┬─────┬─────┬──────────┐
│ sample | PTS | DTS | keyframe |
├────────┼─────┼─────┼──────────┤
┊ ┊ ┊ ┊ ┊
│ 55 │ 530 │ 530 │ No │
│ 54 │ 540 │ 520 │ No │
│ 56 │ 550 │ 540 │ No │
│ 53 │ 560 │ 510 │ No │
│ * 59 │ 570 │ 570 │ No │
│ * 58 │ 580 │ 560 │ No │
│ * 60 │ 590 │ 580 │ No │
│ 57 │ 600 │ 550 │ Yes │
│ 63 │ 610 │ 610 │ No │
│ 62 │ 620 │ 600 │ No │
┊ ┊ ┊ ┊ ┊
Sample/frame 58, 59 and 60 are B-frames which actually depends on the
key frame (57). Here the key frame is not an IDR but a "CRA" (Clean
Random Access).
Initially, I thought I could rely on the sdtp box (independent and
disposable samples), but unfortunately:
sdtp[54] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0
sdtp[55] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
sdtp[56] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
sdtp[57] is_leading:0 sample_depends_on:2 sample_is_depended_on:0 sample_has_redundancy:0
sdtp[58] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0
sdtp[59] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
sdtp[60] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
sdtp[61] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0
sdtp[62] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0
The information that might have been useful here would have been
is_leading, but all the samples are set to 0 so this was unusable.
Instead, we need to rely on sgpd/sbgp tables. In my case the video track
contained 3 sgpd tables with the following grouping types: tscl, sync
and tsas. In the sync table we have the following 2 entries (only):
sgpd.sync[1]: sync nal_unit_type:0x14
sgpd.sync[2]: sync nal_unit_type:0x15
(The count starts at 1 because 0 carries the undefined semantic, we'll
see that later in the reference table).
The NAL unit types presented here correspond to:
libavcodec/hevc.h: HEVC_NAL_IDR_N_LP = 20,
libavcodec/hevc.h: HEVC_NAL_CRA_NUT = 21,
In parallel, the sbgp sync table contains the following:
┌────┬───────┬─────┐
│ id │ count │ gdi │
├────┼───────┼─────┤
│ 0 │ 1 │ 1 │
│ 1 │ 56 │ 0 │
│ 2 │ 1 │ 2 │
│ 3 │ 59 │ 0 │
│ 4 │ 1 │ 2 │
│ 5 │ 59 │ 0 │
│ 6 │ 1 │ 2 │
│ 7 │ 59 │ 0 │
│ 8 │ 1 │ 2 │
│ 9 │ 59 │ 0 │
│ 10 │ 1 │ 2 │
│ 11 │ 11 │ 0 │
└────┴───────┴─────┘
The gdi column (group description index) directly refers to the index in
the sgpd.sync table. This means the first frame is an IDR, then we have
batches of undefined frames interlaced with CRA frames. No IDR ever
appears again (tried on a 30+ seconds sample).
With that information, we can build an heuristic using the presentation
order.
A few things needed to be introduced in this commit:
1. min_sample_duration is extracted from the stts: we need the minimal
step between sample in order to PTS-step backward to a valid point
2. In order to avoid a loop over the ctts table systematically during a
seek, we build an expanded list of sample offsets which will be used
to translate from DTS to PTS
3. An open_key_samples index to keep track of all the non-IDR key
frames; for now it only supports HEVC CRA frames. We should probably
add BLA frames as well, but I don't have any sample so I prefered to
leave that for later
It is entirely possible I missed something obvious in my approach, but I
couldn't come up with a better solution. Also, as mentioned in the diff,
we could optimize is_open_key_sample(), but the linear scaling overhead
should be fine for now since it only happens in seek events.
Fixing this issue prevents sending broken packets to the decoder. With
FFmpeg hevc decoder the frames are skipped, with VideoToolbox the frames
are glitching.
sgpd means Sample Group Description Box.
For now, only the sync grouping type is parsed, but the function can
easily be adjusted to support other flavours.
The sbgp (Sample to Group Box) sync_group table built in previous commit
contains references to this table through the group_description_index
field.
Very high stts sample deltas may occasionally be intended but usually
they are written in error or used to store a negative value for dts correction
when treated as signed 32-bit integers.
This option lets the user set an upper limit, beyond which the delta is clamped to 1.
Values greater than the limit if negative when cast to int32 are used to adjust onward dts.
Unit is the track time scale. Default is UINT_MAX - 48000*10 which
allows upto a 10 second dts correction for 48 kHz audio streams while
accommodating 99.9% of uint32 range.
Signed-off-by: Gyan Doshi <ffmpeg@gyani.pro>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
As per 8.6.1.2.2 of ISO/IEC 14496-12:2015(E), STTS sample offsets
are to be always stored as uint32_t. So far, they have been signed ints
which led to desync in files with very large offsets.
The MOVStts struct was used to store CTTS offsets as well. These can be
negative in version 1. So a new struct MOVCtts was created and all
declarations for CTTS usage changed to MOVCtts.
correct implementation of 'cenc' encryption scheme to support
decryption of partial cipher blocks at the end of subsamples
https://www.iso.org/standard/68042.html
Signed-off-by: Nachiket Tarate <nachiket.programmer@gmail.com>
Signed-off-by: Steven Liu <lq@chinaffmpeg.org>
This information is coded in a standard MP4 KindBox and utilizes the
scheme and values as per the DASH role scheme defined in MPEG-DASH.
Other schemes are technically allowed, but where multiple schemes
define the same concepts, the DASH scheme should be utilized.
Such flagging is additionally utilized by the DASH-IF CMAF ingest
specification, enabling an encoder to inform the following component
of the roles of the incoming media streams.
A test is added for this functionality in a similar manner to the
matroska test.
Signed-off-by: Jan Ekström <jan.ekstrom@24i.com>
Includes basic support for both the ISMV ('dfxp') and MP4 ('stpp')
methods. This initial version also foregoes fragmentation support
in case the built-in sample squashing is to be utilized, as this
eases the initial review.
Additionally, add basic tests for both muxing modes in MP4.
Signed-off-by: Jan Ekström <jan.ekstrom@24i.com>
The AAXC container format is the same as the (already supported) Audible
AAX format but it uses a different encryption scheme.
Note: audible_key and audible_iv values are variable (per file) and are
externally fed.
It is possible to extend https://github.com/mkb79/Audible to derive the
audible_key and audible_key values.
Relevant code:
def decrypt_voucher(deviceSerialNumber, customerId, deviceType, asin, voucher):
buf = (deviceType + deviceSerialNumber + customerId + asin).encode("ascii")
digest = hashlib.sha256(buf).digest()
key = digest[0:16]
iv = digest[16:]
# decrypt "voucher" using AES in CBC mode with no padding
cipher = AES.new(key, AES.MODE_CBC, iv)
plaintext = cipher.decrypt(voucher).rstrip(b"\x00") # improve this!
return json.loads(plaintext)
The decrypted "voucher" has the required audible_key and audible_iv
values.
Update (Nov-2020): This patch has now been tested by multiple folks -
details at the following URL:
https://github.com/mkb79/Audible/issues/3
Signed-off-by: Vesselin Bontchev <vesselin.bontchev@yandex.com>
On files with more than one sidx box, like live fragmented MP4
files, it was previously re-reading and seeking on every singl
sidx box, leading to extremely poor performance on larger files,
especially over the network.
Only do it on the first one, and stash its result.
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
dts would start over at the beginning of each trun when they should be
computed contiguously for each trun in a traf
Fixes ticket 8070
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Allows the creation of the sdtp atom while remuxing MP4 to MP4. This
atom is required by Apple devices (iPhone, Apple TV) in order to accept
2160p medias.
Detecting missing tfhd avoids re-using tfhd track info from the previous
moof. For files with multiple tracks, this may make a mess of the
avindex and fragindex, which can later trigger av_assert0 in
mov_read_trun().
Reviewed-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
ISOBMFF does not allow AudioSampleEntryV1 in stsd version 0, so
assume the descriptor format is QTFF SoundDescriptionV1. ISOBMFF does
not define a version 2.
This fixes audio decoding for some MP4 files generated with Apple
tools. The additional fields present in SoundDescriptionV1/V2 need to
be read in order to correctly read additional boxes that contain
information required for decoding the stream.
Fixes#7376.
Also see: https://github.com/HandBrake/HandBrake/issues/1555
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Fixes a compilation warning if size_t != uint64_t:
libavformat/mov.c: In function ‘mov_read_saio’:
libavformat/mov.c:6207:45: warning: assignment from incompatible pointer type [-Wincompatible-pointer-types]
encryption_index->auxiliary_offsets = auxiliary_offsets;
^
This doesn't support saio atoms with more than one offset.
Signed-off-by: Jacob Trimble <modmaker@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
- Parse schm atom to get different encryption schemes.
- Allow senc atom to appear in track fragments.
- Allow 16-byte IVs.
- Allow constant IVs (specified in tenc).
- Allow only tenc to specify encryption (i.e. no senc/saiz/saio).
- Use sample descriptor to detect clear fragments.
This doesn't support:
- Different sample descriptor holding different encryption info.
- Only first sample descriptor can be encrypted.
- Encrypted sample groups (i.e. seig).
- Non-'cenc' encryption scheme when using -decryption_key.
Signed-off-by: Jacob Trimble <modmaker@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes seek for files with empty edits and files with negative ctts
(dts_shift > 0). Added fate samples and tests.
Signed-off-by: Sasi Inguva <isasi@isasi.mtv.corp.google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
When keyframe intervals of dash segments are not perfectly aligned,
fragments in the stream can overlap in time. The previous sorting by
timestamp causes packets to be read out of decode order and results
in decode errors.
Insert new "trun" index entries into index_entries in the order that
the trun are referenced by the sidx.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
When sidx box support is enabled, the code will skip reading all
trun boxes (each containing ctts entries for samples inthat box).
If seeks are attempted before all ctts values are known, the old
code would dump ctts entries into the wrong location. These are
then used to compute pts values which leads to out of order and
incorrectly timestamped packets.
This patch fixes ctts processing by always using the index returned
by av_add_index_entry() as the ctts_data index. When the index gains
new entries old values are reshuffled as appropriate.
This approach makes sense since the mov demuxer is already relying
on the mapping of AVIndex entries to samples for correct demuxing.
As a result of this all ctts entries are now 1-count. A followup
change will be submitted to remove support for > 1 count entries
which will simplify seeking.
Notes for future improvement:
Probably there are other boxes (stts, stsc, etc) that are impacted
by this issue... this patch only attempts to fix ctts since it
completely breaks packet timestamping.
This patch continues using an array for the ctts data, which is not
the most ideal given the rearrangement that needs to happen (via
memmove as new entries are read in). Ideally AVIndex and the ctts
data would be set-type structures so addition is always worst case
O(lg(n)) instead of the O(n^2) that exists now; this slowdown is
noticeable during seeks.
Signed-off-by: Dale Curtis <dalecurtis@chromium.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Adding an MOV format option to turn on/off the editlist supporting code, introduced in ca6cae73db
Signed-off-by: Sasi Inguva <isasi@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Retain the ranges of frame indexes when applying edit list in
mov_fix_index. The index ranges are then used to keep track of the frame
index of the current sample. In case of a discontinuity in frame indexes
due to edit, update the auxiliary info position accordingly.
Reviewed-by: Sasi Inguva <isasi@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This implements Spherical Video V1 and V2, as described in the
spatial-media collection by Google.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
This implements Spherical Video V1 and V2, as described in the
spatial-media collection by Google.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
This matrix needs to be applied after all others have (currently only
display matrix from trak), but cannot be handled in movie box, since
streams are not allocated yet. So store it in main context, and apply
it when appropriate, that is after parsing the tkhd one.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>