This was tested with medias recorded from an iPhone XR and an iPhone 13.
Here is how a typical stream looks like in coding order:
┌────────┬─────┬─────┬──────────┐
│ sample | PTS | DTS | keyframe |
├────────┼─────┼─────┼──────────┤
┊ ┊ ┊ ┊ ┊
│ 53 │ 560 │ 510 │ No │
│ 54 │ 540 │ 520 │ No │
│ 55 │ 530 │ 530 │ No │
│ 56 │ 550 │ 540 │ No │
│ 57 │ 600 │ 550 │ Yes │
│ * 58 │ 580 │ 560 │ No │
│ * 59 │ 570 │ 570 │ No │
│ * 60 │ 590 │ 580 │ No │
│ 61 │ 640 │ 590 │ No │
│ 62 │ 620 │ 600 │ No │
┊ ┊ ┊ ┊ ┊
In composition/display order:
┌────────┬─────┬─────┬──────────┐
│ sample | PTS | DTS | keyframe |
├────────┼─────┼─────┼──────────┤
┊ ┊ ┊ ┊ ┊
│ 55 │ 530 │ 530 │ No │
│ 54 │ 540 │ 520 │ No │
│ 56 │ 550 │ 540 │ No │
│ 53 │ 560 │ 510 │ No │
│ * 59 │ 570 │ 570 │ No │
│ * 58 │ 580 │ 560 │ No │
│ * 60 │ 590 │ 580 │ No │
│ 57 │ 600 │ 550 │ Yes │
│ 63 │ 610 │ 610 │ No │
│ 62 │ 620 │ 600 │ No │
┊ ┊ ┊ ┊ ┊
Sample/frame 58, 59 and 60 are B-frames which actually depends on the
key frame (57). Here the key frame is not an IDR but a "CRA" (Clean
Random Access).
Initially, I thought I could rely on the sdtp box (independent and
disposable samples), but unfortunately:
sdtp[54] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0
sdtp[55] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
sdtp[56] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
sdtp[57] is_leading:0 sample_depends_on:2 sample_is_depended_on:0 sample_has_redundancy:0
sdtp[58] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0
sdtp[59] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
sdtp[60] is_leading:0 sample_depends_on:1 sample_is_depended_on:2 sample_has_redundancy:0
sdtp[61] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0
sdtp[62] is_leading:0 sample_depends_on:1 sample_is_depended_on:0 sample_has_redundancy:0
The information that might have been useful here would have been
is_leading, but all the samples are set to 0 so this was unusable.
Instead, we need to rely on sgpd/sbgp tables. In my case the video track
contained 3 sgpd tables with the following grouping types: tscl, sync
and tsas. In the sync table we have the following 2 entries (only):
sgpd.sync[1]: sync nal_unit_type:0x14
sgpd.sync[2]: sync nal_unit_type:0x15
(The count starts at 1 because 0 carries the undefined semantic, we'll
see that later in the reference table).
The NAL unit types presented here correspond to:
libavcodec/hevc.h: HEVC_NAL_IDR_N_LP = 20,
libavcodec/hevc.h: HEVC_NAL_CRA_NUT = 21,
In parallel, the sbgp sync table contains the following:
┌────┬───────┬─────┐
│ id │ count │ gdi │
├────┼───────┼─────┤
│ 0 │ 1 │ 1 │
│ 1 │ 56 │ 0 │
│ 2 │ 1 │ 2 │
│ 3 │ 59 │ 0 │
│ 4 │ 1 │ 2 │
│ 5 │ 59 │ 0 │
│ 6 │ 1 │ 2 │
│ 7 │ 59 │ 0 │
│ 8 │ 1 │ 2 │
│ 9 │ 59 │ 0 │
│ 10 │ 1 │ 2 │
│ 11 │ 11 │ 0 │
└────┴───────┴─────┘
The gdi column (group description index) directly refers to the index in
the sgpd.sync table. This means the first frame is an IDR, then we have
batches of undefined frames interlaced with CRA frames. No IDR ever
appears again (tried on a 30+ seconds sample).
With that information, we can build an heuristic using the presentation
order.
A few things needed to be introduced in this commit:
1. min_sample_duration is extracted from the stts: we need the minimal
step between sample in order to PTS-step backward to a valid point
2. In order to avoid a loop over the ctts table systematically during a
seek, we build an expanded list of sample offsets which will be used
to translate from DTS to PTS
3. An open_key_samples index to keep track of all the non-IDR key
frames; for now it only supports HEVC CRA frames. We should probably
add BLA frames as well, but I don't have any sample so I prefered to
leave that for later
It is entirely possible I missed something obvious in my approach, but I
couldn't come up with a better solution. Also, as mentioned in the diff,
we could optimize is_open_key_sample(), but the linear scaling overhead
should be fine for now since it only happens in seek events.
Fixing this issue prevents sending broken packets to the decoder. With
FFmpeg hevc decoder the frames are skipped, with VideoToolbox the frames
are glitching.
sgpd means Sample Group Description Box.
For now, only the sync grouping type is parsed, but the function can
easily be adjusted to support other flavours.
The sbgp (Sample to Group Box) sync_group table built in previous commit
contains references to this table through the group_description_index
field.
By ffmpeg threading support implementation via frame slicing and doing
zimg_filter_graph_build that used to take 30-60% of each frame processig
only if necessary (some parameters changed)
the performance increase vs original version
in video downscale and color conversion >4x is seen
on 64 cores Intel Xeon, 3x on i7-6700K (4 cores with HT)
Signed-off-by: Victoria Zhislina <Victoria.Zhislina@intel.com>
If _FieldBased, _Matrix, _ColorRange, or _ChromaLocation haven't
been set, that absence would be interpreted as 0, leading to those
being set to case 0 instead of default. There is no case 0 for
_Primaries and _Transfer, so those were correctly falling back
to the default case.
Signed-off-by: Stephen Hutchinson <qyot27@gmail.com>
It appears this is not allowed "Each Segment Index box documents how a (sub)segment is divided into one or more subsegments
(which may themselves be further subdivided using Segment Index boxes)."
Fixes: Null pointer dereference
Fixes: Ticket9517
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The loongson_intrinsics.h file is updated from v1.0.3 version
to v1.1.0. Some spelling mistakes are fixed and new functions are added.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
Reviewed-by: 殷时友 <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>