Note that this changes the code to work the same way as other protocols where
an URL parameter can override an AVOption.
Signed-off-by: Marton Balint <cus@passwd.hu>
This allows choosing whether the `fit_mode` merely controls the placement
of the image within the output resolution, or whether the output resolution
is also adjusted according to the given `fit_mode`.
The semantics of these keywords are well-defined by the CSS 'object-fit'
property. This is arguably more user-friendly and less obtuse than the
existing `normalize_sar` and `pad_crop_ratio` options. Additionally, this
comes with two new (useful) behaviors, `none` and `scale_down`, neither of
which map elegantly to the existing options.
One additional benefit of this option is that, unlike `normalize_sar`, it
does *not* also imply `reset_sar`; meaning that users can now choose to
have an anamorphic base layer and still have the overlay images scaled to fit
on top of it according to the chosen strategy.
See-Also: https://drafts.csswg.org/css-images/#the-object-fit
This allows upstream filters to observe EOF on their corresponding outputs
and terminate early, which is particularly useful for decoders and demuxers
that may then gracefully release their resources.
Prevents "leaking" memory for previously used, but now unused, filter inputs.
THis is currently done by sch_demux_send() (via demux_stream_send_to_dst()),
sch_enc_send() (via enc_send_to_dst()), and sch_dec_send() (via
dec_send_to_dst()), but not by sch_filter_send().
Implement the same queue-closing logic for them. The main benefit here is that
this will allow them to mark downstream inputs as send-done (in addition
to received-done), which is useful for a following commit.
Cut off a choked demuxer's output codec/filter queues, effectively preventing
them from processing packets while the demuxer is choked. Avoids downstream
nodes from piling up extra input that a demuxer shouldn't currently be
sending.
The main benefit of this is to avoid queuing up excess packets that don't want
to be decoded yet, reducing memory consumption for idle inputs by preventing
them from being read earlier than needed.
Currently, when a demuxer thread is choked, it will avoid queuing more
packets, but any packets already present on the thread queue will still be
processed.
This can be quite wasteful if the choke is due to e.g. decoder not being
needed yet, such as in a filter graph involving concatenation-style filters.
Adding the ability to propagate the choke status to the thread queue directly
allows downstream decoders and filter graphs to avoid unnecessary work and
buffering.
Reduces the effective latency between scheduler updates and changes in the
thread workfload.
Currently, while the filter graph is being initially created, the scheduler
continues demuxing frames on the last input that happened to be active before
the filter graph was complete.
This can lead to an excess number of decoded frames "piling" up on this input,
regardless of whether or not it will actually be requested by the configured
filter graph. Suspending the filter graph during this initialization phase
reduces the amount of wasted memory.
This field is just saving (typically) a single pointer indirection; and IMO
makes the logic and graph relations unnecessarily complicated. I am also
considering adding choking logic to decoders and encoders as well, which this
field would get in the way of.
Apart from the unchoking logic in unchoke_for_input(), the only other place
that uses this field is the (cold) function check_acyclic(), which can be
served just as well with a simple function to do the graph traversal there.
I tested this extensively under different conditions and could not come up
with any scenario where using a larger queue size was actually beneficial.
Moreover, having such a large default queue is very wasteful especially
for larger frame sizes; and can in the worst case lead to an extra ~50% memory
footprint per input (with the default 16 threads), regardless of whether that
input is currently in use or not.
My methodology was to add logging in the event of a queue underrun/overrun,
and then observe and then observe the frequency of such events in practice,
as well as the impact on performance. I came up with an example filter graph
involving decoding, filtering and encoding with several input files and
various changes to move the bottleneck around.
I found that, in all configurations I tested, with all thread counts and
bottlenecks, using a queue size of 2 frames yielded practically identical
performance to a queue size of 8 frames. I was only able to consistently
measure a slowdown when restricting the queue to a single frame, where the
underruns ended up making up almost 1.1% of frame events in the worst case.
A summary of my test log follows:
= Bottleneck in decoder =
ffmpeg -i A -i B -i C -filter_complex "concat=n=3" -f null -
== 16 threads ==
=== Queue statistics (dec -> filtergraph) ===
- 8 frames = 91355 underruns, 1 overrun
- 4 frames = 91381 underruns, 2 overruns
- 2 frames = 91326 underruns, 21 overruns
- 1 frame = 91284 underruns, 102 overruns
=== Time elapsed ===
- 8 frames = 14.37s
- 4 frames = 14.28s
- 2 frames = 14.27s
- 1 frame = 14.35s
== 1 thread ==
=== Queue statistics (dec -> filtergraph) ===
- 8 frames = 91801 underruns, 0 overruns
- 4 frames = 91929 underruns, 1 overrun
- 2 frames = 91854 underruns, 7 overruns
- 1 frame = 91745 underrons, 83 overruns
=== Time elapsed ===
- 8 frames = 39.51s
- 4 frames = 39.94s
- 2 frames = 39.91s
- 1 frame = 41.69s
= Bottleneck in filter graph: =
ffmpeg -i A -i B -i C -filter_complex "concat=n=3,scale=3840x2160" -f null -
== 16 threads ==
=== Queue statistics (dec -> filtergraph) ===
- 8 frames = 277 underruns, 84673 overruns
- 4 frames = 640 underruns, 86523 overruns
- 2 frames = 850 underruns, 88751 overruns
- 1 frame = 1028 underruns, 89957 overruns
=== Time elapsed ===
- 8 frames = 26.35s
- 4 frames = 26.31s
- 2 frames = 26.38s
- 1 frame = 26.55s
== 1 thread ==
=== Queue statistics (dec -> filtergraph) ===
- 8 frames = 29746 underruns, 57033 overruns
- 4 frames = 29940 underruns, 58948 overruns
- 2 frames = 30160 underruns, 60185 overruns
- 1 frame = 30259 underruns, 61126 overruns
=== Time elapsed ===
- 8 frames = 52.08s
- 4 frames = 52.49s
- 2 frames = 52.25s
- 1 frame = 52.69s
= Bottleneck in encoder: =
ffmpeg -i A -i B -i C -filter_complex "concat=n=3" -c:v libx264 -preset veryfast -f null -
== 1 thread ==
== Queue statistics (filtergraph -> enc) ==
- 8 frames = 26763 underruns, 63535 overruns
- 4 frames = 26863 underruns, 63810 overruns
- 2 frames = 27243 underruns, 63839 overruns
- 1 frame = 27670 underruns, 63953 overruns
== Time elapsed ==
- 8 frames = 89.45s
- 4 frames = 89.04s
- 2 frames = 89.24s
- 1 frame = 90.26s
The code in the decoder just cares about allocating enough extra hw frames
to cover the size of the queue; but there's no reason we actually *have* to
use this many. We can safely relax the assertion to a <= check.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
I have redacted the exact location of the FFmpeg server as writing
that in public seems just a bad idea
Refer to RFC 4588.
Add and set the basic param of RTX like
ssrc, payload_type, srtp.
Modify the SDP to add RTX info so that
the peer be able to parse the RTX packet.
There are more pateches to make RTX really
work.
Signed-off-by: Jack Lau <jacklau1222@qq.com>
for live RTP streams. Some external applications, such as Qt Multimedia,
depend on this flag being set correctly.
Signed-off-by: Kaarle Ritvanen <kaarle.ritvanen@datakunkku.fi>
This is required placement by standard [[maybe_unused]] attribute, works
the same for __attribute__((unused)).
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
This affects the {avg,put}_no_rnd_pixels16_{x,y}2 MMX and
(put-only) MMXEXT versions. Removing these functions saved
1184B here.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These currently only exist as MMX versions.
The added functions occupy 320B here. So far, they are only for
the x2 and y2 (i.e. right and down, not down-right) directions.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These currently only exist as MMX and (not bitexact) MMXEXT versions.
The added functions occupy 288B here. So far, they are only for
the x2 and y2 (i.e. right and down, not down-right) directions.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Only the blocksizes 16 and 8 are implemented, yet the motion estimation
code touches the blocksize 4 entries. But really nothing touches
the blocksize 2 entries, so that we can reduce the put_no_rnd_pixels_tab
array size to [3][4].
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD),
so that the overwhelming majority of our users (particularly those
that actually update their FFmpeg) will be using the SSSE3 versions.
This commit therefore removes the MMX(EXT) functions overridden
by them (which don't abide by the ABI) to get closer to a removal
of emms_c.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
SSSE3 is already quite old (introduced 2006 for Intel, 2011 for AMD),
so that the overwhelming majority of our users (particularly those
that actually update their FFmpeg) will be using the SSSE3 version
of filter_line.
This commit therefore removes the overridden MMXEXT version
(which didn't abide by the ABI) which allows us to remove
an emms_c() from vf_gradfun.c, so that users with SSSE3 no longer
pay a price for the mere existence of an MMXEXT version.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>