ffmpeg CLI pixel format selection for filtering currently special-cases
MJPEG encoding, where it will restrict the supported list of pixel
formats depending on the value of the -strict option. In order to get
that value it will apply it from the options dict into the encoder
context, which is a highly invasive action even now, and would become a
race once encoding is moved to its own thread.
The ugliness of this code can be much reduced by moving the special
handling of MJPEG into ofilter_bind_ost(), which is called from encoder
init and is thus synchronized with it. There is also no need to write
anything to the encoder context, we can evaluate the option into our
stack variable.
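A minimal sketch of what evaluating the option into a stack variable could look like (encoder_opts/enc_ctx are illustrative names; av_opt_find()/av_opt_eval_int() only read the context, they do not modify it):

    int strict = 0;
    const AVDictionaryEntry *e = av_dict_get(encoder_opts, "strict", NULL, 0);
    if (e) {
        const AVOption *o = av_opt_find(enc_ctx, "strict", NULL, 0, 0);
        int ret = o ? av_opt_eval_int(enc_ctx, o, e->value, &strict) :
                      AVERROR_OPTION_NOT_FOUND;
        if (ret < 0)
            return ret;
    }
    /* 'strict' is now usable for restricting the MJPEG pixel format list,
       without touching any enc_ctx state */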
There is also no need to access AVCodec at all during pixel format
selection, as the pixel formats array is already stored in
OutputFilterPriv.
When -pix_fmt designates a BE/LE pixel format, it gets translated into
the native one by av_get_pix_fmt(). This may not always be the best
choice, as the encoder might only support one endianness. In such a
case, explicitly choose the endianness supported by the encoder.
While this is currently redundant with choose_pixel_fmt() in
ffmpeg_filter.c, the latter code will be deprecated in following commits.
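Roughly the idea, as a sketch (arg is the -pix_fmt argument, fmts an illustrative name for the AV_PIX_FMT_NONE-terminated list of formats the encoder supports):

    enum AVPixelFormat fmt = av_get_pix_fmt(arg);   /* BE/LE name resolves to the native variant */
    if (fmts) {
        int supported = 0;
        for (int i = 0; fmts[i] != AV_PIX_FMT_NONE; i++)
            supported |= fmts[i] == fmt;
        if (!supported) {
            enum AVPixelFormat swapped = av_pix_fmt_swap_endianness(fmt);
            for (int i = 0; swapped != AV_PIX_FMT_NONE && fmts[i] != AV_PIX_FMT_NONE; i++)
                if (fmts[i] == swapped)
                    fmt = swapped;                  /* encoder only supports the other endianness */
        }
    }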
frame is always != NULL for audio and video here
(this is checked by an assert and the frame is already dereferenced
before it reaches this code).
Fixes Coverity issue #1538858.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
map_func is supposed to be an array of const pointer to function
returning int, not an array of pointer to function returning const int.
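The difference in C terms, with an illustrative signature rather than the actual one:

    /* before: array of pointer to function returning const int */
    static const int (*map_func[2])(void *arg);
    /* after:  array of const pointer to function returning int */
    static int (*const map_func[2])(void *arg);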
Reported-By: Martin Storsjö
Read the timebase from FrameData rather than the input stream. This
should fix #10393 and generally be more reliable.
Replace the use of '-1' to indicate demuxing timebase with the string
'demux'. Also allow requesting the filter timebase with
'-enc_time_base filter'.
It now contains data from multiple sources, so group those items that
always come from the decoder. Also, initialize them to invalid values,
so that frames that did not originate from a decoder can be
distinguished.
This is possible now that enc_open() is always called with a non-NULL
frame for audio/video.
Previously the code would directly reach into the buffersink, which is a
layering violation.
When no frames were passed from a filtergraph to an encoder, but the
filtergraph is configured (i.e. has output parameters), encoder flush
code will use those parameters to initialize the encoder in a last-ditch
effort to produce some useful output.
Rework this process so that it is triggered by the filtergraph, which
now sends a dummy frame with parameters, but no data, to the encoder,
rather than the encoder reaching backwards into the filter.
This approach is more in line with the natural data flow from filters to
encoders and will allow reducing encoder-filter interactions in
following commits.
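A sketch of what such a parameters-only frame could look like (ofp->* are illustrative stand-ins for the filtergraph output parameters):

    AVFrame *par = av_frame_alloc();
    if (!par)
        return AVERROR(ENOMEM);
    par->format              = ofp->format;
    par->width               = ofp->width;
    par->height              = ofp->height;
    par->sample_aspect_ratio = ofp->sample_aspect_ratio;
    /* deliberately no av_frame_get_buffer() - buf[0] stays NULL, letting the
       encoder distinguish "parameters only" from an actual frame to encode */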
This code is tested by fate-adpcm-ima-cunning-trunc-t2-track1, which (as
confirmed by Zane) is supposed to produce empty output.
This line was added in c30a4489b4 along with
AVStream.sample_aspect_ratio. However, configuring SAR for video
encoding is now done after this code (specifically in enc_open(), which
is called when the first video frame to be encoded is obtained), so this
line cannot have any meaningful effect.
Replace duplicated(!) and broken* custom string parsing with
av_dict_parse_string(). Return error codes instead of aborting.
* e.g. it treats NULL returned from av_get_token() as "separator not
found", when in fact av_get_token() only returns NULL on memory
allocation failure
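For reference, the replacement boils down to a single call along these lines (the separator strings shown are illustrative):

    AVDictionary *dict = NULL;
    int ret = av_dict_parse_string(&dict, str, "=", ":", 0);
    if (ret < 0) {
        av_log(NULL, AV_LOG_ERROR, "Error parsing '%s'\n", str);
        return ret;               /* propagate instead of aborting */
    }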
EOF from sq_receive() means no packets will ever be output by the sync
queue. Since the muxing sync queue is always used by all interleaved
(i.e. non-attachment) streams, this means no further packets can make
it to the muxer and we can terminate muxing now.
Also fix a couple of possible overflows while at it.
Fixes the negative initial timestamps in ticket #10358.
Signed-off-by: Marton Balint <cus@passwd.hu>
Not sure if the function name frame_queue_destory is intentional, because
"destory" is not really a word. Changing it to "destroy" makes more sense.
Signed-off-by: QiTong Li <liqitong@163.com>
Signed-off-by: Marton Balint <cus@passwd.hu>
This is only a preparatory step to a fully threaded architecture and
does not yet make decoding truly parallel - the main thread will
currently submit a packet and wait until it has been fully processed by
the decoding thread before moving on. Decoder behavior as observed by
the rest of the program should remain unchanged. That will change in
future commits after encoders and filters are moved to threads and a
thread-aware scheduler is added.
It is only used for flushing the subtitle decoder, so allocate a
dedicated packet for that.
Keep Decoder.pkt unused for now, it will be repurposed in future
commits.
Make the function process just one input stream at a time and save an
indentation level. Also rename it to ist_add() to be consistent with an
analogous function in ffmpeg_mux_init.
Currently of_output_packet() reuses the input packet, which requires its
callers to submit blank packets even on EOF, making the code more
complex.
It is set by the muxing code, which will not be synchronized with
encoding code after upcoming threading changes. Use an encoder-private
variable instead.
Packets submitted to the muxer now have their timebase attached to them,
so the muxer can do conversion to muxing timebase and avoid exposing it
to callers.
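With the timebase carried on the packet, the muxer-side conversion is just (a sketch; st is the destination AVStream):

    av_packet_rescale_ts(pkt, pkt->time_base, st->time_base);
    pkt->time_base = st->time_base;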
There is no reason to postpone it until opening the encoder. Also, abort
when the input stream is unknown, rather than disregard an explicit
request from the user.
The code will currently add a small offset to avoid exact midpoints, but
this can cause inexact results when a float timestamp is exactly
representable as an integer.
Fixes off-by-one in the first frame duration in multiple FATE tests.
Current code marks the output stream as finished and waits for a flush
packet, but that is both unnecessary and suspect, as in theory nothing
should be sent to a finished stream - not even flush packets.
Make all relevant state per-filtergraph input, rather than per-input
stream. Refactor the code to make it work and avoid leaking memory when
a single subtitle stream is sent to multiple filters.
Set them in ifilter_parameters_from_dec(), similarly to audio/video
streams. This reduces the extent to which sub2video filters need to be
treated specially.
This function should not take an InputStream, as it only uses it to get
the InputFile and the timebase. Pass those directly instead and avoid
confusion over dealing with multiple InputStreams.
This queue should be associated with a specific filtergraph input - if
a subtitle stream is sent to multiple filters then each should have its
own queue.
This code is a sub2video analogue of ifilter_send_frame(), so it
properly belongs to the filtering code.
Note that using sub2video with more than one target for a given input
subtitle stream is currently broken and this commit does not change
that. It will be addressed in following commits.
When the filtergraph has no inputs, it can be configured immediately
when all its outputs are bound to output streams. This will simplify
treating some corner cases.
This way the list of filtergraph inputs/outputs is always known after
FilterGraph creation. This will allow treating simple and complex
filtergraphs in a more uniform manner.
Currently NULL would be passed for simple filtergraphs, which would
make the filter code extract the graph description from the output
stream when needed. This is unnecessarily convoluted.
A non-limiting stream could mistakenly end up being the queue head,
which would then produce incorrect synchronization, seen e.g. in
fate-matroska-flac-extradata-update for a certain number of frame threads
(e.g. 5).
Found-By: James Almer
Checking whether the user requested an unsupported conversion between
text and bitmap subtitles can be done immediately when creating the
output stream.
This function is entangled with encoder setup, so it is more encoding
code rather than ffmpeg_hw code. This will allow making more encoder
state private in the future.
This function is entangled with decoder setup, so it is more decoding
code rather than ffmpeg_hw code. This will allow making more decoder
state private in the future.
As the comment in the code mentions, EAGAIN is not an expected value here
because we call avcodec_receive_frame() until all frames have been returned.
avcodec_send_packet() returning EAGAIN means a packet is still buffered, which
hints that the underlying decoder is buggy and not fetching packets as it
should.
An example of this behavior was in the libdav1d wrapper before f209614290,
where feeding it split frames (or individual OBUs) would result in the CLI
eventually printing the confusing "Error submitting packet to decoder: Resource
temporarily unavailable" error message, and just keep going until EOF without
returning new frames.
Signed-off-by: James Almer <jamrial@gmail.com>
It currently emulates the long-removed
avcodec_decode_audio4/avcodec_decode_video2 APIs, which obfuscates the
actual decoding flow. Restructure the decoding calls so that they
naturally follow the new avcodec_send_packet()/avcodec_receive_frame()
design.
This is not only significantly easier to read, but also shorter.
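The flow now follows the canonical receive loop from the avcodec documentation, roughly:

    ret = avcodec_send_packet(dec_ctx, pkt);          /* pkt == NULL flushes the decoder */
    if (ret < 0 && ret != AVERROR_EOF)
        return ret;

    while (1) {
        ret = avcodec_receive_frame(dec_ctx, frame);
        if (ret == AVERROR(EAGAIN) || ret == AVERROR_EOF)
            break;                                    /* need more input / fully flushed */
        if (ret < 0)
            return ret;
        /* ...process the frame... */
        av_frame_unref(frame);
    }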
It is currently handled in the same loop as audio and video, but this
obscures the actual flow, because only one iteration is ever performed
for subtitles.
Also, avoid a pointless packet reference.
process_input_packet() contains two non-interacting pieces of nontrivial
size and complexity - decoding and streamcopy. Separating them makes the
code easier to read.
New placement requires fewer explicit conditions and is easier to
understand.
The logic should be exactly equivalent, since this is the only place
where eof_reached is set for decoding.
Passing ist=NULL is currently used to identify stream types that do not
decode into AVFrames, i.e. subtitles. That is highly non-obvious -
always pass a non-NULL InputStream and just check the type explicitly.
It tracks whether the decoder for this stream ever produced any frames
and its only use is for checking whether a filter input ever received a
frame - those that did not are prioritized by the scheduler.
This is awkward and unnecessarily complicated - checking whether the
filtergraph input format is valid works just as well and does not
require maintaining an extra variable.
In case no decoder is available, dec_open() called from ist_use() will
fail with 'Decoding requested, but no decoder found', so this check is
redundant.
When a filtergraph input receives EOF but never saw any input frames, we
use the fallback parameters. Currently an attempt to actually configure
the filtergraph will happen elsewhere, but there is no reason to
postpone this.
With complex filtergraphs it can happen that the filtergraph is
unconfigured because a filter other than the one we just got EOF on
is missing parameters.
Make sure that the fallback parameters for a given input are only used
when that input is unconfigured.
It does no initialization anymore, except for setting
transcode_init_done - the bulk of the function is printing the
input/output maps. It also cannot fail anymore, so remove the useless
return value.
Export the corresponding flag in InputFile instead. This will allow
making the demuxer AVFormatContext private in future commits, similarly
to what was previously done for muxers.
There is no point in having a per-stream wallclock start time, since
they are all computed at the same instant. Keep a per-file start time
instead, initialized when the demuxer thread starts.
That is a more appropriate place for this code and will allow hiding
more of InputStream.
The value of repeat_pict extracted from the libavformat-internal parser no
longer needs to be transmitted outside of the demuxing thread.
Move readrate handling to the demuxer thread. This has to be done in the
same commit, since it reads InputStream.dts and InputStream.nb_packets,
which are now
set in the demuxer thread.
This way computing it and using it for streamcopy does not need to
happen in sync. This will be useful in following commits, where updating
InputStream.dts will be moved to the demuxing thread.
This code runs post-demuxing and is not synchronized with the decoder
output (which may be delayed with respect to its input by arbitrary and
unknowable amounts), so accessing any decoder properties is incorrect.
Move them to a separate function called right after timestamp
discontinuity processing. This is now possible, since these values have
no interaction with decoding anymore.
The two checks using eof_reached are testing whether more input can
possibly appear on this filtergraph input. InputFilterPriv.eof is the
more authoritative source for this information.
When an input stream terminates and no frames were successfully decoded,
filtering code will currently configure the filtergraph using demuxer
stream parameters. Use decoder parameters instead, which should be more
reliable. Also, initialize them immediately when an input stream is
bound to a filtergraph input, so that these parameters are always
available (if at all) and filtering code does not need to reach into the
decoder at some arbitrary later point.
When no frames are ever seen by an encoder, encoder flush will do a
last-ditch attempt to configure its source filtergraph in order to at
least get the stream parameters. This involves extracting demuxer
parameters from filtergraph source inputs, which is
* a bad layering violation
* probably unreachable, because decoders are flushed before encoders,
which should call ifilter_send_eof(), which will also set these
parameters; however due to complex control flow it is hard to be
entirely sure this code can never be triggered
Even if this code can actually be reached, it is probably better to
return an error as the comment above it says.
These two functions are a part of a single logical action - determining
which, if any, output stream needs to be processed next. Keeping them
separate is a historical artifact that obscures what is actually being
done.
Currently those are set in different ways depending on whether the
stream is decoded or not, using some values from the decoder if it is.
This is wrong, because there may be an arbitrary amount of delay between
input packets and output frames (depending e.g. on the thread count when
frame threading is used).
Always use the path that was previously used only for streamcopy. This
should not cause any issues, because these values are now used only for
streamcopy and discontinuity handling.
This change will allow decoupling discontinuity processing from decoding
and moving it to ffmpeg_demux. It also makes the code simpler.
Changes output in fate-cover-art-aiff-id3v2-remux and
fate-cover-art-mp3-id3v2-remux, where attached pictures are now written
in the correct order. This happens because InputStream.dts is no longer
reset to AV_NOPTS_VALUE after decoding, so streamcopy actually sees
valid dts values.
This was added in 380db56928 as a
temporary crutch that is not needed anymore. The only case where this
code can be triggered is the very first frame, for which InputStream.pts
is always equal to 0.
Stop using InputStream.dts for generating missing timestamps for decoded
frames, because it contains pre-decoding timestamps and there may be
an arbitrary amount of delay between input packets and output frames (e.g.
dependent on the thread count when frame threading is used). It is also
in AV_TIME_BASE (i.e. microseconds), which may introduce unnecessary
rounding issues.
New code maintains a timebase that is the inverse of the LCM of all the
samplerates seen so far, and thus can accurately represent every audio
sample. This timebase is used to generate missing timestamps after
decoding.
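A sketch of the idea (dp->* are illustrative state fields; first-frame and overflow handling omitted): keep the denominator as the LCM of all sample rates seen so far and extrapolate a missing pts from the previous frame's sample count:

    int64_t old_den = dp->tb_den ? dp->tb_den : frame->sample_rate;
    int64_t den     = old_den / av_gcd(old_den, frame->sample_rate) *
                      frame->sample_rate;                          /* LCM */
    if (den != old_den)
        dp->last_pts = av_rescale(dp->last_pts, den, old_den);     /* carry state to the finer timebase */
    dp->tb_den = den;

    if (frame->pts == AV_NOPTS_VALUE)   /* missing: previous pts + previous frame's length in samples */
        frame->pts = dp->last_pts + av_rescale(dp->last_nb_samples, den, dp->last_sample_rate);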
Changes the result of the following FATE tests
* pcm_dvd-16-5.1-96000
* lavf-smjpeg
* adpcm-ima-smjpeg
In all of these the timestamps now better correspond to actual frame
durations.
If input packets have timestamps, they will be propagated to output
frames by the decoder, so at best this block does not do anything.
There can also be an arbitrary amount of delay between packets sent to
the decoder and decoded frames (e.g. due to decoder's intrinsic delay or
frame threading), so deriving any timestamps from packet properties is
wrong.
The previous name is misleading, because the function does not actually
initialize any filters - it creates a new output stream and binds a
filtergraph output to it.
Their only function is checking that encoding was not specified for
data/unknown-type streams, but the check is broken because enc_ctx will
not be created in ost_add() unless a valid encoder can be found.
Add an actually working check for all types for which encoding is not
supported in choose_encoder().
ts_discontinuity_detect() is applied right after demuxing, while
InputStream.pts is a post-decoding timestamp, which may be delayed with
respect to demuxing by an arbitrary amount (e.g. depending on the thread
count when frame threading is used).
The name is misleading, because it is not a picture in the sense of MPEG
terminology (which defines "picture" as "frame or field"), but always a
full frame. 'next' is also redundant and/or misleading, because it is
the _current_ frame to be encoded.
Previously they would only be used with trivial filtergraphs, because
filters did not handle frame durations. That is no longer true - most
filters process frame durations properly (there may still be some that
don't - this change will help finding and fixing them).
Improves output video frame durations in a number of FATE tests.
When an encoder exports sum-of-squared-differences information in
encoded packets, print_report() will print PSNR information in the
status line. However,
* the code computing PSNR assumes 8-bit 4:2:0 video and prints incorrect
values otherwise (see the sketch below); there are no issues on trac about
this
* only a few encoders (namely aom, vpx, mpegvideo, snow) export this
information; other often-used encoders such as libx26[45] do not
export this, even though they could
This suggests that this feature is not useful and it is better to remove
it rather than spend effort on fixing it.
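What the status-line code effectively computes, with the 8-bit 4:2:0 assumption from the first point made explicit (a sketch, not the exact removed code; w, h, sse are illustrative names, sse being the sum of squared differences exported by the encoder):

    double samples = w * h * 3.0 / 2.0;                            /* only true for 8-bit 4:2:0 */
    double psnr    = 10.0 * log10(255.0 * 255.0 * samples / sse);  /* 255 only true for 8-bit   */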
Remove now-obsolete code setting packet durations pre-muxing for CFR
encoded video.
Changes output in the following FATE tests:
* numerous adpcm tests
* ffmpeg-filter_complex_audio
* lavf-asf
* lavf-mkv
* lavf-mkv_attachment
* matroska-encoding-delay
All of these change due to the fact that the output duration is now
the actual input data duration and does not include padding added by
the encoder.
* apng-osample: less wrong packet durations are now passed to the muxer.
They are not entirely correct, because the first frame duration should
be 3 rather than 2. This is caused by the vsync code and should be
addressed later, but this change is a step in the right direction.
* tscc2-mov: last output frame has a duration of 11 rather than 1 - this
corresponds to the duration actually returned by the demuxer.
* film-cvid: video frame durations are now 2 rather than 1 - this
corresponds to durations actually returned by the demuxer and matches
the timestamps.
* mpeg2-ticket6677: durations of some video frames are now 2 rather than
1 - this matches the timestamps.
That field was added to store timestamp conversion state for audio
decoding code. Later it started being used by streamcopy as well, but
that use is wrong because, for a given input stream, both decoding and
an arbitrary number of streamcopies may be performed simultaneously.
They would then all overwrite the same state variable.
Store this state in MuxStream instead.
This is the last use of InputStream in of_streamcopy(), so the ist
parameter can now be removed.
It stores codec parameters of the stream submitted to the muxer, which
may be different from the codec parameters in AVStream due to bitstream
filtering.
This avoids the confusing back and forth synchronisation between the
encoder, bitstream filters, and the muxer; now information flows only in
one direction. It also reduces the need for non-muxing code to access
AVStream.
Reduces access to a deeply nested muxer property
OutputStream.st->codecpar->codec_type for this fundamental and immutable
stream property.
Besides making the code shorter, this will allow making the AVStream
(OutputStream.st) private to the muxer in the future.
Set InputStream.decoding_needed/discard/etc. only from
ist_{filter,output}_add() functions. Reduces the knowledge of
InputStream internals in muxing/filtering code.
init_input_stream() can print log messages directly, there is no need to
ship them to the caller.
Also, log errors to the InputStream and avoid duplicate information in
the message.
Changing AVCodecContext.sample_aspect_ratio after the encoder was opened
is by itself questionable, but if anywhere it belongs in encoding rather
than filtering code.
Creating a new output stream of a given type is currently done by
calling new_<type>_stream(), which all start by calling
new_output_stream() to allocate the stream and do common init, followed
by type-specific init.
Reverse this structure - the caller now calls the common function
ost_add() with the type as a parameter, which then calls the
type-specific function internally. This will allow adding common code
that runs after type-specific code in future commits.
In most cases this should only occur once or so per stream in an
input, and in case the logic ends up in an infinite loop, it should
be visible to the end user instead of being completely invisible.
Signed-off-by: Jan Ekström <jan.ekstrom@24i.com>
When no packet dts values are available from the container, video
decoding code will currently use its own guessed values, which will then
be propagated to pkt_dts on decoded frames and used as pts in certain
cases. This is inaccurate, fragile, and unnecessarily convoluted.
Simply removing this allows the extrapolation code introduced in the
previous commit to do a better job.
Changes the results of numerous h264 and hevc FATE tests, where no
spurious timestamp gaps are generated anymore. Several tests no longer
need -vsync passthrough.
When no timestamps are available from the container, the video decoding
code will currently use fake dts values - generated in
process_input_packet() based on a combination of information from the
decoder and the parser (obtained via the demuxer) - to generate
timestamps during decoder flushing. This is fragile, hard to follow, and
unnecessarily convoluted, since more reliable information can be
obtained directly from post-decoding values.
The new code keeps track of the last decoded frame pts and estimates its
duration based on a number of heuristics. Timestamps generated when both
pts and pkt_dts are missing are then simply pts+duration of the last frame.
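Roughly (field and helper names are illustrative, not necessarily the actual ones):

    if (frame->pts == AV_NOPTS_VALUE) {
        if (frame->pkt_dts != AV_NOPTS_VALUE)
            frame->pts = frame->pkt_dts;
        else
            frame->pts = dp->last_frame_pts + dp->last_frame_duration_est;
    }
    dp->last_frame_pts          = frame->pts;
    dp->last_frame_duration_est = estimate_duration(dp, frame);  /* heuristics: frame->duration,
                                                                     framerate, field/frame coding, ... */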
The heuristics are somewhat complicated by the fact that lavf insists on
making up packet timestamps based on its highly incomplete information.
That should be removed in the future, allowing this code to be simplified
further.
The results of the following tests change:
* h264-3386 now requires -fps_mode passthrough to avoid dropping frames
at the end; this is a pathology of the interaction of the new and old
code, and the fact that the sample switches from field to frame coding
in the last packet, and will be fixed in following commits
* hevc-conformance-DELTAQP_A_BRCM_4 stops inventing an arbitrary
timestamp gap at the end
* hevc-small422chroma - the single frame output by this test now has a
timestamp of 0, rather than an arbitrary 7
This field contains different values depending on whether the stream is
being decoded or not. When it is, InputStream.pts is set to the
timestamp of the last decoded frame. Otherwise, it is made equal to
InputStream.dts.
Since a given InputStream can be at the same time decoded and
streamcopied to any number of output streams, this use is incorrect, as
decoded frame timestamps can be delayed with respect to input packets by
an arbitrary amount (e.g. depending on the thread count when frame
threading is used).
Replace all uses of InputStream.pts for streamcopy with InputStream.dts,
which is its value when decoding is not performed. Stop setting
InputStream.pts for pure streamcopy.
Also, pass InputStream.dts as a parameter to do_streamcopy(), which
will allow that function to be decoupled from InputStream completely in
the future.
Which is subtitle encoding. Also, check for AVSubtitle.pts rather than
InputStream.pts, since that is the more authoritative value and is
guaranteed to be valid.
That function only contains two checks now - whether the muxer returned
EOF and whether the packet timestamp is before requested output start
time.
The first check is unnecessary, since the packet will just be rejected
by the muxer. The second check is better combined with a related check
directly in do_streamcopy().
Currently, output streams where an input stream is sent directly (i.e.
not through lavfi) are determined by iterating over ALL the output
streams and skipping the irrelevant ones. This is awkward and
inefficient.
The channel layout is set before opening the encoder, in enc_open().
Messing with it in configure_output_audio_filter() cannot accomplish
anything meaningful.
This option adds a long string of numbers to the progress line, where
the i-th number contains the base-2 logarithm of the number of times a frame
with this QP value was seen by print_report().
There are multiple problems with this feature:
* despite this existing since 2005, web search shows no indication
that it was ever useful for any meaningful purpose;
* the format of what is printed is entirely undocumented, one has to
find it out from the source code;
* QP values above 31 are silently ignored;
* it only works with one video stream;
* as it relies on global state, it is in conflict with ongoing
architectural changes.
It then seems that the nontrivial cost of maintaining this option is not
worth its negligible (or possibly negative - since it pollutes the
already large option space) value.
Users who really need similar functionality can also implement it
themselves using -vstats.
Current code in print_final_stats(), printing the final summary such as
video:8851kB audio:548kB subtitle:0kB other streams:0kB global headers:20kB muxing overhead: 0.559521%
was written with a single output file in mind and makes very little
sense otherwise.
Print this information in mux_final_stats() instead, one line per output
file. Use the correct filesize, if available.
This is currently done in two places:
* at the end of print_final_stats(), which merely prints a warning if
the total size of all written packets is zero
* at the end of transcode() (under a misleading historical 'close each
encoder' comment), which instead checks the packet count to implement
-abort_on empty_output[_stream]
Consolidate both of these blocks into a single function called from
of_write_trailer(), which is a more appropriate place for this. Also,
return an error code rather than exit immediately, which ensures all
output files are properly closed.
Properly pass muxing return codes through the call stack instead.
Slightly changes behavior in case of errors:
* the output IO stream is closed even if writing the trailer returns an
error, which should be more correct
* all files get properly closed with -xerror, even if one of them fails
It is video encoding-only and does not need to be visible outside of
ffmpeg_enc.c.
Also, rename the variable to frames_prev_hist to be consistent with
the naming in do_video_out().
Drop unneeded ctype.h and math.h.
Group all system headers together.
Sort unconditional includes alphabetically.
Group local includes by the library, sort alphabetically.
Several places in the code currently call init_output_stream_wrapper(),
which in turn calls init_output_stream(), which then calls either
enc_open() or init_output_stream_streamcopy(), followed by
of_stream_init(), which tells the muxer the stream is ready for muxing.
All except one of these callers are in the encoding code, which will be
moved to ffmpeg_enc.c. Keeping this structure would then necessitate
ffmpeg_enc.c calling back into the common code in ffmpeg.c, which would
then just call ffmpeg_mux, thus making the already convoluted call chain
even more so.
Simplify the situation by using separate paths for filter-fed output
streams (audio and video encoders) and others (subtitles, streamcopy,
data, attachments).
Encoder initialization is currently split rather arbitrarily between
init_output_stream_encode() and init_output_stream(). Move all of it to
init_output_stream_encode().
The code currently uses lavfi for this, which creates a sort of
configuration dependency loop - the encoder should be ideally
initialized with information from the first audio frame, but to get this
frame one needs to first open the encoder to know the frame size. This
necessitates an awkward workaround, which causes audio handling to be
different from video.
With this change, audio encoder initialization is congruent with video.
For audio AVFrames, nb_samples is typically more trustworthy than
duration. Since sync queues look at durations, make sure they match the
sample count.
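i.e. before the frame enters the sync queue, something along these lines (a sketch):

    frame->duration = av_rescale_q(frame->nb_samples,
                                   (AVRational){ 1, frame->sample_rate },
                                   frame->time_base);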
The last audio frame in the fate-shortest test is now gone. This is more
correct, since it outlasts the last video frame.