Requires a new upstream function to test not for *import* support on a
given output pixel format, but also whether we can render to it.
Fixes: https://github.com/haasn/libplacebo/issues/173
That feature is overkill for a constant pointer to AVFilterLink which
can be stored in AVCodecContext.opaque (indirectly, because the link is
not allocated yet at the time the codec is opened).
This also avoids leaking non-NULL AVFrame.opaque to callers.
Add an optional filter_line3 to the available optimisations.
filter_line3 is equivalent to filter_line, memcpy, filter_line
filter_line shares quite a number of loads and some calculations in
common with its next iteration and testing shows that using aarch64
neon filter_line3s performance is 30% better than two filter_lines
and a memcpy.
Adds a test for vf_bwdif filter_line3 to checkasm
Rounds job start lines down to a multiple of 4. This means that if
filter_line3 exists then filter_line will not sometimes be called
once at the end of a slice depending on thread count. The final slice
may do up to 3 extra lines but filter_edge is faster than filter_line
so it is unlikely to create any noticable thread load variation.
Signed-off-by: John Cox <jc@kynesim.co.uk>
Signed-off-by: Martin Storsjö <martin@martin.st>
Exports C filter_line needed for tail fixup of neon code
Adds neon for filter_line
Signed-off-by: John Cox <jc@kynesim.co.uk>
Signed-off-by: Martin Storsjö <martin@martin.st>
Adds clip and spatial macros for aarch64 neon
Exports C filter_edge needed for tail fixup of neon code
Adds neon for filter_edge
Signed-off-by: John Cox <jc@kynesim.co.uk>
Signed-off-by: Martin Storsjö <martin@martin.st>
Adds an outline for aarch neon functions
Adds common macros and consts for aarch64 neon
Exports C filter_intra needed for tail fixup of neon code
Adds neon for filter_intra
Signed-off-by: John Cox <jc@kynesim.co.uk>
Signed-off-by: Martin Storsjö <martin@martin.st>
The HDR metadata should be removed after HDR to SDR conversion,
otherwise the output frame still has HDR side data.
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
This makes the filter output match that of the C version.
It was left intentionally while we figured out if it was better
or not, and while it makes certain samples better, it makes static
samples jump around slightly.
The discrepancy between the definition and the declaration
in allfilters.c is actually UB.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The old logic was trying to be excessively clever in "deducing" that the
user wanted to stretch/scale the image when ow/oh differed from iw/ih
aspect ratio. But this is almost surely unintended except in
pathological cases, and in those cases users should simply disable
normalize_sar and do all the stretching/scaling logic themselves. This
is especially important in multi-input mode, where the canvas may be
vastly different from the input dimensions of any stream. Also, passing
through input 0 SAR in multi-input mode is arbitrary and nearly useless,
so again force output SAR to 1:1 here.
Use the gcd of all input timebases to ensure PTS accuracy. For the
framerate, just pick the highest of all the inputs, under the assumption
that we will render frames with approximately this frequency. Of course,
this is not 100% accurate, in particular if the input frames are badly
misaligned. But this field is informational to begin with.
Importantly, it covers the "common" case of combining high FPS and low
FPS streams with aligned frames.
In the event that some frame mixes are OK while others are not, the
priority goes:
1. Errors in updating any frame -> return error
2. Any input incomplete -> request frames and return
3. Any inputs OK -> ignore EOF streams and render remaining inputs
4. No inputs OK -> set output to most recent status
This logic ensures that we can continue rendering the remaining streams,
no matter which streams reach their end of life, until we have no
streams left at which point we forward the last EOF.
When combining multiple inputs, the output PTS may be less than the PTS
of the input. In this case, the current's code assumption of always
draining one value from the FIFO is incorrect. Replace by a smarter
function which drains only those PTS values that were actually consumed.
When combining multiple inputs with different PTS and durations, in
input-timed mode, we emit one output frame for every input frame PTS,
from *any* input. So when combining a low FPS stream with a high FPS
stream, the output framerate would match the higher FPS, independent of
which order they are specified in.
Subsequent inputs require frame blending to be enabled, in order to not
overwrite the existing frame contents.
For output metadata, we implicitly copy the metadata of the *first*
available stream (falling back to the second stream if the first has
already reached EOF, and so on). This is done to resolve any conflicts
between inputs with differing metadata. So when e.g. input 1 is HDR and
output 2 is SDR, the output will be HDR, and vice versa. This logic
could probablly be improved by dynamically determining some "superior"
set of metadata, but I don't want to handle that complexity in this
series.
Instead of finding the ref frame in output_frame() and then passing its
signature to update_crops(), pull out the logic and invoke it a second
time inside update_crops().
This may seem wasteful at present, but will actually become required in
the future, since update_crops() runs on *every* input, and needs values
specific to that input (which the signature isn't), while output_frame()
is only interested in a single input. It's much easier to just split the
logic cleanly.
Including the queue status, because these will need to be re-queried
inside output_frame_mix when that function is refactored to handle
multiple inputs.
In anticipation of a refactor which will enable multiple input support.
Note: the renderer is also input-specific because it maintains a frame
cache, HDR peak detection state and mixing cache, all of which are tied
to a specific input stream.
If the input queue is EOF, then the s->status check should already have
covered it, and prevented the code from getting this far.
If we still hit this case for some reason, it's probably a bug. Better
to hit the AVERROR_BUG branch.
We will postpone the vpp session initialization to when input and output
frames are ready, this copy of the sequence parameters will be used to
initialize vpp session.
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
- text is now shaped using libharfbuz
- glyphs position is now accurate to 1/4 pixel in both directions
- the default line height is now the one defined in the font
Adds libharfbuzz dependency.
I've been sitting on this for 3 1/2 years now(!), and I finally got
around to fixing the loose ends and convincing myself that it was
correct. It follows the same basic structure as yadif_cuda, including
leaving out the edge handling, to avoid expensive branching.
As we are introducing two new formats and supporting conversions
between them, and also with the existing 0rgb32/0bgr32 formats, we get
a combinatorial explosion of kernels. I introduced a few new macros to
keep the things mostly managable.
The conversions are all simple, following existing patterns, with four
specific exceptions. When converting from 0rgb32/0bgr32 to rgb32/bgr32,
we need to ensure the alpha value is set to 1. In all other cases, it
can just be passed through, either to be used or ignored.
When an option could not be found, print its name and value. Note that
this is not done while applying the options in
avfilter_graph_segment_apply_opts() to give the caller the option of
handling the missing options in some other way.
I'm not sure why I originally did this, but there's no good reason to
put pointers to the cuda context and stream in the priv struct. They
are directly available in the device context that is already being
stored there.
This also fixed a warning: implicit conversion from enumeration
type 'TF_DataType' (aka 'enum TF_DataType') to different
enumeration type 'DNNDataType'.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
The old code was not properly handling a bunch of edge-cases with
streams terminating earlier and also did not properly report back EOF
to the first input.
This fixes at least one case where the filter could stop doing
anything:
ffmpeg -f lavfi -i "color=blue:d=10" -f lavfi -i "color=aqua:d=0" -filter_complex "[0][1]xfade=duration=2:offset=0:transition=wiperight" -f null -
Importing Vulkan device on older versions no longer works due to the
lavu vulkan API changes (specifically, the switch to planar textures by
default). Additionally, importing on versions that don't suppirt
lock/unlock_queue is unsafe with the advent of the threaded vulkan
hwaccel. As a plus, saves us some annoying #ifdef boilerplate.
I will raise the minimum vf_libplacebo version globally on the next
stable release of libplacebo, and remove all of these checks.
The new function is much more precise
For default beta it is slightly slower, but its speed is already at the
worst case in that comparison
while the replaced function becomes much slower for larger beta
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
mix->timestamps is expressed relative to the source timebase, which is
possibly a different timescale from `base_pts`. We can't mix-and-match
here. The only reason this worked in my previous testing was because I
was testing on a source file which had an exactly matching timebase.
Fix it by always using the exact PTS as tagged on the AVFrame.
When no drawing is to be performed, tpad can work fine with
hardware frames, so advertise this in the query_formats
callback and ensure the drawing context is never initialised
when just cloning frames.
Reviewed-by: Thilo Borgmann <thilo.borgmann@mail.de>
Reviewed-by: Niklas Haas <git@haasn.dev>
This algorithm has once again been refactored, this time leading to a
dropping of the old `tone_mapping_mode` field, to be replaced by a
single tunable hybrid mode with configurable strength.
We can approximately map the old modes onto the new API for backwards
compatibility. Replace deprecated enums by their integer equivalents to
safely preserve this API until the next bump.
Upstream deprecated the old ad-hoc, enum/intent-based gamut mapping API
and added a new API based on colorimetrically accurate gamut mapping
functions.
The relevant change for us is the addition of several new modes, as well
as deprecation of the old options. Update the documentation accordingly.
The current code relied on pl_queue eventually returning EOF back to the
caller, which didn't work in all situations (e.g. single frame input).
Also, the current code assumed that ff_inlink_acknowledge_status only
fired once, which was patently not true, as the above edge cases
demonstrated.
Solve both issues by keeping track of the acknowledged link status and
forwarding it (instead of trying to probe the pl_queue again) in the
event that we run out of queued input frames, as well as (in CFR mode)
when we pass the indicated status PTS.
This exposes libplacebo's frame mixing functionality to vf_libplacebo,
by allowing users to specify a desired target fps to output at. Incoming
frames will be smoothly resampled (in a manner determined by the
`frame_mixer` option, to be added in the next commit).
To generate a consistently timed output stream, we directly use the
desired framerate as the timebase, and simply output frames in
sequential order (tracked by the number of frames output so far).
To present compatibility with the current behavior, we keep track of a
FIFO of exact frame timestamps that we want to output to the user. In
practice, this is essentially equivalent to the current filter_frame()
code, but this design allows us to scale to more complicated use cases
in the future - for example, insertion of intermediate frames
(deinterlacing, frame doubling, conversion to fixed fps, ...)
This does not leverage any immediate benefits, but refactors and
prepares the codebase for upcoming changes, which will include the
ability to do deinterlacing and resampling (frame mixing).
This commit contains no functional change. The goal is merely to
separate the highly intertwined `filter_frame` and `process_frames`
functions into their separate concerns, specifically to separate frame
uploading (which is now done directly in `filter_frame`) from emitting a
frame (which is now done by a dedicated function `output_frame`).
The overall idea here is to be able to ultimately call `output_frame`
multiple times, to e.g. emit several output frames for a single input
frame.
For example
ffmpeg -f lavfi -i sine -af "aformat=cl=stereo|5.1|7.1,lowpass,aformat=cl=7.1|5.1|stereo" -f null -
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Before this commit if allocation would fail in ff_add_channel_layout()
function, function would return negative error code and this would
cause wrong format pick up later. If allocation would not fail return
code would be 0 and then format negotiation would simply fail as code
would break from the loop but with wrong return code.
Error was introduced in 6aaac24d72 commit.
Fixes#6638
This is not public API, no it has no need for an alloc() and free()
functions. The struct can reside on stack.
Signed-off-by: James Almer <jamrial@gmail.com>
THis filter can correct certain issues seen from upstream sources
where the cc_count is not properly set or the CEA-608 tuples are
not at the start of the payload as expected.
Make use of the ccfifo to extract and immediately repack the CEA-708
side data, thereby removing any extra padding and ensuring the 608
tuples are at the front of the payload.
Signed-off-by: Devin Heitmueller <dheitmueller@ltnglobal.com>
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
Because the interlacing filter halves the effective framerate, we
need to ensure that no CEA-708 data is lost as frames are merged.
Make use of the new ccfifo mechanism to ensure that caption data
is properly preserved as frames pass through the filter.
Thanks to Thomas Mundt for review and noticing a couple of
missed codepaths for injection on output. Thanks to Lance Wang
for pointing out a memory leak.
Signed-off-by: Devin Heitmueller <dheitmueller@ltnglobal.com>
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
Various deinterlacing modes have the effect of doubling the
framerate, and we need to ensure that the caption data isn't
duplicated (or else you get double captions on-screen).
Use the new ccfifo mechanism for yadif (and yadif_cuda and bwdif
since they use the same yadif core) so that CEA-708 data is
properly preserved through this filter.
Signed-off-by: Devin Heitmueller <dheitmueller@ltnglobal.com>
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
The existing implementation made an attempt to remove duplicate
captions if increasing the framerate, but made no attempt to
handle reducing the framerate, nor did it rewrite the caption
payloads to have the appropriate cc_count (e.g. the cc_count needs
to change from 20 to 10 when going from 1080i59 to 720p59 and
vice-versa).
Make use of the new ccfifo mechanism to ensure that caption data
is properly preserved.
Signed-off-by: Devin Heitmueller <dheitmueller@ltnglobal.com>
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
When transcoding video that contains 708 closed captions, the
caption data is tied to the frames as side data. Simply dropping
or adding frames to change the framerate will result in loss of
data, so the caption data needs to be preserved and reformatted.
For example, without this patch converting 720p59 to 1080i59
would result in loss of 50% of the caption bytes, resulting in
garbled 608 captions and 708 probably wouldn't render at all.
Further, the frames that are there will have an illegal
cc_count for the target framerate, so some decoders may ignore
the packets entirely.
Extract the 608 and 708 tuples and insert them onto queues. Then
after dropping/adding frames, re-write the tuples back into the
resulting frames at the appropriate rate given the target
framerate. This includes both having the correct cc_count as
well as clocking out the 608 pairs at the appropriate rate.
Thanks to Lance Wang <lance.lmwang@gmail.com>, Anton
Khirnov <anton@khirnov.net>, and Michael Niedermayer <michael@niedermayer.cc>
for providing review/feedback.
Signed-off-by: Devin Heitmueller <dheitmueller@ltnglobal.com>
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>