d119ae2fd8 removed the loop-breaking condition
received_sigterm.
Thus, signals no longer gracefully shutdown ffmpeg.
Fixes: #10834
Signed-off-by: Patrick Wang <mail6543210@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
If we're invoked with range == UINT_MAX, we end up doing
"rnd() % (UINT_MAX + 1)", which is equal to "rnd() % 0". On
arm (on all platforms) and on MSVC i386, this ends up crashing
at runtime.
This fixes the crash.
In constant FPS mode, it's possible for there no be *no* input active at
the current timestamp, even though there would be active inputs later in
the timeline (nb_active > 0). This would previously hit the AVERROR_BUG case.
With this change, we can instead output an empty frame.
Instead of having a single s->status field to track the most recently
EOF'd stream, track the number of active streams directly and only query
the most recent status at the end.
The reason this worked before is because we implicitly relied on the
assumption that `ok` would always be true until all streams are EOF, but
because it's possible to have gaps in the timeline when mixing multiple
streams, this is not always the case in principle.
In practice, this fixes a bug where the filter would early-exit (EOF)
too soon, when any input reached EOF and there is a gap in the timeline.
When using libplacebo to composite multiple streams with complex PTS
values, it's possible for there to be a "hole" in the output stream; in
particular in constant FPS mode. This commit refactors output_frame() to
allow it to handle the case of there being no active input.
A --enable-* option should not have a condition where it's enabled by some
other mean, as is the case here if swscale is enabled.
Signed-off-by: James Almer <jamrial@gmail.com>
This patch adds format handling code for the new operations. This entails
fully decoding a format to standardized RGB, and the inverse.
Handling it this way means we can always guarantee that a conversion path
exists from A to B without having to explicitly cover logic for each path;
and choosing RGB instead of YUV as the intermediate (as was done in swscale
v1) is more flexible with regards to enabling further operations such as
primaries conversions, linear scaling, etc.
In the case of YUV->YUV transform, the redundant matrix multiplication will
be canceled out anyways.
Because of the lack of an external ABI on low-level kernels, we cannot
directly test internal functions. Instead, we construct a minimal op chain
consisting of a read, the op to be tested, and a write.
The bigger complication arises from the fact that the backend may generate
arbitrary internal state that needs to be passed back to the implementation,
which means we cannot directly call `func_ref` on the generated chain. To get
around this, always compile the op chain twice - once using the backend to be
tested, and once using the reference C backend.
The actual entry point may also just be a shared wrapper, so we need to
be very careful to run checkasm_check_func() on a pseudo-pointer that will
actually be unique for each combination of backend and active CPU flags.
This covers most 8-bit and 16-bit ops, and some 32-bit ops. It also covers all
floating point operations. While this is not yet 100% coverage, it's good
enough for the vast majority of formats out there.
Of special note is the packed shuffle fast path, which uses pshufb at vector
sizes up to AVX512.
Provides a generic fast path for any operation list that can be decomposed
into a series of memcpy and memset operations.
25% faster than the x86 backend for yuv444p -> yuva444p
33% faster than the x86 backend for gray -> yuvj444p
This will serve as a reference for the SIMD backends to come. That said,
with auto-vectorization enabled, the performance of this is not atrocious.
It easily beats the old C code and sometimes even the old SIMD.
In theory, we can dramatically speed it up by using GCC vectors instead of
arrays, but the performance gains from this are too dependent on exact GCC
versions and flags, so it practice it's not a substitute for a SIMD
implementation.
This can turn any compatible sequence of operations into a single packed
shuffle, including packed swizzling, grayscale->RGB conversion, endianness
swapping, RGB bit depth conversions, rgb24->rgb0 alpha clearing and more.
This handles the low-level execution of an op list, and integration into
the SwsGraph infrastructure. To handle frames with insufficient padding in
the stride (or a width smaller than one block size), we use a fallback loop
that pads the last column of pixels using `memcpy` into an appropriately
sized buffer.
This is responsible for taking a "naive" ops list and optimizing it
as much as possible. Also includes a small analyzer that generates component
metadata for use by the optimizer.
See docs/swscale-v2.txt for an in-depth introduction to the new approach.
This commit merely introduces the ops definitions and boilerplate functions.
The subsequent commits will flesh out the underlying implementation.
Give users and developers a way to opt in to the new format conversion code,
and more code from the swscale rewrite in general, even while development is
still ongoing.
We split the standard macro into its body (implementation) and declaration,
and use a macro argument in place of the raw `memcmp` call, with the major
difference that we now take the number of pixels to compare instead of the
number of bytes (to match the signature of float_near_ulp_array).
Sometimes, when measuring very small functions, rdtsc is not accurate enough
to get a reliable measurement. This increases the number of runs inside the
inner loop from 4 to 32, which should help a lot. Less important when using
the more precise linux-perf API, but still useful.
There should be no user-visible change since the number of runs is adjusted
to keep the total time spent measuring the same.
This behavior had no real justification and was just incredibly confusing,
since the in/out pointers passet to setup() did not match those passed to
run(), all for what is arguably an exception anyways (the palette setup).
This wrapping logic still considered any nonzero return from the ASM function
to be the overall result, but this is not true since the addition of
FF_ALPHA_TRANSPARENT.
Fix it by only early returning if FF_ALPHA_STRAIGHT is detected.
Fixes: 9b8b78a815
See-Also: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20301#issuecomment-4802
Extract Orientation and export it as a display matrix if present, and set the
frame's metadata with the remaining Exif entries.
Signed-off-by: James Almer <jamrial@gmail.com>
In hybrid_fragmented mode, the first moov at start should contain
mvex tag, since it's fmp4. When writing the moov at the second time,
it's not fmp4 any more, so mvex should be skipped.
Set CODECAPI_AVLowLatencyMode when AV_CODEC_FLAG_LOW_DELAY is enabled on
the AVCodecContext. AVLowLatencyMode can acheive lower latency encoding
that may not be accessible using eAVScenarioInfo options alone.
Signed-off-by: Cameron Gutman <aicommander@gmail.com>