This handles the low-level execution of an op list, and integration into
the SwsGraph infrastructure. To handle frames with insufficient padding in
the stride (or a width smaller than one block size), we use a fallback loop
that pads the last column of pixels using `memcpy` into an appropriately
sized buffer.
This is responsible for taking a "naive" ops list and optimizing it
as much as possible. Also includes a small analyzer that generates component
metadata for use by the optimizer.
See docs/swscale-v2.txt for an in-depth introduction to the new approach.
This commit merely introduces the ops definitions and boilerplate functions.
The subsequent commits will flesh out the underlying implementation.
Give users and developers a way to opt in to the new format conversion code,
and more code from the swscale rewrite in general, even while development is
still ongoing.
We split the standard macro into its body (implementation) and declaration,
and use a macro argument in place of the raw `memcmp` call, with the major
difference that we now take the number of pixels to compare instead of the
number of bytes (to match the signature of float_near_ulp_array).
Sometimes, when measuring very small functions, rdtsc is not accurate enough
to get a reliable measurement. This increases the number of runs inside the
inner loop from 4 to 32, which should help a lot. Less important when using
the more precise linux-perf API, but still useful.
There should be no user-visible change since the number of runs is adjusted
to keep the total time spent measuring the same.
This behavior had no real justification and was just incredibly confusing,
since the in/out pointers passet to setup() did not match those passed to
run(), all for what is arguably an exception anyways (the palette setup).
This wrapping logic still considered any nonzero return from the ASM function
to be the overall result, but this is not true since the addition of
FF_ALPHA_TRANSPARENT.
Fix it by only early returning if FF_ALPHA_STRAIGHT is detected.
Fixes: 9b8b78a815
See-Also: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/20301#issuecomment-4802
Extract Orientation and export it as a display matrix if present, and set the
frame's metadata with the remaining Exif entries.
Signed-off-by: James Almer <jamrial@gmail.com>
In hybrid_fragmented mode, the first moov at start should contain
mvex tag, since it's fmp4. When writing the moov at the second time,
it's not fmp4 any more, so mvex should be skipped.
Set CODECAPI_AVLowLatencyMode when AV_CODEC_FLAG_LOW_DELAY is enabled on
the AVCodecContext. AVLowLatencyMode can acheive lower latency encoding
that may not be accessible using eAVScenarioInfo options alone.
Signed-off-by: Cameron Gutman <aicommander@gmail.com>
Currently, the aacencdsp checkasm tests fails for many seeds,
if the C code has been built with x87 math. This happens because
the excess precision of x87 math can make it end up rounding
to a different integer, and the checkasm tests checks that the
output integers match exactly between C and assembly.
One such failing case is "tests/checkasm/checkasm --test=aacencdsp
41" when compiled with GCC. When compiled with Clang, the test
seed 21 produces a failure.
To avoid the issue, we need to limit the precision of intermediates
to their nominal float range, matching the assembly implementations.
This can be achieved when compiling with GCC, by just adding a single
cast.
To observe the effect of this cast, compile the following
snippet,
int cast(float a, float b) {
return (int)
#ifdef CAST
(float)
#endif
(a + b);
}
with "gcc -m32 -std=c17 -O2", with/without -DCAST. For x86_64
cases (without the "-m32"), the cast doesn't make any difference
on the generated code.
This cast would seem to not have any effect, as a binary expression
with float inputs also would have the type float.
However, if compiling with GCC with -fexcess-precision=standard,
the cast forces limiting the precision according to the language
standard here - according to the GCC docs [1]:
> When compiling C or C++, if -fexcess-precision=standard is
> specified then excess precision follows the rules specified in
> ISO C99 or C++; in particular, both casts and assignments cause
> values to be rounded to their semantic types (whereas -ffloat-store
> only affects assignments). This option is enabled by default for
> C or C++ if a strict conformance option such as -std=c99 or
> -std=c++17 is used.
Ffmpeg's configure scripts enables -std=c17 by default.
This only helps with GCC though - the cast doesn't make any
difference for Clang. (Although, upstream Clang seems to default
to SSE math, while Ubuntu provided Clang defaults to x87 math.)
Limiting the precision with Clang would require casting to volatile
float for both intermediates here - and that does have a code
generation effect on all architectures.
[1] https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html
Using the built-in token seems to only partially work.
And in the latest Forgejo release the issue forcing the use of the
built-in token was fixed with:
https://codeberg.org/forgejo/forgejo/pulls/9025
Change the PCM table init to use compile-time #if instead of a runtime
`if (CONFIG_...)`. This makes sure code for disabled encoders never gets
built, without depending on compiler dead-code elimination.
Signed-off-by: YiboFang <fangyibo@xiaomi.com>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Add the max_frame_size option to support setting max frame size in
bytes. Max frame size is the maximum cap in the bitrate algorithm per
each encoded frame.
Signed-off-by: Tong Wu <wutong1208@outlook.com>
strftime returns 0 in case of an empty output (e.g. %p format string with some
locales), there is no way to distinguish this from a buffer-too-small error
condition. So we must use some heuristics to handle this case, and not consume
INT_MAX RAM and falsely report a truncated output.
Signed-off-by: Marton Balint <cus@passwd.hu>
The counters should be incremented for each new codec.
Catch-up to be in sync with codec_id.h again.
Signed-off-by: Nicolas Gaullier <nicolas.gaullier@cji.paris>
If this function returns an error after ff_sws_graph_add_pass() has been
called, and the pass->free callback is therefore already set up to free the
context, the graph will end up freed twice: once by the pass->free callback
(during ff_sws_graph_free()), and once before that by failure path of the
caller (e.g. add_legacy_sws_pass(), or init_legacy_subpass() itself for
cascaded contexts.)
The solution is to redefine the ownership of SwsGraph to pass clearly from
the caller of add_legacy_sws_pass() to init_legacy_subpass(), which can then
deal with appropriately freeing the context conditional on whether or not the
pass was already registered in the pass list.
Reported-by: 김영민 <kunshim@naver.com>
Signed-off-by: Niklas Haas <git@haasn.dev>
'decode_spectrum' reads 5 bits from bitstream to get
number of encoded subbands – so 31 means all 32
subbands are encoded. This value also is used to
determinate the number of used band in the hybrid
filterbank.
'subband_tab' array contains 33 values of MDCT spec
line positions started from 0 line and used to map
subband number in to the range of mdct lines.
Since the subband_num returned by decode_spectrum
actually is number – 1 and subband_tab started from 0
we need to add 1 to make num_bands calculation correct.
We're missing a call to av_exif_free here. We leak the internal heap-
allocated objects when &ifd goes out of scope.
Signed-off-by: Leo Izen <leo.izen@gmail.com>