Previously floats where scaled up to 32bit int, but floats do not
have 32bits in their mantisse so a quarter of the bits where nonsense.
It seems no fate test is affected by this change, which is interresting
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The new function is much more precise
For default beta it is slightly slower, but its speed is already at the
worst case in that comparison
while the replaced function becomes much slower for larger beta
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
0th order modified bessel function of the first kind are used in multiple
places, lets avoid having 3+ different implementations
I picked this one as its accurate and quite fast, it can be replaced if
a better one is found
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: signed integer overflow: -9223372036854775808 - 2082844800 cannot be represented in type 'long'
Fixes: 58384/clusterfuzz-testcase-minimized-ffmpeg_dem_MOV_fuzzer-6428383700713472
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
When seeking through MBAFF-coded H264, this can happen. Decoding calls end_frame
without calling start_frame. We are unable to decode this, as no frame
state has been set.
Happens for both VAAPI and Vulkan. Could be an issue elsewhere, hence
the individual commit.
There are circumstances where the flag isn't set but the skip mode
frames are. So don't use the inferred bit which has other inputs
when deciding to pass the skip mode frames to the device.
This fixes some decoding bugs on intel av1
The temporary AVFrame on staack enables us to use the common
dependency/dispatch code in prepare_frame().
The prepare_frame() function is used for both frame initialization
and frame import/export queue family transfer operations.
In the former case, no AVFrame exists yet, so, as this is purely
libavutil code, we create a temporary frame on stack. Otherwise,
we'd need to allocate multiple frames somewhere, one for each
possible command buffer dispatch.
The idea was that it's faster to map linear images and copy them
via regular memcpy. This is a very niche use, plus very inconsistently
useful, as it would only really be faster on a few Intel GPUs.
Even then, using the non-cached memcpy would've been better.
Instead, scrap this code. Drivers are better at figuring out
what copy to use, and if we're host-mapping, it should actually be
just as fast, if not faster.
This commit adds proper handling of multiplane images throughout
all of the hwcontext code. To avoid breakage of individual
components, the change is performed as a single commit.