Today, cuda is not able to import multiplane images, and cuda requires
images to be imported whether you trying to import to cuda or export
from cuda (in the later case, the image is imported and then copied
into on the cuda side). So any interop between cuda and vulkan requires
that multiplane be disabled.
The existing option for this is not sufficient, because when deriving
devices it is not possible to specify any options.
And, it is necessary to derive the Vulkan device, because any pipeline
that involves uploading from cuda to vulkan and then back to cuda must
use the same cuda context on both sides, and the only way to propagate
the cuda context all the way through is to derive the device at each
stage.
ie:
-vf hwupload=derive_device=vulkan,<filters>,hwupload=derive_device=cuda
0th order modified bessel function of the first kind are used in multiple
places, lets avoid having 3+ different implementations
I picked this one as its accurate and quite fast, it can be replaced if
a better one is found
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The temporary AVFrame on staack enables us to use the common
dependency/dispatch code in prepare_frame().
The prepare_frame() function is used for both frame initialization
and frame import/export queue family transfer operations.
In the former case, no AVFrame exists yet, so, as this is purely
libavutil code, we create a temporary frame on stack. Otherwise,
we'd need to allocate multiple frames somewhere, one for each
possible command buffer dispatch.
The idea was that it's faster to map linear images and copy them
via regular memcpy. This is a very niche use, plus very inconsistently
useful, as it would only really be faster on a few Intel GPUs.
Even then, using the non-cached memcpy would've been better.
Instead, scrap this code. Drivers are better at figuring out
what copy to use, and if we're host-mapping, it should actually be
just as fast, if not faster.
This commit adds proper handling of multiplane images throughout
all of the hwcontext code. To avoid breakage of individual
components, the change is performed as a single commit.
This commit rewrites the majority of vulkan.c to enable its use
as a general-purpose high-level utility code, usable for decoding,
encoding, and filtering of video frames.
The dependency system was rewritten to simplify management of
execution.
The image handling system was rewritten to accomodate multiplane
images.
Due to how related all the new features were, this is a single
commit.
This just disables the vulkan headers from defining any symbols
like vkCmdPipelineBarrier2(). Instead, all functions must be loaded
via the loader and used as function pointers as vk->CmdPipelineBarrier2.
Mostly just forces developers to write correct code, as using the
symbols can be undesirable in case API users define their own
function wrappers via the loader API.
The hack was added to enable exporting of vulkan images to DRM.
On Intel hardware, specifically for DRM images, all planes must be
allocated next to each other, due to hardware limitation, so the hack
used a single large allocation and suballocated all planes from it.
By natively supporting multiplane images, the driver is what decides
the layout, so exporting just works.
It's a hack because it conflicted heavily with image allocation, and
with the whole ecosystem in general, before multiplane images were
supported, which just made it redundant.
This is also the commit which broke the hwcontext hardest and prompted
the entire rewrite in the first place.