This just bumps the required loader library version (libvulkan).
All device-related features, such as video decoding, atomics, etc.
are still optional and the code deals with their loss on a local level
(e.g. the decoder or filter checks for the features it needs, not
the hwcontext).
Bumping the required version essentially packs all maintenance
extensions which correct the spec rather than requiring to enable
them individually.
Fixes compilation with clang which errors out on Wint-conversion.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
This fixes the following error when compiling with a modern
version of Clang for Windows/i386:
src/libavutil/hwcontext_vulkan.c:738:32: error: incompatible function pointer types initializing 'PFN_vkDebugUtilsMessengerCallbackEXT' (aka 'unsigned int (*)(enum VkDebugUtilsMessageSeverityFlagBitsEXT, unsigned int, const struct VkDebugUtilsMessengerCallbackDataEXT *, void *) __attribute__((stdcall))') with an expression of type 'VkBool32 (VkDebugUtilsMessageSeverityFlagBitsEXT, VkDebugUtilsMessageTypeFlagsEXT, const VkDebugUtilsMessengerCallbackDataEXT *, void *)' (aka 'unsigned int (enum VkDebugUtilsMessageSeverityFlagBitsEXT, unsigned int, const struct VkDebugUtilsMessengerCallbackDataEXT *, void *)') [-Wincompatible-function-pointer-types]
.pfnUserCallback = vk_dbg_callback,
^~~~~~~~~~~~~~~
Signed-off-by: Martin Storsjö <martin@martin.st>
They're not currently used, so they don't need to be there.
Vulkan stabilized the decode extensions less than a week ago, and their
name prefixes were changed from EXT to KHR. It's a bit too soon to be
depending on it, so rather than bumping, just remove these for now.
If we want to be able to map between VAAPI and Vulkan (to do Vulkan
filtering), we need to have matching formats on each side.
The mappings here are not exact. In the same way that P010 is still
mapped to full 16 bit formats, P012 has to be mapped that way as well.
Similarly, VUYX has to be mapped to an alpha-equipped format, and XV36
has to be mapped to a fully 16bit alpha-equipped format.
While Vulkan seems to fundamentally lack formats with an undefined,
but physically present, alpha channel, it has have 10X6 and 12X4
formats that you could imagine using for P010, P012 and XV36, but these
formats don't support the STORAGE usage flag. Today, hwcontext_vulkan
requires all formats to be storable because it wants to be able to use
them to create writable images. Until that changes, which might happen,
we have to restrict the set of formats we use.
Finally, when mapping a Vulkan image back to vaapi, I observed that
the VK_FORMAT_R16G16B16A16_UNORM format we have to use for XV36 going
to Vulkan is mapped to Y416 when going to vaapi (which makes sense as
it's the exact matching format) so I had to add an entry for it even
though we don't use it directly.
VkPhysicalDeviceVulkan12Features isn't implemented on MoltenVK yet.
VkPhysicalDeviceTimelineSemaphoreFeatures is less versatile but
simple. None of device_features_1_1 nor device_features_1_2 has real
usage yet, keep the code for future.
We don't use it. Was copied from libplacebo's recommended defaults.
Creates problems with validation on Intel devices, where the driver
still advertizes it, even though it's not usable without a swapchain.
When vulkan image exports to drm, the tilling need to be
VK_IMAGE_TILING_DRM_FORMAT_MODIFIER_EXT. Now add code to create vulkan
image using this format.
Now the following command line works:
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format \
vaapi -i input_1080p.264 -vf "hwmap=derive_device=vulkan,format=vulkan, \
scale_vulkan=1920:1080,hwmap=derive_device=vaapi,format=vaapi" -c:v h264_vaapi output.264
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Further-modifications-by: Lynne <dev@lynne.ee>
Add support to map vulkan frames to software frames when
using contiguous_planes flag.
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Further-modifications-by: Lynne <dev@lynne.ee>
VAAPI on Intel can import external frame, but the planes of the external
frames should be in the same drm object. A new option "contiguous_planes"
is added to device. This flag tells device to allocate places in one
memory. When device is derived from vaapi this flag will be enabled.
A new flag frame_flag is also added to AVVulkanFramesContext. User
can use this flag to force enable or disable this behaviour.
A new variable "offset "is added to AVVKFrame. It describe describe the
offset from the memory currently bound to the VkImage.
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Further-modifications-by: Lynne <dev@lynne.ee>
Validation layer is an indispensable part of developing on Vulkan.
The following commands is on how to enable validation layers:
ffmpeg -init_hw_device vulkan=0,debug=1,validation_layers=VK_LAYER_LUNARG_monitor+VK_LAYER_LUNARG_api_dump
Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
"All commands that are allowed on a queue that supports transfer
operations are also allowed on a queue that supports either
graphics or compute operations. Thus, if the capabilities of a
queue family include VK_QUEUE_GRAPHICS_BIT or VK_QUEUE_COMPUTE_BIT,
then reporting the VK_QUEUE_TRANSFER_BIT capability separately for
that queue family is optional."
Include windows.h to fix it. Normally, it'd be better to include it in
vulkan_functions.h, but I'm reasonably confident nothing else that uses
the Vulkan code will need to include Windows functions and not windows.h.
This simplifies and makes queue family picking simpler and more robust.
The requirements on the device context are relaxed. They made no sense
in the first place.
The video encode/decode extension is still in beta, at least on paper,
but I really doubt they'd change needing a separate queue family.
While Vulkan itself went more or less the way it was expected to go,
libvulkan didn't quite solve all of the opengl loader issues. It's multi-vendor,
yes, but unfortunately, the code is Google/Khronos QUALITY, so suffers from
big static linking issues (static linking on anything but OSX is unsupported),
has bugs, and due to the prefix system used, there are 3 or so ways to type out
functions.
Just solve all of those problems by dlopening it. We even have nice emulation
for it on Windows.
VkPhysicalDeviceLimits.optimalBufferCopyRowPitchAlignment and
VkPhysicalDeviceExternalMemoryHostPropertiesEXT.minImportedHostPointerAlignment are of type
VkDeviceSize (a typedef uint64_t).
VkPhysicalDeviceLimits.minMemoryMapAlignment is of type size_t.
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Lynne <dev@lynne.ee>
fixes http://trac.ffmpeg.org/ticket/9055
The hw decoder may allocate a large frame from AVHWFramesContext, and adjust width and height based on bitstream.
We need to use resolution from src frame instead of AVHWFramesContext.
test command:
ffmpeg -loglevel debug -hide_banner -hwaccel vaapi -init_hw_device vaapi=va:/dev/dri/renderD128 -hwaccel_device va -hwaccel_output_format vaapi -init_hw_device vulkan=vulk -filter_hw_device vulk -i 1920x1080.264 -c:v libx264 -r:v 30 -profile:v high -preset veryfast -vf "hwmap,chromaber_vulkan=0:0,hwdownload,format=nv12" -map 0 -y vaapiouts.mkv
expected:
No green bar at bottom.
These two extensions and two features are both optionally used by
libplacebo to speed up rendering, so it makes sense for libavutil to
automatically enable them as well.
Vulkan formats with a PACK suffix define native endianess.
Vulkan formats without a PACK suffix are in bytestream order.
Pixel formats with a LE/BE suffix define endianess.
Pixel formats without LE/BE suffix are in bytestream order.
This relies on the fact that host memory is always going to be required
to be aligned to the platform's page size, which means we can adjust
the pointers when we map them to buffers and therefore skip an entire
copy. This has already had extensive testing in libplacebo without
problems, so its safe to use here as well.
Speeds up downloads and uploads on platforms which do not pool their
memory hugely, but less so on platforms that do.
We can pool the buffers ourselves, but that can come as a later patch
if necessary.
The process space is guaranteed to be aligned to the page size, hence we're
never going to map outside of our address space.
There are more optimizations to do with respect to chroma plane alignment and
buffer offsets, but that can be done later.
We want to copy the lowest amount of bytes per line, but while the buffer
stride is sanitized, the src/dst stride can be negative, and negative numbers
of bytes do not make a lot of sense.
This was never actually used, likely due to confusion, as the device context
also had one used for uploads and downloads.
Also, since we're only using it for very quick image barriers (which are
practically free on all hardware), use the compute queue instead of the
transfer queue.
This commit makes full use of the enabled queues to provide asynchronous
uploads of images (downloads remain synchronous).
For a pure uploading use cases, the performance gains can be significant.
With this, the puzzle of making libplacebo, ffmpeg and any other Vulkan
API users interoperable is complete.
Users of both libraries can initialize one another's contexts without having
to create a new one.
This allows for users who derive devices to set options for the
new device context they derive.
The main use case of this is to allow users to enable extensions
(such as surface drawing extensions) in Vulkan while deriving from
the device their frames are on. That way, users don't need to write
any initialization code themselves, since the Vulkan spec invalidates
mixing instances, physical devices and active devices.
Apart from Vulkan, other hwcontexts ignore the opts argument since they
don't support options at all (or in VAAPI and OpenCL's case, options are
currently only used for device selection, which device_derive overrides).
Only warn instead. API users can find out which extensions were unavailable
by using the enabled_inst_extensions and enabled_dev_extensions fields.
This eliminates having to trial-and-error to find which extensions were missing.
Due to our AVHWDevice infrastructure, where API users are offered a way
to derive contexts rather than always create new one, our filterchains,
being supported by a single hardware device context, can grow to considerable
size.
Hence, in such situations, using the maximum amount of queues the device offers
can be benefitial to eliminating bottlenecks where queue submissions on the
same family have to wait for the previous one to finish.
This reverts commit 97b526c192.
It broke the API, and assumed no other APIs used multiple semaphores.
This also disallowed certain optimizations to happen.
Dealing with APIs that give or expect single semaphores is easier when
we use per-image semaphores.
The specs note that images should be in the GENERAL layout when exporting
for maximum compatibility.
CUDA exported images are handled differently, and the queue is the same,
so we don't need to do that there.
As it turns out, we were already assuming and treating all images as if they had
concurrent access mode. This just changes the flag to CONCURRENT, which has less
restrictions than EXCLUSIVE, and fixed validation messages on machines with
multiple queues.
The validation layer didn't pick this up because the machine I was testing on
had only a single queue.
This solves a huge oversight - it lets users reliably use their own
AVVulkanDeviceContext. Otherwise, the extensions supplied and enabled
are not discoverable by anything outside of hwcontext_vulkan.
Also clarifies that any user-supplied VkInstance must be at least 1.1.
Also documents all options supported by the hwdevice.
This lets users enable all extensions they need without writing their own
instance initialization code.
We derive the destination buffer stride from the input stride,
which meant if the image was flipped with a negative stride,
we'd be FFALIGNING a negative number which ends up being huge,
thus making the Vulkan buffer allocation fail and the whole
image transfer fail.
Only found out about this as OpenGL compositors can copy an entire
image with a single call if its flipped, rather than iterate over
each line.
The idea was to allow separate planes to be filtered independently, however,
in hindsight, literaly nothing uses separate per-plane semaphores and it
would only work when each plane is backed by separate device memory.