Vulkan's main issue around using BGR is simple.
The letters in the shader don't match up (rgba in shader, bgra in format).
So of course, rather than allowing "bgra" or other permutations of
formats in the shader, they went the nuclear option and spent months writing
an extension to get rid of the need to have a format in the shader to begin
with.
All this to solve a problem that should never have existed to begin with.
This fixes BGRA images since enabling WithoutFormat, as the GPU now remaps
without your involvement.
This implements support for reading and writing storage images with
no format.
The issue is that we define our images as arrays, and arrays can
only have a single type, which means that f.ex. NV12 needs two
different images, R8 and RG8.
The only driver known not to advertise support for the extension
as a whole is Intel, because they have parial support for odd formats
we never use. Therefore, just always enable it by default.
We violated the spec, which, despite the actual command buffer pool
*not* being involved in any functions which require external synchronization
of the pool, *require* external synchronization even if only the
command buffers are used.
This also has the effect of *significantly* speeding up execution
in case command buffers are contended.
This commit adds support for compiler hints.
While on AMD these are not used/needed, Nvidia benefits from them, and gives
a sizeable 10% speedup on 4k.
This simplifies the code, reduces allocations, and critically, does
not store references of frames, along with references to hw_frames_ctx.
The issue was that storing refs to frames while transferring stored
refs to hw_frames_ctx of frames, and so created a circular dependency,
which caused the Vulkan device to never be terminated.
This only stores what it strictly needs as a dependency, and enables
the frames context to be freed, even while doing asynchronous transfers.
This enables users to specify a number that would be appended to
the buf_content string.
Saves users from needing to manually print to a string.
An earlier commit tried doing this via .elems, but it was
faulty, as this also incremented the total number of descriptors
in the descriptor set.
We recently introduced a public field which was a superset
of the queue context we used to have.
Switch to using it entirely.
This also allows us to get rid of the NIH function which was
valid only for video queues.
This code was simply incorrect through and through. It did not
protect what actually has to be protected in a multi-threaded setup.
Perhaps it was used to silence threading errors?
Either way, remove it, and document the correct way to use execution
pools in a threaded environment.
The code used to use atomic, but over time, this got broken.
This commit also remmoves the is-the-last-submission-ready
shortcut, which rarely did anything.
There's also value in relying on the fact that contexts
always carry their frames in a strictly incremental order
with no gaps.
The code used as a basis was the buffer dependency code, where the
counter was incremented after each buffer, but for the sw_frame dep
API, we only support adding individual frames at a time.