Favour left aligned columns over right aligned columns.
In principle either style should be ok, but some of the cases
easily lead to incorrect indentation in the surrounding code (see
a couple of cases fixed up in the preceding patch), and show up in
automatic indentation correction attempts.
Signed-off-by: Martin Storsjö <martin@martin.st>
Some functions have slightly different indentation styles; try
to match the surrounding code.
libavcodec/aarch64/vc1dsp_neon.S is skipped here, as it intentionally
uses a layered indentation style to visually show how different
unrolled/interleaved phases fit together.
Signed-off-by: Martin Storsjö <martin@martin.st>
The gather index vector is only used as double-length (due to register
pressure), so no need to initialise it for quad-length. Basically this
matches the multiplier in the prologue to the the multipler in the loop.
Fixes multiplane support on Nvidia.
Also, remove the ENCODE usage, even if the driver signals it as supported.
Currently, it's not used, and when it is used, it'll be gated behind
two extension checks.
Fixes: signed integer overflow: -1364715454 + -1468954671 cannot be represented in type 'int'
Fixes: 62093/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_AAC_FIXED_fuzzer-5538774254485504
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This revectors the inner loop to reverse vectors element in vectors,
thus eliminating the negative register stride. Note that RVV does not
have a vector reverse instruction, so this uses a gather.
It does not make much sense to me, but GCC somehow optimises the
inline assembler even though the output is very obviously used and
having observable side effects.
This reverts commit 09731fbfc3.
ISO C++ forbids compound-literals. It's not available with MSVC.
This is a known issue from 10 years ago, and that's why there is a
av_get_time_base_q().
Since we have no plan to remove AV_TIME_BASE_Q, just make it
available in C++.
There are multiple choices:
1. Use C++11 syntax: AVRational{1, AV_TIME_BASE}
Users may still use C++98 to write new code. So no.
2. Use av_get_time_base_q().
It's for this purpose. But it's not compile time constants as
AV_TIME_BASE_Q in C.
So I choose av_make_q() as Anton's suggestion.
https://libav-devel.libav.narkive.com/ZQCWfTun/patch-0-2-fix-avutil-h-usage-from-c
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
The alignment in vulkan_unmap_from_drm() (formerly the clone
of vulkan_frame_free()) is nicer than the in vulkan_frame_free(),
let's preserve it.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The AVBuffer API uses uint8_t as base type for buffers
and therefore its free callbacks need to abide by this.
Therefore vulkan_frame_free() used an inappropriate signature
which caused casts whenever this function has been called
manually.
This commit changes this by making vulkan_frame_free()
use the proper type and a vulkan_frame_free_cb() that
is used as free callback for the AVBuffer API.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
An AVBPrint is initialized via av_bprint_init() (or
av_bprint_init_for_buffer()) which expects uninitialized
AVBPrints; it is therefore not necessary to zero them before
the actual initialization.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
av_image_copy() accepts const uint8_t* const * as source;
lots of user have uint8_t* const * and therefore either
cast (the majority) or copy the array of pointers.
This commit changes this by adding a static inline wrapper
for av_image_copy() that casts between the two types
so that we do not need to add casts everywhere else.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Also constify AVAudioFifo* in the peek functions
besides constifying intermediate pointers (void**->void * const *).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is done immediately without waiting for the next major bump
just as in 9546b3a1cb and
4eaaa38d3d.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is not necessary at all. So remove it.
This also breaks an inclusion cycle mem.h->avutil.h->common.h->mem.h.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Up until now, avutil.h includes common.h which includes mem.h which
includes avutil.h, so that all these headers are in fact equivalent.
Yet mem.h does not need to include avutil.h at all and when it no longer
does, including common.h will no longer include error.h (included by
avutil.h) as well; change this by moving error.h to avutil.h, as error.h
is clearly a commonly used header.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These are in-place transforms, required for DCT-I and DST-I.
Templated as the mod2 variant requires minor modifications, and is
required specifically for DCT-I/DST-I.
The loader ensures only that functions with tagged supported extensions
exist, rather than ensuring only those with supported extensions are
loaded.
As the init function uses Vulkan functions, whose loading requires them
to have the extension flags set, the extension flags are guaranteed
to also exist at this point.
So far, AV_READ_TIME would return the cycle counter. This posed two
problems:
1) On recent systems, it would just raise an illegal instruction
exception. Indeed RDCYCLE is blocked in user space to ward off some
side channel attacks. In particular, this would cause the random
number generator to crash.
2) It does not match the x86 behaviour and the apparent original intent
of AV_READ_TIME in the functional code base (outside test cases).
So this replaces the cycle counter with the time counter. The unit is
a platform-dependent constant fraction of time, and the value should be
stable across harts (RISC-V lingo for physical CPU thread).
Currently, create_pnext is only used if an applicable external memory
extension is enabled. This will usually the case when used from the command
line, but may not be when the Vulkan context is created manually.
For images used in video decoding, create_pnext contains the video profile
list, which is mandatory.[1] This fixes a GPU crash when using RADV.
[1] https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkImageCreateInfo.html#VUID-VkImageCreateInfo-usage-04815
Signed-off-by: Chris Spencer <spencercw@gmail.com>
This abstraction is similar to the existing one for pthread_mutex_t and
pthread_once_t functions, and should reduce the amount of ifdeffery used
in future code.
Signed-off-by: James Almer <jamrial@gmail.com>
C++ doesn't support designated initializers until C++20. We have
a bunch of pre-defined channel layouts, the gains to make them
usable in C++ exceed the losses.
Bump minor version so C++ project can check before use these defines.
Also initialize .opaque field explicitly to reduce warning in C++.
Signed-off-by: James Almer <jamrial@gmail.com>
Including winsock2.h or windows.h without WIN32_LEAN_AND_MEAN cause
bzlib.h to parse as nonsense, due to an instance of #define char small
in rpcndr.h.
See:
https://stackoverflow.com/a/27794577
Signed-off-by: L. E. Segovia <amy@amyspark.me>
Signed-off-by: Martin Storsjö <martin@martin.st>
Specifically, test copying a channel layout with custom order,
so that the allocation codepath of av_channel_layout_copy()
is executed.
Reviewed-by: Nicolas George <george@nsup.org>
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This test does not need access to the internals of said compilation
unit.
Reviewed-by: Nicolas George <george@nsup.org>
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Reviewed-by: Nicolas George <george@nsup.org>
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
av_channel_name(), av_channel_description() and
av_channel_layout_describe() are supposed to return the size
of the needed buffer to allow the user to check for truncation;
the documentation ("If the returned value is bigger than buf_size,
then the string was truncated.") confirms that size does not
mean strlen.
Yet the AVBPrint API, i.e. AVBPrint.len, does not account for
the terminating '\0'. Therefore the returned length is off by one.
Furthermore, also check for whether the returned value actually
fits in an int (which is the return value of these functions).
Reviewed-by: Nicolas George <george@nsup.org>
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The AVBPrint API guarantees that the string buffer is always
zero-terminated; in order to honour this guarantee, there
obviously must be a string buffer at all and it must have
a size >= 1. Therefore av_bprint_init_for_buffer() treats
passing a NULL buffer or size == 0 as invalid data that
leads to undefined behaviour, namely NPD in case NULL is provided
or a write to a buffer of size 0 in case size == 0.
But it would be easy to support this, namely by using the internal
buffer with AV_BPRINT_SIZE_COUNT_ONLY in case size == 0.
There is a reason to allow this: Several functions like
av_channel_(description|name) are actually wrappers
around corresponding AVBPrint functions. They accept user
provided buffers and are supposed to return the required
size of the buffer, which would allow the user to call
it once to get the required buffer size and call it once
more after having allocated the buffer.
If av_bprint_init_for_buffer() treats size == 0 as invalid,
all these users would need to check for this themselves
and basically add the same codeblock that this patch
adds to av_bprint_init_for_buffer().
This change is in line with e.g. snprintf() which also allows
the pointer to be NULL in case size is zero.
This fixes Coverity issues #1503074, #1503076 and #1503082;
all of these issues are about providing NULL to the channel-layout
functions that are wrappers around AVBPrint versions.
Reviewed-by: Nicolas George <george@nsup.org>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Use the GCC specific codepath for Clang in MSVC mode too.
This matches the condition used in a number of other places.
MSVC doesn't have a way to signal potential aliasing, while GCC
(and Clang) can use __attribute__((may_alias)) for this purpose.
When building with Clang in MSVC mode, __GNUC__ isn't defined but
_MSC_VER is as Clang primarily impersonates MSVC - but even then it
does support the GCC style attributes.
The GCC specific codepath uses av_alias, which expands to
the may_alias attribute if supported. The MSVC specific codepath
doesn't use av_alias so far (as MSVC doesn't support any
corresponding attribute).
This fixes a couple HEVC decoder tests when built with Clang 14 or
newer in MSVC mode (with issues observed on all of x86_64, armv7
and aarch64).
Signed-off-by: Martin Storsjö <martin@martin.st>
libavutil/hwcontext_qsv.c: In function ‘qsv_map_to’:
libavutil/hwcontext_qsv.c:1905:47: warning: cast from pointer to integer
of different size [-Wpointer-to-int-cast]
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
major/minor are in <sys/types.h> on BSDs and <sys/mkdev.h> on Solaris-like.
libavutil/hwcontext_vulkan.c:55:10: fatal error: 'sys/sysmacros.h' file not found
#include <sys/sysmacros.h>
^~~~~~~~~~~~~~~~~
1) Take the reductive sum out of the loop,
leaving a regular vector addition in the loop.
2) Merge the addition and the multiplication.
3) Unroll.
Before:
scalarproduct_float_rvv_f32: 832.5
After:
scalarproduct_float_rvv_f32: 275.2
The code was blindly assuming that Zbb or V implied Zba. While the
earlier is practically always true, the later broke some QEMU setups,
as V was introduced earlier than Zba.
This is not actually used for anything. The configure check causes the
CPU feature flag to be set, but nothing consumes it at all.
While AArch64 does have VFP, it is only used for the scalar C code.
Conversely, it is still possible to disable VFP, by changing the
C compiler flags as before (though that only makes sense for an
hypothetical non-standard Armv8 platform without VFP).
Note that this retains the "vfp" option flag, for backward
compatibility and on the very remote but theoretically possible chance
that FFmpeg actually makes use of it in the future.
AV_CPU_FLAG_VFP is retained as it is actually used by AArch32.
libavutil/random_seed.c calls arc4random_buf which is
not available on OSX 10.4 Tiger, but the configuration
script tests for arc4random which is available.
Fix the configuration test to match the actual API used.
Co-authored-by: James Almer <jamrial@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Uses the existing code for av_get_random_seed() to return a buffer with
cryptographically secure random data, or an error if none could be generated.
Signed-off-by: James Almer <jamrial@gmail.com>
This ensures the requested amount of bytes is read.
Also remove /dev/random as it's no longer necessary.
Signed-off-by: James Almer <jamrial@gmail.com>
When qsv device is created by device_derive, the ctx->free function is
not registered, causing potential memory leak because of not properly
closing the MFX session.
Signed-off-by: Tong Wu <tong1.wu@intel.com>
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
nvenc declares support for these formats, but if hwcontext_cuda doesn't
do that as well, then it's not possible to hwupload them for use in a
possible cuda pipeline before encoding.
This reduces memory needed dramatically, as unneeded resources
can be immediately returned to the pool.
Although waitforfences is threadsafe, we add a mutex wait around
it, as the mutex fence in combination with waitforfences assures
us that no other thread will reset the fence in the meanwhile
whilst the mutex is locked. This allows is to call
ff_vk_exec_discard_deps.
For now, there's not much value in this since Clang don't support
enabling the dotprod or i8mm features with either .arch_extension
or .arch (it has to be enabled by the base arch flags passed to
the compiler). But it may be supported in the future.
Signed-off-by: Martin Storsjö <martin@martin.st>
These are available since ARMv8.4-a and ARMv8.6-a respectively,
but can also be available optionally since ARMv8.2-a.
Check if ".arch armv8.2-a" and ".arch_extension {dotprod,i8mm}" are
supported, and check if the instructions can be assembled.
Current clang versions fail to support the dotprod and i8mm
features in the .arch_extension directive, but do support them
if enabled with -march=armv8.4-a on the command line. (Curiously,
lowering the arch level with ".arch armv8.2-a" doesn't make the
extensions unavailable if they were enabled with -march; if that
changes, Clang should also learn to support these extensions via
.arch_extension for them to remain usable here.)
Signed-off-by: Martin Storsjö <martin@martin.st>
Today, cuda is not able to import multiplane images, and cuda requires
images to be imported whether you trying to import to cuda or export
from cuda (in the later case, the image is imported and then copied
into on the cuda side). So any interop between cuda and vulkan requires
that multiplane be disabled.
The existing option for this is not sufficient, because when deriving
devices it is not possible to specify any options.
And, it is necessary to derive the Vulkan device, because any pipeline
that involves uploading from cuda to vulkan and then back to cuda must
use the same cuda context on both sides, and the only way to propagate
the cuda context all the way through is to derive the device at each
stage.
ie:
-vf hwupload=derive_device=vulkan,<filters>,hwupload=derive_device=cuda
0th order modified bessel function of the first kind are used in multiple
places, lets avoid having 3+ different implementations
I picked this one as its accurate and quite fast, it can be replaced if
a better one is found
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The temporary AVFrame on staack enables us to use the common
dependency/dispatch code in prepare_frame().
The prepare_frame() function is used for both frame initialization
and frame import/export queue family transfer operations.
In the former case, no AVFrame exists yet, so, as this is purely
libavutil code, we create a temporary frame on stack. Otherwise,
we'd need to allocate multiple frames somewhere, one for each
possible command buffer dispatch.
The idea was that it's faster to map linear images and copy them
via regular memcpy. This is a very niche use, plus very inconsistently
useful, as it would only really be faster on a few Intel GPUs.
Even then, using the non-cached memcpy would've been better.
Instead, scrap this code. Drivers are better at figuring out
what copy to use, and if we're host-mapping, it should actually be
just as fast, if not faster.
This commit adds proper handling of multiplane images throughout
all of the hwcontext code. To avoid breakage of individual
components, the change is performed as a single commit.
This commit rewrites the majority of vulkan.c to enable its use
as a general-purpose high-level utility code, usable for decoding,
encoding, and filtering of video frames.
The dependency system was rewritten to simplify management of
execution.
The image handling system was rewritten to accomodate multiplane
images.
Due to how related all the new features were, this is a single
commit.
This just disables the vulkan headers from defining any symbols
like vkCmdPipelineBarrier2(). Instead, all functions must be loaded
via the loader and used as function pointers as vk->CmdPipelineBarrier2.
Mostly just forces developers to write correct code, as using the
symbols can be undesirable in case API users define their own
function wrappers via the loader API.
The hack was added to enable exporting of vulkan images to DRM.
On Intel hardware, specifically for DRM images, all planes must be
allocated next to each other, due to hardware limitation, so the hack
used a single large allocation and suballocated all planes from it.
By natively supporting multiplane images, the driver is what decides
the layout, so exporting just works.
It's a hack because it conflicted heavily with image allocation, and
with the whole ecosystem in general, before multiplane images were
supported, which just made it redundant.
This is also the commit which broke the hwcontext hardest and prompted
the entire rewrite in the first place.