major/minor are in <sys/types.h> on BSDs and <sys/mkdev.h> on Solaris-like.
libavutil/hwcontext_vulkan.c:55:10: fatal error: 'sys/sysmacros.h' file not found
#include <sys/sysmacros.h>
^~~~~~~~~~~~~~~~~
1) Take the reductive sum out of the loop,
leaving a regular vector addition in the loop.
2) Merge the addition and the multiplication.
3) Unroll.
Before:
scalarproduct_float_rvv_f32: 832.5
After:
scalarproduct_float_rvv_f32: 275.2
The code was blindly assuming that Zbb or V implied Zba. While the
earlier is practically always true, the later broke some QEMU setups,
as V was introduced earlier than Zba.
This is not actually used for anything. The configure check causes the
CPU feature flag to be set, but nothing consumes it at all.
While AArch64 does have VFP, it is only used for the scalar C code.
Conversely, it is still possible to disable VFP, by changing the
C compiler flags as before (though that only makes sense for an
hypothetical non-standard Armv8 platform without VFP).
Note that this retains the "vfp" option flag, for backward
compatibility and on the very remote but theoretically possible chance
that FFmpeg actually makes use of it in the future.
AV_CPU_FLAG_VFP is retained as it is actually used by AArch32.
libavutil/random_seed.c calls arc4random_buf which is
not available on OSX 10.4 Tiger, but the configuration
script tests for arc4random which is available.
Fix the configuration test to match the actual API used.
Co-authored-by: James Almer <jamrial@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Uses the existing code for av_get_random_seed() to return a buffer with
cryptographically secure random data, or an error if none could be generated.
Signed-off-by: James Almer <jamrial@gmail.com>
This ensures the requested amount of bytes is read.
Also remove /dev/random as it's no longer necessary.
Signed-off-by: James Almer <jamrial@gmail.com>
When qsv device is created by device_derive, the ctx->free function is
not registered, causing potential memory leak because of not properly
closing the MFX session.
Signed-off-by: Tong Wu <tong1.wu@intel.com>
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
nvenc declares support for these formats, but if hwcontext_cuda doesn't
do that as well, then it's not possible to hwupload them for use in a
possible cuda pipeline before encoding.
This reduces memory needed dramatically, as unneeded resources
can be immediately returned to the pool.
Although waitforfences is threadsafe, we add a mutex wait around
it, as the mutex fence in combination with waitforfences assures
us that no other thread will reset the fence in the meanwhile
whilst the mutex is locked. This allows is to call
ff_vk_exec_discard_deps.
For now, there's not much value in this since Clang don't support
enabling the dotprod or i8mm features with either .arch_extension
or .arch (it has to be enabled by the base arch flags passed to
the compiler). But it may be supported in the future.
Signed-off-by: Martin Storsjö <martin@martin.st>
These are available since ARMv8.4-a and ARMv8.6-a respectively,
but can also be available optionally since ARMv8.2-a.
Check if ".arch armv8.2-a" and ".arch_extension {dotprod,i8mm}" are
supported, and check if the instructions can be assembled.
Current clang versions fail to support the dotprod and i8mm
features in the .arch_extension directive, but do support them
if enabled with -march=armv8.4-a on the command line. (Curiously,
lowering the arch level with ".arch armv8.2-a" doesn't make the
extensions unavailable if they were enabled with -march; if that
changes, Clang should also learn to support these extensions via
.arch_extension for them to remain usable here.)
Signed-off-by: Martin Storsjö <martin@martin.st>
Today, cuda is not able to import multiplane images, and cuda requires
images to be imported whether you trying to import to cuda or export
from cuda (in the later case, the image is imported and then copied
into on the cuda side). So any interop between cuda and vulkan requires
that multiplane be disabled.
The existing option for this is not sufficient, because when deriving
devices it is not possible to specify any options.
And, it is necessary to derive the Vulkan device, because any pipeline
that involves uploading from cuda to vulkan and then back to cuda must
use the same cuda context on both sides, and the only way to propagate
the cuda context all the way through is to derive the device at each
stage.
ie:
-vf hwupload=derive_device=vulkan,<filters>,hwupload=derive_device=cuda
0th order modified bessel function of the first kind are used in multiple
places, lets avoid having 3+ different implementations
I picked this one as its accurate and quite fast, it can be replaced if
a better one is found
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The temporary AVFrame on staack enables us to use the common
dependency/dispatch code in prepare_frame().
The prepare_frame() function is used for both frame initialization
and frame import/export queue family transfer operations.
In the former case, no AVFrame exists yet, so, as this is purely
libavutil code, we create a temporary frame on stack. Otherwise,
we'd need to allocate multiple frames somewhere, one for each
possible command buffer dispatch.
The idea was that it's faster to map linear images and copy them
via regular memcpy. This is a very niche use, plus very inconsistently
useful, as it would only really be faster on a few Intel GPUs.
Even then, using the non-cached memcpy would've been better.
Instead, scrap this code. Drivers are better at figuring out
what copy to use, and if we're host-mapping, it should actually be
just as fast, if not faster.
This commit adds proper handling of multiplane images throughout
all of the hwcontext code. To avoid breakage of individual
components, the change is performed as a single commit.
This commit rewrites the majority of vulkan.c to enable its use
as a general-purpose high-level utility code, usable for decoding,
encoding, and filtering of video frames.
The dependency system was rewritten to simplify management of
execution.
The image handling system was rewritten to accomodate multiplane
images.
Due to how related all the new features were, this is a single
commit.
This just disables the vulkan headers from defining any symbols
like vkCmdPipelineBarrier2(). Instead, all functions must be loaded
via the loader and used as function pointers as vk->CmdPipelineBarrier2.
Mostly just forces developers to write correct code, as using the
symbols can be undesirable in case API users define their own
function wrappers via the loader API.
The hack was added to enable exporting of vulkan images to DRM.
On Intel hardware, specifically for DRM images, all planes must be
allocated next to each other, due to hardware limitation, so the hack
used a single large allocation and suballocated all planes from it.
By natively supporting multiplane images, the driver is what decides
the layout, so exporting just works.
It's a hack because it conflicted heavily with image allocation, and
with the whole ecosystem in general, before multiplane images were
supported, which just made it redundant.
This is also the commit which broke the hwcontext hardest and prompted
the entire rewrite in the first place.
This just bumps the required loader library version (libvulkan).
All device-related features, such as video decoding, atomics, etc.
are still optional and the code deals with their loss on a local level
(e.g. the decoder or filter checks for the features it needs, not
the hwcontext).
Bumping the required version essentially packs all maintenance
extensions which correct the spec rather than requiring to enable
them individually.
This patch supports the use of the "checkasm --bench" testing feature
on loongarch platform.
Change-Id: I42790388d057c9ade0dfa38a19d9c1fd44ca0bc3
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Not only this is information that relies on the concept of a sequence of
frames, which is completely out of place as a field in AVFrame, but there are
no known or intended uses of this field.
Signed-off-by: James Almer <jamrial@gmail.com>
As with the earlier bswap change, all versions of GCC and Clang that
support RISC-V support the popcount built-ins, so we can just use them
instead of inline assembler.
av_bswapXX() are used in context that expect exact size types, notably
variable arguments to av_log(). On Linux RV64, uint_fast32_t is an
unsigned long, so the current inline assembler does not work properly.
Since GCC and Clang gained their byte-swap built-ins before they
supported RISC-V, we can simply defer to them. As an added bonus, the
compiler can do instruction scheduling, which it couldn't with the Zbb
inline assembler.
- qsv_internal.h: Remove unnecessary include va_drm.h
- qsv_internal.h: Enable AVCODEC_QSV_LINUX_SESSION_HANDLE on Linux/VA only
- hwcontext_qsv.c: Do not allow child_device_type VAAPI for Windows until
support is added, keep D3D11/DXVA2 as more prioritary defaults.
Initial review at https://github.com/intel-media-ci/ffmpeg/pull/619/
Signed-off-by: Sil Vilerino <sivileri@microsoft.com>
Reviewed-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Reviewed-by: Wu, Tong1 <tong1.wu@intel.com>
Fixes: signed integer overflow: 100183269 - -2132769113 cannot be represented in type 'int'
Fixes: 55063/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_AAC_FIXED_fuzzer-5039294027005952
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
timer.h has been removed from internal.h, and then added back with
3e6088f for convenience. This patch removed it again for the
following reasons:
1. Only includes what's necessary is a common and safe strategy.
2. It fixed some build errors on Android:
a. libavutil/timer.h includes sys/ioctl.h, and ioctl.h includes
termios.h on Android.
b. termios.h reserves names prefixed with ‘c_’, ‘V’, ‘I’, ‘O’, and
‘TC’; and names prefixed with ‘B’ followed by a digit.
c. libavcodec uses B0 B1 and so on as variable names a lot. So
the code failed to build with --enable-linux-perf, or
--target-os=Linux.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
The function now accepts an existing buffer to avoid unnecessary allocations,
as well as only reporting the needed amount of bytes if you pass a NULL pointer
as input for data.
For this, both parameters become input and output, as well as making data
optional. This is backwards compatible, and as such not breaking any existing
use of the function in external code (if there's any).
Signed-off-by: James Almer <jamrial@gmail.com>
This way we can clean up separate definitions in functions with
just a single loop, as well as have no reuse between different
loops' counters in functions with multiple.
These fields are supposed to store information about the packet the
frame was decoded from, specifically the byte offset it was stored at
and its size.
However,
- the fields are highly ad-hoc - there is no strong reason why
specifically those (and not any other) packet properties should have a
dedicated field in AVFrame; unlike e.g. the timestamps, there is no
fundamental link between coded packet offset/size and decoded frames
- they only make sense for frames produced by decoding demuxed packets,
and even then it is not always the case that the encoded data was
stored in the file as a contiguous sequence of bytes (in order for pos
to be well-defined)
- pkt_pos was added without much explanation, apparently to allow
passthrough of this information through lavfi in order to handle byte
seeking in ffplay. That is now implemented using arbitrary user data
passthrough in AVFrame.opaque_ref.
- several filters use pkt_pos as a variable available to user-supplied
expressions, but there seems to be no established motivation for using them.
- pkt_size was added for use in ffprobe, but that too is now handled
without using this field. Additonally, the values of this field
produced by libavcodec are flawed, as described in the previous
ffprobe conversion commit.
In summary - these fields are ill-defined and insufficiently motivated,
so deprecate them.
Fixes compilation with clang which errors out on Wint-conversion.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
This fixes the following error when compiling with a modern
version of Clang for Windows/i386:
src/libavutil/hwcontext_vulkan.c:738:32: error: incompatible function pointer types initializing 'PFN_vkDebugUtilsMessengerCallbackEXT' (aka 'unsigned int (*)(enum VkDebugUtilsMessageSeverityFlagBitsEXT, unsigned int, const struct VkDebugUtilsMessengerCallbackDataEXT *, void *) __attribute__((stdcall))') with an expression of type 'VkBool32 (VkDebugUtilsMessageSeverityFlagBitsEXT, VkDebugUtilsMessageTypeFlagsEXT, const VkDebugUtilsMessengerCallbackDataEXT *, void *)' (aka 'unsigned int (enum VkDebugUtilsMessageSeverityFlagBitsEXT, unsigned int, const struct VkDebugUtilsMessengerCallbackDataEXT *, void *)') [-Wincompatible-function-pointer-types]
.pfnUserCallback = vk_dbg_callback,
^~~~~~~~~~~~~~~
Signed-off-by: Martin Storsjö <martin@martin.st>
According to description of vaExportSurfaceHandle in libva, vaSyncSurface
must be called if the contents of the surface will be read.
Fixes ticket #9967.
Reviewed-by: Zhao Zhili <zhilizhao@tencent.com>
Signed-off-by: Fei Wang <fei.w.wang@intel.com>
Replace cpucfg with getauxval to avoid crash in case of
some processor capabilities are not supportted by kernel used.
Reviewed-by: Steven Liu <liuqi05@kuaishou.com>
Remove CONFIG_VAAPI for VUYX, YUYV422, Y210, XV30, Y212, XV36.
Make 8-bit, 10-bit, 12-bit YUV 4:2:2 video sources as well as YUV 4:4:4
video sources supported by d3d11va and dxva2 just like what VAAPI does.
Sign-off-by: Tong Wu <tong1.wu@intel.com>
Add support for VUYX, YUYV422, Y210, XV30, P012, Y212, XV36.
The added formats work with qsv acceleration and will not have
impact on dxva2 acceleration(-hwaccel dxva2) since so far
these formats are still not supported by using dxva2 acceleration.
Hwupload and hwdownload can work with the added formats.
Signed-off-by: Tong Wu <tong1.wu@intel.com>
Add support for VUYX, YUYV422, Y210, XV30, P012, Y212, XV36.
The added formats work with qsv acceleration and will not have
impact on d3d11va acceleration(-hwaccel d3d11va) since so far
these formats are still not supported by using d3d11va acceleration.
Hwupload and hwdownload can work with the added formats.
Signed-off-by: Tong Wu <tong1.wu@intel.com>
Their usefulness is questionable, very few decoders set them, and their type
should have been int64_t. A replacement field can be added later if a valid use
case is found.
Signed-off-by: Marton Balint <cus@passwd.hu>
libavutil/color_utils contains some avpriv_ symbols that map
enum AVTransferCharacteristic values to gamma-curve approximations and
to the actual transfer functions to invert them (i.e. -> linear).
There's two issues with this:
(1) avpriv is evil and should be avoided whenever possible
(2) libavutil/csp.h exposes a public API for handling color that
already handles primaries and matricies
I don't see any reason this API has to be private, so this commit takes
the functionality from avutil/color_utils and merges it into avutil/csp
with an exposed av_ API rather than the previous avpriv_ API.
Every reference to the previous API has been updated to point to the
new one. color_utils.h has been deleted as well. This should not break
any applications as it only contained avpriv_ symbols in the first
place, so nothing in that header could be referenced by other
applications.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The SDK supports UYVY from version 1.17, and VPP may support UYVY
input on Linux [1]
$ ffmpeg -loglevel verbose -init_hw_device qsv=intel -f lavfi -i \
yuvtestsrc -vf \
"format=uyvy422,hwupload=extra_hw_frames=32,vpp_qsv=format=nv12" \
-f null -
[1] https://github.com/Intel-Media-SDK/MediaSDK/blob/master/doc/samples/readme-vpp_linux.md
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
From x86inc:
> On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
> a branch or a branch target. So switch to a 2-byte form of ret in that case.
> We can automatically detect "follows a branch", but not a branch target.
> (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.)
x86inc can automatically determine whether to use REP_RET rather than
REP in most of these cases, so impact is minimal. Additionally, a few
REP_RETs were used unnecessary, despite the return being nowhere near a
branch.
The only CPUs affected were AMD K10s, made between 2007 and 2011, 16
years ago and 12 years ago, respectively.
In the future, everyone involved with x86inc should consider dropping
REP_RETs altogether.
The construct of using offsetof on a (potentially anonymous) struct
defined within the offsetof expression, while supported by all
current compilers, has been declared explicitly undefined by the
C standards committee [1].
Clang recently got a change to identify this as an issue [2];
initially it was treated as a hard error, but it was soon after
softened into a warning under the -Wgnu-offsetof-extensions option
(not enabled automatically as part of -Wall though).
Nevertheless - in this particular case, it's trivial to fix the
code not to rely on the construct that the standards committee has
explicitly called out as undefined.
[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2350.htm
[2] https://reviews.llvm.org/D133574
Signed-off-by: Martin Storsjö <martin@martin.st>
If the dictionary provided on input contains multiple entries for an
option (relevant for flags modifying the previous value with '+' or
'-') and the option is not found in the target object, only the last
entry would be returned to the caller.
Pass AV_DICT_MULTIKEY to av_dict_set() to make sure all such entries are
returned.
AVMediaCodecDeviceContext without surface or native_window is
useless, it shouldn't be created at all. Such dummy AVHWDeviceContext
is allowed before, and it's used by mpv player. Creating a ANativeWindow
automatically breaks such usecases.
So disable creating a ANativeWindow by default. It can be enabled
via the create_window flag, or by set the AVDictionary of
av_hwdevice_ctx_create(). The downside is that
ffmpeg -hwaccel mediacodec -i input.mp4 \
-c:a copy -c:v hevc_mediacodec output.mp4
use ByteBuffer mode which isn't as efficient as before. The upside
is libavfilter works now, which should be less surprise.
To enable create_window on ffmpeg command line, use
ffmpeg -hwaccel mediacodec \
-init_hw_device mediacodec=mediacodec,create_window=1 \
-i input.mp4 -c:a copy -c:v hevc_mediacodec output.mp4
Users should know what it is to enable create_window. It should
be OK to take sometime to figure out the option. And there are comments
inside hwcontext_mediacodec.h to help user figure it out.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Fixes: signed integer overflow: 574590586 - -1875616554 cannot be represented in type 'int'
Fixes: 53914/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_AAC_FIXED_fuzzer-5037125846564864
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
They're not currently used, so they don't need to be there.
Vulkan stabilized the decode extensions less than a week ago, and their
name prefixes were changed from EXT to KHR. It's a bit too soon to be
depending on it, so rather than bumping, just remove these for now.
xv30be is an obnoxious format that I shouldn't have included in the
first place. xv30 packs 3 10bit channels into 32bits and while our
byte-oriented logic can handle Little Endian correctly, it cannot
handle Big Endian. To avoid that, I marked xv30be as a bitstream
format, but while that didn't produce FATE errors, it turns out that
the existing read/write code silently produces incorrect results, which
can be revealed via ubsan.
In all likelyhood, the correct fix here is to remove the format. As
this format is only used by Intel vaapi, it's only going to show up
in LE form, so we could just drop the BE version. But I don't want to
deal with creating a hole in the pixfmt list and all the weirdness that
comes from that. Instead, I decided to write the correct read/write
code for it.
And that code isn't too bad, as long as it's specialised for this
format, as the channels are all bit-aligned inside a 32bit word.
There can be more than one available render node, and it's not
guaranteed the first node we come across is the correct one. In
particular, 'vgem' devices are common, and are
never VAAPI-enabled and thus not valid here.
We have a 'kernel_driver' arg already for specifying a single driver we
*do* want, but it doesn't support a negation, nor a list. It's easier
just to automatically skip 'vgem' anyway, to avoid foisting this burden
on users.
This has precedent in libva-utils already:
bfb6b98ed62a exclude vgem node and invalid drm node in vainfo
bfb6b98ed6
Signed-off-by: Brian Norris <briannorris@chromium.org>
This commit tests it in a way that automatically checks
that using av_dict_iterate() is equivalent to using
av_dict_get() with key "" and AV_DICT_IGNORE_SUFFIX.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes: signed integer overflow: -1284837070 - 982101618 cannot be represented in type 'int'
Fixes: 53105/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_AC3_FIXED_fuzzer-4848015827664896
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>