These are available since ARMv8.4-a and ARMv8.6-a respectively,
but can also be available optionally since ARMv8.2-a.
Check if ".arch armv8.2-a" and ".arch_extension {dotprod,i8mm}" are
supported, and check if the instructions can be assembled.
Current clang versions fail to support the dotprod and i8mm
features in the .arch_extension directive, but do support them
if enabled with -march=armv8.4-a on the command line. (Curiously,
lowering the arch level with ".arch armv8.2-a" doesn't make the
extensions unavailable if they were enabled with -march; if that
changes, Clang should also learn to support these extensions via
.arch_extension for them to remain usable here.)
Signed-off-by: Martin Storsjö <martin@martin.st>
Today, cuda is not able to import multiplane images, and cuda requires
images to be imported whether you trying to import to cuda or export
from cuda (in the later case, the image is imported and then copied
into on the cuda side). So any interop between cuda and vulkan requires
that multiplane be disabled.
The existing option for this is not sufficient, because when deriving
devices it is not possible to specify any options.
And, it is necessary to derive the Vulkan device, because any pipeline
that involves uploading from cuda to vulkan and then back to cuda must
use the same cuda context on both sides, and the only way to propagate
the cuda context all the way through is to derive the device at each
stage.
ie:
-vf hwupload=derive_device=vulkan,<filters>,hwupload=derive_device=cuda
0th order modified bessel function of the first kind are used in multiple
places, lets avoid having 3+ different implementations
I picked this one as its accurate and quite fast, it can be replaced if
a better one is found
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The temporary AVFrame on staack enables us to use the common
dependency/dispatch code in prepare_frame().
The prepare_frame() function is used for both frame initialization
and frame import/export queue family transfer operations.
In the former case, no AVFrame exists yet, so, as this is purely
libavutil code, we create a temporary frame on stack. Otherwise,
we'd need to allocate multiple frames somewhere, one for each
possible command buffer dispatch.
The idea was that it's faster to map linear images and copy them
via regular memcpy. This is a very niche use, plus very inconsistently
useful, as it would only really be faster on a few Intel GPUs.
Even then, using the non-cached memcpy would've been better.
Instead, scrap this code. Drivers are better at figuring out
what copy to use, and if we're host-mapping, it should actually be
just as fast, if not faster.
This commit adds proper handling of multiplane images throughout
all of the hwcontext code. To avoid breakage of individual
components, the change is performed as a single commit.
This commit rewrites the majority of vulkan.c to enable its use
as a general-purpose high-level utility code, usable for decoding,
encoding, and filtering of video frames.
The dependency system was rewritten to simplify management of
execution.
The image handling system was rewritten to accomodate multiplane
images.
Due to how related all the new features were, this is a single
commit.
This just disables the vulkan headers from defining any symbols
like vkCmdPipelineBarrier2(). Instead, all functions must be loaded
via the loader and used as function pointers as vk->CmdPipelineBarrier2.
Mostly just forces developers to write correct code, as using the
symbols can be undesirable in case API users define their own
function wrappers via the loader API.
The hack was added to enable exporting of vulkan images to DRM.
On Intel hardware, specifically for DRM images, all planes must be
allocated next to each other, due to hardware limitation, so the hack
used a single large allocation and suballocated all planes from it.
By natively supporting multiplane images, the driver is what decides
the layout, so exporting just works.
It's a hack because it conflicted heavily with image allocation, and
with the whole ecosystem in general, before multiplane images were
supported, which just made it redundant.
This is also the commit which broke the hwcontext hardest and prompted
the entire rewrite in the first place.
This just bumps the required loader library version (libvulkan).
All device-related features, such as video decoding, atomics, etc.
are still optional and the code deals with their loss on a local level
(e.g. the decoder or filter checks for the features it needs, not
the hwcontext).
Bumping the required version essentially packs all maintenance
extensions which correct the spec rather than requiring to enable
them individually.
This patch supports the use of the "checkasm --bench" testing feature
on loongarch platform.
Change-Id: I42790388d057c9ade0dfa38a19d9c1fd44ca0bc3
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Not only this is information that relies on the concept of a sequence of
frames, which is completely out of place as a field in AVFrame, but there are
no known or intended uses of this field.
Signed-off-by: James Almer <jamrial@gmail.com>
As with the earlier bswap change, all versions of GCC and Clang that
support RISC-V support the popcount built-ins, so we can just use them
instead of inline assembler.
av_bswapXX() are used in context that expect exact size types, notably
variable arguments to av_log(). On Linux RV64, uint_fast32_t is an
unsigned long, so the current inline assembler does not work properly.
Since GCC and Clang gained their byte-swap built-ins before they
supported RISC-V, we can simply defer to them. As an added bonus, the
compiler can do instruction scheduling, which it couldn't with the Zbb
inline assembler.
- qsv_internal.h: Remove unnecessary include va_drm.h
- qsv_internal.h: Enable AVCODEC_QSV_LINUX_SESSION_HANDLE on Linux/VA only
- hwcontext_qsv.c: Do not allow child_device_type VAAPI for Windows until
support is added, keep D3D11/DXVA2 as more prioritary defaults.
Initial review at https://github.com/intel-media-ci/ffmpeg/pull/619/
Signed-off-by: Sil Vilerino <sivileri@microsoft.com>
Reviewed-by: Dmitry Rogozhkin <dmitry.v.rogozhkin@intel.com>
Reviewed-by: Wu, Tong1 <tong1.wu@intel.com>
Fixes: signed integer overflow: 100183269 - -2132769113 cannot be represented in type 'int'
Fixes: 55063/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_AAC_FIXED_fuzzer-5039294027005952
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
timer.h has been removed from internal.h, and then added back with
3e6088f for convenience. This patch removed it again for the
following reasons:
1. Only includes what's necessary is a common and safe strategy.
2. It fixed some build errors on Android:
a. libavutil/timer.h includes sys/ioctl.h, and ioctl.h includes
termios.h on Android.
b. termios.h reserves names prefixed with ‘c_’, ‘V’, ‘I’, ‘O’, and
‘TC’; and names prefixed with ‘B’ followed by a digit.
c. libavcodec uses B0 B1 and so on as variable names a lot. So
the code failed to build with --enable-linux-perf, or
--target-os=Linux.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
The function now accepts an existing buffer to avoid unnecessary allocations,
as well as only reporting the needed amount of bytes if you pass a NULL pointer
as input for data.
For this, both parameters become input and output, as well as making data
optional. This is backwards compatible, and as such not breaking any existing
use of the function in external code (if there's any).
Signed-off-by: James Almer <jamrial@gmail.com>
This way we can clean up separate definitions in functions with
just a single loop, as well as have no reuse between different
loops' counters in functions with multiple.
These fields are supposed to store information about the packet the
frame was decoded from, specifically the byte offset it was stored at
and its size.
However,
- the fields are highly ad-hoc - there is no strong reason why
specifically those (and not any other) packet properties should have a
dedicated field in AVFrame; unlike e.g. the timestamps, there is no
fundamental link between coded packet offset/size and decoded frames
- they only make sense for frames produced by decoding demuxed packets,
and even then it is not always the case that the encoded data was
stored in the file as a contiguous sequence of bytes (in order for pos
to be well-defined)
- pkt_pos was added without much explanation, apparently to allow
passthrough of this information through lavfi in order to handle byte
seeking in ffplay. That is now implemented using arbitrary user data
passthrough in AVFrame.opaque_ref.
- several filters use pkt_pos as a variable available to user-supplied
expressions, but there seems to be no established motivation for using them.
- pkt_size was added for use in ffprobe, but that too is now handled
without using this field. Additonally, the values of this field
produced by libavcodec are flawed, as described in the previous
ffprobe conversion commit.
In summary - these fields are ill-defined and insufficiently motivated,
so deprecate them.
Fixes compilation with clang which errors out on Wint-conversion.
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
This fixes the following error when compiling with a modern
version of Clang for Windows/i386:
src/libavutil/hwcontext_vulkan.c:738:32: error: incompatible function pointer types initializing 'PFN_vkDebugUtilsMessengerCallbackEXT' (aka 'unsigned int (*)(enum VkDebugUtilsMessageSeverityFlagBitsEXT, unsigned int, const struct VkDebugUtilsMessengerCallbackDataEXT *, void *) __attribute__((stdcall))') with an expression of type 'VkBool32 (VkDebugUtilsMessageSeverityFlagBitsEXT, VkDebugUtilsMessageTypeFlagsEXT, const VkDebugUtilsMessengerCallbackDataEXT *, void *)' (aka 'unsigned int (enum VkDebugUtilsMessageSeverityFlagBitsEXT, unsigned int, const struct VkDebugUtilsMessengerCallbackDataEXT *, void *)') [-Wincompatible-function-pointer-types]
.pfnUserCallback = vk_dbg_callback,
^~~~~~~~~~~~~~~
Signed-off-by: Martin Storsjö <martin@martin.st>
According to description of vaExportSurfaceHandle in libva, vaSyncSurface
must be called if the contents of the surface will be read.
Fixes ticket #9967.
Reviewed-by: Zhao Zhili <zhilizhao@tencent.com>
Signed-off-by: Fei Wang <fei.w.wang@intel.com>
Replace cpucfg with getauxval to avoid crash in case of
some processor capabilities are not supportted by kernel used.
Reviewed-by: Steven Liu <liuqi05@kuaishou.com>
Remove CONFIG_VAAPI for VUYX, YUYV422, Y210, XV30, Y212, XV36.
Make 8-bit, 10-bit, 12-bit YUV 4:2:2 video sources as well as YUV 4:4:4
video sources supported by d3d11va and dxva2 just like what VAAPI does.
Sign-off-by: Tong Wu <tong1.wu@intel.com>
Add support for VUYX, YUYV422, Y210, XV30, P012, Y212, XV36.
The added formats work with qsv acceleration and will not have
impact on dxva2 acceleration(-hwaccel dxva2) since so far
these formats are still not supported by using dxva2 acceleration.
Hwupload and hwdownload can work with the added formats.
Signed-off-by: Tong Wu <tong1.wu@intel.com>
Add support for VUYX, YUYV422, Y210, XV30, P012, Y212, XV36.
The added formats work with qsv acceleration and will not have
impact on d3d11va acceleration(-hwaccel d3d11va) since so far
these formats are still not supported by using d3d11va acceleration.
Hwupload and hwdownload can work with the added formats.
Signed-off-by: Tong Wu <tong1.wu@intel.com>
Their usefulness is questionable, very few decoders set them, and their type
should have been int64_t. A replacement field can be added later if a valid use
case is found.
Signed-off-by: Marton Balint <cus@passwd.hu>
libavutil/color_utils contains some avpriv_ symbols that map
enum AVTransferCharacteristic values to gamma-curve approximations and
to the actual transfer functions to invert them (i.e. -> linear).
There's two issues with this:
(1) avpriv is evil and should be avoided whenever possible
(2) libavutil/csp.h exposes a public API for handling color that
already handles primaries and matricies
I don't see any reason this API has to be private, so this commit takes
the functionality from avutil/color_utils and merges it into avutil/csp
with an exposed av_ API rather than the previous avpriv_ API.
Every reference to the previous API has been updated to point to the
new one. color_utils.h has been deleted as well. This should not break
any applications as it only contained avpriv_ symbols in the first
place, so nothing in that header could be referenced by other
applications.
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The SDK supports UYVY from version 1.17, and VPP may support UYVY
input on Linux [1]
$ ffmpeg -loglevel verbose -init_hw_device qsv=intel -f lavfi -i \
yuvtestsrc -vf \
"format=uyvy422,hwupload=extra_hw_frames=32,vpp_qsv=format=nv12" \
-f null -
[1] https://github.com/Intel-Media-SDK/MediaSDK/blob/master/doc/samples/readme-vpp_linux.md
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
From x86inc:
> On AMD cpus <=K10, an ordinary ret is slow if it immediately follows either
> a branch or a branch target. So switch to a 2-byte form of ret in that case.
> We can automatically detect "follows a branch", but not a branch target.
> (SSSE3 is a sufficient condition to know that your cpu doesn't have this problem.)
x86inc can automatically determine whether to use REP_RET rather than
REP in most of these cases, so impact is minimal. Additionally, a few
REP_RETs were used unnecessary, despite the return being nowhere near a
branch.
The only CPUs affected were AMD K10s, made between 2007 and 2011, 16
years ago and 12 years ago, respectively.
In the future, everyone involved with x86inc should consider dropping
REP_RETs altogether.
The construct of using offsetof on a (potentially anonymous) struct
defined within the offsetof expression, while supported by all
current compilers, has been declared explicitly undefined by the
C standards committee [1].
Clang recently got a change to identify this as an issue [2];
initially it was treated as a hard error, but it was soon after
softened into a warning under the -Wgnu-offsetof-extensions option
(not enabled automatically as part of -Wall though).
Nevertheless - in this particular case, it's trivial to fix the
code not to rely on the construct that the standards committee has
explicitly called out as undefined.
[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2350.htm
[2] https://reviews.llvm.org/D133574
Signed-off-by: Martin Storsjö <martin@martin.st>
If the dictionary provided on input contains multiple entries for an
option (relevant for flags modifying the previous value with '+' or
'-') and the option is not found in the target object, only the last
entry would be returned to the caller.
Pass AV_DICT_MULTIKEY to av_dict_set() to make sure all such entries are
returned.
AVMediaCodecDeviceContext without surface or native_window is
useless, it shouldn't be created at all. Such dummy AVHWDeviceContext
is allowed before, and it's used by mpv player. Creating a ANativeWindow
automatically breaks such usecases.
So disable creating a ANativeWindow by default. It can be enabled
via the create_window flag, or by set the AVDictionary of
av_hwdevice_ctx_create(). The downside is that
ffmpeg -hwaccel mediacodec -i input.mp4 \
-c:a copy -c:v hevc_mediacodec output.mp4
use ByteBuffer mode which isn't as efficient as before. The upside
is libavfilter works now, which should be less surprise.
To enable create_window on ffmpeg command line, use
ffmpeg -hwaccel mediacodec \
-init_hw_device mediacodec=mediacodec,create_window=1 \
-i input.mp4 -c:a copy -c:v hevc_mediacodec output.mp4
Users should know what it is to enable create_window. It should
be OK to take sometime to figure out the option. And there are comments
inside hwcontext_mediacodec.h to help user figure it out.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Fixes: signed integer overflow: 574590586 - -1875616554 cannot be represented in type 'int'
Fixes: 53914/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_AAC_FIXED_fuzzer-5037125846564864
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
They're not currently used, so they don't need to be there.
Vulkan stabilized the decode extensions less than a week ago, and their
name prefixes were changed from EXT to KHR. It's a bit too soon to be
depending on it, so rather than bumping, just remove these for now.
xv30be is an obnoxious format that I shouldn't have included in the
first place. xv30 packs 3 10bit channels into 32bits and while our
byte-oriented logic can handle Little Endian correctly, it cannot
handle Big Endian. To avoid that, I marked xv30be as a bitstream
format, but while that didn't produce FATE errors, it turns out that
the existing read/write code silently produces incorrect results, which
can be revealed via ubsan.
In all likelyhood, the correct fix here is to remove the format. As
this format is only used by Intel vaapi, it's only going to show up
in LE form, so we could just drop the BE version. But I don't want to
deal with creating a hole in the pixfmt list and all the weirdness that
comes from that. Instead, I decided to write the correct read/write
code for it.
And that code isn't too bad, as long as it's specialised for this
format, as the channels are all bit-aligned inside a 32bit word.
There can be more than one available render node, and it's not
guaranteed the first node we come across is the correct one. In
particular, 'vgem' devices are common, and are
never VAAPI-enabled and thus not valid here.
We have a 'kernel_driver' arg already for specifying a single driver we
*do* want, but it doesn't support a negation, nor a list. It's easier
just to automatically skip 'vgem' anyway, to avoid foisting this burden
on users.
This has precedent in libva-utils already:
bfb6b98ed62a exclude vgem node and invalid drm node in vainfo
bfb6b98ed6
Signed-off-by: Brian Norris <briannorris@chromium.org>
This commit tests it in a way that automatically checks
that using av_dict_iterate() is equivalent to using
av_dict_get() with key "" and AV_DICT_IGNORE_SUFFIX.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Fixes: signed integer overflow: -1284837070 - 982101618 cannot be represented in type 'int'
Fixes: 53105/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_AC3_FIXED_fuzzer-4848015827664896
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Since D3D11 was introduced for QSV in FFmpeg 5.0, there is an implied
API/ABI change for user-supplied frames [1], hence update the
description for AV_PIX_FMT_QSV.
[1] https://ffmpeg.org/pipermail/ffmpeg-devel/2021-December/290444.html
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Should fix fate failures on Windowx x86 targets, where long is 32 bits.
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: James Almer <jamrial@gmail.com>
The amount of lines printed is too high for the verbose level, and the debug
level is a better fit for their content.
Signed-off-by: James Almer <jamrial@gmail.com>
Mostly consistent formatting and consistently ordering of
warnings/notes to be next to the description.
Additionally group the AV_DICT_* macros.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This is a more explicit iteration API rather than using the "magic"
av_dict_get(d, "", t, AV_DICT_IGNORE_SUFFIX) which is not really
trivial to grasp what it does when casually reading through code.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This can be achieved by moving the AVOnce out of the structure
containing the function pointers; the latter can then be made
const.
This also has the advantage of eliminating padding in the structure
(sizeof(AVOnce) is four here) and allowing the AVOnces to be put
into .bss (dependening upon the implementation).
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is possible to avoid the factors array for the power-of-two
tables for which said array is unused by using a different
structure for initialization for power-of-two tables than for
non-power-of-two-tables. This saves 3*15*16B from .data.
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Up until now, av_aes_init() uses a->round_key[0].u8 + t
as dst of memcpy where it is intended for t to greater
than 16 (u8 is an uint8_t[16]); given that round_key itself
is an array, it is actually intended for the dst to be
in a latter round_key member. To do this properly,
just cast a->round_key to unsigned char*.
This fixes the srtp, aes, aes_ctr, mov-3elist-encrypted,
mov-frag-encrypted and mov-tenc-only-encrypted
FATE-tests with (Clang-)UBSan.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The AES code uses av_aes_block, a union consisting of
uint64_t[2], uint32_t[4], uint8_t[4][4] and uint8_t[16].
subshift() performs byte-wise manipulations of two av_aes_blocks,
but when encrypting, it does so with a shift of two bytes;
more precisely, it uses
"av_aes_block *s1 = (av_aes_block *) (s0[0].u8 - s)"
and lateron uses the uint8_t[16] member to access s0.
Yet av_aes_block requires to be suitably aligned for
the uint64_t[2] member, which s0[0].u8 - 2 is certainly
not. This is in violation of 6.3.2.3 (7) of C11. UBSan
reports this in the aes_ctr, mov-3elist-encrypted,
mov-frag-encrypted, mov-tenc-only-encrypted and srtp
tests.
Furthermore, there is another issue here: The pointer points
outside of s0; this works, because all the accesses lateron
use an index >= 3. (Clang-)UBSan reports this as
"runtime error: index -2 out of bounds for type 'uint8_t[16]'".
This commit fixes both of these issues: The latter issue
is fixed by applying an offset of "+ 3" during the cast
and subtracting this from the indices used lateron.
The former issue is solved by not casting to av_aes_block*
at all; instead simply cast to unsigned char*.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Separate the blocks to make the grouping easier to grasp,
add the file properly to the group and fix the file description
incorrectly used as filename, fixing the following doxy warning:
warning: the name 'Colorspace' supplied as the argument in
the \file statement is not an input file
Make it a bit easier to grasp the grouping when not
unnecessarily splitting comment blocks.
Additionally do not try to add lavu_video_stereo3d to itself, resolving
the following doxy warning:
warning: Refusing to add group lavu_video_stereo3d to itself
Make it a bit easier to grasp the grouping when not
unnecessarily splitting comment blocks.
Additionally do not try to add lavu_video_spherical to itself, resolving
the following doxy warning:
warning: Refusing to add group lavu_video_spherical to itself
Make it a bit easier to grasp the grouping when not
unnecessarily splitting comment blocks.
Additionally do not try to add lavu_video_display to itself, resolving
the following doxy warning:
warning: Refusing to add group lavu_video_display to itself
VSETVLI xd, x0, ...' has rather nonobvious semantics:
- If xd is x0, then it preserves the current vector length.
- If xd is not x0, it sets the vector length to the supported maximum.
Also somewhat confusingly, while VMV.X.S always does its thing
regardless of the selected vector length, VMV.S.X does _nothing_ if the
selected vector length is zero.
So the current code breaks fails to initialise the accumulator if we
are unlucky to have a selected vector length of zero on entry. Fix it
by forcing the vector length to one.
Add an AV_PIX_FMT_NE macro for RGB32FBE/RGB32FLE and also one for
RGBA32FBE/RGBA32FLE for packed 32-bit float RGB samples, and also
packed 32-bit float RGBA samples, respectively.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Leo Izen <leo.izen@gmail.com>
Currently it is done in several different ways, which
might cause needless dependencies or in case of
tx_float_neon.S is incorrect.
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
GCC 4.0 not only added a visibility attribute, but also
a pragma to set it for a whole region of code.*
This commit exposes this via macros.
*: See https://gcc.gnu.org/gcc-4.0/changes.html
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
P012, Y212 and XV36 are used for 12bit content in FFmpeg VAAPI, so
these formats should be used in FFmpeg QSV too, however the SDK only
declares support for P016, Y216 and Y416. So this commit fudged mappings
between AV_PIX_FMT_P012 and MFX_FOURCC_P016, AV_PIX_FMT_Y212 and
MFX_FOURCC_Y216, AV_PIX_FMT_XV36 and MFX_FOURCC_Y416.
Signed-off-by: Fei Wang <fei.w.wang@intel.com>
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
XV30 is used for 10bit 4:4:4 content in FFmpeg VAAPI, so XV30 should be
used for 10bit 4:4:4 content in FFmpeg QSV too because QSV is based on
VAAPI on Linux. However the SDK only declares support for Y410 but does
nothing with the alpha in Y410, so this commit fudged a mapping between
AV_PIX_FMT_XV30 and MFX_FOURCC_Y410.
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
We can't get Shift from bit depth for some formats in the SDK. For
example, bit depth is 10, however Shift is 0 for Y410 (XV30 in FFmpeg).
In order to support these formats in the next commits, this patch
specified Shift for each format
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
On most cases, the vector type (VTYPE) for the RISC-V Vector extension
is supplied as an immediate value, with either of the VSETVLI or
VSETIVLI instructions. There is however a third instruction VSETVL
which takes the vector type from a general purpose register. That is so
the type can be selected at run-time.
This introduces a macro to load a (valid) vector type into a register.
The syntax follows that of VSETVLI and VSETIVLI, with element size,
group multiplier, then tail and mask policies.
Unfortunately, it is common, and will remain so, that the Bit
manipulations are not enabled at compilation time. This is an official
policy for Debian ports in general (though they do not support RISC-V
officially as of yet) to stick to the minimal target baseline, which
does not include the B extension or even its Zbb subset.
For inline helpers (CPOP, REV8), compiler builtins (CTZ, CLZ) or
even plain C code (MIN, MAX, MINU, MAXU), run-time detection seems
impractical. But at least it can work for the byte-swap DSP functions.
The butterflies_fixed function pointer declaration specifies av_restrict
for the first two pointer arguments. So the corresponding function
definitions should honor this declaration.
MSVC emits warning C4113 for this.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Namely to lavu/tests/pixelutils.c. This way, this function will
not be included into actual binaries any more.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Instead use av_pix_fmt_desc_next(). It is still possible
to check its return values by comparing it with the
(currently) expected values and the code does so.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
ff_check_pixfmt_descriptors() was added in commit
20e99a9c10. At this time,
the values of enum AVPixelFormat were not contiguous;
instead there was a jump from 111 to 291 (or from 115
to 295 depending upon AV_PIX_FMT_ABI_GIT_MASTER).
ff_check_pixfmt_descriptors() accounts for this
by skipping empty descriptors. Yet this issue no longer
exists: There are no holes.
The check for said holes makes GCC believe that the name
can be NULL; because it is used as argument corresponding to
%s in a log statement, it therefore emits a warning
(since d75c4693fe). Therefore
this commit simply removes these checks.
Also move the checks for name before the log statement.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Don't place it as doxy specific for the order field, and generalize it both to
also cover already defined orders and to not make it seem like the user is
required to handle a layout they don't fully support or understand.
Signed-off-by: James Almer <jamrial@gmail.com>
RVV defines a total of 12 different extensions, including:
- 5 different instruction subsets:
- Zve32x: 8-, 16- and 32-bit integers,
- Zve32f: Zve32x plus single precision floats,
- Zve64x: Zve32x plus 64-bit integers,
- Zve64f: Zve32f plus Zve64x,
- Zve64d: Zve64f plus double precision floats.
- 6 different vector lengths:
- Zvl32b (embedded only),
- Zvl64b (embedded only),
- Zvl128b,
- Zvl256b,
- Zvl512b,
- Zvl1024b,
- and the V extension proper: equivalent to Zve64f and Zvl128b.
In total, there are 6 different possible sets of supported instructions
(including the empty set), but for convenience we allocate one bit for
each type sets: up-to-32-bit ints (RVV_I32), floats (RVV_F32),
64-bit ints (RVV_I64) and doubles (RVV_F64).
Whence the vector size is needed, it can be retrieved by reading the
unprivileged read-only vlenb CSR. This should probably be a separate
helper macro if needed at a later point.
This introduces compile-time and run-time CPU detection on RISC-V. In
practice, I doubt that FFmpeg will ever see a RISC-V CPU without all of
I, F and D extensions, and if it does, it probably won't have run-time
detection. So the flags are essentially always set.
But as things stand, checkasm wants them that way. Compare the ARMV8
flag on AArch64. We are nowhere near running short on CPU flag bits.
They are intended as replacements for avcodec_enum_to_chroma_pos()
and avcodec_chroma_pos_to_enum().
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
To support non-aligned buffers during the post-transform step, just iterate
backwards over the array.
This allows using the 15xN-point FFT, with which the speed is 2.1 times
faster than our old libavcodec implementation.
~4x faster than the C version.
The shuffles in the 15pt dim1 are seriously expensive. Not happy with it,
but I'm contempt.
Can be easily converted to pure AVX by removing all vpermpd/vpermps
instructions.