For SSE2 and SSE3, there are four states that the two flags
involved (AV_CPU_FLAG_SSE[23] and AV_CPU_FLAG_SSE[23]SLOW) can convey.
When ordered from worst to best they are:
1. both flags unset (SSE[23] unavailable)
2. the slow flag set, the ordinary flag unset (this is designed
for cases where SSE2 is available, but so slow that MMX(EXT)/SSE
code is usually faster)
3. both flags set (SSE2 is available, but there might be scenarios
where MMX(EXT)/SSE code is faster)
4. the ordinary flag set, the slow flag unset (this is the normal case)
The ordinary macros for checking cpuflags return true
in the latter two cases; the fast macros only return true for
the latter case. Yet the macros to check for slow currently
only return true in case three.
This seems unintended. In fact, the only uses of the slow macros
are all of the form
if (EXTERNAL_SSE2(cpu_flags) || EXTERNAL_SSE2_SLOW(cpu_flags))
where the check for EXTERNAL_SSE2_SLOW is completely redundant.
Even more importantly, it is not what was intended. Before
6369ba3c9c, the checks passed
in cases 2 to 4. Said commit changed this to something that
only passes for the third case. Commits
7fb758cd8e and
c1913064e3 restored the old behaviour,
yet merging 4efab89332 (in commit
ac774cfa57) broke this again
by changing it to what it is now.*
This commit changes the macros to make the slow macros check
whether a specific instruction is supported, even if slow.
This restores the intended meaning to all uses of the SLOW macros
and is generally more natural.
*: Libav only checks for EXTERNAL_SSE2_SLOW, i.e. for the third case
only.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This is more spec-compliant because it does not rely
on dead-code elimination by the compiler. Especially
MSVC has problems with this, as can be seen in
https://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/296373.html
or
https://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/297022.html
This commit does not eliminate every instance where we rely
on dead code elimination: It only tackles branching to
the initialization of arch-specific dsp code, not e.g. all
uses of CONFIG_ and HAVE_ checks. But maybe it is already
enough to compile FFmpeg with MSVC with whole-programm-optimizations
enabled (if one does not disable too many components).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Those are always showing up on Patchwork when FATE tests are failing,
covering some possibly more useful information.
The volatile keyword was used as a workaround for an eight year old
clang version.
Signed-off-by: softworkz <softworkz@hotmail.com>
Signed-off-by: Marton Balint <cus@passwd.hu>
Normally, both the source and dest frame would have only the old API fields
set, only the new API fields set, or both set. But in some cases, like when
calling av_frame_ref() using a non reference counted source frame where only
the old channel layout API fields were populated, the result would be the dst
frame having both the new and old fields populated.
This commit takes this into account and fixes the checks by calling
av_channel_layout_compare() only if the source frame has the new API fields
set, and doing sanity checks for the source frame old API fields if the new
ones are not set.
Signed-off-by: James Almer <jamrial@gmail.com>
This commit moves some of the functionality from avfilter/colorspace
into avutil/csp and exposes it as a public API so it can be used by
libavcodec and/or libavformat. It also converts those structs from
double values to AVRational to make regression testing easier and
more consistent.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
The doc says those function are like av_free if size or nmemb is
zero. It doesn't match the code. av_realloc() realloc one byte if
size is zero, which was added by 91ff05f6ac ten years ago.
realloc() itself in C is implementation-dependent. Make the doc
match the longstanding behaviour.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
libmfx 1.28 was released 3 years ago, it is easy to get a greater
version than 1.28. We may remove lots of compile-time checks if adding
the requirement for the minimal version in the configure script.
Reviewed-by: softworkz <softworkz@hotmail.com>
Signed-off-by: Jean-Baptiste Kempf <jb@videolan.org>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Since every DLL can use an individual CRT on Windows, having
an exported function that opens a FILE* won't work if that
FILE* is going to be used from a different DLL (or from user
application code).
Internally within the libraries, the issue can be worked around
by duplicating the function in all libraries (this already happened
implicitly because the function resided in file_open.c) and renaming
the function to ff_fopen_utf8 (so that it doesn't end up exported from
the DLLs) and duplicating it in all libraries that use it.
This makes the avpriv_fopen_utf8 / ff_fopen_utf8 function work in
the exact same way as the existing avpriv_open / ff_open, with the
same setup as introduced in e743e7ae6e.
That mechanism doesn't work for external users, thus deprecate the
existing function.
Signed-off-by: Martin Storsjö <martin@martin.st>
In d3d11va_create_staging_texture(), during the hwmap process, the
ctx->internal->priv is not initialized, resulting in the
texDesc.Format not initialized. Now pass the format value from
d3d11va_transfer_data() to fix it.
$ ffmpeg.exe -y -hwaccel qsv -init_hw_device d3d11va=d3d11 \
-init_hw_device qsv=qsv@d3d11 -c:v h264_qsv \
-i input.h264 -vf "hwmap=derive_device=d3d11va,format=d3d11,hwdownload,format=nv12" \
-f null -
Reviewed-by: Soft Works <softworkz@hotmail.com>
Signed-off-by: Tong Wu <tong1.wu@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
When the SLOW_GATHER flag was added to the AVX2 version, this
made FMA3-features not enabled on Zen CPUs.
As FMA3 adds 6-7% across all platforms that support it, in
the interest of saving space, this commit removes the AVX
version and replaces it with an FMA3 version.
The only CPUs affected are Sandy Bridge and Bulldozer, which
have AVX support, but no FMA3 support.
In the future, if there's a demand for it, a version of the
function duplicated for AVX can be added.
Instead of having a fixed -64 prio penalty, make the penalties
more granular.
As the prio is based on the register size in bits, decrementing
it by 129 makes AVX SLOW functions be avoided in favor of any
SSE versions.
This reverts commit 82a68a8771.
Smarter slow ISA penalties makes gathers still useful.
The intention is to use gathers with the final stage of non-ptwo iMDCTs,
where they give benefit.
Its performance loss ranges from either being just as fast as individual loads
(Skylake), a few percent slower (Alderlake), 8% slower (Zen 3), to completely
disasterous (older/other CPUs).
Sadly, gathers never panned out fast on x86, even with the benefit of time and
implementation experience.
This also saves a register, as there's no need to fill out an additional
register mask.
Zen 3 (16384-point transform):
Before: 1561050 decicycles in av_tx (fft), 131072 runs, 0 skips
After: 1449621 decicycles in av_tx (fft), 131072 runs, 0 skips
Alderlake:
2% slower on big transforms (65536), to 1% (131072), to a few percent for smaller
sizes.
This avoids having to rebuild big files every time FFMPEG_VERSION
changes (which it does with every commit).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Otherwise its effect might not work causing CPU_COUNT to not get defined.
Fixes cpu count detection to actually use sched_getaffinity if available.
Signed-off-by: Marton Balint <cus@passwd.hu>
The width and height for qsv frame to download need to be
aligned with 16. Add the alignment operation.
Now the following command works:
ffmpeg -hwaccel qsv -f rawvideo -s 1920x1080 -pix_fmt yuv420p -i \
input.yuv -vf "hwupload=extra_hw_frames=16,format=qsv,hwdownload, \
format=nv12" -f null -
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Commit e050959103 implemented passing in
modifiers by using the PRIME_2 memory type, which only exists in v2 of
the library.
To still support v1 of the library, conditionally compile using
VA_CHECK_VERSION() for both the new code and the old code before
the commit.
Note PRIME_2 memory was introduced from VA-API 1.1, so use
VA_CHECK_VERSION(1, 1, 0) instead of VA_CHECK_VERSION(2, 0, 0) (Haihao)
Signed-off-by: Ingo Brückl <ib@wupperonline.de>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
No point running all 64 iterations in the loop to never write anything to ret.
Also make ambisonic layouts check its mask too while at it.
Signed-off-by: James Almer <jamrial@gmail.com>
bp->len cannot be used to detect if try_describe_ambisonic was successful
because the bprint buffer might contain other data as well.
Also describing an invalid ambisonic layout should not return 0 but
AVERROR(EINVAL) instead, so change try_describe_ambisonic to actually return
error on invalid ambisonics. This also allows us to fix the first issue.
Signed-off-by: Marton Balint <cus@passwd.hu>
This reduces code duplication an allows printing AMBI%d channel names for
custom layouts for non-standard or partial ambisonic layouts.
Signed-off-by: Marton Balint <cus@passwd.hu>
Fixes memleaks in the channel_layout FATE-test.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The new API is more extensible and allows for custom layouts.
More accurate information is exported, eg for decoders that do not
set a channel layout, lavc will not make one up for them.
Deprecate the old API working with just uint64_t bitmasks.
Expanded and completed by Vittorio Giovara <vittorio.giovara@gmail.com>
and James Almer <jamrial@gmail.com>.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
This avoids build errors if such features are enabled while targeting
another binary format. (Using such features on other platforms
might require some other form of signaling/setup though, but
the ELF specific .note section isn't applicable at least.)
Signed-off-by: Martin Storsjö <martin@martin.st>
This patch adds optional support for Arm Pointer Authentication Codes.
PAC support is turned on or off at compile time using additional
compiler flags. Unless any of these is enabled explicitly, no additional
code will be emitted at all.
Signed-off-by: André Kempe <andre.kempe@arm.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
The loongson_intrinsics.h file is updated from v1.0.3 version
to v1.1.0. Some spelling mistakes are fixed and new functions are added.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
Reviewed-by: 殷时友 <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Having optionally installed headers is a bad idea as there's no way to know
if they are present or not (unless a define is added to avconfig.h, but that's
just ugly).
Signed-off-by: James Almer <jamrial@gmail.com>
Some of these were made possible by moving several common macros to
libavutil/macros.h.
While just at it, also improve the other headers a bit.
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is a remnant of an FF_API_* inclusion (back from when they were in
avutil.h and not in version.h).
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It has been added for an FF_API_* at a time when these were in avutil.h.
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It has been included since af5f434f8c
for deprecation reasons, but removing it has been forgotten after
it had served is purpose. So remove it.
For convenience, include version.h instead as LIBAVUTIL_VERSION_INT
is supposed to be used when creating AVClasses.
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Only include it if it is needed, namely if __MMX__ is undefined.
X86 is currently the only arch where lavu/cpu.h is basically
automatically included (for internal development): #if ARCH_X86
is true, lavu/internal.h (which is basically included everywhere)
includes lavu/x86/emms.h which can mask missing inclusions
of lavu/cpu.h if the developer works on x86/x64. This has happened
in 8e825ec3ab and also earlier
(see 6d2365882f).
By including said header only if necessary ordinary developer machines
will behave like non-x86 arches, so that missing inclusions of cpu.h
won't go unnoticed any more.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
When the fifo is grown by exactly the current write offset, it would end
up with offset_w = nb_elems. If av_fifo_write_from_cb() is called in
such a state, the user callback would get callled with *nb_elems=0,
which will then cause the write to return without writing anything.
Otherwise nasm writes the full host-specific paths into .o
output, which breaks binary reproducibility.
Signed-off-by: Alexander Kanavin <alex.kanavin@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
getauxval is marginally faster, and works even when procfs is not mounted
support on Linux was added in glibc 2.16
support on Android was added in 4.4 (API 20)
fixes#6578
Signed-off-by: Aman Karmani <aman@tmm1.net>
This commit does some refactoring to make defining assembly codelets
smaller, and fixes compiler redefinition warnings. It also allows
for other assembly versions to reuse the same boilerplate code as
x86.
Finally, it also adds the out_of_place flag to all assembly codelets.
This changes nothing, as out-of-place operation was assumed to be
available anyway, but this makes it more explicit.
Users should switch to the superior AVFifo API.
Unfortunately AVFifoBuffer fields cannot be marked as deprecated because
it would trigger a warning wherever fifo.h is #included, due to
inlined av_fifo_peek2().
Many AVFifoBuffer users operate on fixed-size elements (e.g. pointers),
but the current FIFO API deals exclusively in bytes, requiring extra
complexity in all these callers.
Add a new AVFifo API creating a FIFO with an element size
that may be larger than a byte. All operations on such a FIFO then
operate on complete elements.
This API does not reuse AVFifoBuffer and its API at all, but instead uses
an opaque struct called AVFifo. The AVFifoBuffer API will be deprecated
in a future commit once all of its users have been switched to the new
API.
Not reusing AVFifoBuffer also allowed to use the full range of size_t
from the beginning.
The API currently allows creating FIFOs up to
- UINT_MAX: av_fifo_alloc(), av_fifo_realloc(), av_fifo_grow()
- SIZE_MAX: av_fifo_alloc_array()
However the usable limit is determined by
- rndx/wndx being uint32_t
- av_fifo_[size,space] returning int
so no FIFO should be larger than the smallest of
- INT_MAX
- UINT32_MAX
- SIZE_MAX
(which should be INT_MAX an all commonly used platforms).
Return an error on trying to allocate FIFOs larger than this limit.
Avoids code duplication. It furthermore properly checks
for buf_size to be > 0 before doing anything.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
VkPhysicalDeviceVulkan12Features isn't implemented on MoltenVK yet.
VkPhysicalDeviceTimelineSemaphoreFeatures is less versatile but
simple. None of device_features_1_1 nor device_features_1_2 has real
usage yet, keep the code for future.
Makes Bulldozer prefer AVX functions rather than AVX2,
which are 64% slower:
AVX: 117653 decicycles in av_tx (fft), 1048535 runs, 41 skips
AVX2: 193385 decicycles in av_tx (fft), 1048561 runs, 15 skips
The only difference between both is that vgatherdpd is used in
the former. We don't want to mark them with the new SLOW_GATHER
flag however, since gathers are still faster on Haswell/Zen 2/3
than plain loads.
If a codelet initializes 2 subtransforms, and the second one fails,
the failure would free all subcontexts.
Instead, if there are subcontexts still left, don't free the array.
If all initializations fail, the init() function will return,
and reset_ctx() from the previous step will clean up all contained
subtransforms.
Fix CID: 1497864
The control flow should return ENOSYS if nb_cd_matches is 0 at before
and the ret equal AVERROR(ENOMEM) or goto end label, so remove the last
control flow if (ret >= 0) before end label.
Signed-off-by: Steven Liu <liuqi05@kuaishou.com>
This broke builds with --disable-mmx, which also disabled assembly
entirely, but ARCH_X86 was still true, so the init file tried to find
assembly that didn't exist.
Instead of checking for architecture, check if external x86 assembly
is enabled.
RDFTs are full of conventions that vary between implementations.
What I've gone for here is what's most common between
both fftw, avcodec's rdft and what we use, the equivalent of
which is DFT_R2C for forward and IDFT_C2R for inverse. The
other 2 conventions (IDFT_R2C and DFT_C2R) were not used at
all in our code, and their names are also not appropriate.
If there's a use for either, we can easily add a flag which
would just flip the sign on one exptab.
For some unknown reason, possibly to allow reusing FFT's exp tables,
av_rdft's C2R output is 0.5x lower than what it should be to ensure
a proper back-and-forth conversion.
This code outputs its real samples at the correct level, which
matches FFTW's level, and allows the user to change the level
and insert arbitrary multiplies for free by setting the scale option.
This commit rewrites the internal transform code into a constructor
that stitches transforms (codelets).
This allows for transforms to reuse arbitrary parts of other
transforms, and allows transforms to be stacked onto one
another (such as a full iMDCT using a half-iMDCT which in turn
uses an FFT). It also permits for each step to be individually
replaced by assembly or a custom implementation (such as an ASIC).
qHD is 960x540 (q stands for quarter) and QHD is 2560x1440 (Q is quad).
use quadhd for QHD for abbreviation.
Fix ticket#9591
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
Trying to write too much will currently overwrite previous data. Trying
to read too much will either av_assert2() in av_fifo_drain() or return
old data. Trying to peek too much will either av_assert2() in
av_fifo_generic_peek_at() or return old data.
Return an error code in all these cases, which is safer and more
consistent.
It returns a pointer inside the fifo's buffer, which cannot be safely
used without accessing AVFifoBuffer internals. It is easier and safer to
use av_fifo_generic_peek_at().
The test /libavutil/tests/hwdevice checks that when deriving a device
from a source device and then deriving back to the type of the source
device, the result is matching the original source device, i.e. the
derivation mechanism doesn't create a new device in this case.
Previously, this test was usually passed, but only due to two different
kind of flaws:
1. The test covers only a single level of derivation (and back)
It derives device Y from device X and then Y back to the type of X and
checks whether the result matches X.
What it doesn't check for, are longer chains of derivation like:
CUDA1 > OpenCL2 > CUDA3 and then back to OpenCL4
In that case, the second derivation returns the first device (CUDA3 ==
CUDA1), but when deriving OpenCL4, hwcontext.c was creating a new
OpenCL4 context instead of returning OpenCL2, because there was no link
from CUDA1 to OpenCL2 (only backwards from OpenCL2 to CUDA1)
If the test would check for two levels of derivation, it would have
failed.
This patch fixes those (yet untested) cases by introducing forward
references (derived_device) in addition to the existing back references
(source_device).
2. hwcontext_qsv didn't properly set the source_device
In case of QSV, hwcontext_qsv creates a source context internally
(vaapi, dxva2 or d3d11va) without calling av_hwdevice_ctx_create_derived
and without setting source_device.
This way, the hwcontext test ran successful, but what practically
happened, was that - for example - deriving vaapi from qsv didn't return
the original underlying vaapi device and a new one was created instead:
Exactly what the test is intended to detect and prevent. It just
couldn't do so, because the original device was hidden (= not set as the
source_device of the QSV device).
This patch properly makes these setting and fixes all derivation
scenarios.
(at a later stage, /libavutil/tests/hwdevice should be extended to check
longer derivation chains as well)
Reviewed-by: Lynne <dev@lynne.ee>
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Tested-by: Wenbin Chen <wenbin.chen@intel.com>
Signed-off-by: softworkz <softworkz@hotmail.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
This is done a second time for 5.0 because master was
merged into 5.0 so that it contains the recent DOVI additions.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
In order to be able to extend this struct later (as the Dolby Vision RPU
evolves), all of the 'container' structs are considered extensible, and
the individual constituent fields must instead be accessed via offsets.
The precedent for this style of access is set in
<libavutil/detection_bbox.h>
Signed-off-by: Niklas Haas <git@haasn.dev>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
We don't use it. Was copied from libplacebo's recommended defaults.
Creates problems with validation on Intel devices, where the driver
still advertizes it, even though it's not usable without a swapchain.
Fix#7830
When we upload a frame that is not padded as MSDK requires, we create a
new AVFrame to copy data. The frame's padding data is uninitialized so
it brings run to run problem. For example, If we run the following
command serveral times we will get different outputs.
ffmpeg -init_hw_device qsv=qsv:hw -qsv_device /dev/dri/renderD128 \
-filter_hw_device qsv -f rawvideo -s 192x200 -pix_fmt p010 \
-i 192x200_P010.yuv -vf "format=nv12,hwupload=extra_hw_frames=16" \
-c:v hevc_qsv output.265
According to https://github.com/Intel-Media-SDK/MediaSDK/blob/master/doc/mediasdk-man.md#encoding-procedures
"Note: It is the application's responsibility to fill pixels outside
of crop window when it is smaller than frame to be encoded. Especially
in cases when crops are not aligned to minimum coding block size (16
for AVC, 8 for HEVC and VP9)"
I add a function to fill padding area with border pixel to fix this
run2run problem, and also move the new AVFrame to global structure
to reduce redundant allocation operation to increase preformance.
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
The previous implementation swapped the two halves of the plaintext. The
existing tests only decrypted data with a plaintext of all zeroes, which is
not affected by swapping the halves. Tests which detect the old buggy behavior
have been added.
Signed-off-by: Sebastian Kirmayer <ffmpeg@kirmayer.eu>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
LSX and LASX is loongarch SIMD extention.
They are enabled by default if compiler support it, and can be disabled
with '--disable-lsx' '--disable-lasx'.
Change-Id: Ie2608ea61dbd9b7fffadbf0ec2348bad6c124476
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Reviewed-by: guxiwei <guxiwei-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Always require one semaphore per sw_format plane. This is what
the implementation uses and relies upon throughout. This was
a leftover from an earlier revision that was never needed.
When vulkan image exports to drm, the tilling need to be
VK_IMAGE_TILING_DRM_FORMAT_MODIFIER_EXT. Now add code to create vulkan
image using this format.
Now the following command line works:
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format \
vaapi -i input_1080p.264 -vf "hwmap=derive_device=vulkan,format=vulkan, \
scale_vulkan=1920:1080,hwmap=derive_device=vaapi,format=vaapi" -c:v h264_vaapi output.264
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Further-modifications-by: Lynne <dev@lynne.ee>
Add support to map vulkan frames to software frames when
using contiguous_planes flag.
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Further-modifications-by: Lynne <dev@lynne.ee>
VAAPI on Intel can import external frame, but the planes of the external
frames should be in the same drm object. A new option "contiguous_planes"
is added to device. This flag tells device to allocate places in one
memory. When device is derived from vaapi this flag will be enabled.
A new flag frame_flag is also added to AVVulkanFramesContext. User
can use this flag to force enable or disable this behaviour.
A new variable "offset "is added to AVVKFrame. It describe describe the
offset from the memory currently bound to the VkImage.
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Further-modifications-by: Lynne <dev@lynne.ee>
This way we can pass explicit modifiers in. Sometimes the
modifier matters for the number of memory planes that
libva accepts, in particular when dealing with
driver-compressed textures. Furthermore the driver might
not actually be able to determine the implicit modifier
if all the buffer-passing has used explicit modifier.
All these issues should be resolved by passing in the
modifier, and for that we switch to using the PRIME_2
memory type.
Tested with experimental radeonsi patches for modifiers
and kmsgrab. Also tested with radeonsi without the
patches to double-check it works without PRIME_2 support.
v2:
Cache PRIME_2 support to avoid doing two calls every time on
libva drivers that do not support it.
v3:
Remove prime2_vas usage.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Currently, it also tests whether extended_data points to something
different than the AVFrame's data array and frees extended_data
if it is different. Yet this is only necessary for one of its three
callers, namely av_frame_unref(); meanwhile the other two callers
took measures to avoid this (or rather, to make it to an av_free(NULL)).
This commit moves this chunk to av_frame_unref() (so that
get_frame_defaults() now treats its input as uninitialized)
and removes the now superfluous code in the other two callers.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The data stored in data[3] in VAAPI AVFrame is VASurfaceID while
the data stored in pair->first is the pointer of VASurfaceID, so
we need to do cast to make following commandline works:
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
-hwaccel_output_format vaapi -i input.264 \
-vf "hwmap=derive_device=qsv,format=qsv" -c:v h264_qsv output.264
Signed-off-by: nyanmisaka <nst799610810@gmail.com>
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Validation layer is an indispensable part of developing on Vulkan.
The following commands is on how to enable validation layers:
ffmpeg -init_hw_device vulkan=0,debug=1,validation_layers=VK_LAYER_LUNARG_monitor+VK_LAYER_LUNARG_api_dump
Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
"All commands that are allowed on a queue that supports transfer
operations are also allowed on a queue that supports either
graphics or compute operations. Thus, if the capabilities of a
queue family include VK_QUEUE_GRAPHICS_BIT or VK_QUEUE_COMPUTE_BIT,
then reporting the VK_QUEUE_TRANSFER_BIT capability separately for
that queue family is optional."
What happens on startup is that ffmpeg.c initializes the filter,
then frees it without feeding a single frame through. With no
input frame, the filter lacks a hardware device. The rest of the
uninit code checks if Vulkan objects exist, which they must if there's
a hardware device, but vk->DeviceWaitIdle does not require an object.
So, add a check for it.
It's got a much better API that's actually maintained, it eliminates
race conditions, it comes with a pkg-config file by default, and
unfortunately isn't currently packaged by Debian or other large
distributions.
The issue is that libavfilter depends on libavcodec, and when doing a
static build, if libavcodec also includes "libavfilter/vulkan.c", then
during link-time, compiling programs will fail as there would be multiple
definitions of the same symbols in both libavfilter and libavcodec's
object files.
Linkers are, however, more permitting if both files that include
a common file that's used as a template are one-to-one identical.
Hence, to make both files the same in the future, export all avfilter
specific functions to a separate file.
There is some work in progress to make templated files like this be
compiled only once, so this is not a long-term solution.
This also removes a macro that could be used to toggle SPIRV compilation
capability on #include-time, as this could cause the files to be different.
It has already been checked immediately before that said
AVDictionaryEntry exists; checking again is redundant.
Furthermore, av_hwdevice_find_type_by_name() requires its argument
to be non-NULL, so adding a codepath that automatically calls it
with that parameter is nonsense. The same goes for the argument
corresponding to %s.
Fixes Coverity issue 1491394.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This av_buffer_create() does nothing but leak an AVBuffer and an
AVBufferRef (except on allocation error).
Fixes Coverity issue 1491393.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Add Branch Target Identifiers (BTIs) to all functions defined in
AArch64 assembly files. Most of the BTI landing pads are added
automatically by the 'function' macro.
BTI support is turned on or off at compile time based on the presence
of the __ARM_FEATURE_BTI_DEFAULT feature macro.
A binary compiled with BTI support can be executed on an Armv8-A
processor without BTI support because the instructions are defined in
NOP space.
Signed-off-by: Jonathan Wright <jonathan.wright@arm.com>
Signed-off-by: Elijah Ahmad <elijah.ahmad@arm.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Changes av_clipf to return amin if a is nan.
Before if a is nan av_clipf_c returned nan and
av_clipf_sse would return amax. Now the both
should behave the same.
This works because nan > amin is false.
The max(nan, amin) will be amin.
Signed-off-by: James Almer <jamrial@gmail.com>
There is no reason to wrap them in #ifndef guards, they should only be
defined here and nowhere else. The define guards just add the
possibility to accidentally use the same FF_API name in different
libraries.
Before:
overlay AVOptions:
x <string> ..FV....... set the x expression (default "0")
y <string> ..FV....... set the y expression (default "0")
eof_action <int> ..FV....... Action to take when encountering EOF from secondary input (from 0 to 2) (default repeat)
repeat 0 ..FV....... Repeat the previous frame.
endall 1 ..FV....... End both streams.
pass 2 ..FV....... Pass through the main input.
eval <int> ..FV....... specify when to evaluate expressions (from 0 to 1) (default frame)
After:
a
overlay AVOptions:
x <string> ..FV....... set the x expression (default "0")
y <string> ..FV....... set the y expression (default "0")
eof_action <int> ..FV....... Action to take when encountering EOF from secondary input (from 0 to 2) (default repeat)
repeat 0 ..FV....... Repeat the previous frame.
endall 1 ..FV....... End both streams.
pass 2 ..FV....... Pass through the main input.
eval <int> ..FV....... specify when to evaluate expressions (from 0 to 1) (default frame)
Signed-off-by: softworkz <softworkz@hotmail.com>
Signed-off-by: Marton Balint <cus@passwd.hu>
Include windows.h to fix it. Normally, it'd be better to include it in
vulkan_functions.h, but I'm reasonably confident nothing else that uses
the Vulkan code will need to include Windows functions and not windows.h.
This simplifies and makes queue family picking simpler and more robust.
The requirements on the device context are relaxed. They made no sense
in the first place.
The video encode/decode extension is still in beta, at least on paper,
but I really doubt they'd change needing a separate queue family.
Make get_int/set_int symetric. The int64_t to double to int64_t
conversion is unprecise for large value.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
avutil_version() currently performs several checks before
just returning the version. There is a static int that aims
to ensure that these tests are run only once. The reason is that
there used to be a slightly expensive check, but it has been removed
in 92e3a6fdac. Today running only
once is unnecessary and can be counterproductive: GCC 10 optimizes
all the actual checks away, but the checks_done variable and the code
setting it has been kept. Given that this check is inherently racy
(it uses non-atomic variables), it is best to just remove it.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The new format (given in big/little endian forms) matches the
existing X2RGB10 format, except with B and R channels switched.
AV_PIX_FMT_X2BGR10 data often is created by OpenGL programs
whose buffers use the GL_RGB10 internal format.
Signed-off-by: Manuel Stoeckl <code@mstoeckl.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Command below failed.
ffmpeg -v verbose -init_hw_device vaapi=va:/dev/dri/renderD128
-init_hw_device qsv=qs@va -hwaccel qsv -hwaccel_device qs
-filter_hw_device va -c:v h264_qsv
-i 1080P.264 -vf "hwmap,format=vaapi" -c:v h264_vaapi output.264
Cause: Assign pair->first directly to data[3] in vaapi frame.
pair->first is *VASurfaceID while data[3] in vaapi frame is
VASurfaceID. I fix this line of code. Now the command above works.
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
It does the same as av_calloc(), so one of them should be removed.
Given that av_calloc() has the shorter name, it is retained.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Do this by putting an AVBuffer structure into BufferPoolEntry and
reuse it for all subsequent uses of said BufferPoolEntry.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Microsoft VideoProcessor requires texture with D3DUSAGE_RENDERTARGET flag as output.
There is no way to allocate array of textures with D3D11_BIND_RENDER_TARGET flag
and .ArraySize > 2 by ID3D11Device_CreateTexture2D due to the Microsoft limitation.
Adding AVD3D11FrameDescriptors array to store array of single textures
instead of texture with multiple slices resolves this.
Signed-off-by: Artem Galin <artem.galin@intel.com>
UPD: Rebase of last patch set over current master and use DX9 as default device type.
Makes selection of dxva2/DX9 device type by default as before with explicit d3d11va/DX11 usage to cover more HW configurations.
Added warning message to expect changing default device type in the future.
Fixes TGL / AV1 decode as requires DX11 with explicit DX11 type
selection.
Add headless/multi adapter support and fixes:
https://trac.ffmpeg.org/ticket/7511https://trac.ffmpeg.org/ticket/6827http://ffmpeg.org/pipermail/ffmpeg-trac/2017-November/041901.htmlhttps://trac.ffmpeg.org/ticket/7933338fbcd5bbhttps://github.com/jellyfin/jellyfin/issues/2626#issuecomment-602153952
Any other fixes are welcome including OpenCL interop patch since I don't have proper setup to validate this use case
Decoding, encoding, transcoding have been validated.
child_device_type option is responsible for d3d11va/dxva2 device selection
Usage examples:
DirectX 11:
-init_hw_device qsv:hw,child_device_type=d3d11va
-init_hw_device qsv:hw,child_device_type=d3d11va,child_device=0
OR
-init_hw_device d3d11va=dx -init_hw_device qsv@dx
DirectX 9 is still supported but requires explicit selection:
-init_hw_device qsv:hw,child_device_type=dxva2
OR
-init_hw_device dxva2=dx -init_hw_device qsv@dx
Signed-off-by: Artem Galin <artem.galin@intel.com>
This enables usage of non-powered/headless GPU, better HDR support.
Pool of resources is allocated as one texture with array of slices.
Signed-off-by: Artem Galin <artem.galin@intel.com>
EINVAL is the wrong error code here, since the arguments passed to the
function are valid. The error is that the function is not implemented in
the build, which corresponds to ENOSYS.
From SMPTE RDD 5-2006, the grain seed is to be computed from the
following definition of `pic_offset`:
> When decoding H.264 | MPEG-4 AVC bitstreams, pic_offset is defined as
> follows:
> - pic_offset = PicOrderCnt(CurrPic) + (PicOrderCnt_offset << 5)
> where:
> - PicOrderCnt(CurrPic) is the picture order count of the current frame,
> which shall be derived from [the video stream].
>
> - PicOrderCnt_offset is set to idr_pic_id on IDR frames. idr_pic_id
> shall be read from the slice header of [the video stream]. On non-IDR I
> frames, PicOrderCnt_offset is set to 0. A frame shall be classified as I
> frame when all its slices are I slices, which may be optionally
> designated by setting primary_pic_type to 0 in the access delimiter NAL
> unit. Otherwise, PicOrderCnt_offset it not changed. PicOrderCnt_offset is
> updated in decoding order.
Co-authored-by: James Almer <jamrial@gmail.com>
Signed-off-by: Niklas Haas <git@haasn.dev>
Signed-off-by: James Almer <jamrial@gmail.com>
In particular, document that av_opt_copy() always disentangles
allocated options even on error; this guarantee is needed to e.g.
properly free duplicated thread contexts in libavcodec on error.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The reason why the generic av_image_copy_uc_from() doesn't really
fit in the case for Vulkan is because some planes may be copied via
other methods (such as mapping GPU memory), and if they don't satisfy
the strict alignment requirements, a gpu image->gpu buffer->cpu ram
copy is performed.
We need this for hwcontext_vulkan, and I think this will also be
useful to API users like libplacebo who would rather not write
a custom SIMD memcpy.
Since 580e168a94, av_size_mult() is no
longer inlined; on systems where interposing is a thing, this also
inhibits the compiler from inlining said function into the internal
callers of said function, although inlining such a small function is
typically beneficial: With GCC 10.3 on Ubuntu x64 and -O3 this decreases
the size of av_realloc_array from 91B to 23B, from 129B to 81B for
av_realloc_f and from 77B to 23B for each of av_malloc_array,
av_mallocz_array and av_calloc.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is also used by libavfilter and it is only natural to define it
alongside ff_dlog().
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>