This commit moves some of the functionality from avfilter/colorspace
into avutil/csp and exposes it as a public API so it can be used by
libavcodec and/or libavformat. It also converts those structs from
double values to AVRational to make regression testing easier and
more consistent.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
The doc says those function are like av_free if size or nmemb is
zero. It doesn't match the code. av_realloc() realloc one byte if
size is zero, which was added by 91ff05f6ac ten years ago.
realloc() itself in C is implementation-dependent. Make the doc
match the longstanding behaviour.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
libmfx 1.28 was released 3 years ago, it is easy to get a greater
version than 1.28. We may remove lots of compile-time checks if adding
the requirement for the minimal version in the configure script.
Reviewed-by: softworkz <softworkz@hotmail.com>
Signed-off-by: Jean-Baptiste Kempf <jb@videolan.org>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Since every DLL can use an individual CRT on Windows, having
an exported function that opens a FILE* won't work if that
FILE* is going to be used from a different DLL (or from user
application code).
Internally within the libraries, the issue can be worked around
by duplicating the function in all libraries (this already happened
implicitly because the function resided in file_open.c) and renaming
the function to ff_fopen_utf8 (so that it doesn't end up exported from
the DLLs) and duplicating it in all libraries that use it.
This makes the avpriv_fopen_utf8 / ff_fopen_utf8 function work in
the exact same way as the existing avpriv_open / ff_open, with the
same setup as introduced in e743e7ae6e.
That mechanism doesn't work for external users, thus deprecate the
existing function.
Signed-off-by: Martin Storsjö <martin@martin.st>
In d3d11va_create_staging_texture(), during the hwmap process, the
ctx->internal->priv is not initialized, resulting in the
texDesc.Format not initialized. Now pass the format value from
d3d11va_transfer_data() to fix it.
$ ffmpeg.exe -y -hwaccel qsv -init_hw_device d3d11va=d3d11 \
-init_hw_device qsv=qsv@d3d11 -c:v h264_qsv \
-i input.h264 -vf "hwmap=derive_device=d3d11va,format=d3d11,hwdownload,format=nv12" \
-f null -
Reviewed-by: Soft Works <softworkz@hotmail.com>
Signed-off-by: Tong Wu <tong1.wu@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
When the SLOW_GATHER flag was added to the AVX2 version, this
made FMA3-features not enabled on Zen CPUs.
As FMA3 adds 6-7% across all platforms that support it, in
the interest of saving space, this commit removes the AVX
version and replaces it with an FMA3 version.
The only CPUs affected are Sandy Bridge and Bulldozer, which
have AVX support, but no FMA3 support.
In the future, if there's a demand for it, a version of the
function duplicated for AVX can be added.
Instead of having a fixed -64 prio penalty, make the penalties
more granular.
As the prio is based on the register size in bits, decrementing
it by 129 makes AVX SLOW functions be avoided in favor of any
SSE versions.
This reverts commit 82a68a8771.
Smarter slow ISA penalties makes gathers still useful.
The intention is to use gathers with the final stage of non-ptwo iMDCTs,
where they give benefit.
Its performance loss ranges from either being just as fast as individual loads
(Skylake), a few percent slower (Alderlake), 8% slower (Zen 3), to completely
disasterous (older/other CPUs).
Sadly, gathers never panned out fast on x86, even with the benefit of time and
implementation experience.
This also saves a register, as there's no need to fill out an additional
register mask.
Zen 3 (16384-point transform):
Before: 1561050 decicycles in av_tx (fft), 131072 runs, 0 skips
After: 1449621 decicycles in av_tx (fft), 131072 runs, 0 skips
Alderlake:
2% slower on big transforms (65536), to 1% (131072), to a few percent for smaller
sizes.
This avoids having to rebuild big files every time FFMPEG_VERSION
changes (which it does with every commit).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Otherwise its effect might not work causing CPU_COUNT to not get defined.
Fixes cpu count detection to actually use sched_getaffinity if available.
Signed-off-by: Marton Balint <cus@passwd.hu>
The width and height for qsv frame to download need to be
aligned with 16. Add the alignment operation.
Now the following command works:
ffmpeg -hwaccel qsv -f rawvideo -s 1920x1080 -pix_fmt yuv420p -i \
input.yuv -vf "hwupload=extra_hw_frames=16,format=qsv,hwdownload, \
format=nv12" -f null -
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
Commit e050959103 implemented passing in
modifiers by using the PRIME_2 memory type, which only exists in v2 of
the library.
To still support v1 of the library, conditionally compile using
VA_CHECK_VERSION() for both the new code and the old code before
the commit.
Note PRIME_2 memory was introduced from VA-API 1.1, so use
VA_CHECK_VERSION(1, 1, 0) instead of VA_CHECK_VERSION(2, 0, 0) (Haihao)
Signed-off-by: Ingo Brückl <ib@wupperonline.de>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
No point running all 64 iterations in the loop to never write anything to ret.
Also make ambisonic layouts check its mask too while at it.
Signed-off-by: James Almer <jamrial@gmail.com>
bp->len cannot be used to detect if try_describe_ambisonic was successful
because the bprint buffer might contain other data as well.
Also describing an invalid ambisonic layout should not return 0 but
AVERROR(EINVAL) instead, so change try_describe_ambisonic to actually return
error on invalid ambisonics. This also allows us to fix the first issue.
Signed-off-by: Marton Balint <cus@passwd.hu>
This reduces code duplication an allows printing AMBI%d channel names for
custom layouts for non-standard or partial ambisonic layouts.
Signed-off-by: Marton Balint <cus@passwd.hu>
Fixes memleaks in the channel_layout FATE-test.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The new API is more extensible and allows for custom layouts.
More accurate information is exported, eg for decoders that do not
set a channel layout, lavc will not make one up for them.
Deprecate the old API working with just uint64_t bitmasks.
Expanded and completed by Vittorio Giovara <vittorio.giovara@gmail.com>
and James Almer <jamrial@gmail.com>.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
This avoids build errors if such features are enabled while targeting
another binary format. (Using such features on other platforms
might require some other form of signaling/setup though, but
the ELF specific .note section isn't applicable at least.)
Signed-off-by: Martin Storsjö <martin@martin.st>
This patch adds optional support for Arm Pointer Authentication Codes.
PAC support is turned on or off at compile time using additional
compiler flags. Unless any of these is enabled explicitly, no additional
code will be emitted at all.
Signed-off-by: André Kempe <andre.kempe@arm.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
The loongson_intrinsics.h file is updated from v1.0.3 version
to v1.1.0. Some spelling mistakes are fixed and new functions are added.
Signed-off-by: Hao Chen <chenhao@loongson.cn>
Reviewed-by: 殷时友 <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Having optionally installed headers is a bad idea as there's no way to know
if they are present or not (unless a define is added to avconfig.h, but that's
just ugly).
Signed-off-by: James Almer <jamrial@gmail.com>
Some of these were made possible by moving several common macros to
libavutil/macros.h.
While just at it, also improve the other headers a bit.
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is a remnant of an FF_API_* inclusion (back from when they were in
avutil.h and not in version.h).
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It has been added for an FF_API_* at a time when these were in avutil.h.
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It has been included since af5f434f8c
for deprecation reasons, but removing it has been forgotten after
it had served is purpose. So remove it.
For convenience, include version.h instead as LIBAVUTIL_VERSION_INT
is supposed to be used when creating AVClasses.
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Only include it if it is needed, namely if __MMX__ is undefined.
X86 is currently the only arch where lavu/cpu.h is basically
automatically included (for internal development): #if ARCH_X86
is true, lavu/internal.h (which is basically included everywhere)
includes lavu/x86/emms.h which can mask missing inclusions
of lavu/cpu.h if the developer works on x86/x64. This has happened
in 8e825ec3ab and also earlier
(see 6d2365882f).
By including said header only if necessary ordinary developer machines
will behave like non-x86 arches, so that missing inclusions of cpu.h
won't go unnoticed any more.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
When the fifo is grown by exactly the current write offset, it would end
up with offset_w = nb_elems. If av_fifo_write_from_cb() is called in
such a state, the user callback would get callled with *nb_elems=0,
which will then cause the write to return without writing anything.
Otherwise nasm writes the full host-specific paths into .o
output, which breaks binary reproducibility.
Signed-off-by: Alexander Kanavin <alex.kanavin@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
getauxval is marginally faster, and works even when procfs is not mounted
support on Linux was added in glibc 2.16
support on Android was added in 4.4 (API 20)
fixes#6578
Signed-off-by: Aman Karmani <aman@tmm1.net>
This commit does some refactoring to make defining assembly codelets
smaller, and fixes compiler redefinition warnings. It also allows
for other assembly versions to reuse the same boilerplate code as
x86.
Finally, it also adds the out_of_place flag to all assembly codelets.
This changes nothing, as out-of-place operation was assumed to be
available anyway, but this makes it more explicit.
Users should switch to the superior AVFifo API.
Unfortunately AVFifoBuffer fields cannot be marked as deprecated because
it would trigger a warning wherever fifo.h is #included, due to
inlined av_fifo_peek2().
Many AVFifoBuffer users operate on fixed-size elements (e.g. pointers),
but the current FIFO API deals exclusively in bytes, requiring extra
complexity in all these callers.
Add a new AVFifo API creating a FIFO with an element size
that may be larger than a byte. All operations on such a FIFO then
operate on complete elements.
This API does not reuse AVFifoBuffer and its API at all, but instead uses
an opaque struct called AVFifo. The AVFifoBuffer API will be deprecated
in a future commit once all of its users have been switched to the new
API.
Not reusing AVFifoBuffer also allowed to use the full range of size_t
from the beginning.
The API currently allows creating FIFOs up to
- UINT_MAX: av_fifo_alloc(), av_fifo_realloc(), av_fifo_grow()
- SIZE_MAX: av_fifo_alloc_array()
However the usable limit is determined by
- rndx/wndx being uint32_t
- av_fifo_[size,space] returning int
so no FIFO should be larger than the smallest of
- INT_MAX
- UINT32_MAX
- SIZE_MAX
(which should be INT_MAX an all commonly used platforms).
Return an error on trying to allocate FIFOs larger than this limit.
Avoids code duplication. It furthermore properly checks
for buf_size to be > 0 before doing anything.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
VkPhysicalDeviceVulkan12Features isn't implemented on MoltenVK yet.
VkPhysicalDeviceTimelineSemaphoreFeatures is less versatile but
simple. None of device_features_1_1 nor device_features_1_2 has real
usage yet, keep the code for future.
Makes Bulldozer prefer AVX functions rather than AVX2,
which are 64% slower:
AVX: 117653 decicycles in av_tx (fft), 1048535 runs, 41 skips
AVX2: 193385 decicycles in av_tx (fft), 1048561 runs, 15 skips
The only difference between both is that vgatherdpd is used in
the former. We don't want to mark them with the new SLOW_GATHER
flag however, since gathers are still faster on Haswell/Zen 2/3
than plain loads.
If a codelet initializes 2 subtransforms, and the second one fails,
the failure would free all subcontexts.
Instead, if there are subcontexts still left, don't free the array.
If all initializations fail, the init() function will return,
and reset_ctx() from the previous step will clean up all contained
subtransforms.
Fix CID: 1497864
The control flow should return ENOSYS if nb_cd_matches is 0 at before
and the ret equal AVERROR(ENOMEM) or goto end label, so remove the last
control flow if (ret >= 0) before end label.
Signed-off-by: Steven Liu <liuqi05@kuaishou.com>