The previous implementation swapped the two halves of the plaintext. The
existing tests only decrypted data with a plaintext of all zeroes, which is
not affected by swapping the halves. Tests which detect the old buggy behavior
have been added.
Signed-off-by: Sebastian Kirmayer <ffmpeg@kirmayer.eu>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
LSX and LASX is loongarch SIMD extention.
They are enabled by default if compiler support it, and can be disabled
with '--disable-lsx' '--disable-lasx'.
Change-Id: Ie2608ea61dbd9b7fffadbf0ec2348bad6c124476
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Reviewed-by: guxiwei <guxiwei-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Always require one semaphore per sw_format plane. This is what
the implementation uses and relies upon throughout. This was
a leftover from an earlier revision that was never needed.
When vulkan image exports to drm, the tilling need to be
VK_IMAGE_TILING_DRM_FORMAT_MODIFIER_EXT. Now add code to create vulkan
image using this format.
Now the following command line works:
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 -hwaccel_output_format \
vaapi -i input_1080p.264 -vf "hwmap=derive_device=vulkan,format=vulkan, \
scale_vulkan=1920:1080,hwmap=derive_device=vaapi,format=vaapi" -c:v h264_vaapi output.264
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Further-modifications-by: Lynne <dev@lynne.ee>
Add support to map vulkan frames to software frames when
using contiguous_planes flag.
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Further-modifications-by: Lynne <dev@lynne.ee>
VAAPI on Intel can import external frame, but the planes of the external
frames should be in the same drm object. A new option "contiguous_planes"
is added to device. This flag tells device to allocate places in one
memory. When device is derived from vaapi this flag will be enabled.
A new flag frame_flag is also added to AVVulkanFramesContext. User
can use this flag to force enable or disable this behaviour.
A new variable "offset "is added to AVVKFrame. It describe describe the
offset from the memory currently bound to the VkImage.
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Further-modifications-by: Lynne <dev@lynne.ee>
This way we can pass explicit modifiers in. Sometimes the
modifier matters for the number of memory planes that
libva accepts, in particular when dealing with
driver-compressed textures. Furthermore the driver might
not actually be able to determine the implicit modifier
if all the buffer-passing has used explicit modifier.
All these issues should be resolved by passing in the
modifier, and for that we switch to using the PRIME_2
memory type.
Tested with experimental radeonsi patches for modifiers
and kmsgrab. Also tested with radeonsi without the
patches to double-check it works without PRIME_2 support.
v2:
Cache PRIME_2 support to avoid doing two calls every time on
libva drivers that do not support it.
v3:
Remove prime2_vas usage.
Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Currently, it also tests whether extended_data points to something
different than the AVFrame's data array and frees extended_data
if it is different. Yet this is only necessary for one of its three
callers, namely av_frame_unref(); meanwhile the other two callers
took measures to avoid this (or rather, to make it to an av_free(NULL)).
This commit moves this chunk to av_frame_unref() (so that
get_frame_defaults() now treats its input as uninitialized)
and removes the now superfluous code in the other two callers.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The data stored in data[3] in VAAPI AVFrame is VASurfaceID while
the data stored in pair->first is the pointer of VASurfaceID, so
we need to do cast to make following commandline works:
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
-hwaccel_output_format vaapi -i input.264 \
-vf "hwmap=derive_device=qsv,format=qsv" -c:v h264_qsv output.264
Signed-off-by: nyanmisaka <nst799610810@gmail.com>
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Validation layer is an indispensable part of developing on Vulkan.
The following commands is on how to enable validation layers:
ffmpeg -init_hw_device vulkan=0,debug=1,validation_layers=VK_LAYER_LUNARG_monitor+VK_LAYER_LUNARG_api_dump
Signed-off-by: Wu Jianhua <jianhua.wu@intel.com>
"All commands that are allowed on a queue that supports transfer
operations are also allowed on a queue that supports either
graphics or compute operations. Thus, if the capabilities of a
queue family include VK_QUEUE_GRAPHICS_BIT or VK_QUEUE_COMPUTE_BIT,
then reporting the VK_QUEUE_TRANSFER_BIT capability separately for
that queue family is optional."
What happens on startup is that ffmpeg.c initializes the filter,
then frees it without feeding a single frame through. With no
input frame, the filter lacks a hardware device. The rest of the
uninit code checks if Vulkan objects exist, which they must if there's
a hardware device, but vk->DeviceWaitIdle does not require an object.
So, add a check for it.
It's got a much better API that's actually maintained, it eliminates
race conditions, it comes with a pkg-config file by default, and
unfortunately isn't currently packaged by Debian or other large
distributions.
The issue is that libavfilter depends on libavcodec, and when doing a
static build, if libavcodec also includes "libavfilter/vulkan.c", then
during link-time, compiling programs will fail as there would be multiple
definitions of the same symbols in both libavfilter and libavcodec's
object files.
Linkers are, however, more permitting if both files that include
a common file that's used as a template are one-to-one identical.
Hence, to make both files the same in the future, export all avfilter
specific functions to a separate file.
There is some work in progress to make templated files like this be
compiled only once, so this is not a long-term solution.
This also removes a macro that could be used to toggle SPIRV compilation
capability on #include-time, as this could cause the files to be different.
It has already been checked immediately before that said
AVDictionaryEntry exists; checking again is redundant.
Furthermore, av_hwdevice_find_type_by_name() requires its argument
to be non-NULL, so adding a codepath that automatically calls it
with that parameter is nonsense. The same goes for the argument
corresponding to %s.
Fixes Coverity issue 1491394.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
This av_buffer_create() does nothing but leak an AVBuffer and an
AVBufferRef (except on allocation error).
Fixes Coverity issue 1491393.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Add Branch Target Identifiers (BTIs) to all functions defined in
AArch64 assembly files. Most of the BTI landing pads are added
automatically by the 'function' macro.
BTI support is turned on or off at compile time based on the presence
of the __ARM_FEATURE_BTI_DEFAULT feature macro.
A binary compiled with BTI support can be executed on an Armv8-A
processor without BTI support because the instructions are defined in
NOP space.
Signed-off-by: Jonathan Wright <jonathan.wright@arm.com>
Signed-off-by: Elijah Ahmad <elijah.ahmad@arm.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
Changes av_clipf to return amin if a is nan.
Before if a is nan av_clipf_c returned nan and
av_clipf_sse would return amax. Now the both
should behave the same.
This works because nan > amin is false.
The max(nan, amin) will be amin.
Signed-off-by: James Almer <jamrial@gmail.com>
There is no reason to wrap them in #ifndef guards, they should only be
defined here and nowhere else. The define guards just add the
possibility to accidentally use the same FF_API name in different
libraries.
Before:
overlay AVOptions:
x <string> ..FV....... set the x expression (default "0")
y <string> ..FV....... set the y expression (default "0")
eof_action <int> ..FV....... Action to take when encountering EOF from secondary input (from 0 to 2) (default repeat)
repeat 0 ..FV....... Repeat the previous frame.
endall 1 ..FV....... End both streams.
pass 2 ..FV....... Pass through the main input.
eval <int> ..FV....... specify when to evaluate expressions (from 0 to 1) (default frame)
After:
a
overlay AVOptions:
x <string> ..FV....... set the x expression (default "0")
y <string> ..FV....... set the y expression (default "0")
eof_action <int> ..FV....... Action to take when encountering EOF from secondary input (from 0 to 2) (default repeat)
repeat 0 ..FV....... Repeat the previous frame.
endall 1 ..FV....... End both streams.
pass 2 ..FV....... Pass through the main input.
eval <int> ..FV....... specify when to evaluate expressions (from 0 to 1) (default frame)
Signed-off-by: softworkz <softworkz@hotmail.com>
Signed-off-by: Marton Balint <cus@passwd.hu>
Include windows.h to fix it. Normally, it'd be better to include it in
vulkan_functions.h, but I'm reasonably confident nothing else that uses
the Vulkan code will need to include Windows functions and not windows.h.
This simplifies and makes queue family picking simpler and more robust.
The requirements on the device context are relaxed. They made no sense
in the first place.
The video encode/decode extension is still in beta, at least on paper,
but I really doubt they'd change needing a separate queue family.
Make get_int/set_int symetric. The int64_t to double to int64_t
conversion is unprecise for large value.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
avutil_version() currently performs several checks before
just returning the version. There is a static int that aims
to ensure that these tests are run only once. The reason is that
there used to be a slightly expensive check, but it has been removed
in 92e3a6fdac. Today running only
once is unnecessary and can be counterproductive: GCC 10 optimizes
all the actual checks away, but the checks_done variable and the code
setting it has been kept. Given that this check is inherently racy
(it uses non-atomic variables), it is best to just remove it.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The new format (given in big/little endian forms) matches the
existing X2RGB10 format, except with B and R channels switched.
AV_PIX_FMT_X2BGR10 data often is created by OpenGL programs
whose buffers use the GL_RGB10 internal format.
Signed-off-by: Manuel Stoeckl <code@mstoeckl.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Command below failed.
ffmpeg -v verbose -init_hw_device vaapi=va:/dev/dri/renderD128
-init_hw_device qsv=qs@va -hwaccel qsv -hwaccel_device qs
-filter_hw_device va -c:v h264_qsv
-i 1080P.264 -vf "hwmap,format=vaapi" -c:v h264_vaapi output.264
Cause: Assign pair->first directly to data[3] in vaapi frame.
pair->first is *VASurfaceID while data[3] in vaapi frame is
VASurfaceID. I fix this line of code. Now the command above works.
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
It does the same as av_calloc(), so one of them should be removed.
Given that av_calloc() has the shorter name, it is retained.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Do this by putting an AVBuffer structure into BufferPoolEntry and
reuse it for all subsequent uses of said BufferPoolEntry.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Microsoft VideoProcessor requires texture with D3DUSAGE_RENDERTARGET flag as output.
There is no way to allocate array of textures with D3D11_BIND_RENDER_TARGET flag
and .ArraySize > 2 by ID3D11Device_CreateTexture2D due to the Microsoft limitation.
Adding AVD3D11FrameDescriptors array to store array of single textures
instead of texture with multiple slices resolves this.
Signed-off-by: Artem Galin <artem.galin@intel.com>
UPD: Rebase of last patch set over current master and use DX9 as default device type.
Makes selection of dxva2/DX9 device type by default as before with explicit d3d11va/DX11 usage to cover more HW configurations.
Added warning message to expect changing default device type in the future.
Fixes TGL / AV1 decode as requires DX11 with explicit DX11 type
selection.
Add headless/multi adapter support and fixes:
https://trac.ffmpeg.org/ticket/7511https://trac.ffmpeg.org/ticket/6827http://ffmpeg.org/pipermail/ffmpeg-trac/2017-November/041901.htmlhttps://trac.ffmpeg.org/ticket/7933338fbcd5bbhttps://github.com/jellyfin/jellyfin/issues/2626#issuecomment-602153952
Any other fixes are welcome including OpenCL interop patch since I don't have proper setup to validate this use case
Decoding, encoding, transcoding have been validated.
child_device_type option is responsible for d3d11va/dxva2 device selection
Usage examples:
DirectX 11:
-init_hw_device qsv:hw,child_device_type=d3d11va
-init_hw_device qsv:hw,child_device_type=d3d11va,child_device=0
OR
-init_hw_device d3d11va=dx -init_hw_device qsv@dx
DirectX 9 is still supported but requires explicit selection:
-init_hw_device qsv:hw,child_device_type=dxva2
OR
-init_hw_device dxva2=dx -init_hw_device qsv@dx
Signed-off-by: Artem Galin <artem.galin@intel.com>
This enables usage of non-powered/headless GPU, better HDR support.
Pool of resources is allocated as one texture with array of slices.
Signed-off-by: Artem Galin <artem.galin@intel.com>
EINVAL is the wrong error code here, since the arguments passed to the
function are valid. The error is that the function is not implemented in
the build, which corresponds to ENOSYS.
From SMPTE RDD 5-2006, the grain seed is to be computed from the
following definition of `pic_offset`:
> When decoding H.264 | MPEG-4 AVC bitstreams, pic_offset is defined as
> follows:
> - pic_offset = PicOrderCnt(CurrPic) + (PicOrderCnt_offset << 5)
> where:
> - PicOrderCnt(CurrPic) is the picture order count of the current frame,
> which shall be derived from [the video stream].
>
> - PicOrderCnt_offset is set to idr_pic_id on IDR frames. idr_pic_id
> shall be read from the slice header of [the video stream]. On non-IDR I
> frames, PicOrderCnt_offset is set to 0. A frame shall be classified as I
> frame when all its slices are I slices, which may be optionally
> designated by setting primary_pic_type to 0 in the access delimiter NAL
> unit. Otherwise, PicOrderCnt_offset it not changed. PicOrderCnt_offset is
> updated in decoding order.
Co-authored-by: James Almer <jamrial@gmail.com>
Signed-off-by: Niklas Haas <git@haasn.dev>
Signed-off-by: James Almer <jamrial@gmail.com>
In particular, document that av_opt_copy() always disentangles
allocated options even on error; this guarantee is needed to e.g.
properly free duplicated thread contexts in libavcodec on error.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
The reason why the generic av_image_copy_uc_from() doesn't really
fit in the case for Vulkan is because some planes may be copied via
other methods (such as mapping GPU memory), and if they don't satisfy
the strict alignment requirements, a gpu image->gpu buffer->cpu ram
copy is performed.
We need this for hwcontext_vulkan, and I think this will also be
useful to API users like libplacebo who would rather not write
a custom SIMD memcpy.
Since 580e168a94, av_size_mult() is no
longer inlined; on systems where interposing is a thing, this also
inhibits the compiler from inlining said function into the internal
callers of said function, although inlining such a small function is
typically beneficial: With GCC 10.3 on Ubuntu x64 and -O3 this decreases
the size of av_realloc_array from 91B to 23B, from 129B to 81B for
av_realloc_f and from 77B to 23B for each of av_malloc_array,
av_mallocz_array and av_calloc.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is also used by libavfilter and it is only natural to define it
alongside ff_dlog().
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Up until now, including error.h alone does not make the AVERROR_* defines
usable, because they just expand to something involving MKTAG, but
without the header providing MKTAG. So include macros.h, the header
providing MKTAG.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
common.h currently contains several things: Math macros, UTF-8 macros,
other fundamental macros; furthermore it also contains miscellaneous
math functions and it (directly and indirectly) includes lots of other
headers.
This commit moves the "other fundamental macros" to macros.h which is
a more fitting place.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Some function had exceed 30 inline assembly register oprands limiation
when using LOONGSON2 version of MMI macros. We can avoid that by take
$at, which is register reserved for assembler, as temporary register.
As none of instructions used in these macros is pseudo, it is safe to
utilize $at here.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
{SAVE,RECOVER}_REG will be available for Loongson2 again,
also comment about the magic.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
These have mostly been added because of FF_API_*; yet when these were
removed, removing the header has been forgotten.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
These inclusions are not necessary, as cpu.h is already included
wherever it is needed (via direct inclusion or via the arch-specific
headers).
Also remove other unnecessary cpu.h inclusions from ordinary
non-headers.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It is not used here at all; instead, add it where it is used without
including it or any of the arch-specific CPU headers.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Teach AV_HWDEVICE_TYPE_VIDEOTOOLBOX to be able to create AVFrames of type
AV_PIX_FMT_VIDEOTOOLBOX. This can be used to hwupload a regular AVFrame
into its CVPixelBuffer equivalent.
ffmpeg -init_hw_device videotoolbox -f lavfi -i color=black:640x480 -vf hwupload -c:v h264_videotoolbox -f null -y /dev/null
Signed-off-by: Aman Karmani <aman@tmm1.net>
The field is a standard field, yet we were loading it as if it was
a quadword. This worked for forward transforms by chance, but broke
when the transform was inverse.
checkasm couldn't catch that because we only test forward transforms,
which are identical to inverse transforms but with a different revtab.
Fixes: left shift of negative value -1
Fixes: 33736/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_SIREN_fuzzer-6657785795313664
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Clang is more strict on the type of asm operands, float or double
type variable should use constraint 'f', integer variable should
use constraint 'r'.
Signed-off-by: Jin Bo <jinbo@loongson.cn>
Reviewed-by: yinshiyou-hf@loongson.cn
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This mostly reverts 785bfb1d7b.
But I also added some clarifications so that nobody mixes primaries
with matrix again. SMPTE 240 and 170 primaires are the same, while
matrix coeff. are different, because 240 is derived from 170's new
primaries and white point while 170 uses BT.601 derived from BT.470
System M (yes, with Illuminant C) a.k.a. NTSC 1953. Some nits too.
Reviewed-by: Reto Kromer <lists@reto.ch>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
MSA2 optimizations are attached to MSA macros in generic_macros_msa.h.
It's difficult to do runtime check for them. Remove this part of code
can make it more robust. H264 1080p decoding: 5.13x==>5.12x.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
While Vulkan itself went more or less the way it was expected to go,
libvulkan didn't quite solve all of the opengl loader issues. It's multi-vendor,
yes, but unfortunately, the code is Google/Khronos QUALITY, so suffers from
big static linking issues (static linking on anything but OSX is unsupported),
has bugs, and due to the prefix system used, there are 3 or so ways to type out
functions.
Just solve all of those problems by dlopening it. We even have nice emulation
for it on Windows.
VkPhysicalDeviceLimits.optimalBufferCopyRowPitchAlignment and
VkPhysicalDeviceExternalMemoryHostPropertiesEXT.minImportedHostPointerAlignment are of type
VkDeviceSize (a typedef uint64_t).
VkPhysicalDeviceLimits.minMemoryMapAlignment is of type size_t.
Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Lynne <dev@lynne.ee>
Originally deprecated in 1296b1f6c0631ab79464e22d48a6a1548450b943;
scheduled again for removal in a991526832.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
This commit adds a pure x86 assembly SIMD version of the FFT in libavutil/tx.
The design of this pure assembly FFT is pretty unconventional.
On the lowest level, instead of splitting the complex numbers into
real and imaginary parts, we keep complex numbers together but split
them in terms of parity. This saves a number of shuffles in each transform,
but more importantly, it splits each transform into two independent
paths, which we process using separate registers in parallel.
This allows us to keep all units saturated and lets us use all available
registers to avoid dependencies.
Moreover, it allows us to double the granularity of our per-load permutation,
skipping many expensive lookups and allowing us to use just 4 loads per register,
rather than 8, or in case FMA3 (and by extension, AVX2), use the vgatherdpd
instruction, which is at least as fast as 4 separate loads on old hardware,
and quite a bit faster on modern CPUs).
Higher up, we go for a bottom-up construction of large transforms, foregoing
the traditional per-transform call-return recursion chains. Instead, we always
start at the bottom-most basis transform (in this case, a 32-point transform),
and continue constructing larger and larger transforms until we return to the
top-most transform.
This way, we only touch the stack 3 times per a complete target transform:
once for the 1/2 length transform and two times for the 1/4 length transform.
The combination algorithm we use is a standard Split-Radix algorithm,
as used in our C code. Although a version with less operations exists
(Steven G. Johnson and Matteo Frigo's "A modified split-radix FFT with fewer
arithmetic operations", IEEE Trans. Signal Process. 55 (1), 111–119 (2007),
which is the one FFTW uses), it only has 2% less operations and requires at least 4x
the binary code (due to it needing 4 different paths to do a single transform).
That version also has other issues which prevent it from being implemented
with SIMD code as efficiently, which makes it lose the marginal gains it offered,
and cannot be performed bottom-up, requiring many recursive call-return chains,
whose overhead adds up.
We go through a lot of effort to minimize load/stores by keeping as much in
registers in between construcring transforms. This saves us around 32 cycles,
on paper, but in reality a lot more due to load/store aliasing (a load from a
memory location cannot be issued while there's a store pending, and there are
only so many (2 for Zen 3) load/store units in a CPU).
Also, we interleave coefficients during the last stage to save on a store+load
per register.
Each of the smallest, basis transforms (4, 8 and 16-point in our case)
has been extremely optimized. Our 8-point transform is barely 20 instructions
in total, beating our old implementation 8-point transform by 1 instruction.
Our 2x8-point transform is 23 instructions, beating our old implementation by
6 instruction and needing 50% less cycles. Our 16-point transform's combination
code takes slightly more instructions than our old implementation, but makes up
for it by requiring a lot less arithmetic operations.
Overall, the transform was optimized for the timings of Zen 3, which at the
time of writing has the most IPC from all documented CPUs. Shuffles were
preferred over arithmetic operations due to their 1/0.5 latency/throughput.
On average, this code is 30% faster than our old libavcodec implementation.
It's able to trade blows with the previously-untouchable FFTW on small transforms,
and due to its tiny size and better prediction, outdoes FFTW on larger transforms
by 11% on the largest currently supported size.
This sadly required making changes to the code itself,
due to the same context needing to be reused for both versions.
The lookup table had to be duplicated for both versions.
This commit refactors the power-of-two FFT, making it faster and
halving the size of all tables, making the code much smaller on
all systems.
This removes the big/small pass split, because on modern systems
the "big" pass is always faster, and even on older machines there
is no measurable speed difference.
av_set_cpu_flags_mask() has been deprecated in the commit which merged
it: 6df42f98746be06c883ce683563e07c9a2af983f; av_parse_cpu_flags() has
been deprecated in 4b529edff8.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Some files currently rely on libavutil/cpu.h to include it for them;
yet said file won't use include it any more after the currently
deprecated functions are removed, so include attributes.h directly.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
av_frame_copy() is allowed to return values >= 0 on success, whereas
the documentation of av_frame_ref() states that the return value is 0 on
success. Ergo the latter must not just return the former's return value.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
av_cpu_count() intends to emit a debug message containing the number of
logical cores when called the first time. The check currently works with
a static volatile int; yet this does not help at all in case of
concurrent accesses by multiple threads. So replace this with an
atomic_int.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
av_adler32_update() is used by av_hash_update() which will be switched
to size_t at the next bump. So it also has to be made to use size_t.
This is also necessary for framecrcenc.c, because the size of side data
will become a size_t, too.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
av_bprint_finalize() can still fail even when it has been checked that
the AVBPrint is currently complete: Namely if the string was so short
that it fit into the AVBPrint's internal buffer.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Fixes: Integer overflow and division by 0
Fixes: poc-202102-div.mov
Found-by: 1vanChen of NSFOCUS Security Team
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
clang errors when compiling with C++11 about needing spaces between
literal and identifier
Signed-off-by: Christopher Degawa <ccom@randomderp.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Base escaping only escapes values required for base character data
according to part 2.4 of XML, and if additional flags are added
single and double quotes can additionally be escaped in order
to handle single and double quoted attributes.
Co-authored-by: Jan Ekström <jan.ekstrom@24i.com>
Signed-off-by: Jan Ekström <jan.ekstrom@24i.com>
Fixes: signed integer overflow: -9223372053736 * 1000000 cannot be represented in type 'long'
Fixes: 26910/clusterfuzz-testcase-minimized-ffmpeg_dem_CONCAT_fuzzer-6607924558430208
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
out[lut[i]] = in[i] lookups were 4.04 times(!) slower than
out[i] = in[lut[i]] lookups for an out-of-place FFT of length 4096.
The permutes remain unchanged for anything but out-of-place monolithic
FFT, as those benefit quite a lot from the current order (it means
there's only 1 lookup necessary to add to an offset, rather than
a full gather).
The code was based around non-power-of-two FFTs, so this wasn't
benchmarked early on.
No buffer will be fetched from the pool after it's uninitialized, so there's
no benefit from waiting until every single buffer has been returned to it
before freeing them all.
This should free some memory in certain scenarios, which can be beneficial in
low memory systems.
Based on a patch by Jonas Karlman.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: James Almer <jamrial@gmail.com>
This is much less precise than the cycle counter register, but
the cycle counter register is not available on apple platforms
(and on linux, it requires a kernel module for allowing user mode
access).
Signed-off-by: Martin Storsjö <martin@martin.st>
This commit adds support for in-place FFT transforms. Since our
internal transforms were all in-place anyway, this only changes
the permutation on the input.
Unfortunately, research papers were of no help here. All focused
on dry hardware implementations, where permutes are free, or on
software implementations where binary bloat is of no concern so
storing dozen times the transforms for each permutation and version
is not considered bad practice.
Still, for a pure C implementation, it's only around 28% slower
than the multi-megabyte FFTW3 in unaligned mode.
Unlike a closed permutation like with PFA, split-radix FFT bit-reversals
contain multiple NOPs, multiple simple swaps, and a few chained swaps,
so regular single-loop single-state permute loops were not possible.
Instead, we filter out parts of the input indices which are redundant.
This allows for a single branch, and with some clever AVX512 asm,
could possibly be SIMD'd without refactoring.
The inplace_idx array is guaranteed to never be larger than the
revtab array, and in practice only requires around log2(len) entries.
The power-of-two MDCTs can be done in-place as well. And it's
possible to eliminate a copy in the compound MDCTs too, however
it'll be slower than doing them out of place, and we'd need to dirty
the input array.
This patch also fixes a -Wtautological-constant-out-of-range-compare
warning from Clang and a -Wtype-limits warning from GCC on systems
where size_t is 64bits and unsigned 32bits. The reason for this seems
to be that variable (whose value derives from sizeof() and can therefore
be known at compile-time) is used instead of using sizeof() directly in
the comparison.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
libavutil/common.h is a public header that provides generic math
functions whereas libavutil/intmath.h is a private header that contains
plattform-specific optimized versions of said math functions. common.h
includes intmath.h (when building the FFmpeg libraries) so that the
optimized versions are used for them.
This interdependency sometimes causes trouble: intmath.h once contained
an inlined ff_sqrt function that relied upon av_log2_16bit. In case there
was no optimized logarithm available on this plattform, intmath.h needed
to include common.h to get the generic implementation and this has been
done after the optimized versions (if any) have been provided so that
common.h used the optimized versions; it also needed to be done before
ff_sqrt. Yet when intmath.h was included from common.h and if an ordinary
inclusion guard was used by common.h, the #include "common.h" in intmath.h
was a no-op and therefore av_log2_16bit was still unknown at the end of
intmath.h (and also in ff_sqrt) if no optimized version was available.
Before a955b59658 this was solved by
duplicating the #ifndef av_log2_16bit check after the inclusion of
common.h in intmath.h; said commit instead moved these checks to the
end of common.h, outside the inclusion guards and made common.h include
itself to get these unguarded defines. This is still the current
state of affairs.
Yet this is unnecessary since 9734b8ba56
as said commit removed ff_sqrt as well as the #include "common.h" from
intmath.h. Therefore this commit moves everything inside the inclusion
guards and makes common.h not include itself.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Fixes: negation of -9223372036854775808 cannot be represented in type 'int64_t' (aka 'long'); cast to an unsigned type to negate this value to itself
Fixes: 29437/clusterfuzz-testcase-minimized-ffmpeg_dem_MOV_fuzzer-4748510022991872
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: division by zero
Fixes: 29555/clusterfuzz-testcase-minimized-ffmpeg_dem_VIVO_fuzzer-5149951447400448
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This reverts commit 6696a07ac6.
It is wrong to restrict timecodes to always contain leading zeros or for hours
or frames to be 2 chars only.
Signed-off-by: Marton Balint <cus@passwd.hu>
fixes http://trac.ffmpeg.org/ticket/9055
The hw decoder may allocate a large frame from AVHWFramesContext, and adjust width and height based on bitstream.
We need to use resolution from src frame instead of AVHWFramesContext.
test command:
ffmpeg -loglevel debug -hide_banner -hwaccel vaapi -init_hw_device vaapi=va:/dev/dri/renderD128 -hwaccel_device va -hwaccel_output_format vaapi -init_hw_device vulkan=vulk -filter_hw_device vulk -i 1920x1080.264 -c:v libx264 -r:v 30 -profile:v high -preset veryfast -vf "hwmap,chromaber_vulkan=0:0,hwdownload,format=nv12" -map 0 -y vaapiouts.mkv
expected:
No green bar at bottom.
Fixes: signed integer overflow: 2147462079 + 2149596 cannot be represented in type 'int'
Fixes: 27565/clusterfuzz-testcase-minimized-ffmpeg_DEMUXER_fuzzer-5091972813160448
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This patch adds support for arbitrary-point FFTs and all even MDCT
transforms.
Odd MDCTs are not supported yet as they're based on the DCT-II and DCT-III
and they're very niche.
With this we can now write tests.
Fixes: division by zero
Fixes: 26451/clusterfuzz-testcase-minimized-ffmpeg_dem_VIVO_fuzzer-4756955832516608
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
INT32_MAX (2147483647) isn't exactly representable by a floating point
value, with the closest being 2147483648.0. So when rescaling a value
of 1.0, this could overflow when casting the 64-bit value returned from
lrintf() into 32 bits.
Unfortunately the properties of integer overflows don't match up well
with how a Fourier Transform operates. So clip the value before
casting to a 32-bit int.
Should be noted we don't have overflows with the table values we're
currently using. However, converting a Kaiser-Bessel window function
with a length of 256 and a parameter of 5.0 to fixed point did create
overflows. So this is more of insurance to save debugging time
in case something changes in the future.
The macro is only used during init, so it being a little slower is
not a problem.
Do it only when requested with the AV_CODEC_EXPORT_DATA_VIDEO_ENC_PARAMS
flag.
Drop previous code using the long-deprecated AV_FRAME_DATA_QP_TABLE*
API. Temporarily disable fate-filter-pp, fate-filter-pp7,
fate-filter-spp. They will be reenabled once these filters are converted
in following commits.
This was introduced in version 4.6. And may not exist all without an
optional package. So to prevent a hard dependency on needing the Linux
kernel headers to compile, make this optional.
Also ignore the status of the ioctl, since it may fail on older kernels
which don't support it. It's okay to ignore as its not fatal and any
serious errors will be caught later by the mmap call.
When targeting a recent enough macOS/iOS version that has clock_gettime
it won't be a weak symbol, in which case clang warns for this check
as it's always true:
warning: address of function 'clock_gettime' will always
evaluate to 'true'
This warning is silenced by using the address-of operator to make
the intent explicit.
These two extensions and two features are both optionally used by
libplacebo to speed up rendering, so it makes sense for libavutil to
automatically enable them as well.
Vulkan formats with a PACK suffix define native endianess.
Vulkan formats without a PACK suffix are in bytestream order.
Pixel formats with a LE/BE suffix define endianess.
Pixel formats without LE/BE suffix are in bytestream order.
This relies on the fact that host memory is always going to be required
to be aligned to the platform's page size, which means we can adjust
the pointers when we map them to buffers and therefore skip an entire
copy. This has already had extensive testing in libplacebo without
problems, so its safe to use here as well.
Speeds up downloads and uploads on platforms which do not pool their
memory hugely, but less so on platforms that do.
We can pool the buffers ourselves, but that can come as a later patch
if necessary.
This patch introduces a new frame side data type AVFilmGrainParams for use
with video codecs which support it.
It can save a lot of memory used for duplicate processed reference frames and
reduce copies when applying film grain during presentation.
Fixes: signed integer overflow: 9223372036854770375 + 5450 cannot be represented in type 'long'
Fixes: 26471/clusterfuzz-testcase-minimized-ffmpeg_dem_MXG_fuzzer-6229617557635072
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
No benchmark because this is not used in any speed relevant pathes nor is it
used where __builtin_add_overflow is available.
So I do not know how to realistically benchmark it.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
A common pattern e.g. in libavcodec is replacing/updating buffer
references: unref old one, ref new one. This function allows simplifying
such code and avoiding unnecessary refs+unrefs if the references are
already equivalent.
As it was brought up that the current documentation leaves things
as specific to YCbCr only, ICtCp and RGB are now mentioned.
Additionally, the specifications on which these definitions of
narrow and full range are defined are mentioned.
This way, the documentation of AVColorRange should now match how
most people seem to read interpret it at this point, and thus
flagging RGB AVFrames as full range is valid not only according to
common sense, but also the enum definition.
Fixes: signed integer overflow: 0 - -2147483648 cannot be represented in type 'int'
Fixes: 23646/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_AAC_FIXED_fuzzer-5480991098667008
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
SMPTE 12M timecode can only count frames up to 39, because the tens-of-frames
value is stored in 2 bit. In order to resolve this 50/60 fps SMPTE timecode is
using the field bit (which is the same bit as the phase correction bit) to
signal the least significant bit of a 50/60 fps timecode. See SMPTE ST
12-1:2014 section 12.1.
Therefore we slightly change the format of the return value of
av_timecode_get_smpte_from_framenum and AV_FRAME_DATA_S12M_TIMECODE and start
using the previously unused Phase Correction bit as Field bit. (As the SMPTE
standard suggests)
We add 50/60 fps support to av_timecode_get_smpte_from_framenum by calling the
recently added av_timecode_get_smpte function in it which already handles this
properly.
This change affects the decklink indev and the DV and MXF muxers. MXF has no
fate test for 50/60fps content, DV does, therefore the changes.
MediaInfo (a recent version) confirms that half-frame timecode must be inserted
to DV. MXFInspect confirms valid timecode insertion to the System Item of MXF
files. For MXF, also see EBU R122.
Note that for DV the field flag is not used because in the HDV specs (SMPTE
370M) it is still defined as biphase mark polarity correction flag. So it
should not matter that the DV muxer overrides the field bit.
Signed-off-by: Marton Balint <cus@passwd.hu>
The V4L2 driver does not actually have an associated DRM device at all, so
users work around the requirement by giving libva an unrelated display-only
device instead (which is fine, because it doesn't actually do anything with
that device). This was broken by bc9b6358fb
forcing a render node, because the display-only device did not have an
associated render node to use. Fix that by just passing through the
original non-render DRM fd if we can't find a render node.
Reported-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Tested-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Requires some extraneous top side and bottom front channels to be
defined.
According to STD-B59v2, the defined channel layout is:
- FL
- FR
- FC
- LFE1
- BL
- BR
- FLc
- FRc
- BC
- LFE2
- SiL
- SiR
- TpFL
- TpFR
- TpFC
- TpC
- TpBL
- TpBR
- TpSiL
- TpSiR
- TpBC
- BtFC
- BtFL
- BtFR
The process space is guaranteed to be aligned to the page size, hence we're
never going to map outside of our address space.
There are more optimizations to do with respect to chroma plane alignment and
buffer offsets, but that can be done later.
GCC support these two synthesized instruction, but clang does not yet.
Use machine instruction instead to adapt clang compiler.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
hwcontext_vaapi maps different VA fourcc to the same pix_fmt for U/V
plane swap cases, however duplicate formats are not expected in sw_format
list when merging formats.
For example:
ffmpeg -loglevel debug -init_hw_device vaapi -filter_hw_device vaapi0 \
-f lavfi -i smptebars -vf \
"hwupload=derive_device=vaapi,scale_vaapi,hwdownload,format=yuv420p" \
-vframes 1 -f null -
Without this fix, an auto scaler is required for the above command
Duplicate formats in ff_merge_formats detected
[auto_scaler_0 @ 0x560df58f4550] Setting 'flags' to value 'bicubic'
[auto_scaler_0 @ 0x560df58f4550] w:iw h:ih flags:'bicubic' interl:0
[Parsed_hwupload_0 @ 0x560df58f0ec0] auto-inserting filter
'auto_scaler_0' between the filter 'graph 0 input from stream 0:0' and
the filter 'Parsed_hwupload_0'
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
The size for a previous plane doesn't signal the presence of another after it.
If the plane is present, av_image_fill_plane_sizes() will have returned a size
for it.
Fixes a regression since 3a8e927176.
Reported-by: Imad R. Faiad <irfaiad@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Add MMI & MSA runtime detection for MIPS.
Basically there are two code pathes. For systems that
natively support CPUCFG instruction or kernel emulated
that instruction, we'll sense this feature from HWCAP and
report the flags according to values grab from CPUCFG. For
systems that have no CPUCFG (or not export it in HWCAP),
we'll parse /proc/cpuinfo instead.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
That helper grab from kernel code can allow us to inline
newer instructions (not implemented by the assembler) in
a elegant manner.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This uses av_image_fill_plane_sizes instead of av_image_fill_pointers
when we are getting plane sizes to avoid UB from adding offsets to NULL.
Signed-off-by: Brian Kim <bkkim@google.com>
Signed-off-by: James Almer <jamrial@gmail.com>
This utility helps avoid undefined behavior when doing things like
checking how much memory we need to allocate for an image before we have
allocated a buffer.
Signed-off-by: Brian Kim <bkkim@google.com>
Signed-off-by: James Almer <jamrial@gmail.com>
The number of declared vdpau formats can vary depending on which
version of libvdpau we build against, so the number of pix fmts
can vary too. Let's make sure we keep those numbers in sync.
Some new warnings regarding use of empty macro parameters has
been added, so adjust some x86inc code to silence those.
Fixes part of ticket #8771
Signed-off-by: James Almer <jamrial@gmail.com>
Added VDPAU to list of supported formats for HEVC10 and 12 bit formats
also added 42010 bit to surface_parameters and new VDP chroma formats to
VDPAUPixFmtMaps
Add HEVC 420 10/12 Bit and 444 10/12 Bit support for VDPAU
YUV444P10 is defined as the 444 surface with 10bit valid data in LSBs
but H/w returns Data in MSBs Hence if we map output as YUV444p16 it
is filtering out the LSB to convert to p10 format.
Signed-off-by: Philip Langdale <philipl@overt.org>
Fixes: signed integer overflow: 2147483610 + 52 cannot be represented in type 'int'
Fixes: 23260/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_PBM_fuzzer-5187871274434560
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: left shift of 1913647649 by 1 places cannot be represented in type 'int'
Fixes: 23572/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WMALOSSLESS_fuzzer-5082619795734528
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
These functions have a terrible design, let us fix them before extending
them.
First design mistake: no error code. A helper function for testing
memory allocation failure where AVERROR(ENOMEM) does not appear is
absurd.
Second design mistake: printing a message. Return the error code, let
the caller print the error message.
Third design mistake: hard-coded use of goto.
http://ffmpeg.org/pipermail/ffmpeg-devel/2020-May/262544.html
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>