This commit refactors the power-of-two FFT, making it faster and
halving the size of all tables, making the code much smaller on
all systems.
This removes the big/small pass split, because on modern systems
the "big" pass is always faster, and even on older machines there
is no measurable speed difference.
av_set_cpu_flags_mask() has been deprecated in the commit which merged
it: 6df42f98746be06c883ce683563e07c9a2af983f; av_parse_cpu_flags() has
been deprecated in 4b529edff8.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Some files currently rely on libavutil/cpu.h to include it for them;
yet said file won't use include it any more after the currently
deprecated functions are removed, so include attributes.h directly.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
av_frame_copy() is allowed to return values >= 0 on success, whereas
the documentation of av_frame_ref() states that the return value is 0 on
success. Ergo the latter must not just return the former's return value.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
av_cpu_count() intends to emit a debug message containing the number of
logical cores when called the first time. The check currently works with
a static volatile int; yet this does not help at all in case of
concurrent accesses by multiple threads. So replace this with an
atomic_int.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
av_adler32_update() is used by av_hash_update() which will be switched
to size_t at the next bump. So it also has to be made to use size_t.
This is also necessary for framecrcenc.c, because the size of side data
will become a size_t, too.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
av_bprint_finalize() can still fail even when it has been checked that
the AVBPrint is currently complete: Namely if the string was so short
that it fit into the AVBPrint's internal buffer.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Fixes: Integer overflow and division by 0
Fixes: poc-202102-div.mov
Found-by: 1vanChen of NSFOCUS Security Team
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
clang errors when compiling with C++11 about needing spaces between
literal and identifier
Signed-off-by: Christopher Degawa <ccom@randomderp.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Base escaping only escapes values required for base character data
according to part 2.4 of XML, and if additional flags are added
single and double quotes can additionally be escaped in order
to handle single and double quoted attributes.
Co-authored-by: Jan Ekström <jan.ekstrom@24i.com>
Signed-off-by: Jan Ekström <jan.ekstrom@24i.com>
Fixes: signed integer overflow: -9223372053736 * 1000000 cannot be represented in type 'long'
Fixes: 26910/clusterfuzz-testcase-minimized-ffmpeg_dem_CONCAT_fuzzer-6607924558430208
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
out[lut[i]] = in[i] lookups were 4.04 times(!) slower than
out[i] = in[lut[i]] lookups for an out-of-place FFT of length 4096.
The permutes remain unchanged for anything but out-of-place monolithic
FFT, as those benefit quite a lot from the current order (it means
there's only 1 lookup necessary to add to an offset, rather than
a full gather).
The code was based around non-power-of-two FFTs, so this wasn't
benchmarked early on.
No buffer will be fetched from the pool after it's uninitialized, so there's
no benefit from waiting until every single buffer has been returned to it
before freeing them all.
This should free some memory in certain scenarios, which can be beneficial in
low memory systems.
Based on a patch by Jonas Karlman.
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: James Almer <jamrial@gmail.com>
This is much less precise than the cycle counter register, but
the cycle counter register is not available on apple platforms
(and on linux, it requires a kernel module for allowing user mode
access).
Signed-off-by: Martin Storsjö <martin@martin.st>
This commit adds support for in-place FFT transforms. Since our
internal transforms were all in-place anyway, this only changes
the permutation on the input.
Unfortunately, research papers were of no help here. All focused
on dry hardware implementations, where permutes are free, or on
software implementations where binary bloat is of no concern so
storing dozen times the transforms for each permutation and version
is not considered bad practice.
Still, for a pure C implementation, it's only around 28% slower
than the multi-megabyte FFTW3 in unaligned mode.
Unlike a closed permutation like with PFA, split-radix FFT bit-reversals
contain multiple NOPs, multiple simple swaps, and a few chained swaps,
so regular single-loop single-state permute loops were not possible.
Instead, we filter out parts of the input indices which are redundant.
This allows for a single branch, and with some clever AVX512 asm,
could possibly be SIMD'd without refactoring.
The inplace_idx array is guaranteed to never be larger than the
revtab array, and in practice only requires around log2(len) entries.
The power-of-two MDCTs can be done in-place as well. And it's
possible to eliminate a copy in the compound MDCTs too, however
it'll be slower than doing them out of place, and we'd need to dirty
the input array.
This patch also fixes a -Wtautological-constant-out-of-range-compare
warning from Clang and a -Wtype-limits warning from GCC on systems
where size_t is 64bits and unsigned 32bits. The reason for this seems
to be that variable (whose value derives from sizeof() and can therefore
be known at compile-time) is used instead of using sizeof() directly in
the comparison.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
libavutil/common.h is a public header that provides generic math
functions whereas libavutil/intmath.h is a private header that contains
plattform-specific optimized versions of said math functions. common.h
includes intmath.h (when building the FFmpeg libraries) so that the
optimized versions are used for them.
This interdependency sometimes causes trouble: intmath.h once contained
an inlined ff_sqrt function that relied upon av_log2_16bit. In case there
was no optimized logarithm available on this plattform, intmath.h needed
to include common.h to get the generic implementation and this has been
done after the optimized versions (if any) have been provided so that
common.h used the optimized versions; it also needed to be done before
ff_sqrt. Yet when intmath.h was included from common.h and if an ordinary
inclusion guard was used by common.h, the #include "common.h" in intmath.h
was a no-op and therefore av_log2_16bit was still unknown at the end of
intmath.h (and also in ff_sqrt) if no optimized version was available.
Before a955b59658 this was solved by
duplicating the #ifndef av_log2_16bit check after the inclusion of
common.h in intmath.h; said commit instead moved these checks to the
end of common.h, outside the inclusion guards and made common.h include
itself to get these unguarded defines. This is still the current
state of affairs.
Yet this is unnecessary since 9734b8ba56
as said commit removed ff_sqrt as well as the #include "common.h" from
intmath.h. Therefore this commit moves everything inside the inclusion
guards and makes common.h not include itself.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Fixes: negation of -9223372036854775808 cannot be represented in type 'int64_t' (aka 'long'); cast to an unsigned type to negate this value to itself
Fixes: 29437/clusterfuzz-testcase-minimized-ffmpeg_dem_MOV_fuzzer-4748510022991872
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: division by zero
Fixes: 29555/clusterfuzz-testcase-minimized-ffmpeg_dem_VIVO_fuzzer-5149951447400448
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This reverts commit 6696a07ac6.
It is wrong to restrict timecodes to always contain leading zeros or for hours
or frames to be 2 chars only.
Signed-off-by: Marton Balint <cus@passwd.hu>
fixes http://trac.ffmpeg.org/ticket/9055
The hw decoder may allocate a large frame from AVHWFramesContext, and adjust width and height based on bitstream.
We need to use resolution from src frame instead of AVHWFramesContext.
test command:
ffmpeg -loglevel debug -hide_banner -hwaccel vaapi -init_hw_device vaapi=va:/dev/dri/renderD128 -hwaccel_device va -hwaccel_output_format vaapi -init_hw_device vulkan=vulk -filter_hw_device vulk -i 1920x1080.264 -c:v libx264 -r:v 30 -profile:v high -preset veryfast -vf "hwmap,chromaber_vulkan=0:0,hwdownload,format=nv12" -map 0 -y vaapiouts.mkv
expected:
No green bar at bottom.
Fixes: signed integer overflow: 2147462079 + 2149596 cannot be represented in type 'int'
Fixes: 27565/clusterfuzz-testcase-minimized-ffmpeg_DEMUXER_fuzzer-5091972813160448
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This patch adds support for arbitrary-point FFTs and all even MDCT
transforms.
Odd MDCTs are not supported yet as they're based on the DCT-II and DCT-III
and they're very niche.
With this we can now write tests.
Fixes: division by zero
Fixes: 26451/clusterfuzz-testcase-minimized-ffmpeg_dem_VIVO_fuzzer-4756955832516608
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
INT32_MAX (2147483647) isn't exactly representable by a floating point
value, with the closest being 2147483648.0. So when rescaling a value
of 1.0, this could overflow when casting the 64-bit value returned from
lrintf() into 32 bits.
Unfortunately the properties of integer overflows don't match up well
with how a Fourier Transform operates. So clip the value before
casting to a 32-bit int.
Should be noted we don't have overflows with the table values we're
currently using. However, converting a Kaiser-Bessel window function
with a length of 256 and a parameter of 5.0 to fixed point did create
overflows. So this is more of insurance to save debugging time
in case something changes in the future.
The macro is only used during init, so it being a little slower is
not a problem.
Do it only when requested with the AV_CODEC_EXPORT_DATA_VIDEO_ENC_PARAMS
flag.
Drop previous code using the long-deprecated AV_FRAME_DATA_QP_TABLE*
API. Temporarily disable fate-filter-pp, fate-filter-pp7,
fate-filter-spp. They will be reenabled once these filters are converted
in following commits.
This was introduced in version 4.6. And may not exist all without an
optional package. So to prevent a hard dependency on needing the Linux
kernel headers to compile, make this optional.
Also ignore the status of the ioctl, since it may fail on older kernels
which don't support it. It's okay to ignore as its not fatal and any
serious errors will be caught later by the mmap call.
When targeting a recent enough macOS/iOS version that has clock_gettime
it won't be a weak symbol, in which case clang warns for this check
as it's always true:
warning: address of function 'clock_gettime' will always
evaluate to 'true'
This warning is silenced by using the address-of operator to make
the intent explicit.
These two extensions and two features are both optionally used by
libplacebo to speed up rendering, so it makes sense for libavutil to
automatically enable them as well.
Vulkan formats with a PACK suffix define native endianess.
Vulkan formats without a PACK suffix are in bytestream order.
Pixel formats with a LE/BE suffix define endianess.
Pixel formats without LE/BE suffix are in bytestream order.
This relies on the fact that host memory is always going to be required
to be aligned to the platform's page size, which means we can adjust
the pointers when we map them to buffers and therefore skip an entire
copy. This has already had extensive testing in libplacebo without
problems, so its safe to use here as well.
Speeds up downloads and uploads on platforms which do not pool their
memory hugely, but less so on platforms that do.
We can pool the buffers ourselves, but that can come as a later patch
if necessary.
This patch introduces a new frame side data type AVFilmGrainParams for use
with video codecs which support it.
It can save a lot of memory used for duplicate processed reference frames and
reduce copies when applying film grain during presentation.
Fixes: signed integer overflow: 9223372036854770375 + 5450 cannot be represented in type 'long'
Fixes: 26471/clusterfuzz-testcase-minimized-ffmpeg_dem_MXG_fuzzer-6229617557635072
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
No benchmark because this is not used in any speed relevant pathes nor is it
used where __builtin_add_overflow is available.
So I do not know how to realistically benchmark it.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
A common pattern e.g. in libavcodec is replacing/updating buffer
references: unref old one, ref new one. This function allows simplifying
such code and avoiding unnecessary refs+unrefs if the references are
already equivalent.
As it was brought up that the current documentation leaves things
as specific to YCbCr only, ICtCp and RGB are now mentioned.
Additionally, the specifications on which these definitions of
narrow and full range are defined are mentioned.
This way, the documentation of AVColorRange should now match how
most people seem to read interpret it at this point, and thus
flagging RGB AVFrames as full range is valid not only according to
common sense, but also the enum definition.
Fixes: signed integer overflow: 0 - -2147483648 cannot be represented in type 'int'
Fixes: 23646/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_AAC_FIXED_fuzzer-5480991098667008
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
SMPTE 12M timecode can only count frames up to 39, because the tens-of-frames
value is stored in 2 bit. In order to resolve this 50/60 fps SMPTE timecode is
using the field bit (which is the same bit as the phase correction bit) to
signal the least significant bit of a 50/60 fps timecode. See SMPTE ST
12-1:2014 section 12.1.
Therefore we slightly change the format of the return value of
av_timecode_get_smpte_from_framenum and AV_FRAME_DATA_S12M_TIMECODE and start
using the previously unused Phase Correction bit as Field bit. (As the SMPTE
standard suggests)
We add 50/60 fps support to av_timecode_get_smpte_from_framenum by calling the
recently added av_timecode_get_smpte function in it which already handles this
properly.
This change affects the decklink indev and the DV and MXF muxers. MXF has no
fate test for 50/60fps content, DV does, therefore the changes.
MediaInfo (a recent version) confirms that half-frame timecode must be inserted
to DV. MXFInspect confirms valid timecode insertion to the System Item of MXF
files. For MXF, also see EBU R122.
Note that for DV the field flag is not used because in the HDV specs (SMPTE
370M) it is still defined as biphase mark polarity correction flag. So it
should not matter that the DV muxer overrides the field bit.
Signed-off-by: Marton Balint <cus@passwd.hu>
The V4L2 driver does not actually have an associated DRM device at all, so
users work around the requirement by giving libva an unrelated display-only
device instead (which is fine, because it doesn't actually do anything with
that device). This was broken by bc9b6358fb
forcing a render node, because the display-only device did not have an
associated render node to use. Fix that by just passing through the
original non-render DRM fd if we can't find a render node.
Reported-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Tested-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
Requires some extraneous top side and bottom front channels to be
defined.
According to STD-B59v2, the defined channel layout is:
- FL
- FR
- FC
- LFE1
- BL
- BR
- FLc
- FRc
- BC
- LFE2
- SiL
- SiR
- TpFL
- TpFR
- TpFC
- TpC
- TpBL
- TpBR
- TpSiL
- TpSiR
- TpBC
- BtFC
- BtFL
- BtFR
The process space is guaranteed to be aligned to the page size, hence we're
never going to map outside of our address space.
There are more optimizations to do with respect to chroma plane alignment and
buffer offsets, but that can be done later.
GCC support these two synthesized instruction, but clang does not yet.
Use machine instruction instead to adapt clang compiler.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
hwcontext_vaapi maps different VA fourcc to the same pix_fmt for U/V
plane swap cases, however duplicate formats are not expected in sw_format
list when merging formats.
For example:
ffmpeg -loglevel debug -init_hw_device vaapi -filter_hw_device vaapi0 \
-f lavfi -i smptebars -vf \
"hwupload=derive_device=vaapi,scale_vaapi,hwdownload,format=yuv420p" \
-vframes 1 -f null -
Without this fix, an auto scaler is required for the above command
Duplicate formats in ff_merge_formats detected
[auto_scaler_0 @ 0x560df58f4550] Setting 'flags' to value 'bicubic'
[auto_scaler_0 @ 0x560df58f4550] w:iw h:ih flags:'bicubic' interl:0
[Parsed_hwupload_0 @ 0x560df58f0ec0] auto-inserting filter
'auto_scaler_0' between the filter 'graph 0 input from stream 0:0' and
the filter 'Parsed_hwupload_0'
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
The size for a previous plane doesn't signal the presence of another after it.
If the plane is present, av_image_fill_plane_sizes() will have returned a size
for it.
Fixes a regression since 3a8e927176.
Reported-by: Imad R. Faiad <irfaiad@gmail.com>
Signed-off-by: James Almer <jamrial@gmail.com>
Add MMI & MSA runtime detection for MIPS.
Basically there are two code pathes. For systems that
natively support CPUCFG instruction or kernel emulated
that instruction, we'll sense this feature from HWCAP and
report the flags according to values grab from CPUCFG. For
systems that have no CPUCFG (or not export it in HWCAP),
we'll parse /proc/cpuinfo instead.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
That helper grab from kernel code can allow us to inline
newer instructions (not implemented by the assembler) in
a elegant manner.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This uses av_image_fill_plane_sizes instead of av_image_fill_pointers
when we are getting plane sizes to avoid UB from adding offsets to NULL.
Signed-off-by: Brian Kim <bkkim@google.com>
Signed-off-by: James Almer <jamrial@gmail.com>
This utility helps avoid undefined behavior when doing things like
checking how much memory we need to allocate for an image before we have
allocated a buffer.
Signed-off-by: Brian Kim <bkkim@google.com>
Signed-off-by: James Almer <jamrial@gmail.com>
The number of declared vdpau formats can vary depending on which
version of libvdpau we build against, so the number of pix fmts
can vary too. Let's make sure we keep those numbers in sync.
Some new warnings regarding use of empty macro parameters has
been added, so adjust some x86inc code to silence those.
Fixes part of ticket #8771
Signed-off-by: James Almer <jamrial@gmail.com>
Added VDPAU to list of supported formats for HEVC10 and 12 bit formats
also added 42010 bit to surface_parameters and new VDP chroma formats to
VDPAUPixFmtMaps
Add HEVC 420 10/12 Bit and 444 10/12 Bit support for VDPAU
YUV444P10 is defined as the 444 surface with 10bit valid data in LSBs
but H/w returns Data in MSBs Hence if we map output as YUV444p16 it
is filtering out the LSB to convert to p10 format.
Signed-off-by: Philip Langdale <philipl@overt.org>
Fixes: signed integer overflow: 2147483610 + 52 cannot be represented in type 'int'
Fixes: 23260/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_PBM_fuzzer-5187871274434560
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: left shift of 1913647649 by 1 places cannot be represented in type 'int'
Fixes: 23572/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WMALOSSLESS_fuzzer-5082619795734528
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
These functions have a terrible design, let us fix them before extending
them.
First design mistake: no error code. A helper function for testing
memory allocation failure where AVERROR(ENOMEM) does not appear is
absurd.
Second design mistake: printing a message. Return the error code, let
the caller print the error message.
Third design mistake: hard-coded use of goto.
http://ffmpeg.org/pipermail/ffmpeg-devel/2020-May/262544.html
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
Use opaque iteration state instead of the previous child class. This
mirrors similar changes done in lavf/lavc.
Deprecate the av_opt_child_class_next() API.
make checkheaders will get error as follow:
CC libavutil/hwcontext_vulkan.h.o
In file included from libavutil/hwcontext_vulkan.h.c:1:
./libavutil/hwcontext_vulkan.h:130:23: error: ‘AV_NUM_DATA_POINTERS’ undeclared here (not in a function)
130 | void *alloc_pnext[AV_NUM_DATA_POINTERS];
| ^~~~~~~~~~~~~~~~~~~~
./libavutil/hwcontext_vulkan.h:199:43: warning: ‘enum AVPixelFormat’ declared inside parameter list will not be visible outside of this definition or declaration
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
We want to copy the lowest amount of bytes per line, but while the buffer
stride is sanitized, the src/dst stride can be negative, and negative numbers
of bytes do not make a lot of sense.
The size of a single allocation performed by av_malloc() or av_realloc()
is supposed to be bounded by max_alloc_size, which defaults to INT_MAX
and can be set by the user; yet currently this is not completely
honoured: The actual value used is max_alloc_size - 32. How this came
to be can only be understood historically:
a) 0ecca7a49f disallowed allocations
> INT_MAX. At that time the size parameter of av_malloc() was an
unsigned and the commentary added ("lets disallow possible ambiguous
cases") indicates that this was done as a precaution against calling the
functions with negative int values. Genuinely limiting the size of
allocations to INT_MAX doesn't seem to have been the intention given
that at this time the memalign hack introduced in commit
da9b170c6f (which when enabled increased
the size of allocations slightly so that one can return a correctly
aligned pointer that actually does not point to the beginning of the
allocated buffer) was already present.
b) Said memalign hack allocated 17 bytes more than actually desired, yet
allocating 16 bytes more is actually enough and so this was changed in
a9493601638b048c44751956d2360f215918800c; this commit also replaced
INT_MAX by INT_MAX - 16 (and made the limit therefore a limit on the size
of the allocated buffer), but kept the comment, although there is nothing
ambiguous about allocating (INT_MAX - 16)..INT_MAX.
c) 13dfce3d44 then increased 16 to 32 for
AVX, 6b4c0be558 replaced INT_MAX by
MAX_MALLOC_SIZE (which was of course defined to be INT_MAX) and
5a8e994287 added max_alloc_size and made
it user-selectable.
d) 4fb311c804 then dropped the memalign
hack, yet it kept the -32 (probably because the comment about ambiguous
cases was still present?), although it is no longer needed at all after
this commit. Therefore this commit removes it and uses max_alloc_size
directly.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
This was never actually used, likely due to confusion, as the device context
also had one used for uploads and downloads.
Also, since we're only using it for very quick image barriers (which are
practically free on all hardware), use the compute queue instead of the
transfer queue.
This commit makes full use of the enabled queues to provide asynchronous
uploads of images (downloads remain synchronous).
For a pure uploading use cases, the performance gains can be significant.
With this, the puzzle of making libplacebo, ffmpeg and any other Vulkan
API users interoperable is complete.
Users of both libraries can initialize one another's contexts without having
to create a new one.
This allows for users who derive devices to set options for the
new device context they derive.
The main use case of this is to allow users to enable extensions
(such as surface drawing extensions) in Vulkan while deriving from
the device their frames are on. That way, users don't need to write
any initialization code themselves, since the Vulkan spec invalidates
mixing instances, physical devices and active devices.
Apart from Vulkan, other hwcontexts ignore the opts argument since they
don't support options at all (or in VAAPI and OpenCL's case, options are
currently only used for device selection, which device_derive overrides).
This will be used for AVCodecContext->profile. By specifying constants in the
encoders we won't have to use the common AVCodecContext options table and
different encoders can use the same profile name even with different values.
Signed-off-by: Marton Balint <cus@passwd.hu>
Enables HEVC Range Extension decoding support (Linux) for 4:2:2 8/10 bit
on ICL+ (gen11 +) platform.
Restricted to linux only for now.
Signed-off-by: Linjie Fu <linjie.fu@intel.com>
Many places are using their own custom code for handling overflow
around timestamps or other int64_t values. There are enough of these
now that having some common saturated math functions seems sound.
Signed-off-by: Dale Curtis <dalecurtis@chromium.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
On windows and darwin (and modern android), the x18 register is reserved
and shouldn't be modified by user code, while it is freely available on
linux. Strictly avoid it, to keep the assembly code portable.
This would have helped catch the issue fixed in 872790b1f9
immediately.
Signed-off-by: Martin Storsjö <martin@martin.st>
Only warn instead. API users can find out which extensions were unavailable
by using the enabled_inst_extensions and enabled_dev_extensions fields.
This eliminates having to trial-and-error to find which extensions were missing.
Due to our AVHWDevice infrastructure, where API users are offered a way
to derive contexts rather than always create new one, our filterchains,
being supported by a single hardware device context, can grow to considerable
size.
Hence, in such situations, using the maximum amount of queues the device offers
can be benefitial to eliminating bottlenecks where queue submissions on the
same family have to wait for the previous one to finish.
This is intended to replace the deprecated the AV_FRAME_DATA_QP_TABLE*
API and extend it to a wider range of codecs.
In the future, it may also be extended to support other encoding
parameters such as motion vectors.
Additional changes by Anton Khirnov <anton@khirnov.net> with suggestions
by Lynne <dev@lynne.ee>.
Signed-off-by: Juan De León <juandl@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This reverts commit 97b526c192.
It broke the API, and assumed no other APIs used multiple semaphores.
This also disallowed certain optimizations to happen.
Dealing with APIs that give or expect single semaphores is easier when
we use per-image semaphores.
The specs note that images should be in the GENERAL layout when exporting
for maximum compatibility.
CUDA exported images are handled differently, and the queue is the same,
so we don't need to do that there.
As it turns out, we were already assuming and treating all images as if they had
concurrent access mode. This just changes the flag to CONCURRENT, which has less
restrictions than EXCLUSIVE, and fixed validation messages on machines with
multiple queues.
The validation layer didn't pick this up because the machine I was testing on
had only a single queue.
This solves a huge oversight - it lets users reliably use their own
AVVulkanDeviceContext. Otherwise, the extensions supplied and enabled
are not discoverable by anything outside of hwcontext_vulkan.
Also clarifies that any user-supplied VkInstance must be at least 1.1.
Also documents all options supported by the hwdevice.
This lets users enable all extensions they need without writing their own
instance initialization code.
Fixes problems when non-rational options were set using rational expressions,
causing rounding errors and the option range limits not to be enforced
properly.
ffmpeg -f lavfi -i "sine=r=96000/2"
This caused an assertion failure with assert level 2.
Signed-off-by: Marton Balint <cus@passwd.hu>
bump minor version for DOVI sidedata, because added the dovi_meta.h
as lavu API part. Also update APIchanges.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
We derive the destination buffer stride from the input stride,
which meant if the image was flipped with a negative stride,
we'd be FFALIGNING a negative number which ends up being huge,
thus making the Vulkan buffer allocation fail and the whole
image transfer fail.
Only found out about this as OpenGL compositors can copy an entire
image with a single call if its flipped, rather than iterate over
each line.
Do not limit the array allocation functions and av_calloc() to allocations
of INT_MAX, instead depend on max_alloc_size like av_malloc().
Allows a workaround for ticket #7140.
The idea was to allow separate planes to be filtered independently, however,
in hindsight, literaly nothing uses separate per-plane semaphores and it
would only work when each plane is backed by separate device memory.
If one calls av_opt_set() with an incorrect string to set the value of
an option of type AV_OPT_TYPE_VIDEO_RATE, the given string is used in a
log message via %s. This also happens when the string is actually a
nullpointer in which case using it for %s is forbidden.
This commit changes this by erroring out early in case of a nullpointer.
This also fixes a warning from GCC 9.2:
"‘%s’ directive argument is null [-Wformat-overflow=]"
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
By itself, this allows 6-point, 10-point and 30-point transforms.
When the 9-point transform is added it allows for 18-point FFT,
and also for a 36-point MDCT (used by MP3).
This matches the inclusion of the other hwcontext_<hwaccel>.h headers.
The skipping of the header depending on build flags is already present.
Signed-off-by: Daniel Playfair Cal: <daniel.playfair.cal@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The specifications are very vague about who has ownership, and in this case,
Vulkan takes ownership of all DMABUF FDs passed to it, causing errors
to occur if someone gave us images for mapping which were meant to be kept.
The old behavior worked with one-way VAAPI and DMABUF imports, but was broken
with clients like wlroots' dmabuf-capture.
There was a recent change in Intel's driver that triggered a driver-internal
error if the semaphore given to the command buffer wasn't initialized.
Given that the specifications require the semaphore to be initialized,
this is within spec. Unlike what's causing it in the first place, which is
that there are no ways to extract/import dma sync objects from DMABUFs,
so we must leave our semaphores bare.
If we are given a non-render node, try to find the matching render node and
fail if that isn't possible.
libva will not accept a non-render device which is not DRM master, because
it requires legacy DRM authentication to succeed in that case:
<https://github.com/intel/libva/blob/master/va/drm/va_drm.c#L68-L75>. This
is annoying for kmsgrab because in most recording situations DRM master is
already held by something else (such as a windowing system), leading to
device derivation not working and forcing the user to create the target
VAAPI device separately.
VA_RT_FORMAT describes the desired sampling format for surface.
When creating surface, VA_RT_FORMAT will be used firstly to choose
the expected fourcc/media_format for the surface. And the fourcc
will be revised by the value of VASurfaceAttribPixelFormat.
Add vaapi_format_map support for new pixel_format Y210.
This is fundamental for both VA-API and QSV.
Signed-off-by: Linjie Fu <linjie.fu@intel.com>
Required minimal changes to the code so made sense to implement.
FFT and MDCT tested, the output of both was properly rounded.
Fun fact: the non-power-of-two fixed-point FFT and MDCT are the fastest ever
non-power-of-two fixed-point FFT and MDCT written.
This can replace the power of two integer MDCTs in aac and ac3 if the
MIPS optimizations are ported across.
Unfortunately the ac3 encoder uses a 16-bit fixed point forward transform,
unlike the encoder which uses a 32bit inverse transform, so some modifications
might be required there.
The 3-point FFT is somewhat less accurate than it otherwise could be,
having minor rounding errors with bigger transforms. However, this
could be improved later, and the way its currently written is the way one
would write assembly for it.
Similar rounding errors can also be found throughout the power of two FFTs
as well, though those are more difficult to correct.
Despite this, the integer transforms are more than accurate enough.
Compared to ad-hoc if(printed) ... code this allows the user to disable
it by adjusting the log level
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
To make behavior the same as non-win32 code when the standard error is
redirected. Also restructure the code a bit.
Signed-off-by: Marton Balint <cus@passwd.hu>
Not even FFTW's output is normalized.
This should prevent at least some users from complaining that doing a forward
transform followed by an inverse transform has a mismatching output to the
original input.
This commit adds the necessary code to initialize and use a Vulkan device
within the hwcontext libavutil framework.
Currently direct mapping to VAAPI and DRM frames is functional, and
transfers to CUDA and native frames are supported.
Lets hope the future Vulkan video decode extension fits well within this
framework.
We are beginning to consider scenarios where a given HW Context
may be able to transfer frames to another HW Context without
passing via system memory - this would usually be when two
contexts represent different APIs on the same device (eg: Vulkan
and CUDA).
This is modelled as a transfer, as we have today, but where both
the src and the dst are hardware frames with hw contexts. We need
to be careful to ensure the contexts are compatible - particularly,
we cannot do transfers where one of the frames has been mapped via
a derived frames context - we can only do transfers for frames that
were directly allocated by the specified context.
Additionally, as we have two hardware contexts, the transfer function
could be implemented by either (or indeed both). To handle this
uncertainty, we explicitly look for ENOSYS as an indicator to try
the transfer in the other direction before giving up.
When clang works in MSVC mode, it does have the _byteswap_ulong
builtin, but one has to include stdlib.h before using it.
Signed-off-by: Martin Storsjö <martin@martin.st>
SetConsoleTextAttribute used to be unavailable for Windows Store apps,
but is available to them now. But GetStdHandle still is unavailable,
thus make sure to check for both functions before using code that
requires both.
Signed-off-by: Martin Storsjö <martin@martin.st>
In case of failure, all the successfully set entries are stored in
*pm. We need to manually free the created dictionary to avoid
memory leak.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
In order to access the original opaque parameter of a buffer in the buffer
pool. (The buffer pool implementation overrides the normal opaque parameter but
also saves it so it is accessible).
v2: add assertion check before dereferencing the BufferPoolEntry.
Signed-off-by: Marton Balint <cus@passwd.hu>
Fixes: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
Fixes: 18333/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_COMFORTNOISE_fuzzer-5668481831272448
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>