make checkheaders will get error as follow:
CC libavutil/hwcontext_vulkan.h.o
In file included from libavutil/hwcontext_vulkan.h.c:1:
./libavutil/hwcontext_vulkan.h:130:23: error: ‘AV_NUM_DATA_POINTERS’ undeclared here (not in a function)
130 | void *alloc_pnext[AV_NUM_DATA_POINTERS];
| ^~~~~~~~~~~~~~~~~~~~
./libavutil/hwcontext_vulkan.h:199:43: warning: ‘enum AVPixelFormat’ declared inside parameter list will not be visible outside of this definition or declaration
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
We want to copy the lowest amount of bytes per line, but while the buffer
stride is sanitized, the src/dst stride can be negative, and negative numbers
of bytes do not make a lot of sense.
The size of a single allocation performed by av_malloc() or av_realloc()
is supposed to be bounded by max_alloc_size, which defaults to INT_MAX
and can be set by the user; yet currently this is not completely
honoured: The actual value used is max_alloc_size - 32. How this came
to be can only be understood historically:
a) 0ecca7a49f disallowed allocations
> INT_MAX. At that time the size parameter of av_malloc() was an
unsigned and the commentary added ("lets disallow possible ambiguous
cases") indicates that this was done as a precaution against calling the
functions with negative int values. Genuinely limiting the size of
allocations to INT_MAX doesn't seem to have been the intention given
that at this time the memalign hack introduced in commit
da9b170c6f (which when enabled increased
the size of allocations slightly so that one can return a correctly
aligned pointer that actually does not point to the beginning of the
allocated buffer) was already present.
b) Said memalign hack allocated 17 bytes more than actually desired, yet
allocating 16 bytes more is actually enough and so this was changed in
a9493601638b048c44751956d2360f215918800c; this commit also replaced
INT_MAX by INT_MAX - 16 (and made the limit therefore a limit on the size
of the allocated buffer), but kept the comment, although there is nothing
ambiguous about allocating (INT_MAX - 16)..INT_MAX.
c) 13dfce3d44 then increased 16 to 32 for
AVX, 6b4c0be558 replaced INT_MAX by
MAX_MALLOC_SIZE (which was of course defined to be INT_MAX) and
5a8e994287 added max_alloc_size and made
it user-selectable.
d) 4fb311c804 then dropped the memalign
hack, yet it kept the -32 (probably because the comment about ambiguous
cases was still present?), although it is no longer needed at all after
this commit. Therefore this commit removes it and uses max_alloc_size
directly.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
This was never actually used, likely due to confusion, as the device context
also had one used for uploads and downloads.
Also, since we're only using it for very quick image barriers (which are
practically free on all hardware), use the compute queue instead of the
transfer queue.
This commit makes full use of the enabled queues to provide asynchronous
uploads of images (downloads remain synchronous).
For a pure uploading use cases, the performance gains can be significant.
With this, the puzzle of making libplacebo, ffmpeg and any other Vulkan
API users interoperable is complete.
Users of both libraries can initialize one another's contexts without having
to create a new one.
This allows for users who derive devices to set options for the
new device context they derive.
The main use case of this is to allow users to enable extensions
(such as surface drawing extensions) in Vulkan while deriving from
the device their frames are on. That way, users don't need to write
any initialization code themselves, since the Vulkan spec invalidates
mixing instances, physical devices and active devices.
Apart from Vulkan, other hwcontexts ignore the opts argument since they
don't support options at all (or in VAAPI and OpenCL's case, options are
currently only used for device selection, which device_derive overrides).
This will be used for AVCodecContext->profile. By specifying constants in the
encoders we won't have to use the common AVCodecContext options table and
different encoders can use the same profile name even with different values.
Signed-off-by: Marton Balint <cus@passwd.hu>
Enables HEVC Range Extension decoding support (Linux) for 4:2:2 8/10 bit
on ICL+ (gen11 +) platform.
Restricted to linux only for now.
Signed-off-by: Linjie Fu <linjie.fu@intel.com>
Many places are using their own custom code for handling overflow
around timestamps or other int64_t values. There are enough of these
now that having some common saturated math functions seems sound.
Signed-off-by: Dale Curtis <dalecurtis@chromium.org>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
On windows and darwin (and modern android), the x18 register is reserved
and shouldn't be modified by user code, while it is freely available on
linux. Strictly avoid it, to keep the assembly code portable.
This would have helped catch the issue fixed in 872790b1f9
immediately.
Signed-off-by: Martin Storsjö <martin@martin.st>
Only warn instead. API users can find out which extensions were unavailable
by using the enabled_inst_extensions and enabled_dev_extensions fields.
This eliminates having to trial-and-error to find which extensions were missing.
Due to our AVHWDevice infrastructure, where API users are offered a way
to derive contexts rather than always create new one, our filterchains,
being supported by a single hardware device context, can grow to considerable
size.
Hence, in such situations, using the maximum amount of queues the device offers
can be benefitial to eliminating bottlenecks where queue submissions on the
same family have to wait for the previous one to finish.
This is intended to replace the deprecated the AV_FRAME_DATA_QP_TABLE*
API and extend it to a wider range of codecs.
In the future, it may also be extended to support other encoding
parameters such as motion vectors.
Additional changes by Anton Khirnov <anton@khirnov.net> with suggestions
by Lynne <dev@lynne.ee>.
Signed-off-by: Juan De León <juandl@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
This reverts commit 97b526c192.
It broke the API, and assumed no other APIs used multiple semaphores.
This also disallowed certain optimizations to happen.
Dealing with APIs that give or expect single semaphores is easier when
we use per-image semaphores.
The specs note that images should be in the GENERAL layout when exporting
for maximum compatibility.
CUDA exported images are handled differently, and the queue is the same,
so we don't need to do that there.
As it turns out, we were already assuming and treating all images as if they had
concurrent access mode. This just changes the flag to CONCURRENT, which has less
restrictions than EXCLUSIVE, and fixed validation messages on machines with
multiple queues.
The validation layer didn't pick this up because the machine I was testing on
had only a single queue.
This solves a huge oversight - it lets users reliably use their own
AVVulkanDeviceContext. Otherwise, the extensions supplied and enabled
are not discoverable by anything outside of hwcontext_vulkan.
Also clarifies that any user-supplied VkInstance must be at least 1.1.
Also documents all options supported by the hwdevice.
This lets users enable all extensions they need without writing their own
instance initialization code.
Fixes problems when non-rational options were set using rational expressions,
causing rounding errors and the option range limits not to be enforced
properly.
ffmpeg -f lavfi -i "sine=r=96000/2"
This caused an assertion failure with assert level 2.
Signed-off-by: Marton Balint <cus@passwd.hu>
bump minor version for DOVI sidedata, because added the dovi_meta.h
as lavu API part. Also update APIchanges.
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
We derive the destination buffer stride from the input stride,
which meant if the image was flipped with a negative stride,
we'd be FFALIGNING a negative number which ends up being huge,
thus making the Vulkan buffer allocation fail and the whole
image transfer fail.
Only found out about this as OpenGL compositors can copy an entire
image with a single call if its flipped, rather than iterate over
each line.
Do not limit the array allocation functions and av_calloc() to allocations
of INT_MAX, instead depend on max_alloc_size like av_malloc().
Allows a workaround for ticket #7140.
The idea was to allow separate planes to be filtered independently, however,
in hindsight, literaly nothing uses separate per-plane semaphores and it
would only work when each plane is backed by separate device memory.
If one calls av_opt_set() with an incorrect string to set the value of
an option of type AV_OPT_TYPE_VIDEO_RATE, the given string is used in a
log message via %s. This also happens when the string is actually a
nullpointer in which case using it for %s is forbidden.
This commit changes this by erroring out early in case of a nullpointer.
This also fixes a warning from GCC 9.2:
"‘%s’ directive argument is null [-Wformat-overflow=]"
Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
By itself, this allows 6-point, 10-point and 30-point transforms.
When the 9-point transform is added it allows for 18-point FFT,
and also for a 36-point MDCT (used by MP3).
This matches the inclusion of the other hwcontext_<hwaccel>.h headers.
The skipping of the header depending on build flags is already present.
Signed-off-by: Daniel Playfair Cal: <daniel.playfair.cal@gmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The specifications are very vague about who has ownership, and in this case,
Vulkan takes ownership of all DMABUF FDs passed to it, causing errors
to occur if someone gave us images for mapping which were meant to be kept.
The old behavior worked with one-way VAAPI and DMABUF imports, but was broken
with clients like wlroots' dmabuf-capture.
There was a recent change in Intel's driver that triggered a driver-internal
error if the semaphore given to the command buffer wasn't initialized.
Given that the specifications require the semaphore to be initialized,
this is within spec. Unlike what's causing it in the first place, which is
that there are no ways to extract/import dma sync objects from DMABUFs,
so we must leave our semaphores bare.
If we are given a non-render node, try to find the matching render node and
fail if that isn't possible.
libva will not accept a non-render device which is not DRM master, because
it requires legacy DRM authentication to succeed in that case:
<https://github.com/intel/libva/blob/master/va/drm/va_drm.c#L68-L75>. This
is annoying for kmsgrab because in most recording situations DRM master is
already held by something else (such as a windowing system), leading to
device derivation not working and forcing the user to create the target
VAAPI device separately.
VA_RT_FORMAT describes the desired sampling format for surface.
When creating surface, VA_RT_FORMAT will be used firstly to choose
the expected fourcc/media_format for the surface. And the fourcc
will be revised by the value of VASurfaceAttribPixelFormat.
Add vaapi_format_map support for new pixel_format Y210.
This is fundamental for both VA-API and QSV.
Signed-off-by: Linjie Fu <linjie.fu@intel.com>
Required minimal changes to the code so made sense to implement.
FFT and MDCT tested, the output of both was properly rounded.
Fun fact: the non-power-of-two fixed-point FFT and MDCT are the fastest ever
non-power-of-two fixed-point FFT and MDCT written.
This can replace the power of two integer MDCTs in aac and ac3 if the
MIPS optimizations are ported across.
Unfortunately the ac3 encoder uses a 16-bit fixed point forward transform,
unlike the encoder which uses a 32bit inverse transform, so some modifications
might be required there.
The 3-point FFT is somewhat less accurate than it otherwise could be,
having minor rounding errors with bigger transforms. However, this
could be improved later, and the way its currently written is the way one
would write assembly for it.
Similar rounding errors can also be found throughout the power of two FFTs
as well, though those are more difficult to correct.
Despite this, the integer transforms are more than accurate enough.
Compared to ad-hoc if(printed) ... code this allows the user to disable
it by adjusting the log level
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
To make behavior the same as non-win32 code when the standard error is
redirected. Also restructure the code a bit.
Signed-off-by: Marton Balint <cus@passwd.hu>
Not even FFTW's output is normalized.
This should prevent at least some users from complaining that doing a forward
transform followed by an inverse transform has a mismatching output to the
original input.
This commit adds the necessary code to initialize and use a Vulkan device
within the hwcontext libavutil framework.
Currently direct mapping to VAAPI and DRM frames is functional, and
transfers to CUDA and native frames are supported.
Lets hope the future Vulkan video decode extension fits well within this
framework.
We are beginning to consider scenarios where a given HW Context
may be able to transfer frames to another HW Context without
passing via system memory - this would usually be when two
contexts represent different APIs on the same device (eg: Vulkan
and CUDA).
This is modelled as a transfer, as we have today, but where both
the src and the dst are hardware frames with hw contexts. We need
to be careful to ensure the contexts are compatible - particularly,
we cannot do transfers where one of the frames has been mapped via
a derived frames context - we can only do transfers for frames that
were directly allocated by the specified context.
Additionally, as we have two hardware contexts, the transfer function
could be implemented by either (or indeed both). To handle this
uncertainty, we explicitly look for ENOSYS as an indicator to try
the transfer in the other direction before giving up.
When clang works in MSVC mode, it does have the _byteswap_ulong
builtin, but one has to include stdlib.h before using it.
Signed-off-by: Martin Storsjö <martin@martin.st>
SetConsoleTextAttribute used to be unavailable for Windows Store apps,
but is available to them now. But GetStdHandle still is unavailable,
thus make sure to check for both functions before using code that
requires both.
Signed-off-by: Martin Storsjö <martin@martin.st>
In case of failure, all the successfully set entries are stored in
*pm. We need to manually free the created dictionary to avoid
memory leak.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
In order to access the original opaque parameter of a buffer in the buffer
pool. (The buffer pool implementation overrides the normal opaque parameter but
also saves it so it is accessible).
v2: add assertion check before dereferencing the BufferPoolEntry.
Signed-off-by: Marton Balint <cus@passwd.hu>
Fixes: signed integer overflow: 2147483647 + 1 cannot be represented in type 'int'
Fixes: 18333/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_COMFORTNOISE_fuzzer-5668481831272448
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
1)Some filters allow cross-referenced expressions e.g. x=y+10. In
such cases, filters evaluate expressions multiple times for
successful evaluation of all expressions. If the expression for one or
more variables contains a RNG, the result may vary across evaluation
leading to inconsistent values across the cross-referenced expressions.
2)A related case is circular expressions e.g. x=y+10 and y=x+10 which
cannot be succesfully resolved.
3)Certain filter variables may only be applicable in specific eval modes
and lead to a failure of evaluation in other modes e.g. pts is only
relevant for frame eval mode.
At present, there is no reliable means to identify these occurrences and
thus the error messages provided are broad or inaccurate. The helper
function introduced - av_expr_count_vars - allows developers to identify
the use and count of variables in expressions and thus tailor the error
message, allow for a graceful fallback and/or decide evaluation order.
remove_side_data is supposed to remove a single instance by design.
Since new_side_data() doesn't forbid add multiple instances of the
same type, remove_side_data should deal with that.
Signed-off-by: Marton Balint <cus@passwd.hu>
Performance of WMV3 decoding has speed up from 3.66x to 5.23x tested on 3A4000.
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Linux and OSX systems support basename and dirname via <libgen.h>, I plan to
make the wrapper interface conform to the standard interface first.
If it is feasible, I will continue to modify it to call the system interface
if there is already a system call interface.
You can get more description about the system interface by below command:
"man 3 basename"
Reviewed-by: Marton Balint <cus@passwd.hu>
Reviewed-by: Tomas Härdin <tjoppen@acc.umu.se>
Reviewed-by: Steven Liu <lq@chinaffmpeg.org>
Signed-off-by: Limin Wang <lance.lmwang@gmail.com>
When used ROUNDED_DIV(a,b), if a is unsigned integer zero, it's
will lead to an underflow issue(it called unsigned integer
wrapping).
Fixes#8062
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Mengye Lv <mengyelv@tencent.com>
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
This bug has been introduced in 9e0a071e.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The pointer arguments to memcpy (and several other functions of the
C standard library) are not allowed to be NULL, not even when the number
of bytes to copy is zero. An AVEncryptionInitInfo's data pointer is
explicitly allowed to be NULL and yet av_encryption_init_info_add_side_data
unconditionally used it as a source pointer to copy from. This commit changes
this so that copying is only done if the number of bytes to copy is > 0.
Fixes ticket #8141 as well as a part of ticket #8150.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Reviewed-by: Tomas Härdin <tjoppen@acc.umu.se>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
It's often not obvious how option constants relate to numerical values.
Defaults are sometimes printed as numbers and from/to are always printed as numbers.
Printing the numeric values of options constants avoids this confusion.
It also allows to see which constants are equivalent.
Before this patch:
-segment_list_type <int> E........ set the segment list type (from -1 to 4) (default -1)
flat E........ flat format
csv E........ csv format
ext E........ extended format
ffconcat E........ ffconcat format
m3u8 E........ M3U8 format
hls E........ Apple HTTP Live Streaming compatible
Afterwards:
-segment_list_type <int> E........ set the segment list type (from -1 to 4) (default -1)
flat 0 E........ flat format
csv 1 E........ csv format
ext 3 E........ extended format
ffconcat 4 E........ ffconcat format
m3u8 2 E........ M3U8 format
hls 2 E........ Apple HTTP Live Streaming compatible
Signed-off-by: softworkz <softworkz@hotmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Integer values should not be printed using format specifier '%g' which leads to inexact display in case of higher values.
Before this patch:
-trans_color <int> .D.V..... color value [...] (default 1.67772e+07)
Afterwards:
-trans_color <int> .D.V..... color value [...] (default 16777215)
Signed-off-by: softworkz <softworkz@hotmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
av_mod_uintp2_c uses a bitwise AND with (1 << p) - 1 to clear the high
bits of an unsigned int. But this is undefined if p == 31, because 1 is
an int and 2^31 is not representable in an int. So make 1 unsigned.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Changing details as following:
1. The previous order of parameters are irregular and difficult to
understand. Adjust the order of the parameters according to the
rule: (RTYPE, input registers, input mask/input index/..., output registers).
Most of the existing msa macros follow the rule.
2. Remove the redundant macro SLDI_Bn_0 and use SLDI_Bn instead.
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This is an alias for JEDEC P22.
The name associated with the value is also changed
from jedec-p22 to ebu3213 to match ITU-T H.273.
Signed-off-by: Raphaël Zumer <rzumer@tebako.net>
Signed-off-by: James Almer <jamrial@gmail.com>
Fixes: signed integer overflow: 9223372036854775807 + 1 cannot be represented in type 'long'
Fixes: 16022/clusterfuzz-testcase-minimized-ffmpeg_DEMUXER_fuzzer-5759796759756800
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Always set *size to zero if *bufptr is NULL, it's more make sence.
fix#8095
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Jun Zhao <barryjzhao@tencent.com>
As of LLVM r368102, Clang will set a pointer tag in bits 56-63 of the
address of a global when compiling with -fsanitize=hwaddress. This requires
an adjustment to assembly code that takes the address of such globals: the
code cannot use the regular R_AARCH64_ADR_PREL_PG_HI21 relocation to refer
to the global, since the tag would take the address out of range. Instead,
the code must use the non-checking (_NC) variant of the relocation (the
link-time check is substituted by a runtime check).
This change makes the necessary adjustment in the movrel macro, where it is
needed when compiling with -fsanitize=hwaddress.
Signed-off-by: Peter Collingbourne <pcc@google.com>
Reviewed-by: Martin Storsjö
Reviewed-by: Janne Grunau
Changing details as following:
1. Remove the local variable 'out_m' in 'CLIP_SH' and store the result in
source vector.
2. Refine the implementation of macro 'CLIP_SH_0_255' and 'CLIP_SW_0_255'.
Performance of VP8 decoding has speed up about 1.1%(from 7.03x to 7.11x).
Performance of H264 decoding has speed up about 0.5%(from 4.35x to 4.37x).
Performance of Theora decoding has speed up about 0.7%(from 5.79x to 5.83x).
3. Remove redundant macro 'CLIP_SH/Wn_0_255_MAX_SATU' and use 'CLIP_SH/Wn_0_255'
instead, because there are no difference in the effect of this two macros.
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Simply moves and templates the actual transforms to support an
additional data type.
Unlike the float version, which is equal or better than libfftw3f,
double precision output is bit identical with libfftw3.
Replace STnxm_UB and LDnxm_SH with new macros ST_{H/W/D}{1/2/4/8}.
The old macros are difficult to use because they don't follow the same parameter passing rules.
Changing details as following:
1. remove LD4x4_SH.
2. replace ST2x4_UB with ST_H4.
3. replace ST4x2_UB with ST_W2.
4. replace ST4x4_UB with ST_W4.
5. replace ST4x8_UB with ST_W8.
6. replace ST6x4_UB with ST_W2 and ST_H2.
7. replace ST8x1_UB with ST_D1.
8. replace ST8x2_UB with ST_D2.
9. replace ST8x4_UB with ST_D4.
10. replace ST8x8_UB with ST_D8.
11. replace ST12x4_UB with ST_D4 and ST_W4.
Examples of new macro: ST_H4(in, idx0, idx1, idx2, idx3, pdst, stride)
ST_H4 store four half-word elements in vector 'in' to pdst with stride.
About the macro name:
1) 'ST' means store operation.
2) 'H/W/D' means type of vector element is 'half-word/word/double-word'.
3) Number '1/2/4/8' means how many elements will be stored.
About the macro parameter:
1) 'in0, in1...' 128-bits vector.
2) 'idx0, idx1...' elements index.
3) 'pdst' destination pointer to store to
4) 'stride' stride of each store operation.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Loongson 3A4000 and 2k1000 has supported MSA2.0.
This patch optimized SAD_UB2_UH,UNPCK_R_SH_SW,UNPCK_SB_SH and UNPCK_SH_SW with MSA2.0 instruction.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
FF_DECODE_ERROR_CONCEALMENT_ACTIVE is set when the decoded frame has error(s) but the returned value from
avcodec_receive_frame is zero i.e. concealed errors
Signed-off-by: Amir Pauker <amir@livelyvideo.tv>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Tries to find a device backed by the i915 kernel driver and loads the iHD
VAAPI driver to use with it. This reduces confusion on machines with
multiple DRM devices and removes the surprising requirement to set the
LIBVA_DRIVER_NAME environment variable to use libmfx at all.
Opening the device via X11 (DRI2/DRI3) rather than opening a DRM render
node directly is only useful if you intend to use the legacy X11 interop
functions. That's never true for the ffmpeg utility, and a library user
who does want this will likely provide their own display instance rather
than making a new one here.
For example: -init_hw_device vaapi:/dev/dri/renderD128,driver=foo
This may be more convenient that using the environment variable, and allows
loading different drivers for different devices in the same process.
Iterate over available render devices and pick the first one which looks
usable. Adds an option to specify the name of the kernel driver associated
with the desired device, so that it is possible to select a specific type
of device in a multiple-device system without knowing the card numbering.
For example: -init_hw_device vaapi:,kernel_driver=amdgpu will select only
devices using the "amdgpu" driver (as used with recent AMD graphics cards).
Kernel driver selection requires libdrm to work.
This commit adds a new API to libavutil to allow for arbitrary transformations
on various types of data.
This is a partly new implementation, with the power of two transforms taken
from libavcodec/fft_template, the 5 and 15-point FFT taken from mdct15, while
the 3-point FFT was written from scratch.
The (i)mdct folding code is taken from mdct15 as well, as the mdct_template
code was somewhat old, messy and not easy to separate.
A notable feature of this implementation is that it allows for 3xM and 5xM
based transforms, where M is a power of two, e.g. 384, 640, 768, 1280, etc.
AC-4 uses 3xM transforms while Siren uses 5xM transforms, so the code will
allow for decoding of such streams.
A non-exaustive list of supported sizes:
4, 8, 12, 16, 20, 24, 32, 40, 48, 60, 64, 80, 96, 120, 128, 160, 192, 240,
256, 320, 384, 480, 512, 640, 768, 960, 1024, 1280, 1536, 1920, 2048, 2560...
The API was designed such that it allows for not only 1D transforms but also
2D transforms of certain block sizes. This was partly on accident as the stride
argument is required for Opus MDCTs, but can be used in the context of a 2D
transform as well.
Also, various data types would be implemented eventually as well, such as
"double" and "int32_t".
Some performance comparisons with libfftw3f (SIMD disabled for both):
120:
22353 decicycles in fftwf_execute, 1024 runs, 0 skips
21836 decicycles in compound_fft_15x8, 1024 runs, 0 skips
128:
22003 decicycles in fftwf_execute, 1024 runs, 0 skips
23132 decicycles in monolithic_fft_ptwo, 1024 runs, 0 skips
384:
75939 decicycles in fftwf_execute, 1024 runs, 0 skips
73973 decicycles in compound_fft_3x128, 1024 runs, 0 skips
640:
104354 decicycles in fftwf_execute, 1024 runs, 0 skips
149518 decicycles in compound_fft_5x128, 1024 runs, 0 skips
768:
109323 decicycles in fftwf_execute, 1024 runs, 0 skips
164096 decicycles in compound_fft_3x256, 1024 runs, 0 skips
960:
186210 decicycles in fftwf_execute, 1024 runs, 0 skips
215256 decicycles in compound_fft_15x64, 1024 runs, 0 skips
1024:
163464 decicycles in fftwf_execute, 1024 runs, 0 skips
199686 decicycles in monolithic_fft_ptwo, 1024 runs, 0 skips
With SIMD we should be faster than fftw for 15xM transforms as our fft15 SIMD
is around 2x faster than theirs, even if our ptwo SIMD is slightly slower.
The goal is to remove the libavcodec/mdct15 code and deprecate the
libavcodec/avfft interface once aarch64 and x86 SIMD code has been ported.
New code throughout the project should use this API.
The implementation passes fate when used in Opus, AAC and Vorbis, and the output
is identical with ATRAC9 as well.
These are the 4:4:4 variants of the semi-planar NV12/NV21 formats.
These formats are not used much, so we've never had a reason to add
them until now. VDPAU recently added support HEVC 4:4:4 content
and when you use the OpenGL interop, the returned surfaces are in
NV24 format, so we need the pixel format for media players, even
if there's no direct use within ffmpeg.
Separately, there are apparently webcams that use NV24, but I've
never seen one.
New VdpYCbCr Formats VDP_YCBCR_FORMAT_Y_U_V_444 and,
VDP_YCBCR_FORMAT_Y_UV_444 have been added in VDPAU with libvdpau-1.2
to be used in get/putbits for YUV 4:4:4 surfaces. Earlier mapping of
AV_PIX_FMT_YUV444P to VDP_YCBCR_FORMAT_YV12 is not valid.
Hence this Change maps AV_PIX_FMT_YUV444P to VDP_YCBCR_FORMAT_Y_U_V_444
to access the YUV 4:4:4 surface via read-back API's of VDPAU.
* commit 'c4642788e83b0858bca449f9b6e71ddb015dfa5d':
time_internal: Prefix fallback versions of gmtime_r/localtime_r with ff_
Merged-by: James Almer <jamrial@gmail.com>
Fix the aligned check in hwupload, input surface should be 16 aligned
too.
Partly fix#7830.
Signed-off-by: Linjie Fu <linjie.fu@intel.com>
Signed-off-by: Zhong Li <zhong.li@intel.com>
The function in case of n=0 would read more bytes than 0.
The end pointer could be beyond the allocated space, which
is undefined.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Silences several warnings:
libavutil/hwcontext_d3d11va.c:413:49: warning: passing argument 3 of ‘av_image_copy’ from incompatible pointer type
libavutil/hwcontext_d3d11va.c:425:47: warning: passing argument 3 of ‘av_image_copy’ from incompatible pointer type
libavutil/hwcontext_dxva2.c:351:45: warning: passing argument 3 of ‘av_image_copy’ from incompatible pointer type
libavutil/hwcontext_dxva2.c:382:52: warning: passing argument 3 of ‘av_image_copy_uc_from’ from incompatible pointer type
Use a macro to redirect calling code from the official name to the
ff_ prefixed one.
Detecting these functions in configure can be tricky (on mingw, they
are conditionally available depending on posix feature defines).
If configure didn't detect them, but they still are visible at
compile time (due to an unrelated header defining the posix feature
defines), providing the local fallback versions with a prefixed
name is safer.
Signed-off-by: Martin Storsjö <martin@martin.st>
The `opencl_get_plane_format` function was incorrectly determining the
value used to set the image channel order. This resulted in all RGB
pixel formats being set to the `CL_RGBA` pixel format, regardless of
whether or not they actually *were* RGBA.
This patch fixes the issue by using the `offset` and depth of components
rather than the loop index to determine the value of `order`.
Signed-off-by: Jarek Samic <cldfire3@gmail.com>
Signed-off-by: Mark Thompson <sw@jkqxz.net>
Similarly to the previous changes, we don't need to synchronise
after a memcpy to device memory. On the other hand, we need to
keep synchronising after a copy to host memory, otherwise there's
no guarantee that subsequent host reads will return valid data.
The function typedefs we were using are only present when using the
dynamic loader, which means compilation breaks for code directly
using the cuda SDK.
To fix this, let's just duplicate the function typedefs locally.
These are not going to change.
Fixes a warning with clang:
libavutil/imgutils.c:314:16: warning: absolute value function 'abs'
given an argument of type 'ptrdiff_t' (aka 'long') but has
parameter of type 'int' which may cause truncation of value
As .rodata isn't one of the default created sections for COFF, it was
created as a read-write data section. By using the default .rdata
section name for COFF, it automatically becomes a read-only data section.
The existing ".section .rodata" works as intended for ELF though.
This is based on an original patch and diagnose by Tom Tan
<Tom.Tan@microsoft.com>.
Signed-off-by: Martin Storsjö <martin@martin.st>
Optimize put_hevc_qpel_hv_8 with mmi in the case width=4/8/12/16/24/32/48/64.
This optimization improved HEVC decoding performance 11%(1.81x to 2.01x, tested on loongson 3A3000).
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This is strongly based on code by Marton Balint, and depends on the previous commit
Fixes: Timeout
Fixes: 11502/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920
Before: Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 11209 ms
After: Executed clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_WCMV_fuzzer-5664893810769920 in 4104 ms
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Reviewed-by: Marton Balint <cus@passwd.hu>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The encoders such as libx264 support different QPs offset for different MBs,
it makes possible for ROI-based encoding. It makes sense to add support
within ffmpeg to generate/accept ROI infos and pass into encoders.
Typical usage: After AVFrame is decoded, a ffmpeg filter or user's code
generates ROI info for that frame, and the encoder finally does the
ROI-based encoding.
The ROI info is maintained as side data of AVFrame.
Signed-off-by: Guo, Yejun <yejun.guo@intel.com>
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
macros for reading and writing 64-bit aligned little-endian values.
these macros are used by the DST decoder and give a performance boost
on platforms that where the compiler must guard against unaligned
memory access.
The dynamic metadata contains data for color volume transform -
application 4 of SMPTE 2094-40:2016 standard. The data comes from
HEVC in the SEI_TYPE_USER_DATA_REGISTERED_ITU_T_T35.
Signed-off-by: Vittorio Giovara <vittorio.giovara@gmail.com>
The alloc_size attribute is valid only on functions that return a
pointer. GCC 9 (not yet released) warns about invalid usage:
./libavutil/mem.h:342:1: warning: 'alloc_size' attribute ignored on a function returning int' [-Wattributes]
342 | av_alloc_size(2, 3) int av_reallocp_array(void *ptr, size_t nmemb, size_t size);
| ^~~~~~~~~~~~~
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The current wording regarding size and min_size is completely wrong and
ignores that min_size is indeed only a desired minimal size, not the
actually allocated size.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@googlemail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes the following warning:
libavutil/avsscanf.c: In function 'decfloat':
libavutil/avsscanf.c:354:9: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
int bitlim = bits-3*(int)(rp-9);
^~~
The header guards were unnecessarily non-standard and the c file
inclusion trick means the files dont't have standard licence
headers.
Based on a patch by: Martin Vignali <martin.vignali@gmail.com>
We have a pattern of wrapping CUDA calls to print errors and
normalise return values that is used in a couple of places. To
avoid duplication and increase consistency, let's put the wrapper
implementation in a shared place and use it everywhere.
Affects:
* avcodec/cuviddec
* avcodec/nvdec
* avcodec/nvenc
* avfilter/vf_scale_cuda
* avfilter/vf_scale_npp
* avfilter/vf_thumbnail_cuda
* avfilter/vf_transpose_npp
* avfilter/vf_yadif_cuda
This was marked as deprecated (but only in the doxygen, not with an
actual deprecation attribute) in 81c623fae0 in 2011, but was
undeprecated in ad1ee5fa7.
Signed-off-by: Martin Storsjö <martin@martin.st>
This was marked as deprecated (but only in the doxygen, not with an
actual deprecation attribute) in 81c623fae0 in 2011, but was
undeprecated in ad1ee5fa7.
Signed-off-by: Martin Storsjö <martin@martin.st>
Add error handle if av_image_fill_pointers fail.
Signed-off-by: Jun Zhao <mypopydev@gmail.com>
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
This is needed because of 32bit float formats (which are difficult to
store in 16bits)
This also fixes undefined behavior found by fate
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Create SMPTE ST 12-1 timecodes based on H.264 SEI picture timing
info.
For framerates > 30 FPS, the field flag is used in conjunction with
pairs of frames which contain the same frame timestamp in S12M.
Ensure the field is properly set per the spec.
Prior to Xcode 9.3, the clang built-in assembler didn't support
altmacro, and gas-preprocessor was used for assembling for arm/darwin.
For thumb functions, gas-preprocessor took care of adding the .thumb_func
directives, but when now being able to assemble without gas-preprocessor,
we need to add these directives ourselves.
Signed-off-by: Martin Storsjö <martin@martin.st>
Libmfx requires 16 bytes aligned input/output for uploading.
Currently only output is 16 byte aligned and assigning same width/height to
input with smaller buffer size actually, thus definitely will cause segment fault.
Can reproduce with any 1080p nv12 rawvideo input:
ffmpeg -init_hw_device qsv=qsv:hw -hwaccel qsv -filter_hw_device qsv -f rawvideo -pix_fmt nv12 -s:v 1920x1080
-i 1080p_nv12.yuv -vf 'format=nv12,hwupload=extra_hw_frames=16,hwdownload,format=nv12' -an -y out_nv12.yuv
It can fix#7418
Signed-off-by: Zhong Li <zhong.li@intel.com>
RGB32(AV_PIX_FMT_BGRA on intel platforms) format may be used as overlay with alpha blending.
So add AV_PIX_FMT_BGRA format support.
One example of alpha blending overlay: ffmpeg -hwaccel qsv -c:v h264_qsv -i BA1_Sony_D.jsv
-filter_complex 'movie=lena-rgba.png,hwupload=extra_hw_frames=16[a];[0:v][a]overlay_qsv=x=10:y=10'
-c:v h264_qsv -y out.mp4
Rename RGB32 to be BGRA to make it clearer as Mark Thompson's suggestion.
V2: Add P010 format support else will introduce HEVC 10bit encoding regression.
Thanks for LinJie's discovery.
Signed-off-by: Zhong Li <zhong.li@intel.com>
Verified-by: Fu, Linjie <linjie.fu@intel.com>
Variable 'ret' hasn't been initialized,thus introducing a random
hwupload failure regression due to qsv session uninitialized.
Signed-off-by: Zhong Li <zhong.li@intel.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Give the entries in the VAAPI format map table an explicit type and add
functions to do the necessary lookups. Add another field to this table
indicating whether the chroma planes are swapped (as in YV12), and use
that rather than explicit comparisons where swapping is needed.
Clarify that the list is the naughty list, and therefore being on it is
not desirable. The i965 driver does not need to be on the list after
version 2.0 (when the standard parameter buffer rendering behaviour was
changed).
* commit 'f89ec87afaf0d1abb6d450253b0b348fd554533b':
frame: Simplify the video allocation
Merged-by: James Almer <jamrial@gmail.com>
Padding-Remixed-by: Michael Niedermayer <michael@niedermayer.cc>
Removing unused VPP sessions by initializing only when used in order to help
reduce CPU utilization.
Thanks to Maxym for the guidance.
Signed-off-by: Joe Olivas <joseph.k.olivas@intel.com>
Signed-off-by: Maxym Dmytrychenko <maxim.d33@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Performance of mpeg4 decoding improved about 23%(from 128fps to 158fps, tested on loongson 3A3000).
Reoptimized following functions with mmi.
1. ff_simple_idct_put_8_mmi
2. ff_simple_idct_add_8_mmi
3. ff_simple_idct_8_mmi
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Found by Chrome's ClusterFuzz: https://crbug.com/873693
Signed-off-by: Jacob Trimble <modmaker@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
libavutil/hwcontext_d3d11va.c: In function 'd3d11va_device_create':
libavutil/hwcontext_d3d11va.c:554:46: warning: passing argument 2 of 'pAdapter->lpVtbl->GetDesc' from incompatible pointer type [-Wincompatible-pointer-types]
hr = IDXGIAdapter2_GetDesc(pAdapter, &desc);
^
libavutil/hwcontext_d3d11va.c:554:46: note: expected 'DXGI_ADAPTER_DESC * {aka struct DXGI_ADAPTER_DESC *}' but argument is of type 'DXGI_ADAPTER_DESC2 * {aka struct DXGI_ADAPTER_DESC2 *}'
Reviewed-by: Jean-Baptiste Kempf <jb@videolan.org>
Signed-off-by: James Almer <jamrial@gmail.com>