mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-13 21:28:01 +02:00
Commit Graph

5910 Commits

Author SHA1 Message Date
Lynne
710d83bdde
lavu/tx: zero-out imaginary of last coefficient in forward RDFTs
We didn't do this, because it's zero anyway, but it prevents users from using
uninitialized memory in calculations.
2022-12-03 21:02:00 +01:00
Michael Niedermayer
7792825ad6
avutil/tx: Use unsigned in ff_tx_fft_sr_combine() to avoid undefined behavior
Fixes: signed integer overflow: -1284837070 - 982101618 cannot be represented in type 'int'
Fixes: 53105/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_AC3_FIXED_fuzzer-4848015827664896

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2022-11-28 20:58:05 +01:00
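The overflow reported for 7792825ad6 can be reproduced in isolation. The sketch below is not the code from ff_tx_fft_sr_combine(); `sub_wrap` is a hypothetical helper that only illustrates the general technique of doing the arithmetic in an unsigned type, where wraparound is well defined.

```c
#include <stdint.h>

/* Hypothetical helper (not part of lavu/tx): subtract with wraparound.
 * Unsigned arithmetic is defined modulo 2^32, so no undefined behaviour
 * occurs. The conversion back to int32_t is implementation-defined before
 * C23 and yields the two's complement value on common compilers. */
static int32_t sub_wrap(int32_t a, int32_t b)
{
    return (int32_t)((uint32_t)a - (uint32_t)b);
}
```

With the operands from the fuzzer report, `sub_wrap(-1284837070, 982101618)` wraps around to 2028028608 instead of triggering the UBSan error.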
Lynne
90c17a05aa
x86/tx_float: fix stray change in 15xM FFT and replace imul->lea
Thanks to rorgoroth for bisecting and kurosu for the lea suggestion.
2022-11-28 16:58:12 +01:00
Andreas Rheinhardt
1a7efafd33 avutil/tx: Use proper deallocator
May fix the FATE failures on x64 Windows here:
https://fate.ffmpeg.org/report.cgi?slot=x86_64-msvc17-windows-native&time=20221125130443

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-11-25 15:54:33 +01:00
Lynne
a56d7e0ca3
lavu/tx: add DCT-III implementation 2022-11-24 15:58:36 +01:00
Lynne
504b7bec1a
lavu/tx: add DCT-II implementation 2022-11-24 15:58:35 +01:00
Lynne
93c30bd6f0
lavu/tx: clarify stride for RDFT transforms 2022-11-24 15:58:35 +01:00
Lynne
43d285a40f
lavu/tx: fix last coefficient scaling for R2C transforms
This was a typo.
2022-11-24 15:58:35 +01:00
Lynne
8547123f3b
lavu/tx: generalize PFA FFTs
This commit permits any stacking of FFTs of any size.
2022-11-24 15:58:34 +01:00
Lynne
7f019e7758
lavu/tx: add length decomposition function
Rather than using a list of lengths supported, this goes a step beyond
and uses all registered codelets to come up with a good decomposition.
2022-11-24 15:58:34 +01:00
Lynne
87bae6b018
lavu/tx: refactor to explicitly track and convert lookup table order
Necessary for generalizing PFAs.
2022-11-24 15:58:34 +01:00
Lynne
1c8d77a2bf
lavu/tx: refactor and separate codelet list and prio code 2022-11-24 15:58:33 +01:00
Lynne
958b3760b5
lavu/tx: improve transform tree logging
Now prints the actual codelet size used, as well as the number of
allowed factors.
2022-11-24 15:58:33 +01:00
Lynne
6ddd10c3e2
lavu/tx: allow codelets to specify a minimum number of matching factors 2022-11-24 15:58:33 +01:00
Lynne
dd77e61182
lavu/tx: add ff_tx_clear_ctx()
This function allows implementations to clean up a context after
successfully initializing subcontexts.
2022-11-24 15:58:32 +01:00
Lynne
fab97faf02
x86/tx_float: implement striding in fft_15xM 2022-11-24 15:58:32 +01:00
Lynne
92100eee5b
x86/tx_float_init: properly specify the supported factors of 15xM FFTs
Only powers of two are currently supported.
2022-11-24 15:58:32 +01:00
Lynne
cc1df4045e
x86/tx_float: add a standalone 15-point AVX2 transform
Enables its use everywhere else in the framework.
2022-11-24 15:58:31 +01:00
Lynne
877e575b5d
x86/tx_float: optimize and macro out FFT15 2022-11-24 15:58:31 +01:00
Lynne
fbe4fd992f
lavu/tx: support output stride in naive transforms
Allows them to be used in general PFAs.
2022-11-24 15:58:31 +01:00
Lynne
68cabf8750
lavu/tx: add fft_inplace_small transforms
This is much faster than the loop.
2022-11-24 15:58:30 +01:00
Lynne
d4e39cae2e
lavu/tx: drop requirement of input == output for in-place transforms
No longer necessary.
2022-11-24 15:58:30 +01:00
Lynne
fff3e1d848
lavu/tx: support out-of-place transforms in fft_inplace
This makes testing easier, as a unified path can be used for in/out of
place transforms.
2022-11-24 15:58:30 +01:00
Lynne
d260796f11
lavu/tx: make C ptwo transforms in+out of place
We assume that _all_ in-place transforms can operate out of place,
which isn't true, because the C ptwo transforms were always in-place (dst).
2022-11-24 15:58:29 +01:00
Lynne
37008dc402
lavu/tx: add naive_small FFT
The same as naive but with precomputed tables. Makes it more useful
for odd-factors we don't support yet.
2022-11-24 15:58:29 +01:00
Lynne
e8a9b7b298
lavu/tx: list all odd-length FFT factors as regular codelets
Allows them to be picked just like any other transform.
2022-11-24 15:58:28 +01:00
Lynne
45bd4bf79f
lavu/tx: generalize single-factor transforms
Not that useful, but it gives us fast small odd-length transforms.
2022-11-24 15:58:28 +01:00
Lynne
79f11e2409
lavu/tx: make prime factor transforms truly in-place
They all overwrote in[0] and then used it as a DC.
2022-11-24 15:58:28 +01:00
Haihao Xiang
3dc8bceabe lavu/pixfmt: Update the description for AV_PIX_FMT_QSV
Since D3D11 was introduced for QSV in FFmpeg 5.0, there is an implied
API/ABI change for user-supplied frames [1], hence update the
description for AV_PIX_FMT_QSV.

[1] https://ffmpeg.org/pipermail/ffmpeg-devel/2021-December/290444.html

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2022-11-22 13:52:38 +08:00
Zhao Zhili
b7a3f16957 avutil/hwcontext: verify hw_frames_ctx in transfer_data_alloc
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2022-11-21 23:57:03 +08:00
Zhao Zhili
2697f23f4e avutil/hwcontext_mediacodec: add ANativeWindow support
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2022-11-21 23:53:27 +08:00
James Almer
84fe53f6e1 avutil/hwcontext_cuda: fix compilation without Vulkan after last commit
Signed-off-by: James Almer <jamrial@gmail.com>
2022-11-12 15:54:53 -03:00
James Almer
f4aa5c275f avutil/hwcontext_cuda: fix mixed declarations and code warning
Signed-off-by: James Almer <jamrial@gmail.com>
2022-11-12 15:52:10 -03:00
Andreas Rheinhardt
c124981b79 avutil/cast5: Avoid undefined shift of uint32_t by 32 places
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-11-11 12:24:23 +01:00
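Shifting a uint32_t by 32 places is undefined in C because the shift count equals the width of the type. A common place for this to occur is a rotate written as `(x << n) | (x >> (32 - n))` with n == 0. The sketch below assumes that pattern; whether c124981b79 fixed exactly this expression is not stated in the log, and `rol32` is a hypothetical name.

```c
#include <stdint.h>

/* Hypothetical rotate-left that stays defined for every n.
 * Both shift counts are masked to the range 0..31, so neither shift
 * ever reaches the width of the type. */
static uint32_t rol32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x << n) | (x >> ((32 - n) & 31));
}
```

For n == 0 (or any multiple of 32) both shifts use a count of 0 and the value is returned unchanged.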
James Almer
86157f5a25 avutil/tx: use llrintf() to convert a float into a 64 bit integer
Should fix fate failures on Windows x86 targets, where long is 32 bits.

Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: James Almer <jamrial@gmail.com>
2022-11-08 14:24:49 -03:00
James Almer
d5c7970a27 avutil/tx: use a lower log level for the debug messages
The amount of lines printed is too high for the verbose level, and the debug
level is a better fit for their content.

Signed-off-by: James Almer <jamrial@gmail.com>
2022-11-08 14:08:05 -03:00
Marvin Scholz
2508e846a8 avutil/dict: Improve documentation
Mostly consistent formatting and consistent ordering of
warnings/notes to be next to the description.

Additionally group the AV_DICT_* macros.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2022-11-06 08:26:50 +01:00
Marvin Scholz
3101b8afb3 avutil/dict: Use av_dict_iterate in av_dict_get_string
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2022-11-06 08:26:50 +01:00
Marvin Scholz
3c2050b749 avutil/dict: Use av_dict_iterate in av_dict_copy
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2022-11-06 08:26:50 +01:00
Marvin Scholz
5f7c5a0bd7 avutil/dict: Use av_dict_iterate in av_dict_get
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2022-11-06 08:26:50 +01:00
Marvin Scholz
9dad237928 avutil/dict: Add av_dict_iterate
This is a more explicit iteration API rather than using the "magic"
av_dict_get(d, "", t, AV_DICT_IGNORE_SUFFIX) which is not really
trivial to grasp what it does when casually reading through code.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2022-11-06 08:26:48 +01:00
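The iteration style introduced by 9dad237928 is the usual "pass the previous entry, get the next one" pattern. The sketch below shows that pattern on a hypothetical array-backed `Dict` type. It is not libavutil's AVDictionary, and `dict_iterate` here only mimics the shape of the new function.

```c
#include <stddef.h>

/* Hypothetical stand-ins, not the libavutil types. */
typedef struct { const char *key, *value; } Entry;
typedef struct { const Entry *elems; size_t count; } Dict;

/* Return the entry after prev, the first entry if prev is NULL,
 * or NULL once the end of the dictionary is reached. */
static const Entry *dict_iterate(const Dict *d, const Entry *prev)
{
    size_t i = 0;
    if (!d)
        return NULL;
    if (prev)
        i = (size_t)(prev - d->elems) + 1;
    return i < d->count ? &d->elems[i] : NULL;
}

/* Typical caller: walk every entry without any magic flags. */
static size_t dict_count_entries(const Dict *d)
{
    size_t n = 0;
    const Entry *e = NULL;
    while ((e = dict_iterate(d, e)))
        n++;
    return n;
}

/* Demo data for the usage note below. */
static const Entry demo_entries[] = {
    { "title", "x" }, { "artist", "y" }, { "year", "2022" },
};
static const Dict demo = { demo_entries, 3 };
```

Calling `dict_count_entries(&demo)` visits the three demo entries in order; the loop body is where a real caller would read `e->key` and `e->value`.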
James Darnley
0f252dfa95 avutil/tests/cpu: print the avx512icl flag 2022-11-04 19:37:46 +01:00
James Almer
6228ba141d avutil/channel_layout: add a 7.1(top) channel layout
Signed-off-by: James Almer <jamrial@gmail.com>
2022-11-03 19:39:45 -03:00
James Almer
83e918de71 avutil/channel_layout: add a cube channel layout
Signed-off-by: James Almer <jamrial@gmail.com>
2022-10-30 16:18:30 -03:00
Andreas Rheinhardt
f8efd890bf avutil/tx_template: Move function pointers to const memory
This can be achieved by moving the AVOnce out of the structure
containing the function pointers; the latter can then be made
const.
This also has the advantage of eliminating padding in the structure
(sizeof(AVOnce) is four here) and allowing the AVOnces to be put
into .bss (depending upon the implementation).

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-28 09:30:10 +02:00
Andreas Rheinhardt
188216581b avutil/tx_template: Avoid code duplication
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-28 09:30:10 +02:00
Andreas Rheinhardt
2af5f55b2e avutil/tx_template: Don't waste space for inexistent factors
It is possible to avoid the factors array for the power-of-two
tables for which said array is unused by using a different
structure for initialization for power-of-two tables than for
non-power-of-two-tables. This saves 3*15*16B from .data.

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-28 09:29:41 +02:00
Andreas Rheinhardt
d7c3e52fbf avutil/integer: Use '|' instead of '+' where it is more natural
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-24 20:11:20 +02:00
Andreas Rheinhardt
9a6cdd1ba3 avutil/integer: Fix undefined left shifts of negative numbers
Affected the integers FATE-test.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-24 18:04:24 +02:00
Andreas Rheinhardt
f49375f28f avutil/aes: Don't use out-of-bounds index
Up until now, av_aes_init() uses a->round_key[0].u8 + t
as dst of memcpy where it is intended for t to be greater
than 16 (u8 is an uint8_t[16]); given that round_key itself
is an array, it is actually intended for the dst to be
in a later round_key member. To do this properly,
just cast a->round_key to unsigned char*.

This fixes the srtp, aes, aes_ctr, mov-3elist-encrypted,
mov-frag-encrypted and mov-tenc-only-encrypted
FATE-tests with (Clang-)UBSan.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-24 16:28:14 +02:00
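The out-of-bounds index fixed in f49375f28f can be illustrated without the AES code. In the sketch below, `Block` is a hypothetical stand-in for av_aes_block and `copy_into_blocks` is an invented helper; only the idea of addressing an array of blocks through an unsigned char pointer is taken from the commit message.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for av_aes_block: 16 bytes, several views. */
typedef union Block {
    uint64_t u64[2];
    uint32_t u32[4];
    uint8_t  u8[16];
} Block;

/* Copy len bytes to byte position offset within an array of blocks.
 * Writing blocks[0].u8 + offset would index past the 16-byte member
 * as soon as offset exceeds 16; viewing the whole array as bytes
 * keeps the pointer arithmetic inside one object. */
static void copy_into_blocks(Block *blocks, size_t offset,
                             const uint8_t *src, size_t len)
{
    unsigned char *bytes = (unsigned char *)blocks;
    memcpy(bytes + offset, src, len);
}

/* Demo: byte 20 of a three-block array is byte 4 of the second block. */
static uint8_t demo_copy(void)
{
    Block keys[3];
    const uint8_t src[4] = { 1, 2, 3, 4 };
    memset(keys, 0, sizeof(keys));
    copy_into_blocks(keys, 20, src, sizeof(src));
    return keys[1].u8[4];
}
```

The destination is the same address either way; the difference is that the cast makes the intent (addressing the array as a whole) visible to the compiler and to UBSan.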
Andreas Rheinhardt
73930e4f93 avutil/aes: Don't use misaligned pointers
The AES code uses av_aes_block, a union consisting of
uint64_t[2], uint32_t[4], uint8_t[4][4] and uint8_t[16].
subshift() performs byte-wise manipulations of two av_aes_blocks,
but when encrypting, it does so with a shift of two bytes;
more precisely, it uses
"av_aes_block *s1 = (av_aes_block *) (s0[0].u8 - s)"
and later on uses the uint8_t[16] member to access s0.
Yet av_aes_block requires to be suitably aligned for
the uint64_t[2] member, which s0[0].u8 - 2 is certainly
not. This is in violation of 6.3.2.3 (7) of C11. UBSan
reports this in the aes_ctr, mov-3elist-encrypted,
mov-frag-encrypted, mov-tenc-only-encrypted and srtp
tests.
Furthermore, there is another issue here: The pointer points
outside of s0; this works, because all the accesses later on
use an index >= 3. (Clang-)UBSan reports this as
"runtime error: index -2 out of bounds for type 'uint8_t[16]'".

This commit fixes both of these issues: The latter issue
is fixed by applying an offset of "+ 3" during the cast
and subtracting this from the indices used later on.
The former issue is solved by not casting to av_aes_block*
at all; instead simply cast to unsigned char*.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-24 16:28:14 +02:00
Anton Khirnov
f66e794672 lavu/thread: add an internal function for setting thread name
Linux-only for now.
2022-10-24 02:00:31 +02:00
Carl Eugen Hoyos
882a17068f lavu/hwcontext_vaapi: Fix type specifier for uintptr_t
Fixes a format specifier warning on x86_32 Linux.
Fixes part of ticket #9986.
2022-10-23 20:51:42 +02:00
Marvin Scholz
3bd0bf76fb avutil/samplefmt: document missing argument 2022-10-17 09:56:47 +02:00
Marvin Scholz
3973d4fbc7 avutil/aes_ctr: document some missing arguments 2022-10-17 09:56:47 +02:00
Marvin Scholz
0baa6871ac avutil/aes: document some missing arguments 2022-10-17 09:56:47 +02:00
Marvin Scholz
a76d5fecf3 avutil/imgutils: document some missing arguments 2022-10-17 09:56:47 +02:00
Marvin Scholz
ed2aa4e692 avutil/crc: Add doxy for missing arguments 2022-10-17 09:56:47 +02:00
Marvin Scholz
96f89cdc87 avutil/des: Add doxy for missing arguments
Additionally reorder so that the arguments list matches the
order of the arguments in the function declaration.
2022-10-17 09:56:47 +02:00
Marvin Scholz
023966d2f8 avutil/avstring: Add doxy for missing argument 2022-10-17 09:56:47 +02:00
Marvin Scholz
20a947f479 avutil/frame: Add doxy for missing argument 2022-10-17 09:56:47 +02:00
Marvin Scholz
3dea9adc67 avutil/rc4: Add doxy for missing arguments 2022-10-17 09:56:47 +02:00
Marvin Scholz
0e7ce0d5e7 avutil/uuid: Remove bogus doxy return doc
The function returns void and no error code.
2022-10-17 09:56:47 +02:00
Marvin Scholz
c4ff708c81 avutil/parseutils: Use inline code and properly escape
For some reason doxygen needs the % to be escaped here, except for the
%% in the inline code…
2022-10-17 09:56:47 +02:00
Marvin Scholz
990340377b avutil/parseutils: Add doxy for missing arguments 2022-10-17 09:56:47 +02:00
Marvin Scholz
8521a691b9 avutil/hwcontext: Add doxy for missing argument 2022-10-17 09:56:47 +02:00
Marvin Scholz
8bac3902b0 avutil/lfg: Minor doxy improvements
Use inline code for sizeof and use proper @return directive.
2022-10-17 09:56:47 +02:00
Marvin Scholz
6a1ad7a752 avutil/lfg: Add doxy for missing argument 2022-10-17 09:56:47 +02:00
Marvin Scholz
a679b87570 avutil/file: Add doxy for missing arguments 2022-10-17 09:56:47 +02:00
Marvin Scholz
b850347a89 avutil/eval: Add doxy for missing arguments 2022-10-17 09:56:47 +02:00
Marvin Scholz
27dbc9e724 avutil/detection_bbox: Add doxy for missing argument 2022-10-17 09:56:47 +02:00
Marvin Scholz
436879a203 avutil/channel_layout: Document missing arguments 2022-10-17 09:55:19 +02:00
Marvin Scholz
f824388c33 avutil/channel_layout: Use inline code for Doxy
This prevents Doxygen from interpreting <i> and other text that looks like
XML tags as such, fixing a warning about unknown tags.
2022-10-17 09:55:19 +02:00
Marvin Scholz
e3c5b8c610 avutil/camellia: Fix doxy @param typo 2022-10-17 09:55:19 +02:00
Marvin Scholz
58b86d8b68 avutil/bprint: Improve doxy documentation
Declare proper group, add the file to that group,
group the defines and document them.

Use lists to represent lists of cases.
2022-10-17 09:55:19 +02:00
Marvin Scholz
88e78ec6a8 avutil/csp: Fix bogus doxy filename
Separate the blocks to make the grouping easier to grasp,
add the file properly to the group and fix the file description
incorrectly used as filename, fixing the following doxy warning:

  warning: the name 'Colorspace' supplied as the argument in
  the \file statement is not an input file
2022-10-17 09:55:19 +02:00
Marvin Scholz
6938ddb167 avutil/stereo3d: Add file to doxy group
This way the related file will be properly grouped with its
corresponding group like it's done in other places in the doxy.
2022-10-17 09:55:19 +02:00
Marvin Scholz
06bcbe1477 avutil/stereo3d: consolidate group doxy
Make it a bit easier to grasp the grouping when not
unnecessarily splitting comment blocks.

Additionally do not try to add lavu_video_stereo3d to itself, resolving
the following doxy warning:
  warning: Refusing to add group lavu_video_stereo3d to itself
2022-10-17 09:55:19 +02:00
Marvin Scholz
7e8d974487 avutil/spherical: Add file to doxy group
This way the related file will be properly grouped with its
corresponding group like it's done in other places in the doxy.
2022-10-17 09:55:19 +02:00
Marvin Scholz
24b610e366 avutil/spherical: consolidate group doxy
Make it a bit easier to grasp the grouping when not
unnecessarily splitting comment blocks.

Additionally do not try to add lavu_video_spherical to itself, resolving
the following doxy warning:
  warning: Refusing to add group lavu_video_spherical to itself
2022-10-17 09:55:19 +02:00
Marvin Scholz
71c45b8a44 avutil/display: Add file to doxy group
This way the related file will be properly grouped with its
corresponding group like it's done in other places in the doxy.
2022-10-17 09:55:19 +02:00
Marvin Scholz
9570a833a0 avutil/display: consolidate group doxy
Make it a bit easier to grasp the grouping when not
unnecessarily splitting comment blocks.

Additionally do not try to add lavu_video_display to itself, resolving
the following doxy warning:
  warning: Refusing to add group lavu_video_display to itself
2022-10-17 09:55:19 +02:00
Marvin Scholz
c2c4ef6ae4 avutil/aes_ctr: Add proper doxy group
So it will be properly listed along the other crypto modules
in the documentation.
2022-10-17 09:55:19 +02:00
Marvin Scholz
c468a8c04f avutil/twofish: Fix doxy @param typo 2022-10-17 09:51:47 +02:00
Marvin Scholz
f29dde49d2 avutil/channel_layout: Group pre-defined channel layouts 2022-10-17 09:51:47 +02:00
Marvin Scholz
6c2ae2e994 avutil/channel_layout: Group deprecated functions
Makes it a bit easier to spot the deprecated ones when
looking at the overview.
2022-10-17 09:51:47 +02:00
Marvin Scholz
57c8722a47 avutil/channel_layout: Move to its own group
Before it was cluttering the general avutil Audio group page.
2022-10-17 09:51:47 +02:00
Marvin Scholz
2b51b1829d avutil/channel_layout: Remove bogus closing group 2022-10-17 09:51:47 +02:00
Marvin Scholz
3fbf8d6e1d avutil: Fix mismatching argument names 2022-10-17 09:51:47 +02:00
Rémi Denis-Courmont
96a83ceea4 riscv: fix scalar product initialisation
'VSETVLI xd, x0, ...' has rather nonobvious semantics:
- If xd is x0, then it preserves the current vector length.
- If xd is not x0, it sets the vector length to the supported maximum.

Also somewhat confusingly, while VMV.X.S always does its thing
regardless of the selected vector length, VMV.S.X does _nothing_ if the
selected vector length is zero.

So the current code fails to initialise the accumulator if we
are unlucky to have a selected vector length of zero on entry. Fix it
by forcing the vector length to one.
2022-10-13 10:17:38 +02:00
Leo Izen
479747645f avutil/pixfmt.h: add native-endian RGB32F and RGBA32F formats
Add an AV_PIX_FMT_NE macro for RGB32FBE/RGB32FLE and also one for
RGBA32FBE/RGBA32FLE for packed 32-bit float RGB samples, and also
packed 32-bit float RGBA samples, respectively.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Leo Izen <leo.izen@gmail.com>
2022-10-11 16:31:15 -03:00
Reimar Döffinger
38cd829dce
aarch64: Implement stack spilling in a consistent way.
Currently it is done in several different ways, which
might cause needless dependencies or in case of
tx_float_neon.S is incorrect.

Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>
2022-10-11 09:12:02 +02:00
Andreas Rheinhardt
a60befce40 avutil/attributes_internal: Add visibility pragma
GCC 4.0 not only added a visibility attribute, but also
a pragma to set it for a whole region of code.*
This commit exposes this via macros.

*: See https://gcc.gnu.org/gcc-4.0/changes.html

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-10 13:43:59 +02:00
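The GCC pragma that a60befce40 wraps in macros looks roughly as follows. The macro names `VIS_PUSH_HIDDEN` and `VIS_POP` are hypothetical and chosen only for this sketch; the pragma itself is the documented `#pragma GCC visibility push(...)` / `pop` pair, which Clang also accepts.

```c
/* Hypothetical macro names; the real ones live in attributes_internal.h. */
#if defined(__GNUC__)
#define VIS_PUSH_HIDDEN _Pragma("GCC visibility push(hidden)")
#define VIS_POP         _Pragma("GCC visibility pop")
#else
/* Compilers without the pragma simply get no annotation. */
#define VIS_PUSH_HIDDEN
#define VIS_POP
#endif

VIS_PUSH_HIDDEN
/* Everything declared in this region gets hidden visibility,
 * without repeating an attribute on each declaration. */
int internal_helper(int x);
VIS_POP

int internal_helper(int x)
{
    return x + 1;
}
```

Compared with a per-declaration attribute, the region form is convenient for internal headers where every symbol should be hidden.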
Fei Wang
201cb35061 lavu/hwcontext_qsv: add support for 12bit content on Linux
P012, Y212 and XV36 are used for 12bit content in FFmpeg VAAPI, so
these formats should be used in FFmpeg QSV too, however the SDK only
declares support for P016, Y216 and Y416. So this commit fudged mappings
between AV_PIX_FMT_P012 and MFX_FOURCC_P016, AV_PIX_FMT_Y212 and
MFX_FOURCC_Y216, AV_PIX_FMT_XV36 and MFX_FOURCC_Y416.

Signed-off-by: Fei Wang <fei.w.wang@intel.com>
Signed-off-by: Wenbin Chen <wenbin.chen@intel.com>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2022-10-10 09:31:34 +08:00
Haihao Xiang
aba25b391c lavu/hwcontext_qsv: add support for 10bit 4:4:4 content on Linux
XV30 is used for 10bit 4:4:4 content in FFmpeg VAAPI, so XV30 should be
used for 10bit 4:4:4 content in FFmpeg QSV too because QSV is based on
VAAPI on Linux. However the SDK only declares support for Y410 but does
nothing with the alpha in Y410, so this commit fudged a mapping between
AV_PIX_FMT_XV30 and MFX_FOURCC_Y410.

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2022-10-10 09:31:34 +08:00
Haihao Xiang
1496e7c173 lavu/hwcontext_qsv: specify Shift for each format
We can't get Shift from bit depth for some formats in the SDK. For
example, bit depth is 10, however Shift is 0 for Y410 (XV30 in FFmpeg).
In order to support these formats in the next commits, this patch
specified Shift for each format

Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2022-10-10 09:31:34 +08:00
Rémi Denis-Courmont
f59a767ccd lavu/riscv: helper macro for VTYPE encoding
In most cases, the vector type (VTYPE) for the RISC-V Vector extension
is supplied as an immediate value, with either of the VSETVLI or
VSETIVLI instructions. There is however a third instruction VSETVL
which takes the vector type from a general purpose register. That is so
the type can be selected at run-time.

This introduces a macro to load a (valid) vector type into a register.
The syntax follows that of VSETVLI and VSETIVLI, with element size,
group multiplier, then tail and mask policies.
2022-10-10 02:22:12 +02:00
Lynne
bd3e552549
lavu: bump minor and add APIChanges entry for RISC-V's RVBbasic 2022-10-05 08:31:15 +02:00
Rémi Denis-Courmont
37d5ddc317 lavu/riscv: CPU flag for the Zbb extension
Unfortunately, it is common, and will remain so, that the Bit
manipulations are not enabled at compilation time. This is an official
policy for Debian ports in general (though they do not support RISC-V
officially as of yet) to stick to the minimal target baseline, which
does not include the B extension or even its Zbb subset.

For inline helpers (CPOP, REV8), compiler builtins (CTZ, CLZ) or
even plain C code (MIN, MAX, MINU, MAXU), run-time detection seems
impractical. But at least it can work for the byte-swap DSP functions.
2022-10-05 08:26:19 +02:00
Rémi Denis-Courmont
3ba5579e55 riscv: remove unnecessary #include's
Pointed out by Andreas Rheinhardt.
2022-10-05 06:54:56 +02:00
Johannes Kauffmann
a11e745b97 lavu/fixed_dsp: add missing av_restrict qualifiers
The butterflies_fixed function pointer declaration specifies av_restrict
for the first two pointer arguments. So the corresponding function
definitions should honor this declaration.

MSVC emits warning C4113 for this.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2022-10-04 10:56:12 +02:00
Andreas Rheinhardt
e4beb307ab avutil/channel_layout: Don't mention dead project
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-10-03 23:19:47 +02:00
Andreas Rheinhardt
9d52844aba avutil/tests/pixelutils: Test that all non-hw pix fmts have components
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-30 14:33:08 +02:00
Andreas Rheinhardt
36e805e9df avutil/tests/pixelutils: Use av_assert0 instead for test tools
These are test tools, so they should be picky.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-30 14:33:08 +02:00
Andreas Rheinhardt
5fe447bbb4 avutil/pixdesc: Move ff_check_pixfmt_descriptors() to its only user
Namely to lavu/tests/pixelutils.c. This way, this function will
not be included into actual binaries any more.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-30 14:33:08 +02:00
Andreas Rheinhardt
571b670e7d avutil/pixdesc: Avoid direct access to pix fmt desc array
Instead use av_pix_fmt_desc_next(). It is still possible
to check its return values by comparing it with the
(currently) expected values and the code does so.

Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-30 14:33:08 +02:00
Andreas Rheinhardt
6d0a7e96e7 avutil/pixdesc: Remove always-false checks
ff_check_pixfmt_descriptors() was added in commit
20e99a9c10. At this time,
the values of enum AVPixelFormat were not contiguous;
instead there was a jump from 111 to 291 (or from 115
to 295 depending upon AV_PIX_FMT_ABI_GIT_MASTER).
ff_check_pixfmt_descriptors() accounts for this
by skipping empty descriptors. Yet this issue no longer
exists: There are no holes.

The check for said holes makes GCC believe that the name
can be NULL; because it is used as argument corresponding to
%s in a log statement, it therefore emits a warning
(since d75c4693fe). Therefore
this commit simply removes these checks.

Also move the checks for name before the log statement.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-30 14:33:08 +02:00
Andreas Rheinhardt
3d8754cd09 avutil/display: Drop wrong comments about matrices being allocated
These functions work just as well with stack based matrices.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-29 00:05:32 +02:00
James Almer
299253ae1b avutil/channel_layout: move and improve the comment about unknown orders
Don't place it as doxy specific for the order field, and generalize it both to
also cover already defined orders and to not make it seem like the user is
required to handle a layout they don't fully support or understand.

Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-28 12:21:18 -03:00
James Almer
bcd2e7d685 avutil/version: bump minor for the new RISC-V cpu flags
Forgotten in 0c0a3deb18.

Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-28 12:21:18 -03:00
Rémi Denis-Courmont
c47ebfa141 lavu/riscv: helper to read the vector length 2022-09-28 11:43:17 +02:00
Rémi Denis-Courmont
c1bb19e263 lavu/fixeddsp: RISC-V V butterflies_fixed 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
cd77662953 lavu/floatdsp: RISC-V V scalarproduct_float 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
b493370662 lavu/floatdsp: RISC-V V vector_fmul_window 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
9aeb6aca3a lavu/floatdsp: RISC-V V vector_fmul_reverse 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
47ce9735cc lavu/floatdsp: RISC-V V butterflies_float 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
f4ea45040f lavu/floatdsp: RISC-V V vector_fmul_add 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
d120ab5b91 lavu/floatdsp: RISC-V V vector_dmac_scalar 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
c3db27ba95 lavu/floatdsp: RISC-V V vector_fmac_scalar 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
da169a210d lavu/floatdsp: RISC-V V vector_dmul 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
7058af9969 lavu/floatdsp: RISC-V V vector_fmul 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
89b7ec65a8 lavu/floatdsp: RISC-V V vector_dmul_scalar 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
a6c10d05fe lavu/floatdsp: RISC-V V vector_fmul_scalar
This is based on existing code from the VLC git tree with two minor
changes to account for the different function prototypes.
2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
39357cad37 lavu/riscv: fallback macros for SH{1, 2, 3}ADD
Those mnemonics require the very latest binutils release at the time of
writing. These macros provide seamless backward compatibility.
2022-09-27 13:19:52 +02:00
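For readers unfamiliar with the mnemonics in 39357cad37: the Zba instructions SH1ADD, SH2ADD and SH3ADD compute `rd = rs2 + (rs1 << N)` for N = 1, 2, 3, so a fallback can be expressed as a shift followed by an add. The C model below only describes that arithmetic; `shadd` is a hypothetical name, and the actual macros are assembler macros whose exact expansion is not shown in this log.

```c
#include <stdint.h>

/* C model of SH{1,2,3}ADD rd, rs1, rs2 on a 64-bit register file:
 * rd = rs2 + (rs1 << n), with wraparound as in the hardware.
 * n is expected to be 1, 2 or 3. */
static uint64_t shadd(unsigned n, uint64_t rs1, uint64_t rs2)
{
    return rs2 + (rs1 << n);
}
```

A typical use is array indexing: `shadd(2, i, base)` gives the address of element i in an array of 4-byte values starting at base.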
Rémi Denis-Courmont
0c0a3deb18 lavu/cpu: CPU flags for the RISC-V Vector extension
RVV defines a total of 12 different extensions, including:

- 5 different instruction subsets:
  - Zve32x: 8-, 16- and 32-bit integers,
  - Zve32f: Zve32x plus single precision floats,
  - Zve64x: Zve32x plus 64-bit integers,
  - Zve64f: Zve32f plus Zve64x,
  - Zve64d: Zve64f plus double precision floats.

- 6 different vector lengths:
  - Zvl32b (embedded only),
  - Zvl64b (embedded only),
  - Zvl128b,
  - Zvl256b,
  - Zvl512b,
  - Zvl1024b,

- and the V extension proper: equivalent to Zve64f and Zvl128b.

In total, there are 6 different possible sets of supported instructions
(including the empty set), but for convenience we allocate one bit for
each type sets: up-to-32-bit ints (RVV_I32), floats (RVV_F32),
64-bit ints (RVV_I64) and doubles (RVV_F64).

When the vector size is needed, it can be retrieved by reading the
unprivileged read-only vlenb CSR. This should probably be a separate
helper macro if needed at a later point.
2022-09-27 13:19:52 +02:00
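The mapping from the five Zve* subsets to the four flag bits described in 0c0a3deb18 can be sketched as a small lookup. The enum names and bit positions below are hypothetical and are not the values used in libavutil; only the subset relationships are taken from the commit message.

```c
/* Hypothetical flag bits, one per type set. */
enum {
    FLAG_RVV_I32 = 1 << 0, /* 8-, 16- and 32-bit integers */
    FLAG_RVV_F32 = 1 << 1, /* single precision floats */
    FLAG_RVV_I64 = 1 << 2, /* 64-bit integers */
    FLAG_RVV_F64 = 1 << 3, /* double precision floats */
};

/* Hypothetical identifiers for the instruction subsets. */
typedef enum {
    ZVE_NONE, ZVE32X, ZVE32F, ZVE64X, ZVE64F, ZVE64D
} ZveSubset;

/* Expand a subset into the flag bits it implies. */
static unsigned rvv_flags(ZveSubset s)
{
    switch (s) {
    case ZVE32X: return FLAG_RVV_I32;
    case ZVE32F: return FLAG_RVV_I32 | FLAG_RVV_F32;
    case ZVE64X: return FLAG_RVV_I32 | FLAG_RVV_I64;
    case ZVE64F: return FLAG_RVV_I32 | FLAG_RVV_F32 | FLAG_RVV_I64;
    case ZVE64D: return FLAG_RVV_I32 | FLAG_RVV_F32 |
                        FLAG_RVV_I64 | FLAG_RVV_F64;
    default:     return 0; /* no vector support */
    }
}
```

Four bits could encode 16 combinations, but only the six returned here (including the empty set) correspond to a defined subset, which matches the count given in the commit message.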
Rémi Denis-Courmont
746f1ff36a lavu/riscv: initial common header for assembler macros 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
b95e2fbd85 lavu/cpu: detect RISC-V base extensions
This introduces compile-time and run-time CPU detection on RISC-V. In
practice, I doubt that FFmpeg will ever see a RISC-V CPU without all of
I, F and D extensions, and if it does, it probably won't have run-time
detection. So the flags are essentially always set.

But as things stand, checkasm wants them that way. Compare the ARMV8
flag on AArch64. We are nowhere near running short on CPU flag bits.
2022-09-27 13:19:52 +02:00
Andreas Rheinhardt
8be6552aa4 avutil/pixdesc: Add av_chroma_location_(enum_to_pos|pos_to_enum)
They are intended as replacements for avcodec_enum_to_chroma_pos()
and avcodec_chroma_pos_to_enum().

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-26 03:02:25 +02:00
Paul B Mahol
7bb0afc245 avutil: add RGBA single-float precision packed formats 2022-09-25 18:34:48 +02:00
Paul B Mahol
63bb6d6a9b avutil: add RGB single-precision float formats 2022-09-25 18:34:48 +02:00
Lynne
f21899db7d
x86/tx_float: enable AVX-only split-radix FFT codelets
Sandy Bridge, Ivy Bridge and Bulldozer cores don't support FMA3.
2022-09-24 04:16:55 +02:00
James Almer
d2f482965f x86/tx_float: fix some symbol names
Should fix compilation on MacOS

Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-23 18:53:05 -03:00
James Almer
0d8f43c74d x86/tx_float: change a condition in a preprocessor check
Fixes compilation with yasm.

Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-23 16:05:07 -03:00
James Almer
750f378bec x86/tx_float: add missing preprocessor wrapper for AVX2 functions
Fixes compilation with old assemblers.

Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-23 15:15:20 -03:00
Lynne
e7a987d7c9
lavu/tx: remove special -1 inverted lookup mode
It was somewhat hacky and unnecessary.
2022-09-23 12:35:28 +02:00
Lynne
74e8541bab
x86/tx_float: generalize iMDCT
To support non-aligned buffers during the post-transform step, just iterate
backwards over the array.

This allows using the 15xN-point FFT, with which the speed is 2.1 times
faster than our old libavcodec implementation.
2022-09-23 12:35:28 +02:00
Lynne
ace42cf581
x86/tx_float: add 15xN PFA FFT AVX SIMD
~4x faster than the C version.
The shuffles in the 15pt dim1 are seriously expensive. Not happy with it,
but I'm content.

Can be easily converted to pure AVX by removing all vpermpd/vpermps
instructions.
2022-09-23 12:35:27 +02:00
Lynne
3241e9225c
x86/tx_float: adjust internal ASM call ABI again
There are many ways to go about it, and this one seems optimal for both
MDCTs and PFA FFTs without requiring excessive instructions or stack usage.
2022-09-23 12:33:35 +02:00
Lynne
7e7baf8ab8
lavu/tx: do not steal lookup tables of subcontexts in the iMDCT
As it happens, some still need their contexts.
2022-09-23 12:33:31 +02:00
James Almer
05cff214b9 avutil/channel_layout: mention how the API user should treat channel orders it does not understand
In case new orders are added in the future, existing library users can still
use the layout simply by ignoring everything but the channel count in it, so
make this explicit.

Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-22 10:22:19 -03:00
Andreas Rheinhardt
187cd27832 avutil/dict: Error out in case of key == NULL
Up until now, using NULL as key in av_dict_get() on a non-empty
AVDictionary would crash; using NULL as key in av_dict_set()
would also crash for a non-empty AVDictionary unless AV_DICT_MULTIKEY
was set; in case the dictionary was initially empty or AV_DICT_MULTIKEY
was set, it was even possible for av_dict_set() to succeed when
adding a NULL key, namely when one uses a value != NULL and
the AV_DICT_DONT_STRDUP_VAL flag. Using av_dict_get() on such
an AVDictionary will usually lead to crashes, though.

Fix this by actually checking for key in both functions; error out
if they are NULL.
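
A minimal sketch of the kind of guard described above, written against a hypothetical toy dictionary rather than the real AVDictionary. The names `ToyDict`, `toy_dict_get`, `toy_dict_set` and the `-1` error value are assumptions made for illustration only.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy dictionary; NOT the real AVDictionary layout. */
#define TOY_MAX_ENTRIES 8

typedef struct ToyEntry { const char *key; const char *value; } ToyEntry;
typedef struct ToyDict  { ToyEntry entries[TOY_MAX_ENTRIES]; int count; } ToyDict;

/* Returns the value stored for key, or NULL if key is NULL or absent. */
static const char *toy_dict_get(const ToyDict *d, const char *key)
{
    if (!d || !key)              /* reject NULL keys before any strcmp() */
        return NULL;
    for (int i = 0; i < d->count; i++)
        if (!strcmp(d->entries[i].key, key))
            return d->entries[i].value;
    return NULL;
}

/* Returns 0 on success, -1 on a NULL key or a full table. */
static int toy_dict_set(ToyDict *d, const char *key, const char *value)
{
    if (!d || !key)              /* error out instead of storing a NULL key */
        return -1;
    for (int i = 0; i < d->count; i++) {
        if (!strcmp(d->entries[i].key, key)) {
            d->entries[i].value = value;
            return 0;
        }
    }
    if (d->count == TOY_MAX_ENTRIES)
        return -1;
    d->entries[d->count].key   = key;
    d->entries[d->count].value = value;
    d->count++;
    return 0;
}
```

With the check in both functions, a NULL key can neither enter the table nor reach strcmp() during a lookup.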

While just at it, also stop relying on av_strdup(NULL) to return NULL
in av_dict_set().

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-19 23:39:58 +02:00
Lynne
4ba68639ca
x86/tx_float: add asm call versions of the 2pt and 4pt transforms
Verified to be working.
2022-09-19 06:01:06 +02:00
Lynne
892548e6a1
x86/tx_float: fully support 128bit regs in LOAD64_LUT
The gather path didn't support 128bit registers.
It's not faster on Zen 3, but it's here for completeness.
2022-09-19 06:01:04 +02:00
Lynne
af42bb3d61
x86/tx_float: simplify and describe the intra-asm call convention 2022-09-19 06:01:02 +02:00
Philip Langdale
ed83a3a5bd lavu/pixdesc: favour formats where depth and subsampling exactly match
Since introducing the various packed formats used by VAAPI (and p012),
we've noticed that there's actually a gap in how
av_find_best_pix_fmt_of_2 works. It doesn't actually assign any value
to having the same bit depth as the source format, when comparing
against formats with a higher bit depth. This usually doesn't matter,
because av_get_padded_bits_per_pixel() will account for it.

However, as many of these formats use padding internally, we find that
av_get_padded_bits_per_pixel() actually returns the same value for the
10 bit, 12 bit, 16 bit flavours, etc. In these tied situations, we end
up just picking the first of the two provided formats, even if the
second one should be preferred because it matches the actual bit depth.

This bug already existed if you tried to compare yuv420p10 against p016
and p010, for example, but it simply hadn't come up before so we never
noticed.

But now, we actually got a situation in the VAAPI VP9 decoder where it
offers both p010 and p012 because Profile 3 could be either depth and
ends up picking p012 for 10 bit content due to the ordering of the
testing.

In addition, in the process of testing the fix, I realised we have the
same gap when it comes to chroma subsampling - we do not favour a
format that has exactly the same subsampling vs one with less
subsampling when all else is equal.

To fix this, I'm introducing a small score penalty if the bit depth or
subsampling doesn't exactly match the source format. This will break
the tie in favour of the format with the exact match, but not offset
any of the other scoring penalties we already have.
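
The tie-break can be illustrated with a toy scoring function. This is only a sketch under assumed numbers: `ToyFmt`, `toy_score`, `toy_find_best_of_2` and the penalty values are hypothetical and are not the scoring used in libavutil.

```c
#include <limits.h>

/* Hypothetical format descriptor: component depth and padded bits per pixel. */
typedef struct ToyFmt { const char *name; int depth; int padded_bpp; } ToyFmt;

/* Higher score is better. The penalties are illustrative values only. */
static int toy_score(const ToyFmt *src, const ToyFmt *dst)
{
    int score = INT_MAX - 1;
    if (dst->depth < src->depth)
        score -= 65536;          /* large penalty: precision would be lost */
    else if (dst->depth != src->depth)
        score -= 1;              /* small penalty: depth is not an exact match */
    return score;
}

/* Picks the better of two candidates for the given source format. */
static const ToyFmt *toy_find_best_of_2(const ToyFmt *a, const ToyFmt *b,
                                        const ToyFmt *src)
{
    int sa = toy_score(src, a);
    int sb = toy_score(src, b);
    if (sa != sb)
        return sa > sb ? a : b;
    if (a->padded_bpp != b->padded_bpp)
        return a->padded_bpp < b->padded_bpp ? a : b;
    return a;                    /* still tied: fall back to the first one */
}
```

Because the mismatch penalty is much smaller than the loss penalty, it only decides cases that would otherwise be tied, so the result no longer depends on the order in which the two candidates are passed.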

I have added a set of tests around these formats which will fail
without this fix.
2022-09-17 15:11:13 -07:00
Rémi Denis-Courmont
6df3ad9687 lavu/riscv: fix off-by-one in bit-magnitude clip 2022-09-15 18:11:12 -03:00
Rémi Denis-Courmont
a90e5335b3 avutil/lfg: fix comment typo 2022-09-15 20:56:23 +05:30
Rémi Denis-Courmont
a5ce44f301 lavu/riscv: fix av_clip_int16
Some serious copy-paste / squash / rebase mismanipulation here.

Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-14 14:37:21 -03:00
Andreas Rheinhardt
e867a29ec1 avutil/dict: Improve appending values
When appending two values (due to AV_DICT_APPEND), the earlier code
would first zero-allocate a buffer of the required size and then
copy both parts into it via av_strlcat(). This is problematic,
as it leads to quadratic performance in case of frequent enlargements.
Fix this by using av_realloc() (which is hopefully designed to handle
such cases in a better way than simply throwing the buffer we already
have away) and by copying the string via memcpy() (after all, we already
calculated the strlen of both strings).
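
A minimal sketch of the realloc-and-memcpy approach, using the standard C `realloc()` in place of av_realloc(). The helper name `append_value` is hypothetical and is not part of the dict API.

```c
#include <stdlib.h>
#include <string.h>

/* Appends suffix to *dst, which is a heap-allocated string or NULL.
 * Returns 0 on success, -1 on allocation failure; on failure *dst is
 * left untouched and still owned by the caller. */
static int append_value(char **dst, const char *suffix)
{
    size_t old_len = *dst ? strlen(*dst) : 0;
    size_t add_len = strlen(suffix);
    char *buf = realloc(*dst, old_len + add_len + 1);

    if (!buf)
        return -1;
    /* Both lengths are already known, so copy directly to the end;
     * add_len + 1 also copies the terminating '\0'. */
    memcpy(buf + old_len, suffix, add_len + 1);
    *dst = buf;
    return 0;
}
```

Growing the existing buffer lets the allocator extend it in place where possible, and memcpy() avoids rescanning the destination string on every append.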

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-14 19:01:02 +02:00
Andreas Rheinhardt
c15dd31d2a avutil/dict: Fix memleak when using AV_DICT_APPEND
If a key already exists in an AVDictionary and the AV_DICT_APPEND flag
is set, the old entry is at first discarded from the dictionary, but
a pointer to the value is kept. Later on enough memory to store the
appended string is allocated; should this allocation fail, the old string
is not freed and hence leaks. This commit changes this by moving
creating the combined value to an earlier point in the function,
which also ensures that the AVDictionary is unchanged in case of errors.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-14 19:00:44 +02:00
Andreas Rheinhardt
f976ed7fcf avutil/dict: Avoid check whose result is known in advance
We know that an AVDictionary is not empty if we have just added
an entry to it, so only check for it being empty on the branch
that does not do so.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-14 15:03:59 +02:00
Andreas Rheinhardt
e402bd65b1 Revert "avcodec/loongarch/h264chroma, vc1dsp_lasx: Add wrapper for __lasx_xvldx"
This reverts commit 2c8dc7e953.
The loongarch headers have been fixed, so that this wrapper
is no longer necessary.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-14 14:09:26 +02:00
Andreas Rheinhardt
1234df7501 Revert "avcodec/loongarch: Add wrapper for __lsx_vldx"
This reverts commit 6c9a60ada4.
The loongarch headers have been fixed, so that this workaround
is no longer necessary.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-14 14:09:26 +02:00
Rémi Denis-Courmont
c177108ae1 lavu/riscv: add <intmath.h> optimisations
This provides some micro-optimisations for signed integer clipping, and
support for bit weight with the Zbb extension.
2022-09-13 16:50:43 -03:00
Rémi Denis-Courmont
df2057041b lavu/riscv: byte-swap operations
If the target supports the Basic bit-manipulation (Zbb) extension, then
the REV8 instruction is available to reverse byte order.

Note that this instruction only exists at the "XLEN" register size,
so we need to right shift the result down to the data width.
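
The shift can be sketched in portable C. Here `rev8_xlen64` is a hypothetical stand-in that emulates REV8 on a 64-bit register (XLEN = 64 is an assumption); it is not the inline assembly from the patch.

```c
#include <stdint.h>

/* Emulates REV8 on a 64-bit register: reverses all eight bytes. */
static uint64_t rev8_xlen64(uint64_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (x & 0xFF);   /* move the lowest byte to the top */
        x >>= 8;
    }
    return r;
}

/* 32-bit swap: the reversed bytes land in the upper half, so shift by 64 - 32. */
static uint32_t bswap32_via_rev8(uint32_t x)
{
    return (uint32_t)(rev8_xlen64(x) >> 32);
}

/* 16-bit swap: shift by 64 - 16. */
static uint16_t bswap16_via_rev8(uint16_t x)
{
    return (uint16_t)(rev8_xlen64(x) >> 48);
}
```

In general the shift amount is XLEN minus the data width, so a 64-bit swap on a 64-bit target needs no shift at all.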

If Zbb is not supported, then this patchset does nothing. Support for
run-time detection is left for the future. Currently, there are no
bits in auxv/ELF HWCAP for Z-extensions, so there are no clean ways to
do this.
2022-09-13 16:50:43 -03:00
Rémi Denis-Courmont
d808070547 lavu/riscv: AV_READ_TIME cycle counter
This uses the architected RISC-V 64-bit cycle counter from the
RISC-V unprivileged instruction set.

In 64-bit and 128-bit, this is a straightforward CSR read.
In 32-bit mode, the 64-bit value is exposed as two CSRs, which
cannot be read atomically, so a loop is necessary to detect and fix up
the race condition where the bottom half wraps exactly between the two
reads.
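
The retry loop can be sketched without any RISC-V specific code. The callbacks below are hypothetical stand-ins for the two 32-bit CSR reads, and `SimCounter` is a simulated counter added only so the sketch can run on any machine.

```c
#include <stdint.h>

/* Hypothetical stand-in for reading one 32-bit half of the counter. */
typedef uint32_t (*read_half_fn)(void *opaque);

/* Reads high, low, then high again; retries if the high half changed,
 * which means the low half wrapped between the reads. */
static uint64_t read_counter64(read_half_fn read_hi, read_half_fn read_lo,
                               void *opaque)
{
    uint32_t hi, lo, hi2;
    do {
        hi  = read_hi(opaque);
        lo  = read_lo(opaque);
        hi2 = read_hi(opaque);
    } while (hi != hi2);
    return ((uint64_t)hi << 32) | lo;
}

/* Simulated counter: advances by step after every half read. */
typedef struct SimCounter { uint64_t value; uint64_t step; } SimCounter;

static uint32_t sim_read_hi(void *opaque)
{
    SimCounter *c = opaque;
    uint32_t v = (uint32_t)(c->value >> 32);
    c->value += c->step;
    return v;
}

static uint32_t sim_read_lo(void *opaque)
{
    SimCounter *c = opaque;
    uint32_t v = (uint32_t)c->value;
    c->value += c->step;
    return v;
}
```

Starting the simulated counter just below a 32-bit boundary forces one retry; without the second read of the high half, the old high half would be combined with the already wrapped low half.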
2022-09-13 16:50:43 -03:00
James Almer
bda3a9faf4 x86/float_dsp: use three operand form for some instructions
Fixes compilation with old yasm

Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-13 13:50:09 -03:00
Paul B Mahol
72acff9f59 avutil/x86/float_dsp: add fma3 for scalarproduct 2022-09-13 17:43:15 +02:00
Andreas Rheinhardt
29c4c0886d avutil/x86/intreadwrite: Add ability to detect whether MMX code is used
It can be used to call emms_c() only when needed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-11 21:08:04 +02:00
Lynne
f1b35fc8f0
lavu/tx: remove av_cold from table definitions
How did this get here?
2022-09-11 03:18:40 +02:00
Lynne
c92edd969a
lavu/tx: rotate 3 & 15-point exptabs
This just inverts their signs. Simplifies SIMD.
2022-09-10 02:37:17 +02:00
Lynne
51172223fd
lavu/tx: generalize MDCTs
The same code can perform any-length MDCTs with minimal changes.
2022-09-10 02:37:16 +02:00
Lynne
645a1f4422
lavu/tx: add the inplace flag to PFA FFTs
They support in-place, because they have to use a temporary buffer.
2022-09-10 02:37:14 +02:00
Lynne
8c283e8fe6
lavu/tx: propagate the codelet flags into the context
The field is documented as a combination of both.
2022-09-10 02:37:11 +02:00
Haihao Xiang
b7dbffe698 lavu/hwcontext_qsv: add support for AV_PIX_FMT_VUYX on Linux
AV_PIX_FMT_VUYX is used for 8bit 4:4:4 content in FFmpeg VAAPI, so
AV_PIX_FMT_VUYX should be used for 8bit 4:4:4 content in FFmpeg QSV too
because QSV is based on VAAPI on Linux. However the SDK only declares
support for AYUV and does nothing with the alpha, so this commit fudged
a mapping between AV_PIX_FMT_VUYX and MFX_FOURCC_AYUV.

Reviewed-by: Philip Langdale <philipl@overt.org>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2022-09-07 14:04:12 +08:00
James Almer
f4097e4c1f x86/tx_float: add missing check for AVX2
Fixes compilation with old yasm.

Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-06 14:06:33 -03:00
James Almer
74f5fb6db8 x86/tx_float: set all operands for shufps
Fixes compilation with AVX2 enabled yasm.

Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-06 14:06:03 -03:00
Martin Storsjö
da5f7799a0 slicethread: Limit the automatic number of threads to 16
This matches a similar cap on the number of automatic threads
in libavcodec/pthread_slice.c.

On systems with lots of cores, this fixes a couple of FATE failures
in 32 bit mode (where spawning a huge number of threads runs out of
address space).

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-06 18:46:44 +03:00
Martin Storsjö
e4759fa951 x86/tx_float: Fix building for platforms with a symbol prefix
This fixes building for x86 macOS (both i386 and x86_64) and
i386 windows.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-06 18:46:39 +03:00
Lynne
a89025f74d
aarch64/tx_float: fix compilation
Forgot to add the new function arguments.
2022-09-06 05:42:32 +02:00
Lynne
4537d9554d
x86/tx_float: implement inverse MDCT AVX2 assembly
This commit implements an iMDCT in pure assembly.

This is capable of processing any mod-8 transforms, rather than just
power of two, but since power of two is all we have assembly for
currently, that's what's supported.
It would really benefit if we could somehow use the C code to decide
which function to jump into, but exposing function labels from assembly
into C is anything but easy.
The post-transform loop could probably be improved.

This was somewhat annoying to write, as we must support arbitrary
strides during runtime. There's a fast branch for stride == 4 bytes
and a slower one which uses vgatherdps.

Zen 3 benchmarks for stride == 4 for old (av_imdct_half) vs new (av_tx):

128pt:
   2811 decicycles in         av_tx (imdct),16775916 runs,   1300 skips
   3082 decicycles in         av_imdct_half,16776751 runs,    465 skips

256pt:
   4920 decicycles in         av_tx (imdct),16775820 runs,   1396 skips
   5378 decicycles in         av_imdct_half,16776411 runs,    805 skips

512pt:
   9668 decicycles in         av_tx (imdct),16775774 runs,   1442 skips
  10626 decicycles in         av_imdct_half,16775647 runs,   1569 skips

1024pt:
  19812 decicycles in         av_tx (imdct),16777144 runs,     72 skips
  23036 decicycles in         av_imdct_half,16777167 runs,     49 skips
2022-09-06 04:21:46 +02:00
Lynne
2425d5cd7e
x86/tx_float: add support for calling assembly functions from assembly
Needed for the next patch.
We get this for the extremely small cost of a branch on _ns functions,
which wouldn't be used anyway with assembly.
2022-09-06 04:21:41 +02:00
Anton Khirnov
ea5b375e0e lavu/fifo: clarify interaction of AV_FIFO_FLAG_AUTO_GROW with av_fifo_write() 2022-09-05 08:59:36 +02:00
Anton Khirnov
8728808b3e lavu/fifo: clarify interaction of AV_FIFO_FLAG_AUTO_GROW with av_fifo_can_write() 2022-09-05 08:59:14 +02:00
Anton Khirnov
c9b6fd27bf lavu/fifo: add the header to its own doxy group
Also, drop mentions of it being a circular buffer, as this is an
internal implementation detail that should be invisible to the caller.
2022-09-05 08:58:51 +02:00
Andreas Rheinhardt
f89949afed avutil/tests/.gitignore: Add channel_layout testtool
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-05 02:00:08 +02:00
Philip Langdale
2f9b8bbd1f lavu/hwcontext_vulkan: support mapping VUYX, P012, and XV36
If we want to be able to map between VAAPI and Vulkan (to do Vulkan
filtering), we need to have matching formats on each side.

The mappings here are not exact. In the same way that P010 is still
mapped to full 16 bit formats, P012 has to be mapped that way as well.

Similarly, VUYX has to be mapped to an alpha-equipped format, and XV36
has to be mapped to a fully 16bit alpha-equipped format.

While Vulkan seems to fundamentally lack formats with an undefined,
but physically present, alpha channel, it does have 10X6 and 12X4
formats that you could imagine using for P010, P012 and XV36, but these
formats don't support the STORAGE usage flag. Today, hwcontext_vulkan
requires all formats to be storable because it wants to be able to use
them to create writable images. Until that changes, which might happen,
we have to restrict the set of formats we use.

Finally, when mapping a Vulkan image back to vaapi, I observed that
the VK_FORMAT_R16G16B16A16_UNORM format we have to use for XV36 going
to Vulkan is mapped to Y416 when going to vaapi (which makes sense as
it's the exact matching format) so I had to add an entry for it even
though we don't use it directly.
2022-09-03 16:19:40 -07:00
Philip Langdale
b982dd0d83 lavc/vaapi: Add support for remaining 10/12bit profiles
With the necessary pixel formats defined, we can now expose support for
the remaining 10/12bit combinations that VAAPI can handle.

Specifically, we are adding support for:

* HEVC
** 12bit 420
** 10bit 422
** 12bit 422
** 10bit 444
** 12bit 444

* VP9
** 10bit 444
** 12bit 444

These obviously require actual hardware support to be usable, but where
that exists, it is now enabled.

Note that unlike YUVA/YUVX, the Intel driver does not formally expose
support for the alphaless formats XV30 and XV36, and so we are
implicitly discarding the alpha from the decoder and passing undefined
values for the alpha to the encoder. If a future encoder iteration was
to actually do something with the alpha bits, we would need to use a
formal alpha capable format or the encoder would need to explicitly
accept the alphaless format.
2022-09-03 16:19:40 -07:00
Philip Langdale
d75c4693fe lavu/pixfmt: Add P012, Y212, XV30, and XV36 formats
These are the formats we want/need to use when dealing with the Intel
VAAPI decoder for 12bit 4:2:0, 12bit 4:2:2, 10bit 4:4:4 and 12bit 4:4:4
respectively.

As with the already supported Y210 and YUVX (XVUY) formats, they are
based on formats Microsoft picked as their preferred 4:2:2 and 4:4:4
video formats, and Intel ran with it.

P012 and Y212 are simply an extension of 10 bit formats to say 12 bits
will be used, with 4 unused bits instead of 6.
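
As a small sketch of that layout, assuming the used bits sit at the most significant end of each 16-bit word (as stated for P012 further down in this message); `pack_msb` and `unpack_msb` are hypothetical helper names.

```c
#include <stdint.h>

/* Stores a depth-bit sample in the most significant bits of a 16-bit word,
 * leaving 16 - depth unused low bits (6 for 10 bit, 4 for 12 bit). */
static uint16_t pack_msb(unsigned sample, int depth)
{
    return (uint16_t)(sample << (16 - depth));
}

/* Recovers the sample by discarding the unused low bits. */
static unsigned unpack_msb(uint16_t word, int depth)
{
    return (unsigned)word >> (16 - depth);
}
```

The same two helpers cover both cases; only the depth argument changes between the 10 bit and 12 bit formats.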

XV30, and XV36, as exotic as they sound, are variants of Y410 and Y412
where the alpha channel is left formally undefined. We prefer these
over the alpha versions because the hardware cannot actually do
anything with the alpha channel and respecting it is just overhead.

Y412/XV36 is a normal looking packed 4 channel format where each
channel is 16bits wide but only the 12msb are used (like P012).

Y410/XV30 packs three 10bit channels in 32bits with 2bits of alpha,
like A/X2RGB10 style formats. This annoying layout forced me to define
the BE version as a bitstream format. It seems like our pixdesc
infrastructure can handle the LE version being byte-defined, but not
when it's reversed. If there's a better way to handle this, please
let me know. Our existing X2 formats all have the 2 bits at the MSB
end, but this format places them at the LSB end and that seems to be
the root of the problem.
2022-09-03 16:19:40 -07:00
Rémi Denis-Courmont
620e6e1487 arm: relax byte-swap assembler constraints
There are no particular reasons to force the compiler to use the same
register as output and input operand. This forces an extra MOV
instruction if the input value needs to be reused after the swap.

In most cases, this makes no difference, as the compiler will select
the same register for both operands either way.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-03 23:54:05 +03:00
Rémi Denis-Courmont
164021423a aarch64: relax byte-swap assembler constraints
There are no particular reasons to force the compiler to use the same
register as output and input operand. This forces an extra MOV
instruction if the input value needs to be reused after the swap.

In most cases, this makes no difference, as the compiler will select
the same register for both operands either way.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-03 23:54:05 +03:00
Andreas Rheinhardt
73fada029c avcodec/codec_internal: Add macros for update_thread_context(_for_user)
It reduces typing: Before this patch, there were 11 callbacks
that exceeded the 80 char line length limit; now there are zero.
It also allows removing ONLY_IF_THREADS_ENABLED() in
libavutil/internal.h.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-03 15:42:57 +02:00
Andreas Rheinhardt
48286d4d98 avcodec/codec_internal: Add macro to set AVCodec.long_name
It reduces typing: Before this patch, there were 105 codecs
whose long_name-definition exceeded the 80 char line length
limit. Now there are only nine of them.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-03 15:42:57 +02:00
Andreas Rheinhardt
dea9744560 avutil/file: Properly deprecate av_tempfile()
It has been deprecated in b4f59beeb4,
but the attribute_deprecated was not set and there was no entry
in APIchanges. This commit adds these and schedules it for removal.
Given that the reason behind the deprecation is exactly the same
as in av_fopen_utf8(), reuse its FF_API_AV_FOPEN_UTF8.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-03 15:42:40 +02:00
Andreas Rheinhardt
72c601e0f7 avutil/internal: Move avpriv-file API to a header of its own
It is not used by the large majority of files that include
lavu/internal.h.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-03 15:41:44 +02:00
Andreas Rheinhardt
04b7217872 avutil/dict: Move avpriv_dict_set_timestamp() to a header of its own
It is used almost nowhere, so it needn't be auto-included
almost everywhere.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-03 15:41:44 +02:00
Andreas Rheinhardt
26325cceb0 avutil/internal: Remove unused FF_SYMVER
They are unused since d63443b968.
Furthermore, they were always in the wrong header:
libavutil/internal.h is auto-included almost everywhere, but
FF_SYMVER would only ever be used at a few places, so a proper
header of its own would be appropriate for it.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-03 15:41:44 +02:00
Andreas Rheinhardt
5b0856d2b9 avutil/internal: Remove unused ff_rint64_clip()
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-03 15:41:44 +02:00
Martin Storsjö
e4ac156b7c libavcodec: Set hidden visibility on global symbols accessed from AArch64 assembly
The AArch64 assembly accesses those symbols directly, without
indirection via e.g. the GOT on ELF. In order for this not to
require text relocations, those symbols need to be resolved fully
at link time, i.e. those symbols can't be interposable.

Normally, so far, this is achieved when linking shared libraries
in two ways; we have a version script (libavcodec/libavcodec.v) which
marks all symbols that don't start with av* as local. Additionally,
we try to add -Wl,-Bsymbolic to the linker options if supported,
making sure that such symbol references are resolved fully at link
time, instead of making them interposable.

When the libavcodec static library is linked into another shared
library, there's no guarantee that it uses similar options (even though
that would be favourable), which would end up requiring text relocations
in the AArch64 assembly.

Explicitly mark the symbols that are accessed from AArch64 assembly
as hidden, so that they are resolved fully at link time even without
the version script and -Wl,-Bsymbolic.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-02 23:13:29 +03:00
Martin Storsjö
0dd8fe6f4b arm: Check the build time constants in av_clip_*intp2
This fixes building for arm targets with optimizations disabled.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-02 23:12:26 +03:00
Philip Langdale
caf26a8a12 lavc/vaapi: Switch preferred 8bit 444 format to VUYX
As vaapi doesn't actually do anything useful with the alpha channel,
and we have an alphaless format available, let's use that instead.

The changes here are mostly 1:1 switching, but do note the explicit
change in the number of declared channels from 4 to 3 to reflect that
the alpha is being ignored.
2022-08-25 19:04:10 -07:00
Philip Langdale
cc5a5c9860 lavu/pixfmt: Introduce VUYX format
This is the alphaless version of VUYA that I introduced recently. After
further discussion and noting that the Intel vaapi driver explicitly
lists XYUV as a supported format for encoding and decoding 8bit 444
content, we decided to switch our usage and avoid the overhead of
having a declared alpha channel around.

Note that I am not removing VUYA, as this turned out to have another
use, which was to replace the need for v408enc/dec when dealing with
the format.

The vaapi switching will happen in the next change
2022-08-25 19:02:49 -07:00
Lynne
f932b89ea3
lavu/tx: implement aarch64 NEON SIMD FFT
The fastest fast Fourier transform in not just the west, but the world,
now for the most popular toy ISA.

On a high level, it follows the design of the AVX2 version closely,
with the exception that the input is slightly less permuted as we don't have
to do lane switching with the input on double 4pt and 8pt.

On a low level, the lack of subadd/addsub instructions REALLY penalizes
any attempt at writing an FFT. That single register matters a lot,
and reloading it simply takes unacceptably long.
In x86 land, vendors would've noticed developers need this.
In ARM land, you get a badly designed complex multiplication instruction
we cannot use, that's not present on 95% of devices. Because only
compilers matter, right?

Future optimization options are very few, perhaps better register
management to use more ld1/st1s.

All timings below are in cycles:
A53:
Length | C           | New (lavu)  | Old (lavc)  | FFTW
------ |-------------|-------------|-------------|-----
4      |         842 | 420         | 1210        | 1460
8      |        1538 | 1020        | 1850        | 2520
16     |        3717 | 1900        | 3700        | 3990
32     |        9156 | 4070        | 8289        | 8860
64     |       21160 | 9931        | 18600       | 19625
128    |       49180 | 23278       | 41922       | 41922
256    |      112073 | 53876       | 93202       | 101092
512    |      252864 | 122884      | 205897      | 207868
1024   |      560512 | 278322      | 458071      | 453053
2048   |     1295402 | 775835      | 1038205     | 1020265
4096   |     3281263 | 2021221     | 2409718     | 2577554
8192   |     8577845 | 4780526     | 5673041     | 6802722

Apple M1
New  - Total for len 512 reps 2097152 = 1.459141 s
Old  - Total for len 512 reps 2097152 = 2.251344 s
FFTW - Total for len 512 reps 2097152 = 1.868429 s

New  - Total for len 1024 reps 4194304 = 6.490080 s
Old  - Total for len 1024 reps 4194304 = 9.604949 s
FFTW - Total for len 1024 reps 4194304 = 7.889281 s

New  - Total for len 16384 reps 262144 = 10.374001 s
Old  - Total for len 16384 reps 262144 = 15.266713 s
FFTW - Total for len 16384 reps 262144 = 12.341745 s

New  - Total for len 65536 reps 8192 = 1.769812 s
Old  - Total for len 65536 reps 8192 = 4.209413 s
FFTW - Total for len 65536 reps 8192 = 3.012365 s

New  - Total for len 131072 reps 4096 = 1.942836 s
Old  - Segfaults
FFTW - Total for len 131072 reps 4096 = 3.713713 s

Thanks to wbs for some simplifications, assembler fixes and a review
and to jannau for giving it a look.
2022-08-25 17:40:28 +02:00
Andreas Rheinhardt
0bb0c26799 avutil/mem_internal: Fix headers
Including avassert.h is unnecessary since commit
786be70e28.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-08-24 03:43:52 +02:00
Timo Rothenpieler
ef2c2a2220 avutil/half2float: use native _Float16 if available
_Float16 support was available on arm/aarch64 for a while, and with gcc
12 was enabled on x86 as long as SSE2 is supported.

If the target arch supports f16c, gcc emits fairly efficient assembly,
taking advantage of it. This is the case on x86-64-v3 or higher.
Same goes on arm, which has native float16 support.
On x86, without f16c, it emulates it in software using sse2 instructions.

This has been shown to perform rather poorly:

_Float16 full SSE2 emulation:
frame=50074 fps=848 q=-0.0 size=N/A time=00:33:22.96 bitrate=N/A speed=33.9x

_Float16 f16c accelerated (Zen2, --cpu=znver2):
frame=50636 fps=1965 q=-0.0 Lsize=N/A time=00:33:45.40 bitrate=N/A speed=78.6x

classic half2float full software implementation:
frame=49926 fps=1605 q=-0.0 Lsize=N/A time=00:33:17.00 bitrate=N/A speed=64.2x

Hence an additional check was introduced, that only enables use of
_Float16 on x86 if f16c is being utilized.

On aarch64, a similar uplift in performance is seen:

RPi4 half2float full software implementation:
frame= 6088 fps=126 q=-0.0 Lsize=N/A time=00:04:03.48 bitrate=N/A speed=5.06x

RPi4 _Float16:
frame= 6103 fps=158 q=-0.0 Lsize=N/A time=00:04:04.08 bitrate=N/A speed=6.32x

Since arm/aarch64 always natively support 16 bit floats, it can always
be considered fast there.

I'm not aware of any additional platforms that currently support
_Float16. And if there are, they should be considered non-fast until
proven fast.
2022-08-19 22:09:36 +02:00
Timo Rothenpieler
6dc79f1d04 avutil/half2float: move non-inline init code out of header 2022-08-19 22:09:36 +02:00
Timo Rothenpieler
f3fb528cd5 avutil/half2float: move tables to header-internal structs
Having to put the knowledge of the size of those arrays into a multitude
of places is rather smelly.
2022-08-19 22:09:36 +02:00
Timo Rothenpieler
cb8ad005bb avutil/half2float: adjust conversion of NaN
IEEE-754 differentiates two different kinds of NaNs.
Quiet and Signaling ones. They are differentiated by the MSB of the
mantissa.

For whatever reason, actual hardware conversion of half to single always
sets the signaling bit to 1 if the mantissa is != 0, and to 0 if it's 0.
So our code has to follow suit or fate-testing hardware float16 will be
impossible.
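
A sketch of a bit-level half to single conversion that follows this rule. It is a plain computational version written for illustration, not the table-based code in avutil, and `half_to_float_bits` is a hypothetical name.

```c
#include <stdint.h>

/* Converts IEEE-754 binary16 bits to binary32 bits. */
static uint32_t half_to_float_bits(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000) << 16;
    uint32_t exp  = (h >> 10) & 0x1F;
    uint32_t man  = h & 0x3FF;

    if (exp == 0x1F) {                       /* infinity or NaN */
        uint32_t out = sign | 0x7F800000u | (man << 13);
        if (man)
            out |= 0x00400000u;              /* NaN: set the mantissa MSB */
        return out;                          /* man == 0 stays infinity */
    }
    if (exp == 0) {
        int shift = -1;
        if (!man)
            return sign;                     /* signed zero */
        do {                                 /* subnormal: normalize it */
            shift++;
            man <<= 1;
        } while (!(man & 0x400));
        man &= 0x3FF;
        return sign | ((uint32_t)(112 - shift) << 23) | (man << 13);
    }
    return sign | ((exp + 112) << 23) | (man << 13);   /* rebias 15 -> 127 */
}
```

With this rule every half-precision NaN converts to a NaN with the mantissa MSB set, while a zero mantissa still maps to infinity.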
2022-08-19 22:09:36 +02:00
Timo Rothenpieler
b42925264a avutil: move half-precision float helper to avutil 2022-08-19 22:09:36 +02:00
Lynne
ae66a9db7b
lavu/tx: optimize and simplify inverse MDCTs
Convert the input from a scatter to a gather instead,
which is faster and better for SIMD.
Also, add a pre-shuffled exptab version to avoid
gathering there at all. This doubles the exptab size,
but the speedup makes it worth it. In SIMD, the
exptab will likely be purged to a higher cache
anyway because of the FFT in the middle, and
the amount of loads stays identical.
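
The difference between the two access patterns can be sketched with a generic permutation. The function names and the `map` / `inv_map` arrays are hypothetical; the real transform code uses its own lookup tables.

```c
#include <stddef.h>

/* Scatter: sequential reads, permuted writes. */
static void permute_scatter(float *dst, const float *src,
                            const int *map, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[map[i]] = src[i];
}

/* Gather: permuted reads, sequential writes; takes the inverse map. */
static void permute_gather(float *dst, const float *src,
                           const int *inv_map, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[inv_map[i]];
}

/* Builds the inverse permutation so that both loops give the same output. */
static void invert_map(int *inv_map, const int *map, size_t n)
{
    for (size_t i = 0; i < n; i++)
        inv_map[map[i]] = (int)i;
}
```

Both loops produce identical output once the map is inverted. The gather form writes contiguously, which is the part that maps well onto vector stores.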

For a 960-point inverse MDCT, the speedup is 10%.

This makes it possible to write sane and fast SIMD
versions of inverse MDCTs.
2022-08-16 01:22:38 +02:00