1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-26 19:01:44 +02:00
Commit Graph

5641 Commits

Author SHA1 Message Date
Rémi Denis-Courmont
da169a210d lavu/floatdsp: RISC-V V vector_dmul 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
7058af9969 lavu/floatdsp: RISC-V V vector_fmul 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
89b7ec65a8 lavu/floatdsp: RISC-V V vector_dmul_scalar 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
a6c10d05fe lavu/floatdsp: RISC-V V vector_fmul_scalar
This is based on existing code from the VLC git tree with two minor
changes to account for the different function prototypes.
2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
39357cad37 lavu/riscv: fallback macros for SH{1, 2, 3}ADD
Those mnemonics require the very latest binutils release at the time of
writing. These macros provide seamless backward compatibility.
2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
0c0a3deb18 lavu/cpu: CPU flags for the RISC-V Vector extension
RVV defines a total of 12 different extensions, including:

- 5 different instruction subsets:
  - Zve32x: 8-, 16- and 32-bit integers,
  - Zve32f: Zve32x plus single precision floats,
  - Zve64x: Zve32x plus 64-bit integers,
  - Zve64f: Zve32f plus Zve64x,
  - Zve64d: Zve64f plus double precision floats.

- 6 different vector lengths:
  - Zvl32b (embedded only),
  - Zvl64b (embedded only),
  - Zvl128b,
  - Zvl256b,
  - Zvl512b,
  - Zvl1024b,

- and the V extension proper: equivalent to Zve64f and Zvl128b.

In total, there are 6 different possible sets of supported instructions
(including the empty set), but for convenience we allocate one bit for
each type sets: up-to-32-bit ints (RVV_I32), floats (RVV_F32),
64-bit ints (RVV_I64) and doubles (RVV_F64).

Whence the vector size is needed, it can be retrieved by reading the
unprivileged read-only vlenb CSR. This should probably be a separate
helper macro if needed at a later point.
2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
746f1ff36a lavu/riscv: initial common header for assembler macros 2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont
b95e2fbd85 lavu/cpu: detect RISC-V base extensions
This introduces compile-time and run-time CPU detection on RISC-V. In
practice, I doubt that FFmpeg will ever see a RISC-V CPU without all of
I, F and D extensions, and if it does, it probably won't have run-time
detection. So the flags are essentially always set.

But as things stand, checkasm wants them that way. Compare the ARMV8
flag on AArch64. We are nowhere near running short on CPU flag bits.
2022-09-27 13:19:52 +02:00
Andreas Rheinhardt
8be6552aa4 avutil/pixdesc: Add av_chroma_location_(enum_to_pos|pos_to_enum)
They are intended as replacements for avcodec_enum_to_chroma_pos()
and avcodec_chroma_pos_to_enum().

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-26 03:02:25 +02:00
Paul B Mahol
7bb0afc245 avutil: add RGBA single-float precision packed formats 2022-09-25 18:34:48 +02:00
Paul B Mahol
63bb6d6a9b avutil: add RGB single-precision float formats 2022-09-25 18:34:48 +02:00
Lynne
f21899db7d
x86/tx_float: enable AVX-only split-radix FFT codelets
Sandy Bridge, Ivy Bridge and Bulldozer cores don't support FMA3.
2022-09-24 04:16:55 +02:00
James Almer
d2f482965f x86/tx_float: fix some symbol names
Should fix compilation on MacOS

Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-23 18:53:05 -03:00
James Almer
0d8f43c74d x86/tx_float: change a condition in a preprocessor check
Fixes compilation with yasm.

Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-23 16:05:07 -03:00
James Almer
750f378bec x86/tx_float: add missing preprocessor wrapper for AVX2 functions
Fixes compilation with old assemblers.

Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-23 15:15:20 -03:00
Lynne
e7a987d7c9
lavu/tx: remove special -1 inverted lookup mode
It was somewhat hacky and unnecessary.
2022-09-23 12:35:28 +02:00
Lynne
74e8541bab
x86/tx_float: generalize iMDCT
To support non-aligned buffers during the post-transform step, just iterate
backwards over the array.

This allows using the 15xN-point FFT, with which the speed is 2.1 times
faster than our old libavcodec implementation.
2022-09-23 12:35:28 +02:00
Lynne
ace42cf581
x86/tx_float: add 15xN PFA FFT AVX SIMD
~4x faster than the C version.
The shuffles in the 15pt dim1 are seriously expensive. Not happy with it,
but I'm contempt.

Can be easily converted to pure AVX by removing all vpermpd/vpermps
instructions.
2022-09-23 12:35:27 +02:00
Lynne
3241e9225c
x86/tx_float: adjust internal ASM call ABI again
There are many ways to go about it, and this one seems optimal for both
MDCTs and PFA FFTs without requiring excessive instructions or stack usage.
2022-09-23 12:33:35 +02:00
Lynne
7e7baf8ab8
lavu/tx: do not steal lookup tables of subcontexts in the iMDCT
As it happens, some still need their contexts.
2022-09-23 12:33:31 +02:00
James Almer
05cff214b9 avutil/channel_layout: mention how the API user should treat channel orders it does not understand
In case new orders are added in the future, existing library users can still
use the layout simply by ignoring everything but the channel count in it, so
make this explicit.

Reviewed-by: Anton Khirnov <anton@khirnov.net>
Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-22 10:22:19 -03:00
Andreas Rheinhardt
187cd27832 avutil/dict: Error out in case of key == NULL
Up until now, using NULL as key in av_dict_get() on a non-empty
AVDictionary would crash; using NULL as key in av_dict_set()
would also crash for a non-empty AVDictionary unless AV_DICT_MULTIKEY
was set; in case the dictionary was initially empty or AV_DICT_MULTIKEY
was set, it was even possible for av_dict_set() to succeed when
adding a NULL key, namely when one uses a value != NULL and
the AV_DICT_DONT_STRDUP_VAL flag. Using av_dict_get() on such
an AVDictionary will usually lead to crashes, though.

Fix this by actually checking for key in both functions; error out
if they are NULL.

While just at it, also stop relying on av_strdup(NULL) to return NULL
in av_dict_set().

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-19 23:39:58 +02:00
Lynne
4ba68639ca
x86/tx_float: add asm call versions of the 2pt and 4pt transforms
Verified to be working.
2022-09-19 06:01:06 +02:00
Lynne
892548e6a1
x86/tx_float: fully support 128bit regs in LOAD64_LUT
The gather path didn't support 128bit registers.
It's not faster on Zen 3, but it's here for completeness.
2022-09-19 06:01:04 +02:00
Lynne
af42bb3d61
x86/tx_float: simplify and describe the intra-asm call convention 2022-09-19 06:01:02 +02:00
Philip Langdale
ed83a3a5bd lavu/pixdesc: favour formats where depth and subsampling exactly match
Since introducing the various packed formats used by VAAPI (and p012),
we've noticed that there's actually a gap in how
av_find_best_pix_fmt_of_2 works. It doesn't actually assign any value
to having the same bit depth as the source format, when comparing
against formats with a higher bit depth. This usually doesn't matter,
because av_get_padded_bits_per_pixel() will account for it.

However, as many of these formats use padding internally, we find that
av_get_padded_bits_per_pixel() actually returns the same value for the
10 bit, 12 bit, 16 bit flavours, etc. In these tied situations, we end
up just picking the first of the two provided formats, even if the
second one should be preferred because it matches the actual bit depth.

This bug already existed if you tried to compare yuv420p10 against p016
and p010, for example, but it simply hadn't come up before so we never
noticed.

But now, we actually got a situation in the VAAPI VP9 decoder where it
offers both p010 and p012 because Profile 3 could be either depth and
ends up picking p012 for 10 bit content due to the ordering of the
testing.

In addition, in the process of testing the fix, I realised we have the
same gap when it comes to chroma subsampling - we do not favour a
format that has exactly the same subsampling vs one with less
subsampling when all else is equal.

To fix this, I'm introducing a small score penalty if the bit depth or
subsampling doesn't exactly match the source format. This will break
the tie in favour of the format with the exact match, but not offset
any of the other scoring penalties we already have.

I have added a set of tests around these formats which will fail
without this fix.
2022-09-17 15:11:13 -07:00
Rémi Denis-Courmont
6df3ad9687 lavu/riscv: fix off-by-one in bit-magnitude clip 2022-09-15 18:11:12 -03:00
Rémi Denis-Courmont
a90e5335b3 avutil/lfg: fix comment typo 2022-09-15 20:56:23 +05:30
Rémi Denis-Courmont
a5ce44f301 lavu/riscv: fix av_clip_int16
Some serious copy-paste / squash / rebase mismanipulation here.

Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-14 14:37:21 -03:00
Andreas Rheinhardt
e867a29ec1 avutil/dict: Improve appending values
When appending two values (due to AV_DICT_APPEND), the earlier code
would first zero-allocate a buffer of the required size and then
copy both parts into it via av_strlcat(). This is problematic,
as it leads to quadratic performance in case of frequent enlargements.
Fix this by using av_realloc() (which is hopefully designed to handle
such cases in a better way than simply throwing the buffer we already
have away) and by copying the string via memcpy() (after all, we already
calculated the strlen of both strings).

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-14 19:01:02 +02:00
Andreas Rheinhardt
c15dd31d2a avutil/dict: Fix memleak when using AV_DICT_APPEND
If a key already exists in an AVDictionary and the AV_DICT_APPEND flag
is set, the old entry is at first discarded from the dictionary, but
a pointer to the value is kept. Lateron enough memory to store the
appended string is allocated; should this allocation fail, the old string
is not freed and hence leaks. This commit changes this by moving
creating the combined value to an earlier point in the function,
which also ensures that the AVDictionary is unchanged in case of errors.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-14 19:00:44 +02:00
Andreas Rheinhardt
f976ed7fcf avutil/dict: Avoid check whose result is known in advance
We know that an AVDictionary is not empty if we have just added
an entry to it, so only check for it being empty on the branch
that does not do so.

Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-14 15:03:59 +02:00
Andreas Rheinhardt
e402bd65b1 Revert "avcodec/loongarch/h264chroma, vc1dsp_lasx: Add wrapper for __lasx_xvldx"
This reverts commit 2c8dc7e953.
The loongarch headers have been fixed, so that this wrapper
is no longer necessary.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-14 14:09:26 +02:00
Andreas Rheinhardt
1234df7501 Revert "avcodec/loongarch: Add wrapper for __lsx_vldx"
This reverts commit 6c9a60ada4.
The loongarch headers have been fixed, so that this workaround
is no longer necessary.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-14 14:09:26 +02:00
Rémi Denis-Courmont
c177108ae1 lavu/riscv: add <intmath.h> optimisations
This provides some micro-optimisations for signed integer clipping, and
support for bit weight with the Zbb extension.
2022-09-13 16:50:43 -03:00
Rémi Denis-Courmont
df2057041b lavu/riscv: byte-swap operations
If the target supports the Basic bit-manipulation (Zbb) extension, then
the REV8 instruction is available to reverse byte order.

Note that this instruction only exists at the "XLEN" register size,
so we need to right shift the result down to the data width.

If Zbb is not supported, then this patchset does nothing. Support for
run-time detection is left for the future. Currently, there are no
bits in auxv/ELF HWCAP for Z-extensions, so there are no clean ways to
do this.
2022-09-13 16:50:43 -03:00
Rémi Denis-Courmont
d808070547 lavu/riscv: AV_READ_TIME cycle counter
This uses the architected RISC-V 64-bit cycle counter from the
RISC-V unprivileged instruction set.

In 64-bit and 128-bit, this is a straightforward CSR read.
In 32-bit mode, the 64-bit value is exposed as two CSRs, which
cannot be read atomically, so a loop is necessary to detect and fix up
the race condition where the bottom half wraps exactly between the two
reads.
2022-09-13 16:50:43 -03:00
James Almer
bda3a9faf4 x86/float_dsp: use three operand form for some instructions
Fixes compilation with old yasm

Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-13 13:50:09 -03:00
Paul B Mahol
72acff9f59 avutil/x86/float_dsp: add fma3 for scalarproduct 2022-09-13 17:43:15 +02:00
Andreas Rheinhardt
29c4c0886d avutil/x86/intreadwrite: Add ability to detect whether MMX code is used
It can be used to call emms_c() only when needed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2022-09-11 21:08:04 +02:00
Lynne
f1b35fc8f0
lavu/tx: remove av_cold from table definitions
How did this get here?
2022-09-11 03:18:40 +02:00
Lynne
c92edd969a
lavu/tx: rotate 3 & 15-point exptabs
This just inverts their signs. Simplifies SIMD.
2022-09-10 02:37:17 +02:00
Lynne
51172223fd
lavu/tx: generalize MDCTs
The same code can perform any-length MDCTs with minimal changes.
2022-09-10 02:37:16 +02:00
Lynne
645a1f4422
lavu/tx: add the inplace flag to PFA FFTs
They support in-place, because they have to use a temporary buffer.
2022-09-10 02:37:14 +02:00
Lynne
8c283e8fe6
lavu/tx: propagate the codelet flags into the context
The field is documented as a combination of both.
2022-09-10 02:37:11 +02:00
Haihao Xiang
b7dbffe698 lavu/hwcontext_qsv: add support for AV_PIX_FMT_VUYX on Linux
AV_PIX_FMT_VUYX is used for 8bit 4:4:4 content in FFmpeg VAAPI, so
AV_PIX_FMT_VUYX should be used for 8bit 4:4:4 content in FFmpeg QSV too
because QSV is based on VAAPI on Linux. However the SDK only declares
support for AYUV and does nothing with the alpha, so this commit fudged
a mapping between AV_PIX_FMT_VUYX and MFX_FOURCC_AYUV.

Reviewed-by: Philip Langdale <philipl@overt.org>
Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>
2022-09-07 14:04:12 +08:00
James Almer
f4097e4c1f x86/tx_float: add missing check for AVX2
Fixes compilation with old yasm.

Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-06 14:06:33 -03:00
James Almer
74f5fb6db8 x86/tx_float: set all operands for shufps
Fixes compilation with AVX2 enabled yasm.

Signed-off-by: James Almer <jamrial@gmail.com>
2022-09-06 14:06:03 -03:00
Martin Storsjö
da5f7799a0 slicethread: Limit the automatic number of threads to 16
This matches a similar cap on the number of automatic threads
in libavcodec/pthread_slice.c.

On systems with lots of cores, this fixes a couple fate failures
in 32 bit mode on such machines (where spawning a huge number of
threads runs out of address space).

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-06 18:46:44 +03:00
Martin Storsjö
e4759fa951 x86/tx_float: Fix building for platforms with a symbol prefix
This fixes building for x86 macOS (both i386 and x86_64) and
i386 windows.

Signed-off-by: Martin Storsjö <martin@martin.st>
2022-09-06 18:46:39 +03:00