FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-18 03:19:31 +02:00

Author	SHA1	Message	Date
Haihao Xiang	aba25b391c	lavu/hwcontext_qsv: add support for 10bit 4:4:4 content on Linux XV30 is used for 10bit 4:4:4 content in FFmpeg VAAPI, so XV30 should be used for 10bit 4:4:4 content in FFmpeg QSV too because QSV is based on VAAPI on Linux. However the SDK only declares support for Y410 but does nothing with the alpha in Y410, so this commit fudged a mapping between AV_PIX_FMT_XV30 and MFX_FOURCC_Y410. Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>	2022-10-10 09:31:34 +08:00
Haihao Xiang	1496e7c173	lavu/hwcontext_qsv: specify Shift for each format We can't get Shift from bit depth for some formats in the SDK. For example, bit depth is 10, however Shift is 0 for Y410 (XV30 in FFmpeg). In order to support these formats in the next commits, this patch specified Shift for each format Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>	2022-10-10 09:31:34 +08:00
Rémi Denis-Courmont	f59a767ccd	lavu/riscv: helper macro for VTYPE encoding In most cases, the vector type (VTYPE) for the RISC-V Vector extension is supplied as an immediate value, with either of the VSETVLI or VSETIVLI instructions. There is however a third instruction VSETVL which takes the vector type from a general purpose register. That is so the type can be selected at run-time. This introduces a macro to load a (valid) vector type into a register. The syntax follows that of VSETVLI and VSETIVLI, with element size, group multiplier, then tail and mask policies.	2022-10-10 02:22:12 +02:00
Lynne	bd3e552549	lavu: bump minor and add APIChanges entry for RISC-V's RVBbasic	2022-10-05 08:31:15 +02:00
Rémi Denis-Courmont	37d5ddc317	lavu/riscv: CPU flag for the Zbb extension Unfortunately, it is common, and will remain so, that the Bit manipulations are not enabled at compilation time. This is an official policy for Debian ports in general (though they do not support RISC-V officially as of yet) to stick to the minimal target baseline, which does not include the B extension or even its Zbb subset. For inline helpers (CPOP, REV8), compiler builtins (CTZ, CLZ) or even plain C code (MIN, MAX, MINU, MAXU), run-time detection seems impractical. But at least it can work for the byte-swap DSP functions.	2022-10-05 08:26:19 +02:00
Rémi Denis-Courmont	3ba5579e55	riscv: remove unnecessary #include's Pointed out by Andreas Rheinhardt.	2022-10-05 06:54:56 +02:00
Johannes Kauffmann	a11e745b97	lavu/fixed_dsp: add missing av_restrict qualifiers The butterflies_fixed function pointer declaration specifies av_restrict for the first two pointer arguments. So the corresponding function definitions should honor this declaration. MSVC emits warning C4113 for this. Signed-off-by: Anton Khirnov <anton@khirnov.net>	2022-10-04 10:56:12 +02:00
Andreas Rheinhardt	e4beb307ab	avutil/channel_layout: Don't mention dead project Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-10-03 23:19:47 +02:00
Andreas Rheinhardt	9d52844aba	avutil/tests/pixelutils: Test that all non-hw pix fmts have components Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-30 14:33:08 +02:00
Andreas Rheinhardt	36e805e9df	avutil/tests/pixelutils: Use av_assert0 instead for test tools These are test tools, so they should be picky. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-30 14:33:08 +02:00
Andreas Rheinhardt	5fe447bbb4	avutil/pixdesc: Move ff_check_pixfmt_descriptors() to its only user Namely to lavu/tests/pixelutils.c. This way, this function will not be included into actual binaries any more. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-30 14:33:08 +02:00
Andreas Rheinhardt	571b670e7d	avutil/pixdesc: Avoid direct access to pix fmt desc array Instead use av_pix_fmt_desc_next(). It is still possible to check its return values by comparing it with the (currently) expected values and the code does so. Reviewed-by: Anton Khirnov <anton@khirnov.net> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-30 14:33:08 +02:00
Andreas Rheinhardt	6d0a7e96e7	avutil/pixdesc: Remove always-false checks ff_check_pixfmt_descriptors() was added in commit `20e99a9c10`. At this time, the values of enum AVPixelFormat were not contiguous; instead there was a jump from 111 to 291 (or from 115 to 295 depending upon AV_PIX_FMT_ABI_GIT_MASTER). ff_check_pixfmt_descriptors() accounts for this by skipping empty descriptors. Yet this issue no longer exists: There are no holes. The check for said holes makes GCC believe that the name can be NULL; because it is used as argument corresponding to %s in a log statement, it therefore emits a warning (since `d75c4693fe`). Therefore this commit simply removes these checks. Also move the checks for name before the log statement. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-30 14:33:08 +02:00
Andreas Rheinhardt	3d8754cd09	avutil/display: Drop wrong comments about matrices being allocated These functions work just as well with stack based matrices. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-29 00:05:32 +02:00
James Almer	299253ae1b	avutil/channel_layout: move and improve the comment about unknown orders Don't place it as doxy specific for the order field, and generalize it both to also cover already defined orders and to not make it seem like the user is required to handle a layout they don't fully support or understand. Signed-off-by: James Almer <jamrial@gmail.com>	2022-09-28 12:21:18 -03:00
James Almer	bcd2e7d685	avutil/version: bump minor for the new RISC-V cpu flags Forgotten in `0c0a3deb18`. Signed-off-by: James Almer <jamrial@gmail.com>	2022-09-28 12:21:18 -03:00
Rémi Denis-Courmont	c47ebfa141	lavu/riscv: helper to read the vector length	2022-09-28 11:43:17 +02:00
Rémi Denis-Courmont	c1bb19e263	lavu/fixeddsp: RISC-V V butterflies_fixed	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	cd77662953	lavu/floatdsp: RISC-V V scalarproduct_float	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	b493370662	lavu/floatdsp: RISC-V V vector_fmul_window	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	9aeb6aca3a	lavu/floatdsp: RISC-V V vector_fmul_reverse	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	47ce9735cc	lavu/floatdsp: RISC-V V butterflies_float	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	f4ea45040f	lavu/floatdsp: RISC-V V vector_fmul_add	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	d120ab5b91	lavu/floatdsp: RISC-V V vector_dmac_scalar	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	c3db27ba95	lavu/floatdsp: RISC-V V vector_fmac_scalar	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	da169a210d	lavu/floatdsp: RISC-V V vector_dmul	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	7058af9969	lavu/floatdsp: RISC-V V vector_fmul	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	89b7ec65a8	lavu/floatdsp: RISC-V V vector_dmul_scalar	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	a6c10d05fe	lavu/floatdsp: RISC-V V vector_fmul_scalar This is based on existing code from the VLC git tree with two minor changes to account for the different function prototypes.	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	39357cad37	lavu/riscv: fallback macros for SH{1, 2, 3}ADD Those mnemonics require the very latest binutils release at the time of writing. These macros provide seamless backward compatibility.	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	0c0a3deb18	lavu/cpu: CPU flags for the RISC-V Vector extension RVV defines a total of 12 different extensions, including: - 5 different instruction subsets: - Zve32x: 8-, 16- and 32-bit integers, - Zve32f: Zve32x plus single precision floats, - Zve64x: Zve32x plus 64-bit integers, - Zve64f: Zve32f plus Zve64x, - Zve64d: Zve64f plus double precision floats. - 6 different vector lengths: - Zvl32b (embedded only), - Zvl64b (embedded only), - Zvl128b, - Zvl256b, - Zvl512b, - Zvl1024b, - and the V extension proper: equivalent to Zve64f and Zvl128b. In total, there are 6 different possible sets of supported instructions (including the empty set), but for convenience we allocate one bit for each type set: up-to-32-bit ints (RVV_I32), floats (RVV_F32), 64-bit ints (RVV_I64) and doubles (RVV_F64). When the vector size is needed, it can be retrieved by reading the unprivileged read-only vlenb CSR. This should probably be a separate helper macro if needed at a later point.	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	746f1ff36a	lavu/riscv: initial common header for assembler macros	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	b95e2fbd85	lavu/cpu: detect RISC-V base extensions This introduces compile-time and run-time CPU detection on RISC-V. In practice, I doubt that FFmpeg will ever see a RISC-V CPU without all of I, F and D extensions, and if it does, it probably won't have run-time detection. So the flags are essentially always set. But as things stand, checkasm wants them that way. Compare the ARMV8 flag on AArch64. We are nowhere near running short on CPU flag bits.	2022-09-27 13:19:52 +02:00
Andreas Rheinhardt	8be6552aa4	avutil/pixdesc: Add av_chroma_location_(enum_to_pos|pos_to_enum) They are intended as replacements for avcodec_enum_to_chroma_pos() and avcodec_chroma_pos_to_enum(). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-26 03:02:25 +02:00
Paul B Mahol	7bb0afc245	avutil: add RGBA single-float precision packed formats	2022-09-25 18:34:48 +02:00
Paul B Mahol	63bb6d6a9b	avutil: add RGB single-precision float formats	2022-09-25 18:34:48 +02:00
Lynne	f21899db7d	x86/tx_float: enable AVX-only split-radix FFT codelets Sandy Bridge, Ivy Bridge and Bulldozer cores don't support FMA3.	2022-09-24 04:16:55 +02:00
James Almer	d2f482965f	x86/tx_float: fix some symbol names Should fix compilation on MacOS Signed-off-by: James Almer <jamrial@gmail.com>	2022-09-23 18:53:05 -03:00
James Almer	0d8f43c74d	x86/tx_float: change a condition in a preprocessor check Fixes compilation with yasm. Signed-off-by: James Almer <jamrial@gmail.com>	2022-09-23 16:05:07 -03:00
James Almer	750f378bec	x86/tx_float: add missing preprocessor wrapper for AVX2 functions Fixes compilation with old assemblers. Signed-off-by: James Almer <jamrial@gmail.com>	2022-09-23 15:15:20 -03:00
Lynne	e7a987d7c9	lavu/tx: remove special -1 inverted lookup mode It was somewhat hacky and unnecessary.	2022-09-23 12:35:28 +02:00
Lynne	74e8541bab	x86/tx_float: generalize iMDCT To support non-aligned buffers during the post-transform step, just iterate backwards over the array. This allows using the 15xN-point FFT, with which the speed is 2.1 times faster than our old libavcodec implementation.	2022-09-23 12:35:28 +02:00
Lynne	ace42cf581	x86/tx_float: add 15xN PFA FFT AVX SIMD ~4x faster than the C version. The shuffles in the 15pt dim1 are seriously expensive. Not happy with it, but I'm content. Can be easily converted to pure AVX by removing all vpermpd/vpermps instructions.	2022-09-23 12:35:27 +02:00
Lynne	3241e9225c	x86/tx_float: adjust internal ASM call ABI again There are many ways to go about it, and this one seems optimal for both MDCTs and PFA FFTs without requiring excessive instructions or stack usage.	2022-09-23 12:33:35 +02:00
Lynne	7e7baf8ab8	lavu/tx: do not steal lookup tables of subcontexts in the iMDCT As it happens, some still need their contexts.	2022-09-23 12:33:31 +02:00
James Almer	05cff214b9	avutil/channel_layout: mention how the API user should treat channel orders it does not understand In case new orders are added in the future, existing library users can still use the layout simply by ignoring everything but the channel count in it, so make this explicit. Reviewed-by: Anton Khirnov <anton@khirnov.net> Signed-off-by: James Almer <jamrial@gmail.com>	2022-09-22 10:22:19 -03:00
Andreas Rheinhardt	187cd27832	avutil/dict: Error out in case of key == NULL Up until now, using NULL as key in av_dict_get() on a non-empty AVDictionary would crash; using NULL as key in av_dict_set() would also crash for a non-empty AVDictionary unless AV_DICT_MULTIKEY was set; in case the dictionary was initially empty or AV_DICT_MULTIKEY was set, it was even possible for av_dict_set() to succeed when adding a NULL key, namely when one uses a value != NULL and the AV_DICT_DONT_STRDUP_VAL flag. Using av_dict_get() on such an AVDictionary will usually lead to crashes, though. Fix this by actually checking for key in both functions; error out if they are NULL. While just at it, also stop relying on av_strdup(NULL) to return NULL in av_dict_set(). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-19 23:39:58 +02:00
Lynne	4ba68639ca	x86/tx_float: add asm call versions of the 2pt and 4pt transforms Verified to be working.	2022-09-19 06:01:06 +02:00
Lynne	892548e6a1	x86/tx_float: fully support 128bit regs in LOAD64_LUT The gather path didn't support 128bit registers. It's not faster on Zen 3, but it's here for completeness.	2022-09-19 06:01:04 +02:00
Lynne	af42bb3d61	x86/tx_float: simplify and describe the intra-asm call convention	2022-09-19 06:01:02 +02:00
Philip Langdale	ed83a3a5bd	lavu/pixdesc: favour formats where depth and subsampling exactly match Since introducing the various packed formats used by VAAPI (and p012), we've noticed that there's actually a gap in how av_find_best_pix_fmt_of_2 works. It doesn't actually assign any value to having the same bit depth as the source format, when comparing against formats with a higher bit depth. This usually doesn't matter, because av_get_padded_bits_per_pixel() will account for it. However, as many of these formats use padding internally, we find that av_get_padded_bits_per_pixel() actually returns the same value for the 10 bit, 12 bit, 16 bit flavours, etc. In these tied situations, we end up just picking the first of the two provided formats, even if the second one should be preferred because it matches the actual bit depth. This bug already existed if you tried to compare yuv420p10 against p016 and p010, for example, but it simply hadn't come up before so we never noticed. But now, we actually got a situation in the VAAPI VP9 decoder where it offers both p010 and p012 because Profile 3 could be either depth and ends up picking p012 for 10 bit content due to the ordering of the testing. In addition, in the process of testing the fix, I realised we have the same gap when it comes to chroma subsampling - we do not favour a format that has exactly the same subsampling vs one with less subsampling when all else is equal. To fix this, I'm introducing a small score penalty if the bit depth or subsampling doesn't exactly match the source format. This will break the tie in favour of the format with the exact match, but not offset any of the other scoring penalties we already have. I have added a set of tests around these formats which will fail without this fix.	2022-09-17 15:11:13 -07:00
Rémi Denis-Courmont	6df3ad9687	lavu/riscv: fix off-by-one in bit-magnitude clip	2022-09-15 18:11:12 -03:00
Rémi Denis-Courmont	a90e5335b3	avutil/lfg: fix comment typo	2022-09-15 20:56:23 +05:30
Rémi Denis-Courmont	a5ce44f301	lavu/riscv: fix av_clip_int16 Some serious copy-paste / squash / rebase mismanipulation here. Signed-off-by: James Almer <jamrial@gmail.com>	2022-09-14 14:37:21 -03:00
Andreas Rheinhardt	e867a29ec1	avutil/dict: Improve appending values When appending two values (due to AV_DICT_APPEND), the earlier code would first zero-allocate a buffer of the required size and then copy both parts into it via av_strlcat(). This is problematic, as it leads to quadratic performance in case of frequent enlargements. Fix this by using av_realloc() (which is hopefully designed to handle such cases in a better way than simply throwing the buffer we already have away) and by copying the string via memcpy() (after all, we already calculated the strlen of both strings). Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-14 19:01:02 +02:00
Andreas Rheinhardt	c15dd31d2a	avutil/dict: Fix memleak when using AV_DICT_APPEND If a key already exists in an AVDictionary and the AV_DICT_APPEND flag is set, the old entry is at first discarded from the dictionary, but a pointer to the value is kept. Later on, enough memory to store the appended string is allocated; should this allocation fail, the old string is not freed and hence leaks. This commit changes this by moving creating the combined value to an earlier point in the function, which also ensures that the AVDictionary is unchanged in case of errors. Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-14 19:00:44 +02:00
Andreas Rheinhardt	f976ed7fcf	avutil/dict: Avoid check whose result is known in advance We know that an AVDictionary is not empty if we have just added an entry to it, so only check for it being empty on the branch that does not do so. Reviewed-by: Paul B Mahol <onemda@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-14 15:03:59 +02:00
Andreas Rheinhardt	e402bd65b1	Revert "avcodec/loongarch/h264chroma, vc1dsp_lasx: Add wrapper for __lasx_xvldx" This reverts commit `2c8dc7e953`. The loongarch headers have been fixed, so that this wrapper is no longer necessary. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-14 14:09:26 +02:00
Andreas Rheinhardt	1234df7501	Revert "avcodec/loongarch: Add wrapper for __lsx_vldx" This reverts commit `6c9a60ada4`. The loongarch headers have been fixed, so that this workaround is no longer necessary. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-14 14:09:26 +02:00
Rémi Denis-Courmont	c177108ae1	lavu/riscv: add <intmath.h> optimisations This provides some micro-optimisations for signed integer clipping, and support for bit weight with the Zbb extension.	2022-09-13 16:50:43 -03:00
Rémi Denis-Courmont	df2057041b	lavu/riscv: byte-swap operations If the target supports the Basic bit-manipulation (Zbb) extension, then the REV8 instruction is available to reverse byte order. Note that this instruction only exists at the "XLEN" register size, so we need to right shift the result down to the data width. If Zbb is not supported, then this patchset does nothing. Support for run-time detection is left for the future. Currently, there are no bits in auxv/ELF HWCAP for Z-extensions, so there are no clean ways to do this.	2022-09-13 16:50:43 -03:00
Rémi Denis-Courmont	d808070547	lavu/riscv: AV_READ_TIME cycle counter This uses the architected RISC-V 64-bit cycle counter from the RISC-V unprivileged instruction set. In 64-bit and 128-bit, this is a straightforward CSR read. In 32-bit mode, the 64-bit value is exposed as two CSRs, which cannot be read atomically, so a loop is necessary to detect and fix up the race condition where the bottom half wraps exactly between the two reads.	2022-09-13 16:50:43 -03:00
James Almer	bda3a9faf4	x86/float_dsp: use three operand form for some instructions Fixes compilation with old yasm Signed-off-by: James Almer <jamrial@gmail.com>	2022-09-13 13:50:09 -03:00
Paul B Mahol	72acff9f59	avutil/x86/float_dsp: add fma3 for scalarproduct	2022-09-13 17:43:15 +02:00
Andreas Rheinhardt	29c4c0886d	avutil/x86/intreadwrite: Add ability to detect whether MMX code is used It can be used to call emms_c() only when needed. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-11 21:08:04 +02:00
Lynne	f1b35fc8f0	lavu/tx: remove av_cold from table definitions How did this get here?	2022-09-11 03:18:40 +02:00
Lynne	c92edd969a	lavu/tx: rotate 3 & 15-point exptabs This just inverts their signs. Simplifies SIMD.	2022-09-10 02:37:17 +02:00
Lynne	51172223fd	lavu/tx: generalize MDCTs The same code can perform any-length MDCTs with minimal changes.	2022-09-10 02:37:16 +02:00
Lynne	645a1f4422	lavu/tx: add the inplace flag to PFA FFTs They support in-place, because they have to use a temporary buffer.	2022-09-10 02:37:14 +02:00
Lynne	8c283e8fe6	lavu/tx: propagate the codelet flags into the context The field is documented as a combination of both.	2022-09-10 02:37:11 +02:00
Haihao Xiang	b7dbffe698	lavu/hwcontext_qsv: add support for AV_PIX_FMT_VUYX on Linux AV_PIX_FMT_VUYX is used for 8bit 4:4:4 content in FFmpeg VAAPI, so AV_PIX_FMT_VUYX should be used for 8bit 4:4:4 content in FFmpeg QSV too because QSV is based on VAAPI on Linux. However the SDK only declares support for AYUV and does nothing with the alpha, so this commit fudged a mapping between AV_PIX_FMT_VUYX and MFX_FOURCC_AYUV. Reviewed-by: Philip Langdale <philipl@overt.org> Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>	2022-09-07 14:04:12 +08:00
James Almer	f4097e4c1f	x86/tx_float: add missing check for AVX2 Fixes compilation with old yasm. Signed-off-by: James Almer <jamrial@gmail.com>	2022-09-06 14:06:33 -03:00
James Almer	74f5fb6db8	x86/tx_float: set all operands for shufps Fixes compilation with AVX2 enabled yasm. Signed-off-by: James Almer <jamrial@gmail.com>	2022-09-06 14:06:03 -03:00
Martin Storsjö	da5f7799a0	slicethread: Limit the automatic number of threads to 16 This matches a similar cap on the number of automatic threads in libavcodec/pthread_slice.c. On systems with lots of cores, this fixes a couple of fate failures in 32 bit mode (where spawning a huge number of threads runs out of address space). Signed-off-by: Martin Storsjö <martin@martin.st>	2022-09-06 18:46:44 +03:00
Martin Storsjö	e4759fa951	x86/tx_float: Fix building for platforms with a symbol prefix This fixes building for x86 macOS (both i386 and x86_64) and i386 windows. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-09-06 18:46:39 +03:00
Lynne	a89025f74d	aarch64/tx_float: fix compilation Forgot to add the new function arguments.	2022-09-06 05:42:32 +02:00
Lynne	4537d9554d	x86/tx_float: implement inverse MDCT AVX2 assembly This commit implements an iMDCT in pure assembly. This is capable of processing any mod-8 transforms, rather than just power of two, but since power of two is all we have assembly for currently, that's what's supported. It would really benefit if we could somehow use the C code to decide which function to jump into, but exposing function labels from assembly into C is anything but easy. The post-transform loop could probably be improved. This was somewhat annoying to write, as we must support arbitrary strides during runtime. There's a fast branch for stride == 4 bytes and a slower one which uses vgatherdps. Zen 3 benchmarks for stride == 4 for old (av_imdct_half) vs new (av_tx): 128pt: 2811 decicycles in av_tx (imdct),16775916 runs, 1300 skips 3082 decicycles in av_imdct_half,16776751 runs, 465 skips 256pt: 4920 decicycles in av_tx (imdct),16775820 runs, 1396 skips 5378 decicycles in av_imdct_half,16776411 runs, 805 skips 512pt: 9668 decicycles in av_tx (imdct),16775774 runs, 1442 skips 10626 decicycles in av_imdct_half,16775647 runs, 1569 skips 1024pt: 19812 decicycles in av_tx (imdct),16777144 runs, 72 skips 23036 decicycles in av_imdct_half,16777167 runs, 49 skips	2022-09-06 04:21:46 +02:00
Lynne	2425d5cd7e	x86/tx_float: add support for calling assembly functions from assembly Needed for the next patch. We get this for the extremely small cost of a branch on _ns functions, which wouldn't be used anyway with assembly.	2022-09-06 04:21:41 +02:00
Anton Khirnov	ea5b375e0e	lavu/fifo: clarify interaction of AV_FIFO_FLAG_AUTO_GROW with av_fifo_write()	2022-09-05 08:59:36 +02:00
Anton Khirnov	8728808b3e	lavu/fifo: clarify interaction of AV_FIFO_FLAG_AUTO_GROW with av_fifo_can_write()	2022-09-05 08:59:14 +02:00
Anton Khirnov	c9b6fd27bf	lavu/fifo: add the header to its own doxy group Also, drop mentions of it being a circular buffer, as this is an internal implementation detail that should be invisible to the caller.	2022-09-05 08:58:51 +02:00
Andreas Rheinhardt	f89949afed	avutil/tests/.gitignore: Add channel_layout testtool Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-05 02:00:08 +02:00
Philip Langdale	2f9b8bbd1f	lavu/hwcontext_vulkan: support mapping VUYX, P012, and XV36 If we want to be able to map between VAAPI and Vulkan (to do Vulkan filtering), we need to have matching formats on each side. The mappings here are not exact. In the same way that P010 is still mapped to full 16 bit formats, P012 has to be mapped that way as well. Similarly, VUYX has to be mapped to an alpha-equipped format, and XV36 has to be mapped to a fully 16bit alpha-equipped format. While Vulkan seems to fundamentally lack formats with an undefined, but physically present, alpha channel, it does have 10X6 and 12X4 formats that you could imagine using for P010, P012 and XV36, but these formats don't support the STORAGE usage flag. Today, hwcontext_vulkan requires all formats to be storable because it wants to be able to use them to create writable images. Until that changes, which might happen, we have to restrict the set of formats we use. Finally, when mapping a Vulkan image back to vaapi, I observed that the VK_FORMAT_R16G16B16A16_UNORM format we have to use for XV36 going to Vulkan is mapped to Y416 when going to vaapi (which makes sense as it's the exact matching format) so I had to add an entry for it even though we don't use it directly.	2022-09-03 16:19:40 -07:00
Philip Langdale	b982dd0d83	lavc/vaapi: Add support for remaining 10/12bit profiles With the necessary pixel formats defined, we can now expose support for the remaining 10/12bit combinations that VAAPI can handle. Specifically, we are adding support for: HEVC (12bit 420, 10bit 422, 12bit 422, 10bit 444, 12bit 444) and VP9 (10bit 444, 12bit 444). These obviously require actual hardware support to be usable, but where that exists, it is now enabled. Note that unlike VUYA/VUYX, the Intel driver does not formally expose support for the alphaless formats XV30 and XV36, and so we are implicitly discarding the alpha from the decoder and passing undefined values for the alpha to the encoder. If a future encoder iteration was to actually do something with the alpha bits, we would need to use a formal alpha capable format or the encoder would need to explicitly accept the alphaless format.	2022-09-03 16:19:40 -07:00
Philip Langdale	d75c4693fe	lavu/pixfmt: Add P012, Y212, XV30, and XV36 formats These are the formats we want/need to use when dealing with the Intel VAAPI decoder for 12bit 4:2:0, 12bit 4:2:2, 10bit 4:4:4 and 12bit 4:4:4 respectively. As with the already supported Y210 and VUYX (XYUV) formats, they are based on formats Microsoft picked as their preferred 4:2:2 and 4:4:4 video formats, and Intel ran with it. P012 and Y212 are simply an extension of 10 bit formats to say 12 bits will be used, with 4 unused bits instead of 6. XV30 and XV36, as exotic as they sound, are variants of Y410 and Y412 where the alpha channel is left formally undefined. We prefer these over the alpha versions because the hardware cannot actually do anything with the alpha channel and respecting it is just overhead. Y412/XV36 is a normal looking packed 4 channel format where each channel is 16bits wide but only the 12msb are used (like P012). Y410/XV30 packs three 10bit channels in 32bits with 2bits of alpha, like A/X2RGB10 style formats. This annoying layout forced me to define the BE version as a bitstream format. It seems like our pixdesc infrastructure can handle the LE version being byte-defined, but not when it's reversed. If there's a better way to handle this, please let me know. Our existing X2 formats all have the 2 bits at the MSB end, but this format places them at the LSB end and that seems to be the root of the problem.	2022-09-03 16:19:40 -07:00
Rémi Denis-Courmont	620e6e1487	arm: relax byte-swap assembler constraints There are no particular reasons to force the compiler to use the same register as output and input operand. This forces an extra MOV instruction if the input value needs to be reused after the swap. In most cases, this makes no difference, as the compiler will select the same register for both operands either way. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-09-03 23:54:05 +03:00
Rémi Denis-Courmont	164021423a	aarch64: relax byte-swap assembler constraints There are no particular reasons to force the compiler to use the same register as output and input operand. This forces an extra MOV instruction if the input value needs to be reused after the swap. In most cases, this makes no difference, as the compiler will select the same register for both operands either way. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-09-03 23:54:05 +03:00
Andreas Rheinhardt	73fada029c	avcodec/codec_internal: Add macros for update_thread_context(_for_user) It reduces typing: Before this patch, there were 11 callbacks that exceeded the 80 char line length limit; now there are zero. It also allows removing ONLY_IF_THREADS_ENABLED() in libavutil/internal.h. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-03 15:42:57 +02:00
Andreas Rheinhardt	48286d4d98	avcodec/codec_internal: Add macro to set AVCodec.long_name It reduces typing: Before this patch, there were 105 codecs whose long_name-definition exceeded the 80 char line length limit. Now there are only nine of them. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-03 15:42:57 +02:00
Andreas Rheinhardt	dea9744560	avutil/file: Properly deprecate av_tempfile() It has been deprecated in `b4f59beeb4`, but the attribute_deprecated was not set and there was no entry in APIchanges. This commit adds these and schedules it for removal. Given that the reason behind the deprecation is exactly the same as in av_fopen_utf8(), reuse its FF_API_AV_FOPEN_UTF8. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-03 15:42:40 +02:00
Andreas Rheinhardt	72c601e0f7	avutil/internal: Move avpriv-file API to a header of its own It is not used by the large majority of files that include lavu/internal.h. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-03 15:41:44 +02:00
Andreas Rheinhardt	04b7217872	avutil/dict: Move avpriv_dict_set_timestamp() to a header of its own It is used almost nowhere, so it needn't be auto-included almost everywhere. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-03 15:41:44 +02:00
Andreas Rheinhardt	26325cceb0	avutil/internal: Remove unused FF_SYMVER They are unused since `d63443b968`. Furthermore, they were always in the wrong header: libavutil/internal.h is auto-included almost everywhere, but FF_SYMVER would only ever be used at a few places, so a proper header of its own would be appropriate for it. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-03 15:41:44 +02:00
Andreas Rheinhardt	5b0856d2b9	avutil/internal: Remove unused ff_rint64_clip() Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-03 15:41:44 +02:00
Martin Storsjö	e4ac156b7c	libavcodec: Set hidden visibility on global symbols accessed from AArch64 assembly The AArch64 assembly accesses those symbols directly, without indirection via e.g. the GOT on ELF. In order for this not to require text relocations, those symbols need to be resolved fully at link time, i.e. those symbols can't be interposable. Normally, so far, this is achieved when linking shared libraries in two ways; we have a version script (libavcodec/libavcodec.v) which marks all symbols that don't start with av* as local. Additionally, we try to add -Wl,-Bsymbolic to the linker options if supported, making sure that such symbol references are resolved fully at link time, instead of making them interposable. When the libavcodec static library is linked into another shared library, there's no guarantee that it uses similar options (even though that would be favourable), which would end up requiring text relocations in the AArch64 assembly. Explicitly mark the symbols that are accessed from AArch64 assembly as hidden, so that they are resolved fully at link time even without the version script and -Wl,-Bsymbolic. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-09-02 23:13:29 +03:00
Martin Storsjö	0dd8fe6f4b	arm: Check the build time constants in av_clip_*intp2 This fixes building for arm targets with optimizations disabled. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-09-02 23:12:26 +03:00
Philip Langdale	caf26a8a12	lavc/vaapi: Switch preferred 8bit 444 format to VUYX As vaapi doesn't actually do anything useful with the alpha channel, and we have an alphaless format available, let's use that instead. The changes here are mostly 1:1 switching, but do note the explicit change in the number of declared channels from 4 to 3 to reflect that the alpha is being ignored.	2022-08-25 19:04:10 -07:00
Philip Langdale	cc5a5c9860	lavu/pixfmt: Introduce VUYX format This is the alphaless version of VUYA that I introduced recently. After further discussion and noting that the Intel vaapi driver explicitly lists XYUV as a support format for encoding and decoding 8bit 444 content, we decided to switch our usage and avoid the overhead of having a declared alpha channel around. Note that I am not removing VUYA, as this turned out to have another use, which was to replace the need for v408enc/dec when dealing with the format. The vaapi switching will happen in the next change	2022-08-25 19:02:49 -07:00
Lynne	f932b89ea3	lavu/tx: implement aarch64 NEON SIMD FFT The fastest fast Fourier transform in not just the west, but the world, now for the most popular toy ISA. On a high level, it follows the design of the AVX2 version closely, with the exception that the input is slightly less permuted as we don't have to do lane switching with the input on double 4pt and 8pt. On a low level, the lack of subadd/addsub instructions REALLY penalizes any attempt at writing an FFT. That single register matters a lot, and reloading it simply takes unacceptably long. In x86 land, vendors would've noticed developers need this. In ARM land, you get a badly designed complex multiplication instruction we cannot use, that's not present on 95% of devices. Because only compilers matter, right? Future optimization options are very few, perhaps better register management to use more ld1/st1s. All timings below are in cycles: A53: Length \| C \| New (lavu) \| Old (lavc) \| FFTW ------ \|-------------\|-------------\|-------------\|----- 4 \| 842 \| 420 \| 1210 \| 1460 8 \| 1538 \| 1020 \| 1850 \| 2520 16 \| 3717 \| 1900 \| 3700 \| 3990 32 \| 9156 \| 4070 \| 8289 \| 8860 64 \| 21160 \| 9931 \| 18600 \| 19625 128 \| 49180 \| 23278 \| 41922 \| 41922 256 \| 112073 \| 53876 \| 93202 \| 101092 512 \| 252864 \| 122884 \| 205897 \| 207868 1024 \| 560512 \| 278322 \| 458071 \| 453053 2048 \| 1295402 \| 775835 \| 1038205 \| 1020265 4096 \| 3281263 \| 2021221 \| 2409718 \| 2577554 8192 \| 8577845 \| 4780526 \| 5673041 \| 6802722 Apple M1 New - Total for len 512 reps 2097152 = 1.459141 s Old - Total for len 512 reps 2097152 = 2.251344 s FFTW - Total for len 512 reps 2097152 = 1.868429 s New - Total for len 1024 reps 4194304 = 6.490080 s Old - Total for len 1024 reps 4194304 = 9.604949 s FFTW - Total for len 1024 reps 4194304 = 7.889281 s New - Total for len 16384 reps 262144 = 10.374001 s Old - Total for len 16384 reps 262144 = 15.266713 s FFTW - Total for len 16384 reps 262144 = 12.341745 s New - Total 
for len 65536 reps 8192 = 1.769812 s Old - Total for len 65536 reps 8192 = 4.209413 s FFTW - Total for len 65536 reps 8192 = 3.012365 s New - Total for len 131072 reps 4096 = 1.942836 s Old - Segfaults FFTW - Total for len 131072 reps 4096 = 3.713713 s Thanks to wbs for some simplifications, assembler fixes and a review and to jannau for giving it a look.	2022-08-25 17:40:28 +02:00
Andreas Rheinhardt	0bb0c26799	avutil/mem_internal: Fix headers Including avassert.h is unnecessary since commit `786be70e28`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-08-24 03:43:52 +02:00
Timo Rothenpieler	ef2c2a2220	avutil/half2float: use native _Float16 if available _Float16 support was available on arm/aarch64 for a while, and with gcc 12 was enabled on x86 as long as SSE2 is supported. If the target arch supports f16c, gcc emits fairly efficient assembly, taking advantage of it. This is the case on x86-64-v3 or higher. Same goes on arm, which has native float16 support. On x86, without f16c, it emulates it in software using sse2 instructions. This has shown to perform rather poorly: _Float16 full SSE2 emulation: frame=50074 fps=848 q=-0.0 size=N/A time=00:33:22.96 bitrate=N/A speed=33.9x _Float16 f16c accelerated (Zen2, --cpu=znver2): frame=50636 fps=1965 q=-0.0 Lsize=N/A time=00:33:45.40 bitrate=N/A speed=78.6x classic half2float full software implementation: frame=49926 fps=1605 q=-0.0 Lsize=N/A time=00:33:17.00 bitrate=N/A speed=64.2x Hence an additional check was introduced, that only enables use of _Float16 on x86 if f16c is being utilized. On aarch64, a similar uplift in performance is seen: RPi4 half2float full software implementation: frame= 6088 fps=126 q=-0.0 Lsize=N/A time=00:04:03.48 bitrate=N/A speed=5.06x RPi4 _Float16: frame= 6103 fps=158 q=-0.0 Lsize=N/A time=00:04:04.08 bitrate=N/A speed=6.32x Since arm/aarch64 always natively support 16 bit floats, it can always be considered fast there. I'm not aware of any additional platforms that currently support _Float16. And if there are, they should be considered non-fast until proven fast.	2022-08-19 22:09:36 +02:00
Timo Rothenpieler	6dc79f1d04	avutil/half2float: move non-inline init code out of header	2022-08-19 22:09:36 +02:00
Timo Rothenpieler	f3fb528cd5	avutil/half2float: move tables to header-internal structs Having to put the knowledge of the size of those arrays into a multitude of places is rather smelly.	2022-08-19 22:09:36 +02:00
Timo Rothenpieler	cb8ad005bb	avutil/half2float: adjust conversion of NaN IEEE-754 differentiates two kinds of NaNs: quiet and signaling ones. They are differentiated by the MSB of the mantissa. For whatever reason, actual hardware conversion of half to single always sets the signaling bit to 1 if the mantissa is != 0, and to 0 if it's 0. So our code has to follow suit or fate-testing hardware float16 will be impossible.	2022-08-19 22:09:36 +02:00
Timo Rothenpieler	b42925264a	avutil: move half-precision float helper to avutil	2022-08-19 22:09:36 +02:00
Lynne	ae66a9db7b	lavu/tx: optimize and simplify inverse MDCTs Convert the input from a scatter to a gather instead, which is faster and better for SIMD. Also, add a pre-shuffled exptab version to avoid gathering there at all. This doubles the exptab size, but the speedup makes it worth it. In SIMD, the exptab will likely be purged to a higher cache anyway because of the FFT in the middle, and the amount of loads stays identical. For a 960-point inverse MDCT, the speedup is 10%. This makes it possible to write sane and fast SIMD versions of inverse MDCTs.	2022-08-16 01:22:38 +02:00
Timo Rothenpieler	dd94a03468	avutil/hwcontext_d3d11va: add support for rgbaf16 pixel format	2022-08-13 15:21:59 +02:00
Timo Rothenpieler	e95b08a7dd	lavu/pixfmt: add packed RGBA float16 format This is the default format of the Windows compositor and what DXGI Desktop Duplication will give you for any kind of HDR output.	2022-08-13 15:21:46 +02:00
Timo Rothenpieler	b77fff47d0	configure: always enable gnu_windres if available Use the appropriate Makefile variable to ensure the resource file is only built into shared libraries instead.	2022-08-13 14:42:36 +02:00
Haihao Xiang	05bd88dca2	lavu/hwcontext_qsv: make qsv hwdevice work with oneVPL In oneVPL, MFXLoad() and MFXCreateSession() are required to create a workable mfx session[1]. Add config filters for D3D9/D3D11 session (galinart). The default device is changed to d3d11va for oneVPL when both d3d11va and dxva2 are enabled on Microsoft Windows. This is in preparation for oneVPL support. [1] https://spec.oneapi.io/versions/latest/elements/oneVPL/source/programming_guide/VPL_prg_session.html#onevpl-dispatcher Co-authored-by: galinart <artem.galin@intel.com> Signed-off-by: galinart <artem.galin@intel.com> Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>	2022-08-12 10:43:39 +08:00
Haihao Xiang	e0bbdbe0a6	lavu/hwcontext_qsv: add loader field to AVQSVDeviceContext In oneVPL, a valid mfxLoader handle is needed when creating mfx session for decoding, encoding and processing[1], so add loader field to AVQSVDeviceContext. User should fill this field before calling av_hwdevice_ctx_init() if using oneVPL This is in preparation for oneVPL support [1]https://spec.oneapi.io/versions/latest/elements/oneVPL/source/programming_guide/VPL_prg_session.html#onevpl-dispatcher	2022-08-12 10:43:39 +08:00
Haihao Xiang	c77149bc37	qsv: restrict OPAQUE memory to MFX_VERSION < 2.0 OPAQUE memory isn't supported for MFX_VERSION >= 2.0[1][2]. This is in preparation for oneVPL support [1] https://spec.oneapi.io/versions/latest/elements/oneVPL/source/VPL_intel_media_sdk.html#msdk-full-name-feature-removals [2] https://github.com/oneapi-src/oneVPL	2022-08-12 10:43:39 +08:00
Haihao Xiang	3e61b7dd7f	qsv: remove mfx/ prefix from mfx headers The following Cflags has been added to libmfx.pc, so mfx/ prefix is no longer needed when including mfx headers in FFmpeg. Cflags: -I${includedir} -I${includedir}/mfx Some old versions of libmfx have the following Cflags in libmfx.pc Cflags: -I${includedir} We may add -I${includedir}/mfx to CFLAGS when running 'configure --enable-libmfx' for old versions of libmfx, if so, mfx headers without mfx/ prefix can be included too. If libmfx comes without pkg-config support, we may do a small change to the settings of the environment(e.g. set -I/opt/intel/mediasdk/include/mfx instead of -I/opt/intel/mediasdk/include to CFLAGS), then the build can find the mfx headers without mfx/ prefix After applying this change, we won't need to change #include for mfx headers when mfx headers are installed under a new directory. This is in preparation for oneVPL support (mfx headers in oneVPL are installed under vpl directory)	2022-08-12 10:43:39 +08:00
Andreas Rheinhardt	d576b37fa7	avutil/buffer: Never poison returned buffers Poisoning returned buffers is based around the implicit assumption that the contents of said buffers are transient. Yet this is not true for the buffer pools used by the various hardware contexts which store important state in there that needs to be preserved. Furthermore, the current code is also based on the assumption that the complete buffer pointed to by AVBuffer->data coincides with AVBufferRef->data; yet an implementation might store some data of its own before the actual user-visible data (accessible via AVBufferRef) which would be broken by the current code. (This is of course yet more proof that the AVBuffer API is not the right tool for the hardware contexts.) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-08-10 18:49:35 +02:00
Lynne	98b32ef462	x86/tx_float: save a branch during coefficient deinterleaving Directly branch into the special 64-point deinterleave subroutine rather than going through the general deinterleave. 64-point transform timings on Zen 3: Before: 1974 decicycles in av_tx (fft),16776864 runs, 352 skips After: 1956 decicycles in av_tx (fft),16775378 runs, 1838 skips	2022-08-09 03:35:12 +02:00
Zhao Zhili	fc13803323	avutil/hwcontext_videotoolbox: add missing include for AVFrame Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2022-08-08 11:08:55 +08:00
James Almer	85c59bd6de	avutil/test/pixfmt_best: test the VUYA pixel format Signed-off-by: James Almer <jamrial@gmail.com>	2022-08-07 09:33:16 -03:00
Andreas Rheinhardt	2c8dc7e953	avcodec/loongarch/h264chroma, vc1dsp_lasx: Add wrapper for __lasx_xvldx __lasx_xvldx does not accept a pointer to const (in fact, no function in lasxintrin.h does so), although it is not allowed to modify the pointed-to buffer. Therefore this commit adds a wrapper for it in order to constify the H264Chroma API in a later commit. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-08-05 02:59:58 +02:00
Andreas Rheinhardt	6c9a60ada4	avcodec/loongarch: Add wrapper for __lsx_vldx __lsx_vldx does not accept a pointer to const (in fact, no function in lsxintrin.h does so), although it is not allowed to modify the pointed-to buffer. Therefore this commit adds a wrapper for it in order to constify the HEVC DSP functions in a later commit. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-08-05 02:53:35 +02:00
Philip Langdale	2b720676e0	lavu/hwcontext_vaapi: Map the AYUV format This is the format used by Intel VAAPI for 8bit 4:4:4 content.	2022-08-03 14:10:12 -07:00
Philip Langdale	6ab8a9d375	lavu/pixfmt: Add packed 4:4:4 format The "AYUV" format is defined by Microsoft as their preferred format for 4:4:4 content, and so it is the format used by Intel VAAPI and QSV. As Microsoft like to define their byte ordering in little-endian fashion, the memory order is reversed, and so our pix_fmt, which follows memory order, has a reversed name (VUYA).	2022-08-03 14:09:46 -07:00
Andreas Rheinhardt	8d7d52721a	avutil/opt: Combine multiple av_log statements Reviewed-by: Thilo Borgmann <thilo.borgmann@mail.de> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-08-03 21:09:24 +02:00
Anton Khirnov	eede1d2927	lavu/frame: allow calling av_frame_make_writable() on non-refcounted frames This is an easy way to make a refcounted frame from a non-refcounted one.	2022-08-02 10:44:37 +02:00
Anton Khirnov	4397f9a5a0	lavu/frame: add a duration field to AVFrame The only duration field currently present in AVFrame is pkt_duration, which is semantically restricted to those frames that are output by decoders. Add a new field that stores the frame's duration without regard for how that frame was produced. Deprecate pkt_duration.	2022-07-19 12:27:17 +02:00
Timo Rothenpieler	63ce42019c	avutil/hwcontext_d3d11va: add BGRA/RGBA10 formats support Desktop duplication outputs those	2022-07-18 00:32:14 +02:00
Timo Rothenpieler	6cbb7d673d	avutil/hwcontext_d3d11va: update hwctx flags from input texture At least QSV relies on those being set correctly when deriving a hwctx.	2022-07-18 00:32:14 +02:00
Timo Rothenpieler	30bbc0a624	avutil/hwcontext_d3d11va: fix texture_infos writes on non-fixed-size pools	2022-07-18 00:32:14 +02:00
Timo Rothenpieler	e18c575474	avutil/hwcontext_d3d11va: fix mixed declaration and code	2022-07-18 00:32:14 +02:00
Michael Niedermayer	fd26b07e8b	Bump versions after 5.1 branch Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2022-07-13 00:29:05 +02:00
Michael Niedermayer	6f1b144358	Bump Versions for 5.1 branch Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2022-07-13 00:27:37 +02:00
Paul B Mahol	6ed9eaf664	avfilter: add remap opencl filter	2022-07-07 17:52:32 +02:00
Andreas Rheinhardt	aca09ed7d4	avutil/mem: Handle fast allocations near UINT_MAX properly av_fast_realloc and av_fast_mallocz? store the size of the objects they allocate in an unsigned. Yet they overallocate and currently they can allocate more than UINT_MAX bytes in case a user has requested a size of about UINT_MAX * 16 / 17 or more if SIZE_MAX > UINT_MAX (and if the user increased max_alloc_size via av_max_alloc). In this case it is impossible to store the true size of the buffer via the unsigned*; future requests are likely to use the (re)allocation codepath even if the buffer is actually large enough because of the incorrect size. Fix this by ensuring that the actually allocated size always fits into an unsigned. (This entails erroring out in case the user requested more than UINT_MAX.) Reviewed-by: Tomas Härdin <tjoppen@acc.umu.se> Reviewed-by: Anton Khirnov <anton@khirnov.net> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-07-06 22:53:15 +02:00
Lynne	f9dd8fcf9b	hwcontext: add a stub implementation for Vulkan functions	2022-07-05 15:20:08 +02:00
James Almer	0afdc95767	Revert "avutil/channel_layout: av_channel_layout_describe_bprint: Check for buffer end" The doxy for av_channel_layout_describe() states that the user should look at the return value to check if the string was truncated. Returning an error code in this scenario goes against this and is an API break. A proper fix for the timeout was applied to the Matroska demuxer in `94901a9518`. This reverts commit `8154cb7c2f`.	2022-07-04 14:04:54 -03:00
Michael Niedermayer	8154cb7c2f	avutil/channel_layout: av_channel_layout_describe_bprint: Check for buffer end Fixes: Timeout printing a billion channels Fixes: 48099/clusterfuzz-testcase-minimized-ffmpeg_dem_MATROSKA_fuzzer-6754782204788736 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2022-07-02 19:22:36 +02:00
Andreas Rheinhardt	4454142782	avutil/wchar_filename: Make the header C++ compatible When compiling decklink, this header is included from a C++ file (albeit inside 'extern "C"') and this causes compilation failures because of an implicit void* -> char* conversion. So add an explicit cast. Fixes ticket #9819. Reviewed-by: Martin Storsjö <martin@martin.st> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-06-28 10:59:31 +02:00
Brad Smith	beaf172d75	avutil/ppc/cpu: Use proper header for OpenBSD PPC CPU detection Use the proper header for PPC CPU detection code. sys/param.h includes sys/types, but sys/types.h is the more appropriate header to be used here. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2022-06-25 12:16:51 +02:00
Andreas Rheinhardt	2718a3be1f	avutil/x86/float_dsp: Remove obsolete 3dnowext function x64 always has MMX, MMXEXT, SSE and SSE2 and this means that some functions for MMX, MMXEXT, SSE and 3dnow are always overridden by other functions (unless one e.g. explicitly disables SSE2). So given that the only systems which benefit from ff_vector_fmul_window_3dnowext are truly ancient 32bit AMD x86s it is removed. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-06-22 13:37:22 +02:00
Andreas Rheinhardt	ea043cc53e	avutil/x86/pixelutils: Remove obsolete MMX(EXT) functions x64 always has MMX, MMXEXT, SSE and SSE2 and this means that some functions for MMX, MMXEXT, SSE and 3dnow are always overridden by other functions (unless one e.g. explicitly disables SSE2). So given that the only systems which benefit from the 8x8 MMX (overridden by MMXEXT) or the 16x16 MMXEXT (overridden by SSE2) are truly ancient 32bit x86s they are removed. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-06-22 13:36:44 +02:00
Nil Admirari	cc5844da98	libavutil: Add wchartoutf8(), wchartoansi(), utf8toansi(), getenv_utf8(), freeenv_utf8() and getenv_dup() wchartoutf8() converts strings returned by WinAPI into UTF-8, which is FFmpeg's preferred encoding. Some external dependencies, such as AviSynth, are still not Unicode-enabled. utf8toansi() converts UTF-8 strings into ANSI in two steps: UTF-8 -> wchar_t -> ANSI. wchartoansi() is responsible for the second step of the conversion. Conversion in just one step is not supported by WinAPI. Since these character converting functions allocate the buffer of necessary size, they also facilitate the removal of MAX_PATH limit in places where fixed-size ANSI/WCHAR strings were used as filename buffers. On Windows, getenv_utf8() wraps _wgetenv() converting its input from and its output to UTF-8. Strings returned by getenv_utf8() must be freed by freeenv_utf8(). On all other platforms getenv_utf8() is a wrapper around getenv(), and freeenv_utf8() is a no-op. The value returned by plain getenv() cannot be modified; av_strdup() is usually used when modifications are required. However, on Windows, av_strdup() after getenv_utf8() leads to unnecessary allocation. getenv_dup() is introduced to avoid such an allocation. Value returned by getenv_dup() must be freed by av_free(). Because of cleanup complexities, in places that only test the existence of an environment variable or compare its value with a string consisting entirely of ASCII characters, the use of plain getenv() is still preferred. (libavutil/log.c check_color_terminal() is an example of such a place.) Plain getenv() is also preferred in UNIX-only code, such as bktr.c, fbdev_common.c, oss.c in libavdevice or af_ladspa.c in libavfilter. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-06-21 13:27:46 +03:00
Andreas Rheinhardt	ac322ec214	avutil/cpu_internal: Fix check for SSE2SLOW For SSE2 and SSE3, there are four states that the two flags involved (AV_CPU_FLAG_SSE[23] and AV_CPU_FLAG_SSE[23]SLOW) can convey. When ordered from worst to best they are: 1. both flags unset (SSE[23] unavailable) 2. the slow flag set, the ordinary flag unset (this is designed for cases where SSE2 is available, but so slow that MMX(EXT)/SSE code is usually faster) 3. both flags set (SSE2 is available, but there might be scenarios where MMX(EXT)/SSE code is faster) 4. the ordinary flag set, the slow flag unset (this is the normal case) The ordinary macros for checking cpuflags return true in the latter two cases; the fast macros only return true for the latter case. Yet the macros to check for slow currently only return true in case three. This seems unintended. In fact, the only uses of the slow macros are all of the form if (EXTERNAL_SSE2(cpu_flags) \|\| EXTERNAL_SSE2_SLOW(cpu_flags)) where the check for EXTERNAL_SSE2_SLOW is completely redundant. Even more importantly, it is not what was intended. Before `6369ba3c9c`, the checks passed in cases 2 to 4. Said commit changed this to something that only passes for the third case. Commits `7fb758cd8e` and `c1913064e3` restored the old behaviour, yet merging `4efab89332` (in commit `ac774cfa57`) broke this again by changing it to what it is now.* This commit changes the macros to make the slow macros check whether a specific instruction is supported, even if slow. This restores the intended meaning to all uses of the SLOW macros and is generally more natural. *: Libav only checks for EXTERNAL_SSE2_SLOW, i.e. for the third case only. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-06-18 19:25:03 +02:00
Andreas Rheinhardt	40e6575aa3	all: Replace if (ARCH_FOO) checks by #if ARCH_FOO This is more spec-compliant because it does not rely on dead-code elimination by the compiler. Especially MSVC has problems with this, as can be seen in https://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/296373.html or https://ffmpeg.org/pipermail/ffmpeg-devel/2022-May/297022.html This commit does not eliminate every instance where we rely on dead code elimination: It only tackles branching to the initialization of arch-specific dsp code, not e.g. all uses of CONFIG_ and HAVE_ checks. But maybe it is already enough to compile FFmpeg with MSVC with whole-program optimizations enabled (if one does not disable too many components). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-06-15 04:56:37 +02:00
Zane van Iperen	ff59ecc4de	avutil: bump version after UUID changes Forgot to bump after `76e95daa08`. Signed-off-by: Zane van Iperen <zane@zanevaniperen.com> Signed-off-by: James Almer <jamrial@gmail.com>	2022-06-13 22:01:28 -03:00
Pierre-Anthony Lemieux	7c2f029ede	avutil/tests/uuid: add uuid tests	2022-06-12 18:34:37 +10:00
Zane van Iperen	76e95daa08	avutil/uuid: add utility library for manipulating UUIDs as specified in RFC 4122 Co-authored-by: Pierre-Anthony Lemieux <pal@palemieux.com> Signed-off-by: Zane van Iperen <zane@zanevaniperen.com>	2022-06-12 18:34:28 +10:00
softworkz	c5aba39a04	avutil/wchar_filename,file_open: Support long file names on Windows Signed-off-by: softworkz <softworkz@hotmail.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2022-06-09 13:03:47 +03:00
softworkz	22ab2a375d	libavutil/tests/md5: Remove 'volatile workaround' to avoid warnings Those are always showing up on Patchwork when FATE tests are failing, covering some possibly more useful information. The volatile keyword was used as a workaround for an eight year old clang version. Signed-off-by: softworkz <softworkz@hotmail.com> Signed-off-by: Marton Balint <cus@passwd.hu>	2022-06-08 22:24:31 +02:00
James Almer	5929ea6d4b	avutil/avframe: fix channel layout checks in av_frame_copy() Normally, both the source and dest frame would have only the old API fields set, only the new API fields set, or both set. But in some cases, like when calling av_frame_ref() using a non reference counted source frame where only the old channel layout API fields were populated, the result would be the dst frame having both the new and old fields populated. This commit takes this into account and fixes the checks by calling av_channel_layout_compare() only if the source frame has the new API fields set, and doing sanity checks for the source frame old API fields if the new ones are not set. Signed-off-by: James Almer <jamrial@gmail.com>	2022-06-05 09:09:07 -03:00
Leo Izen	d42b410e05	avutil/csp: create public API for colorspace structs This commit moves some of the functionality from avfilter/colorspace into avutil/csp and exposes it as a public API so it can be used by libavcodec and/or libavformat. It also converts those structs from double values to AVRational to make regression testing easier and more consistent. Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>	2022-06-01 13:52:38 -04:00
Zhao Zhili	5d8d3c1ac2	avutil/mem: fix doc for reallocs The doc says those functions are like av_free if size or nmemb is zero. It doesn't match the code. av_realloc() reallocates one byte if size is zero, which was added by `91ff05f6ac` ten years ago. realloc() itself in C is implementation-dependent. Make the doc match the longstanding behaviour. Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2022-05-26 17:18:23 +08:00


5766 Commits