FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-03 05:10:03 +02:00

Author	SHA1	Message	Date
Philip Langdale	b982dd0d83	lavc/vaapi: Add support for remaining 10/12bit profiles With the necessary pixel formats defined, we can now expose support for the remaining 10/12bit combinations that VAAPI can handle. Specifically, we are adding support for HEVC (12bit 420, 10bit 422, 12bit 422, 10bit 444, 12bit 444) and VP9 (10bit 444, 12bit 444). These obviously require actual hardware support to be usable, but where that exists, it is now enabled. Note that unlike YUVA/YUVX, the Intel driver does not formally expose support for the alphaless formats XV30 and XV36, and so we are implicitly discarding the alpha from the decoder and passing undefined values for the alpha to the encoder. If a future encoder iteration were to actually do something with the alpha bits, we would need to use a formal alpha-capable format, or the encoder would need to explicitly accept the alphaless format.	2022-09-03 16:19:40 -07:00
Philip Langdale	d75c4693fe	lavu/pixfmt: Add P012, Y212, XV30, and XV36 formats These are the formats we want/need to use when dealing with the Intel VAAPI decoder for 12bit 4:2:0, 12bit 4:2:2, 10bit 4:4:4 and 12bit 4:4:4 respectively. As with the already supported Y210 and YUVX (XVUY) formats, they are based on formats Microsoft picked as their preferred 4:2:2 and 4:4:4 video formats, and Intel ran with it. P012 and Y212 are simply an extension of the 10 bit formats to say 12 bits will be used, with 4 unused bits instead of 6. XV30 and XV36, as exotic as they sound, are variants of Y410 and Y412 where the alpha channel is left formally undefined. We prefer these over the alpha versions because the hardware cannot actually do anything with the alpha channel and respecting it is just overhead. Y412/XV36 is a normal looking packed 4 channel format where each channel is 16 bits wide but only the 12 MSBs are used (like P012). Y410/XV30 packs three 10bit channels in 32 bits with 2 bits of alpha, like A/X2RGB10 style formats. This annoying layout forced me to define the BE version as a bitstream format. It seems like our pixdesc infrastructure can handle the LE version being byte-defined, but not when it's reversed. If there's a better way to handle this, please let me know. Our existing X2 formats all have the 2 bits at the MSB end, but this format places them at the LSB end and that seems to be the root of the problem.	2022-09-03 16:19:40 -07:00
Rémi Denis-Courmont	620e6e1487	arm: relax byte-swap assembler constraints There is no particular reason to force the compiler to use the same register as output and input operand. This forces an extra MOV instruction if the input value needs to be reused after the swap. In most cases, this makes no difference, as the compiler will select the same register for both operands either way. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-09-03 23:54:05 +03:00
Rémi Denis-Courmont	164021423a	aarch64: relax byte-swap assembler constraints There is no particular reason to force the compiler to use the same register as output and input operand. This forces an extra MOV instruction if the input value needs to be reused after the swap. In most cases, this makes no difference, as the compiler will select the same register for both operands either way. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-09-03 23:54:05 +03:00
Andreas Rheinhardt	73fada029c	avcodec/codec_internal: Add macros for update_thread_context(_for_user) It reduces typing: Before this patch, there were 11 callbacks that exceeded the 80 char line length limit; now there are zero. It also allows removing ONLY_IF_THREADS_ENABLED() from libavutil/internal.h. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-03 15:42:57 +02:00
Andreas Rheinhardt	48286d4d98	avcodec/codec_internal: Add macro to set AVCodec.long_name It reduces typing: Before this patch, there were 105 codecs whose long_name-definition exceeded the 80 char line length limit. Now there are only nine of them. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-03 15:42:57 +02:00
Andreas Rheinhardt	dea9744560	avutil/file: Properly deprecate av_tempfile() It has been deprecated in `b4f59beeb4`, but the attribute_deprecated was not set and there was no entry in APIchanges. This commit adds these and schedules it for removal. Given that the reason behind the deprecation is exactly the same as in av_fopen_utf8(), reuse its FF_API_AV_FOPEN_UTF8. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-03 15:42:40 +02:00
Andreas Rheinhardt	72c601e0f7	avutil/internal: Move avpriv-file API to a header of its own It is not used by the large majority of files that include lavu/internal.h. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-03 15:41:44 +02:00
Andreas Rheinhardt	04b7217872	avutil/dict: Move avpriv_dict_set_timestamp() to a header of its own It is used almost nowhere, so it needn't be auto-included almost everywhere. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-03 15:41:44 +02:00
Andreas Rheinhardt	26325cceb0	avutil/internal: Remove unused FF_SYMVER They have been unused since `d63443b968`. Furthermore, they were always in the wrong header: libavutil/internal.h is auto-included almost everywhere, but FF_SYMVER would only ever be used at a few places, so a proper header of its own would be appropriate for it. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-03 15:41:44 +02:00
Andreas Rheinhardt	5b0856d2b9	avutil/internal: Remove unused ff_rint64_clip() Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-09-03 15:41:44 +02:00
Martin Storsjö	e4ac156b7c	libavcodec: Set hidden visibility on global symbols accessed from AArch64 assembly The AArch64 assembly accesses those symbols directly, without indirection via e.g. the GOT on ELF. In order for this not to require text relocations, those symbols need to be resolved fully at link time, i.e. those symbols can't be interposable. Normally, so far, this is achieved when linking shared libraries in two ways; we have a version script (libavcodec/libavcodec.v) which marks all symbols that don't start with av* as local. Additionally, we try to add -Wl,-Bsymbolic to the linker options if supported, making sure that such symbol references are resolved fully at link time, instead of making them interposable. When the libavcodec static library is linked into another shared library, there's no guarantee that it uses similar options (even though that would be favourable), which would end up requiring text relocations in the AArch64 assembly. Explicitly mark the symbols that are accessed from AArch64 assembly as hidden, so that they are resolved fully at link time even without the version script and -Wl,-Bsymbolic. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-09-02 23:13:29 +03:00
Martin Storsjö	0dd8fe6f4b	arm: Check the build time constants in av_clip_*intp2 This fixes building for arm targets with optimizations disabled. Signed-off-by: Martin Storsjö <martin@martin.st>	2022-09-02 23:12:26 +03:00
Philip Langdale	caf26a8a12	lavc/vaapi: Switch preferred 8bit 444 format to VUYX As vaapi doesn't actually do anything useful with the alpha channel, and we have an alphaless format available, let's use that instead. The changes here are mostly 1:1 switching, but do note the explicit change in the number of declared channels from 4 to 3 to reflect that the alpha is being ignored.	2022-08-25 19:04:10 -07:00
Philip Langdale	cc5a5c9860	lavu/pixfmt: Introduce VUYX format This is the alphaless version of VUYA that I introduced recently. After further discussion and noting that the Intel vaapi driver explicitly lists XYUV as a supported format for encoding and decoding 8bit 444 content, we decided to switch our usage and avoid the overhead of having a declared alpha channel around. Note that I am not removing VUYA, as this turned out to have another use, which was to replace the need for v408enc/dec when dealing with the format. The vaapi switching will happen in the next change.	2022-08-25 19:02:49 -07:00
Lynne	f932b89ea3	lavu/tx: implement aarch64 NEON SIMD FFT The fastest fast Fourier transform in not just the west, but the world, now for the most popular toy ISA. On a high level, it follows the design of the AVX2 version closely, with the exception that the input is slightly less permuted as we don't have to do lane switching with the input on double 4pt and 8pt. On a low level, the lack of subadd/addsub instructions REALLY penalizes any attempt at writing an FFT. That single register matters a lot, and reloading it simply takes unacceptably long. In x86 land, vendors would've noticed developers need this. In ARM land, you get a badly designed complex multiplication instruction we cannot use, that's not present on 95% of devices. Because only compilers matter, right? Future optimization options are very few, perhaps better register management to use more ld1/st1s. A53 timings below are in cycles (Length | C | New (lavu) | Old (lavc) | FFTW): 4 | 842 | 420 | 1210 | 1460; 8 | 1538 | 1020 | 1850 | 2520; 16 | 3717 | 1900 | 3700 | 3990; 32 | 9156 | 4070 | 8289 | 8860; 64 | 21160 | 9931 | 18600 | 19625; 128 | 49180 | 23278 | 41922 | 41922; 256 | 112073 | 53876 | 93202 | 101092; 512 | 252864 | 122884 | 205897 | 207868; 1024 | 560512 | 278322 | 458071 | 453053; 2048 | 1295402 | 775835 | 1038205 | 1020265; 4096 | 3281263 | 2021221 | 2409718 | 2577554; 8192 | 8577845 | 4780526 | 5673041 | 6802722. Apple M1 totals in seconds: New - Total for len 512 reps 2097152 = 1.459141 s; Old - Total for len 512 reps 2097152 = 2.251344 s; FFTW - Total for len 512 reps 2097152 = 1.868429 s; New - Total for len 1024 reps 4194304 = 6.490080 s; Old - Total for len 1024 reps 4194304 = 9.604949 s; FFTW - Total for len 1024 reps 4194304 = 7.889281 s; New - Total for len 16384 reps 262144 = 10.374001 s; Old - Total for len 16384 reps 262144 = 15.266713 s; FFTW - Total for len 16384 reps 262144 = 12.341745 s; New - Total for len 65536 reps 8192 = 1.769812 s; Old - Total for len 65536 reps 8192 = 4.209413 s; FFTW - Total for len 65536 reps 8192 = 3.012365 s; New - Total for len 131072 reps 4096 = 1.942836 s; Old - Segfaults; FFTW - Total for len 131072 reps 4096 = 3.713713 s. Thanks to wbs for some simplifications, assembler fixes and a review, and to jannau for giving it a look.	2022-08-25 17:40:28 +02:00
Andreas Rheinhardt	0bb0c26799	avutil/mem_internal: Fix headers Including avassert.h has been unnecessary since commit `786be70e28`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-08-24 03:43:52 +02:00
Timo Rothenpieler	ef2c2a2220	avutil/half2float: use native _Float16 if available _Float16 support has been available on arm/aarch64 for a while, and with gcc 12 it was enabled on x86 as long as SSE2 is supported. If the target arch supports f16c, gcc emits fairly efficient assembly, taking advantage of it. This is the case on x86-64-v3 or higher. The same goes for arm, which has native float16 support. On x86 without f16c, it emulates it in software using SSE2 instructions. This has been shown to perform rather poorly: _Float16 full SSE2 emulation: frame=50074 fps=848 q=-0.0 size=N/A time=00:33:22.96 bitrate=N/A speed=33.9x; _Float16 f16c accelerated (Zen2, --cpu=znver2): frame=50636 fps=1965 q=-0.0 Lsize=N/A time=00:33:45.40 bitrate=N/A speed=78.6x; classic half2float full software implementation: frame=49926 fps=1605 q=-0.0 Lsize=N/A time=00:33:17.00 bitrate=N/A speed=64.2x. Hence an additional check was introduced that only enables use of _Float16 on x86 if f16c is being utilized. On aarch64, a similar uplift in performance is seen: RPi4 half2float full software implementation: frame= 6088 fps=126 q=-0.0 Lsize=N/A time=00:04:03.48 bitrate=N/A speed=5.06x; RPi4 _Float16: frame= 6103 fps=158 q=-0.0 Lsize=N/A time=00:04:04.08 bitrate=N/A speed=6.32x. Since arm/aarch64 always natively support 16-bit floats, it can always be considered fast there. I'm not aware of any additional platforms that currently support _Float16. And if there are, they should be considered non-fast until proven fast.	2022-08-19 22:09:36 +02:00
Timo Rothenpieler	6dc79f1d04	avutil/half2float: move non-inline init code out of header	2022-08-19 22:09:36 +02:00
Timo Rothenpieler	f3fb528cd5	avutil/half2float: move tables to header-internal structs Having to put the knowledge of the size of those arrays into a multitude of places is rather smelly.	2022-08-19 22:09:36 +02:00
Timo Rothenpieler	cb8ad005bb	avutil/half2float: adjust conversion of NaN IEEE-754 differentiates two kinds of NaNs: quiet and signaling ones. They are differentiated by the MSB of the mantissa. For whatever reason, actual hardware conversion of half to single always sets that bit to 1 (making the NaN quiet) if the mantissa is != 0, and to 0 if it's 0. So our code has to follow suit, or fate-testing hardware float16 will be impossible.	2022-08-19 22:09:36 +02:00
Timo Rothenpieler	b42925264a	avutil: move half-precision float helper to avutil	2022-08-19 22:09:36 +02:00
Lynne	ae66a9db7b	lavu/tx: optimize and simplify inverse MDCTs Convert the input from a scatter to a gather instead, which is faster and better for SIMD. Also, add a pre-shuffled exptab version to avoid gathering there at all. This doubles the exptab size, but the speedup makes it worth it. In SIMD, the exptab will likely be purged to a higher cache anyway because of the FFT in the middle, and the amount of loads stays identical. For a 960-point inverse MDCT, the speedup is 10%. This makes it possible to write sane and fast SIMD versions of inverse MDCTs.	2022-08-16 01:22:38 +02:00
Timo Rothenpieler	dd94a03468	avutil/hwcontext_d3d11va: add support for rgbaf16 pixel format	2022-08-13 15:21:59 +02:00
Timo Rothenpieler	e95b08a7dd	lavu/pixfmt: add packed RGBA float16 format This is the default format of the Windows compositor and what DXGI Desktop Duplication will give you for any kind of HDR output.	2022-08-13 15:21:46 +02:00
Timo Rothenpieler	b77fff47d0	configure: always enable gnu_windres if available Use the appropriate Makefile variable to ensure the resource file is only built into shared libraries instead.	2022-08-13 14:42:36 +02:00
Haihao Xiang	05bd88dca2	lavu/hwcontext_qsv: make qsv hwdevice work with oneVPL In oneVPL, MFXLoad() and MFXCreateSession() are required to create a workable mfx session[1]. Add config filters for D3D9/D3D11 session (galinart). The default device is changed to d3d11va for oneVPL when both d3d11va and dxva2 are enabled on Microsoft Windows. This is in preparation for oneVPL support. [1] https://spec.oneapi.io/versions/latest/elements/oneVPL/source/programming_guide/VPL_prg_session.html#onevpl-dispatcher Co-authored-by: galinart <artem.galin@intel.com> Signed-off-by: galinart <artem.galin@intel.com> Signed-off-by: Haihao Xiang <haihao.xiang@intel.com>	2022-08-12 10:43:39 +08:00
Haihao Xiang	e0bbdbe0a6	lavu/hwcontext_qsv: add loader field to AVQSVDeviceContext In oneVPL, a valid mfxLoader handle is needed when creating an mfx session for decoding, encoding and processing[1], so add a loader field to AVQSVDeviceContext. The user should fill this field before calling av_hwdevice_ctx_init() if using oneVPL. This is in preparation for oneVPL support. [1] https://spec.oneapi.io/versions/latest/elements/oneVPL/source/programming_guide/VPL_prg_session.html#onevpl-dispatcher	2022-08-12 10:43:39 +08:00
Haihao Xiang	c77149bc37	qsv: restrict OPAQUE memory to MFX_VERSION < 2.0 OPAQUE memory isn't supported for MFX_VERSION >= 2.0[1][2]. This is in preparation for oneVPL support. [1] https://spec.oneapi.io/versions/latest/elements/oneVPL/source/VPL_intel_media_sdk.html#msdk-full-name-feature-removals [2] https://github.com/oneapi-src/oneVPL	2022-08-12 10:43:39 +08:00
Haihao Xiang	3e61b7dd7f	qsv: remove mfx/ prefix from mfx headers The following Cflags have been added to libmfx.pc, so the mfx/ prefix is no longer needed when including mfx headers in FFmpeg. Cflags: -I${includedir} -I${includedir}/mfx Some old versions of libmfx have the following Cflags in libmfx.pc: Cflags: -I${includedir} We may add -I${includedir}/mfx to CFLAGS when running 'configure --enable-libmfx' for old versions of libmfx; if so, mfx headers without the mfx/ prefix can be included too. If libmfx comes without pkg-config support, we may make a small change to the environment settings (e.g. set -I/opt/intel/mediasdk/include/mfx instead of -I/opt/intel/mediasdk/include in CFLAGS); then the build can find the mfx headers without the mfx/ prefix. After applying this change, we won't need to change #include for mfx headers when mfx headers are installed under a new directory. This is in preparation for oneVPL support (mfx headers in oneVPL are installed under the vpl directory).	2022-08-12 10:43:39 +08:00
Andreas Rheinhardt	d576b37fa7	avutil/buffer: Never poison returned buffers Poisoning returned buffers is based around the implicit assumption that the contents of said buffers are transient. Yet this is not true for the buffer pools used by the various hardware contexts which store important state in there that needs to be preserved. Furthermore, the current code is also based on the assumption that the complete buffer pointed to by AVBuffer->data coincides with AVBufferRef->data; yet an implementation might store some data of its own before the actual user-visible data (accessible via AVBufferRef) which would be broken by the current code. (This is of course yet more proof that the AVBuffer API is not the right tool for the hardware contexts.) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-08-10 18:49:35 +02:00
Lynne	98b32ef462	x86/tx_float: save a branch during coefficient deinterleaving Directly branch into the special 64-point deinterleave subroutine rather than going through the general deinterleave. 64-point transform timings on Zen 3: Before: 1974 decicycles in av_tx (fft),16776864 runs, 352 skips After: 1956 decicycles in av_tx (fft),16775378 runs, 1838 skips	2022-08-09 03:35:12 +02:00
Zhao Zhili	fc13803323	avutil/hwcontext_videotoolbox: add missing include for AVFrame Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2022-08-08 11:08:55 +08:00
James Almer	85c59bd6de	avutil/test/pixfmt_best: test the VUYA pixel format Signed-off-by: James Almer <jamrial@gmail.com>	2022-08-07 09:33:16 -03:00
Andreas Rheinhardt	2c8dc7e953	avcodec/loongarch/h264chroma, vc1dsp_lasx: Add wrapper for __lasx_xvldx __lasx_xvldx does not accept a pointer to const (in fact, no function in lasxintrin.h does so), although it is not allowed to modify the pointed-to buffer. Therefore this commit adds a wrapper for it in order to constify the H264Chroma API in a later commit. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-08-05 02:59:58 +02:00
Andreas Rheinhardt	6c9a60ada4	avcodec/loongarch: Add wrapper for __lsx_vldx __lsx_vldx does not accept a pointer to const (in fact, no function in lsxintrin.h does so), although it is not allowed to modify the pointed-to buffer. Therefore this commit adds a wrapper for it in order to constify the HEVC DSP functions in a later commit. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-08-05 02:53:35 +02:00
Philip Langdale	2b720676e0	lavu/hwcontext_vaapi: Map the AYUV format This is the format used by Intel VAAPI for 8bit 4:4:4 content.	2022-08-03 14:10:12 -07:00
Philip Langdale	6ab8a9d375	lavu/pixfmt: Add packed 4:4:4 format The "AYUV" format is defined by Microsoft as their preferred format for 4:4:4 content, and so it is the format used by Intel VAAPI and QSV. As Microsoft like to define their byte ordering in little-endian fashion, the memory order is reversed, and so our pix_fmt, which follows memory order, has a reversed name (VUYA).	2022-08-03 14:09:46 -07:00
Andreas Rheinhardt	8d7d52721a	avutil/opt: Combine multiple av_log statements Reviewed-by: Thilo Borgmann <thilo.borgmann@mail.de> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-08-03 21:09:24 +02:00
Anton Khirnov	eede1d2927	lavu/frame: allow calling av_frame_make_writable() on non-refcounted frames This is an easy way to make a refcounted frame from a non-refcounted one.	2022-08-02 10:44:37 +02:00
Anton Khirnov	4397f9a5a0	lavu/frame: add a duration field to AVFrame The only duration field currently present in AVFrame is pkt_duration, which is semantically restricted to those frames that are output by decoders. Add a new field that stores the frame's duration without regard for how that frame was produced. Deprecate pkt_duration.	2022-07-19 12:27:17 +02:00
Timo Rothenpieler	63ce42019c	avutil/hwcontext_d3d11va: add BGRA/RGBA10 formats support Desktop duplication outputs those	2022-07-18 00:32:14 +02:00
Timo Rothenpieler	6cbb7d673d	avutil/hwcontext_d3d11va: update hwctx flags from input texture At least QSV relies on those being set correctly when deriving a hwctx.	2022-07-18 00:32:14 +02:00
Timo Rothenpieler	30bbc0a624	avutil/hwcontext_d3d11va: fix texture_infos writes on non-fixed-size pools	2022-07-18 00:32:14 +02:00
Timo Rothenpieler	e18c575474	avutil/hwcontext_d3d11va: fix mixed declaration and code	2022-07-18 00:32:14 +02:00
Michael Niedermayer	fd26b07e8b	Bump versions after 5.1 branch Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2022-07-13 00:29:05 +02:00
Michael Niedermayer	6f1b144358	Bump Versions for 5.1 branch Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2022-07-13 00:27:37 +02:00
Paul B Mahol	6ed9eaf664	avfilter: add remap opencl filter	2022-07-07 17:52:32 +02:00
Andreas Rheinhardt	aca09ed7d4	avutil/mem: Handle fast allocations near UINT_MAX properly av_fast_realloc and av_fast_malloc/av_fast_mallocz store the size of the objects they allocate in an unsigned. Yet they overallocate and currently they can allocate more than UINT_MAX bytes in case a user has requested a size of about UINT_MAX * 16 / 17 or more if SIZE_MAX > UINT_MAX (and if the user increased max_alloc_size via av_max_alloc). In this case it is impossible to store the true size of the buffer via the unsigned*; future requests are likely to use the (re)allocation codepath even if the buffer is actually large enough because of the incorrect size. Fix this by ensuring that the actually allocated size always fits into an unsigned. (This entails erroring out in case the user requested more than UINT_MAX.) Reviewed-by: Tomas Härdin <tjoppen@acc.umu.se> Reviewed-by: Anton Khirnov <anton@khirnov.net> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2022-07-06 22:53:15 +02:00
Lynne	f9dd8fcf9b	hwcontext: add a stub implementation for Vulkan functions	2022-07-05 15:20:08 +02:00
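Commit d75c4693fe (P012, Y212, XV30, XV36) describes two storage tricks: 12-bit samples kept in the 12 MSBs of a 16-bit word, and three 10-bit samples packed into one 32-bit word. The sketch below illustrates both. It assumes Microsoft's documented Y410 bit layout read as a native 32-bit word: U in bits 0-9, Y in bits 10-19, V in bits 20-29, and the 2 undefined X bits in 30-31. Check that assumption against the actual pixdesc entries. The helper names are hypothetical, not FFmpeg API.

```c
#include <stdint.h>

/* P012 / Y212 / XV36 style: a 12-bit sample lives in the 12 MSBs of a 16-bit
 * word and the 4 LSBs are unused (hypothetical helpers, not FFmpeg API). */
static uint16_t msb12_pack(uint16_t v12) { return (uint16_t)((v12 & 0xFFF) << 4); }
static uint16_t msb12_unpack(uint16_t w) { return (uint16_t)(w >> 4); }

/* XV30 (Y410 without alpha), assumed layout of the native 32-bit word:
 * U = bits 0-9, Y = bits 10-19, V = bits 20-29, X = bits 30-31 (undefined). */
static uint32_t xv30_pack(uint16_t y, uint16_t u, uint16_t v)
{
    return  (uint32_t)(u & 0x3FF)
         | ((uint32_t)(y & 0x3FF) << 10)
         | ((uint32_t)(v & 0x3FF) << 20);   /* X bits are left as 0 */
}

static void xv30_unpack(uint32_t w, uint16_t *y, uint16_t *u, uint16_t *v)
{
    *u = (uint16_t)( w        & 0x3FF);
    *y = (uint16_t)((w >> 10) & 0x3FF);
    *v = (uint16_t)((w >> 20) & 0x3FF);     /* bits 30-31 are ignored */
}
```

Because the unpacker masks each field, it gives the same result whatever the X bits contain. This matches the commit's point that the hardware treats them as undefined.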
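The constraint change in 620e6e1487 and 164021423a can be sketched with GCC-style inline assembly. This is a minimal sketch with a hypothetical name, `sketch_bswap32`. The relaxed form uses a separate `"=r"` output and `"r"` input, whereas the tied form would have used a single `"+r"` operand, shown in a comment. A portable C fallback keeps the sketch runnable on non-ARM targets.

```c
#include <stdint.h>

/* Byte-swap a 32-bit value. On ARM/AArch64 with GCC/Clang this uses REV with
 * separate output ("=r") and input ("r") operands, so the compiler may pick
 * different registers and keep x alive without an extra MOV. The tied form
 * would instead be:  __asm__("rev %0, %0" : "+r"(x));  (ARM syntax) */
static inline uint32_t sketch_bswap32(uint32_t x)
{
#if defined(__GNUC__) && defined(__aarch64__)
    uint32_t y;
    __asm__ ("rev %w0, %w1" : "=r"(y) : "r"(x));
    return y;
#elif defined(__GNUC__) && defined(__arm__) && defined(__ARM_ARCH) && __ARM_ARCH >= 6
    uint32_t y;
    __asm__ ("rev %0, %1" : "=r"(y) : "r"(x));
    return y;
#else
    /* Portable fallback for other targets */
    return (x >> 24) | ((x >> 8) & 0xFF00u) | ((x << 8) & 0xFF0000u) | (x << 24);
#endif
}
```

With untied operands, the compiler is still free to choose the same register for both operands when `x` is dead after the swap. That is why the commit expects no difference in most cases.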
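Commits 73fada029c and 48286d4d98 shorten codec definitions with field-setting macros. The idea can be sketched with designated initializers. The sketch assumes a simplified codec struct whose public part is a nested member `p`, mirroring what we assume `FFCodec.p` is. All names (`SketchCodec`, `SKETCH_LONG_NAME`, `SKETCH_UPDATE_THREAD_CONTEXT`) are hypothetical stand-ins, and the `CONFIG_SMALL`/`HAVE_THREADS` switches are modeled with local defines.

```c
#include <stddef.h>

/* Hypothetical build switches standing in for CONFIG_SMALL / HAVE_THREADS */
#define SKETCH_CONFIG_SMALL  0
#define SKETCH_HAVE_THREADS  1

#if SKETCH_CONFIG_SMALL
#define SKETCH_NULL_IF_SMALL(x) NULL      /* drop long names in small builds */
#else
#define SKETCH_NULL_IF_SMALL(x) x
#endif

typedef struct SketchPublicCodec {
    const char *name;
    const char *long_name;
} SketchPublicCodec;

typedef struct SketchCodec {
    SketchPublicCodec p;                  /* public part (assumed to mirror FFCodec.p) */
    int (*update_thread_context)(void *dst, const void *src);
} SketchCodec;

/* One short macro per field keeps each initializer line under 80 chars. */
#define SKETCH_LONG_NAME(str) .p.long_name = SKETCH_NULL_IF_SMALL(str)
#if SKETCH_HAVE_THREADS
#define SKETCH_UPDATE_THREAD_CONTEXT(func) .update_thread_context = (func)
#else
#define SKETCH_UPDATE_THREAD_CONTEXT(func) .update_thread_context = NULL
#endif

static int demo_update(void *dst, const void *src)
{
    (void)dst;
    (void)src;
    return 0;
}

static const SketchCodec demo_codec = {
    .p.name = "demo",
    SKETCH_LONG_NAME("Demo codec for the macro sketch"),
    SKETCH_UPDATE_THREAD_CONTEXT(demo_update),
};
```

The conditional logic lives once in the macro definitions rather than at every call site. That is what lets a helper like `ONLY_IF_THREADS_ENABLED()` disappear from the shared header.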
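Commit e4ac156b7c relies on hidden visibility, which GCC and Clang expose through the `visibility` attribute. This is a minimal sketch assuming an ELF or Mach-O target; the `sketch_hidden` macro and the table are hypothetical, not the names used in libavcodec. A hidden symbol is not exported and cannot be interposed. Per the commit, that lets the linker resolve references from assembly in the same library at link time, without going through the GOT or needing text relocations.

```c
#include <stdint.h>

/* Mark a global as hidden: not exported from the shared object and not
 * interposable, so same-library references resolve at link time. */
#if defined(__GNUC__) && !defined(_WIN32) && !defined(__CYGWIN__)
#define sketch_hidden __attribute__((visibility("hidden")))
#else
#define sketch_hidden /* no-op where ELF/Mach-O visibility does not apply */
#endif

/* In a real library the extern declaration would live in a shared header. */
extern sketch_hidden const int16_t sketch_coeff_table[4];

sketch_hidden const int16_t sketch_coeff_table[4] = { 1, -2, 4, -8 };

/* C accessor; an AArch64 assembly routine could instead load the table's
 * address directly (e.g. adrp/add) without a GOT indirection. */
int sum_coeffs(void)
{
    int s = 0;
    for (int i = 0; i < 4; i++)
        s += sketch_coeff_table[i];
    return s;
}
```

Unlike a version script or `-Wl,-Bsymbolic`, the attribute travels with the object files. It therefore still applies when the static library is linked into someone else's shared library.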
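For 0dd8fe6f4b: the ARM inline-assembly clips encode `p` as an immediate operand, so they are only usable when `p` is a build-time constant. A check such as `__builtin_constant_p(p)` then routes other calls to plain C. The portable clipping those calls fall back to can be sketched as follows. This is an illustrative reimplementation with hypothetical names, assuming 32-bit `int` and an arithmetic right shift, not a copy of libavutil.

```c
/* Clip a signed integer to [-(1 << p), (1 << p) - 1]. */
static inline int sketch_clip_intp2_c(int a, int p)
{
    /* Out of range iff a + 2^p does not fit in p + 1 bits */
    if (((unsigned)a + (1u << p)) & ~((2u << p) - 1))
        return (a >> 31) ^ ((1 << p) - 1);   /* saturate toward the sign of a */
    return a;
}

/* Clip to [0, (1 << p) - 1]. */
static inline unsigned sketch_clip_uintp2_c(int a, int p)
{
    if (a & ~((1 << p) - 1))
        /* negative -> 0, too large -> (1 << p) - 1 */
        return (unsigned)((~a) >> 31) & ((1u << p) - 1);
    return (unsigned)a;
}
```

Both functions are branch-light bit tricks. The saturation value is derived from the sign bit instead of being compared explicitly against both bounds.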
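Commit f932b89ea3 laments the lack of an addsub instruction on NEON. To illustrate why this matters for FFTs, the sketch below emulates the semantics of x86 `ADDSUBPS` in scalar C: even lanes subtract, odd lanes add. It then uses that operation for an interleaved complex multiply. The function names are hypothetical. Without addsub, the same result is typically obtained by multiplying `t2` by a `{-1, +1, -1, +1}` constant and adding, and that constant presumably occupies the register the commit refers to.

```c
/* Emulate x86 ADDSUBPS on 4 float lanes: even lanes subtract, odd lanes add. */
static void addsub4(float dst[4], const float a[4], const float b[4])
{
    for (int i = 0; i < 4; i++)
        dst[i] = (i & 1) ? a[i] + b[i] : a[i] - b[i];
}

/* Multiply two pairs of interleaved complex numbers {re0, im0, re1, im1}:
 * (a + bi)(c + di) = (ac - bd) + (bc + ad)i, computed as addsub(t1, t2). */
static void cmul2(float dst[4], const float x[4], const float y[4])
{
    float t1[4], t2[4];

    for (int i = 0; i < 4; i += 2) {
        t1[i]     = x[i]     * y[i];      /* a * c */
        t1[i + 1] = x[i + 1] * y[i];      /* b * c */
        t2[i]     = x[i + 1] * y[i + 1];  /* b * d */
        t2[i + 1] = x[i]     * y[i + 1];  /* a * d */
    }
    addsub4(dst, t1, t2);                 /* {ac - bd, bc + ad, ...} */
}
```

For example, `(1 + 2i)(3 + 4i)` yields `-5 + 10i` in the first lane pair.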
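The selection logic in ef2c2a2220 can be sketched as a compile-time switch: use `_Float16` on arm/aarch64 always, and on x86 only when F16C is available. Several pieces are assumptions. `__FLT16_MANT_DIG__` is taken as the compiler's signal that `_Float16` exists. `__F16C__` is taken as the signal that F16C code generation is enabled (e.g. `-mf16c` or `-march=x86-64-v3`). `SKETCH_FAST_FLOAT16` and the function names are hypothetical. The software path is a simple bit-manipulation conversion, not FFmpeg's table-based one.

```c
#include <stdint.h>
#include <string.h>

#if defined(__FLT16_MANT_DIG__) && \
    (defined(__aarch64__) || defined(__arm__) || \
     ((defined(__x86_64__) || defined(__i386__)) && defined(__F16C__)))
#define SKETCH_FAST_FLOAT16 1
#else
#define SKETCH_FAST_FLOAT16 0
#endif

/* Software conversion of IEEE-754 binary16 bits to float. */
static float half2float_soft(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000) << 16;
    uint32_t exp  = (h >> 10) & 0x1F;
    uint32_t mant = h & 0x3FF;
    uint32_t bits;
    float f;

    if (exp == 0x1F)            /* Inf/NaN; NaNs come out quiet */
        bits = sign | 0x7F800000u | (mant << 13) | (mant ? 0x400000u : 0);
    else if (exp)               /* normal: rebias 15 -> 127 */
        bits = sign | ((exp + 112) << 23) | (mant << 13);
    else if (!mant)             /* signed zero */
        bits = sign;
    else {                      /* subnormal: normalize the mantissa */
        exp = 113;
        while (!(mant & 0x400)) {
            mant <<= 1;
            exp--;
        }
        bits = sign | (exp << 23) | ((mant & 0x3FF) << 13);
    }
    memcpy(&f, &bits, sizeof(f));
    return f;
}

static float half2float(uint16_t h)
{
#if SKETCH_FAST_FLOAT16
    _Float16 v;
    memcpy(&v, &h, sizeof(v));  /* reinterpret the bits, let the compiler convert */
    return (float)v;
#else
    return half2float_soft(h);
#endif
}
```

Both paths should agree bit-for-bit on every input, including NaNs. That agreement is what the companion NaN commit (cb8ad005bb) is about.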
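The NaN rule in cb8ad005bb can be shown on raw bit patterns. Under IEEE 754-2008, the mantissa MSB distinguishes quiet NaNs (bit set) from signaling NaNs (bit clear). The sketch below widens a half-precision Inf/NaN to single precision and forces that bit on whenever the mantissa is non-zero, which is the hardware behaviour the commit describes. The function names are hypothetical, and only the all-ones exponent case is handled.

```c
#include <stdint.h>

/* Widen a binary16 Inf/NaN (exponent bits all ones) to binary32 bits.
 * A non-zero mantissa (NaN) gets the quiet bit (mantissa MSB, 0x400000)
 * set, as hardware conversion does; a zero mantissa (Inf) is unchanged. */
static uint32_t half_special_to_float_bits(uint16_t h)
{
    uint32_t sign = (uint32_t)(h & 0x8000) << 16;
    uint32_t mant = h & 0x3FF;

    return sign | 0x7F800000u | (mant << 13) | (mant ? 0x400000u : 0);
}

/* A half NaN is signaling when its mantissa MSB (0x200) is clear. */
static int half_is_signaling_nan(uint16_t h)
{
    return (h & 0x7C00) == 0x7C00 && (h & 0x3FF) && !(h & 0x200);
}
```

For instance, the signaling half NaN `0x7C01` keeps its payload bit but becomes the quiet float NaN `0x7FC02000`.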
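The scatter-to-gather change in ae66a9db7b can be illustrated on a plain permutation. This is a generic sketch with hypothetical names, not the lavu/tx inverse MDCT code. The permutation is inverted once at init time, so the per-call loop reads through the index table and writes sequentially.

```c
/* Scatter: input element i is written to output slot map[i]. */
static void permute_scatter(float *out, const float *in, const int *map, int n)
{
    for (int i = 0; i < n; i++)
        out[map[i]] = in[i];
}

/* Precompute the inverse permutation once, at init time. */
static void invert_map(int *inv, const int *map, int n)
{
    for (int i = 0; i < n; i++)
        inv[map[i]] = i;
}

/* Gather: output slot i reads input element inv[i]; stores are sequential,
 * which suits SIMD (contiguous vector stores, indexed loads). */
static void permute_gather(float *out, const float *in, const int *inv, int n)
{
    for (int i = 0; i < n; i++)
        out[i] = in[inv[i]];
}
```

Both loops produce the same output. The gather form trades a one-time table for sequential stores in the hot path, which is the same kind of trade the commit makes with its pre-shuffled exptab.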
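The naming argument in 6ab8a9d375 can be checked in a few lines. This is a minimal sketch assuming AYUV is defined as the little-endian 32-bit word `A << 24 | Y << 16 | U << 8 | V`; the helper name is hypothetical. Writing that word least-significant byte first produces the memory order V, U, Y, A, which is why the memory-order pix_fmt is called VUYA.

```c
#include <stdint.h>

/* Store one AYUV pixel as a little-endian 32-bit word
 * (A in the top byte, V in the bottom byte), independent of host endianness. */
static void store_ayuv_le(uint8_t dst[4], uint8_t a, uint8_t y, uint8_t u, uint8_t v)
{
    uint32_t word = (uint32_t)a << 24 | (uint32_t)y << 16 | (uint32_t)u << 8 | v;

    for (int i = 0; i < 4; i++)
        dst[i] = (uint8_t)(word >> (8 * i));   /* least significant byte first */
}
```

The same reasoning gives VUYX for the alphaless variant introduced in cc5a5c9860.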
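The fix in aca09ed7d4 can be sketched as a size computation that never exceeds `UINT_MAX`. This is a minimal sketch with hypothetical names. The growth formula `min_size + min_size / 16 + 32` is an assumption chosen to match the 16/17 ratio mentioned in the message, and the `max_alloc_size` limit set via `av_max_alloc()` is left out.

```c
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Size to allocate for a request of min_size bytes, or 0 if it cannot be
 * represented in the unsigned size field. Over-allocates by ~1/16 + 32. */
static size_t sketch_fast_alloc_size(size_t min_size)
{
    size_t max_size = UINT_MAX;       /* the caller stores the size in an unsigned */
    size_t grown;

    if (min_size > max_size)
        return 0;                     /* error out instead of lying about the size */
    grown = min_size + min_size / 16 + 32;
    if (grown > max_size || grown < min_size)  /* too big, or wrapped around */
        grown = max_size;             /* clamp so the true size still fits */
    return grown;
}

static void *sketch_fast_realloc(void *ptr, unsigned *size, size_t min_size)
{
    size_t new_size;
    void *p;

    if (min_size <= *size)
        return ptr;                   /* current buffer is already large enough */
    new_size = sketch_fast_alloc_size(min_size);
    if (!new_size) {
        *size = 0;
        return NULL;                  /* ptr is untouched; caller still owns it */
    }
    p = realloc(ptr, new_size);
    if (!p) {
        *size = 0;
        return NULL;
    }
    *size = (unsigned)new_size;       /* always fits thanks to the clamp above */
    return p;
}
```

Clamping instead of failing outright means requests between roughly `UINT_MAX * 16 / 17` and `UINT_MAX` still succeed, just with less headroom. Only requests above `UINT_MAX` are rejected.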


5583 Commits