FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-12 19:18:44 +02:00

Author	SHA1	Message	Date
Niklas Haas	2a2e0aced2	avutil/dovi_meta: document static vs dynamic ext blocks	2024-08-16 11:48:02 +02:00
Lynne	604dfdb44c	hwcontext_vulkan: align host mapping size to minImportedHostPointerAlignment This was left out of the recent rewrite of the system.	2024-08-16 01:22:16 +02:00
Lynne	18d964fc2c	vulkan: enable encoding of images if video_maintenance1 is enabled Vulkan encoding was designed in a very... consolidated way. You had to know the exact codec and profile that the image was going to eventually be encoded as at... image creation time. Unfortunately, as good as our code is, glimpsing into the exact future isn't what it's capable of. video_maintenance1 removed that requirement, which only then made encoding images practically possible.	2024-08-16 01:22:16 +02:00
Lynne	46c13834b6	hwcontext_vulkan: enable VK_KHR_video_maintenance1 We require it for encoding.	2024-08-16 01:22:15 +02:00
Lynne	97e947a2a7	hwcontext_vulkan: setup extensions before features The issue is that enabling features requires that the device extension is supported. The extensions bitfield was set later, so it was always 0, leading to no features being added.	2024-08-16 01:22:15 +02:00
Lynne	c3cbaf39bb	hwcontext_vulkan: don't enable deprecated VK_KHR_sampler_ycbcr_conversion extension It was added to Vulkan 1.1 a long time ago. Validation layer will warn if this is enabled.	2024-08-16 01:22:15 +02:00
Lynne	3f65d24075	hwcontext_vulkan: fix user layers, add support for different debug modes The validation layer option only supported GPU-assisted validation. This is mutually exclusive with shader debug printfs, so we need to differentiate between the two. This also fixes issues with user-given layers, and leaks in case of errors.	2024-08-16 01:22:14 +02:00
gnattu	a1976e963f	avutil/hwcontext_videotoolbox: silence warning for RGB Hardware frames with RGB colorspace will not have a YCbCrMatrixKey. Currently, it will spam the console with warning if rgb frame is uploaded. Signed-off-by: Gnattu OC <gnattuoc@me.com> Reviewed-by: Marvin Scholz <epirat07@gmail.com> Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2024-08-15 20:10:33 +08:00
Lynne	d138d7a595	vulkan: make sure descriptor buffers are always DEVICE_LOCAL Implementations are required to list memory heaps in the most optimal order. But it's better to be explicit for this particular allocation.	2024-08-13 19:05:20 +02:00
Lynne	e25667f9f1	hwcontext_vulkan: ignore false positive validation errors Issue ref: https://github.com/KhronosGroup/Vulkan-ValidationLayers/issues/6627	2024-08-11 05:13:18 +02:00
Lynne	ef11a6456d	hwcontext_vulkan: do not chain structs of unsupported extensions in vkCreateDevice Fixes: vkCreateDevice(): pCreateInfo->pNext<VkPhysicalDeviceOpticalFlowFeaturesNV> includes a pointer to a VkPhysicalDeviceOpticalFlowFeaturesNV, but when creating VkDevice, the parent extension (VK_NV_optical_flow) was not included in ppEnabledExtensionNames. The Vulkan spec states: Each pNext member of any structure (including this one) in the pNext chain must be either NULL or a pointer to a valid struct for extending VkDeviceCreateInfo.	2024-08-11 05:13:17 +02:00
Lynne	d6c08a41cb	vulkan: load queue families upon loading properties Avoids the need to call ff_vk_qf_init if manually filling in a queue family structure.	2024-08-11 05:13:16 +02:00
Lynne	0b25f0bc1d	hwcontext_vulkan: correct comment in header	2024-08-11 05:13:16 +02:00
Lynne	5f0f1f7b7a	libavutil: deprecate the old Vulkan queue API, add doc/APIchanges entries	2024-08-11 05:13:15 +02:00
Lynne	83cd77563f	vulkan: add support for encode feedback queries	2024-08-11 05:13:15 +02:00
Lynne	2ce0e51503	hwcontext_vulkan: add support for Vulkan encoding	2024-08-11 05:13:14 +02:00
Lynne	8eac11105b	vulkan: use allocator callback for buffer creation This would've led to a segfault if custom allocators were used.	2024-08-11 05:13:13 +02:00
Lynne	55adcb4fc5	hwcontext_vulkan: add support for VK_EXT_shader_object We'd like to use it eventually, and it's already covered by the minimum version of the headers we require.	2024-08-11 05:13:13 +02:00
Lynne	c19af16f8d	hwcontext_vulkan: enable storageBuffer16BitAccess if available	2024-08-11 05:13:12 +02:00
Lynne	957d34784a	hwcontext_vulkan: constify validation layer features table The struct data seem to get corrupted otherwise. Possibly a validation layer or libvulkan issue.	2024-08-11 05:13:11 +02:00
Lynne	9e606b33a8	hwcontext_vulkan: add HOST_CACHED flag to transfer buffer Significantly speeds up downloads on devices without host mapping.	2024-08-11 05:13:11 +02:00
Lynne	aea4d4b423	hwcontext_vulkan: rewrite upload/download This commit was long overdue. The old transfer dubiously tried to merge as much code as possible, and had very little in the way of optimizations, apart from basic host-mapping. The new code uses buffer pools for any temporary buffers, and handles falling back to buffer-based uploads if host-mapping fails. Roundtrip performance difference: ffmpeg -init_hw_device "vulkan=vk:0,debug=0,disable_multiplane=1" -f lavfi \ -i color=red:s=3840x2160 -vf hwupload,hwdownload,format=yuv420p -f null - 7900XTX: Before: 224fps After: 502fps Ada, with proprietary drivers: Before: 29fps After: 54fps Alder Lake: Before: 85fps After: 108fps With the host-mapping codepath disabled: Before: 32fps After: 51fps	2024-08-11 05:13:11 +02:00
Lynne	81c5d4ea0e	hwcontext_vulkan: remove unused struct	2024-08-11 05:13:10 +02:00
Lynne	a30b7c0158	hwcontext_vulkan: initialize optical flow queues if available Lets us implement FPS conversion.	2024-08-11 05:13:10 +02:00
Lynne	8790a30882	hwcontext_vulkan: rewrite queue picking system for the new API This allows us to support different video ops on different queues, as well as any other arbitrary queues we need.	2024-08-11 05:13:09 +02:00
Lynne	bedfabc437	vulkan: use the new queue family mechanism	2024-08-11 05:13:09 +02:00
Lynne	13489c8a21	hwcontext_vulkan: add a new mechanism to expose used queue families The issue with the old mechanism is that we had to introduce new API each time we needed a new queue family, and all the queue families were functionally fixed to a given purpose. Nvidia's GPUs are able to handle video encoding and compute on the same queue, which results in a speedup when pre-processing is required. Also, this enables us to expose optical flow queues for frame interpolation.	2024-08-11 05:13:03 +02:00
Fei Wang	eab4a9e9f8	lavu/hwcontext_qsv: Use vendor id to create device New kernel driver "xe" will be supported from Lunar Lake instead of "i915". "xe" kernel driver: https://github.com/torvalds/linux/tree/master/drivers/gpu/drm/xe Signed-off-by: Fei Wang <fei.w.wang@intel.com>	2024-08-09 13:40:26 +08:00
Fei Wang	dbd74ba3c8	lavu/hwcontext_vaapi: Add option to specify vendor id when initializing hw device The vendor id helps select the desired device when the kernel driver is unknown or unsupported, since a vendor may support different kernel drivers on different platforms. Signed-off-by: Fei Wang <fei.w.wang@intel.com>	2024-08-09 13:40:24 +08:00
James Almer	210740b4ed	avutil/frame: use the maximum compile time supported alignment for strides This puts lavu frame buffer allocator helpers in sync with lavc's decoder frame buffer allocator's STRIDE_ALIGN define. Remove the comment about av_cpu_max_align() while at it as using it is not ideal when CPU flags can be changed mid process. Should fix ticket #11116. Signed-off-by: James Almer <jamrial@gmail.com>	2024-08-07 00:16:21 -03:00
Rémi Denis-Courmont	e0f9f4d491	lavu/cpu: deprecate RISC-V F, D and zba CPU flags	2024-08-05 21:16:26 +03:00
Rémi Denis-Courmont	d1326b6347	lavu/riscv: drop probing for zba CPU capability	2024-08-05 21:16:26 +03:00
Rémi Denis-Courmont	cb31f17ca8	lavu/riscv: depend on RVB and simplify accordingly	2024-08-05 21:16:26 +03:00
Nathan E. Egge	ba88e8174a	lavu: Set default FF_TIMER_UNITS to "ns" Signed-off-by: Nathan E. Egge <unlord@xiph.org> Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	2024-08-05 21:16:26 +03:00
Gnattu OC	d50f9701b6	avutil/hwcontext_videotoolbox: Correctly set trc The color trc key was assigned a color primaries value, which caused the resulting colorspace to always be SDR. Fixes #10884. Signed-off-by: Gnattu OC <gnattuoc@me.com> Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2024-08-02 10:24:09 +08:00
Rémi Denis-Courmont	1b2a925e94	lavc/riscv: drop probing for F & D extensions F and D extensions are included in all RISC-V application profiles ever made (so starting from RV64GC a.k.a. RVA20). Realistically they need to be selected at compilation time. Currently, there are no consumers for these two flags. If there is ever a need to reintroduce F- or D-specific optimisations, we can always use __riscv_f or __riscv_d compiler predefined macros respectively.	2024-08-01 22:56:50 +03:00
Rémi Denis-Courmont	54b1970c60	lavu/riscv: fix return type	2024-08-01 18:44:01 +03:00
James Almer	6f8e365a2a	avutil/hwcontext_vaapi: use the correct type for VASurfaceAttribExternalBuffers.buffers Should fix ticket #11115. Signed-off-by: James Almer <jamrial@gmail.com>	2024-08-01 12:13:53 -03:00
Marvin Scholz	ca7fcf5089	avutil/hwcontext_videotoolbox: Fix build with older SDKs The previous fix was not sufficient. To make things easier to reason about, split the function and add the guards there instead of complicating the call site more. Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2024-08-01 20:58:27 +08:00
Nathan E. Egge	8280ec7a32	lavu/riscv: Revert `d808070`, removing AV_READ_TIME The implementation of ff_read_time() for RISC-V uses rdtime which has precision on existing hardware too low (!) for benchmarking purposes. Deleting this implementation falls back on clock_gettime() which was added as the default ff_read_time() implementation in `33e4cc9`. Below are metrics gathered on SpacemiT K1, before and after this commit: Before: $ tests/checkasm/checkasm --bench benchmarking with native FFmpeg timers nop: 0.0 checkasm: using random seed 3473665261 checkasm: bench runs 1024 (1 << 10) RVI: - pixblockdsp.get_pixels [OK] - vc1dsp.mspel_pixels [OK] RVF: - audiodsp.audiodsp [OK] checkasm: all 4 tests passed audiodsp.vector_clipf_c: 1388.7 audiodsp.vector_clipf_rvf: 261.5 get_pixels_c: 2.0 get_pixels_rvi: 1.5 vc1dsp.put_vc1_mspel_pixels_tab[0][0]_c: 8.0 vc1dsp.put_vc1_mspel_pixels_tab[0][0]_rvi: 1.0 vc1dsp.put_vc1_mspel_pixels_tab[1][0]_c: 2.0 vc1dsp.put_vc1_mspel_pixels_tab[1][0]_rvi: 0.5 After: $ tests/checkasm/checkasm --bench benchmarking with native FFmpeg timers nop: 56.4 checkasm: using random seed 1021411603 checkasm: bench runs 1024 (1 << 10) RVI: - pixblockdsp.get_pixels [OK] - vc1dsp.mspel_pixels [OK] RVF: - audiodsp.audiodsp [OK] checkasm: all 4 tests passed audiodsp.vector_clipf_c: 23236.4 audiodsp.vector_clipf_rvf: 11038.4 get_pixels_c: 79.6 get_pixels_rvi: 48.4 vc1dsp.put_vc1_mspel_pixels_tab[0][0]_c: 329.6 vc1dsp.put_vc1_mspel_pixels_tab[0][0]_rvi: 38.1 vc1dsp.put_vc1_mspel_pixels_tab[1][0]_c: 89.9 vc1dsp.put_vc1_mspel_pixels_tab[1][0]_rvi: 17.1 Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	2024-07-31 17:48:50 +03:00
Rémi Denis-Courmont	bd0c3edb13	lavu/riscv: count bytes rather than words for bswap32 This removes the dependency on Zba at essentially zero cost.	2024-07-30 18:41:51 +03:00
Fei Wang	79b4869959	lavu/hwcontext_qsv: Derive bind flag from frame type if no valid surface Fix cmd: ffmpeg.exe -init_hw_device d3d11va=d3d -init_hw_device qsv=qsv@d3d \ -filter_hw_device d3d -hwaccel qsv -hwaccel_output_format qsv \ -i in.h264 -vf "hwmap,format=d3d11,hwdownload,format=nv12" -y out.yuv Signed-off-by: Fei Wang <fei.w.wang@intel.com> Tested-by: Tong Wu <wutong1208@outlook.com>	2024-07-30 13:41:15 +08:00
James Almer	9e7a93c6fd	x86/intreadwrite: add SSE2 optimized AV_COPY128U Signed-off-by: James Almer <jamrial@gmail.com>	2024-07-29 23:17:52 -03:00
James Almer	753f2aeed7	avutil/intreadwrite: add missing aligned read/write macros Signed-off-by: James Almer <jamrial@gmail.com>	2024-07-29 21:33:31 -03:00
Rémi Denis-Courmont	39ced529b0	lavu/riscv: implement floating point clips Unlike x86, fmin/fmax are single instructions, not function calls. They are much much faster than doing a comparison, then branching based on its results. With this, audiodsp.vector_clipf gets almost twice as fast, and a properly unrolled version of it gets 4-5x faster, on SiFive-U74. This is only the low-hanging fruit: FFMIN and FFMAX are presumably affected as well. This likely applies to other instruction sets with native IEEE floats, especially those lacking a conditional select instruction.	2024-07-28 21:24:58 +03:00
Niklas Haas	cbea92c84d	avutil/dovi_meta: add dv_md_compression to cfg record This field is used to signal the compression method in use.	2024-07-28 12:20:07 +02:00
Rémi Denis-Courmont	a14d21a446	lavu/riscv: add forward-edge CFI landing pads	2024-07-25 23:10:14 +03:00
Rémi Denis-Courmont	6319601343	lavu/riscv: assembly for zicfilp LPAD This instruction, if aligned on a 4-byte boundary, defines a valid target ("landing pad") for an indirect call or jump. Since this instruction is a HINT, it is safe to assemble even if not included in the target instruction set architecture. The necessary alignment is already provided by the `func` macro. However this still lacks the ELF attribute to indicate that the zicfilp is supported in simple mode. This is left for future work as the ELF specification is not ratified as of yet. This will also nonobviously require the assembler to support zicfilp, insofar as the `tail` pseudo-instruction shall clobber T2 (instead of T1) as its temporary register.	2024-07-25 23:10:14 +03:00
Rémi Denis-Courmont	982376660c	lavu/riscv: align functions to 4 bytes Currently the start of the byte range for each function is aligned to 4 bytes. But this can lead to situations where the function is preceded by a 2-byte C.NOP at the aligned 4-byte boundary. Then the first actual instruction and the function symbol are only aligned on 2 bytes. This forcefully disables compression for the alignment and the symbol, thus ensuring that there is no padding before the function.	2024-07-25 23:10:14 +03:00
Rémi Denis-Courmont	45d7078a21	lavu/riscv: add CPU flag for B bit manipulations The B extension was finally ratified in May 2024, encompassing: - Zba (addresses), - Zbb (basics) and - Zbs (single bits). It does not include Zbc (base-2 polynomials).	2024-07-25 23:09:58 +03:00


6380 Commits