1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-24 13:56:33 +02:00

6372 Commits

Author SHA1 Message Date
Lynne
d138d7a595
vulkan: make sure descriptor buffers are always DEVICE_LOCAL
Implementations are required to list memory heaps in the most optimal
order. But its better to be explicit for this particular allocation.
2024-08-13 19:05:20 +02:00
Lynne
e25667f9f1
hwcontext_vulkan: ignore false positive validation errors
Issue ref:
https://github.com/KhronosGroup/Vulkan-ValidationLayers/issues/6627
2024-08-11 05:13:18 +02:00
Lynne
ef11a6456d
hwcontext_vulkan: do not chain structs of unsupported extensions in vkCreateDevice
Fixes:

vkCreateDevice(): pCreateInfo->pNext<VkPhysicalDeviceOpticalFlowFeaturesNV> includes a
pointer to a VkPhysicalDeviceOpticalFlowFeaturesNV, but when creating VkDevice, the
parent extension (VK_NV_optical_flow) was not included in ppEnabledExtensionNames.
The Vulkan spec states: Each pNext member of any structure (including this one) in
the pNext chain must be either NULL or a pointer to a valid struct for extending
VkDeviceCreateInfo.
2024-08-11 05:13:17 +02:00
Lynne
d6c08a41cb
vulkan: load queue families upon loading properties
Avoids the need to call ff_vk_qf_init if manually filling in
a queue family structure.
2024-08-11 05:13:16 +02:00
Lynne
0b25f0bc1d
hwcontext_vulkan: correct comment in header 2024-08-11 05:13:16 +02:00
Lynne
5f0f1f7b7a
libavutil: deprecate the old Vulkan queue API, add doc/APIchanges entries 2024-08-11 05:13:15 +02:00
Lynne
83cd77563f
vulkan: add support for encode feedback queries 2024-08-11 05:13:15 +02:00
Lynne
2ce0e51503
hwcontext_vulkan: add support for Vulkan encoding 2024-08-11 05:13:14 +02:00
Lynne
8eac11105b
vulkan: use allocator callback for buffer creation
This would've let to a segfault if custom allocators were used.
2024-08-11 05:13:13 +02:00
Lynne
55adcb4fc5
hwcontext_vulkan: add support for VK_EXT_shader_object
We'd like to use it eventually, and its already covered by
the minimum version of the headers we require.
2024-08-11 05:13:13 +02:00
Lynne
c19af16f8d
hwcontext_vulkan: enable storageBuffer16BitAccess if available 2024-08-11 05:13:12 +02:00
Lynne
957d34784a
hwcontext_vulkan: constify validation layer features table
The struct data seem to get corrupted otherwise.
Possibly a validation layer or libvulkan issue.
2024-08-11 05:13:11 +02:00
Lynne
9e606b33a8
hwcontext_vulkan: add HOST_CACHED flag to transfer buffer
Significantly speeds up downloads on devices without host mapping.
2024-08-11 05:13:11 +02:00
Lynne
aea4d4b423
hwcontext_vulkan: rewrite upload/download
This commit was long overdue. The old transfer dubiously tried to
merge as much code as possible, and had very little in the way
of optimizations, apart from basic host-mapping.

The new code uses buffer pools for any temporary bufflers, and
handles falling back to buffer-based uploads if host-mapping fails.

Roundtrip performance difference:
ffmpeg -init_hw_device "vulkan=vk:0,debug=0,disable_multiplane=1" -f lavfi \
-i color=red:s=3840x2160 -vf hwupload,hwdownload,format=yuv420p -f null -

7900XTX:
Before: 224fps
After: 502fps

Ada, with proprietary drivers:
Before: 29fps
After: 54fps

Alder Lake:
Before: 85fps
After: 108fps

With the host-mapping codepath disabled:
Before: 32fps
After: 51fps
2024-08-11 05:13:11 +02:00
Lynne
81c5d4ea0e
hwcontext_vulkan: remove unused struct 2024-08-11 05:13:10 +02:00
Lynne
a30b7c0158
hwcontext_vulkan: initialize optical flow queues if available
Lets us implement FPS conversion.
2024-08-11 05:13:10 +02:00
Lynne
8790a30882
hwcontext_vulkan: rewrite queue picking system for the new API
This allows us to support different video ops on different queues,
as well as any other arbitrary queues we need.
2024-08-11 05:13:09 +02:00
Lynne
bedfabc437
vulkan: use the new queue family mechanism 2024-08-11 05:13:09 +02:00
Lynne
13489c8a21
hwcontext_vulkan: add a new mechanism to expose used queue families
The issue with the old mechanism is that we had to introduce new
API each time we needed a new queue family, and all the queue families
were functionally fixed to a given purpose.

Nvidia's GPUs are able to handle video encoding and compute on the
same queue, which results in a speedup when pre-processing is required.

Also, this enables us to expose optical flow queues for frame interpolation.
2024-08-11 05:13:03 +02:00
Fei Wang
eab4a9e9f8 lavu/hwcontext_qsv: Use vendor id to create device
New kernel driver "xe" will be supported from Lunar Lake instead of
"i915".

"xe" kernel driver:
https://github.com/torvalds/linux/tree/master/drivers/gpu/drm/xe

Signed-off-by: Fei Wang <fei.w.wang@intel.com>
2024-08-09 13:40:26 +08:00
Fei Wang
dbd74ba3c8 lavu/hwcontext_vaapi: Add option to allow to specify vendor id when init hw device
Vendor id will help to select desired device in case of kernel driver is
unknow or unsupported, for vendor may support different kernel driver on
different platforms.

Signed-off-by: Fei Wang <fei.w.wang@intel.com>
2024-08-09 13:40:24 +08:00
James Almer
210740b4ed avutil/frame: use the maximum compile time supported alignment for strides
This puts lavu frame buffer allocator helpers in sync with lavc's decoder frame
buffer allocator's STRIDE_ALIGN define.

Remove the comment about av_cpu_max_align() while at it as using it is not
ideal when CPU flags can be changed mid process.

Should fix ticket #11116.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-08-07 00:16:21 -03:00
Rémi Denis-Courmont
e0f9f4d491 lavu/cpu: deprecate RISC-V F, D and zba CPU flags 2024-08-05 21:16:26 +03:00
Rémi Denis-Courmont
d1326b6347 lavu/riscv: drop probing for zba CPU capability 2024-08-05 21:16:26 +03:00
Rémi Denis-Courmont
cb31f17ca8 lavu/riscv: depend on RVB and simplify accordingly 2024-08-05 21:16:26 +03:00
Nathan E. Egge
ba88e8174a lavu: Set default FF_TIMER_UNITS to "ns"
Signed-off-by: Nathan E. Egge <unlord@xiph.org>
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-08-05 21:16:26 +03:00
Gnattu OC
d50f9701b6 avutil/hwcontext_videotoolbox: Correctly set trc
The color trc key was assigned a color primaries value which causes
the resulting colorspace is always SDR.

Fixes #10884.

Signed-off-by: Gnattu OC <gnattuoc@me.com>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-08-02 10:24:09 +08:00
Rémi Denis-Courmont
1b2a925e94 lavc/riscv: drop probing for F & D extensions
F and D extensions are included in all RISC-V application profiles ever
made (so starting from RV64GC a.k.a. RVA20). Realistically they need to be
selected at compilation time.

Currently, there are no consumers for these two flags. If there is ever a
need to reintroduce F- or D-specific optimisations, we can always use
__riscv_f or __riscv_d compiler predefined macros respectively.
2024-08-01 22:56:50 +03:00
Rémi Denis-Courmont
54b1970c60 lavu/riscv: fix return type 2024-08-01 18:44:01 +03:00
James Almer
6f8e365a2a avutil/hwcontext_vaapi: use the correct type for VASurfaceAttribExternalBuffers.buffers
Should fix ticket #11115.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-08-01 12:13:53 -03:00
Marvin Scholz
ca7fcf5089 avutil/hwcontext_videotoolbox: Fix build with older SDKs
The previous fix was not sufficient.
To make things easier to reason about, split the function and
add the guards there instead of complicating the call site more.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-08-01 20:58:27 +08:00
Nathan E. Egge
8280ec7a32 lavu/riscv: Revert d808070, removing AV_READ_TIME
The implementation of ff_read_time() for RISC-V uses rdtime which has
 precision on existing hardware too low (!) for benchmarking purposes.
Deleting this implementation falls back on clock_gettime() which was
 added as the default ff_read_time() implementation in 33e4cc9.
Below are metrics gathered on SpacemiT K1, before and after this commit:

Before:

$ tests/checkasm/checkasm --bench
benchmarking with native FFmpeg timers
nop: 0.0
checkasm: using random seed 3473665261
checkasm: bench runs 1024 (1 << 10)
RVI:
 - pixblockdsp.get_pixels                [OK]
 - vc1dsp.mspel_pixels                   [OK]
RVF:
 - audiodsp.audiodsp                     [OK]
checkasm: all 4 tests passed
audiodsp.vector_clipf_c: 1388.7
audiodsp.vector_clipf_rvf: 261.5
get_pixels_c: 2.0
get_pixels_rvi: 1.5
vc1dsp.put_vc1_mspel_pixels_tab[0][0]_c: 8.0
vc1dsp.put_vc1_mspel_pixels_tab[0][0]_rvi: 1.0
vc1dsp.put_vc1_mspel_pixels_tab[1][0]_c: 2.0
vc1dsp.put_vc1_mspel_pixels_tab[1][0]_rvi: 0.5

After:

$ tests/checkasm/checkasm --bench
benchmarking with native FFmpeg timers
nop: 56.4
checkasm: using random seed 1021411603
checkasm: bench runs 1024 (1 << 10)
RVI:
 - pixblockdsp.get_pixels                [OK]
 - vc1dsp.mspel_pixels                   [OK]
RVF:
 - audiodsp.audiodsp                     [OK]
checkasm: all 4 tests passed
audiodsp.vector_clipf_c: 23236.4
audiodsp.vector_clipf_rvf: 11038.4
get_pixels_c: 79.6
get_pixels_rvi: 48.4
vc1dsp.put_vc1_mspel_pixels_tab[0][0]_c: 329.6
vc1dsp.put_vc1_mspel_pixels_tab[0][0]_rvi: 38.1
vc1dsp.put_vc1_mspel_pixels_tab[1][0]_c: 89.9
vc1dsp.put_vc1_mspel_pixels_tab[1][0]_rvi: 17.1

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-07-31 17:48:50 +03:00
Rémi Denis-Courmont
bd0c3edb13 lavu/riscv: count bytes rather than words for bswap32
This removes the dependency on Zba at essentially zero cost.
2024-07-30 18:41:51 +03:00
Fei Wang
79b4869959 lavu/hwcontext_qsv: Derive bind flag from frame type if no valid surface
Fix cmd:
ffmpeg.exe -init_hw_device d3d11va=d3d -init_hw_device qsv=qsv@d3d \
-filter_hw_device d3d -hwaccel qsv -hwaccel_output_format qsv      \
-i in.h264 -vf "hwmap,format=d3d11,hwdownload,format=nv12" -y out.yuv

Signed-off-by: Fei Wang <fei.w.wang@intel.com>
Tested-by: Tong Wu <wutong1208@outlook.com>
2024-07-30 13:41:15 +08:00
James Almer
9e7a93c6fd x86/intreadwrite: add SSE2 optimized AV_COPY128U
Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-29 23:17:52 -03:00
James Almer
753f2aeed7 avutil/intreadwrite: add missing aligned read/write macros
Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-29 21:33:31 -03:00
Rémi Denis-Courmont
39ced529b0 lavu/riscv: implement floating point clips
Unlike x86, fmin/fmax are single instructions, not function calls. They
are much much faster than doing a comparison, then branching based on its
results. With this, audiodsp.vector_clipf gets almost twice as fast, and
a properly unrollled version of it gets 4-5x faster, on SiFive-U74.
This is only the low-hanging fruit: FFMIN and FFMAX are presumably
affected as well.

This likely applies to other instruction sets with native IEEE floats,
especially those lacking a conditional select instruction.
2024-07-28 21:24:58 +03:00
Niklas Haas
cbea92c84d avutil/dovi_meta: add dv_md_compression to cfg record
This field is used to signal the compression method in use.
2024-07-28 12:20:07 +02:00
Rémi Denis-Courmont
a14d21a446 lavu/riscv: add forward-edge CFI landing pads 2024-07-25 23:10:14 +03:00
Rémi Denis-Courmont
6319601343 lavu/riscv: assembly for zicfilp LPAD
This instruction, if aligned on a 4-byte boundary, defines a valid target
("landing pad") for an indirect call or jump. Since this instruction is a
HINT, it is safe to assemble even if not included in the target
instruction set architecture.

The necessary alignment is already provided by the `func` macro. However
this still lacks the ELF attribute to indicate that the zicfilp is supported
in simple mode. This is left for future work as the ELF specification is not
ratified as of yet.

This will also nonobviously require the assembler to support zicfilp,
insofar as the `tail` pseudo-instruction shall clobber T2 (instead of T1) as
its temporary register.
2024-07-25 23:10:14 +03:00
Rémi Denis-Courmont
982376660c lavu/riscv: align functions to 4 bytes
Currently the start of the byte range for each function is aligned to
4 bytes. But this can lead to situations whence the function is preceded
by a 2-byte C.NOP at the aligned 4-byte boundary. Then the first actual
instruction and the function symbol are only aligned on 2 bytes.

This forcefully disables compression for the alignment and the symbol,
thus ensuring that there is no padding before the function.
2024-07-25 23:10:14 +03:00
Rémi Denis-Courmont
45d7078a21 lavu/riscv: add CPU flag for B bit manipulations
The B extension was finally ratified in May 2024, encompassing:
- Zba (addresses),
- Zbb (basics) and
- Zbs (single bits).
It does not include Zbc (base-2 polynomials).
2024-07-25 23:09:58 +03:00
Rémi Denis-Courmont
529d423012 lavu/riscv: remove bespoke SH{1,2,3}ADD assembler
configure checks that the assembler supports the B extension (or rather
its constituents) anyway. These macros were dodging sanity checks for
unsupported instructions and nothing else.
2024-07-25 18:55:48 +03:00
Rémi Denis-Courmont
5f10173fa1 lavu/riscv: require B or zba explicitly 2024-07-25 18:55:48 +03:00
Rémi Denis-Courmont
7f97344bfb lavu/riscv: grok B as an extension
The RISC-V B bit manipulation extension was ratified only two months ago.
But it is strictly equivalent to the union of the zba, zbb and zbs
extensions which were defined almost 3 years earlier. Rather than require
new assembler, we can just match the extension name manually and translate
it into its constituent parts.
2024-07-25 18:55:48 +03:00
Rémi Denis-Courmont
1e7ab200ee lavu/riscv: allow any number of extensions
This reworks the func/endfunc macros to support any number of ISA extension
as parameters.
2024-07-25 18:55:48 +03:00
Rémi Denis-Courmont
0e32192548 lavu/riscv: do not fallback to AT_HWCAP auxillary vector
If __riscv_hwprobe() fails, then the kernel version is presumably too
old. There is not much point falling back to the auxillary vector.

- The Linux kernel requires I, so the flag is always set on Linux, and
  run-time detection is unnecessary. Our RISC-V assembler does anyway not
  support targets without I.

- Linux can compile with or without F and D, but it cannot perform
  run-time detection for them (a kernel with F support will not boot a
  processor without F). The run-time detection is thus useless in that
  case. Besides F and D extensions are used throughout the C code, so
  their run-time detection would not be practical.

- Support for V was added in a later kernel version than riscv_hwprobe(),
  so the system call will always be available if the kernel supports V.
  The only exception would be vendor kernel forks, but those are known to
  haphasardly pretend to support V on systems without actual V support, or
  with only pre-ratification binary-incompatible version. Furthermore, a
  large chunk of our optimisations require Zba and/or Zbb which cannot be
  detected with HWCAP in those kernels.

For what it is worth, OpenJDK already took a similar action. Note that this
keeps AT_HWCAP usage for platforms with neither C run-time <sys/hwprobe.h>
nor kernel <asm/hwprobe.h>, notably kernels other than Linux.
2024-07-22 19:43:51 +03:00
Michael Niedermayer
23851c9ee0
avutil/slicethread: Check pthread_*_init() for failure
Fixes: CID1604383 Unchecked return value
Fixes: CID1604439 Unchecked return value

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-07-21 17:02:13 +02:00
Michael Niedermayer
15540b3d28
avutil/frame: Check log2_crop_align
Fixes: CID1604586 Overflowed constant

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-07-21 17:02:13 +02:00
Michael Niedermayer
82f5b20ff5
avutil/buffer: Check ff_mutex_init() for failure
Fixes: CID1604487 Unchecked return value
Fixes: CID1604494 Unchecked return value

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-07-21 17:02:13 +02:00