FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-07 11:13:41 +02:00

Author	SHA1	Message	Date
Rémi Denis-Courmont	a14d21a446	lavu/riscv: add forward-edge CFI landing pads	2024-07-25 23:10:14 +03:00
Rémi Denis-Courmont	6319601343	lavu/riscv: assembly for zicfilp LPAD This instruction, if aligned on a 4-byte boundary, defines a valid target ("landing pad") for an indirect call or jump. Since this instruction is a HINT, it is safe to assemble even if not included in the target instruction set architecture. The necessary alignment is already provided by the `func` macro. However this still lacks the ELF attribute to indicate that the zicfilp is supported in simple mode. This is left for future work as the ELF specification is not ratified as of yet. This will also nonobviously require the assembler to support zicfilp, insofar as the `tail` pseudo-instruction shall clobber T2 (instead of T1) as its temporary register.	2024-07-25 23:10:14 +03:00
Rémi Denis-Courmont	982376660c	lavu/riscv: align functions to 4 bytes Currently the start of the byte range for each function is aligned to 4 bytes. But this can lead to situations where the function is preceded by a 2-byte C.NOP at the aligned 4-byte boundary. Then the first actual instruction and the function symbol are only aligned on 2 bytes. This forcefully disables compression for the alignment and the symbol, thus ensuring that there is no padding before the function.	2024-07-25 23:10:14 +03:00
Rémi Denis-Courmont	45d7078a21	lavu/riscv: add CPU flag for B bit manipulations The B extension was finally ratified in May 2024, encompassing: - Zba (addresses), - Zbb (basics) and - Zbs (single bits). It does not include Zbc (base-2 polynomials).	2024-07-25 23:09:58 +03:00
Rémi Denis-Courmont	529d423012	lavu/riscv: remove bespoke SH{1,2,3}ADD assembler configure checks that the assembler supports the B extension (or rather its constituents) anyway. These macros were dodging sanity checks for unsupported instructions and nothing else.	2024-07-25 18:55:48 +03:00
Rémi Denis-Courmont	5f10173fa1	lavu/riscv: require B or zba explicitly	2024-07-25 18:55:48 +03:00
Rémi Denis-Courmont	7f97344bfb	lavu/riscv: grok B as an extension The RISC-V B bit manipulation extension was ratified only two months ago. But it is strictly equivalent to the union of the zba, zbb and zbs extensions which were defined almost 3 years earlier. Rather than require new assembler, we can just match the extension name manually and translate it into its constituent parts.	2024-07-25 18:55:48 +03:00
Rémi Denis-Courmont	1e7ab200ee	lavu/riscv: allow any number of extensions This reworks the func/endfunc macros to support any number of ISA extensions as parameters.	2024-07-25 18:55:48 +03:00
Rémi Denis-Courmont	0e32192548	lavu/riscv: do not fallback to AT_HWCAP auxiliary vector If __riscv_hwprobe() fails, then the kernel version is presumably too old. There is not much point falling back to the auxiliary vector. - The Linux kernel requires I, so the flag is always set on Linux, and run-time detection is unnecessary. Our RISC-V assembler does not support targets without I anyway. - Linux can compile with or without F and D, but it cannot perform run-time detection for them (a kernel with F support will not boot a processor without F). The run-time detection is thus useless in that case. Besides, the F and D extensions are used throughout the C code, so their run-time detection would not be practical. - Support for V was added in a later kernel version than riscv_hwprobe(), so the system call will always be available if the kernel supports V. The only exception would be vendor kernel forks, but those are known to haphazardly pretend to support V on systems without actual V support, or with only a pre-ratification, binary-incompatible version. Furthermore, a large chunk of our optimisations require Zba and/or Zbb which cannot be detected with HWCAP in those kernels. For what it is worth, OpenJDK already took a similar action. Note that this keeps AT_HWCAP usage for platforms with neither C run-time <sys/hwprobe.h> nor kernel <asm/hwprobe.h>, notably kernels other than Linux.	2024-07-22 19:43:51 +03:00
Rémi Denis-Courmont	d5e603ddc0	lavu/lls: remove useless VSETVL This changes neither VL nor VTYPE, so it can safely be removed.	2024-06-29 21:03:44 +03:00
J. Dekker	e61fed8280	avutil/riscv/cpu: fix __riscv_v_min_vlen typo Signed-off-by: J. Dekker <jdek@itanimul.li>	2024-06-26 12:50:02 +02:00
Rémi Denis-Courmont	f6d0a41c8c	lavu/riscv: use Zbb CLZ/CTZ/CLZW/CTZW at run-time Zbb static Zbb dynamic I baseline clz 0.668032642 1.336072283 19.552376803 clzl 0.668092643 1.336181786 26.110855571 ctz 1.336208533 3.340209702 26.054869008 ctzl 1.336247784 3.340362457 26.055266290 (seconds for 1 billion iterations on a SiFive-U74 core)	2024-06-11 20:12:37 +03:00
Rémi Denis-Courmont	98db140910	lavu/riscv: use Zbb CPOP/CPOPW at run-time Zbb static Zbb dynamic I baseline popcount 1.336129286 3.469067758 20.146362909 popcountl 1.336322291 3.340292968 20.224829821 (seconds for 1 billion iterations on a SiFive-U74 core)	2024-06-11 20:12:37 +03:00
Rémi Denis-Courmont	324899b748	lavu/riscv: use Zbb REV8 at run-time This adds runtime support to use Zbb REV8 for 32- and 64-bit byte-wise swaps. The result is about five times slower than if targeting Zbb statically, but still a lot faster than the default bespoke C code or a call to GCC run-time functions. For 16-bit swap, this is however unsurprisingly a lot worse, and so this sticks to the baseline. In fact, even using REV8 statically does not seem to be beneficial in that case. Zbb static Zbb dynamic I baseline bswap16: 0.668184765 3.340764069 0.668029012 bswap32: 0.668174014 3.340763319 9.353855435 bswap64: 0.668221765 3.340496313 14.698672283 (seconds for 1 billion iterations on a SiFive-U74 core)	2024-06-11 20:12:37 +03:00
Rémi Denis-Courmont	378d1b06c3	riscv: probe for Zbb extension at load time Due to hysterical raisins, most RISC-V Linux distributions target a RV64GC baseline excluding the Bit-manipulation ISA extensions, most notably: - Zba: address generation extension and - Zbb: basic bit manipulation extension. Most CPUs that would make sense to run FFmpeg on support Zba and Zbb (including the current FATE runner), so it makes sense to optimise for them. In fact a large chunk of existing assembler optimisations relies on Zba and/or Zbb. Since we cannot patch shared library code, the next best thing is to carry a flag initialised at load-time and check it on a need basis. This results in 3 instructions overhead on isolated use, e.g.: 1: AUIPC rd, %pcrel_hi(ff_rv_zbb_supported) LBU rd, %pcrel_lo(1b)(rd) BEQZ rd, non_Zbb_fallback_code // Zbb code here The C compiler will typically load the flag ahead of time to reduce latency, and can also keep it around if Zbb is used multiple times in a single optimisation scope. For this to work, the flag symbol must be hidden; otherwise the optimisation degrades with a GOT look-up to support interposition: 1: AUIPC rd, GOT_OFFSET_HI LD rd, GOT_OFFSET_LO(rd) LBU rd, (rd) BEQZ rd, non_Zbb_fallback_code // Zbb code here This patch adds code to provision the flag in libraries using bit manipulation functions from libavutil: byte-swap, bit-weight and counting leading or trailing zeroes.	2024-06-11 20:12:37 +03:00
Rémi Denis-Courmont	eed0a1d3d4	lavu/lls: R-V V update_lls update_lls_8_c: 7.5 update_lls_8_rvv_f64: 4.2 update_lls_12_c: 14.5 update_lls_12_rvv_f64: 5.7	2024-06-01 18:05:58 +03:00
Rémi Denis-Courmont	9238f6cb41	lavu/float_dsp: R-V V scalarproduct_double C908: scalarproduct_double_c: 39.2 scalarproduct_double_rvv_f64: 10.5 X60: scalarproduct_double_c: 35.0 scalarproduct_double_rvv_f64: 5.2	2024-05-31 22:22:43 +03:00
Rémi Denis-Courmont	4fe8f2cc43	riscv: allow passing addend to vtype_vli macro A constant (-1) is added to the length value, so we can have an addend for free, and optimise the addition away if the addend is exactly 1.	2024-05-30 18:30:52 +03:00
Rémi Denis-Courmont	ee1526c05f	lavu/riscv: add assembler macros for adjusting vector LMUL vtype_vli computes the VTYPE value with the optimal LMUL for a given element width, tail and mask policies and a run-time vector length. vtype_ivli does the same, but with the compile-time constant vector length. vwtypei and vntypei can be used to widen or narrow a VTYPE value for use in mixed-width vector-optimised functions.	2024-05-19 18:37:33 +03:00
Rémi Denis-Courmont	83e5fdd3f4	lavu/riscv: fix parsing the unaligned access capability Pointed-out-by: Stefan O'Rear <sorear@fastmail.com>	2024-05-15 20:04:08 +03:00
Rémi Denis-Courmont	20fbc07af1	lavu/riscv: remove bogus B extension The B Bit manipulation extension was not defined to this day, and probably never will. Instead it was broken down into Zba, Zbb, Zbc and Zbs with no particular blessed set to make up B. This removes the bogus field test. Linux never set this bit, nor (AFAICT) did FreeBSD or any other OS. We can always add it back in the unlikely event that it gets taken into use.	2024-05-14 19:50:00 +03:00
Rémi Denis-Courmont	b410439263	lavu/riscv: CPU flag for fast misaligned accesses	2024-05-14 19:50:00 +03:00
Rémi Denis-Courmont	61ec7450ff	lavu/riscv: fallback to raw hwprobe() system call Not all C run-times support this, and even then, it will be a while before distributions provide recent enough versions thereof. Since this is a trivial system call wrapper, we might just as well call the corresponding kernel system call directly where the C run-time lacks support but the kernel headers are new enough (as is the case on Debian Unstable at the time of writing). In doing so, we need to add a few more guards as the first suitable kernel (headers) release did not expose the V, Zba and Zbb extensions.	2024-05-14 19:50:00 +03:00
Rémi Denis-Courmont	247c5b2b97	lavu/riscv: add ff_rv_vlen_least() This inline function checks that the vector length is at least a given value. With this, most run-time VLEN checks can be optimised away.	2024-05-13 18:36:07 +03:00
Rémi Denis-Courmont	5d8f62feb5	lavu/riscv: add Zvbb CPU capability detection This requires Linux kernel version 6.8 or later.	2024-05-11 11:38:49 +03:00
Rémi Denis-Courmont	5afe734b6d	lavu/riscv: remove bespoke assembler for MIN This is no longer necessary as Zbb is now always explicitly required.	2024-05-10 18:59:06 +03:00
Rémi Denis-Courmont	89029baebd	lavu/riscv: allow requesting a second extension	2024-05-10 18:59:06 +03:00
Rémi Denis-Courmont	1f150a68ac	lavu/riscv: fix build without <sys/hwprobe.h>	2024-05-08 18:26:32 +03:00
Rémi Denis-Courmont	95d1052fba	lavu/riscv: add hwprobe() for CPU detection This adds the Linux-specific function call to detect CPU features. Unlike the more portable auxiliary vector, this supports extensions other than single-letter ones. At this point, FFmpeg already needs this to detect Zba and Zbb at run-time, and probably will need it for Zvbb in the near future. Support will be available in glibc 2.40 onward.	2024-05-06 22:09:41 +03:00
Rémi Denis-Courmont	d7333ba6f2	lavu/riscv: indent code This reindents code to prepare for the next changeset. No functional changes.	2024-05-06 22:09:41 +03:00
Rémi Denis-Courmont	e33ce0d9dd	lavu/fixed_dsp: R-V V fmul_window_scaled vector_fmul_window_scaled_fixed_c: 4393.7 vector_fmul_window_scaled_fixed_rvv_i64: 1642.7	2023-11-23 18:57:18 +02:00
Rémi Denis-Courmont	e49f41fb27	lavu/float_dsp: optimise R-V V fmul_reverse & fmul_window Roll the loop to avoid slow gathers. Before: vector_fmul_reverse_c: 1561.7 vector_fmul_reverse_rvv_f32: 2410.2 vector_fmul_window_c: 2068.2 vector_fmul_window_rvv_f32: 1879.5 After: vector_fmul_reverse_c: 1561.7 vector_fmul_reverse_rvv_f32: 916.2 vector_fmul_window_c: 2068.2 vector_fmul_window_rvv_f32: 1202.5	2023-11-23 18:57:18 +02:00
Rémi Denis-Courmont	3a134e8299	lavu/fixed_dsp: optimise R-V V fmul_reverse Gathers are (unsurprisingly) a notable exception to the rule that R-V V gets faster with larger group multipliers. So roll the function to speed it up. Before: vector_fmul_reverse_fixed_c: 2840.7 vector_fmul_reverse_fixed_rvv_i32: 2430.2 After: vector_fmul_reverse_fixed_c: 2841.0 vector_fmul_reverse_fixed_rvv_i32: 962.2 It might be possible to further optimise the function by moving the reverse-subtract out of the loop and adding ad-hoc tail handling.	2023-11-23 18:57:18 +02:00
Rémi Denis-Courmont	cd6089dc9c	riscv: fix builds without Zbb support	2023-11-18 22:01:59 +02:00
Rémi Denis-Courmont	04b49fb3c5	lavu/riscv: fix typo	2023-10-29 22:15:15 +02:00
Rémi Denis-Courmont	f39a8790e1	lavu/fixed_dsp: R-V V vector_fmul_window	2023-10-09 19:52:28 +03:00
Rémi Denis-Courmont	10eb3b9c9f	lavu/fixed_dsp: R-V V vector_fmul vector_fmul_fixed_c: 4.0 vector_fmul_fixed_rvv_i64: 0.5	2023-10-09 19:52:28 +03:00
Rémi Denis-Courmont	da7a77fb0a	lavu/fixed_dsp: R-V V vector_fmul_reverse	2023-10-09 19:52:28 +03:00
Rémi Denis-Courmont	bf911cc1bf	lavu/fixed_dsp: R-V V vector_fmul_add vector_fmul_add_fixed_c: 2.2 vector_fmul_add_fixed_rvv_i64: 0.5	2023-10-09 19:52:28 +03:00
Rémi Denis-Courmont	9091ffb006	lavu/float_dsp: adjust multiplier in R-V V fmul_window The gather index vector is only used as double-length (due to register pressure), so no need to initialise it for quad-length. Basically this matches the multiplier in the prologue to the multiplier in the loop.	2023-10-09 19:52:28 +03:00
Rémi Denis-Courmont	eb73d178ea	lavu/fixed_dsp: R-V V scalarproduct	2023-10-07 17:45:39 +03:00
Rémi Denis-Courmont	9240035c0e	lavu/float_dsp: avoid reg-stride in R-V V fmul_window	2023-10-03 22:48:10 +03:00
Rémi Denis-Courmont	446b0090cb	lavu/float_dsp: avoid reg-stride in R-V V reverse_fmul This revectors the inner loop to reverse vectors element in vectors, thus eliminating the negative register stride. Note that RVV does not have a vector reverse instruction, so this uses a gather.	2023-10-03 20:48:47 +03:00
Rémi Denis-Courmont	cec48e3b32	riscv: factor out the bswap32 assembler	2023-10-02 22:28:21 +03:00
Rémi Denis-Courmont	7a24d794f6	Revert "lavu/timer: remove gratuitous volatile" It does not make much sense to me, but GCC somehow optimises the inline assembler even though the output is very obviously used and having observable side effects. This reverts commit `09731fbfc3`.	2023-09-28 17:48:18 +03:00
Rémi Denis-Courmont	6f8ac298da	lavu/timer: specify RISC-V time unit	2023-08-24 20:58:57 +03:00
Rémi Denis-Courmont	09731fbfc3	lavu/timer: remove gratuitous volatile AV_READ_TIME has no side effects. It does not need to be volatile.	2023-08-24 20:58:57 +03:00
Rémi Denis-Courmont	05115a77e0	lavu/timer: use time for AV_READ_TIME on RISC-V So far, AV_READ_TIME would return the cycle counter. This posed two problems: 1) On recent systems, it would just raise an illegal instruction exception. Indeed RDCYCLE is blocked in user space to ward off some side channel attacks. In particular, this would cause the random number generator to crash. 2) It does not match the x86 behaviour and the apparent original intent of AV_READ_TIME in the functional code base (outside test cases). So this replaces the cycle counter with the time counter. The unit is a platform-dependent constant fraction of time, and the value should be stable across harts (RISC-V lingo for physical CPU thread).	2023-08-24 20:58:57 +03:00
Rémi Denis-Courmont	29b9d616c2	lavu/float_dsp: rework RISC-V V scalar product 1) Take the reductive sum out of the loop, leaving a regular vector addition in the loop. 2) Merge the addition and the multiplication. 3) Unroll. Before: scalarproduct_float_rvv_f32: 832.5 After: scalarproduct_float_rvv_f32: 275.2	2023-07-20 22:54:34 +03:00
Rémi Denis-Courmont	b710f881ce	lavu/float_dsp: unroll RISC-V V loops butterflies_float_c: 1057.0 butterflies_float_rvv_f32: 351.0 (before) butterflies_float_rvv_f32: 329.5 (after) vector_dmac_scalar_c: 819.0 vector_dmac_scalar_rvv_f64: 670.5 (before) vector_dmac_scalar_rvv_f64: 431.0 (after) vector_dmul_c: 800.2 vector_dmul_rvv_f64: 541.5 (before) vector_dmul_rvv_f64: 426.0 (after) vector_dmul_scalar_c: 545.7 vector_dmul_scalar_rvv_f64: 670.7 (before) vector_dmul_scalar_rvv_f64: 324.7 (after) vector_fmac_scalar_c: 804.5 vector_fmac_scalar_rvv_f32: 412.7 (before) vector_fmac_scalar_rvv_f32: 214.5 (after) vector_fmul_c: 811.2 vector_fmul_rvv_f32: 285.7 (before) vector_fmul_rvv_f32: 214.2 (after) vector_fmul_add_c: 1313.0 vector_fmul_add_rvv_f32: 349.0 (before) vector_fmul_add_rvv_f32: 290.2 (after) vector_fmul_reverse_c: 815.7 vector_fmul_reverse_rvv_f32: 529.2 (before) vector_fmul_reverse_rvv_f32: 515.7 (after) vector_fmul_scalar_c: 546.0 vector_fmul_scalar_rvv_f32: 350.2 (before) vector_fmul_scalar_rvv_f32: 169.5 (after)	2023-07-20 22:54:34 +03:00
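
The load-time flag approach described in commit 378d1b06c3 above can be illustrated with a small, portable C sketch. This is not FFmpeg's code: `demo_zbb_supported`, `demo_probe_zbb`, `demo_init_flag` and `demo_bswap32` are hypothetical names, the probe is a placeholder for a real OS query, and `__builtin_bswap32` merely stands in for the Zbb REV8 path. It assumes GCC or Clang for the `constructor` and `visibility` attributes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag, standing in for ff_rv_zbb_supported. Hidden visibility
 * lets the compiler address it PC-relatively rather than through the GOT. */
__attribute__((visibility("hidden"))) bool demo_zbb_supported;

/* Hypothetical probe: a real one would ask the OS (e.g. hwprobe on Linux).
 * Assumption: report support only when Zbb is in the compile-time target. */
static bool demo_probe_zbb(void)
{
#if defined(__riscv_zbb)
    return true;
#else
    return false;
#endif
}

/* Runs once when the program or shared library is loaded. */
static void __attribute__((constructor)) demo_init_flag(void)
{
    demo_zbb_supported = demo_probe_zbb();
}

/* Portable baseline byte swap, used when the flag is clear. */
static uint32_t demo_bswap32_c(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0xff00u) |
           ((x << 8) & 0xff0000u) | (x << 24);
}

/* Checks the flag on each use, then picks the fast or baseline path. */
uint32_t demo_bswap32(uint32_t x)
{
    if (demo_zbb_supported)
        return __builtin_bswap32(x); /* stand-in for the REV8 code path */
    return demo_bswap32_c(x);
}
```

Both paths return the same value, so callers such as `demo_bswap32(0x11223344)` do not need to know which one ran; only the cost per call differs.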

80 Commits