FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-26 19:01:44 +02:00

Author	SHA1	Message	Date
Rémi Denis-Courmont	29b9d616c2	lavu/float_dsp: rework RISC-V V scalar product 1) Take the reductive sum out of the loop, leaving a regular vector addition in the loop. 2) Merge the addition and the multiplication. 3) Unroll. Before: scalarproduct_float_rvv_f32: 832.5 After: scalarproduct_float_rvv_f32: 275.2	2023-07-20 22:54:34 +03:00
Rémi Denis-Courmont	b710f881ce	lavu/float_dsp: unroll RISC-V V loops butterflies_float_c: 1057.0 butterflies_float_rvv_f32: 351.0 (before) butterflies_float_rvv_f32: 329.5 (after) vector_dmac_scalar_c: 819.0 vector_dmac_scalar_rvv_f64: 670.5 (before) vector_dmac_scalar_rvv_f64: 431.0 (after) vector_dmul_c: 800.2 vector_dmul_rvv_f64: 541.5 (before) vector_dmul_rvv_f64: 426.0 (after) vector_dmul_scalar_c: 545.7 vector_dmul_scalar_rvv_f64: 670.7 (before) vector_dmul_scalar_rvv_f64: 324.7 (after) vector_fmac_scalar_c: 804.5 vector_fmac_scalar_rvv_f32: 412.7 (before) vector_fmac_scalar_rvv_f32: 214.5 (after) vector_fmul_c: 811.2 vector_fmul_rvv_f32: 285.7 (before) vector_fmul_rvv_f32: 214.2 (after) vector_fmul_add_c: 1313.0 vector_fmul_add_rvv_f32: 349.0 (before) vector_fmul_add_rvv_f32: 290.2 (after) vector_fmul_reverse_c: 815.7 vector_fmul_reverse_rvv_f32: 529.2 (before) vector_fmul_reverse_rvv_f32: 515.7 (after) vector_fmul_scalar_c: 546.0 vector_fmul_scalar_rvv_f32: 350.2 (before) vector_fmul_scalar_rvv_f32: 169.5 (after)	2023-07-20 22:54:34 +03:00
Rémi Denis-Courmont	b6585eb04c	lavu: add/use flag for RISC-V Zba extension The code was blindly assuming that Zbb or V implied Zba. While the earlier is practically always true, the later broke some QEMU setups, as V was introduced earlier than Zba.	2023-07-19 19:29:35 +03:00
Rémi Denis-Courmont	3d79afbe70	lavu/fixed_dsp: unroll RISC-V V loop Before: butterflies_fixed_c: 804.7 butterflies_fixed_rvv_i32: 348.2 After: butterflies_fixed_rvv_i32: 308.7	2023-07-17 18:48:42 +03:00
Rémi Denis-Courmont	0e580806d8	riscv/intmath: use builtins for counting ones As with the earlier bswap change, all versions of GCC and Clang that support RISC-V support the popcount built-ins, so we can just use them instead of inline assembler.	2023-05-02 22:08:25 +02:00
Rémi Denis-Courmont	7dcb5e1ab0	riscv/bswap: use compiler builtins av_bswapXX() are used in context that expect exact size types, notably variable arguments to av_log(). On Linux RV64, uint_fast32_t is an unsigned long, so the current inline assembler does not work properly. Since GCC and Clang gained their byte-swap built-ins before they supported RISC-V, we can simply defer to them. As an added bonus, the compiler can do instruction scheduling, which it couldn't with the Zbb inline assembler.	2023-05-02 22:08:21 +02:00
Rémi Denis-Courmont	96a83ceea4	riscv: fix scalar product initialisation VSETVLI xd, x0, ...' has rather nonobvious semantics: - If xd is x0, then it preserves the current vector length. - If xd is not x0, it sets the vector length to the supported maximum. Also somewhat confusingly, while VMV.X.S always does its thing regardless of the selected vector length, VMV.S.X does _nothing_ if the selected vector length is zero. So the current code breaks fails to initialise the accumulator if we are unlucky to have a selected vector length of zero on entry. Fix it by forcing the vector length to one.	2022-10-13 10:17:38 +02:00
Rémi Denis-Courmont	f59a767ccd	lavu/riscv: helper macro for VTYPE encoding On most cases, the vector type (VTYPE) for the RISC-V Vector extension is supplied as an immediate value, with either of the VSETVLI or VSETIVLI instructions. There is however a third instruction VSETVL which takes the vector type from a general purpose register. That is so the type can be selected at run-time. This introduces a macro to load a (valid) vector type into a register. The syntax follows that of VSETVLI and VSETIVLI, with element size, group multiplier, then tail and mask policies.	2022-10-10 02:22:12 +02:00
Rémi Denis-Courmont	37d5ddc317	lavu/riscv: CPU flag for the Zbb extension Unfortunately, it is common, and will remain so, that the Bit manipulations are not enabled at compilation time. This is an official policy for Debian ports in general (though they do not support RISC-V officially as of yet) to stick to the minimal target baseline, which does not include the B extension or even its Zbb subset. For inline helpers (CPOP, REV8), compiler builtins (CTZ, CLZ) or even plain C code (MIN, MAX, MINU, MAXU), run-time detection seems impractical. But at least it can work for the byte-swap DSP functions.	2022-10-05 08:26:19 +02:00
Rémi Denis-Courmont	3ba5579e55	riscv: remove unnecessary #include's Pointed out by Andreas Rheinhardt.	2022-10-05 06:54:56 +02:00
Rémi Denis-Courmont	c47ebfa141	lavu/riscv: helper to read the vector length	2022-09-28 11:43:17 +02:00
Rémi Denis-Courmont	c1bb19e263	lavu/fixeddsp: RISC-V V butterflies_fixed	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	cd77662953	lavu/floatdsp: RISC-V V scalarproduct_float	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	b493370662	lavu/floatdsp: RISC-V V vector_fmul_window	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	9aeb6aca3a	lavu/floatdsp: RISC-V V vector_fmul_reverse	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	47ce9735cc	lavu/floatdsp: RISC-V V butterflies_float	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	f4ea45040f	lavu/floatdsp: RISC-V V vector_fmul_add	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	d120ab5b91	lavu/floatdsp: RISC-V V vector_dmac_scalar	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	c3db27ba95	lavu/floatdsp: RISC-V V vector_fmac_scalar	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	da169a210d	lavu/floatdsp: RISC-V V vector_dmul	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	7058af9969	lavu/floatdsp: RISC-V V vector_fmul	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	89b7ec65a8	lavu/floatdsp: RISC-V V vector_dmul_scalar	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	a6c10d05fe	lavu/floatdsp: RISC-V V vector_fmul_scalar This is based on existing code from the VLC git tree with two minor changes to account for the different function prototypes.	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	39357cad37	lavu/riscv: fallback macros for SH{1, 2, 3}ADD Those mnemonics require the very latest binutils release at the time of writing. These macros provide seamless backward compatibility.	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	0c0a3deb18	lavu/cpu: CPU flags for the RISC-V Vector extension RVV defines a total of 12 different extensions, including: - 5 different instruction subsets: - Zve32x: 8-, 16- and 32-bit integers, - Zve32f: Zve32x plus single precision floats, - Zve64x: Zve32x plus 64-bit integers, - Zve64f: Zve32f plus Zve64x, - Zve64d: Zve64f plus double precision floats. - 6 different vector lengths: - Zvl32b (embedded only), - Zvl64b (embedded only), - Zvl128b, - Zvl256b, - Zvl512b, - Zvl1024b, - and the V extension proper: equivalent to Zve64f and Zvl128b. In total, there are 6 different possible sets of supported instructions (including the empty set), but for convenience we allocate one bit for each type sets: up-to-32-bit ints (RVV_I32), floats (RVV_F32), 64-bit ints (RVV_I64) and doubles (RVV_F64). Whence the vector size is needed, it can be retrieved by reading the unprivileged read-only vlenb CSR. This should probably be a separate helper macro if needed at a later point.	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	746f1ff36a	lavu/riscv: initial common header for assembler macros	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	b95e2fbd85	lavu/cpu: detect RISC-V base extensions This introduces compile-time and run-time CPU detection on RISC-V. In practice, I doubt that FFmpeg will ever see a RISC-V CPU without all of I, F and D extensions, and if it does, it probably won't have run-time detection. So the flags are essentially always set. But as things stand, checkasm wants them that way. Compare the ARMV8 flag on AArch64. We are nowhere near running short on CPU flag bits.	2022-09-27 13:19:52 +02:00
Rémi Denis-Courmont	6df3ad9687	lavu/riscv: fix off-by-one in bit-magnitude clip	2022-09-15 18:11:12 -03:00
Rémi Denis-Courmont	a5ce44f301	lavu/riscv: fix av_clip_int16 Some serious copy-paste / squash / rebase mismanipulation here. Signed-off-by: James Almer <jamrial@gmail.com>	2022-09-14 14:37:21 -03:00
Rémi Denis-Courmont	c177108ae1	lavu/riscv: add <intmath.h> optimisations This provides some micro-optimisations for signed integer clipping, and support for bit weight with the Zbb extension.	2022-09-13 16:50:43 -03:00
Rémi Denis-Courmont	df2057041b	lavu/riscv: byte-swap operations If the target supports the Basic bit-manipulation (Zbb) extension, then the REV8 instruction is available to reverse byte order. Note that this instruction only exists at the "XLEN" register size, so we need to right shift the result down to the data width. If Zbb is not supported, then this patchset does nothing. Support for run-time detection is left for the future. Currently, there are no bits in auxv/ELF HWCAP for Z-extensions, so there are no clean ways to do this.	2022-09-13 16:50:43 -03:00
Rémi Denis-Courmont	d808070547	lavu/riscv: AV_READ_TIME cycle counter This uses the architected RISC-V 64-bit cycle counter from the RISC-V unprivileged instruction set. In 64-bit and 128-bit, this is a straightforward CSR read. In 32-bit mode, the 64-bit value is exposed as two CSRs, which cannot be read atomically, so a loop is necessary to detect and fix up the race condition where the bottom half wraps exactly between the two reads.	2022-09-13 16:50:43 -03:00

32 Commits