FFmpeg/libavcodec/h263dsp.c at 9c6c653a469e6679f11a0f25aab65a704fc872db

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-11-29 05:57:37 +02:00

Files

Rémi Denis-Courmont 910d281b21 lavc/h263dsp: R-V V {h,v}_loop_filter

Since the horizontal and vertical filters are identical except for a
transposition, this uses a common subprocedure with an ad-hoc ABI.
To preserve return-address stack prediction, a link register has to be
used (c.f. the "Control Transfer Instructions" from the
RISC-V ISA Manual). The alternate/temporary link register T0 is used
here, so that the normal RA is preserved (something Arm cannot do!).

To load the strength value based on `qscale`, the shortest possible
and PIC-compatible sequence is used: AUIPC; ADD; LBU. The classic
LLA; ADD; LBU sequence would add one more instruction since LLA is a
convenience alias for AUIPC; ADDI. To ensure that this trick works,
relocation relaxation is disabled.

To implement the two signed divisions by a power of two toward zero:
 (x / (1 << SHIFT))
the code relies on the small range of integers involved, computing:
 (x + (x >> (16 - SHIFT))) >> SHIFT
rather than the more general:
 (x + ((x >> (16 - 1)) & ((1 << SHIFT) - 1))) >> SHIFT
Thus one ANDI instruction is avoided.

T-Head C908:
h263dsp.h_loop_filter_c:       228.2
h263dsp.h_loop_filter_rvv_i32: 144.0
h263dsp.v_loop_filter_c:       242.7
h263dsp.v_loop_filter_rvv_i32: 114.0
(C is probably worse in real use due to less predictible branches.)

2024-05-22 19:15:39 +03:00

3.4 KiB

Raw Blame History

View Raw

3.4 KiB Raw Blame History

3.4 KiB

Raw Blame History