mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-29 22:00:58 +02:00
FFmpeg/libavcodec
Rémi Denis-Courmont f0ef11ea83 lavc/bswapdsp: RISC-V B bswap_buf
Using the Zbb REV8 instruction in a simple loop already gives
significant savings:

bswap_buf_c: 1081.0
bswap_buf_rvb_b: 771.0

But we can also use the 64-bit REV8 as a pseudo-SIMD instruction with
just one additional shift, and one fewer load, effectively doubling the
bandwidth. Consequently, this patch is useful even if the compile-time
target has Zbb enabled for C code:

bswap_buf_c: 1081.0
bswap_buf_rvb_b: 341.0  (this patch)
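
The pseudo-SIMD idea can be sketched in portable C. This is only an
illustration under stated assumptions, not the assembly from this patch:
the function and helper names (bswap32_buf_pairs, swap32, swap64) are
hypothetical, and the 64-bit byte swap stands in for a single REV8 on
RV64. Reversing all eight bytes of a 64-bit word swaps the bytes inside
each 32-bit sample but also exchanges the two samples, so one rotation by
32 bits puts them back in order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers: plain-C byte swaps standing in for REV8. */
static uint32_t swap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

static uint64_t swap64(uint64_t v)
{
    return ((uint64_t)swap32((uint32_t)v) << 32) | swap32((uint32_t)(v >> 32));
}

/* Byte-swap n 32-bit samples, two at a time (hypothetical name). */
static void bswap32_buf_pairs(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;

    for (; i + 2 <= n; i += 2) {
        uint64_t v;

        memcpy(&v, src + i, sizeof(v)); /* one 64-bit load for two samples */
        v = swap64(v);                  /* reverses all 8 bytes (REV8 on RV64) */
        v = (v << 32) | (v >> 32);      /* the extra shift: restore sample order */
        memcpy(dst + i, &v, sizeof(v)); /* one 64-bit store */
    }
    if (i < n)                          /* odd sample count: handle the tail */
        dst[i] = swap32(src[i]);
}
```

The memcpy calls avoid alignment and aliasing assumptions in C; a real
implementation would use whatever load and store instructions the target
permits.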

On the other hand, this approach fails miserably for bswap16_buf, as the
ratio of shifts to stores becomes unfavorable compared to naïve C:

bswap16_buf_c: 1542.0
bswap16_buf_rvb_b: 1803.7

Unrolling to process 128 bits (4 samples) at a time actually worsens
performance ever so slightly:

bswap_buf_c: 1081.0
bswap_buf_rvb_b: 408.5
2022-10-05 08:26:19 +02:00