Mirror of https://github.com/FFmpeg/FFmpeg.git, synced 2024-12-28 20:53:54 +02:00
f0ef11ea83
Simply taking the Zbb REV8 instruction into use in a simple loop gives some significant savings:

    bswap_buf_c:       1081.0
    bswap_buf_rvb_b:    771.0

But we can also use the 64-bit REV8 as a pseudo-SIMD instruction with just one additional shift, and one fewer load, effectively doubling the bandwidth. Consequently, this patch is useful even if the compile-time target has Zbb enabled for C code:

    bswap_buf_c:       1081.0
    bswap_buf_rvb_b:    341.0 (this patch)

On the other hand, this approach fails miserably for bswap16_buf as the ratio of shifts and stores becomes unfavorable compared to naïve C:

    bswap16_buf_c:     1542.0
    bswap16_buf_rvb_b: 1803.7

Unrolling to process 128 bits (4 samples) at a time actually worsens performance ever so slightly:

    bswap_buf_c:       1081.0
    bswap_buf_rvb_b:    408.5

File | ||
---|---|---|
aacpsdsp_init.c | ||
aacpsdsp_rvv.S | ||
alacdsp_init.c | ||
alacdsp_rvv.S | ||
audiodsp_init.c | ||
audiodsp_rvf.S | ||
audiodsp_rvv.S | ||
bswapdsp_init.c | ||
bswapdsp_rvb.S | ||
fmtconvert_init.c | ||
fmtconvert_rvv.S | ||
idctdsp_init.c | ||
idctdsp_rvv.S | ||
Makefile | ||
pixblockdsp_init.c | ||
pixblockdsp_rvi.S | ||
pixblockdsp_rvv.S | ||
vorbisdsp_init.c | ||
vorbisdsp_rvv.S |
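
The pseudo-SIMD idea in the commit message above can be sketched in portable C. This is an illustrative model only, not the contents of `bswapdsp_rvb.S`: the names `rev8_64`, `bswap32_ref` and `bswap32_buf_pairs` are hypothetical, and the 64-bit load is modelled arithmetically, assuming the little-endian layout used on RISC-V (first sample in the low half). Reversing all eight bytes of a 64-bit word byte-swaps both 32-bit samples at once but also exchanges their positions, so one extra shift is enough to pick each result out of the correct half.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable 64-bit byte reversal; stands in for one RV64 Zbb REV8. */
static uint64_t rev8_64(uint64_t x)
{
    x = ((x & 0x00FF00FF00FF00FFULL) << 8)  | ((x >> 8)  & 0x00FF00FF00FF00FFULL);
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFULL);
    return (x << 32) | (x >> 32);
}

/* Reference byte swap of a single 32-bit sample. */
static uint32_t bswap32_ref(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00U) |
           ((x << 8) & 0x00FF0000U) | (x << 24);
}

/* Hypothetical helper (not FFmpeg's API): swap n samples, two per step. */
static void bswap32_buf_pairs(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;

    for (; i + 2 <= n; i += 2) {
        /* Model one 64-bit load: src[i] low half, src[i + 1] high half. */
        uint64_t pair = (uint64_t)src[i] | ((uint64_t)src[i + 1] << 32);
        uint64_t r = rev8_64(pair);

        /* The reversal moved the swapped src[i] into the high half. */
        dst[i]     = (uint32_t)(r >> 32); /* the one additional shift */
        dst[i + 1] = (uint32_t)r;
    }
    if (i < n) /* odd trailing sample */
        dst[i] = bswap32_ref(src[i]);
}
```

Each iteration uses one load, one byte reversal, one shift and two stores for two samples. Applying the same trick to 16-bit samples would presumably need three shifts and four stores per 64-bit word, which is consistent with the slowdown the commit message reports for bswap16_buf.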