Nuo Mi
8d27256a74
avcodec/vvcdec: remove vvc prefix for x86 and riscv
2024-12-22 21:00:06 +08:00
sunyuechi
6b31e42c47
lavc/riscv: vset macro for simplify if-else
2024-12-21 12:03:45 +08:00
Rémi Denis-Courmont
bd226fdd74
lavc/h264dsp: R-V V intra loop filter
As with the inter loop filter, performance metrics seem to be biased in
favour of the C implementation because checkasm inputs almost always
fall in the no-op case.
h264_h_loop_filter_chroma_intra_8bpp_c: 82.8 ( 1.00x)
h264_h_loop_filter_chroma_intra_8bpp_rvv_i32: 72.6 ( 1.14x)
h264_h_loop_filter_chroma_mbaff_intra_8bpp_c: 41.1 ( 1.00x)
h264_h_loop_filter_chroma_mbaff_intra_8bpp_rvv_i32: 72.6 ( 0.57x)
h264_h_loop_filter_luma_intra_8bpp_c: 166.1 ( 1.00x)
h264_h_loop_filter_luma_intra_8bpp_rvv_i32: 395.4 ( 0.42x)
h264_h_loop_filter_luma_mbaff_intra_8bpp_c: 93.3 ( 1.00x)
h264_h_loop_filter_luma_mbaff_intra_8bpp_rvv_i32: 395.4 ( 0.24x)
h264_v_loop_filter_chroma_intra_8bpp_c: 134.8 ( 1.00x)
h264_v_loop_filter_chroma_intra_8bpp_rvv_i32: 51.6 ( 2.61x)
h264_v_loop_filter_luma_intra_8bpp_c: 468.1 ( 1.00x)
h264_v_loop_filter_luma_intra_8bpp_rvv_i32: 134.8 ( 3.47x)
2024-12-17 09:00:28 +02:00
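The "no-op case" mentioned above comes from the edge decision in the H.264 deblocking filter. A line of samples across an edge is filtered only when |p0 - q0| < alpha, |p1 - p0| < beta and |q1 - q0| < beta. The sketch below applies that decision with the spec's intra (bS = 4) chroma filter to one line. It also counts how often uniformly random bytes pass the test. Random bytes are only an assumed stand-in for checkasm's inputs, and the helper names are hypothetical rather than FFmpeg's.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Filter one line of samples across a vertical chroma edge.
 * pix points at q0; pix[-1] is p0, pix[-2] is p1, pix[1] is q1.
 * Returns 1 if the samples were modified, 0 for the no-op case. */
static int chroma_intra_filter_line(uint8_t *pix, int alpha, int beta)
{
    int p0 = pix[-1], p1 = pix[-2];
    int q0 = pix[0],  q1 = pix[1];

    /* Edge decision: only filter when the steps across and beside the
     * edge are small enough to look like a blocking artefact. */
    if (abs(p0 - q0) >= alpha || abs(p1 - p0) >= beta || abs(q1 - q0) >= beta)
        return 0;

    /* Intra (bS = 4) chroma filter: only p0 and q0 are rewritten. */
    pix[-1] = (uint8_t)((2 * p1 + p0 + q1 + 2) >> 2);
    pix[0]  = (uint8_t)((2 * q1 + q0 + p1 + 2) >> 2);
    return 1;
}

/* Count how many of n lines of random bytes would actually be filtered. */
static int count_filtered_lines(int n, int alpha, int beta, unsigned seed)
{
    int filtered = 0;

    assert(n >= 0);
    srand(seed);
    for (int i = 0; i < n; i++) {
        uint8_t line[4];
        for (int j = 0; j < 4; j++)
            line[j] = (uint8_t)(rand() & 0xff);
        filtered += chroma_intra_filter_line(line + 2, alpha, beta);
    }
    return filtered;
}
```

With alpha = 20 and beta = 10, only a handful of 10000 random lines pass the test. A scalar C loop can skip the arithmetic for the others, while a vector implementation typically computes the filter for every lane and masks the result.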
sunyuechi
16d4945e9a
lavc/vvc_mc R-V V sad
k230 banana_f3
sad_8x16_c: 387.7 ( 1.00x) 394.9 ( 1.00x)
sad_8x16_rvv_i32: 109.7 ( 3.53x) 103.5 ( 3.82x)
sad_16x8_c: 378.2 ( 1.00x) 384.7 ( 1.00x)
sad_16x8_rvv_i32: 82.0 ( 4.61x) 61.7 ( 6.24x)
sad_16x16_c: 748.7 ( 1.00x) 759.7 ( 1.00x)
sad_16x16_rvv_i32: 128.5 ( 5.83x) 113.7 ( 6.68x)
2024-12-17 09:21:20 +08:00
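In VVC, the sum of absolute differences is the cost that decoder-side motion vector refinement (DMVR) compares across candidate offsets. The sketch below is a plain scalar SAD over a w x h block. The 16-bit sample type and the function signature are assumptions for illustration. FFmpeg's own `sad` hook in vvc_mc may take different parameters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Sum of absolute differences between two w x h blocks of 16-bit samples. */
static int block_sad(const int16_t *a, ptrdiff_t a_stride,
                     const int16_t *b, ptrdiff_t b_stride, int w, int h)
{
    int sad = 0;

    assert(w > 0 && h > 0);
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++)
            sad += abs(a[x] - b[x]);
        a += a_stride;
        b += b_stride;
    }
    return sad;
}
```

Each row is an independent reduction, which is why the operation maps well onto vector absolute-difference plus reduction-sum instructions.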
sunyuechi
b3f7440298
lavc/hevc: R-V V put_pixels(pow2)
k230 banana_f3
put_hevc_pel_pixels4_8_c: 61.6 ( 1.00x) 69.5 ( 1.00x)
put_hevc_pel_pixels4_8_rvv_i32: 24.6 ( 2.50x) 28.0 ( 2.48x)
put_hevc_pel_pixels8_8_c: 209.8 ( 1.00x) 215.5 ( 1.00x)
put_hevc_pel_pixels8_8_rvv_i32: 52.6 ( 3.99x) 38.2 ( 5.64x)
put_hevc_pel_pixels16_8_c: 839.4 ( 1.00x) 830.0 ( 1.00x)
put_hevc_pel_pixels16_8_rvv_i32: 126.6 ( 6.63x) 90.5 ( 9.17x)
put_hevc_pel_pixels32_8_c: 3246.6 ( 1.00x) 3246.7 ( 1.00x)
put_hevc_pel_pixels32_8_rvv_i32: 311.6 (10.42x) 257.0 (12.63x)
put_hevc_pel_pixels64_8_c: 12894.6 ( 1.00x) 12892.7 ( 1.00x)
put_hevc_pel_pixels64_8_rvv_i32: 1135.8 (11.35x) 778.0 (16.57x)
2024-12-17 09:21:20 +08:00
sunyuechi
dad062c4f8
lavc/vvc_mc: R-V V put_pixels
k230 banana_f3
put_chroma_pixels_8_4x4_c: 63.5 ( 1.00x) 59.2 ( 1.00x)
put_chroma_pixels_8_4x4_rvv_i32: 26.5 ( 2.39x) 28.0 ( 2.12x)
put_chroma_pixels_8_8x8_c: 211.8 ( 1.00x) 215.5 ( 1.00x)
put_chroma_pixels_8_8x8_rvv_i32: 54.3 ( 3.90x) 48.8 ( 4.42x)
put_chroma_pixels_8_16x16_c: 841.3 ( 1.00x) 830.0 ( 1.00x)
put_chroma_pixels_8_16x16_rvv_i32: 137.5 ( 6.12x) 121.8 ( 6.82x)
put_chroma_pixels_8_32x32_c: 3248.8 ( 1.00x) 3288.2 ( 1.00x)
put_chroma_pixels_8_32x32_rvv_i32: 350.5 ( 9.27x) 288.5 (11.40x)
put_chroma_pixels_8_64x64_c: 12998.3 ( 1.00x) 12976.2 ( 1.00x)
put_chroma_pixels_8_64x64_rvv_i32: 1100.5 (11.81x) 924.0 (14.04x)
put_chroma_pixels_8_128x128_c: 54284.0 ( 1.00x) 52654.5 ( 1.00x)
put_chroma_pixels_8_128x128_rvv_i32: 7192.8 ( 7.55x) 2934.2 (17.94x)
put_luma_pixels_8_4x4_c: 63.5 ( 1.00x) 69.5 ( 1.00x)
put_luma_pixels_8_4x4_rvv_i32: 26.5 ( 2.39x) 28.0 ( 2.48x)
put_luma_pixels_8_8x8_c: 211.5 ( 1.00x) 225.8 ( 1.00x)
put_luma_pixels_8_8x8_rvv_i32: 54.3 ( 3.90x) 38.5 ( 5.86x)
put_luma_pixels_8_16x16_c: 850.5 ( 1.00x) 830.0 ( 1.00x)
put_luma_pixels_8_16x16_rvv_i32: 137.5 ( 6.18x) 100.8 ( 8.24x)
put_luma_pixels_8_32x32_c: 3248.8 ( 1.00x) 3257.2 ( 1.00x)
put_luma_pixels_8_32x32_rvv_i32: 341.3 ( 9.52x) 246.8 (13.20x)
put_luma_pixels_8_64x64_c: 13007.5 ( 1.00x) 13038.8 ( 1.00x)
put_luma_pixels_8_64x64_rvv_i32: 1119.0 (11.62x) 684.2 (19.06x)
put_luma_pixels_8_128x128_c: 54219.3 ( 1.00x) 52060.8 ( 1.00x)
put_luma_pixels_8_128x128_rvv_i32: 6813.5 ( 7.96x) 2548.8 (20.43x)
2024-12-17 09:21:20 +08:00
sunyuechi
9288196c0d
lavc/riscv: Move VVC macro to h26x
2024-12-17 09:21:20 +08:00
sunyuechi
89df9c4404
lavc/vvc_mc: R-V V dmvr
k230 banana_f3
dmvr_8_12x20_c: 619.3 ( 1.00x) 624.1 ( 1.00x)
dmvr_8_12x20_rvv_i32: 128.6 ( 4.82x) 103.4 ( 6.04x)
dmvr_8_20x12_c: 610.0 ( 1.00x) 665.6 ( 1.00x)
dmvr_8_20x12_rvv_i32: 137.6 ( 4.44x) 92.9 ( 7.17x)
dmvr_8_20x20_c: 1008.0 ( 1.00x) 1082.7 ( 1.00x)
dmvr_8_20x20_rvv_i32: 221.1 ( 4.56x) 155.4 ( 6.97x)
dmvr_h_8_12x20_c: 2008.0 ( 1.00x) 2009.7 ( 1.00x)
dmvr_h_8_12x20_rvv_i32: 239.6 ( 8.38x) 186.7 (10.77x)
dmvr_h_8_20x12_c: 1989.5 ( 1.00x) 2009.4 ( 1.00x)
dmvr_h_8_20x12_rvv_i32: 230.3 ( 8.64x) 155.4 (12.93x)
dmvr_h_8_20x20_c: 3304.1 ( 1.00x) 3342.9 ( 1.00x)
dmvr_h_8_20x20_rvv_i32: 378.3 ( 8.73x) 248.9 (13.43x)
dmvr_hv_8_12x20_c: 3609.8 ( 1.00x) 3603.4 ( 1.00x)
dmvr_hv_8_12x20_rvv_i32: 369.1 ( 9.78x) 322.1 (11.19x)
dmvr_hv_8_20x12_c: 3628.3 ( 1.00x) 3624.2 ( 1.00x)
dmvr_hv_8_20x12_rvv_i32: 322.8 (11.24x) 238.7 (15.19x)
dmvr_hv_8_20x20_c: 5933.8 ( 1.00x) 5936.6 ( 1.00x)
dmvr_hv_8_20x20_rvv_i32: 526.5 (11.27x) 374.1 (15.87x)
dmvr_v_8_12x20_c: 2156.3 ( 1.00x) 2155.4 ( 1.00x)
dmvr_v_8_12x20_rvv_i32: 239.6 ( 9.00x) 176.2 (12.24x)
dmvr_v_8_20x12_c: 2137.6 ( 1.00x) 2165.9 ( 1.00x)
dmvr_v_8_20x12_rvv_i32: 230.3 ( 9.28x) 155.2 (13.96x)
dmvr_v_8_20x20_c: 4183.8 ( 1.00x) 3592.9 ( 1.00x)
dmvr_v_8_20x20_rvv_i32: 369.3 (11.33x) 249.2 (14.42x)
2024-12-17 09:21:20 +08:00
sunyuechi
b86766d610
Update R-V V vvc_mc vset to support more lengths
2024-12-17 09:21:20 +08:00
sunyuechi
2dc864eb4e
lavc/rv40dsp: fix RISC-V chroma_mc
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2024-12-10 11:24:45 -05:00
Rémi Denis-Courmont
f8e91ab05f
lavc/h264idct: fix compilation for RV32IMA
2024-11-25 19:29:21 +02:00
Rémi Denis-Courmont
f2b945147d
lavc/vp8dsp: fix compilation for RV32IMA
2024-11-25 19:29:21 +02:00
Rémi Denis-Courmont
d3acffae7a
lavc/pixblockdsp: fix compilation for RV32IMA
2024-11-25 19:29:21 +02:00
Rémi Denis-Courmont
da1ab7940e
riscv: remove unnecessary #include's
2024-11-25 19:29:21 +02:00
Rémi Denis-Courmont
607d4cca8e
riscv/h264dsp: remove spurious instruction
2024-11-18 22:02:19 +02:00
Rémi Denis-Courmont
b75dff0e20
lavc/h264dsp: fix R-V V weight_pixels pointer arithmetic
As of 459a1512f1, the code is unrolled to process two rows per iteration.
The output cursor thus needs to be incremented by twice the
stride, which is taken care of with SH1ADD. However, the ADD
from the original implementation was incorrectly left over.
2024-11-18 20:04:58 +02:00
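The pointer arithmetic described above can be shown in scalar C. SH1ADD rd, rs1, rs2 computes rs2 + (rs1 << 1), so one instruction advances the cursor by 2 * stride. The function and its `bump` operation below are hypothetical stand-ins for the weighting arithmetic. The sketch only models the loop structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the per-sample weighting arithmetic. */
static uint8_t bump(uint8_t x) { return (uint8_t)(x + 1); }

/* Process a block two rows per iteration (height must be even). */
static void process_two_rows_per_iteration(uint8_t *block, ptrdiff_t stride,
                                           int width, int height)
{
    assert(height % 2 == 0);
    for (int y = 0; y < height; y += 2) {
        for (int x = 0; x < width; x++) {
            block[x]          = bump(block[x]);
            block[x + stride] = bump(block[x + stride]);
        }
        /* Equivalent of SH1ADD: advance by 2 * stride. Adding stride once
         * more here (the leftover ADD) would skip rows and eventually
         * write past the end of the block. */
        block += 2 * stride;
    }
}
```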
Rémi Denis-Courmont
bbb0fdedb7
lavc/h264idct: fix RISC-V group multiplier
After the branch, the expected SEW/LMUL ratio is 1 byte/vector.
So we have to set the same ratio before branching (QEMU does not care,
but real hardware does).
2024-11-17 16:35:27 +02:00
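The SEW/LMUL ratio matters because it determines VLMAX. The RISC-V V specification defines VLMAX = LMUL * VLEN / SEW, so two vtype settings with the same ratio have the same VLMAX. One common way this bites is the `vsetvli x0, x0, ...` form, which keeps the current vl. The specification reserves that form when the new SEW/LMUL ratio would change VLMAX, and an implementation may then set vill. That would fit the observation that QEMU tolerates the mismatch while real hardware does not, but whether this form is involved here is an assumption. The helper below just evaluates the formula.

```c
#include <assert.h>

/* VLMAX = LMUL * VLEN / SEW. Fractional LMUL (1/2, 1/4, 1/8) is
 * expressed as lmul_num / lmul_den. */
static unsigned vlmax(unsigned vlen_bits, unsigned sew_bits,
                      unsigned lmul_num, unsigned lmul_den)
{
    assert(sew_bits != 0 && lmul_den != 0);
    return vlen_bits * lmul_num / (sew_bits * lmul_den);
}
```

For example, at VLEN 128, e8/m1 and e16/m2 share a ratio of 8 and therefore the same VLMAX of 16, whereas e16/m1 halves it to 8.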
Rémi Denis-Courmont
fd8cbfec3d
lavc/vp8dsp: remove RISC-V table alignment
These values are bytes and need not be aligned.
2024-11-17 11:28:21 +02:00
Rémi Denis-Courmont
690c015758
lavc/h264dsp: remove RISC-V table alignment
These values are bytes and need not be aligned.
2024-11-17 11:28:21 +02:00
Rémi Denis-Courmont
c3051d94a7
lavc/h264dsp: move RISC-V fn pointers to .data.rel.ro
This should fix PIC builds.
2024-11-16 16:04:24 +02:00
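Function pointer tables need special placement because, in a position-independent build, each entry is an absolute address that the dynamic loader must relocate. The loader cannot patch such a table in .rodata without text relocations. .data.rel.ro exists for this purpose: the loader writes the relocated values and the pages are then made read-only (RELRO). In C, the compiler chooses the section. The sketch below shows the kind of constant dispatch table that GCC and Clang typically emit into .data.rel.ro under -fPIC. The functions are hypothetical. In assembly, the section has to be selected by hand.

```c
#include <assert.h>

static int add_one(int x) { return x + 1; }
static int twice(int x)   { return 2 * x; }

/* A constant table of function pointers. Its entries need load-time
 * relocations in a PIC build, so it cannot live in plain .rodata. */
static int (*const dispatch[])(int) = { add_one, twice };

static int call_entry(unsigned i, int x)
{
    assert(i < sizeof(dispatch) / sizeof(dispatch[0]));
    return dispatch[i](x);
}
```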
Rémi Denis-Courmont
1eb026dd8b
riscv/vvc: fix UNDEF whilst initialising DSP
The current code triggers an illegal instruction if the CPU does not
support vectors.
2024-10-12 09:23:33 +03:00
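A fix of this kind usually comes down to ordering. The CPU flags must be checked before anything touches vector state, such as the vlenb CSR, which raises an illegal-instruction exception on CPUs without V. The sketch below shows that ordering with hypothetical names (`CPU_FLAG_RVV`, `read_vlenb`, `dsp_init`). It is not FFmpeg's actual init code, and the CSR read is stubbed out so the example runs anywhere.

```c
#include <assert.h>

/* Hypothetical CPU flag standing in for "V extension available". */
#define CPU_FLAG_RVV 0x1

static int vlenb_reads; /* counts how often the vector CSR was touched */

/* Hypothetical stand-in for reading vlenb; on real hardware without V
 * this read would trap. Here it pretends VLEN = 128 bits. */
static int read_vlenb(void)
{
    vlenb_reads++;
    return 16;
}

static int dsp_init(int cpu_flags)
{
    /* Check the flag first; only then is it safe to query vlenb. */
    if (!(cpu_flags & CPU_FLAG_RVV))
        return 0;

    int vlenb = read_vlenb();
    assert((vlenb & (vlenb - 1)) == 0); /* VLEN is a power of two */
    return vlenb >= 16;                 /* e.g. require VLEN >= 128 */
}
```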
Niklas Haas
2f77ecc6bc
avcodec/riscv: add h264 qpel
Benched on K230 for VLEN 128, SpaceMIT for VLEN 256. For width 4, the
VLEN 256 variants showed no speedup over VLEN 128 on available hardware,
so they were disabled.
C RVV128 C RVV256
avg_h264_qpel_4_mc00_8 33.9 33.6 (1.01x)
avg_h264_qpel_4_mc01_8 218.8 89.1 (2.46x)
avg_h264_qpel_4_mc02_8 218.8 79.8 (2.74x)
avg_h264_qpel_4_mc03_8 218.8 89.1 (2.46x)
avg_h264_qpel_4_mc10_8 172.3 126.1 (1.37x)
avg_h264_qpel_4_mc11_8 339.1 190.8 (1.78x)
avg_h264_qpel_4_mc12_8 533.6 357.6 (1.49x)
avg_h264_qpel_4_mc13_8 348.4 190.8 (1.83x)
avg_h264_qpel_4_mc20_8 144.8 116.8 (1.24x)
avg_h264_qpel_4_mc21_8 478.1 385.6 (1.24x)
avg_h264_qpel_4_mc22_8 348.4 283.6 (1.23x)
avg_h264_qpel_4_mc23_8 478.1 394.6 (1.21x)
avg_h264_qpel_4_mc30_8 172.6 126.1 (1.37x)
avg_h264_qpel_4_mc31_8 339.4 191.1 (1.78x)
avg_h264_qpel_4_mc32_8 542.9 357.6 (1.52x)
avg_h264_qpel_4_mc33_8 339.4 191.1 (1.78x)
avg_h264_qpel_8_mc00_8 116.8 42.9 (2.72x) 123.6 50.6 (2.44x)
avg_h264_qpel_8_mc01_8 774.4 163.1 (4.75x) 779.8 165.1 (4.72x)
avg_h264_qpel_8_mc02_8 774.4 154.1 (5.03x) 779.8 144.3 (5.40x)
avg_h264_qpel_8_mc03_8 774.4 163.3 (4.74x) 779.8 165.3 (4.72x)
avg_h264_qpel_8_mc10_8 617.1 237.3 (2.60x) 613.1 227.6 (2.69x)
avg_h264_qpel_8_mc11_8 1209.3 376.4 (3.21x) 1206.8 363.1 (3.32x)
avg_h264_qpel_8_mc12_8 1913.3 598.6 (3.20x) 1894.3 561.1 (3.38x)
avg_h264_qpel_8_mc13_8 1218.6 376.4 (3.24x) 1217.1 363.1 (3.35x)
avg_h264_qpel_8_mc20_8 524.4 228.1 (2.30x) 519.3 227.6 (2.28x)
avg_h264_qpel_8_mc21_8 1709.6 681.9 (2.51x) 1707.1 644.3 (2.65x)
avg_h264_qpel_8_mc22_8 1274.3 459.6 (2.77x) 1279.8 436.1 (2.93x)
avg_h264_qpel_8_mc23_8 1700.3 672.6 (2.53x) 1706.8 644.6 (2.65x)
avg_h264_qpel_8_mc30_8 607.6 246.6 (2.46x) 623.6 238.1 (2.62x)
avg_h264_qpel_8_mc31_8 1209.6 376.4 (3.21x) 1206.8 363.1 (3.32x)
avg_h264_qpel_8_mc32_8 1904.1 607.9 (3.13x) 1894.3 571.3 (3.32x)
avg_h264_qpel_8_mc33_8 1209.6 376.1 (3.22x) 1206.8 363.1 (3.32x)
avg_h264_qpel_16_mc00_8 431.9 89.1 (4.85x) 436.1 71.3 (6.12x)
avg_h264_qpel_16_mc01_8 2894.6 376.1 (7.70x) 2842.3 300.6 (9.46x)
avg_h264_qpel_16_mc02_8 2987.3 348.4 (8.57x) 2967.3 290.1 (10.23x)
avg_h264_qpel_16_mc03_8 2885.3 376.4 (7.67x) 2842.3 300.6 (9.46x)
avg_h264_qpel_16_mc10_8 2404.1 524.4 (4.58x) 2404.8 456.8 (5.26x)
avg_h264_qpel_16_mc11_8 4709.4 811.6 (5.80x) 4675.6 706.8 (6.62x)
avg_h264_qpel_16_mc12_8 7477.9 1274.3 (5.87x) 7436.1 1061.1 (7.01x)
avg_h264_qpel_16_mc13_8 4718.6 820.6 (5.75x) 4655.1 706.8 (6.59x)
avg_h264_qpel_16_mc20_8 2052.1 487.1 (4.21x) 2071.3 446.3 (4.64x)
avg_h264_qpel_16_mc21_8 7440.6 1422.6 (5.23x) 6727.8 1217.3 (5.53x)
avg_h264_qpel_16_mc22_8 5051.9 950.4 (5.32x) 5071.6 790.3 (6.42x)
avg_h264_qpel_16_mc23_8 6764.9 1422.3 (4.76x) 6748.6 1217.3 (5.54x)
avg_h264_qpel_16_mc30_8 2413.1 524.4 (4.60x) 2415.1 467.3 (5.17x)
avg_h264_qpel_16_mc31_8 4681.6 839.1 (5.58x) 4675.6 727.6 (6.43x)
avg_h264_qpel_16_mc32_8 8579.6 1292.8 (6.64x) 7436.3 1071.3 (6.94x)
avg_h264_qpel_16_mc33_8 5375.9 829.9 (6.48x) 4665.3 717.3 (6.50x)
put_h264_qpel_4_mc00_8 24.4 24.4 (1.00x)
put_h264_qpel_4_mc01_8 987.4 79.8 (12.37x)
put_h264_qpel_4_mc02_8 190.8 79.8 (2.39x)
put_h264_qpel_4_mc03_8 209.6 89.1 (2.35x)
put_h264_qpel_4_mc10_8 163.3 117.1 (1.39x)
put_h264_qpel_4_mc11_8 339.4 181.6 (1.87x)
put_h264_qpel_4_mc12_8 533.6 348.4 (1.53x)
put_h264_qpel_4_mc13_8 339.4 190.8 (1.78x)
put_h264_qpel_4_mc20_8 126.3 116.8 (1.08x)
put_h264_qpel_4_mc21_8 468.9 376.1 (1.25x)
put_h264_qpel_4_mc22_8 330.1 274.4 (1.20x)
put_h264_qpel_4_mc23_8 468.9 376.1 (1.25x)
put_h264_qpel_4_mc30_8 163.3 126.3 (1.29x)
put_h264_qpel_4_mc31_8 339.1 191.1 (1.77x)
put_h264_qpel_4_mc32_8 533.6 348.4 (1.53x)
put_h264_qpel_4_mc33_8 339.4 181.8 (1.87x)
put_h264_qpel_8_mc00_8 98.6 33.6 (2.93x) 92.3 40.1 (2.30x)
put_h264_qpel_8_mc01_8 737.1 153.8 (4.79x) 738.1 144.3 (5.12x)
put_h264_qpel_8_mc02_8 663.1 135.3 (4.90x) 665.1 134.1 (4.96x)
put_h264_qpel_8_mc03_8 737.4 154.1 (4.79x) 1508.8 144.3 (10.46x)
put_h264_qpel_8_mc10_8 598.4 237.1 (2.52x) 592.3 227.6 (2.60x)
put_h264_qpel_8_mc11_8 1172.3 357.9 (3.28x) 1175.6 342.3 (3.43x)
put_h264_qpel_8_mc12_8 1867.1 589.1 (3.17x) 1863.1 561.1 (3.32x)
put_h264_qpel_8_mc13_8 1172.6 366.9 (3.20x) 1175.6 352.8 (3.33x)
put_h264_qpel_8_mc20_8 450.4 218.8 (2.06x) 446.3 206.8 (2.16x)
put_h264_qpel_8_mc21_8 1672.3 663.1 (2.52x) 1675.6 633.8 (2.64x)
put_h264_qpel_8_mc22_8 1144.6 1200.1 (0.95x) 1144.3 425.6 (2.69x)
put_h264_qpel_8_mc23_8 1672.6 672.4 (2.49x) 1665.3 634.1 (2.63x)
put_h264_qpel_8_mc30_8 598.6 237.3 (2.52x) 613.1 227.6 (2.69x)
put_h264_qpel_8_mc31_8 1172.3 376.1 (3.12x) 1175.6 352.6 (3.33x)
put_h264_qpel_8_mc32_8 1857.8 598.6 (3.10x) 1863.1 561.1 (3.32x)
put_h264_qpel_8_mc33_8 1172.3 376.1 (3.12x) 1175.6 352.8 (3.33x)
put_h264_qpel_16_mc00_8 320.6 61.4 (5.22x) 321.3 60.8 (5.28x)
put_h264_qpel_16_mc01_8 2774.3 339.1 (8.18x) 2759.1 279.8 (9.86x)
put_h264_qpel_16_mc02_8 2589.1 320.6 (8.08x) 2571.6 269.3 (9.55x)
put_h264_qpel_16_mc03_8 2774.3 339.4 (8.17x) 2738.1 290.1 (9.44x)
put_h264_qpel_16_mc10_8 2274.3 487.4 (4.67x) 2290.1 436.1 (5.25x)
put_h264_qpel_16_mc11_8 5237.1 792.9 (6.60x) 4529.8 685.8 (6.61x)
put_h264_qpel_16_mc12_8 7357.6 1255.8 (5.86x) 7352.8 1040.1 (7.07x)
put_h264_qpel_16_mc13_8 4579.9 792.9 (5.78x) 4571.6 686.1 (6.66x)
put_h264_qpel_16_mc20_8 1802.1 459.6 (3.92x) 1800.6 425.6 (4.23x)
put_h264_qpel_16_mc21_8 6644.6 2246.6 (2.96x) 6644.3 1196.6 (5.55x)
put_h264_qpel_16_mc22_8 4589.1 913.4 (5.02x) 4592.3 769.3 (5.97x)
put_h264_qpel_16_mc23_8 6644.6 1394.6 (4.76x) 6634.1 1196.6 (5.54x)
put_h264_qpel_16_mc30_8 2274.3 496.6 (4.58x) 2290.1 456.8 (5.01x)
put_h264_qpel_16_mc31_8 5255.6 802.1 (6.55x) 4550.8 706.8 (6.44x)
put_h264_qpel_16_mc32_8 7376.1 1265.1 (5.83x) 7352.8 1050.6 (7.00x)
put_h264_qpel_16_mc33_8 4579.9 802.1 (5.71x) 4561.1 696.3 (6.55x)
Signed-off-by: Niklas Haas <git@haasn.dev>
Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-09-28 18:35:35 +02:00
Rémi Denis-Courmont
6611bf5484
lavc/h264dsp: optimise R-V V biweight for shorter heights
T-Head C908:
h264_biweight2_8_c: 313.7 ( 1.00x)
h264_biweight2_8_rvv_i32: before 239.5 ( 1.23x)
h264_biweight2_8_rvv_i32: after 72.7 ( 4.31x)
h264_biweight4_8_c: 582.0 ( 1.00x)
h264_biweight4_8_rvv_i32: before 471.0 ( 1.16x)
h264_biweight4_8_rvv_i32: after 91.5 ( 6.36x)
h264_biweight8_8_c: 1110.0 ( 1.00x)
h264_biweight8_8_rvv_i32: before 943.3 ( 1.10x)
h264_biweight8_8_rvv_i64: after 147.0 ( 7.55x)
SpacemiT X60:
h264_biweight2_8_c: 311.4 ( 1.00x)
h264_biweight2_8_rvv_i32: before 363.1 ( 0.83x)
h264_biweight2_8_rvv_i32: after 103.1 ( 3.02x)
h264_biweight4_8_c: 571.9 ( 1.00x)
h264_biweight4_8_rvv_i32: before 717.4 ( 0.78x)
h264_biweight4_8_rvv_i32: after 71.8 ( 7.96x)
h264_biweight8_8_c: 1103.1 ( 1.00x)
h264_biweight8_8_rvv_i32: before 1415.2 ( 0.76x)
h264_biweight8_8_rvv_i64: after 92.8 (11.88x)
2024-09-24 20:04:51 +03:00
Rémi Denis-Courmont
459a1512f1
lavc/h264dsp: unroll R-V V weight16
As VLSE128.V does not exist, we have no other way to deal with latency.
T-Head C908:
h264_weight16_8_c: 989.4 ( 1.00x)
h264_weight16_8_rvv_i32: 193.2 ( 5.12x)
SpacemiT X60:
h264_weight16_8_c: 874.1 ( 1.00x)
h264_weight16_8_rvv_i32: 196.9 ( 4.44x)
2024-09-24 20:04:51 +03:00
Rémi Denis-Courmont
4936bb2508
lavc/h264dsp: optimise R-V V weight for shorter heights
The height is a power of two of up to 16 rows. The current code was
optimised for large sample counts.
T-Head C908:
h264_weight2_8_c: 211.7 ( 1.00x)
h264_weight2_8_rvv_i32: before 184.0 ( 1.15x)
h264_weight2_8_rvv_i32: after 54.2 ( 3.90x)
h264_weight4_8_c: 285.7 ( 1.00x)
h264_weight4_8_rvv_i32: before 341.2 ( 0.86x)
h264_weight4_8_rvv_i32: after 82.2 ( 3.47x)
h264_weight8_8_c: 498.7 ( 1.00x)
h264_weight8_8_rvv_i32: before 683.7 ( 0.73x)
h264_weight8_8_rvv_i64: after 128.5 ( 3.95x)
h264_weight16_8_c: 878.2 ( 1.00x)
h264_weight16_8_rvv_i32: unchanged 239.5 ( 3.67x)
SpacemiT X60:
h264_weight2_8_c: 207.2 ( 1.00x)
h264_weight2_8_rvv_i32: before 259.6 ( 0.80x)
h264_weight2_8_rvv_i32: after 82.2 ( 2.52x)
h264_weight4_8_c: 290.8 ( 1.00x)
h264_weight4_8_rvv_i32: before 509.6 ( 0.57x)
h264_weight4_8_rvv_i32: after 61.5 ( 4.73x)
h264_weight8_8_c: 498.8 ( 1.00x)
h264_weight8_8_rvv_i32: before 1019.8 ( 0.49x)
h264_weight8_8_rvv_i64: after 71.8 ( 6.95x)
h264_weight16_8_c: 874.0 ( 1.00x)
h264_weight16_8_rvv_i32: unchanged 249.0 ( 3.51x)
2024-09-24 20:04:51 +03:00
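The weight and biweight functions benchmarked above implement H.264 explicit weighted prediction. The sketch below gives the per-sample formulas as written in the H.264 specification (weighted sample prediction, clause 8.4.2.3) for 8-bit samples. It is not FFmpeg's C reference, which may arrange the rounding and offset terms differently. With 8-bit samples and weights in [-128, 127], a single product x * w fits in a signed 16-bit lane. How the R-V V code handles the full sums is not shown here.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t clip_uint8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Unidirectional explicit weighted prediction for one 8-bit sample.
 * Note: >> on negative ints is implementation-defined in C; GCC and
 * Clang shift arithmetically, which matches the spec's definition. */
static uint8_t weight_sample(int x, int log2_denom, int w, int o)
{
    assert(log2_denom >= 0 && log2_denom <= 7);
    if (log2_denom >= 1)
        return clip_uint8(((x * w + (1 << (log2_denom - 1))) >> log2_denom) + o);
    return clip_uint8(x * w + o);
}

/* Bidirectional explicit weighted prediction for one 8-bit sample. */
static uint8_t biweight_sample(int x0, int x1, int log2_denom,
                               int w0, int w1, int o0, int o1)
{
    assert(log2_denom >= 0 && log2_denom <= 7);
    return clip_uint8(((x0 * w0 + x1 * w1 + (1 << log2_denom)) >> (log2_denom + 1))
                      + ((o0 + o1 + 1) >> 1));
}
```

The block functions apply these formulas to every sample of a width x height block. This matches the "shorter heights" commits: with at most 16 rows, per-call setup costs weigh heavily against the arithmetic.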
sunyuechi
ba7d0d5fc3
lavc/vvc_mc: R-V V avg w_avg
C908 X60
avg_8_2x2_c : 1.2 1.0
avg_8_2x2_rvv_i32 : 0.7 0.7
avg_8_2x4_c : 2.0 2.2
avg_8_2x4_rvv_i32 : 1.2 1.2
avg_8_2x8_c : 3.7 4.0
avg_8_2x8_rvv_i32 : 1.7 1.5
avg_8_2x16_c : 7.2 7.7
avg_8_2x16_rvv_i32 : 3.0 2.7
avg_8_2x32_c : 14.2 15.2
avg_8_2x32_rvv_i32 : 5.5 5.0
avg_8_2x64_c : 51.0 43.7
avg_8_2x64_rvv_i32 : 39.2 29.7
avg_8_2x128_c : 100.5 79.2
avg_8_2x128_rvv_i32 : 79.7 68.2
avg_8_4x2_c : 1.7 2.0
avg_8_4x2_rvv_i32 : 1.0 0.7
avg_8_4x4_c : 3.5 3.7
avg_8_4x4_rvv_i32 : 1.2 1.2
avg_8_4x8_c : 6.7 7.0
avg_8_4x8_rvv_i32 : 1.7 1.5
avg_8_4x16_c : 13.5 14.0
avg_8_4x16_rvv_i32 : 3.0 2.7
avg_8_4x32_c : 26.2 27.7
avg_8_4x32_rvv_i32 : 5.5 4.7
avg_8_4x64_c : 73.0 73.7
avg_8_4x64_rvv_i32 : 39.0 32.5
avg_8_4x128_c : 143.0 137.2
avg_8_4x128_rvv_i32 : 72.7 68.0
avg_8_8x2_c : 3.5 3.5
avg_8_8x2_rvv_i32 : 1.0 0.7
avg_8_8x4_c : 6.2 6.5
avg_8_8x4_rvv_i32 : 1.5 1.0
avg_8_8x8_c : 12.7 13.2
avg_8_8x8_rvv_i32 : 2.0 1.5
avg_8_8x16_c : 25.0 26.5
avg_8_8x16_rvv_i32 : 3.2 2.7
avg_8_8x32_c : 50.0 52.7
avg_8_8x32_rvv_i32 : 6.2 5.0
avg_8_8x64_c : 118.7 122.5
avg_8_8x64_rvv_i32 : 40.2 31.5
avg_8_8x128_c : 236.7 220.2
avg_8_8x128_rvv_i32 : 85.2 67.7
avg_8_16x2_c : 6.2 6.7
avg_8_16x2_rvv_i32 : 1.2 0.7
avg_8_16x4_c : 12.5 13.0
avg_8_16x4_rvv_i32 : 1.7 1.0
avg_8_16x8_c : 24.5 26.0
avg_8_16x8_rvv_i32 : 3.0 1.7
avg_8_16x16_c : 49.0 51.5
avg_8_16x16_rvv_i32 : 5.5 3.0
avg_8_16x32_c : 97.5 102.5
avg_8_16x32_rvv_i32 : 10.5 5.5
avg_8_16x64_c : 213.7 222.0
avg_8_16x64_rvv_i32 : 48.5 34.2
avg_8_16x128_c : 434.7 420.0
avg_8_16x128_rvv_i32 : 97.7 74.0
avg_8_32x2_c : 12.2 12.7
avg_8_32x2_rvv_i32 : 1.5 1.0
avg_8_32x4_c : 24.5 25.5
avg_8_32x4_rvv_i32 : 3.0 1.7
avg_8_32x8_c : 48.5 50.7
avg_8_32x8_rvv_i32 : 5.2 2.7
avg_8_32x16_c : 96.7 101.2
avg_8_32x16_rvv_i32 : 10.2 5.0
avg_8_32x32_c : 192.7 202.2
avg_8_32x32_rvv_i32 : 19.7 9.5
avg_8_32x64_c : 427.5 426.5
avg_8_32x64_rvv_i32 : 64.2 18.2
avg_8_32x128_c : 816.5 821.0
avg_8_32x128_rvv_i32 : 135.2 75.5
avg_8_64x2_c : 24.0 25.2
avg_8_64x2_rvv_i32 : 2.7 1.5
avg_8_64x4_c : 48.2 50.5
avg_8_64x4_rvv_i32 : 5.0 2.7
avg_8_64x8_c : 96.0 100.7
avg_8_64x8_rvv_i32 : 9.7 4.5
avg_8_64x16_c : 207.7 201.2
avg_8_64x16_rvv_i32 : 19.0 9.0
avg_8_64x32_c : 383.2 402.0
avg_8_64x32_rvv_i32 : 37.5 17.5
avg_8_64x64_c : 837.2 828.7
avg_8_64x64_rvv_i32 : 84.7 35.5
avg_8_64x128_c : 1640.7 1640.2
avg_8_64x128_rvv_i32 : 206.0 153.0
avg_8_128x2_c : 48.7 51.0
avg_8_128x2_rvv_i32 : 5.2 2.7
avg_8_128x4_c : 96.7 101.5
avg_8_128x4_rvv_i32 : 10.2 5.0
avg_8_128x8_c : 192.2 202.0
avg_8_128x8_rvv_i32 : 19.7 9.2
avg_8_128x16_c : 400.7 403.2
avg_8_128x16_rvv_i32 : 38.7 18.5
avg_8_128x32_c : 786.7 805.7
avg_8_128x32_rvv_i32 : 77.0 36.2
avg_8_128x64_c : 1615.5 1655.5
avg_8_128x64_rvv_i32 : 189.7 80.7
avg_8_128x128_c : 3182.0 3238.0
avg_8_128x128_rvv_i32 : 397.5 308.5
w_avg_8_2x2_c : 1.7 1.2
w_avg_8_2x2_rvv_i32 : 1.2 1.0
w_avg_8_2x4_c : 2.7 2.7
w_avg_8_2x4_rvv_i32 : 1.7 1.5
w_avg_8_2x8_c : 21.7 4.7
w_avg_8_2x8_rvv_i32 : 2.7 2.5
w_avg_8_2x16_c : 9.5 9.2
w_avg_8_2x16_rvv_i32 : 4.7 4.2
w_avg_8_2x32_c : 19.0 18.7
w_avg_8_2x32_rvv_i32 : 9.0 8.0
w_avg_8_2x64_c : 62.0 50.2
w_avg_8_2x64_rvv_i32 : 47.7 33.5
w_avg_8_2x128_c : 116.7 87.7
w_avg_8_2x128_rvv_i32 : 80.0 69.5
w_avg_8_4x2_c : 2.5 2.5
w_avg_8_4x2_rvv_i32 : 1.2 1.0
w_avg_8_4x4_c : 4.7 4.5
w_avg_8_4x4_rvv_i32 : 1.7 1.7
w_avg_8_4x8_c : 9.0 8.7
w_avg_8_4x8_rvv_i32 : 2.7 2.5
w_avg_8_4x16_c : 17.7 17.5
w_avg_8_4x16_rvv_i32 : 4.7 4.2
w_avg_8_4x32_c : 35.0 35.0
w_avg_8_4x32_rvv_i32 : 9.0 8.0
w_avg_8_4x64_c : 100.5 84.5
w_avg_8_4x64_rvv_i32 : 42.2 33.7
w_avg_8_4x128_c : 203.5 151.2
w_avg_8_4x128_rvv_i32 : 83.0 69.5
w_avg_8_8x2_c : 4.5 4.5
w_avg_8_8x2_rvv_i32 : 1.2 1.2
w_avg_8_8x4_c : 8.7 8.7
w_avg_8_8x4_rvv_i32 : 2.0 1.7
w_avg_8_8x8_c : 17.0 17.0
w_avg_8_8x8_rvv_i32 : 3.2 2.5
w_avg_8_8x16_c : 34.0 33.5
w_avg_8_8x16_rvv_i32 : 5.5 4.2
w_avg_8_8x32_c : 86.0 67.5
w_avg_8_8x32_rvv_i32 : 10.5 8.0
w_avg_8_8x64_c : 187.2 149.5
w_avg_8_8x64_rvv_i32 : 45.0 35.5
w_avg_8_8x128_c : 342.7 290.0
w_avg_8_8x128_rvv_i32 : 108.7 70.2
w_avg_8_16x2_c : 8.5 8.2
w_avg_8_16x2_rvv_i32 : 2.0 1.2
w_avg_8_16x4_c : 16.7 16.7
w_avg_8_16x4_rvv_i32 : 3.0 1.7
w_avg_8_16x8_c : 33.2 33.5
w_avg_8_16x8_rvv_i32 : 5.5 3.0
w_avg_8_16x16_c : 66.2 66.7
w_avg_8_16x16_rvv_i32 : 10.5 5.0
w_avg_8_16x32_c : 132.5 131.0
w_avg_8_16x32_rvv_i32 : 20.0 9.7
w_avg_8_16x64_c : 340.0 283.5
w_avg_8_16x64_rvv_i32 : 60.5 37.2
w_avg_8_16x128_c : 641.2 597.5
w_avg_8_16x128_rvv_i32 : 118.7 77.7
w_avg_8_32x2_c : 16.5 16.7
w_avg_8_32x2_rvv_i32 : 3.2 1.7
w_avg_8_32x4_c : 33.2 33.2
w_avg_8_32x4_rvv_i32 : 5.5 2.7
w_avg_8_32x8_c : 66.0 62.5
w_avg_8_32x8_rvv_i32 : 10.5 5.0
w_avg_8_32x16_c : 131.5 132.0
w_avg_8_32x16_rvv_i32 : 20.2 9.5
w_avg_8_32x32_c : 261.7 272.0
w_avg_8_32x32_rvv_i32 : 39.7 18.0
w_avg_8_32x64_c : 575.2 545.5
w_avg_8_32x64_rvv_i32 : 105.5 58.7
w_avg_8_32x128_c : 1154.2 1088.0
w_avg_8_32x128_rvv_i32 : 207.0 98.0
w_avg_8_64x2_c : 33.0 33.0
w_avg_8_64x2_rvv_i32 : 6.2 2.7
w_avg_8_64x4_c : 65.5 66.0
w_avg_8_64x4_rvv_i32 : 11.5 5.0
w_avg_8_64x8_c : 131.2 132.5
w_avg_8_64x8_rvv_i32 : 22.5 9.5
w_avg_8_64x16_c : 268.2 262.5
w_avg_8_64x16_rvv_i32 : 44.2 18.0
w_avg_8_64x32_c : 561.5 528.7
w_avg_8_64x32_rvv_i32 : 88.0 35.2
w_avg_8_64x64_c : 1136.2 1124.0
w_avg_8_64x64_rvv_i32 : 222.0 82.2
w_avg_8_64x128_c : 2345.0 2312.7
w_avg_8_64x128_rvv_i32 : 423.0 190.5
w_avg_8_128x2_c : 65.7 66.5
w_avg_8_128x2_rvv_i32 : 11.2 5.5
w_avg_8_128x4_c : 131.2 132.2
w_avg_8_128x4_rvv_i32 : 22.0 10.2
w_avg_8_128x8_c : 263.5 312.0
w_avg_8_128x8_rvv_i32 : 43.2 19.7
w_avg_8_128x16_c : 528.7 526.2
w_avg_8_128x16_rvv_i32 : 85.5 39.5
w_avg_8_128x32_c : 1067.7 1062.7
w_avg_8_128x32_rvv_i32 : 171.7 78.2
w_avg_8_128x64_c : 2234.7 2168.7
w_avg_8_128x64_rvv_i32 : 400.0 159.0
w_avg_8_128x128_c : 4752.5 4295.0
w_avg_8_128x128_rvv_i32 : 757.7 365.5
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-09-24 20:04:51 +03:00
Anton Khirnov
3f9ca51015
lavc/opus*: move to opus/ subdir
2024-09-02 11:56:53 +02:00
Ramiro Polla
6aafe61285
avcodec/mpegvideoencdsp: convert stride parameters from int to ptrdiff_t
2024-09-01 13:42:30 +02:00
Rémi Denis-Courmont
7d1dda4892
lavc/h264dsp: R-V V loop_filter_chroma
T-Head C908:
h264_v_loop_filter_chroma_8bpp_c: 137.4
h264_v_loop_filter_chroma_8bpp_rvv_i32: 54.2
2024-09-01 10:58:48 +03:00
Rémi Denis-Courmont
3a53656837
lavc/h264dsp: do not write back unmodified rows in R-V V loop filter
2024-09-01 10:52:26 +03:00
Rémi Denis-Courmont
d8fb44c0aa
lavc/mpegvideoencdsp: R-V V add_8x8basis
...
T-Head C908:
add_8x8basis_c: 440.6
add_8x8basis_rvv_i32: 70.3
SpacemiT X60:
add_8x8basis_c: 436.3
add_8x8basis_rvv_i32: 40.5
2024-08-19 22:41:13 +03:00
Rémi Denis-Courmont
1907dd7f23
lavc/mpegvideoencdsp: R-V V try_8x8basis
T-Head C908:
try_8x8basis_c: 922.5
try_8x8basis_rvv_i32: 135.3
SpacemiT X60:
try_8x8basis_c: 926.1
try_8x8basis_rvv_i32: 103.1
2024-08-19 22:41:13 +03:00
Rémi Denis-Courmont
0fd37c00d7
lavc/mpegvideoencdsp: R-V V pix_norm1
T-Head C908:
pix_norm1_c: 480.2
pix_norm1_rvv_i64: 146.9
SpacemiT X60:
pix_norm1_c: 478.2
pix_norm1_rvv_i64: 92.7
2024-08-19 22:41:13 +03:00
Rémi Denis-Courmont
63d016aea5
lavc/mpegvideoencdsp: R-V V pix_sum
T-Head C908:
pix_sum_c: 332.2
pix_sum_rvv_i64: 91.2
SpacemiT X60:
pix_sum_c: 321.2
pix_sum_rvv_i64: 60.9
2024-08-19 22:41:13 +03:00
sunyuechi
4e7b5ac48f
lavc/vp9dsp: R-V V mc bilin hv
C908 X60
vp9_avg_bilin_4hv_8bpp_c : 10.7 9.5
vp9_avg_bilin_4hv_8bpp_rvv_i32 : 4.0 3.5
vp9_avg_bilin_8hv_8bpp_c : 38.5 34.2
vp9_avg_bilin_8hv_8bpp_rvv_i32 : 7.2 6.5
vp9_avg_bilin_16hv_8bpp_c : 147.2 130.5
vp9_avg_bilin_16hv_8bpp_rvv_i32 : 14.5 12.7
vp9_avg_bilin_32hv_8bpp_c : 574.2 509.7
vp9_avg_bilin_32hv_8bpp_rvv_i32 : 42.5 38.0
vp9_avg_bilin_64hv_8bpp_c : 2321.2 2017.7
vp9_avg_bilin_64hv_8bpp_rvv_i32 : 163.5 131.0
vp9_put_bilin_4hv_8bpp_c : 10.0 8.7
vp9_put_bilin_4hv_8bpp_rvv_i32 : 3.5 3.0
vp9_put_bilin_8hv_8bpp_c : 35.2 31.2
vp9_put_bilin_8hv_8bpp_rvv_i32 : 6.5 5.7
vp9_put_bilin_16hv_8bpp_c : 134.0 119.0
vp9_put_bilin_16hv_8bpp_rvv_i32 : 12.7 11.5
vp9_put_bilin_32hv_8bpp_c : 538.5 464.2
vp9_put_bilin_32hv_8bpp_rvv_i32 : 39.7 35.2
vp9_put_bilin_64hv_8bpp_c : 2111.7 1833.2
vp9_put_bilin_64hv_8bpp_rvv_i32 : 138.5 122.5
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-08-19 22:29:20 +03:00
sunyuechi
9edd2e723b
lavc/vp9dsp: R-V V mc bilin h v
C908 X60
vp9_avg_bilin_4h_8bpp_c : 5.5 4.7
vp9_avg_bilin_4h_8bpp_rvv_i32 : 1.7 1.5
vp9_avg_bilin_4v_8bpp_c : 5.5 4.7
vp9_avg_bilin_4v_8bpp_rvv_i32 : 1.5 1.2
vp9_avg_bilin_8h_8bpp_c : 20.0 17.7
vp9_avg_bilin_8h_8bpp_rvv_i32 : 3.0 2.7
vp9_avg_bilin_8v_8bpp_c : 20.7 18.7
vp9_avg_bilin_8v_8bpp_rvv_i32 : 3.0 2.7
vp9_avg_bilin_16h_8bpp_c : 78.2 69.7
vp9_avg_bilin_16h_8bpp_rvv_i32 : 7.0 6.2
vp9_avg_bilin_16v_8bpp_c : 98.5 73.2
vp9_avg_bilin_16v_8bpp_rvv_i32 : 7.0 6.0
vp9_avg_bilin_32h_8bpp_c : 325.5 275.5
vp9_avg_bilin_32h_8bpp_rvv_i32 : 23.0 20.5
vp9_avg_bilin_32v_8bpp_c : 342.2 290.0
vp9_avg_bilin_32v_8bpp_rvv_i32 : 21.7 19.5
vp9_avg_bilin_64h_8bpp_c : 1263.7 1095.7
vp9_avg_bilin_64h_8bpp_rvv_i32 : 91.2 81.2
vp9_avg_bilin_64v_8bpp_c : 1331.7 1155.2
vp9_avg_bilin_64v_8bpp_rvv_i32 : 91.2 81.0
vp9_put_bilin_4h_8bpp_c : 4.5 4.0
vp9_put_bilin_4h_8bpp_rvv_i32 : 1.0 1.0
vp9_put_bilin_4v_8bpp_c : 4.7 4.2
vp9_put_bilin_4v_8bpp_rvv_i32 : 1.0 1.0
vp9_put_bilin_8h_8bpp_c : 16.7 15.0
vp9_put_bilin_8h_8bpp_rvv_i32 : 2.2 2.0
vp9_put_bilin_8v_8bpp_c : 17.5 15.7
vp9_put_bilin_8v_8bpp_rvv_i32 : 2.2 2.0
vp9_put_bilin_16h_8bpp_c : 65.2 58.0
vp9_put_bilin_16h_8bpp_rvv_i32 : 6.0 5.5
vp9_put_bilin_16v_8bpp_c : 69.2 61.7
vp9_put_bilin_16v_8bpp_rvv_i32 : 5.7 5.2
vp9_put_bilin_32h_8bpp_c : 273.2 229.0
vp9_put_bilin_32h_8bpp_rvv_i32 : 19.7 17.7
vp9_put_bilin_32v_8bpp_c : 290.5 243.7
vp9_put_bilin_32v_8bpp_rvv_i32 : 18.7 16.7
vp9_put_bilin_64h_8bpp_c : 1040.5 910.5
vp9_put_bilin_64h_8bpp_rvv_i32 : 82.5 73.0
vp9_put_bilin_64v_8bpp_c : 1108.5 971.0
vp9_put_bilin_64v_8bpp_rvv_i32 : 82.2 73.2
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-08-19 22:29:20 +03:00
Rémi Denis-Courmont
616fdeaea3
lavc/riscv: depend on RVB and simplify accordingly
There is no known (real) hardware with V but without the complete B
extension. B was indeed required by the 2022 RISC-V application profile,
earlier than V. There should not be any such hardware in the future
either.
In practice, various R-V Vector optimisations in FFmpeg already depend on
every constituent of the B extension, so supporting V without B would not
work well anyhow.
2024-08-05 21:16:26 +03:00
Rémi Denis-Courmont
4edfc11a28
lavc/h264dsp: R-V V idct4_add8 (all depths)
These are really just wrappers for idct4_add16intra functions, which are in
turn mostly wrappers for idct4_add and idct4_dc_add functions.
For benchmarks, refer to the latter two sets.
2024-08-05 21:16:26 +03:00
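The layering described above (idct4_add8 wrapping the idct4_add16intra functions, which wrap idct4_add and idct4_dc_add) amounts to a per-block dispatch. A block with a non-zero coefficient count gets the full 4x4 transform, a block with only a DC coefficient gets the cheaper DC shortcut, and an empty block is skipped. The sketch below models just that dispatch. The transforms are hypothetical stubs that record which path ran, and the flat `nnz` array is a simplification of FFmpeg's real layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the full 4x4 IDCT and the DC-only shortcut;
 * they only record which path was taken. */
static int full_calls, dc_calls;
static void idct4_add(int16_t *coeffs)    { (void)coeffs; full_calls++; }
static void idct4_dc_add(int16_t *coeffs) { (void)coeffs; dc_calls++; }

/* Per-block dispatch over nblocks 4x4 coefficient blocks. */
static void idct4_add_blocks(int16_t coeffs[][16], const uint8_t *nnz,
                             int nblocks)
{
    assert(nblocks >= 0);
    for (int i = 0; i < nblocks; i++) {
        if (nnz[i])
            idct4_add(coeffs[i]);     /* coded coefficients: full transform */
        else if (coeffs[i][0])
            idct4_dc_add(coeffs[i]);  /* DC only: add a constant */
        /* otherwise nothing to add */
    }
}
```

Since the wrappers mostly dispatch, their speed is dominated by the two leaf functions, which is why the benchmarks for those leaf functions are the relevant ones.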
Rémi Denis-Courmont
de7f999481
lavc/videodsp: work-around LLVM-as
For some reason, it can't handle the normal syntax for an address operand
without an offset, so add a dummy zero offset.
2024-08-02 21:24:01 +03:00
Rémi Denis-Courmont
677f28b310
lavc/h264dsp: stick R-V V weight to 16-bit precision
T-Head C908 (ns):
h264_weight2_8_c: 1607.8
h264_weight2_8_rvv_i32: 515.0 (before)
h264_weight2_8_rvv_i32: 348.5 (after)
h264_weight4_8_c: 2255.8
h264_weight4_8_rvv_i32: 1015.0 (before)
h264_weight4_8_rvv_i32: 691.0 (after)
h264_weight8_8_c: 3857.5
h264_weight8_8_rvv_i32: 2218.8 (before)
h264_weight8_8_rvv_i32: 1561.3 (after)
h264_weight16_8_c: 7431.5
h264_weight16_8_rvv_i32: 2737.3 (before)
h264_weight16_8_rvv_i32: 1848.3 (after)
SpacemiT X60 (ns):
h264_weight2_8_c: 1624.1
h264_weight2_8_rvv_i32: 352.6 (before)
h264_weight2_8_rvv_i32: 259.3 (after)
h264_weight4_8_c: 2259.3
h264_weight4_8_rvv_i32: 685.8 (before)
h264_weight4_8_rvv_i32: 530.3 (after)
h264_weight8_8_c: 4103.3
h264_weight8_8_rvv_i32: 1581.8 (before)
h264_weight8_8_rvv_i32: 1238.6 (after)
h264_weight16_8_c: 7624.3
h264_weight16_8_rvv_i32: 2738.1 (before)
h264_weight16_8_rvv_i32: 1853.3 (after)
2024-08-02 21:24:01 +03:00
Rémi Denis-Courmont
afd45c7ff7
lavc/h264dsp: stick R-V V biweight to 16-bit
T-Head C908 (ns):
h264_biweight2_8_c: 2414.5
h264_biweight2_8_rvv_i32: 701.8 (before)
h264_biweight2_8_rvv_i32: 468.5 (after)
h264_biweight4_8_c: 4655.3
h264_biweight4_8_rvv_i32: 1377.5 (before)
h264_biweight4_8_rvv_i32: 931.8 (after)
h264_biweight8_8_c: 9701.5
h264_biweight8_8_rvv_i32: 2896.0 (before)
h264_biweight8_8_rvv_i32: 2070.5 (after)
h264_biweight16_8_c: 18025.0
h264_biweight16_8_rvv_i32: 3460.8 (before)
h264_biweight16_8_rvv_i32: 1978.0 (after)
SpacemiT X60 (ns):
h264_biweight2_8_c: 2415.5
h264_biweight2_8_rvv_i32: 478.2 (before)
h264_biweight2_8_rvv_i32: 362.8 (after)
h264_biweight4_8_c: 4655.3
h264_biweight4_8_rvv_i32: 946.7 (before)
h264_biweight4_8_rvv_i32: 727.3 (after)
h264_biweight8_8_c: 9061.8
h264_biweight8_8_rvv_i32: 2071.7 (before)
h264_biweight8_8_rvv_i32: 1685.8 (after)
h264_biweight16_8_c: 18020.5
h264_biweight16_8_rvv_i32: 3457.2 (before)
h264_biweight16_8_rvv_i32: 1935.8 (after)
2024-08-02 21:24:01 +03:00
Rémi Denis-Courmont
2f083fd581
lavc/audiodsp: drop R-V F vector_clipf
This is now firmly slower than C.
SiFive-U74 (cycles):
audiodsp.vector_clipf_c: 31.2
audiodsp.vector_clipf_rvf: 39.5
2024-08-01 19:29:40 +03:00
Rémi Denis-Courmont
54ae270213
lavc/rv34dsp: use saturating add/sub for R-V V DC add
T-Head C908 (cycles):
rv34_idct_dc_add_c: 113.2
rv34_idct_dc_add_rvv_i32: 48.5 (before)
rv34_idct_dc_add_rvv_i32: 39.5 (after)
2024-08-01 18:43:04 +03:00
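Saturating arithmetic works for a DC add because the whole block shares one DC value with a single sign. Adding a signed DC to an 8-bit pixel and clipping to [0, 255] equals an unsigned saturating add of dc when dc >= 0, and an unsigned saturating subtract of -dc when dc < 0. A vector implementation can then pick one of two instructions (for example vsaddu or vssubu) instead of widening, adding and clipping. The vc1dsp inv_trans_dc commit further down uses the same idea. The scalar model below assumes |dc| <= 255; a larger DC could be clamped to that range first without changing the result.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: add a signed DC term and clip to [0, 255]. */
static uint8_t dc_add_clip(uint8_t p, int dc)
{
    int v = p + dc;
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Scalar models of unsigned saturating add/subtract on bytes. */
static uint8_t sat_addu8(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return (uint8_t)(s > 255 ? 255 : s);
}

static uint8_t sat_subu8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : 0);
}

/* Same result using only saturating operations. */
static uint8_t dc_add_saturating(uint8_t p, int dc)
{
    assert(dc >= -255 && dc <= 255);
    return dc >= 0 ? sat_addu8(p, (uint8_t)dc) : sat_subu8(p, (uint8_t)-dc);
}
```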
Rémi Denis-Courmont
952b426f3b
lavc/bswapdsp: add RV Zvbb bswap16 and bswap32
2024-08-01 18:43:04 +03:00
Rémi Denis-Courmont
262168b04e
lavc/videodsp: RISC-V zicbop prefetch
There is currently no way to detect the CPU capability at run time, so we
take it for granted (in the worst case, it will execute NOPs).
2024-07-30 18:41:51 +03:00
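The "worst case NOPs" remark holds because the Zicbop prefetch instructions are encoded in the ORI hint space with rd = x0, so a CPU without Zicbop simply executes them as no-ops. The sketch below is a row-by-row read prefetch loosely modelled on the videodsp prefetch hook. The signature is an assumption. `__builtin_prefetch` is a GCC/Clang builtin that may lower to prefetch.r when Zicbop is enabled, and may compile to nothing on other targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Issue one read-prefetch hint per row for h rows of an image. */
static void prefetch_rows(const uint8_t *buf, ptrdiff_t stride, int h)
{
    assert(h >= 0);
    while (h-- > 0) {
        __builtin_prefetch(buf); /* a hint only: never faults, may be a no-op */
        buf += stride;
    }
}
```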
Rémi Denis-Courmont
324eba69f7
lavc/vc1dsp: use saturating arithmetic for RVV inv_trans_dc
T-Head C908 (cycles):
vc1dsp.vc1_inv_trans_4x4_dc_c: 113.7
vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 46.5 (before)
vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 45.5 (after)
vc1dsp.vc1_inv_trans_4x8_dc_c: 230.7
vc1dsp.vc1_inv_trans_4x8_dc_rvv_i32: 65.7 (before)
vc1dsp.vc1_inv_trans_4x8_dc_rvv_i32: 52.5 (after)
vc1dsp.vc1_inv_trans_8x4_dc_c: 246.7
vc1dsp.vc1_inv_trans_8x4_dc_rvv_i64: 56.7 (before)
vc1dsp.vc1_inv_trans_8x4_dc_rvv_i64: 45.5 (after)
vc1dsp.vc1_inv_trans_8x8_dc_c: 419.7
vc1dsp.vc1_inv_trans_8x8_dc_rvv_i64: 81.2 (before)
vc1dsp.vc1_inv_trans_8x8_dc_rvv_i64: 53.5 (after)
2024-07-30 18:41:51 +03:00
Rémi Denis-Courmont
784a72a116
lavc/vc1dsp: unify R-V V DC bypass functions
2024-07-30 18:41:51 +03:00
Rémi Denis-Courmont
bd0c3edb13
lavu/riscv: count bytes rather than words for bswap32
This removes the dependency on Zba at essentially zero cost.
2024-07-30 18:41:51 +03:00
Rémi Denis-Courmont
5171baa228
lavc/ac3dsp: fix R-V CPU requirements
It probably will not matter on any real hardware, but the Zbb optimisations
do not require Zba. Also, HAVE_RVV is needed to build the RVV code.
2024-07-30 18:41:51 +03:00
Rémi Denis-Courmont
7b24f96c87
lavc/vp9dsp: remove R-V I intra functions
At this point, they are identical to the C code, except for instruction
ordering. In fact, they are typically slower or no faster than the C code.
2024-07-29 21:16:41 +03:00