mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-23 12:43:46 +02:00
Commit Graph

286 Commits

Author SHA1 Message Date
Nuo Mi
8d27256a74 avcodec/vvcdec: remove vvc prefix for x86 and riscv 2024-12-22 21:00:06 +08:00
sunyuechi
6b31e42c47 lavc/riscv: vset macro for simplify if-else 2024-12-21 12:03:45 +08:00
Rémi Denis-Courmont
bd226fdd74 lavc/h264dsp: R-V V intra loop filter
As with the inter loop filter, performance metrics seem to be biased in
favour of the C implementation because checkasm inputs almost always
fall in the no-op case.

h264_h_loop_filter_chroma_intra_8bpp_c:                 82.8 ( 1.00x)
h264_h_loop_filter_chroma_intra_8bpp_rvv_i32:           72.6 ( 1.14x)
h264_h_loop_filter_chroma_mbaff_intra_8bpp_c:           41.1 ( 1.00x)
h264_h_loop_filter_chroma_mbaff_intra_8bpp_rvv_i32:     72.6 ( 0.57x)
h264_h_loop_filter_luma_intra_8bpp_c:                  166.1 ( 1.00x)
h264_h_loop_filter_luma_intra_8bpp_rvv_i32:            395.4 ( 0.42x)
h264_h_loop_filter_luma_mbaff_intra_8bpp_c:             93.3 ( 1.00x)
h264_h_loop_filter_luma_mbaff_intra_8bpp_rvv_i32:      395.4 ( 0.24x)
h264_v_loop_filter_chroma_intra_8bpp_c:                134.8 ( 1.00x)
h264_v_loop_filter_chroma_intra_8bpp_rvv_i32:           51.6 ( 2.61x)
h264_v_loop_filter_luma_intra_8bpp_c:                  468.1 ( 1.00x)
h264_v_loop_filter_luma_intra_8bpp_rvv_i32:            134.8 ( 3.47x)
2024-12-17 09:00:28 +02:00
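The ratio in parentheses in these checkasm tables is the C timing divided by the optimised timing, so a value below 1.00x means the RVV routine measured slower than C. A minimal Python sketch of that arithmetic follows; the helper name `speedup` is hypothetical, and the timings are two rows copied from the table above.

```python
def speedup(c_time: float, simd_time: float) -> float:
    """Return how many times faster the optimised routine is than the C one."""
    return c_time / simd_time


# Timings copied from the h264 intra loop filter table above.
luma_v = speedup(468.1, 134.8)        # h264_v_loop_filter_luma_intra_8bpp
luma_h_mbaff = speedup(93.3, 395.4)   # h264_h_loop_filter_luma_mbaff_intra_8bpp

print(f"{luma_v:.2f}x")        # 3.47x
print(f"{luma_h_mbaff:.2f}x")  # 0.24x
```

Because the commit message notes that checkasm inputs mostly hit the no-op case, ratios like 0.24x may say more about the test inputs than about real decoding workloads.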
sunyuechi
16d4945e9a lavc/vvc_mc R-V V sad
                            k230              banana_f3
sad_8x16_c:                 387.7 ( 1.00x)    394.9 ( 1.00x)
sad_8x16_rvv_i32:           109.7 ( 3.53x)    103.5 ( 3.82x)
sad_16x8_c:                 378.2 ( 1.00x)    384.7 ( 1.00x)
sad_16x8_rvv_i32:            82.0 ( 4.61x)    61.7 ( 6.24x)
sad_16x16_c:                748.7 ( 1.00x)    759.7 ( 1.00x)
sad_16x16_rvv_i32:          128.5 ( 5.83x)    113.7 ( 6.68x)
2024-12-17 09:21:20 +08:00
sunyuechi
b3f7440298 lavc/hevc: R-V V put_pixels(pow2)
                                      k230               banana_f3
put_hevc_pel_pixels4_8_c:               61.6 ( 1.00x)    69.5 ( 1.00x)
put_hevc_pel_pixels4_8_rvv_i32:         24.6 ( 2.50x)    28.0 ( 2.48x)
put_hevc_pel_pixels8_8_c:              209.8 ( 1.00x)    215.5 ( 1.00x)
put_hevc_pel_pixels8_8_rvv_i32:         52.6 ( 3.99x)    38.2 ( 5.64x)
put_hevc_pel_pixels16_8_c:             839.4 ( 1.00x)    830.0 ( 1.00x)
put_hevc_pel_pixels16_8_rvv_i32:       126.6 ( 6.63x)    90.5 ( 9.17x)
put_hevc_pel_pixels32_8_c:            3246.6 ( 1.00x)    3246.7 ( 1.00x)
put_hevc_pel_pixels32_8_rvv_i32:       311.6 (10.42x)    257.0 (12.63x)
put_hevc_pel_pixels64_8_c:           12894.6 ( 1.00x)    12892.7 ( 1.00x)
put_hevc_pel_pixels64_8_rvv_i32:      1135.8 (11.35x)    778.0 (16.57x)
2024-12-17 09:21:20 +08:00
sunyuechi
dad062c4f8 lavc/vvc_mc: R-V V put_pixels
                                                      k230               banana_f3
put_chroma_pixels_8_4x4_c:                              63.5 ( 1.00x)    59.2 ( 1.00x)
put_chroma_pixels_8_4x4_rvv_i32:                        26.5 ( 2.39x)    28.0 ( 2.12x)
put_chroma_pixels_8_8x8_c:                             211.8 ( 1.00x)    215.5 ( 1.00x)
put_chroma_pixels_8_8x8_rvv_i32:                        54.3 ( 3.90x)    48.8 ( 4.42x)
put_chroma_pixels_8_16x16_c:                           841.3 ( 1.00x)    830.0 ( 1.00x)
put_chroma_pixels_8_16x16_rvv_i32:                     137.5 ( 6.12x)    121.8 ( 6.82x)
put_chroma_pixels_8_32x32_c:                          3248.8 ( 1.00x)    3288.2 ( 1.00x)
put_chroma_pixels_8_32x32_rvv_i32:                     350.5 ( 9.27x)    288.5 (11.40x)
put_chroma_pixels_8_64x64_c:                         12998.3 ( 1.00x)    12976.2 ( 1.00x)
put_chroma_pixels_8_64x64_rvv_i32:                    1100.5 (11.81x)    924.0 (14.04x)
put_chroma_pixels_8_128x128_c:                       54284.0 ( 1.00x)    52654.5 ( 1.00x)
put_chroma_pixels_8_128x128_rvv_i32:                  7192.8 ( 7.55x)    2934.2 (17.94x)
put_luma_pixels_8_4x4_c:                                63.5 ( 1.00x)    69.5 ( 1.00x)
put_luma_pixels_8_4x4_rvv_i32:                          26.5 ( 2.39x)    28.0 ( 2.48x)
put_luma_pixels_8_8x8_c:                               211.5 ( 1.00x)    225.8 ( 1.00x)
put_luma_pixels_8_8x8_rvv_i32:                          54.3 ( 3.90x)    38.5 ( 5.86x)
put_luma_pixels_8_16x16_c:                             850.5 ( 1.00x)    830.0 ( 1.00x)
put_luma_pixels_8_16x16_rvv_i32:                       137.5 ( 6.18x)    100.8 ( 8.24x)
put_luma_pixels_8_32x32_c:                            3248.8 ( 1.00x)    3257.2 ( 1.00x)
put_luma_pixels_8_32x32_rvv_i32:                       341.3 ( 9.52x)    246.8 (13.20x)
put_luma_pixels_8_64x64_c:                           13007.5 ( 1.00x)    13038.8 ( 1.00x)
put_luma_pixels_8_64x64_rvv_i32:                      1119.0 (11.62x)    684.2 (19.06x)
put_luma_pixels_8_128x128_c:                         54219.3 ( 1.00x)    52060.8 ( 1.00x)
put_luma_pixels_8_128x128_rvv_i32:                    6813.5 ( 7.96x)    2548.8 (20.43x)
2024-12-17 09:21:20 +08:00
sunyuechi
9288196c0d lavc/riscv: Move VVC macro to h26x 2024-12-17 09:21:20 +08:00
sunyuechi
89df9c4404 lavc/vvc_mc: R-V V dmvr
                                     k230               banana_f3
dmvr_8_12x20_c:                       619.3 ( 1.00x)    624.1 ( 1.00x)
dmvr_8_12x20_rvv_i32:                 128.6 ( 4.82x)    103.4 ( 6.04x)
dmvr_8_20x12_c:                       610.0 ( 1.00x)    665.6 ( 1.00x)
dmvr_8_20x12_rvv_i32:                 137.6 ( 4.44x)    92.9 ( 7.17x)
dmvr_8_20x20_c:                      1008.0 ( 1.00x)    1082.7 ( 1.00x)
dmvr_8_20x20_rvv_i32:                 221.1 ( 4.56x)    155.4 ( 6.97x)
dmvr_h_8_12x20_c:                    2008.0 ( 1.00x)    2009.7 ( 1.00x)
dmvr_h_8_12x20_rvv_i32:               239.6 ( 8.38x)    186.7 (10.77x)
dmvr_h_8_20x12_c:                    1989.5 ( 1.00x)    2009.4 ( 1.00x)
dmvr_h_8_20x12_rvv_i32:               230.3 ( 8.64x)    155.4 (12.93x)
dmvr_h_8_20x20_c:                    3304.1 ( 1.00x)    3342.9 ( 1.00x)
dmvr_h_8_20x20_rvv_i32:               378.3 ( 8.73x)    248.9 (13.43x)
dmvr_hv_8_12x20_c:                   3609.8 ( 1.00x)    3603.4 ( 1.00x)
dmvr_hv_8_12x20_rvv_i32:              369.1 ( 9.78x)    322.1 (11.19x)
dmvr_hv_8_20x12_c:                   3628.3 ( 1.00x)    3624.2 ( 1.00x)
dmvr_hv_8_20x12_rvv_i32:              322.8 (11.24x)    238.7 (15.19x)
dmvr_hv_8_20x20_c:                   5933.8 ( 1.00x)    5936.6 ( 1.00x)
dmvr_hv_8_20x20_rvv_i32:              526.5 (11.27x)    374.1 (15.87x)
dmvr_v_8_12x20_c:                    2156.3 ( 1.00x)    2155.4 ( 1.00x)
dmvr_v_8_12x20_rvv_i32:               239.6 ( 9.00x)    176.2 (12.24x)
dmvr_v_8_20x12_c:                    2137.6 ( 1.00x)    2165.9 ( 1.00x)
dmvr_v_8_20x12_rvv_i32:               230.3 ( 9.28x)    155.2 (13.96x)
dmvr_v_8_20x20_c:                    4183.8 ( 1.00x)    3592.9 ( 1.00x)
dmvr_v_8_20x20_rvv_i32:               369.3 (11.33x)    249.2 (14.42x)
2024-12-17 09:21:20 +08:00
sunyuechi
b86766d610 Update R-V V vvc_mc vset to support more lengths 2024-12-17 09:21:20 +08:00
sunyuechi
2dc864eb4e lavc/rv40dsp: fix RISC-V chroma_mc
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2024-12-10 11:24:45 -05:00
Rémi Denis-Courmont
f8e91ab05f lavc/h264idct: fix compilation for RV32IMA 2024-11-25 19:29:21 +02:00
Rémi Denis-Courmont
f2b945147d lavc/vp8dsp: fix compilation for RV32IMA 2024-11-25 19:29:21 +02:00
Rémi Denis-Courmont
d3acffae7a lavc/pixblockdsp: fix compilation for RV32IMA 2024-11-25 19:29:21 +02:00
Rémi Denis-Courmont
da1ab7940e riscv: remove unnecessary #include's 2024-11-25 19:29:21 +02:00
Rémi Denis-Courmont
607d4cca8e riscv/h264dsp: remove spurious instruction 2024-11-18 22:02:19 +02:00
Rémi Denis-Courmont
b75dff0e20 lavc/h264dsp: fix R-V V weight_pixels pointer arithmetic
As of 459a1512f1,
the code is unrolled to process two rows per iteration.
The output cursor thus needs to be incremented by twice the
stride, which is taken care of with SH1ADD. However, the ADD from the
original implementation was incorrectly left over.
2024-11-18 20:04:58 +02:00
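The effect of the leftover ADD can be modelled in a few lines of Python. This is only a sketch of the arithmetic described in the commit message, not the actual assembly: the function names and the `leftover_add` flag are hypothetical, and `sh1add` follows the Zba definition `rd = rs2 + (rs1 << 1)`.

```python
def sh1add(rs1: int, rs2: int) -> int:
    """Model of the Zba SH1ADD instruction: rd = rs2 + (rs1 << 1)."""
    return rs2 + (rs1 << 1)


def row_offsets(height: int, stride: int, leftover_add: bool) -> list[int]:
    """Byte offsets of the rows written by a loop unrolled to two rows per pass."""
    offsets = []
    cursor = 0
    for _ in range(height // 2):
        # Each pass handles the row at the cursor and the one after it.
        offsets += [cursor, cursor + stride]
        if leftover_add:
            cursor += stride              # stale ADD from the one-row loop
        cursor = sh1add(stride, cursor)   # advance by twice the stride
    return offsets


print(row_offsets(4, 16, leftover_add=False))  # [0, 16, 32, 48]
print(row_offsets(4, 16, leftover_add=True))   # [0, 16, 48, 64]
```

With the stale ADD in place the cursor moves three strides per pass instead of two, so every pass after the first lands on the wrong rows.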
Rémi Denis-Courmont
bbb0fdedb7 lavc/h264idct: fix RISC-V group multiplier
After the branch, the expected SEW/LMUL ratio is 1 byte/vector.
So we have to set the same ratio before branching (QEMU does not care,
but real hardware does).
2024-11-17 16:35:27 +02:00
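The constraint can be illustrated with the RVV relation VLMAX = VLEN × LMUL / SEW: two vector configurations share a VLMAX exactly when their SEW/LMUL ratios match. The sketch below is an assumption-laden illustration, not code from the commit. It reads "1 byte/vector" as SEW/LMUL = 8 bits, the helper names are hypothetical, and the e8/e16 settings are examples rather than the values used in the FFmpeg source.

```python
from fractions import Fraction


def sew_lmul_ratio(sew_bits: int, lmul) -> Fraction:
    """SEW/LMUL ratio in bits; lmul may be fractional, e.g. Fraction(1, 2)."""
    return Fraction(sew_bits) / Fraction(lmul)


def vlmax(vlen_bits: int, sew_bits: int, lmul) -> int:
    """Maximum element count per vector group: VLEN * LMUL / SEW."""
    return int(Fraction(vlen_bits) * Fraction(lmul) / sew_bits)


# Same ratio (8 bits), hence the same VLMAX for a given VLEN.
print(sew_lmul_ratio(8, 1), vlmax(128, 8, 1))    # 8 16
print(sew_lmul_ratio(16, 2), vlmax(128, 16, 2))  # 8 16

# Different ratio (16 bits): VLMAX changes.
print(sew_lmul_ratio(16, 1), vlmax(128, 16, 1))  # 16 8
```

As I understand the RVV 1.0 specification, the `vsetvli x0, x0, ...` form keeps the current vector length and is only valid when VLMAX does not change. That would explain why an emulator may tolerate a mismatched ratio that real hardware rejects.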
Rémi Denis-Courmont
fd8cbfec3d lavc/vp8dsp: remove RISC-V table alignment
These values are bytes and need not be aligned.
2024-11-17 11:28:21 +02:00
Rémi Denis-Courmont
690c015758 lavc/h264dsp: remove RISC-V table alignment
These values are bytes and need not be aligned.
2024-11-17 11:28:21 +02:00
Rémi Denis-Courmont
c3051d94a7 lavc/h264dsp: move RISC-V fn pointers to .data.rel.ro
This should fix PIC builds.
2024-11-16 16:04:24 +02:00
Rémi Denis-Courmont
1eb026dd8b riscv/vvc: fix UNDEF whilst initialising DSP
The current code triggers an illegal instruction if the CPU does not
support vectors.
2024-10-12 09:23:33 +03:00
Niklas Haas
2f77ecc6bc avcodec/riscv: add h264 qpel
Benched on K230 for VLEN 128, SpaceMIT for VLEN 256. Variants for 4
width have no speedup for VLEN 256 vs VLEN 128 on available hardware,
so were disabled.

                        C      RVV128          C     RVV256
avg_h264_qpel_4_mc00_8  33.9   33.6   (1.01x)
avg_h264_qpel_4_mc01_8  218.8  89.1   (2.46x)
avg_h264_qpel_4_mc02_8  218.8  79.8   (2.74x)
avg_h264_qpel_4_mc03_8  218.8  89.1   (2.46x)
avg_h264_qpel_4_mc10_8  172.3  126.1  (1.37x)
avg_h264_qpel_4_mc11_8  339.1  190.8  (1.78x)
avg_h264_qpel_4_mc12_8  533.6  357.6  (1.49x)
avg_h264_qpel_4_mc13_8  348.4  190.8  (1.83x)
avg_h264_qpel_4_mc20_8  144.8  116.8  (1.24x)
avg_h264_qpel_4_mc21_8  478.1  385.6  (1.24x)
avg_h264_qpel_4_mc22_8  348.4  283.6  (1.23x)
avg_h264_qpel_4_mc23_8  478.1  394.6  (1.21x)
avg_h264_qpel_4_mc30_8  172.6  126.1  (1.37x)
avg_h264_qpel_4_mc31_8  339.4  191.1  (1.78x)
avg_h264_qpel_4_mc32_8  542.9  357.6  (1.52x)
avg_h264_qpel_4_mc33_8  339.4  191.1  (1.78x)
avg_h264_qpel_8_mc00_8  116.8  42.9   (2.72x)  123.6  50.6   (2.44x)
avg_h264_qpel_8_mc01_8  774.4  163.1  (4.75x)  779.8  165.1  (4.72x)
avg_h264_qpel_8_mc02_8  774.4  154.1  (5.03x)  779.8  144.3  (5.40x)
avg_h264_qpel_8_mc03_8  774.4  163.3  (4.74x)  779.8  165.3  (4.72x)
avg_h264_qpel_8_mc10_8  617.1  237.3  (2.60x)  613.1  227.6  (2.69x)
avg_h264_qpel_8_mc11_8  1209.3 376.4  (3.21x)  1206.8 363.1  (3.32x)
avg_h264_qpel_8_mc12_8  1913.3 598.6  (3.20x)  1894.3 561.1  (3.38x)
avg_h264_qpel_8_mc13_8  1218.6 376.4  (3.24x)  1217.1 363.1  (3.35x)
avg_h264_qpel_8_mc20_8  524.4  228.1  (2.30x)  519.3  227.6  (2.28x)
avg_h264_qpel_8_mc21_8  1709.6 681.9  (2.51x)  1707.1 644.3  (2.65x)
avg_h264_qpel_8_mc22_8  1274.3 459.6  (2.77x)  1279.8 436.1  (2.93x)
avg_h264_qpel_8_mc23_8  1700.3 672.6  (2.53x)  1706.8 644.6  (2.65x)
avg_h264_qpel_8_mc30_8  607.6  246.6  (2.46x)  623.6  238.1  (2.62x)
avg_h264_qpel_8_mc31_8  1209.6 376.4  (3.21x)  1206.8 363.1  (3.32x)
avg_h264_qpel_8_mc32_8  1904.1 607.9  (3.13x)  1894.3 571.3  (3.32x)
avg_h264_qpel_8_mc33_8  1209.6 376.1  (3.22x)  1206.8 363.1  (3.32x)
avg_h264_qpel_16_mc00_8 431.9  89.1   (4.85x)  436.1  71.3   (6.12x)
avg_h264_qpel_16_mc01_8 2894.6 376.1  (7.70x)  2842.3 300.6  (9.46x)
avg_h264_qpel_16_mc02_8 2987.3 348.4  (8.57x)  2967.3 290.1  (10.23x)
avg_h264_qpel_16_mc03_8 2885.3 376.4  (7.67x)  2842.3 300.6  (9.46x)
avg_h264_qpel_16_mc10_8 2404.1 524.4  (4.58x)  2404.8 456.8  (5.26x)
avg_h264_qpel_16_mc11_8 4709.4 811.6  (5.80x)  4675.6 706.8  (6.62x)
avg_h264_qpel_16_mc12_8 7477.9 1274.3 (5.87x)  7436.1 1061.1 (7.01x)
avg_h264_qpel_16_mc13_8 4718.6 820.6  (5.75x)  4655.1 706.8  (6.59x)
avg_h264_qpel_16_mc20_8 2052.1 487.1  (4.21x)  2071.3 446.3  (4.64x)
avg_h264_qpel_16_mc21_8 7440.6 1422.6 (5.23x)  6727.8 1217.3 (5.53x)
avg_h264_qpel_16_mc22_8 5051.9 950.4  (5.32x)  5071.6 790.3  (6.42x)
avg_h264_qpel_16_mc23_8 6764.9 1422.3 (4.76x)  6748.6 1217.3 (5.54x)
avg_h264_qpel_16_mc30_8 2413.1 524.4  (4.60x)  2415.1 467.3  (5.17x)
avg_h264_qpel_16_mc31_8 4681.6 839.1  (5.58x)  4675.6 727.6  (6.43x)
avg_h264_qpel_16_mc32_8 8579.6 1292.8 (6.64x)  7436.3 1071.3 (6.94x)
avg_h264_qpel_16_mc33_8 5375.9 829.9  (6.48x)  4665.3 717.3  (6.50x)
put_h264_qpel_4_mc00_8  24.4   24.4   (1.00x)
put_h264_qpel_4_mc01_8  987.4  79.8   (12.37x)
put_h264_qpel_4_mc02_8  190.8  79.8   (2.39x)
put_h264_qpel_4_mc03_8  209.6  89.1   (2.35x)
put_h264_qpel_4_mc10_8  163.3  117.1  (1.39x)
put_h264_qpel_4_mc11_8  339.4  181.6  (1.87x)
put_h264_qpel_4_mc12_8  533.6  348.4  (1.53x)
put_h264_qpel_4_mc13_8  339.4  190.8  (1.78x)
put_h264_qpel_4_mc20_8  126.3  116.8  (1.08x)
put_h264_qpel_4_mc21_8  468.9  376.1  (1.25x)
put_h264_qpel_4_mc22_8  330.1  274.4  (1.20x)
put_h264_qpel_4_mc23_8  468.9  376.1  (1.25x)
put_h264_qpel_4_mc30_8  163.3  126.3  (1.29x)
put_h264_qpel_4_mc31_8  339.1  191.1  (1.77x)
put_h264_qpel_4_mc32_8  533.6  348.4  (1.53x)
put_h264_qpel_4_mc33_8  339.4  181.8  (1.87x)
put_h264_qpel_8_mc00_8  98.6   33.6   (2.93x)  92.3   40.1   (2.30x)
put_h264_qpel_8_mc01_8  737.1  153.8  (4.79x)  738.1  144.3  (5.12x)
put_h264_qpel_8_mc02_8  663.1  135.3  (4.90x)  665.1  134.1  (4.96x)
put_h264_qpel_8_mc03_8  737.4  154.1  (4.79x)  1508.8 144.3  (10.46x)
put_h264_qpel_8_mc10_8  598.4  237.1  (2.52x)  592.3  227.6  (2.60x)
put_h264_qpel_8_mc11_8  1172.3 357.9  (3.28x)  1175.6 342.3  (3.43x)
put_h264_qpel_8_mc12_8  1867.1 589.1  (3.17x)  1863.1 561.1  (3.32x)
put_h264_qpel_8_mc13_8  1172.6 366.9  (3.20x)  1175.6 352.8  (3.33x)
put_h264_qpel_8_mc20_8  450.4  218.8  (2.06x)  446.3  206.8  (2.16x)
put_h264_qpel_8_mc21_8  1672.3 663.1  (2.52x)  1675.6 633.8  (2.64x)
put_h264_qpel_8_mc22_8  1144.6 1200.1 (0.95x)  1144.3 425.6  (2.69x)
put_h264_qpel_8_mc23_8  1672.6 672.4  (2.49x)  1665.3 634.1  (2.63x)
put_h264_qpel_8_mc30_8  598.6  237.3  (2.52x)  613.1  227.6  (2.69x)
put_h264_qpel_8_mc31_8  1172.3 376.1  (3.12x)  1175.6 352.6  (3.33x)
put_h264_qpel_8_mc32_8  1857.8 598.6  (3.10x)  1863.1 561.1  (3.32x)
put_h264_qpel_8_mc33_8  1172.3 376.1  (3.12x)  1175.6 352.8  (3.33x)
put_h264_qpel_16_mc00_8 320.6  61.4   (5.22x)  321.3  60.8   (5.28x)
put_h264_qpel_16_mc01_8 2774.3 339.1  (8.18x)  2759.1 279.8  (9.86x)
put_h264_qpel_16_mc02_8 2589.1 320.6  (8.08x)  2571.6 269.3  (9.55x)
put_h264_qpel_16_mc03_8 2774.3 339.4  (8.17x)  2738.1 290.1  (9.44x)
put_h264_qpel_16_mc10_8 2274.3 487.4  (4.67x)  2290.1 436.1  (5.25x)
put_h264_qpel_16_mc11_8 5237.1 792.9  (6.60x)  4529.8 685.8  (6.61x)
put_h264_qpel_16_mc12_8 7357.6 1255.8 (5.86x)  7352.8 1040.1 (7.07x)
put_h264_qpel_16_mc13_8 4579.9 792.9  (5.78x)  4571.6 686.1  (6.66x)
put_h264_qpel_16_mc20_8 1802.1 459.6  (3.92x)  1800.6 425.6  (4.23x)
put_h264_qpel_16_mc21_8 6644.6 2246.6 (2.96x)  6644.3 1196.6 (5.55x)
put_h264_qpel_16_mc22_8 4589.1 913.4  (5.02x)  4592.3 769.3  (5.97x)
put_h264_qpel_16_mc23_8 6644.6 1394.6 (4.76x)  6634.1 1196.6 (5.54x)
put_h264_qpel_16_mc30_8 2274.3 496.6  (4.58x)  2290.1 456.8  (5.01x)
put_h264_qpel_16_mc31_8 5255.6 802.1  (6.55x)  4550.8 706.8  (6.44x)
put_h264_qpel_16_mc32_8 7376.1 1265.1 (5.83x)  7352.8 1050.6 (7.00x)
put_h264_qpel_16_mc33_8 4579.9 802.1  (5.71x)  4561.1 696.3  (6.55x)

Signed-off-by: Niklas Haas <git@haasn.dev>
Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-09-28 18:35:35 +02:00
Rémi Denis-Courmont
6611bf5484 lavc/h264dsp: optimise R-V V biweight for shorter heights
T-Head C908:
h264_biweight2_8_c:                                    313.7 ( 1.00x)
h264_biweight2_8_rvv_i32:              before          239.5 ( 1.23x)
h264_biweight2_8_rvv_i32:              after            72.7 ( 4.31x)
h264_biweight4_8_c:                                    582.0 ( 1.00x)
h264_biweight4_8_rvv_i32:              before          471.0 ( 1.16x)
h264_biweight4_8_rvv_i32:              after            91.5 ( 6.36x)
h264_biweight8_8_c:                                   1110.0 ( 1.00x)
h264_biweight8_8_rvv_i32:              before          943.3 ( 1.10x)
h264_biweight8_8_rvv_i64:              after           147.0 ( 7.55x)

SpacemiT X60:
h264_biweight2_8_c:                                    311.4 ( 1.00x)
h264_biweight2_8_rvv_i32:              before          363.1 ( 0.83x)
h264_biweight2_8_rvv_i32:              after           103.1 ( 3.02x)
h264_biweight4_8_c:                                    571.9 ( 1.00x)
h264_biweight4_8_rvv_i32:              before          717.4 ( 0.78x)
h264_biweight4_8_rvv_i32:              after            71.8 ( 7.96x)
h264_biweight8_8_c:                                   1103.1 ( 1.00x)
h264_biweight8_8_rvv_i32:              before         1415.2 ( 0.76x)
h264_biweight8_8_rvv_i64:              after            92.8 (11.88x)
2024-09-24 20:04:51 +03:00
Rémi Denis-Courmont
459a1512f1 lavc/h264dsp: unroll R-V V weight16
As VLSE128.V does not exist, we have no other way to deal with latency.

T-Head C908:
h264_weight16_8_c:                                     989.4 ( 1.00x)
h264_weight16_8_rvv_i32:                               193.2 ( 5.12x)

SpacemiT X60:
h264_weight16_8_c:                                     874.1 ( 1.00x)
h264_weight16_8_rvv_i32:                               196.9 ( 4.44x)
2024-09-24 20:04:51 +03:00
Rémi Denis-Courmont
4936bb2508 lavc/h264dsp: optimise R-V V weight for shorter heights
The height is a power of two of up to 16 rows. The current code was
optimised for large sample counts.

T-Head C908:
h264_weight2_8_c:                                      211.7 ( 1.00x)
h264_weight2_8_rvv_i32:                   before       184.0 ( 1.15x)
h264_weight2_8_rvv_i32:                   after         54.2 ( 3.90x)
h264_weight4_8_c:                                      285.7 ( 1.00x)
h264_weight4_8_rvv_i32:                   before       341.2 ( 0.86x)
h264_weight4_8_rvv_i32:                   after         82.2 ( 3.47x)
h264_weight8_8_c:                                      498.7 ( 1.00x)
h264_weight8_8_rvv_i32:                   before       683.7 ( 0.73x)
h264_weight8_8_rvv_i64:                   after        128.5 ( 3.95x)
h264_weight16_8_c:                                     878.2 ( 1.00x)
h264_weight16_8_rvv_i32:                  unchanged    239.5 ( 3.67x)

SpacemiT X60:
h264_weight2_8_c:                                      207.2 ( 1.00x)
h264_weight2_8_rvv_i32:                   before       259.6 ( 0.80x)
h264_weight2_8_rvv_i32:                   after         82.2 ( 2.52x)
h264_weight4_8_c:                                      290.8 ( 1.00x)
h264_weight4_8_rvv_i32:                   before       509.6 ( 0.57x)
h264_weight4_8_rvv_i32:                   after         61.5 ( 4.73x)
h264_weight8_8_c:                                      498.8 ( 1.00x)
h264_weight8_8_rvv_i32:                   before      1019.8 ( 0.49x)
h264_weight8_8_rvv_i64:                   after         71.8 ( 6.95x)
h264_weight16_8_c:                                     874.0 ( 1.00x)
h264_weight16_8_rvv_i32:                  unchanged    249.0 ( 3.51x)
2024-09-24 20:04:51 +03:00
sunyuechi
ba7d0d5fc3 lavc/vvc_mc: R-V V avg w_avg
                                                       C908   X60
avg_8_2x2_c                                        :    1.2    1.0
avg_8_2x2_rvv_i32                                  :    0.7    0.7
avg_8_2x4_c                                        :    2.0    2.2
avg_8_2x4_rvv_i32                                  :    1.2    1.2
avg_8_2x8_c                                        :    3.7    4.0
avg_8_2x8_rvv_i32                                  :    1.7    1.5
avg_8_2x16_c                                       :    7.2    7.7
avg_8_2x16_rvv_i32                                 :    3.0    2.7
avg_8_2x32_c                                       :   14.2   15.2
avg_8_2x32_rvv_i32                                 :    5.5    5.0
avg_8_2x64_c                                       :   51.0   43.7
avg_8_2x64_rvv_i32                                 :   39.2   29.7
avg_8_2x128_c                                      :  100.5   79.2
avg_8_2x128_rvv_i32                                :   79.7   68.2
avg_8_4x2_c                                        :    1.7    2.0
avg_8_4x2_rvv_i32                                  :    1.0    0.7
avg_8_4x4_c                                        :    3.5    3.7
avg_8_4x4_rvv_i32                                  :    1.2    1.2
avg_8_4x8_c                                        :    6.7    7.0
avg_8_4x8_rvv_i32                                  :    1.7    1.5
avg_8_4x16_c                                       :   13.5   14.0
avg_8_4x16_rvv_i32                                 :    3.0    2.7
avg_8_4x32_c                                       :   26.2   27.7
avg_8_4x32_rvv_i32                                 :    5.5    4.7
avg_8_4x64_c                                       :   73.0   73.7
avg_8_4x64_rvv_i32                                 :   39.0   32.5
avg_8_4x128_c                                      :  143.0  137.2
avg_8_4x128_rvv_i32                                :   72.7   68.0
avg_8_8x2_c                                        :    3.5    3.5
avg_8_8x2_rvv_i32                                  :    1.0    0.7
avg_8_8x4_c                                        :    6.2    6.5
avg_8_8x4_rvv_i32                                  :    1.5    1.0
avg_8_8x8_c                                        :   12.7   13.2
avg_8_8x8_rvv_i32                                  :    2.0    1.5
avg_8_8x16_c                                       :   25.0   26.5
avg_8_8x16_rvv_i32                                 :    3.2    2.7
avg_8_8x32_c                                       :   50.0   52.7
avg_8_8x32_rvv_i32                                 :    6.2    5.0
avg_8_8x64_c                                       :  118.7  122.5
avg_8_8x64_rvv_i32                                 :   40.2   31.5
avg_8_8x128_c                                      :  236.7  220.2
avg_8_8x128_rvv_i32                                :   85.2   67.7
avg_8_16x2_c                                       :    6.2    6.7
avg_8_16x2_rvv_i32                                 :    1.2    0.7
avg_8_16x4_c                                       :   12.5   13.0
avg_8_16x4_rvv_i32                                 :    1.7    1.0
avg_8_16x8_c                                       :   24.5   26.0
avg_8_16x8_rvv_i32                                 :    3.0    1.7
avg_8_16x16_c                                      :   49.0   51.5
avg_8_16x16_rvv_i32                                :    5.5    3.0
avg_8_16x32_c                                      :   97.5  102.5
avg_8_16x32_rvv_i32                                :   10.5    5.5
avg_8_16x64_c                                      :  213.7  222.0
avg_8_16x64_rvv_i32                                :   48.5   34.2
avg_8_16x128_c                                     :  434.7  420.0
avg_8_16x128_rvv_i32                               :   97.7   74.0
avg_8_32x2_c                                       :   12.2   12.7
avg_8_32x2_rvv_i32                                 :    1.5    1.0
avg_8_32x4_c                                       :   24.5   25.5
avg_8_32x4_rvv_i32                                 :    3.0    1.7
avg_8_32x8_c                                       :   48.5   50.7
avg_8_32x8_rvv_i32                                 :    5.2    2.7
avg_8_32x16_c                                      :   96.7  101.2
avg_8_32x16_rvv_i32                                :   10.2    5.0
avg_8_32x32_c                                      :  192.7  202.2
avg_8_32x32_rvv_i32                                :   19.7    9.5
avg_8_32x64_c                                      :  427.5  426.5
avg_8_32x64_rvv_i32                                :   64.2   18.2
avg_8_32x128_c                                     :  816.5  821.0
avg_8_32x128_rvv_i32                               :  135.2   75.5
avg_8_64x2_c                                       :   24.0   25.2
avg_8_64x2_rvv_i32                                 :    2.7    1.5
avg_8_64x4_c                                       :   48.2   50.5
avg_8_64x4_rvv_i32                                 :    5.0    2.7
avg_8_64x8_c                                       :   96.0  100.7
avg_8_64x8_rvv_i32                                 :    9.7    4.5
avg_8_64x16_c                                      :  207.7  201.2
avg_8_64x16_rvv_i32                                :   19.0    9.0
avg_8_64x32_c                                      :  383.2  402.0
avg_8_64x32_rvv_i32                                :   37.5   17.5
avg_8_64x64_c                                      :  837.2  828.7
avg_8_64x64_rvv_i32                                :   84.7   35.5
avg_8_64x128_c                                     : 1640.7 1640.2
avg_8_64x128_rvv_i32                               :  206.0  153.0
avg_8_128x2_c                                      :   48.7   51.0
avg_8_128x2_rvv_i32                                :    5.2    2.7
avg_8_128x4_c                                      :   96.7  101.5
avg_8_128x4_rvv_i32                                :   10.2    5.0
avg_8_128x8_c                                      :  192.2  202.0
avg_8_128x8_rvv_i32                                :   19.7    9.2
avg_8_128x16_c                                     :  400.7  403.2
avg_8_128x16_rvv_i32                               :   38.7   18.5
avg_8_128x32_c                                     :  786.7  805.7
avg_8_128x32_rvv_i32                               :   77.0   36.2
avg_8_128x64_c                                     : 1615.5 1655.5
avg_8_128x64_rvv_i32                               :  189.7   80.7
avg_8_128x128_c                                    : 3182.0 3238.0
avg_8_128x128_rvv_i32                              :  397.5  308.5
w_avg_8_2x2_c                                      :    1.7    1.2
w_avg_8_2x2_rvv_i32                                :    1.2    1.0
w_avg_8_2x4_c                                      :    2.7    2.7
w_avg_8_2x4_rvv_i32                                :    1.7    1.5
w_avg_8_2x8_c                                      :   21.7    4.7
w_avg_8_2x8_rvv_i32                                :    2.7    2.5
w_avg_8_2x16_c                                     :    9.5    9.2
w_avg_8_2x16_rvv_i32                               :    4.7    4.2
w_avg_8_2x32_c                                     :   19.0   18.7
w_avg_8_2x32_rvv_i32                               :    9.0    8.0
w_avg_8_2x64_c                                     :   62.0   50.2
w_avg_8_2x64_rvv_i32                               :   47.7   33.5
w_avg_8_2x128_c                                    :  116.7   87.7
w_avg_8_2x128_rvv_i32                              :   80.0   69.5
w_avg_8_4x2_c                                      :    2.5    2.5
w_avg_8_4x2_rvv_i32                                :    1.2    1.0
w_avg_8_4x4_c                                      :    4.7    4.5
w_avg_8_4x4_rvv_i32                                :    1.7    1.7
w_avg_8_4x8_c                                      :    9.0    8.7
w_avg_8_4x8_rvv_i32                                :    2.7    2.5
w_avg_8_4x16_c                                     :   17.7   17.5
w_avg_8_4x16_rvv_i32                               :    4.7    4.2
w_avg_8_4x32_c                                     :   35.0   35.0
w_avg_8_4x32_rvv_i32                               :    9.0    8.0
w_avg_8_4x64_c                                     :  100.5   84.5
w_avg_8_4x64_rvv_i32                               :   42.2   33.7
w_avg_8_4x128_c                                    :  203.5  151.2
w_avg_8_4x128_rvv_i32                              :   83.0   69.5
w_avg_8_8x2_c                                      :    4.5    4.5
w_avg_8_8x2_rvv_i32                                :    1.2    1.2
w_avg_8_8x4_c                                      :    8.7    8.7
w_avg_8_8x4_rvv_i32                                :    2.0    1.7
w_avg_8_8x8_c                                      :   17.0   17.0
w_avg_8_8x8_rvv_i32                                :    3.2    2.5
w_avg_8_8x16_c                                     :   34.0   33.5
w_avg_8_8x16_rvv_i32                               :    5.5    4.2
w_avg_8_8x32_c                                     :   86.0   67.5
w_avg_8_8x32_rvv_i32                               :   10.5    8.0
w_avg_8_8x64_c                                     :  187.2  149.5
w_avg_8_8x64_rvv_i32                               :   45.0   35.5
w_avg_8_8x128_c                                    :  342.7  290.0
w_avg_8_8x128_rvv_i32                              :  108.7   70.2
w_avg_8_16x2_c                                     :    8.5    8.2
w_avg_8_16x2_rvv_i32                               :    2.0    1.2
w_avg_8_16x4_c                                     :   16.7   16.7
w_avg_8_16x4_rvv_i32                               :    3.0    1.7
w_avg_8_16x8_c                                     :   33.2   33.5
w_avg_8_16x8_rvv_i32                               :    5.5    3.0
w_avg_8_16x16_c                                    :   66.2   66.7
w_avg_8_16x16_rvv_i32                              :   10.5    5.0
w_avg_8_16x32_c                                    :  132.5  131.0
w_avg_8_16x32_rvv_i32                              :   20.0    9.7
w_avg_8_16x64_c                                    :  340.0  283.5
w_avg_8_16x64_rvv_i32                              :   60.5   37.2
w_avg_8_16x128_c                                   :  641.2  597.5
w_avg_8_16x128_rvv_i32                             :  118.7   77.7
w_avg_8_32x2_c                                     :   16.5   16.7
w_avg_8_32x2_rvv_i32                               :    3.2    1.7
w_avg_8_32x4_c                                     :   33.2   33.2
w_avg_8_32x4_rvv_i32                               :    5.5    2.7
w_avg_8_32x8_c                                     :   66.0   62.5
w_avg_8_32x8_rvv_i32                               :   10.5    5.0
w_avg_8_32x16_c                                    :  131.5  132.0
w_avg_8_32x16_rvv_i32                              :   20.2    9.5
w_avg_8_32x32_c                                    :  261.7  272.0
w_avg_8_32x32_rvv_i32                              :   39.7   18.0
w_avg_8_32x64_c                                    :  575.2  545.5
w_avg_8_32x64_rvv_i32                              :  105.5   58.7
w_avg_8_32x128_c                                   : 1154.2 1088.0
w_avg_8_32x128_rvv_i32                             :  207.0   98.0
w_avg_8_64x2_c                                     :   33.0   33.0
w_avg_8_64x2_rvv_i32                               :    6.2    2.7
w_avg_8_64x4_c                                     :   65.5   66.0
w_avg_8_64x4_rvv_i32                               :   11.5    5.0
w_avg_8_64x8_c                                     :  131.2  132.5
w_avg_8_64x8_rvv_i32                               :   22.5    9.5
w_avg_8_64x16_c                                    :  268.2  262.5
w_avg_8_64x16_rvv_i32                              :   44.2   18.0
w_avg_8_64x32_c                                    :  561.5  528.7
w_avg_8_64x32_rvv_i32                              :   88.0   35.2
w_avg_8_64x64_c                                    : 1136.2 1124.0
w_avg_8_64x64_rvv_i32                              :  222.0   82.2
w_avg_8_64x128_c                                   : 2345.0 2312.7
w_avg_8_64x128_rvv_i32                             :  423.0  190.5
w_avg_8_128x2_c                                    :   65.7   66.5
w_avg_8_128x2_rvv_i32                              :   11.2    5.5
w_avg_8_128x4_c                                    :  131.2  132.2
w_avg_8_128x4_rvv_i32                              :   22.0   10.2
w_avg_8_128x8_c                                    :  263.5  312.0
w_avg_8_128x8_rvv_i32                              :   43.2   19.7
w_avg_8_128x16_c                                   :  528.7  526.2
w_avg_8_128x16_rvv_i32                             :   85.5   39.5
w_avg_8_128x32_c                                   : 1067.7 1062.7
w_avg_8_128x32_rvv_i32                             :  171.7   78.2
w_avg_8_128x64_c                                   : 2234.7 2168.7
w_avg_8_128x64_rvv_i32                             :  400.0  159.0
w_avg_8_128x128_c                                  : 4752.5 4295.0
w_avg_8_128x128_rvv_i32                            :  757.7  365.5

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-09-24 20:04:51 +03:00
Anton Khirnov
3f9ca51015 lavc/opus*: move to opus/ subdir 2024-09-02 11:56:53 +02:00
Ramiro Polla
6aafe61285 avcodec/mpegvideoencdsp: convert stride parameters from int to ptrdiff_t 2024-09-01 13:42:30 +02:00
Rémi Denis-Courmont
7d1dda4892 lavc/h264dsp: R-V V loop_filter_chroma
T-Head C908:
h264_v_loop_filter_chroma_8bpp_c:      137.4
h264_v_loop_filter_chroma_8bpp_rvv_i32: 54.2
2024-09-01 10:58:48 +03:00
Rémi Denis-Courmont
3a53656837 lavc/h264dsp: do not write back unmodified rows in R-V V loop filter 2024-09-01 10:52:26 +03:00
Rémi Denis-Courmont
d8fb44c0aa lavc/mpegvideoencdsp: R-V V add_8x8basis
T-Head C908:
add_8x8basis_c:      440.6
add_8x8basis_rvv_i32: 70.3

SpacemiT X60:
add_8x8basis_c:      436.3
add_8x8basis_rvv_i32: 40.5
2024-08-19 22:41:13 +03:00
Rémi Denis-Courmont
1907dd7f23 lavc/mpegvideoencdsp: R-V V try_8x8basis
T-Head C908:
try_8x8basis_c:       922.5
try_8x8basis_rvv_i32: 135.3

SpacemiT X60:
try_8x8basis_c:       926.1
try_8x8basis_rvv_i32: 103.1
2024-08-19 22:41:13 +03:00
Rémi Denis-Courmont
0fd37c00d7 lavc/mpegvideoencdsp: R-V V pix_norm1
T-Head C908:
pix_norm1_c:       480.2
pix_norm1_rvv_i64: 146.9

SpacemiT X60:
pix_norm1_c:       478.2
pix_norm1_rvv_i64:  92.7
2024-08-19 22:41:13 +03:00
Rémi Denis-Courmont
63d016aea5 lavc/mpegvideoencdsp: R-V V pix_sum
T-Head C908:
pix_sum_c:      332.2
pix_sum_rvv_i64: 91.2

SpacemiT X60:
pix_sum_c:      321.2
pix_sum_rvv_i64: 60.9
2024-08-19 22:41:13 +03:00
sunyuechi
4e7b5ac48f lavc/vp9dsp: R-V V mc bilin hv
                                                       C908    X60
vp9_avg_bilin_4hv_8bpp_c                           :   10.7    9.5
vp9_avg_bilin_4hv_8bpp_rvv_i32                     :    4.0    3.5
vp9_avg_bilin_8hv_8bpp_c                           :   38.5   34.2
vp9_avg_bilin_8hv_8bpp_rvv_i32                     :    7.2    6.5
vp9_avg_bilin_16hv_8bpp_c                          :  147.2  130.5
vp9_avg_bilin_16hv_8bpp_rvv_i32                    :   14.5   12.7
vp9_avg_bilin_32hv_8bpp_c                          :  574.2  509.7
vp9_avg_bilin_32hv_8bpp_rvv_i32                    :   42.5   38.0
vp9_avg_bilin_64hv_8bpp_c                          : 2321.2 2017.7
vp9_avg_bilin_64hv_8bpp_rvv_i32                    :  163.5  131.0
vp9_put_bilin_4hv_8bpp_c                           :   10.0    8.7
vp9_put_bilin_4hv_8bpp_rvv_i32                     :    3.5    3.0
vp9_put_bilin_8hv_8bpp_c                           :   35.2   31.2
vp9_put_bilin_8hv_8bpp_rvv_i32                     :    6.5    5.7
vp9_put_bilin_16hv_8bpp_c                          :  134.0  119.0
vp9_put_bilin_16hv_8bpp_rvv_i32                    :   12.7   11.5
vp9_put_bilin_32hv_8bpp_c                          :  538.5  464.2
vp9_put_bilin_32hv_8bpp_rvv_i32                    :   39.7   35.2
vp9_put_bilin_64hv_8bpp_c                          : 2111.7 1833.2
vp9_put_bilin_64hv_8bpp_rvv_i32                    :  138.5  122.5

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-08-19 22:29:20 +03:00
sunyuechi
9edd2e723b lavc/vp9dsp: R-V V mc bilin h v
                                                       C908    X60
vp9_avg_bilin_4h_8bpp_c                            :    5.5    4.7
vp9_avg_bilin_4h_8bpp_rvv_i32                      :    1.7    1.5
vp9_avg_bilin_4v_8bpp_c                            :    5.5    4.7
vp9_avg_bilin_4v_8bpp_rvv_i32                      :    1.5    1.2
vp9_avg_bilin_8h_8bpp_c                            :   20.0   17.7
vp9_avg_bilin_8h_8bpp_rvv_i32                      :    3.0    2.7
vp9_avg_bilin_8v_8bpp_c                            :   20.7   18.7
vp9_avg_bilin_8v_8bpp_rvv_i32                      :    3.0    2.7
vp9_avg_bilin_16h_8bpp_c                           :   78.2   69.7
vp9_avg_bilin_16h_8bpp_rvv_i32                     :    7.0    6.2
vp9_avg_bilin_16v_8bpp_c                           :   98.5   73.2
vp9_avg_bilin_16v_8bpp_rvv_i32                     :    7.0    6.0
vp9_avg_bilin_32h_8bpp_c                           :  325.5  275.5
vp9_avg_bilin_32h_8bpp_rvv_i32                     :   23.0   20.5
vp9_avg_bilin_32v_8bpp_c                           :  342.2  290.0
vp9_avg_bilin_32v_8bpp_rvv_i32                     :   21.7   19.5
vp9_avg_bilin_64h_8bpp_c                           : 1263.7 1095.7
vp9_avg_bilin_64h_8bpp_rvv_i32                     :   91.2   81.2
vp9_avg_bilin_64v_8bpp_c                           : 1331.7 1155.2
vp9_avg_bilin_64v_8bpp_rvv_i32                     :   91.2   81.0
vp9_put_bilin_4h_8bpp_c                            :    4.5    4.0
vp9_put_bilin_4h_8bpp_rvv_i32                      :    1.0    1.0
vp9_put_bilin_4v_8bpp_c                            :    4.7    4.2
vp9_put_bilin_4v_8bpp_rvv_i32                      :    1.0    1.0
vp9_put_bilin_8h_8bpp_c                            :   16.7   15.0
vp9_put_bilin_8h_8bpp_rvv_i32                      :    2.2    2.0
vp9_put_bilin_8v_8bpp_c                            :   17.5   15.7
vp9_put_bilin_8v_8bpp_rvv_i32                      :    2.2    2.0
vp9_put_bilin_16h_8bpp_c                           :   65.2   58.0
vp9_put_bilin_16h_8bpp_rvv_i32                     :    6.0    5.5
vp9_put_bilin_16v_8bpp_c                           :   69.2   61.7
vp9_put_bilin_16v_8bpp_rvv_i32                     :    5.7    5.2
vp9_put_bilin_32h_8bpp_c                           :  273.2  229.0
vp9_put_bilin_32h_8bpp_rvv_i32                     :   19.7   17.7
vp9_put_bilin_32v_8bpp_c                           :  290.5  243.7
vp9_put_bilin_32v_8bpp_rvv_i32                     :   18.7   16.7
vp9_put_bilin_64h_8bpp_c                           : 1040.5  910.5
vp9_put_bilin_64h_8bpp_rvv_i32                     :   82.5   73.0
vp9_put_bilin_64v_8bpp_c                           : 1108.5  971.0
vp9_put_bilin_64v_8bpp_rvv_i32                     :   82.2   73.2

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-08-19 22:29:20 +03:00
Rémi Denis-Courmont
616fdeaea3 lavc/riscv: depend on RVB and simplify accordingly
There is no known (real) hardware with V and without the complete B
extension. B was indeed required in the RISC-V application profile from
2022, earlier than V. There should not be any relevant hardware in the
future either.

In practice, the R-V Vector optimisations in FFmpeg already depend, between
them, on every constituent of the B extension anyhow, so hardware with V but
without the complete B extension would not work well.
2024-08-05 21:16:26 +03:00
Rémi Denis-Courmont
4edfc11a28 lavc/h264dsp: R-V V idct4_add8 (all depths)
These are really just wrappers for idct4_add16intra functions, which are in
turn mostly wrappers for idct4_add and idct4_dc_add functions.

For benchmarks, refer to the latter two sets.
2024-08-05 21:16:26 +03:00
Rémi Denis-Courmont
de7f999481 lavc/videodsp: work-around LLVM-as
For some reason, it can't handle the normal syntax for an address operand
without an offset, so add a dummy zero offset.
2024-08-02 21:24:01 +03:00
Rémi Denis-Courmont
677f28b310 lavc/h264dsp: stick R-V V weight to 16-bit precision
T-Head C908 (ns):
h264_weight2_8_c:        1607.8
h264_weight2_8_rvv_i32:   515.0 (before)
h264_weight2_8_rvv_i32:   348.5 (after)
h264_weight4_8_c:        2255.8
h264_weight4_8_rvv_i32:  1015.0 (before)
h264_weight4_8_rvv_i32:   691.0 (after)
h264_weight8_8_c:        3857.5
h264_weight8_8_rvv_i32:  2218.8 (before)
h264_weight8_8_rvv_i32:  1561.3 (after)
h264_weight16_8_c:       7431.5
h264_weight16_8_rvv_i32: 2737.3 (before)
h264_weight16_8_rvv_i32: 1848.3 (after)

SpacemiT X60 (ns):
h264_weight2_8_c:        1624.1
h264_weight2_8_rvv_i32:   352.6 (before)
h264_weight2_8_rvv_i32:   259.3 (after)
h264_weight4_8_c:        2259.3
h264_weight4_8_rvv_i32:   685.8 (before)
h264_weight4_8_rvv_i32:   530.3 (after)
h264_weight8_8_c:        4103.3
h264_weight8_8_rvv_i32:  1581.8 (before)
h264_weight8_8_rvv_i32:  1238.6 (after)
h264_weight16_8_c:       7624.3
h264_weight16_8_rvv_i32: 2738.1 (before)
h264_weight16_8_rvv_i32: 1853.3 (after)
2024-08-02 21:24:01 +03:00
Rémi Denis-Courmont
afd45c7ff7 lavc/h264dsp: stick R-V V biweight to 16-bit
T-Head C908 (ns):
h264_biweight2_8_c:        2414.5
h264_biweight2_8_rvv_i32:   701.8 (before)
h264_biweight2_8_rvv_i32:   468.5 (after)
h264_biweight4_8_c:        4655.3
h264_biweight4_8_rvv_i32:  1377.5 (before)
h264_biweight4_8_rvv_i32:   931.8 (after)
h264_biweight8_8_c:        9701.5
h264_biweight8_8_rvv_i32:  2896.0 (before)
h264_biweight8_8_rvv_i32:  2070.5 (after)
h264_biweight16_8_c:      18025.0
h264_biweight16_8_rvv_i32: 3460.8 (before)
h264_biweight16_8_rvv_i32: 1978.0 (after)

SpacemiT X60 (ns):
h264_biweight2_8_c:        2415.5
h264_biweight2_8_rvv_i32:   478.2 (before)
h264_biweight2_8_rvv_i32:   362.8 (after)
h264_biweight4_8_c:        4655.3
h264_biweight4_8_rvv_i32:   946.7 (before)
h264_biweight4_8_rvv_i32:   727.3 (after)
h264_biweight8_8_c:        9061.8
h264_biweight8_8_rvv_i32:  2071.7 (before)
h264_biweight8_8_rvv_i32:  1685.8 (after)
h264_biweight16_8_c:      18020.5
h264_biweight16_8_rvv_i32: 3457.2 (before)
h264_biweight16_8_rvv_i32: 1935.8 (after)
2024-08-02 21:24:01 +03:00
Rémi Denis-Courmont
2f083fd581 lavc/audiodsp: drop R-V F vector_clipf
This is now firmly slower than C.

SiFive-U74 (cycles):
audiodsp.vector_clipf_c:   31.2
audiodsp.vector_clipf_rvf: 39.5
2024-08-01 19:29:40 +03:00
Rémi Denis-Courmont
54ae270213 lavc/rv34dsp: use saturating add/sub for R-V V DC add
T-Head C908 (cycles):
rv34_idct_dc_add_c:      113.2
rv34_idct_dc_add_rvv_i32: 48.5 (before)
rv34_idct_dc_add_rvv_i32: 39.5 (after)
2024-08-01 18:43:04 +03:00
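As a rough scalar illustration of the idea named in the commit above (not the actual RVV assembly): adding a signed DC value to unsigned 8-bit pixels can be done by widening, adding and clipping, or by choosing a saturating unsigned add when the DC value is non-negative and a saturating unsigned subtract when it is negative. All function names below are hypothetical, and the scalar helpers merely stand in for vector saturating instructions.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference approach: widen to int, add, then clip to [0, 255]. */
static uint8_t clip_uint8(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v);
}

static void dc_add_widen(uint8_t *dst, size_t n, int dc)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = clip_uint8(dst[i] + dc);
}

/* Scalar stand-ins for unsigned 8-bit saturating add and subtract. */
static uint8_t sat_addu8(uint8_t a, uint8_t b)
{
    unsigned s = (unsigned)a + b;
    return s > 255 ? 255 : (uint8_t)s;
}

static uint8_t sat_subu8(uint8_t a, uint8_t b)
{
    return a > b ? (uint8_t)(a - b) : 0;
}

/* Saturating approach: stay in 8 bits and pick add or subtract by sign. */
static void dc_add_saturating(uint8_t *dst, size_t n, int dc)
{
    /* Magnitude of dc, limited to 255: a larger magnitude saturates anyway. */
    unsigned mag = dc < 0 ? 0u - (unsigned)dc : (unsigned)dc;
    uint8_t m = mag > 255 ? 255 : (uint8_t)mag;

    if (dc >= 0) {
        for (size_t i = 0; i < n; i++)
            dst[i] = sat_addu8(dst[i], m);
    } else {
        for (size_t i = 0; i < n; i++)
            dst[i] = sat_subu8(dst[i], m);
    }
}
```

Both functions produce the same pixels. The saturating form never needs a wider intermediate type, which is presumably where the savings in a vector implementation come from.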
Rémi Denis-Courmont
952b426f3b lavc/bswapdsp: add RV Zvbb bswap16 and bswap32 2024-08-01 18:43:04 +03:00
Rémi Denis-Courmont
262168b04e lavc/videodsp: RISC-V zicbop prefetch
There is currently no way to detect this CPU capability at run time, so we
take it for granted (in the worst case, the prefetches will execute as NOPs).
2024-07-30 18:41:51 +03:00
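For readers unfamiliar with prefetch hints, here is a minimal C sketch of the general idea, assuming a GCC- or Clang-compatible compiler that provides `__builtin_prefetch`. The helper name `prefetch_rows` and its parameters are hypothetical, and this is not the assembly from the commit above. A prefetch is only a hint: it never changes program results, which is why executing it as a NOP is harmless.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: hint that the start of each of h rows, spaced
 * stride bytes apart, is about to be read. The hint does not modify
 * memory and may be ignored by the hardware.
 */
static void prefetch_rows(const uint8_t *buf, ptrdiff_t stride, int h)
{
    while (h-- > 0) {
        __builtin_prefetch(buf, 0 /* read */, 3 /* keep in cache */);
        buf += stride;
    }
}
```

A caller would typically issue such hints for the next block of pixels while still processing the current one.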
Rémi Denis-Courmont
324eba69f7 lavc/vc1dsp: use saturating arithmetic for RVV inv_trans_dc
T-Head C908 (cycles):
vc1dsp.vc1_inv_trans_4x4_dc_c:      113.7
vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 46.5 (before)
vc1dsp.vc1_inv_trans_4x4_dc_rvv_i32: 45.5 (after)
vc1dsp.vc1_inv_trans_4x8_dc_c:      230.7
vc1dsp.vc1_inv_trans_4x8_dc_rvv_i32: 65.7 (before)
vc1dsp.vc1_inv_trans_4x8_dc_rvv_i32: 52.5 (after)
vc1dsp.vc1_inv_trans_8x4_dc_c:      246.7
vc1dsp.vc1_inv_trans_8x4_dc_rvv_i64: 56.7 (before)
vc1dsp.vc1_inv_trans_8x4_dc_rvv_i64: 45.5 (after)
vc1dsp.vc1_inv_trans_8x8_dc_c:      419.7
vc1dsp.vc1_inv_trans_8x8_dc_rvv_i64: 81.2 (before)
vc1dsp.vc1_inv_trans_8x8_dc_rvv_i64: 53.5 (after)
2024-07-30 18:41:51 +03:00
Rémi Denis-Courmont
784a72a116 lavc/vc1dsp: unify R-V V DC bypass functions 2024-07-30 18:41:51 +03:00
Rémi Denis-Courmont
bd0c3edb13 lavu/riscv: count bytes rather than words for bswap32
This removes the dependency on Zba at essentially zero cost.
2024-07-30 18:41:51 +03:00
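One way to picture the difference between counting words and counting bytes is the C sketch below. It is an illustration under assumptions, not the code from the commit above: the function names are hypothetical, and a C compiler is free to generate identical machine code for both loops. In hand-written assembly, scaling a word index into a byte offset is the kind of work done by Zba's shift-and-add instructions such as `sh2add`, whereas a byte counter only needs plain additions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static uint32_t bswap32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Word-counted: every access computes base + 4 * i. */
static void bswap32_buf_words(uint32_t *dst, const uint32_t *src, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = bswap32(src[i]);
}

/* Byte-counted: scale the length once, then step by plain byte additions. */
static void bswap32_buf_bytes(uint32_t *dst, const uint32_t *src, size_t len)
{
    const unsigned char *s = (const unsigned char *)src;
    unsigned char *d = (unsigned char *)dst;
    size_t bytes = len << 2; /* a single ordinary shift */

    while (bytes > 0) {
        uint32_t v;

        memcpy(&v, s, sizeof(v));
        v = bswap32(v);
        memcpy(d, &v, sizeof(v));
        s += 4;
        d += 4;
        bytes -= 4;
    }
}
```

Both functions compute the same result. Only the loop bookkeeping differs.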
Rémi Denis-Courmont
5171baa228 lavc/ac3dsp: fix R-V CPU requirements
It probably will not matter on any real hardware, but the Zbb optimisations
do not require Zba. Separately, HAVE_RVV is needed to build the RVV code.
2024-07-30 18:41:51 +03:00
Rémi Denis-Courmont
7b24f96c87 lavc/vp9dsp: remove R-V I intra functions
At this point, they are identical to the C code, except for instruction
ordering. In fact, they are typically slower or no faster than the C code.
2024-07-29 21:16:41 +03:00