FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-19 05:49:09 +02:00

Author	SHA1	Message	Date
Rémi Denis-Courmont	da1ab7940e	riscv: remove unnecessary #include's	2024-11-25 19:29:21 +02:00
Niklas Haas	2d077f9acd	swscale/internal: group user-facing options together This is a preliminary step to separating these into a new struct. This commit contains no functional changes, it is a pure search-and-replace. Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2024-11-21 12:49:56 +01:00
Rémi Denis-Courmont	1912c86af6	sws/range_convert: fix RISC-V chrFromJpeg	2024-11-17 11:28:21 +02:00
Ramiro Polla	f7ee0195df	swscale/range_convert: drop redundant conditionals from arch-specific init functions These conditions are already checked for in the main init function.	2024-10-27 13:20:56 +01:00
Ramiro Polla	7728b3357d	swscale/range_convert: call arch-specific init functions from main init function This commit also fixes the issue that the call to ff_sws_init_range_convert() from sws_init_swscale() was not setting up the arch-specific optimizations.	2024-10-27 13:20:56 +01:00
Niklas Haas	67adb30322	swscale: rename SwsContext to SwsInternal And preserve the public SwsContext as a separate name. The motivation here is that I want to turn SwsContext into a public struct, while keeping the internal implementation hidden. Additionally, I also want to be able to use multiple internal implementations, e.g. for GPU devices. This commit does not include any functional changes. For the most part, it is a simple rename. The only complications arise from the public-facing API functions, which preserve their current type (and hence require an additional unwrapping step internally), and the checkasm test framework, which directly accesses SwsInternal. For consistency, the affected functions that need to maintain a distinction have generally been changed to refer to the SwsContext as sws, and the SwsInternal as c. In an upcoming commit, I will provide a backing definition for the public SwsContext, and update `sws_internal()` to dereference the internal struct instead of merely casting it. Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2024-10-24 22:50:00 +02:00
Rémi Denis-Courmont	27d28b68da	swscale/rgb2rgb: enable R-V V deinterleaveBytes T-Head C908: deinterleave_bytes_c: 100328.3 ( 1.00x) deinterleave_bytes_rvv_i32: 19331.3 ( 5.19x) deinterleave_bytes_aligned_c: 100337.5 ( 1.00x) deinterleave_bytes_aligned_rvv_i32: 15748.0 ( 6.37x) SpacemiT X60: deinterleave_bytes_c: 95230.6 ( 1.00x) deinterleave_bytes_rvv_i32: 9790.3 ( 9.73x) deinterleave_bytes_aligned_c: 96564.1 ( 1.00x) deinterleave_bytes_aligned_rvv_i32: 7780.1 (12.41x)	2024-09-04 22:04:11 +03:00
Rémi Denis-Courmont	210877c5fd	sws/riscv: depend on RVB and simplify accordingly	2024-08-05 21:16:26 +03:00
Rémi Denis-Courmont	bd0c3edb13	lavu/riscv: count bytes rather than words for bswap32 This removes the dependency on Zba at essentially zero cost.	2024-07-30 18:41:51 +03:00
Rémi Denis-Courmont	4f2472909e	sws/riscv: add forward-edge CFI landing pads	2024-07-25 23:10:14 +03:00
Rémi Denis-Courmont	e91a8cc4de	sws/riscv: require B or zba explicitly	2024-07-25 18:55:48 +03:00
Rémi Denis-Courmont	378d1b06c3	riscv: probe for Zbb extension at load time Due to hysterical raisins, most RISC-V Linux distributions target an RV64GC baseline excluding the Bit-manipulation ISA extensions, most notably: - Zba: address generation extension and - Zbb: basic bit manipulation extension. Most CPUs that would make sense to run FFmpeg on support Zba and Zbb (including the current FATE runner), so it makes sense to optimise for them. In fact a large chunk of existing assembler optimisations relies on Zba and/or Zbb. Since we cannot patch shared library code, the next best thing is to carry a flag initialised at load-time and check it on an as-needed basis. This results in an overhead of 3 instructions on isolated use, e.g.: 1: AUIPC rd, %pcrel_hi(ff_rv_zbb_supported) LBU rd, %pcrel_lo(1b)(rd) BEQZ rd, non_Zbb_fallback_code // Zbb code here The C compiler will typically load the flag ahead of time to reduce latency, and can also keep it around if Zbb is used multiple times in a single optimisation scope. For this to work, the flag symbol must be hidden; otherwise the optimisation degrades to a GOT look-up to support interposition: 1: AUIPC rd, GOT_OFFSET_HI LD rd, GOT_OFFSET_LO(rd) LBU rd, (rd) BEQZ rd, non_Zbb_fallback_code // Zbb code here This patch adds code to provision the flag in libraries using bit manipulation functions from libavutil: byte-swap, bit-weight and counting leading or trailing zeroes.	2024-06-11 20:12:37 +03:00
Rémi Denis-Courmont	417957ec5e	sws/range_convert: R-V V to/from JPEG C908 X60 chrRangeFromJpeg_8_c: 2.7 2.5 chrRangeFromJpeg_8_rvv_i32: 1.7 1.5 chrRangeFromJpeg_24_c: 7.5 6.7 chrRangeFromJpeg_24_rvv_i32: 1.7 1.5 chrRangeFromJpeg_128_c: 55.2 34.7 chrRangeFromJpeg_128_rvv_i32: 6.5 3.0 chrRangeFromJpeg_144_c: 44.0 39.2 chrRangeFromJpeg_144_rvv_i32: 7.7 4.5 chrRangeFromJpeg_256_c: 78.2 69.5 chrRangeFromJpeg_256_rvv_i32: 12.2 6.0 chrRangeFromJpeg_512_c: 172.2 138.5 chrRangeFromJpeg_512_rvv_i32: 24.5 11.7 chrRangeToJpeg_8_c: 4.7 4.2 chrRangeToJpeg_8_rvv_i32: 2.0 1.7 chrRangeToJpeg_24_c: 13.7 12.2 chrRangeToJpeg_24_rvv_i32: 2.0 1.5 chrRangeToJpeg_128_c: 72.0 63.7 chrRangeToJpeg_128_rvv_i32: 6.7 3.2 chrRangeToJpeg_144_c: 80.7 71.7 chrRangeToJpeg_144_rvv_i32: 8.5 4.7 chrRangeToJpeg_256_c: 143.2 127.2 chrRangeToJpeg_256_rvv_i32: 13.5 6.5 chrRangeToJpeg_512_c: 285.7 253.7 chrRangeToJpeg_512_rvv_i32: 27.0 13.0 lumRangeFromJpeg_8_c: 1.7 1.5 lumRangeFromJpeg_8_rvv_i32: 1.2 1.0 lumRangeFromJpeg_24_c: 4.2 3.7 lumRangeFromJpeg_24_rvv_i32: 1.2 1.0 lumRangeFromJpeg_128_c: 21.7 19.2 lumRangeFromJpeg_128_rvv_i32: 3.7 1.7 lumRangeFromJpeg_144_c: 24.7 22.0 lumRangeFromJpeg_144_rvv_i32: 4.7 2.7 lumRangeFromJpeg_256_c: 43.7 39.0 lumRangeFromJpeg_256_rvv_i32: 7.5 3.2 lumRangeFromJpeg_512_c: 87.0 77.2 lumRangeFromJpeg_512_rvv_i32: 14.5 6.7 lumRangeToJpeg_8_c: 2.7 2.2 lumRangeToJpeg_8_rvv_i32: 1.0 1.0 lumRangeToJpeg_24_c: 7.2 6.5 lumRangeToJpeg_24_rvv_i32: 1.2 1.0 lumRangeToJpeg_128_c: 37.7 33.7 lumRangeToJpeg_128_rvv_i32: 3.7 2.0 lumRangeToJpeg_144_c: 42.5 37.7 lumRangeToJpeg_144_rvv_i32: 4.7 2.7 lumRangeToJpeg_256_c: 75.0 66.7 lumRangeToJpeg_256_rvv_i32: 7.5 3.5 lumRangeToJpeg_512_c: 149.5 133.0 lumRangeToJpeg_512_rvv_i32: 14.7 7.0	2024-06-10 22:48:52 +03:00
Rémi Denis-Courmont	7a3369398f	sws/input: R-V V 32-bit RGB to halved UV T-Head C908: abgr_to_uv_half_8_c: 2.2 abgr_to_uv_half_8_rvv_i32: 3.5 abgr_to_uv_half_128_c: 44.0 abgr_to_uv_half_128_rvv_i32: 13.0 abgr_to_uv_half_1080_c: 245.0 abgr_to_uv_half_1080_rvv_i32: 107.2 abgr_to_uv_half_1920_c: 406.2 abgr_to_uv_half_1920_rvv_i32: 188.7 bgra_to_uv_half_8_c: 2.2 bgra_to_uv_half_8_rvv_i32: 3.5 bgra_to_uv_half_128_c: 26.5 bgra_to_uv_half_128_rvv_i32: 13.0 bgra_to_uv_half_1080_c: 219.7 bgra_to_uv_half_1080_rvv_i32: 107.0 bgra_to_uv_half_1920_c: 406.7 bgra_to_uv_half_1920_rvv_i32: 188.7 SpacemiT X60: abgr_to_uv_half_8_c: 2.2 abgr_to_uv_half_8_rvv_i32: 3.0 abgr_to_uv_half_128_c: 28.2 abgr_to_uv_half_128_rvv_i32: 5.7 abgr_to_uv_half_1080_c: 235.5 abgr_to_uv_half_1080_rvv_i32: 47.7 abgr_to_uv_half_1920_c: 418.2 abgr_to_uv_half_1920_rvv_i32: 84.0 bgra_to_uv_half_8_c: 2.0 bgra_to_uv_half_8_rvv_i32: 3.0 bgra_to_uv_half_128_c: 23.7 bgra_to_uv_half_128_rvv_i32: 5.7 bgra_to_uv_half_1080_c: 195.5 bgra_to_uv_half_1080_rvv_i32: 47.7 bgra_to_uv_half_1920_c: 346.5 bgra_to_uv_half_1920_rvv_i32: 84.0	2024-06-09 14:33:04 +03:00
Rémi Denis-Courmont	e2f069905e	sws/input: R-V V 32-bit RGB to UV	2024-06-09 14:33:04 +03:00
Rémi Denis-Courmont	f5555cb106	sws/input: R-V V 32-bit RGB to Y T-Head C908: abgr_to_y_8_c: 2.5 abgr_to_y_8_rvv_i32: 2.2 abgr_to_y_128_c: 37.0 abgr_to_y_128_rvv_i32: 8.5 abgr_to_y_1080_c: 327.0 abgr_to_y_1080_rvv_i32: 69.5 abgr_to_y_1920_c: 552.0 abgr_to_y_1920_rvv_i32: 122.2 bgra_to_y_8_c: 2.5 bgra_to_y_8_rvv_i32: 2.2 bgra_to_y_128_c: 37.2 bgra_to_y_128_rvv_i32: 8.5 bgra_to_y_1080_c: 310.2 bgra_to_y_1080_rvv_i32: 69.5 bgra_to_y_1920_c: 568.2 bgra_to_y_1920_rvv_i32: 122.5 SpacemiT X60: abgr_to_y_8_c: 2.5 abgr_to_y_8_rvv_i32: 2.0 abgr_to_y_128_c: 33.0 abgr_to_y_128_rvv_i32: 3.7 abgr_to_y_1080_c: 276.0 abgr_to_y_1080_rvv_i32: 31.5 abgr_to_y_1920_c: 493.7 abgr_to_y_1920_rvv_i32: 55.5 bgra_to_y_8_c: 2.2 bgra_to_y_8_rvv_i32: 2.0 bgra_to_y_128_c: 33.0 bgra_to_y_128_rvv_i32: 3.7 bgra_to_y_1080_c: 276.0 bgra_to_y_1080_rvv_i32: 31.5 bgra_to_y_1920_c: 490.7 bgra_to_y_1920_rvv_i32: 55.5	2024-06-09 14:33:04 +03:00
Rémi Denis-Courmont	e0f4d185f1	sws/input: R-V V rgb24ToUV_half and bgr24ToUV_half T-Head C908: rgb24_to_uv_half_4_c: 2.0 rgb24_to_uv_half_4_rvv_i32: 3.5 rgb24_to_uv_half_64_c: 27.0 rgb24_to_uv_half_64_rvv_i32: 12.5 rgb24_to_uv_half_540_c: 223.7 rgb24_to_uv_half_540_rvv_i32: 105.2 rgb24_to_uv_half_640_c: 265.5 rgb24_to_uv_half_640_rvv_i32: 123.7 rgb24_to_uv_half_960_c: 414.5 rgb24_to_uv_half_960_rvv_i32: 249.5 SpacemiT X60: rgb24_to_uv_half_4_c: 1.7 rgb24_to_uv_half_4_rvv_i32: 4.2 rgb24_to_uv_half_64_c: 24.0 rgb24_to_uv_half_64_rvv_i32: 8.7 rgb24_to_uv_half_540_c: 199.2 rgb24_to_uv_half_540_rvv_i32: 72.5 rgb24_to_uv_half_640_c: 235.7 rgb24_to_uv_half_640_rvv_i32: 85.2 rgb24_to_uv_half_960_c: 353.5 rgb24_to_uv_half_960_rvv_i32: 127.5	2024-06-08 18:30:43 +03:00
Rémi Denis-Courmont	3ef5867e4b	sws/input: R-V V rgb24ToUV and bgr24ToUV T-Head C908: rgb24_to_uv_8_c: 2.7 rgb24_to_uv_8_rvv_i32: 3.2 rgb24_to_uv_128_c: 41.0 rgb24_to_uv_128_rvv_i32: 12.7 rgb24_to_uv_1080_c: 342.5 rgb24_to_uv_1080_rvv_i32: 105.7 rgb24_to_uv_1280_c: 406.0 rgb24_to_uv_1280_rvv_i32: 124.2 rgb24_to_uv_1920_c: 626.0 rgb24_to_uv_1920_rvv_i32: 186.0 SpacemiT X60: rgb24_to_uv_8_c: 2.5 rgb24_to_uv_8_rvv_i32: 3.0 rgb24_to_uv_128_c: 36.5 rgb24_to_uv_128_rvv_i32: 5.7 rgb24_to_uv_1080_c: 304.2 rgb24_to_uv_1080_rvv_i32: 49.0 rgb24_to_uv_1280_c: 360.5 rgb24_to_uv_1280_rvv_i32: 57.5 rgb24_to_uv_1920_c: 540.7 rgb24_to_uv_1920_rvv_i32: 86.2	2024-06-08 18:30:43 +03:00
Rémi Denis-Courmont	79dfdac4db	sws/input: R-V V rgb24ToY & bgr24ToY T-Head C908: rgb24_to_y_8_c: 2.0 rgb24_to_y_8_rvv_i32: 2.7 rgb24_to_y_128_c: 26.2 rgb24_to_y_128_rvv_i32: 9.2 rgb24_to_y_1080_c: 219.5 rgb24_to_y_1080_rvv_i32: 76.2 rgb24_to_y_1280_c: 276.2 rgb24_to_y_1280_rvv_i32: 89.7 rgb24_to_y_1920_c: 389.7 rgb24_to_y_1920_rvv_i32: 134.2 SpacemiT X60: rgb24_to_y_8_c: 1.7 rgb24_to_y_8_rvv_i32: 2.2 rgb24_to_y_128_c: 23.2 rgb24_to_y_128_rvv_i32: 4.2 rgb24_to_y_1080_c: 195.0 rgb24_to_y_1080_rvv_i32: 33.7 rgb24_to_y_1280_c: 231.0 rgb24_to_y_1280_rvv_i32: 40.0 rgb24_to_y_1920_c: 346.2 rgb24_to_y_1920_rvv_i32: 59.7	2024-06-08 18:30:43 +03:00
Rémi Denis-Courmont	6c6313f1b5	swscale/riscv: explicitly require Zbb for MIN	2024-05-10 18:59:06 +03:00
Rémi Denis-Courmont	b3825bbe45	riscv: test for assembler support This should fix the build on LLVM 16 and earlier, at the cost of turning all non-RVV optimisations off.	2023-12-08 17:21:09 +02:00
Rémi Denis-Courmont	6d60cc7baf	sws/rgb2rgb: fix unaligned accesses in R-V V YUYV to I422p In my personal opinion, we should not need to support unaligned YUY2 pixel maps. They should always be aligned to at least 32 bits, and the current code assumes just 16 bits. However, checkasm does test for unaligned input bitmaps. QEMU accepts it, but real hardware does not. In this particular case, we can at the same time improve performance and handle unaligned inputs, so do just that. uyvytoyuv422_c: 104379.0 uyvytoyuv422_c: 104060.0 uyvytoyuv422_rvv_i32: 25284.0 (before) uyvytoyuv422_rvv_i32: 19303.2 (after)	2023-11-13 18:34:29 +02:00
Rémi Denis-Courmont	5b8b5ec9c5	sws/rgb2rgb: rework R-V V YUY2 to 4:2:2 planar This saves three scratch registers and three instructions per line. The performance gains are mostly negligible. The main point is to free up registers for further rework.	2023-11-13 18:34:29 +02:00
Rémi Denis-Courmont	19baf4e009	swscale/rgb2rgb: R-V V deinterleaveBytes	2023-10-03 22:53:20 +03:00
Rémi Denis-Courmont	ede3215115	swscale/rgb2rgb: fix extra iteration in R-V V interleave There was an additional iteration doing nothing for each line, due to checking the selected vector length instead of the available vector length.	2023-10-03 22:53:20 +03:00
Rémi Denis-Courmont	d14130aea3	swscale/rgb2rgb: unroll R-V V interleave_bytes	2023-10-03 20:48:47 +03:00
Rémi Denis-Courmont	6269c4a440	swscale/rgb2rgb: unroll RISC-V V uyvytoyuv422	2023-10-03 20:48:39 +03:00
Rémi Denis-Courmont	e50f8e861b	swscale/rgb2rgb: avoid S-regs in RISC-V V uyvytoyuv422 We can make do with callee-clobbered registers only now. As an added bonus, this makes the code XLEN-independent.	2023-10-03 20:48:39 +03:00
Rémi Denis-Courmont	be37a2e364	swscale/rgb2rgb: rework RISC-V V uyvytoyuv422 This avoids using relatively slow register strides.	2023-10-03 20:48:39 +03:00
Rémi Denis-Courmont	1a4bd76ea5	swscale/rgb2rgb: remove R-V V shuffle_bytes_3012 This is slower than the Zbb version on real hardware due to register strides. Proper support for vector byte-swap requires the Zvbb extension, but it's much too early for me to worry about it.	2023-10-02 22:28:38 +03:00
Rémi Denis-Courmont	c4a144c29d	swscale/rgb2rgb: add R-V Zbb shuffle_bytes_3210	2023-10-02 22:28:25 +03:00
Rémi Denis-Courmont	c2b38619c0	swscale/rgb2rgb2: rework RISC-V V shuffle_bytes_{1230,3012} This avoids strided loads. Before: shuffle_bytes_1230_rvv_i32: 308.7 shuffle_bytes_3012_rvv_i32: 308.7 After: shuffle_bytes_1230_rvv_i32: 46.7 shuffle_bytes_3012_rvv_i32: 46.7	2023-07-21 22:18:02 +03:00
Rémi Denis-Courmont	15982554e6	swscale/rgb2rgb2: rework RISC-V V shuffle_bytes_{0321,2103} This avoids strided loads. Before: shuffle_bytes_0321_rvv_i32: 307.7 shuffle_bytes_2103_rvv_i32: 308.7 After: shuffle_bytes_0321_rvv_i32: 59.7 shuffle_bytes_2103_rvv_i32: 61.5	2023-07-21 22:18:02 +03:00
Rémi Denis-Courmont	d3948e4db5	swscale: inline ff_shuffle_bytes_3210_rvv No functional changes.	2023-07-21 22:18:02 +03:00
Rémi Denis-Courmont	b6585eb04c	lavu: add/use flag for RISC-V Zba extension The code was blindly assuming that Zbb or V implied Zba. While the former is practically always true, the latter broke some QEMU setups, as V was introduced earlier than Zba.	2023-07-19 19:29:35 +03:00
Khem Raj	a7b3c0203f	libswscale/riscv: fix syntax of vsetvli Add the missing operand, which Clang complains about but GCC assumes to be 'm1' if not specified. Works around a build failure with Clang: | src/libswscale/riscv/rgb2rgb_rvv.S:88:25: error: operand must be e[8|16|32|64|128|256|512|1024],m[1|2|4|8|f2|f4|f8],[ta|tu],[ma|mu] | vsetvli t4, t3, e8, ta, ma | ^ Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>	2023-07-13 22:01:24 +03:00
Rémi Denis-Courmont	a1bfb5290e	sws/rgb2rgb: RISC-V 64-bit V packed YUYV/UYVY to planar 4:2:2 This is currently 64-bit only because the stack spilling code would not assemble on RV32I (and it would corrupt s0 and s1 on RV128I, in theory). This could be added later in the unlikely event that someone wants it.	2022-09-30 07:25:44 +02:00
Rémi Denis-Courmont	9181835a24	sws/rgb2rgb: RISC-V V interleaveBytes	2022-09-30 07:24:09 +02:00
Rémi Denis-Courmont	66a03f4053	sws/rgb2rgb: RISC-V V shuffle_bytes_xxxx functions	2022-09-30 07:24:09 +02:00

39 Commits
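Commits 7728b3357d and f7ee0195df move the arch-specific range-conversion setup under the main init function and drop the conditions that the main init already checks. Below is a minimal sketch of that dispatch shape, assuming a cut-down context. `ExampleCtx`, its fields, and every function name here are hypothetical and are not swscale's. The main init alone decides whether conversion is needed, installs the C kernel, and then lets the arch hook override it. As a result, every caller of the main init gets the optimised kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down context holding one range-conversion hook. */
typedef struct ExampleCtx {
    int src_range;       /* 1 = full (JPEG) range, 0 = limited range */
    int dst_range;
    int cpu_has_vector;  /* stands in for a runtime CPU feature check */
    void (*lum_convert)(int16_t *dst, int width);
} ExampleCtx;

/* Placeholder kernels: only their identity matters in this sketch. */
static void lum_to_jpeg_c(int16_t *dst, int width)   { (void) dst; (void) width; }
static void lum_from_jpeg_c(int16_t *dst, int width) { (void) dst; (void) width; }
static void lum_to_jpeg_vec(int16_t *dst, int width) { (void) dst; (void) width; }

/* Arch hook: only swaps in a faster kernel for the direction already
 * chosen. It does not re-check whether conversion is needed at all. */
static void example_init_range_convert_arch(ExampleCtx *c)
{
    if (c->cpu_has_vector && c->lum_convert == lum_to_jpeg_c)
        c->lum_convert = lum_to_jpeg_vec;
}

/* Main init: the single place that tests the ranges. Because it calls the
 * arch hook itself, every caller of this function gets the override. */
static void example_init_range_convert(ExampleCtx *c)
{
    c->lum_convert = NULL;
    if (c->src_range == c->dst_range)
        return;  /* nothing to convert; arch hook not consulted */
    c->lum_convert = c->dst_range ? lum_to_jpeg_c : lum_from_jpeg_c;
    example_init_range_convert_arch(c);
}
```

Keeping the checks in one place means the arch hooks stay trivial and cannot drift out of sync with the generic logic.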
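Commit 67adb30322 keeps `SwsContext` as the public type while the implementation lives in `SwsInternal`. For now, `sws_internal()` unwraps one into the other by a plain cast. The sketch below illustrates that pattern under stated assumptions. Only the two type names and `sws_internal()` come from the commit message; the struct fields and the `example_sws_*` functions are hypothetical.

```c
#include <stdlib.h>

/* Public handle: API users only ever see an incomplete type. */
typedef struct SwsContext SwsContext;

/* Internal state; these fields are hypothetical placeholders. */
typedef struct SwsInternal {
    int src_w, src_h;
    int dst_w, dst_h;
} SwsInternal;

/* The stage described in the commit: the public pointer really points at
 * the internal struct, so unwrapping is just a cast. A later change could
 * give SwsContext a real definition and dereference a member instead. */
static SwsInternal *sws_internal(SwsContext *sws)
{
    return (SwsInternal *) sws;
}

/* Hypothetical public allocator: hands out the internal struct under the
 * public type. */
static SwsContext *example_sws_alloc(void)
{
    SwsInternal *c = calloc(1, sizeof(*c));
    return (SwsContext *) c;
}

/* Hypothetical public getter: keeps the public type (named sws) in its
 * signature and unwraps it to the internal one (named c). */
static int example_sws_get_dst_w(SwsContext *sws)
{
    const SwsInternal *c = sws_internal(sws);
    return c->dst_w;
}
```

Because every public entry point goes through one unwrapping helper, changing the cast into a member access later touches a single function.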
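Commits 19baf4e009 and 27d28b68da add and enable an R-V V `deinterleaveBytes`, and 9181835a24 covers the inverse, `interleaveBytes`. As a plain-C reference for what such kernels compute, here is a sketch that does two things: it splits a byte-interleaved plane (a0 b0 a1 b1 ...) into two planes, and it merges them back. The function names, parameter order, and stride handling are this sketch's own assumptions, not the rgb2rgb function-pointer signatures.

```c
#include <stddef.h>
#include <stdint.h>

/* Split each row of an interleaved plane into two planes. width counts
 * output samples per plane; strides are in bytes. */
static void deinterleave_bytes(const uint8_t *src, uint8_t *dst1, uint8_t *dst2,
                               int width, int height,
                               int src_stride, int dst1_stride, int dst2_stride)
{
    for (int h = 0; h < height; h++) {
        for (int w = 0; w < width; w++) {
            dst1[w] = src[2 * w];
            dst2[w] = src[2 * w + 1];
        }
        src  += src_stride;
        dst1 += dst1_stride;
        dst2 += dst2_stride;
    }
}

/* Inverse: merge two planes into one interleaved plane. */
static void interleave_bytes(const uint8_t *src1, const uint8_t *src2, uint8_t *dst,
                             int width, int height,
                             int src1_stride, int src2_stride, int dst_stride)
{
    for (int h = 0; h < height; h++) {
        for (int w = 0; w < width; w++) {
            dst[2 * w]     = src1[w];
            dst[2 * w + 1] = src2[w];
        }
        src1 += src1_stride;
        src2 += src2_stride;
        dst  += dst_stride;
    }
}
```

The inner loops touch every byte exactly once with a stride of two, which is the kind of access pattern vector segment loads and stores handle well.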
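Commit 378d1b06c3 describes a flag that is initialised at load time and tested before each Zbb code path. Below is a minimal C sketch of that idea, assuming GCC or Clang for the `constructor` and `visibility` attributes and for `__builtin_bswap32`. Some parts are assumptions:

- The flag name is modelled loosely on `ff_rv_zbb_supported` from the commit.
- The probe is a hypothetical stub rather than a real OS query.
- `__builtin_bswap32` merely stands in for a Zbb-specific instruction sequence.

```c
#include <stdint.h>

/* Hidden flag, set once at load time. Hidden visibility lets the compiler
 * use a PC-relative load instead of a GOT look-up. */
__attribute__((visibility("hidden"))) unsigned char example_zbb_supported;

/* Hypothetical probe: a real implementation would query the OS for the
 * CPU's extensions; this stub reports "absent" so the fallback runs. */
static int example_probe_zbb(void)
{
    return 0;
}

/* Runs before main(), or when a shared library containing it is loaded. */
static void __attribute__((constructor)) example_init_zbb_flag(void)
{
    example_zbb_supported = (unsigned char) example_probe_zbb();
}

/* Portable fallback byte swap. */
static uint32_t bswap32_fallback(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

/* Each use costs a flag load plus a branch; the builtin stands in for the
 * Zbb code path. */
static uint32_t example_bswap32(uint32_t x)
{
    if (example_zbb_supported)
        return __builtin_bswap32(x);
    return bswap32_fallback(x);
}
```

The check happens on each use rather than once per process, because shared library code cannot be patched after loading.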
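Commit 417957ec5e vectorises the luma and chroma conversions to and from JPEG (full) range. For 8-bit luma, the underlying arithmetic is a linear map between the limited range 16..235 and the full range 0..255. This sketch shows that map in Q14 fixed point. The constants are derived here from 255/219 and 219/255, and swscale's own kernels work on higher-precision intermediate samples. Treat it as an illustration of the maths, not as swscale's implementation; all names are hypothetical.

```c
#include <stdint.h>

/* Q14 scale factors derived for this sketch:
 * 255/219 * 2^14 ~= 19077 and 219/255 * 2^14 ~= 14071. */
#define TO_FULL_MUL   19077
#define FROM_FULL_MUL 14071

/* Limited (16..235) -> full (0..255) range, 8-bit luma, rounded. */
static uint8_t lum_to_full(uint8_t y)
{
    int v = y < 16 ? 16 : y > 235 ? 235 : y;  /* clamp to nominal range */
    return (uint8_t)(((v - 16) * TO_FULL_MUL + (1 << 13)) >> 14);
}

/* Full (0..255) -> limited (16..235) range, 8-bit luma, rounded. */
static uint8_t lum_from_full(uint8_t y)
{
    return (uint8_t)(((y * FROM_FULL_MUL + (1 << 13)) >> 14) + 16);
}

/* Row loop of the kind a vector kernel replaces. */
static void lum_row_to_full(uint8_t *row, int width)
{
    for (int i = 0; i < width; i++)
        row[i] = lum_to_full(row[i]);
}
```

Clamping before subtracting 16 keeps the shifted value non-negative, which avoids implementation-defined right shifts of negative integers in C.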
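Commit f5555cb106 vectorises 32-bit RGB to luma for the ABGR and BGRA layouts. The scalar sketch below assumes 8-bit output and uses the widely used BT.601 limited-range integer approximation, not swscale's coefficient tables. Passing the byte offsets of R, G and B lets one loop serve both layouts. The function name and parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert one row of 32-bit packed RGB to 8-bit limited-range luma.
 * r_off/g_off/b_off are byte positions inside each 4-byte pixel:
 *   ABGR (A,B,G,R in memory): r_off = 3, g_off = 2, b_off = 1
 *   BGRA (B,G,R,A in memory): r_off = 2, g_off = 1, b_off = 0 */
static void rgb32_to_y(uint8_t *dst, const uint8_t *src, size_t width,
                       int r_off, int g_off, int b_off)
{
    for (size_t i = 0; i < width; i++) {
        const uint8_t *p = src + 4 * i;
        int r = p[r_off], g = p[g_off], b = p[b_off];
        /* BT.601: Y = 16 + (66R + 129G + 25B) / 256, rounded */
        dst[i] = (uint8_t)(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);
    }
}
```

White maps to 235 and black to 16, the nominal limits of 8-bit limited-range luma.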
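Commit 7a3369398f vectorises 32-bit RGB to horizontally halved chroma, where each U/V output covers two adjacent input pixels. The scalar sketch below makes two assumptions: the pixel pair is averaged before conversion, and the coefficients are the common BT.601 8-bit integer approximation rather than swscale's tables. The function name and parameters are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* width counts chroma outputs; the source holds 2 * width 4-byte pixels.
 * r_off/g_off/b_off are byte positions of R, G, B inside each pixel. */
static void rgb32_to_uv_half(uint8_t *dst_u, uint8_t *dst_v,
                             const uint8_t *src, size_t width,
                             int r_off, int g_off, int b_off)
{
    for (size_t i = 0; i < width; i++) {
        const uint8_t *p = src + 8 * i;  /* two adjacent pixels */
        int r = (p[r_off] + p[4 + r_off] + 1) >> 1;
        int g = (p[g_off] + p[4 + g_off] + 1) >> 1;
        int b = (p[b_off] + p[4 + b_off] + 1) >> 1;
        /* BT.601 chroma with the +128 offset folded in before the shift,
         * so the shifted value is never negative. */
        dst_u[i] = (uint8_t)((-38 * r - 74 * g + 112 * b + 128 + (128 << 8)) >> 8);
        dst_v[i] = (uint8_t)((112 * r - 94 * g - 18 * b + 128 + (128 << 8)) >> 8);
    }
}
```

Folding the offset in before the shift keeps every intermediate positive, so the result stays within 16..240 without separate clamping.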
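Commit ede3215115 fixes an extra iteration per line that did no work. It was caused by testing the selected vector length instead of the remaining (available) one. The C sketch below emulates `vsetvli` with a hypothetical `setvl()` helper that grants at most `vlmax` elements, and it counts loop passes for both loop shapes. It illustrates the bug class; it is not the actual assembly.

```c
#include <stddef.h>

/* Hypothetical stand-in for vsetvli: grant at most vlmax of the avl
 * remaining elements. vlmax is assumed to be non-zero. */
static size_t setvl(size_t avl, size_t vlmax)
{
    return avl < vlmax ? avl : vlmax;
}

/* Buggy shape: the loop condition tests the selected length vl. Once the
 * row is finished, one more pass runs with vl == 0 before exiting. */
static unsigned passes_checking_vl(size_t width, size_t vlmax)
{
    unsigned passes = 0;
    size_t avl = width, vl;
    do {
        vl = setvl(avl, vlmax);
        passes++;  /* a real kernel would load/store vl elements here */
        avl -= vl;
    } while (vl != 0);
    return passes;
}

/* Fixed shape: the loop condition tests the available length avl. */
static unsigned passes_checking_avl(size_t width, size_t vlmax)
{
    unsigned passes = 0;
    size_t avl = width;
    while (avl != 0) {
        size_t vl = setvl(avl, vlmax);
        passes++;
        avl -= vl;
    }
    return passes;
}
```

With a 64-element row and 16 elements per pass, the first shape makes five passes and the second makes four. The extra pass repeats for every line of the image.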
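Commit a1bfb5290e adds R-V V conversion of packed YUYV/UYVY to planar 4:2:2, and 6d60cc7baf later makes it cope with unaligned input. Below is a scalar reference for one row, assuming an even width. Because it only reads single bytes, it works at any source alignment. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One row of packed UYVY (U0 Y0 V0 Y1 per pixel pair) to planar 4:2:2.
 * width is in luma samples and assumed even. */
static void uyvy_row_to_422p(uint8_t *y, uint8_t *u, uint8_t *v,
                             const uint8_t *src, size_t width)
{
    for (size_t i = 0; i < width / 2; i++) {
        u[i]         = src[4 * i + 0];
        y[2 * i]     = src[4 * i + 1];
        v[i]         = src[4 * i + 2];
        y[2 * i + 1] = src[4 * i + 3];
    }
}

/* YUYV (Y0 U0 Y1 V0 per pixel pair) differs only in byte positions. */
static void yuyv_row_to_422p(uint8_t *y, uint8_t *u, uint8_t *v,
                             const uint8_t *src, size_t width)
{
    for (size_t i = 0; i < width / 2; i++) {
        y[2 * i]     = src[4 * i + 0];
        u[i]         = src[4 * i + 1];
        y[2 * i + 1] = src[4 * i + 2];
        v[i]         = src[4 * i + 3];
    }
}
```

A vector version that reads wider elements (16- or 32-bit) must instead either assume alignment or be written to tolerate its absence.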
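Commits 66a03f4053, c2b38619c0, 15982554e6 and c4a144c29d deal with the `shuffle_bytes_xxxx` family. Below is a generic scalar sketch. It assumes that the four digits in each name give, for each output byte of a 32-bit pixel, the index of the input byte it is copied from. Under that reading, `3210` reverses every 32-bit word, which a byte-swap instruction can do directly. The function name and the pattern array are this sketch's own.

```c
#include <stddef.h>
#include <stdint.h>

/* Generic 4-byte shuffle: output byte k of each pixel is copied from input
 * byte pat[k] of the same pixel. size is in bytes; a trailing partial
 * pixel is ignored. */
static void shuffle_bytes(const uint8_t *src, uint8_t *dst, size_t size,
                          const uint8_t pat[4])
{
    for (size_t i = 0; i + 4 <= size; i += 4)
        for (int k = 0; k < 4; k++)
            dst[i + k] = src[i + pat[k]];
}
```

Reading four consecutive bytes per pixel and permuting them in registers, rather than gathering each output byte with a strided load, is the general direction the reworks above describe.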