1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-12-07 11:13:41 +02:00
Commit Graph

2564 Commits

Author SHA1 Message Date
Ramiro Polla
75f1a8e071 swscale/aarch64: add neon {lum,chr}ConvertRange
chrRangeFromJpeg_8_c: 29.2
chrRangeFromJpeg_8_neon: 19.5
chrRangeFromJpeg_24_c: 80.5
chrRangeFromJpeg_24_neon: 34.0
chrRangeFromJpeg_128_c: 413.7
chrRangeFromJpeg_128_neon: 156.0
chrRangeFromJpeg_144_c: 471.0
chrRangeFromJpeg_144_neon: 174.2
chrRangeFromJpeg_256_c: 842.0
chrRangeFromJpeg_256_neon: 305.5
chrRangeFromJpeg_512_c: 1699.0
chrRangeFromJpeg_512_neon: 608.0
chrRangeToJpeg_8_c: 51.7
chrRangeToJpeg_8_neon: 22.7
chrRangeToJpeg_24_c: 149.7
chrRangeToJpeg_24_neon: 38.0
chrRangeToJpeg_128_c: 761.7
chrRangeToJpeg_128_neon: 176.7
chrRangeToJpeg_144_c: 866.2
chrRangeToJpeg_144_neon: 198.7
chrRangeToJpeg_256_c: 1516.5
chrRangeToJpeg_256_neon: 348.7
chrRangeToJpeg_512_c: 3067.2
chrRangeToJpeg_512_neon: 692.7
lumRangeFromJpeg_8_c: 24.0
lumRangeFromJpeg_8_neon: 17.0
lumRangeFromJpeg_24_c: 56.7
lumRangeFromJpeg_24_neon: 21.0
lumRangeFromJpeg_128_c: 294.5
lumRangeFromJpeg_128_neon: 76.7
lumRangeFromJpeg_144_c: 332.5
lumRangeFromJpeg_144_neon: 86.7
lumRangeFromJpeg_256_c: 586.0
lumRangeFromJpeg_256_neon: 152.2
lumRangeFromJpeg_512_c: 1190.0
lumRangeFromJpeg_512_neon: 298.0
lumRangeToJpeg_8_c: 31.7
lumRangeToJpeg_8_neon: 19.5
lumRangeToJpeg_24_c: 83.5
lumRangeToJpeg_24_neon: 24.2
lumRangeToJpeg_128_c: 440.5
lumRangeToJpeg_128_neon: 91.0
lumRangeToJpeg_144_c: 504.2
lumRangeToJpeg_144_neon: 101.0
lumRangeToJpeg_256_c: 879.7
lumRangeToJpeg_256_neon: 177.2
lumRangeToJpeg_512_c: 1794.2
lumRangeToJpeg_512_neon: 354.0
2024-06-18 23:12:41 +02:00
James Almer
fcf72966a5 swscale/x86/range_convert: add missing AVX2 preprocessor wrapper
Fixes compilation with old yasm.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-16 10:09:38 -03:00
James Almer
8a4c9d6bd3 swscale/x86/range_convert: reduce amount of xmm regs clobbered in luma functions
Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-15 21:02:06 -03:00
Ramiro Polla
f6859cade3 swscale/x86: add sse2 and avx2 {lum,chr}ConvertRange
chrRangeFromJpeg_8_c: 22.3
chrRangeFromJpeg_8_sse2: 13.3
chrRangeFromJpeg_8_avx2: 13.3
chrRangeFromJpeg_24_c: 72.8
chrRangeFromJpeg_24_sse2: 22.3
chrRangeFromJpeg_24_avx2: 17.5
chrRangeFromJpeg_128_c: 345.5
chrRangeFromJpeg_128_sse2: 106.0
chrRangeFromJpeg_128_avx2: 57.8
chrRangeFromJpeg_144_c: 380.5
chrRangeFromJpeg_144_sse2: 118.5
chrRangeFromJpeg_144_avx2: 62.3
chrRangeFromJpeg_256_c: 646.3
chrRangeFromJpeg_256_sse2: 218.8
chrRangeFromJpeg_256_avx2: 109.0
chrRangeFromJpeg_512_c: 1461.5
chrRangeFromJpeg_512_sse2: 426.5
chrRangeFromJpeg_512_avx2: 211.5
chrRangeToJpeg_8_c: 37.8
chrRangeToJpeg_8_sse2: 10.5
chrRangeToJpeg_8_avx2: 14.0
chrRangeToJpeg_24_c: 114.3
chrRangeToJpeg_24_sse2: 23.5
chrRangeToJpeg_24_avx2: 16.3
chrRangeToJpeg_128_c: 633.5
chrRangeToJpeg_128_sse2: 107.5
chrRangeToJpeg_128_avx2: 55.0
chrRangeToJpeg_144_c: 758.3
chrRangeToJpeg_144_sse2: 132.0
chrRangeToJpeg_144_avx2: 64.5
chrRangeToJpeg_256_c: 1345.0
chrRangeToJpeg_256_sse2: 218.0
chrRangeToJpeg_256_avx2: 105.3
chrRangeToJpeg_512_c: 2524.0
chrRangeToJpeg_512_sse2: 417.0
chrRangeToJpeg_512_avx2: 218.8
lumRangeFromJpeg_8_c: 11.8
lumRangeFromJpeg_8_sse2: 11.0
lumRangeFromJpeg_8_avx2: 10.3
lumRangeFromJpeg_24_c: 38.5
lumRangeFromJpeg_24_sse2: 15.5
lumRangeFromJpeg_24_avx2: 12.5
lumRangeFromJpeg_128_c: 232.3
lumRangeFromJpeg_128_sse2: 60.0
lumRangeFromJpeg_128_avx2: 26.8
lumRangeFromJpeg_144_c: 259.5
lumRangeFromJpeg_144_sse2: 65.3
lumRangeFromJpeg_144_avx2: 29.0
lumRangeFromJpeg_256_c: 464.5
lumRangeFromJpeg_256_sse2: 107.5
lumRangeFromJpeg_256_avx2: 54.0
lumRangeFromJpeg_512_c: 897.5
lumRangeFromJpeg_512_sse2: 224.5
lumRangeFromJpeg_512_avx2: 109.8
lumRangeToJpeg_8_c: 17.8
lumRangeToJpeg_8_sse2: 11.0
lumRangeToJpeg_8_avx2: 11.8
lumRangeToJpeg_24_c: 56.3
lumRangeToJpeg_24_sse2: 11.0
lumRangeToJpeg_24_avx2: 12.5
lumRangeToJpeg_128_c: 333.8
lumRangeToJpeg_128_sse2: 53.3
lumRangeToJpeg_128_avx2: 26.5
lumRangeToJpeg_144_c: 375.5
lumRangeToJpeg_144_sse2: 60.8
lumRangeToJpeg_144_avx2: 29.0
lumRangeToJpeg_256_c: 652.0
lumRangeToJpeg_256_sse2: 109.5
lumRangeToJpeg_256_avx2: 53.5
lumRangeToJpeg_512_c: 1284.3
lumRangeToJpeg_512_sse2: 218.0
lumRangeToJpeg_512_avx2: 108.3
2024-06-16 00:35:51 +02:00
Rémi Denis-Courmont
378d1b06c3 riscv: probe for Zbb extension at load time
Due to hysterical raisins, most RISC-V Linux distributions target a
RV64GC baseline excluding the Bit-manipulation ISA extensions, most
notably:
- Zba: address generation extension and
- Zbb: basic bit manipulation extension.
Most CPUs that would make sense to run FFmpeg on support Zba and Zbb
(including the current FATE runner), so it makes sense to optimise for
them. In fact a large chunk of existing assembler optimisations relies
on Zba and/or Zbb.

Since we cannot patch shared library code, the next best thing is to
carry a flag initialised at load-time and check it on need basis.
This results in 3 instructions overhead on isolated use, e.g.:
1:  AUIPC rd, %pcrel_hi(ff_rv_zbb_supported)
    LBU   rd, %pcrel_lo(1b)(rd)
    BEQZ  rd, non_Zbb_fallback_code
    // Zbb code here

The C compiler will typically load the flag ahead of time to reducing
latency, and can also keep it around if Zbb is used multiple times in a
single optimisation scope. For this to work, the flag symbol must be
hidden; otherwise the optimisation degrades with a GOT look-up to
support interposition:
1:  AUIPC rd, GOT_OFFSET_HI
    LD    rd, GOT_OFFSET_LO(rd)
    LBU   rd, (rd)
    BEQZ  rd, non_Zbb_fallback_code
    // Zbb code here

This patch adds code to provision the flag in libraries using bit
manipulation functions from libavutil: byte-swap, bit-weight and
counting leading or trailing zeroes.
2024-06-11 20:12:37 +03:00
Rémi Denis-Courmont
417957ec5e sws/range_convert: R-V V to/from JPEG
C908   X60
chrRangeFromJpeg_8_c:          2.7    2.5
chrRangeFromJpeg_8_rvv_i32:    1.7    1.5
chrRangeFromJpeg_24_c:         7.5    6.7
chrRangeFromJpeg_24_rvv_i32:   1.7    1.5
chrRangeFromJpeg_128_c:       55.2   34.7
chrRangeFromJpeg_128_rvv_i32:  6.5    3.0
chrRangeFromJpeg_144_c:       44.0   39.2
chrRangeFromJpeg_144_rvv_i32:  7.7    4.5
chrRangeFromJpeg_256_c:       78.2   69.5
chrRangeFromJpeg_256_rvv_i32: 12.2    6.0
chrRangeFromJpeg_512_c:      172.2  138.5
chrRangeFromJpeg_512_rvv_i32: 24.5   11.7
chrRangeToJpeg_8_c:            4.7    4.2
chrRangeToJpeg_8_rvv_i32:      2.0    1.7
chrRangeToJpeg_24_c:          13.7   12.2
chrRangeToJpeg_24_rvv_i32:     2.0    1.5
chrRangeToJpeg_128_c:         72.0   63.7
chrRangeToJpeg_128_rvv_i32:    6.7    3.2
chrRangeToJpeg_144_c:         80.7   71.7
chrRangeToJpeg_144_rvv_i32:    8.5    4.7
chrRangeToJpeg_256_c:        143.2  127.2
chrRangeToJpeg_256_rvv_i32:   13.5    6.5
chrRangeToJpeg_512_c:        285.7  253.7
chrRangeToJpeg_512_rvv_i32:   27.0   13.0
lumRangeFromJpeg_8_c:          1.7    1.5
lumRangeFromJpeg_8_rvv_i32:    1.2    1.0
lumRangeFromJpeg_24_c:         4.2    3.7
lumRangeFromJpeg_24_rvv_i32:   1.2    1.0
lumRangeFromJpeg_128_c:       21.7   19.2
lumRangeFromJpeg_128_rvv_i32:  3.7    1.7
lumRangeFromJpeg_144_c:       24.7   22.0
lumRangeFromJpeg_144_rvv_i32:  4.7    2.7
lumRangeFromJpeg_256_c:       43.7   39.0
lumRangeFromJpeg_256_rvv_i32:  7.5    3.2
lumRangeFromJpeg_512_c:       87.0   77.2
lumRangeFromJpeg_512_rvv_i32: 14.5    6.7
lumRangeToJpeg_8_c:            2.7    2.2
lumRangeToJpeg_8_rvv_i32:      1.0    1.0
lumRangeToJpeg_24_c:           7.2    6.5
lumRangeToJpeg_24_rvv_i32:     1.2    1.0
lumRangeToJpeg_128_c:         37.7   33.7
lumRangeToJpeg_128_rvv_i32:    3.7    2.0
lumRangeToJpeg_144_c:         42.5   37.7
lumRangeToJpeg_144_rvv_i32:    4.7    2.7
lumRangeToJpeg_256_c:         75.0   66.7
lumRangeToJpeg_256_rvv_i32:    7.5    3.5
lumRangeToJpeg_512_c:        149.5  133.0
lumRangeToJpeg_512_rvv_i32:   14.7    7.0
2024-06-10 22:48:52 +03:00
Zhao Zhili
9dac8495b0 swscale/aarch64: Add rgb24 to yuv implementation
Test on Apple M1:

rgb24_to_uv_8_c: 0.0
rgb24_to_uv_8_neon: 0.2
rgb24_to_uv_128_c: 1.0
rgb24_to_uv_128_neon: 0.5
rgb24_to_uv_1080_c: 7.0
rgb24_to_uv_1080_neon: 5.7
rgb24_to_uv_1920_c: 12.5
rgb24_to_uv_1920_neon: 9.5
rgb24_to_uv_half_8_c: 0.2
rgb24_to_uv_half_8_neon: 0.2
rgb24_to_uv_half_128_c: 1.0
rgb24_to_uv_half_128_neon: 0.5
rgb24_to_uv_half_1080_c: 6.2
rgb24_to_uv_half_1080_neon: 3.0
rgb24_to_uv_half_1920_c: 11.2
rgb24_to_uv_half_1920_neon: 5.2
rgb24_to_y_8_c: 0.2
rgb24_to_y_8_neon: 0.0
rgb24_to_y_128_c: 0.5
rgb24_to_y_128_neon: 0.5
rgb24_to_y_1080_c: 4.7
rgb24_to_y_1080_neon: 3.2
rgb24_to_y_1920_c: 8.0
rgb24_to_y_1920_neon: 5.7

On Pixel 6:

rgb24_to_uv_8_c: 30.7
rgb24_to_uv_8_neon: 56.9
rgb24_to_uv_128_c: 213.9
rgb24_to_uv_128_neon: 173.2
rgb24_to_uv_1080_c: 1649.9
rgb24_to_uv_1080_neon: 1424.4
rgb24_to_uv_1920_c: 2907.9
rgb24_to_uv_1920_neon: 2480.7
rgb24_to_uv_half_8_c: 36.2
rgb24_to_uv_half_8_neon: 33.4
rgb24_to_uv_half_128_c: 167.9
rgb24_to_uv_half_128_neon: 99.4
rgb24_to_uv_half_1080_c: 1293.9
rgb24_to_uv_half_1080_neon: 778.7
rgb24_to_uv_half_1920_c: 2292.7
rgb24_to_uv_half_1920_neon: 1328.7
rgb24_to_y_8_c: 19.7
rgb24_to_y_8_neon: 27.7
rgb24_to_y_128_c: 129.9
rgb24_to_y_128_neon: 96.7
rgb24_to_y_1080_c: 995.4
rgb24_to_y_1080_neon: 767.7
rgb24_to_y_1920_c: 1747.4
rgb24_to_y_1920_neon: 1337.2

Note both tests use clang as compiler, which has vectorization
enabled by default with -O3.

Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-06-11 01:12:09 +08:00
James Almer
17c3cc5bb6 swscale/x86/rgb_2_rgb: add missing wrap to ff_uyvytoyuv422_avx2
Fixes old yasm.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 16:04:36 -03:00
James Almer
03546f49a3 swscale/x86/rgb2rgb: add missing wrap for ff_uyvytoyuv422_avx2
Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 15:56:52 -03:00
James Almer
e8cef5e152 swscale/x86/rgb2rgb: remove mmxext version of shuffle_bytes_2103
Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 13:43:11 -03:00
James Almer
c578bb9864 swscale/x86/input: add AVX2 optimized uyvytoyuv422
uyvytoyuv422_c: 23991.8
uyvytoyuv422_sse2: 2817.8
uyvytoyuv422_avx: 2819.3
uyvytoyuv422_avx2: 1972.3

Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 13:43:11 -03:00
James Almer
e9cfd53257 swscale/x86/input: add AVX2 optimized RGB32 to YUV functions
abgr_to_uv_8_c: 43.3
abgr_to_uv_8_sse2: 14.3
abgr_to_uv_8_avx: 15.3
abgr_to_uv_8_avx2: 18.8
abgr_to_uv_128_c: 650.3
abgr_to_uv_128_sse2: 110.8
abgr_to_uv_128_avx: 112.3
abgr_to_uv_128_avx2: 64.8
abgr_to_uv_1080_c: 5456.3
abgr_to_uv_1080_sse2: 888.8
abgr_to_uv_1080_avx: 900.8
abgr_to_uv_1080_avx2: 518.3
abgr_to_uv_1920_c: 9692.3
abgr_to_uv_1920_sse2: 1593.8
abgr_to_uv_1920_avx: 1613.3
abgr_to_uv_1920_avx2: 864.8
abgr_to_y_8_c: 23.3
abgr_to_y_8_sse2: 12.8
abgr_to_y_8_avx: 13.3
abgr_to_y_8_avx2: 17.3
abgr_to_y_128_c: 308.3
abgr_to_y_128_sse2: 67.3
abgr_to_y_128_avx: 66.8
abgr_to_y_128_avx2: 44.8
abgr_to_y_1080_c: 2371.3
abgr_to_y_1080_sse2: 512.8
abgr_to_y_1080_avx: 505.8
abgr_to_y_1080_avx2: 314.3
abgr_to_y_1920_c: 4177.3
abgr_to_y_1920_sse2: 915.8
abgr_to_y_1920_avx: 926.8
abgr_to_y_1920_avx2: 519.3
bgra_to_uv_8_c: 37.3
bgra_to_uv_8_sse2: 13.3
bgra_to_uv_8_avx: 14.8
bgra_to_uv_8_avx2: 19.8
bgra_to_uv_128_c: 563.8
bgra_to_uv_128_sse2: 111.3
bgra_to_uv_128_avx: 112.3
bgra_to_uv_128_avx2: 64.8
bgra_to_uv_1080_c: 4691.8
bgra_to_uv_1080_sse2: 893.8
bgra_to_uv_1080_avx: 899.8
bgra_to_uv_1080_avx2: 517.8
bgra_to_uv_1920_c: 8332.8
bgra_to_uv_1920_sse2: 1590.8
bgra_to_uv_1920_avx: 1605.8
bgra_to_uv_1920_avx2: 867.3
bgra_to_y_8_c: 22.3
bgra_to_y_8_sse2: 12.8
bgra_to_y_8_avx: 12.8
bgra_to_y_8_avx2: 17.3
bgra_to_y_128_c: 291.3
bgra_to_y_128_sse2: 67.8
bgra_to_y_128_avx: 69.3
bgra_to_y_128_avx2: 45.3
bgra_to_y_1080_c: 2357.3
bgra_to_y_1080_sse2: 508.3
bgra_to_y_1080_avx: 518.3
bgra_to_y_1080_avx2: 399.8
bgra_to_y_1920_c: 4202.8
bgra_to_y_1920_sse2: 906.8
bgra_to_y_1920_avx: 907.3
bgra_to_y_1920_avx2: 526.3

Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 13:43:11 -03:00
James Almer
d5fe99dc5f swscale/x86/input: add AVX2 optimized RGB24 to YUV functions
rgb24_to_uv_8_c: 39.3
rgb24_to_uv_8_sse2: 14.3
rgb24_to_uv_8_ssse3: 13.3
rgb24_to_uv_8_avx: 12.8
rgb24_to_uv_8_avx2: 14.3
rgb24_to_uv_128_c: 582.8
rgb24_to_uv_128_sse2: 127.3
rgb24_to_uv_128_ssse3: 107.3
rgb24_to_uv_128_avx: 111.3
rgb24_to_uv_128_avx2: 62.3
rgb24_to_uv_1080_c: 4981.3
rgb24_to_uv_1080_sse2: 1048.3
rgb24_to_uv_1080_ssse3: 876.8
rgb24_to_uv_1080_avx: 887.8
rgb24_to_uv_1080_avx2: 492.3
rgb24_to_uv_1280_c: 5906.8
rgb24_to_uv_1280_sse2: 1263.3
rgb24_to_uv_1280_ssse3: 1048.3
rgb24_to_uv_1280_avx: 1045.8
rgb24_to_uv_1280_avx2: 579.8
rgb24_to_uv_1920_c: 8665.3
rgb24_to_uv_1920_sse2: 1888.8
rgb24_to_uv_1920_ssse3: 1571.8
rgb24_to_uv_1920_avx: 1558.8
rgb24_to_uv_1920_avx2: 869.3
rgb24_to_y_8_c: 20.3
rgb24_to_y_8_sse2: 11.8
rgb24_to_y_8_ssse3: 10.3
rgb24_to_y_8_avx: 10.3
rgb24_to_y_8_avx2: 10.8
rgb24_to_y_128_c: 284.8
rgb24_to_y_128_sse2: 83.3
rgb24_to_y_128_ssse3: 66.8
rgb24_to_y_128_avx: 64.8
rgb24_to_y_128_avx2: 39.3
rgb24_to_y_1080_c: 2451.3
rgb24_to_y_1080_sse2: 696.3
rgb24_to_y_1080_ssse3: 516.8
rgb24_to_y_1080_avx: 518.8
rgb24_to_y_1080_avx2: 301.8
rgb24_to_y_1280_c: 2892.8
rgb24_to_y_1280_sse2: 816.8
rgb24_to_y_1280_ssse3: 623.3
rgb24_to_y_1280_avx: 616.3
rgb24_to_y_1280_avx2: 350.8
rgb24_to_y_1920_c: 4338.8
rgb24_to_y_1920_sse2: 1210.8
rgb24_to_y_1920_ssse3: 928.3
rgb24_to_y_1920_avx: 920.3
rgb24_to_y_1920_avx2: 534.8

Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 13:42:09 -03:00
Rémi Denis-Courmont
7a3369398f sws/input: R-V V 32-bit RGB to halved UV
T-Head C908:
abgr_to_uv_half_8_c:            2.2
abgr_to_uv_half_8_rvv_i32:      3.5
abgr_to_uv_half_128_c:         44.0
abgr_to_uv_half_128_rvv_i32:   13.0
abgr_to_uv_half_1080_c:       245.0
abgr_to_uv_half_1080_rvv_i32: 107.2
abgr_to_uv_half_1920_c:       406.2
abgr_to_uv_half_1920_rvv_i32: 188.7
bgra_to_uv_half_8_c:            2.2
bgra_to_uv_half_8_rvv_i32:      3.5
bgra_to_uv_half_128_c:         26.5
bgra_to_uv_half_128_rvv_i32:   13.0
bgra_to_uv_half_1080_c:       219.7
bgra_to_uv_half_1080_rvv_i32: 107.0
bgra_to_uv_half_1920_c:       406.7
bgra_to_uv_half_1920_rvv_i32: 188.7

SpacemiT X60:
abgr_to_uv_half_8_c:           2.2
abgr_to_uv_half_8_rvv_i32:     3.0
abgr_to_uv_half_128_c:        28.2
abgr_to_uv_half_128_rvv_i32:   5.7
abgr_to_uv_half_1080_c:      235.5
abgr_to_uv_half_1080_rvv_i32: 47.7
abgr_to_uv_half_1920_c:      418.2
abgr_to_uv_half_1920_rvv_i32: 84.0
bgra_to_uv_half_8_c:           2.0
bgra_to_uv_half_8_rvv_i32:     3.0
bgra_to_uv_half_128_c:        23.7
bgra_to_uv_half_128_rvv_i32:   5.7
bgra_to_uv_half_1080_c:      195.5
bgra_to_uv_half_1080_rvv_i32: 47.7
bgra_to_uv_half_1920_c:      346.5
bgra_to_uv_half_1920_rvv_i32: 84.0
2024-06-09 14:33:04 +03:00
Rémi Denis-Courmont
e2f069905e sws/input: R-V V 32-bit RGB to UV 2024-06-09 14:33:04 +03:00
Rémi Denis-Courmont
f5555cb106 sws/input: R-V V 32-bit RGB to Y
T-Head C908:
abgr_to_y_8_c:            2.5
abgr_to_y_8_rvv_i32:      2.2
abgr_to_y_128_c:         37.0
abgr_to_y_128_rvv_i32:    8.5
abgr_to_y_1080_c:       327.0
abgr_to_y_1080_rvv_i32:  69.5
abgr_to_y_1920_c:       552.0
abgr_to_y_1920_rvv_i32: 122.2
bgra_to_y_8_c:            2.5
bgra_to_y_8_rvv_i32:      2.2
bgra_to_y_128_c:         37.2
bgra_to_y_128_rvv_i32:    8.5
bgra_to_y_1080_c:       310.2
bgra_to_y_1080_rvv_i32:  69.5
bgra_to_y_1920_c:       568.2
bgra_to_y_1920_rvv_i32: 122.5

SpacemiT X60:
abgr_to_y_8_c:            2.5
abgr_to_y_8_rvv_i32:      2.0
abgr_to_y_128_c:         33.0
abgr_to_y_128_rvv_i32:    3.7
abgr_to_y_1080_c:       276.0
abgr_to_y_1080_rvv_i32:  31.5
abgr_to_y_1920_c:       493.7
abgr_to_y_1920_rvv_i32:  55.5
bgra_to_y_8_c:            2.2
bgra_to_y_8_rvv_i32:      2.0
bgra_to_y_128_c:         33.0
bgra_to_y_128_rvv_i32:    3.7
bgra_to_y_1080_c:       276.0
bgra_to_y_1080_rvv_i32:  31.5
bgra_to_y_1920_c:       490.7
bgra_to_y_1920_rvv_i32:  55.5
2024-06-09 14:33:04 +03:00
Andreas Rheinhardt
8b62fb231a swscale/x86/rgb2rgb: Detemplatize
Every function in rgb2rgb_template.c is only compiled exactly
once; there is no overlap at all between the MMXEXT and the
SSE2 functions, so detemplatize it.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 12:03:47 +02:00
Andreas Rheinhardt
5421dee0e7 swscale/x86/rgb2rgb_template: Remove unused uyvytoyv12
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 12:03:47 +02:00
Andreas Rheinhardt
c1c35380a7 swscale/x86/rgb2rgb: Don't unnecessarily check for inline ASM
The SSE2 and AVX versions of deinterleaveBytes are external ASM.
Move them out of the inline ASM template.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 12:03:47 +02:00
Andreas Rheinhardt
f7305eb3b3 swscale/x86/rgb2rgb_template: Remove unnecessary SFENCE
The ff_nv12ToUV_* functions don't use non-temporal stores
at all.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 12:03:47 +02:00
Rémi Denis-Courmont
e0f4d185f1 sws/input: R-V V rgb24ToUV_half and bgr24ToUV_half
T-Head C908:
rgb24_to_uv_half_4_c:           2.0
rgb24_to_uv_half_4_rvv_i32:     3.5
rgb24_to_uv_half_64_c:         27.0
rgb24_to_uv_half_64_rvv_i32:   12.5
rgb24_to_uv_half_540_c:       223.7
rgb24_to_uv_half_540_rvv_i32: 105.2
rgb24_to_uv_half_640_c:       265.5
rgb24_to_uv_half_640_rvv_i32: 123.7
rgb24_to_uv_half_960_c:       414.5
rgb24_to_uv_half_960_rvv_i32: 249.5

SpacemiT X60:
rgb24_to_uv_half_4_c:           1.7
rgb24_to_uv_half_4_rvv_i32:     4.2
rgb24_to_uv_half_64_c:         24.0
rgb24_to_uv_half_64_rvv_i32:    8.7
rgb24_to_uv_half_540_c:       199.2
rgb24_to_uv_half_540_rvv_i32:  72.5
rgb24_to_uv_half_640_c:       235.7
rgb24_to_uv_half_640_rvv_i32:  85.2
rgb24_to_uv_half_960_c:       353.5
rgb24_to_uv_half_960_rvv_i32: 127.5
2024-06-08 18:30:43 +03:00
Rémi Denis-Courmont
3ef5867e4b sws/input: R-V V rgb24ToUV and bgr24ToUV
T-Head C908:
rgb24_to_uv_8_c:            2.7
rgb24_to_uv_8_rvv_i32:      3.2
rgb24_to_uv_128_c:         41.0
rgb24_to_uv_128_rvv_i32:   12.7
rgb24_to_uv_1080_c:       342.5
rgb24_to_uv_1080_rvv_i32: 105.7
rgb24_to_uv_1280_c:       406.0
rgb24_to_uv_1280_rvv_i32: 124.2
rgb24_to_uv_1920_c:       626.0
rgb24_to_uv_1920_rvv_i32: 186.0

SpacemiT X60:
rgb24_to_uv_8_c:            2.5
rgb24_to_uv_8_rvv_i32:      3.0
rgb24_to_uv_128_c:         36.5
rgb24_to_uv_128_rvv_i32:    5.7
rgb24_to_uv_1080_c:       304.2
rgb24_to_uv_1080_rvv_i32:  49.0
rgb24_to_uv_1280_c:       360.5
rgb24_to_uv_1280_rvv_i32:  57.5
rgb24_to_uv_1920_c:       540.7
rgb24_to_uv_1920_rvv_i32:  86.2
2024-06-08 18:30:43 +03:00
Rémi Denis-Courmont
79dfdac4db sws/input: R-V V rgb24ToY & bgr24ToY
T-Head C908:
rgb24_to_y_8_c:            2.0
rgb24_to_y_8_rvv_i32:      2.7
rgb24_to_y_128_c:         26.2
rgb24_to_y_128_rvv_i32:    9.2
rgb24_to_y_1080_c:       219.5
rgb24_to_y_1080_rvv_i32:  76.2
rgb24_to_y_1280_c:       276.2
rgb24_to_y_1280_rvv_i32:  89.7
rgb24_to_y_1920_c:       389.7
rgb24_to_y_1920_rvv_i32: 134.2

SpacemiT X60:
rgb24_to_y_8_c:            1.7
rgb24_to_y_8_rvv_i32:      2.2
rgb24_to_y_128_c:         23.2
rgb24_to_y_128_rvv_i32:    4.2
rgb24_to_y_1080_c:       195.0
rgb24_to_y_1080_rvv_i32:  33.7
rgb24_to_y_1280_c:       231.0
rgb24_to_y_1280_rvv_i32:  40.0
rgb24_to_y_1920_c:       346.2
rgb24_to_y_1920_rvv_i32:  59.7
2024-06-08 18:30:43 +03:00
Ramiro Polla
5939f7228a libswscale/x86/yuv_2_rgb: fix some comments 2024-06-07 15:24:06 +02:00
Shiyou Yin
6b35fcacdb
swscale: [loongarch] Fix undeclared functions prob.
Compile with '--disable-lasx', ‘lumRangeFromJpeg_lasx’ undeclared.

Reviewed-by: 金波 <jinbo@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-31 02:20:23 +02:00
Michael Niedermayer
bfc22f364d
swscale/yuv2rgb: Use 64bit for brightness computation
This will not overflow for normal values
Fixes: CID1500280 Unintentional integer overflow

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-28 03:48:06 +02:00
Michael Niedermayer
3f9daf1c18
swscale/x86/swscale: use a clearer name for INPUT_PLANER_RGB_A_FUNC_CASE
related: CID1497114 Missing break in switch

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-28 03:48:06 +02:00
Rémi Denis-Courmont
6c6313f1b5 swscale/riscv: explicitly require Zbb for MIN 2024-05-10 18:59:06 +03:00
Michael Niedermayer
1330a73cca
swscale/output: Fix integer overflow in yuv2rgba64_full_1_c_template()
Fixes: signed integer overflow: -1082982400 + -1079364728 cannot be represented in type 'int'
Fixes: 67910/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-5329011971522560
The input is 9bit in 16bit, the fuzzer fills all 16bit thus generating "invalid" input
No overflow should happen with valid input.

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-06 03:00:40 +02:00
Michael Niedermayer
a56559e688
swscale/output: Fix integer overflow in yuv2rgba64_1_c_template
Fixes: signed integer overflow: -831176 * 9539 cannot be represented in type 'int'
Fixes: 67869/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-5117342091640832

The input is 9bit in 16bit, the fuzzer fills all 16bit thus generating "invalid" input
No overflow should happen with valid input.

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-06 03:00:40 +02:00
Shiyou Yin
2a7d622ddd
swscale: [LA] Optimize swscale funcs in input.c
Optimized 7 funcs with LSX and LASX:
1. yuy2ToUV_c
2. yvy2ToUV_c
3. uyvyToUV_c
4. nv12ToUV_c
5. nv21ToUV_c
6. abgrToA_c
7. rgbaToA_c

Reviewed-by: colleague of Shiyou Yin
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-04-11 23:53:59 +02:00
Shiyou Yin
8b76df9142
swscale: [LA] Optimize yuv2plane1_8_c.
Reviewed-by: colleague of Shiyou Yin
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-04-11 23:53:59 +02:00
Shiyou Yin
f3fe2cb5f7
swscale: [LA] Optimize range convert for yuvj420p.
Reviewed-by: 陈昊 <chenhao@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-04-11 23:53:41 +02:00
Michael Niedermayer
1a9eda65d0
swscale/utils: Fix xInc overflow
Fixes: signed integer overflow: 2 * 1073741824 cannot be represented in type 'int'
Fixes: 67802/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-6249515855183872

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-04-04 19:38:29 +02:00
Andreas Rheinhardt
428ff7bd8c swscale/ppc/swscale_ppc_template: Reindent after the previous commit
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-04-04 16:47:21 +02:00
Andreas Rheinhardt
95b4aea5e3 swscale/ppc/swscale_ppc_template: Remove code not passing checkasm
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-04-04 16:45:23 +02:00
Andreas Rheinhardt
790f793844 avutil/common: Don't auto-include mem.h
There are lots of files that don't need it: The number of object
files that actually need it went down from 2011 to 884 here.

Keep it for external users in order to not cause breakages.

Also improve the other headers a bit while just at it.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-31 00:08:43 +01:00
Andreas Rheinhardt
b616be1649 lib*/version: Use static_assert for static asserts
Also update the checks that guard against inserting
a new enum entry in the middle of a range.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-31 00:08:42 +01:00
Andreas Rheinhardt
2d38141ea6 swscale/swscale_internal: Don't export internal function
sws_alloc_set_opts() can actually be made internal to utils.c.
This commit does so.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-31 00:08:42 +01:00
Andreas Rheinhardt
ad1cef04a9 swscale/swscale_internal: Hoist branch out of loop
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-31 00:08:42 +01:00
Andreas Rheinhardt
b49e621c83 swscale/ppc/swscale_altivec: Simplify macro
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-31 00:08:42 +01:00
Andreas Rheinhardt
72f4f1dafb swscale/ppc/swscale_altivec: Fix build with -O0
In this case GCC does not treat a const variable initialized
to the compile-time constant "3" as a compile-time constant
and errors out because the argument is not a literal value.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-31 00:08:42 +01:00
Andreas Rheinhardt
4b44b5eaf0 swscale/swscale_internal: Only include altivec header iff HAVE_ALTIVEC
Reviewed-by: Sean McGovern <gseanmcg@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-30 05:04:41 +01:00
Michael Niedermayer
6b213175c9
Bump after 7.0 branch point
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-27 01:04:54 +01:00
Michael Niedermayer
872980ace6
Bump prior release/7.0 branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-27 01:04:53 +01:00
Henrik Gramner
c3d3f0e697 avutil/x86util: Fix broken pre-SSE4.1 PMINSD emulation
Fixes yadif-16 which allows FATE to pass.

Broken since 2904db9045 (2017).
2024-03-17 13:52:27 +01:00
James Almer
783d00b203 libs: bump major version for all libraries
Signed-off-by: James Almer <jamrial@gmail.com>
2024-03-07 11:29:43 -03:00
Michael Niedermayer
e9cc9e492f
libswscale/utils: Fix bayer to yuvj
Fixes: out of array access.

Earlier code assumes that a unscaled bayer to yuvj420 converter exists
but the later code then skips yuvj420

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-02-21 18:24:17 +01:00
Michael Niedermayer
f9906911f0
Revert "swscale: fix sws_setColorspaceDetails after sws_init_context"
Suggested by: Niklas Haas in Ticket10824

Fixes: Assertion failure
Fixes: Ticket10824

This reverts commit cedf589c09.
2024-02-21 18:24:17 +01:00
Michael Niedermayer
64098d0cd8
swscale/swscale: Check srcSliceH for bayer
Fixes: Assertion srcSliceH > 1 failed at libswscale/swscale_unscaled.c:1359
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-02-21 18:24:16 +01:00