1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-26 19:01:44 +02:00
Commit Graph

2629 Commits

Author SHA1 Message Date
Niklas Haas
4ec45aca36 swscale/utils: fix leak on threaded ctx init failure
This count gets incremented after init succeeds, when it should be
incremented after *alloc* succeeds. Otherwise, we leak the context on
failure.

There are no negative consequences of incrementing for
allocated-but-not-initialized contexts, as the only functions that
reference it will, in the worst case, simply behave as if called on
allocated-but-not-initialized contexts, which is in line with expected
behavior when sws_init_context() fails.
2024-07-14 13:48:59 +02:00
Sean McGovern
34b4ca8696
swscale: prevent undefined behaviour in the PUTRGBA macro
For even small values of 'asrc[x]', shifting them by 24 bits or more
will cause arithmetic overflow and be caught by
GCC's undefined behaviour sanitizer.

Ensure the values do not overflow by up-casting the bracketed
expressions involving 'asrc' to uint32_t.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-07-10 18:10:10 +02:00
Ramiro Polla
ac6263945a swscale/x86/yuv2rgb: Detemplatize
Every function in yuv2rgb_template.c is only compiled exactly
once, so detemplatize it.
2024-07-10 12:25:32 +02:00
Ramiro Polla
4f7f9b1026 swscale: remove unconditional #define DITHER1XBPP
This seems to have had an use in the past, but it is now defined
unconditionally.
2024-07-10 12:25:03 +02:00
Michael Niedermayer
66b60bae68
swscale/swscale: Use ptrdiff_t for linesize computations
This is unlikely to make a difference

Fixes: CID1591896 Unintentional integer overflow
Fixes: CID1591901 Unintentional integer overflow

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-07-07 23:36:30 +02:00
Zhao Zhili
4d90a76986 swscale/aarch64: Add argb/abgr to yuv
Test on Apple M1 with kperf:
				: -O3		: -O3 -fno-vectorize
abgr_to_uv_8_c			: 19.4		: 26.1
abgr_to_uv_8_neon		: 29.9		: 51.1
abgr_to_uv_128_c		: 146.4		: 558.9
abgr_to_uv_128_neon		: 85.1		: 83.4
abgr_to_uv_1080_c		: 1162.6	: 4786.4
abgr_to_uv_1080_neon		: 819.6		: 826.6
abgr_to_uv_1920_c		: 2063.6	: 8492.1
abgr_to_uv_1920_neon		: 1435.1	: 1447.1
abgr_to_uv_half_8_c		: 16.4		: 11.4
abgr_to_uv_half_8_neon		: 35.6		: 20.4
abgr_to_uv_half_128_c		: 108.6		: 359.4
abgr_to_uv_half_128_neon	: 75.4		: 42.6
abgr_to_uv_half_1080_c		: 883.4		: 2885.6
abgr_to_uv_half_1080_neon	: 460.6		: 481.1
abgr_to_uv_half_1920_c		: 1553.6	: 5106.9
abgr_to_uv_half_1920_neon	: 817.6		: 820.4
abgr_to_y_8_c			: 6.1		: 26.4
abgr_to_y_8_neon		: 40.6		: 6.4
abgr_to_y_128_c			: 99.9		: 390.1
abgr_to_y_128_neon		: 67.4		: 55.9
abgr_to_y_1080_c		: 735.9		: 3170.4
abgr_to_y_1080_neon		: 534.6		: 536.6
abgr_to_y_1920_c		: 1279.4	: 6016.4
abgr_to_y_1920_neon		: 932.6		: 927.6

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-07-05 16:32:31 +08:00
Zhao Zhili
52422133ae swscale/aarch64: Add bgra/rgba to yuv
Test on Apple M1 with kperf
				: -O3		: -O3 -fno-vectorize
bgra_to_uv_8_c			: 13.4		: 27.5
bgra_to_uv_8_neon		: 37.4		: 41.7
bgra_to_uv_128_c		: 155.9		: 550.2
bgra_to_uv_128_neon		: 91.7		: 92.7
bgra_to_uv_1080_c		: 1173.2	: 4558.2
bgra_to_uv_1080_neon		: 822.7		: 809.5
bgra_to_uv_1920_c		: 2078.2	: 8115.2
bgra_to_uv_1920_neon		: 1437.7	: 1438.7
bgra_to_uv_half_8_c		: 17.9		: 14.2
bgra_to_uv_half_8_neon		: 37.4		: 10.5
bgra_to_uv_half_128_c		: 103.9		: 326.0
bgra_to_uv_half_128_neon	: 73.9		: 68.7
bgra_to_uv_half_1080_c		: 850.2		: 3732.0
bgra_to_uv_half_1080_neon	: 484.2		: 490.0
bgra_to_uv_half_1920_c		: 1479.2	: 4942.7
bgra_to_uv_half_1920_neon	: 824.2		: 824.7
bgra_to_y_8_c			: 8.2		: 29.5
bgra_to_y_8_neon		: 18.2		: 32.7
bgra_to_y_128_c			: 101.4		: 361.5
bgra_to_y_128_neon		: 74.9		: 73.7
bgra_to_y_1080_c		: 739.4		: 3018.0
bgra_to_y_1080_neon		: 613.4		: 544.2
bgra_to_y_1920_c		: 1298.7	: 5326.0
bgra_to_y_1920_neon		: 918.7		: 934.2

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-07-05 16:32:31 +08:00
Zhao Zhili
b8b71be07a swscale/aarch64: Add bgr24 to yuv
Test on Apple M1 with kperf
				: -O3		: -O3 -fno-vectorize
bgr24_to_uv_8_c			: 28.5		: 52.5
bgr24_to_uv_8_neon		: 54.5		: 59.7
bgr24_to_uv_128_c		: 294.0		: 830.7
bgr24_to_uv_128_neon		: 99.7		: 112.0
bgr24_to_uv_1080_c		: 965.0		: 6624.0
bgr24_to_uv_1080_neon		: 751.5		: 754.7
bgr24_to_uv_1920_c		: 1693.2	: 11554.5
bgr24_to_uv_1920_neon		: 1292.5	: 1307.5
bgr24_to_uv_half_8_c		: 54.2		: 37.0
bgr24_to_uv_half_8_neon		: 27.2		: 22.5
bgr24_to_uv_half_128_c		: 127.2		: 392.5
bgr24_to_uv_half_128_neon	: 63.0		: 52.0
bgr24_to_uv_half_1080_c		: 880.2		: 3329.0
bgr24_to_uv_half_1080_neon	: 401.5		: 390.7
bgr24_to_uv_half_1920_c		: 1585.7	: 6390.7
bgr24_to_uv_half_1920_neon	: 694.7		: 698.7
bgr24_to_y_8_c			: 21.7		: 22.5
bgr24_to_y_8_neon		: 797.2		: 25.5
bgr24_to_y_128_c		: 88.0		: 280.5
bgr24_to_y_128_neon		: 63.7		: 55.0
bgr24_to_y_1080_c		: 616.7		: 2208.7
bgr24_to_y_1080_neon		: 900.0		: 452.0
bgr24_to_y_1920_c		: 1093.2	: 3894.7
bgr24_to_y_1920_neon		: 777.2		: 767.5

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-07-05 16:32:31 +08:00
Ramiro Polla
61e851381f swscale/yuv2rgb/x86: remove mmx/mmxext yuv2rgb functions
These functions are either slower or barely faster than the C LUT
yuv2rgb code.
2024-07-04 11:12:47 +02:00
Michael Niedermayer
c221c7422f
swscale/output: Avoid undefined overflow in yuv2rgb_write_full()
Fixes: signed integer overflow: -140140 * 16525 cannot be represented in type 'int'
Fixes: 68859/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-4516387130245120

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-06-26 20:49:36 +02:00
Michael Niedermayer
9e6c5b6e86
swscale/output: alpha can become negative after scaling, use multiply
Fixes: left shift of negative value -3245
Fixes: 69047/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-6571511551950848

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-06-26 20:49:36 +02:00
Ramiro Polla
e37a93031e swscale/yuv2rgb: reindent after previous commit 2024-06-24 13:35:22 +02:00
Ramiro Polla
0a08c64588 swscale/yuv2rgb: fix yuv422p input in C code
The C code was silently ignoring the second chroma line on yuv422p
input.
2024-06-24 13:34:53 +02:00
Ramiro Polla
fb8fae864f swscale/yuv2rgb: add macros to simplify code generation 2024-06-24 13:34:28 +02:00
Ramiro Polla
88a402df74 swscale/yuv2rgb: fix conversion for widths not aligned to 8
The C code for some pixel formats (rgb555, rgb565, rgb444, and monob)
was not converting the last pixels on widths not aligned to 8.

NOTE: the last pixel for odd widths is still not converted for any of
      the pixel formats in the C code for yuv2rgb except for monob.
2024-06-24 13:33:53 +02:00
Ramiro Polla
75f1a8e071 swscale/aarch64: add neon {lum,chr}ConvertRange
chrRangeFromJpeg_8_c: 29.2
chrRangeFromJpeg_8_neon: 19.5
chrRangeFromJpeg_24_c: 80.5
chrRangeFromJpeg_24_neon: 34.0
chrRangeFromJpeg_128_c: 413.7
chrRangeFromJpeg_128_neon: 156.0
chrRangeFromJpeg_144_c: 471.0
chrRangeFromJpeg_144_neon: 174.2
chrRangeFromJpeg_256_c: 842.0
chrRangeFromJpeg_256_neon: 305.5
chrRangeFromJpeg_512_c: 1699.0
chrRangeFromJpeg_512_neon: 608.0
chrRangeToJpeg_8_c: 51.7
chrRangeToJpeg_8_neon: 22.7
chrRangeToJpeg_24_c: 149.7
chrRangeToJpeg_24_neon: 38.0
chrRangeToJpeg_128_c: 761.7
chrRangeToJpeg_128_neon: 176.7
chrRangeToJpeg_144_c: 866.2
chrRangeToJpeg_144_neon: 198.7
chrRangeToJpeg_256_c: 1516.5
chrRangeToJpeg_256_neon: 348.7
chrRangeToJpeg_512_c: 3067.2
chrRangeToJpeg_512_neon: 692.7
lumRangeFromJpeg_8_c: 24.0
lumRangeFromJpeg_8_neon: 17.0
lumRangeFromJpeg_24_c: 56.7
lumRangeFromJpeg_24_neon: 21.0
lumRangeFromJpeg_128_c: 294.5
lumRangeFromJpeg_128_neon: 76.7
lumRangeFromJpeg_144_c: 332.5
lumRangeFromJpeg_144_neon: 86.7
lumRangeFromJpeg_256_c: 586.0
lumRangeFromJpeg_256_neon: 152.2
lumRangeFromJpeg_512_c: 1190.0
lumRangeFromJpeg_512_neon: 298.0
lumRangeToJpeg_8_c: 31.7
lumRangeToJpeg_8_neon: 19.5
lumRangeToJpeg_24_c: 83.5
lumRangeToJpeg_24_neon: 24.2
lumRangeToJpeg_128_c: 440.5
lumRangeToJpeg_128_neon: 91.0
lumRangeToJpeg_144_c: 504.2
lumRangeToJpeg_144_neon: 101.0
lumRangeToJpeg_256_c: 879.7
lumRangeToJpeg_256_neon: 177.2
lumRangeToJpeg_512_c: 1794.2
lumRangeToJpeg_512_neon: 354.0
2024-06-18 23:12:41 +02:00
James Almer
fcf72966a5 swscale/x86/range_convert: add missing AVX2 preprocessor wrapper
Fixes compilation with old yasm.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-16 10:09:38 -03:00
James Almer
8a4c9d6bd3 swscale/x86/range_convert: reduce amount of xmm regs clobbered in luma functions
Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-15 21:02:06 -03:00
Ramiro Polla
f6859cade3 swscale/x86: add sse2 and avx2 {lum,chr}ConvertRange
chrRangeFromJpeg_8_c: 22.3
chrRangeFromJpeg_8_sse2: 13.3
chrRangeFromJpeg_8_avx2: 13.3
chrRangeFromJpeg_24_c: 72.8
chrRangeFromJpeg_24_sse2: 22.3
chrRangeFromJpeg_24_avx2: 17.5
chrRangeFromJpeg_128_c: 345.5
chrRangeFromJpeg_128_sse2: 106.0
chrRangeFromJpeg_128_avx2: 57.8
chrRangeFromJpeg_144_c: 380.5
chrRangeFromJpeg_144_sse2: 118.5
chrRangeFromJpeg_144_avx2: 62.3
chrRangeFromJpeg_256_c: 646.3
chrRangeFromJpeg_256_sse2: 218.8
chrRangeFromJpeg_256_avx2: 109.0
chrRangeFromJpeg_512_c: 1461.5
chrRangeFromJpeg_512_sse2: 426.5
chrRangeFromJpeg_512_avx2: 211.5
chrRangeToJpeg_8_c: 37.8
chrRangeToJpeg_8_sse2: 10.5
chrRangeToJpeg_8_avx2: 14.0
chrRangeToJpeg_24_c: 114.3
chrRangeToJpeg_24_sse2: 23.5
chrRangeToJpeg_24_avx2: 16.3
chrRangeToJpeg_128_c: 633.5
chrRangeToJpeg_128_sse2: 107.5
chrRangeToJpeg_128_avx2: 55.0
chrRangeToJpeg_144_c: 758.3
chrRangeToJpeg_144_sse2: 132.0
chrRangeToJpeg_144_avx2: 64.5
chrRangeToJpeg_256_c: 1345.0
chrRangeToJpeg_256_sse2: 218.0
chrRangeToJpeg_256_avx2: 105.3
chrRangeToJpeg_512_c: 2524.0
chrRangeToJpeg_512_sse2: 417.0
chrRangeToJpeg_512_avx2: 218.8
lumRangeFromJpeg_8_c: 11.8
lumRangeFromJpeg_8_sse2: 11.0
lumRangeFromJpeg_8_avx2: 10.3
lumRangeFromJpeg_24_c: 38.5
lumRangeFromJpeg_24_sse2: 15.5
lumRangeFromJpeg_24_avx2: 12.5
lumRangeFromJpeg_128_c: 232.3
lumRangeFromJpeg_128_sse2: 60.0
lumRangeFromJpeg_128_avx2: 26.8
lumRangeFromJpeg_144_c: 259.5
lumRangeFromJpeg_144_sse2: 65.3
lumRangeFromJpeg_144_avx2: 29.0
lumRangeFromJpeg_256_c: 464.5
lumRangeFromJpeg_256_sse2: 107.5
lumRangeFromJpeg_256_avx2: 54.0
lumRangeFromJpeg_512_c: 897.5
lumRangeFromJpeg_512_sse2: 224.5
lumRangeFromJpeg_512_avx2: 109.8
lumRangeToJpeg_8_c: 17.8
lumRangeToJpeg_8_sse2: 11.0
lumRangeToJpeg_8_avx2: 11.8
lumRangeToJpeg_24_c: 56.3
lumRangeToJpeg_24_sse2: 11.0
lumRangeToJpeg_24_avx2: 12.5
lumRangeToJpeg_128_c: 333.8
lumRangeToJpeg_128_sse2: 53.3
lumRangeToJpeg_128_avx2: 26.5
lumRangeToJpeg_144_c: 375.5
lumRangeToJpeg_144_sse2: 60.8
lumRangeToJpeg_144_avx2: 29.0
lumRangeToJpeg_256_c: 652.0
lumRangeToJpeg_256_sse2: 109.5
lumRangeToJpeg_256_avx2: 53.5
lumRangeToJpeg_512_c: 1284.3
lumRangeToJpeg_512_sse2: 218.0
lumRangeToJpeg_512_avx2: 108.3
2024-06-16 00:35:51 +02:00
Rémi Denis-Courmont
378d1b06c3 riscv: probe for Zbb extension at load time
Due to hysterical raisins, most RISC-V Linux distributions target a
RV64GC baseline excluding the Bit-manipulation ISA extensions, most
notably:
- Zba: address generation extension and
- Zbb: basic bit manipulation extension.
Most CPUs that would make sense to run FFmpeg on support Zba and Zbb
(including the current FATE runner), so it makes sense to optimise for
them. In fact a large chunk of existing assembler optimisations relies
on Zba and/or Zbb.

Since we cannot patch shared library code, the next best thing is to
carry a flag initialised at load-time and check it on need basis.
This results in 3 instructions overhead on isolated use, e.g.:
1:  AUIPC rd, %pcrel_hi(ff_rv_zbb_supported)
    LBU   rd, %pcrel_lo(1b)(rd)
    BEQZ  rd, non_Zbb_fallback_code
    // Zbb code here

The C compiler will typically load the flag ahead of time to reducing
latency, and can also keep it around if Zbb is used multiple times in a
single optimisation scope. For this to work, the flag symbol must be
hidden; otherwise the optimisation degrades with a GOT look-up to
support interposition:
1:  AUIPC rd, GOT_OFFSET_HI
    LD    rd, GOT_OFFSET_LO(rd)
    LBU   rd, (rd)
    BEQZ  rd, non_Zbb_fallback_code
    // Zbb code here

This patch adds code to provision the flag in libraries using bit
manipulation functions from libavutil: byte-swap, bit-weight and
counting leading or trailing zeroes.
2024-06-11 20:12:37 +03:00
Rémi Denis-Courmont
417957ec5e sws/range_convert: R-V V to/from JPEG
C908   X60
chrRangeFromJpeg_8_c:          2.7    2.5
chrRangeFromJpeg_8_rvv_i32:    1.7    1.5
chrRangeFromJpeg_24_c:         7.5    6.7
chrRangeFromJpeg_24_rvv_i32:   1.7    1.5
chrRangeFromJpeg_128_c:       55.2   34.7
chrRangeFromJpeg_128_rvv_i32:  6.5    3.0
chrRangeFromJpeg_144_c:       44.0   39.2
chrRangeFromJpeg_144_rvv_i32:  7.7    4.5
chrRangeFromJpeg_256_c:       78.2   69.5
chrRangeFromJpeg_256_rvv_i32: 12.2    6.0
chrRangeFromJpeg_512_c:      172.2  138.5
chrRangeFromJpeg_512_rvv_i32: 24.5   11.7
chrRangeToJpeg_8_c:            4.7    4.2
chrRangeToJpeg_8_rvv_i32:      2.0    1.7
chrRangeToJpeg_24_c:          13.7   12.2
chrRangeToJpeg_24_rvv_i32:     2.0    1.5
chrRangeToJpeg_128_c:         72.0   63.7
chrRangeToJpeg_128_rvv_i32:    6.7    3.2
chrRangeToJpeg_144_c:         80.7   71.7
chrRangeToJpeg_144_rvv_i32:    8.5    4.7
chrRangeToJpeg_256_c:        143.2  127.2
chrRangeToJpeg_256_rvv_i32:   13.5    6.5
chrRangeToJpeg_512_c:        285.7  253.7
chrRangeToJpeg_512_rvv_i32:   27.0   13.0
lumRangeFromJpeg_8_c:          1.7    1.5
lumRangeFromJpeg_8_rvv_i32:    1.2    1.0
lumRangeFromJpeg_24_c:         4.2    3.7
lumRangeFromJpeg_24_rvv_i32:   1.2    1.0
lumRangeFromJpeg_128_c:       21.7   19.2
lumRangeFromJpeg_128_rvv_i32:  3.7    1.7
lumRangeFromJpeg_144_c:       24.7   22.0
lumRangeFromJpeg_144_rvv_i32:  4.7    2.7
lumRangeFromJpeg_256_c:       43.7   39.0
lumRangeFromJpeg_256_rvv_i32:  7.5    3.2
lumRangeFromJpeg_512_c:       87.0   77.2
lumRangeFromJpeg_512_rvv_i32: 14.5    6.7
lumRangeToJpeg_8_c:            2.7    2.2
lumRangeToJpeg_8_rvv_i32:      1.0    1.0
lumRangeToJpeg_24_c:           7.2    6.5
lumRangeToJpeg_24_rvv_i32:     1.2    1.0
lumRangeToJpeg_128_c:         37.7   33.7
lumRangeToJpeg_128_rvv_i32:    3.7    2.0
lumRangeToJpeg_144_c:         42.5   37.7
lumRangeToJpeg_144_rvv_i32:    4.7    2.7
lumRangeToJpeg_256_c:         75.0   66.7
lumRangeToJpeg_256_rvv_i32:    7.5    3.5
lumRangeToJpeg_512_c:        149.5  133.0
lumRangeToJpeg_512_rvv_i32:   14.7    7.0
2024-06-10 22:48:52 +03:00
Zhao Zhili
9dac8495b0 swscale/aarch64: Add rgb24 to yuv implementation
Test on Apple M1:

rgb24_to_uv_8_c: 0.0
rgb24_to_uv_8_neon: 0.2
rgb24_to_uv_128_c: 1.0
rgb24_to_uv_128_neon: 0.5
rgb24_to_uv_1080_c: 7.0
rgb24_to_uv_1080_neon: 5.7
rgb24_to_uv_1920_c: 12.5
rgb24_to_uv_1920_neon: 9.5
rgb24_to_uv_half_8_c: 0.2
rgb24_to_uv_half_8_neon: 0.2
rgb24_to_uv_half_128_c: 1.0
rgb24_to_uv_half_128_neon: 0.5
rgb24_to_uv_half_1080_c: 6.2
rgb24_to_uv_half_1080_neon: 3.0
rgb24_to_uv_half_1920_c: 11.2
rgb24_to_uv_half_1920_neon: 5.2
rgb24_to_y_8_c: 0.2
rgb24_to_y_8_neon: 0.0
rgb24_to_y_128_c: 0.5
rgb24_to_y_128_neon: 0.5
rgb24_to_y_1080_c: 4.7
rgb24_to_y_1080_neon: 3.2
rgb24_to_y_1920_c: 8.0
rgb24_to_y_1920_neon: 5.7

On Pixel 6:

rgb24_to_uv_8_c: 30.7
rgb24_to_uv_8_neon: 56.9
rgb24_to_uv_128_c: 213.9
rgb24_to_uv_128_neon: 173.2
rgb24_to_uv_1080_c: 1649.9
rgb24_to_uv_1080_neon: 1424.4
rgb24_to_uv_1920_c: 2907.9
rgb24_to_uv_1920_neon: 2480.7
rgb24_to_uv_half_8_c: 36.2
rgb24_to_uv_half_8_neon: 33.4
rgb24_to_uv_half_128_c: 167.9
rgb24_to_uv_half_128_neon: 99.4
rgb24_to_uv_half_1080_c: 1293.9
rgb24_to_uv_half_1080_neon: 778.7
rgb24_to_uv_half_1920_c: 2292.7
rgb24_to_uv_half_1920_neon: 1328.7
rgb24_to_y_8_c: 19.7
rgb24_to_y_8_neon: 27.7
rgb24_to_y_128_c: 129.9
rgb24_to_y_128_neon: 96.7
rgb24_to_y_1080_c: 995.4
rgb24_to_y_1080_neon: 767.7
rgb24_to_y_1920_c: 1747.4
rgb24_to_y_1920_neon: 1337.2

Note both tests use clang as compiler, which has vectorization
enabled by default with -O3.

Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-06-11 01:12:09 +08:00
James Almer
17c3cc5bb6 swscale/x86/rgb_2_rgb: add missing wrap to ff_uyvytoyuv422_avx2
Fixes old yasm.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 16:04:36 -03:00
James Almer
03546f49a3 swscale/x86/rgb2rgb: add missing wrap for ff_uyvytoyuv422_avx2
Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 15:56:52 -03:00
James Almer
e8cef5e152 swscale/x86/rgb2rgb: remove mmxext version of shuffle_bytes_2103
Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 13:43:11 -03:00
James Almer
c578bb9864 swscale/x86/input: add AVX2 optimized uyvytoyuv422
uyvytoyuv422_c: 23991.8
uyvytoyuv422_sse2: 2817.8
uyvytoyuv422_avx: 2819.3
uyvytoyuv422_avx2: 1972.3

Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 13:43:11 -03:00
James Almer
e9cfd53257 swscale/x86/input: add AVX2 optimized RGB32 to YUV functions
abgr_to_uv_8_c: 43.3
abgr_to_uv_8_sse2: 14.3
abgr_to_uv_8_avx: 15.3
abgr_to_uv_8_avx2: 18.8
abgr_to_uv_128_c: 650.3
abgr_to_uv_128_sse2: 110.8
abgr_to_uv_128_avx: 112.3
abgr_to_uv_128_avx2: 64.8
abgr_to_uv_1080_c: 5456.3
abgr_to_uv_1080_sse2: 888.8
abgr_to_uv_1080_avx: 900.8
abgr_to_uv_1080_avx2: 518.3
abgr_to_uv_1920_c: 9692.3
abgr_to_uv_1920_sse2: 1593.8
abgr_to_uv_1920_avx: 1613.3
abgr_to_uv_1920_avx2: 864.8
abgr_to_y_8_c: 23.3
abgr_to_y_8_sse2: 12.8
abgr_to_y_8_avx: 13.3
abgr_to_y_8_avx2: 17.3
abgr_to_y_128_c: 308.3
abgr_to_y_128_sse2: 67.3
abgr_to_y_128_avx: 66.8
abgr_to_y_128_avx2: 44.8
abgr_to_y_1080_c: 2371.3
abgr_to_y_1080_sse2: 512.8
abgr_to_y_1080_avx: 505.8
abgr_to_y_1080_avx2: 314.3
abgr_to_y_1920_c: 4177.3
abgr_to_y_1920_sse2: 915.8
abgr_to_y_1920_avx: 926.8
abgr_to_y_1920_avx2: 519.3
bgra_to_uv_8_c: 37.3
bgra_to_uv_8_sse2: 13.3
bgra_to_uv_8_avx: 14.8
bgra_to_uv_8_avx2: 19.8
bgra_to_uv_128_c: 563.8
bgra_to_uv_128_sse2: 111.3
bgra_to_uv_128_avx: 112.3
bgra_to_uv_128_avx2: 64.8
bgra_to_uv_1080_c: 4691.8
bgra_to_uv_1080_sse2: 893.8
bgra_to_uv_1080_avx: 899.8
bgra_to_uv_1080_avx2: 517.8
bgra_to_uv_1920_c: 8332.8
bgra_to_uv_1920_sse2: 1590.8
bgra_to_uv_1920_avx: 1605.8
bgra_to_uv_1920_avx2: 867.3
bgra_to_y_8_c: 22.3
bgra_to_y_8_sse2: 12.8
bgra_to_y_8_avx: 12.8
bgra_to_y_8_avx2: 17.3
bgra_to_y_128_c: 291.3
bgra_to_y_128_sse2: 67.8
bgra_to_y_128_avx: 69.3
bgra_to_y_128_avx2: 45.3
bgra_to_y_1080_c: 2357.3
bgra_to_y_1080_sse2: 508.3
bgra_to_y_1080_avx: 518.3
bgra_to_y_1080_avx2: 399.8
bgra_to_y_1920_c: 4202.8
bgra_to_y_1920_sse2: 906.8
bgra_to_y_1920_avx: 907.3
bgra_to_y_1920_avx2: 526.3

Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 13:43:11 -03:00
James Almer
d5fe99dc5f swscale/x86/input: add AVX2 optimized RGB24 to YUV functions
rgb24_to_uv_8_c: 39.3
rgb24_to_uv_8_sse2: 14.3
rgb24_to_uv_8_ssse3: 13.3
rgb24_to_uv_8_avx: 12.8
rgb24_to_uv_8_avx2: 14.3
rgb24_to_uv_128_c: 582.8
rgb24_to_uv_128_sse2: 127.3
rgb24_to_uv_128_ssse3: 107.3
rgb24_to_uv_128_avx: 111.3
rgb24_to_uv_128_avx2: 62.3
rgb24_to_uv_1080_c: 4981.3
rgb24_to_uv_1080_sse2: 1048.3
rgb24_to_uv_1080_ssse3: 876.8
rgb24_to_uv_1080_avx: 887.8
rgb24_to_uv_1080_avx2: 492.3
rgb24_to_uv_1280_c: 5906.8
rgb24_to_uv_1280_sse2: 1263.3
rgb24_to_uv_1280_ssse3: 1048.3
rgb24_to_uv_1280_avx: 1045.8
rgb24_to_uv_1280_avx2: 579.8
rgb24_to_uv_1920_c: 8665.3
rgb24_to_uv_1920_sse2: 1888.8
rgb24_to_uv_1920_ssse3: 1571.8
rgb24_to_uv_1920_avx: 1558.8
rgb24_to_uv_1920_avx2: 869.3
rgb24_to_y_8_c: 20.3
rgb24_to_y_8_sse2: 11.8
rgb24_to_y_8_ssse3: 10.3
rgb24_to_y_8_avx: 10.3
rgb24_to_y_8_avx2: 10.8
rgb24_to_y_128_c: 284.8
rgb24_to_y_128_sse2: 83.3
rgb24_to_y_128_ssse3: 66.8
rgb24_to_y_128_avx: 64.8
rgb24_to_y_128_avx2: 39.3
rgb24_to_y_1080_c: 2451.3
rgb24_to_y_1080_sse2: 696.3
rgb24_to_y_1080_ssse3: 516.8
rgb24_to_y_1080_avx: 518.8
rgb24_to_y_1080_avx2: 301.8
rgb24_to_y_1280_c: 2892.8
rgb24_to_y_1280_sse2: 816.8
rgb24_to_y_1280_ssse3: 623.3
rgb24_to_y_1280_avx: 616.3
rgb24_to_y_1280_avx2: 350.8
rgb24_to_y_1920_c: 4338.8
rgb24_to_y_1920_sse2: 1210.8
rgb24_to_y_1920_ssse3: 928.3
rgb24_to_y_1920_avx: 920.3
rgb24_to_y_1920_avx2: 534.8

Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 13:42:09 -03:00
Rémi Denis-Courmont
7a3369398f sws/input: R-V V 32-bit RGB to halved UV
T-Head C908:
abgr_to_uv_half_8_c:            2.2
abgr_to_uv_half_8_rvv_i32:      3.5
abgr_to_uv_half_128_c:         44.0
abgr_to_uv_half_128_rvv_i32:   13.0
abgr_to_uv_half_1080_c:       245.0
abgr_to_uv_half_1080_rvv_i32: 107.2
abgr_to_uv_half_1920_c:       406.2
abgr_to_uv_half_1920_rvv_i32: 188.7
bgra_to_uv_half_8_c:            2.2
bgra_to_uv_half_8_rvv_i32:      3.5
bgra_to_uv_half_128_c:         26.5
bgra_to_uv_half_128_rvv_i32:   13.0
bgra_to_uv_half_1080_c:       219.7
bgra_to_uv_half_1080_rvv_i32: 107.0
bgra_to_uv_half_1920_c:       406.7
bgra_to_uv_half_1920_rvv_i32: 188.7

SpacemiT X60:
abgr_to_uv_half_8_c:           2.2
abgr_to_uv_half_8_rvv_i32:     3.0
abgr_to_uv_half_128_c:        28.2
abgr_to_uv_half_128_rvv_i32:   5.7
abgr_to_uv_half_1080_c:      235.5
abgr_to_uv_half_1080_rvv_i32: 47.7
abgr_to_uv_half_1920_c:      418.2
abgr_to_uv_half_1920_rvv_i32: 84.0
bgra_to_uv_half_8_c:           2.0
bgra_to_uv_half_8_rvv_i32:     3.0
bgra_to_uv_half_128_c:        23.7
bgra_to_uv_half_128_rvv_i32:   5.7
bgra_to_uv_half_1080_c:      195.5
bgra_to_uv_half_1080_rvv_i32: 47.7
bgra_to_uv_half_1920_c:      346.5
bgra_to_uv_half_1920_rvv_i32: 84.0
2024-06-09 14:33:04 +03:00
Rémi Denis-Courmont
e2f069905e sws/input: R-V V 32-bit RGB to UV 2024-06-09 14:33:04 +03:00
Rémi Denis-Courmont
f5555cb106 sws/input: R-V V 32-bit RGB to Y
T-Head C908:
abgr_to_y_8_c:            2.5
abgr_to_y_8_rvv_i32:      2.2
abgr_to_y_128_c:         37.0
abgr_to_y_128_rvv_i32:    8.5
abgr_to_y_1080_c:       327.0
abgr_to_y_1080_rvv_i32:  69.5
abgr_to_y_1920_c:       552.0
abgr_to_y_1920_rvv_i32: 122.2
bgra_to_y_8_c:            2.5
bgra_to_y_8_rvv_i32:      2.2
bgra_to_y_128_c:         37.2
bgra_to_y_128_rvv_i32:    8.5
bgra_to_y_1080_c:       310.2
bgra_to_y_1080_rvv_i32:  69.5
bgra_to_y_1920_c:       568.2
bgra_to_y_1920_rvv_i32: 122.5

SpacemiT X60:
abgr_to_y_8_c:            2.5
abgr_to_y_8_rvv_i32:      2.0
abgr_to_y_128_c:         33.0
abgr_to_y_128_rvv_i32:    3.7
abgr_to_y_1080_c:       276.0
abgr_to_y_1080_rvv_i32:  31.5
abgr_to_y_1920_c:       493.7
abgr_to_y_1920_rvv_i32:  55.5
bgra_to_y_8_c:            2.2
bgra_to_y_8_rvv_i32:      2.0
bgra_to_y_128_c:         33.0
bgra_to_y_128_rvv_i32:    3.7
bgra_to_y_1080_c:       276.0
bgra_to_y_1080_rvv_i32:  31.5
bgra_to_y_1920_c:       490.7
bgra_to_y_1920_rvv_i32:  55.5
2024-06-09 14:33:04 +03:00
Andreas Rheinhardt
8b62fb231a swscale/x86/rgb2rgb: Detemplatize
Every function in rgb2rgb_template.c is only compiled exactly
once; there is no overlap at all between the MMXEXT and the
SSE2 functions, so detemplatize it.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 12:03:47 +02:00
Andreas Rheinhardt
5421dee0e7 swscale/x86/rgb2rgb_template: Remove unused uyvytoyv12
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 12:03:47 +02:00
Andreas Rheinhardt
c1c35380a7 swscale/x86/rgb2rgb: Don't unnecessarily check for inline ASM
The SSE2 and AVX versions of deinterleaveBytes are external ASM.
Move them out of the inline ASM template.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 12:03:47 +02:00
Andreas Rheinhardt
f7305eb3b3 swscale/x86/rgb2rgb_template: Remove unnecessary SFENCE
The ff_nv12ToUV_* functions don't use non-temporal stores
at all.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 12:03:47 +02:00
Rémi Denis-Courmont
e0f4d185f1 sws/input: R-V V rgb24ToUV_half and bgr24ToUV_half
T-Head C908:
rgb24_to_uv_half_4_c:           2.0
rgb24_to_uv_half_4_rvv_i32:     3.5
rgb24_to_uv_half_64_c:         27.0
rgb24_to_uv_half_64_rvv_i32:   12.5
rgb24_to_uv_half_540_c:       223.7
rgb24_to_uv_half_540_rvv_i32: 105.2
rgb24_to_uv_half_640_c:       265.5
rgb24_to_uv_half_640_rvv_i32: 123.7
rgb24_to_uv_half_960_c:       414.5
rgb24_to_uv_half_960_rvv_i32: 249.5

SpacemiT X60:
rgb24_to_uv_half_4_c:           1.7
rgb24_to_uv_half_4_rvv_i32:     4.2
rgb24_to_uv_half_64_c:         24.0
rgb24_to_uv_half_64_rvv_i32:    8.7
rgb24_to_uv_half_540_c:       199.2
rgb24_to_uv_half_540_rvv_i32:  72.5
rgb24_to_uv_half_640_c:       235.7
rgb24_to_uv_half_640_rvv_i32:  85.2
rgb24_to_uv_half_960_c:       353.5
rgb24_to_uv_half_960_rvv_i32: 127.5
2024-06-08 18:30:43 +03:00
Rémi Denis-Courmont
3ef5867e4b sws/input: R-V V rgb24ToUV and bgr24ToUV
T-Head C908:
rgb24_to_uv_8_c:            2.7
rgb24_to_uv_8_rvv_i32:      3.2
rgb24_to_uv_128_c:         41.0
rgb24_to_uv_128_rvv_i32:   12.7
rgb24_to_uv_1080_c:       342.5
rgb24_to_uv_1080_rvv_i32: 105.7
rgb24_to_uv_1280_c:       406.0
rgb24_to_uv_1280_rvv_i32: 124.2
rgb24_to_uv_1920_c:       626.0
rgb24_to_uv_1920_rvv_i32: 186.0

SpacemiT X60:
rgb24_to_uv_8_c:            2.5
rgb24_to_uv_8_rvv_i32:      3.0
rgb24_to_uv_128_c:         36.5
rgb24_to_uv_128_rvv_i32:    5.7
rgb24_to_uv_1080_c:       304.2
rgb24_to_uv_1080_rvv_i32:  49.0
rgb24_to_uv_1280_c:       360.5
rgb24_to_uv_1280_rvv_i32:  57.5
rgb24_to_uv_1920_c:       540.7
rgb24_to_uv_1920_rvv_i32:  86.2
2024-06-08 18:30:43 +03:00
Rémi Denis-Courmont
79dfdac4db sws/input: R-V V rgb24ToY & bgr24ToY
T-Head C908:
rgb24_to_y_8_c:            2.0
rgb24_to_y_8_rvv_i32:      2.7
rgb24_to_y_128_c:         26.2
rgb24_to_y_128_rvv_i32:    9.2
rgb24_to_y_1080_c:       219.5
rgb24_to_y_1080_rvv_i32:  76.2
rgb24_to_y_1280_c:       276.2
rgb24_to_y_1280_rvv_i32:  89.7
rgb24_to_y_1920_c:       389.7
rgb24_to_y_1920_rvv_i32: 134.2

SpacemiT X60:
rgb24_to_y_8_c:            1.7
rgb24_to_y_8_rvv_i32:      2.2
rgb24_to_y_128_c:         23.2
rgb24_to_y_128_rvv_i32:    4.2
rgb24_to_y_1080_c:       195.0
rgb24_to_y_1080_rvv_i32:  33.7
rgb24_to_y_1280_c:       231.0
rgb24_to_y_1280_rvv_i32:  40.0
rgb24_to_y_1920_c:       346.2
rgb24_to_y_1920_rvv_i32:  59.7
2024-06-08 18:30:43 +03:00
Ramiro Polla
5939f7228a libswscale/x86/yuv_2_rgb: fix some comments 2024-06-07 15:24:06 +02:00
Shiyou Yin
6b35fcacdb
swscale: [loongarch] Fix undeclared functions prob.
Compile with '--disable-lasx', ‘lumRangeFromJpeg_lasx’ undeclared.

Reviewed-by: 金波 <jinbo@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-31 02:20:23 +02:00
Michael Niedermayer
bfc22f364d
swscale/yuv2rgb: Use 64bit for brightness computation
This will not overflow for normal values
Fixes: CID1500280 Unintentional integer overflow

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-28 03:48:06 +02:00
Michael Niedermayer
3f9daf1c18
swscale/x86/swscale: use a clearer name for INPUT_PLANER_RGB_A_FUNC_CASE
related: CID1497114 Missing break in switch

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-28 03:48:06 +02:00
Rémi Denis-Courmont
6c6313f1b5 swscale/riscv: explicitly require Zbb for MIN 2024-05-10 18:59:06 +03:00
Michael Niedermayer
1330a73cca
swscale/output: Fix integer overflow in yuv2rgba64_full_1_c_template()
Fixes: signed integer overflow: -1082982400 + -1079364728 cannot be represented in type 'int'
Fixes: 67910/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-5329011971522560
The input is 9bit in 16bit, the fuzzer fills all 16bit thus generating "invalid" input
No overflow should happen with valid input.

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-06 03:00:40 +02:00
Michael Niedermayer
a56559e688
swscale/output: Fix integer overflow in yuv2rgba64_1_c_template
Fixes: signed integer overflow: -831176 * 9539 cannot be represented in type 'int'
Fixes: 67869/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-5117342091640832

The input is 9bit in 16bit, the fuzzer fills all 16bit thus generating "invalid" input
No overflow should happen with valid input.

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-06 03:00:40 +02:00
Shiyou Yin
2a7d622ddd
swscale: [LA] Optimize swscale funcs in input.c
Optimized 7 funcs with LSX and LASX:
1. yuy2ToUV_c
2. yvy2ToUV_c
3. uyvyToUV_c
4. nv12ToUV_c
5. nv21ToUV_c
6. abgrToA_c
7. rgbaToA_c

Reviewed-by: colleague of Shiyou Yin
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-04-11 23:53:59 +02:00
Shiyou Yin
8b76df9142
swscale: [LA] Optimize yuv2plane1_8_c.
Reviewed-by: colleague of Shiyou Yin
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-04-11 23:53:59 +02:00
Shiyou Yin
f3fe2cb5f7
swscale: [LA] Optimize range convert for yuvj420p.
Reviewed-by: 陈昊 <chenhao@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-04-11 23:53:41 +02:00
Michael Niedermayer
1a9eda65d0
swscale/utils: Fix xInc overflow
Fixes: signed integer overflow: 2 * 1073741824 cannot be represented in type 'int'
Fixes: 67802/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-6249515855183872

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-04-04 19:38:29 +02:00
Andreas Rheinhardt
428ff7bd8c swscale/ppc/swscale_ppc_template: Reindent after the previous commit
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-04-04 16:47:21 +02:00
Andreas Rheinhardt
95b4aea5e3 swscale/ppc/swscale_ppc_template: Remove code not passing checkasm
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-04-04 16:45:23 +02:00
Andreas Rheinhardt
790f793844 avutil/common: Don't auto-include mem.h
There are lots of files that don't need it: The number of object
files that actually need it went down from 2011 to 884 here.

Keep it for external users in order to not cause breakages.

Also improve the other headers a bit while just at it.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-31 00:08:43 +01:00
Andreas Rheinhardt
b616be1649 lib*/version: Use static_assert for static asserts
Also update the checks that guard against inserting
a new enum entry in the middle of a range.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-31 00:08:42 +01:00
Andreas Rheinhardt
2d38141ea6 swscale/swscale_internal: Don't export internal function
sws_alloc_set_opts() can actually be made internal to utils.c.
This commit does so.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-31 00:08:42 +01:00
Andreas Rheinhardt
ad1cef04a9 swscale/swscale_internal: Hoist branch out of loop
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-31 00:08:42 +01:00
Andreas Rheinhardt
b49e621c83 swscale/ppc/swscale_altivec: Simplify macro
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-31 00:08:42 +01:00
Andreas Rheinhardt
72f4f1dafb swscale/ppc/swscale_altivec: Fix build with -O0
In this case GCC does not treat a const variable initialized
to the compile-time constant "3" as a compile-time constant
and errors out because the argument is not a literal value.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-31 00:08:42 +01:00
Andreas Rheinhardt
4b44b5eaf0 swscale/swscale_internal: Only include altivec header iff HAVE_ALTIVEC
Reviewed-by: Sean McGovern <gseanmcg@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-30 05:04:41 +01:00
Michael Niedermayer
6b213175c9
Bump after 7.0 branch point
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-27 01:04:54 +01:00
Michael Niedermayer
872980ace6
Bump prior release/7.0 branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-03-27 01:04:53 +01:00
Henrik Gramner
c3d3f0e697 avutil/x86util: Fix broken pre-SSE4.1 PMINSD emulation
Fixes yadif-16 which allows FATE to pass.

Broken since 2904db9045 (2017).
2024-03-17 13:52:27 +01:00
James Almer
783d00b203 libs: bump major version for all libraries
Signed-off-by: James Almer <jamrial@gmail.com>
2024-03-07 11:29:43 -03:00
Michael Niedermayer
e9cc9e492f
libswscale/utils: Fix bayer to yuvj
Fixes: out of array access.

Earlier code assumes that a unscaled bayer to yuvj420 converter exists
but the later code then skips yuvj420

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-02-21 18:24:17 +01:00
Michael Niedermayer
f9906911f0
Revert "swscale: fix sws_setColorspaceDetails after sws_init_context"
Suggested by: Niklas Haas in Ticket10824

Fixes: Assertion failure
Fixes: Ticket10824

This reverts commit cedf589c09.
2024-02-21 18:24:17 +01:00
Michael Niedermayer
64098d0cd8
swscale/swscale: Check srcSliceH for bayer
Fixes: Assertion srcSliceH > 1 failed at libswscale/swscale_unscaled.c:1359
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-02-21 18:24:16 +01:00
Michael Niedermayer
18f26f8a2f
swscale/utils: Allocate more dithererror
Fixes: out of array read
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-02-21 18:24:16 +01:00
Michael Niedermayer
ebb7dffa97
swscale/tests/swscale: Add help text
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-02-15 23:07:44 +01:00
Michael Niedermayer
6ebe4ebee3
swscale/tests/swscale: Highlight cases that worsened
also highlight cases that worsened alot in uppercase

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-02-15 23:07:44 +01:00
Michael Niedermayer
f7770ec9a4
swscale/tests/swscale: Allow comparing a subset of cases to a reference file
Testing all cases exhaustively is slow

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-02-15 23:07:44 +01:00
Michael Niedermayer
885a802f24
swscale/tests/swscale: Test a wider range of flag combinations
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-02-15 23:07:43 +01:00
Michael Niedermayer
35ab103c30
swscale/tests/swscale: Compute chroma and alpha between gray and opaque frames too
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-02-15 23:07:43 +01:00
Michael Niedermayer
247f485448
swscale/tests/swscale: Split sws_getContext()
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-02-15 23:07:43 +01:00
Michael Niedermayer
1055ece30b
swscale/tests/swscale: Implement isALPHA() using AVPixFmtDescriptor
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-02-15 23:07:42 +01:00
Anton Khirnov
1e7d2007c3 all: use designated initializers for AVOption.unit
Makes it robust against adding fields before it, which will be useful in
following commits.

Majority of the patch generated by the following Coccinelle script:

@@
typedef AVOption;
identifier arr_name;
initializer list il;
initializer list[8] il1;
expression tail;
@@
AVOption arr_name[] = { il, { il1,
- tail
+ .unit = tail
}, ...  };

with some manual changes, as the script:
* has trouble with options defined inside macros
* sometimes does not handle options under an #else branch
* sometimes swallows whitespace
2024-02-14 14:53:41 +01:00
Rémi Denis-Courmont
b3825bbe45 riscv: test for assembler support
This should fix the build on LLVM 16 and earlier, at the cost of turning
all non-RVV optimisations off.
2023-12-08 17:21:09 +02:00
Alfred Wingate
e5ce473040 swscale/x86/rgb_2_rgb: Add opaque pointer to missed definitions of ff_nv12ToUV
Opaque parameters were previously added to the original definition of
ff_nv12ToUV, leading to gcc noticing a type mismatch with -Wlto-type-mismatch.

f2de911818
https://bugs.gentoo.org/907484

Signed-off-by: Alfred Wingate <parona@protonmail.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2023-12-02 11:22:46 +01:00
xufuji456
cc86343b96 lavc/hevcdsp_qpel_neon: using movi.16b instead of movi.2d
Building iOS platform with arm64, the compiler has a warning: "instruction movi.2d with immediate #0 may not function correctly on this CPU, converting to movi.16b"

Signed-off-by: xufuji456 <839789740@qq.com>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-11-28 15:54:49 +02:00
Rémi Denis-Courmont
6d60cc7baf sws/rgb2rgb: fix unaligned accesses in R-V V YUYV to I422p
In my personal opinion, we should not need to support unaligned YUY2
pixel maps. They should always be aligned to at least 32 bits, and the
current code assumes just 16 bits. However checkasm does test for
unaligned input bitmaps. QEMU accepts it, but real hardware dose not.

In this particular case, we can at the same time improve performance and
handle unaligned inputs, so do just that.

uyvytoyuv422_c:      104379.0
uyvytoyuv422_c:      104060.0
uyvytoyuv422_rvv_i32: 25284.0 (before)
uyvytoyuv422_rvv_i32: 19303.2 (after)
2023-11-13 18:34:29 +02:00
Rémi Denis-Courmont
5b8b5ec9c5 sws/rgb2rgb: rework R-V V YUY2 to 4:2:2 planar
This saves three scratch registers and three instructions per line. The
performance gains are mostly negligible. The main point is to free up
registers for further rework.
2023-11-13 18:34:29 +02:00
Niklas Haas
736284e7b9 swscale/yuv2rgb: fix sws_getCoefficients for colorspace=0
The documentation states that invalid entries default to SWS_CS_DEFAULT.
A value of 0 is not a valid SWS_CS_*, yet the code incorrectly
hard-codes it to BT.709 coefficients instead of SWS_CS_DEFAULT.
2023-11-09 12:53:35 +01:00
Niklas Haas
d043e5c54c swscale: don't omit ff_sws_init_range_convert for high-bit
This was a complete hack seemingly designed to work around a different
bug, which was fixed in the previous commit. As such, there is no more
reason not to do this, as it simply breaks changing color range in
sws_setColorspaceDetails for no reason.
2023-11-09 12:53:35 +01:00
Niklas Haas
cedf589c09 swscale: fix sws_setColorspaceDetails after sws_init_context
More commonly, this fixes the case of sws_setColorspaceDetails after
sws_getContext, since the latter implies sws_init_context.

The problem here is that sws_init_context sets up the range conversion
and fast path tables based on the values of srcRange/dstRange at init
time. This may result in locking in a "wrong" path (either using
unscaled fast path when range conversion later required, or using
scaled slow path when range conversion becomes no longer required).

There are two way outs:

1. Always initialize range conversion and unscaled converters, even if
   they will be unused, and extend the runtime check.
2. Re-do initialization if the values change after
   sws_setColorspaceDetails.

I opted for approach 1 because it was simpler and easier to reason
about.

Reword the av_log message to make it clear that this special converter
is not necessarily used, depending on whether or not there is range
conversion or YUV matrix conversion going on.
2023-11-09 12:53:35 +01:00
Michael Niedermayer
47e784f881
Bump versions after 6.1
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-10-29 16:19:14 +01:00
Michael Niedermayer
9d3a7d30c4
Bump versions prior to 6.1
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2023-10-29 15:34:05 +01:00
Martin Storsjö
a76b409dd0 aarch64: Reindent all assembly to 8/24 column indentation
libavcodec/aarch64/vc1dsp_neon.S is skipped here, as it intentionally
uses a layered indentation style to visually show how different
unrolled/interleaved phases fit together.

Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-21 23:25:54 +03:00
Martin Storsjö
93cda5a9c2 aarch64: Lowercase UXTW/SXTW and similar flags
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-21 23:25:23 +03:00
Martin Storsjö
184103b310 aarch64: Consistently use lowercase for vector element specifiers
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-10-21 23:25:18 +03:00
Rémi Denis-Courmont
19baf4e009 swscale/rgb2rgb: R-V V deinterleaveBytes 2023-10-03 22:53:20 +03:00
Rémi Denis-Courmont
ede3215115 swscale/rgb2rgb: fix extra iteration in R-V V interleave
There was an additional iteration doing nothing for each line,
due to checking the selected vector length instead of the available
vector length.
2023-10-03 22:53:20 +03:00
Rémi Denis-Courmont
d14130aea3 swscale/rgb2rgb: unroll R-V V interleave_bytes 2023-10-03 20:48:47 +03:00
Rémi Denis-Courmont
6269c4a440 swscale/rgb2rgb: unroll RISC-V V uyvytoyuv422 2023-10-03 20:48:39 +03:00
Rémi Denis-Courmont
e50f8e861b swscale/rgb2rgb: avoid S-regs in RISC-V V uyvytoyuv422
We can make do with callee-clobbered registers only now.
As an added bonus, this makes the code XLEN-independent.
2023-10-03 20:48:39 +03:00
Rémi Denis-Courmont
be37a2e364 swscale/rgb2rgb: rework RISC-V V uyvytoyuv422
This avoids using relatively slow register strides.
2023-10-03 20:48:39 +03:00
Rémi Denis-Courmont
1a4bd76ea5 swscale/rgb2rgb: remove R-V V shuffle_bytes_3012
This is slower than the Zbb version on real hardware due to register
strides. Proper support for vector byte-swap requires the Zvbb
extension, but it's much too early for me to worry about it.
2023-10-02 22:28:38 +03:00
Rémi Denis-Courmont
c4a144c29d swscale/rgb2rgb: add R-V Zbb shuffle_bytes_3210 2023-10-02 22:28:25 +03:00
Paul B Mahol
29b673bdcf swscale: add GBRAP14 format support 2023-09-28 19:37:58 +02:00
Andreas Rheinhardt
f8503b4c33 avutil/internal: Don't auto-include emms.h
Instead include emms.h wherever it is needed.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2023-09-04 11:04:45 +02:00
L. E. Segovia
ddc1cd5cdd configure: Set WIN32_LEAN_AND_MEAN at configure time
Including winsock2.h or windows.h without WIN32_LEAN_AND_MEAN cause
bzlib.h to parse as nonsense, due to an instance of #define char small
in rpcndr.h.

See:

https://stackoverflow.com/a/27794577

Signed-off-by: L. E. Segovia <amy@amyspark.me>
Signed-off-by: Martin Storsjö <martin@martin.st>
2023-08-14 22:57:28 +03:00
Rémi Denis-Courmont
c2b38619c0 swscale/rgb2rgb2: rework RISC-V V shuffle_bytes_{1230,3012}
This avoids strided loads.

Before:
shuffle_bytes_1230_rvv_i32: 308.7
shuffle_bytes_3012_rvv_i32: 308.7

After:
shuffle_bytes_1230_rvv_i32: 46.7
shuffle_bytes_3012_rvv_i32: 46.7
2023-07-21 22:18:02 +03:00
Rémi Denis-Courmont
15982554e6 swscale/rgb2rgb2: rework RISC-V V shuffle_bytes_{0321,2103}
This avoids strided loads.

Before:
shuffle_bytes_0321_rvv_i32: 307.7
shuffle_bytes_2103_rvv_i32: 308.7

After:
shuffle_bytes_0321_rvv_i32: 59.7
shuffle_bytes_2103_rvv_i32: 61.5
2023-07-21 22:18:02 +03:00