FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-26 19:01:44 +02:00

Author	SHA1	Message	Date
Niklas Haas	4ec45aca36	swscale/utils: fix leak on threaded ctx init failure This count gets incremented after init succeeds, when it should be incremented after alloc succeeds. Otherwise, we leak the context on failure. There are no negative consequences of incrementing for allocated-but-not-initialized contexts, as the only functions that reference it will, in the worst case, simply behave as if called on allocated-but-not-initialized contexts, which is in line with expected behavior when sws_init_context() fails.	2024-07-14 13:48:59 +02:00
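The ordering issue described in this commit can be sketched in isolation. The following is a minimal illustration, not the swscale code: `Parent`, `Child`, `child_init` and the other names are hypothetical. It shows why counting a child as soon as its allocation succeeds lets the cleanup path free a child whose initialisation later fails.

```c
#include <stdlib.h>

#define MAX_CHILDREN 4

/* Hypothetical types; they only mimic a parent that owns per-thread children. */
typedef struct Child  { int ready; } Child;
typedef struct Parent { Child *children[MAX_CHILDREN]; int nb_children; } Parent;

static int freed_children; /* demo counter so the cleanup can be observed */

/* Stand-in for an init step that may fail after allocation succeeded. */
static int child_init(Child *c, int fail)
{
    if (fail)
        return -1;
    c->ready = 1;
    return 0;
}

/* Frees exactly the children that were counted. */
static void parent_free_children(Parent *p)
{
    for (int i = 0; i < p->nb_children; i++) {
        free(p->children[i]);
        p->children[i] = NULL;
        freed_children++;
    }
    p->nb_children = 0;
}

/* fail_at selects which child fails to initialise (-1 for none). */
static int parent_init_children(Parent *p, int n, int fail_at)
{
    if (n > MAX_CHILDREN)
        return -1;
    for (int i = 0; i < n; i++) {
        Child *c = calloc(1, sizeof(*c));
        if (!c)
            return -1;
        p->children[i] = c;
        p->nb_children++;          /* count as soon as the allocation succeeds */
        if (child_init(c, i == fail_at) < 0)
            return -1;             /* caller frees; the failed child is counted */
    }
    return 0;
}
```

If the increment were placed after `child_init()` instead, the child that failed to initialise would never be visited by `parent_free_children()`.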
Sean McGovern	34b4ca8696	swscale: prevent undefined behaviour in the PUTRGBA macro For even small values of 'asrc[x]', shifting them by 24 bits or more will cause arithmetic overflow and be caught by GCC's undefined behaviour sanitizer. Ensure the values do not overflow by up-casting the bracketed expressions involving 'asrc' to uint32_t. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-07-10 18:10:10 +02:00
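The kind of overflow this commit describes can be shown with a small sketch. The helper below is hypothetical and is not the PUTRGBA macro itself. It assumes a 32-bit `int`, in which case an 8-bit value of 128 or more promoted to `int` and shifted left by 24 no longer fits; casting to `uint32_t` before the shift keeps the operation well defined.

```c
#include <stdint.h>

/* Hypothetical helper: places an 8-bit alpha value in bits 24..31 of a pixel. */
static inline uint32_t pack_alpha_shift24(uint8_t a, uint32_t rgb)
{
    /* Without the cast, `a` is promoted to int, and `a << 24` overflows a
     * 32-bit int for a >= 128. The cast makes the shift unsigned. */
    return rgb + ((uint32_t)a << 24);
}
```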
Ramiro Polla	ac6263945a	swscale/x86/yuv2rgb: Detemplatize Every function in yuv2rgb_template.c is only compiled exactly once, so detemplatize it.	2024-07-10 12:25:32 +02:00
Ramiro Polla	4f7f9b1026	swscale: remove unconditional #define DITHER1XBPP This seems to have had a use in the past, but it is now defined unconditionally.	2024-07-10 12:25:03 +02:00
Michael Niedermayer	66b60bae68	swscale/swscale: Use ptrdiff_t for linesize computations This is unlikely to make a difference Fixes: CID1591896 Unintentional integer overflow Fixes: CID1591901 Unintentional integer overflow Sponsored-by: Sovereign Tech Fund Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-07-07 23:36:30 +02:00
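A minimal sketch of the pattern named in this commit title, assuming a hypothetical `row_ptr` helper rather than the actual swscale code: the row index is converted to `ptrdiff_t` before multiplying by the line size, so the product is computed in pointer-difference width instead of `int`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: address of row y in a plane with the given stride.
 * linesize may be negative for bottom-up layouts. */
static inline const uint8_t *row_ptr(const uint8_t *base, int y, int linesize)
{
    /* Widen one operand first so the multiplication is not done in int. */
    ptrdiff_t offset = (ptrdiff_t)y * linesize;
    return base + offset;
}
```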
Zhao Zhili	4d90a76986	swscale/aarch64: Add argb/abgr to yuv Test on Apple M1 with kperf: : -O3 : -O3 -fno-vectorize abgr_to_uv_8_c : 19.4 : 26.1 abgr_to_uv_8_neon : 29.9 : 51.1 abgr_to_uv_128_c : 146.4 : 558.9 abgr_to_uv_128_neon : 85.1 : 83.4 abgr_to_uv_1080_c : 1162.6 : 4786.4 abgr_to_uv_1080_neon : 819.6 : 826.6 abgr_to_uv_1920_c : 2063.6 : 8492.1 abgr_to_uv_1920_neon : 1435.1 : 1447.1 abgr_to_uv_half_8_c : 16.4 : 11.4 abgr_to_uv_half_8_neon : 35.6 : 20.4 abgr_to_uv_half_128_c : 108.6 : 359.4 abgr_to_uv_half_128_neon : 75.4 : 42.6 abgr_to_uv_half_1080_c : 883.4 : 2885.6 abgr_to_uv_half_1080_neon : 460.6 : 481.1 abgr_to_uv_half_1920_c : 1553.6 : 5106.9 abgr_to_uv_half_1920_neon : 817.6 : 820.4 abgr_to_y_8_c : 6.1 : 26.4 abgr_to_y_8_neon : 40.6 : 6.4 abgr_to_y_128_c : 99.9 : 390.1 abgr_to_y_128_neon : 67.4 : 55.9 abgr_to_y_1080_c : 735.9 : 3170.4 abgr_to_y_1080_neon : 534.6 : 536.6 abgr_to_y_1920_c : 1279.4 : 6016.4 abgr_to_y_1920_neon : 932.6 : 927.6 Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2024-07-05 16:32:31 +08:00
Zhao Zhili	52422133ae	swscale/aarch64: Add bgra/rgba to yuv Test on Apple M1 with kperf : -O3 : -O3 -fno-vectorize bgra_to_uv_8_c : 13.4 : 27.5 bgra_to_uv_8_neon : 37.4 : 41.7 bgra_to_uv_128_c : 155.9 : 550.2 bgra_to_uv_128_neon : 91.7 : 92.7 bgra_to_uv_1080_c : 1173.2 : 4558.2 bgra_to_uv_1080_neon : 822.7 : 809.5 bgra_to_uv_1920_c : 2078.2 : 8115.2 bgra_to_uv_1920_neon : 1437.7 : 1438.7 bgra_to_uv_half_8_c : 17.9 : 14.2 bgra_to_uv_half_8_neon : 37.4 : 10.5 bgra_to_uv_half_128_c : 103.9 : 326.0 bgra_to_uv_half_128_neon : 73.9 : 68.7 bgra_to_uv_half_1080_c : 850.2 : 3732.0 bgra_to_uv_half_1080_neon : 484.2 : 490.0 bgra_to_uv_half_1920_c : 1479.2 : 4942.7 bgra_to_uv_half_1920_neon : 824.2 : 824.7 bgra_to_y_8_c : 8.2 : 29.5 bgra_to_y_8_neon : 18.2 : 32.7 bgra_to_y_128_c : 101.4 : 361.5 bgra_to_y_128_neon : 74.9 : 73.7 bgra_to_y_1080_c : 739.4 : 3018.0 bgra_to_y_1080_neon : 613.4 : 544.2 bgra_to_y_1920_c : 1298.7 : 5326.0 bgra_to_y_1920_neon : 918.7 : 934.2 Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2024-07-05 16:32:31 +08:00
Zhao Zhili	b8b71be07a	swscale/aarch64: Add bgr24 to yuv Test on Apple M1 with kperf : -O3 : -O3 -fno-vectorize bgr24_to_uv_8_c : 28.5 : 52.5 bgr24_to_uv_8_neon : 54.5 : 59.7 bgr24_to_uv_128_c : 294.0 : 830.7 bgr24_to_uv_128_neon : 99.7 : 112.0 bgr24_to_uv_1080_c : 965.0 : 6624.0 bgr24_to_uv_1080_neon : 751.5 : 754.7 bgr24_to_uv_1920_c : 1693.2 : 11554.5 bgr24_to_uv_1920_neon : 1292.5 : 1307.5 bgr24_to_uv_half_8_c : 54.2 : 37.0 bgr24_to_uv_half_8_neon : 27.2 : 22.5 bgr24_to_uv_half_128_c : 127.2 : 392.5 bgr24_to_uv_half_128_neon : 63.0 : 52.0 bgr24_to_uv_half_1080_c : 880.2 : 3329.0 bgr24_to_uv_half_1080_neon : 401.5 : 390.7 bgr24_to_uv_half_1920_c : 1585.7 : 6390.7 bgr24_to_uv_half_1920_neon : 694.7 : 698.7 bgr24_to_y_8_c : 21.7 : 22.5 bgr24_to_y_8_neon : 797.2 : 25.5 bgr24_to_y_128_c : 88.0 : 280.5 bgr24_to_y_128_neon : 63.7 : 55.0 bgr24_to_y_1080_c : 616.7 : 2208.7 bgr24_to_y_1080_neon : 900.0 : 452.0 bgr24_to_y_1920_c : 1093.2 : 3894.7 bgr24_to_y_1920_neon : 777.2 : 767.5 Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2024-07-05 16:32:31 +08:00
Ramiro Polla	61e851381f	swscale/yuv2rgb/x86: remove mmx/mmxext yuv2rgb functions These functions are either slower or barely faster than the C LUT yuv2rgb code.	2024-07-04 11:12:47 +02:00
Michael Niedermayer	c221c7422f	swscale/output: Avoid undefined overflow in yuv2rgb_write_full() Fixes: signed integer overflow: -140140 * 16525 cannot be represented in type 'int' Fixes: 68859/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-4516387130245120 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-06-26 20:49:36 +02:00
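The product quoted in this commit message, -140140 * 16525, is outside the range of a 32-bit `int`. The log does not show how the commit itself avoids the overflow, so the following is only one possible approach under that caveat: a hypothetical helper that widens to `int64_t` before multiplying.

```c
#include <stdint.h>

/* Hypothetical helper: multiplies two int values in 64-bit arithmetic so the
 * intermediate result cannot overflow. */
static inline int64_t mul_wide(int a, int b)
{
    return (int64_t)a * b;
}
```

For the values in the message, `mul_wide(-140140, 16525)` yields -2315813500, which is below the minimum 32-bit `int` value of -2147483648.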
Michael Niedermayer	9e6c5b6e86	swscale/output: alpha can become negative after scaling, use multiply Fixes: left shift of negative value -3245 Fixes: 69047/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-6571511551950848 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-06-26 20:49:36 +02:00
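Left-shifting a negative signed value is undefined behaviour in C, which is what the sanitizer report in this commit refers to. A minimal sketch of the replacement named in the title, using a hypothetical helper: multiply by the corresponding power of two instead of shifting.

```c
/* Hypothetical helper: scales v by 2^bits without shifting a negative value.
 * Assumes bits < 31 and that the product fits in int. */
static inline int scale_by_pow2(int v, unsigned bits)
{
    return v * (1 << bits);
}
```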
Ramiro Polla	e37a93031e	swscale/yuv2rgb: reindent after previous commit	2024-06-24 13:35:22 +02:00
Ramiro Polla	0a08c64588	swscale/yuv2rgb: fix yuv422p input in C code The C code was silently ignoring the second chroma line on yuv422p input.	2024-06-24 13:34:53 +02:00
Ramiro Polla	fb8fae864f	swscale/yuv2rgb: add macros to simplify code generation	2024-06-24 13:34:28 +02:00
Ramiro Polla	88a402df74	swscale/yuv2rgb: fix conversion for widths not aligned to 8 The C code for some pixel formats (rgb555, rgb565, rgb444, and monob) was not converting the last pixels on widths not aligned to 8. NOTE: the last pixel for odd widths is still not converted for any of the pixel formats in the C code for yuv2rgb except for monob.	2024-06-24 13:33:53 +02:00
Ramiro Polla	75f1a8e071	swscale/aarch64: add neon {lum,chr}ConvertRange chrRangeFromJpeg_8_c: 29.2 chrRangeFromJpeg_8_neon: 19.5 chrRangeFromJpeg_24_c: 80.5 chrRangeFromJpeg_24_neon: 34.0 chrRangeFromJpeg_128_c: 413.7 chrRangeFromJpeg_128_neon: 156.0 chrRangeFromJpeg_144_c: 471.0 chrRangeFromJpeg_144_neon: 174.2 chrRangeFromJpeg_256_c: 842.0 chrRangeFromJpeg_256_neon: 305.5 chrRangeFromJpeg_512_c: 1699.0 chrRangeFromJpeg_512_neon: 608.0 chrRangeToJpeg_8_c: 51.7 chrRangeToJpeg_8_neon: 22.7 chrRangeToJpeg_24_c: 149.7 chrRangeToJpeg_24_neon: 38.0 chrRangeToJpeg_128_c: 761.7 chrRangeToJpeg_128_neon: 176.7 chrRangeToJpeg_144_c: 866.2 chrRangeToJpeg_144_neon: 198.7 chrRangeToJpeg_256_c: 1516.5 chrRangeToJpeg_256_neon: 348.7 chrRangeToJpeg_512_c: 3067.2 chrRangeToJpeg_512_neon: 692.7 lumRangeFromJpeg_8_c: 24.0 lumRangeFromJpeg_8_neon: 17.0 lumRangeFromJpeg_24_c: 56.7 lumRangeFromJpeg_24_neon: 21.0 lumRangeFromJpeg_128_c: 294.5 lumRangeFromJpeg_128_neon: 76.7 lumRangeFromJpeg_144_c: 332.5 lumRangeFromJpeg_144_neon: 86.7 lumRangeFromJpeg_256_c: 586.0 lumRangeFromJpeg_256_neon: 152.2 lumRangeFromJpeg_512_c: 1190.0 lumRangeFromJpeg_512_neon: 298.0 lumRangeToJpeg_8_c: 31.7 lumRangeToJpeg_8_neon: 19.5 lumRangeToJpeg_24_c: 83.5 lumRangeToJpeg_24_neon: 24.2 lumRangeToJpeg_128_c: 440.5 lumRangeToJpeg_128_neon: 91.0 lumRangeToJpeg_144_c: 504.2 lumRangeToJpeg_144_neon: 101.0 lumRangeToJpeg_256_c: 879.7 lumRangeToJpeg_256_neon: 177.2 lumRangeToJpeg_512_c: 1794.2 lumRangeToJpeg_512_neon: 354.0	2024-06-18 23:12:41 +02:00
James Almer	fcf72966a5	swscale/x86/range_convert: add missing AVX2 preprocessor wrapper Fixes compilation with old yasm. Signed-off-by: James Almer <jamrial@gmail.com>	2024-06-16 10:09:38 -03:00
James Almer	8a4c9d6bd3	swscale/x86/range_convert: reduce amount of xmm regs clobbered in luma functions Signed-off-by: James Almer <jamrial@gmail.com>	2024-06-15 21:02:06 -03:00
Ramiro Polla	f6859cade3	swscale/x86: add sse2 and avx2 {lum,chr}ConvertRange chrRangeFromJpeg_8_c: 22.3 chrRangeFromJpeg_8_sse2: 13.3 chrRangeFromJpeg_8_avx2: 13.3 chrRangeFromJpeg_24_c: 72.8 chrRangeFromJpeg_24_sse2: 22.3 chrRangeFromJpeg_24_avx2: 17.5 chrRangeFromJpeg_128_c: 345.5 chrRangeFromJpeg_128_sse2: 106.0 chrRangeFromJpeg_128_avx2: 57.8 chrRangeFromJpeg_144_c: 380.5 chrRangeFromJpeg_144_sse2: 118.5 chrRangeFromJpeg_144_avx2: 62.3 chrRangeFromJpeg_256_c: 646.3 chrRangeFromJpeg_256_sse2: 218.8 chrRangeFromJpeg_256_avx2: 109.0 chrRangeFromJpeg_512_c: 1461.5 chrRangeFromJpeg_512_sse2: 426.5 chrRangeFromJpeg_512_avx2: 211.5 chrRangeToJpeg_8_c: 37.8 chrRangeToJpeg_8_sse2: 10.5 chrRangeToJpeg_8_avx2: 14.0 chrRangeToJpeg_24_c: 114.3 chrRangeToJpeg_24_sse2: 23.5 chrRangeToJpeg_24_avx2: 16.3 chrRangeToJpeg_128_c: 633.5 chrRangeToJpeg_128_sse2: 107.5 chrRangeToJpeg_128_avx2: 55.0 chrRangeToJpeg_144_c: 758.3 chrRangeToJpeg_144_sse2: 132.0 chrRangeToJpeg_144_avx2: 64.5 chrRangeToJpeg_256_c: 1345.0 chrRangeToJpeg_256_sse2: 218.0 chrRangeToJpeg_256_avx2: 105.3 chrRangeToJpeg_512_c: 2524.0 chrRangeToJpeg_512_sse2: 417.0 chrRangeToJpeg_512_avx2: 218.8 lumRangeFromJpeg_8_c: 11.8 lumRangeFromJpeg_8_sse2: 11.0 lumRangeFromJpeg_8_avx2: 10.3 lumRangeFromJpeg_24_c: 38.5 lumRangeFromJpeg_24_sse2: 15.5 lumRangeFromJpeg_24_avx2: 12.5 lumRangeFromJpeg_128_c: 232.3 lumRangeFromJpeg_128_sse2: 60.0 lumRangeFromJpeg_128_avx2: 26.8 lumRangeFromJpeg_144_c: 259.5 lumRangeFromJpeg_144_sse2: 65.3 lumRangeFromJpeg_144_avx2: 29.0 lumRangeFromJpeg_256_c: 464.5 lumRangeFromJpeg_256_sse2: 107.5 lumRangeFromJpeg_256_avx2: 54.0 lumRangeFromJpeg_512_c: 897.5 lumRangeFromJpeg_512_sse2: 224.5 lumRangeFromJpeg_512_avx2: 109.8 lumRangeToJpeg_8_c: 17.8 lumRangeToJpeg_8_sse2: 11.0 lumRangeToJpeg_8_avx2: 11.8 lumRangeToJpeg_24_c: 56.3 lumRangeToJpeg_24_sse2: 11.0 lumRangeToJpeg_24_avx2: 12.5 lumRangeToJpeg_128_c: 333.8 lumRangeToJpeg_128_sse2: 53.3 lumRangeToJpeg_128_avx2: 26.5 lumRangeToJpeg_144_c: 375.5 lumRangeToJpeg_144_sse2: 60.8 lumRangeToJpeg_144_avx2: 29.0 lumRangeToJpeg_256_c: 652.0 lumRangeToJpeg_256_sse2: 109.5 lumRangeToJpeg_256_avx2: 53.5 lumRangeToJpeg_512_c: 1284.3 lumRangeToJpeg_512_sse2: 218.0 lumRangeToJpeg_512_avx2: 108.3	2024-06-16 00:35:51 +02:00
Rémi Denis-Courmont	378d1b06c3	riscv: probe for Zbb extension at load time Due to hysterical raisins, most RISC-V Linux distributions target a RV64GC baseline excluding the Bit-manipulation ISA extensions, most notably: - Zba: address generation extension and - Zbb: basic bit manipulation extension. Most CPUs that would make sense to run FFmpeg on support Zba and Zbb (including the current FATE runner), so it makes sense to optimise for them. In fact a large chunk of existing assembler optimisations relies on Zba and/or Zbb. Since we cannot patch shared library code, the next best thing is to carry a flag initialised at load-time and check it as needed. This results in 3 instructions overhead on isolated use, e.g.: 1: AUIPC rd, %pcrel_hi(ff_rv_zbb_supported) LBU rd, %pcrel_lo(1b)(rd) BEQZ rd, non_Zbb_fallback_code // Zbb code here The C compiler will typically load the flag ahead of time to reduce latency, and can also keep it around if Zbb is used multiple times in a single optimisation scope. For this to work, the flag symbol must be hidden; otherwise the optimisation degrades with a GOT look-up to support interposition: 1: AUIPC rd, GOT_OFFSET_HI LD rd, GOT_OFFSET_LO(rd) LBU rd, (rd) BEQZ rd, non_Zbb_fallback_code // Zbb code here This patch adds code to provision the flag in libraries using bit manipulation functions from libavutil: byte-swap, bit-weight and counting leading or trailing zeroes.	2024-06-11 20:12:37 +03:00
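The load-time flag pattern described in this commit can be sketched in portable C. This is an illustration only and not the libavutil code: `zbb_supported_flag`, `probe_zbb` and `ctz32` are hypothetical names, the probe is a placeholder that always reports "unsupported", and the `visibility`, `constructor` and `__builtin_ctz` features assume GCC or Clang.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the flag called ff_rv_zbb_supported in the commit
 * message. Hidden visibility keeps accesses PC-relative rather than going
 * through the GOT, as the message explains. */
__attribute__((visibility("hidden"))) bool zbb_supported_flag;

/* Placeholder probe (assumption): real code would query the CPU or the OS. */
static bool probe_zbb(void)
{
    return false;
}

/* Runs once when the library or program is loaded. */
__attribute__((constructor)) static void init_zbb_flag(void)
{
    zbb_supported_flag = probe_zbb();
}

/* Generic fallback: counts trailing zero bits; x must be nonzero. */
static int ctz32_generic(uint32_t x)
{
    int n = 0;
    while (!(x & 1)) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Checks the flag on each use; x must be nonzero. */
int ctz32(uint32_t x)
{
    if (zbb_supported_flag)
        return __builtin_ctz(x);   /* stands in for the Zbb-accelerated path */
    return ctz32_generic(x);
}
```

Both branches return the same result here; the point is only the shape of the check, which the commit message costs at roughly three instructions when the flag symbol is hidden.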
Rémi Denis-Courmont	417957ec5e	sws/range_convert: R-V V to/from JPEG C908 X60 chrRangeFromJpeg_8_c: 2.7 2.5 chrRangeFromJpeg_8_rvv_i32: 1.7 1.5 chrRangeFromJpeg_24_c: 7.5 6.7 chrRangeFromJpeg_24_rvv_i32: 1.7 1.5 chrRangeFromJpeg_128_c: 55.2 34.7 chrRangeFromJpeg_128_rvv_i32: 6.5 3.0 chrRangeFromJpeg_144_c: 44.0 39.2 chrRangeFromJpeg_144_rvv_i32: 7.7 4.5 chrRangeFromJpeg_256_c: 78.2 69.5 chrRangeFromJpeg_256_rvv_i32: 12.2 6.0 chrRangeFromJpeg_512_c: 172.2 138.5 chrRangeFromJpeg_512_rvv_i32: 24.5 11.7 chrRangeToJpeg_8_c: 4.7 4.2 chrRangeToJpeg_8_rvv_i32: 2.0 1.7 chrRangeToJpeg_24_c: 13.7 12.2 chrRangeToJpeg_24_rvv_i32: 2.0 1.5 chrRangeToJpeg_128_c: 72.0 63.7 chrRangeToJpeg_128_rvv_i32: 6.7 3.2 chrRangeToJpeg_144_c: 80.7 71.7 chrRangeToJpeg_144_rvv_i32: 8.5 4.7 chrRangeToJpeg_256_c: 143.2 127.2 chrRangeToJpeg_256_rvv_i32: 13.5 6.5 chrRangeToJpeg_512_c: 285.7 253.7 chrRangeToJpeg_512_rvv_i32: 27.0 13.0 lumRangeFromJpeg_8_c: 1.7 1.5 lumRangeFromJpeg_8_rvv_i32: 1.2 1.0 lumRangeFromJpeg_24_c: 4.2 3.7 lumRangeFromJpeg_24_rvv_i32: 1.2 1.0 lumRangeFromJpeg_128_c: 21.7 19.2 lumRangeFromJpeg_128_rvv_i32: 3.7 1.7 lumRangeFromJpeg_144_c: 24.7 22.0 lumRangeFromJpeg_144_rvv_i32: 4.7 2.7 lumRangeFromJpeg_256_c: 43.7 39.0 lumRangeFromJpeg_256_rvv_i32: 7.5 3.2 lumRangeFromJpeg_512_c: 87.0 77.2 lumRangeFromJpeg_512_rvv_i32: 14.5 6.7 lumRangeToJpeg_8_c: 2.7 2.2 lumRangeToJpeg_8_rvv_i32: 1.0 1.0 lumRangeToJpeg_24_c: 7.2 6.5 lumRangeToJpeg_24_rvv_i32: 1.2 1.0 lumRangeToJpeg_128_c: 37.7 33.7 lumRangeToJpeg_128_rvv_i32: 3.7 2.0 lumRangeToJpeg_144_c: 42.5 37.7 lumRangeToJpeg_144_rvv_i32: 4.7 2.7 lumRangeToJpeg_256_c: 75.0 66.7 lumRangeToJpeg_256_rvv_i32: 7.5 3.5 lumRangeToJpeg_512_c: 149.5 133.0 lumRangeToJpeg_512_rvv_i32: 14.7 7.0	2024-06-10 22:48:52 +03:00
Zhao Zhili	9dac8495b0	swscale/aarch64: Add rgb24 to yuv implementation Test on Apple M1: rgb24_to_uv_8_c: 0.0 rgb24_to_uv_8_neon: 0.2 rgb24_to_uv_128_c: 1.0 rgb24_to_uv_128_neon: 0.5 rgb24_to_uv_1080_c: 7.0 rgb24_to_uv_1080_neon: 5.7 rgb24_to_uv_1920_c: 12.5 rgb24_to_uv_1920_neon: 9.5 rgb24_to_uv_half_8_c: 0.2 rgb24_to_uv_half_8_neon: 0.2 rgb24_to_uv_half_128_c: 1.0 rgb24_to_uv_half_128_neon: 0.5 rgb24_to_uv_half_1080_c: 6.2 rgb24_to_uv_half_1080_neon: 3.0 rgb24_to_uv_half_1920_c: 11.2 rgb24_to_uv_half_1920_neon: 5.2 rgb24_to_y_8_c: 0.2 rgb24_to_y_8_neon: 0.0 rgb24_to_y_128_c: 0.5 rgb24_to_y_128_neon: 0.5 rgb24_to_y_1080_c: 4.7 rgb24_to_y_1080_neon: 3.2 rgb24_to_y_1920_c: 8.0 rgb24_to_y_1920_neon: 5.7 On Pixel 6: rgb24_to_uv_8_c: 30.7 rgb24_to_uv_8_neon: 56.9 rgb24_to_uv_128_c: 213.9 rgb24_to_uv_128_neon: 173.2 rgb24_to_uv_1080_c: 1649.9 rgb24_to_uv_1080_neon: 1424.4 rgb24_to_uv_1920_c: 2907.9 rgb24_to_uv_1920_neon: 2480.7 rgb24_to_uv_half_8_c: 36.2 rgb24_to_uv_half_8_neon: 33.4 rgb24_to_uv_half_128_c: 167.9 rgb24_to_uv_half_128_neon: 99.4 rgb24_to_uv_half_1080_c: 1293.9 rgb24_to_uv_half_1080_neon: 778.7 rgb24_to_uv_half_1920_c: 2292.7 rgb24_to_uv_half_1920_neon: 1328.7 rgb24_to_y_8_c: 19.7 rgb24_to_y_8_neon: 27.7 rgb24_to_y_128_c: 129.9 rgb24_to_y_128_neon: 96.7 rgb24_to_y_1080_c: 995.4 rgb24_to_y_1080_neon: 767.7 rgb24_to_y_1920_c: 1747.4 rgb24_to_y_1920_neon: 1337.2 Note both tests use clang as compiler, which has vectorization enabled by default with -O3. Reviewed-by: Rémi Denis-Courmont <remi@remlab.net> Reviewed-by: Martin Storsjö <martin@martin.st> Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2024-06-11 01:12:09 +08:00
James Almer	17c3cc5bb6	swscale/x86/rgb_2_rgb: add missing wrap to ff_uyvytoyuv422_avx2 Fixes old yasm. Signed-off-by: James Almer <jamrial@gmail.com>	2024-06-09 16:04:36 -03:00
James Almer	03546f49a3	swscale/x86/rgb2rgb: add missing wrap for ff_uyvytoyuv422_avx2 Signed-off-by: James Almer <jamrial@gmail.com>	2024-06-09 15:56:52 -03:00
James Almer	e8cef5e152	swscale/x86/rgb2rgb: remove mmxext version of shuffle_bytes_2103 Signed-off-by: James Almer <jamrial@gmail.com>	2024-06-09 13:43:11 -03:00
James Almer	c578bb9864	swscale/x86/input: add AVX2 optimized uyvytoyuv422 uyvytoyuv422_c: 23991.8 uyvytoyuv422_sse2: 2817.8 uyvytoyuv422_avx: 2819.3 uyvytoyuv422_avx2: 1972.3 Signed-off-by: James Almer <jamrial@gmail.com>	2024-06-09 13:43:11 -03:00
James Almer	e9cfd53257	swscale/x86/input: add AVX2 optimized RGB32 to YUV functions abgr_to_uv_8_c: 43.3 abgr_to_uv_8_sse2: 14.3 abgr_to_uv_8_avx: 15.3 abgr_to_uv_8_avx2: 18.8 abgr_to_uv_128_c: 650.3 abgr_to_uv_128_sse2: 110.8 abgr_to_uv_128_avx: 112.3 abgr_to_uv_128_avx2: 64.8 abgr_to_uv_1080_c: 5456.3 abgr_to_uv_1080_sse2: 888.8 abgr_to_uv_1080_avx: 900.8 abgr_to_uv_1080_avx2: 518.3 abgr_to_uv_1920_c: 9692.3 abgr_to_uv_1920_sse2: 1593.8 abgr_to_uv_1920_avx: 1613.3 abgr_to_uv_1920_avx2: 864.8 abgr_to_y_8_c: 23.3 abgr_to_y_8_sse2: 12.8 abgr_to_y_8_avx: 13.3 abgr_to_y_8_avx2: 17.3 abgr_to_y_128_c: 308.3 abgr_to_y_128_sse2: 67.3 abgr_to_y_128_avx: 66.8 abgr_to_y_128_avx2: 44.8 abgr_to_y_1080_c: 2371.3 abgr_to_y_1080_sse2: 512.8 abgr_to_y_1080_avx: 505.8 abgr_to_y_1080_avx2: 314.3 abgr_to_y_1920_c: 4177.3 abgr_to_y_1920_sse2: 915.8 abgr_to_y_1920_avx: 926.8 abgr_to_y_1920_avx2: 519.3 bgra_to_uv_8_c: 37.3 bgra_to_uv_8_sse2: 13.3 bgra_to_uv_8_avx: 14.8 bgra_to_uv_8_avx2: 19.8 bgra_to_uv_128_c: 563.8 bgra_to_uv_128_sse2: 111.3 bgra_to_uv_128_avx: 112.3 bgra_to_uv_128_avx2: 64.8 bgra_to_uv_1080_c: 4691.8 bgra_to_uv_1080_sse2: 893.8 bgra_to_uv_1080_avx: 899.8 bgra_to_uv_1080_avx2: 517.8 bgra_to_uv_1920_c: 8332.8 bgra_to_uv_1920_sse2: 1590.8 bgra_to_uv_1920_avx: 1605.8 bgra_to_uv_1920_avx2: 867.3 bgra_to_y_8_c: 22.3 bgra_to_y_8_sse2: 12.8 bgra_to_y_8_avx: 12.8 bgra_to_y_8_avx2: 17.3 bgra_to_y_128_c: 291.3 bgra_to_y_128_sse2: 67.8 bgra_to_y_128_avx: 69.3 bgra_to_y_128_avx2: 45.3 bgra_to_y_1080_c: 2357.3 bgra_to_y_1080_sse2: 508.3 bgra_to_y_1080_avx: 518.3 bgra_to_y_1080_avx2: 399.8 bgra_to_y_1920_c: 4202.8 bgra_to_y_1920_sse2: 906.8 bgra_to_y_1920_avx: 907.3 bgra_to_y_1920_avx2: 526.3 Signed-off-by: James Almer <jamrial@gmail.com>	2024-06-09 13:43:11 -03:00
James Almer	d5fe99dc5f	swscale/x86/input: add AVX2 optimized RGB24 to YUV functions rgb24_to_uv_8_c: 39.3 rgb24_to_uv_8_sse2: 14.3 rgb24_to_uv_8_ssse3: 13.3 rgb24_to_uv_8_avx: 12.8 rgb24_to_uv_8_avx2: 14.3 rgb24_to_uv_128_c: 582.8 rgb24_to_uv_128_sse2: 127.3 rgb24_to_uv_128_ssse3: 107.3 rgb24_to_uv_128_avx: 111.3 rgb24_to_uv_128_avx2: 62.3 rgb24_to_uv_1080_c: 4981.3 rgb24_to_uv_1080_sse2: 1048.3 rgb24_to_uv_1080_ssse3: 876.8 rgb24_to_uv_1080_avx: 887.8 rgb24_to_uv_1080_avx2: 492.3 rgb24_to_uv_1280_c: 5906.8 rgb24_to_uv_1280_sse2: 1263.3 rgb24_to_uv_1280_ssse3: 1048.3 rgb24_to_uv_1280_avx: 1045.8 rgb24_to_uv_1280_avx2: 579.8 rgb24_to_uv_1920_c: 8665.3 rgb24_to_uv_1920_sse2: 1888.8 rgb24_to_uv_1920_ssse3: 1571.8 rgb24_to_uv_1920_avx: 1558.8 rgb24_to_uv_1920_avx2: 869.3 rgb24_to_y_8_c: 20.3 rgb24_to_y_8_sse2: 11.8 rgb24_to_y_8_ssse3: 10.3 rgb24_to_y_8_avx: 10.3 rgb24_to_y_8_avx2: 10.8 rgb24_to_y_128_c: 284.8 rgb24_to_y_128_sse2: 83.3 rgb24_to_y_128_ssse3: 66.8 rgb24_to_y_128_avx: 64.8 rgb24_to_y_128_avx2: 39.3 rgb24_to_y_1080_c: 2451.3 rgb24_to_y_1080_sse2: 696.3 rgb24_to_y_1080_ssse3: 516.8 rgb24_to_y_1080_avx: 518.8 rgb24_to_y_1080_avx2: 301.8 rgb24_to_y_1280_c: 2892.8 rgb24_to_y_1280_sse2: 816.8 rgb24_to_y_1280_ssse3: 623.3 rgb24_to_y_1280_avx: 616.3 rgb24_to_y_1280_avx2: 350.8 rgb24_to_y_1920_c: 4338.8 rgb24_to_y_1920_sse2: 1210.8 rgb24_to_y_1920_ssse3: 928.3 rgb24_to_y_1920_avx: 920.3 rgb24_to_y_1920_avx2: 534.8 Signed-off-by: James Almer <jamrial@gmail.com>	2024-06-09 13:42:09 -03:00
Rémi Denis-Courmont	7a3369398f	sws/input: R-V V 32-bit RGB to halved UV T-Head C908: abgr_to_uv_half_8_c: 2.2 abgr_to_uv_half_8_rvv_i32: 3.5 abgr_to_uv_half_128_c: 44.0 abgr_to_uv_half_128_rvv_i32: 13.0 abgr_to_uv_half_1080_c: 245.0 abgr_to_uv_half_1080_rvv_i32: 107.2 abgr_to_uv_half_1920_c: 406.2 abgr_to_uv_half_1920_rvv_i32: 188.7 bgra_to_uv_half_8_c: 2.2 bgra_to_uv_half_8_rvv_i32: 3.5 bgra_to_uv_half_128_c: 26.5 bgra_to_uv_half_128_rvv_i32: 13.0 bgra_to_uv_half_1080_c: 219.7 bgra_to_uv_half_1080_rvv_i32: 107.0 bgra_to_uv_half_1920_c: 406.7 bgra_to_uv_half_1920_rvv_i32: 188.7 SpacemiT X60: abgr_to_uv_half_8_c: 2.2 abgr_to_uv_half_8_rvv_i32: 3.0 abgr_to_uv_half_128_c: 28.2 abgr_to_uv_half_128_rvv_i32: 5.7 abgr_to_uv_half_1080_c: 235.5 abgr_to_uv_half_1080_rvv_i32: 47.7 abgr_to_uv_half_1920_c: 418.2 abgr_to_uv_half_1920_rvv_i32: 84.0 bgra_to_uv_half_8_c: 2.0 bgra_to_uv_half_8_rvv_i32: 3.0 bgra_to_uv_half_128_c: 23.7 bgra_to_uv_half_128_rvv_i32: 5.7 bgra_to_uv_half_1080_c: 195.5 bgra_to_uv_half_1080_rvv_i32: 47.7 bgra_to_uv_half_1920_c: 346.5 bgra_to_uv_half_1920_rvv_i32: 84.0	2024-06-09 14:33:04 +03:00
Rémi Denis-Courmont	e2f069905e	sws/input: R-V V 32-bit RGB to UV	2024-06-09 14:33:04 +03:00
Rémi Denis-Courmont	f5555cb106	sws/input: R-V V 32-bit RGB to Y T-Head C908: abgr_to_y_8_c: 2.5 abgr_to_y_8_rvv_i32: 2.2 abgr_to_y_128_c: 37.0 abgr_to_y_128_rvv_i32: 8.5 abgr_to_y_1080_c: 327.0 abgr_to_y_1080_rvv_i32: 69.5 abgr_to_y_1920_c: 552.0 abgr_to_y_1920_rvv_i32: 122.2 bgra_to_y_8_c: 2.5 bgra_to_y_8_rvv_i32: 2.2 bgra_to_y_128_c: 37.2 bgra_to_y_128_rvv_i32: 8.5 bgra_to_y_1080_c: 310.2 bgra_to_y_1080_rvv_i32: 69.5 bgra_to_y_1920_c: 568.2 bgra_to_y_1920_rvv_i32: 122.5 SpacemiT X60: abgr_to_y_8_c: 2.5 abgr_to_y_8_rvv_i32: 2.0 abgr_to_y_128_c: 33.0 abgr_to_y_128_rvv_i32: 3.7 abgr_to_y_1080_c: 276.0 abgr_to_y_1080_rvv_i32: 31.5 abgr_to_y_1920_c: 493.7 abgr_to_y_1920_rvv_i32: 55.5 bgra_to_y_8_c: 2.2 bgra_to_y_8_rvv_i32: 2.0 bgra_to_y_128_c: 33.0 bgra_to_y_128_rvv_i32: 3.7 bgra_to_y_1080_c: 276.0 bgra_to_y_1080_rvv_i32: 31.5 bgra_to_y_1920_c: 490.7 bgra_to_y_1920_rvv_i32: 55.5	2024-06-09 14:33:04 +03:00
Andreas Rheinhardt	8b62fb231a	swscale/x86/rgb2rgb: Detemplatize Every function in rgb2rgb_template.c is only compiled exactly once; there is no overlap at all between the MMXEXT and the SSE2 functions, so detemplatize it. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-06-09 12:03:47 +02:00
Andreas Rheinhardt	5421dee0e7	swscale/x86/rgb2rgb_template: Remove unused uyvytoyv12 Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-06-09 12:03:47 +02:00
Andreas Rheinhardt	c1c35380a7	swscale/x86/rgb2rgb: Don't unnecessarily check for inline ASM The SSE2 and AVX versions of deinterleaveBytes are external ASM. Move them out of the inline ASM template. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-06-09 12:03:47 +02:00
Andreas Rheinhardt	f7305eb3b3	swscale/x86/rgb2rgb_template: Remove unnecessary SFENCE The ff_nv12ToUV_* functions don't use non-temporal stores at all. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-06-09 12:03:47 +02:00
Rémi Denis-Courmont	e0f4d185f1	sws/input: R-V V rgb24ToUV_half and bgr24ToUV_half T-Head C908: rgb24_to_uv_half_4_c: 2.0 rgb24_to_uv_half_4_rvv_i32: 3.5 rgb24_to_uv_half_64_c: 27.0 rgb24_to_uv_half_64_rvv_i32: 12.5 rgb24_to_uv_half_540_c: 223.7 rgb24_to_uv_half_540_rvv_i32: 105.2 rgb24_to_uv_half_640_c: 265.5 rgb24_to_uv_half_640_rvv_i32: 123.7 rgb24_to_uv_half_960_c: 414.5 rgb24_to_uv_half_960_rvv_i32: 249.5 SpacemiT X60: rgb24_to_uv_half_4_c: 1.7 rgb24_to_uv_half_4_rvv_i32: 4.2 rgb24_to_uv_half_64_c: 24.0 rgb24_to_uv_half_64_rvv_i32: 8.7 rgb24_to_uv_half_540_c: 199.2 rgb24_to_uv_half_540_rvv_i32: 72.5 rgb24_to_uv_half_640_c: 235.7 rgb24_to_uv_half_640_rvv_i32: 85.2 rgb24_to_uv_half_960_c: 353.5 rgb24_to_uv_half_960_rvv_i32: 127.5	2024-06-08 18:30:43 +03:00
Rémi Denis-Courmont	3ef5867e4b	sws/input: R-V V rgb24ToUV and bgr24ToUV T-Head C908: rgb24_to_uv_8_c: 2.7 rgb24_to_uv_8_rvv_i32: 3.2 rgb24_to_uv_128_c: 41.0 rgb24_to_uv_128_rvv_i32: 12.7 rgb24_to_uv_1080_c: 342.5 rgb24_to_uv_1080_rvv_i32: 105.7 rgb24_to_uv_1280_c: 406.0 rgb24_to_uv_1280_rvv_i32: 124.2 rgb24_to_uv_1920_c: 626.0 rgb24_to_uv_1920_rvv_i32: 186.0 SpacemiT X60: rgb24_to_uv_8_c: 2.5 rgb24_to_uv_8_rvv_i32: 3.0 rgb24_to_uv_128_c: 36.5 rgb24_to_uv_128_rvv_i32: 5.7 rgb24_to_uv_1080_c: 304.2 rgb24_to_uv_1080_rvv_i32: 49.0 rgb24_to_uv_1280_c: 360.5 rgb24_to_uv_1280_rvv_i32: 57.5 rgb24_to_uv_1920_c: 540.7 rgb24_to_uv_1920_rvv_i32: 86.2	2024-06-08 18:30:43 +03:00
Rémi Denis-Courmont	79dfdac4db	sws/input: R-V V rgb24ToY & bgr24ToY T-Head C908: rgb24_to_y_8_c: 2.0 rgb24_to_y_8_rvv_i32: 2.7 rgb24_to_y_128_c: 26.2 rgb24_to_y_128_rvv_i32: 9.2 rgb24_to_y_1080_c: 219.5 rgb24_to_y_1080_rvv_i32: 76.2 rgb24_to_y_1280_c: 276.2 rgb24_to_y_1280_rvv_i32: 89.7 rgb24_to_y_1920_c: 389.7 rgb24_to_y_1920_rvv_i32: 134.2 SpacemiT X60: rgb24_to_y_8_c: 1.7 rgb24_to_y_8_rvv_i32: 2.2 rgb24_to_y_128_c: 23.2 rgb24_to_y_128_rvv_i32: 4.2 rgb24_to_y_1080_c: 195.0 rgb24_to_y_1080_rvv_i32: 33.7 rgb24_to_y_1280_c: 231.0 rgb24_to_y_1280_rvv_i32: 40.0 rgb24_to_y_1920_c: 346.2 rgb24_to_y_1920_rvv_i32: 59.7	2024-06-08 18:30:43 +03:00
Ramiro Polla	5939f7228a	libswscale/x86/yuv_2_rgb: fix some comments	2024-06-07 15:24:06 +02:00
Shiyou Yin	6b35fcacdb	swscale: [loongarch] Fix undeclared functions prob. Compile with '--disable-lasx', ‘lumRangeFromJpeg_lasx’ undeclared. Reviewed-by: 金波 <jinbo@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-05-31 02:20:23 +02:00
Michael Niedermayer	bfc22f364d	swscale/yuv2rgb: Use 64bit for brightness computation This will not overflow for normal values Fixes: CID1500280 Unintentional integer overflow Sponsored-by: Sovereign Tech Fund Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-05-28 03:48:06 +02:00
Michael Niedermayer	3f9daf1c18	swscale/x86/swscale: use a clearer name for INPUT_PLANER_RGB_A_FUNC_CASE related: CID1497114 Missing break in switch Sponsored-by: Sovereign Tech Fund Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-05-28 03:48:06 +02:00
Rémi Denis-Courmont	6c6313f1b5	swscale/riscv: explicitly require Zbb for MIN	2024-05-10 18:59:06 +03:00
Michael Niedermayer	1330a73cca	swscale/output: Fix integer overflow in yuv2rgba64_full_1_c_template() Fixes: signed integer overflow: -1082982400 + -1079364728 cannot be represented in type 'int' Fixes: 67910/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-5329011971522560 The input is 9bit in 16bit, the fuzzer fills all 16bit thus generating "invalid" input No overflow should happen with valid input. Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-05-06 03:00:40 +02:00
Michael Niedermayer	a56559e688	swscale/output: Fix integer overflow in yuv2rgba64_1_c_template Fixes: signed integer overflow: -831176 * 9539 cannot be represented in type 'int' Fixes: 67869/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-5117342091640832 The input is 9bit in 16bit, the fuzzer fills all 16bit thus generating "invalid" input No overflow should happen with valid input. Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-05-06 03:00:40 +02:00
Shiyou Yin	2a7d622ddd	swscale: [LA] Optimize swscale funcs in input.c Optimized 7 funcs with LSX and LASX: 1. yuy2ToUV_c 2. yvy2ToUV_c 3. uyvyToUV_c 4. nv12ToUV_c 5. nv21ToUV_c 6. abgrToA_c 7. rgbaToA_c Reviewed-by: colleague of Shiyou Yin Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-04-11 23:53:59 +02:00
Shiyou Yin	8b76df9142	swscale: [LA] Optimize yuv2plane1_8_c. Reviewed-by: colleague of Shiyou Yin Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-04-11 23:53:59 +02:00
Shiyou Yin	f3fe2cb5f7	swscale: [LA] Optimize range convert for yuvj420p. Reviewed-by: 陈昊 <chenhao@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-04-11 23:53:41 +02:00
Michael Niedermayer	1a9eda65d0	swscale/utils: Fix xInc overflow Fixes: signed integer overflow: 2 * 1073741824 cannot be represented in type 'int' Fixes: 67802/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-6249515855183872 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-04-04 19:38:29 +02:00
Andreas Rheinhardt	428ff7bd8c	swscale/ppc/swscale_ppc_template: Reindent after the previous commit Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-04-04 16:47:21 +02:00
Andreas Rheinhardt	95b4aea5e3	swscale/ppc/swscale_ppc_template: Remove code not passing checkasm Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-04-04 16:45:23 +02:00
Andreas Rheinhardt	790f793844	avutil/common: Don't auto-include mem.h There are lots of files that don't need it: The number of object files that actually need it went down from 2011 to 884 here. Keep it for external users in order to not cause breakages. Also improve the other headers a bit while just at it. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-03-31 00:08:43 +01:00
Andreas Rheinhardt	b616be1649	lib*/version: Use static_assert for static asserts Also update the checks that guard against inserting a new enum entry in the middle of a range. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-03-31 00:08:42 +01:00
Andreas Rheinhardt	2d38141ea6	swscale/swscale_internal: Don't export internal function sws_alloc_set_opts() can actually be made internal to utils.c. This commit does so. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-03-31 00:08:42 +01:00
Andreas Rheinhardt	ad1cef04a9	swscale/swscale_internal: Hoist branch out of loop Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-03-31 00:08:42 +01:00
Andreas Rheinhardt	b49e621c83	swscale/ppc/swscale_altivec: Simplify macro Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-03-31 00:08:42 +01:00
Andreas Rheinhardt	72f4f1dafb	swscale/ppc/swscale_altivec: Fix build with -O0 In this case GCC does not treat a const variable initialized to the compile-time constant "3" as a compile-time constant and errors out because the argument is not a literal value. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-03-31 00:08:42 +01:00
Andreas Rheinhardt	4b44b5eaf0	swscale/swscale_internal: Only include altivec header iff HAVE_ALTIVEC Reviewed-by: Sean McGovern <gseanmcg@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-03-30 05:04:41 +01:00
Michael Niedermayer	6b213175c9	Bump after 7.0 branch point Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-27 01:04:54 +01:00
Michael Niedermayer	872980ace6	Bump prior release/7.0 branch Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-03-27 01:04:53 +01:00
Henrik Gramner	c3d3f0e697	avutil/x86util: Fix broken pre-SSE4.1 PMINSD emulation Fixes yadif-16 which allows FATE to pass. Broken since `2904db9045` (2017).	2024-03-17 13:52:27 +01:00
James Almer	783d00b203	libs: bump major version for all libraries Signed-off-by: James Almer <jamrial@gmail.com>	2024-03-07 11:29:43 -03:00
Michael Niedermayer	e9cc9e492f	libswscale/utils: Fix bayer to yuvj Fixes: out of array access. Earlier code assumes that an unscaled bayer to yuvj420 converter exists but the later code then skips yuvj420 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-02-21 18:24:17 +01:00
Michael Niedermayer	f9906911f0	Revert "swscale: fix sws_setColorspaceDetails after sws_init_context" Suggested by: Niklas Haas in Ticket10824 Fixes: Assertion failure Fixes: Ticket10824 This reverts commit `cedf589c09`.	2024-02-21 18:24:17 +01:00
Michael Niedermayer	64098d0cd8	swscale/swscale: Check srcSliceH for bayer Fixes: Assertion srcSliceH > 1 failed at libswscale/swscale_unscaled.c:1359 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-02-21 18:24:16 +01:00
Michael Niedermayer	18f26f8a2f	swscale/utils: Allocate more dithererror Fixes: out of array read Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-02-21 18:24:16 +01:00
Michael Niedermayer	ebb7dffa97	swscale/tests/swscale: Add help text Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-02-15 23:07:44 +01:00
Michael Niedermayer	6ebe4ebee3	swscale/tests/swscale: Highlight cases that worsened Also highlight cases that worsened a lot in uppercase Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-02-15 23:07:44 +01:00
Michael Niedermayer	f7770ec9a4	swscale/tests/swscale: Allow comparing a subset of cases to a reference file Testing all cases exhaustively is slow Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-02-15 23:07:44 +01:00
Michael Niedermayer	885a802f24	swscale/tests/swscale: Test a wider range of flag combinations Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-02-15 23:07:43 +01:00
Michael Niedermayer	35ab103c30	swscale/tests/swscale: Compute chroma and alpha between gray and opaque frames too Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-02-15 23:07:43 +01:00
Michael Niedermayer	247f485448	swscale/tests/swscale: Split sws_getContext() Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-02-15 23:07:43 +01:00
Michael Niedermayer	1055ece30b	swscale/tests/swscale: Implement isALPHA() using AVPixFmtDescriptor Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-02-15 23:07:42 +01:00
Anton Khirnov	1e7d2007c3	all: use designated initializers for AVOption.unit Makes it robust against adding fields before it, which will be useful in following commits. Majority of the patch generated by the following Coccinelle script: @@ typedef AVOption; identifier arr_name; initializer list il; initializer list[8] il1; expression tail; @@ AVOption arr_name[] = { il, { il1, - tail + .unit = tail }, ... }; with some manual changes, as the script: * has trouble with options defined inside macros * sometimes does not handle options under an #else branch * sometimes swallows whitespace	2024-02-14 14:53:41 +01:00
Rémi Denis-Courmont	b3825bbe45	riscv: test for assembler support This should fix the build on LLVM 16 and earlier, at the cost of turning all non-RVV optimisations off.	2023-12-08 17:21:09 +02:00
Alfred Wingate	e5ce473040	swscale/x86/rgb_2_rgb: Add opaque pointer to missed definitions of ff_nv12ToUV Opaque parameters were previously added to the original definition of ff_nv12ToUV, leading to gcc noticing a type mismatch with -Wlto-type-mismatch. `f2de911818` https://bugs.gentoo.org/907484 Signed-off-by: Alfred Wingate <parona@protonmail.com> Signed-off-by: Anton Khirnov <anton@khirnov.net>	2023-12-02 11:22:46 +01:00
xufuji456	cc86343b96	lavc/hevcdsp_qpel_neon: using movi.16b instead of movi.2d Building iOS platform with arm64, the compiler has a warning: "instruction movi.2d with immediate #0 may not function correctly on this CPU, converting to movi.16b" Signed-off-by: xufuji456 <839789740@qq.com> Signed-off-by: Martin Storsjö <martin@martin.st>	2023-11-28 15:54:49 +02:00
Rémi Denis-Courmont	6d60cc7baf	sws/rgb2rgb: fix unaligned accesses in R-V V YUYV to I422p In my personal opinion, we should not need to support unaligned YUY2 pixel maps. They should always be aligned to at least 32 bits, and the current code assumes just 16 bits. However checkasm does test for unaligned input bitmaps. QEMU accepts it, but real hardware does not. In this particular case, we can at the same time improve performance and handle unaligned inputs, so do just that. uyvytoyuv422_c: 104379.0 uyvytoyuv422_c: 104060.0 uyvytoyuv422_rvv_i32: 25284.0 (before) uyvytoyuv422_rvv_i32: 19303.2 (after)	2023-11-13 18:34:29 +02:00
Rémi Denis-Courmont	5b8b5ec9c5	sws/rgb2rgb: rework R-V V YUY2 to 4:2:2 planar This saves three scratch registers and three instructions per line. The performance gains are mostly negligible. The main point is to free up registers for further rework.	2023-11-13 18:34:29 +02:00
Niklas Haas	736284e7b9	swscale/yuv2rgb: fix sws_getCoefficients for colorspace=0 The documentation states that invalid entries default to SWS_CS_DEFAULT. A value of 0 is not a valid SWS_CS_*, yet the code incorrectly hard-codes it to BT.709 coefficients instead of SWS_CS_DEFAULT.	2023-11-09 12:53:35 +01:00
Niklas Haas	d043e5c54c	swscale: don't omit ff_sws_init_range_convert for high-bit This was a complete hack seemingly designed to work around a different bug, which was fixed in the previous commit. As such, there is no more reason not to do this, as it simply breaks changing color range in sws_setColorspaceDetails for no reason.	2023-11-09 12:53:35 +01:00
Niklas Haas	cedf589c09	swscale: fix sws_setColorspaceDetails after sws_init_context More commonly, this fixes the case of sws_setColorspaceDetails after sws_getContext, since the latter implies sws_init_context. The problem here is that sws_init_context sets up the range conversion and fast path tables based on the values of srcRange/dstRange at init time. This may result in locking in a "wrong" path (either using the unscaled fast path when range conversion is later required, or using the scaled slow path when range conversion is no longer required). There are two ways out: 1. Always initialize range conversion and unscaled converters, even if they will be unused, and extend the runtime check. 2. Re-do initialization if the values change after sws_setColorspaceDetails. I opted for approach 1 because it was simpler and easier to reason about. Reword the av_log message to make it clear that this special converter is not necessarily used, depending on whether or not there is range conversion or YUV matrix conversion going on.	2023-11-09 12:53:35 +01:00
Michael Niedermayer	47e784f881	Bump versions after 6.1 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-10-29 16:19:14 +01:00
Michael Niedermayer	9d3a7d30c4	Bump versions prior to 6.1 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2023-10-29 15:34:05 +01:00
Martin Storsjö	a76b409dd0	aarch64: Reindent all assembly to 8/24 column indentation libavcodec/aarch64/vc1dsp_neon.S is skipped here, as it intentionally uses a layered indentation style to visually show how different unrolled/interleaved phases fit together. Signed-off-by: Martin Storsjö <martin@martin.st>	2023-10-21 23:25:54 +03:00
Martin Storsjö	93cda5a9c2	aarch64: Lowercase UXTW/SXTW and similar flags Signed-off-by: Martin Storsjö <martin@martin.st>	2023-10-21 23:25:23 +03:00
Martin Storsjö	184103b310	aarch64: Consistently use lowercase for vector element specifiers Signed-off-by: Martin Storsjö <martin@martin.st>	2023-10-21 23:25:18 +03:00
Rémi Denis-Courmont	19baf4e009	swscale/rgb2rgb: R-V V deinterleaveBytes	2023-10-03 22:53:20 +03:00
Rémi Denis-Courmont	ede3215115	swscale/rgb2rgb: fix extra iteration in R-V V interleave There was an additional iteration doing nothing for each line, due to checking the selected vector length instead of the available vector length.	2023-10-03 22:53:20 +03:00
Rémi Denis-Courmont	d14130aea3	swscale/rgb2rgb: unroll R-V V interleave_bytes	2023-10-03 20:48:47 +03:00
Rémi Denis-Courmont	6269c4a440	swscale/rgb2rgb: unroll RISC-V V uyvytoyuv422	2023-10-03 20:48:39 +03:00
Rémi Denis-Courmont	e50f8e861b	swscale/rgb2rgb: avoid S-regs in RISC-V V uyvytoyuv422 We can make do with callee-clobbered registers only now. As an added bonus, this makes the code XLEN-independent.	2023-10-03 20:48:39 +03:00
Rémi Denis-Courmont	be37a2e364	swscale/rgb2rgb: rework RISC-V V uyvytoyuv422 This avoids using relatively slow register strides.	2023-10-03 20:48:39 +03:00
Rémi Denis-Courmont	1a4bd76ea5	swscale/rgb2rgb: remove R-V V shuffle_bytes_3012 This is slower than the Zbb version on real hardware due to register strides. Proper support for vector byte-swap requires the Zvbb extension, but it's much too early for me to worry about it.	2023-10-02 22:28:38 +03:00
Rémi Denis-Courmont	c4a144c29d	swscale/rgb2rgb: add R-V Zbb shuffle_bytes_3210	2023-10-02 22:28:25 +03:00
Paul B Mahol	29b673bdcf	swscale: add GBRAP14 format support	2023-09-28 19:37:58 +02:00
Andreas Rheinhardt	f8503b4c33	avutil/internal: Don't auto-include emms.h Instead include emms.h wherever it is needed. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2023-09-04 11:04:45 +02:00
L. E. Segovia	ddc1cd5cdd	configure: Set WIN32_LEAN_AND_MEAN at configure time Including winsock2.h or windows.h without WIN32_LEAN_AND_MEAN causes bzlib.h to parse as nonsense, due to an instance of #define small char in rpcndr.h. See: https://stackoverflow.com/a/27794577 Signed-off-by: L. E. Segovia <amy@amyspark.me> Signed-off-by: Martin Storsjö <martin@martin.st>	2023-08-14 22:57:28 +03:00
Rémi Denis-Courmont	c2b38619c0	swscale/rgb2rgb2: rework RISC-V V shuffle_bytes_{1230,3012} This avoids strided loads. Before: shuffle_bytes_1230_rvv_i32: 308.7 shuffle_bytes_3012_rvv_i32: 308.7 After: shuffle_bytes_1230_rvv_i32: 46.7 shuffle_bytes_3012_rvv_i32: 46.7	2023-07-21 22:18:02 +03:00
Rémi Denis-Courmont	15982554e6	swscale/rgb2rgb2: rework RISC-V V shuffle_bytes_{0321,2103} This avoids strided loads. Before: shuffle_bytes_0321_rvv_i32: 307.7 shuffle_bytes_2103_rvv_i32: 308.7 After: shuffle_bytes_0321_rvv_i32: 59.7 shuffle_bytes_2103_rvv_i32: 61.5	2023-07-21 22:18:02 +03:00

2629 Commits