FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-21 10:55:51 +02:00

Author	SHA1	Message	Date
James Almer	04612351ab	swscale/input: add V30X input support Signed-off-by: James Almer <jamrial@gmail.com>	2024-10-08 22:26:07 -03:00
James Almer	ea05edc9e0	swscale/input: add VYU444 input support Signed-off-by: James Almer <jamrial@gmail.com>	2024-10-08 22:24:47 -03:00
James Almer	ec7f5e314d	swscale/input: add UYVA input support Signed-off-by: James Almer <jamrial@gmail.com>	2024-10-08 22:24:47 -03:00
James Almer	bb37d3c33e	swscale/input: add AYUV input support Signed-off-by: James Almer <jamrial@gmail.com>	2024-10-08 22:24:47 -03:00
jinbo	e6ecc1e757	swscale: Fix conflicting types for loongarch Build breaks after `c1a0e65763` Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-10-09 01:55:50 +02:00
Niklas Haas	477445722c	swscale/ppc: fix altivec build failure Fixes: `c1a0e65763`	2024-10-08 16:45:36 +02:00
Martin Storsjö	b9145fcab2	swscale: Fix aarch64 and i386 compilation failures This unbreaks builds after `c1a0e65763`, which broke with errors like src/libswscale/aarch64/rgb2rgb.c:66:25: error: incompatible function pointer types assigning to 'void ()(const uint8_t , uint8_t , uint8_t , uint8_t , int, int, int, int, int, const int32_t )' (aka 'void ()(const unsigned char , unsigned char , unsigned char , unsigned char , int, int, int, int, int, const int )') from 'void (const uint8_t , uint8_t , uint8_t , uint8_t , int, int, int, int, int, int32_t )' (aka 'void (const unsigned char , unsigned char , unsigned char , unsigned char , int, int, int, int, int, int )') [-Wincompatible-function-pointer-types] 66 \| ff_rgb24toyv12 = rgb24toyv12; \| ^ ~~~~~~~~~~~ and src/libswscale/aarch64/swscale_unscaled.c:213:29: error: incompatible function pointer types assigning to 'SwsFunc' (aka 'int ()(struct SwsContext , const unsigned char const , const int , int, int, unsigned char const , const int )') from 'int (SwsContext , const uint8_t const , const int , int, int, const uint8_t *, const int )' (aka 'int (struct SwsContext , const unsigned char const , const int , int, int, const unsigned char *, const int )') [-Wincompatible-function-pointer-types] 213 \| c->convert_unscaled = nv24_to_yuv420p_neon_wrapper; \| ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Signed-off-by: Martin Storsjö <martin@martin.st>	2024-10-08 09:29:07 +03:00
Niklas Haas	73b3344edd	swscale/input: parametrize ff_sws_init_input_funcs() pointers Following the precedent set by ff_sws_init_output_funcs(). Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2024-10-07 19:51:34 +02:00
Niklas Haas	20b350b284	swscale/internal: add typedefs for input reading functions Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2024-10-07 19:51:34 +02:00
Niklas Haas	b90d522d2c	swscale/internal: forward typedef SwsContext Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2024-10-07 19:51:34 +02:00
Niklas Haas	c1a0e65763	swscale/internal: constify SwsFunc I want to move away from having random leaf processing functions mutate plane pointers, and while we're at it, we might as well make the strides and tables const as well. Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2024-10-07 19:51:34 +02:00
Niklas Haas	286bdc9cdc	swscale/internal: turn cascaded_tmp into an array Slightly more convenient to access from the new wrapping code. Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2024-10-07 19:51:34 +02:00
Niklas Haas	61369484f6	swscale/internal: expose ff_update_palette() internally Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2024-10-07 19:51:34 +02:00
Niklas Haas	aee19ee431	swscale/internal: rename NB_SWS_DITHER for consistency Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2024-10-07 19:51:34 +02:00
Niklas Haas	41ce370b65	tests/swscale: fix minor typos Sponsored-by: Sovereign Tech Fund Signed-off-by: Niklas Haas <git@haasn.dev>	2024-10-07 19:51:34 +02:00
Michael Niedermayer	38e224c2ba	*/version.h: bump after release/7.1 branch Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-09-24 17:10:35 +02:00
Michael Niedermayer	e1094ac45d	*/version.h: bump minor versions for release/7.1 Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-09-24 17:07:30 +02:00
Zhao Zhili	e18b46d95f	swscale/aarch64: Fix rgb24toyv12 only works with aligned width Since `c0666d8b`, rgb24toyv12 is broken for width non-aligned to 16. Add a simple wrapper to handle the non-aligned part. Co-authored-by: johzzy <hellojinqiang@gmail.com> Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2024-09-24 10:24:14 +08:00
Michael Niedermayer	bd80c97391	swscale/output: Fix undefined integer overflow in yuv2rgba64_2_c_template() Fixes: signed integer overflow: -1082982400 + -1083218484 cannot be represented in type 'int' Fixes: 70657/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-6707819712675840 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-09-19 00:24:26 +02:00
Michael Niedermayer	44c5641ae8	swscale/swscale: Use unsigned operation to avoid undefined behavior I have not checked that the constant is correct, this just fixes the undefined behavior Fixes: signed integer overflow: -646656 * 3517 cannot be represented in type 'int Fixes: 70559/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-5209368631508992 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-09-19 00:10:38 +02:00
Ramiro Polla	c0666d8bed	swscale/aarch64/rgb2rgb: add neon implementation for rgb24toyv12 A55 A76 rgb24toyv12_16_200_c: 36890.6 17275.5 rgb24toyv12_16_200_neon: 12460.1 ( 2.96x) 5360.8 ( 3.22x) rgb24toyv12_128_60_c: 83205.1 39884.8 rgb24toyv12_128_60_neon: 27468.4 ( 3.03x) 13552.5 ( 2.94x) rgb24toyv12_512_16_c: 88111.6 42346.8 rgb24toyv12_512_16_neon: 29126.6 ( 3.03x) 14411.2 ( 2.94x) rgb24toyv12_1920_4_c: 82068.1 39620.0 rgb24toyv12_1920_4_neon: 27011.6 ( 3.04x) 13492.2 ( 2.94x)	2024-09-06 23:11:13 +02:00
Ramiro Polla	caaec2ea95	swscale/x86/rgb2rgb: disable rgb24toyv12_mmxext for x86_64 The mmxext implementation is slower than the C version in x86_64. m32 m64 rgb24toyv12_16_200_c: 24942.7 14812.6 rgb24toyv12_16_200_mmxext: 17857.2 ( 1.40x) 17400.4 ( 0.85x) rgb24toyv12_128_60_c: 56892.9 35616.9 rgb24toyv12_128_60_mmxext: 40730.9 ( 1.40x) 39610.4 ( 0.90x) rgb24toyv12_512_16_c: 58402.7 37209.4 rgb24toyv12_512_16_mmxext: 44842.4 ( 1.30x) 41136.2 ( 0.90x) rgb24toyv12_1920_4_c: 54827.4 34737.4 rgb24toyv12_1920_4_mmxext: 51169.9 ( 1.07x) 34818.9 ( 1.00x)	2024-09-06 23:06:38 +02:00
Ramiro Polla	3604b2403c	swscale/rgb2rgb: improve chroma conversion in ff_rgb24toyv12_c The current code subsamples by dropping 3/4 pixels to calculate the chroma components. This commit calculates the average of 4 rgb pixels before calculating the chroma components, putting it in line with the mmxext implementation.	2024-09-06 23:06:32 +02:00
Ramiro Polla	d8848325a6	swscale/aarch64/rgb2rgb: add deinterleaveBytes neon implementation A55 A76 deinterleave_bytes_c: 70342.0 34497.5 deinterleave_bytes_neon: 21594.5 ( 3.26x) 5535.2 ( 6.23x) deinterleave_bytes_aligned_c: 71340.8 34651.2 deinterleave_bytes_aligned_neon: 8616.8 ( 8.28x) 3996.2 ( 8.67x)	2024-09-06 23:05:09 +02:00
Ramiro Polla	4c824ad391	swscale/x86/rgb2rgb: fix deinterleaveBytes writing past the end of the buffers	2024-09-06 23:05:04 +02:00
Ramiro Polla	f17a6bd200	swscale/x86/rgb2rgb: fix deinterleaveBytes for unaligned dst pointers	2024-09-06 23:05:01 +02:00
Rémi Denis-Courmont	27d28b68da	swscale/rgb2rgb: enable R-V V deinterleaveBytes T-Head C908: deinterleave_bytes_c: 100328.3 ( 1.00x) deinterleave_bytes_rvv_i32: 19331.3 ( 5.19x) deinterleave_bytes_aligned_c: 100337.5 ( 1.00x) deinterleave_bytes_aligned_rvv_i32: 15748.0 ( 6.37x) SpacemiT X60: deinterleave_bytes_c: 95230.6 ( 1.00x) deinterleave_bytes_rvv_i32: 9790.3 ( 9.73x) deinterleave_bytes_aligned_c: 96564.1 ( 1.00x) deinterleave_bytes_aligned_rvv_i32: 7780.1 (12.41x)	2024-09-04 22:04:11 +03:00
Ramiro Polla	420d443600	swscale/aarch64: cosmetics fix (spaces inside curly braces)	2024-08-26 11:07:49 +02:00
Ramiro Polla	52887683e9	swscale/aarch64: add nv24/nv42 to yuv420p unscaled converter A55 A76 nv24_yuv420p_128_c: 4956.1 1267.0 nv24_yuv420p_128_neon: 3109.1 ( 1.59x) 640.0 ( 1.98x) nv24_yuv420p_1920_c: 35728.4 11736.2 nv24_yuv420p_1920_neon: 8011.1 ( 4.46x) 2436.0 ( 4.82x) nv42_yuv420p_128_c: 4956.4 1270.5 nv42_yuv420p_128_neon: 3074.6 ( 1.61x) 639.5 ( 1.99x) nv42_yuv420p_1920_c: 35685.9 11732.5 nv42_yuv420p_1920_neon: 7995.1 ( 4.46x) 2437.2 ( 4.81x)	2024-08-26 11:04:46 +02:00
Ramiro Polla	88a563ad18	swscale: export ff_copyPlane so it may be used by simd code	2024-08-26 11:04:46 +02:00
Ramiro Polla	4eb5594295	swscale: add nv24/nv42 to yuv420p unscaled converter	2024-08-26 11:04:46 +02:00
Martin Storsjö	cfe0a36352	libswscale: aarch64: Fix the indentation of some macro invocations Signed-off-by: Martin Storsjö <martin@martin.st>	2024-08-22 14:40:30 +03:00
Martin Storsjö	507c2a5774	libswscale: arm: Don't assume aligned output in yuv2rgb functions This fixes failures in recently added checkasm tests. While the buffers in most cases are aligned, libswscale in general can't assume the output to be aligned. Signed-off-by: Martin Storsjö <martin@martin.st>	2024-08-19 23:04:52 +03:00
Ramiro Polla	181cd260db	swscale/aarch64/yuv2rgb: add neon yuv42{0,2}p -> gbrp unscaled colorspace converters checkasm --bench on a Raspberry Pi 5 Model B Rev 1.0: yuv420p_gbrp_128_c: 1243.0 yuv420p_gbrp_128_neon: 453.5 yuv420p_gbrp_1920_c: 18165.5 yuv420p_gbrp_1920_neon: 6700.0 yuv422p_gbrp_128_c: 1463.5 yuv422p_gbrp_128_neon: 471.5 yuv422p_gbrp_1920_c: 21343.7 yuv422p_gbrp_1920_neon: 6743.5	2024-08-18 22:26:17 +02:00
Ramiro Polla	8744764a4c	swscale/x86/yuv2rgb: add ssse3 yuv42{0,2}p -> gbrp unscaled colorspace converters Note: this implementation is limited to x86_64 due to general purpose register pressure. checkasm --bench on an Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz: yuv420p_gbrp_8_c: 118.5 yuv420p_gbrp_8_ssse3: 93.3 yuv420p_gbrp_128_c: 1068.3 yuv420p_gbrp_128_ssse3: 319.3 yuv420p_gbrp_1080_c: 8841.8 yuv420p_gbrp_1080_ssse3: 2211.8 yuv420p_gbrp_1920_c: 15903.8 yuv420p_gbrp_1920_ssse3: 3814.3 yuv422p_gbrp_8_c: 144.8 yuv422p_gbrp_8_ssse3: 93.8 yuv422p_gbrp_128_c: 1395.8 yuv422p_gbrp_128_ssse3: 313.0 yuv422p_gbrp_1080_c: 11551.5 yuv422p_gbrp_1080_ssse3: 2240.8 yuv422p_gbrp_1920_c: 20585.3 yuv422p_gbrp_1920_ssse3: 5249.5 yuva420p_gbrp_8_c: 117.5 yuva420p_gbrp_8_ssse3: 92.0 yuva420p_gbrp_128_c: 1593.0 yuva420p_gbrp_128_ssse3: 319.3 yuva420p_gbrp_1080_c: 8694.5 yuva420p_gbrp_1080_ssse3: 2186.0 yuva420p_gbrp_1920_c: 15946.5 yuva420p_gbrp_1920_ssse3: 3805.3	2024-08-18 22:26:14 +02:00
Ramiro Polla	4545205a26	swscale/yuv2rgb: add yuv42{0,2}p -> gbrp unscaled colorspace converters	2024-08-18 22:26:11 +02:00
Ramiro Polla	af5adf57e3	swscale/yuv2rgb: prepare YUV2RGBFUNC macro for multi-planar rgb This will be used in the upcoming yuv42{0,2}p -> gbrp unscaled colorspace converters. There is no difference in performance.	2024-08-18 22:26:08 +02:00
Ramiro Polla	24063e7827	swscale/yuv2rgb: prepare LOADCHROMA/PUTFUNC macros for multi-planar rgb This will be used in the upcoming yuv42{0,2}p -> gbrp unscaled colorspace converters. There is no difference in performance.	2024-08-18 22:26:05 +02:00
Niklas Haas	6b40be941a	swscale/options: relax src/dst_h/v_chr_pos value range When dealing with 4x subsampling ratios (log2 == 2), such as can arise with 4:1:1 or 4:1:0, a value range of 512 is not enough to cover the range of possible scenarios. For example, bottom-sited chroma in 4:1:0 would require an offset of 768 (three luma rows). Simply double the limit to 1024. I don't see any place in initFilter() that would experience overflow as a result of this change, especially since get_local_pos() right-shifts it by the subsampling ratio again.	2024-08-16 11:43:37 +02:00
Niklas Haas	3e064f52eb	swscale: document SWS_FULL_CHR_H_* flags Based on my best understanding of what they do, given the source code.	2024-08-16 11:43:37 +02:00
James Almer	66592e8b10	swscale/output: don't leave the alpha channel undefined in vuyx and xv36le It's non-determistic, as shown by poisoning avfilter buffers instead of zeroing them. Signed-off-by: James Almer <jamrial@gmail.com>	2024-08-13 14:49:41 -03:00
Rémi Denis-Courmont	210877c5fd	sws/riscv: depend on RVB and simplify accordingly	2024-08-05 21:16:26 +03:00
Rémi Denis-Courmont	bd0c3edb13	lavu/riscv: count bytes rather than words for bswap32 This removes the dependency on Zba at essentially zero cost.	2024-07-30 18:41:51 +03:00
Shiyou Yin	4713a5cc24	swscale: [loongarch] Fix checkasm-sw_yuv2rgb failure. Reviewed-by: 陈昊 <chenhao@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-07-28 19:02:16 +02:00
Rémi Denis-Courmont	4f2472909e	sws/riscv: add forward-edge CFI landing pads	2024-07-25 23:10:14 +03:00
Rémi Denis-Courmont	e91a8cc4de	sws/riscv: require B or zba explicitly	2024-07-25 18:55:48 +03:00
Michael Niedermayer	bcab9789ef	swscale/output: Fix integer overflows in yuv2rgba64_X_c_template Fixes: signed integer overflow: -1082982400 + -1068681048 cannot be represented in type 'int' Fixes: 69995/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-6285740271534080 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-07-21 15:35:08 +02:00
Niklas Haas	4ec45aca36	swscale/utils: fix leak on threaded ctx init failure This count gets incremented after init succeeds, when it should be incremented after alloc succeeds. Otherwise, we leak the context on failure. There are no negative consequences of incrementing for allocated-but-not-initialized contexts, as the only functions that reference it will, in the worst case, simply behave as if called on allocated-but-not-initialized contexts, which is in line with expected behavior when sws_init_context() fails.	2024-07-14 13:48:59 +02:00
Sean McGovern	34b4ca8696	swscale: prevent undefined behaviour in the PUTRGBA macro For even small values of 'asrc[x]', shifting them by 24 bits or more will cause arithmetic overflow and be caught by GCC's undefined behaviour sanitizer. Ensure the values do not overflow by up-casting the bracketed expressions involving 'asrc' to uint32_t. Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-07-10 18:10:10 +02:00
Ramiro Polla	ac6263945a	swscale/x86/yuv2rgb: Detemplatize Every function in yuv2rgb_template.c is only compiled exactly once, so detemplatize it.	2024-07-10 12:25:32 +02:00
Ramiro Polla	4f7f9b1026	swscale: remove unconditional #define DITHER1XBPP This seems to have had an use in the past, but it is now defined unconditionally.	2024-07-10 12:25:03 +02:00
Michael Niedermayer	66b60bae68	swscale/swscale: Use ptrdiff_t for linesize computations This is unlikely to make a difference Fixes: CID1591896 Unintentional integer overflow Fixes: CID1591901 Unintentional integer overflow Sponsored-by: Sovereign Tech Fund Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-07-07 23:36:30 +02:00
Zhao Zhili	4d90a76986	swscale/aarch64: Add argb/abgr to yuv Test on Apple M1 with kperf: : -O3 : -O3 -fno-vectorize abgr_to_uv_8_c : 19.4 : 26.1 abgr_to_uv_8_neon : 29.9 : 51.1 abgr_to_uv_128_c : 146.4 : 558.9 abgr_to_uv_128_neon : 85.1 : 83.4 abgr_to_uv_1080_c : 1162.6 : 4786.4 abgr_to_uv_1080_neon : 819.6 : 826.6 abgr_to_uv_1920_c : 2063.6 : 8492.1 abgr_to_uv_1920_neon : 1435.1 : 1447.1 abgr_to_uv_half_8_c : 16.4 : 11.4 abgr_to_uv_half_8_neon : 35.6 : 20.4 abgr_to_uv_half_128_c : 108.6 : 359.4 abgr_to_uv_half_128_neon : 75.4 : 42.6 abgr_to_uv_half_1080_c : 883.4 : 2885.6 abgr_to_uv_half_1080_neon : 460.6 : 481.1 abgr_to_uv_half_1920_c : 1553.6 : 5106.9 abgr_to_uv_half_1920_neon : 817.6 : 820.4 abgr_to_y_8_c : 6.1 : 26.4 abgr_to_y_8_neon : 40.6 : 6.4 abgr_to_y_128_c : 99.9 : 390.1 abgr_to_y_128_neon : 67.4 : 55.9 abgr_to_y_1080_c : 735.9 : 3170.4 abgr_to_y_1080_neon : 534.6 : 536.6 abgr_to_y_1920_c : 1279.4 : 6016.4 abgr_to_y_1920_neon : 932.6 : 927.6 Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2024-07-05 16:32:31 +08:00
Zhao Zhili	52422133ae	swscale/aarch64: Add bgra/rgba to yuv Test on Apple M1 with kperf : -O3 : -O3 -fno-vectorize bgra_to_uv_8_c : 13.4 : 27.5 bgra_to_uv_8_neon : 37.4 : 41.7 bgra_to_uv_128_c : 155.9 : 550.2 bgra_to_uv_128_neon : 91.7 : 92.7 bgra_to_uv_1080_c : 1173.2 : 4558.2 bgra_to_uv_1080_neon : 822.7 : 809.5 bgra_to_uv_1920_c : 2078.2 : 8115.2 bgra_to_uv_1920_neon : 1437.7 : 1438.7 bgra_to_uv_half_8_c : 17.9 : 14.2 bgra_to_uv_half_8_neon : 37.4 : 10.5 bgra_to_uv_half_128_c : 103.9 : 326.0 bgra_to_uv_half_128_neon : 73.9 : 68.7 bgra_to_uv_half_1080_c : 850.2 : 3732.0 bgra_to_uv_half_1080_neon : 484.2 : 490.0 bgra_to_uv_half_1920_c : 1479.2 : 4942.7 bgra_to_uv_half_1920_neon : 824.2 : 824.7 bgra_to_y_8_c : 8.2 : 29.5 bgra_to_y_8_neon : 18.2 : 32.7 bgra_to_y_128_c : 101.4 : 361.5 bgra_to_y_128_neon : 74.9 : 73.7 bgra_to_y_1080_c : 739.4 : 3018.0 bgra_to_y_1080_neon : 613.4 : 544.2 bgra_to_y_1920_c : 1298.7 : 5326.0 bgra_to_y_1920_neon : 918.7 : 934.2 Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2024-07-05 16:32:31 +08:00
Zhao Zhili	b8b71be07a	swscale/aarch64: Add bgr24 to yuv Test on Apple M1 with kperf : -O3 : -O3 -fno-vectorize bgr24_to_uv_8_c : 28.5 : 52.5 bgr24_to_uv_8_neon : 54.5 : 59.7 bgr24_to_uv_128_c : 294.0 : 830.7 bgr24_to_uv_128_neon : 99.7 : 112.0 bgr24_to_uv_1080_c : 965.0 : 6624.0 bgr24_to_uv_1080_neon : 751.5 : 754.7 bgr24_to_uv_1920_c : 1693.2 : 11554.5 bgr24_to_uv_1920_neon : 1292.5 : 1307.5 bgr24_to_uv_half_8_c : 54.2 : 37.0 bgr24_to_uv_half_8_neon : 27.2 : 22.5 bgr24_to_uv_half_128_c : 127.2 : 392.5 bgr24_to_uv_half_128_neon : 63.0 : 52.0 bgr24_to_uv_half_1080_c : 880.2 : 3329.0 bgr24_to_uv_half_1080_neon : 401.5 : 390.7 bgr24_to_uv_half_1920_c : 1585.7 : 6390.7 bgr24_to_uv_half_1920_neon : 694.7 : 698.7 bgr24_to_y_8_c : 21.7 : 22.5 bgr24_to_y_8_neon : 797.2 : 25.5 bgr24_to_y_128_c : 88.0 : 280.5 bgr24_to_y_128_neon : 63.7 : 55.0 bgr24_to_y_1080_c : 616.7 : 2208.7 bgr24_to_y_1080_neon : 900.0 : 452.0 bgr24_to_y_1920_c : 1093.2 : 3894.7 bgr24_to_y_1920_neon : 777.2 : 767.5 Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2024-07-05 16:32:31 +08:00
Ramiro Polla	61e851381f	swscale/yuv2rgb/x86: remove mmx/mmxext yuv2rgb functions These functions are either slower or barely faster than the C LUT yuv2rgb code.	2024-07-04 11:12:47 +02:00
Michael Niedermayer	c221c7422f	swscale/output: Avoid undefined overflow in yuv2rgb_write_full() Fixes: signed integer overflow: -140140 * 16525 cannot be represented in type 'int' Fixes: 68859/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-4516387130245120 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-06-26 20:49:36 +02:00
Michael Niedermayer	9e6c5b6e86	swscale/output: alpha can become negative after scaling, use multiply Fixes: left shift of negative value -3245 Fixes: 69047/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-6571511551950848 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-06-26 20:49:36 +02:00
Ramiro Polla	e37a93031e	swscale/yuv2rgb: reindent after previous commit	2024-06-24 13:35:22 +02:00
Ramiro Polla	0a08c64588	swscale/yuv2rgb: fix yuv422p input in C code The C code was silently ignoring the second chroma line on yuv422p input.	2024-06-24 13:34:53 +02:00
Ramiro Polla	fb8fae864f	swscale/yuv2rgb: add macros to simplify code generation	2024-06-24 13:34:28 +02:00
Ramiro Polla	88a402df74	swscale/yuv2rgb: fix conversion for widths not aligned to 8 The C code for some pixel formats (rgb555, rgb565, rgb444, and monob) was not converting the last pixels on widths not aligned to 8. NOTE: the last pixel for odd widths is still not converted for any of the pixel formats in the C code for yuv2rgb except for monob.	2024-06-24 13:33:53 +02:00
Ramiro Polla	75f1a8e071	swscale/aarch64: add neon {lum,chr}ConvertRange chrRangeFromJpeg_8_c: 29.2 chrRangeFromJpeg_8_neon: 19.5 chrRangeFromJpeg_24_c: 80.5 chrRangeFromJpeg_24_neon: 34.0 chrRangeFromJpeg_128_c: 413.7 chrRangeFromJpeg_128_neon: 156.0 chrRangeFromJpeg_144_c: 471.0 chrRangeFromJpeg_144_neon: 174.2 chrRangeFromJpeg_256_c: 842.0 chrRangeFromJpeg_256_neon: 305.5 chrRangeFromJpeg_512_c: 1699.0 chrRangeFromJpeg_512_neon: 608.0 chrRangeToJpeg_8_c: 51.7 chrRangeToJpeg_8_neon: 22.7 chrRangeToJpeg_24_c: 149.7 chrRangeToJpeg_24_neon: 38.0 chrRangeToJpeg_128_c: 761.7 chrRangeToJpeg_128_neon: 176.7 chrRangeToJpeg_144_c: 866.2 chrRangeToJpeg_144_neon: 198.7 chrRangeToJpeg_256_c: 1516.5 chrRangeToJpeg_256_neon: 348.7 chrRangeToJpeg_512_c: 3067.2 chrRangeToJpeg_512_neon: 692.7 lumRangeFromJpeg_8_c: 24.0 lumRangeFromJpeg_8_neon: 17.0 lumRangeFromJpeg_24_c: 56.7 lumRangeFromJpeg_24_neon: 21.0 lumRangeFromJpeg_128_c: 294.5 lumRangeFromJpeg_128_neon: 76.7 lumRangeFromJpeg_144_c: 332.5 lumRangeFromJpeg_144_neon: 86.7 lumRangeFromJpeg_256_c: 586.0 lumRangeFromJpeg_256_neon: 152.2 lumRangeFromJpeg_512_c: 1190.0 lumRangeFromJpeg_512_neon: 298.0 lumRangeToJpeg_8_c: 31.7 lumRangeToJpeg_8_neon: 19.5 lumRangeToJpeg_24_c: 83.5 lumRangeToJpeg_24_neon: 24.2 lumRangeToJpeg_128_c: 440.5 lumRangeToJpeg_128_neon: 91.0 lumRangeToJpeg_144_c: 504.2 lumRangeToJpeg_144_neon: 101.0 lumRangeToJpeg_256_c: 879.7 lumRangeToJpeg_256_neon: 177.2 lumRangeToJpeg_512_c: 1794.2 lumRangeToJpeg_512_neon: 354.0	2024-06-18 23:12:41 +02:00
James Almer	fcf72966a5	swscale/x86/range_convert: add missing AVX2 preprocessor wrapper Fixes compilation with old yasm. Signed-off-by: James Almer <jamrial@gmail.com>	2024-06-16 10:09:38 -03:00
James Almer	8a4c9d6bd3	swscale/x86/range_convert: reduce amount of xmm regs clobbered in luma functions Signed-off-by: James Almer <jamrial@gmail.com>	2024-06-15 21:02:06 -03:00
Ramiro Polla	f6859cade3	swscale/x86: add sse2 and avx2 {lum,chr}ConvertRange chrRangeFromJpeg_8_c: 22.3 chrRangeFromJpeg_8_sse2: 13.3 chrRangeFromJpeg_8_avx2: 13.3 chrRangeFromJpeg_24_c: 72.8 chrRangeFromJpeg_24_sse2: 22.3 chrRangeFromJpeg_24_avx2: 17.5 chrRangeFromJpeg_128_c: 345.5 chrRangeFromJpeg_128_sse2: 106.0 chrRangeFromJpeg_128_avx2: 57.8 chrRangeFromJpeg_144_c: 380.5 chrRangeFromJpeg_144_sse2: 118.5 chrRangeFromJpeg_144_avx2: 62.3 chrRangeFromJpeg_256_c: 646.3 chrRangeFromJpeg_256_sse2: 218.8 chrRangeFromJpeg_256_avx2: 109.0 chrRangeFromJpeg_512_c: 1461.5 chrRangeFromJpeg_512_sse2: 426.5 chrRangeFromJpeg_512_avx2: 211.5 chrRangeToJpeg_8_c: 37.8 chrRangeToJpeg_8_sse2: 10.5 chrRangeToJpeg_8_avx2: 14.0 chrRangeToJpeg_24_c: 114.3 chrRangeToJpeg_24_sse2: 23.5 chrRangeToJpeg_24_avx2: 16.3 chrRangeToJpeg_128_c: 633.5 chrRangeToJpeg_128_sse2: 107.5 chrRangeToJpeg_128_avx2: 55.0 chrRangeToJpeg_144_c: 758.3 chrRangeToJpeg_144_sse2: 132.0 chrRangeToJpeg_144_avx2: 64.5 chrRangeToJpeg_256_c: 1345.0 chrRangeToJpeg_256_sse2: 218.0 chrRangeToJpeg_256_avx2: 105.3 chrRangeToJpeg_512_c: 2524.0 chrRangeToJpeg_512_sse2: 417.0 chrRangeToJpeg_512_avx2: 218.8 lumRangeFromJpeg_8_c: 11.8 lumRangeFromJpeg_8_sse2: 11.0 lumRangeFromJpeg_8_avx2: 10.3 lumRangeFromJpeg_24_c: 38.5 lumRangeFromJpeg_24_sse2: 15.5 lumRangeFromJpeg_24_avx2: 12.5 lumRangeFromJpeg_128_c: 232.3 lumRangeFromJpeg_128_sse2: 60.0 lumRangeFromJpeg_128_avx2: 26.8 lumRangeFromJpeg_144_c: 259.5 lumRangeFromJpeg_144_sse2: 65.3 lumRangeFromJpeg_144_avx2: 29.0 lumRangeFromJpeg_256_c: 464.5 lumRangeFromJpeg_256_sse2: 107.5 lumRangeFromJpeg_256_avx2: 54.0 lumRangeFromJpeg_512_c: 897.5 lumRangeFromJpeg_512_sse2: 224.5 lumRangeFromJpeg_512_avx2: 109.8 lumRangeToJpeg_8_c: 17.8 lumRangeToJpeg_8_sse2: 11.0 lumRangeToJpeg_8_avx2: 11.8 lumRangeToJpeg_24_c: 56.3 lumRangeToJpeg_24_sse2: 11.0 lumRangeToJpeg_24_avx2: 12.5 lumRangeToJpeg_128_c: 333.8 lumRangeToJpeg_128_sse2: 53.3 lumRangeToJpeg_128_avx2: 26.5 lumRangeToJpeg_144_c: 375.5 lumRangeToJpeg_144_sse2: 60.8 lumRangeToJpeg_144_avx2: 29.0 lumRangeToJpeg_256_c: 652.0 lumRangeToJpeg_256_sse2: 109.5 lumRangeToJpeg_256_avx2: 53.5 lumRangeToJpeg_512_c: 1284.3 lumRangeToJpeg_512_sse2: 218.0 lumRangeToJpeg_512_avx2: 108.3	2024-06-16 00:35:51 +02:00
Rémi Denis-Courmont	378d1b06c3	riscv: probe for Zbb extension at load time Due to hysterical raisins, most RISC-V Linux distributions target a RV64GC baseline excluding the Bit-manipulation ISA extensions, most notably: - Zba: address generation extension and - Zbb: basic bit manipulation extension. Most CPUs that would make sense to run FFmpeg on support Zba and Zbb (including the current FATE runner), so it makes sense to optimise for them. In fact a large chunk of existing assembler optimisations relies on Zba and/or Zbb. Since we cannot patch shared library code, the next best thing is to carry a flag initialised at load-time and check it on need basis. This results in 3 instructions overhead on isolated use, e.g.: 1: AUIPC rd, %pcrel_hi(ff_rv_zbb_supported) LBU rd, %pcrel_lo(1b)(rd) BEQZ rd, non_Zbb_fallback_code // Zbb code here The C compiler will typically load the flag ahead of time to reducing latency, and can also keep it around if Zbb is used multiple times in a single optimisation scope. For this to work, the flag symbol must be hidden; otherwise the optimisation degrades with a GOT look-up to support interposition: 1: AUIPC rd, GOT_OFFSET_HI LD rd, GOT_OFFSET_LO(rd) LBU rd, (rd) BEQZ rd, non_Zbb_fallback_code // Zbb code here This patch adds code to provision the flag in libraries using bit manipulation functions from libavutil: byte-swap, bit-weight and counting leading or trailing zeroes.	2024-06-11 20:12:37 +03:00
Rémi Denis-Courmont	417957ec5e	sws/range_convert: R-V V to/from JPEG C908 X60 chrRangeFromJpeg_8_c: 2.7 2.5 chrRangeFromJpeg_8_rvv_i32: 1.7 1.5 chrRangeFromJpeg_24_c: 7.5 6.7 chrRangeFromJpeg_24_rvv_i32: 1.7 1.5 chrRangeFromJpeg_128_c: 55.2 34.7 chrRangeFromJpeg_128_rvv_i32: 6.5 3.0 chrRangeFromJpeg_144_c: 44.0 39.2 chrRangeFromJpeg_144_rvv_i32: 7.7 4.5 chrRangeFromJpeg_256_c: 78.2 69.5 chrRangeFromJpeg_256_rvv_i32: 12.2 6.0 chrRangeFromJpeg_512_c: 172.2 138.5 chrRangeFromJpeg_512_rvv_i32: 24.5 11.7 chrRangeToJpeg_8_c: 4.7 4.2 chrRangeToJpeg_8_rvv_i32: 2.0 1.7 chrRangeToJpeg_24_c: 13.7 12.2 chrRangeToJpeg_24_rvv_i32: 2.0 1.5 chrRangeToJpeg_128_c: 72.0 63.7 chrRangeToJpeg_128_rvv_i32: 6.7 3.2 chrRangeToJpeg_144_c: 80.7 71.7 chrRangeToJpeg_144_rvv_i32: 8.5 4.7 chrRangeToJpeg_256_c: 143.2 127.2 chrRangeToJpeg_256_rvv_i32: 13.5 6.5 chrRangeToJpeg_512_c: 285.7 253.7 chrRangeToJpeg_512_rvv_i32: 27.0 13.0 lumRangeFromJpeg_8_c: 1.7 1.5 lumRangeFromJpeg_8_rvv_i32: 1.2 1.0 lumRangeFromJpeg_24_c: 4.2 3.7 lumRangeFromJpeg_24_rvv_i32: 1.2 1.0 lumRangeFromJpeg_128_c: 21.7 19.2 lumRangeFromJpeg_128_rvv_i32: 3.7 1.7 lumRangeFromJpeg_144_c: 24.7 22.0 lumRangeFromJpeg_144_rvv_i32: 4.7 2.7 lumRangeFromJpeg_256_c: 43.7 39.0 lumRangeFromJpeg_256_rvv_i32: 7.5 3.2 lumRangeFromJpeg_512_c: 87.0 77.2 lumRangeFromJpeg_512_rvv_i32: 14.5 6.7 lumRangeToJpeg_8_c: 2.7 2.2 lumRangeToJpeg_8_rvv_i32: 1.0 1.0 lumRangeToJpeg_24_c: 7.2 6.5 lumRangeToJpeg_24_rvv_i32: 1.2 1.0 lumRangeToJpeg_128_c: 37.7 33.7 lumRangeToJpeg_128_rvv_i32: 3.7 2.0 lumRangeToJpeg_144_c: 42.5 37.7 lumRangeToJpeg_144_rvv_i32: 4.7 2.7 lumRangeToJpeg_256_c: 75.0 66.7 lumRangeToJpeg_256_rvv_i32: 7.5 3.5 lumRangeToJpeg_512_c: 149.5 133.0 lumRangeToJpeg_512_rvv_i32: 14.7 7.0	2024-06-10 22:48:52 +03:00
Zhao Zhili	9dac8495b0	swscale/aarch64: Add rgb24 to yuv implementation Test on Apple M1: rgb24_to_uv_8_c: 0.0 rgb24_to_uv_8_neon: 0.2 rgb24_to_uv_128_c: 1.0 rgb24_to_uv_128_neon: 0.5 rgb24_to_uv_1080_c: 7.0 rgb24_to_uv_1080_neon: 5.7 rgb24_to_uv_1920_c: 12.5 rgb24_to_uv_1920_neon: 9.5 rgb24_to_uv_half_8_c: 0.2 rgb24_to_uv_half_8_neon: 0.2 rgb24_to_uv_half_128_c: 1.0 rgb24_to_uv_half_128_neon: 0.5 rgb24_to_uv_half_1080_c: 6.2 rgb24_to_uv_half_1080_neon: 3.0 rgb24_to_uv_half_1920_c: 11.2 rgb24_to_uv_half_1920_neon: 5.2 rgb24_to_y_8_c: 0.2 rgb24_to_y_8_neon: 0.0 rgb24_to_y_128_c: 0.5 rgb24_to_y_128_neon: 0.5 rgb24_to_y_1080_c: 4.7 rgb24_to_y_1080_neon: 3.2 rgb24_to_y_1920_c: 8.0 rgb24_to_y_1920_neon: 5.7 On Pixel 6: rgb24_to_uv_8_c: 30.7 rgb24_to_uv_8_neon: 56.9 rgb24_to_uv_128_c: 213.9 rgb24_to_uv_128_neon: 173.2 rgb24_to_uv_1080_c: 1649.9 rgb24_to_uv_1080_neon: 1424.4 rgb24_to_uv_1920_c: 2907.9 rgb24_to_uv_1920_neon: 2480.7 rgb24_to_uv_half_8_c: 36.2 rgb24_to_uv_half_8_neon: 33.4 rgb24_to_uv_half_128_c: 167.9 rgb24_to_uv_half_128_neon: 99.4 rgb24_to_uv_half_1080_c: 1293.9 rgb24_to_uv_half_1080_neon: 778.7 rgb24_to_uv_half_1920_c: 2292.7 rgb24_to_uv_half_1920_neon: 1328.7 rgb24_to_y_8_c: 19.7 rgb24_to_y_8_neon: 27.7 rgb24_to_y_128_c: 129.9 rgb24_to_y_128_neon: 96.7 rgb24_to_y_1080_c: 995.4 rgb24_to_y_1080_neon: 767.7 rgb24_to_y_1920_c: 1747.4 rgb24_to_y_1920_neon: 1337.2 Note both tests use clang as compiler, which has vectorization enabled by default with -O3. Reviewed-by: Rémi Denis-Courmont <remi@remlab.net> Reviewed-by: Martin Storsjö <martin@martin.st> Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>	2024-06-11 01:12:09 +08:00
James Almer	17c3cc5bb6	swscale/x86/rgb_2_rgb: add missing wrap to ff_uyvytoyuv422_avx2 Fixes old yasm. Signed-off-by: James Almer <jamrial@gmail.com>	2024-06-09 16:04:36 -03:00
James Almer	03546f49a3	swscale/x86/rgb2rgb: add missing wrap for ff_uyvytoyuv422_avx2 Signed-off-by: James Almer <jamrial@gmail.com>	2024-06-09 15:56:52 -03:00
James Almer	e8cef5e152	swscale/x86/rgb2rgb: remove mmxext version of shuffle_bytes_2103 Signed-off-by: James Almer <jamrial@gmail.com>	2024-06-09 13:43:11 -03:00
James Almer	c578bb9864	swscale/x86/input: add AVX2 optimized uyvytoyuv422 uyvytoyuv422_c: 23991.8 uyvytoyuv422_sse2: 2817.8 uyvytoyuv422_avx: 2819.3 uyvytoyuv422_avx2: 1972.3 Signed-off-by: James Almer <jamrial@gmail.com>	2024-06-09 13:43:11 -03:00
James Almer	e9cfd53257	swscale/x86/input: add AVX2 optimized RGB32 to YUV functions abgr_to_uv_8_c: 43.3 abgr_to_uv_8_sse2: 14.3 abgr_to_uv_8_avx: 15.3 abgr_to_uv_8_avx2: 18.8 abgr_to_uv_128_c: 650.3 abgr_to_uv_128_sse2: 110.8 abgr_to_uv_128_avx: 112.3 abgr_to_uv_128_avx2: 64.8 abgr_to_uv_1080_c: 5456.3 abgr_to_uv_1080_sse2: 888.8 abgr_to_uv_1080_avx: 900.8 abgr_to_uv_1080_avx2: 518.3 abgr_to_uv_1920_c: 9692.3 abgr_to_uv_1920_sse2: 1593.8 abgr_to_uv_1920_avx: 1613.3 abgr_to_uv_1920_avx2: 864.8 abgr_to_y_8_c: 23.3 abgr_to_y_8_sse2: 12.8 abgr_to_y_8_avx: 13.3 abgr_to_y_8_avx2: 17.3 abgr_to_y_128_c: 308.3 abgr_to_y_128_sse2: 67.3 abgr_to_y_128_avx: 66.8 abgr_to_y_128_avx2: 44.8 abgr_to_y_1080_c: 2371.3 abgr_to_y_1080_sse2: 512.8 abgr_to_y_1080_avx: 505.8 abgr_to_y_1080_avx2: 314.3 abgr_to_y_1920_c: 4177.3 abgr_to_y_1920_sse2: 915.8 abgr_to_y_1920_avx: 926.8 abgr_to_y_1920_avx2: 519.3 bgra_to_uv_8_c: 37.3 bgra_to_uv_8_sse2: 13.3 bgra_to_uv_8_avx: 14.8 bgra_to_uv_8_avx2: 19.8 bgra_to_uv_128_c: 563.8 bgra_to_uv_128_sse2: 111.3 bgra_to_uv_128_avx: 112.3 bgra_to_uv_128_avx2: 64.8 bgra_to_uv_1080_c: 4691.8 bgra_to_uv_1080_sse2: 893.8 bgra_to_uv_1080_avx: 899.8 bgra_to_uv_1080_avx2: 517.8 bgra_to_uv_1920_c: 8332.8 bgra_to_uv_1920_sse2: 1590.8 bgra_to_uv_1920_avx: 1605.8 bgra_to_uv_1920_avx2: 867.3 bgra_to_y_8_c: 22.3 bgra_to_y_8_sse2: 12.8 bgra_to_y_8_avx: 12.8 bgra_to_y_8_avx2: 17.3 bgra_to_y_128_c: 291.3 bgra_to_y_128_sse2: 67.8 bgra_to_y_128_avx: 69.3 bgra_to_y_128_avx2: 45.3 bgra_to_y_1080_c: 2357.3 bgra_to_y_1080_sse2: 508.3 bgra_to_y_1080_avx: 518.3 bgra_to_y_1080_avx2: 399.8 bgra_to_y_1920_c: 4202.8 bgra_to_y_1920_sse2: 906.8 bgra_to_y_1920_avx: 907.3 bgra_to_y_1920_avx2: 526.3 Signed-off-by: James Almer <jamrial@gmail.com>	2024-06-09 13:43:11 -03:00
James Almer	d5fe99dc5f	swscale/x86/input: add AVX2 optimized RGB24 to YUV functions rgb24_to_uv_8_c: 39.3 rgb24_to_uv_8_sse2: 14.3 rgb24_to_uv_8_ssse3: 13.3 rgb24_to_uv_8_avx: 12.8 rgb24_to_uv_8_avx2: 14.3 rgb24_to_uv_128_c: 582.8 rgb24_to_uv_128_sse2: 127.3 rgb24_to_uv_128_ssse3: 107.3 rgb24_to_uv_128_avx: 111.3 rgb24_to_uv_128_avx2: 62.3 rgb24_to_uv_1080_c: 4981.3 rgb24_to_uv_1080_sse2: 1048.3 rgb24_to_uv_1080_ssse3: 876.8 rgb24_to_uv_1080_avx: 887.8 rgb24_to_uv_1080_avx2: 492.3 rgb24_to_uv_1280_c: 5906.8 rgb24_to_uv_1280_sse2: 1263.3 rgb24_to_uv_1280_ssse3: 1048.3 rgb24_to_uv_1280_avx: 1045.8 rgb24_to_uv_1280_avx2: 579.8 rgb24_to_uv_1920_c: 8665.3 rgb24_to_uv_1920_sse2: 1888.8 rgb24_to_uv_1920_ssse3: 1571.8 rgb24_to_uv_1920_avx: 1558.8 rgb24_to_uv_1920_avx2: 869.3 rgb24_to_y_8_c: 20.3 rgb24_to_y_8_sse2: 11.8 rgb24_to_y_8_ssse3: 10.3 rgb24_to_y_8_avx: 10.3 rgb24_to_y_8_avx2: 10.8 rgb24_to_y_128_c: 284.8 rgb24_to_y_128_sse2: 83.3 rgb24_to_y_128_ssse3: 66.8 rgb24_to_y_128_avx: 64.8 rgb24_to_y_128_avx2: 39.3 rgb24_to_y_1080_c: 2451.3 rgb24_to_y_1080_sse2: 696.3 rgb24_to_y_1080_ssse3: 516.8 rgb24_to_y_1080_avx: 518.8 rgb24_to_y_1080_avx2: 301.8 rgb24_to_y_1280_c: 2892.8 rgb24_to_y_1280_sse2: 816.8 rgb24_to_y_1280_ssse3: 623.3 rgb24_to_y_1280_avx: 616.3 rgb24_to_y_1280_avx2: 350.8 rgb24_to_y_1920_c: 4338.8 rgb24_to_y_1920_sse2: 1210.8 rgb24_to_y_1920_ssse3: 928.3 rgb24_to_y_1920_avx: 920.3 rgb24_to_y_1920_avx2: 534.8 Signed-off-by: James Almer <jamrial@gmail.com>	2024-06-09 13:42:09 -03:00
Rémi Denis-Courmont	7a3369398f	sws/input: R-V V 32-bit RGB to halved UV T-Head C908: abgr_to_uv_half_8_c: 2.2 abgr_to_uv_half_8_rvv_i32: 3.5 abgr_to_uv_half_128_c: 44.0 abgr_to_uv_half_128_rvv_i32: 13.0 abgr_to_uv_half_1080_c: 245.0 abgr_to_uv_half_1080_rvv_i32: 107.2 abgr_to_uv_half_1920_c: 406.2 abgr_to_uv_half_1920_rvv_i32: 188.7 bgra_to_uv_half_8_c: 2.2 bgra_to_uv_half_8_rvv_i32: 3.5 bgra_to_uv_half_128_c: 26.5 bgra_to_uv_half_128_rvv_i32: 13.0 bgra_to_uv_half_1080_c: 219.7 bgra_to_uv_half_1080_rvv_i32: 107.0 bgra_to_uv_half_1920_c: 406.7 bgra_to_uv_half_1920_rvv_i32: 188.7 SpacemiT X60: abgr_to_uv_half_8_c: 2.2 abgr_to_uv_half_8_rvv_i32: 3.0 abgr_to_uv_half_128_c: 28.2 abgr_to_uv_half_128_rvv_i32: 5.7 abgr_to_uv_half_1080_c: 235.5 abgr_to_uv_half_1080_rvv_i32: 47.7 abgr_to_uv_half_1920_c: 418.2 abgr_to_uv_half_1920_rvv_i32: 84.0 bgra_to_uv_half_8_c: 2.0 bgra_to_uv_half_8_rvv_i32: 3.0 bgra_to_uv_half_128_c: 23.7 bgra_to_uv_half_128_rvv_i32: 5.7 bgra_to_uv_half_1080_c: 195.5 bgra_to_uv_half_1080_rvv_i32: 47.7 bgra_to_uv_half_1920_c: 346.5 bgra_to_uv_half_1920_rvv_i32: 84.0	2024-06-09 14:33:04 +03:00
Rémi Denis-Courmont	e2f069905e	sws/input: R-V V 32-bit RGB to UV	2024-06-09 14:33:04 +03:00
Rémi Denis-Courmont	f5555cb106	sws/input: R-V V 32-bit RGB to Y T-Head C908: abgr_to_y_8_c: 2.5 abgr_to_y_8_rvv_i32: 2.2 abgr_to_y_128_c: 37.0 abgr_to_y_128_rvv_i32: 8.5 abgr_to_y_1080_c: 327.0 abgr_to_y_1080_rvv_i32: 69.5 abgr_to_y_1920_c: 552.0 abgr_to_y_1920_rvv_i32: 122.2 bgra_to_y_8_c: 2.5 bgra_to_y_8_rvv_i32: 2.2 bgra_to_y_128_c: 37.2 bgra_to_y_128_rvv_i32: 8.5 bgra_to_y_1080_c: 310.2 bgra_to_y_1080_rvv_i32: 69.5 bgra_to_y_1920_c: 568.2 bgra_to_y_1920_rvv_i32: 122.5 SpacemiT X60: abgr_to_y_8_c: 2.5 abgr_to_y_8_rvv_i32: 2.0 abgr_to_y_128_c: 33.0 abgr_to_y_128_rvv_i32: 3.7 abgr_to_y_1080_c: 276.0 abgr_to_y_1080_rvv_i32: 31.5 abgr_to_y_1920_c: 493.7 abgr_to_y_1920_rvv_i32: 55.5 bgra_to_y_8_c: 2.2 bgra_to_y_8_rvv_i32: 2.0 bgra_to_y_128_c: 33.0 bgra_to_y_128_rvv_i32: 3.7 bgra_to_y_1080_c: 276.0 bgra_to_y_1080_rvv_i32: 31.5 bgra_to_y_1920_c: 490.7 bgra_to_y_1920_rvv_i32: 55.5	2024-06-09 14:33:04 +03:00
Andreas Rheinhardt	8b62fb231a	swscale/x86/rgb2rgb: Detemplatize Every function in rgb2rgb_template.c is only compiled exactly once; there is no overlap at all between the MMXEXT and the SSE2 functions, so detemplatize it. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-06-09 12:03:47 +02:00
Andreas Rheinhardt	5421dee0e7	swscale/x86/rgb2rgb_template: Remove unused uyvytoyv12 Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-06-09 12:03:47 +02:00
Andreas Rheinhardt	c1c35380a7	swscale/x86/rgb2rgb: Don't unnecessarily check for inline ASM The SSE2 and AVX versions of deinterleaveBytes are external ASM. Move them out of the inline ASM template. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-06-09 12:03:47 +02:00
Andreas Rheinhardt	f7305eb3b3	swscale/x86/rgb2rgb_template: Remove unnecessary SFENCE The ff_nv12ToUV_* functions don't use non-temporal stores at all. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-06-09 12:03:47 +02:00
Rémi Denis-Courmont	e0f4d185f1	sws/input: R-V V rgb24ToUV_half and bgr24ToUV_half T-Head C908: rgb24_to_uv_half_4_c: 2.0 rgb24_to_uv_half_4_rvv_i32: 3.5 rgb24_to_uv_half_64_c: 27.0 rgb24_to_uv_half_64_rvv_i32: 12.5 rgb24_to_uv_half_540_c: 223.7 rgb24_to_uv_half_540_rvv_i32: 105.2 rgb24_to_uv_half_640_c: 265.5 rgb24_to_uv_half_640_rvv_i32: 123.7 rgb24_to_uv_half_960_c: 414.5 rgb24_to_uv_half_960_rvv_i32: 249.5 SpacemiT X60: rgb24_to_uv_half_4_c: 1.7 rgb24_to_uv_half_4_rvv_i32: 4.2 rgb24_to_uv_half_64_c: 24.0 rgb24_to_uv_half_64_rvv_i32: 8.7 rgb24_to_uv_half_540_c: 199.2 rgb24_to_uv_half_540_rvv_i32: 72.5 rgb24_to_uv_half_640_c: 235.7 rgb24_to_uv_half_640_rvv_i32: 85.2 rgb24_to_uv_half_960_c: 353.5 rgb24_to_uv_half_960_rvv_i32: 127.5	2024-06-08 18:30:43 +03:00
Rémi Denis-Courmont	3ef5867e4b	sws/input: R-V V rgb24ToUV and bgr24ToUV T-Head C908: rgb24_to_uv_8_c: 2.7 rgb24_to_uv_8_rvv_i32: 3.2 rgb24_to_uv_128_c: 41.0 rgb24_to_uv_128_rvv_i32: 12.7 rgb24_to_uv_1080_c: 342.5 rgb24_to_uv_1080_rvv_i32: 105.7 rgb24_to_uv_1280_c: 406.0 rgb24_to_uv_1280_rvv_i32: 124.2 rgb24_to_uv_1920_c: 626.0 rgb24_to_uv_1920_rvv_i32: 186.0 SpacemiT X60: rgb24_to_uv_8_c: 2.5 rgb24_to_uv_8_rvv_i32: 3.0 rgb24_to_uv_128_c: 36.5 rgb24_to_uv_128_rvv_i32: 5.7 rgb24_to_uv_1080_c: 304.2 rgb24_to_uv_1080_rvv_i32: 49.0 rgb24_to_uv_1280_c: 360.5 rgb24_to_uv_1280_rvv_i32: 57.5 rgb24_to_uv_1920_c: 540.7 rgb24_to_uv_1920_rvv_i32: 86.2	2024-06-08 18:30:43 +03:00
Rémi Denis-Courmont	79dfdac4db	sws/input: R-V V rgb24ToY & bgr24ToY T-Head C908: rgb24_to_y_8_c: 2.0 rgb24_to_y_8_rvv_i32: 2.7 rgb24_to_y_128_c: 26.2 rgb24_to_y_128_rvv_i32: 9.2 rgb24_to_y_1080_c: 219.5 rgb24_to_y_1080_rvv_i32: 76.2 rgb24_to_y_1280_c: 276.2 rgb24_to_y_1280_rvv_i32: 89.7 rgb24_to_y_1920_c: 389.7 rgb24_to_y_1920_rvv_i32: 134.2 SpacemiT X60: rgb24_to_y_8_c: 1.7 rgb24_to_y_8_rvv_i32: 2.2 rgb24_to_y_128_c: 23.2 rgb24_to_y_128_rvv_i32: 4.2 rgb24_to_y_1080_c: 195.0 rgb24_to_y_1080_rvv_i32: 33.7 rgb24_to_y_1280_c: 231.0 rgb24_to_y_1280_rvv_i32: 40.0 rgb24_to_y_1920_c: 346.2 rgb24_to_y_1920_rvv_i32: 59.7	2024-06-08 18:30:43 +03:00
Ramiro Polla	5939f7228a	libswscale/x86/yuv_2_rgb: fix some comments	2024-06-07 15:24:06 +02:00
Shiyou Yin	6b35fcacdb	swscale: [loongarch] Fix undeclared functions prob. Compile with '--disable-lasx', ‘lumRangeFromJpeg_lasx’ undeclared. Reviewed-by: 金波 <jinbo@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-05-31 02:20:23 +02:00
Michael Niedermayer	bfc22f364d	swscale/yuv2rgb: Use 64bit for brightness computation This will not overflow for normal values Fixes: CID1500280 Unintentional integer overflow Sponsored-by: Sovereign Tech Fund Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-05-28 03:48:06 +02:00
Michael Niedermayer	3f9daf1c18	swscale/x86/swscale: use a clearer name for INPUT_PLANER_RGB_A_FUNC_CASE related: CID1497114 Missing break in switch Sponsored-by: Sovereign Tech Fund Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-05-28 03:48:06 +02:00
Rémi Denis-Courmont	6c6313f1b5	swscale/riscv: explicitly require Zbb for MIN	2024-05-10 18:59:06 +03:00
Michael Niedermayer	1330a73cca	swscale/output: Fix integer overflow in yuv2rgba64_full_1_c_template() Fixes: signed integer overflow: -1082982400 + -1079364728 cannot be represented in type 'int' Fixes: 67910/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-5329011971522560 The input is 9bit in 16bit, the fuzzer fills all 16bit thus generating "invalid" input No overflow should happen with valid input. Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-05-06 03:00:40 +02:00
Michael Niedermayer	a56559e688	swscale/output: Fix integer overflow in yuv2rgba64_1_c_template Fixes: signed integer overflow: -831176 * 9539 cannot be represented in type 'int' Fixes: 67869/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-5117342091640832 The input is 9bit in 16bit, the fuzzer fills all 16bit thus generating "invalid" input No overflow should happen with valid input. Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-05-06 03:00:40 +02:00
Shiyou Yin	2a7d622ddd	swscale: [LA] Optimize swscale funcs in input.c Optimized 7 funcs with LSX and LASX: 1. yuy2ToUV_c 2. yvy2ToUV_c 3. uyvyToUV_c 4. nv12ToUV_c 5. nv21ToUV_c 6. abgrToA_c 7. rgbaToA_c Reviewed-by: colleague of Shiyou Yin Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-04-11 23:53:59 +02:00
Shiyou Yin	8b76df9142	swscale: [LA] Optimize yuv2plane1_8_c. Reviewed-by: colleague of Shiyou Yin Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-04-11 23:53:59 +02:00
Shiyou Yin	f3fe2cb5f7	swscale: [LA] Optimize range convert for yuvj420p. Reviewed-by: 陈昊 <chenhao@loongson.cn> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-04-11 23:53:41 +02:00
Michael Niedermayer	1a9eda65d0	swscale/utils: Fix xInc overflow Fixes: signed integer overflow: 2 * 1073741824 cannot be represented in type 'int' Fixes: 67802/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-6249515855183872 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2024-04-04 19:38:29 +02:00
Andreas Rheinhardt	428ff7bd8c	swscale/ppc/swscale_ppc_template: Reindent after the previous commit Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-04-04 16:47:21 +02:00
Andreas Rheinhardt	95b4aea5e3	swscale/ppc/swscale_ppc_template: Remove code not passing checkasm Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-04-04 16:45:23 +02:00
Andreas Rheinhardt	790f793844	avutil/common: Don't auto-include mem.h There are lots of files that don't need it: The number of object files that actually need it went down from 2011 to 884 here. Keep it for external users in order to not cause breakages. Also improve the other headers a bit while just at it. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-03-31 00:08:43 +01:00
Andreas Rheinhardt	b616be1649	lib*/version: Use static_assert for static asserts Also update the checks that guard against inserting a new enum entry in the middle of a range. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2024-03-31 00:08:42 +01:00

1 2 3 4 5 ...

2676 Commits