James Almer
04612351ab
swscale/input: add V30X input support
...
Signed-off-by: James Almer <jamrial@gmail.com>
2024-10-08 22:26:07 -03:00
James Almer
ea05edc9e0
swscale/input: add VYU444 input support
...
Signed-off-by: James Almer <jamrial@gmail.com>
2024-10-08 22:24:47 -03:00
James Almer
ec7f5e314d
swscale/input: add UYVA input support
...
Signed-off-by: James Almer <jamrial@gmail.com>
2024-10-08 22:24:47 -03:00
James Almer
bb37d3c33e
swscale/input: add AYUV input support
...
Signed-off-by: James Almer <jamrial@gmail.com>
2024-10-08 22:24:47 -03:00
jinbo
e6ecc1e757
swscale: Fix conflicting types for loongarch
...
Build breaks after c1a0e65763
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-10-09 01:55:50 +02:00
Niklas Haas
477445722c
swscale/ppc: fix altivec build failure
...
Fixes: c1a0e65763
2024-10-08 16:45:36 +02:00
Martin Storsjö
b9145fcab2
swscale: Fix aarch64 and i386 compilation failures
...
This unbreaks builds after c1a0e65763
,
which broke with errors like
src/libswscale/aarch64/rgb2rgb.c:66:25: error: incompatible function pointer types assigning to 'void (*)(const uint8_t *, uint8_t *, uint8_t *, uint8_t *, int, int, int, int, int, const int32_t *)' (aka 'void (*)(const unsigned char *, unsigned char *, unsigned char *, unsigned char *, int, int, int, int, int, const int *)') from 'void (const uint8_t *, uint8_t *, uint8_t *, uint8_t *, int, int, int, int, int, int32_t *)' (aka 'void (const unsigned char *, unsigned char *, unsigned char *, unsigned char *, int, int, int, int, int, int *)') [-Wincompatible-function-pointer-types]
66 | ff_rgb24toyv12 = rgb24toyv12;
| ^ ~~~~~~~~~~~
and
src/libswscale/aarch64/swscale_unscaled.c:213:29: error: incompatible function pointer types assigning to 'SwsFunc' (aka 'int (*)(struct SwsContext *, const unsigned char *const *, const int *, int, int, unsigned char *const *, const int *)') from 'int (SwsContext *, const uint8_t *const *, const int *, int, int, const uint8_t **, const int *)' (aka 'int (struct SwsContext *, const unsigned char *const *, const int *, int, int, const unsigned char **, const int *)') [-Wincompatible-function-pointer-types]
213 | c->convert_unscaled = nv24_to_yuv420p_neon_wrapper;
| ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Martin Storsjö <martin@martin.st>
2024-10-08 09:29:07 +03:00
Niklas Haas
73b3344edd
swscale/input: parametrize ff_sws_init_input_funcs() pointers
...
Following the precedent set by ff_sws_init_output_funcs().
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-10-07 19:51:34 +02:00
Niklas Haas
20b350b284
swscale/internal: add typedefs for input reading functions
...
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-10-07 19:51:34 +02:00
Niklas Haas
b90d522d2c
swscale/internal: forward typedef SwsContext
...
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-10-07 19:51:34 +02:00
Niklas Haas
c1a0e65763
swscale/internal: constify SwsFunc
...
I want to move away from having random leaf processing functions mutate
plane pointers, and while we're at it, we might as well make the strides
and tables const as well.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-10-07 19:51:34 +02:00
Niklas Haas
286bdc9cdc
swscale/internal: turn cascaded_tmp into an array
...
Slightly more convenient to access from the new wrapping code.
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-10-07 19:51:34 +02:00
Niklas Haas
61369484f6
swscale/internal: expose ff_update_palette() internally
...
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-10-07 19:51:34 +02:00
Niklas Haas
aee19ee431
swscale/internal: rename NB_SWS_DITHER for consistency
...
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-10-07 19:51:34 +02:00
Niklas Haas
41ce370b65
tests/swscale: fix minor typos
...
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-10-07 19:51:34 +02:00
Michael Niedermayer
38e224c2ba
*/version.h: bump after release/7.1 branch
...
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-09-24 17:10:35 +02:00
Michael Niedermayer
e1094ac45d
*/version.h: bump minor versions for release/7.1
...
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-09-24 17:07:30 +02:00
Zhao Zhili
e18b46d95f
swscale/aarch64: Fix rgb24toyv12 only works with aligned width
...
Since c0666d8b
, rgb24toyv12 is broken for width non-aligned to 16.
Add a simple wrapper to handle the non-aligned part.
Co-authored-by: johzzy <hellojinqiang@gmail.com>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-09-24 10:24:14 +08:00
Michael Niedermayer
bd80c97391
swscale/output: Fix undefined integer overflow in yuv2rgba64_2_c_template()
...
Fixes: signed integer overflow: -1082982400 + -1083218484 cannot be represented in type 'int'
Fixes: 70657/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-6707819712675840
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-09-19 00:24:26 +02:00
Michael Niedermayer
44c5641ae8
swscale/swscale: Use unsigned operation to avoid undefined behavior
...
I have not checked that the constant is correct, this just fixes the undefined behavior
Fixes: signed integer overflow: -646656 * 3517 cannot be represented in type 'int
Fixes: 70559/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-5209368631508992
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-09-19 00:10:38 +02:00
Ramiro Polla
c0666d8bed
swscale/aarch64/rgb2rgb: add neon implementation for rgb24toyv12
...
A55 A76
rgb24toyv12_16_200_c: 36890.6 17275.5
rgb24toyv12_16_200_neon: 12460.1 ( 2.96x) 5360.8 ( 3.22x)
rgb24toyv12_128_60_c: 83205.1 39884.8
rgb24toyv12_128_60_neon: 27468.4 ( 3.03x) 13552.5 ( 2.94x)
rgb24toyv12_512_16_c: 88111.6 42346.8
rgb24toyv12_512_16_neon: 29126.6 ( 3.03x) 14411.2 ( 2.94x)
rgb24toyv12_1920_4_c: 82068.1 39620.0
rgb24toyv12_1920_4_neon: 27011.6 ( 3.04x) 13492.2 ( 2.94x)
2024-09-06 23:11:13 +02:00
Ramiro Polla
caaec2ea95
swscale/x86/rgb2rgb: disable rgb24toyv12_mmxext for x86_64
...
The mmxext implementation is slower than the C version in x86_64.
m32 m64
rgb24toyv12_16_200_c: 24942.7 14812.6
rgb24toyv12_16_200_mmxext: 17857.2 ( 1.40x) 17400.4 ( 0.85x)
rgb24toyv12_128_60_c: 56892.9 35616.9
rgb24toyv12_128_60_mmxext: 40730.9 ( 1.40x) 39610.4 ( 0.90x)
rgb24toyv12_512_16_c: 58402.7 37209.4
rgb24toyv12_512_16_mmxext: 44842.4 ( 1.30x) 41136.2 ( 0.90x)
rgb24toyv12_1920_4_c: 54827.4 34737.4
rgb24toyv12_1920_4_mmxext: 51169.9 ( 1.07x) 34818.9 ( 1.00x)
2024-09-06 23:06:38 +02:00
Ramiro Polla
3604b2403c
swscale/rgb2rgb: improve chroma conversion in ff_rgb24toyv12_c
...
The current code subsamples by dropping 3/4 pixels to calculate the
chroma components. This commit calculates the average of 4 rgb pixels
before calculating the chroma components, putting it in line with the
mmxext implementation.
2024-09-06 23:06:32 +02:00
Ramiro Polla
d8848325a6
swscale/aarch64/rgb2rgb: add deinterleaveBytes neon implementation
...
A55 A76
deinterleave_bytes_c: 70342.0 34497.5
deinterleave_bytes_neon: 21594.5 ( 3.26x) 5535.2 ( 6.23x)
deinterleave_bytes_aligned_c: 71340.8 34651.2
deinterleave_bytes_aligned_neon: 8616.8 ( 8.28x) 3996.2 ( 8.67x)
2024-09-06 23:05:09 +02:00
Ramiro Polla
4c824ad391
swscale/x86/rgb2rgb: fix deinterleaveBytes writing past the end of the buffers
2024-09-06 23:05:04 +02:00
Ramiro Polla
f17a6bd200
swscale/x86/rgb2rgb: fix deinterleaveBytes for unaligned dst pointers
2024-09-06 23:05:01 +02:00
Rémi Denis-Courmont
27d28b68da
swscale/rgb2rgb: enable R-V V deinterleaveBytes
...
T-Head C908:
deinterleave_bytes_c: 100328.3 ( 1.00x)
deinterleave_bytes_rvv_i32: 19331.3 ( 5.19x)
deinterleave_bytes_aligned_c: 100337.5 ( 1.00x)
deinterleave_bytes_aligned_rvv_i32: 15748.0 ( 6.37x)
SpacemiT X60:
deinterleave_bytes_c: 95230.6 ( 1.00x)
deinterleave_bytes_rvv_i32: 9790.3 ( 9.73x)
deinterleave_bytes_aligned_c: 96564.1 ( 1.00x)
deinterleave_bytes_aligned_rvv_i32: 7780.1 (12.41x)
2024-09-04 22:04:11 +03:00
Ramiro Polla
420d443600
swscale/aarch64: cosmetics fix (spaces inside curly braces)
2024-08-26 11:07:49 +02:00
Ramiro Polla
52887683e9
swscale/aarch64: add nv24/nv42 to yuv420p unscaled converter
...
A55 A76
nv24_yuv420p_128_c: 4956.1 1267.0
nv24_yuv420p_128_neon: 3109.1 ( 1.59x) 640.0 ( 1.98x)
nv24_yuv420p_1920_c: 35728.4 11736.2
nv24_yuv420p_1920_neon: 8011.1 ( 4.46x) 2436.0 ( 4.82x)
nv42_yuv420p_128_c: 4956.4 1270.5
nv42_yuv420p_128_neon: 3074.6 ( 1.61x) 639.5 ( 1.99x)
nv42_yuv420p_1920_c: 35685.9 11732.5
nv42_yuv420p_1920_neon: 7995.1 ( 4.46x) 2437.2 ( 4.81x)
2024-08-26 11:04:46 +02:00
Ramiro Polla
88a563ad18
swscale: export ff_copyPlane so it may be used by simd code
2024-08-26 11:04:46 +02:00
Ramiro Polla
4eb5594295
swscale: add nv24/nv42 to yuv420p unscaled converter
2024-08-26 11:04:46 +02:00
Martin Storsjö
cfe0a36352
libswscale: aarch64: Fix the indentation of some macro invocations
...
Signed-off-by: Martin Storsjö <martin@martin.st>
2024-08-22 14:40:30 +03:00
Martin Storsjö
507c2a5774
libswscale: arm: Don't assume aligned output in yuv2rgb functions
...
This fixes failures in recently added checkasm tests.
While the buffers in most cases are aligned, libswscale in general
can't assume the output to be aligned.
Signed-off-by: Martin Storsjö <martin@martin.st>
2024-08-19 23:04:52 +03:00
Ramiro Polla
181cd260db
swscale/aarch64/yuv2rgb: add neon yuv42{0,2}p -> gbrp unscaled colorspace converters
...
checkasm --bench on a Raspberry Pi 5 Model B Rev 1.0:
yuv420p_gbrp_128_c: 1243.0
yuv420p_gbrp_128_neon: 453.5
yuv420p_gbrp_1920_c: 18165.5
yuv420p_gbrp_1920_neon: 6700.0
yuv422p_gbrp_128_c: 1463.5
yuv422p_gbrp_128_neon: 471.5
yuv422p_gbrp_1920_c: 21343.7
yuv422p_gbrp_1920_neon: 6743.5
2024-08-18 22:26:17 +02:00
Ramiro Polla
8744764a4c
swscale/x86/yuv2rgb: add ssse3 yuv42{0,2}p -> gbrp unscaled colorspace converters
...
Note: this implementation is limited to x86_64 due to general purpose
register pressure.
checkasm --bench on an Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz:
yuv420p_gbrp_8_c: 118.5
yuv420p_gbrp_8_ssse3: 93.3
yuv420p_gbrp_128_c: 1068.3
yuv420p_gbrp_128_ssse3: 319.3
yuv420p_gbrp_1080_c: 8841.8
yuv420p_gbrp_1080_ssse3: 2211.8
yuv420p_gbrp_1920_c: 15903.8
yuv420p_gbrp_1920_ssse3: 3814.3
yuv422p_gbrp_8_c: 144.8
yuv422p_gbrp_8_ssse3: 93.8
yuv422p_gbrp_128_c: 1395.8
yuv422p_gbrp_128_ssse3: 313.0
yuv422p_gbrp_1080_c: 11551.5
yuv422p_gbrp_1080_ssse3: 2240.8
yuv422p_gbrp_1920_c: 20585.3
yuv422p_gbrp_1920_ssse3: 5249.5
yuva420p_gbrp_8_c: 117.5
yuva420p_gbrp_8_ssse3: 92.0
yuva420p_gbrp_128_c: 1593.0
yuva420p_gbrp_128_ssse3: 319.3
yuva420p_gbrp_1080_c: 8694.5
yuva420p_gbrp_1080_ssse3: 2186.0
yuva420p_gbrp_1920_c: 15946.5
yuva420p_gbrp_1920_ssse3: 3805.3
2024-08-18 22:26:14 +02:00
Ramiro Polla
4545205a26
swscale/yuv2rgb: add yuv42{0,2}p -> gbrp unscaled colorspace converters
2024-08-18 22:26:11 +02:00
Ramiro Polla
af5adf57e3
swscale/yuv2rgb: prepare YUV2RGBFUNC macro for multi-planar rgb
...
This will be used in the upcoming yuv42{0,2}p -> gbrp unscaled
colorspace converters.
There is no difference in performance.
2024-08-18 22:26:08 +02:00
Ramiro Polla
24063e7827
swscale/yuv2rgb: prepare LOADCHROMA/PUTFUNC macros for multi-planar rgb
...
This will be used in the upcoming yuv42{0,2}p -> gbrp unscaled
colorspace converters.
There is no difference in performance.
2024-08-18 22:26:05 +02:00
Niklas Haas
6b40be941a
swscale/options: relax src/dst_h/v_chr_pos value range
...
When dealing with 4x subsampling ratios (log2 == 2), such as can arise
with 4:1:1 or 4:1:0, a value range of 512 is not enough to cover the
range of possible scenarios.
For example, bottom-sited chroma in 4:1:0 would require an offset of 768
(three luma rows). Simply double the limit to 1024. I don't see any
place in initFilter() that would experience overflow as a result of this
change, especially since get_local_pos() right-shifts it by the
subsampling ratio again.
2024-08-16 11:43:37 +02:00
Niklas Haas
3e064f52eb
swscale: document SWS_FULL_CHR_H_* flags
...
Based on my best understanding of what they do, given the source code.
2024-08-16 11:43:37 +02:00
James Almer
66592e8b10
swscale/output: don't leave the alpha channel undefined in vuyx and xv36le
...
It's non-determistic, as shown by poisoning avfilter buffers instead of zeroing them.
Signed-off-by: James Almer <jamrial@gmail.com>
2024-08-13 14:49:41 -03:00
Rémi Denis-Courmont
210877c5fd
sws/riscv: depend on RVB and simplify accordingly
2024-08-05 21:16:26 +03:00
Rémi Denis-Courmont
bd0c3edb13
lavu/riscv: count bytes rather than words for bswap32
...
This removes the dependency on Zba at essentially zero cost.
2024-07-30 18:41:51 +03:00
Shiyou Yin
4713a5cc24
swscale: [loongarch] Fix checkasm-sw_yuv2rgb failure.
...
Reviewed-by: 陈昊 <chenhao@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-07-28 19:02:16 +02:00
Rémi Denis-Courmont
4f2472909e
sws/riscv: add forward-edge CFI landing pads
2024-07-25 23:10:14 +03:00
Rémi Denis-Courmont
e91a8cc4de
sws/riscv: require B or zba explicitly
2024-07-25 18:55:48 +03:00
Michael Niedermayer
bcab9789ef
swscale/output: Fix integer overflows in yuv2rgba64_X_c_template
...
Fixes: signed integer overflow: -1082982400 + -1068681048 cannot be represented in type 'int'
Fixes: 69995/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-6285740271534080
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-07-21 15:35:08 +02:00
Niklas Haas
4ec45aca36
swscale/utils: fix leak on threaded ctx init failure
...
This count gets incremented after init succeeds, when it should be
incremented after *alloc* succeeds. Otherwise, we leak the context on
failure.
There are no negative consequences of incrementing for
allocated-but-not-initialized contexts, as the only functions that
reference it will, in the worst case, simply behave as if called on
allocated-but-not-initialized contexts, which is in line with expected
behavior when sws_init_context() fails.
2024-07-14 13:48:59 +02:00
Sean McGovern
34b4ca8696
swscale: prevent undefined behaviour in the PUTRGBA macro
...
For even small values of 'asrc[x]', shifting them by 24 bits or more
will cause arithmetic overflow and be caught by
GCC's undefined behaviour sanitizer.
Ensure the values do not overflow by up-casting the bracketed
expressions involving 'asrc' to uint32_t.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-07-10 18:10:10 +02:00
Ramiro Polla
ac6263945a
swscale/x86/yuv2rgb: Detemplatize
...
Every function in yuv2rgb_template.c is only compiled exactly
once, so detemplatize it.
2024-07-10 12:25:32 +02:00
Ramiro Polla
4f7f9b1026
swscale: remove unconditional #define DITHER1XBPP
...
This seems to have had an use in the past, but it is now defined
unconditionally.
2024-07-10 12:25:03 +02:00
Michael Niedermayer
66b60bae68
swscale/swscale: Use ptrdiff_t for linesize computations
...
This is unlikely to make a difference
Fixes: CID1591896 Unintentional integer overflow
Fixes: CID1591901 Unintentional integer overflow
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-07-07 23:36:30 +02:00
Zhao Zhili
4d90a76986
swscale/aarch64: Add argb/abgr to yuv
...
Test on Apple M1 with kperf:
: -O3 : -O3 -fno-vectorize
abgr_to_uv_8_c : 19.4 : 26.1
abgr_to_uv_8_neon : 29.9 : 51.1
abgr_to_uv_128_c : 146.4 : 558.9
abgr_to_uv_128_neon : 85.1 : 83.4
abgr_to_uv_1080_c : 1162.6 : 4786.4
abgr_to_uv_1080_neon : 819.6 : 826.6
abgr_to_uv_1920_c : 2063.6 : 8492.1
abgr_to_uv_1920_neon : 1435.1 : 1447.1
abgr_to_uv_half_8_c : 16.4 : 11.4
abgr_to_uv_half_8_neon : 35.6 : 20.4
abgr_to_uv_half_128_c : 108.6 : 359.4
abgr_to_uv_half_128_neon : 75.4 : 42.6
abgr_to_uv_half_1080_c : 883.4 : 2885.6
abgr_to_uv_half_1080_neon : 460.6 : 481.1
abgr_to_uv_half_1920_c : 1553.6 : 5106.9
abgr_to_uv_half_1920_neon : 817.6 : 820.4
abgr_to_y_8_c : 6.1 : 26.4
abgr_to_y_8_neon : 40.6 : 6.4
abgr_to_y_128_c : 99.9 : 390.1
abgr_to_y_128_neon : 67.4 : 55.9
abgr_to_y_1080_c : 735.9 : 3170.4
abgr_to_y_1080_neon : 534.6 : 536.6
abgr_to_y_1920_c : 1279.4 : 6016.4
abgr_to_y_1920_neon : 932.6 : 927.6
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-07-05 16:32:31 +08:00
Zhao Zhili
52422133ae
swscale/aarch64: Add bgra/rgba to yuv
...
Test on Apple M1 with kperf
: -O3 : -O3 -fno-vectorize
bgra_to_uv_8_c : 13.4 : 27.5
bgra_to_uv_8_neon : 37.4 : 41.7
bgra_to_uv_128_c : 155.9 : 550.2
bgra_to_uv_128_neon : 91.7 : 92.7
bgra_to_uv_1080_c : 1173.2 : 4558.2
bgra_to_uv_1080_neon : 822.7 : 809.5
bgra_to_uv_1920_c : 2078.2 : 8115.2
bgra_to_uv_1920_neon : 1437.7 : 1438.7
bgra_to_uv_half_8_c : 17.9 : 14.2
bgra_to_uv_half_8_neon : 37.4 : 10.5
bgra_to_uv_half_128_c : 103.9 : 326.0
bgra_to_uv_half_128_neon : 73.9 : 68.7
bgra_to_uv_half_1080_c : 850.2 : 3732.0
bgra_to_uv_half_1080_neon : 484.2 : 490.0
bgra_to_uv_half_1920_c : 1479.2 : 4942.7
bgra_to_uv_half_1920_neon : 824.2 : 824.7
bgra_to_y_8_c : 8.2 : 29.5
bgra_to_y_8_neon : 18.2 : 32.7
bgra_to_y_128_c : 101.4 : 361.5
bgra_to_y_128_neon : 74.9 : 73.7
bgra_to_y_1080_c : 739.4 : 3018.0
bgra_to_y_1080_neon : 613.4 : 544.2
bgra_to_y_1920_c : 1298.7 : 5326.0
bgra_to_y_1920_neon : 918.7 : 934.2
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-07-05 16:32:31 +08:00
Zhao Zhili
b8b71be07a
swscale/aarch64: Add bgr24 to yuv
...
Test on Apple M1 with kperf
: -O3 : -O3 -fno-vectorize
bgr24_to_uv_8_c : 28.5 : 52.5
bgr24_to_uv_8_neon : 54.5 : 59.7
bgr24_to_uv_128_c : 294.0 : 830.7
bgr24_to_uv_128_neon : 99.7 : 112.0
bgr24_to_uv_1080_c : 965.0 : 6624.0
bgr24_to_uv_1080_neon : 751.5 : 754.7
bgr24_to_uv_1920_c : 1693.2 : 11554.5
bgr24_to_uv_1920_neon : 1292.5 : 1307.5
bgr24_to_uv_half_8_c : 54.2 : 37.0
bgr24_to_uv_half_8_neon : 27.2 : 22.5
bgr24_to_uv_half_128_c : 127.2 : 392.5
bgr24_to_uv_half_128_neon : 63.0 : 52.0
bgr24_to_uv_half_1080_c : 880.2 : 3329.0
bgr24_to_uv_half_1080_neon : 401.5 : 390.7
bgr24_to_uv_half_1920_c : 1585.7 : 6390.7
bgr24_to_uv_half_1920_neon : 694.7 : 698.7
bgr24_to_y_8_c : 21.7 : 22.5
bgr24_to_y_8_neon : 797.2 : 25.5
bgr24_to_y_128_c : 88.0 : 280.5
bgr24_to_y_128_neon : 63.7 : 55.0
bgr24_to_y_1080_c : 616.7 : 2208.7
bgr24_to_y_1080_neon : 900.0 : 452.0
bgr24_to_y_1920_c : 1093.2 : 3894.7
bgr24_to_y_1920_neon : 777.2 : 767.5
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-07-05 16:32:31 +08:00
Ramiro Polla
61e851381f
swscale/yuv2rgb/x86: remove mmx/mmxext yuv2rgb functions
...
These functions are either slower or barely faster than the C LUT
yuv2rgb code.
2024-07-04 11:12:47 +02:00
Michael Niedermayer
c221c7422f
swscale/output: Avoid undefined overflow in yuv2rgb_write_full()
...
Fixes: signed integer overflow: -140140 * 16525 cannot be represented in type 'int'
Fixes: 68859/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-4516387130245120
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-06-26 20:49:36 +02:00
Michael Niedermayer
9e6c5b6e86
swscale/output: alpha can become negative after scaling, use multiply
...
Fixes: left shift of negative value -3245
Fixes: 69047/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-6571511551950848
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-06-26 20:49:36 +02:00
Ramiro Polla
e37a93031e
swscale/yuv2rgb: reindent after previous commit
2024-06-24 13:35:22 +02:00
Ramiro Polla
0a08c64588
swscale/yuv2rgb: fix yuv422p input in C code
...
The C code was silently ignoring the second chroma line on yuv422p
input.
2024-06-24 13:34:53 +02:00
Ramiro Polla
fb8fae864f
swscale/yuv2rgb: add macros to simplify code generation
2024-06-24 13:34:28 +02:00
Ramiro Polla
88a402df74
swscale/yuv2rgb: fix conversion for widths not aligned to 8
...
The C code for some pixel formats (rgb555, rgb565, rgb444, and monob)
was not converting the last pixels on widths not aligned to 8.
NOTE: the last pixel for odd widths is still not converted for any of
the pixel formats in the C code for yuv2rgb except for monob.
2024-06-24 13:33:53 +02:00
Ramiro Polla
75f1a8e071
swscale/aarch64: add neon {lum,chr}ConvertRange
...
chrRangeFromJpeg_8_c: 29.2
chrRangeFromJpeg_8_neon: 19.5
chrRangeFromJpeg_24_c: 80.5
chrRangeFromJpeg_24_neon: 34.0
chrRangeFromJpeg_128_c: 413.7
chrRangeFromJpeg_128_neon: 156.0
chrRangeFromJpeg_144_c: 471.0
chrRangeFromJpeg_144_neon: 174.2
chrRangeFromJpeg_256_c: 842.0
chrRangeFromJpeg_256_neon: 305.5
chrRangeFromJpeg_512_c: 1699.0
chrRangeFromJpeg_512_neon: 608.0
chrRangeToJpeg_8_c: 51.7
chrRangeToJpeg_8_neon: 22.7
chrRangeToJpeg_24_c: 149.7
chrRangeToJpeg_24_neon: 38.0
chrRangeToJpeg_128_c: 761.7
chrRangeToJpeg_128_neon: 176.7
chrRangeToJpeg_144_c: 866.2
chrRangeToJpeg_144_neon: 198.7
chrRangeToJpeg_256_c: 1516.5
chrRangeToJpeg_256_neon: 348.7
chrRangeToJpeg_512_c: 3067.2
chrRangeToJpeg_512_neon: 692.7
lumRangeFromJpeg_8_c: 24.0
lumRangeFromJpeg_8_neon: 17.0
lumRangeFromJpeg_24_c: 56.7
lumRangeFromJpeg_24_neon: 21.0
lumRangeFromJpeg_128_c: 294.5
lumRangeFromJpeg_128_neon: 76.7
lumRangeFromJpeg_144_c: 332.5
lumRangeFromJpeg_144_neon: 86.7
lumRangeFromJpeg_256_c: 586.0
lumRangeFromJpeg_256_neon: 152.2
lumRangeFromJpeg_512_c: 1190.0
lumRangeFromJpeg_512_neon: 298.0
lumRangeToJpeg_8_c: 31.7
lumRangeToJpeg_8_neon: 19.5
lumRangeToJpeg_24_c: 83.5
lumRangeToJpeg_24_neon: 24.2
lumRangeToJpeg_128_c: 440.5
lumRangeToJpeg_128_neon: 91.0
lumRangeToJpeg_144_c: 504.2
lumRangeToJpeg_144_neon: 101.0
lumRangeToJpeg_256_c: 879.7
lumRangeToJpeg_256_neon: 177.2
lumRangeToJpeg_512_c: 1794.2
lumRangeToJpeg_512_neon: 354.0
2024-06-18 23:12:41 +02:00
James Almer
fcf72966a5
swscale/x86/range_convert: add missing AVX2 preprocessor wrapper
...
Fixes compilation with old yasm.
Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-16 10:09:38 -03:00
James Almer
8a4c9d6bd3
swscale/x86/range_convert: reduce amount of xmm regs clobbered in luma functions
...
Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-15 21:02:06 -03:00
Ramiro Polla
f6859cade3
swscale/x86: add sse2 and avx2 {lum,chr}ConvertRange
...
chrRangeFromJpeg_8_c: 22.3
chrRangeFromJpeg_8_sse2: 13.3
chrRangeFromJpeg_8_avx2: 13.3
chrRangeFromJpeg_24_c: 72.8
chrRangeFromJpeg_24_sse2: 22.3
chrRangeFromJpeg_24_avx2: 17.5
chrRangeFromJpeg_128_c: 345.5
chrRangeFromJpeg_128_sse2: 106.0
chrRangeFromJpeg_128_avx2: 57.8
chrRangeFromJpeg_144_c: 380.5
chrRangeFromJpeg_144_sse2: 118.5
chrRangeFromJpeg_144_avx2: 62.3
chrRangeFromJpeg_256_c: 646.3
chrRangeFromJpeg_256_sse2: 218.8
chrRangeFromJpeg_256_avx2: 109.0
chrRangeFromJpeg_512_c: 1461.5
chrRangeFromJpeg_512_sse2: 426.5
chrRangeFromJpeg_512_avx2: 211.5
chrRangeToJpeg_8_c: 37.8
chrRangeToJpeg_8_sse2: 10.5
chrRangeToJpeg_8_avx2: 14.0
chrRangeToJpeg_24_c: 114.3
chrRangeToJpeg_24_sse2: 23.5
chrRangeToJpeg_24_avx2: 16.3
chrRangeToJpeg_128_c: 633.5
chrRangeToJpeg_128_sse2: 107.5
chrRangeToJpeg_128_avx2: 55.0
chrRangeToJpeg_144_c: 758.3
chrRangeToJpeg_144_sse2: 132.0
chrRangeToJpeg_144_avx2: 64.5
chrRangeToJpeg_256_c: 1345.0
chrRangeToJpeg_256_sse2: 218.0
chrRangeToJpeg_256_avx2: 105.3
chrRangeToJpeg_512_c: 2524.0
chrRangeToJpeg_512_sse2: 417.0
chrRangeToJpeg_512_avx2: 218.8
lumRangeFromJpeg_8_c: 11.8
lumRangeFromJpeg_8_sse2: 11.0
lumRangeFromJpeg_8_avx2: 10.3
lumRangeFromJpeg_24_c: 38.5
lumRangeFromJpeg_24_sse2: 15.5
lumRangeFromJpeg_24_avx2: 12.5
lumRangeFromJpeg_128_c: 232.3
lumRangeFromJpeg_128_sse2: 60.0
lumRangeFromJpeg_128_avx2: 26.8
lumRangeFromJpeg_144_c: 259.5
lumRangeFromJpeg_144_sse2: 65.3
lumRangeFromJpeg_144_avx2: 29.0
lumRangeFromJpeg_256_c: 464.5
lumRangeFromJpeg_256_sse2: 107.5
lumRangeFromJpeg_256_avx2: 54.0
lumRangeFromJpeg_512_c: 897.5
lumRangeFromJpeg_512_sse2: 224.5
lumRangeFromJpeg_512_avx2: 109.8
lumRangeToJpeg_8_c: 17.8
lumRangeToJpeg_8_sse2: 11.0
lumRangeToJpeg_8_avx2: 11.8
lumRangeToJpeg_24_c: 56.3
lumRangeToJpeg_24_sse2: 11.0
lumRangeToJpeg_24_avx2: 12.5
lumRangeToJpeg_128_c: 333.8
lumRangeToJpeg_128_sse2: 53.3
lumRangeToJpeg_128_avx2: 26.5
lumRangeToJpeg_144_c: 375.5
lumRangeToJpeg_144_sse2: 60.8
lumRangeToJpeg_144_avx2: 29.0
lumRangeToJpeg_256_c: 652.0
lumRangeToJpeg_256_sse2: 109.5
lumRangeToJpeg_256_avx2: 53.5
lumRangeToJpeg_512_c: 1284.3
lumRangeToJpeg_512_sse2: 218.0
lumRangeToJpeg_512_avx2: 108.3
2024-06-16 00:35:51 +02:00
Rémi Denis-Courmont
378d1b06c3
riscv: probe for Zbb extension at load time
...
Due to hysterical raisins, most RISC-V Linux distributions target a
RV64GC baseline excluding the Bit-manipulation ISA extensions, most
notably:
- Zba: address generation extension and
- Zbb: basic bit manipulation extension.
Most CPUs that would make sense to run FFmpeg on support Zba and Zbb
(including the current FATE runner), so it makes sense to optimise for
them. In fact a large chunk of existing assembler optimisations relies
on Zba and/or Zbb.
Since we cannot patch shared library code, the next best thing is to
carry a flag initialised at load-time and check it on need basis.
This results in 3 instructions overhead on isolated use, e.g.:
1: AUIPC rd, %pcrel_hi(ff_rv_zbb_supported)
LBU rd, %pcrel_lo(1b)(rd)
BEQZ rd, non_Zbb_fallback_code
// Zbb code here
The C compiler will typically load the flag ahead of time to reducing
latency, and can also keep it around if Zbb is used multiple times in a
single optimisation scope. For this to work, the flag symbol must be
hidden; otherwise the optimisation degrades with a GOT look-up to
support interposition:
1: AUIPC rd, GOT_OFFSET_HI
LD rd, GOT_OFFSET_LO(rd)
LBU rd, (rd)
BEQZ rd, non_Zbb_fallback_code
// Zbb code here
This patch adds code to provision the flag in libraries using bit
manipulation functions from libavutil: byte-swap, bit-weight and
counting leading or trailing zeroes.
2024-06-11 20:12:37 +03:00
Rémi Denis-Courmont
417957ec5e
sws/range_convert: R-V V to/from JPEG
...
C908 X60
chrRangeFromJpeg_8_c: 2.7 2.5
chrRangeFromJpeg_8_rvv_i32: 1.7 1.5
chrRangeFromJpeg_24_c: 7.5 6.7
chrRangeFromJpeg_24_rvv_i32: 1.7 1.5
chrRangeFromJpeg_128_c: 55.2 34.7
chrRangeFromJpeg_128_rvv_i32: 6.5 3.0
chrRangeFromJpeg_144_c: 44.0 39.2
chrRangeFromJpeg_144_rvv_i32: 7.7 4.5
chrRangeFromJpeg_256_c: 78.2 69.5
chrRangeFromJpeg_256_rvv_i32: 12.2 6.0
chrRangeFromJpeg_512_c: 172.2 138.5
chrRangeFromJpeg_512_rvv_i32: 24.5 11.7
chrRangeToJpeg_8_c: 4.7 4.2
chrRangeToJpeg_8_rvv_i32: 2.0 1.7
chrRangeToJpeg_24_c: 13.7 12.2
chrRangeToJpeg_24_rvv_i32: 2.0 1.5
chrRangeToJpeg_128_c: 72.0 63.7
chrRangeToJpeg_128_rvv_i32: 6.7 3.2
chrRangeToJpeg_144_c: 80.7 71.7
chrRangeToJpeg_144_rvv_i32: 8.5 4.7
chrRangeToJpeg_256_c: 143.2 127.2
chrRangeToJpeg_256_rvv_i32: 13.5 6.5
chrRangeToJpeg_512_c: 285.7 253.7
chrRangeToJpeg_512_rvv_i32: 27.0 13.0
lumRangeFromJpeg_8_c: 1.7 1.5
lumRangeFromJpeg_8_rvv_i32: 1.2 1.0
lumRangeFromJpeg_24_c: 4.2 3.7
lumRangeFromJpeg_24_rvv_i32: 1.2 1.0
lumRangeFromJpeg_128_c: 21.7 19.2
lumRangeFromJpeg_128_rvv_i32: 3.7 1.7
lumRangeFromJpeg_144_c: 24.7 22.0
lumRangeFromJpeg_144_rvv_i32: 4.7 2.7
lumRangeFromJpeg_256_c: 43.7 39.0
lumRangeFromJpeg_256_rvv_i32: 7.5 3.2
lumRangeFromJpeg_512_c: 87.0 77.2
lumRangeFromJpeg_512_rvv_i32: 14.5 6.7
lumRangeToJpeg_8_c: 2.7 2.2
lumRangeToJpeg_8_rvv_i32: 1.0 1.0
lumRangeToJpeg_24_c: 7.2 6.5
lumRangeToJpeg_24_rvv_i32: 1.2 1.0
lumRangeToJpeg_128_c: 37.7 33.7
lumRangeToJpeg_128_rvv_i32: 3.7 2.0
lumRangeToJpeg_144_c: 42.5 37.7
lumRangeToJpeg_144_rvv_i32: 4.7 2.7
lumRangeToJpeg_256_c: 75.0 66.7
lumRangeToJpeg_256_rvv_i32: 7.5 3.5
lumRangeToJpeg_512_c: 149.5 133.0
lumRangeToJpeg_512_rvv_i32: 14.7 7.0
2024-06-10 22:48:52 +03:00
Zhao Zhili
9dac8495b0
swscale/aarch64: Add rgb24 to yuv implementation
...
Test on Apple M1:
rgb24_to_uv_8_c: 0.0
rgb24_to_uv_8_neon: 0.2
rgb24_to_uv_128_c: 1.0
rgb24_to_uv_128_neon: 0.5
rgb24_to_uv_1080_c: 7.0
rgb24_to_uv_1080_neon: 5.7
rgb24_to_uv_1920_c: 12.5
rgb24_to_uv_1920_neon: 9.5
rgb24_to_uv_half_8_c: 0.2
rgb24_to_uv_half_8_neon: 0.2
rgb24_to_uv_half_128_c: 1.0
rgb24_to_uv_half_128_neon: 0.5
rgb24_to_uv_half_1080_c: 6.2
rgb24_to_uv_half_1080_neon: 3.0
rgb24_to_uv_half_1920_c: 11.2
rgb24_to_uv_half_1920_neon: 5.2
rgb24_to_y_8_c: 0.2
rgb24_to_y_8_neon: 0.0
rgb24_to_y_128_c: 0.5
rgb24_to_y_128_neon: 0.5
rgb24_to_y_1080_c: 4.7
rgb24_to_y_1080_neon: 3.2
rgb24_to_y_1920_c: 8.0
rgb24_to_y_1920_neon: 5.7
On Pixel 6:
rgb24_to_uv_8_c: 30.7
rgb24_to_uv_8_neon: 56.9
rgb24_to_uv_128_c: 213.9
rgb24_to_uv_128_neon: 173.2
rgb24_to_uv_1080_c: 1649.9
rgb24_to_uv_1080_neon: 1424.4
rgb24_to_uv_1920_c: 2907.9
rgb24_to_uv_1920_neon: 2480.7
rgb24_to_uv_half_8_c: 36.2
rgb24_to_uv_half_8_neon: 33.4
rgb24_to_uv_half_128_c: 167.9
rgb24_to_uv_half_128_neon: 99.4
rgb24_to_uv_half_1080_c: 1293.9
rgb24_to_uv_half_1080_neon: 778.7
rgb24_to_uv_half_1920_c: 2292.7
rgb24_to_uv_half_1920_neon: 1328.7
rgb24_to_y_8_c: 19.7
rgb24_to_y_8_neon: 27.7
rgb24_to_y_128_c: 129.9
rgb24_to_y_128_neon: 96.7
rgb24_to_y_1080_c: 995.4
rgb24_to_y_1080_neon: 767.7
rgb24_to_y_1920_c: 1747.4
rgb24_to_y_1920_neon: 1337.2
Note both tests use clang as compiler, which has vectorization
enabled by default with -O3.
Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-06-11 01:12:09 +08:00
James Almer
17c3cc5bb6
swscale/x86/rgb_2_rgb: add missing wrap to ff_uyvytoyuv422_avx2
...
Fixes old yasm.
Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 16:04:36 -03:00
James Almer
03546f49a3
swscale/x86/rgb2rgb: add missing wrap for ff_uyvytoyuv422_avx2
...
Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 15:56:52 -03:00
James Almer
e8cef5e152
swscale/x86/rgb2rgb: remove mmxext version of shuffle_bytes_2103
...
Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 13:43:11 -03:00
James Almer
c578bb9864
swscale/x86/input: add AVX2 optimized uyvytoyuv422
...
uyvytoyuv422_c: 23991.8
uyvytoyuv422_sse2: 2817.8
uyvytoyuv422_avx: 2819.3
uyvytoyuv422_avx2: 1972.3
Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 13:43:11 -03:00
James Almer
e9cfd53257
swscale/x86/input: add AVX2 optimized RGB32 to YUV functions
...
abgr_to_uv_8_c: 43.3
abgr_to_uv_8_sse2: 14.3
abgr_to_uv_8_avx: 15.3
abgr_to_uv_8_avx2: 18.8
abgr_to_uv_128_c: 650.3
abgr_to_uv_128_sse2: 110.8
abgr_to_uv_128_avx: 112.3
abgr_to_uv_128_avx2: 64.8
abgr_to_uv_1080_c: 5456.3
abgr_to_uv_1080_sse2: 888.8
abgr_to_uv_1080_avx: 900.8
abgr_to_uv_1080_avx2: 518.3
abgr_to_uv_1920_c: 9692.3
abgr_to_uv_1920_sse2: 1593.8
abgr_to_uv_1920_avx: 1613.3
abgr_to_uv_1920_avx2: 864.8
abgr_to_y_8_c: 23.3
abgr_to_y_8_sse2: 12.8
abgr_to_y_8_avx: 13.3
abgr_to_y_8_avx2: 17.3
abgr_to_y_128_c: 308.3
abgr_to_y_128_sse2: 67.3
abgr_to_y_128_avx: 66.8
abgr_to_y_128_avx2: 44.8
abgr_to_y_1080_c: 2371.3
abgr_to_y_1080_sse2: 512.8
abgr_to_y_1080_avx: 505.8
abgr_to_y_1080_avx2: 314.3
abgr_to_y_1920_c: 4177.3
abgr_to_y_1920_sse2: 915.8
abgr_to_y_1920_avx: 926.8
abgr_to_y_1920_avx2: 519.3
bgra_to_uv_8_c: 37.3
bgra_to_uv_8_sse2: 13.3
bgra_to_uv_8_avx: 14.8
bgra_to_uv_8_avx2: 19.8
bgra_to_uv_128_c: 563.8
bgra_to_uv_128_sse2: 111.3
bgra_to_uv_128_avx: 112.3
bgra_to_uv_128_avx2: 64.8
bgra_to_uv_1080_c: 4691.8
bgra_to_uv_1080_sse2: 893.8
bgra_to_uv_1080_avx: 899.8
bgra_to_uv_1080_avx2: 517.8
bgra_to_uv_1920_c: 8332.8
bgra_to_uv_1920_sse2: 1590.8
bgra_to_uv_1920_avx: 1605.8
bgra_to_uv_1920_avx2: 867.3
bgra_to_y_8_c: 22.3
bgra_to_y_8_sse2: 12.8
bgra_to_y_8_avx: 12.8
bgra_to_y_8_avx2: 17.3
bgra_to_y_128_c: 291.3
bgra_to_y_128_sse2: 67.8
bgra_to_y_128_avx: 69.3
bgra_to_y_128_avx2: 45.3
bgra_to_y_1080_c: 2357.3
bgra_to_y_1080_sse2: 508.3
bgra_to_y_1080_avx: 518.3
bgra_to_y_1080_avx2: 399.8
bgra_to_y_1920_c: 4202.8
bgra_to_y_1920_sse2: 906.8
bgra_to_y_1920_avx: 907.3
bgra_to_y_1920_avx2: 526.3
Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 13:43:11 -03:00
James Almer
d5fe99dc5f
swscale/x86/input: add AVX2 optimized RGB24 to YUV functions
...
rgb24_to_uv_8_c: 39.3
rgb24_to_uv_8_sse2: 14.3
rgb24_to_uv_8_ssse3: 13.3
rgb24_to_uv_8_avx: 12.8
rgb24_to_uv_8_avx2: 14.3
rgb24_to_uv_128_c: 582.8
rgb24_to_uv_128_sse2: 127.3
rgb24_to_uv_128_ssse3: 107.3
rgb24_to_uv_128_avx: 111.3
rgb24_to_uv_128_avx2: 62.3
rgb24_to_uv_1080_c: 4981.3
rgb24_to_uv_1080_sse2: 1048.3
rgb24_to_uv_1080_ssse3: 876.8
rgb24_to_uv_1080_avx: 887.8
rgb24_to_uv_1080_avx2: 492.3
rgb24_to_uv_1280_c: 5906.8
rgb24_to_uv_1280_sse2: 1263.3
rgb24_to_uv_1280_ssse3: 1048.3
rgb24_to_uv_1280_avx: 1045.8
rgb24_to_uv_1280_avx2: 579.8
rgb24_to_uv_1920_c: 8665.3
rgb24_to_uv_1920_sse2: 1888.8
rgb24_to_uv_1920_ssse3: 1571.8
rgb24_to_uv_1920_avx: 1558.8
rgb24_to_uv_1920_avx2: 869.3
rgb24_to_y_8_c: 20.3
rgb24_to_y_8_sse2: 11.8
rgb24_to_y_8_ssse3: 10.3
rgb24_to_y_8_avx: 10.3
rgb24_to_y_8_avx2: 10.8
rgb24_to_y_128_c: 284.8
rgb24_to_y_128_sse2: 83.3
rgb24_to_y_128_ssse3: 66.8
rgb24_to_y_128_avx: 64.8
rgb24_to_y_128_avx2: 39.3
rgb24_to_y_1080_c: 2451.3
rgb24_to_y_1080_sse2: 696.3
rgb24_to_y_1080_ssse3: 516.8
rgb24_to_y_1080_avx: 518.8
rgb24_to_y_1080_avx2: 301.8
rgb24_to_y_1280_c: 2892.8
rgb24_to_y_1280_sse2: 816.8
rgb24_to_y_1280_ssse3: 623.3
rgb24_to_y_1280_avx: 616.3
rgb24_to_y_1280_avx2: 350.8
rgb24_to_y_1920_c: 4338.8
rgb24_to_y_1920_sse2: 1210.8
rgb24_to_y_1920_ssse3: 928.3
rgb24_to_y_1920_avx: 920.3
rgb24_to_y_1920_avx2: 534.8
Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 13:42:09 -03:00
Rémi Denis-Courmont
7a3369398f
sws/input: R-V V 32-bit RGB to halved UV
...
T-Head C908:
abgr_to_uv_half_8_c: 2.2
abgr_to_uv_half_8_rvv_i32: 3.5
abgr_to_uv_half_128_c: 44.0
abgr_to_uv_half_128_rvv_i32: 13.0
abgr_to_uv_half_1080_c: 245.0
abgr_to_uv_half_1080_rvv_i32: 107.2
abgr_to_uv_half_1920_c: 406.2
abgr_to_uv_half_1920_rvv_i32: 188.7
bgra_to_uv_half_8_c: 2.2
bgra_to_uv_half_8_rvv_i32: 3.5
bgra_to_uv_half_128_c: 26.5
bgra_to_uv_half_128_rvv_i32: 13.0
bgra_to_uv_half_1080_c: 219.7
bgra_to_uv_half_1080_rvv_i32: 107.0
bgra_to_uv_half_1920_c: 406.7
bgra_to_uv_half_1920_rvv_i32: 188.7
SpacemiT X60:
abgr_to_uv_half_8_c: 2.2
abgr_to_uv_half_8_rvv_i32: 3.0
abgr_to_uv_half_128_c: 28.2
abgr_to_uv_half_128_rvv_i32: 5.7
abgr_to_uv_half_1080_c: 235.5
abgr_to_uv_half_1080_rvv_i32: 47.7
abgr_to_uv_half_1920_c: 418.2
abgr_to_uv_half_1920_rvv_i32: 84.0
bgra_to_uv_half_8_c: 2.0
bgra_to_uv_half_8_rvv_i32: 3.0
bgra_to_uv_half_128_c: 23.7
bgra_to_uv_half_128_rvv_i32: 5.7
bgra_to_uv_half_1080_c: 195.5
bgra_to_uv_half_1080_rvv_i32: 47.7
bgra_to_uv_half_1920_c: 346.5
bgra_to_uv_half_1920_rvv_i32: 84.0
2024-06-09 14:33:04 +03:00
Rémi Denis-Courmont
e2f069905e
sws/input: R-V V 32-bit RGB to UV
2024-06-09 14:33:04 +03:00
Rémi Denis-Courmont
f5555cb106
sws/input: R-V V 32-bit RGB to Y
...
T-Head C908:
abgr_to_y_8_c: 2.5
abgr_to_y_8_rvv_i32: 2.2
abgr_to_y_128_c: 37.0
abgr_to_y_128_rvv_i32: 8.5
abgr_to_y_1080_c: 327.0
abgr_to_y_1080_rvv_i32: 69.5
abgr_to_y_1920_c: 552.0
abgr_to_y_1920_rvv_i32: 122.2
bgra_to_y_8_c: 2.5
bgra_to_y_8_rvv_i32: 2.2
bgra_to_y_128_c: 37.2
bgra_to_y_128_rvv_i32: 8.5
bgra_to_y_1080_c: 310.2
bgra_to_y_1080_rvv_i32: 69.5
bgra_to_y_1920_c: 568.2
bgra_to_y_1920_rvv_i32: 122.5
SpacemiT X60:
abgr_to_y_8_c: 2.5
abgr_to_y_8_rvv_i32: 2.0
abgr_to_y_128_c: 33.0
abgr_to_y_128_rvv_i32: 3.7
abgr_to_y_1080_c: 276.0
abgr_to_y_1080_rvv_i32: 31.5
abgr_to_y_1920_c: 493.7
abgr_to_y_1920_rvv_i32: 55.5
bgra_to_y_8_c: 2.2
bgra_to_y_8_rvv_i32: 2.0
bgra_to_y_128_c: 33.0
bgra_to_y_128_rvv_i32: 3.7
bgra_to_y_1080_c: 276.0
bgra_to_y_1080_rvv_i32: 31.5
bgra_to_y_1920_c: 490.7
bgra_to_y_1920_rvv_i32: 55.5
2024-06-09 14:33:04 +03:00
Andreas Rheinhardt
8b62fb231a
swscale/x86/rgb2rgb: Detemplatize
...
Every function in rgb2rgb_template.c is only compiled exactly
once; there is no overlap at all between the MMXEXT and the
SSE2 functions, so detemplatize it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 12:03:47 +02:00
Andreas Rheinhardt
5421dee0e7
swscale/x86/rgb2rgb_template: Remove unused uyvytoyv12
...
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 12:03:47 +02:00
Andreas Rheinhardt
c1c35380a7
swscale/x86/rgb2rgb: Don't unnecessarily check for inline ASM
...
The SSE2 and AVX versions of deinterleaveBytes are external ASM.
Move them out of the inline ASM template.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 12:03:47 +02:00
Andreas Rheinhardt
f7305eb3b3
swscale/x86/rgb2rgb_template: Remove unnecessary SFENCE
...
The ff_nv12ToUV_* functions don't use non-temporal stores
at all.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 12:03:47 +02:00
Rémi Denis-Courmont
e0f4d185f1
sws/input: R-V V rgb24ToUV_half and bgr24ToUV_half
...
T-Head C908:
rgb24_to_uv_half_4_c: 2.0
rgb24_to_uv_half_4_rvv_i32: 3.5
rgb24_to_uv_half_64_c: 27.0
rgb24_to_uv_half_64_rvv_i32: 12.5
rgb24_to_uv_half_540_c: 223.7
rgb24_to_uv_half_540_rvv_i32: 105.2
rgb24_to_uv_half_640_c: 265.5
rgb24_to_uv_half_640_rvv_i32: 123.7
rgb24_to_uv_half_960_c: 414.5
rgb24_to_uv_half_960_rvv_i32: 249.5
SpacemiT X60:
rgb24_to_uv_half_4_c: 1.7
rgb24_to_uv_half_4_rvv_i32: 4.2
rgb24_to_uv_half_64_c: 24.0
rgb24_to_uv_half_64_rvv_i32: 8.7
rgb24_to_uv_half_540_c: 199.2
rgb24_to_uv_half_540_rvv_i32: 72.5
rgb24_to_uv_half_640_c: 235.7
rgb24_to_uv_half_640_rvv_i32: 85.2
rgb24_to_uv_half_960_c: 353.5
rgb24_to_uv_half_960_rvv_i32: 127.5
2024-06-08 18:30:43 +03:00
Rémi Denis-Courmont
3ef5867e4b
sws/input: R-V V rgb24ToUV and bgr24ToUV
...
T-Head C908:
rgb24_to_uv_8_c: 2.7
rgb24_to_uv_8_rvv_i32: 3.2
rgb24_to_uv_128_c: 41.0
rgb24_to_uv_128_rvv_i32: 12.7
rgb24_to_uv_1080_c: 342.5
rgb24_to_uv_1080_rvv_i32: 105.7
rgb24_to_uv_1280_c: 406.0
rgb24_to_uv_1280_rvv_i32: 124.2
rgb24_to_uv_1920_c: 626.0
rgb24_to_uv_1920_rvv_i32: 186.0
SpacemiT X60:
rgb24_to_uv_8_c: 2.5
rgb24_to_uv_8_rvv_i32: 3.0
rgb24_to_uv_128_c: 36.5
rgb24_to_uv_128_rvv_i32: 5.7
rgb24_to_uv_1080_c: 304.2
rgb24_to_uv_1080_rvv_i32: 49.0
rgb24_to_uv_1280_c: 360.5
rgb24_to_uv_1280_rvv_i32: 57.5
rgb24_to_uv_1920_c: 540.7
rgb24_to_uv_1920_rvv_i32: 86.2
2024-06-08 18:30:43 +03:00
Rémi Denis-Courmont
79dfdac4db
sws/input: R-V V rgb24ToY & bgr24ToY
...
T-Head C908:
rgb24_to_y_8_c: 2.0
rgb24_to_y_8_rvv_i32: 2.7
rgb24_to_y_128_c: 26.2
rgb24_to_y_128_rvv_i32: 9.2
rgb24_to_y_1080_c: 219.5
rgb24_to_y_1080_rvv_i32: 76.2
rgb24_to_y_1280_c: 276.2
rgb24_to_y_1280_rvv_i32: 89.7
rgb24_to_y_1920_c: 389.7
rgb24_to_y_1920_rvv_i32: 134.2
SpacemiT X60:
rgb24_to_y_8_c: 1.7
rgb24_to_y_8_rvv_i32: 2.2
rgb24_to_y_128_c: 23.2
rgb24_to_y_128_rvv_i32: 4.2
rgb24_to_y_1080_c: 195.0
rgb24_to_y_1080_rvv_i32: 33.7
rgb24_to_y_1280_c: 231.0
rgb24_to_y_1280_rvv_i32: 40.0
rgb24_to_y_1920_c: 346.2
rgb24_to_y_1920_rvv_i32: 59.7
2024-06-08 18:30:43 +03:00
Ramiro Polla
5939f7228a
libswscale/x86/yuv_2_rgb: fix some comments
2024-06-07 15:24:06 +02:00
Shiyou Yin
6b35fcacdb
swscale: [loongarch] Fix undeclared functions prob.
...
Compile with '--disable-lasx', ‘lumRangeFromJpeg_lasx’ undeclared.
Reviewed-by: 金波 <jinbo@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-31 02:20:23 +02:00
Michael Niedermayer
bfc22f364d
swscale/yuv2rgb: Use 64bit for brightness computation
...
This will not overflow for normal values
Fixes: CID1500280 Unintentional integer overflow
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-28 03:48:06 +02:00
Michael Niedermayer
3f9daf1c18
swscale/x86/swscale: use a clearer name for INPUT_PLANER_RGB_A_FUNC_CASE
...
related: CID1497114 Missing break in switch
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-28 03:48:06 +02:00
Rémi Denis-Courmont
6c6313f1b5
swscale/riscv: explicitly require Zbb for MIN
2024-05-10 18:59:06 +03:00
Michael Niedermayer
1330a73cca
swscale/output: Fix integer overflow in yuv2rgba64_full_1_c_template()
...
Fixes: signed integer overflow: -1082982400 + -1079364728 cannot be represented in type 'int'
Fixes: 67910/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-5329011971522560
The input is 9bit in 16bit, the fuzzer fills all 16bit thus generating "invalid" input
No overflow should happen with valid input.
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-06 03:00:40 +02:00
Michael Niedermayer
a56559e688
swscale/output: Fix integer overflow in yuv2rgba64_1_c_template
...
Fixes: signed integer overflow: -831176 * 9539 cannot be represented in type 'int'
Fixes: 67869/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-5117342091640832
The input is 9bit in 16bit, the fuzzer fills all 16bit thus generating "invalid" input
No overflow should happen with valid input.
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-05-06 03:00:40 +02:00
Shiyou Yin
2a7d622ddd
swscale: [LA] Optimize swscale funcs in input.c
...
Optimized 7 funcs with LSX and LASX:
1. yuy2ToUV_c
2. yvy2ToUV_c
3. uyvyToUV_c
4. nv12ToUV_c
5. nv21ToUV_c
6. abgrToA_c
7. rgbaToA_c
Reviewed-by: colleague of Shiyou Yin
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-04-11 23:53:59 +02:00
Shiyou Yin
8b76df9142
swscale: [LA] Optimize yuv2plane1_8_c.
...
Reviewed-by: colleague of Shiyou Yin
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-04-11 23:53:59 +02:00
Shiyou Yin
f3fe2cb5f7
swscale: [LA] Optimize range convert for yuvj420p.
...
Reviewed-by: 陈昊 <chenhao@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-04-11 23:53:41 +02:00
Michael Niedermayer
1a9eda65d0
swscale/utils: Fix xInc overflow
...
Fixes: signed integer overflow: 2 * 1073741824 cannot be represented in type 'int'
Fixes: 67802/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-6249515855183872
Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-04-04 19:38:29 +02:00
Andreas Rheinhardt
428ff7bd8c
swscale/ppc/swscale_ppc_template: Reindent after the previous commit
...
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-04-04 16:47:21 +02:00
Andreas Rheinhardt
95b4aea5e3
swscale/ppc/swscale_ppc_template: Remove code not passing checkasm
...
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-04-04 16:45:23 +02:00
Andreas Rheinhardt
790f793844
avutil/common: Don't auto-include mem.h
...
There are lots of files that don't need it: The number of object
files that actually need it went down from 2011 to 884 here.
Keep it for external users in order to not cause breakages.
Also improve the other headers a bit while just at it.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-31 00:08:43 +01:00
Andreas Rheinhardt
b616be1649
lib*/version: Use static_assert for static asserts
...
Also update the checks that guard against inserting
a new enum entry in the middle of a range.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-03-31 00:08:42 +01:00