mirror of https://github.com/FFmpeg/FFmpeg.git synced 2024-11-26 19:01:44 +02:00
2629 Commits

Author SHA1 Message Date
Niklas Haas
775de8c19d swscale/rgb2xyz: follow convention on image pointers and strides
Instead of taking an int16_t pointer and a stride in halfwords, follow the
usual convention of treating all planes and strides as byte-addressed.

This does not have any immediate effect but makes these functions more
reusable without unintended "gotchas".

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-10-09 13:14:57 +02:00
Niklas Haas
9d8f5141cf swscale/rgb2xyz: add explicit width parameter
This fixes an 11-year-old bug in the rgb2xyz functions, when used with a
negative stride. The current loop bounds turned it into a no-op.

Additionally, this increases performance on highly cropped images, whose
stride may be substantially higher than the effective width.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-10-09 13:14:57 +02:00
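Note on the two rgb2xyz commits above: a minimal sketch, not the actual libswscale code, of a row loop that takes byte-addressed strides and an explicit width. The names process_plane16 and invert16 are hypothetical, and the per-sample operation is a placeholder for the real colour conversion. It assumes the failure mode the message implies: a loop bounded by the stride runs zero iterations when the stride is negative, while a loop bounded by a separate width does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-sample operation standing in for the real conversion. */
static uint16_t invert16(uint16_t v)
{
    return (uint16_t)(0xFFFF - v);
}

/*
 * Process w samples per row for h rows of 16-bit data.
 * Plane pointers are byte pointers and strides are in bytes; a stride may
 * be negative (bottom-up image) or much larger than w (cropped image).
 * The caller must pass buffers that are suitably aligned for uint16_t.
 */
static void process_plane16(uint8_t *dst, ptrdiff_t dst_stride,
                            const uint8_t *src, ptrdiff_t src_stride,
                            int w, int h)
{
    assert(w >= 0 && h >= 0);
    for (int y = 0; y < h; y++) {
        const uint16_t *s = (const uint16_t *)src;
        uint16_t *d = (uint16_t *)dst;

        /* The bound is the explicit width, never the stride. */
        for (int x = 0; x < w; x++)
            d[x] = invert16(s[x]);

        src += src_stride; /* byte arithmetic, sign-safe */
        dst += dst_stride;
    }
}
```

To read a bottom-up image, pass a pointer to the last row together with a negative src_stride; the inner loop still visits exactly w samples per row.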
Niklas Haas
ea228fc415 swscale/rgb2xyz: minor style fixes
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-10-09 13:14:57 +02:00
James Almer
04612351ab swscale/input: add V30X input support
Signed-off-by: James Almer <jamrial@gmail.com>
2024-10-08 22:26:07 -03:00
James Almer
ea05edc9e0 swscale/input: add VYU444 input support
Signed-off-by: James Almer <jamrial@gmail.com>
2024-10-08 22:24:47 -03:00
James Almer
ec7f5e314d swscale/input: add UYVA input support
Signed-off-by: James Almer <jamrial@gmail.com>
2024-10-08 22:24:47 -03:00
James Almer
bb37d3c33e swscale/input: add AYUV input support
Signed-off-by: James Almer <jamrial@gmail.com>
2024-10-08 22:24:47 -03:00
jinbo
e6ecc1e757 swscale: Fix conflicting types for loongarch
Build breaks after c1a0e65763

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-10-09 01:55:50 +02:00
Niklas Haas
477445722c swscale/ppc: fix altivec build failure
Fixes: c1a0e65763
2024-10-08 16:45:36 +02:00
Martin Storsjö
b9145fcab2 swscale: Fix aarch64 and i386 compilation failures
This unbreaks builds after c1a0e65763,
which broke with errors like

src/libswscale/aarch64/rgb2rgb.c:66:25: error: incompatible function pointer types assigning to 'void (*)(const uint8_t *, uint8_t *, uint8_t *, uint8_t *, int, int, int, int, int, const int32_t *)' (aka 'void (*)(const unsigned char *, unsigned char *, unsigned char *, unsigned char *, int, int, int, int, int, const int *)') from 'void (const uint8_t *, uint8_t *, uint8_t *, uint8_t *, int, int, int, int, int, int32_t *)' (aka 'void (const unsigned char *, unsigned char *, unsigned char *, unsigned char *, int, int, int, int, int, int *)') [-Wincompatible-function-pointer-types]
   66 |         ff_rgb24toyv12  = rgb24toyv12;
      |                         ^ ~~~~~~~~~~~

and

src/libswscale/aarch64/swscale_unscaled.c:213:29: error: incompatible function pointer types assigning to 'SwsFunc' (aka 'int (*)(struct SwsContext *, const unsigned char *const *, const int *, int, int, unsigned char *const *, const int *)') from 'int (SwsContext *, const uint8_t *const *, const int *, int, int, const uint8_t **, const int *)' (aka 'int (struct SwsContext *, const unsigned char *const *, const int *, int, int, const unsigned char **, const int *)') [-Wincompatible-function-pointer-types]
  213 |         c->convert_unscaled = nv24_to_yuv420p_neon_wrapper;
      |                             ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-10-08 09:29:07 +03:00
Niklas Haas
73b3344edd swscale/input: parametrize ff_sws_init_input_funcs() pointers
Following the precedent set by ff_sws_init_output_funcs().

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-10-07 19:51:34 +02:00
Niklas Haas
20b350b284 swscale/internal: add typedefs for input reading functions
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-10-07 19:51:34 +02:00
Niklas Haas
b90d522d2c swscale/internal: forward typedef SwsContext
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-10-07 19:51:34 +02:00
Niklas Haas
c1a0e65763 swscale/internal: constify SwsFunc
I want to move away from having random leaf processing functions mutate
plane pointers, and while we're at it, we might as well make the strides
and tables const as well.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-10-07 19:51:34 +02:00
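Note on c1a0e65763: a rough sketch of what a constified conversion callback can look like, with the parameter shape taken from the SwsFunc type quoted in the aarch64 build error above. Ctx, ConvFunc and copy_plane0 are hypothetical names, and the body is an illustration only. The leaf function cannot modify the plane-pointer or stride arrays, so it advances local copies instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the scaler context. */
typedef struct Ctx {
    int width; /* bytes to copy per row */
} Ctx;

/* Same parameter shape as the SwsFunc type shown in the compiler error. */
typedef int (*ConvFunc)(Ctx *c, const uint8_t *const src[],
                        const int src_stride[], int y, int h,
                        uint8_t *const dst[], const int dst_stride[]);

/* Copies h rows of plane 0; returns the number of rows written. */
static int copy_plane0(Ctx *c, const uint8_t *const src[],
                       const int src_stride[], int y, int h,
                       uint8_t *const dst[], const int dst_stride[])
{
    /* Local cursors: src[0] and dst[0] themselves stay untouched. */
    const uint8_t *s = src[0];
    uint8_t *d = dst[0] + (ptrdiff_t)y * dst_stride[0];

    assert(c->width >= 0);
    for (int i = 0; i < h; i++) {
        memcpy(d, s, (size_t)c->width);
        s += src_stride[0];
        d += dst_stride[0];
    }
    return h;
}
```

A function whose dst parameter is declared without the inner const, as in the errors reported after this commit, is a different type and can no longer be assigned to ConvFunc without a diagnostic.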
Niklas Haas
286bdc9cdc swscale/internal: turn cascaded_tmp into an array
Slightly more convenient to access from the new wrapping code.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-10-07 19:51:34 +02:00
Niklas Haas
61369484f6 swscale/internal: expose ff_update_palette() internally
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-10-07 19:51:34 +02:00
Niklas Haas
aee19ee431 swscale/internal: rename NB_SWS_DITHER for consistency
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-10-07 19:51:34 +02:00
Niklas Haas
41ce370b65 tests/swscale: fix minor typos
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-10-07 19:51:34 +02:00
Michael Niedermayer
38e224c2ba */version.h: bump after release/7.1 branch
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-09-24 17:10:35 +02:00
Michael Niedermayer
e1094ac45d */version.h: bump minor versions for release/7.1
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-09-24 17:07:30 +02:00
Zhao Zhili
e18b46d95f swscale/aarch64: Fix rgb24toyv12 only works with aligned width
Since c0666d8b, rgb24toyv12 is broken for width non-aligned to 16.
Add a simple wrapper to handle the non-aligned part.

Co-authored-by: johzzy <hellojinqiang@gmail.com>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-09-24 10:24:14 +08:00
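Note on e18b46d95f: a simplified, one-dimensional sketch of the wrapper idea described in the message, assuming a fast routine that only handles widths that are multiples of 16. The names halve_c, halve_fast16 and halve_wrapper are hypothetical, and halve_fast16 is plain C standing in for a SIMD routine. The real fix operates on rgb24toyv12, which also has to deal with rows and chroma planes.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar reference: works for any width. */
static void halve_c(uint8_t *dst, const uint8_t *src, int width)
{
    for (int x = 0; x < width; x++)
        dst[x] = src[x] >> 1;
}

/* Stand-in for a SIMD routine that processes 16 samples per iteration. */
static void halve_fast16(uint8_t *dst, const uint8_t *src, int width)
{
    assert(width % 16 == 0);
    for (int x = 0; x < width; x += 16)
        for (int i = 0; i < 16; i++)
            dst[x + i] = src[x + i] >> 1;
}

/* Wrapper: fast path for the aligned part, scalar path for the tail. */
static void halve_wrapper(uint8_t *dst, const uint8_t *src, int width)
{
    assert(width >= 0);
    int aligned = width & ~15; /* largest multiple of 16 <= width */

    if (aligned)
        halve_fast16(dst, src, aligned);
    if (width > aligned)
        halve_c(dst + aligned, src + aligned, width - aligned);
}
```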
Michael Niedermayer
bd80c97391 swscale/output: Fix undefined integer overflow in yuv2rgba64_2_c_template()
Fixes: signed integer overflow: -1082982400 + -1083218484 cannot be represented in type 'int'
Fixes: 70657/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-6707819712675840

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-09-19 00:24:26 +02:00
Michael Niedermayer
44c5641ae8 swscale/swscale: Use unsigned operation to avoid undefined behavior
I have not checked that the constant is correct; this just fixes the undefined behavior.

Fixes: signed integer overflow: -646656 * 3517 cannot be represented in type 'int'
Fixes: 70559/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-5209368631508992

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-09-19 00:10:38 +02:00
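Note on the two overflow fixes above: a small sketch of the general technique of doing the arithmetic in unsigned, where wraparound is well defined, and converting back afterwards. The helpers add_wrap and mul_wrap are hypothetical and are not the code from these commits. The sketch assumes a 32-bit two's complement int; before C23 the final conversion back to int is implementation-defined rather than undefined.

```c
#include <assert.h>
#include <limits.h>

/* a + b with wraparound instead of signed-overflow undefined behavior. */
static int add_wrap(int a, int b)
{
    return (int)((unsigned)a + (unsigned)b);
}

/* a * b with wraparound instead of signed-overflow undefined behavior. */
static int mul_wrap(int a, int b)
{
    return (int)((unsigned)a * (unsigned)b);
}

/* Overflow check for addition that itself avoids overflowing. */
static int add_would_overflow(int a, int b)
{
    assert(INT_MIN < -INT_MAX); /* two's complement assumption */
    return (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
}
```

The operand pairs quoted in the fuzzer reports (-1082982400 + -1083218484 and -646656 * 3517) are exactly the kind of input for which the plain int expression is undefined while the unsigned form yields a wrapped but well-defined value.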
Ramiro Polla
c0666d8bed swscale/aarch64/rgb2rgb: add neon implementation for rgb24toyv12
                              A55               A76
rgb24toyv12_16_200_c:     36890.6           17275.5
rgb24toyv12_16_200_neon:  12460.1 ( 2.96x)   5360.8 ( 3.22x)
rgb24toyv12_128_60_c:     83205.1           39884.8
rgb24toyv12_128_60_neon:  27468.4 ( 3.03x)  13552.5 ( 2.94x)
rgb24toyv12_512_16_c:     88111.6           42346.8
rgb24toyv12_512_16_neon:  29126.6 ( 3.03x)  14411.2 ( 2.94x)
rgb24toyv12_1920_4_c:     82068.1           39620.0
rgb24toyv12_1920_4_neon:  27011.6 ( 3.04x)  13492.2 ( 2.94x)
2024-09-06 23:11:13 +02:00
Ramiro Polla
caaec2ea95 swscale/x86/rgb2rgb: disable rgb24toyv12_mmxext for x86_64
The mmxext implementation is slower than the C version in x86_64.

                                m32               m64
rgb24toyv12_16_200_c:       24942.7           14812.6
rgb24toyv12_16_200_mmxext:  17857.2 ( 1.40x)  17400.4 ( 0.85x)
rgb24toyv12_128_60_c:       56892.9           35616.9
rgb24toyv12_128_60_mmxext:  40730.9 ( 1.40x)  39610.4 ( 0.90x)
rgb24toyv12_512_16_c:       58402.7           37209.4
rgb24toyv12_512_16_mmxext:  44842.4 ( 1.30x)  41136.2 ( 0.90x)
rgb24toyv12_1920_4_c:       54827.4           34737.4
rgb24toyv12_1920_4_mmxext:  51169.9 ( 1.07x)  34818.9 ( 1.00x)
2024-09-06 23:06:38 +02:00
Ramiro Polla
3604b2403c swscale/rgb2rgb: improve chroma conversion in ff_rgb24toyv12_c
The current code subsamples by dropping 3/4 pixels to calculate the
chroma components. This commit calculates the average of 4 rgb pixels
before calculating the chroma components, putting it in line with the
mmxext implementation.
2024-09-06 23:06:32 +02:00
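Note on 3604b2403c: a sketch contrasting the two subsampling strategies named in the message for one 2x2 block of packed RGB24 pixels. The function names, the rounding in avg_rgb_2x2 and the BT.601-style integer coefficients in rgb_to_u are illustrative assumptions, not the constants or rounding used by ff_rgb24toyv12_c.

```c
#include <assert.h>
#include <stdint.h>

/* Old behaviour as described: keep one pixel, drop the other three. */
static void pick_rgb_top_left(const uint8_t *row0, int out[3])
{
    for (int k = 0; k < 3; k++)
        out[k] = row0[k];
}

/* New behaviour as described: average the four pixels of the block. */
static void avg_rgb_2x2(const uint8_t *row0, const uint8_t *row1, int out[3])
{
    for (int k = 0; k < 3; k++)
        out[k] = (row0[k] + row0[3 + k] + row1[k] + row1[3 + k] + 2) >> 2;
}

/* Illustrative U (Cb) computation; coefficients are an assumption. */
static int rgb_to_u(const int rgb[3])
{
    assert(rgb[0] >= 0 && rgb[0] <= 255);
    /* The +128 rounds, the (128 << 8) adds the chroma offset; the sum
     * stays non-negative for 8-bit inputs, so the shift is well defined. */
    return (-38 * rgb[0] - 74 * rgb[1] + 112 * rgb[2] + 128 + (128 << 8)) >> 8;
}
```

For a block with one red pixel and three blue ones, the dropped-pixel variant derives chroma from the red pixel alone, while the averaged variant reflects all four pixels.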
Ramiro Polla
d8848325a6 swscale/aarch64/rgb2rgb: add deinterleaveBytes neon implementation
                                      A55               A76
deinterleave_bytes_c:             70342.0           34497.5
deinterleave_bytes_neon:          21594.5 ( 3.26x)   5535.2 ( 6.23x)
deinterleave_bytes_aligned_c:     71340.8           34651.2
deinterleave_bytes_aligned_neon:   8616.8 ( 8.28x)   3996.2 ( 8.67x)
2024-09-06 23:05:09 +02:00
Ramiro Polla
4c824ad391 swscale/x86/rgb2rgb: fix deinterleaveBytes writing past the end of the buffers 2024-09-06 23:05:04 +02:00
Ramiro Polla
f17a6bd200 swscale/x86/rgb2rgb: fix deinterleaveBytes for unaligned dst pointers 2024-09-06 23:05:01 +02:00
Rémi Denis-Courmont
27d28b68da swscale/rgb2rgb: enable R-V V deinterleaveBytes
T-Head C908:
deinterleave_bytes_c:                               100328.3 ( 1.00x)
deinterleave_bytes_rvv_i32:                          19331.3 ( 5.19x)
deinterleave_bytes_aligned_c:                       100337.5 ( 1.00x)
deinterleave_bytes_aligned_rvv_i32:                  15748.0 ( 6.37x)

SpacemiT X60:
deinterleave_bytes_c:                                95230.6 ( 1.00x)
deinterleave_bytes_rvv_i32:                           9790.3 ( 9.73x)
deinterleave_bytes_aligned_c:                        96564.1 ( 1.00x)
deinterleave_bytes_aligned_rvv_i32:                   7780.1 (12.41x)
2024-09-04 22:04:11 +03:00
Ramiro Polla
420d443600 swscale/aarch64: cosmetics fix (spaces inside curly braces) 2024-08-26 11:07:49 +02:00
Ramiro Polla
52887683e9 swscale/aarch64: add nv24/nv42 to yuv420p unscaled converter
                             A55               A76
nv24_yuv420p_128_c:       4956.1            1267.0
nv24_yuv420p_128_neon:    3109.1 ( 1.59x)    640.0 ( 1.98x)
nv24_yuv420p_1920_c:     35728.4           11736.2
nv24_yuv420p_1920_neon:   8011.1 ( 4.46x)   2436.0 ( 4.82x)
nv42_yuv420p_128_c:       4956.4            1270.5
nv42_yuv420p_128_neon:    3074.6 ( 1.61x)    639.5 ( 1.99x)
nv42_yuv420p_1920_c:     35685.9           11732.5
nv42_yuv420p_1920_neon:   7995.1 ( 4.46x)   2437.2 ( 4.81x)
2024-08-26 11:04:46 +02:00
Ramiro Polla
88a563ad18 swscale: export ff_copyPlane so it may be used by simd code 2024-08-26 11:04:46 +02:00
Ramiro Polla
4eb5594295 swscale: add nv24/nv42 to yuv420p unscaled converter 2024-08-26 11:04:46 +02:00
Martin Storsjö
cfe0a36352 libswscale: aarch64: Fix the indentation of some macro invocations
Signed-off-by: Martin Storsjö <martin@martin.st>
2024-08-22 14:40:30 +03:00
Martin Storsjö
507c2a5774 libswscale: arm: Don't assume aligned output in yuv2rgb functions
This fixes failures in recently added checkasm tests.

While the buffers in most cases are aligned, libswscale in general
can't assume the output to be aligned.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-08-19 23:04:52 +03:00
Ramiro Polla
181cd260db swscale/aarch64/yuv2rgb: add neon yuv42{0,2}p -> gbrp unscaled colorspace converters
checkasm --bench on a Raspberry Pi 5 Model B Rev 1.0:
yuv420p_gbrp_128_c: 1243.0
yuv420p_gbrp_128_neon: 453.5
yuv420p_gbrp_1920_c: 18165.5
yuv420p_gbrp_1920_neon: 6700.0
yuv422p_gbrp_128_c: 1463.5
yuv422p_gbrp_128_neon: 471.5
yuv422p_gbrp_1920_c: 21343.7
yuv422p_gbrp_1920_neon: 6743.5
2024-08-18 22:26:17 +02:00
Ramiro Polla
8744764a4c swscale/x86/yuv2rgb: add ssse3 yuv42{0,2}p -> gbrp unscaled colorspace converters
Note: this implementation is limited to x86_64 due to general purpose
      register pressure.

checkasm --bench on an Intel(R) Core(TM) i5-5300U CPU @ 2.30GHz:
yuv420p_gbrp_8_c: 118.5
yuv420p_gbrp_8_ssse3: 93.3
yuv420p_gbrp_128_c: 1068.3
yuv420p_gbrp_128_ssse3: 319.3
yuv420p_gbrp_1080_c: 8841.8
yuv420p_gbrp_1080_ssse3: 2211.8
yuv420p_gbrp_1920_c: 15903.8
yuv420p_gbrp_1920_ssse3: 3814.3
yuv422p_gbrp_8_c: 144.8
yuv422p_gbrp_8_ssse3: 93.8
yuv422p_gbrp_128_c: 1395.8
yuv422p_gbrp_128_ssse3: 313.0
yuv422p_gbrp_1080_c: 11551.5
yuv422p_gbrp_1080_ssse3: 2240.8
yuv422p_gbrp_1920_c: 20585.3
yuv422p_gbrp_1920_ssse3: 5249.5
yuva420p_gbrp_8_c: 117.5
yuva420p_gbrp_8_ssse3: 92.0
yuva420p_gbrp_128_c: 1593.0
yuva420p_gbrp_128_ssse3: 319.3
yuva420p_gbrp_1080_c: 8694.5
yuva420p_gbrp_1080_ssse3: 2186.0
yuva420p_gbrp_1920_c: 15946.5
yuva420p_gbrp_1920_ssse3: 3805.3
2024-08-18 22:26:14 +02:00
Ramiro Polla
4545205a26 swscale/yuv2rgb: add yuv42{0,2}p -> gbrp unscaled colorspace converters 2024-08-18 22:26:11 +02:00
Ramiro Polla
af5adf57e3 swscale/yuv2rgb: prepare YUV2RGBFUNC macro for multi-planar rgb
This will be used in the upcoming yuv42{0,2}p -> gbrp unscaled
colorspace converters.

There is no difference in performance.
2024-08-18 22:26:08 +02:00
Ramiro Polla
24063e7827 swscale/yuv2rgb: prepare LOADCHROMA/PUTFUNC macros for multi-planar rgb
This will be used in the upcoming yuv42{0,2}p -> gbrp unscaled
colorspace converters.

There is no difference in performance.
2024-08-18 22:26:05 +02:00
Niklas Haas
6b40be941a swscale/options: relax src/dst_h/v_chr_pos value range
When dealing with 4x subsampling ratios (log2 == 2), such as can arise
with 4:1:1 or 4:1:0, a value range of 512 is not enough to cover the
range of possible scenarios.

For example, bottom-sited chroma in 4:1:0 would require an offset of 768
(three luma rows). Simply double the limit to 1024. I don't see any
place in initFilter() that would experience overflow as a result of this
change, especially since get_local_pos() right-shifts it by the
subsampling ratio again.
2024-08-16 11:43:37 +02:00
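Note on 6b40be941a: a sketch of the arithmetic in the message, assuming chroma positions are expressed in units of 1/256 of a luma sample. That unit is inferred from "768 (three luma rows)" and is an assumption here, as are the helper names; the limits 512 and 1024 are the old and new values quoted in the message.

```c
#include <assert.h>

#define UNITS_PER_LUMA 256 /* assumption: 768 units == three luma rows */

/*
 * Offset of bottom-sited chroma for a given log2 subsampling ratio:
 * the chroma sample sits on the last of the (1 << log2_sub) luma rows.
 */
static int bottom_sited_chr_pos(int log2_sub)
{
    assert(log2_sub >= 0 && log2_sub <= 2);
    return ((1 << log2_sub) - 1) * UNITS_PER_LUMA;
}

/* Range check against an option's upper limit. */
static int fits_limit(int pos, int limit)
{
    return pos <= limit;
}
```

With 4x vertical subsampling (log2 == 2), as in 4:1:0, the bottom-sited offset exceeds the old limit of 512 but fits within the new limit of 1024.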
Niklas Haas
3e064f52eb swscale: document SWS_FULL_CHR_H_* flags
Based on my best understanding of what they do, given the source code.
2024-08-16 11:43:37 +02:00
James Almer
66592e8b10 swscale/output: don't leave the alpha channel undefined in vuyx and xv36le
It's non-deterministic, as shown by poisoning avfilter buffers instead of zeroing them.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-08-13 14:49:41 -03:00
Rémi Denis-Courmont
210877c5fd sws/riscv: depend on RVB and simplify accordingly 2024-08-05 21:16:26 +03:00
Rémi Denis-Courmont
bd0c3edb13 lavu/riscv: count bytes rather than words for bswap32
This removes the dependency on Zba at essentially zero cost.
2024-07-30 18:41:51 +03:00
Shiyou Yin
4713a5cc24 swscale: [loongarch] Fix checkasm-sw_yuv2rgb failure.
Reviewed-by: 陈昊 <chenhao@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-07-28 19:02:16 +02:00
Rémi Denis-Courmont
4f2472909e sws/riscv: add forward-edge CFI landing pads 2024-07-25 23:10:14 +03:00
Rémi Denis-Courmont
e91a8cc4de sws/riscv: require B or zba explicitly 2024-07-25 18:55:48 +03:00
Michael Niedermayer
bcab9789ef swscale/output: Fix integer overflows in yuv2rgba64_X_c_template
Fixes: signed integer overflow: -1082982400 + -1068681048 cannot be represented in type 'int'
Fixes: 69995/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-6285740271534080

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-07-21 15:35:08 +02:00