mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-02-09 14:14:39 +02:00

2726 Commits

Author SHA1 Message Date
Niklas Haas
5ca5bbd462 swscale/options: add -sws_dither none alias
While this one was technically supported on account of the generic options
code allowing "none" as a valid alias for 0, not having it listed here meant
it never showed up in e.g. the -h output, and is also inconsistent with other
places in the code that generally add an explicit alias with appropriate
documentation. Reduces user confusion at negligible cost.

Fixes: #9192
Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
2024-12-23 12:47:10 +01:00
Niklas Haas
253b8977c0 swscale: remove primaries/trc change warning
This is now supported when using the new API.
2024-12-23 12:33:43 +01:00
Niklas Haas
6da940e118 swscale/graph: allow dynamically updating HDR metadata
Without triggering a full graph reinit.
2024-12-23 12:33:43 +01:00
Niklas Haas
efff80c8f6 swscale/graph: add color mapping pass
This leverages the previously introduced color management subsystem in order
to adapt between transfer functions and color spaces, as well as for HDR tone
mapping.

Take special care to handle grayscale formats without a colorspace
gracefully.
2024-12-23 12:33:43 +01:00
Niklas Haas
a57fe519b6 swscale/lut3d: add 3DLUT dispatch system
This is a lightweight wrapper around the underlying color management system,
whose job it is merely to manage the 3DLUT state and apply them to the frame
data. This is where we might add platform-specific optimizations in the future.

I also plan on adding support for more pixel formats in the future. In
particular, we could support YUV or XYZ input formats directly using only
negligible additional code in the 3DLUT setup functions. This would eliminate
the major source of slowdown, which is currently the roundtrip to RGBA64.
2024-12-23 12:33:43 +01:00
Niklas Haas
dddf536d3d swscale/cms: add color management subsystem
The underlying color mapping logic was ported as straightforwardly as possible
from libplacebo, although the API and glue code have been very heavily
refactored / rewritten. In particular, the generalization of gamut mapping
methods is replaced by a single ICC intent selection, and constants have been
hard-coded.

To minimize the amount of overall operations, this gamut mapping LUT now embeds
a direct end-to-end transformation to the output color space; something that
libplacebo does in shaders, but which is prohibitively expensive in software.

In order to preserve compatibility with dynamic tone mapping without severely
regressing performance, we add the ability to generate a set of "split" LUTs:
one each for encoding the input and output to the perceptual color space, and a
third to embed the tone mapping operation. Additionally, this intermediate
space could be used for additional subjective effects (e.g. changing
saturation or brightness).

The big downside of the new approach is that generating a static color mapping
LUT is now fairly slow, as the chromaticity lobe peaks have to be recomputed
for every single RGB value, since correlated RGB colors are not necessarily
aligned in ICh space. Generating a split 3DLUT significantly alleviates this
problem because the expensive step is done as part of the IPT input LUT, which
can share the same hue peak calculation at least for all input intensities.
2024-12-23 12:33:43 +01:00
Niklas Haas
2e674780b7 swscale/csputils: add internal colorspace math helpers
Logic is, for the most part, a straight port of similar logic in
libplacebo's colorspace.c, with some general edits and refactors.
2024-12-23 12:33:43 +01:00
Niklas Haas
45f0a7ad33 swscale: add ICC intent enum and option
This setting can be used to influence the type of tone and gamut mapping used
internally when color space conversions are required. As discussed at VDD'24,
the default was set to relative colorimetric clipping, which is approximately
associative, surjective and idempotent. As such, it roundtrips well, although
it is strictly speaking not associative on out-of-gamut colors.
2024-12-23 12:33:43 +01:00
Niklas Haas
7b7c32322d swscale/utils: fix XYZ primaries tagging
Swscale currently handles XYZ by embedding a forced conversion to
BT.709 RGB with a hardcoded matrix. This is not ideal, but to preserve the
status quo and avoid any unexpected changes in behavior, this patch merely
fixes the inferred primaries tag to match the reality.

In the future, I would like to handle XYZ properly, via direct conversion
to the target colorspace (or possibly simply by using a more fitting
RGB intermediate like SMPTE428), but for now just keep the status quo.
2024-12-23 12:33:43 +01:00
Niklas Haas
1f0c500784 swscale/utils: add helper function to infer colorspace metadata
Logic is loosely based on equivalent decisions in libplacebo. The basic idea is to try
and be a bit conservative by treating AVCOL_*_UNSPECIFIED as a no-op, unless the
other primaries set are non-standard / wide-gamut or HDR. This helps avoid
unintended or unexpected colorspace conversions, while forcing it in cases where
we are almost certain it is needed. The major departure from libplacebo semantics
is that we do not default to a 1000:1 contrast ratio for SDR displays, instead modelling
them as idealized devices with an infinite contrast ratio.

In either case, setting SWS_STRICT overrides this behavior in favor of always
requiring explicit colorspace metadata.
2024-12-23 12:33:43 +01:00
Niklas Haas
9084d581e8 swscale/utils: read dynamic HDR10+ metadata from AVFrame
Logic ported from libplacebo's AVFrame helpers. The basic idea is to use the
provided MaxRGB/MaxSCL values to infer what the actual luminance would have
been, which HDR10+ metadata does not provide directly. It's worth pointing out
that this gives us an *upper* bound on the true maximum luminance, so any
error in the estimation cannot result in clipping.
2024-12-23 12:33:43 +01:00
Niklas Haas
7432fa19cd swscale/utils: read HDR mastering metadata from AVFrame
2024-12-23 12:33:43 +01:00
Niklas Haas
5b21b7f52c swscale/utils: set static/implied HDR metadata
Provide default values for the fields added in the previous commit.
2024-12-23 12:33:43 +01:00
Niklas Haas
a8d01dff9a swscale/utils: add HDR metadata to SwsFormat
Only add the condensed values that we actually care about. Group them into
a new struct to make it easier to discard or replace this metadata.

Define a special comparison function that does not choke on undefined/unknown
metadata.
2024-12-23 12:33:43 +01:00
Niklas Haas
b9dfe8138e swscale/utils: check for supported color transfers
We will use the av_csp_itu_eotf() functions to decode these internally, so
check this function to see if it succeeds.
2024-12-23 12:33:43 +01:00
Niklas Haas
6c9218d748 swscale/unscaled: allow semiplanar copies
As fixed in the previous commit, this enables semipacked range and
bit depth conversions. Previously these would go through the general
purpose path.

Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
2024-12-23 11:32:02 +01:00
Niklas Haas
77db7f9b87 swscale/unscaled: correctly copy semiplanar formats
This fixes multiple bugs with semiplanar formats like NV12. Not only did these
trigger false positives in the grayscale format checks (because dst[2] is NULL),
but they also copied an incorrect number of pixels.

Fixes conversions such as nv12 -> nv12, gray8 -> nv12, nv20le -> nv20be, etc.

Fixes: #11239
Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
2024-12-23 11:31:58 +01:00
Niklas Haas
c6bf7f6645 swscale/unscaled: correctly round yuv2yuv when not dithering
We should at least bias towards the nearest integer, instead of always
rounding down, when not dithering. This is a bit more correct.

The FATE changes are only in the cases where sws_dither was explicitly set
to "none", which is exactly as expected.

Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
2024-12-23 11:29:22 +01:00
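The rounding change in c6bf7f6645 can be illustrated with a minimal sketch. The helper names are hypothetical and the code is not taken from the commit; it only contrasts truncation with adding half an output step before dropping fractional bits.

```c
#include <assert.h>
#include <stdint.h>

/* Drop `shift` fractional bits by truncation: always rounds down. */
static int32_t shift_truncate(int32_t value, int shift)
{
    assert(value >= 0 && shift > 0 && shift < 31);
    return value >> shift;
}

/* Add half an output step first, so the result is rounded to the
 * nearest integer (ties round up). */
static int32_t shift_round_nearest(int32_t value, int shift)
{
    const int32_t bias = (int32_t)1 << (shift - 1);
    assert(value >= 0 && shift > 0 && shift < 31);
    assert(value <= INT32_MAX - bias);   /* guard against overflow */
    return (value + bias) >> shift;
}
```

With 8 fractional bits, the input 0x1FF (just under 2.0) truncates to 1 but rounds to 2.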
Niklas Haas
095f8038fa swscale/output: fix bilinear yuv2rgb chroma interpolation
These functions were divided into two special cases; one assuming that
uvalpha == 0, and the other assuming that uvalpha == 2048. This worked fine
for simple 2x chroma upscaling but broke for e.g. yuv410p, non-centered chroma,
or other special cases that involved non-aligned chroma filters.

Fix it by instead dividing this check into two cases, a uvalpha==0 fast path
and a uvalpha>0 general path. Instead of (A+B)/2 the general path now multiplies
in the true uvalpha weight.

I tried preserving the old fast path for the case of uvalpha == 2048, but this
was significantly slower in practice versus having just one general path.
However, we still need a uvalpha == 0 path for the unscaled case.

Fixes: ticket #5083
Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
2024-12-21 10:57:54 +01:00
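The difference between the old (A+B)/2 special case and the weighted general path in 095f8038fa can be sketched as follows. This is an illustration under assumptions, not the committed code: the function names are hypothetical, samples are assumed non-negative, and the 12-bit weight scale is inferred from 2048 being described as the midpoint.

```c
#include <assert.h>

/* Assumed weight scale: 12 fractional bits, so 4096 would be 1.0 and
 * 2048 is the exact midpoint between the two chroma samples. */
#define UVALPHA_ONE 4096

/* Old special case: plain average, only exact when uvalpha == 2048. */
static int chroma_average(int c0, int c1)
{
    return (c0 + c1) >> 1;
}

/* General path: weight both samples by the true uvalpha. */
static int chroma_blend(int c0, int c1, int uvalpha)
{
    assert(uvalpha >= 0 && uvalpha < UVALPHA_ONE);
    if (uvalpha == 0)
        return c0;                       /* fast path: nothing to mix */
    return (c0 * (UVALPHA_ONE - uvalpha) + c1 * uvalpha) >> 12;
}
```

At uvalpha == 2048 both functions agree; at any other weight only the general path gives the intended result.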
Niklas Haas
b38f6f9990 tests/swscale: allow nonzero positive return codes from sws_scale_frame()
See previous commit.

Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
2024-12-18 17:30:48 +01:00
Niklas Haas
e05a1bb879 swscale: fix documentation of sws_scale_frame()
Since its introduction, this function has claimed to return 0 on success, yet
never actually did so (until the introduction of the new graph based API). It
always returned the number of scaled lines, and continues to do so.

To avoid confusion, but also avoid regressing possible clients that relied on
the existing semantics, simply update the documentation to reflect the actual
behavior. Remain ambiguous about the exact interpretation of the return value
on account of the unfortunate difference in behavior between the legacy and
new scaling APIs.

Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
2024-12-18 17:30:48 +01:00
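For callers, the practical consequence of e05a1bb879 is to treat only negative return values as failure. The sketch below uses a hypothetical stand-in function rather than the real libswscale call, so both names are assumptions made for illustration.

```c
#include <assert.h>

/* Hypothetical stand-in for a scaling call: returns a negative error
 * code on failure and a non-negative value (for example a line count)
 * on success. This is NOT the real libswscale function. */
static int fake_scale(int lines, int fail)
{
    assert(lines >= 0);
    return fail ? -22 : lines;
}

/* Treat only negative values as errors; never compare against 0. */
static int scale_succeeded(int ret)
{
    return ret >= 0;
}
```

A check written as `ret != 0` would reject every successful call that reports a positive line count.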
Niklas Haas
2df655bc2c swscale/utils: fix sws_getCachedContext check
This logic was inverted, but || was not replaced by &&.

Fixes: ed5dd675624c83d9c69b406ce30e4e09f29970e3
Fixes: ticket #11353
Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
2024-12-18 17:30:26 +01:00
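The class of bug fixed in 2df655bc2c is an incomplete application of De Morgan's laws. A minimal sketch, with a hypothetical three-field key standing in for the much larger set of fields the real check compares:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cache key; used only for illustration. */
struct cache_key { int width, height, format; };

/* Original sense: rebuild if any field differs. */
static bool must_rebuild(const struct cache_key *a, const struct cache_key *b)
{
    assert(a && b);
    return a->width != b->width || a->height != b->height ||
           a->format != b->format;
}

/* Correct inversion: every != becomes == AND every || becomes &&. */
static bool can_reuse(const struct cache_key *a, const struct cache_key *b)
{
    assert(a && b);
    return a->width == b->width && a->height == b->height &&
           a->format == b->format;
}

/* Buggy inversion: comparisons flipped, but || left in place. */
static bool can_reuse_buggy(const struct cache_key *a, const struct cache_key *b)
{
    assert(a && b);
    return a->width == b->width || a->height == b->height ||
           a->format == b->format;
}
```

The buggy form reports a match as soon as a single field agrees, which is how a stale context could be reused.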
Niklas Haas
ce457bfccd swscale/slice: fix init of 32 bpc planes
In input.c and output.c and many other places, swscale follows the rule of using
15-bit intermediate if output bpc is <= 8, and 19-bit (inside int32_t)
intermediate otherwise. See e.g. the comments on hyScale() in
swscale_internal.h. These are also the coefficients that yuv2gbrpf32_full_X_c()
is using.

In contrast to this, the plane init code in slice.c (function fill_ones) is
assuming that we use 35-bit intermediates (inside 64-bit integers) for this
case, seemingly added by commit b4967fc71c63eae8cd96f9c46cd3e1fbd705bbf9 with
no further justification.

This causes a mismatch whenever the implicitly initialized plane contents leak
out to the output, e.g. when converting from grayscale to RGB.

Fixes: ticket #10716
Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
2024-12-16 12:21:55 +01:00
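The precision rule quoted in ce457bfccd can be written down as a small sketch. The helper names are hypothetical, and the choice of the top bit of the intermediate range as the fill value is an assumption made for illustration, not something the commit message states.

```c
#include <assert.h>
#include <stdint.h>

/* Intermediate precision as described in the commit message: 15 bits
 * for output depths up to 8, 19 bits (stored in int32_t) otherwise. */
static int intermediate_bits(int output_bpc)
{
    assert(output_bpc > 0 && output_bpc <= 32);
    return output_bpc <= 8 ? 15 : 19;
}

/* ASSUMPTION: the value used to pre-fill a plane is the top bit of
 * the intermediate range. */
static int32_t intermediate_one(int output_bpc)
{
    return (int32_t)1 << (intermediate_bits(output_bpc) - 1);
}
```

Under this rule a 32 bpc plane is initialized with the same 19-bit scale as a 16 bpc plane, so the init code and the output functions agree.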
Niklas Haas
ee903c4786 tests/swscale: fix sscanf return value check
We only parse 12 values, so this check always failed. Regression caused by
a change to the print format.

Fixes: 59c39a79cafdcb46972380aac5644f84059cd2a8
Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
2024-12-11 09:10:45 +01:00
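The pitfall behind ee903c4786 is that sscanf() returns the number of conversions actually assigned, so the expected count has to track the format string. A minimal sketch with a hypothetical three-field line format (the real test parses 12 values):

```c
#include <assert.h>
#include <stdio.h>

/* Parse "<width>x<height> <name>": two integers and a short word.
 * The expected count must equal the number of conversion specifiers. */
static int parse_line(const char *line, int *w, int *h, char name[16])
{
    enum { EXPECTED_FIELDS = 3 };
    int n;

    assert(line && w && h && name);
    n = sscanf(line, "%dx%d %15s", w, h, name);
    return n == EXPECTED_FIELDS ? 0 : -1;
}
```

Keeping the expected count in a named constant next to the format string makes it harder for the two to drift apart when the format changes.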
Ramiro Polla
ca889b1328 swscale/aarch64: add neon {lum,chr}ConvertRange16
aarch64 A55:
chrRangeFromJpeg16_1920_c:    32684.2
chrRangeFromJpeg16_1920_neon:  8431.2 (3.88x)
chrRangeToJpeg16_1920_c:      24996.8
chrRangeToJpeg16_1920_neon:    9395.0 (2.66x)
lumRangeFromJpeg16_1920_c:    17305.2
lumRangeFromJpeg16_1920_neon:  4586.5 (3.77x)
lumRangeToJpeg16_1920_c:      21144.8
lumRangeToJpeg16_1920_neon:    5069.8 (4.17x)

aarch64 A76:
chrRangeFromJpeg16_1920_c:    11523.8
chrRangeFromJpeg16_1920_neon:  3367.5 (3.42x)
chrRangeToJpeg16_1920_c:      11655.2
chrRangeToJpeg16_1920_neon:    4087.2 (2.85x)
lumRangeFromJpeg16_1920_c:     5762.0
lumRangeFromJpeg16_1920_neon:  1815.8 (3.17x)
lumRangeToJpeg16_1920_c:       5946.2
lumRangeToJpeg16_1920_neon:    2148.2 (2.77x)
2024-12-05 21:10:29 +01:00
Ramiro Polla
87052c0933 swscale/x86: add sse4 and avx2 {lum,chr}ConvertRange16
chrRangeFromJpeg16_1920_c:    3153.9
chrRangeFromJpeg16_1920_sse4: 1770.0 (1.78x)
chrRangeFromJpeg16_1920_avx2:  891.5 (3.54x)
chrRangeToJpeg16_1920_c:      3165.0
chrRangeToJpeg16_1920_sse4:   1953.2 (1.62x)
chrRangeToJpeg16_1920_avx2:    973.0 (3.25x)
lumRangeFromJpeg16_1920_c:    1298.5
lumRangeFromJpeg16_1920_sse4:  886.5 (1.46x)
lumRangeFromJpeg16_1920_avx2:  447.7 (2.90x)
lumRangeToJpeg16_1920_c:      1905.0
lumRangeToJpeg16_1920_sse4:    993.0 (1.92x)
lumRangeToJpeg16_1920_avx2:    498.9 (3.82x)
2024-12-05 21:10:29 +01:00
Ramiro Polla
6fe4a4ffb6 swscale/aarch64/range_convert: update neon range_convert functions to new API
aarch64 A55:
chrRangeFromJpeg8_1920_c:    28835.2 (1.00x)
chrRangeFromJpeg8_1920_neon:  5313.9 (5.43x)  5308.4 (5.43x)
chrRangeToJpeg8_1920_c:      23074.7 (1.00x)
chrRangeToJpeg8_1920_neon:    5551.3 (4.16x)  5549.2 (4.16x)
lumRangeFromJpeg8_1920_c:    15389.7 (1.00x)
lumRangeFromJpeg8_1920_neon:  3152.3 (4.88x)  3147.7 (4.89x)
lumRangeToJpeg8_1920_c:      19227.8 (1.00x)
lumRangeToJpeg8_1920_neon:    3628.7 (5.30x)  3630.2 (5.30x)

aarch64 A76:
chrRangeFromJpeg8_1920_c:    6324.4 (1.00x)
chrRangeFromJpeg8_1920_neon: 2344.5 (2.70x) 2304.2 (2.74x)
chrRangeToJpeg8_1920_c:      9656.0 (1.00x)
chrRangeToJpeg8_1920_neon:   2824.2 (3.42x) 2794.2 (3.46x)
lumRangeFromJpeg8_1920_c:    4422.0 (1.00x)
lumRangeFromJpeg8_1920_neon: 1104.5 (4.00x) 1106.2 (4.00x)
lumRangeToJpeg8_1920_c:      5949.1 (1.00x)
lumRangeToJpeg8_1920_neon:   1329.8 (4.47x) 1328.2 (4.48x)
2024-12-05 21:10:29 +01:00
Ramiro Polla
be108ebcf4 swscale/x86/range_convert: update sse2 and avx2 range_convert functions to new API
chrRangeFromJpeg8_1920_c:    2127.4 (1.00x)
chrRangeFromJpeg8_1920_sse2:  816.0 (2.61x)  813.5 (2.62x)
chrRangeFromJpeg8_1920_avx2:  408.9 (5.20x)  405.4 (5.25x)
chrRangeToJpeg8_1920_c:      3166.9 (1.00x)
chrRangeToJpeg8_1920_sse2:    815.0 (3.89x)  815.0 (3.89x)
chrRangeToJpeg8_1920_avx2:    404.5 (7.83x)  405.5 (7.81x)
lumRangeFromJpeg8_1920_c:    1263.0 (1.00x)
lumRangeFromJpeg8_1920_sse2:  411.0 (3.07x)  413.2 (3.06x)
lumRangeFromJpeg8_1920_avx2:  200.5 (6.30x)  201.9 (6.26x)
lumRangeToJpeg8_1920_c:      1886.8 (1.00x)
lumRangeToJpeg8_1920_sse2:    412.0 (4.58x)  408.9 (4.61x)
lumRangeToJpeg8_1920_avx2:    208.5 (9.05x)  205.7 (9.17x)
2024-12-05 21:10:29 +01:00
Ramiro Polla
384fe39623 swscale/range_convert: fix mpeg ranges in yuv range conversion for non-8-bit pixel formats
There is an issue with the constants used in YUV to YUV range conversion,
where the upper bound is not respected when converting to mpeg range.

With this commit, the constants are calculated at runtime, depending on
the bit depth. This approach also allows us to more easily understand how
the constants are derived.

For bit depths <= 14, the number of fixed point bits has been set to 14
for all conversions, to simplify the code.
For bit depths > 14, the number of fixed point bits has been raised and
set to 18, to allow for the conversion to be accurate enough for the mpeg
range to be respected.

The convert functions now take the conversion constants (coeff and offset)
as function arguments.
For bit depths <= 14, coeff is unsigned 16-bit and offset is 32-bit.
For bit depths > 14, coeff is unsigned 32-bit and offset is 64-bit.

x86_64:
chrRangeFromJpeg8_1920_c:    2127.4   2125.0  (1.00x)
chrRangeFromJpeg16_1920_c:   2325.2   2127.2  (1.09x)
chrRangeToJpeg8_1920_c:      3166.9   3168.7  (1.00x)
chrRangeToJpeg16_1920_c:     2152.4   3164.8  (0.68x)
lumRangeFromJpeg8_1920_c:    1263.0   1302.5  (0.97x)
lumRangeFromJpeg16_1920_c:   1080.5   1299.2  (0.83x)
lumRangeToJpeg8_1920_c:      1886.8   2112.2  (0.89x)
lumRangeToJpeg16_1920_c:     1077.0   1906.5  (0.56x)

aarch64 A55:
chrRangeFromJpeg8_1920_c:   28835.2  28835.6  (1.00x)
chrRangeFromJpeg16_1920_c:  28839.8  32680.8  (0.88x)
chrRangeToJpeg8_1920_c:     23074.7  23075.4  (1.00x)
chrRangeToJpeg16_1920_c:    17318.9  24996.0  (0.69x)
lumRangeFromJpeg8_1920_c:   15389.7  15384.5  (1.00x)
lumRangeFromJpeg16_1920_c:  15388.2  17306.7  (0.89x)
lumRangeToJpeg8_1920_c:     19227.8  19226.6  (1.00x)
lumRangeToJpeg16_1920_c:    15387.0  21146.3  (0.73x)

aarch64 A76:
chrRangeFromJpeg8_1920_c:    6324.4   6268.1  (1.01x)
chrRangeFromJpeg16_1920_c:   6339.9  11521.5  (0.55x)
chrRangeToJpeg8_1920_c:      9656.0   9612.8  (1.00x)
chrRangeToJpeg16_1920_c:     6340.4  11651.8  (0.54x)
lumRangeFromJpeg8_1920_c:    4422.0   4420.8  (1.00x)
lumRangeFromJpeg16_1920_c:   4420.9   5762.0  (0.77x)
lumRangeToJpeg8_1920_c:      5949.1   5977.5  (1.00x)
lumRangeToJpeg16_1920_c:     4446.8   5946.2  (0.75x)

NOTE: all simd optimizations for range_convert have been disabled.
      they will be re-enabled when they are fixed for each architecture.

NOTE2: the same issue still exists in rgb2yuv conversions, which is not
       addressed in this commit.
2024-12-05 21:10:29 +01:00
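The idea in 384fe39623 of deriving the conversion constants from the bit depth can be sketched as follows. This is a simplified illustration, not the committed code: it works on raw pixel values rather than swscale's intermediate format, handles luma only, and the function name and derivation are assumptions. Only the choice of 14 or 18 fixed point bits follows the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Convert one full-range luma sample to limited ("mpeg") range. */
static int64_t luma_full_to_limited(int64_t in, int depth)
{
    assert(depth >= 8 && depth <= 16);
    assert(in >= 0 && in < ((int64_t)1 << depth));

    const int frac_bits   = depth <= 14 ? 14 : 18;       /* per the commit */
    const int64_t in_max  = ((int64_t)1 << depth) - 1;   /* full-range peak */
    const int64_t out_min = (int64_t)16  << (depth - 8); /* limited black */
    const int64_t out_max = (int64_t)235 << (depth - 8); /* limited white */

    /* Truncating the coefficient keeps in_max from exceeding out_max. */
    const int64_t coeff  = ((out_max - out_min) << frac_bits) / in_max;
    /* The offset carries out_min plus half a step for rounding. */
    const int64_t offset = (out_min << frac_bits)
                         + ((int64_t)1 << (frac_bits - 1));

    return (in * coeff + offset) >> frac_bits;
}
```

For 8, 10 and 16 bits this maps the full-range peak exactly onto 235, 940 and 60160 respectively, so the upper bound of the mpeg range is respected.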
Ramiro Polla
58bcdeb742 swscale/aarch64/range_convert: saturate output instead of limiting input
aarch64 A55:
chrRangeFromJpeg8_1920_c:    28836.2 (1.00x)
chrRangeFromJpeg8_1920_neon:  5312.6 (5.43x)  5313.9 (5.43x)
chrRangeToJpeg8_1920_c:      44196.2 (1.00x)
chrRangeToJpeg8_1920_neon:    6034.6 (7.32x)  5551.3 (7.96x)
lumRangeFromJpeg8_1920_c:    15388.5 (1.00x)
lumRangeFromJpeg8_1920_neon:  3150.7 (4.88x)  3152.3 (4.88x)
lumRangeToJpeg8_1920_c:      23069.7 (1.00x)
lumRangeToJpeg8_1920_neon:    3873.2 (5.96x)  3628.7 (6.36x)

aarch64 A76:
chrRangeFromJpeg8_1920_c:     6334.7 (1.00x)
chrRangeFromJpeg8_1920_neon:  2264.5 (2.80x)  2344.5 (2.70x)
chrRangeToJpeg8_1920_c:      11474.5 (1.00x)
chrRangeToJpeg8_1920_neon:    2646.5 (4.34x)  2824.2 (4.06x)
lumRangeFromJpeg8_1920_c:     4453.2 (1.00x)
lumRangeFromJpeg8_1920_neon:  1104.8 (4.03x)  1104.5 (4.03x)
lumRangeToJpeg8_1920_c:       6645.0 (1.00x)
lumRangeToJpeg8_1920_neon:    1310.5 (5.07x)  1329.8 (5.00x)
2024-12-05 21:10:29 +01:00
Ramiro Polla
2d1358a84d swscale/range_convert: saturate output instead of limiting input
For bit depths <= 14, the result is saturated to 15 bits.
For bit depths > 14, the result is saturated to 19 bits.

x86_64:
chrRangeFromJpeg8_1920_c:    2126.5   2127.4  (1.00x)
chrRangeFromJpeg16_1920_c:   2331.4   2325.2  (1.00x)
chrRangeToJpeg8_1920_c:      3163.0   3166.9  (1.00x)
chrRangeToJpeg16_1920_c:     3163.7   2152.4  (1.47x)
lumRangeFromJpeg8_1920_c:    1262.2   1263.0  (1.00x)
lumRangeFromJpeg16_1920_c:   1079.5   1080.5  (1.00x)
lumRangeToJpeg8_1920_c:      1860.5   1886.8  (0.99x)
lumRangeToJpeg16_1920_c:     1910.2   1077.0  (1.77x)

aarch64 A55:
chrRangeFromJpeg8_1920_c:   28836.2  28835.2  (1.00x)
chrRangeFromJpeg16_1920_c:  28840.1  28839.8  (1.00x)
chrRangeToJpeg8_1920_c:     44196.2  23074.7  (1.92x)
chrRangeToJpeg16_1920_c:    36527.3  17318.9  (2.11x)
lumRangeFromJpeg8_1920_c:   15388.5  15389.7  (1.00x)
lumRangeFromJpeg16_1920_c:  15389.3  15388.2  (1.00x)
lumRangeToJpeg8_1920_c:     23069.7  19227.8  (1.20x)
lumRangeToJpeg16_1920_c:    19227.8  15387.0  (1.25x)

aarch64 A76:
chrRangeFromJpeg8_1920_c:    6334.7   6324.4  (1.00x)
chrRangeFromJpeg16_1920_c:   6336.0   6339.9  (1.00x)
chrRangeToJpeg8_1920_c:     11474.5   9656.0  (1.19x)
chrRangeToJpeg16_1920_c:     9640.5   6340.4  (1.52x)
lumRangeFromJpeg8_1920_c:    4453.2   4422.0  (1.01x)
lumRangeFromJpeg16_1920_c:   4414.2   4420.9  (1.00x)
lumRangeToJpeg8_1920_c:      6645.0   5949.1  (1.12x)
lumRangeToJpeg16_1920_c:     6005.2   4446.8  (1.35x)

NOTE: all simd optimizations for range_convert have been disabled
      except for x86, which already had the same behaviour.
      they will be re-enabled when they are fixed for each architecture.
2024-12-05 21:10:29 +01:00
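Saturating the output rather than limiting the input, as in 2d1358a84d, can be sketched like this. The helper names are hypothetical, and the coefficient and offset are illustrative placeholders rather than values taken from the commit; a 32-bit int is assumed.

```c
#include <assert.h>

/* Clamp a value to the unsigned range representable in `bits` bits. */
static int saturate_bits(int value, int bits)
{
    assert(bits > 0 && bits < 31);
    const int max = (1 << bits) - 1;
    return value < 0 ? 0 : value > max ? max : value;
}

/* Range expansion on a 15-bit intermediate sample: convert first,
 * then saturate the result. Constants are placeholders. */
static int expand_saturating(int in)
{
    assert(in >= 0 && in <= 32767);
    const int coeff = 19077, offset = -39057361;
    int scaled = in * coeff + offset;    /* may be negative or too large */
    if (scaled < 0)
        scaled = 0;                      /* avoid shifting a negative value */
    return saturate_bits(scaled >> 14, 15);
}
```

The same clamp with `bits` set to 19 would cover the case the commit describes for bit depths above 14.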
Niklas Haas
2f95bc3cb3 swscale/utils: disable full_chr_h_input optimization for odd width
The basic problem here is that the rgb*ToUV_half_* functions hard-code a
bilinear downsample from src[i] + src[i+1], with no bounds check on the i+1
access.

Due to the signature of the function, we cannot easily plumb the "true" width
into the function body to perform a bounds check. Similarly, we cannot easily
pre-pad the input because it is typically reading from the (const) input
frame, which would require a full memcpy to pad. Either of these solutions is
more trouble than the feature is worth, so just disable it on odd input sizes.

Fixes: use of uninitialized value
Fixes: ticket #11265
Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
2024-12-04 11:38:47 +01:00
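Why an odd width is a problem for 2f95bc3cb3 can be seen in a minimal sketch of a paired downsample. The function names are hypothetical and the code is not the swscale implementation; it only shows that each output sample reads two inputs, so an odd source width leaves the last pair incomplete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Averages horizontal pairs: out[i] = (src[2i] + src[2i+1] + 1) / 2.
 * Reads 2 * out_width samples, so src must hold at least that many. */
static void downsample_pairs(unsigned char *out, const unsigned char *src,
                             size_t out_width)
{
    assert(out && src);
    for (size_t i = 0; i < out_width; i++)
        out[i] = (unsigned char)((src[2 * i] + src[2 * i + 1] + 1) >> 1);
}

/* Only take the paired fast path when every pair is complete. */
static bool can_use_pair_path(size_t src_width)
{
    return (src_width & 1) == 0;
}
```

With a source width of 1919 and a rounded-up output width of 960, the loop would read 1920 samples, one past the end of the row.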
Niklas Haas
79452d382f swscale/graph: fix memleak of cascaded graphs
Just free them directly and discard the parent context.

Fixes: bf738412e849bcb8c63a330dfb814281b3d97f6b
Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: Sovereign Tech Fund
2024-12-04 11:38:30 +01:00
Michael Niedermayer
d32dcc07a7 swscale/swscale_unscaled: Fix odd height with nv24_to_yuv420p_chroma()
Fixes: out of array read
Fixes: 71726/clusterfuzz-testcase-ffmpeg_SWS_fuzzer-5876893532880896
Fixes: 377735917/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-6686071112400896

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Approved-by: Ramiro Polla <ramiro.polla@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-12-04 04:23:48 +01:00
Michael Niedermayer
aeec39f3c1 swscale/slice: clear allocated memory in alloc_lines()
Fixes: use of uninitialized memory in hScale16To15_c()
Fixes: 373924007/clusterfuzz-testcase-minimized-ffmpeg_SWS_fuzzer-5841199968092160

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-12-02 03:14:47 +01:00
Sean McGovern
b9eaf6e05c swscale/ppc: disable YUV2RGB AltiVec acceleration
The FATE test 'checkasm-sw_yuv2rgb' currently fails on this platform,
in both little- and big-endian configurations with AltiVec enabled.

Disable it for the time being.

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2024-12-02 02:51:39 +01:00
Rémi Denis-Courmont
da1ab7940e riscv: remove unnecessary #include's
2024-11-25 19:29:21 +02:00
Marvin Scholz
6b9f4f36f7 swscale/internal: fix typo in loongarch specific code
Regression from 2d077f9acda4946b3455ded5778fb3fc7e85bba2
2024-11-25 17:15:00 +01:00
Niklas Haas
3edd1e42b9 tests/swscale: add a benchmarking mode
With the ability to set the thread count as well. This benchmark includes
the constant overhead of context initialization.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-11-25 11:03:54 +01:00
Niklas Haas
59c39a79ca tests/swscale: rewrite on top of new API
This rewrite cleans up the code to use AVFrames and the new swscale API. The
log format has also been simplified and expanded to account for the new
options. (Not yet implemented)

The self testing code path has also been expanded to test the new swscale
implementation against the old one, to serve as an unchanging reference. This
does not accomplish much yet, but serves as a framework for future work.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-11-25 11:03:54 +01:00
Niklas Haas
2a091d4f2e swscale: introduce new, dynamic scaling API
As part of a larger, ongoing effort to modernize and partially rewrite
libswscale, it was decided and generally agreed upon to introduce a new
public API for libswscale. This API is designed to be less stateful, more
explicitly defined, and considerably easier to use than the existing one.

Most of the API work has been already accomplished in the previous commits,
this commit merely introduces the ability to use sws_scale_frame()
dynamically, without prior sws_init_context() calls. Instead, the new API
takes frame properties from the frames themselves, and the implementation is
based on the new SwsGraph API, which we simply reinitialize as needed.

This high-level wrapper also recreates the logic that used to live inside
vf_scale for scaling interlaced frames, enabling it to be reused more easily
by end users.

Finally, this function is designed to simply copy refs directly when nothing
needs to be done, substantially improving throughput of the noop fast path.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-11-25 11:03:50 +01:00
Niklas Haas
bf738412e8 swscale/graph: add new high-level scaler dispatch mechanism
This interface has been designed from the ground up to serve as a new
framework for dispatching various scaling operations at a high level. This
will eventually replace the old ad-hoc system of using cascaded contexts,
as well as allowing us to plug in more dynamic scaling passes requiring
intermediate steps, such as colorspace conversions, etc.

The starter implementation merely piggybacks off the existing sws_init() and
sws_scale() functions, though it does bring the immediate improvement of
splitting up cascaded functions and pre/post conversion functions into
separate filter passes, which allows them to e.g. be executed in parallel
even when the main scaler is required to be single threaded. Additionally,
a dedicated (multi-threaded) noop memcpy pass substantially improves
throughput of that fast path.

Follow-up commits will eventually expand this to move all of the scaling
decision logic into the graph init function, and also eliminate some of the
current special cases.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-11-25 11:02:16 +01:00
Niklas Haas
c461dcf291 swscale/internal: expose sws_init_single_context() internally
Used by the graph API swscale wrapper, for now.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-11-25 11:02:16 +01:00
Niklas Haas
fb16964009 swscale: organize and better document flags
Group them into an enum rather than random #defines, and document their
behavior a bit more obviously.

Of particular note, I discovered that SWS_DIRECT_BGR is not referenced
anywhere else in the code base. As such, I have moved it to the deprecated
section, alongside SWS_ERROR_DIFFUSION.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-11-25 11:02:12 +01:00
Niklas Haas
6a91a165fd swscale: eliminate redundant SwsInternal accesses
This is a purely cosmetic commit aimed at replacing accesses to
SwsInternal.opts by direct access to SwsContext wherever convenient.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-11-25 10:59:52 +01:00
Niklas Haas
ed5dd67562 swscale: expose SwsContext publicly
Following in the footsteps of the work in the previous commit, it's now
relatively straightforward to expose the options struct publicly as
SwsContext. This is a step towards making this more user friendly, as
well as following API conventions established elsewhere.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-11-25 10:59:49 +01:00
Niklas Haas
2d077f9acd swscale/internal: group user-facing options together
This is a preliminary step to separating these into a new struct. This
commit contains no functional changes, it is a pure search-and-replace.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-11-21 12:49:56 +01:00
Niklas Haas
10d1be2621 swscale/internal: use static_assert for enforcing offsets
Instead of sprinkling av_assert0 into random init functions.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-11-21 12:47:43 +01:00
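The technique named in 10d1be2621, checking struct offsets at compile time instead of with runtime assertions, can be sketched with C11 static_assert. The struct below is hypothetical and stands in for whatever layout the real code needs to pin down.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */
#include <stdint.h>

/* Hypothetical struct whose layout other code (for example
 * hand-written assembly) might depend on. */
struct asm_visible {
    int32_t coeff;     /* expected at byte 0 */
    int32_t offset;    /* expected at byte 4 */
    int64_t scratch;   /* expected at byte 8 */
};

/* Checked at compile time: a layout change breaks the build instead
 * of failing at run time inside an init function. */
static_assert(offsetof(struct asm_visible, coeff)   == 0, "coeff moved");
static_assert(offsetof(struct asm_visible, offset)  == 4, "offset moved");
static_assert(offsetof(struct asm_visible, scratch) == 8, "scratch moved");
```

Because the checks live next to the struct definition, they cost nothing at run time and cannot be skipped by a code path that never reaches an init function.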
Niklas Haas
55d5eae411 swscale/options: cosmetic changes
Reorganize the list, fix whitespace, make indentation consistent, and
rename some descriptions for clarity, consistency or informativeness.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2024-11-21 12:47:14 +01:00
Rémi Denis-Courmont
1912c86af6 sws/range_convert: fix RISC-V chrFromJpeg
2024-11-17 11:28:21 +02:00