mirror of https://github.com/FFmpeg/FFmpeg.git synced 2026-04-24 04:44:54 +02:00
Commit Graph

123702 Commits

Author SHA1 Message Date
Lynne 5482deeb66 lavfi/scale_vulkan: fix width/height match check
Sponsored-by: Sovereign Tech Fund
2026-03-28 19:36:58 +01:00
Lynne 0e077f2dc1 swscale/vulkan: do not apply order_src/dst for packed r/w
> packed = load all components from a single plane (the index given by order_src[0])
> planar = load one component each from separate planes (the index given by order_src[i])

Sponsored-by: Sovereign Tech Fund
2026-03-28 19:36:04 +01:00
Lynne 69c9cfbddf swscale/vulkan: fix redundant check for packed data
This is always in the branch where packed == false.

Sponsored-by: Sovereign Tech Fund
2026-03-28 19:36:04 +01:00
Niklas Haas 814f862832 swscale/graph: add scaling ops when required
The question of whether to do vertical or horizontal scaling first is a tricky
one. There are several valid philosophies:

1. Prefer horizontal scaling on the smaller pixel size, since this lowers the
   cost of gather-based kernels.
2. Prefer minimizing the number of total filter taps, i.e. minimizing the size
   of the intermediate image.
3. Prefer minimizing the number of rows horizontal scaling is applied to.

Empirically, I'm still not sure which approach is best overall, and it probably
depends at least a bit on the exact filter kernels in use. But for now, I
opted to implement approach 3, which seems to work well. I will re-evaluate
this once the filter kernels are actually finalized.

The 'scale' in 'libswscale' can now stand for 'scaling'.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
Niklas Haas 2ef01689c4 swscale/x86/ops: add 4x4 transposed kernel for large filters
Above a certain filter size, we can load the offsets as scalars and loop
over filter taps instead. To avoid having to assemble the output register
in memory (or use some horrific sequence of blends and insertions), we process
4 adjacent pixels at a time and do a 4x4 transpose before accumulating the
weights.

Significantly faster than the existing kernels after 2-3 iterations.
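A minimal scalar model of this scheme, with illustrative names (not the actual SIMD code): gather 4 contiguous taps for each of 4 adjacent output pixels into rows, transpose the 4x4 block so each row holds one tap position across all 4 pixels, then accumulate the weights row-wise.

```c
/* Scalar model of the 4x4-transposed accumulation; the real code does this
 * with SIMD registers and blends. All names here are hypothetical. */
static void transpose4x4(float m[4][4])
{
    for (int i = 0; i < 4; i++)
        for (int j = i + 1; j < 4; j++) {
            float tmp = m[i][j];
            m[i][j] = m[j][i];
            m[j][i] = tmp;
        }
}

/* Filter 4 adjacent output pixels, each with a 4-tap kernel. */
static void filter4_transposed(float dst[4], const float *src,
                               const int offset[4], const float *weights)
{
    float m[4][4];
    /* gather: row i = the 4 contiguous taps for output pixel i */
    for (int i = 0; i < 4; i++)
        for (int t = 0; t < 4; t++)
            m[i][t] = src[offset[i] + t];

    transpose4x4(m); /* row t now holds tap t for all 4 pixels */

    for (int i = 0; i < 4; i++)
        dst[i] = 0.0f;
    for (int t = 0; t < 4; t++)
        for (int i = 0; i < 4; i++)
            dst[i] += m[t][i] * weights[i * 4 + t];
}
```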

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
Niklas Haas 4bf51d6615 swscale/x86/ops: add reference SWS_OP_FILTER_H implementation
This uses a naive gather-based loop, similar to the existing legacy hscale
SIMD. This has provably correct semantics (and avoids overflow as long as
the filter scale is 1 << 14 or so), though it's not particularly fast for
larger filter sizes.

We can specialize this to more efficient implementations in a subset of cases,
but for now, this guarantees a match to the C code.
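For illustration, a naive gather-based loop of this kind might look as follows, assuming 14-bit fixed-point coefficients (scale 1 << 14); names and signatures are hypothetical, not the libswscale API:

```c
#include <stdint.h>

/* Reference-style horizontal filter: per output pixel, gather `taps` source
 * pixels starting at a per-pixel offset and accumulate weighted sums.
 * With 8-bit input and a 1 << 14 filter scale, the int accumulator
 * cannot overflow for any reasonable tap count. */
static void filter_h_ref(uint8_t *dst, const uint8_t *src,
                         const int16_t *filter, const int *offset,
                         int dst_w, int taps)
{
    for (int x = 0; x < dst_w; x++) {
        int sum = 0;
        for (int t = 0; t < taps; t++)      /* gather + accumulate */
            sum += src[offset[x] + t] * filter[x * taps + t];
        sum = (sum + (1 << 13)) >> 14;      /* round, undo the 1 << 14 scale */
        dst[x] = sum < 0 ? 0 : sum > 255 ? 255 : sum;
    }
}
```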

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
Niklas Haas 568cdca9cc swscale/x86/ops: implement support for SWS_OP_FILTER_V
Ideally, we would like to be able to specialize these to fixed kernel
sizes as well (e.g. 2 taps), but that only saves a tiny bit of loop overhead
and at the moment I have more pressing things to focus on.

I found that using FMA instead of straight mulps/addps gains about 15%, so
I defined a separate FMA path that can be used when BITEXACT is not specified
(or when we can statically guarantee that the final sum fits into the floating
point range).
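The BITEXACT concern can be illustrated in plain C: an FMA rounds once, while a separate multiply and add round twice, so the last bit of the result can differ. The sketch below emulates the single-rounding behaviour via double arithmetic (exact for these inputs); it is an illustration only, since the commit's actual change lives in the x86 assembly.

```c
/* Two roundings: fl(fl(a*b) + c). */
static float mul_add(float a, float b, float c)
{
    float p = a * b;                  /* first rounding */
    return p + c;                     /* second rounding */
}

/* One rounding, like an FMA: a*b is exact in double (24-bit x 24-bit
 * product fits in 48 bits), so the only rounding happens when converting
 * the final sum back to float. */
static float fma_like(float a, float b, float c)
{
    return (float)((double)a * (double)b + (double)c);
}
```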

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
Niklas Haas 7966de1ce6 swscale/x86/ops: add support for applying y line bump
A single `imul` per line is completely irrelevant in terms of
overhead, and definitely not worth whatever precomputation would be
required to avoid it.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
Niklas Haas 77588898e2 swscale/x86/ops: add some missing packed shuffle instances
Missing ayuv64le -> gray and vyu444 -> gray; these conversions can arise
transiently during scaling.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
Niklas Haas 98f2aba45a swscale/x86/ops: add bxq/yq variants of bxd/yd
Sometimes, bxd/yd need to be passed directly to a 64-bit memory operand,
which requires the use of the 64-bit variants. Since we can't guarantee that
the high bits are correctly zero'd on function entry, add an explicit
movsxd instruction to cover the first loop iteration.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
Niklas Haas 48369f6cf2 swscale/x86/ops: reserve one more temporary register
Slightly more convenient for the calculations inside the filter kernel, and
ultimately not significant, since the extra register only needs to be saved
at the loop entrypoint.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
Niklas Haas 4ff32b6e86 swscale/ops_chain: add optional check() call to SwsOpEntry
Allows backends to implement more advanced logic for determining whether an
operation is compatible.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
Niklas Haas 7b6170a9a5 tests/swscale: don't hard-error on low bit depth SSIM loss
This is an expected consequence of the fact that the new ops code does not
yet do error diffusion, which only really affects formats like rgb4 and monow.

Specifically, this avoids erroring out with the following error:

 loss 0.214988 is WORSE by 0.0111071, ref loss 0.203881
 SSIM {Y=0.745148 U=1.000000 V=1.000000 A=1.000000}

when scaling monow -> monow from 96x96 to 128x96.

We can remove this hack again in the future when error diffusion is implemented,
but for now, this check prevents me from easily testing the scaling code.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
Niklas Haas d8b82c1097 tests/checkasm/sw_ops: add tests for SWS_OP_FILTER_H/V
These tests check that the (fused) read+filter ops work.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
Niklas Haas 0402ecc270 tests/checkasm/sw_ops: set value range on op list input
May allow more efficient implementations that rely on the value range being
constrained.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
Niklas Haas 43242e8a88 tests/checkasm/sw_ops: increase line count
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
Niklas Haas 1a8c3d522e swscale/ops_backend: add support for SWS_OP_FILTER_H
Naive scalar loop to serve mainly as a reference for the asm backends.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
Niklas Haas e787f75ec8 swscale/ops_backend: add support for SWS_OP_FILTER_V
These could be implemented as a special case of DECL_READ(), but the
amount of extra noise that entails is not worth it, especially given the
extra setup/free code that needs to be used here.

I've decided that, for now, the canonical implementation shall convert the
weights to floating point before doing the actual scaling. This is not a huge
efficiency loss (since the result will be 32-bit anyways, and mulps/addps are
1-cycle ops); so the main downside comes from the single extra float conversion
on the input pixels.

In theory, we may revisit this later if it turns out that using e.g. pmaddwd
is a win even for vertical scaling, but for now, this works and is a simple
starting point. Vertical scaling also tends to happen after horizontal scaling,
at which point the input will be F32 already to begin with.

For smaller types/kernels (e.g. U8 input with a reasonably sized kernel),
the result here is exact either way, since the resulting 8+14 bit sum fits
exactly into float.
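A scalar sketch of this approach, with hypothetical names (not the actual libswscale signature): convert the 14-bit fixed-point weights to float once up front, then accumulate with one mul/add per source line.

```c
#include <stdint.h>

/* Vertical filter with float weights: exact for 8-bit input and modest
 * kernels, since each 8+14 bit product fits into a float mantissa. */
static void filter_v_float(float *dst, const uint8_t **src,
                           const int16_t *weights, int w, int taps)
{
    float fw[8];                        /* assume taps <= 8 for this sketch */
    for (int t = 0; t < taps; t++)
        fw[t] = weights[t] / 16384.0f;  /* undo the 1 << 14 scale once */

    for (int x = 0; x < w; x++) {
        float sum = 0.0f;
        for (int t = 0; t < taps; t++)
            sum += src[t][x] * fw[t];   /* one mul/add per source line */
        dst[x] = sum;
    }
}
```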

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
Niklas Haas 542557ba47 swscale/ops_backend: implement support for y_bump map
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
Niklas Haas fce3deaa3b swscale/ops_backend: add SwsOpExec to SwsOpIter
Needed for the scaling kernel, which accesses line strides.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
Niklas Haas 0b91b5a5e4 swscale/ops_backend: remove unused/wrong #define
PIXEL_MIN is either useless (int) or wrong (float); should be -FLT_MAX
rather than FLT_MIN, if the intent is to capture the most negative possible
value.

Just remove it since we don't actually need it for anything.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
Niklas Haas e6e9c45892 swscale/ops_dispatch: try again with split subpasses if compile() fails
First, we try compiling the filter pass as-is, in case any backends decide to
handle the filter as a single pass (e.g. Vulkan, which will want to compile
it using internal temporary buffers and barriers).

If that fails, retry with a chained list of split passes.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
Niklas Haas e3daeff965 swscale/ops_dispatch: compute input x offset map for SwsOpExec
This is cheap to precompute and can be used as-is for gather-style horizontal
filter implementations.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
Niklas Haas dc88946d7b swscale/ops_dispatch: fix plane width calculation
This was wrong if sub_x > 1.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
Niklas Haas 78878b9daa swscale/ops_dispatch: refactor tail handling
Rather than dispatching the compiled function for each line of the tail
individually, with a memcpy to a shared buffer in between, this instead copies
the entire tail region into a temporary intermediate buffer, processes it with
a single dispatch call, and then copies the entire result back to the
destination.

The main benefit of this is that it enables scaling, subsampling or other
quirky layouts to continue working, which may require accessing lines adjacent
to the main input.

It also arguably makes the code a bit simpler and easier to follow, but YMMV.

One minor consequence of the change in logic is that we also no longer handle
the last row of an unpadded input buffer separately - instead, if *any* row
needs to be padded, *all* rows in the current slice will be padded. This is
a bit less efficient but much more predictable, and as discussed, basically
required for scaling/filtering anyways.

While we could implement some sort of hybrid regime where we only use the new
logic when scaling is needed, I really don't think this would gain us anything
concrete enough to be worth the effort, especially since the performance is
roughly the same across the board:

16 threads:
  yuv444p 1920x1080 -> ayuv 1920x1080: speedup=1.000x slower (input memcpy)
  rgb24   1920x1080 -> argb 1920x1080: speedup=1.012x faster (output memcpy)

1 thread:
  yuv444p 1920x1080 -> ayuv 1920x1080: speedup=1.062x faster (input memcpy)
  rgb24   1920x1080 -> argb 1920x1080: speedup=0.959x slower (output memcpy)

Overall speedup is +/- 1% across the board, well within margin of error.
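A toy model of the scheme described above, with hypothetical names and row stride equal to width for simplicity: process the aligned main region in place, then copy the unaligned tail of *all* rows into a padded scratch buffer, process it with one call, and copy only the valid part back.

```c
#include <stdint.h>
#include <string.h>

/* process() stands in for a compiled op chain. */
typedef void (*process_fn)(uint8_t *dst, const uint8_t *src,
                           int stride, int w, int h);

/* Sample "kernel" used for demonstration: invert every byte. */
static void invert(uint8_t *dst, const uint8_t *src, int stride, int w, int h)
{
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            dst[y * stride + x] = 255 - src[y * stride + x];
}

static void run_with_tail(uint8_t *dst, const uint8_t *src, int w, int h,
                          int block_w, process_fn process,
                          uint8_t *tmp_src, uint8_t *tmp_dst)
{
    const int main_w = w - w % block_w;
    const int tail_w = w - main_w;

    if (main_w)                         /* fast path on the aligned region */
        process(dst, src, w, main_w, h);
    if (!tail_w)
        return;

    for (int y = 0; y < h; y++) {       /* pad the tail of *every* row */
        memset(tmp_src + y * block_w, 0, block_w);
        memcpy(tmp_src + y * block_w, src + y * w + main_w, tail_w);
    }
    process(tmp_dst, tmp_src, block_w, block_w, h); /* single dispatch */
    for (int y = 0; y < h; y++)         /* copy back only the valid part */
        memcpy(dst + y * w + main_w, tmp_dst + y * block_w, tail_w);
}
```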

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
Niklas Haas 015abfab38 swscale/ops_dispatch: precompute relative y bump map
This is more useful for tight loops inside CPU backends, which can implement
this by having a shared path for incrementing to the next line (as normal),
and then a separate path for adding an extra position-dependent, stride
multiplied line offset after each completed line.

As a free upside, this encoding does not require any separate/special handling
for the exec tail.
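A sketch of the relative encoding, with hypothetical names: given the absolute source line for every destination line, store only the *extra* advance (on top of the usual +1 per line) to apply after completing each line, so a tight loop can simply do `src += (1 + bump[y]) * stride`.

```c
/* bump[y] = how many extra lines (possibly negative) to advance after
 * finishing destination line y, beyond the normal +1 increment. */
static void compute_y_bump(int *bump, const int *src_line, int dst_h)
{
    for (int y = 0; y < dst_h - 1; y++)
        bump[y] = src_line[y + 1] - src_line[y] - 1;
}
```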

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
Niklas Haas 2583d7ad9b swscale/ops_dispatch: add line offsets map to SwsOpPass
And use it to look up the correct source plane line for each destination
line. Needed for vertical scaling, in which case multiple output lines can
reference the same input line.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:14 +01:00
Niklas Haas 9f0353a5b7 swscale/ops_optimizer: implement filter optimizations
We have to move the filters out of the way very early to avoid blocking
SWS_OP_LINEAR fusion, since filters tend to be nested in between all the
decode and encode linear ops.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:13 +01:00
Niklas Haas a41bc1dea3 swscale/ops_optimizer: merge duplicate SWS_OP_SCALE
(As long as the constant doesn't overflow)

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:13 +01:00
Niklas Haas cba54e9e3b swscale/ops: add helper function to split filter subpasses
An operation list containing multiple filter passes, or containing nontrivial
operations before a filter pass, needs to be split up into multiple execution
steps with temporary buffers in between, at least for CPU backends.

This helper function introduces the necessary subpass splitting logic.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:13 +01:00
Niklas Haas bf09910292 swscale/ops: add filter kernel to SwsReadWriteOp
This allows reads to directly embed filter kernels. This is because, in
practice, a filter needs to be combined with a read anyways. To accomplish
this, we define filter ops as their semantic high-level operation types, and
then have the optimizer fuse them with the corresponding read/write ops
(where possible).

Ultimately, something like this will be needed anyways for subsampled formats,
and doing it here is just incredibly clean and beneficial compared to each
of the several alternative designs I explored.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:13 +01:00
Niklas Haas 63140bff5e swscale/ops: define SWS_OP_FILTER_H/V
This commit merely adds the definitions. The implementations will follow.

It may seem a bit impractical to have these filter ops given that they
break the usual 1:1 association between operation inputs and outputs, but
the design path I chose will have these filter "pseudo-ops" end up migrating
towards the read/write for CPU implementations. (Which don't benefit from
any ability to hide the intermediate memory internally the way e.g. a fused
Vulkan compute shader might).

What we gain from this design, on the other hand, is considerably cleaner
high-level code, which doesn't need to concern itself with low-level
execution details at all, and can just freely insert these ops wherever
it needs to. The dispatch layer will take care of actually executing these
by implicitly splitting apart subpasses.

To handle out-of-range values and so on, the filters by necessity have to
also convert the pixel range. I have settled on using floating point types
as the canonical intermediate format - not only does this save us from having
to define e.g. I32 as a new intermediate format, but it also allows these
operations to chain naturally into SWS_OP_DITHER, which will basically
always be needed after a filter pass anyways.

The one exception here is for point sampling, which would rather preserve
the input type. I'll worry about this optimization at a later point in time.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:13 +01:00
Niklas Haas 53ee892035 swscale/graph: add way to roll back passes
When an op list needs to be decomposed into a more complicated sequence
of passes, the compile() code may need to roll back passes that have already
been partially compiled, if a later pass fails to compile.

This matters for subpass splitting (e.g. for filtering), as well as for
plane splitting.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:13 +01:00
Niklas Haas 475b11b2e0 swscale/filters: write new filter LUT generation code
This is a complete rewrite of the math in swscale/utils.c initFilter(), using
floating point math and with a bit more polished UI and internals. I have
also included a substantial number of improvements, including a method to
numerically compute the true filter support size from the parameters, and a
more robust logic for the edge conditions. The upshot of these changes is
that the filter weight computation is now much simpler and faster, and with
fewer edge cases.

I copy/pasted the actual underlying kernel functions from libplacebo, so this
math is already quite battle-tested. I made some adjustments to the defaults
to align with the existing defaults in libswscale, for backwards compatibility.

Note that this commit introduces a lot more filter kernels than what we
actually expose; but they are cheap to carry around, don't take up binary
space, and will probably save some poor soul from incorrectly reimplementing
them in the future. Plus, I have plans to expand the list of functions down
the line, so it makes sense to just define them all, even if we don't
necessarily use them yet.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 18:50:13 +01:00
Niklas Haas f76aa4e408 swscale/tests/sws_ops: add option for summarizing all operation patterns
This can be used to either manually verify, or perhaps programmatically
generate, the list of operation patterns that need to be supported by a
backend to be feature-complete.

Sponsored-by: Sovereign Tech Fund
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 16:48:13 +00:00
Niklas Haas d7a079279f swscale/tests/sws_ops: refactor argument parsing
To allow for argumentless options in the future.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 16:48:13 +00:00
Ramiro Polla b6e470467e swscale/tests/sws_ops: add -v option to set log verbosity
Sponsored-by: Sovereign Tech Fund
Signed-off-by: Ramiro Polla <ramiro.polla@gmail.com>
2026-03-28 16:48:13 +00:00
Niklas Haas d3db2dc518 swscale/tests/sws_ops: simplify using ff_sws_enum_op_lists()
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 16:48:13 +00:00
Niklas Haas 4395e8f3a2 swscale/ops: add helper function to enumerate over all op lists
This moves the logic from tests/sws_ops into the library itself, where it
can be reused by e.g. the aarch64 asmgen backend to iterate over all possible
operation types it can expect to see.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 16:48:13 +00:00
Niklas Haas f62c837eb6 swscale/ops: move op-formatting code to helper function
Annoyingly, access to order_src/dst requires access to the SwsOpList, so
we have to append that data after the fact.

Maybe this is another incremental tick in favor of `SwsReadWriteOp` in the
ever-present question in my head of whether the plane order should go there
or into SwsOpList.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 16:48:13 +00:00
Niklas Haas 8fae195395 swscale/ops: avoid printing values for ignored components
Makes the list output a tiny bit tidier. This is cheap to support now thanks
to the print_q4() helper.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 16:48:13 +00:00
Niklas Haas 1caa548caf swscale/ops: refactor PRINTQ() macro
Instead of allocating a billion tiny temporary buffers, these helpers now
directly append to an AVBPrint. I decided to explicitly control whether or not
a value with denom 0 should be printed as "inf/nan" or as "_", because a lot
of ops have the implicit semantic of "den == 0 -> ignored". At the same time,
we don't want to obscure legitimate NAN/INF values when they do occur
unintentionally.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 16:48:13 +00:00
Niklas Haas 0d54a1b53a swscale/ops: remove , from comp min/max print-out for consistency
Interferes with an upcoming simplification, otherwise.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 16:48:13 +00:00
Niklas Haas a0bb0c2772 swscale/ops: use AVBPrint for assembling op descriptions
This commit does not yet touch the PRINTQ macro, but it gets rid of at least
one unnecessary hand-managed buffer.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 16:48:13 +00:00
Niklas Haas 95e6c68707 swscale/ops: print exact constant on SWS_OP_SCALE
More informative.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 16:48:13 +00:00
Niklas Haas f6d963553b swscale/ops: correctly uninit all ops in ff_sws_op_list_remove_at()
This only ever removed a single op, even with count > 1.

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 16:48:13 +00:00
Niklas Haas 6f1664382d swscale/format: add helper function to get "default" SwsFormat
But still apply the sanitization/defaulting logic from ff_fmt_from_frame().

Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 16:48:13 +00:00
Niklas Haas 08a7b714f2 swscale/format: move SwsFormat sanitization to helper function
Signed-off-by: Niklas Haas <git@haasn.dev>
2026-03-28 16:48:13 +00:00
Priyanshu Thapliyal d1bcaab230 avcodec/alsdec: preserve full float value in zero-truncated samples
Signed-off-by: Priyanshu Thapliyal <priyanshuthapliyal2005@gmail.com>
2026-03-28 12:18:37 +00:00
Priyanshu Thapliyal febc82690d avcodec/alsdec: propagate read_diff_float_data() errors in read_frame_data()
The return value of read_diff_float_data() was previously ignored,
allowing decode to continue silently with partially transformed samples
on malformed floating ALS input. Check and propagate the error.

All failure paths in read_diff_float_data() already return
AVERROR_INVALIDDATA, so the caller fix is sufficient without
any normalization inside the function.

Signed-off-by: Priyanshu Thapliyal <priyanshuthapliyal2005@gmail.com>
2026-03-28 11:53:38 +00:00