Instead of calculating a^2, b^2, (a+b)^2 and (a-b)^2, calculate only
a^2, b^2 and 2*a*b in each iteration and derive the latter parts from
these three at the end.
Before and after:
A78
ac3_sum_square_bufferfly_int32_neon: 484.8 ( 2.00x)
ac3_sum_square_bufferfly_int32_neon: 468.2 ( 2.08x)
A72
ac3_sum_square_bufferfly_int32_neon: 793.6 ( 1.26x)
ac3_sum_square_bufferfly_int32_neon: 527.3 ( 1.92x)
Signed-off-by: Martin Storsjö <martin@martin.st>
Constify dstStrice argument of rgbx_to_nv12_neon_16_wrapper to be
compatible with other functions as used in function assignment.
Signed-off-by: Adam Lackorzynski <adam@l4re.org>
Signed-off-by: Martin Storsjö <martin@martin.st>
(setting to 100 as a reasonable compromise)
The change has caused regressions for many users and consumers.
Playlist reloads only happen when a playlist doesn't indicate that it
has ended (via #EXT-X-ENDLIST), which means that the addition of future
segments is still expected.
It is well possible that an HLS server is temporarily unable to serve
further segments but resumes after some time, either indicating a
discontinuity or even by fully catching up.
With a segment length of 3s, a max_reload value of 1000 corresponds to
a duration of 50 minutes which appears to be a reasonable default.
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The issue is that this could consume gigabytes of VRAM at higher
resolutions for not that much of a speedup.
Automatic detection was not a good idea as we can't know how much
VRAM is actually free.
Just remove it.
BGR formats in Vulkan cannot be used in storage images, as the
pixel labels on storage images are always ordered as RGB, and
swizzling is not an option due to old hardware limitations.
This means that you must always use an RGB format and manually
swizzle when reading or writing to BGR images, or simply not use
a format in the shader itself.
This adds support for the latter.
&pl_alpha_overlay expects straight alpha, but the alpha output may be
premultiplied as a result of the target alpha mode (or in the absence of an
alpha channel). Fix it by requesting independent alpha explicitly when
blending.
There is no reason to only do this on the first input. It doesn't actually
matter for now given that we don't constrain the color space list, but it
may matter when that changes.
Signed-off-by: Niklas Haas <git@haasn.dev>
Each input is entirely independent and can have varying pixel formats,
color spaces, etc. To accomplish this, we need to make a full copy of the
format list for each subsequent input, rather than sharing the same ref.
Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: nxtedition
Fixes a deprecation warning, and also fixes a bug where the depredated
skip_target_clearing option was not correctly mapped to the new API inside
libplacebo upstream.
Signed-off-by: Niklas Haas <git@haasn.dev>
Sponsored-by: nxtedition
These were introduced in libplacebo API version 220. We actually already
map the field by default, but deinterlacing was never enabled unless the user
explicitly forced it using extra_ops.
This filter already failed to compile on older versions, because of an
unconditional use of an API introduced in API version 220. Nobody noticed
this, so I conclude that it's safe to bump the required version by now.
External libraries are not components and so their CONFIG_ define
is in config.h.
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It currently uses an intermediate int which wouldn't work
if linesize exceeded the range of int and inhibits compiler
optimizations. Also switch to pointer arithmetic and use
smaller scope.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
avctx->bits_per_raw_sample is always 10 or 12 here;
the checks have been added in preparation for making
bits_per_raw_sample user-settable via an AVOption,
but this never happened.
While just at it, also set unpack_alpha earlier
(where bits_per_raw_sample is set).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Using LONG_BITSTREAM_READER means that every get_bits() call
uses an AV_RB64() to ensure that cache always contains 32 valid bits
(as opposed to the ordinary 25 guaranteed by reading 32 bits);
yet this is unnecessary when unpacking alpha. So only use these
64bit reads where necessary.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>