James Almer
6231fa7fb7
avcodec/av1dec: don't emit a warning when parsing isobmff style extradata
...
No OBUs may be present and it's a valid scenario, so only warn when parsing raw
extradata.
Signed-off-by: James Almer <jamrial@gmail.com >
2025-10-05 22:23:51 -03:00
James Almer
78a16e42bd
avcodec/av1dec: don't overwrite container level color information if none is coded in the bitstream
...
Signed-off-by: James Almer <jamrial@gmail.com >
2025-10-05 13:22:23 -03:00
James Almer
009e4a1c20
avcodec/libdav1d: also consider user defined color information when selecting pix_fmt
...
Fixes issue #20624.
Signed-off-by: James Almer <jamrial@gmail.com >
2025-10-05 13:22:23 -03:00
James Almer
9b709532d5
avformat/demux: don't overwrite container level color information if set
...
If the information is coded at the container level, then that's what should be
exported. The user will still have access to values coded at the bitstream
level by firing a decoder.
Fixes issue #20121
Signed-off-by: James Almer <jamrial@gmail.com >
2025-10-05 13:22:17 -03:00
James Almer
95850f339e
tests/checkasm: add a test for dcadsp
...
Signed-off-by: James Almer <jamrial@gmail.com >
2025-10-05 10:09:04 -03:00
James Almer
99034b581f
avcodec/dcadsp: constify lfe_samples parameter
...
Signed-off-by: James Almer <jamrial@gmail.com >
2025-10-04 14:18:30 -03:00
Andreas Rheinhardt
e05f8acabf
avfilter/blend_modes: Don't build duplicate functions
...
Some of the blend mode functions only depend on the underlying type
and therefore need only one version for 9, 10, 12, 14, 16 bits.
This saved 35104B with GCC and 26880B with Clang.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 17:49:08 +02:00
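The deduplication described in e05f8acabf can be pictured with a minimal C sketch. This is not the blend_modes code: the function names, the macro and the choice of "darken" and "addition" as examples are assumptions made for illustration only. The point is that an operation which never looks at the maximum sample value needs a single uint16_t version, while one that clips to the maximum needs one version per depth.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical example: the result depends only on the pixel type, so a
 * single copy serves every bit depth that is stored in uint16_t. */
static void blend_darken_u16(uint16_t *dst, const uint16_t *top,
                             const uint16_t *bottom, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        dst[i] = top[i] < bottom[i] ? top[i] : bottom[i];
}

/* Hypothetical example: the result depends on the depth through the
 * clipping maximum, so one copy per depth is generated. */
#define DEFINE_BLEND_ADDITION(depth)                                     \
static void blend_addition_##depth(uint16_t *dst, const uint16_t *top,   \
                                   const uint16_t *bottom, int n)        \
{                                                                        \
    const unsigned max = (1u << (depth)) - 1;                            \
    for (int i = 0; i < n; i++) {                                        \
        unsigned sum = (unsigned)top[i] + bottom[i];                     \
        dst[i] = (uint16_t)(sum > max ? max : sum);                      \
    }                                                                    \
}

DEFINE_BLEND_ADDITION(10)
DEFINE_BLEND_ADDITION(12)
```

Here blend_darken_u16() would be shared by the 9 to 16 bit cases, whereas blend_addition_10() and blend_addition_12() have to stay separate.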
Andreas Rheinhardt
ea346a23de
avfilter/blend_modes: Use stride in bytes
...
The blend functions currently convert strides from bytes to elements
of the type by using the stride /= sizeof(pixel) idiom. Yet this has
several drawbacks:
1. It invokes undefined behavior that happens to work when stride is
negative: size_t is typically the unsigned type of ptrdiff_t and
therefore the division will be performed as size_t, i.e. use logical
right shifts, making stride very big when sizeof(pixel) is > 1. This
works, because accessing memory through a pointer to pixel entails an implicit
factor of sizeof(pixel), so that everything is correct modulo SIZE_MAX + 1.
Yet this is UB and UBSan complains about it.
2. It makes the compiler emit actual shifts/ands to discard the low bits
shifted away.
3. There may be systems where alignof(uint16_t) or alignof(float) is
strictly smaller than their sizeof, so that the stride (in bytes) is
not guaranteed to be multiple of these sizeofs. In this case, dividing
by sizeof(pixel) is simply wrong.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 17:49:08 +02:00
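The byte-stride approach from ea346a23de can be sketched as follows. This is a minimal illustration, not the filter's code: copy_block16() and its parameters are hypothetical. The strides stay signed byte counts for the whole function and are applied to byte pointers; only the resulting row address is reinterpreted as a pixel pointer, so nothing is ever divided by sizeof(pixel).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy a width x height block of 16-bit pixels.
 * Both strides are given in bytes and may be negative. */
static void copy_block16(uint8_t *dst, ptrdiff_t dst_stride,
                         const uint8_t *src, ptrdiff_t src_stride,
                         int width, int height)
{
    assert(width > 0 && height > 0);
    for (int y = 0; y < height; y++) {
        /* Signed byte arithmetic on byte pointers; the cast to the pixel
         * type happens only after the row offset has been applied. */
        uint16_t *d = (uint16_t *)(dst + y * dst_stride);
        const uint16_t *s = (const uint16_t *)(src + y * src_stride);
        for (int x = 0; x < width; x++)
            d[x] = s[x];
    }
}
```

Passing a pointer to the last source row together with a negative source stride flips the block vertically without any unsigned wraparound.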
Andreas Rheinhardt
8fad52bd57
avcodec/x86/h264_qpel: Use ptrdiff_t for strides
...
Avoids having to sign-extend the strides in the assembly
(it also is more correct given that the qpel_mc_func
already uses ptrdiff_t).
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 07:06:33 +02:00
Andreas Rheinhardt
495c3d03ae
avcodec/x86/h264_qpel_10bit: Remove SSE2 "cache64" duplicates
...
The horizontal 10bit MC SSE2 functions are currently duplicated:
They exist both in ordinary form as well as with a "sse2_cache64"
suffix. A comment in ff_h264qpel_init_x86() indicates that this
is due to older processors not liking accesses that cross cache
lines, yet these functions are identical to the non-cache64
functions (apart from the unavoidable changes in the rip-offset).
The only difference between these functions and the ordinary ones
is that the cache64 ones are created via a special form of the
INIT_XMM macro: "INIT_XMM sse2, cache64". This affects the name
and apparently defines cpuflags_cache64, yet nothing checks for
this, so both versions are identical. So remove the cache64 ones
and treat the remaining ones like ordinary SSE2 functions.
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 07:06:33 +02:00
Andreas Rheinhardt
697da64c8e
avcodec/x86/h264_qpel: Port pixel8_l2_shift5 from MMXEXT to SSE2
...
This abides by the ABI (no missing emms) and yields a tiny
performance improvement here.
Old benchmarks:
avg_h264_qpel_8_mc12_8_c: 419.9 ( 1.00x)
avg_h264_qpel_8_mc12_8_sse2: 78.9 ( 5.32x)
avg_h264_qpel_8_mc12_8_ssse3: 71.7 ( 5.86x)
avg_h264_qpel_8_mc32_8_c: 429.1 ( 1.00x)
avg_h264_qpel_8_mc32_8_sse2: 76.9 ( 5.58x)
avg_h264_qpel_8_mc32_8_ssse3: 73.4 ( 5.84x)
put_h264_qpel_8_mc12_8_c: 424.0 ( 1.00x)
put_h264_qpel_8_mc12_8_sse2: 78.6 ( 5.40x)
put_h264_qpel_8_mc12_8_ssse3: 70.6 ( 6.00x)
put_h264_qpel_8_mc32_8_c: 425.7 ( 1.00x)
put_h264_qpel_8_mc32_8_sse2: 75.2 ( 5.66x)
put_h264_qpel_8_mc32_8_ssse3: 70.4 ( 6.05x)
New benchmarks:
avg_h264_qpel_8_mc12_8_c: 425.7 ( 1.00x)
avg_h264_qpel_8_mc12_8_sse2: 77.5 ( 5.49x)
avg_h264_qpel_8_mc12_8_ssse3: 69.8 ( 6.10x)
avg_h264_qpel_8_mc32_8_c: 423.7 ( 1.00x)
avg_h264_qpel_8_mc32_8_sse2: 74.6 ( 5.68x)
avg_h264_qpel_8_mc32_8_ssse3: 71.9 ( 5.89x)
put_h264_qpel_8_mc12_8_c: 422.2 ( 1.00x)
put_h264_qpel_8_mc12_8_sse2: 75.8 ( 5.57x)
put_h264_qpel_8_mc12_8_ssse3: 67.9 ( 6.22x)
put_h264_qpel_8_mc32_8_c: 421.8 ( 1.00x)
put_h264_qpel_8_mc32_8_sse2: 72.6 ( 5.81x)
put_h264_qpel_8_mc32_8_ssse3: 67.7 ( 6.23x)
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 07:06:33 +02:00
Andreas Rheinhardt
4ac9162beb
avcodec/x86/h264_qpel: Don't use ff_ prefix for static functions
...
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 07:06:33 +02:00
Andreas Rheinhardt
cd077e88d1
avcodec/x86/h264_qpel: Add ff_{avg,put}_h264_qpel16_h_lowpass_l2_sse2()
...
These functions are currently emulated via four calls to the versions
for 8x8 blocks. In fact, the size savings from the simplified calls
in h264_qpel.c (GCC 1344B, Clang 1280B) more than outweigh the size
of the added functions (512B) here.
It is also beneficial performance-wise. Old benchmarks:
avg_h264_qpel_16_mc11_8_c: 1414.1 ( 1.00x)
avg_h264_qpel_16_mc11_8_sse2: 206.2 ( 6.86x)
avg_h264_qpel_16_mc11_8_ssse3: 177.7 ( 7.96x)
avg_h264_qpel_16_mc13_8_c: 1417.0 ( 1.00x)
avg_h264_qpel_16_mc13_8_sse2: 207.4 ( 6.83x)
avg_h264_qpel_16_mc13_8_ssse3: 178.2 ( 7.95x)
avg_h264_qpel_16_mc21_8_c: 1632.8 ( 1.00x)
avg_h264_qpel_16_mc21_8_sse2: 349.3 ( 4.67x)
avg_h264_qpel_16_mc21_8_ssse3: 291.3 ( 5.60x)
avg_h264_qpel_16_mc23_8_c: 1640.2 ( 1.00x)
avg_h264_qpel_16_mc23_8_sse2: 351.3 ( 4.67x)
avg_h264_qpel_16_mc23_8_ssse3: 290.8 ( 5.64x)
avg_h264_qpel_16_mc31_8_c: 1411.7 ( 1.00x)
avg_h264_qpel_16_mc31_8_sse2: 203.4 ( 6.94x)
avg_h264_qpel_16_mc31_8_ssse3: 178.9 ( 7.89x)
avg_h264_qpel_16_mc33_8_c: 1409.7 ( 1.00x)
avg_h264_qpel_16_mc33_8_sse2: 204.6 ( 6.89x)
avg_h264_qpel_16_mc33_8_ssse3: 178.1 ( 7.92x)
put_h264_qpel_16_mc11_8_c: 1391.0 ( 1.00x)
put_h264_qpel_16_mc11_8_sse2: 197.4 ( 7.05x)
put_h264_qpel_16_mc11_8_ssse3: 176.1 ( 7.90x)
put_h264_qpel_16_mc13_8_c: 1395.9 ( 1.00x)
put_h264_qpel_16_mc13_8_sse2: 196.7 ( 7.10x)
put_h264_qpel_16_mc13_8_ssse3: 177.7 ( 7.85x)
put_h264_qpel_16_mc21_8_c: 1609.5 ( 1.00x)
put_h264_qpel_16_mc21_8_sse2: 341.1 ( 4.72x)
put_h264_qpel_16_mc21_8_ssse3: 289.2 ( 5.57x)
put_h264_qpel_16_mc23_8_c: 1604.0 ( 1.00x)
put_h264_qpel_16_mc23_8_sse2: 340.9 ( 4.71x)
put_h264_qpel_16_mc23_8_ssse3: 289.6 ( 5.54x)
put_h264_qpel_16_mc31_8_c: 1390.2 ( 1.00x)
put_h264_qpel_16_mc31_8_sse2: 194.6 ( 7.14x)
put_h264_qpel_16_mc31_8_ssse3: 176.4 ( 7.88x)
put_h264_qpel_16_mc33_8_c: 1400.4 ( 1.00x)
put_h264_qpel_16_mc33_8_sse2: 198.5 ( 7.06x)
put_h264_qpel_16_mc33_8_ssse3: 176.2 ( 7.95x)
New benchmarks:
avg_h264_qpel_16_mc11_8_c: 1413.3 ( 1.00x)
avg_h264_qpel_16_mc11_8_sse2: 171.8 ( 8.23x)
avg_h264_qpel_16_mc11_8_ssse3: 173.0 ( 8.17x)
avg_h264_qpel_16_mc13_8_c: 1423.2 ( 1.00x)
avg_h264_qpel_16_mc13_8_sse2: 172.0 ( 8.27x)
avg_h264_qpel_16_mc13_8_ssse3: 173.4 ( 8.21x)
avg_h264_qpel_16_mc21_8_c: 1641.3 ( 1.00x)
avg_h264_qpel_16_mc21_8_sse2: 322.1 ( 5.10x)
avg_h264_qpel_16_mc21_8_ssse3: 291.3 ( 5.63x)
avg_h264_qpel_16_mc23_8_c: 1629.1 ( 1.00x)
avg_h264_qpel_16_mc23_8_sse2: 323.0 ( 5.04x)
avg_h264_qpel_16_mc23_8_ssse3: 293.3 ( 5.55x)
avg_h264_qpel_16_mc31_8_c: 1409.2 ( 1.00x)
avg_h264_qpel_16_mc31_8_sse2: 172.0 ( 8.19x)
avg_h264_qpel_16_mc31_8_ssse3: 173.7 ( 8.11x)
avg_h264_qpel_16_mc33_8_c: 1402.5 ( 1.00x)
avg_h264_qpel_16_mc33_8_sse2: 172.5 ( 8.13x)
avg_h264_qpel_16_mc33_8_ssse3: 173.6 ( 8.08x)
put_h264_qpel_16_mc11_8_c: 1393.7 ( 1.00x)
put_h264_qpel_16_mc11_8_sse2: 170.4 ( 8.18x)
put_h264_qpel_16_mc11_8_ssse3: 178.2 ( 7.82x)
put_h264_qpel_16_mc13_8_c: 1398.0 ( 1.00x)
put_h264_qpel_16_mc13_8_sse2: 170.2 ( 8.21x)
put_h264_qpel_16_mc13_8_ssse3: 178.6 ( 7.83x)
put_h264_qpel_16_mc21_8_c: 1619.6 ( 1.00x)
put_h264_qpel_16_mc21_8_sse2: 320.6 ( 5.05x)
put_h264_qpel_16_mc21_8_ssse3: 297.2 ( 5.45x)
put_h264_qpel_16_mc23_8_c: 1617.4 ( 1.00x)
put_h264_qpel_16_mc23_8_sse2: 320.0 ( 5.05x)
put_h264_qpel_16_mc23_8_ssse3: 297.4 ( 5.44x)
put_h264_qpel_16_mc31_8_c: 1389.7 ( 1.00x)
put_h264_qpel_16_mc31_8_sse2: 169.9 ( 8.18x)
put_h264_qpel_16_mc31_8_ssse3: 178.1 ( 7.80x)
put_h264_qpel_16_mc33_8_c: 1394.0 ( 1.00x)
put_h264_qpel_16_mc33_8_sse2: 170.9 ( 8.16x)
put_h264_qpel_16_mc33_8_ssse3: 176.9 ( 7.88x)
Notice that the SSSE3 versions of mc21 and mc23 benefit from
an optimized version of hv2_lowpass.
Also notice that there is no SSE2 version of the purely horizontal
motion compensation. This means that src2 is currently always aligned
when calling the SSE2 functions (and that srcStride is always equal
to the block width). Yet this has not been exploited (yet).
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 07:06:33 +02:00
Andreas Rheinhardt
4880fa4dca
avcodec/x86/h264_qpel_8bit: Remove dead macro
...
Forgotten in 4011a76494.
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 07:06:33 +02:00
Andreas Rheinhardt
35aaf697e9
avcodec/x86/h264_qpel_8bit: Replace qpel8_h_lowpass_l2 MMXEXT by SSE2
...
Using xmm registers here is very natural, as it allows one to
operate on eight words at a time. It also saves 48B here
and does not clobber the MMX state.
Old benchmarks (only tests affected by the modified function are shown):
avg_h264_qpel_8_mc11_8_c: 352.2 ( 1.00x)
avg_h264_qpel_8_mc11_8_sse2: 70.4 ( 5.00x)
avg_h264_qpel_8_mc11_8_ssse3: 53.9 ( 6.53x)
avg_h264_qpel_8_mc13_8_c: 353.3 ( 1.00x)
avg_h264_qpel_8_mc13_8_sse2: 72.8 ( 4.86x)
avg_h264_qpel_8_mc13_8_ssse3: 53.8 ( 6.57x)
avg_h264_qpel_8_mc21_8_c: 404.0 ( 1.00x)
avg_h264_qpel_8_mc21_8_sse2: 116.1 ( 3.48x)
avg_h264_qpel_8_mc21_8_ssse3: 94.3 ( 4.28x)
avg_h264_qpel_8_mc23_8_c: 398.9 ( 1.00x)
avg_h264_qpel_8_mc23_8_sse2: 118.6 ( 3.36x)
avg_h264_qpel_8_mc23_8_ssse3: 94.8 ( 4.21x)
avg_h264_qpel_8_mc31_8_c: 352.7 ( 1.00x)
avg_h264_qpel_8_mc31_8_sse2: 71.4 ( 4.94x)
avg_h264_qpel_8_mc31_8_ssse3: 53.8 ( 6.56x)
avg_h264_qpel_8_mc33_8_c: 354.0 ( 1.00x)
avg_h264_qpel_8_mc33_8_sse2: 70.6 ( 5.01x)
avg_h264_qpel_8_mc33_8_ssse3: 53.7 ( 6.59x)
avg_h264_qpel_16_mc11_8_c: 1417.0 ( 1.00x)
avg_h264_qpel_16_mc11_8_sse2: 276.9 ( 5.12x)
avg_h264_qpel_16_mc11_8_ssse3: 178.8 ( 7.92x)
avg_h264_qpel_16_mc13_8_c: 1427.3 ( 1.00x)
avg_h264_qpel_16_mc13_8_sse2: 277.4 ( 5.14x)
avg_h264_qpel_16_mc13_8_ssse3: 179.7 ( 7.94x)
avg_h264_qpel_16_mc21_8_c: 1634.1 ( 1.00x)
avg_h264_qpel_16_mc21_8_sse2: 421.3 ( 3.88x)
avg_h264_qpel_16_mc21_8_ssse3: 291.2 ( 5.61x)
avg_h264_qpel_16_mc23_8_c: 1627.0 ( 1.00x)
avg_h264_qpel_16_mc23_8_sse2: 420.8 ( 3.87x)
avg_h264_qpel_16_mc23_8_ssse3: 291.0 ( 5.59x)
avg_h264_qpel_16_mc31_8_c: 1418.4 ( 1.00x)
avg_h264_qpel_16_mc31_8_sse2: 278.5 ( 5.09x)
avg_h264_qpel_16_mc31_8_ssse3: 178.6 ( 7.94x)
avg_h264_qpel_16_mc33_8_c: 1407.3 ( 1.00x)
avg_h264_qpel_16_mc33_8_sse2: 277.6 ( 5.07x)
avg_h264_qpel_16_mc33_8_ssse3: 179.9 ( 7.82x)
put_h264_qpel_8_mc11_8_c: 348.1 ( 1.00x)
put_h264_qpel_8_mc11_8_sse2: 69.1 ( 5.04x)
put_h264_qpel_8_mc11_8_ssse3: 53.8 ( 6.47x)
put_h264_qpel_8_mc13_8_c: 349.3 ( 1.00x)
put_h264_qpel_8_mc13_8_sse2: 69.7 ( 5.01x)
put_h264_qpel_8_mc13_8_ssse3: 53.7 ( 6.51x)
put_h264_qpel_8_mc21_8_c: 398.5 ( 1.00x)
put_h264_qpel_8_mc21_8_sse2: 115.0 ( 3.46x)
put_h264_qpel_8_mc21_8_ssse3: 95.3 ( 4.18x)
put_h264_qpel_8_mc23_8_c: 399.9 ( 1.00x)
put_h264_qpel_8_mc23_8_sse2: 120.8 ( 3.31x)
put_h264_qpel_8_mc23_8_ssse3: 95.4 ( 4.19x)
put_h264_qpel_8_mc31_8_c: 350.4 ( 1.00x)
put_h264_qpel_8_mc31_8_sse2: 69.6 ( 5.03x)
put_h264_qpel_8_mc31_8_ssse3: 54.2 ( 6.47x)
put_h264_qpel_8_mc33_8_c: 353.1 ( 1.00x)
put_h264_qpel_8_mc33_8_sse2: 71.0 ( 4.97x)
put_h264_qpel_8_mc33_8_ssse3: 54.2 ( 6.51x)
put_h264_qpel_16_mc11_8_c: 1384.2 ( 1.00x)
put_h264_qpel_16_mc11_8_sse2: 272.9 ( 5.07x)
put_h264_qpel_16_mc11_8_ssse3: 178.3 ( 7.76x)
put_h264_qpel_16_mc13_8_c: 1393.6 ( 1.00x)
put_h264_qpel_16_mc13_8_sse2: 271.1 ( 5.14x)
put_h264_qpel_16_mc13_8_ssse3: 178.3 ( 7.82x)
put_h264_qpel_16_mc21_8_c: 1612.6 ( 1.00x)
put_h264_qpel_16_mc21_8_sse2: 416.5 ( 3.87x)
put_h264_qpel_16_mc21_8_ssse3: 289.1 ( 5.58x)
put_h264_qpel_16_mc23_8_c: 1621.3 ( 1.00x)
put_h264_qpel_16_mc23_8_sse2: 416.9 ( 3.89x)
put_h264_qpel_16_mc23_8_ssse3: 289.4 ( 5.60x)
put_h264_qpel_16_mc31_8_c: 1408.4 ( 1.00x)
put_h264_qpel_16_mc31_8_sse2: 273.5 ( 5.15x)
put_h264_qpel_16_mc31_8_ssse3: 176.9 ( 7.96x)
put_h264_qpel_16_mc33_8_c: 1396.4 ( 1.00x)
put_h264_qpel_16_mc33_8_sse2: 276.3 ( 5.05x)
put_h264_qpel_16_mc33_8_ssse3: 176.4 ( 7.92x)
New benchmarks:
avg_h264_qpel_8_mc11_8_c: 352.1 ( 1.00x)
avg_h264_qpel_8_mc11_8_sse2: 52.5 ( 6.71x)
avg_h264_qpel_8_mc11_8_ssse3: 53.9 ( 6.54x)
avg_h264_qpel_8_mc13_8_c: 350.8 ( 1.00x)
avg_h264_qpel_8_mc13_8_sse2: 54.7 ( 6.42x)
avg_h264_qpel_8_mc13_8_ssse3: 54.3 ( 6.46x)
avg_h264_qpel_8_mc21_8_c: 400.1 ( 1.00x)
avg_h264_qpel_8_mc21_8_sse2: 98.6 ( 4.06x)
avg_h264_qpel_8_mc21_8_ssse3: 95.5 ( 4.19x)
avg_h264_qpel_8_mc23_8_c: 400.4 ( 1.00x)
avg_h264_qpel_8_mc23_8_sse2: 101.4 ( 3.95x)
avg_h264_qpel_8_mc23_8_ssse3: 95.9 ( 4.18x)
avg_h264_qpel_8_mc31_8_c: 352.4 ( 1.00x)
avg_h264_qpel_8_mc31_8_sse2: 52.9 ( 6.67x)
avg_h264_qpel_8_mc31_8_ssse3: 54.4 ( 6.48x)
avg_h264_qpel_8_mc33_8_c: 354.5 ( 1.00x)
avg_h264_qpel_8_mc33_8_sse2: 52.9 ( 6.70x)
avg_h264_qpel_8_mc33_8_ssse3: 54.4 ( 6.52x)
avg_h264_qpel_16_mc11_8_c: 1420.4 ( 1.00x)
avg_h264_qpel_16_mc11_8_sse2: 204.8 ( 6.93x)
avg_h264_qpel_16_mc11_8_ssse3: 177.9 ( 7.98x)
avg_h264_qpel_16_mc13_8_c: 1409.8 ( 1.00x)
avg_h264_qpel_16_mc13_8_sse2: 206.4 ( 6.83x)
avg_h264_qpel_16_mc13_8_ssse3: 178.0 ( 7.92x)
avg_h264_qpel_16_mc21_8_c: 1634.1 ( 1.00x)
avg_h264_qpel_16_mc21_8_sse2: 349.6 ( 4.67x)
avg_h264_qpel_16_mc21_8_ssse3: 290.0 ( 5.63x)
avg_h264_qpel_16_mc23_8_c: 1624.1 ( 1.00x)
avg_h264_qpel_16_mc23_8_sse2: 350.0 ( 4.64x)
avg_h264_qpel_16_mc23_8_ssse3: 291.9 ( 5.56x)
avg_h264_qpel_16_mc31_8_c: 1407.2 ( 1.00x)
avg_h264_qpel_16_mc31_8_sse2: 205.8 ( 6.84x)
avg_h264_qpel_16_mc31_8_ssse3: 178.2 ( 7.90x)
avg_h264_qpel_16_mc33_8_c: 1400.5 ( 1.00x)
avg_h264_qpel_16_mc33_8_sse2: 206.3 ( 6.79x)
avg_h264_qpel_16_mc33_8_ssse3: 179.4 ( 7.81x)
put_h264_qpel_8_mc11_8_c: 349.7 ( 1.00x)
put_h264_qpel_8_mc11_8_sse2: 50.2 ( 6.96x)
put_h264_qpel_8_mc11_8_ssse3: 51.3 ( 6.82x)
put_h264_qpel_8_mc13_8_c: 349.8 ( 1.00x)
put_h264_qpel_8_mc13_8_sse2: 50.7 ( 6.90x)
put_h264_qpel_8_mc13_8_ssse3: 51.7 ( 6.76x)
put_h264_qpel_8_mc21_8_c: 398.0 ( 1.00x)
put_h264_qpel_8_mc21_8_sse2: 96.5 ( 4.13x)
put_h264_qpel_8_mc21_8_ssse3: 92.3 ( 4.31x)
put_h264_qpel_8_mc23_8_c: 401.4 ( 1.00x)
put_h264_qpel_8_mc23_8_sse2: 102.3 ( 3.92x)
put_h264_qpel_8_mc23_8_ssse3: 92.8 ( 4.32x)
put_h264_qpel_8_mc31_8_c: 349.4 ( 1.00x)
put_h264_qpel_8_mc31_8_sse2: 50.8 ( 6.88x)
put_h264_qpel_8_mc31_8_ssse3: 51.8 ( 6.75x)
put_h264_qpel_8_mc33_8_c: 351.1 ( 1.00x)
put_h264_qpel_8_mc33_8_sse2: 52.2 ( 6.73x)
put_h264_qpel_8_mc33_8_ssse3: 51.7 ( 6.79x)
put_h264_qpel_16_mc11_8_c: 1391.1 ( 1.00x)
put_h264_qpel_16_mc11_8_sse2: 196.6 ( 7.07x)
put_h264_qpel_16_mc11_8_ssse3: 178.2 ( 7.81x)
put_h264_qpel_16_mc13_8_c: 1385.2 ( 1.00x)
put_h264_qpel_16_mc13_8_sse2: 195.6 ( 7.08x)
put_h264_qpel_16_mc13_8_ssse3: 176.6 ( 7.84x)
put_h264_qpel_16_mc21_8_c: 1607.5 ( 1.00x)
put_h264_qpel_16_mc21_8_sse2: 341.0 ( 4.71x)
put_h264_qpel_16_mc21_8_ssse3: 289.1 ( 5.56x)
put_h264_qpel_16_mc23_8_c: 1616.7 ( 1.00x)
put_h264_qpel_16_mc23_8_sse2: 340.8 ( 4.74x)
put_h264_qpel_16_mc23_8_ssse3: 288.6 ( 5.60x)
put_h264_qpel_16_mc31_8_c: 1397.6 ( 1.00x)
put_h264_qpel_16_mc31_8_sse2: 197.3 ( 7.08x)
put_h264_qpel_16_mc31_8_ssse3: 175.4 ( 7.97x)
put_h264_qpel_16_mc33_8_c: 1394.3 ( 1.00x)
put_h264_qpel_16_mc33_8_sse2: 197.7 ( 7.05x)
put_h264_qpel_16_mc33_8_ssse3: 175.2 ( 7.96x)
As can be seen, the SSE2 version is often neck and neck with the SSSE3
version (which also benefits from a better hv2_lowpass SSSE3
implementation for mc21 and mc23) for eight byte block sizes.
Unsurprisingly, SSSE3 beats SSE2 for 16x16 blocks: For SSE2,
these blocks are processed by calling the 8x8 function four times
whereas SSSE3 has a dedicated function (on x64).
This implementation should also be extendable to an AVX version
for 16x16 blocks.
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 07:06:33 +02:00
Andreas Rheinhardt
fa9ea5113b
avcodec/x86/h264_qpel_8bit: Optimize branch away
...
ff_{avg,put}_h264_qpel8or16_hv2_lowpass_ssse3()
currently is almost the disjoint union of the codepaths
for sizes 8 and 16. This size is a compile-time constant
at every callsite. So split the function and avoid
the runtime branch.
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 07:06:33 +02:00
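The idea behind fa9ea5113b can be shown with a C analogue. The actual change is in x86 assembly; the sketch below is not that code, and avg_rows(), avg_rows8() and avg_rows16() are hypothetical names. When every caller knows the size at compile time, exposing one entry point per size lets the size be folded in as a constant, so no size check remains at run time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical size-generic kernel: rounds the average of dst and src
 * into dst for a size x size block. */
static inline void avg_rows(uint8_t *dst, const uint8_t *src,
                            ptrdiff_t stride, int size)
{
    assert(size == 8 || size == 16);
    for (int y = 0; y < size; y++) {
        uint8_t *d = dst + y * stride;
        const uint8_t *s = src + y * stride;
        for (int x = 0; x < size; x++)
            d[x] = (uint8_t)((d[x] + s[x] + 1) >> 1);
    }
}

/* One entry point per size: inside each wrapper the size is a constant,
 * so the compiler can specialize the loops instead of testing it. */
static void avg_rows8(uint8_t *dst, const uint8_t *src, ptrdiff_t stride)
{
    avg_rows(dst, src, stride, 8);
}

static void avg_rows16(uint8_t *dst, const uint8_t *src, ptrdiff_t stride)
{
    avg_rows(dst, src, stride, 16);
}
```

Callers then pick avg_rows8() or avg_rows16() directly instead of passing the size as an argument.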
Andreas Rheinhardt
400203c00c
avcodec/x86/h264_qpel: Remove unused parameter from hv2_lowpass funcs
...
tmpstride is unused. This also allows removing said parameter
from lots of functions in h264_qpel.c.
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 07:06:33 +02:00
Andreas Rheinhardt
b84c818c83
avcodec/x86/h264_qpel: Remove constant parameters from shift5 funcs
...
They are constant since the size 16 version is no longer emulated
via the size 8 version.
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 07:06:33 +02:00
Andreas Rheinhardt
810bd3e62a
avcodec/x86/h264_qpel: Add ff_{avg,put}_pixels16_l2_shift5_sse2
...
Up until now this function was emulated via two calls
to ff_{avg,put}_pixels8_l2_shift5_mmxext(). Adding a dedicated
function proved beneficial both size wise and performance wise:
The new functions take 192B, yet the simplified calls save
256B with GCC and 320B with Clang here.
This change will also allow further optimizations.
Old benchmarks:
avg_h264_qpel_16_mc12_8_c: 1735.8 ( 1.00x)
avg_h264_qpel_16_mc12_8_sse2: 300.8 ( 5.77x)
avg_h264_qpel_16_mc12_8_ssse3: 233.3 ( 7.44x)
avg_h264_qpel_16_mc32_8_c: 1777.9 ( 1.00x)
avg_h264_qpel_16_mc32_8_sse2: 275.6 ( 6.45x)
avg_h264_qpel_16_mc32_8_ssse3: 235.7 ( 7.54x)
put_h264_qpel_16_mc12_8_c: 1808.2 ( 1.00x)
put_h264_qpel_16_mc12_8_sse2: 267.2 ( 6.77x)
put_h264_qpel_16_mc12_8_ssse3: 231.9 ( 7.80x)
put_h264_qpel_16_mc32_8_c: 1766.9 ( 1.00x)
put_h264_qpel_16_mc32_8_sse2: 272.9 ( 6.47x)
put_h264_qpel_16_mc32_8_ssse3: 229.5 ( 7.70x)
New benchmarks:
avg_h264_qpel_16_mc12_8_c: 1742.3 ( 1.00x)
avg_h264_qpel_16_mc12_8_sse2: 240.3 ( 7.25x)
avg_h264_qpel_16_mc12_8_ssse3: 214.8 ( 8.11x)
avg_h264_qpel_16_mc32_8_c: 1748.0 ( 1.00x)
avg_h264_qpel_16_mc32_8_sse2: 238.0 ( 7.35x)
avg_h264_qpel_16_mc32_8_ssse3: 209.2 ( 8.35x)
put_h264_qpel_16_mc12_8_c: 2014.4 ( 1.00x)
put_h264_qpel_16_mc12_8_sse2: 243.7 ( 8.27x)
put_h264_qpel_16_mc12_8_ssse3: 211.5 ( 9.52x)
put_h264_qpel_16_mc32_8_c: 1800.0 ( 1.00x)
put_h264_qpel_16_mc32_8_sse2: 238.8 ( 7.54x)
put_h264_qpel_16_mc32_8_ssse3: 206.7 ( 8.71x)
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 07:06:33 +02:00
Andreas Rheinhardt
279b6f3cf5
avcodec/fpel: Avoid loop in ff_avg_pixels4_mmxext()
...
It is only used by h264_qpel.c and only with height four
(which is unrolled) and uses a loop in order to handle
multiples of four as height. Remove the loop and the height
parameter and move the function to h264_qpel_8bit.asm.
This leads to a bit of code duplication, but this is simpler
than all the %if checks necessary to achieve the same outcome
in fpel.asm.
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 07:06:33 +02:00
Andreas Rheinhardt
e340f31b89
avcodec/x86/fpel: Remove redundant repetition
...
The repetition count is always one since 2cf9e733c6.
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 07:06:33 +02:00
Andreas Rheinhardt
b0c91c2fba
avcodec/h264qpel: Make avg_h264_qpel_pixels_tab smaller
...
avg_h264_qpel only supports 16x16, 8x8 and 4x4 block sizes,
so it is currently unnecessarily large.
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 07:06:33 +02:00
Andreas Rheinhardt
6eb8bc4217
avcodec/h264qpel: Don't build unused 2x2 size funcs for bitdepths > 8
...
The 2x2 put functions are only used by Snow and Snow uses
only the eight bit versions. The rest is dead code. Disabling
it saved 41277B here.
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 07:06:33 +02:00
Andreas Rheinhardt
92ae9d1ffc
configure: Remove vc1dsp->qpeldsp dependency
...
It only needs it for some x86 fpel functions; instead
add a direct dependency for that.
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 07:06:32 +02:00
Andreas Rheinhardt
16d5e074dc
avcodec/mips/Makefile: Fix VC1DSP build rules
...
Affected standalone builds of the VC-1 parser.
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 07:06:32 +02:00
Andreas Rheinhardt
d09f4f3c78
configure: Remove h263_decoder->h263_parser,qpeldsp dependency
...
The former is unnecessary since 3ceffe7839. The latter is since
ff_mpeg4_workaround_bugs() (and thereby setting the "old" qpeldsp
functions) has been moved inside #if CONFIG_MPEG4_DECODER.
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 07:06:32 +02:00
Andreas Rheinhardt
0035d99c61
configure: Avoid mpeg4video_parser->{h263,qpel}dsp dependency
...
This can be easily achieved by moving code only used by the MPEG-4
decoder behind #if CONFIG_MPEG4_DECODER.
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 07:06:32 +02:00
Andreas Rheinhardt
770f78b24a
configure: Remove mss2->qpeldsp dependency
...
Forgotten in 9cc38cc636.
(mss2 still has an implicit dependency on qpeldsp
via the VC-1 decoder.)
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 07:06:32 +02:00
Andreas Rheinhardt
c4c616db53
avcodec/x86/qpel: Move ff_{put,avg}_pixels4_l2_mmxext to h264_qpel
...
Only used there.
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 07:06:32 +02:00
Andreas Rheinhardt
1e11fdff52
avcodec/x86/qpel{,dsp_init}: Remove constant function parameters
...
ff_avg_pixels{4,8,16}_l2_mmxext() are always called with height
equal to their blocksize. And ff_{put,avg}_pixels4_l2_mmxext()
are furthermore always called with both strides being equal.
So remove these redundant function parameters.
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 07:06:32 +02:00
Andreas Rheinhardt
52a77128fd
avcodec/x86/qpel{dsp,dsp_init}: Use ptrdiff_t for stride
...
This is more correct given that qpel_mc_func already uses ptrdiff_t;
it also allows avoiding movsxdifnidn.
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 07:06:32 +02:00
Andreas Rheinhardt
cacf854fe7
avcodec/x86/qpel: Remove always-false branches
...
The ff_avg_pixels{4,8,16}_l2_mmxext() functions are only ever
used in the last step (the one that actually writes to the dst buffer)
where the number of lines to process is always equal to the
dimensions of the block, whereas ff_put_pixels{8,16}_mmxext()
are also used in intermediate calculations where the number of
lines can be 9 or 17.
The code in qpel.asm uses common macros for both and processes
more than one line per loop iteration; it therefore checks
for whether the number of lines is odd and treats this line separately;
yet this special handling is only needed for the put functions,
not the avg functions. It has therefore been %if'ed away for these.
The check is also not needed for ff_put_pixels4_l2_mmxext() which
is only used by H.264 which always processes four lines. Because
ff_{avg,put}_pixels4_l2_mmxext() processes four lines in a single loop
iteration, not only the odd-height handling, but the whole loop
could be removed.
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 07:06:32 +02:00
Andreas Rheinhardt
8820e2205c
tests/checkasm/hpeldsp: Use instruction-set independent height
...
Otherwise the benchmark numbers are incomparable.
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 07:06:32 +02:00
Andreas Rheinhardt
9a0581fca0
tests/checkasm: Add qpeldsp checkasm
...
Reviewed-by: James Almer <jamrial@gmail.com >
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 07:06:32 +02:00
Andreas Rheinhardt
15a9c8dea3
avcodec/liblc3enc: Avoid allocating buffer to send a zero frame
...
liblc3 supports arbitrary strides, so one can simply use a stride
of zero to make it read the same zero value again and again.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-04 06:07:37 +02:00
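The zero-stride trick from 15a9c8dea3 can be illustrated with a small sketch. It deliberately does not call liblc3; gather_strided() is a hypothetical stand-in for any consumer that reads samples spaced a given number of elements apart. With a stride of zero the same element is read for every sample, so a single zero value behaves like an arbitrarily long silent buffer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical strided reader, not the liblc3 API: gathers n samples that
 * lie `stride` elements apart, the way an encoder would read one channel
 * of interleaved PCM. */
static void gather_strided(int16_t *out, const int16_t *pcm, int stride, int n)
{
    assert(stride >= 0 && n >= 0);
    for (int i = 0; i < n; i++)
        out[i] = pcm[i * stride];
}
```

Calling gather_strided(frame, &zero, 0, n) with a single int16_t zero fills the frame with silence without allocating an n-sample zero buffer first.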
Andreas Rheinhardt
ab7d1c64c9
avcodec/x86/h263_loopfilter: Port loop filter to SSE2
...
Old benchmarks:
h263dsp.h_loop_filter_c: 41.2 ( 1.00x)
h263dsp.h_loop_filter_mmx: 39.5 ( 1.04x)
h263dsp.v_loop_filter_c: 43.5 ( 1.00x)
h263dsp.v_loop_filter_mmx: 16.9 ( 2.57x)
New benchmarks:
h263dsp.h_loop_filter_c: 41.6 ( 1.00x)
h263dsp.h_loop_filter_sse2: 28.2 ( 1.48x)
h263dsp.v_loop_filter_c: 42.4 ( 1.00x)
h263dsp.v_loop_filter_sse2: 15.1 ( 2.81x)
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-03 17:05:46 +00:00
Andreas Rheinhardt
a8a16c15c8
tests/checkasm/llviddsp: Use the same width for each cpuflag
...
Otherwise the benchmark numbers would be incomparable nonsense.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com >
2025-10-03 17:05:46 +00:00
Cameron Gutman
df4587789f
avcodec/amfenc: avoid unnecessary output delay in low delay mode
...
The code optimizes throughput by letting the encoder work on frame N
until frame N+1 is ready for submission, but this hurts low-delay uses
by delaying output by one frame. Don't delay output beyond what is
necessary when AV_CODEC_FLAG_LOW_DELAY is used.
Signed-off-by: Cameron Gutman <aicommander@gmail.com >
2025-10-03 11:05:03 +00:00
Marton Balint
f1d5114103
avformat/tls_openssl: do not cleanup tls after a successful dtls_start()
...
Regression since 8e11e2cdb8.
Signed-off-by: Marton Balint <cus@passwd.hu >
2025-10-02 18:41:47 +02:00
Michael Niedermayer
61b6877637
avcodec/mjpegdec: Explain buf_size/width/height check
...
Suggested-by: Ramiro
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc >
2025-10-02 12:52:43 +00:00
Zhao Zhili
1a02412170
avformat/movenc_ttml: fix memleaks
...
Memory leaks can happen in the normal case when breaking out of the while
loop early, and on the error path with goto cleanup.
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com >
2025-10-01 22:31:03 +08:00
Romain Beauxis
cb4052beae
libavformat/oggparseopus.c: Parse comments from secondary chained streams header packet.
2025-10-01 14:20:55 +00:00
Romain Beauxis
45d7d5d3e2
libavformat/oggparseflac.c: Parse ogg/flac comments in new ogg packets, add them to ogg stream
...
new_metadata.
2025-10-01 14:20:55 +00:00
Romain Beauxis
7dbf7d2a45
libavformat/oggdec.c: Use AV_PKT_DATA_STRINGS_METADATA to pass metadata updates.
2025-10-01 14:20:55 +00:00
Romain Beauxis
cebbb6ae8a
libavformat/oggdec.h, libavformat/oggparsevorbis.c: Factor out vorbis metadata update mechanism.
2025-10-01 14:20:55 +00:00
Romain Beauxis
de8d57e4c5
ogg/vorbis: implement header packet skip in chained ogg bitstreams.
2025-10-01 14:20:55 +00:00
James Almer
5511641365
avcodec/atrac9dec: use av_zero_extend()
...
Signed-off-by: James Almer <jamrial@gmail.com >
2025-10-01 01:26:19 +00:00
James Almer
7ce3a14496
avcodec/apv_entropy: use av_zero_extend()
...
Signed-off-by: James Almer <jamrial@gmail.com >
2025-10-01 01:26:19 +00:00
James Almer
776ee07990
avcodec/aom_film_grain: use av_zero_extend()
...
Signed-off-by: James Almer <jamrial@gmail.com >
2025-10-01 01:26:19 +00:00
Marton Balint
8e11e2cdb8
avformat/tls_openssl: initialize underlying protocol early for dtls_start()
...
The same way we do with TLS, so all tls URL options will be properly supported.
Signed-off-by: Marton Balint <cus@passwd.hu >
2025-10-01 00:34:19 +02:00