mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-11-23 21:54:53 +02:00

121456 Commits

Author SHA1 Message Date
Michael Yang
26dee5b43e libavfilter/vf_nlmeans_vulkan: reverse img_bar 2025-10-16 21:32:43 +00:00
Michael Yang
71ff349cc1 libavfilter/vf_nlmeans_vulkan: lower strength min
Lower (per-component) strength minimum from 1.0 to 0.0, with 0.0 skipping
integral and weights calculations.
2025-10-16 21:32:43 +00:00
Michael Yang
2e12b3251d libavfilter/vf_nlmeans_vulkan: clean up naming
Add `nb_components` to push data.

Rename `ws_total_*` to `ws_*`.
2025-10-16 21:32:43 +00:00
Michael Yang
3fac2d8593 avfilter/vf_nlmeans_vulkan: rewrite filter
This is a major rewrite of the existing nlmeans vulkan code, with bug
fixes and a major performance improvement.

Fix visual artifacts found in tickets #10661 and #10733. Add OOB checks for
image loading and patch sized area around the border. Correct chroma
plane height, strength and buffer barrier index.

Improve parallelism with component workgroup axis and more but smaller
workgroups. Split weights pass into vertical/horizontal (integral) and
weights passes. Remove h/v order logic to always calculate sum on
vertical pass. Remove atomic float requirement, which causes heavy memory
lock contention, at the cost of higher memory usage of w/s buffer.
Use cache blocking in h pass to reduce memory bandwidth usage.
2025-10-16 21:32:43 +00:00
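The vertical/horizontal (integral) split mentioned in the commit above can be illustrated on the CPU. This is a minimal Python sketch of a separable summed-area table, not the Vulkan shader. The names `vertical_pass`, `horizontal_pass`, `integral_image` and `box_sum` are hypothetical. In non-local means the summed values are typically squared differences between the image and a shifted copy; the sketch sums plain pixel values to stay short.

```python
def vertical_pass(img):
    """Running sum down each column (each column is independent)."""
    out = [row[:] for row in img]
    for y in range(1, len(out)):
        for x in range(len(out[0])):
            out[y][x] += out[y - 1][x]
    return out


def horizontal_pass(img):
    """Running sum along each row (each row is independent)."""
    out = [row[:] for row in img]
    for row in out:
        for x in range(1, len(row)):
            row[x] += row[x - 1]
    return out


def integral_image(img):
    """Summed-area table built from the two separable passes."""
    return horizontal_pass(vertical_pass(img))


def box_sum(ii, x0, y0, x1, y1):
    """Sum of the inclusive rectangle [x0..x1] x [y0..y1] from four lookups."""
    total = ii[y1][x1]
    if x0 > 0:
        total -= ii[y1][x0 - 1]
    if y0 > 0:
        total -= ii[y0 - 1][x1]
    if x0 > 0 and y0 > 0:
        total += ii[y0 - 1][x0 - 1]
    return total


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
patch = box_sum(ii, 1, 1, 2, 2)  # 5 + 6 + 8 + 9
```

Once the table exists, any patch sum costs four reads regardless of patch size, which is why the integral is worth computing in its own pass.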
Martin Storsjö
36896af64a movenc: Make the hybrid_fragmented mode more robust
Write the moov tag at the end first, before overwriting the mdat size
at the start of the file.

In case writing the final moov box fails (e.g. due to being out
of disk), we haven't broken the initial moov box yet.

Thus if writing stops between these steps, we could end up with
a file with two moov boxes, which is arguably easier to recover
from than a file with no moov boxes at all.
2025-10-16 18:58:54 +00:00
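The ordering argument in the commit above can be checked with a toy model. This is a sketch only: it assumes that overwriting the mdat size at the start of the file is the step that makes the initial moov box unusable, and that the previous order was the reverse of the new one. The step names and `visible_moov_boxes` are hypothetical and do not correspond to movenc functions.

```python
def visible_moov_boxes(order, steps_completed):
    """Count usable moov boxes if finalization stops after some steps.

    Steps in this toy model:
      "append_moov"    writes the final moov box at the end of the file
      "overwrite_mdat" overwrites the mdat size at the start, which (by
                       assumption) hides the initial moov box
    """
    initial_moov, final_moov = 1, 0
    for step in order[:steps_completed]:
        if step == "append_moov":
            final_moov = 1
        elif step == "overwrite_mdat":
            initial_moov = 0
    return initial_moov + final_moov


OLD_ORDER = ("overwrite_mdat", "append_moov")  # assumed previous order
NEW_ORDER = ("append_moov", "overwrite_mdat")  # order described in the commit

# Interrupted after the first step only:
old_interrupted = visible_moov_boxes(OLD_ORDER, 1)  # no moov box left
new_interrupted = visible_moov_boxes(NEW_ORDER, 1)  # two moov boxes
```

Both orders end with exactly one moov box when all steps complete; they differ only in the state left behind by an interruption.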
Niklas Haas
a45d30a675 avutil/hwcontext_vulkan: always enable baseline usage flags
The documentation states that this field is for enabling "extra" usage
flags. This conflicts with the implementation, and the rest of the comment,
though.

In resolving this ambiguity, I think it's better to lean towards the first
sentence and treat this field purely as specifying *extra* usage flags to
enable. Otherwise, this may break vulkan encoding or subsequent hwdownload
if the upstream filter did not specifically advertise this.

Change the default behavior and update the documentation slightly to more
clearly document the semantics.
2025-10-16 17:40:25 +00:00
Andreas Rheinhardt
b1f2eea1cd avfilter/vf_noise: Deduplicate option flags
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-16 19:10:51 +02:00
Andreas Rheinhardt
3ba570de8b avfilter/x86/vf_noise: Port line_noise funcs to SSE2
This avoids having to fix up ABI violations via emms_c and
also leads to a 73% speedup for the line noise average version
here.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-16 19:09:45 +02:00
Andreas Rheinhardt
adfec0f52e avfilter/x86/vf_noise: Make line_noise_avg_mmx() match C function
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-16 18:41:19 +02:00
Andreas Rheinhardt
214b52df43 avfilter/vf_noise: Avoid cast
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-16 18:41:19 +02:00
Andreas Rheinhardt
ece623b1b3 avfilter/vf_noise: Fix race with very tall images
When using averaged noise with height > MAX_RES (i.e. 4096),
multiple threads would access the same prev_shift slot,
leading to races. Fix this by disabling slice threading
in such scenarios.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-16 18:41:19 +02:00
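A small model shows why heights above MAX_RES are the problem case in the commit above. It is a sketch under two assumptions not stated in the commit message: that the prev_shift slot for row `y` is `y & (MAX_RES - 1)`, and that slice threading splits rows into contiguous ranges per job. `slice_rows` and `shared_slots` are hypothetical helper names.

```python
MAX_RES = 4096  # value quoted in the commit message


def slice_rows(height, nb_jobs):
    """Contiguous row ranges, one per slice-threading job."""
    return [range(height * j // nb_jobs, height * (j + 1) // nb_jobs)
            for j in range(nb_jobs)]


def shared_slots(height, nb_jobs):
    """Slots that would be touched by more than one job."""
    owners = {}
    shared = set()
    for job, rows in enumerate(slice_rows(height, nb_jobs)):
        for y in rows:
            slot = y & (MAX_RES - 1)  # assumed slot index for row y
            if owners.setdefault(slot, job) != job:
                shared.add(slot)
    return shared


no_conflict = shared_slots(4096, 4)   # every row has its own slot
conflict = shared_slots(5000, 4)      # rows 4096..4999 reuse slots 0..903
single_job = shared_slots(5000, 1)    # one job: slots are reused, never shared
```

With a single job the slots are still reused, but only by one thread, which matches the fix of disabling slice threading in such scenarios.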
Andreas Rheinhardt
6a53a4e341 avfilter/vf_noise: Don't write beyond end-of-array
This is not only UB, but also leads to races and nondeterministic
output, because the write one past the end of the buffer actually
conflicts with accesses by the thread that actually owns it.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-16 18:41:18 +02:00
Andreas Rheinhardt
94948bd6b9 avfilter/vf_noise: Make private context smaller
"all" only exists to set options; it does not need the big arrays
contained in FilterParams.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-16 18:41:18 +02:00
Zhao Zhili
cd4b01707d Revert "avformat/movenc: sidx earliest_presentation_time is applied after editlist"
This reverts commit 301141b576.

cluster[0].dts, pts and frag_info[0].time are already in the presentation
timeline, so they shouldn't be shifted by start_pts.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2025-10-16 11:22:37 +08:00
Zhao Zhili
0de3b1f358 avformat/mov: don't shift sidx_pts
sidx_pts is already in presentation time, so it shouldn't be shifted
by sc->time_offset again.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2025-10-16 11:22:37 +08:00
James Almer
2e1d702cfc avformat/dump: fix log level passed to av_log when printing stream group side data
Signed-off-by: James Almer <jamrial@gmail.com>
2025-10-15 17:49:11 -03:00
Andreas Rheinhardt
74a3c1ddb6 avfilter/x86/vf_pullup: Port pullup functions to SSE2, SSSE3
The diff and var functions benefit from psadbw, comb from wider
registers, which avoid reloading values and reduce the number
of loads from 48 to 10. Performance increased by 117% (the loop
in compute_metric() has been timed); codesize decreased by 144B.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-15 19:43:37 +02:00
Andreas Rheinhardt
dcb28ed860 avfilter/x86/vf_spp: Port store_slice to SSE2
This allows removing an emms_c from the filter. It also gives
a 25% speedup here (when timing the calls to store_slice using
START/STOP_TIMER).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-15 19:43:37 +02:00
Andreas Rheinhardt
f4a87d8ca4 avcodec/x86/mpegvideoencdsp_init: Use xmm registers in SSSE3 functions
Improves performance and no longer breaks the ABI (by forgetting
to call emms).

Old benchmarks:
add_8x8basis_c:                                         43.6 ( 1.00x)
add_8x8basis_ssse3:                                     12.3 ( 3.55x)

New benchmarks:
add_8x8basis_c:                                         43.0 ( 1.00x)
add_8x8basis_ssse3:                                      6.3 ( 6.79x)

Notice that the output of try_8x8basis_ssse3 changes a bit:
Before this commit, it computes certain values and adds the values
for i,i+1,i+4 and i+5 before right shifting them; now it adds
the values for i,i+1,i+8,i+9. The second pair in these lists
could be avoided (by shifting xmm0 and xmm1 before adding both together
instead of only shifting xmm0 after adding them), but the former
i,i+1 is inherent in using pmaddwd. This is the reason that this
function is not bitexact.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-15 08:55:13 +02:00
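The non-bitexactness noted in the commit above comes down to a general property: a right shift does not distribute over addition, so adding values before shifting can differ from shifting each value and then adding. The sketch below shows only that property with made-up numbers; it does not reproduce try_8x8basis, and both function names are hypothetical.

```python
def shift_each_then_add(values, shift):
    """Right-shift every value, then sum the results."""
    return sum(v >> shift for v in values)


def add_then_shift(values, shift):
    """Sum the values first, then right-shift the total once."""
    return sum(values) >> shift


pair = [7, 9]  # illustrative values, not taken from the codec
per_element = shift_each_then_add(pair, 4)  # 0 + 0
combined = add_then_shift(pair, 4)          # 16 >> 4
```

The two results differ by the low bits that the per-element shifts discard, so outputs can differ slightly depending on which elements are grouped before the shift.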
Andreas Rheinhardt
cffd029e98 avcodec/x86/mpegvideoencdsp_init: Don't use slow path unnecessarily
The only requirement of this code (and essentially the pmulhrsw
instruction) is that the scaled scale fits into an int16_t.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-15 08:55:13 +02:00
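For readers unfamiliar with pmulhrsw, the following sketch emulates one signed 16-bit lane of the instruction and the int16_t range check that the commit above names as the only requirement. How the scale is scaled before the check is not shown in the commit message and is not modelled here; `pmulhrsw_lane` and `fits_int16` are hypothetical helper names.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def fits_int16(value):
    """True if value is representable as a signed 16-bit integer."""
    return INT16_MIN <= value <= INT16_MAX


def pmulhrsw_lane(a, b):
    """Emulate one lane: rounded high half of a signed 16x16 multiply."""
    if not (fits_int16(a) and fits_int16(b)):
        raise ValueError("operands must fit into int16_t")
    result = (a * b + 0x4000) >> 15
    # Wrap to 16 bits like the hardware does (only -32768 * -32768 overflows).
    return ((result + 0x8000) & 0xFFFF) - 0x8000


half_times_100 = pmulhrsw_lane(16384, 100)  # 0.5 * 100 in Q15
half_times_101 = pmulhrsw_lane(16384, 101)  # 50.5 rounds up
```

Because each operand occupies a 16-bit lane, a multiplier outside the int16_t range cannot be fed to the instruction at all, which is why a range check is needed before taking this path.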
Andreas Rheinhardt
ce499ebf96 tests/checkasm/mpegvideoencdsp: Add test for add_8x8basis
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-15 08:55:13 +02:00
Michael Niedermayer
566e9032b1 swscale/output: Fix unsigned cast position in yuv2*
Fixes: signed overflow

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2025-10-14 20:55:54 +02:00
Michael Niedermayer
0c6b7f9483 swscale/output: Fix integer overflow in yuv2ya16_X_c_template()
Found-by: colod colod <colodcolod7@gmail.com>

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2025-10-14 20:55:53 +02:00
Zhao Zhili
6b961f5963 avformat/mov: fix missing video size when some decoders are disabled
Fix #20667

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2025-10-14 20:05:55 +08:00
Andreas Rheinhardt
a24e0f536d avcodec/x86/hpeldsp_init: Remove check for inline mmx
Forgotten in 4c55724da8.

Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-14 12:31:15 +02:00
Frank Plowman
b0c77e5a12 lavc/vvc: Store RefStruct references to referenced PSs/headers in slice
This loosens the coupling between CBS and the decoder by no longer using
CodedBitstreamH266Context (containing the most recently parsed PSs & PH)
to retrieve the PSs & PH in the decoder. Doing so is beneficial in two
ways:
1. It improves robustness to the case in which an AVPacket doesn't
   contain precisely one PU.
2. It allows the decoder parameter set manager to properly handle the
   case in which a single PU (erroneously) contains conflicting
   parameter sets.

Signed-off-by: Frank Plowman <post@frankplowman.com>
2025-10-13 19:05:36 +01:00
Andreas Rheinhardt
31f0749cd4 avcodec/vp3: Optimize alignment check away when possible
Check only on arches that need said check.

(Btw: I do not see how h_loop_filter benefits from alignment
at all and why h_loop_filter_unaligned exists.)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-13 18:59:49 +02:00
Andreas Rheinhardt
5823ab347a avcodec/vp3dsp: Remove unused flags parameter from ff_vp3dsp_init()
No longer necessary now that the x86 loop filter functions are
bitexact.

Reviewed-by: Sean McGovern <gseanmcg@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-13 18:59:24 +02:00
Andreas Rheinhardt
e3ca57ae8f avcodec/x86/vp3dsp: Port loop filters to SSE2
The old code operated on bytes and did lots of tricks
due to their limited range; it did not completely succeed,
which is why the old versions were not used when bitexact
output was requested.

In contrast, the new version is much simpler: It operates
on signed 16 bit words whose range is more than sufficient.
This means that these functions don't need a check for bitexactness
(and can be used in FATE).

Old benchmarks (for this, the AV_CODEC_FLAG_BITEXACT check has been
removed from checkasm):
h_loop_filter_c:                                        29.8 ( 1.00x)
h_loop_filter_mmxext:                                   32.2 ( 0.93x)
h_loop_filter_unaligned_c:                              29.9 ( 1.00x)
h_loop_filter_unaligned_mmxext:                         31.4 ( 0.95x)
v_loop_filter_c:                                        39.3 ( 1.00x)
v_loop_filter_mmxext:                                   14.2 ( 2.78x)
v_loop_filter_unaligned_c:                              38.9 ( 1.00x)
v_loop_filter_unaligned_mmxext:                         14.3 ( 2.72x)

New benchmarks:
h_loop_filter_c:                                        29.2 ( 1.00x)
h_loop_filter_sse2:                                     28.6 ( 1.02x)
h_loop_filter_unaligned_c:                              29.0 ( 1.00x)
h_loop_filter_unaligned_sse2:                           26.9 ( 1.08x)
v_loop_filter_c:                                        38.3 ( 1.00x)
v_loop_filter_sse2:                                     11.0 ( 3.47x)
v_loop_filter_unaligned_c:                              35.5 ( 1.00x)
v_loop_filter_unaligned_sse2:                           11.2 ( 3.18x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-13 18:58:50 +02:00
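The claim that signed 16-bit words have more than enough range can be checked by bounding the filter input. This sketch assumes the loop filter input has the form (a - d) + 3 * (c - b) for four neighbouring 8-bit pixels a, b, c, d, with a rounding offset of 4 added before a shift; that form is an assumption about the C reference and is not stated in the commit message. `filter_input` is a hypothetical name.

```python
from itertools import product


def filter_input(a, b, c, d):
    """Assumed loop filter input for four neighbouring 8-bit pixels."""
    return (a - d) + 3 * (c - b)


# The expression is linear in each pixel, so its extremes occur at 0 or 255.
corners = [filter_input(*px) for px in product((0, 255), repeat=4)]
lowest, highest = min(corners), max(corners)

ROUNDING = 4  # assumed rounding offset added before the shift
fits_in_int8 = -128 <= lowest and highest + ROUNDING <= 127
fits_in_int16 = -32768 <= lowest and highest + ROUNDING <= 32767
```

Under these assumptions the intermediate spans roughly eleven bits, far outside a byte but well inside a 16-bit word, which is consistent with the simpler word-based implementation being bitexact.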
Andreas Rheinhardt
5d9a392bce tests/checkasm: Add VP3 loop filter test
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-13 18:58:50 +02:00
zhanghongyuan
0bc54cddb1 fftools/opt_common: add long-form license option
Add "license" as a long-form command line option alongside the existing
"L" short option for showing license information. This maintains
consistent option naming patterns with other commands that provide both
short and long forms (help/?/help, etc.) and improves command line
usability by providing more descriptive option names.
2025-10-12 03:26:21 +00:00
Tong Wu
10e9672a8c avcodec/d3d12va_encode: use macros to set QP range and max frame size
Signed-off-by: Tong Wu <wutong1208@outlook.com>
2025-10-12 01:50:57 +00:00
Andreas Rheinhardt
36f92206bb avcodec/x86/hpeldsp: Improve ff_{avg,put}_pixels8_xy2_ssse3()
This SSSE3 function uses MMX registers (of course without emms
at the end) and processes eight bytes of input by unpacking
it into two MMX registers. This is very suboptimal given
that one can just use XMM registers to process eight words.
This commit switches them to using XMM registers.

Old benchmarks:
avg_pixels_tab[1][3]_c:                                114.5 ( 1.00x)
avg_pixels_tab[1][3]_ssse3:                             43.6 ( 2.62x)
put_pixels_tab[1][3]_c:                                 83.6 ( 1.00x)
put_pixels_tab[1][3]_ssse3:                             34.0 ( 2.46x)

New benchmarks:
avg_pixels_tab[1][3]_c:                                115.3 ( 1.00x)
avg_pixels_tab[1][3]_ssse3:                             24.6 ( 4.69x)
put_pixels_tab[1][3]_c:                                 83.8 ( 1.00x)
put_pixels_tab[1][3]_ssse3:                             19.7 ( 4.24x)

Reviewed-by: Kieran Kunhya <kieran@kunhya.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-12 02:45:37 +02:00
Andreas Rheinhardt
4c55724da8 avcodec/x86/hpeldsp: Add ff_put_no_rnd_pixels8_xy2_ssse3()
Given that one has to deal with 16 byte intermediates it is
unsurprising that SSE2 wins against MMX; the MMX version has
therefore been removed (as well as the now unused inline_asm.h).
The new function is even 32B smaller than the old MMX one.

Old benchmarks:
put_no_rnd_pixels_tab[1][3]_c:                          84.1 ( 1.00x)
put_no_rnd_pixels_tab[1][3]_mmx:                        41.1 ( 2.05x)

New benchmarks:
put_no_rnd_pixels_tab[1][3]_c:                          84.0 ( 1.00x)
put_no_rnd_pixels_tab[1][3]_ssse3:                      22.1 ( 3.80x)

Reviewed-by: Kieran Kunhya <kieran@kunhya.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-12 02:45:25 +02:00
Andreas Rheinhardt
f84e06026a avcodec/x86/hpeldsp: Add SSE2 of {avg,put} no_rnd xy2 with blocksize 16
Also remove the now superseded MMX versions (the new functions have the
exact same codesize as the removed ones).

Old benchmarks:
avg_no_rnd_pixels_tab[0][3]_c:                         233.7 ( 1.00x)
avg_no_rnd_pixels_tab[0][3]_mmx:                       121.5 ( 1.92x)
put_no_rnd_pixels_tab[0][3]_c:                         171.4 ( 1.00x)
put_no_rnd_pixels_tab[0][3]_mmx:                        82.6 ( 2.08x)

New benchmarks:
avg_no_rnd_pixels_tab[0][3]_c:                         233.3 ( 1.00x)
avg_no_rnd_pixels_tab[0][3]_sse2:                       45.0 ( 5.18x)
put_no_rnd_pixels_tab[0][3]_c:                         172.1 ( 1.00x)
put_no_rnd_pixels_tab[0][3]_sse2:                       40.9 ( 4.21x)

Reviewed-by: Kieran Kunhya <kieran@kunhya.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-12 02:43:29 +02:00
Andreas Rheinhardt
ce9d181444 avcodec/mjpegdec: Remove unnecessary reloads
Hint: The parts of this patch in decode_block_progressive()
and decode_block_refinement() rely on the fact that GET_VLC
returns -1 on error, so that it enters the codepaths for
actually coded block coefficients.

Reviewed-by: Ramiro Polla <ramiro.polla@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-11 08:20:42 +02:00
Andreas Rheinhardt
dad06a445f avcodec/Makefile: Remove h263 decoder->mpeg4videodec.o dependency
Also prefer using #if CONFIG_MPEG4_DECODER checks in order not
to rely on DCE.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-11 07:51:01 +02:00
Andreas Rheinhardt
10d3479da0 avcodec/h263dec: Avoid redundant branch
Only the MPEG-4 decoder can have partitioned frames here.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-11 07:51:01 +02:00
Andreas Rheinhardt
d96f8d32ad avcodec/x86/h264_qpel: Don't instantiate unused functions
The v_lowpass wrappers (which are instantiated by this macro)
are only used in the put (and not the avg) form for SSSE3
(the avg form is only used for mc02, which doesn't exist
for SSSE3). Clang warns about the unused functions.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-10 16:27:57 +02:00
Niklas Haas
6f1ab828d3 libavfilter/vf_libplacebo: add temperature option 2025-10-09 20:45:09 +00:00
Leo Izen
eab3b68237 avcodec/exif: avoid printing errors for makernote non-IFD parsing
When we parse a MakerNote, we first try to parse it as an IFD and if
that fails, we try to re-parse it as a binary blob. This is because
MakerNote is not well-documented in its nature.

However, if we fail to parse it the first time, we should not av_log
error messages about the parse failure, so instead we log these as
AV_LOG_DEBUG.

Signed-off-by: Leo Izen <leo.izen@gmail.com>
Reported-by: Ramiro Polla <ramiro.polla@gmail.com>
2025-10-09 12:40:41 -04:00
James Almer
41c168444e avcodec/hevc/sei: don't attempt to use stale values in HEVCSEITimeCode
Invalidate the whole struct on SEI reset.

Signed-off-by: James Almer <jamrial@gmail.com>
2025-10-09 12:09:35 -03:00
James Almer
8e01bff774 avcodec/hevc/sei: don't attempt to use stale values in HEVCSEITDRDI
Invalidate the whole struct on SEI reset.

Signed-off-by: James Almer <jamrial@gmail.com>
2025-10-09 12:09:35 -03:00
James Almer
d448d6d1a0 avcodec/hevc/sei: prevent storing a potentially bogus num_ref_displays value in HEVCSEITDRDI
Fixes: 439711052/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_HEVC_fuzzer-4956250308935680
Fixes: out of array access

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: James Almer <jamrial@gmail.com>
2025-10-09 12:09:35 -03:00
Jack Lau
a934d48440 doc/muxers: correct default pkt_size value of whip
Signed-off-by: Jack Lau <jacklau1222@qq.com>
2025-10-09 14:33:02 +00:00
Jack Lau
b43f8dec18 avformat/whip: add macros to replace magic number
Signed-off-by: Jack Lau <jacklau1222@qq.com>
2025-10-09 14:32:03 +00:00
Jack Lau
bc6164eb6f avformat/whip: remove WHIP_STATE_DTLS_CONNECTING
This value is only useful when the DTLS handshake runs in NONBLOCK mode.
The DTLS handshake only needs to call ffurl_handshake once, since it
forces blocking mode.

Signed-off-by: Jack Lau <jacklau1222@qq.com>
2025-10-09 14:32:03 +00:00
Jack Lau
76b13ca0a6 avformat/whip: check whether the peer is ice lite
See RFC 5245 Section 4.3
If an agent is a lite implementation, it MUST include an "a=ice-lite"
session-level attribute in its SDP.  If an agent is a full
implementation, it MUST NOT include this attribute.

Signed-off-by: Jack Lau <jacklau1222@qq.com>
2025-10-09 14:32:03 +00:00
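The RFC rule quoted in the commit above translates into a simple check on the session-level part of the SDP, that is, the lines before the first "m=" line. This is a minimal sketch of that rule, not the code in avformat/whip; `peer_is_ice_lite` and the sample SDP strings are made up for illustration.

```python
def peer_is_ice_lite(sdp):
    """Return True if the SDP carries a session-level a=ice-lite attribute."""
    for line in sdp.splitlines():
        line = line.strip()
        if line.startswith("m="):
            break  # media sections begin; session-level attributes are over
        if line == "a=ice-lite":
            return True
    return False


lite_answer = (
    "v=0\r\n"
    "o=- 0 0 IN IP4 127.0.0.1\r\n"
    "s=-\r\n"
    "a=ice-lite\r\n"
    "m=audio 9 UDP/TLS/RTP/SAVPF 111\r\n"
)
full_answer = (
    "v=0\r\n"
    "o=- 0 0 IN IP4 127.0.0.1\r\n"
    "s=-\r\n"
    "m=audio 9 UDP/TLS/RTP/SAVPF 111\r\n"
)
```

Stopping at the first "m=" line keeps the check session-level, as the quoted RFC text requires.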
Jack Lau
ec0a04de0d avformat/whip: remind user to increase -buffer_size
The UDP buffer may be too small, so it can easily fill up
temporarily and return WSAEWOULDBLOCK.
The udp code handles the Windows error code
and converts it to AVERROR(EAGAIN).

This issue can only be reproduced on Windows.

Sleeping for an interval and retrying the send on
EAGAIN would increase latency, and an appropriate
interval is hard to define.

So this patch just reminds the user to increase the buffer
size via -buffer_size to avoid this issue.

Signed-off-by: Jack Lau <jacklau1222@qq.com>
2025-10-09 09:55:18 +00:00
Jack Lau
b3793d9941 avformat/whip: pass through buffer_size option to udp
Signed-off-by: Jack Lau <jacklau1222@qq.com>
2025-10-09 09:55:18 +00:00