1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-11-23 21:54:53 +02:00
Commit Graph

121803 Commits

Author SHA1 Message Date
Andreas Rheinhardt
99209c2876 avcodec/x86/mpegvideoenc_template: Reduce number of registers used
qmat and bias always have a constant offset, so one can use one register
to address both of them. This allows to remove the check for HAVE_6REGS
(untested on a system where HAVE_6REGS is false).
Also avoid FF_REG_a while at it.

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-18 20:41:13 +01:00
Andreas Rheinhardt
b890cd0f73 avcodec/x86/mpegvideoenc_template: Avoid touching nonvolatile register
xmm7 is nonvolatile on x64 Windows.

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-18 20:41:13 +01:00
Andreas Rheinhardt
aeb138679a avcodec/x86/mpegvideoencdsp: Port add_8x8basis_ssse3() to ASM
Both GCC and Clang completely unroll the unlikely loop at -O3,
leading to codesize bloat; their code is also suboptimal, as they
don't make use of pmulhrsw (even with -mssse3). This commit
therefore ports the whole function to external assembly. The new
function occupies 176B here vs 1406B for GCC.

Benchmarks for a testcase with huge qscale (notice that the C version
is unrolled just like the unlikely loop in the SSSE3 version):
add_8x8basis_c:                                         43.4 ( 1.00x)
add_8x8basis_ssse3 (old):                               43.6 ( 1.00x)
add_8x8basis_ssse3 (new):                               11.9 ( 3.63x)

Reviewed-by: Kieran Kunhya <kieran@kunhya.com>
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-18 20:41:12 +01:00
Andreas Rheinhardt
0d3a88e55f tests/checkasm/mpegvideoencdsp: Test denoise_dct
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-18 20:41:12 +01:00
Andreas Rheinhardt
1c00e09427 avcodec/mpegvideo_enc: Port denoise_dct to MpegvideoEncDSPContext
It is very simple to remove the MPVEncContext from it.
Notice that this also fixes a bug in x86/mpegvideoenc.c: It only
used the SSE2 version of denoise_dct when dct_algo was auto or mmx
(and it was therefore unused during FATE).

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-18 20:41:12 +01:00
Andreas Rheinhardt
d633fa0433 avcodec/x86/mpegvideoenc: Port denoise_dct_sse2 to external assembly
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-18 20:41:12 +01:00
Andreas Rheinhardt
2cfef7031c avcodec/x86/mpegvideoenc: Reduce number of registers used
Avoids a push+pop on x64 Windows.

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-18 20:41:12 +01:00
Andreas Rheinhardt
503afa40f7 avcodec/x86/mpegvideoenc: Remove check for MMX
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-18 20:41:12 +01:00
Marvin Scholz
00ef656a85 .forgejo/CODEOWNERS: add myself to VideoToolbox and Icecast 2025-11-18 15:17:05 +01:00
Carl Hetherington via ffmpeg-devel
1eb2cbd865 avfilter/f_ebur128: Fix incorrect ebur128 peak calculation.
Since 3b26b782ee it would only look at the
first channel.

Signed-off-by: Carl Hetherington <cth@carlh.net>
Reviewed-by: Niklas Haas <ffmpeg@haasn.xyz>
2025-11-18 08:40:08 +01:00
Gyan Doshi
f60db2e566 doc/fate: document setting of session-wide env variables 2025-11-18 04:19:06 +00:00
Kacper Michajłow
9b2162275b configure: filter out -guard:signret from armasm flags
While cl.exe supports -guard:signret, armasm64 complains about
unknown flag. Note that -guard:ehcont is accepted by armasm64.

Fixes:
error A2029: unknown command-line argument or argument value -guard:signret

Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2025-11-17 20:41:34 +00:00
Kacper Michajłow
523d688c2b fate: add more configure flags to fate config
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
2025-11-17 20:25:24 +00:00
Andreas Rheinhardt
ddf443f1e9 avfilter/vf_fsppdsp: Fix left shifts of negative numbers
They are undefined behavior and UBSan warns about them
(in the checkasm test). Put the shifts in the constants
instead. This even gives a tiny speedup here:

Old benchmarks:
column_fidct_c:                                       3369.9 ( 1.00x)
column_fidct_sse2:                                     829.1 ( 4.06x)
New benchmarks:
column_fidct_c:                                       3304.2 ( 1.00x)
column_fidct_sse2:                                     827.9 ( 3.99x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-17 12:18:12 +01:00
Andreas Rheinhardt
f8bcea4946 avfilter/vf_fsppdsp: Remove pointless cast
Also don't cast const away and use a smaller scope.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-17 12:18:12 +01:00
Andreas Rheinhardt
0c556a6b09 avfilter/vf_fspp: Pre-reorder threshold table
Avoids reordering at runtime.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-17 12:18:12 +01:00
Andreas Rheinhardt
778ff97efa avfilter/vf_fspp: Make output endian-independent
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-17 12:18:12 +01:00
Andreas Rheinhardt
f442145729 avfilter/vf_fspp: Avoid casts, effective-type violations
Maybe uint64_t has been used as a poor man's alignment specifier?
Anyway, reading an uint64_t via an lvalue of type int16_t (as happens
in the C versions of the dsp functions) is undefined behavior.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-17 12:18:12 +01:00
Andreas Rheinhardt
c0648b2004 avfilter/x86/vf_spp: Fix comment
Forgotten in dcb28ed860.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-17 12:18:12 +01:00
Andreas Rheinhardt
06b0dae51b avfilter/vf_fsppdsp: Constify
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-17 12:18:12 +01:00
Andreas Rheinhardt
cc97f1e276 avfilter/vf_fspp: Fix effective type violation
Also don't use unnecessarily large alignment; it avoids having to align
the stack.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-17 12:18:12 +01:00
Andreas Rheinhardt
3cd452cbf1 avfilter/x86/vf_fspp: Avoid stack on x64
Possible due to the amount of registers.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-17 12:18:12 +01:00
Andreas Rheinhardt
ddd74276f8 avfilter/x86/vf_fspp: Port ff_column_fidct_mmx() to SSE2
It gains a lot because it has to operate on eight words;
it also saves 608B of .text here.

Old benchmarks:
column_fidct_c:                                       3365.7 ( 1.00x)
column_fidct_mmx:                                     1784.6 ( 1.89x)

New benchmarks:
column_fidct_c:                                       3361.5 ( 1.00x)
column_fidct_sse2:                                     801.1 ( 4.20x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-17 12:18:11 +01:00
Andreas Rheinhardt
68b11cde82 tests/checkasm/vf_fspp: Add test for column_fidct
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-17 12:18:11 +01:00
Andreas Rheinhardt
63493bf0e0 avfilter/x86/vf_fspp: Put shifts into constants
This avoids some shift instructions and also gives us more headroom
in the registers. In fact, I have proven to myself that everything
that is supposed to fit into 16bits now actually does so.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-17 12:18:11 +01:00
Andreas Rheinhardt
66af18d06a avfilter/x86/vf_fspp: Make ff_column_fidct_mmx() bitexact
It currently is not, because the shortcut mode uses different rounding
than the C code (as well as the non-shortcut code).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-17 12:18:11 +01:00
Andreas Rheinhardt
1049a5fba8 avfilter/vf_fsppdsp: Reduce discrepancies between C code and x86 asm
The x86 assembly uses the following pattern to zero all
the values with abs<threshold:
    x -= threshold;
    x satu+= threshold (unsigned saturated addition)
    x += threshold
    x satu-= threshold (unsigned saturated subtraction)
The reference C code meanwhile zeroed everything
with abs <= threshold. This commit makes the C code behave
like the x86 assembly to reduce discrepancies between the two.

An alternative would be to require SSSE3, so that
one can use pabsw, pcmpgtw for abs>threshold, followed by
a pand with the original data. Or one could modify the thresholds
to make both equal.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-17 11:28:04 +01:00
Andreas Rheinhardt
d19050a1ae avfilter/vf_fsppdsp: Use restrict
It is possible because the requirements are fulfilled;
it is also beneficial performance and code-size wise.
For GCC 14 (with -O3), this reduced codesize by 26750B
here; for Clang 20, it was 432B.

Old benchmarks:
mul_thrmat_c:                                            4.3 ( 1.00x)
mul_thrmat_sse2:                                         4.3 ( 1.00x)
store_slice_c:                                        2810.8 ( 1.00x)
store_slice_sse2:                                      542.5 ( 5.18x)
store_slice2_c:                                       3817.0 ( 1.00x)
store_slice2_sse2:                                     410.4 ( 9.30x)

New benchmarks:
mul_thrmat_c:                                            4.3 ( 1.00x)
mul_thrmat_sse2:                                         4.3 ( 1.00x)
store_slice_c:                                        1510.1 ( 1.00x)
store_slice_sse2:                                      545.2 ( 2.77x)
store_slice2_c:                                       1763.5 ( 1.00x)
store_slice2_sse2:                                     408.3 ( 4.32x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-17 11:28:04 +01:00
Andreas Rheinhardt
ff85a20b7d avfilter/x86/vf_fspp: Port store_slice to SSE2
Old benchmarks:
store_slice_c:                                        2798.3 ( 1.00x)
store_slice_mmx:                                       950.2 ( 2.94x)
store_slice2_c:                                       3811.7 ( 1.00x)
store_slice2_mmx:                                      682.3 ( 5.59x)

New benchmarks:
store_slice_c:                                        2797.2 ( 1.00x)
store_slice_sse2:                                      543.5 ( 5.15x)
store_slice2_c:                                       3817.0 ( 1.00x)
store_slice2_sse2:                                     408.2 ( 9.35x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-17 11:28:04 +01:00
Andreas Rheinhardt
570f8fc6c9 tests/checkasm/vf_fspp: Test store_slice
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-17 11:28:04 +01:00
Andreas Rheinhardt
e042f17e99 avfilter/vf_fsppdsp: Use standard clamping
This is obviously what is intended and what the MMX code does;
yet I cannot rule out that it changes the output for some inputs:
I have observed individual src values which would lead to temp
values just above 512 if they came in pairs (i.e. if both inputs
were simultaneously huge).

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-17 11:28:04 +01:00
Andreas Rheinhardt
52ba2ac7bd avfilter/x86/vf_fspp: Port mul_thrmat to SSE2
This fixes an ABI violation, as mul_thrmat did not issue emms.
It seems that this ABI violation could reach the user, namely
if ff_get_video_buffer() fails. Notice that ff_get_video_buffer()
itself could fail because of this, namely if the allocator uses
floating point registers.

On x64 (where GCC already used SSE2 in the C version)
mul_thrmat_c:                                            4.4 ( 1.00x)
mul_thrmat_mmx:                                          8.6 ( 0.52x)
mul_thrmat_sse2:                                         4.4 ( 1.00x)

On 32bit (where SSE2 is not known to be available):
mul_thrmat_c:                                           56.0 ( 1.00x)
mul_thrmat_sse2:                                         6.0 ( 9.40x)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-17 11:28:04 +01:00
Andreas Rheinhardt
70eb8a76a9 tests/checkasm: Add vf_fspp mul_thrmat test
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-17 11:28:04 +01:00
Andreas Rheinhardt
9f4d5d818d avfilter/x86/vf_fspp: Don't duplicate dither table
Reuse the one from vf_fsppdsp.c; also don't overalign said table too
much.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-17 11:28:04 +01:00
Andreas Rheinhardt
1699de0955 avfilter/vf_fsppdsp: Use enum for constants
It means that the compiler does not have to optimize the static const
object away.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-17 11:28:04 +01:00
Andreas Rheinhardt
9b34088c4d avfilter/vf_fspp: Add DSPCtx, move DSP functions to file of their own
This is in preparation for adding checkasm tests; without it,
checkasm would pull all of libavfilter in.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-17 11:28:04 +01:00
Andreas Rheinhardt
57d6898730 configure: Only test for SSE2 intrinsics on x86
Reviewed-by: Kieran Kunhya <kieran@kunhya.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-17 10:36:54 +01:00
Artem Smorodin
e94439e49b avformat/tee: fix the default onfail setting of the tee salves
I found that the default value is not set for onfail option. I see that there is an attempt to set this value by default inside parse_slave_failure_policy_option. But look at the CONSUME_OPTION macro. If av_dict_get cannot find this option, then this function is not even called.
2025-11-17 00:01:42 +00:00
Michael Niedermayer
88b676105d avcodec/prores_raw: Check bits in get_value()
The code loads 32bit so we can at maximum use 32bit

the return type is also changed to uint16_t (was requested in review),

no path is known where a return value above 32767 is produced, but that was not exhaustively checked

Fixes: runtime error: shift exponent -9 is negative
Fixes: 439483046/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_PRORES_RAW_DEC_fuzzer-6649466540326912

Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2025-11-16 21:43:17 +01:00
Michael Niedermayer
9ccc33d84d avcodec/prores_raw: Prettify ff_prores_raw_*_cb
the values contain 3 4 bit values, thus using hex is more natural
and shows more information

Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2025-11-16 21:34:38 +01:00
Michael Niedermayer
ad956ff076 avfilter/vf_drawtext: Account for bbox text seperator
Fixes: out of array access
no test case

Found-by: Joshua Rogers <joshua@joshua.hu> with ZeroPath
Reviewed-by: Joshua Rogers <joshua@joshua.hu>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2025-11-16 20:32:11 +01:00
Andreas Rheinhardt
643e2e10f9 avutil/cpu: Deprecate AV_CPU_FLAG_FORCE
This flag does nothing since the deactivation of
the dsp_mask field of AVCodecContext in
commits 9ae6ba2883 and
9ae6ba2883 (it has
been superseded with better ways to override the CPU flags).
So deprecate it.

Reviewed-by: Lynne <dev@lynne.ee>
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-16 11:24:00 +01:00
Zhao Zhili
925282fafc doc/filters: add section for VideoToolbox filter
Move scale_vt and transpose_vt to this section.
2025-11-16 10:22:19 +00:00
Martin Storsjö
e096a592cb doc: Fix building with makeinfo 4.8
This fixes building after commit
1ce88d29d0.

That commit caused the following errors:

    src/doc/fate.texi:234: @anchor expected braces.
    src/doc/fate.texi:245: @item found outside of an insertion block.
    src/doc/fate.texi:249: @item found outside of an insertion block.
    src/doc/fate.texi:261: @item found outside of an insertion block.
    src/doc/fate.texi:265: @item found outside of an insertion block.
    src/doc/fate.texi:268: @item found outside of an insertion block.
    src/doc/fate.texi:274: @item found outside of an insertion block.
    src/doc/fate.texi:277: @item found outside of an insertion block.
    src/doc/fate.texi:281: @item found outside of an insertion block.
    src/doc/fate.texi:287: Unmatched `@end'.
    ./src/doc/fate.texi:65: Cross reference to nonexistent node `makefile variables' (perhaps incorrect sectioning?).
2025-11-15 19:29:08 +02:00
Gyan Doshi
1ce88d29d0 doc/fate: improve section on running FATE
With thanks to Adam Koszek.
2025-11-15 08:28:51 +00:00
Cameron Gutman
d3dea2b142 avcodec/v4l2_buffers: map additional V4L2 TRCs
Signed-off-by: Cameron Gutman <aicommander@gmail.com>
2025-11-15 00:39:43 +00:00
Andreas Rheinhardt
e293091d98 avcodec/prores_raw: Reuse permutation
The ProresDSPContext already contains the idct_permutation.

Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-11-14 14:27:53 +01:00
Michael Niedermayer
41a9c6ec5f avcodec/mediacodecdec_common: Check that the input to mediacodec_wrap_sw_audio_buffer() contains channel * sample_size
Fixes: out of array access
no testcase

Found-by: Joshua Rogers <joshua@joshua.hu> with ZeroPath
Reviewed-by: Joshua Rogers <joshua@joshua.hu>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2025-11-14 12:16:48 +00:00
James Almer
b478037423 avdevice/lavfi: stop setting deprecated buffersink options
Signed-off-by: James Almer <jamrial@gmail.com>
2025-11-13 21:02:10 -03:00
Dmitrii Ovchinnikov
62184be548 avutil/hwcontext_amf: Simplified blocking before frame submission
Instead of blocking the entire context, which can cause issues in more
complex pipelines, now only frame sending is blocked via AVMutex
2025-11-13 15:49:42 +01:00