FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-11-23 21:54:53 +02:00

Author	SHA1	Message	Date
Andreas Rheinhardt	99209c2876	avcodec/x86/mpegvideoenc_template: Reduce number of registers used qmat and bias always have a constant offset, so one can use one register to address both of them. This allows to remove the check for HAVE_6REGS (untested on a system where HAVE_6REGS is false). Also avoid FF_REG_a while at it. Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-18 20:41:13 +01:00
Andreas Rheinhardt	b890cd0f73	avcodec/x86/mpegvideoenc_template: Avoid touching nonvolatile register xmm7 is nonvolatile on x64 Windows. Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-18 20:41:13 +01:00
Andreas Rheinhardt	aeb138679a	avcodec/x86/mpegvideoencdsp: Port add_8x8basis_ssse3() to ASM Both GCC and Clang completely unroll the unlikely loop at -O3, leading to codesize bloat; their code is also suboptimal, as they don't make use of pmulhrsw (even with -mssse3). This commit therefore ports the whole function to external assembly. The new function occupies 176B here vs 1406B for GCC. Benchmarks for a testcase with huge qscale (notice that the C version is unrolled just like the unlikely loop in the SSSE3 version): add_8x8basis_c: 43.4 ( 1.00x) add_8x8basis_ssse3 (old): 43.6 ( 1.00x) add_8x8basis_ssse3 (new): 11.9 ( 3.63x) Reviewed-by: Kieran Kunhya <kieran@kunhya.com> Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-18 20:41:12 +01:00
Andreas Rheinhardt	0d3a88e55f	tests/checkasm/mpegvideoencdsp: Test denoise_dct Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-18 20:41:12 +01:00
Andreas Rheinhardt	1c00e09427	avcodec/mpegvideo_enc: Port denoise_dct to MpegvideoEncDSPContext It is very simple to remove the MPVEncContext from it. Notice that this also fixes a bug in x86/mpegvideoenc.c: It only used the SSE2 version of denoise_dct when dct_algo was auto or mmx (and it was therefore unused during FATE). Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-18 20:41:12 +01:00
Andreas Rheinhardt	d633fa0433	avcodec/x86/mpegvideoenc: Port denoise_dct_sse2 to external assembly Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-18 20:41:12 +01:00
Andreas Rheinhardt	2cfef7031c	avcodec/x86/mpegvideoenc: Reduce number of registers used Avoids a push+pop on x64 Windows. Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-18 20:41:12 +01:00
Andreas Rheinhardt	503afa40f7	avcodec/x86/mpegvideoenc: Remove check for MMX Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-18 20:41:12 +01:00
Marvin Scholz	00ef656a85	.forgejo/CODEOWNERS: add myself to VideoToolbox and Icecast	2025-11-18 15:17:05 +01:00
Carl Hetherington via ffmpeg-devel	1eb2cbd865	avfilter/f_ebur128: Fix incorrect ebur128 peak calculation. Since `3b26b782ee` it would only look at the first channel. Signed-off-by: Carl Hetherington <cth@carlh.net> Reviewed-by: Niklas Haas <ffmpeg@haasn.xyz>	2025-11-18 08:40:08 +01:00
Gyan Doshi	f60db2e566	doc/fate: document setting of session-wide env variables	2025-11-18 04:19:06 +00:00
Kacper Michajłow	9b2162275b	configure: filter out -guard:signret from armasm flags While cl.exe supports -guard:signret, armasm64 complains about unknown flag. Note that -guard:ehcont is accepted by armasm64. Fixes: error A2029: unknown command-line argument or argument value -guard:signret Signed-off-by: Kacper Michajłow <kasper93@gmail.com>	2025-11-17 20:41:34 +00:00
Kacper Michajłow	523d688c2b	fate: add more configure flags to fate config Signed-off-by: Kacper Michajłow <kasper93@gmail.com>	2025-11-17 20:25:24 +00:00
Andreas Rheinhardt	ddf443f1e9	avfilter/vf_fsppdsp: Fix left shifts of negative numbers They are undefined behavior and UBSan warns about them (in the checkasm test). Put the shifts in the constants instead. This even gives a tiny speedup here: Old benchmarks: column_fidct_c: 3369.9 ( 1.00x) column_fidct_sse2: 829.1 ( 4.06x) New benchmarks: column_fidct_c: 3304.2 ( 1.00x) column_fidct_sse2: 827.9 ( 3.99x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 12:18:12 +01:00
Andreas Rheinhardt	f8bcea4946	avfilter/vf_fsppdsp: Remove pointless cast Also don't cast const away and use a smaller scope. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 12:18:12 +01:00
Andreas Rheinhardt	0c556a6b09	avfilter/vf_fspp: Pre-reorder threshold table Avoids reordering at runtime. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 12:18:12 +01:00
Andreas Rheinhardt	778ff97efa	avfilter/vf_fspp: Make output endian-independent Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 12:18:12 +01:00
Andreas Rheinhardt	f442145729	avfilter/vf_fspp: Avoid casts, effective-type violations Maybe uint64_t has been used as a poor man's alignment specifier? Anyway, reading an uint64_t via an lvalue of type int16_t (as happens in the C versions of the dsp functions) is undefined behavior. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 12:18:12 +01:00
Andreas Rheinhardt	c0648b2004	avfilter/x86/vf_spp: Fix comment Forgotten in `dcb28ed860`. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 12:18:12 +01:00
Andreas Rheinhardt	06b0dae51b	avfilter/vf_fsppdsp: Constify Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 12:18:12 +01:00
Andreas Rheinhardt	cc97f1e276	avfilter/vf_fspp: Fix effective type violation Also don't use unnecessarily large alignment; it avoids having to align the stack. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 12:18:12 +01:00
Andreas Rheinhardt	3cd452cbf1	avfilter/x86/vf_fspp: Avoid stack on x64 Possible due to the amount of registers. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 12:18:12 +01:00
Andreas Rheinhardt	ddd74276f8	avfilter/x86/vf_fspp: Port ff_column_fidct_mmx() to SSE2 It gains a lot because it has to operate on eight words; it also saves 608B of .text here. Old benchmarks: column_fidct_c: 3365.7 ( 1.00x) column_fidct_mmx: 1784.6 ( 1.89x) New benchmarks: column_fidct_c: 3361.5 ( 1.00x) column_fidct_sse2: 801.1 ( 4.20x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 12:18:11 +01:00
Andreas Rheinhardt	68b11cde82	tests/checkasm/vf_fspp: Add test for column_fidct Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 12:18:11 +01:00
Andreas Rheinhardt	63493bf0e0	avfilter/x86/vf_fspp: Put shifts into constants This avoids some shift instructions and also gives us more headroom in the registers. In fact, I have proven to myself that everything that is supposed to fit into 16bits now actually does so. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 12:18:11 +01:00
Andreas Rheinhardt	66af18d06a	avfilter/x86/vf_fspp: Make ff_column_fidct_mmx() bitexact It currently is not, because the shortcut mode uses different rounding than the C code (as well as the non-shortcut code). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 12:18:11 +01:00
Andreas Rheinhardt	1049a5fba8	avfilter/vf_fsppdsp: Reduce discrepancies between C code and x86 asm The x86 assembly uses the following pattern to zero all the values with abs<threshold: x -= threshold; x satu+= threshold (unsigned saturated addition); x += threshold; x satu-= threshold (unsigned saturated subtraction). The reference C code meanwhile zeroed everything with abs <= threshold. This commit makes the C code behave like the x86 assembly to reduce discrepancies between the two. An alternative would be to require SSSE3, so that one can use pabsw, pcmpgtw for abs>threshold, followed by a pand with the original data. Or one could modify the thresholds to make both equal. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 11:28:04 +01:00
Andreas Rheinhardt	d19050a1ae	avfilter/vf_fsppdsp: Use restrict It is possible because the requirements are fulfilled; it is also beneficial performance and code-size wise. For GCC 14 (with -O3), this reduced codesize by 26750B here; for Clang 20, it was 432B. Old benchmarks: mul_thrmat_c: 4.3 ( 1.00x) mul_thrmat_sse2: 4.3 ( 1.00x) store_slice_c: 2810.8 ( 1.00x) store_slice_sse2: 542.5 ( 5.18x) store_slice2_c: 3817.0 ( 1.00x) store_slice2_sse2: 410.4 ( 9.30x) New benchmarks: mul_thrmat_c: 4.3 ( 1.00x) mul_thrmat_sse2: 4.3 ( 1.00x) store_slice_c: 1510.1 ( 1.00x) store_slice_sse2: 545.2 ( 2.77x) store_slice2_c: 1763.5 ( 1.00x) store_slice2_sse2: 408.3 ( 4.32x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 11:28:04 +01:00
Andreas Rheinhardt	ff85a20b7d	avfilter/x86/vf_fspp: Port store_slice to SSE2 Old benchmarks: store_slice_c: 2798.3 ( 1.00x) store_slice_mmx: 950.2 ( 2.94x) store_slice2_c: 3811.7 ( 1.00x) store_slice2_mmx: 682.3 ( 5.59x) New benchmarks: store_slice_c: 2797.2 ( 1.00x) store_slice_sse2: 543.5 ( 5.15x) store_slice2_c: 3817.0 ( 1.00x) store_slice2_sse2: 408.2 ( 9.35x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 11:28:04 +01:00
Andreas Rheinhardt	570f8fc6c9	tests/checkasm/vf_fspp: Test store_slice Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 11:28:04 +01:00
Andreas Rheinhardt	e042f17e99	avfilter/vf_fsppdsp: Use standard clamping This is obviously what is intended and what the MMX code does; yet I cannot rule out that it changes the output for some inputs: I have observed individual src values which would lead to temp values just above 512 if they came in pairs (i.e. if both inputs were simultaneously huge). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 11:28:04 +01:00
Andreas Rheinhardt	52ba2ac7bd	avfilter/x86/vf_fspp: Port mul_thrmat to SSE2 This fixes an ABI violation, as mul_thrmat did not issue emms. It seems that this ABI violation could reach the user, namely if ff_get_video_buffer() fails. Notice that ff_get_video_buffer() itself could fail because of this, namely if the allocator uses floating point registers. On x64 (where GCC already used SSE2 in the C version) mul_thrmat_c: 4.4 ( 1.00x) mul_thrmat_mmx: 8.6 ( 0.52x) mul_thrmat_sse2: 4.4 ( 1.00x) On 32bit (where SSE2 is not known to be available): mul_thrmat_c: 56.0 ( 1.00x) mul_thrmat_sse2: 6.0 ( 9.40x) Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 11:28:04 +01:00
Andreas Rheinhardt	70eb8a76a9	tests/checkasm: Add vf_fspp mul_thrmat test Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 11:28:04 +01:00
Andreas Rheinhardt	9f4d5d818d	avfilter/x86/vf_fspp: Don't duplicate dither table Reuse the one from vf_fsppdsp.c; also don't overalign said table too much. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 11:28:04 +01:00
Andreas Rheinhardt	1699de0955	avfilter/vf_fsppdsp: Use enum for constants It means that the compiler does not have to optimize the static const object away. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 11:28:04 +01:00
Andreas Rheinhardt	9b34088c4d	avfilter/vf_fspp: Add DSPCtx, move DSP functions to file of their own This is in preparation for adding checkasm tests; without it, checkasm would pull all of libavfilter in. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 11:28:04 +01:00
Andreas Rheinhardt	57d6898730	configure: Only test for SSE2 intrinsics on x86 Reviewed-by: Kieran Kunhya <kieran@kunhya.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-17 10:36:54 +01:00
Artem Smorodin	e94439e49b	avformat/tee: fix the default onfail setting of the tee slaves I found that the default value is not set for the onfail option. I see that there is an attempt to set this value by default inside parse_slave_failure_policy_option. But look at the CONSUME_OPTION macro. If av_dict_get cannot find this option, then this function is not even called.	2025-11-17 00:01:42 +00:00
Michael Niedermayer	88b676105d	avcodec/prores_raw: Check bits in get_value() The code loads 32bit so we can at maximum use 32bit the return type is also changed to uint16_t (was requested in review), no path is known where a return value above 32767 is produced, but that was not exhaustively checked Fixes: runtime error: shift exponent -9 is negative Fixes: 439483046/clusterfuzz-testcase-minimized-ffmpeg_AV_CODEC_ID_PRORES_RAW_DEC_fuzzer-6649466540326912 Found-by: continuous fuzzing process https://github.com/google/oss-fuzz/tree/master/projects/ffmpeg Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2025-11-16 21:43:17 +01:00
Michael Niedermayer	9ccc33d84d	avcodec/prores_raw: Prettify ff_prores_raw_*_cb the values contain 3 4 bit values, thus using hex is more natural and shows more information Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2025-11-16 21:34:38 +01:00
Michael Niedermayer	ad956ff076	avfilter/vf_drawtext: Account for bbox text separator Fixes: out of array access no test case Found-by: Joshua Rogers <joshua@joshua.hu> with ZeroPath Reviewed-by: Joshua Rogers <joshua@joshua.hu> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2025-11-16 20:32:11 +01:00
Andreas Rheinhardt	643e2e10f9	avutil/cpu: Deprecate AV_CPU_FLAG_FORCE This flag does nothing since the deactivation of the dsp_mask field of AVCodecContext in commits `9ae6ba2883` and `9ae6ba2883` (it has been superseded with better ways to override the CPU flags). So deprecate it. Reviewed-by: Lynne <dev@lynne.ee> Reviewed-by: James Almer <jamrial@gmail.com> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-16 11:24:00 +01:00
Zhao Zhili	925282fafc	doc/filters: add section for VideoToolbox filter Move scale_vt and transpose_vt to this section.	2025-11-16 10:22:19 +00:00
Martin Storsjö	e096a592cb	doc: Fix building with makeinfo 4.8 This fixes building after commit `1ce88d29d0`. That commit caused the following errors: src/doc/fate.texi:234: @anchor expected braces. src/doc/fate.texi:245: @item found outside of an insertion block. src/doc/fate.texi:249: @item found outside of an insertion block. src/doc/fate.texi:261: @item found outside of an insertion block. src/doc/fate.texi:265: @item found outside of an insertion block. src/doc/fate.texi:268: @item found outside of an insertion block. src/doc/fate.texi:274: @item found outside of an insertion block. src/doc/fate.texi:277: @item found outside of an insertion block. src/doc/fate.texi:281: @item found outside of an insertion block. src/doc/fate.texi:287: Unmatched `@end'. ./src/doc/fate.texi:65: Cross reference to nonexistent node `makefile variables' (perhaps incorrect sectioning?).	2025-11-15 19:29:08 +02:00
Gyan Doshi	1ce88d29d0	doc/fate: improve section on running FATE With thanks to Adam Koszek.	2025-11-15 08:28:51 +00:00
Cameron Gutman	d3dea2b142	avcodec/v4l2_buffers: map additional V4L2 TRCs Signed-off-by: Cameron Gutman <aicommander@gmail.com>	2025-11-15 00:39:43 +00:00
Andreas Rheinhardt	e293091d98	avcodec/prores_raw: Reuse permutation The ProresDSPContext already contains the idct_permutation. Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-11-14 14:27:53 +01:00
Michael Niedermayer	41a9c6ec5f	avcodec/mediacodecdec_common: Check that the input to mediacodec_wrap_sw_audio_buffer() contains channel * sample_size Fixes: out of array access no testcase Found-by: Joshua Rogers <joshua@joshua.hu> with ZeroPath Reviewed-by: Joshua Rogers <joshua@joshua.hu> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2025-11-14 12:16:48 +00:00
James Almer	b478037423	avdevice/lavfi: stop setting deprecated buffersink options Signed-off-by: James Almer <jamrial@gmail.com>	2025-11-13 21:02:10 -03:00
Dmitrii Ovchinnikov	62184be548	avutil/hwcontext_amf: Simplified blocking before frame submission Instead of blocking the entire context, which can cause issues in more complex pipelines, now only frame sending is blocked via AVMutex	2025-11-13 15:49:42 +01:00
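
The four-step thresholding pattern described in commit 1049a5fba8 (subtract, unsigned saturated add, add, unsigned saturated subtract) can be modelled on a single 16-bit word in plain C. The following is only a scalar sketch of one reading of that description, not the actual assembly or the C code from vf_fsppdsp: the helper names `satu_add16`, `satu_sub16` and `threshold_word` are hypothetical, and the threshold is assumed to satisfy 0 < thr < 32768. It is meant to illustrate what happens to values clearly inside and clearly outside the threshold, and makes no claim about how the real code treats values exactly at the boundary.

```c
#include <stdint.h>

/* Unsigned saturating 16-bit addition: clamps at 0xFFFF instead of wrapping. */
static uint16_t satu_add16(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + b;
    return s > UINT16_MAX ? UINT16_MAX : (uint16_t)s;
}

/* Unsigned saturating 16-bit subtraction: clamps at 0 instead of wrapping. */
static uint16_t satu_sub16(uint16_t a, uint16_t b)
{
    return a > b ? (uint16_t)(a - b) : 0;
}

/* Hypothetical scalar model of the pattern; assumes 0 < thr < 32768.
 * The signed input is treated as a raw 16-bit word, as a SIMD lane would be. */
static int16_t threshold_word(int16_t x, uint16_t thr)
{
    uint16_t v = (uint16_t)x;      /* reinterpret modulo 2^16 */

    v = (uint16_t)(v - thr);       /* x -= threshold (wrapping) */
    v = satu_add16(v, thr);        /* x satu+= threshold: small positive x saturates to 0xFFFF */
    v = (uint16_t)(v + thr);       /* x += threshold (wrapping) */
    v = satu_sub16(v, thr);        /* x satu-= threshold: small values clamp to 0 */

    /* Convert back to a signed value without relying on
     * implementation-defined narrowing. */
    return v >= 0x8000 ? (int16_t)((int)v - 0x10000) : (int16_t)v;
}
```

With a threshold of 10, `threshold_word(5, 10)` and `threshold_word(-5, 10)` both yield 0, while `threshold_word(1000, 10)` and `threshold_word(-1000, 10)` pass through unchanged. The two saturating steps do the work: the saturated addition pins small positive inputs to 0xFFFF, and the saturated subtraction then clamps both small positive and small negative inputs to zero.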

121803 Commits