FFmpeg

mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-10-06 05:47:18 +02:00

Author	SHA1	Message	Date
Andreas Rheinhardt	a8a16c15c8	tests/checkasm/llviddsp: Use the same width for each cpuflag Otherwise the benchmark numbers would be incomparable nonsense. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-10-03 17:05:46 +00:00
Kacper Michajłow	d6cb0d2c2b	ALL: move av_unused to conform with standard requirement This is required placement by standard [[maybe_unused]] attribute, works the same for __attribute__((unused)). Signed-off-by: Kacper Michajłow <kasper93@gmail.com>	2025-09-26 16:15:46 +00:00
Andreas Rheinhardt	4e2ef29cba	tests/checkasm: Add hpeldsp checkasm Reviewed-by: Lynne <dev@lynne.ee> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-09-26 06:21:02 +02:00
Niklas Haas	00e05bcd68	tests/checkasm: add vf_idet checkasm	2025-09-21 11:02:41 +00:00
Andreas Rheinhardt	a35c91dc14	avfilter/vf_colordetect: Rename header to vf_colordetectdsp.h It is more in line with our naming conventions. Reviewed-by: Martin Storsjö <martin@martin.st> Reviewed-by: Niklas Haas <ffmpeg@haasn.dev> Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-09-16 18:22:24 +02:00
Timo Rothenpieler	0362cb3806	build: link with CXX when -lstdc++ on linker commandline	2025-09-14 11:45:11 +00:00
Andreas Rheinhardt	bc545bae3b	tests/checkasm/sw_ops: Avoid 1 << 32 It is UB. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-09-13 21:27:27 +02:00
Martin Storsjö	5a893c1806	checkasm: sw_ops: Avoid division by zero If we're invoked with range == UINT_MAX, we end up doing "rnd() % (UINT_MAX + 1)", which is equal to "rnd() % 0". On arm (on all platforms) and on MSVC i386, this ends up crashing at runtime. This fixes the crash.	2025-09-02 14:28:56 +03:00
Niklas Haas	5e6ffa0376	tests/checkasm: add checkasm tests for swscale ops Because of the lack of an external ABI on low-level kernels, we cannot directly test internal functions. Instead, we construct a minimal op chain consisting of a read, the op to be tested, and a write. The bigger complication arises from the fact that the backend may generate arbitrary internal state that needs to be passed back to the implementation, which means we cannot directly call `func_ref` on the generated chain. To get around this, always compile the op chain twice - once using the backend to be tested, and once using the reference C backend. The actual entry point may also just be a shared wrapper, so we need to be very careful to run checkasm_check_func() on a pseudo-pointer that will actually be unique for each combination of backend and active CPU flags.	2025-09-01 19:28:36 +02:00
Niklas Haas	8406c56b0c	tests/checkasm: generalize DEF_CHECKASM_CHECK_FUNC to floats We split the standard macro into its body (implementation) and declaration, and use a macro argument in place of the raw `memcmp` call, with the major difference that we now take the number of pixels to compare instead of the number of bytes (to match the signature of float_near_ulp_array).	2025-09-01 19:27:53 +02:00
Niklas Haas	faf62cbdf5	tests/checkasm: increase number of runs in between measurements Sometimes, when measuring very small functions, rdtsc is not accurate enough to get a reliable measurement. This increases the number of runs inside the inner loop from 4 to 32, which should help a lot. Less important when using the more precise linux-perf API, but still useful. There should be no user-visible change since the number of runs is adjusted to keep the total time spent measuring the same.	2025-09-01 19:27:53 +02:00
Zhao Zhili	6450e01446	checkasm/vf_colordetect: test non-aligned width	2025-09-01 15:35:16 +00:00
Henrik Gramner	10a061ba99	vp9: Add AVX-512ICL asm for 8bpc subpel mc	2025-08-28 12:45:52 +00:00
Niklas Haas	9b8b78a815	avfilter/vf_colordetect: detect fully opaque alpha planes It can be useful to know if the alpha plane consists of fully opaque pixels or not, in which case it can e.g. safely be stripped. This only requires a very minor modification to the AVX2 routines, adding an extra AND on the read alpha value with the reference alpha value, and a single extra cheap test per line. detect_alpha_8_full_c: 2849.1 ( 1.00x) detect_alpha_8_full_avx2: 260.3 (10.95x) detect_alpha_8_full_avx512icl: 130.2 (21.87x) detect_alpha_8_limited_c: 8349.2 ( 1.00x) detect_alpha_8_limited_avx2: 756.6 (11.04x) detect_alpha_8_limited_avx512icl: 364.2 (22.93x) detect_alpha_16_full_c: 1652.8 ( 1.00x) detect_alpha_16_full_avx2: 236.5 ( 6.99x) detect_alpha_16_full_avx512icl: 134.6 (12.28x) detect_alpha_16_limited_c: 5263.1 ( 1.00x) detect_alpha_16_limited_avx2: 797.4 ( 6.60x) detect_alpha_16_limited_avx512icl: 400.3 (13.15x)	2025-08-18 18:50:00 +00:00
Niklas Haas	ae3c5ac2c1	avfilter/vf_colordetect: remove extra safety margin on premul check This safety margin was motivated by the fact that vf_premultiply sometimes produces such illegally high values, but this has since been fixed by `603334a043`, so there's no more reason to have this safety margin, at least for our own code. (Of course, other sources may also produce such broken files, but we shouldn't work around that - garbage in, garbage out.) See-Also: `603334a043`	2025-08-18 18:50:00 +00:00
Niklas Haas	c96ccd78fc	avfilter/vf_colordetect: rename p, q, k variables for clarity Purely cosmetic. Motivated in part because I want to depend on the assumption that P represents the maximum alpha channel value.	2025-08-18 18:50:00 +00:00
Niklas Haas	2968f30a15	tests/checkasm/vf_colordetect: also test opaque alpha base case Preemptively adding a check for a following commit.	2025-08-18 18:50:00 +00:00
Dash Santosh	6f9e8a599d	checkasm/swscale: fix whitespace issues	2025-08-12 09:05:00 +00:00
Logaprakash Ramajayam	49477972b7	swscale/aarch64/output: Implement neon assembly for yuv2planeX_10_c_template() yuv2yuvX_8_2_0_512_accurate_c: 2213.4 ( 1.00x) yuv2yuvX_8_2_0_512_accurate_neon: 147.5 (15.01x) yuv2yuvX_8_2_0_512_approximate_c: 2203.9 ( 1.00x) yuv2yuvX_8_2_0_512_approximate_neon: 154.1 (14.30x) yuv2yuvX_8_2_16_512_accurate_c: 2147.2 ( 1.00x) yuv2yuvX_8_2_16_512_accurate_neon: 150.8 (14.24x) yuv2yuvX_8_2_16_512_approximate_c: 2149.7 ( 1.00x) yuv2yuvX_8_2_16_512_approximate_neon: 146.8 (14.64x) yuv2yuvX_8_2_32_512_accurate_c: 2078.9 ( 1.00x) yuv2yuvX_8_2_32_512_accurate_neon: 139.0 (14.95x) yuv2yuvX_8_2_32_512_approximate_c: 2083.7 ( 1.00x) yuv2yuvX_8_2_32_512_approximate_neon: 140.5 (14.84x) yuv2yuvX_8_2_48_512_accurate_c: 2010.7 ( 1.00x) yuv2yuvX_8_2_48_512_accurate_neon: 138.2 (14.55x) yuv2yuvX_8_2_48_512_approximate_c: 2012.6 ( 1.00x) yuv2yuvX_8_2_48_512_approximate_neon: 141.2 (14.26x) yuv2yuvX_10LE_16_0_512_accurate_c: 7874.1 ( 1.00x) yuv2yuvX_10LE_16_0_512_accurate_neon: 831.6 ( 9.47x) yuv2yuvX_10LE_16_0_512_approximate_c: 7918.1 ( 1.00x) yuv2yuvX_10LE_16_0_512_approximate_neon: 836.1 ( 9.47x) yuv2yuvX_10LE_16_16_512_accurate_c: 7630.9 ( 1.00x) yuv2yuvX_10LE_16_16_512_accurate_neon: 804.5 ( 9.49x) yuv2yuvX_10LE_16_16_512_approximate_c: 7724.7 ( 1.00x) yuv2yuvX_10LE_16_16_512_approximate_neon: 808.6 ( 9.55x) yuv2yuvX_10LE_16_32_512_accurate_c: 7436.4 ( 1.00x) yuv2yuvX_10LE_16_32_512_accurate_neon: 780.4 ( 9.53x) yuv2yuvX_10LE_16_32_512_approximate_c: 7366.7 ( 1.00x) yuv2yuvX_10LE_16_32_512_approximate_neon: 780.5 ( 9.44x) yuv2yuvX_10LE_16_48_512_accurate_c: 7099.9 ( 1.00x) yuv2yuvX_10LE_16_48_512_accurate_neon: 761.0 ( 9.33x) yuv2yuvX_10LE_16_48_512_approximate_c: 7097.6 ( 1.00x) yuv2yuvX_10LE_16_48_512_approximate_neon: 754.6 ( 9.41x) Benchmarked on: Snapdragon(R) X Elite - X1E80100 - Qualcomm(R) Oryon(TM) CPU 3417 Mhz, 12 Core(s), 12 Logical Processor(s)	2025-08-12 09:05:00 +00:00
Martin Storsjö	8e4c904c8e	checkasm: ac3dsp: Increase the float tolerance for sum_square_butterfly_float Accept up to 13 ULP difference. This fixes running "checkasm --test=ac3dsp 3044836819" on ARM. Depending on how the SIMD implementations aggregate numbers, larger/smaller values might not end up accumulated in exactly the same way; the current NEON implementation for ARM aggregates into vectors of 2 elements. If it would aggregate into vectors of 4 elements instead, like the AArch64 version does, this particular case would end up with a smaller difference.	2025-08-10 02:27:44 +00:00
Martin Storsjö	0400e05a1a	checkasm: ac3dsp: Fix function name typos for sum_square_butterfly	2025-08-10 02:27:44 +00:00
Timo Rothenpieler	262d41c804	all: fix typos found by codespell	2025-08-03 13:48:47 +02:00
Andreas Rheinhardt	15cec71665	checkasm/h264dsp: Fix stack-buffer-overflow, effective-type violations Also ensure that the dst buffers are not too big (they had the right size for >8 bit depths and were therefore too big for eight bit, letting potential buffer overflows in the eight bit version go undetected). Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2025-07-28 19:29:51 +02:00
Niklas Haas	f944a70fcc	tests/checkasm: add check for vf_colordetect	2025-07-21 18:10:26 +02:00
Niklas Haas	bfab026298	tests/checkasm: add test for vf_blackdetect	2025-07-18 10:47:31 +02:00
Niklas Haas	9251af058a	tests/checkasm: add scene_sad checkasm test	2025-07-17 12:26:05 +02:00
Kacper Michajłow	ec51162bb6	checkasm/swscale: fix function prototypes This aligns declared function types in checkasm with real definition. Fixes FATE: checkasm-{sw_rgb,sw_scale,sw_yuv2rgb,sw_yuv2yuv} Fixes: runtime error: call to function <func> through pointer to incorrect function type Fixes: `c1a0e65763` Signed-off-by: Kacper Michajłow <kasper93@gmail.com>	2025-07-17 00:28:21 +02:00
Andreas Rheinhardt	9b409ea1e6	configure: Factor mpegvideoencdsp out of mpegvideoenc This will allow relaxing the dependency on mpegvideoenc for several codecs. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-06-21 22:08:52 +02:00
Tristan Matthews	0d9f680b69	checkasm: h264dsp: test luma_dc_dequant Reviewed-by: Martin Storsjö <martin@martin.st> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2025-06-16 01:31:45 +02:00
Tristan Matthews	5ea3adfcf9	checkasm: add checkasm_check_dctcoef This is useful for tests that compare dctcoefs which will be either 2 bytes or 4 bytes, depending on bitdepth. Reviewed-by: Martin Storsjö <martin@martin.st> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2025-06-16 01:31:44 +02:00
Andreas Rheinhardt	17d5f30dd5	avcodec/pixblockdsp: Pass bits_per_raw_sample directly Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-05-31 01:27:09 +02:00
Henrik Gramner	fd18ae88ae	avcodec/x86/vp9: Add AVX-512ICL for 16x16 and 32x32 8bpc inverse transforms	2025-05-19 15:56:27 +02:00
Nuo Mi	87b0561c88	build: fix windows build issue introduced by `45bea45` We define CR as 2 in libavcodec/vvc/dec.h, but CR is also used by _IMAGE_ARM64_RUNTIME_FUNCTION_ENTRY in winnt.h. Reordering the headers avoids the issue.	2025-05-16 20:30:46 +08:00
Nuo Mi	3004835850	checkasm: hevc sao, use checkasm_check_padded	2025-05-14 20:55:39 +08:00
Nuo Mi	5150d26e0a	checkasm: hevc sao_edge, benchmarking inside the width loop is meaningless	2025-05-14 20:55:39 +08:00
Shaun Loo	45bea45c7b	checkasm: add vvc_sao This is a part of Google Summer of Code 2023 AVX2: - vvc_sao.sao_band [OK] - vvc_sao.sao_edge [OK] checkasm: all 54 tests passed vvc_sao_band_8_8_c: 157.4 ( 1.00x) vvc_sao_band_8_8_avx2: 30.7 ( 5.12x) vvc_sao_band_8_10_c: 119.4 ( 1.00x) vvc_sao_band_8_10_avx2: 29.2 ( 4.09x) vvc_sao_band_8_12_c: 144.6 ( 1.00x) vvc_sao_band_8_12_avx2: 30.0 ( 4.82x) vvc_sao_band_16_8_c: 446.5 ( 1.00x) vvc_sao_band_16_8_avx2: 103.3 ( 4.32x) vvc_sao_band_16_10_c: 399.2 ( 1.00x) vvc_sao_band_16_10_avx2: 64.3 ( 6.21x) vvc_sao_band_16_12_c: 472.9 ( 1.00x) vvc_sao_band_16_12_avx2: 56.5 ( 8.37x) vvc_sao_band_32_8_c: 2430.9 ( 1.00x) vvc_sao_band_32_8_avx2: 203.3 (11.96x) vvc_sao_band_32_10_c: 1405.7 ( 1.00x) vvc_sao_band_32_10_avx2: 208.5 ( 6.74x) vvc_sao_band_32_12_c: 2054.3 ( 1.00x) vvc_sao_band_32_12_avx2: 213.0 ( 9.64x) vvc_sao_band_48_8_c: 3835.4 ( 1.00x) vvc_sao_band_48_8_avx2: 604.2 ( 6.35x) vvc_sao_band_48_10_c: 3624.6 ( 1.00x) vvc_sao_band_48_10_avx2: 468.8 ( 7.73x) vvc_sao_band_48_12_c: 3752.4 ( 1.00x) vvc_sao_band_48_12_avx2: 477.5 ( 7.86x) vvc_sao_band_64_8_c: 6061.1 ( 1.00x) vvc_sao_band_64_8_avx2: 803.9 ( 7.54x) vvc_sao_band_64_10_c: 6142.5 ( 1.00x) vvc_sao_band_64_10_avx2: 827.3 ( 7.43x) vvc_sao_band_64_12_c: 6106.6 ( 1.00x) vvc_sao_band_64_12_avx2: 839.9 ( 7.27x) vvc_sao_band_80_8_c: 9478.0 ( 1.00x) vvc_sao_band_80_8_avx2: 1516.7 ( 6.25x) vvc_sao_band_80_10_c: 10300.5 ( 1.00x) vvc_sao_band_80_10_avx2: 1298.7 ( 7.93x) vvc_sao_band_80_12_c: 8941.1 ( 1.00x) vvc_sao_band_80_12_avx2: 1315.3 ( 6.80x) vvc_sao_band_96_8_c: 13351.5 ( 1.00x) vvc_sao_band_96_8_avx2: 1815.4 ( 7.35x) vvc_sao_band_96_10_c: 13197.5 ( 1.00x) vvc_sao_band_96_10_avx2: 1872.4 ( 7.05x) vvc_sao_band_96_12_c: 11969.0 ( 1.00x) vvc_sao_band_96_12_avx2: 1895.8 ( 6.31x) vvc_sao_band_112_8_c: 19936.9 ( 1.00x) vvc_sao_band_112_8_avx2: 2802.3 ( 7.11x) vvc_sao_band_112_10_c: 19534.9 ( 1.00x) vvc_sao_band_112_10_avx2: 2635.0 ( 7.41x) vvc_sao_band_112_12_c: 16520.6 ( 1.00x) 
vvc_sao_band_112_12_avx2: 2591.8 ( 6.37x) vvc_sao_band_128_8_c: 25967.5 ( 1.00x) vvc_sao_band_128_8_avx2: 3155.3 ( 8.23x) vvc_sao_band_128_10_c: 24002.6 ( 1.00x) vvc_sao_band_128_10_avx2: 3374.6 ( 7.11x) vvc_sao_band_128_12_c: 20829.4 ( 1.00x) vvc_sao_band_128_12_avx2: 3377.0 ( 6.17x) vvc_sao_edge_8_8_c: 174.6 ( 1.00x) vvc_sao_edge_8_8_avx2: 37.0 ( 4.72x) vvc_sao_edge_8_10_c: 174.4 ( 1.00x) vvc_sao_edge_8_10_avx2: 58.5 ( 2.98x) vvc_sao_edge_8_12_c: 171.1 ( 1.00x) vvc_sao_edge_8_12_avx2: 58.5 ( 2.93x) vvc_sao_edge_16_8_c: 677.7 ( 1.00x) vvc_sao_edge_16_8_avx2: 72.2 ( 9.39x) vvc_sao_edge_16_10_c: 724.8 ( 1.00x) vvc_sao_edge_16_10_avx2: 106.4 ( 6.81x) vvc_sao_edge_16_12_c: 647.0 ( 1.00x) vvc_sao_edge_16_12_avx2: 106.6 ( 6.07x) vvc_sao_edge_32_8_c: 3001.8 ( 1.00x) vvc_sao_edge_32_8_avx2: 157.6 (19.04x) vvc_sao_edge_32_10_c: 3071.1 ( 1.00x) vvc_sao_edge_32_10_avx2: 404.2 ( 7.60x) vvc_sao_edge_32_12_c: 2698.6 ( 1.00x) vvc_sao_edge_32_12_avx2: 398.8 ( 6.77x) vvc_sao_edge_48_8_c: 6557.7 ( 1.00x) vvc_sao_edge_48_8_avx2: 380.1 (17.25x) vvc_sao_edge_48_10_c: 6319.9 ( 1.00x) vvc_sao_edge_48_10_avx2: 896.3 ( 7.05x) vvc_sao_edge_48_12_c: 6306.4 ( 1.00x) vvc_sao_edge_48_12_avx2: 885.5 ( 7.12x) vvc_sao_edge_64_8_c: 11510.7 ( 1.00x) vvc_sao_edge_64_8_avx2: 504.1 (22.84x) vvc_sao_edge_64_10_c: 10917.4 ( 1.00x) vvc_sao_edge_64_10_avx2: 1608.3 ( 6.79x) vvc_sao_edge_64_12_c: 11499.8 ( 1.00x) vvc_sao_edge_64_12_avx2: 1586.4 ( 7.25x) vvc_sao_edge_80_8_c: 18193.2 ( 1.00x) vvc_sao_edge_80_8_avx2: 930.2 (19.56x) vvc_sao_edge_80_10_c: 17984.3 ( 1.00x) vvc_sao_edge_80_10_avx2: 2420.9 ( 7.43x) vvc_sao_edge_80_12_c: 18289.4 ( 1.00x) vvc_sao_edge_80_12_avx2: 2412.1 ( 7.58x) vvc_sao_edge_96_8_c: 26361.8 ( 1.00x) vvc_sao_edge_96_8_avx2: 1118.4 (23.57x) vvc_sao_edge_96_10_c: 26162.2 ( 1.00x) vvc_sao_edge_96_10_avx2: 3666.9 ( 7.13x) vvc_sao_edge_96_12_c: 25926.6 ( 1.00x) vvc_sao_edge_96_12_avx2: 3433.9 ( 7.55x) vvc_sao_edge_112_8_c: 36562.9 ( 1.00x) vvc_sao_edge_112_8_avx2: 1741.0 (21.00x) 
vvc_sao_edge_112_10_c: 38126.4 ( 1.00x) vvc_sao_edge_112_10_avx2: 5153.3 ( 7.40x) vvc_sao_edge_112_12_c: 36345.7 ( 1.00x) vvc_sao_edge_112_12_avx2: 4684.9 ( 7.76x) vvc_sao_edge_128_8_c: 46379.8 ( 1.00x) vvc_sao_edge_128_8_avx2: 2012.4 (23.05x) vvc_sao_edge_128_10_c: 47029.5 ( 1.00x) vvc_sao_edge_128_10_avx2: 6162.2 ( 7.63x) vvc_sao_edge_128_12_c: 49647.3 ( 1.00x) vvc_sao_edge_128_12_avx2: 6127.1 ( 8.10x) Co-authored-by: Nuo Mi <nuomi2021@gmail.com>	2025-05-14 20:55:39 +08:00
Mark Thompson	d03c99441d	lavc/apv: AVX2 transquant for x86-64 Typical checkasm result on Alder Lake: decode_transquant_8_c: 464.2 ( 1.00x) decode_transquant_8_avx2: 86.2 ( 5.38x) decode_transquant_10_c: 481.6 ( 1.00x) decode_transquant_10_avx2: 83.5 ( 5.77x)	2025-04-27 15:52:30 +01:00
Martin Storsjö	4d4b301e4a	checkasm: hevc_pel: Use helpers for checking for writes out of bounds This allows catching whether the functions write outside of the designated rectangle, and if run with "checkasm -v", it also prints out on which side of the rectangle the overwrite was. Signed-off-by: Martin Storsjö <martin@martin.st>	2025-04-10 13:30:18 +03:00
Rodger Combs	779cbc2b97	checkasm: add tests for AES Signed-off-by: James Almer <jamrial@gmail.com>	2025-04-06 11:02:10 -03:00
Michael Niedermayer	d5ad860cd8	tests/checkasm/checkasm.c: Assert that aligned_w/h do not overflow Reviewed-by: Martin Storsjö <martin@martin.st> Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>	2025-04-03 01:58:07 +02:00
Martin Storsjö	c1a2da72cc	checkasm: vp8dsp: Use checkasm_check_padded in check_mc Signed-off-by: Martin Storsjö <martin@martin.st>	2025-04-01 18:34:53 +03:00
Martin Storsjö	b863b81500	checkasm: Implement helpers for defining and checking padded rects This backports similar functionality from dav1d, from commits 35d1d011fda4a92bcaf42d30ed137583b27d7f6d and d130da9c315d5a1d3968d278bbee2238ad9051e7. This allows detecting writes out of bounds, on all 4 sides of the intended destination rectangle. The bounds checking also can optionally allow small overwrites (up to a specified alignment), while still checking for larger overwrites past the intended allowed region. Signed-off-by: Martin Storsjö <martin@martin.st>	2025-04-01 18:34:51 +03:00
Martin Storsjö	37c664a253	checkasm: Make checkasm_fail_func return whether we should print verbosely This makes it easier to implement custom error printouts in tests. This is a port of dav1d's commit 13a7d78655f8747c2cd01e8a48d44dcc7f60a8e5 into ffmpeg's checkasm. Signed-off-by: Martin Storsjö <martin@martin.st>	2025-04-01 18:34:48 +03:00
Kieran Kunhya	4db571e516	checkasm/v210enc.c: Use checkasm_check() This gives more informative printouts if the tests fail, if checkasm is run with "-v". Signed-off-by: Martin Storsjö <martin@martin.st>	2025-04-01 18:31:58 +03:00
Niklas Haas	256a38101f	tests/checkasm: fix wrong summation of bench time This was changed 8 years ago with the introduction of the linux-perf path, with seemingly no justification at the time. Likely a developer oversight from testing. This bug not only made --runs completely ineffective, but also meant that we didn't actually correctly filter out outliers. Fixes: `e0d56f097f`	2025-03-31 15:27:24 +02:00
Andreas Rheinhardt	a064d34a32	avcodec/mpegvideoenc: Add MPVEncContext Many of the fields of MpegEncContext (which is also used by decoders) are actually only used by encoders. Therefore this commit adds a new encoder-only structure and moves all of the encoder-only fields to it except for those which require more explicit synchronisation between the main slice context and the other slice contexts. This synchronisation is currently mainly provided by ff_update_thread_context() which simply copies most of the main slice context over the other slice contexts. Fields which are moved to the new MPVEncContext no longer participate in this (which is desired, because it is horrible and for the fields b) below wasteful) which means that some fields can only be moved when explicit synchronisation code is added in later commits. More explicitly, this commit moves the following fields: a) Fields not copied by ff_update_duplicate_context(): dct_error_sum and dct_count; the former does not need synchronisation, the latter is synchronised in merge_context_after_encode(). b) Fields which do not change after initialisation (these fields could also be put into MPVMainEncContext at the cost of an indirection to access them): lambda_table, adaptive_quant, {luma,chroma}_elim_threshold, new_pic, fdsp, mpvencdsp, pdsp, {p,b_forw,b_back,b_bidir_forw,b_bidir_back,b_direct,b_field}_mv_table, [pb]_field_select_table, mb_{type,var,mean}, mc_mb_var, {min,max}_qcoeff, {inter,intra}_quant_bias, ac_esc_length, the *_vlc_length fields, the q_{intra,inter,chroma_intra}_matrix{,16}, dct_offset, mb_info, mjpeg_ctx, rtp_mode, rtp_payload_size, encode_mb, all function pointers, mpv_flags, quantizer_noise_shaping, frame_reconstruction_bitfield, error_rate and intra_penalty. 
c) Fields which are already (re)set explicitly: The PutBitContexts pb, tex_pb, pb2; dquant, skipdct, encoding_error, the statistics fields {mv,i_tex,p_tex,misc,last}_bits and i_count; last_mv_dir, esc_pos (reset when writing the header). d) Fields which are only used by encoders not supporting slice threading for which synchronisation doesn't matter: esc3_level_length and the remaining mb_info fields. e) coded_score: This field is only really used when FF_MPV_FLAG_CBP_RD is set (which implies trellis) and even then it is only used for non-intra blocks. For these blocks dct_quantize_trellis_c() either sets coded_score[n] or returns a last_non_zero value of -1 in which case coded_score will be reset in encode_mb_internal(). Therefore no old values are ever used. The MotionEstContext has not been moved yet. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-03-26 04:08:33 +01:00
Andreas Rheinhardt	9f0970ee35	tests/checkasm/videodsp: Don't use declare_func_emms It allows the callee to clobber the MMX state, yet since `1e3dc705df` this is no longer done. So use the stricter declare_func instead. Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>	2025-03-22 03:45:03 +01:00
Martin Storsjö	e75a0f3c75	checkasm: aacencdsp: Actually test nonzero values in quant_bands Previously, we read elements from ff_aac_pow34sf_tab; however that table is initialized to zero; one needs to call ff_aac_float_common_init() to make sure that the table is initialized. However, given the range of the input values, a large number of entries in ff_aac_pow34sf_tab would give results outside of the range for signed 32 bit integers. As the largest aac_cb_maxval entry is 16, it seems more reasonable to produce values within an order of magnitude of that value. (When hitting INT_MIN, implementations may end up with different results depending on whether the value is negated as a float or as an int. This corner case is irrelevant in practice as this is way outside of the expected value range here.) Coincidentally, this fixes linking checkasm with Apple's older linker. (In Xcode 15, Apple switched to a new linker. The one in older toolchains seems to have a bug where it won't figure out to load object files from a static library, if the only symbol referenced in the object file is a "common" symbol, i.e. one for a zero-initialized variable. This issue can also be reproduced with newer Apple toolchains by passing -Wl,-ld_classic to the linker.) Signed-off-by: Martin Storsjö <martin@martin.st>	2025-02-10 14:03:25 +02:00
Krzysztof Pyrkosz	c85a748979	swscale/aarch64/rgb2rgb: Implemented NEON shuf routines The key idea is to pass the pre-generated tables to the TBL instruction and churn through the data 16 bytes at a time. The remaining 4 elements are handled with a specialized block located at the end of the routine. The 3210 variant can be implemented using rev32, but surprisingly it is slower than the generic TBL on A78, but much faster on A72. There may be some room for improvement. Possibly instead of handling last 8 and then 4 bytes separately, we can load these 4 into {v0.s}[2] and process along with the last 8 bytes. Speeds measured with checkasm --test=sw_rgb --bench --runs=10 \| grep shuf - A78 shuffle_bytes_0321_c: 75.5 ( 1.00x) shuffle_bytes_0321_neon: 26.5 ( 2.85x) shuffle_bytes_1203_c: 136.2 ( 1.00x) shuffle_bytes_1203_neon: 27.2 ( 5.00x) shuffle_bytes_1230_c: 135.5 ( 1.00x) shuffle_bytes_1230_neon: 28.0 ( 4.84x) shuffle_bytes_2013_c: 138.8 ( 1.00x) shuffle_bytes_2013_neon: 22.0 ( 6.31x) shuffle_bytes_2103_c: 76.5 ( 1.00x) shuffle_bytes_2103_neon: 20.5 ( 3.73x) shuffle_bytes_2130_c: 137.5 ( 1.00x) shuffle_bytes_2130_neon: 28.0 ( 4.91x) shuffle_bytes_3012_c: 138.2 ( 1.00x) shuffle_bytes_3012_neon: 21.5 ( 6.43x) shuffle_bytes_3102_c: 138.2 ( 1.00x) shuffle_bytes_3102_neon: 27.2 ( 5.07x) shuffle_bytes_3210_c: 138.0 ( 1.00x) shuffle_bytes_3210_neon: 22.0 ( 6.27x) shuf3210 using rev32 shuffle_bytes_3210_c: 139.0 ( 1.00x) shuffle_bytes_3210_neon: 28.5 ( 4.88x) - A72 shuffle_bytes_0321_c: 120.0 ( 1.00x) shuffle_bytes_0321_neon: 36.0 ( 3.33x) shuffle_bytes_1203_c: 188.2 ( 1.00x) shuffle_bytes_1203_neon: 37.8 ( 4.99x) shuffle_bytes_1230_c: 195.0 ( 1.00x) shuffle_bytes_1230_neon: 36.0 ( 5.42x) shuffle_bytes_2013_c: 195.8 ( 1.00x) shuffle_bytes_2013_neon: 43.5 ( 4.50x) shuffle_bytes_2103_c: 117.2 ( 1.00x) shuffle_bytes_2103_neon: 53.5 ( 2.19x) shuffle_bytes_2130_c: 203.2 ( 1.00x) shuffle_bytes_2130_neon: 37.8 ( 5.38x) shuffle_bytes_3012_c: 183.8 ( 1.00x) shuffle_bytes_3012_neon: 
46.8 ( 3.93x) shuffle_bytes_3102_c: 180.8 ( 1.00x) shuffle_bytes_3102_neon: 37.8 ( 4.79x) shuffle_bytes_3210_c: 195.8 ( 1.00x) shuffle_bytes_3210_neon: 37.8 ( 5.19x) shuf3210 using rev32 shuffle_bytes_3210_c: 194.8 ( 1.00x) shuffle_bytes_3210_neon: 30.8 ( 6.33x) - x13s: shuffle_bytes_0321_c: 49.4 ( 1.00x) shuffle_bytes_0321_neon: 18.1 ( 2.72x) shuffle_bytes_1203_c: 98.4 ( 1.00x) shuffle_bytes_1203_neon: 18.4 ( 5.35x) shuffle_bytes_1230_c: 97.4 ( 1.00x) shuffle_bytes_1230_neon: 19.1 ( 5.09x) shuffle_bytes_2013_c: 101.4 ( 1.00x) shuffle_bytes_2013_neon: 16.9 ( 6.01x) shuffle_bytes_2103_c: 53.9 ( 1.00x) shuffle_bytes_2103_neon: 13.9 ( 3.88x) shuffle_bytes_2130_c: 100.9 ( 1.00x) shuffle_bytes_2130_neon: 19.1 ( 5.27x) shuffle_bytes_3012_c: 97.4 ( 1.00x) shuffle_bytes_3012_neon: 17.1 ( 5.69x) shuffle_bytes_3102_c: 100.9 ( 1.00x) shuffle_bytes_3102_neon: 19.1 ( 5.27x) shuffle_bytes_3210_c: 100.6 ( 1.00x) shuffle_bytes_3210_neon: 16.9 ( 5.96x) shuf3210 using rev32 shuffle_bytes_3210_c: 100.6 ( 1.00x) shuffle_bytes_3210_neon: 18.6 ( 5.40x) Signed-off-by: Martin Storsjö <martin@martin.st>	2025-02-07 12:54:55 +02:00
James Almer	7a16bfa7c9	tests/checkasm/sw_rgb: increase plane array buffers Fixes stack-buffer-overflow errors running under asan. Reviewed-by: Marvin Scholz <epirat07@gmail.com> Signed-off-by: James Almer <jamrial@gmail.com>	2025-01-28 15:26:00 -03:00
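
The two sw_ops fixes in the log (bc545bae3b, "Avoid 1 << 32", and 5a893c1806, "Avoid division by zero") describe integer pitfalls that are easy to reproduce outside FFmpeg. The following is a minimal sketch of both guards, not the code from those commits: the helper names low_mask and clamp_rnd are hypothetical, and the sketch assumes a 32-bit unsigned int.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: mask with the low `bits` bits set, 0 <= bits <= 32.
 * Shifting a 32-bit value by 32 is undefined behaviour, so the shift is
 * done on a 64-bit value and the result is narrowed afterwards. */
static uint32_t low_mask(unsigned bits)
{
    assert(bits <= 32);
    return (uint32_t)((UINT64_C(1) << bits) - 1);
}

/* Hypothetical helper: reduce a raw random value to [0, range] inclusive.
 * When range == UINT_MAX, range + 1 wraps around to 0 and the modulo
 * would divide by zero, so that case returns the raw value unchanged. */
static unsigned clamp_rnd(unsigned raw, unsigned range)
{
    if (range == UINT_MAX)
        return raw;
    return raw % (range + 1);
}
```

For example, low_mask(32) yields 0xFFFFFFFF without ever evaluating a 32-bit shift by 32, and clamp_rnd(raw, UINT_MAX) returns raw instead of computing raw % 0.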


635 Commits
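
Several entries above (8e4c904c8e, which accepts up to 13 ULP for sum_square_butterfly_float, and 8406c56b0c, which mentions float_near_ulp_array) rely on comparing floats by units in the last place rather than bit for bit. The sketch below illustrates one common way to measure ULP distance, assuming IEEE 754 binary32 floats. The function names float_order_key and near_ulp are hypothetical and this is not checkasm's implementation.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Map a float onto a signed integer line on which adjacent representable
 * floats differ by exactly 1. +0.0f and -0.0f both map to 0. */
static int64_t float_order_key(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof(u));           /* bit copy, no aliasing violation */
    if (u & 0x80000000u)                 /* negative: mirror the magnitude */
        return -(int64_t)(u & 0x7FFFFFFFu);
    return (int64_t)u;
}

/* True if a and b are at most max_ulp representable steps apart.
 * NaN never compares as near anything in this sketch. */
static bool near_ulp(float a, float b, unsigned max_ulp)
{
    if (isnan(a) || isnan(b))
        return false;
    int64_t d = float_order_key(a) - float_order_key(b);
    if (d < 0)
        d = -d;
    return d <= (int64_t)max_ulp;
}
```

A tolerance expressed in ULP scales with the magnitude of the values, which is why it suits SIMD code that accumulates the same inputs in a different order than the C reference.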