1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-11-23 21:54:53 +02:00
Commit Graph

193 Commits

Author SHA1 Message Date
Andreas Rheinhardt
5d9a392bce tests/checkasm: Add VP3 loop filter test
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-13 18:58:50 +02:00
Andreas Rheinhardt
54598238e4 tests/checkasm: Add CAVS qpel test
This test already uncovered a bug in the vertical qpel motion
compensation code.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-08 20:40:08 +02:00
James Almer
95850f339e tests/checkasm: add a test for dcadsp
Signed-off-by: James Almer <jamrial@gmail.com>
2025-10-05 10:09:04 -03:00
Andreas Rheinhardt
9a0581fca0 tests/checkasm: Add qpeldsp checkasm
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-10-04 07:06:32 +02:00
Andreas Rheinhardt
4e2ef29cba tests/checkasm: Add hpeldsp checkasm
Reviewed-by: Lynne <dev@lynne.ee>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-09-26 06:21:02 +02:00
Niklas Haas
00e05bcd68 tests/checkasm: add vf_idet checkasm 2025-09-21 11:02:41 +00:00
Niklas Haas
5e6ffa0376 tests/checkasm: add checkasm tests for swscale ops
Because of the lack of an external ABI on low-level kernels, we cannot
directly test internal functions. Instead, we construct a minimal op chain
consisting of a read, the op to be tested, and a write.

The bigger complication arises from the fact that the backend may generate
arbitrary internal state that needs to be passed back to the implementation,
which means we cannot directly call `func_ref` on the generated chain. To get
around this, always compile the op chain twice - once using the backend to be
tested, and once using the reference C backend.

The actual entry point may also just be a shared wrapper, so we need to
be very careful to run checkasm_check_func() on a pseudo-pointer that will
actually be unique for each combination of backend and active CPU flags.
2025-09-01 19:28:36 +02:00
Niklas Haas
8406c56b0c tests/checkasm: generalize DEF_CHECKASM_CHECK_FUNC to floats
We split the standard macro into its body (implementation) and declaration,
and use a macro argument in place of the raw `memcmp` call, with the major
difference that we now take the number of pixels to compare instead of the
number of bytes (to match the signature of float_near_ulp_array).
2025-09-01 19:27:53 +02:00
Niklas Haas
faf62cbdf5 tests/checkasm: increase number of runs in between measurements
Sometimes, when measuring very small functions, rdtsc is not accurate enough
to get a reliable measurement. This increases the number of runs inside the
inner loop from 4 to 32, which should help a lot. Less important when using
the more precise linux-perf API, but still useful.

There should be no user-visible change since the number of runs is adjusted
to keep the total time spent measuring the same.
2025-09-01 19:27:53 +02:00
Niklas Haas
f944a70fcc tests/checkasm: add check for vf_colordetect 2025-07-21 18:10:26 +02:00
Niklas Haas
bfab026298 tests/checkasm: add test for vf_blackdetect 2025-07-18 10:47:31 +02:00
Niklas Haas
9251af058a tests/checkasm: add scene_sad checkasm test 2025-07-17 12:26:05 +02:00
Andreas Rheinhardt
9b409ea1e6 configure: Factor mpegvideoencdsp out of mpegvideoenc
This will allow to relax the dependency on mpegvideoenc
for several codecs.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2025-06-21 22:08:52 +02:00
Shaun Loo
45bea45c7b checkasm: add vvc_sao
This is a part of Google Summer of Code 2023

AVX2:
 - vvc_sao.sao_band [OK]
 - vvc_sao.sao_edge [OK]
checkasm: all 54 tests passed
vvc_sao_band_8_8_c:                                    157.4 ( 1.00x)
vvc_sao_band_8_8_avx2:                                  30.7 ( 5.12x)
vvc_sao_band_8_10_c:                                   119.4 ( 1.00x)
vvc_sao_band_8_10_avx2:                                 29.2 ( 4.09x)
vvc_sao_band_8_12_c:                                   144.6 ( 1.00x)
vvc_sao_band_8_12_avx2:                                 30.0 ( 4.82x)
vvc_sao_band_16_8_c:                                   446.5 ( 1.00x)
vvc_sao_band_16_8_avx2:                                103.3 ( 4.32x)
vvc_sao_band_16_10_c:                                  399.2 ( 1.00x)
vvc_sao_band_16_10_avx2:                                64.3 ( 6.21x)
vvc_sao_band_16_12_c:                                  472.9 ( 1.00x)
vvc_sao_band_16_12_avx2:                                56.5 ( 8.37x)
vvc_sao_band_32_8_c:                                  2430.9 ( 1.00x)
vvc_sao_band_32_8_avx2:                                203.3 (11.96x)
vvc_sao_band_32_10_c:                                 1405.7 ( 1.00x)
vvc_sao_band_32_10_avx2:                               208.5 ( 6.74x)
vvc_sao_band_32_12_c:                                 2054.3 ( 1.00x)
vvc_sao_band_32_12_avx2:                               213.0 ( 9.64x)
vvc_sao_band_48_8_c:                                  3835.4 ( 1.00x)
vvc_sao_band_48_8_avx2:                                604.2 ( 6.35x)
vvc_sao_band_48_10_c:                                 3624.6 ( 1.00x)
vvc_sao_band_48_10_avx2:                               468.8 ( 7.73x)
vvc_sao_band_48_12_c:                                 3752.4 ( 1.00x)
vvc_sao_band_48_12_avx2:                               477.5 ( 7.86x)
vvc_sao_band_64_8_c:                                  6061.1 ( 1.00x)
vvc_sao_band_64_8_avx2:                                803.9 ( 7.54x)
vvc_sao_band_64_10_c:                                 6142.5 ( 1.00x)
vvc_sao_band_64_10_avx2:                               827.3 ( 7.43x)
vvc_sao_band_64_12_c:                                 6106.6 ( 1.00x)
vvc_sao_band_64_12_avx2:                               839.9 ( 7.27x)
vvc_sao_band_80_8_c:                                  9478.0 ( 1.00x)
vvc_sao_band_80_8_avx2:                               1516.7 ( 6.25x)
vvc_sao_band_80_10_c:                                10300.5 ( 1.00x)
vvc_sao_band_80_10_avx2:                              1298.7 ( 7.93x)
vvc_sao_band_80_12_c:                                 8941.1 ( 1.00x)
vvc_sao_band_80_12_avx2:                              1315.3 ( 6.80x)
vvc_sao_band_96_8_c:                                 13351.5 ( 1.00x)
vvc_sao_band_96_8_avx2:                               1815.4 ( 7.35x)
vvc_sao_band_96_10_c:                                13197.5 ( 1.00x)
vvc_sao_band_96_10_avx2:                              1872.4 ( 7.05x)
vvc_sao_band_96_12_c:                                11969.0 ( 1.00x)
vvc_sao_band_96_12_avx2:                              1895.8 ( 6.31x)
vvc_sao_band_112_8_c:                                19936.9 ( 1.00x)
vvc_sao_band_112_8_avx2:                              2802.3 ( 7.11x)
vvc_sao_band_112_10_c:                               19534.9 ( 1.00x)
vvc_sao_band_112_10_avx2:                             2635.0 ( 7.41x)
vvc_sao_band_112_12_c:                               16520.6 ( 1.00x)
vvc_sao_band_112_12_avx2:                             2591.8 ( 6.37x)
vvc_sao_band_128_8_c:                                25967.5 ( 1.00x)
vvc_sao_band_128_8_avx2:                              3155.3 ( 8.23x)
vvc_sao_band_128_10_c:                               24002.6 ( 1.00x)
vvc_sao_band_128_10_avx2:                             3374.6 ( 7.11x)
vvc_sao_band_128_12_c:                               20829.4 ( 1.00x)
vvc_sao_band_128_12_avx2:                             3377.0 ( 6.17x)
vvc_sao_edge_8_8_c:                                    174.6 ( 1.00x)
vvc_sao_edge_8_8_avx2:                                  37.0 ( 4.72x)
vvc_sao_edge_8_10_c:                                   174.4 ( 1.00x)
vvc_sao_edge_8_10_avx2:                                 58.5 ( 2.98x)
vvc_sao_edge_8_12_c:                                   171.1 ( 1.00x)
vvc_sao_edge_8_12_avx2:                                 58.5 ( 2.93x)
vvc_sao_edge_16_8_c:                                   677.7 ( 1.00x)
vvc_sao_edge_16_8_avx2:                                 72.2 ( 9.39x)
vvc_sao_edge_16_10_c:                                  724.8 ( 1.00x)
vvc_sao_edge_16_10_avx2:                               106.4 ( 6.81x)
vvc_sao_edge_16_12_c:                                  647.0 ( 1.00x)
vvc_sao_edge_16_12_avx2:                               106.6 ( 6.07x)
vvc_sao_edge_32_8_c:                                  3001.8 ( 1.00x)
vvc_sao_edge_32_8_avx2:                                157.6 (19.04x)
vvc_sao_edge_32_10_c:                                 3071.1 ( 1.00x)
vvc_sao_edge_32_10_avx2:                               404.2 ( 7.60x)
vvc_sao_edge_32_12_c:                                 2698.6 ( 1.00x)
vvc_sao_edge_32_12_avx2:                               398.8 ( 6.77x)
vvc_sao_edge_48_8_c:                                  6557.7 ( 1.00x)
vvc_sao_edge_48_8_avx2:                                380.1 (17.25x)
vvc_sao_edge_48_10_c:                                 6319.9 ( 1.00x)
vvc_sao_edge_48_10_avx2:                               896.3 ( 7.05x)
vvc_sao_edge_48_12_c:                                 6306.4 ( 1.00x)
vvc_sao_edge_48_12_avx2:                               885.5 ( 7.12x)
vvc_sao_edge_64_8_c:                                 11510.7 ( 1.00x)
vvc_sao_edge_64_8_avx2:                                504.1 (22.84x)
vvc_sao_edge_64_10_c:                                10917.4 ( 1.00x)
vvc_sao_edge_64_10_avx2:                              1608.3 ( 6.79x)
vvc_sao_edge_64_12_c:                                11499.8 ( 1.00x)
vvc_sao_edge_64_12_avx2:                              1586.4 ( 7.25x)
vvc_sao_edge_80_8_c:                                 18193.2 ( 1.00x)
vvc_sao_edge_80_8_avx2:                                930.2 (19.56x)
vvc_sao_edge_80_10_c:                                17984.3 ( 1.00x)
vvc_sao_edge_80_10_avx2:                              2420.9 ( 7.43x)
vvc_sao_edge_80_12_c:                                18289.4 ( 1.00x)
vvc_sao_edge_80_12_avx2:                              2412.1 ( 7.58x)
vvc_sao_edge_96_8_c:                                 26361.8 ( 1.00x)
vvc_sao_edge_96_8_avx2:                               1118.4 (23.57x)
vvc_sao_edge_96_10_c:                                26162.2 ( 1.00x)
vvc_sao_edge_96_10_avx2:                              3666.9 ( 7.13x)
vvc_sao_edge_96_12_c:                                25926.6 ( 1.00x)
vvc_sao_edge_96_12_avx2:                              3433.9 ( 7.55x)
vvc_sao_edge_112_8_c:                                36562.9 ( 1.00x)
vvc_sao_edge_112_8_avx2:                              1741.0 (21.00x)
vvc_sao_edge_112_10_c:                               38126.4 ( 1.00x)
vvc_sao_edge_112_10_avx2:                             5153.3 ( 7.40x)
vvc_sao_edge_112_12_c:                               36345.7 ( 1.00x)
vvc_sao_edge_112_12_avx2:                             4684.9 ( 7.76x)
vvc_sao_edge_128_8_c:                                46379.8 ( 1.00x)
vvc_sao_edge_128_8_avx2:                              2012.4 (23.05x)
vvc_sao_edge_128_10_c:                               47029.5 ( 1.00x)
vvc_sao_edge_128_10_avx2:                             6162.2 ( 7.63x)
vvc_sao_edge_128_12_c:                               49647.3 ( 1.00x)
vvc_sao_edge_128_12_avx2:                             6127.1 ( 8.10x)

Co-authored-by: Nuo Mi <nuomi2021@gmail.com>
2025-05-14 20:55:39 +08:00
Mark Thompson
d03c99441d lavc/apv: AVX2 transquant for x86-64
Typical checkasm result on Alder Lake:

decode_transquant_8_c:                                 464.2 ( 1.00x)
decode_transquant_8_avx2:                               86.2 ( 5.38x)
decode_transquant_10_c:                                481.6 ( 1.00x)
decode_transquant_10_avx2:                              83.5 ( 5.77x)
2025-04-27 15:52:30 +01:00
Rodger Combs
779cbc2b97 checkasm: add tests for AES
Signed-off-by: James Almer <jamrial@gmail.com>
2025-04-06 11:02:10 -03:00
Michael Niedermayer
d5ad860cd8 tests/checkasm/checkasm.c: Assert that aligned_w/h do not overflow
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2025-04-03 01:58:07 +02:00
Martin Storsjö
b863b81500 checkasm: Implement helpers for defining and checking padded rects
This backports similar functionality from dav1d, from commits
35d1d011fda4a92bcaf42d30ed137583b27d7f6d and
d130da9c315d5a1d3968d278bbee2238ad9051e7.

This allows detecting writes out of bounds, on all 4 sides of
the intended destination rectangle.

The bounds checking also can optionally allow small overwrites
(up to a specified alignment), while still checking for larger
overwrites past the intended allowed region.

Signed-off-by: Martin Storsjö <martin@martin.st>
2025-04-01 18:34:51 +03:00
Martin Storsjö
37c664a253 checkasm: Make checkasm_fail_func return whether we should print verbosely
This makes it easier to implement custom error printouts in tests.

This is a port of dav1d's commit
13a7d78655f8747c2cd01e8a48d44dcc7f60a8e5 into ffmpeg's checkasm.

Signed-off-by: Martin Storsjö <martin@martin.st>
2025-04-01 18:34:48 +03:00
Martin Storsjö
4b524649ff checkasm: Print benchmarks of C-only functions
This corresponds to commit 9278a14cf406f8edb5052c42b83750112bf5b515
in dav1d.

Omitting the C-only functions doesn't speed up benchmarking
anyway (as those has to be benchmarked before we know if we have
any corresponding assembly functions), and being able to benchmark
those functions without corresponding assembly can be valuable in
a number of cases.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-12-11 10:51:15 +02:00
Zhao Zhili
018ec4fe5f tests/checkasm: Simplify logic for WASI signal handling
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Reviewed-by: Martin Storsjö <martin@martin.st>
2024-12-06 10:48:11 +08:00
Zhao Zhili
ea3d21c349 tests/checkasm: Add partial support for wasm
WASI mssing signal and siglongjmp support. This patch workaround
build error and add simd128 flag. Please note that many tests use
large array on stack, so you need to increase the stack size when
build checkasm, e.g., --extra-ldflags='-Wl,-z,stack-size=10485760'

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-12-04 16:43:07 +08:00
Rémi Denis-Courmont
55aa81d5cc checkasm: add RISC-V vector width to arch info 2024-11-17 11:28:21 +02:00
Kyosuke Kawakami
711290f9a3 checkasm/diracdsp: test add_dirac_obmc
Signed-off-by: Kyosuke Kawakami <kawakami150708@gmail.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2024-11-15 13:44:53 -05:00
Martin Storsjö
c65a294f79 checkasm: Print the SVE vector length at startup
Signed-off-by: Martin Storsjö <martin@martin.st>
2024-09-27 00:06:55 +03:00
Martin Storsjö
e6eabb7ce7 aarch64: Add CPU feature flags for SVE and SVE2
Add code for detecting the feature on Linux and Windows.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-09-27 00:04:30 +03:00
Rémi Denis-Courmont
d9f594209f checkasm/riscv: print official extension names 2024-09-04 22:04:11 +03:00
J. Dekker
e758b24396 checkasm: add wildcompares for test & functions
Added:

  --test=<pattern>    Filter tests by glob style pattern.
  --bench[=<pattern>] Run benchmark and optionally filter functions
                      by glob style pattern.

Example:

$ ./tests/checkasm/checkasm --bench=yuva*
[...]
yuva420p_bgr24_8_c:                                     34.5 ( 1.00x)
yuva420p_bgr24_8_ssse3:                                 31.1 ( 1.11x)
yuva420p_bgr24_128_c:                                  310.6 ( 1.00x)
yuva420p_bgr24_128_ssse3:                              178.1 ( 1.74x)
yuva420p_bgr24_1080_c:                                2509.6 ( 1.00x)
yuva420p_bgr24_1080_ssse3:                            1471.5 ( 1.71x)
yuva420p_bgr24_1920_c:                                4462.6 ( 1.00x)
yuva420p_bgr24_1920_ssse3:                            2331.1 ( 1.91x)
[...]

Ported from dav1d.

Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-08-28 11:45:46 +02:00
J. Dekker
d0986709a8 checkasm: improve print format
Port dav1d's checkasm output format to FFmpeg's checkasm, includes
relative speedups and aligns results.

Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-08-28 11:45:46 +02:00
J. Dekker
03f26549cd checkasm: print only results to stdout
Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-08-28 11:45:46 +02:00
J. Dekker
42528ff835 checkasm: add csv/tsv bench output
When collecting performance information from checkasm it is common
to parse the output for use in graphs to compare vs different
architectures.

Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-08-28 11:45:46 +02:00
Ramiro Polla
834964ce1a checkasm/mpegvideoencdsp: add pix_sum, pix_norm1, and draw_edges 2024-08-26 12:48:09 +02:00
Ramiro Polla
a2e01cade8 checkasm/yuv2yuv: add tests for semiplanar unscaled converters 2024-08-26 11:04:46 +02:00
Rémi Denis-Courmont
d1326b6347 lavu/riscv: drop probing for zba CPU capability 2024-08-05 21:16:26 +03:00
Rémi Denis-Courmont
1b2a925e94 lavc/riscv: drop probing for F & D extensions
F and D extensions are included in all RISC-V application profiles ever
made (so starting from RV64GC a.k.a. RVA20). Realistically they need to be
selected at compilation time.

Currently, there are no consumers for these two flags. If there is ever a
need to reintroduce F- or D-specific optimisations, we can always use
__riscv_f or __riscv_d compiler predefined macros respectively.
2024-08-01 22:56:50 +03:00
Rémi Denis-Courmont
45d7078a21 lavu/riscv: add CPU flag for B bit manipulations
The B extension was finally ratified in May 2024, encompassing:
- Zba (addresses),
- Zbb (basics) and
- Zbs (single bits).
It does not include Zbc (base-2 polynomials).
2024-07-25 23:09:58 +03:00
Ramiro Polla
1fb77347c8 checkasm: add tests for yuv2rgb 2024-06-28 14:49:49 +02:00
Zhao Zhili
74b4e550cb tests/checkasm: Remove check on linux perf fd in uninit
The check should be >= 0, not > 0. The check itself is redundant
since uninit only being called after init is success.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-06-18 15:23:46 +08:00
Ramiro Polla
874152033d checkasm: add tests for {lum,chr}ConvertRange 2024-06-16 00:34:24 +02:00
Rémi Denis-Courmont
8d117024fe checkasm: disable unaligned access emulation
The OS may silently fix (emulate) unaligned hardware access exceptions.
This is extremely slow and code should be fixed not to rely on unaligned
access on affected hardware. Accordingly this requests that the OS
disable emulation and instead throw Bus error, which will be caught by
checkasm's signal handler.

This has no effects if the hardware supports unaligned access in
hardware, since no exceptions are generated. prctl() will fail safe in
that case.
2024-06-07 17:53:05 +03:00
Rémi Denis-Courmont
fc85aff72f checkasm: add linear least square tests 2024-06-01 18:05:58 +03:00
Rémi Denis-Courmont
44f7f6e010 checkasm: add h263dsp.{h,v}_loop_filter 2024-05-27 22:42:07 +03:00
Rémi Denis-Courmont
d03cdfa2b6 checkasm/riscv: test misaligned before V
Otherwise V functions mask scalar misaligned ones.
2024-05-24 17:53:43 +03:00
Lynne
d43e123837 checkasm: print bench runs when benchmarking
Helps make sense of the possible noise in the results.
2024-05-21 17:48:48 +02:00
J. Dekker
b1adf6d1d0 checkasm: add runs argument to adjust during bench
Some timers on certain device and test combinations can produce noisy
results, affecting the reliability of performance measurements. One
notable example of this is the Canaan K230 RISC-V development board.

An option to adjust the number of samples by an exponent (--runs) has
been added, allowing developers to increase the sample count for more
reliable results.

Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-05-21 16:47:45 +02:00
Rémi Denis-Courmont
b410439263 lavu/riscv: CPU flag for fast misaligned accesses 2024-05-14 19:50:00 +03:00
Wu Jianhua
9ef6e15b04 tests/checkasm: add checkasm_check_vvc_alf and check_alf_filter
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
2024-05-14 19:21:35 +08:00
Rémi Denis-Courmont
01c5f4ad9f riscv: add Zvbb vector bit manipulation extension 2024-05-11 11:38:49 +03:00
Ramiro Polla
250c0defa2 checkasm: add test for fdct
Reviewed-by: Martin Storsjö <martin@martin.st>
Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
2024-05-11 10:28:59 +02:00
sunyuechi
cfa8d2488d checkasm/rv40dsp: add chroma_mc test
This is similar to h264.

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-05-03 18:00:53 +03:00