mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-11-29 05:57:37 +02:00

180 Commits

Author SHA1 Message Date
Shaun Loo
45bea45c7b checkasm: add vvc_sao
This is a part of Google Summer of Code 2023

AVX2:
 - vvc_sao.sao_band [OK]
 - vvc_sao.sao_edge [OK]
checkasm: all 54 tests passed
vvc_sao_band_8_8_c:                                    157.4 ( 1.00x)
vvc_sao_band_8_8_avx2:                                  30.7 ( 5.12x)
vvc_sao_band_8_10_c:                                   119.4 ( 1.00x)
vvc_sao_band_8_10_avx2:                                 29.2 ( 4.09x)
vvc_sao_band_8_12_c:                                   144.6 ( 1.00x)
vvc_sao_band_8_12_avx2:                                 30.0 ( 4.82x)
vvc_sao_band_16_8_c:                                   446.5 ( 1.00x)
vvc_sao_band_16_8_avx2:                                103.3 ( 4.32x)
vvc_sao_band_16_10_c:                                  399.2 ( 1.00x)
vvc_sao_band_16_10_avx2:                                64.3 ( 6.21x)
vvc_sao_band_16_12_c:                                  472.9 ( 1.00x)
vvc_sao_band_16_12_avx2:                                56.5 ( 8.37x)
vvc_sao_band_32_8_c:                                  2430.9 ( 1.00x)
vvc_sao_band_32_8_avx2:                                203.3 (11.96x)
vvc_sao_band_32_10_c:                                 1405.7 ( 1.00x)
vvc_sao_band_32_10_avx2:                               208.5 ( 6.74x)
vvc_sao_band_32_12_c:                                 2054.3 ( 1.00x)
vvc_sao_band_32_12_avx2:                               213.0 ( 9.64x)
vvc_sao_band_48_8_c:                                  3835.4 ( 1.00x)
vvc_sao_band_48_8_avx2:                                604.2 ( 6.35x)
vvc_sao_band_48_10_c:                                 3624.6 ( 1.00x)
vvc_sao_band_48_10_avx2:                               468.8 ( 7.73x)
vvc_sao_band_48_12_c:                                 3752.4 ( 1.00x)
vvc_sao_band_48_12_avx2:                               477.5 ( 7.86x)
vvc_sao_band_64_8_c:                                  6061.1 ( 1.00x)
vvc_sao_band_64_8_avx2:                                803.9 ( 7.54x)
vvc_sao_band_64_10_c:                                 6142.5 ( 1.00x)
vvc_sao_band_64_10_avx2:                               827.3 ( 7.43x)
vvc_sao_band_64_12_c:                                 6106.6 ( 1.00x)
vvc_sao_band_64_12_avx2:                               839.9 ( 7.27x)
vvc_sao_band_80_8_c:                                  9478.0 ( 1.00x)
vvc_sao_band_80_8_avx2:                               1516.7 ( 6.25x)
vvc_sao_band_80_10_c:                                10300.5 ( 1.00x)
vvc_sao_band_80_10_avx2:                              1298.7 ( 7.93x)
vvc_sao_band_80_12_c:                                 8941.1 ( 1.00x)
vvc_sao_band_80_12_avx2:                              1315.3 ( 6.80x)
vvc_sao_band_96_8_c:                                 13351.5 ( 1.00x)
vvc_sao_band_96_8_avx2:                               1815.4 ( 7.35x)
vvc_sao_band_96_10_c:                                13197.5 ( 1.00x)
vvc_sao_band_96_10_avx2:                              1872.4 ( 7.05x)
vvc_sao_band_96_12_c:                                11969.0 ( 1.00x)
vvc_sao_band_96_12_avx2:                              1895.8 ( 6.31x)
vvc_sao_band_112_8_c:                                19936.9 ( 1.00x)
vvc_sao_band_112_8_avx2:                              2802.3 ( 7.11x)
vvc_sao_band_112_10_c:                               19534.9 ( 1.00x)
vvc_sao_band_112_10_avx2:                             2635.0 ( 7.41x)
vvc_sao_band_112_12_c:                               16520.6 ( 1.00x)
vvc_sao_band_112_12_avx2:                             2591.8 ( 6.37x)
vvc_sao_band_128_8_c:                                25967.5 ( 1.00x)
vvc_sao_band_128_8_avx2:                              3155.3 ( 8.23x)
vvc_sao_band_128_10_c:                               24002.6 ( 1.00x)
vvc_sao_band_128_10_avx2:                             3374.6 ( 7.11x)
vvc_sao_band_128_12_c:                               20829.4 ( 1.00x)
vvc_sao_band_128_12_avx2:                             3377.0 ( 6.17x)
vvc_sao_edge_8_8_c:                                    174.6 ( 1.00x)
vvc_sao_edge_8_8_avx2:                                  37.0 ( 4.72x)
vvc_sao_edge_8_10_c:                                   174.4 ( 1.00x)
vvc_sao_edge_8_10_avx2:                                 58.5 ( 2.98x)
vvc_sao_edge_8_12_c:                                   171.1 ( 1.00x)
vvc_sao_edge_8_12_avx2:                                 58.5 ( 2.93x)
vvc_sao_edge_16_8_c:                                   677.7 ( 1.00x)
vvc_sao_edge_16_8_avx2:                                 72.2 ( 9.39x)
vvc_sao_edge_16_10_c:                                  724.8 ( 1.00x)
vvc_sao_edge_16_10_avx2:                               106.4 ( 6.81x)
vvc_sao_edge_16_12_c:                                  647.0 ( 1.00x)
vvc_sao_edge_16_12_avx2:                               106.6 ( 6.07x)
vvc_sao_edge_32_8_c:                                  3001.8 ( 1.00x)
vvc_sao_edge_32_8_avx2:                                157.6 (19.04x)
vvc_sao_edge_32_10_c:                                 3071.1 ( 1.00x)
vvc_sao_edge_32_10_avx2:                               404.2 ( 7.60x)
vvc_sao_edge_32_12_c:                                 2698.6 ( 1.00x)
vvc_sao_edge_32_12_avx2:                               398.8 ( 6.77x)
vvc_sao_edge_48_8_c:                                  6557.7 ( 1.00x)
vvc_sao_edge_48_8_avx2:                                380.1 (17.25x)
vvc_sao_edge_48_10_c:                                 6319.9 ( 1.00x)
vvc_sao_edge_48_10_avx2:                               896.3 ( 7.05x)
vvc_sao_edge_48_12_c:                                 6306.4 ( 1.00x)
vvc_sao_edge_48_12_avx2:                               885.5 ( 7.12x)
vvc_sao_edge_64_8_c:                                 11510.7 ( 1.00x)
vvc_sao_edge_64_8_avx2:                                504.1 (22.84x)
vvc_sao_edge_64_10_c:                                10917.4 ( 1.00x)
vvc_sao_edge_64_10_avx2:                              1608.3 ( 6.79x)
vvc_sao_edge_64_12_c:                                11499.8 ( 1.00x)
vvc_sao_edge_64_12_avx2:                              1586.4 ( 7.25x)
vvc_sao_edge_80_8_c:                                 18193.2 ( 1.00x)
vvc_sao_edge_80_8_avx2:                                930.2 (19.56x)
vvc_sao_edge_80_10_c:                                17984.3 ( 1.00x)
vvc_sao_edge_80_10_avx2:                              2420.9 ( 7.43x)
vvc_sao_edge_80_12_c:                                18289.4 ( 1.00x)
vvc_sao_edge_80_12_avx2:                              2412.1 ( 7.58x)
vvc_sao_edge_96_8_c:                                 26361.8 ( 1.00x)
vvc_sao_edge_96_8_avx2:                               1118.4 (23.57x)
vvc_sao_edge_96_10_c:                                26162.2 ( 1.00x)
vvc_sao_edge_96_10_avx2:                              3666.9 ( 7.13x)
vvc_sao_edge_96_12_c:                                25926.6 ( 1.00x)
vvc_sao_edge_96_12_avx2:                              3433.9 ( 7.55x)
vvc_sao_edge_112_8_c:                                36562.9 ( 1.00x)
vvc_sao_edge_112_8_avx2:                              1741.0 (21.00x)
vvc_sao_edge_112_10_c:                               38126.4 ( 1.00x)
vvc_sao_edge_112_10_avx2:                             5153.3 ( 7.40x)
vvc_sao_edge_112_12_c:                               36345.7 ( 1.00x)
vvc_sao_edge_112_12_avx2:                             4684.9 ( 7.76x)
vvc_sao_edge_128_8_c:                                46379.8 ( 1.00x)
vvc_sao_edge_128_8_avx2:                              2012.4 (23.05x)
vvc_sao_edge_128_10_c:                               47029.5 ( 1.00x)
vvc_sao_edge_128_10_avx2:                             6162.2 ( 7.63x)
vvc_sao_edge_128_12_c:                               49647.3 ( 1.00x)
vvc_sao_edge_128_12_avx2:                             6127.1 ( 8.10x)

Co-authored-by: Nuo Mi <nuomi2021@gmail.com>
2025-05-14 20:55:39 +08:00
Mark Thompson
d03c99441d lavc/apv: AVX2 transquant for x86-64
Typical checkasm result on Alder Lake:

decode_transquant_8_c:                                 464.2 ( 1.00x)
decode_transquant_8_avx2:                               86.2 ( 5.38x)
decode_transquant_10_c:                                481.6 ( 1.00x)
decode_transquant_10_avx2:                              83.5 ( 5.77x)
2025-04-27 15:52:30 +01:00
Rodger Combs
779cbc2b97 checkasm: add tests for AES
Signed-off-by: James Almer <jamrial@gmail.com>
2025-04-06 11:02:10 -03:00
Michael Niedermayer
d5ad860cd8 tests/checkasm/checkasm.c: Assert that aligned_w/h do not overflow
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
2025-04-03 01:58:07 +02:00
Martin Storsjö
b863b81500 checkasm: Implement helpers for defining and checking padded rects
This backports similar functionality from dav1d, from commits
35d1d011fda4a92bcaf42d30ed137583b27d7f6d and
d130da9c315d5a1d3968d278bbee2238ad9051e7.

This allows detecting writes out of bounds, on all 4 sides of
the intended destination rectangle.

The bounds checking also can optionally allow small overwrites
(up to a specified alignment), while still checking for larger
overwrites past the intended allowed region.

Signed-off-by: Martin Storsjö <martin@martin.st>
2025-04-01 18:34:51 +03:00
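The out-of-bounds detection described in b863b81500 can be illustrated with a guard-byte sketch. This is only an illustration of the idea under stated assumptions, not the checkasm helpers themselves: PaddedRect, padded_rect_init, padded_rect_check, and the PAD and GUARD values are hypothetical names and numbers chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAD   8     /* guard rows/columns on every side (assumed value) */
#define GUARD 0xAA  /* byte pattern stored in the padding (assumed value) */
#define MAX_W 64
#define MAX_H 64

/* Hypothetical container: a destination rectangle surrounded by padding. */
typedef struct PaddedRect {
    uint8_t buf[(MAX_H + 2 * PAD) * (MAX_W + 2 * PAD)];
    int w, h;          /* size of the intended destination rectangle */
    ptrdiff_t stride;  /* bytes per row, padding included */
} PaddedRect;

/* Fill everything with the guard pattern and return the top-left pixel. */
static uint8_t *padded_rect_init(PaddedRect *r, int w, int h)
{
    if (w <= 0 || h <= 0 || w > MAX_W || h > MAX_H)
        return NULL;
    r->w = w;
    r->h = h;
    r->stride = w + 2 * PAD;
    memset(r->buf, GUARD, sizeof(r->buf));
    return r->buf + PAD * r->stride + PAD;
}

/* Return 0 if the padding is intact on all four sides, 1 otherwise.
 * Writes are tolerated up to the width/height rounded up to
 * align_w/align_h (pass 1 to tolerate nothing). */
static int padded_rect_check(const PaddedRect *r, int align_w, int align_h)
{
    int allowed_w = (r->w + align_w - 1) / align_w * align_w;
    int allowed_h = (r->h + align_h - 1) / align_h * align_h;

    if (allowed_w > r->w + PAD) allowed_w = r->w + PAD;
    if (allowed_h > r->h + PAD) allowed_h = r->h + PAD;

    for (int y = 0; y < r->h + 2 * PAD; y++) {
        for (int x = 0; x < r->w + 2 * PAD; x++) {
            int inside_y = y >= PAD && y < PAD + allowed_h;
            int inside_x = x >= PAD && x < PAD + allowed_w;
            if (inside_y && inside_x)
                continue;  /* the tested function may write here */
            if (r->buf[y * r->stride + x] != GUARD)
                return 1;  /* guard byte was overwritten */
        }
    }
    return 0;
}
```

A test would call padded_rect_init(), hand the returned pointer and stride to the function under test, and then call padded_rect_check() with the alignment that function is allowed to overwrite.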
Martin Storsjö
37c664a253 checkasm: Make checkasm_fail_func return whether we should print verbosely
This makes it easier to implement custom error printouts in tests.

This is a port of dav1d's commit
13a7d78655f8747c2cd01e8a48d44dcc7f60a8e5 into ffmpeg's checkasm.

Signed-off-by: Martin Storsjö <martin@martin.st>
2025-04-01 18:34:48 +03:00
Martin Storsjö
4b524649ff checkasm: Print benchmarks of C-only functions
This corresponds to commit 9278a14cf406f8edb5052c42b83750112bf5b515
in dav1d.

Omitting the C-only functions doesn't speed up benchmarking
anyway (as those have to be benchmarked before we know if we have
any corresponding assembly functions), and being able to benchmark
those functions without corresponding assembly can be valuable in
a number of cases.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-12-11 10:51:15 +02:00
Zhao Zhili
018ec4fe5f tests/checkasm: Simplify logic for WASI signal handling
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
Reviewed-by: Martin Storsjö <martin@martin.st>
2024-12-06 10:48:11 +08:00
Zhao Zhili
ea3d21c349 tests/checkasm: Add partial support for wasm
WASI is missing signal and siglongjmp support. This patch works around
the build error and adds the simd128 flag. Please note that many tests use
large arrays on the stack, so you need to increase the stack size when
building checkasm, e.g., --extra-ldflags='-Wl,-z,stack-size=10485760'

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-12-04 16:43:07 +08:00
Rémi Denis-Courmont
55aa81d5cc checkasm: add RISC-V vector width to arch info
2024-11-17 11:28:21 +02:00
Kyosuke Kawakami
711290f9a3 checkasm/diracdsp: test add_dirac_obmc
Signed-off-by: Kyosuke Kawakami <kawakami150708@gmail.com>
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2024-11-15 13:44:53 -05:00
Martin Storsjö
c65a294f79 checkasm: Print the SVE vector length at startup
Signed-off-by: Martin Storsjö <martin@martin.st>
2024-09-27 00:06:55 +03:00
Martin Storsjö
e6eabb7ce7 aarch64: Add CPU feature flags for SVE and SVE2
Add code for detecting the feature on Linux and Windows.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-09-27 00:04:30 +03:00
Rémi Denis-Courmont
d9f594209f checkasm/riscv: print official extension names
2024-09-04 22:04:11 +03:00
J. Dekker
e758b24396 checkasm: add wildcompares for test & functions
Added:

  --test=<pattern>    Filter tests by glob style pattern.
  --bench[=<pattern>] Run benchmark and optionally filter functions
                      by glob style pattern.

Example:

$ ./tests/checkasm/checkasm --bench=yuva*
[...]
yuva420p_bgr24_8_c:                                     34.5 ( 1.00x)
yuva420p_bgr24_8_ssse3:                                 31.1 ( 1.11x)
yuva420p_bgr24_128_c:                                  310.6 ( 1.00x)
yuva420p_bgr24_128_ssse3:                              178.1 ( 1.74x)
yuva420p_bgr24_1080_c:                                2509.6 ( 1.00x)
yuva420p_bgr24_1080_ssse3:                            1471.5 ( 1.71x)
yuva420p_bgr24_1920_c:                                4462.6 ( 1.00x)
yuva420p_bgr24_1920_ssse3:                            2331.1 ( 1.91x)
[...]

Ported from dav1d.

Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-08-28 11:45:46 +02:00
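The glob-style filtering added in e758b24396 can be sketched with a small matcher that understands only the '*' wildcard. This is an illustrative sketch, not the function used by checkasm or dav1d; glob_match is a hypothetical name, and whether the real option supports other wildcard characters is not stated in the commit message.

```c
#include <stddef.h>

/* Hypothetical helper: return 1 if str matches pattern, where '*' in the
 * pattern matches any run of characters (including an empty one). */
static int glob_match(const char *pattern, const char *str)
{
    for (; *pattern; pattern++, str++) {
        if (*pattern == '*') {
            /* Collapse consecutive stars. */
            while (*pattern == '*')
                pattern++;
            /* A trailing star matches whatever is left. */
            if (!*pattern)
                return 1;
            /* Try to match the rest of the pattern at every position. */
            for (; *str; str++)
                if (glob_match(pattern, str))
                    return 1;
            return 0;
        }
        if (*pattern != *str)
            return 0;
    }
    /* Pattern exhausted: match only if the string is exhausted too. */
    return *str == '\0';
}
```

With this sketch, the pattern yuva* from the example above would select yuva420p_bgr24_8_c but not yuv420p_bgr24_8_c.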
J. Dekker
d0986709a8 checkasm: improve print format
Port dav1d's checkasm output format to FFmpeg's checkasm, includes
relative speedups and aligns results.

Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-08-28 11:45:46 +02:00
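The output format described in d0986709a8 (aligned columns plus a speedup relative to the C reference) can be sketched as below. The column widths and the helper name format_bench_line are assumptions made for the example; the commit message does not specify them.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format one result line into buf.
 * ref_cycles is the timing of the C reference, so the C row itself
 * reports 1.00x. Column widths are assumed, not taken from checkasm. */
static int format_bench_line(char *buf, size_t size, const char *name,
                             double cycles, double ref_cycles)
{
    double speedup = cycles > 0.0 ? ref_cycles / cycles : 0.0;

    return snprintf(buf, size, "%-44s %10.1f (%5.2fx)\n",
                    name, cycles, speedup);
}
```

Note that ratios recomputed from the rounded values printed in a log may differ in the last digit from the printed speedup, since the tool presumably works from unrounded measurements.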
J. Dekker
03f26549cd checkasm: print only results to stdout
Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-08-28 11:45:46 +02:00
J. Dekker
42528ff835 checkasm: add csv/tsv bench output
When collecting performance information from checkasm it is common
to parse the output for use in graphs that compare different
architectures.

Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-08-28 11:45:46 +02:00
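The machine-readable output motivated in 42528ff835 can be sketched as one row per function with a configurable separator. The row layout (name, then cycle count) and the helper name format_bench_row are assumptions for illustration; the commit message does not describe the actual columns.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write "name<sep>cycles\n" into buf.
 * Pass ',' for CSV or '\t' for TSV. */
static int format_bench_row(char *buf, size_t size, char sep,
                            const char *name, double cycles)
{
    return snprintf(buf, size, "%s%c%.1f\n", name, sep, cycles);
}
```

Rows in this shape can be loaded directly by spreadsheet or plotting tools without parsing the aligned human-readable output.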
Ramiro Polla
834964ce1a checkasm/mpegvideoencdsp: add pix_sum, pix_norm1, and draw_edges
2024-08-26 12:48:09 +02:00
Ramiro Polla
a2e01cade8 checkasm/yuv2yuv: add tests for semiplanar unscaled converters
2024-08-26 11:04:46 +02:00
Rémi Denis-Courmont
d1326b6347 lavu/riscv: drop probing for zba CPU capability
2024-08-05 21:16:26 +03:00
Rémi Denis-Courmont
1b2a925e94 lavc/riscv: drop probing for F & D extensions
F and D extensions are included in all RISC-V application profiles ever
made (so starting from RV64GC a.k.a. RVA20). Realistically they need to be
selected at compilation time.

Currently, there are no consumers for these two flags. If there is ever a
need to reintroduce F- or D-specific optimisations, we can always use
__riscv_f or __riscv_d compiler predefined macros respectively.
2024-08-01 22:56:50 +03:00
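The compile-time selection suggested in 1b2a925e94 can be sketched with the compiler predefined macros it names. The function name riscv_fp_level and its return values are hypothetical and exist only for this example; on non-RISC-V targets neither macro is defined and the sketch returns 0.

```c
/* Hypothetical helper: report which floating-point extensions the
 * compiler was told to target, using predefined macros instead of
 * run-time CPU flag probing. */
static int riscv_fp_level(void)
{
#if defined(__riscv_d)
    return 2;  /* double precision (D) enabled at compile time */
#elif defined(__riscv_f)
    return 1;  /* single precision (F) enabled at compile time */
#else
    return 0;  /* no hardware FP selected, or not a RISC-V target */
#endif
}
```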
Rémi Denis-Courmont
45d7078a21 lavu/riscv: add CPU flag for B bit manipulations
The B extension was finally ratified in May 2024, encompassing:
- Zba (addresses),
- Zbb (basics) and
- Zbs (single bits).
It does not include Zbc (base-2 polynomials).
2024-07-25 23:09:58 +03:00
Ramiro Polla
1fb77347c8 checkasm: add tests for yuv2rgb
2024-06-28 14:49:49 +02:00
Zhao Zhili
74b4e550cb tests/checkasm: Remove check on linux perf fd in uninit
The check should be >= 0, not > 0. The check itself is redundant
since uninit is only called after init succeeds.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-06-18 15:23:46 +08:00
Ramiro Polla
874152033d checkasm: add tests for {lum,chr}ConvertRange
2024-06-16 00:34:24 +02:00
Rémi Denis-Courmont
8d117024fe checkasm: disable unaligned access emulation
The OS may silently fix (emulate) unaligned hardware access exceptions.
This is extremely slow and code should be fixed not to rely on unaligned
access on affected hardware. Accordingly this requests that the OS
disable emulation and instead throw Bus error, which will be caught by
checkasm's signal handler.

This has no effects if the hardware supports unaligned access in
hardware, since no exceptions are generated. prctl() will fail safe in
that case.
2024-06-07 17:53:05 +03:00
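The request described in 8d117024fe can be sketched with the Linux prctl() interface. This is a minimal sketch assuming the PR_SET_UNALIGN operation with PR_UNALIGN_SIGBUS is what is meant; the wrapper name disable_unaligned_emulation is hypothetical. On hardware or kernels that do not support the operation, prctl() simply returns -1, which matches the "fail safe" behaviour mentioned in the message.

```c
#include <stdio.h>
#if defined(__linux__)
#include <sys/prctl.h>
#endif

/* Hypothetical wrapper: ask the kernel to deliver SIGBUS on unaligned
 * accesses instead of silently emulating them.
 * Returns 0 on success, -1 if unsupported or unavailable. */
static int disable_unaligned_emulation(void)
{
#if defined(__linux__) && defined(PR_SET_UNALIGN) && defined(PR_UNALIGN_SIGBUS)
    return prctl(PR_SET_UNALIGN, PR_UNALIGN_SIGBUS);
#else
    return -1;  /* not Linux, or the headers lack the operation */
#endif
}
```

A failure here is not an error for the caller: it only means that unaligned accesses will not be turned into signals on this system.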
Rémi Denis-Courmont
fc85aff72f checkasm: add linear least square tests
2024-06-01 18:05:58 +03:00
Rémi Denis-Courmont
44f7f6e010 checkasm: add h263dsp.{h,v}_loop_filter
2024-05-27 22:42:07 +03:00
Rémi Denis-Courmont
d03cdfa2b6 checkasm/riscv: test misaligned before V
Otherwise V functions mask scalar misaligned ones.
2024-05-24 17:53:43 +03:00
Lynne
d43e123837 checkasm: print bench runs when benchmarking
Helps make sense of the possible noise in the results.
2024-05-21 17:48:48 +02:00
J. Dekker
b1adf6d1d0 checkasm: add runs argument to adjust during bench
Some timers on certain device and test combinations can produce noisy
results, affecting the reliability of performance measurements. One
notable example of this is the Canaan K230 RISC-V development board.

An option to adjust the number of samples by an exponent (--runs) has
been added, allowing developers to increase the sample count for more
reliable results.

Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-05-21 16:47:45 +02:00
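The exponent-based --runs option described in b1adf6d1d0 can be sketched as follows. The helper name parse_runs and the upper limit of 30 on the exponent are assumptions for this example; the commit message only states that the sample count is adjusted by an exponent.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse "--runs=<exponent>" and store the resulting
 * sample count (2^exponent) in *runs.
 * Returns 0 on success, -1 if the argument is malformed. */
static int parse_runs(const char *arg, unsigned *runs)
{
    static const char prefix[] = "--runs=";
    const size_t prefix_len = sizeof(prefix) - 1;
    unsigned long exponent;
    char *end;

    if (strncmp(arg, prefix, prefix_len))
        return -1;
    arg += prefix_len;

    exponent = strtoul(arg, &end, 10);
    /* Reject empty values, trailing junk, and shifts that would overflow
     * (the limit of 30 is an assumption of this sketch). */
    if (end == arg || *end || exponent > 30)
        return -1;

    *runs = 1u << exponent;
    return 0;
}
```

Under this sketch, each increment of the exponent doubles the number of samples, so --runs=10 would yield 1024 samples and --runs=12 would yield 4096.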
Rémi Denis-Courmont
b410439263 lavu/riscv: CPU flag for fast misaligned accesses
2024-05-14 19:50:00 +03:00
Wu Jianhua
9ef6e15b04 tests/checkasm: add checkasm_check_vvc_alf and check_alf_filter
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
2024-05-14 19:21:35 +08:00
Rémi Denis-Courmont
01c5f4ad9f riscv: add Zvbb vector bit manipulation extension
2024-05-11 11:38:49 +03:00
Ramiro Polla
250c0defa2 checkasm: add test for fdct
Reviewed-by: Martin Storsjö <martin@martin.st>
Reviewed-by: Rémi Denis-Courmont <remi@remlab.net>
2024-05-11 10:28:59 +02:00
sunyuechi
cfa8d2488d checkasm/rv40dsp: add chroma_mc test
This is similar to h264.

Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-05-03 18:00:53 +03:00
J. Dekker
985fdf8e3d tests/checkasm: add exclude_guest for non-x86 linux perf
The exclude_guest option only has an effect on x86. Omitting
'exclude_guest' defaults to zero which implies that you can count guest
events should you run one. Some non-x86 kernels just ignore it, while
others (e.g. the Asahi Linux kernels) require the user to explicitly set
the option to 1, i.e. the only behaviour that makes sense when counting
guest events isn't supported.

Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-04-10 13:37:40 +02:00
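The exclude_guest handling described in 985fdf8e3d can be sketched by filling in a perf_event_attr for a cycle counter. This is a Linux-only sketch; the helper name init_cycle_counter_attr and the other attribute choices (hardware cycle counter, kernel and hypervisor excluded) are assumptions for the example, and only the exclude_guest setting on non-x86 reflects the commit message.

```c
#include <string.h>
#include <linux/perf_event.h>

/* Hypothetical helper: prepare the attributes for a user-space cycle
 * counter. The caller would pass the struct to perf_event_open(). */
static void init_cycle_counter_attr(struct perf_event_attr *attr)
{
    memset(attr, 0, sizeof(*attr));
    attr->type           = PERF_TYPE_HARDWARE;
    attr->size           = sizeof(*attr);
    attr->config         = PERF_COUNT_HW_CPU_CYCLES;
    attr->disabled       = 1;  /* start stopped, enable around the test */
    attr->exclude_kernel = 1;
    attr->exclude_hv     = 1;
#if !defined(__i386__) && !defined(__x86_64__)
    /* Some non-x86 kernels reject the event unless guest events are
     * explicitly excluded. */
    attr->exclude_guest  = 1;
#endif
}
```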
sunyuechi
6728edadde checkasm/rv34dsp: add rv34_inv_transform_dc test
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-02-17 14:33:35 +02:00
Wu Jianhua
fb26c7bfd4 tests/checkasm: add checkasm_check_vvc_mc
Signed-off-by: Wu Jianhua <toqsxw@outlook.com>
2024-02-01 19:54:29 +08:00
Martin Storsjö
ac40c3bb07 checkasm: Test whether the native FFmpeg timers work
On some platforms (in particular, ARM/AArch64), the implementation
of AV_READ_TIME() may use a privileged instruction - in such
cases, benchmarking just fails with a SIGILL.

Instead of crashing, try executing AV_READ_TIME() once within
a region with the signal handler active, to allow gracefully
informing the user about the issue.

This matches the dav1d checkasm commit
95a192549a448b70d9542e840c4e34b60d09b093.

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-01-15 23:29:12 +02:00
sunyuechi
202a35ecdb checkasm/svqenc: add ssd_int8_vs_int16 test
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2024-01-15 19:03:03 +02:00
Martin Storsjö
65739691b9 checkasm: Generalize crash handling
This replaces the riscv specific handling from
7212466e73 (which essentially is
reverted), with a different implementation of the same (plus a bit
more), based on the corresponding feature in dav1d's checkasm,
supporting both Unix and Windows.

See in particular the dav1d commits
0b6ee30eab2400e4f85b735ad29a68a842c34e21,
0421f787ea592fd2cc74c887f20b8dc31393788b,
8501a4b20135f93a4c3b426468e2240e872949c5 and
d23e87f7aee26ddcf5f7a2e185112031477599a7, authored by Henrik Gramner.

The overall approach compared to the existing implementation for
riscv is the same; set up a signal handler, store the state with
sigsetjmp, jump out of the crashing function with siglongjmp.

The main difference is in what happens when the signal handler
is invoked. In the previous implementation, it would resume from
right before calling the crashing function, and then skip that call
based on the setjmp return value.

In the imported implementation from dav1d, we return to right before
the check_func() call, which will skip testing the current function
(as the pointer is the same as it was before).

Other differences are:
- Support for other signal handling mechanisms (Windows
  AddVectoredExceptionHandler)
- Using RtlCaptureContext/RtlRestoreContext instead of setjmp/longjmp
  on Windows with SEH
- Only catching signals once per function - if more than one
  signal is delivered before signal handling is reenabled, any
  signal is handled as it would without our handler
- Not using an arch specific signal handler written in assembly

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-01-11 14:48:53 +02:00
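The overall approach described in 65739691b9 (set up a signal handler, store the state with sigsetjmp, jump out of the crashing function with siglongjmp, and catch only once per function) can be sketched for POSIX systems as below. All names here (on_fatal_signal, install_handlers, run_guarded, and the two demo functions) are hypothetical, and the sketch does not cover the Windows mechanisms listed above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf crash_ctx;
static volatile sig_atomic_t catching;       /* 1 while a test is guarded */
static volatile sig_atomic_t caught_signal;  /* signal number that fired */

static void on_fatal_signal(int sig)
{
    if (catching) {
        /* Catch only once; a second signal gets the default action. */
        catching = 0;
        caught_signal = sig;
        siglongjmp(crash_ctx, 1);
    }
    signal(sig, SIG_DFL);
    raise(sig);
}

static void install_handlers(void)
{
    static const int signals[] = { SIGILL, SIGSEGV, SIGFPE, SIGBUS };
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_fatal_signal;
    sigemptyset(&sa.sa_mask);
    for (size_t i = 0; i < sizeof(signals) / sizeof(signals[0]); i++)
        sigaction(signals[i], &sa, NULL);
}

/* Run fn; return 0 if it completed, or the signal number if it crashed. */
static int run_guarded(void (*fn)(void))
{
    /* Saving the signal mask (second argument 1) lets siglongjmp
     * unblock the signal again after the jump. */
    if (sigsetjmp(crash_ctx, 1))
        return caught_signal;
    catching = 1;
    fn();
    catching = 0;
    return 0;
}

/* Demo functions standing in for tested code. */
static void well_behaved(void) { }
static void raises_sigill(void) { raise(SIGILL); }
```

After install_handlers(), run_guarded(raises_sigill) returns SIGILL instead of terminating the process, and later calls can continue with other functions.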
sunyuechi
3bdb0fe511 checkasm/takdsp: add decorrelate_ls test
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2023-12-21 22:42:34 +02:00
Martin Storsjö
f5e3e9e04e checkasm: Remove unnecessary const on scalar parameters
The ffmpeg coding style doesn't usually use const on scalar
parameters (or on the pointer values - as opposed to the type
that is pointed to, where it has a semantic meaning), contrary
to the dav1d coding style (where this was imported from).

This avoids warnings about differences in the type signatures
between declaration and definition of this function, with older
versions of MSVC.

The issue was observed with one version of MSVC 2017,
19.16.27024.1, with warnings like these:

    src/tests/checkasm/checkasm.c(969): warning C4028: formal parameter 3 different from declaration

The warning itself is bogus as the const here is harmless, and
newer versions of MSVC no longer warn about this.

Signed-off-by: Martin Storsjö <martin@martin.st>
2023-12-21 00:14:41 +02:00
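The situation described in f5e3e9e04e can be reproduced with a few lines of C. The function clip_to_byte is a hypothetical example, not code from checkasm. In standard C the declaration and the definition below have compatible types, because top-level qualifiers on parameters are ignored when comparing function types, which is why the MSVC warning is described as bogus.

```c
/* Declaration, as a header would carry it: no const on the scalar. */
int clip_to_byte(int value);

/* Definition with a const-qualified scalar parameter. The qualifier only
 * affects the function body (value cannot be reassigned); callers see the
 * same function type as in the declaration above. */
int clip_to_byte(const int value)
{
    return value < 0 ? 0 : value > 255 ? 255 : value;
}
```

Dropping the const from the definition, as the commit does, changes nothing for callers and keeps the declaration and definition textually identical.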
sunyuechi
1c3620b2bb checkasm: test for abs_pow34
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2023-12-11 18:42:07 +02:00
Rémi Denis-Courmont
b3825bbe45 riscv: test for assembler support
This should fix the build on LLVM 16 and earlier, at the cost of turning
all non-RVV optimisations off.
2023-12-08 17:21:09 +02:00
sunyuechi
d0ec826077 checkasm/ac3dsp: add float_to_fixed24 test
Signed-off-by: Rémi Denis-Courmont <remi@remlab.net>
2023-12-01 20:26:48 +02:00
Rémi Denis-Courmont
7212466e73 checkasm/riscv: report an error upon SIGILL
Terminating the whole checkasm process is not very helpful. This will
report if an illegal instruction occurs while executing a tested
function. This is a common occurrence whilst developing RISC-V
assembler, due to the compatibility between vector configuration and
instruction done at run-time.
2023-11-23 19:04:07 +02:00
Rémi Denis-Courmont
286d674221 checkasm: add helper to report a fatal signal
2023-11-23 18:57:18 +02:00