1
0
mirror of https://github.com/FFmpeg/FFmpeg.git synced 2025-01-29 22:00:58 +02:00

554 Commits

Author SHA1 Message Date
Ramiro Polla
e0cc06184c checkasm/sw_rgb: add rgb24toyv12 tests 2024-09-06 23:06:35 +02:00
Ramiro Polla
c08bb33e41 checkasm/sw_rgb: add deinterleaveBytes 2024-09-06 23:05:06 +02:00
James Almer
2a6f84718b fate/checkasm/sw_gbrp: don't randomly set internal values
They are set by sws_init_context().
May help with signed integer overflows reported by gcc-usan.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-09-05 22:19:47 -03:00
Rémi Denis-Courmont
d9f594209f checkasm/riscv: print official extension names 2024-09-04 22:04:11 +03:00
Anton Khirnov
3f9ca51015 lavc/opus*: move to opus/ subdir 2024-09-02 11:56:53 +02:00
Ramiro Polla
6aafe61285 avcodec/mpegvideoencdsp: convert stride parameters from int to ptrdiff_t 2024-09-01 13:42:30 +02:00
Nuo Mi
7175544c0b checkasm: add vvc_bdof test
apply_bdof_8_8x16_c: 5776.5
apply_bdof_8_8x16_avx2: 396.2
apply_bdof_8_16x8_c: 5722.0
apply_bdof_8_16x8_avx2: 216.0
apply_bdof_8_16x16_c: 11213.2
apply_bdof_8_16x16_avx2: 434.5
apply_bdof_10_8x16_c: 5657.7
apply_bdof_10_8x16_avx2: 1096.0
apply_bdof_10_16x8_c: 5531.7
apply_bdof_10_16x8_avx2: 212.5
apply_bdof_10_16x16_c: 11043.7
apply_bdof_10_16x16_avx2: 1252.7
apply_bdof_12_8x16_c: 5680.0
apply_bdof_12_8x16_avx2: 1096.5
apply_bdof_12_16x8_c: 5646.2
apply_bdof_12_16x8_avx2: 624.5
apply_bdof_12_16x16_c: 11076.0
apply_bdof_12_16x16_avx2: 1241.5
2024-08-31 14:08:54 +08:00
J. Dekker
e758b24396 checkasm: add wildcompares for test & functions
Added:

  --test=<pattern>    Filter tests by glob style pattern.
  --bench[=<pattern>] Run benchmark and optionally filter functions
                      by glob style pattern.

Example:

$ ./tests/checkasm/checkasm --bench=yuva*
[...]
yuva420p_bgr24_8_c:                                     34.5 ( 1.00x)
yuva420p_bgr24_8_ssse3:                                 31.1 ( 1.11x)
yuva420p_bgr24_128_c:                                  310.6 ( 1.00x)
yuva420p_bgr24_128_ssse3:                              178.1 ( 1.74x)
yuva420p_bgr24_1080_c:                                2509.6 ( 1.00x)
yuva420p_bgr24_1080_ssse3:                            1471.5 ( 1.71x)
yuva420p_bgr24_1920_c:                                4462.6 ( 1.00x)
yuva420p_bgr24_1920_ssse3:                            2331.1 ( 1.91x)
[...]

Ported from dav1d.

Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-08-28 11:45:46 +02:00
J. Dekker
d0986709a8 checkasm: improve print format
Port dav1d's checkasm output format to FFmpeg's checkasm, includes
relative speedups and aligns results.

Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-08-28 11:45:46 +02:00
J. Dekker
03f26549cd checkasm: print only results to stdout
Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-08-28 11:45:46 +02:00
J. Dekker
42528ff835 checkasm: add csv/tsv bench output
When collecting performance information from checkasm it is common
to parse the output for use in graphs to compare vs different
architectures.

Signed-off-by: J. Dekker <jdek@itanimul.li>
2024-08-28 11:45:46 +02:00
Ramiro Polla
834964ce1a checkasm/mpegvideoencdsp: add pix_sum, pix_norm1, and draw_edges 2024-08-26 12:48:09 +02:00
Ramiro Polla
a2e01cade8 checkasm/yuv2yuv: add tests for semiplanar unscaled converters 2024-08-26 11:04:46 +02:00
Ramiro Polla
4545205a26 swscale/yuv2rgb: add yuv42{0,2}p -> gbrp unscaled colorspace converters 2024-08-18 22:26:11 +02:00
Nuo Mi
7eb1df44ae checkasm: add tests for vvc dmvr
dmvr_8_12x20_c: 186.2
dmvr_8_12x20_avx2: 25.7
dmvr_8_20x12_c: 181.7
dmvr_8_20x12_avx2: 25.2
dmvr_8_20x20_c: 283.2
dmvr_8_20x20_avx2: 32.0
dmvr_10_12x20_c: 90.0
dmvr_10_12x20_avx2: 15.7
dmvr_10_20x12_c: 41.0
dmvr_10_20x12_avx2: 14.7
dmvr_10_20x20_c: 81.5
dmvr_10_20x20_avx2: 26.7
dmvr_12_12x20_c: 190.7
dmvr_12_12x20_avx2: 20.2
dmvr_12_20x12_c: 187.2
dmvr_12_20x12_avx2: 20.2
dmvr_12_20x20_c: 292.7
dmvr_12_20x20_avx2: 27.2
dmvr_h_8_12x20_c: 317.0
dmvr_h_8_12x20_avx2: 37.0
dmvr_h_8_20x12_c: 340.0
dmvr_h_8_20x12_avx2: 41.0
dmvr_h_8_20x20_c: 540.7
dmvr_h_8_20x20_avx2: 64.0
dmvr_h_10_12x20_c: 322.7
dmvr_h_10_12x20_avx2: 30.7
dmvr_h_10_20x12_c: 344.2
dmvr_h_10_20x12_avx2: 34.0
dmvr_h_10_20x20_c: 529.0
dmvr_h_10_20x20_avx2: 51.5
dmvr_h_12_12x20_c: 326.7
dmvr_h_12_12x20_avx2: 33.5
dmvr_h_12_20x12_c: 331.7
dmvr_h_12_20x12_avx2: 51.2
dmvr_h_12_20x20_c: 534.0
dmvr_h_12_20x20_avx2: 62.7
dmvr_hv_8_12x20_c: 650.0
dmvr_hv_8_12x20_avx2: 57.2
dmvr_hv_8_20x12_c: 676.2
dmvr_hv_8_20x12_avx2: 70.0
dmvr_hv_8_20x20_c: 1068.5
dmvr_hv_8_20x20_avx2: 103.2
dmvr_hv_10_12x20_c: 649.0
dmvr_hv_10_12x20_avx2: 48.2
dmvr_hv_10_20x12_c: 677.7
dmvr_hv_10_20x12_avx2: 59.7
dmvr_hv_10_20x20_c: 1093.5
dmvr_hv_10_20x20_avx2: 91.7
dmvr_hv_12_12x20_c: 660.0
dmvr_hv_12_12x20_avx2: 58.7
dmvr_hv_12_20x12_c: 682.7
dmvr_hv_12_20x12_avx2: 72.0
dmvr_hv_12_20x20_c: 1094.0
dmvr_hv_12_20x20_avx2: 113.2
dmvr_v_8_12x20_c: 325.7
dmvr_v_8_12x20_avx2: 31.2
dmvr_v_8_20x12_c: 326.2
dmvr_v_8_20x12_avx2: 38.5
dmvr_v_8_20x20_c: 538.5
dmvr_v_8_20x20_avx2: 54.2
dmvr_v_10_12x20_c: 318.5
dmvr_v_10_12x20_avx2: 23.7
dmvr_v_10_20x12_c: 330.7
dmvr_v_10_20x12_avx2: 40.5
dmvr_v_10_20x20_c: 567.5
dmvr_v_10_20x20_avx2: 48.0
dmvr_v_12_12x20_c: 335.2
dmvr_v_12_12x20_avx2: 30.0
dmvr_v_12_20x12_c: 330.2
dmvr_v_12_20x12_avx2: 39.5
dmvr_v_12_20x20_c: 535.2
dmvr_v_12_20x20_avx2: 60.0
2024-08-15 20:19:45 +08:00
Rémi Denis-Courmont
d1326b6347 lavu/riscv: drop probing for zba CPU capability 2024-08-05 21:16:26 +03:00
Rémi Denis-Courmont
1b2a925e94 lavc/riscv: drop probing for F & D extensions
F and D extensions are included in all RISC-V application profiles ever
made (so starting from RV64GC a.k.a. RVA20). Realistically they need to be
selected at compilation time.

Currently, there are no consumers for these two flags. If there is ever a
need to reintroduce F- or D-specific optimisations, we can always use
__riscv_f or __riscv_d compiler predefined macros respectively.
2024-08-01 22:56:50 +03:00
Rémi Denis-Courmont
656a9664bf checkasm/riscv: preserve T1 whilst calling...
This preserves T1 whilst calling the instrumented function. In a Sci-Fi
setting where type-based Control Flow Integrity (CFI) is supported, the
calling code (i.e., the `checkasm` test case) will set T1 to the expected
value of the landing pad label (LPL) of the instrumented function.

The call wrapper will always use LPL zero which is a wild card. We should
preserve the value of T1 at least until the indirect call to the
instrumented function. Of course this is Sci-Fi, because:
1) there is no hardware (or even QEMU) support yet,
2) all our assembler functions currently use LPL zero anyway.

This uses T3 rather than T2 because indirect branches with T2 is reserved
for notionally direct calls made with an indirect call instruction (e.g.
due to GOT indirection), and are exempted from forward-edge CFI checks.
2024-08-01 18:44:01 +03:00
Rémi Denis-Courmont
8030876d1c checkasm/riscv: align the landing pads 2024-07-25 23:10:14 +03:00
Rémi Denis-Courmont
7dde8be29f checkasm/riscv: add forward-edge CFI landing pads 2024-07-25 23:10:14 +03:00
Rémi Denis-Courmont
45d7078a21 lavu/riscv: add CPU flag for B bit manipulations
The B extension was finally ratified in May 2024, encompassing:
- Zba (addresses),
- Zbb (basics) and
- Zbs (single bits).
It does not include Zbc (base-2 polynomials).
2024-07-25 23:09:58 +03:00
Martin Storsjö
97a708a507 checkasm: Increase the tolerance for ac3_sum_square_butterfly_float
Increase the tolerance from 10 ulp to 11 ulp. This fixes occasional
errors for some inputs; the errors could be reproduced on
aarch64/neon builds, with "checkasm --test=ac3dsp 3446175925".

Signed-off-by: Martin Storsjö <martin@martin.st>
2024-07-24 12:10:33 +03:00
Rémi Denis-Courmont
c4c811b3d9 checkasm/h264dsp: test TX bypass 2024-07-21 22:36:48 +03:00
James Almer
97fd5d3363 checkasm/lls: increase epsilon value for the update_lls test
Should fix failures for some seeds on x86_32.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-07-19 09:24:59 -03:00
Ramiro Polla
1fb77347c8 checkasm: add tests for yuv2rgb 2024-06-28 14:49:49 +02:00
Nuo Mi
0333b97414 checkasm/vvc_alf: ensure right and bottom boundaries are not overwritten by asm 2024-06-25 19:32:17 +08:00
Nuo Mi
1fa9f5b17f checkasm/vvc_alf: random select alf virtual boundaries position
A picture's virtual boundaries will split a CTU into 4 ALF blocks.
The ALF virtual boundary may cross or not cross a ALF block.
2024-06-25 19:32:17 +08:00
Nuo Mi
b82ef7c0ba checkasm/vvc_alf: only check the valid filter and classify sizes 2024-06-25 19:32:17 +08:00
Andreas Rheinhardt
8b4f7c0663 avcodec/me_cmp: Zero MECmpContext in ff_me_cmp_init()
Not every function will be set, so zero the context
to initialize everything.

This also allows to remove an initialization in dvenc.c.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-20 18:58:38 +02:00
Andreas Rheinhardt
b1a31b32ab avcodec/me_cmp,dvenc,mpegvideo: Move ildct_cmp to its users
MECmpContext.ildct_cmp is an array of function pointers that
are not set by ff_me_cmp_init(), but that are set by users
to one of the other arrays via ff_set_cmp().

Remove these pointers from MECmpContext and add pointers
for the actually used functions to its users. (The DV encoder
already did so.)

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-20 18:58:38 +02:00
Andreas Rheinhardt
cd2e46a350 avcodec/me_cmp, mpegvideo: Move frame_skip_cmp to MpegEncContext
MECmpContext has several arrays of function pointers that
are not set by ff_me_cmp_init(), but that are set by users
to one of the other arrays via ff_set_cmp().

One of these other users is mpegvideo_enc; it is the only user
of MECmpContext.frame_skip_cmp and it only uses one of these
function pointers at all.

This commit therefore moves this function pointer to MpegEncContext;
and removes the array from MECmpContext.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-20 18:58:38 +02:00
Andreas Rheinhardt
182e647a64 avcodec/me_cmp, motion_est: Move me_(pre_)?_cmp etc. to MotionEstContext
MECmpContext has several arrays of function pointers that
are not set by ff_me_cmp_init(), but that are set by users
to one of the other arrays via ff_set_cmp().

One of these other users is the motion estimation API.
It uses MECmpContext.(me_pre|me|me_sub|mb)_cmp. It is
basically the only user of these arrays.

This commit therefore moves these arrays to MotionEstContext;
this has the additional advantage of making motion_est.c
more independent from MpegEncContext.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-20 18:58:38 +02:00
Zhao Zhili
74b4e550cb tests/checkasm: Remove check on linux perf fd in uninit
The check should be >= 0, not > 0. The check itself is redundant
since uninit only being called after init is success.

Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-06-18 15:23:46 +08:00
Ramiro Polla
874152033d checkasm: add tests for {lum,chr}ConvertRange 2024-06-16 00:34:24 +02:00
James Almer
a7e9f1c1e7 checkasm/lls: add missing random values to the test buffers
Fixes valgrind warnings after 18adaf9fe558587cb1b707c647af83015b69da48.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-13 14:21:18 -03:00
Rémi Denis-Courmont
18adaf9fe5 checkasm/lls: adjust buffer sizes and alignments
var must be padded.
param has `order + 1`, not `order` elements and is *not* over-aligned.
2024-06-11 20:07:55 +03:00
Zhao Zhili
b1240c983f tests/checkasm: Fix build error when enable linux perf on Android
B0 is defined by system header, see f0f596dbc6b for ref.

Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Zhao Zhili <zhilizhao@tencent.com>
2024-06-11 01:11:46 +08:00
James Almer
287d139b77 checkasm/sw_rgb: fix alignment of buffers for rgb_to_yuv tests
src is apparently not guaranteed to be >8 byte aligned, but align to 16
nonetheless as the x86 asm will do unaligned loads anyway.
dst is guaranteed to be 32 byte aligned for the Y plane, but 16 byte for UV.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 14:12:51 -03:00
James Almer
6743c2fc6a checkasm/sw_rgb: test rgb32/rgb32_1 to yuv
Test all four pixel formats, but only bench the two native endian ones for a
given target.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 12:29:49 -03:00
James Almer
91b9af0058 x86/aacencdsp: add AVX version of quantize_bands
quant_bands_signed_c: 1928.0
quant_bands_signed_sse2: 406.0
quant_bands_signed_avx: 207.0
quant_bands_unsigned_c: 1702.0
quant_bands_unsigned_sse2: 404.0
quant_bands_unsigned_avx: 209.0

Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-09 12:29:49 -03:00
Andreas Rheinhardt
fca796ac3b tests/checkasm/sw_rgb: Be more strict about clobbering MMX state
The MMXEXT versions of the rgb2rgb functions tested here
always emit emms on their own. Therefore one can use
a stricter test to ensure that it stays that way.

Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
2024-06-09 12:03:47 +02:00
Rémi Denis-Courmont
8d117024fe checkasm: disable unaligned access emulation
The OS may silently fix (emulate) unaligned hardware access exceptions.
This is extremely slow and code should be fixed not to rely on unaligned
access on affected hardware. Accordingly this requests that the OS
disable emulation and instead throw Bus error, which will be caught by
checkasm's signal handler.

This has no effects if the hardware supports unaligned access in
hardware, since no exceptions are generated. prctl() will fail safe in
that case.
2024-06-07 17:53:05 +03:00
Zhao Zhili
47ba87551c checkasm/sw_rgb: test rgb24/bgr24 to yuv
The line width 8 is supposed to test corner case, while the
performance doesn't matter. Width 1080 is also a case of
unaligned to 16.

Width 1920 meant for benchmark (together with --runs options).

Signed-off-by: James Almer <jamrial@gmail.com>
2024-06-05 15:22:49 -03:00
Anton Khirnov
e4601cc339 lavc/hevc*: move to hevc/ subdir 2024-06-04 11:46:27 +02:00
Rémi Denis-Courmont
be6f8c439a checkasm: add aacencdsp.quant_bands test 2024-06-03 22:43:37 +03:00
Rémi Denis-Courmont
fc85aff72f checkasm: add linear least square tests 2024-06-01 18:05:58 +03:00
James Almer
0a949aacae checkasm/lpc: use fixed length to bench apply_welch_window
Signed-off-by: James Almer <jamrial@gmail.com>
2024-05-31 17:06:08 -03:00
Rémi Denis-Courmont
16132a810d checkasm/lpc: test compute_autocorr
Signed-off-by: James Almer <jamrial@gmail.com>
2024-05-31 16:36:43 -03:00
Rémi Denis-Courmont
98405d28fa checkasm/float_dsp: add double-precision scalar product 2024-05-31 22:22:43 +03:00
James Almer
4008a80c1b tests/checkasm/vvc_mc: don't zero the SAD buffers
They will be filled immediately after.

Signed-off-by: James Almer <jamrial@gmail.com>
2024-05-31 20:05:21 +08:00