This is required placement by standard [[maybe_unused]] attribute, works
the same for __attribute__((unused)).
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
If we're invoked with range == UINT_MAX, we end up doing
"rnd() % (UINT_MAX + 1)", which is equal to "rnd() % 0". On
arm (on all platforms) and on MSVC i386, this ends up crashing
at runtime.
This fixes the crash.
Because of the lack of an external ABI on low-level kernels, we cannot
directly test internal functions. Instead, we construct a minimal op chain
consisting of a read, the op to be tested, and a write.
The bigger complication arises from the fact that the backend may generate
arbitrary internal state that needs to be passed back to the implementation,
which means we cannot directly call `func_ref` on the generated chain. To get
around this, always compile the op chain twice - once using the backend to be
tested, and once using the reference C backend.
The actual entry point may also just be a shared wrapper, so we need to
be very careful to run checkasm_check_func() on a pseudo-pointer that will
actually be unique for each combination of backend and active CPU flags.
We split the standard macro into its body (implementation) and declaration,
and use a macro argument in place of the raw `memcmp` call, with the major
difference that we now take the number of pixels to compare instead of the
number of bytes (to match the signature of float_near_ulp_array).
Sometimes, when measuring very small functions, rdtsc is not accurate enough
to get a reliable measurement. This increases the number of runs inside the
inner loop from 4 to 32, which should help a lot. Less important when using
the more precise linux-perf API, but still useful.
There should be no user-visible change since the number of runs is adjusted
to keep the total time spent measuring the same.
It can be useful to know if the alpha plane consists of fully opaque
pixels or not, in which case it can e.g. safely be stripped.
This only requires a very minor modification to the AVX2 routines, adding
an extra AND on the read alpha value with the reference alpha value, and a
single extra cheap test per line.
detect_alpha_8_full_c: 2849.1 ( 1.00x)
detect_alpha_8_full_avx2: 260.3 (10.95x)
detect_alpha_8_full_avx512icl: 130.2 (21.87x)
detect_alpha_8_limited_c: 8349.2 ( 1.00x)
detect_alpha_8_limited_avx2: 756.6 (11.04x)
detect_alpha_8_limited_avx512icl: 364.2 (22.93x)
detect_alpha_16_full_c: 1652.8 ( 1.00x)
detect_alpha_16_full_avx2: 236.5 ( 6.99x)
detect_alpha_16_full_avx512icl: 134.6 (12.28x)
detect_alpha_16_limited_c: 5263.1 ( 1.00x)
detect_alpha_16_limited_avx2: 797.4 ( 6.60x)
detect_alpha_16_limited_avx512icl: 400.3 (13.15x)
This safety margin was motivated by the fact that vf_premultiply sometimes
produces such illegally high values, but this has since been fixed by
603334a043, so there's no more reason to have this safety margin, at
least for our own code. (Of course, other sources may also produce such
broken files, but we shouldn't work around that - garbage in, garbage out.)
See-Also: 603334a043
Accept up to 13 ULP difference.
This fixes running "checkasm --test=ac3dsp 3044836819" on ARM.
Depending on how the SIMD implementations aggregate numbers,
larger/smaller values might not end up accumulated in exactly
the same way; the current NEON implementation for ARM aggregates
into vectors of 2 elements. If it would aggregate into vectors
of 4 elements instead, like the AArch64 version does, this particular
case would end up with a smaller difference.
Also ensure that the dst buffers are not too big
(they had the right size for >8 bit depths and were therefore
too big for eight bit, letting potential buffer overflows
in the eight bit version go undetected).
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This aligns declared function types in checkasm with real definition.
Fixes FATE: checkasm-{sw_rgb,sw_scale,sw_yuv2rgb,sw_yuv2yuv}
Fixes: runtime error: call to function <func> through pointer to incorrect function type
Fixes: c1a0e65763
Signed-off-by: Kacper Michajłow <kasper93@gmail.com>
This is useful for tests that compare dctcoefs which will be either 2 bytes or
4 bytes, depending on bitdepth.
Reviewed-by: Martin Storsjö <martin@martin.st>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This allows catching whether the functions write outside of
the designated rectangle, and if run with "checkasm -v", it also
prints out on which side of the rectangle the overwrite was.
Signed-off-by: Martin Storsjö <martin@martin.st>
This backports similar functionality from dav1d, from commits
35d1d011fda4a92bcaf42d30ed137583b27d7f6d and
d130da9c315d5a1d3968d278bbee2238ad9051e7.
This allows detecting writes out of bounds, on all 4 sides of
the intended destination rectangle.
The bounds checking also can optionally allow small overwrites
(up to a specified alignment), while still checking for larger
overwrites past the intended allowed region.
Signed-off-by: Martin Storsjö <martin@martin.st>
This makes it easier to implement custom error printouts in tests.
This is a port of dav1d's commit
13a7d78655f8747c2cd01e8a48d44dcc7f60a8e5 into ffmpeg's checkasm.
Signed-off-by: Martin Storsjö <martin@martin.st>
This was changed 8 years ago with the introduction of the linux-perf path,
with seemingly no justification at the time. Likely a developer oversight
from testing.
This bug not only made --runs completely ineffective, but also meant that we
didn't actually correctly filter out outliers.
Fixes: e0d56f097f
Many of the fields of MpegEncContext (which is also used by decoders)
are actually only used by encoders. Therefore this commit adds
a new encoder-only structure and moves all of the encoder-only
fields to it except for those which require more explicit
synchronisation between the main slice context and the other
slice contexts. This synchronisation is currently mainly provided
by ff_update_thread_context() which simply copies most of
the main slice context over the other slice contexts. Fields
which are moved to the new MPVEncContext no longer participate
in this (which is desired, because it is horrible and for the
fields b) below wasteful) which means that some fields can only
be moved when explicit synchronisation code is added in later commits.
More explicitly, this commit moves the following fields:
a) Fields not copied by ff_update_duplicate_context():
dct_error_sum and dct_count; the former does not need synchronisation,
the latter is synchronised in merge_context_after_encode().
b) Fields which do not change after initialisation (these fields
could also be put into MPVMainEncContext at the cost of
an indirection to access them): lambda_table, adaptive_quant,
{luma,chroma}_elim_threshold, new_pic, fdsp, mpvencdsp, pdsp,
{p,b_forw,b_back,b_bidir_forw,b_bidir_back,b_direct,b_field}_mv_table,
[pb]_field_select_table, mb_{type,var,mean}, mc_mb_var, {min,max}_qcoeff,
{inter,intra}_quant_bias, ac_esc_length, the *_vlc_length fields,
the q_{intra,inter,chroma_intra}_matrix{,16}, dct_offset, mb_info,
mjpeg_ctx, rtp_mode, rtp_payload_size, encode_mb, all function
pointers, mpv_flags, quantizer_noise_shaping,
frame_reconstruction_bitfield, error_rate and intra_penalty.
c) Fields which are already (re)set explicitly: The PutBitContexts
pb, tex_pb, pb2; dquant, skipdct, encoding_error, the statistics
fields {mv,i_tex,p_tex,misc,last}_bits and i_count; last_mv_dir,
esc_pos (reset when writing the header).
d) Fields which are only used by encoders not supporting slice
threading for which synchronisation doesn't matter: esc3_level_length
and the remaining mb_info fields.
e) coded_score: This field is only really used when FF_MPV_FLAG_CBP_RD
is set (which implies trellis) and even then it is only used for
non-intra blocks. For these blocks dct_quantize_trellis_c() either
sets coded_score[n] or returns a last_non_zero value of -1
in which case coded_score will be reset in encode_mb_internal().
Therefore no old values are ever used.
The MotionEstContext has not been moved yet.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
It allows the callee to clobber the MMX state,
yet since 1e3dc705df this is no longer
done. So use the stricter declare_func instead.
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Previously, we read elements from ff_aac_pow34sf_tab; however
that table is initialized to zero; one needs to call
ff_aac_float_common_init() to make sure that the table is
initialized.
However, given the range of the input values, a large number of
entries in ff_aac_pow34sf_tab would give results outside of the
range for signed 32 bit integers. As the largest aac_cb_maxval
entry is 16, it seems more reasonable to produce values within
an order of mangitude of that value.
(When hitting INT_MIN, implementations may end up with different
results depending on whether the value is negated as a float or
as an int. This corner case is irrelevant in practice as this
is way outside of the expected value range here.)
Coincidentally, this fixes linking checkasm with Apple's older
linker. (In Xcode 15, Apple switched to a new linker. The one in
older toolchains seems to have a bug where it won't figure out to
load object files from a static library, if the only symbol
referenced in the object file is a "common" symbol, i.e. one for
a zero-initialized variable. This issue can also be reproduced with
newer Apple toolchains by passing -Wl,-ld_classic to the linker.)
Signed-off-by: Martin Storsjö <martin@martin.st>