The stack used by checkasm_checked_call_vfp was a multiple of 4 when the
checked function is called. AAPCS requires a double word (8 byte)
aligned stack public interfaces. Since both calls are public interfaces
the stack is misaligned when the checked is called.
Might fix the SIGBUS error in the armv7-linux-clang-3.7 fate config.
This avoids listing the same feature multiple times in the
test output. Previously the output contained something like this:
SSE2:
- hevc_mc.qpel [OK]
- hevc_mc.epel [OK]
- hevc_mc.unweighted_pred [OK]
- hevc_mc.qpel [OK]
- hevc_mc.epel [OK]
- hevc_mc.unweighted_pred [OK]
Signed-off-by: Martin Storsjö <martin@martin.st>
This avoids the risk of accidentally clobbering such variables outside
of the macro if the same variables are used there.
Signed-off-by: Martin Storsjö <martin@martin.st>
This fixes valgrind warnings about conditional jumps based on
uninitialized data (even though the uninitialized data only ever
was compared with a direct copy of the same uninitialized data).
Signed-off-by: Martin Storsjö <martin@martin.st>
The functions may not clean up properly after using MMX
registers. For the normal testing calls, the checkasm_checked_call
functions will do the cleanup (and check that functions that
should clean up do it as well), but when benchmarking functions
that don't clean up, we don't currently properly clean up at all.
This causes issues if a benchmarked function is followed by testing
of a function that is supposed to not clobber the MMX/FPU state but
doesn't touch it at all.
Signed-off-by: Martin Storsjö <martin@martin.st>
Restore alphabetical order in lists, break overly long lines, do some
prettyprinting, add some explanatory section comments, group parts
together that belong together logically.
Also bench a smaller buffer. This drastically reduces --bench runtime
and reports smaller, more readable numbers.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: James Almer <jamrial@gmail.com>
Some debuggers/profilers use this metadata to determine which function a
given instruction is in; without it they get can confused by local labels
(if you haven't stripped those). On the other hand, some tools are still
confused even with this metadata. e.g. this fixes `gdb`, but not `perf`.
Currently only implemented for ELF.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Some debuggers/profilers use this metadata to determine which function a
given instruction is in; without it they get can confused by local labels
(if you haven't stripped those). On the other hand, some tools are still
confused even with this metadata. e.g. this fixes `gdb`, but not `perf`.
Currently only implemented for ELF.
Use two separate functions, depending on whether VFP/NEON is available.
This is set to require armv5te - it uses blx, which is only available
since armv5t, but we don't have a separate configure item for that.
(It also uses ldrd, which requires armv5te, but this could be avoided
if necessary.)
Signed-off-by: Martin Storsjö <martin@martin.st>
* commit '2008f76054906e9ff6bf744800af0e5a5bfe61be':
dca: remove unused decode_hf function and quant_d tables
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
* commit '711781d7a1714ea4eb0217eb1ba04811978c43d1':
x86: checkasm: check for or handle missing cleanup after MMX instructions
Merged-by: Hendrik Leppkes <h.leppkes@gmail.com>
Check the full FPU tag word instead of only the lower half and simplify
the comparison.
Use upper-case function base name as macro name to instantiate both
checked_call variants.
Not every asm routine is expected clear the MMX state after returning.
It is however a requisite for testing floating point code in checkasm.
Annotate functions requiring cleanup with declare_func_emms() and issue
emms after the call. The remaining functions are checked for having a
cleared MMX state after return.
The vector mode was deprecated in ARMv7-A/VFPv3 and various cpu
implementations do not support it in hardware. Vector mode code will
depending the OS either be emulated in software or result in an illegal
instruction on cpus which does not support it. This was not really
problem in practice since NEON implementations of the same functions are
preferred. It will however become a problem for checkasm which tests
every cpu flag separately.
Since this is a cpu feature newer cpu do not support anymore the
behaviour of this flag differs from the other flags. It can be only
activated by runtime cpu feature selection.
These aren't quite as helpful as the ones in 8bpp, since over there,
we can use pmulhrsw, but here the coefficients have too many bits to
be able to take advantage of pmulhrsw. However, we can still skip
cols for which all coefs are 0, and instead just zero the input data
for the row itx. This helps a few % on overall decoding speed.
The System V ABI on x86-64 specifies that the al register contains an upper
bound of the number of arguments passed in vector registers when calling
variadic functions, so we aren't allowed to clobber it.
checkasm_fail_func() is a variadic function so also zero al before calling it.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
Tested functions are internally kept in a binary search tree for efficient
lookups. The downside of the current implementation is that the tree quickly
becomes unbalanced which causes an unneccessary amount of comparisons between
nodes. Improve this by changing the tree into a self-balancing left-leaning
red-black tree with a worst case lookup/insertion time complexity of O(log n).
Significantly reduces the recursion depth and makes the tests run around 10%
faster overall. The relative performance improvement compared to the existing
non-balanced tree will also most likely increase as more tests are added.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The System V ABI on x86-64 specifies that the al register contains an upper
bound of the number of arguments passed in vector registers when calling
variadic functions, so we aren't allowed to clobber it.
checkasm_fail_func() is a variadic function so also zero al before calling it.
Tested functions are internally kept in a binary search tree for efficient
lookups. The downside of the current implementation is that the tree quickly
becomes unbalanced which causes an unneccessary amount of comparisons between
nodes. Improve this by changing the tree into a self-balancing left-leaning
red-black tree with a worst case lookup/insertion time complexity of O(log n).
Significantly reduces the recursion depth and makes the tests run around 10%
faster overall. The relative performance improvement compared to the existing
non-balanced tree will also most likely increase as more tests are added.
The randomize_buffer() implementation assures that "most of the time",
we'll do a good mix of wide16/wide8/hev/regular/no filters for complete
code coverage. However, this is not mathematically assured because that
would make the code either much more complex, or much less random.
Now we no longer have to rely on function pointers intentionally
declared without specified argument types.
This makes it easier to support functions with floating point parameters
or return values as well as functions returning 64-bit values on 32-bit
architectures. It also avoids having to explicitly cast strides to
ptrdiff_t for example.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
If the return value doesn't fit in a single register rdx/edx can in some
cases be used in addition to rax/eax.
Doesn't affect any of the existing checkasm tests but might be useful later.
Also comment the relevant code a bit better.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
If the return value doesn't fit in a single register rdx/edx can in some
cases be used in addition to rax/eax.
Doesn't affect any of the existing checkasm tests but might be useful later.
Also comment the relevant code a bit better.
Now we no longer have to rely on function pointers intentionally
declared without specified argument types.
This makes it easier to support functions with floating point parameters
or return values as well as functions returning 64-bit values on 32-bit
architectures. It also avoids having to explicitly cast strides to
ptrdiff_t for example.
* commit 'bf0cef5c3a114df452e5476167634dd8f51eb448':
checkasm: Include io.h for isatty, if available
Merged-by: Michael Niedermayer <michael@niedermayer.cc>
configure does check for isatty, and checkasm properly checks
HAVE_ISATTY, but on some platforms (e.g. WinRT), io.h needs to be
included for isatty to be available.
Signed-off-by: Martin Storsjö <martin@martin.st>
* commit 'e605bf3b590d295f215fcc9fd58eb11be55b68cb':
checkasm: remove empty array initializer list in h264pred test
Merged-by: Michael Niedermayer <michael@niedermayer.cc>
* commit '82e6ac85ff9aa7631b8c01521b3d6b5ca0bc8014':
checkasm: test all architectures with optimisations
Merged-by: Michael Niedermayer <michael@niedermayer.cc>
* commit '6cc4d3e9a982e926494f4b919d9733fe29774acf':
checkasm: exit with status 0 instead of 1 if there are no tests to perform
Merged-by: Michael Niedermayer <michael@niedermayer.cc>
It provides the following features:
* verify correctness by comparing output to the C version.
* detect failure to save and restore clobbered callee-saved registers.
* detect 32-bit parameters being used as if they were 64-bit in x86-64
(the upper halves are not guaranteed to be zero - but in practice
they very often are, which makes those bugs hard to spot otherwise).
* easy benchmarking.
Compile by running 'make checkasm'.
Execute by running 'tests/checkasm/checkasm'.
Optional arguments are '--bench' to run benchmarks for all functions,
'--bench=<pattern>' to run benchmarks for all functions that starts with
<pattern>, and '<integer>' to seed the PRNG for reproducible results.
Contains unit tests for most h264pred functions to get started, more tests
can be added afterwards using those as a reference.
Loosely based on code from x264. Currently only supports x86 and x86-64,
but additional architectures shouldn't be too much of an obstacle to add.
Note that functions with floating point parameters or floating point
return values are not supported. Some compiler-specific features or
preprocessor hacks would likely be required to add support for that.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>