Fixes so that fate under 64 bit Windows passes.
These functions replace all ff_hscale8to15_*_ssse3 when avx2 is available.
Signed-off-by: James Almer <jamrial@gmail.com>
LSX and LASX is loongarch SIMD extention.
They are enabled by default if compiler support it, and can be disabled
with '--disable-lsx' '--disable-lasx'.
Change-Id: Ie2608ea61dbd9b7fffadbf0ec2348bad6c124476
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Reviewed-by: guxiwei <guxiwei-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
This patch increases several stack buffers in order to fix
stack-buffer-overflows (e.g. in put_hevc_qpel_uni_hv_9 in
line 814 of hevcdsp_template.c) detected with ASAN in the hevc_pel
checkasm test.
The buffers are increased by the minimal amount necessary
in order not to mask potential future bugs.
Reviewed-by: Martin Storsjö <martin@martin.st>
Reviewed-by: "zhilizhao(赵志立)" <quinkblack@foxmail.com>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@outlook.com>
Call the scaler function directly rather than through a function
pointer. Drop the now-unused return value from ff_getSwsFunc() and
rename the function to reflect its new role.
This will be useful in the following commits, where it will become
important that the amount of output is different for scaled vs unscaled
case.
This sadly required making changes to the code itself,
due to the same context needing to be reused for both versions.
The lookup table had to be duplicated for both versions.
Add MMI & MSA runtime detection for MIPS.
Basically there are two code pathes. For systems that
natively support CPUCFG instruction or kernel emulated
that instruction, we'll sense this feature from HWCAP and
report the flags according to values grab from CPUCFG. For
systems that have no CPUCFG (or not export it in HWCAP),
we'll parse /proc/cpuinfo instead.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Reviewed-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
A buffer whose size is not a multiple of four has been initialized using
consecutive writes of 32bits. This results in a stack-buffer-overflow
reported by ASAN in the checkasm-sw_scale FATE-test.
Reviewed-by: Michael Niedermayer <michael@niedermayer.cc>
Signed-off-by: Andreas Rheinhardt <andreas.rheinhardt@gmail.com>
Also fill x8-x17 with garbage before calling the function.
Figure out the number of stack parameters and make sure that the
value on the stack after those is untouched.
Signed-off-by: Martin Storsjö <martin@martin.st>
Figure out the number of stack parameters and make sure that the
value on the stack after those is untouched.
Signed-off-by: Martin Storsjö <martin@martin.st>
We should just use a normal bl here, and the linker will add the 'x'
bit if necessary.
This fixes calling the checkasm_fail_func on windows, where the
code is built in thumb mode (and the linker doesn't clear the 'x'
bit in the blx instruction).
Signed-off-by: Martin Storsjö <martin@martin.st>
Add overflow test for hevc_add_res when int16_t coeff = -32768.
The result of C is good, while ASM is not.
To verify:
make fate-checkasm-hevc_add_res
ffmpeg/tests/checkasm/checkasm --test=hevc_add_res
./checkasm --test=hevc_add_res
checkasm: using random seed 679391863
MMXEXT:
hevc_add_res_4x4_8_mmxext (hevc_add_res.c:69)
- hevc_add_res.add_residual [FAILED]
SSE2:
hevc_add_res_8x8_8_sse2 (hevc_add_res.c:69)
hevc_add_res_16x16_8_sse2 (hevc_add_res.c:69)
hevc_add_res_32x32_8_sse2 (hevc_add_res.c:69)
- hevc_add_res.add_residual [FAILED]
AVX:
hevc_add_res_8x8_8_avx (hevc_add_res.c:69)
hevc_add_res_16x16_8_avx (hevc_add_res.c:69)
hevc_add_res_32x32_8_avx (hevc_add_res.c:69)
- hevc_add_res.add_residual [FAILED]
AVX2:
hevc_add_res_32x32_8_avx2 (hevc_add_res.c:69)
- hevc_add_res.add_residual [FAILED]
checkasm: 8 of 14 tests have failed
Signed-off-by: Xu Guangxin <guangxin.xu@intel.com>
Signed-off-by: Linjie Fu <linjie.fu@intel.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
check_func will return NULL for functions that have already been tested. If
the func is tested and skipped (which happens several times), there is no
need to prepare data(randomize_buffers and memcpy).
Move relative code in compare_add_res(), prepare data and do check only if
the function is not tested.
Signed-off-by: Linjie Fu <linjie.fu@intel.com>
Signed-off-by: Anton Khirnov <anton@khirnov.net>
The stereo_interpolate functions add h_step to the values h
BUF_SIZE times. Within the stereo_interpolate C functions, the
values h (h0-h3, h00-h13) are declared as local float variables,
but the compiler is free to keep them in a register with extra
precision.
If the accumulation is rounded to 32 bit float precision after
each step, the less significant bits of h_step end up ignored
and the sum can deviate, affecting the end result more than
the currently set EPS.
By clearing the log2(BUF_SIZE) lower bits of h_step, we make sure
that the accumulation shouldn't differ significantly, regardless
of any extra precision in the accmulating register/variable.
This fixes the aacpsdsp checkasm test when built with clang for
mingw/x86_32.
Signed-off-by: Martin Storsjö <martin@martin.st>
As the values generated by av_bmg_get can be arbitrarily large
(only the stddev is specified), we can't use a fixed tolerance.
Calculate a dynamic tolerance (like in float_dsp from 38f966b222),
based on the individual steps of the calculation.
This fixes running this test with certain seeds, when built with
clang for mingw/x86_32.
Signed-off-by: Martin Storsjö <martin@martin.st>
As the values generated by av_bmg_get can be arbitrarily large
(only the stddev is specified), we can't use a fixed tolerance.
This matches what was done for test_vector_dmul_scalar in
38f966b222.
This fixes the float_dsp checkasm test for some seeds, when built
with clang for mingw/x86_32.
Signed-off-by: Martin Storsjö <martin@martin.st>
Generic C implementation of vf_blend performs reads and writes of 16-bit
elements, which requires the buffers to be aligned to at least 2-byte
boundary.
Also, the change fixes source buffer overrun caused by src_offset being
added to to test handling of misaligned buffers.
Fixes: #7226
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
fix the warning: "function declaration isn’t a prototype", in C
int foo() and int foo(void) are different functions. int foo()
accepts an arbitrary number of arguments, while int foo(void) accepts 0
arguments.
Signed-off-by: Jun Zhao <mypopydev@gmail.com>
* commit 'e00db9f78bb475ed5103364f61892f4e75ef89ba':
checkasm: hevc: Add a hevc_ prefix to the add_residual functions
Merged-by: James Almer <jamrial@gmail.com>
* commit '7cb1d9e2dbbe5bf4652be5d78cdd68e956fa3d63':
build: Fine-grained link-time dependency settings
Also included are bug fix commits 5ff3b5cafc,
d9da7151ee and
5e27ef800b.
Merged-by: James Almer <jamrial@gmail.com>
On ARM platforms, accessing the PMU registers requires special user
access permissions. Since there is no other way to get accurate timers,
the current implementation of timers in FFmpeg rely on these registers.
Unfortunately, enabling user access to these registers on Linux is not
trivial, and generally involve compiling a random and unreliable github
kernel module, or patching somehow your kernel.
Such module is very unlikely to reach the upstream anytime soon. Quoting
Robin Murphin from ARM:
> Say you do give userspace direct access to the PMU; now run two or more
> programs at once that believe they can use the counters for their own
> "minimal-overhead" profiling. Have fun interpreting those results...
>
> And that's not even getting into the implications of scheduling across
> different CPUs, CPUidle, etc. where the PMU state is completely beyond
> userspace's control. In general, the plan to provide userspace with
> something which might happen to just about work in a few corner cases,
> but is meaningless, misleading or downright broken in all others, is to
> never do so.
As a result, the alternative is to use the Performance Monitoring Linux
API which makes use of these registers internally (assuming the PMU of
your ARM board is supported in the kernel, which is definitely not a
given...).
While the Linux API is obviously cross platform, it does have a
significant overhead which needs to be taken into account. As a result,
that mode is only weakly enabled on ARM platforms exclusively.
Note on the non flexibility of the implementation: the timers (native
FFmpeg vs Linux API) are selected at compilation time to prevent the
need of function calls, which would result in a negative impact on the
cycle counters.
This reverts commit 547db1eaec.
This commit wasn't supposed to be pushed (yet) since it hasn't
been reviewed.
Signed-off-by: Martin Storsjö <martin@martin.st>
Meant for DSP functions returning a float or double, as they'd fail if emms
is called after every run on x86_32.
Signed-off-by: James Almer <jamrial@gmail.com>
Loads from this strictly doesn't require alignment, but specify it
just for consistency with the arm version.
Signed-off-by: Martin Storsjö <martin@martin.st>
* commit 'effc1430b2fe5997d9d55bf28dc507c27125eb27':
Revert "checkasm: vp9dsp: Benchmark the dc-only version of idct_idct separately"
Merged-by: James Almer <jamrial@gmail.com>
* commit 'c91d6a33f872574c95c8784277cf60ffcf6bff4f':
checkasm: aarch64: Add filler args to make sure all parameters are passed on the stack
Merged-by: James Almer <jamrial@gmail.com>
* commit 'f1b3e131385176c3c9d9783b25047856a0dcebf6':
checkasm: aarch64: Clobber the stack before calling functions
Merged-by: James Almer <jamrial@gmail.com>
* commit 'a05cc56124b4f1237f6355784de821e3290ddb44':
checkasm: arm/aarch64: Fix the amount of space reserved for stack parameters
Merged-by: James Almer <jamrial@gmail.com>
* commit '22c3ab18646924ce24dc6017a9e882ff69689e40':
checkasm: Add test for huffyuvdsp add_bytes
huffyuvdsp is renamed to llviddsp to be consistent with our codebase.
Note: af607b7e07 wasn't actually required for this test since this
commit is not actually testing huffyuvdsp.
Merged-by: Clément Bœsch <u@pkh.me>
* commit '12004a9a7f20e44f4da2ee6c372d5e1794c8d6c5':
audiodsp/x86: yasmify vector_clipf_sse
audiodsp: reorder arguments for vector_clipf
Merged the version from Libav after a discussion with James Almer on
IRC:
19:22 <ubitux> jamrial: opinion on 12004a9a7f20e44f4da2ee6c372d5e1794c8d6c5?
19:23 <ubitux> it was apparently yasmified differently
19:23 <ubitux> (it depends on the previous commit arg shuffle)
19:24 <ubitux> i don't see the magic movsxdifnidn in your port btw
19:24 <ubitux> it's a port from 1d36defe94
19:25 <jamrial> seems better thanks to said arg shuffle
19:25 <jamrial> the loop is the same, but init is simpler
19:25 <jamrial> probably worth merging
19:25 <ubitux> OK
19:25 <ubitux> thanks
19:26 <jamrial> curious they didn't make len ptrdiff_t after the previous bunch of commits, heh
19:26 <ubitux> yeah indeed
Both commits are merged at the same time to prevent a conflict with our
existing yasmified ff_vector_clipf_sse.
Merged-by: Clément Bœsch <u@pkh.me>