Commit dfa9208074 ("mips/float_dsp: fix a bug in vector_fmul_window_mips")
fixed vector_fmul_window_mips by unrolling the loop only 4 times, but also
removed the outer C loop and replaced it with assembly branches and pointer
arithmetic. When submitting my 64-bit porting patch I missed this new
assembly which also needed porting.
This patch fixes a bus error in the fate-float-dsp test when run on 64-bit
mips.
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Reviewed-by: Nedeljko Babic <nedeljko.babic@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Unfortunately android < api 21 (lollipop) doesn't have the sgidefs.h header,
the easiest way around this is to just use the preprocessor definitions from
gcc / clang.
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* commit 'dcae2e32f7d8a1ca5fb8c1e4aa81313be854dd73':
arm: Suppress tags about used cpu arch and extensions
Merged-by: Michael Niedermayer <michaelni@gmx.at>
When all the codepaths using manually set .arch/.fpu code is
behind runtime detection, the elf attributes should be suppressed.
This allows tools to know that the final built binary doesn't
strictly require these extensions.
Signed-off-by: Martin Storsjö <martin@martin.st>
When OpenCL kernels are compiled, is_compiled flag is being set for each
kernel. But, in opencl uninit, this flag is not being cleared.
This causes an error when an OpenCL kernel is tried on different OpenCL
devices on same platform.
Here is the patch with a fix
Reviewed-by; Wei Gao <highgod0401@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This mainly consists of replacing all the pointer arithmatic 'addiu'
instructions with PTR_ADDIU which will handle the differences in pointer
sizes when compiled on 64 bit mips systems.
The header asmdefs.h contains the PTR_ macros which expend to the correct mips
instructions to manipulate registers containing pointers.
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Reviewed-by: Nedeljko Babic <Nedeljko.Babic@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Loop was unrolled eight times although in heder there is assumption
that len is multiple of 4.
This is fixed, and assembly code is rewritten to be more optimal and
to simplify clobber list.
Signed-off-by: Nedeljko Babic <nedeljko.babic@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
These could trigger assert failures previously
Found-by: Andreas Cadhalpun <Andreas.Cadhalpun@googlemail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
add ARM code for implementing av_clip_intp2 using the ssat instruction
on Cortex-A8, av_clip_intp2_arm() is faster than av_clip_intp2_c() and
the generic av_clip(), about -19%
Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
there already is a function, av_clip_uintp2() that clips a signed integer
to an unsigned power-of-two range, i.e. 0,2^p-1
this patch adds a function av_clip_intp2() that clips a signed integer
to a signed power-of-two range, i.e. -(2^p),(2^p-1)
the new function can be used as a special case for av_clip(), e.g.
av_clip(x, -8192, 8191) can be rewritten as av_clip_intp2(x, 13)
there are ARM instructions, usat and ssat resp., which map nicely to these
functions (see next patch)
Signed-off-by: Peter Meerwald <pmeerw@pmeerw.net>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
* commit '5b1d9ceec715846a58fe029bc3889ed6fa62436a':
pixfmt: add a pixel format for QSV hwaccel
Conflicts:
doc/APIchanges
libavutil/pixfmt.h
libavutil/version.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
SSE2 instructions that are XMM-implementations of pre-existing MMX/MMX2
instructions did not issue warnings when used in SSE functions. Handle
it by also checking the register type when such instructions are used.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>