This gets rid of the variable-length scratch buffer by filtering 16
pixels at a time and writing directly to the destination. The extra
loads this requires to load the source values are compensated by not
doing a round-trip to memory before shifting.
Signed-off-by: Mans Rullgard <mans@mansr.com>
This reverts parts of e0c6cce447. There is external mmx asm that
requires this alignment.
This fixes crashes when using swscale in builds with external mmx,
without inline assembly.
Signed-off-by: Martin Storsjö <martin@martin.st>
This introduces support for width%4==2 in addition to width%4==0. For
odd widths, some more checks are needed, since the current code always
handles two luma items in a row, thus there is a possibility of an
overread by one.
To access data at multiple fixed offsets from a base address, this
code uses a single "m" operand and code of the form "32%0", relying on
the memory operand instantiation having no displacement, giving a final
result of the form "32(%rax)". If the compiler uses a register and
displacement, e.g. "64(%rax)", the end result becomes "3264(%rax)",
which obviously does not work.
Replacing the "m" operands with "r" operands allows safe addition of a
displacement. In theory, multiple memory operands could use a shared
base register with different index registers, "(%rax,%rbx)", potentially
making more efficient use of registers. In the cases at hand, no such
sharing is possible since the addresses involved are entirely unrelated.
After this change, the code somewhat rudely accesses memory without
using a corresponding memory operand, which in some cases can lead to
unwanted "optimisations" of surrounding code. However, the original
code also accesses memory not covered by a memory operand, so this is
not adding any defect not already present. It is also hightly unlikely
that any such optimisations could be performed here since the memory
locations in questions are not accessed elsewhere in the same functions.
This fixes crashes with suncc.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Refactoring mmx2/mmxext YASM code with cpuflags will force renames.
So switching to a consistent naming scheme beforehand is sensible.
The name "mmxext" is more official and widespread and also the name
of the CPU flag, as reported e.g. by the Linux kernel.
Add support for all x86-64 registers
Prefer caller-saved register over callee-saved on WIN64
Support up to 15 function arguments
Also (by Ronald S. Bultje)
Fix up our asm to work with new x86inc.asm.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Signed-off-by: Justin Ruggles <justin.ruggles@gmail.com>