For test images manually generated to contain only up prediction,
timing results:
8380x3032 255x185
before: 138635 1992
after: 139232 1996
Actually jumping to the proper version depending on the alignment:
8380x3032: 138767
A 0.5% speed improvement for gigantic images is not worth the code
duplication.
Fixes ticket #4148
Signed-off-by: Christophe Gisquet <christophe.gisquet@gmail.com>
Tested-by: Benoit Fouet <benoit.fouet@free.fr>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Line sizes are only 8-byte aligned, so use unaliged loads
for add_bytes_l2 pointers.
Increasing the alignment requirement to 16 seemed a bit extreme
(png may be used for rather small sizes).
Also fix a mov that had its arguments swapped, leading
add_bytes_l2 being applied on up to 8 bytes too few.
Signed-off-by: Reimar Döffinger <Reimar.Doeffinger@gmx.de>