A bug was introduced in 977105407c whereby when
frame height wasn't divisible by the number of threads, pixels would be omitted
from the bottom rows during decode.
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
Enjoy some cache locality and use less threads.
About 5x speedup (from 60ms to 12ms to decode a 4k frame).
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>