Run loop filter per row instead of per MB, this also should make it
much easier to switch to per frame filtering and also doing so in a
seperate thread in the future if some volunteer wants to try.
Overall decoding speedup of 1.7% (single thread on pentium dual / cathedral sample)
This change also allows some optimizations to be tried that would not have
been possible before.
Originally committed as revision 21270 to svn://svn.ffmpeg.org/ffmpeg/trunk
No benchmark because its just replacing variables with litteral constants
(so no risk for slowdown outside gcc silliness) and i need sleep.
Originally committed as revision 21237 to svn://svn.ffmpeg.org/ffmpeg/trunk