This is faster 2871 -> 2189 cycles for int16 matrixbench -> 23456hz
Fixes a integer overflow in a artificial corner case
Fixes part of 668007-media
Found-by: Matt Wolenetz <wolenetz@google.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>