4.5x faster than C float version with autovectorization
10x faster than C int version
25x faster than C float version without autovectorization