Instead, hardcode the use of the _arm implementation of add_pixels,
and use the C version for put_pixels (as no arm-optimized version
exists). Since there's separate implementations of idct{,_put,_add}
for neon, this has no practical impact on performance.