These benchmarks are exactly like the ones run on 2016-09-17, with three
changes:
1. `pt` was added back to a few more benchmarks so that it appears any
time `sift` appears.
2. Warmup iterations were bumped from 1 to 3.
3. Actual benchmark iterations were bumped from 3 to 10.
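As a rough illustration of what the warmup and iteration counts control, here is a minimal sketch of a timing loop. It is not the runner's actual code: `time_command` and its parameter names are hypothetical, and only the default counts (3 and 10) come from the list above.

```python
import subprocess
import sys
import time


def time_command(argv, warmup=3, iterations=10):
    """Hypothetical helper: run `argv` untimed `warmup` times, then time it
    `iterations` times and return the wall-clock samples in seconds."""
    # Warmup runs are discarded; they mainly prime the OS page cache.
    for _ in range(warmup):
        subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        # The exit code is not checked, since search tools commonly exit
        # non-zero when nothing matches.
        subprocess.run(argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return samples


if __name__ == "__main__":
    # Times a trivial Python invocation as a stand-in for a real search command.
    samples = time_command([sys.executable, "-c", "pass"])
    print(len(samples), "samples, fastest:", min(samples))
```

With more timed samples per command, a single slow run has less influence on the reported mean, at the cost of a longer total run time.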
These benchmarks took around two hours to run.
The runner now detects whether commands exist and permits running incomplete
benchmarks.
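One way such a check could look is sketched below, assuming detection is done with the standard library's `shutil.which`. The function name and the tool list in the usage example are illustrative and not taken from the runner itself.

```python
import shutil


def split_by_availability(commands):
    """Hypothetical helper: partition command names into those found on
    PATH and those that are missing."""
    available, missing = [], []
    for name in commands:
        # shutil.which returns None when no executable with that name is found.
        if shutil.which(name) is not None:
            available.append(name)
        else:
            missing.append(name)
    return available, missing


if __name__ == "__main__":
    # Illustrative tool list; the real runner's set of commands may differ.
    available, missing = split_by_availability(["rg", "sift", "pt", "grep"])
    print("will benchmark:", available)
    print("skipping (not installed):", missing)
```

A runner built this way can skip the missing tools and still report results for the rest, instead of failing on the first absent command.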
The runner also explicitly uses Python 3, since that's what a default Ubuntu
16.04 install seems to want.