It looks like it incorrectly treats a file that is purely valid UTF-8 as
a binary file, which in turn effectively renders all of the Russian
subtitle benchmarks moot for ugrep. So we pass '-a' to force ugrep to
treat the file as text.
This technically gives ugrep an edge because it now no longer needs to
look to see if the haystack is binary or not. In practice this is
usually implemented using highly optimized SIMD routines (e.g.,
'memchr'), so it tends not to matter much. We might also consider
passing '-a' to all grep commands. But... I think using '-a' is the less
common case and we should try to benchmark the common case.
This removes the old commented out URLs for the 2016 subtitles that
don't work any more. I should probably upload the files to a more stable
URL.
This also switches to a 'https://' GitHub URL as I believe the 'git://'
URLs are no longer supported.
It's not quite clear why I added this originally. ripgrep doesn't have
its `-a` flag enabled. It's possible I tricked myself into adding it
because ripgrep's binary detection has evolved to be more like GNU
grep's nowadays.
In any case, using `-a` on data that is non-binary can only improve
performance because it removes the overhead for checking whether the
data is binary or not. So this was giving an artificial boost to GNU
grep.
None of these tools got particularly popular (except for pt briefly),
but they do not appear to be active projects nowadays. While ucg was
fast, sift and pt were ecscruiating slow in a number of cases that
required special care in the benchmarks.
This also fixes the ordering of benchmark output to reflect the ordering
in the source of the benchsuite script.
Since the English subtitle file actually changed its content, we tweak
the benchmark to use a slightly bigger sample that more closely matches
the file size of the Russian subtitle file.
Also, the BurntSushi/linux repo has been updated and I've confirmed that
it builds on my Linux machine.
Fixes#1257
2016-12-24-archlinux-cheetah uses the same setup/compile config as
2016-09-22-archlinux-cheetah. The improvements are nice.
The other 2016-12-24-archlinux-cheetah-* benchmarks try to suss out
a difference between MUSL, glibc, jemalloc and the system allocator.
These benchmarks are exactly like the ones ran on 2016-09-17 with three
changes:
1. `pt` was added back to a few more benchmarks so that it appears any
time `sift` appears.
2. Warmup iterations was bumped from 1 to 3.
3. Actual benchmark iterations were bumped from 3 to 10.
These benchmarks took around two hours to run.
The runner now detects if commands exist and permits running incomplete
benchmarks.
Also, explicitly use Python 3 since that's what default Ubuntu 16.04 seems
to want.