Memory map performance appears to degrade quite a bit in the presence of
multithreading. Also, switch to lock-free data structures for
synchronization, and give each worker its own input and output buffers,
which require no synchronization.
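
As a rough illustration of the per-worker buffer idea only (not the
actual change), the sketch below gives each thread a private input Vec
and a private output Vec, so nothing is shared while the work runs. The
function name, the round-robin split, and the stand-in "work" (taking
the path length) are all assumptions made for the example. It does not
show the lock-free queue itself.

```rust
use std::thread;

/// Hypothetical helper: split `paths` across `workers` threads, where each
/// thread owns its input buffer and fills its own output buffer.
fn process_in_workers(paths: Vec<String>, workers: usize) -> Vec<Vec<usize>> {
    let workers = workers.max(1);

    // One private input buffer per worker, filled round-robin before any
    // thread starts, so no synchronization is needed while they run.
    let mut inputs: Vec<Vec<String>> = (0..workers).map(|_| Vec::new()).collect();
    for (i, path) in paths.into_iter().enumerate() {
        inputs[i % workers].push(path);
    }

    // Each worker takes ownership of its input and builds its own output.
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|input| {
            thread::spawn(move || {
                let mut output = Vec::with_capacity(input.len());
                for path in &input {
                    // Stand-in for real work (e.g. searching the file).
                    output.push(path.len());
                }
                output
            })
        })
        .collect();

    // Joining is the only synchronization point; buffers come back by value.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .collect()
}
```
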
I'm pretty disappointed by the performance of regex sets. They
apparently spend a lot of their time constructing the DFA, which
probably means that the DFA is just too big.
It turns out that it's actually faster to build an *additional* ordinary
regex from the alternation of every glob and use it as a first-pass
filter over every file path. Only if that regex matches do we try the
more expensive RegexSet.