I thought plain `read` had usurped memory maps, but when searching a very small
number of files, mmaps can be around 20% faster on Linux. It'd be really
unfortunate to leave that on the table.
Mmap searching doesn't support contexts yet, but we probably don't really
care. And duplicating that logic doesn't sound fun. Without contexts, mmap
searching is delightfully simple.
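To make that concrete: a minimal sketch of context-free mmap searching,
assuming the `memmap2` and `regex` crates (this isn't xrep's actual code,
just the shape of the idea):

```rust
use std::fs::File;
use std::io;
use std::path::Path;

use memmap2::Mmap;
use regex::bytes::Regex;

/// Map the whole file and scan it line by line. With no context lines to
/// track, there is no buffer management at all.
fn search_mmap(path: &Path, re: &Regex) -> io::Result<u64> {
    let file = File::open(path)?;
    // Safety: assumes the file isn't truncated or modified while mapped.
    let map = unsafe { Mmap::map(&file)? };
    let mut count = 0u64;
    for line in map.split(|&b| b == b'\n') {
        if re.is_match(line) {
            count += 1; // print or collect the matching line here
        }
    }
    Ok(count)
}
```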
In particular, if we had an inner literal and were doing a case-insensitive
search, then the literals were dropped, because we previously only allowed
a single inner literal to have an effect. Now we allow alternations of
inner literals, but still don't quite take full advantage of them.
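For illustration only (this isn't the actual literal-extraction logic, and the
helper names are made up): take whatever inner literals were pulled out of the
pattern, build one case-insensitive alternation from them, and use it as a
cheap prefilter in front of the full regex.

```rust
use regex::bytes::Regex;

/// Hypothetical helper: build a case-insensitive alternation out of the
/// inner literals extracted from the pattern.
fn literal_prefilter(literals: &[&str]) -> Regex {
    let alts: Vec<String> = literals.iter().map(|lit| regex::escape(lit)).collect();
    // `(?i)` keeps the prefilter usable for case insensitive searches, which
    // is exactly the case where the old single-literal logic gave up.
    Regex::new(&format!("(?i)(?:{})", alts.join("|"))).unwrap()
}

/// Only pay for the full regex when the cheap literal scan hits.
fn is_match(full: &Regex, prefilter: &Regex, line: &[u8]) -> bool {
    prefilter.is_match(line) && full.is_match(line)
}
```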
- Refactored the interaction between CLI args and the rest of xrep.
- Filled in a lot more options, including file type filtering.
- Fixed some bugs in globbing/ignoring.
- Added more documentation.
Memory maps appear to degrade quite a bit in the presence of multithreading.
Also, switch to lock-free data structures for synchronization. Give each
worker an input and output buffer that require no synchronization.
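Roughly, the structure looks like the sketch below. It's not the real code:
the actual change uses a lock-free structure to hand out work, and
`crossbeam-channel` stands in here only to keep the example short. The point
is that each worker owns its input and output buffers outright, so the hot
search path touches no shared state.

```rust
use std::path::PathBuf;
use std::thread;

use crossbeam_channel::unbounded;

struct Worker {
    // Owned exclusively by one thread: no locks or atomics on the hot path.
    input: Vec<u8>,
    output: Vec<u8>,
}

impl Worker {
    fn search(&mut self, path: PathBuf) {
        self.input.clear();
        self.output.clear();
        // Read `path` into self.input, search it, write results to
        // self.output, then flush self.output to stdout in one go.
        let _ = path;
    }
}

fn main() {
    let (tx, rx) = unbounded::<PathBuf>();
    let mut handles = Vec::new();
    for _ in 0..4 {
        let rx = rx.clone();
        handles.push(thread::spawn(move || {
            let mut worker = Worker { input: Vec::new(), output: Vec::new() };
            for path in rx {
                worker.search(path);
            }
        }));
    }
    tx.send(PathBuf::from("some/file")).unwrap();
    drop(tx); // close the channel so the workers exit
    for handle in handles {
        handle.join().unwrap();
    }
}
```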
I'm pretty disappointed by the performance of regex sets. They apparently
spend a lot of their time constructing the DFA, which probably means that
the DFA is just too big.
It turns out that it's actually faster to build an *additional* normal
regex with the alternation of every glob and use it as a first-pass
filter over every file path. If there's a match, only then do we try the
more expensive RegexSet.
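A minimal sketch of that two-pass scheme, assuming each glob has already been
translated into a regex string (the glob-to-regex translation is elided, and
`GlobMatcher` is a made-up name): one big alternation answers "does anything
match at all?" cheaply, and the RegexSet is only consulted to find out which
globs matched.

```rust
use regex::{Regex, RegexSet};

struct GlobMatcher {
    prefilter: Regex, // alternation of every glob: one cheap pass
    set: RegexSet,    // expensive, but reports which globs matched
}

impl GlobMatcher {
    fn new(glob_regexes: &[String]) -> Result<GlobMatcher, regex::Error> {
        let alternation = glob_regexes
            .iter()
            .map(|re| format!("(?:{})", re))
            .collect::<Vec<String>>()
            .join("|");
        Ok(GlobMatcher {
            prefilter: Regex::new(&alternation)?,
            set: RegexSet::new(glob_regexes)?,
        })
    }

    fn matches(&self, path: &str) -> Vec<usize> {
        // First pass: a single regex over the whole alternation. Only on a
        // hit do we pay for the RegexSet to learn which globs matched.
        if !self.prefilter.is_match(path) {
            return Vec::new();
        }
        self.set.matches(path).into_iter().collect()
    }
}
```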
We really need functionality like this when memory maps aren't suitable,
either because they're too slow or because they just aren't available (like
for reading stdin). However, this particular approach was completely bunk.
Namely, the interface was all wrong. The caller needs to maintain some kind
of control over the search buffers for special output features (like
contexts or inverted matching), but this interface as written doesn't
support that kind of pattern at all.
So... back to the drawing board.