Mirror of https://github.com/BurntSushi/ripgrep.git (synced 2025-11-23 21:54:45 +02:00)
As of the memchr 2.6 release, its Iterator::count method is specialized
to only count the number of occurrences instead of finding the offset of
each occurrence. This replaces ripgrep's use of the bytecount crate.
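To illustrate the difference between finding the offset of each occurrence and only counting occurrences, here is a minimal, dependency-free sketch. It does not use memchr or bytecount, and the function names `count_via_offsets` and `count_direct` are hypothetical. The real crates rely on vectorized routines, so this shows only the shape of the two approaches, not their performance.

```rust
/// Count line terminators by first locating the offset of every `\n`
/// and then counting those offsets (the "find each occurrence" approach).
fn count_via_offsets(haystack: &[u8]) -> usize {
    haystack
        .iter()
        .enumerate()
        .filter(|&(_, &b)| b == b'\n')
        .map(|(offset, _)| offset) // the offset is computed, then thrown away
        .count()
}

/// Count line terminators directly, without producing any offsets
/// (the "only count" approach).
fn count_direct(haystack: &[u8]) -> usize {
    haystack.iter().filter(|&&b| b == b'\n').count()
}

fn main() {
    let haystack = b"first line\nsecond line\nthird line\n";
    // Both approaches must agree on the number of line terminators.
    assert_eq!(count_via_offsets(haystack), count_direct(haystack));
    println!("{} line terminators", count_direct(haystack));
}
```

When only the total is needed, as with line counting, the offsets are wasted work, which is the motivation for a count-only specialization.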
While micro-benchmarks suggest that memchr's method has better
throughput than bytecount, that advantage turned out to be an illusion
in practice. Namely, on a ~13GB haystack prior to this change:
    $ time rg-bytecount 'You killed my friend, my best friend, my lifelong friend!' OpenSubtitles2018.raw.en --line-number
    441450441:- You killed my friend, my best friend, my lifelong friend!
    real    1.473
    user    1.186
    sys     0.286
    maxmem  12512 MB
    faults  0
And then after:
    $ time rg 'You killed my friend, my best friend, my lifelong friend!' OpenSubtitles2018.raw.en --line-number
    441450441:- You killed my friend, my best friend, my lifelong friend!
    real    1.532
    user    1.280
    sys     0.250
    maxmem  12512 MB
    faults  0
So memchr is not faster here, but perf is just about in the same
ballpark. That's good enough for me at the moment to drop the extra
dependency.
I did this because the marginal cost of adding the Iterator::count()
specialization to memchr was extremely small.
grep-searcher
A high-level library for executing fast line-oriented searches. This handles things like reporting contextual lines, counting lines, inverting a search, detecting binary data, automatic UTF-16 transcoding and deciding whether or not to use memory maps.
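As a rough illustration of what a line-oriented search with line counting and inversion involves, here is a minimal standard-library sketch. It is not this crate's API: `search_lines` is a hypothetical helper, it uses plain substring matching, and it ignores context lines, binary detection, transcoding and memory maps.

```rust
/// Hypothetical helper, not part of grep-searcher: return
/// (1-based line number, line) pairs for every line containing `needle`.
/// When `invert` is true, return the lines that do NOT contain it.
fn search_lines<'a>(haystack: &'a str, needle: &str, invert: bool) -> Vec<(u64, &'a str)> {
    haystack
        .lines()
        .enumerate()
        // A line is reported when "it matches" differs from "invert".
        .filter(|(_, line)| line.contains(needle) != invert)
        .map(|(index, line)| (index as u64 + 1, line))
        .collect()
}

fn main() {
    let text = "alpha\nbeta\ngamma\n";

    // Normal search: report matching lines with their line numbers.
    for (number, line) in search_lines(text, "mm", false) {
        println!("{}:{}", number, line);
    }

    // Inverted search: report the lines that do not match.
    for (number, line) in search_lines(text, "mm", true) {
        println!("{}:{}", number, line);
    }
}
```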
Dual-licensed under MIT or the UNLICENSE.
Documentation
NOTE: You probably don't want to use this crate directly. Instead, you
should prefer the facade defined in the grep crate.
Usage
Add this to your Cargo.toml:
[dependencies]
grep-searcher = "0.1"