mirror of https://github.com/BurntSushi/ripgrep.git synced 2024-12-12 19:18:24 +02:00
Andrew Gallant 09108b7fda regex: make multi-literal searcher faster
This makes the case of searching for a dictionary of a very large number
of literals much, much faster (~10x or so). In particular, we achieve this
by short-circuiting the construction of a full regex when we know we have
a simple alternation of literals. Building the regex for a large dictionary
(>100,000 literals) turns out to be quite slow, even if it internally will
dispatch to Aho-Corasick.
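The commit relies on Aho-Corasick to match an entire dictionary of literals in a single pass over the input. To illustrate why that scales to very large dictionaries, here is a minimal textbook Aho-Corasick sketch in std-only Rust. It is not the `aho-corasick` crate's implementation, which is considerably more optimized; the `TinyAhoCorasick` type and its `find_all` method are hypothetical names chosen for this example.

```rust
use std::collections::{HashMap, VecDeque};

/// Minimal textbook Aho-Corasick automaton over bytes (illustrative sketch,
/// not the `aho-corasick` crate's implementation).
pub struct TinyAhoCorasick {
    goto: Vec<HashMap<u8, usize>>, // trie edges for each state
    fail: Vec<usize>,              // failure link for each state
    out: Vec<Vec<usize>>,          // ids of patterns recognized at each state
    lens: Vec<usize>,              // byte length of each pattern
}

impl TinyAhoCorasick {
    pub fn new(patterns: &[&str]) -> Self {
        let mut goto: Vec<HashMap<u8, usize>> = vec![HashMap::new()];
        let mut out: Vec<Vec<usize>> = vec![Vec::new()];
        // Phase 1: insert every pattern into a trie rooted at state 0.
        for (id, pat) in patterns.iter().enumerate() {
            let mut s = 0;
            for &b in pat.as_bytes() {
                let existing = goto[s].get(&b).copied();
                s = match existing {
                    Some(next) => next,
                    None => {
                        goto.push(HashMap::new());
                        out.push(Vec::new());
                        let next = goto.len() - 1;
                        goto[s].insert(b, next);
                        next
                    }
                };
            }
            out[s].push(id);
        }
        // Phase 2: compute failure links breadth-first, so shallower states
        // are finished before deeper states need them.
        let mut fail = vec![0; goto.len()];
        let mut queue: VecDeque<usize> = goto[0].values().copied().collect();
        while let Some(s) = queue.pop_front() {
            let edges: Vec<(u8, usize)> = goto[s].iter().map(|(&b, &t)| (b, t)).collect();
            for (b, t) in edges {
                // Walk failure links from `s` until some state has an edge on `b`.
                let mut f = fail[s];
                let target = loop {
                    if let Some(&next) = goto[f].get(&b) {
                        break next;
                    }
                    if f == 0 {
                        break 0;
                    }
                    f = fail[f];
                };
                fail[t] = target;
                // Any pattern recognized at the fallback state also ends at `t`.
                let inherited = out[target].clone();
                out[t].extend(inherited);
                queue.push_back(t);
            }
        }
        let lens: Vec<usize> = patterns.iter().map(|p| p.len()).collect();
        TinyAhoCorasick { goto, fail, out, lens }
    }

    /// Reports every (start, end, pattern id) match, overlaps included,
    /// in a single left-to-right pass over `haystack`.
    pub fn find_all(&self, haystack: &str) -> Vec<(usize, usize, usize)> {
        let mut matches = Vec::new();
        let mut s = 0;
        for (i, &b) in haystack.as_bytes().iter().enumerate() {
            // On a mismatch, fall back along failure links instead of rescanning.
            while s != 0 && !self.goto[s].contains_key(&b) {
                s = self.fail[s];
            }
            s = self.goto[s].get(&b).copied().unwrap_or(0);
            for &id in &self.out[s] {
                matches.push((i + 1 - self.lens[id], i + 1, id));
            }
        }
        matches
    }
}
```

For example, `TinyAhoCorasick::new(&["he", "she", "his", "hers"]).find_all("ushers")` reports `she`, `he` and `hers`. The cost of a large dictionary is paid once, when the automaton is built; the search itself is linear in the haystack length plus the number of matches reported, regardless of how many patterns there are.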

Even that isn't quite enough. It turns out that even *parsing* such a regex
is quite slow. So when the -F/--fixed-strings flag is set, we short-circuit
regex parsing completely and jump straight to Aho-Corasick.
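The control flow described here can be sketched as a small planning step. This is a minimal model under stated assumptions: `Plan` and `plan_search` are hypothetical names, and the actual code path in grep-regex differs in detail.

```rust
/// Hypothetical search plan; these names are illustrative, not ripgrep's.
#[derive(Debug, PartialEq)]
pub enum Plan {
    /// Hand the literals straight to a multi-literal matcher such as Aho-Corasick.
    Literals(Vec<String>),
    /// Build one big alternation that must be parsed and compiled as a regex.
    Regex(String),
}

/// `fixed_strings` models the -F/--fixed-strings flag.
pub fn plan_search(patterns: &[&str], fixed_strings: bool) -> Plan {
    if fixed_strings {
        // With -F every pattern is known to be a literal, so no regex is
        // built or parsed at all.
        return Plan::Literals(patterns.iter().map(|p| p.to_string()).collect());
    }
    // Without -F, the patterns are joined into `p1|p2|...|pN`. For a large
    // dictionary, merely parsing this string is already expensive, even if
    // a plain alternation of literals is later detected and handed off.
    Plan::Regex(patterns.join("|"))
}
```

The design point is that the -F check happens before any regex string exists, so a dictionary of 100,000 literals never has to go through the parser.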

We aren't quite as fast as GNU grep here, but it's much closer (less than
2x slower).

In general, this is somewhat of a hack. In particular, it seems plausible
that this optimization could be implemented entirely in the regex engine.
Unfortunately, the regex engine's internals are just not amenable to this
at all, so it would require a larger refactoring effort. For now, it's
good enough to add this fairly simple hack at a higher level.

Unfortunately, if you don't pass -F/--fixed-strings, then ripgrep will
be slower, because of the aforementioned missing optimization. Moreover,
passing flags like `-i` or `-S` will cause ripgrep to abandon this
optimization and fall back to something potentially much slower. Again,
this fix really needs to happen inside the regex engine, although we
might be able to special-case -i when the input literals are pure ASCII
via Aho-Corasick's `ascii_case_insensitive`.
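The ASCII special case floated above can be sketched as follows. Both functions are hypothetical: `ascii_folding_suffices` is the eligibility check, and `contains_ascii_ci` is a naive single-literal scan standing in for an Aho-Corasick automaton configured for ASCII case insensitivity. One caveat the check does not capture: ASCII folding is not identical to Unicode-aware case-insensitive matching even for ASCII literals, because Unicode case folding treats U+212A KELVIN SIGN as equivalent to `k`.

```rust
/// Hypothetical check: could a case-insensitive search over these literals
/// stay on the literal fast path using ASCII-only case folding?
pub fn ascii_folding_suffices(literals: &[&str]) -> bool {
    literals.iter().all(|lit| lit.is_ascii())
}

/// Naive ASCII case-insensitive substring test (a stand-in for an
/// ASCII-case-insensitive multi-literal matcher, not the real thing).
pub fn contains_ascii_ci(haystack: &str, needle: &str) -> bool {
    let (h, n) = (haystack.as_bytes(), needle.as_bytes());
    if n.is_empty() {
        return true;
    }
    // Comparing raw bytes is safe for an ASCII needle: ASCII bytes never
    // occur inside multi-byte UTF-8 sequences.
    h.windows(n.len()).any(|w| w.eq_ignore_ascii_case(n))
}
```

Under these assumptions, `contains_ascii_ci("Hello WORLD", "world")` is true, while `contains_ascii_ci("\u{212A}elvin", "kelvin")` is false, which is exactly the kind of case a Unicode-aware `-i` search would still match.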

Fixes #497, Fixes #838
2019-04-07 19:11:03 -04:00
src regex: make multi-literal searcher faster 2019-04-07 19:11:03 -04:00
Cargo.toml regex: make multi-literal searcher faster 2019-04-07 19:11:03 -04:00
LICENSE-MIT libripgrep: initial commit introducing libripgrep 2018-08-20 07:10:19 -04:00
README.md libripgrep: initial commit introducing libripgrep 2018-08-20 07:10:19 -04:00
UNLICENSE libripgrep: initial commit introducing libripgrep 2018-08-20 07:10:19 -04:00

grep-regex

The grep-regex crate provides an implementation of the Matcher trait from the grep-matcher crate. This implementation permits Rust's regex engine to be used in the grep crate for fast line-oriented searching.
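To make the split between "matcher" and "line-oriented searcher" concrete: the searcher only needs something that can report where a match is, so the engine behind it is interchangeable. The sketch below uses a hypothetical, heavily simplified `SimpleMatcher` trait, not the actual `grep_matcher::Matcher` API (which is richer, with its own error and capture types), plus a literal matcher standing in for a regex and a naive line-by-line loop standing in for the real searcher.

```rust
/// Hypothetical, simplified stand-in for a matcher trait.
pub trait SimpleMatcher {
    /// Returns the byte range of the first match at or after `at`, if any.
    fn find_at(&self, haystack: &[u8], at: usize) -> Option<(usize, usize)>;
}

/// Hypothetical matcher for a single literal (stands in for a regex).
pub struct LiteralMatcher(pub Vec<u8>);

impl SimpleMatcher for LiteralMatcher {
    fn find_at(&self, haystack: &[u8], at: usize) -> Option<(usize, usize)> {
        let needle = &self.0;
        if needle.is_empty() {
            return Some((at, at));
        }
        haystack[at..]
            .windows(needle.len())
            .position(|w| w == needle.as_slice())
            .map(|i| (at + i, at + i + needle.len()))
    }
}

/// Naive line-oriented search: 1-based numbers of lines that contain a match.
/// Any type implementing the trait can be plugged in.
pub fn matching_lines<M: SimpleMatcher>(matcher: &M, haystack: &[u8]) -> Vec<usize> {
    haystack
        .split(|&b| b == b'\n')
        .enumerate()
        .filter(|(_, line)| matcher.find_at(line, 0).is_some())
        .map(|(i, _)| i + 1)
        .collect()
}
```

Swapping `LiteralMatcher` for a regex-backed type changes nothing in `matching_lines`, which is the role a trait implementation like this crate's plays for the grep crate.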


Dual-licensed under MIT or the UNLICENSE.

Documentation

https://docs.rs/grep-regex

NOTE: You probably don't want to use this crate directly. Instead, you should prefer the facade defined in the grep crate.

Usage

Add this to your Cargo.toml:

```toml
[dependencies]
grep-regex = "0.1"
```

and this to your crate root (only needed on the Rust 2015 edition):

```rust
extern crate grep_regex;
```