mirror of https://github.com/BurntSushi/ripgrep.git synced 2024-12-12 19:18:24 +02:00
ripgrep/grep-regex
Andrew Gallant ad97e9c93f grep-regex: improve inner literal detection
This fixes an interesting performance bug where the inner literal
extractor would sometimes choose a sub-optimal literal. For example,
consider the regex:

    \x20+Sherlock Holmes\x20+

(The `\x20` is the ASCII code for a space character, which we use here
to just make it clearer. It otherwise does not matter.)

Previously, this would see the initial \x20 and then stop collecting
literals after the `+` repetition operator. This was because the inner
literal detector was adapted from the prefix literal detector, which had
to stop here. Namely, while \x20S would be a valid prefix (for example),
\x20\x20S would also be a valid prefix. As would \x20\x20\x20S and so
on. So the prefix detector would have to stop at the repetition
operator. Otherwise, only searching for \x20S could potentially scan
farther than the starting position of the next match.

However, for inner literals, this calculus no longer makes sense. We can
freely search for, e.g., \x20S without missing matches that start with
\x20\x20S precisely because we know this is an inner literal which may
not correspond to the start of a match.

With this fix, the literal that is now detected is

    \x20Sherlock Holmes\x20

Which is much better. We achieve this by no longer "cutting" literals
after seeing a `+` repetition operator. Instead, we permit literals to
continue to be extended.
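
The argument above can be checked with a small standalone sketch. It
does not use grep-regex or the regex crate: `find_match` is a
hypothetical hand-rolled matcher for this one pattern, written only for
illustration. It shows that the inner literal lies inside the match but
not at its start, so the literal works as a candidate filter even though
it would be wrong to treat its position as the start of the match.

```rust
// Hypothetical hand-rolled matcher for ` +Sherlock Holmes +`.
// Returns the byte range (start, end) of the leftmost match, if any.
// This is an illustration only, not how grep-regex matches.
fn find_match(hay: &str) -> Option<(usize, usize)> {
    let name = "Sherlock Holmes";
    let bytes = hay.as_bytes();
    let mut from = 0;
    while let Some(off) = hay[from..].find(name) {
        let name_start = from + off;
        let name_end = name_start + name.len();
        // Extend left over the run of spaces before the name.
        let mut start = name_start;
        while start > 0 && bytes[start - 1] == b' ' {
            start -= 1;
        }
        // Extend right over the run of spaces after the name.
        let mut end = name_end;
        while end < bytes.len() && bytes[end] == b' ' {
            end += 1;
        }
        // `+` requires at least one space on each side.
        if start < name_start && end > name_end {
            return Some((start, end));
        }
        // `name` starts with an ASCII byte, so +1 is a char boundary.
        from = name_start + 1;
    }
    None
}

fn main() {
    let inner = " Sherlock Holmes ";
    let hay = "Then   Sherlock Holmes   spoke.";

    let (start, end) = find_match(hay).expect("haystack contains a match");
    let lit_at = hay.find(inner).expect("inner literal is present");

    // The literal sits inside the match, strictly after the match start.
    assert!(start < lit_at);
    assert!(lit_at + inner.len() <= end);

    // A haystack without the inner literal cannot contain a match.
    let miss = "Sherlock Holmes spoke.";
    assert!(!miss.contains(inner));
    assert!(find_match(miss).is_none());
}
```

The sketch covers only this pattern, where the repetitions sit at the
two ends of the literal. It makes no claim about patterns with a
repetition in the middle.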

The reason why this is important is because using \x20 as the literal to
search for is generally bad juju since it is so common. In fact, we
should probably add more logic here to either avoid such things or give
up entirely on the inner literal optimization if it detects a literal
that we think is very common. But we punt on such things here.
2020-02-17 17:16:28 -05:00
src grep-regex: improve inner literal detection 2020-02-17 17:16:28 -05:00
Cargo.toml deps: update to thread_local 1.0 2020-01-10 15:07:47 -05:00
LICENSE-MIT libripgrep: initial commit introducing libripgrep 2018-08-20 07:10:19 -04:00
README.md libripgrep: initial commit introducing libripgrep 2018-08-20 07:10:19 -04:00
UNLICENSE libripgrep: initial commit introducing libripgrep 2018-08-20 07:10:19 -04:00

grep-regex

The grep-regex crate provides an implementation of the Matcher trait from the grep-matcher crate. This implementation permits Rust's regex engine to be used in the grep crate for fast line oriented searching.


Dual-licensed under MIT or the UNLICENSE.

Documentation

https://docs.rs/grep-regex

NOTE: You probably don't want to use this crate directly. Instead, you should prefer the facade defined in the grep crate.

Usage

Add this to your Cargo.toml:

    [dependencies]
    grep-regex = "0.1"

and this to your crate root:

    extern crate grep_regex;