mirror of
https://github.com/BurntSushi/ripgrep.git
synced 2024-12-12 19:18:24 +02:00
ad97e9c93f
This fixes an interesting performance bug where the inner literal extractor would sometimes choose a sub-optimal literal. For example, consider the regex: `\x20+Sherlock Holmes\x20+` (The `\x20` is the ASCII code for a space character, which we use here just to make it clearer. It otherwise does not matter.) Previously, this would see the initial \x20 and then stop collecting literals after the `+` repetition operator. This was because the inner literal detector was adapted from the prefix literal detector, which had to stop here. Namely, while \x20S would be a valid prefix (for example), \x20\x20S would also be a valid prefix. As would \x20\x20\x20S and so on. So the prefix detector would have to stop at the repetition operator. Otherwise, only searching for \x20S could potentially scan farther than the starting position of the next match. However, for inner literals, this calculus no longer makes sense. We can freely search for, e.g., \x20S without missing matches that start with \x20\x20S precisely because we know this is an inner literal which may not correspond to the start of a match. With this fix, the literal that is now detected is `\x20Sherlock Holmes\x20`, which is much better. We achieve this by no longer "cutting" literals after seeing a `+` repetition operator. Instead, we permit literals to continue to be extended. The reason why this is important is that using \x20 as the literal to search for is generally bad juju since it is so common. In fact, we should probably add more logic here to either avoid such things or give up entirely on the inner literal optimization if it detects a literal that we think is very common. But we punt on such things here. |
||
---|---|---|
.. | ||
src | ||
Cargo.toml | ||
LICENSE-MIT | ||
README.md | ||
UNLICENSE |
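The inner literal reasoning in the commit message above can be illustrated with a small standalone sketch. This is not the extractor in this crate: `find_match` is a hypothetical helper, written with only the standard library, that hard-codes the example regex `\x20+Sherlock Holmes\x20+`. It searches for the inner literal first and then widens the hit over neighboring spaces, which shows why a match that begins with several spaces is not missed.

```rust
/// Hypothetical helper (not part of grep-regex): returns the byte span of the
/// leftmost match of `\x20+Sherlock Holmes\x20+` in `line`, if any.
fn find_match(line: &str) -> Option<(usize, usize)> {
    // The inner literal from the commit message: one space on each side.
    const INNER: &str = " Sherlock Holmes ";

    // Fast path: a plain substring search for the inner literal.
    let at = line.find(INNER)?;
    let bytes = line.as_bytes();

    // The literal may sit in the middle of a match, so walk left over any
    // additional spaces consumed by the leading `\x20+`.
    let mut start = at;
    while start > 0 && bytes[start - 1] == b' ' {
        start -= 1;
    }

    // Likewise, walk right over spaces consumed by the trailing `\x20+`.
    let mut end = at + INNER.len();
    while end < bytes.len() && bytes[end] == b' ' {
        end += 1;
    }

    Some((start, end))
}
```

For the input `"met   Sherlock Holmes  today"`, the literal is found at byte offset 5, but the reported match starts at offset 3, where the run of spaces begins. A prefix search could not take this shortcut, because it must report where the match starts without looking backwards.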
grep-regex
The `grep-regex` crate provides an implementation of the `Matcher` trait from
the `grep-matcher` crate. This implementation permits Rust's regex engine to
be used in the `grep` crate for fast line-oriented searching.
Dual-licensed under MIT or the UNLICENSE.
Documentation
NOTE: You probably don't want to use this crate directly. Instead, you
should prefer the facade defined in the `grep` crate.
Usage
Add this to your `Cargo.toml`:
[dependencies]
grep-regex = "0.1"
and this to your crate root:
extern crate grep_regex;