mirror of https://github.com/BurntSushi/ripgrep.git synced 2024-12-12 19:18:24 +02:00
ripgrep/crates/regex
Andrew Gallant c21302b409 regex: tweak inner literal heuristic
Previously, we had logic to skip our own inner literal optimization if
the regex itself was already (likely) accelerated. It turns out that the
presence of a Unicode word boundary can defeat acceleration to a point.
It's likely enough that even if the underlying regex is accelerated, it
would be prudent to do our own inner literal optimization if the pattern
has a Unicode word boundary.

Normally a Unicode word boundary doesn't defeat literal optimizations,
since even the slower engines can make use of *prefix* literal
optimizations. But a regex can be accelerated via its own inner or
suffix literal optimizations, and those require the use of a DFA (or
lazy DFA). Since DFAs crap out on haystacks that contain a non-ASCII
Unicode scalar value when the regex contains a Unicode word boundary, it
follows that an "accelerated" regex can still wind up being quite slow.

(An "accelerated" regex can also slow down because of restrictions on
avoiding quadratic behavior, but I believe this happens less frequently
and is not as severe as the slowdown caused by Unicode word
boundaries. Namely, avoiding quadratic behavior just means giving up on
the inner literal optimization for a single search, in which case the
regex engine can still fall back to a normal forward DFA. That will
definitely be slower than an inner literal optimization done by ripgrep,
but not quite as dramatic as it would be when DFAs can't be used at
all.)
2023-11-20 23:51:53 -05:00
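
The idea behind an inner literal optimization, as described in the commit message above, can be sketched roughly as follows: scan for a literal that every match must contain, and only run the expensive matcher on the lines where that literal occurs. This is a simplified illustration using only the standard library, not ripgrep's actual implementation. The function names `search_with_inner_literal` and `contains_whole_word` are hypothetical, and the `verify` closure is a stand-in for the full regex (here, something like `\bfoo\b` with Unicode-aware word boundaries).

```rust
/// Hypothetical helper: returns the lines of `haystack` that contain
/// `literal` and also satisfy `verify`. `verify` stands in for the full
/// (potentially slow) regex match and only ever sees one line at a time.
fn search_with_inner_literal<'h>(
    haystack: &'h str,
    literal: &str,
    verify: impl Fn(&str) -> bool,
) -> Vec<&'h str> {
    let mut matches = Vec::new();
    let mut pos = 0;
    while pos < haystack.len() {
        // Fast path: jump straight to the next occurrence of the literal.
        let hit = match haystack[pos..].find(literal) {
            Some(offset) => pos + offset,
            None => break,
        };
        // Expand the hit to the boundaries of its enclosing line.
        let line_start = haystack[..hit].rfind('\n').map_or(0, |i| i + 1);
        let line_end = haystack[hit..]
            .find('\n')
            .map_or(haystack.len(), |i| hit + i);
        let line = &haystack[line_start..line_end];
        // Slow path: confirm the candidate line with the full matcher.
        if verify(line) {
            matches.push(line);
        }
        // Resume the literal scan after this line.
        pos = line_end + 1;
    }
    matches
}

/// Hypothetical stand-in for a regex like `\bfoo\b`: true if `word` appears
/// in `line` delimited by non-word characters (Unicode-aware).
fn contains_whole_word(line: &str, word: &str) -> bool {
    line.split(|c: char| !c.is_alphanumeric() && c != '_')
        .any(|w| w == word)
}
```

For example, calling `search_with_inner_literal(text, "foo", |l| contains_whole_word(l, "foo"))` only runs the word-boundary check on lines that contain `foo` at all. Because the verification step sees a single short line rather than the whole haystack, a slower engine is far less costly there.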
src regex: tweak inner literal heuristic 2023-11-20 23:51:53 -05:00
Cargo.toml progress 2023-10-09 20:29:52 -04:00
LICENSE-MIT repo: move all source code in crates directory 2020-02-17 19:24:53 -05:00
README.md edition: manual changes 2021-06-01 21:07:37 -04:00
UNLICENSE repo: move all source code in crates directory 2020-02-17 19:24:53 -05:00

grep-regex

The grep-regex crate provides an implementation of the Matcher trait from the grep-matcher crate. This implementation permits Rust's regex engine to be used in the grep crate for fast line-oriented searching.

Dual-licensed under MIT or the UNLICENSE.

Documentation

https://docs.rs/grep-regex

NOTE: You probably don't want to use this crate directly. Instead, you should prefer the facade defined in the grep crate.

Usage

Add this to your Cargo.toml:

[dependencies]
grep-regex = "0.1"