ripgrep/crates/regex/src/literal.rs

use regex_syntax::hir::Hir;

// BREADCRUMBS:
//
// The way we deal with line terminators in the regex is clunky, but probably
// the least bad option for now unfortunately.
//
// The `non_matching_bytes` routine currently hardcodes line terminators for
// anchors. But it's not really clear it should even care about line terminators
// anyway, since anchors aren't actually part of a match. If we fix that
// though, that currently reveals a different bug elsewhere: '(?-m:^)' isn't
// implemented correctly in multi-line search, because it defers to the fast
// line-by-line strategy, which ends up being wrong. I think the way forward
// there is to:
//
// 1) Add something in the grep-matcher interface that exposes a way to
// query for \A and \z specifically. If they're in the pattern, then we can
// decide how to handle them.
//
// 2) Perhaps provide a way to "translate \A/\z to ^/$" for cases when
// multi-line search is not enabled.

#[derive(Clone, Debug)]
pub struct LiteralSets {}

impl LiteralSets {
    /// Create a set of literals from the given HIR expression.
    pub fn new(_: &Hir) -> LiteralSets {
        LiteralSets {}
    }

    /// If it is deemed advantageous to do so (via various suspicious
    /// heuristics), this will return a single regular expression pattern that
    /// matches a subset of the language matched by the regular expression that
    /// generated these literal sets. The idea here is that the pattern
    /// returned by this method is much cheaper to search for, i.e., it is
    /// usually a single literal or an alternation of literals.
    pub fn one_regex(&self, _word: bool) -> Option<String> {
        None
    }
}
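
// ILLUSTRATION (not part of grep-regex):
//
// `one_regex` is currently a stub that always returns `None`. The sketch
// below only illustrates the kind of pattern its doc comment describes: a
// single literal or an alternation of literals. The names `escape_literal`
// and `alternation_of_literals` are hypothetical, the input is assumed to be
// an already-extracted list of literals, and none of the heuristics that
// decide whether such a pattern is worthwhile are modeled here.

```rust
/// Hypothetical helper: escape regex metacharacters so that `lit` is
/// matched literally when embedded in a pattern.
fn escape_literal(lit: &str) -> String {
    let mut out = String::with_capacity(lit.len());
    for c in lit.chars() {
        let is_meta = matches!(
            c,
            '\\' | '.' | '+' | '*' | '?' | '(' | ')' | '|' | '[' | ']'
                | '{' | '}' | '^' | '$' | '#' | '&' | '-' | '~'
        );
        if is_meta {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Hypothetical helper: build a single pattern that is an alternation of
/// the given literals. Returns `None` when there are no literals or when
/// any literal is empty, since an empty literal matches everywhere and
/// would not make the search any cheaper.
fn alternation_of_literals(lits: &[&str]) -> Option<String> {
    if lits.is_empty() || lits.iter().any(|l| l.is_empty()) {
        return None;
    }
    let alts: Vec<String> = lits.iter().map(|l| escape_literal(l)).collect();
    Some(alts.join("|"))
}
```

// For example, under these assumptions the literals `foo` and `a.b` would
// produce the pattern `foo|a\.b`, while an empty list would produce `None`.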