1
0
mirror of https://github.com/BurntSushi/ripgrep.git synced 2025-04-02 20:45:38 +02:00

106 lines
3.5 KiB
Rust
Raw Normal View History

/*!
This crate provides featureful and fast printers that interoperate with the
[`grep-searcher`](https://docs.rs/grep-searcher)
crate.
# Brief overview
The [`Standard`](struct.Standard.html) printer shows results in a human
readable format, and is modeled after the formats used by standard grep-like
tools. Features include, but are not limited to, cross platform terminal
coloring, search & replace, multi-line result handling and reporting summary
statistics.
The [`JSON`](struct.JSON.html) printer shows results in a machine readable
format. To facilitate a stream of search results, the format uses
[JSON Lines](https://jsonlines.org/)
by emitting a series of messages as search results are found.
The [`Summary`](struct.Summary.html) printer shows *aggregate* results for a
single search in a human readable format, and is modeled after similar formats
found in standard grep-like tools. This printer is useful for showing the total
number of matches and/or printing file paths that either contain or don't
contain matches.
# Example
This example shows how to create a "standard" printer and execute a search.
```
use {
grep_regex::RegexMatcher,
grep_printer::Standard,
grep_searcher::Searcher,
};
const SHERLOCK: &'static [u8] = b"\
For the Doctor Watsons of this world, as opposed to the Sherlock
Holmeses, success in the province of detective work must always
be, to a very large extent, the result of luck. Sherlock Holmes
can extract a clew from a wisp of straw or a flake of cigar ash;
but Doctor Watson has to have it taken out for him and dusted,
and exhibited clearly, with a label attached.
";
let matcher = RegexMatcher::new(r"Sherlock")?;
let mut printer = Standard::new_no_color(vec![]);
Searcher::new().search_slice(&matcher, SHERLOCK, printer.sink(&matcher))?;
// into_inner gives us back the underlying writer we provided to
// new_no_color, which is wrapped in a termcolor::NoColor. Thus, a second
// into_inner gives us back the actual buffer.
let output = String::from_utf8(printer.into_inner().into_inner())?;
let expected = "\
1:For the Doctor Watsons of this world, as opposed to the Sherlock
3:be, to a very large extent, the result of luck. Sherlock Holmes
";
assert_eq!(output, expected);
# Ok::<(), Box<dyn std::error::Error>>(())
```
*/
#![deny(missing_docs)]
#![cfg_attr(feature = "pattern", feature(pattern))]
pub use crate::{
color::{default_color_specs, ColorError, ColorSpecs, UserColorSpec},
hyperlink::{
HyperlinkPattern, HyperlinkPatternBuilder, HyperlinkPatternError,
},
path::{PathPrinter, PathPrinterBuilder},
standard::{Standard, StandardBuilder, StandardSink},
stats::Stats,
summary::{Summary, SummaryBuilder, SummaryKind, SummarySink},
};
#[cfg(feature = "serde")]
pub use crate::json::{JSONBuilder, JSONSink, JSON};
grep: fix bugs in handling multi-line look-around This commit hacks in a bug fix for handling look-around across multiple lines. The main problem is that by the time the matching lines are sent to the printer, the surrounding context---which some look-behind or look-ahead might have matched---could have been dropped if it wasn't part of the set of matching lines. Therefore, when the printer re-runs the regex engine in some cases (to do replacements, color matches, etc etc), it won't be guaranteed to see the same matches that the searcher found. Overall, this is a giant clusterfuck and suggests that the way I divided the abstraction boundary between the printer and the searcher is just wrong. It's likely that the searcher needs to handle more of the work of matching and pass that info on to the printer. The tricky part is that this additional work isn't always needed. Ultimately, this means a serious re-design of the interface between searching and printing. Sigh. The way this fix works is to smuggle the underlying buffer used by the searcher through into the printer. Since these bugs only impact multi-line search (otherwise, searches are only limited to matches across a single line), and since multi-line search always requires having the entire file contents in a single contiguous slice (memory mapped or on the heap), it follows that the buffer we pass through when we need it is, in fact, the entire haystack. So this commit refactors the printer's regex searching to use that buffer instead of the intended bundle of bytes containing just the relevant matching portions of that same buffer. There is one last little hiccup: PCRE2 doesn't seem to have a way to specify an ending position for a search. So when we re-run the search to find matches, we can't say, "but don't search past here." Since the buffer is likely to contain the entire file, we really cannot do anything here other than specify a fixed upper bound on the number of bytes to search. So if look-ahead goes more than N bytes beyond the match, this code will break by simply being unable to find the match. In practice, this is probably pretty rare. I believe that if we did a better fix for this bug by fixing the interfaces, then we'd probably try to have PCRE2 find the pertinent matches up front so that it never needs to re-discover them. Fixes #1412
2021-05-31 08:29:01 -04:00
// The maximum number of bytes to execute a search to account for look-ahead.
//
// This is an unfortunate kludge since PCRE2 doesn't provide a way to search
// a substring of some input while accounting for look-ahead. In theory, we
// could refactor the various 'grep' interfaces to account for it, but it would
// be a large change. So for now, we just let PCRE2 go looking a bit for a
// match without searching the entire rest of the contents.
//
// Note that this kludge is only active in multi-line mode.
const MAX_LOOK_AHEAD: usize = 128;
#[macro_use]
mod macros;
mod color;
mod counter;
mod hyperlink;
mod hyperlink_aliases;
#[cfg(feature = "serde")]
mod json;
#[cfg(feature = "serde")]
mod jsont;
mod path;
mod standard;
mod stats;
mod summary;
mod util;