/*!
This crate provides featureful and fast printers that interoperate with the
[`grep-searcher`](https://docs.rs/grep-searcher)
crate.

# Brief overview

The [`Standard`] printer shows results in a human-readable format, and is
modeled after the formats used by standard grep-like tools. Features include,
but are not limited to, cross-platform terminal coloring, search & replace,
multi-line result handling and reporting summary statistics.

The [`JSON`] printer shows results in a machine-readable format.
To facilitate a stream of search results, the format uses [JSON
Lines](https://jsonlines.org/) by emitting a series of messages as search
results are found.
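
As a rough illustration of the JSON Lines idea only, the following sketch
writes one JSON object per line as each result becomes available. The names
`Message`, `write_message` and `emit_stream`, as well as the message fields,
are hypothetical and exist only for this sketch. They are not the message
format emitted by this crate's JSON printer.

```rust
use std::io::{self, Write};

/// A hypothetical message type for this sketch. This is NOT the schema used
/// by the JSON printer in this crate.
enum Message {
    Begin,
    Match { line_number: u64 },
    End { matches: u64 },
}

/// Write a single message as one JSON object terminated by a newline, which
/// is the framing that JSON Lines uses.
fn write_message<W: Write>(mut wtr: W, msg: &Message) -> io::Result<()> {
    match *msg {
        Message::Begin => writeln!(wtr, r#"{{"type":"begin"}}"#),
        Message::Match { line_number } => {
            writeln!(wtr, r#"{{"type":"match","line_number":{}}}"#, line_number)
        }
        Message::End { matches } => {
            writeln!(wtr, r#"{{"type":"end","matches":{}}}"#, matches)
        }
    }
}

/// Emit a begin message, one match message per matching line number and an
/// end message, returning the bytes that were written.
fn emit_stream(line_numbers: &[u64]) -> io::Result<Vec<u8>> {
    let mut out = vec![];
    write_message(&mut out, &Message::Begin)?;
    for &line_number in line_numbers {
        // Each message is written as soon as it is known, so a consumer can
        // process results incrementally, one line at a time.
        write_message(&mut out, &Message::Match { line_number })?;
    }
    let matches = line_numbers.len() as u64;
    write_message(&mut out, &Message::End { matches })?;
    Ok(out)
}
```

Because every message occupies exactly one line, a consumer can parse the
stream line by line without waiting for the search to finish. For instance,
`emit_stream(&[1, 3])` in this sketch produces four lines: a begin message,
two match messages and an end message.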

The [`Summary`] printer shows *aggregate* results for a single search in a
human-readable format, and is modeled after similar formats found in standard
grep-like tools. This printer is useful for showing the total number of matches
and/or printing file paths that either contain or don't contain matches.

# Example

This example shows how to create a "standard" printer and execute a search.

```
use {
    grep_regex::RegexMatcher,
    grep_printer::Standard,
    grep_searcher::Searcher,
};

const SHERLOCK: &'static [u8] = b"\
For the Doctor Watsons of this world, as opposed to the Sherlock
Holmeses, success in the province of detective work must always
be, to a very large extent, the result of luck. Sherlock Holmes
can extract a clew from a wisp of straw or a flake of cigar ash;
but Doctor Watson has to have it taken out for him and dusted,
and exhibited clearly, with a label attached.
";

let matcher = RegexMatcher::new(r"Sherlock")?;
let mut printer = Standard::new_no_color(vec![]);
Searcher::new().search_slice(&matcher, SHERLOCK, printer.sink(&matcher))?;

// into_inner gives us back the underlying writer we provided to
// new_no_color, which is wrapped in a termcolor::NoColor. Thus, a second
// into_inner gives us back the actual buffer.
let output = String::from_utf8(printer.into_inner().into_inner())?;
let expected = "\
1:For the Doctor Watsons of this world, as opposed to the Sherlock
3:be, to a very large extent, the result of luck. Sherlock Holmes
";
assert_eq!(output, expected);
# Ok::<(), Box<dyn std::error::Error>>(())
```
*/

#![deny(missing_docs)]
#![cfg_attr(docsrs, feature(doc_auto_cfg))]

pub use crate::{
    color::{default_color_specs, ColorError, ColorSpecs, UserColorSpec},
    hyperlink::{
        HyperlinkConfig, HyperlinkEnvironment, HyperlinkFormat,
        HyperlinkFormatError,
    },
    path::{PathPrinter, PathPrinterBuilder},
    standard::{Standard, StandardBuilder, StandardSink},
    stats::Stats,
    summary::{Summary, SummaryBuilder, SummaryKind, SummarySink},
};

#[cfg(feature = "serde")]
pub use crate::json::{JSONBuilder, JSONSink, JSON};

// The maximum number of bytes to execute a search to account for look-ahead.
//
// This is an unfortunate kludge since PCRE2 doesn't provide a way to search
// a substring of some input while accounting for look-ahead. In theory, we
// could refactor the various 'grep' interfaces to account for it, but it would
// be a large change. So for now, we just let PCRE2 go looking a bit for a
// match without searching the entire rest of the contents.
//
// Note that this kludge is only active in multi-line mode.
const MAX_LOOK_AHEAD: usize = 128;

#[macro_use]
mod macros;

mod color;
mod counter;
mod hyperlink;
mod hyperlink_aliases;
#[cfg(feature = "serde")]
mod json;
#[cfg(feature = "serde")]
mod jsont;
mod path;
mod standard;
mod stats;
mod summary;
mod util;