This commit adds `anyhow` as a dependency and switches over to it from
Box<dyn Error>.
It actually looks like I've kept all of my errors rather shallow, such
that we don't get a huge benefit from anyhow at present. But now that
anyhow is in use, I expect to use its "context" feature more going
forward.
As a result of discussion in #2611, it seems prudent to disable
hyperlinks by default. Ideally they would be enabled, but it looks like
some environments may barf on them. Since this is the first release with
hyperlink support, it makes sense to me at least to make users opt into
them. This does not preclude enabling them by default in future
releases.
ripgrep does not, and likely never will, report which pattern matched.
Because of that, we can dedup the patterns via just their concrete
syntax without any fuss.
This is somewhat of a pathological case because you don't expect the end
user to pass duplicate patterns in general. But if the end user
generated a list of, say, names and did not dedup them, then ripgrep
could end up spending a lot of extra time on those duplicates if there
are many of them. By deduping them explicitly in the application, we
essentially remove their extra cost completely.
This essentially takes the work done in #2483 and does a bit of a
facelift. A brief summary:
* We reduce the hyperlink API we expose to just the format, a
configuration and an environment.
* We move buffer management into a hyperlink-specific interpolator.
* We expand the documentation on --hyperlink-format.
* We rewrite the hyperlink format parser to be a simple state machine
with support for escaping '{{' and '}}'.
* We remove the 'gethostname' dependency and instead insist on the
caller to provide the hostname. (So grep-printer doesn't get it
itself, but the application will.) Similarly for the WSL prefix.
* Probably some other things.
Overall, the general structure of #2483 was kept. The biggest change is
probably requiring the caller to pass in things like a hostname instead
of having the crate do it. I did this for a couple reasons:
1. I feel uncomfortable with code deep inside the printing logic
reaching out into the environment to assume responsibility for
retrieving the hostname. This feels more like an application-level
responsibility. Arguably, path canonicalization falls into this same
bucket, but it is more difficult to rip that out. (And we can do it
in the future in a backwards compatible fashion I think.)
2. I wanted to permit end users to tell ripgrep about their system's
hostname in their own way, e.g., by running a custom executable. I
want this because I know at least for my own use cases, I sometimes
log into systems using an SSH hostname that is distinct from the
system's actual hostname (usually because the system is shared in
some way or changing its hostname is not allowed/practical).
I think that's about it.
Closes#665, Closes#2483
I originally did not put PathPrinter into grep-printer because I
considered it somewhat extraneous to what a "grep" program does, and
also that its implementation was rather simple. But now with hyperlink
support, its implementation has grown a smidge more complicated. And
more importantly, its existence required exposing a lot more of the
hyperlink guts. Without it, we can keep things like HyperlinkPath and
HyperlinkSpan completely private.
We can now also keep `PrinterPath` completely private as well. And this
is a breaking change.
This does a variety of polishing.
1. Deprecate the tty methods in favor of std's IsTerminal trait.
2. Trim down un-needed dependencies.
3. Use bstr to implement escaping.
4. Various aesthetic polishing.
I'm doing this as prep work before adding more to this crate. And as
part of a general effort toward reducing ripgrep's dependencies.
This commit represents the initial work to get hyperlinks working and
was submitted as part of PR #2483. Subsequent commits largely retain the
functionality and structure of the hyperlink support added here, but
rejigger some things around.
Previously, sorting worked by sorting the parents and then sorting the
children within each parent. This was done during traversal, but it only
works when sorting parents preserves the overall order. This generally
only works for '--sort path' in ascending order.
This commit fixes the rest of the sorting behavior by collecting all of
the paths to search and then sorting them before searching. We only
collect all of the paths when sorting was requested.
Fixes#2243, Closes#2361
This causes ripgrep to stop searching an individual file after it has
found a non-matching line. But this only occurs after it has found a
matching line.
Fixes#1790, Closes#1930
This matches the behavior of GNU grep which does not ignore
before-context and after-context completely if the context flag is also
provided.
Note that this change wasn't done just to match GNU grep. In this case,
GNU grep has the more sensible behavior.
Fixes#2288, Closes#2451
Previously, ripgrep core was responsible for escaping regex patterns and
implementing the --line-regexp flag. This commit moves that
responsibility down into the matchers such that ripgrep just needs to
hand the patterns it gets off to the matcher builder. The builder will
then take care of escaping and all that.
This was done to make pattern construction completely owned by the
matcher builders. With the arrival regex-automata, this means we can
move to the HIR very quickly and then never move back to the concrete
syntax. We can then build our regex directly from the HIR. This overall
can save quite a bit of time, especially when searching for large
dictionaries.
We still aren't quite as fast as GNU grep when searching something on
the scale of /usr/share/dict/words, but we are basically within spitting
distance. Prior to this, we were about an order of magnitude slower.
This architecture in particular lets us write a pretty simple fast path
that avoids AST parsing and HIR translation entirely: the case where one
is just searching for a literal. In that case, we can hand construct the
HIR directly.
0.2.4 updates to PCRE2 10.42 and has a few other nice changes. For
example, when `utf` is enabled, the crate will always set the
PCRE2_MATCH_INVALID_UTF option. That means we no longer need to do
transcoding or UTF-8 validity checks.
Because of this, we actually get to remove one of the two uses of
`unsafe` in ripgrep's `main` program.
(This also updates a couple other dependencies for convenience.)
This leaves the grep-regex crate in tatters. Pretty much the entire
thing needs to be re-worked. The upshot is that it should result in some
big simplifications. I hope.
The idea here is to drop down and actually use regex-automata 0.3
instead of the regex crate itself.
It turns out that the vimgrep format really only wants one line per
match, even when that match spans multiple lines.
We continue to support the previous behavior (print all lines in a
match) in the `grep-printer` crate. We add a new option to enable the
"only print the first line" behavior, and unconditionally enable it in
ripgrep. We can do that because the option has no effect in single-line
mode, since, well, in that case matches are guaranteed to span one line
anyway.
Fixes#1866
These flags permit configuring the bytes used to delimit fields in match
or context lines, where "fields" are things like the file path, line
number, column number and the match/context itself.
Fixes#1842, Closes#1871
This was once part of ripgrep, but at some point, was unintentionally
removed. The value of this warning is that since ripgrep tries to be
"smart" by default, it can be surprising if it doesn't search certain
things. This warning covers the case when ripgrep searches *nothing*,
which happens somewhat more frequently than you might expect. e.g., If
you're searching within an ignore directory.
Note that for now, we only print this message when the user has not
supplied any explicit paths. It's not clear that we want to print this
otherwise, and in particular, it seems that the message shows up too
eagerly. e.g., 'rg foo does-not-exist' will both print an error about
'does-not-exist' not existing, *and* the message about no files being
searched, which seems annoying in this case. We can always refine this
logic later.
Fixes#1404, Closes#1762
This fixes a bug only present on Windows that would permit someone to
execute an arbitrary program if they crafted an appropriate directory
tree. Namely, if someone put an executable named 'xz.exe' in the root of
a directory tree and one ran 'rg -z foo' from the root of that tree,
then the 'xz.exe' executable in that tree would execute if there are any
'xz' files anywhere in the tree.
The root cause of this problem is that 'CreateProcess' on Windows will
implicitly look in the current working directory for an executable when
it is given a relative path to a program. Rust's standard library allows
this behavior to occur, so we work around it here. We work around it by
explicitly resolving programs like 'xz' via 'PATH'. That way, we only
ever pass an absolute path to 'CreateProcess', which avoids the implicit
behavior of checking the current working directory.
This fix doesn't apply to non-Windows systems as it is believed to only
impact Windows. In theory, the bug could apply on Unix if '.' is in
one's PATH, but at that point, you reap what you sow.
While the extent to which this is a security problem isn't clear, I
think users generally expect to be able to download or clone
repositories from the Internet and run ripgrep on them without fear of
anything too awful happening. Being able to execute an arbitrary program
probably violates that expectation. Therefore, CVE-2021-3013[1] was
created for this issue.
We apply the same logic to the --pre command, since the --pre command is
likely in a user's config file and it would be surprising for something
that the user is searching to modify which preprocessor command is used.
The --pre and -z/--search-zip flags are the only two ways that ripgrep
will invoke external programs, so this should cover any possible
exploitable cases of this bug.
[1] - https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-3013
The purpose of this flag is to force ripgrep to ignore all --ignore-file
flags (whether they come before or after --no-ignore-files).
This flag can be overridden with --ignore-files.
Fixes#1466
This permits switching between the different regex engine modes that
ripgrep supports. The purpose of this flag is to make it easier to
extend ripgrep with additional regex engines.
Closes#1488, Closes#1502
This is in preparation for adding a new --engine flag which is intended
to eventually supplant --auto-hybrid-regex.
While there are no immediate plans to add more regex engines to ripgrep,
this is intended to make it easier to maintain a patch to ripgrep with
an additional regex engine. See #1488 for more details.
The top-level listing was just getting a bit too long for my taste. So
put all of the code in one directory and shrink the large top-level mess
to a small top-level mess.
NOTE: This commit only contains renames. The subsequent commit will
actually make ripgrep build again. We do it this way with the naive hope
that this will make it easier for git history to track the renames.
Sigh.