This commit attempts to surface binary filtering in a slightly more
user friendly way. Namely, before, ripgrep would silently stop
searching a file if it detected a NUL byte, even if it had previously
printed a match. This can lead to the user quite reasonably assuming
that there are no more matches, since a partial search is fairly
unintuitive. (ripgrep has this behavior by default because it really
wants to NOT search binary files at all, just like it doesn't search
gitignored or hidden files.)
With this commit, if a match has already been printed and ripgrep detects
a NUL byte, then it will print a warning message indicating that the search
stopped prematurely.
Moreover, this commit adds a new flag, --binary, which causes ripgrep to
stop filtering binary files, but in a way that still avoids dumping
binary data into terminals. That is, the --binary flag makes ripgrep
behave more like grep's default behavior.
For files explicitly specified in a search, e.g., `rg foo some-file`,
then no binary filtering is applied (just like no gitignore and no
hidden file filtering is applied). Instead, ripgrep behaves as if you
gave the --binary flag for all explicitly given files.
This was a fairly invasive change, and potentially increases the UX
complexity of ripgrep around binary files. (Before, there were two
binary modes, where as now there are three.) However, ripgrep is now a
bit louder with warning messages when binary file detection might
otherwise be hiding potential matches, so hopefully this is a net
improvement.
Finally, the `-uuu` convenience now maps to `--no-ignore --hidden
--binary`, since this is closer to the actualy intent of the
`--unrestricted` flag, i.e., to reduce ripgrep's smart filtering. As a
consequence, `rg -uuu foo` should now search roughly the same number of
bytes as `grep -r foo`, and `rg -uuua foo` should search roughly the
same number of bytes as `grep -ra foo`. (The "roughly" weasel word is
used because grep's and ripgrep's binary file detection might differ
somewhat---perhaps based on buffer sizes---which can impact exactly what
is and isn't searched.)
See the numerous tests in tests/binary.rs for intended behavior.
Fixes#306, Fixes#855
This basically rewrites every integration test. We reduce the amount of
magic involved here in terms of which arguments are being passed to
ripgrep processes. To make up for the boiler plate saved by the magic,
we make the Dir (formerly WorkDir) type a bit nicer to use, along with a
new TestCommand that wraps a std::process::Command. In exchange, we get
tests that are easier to read and write.
We also run every test with the `--pcre2` flag to make sure that works,
when PCRE2 is available.
This commit does the work to delete the old `grep` crate and effectively
rewrite most of ripgrep core to use the new libripgrep crates. The new
`grep` crate is now a facade that collects the various crates that make
up libripgrep.
The most complex part of ripgrep core is now arguably the translation
between command line parameters and the library options, which is
ultimately where we want to be.
This commit improves the integration test setup by running tests inside
the system's temporary directory instead of within ripgrep's `target`
directory. The motivation here is to attempt to reduce the effect of
unanticipated state on ripgrep's integration tests, such as the presence
of `.gitignore` files in ripgrep's checkout directory hierarchy
(including parent directories).
This doesn't remove all possible state. For example, there's no
guarantee that the system's temporary directory isn't itself within a
git repository. Moreover, there may still be other ignore rules in the
directory tree that might impact test behavior. Fixing this seems
somewhat difficult. Conceptually, it seems like ripgrep should run each
test in its own `chroot`-like environment, but doing this in a
non-annoying and portable way (including Windows) doesn't appear to be
possible.
Another approach to take here might be to teach ripgrep itself that a
particular directory should be treated as root, and therefore, never
look at anything outside that directory. This also seems complex to
implement, but tractable. Let's see how this approach works for now.
Fixes#448, #996
Generally speaking, ripgrep prevents the case of not having any patterns
via its arg parsing. However, it is possible for users to provide a file
of patterns via the `-f` flag. If that file is empty, then ripgrep has
nothing to search for and therefore should not ever produce any match.
One way of fixing this might be to replace the absence of patterns with
a pattern that can never match, but this still requires opening and
searching through every file, which is quite a waste. Instead, we detect
this case explicitly and quit early.
Fixes#900
This commit fixes an interesting bug in the `ignore` crate where it
would basically respect any `.gitignore` file anywhere (including global
gitignores in `~/.config/git/ignore`), regardless of whether we were
searching in a git repository or not. This commit rectifies that
behavior to only respect gitignore rules when in a git repo.
The key change here is to move the logic of whether to traverse parents
into the directory matcher rather than putting the onus on the directory
traverser. In particular, we now need to traverse parent directories in
more cases than we previously did, since we need to determine whether
we're in a git repository or not.
Fixes#934
The preprocessor flag accepts a command program and executes this
program for every input file that is searched. Instead of searching the
file directly, ripgrep will instead search the stdout contents of the
program.
Closes#978, Closes#981
A bug in the atty crate was previously masking a problem with the
integration tests on Windows. Namely, the bug in atty resulted in
atty::is(Stdin) returning true if we couldn't get the file name for the
stdin stream. This in turn caused tests like `rg foo` to search the CWD,
which was the intended behavior. However, once the atty bug was fixed,
atty::is(Stdin) no longer returned true, causing `rg foo` searches to
fail.
On Unix-like systems, the atty behavior has always been correct.
However, on Unix-like systems we have a decent way of detecting whether
stdin is readable or not. If it isn't---which is the case in the
integration tests---then we fall back to searching the CWD. On Windows
however, we haven't yet implemented anything to detect whether stdin is
readable or not, so we must always assume that it is. Therefore, we
never get the "go ahead" to search the CWD and the tests fail.
Most of the tests are written to search the CWD explicitly, but there
were a few stragglers that don't.
This isn't great, and we should try to figure out how to do better stdin
detection on Windows.
This commit does what no software project has ever done before: we've
outright removed a flag with no possible way to recapture its
functionality.
This flag presents numerous problems in that it never really worked well
in the first place, and completely falls over when ripgrep uses the
--no-heading output format. Well meaning users want ripgrep to fix this
by getting into the alignment business by buffering all output, but that
is a line that I refuse to cross.
Fixes#795
This update brings with it many bug fixes:
* Better error messages are printed overall. We also include
explicit call out for unsupported features like backreferences
and look-around.
* Regexes like `\s*{` no longer emit incomprehensible errors.
* Unicode escape sequences, such as `\u{..}` are now supported.
For the most part, this upgrade was done in a straight-forward way. We
resist the urge to refactor the `grep` crate, in anticipation of it
being rewritten anyway.
Note that we removed the `--fixed-strings` suggestion whenever a regex
syntax error occurs. In practice, I've found that it results in a lot of
false positives, and I believe that its use is not as paramount now that
regex parse errors are much more readable.
Closes#268, Closes#395, Closes#702, Closes#853
This commit provides basic support for a --stats flag, which will print
various aggregate statistics about a search after all of the results
have been printed. This is mostly intended to support a similar feature
found in the Silver Searcher. Note though that we don't emit the total
bytes searched; this is a first pass at an implementation and we can
improve upon it later.
Closes#411, Closes#799
Namely, when ripgrep is asked to count things and is also asked to print
every match on its own line, then we should just automatically count the
matches and not the lines. This is a departure from how GNU grep behaves,
but there is a compelling argument to be made that GNU grep's behavior
doesn't make a lot of sense.
Note that since this changes the behavior of combining two existing
flags, this is a breaking change.
This commit introduces a new flag, --count-matches, which will cause
ripgrep to report a total count of all matches instead of a count of
total lines matched.
Closes#566, Closes#814
This commit adds support for printing 0-based byte offset before each
line. We handle corner cases such as `-o/--only-matching` and
`-C/--context` as well.
Closes#812
Use the new `Globset::backslash_escape` knob to conform to git behavior:
`\` will escape the following character. For example, the pattern `\*`
will match a file literally named `*`.
Also tweak a test in ripgrep that was relying on this incorrect
behavior.
Closes#526, Closes#811
This commit fixes a bug where `rg --hidden .` would behave differently
with respect to ignore filtering than `rg --hidden ./`. In particular,
this was due to a bug where the directory name `.` caused the leading
`.` in a hidden directory to get stripped, which in turn caused the
ignore rules to fail.
Fixes#807
We use the new AppSettings::AllArgsOverrideSelf to permit all flags to
be specified multiple times. This removes the need for our previous
work-around where we would enable `multiple` for every flag and then
just extract the last value when consuming clap's matches.
We also add a couple regression tests that ensure repeated switches and
flags work as expected.
This commit adds support for reading configuration files that change
ripgrep's default behavior. The format of the configuration file is an
"rc" style and is very simple. It is defined by two rules:
1. Every line is a shell argument, after trimming ASCII whitespace.
2. Lines starting with '#' (optionally preceded by any amount of
ASCII whitespace) are ignored.
ripgrep will look for a single configuration file if and only if the
RIPGREP_CONFIG_PATH environment variable is set and is non-empty.
ripgrep will parse shell arguments from this file on startup and will
behave as if the arguments in this file were prepended to any explicit
arguments given to ripgrep on the command line.
For example, if your ripgreprc file contained a single line:
--smart-case
then the following command
RIPGREP_CONFIG_PATH=wherever/.ripgreprc rg foo
would behave identically to the following command
rg --smart-case foo
This commit also adds a new flag, --no-config, that when present will
suppress any and all support for configuration. This includes any future
support for auto-loading configuration files from pre-determined paths
(which this commit does not add).
Conflicts between configuration files and explicit arguments are handled
exactly like conflicts in the same command line invocation. That is,
this command:
RIPGREP_CONFIG_PATH=wherever/.ripgreprc rg foo --case-sensitive
is exactly equivalent to
rg --smart-case foo --case-sensitive
in which case, the --case-sensitive flag would override the --smart-case
flag.
Closes#196
This commit adds opt-in support for searching compressed files during
recursive search. This behavior is only enabled when the
`-z/--search-zip` flag is passed to ripgrep. When enabled, a limited set
of common compression formats are recognized via file extension, and a
new process is spawned to perform the decompression. ripgrep then
searches the stdout of that spawned process.
Closes#539
The --passthru flag causes ripgrep to print every line,
even if the line does not contain a match. This is a
response to the common pattern of `^|foo` to match every
line, while still highlighting things like `foo`.
Fixes#740
clippy: fix a few lints
The fixes are:
* Use single quotes for single-character
* Use ticks in documentation when necessary.
* Just bow to clippy's wisdom.
This fixes a bug where a "match" color escape was erroneously emitted
after the new line character. This is because `^` is actually allowed to
match after the end of a trailing new line, which means `^$` matches both
before and after the trailing new line when multiline mode is enabled.
The trailing match was causing the phantom escape sequence to appear,
which we don't want.
Incidentally, this is the root cause of #441 as well, although this commit
doesn't fix that issue, since the line itself is printed before we detect
the phantom match.
Fixes#599
When -o/--only-matching is used with -r/--replace, the replacement works
as expected. This is not a breaking change because the flags were
previously set to conflict.
For regression 210 we may not actually need to test anything if the file
system doesn't support creating files with invalid UTF-8 bytes. Don't
create the command until we know there will be an assertion.
APFS does not support creating filenames with invalid UTF-8 byte codes,
thus this test doesn't make sense. Skip it on file systems where this
shouldn't be possible.
Fixes#559
The test is severely constrained to the specific ANSI formatting of
ripgrep in accordance with its default color scheme. The default color
scheme on Windows changed, which caused the test to fail.
For now, just disable the test on Windows.
fixesBurntSushi/ripgrep#506. Word boundary search as arg had unexpected
behavior. added capture group to regex to encapsulate 'or' option search and
prevent expansion and partial boundary finds.
Signed-off-by: Evan.Mattiza <emattiza@gmail.com>
The handling of the -o/--only-matching was incorrect. We cannot ever
re-run regexes on a *subset* of a matched line, because it doesn't take
into account zero width assertions on the edges of the regex. This
occurs whenever an end user uses an assertion explicity, but also occurs
when one is used implicitly, e.g., with the `-w` flag.
This instead reuses the initial matched range from the first regex
match. We also apply this fix to coloring.
Fixes#493
This will cause certain unsupported legacy encodings to act as if they
don't exist, in order to avoid using an unhelpful (in the context of
file searching) "replacement" encoding.
Kudos to @hsivonen for chirping about this!