This flag, when set, will automatically dispatch to PCRE2 if the given
regex cannot be compiled by Rust's regex engine. If both engines fail to
compile the regex, then both errors are surfaced.
Closes#1155
The default stack size is 32KB, and this increases it to 10MB. 32KB is
pretty paltry in the environments in which ripgrep runs, and 10MB is
easily afforded as a maximum size. (The size limit we set for Rust's
regex engine is considerably larger.)
This was motivated due to the fack that JIT stack limits have been
observed to be hit in the wild:
https://github.com/Microsoft/vscode/issues/64606
This flag is commonly used in pipelines and it can be annoying to write
it out every time you need it.
Ideally, we would use -h for this to match GNU grep, but -h is used to
print help output.
Closes#1185
This commit adds support for showing a preview of long lines. While the
default still remains as completely suppressing the entire line, this
new functionality will show the first N graphemes of a matching line,
including the number of matches that are suppressed.
This was unfortunately a fairly invasive change to the printer that
required a bit of refactoring. On the bright side, the single line
and multi-line coloring are now more unified than they were before.
Closes#1078
This commit attempts to surface binary filtering in a slightly more
user friendly way. Namely, before, ripgrep would silently stop
searching a file if it detected a NUL byte, even if it had previously
printed a match. This can lead to the user quite reasonably assuming
that there are no more matches, since a partial search is fairly
unintuitive. (ripgrep has this behavior by default because it really
wants to NOT search binary files at all, just like it doesn't search
gitignored or hidden files.)
With this commit, if a match has already been printed and ripgrep detects
a NUL byte, then it will print a warning message indicating that the search
stopped prematurely.
Moreover, this commit adds a new flag, --binary, which causes ripgrep to
stop filtering binary files, but in a way that still avoids dumping
binary data into terminals. That is, the --binary flag makes ripgrep
behave more like grep's default behavior.
For files explicitly specified in a search, e.g., `rg foo some-file`,
then no binary filtering is applied (just like no gitignore and no
hidden file filtering is applied). Instead, ripgrep behaves as if you
gave the --binary flag for all explicitly given files.
This was a fairly invasive change, and potentially increases the UX
complexity of ripgrep around binary files. (Before, there were two
binary modes, where as now there are three.) However, ripgrep is now a
bit louder with warning messages when binary file detection might
otherwise be hiding potential matches, so hopefully this is a net
improvement.
Finally, the `-uuu` convenience now maps to `--no-ignore --hidden
--binary`, since this is closer to the actualy intent of the
`--unrestricted` flag, i.e., to reduce ripgrep's smart filtering. As a
consequence, `rg -uuu foo` should now search roughly the same number of
bytes as `grep -r foo`, and `rg -uuua foo` should search roughly the
same number of bytes as `grep -ra foo`. (The "roughly" weasel word is
used because grep's and ripgrep's binary file detection might differ
somewhat---perhaps based on buffer sizes---which can impact exactly what
is and isn't searched.)
See the numerous tests in tests/binary.rs for intended behavior.
Fixes#306, Fixes#855
This makes the case of searching for a dictionary of a very large number
of literals much much faster. (~10x or so.) In particular, we achieve this
by short-circuiting the construction of a full regex when we know we have
a simple alternation of literals. Building the regex for a large dictionary
(>100,000 literals) turns out to be quite slow, even if it internally will
dispatch to Aho-Corasick.
Even that isn't quite enough. It turns out that even *parsing* such a regex
is quite slow. So when the -F/--fixed-strings flag is set, we short
circuit regex parsing completely and jump straight to Aho-Corasick.
We aren't quite as fast as GNU grep here, but it's much closer (less than
2x slower).
In general, this is somewhat of a hack. In particular, it seems plausible
that this optimization could be implemented entirely in the regex engine.
Unfortunately, the regex engine's internals are just not amenable to this
at all, so it would require a larger refactoring effort. For now, it's
good enough to add this fairly simple hack at a higher level.
Unfortunately, if you don't pass -F/--fixed-strings, then ripgrep will
be slower, because of the aforementioned missing optimization. Moreover,
passing flags like `-i` or `-S` will cause ripgrep to abandon this
optimization and fall back to something potentially much slower. Again,
this fix really needs to happen inside the regex engine, although we
might be able to special case -i when the input literals are pure ASCII
via Aho-Corasick's `ascii_case_insensitive`.
Fixes#497, Fixes#838
This commit adds a new encoding feature where the -E/--encoding flag
will now accept a value of 'none'. When given this value, all encoding
related machinery is disabled and ripgrep will search the raw bytes of
the file, including the BOM if it's present.
Closes#1207, Closes#1208
This lets us implement correct Unicode trimming and also simplifies the
parsing logic a bit. This also removes the last platform specific bits of
code in ripgrep core.
This fixes what appears to be a pretty egregious regression where the
`-F/--fixed-strings` flag wasn't be applied to patterns supplied via
the `-f/--file` flag. The same bug existed for the `-x/--line-regexp`
flag as well, which we fix here.
Fixes#1176
This changes how ripgrep emit exit status codes. In particular, any error
that occurs while searching will now cause ripgrep to emit a `2` exit
code, where as it previously would emit either a `0` or a `1` code based
on whether it matched or not. That is, ripgrep would only emit a `2` exit
code for a catastrophic error.
This tweak includes additional logic that GNU grep adheres to, which seems
like good sense. Namely, if -q/--quiet is given, and an error occurs and
a match occurs, then ripgrep will emit a `0` exit code.
Closes#1159
Previously, we relied on clap to handle printing either an error
message, or --help/--version output, in addition to setting the exit
status code. Unfortunately, for --help/--version output, clap was
panicking if the write failed, which can happen in fairly common
scenarios via a broken pipe error. e.g., `rg -h | head`.
We fix this by using clap's "safe" API and doing the printing ourselves.
We also set the exit code to `2` when an invalid command has been given.
Fixes#1125 and partially addresses #1159
The --ignore-file-case-insensitive flag causes all
.gitignore/.rgignore/.ignore files to have their globs matched without
regard for case. Because this introduces a potentially significant
performance regression, this is always disabled by default. Users that
need case insensitive matching can enable it on a case by case basis.
Closes#1164, Closes#1170
This commit fixes a bug where AsciiDoc would drop any line containing a
'{foo}' because it interpreted it as an undefined attribute reference:
> Simple attribute references take the form {<name>}. If the attribute name
> is defined its text value is substituted otherwise the line containing the
> reference is dropped from the output.
See: https://www.methods.co.nz/asciidoc/chunked/ch30.html
We fix this by simply replacing all occurrences of '{' and '}' with
their escaped forms: '{' and '}'.
Fixes#1101
When a "\n literal is not allowed" error is reported, ripgrep will now
suggest the use of the -U/--multiline flag, which enables matching
newlines.
Fixes#1055
The --pre-glob flag is like the --glob flag, except it applies to filtering
files through the preprocessor instead of for search. This makes it
possible to apply the preprocessor to only a small subset of files, which
can greatly reduce the process overhead of using a preprocessor when
searching large directories.
These flags provide granular control over ripgrep's buffering strategy.
The --line-buffered flag can be genuinely useful in certain types of shell
pipelines. The --block-buffered flag has a murkier use case, but we add it
for completeness.
This commit moves a lot of "utility" code from ripgrep core into
grep-cli. Any one of these things might not be worth creating a new
crate, but combining everything together results in a fair number of a
convenience routines that make up a decent sized crate.
There is potentially more we could move into the crate, but much of what
remains in ripgrep core is almost entirely dealing with the number of
flags we support.
In the course of doing moving things to the grep-cli crate, we clean up
a lot of gunk and improve failure modes in a number of cases. In
particular, we've fixed a bug where other processes could deadlock if
they write too much to stderr.
Fixes#990
This removes ripgrep-specific code for filtering files that correspond to
stdout and instead uses the 'ignore' crate's functionality for doing the
same.
These flags each accept one of five choices: none, path, modified,
accessed or created. The value indicates how the results are sorted.
For --sort, results are sorted in ascending order where as for --sortr,
results are sorted in descending order.
Closes#404
This commit adds a 'same_file_system' option to the walk builder. For
single threaded walking, it defers to the walkdir crate, which has the
same option. The bulk of this commit implements this flag for the parallel
walker. We add one very feeble test for this.
The parallel walker is now officially a complete mess.
Closes#321
This commit undoes a work-around for a bug in Rust's standard library
that prevented correct file type detection on Windows in OneDrive
directories. We remove the work-around because we are moving to a
latest-stable Rust version policy, which has included this fix for a while
now.
ref #705, https://github.com/rust-lang/rust/issues/46484
This also updates some code to make use of our more liberal versioning
requirement, including the use of crossbeam-channel instead of the MsQueue
from the older an unmaintained crossbeam 0.3. This does regrettably add
a sizable number of dependencies, however, compile times seem mostly
unaffected.
Closes#1019
Previously, we used --pcre2-unicode as the canonical flag despite the
fact that it is enabled by default, which is inconsistent with how we
handle other similar flags.
The reason why --pcre2-unicode was made the canonical flag was to make
it easier to discover since it would be sorted near the --pcre2 flag. To
solve that problem, we simply start a convention that lists related
flags in the docs.
Fixes#1022
This commit does the work to delete the old `grep` crate and effectively
rewrite most of ripgrep core to use the new libripgrep crates. The new
`grep` crate is now a facade that collects the various crates that make
up libripgrep.
The most complex part of ripgrep core is now arguably the translation
between command line parameters and the library options, which is
ultimately where we want to be.
Generally speaking, ripgrep prevents the case of not having any patterns
via its arg parsing. However, it is possible for users to provide a file
of patterns via the `-f` flag. If that file is empty, then ripgrep has
nothing to search for and therefore should not ever produce any match.
One way of fixing this might be to replace the absence of patterns with
a pattern that can never match, but this still requires opening and
searching through every file, which is quite a waste. Instead, we detect
this case explicitly and quit early.
Fixes#900
This removes logic from the decompressor for skipping tar archives. This
logic was originally added under the assumption that we probably want to
avoid the cost of reading them. However, this is generally inconsistent
with how ripgrep treats files like tar archives: it should search them
and do binary detection like normal.
Fixes#918
This commit adds a new --no-ignore-global flag that permits disabling
the use of global gitignore filtering. Global gitignores are generally
found in `$HOME/.config/git/ignore`, but its location can be configured
via git's `core.excludesFile` option.
Closes#934
This commit improves the error message when --path-separator fails. Namely,
it prints the separator it got and also prints a notice for Windows users
for common failure modes.
Fixes#957
This switch explicitly disables the --pre behavior. An empty string will
also disable --pre behavior, but actually utterring an empty flag value
is quite awkward, so we provide an explicit switch to do the same thing.
Thanks to @c-blake for pointing this out.