This commit attempts to surface binary filtering in a slightly more
user friendly way. Namely, before, ripgrep would silently stop
searching a file if it detected a NUL byte, even if it had previously
printed a match. This can lead to the user quite reasonably assuming
that there are no more matches, since a partial search is fairly
unintuitive. (ripgrep has this behavior by default because it really
wants to NOT search binary files at all, just like it doesn't search
gitignored or hidden files.)
With this commit, if a match has already been printed and ripgrep detects
a NUL byte, then it will print a warning message indicating that the search
stopped prematurely.
Moreover, this commit adds a new flag, --binary, which causes ripgrep to
stop filtering binary files, but in a way that still avoids dumping
binary data into terminals. That is, the --binary flag makes ripgrep
behave more like grep's default behavior.
For files explicitly specified in a search, e.g., `rg foo some-file`,
then no binary filtering is applied (just like no gitignore and no
hidden file filtering is applied). Instead, ripgrep behaves as if you
gave the --binary flag for all explicitly given files.
This was a fairly invasive change, and potentially increases the UX
complexity of ripgrep around binary files. (Before, there were two
binary modes, where as now there are three.) However, ripgrep is now a
bit louder with warning messages when binary file detection might
otherwise be hiding potential matches, so hopefully this is a net
improvement.
Finally, the `-uuu` convenience now maps to `--no-ignore --hidden
--binary`, since this is closer to the actualy intent of the
`--unrestricted` flag, i.e., to reduce ripgrep's smart filtering. As a
consequence, `rg -uuu foo` should now search roughly the same number of
bytes as `grep -r foo`, and `rg -uuua foo` should search roughly the
same number of bytes as `grep -ra foo`. (The "roughly" weasel word is
used because grep's and ripgrep's binary file detection might differ
somewhat---perhaps based on buffer sizes---which can impact exactly what
is and isn't searched.)
See the numerous tests in tests/binary.rs for intended behavior.
Fixes#306, Fixes#855
This lets us implement correct Unicode trimming and also simplifies the
parsing logic a bit. This also removes the last platform specific bits of
code in ripgrep core.
This changes how ripgrep emit exit status codes. In particular, any error
that occurs while searching will now cause ripgrep to emit a `2` exit
code, where as it previously would emit either a `0` or a `1` code based
on whether it matched or not. That is, ripgrep would only emit a `2` exit
code for a catastrophic error.
This tweak includes additional logic that GNU grep adheres to, which seems
like good sense. Namely, if -q/--quiet is given, and an error occurs and
a match occurs, then ripgrep will emit a `0` exit code.
Closes#1159
Previously, we relied on clap to handle printing either an error
message, or --help/--version output, in addition to setting the exit
status code. Unfortunately, for --help/--version output, clap was
panicking if the write failed, which can happen in fairly common
scenarios via a broken pipe error. e.g., `rg -h | head`.
We fix this by using clap's "safe" API and doing the printing ourselves.
We also set the exit code to `2` when an invalid command has been given.
Fixes#1125 and partially addresses #1159
This commit cleans up the README and splits portions of it out into
a user guide (GUIDE.md) and a FAQ (FAQ.md). The README now provides a
small list of documentation "quick" links to various parts of the docs.
This commit also does a few other minor touchups.
This commit uses the recent refactoring for defining flags to
automatically generate a man page. This finally allows us to define the
documentation for each flag in a single place.
The man page is generated on every build, if and only if `asciidoc` is
installed. When generated, it is placed in Cargo's `OUT_DIR` directory,
which is the same place that shell completions live.
This commit adds support for reading configuration files that change
ripgrep's default behavior. The format of the configuration file is an
"rc" style and is very simple. It is defined by two rules:
1. Every line is a shell argument, after trimming ASCII whitespace.
2. Lines starting with '#' (optionally preceded by any amount of
ASCII whitespace) are ignored.
ripgrep will look for a single configuration file if and only if the
RIPGREP_CONFIG_PATH environment variable is set and is non-empty.
ripgrep will parse shell arguments from this file on startup and will
behave as if the arguments in this file were prepended to any explicit
arguments given to ripgrep on the command line.
For example, if your ripgreprc file contained a single line:
--smart-case
then the following command
RIPGREP_CONFIG_PATH=wherever/.ripgreprc rg foo
would behave identically to the following command
rg --smart-case foo
This commit also adds a new flag, --no-config, that when present will
suppress any and all support for configuration. This includes any future
support for auto-loading configuration files from pre-determined paths
(which this commit does not add).
Conflicts between configuration files and explicit arguments are handled
exactly like conflicts in the same command line invocation. That is,
this command:
RIPGREP_CONFIG_PATH=wherever/.ripgreprc rg foo --case-sensitive
is exactly equivalent to
rg --smart-case foo --case-sensitive
in which case, the --case-sensitive flag would override the --smart-case
flag.
Closes#196
This commit adds opt-in support for searching compressed files during
recursive search. This behavior is only enabled when the
`-z/--search-zip` flag is passed to ripgrep. When enabled, a limited set
of common compression formats are recognized via file extension, and a
new process is spawned to perform the decompression. ripgrep then
searches the stdout of that spawned process.
Closes#539
The --passthru flag causes ripgrep to print every line,
even if the line does not contain a match. This is a
response to the common pattern of `^|foo` to match every
line, while still highlighting things like `foo`.
Fixes#740
* Don't use 'smart typography' when generating man page
* Document PATTERN and PATH
* Capitalise place-holder names consistently
* Add note about PATH overriding glob/ignore rules
* Update args.rs for new PATH capitalisation
Fixes#725
This reverts a couple of changes introduced in 4c78ca8 and keeps the
`PATTERN` argument consistently uppercased, so error messages can look
like:
error: The following required arguments were not provided:
<PATTERN>
Formatting of rg.1.md. Remove backticks from already indented code.
Add missing italic to some arguments, Replace -n by --line-number in
--pretty for better clarity. Add explicit example of `*.foo` instead of
`<glob>` in examples. Add vim information to --vimgrep.
In src/app.rs, also changed help text for pattern and regexp. Actually,
"multiple patterns may be given" was not true for the standalone
pattern.
This permits setting the maximum line width with respect to the number
of bytes in a line. Omitted lines (whether part of a match, replacement
or context) are replaced with a message stating that the line was
elided.
Fixes#129
This includes, but is not limited to, UTF-16, latin-1, GBK, EUC-JP and
Shift_JIS. (Courtesy of the `encoding_rs` crate.)
Specifically, this feature enables ripgrep to search files that are
encoded in an encoding other than UTF-8. The list of available encodings
is tied directly to what the `encoding_rs` crate supports, which is in
turn tied to the Encoding Standard. The full list of available encodings
can be found here: https://encoding.spec.whatwg.org/#concept-encoding-get
This pull request also introduces the notion that text encodings can be
automatically detected on a best effort basis. Currently, the only
support for this is checking for a UTF-16 bom. In all other cases, a
text encoding of `auto` (the default) implies a UTF-8 or ASCII
compatible source encoding. When a text encoding is otherwise specified,
it is unconditionally used for all files searched.
Since ripgrep's regex engine is fundamentally built on top of UTF-8,
this feature works by transcoding the files to be searched from their
source encoding to UTF-8. This transcoding only happens when:
1. `auto` is specified and a non-UTF-8 encoding is detected.
2. A specific encoding is given by end users (including UTF-8).
When transcoding occurs, errors are handled by automatically inserting
the Unicode replacement character. In this case, ripgrep's output is
guaranteed to be valid UTF-8 (excluding non-UTF-8 file paths, if they
are printed).
In all other cases, the source text is searched directly, which implies
an assumption that it is at least ASCII compatible, but where UTF-8 is
most useful. In this scenario, encoding errors are not detected. In this
case, ripgrep's output will match the input exactly, byte-for-byte.
This design may not be optimal in all cases, but it has some advantages:
1. In the happy path ("UTF-8 everywhere") remains happy. I have not been
able to witness any performance regressions.
2. In the non-UTF-8 path, implementation complexity is kept relatively
low. The cost here is transcoding itself. A potentially superior
implementation might build decoding of any encoding into the regex
engine itself. In particular, the fundamental problem with
transcoding everything first is that literal optimizations are nearly
negated.
Future work should entail improving the user experience. For example, we
might want to auto-detect more text encodings. A more elaborate UX
experience might permit end users to specify multiple text encodings,
although this seems hard to pull off in an ergonomic way.
Fixes#1
The --max-filesize option allows filtering files which are larger than
the specified limit. This is potentially useful if one is attempting to
search a number of large files without common file-types/suffixes.
See #369.
Add small howtos for installing shell completion files to the README and
the man page.
They are still incomplete. We're missing Zsh and PowerShell.
Fixes#262
Previously, ripgrep would only emit the 'bold' ANSI escape sequence if
no foreground or background color was set. Instead, it would convert colors
to their "intense" versions if bold was set. The intent was to do the same
thing on Windows and Unix. However, this had a few negative side effects:
1. Omitting the 'bold' ANSI escape when 'bold' was set is surprising.
2. Intense colors can look quite bad and be hard to read.
To fix this, we introduce a new setting called 'intense' in the --colors
flag, and thread that down through to the public API of the `termcolor`
crate. The 'intense' setting has environment specific behavior:
1. In ANSI mode, it will convert the selected color to its "intense"
variant.
2. In the Windows console, it will make the text "intense."
There is no longer any "smart" handling of the 'bold' style. The 'bold'
ANSI escape is always emitted when it is selected. In the Windows
console, the 'bold' setting now has no effect. Note that this is a
breaking change.
Fixes#266, #293
This commit completely guts all of the color handling code and replaces
most of it with two new crates: wincolor and termcolor. wincolor
provides a simple API to coloring using the Windows console and
termcolor provides a platform independent coloring API tuned for
multithreaded command line programs. This required a lot more
flexibility than what the `term` crate provided, so it was dropped.
We instead switch to writing ANSI escape sequences directly and ignore
the TERMINFO database.
In addition to fixing several bugs, this commit also permits end users
to customize colors to a certain extent. For example, this command will
set the match color to magenta and the line number background to yellow:
rg --colors 'match:fg:magenta' --colors 'line:bg:yellow' foo
For tty handling, we've adopted a hack from `git` to do tty detection in
MSYS/mintty terminals. As a result, ripgrep should get both color
detection and piping correct on Windows regardless of which terminal you
use.
Finally, switch to line buffering. Performance doesn't seem to be
impacted and it's an otherwise more user friendly option.
Fixes#37, Fixes#51, Fixes#94, Fixes#117, Fixes#182, Fixes#231