This commit fixes a bug where `rg --hidden .` would behave differently
with respect to ignore filtering than `rg --hidden ./`. In particular,
this was due to a bug where the directory name `.` caused the leading
`.` in a hidden directory to get stripped, which in turn caused the
ignore rules to fail.
Fixes#807
We use the new AppSettings::AllArgsOverrideSelf to permit all flags to
be specified multiple times. This removes the need for our previous
work-around where we would enable `multiple` for every flag and then
just extract the last value when consuming clap's matches.
We also add a couple regression tests that ensure repeated switches and
flags work as expected.
This commit adds support for reading configuration files that change
ripgrep's default behavior. The format of the configuration file is an
"rc" style and is very simple. It is defined by two rules:
1. Every line is a shell argument, after trimming ASCII whitespace.
2. Lines starting with '#' (optionally preceded by any amount of
ASCII whitespace) are ignored.
ripgrep will look for a single configuration file if and only if the
RIPGREP_CONFIG_PATH environment variable is set and is non-empty.
ripgrep will parse shell arguments from this file on startup and will
behave as if the arguments in this file were prepended to any explicit
arguments given to ripgrep on the command line.
For example, if your ripgreprc file contained a single line:
--smart-case
then the following command
RIPGREP_CONFIG_PATH=wherever/.ripgreprc rg foo
would behave identically to the following command
rg --smart-case foo
This commit also adds a new flag, --no-config, that when present will
suppress any and all support for configuration. This includes any future
support for auto-loading configuration files from pre-determined paths
(which this commit does not add).
Conflicts between configuration files and explicit arguments are handled
exactly like conflicts in the same command line invocation. That is,
this command:
RIPGREP_CONFIG_PATH=wherever/.ripgreprc rg foo --case-sensitive
is exactly equivalent to
rg --smart-case foo --case-sensitive
in which case, the --case-sensitive flag would override the --smart-case
flag.
Closes#196
This commit adds opt-in support for searching compressed files during
recursive search. This behavior is only enabled when the
`-z/--search-zip` flag is passed to ripgrep. When enabled, a limited set
of common compression formats are recognized via file extension, and a
new process is spawned to perform the decompression. ripgrep then
searches the stdout of that spawned process.
Closes#539
The --passthru flag causes ripgrep to print every line,
even if the line does not contain a match. This is a
response to the common pattern of `^|foo` to match every
line, while still highlighting things like `foo`.
Fixes#740
clippy: fix a few lints
The fixes are:
* Use single quotes for single-character
* Use ticks in documentation when necessary.
* Just bow to clippy's wisdom.
This fixes a bug where a "match" color escape was erroneously emitted
after the new line character. This is because `^` is actually allowed to
match after the end of a trailing new line, which means `^$` matches both
before and after the trailing new line when multiline mode is enabled.
The trailing match was causing the phantom escape sequence to appear,
which we don't want.
Incidentally, this is the root cause of #441 as well, although this commit
doesn't fix that issue, since the line itself is printed before we detect
the phantom match.
Fixes#599
When -o/--only-matching is used with -r/--replace, the replacement works
as expected. This is not a breaking change because the flags were
previously set to conflict.
For regression 210 we may not actually need to test anything if the file
system doesn't support creating files with invalid UTF-8 bytes. Don't
create the command until we know there will be an assertion.
APFS does not support creating filenames with invalid UTF-8 byte codes,
thus this test doesn't make sense. Skip it on file systems where this
shouldn't be possible.
Fixes#559
The test is severely constrained to the specific ANSI formatting of
ripgrep in accordance with its default color scheme. The default color
scheme on Windows changed, which caused the test to fail.
For now, just disable the test on Windows.
fixesBurntSushi/ripgrep#506. Word boundary search as arg had unexpected
behavior. added capture group to regex to encapsulate 'or' option search and
prevent expansion and partial boundary finds.
Signed-off-by: Evan.Mattiza <emattiza@gmail.com>
The handling of the -o/--only-matching was incorrect. We cannot ever
re-run regexes on a *subset* of a matched line, because it doesn't take
into account zero width assertions on the edges of the regex. This
occurs whenever an end user uses an assertion explicity, but also occurs
when one is used implicitly, e.g., with the `-w` flag.
This instead reuses the initial matched range from the first regex
match. We also apply this fix to coloring.
Fixes#493
This will cause certain unsupported legacy encodings to act as if they
don't exist, in order to avoid using an unhelpful (in the context of
file searching) "replacement" encoding.
Kudos to @hsivonen for chirping about this!
This permits setting the maximum line width with respect to the number
of bytes in a line. Omitted lines (whether part of a match, replacement
or context) are replaced with a message stating that the line was
elided.
Fixes#129
This includes, but is not limited to, UTF-16, latin-1, GBK, EUC-JP and
Shift_JIS. (Courtesy of the `encoding_rs` crate.)
Specifically, this feature enables ripgrep to search files that are
encoded in an encoding other than UTF-8. The list of available encodings
is tied directly to what the `encoding_rs` crate supports, which is in
turn tied to the Encoding Standard. The full list of available encodings
can be found here: https://encoding.spec.whatwg.org/#concept-encoding-get
This pull request also introduces the notion that text encodings can be
automatically detected on a best effort basis. Currently, the only
support for this is checking for a UTF-16 bom. In all other cases, a
text encoding of `auto` (the default) implies a UTF-8 or ASCII
compatible source encoding. When a text encoding is otherwise specified,
it is unconditionally used for all files searched.
Since ripgrep's regex engine is fundamentally built on top of UTF-8,
this feature works by transcoding the files to be searched from their
source encoding to UTF-8. This transcoding only happens when:
1. `auto` is specified and a non-UTF-8 encoding is detected.
2. A specific encoding is given by end users (including UTF-8).
When transcoding occurs, errors are handled by automatically inserting
the Unicode replacement character. In this case, ripgrep's output is
guaranteed to be valid UTF-8 (excluding non-UTF-8 file paths, if they
are printed).
In all other cases, the source text is searched directly, which implies
an assumption that it is at least ASCII compatible, but where UTF-8 is
most useful. In this scenario, encoding errors are not detected. In this
case, ripgrep's output will match the input exactly, byte-for-byte.
This design may not be optimal in all cases, but it has some advantages:
1. In the happy path ("UTF-8 everywhere") remains happy. I have not been
able to witness any performance regressions.
2. In the non-UTF-8 path, implementation complexity is kept relatively
low. The cost here is transcoding itself. A potentially superior
implementation might build decoding of any encoding into the regex
engine itself. In particular, the fundamental problem with
transcoding everything first is that literal optimizations are nearly
negated.
Future work should entail improving the user experience. For example, we
might want to auto-detect more text encodings. A more elaborate UX
experience might permit end users to specify multiple text encodings,
although this seems hard to pull off in an ergonomic way.
Fixes#1
When writing paths like `!/foo` in gitignore files (or when using the
-g/--glob flag), the presence of `!` would prevent the gitignore builder
from noticing the leading slash, which causes absolute path matching to
fail.
Fixes#405
The --max-filesize option allows filtering files which are larger than
the specified limit. This is potentially useful if one is attempting to
search a number of large files without common file-types/suffixes.
See #369.
When give an explicit file path on the command line like `foo` where `foo`
is a symlink, ripgrep should follow it even if `-L` isn't set. This is
consistent with the behavior of `foo/`.
Fixes#256
When ripgrep detects a literal, it emits them as raw hex escaped byte
sequences to Regex::new. This permits literal optimizations for arbitrary
byte sequences (i.e., possibly invalid UTF-8). The problem is that
Regex::new interprets hex escaped byte sequences as *Unicode codepoints*
by default, but we want them to actually stand for their raw byte values.
Therefore, disable Unicode mode.
This is OK, since the regex is composed entirely of literals and literal
extraction does Unicode case folding.
Fixes#251
This changes the uppercase literal detection for the "smart case"
functionality. In particular, a character class is considered to have an
uppercase literal if at least one of its ranges starts or stops with an
uppercase literal.
Fixes#229