Fixes BurntSushi/ripgrep#506. Word boundary search given as an argument had unexpected
behavior. Added a capture group to the regex to encapsulate the 'or' (alternation) search and
prevent expansion and partial boundary finds.
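A minimal sketch of why the group matters, assuming word-boundary search wraps the user's pattern in `\b...\b` (the helper name `word_pattern` is hypothetical, not ripgrep's API). Without a group, alternation has the lowest precedence, so the boundaries attach only to the first and last branches. A non-capturing group would serve equally well.

```rust
/// Hypothetical helper: wrap a user pattern for word-boundary search so
/// that `\b` applies to the whole pattern.
fn word_pattern(pattern: &str) -> String {
    // Without the group, `\bfoo|bar\b` parses as `\bfoo` OR `bar\b`,
    // because `|` binds loosest in regex syntax.
    format!(r"\b({})\b", pattern)
}

fn main() {
    let naive = format!(r"\b{}\b", "foo|bar");
    println!("naive: {}", naive);
    println!("fixed: {}", word_pattern("foo|bar"));
}
```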
Signed-off-by: Evan.Mattiza <emattiza@gmail.com>
to better organize options. These are the changes:
- color will have default value of "never" if --vimgrep is given,
and only if no --color option is given
- last overrides previous: --line-number and --no-line-number, --heading
and --no-heading, --with-filename and --no-filename, and --vimgrep and
--count
- no heading will be shown if --vimgrep is given. This already worked inside
vim because the heading is only shown if stdout is a tty
(which is not the case when rg is called from vim).
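The "last overrides previous" rule and the conditional color default can be sketched as below. This is a hypothetical, std-only model, not clap's or ripgrep's code; the `--color=WHEN` syntax handled here and the "auto" fallback are assumptions.

```rust
/// Hypothetical sketch: resolve a pair of opposing flags so the one that
/// appears last on the command line wins. `None` means neither was given.
fn last_wins(args: &[&str], on: &str, off: &str) -> Option<bool> {
    let mut state = None;
    for &arg in args {
        if arg == on {
            state = Some(true);
        } else if arg == off {
            state = Some(false);
        }
    }
    state
}

/// Default color to "never" under --vimgrep, but only when no explicit
/// --color=WHEN is given. The "auto" fallback is an assumption.
fn color_choice(args: &[&str]) -> String {
    // The last explicit --color=WHEN wins, if any.
    let explicit = args
        .iter()
        .filter_map(|&a| a.strip_prefix("--color="))
        .last();
    match explicit {
        Some(when) => when.to_string(),
        None if args.contains(&"--vimgrep") => "never".to_string(),
        None => "auto".to_string(),
    }
}

fn main() {
    let args = ["--line-number", "--vimgrep", "--no-line-number"];
    println!("line numbers: {:?}", last_wins(&args, "--line-number", "--no-line-number"));
    println!("color: {}", color_choice(&args));
}
```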
Unfortunately, clap does not behave like a typical GNU/POSIX argument parser in some
cases, as reported in https://github.com/kbknapp/clap-rs/issues/970
and https://github.com/kbknapp/clap-rs/issues/976 (though it has all the bells
and whistles otherwise). So we still have issues like rg
failing when the same argument is given more than once (except for the few
marked with `multiple(true)`), or unintuitive precedence
rules (probably unintentional, and only there because of clap's
limitations) like:
- --no-filename over --vimgrep
- --no-line-number over --column, --pretty or --vimgrep
- --no-heading over --pretty
regardless of the order in which the options were given, whereas the desired
behavior would be for the last option to override the previous ones.
This reverts a couple of changes introduced in 4c78ca8 and keeps the
`PATTERN` argument consistently uppercased, so error messages can look
like:
    error: The following required arguments were not provided:
        <PATTERN>
The handling of -o/--only-matching was incorrect. We cannot ever
re-run regexes on a *subset* of a matched line, because doing so doesn't take
into account zero-width assertions on the edges of the regex. This
occurs whenever an end user uses an assertion explicitly, but also
when one is used implicitly, e.g., with the `-w` flag.
This instead reuses the initial matched range from the first regex
match. We also apply this fix to coloring.
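A std-only sketch of the failure mode. Since no regex engine is used here, the hypothetical `find_word` stands in for `\bfoo\b`: its boundary checks look at bytes *outside* the match, so slicing out the matched text and searching again discards exactly the context those checks depend on.

```rust
/// Hypothetical stand-in for the regex `\bNEEDLE\b`: find `needle` in `hay`
/// where it does not touch an ASCII word character on either side.
fn find_word(hay: &str, needle: &str) -> Option<(usize, usize)> {
    if needle.is_empty() {
        return None;
    }
    let bytes = hay.as_bytes();
    let is_word = |b: u8| b.is_ascii_alphanumeric() || b == b'_';
    let mut from = 0;
    while let Some(pos) = hay[from..].find(needle) {
        let (s, e) = (from + pos, from + pos + needle.len());
        // The zero-width checks: they inspect bytes outside the match.
        let left_ok = s == 0 || !is_word(bytes[s - 1]);
        let right_ok = e == bytes.len() || !is_word(bytes[e]);
        if left_ok && right_ok {
            return Some((s, e));
        }
        // Advance by one character so slicing stays on a char boundary.
        from = s + hay[s..].chars().next().map_or(1, char::len_utf8);
    }
    None
}

fn main() {
    let line = "xfoo foo";
    // Correct: reuse the range found while matching the full line.
    let (s, e) = find_word(line, "foo").unwrap();
    println!("match at {}..{}: {:?}", s, e, &line[s..e]);
    // Wrong: re-running on a slice hides the 'x' that made `\b` fail.
    println!("re-run on {:?}: {:?}", &line[1..4], find_word(&line[1..4], "foo"));
}
```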
Fixes #493
Formatting of rg.1.md. Remove backticks from already indented code.
Add missing italics to some arguments. Replace -n with --line-number in
--pretty for better clarity. Add an explicit example of `*.foo` instead of
`<glob>` in examples. Add vim information to --vimgrep.
In src/app.rs, also changed the help text for pattern and regexp, since
"multiple patterns may be given" was not true for the standalone
pattern.
With vim configured with:
    set grepprg=rg\ --vimgrep
    set grepformat^=%f:%l:%c:%m
and running the command `:grep 'vimgrep' doc/rg.1`, the output should
be:
    doc/rg.1:446:8:.B \-\-vimgrep
but the actual output was:
    446:8:.B \-\-vimgrep
The same issue occurs if results match only one file. Ag behaves as
expected.
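The expected `--vimgrep` line shape can be sketched as a single format string; the function name is hypothetical. The point is that the path is emitted unconditionally, so vim's `%f:%l:%c:%m` pattern always has a file name to capture.

```rust
/// Hypothetical sketch of one `--vimgrep` output line. The path is always
/// included, even when only one file is searched, so that a grepformat of
/// `%f:%l:%c:%m` can parse it.
fn vimgrep_line(path: &str, line: u64, column: u64, text: &str) -> String {
    format!("{}:{}:{}:{}", path, line, column, text)
}

fn main() {
    println!("{}", vimgrep_line("doc/rg.1", 446, 8, r".B \-\-vimgrep"));
}
```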
This will cause certain unsupported legacy encodings to act as if they
don't exist, in order to avoid using an unhelpful (in the context of
file searching) "replacement" encoding.
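A hedged sketch of the idea using a tiny, hand-written subset of the Encoding Standard's label table (a real implementation would defer to `encoding_rs`; the function and table here are hypothetical). Labels that resolve to the "replacement" encoding are reported as unknown, since that decoder turns any non-empty input into a single U+FFFD, which is useless for search.

```rust
/// Hypothetical lookup over a small subset of Encoding Standard labels.
/// Labels resolving to "replacement" are treated as if they don't exist.
fn lookup_encoding(label: &str) -> Option<&'static str> {
    let name = match label.trim().to_ascii_lowercase().as_str() {
        "utf-8" | "utf8" => "UTF-8",
        "utf-16le" => "UTF-16LE",
        "latin1" => "windows-1252",
        "sjis" => "Shift_JIS",
        // Legacy encodings the standard maps to "replacement".
        "iso-2022-kr" | "hz-gb-2312" => "replacement",
        _ => return None,
    };
    if name == "replacement" {
        None
    } else {
        Some(name)
    }
}

fn main() {
    for label in ["latin1", "iso-2022-kr", "bogus"] {
        println!("{} -> {:?}", label, lookup_encoding(label));
    }
}
```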
Kudos to @hsivonen for chirping about this!
This commit updates clap to v2.23.0
The update contained a bug fix in clap that results in broken code in
ripgrep. ripgrep was relying on the bug, but this commit fixes that
issue. The bug centered around not being able to override the
auto-generated help message by supplying a flag with a long of `help`.
Normally, supplying a flag with a long of `help` means whenever the user
passes `--help`, the consuming code (e.g. ripgrep) is responsible for
displaying the help message. However, due to the bug in clap, ripgrep only
had to do this when the user passed `-h`. With the bug
fixed, when the user passed `--help`, clap expected ripgrep to
display the help, while ripgrep expected clap to display it. This
has been fixed in this commit of ripgrep.
All well now!
v2.23.0 also brings the ability to use `Arg::help` or `Arg::long_help`,
allowing one to distinguish between `-h` and `--help`. This commit
nonetheless leaves all doc strings in the `lazy_static!` hashmap, purely for
aesthetic reasons.
This means all home rolled handling of `-h`/`--help` has been removed
from ripgrep, yet functionality *and* appearances are 100% the same.
Previously, `get_matches` would return even if --help or --version was
given, and we could check for them manually. That behavior seems to have
changed. Instead, we must use `get_matches_safe` and inspect the returned error to
determine what happened.
We can't use the same process for -V/--version since clap will
unconditionally print its own version info. Instead, we rename (internally)
the version flag so that clap doesn't interfere.
This permits setting the maximum line width with respect to the number
of bytes in a line. Omitted lines (whether part of a match, replacement
or context) are replaced with a message stating that the line was
elided.
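A minimal sketch of a byte-based width limit, assuming an optional maximum; the function name and the replacement message wording are hypothetical. Note that `str::len` counts bytes, so a multi-byte character counts more than once.

```rust
/// Hypothetical sketch: lines longer than `max_columns` bytes are replaced
/// by a notice. The notice text is an assumption, not ripgrep's wording.
fn limit_line(line: &str, max_columns: Option<usize>) -> String {
    match max_columns {
        Some(max) if line.len() > max => {
            format!("[elided line with {} bytes]", line.len())
        }
        _ => line.to_string(),
    }
}

fn main() {
    let long = "x".repeat(200);
    println!("{}", limit_line("short line", Some(100)));
    println!("{}", limit_line(&long, Some(100)));
}
```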
Fixes #129
This changes the default behavior of ripgrep to *not* show line numbers
when it is printing to a tty and is only searching stdin.
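The new default can be expressed as a small decision function. This is a hypothetical sketch, assuming an explicit --line-number or --no-line-number always takes precedence and that line numbers were otherwise on by default only for tty output.

```rust
/// Hypothetical sketch: an explicit flag wins; otherwise show line numbers
/// only when printing to a tty *and* not searching stdin alone.
fn show_line_numbers(explicit: Option<bool>, stdout_is_tty: bool, only_stdin: bool) -> bool {
    explicit.unwrap_or(stdout_is_tty && !only_stdin)
}

fn main() {
    println!("tty, files: {}", show_line_numbers(None, true, false));
    println!("tty, stdin only: {}", show_line_numbers(None, true, true));
}
```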
Fixes #380
[breaking-change]
It's not clear exactly what is happening here, but the Read implementation
for text decoding appears to be quite sensitive. Small perturbations in the code
appear to have a nearly 100% impact on the overall speed of ripgrep when
searching UTF-16 files.
I haven't had the time to examine the generated code in detail, but
`perf stat` seems to think that the instruction cache is performing a lot
worse when the code slows down. This might mean that excessive inlining
causes a different code structure that leads to less-than-optimal icache
usage, but it's at best a guess.
Explicitly disabling inlining for the cold path seems to help the
optimizer figure out the right thing.
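To illustrate the hot/cold split (not ripgrep's actual decoder), here is a hypothetical buffered reader: the per-call copy stays in `read`, while the refill path is marked `#[inline(never)]` so its body is kept out of the hot function. Whether this helps in any given build is, as noted above, an empirical question.

```rust
use std::io::{self, Read};

/// Hypothetical buffered reader used only to illustrate the hot/cold split.
struct Buffered<R> {
    inner: R,
    buf: Vec<u8>,
    pos: usize,
    len: usize,
}

impl<R: Read> Buffered<R> {
    fn new(inner: R, capacity: usize) -> Self {
        Buffered { inner, buf: vec![0; capacity], pos: 0, len: 0 }
    }

    // Cold path: refilling the buffer. `#[inline(never)]` keeps this body
    // out of `read`, so the common case stays small.
    #[inline(never)]
    fn refill(&mut self) -> io::Result<usize> {
        self.len = self.inner.read(&mut self.buf)?;
        self.pos = 0;
        Ok(self.len)
    }
}

impl<R: Read> Read for Buffered<R> {
    // Hot path: copy out of the already-filled buffer.
    fn read(&mut self, out: &mut [u8]) -> io::Result<usize> {
        if self.pos == self.len && self.refill()? == 0 {
            return Ok(0); // EOF
        }
        let n = out.len().min(self.len - self.pos);
        out[..n].copy_from_slice(&self.buf[self.pos..self.pos + n]);
        self.pos += n;
        Ok(n)
    }
}

fn main() -> io::Result<()> {
    let mut rdr = Buffered::new(&b"hello, world"[..], 4);
    let mut out = String::new();
    rdr.read_to_string(&mut out)?;
    println!("{}", out);
    Ok(())
}
```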
This includes, but is not limited to, UTF-16, latin-1, GBK, EUC-JP and
Shift_JIS. (Courtesy of the `encoding_rs` crate.)
Specifically, this feature enables ripgrep to search files that are
encoded in an encoding other than UTF-8. The list of available encodings
is tied directly to what the `encoding_rs` crate supports, which is in
turn tied to the Encoding Standard. The full list of available encodings
can be found here: https://encoding.spec.whatwg.org/#concept-encoding-get
This pull request also introduces the notion that text encodings can be
automatically detected on a best-effort basis. Currently, the only
support for this is checking for a UTF-16 BOM. In all other cases, a
text encoding of `auto` (the default) implies a UTF-8 or ASCII
compatible source encoding. When a text encoding is otherwise specified,
it is unconditionally used for all files searched.
Since ripgrep's regex engine is fundamentally built on top of UTF-8,
this feature works by transcoding the files to be searched from their
source encoding to UTF-8. This transcoding only happens when:
1. `auto` is specified and a non-UTF-8 encoding is detected.
2. A specific encoding is given by end users (including UTF-8).
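The two transcoding conditions can be sketched as follows. The `EncodingSetting` enum and function names are hypothetical, and only the UTF-16 BOM sniffing described above is modeled: `FF FE` for little-endian and `FE FF` for big-endian.

```rust
/// Hypothetical model of the encoding setting.
enum EncodingSetting {
    Auto,
    Specific(&'static str),
}

/// Best-effort detection: only a UTF-16 byte order mark is recognized.
fn sniff_bom(buf: &[u8]) -> Option<&'static str> {
    if buf.starts_with(&[0xFF, 0xFE]) {
        Some("UTF-16LE")
    } else if buf.starts_with(&[0xFE, 0xFF]) {
        Some("UTF-16BE")
    } else {
        None
    }
}

/// Returns the source encoding to transcode from, or `None` to search the
/// bytes directly (assumed UTF-8 or ASCII compatible).
fn transcode_from(setting: &EncodingSetting, buf: &[u8]) -> Option<&'static str> {
    match *setting {
        // Condition 2: an explicit encoding, even UTF-8, is always used.
        EncodingSetting::Specific(name) => Some(name),
        // Condition 1: `auto` transcodes only when a non-UTF-8 encoding is detected.
        EncodingSetting::Auto => sniff_bom(buf),
    }
}

fn main() {
    let utf16 = [0xFF, 0xFE, b'h', 0x00];
    println!("{:?}", transcode_from(&EncodingSetting::Auto, &utf16));
    println!("{:?}", transcode_from(&EncodingSetting::Auto, b"plain"));
    println!("{:?}", transcode_from(&EncodingSetting::Specific("UTF-8"), b"plain"));
}
```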
When transcoding occurs, errors are handled by automatically inserting
the Unicode replacement character. In this case, ripgrep's output is
guaranteed to be valid UTF-8 (excluding non-UTF-8 file paths, if they
are printed).
In all other cases, the source text is searched directly, which implies
an assumption that it is at least ASCII compatible, but where UTF-8 is
most useful. In this scenario, encoding errors are not detected. In this
case, ripgrep's output will match the input exactly, byte-for-byte.
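The replacement-character behavior during transcoding can be shown with the standard library alone; ripgrep uses `encoding_rs`, so this is only a hedged illustration for UTF-16LE. Because the result is a Rust `String`, it is valid UTF-8 by construction. BOM stripping is omitted, and replacing a dangling odd byte is an assumption of this sketch.

```rust
/// Hypothetical sketch: transcode UTF-16LE bytes to UTF-8, inserting U+FFFD
/// for unpaired surrogates and for a dangling odd byte.
fn utf16le_to_utf8(bytes: &[u8]) -> String {
    let units = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]));
    let mut out: String = char::decode_utf16(units)
        .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect();
    if bytes.len() % 2 == 1 {
        out.push(char::REPLACEMENT_CHARACTER);
    }
    out
}

fn main() {
    // 'h', an unpaired high surrogate (0xD800), then 'i'.
    let bytes = [0x68, 0x00, 0x00, 0xD8, 0x69, 0x00];
    println!("{}", utf16le_to_utf8(&bytes));
}
```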
This design may not be optimal in all cases, but it has some advantages:
1. The happy path ("UTF-8 everywhere") remains happy. I have not been
able to witness any performance regressions.
2. In the non-UTF-8 path, implementation complexity is kept relatively
low. The cost here is transcoding itself. A potentially superior
implementation might build decoding of any encoding into the regex
engine itself. In particular, the fundamental problem with
transcoding everything first is that literal optimizations are nearly
negated.
Future work should entail improving the user experience. For example, we
might want to auto-detect more text encodings. A more elaborate UX
might permit end users to specify multiple text encodings,
although this seems hard to pull off in an ergonomic way.
Fixes #1
The --max-filesize option allows filtering files which are larger than
the specified limit. This is potentially useful if one is attempting to
search a number of large files without common file-types/suffixes.
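A minimal sketch of the size filter, assuming the limit is a byte count and files strictly larger than it are skipped (size-suffix parsing is out of scope here). Both function names are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical check: files larger than the limit are skipped; with no
/// limit, everything is searched.
fn within_limit(len: u64, max_filesize: Option<u64>) -> bool {
    max_filesize.map_or(true, |max| len <= max)
}

/// Looks up the size on disk and applies the limit.
fn should_search(path: &Path, max_filesize: Option<u64>) -> io::Result<bool> {
    Ok(within_limit(fs::metadata(path)?.len(), max_filesize))
}

fn main() -> io::Result<()> {
    let exe = std::env::current_exe()?;
    println!("search {:?}? {}", exe, should_search(&exe, Some(1024))?);
    Ok(())
}
```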
See #369.
This commit fixes two issues. The first issue is that if a file contained
many NUL bytes without any LF bytes, then the InputBuffer would read the
entire file into memory. This is not typically a problem, but if you run
rg on /proc, then bad things can happen when reading virtual memory mapping
files. Arguably, such files should be ignored, but we should also try to
avoid exhausting memory too. We fix this by pushing the `-a/--text` flag
down into InputBuffer, so that it knows to stop immediately if it
finds a NUL byte.
The other issue this fixes is that binary detection is now applied to every
buffer instead of just the first one. This helps avoid detecting too many
files as plain text if the first parts of a binary file happen to contain
no NUL bytes. This issue still persists somewhat in the memory map
searcher, since we probably don't want to search the entire file upfront
for NUL bytes before actually performing our search. Instead, we search the
first 10KB for now.
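Both fixes can be sketched over a generic reader. This is a hypothetical simplification, not InputBuffer itself: every chunk is checked for NUL (not just the first), reading stops early unless text mode is on, and the memory-map path only inspects a prefix. Interpreting "10KB" as 10 * 1024 bytes is an assumption.

```rust
use std::io::{self, Read};

/// Hypothetical sketch: read in chunks, checking every chunk for a NUL byte.
/// Unless `text` (-a/--text) is set, stop at the first binary chunk instead
/// of pulling the rest of the file into memory. Returns the data read so
/// far and whether binary data was detected.
fn read_until_binary<R: Read>(
    mut rdr: R,
    text: bool,
    chunk_size: usize,
) -> io::Result<(Vec<u8>, bool)> {
    let mut data = Vec::new();
    let mut buf = vec![0u8; chunk_size];
    loop {
        let n = rdr.read(&mut buf)?;
        if n == 0 {
            return Ok((data, false));
        }
        // Binary detection runs on every buffer, not just the first one.
        if !text && buf[..n].contains(&0) {
            return Ok((data, true));
        }
        data.extend_from_slice(&buf[..n]);
    }
}

/// Memory-map path: only a prefix is inspected up front (assumed 10 * 1024).
const MMAP_SNIFF_LEN: usize = 10 * 1024;

fn mmap_looks_binary(map: &[u8]) -> bool {
    map[..map.len().min(MMAP_SNIFF_LEN)].contains(&0)
}

fn main() -> io::Result<()> {
    let input = b"text\x00more";
    println!("{:?}", read_until_binary(&input[..], false, 4)?);
    println!("{:?}", read_until_binary(&input[..], true, 4)?);
    println!("{}", mmap_looks_binary(input));
    Ok(())
}
```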
Fixes #52, Fixes #311
This is essentially a rename of the existing `Stdout` type to `StandardStream`
and a change of its constructor from a single `new()` function to have two
`stdout()` and `stderr()` functions.
Under the hood, we add internal IoStandardStream{,Lock} enums that allow
us to abstract between Stdout and Stderr conveniently. The rest of the needed
changes then fall out fairly naturally.
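A simplified, std-only sketch of the shape described above: an internal `IoStandardStream` enum delegating `Write` to stdout or stderr, wrapped by a `StandardStream` with `stdout()` and `stderr()` constructors. The real type also carries color settings and has a `Lock` counterpart, both omitted here.

```rust
use std::io::{self, Write};

/// Internal enum abstracting over the two standard streams.
enum IoStandardStream {
    Stdout(io::Stdout),
    Stderr(io::Stderr),
}

impl Write for IoStandardStream {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        match self {
            IoStandardStream::Stdout(s) => s.write(buf),
            IoStandardStream::Stderr(s) => s.write(buf),
        }
    }

    fn flush(&mut self) -> io::Result<()> {
        match self {
            IoStandardStream::Stdout(s) => s.flush(),
            IoStandardStream::Stderr(s) => s.flush(),
        }
    }
}

/// Public wrapper: two constructors instead of a single `new()`.
/// Color handling is omitted from this sketch.
pub struct StandardStream {
    wtr: IoStandardStream,
}

impl StandardStream {
    pub fn stdout() -> StandardStream {
        StandardStream { wtr: IoStandardStream::Stdout(io::stdout()) }
    }

    pub fn stderr() -> StandardStream {
        StandardStream { wtr: IoStandardStream::Stderr(io::stderr()) }
    }
}

impl Write for StandardStream {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.wtr.write(buf)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.wtr.flush()
    }
}

fn main() -> io::Result<()> {
    writeln!(StandardStream::stdout(), "to stdout")?;
    writeln!(StandardStream::stderr(), "to stderr")?;
    Ok(())
}
```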
Fixes #324.
[breaking-change]
Emacs's terminal apparently doesn't support "extended" sets of
foreground/background colors. Unless we need to set an "intense" color,
we should instead use one of the eight basic color codes.
Also, remove the "intense" setting from the default set of colors. It
doesn't do much anyway and enables the default color settings to work
in Emacs out of the box.
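A hedged sketch of the escape choice for foreground colors (hypothetical names, not termcolor's internals): plain colors use the basic SGR codes 30 to 37, and only an intense color falls back to the 256-color form `38;5;N`, where palette entries 8 to 15 are the bright variants of 0 to 7.

```rust
/// Hypothetical sketch: the eight basic colors in ANSI order (0..=7).
#[derive(Clone, Copy)]
enum Color {
    Black,
    Red,
    Green,
    Yellow,
    Blue,
    Magenta,
    Cyan,
    White,
}

/// Basic SGR codes 30-37 unless intense; then the extended `38;5;N` form
/// with N = 8 + index (the bright palette entries).
fn fg_escape(color: Color, intense: bool) -> String {
    let i = color as u8;
    if intense {
        format!("\x1b[38;5;{}m", 8 + i)
    } else {
        format!("\x1b[3{}m", i)
    }
}

fn main() {
    println!("{}red\x1b[0m", fg_escape(Color::Red, false));
    println!("{}bright red\x1b[0m", fg_escape(Color::Red, true));
}
```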
Fixes #182 (again)
When running ripgrep like this:
    rg foo > output
we must be careful not to search `output` since ripgrep is actively writing
to it. Searching it can cause massive blowups where the file grows without
bound.
While this is conceptually easy to fix (check the inode of the redirection
and the inode of the file you're about to search), there are a few problems
with it.
First, inodes are a Unix thing, so we need a Windows-specific solution to
this as well. To resolve this concern, I created a new crate, `same-file`,
which provides a cross-platform abstraction.
Second, stat'ing every file is costly. This is not avoidable on Windows,
but on Unix, we can get the inode number directly from directory traversal.
However, this information wasn't previously exposed; now it is (through both the
ignore and walkdir crates).
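The Unix side of the check can be sketched with the standard library: two paths name the same file when both device and inode numbers match. This hypothetical `same_file` function is Unix-only; the `same-file` crate exists precisely to cover Windows as well. In the redirection scenario, one side would be the file stdout is redirected to.

```rust
use std::io;
use std::path::Path;

/// Unix-only sketch: same device and same inode means same file.
#[cfg(unix)]
fn same_file(a: &Path, b: &Path) -> io::Result<bool> {
    use std::os::unix::fs::MetadataExt;
    let (ma, mb) = (std::fs::metadata(a)?, std::fs::metadata(b)?);
    Ok(ma.dev() == mb.dev() && ma.ino() == mb.ino())
}

#[cfg(unix)]
fn main() -> io::Result<()> {
    let dir = std::env::temp_dir();
    let (out, other) = (dir.join("sketch_output.txt"), dir.join("sketch_other.txt"));
    std::fs::write(&out, b"results")?;
    std::fs::write(&other, b"data")?;
    // A searcher would skip any candidate that is the redirection target.
    println!("skip output? {}", same_file(&out, &out)?);
    println!("skip other? {}", same_file(&out, &other)?);
    Ok(())
}

#[cfg(not(unix))]
fn main() {}
```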
Fixes #286
Previously, ripgrep would only emit the 'bold' ANSI escape sequence if
no foreground or background color was set. Instead, it would convert colors
to their "intense" versions if bold was set. The intent was to do the same
thing on Windows and Unix. However, this had a few negative side effects:
1. Omitting the 'bold' ANSI escape when 'bold' was set is surprising.
2. Intense colors can look quite bad and be hard to read.
To fix this, we introduce a new setting called 'intense' in the --colors
flag, and thread that down through to the public API of the `termcolor`
crate. The 'intense' setting has environment specific behavior:
1. In ANSI mode, it will convert the selected color to its "intense"
variant.
2. In the Windows console, it will make the text "intense."
There is no longer any "smart" handling of the 'bold' style. The 'bold'
ANSI escape is always emitted when it is selected. In the Windows
console, the 'bold' setting now has no effect. Note that this is a
breaking change.
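The new ANSI-mode rules can be sketched with a hypothetical, simplified spec type (not termcolor's `ColorSpec`): 'bold' always emits SGR 1 when set, and 'intense' only selects the bright variant of the foreground color. The Windows console behavior is not modeled.

```rust
/// Hypothetical, simplified spec; `fg` is a basic color index in 0..=7.
struct Spec {
    fg: Option<u8>,
    bold: bool,
    intense: bool,
}

/// ANSI rendering: bold is emitted whenever set (no "smart" conversion);
/// intense picks the bright 256-color entry instead of the basic code.
fn ansi_escape(spec: &Spec) -> String {
    let mut out = String::new();
    if spec.bold {
        out.push_str("\x1b[1m");
    }
    if let Some(i) = spec.fg {
        if spec.intense {
            out.push_str(&format!("\x1b[38;5;{}m", 8 + i));
        } else {
            out.push_str(&format!("\x1b[3{}m", i));
        }
    }
    out
}

fn main() {
    let spec = Spec { fg: Some(1), bold: true, intense: false };
    println!("{}bold red\x1b[0m", ansi_escape(&spec));
}
```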
Fixes #266, #293