The `ignore` crate currently handles two different kinds of "global"
gitignore files: gitignores from `~/.gitconfig`'s `core.excludesFile`
and gitignores passed in via `WalkBuilder::add_ignore` (corresponding to
ripgrep's `--ignore-file` flag).
In contrast to any other kind of gitignore file, these gitignore files
should have their patterns interpreted relative to the current working
directory. (Arguably there are other choices we could make here, e.g.,
based on the paths given. But the `ignore` infrastructure can't handle
that, and it's not clearly correct to me.) Normally, a gitignore file
has its patterns interpreted relative to where the gitignore file is.
This relative interpretation matters for patterns like `/foo`, which are
anchored to _some_ directory.
Previously, we would generally get the global gitignores correct because
it's most common to use ripgrep without providing a path. Thus, it
searches the current working directory. In this case, no stripping of
the paths is needed in order for the gitignore patterns to be applied
directly.
But if one provides an absolute path (or something else) to ripgrep to
search, the paths aren't stripped correctly. Indeed, in the core, I had
just given up and not provided a "root" path to these global gitignores.
So it had no hope of getting this correct.
We fix this assigning the CWD to the `Gitignore` values created from
global gitignore files. This was a painful thing to do because we'd
ideally:
1. Call `std::env::current_dir()` at most once for each traversal.
2. Provide a way to avoid the library calling `std::env::current_dir()`
at all. (Since this is global process state and folks might want to
set it to different values for $reasons.)
The `ignore` crate's internals are a total mess. But I think I've
addressed the above 2 points in a semver compatible manner.
Fixes#3179
This was a crazy subtle bug where ripgrep could slow down exponentially
as increasingly larger values of `-A/--after-context` were used. But,
interestingly, this would only occur when searching `stdin` and _not_
when searching the same data as a regular file.
This confounded me because ripgrep, pretty early on, erases the
difference between searching a single file and `stdin`. So it wasn't
like there were different code paths. And I mistakenly assumed that they
would otherwise behave the same as they are just treated as streams.
But... it turns out that running `read` on a `stdin` versus a regular
file seems to behave differently. At least on my Linux system, with
`stdin`, `read` never seems to fill the buffer with more than 64K. But
with a regular file, `read` pretty reliably fills the caller's buffer
with as much space as declared.
Of course, it is expected that `read` doesn't *have* to fill up the
caller's buffer, and ripgrep is generally fine with that. But when
`-A/--after-context` is used with a very large value---big enough that
the default buffer capacity is too small---then more heap memory needs
to be allocated to correctly handle all cases. This can result in
passing buffers bigger than 64K to `read`.
While we *correctly* handle `read` calls that don't fill the buffer,
it turns out that if we don't fill the buffer, then we get into a
pathological case where we aren't processing as many bytes as we could.
That is, because of the `-A/--after-context` causing us to keep a lot of
bytes around while we roll the buffer and because reading from `stdin`
gives us fewer bytes than normal, we weren't amortizing our `read` calls
as well as we should have been. Indeed, our buffer capacity increases
specifically take this amortization into account, but we weren't taking
advantage of it.
We fix this by putting `read` into an inner loop that ensures our
buffer gets filled up. This fixes the performance bug:
```
$ (time rg ZQZQZQZQZQ bigger.txt --no-mmap -A9999) | wc -l
real 1.330
user 0.767
sys 0.559
maxmem 29 MB
faults 0
10000
$ cat bigger.txt | (time rg ZQZQZQZQZQ --no-mmap -A9999) | wc -l
real 2.355
user 0.860
sys 0.613
maxmem 29 MB
faults 0
10000
$ (time rg ZQZQZQZQZQ bigger.txt --no-mmap -A99999) | wc -l
real 3.636
user 3.091
sys 0.537
maxmem 29 MB
faults 0
100000
$ cat bigger.txt | (time rg ZQZQZQZQZQ --no-mmap -A99999) | wc -l
real 4.918
user 3.236
sys 0.710
maxmem 29 MB
faults 0
100000
$ (time rg ZQZQZQZQZQ bigger.txt --no-mmap -A999999) | wc -l
real 5.430
user 4.666
sys 0.750
maxmem 51 MB
faults 0
1000000
$ cat bigger.txt | (time rg ZQZQZQZQZQ --no-mmap -A999999) | wc -l
real 6.894
user 4.907
sys 0.850
maxmem 51 MB
faults 0
1000000
```
For comparison, here is GNU grep:
```
$ cat bigger.txt | (time grep ZQZQZQZQZQ -A9999) | wc -l
real 1.466
user 0.159
sys 0.839
maxmem 29 MB
faults 0
10000
$ cat bigger.txt | (time grep ZQZQZQZQZQ -A99999) | wc -l
real 1.663
user 0.166
sys 0.941
maxmem 29 MB
faults 0
100000
$ cat bigger.txt | (time grep ZQZQZQZQZQ -A999999) | wc -l
real 1.631
user 0.204
sys 0.910
maxmem 29 MB
faults 0
1000000
```
GNU grep is still notably faster. We'll fix that in the next commit.
Fixes#3184
The abstraction boundary fuck up is the gift that keeps on giving. It
turns out that the invariant that the match would never exceed the range
given is not always true. So we kludge around it.
Also, update the CHANGELOG to include the fix for #2111.
Fixes#3180
Building it can consume resources. In particular, on Windows, the
various binaries are eagerly resolved.
I think this originally wasn't done. The eager resolution was added
later for security purposes. But the "eager" part isn't actually
necessary.
It would probably be better to change the decompression reader to do
lazy resolution only when the binary is needed. But this will at least
avoid doing anything when the `-z/--search-zip` flag isn't used. But
when it is, ripgrep will still eagerly resolve all possible binaries.
Fixes#2111
Every single call site wants to pass a path relative to the directory
the command was created for. So just make it do that automatically,
similar to `Dir::create` and friends.
Note that we skip lz4/brotli/zstd tests on RISC-V.
The CI runs RISC-V tests using cross/QEMU emulation. The decompression
tools (lz4, brotli, zstd) are x86_64 binaries on the host that cannot
execute in the RISC-V QEMU environment.
Skip these three tests at compile-time on RISC-V to avoid test failures.
The -z/--search-zip functionality itself works correctly on real RISC-V
hardware where native decompression tools are available.
PR #3165
... specifically, when the whitelist comes from a _parent_ gitignore
file.
Our handling of parent gitignores is pretty ham-fisted and has been a
source of some unfortunate bugs. The problem is that we need to strip
the parent path from the path we're searching in order to correctly
apply the globs. But getting this stripping correct seems to be a subtle
affair.
Fixes#3173
This finishes what I started in commit
a6e0be3c90.
Specifically, the `max_matches` configuration has been moved to the
`grep-searcher` crate and *removed* from the `grep-printer` crate. The
commit message has the details for why we're doing this, but the short
story is to fix#3076.
Note that this is a breaking change for `grep-printer`, so this will
require a semver incompatible release.
I've split the previously singular "CentOS/RHEL/Rocky" section into 3
sections. They each benefit from having their own steps.
I've also copied steps from [EPEL Getting Started] documentation,
including steps that don't seem to be required because it seems to be
best practice (although I do not understand it). Notably, this is not
required for CentOS Stream:
```
dnf config-manager --set-enabled crb
```
And this is not required for Red Hat:
```
subscription-manager repos --enable codeready-builder-for-rhel-10-$(arch)-rpms
```
And neither are available on Rocky Linux 10. Hence, all 3 have slightly
different instructions.
It has been suggested (see [here][suggest1] and [here][suggest2]) that
the installation instructions should just link to the [EPEL Getting
Started] documentation and just contain this step:
```
sudo dnf install ripgrep
```
However, this is not sufficient to actually install ripgrep from a
base installation of these Linux distributions. I tested this via the
`dokken/centos-stream-10:sha-d1e294f`, `rockylinux/rockylinux:10` and
`redhat/ubi10` Docker images on DockerHub.
While this does mean ripgrep's installation instructions can become out
of sync from upstream, this is *always* a risk regardless of platform.
The instructions are provided on a best effort basis and generally
should work on the latest release of said platform. If the instructions
result in unhelpful errors (like `dnf install ripgrep` does if you
don't enable EPEL), then that isn't being maximally helpful to users.
I'd rather attempt to give the entire set of instructions and risk
being out of sync.
Also, since the installation instructions include URLs with version
numbers in them, I made the section names include version numbers as
well.
Note: I found using the `dokken/centos-stream-10:sha-d1e294f` Docker
image to be somewhat odd, as I could not find any official CentOS
Docker images. [This][DockerHub-CentOS] is still the first hit on
Google, but all of its tags have been deleted and the image is
deprecated. I was profoundly confused by this given that the [EPEL
Getting Started] documentation *specifically* cites CentOS 10. In fact,
it is citing CentOS *Stream* 10, which is something wholly distinct
from CentOS. What an absolute **clusterfuck**. If I had just read this
paragraph on Wikipedia from the beginning, I would have saved myself a
lot of confusion:
> In December 2020, Red Hat unilaterally terminated CentOS development in favor
> of CentOS Stream 9, a distribution positioned upstream of RHEL. In March
> 2021, CloudLinux (makers of CloudLinux OS) released a RHEL derivative called
> AlmaLinux. Later in May 2021, one of the CentOS founders (Gregory Kurtzer)
> created the competing Rocky Linux project as a successor to the original
> mission of CentOS.
Ref #2981, Ref #2924
[EPEL Getting Started]: https://docs.fedoraproject.org/en-US/epel/getting-started/
[suggest1]: https://github.com/BurntSushi/ripgrep/pull/2981#issuecomment-3204114293
[suggest2]: https://github.com/BurntSushi/ripgrep/issues/2924#issuecomment-3326357254
[DockerHub-CentOS]: https://hub.docker.com/_/centos
In #2843, it's requested that these trailing contextual lines should be
displayed as non-matching because they exceed the limit. While
reasonable, I think that:
1. This would be a weird complication to the implementation.
2. This would overall be less intuitive and more complex. Today, there
is never a case where ripgrep emits a matching line in a way where
the match isn't highlighted.
Closes#2843
Specifically, it is only equivalent to `--count-matches` when the
pattern(s) given can match over multiple lines.
We could have instead made `--multiline --count` always equivalent to
`--multiline --count-matches`, but this seems plausibly less useful.
Indeed, I think it's generally a good thing that users can enable
`-U/--multiline` but still use patterns that only match a single line.
Changing how that behaves would I think be more surprising.
Either way we slice this, it's unfortunately pretty subtle.
Fixes#2852
This is unfortunate, but is a known bug that I don't think can be fixed
without either making `-l/--files-with-matches` much slower or changing
what "binary filtering" means by default.
In this PR, we document this inconsistency since users may find it quite
surprising. The actual work-around is to disable binary filtering with
the `--binary` flag.
We add a test confirming this behavior.
Closes#3131
The underlying issue here is #2528, which was introduced by commit
efd9cfb2fc which fixed another bug.
For the specific case of "did a file match," we can always assume the
match count is at least 1 here. But this doesn't fix the underlying
problem.
Fixes#3139
I believe the current stable version of Debian packages 1.85 rustc. So
if the next release of ripgrep uses a higher MSRV, then I think Debian
won't be able to package it.
It also turned out that I wasn't using anything from beyond Rust 1.85
anyway.
It's likely that I could make use of let-chains in various places, but I
don't think it's worth combing through the code to switch to them at
this point.
Maybe 2024 changes?
Note that we now set `edition = "2024"` explicitly in `rustfmt.toml`.
Without this, it seems like it's possible in some cases for rustfmt to
run under an older edition's style. Not sure how though.
When enabled, patterns like `[abc`, `[]`, `[!]` are treated as if the
opening `[` is just a literal. This is in contrast the default behavior,
which prioritizes better error messages, of returning a parse error.
Fixes#3127, Closes#3145
This is a bit of a brutal change, but I believe is necessary in order to
fix a bug in how we handle the "max matches" limit in multi-line mode
while simultaneously handling context lines correctly.
The main problem here is that "max matches" refers to the shorter of
"one match per line" or "a single match." In typical grep, matches
*can't* span multiple lines, so there's never a difference. But in
multi-line mode, they can. So match counts necessarily must be handled
differently for multi-line mode.
The printer was previously responsible for this. But for $reasons, the
printer is fundamentally not in charge of how matches are found and
reported.
See my comments in #3094 for even more context.
This is a breaking change for `grep-printer`.
Fixes#3076, Closes#3094
qrc[1] are the resource files for data related to user interfaces, and
ui[2] is the extension that the Qt Designer generates, for Widget based
projects.
Note that the initial PR used `ui` as a name for `*.ui`, but this seems
overly general. Instead, we use `qui` here instead.
Closes#3141
[1]: https://doc.qt.io/qt-6/resources.html
[2]: https://doc.qt.io/qt-6/uic.html
Apparently, if we don't do this, some roff renderers with use a special
Unicode hyphen. That in turn makes searching a man page not work as one
would expect.
Fixes#3140
The goal is to make the completion for `rg --hyperlink-format v<TAB>`
work in the fish shell.
These are not exhaustive (the user can also specify custom formats).
This is somewhat unfortunate, but is probably better than not doing
anything at all.
The `grep+` value necessitated a change to a test.
Closes#3096
This exports a new `HyperlinkAlias` type in the `grep-printer` crate.
This includes a "display priority" with each alias and a function for
getting all supported aliases from the crate.
This should hopefully make it possible for downstream users of this
crate to include a list of supported aliases in the documentation.
Closes#3103
Previously, `Quiet` mode in the summary printer always acted like
"print matching paths," except without the printing. This happened even
if we wanted to "print non-matching paths." Since this only afflicted
quiet mode, this had the effect of flipping the exit status when
`--files-without-match --quiet` was used.
Fixes#3108, Ref #3118
For users of globset who already have a `Vec<Glob>` (or similar),
the current API requires them to iterate over their `Vec<Glob>`,
calling `GlobSetBuilder::add` for each `Glob`, thus constructing a new
`Vec<Glob>` internal to the GlobSetBuilder. This makes the consuming
code unnecessarily verbose. (There is unlikely to be any meaningful
performance impact of this, however, since the cost of allocating a new
`Vec` is likely marginal compared to the cost of glob compilation.)
Instead of taking a `&[Glob]`, we accept an iterator of anything that
can be borrowed as a `&Glob`. This required some light refactoring of
the constructor, but nothing onerous.
Closes#3066
Use `cfg!` to assign a 1000ms delay only on Windows Aarch64 targets.
This was done because it has been observed to be necessary on this
platform. The conditional logic is used because 1s is quite long to
wait on every other more sensible platform.
Closes#3071, Closes#3072