1
0
mirror of https://github.com/BurntSushi/ripgrep.git synced 2025-11-23 21:54:45 +02:00
Commit Graph

525 Commits

Author SHA1 Message Date
Andrew Gallant
2ea06d69aa grep-0.4.1 2025-10-22 08:28:53 -04:00
Andrew Gallant
85006b08d6 deps: bump to grep-printer 0.3.1 2025-10-22 08:28:32 -04:00
Andrew Gallant
423afb8513 grep-printer-0.3.1 2025-10-22 08:28:06 -04:00
Andrew Gallant
4694800be5 deps: bump to grep-searcher 0.1.16 2025-10-22 08:26:22 -04:00
Andrew Gallant
86e0ab12ef grep-searcher-0.1.16 2025-10-22 08:25:01 -04:00
Andrew Gallant
7189950799 deps: bump to globset 0.4.18 2025-10-22 08:24:51 -04:00
Andrew Gallant
0b0e013f5a globset-0.4.18 2025-10-22 08:23:57 -04:00
Andrew Gallant
cac9870a02 doc: update date in man page template 2025-10-22 08:23:05 -04:00
Jorge Gomez
24e88dc15b ignore/types: add ssa type
This PR adds support for [.ssa](https://en.wikipedia.org/wiki/Static_single-assignment_form) files as read by [qbe](https://c9x.me/compile/):

See: https://c9x.me/compile/doc/il.html#Input-Files
2025-10-22 08:18:30 -04:00
Andrew Gallant
5748f81bb1 printer: use doc_cfg instead of doc_auto_cfg
Fixes #3202
2025-10-22 07:47:07 -04:00
Andrew Gallant
d47663b1b4 searcher: fix regression with --line-buffered flag
In my fix for #3184, I actually had two fixes. One was a tweak to how we
read data and the other was a tweak to how we determined how much of the
buffer we needed to keep around. It turns out that fixing #3184 only
required the latter fix, found in commit
d4b77a8d89. The former fix also helped the
specific case of #3184, but it ended up regressing `--line-buffered`.

Specifically, previous to 8c6595c215 (the
first fix), we would do one `read` syscall. This call might not fill our
caller provided buffer. And in particular, `stdin` seemed to fill fewer
bytes than reading from a file. So the "fix" was to put `read` in a loop
and keep calling it until the caller provided buffer was full or until
the stream was exhausted. This helped alleviate #3184 by amortizing
`read` syscalls better.

But of course, in retrospect, this change is clearly contrary to how
`--line-buffered` works. We specifically do _not_ want to wait around
until the buffer is full. We want to read what we can, search it and
move on.

So this reverts the first fix but leaves the second, which still
keeps #3184 fixed and also fixes #3194 (the regression).

This reverts commit 8c6595c215.

Fixes #3194
2025-10-19 11:06:39 -04:00
Enoch
38d630261a printer: add Cursor hyperlink alias
This is similar to the other aliases used by
VS Code forks.

PR #3192
2025-10-17 14:59:17 -04:00
Andrew Gallant
b3dc4b0998 globset: improve debug log
This shows the regex that the glob was compiled to.
2025-10-17 10:27:19 -04:00
Andrew Gallant
ca2e34f37c grep-0.4.0 2025-10-15 23:06:34 -04:00
Andrew Gallant
a0d61a063f grep-printer-0.3.0 2025-10-15 23:04:24 -04:00
Andrew Gallant
c22fc0f13c deps: bump to grep-searcher 0.1.15 2025-10-15 23:02:59 -04:00
Andrew Gallant
087f82273d grep-searcher-0.1.15 2025-10-15 23:02:33 -04:00
Andrew Gallant
a3a30896be deps: bump to grep-pcre2 0.1.9 2025-10-15 23:01:31 -04:00
Andrew Gallant
7397ab7d97 grep-pcre2-0.1.9 2025-10-15 23:01:07 -04:00
Andrew Gallant
cf1dab0d5a deps: bump to grep-regex 0.1.14 2025-10-15 23:00:58 -04:00
Andrew Gallant
e523c6bf32 grep-regex-0.1.14 2025-10-15 23:00:22 -04:00
Andrew Gallant
720376ead6 deps: bump to grep-matcher 0.1.8 2025-10-15 23:00:12 -04:00
Andrew Gallant
a5ba50ceaf grep-matcher-0.1.8 2025-10-15 22:59:35 -04:00
Andrew Gallant
a766f79710 deps: bump to grep-cli 0.1.12 2025-10-15 22:59:17 -04:00
Andrew Gallant
4aafe45760 grep-cli-0.1.12 2025-10-15 22:58:42 -04:00
Andrew Gallant
70ae7354e1 ignore-0.4.24 2025-10-15 22:57:50 -04:00
Andrew Gallant
19c2a6e0d9 deps: bump to globset 0.4.17 2025-10-15 22:57:28 -04:00
Andrew Gallant
064b36b115 globset-0.4.17 2025-10-15 22:55:55 -04:00
Andrew Gallant
72a5291b4e doc: update date in man page template 2025-10-15 22:54:11 -04:00
Andrew Gallant
63209ae0b9 printer: fix --stats for --json
Somehow, the JSON printer seems to have never emitted correct summary
statistics. And I believe #3178 is the first time anyone has ever
reported it. I believe this bug has persisted for years. That's
surprising.

Anyway, the problem here was that we were bailing out of `finish()` on
the sink if we weren't supposed to print anything. But we bailed out
before we tallied our summary statistics. Obviously we shouldn't do
that.

Fixes #3178
2025-10-15 21:21:20 -04:00
Andrew Gallant
b610d1cb15 ignore: fix global gitignore bug that arises with absolute paths
The `ignore` crate currently handles two different kinds of "global"
gitignore files: gitignores from `~/.gitconfig`'s `core.excludesFile`
and gitignores passed in via `WalkBuilder::add_ignore` (corresponding to
ripgrep's `--ignore-file` flag).

In contrast to any other kind of gitignore file, these gitignore files
should have their patterns interpreted relative to the current working
directory. (Arguably there are other choices we could make here, e.g.,
based on the paths given. But the `ignore` infrastructure can't handle
that, and it's not clearly correct to me.) Normally, a gitignore file
has its patterns interpreted relative to where the gitignore file is.
This relative interpretation matters for patterns like `/foo`, which are
anchored to _some_ directory.

Previously, we would generally get the global gitignores correct because
it's most common to use ripgrep without providing a path. Thus, it
searches the current working directory. In this case, no stripping of
the paths is needed in order for the gitignore patterns to be applied
directly.

But if one provides an absolute path (or something else) to ripgrep to
search, the paths aren't stripped correctly. Indeed, in the core, I had
just given up and not provided a "root" path to these global gitignores.
So it had no hope of getting this correct.

We fix this assigning the CWD to the `Gitignore` values created from
global gitignore files. This was a painful thing to do because we'd
ideally:

1. Call `std::env::current_dir()` at most once for each traversal.
2. Provide a way to avoid the library calling `std::env::current_dir()`
   at all. (Since this is global process state and folks might want to
   set it to different values for $reasons.)

The `ignore` crate's internals are a total mess. But I think I've
addressed the above 2 points in a semver compatible manner.

Fixes #3179
2025-10-15 19:44:23 -04:00
Luke Hannan
9ec08522be ignore/types: add lowercase R extensions
PR #3186
2025-10-14 15:15:07 -04:00
Andrew Gallant
d4b77a8d89 searcher: fix a performance bug with -A/--after-context
Previously (with the previous commit):

```
$ cat bigger.txt | (time rg ZQZQZQZQZQ -A999) | wc -l

real    2.321
user    0.674
sys     0.735
maxmem  30 MB
faults  0
1000

$ cat bigger.txt | (time rg ZQZQZQZQZQ -A9999) | wc -l

real    2.513
user    0.823
sys     0.686
maxmem  30 MB
faults  0
10000

$ cat bigger.txt | (time rg ZQZQZQZQZQ -A99999) | wc -l

real    5.067
user    3.254
sys     0.676
maxmem  30 MB
faults  0
100000

$ cat bigger.txt | (time rg ZQZQZQZQZQ -A999999) | wc -l

real    6.658
user    4.841
sys     0.778
maxmem  51 MB
faults  0
1000000
```

Now with this commit:

```
$ cat bigger.txt | (time rg ZQZQZQZQZQ -A999) | wc -l

real    1.845
user    0.328
sys     0.757
maxmem  30 MB
faults  0
1000

$ cat bigger.txt | (time rg ZQZQZQZQZQ -A9999) | wc -l

real    1.917
user    0.334
sys     0.771
maxmem  30 MB
faults  0
10000

$ cat bigger.txt | (time rg ZQZQZQZQZQ -A99999) | wc -l

real    1.972
user    0.319
sys     0.812
maxmem  30 MB
faults  0
100000

$ cat bigger.txt | (time rg ZQZQZQZQZQ -A999999) | wc -l

real    2.005
user    0.333
sys     0.855
maxmem  30 MB
faults  0
1000000
```

And compare to GNU grep:

```
$ cat bigger.txt | (time grep ZQZQZQZQZQ -A999) | wc -l

real    1.488
user    0.143
sys     0.866
maxmem  30 MB
faults  0
1000

$ cat bigger.txt | (time grep ZQZQZQZQZQ -A9999) | wc -l

real    1.697
user    0.170
sys     0.986
maxmem  30 MB
faults  1
10000

$ cat bigger.txt | (time grep ZQZQZQZQZQ -A99999) | wc -l

real    1.515
user    0.166
sys     0.856
maxmem  29 MB
faults  0
100000

$ cat bigger.txt | (time grep ZQZQZQZQZQ -A999999) | wc -l

real    1.490
user    0.174
sys     0.851
maxmem  30 MB
faults  0
1000000
```

Interestingly, GNU grep is still a bit faster. But both commands remain
roughly invariant in search time as `-A` is increased.

There is definitely something "odd" about searching `stdin`, where it
seems substantially slower. We can also observe with GNU grep:

```
$ (time grep ZQZQZQZQZQ -A999999 bigger.txt) | wc -l

real    0.692
user    0.184
sys     0.506
maxmem  30 MB
faults  0
1000000

$ cat bigger.txt | (time grep ZQZQZQZQZQ -A999999) | wc -l

real    1.700
user    0.201
sys     0.954
maxmem  30 MB
faults  0
1000000

$ (time rg ZQZQZQZQZQ -A999999 bigger.txt) | wc -l

real    0.640
user    0.428
sys     0.209
maxmem  7734 MB
faults  0
1000000

$ (time rg ZQZQZQZQZQ --no-mmap -A999999 bigger.txt) | wc -l

real    0.866
user    0.282
sys     0.581
maxmem  30 MB
faults  0
1000000

$ cat bigger.txt | (time rg ZQZQZQZQZQ -A999999) | wc -l

real    1.991
user    0.338
sys     0.819
maxmem  30 MB
faults  0
1000000
```

I wonder if this is related to my discovery in the previous commit where
`read` calls on `stdin` seem to never return anything more than ~64K. Oh
well, I'm satisfied at this point, especially given that GNU grep seems
to do a lot worse than ripgrep with bigger values of
`-B/--before-context`:

```
$ cat bigger.txt | (time grep ZQZQZQZQZQ -B9) | wc -l

real    1.568
user    0.170
sys     0.885
maxmem  30 MB
faults  0
1

$ cat bigger.txt | (time grep ZQZQZQZQZQ -B99) | wc -l

real    1.734
user    0.338
sys     0.879
maxmem  30 MB
faults  0
1

$ cat bigger.txt | (time grep ZQZQZQZQZQ -B999) | wc -l

real    2.349
user    1.723
sys     0.620
maxmem  30 MB
faults  0
1

$ cat bigger.txt | (time grep ZQZQZQZQZQ -B9999) | wc -l

real    16.459
user    15.848
sys     0.586
maxmem  30 MB
faults  0
1

$ time grep ZQZQZQZQZQ -B99999 bigger.txt
ZQZQZQZQZQ

real    1:45.06
user    1:44.12
sys     0.772
maxmem  30 MB
faults  0
```

The above pattern occurs regardless of whether you put `bigger.txt` on
stdin or whether you search it directly.

And now ripgrep:

```
$ cat bigger.txt | (time rg ZQZQZQZQZQ -B9) | wc -l

real    1.965
user    0.326
sys     0.814
maxmem  29 MB
faults  0
1

$ cat bigger.txt | (time rg ZQZQZQZQZQ -B99) | wc -l

real    1.941
user    0.423
sys     0.813
maxmem  29 MB
faults  0
1

$ cat bigger.txt | (time rg ZQZQZQZQZQ -B999) | wc -l

real    2.372
user    0.759
sys     0.703
maxmem  30 MB
faults  0
1

$ cat bigger.txt | (time rg ZQZQZQZQZQ -B9999) | wc -l

real    2.638
user    0.895
sys     0.665
maxmem  29 MB
faults  0
1

$ cat bigger.txt | (time rg ZQZQZQZQZQ -B99999) | wc -l

real    5.172
user    3.282
sys     0.748
maxmem  29 MB
faults  0
1
```

NOTE: To get `bigger.txt`:

```
$ curl -LO 'https://burntsushi.net/stuff/opensubtitles/2018/en/sixteenth.txt.gz'
$ gzip -d sixteenth.txt.gz
$ (echo ZQZQZQZQZQ && for ((i=0;i<10;i++)); do cat sixteenth.txt; done) > bigger.txt
```
2025-10-14 14:27:43 -04:00
Andrew Gallant
8c6595c215 searcher: fix performance bug with -A/--after-context when searching stdin
This was a crazy subtle bug where ripgrep could slow down exponentially
as increasingly larger values of `-A/--after-context` were used. But,
interestingly, this would only occur when searching `stdin` and _not_
when searching the same data as a regular file.

This confounded me because ripgrep, pretty early on, erases the
difference between searching a single file and `stdin`. So it wasn't
like there were different code paths. And I mistakenly assumed that they
would otherwise behave the same as they are just treated as streams.

But... it turns out that running `read` on a `stdin` versus a regular
file seems to behave differently. At least on my Linux system, with
`stdin`, `read` never seems to fill the buffer with more than 64K. But
with a regular file, `read` pretty reliably fills the caller's buffer
with as much space as declared.

Of course, it is expected that `read` doesn't *have* to fill up the
caller's buffer, and ripgrep is generally fine with that. But when
`-A/--after-context` is used with a very large value---big enough that
the default buffer capacity is too small---then more heap memory needs
to be allocated to correctly handle all cases. This can result in
passing buffers bigger than 64K to `read`.

While we *correctly* handle `read` calls that don't fill the buffer,
it turns out that if we don't fill the buffer, then we get into a
pathological case where we aren't processing as many bytes as we could.
That is, because of the `-A/--after-context` causing us to keep a lot of
bytes around while we roll the buffer and because reading from `stdin`
gives us fewer bytes than normal, we weren't amortizing our `read` calls
as well as we should have been. Indeed, our buffer capacity increases
specifically take this amortization into account, but we weren't taking
advantage of it.

We fix this by putting `read` into an inner loop that ensures our
buffer gets filled up. This fixes the performance bug:

```
$ (time rg ZQZQZQZQZQ bigger.txt --no-mmap -A9999) | wc -l

real    1.330
user    0.767
sys     0.559
maxmem  29 MB
faults  0
10000

$ cat bigger.txt | (time rg ZQZQZQZQZQ --no-mmap -A9999) | wc -l

real    2.355
user    0.860
sys     0.613
maxmem  29 MB
faults  0
10000

$ (time rg ZQZQZQZQZQ bigger.txt --no-mmap -A99999) | wc -l

real    3.636
user    3.091
sys     0.537
maxmem  29 MB
faults  0
100000

$ cat bigger.txt | (time rg ZQZQZQZQZQ --no-mmap -A99999) | wc -l

real    4.918
user    3.236
sys     0.710
maxmem  29 MB
faults  0
100000

$ (time rg ZQZQZQZQZQ bigger.txt --no-mmap -A999999) | wc -l

real    5.430
user    4.666
sys     0.750
maxmem  51 MB
faults  0
1000000

$ cat bigger.txt | (time rg ZQZQZQZQZQ --no-mmap -A999999) | wc -l

real    6.894
user    4.907
sys     0.850
maxmem  51 MB
faults  0
1000000
```

For comparison, here is GNU grep:

```
$ cat bigger.txt | (time grep ZQZQZQZQZQ -A9999) | wc -l

real    1.466
user    0.159
sys     0.839
maxmem  29 MB
faults  0
10000

$ cat bigger.txt | (time grep ZQZQZQZQZQ -A99999) | wc -l

real    1.663
user    0.166
sys     0.941
maxmem  29 MB
faults  0
100000

$ cat bigger.txt | (time grep ZQZQZQZQZQ -A999999) | wc -l

real    1.631
user    0.204
sys     0.910
maxmem  29 MB
faults  0
1000000
```

GNU grep is still notably faster. We'll fix that in the next commit.

Fixes #3184
2025-10-14 14:27:43 -04:00
Andrew Gallant
de2567a4c7 printer: fix panic in replacements in look-around corner case
The abstraction boundary fuck up is the gift that keeps on giving. It
turns out that the invariant that the match would never exceed the range
given is not always true. So we kludge around it.

Also, update the CHANGELOG to include the fix for #2111.

Fixes #3180
2025-10-12 17:25:19 -04:00
Andrew Gallant
916415857f core: don't build decompression reader unless we intend to use it
Building it can consume resources. In particular, on Windows, the
various binaries are eagerly resolved.

I think this originally wasn't done. The eager resolution was added
later for security purposes. But the "eager" part isn't actually
necessary.

It would probably be better to change the decompression reader to do
lazy resolution only when the binary is needed. But this will at least
avoid doing anything when the `-z/--search-zip` flag isn't used. But
when it is, ripgrep will still eagerly resolve all possible binaries.

Fixes #2111
2025-10-12 16:31:20 -04:00
Andrew Gallant
f0faa91c68 doc: clarify --ignore-file precedence
Fixes #2777
2025-10-10 22:06:59 -04:00
Andrew Gallant
0407e104f6 ignore: fix problem with searching whitelisted hidden files
... specifically, when the whitelist comes from a _parent_ gitignore
file.

Our handling of parent gitignores is pretty ham-fisted and has been a
source of some unfortunate bugs. The problem is that we need to strip
the parent path from the path we're searching in order to correctly
apply the globs. But getting this stripping correct seems to be a subtle
affair.

Fixes #3173
2025-10-08 21:16:59 -04:00
Alvaro Parker
2924d0c4c0 ignore: add min_depth option
This mimics the eponymous option in `walkdir`.

Closes #3158, PR #3162
2025-10-05 10:05:26 -04:00
Andrew Gallant
9d8016d10c printer: finish removal of max_matches
This finishes what I started in commit
a6e0be3c90.
Specifically, the `max_matches` configuration has been moved to the
`grep-searcher` crate and *removed* from the `grep-printer` crate. The
commit message has the details for why we're doing this, but the short
story is to fix #3076.

Note that this is a breaking change for `grep-printer`, so this will
require a semver incompatible release.
2025-10-04 09:19:53 -04:00
Andrew Gallant
fdea9723ca doc: clarify a case where -m/--max-count is not strictly respected
In #2843, it's requested that these trailing contextual lines should be
displayed as non-matching because they exceed the limit. While
reasonable, I think that:

1. This would be a weird complication to the implementation.
2. This would overall be less intuitive and more complex. Today, there
   is never a case where ripgrep emits a matching line in a way where
   the match isn't highlighted.

Closes #2843
2025-09-22 22:12:15 -04:00
Andrew Gallant
c45ec16360 doc: clarify --multiline --count
Specifically, it is only equivalent to `--count-matches` when the
pattern(s) given can match over multiple lines.

We could have instead made `--multiline --count` always equivalent to
`--multiline --count-matches`, but this seems plausibly less useful.
Indeed, I think it's generally a good thing that users can enable
`-U/--multiline` but still use patterns that only match a single line.
Changing how that behaves would I think be more surprising.

Either way we slice this, it's unfortunately pretty subtle.

Fixes #2852
2025-09-22 22:00:15 -04:00
Andrew Gallant
e42432cc5d ignore: clarify WalkBuilder::filter_entry
Fixes #2913
2025-09-22 21:49:29 -04:00
Andrew Gallant
6e77339f30 cli: tweak docs for resolve_binary
Fixes #2928
2025-09-22 21:38:08 -04:00
Andrew Gallant
1b07c6616a cli: document that -c/--count can be inconsistent with -l/--files-with-matches
This is unfortunate, but is a known bug that I don't think can be fixed
without either making `-l/--files-with-matches` much slower or changing
what "binary filtering" means by default.

In this PR, we document this inconsistency since users may find it quite
surprising. The actual work-around is to disable binary filtering with
the `--binary` flag.

We add a test confirming this behavior.

Closes #3131
2025-09-22 20:24:53 -04:00
Andrew Gallant
8b5d3d1c1e printer: hack in a fix for -l/--files-with-matches when using --pcre2 --multiline with look-around
The underlying issue here is #2528, which was introduced by commit
efd9cfb2fc which fixed another bug.

For the specific case of "did a file match," we can always assume the
match count is at least 1 here. But this doesn't fix the underlying
problem.

Fixes #3139
2025-09-22 09:12:16 -04:00
Lucas Trzesniewski
a7b7d81d66 lint: fix a few Clippy errors
PR #3151
2025-09-21 09:15:48 -04:00
Andrew Gallant
bb8172fe9b style: apply rustfmt
Maybe 2024 changes?

Note that we now set `edition = "2024"` explicitly in `rustfmt.toml`.
Without this, it seems like it's possible in some cases for rustfmt to
run under an older edition's style. Not sure how though.
2025-09-19 21:08:19 -04:00
Isaac
64174b8e68 printer: preserve line terminator when using --crlf and --replace
Ref #3097, Closes #3100
2025-09-19 21:08:19 -04:00
mostafa
f596a5d875 globset: add allow_unclosed_class toggle
When enabled, patterns like `[abc`, `[]`, `[!]` are treated as if the
opening `[` is just a literal. This is in contrast the default behavior,
which prioritizes better error messages, of returning a parse error.

Fixes #3127, Closes #3145
2025-09-19 21:08:19 -04:00