In my fix for #3184, I actually had two fixes. One was a tweak to how we
read data and the other was a tweak to how we determined how much of the
buffer we needed to keep around. It turns out that fixing #3184 only
required the latter fix, found in commit
d4b77a8d89. The former fix also helped the
specific case of #3184, but it ended up regressing `--line-buffered`.
Specifically, previous to 8c6595c215 (the
first fix), we would do one `read` syscall. This call might not fill our
caller provided buffer. And in particular, `stdin` seemed to fill fewer
bytes than reading from a file. So the "fix" was to put `read` in a loop
and keep calling it until the caller provided buffer was full or until
the stream was exhausted. This helped alleviate #3184 by amortizing
`read` syscalls better.
But of course, in retrospect, this change is clearly contrary to how
`--line-buffered` works. We specifically do _not_ want to wait around
until the buffer is full. We want to read what we can, search it and
move on.
So this reverts the first fix but leaves the second, which still
keeps #3184 fixed and also fixes #3194 (the regression).
This reverts commit 8c6595c215.
Fixes #3194
There seems to be a modest improvement on some workloads:
```
$ time rg -co '\w+' sixteenth.txt
158520346
real 8.457
user 8.426
sys 0.020
maxmem 779 MB
faults 0
$ time rg-lto -co '\w+' sixteenth.txt
158520346
real 8.200
user 8.178
sys 0.012
maxmem 778 MB
faults 0
```
I've somewhat reversed course on my previous thoughts here. The
improvement isn't much, but the hit to compile times in CI isn't
terrible. Mostly I'm doing this out of "good sense," and I think it's
generally unlikely to make it more difficult for me to diagnose
performance problems. (I still use the default `release` profile
locally, since it's about an order of magnitude quicker to compile.)
Ref #325, Ref #413, Ref #1187, Ref #1255
Their CI workflows broke for different reasons.
I perceive these as niche platforms that aren't worth blocking
a release on. And not worth my time investigating CI problems.
Somehow, the JSON printer seems to have never emitted correct summary
statistics. And I believe #3178 is the first time anyone has ever
reported it. I believe this bug has persisted for years. That's
surprising.
Anyway, the problem here was that we were bailing out of `finish()` on
the sink if we weren't supposed to print anything. But we bailed out
before we tallied our summary statistics. Obviously we shouldn't do
that.
Fixes #3178
The `ignore` crate currently handles two different kinds of "global"
gitignore files: gitignores from `~/.gitconfig`'s `core.excludesFile`
and gitignores passed in via `WalkBuilder::add_ignore` (corresponding to
ripgrep's `--ignore-file` flag).
In contrast to any other kind of gitignore file, these gitignore files
should have their patterns interpreted relative to the current working
directory. (Arguably there are other choices we could make here, e.g.,
based on the paths given. But the `ignore` infrastructure can't handle
that, and it's not clear to me that it would be correct.) Normally, a
gitignore file
has its patterns interpreted relative to where the gitignore file is.
This relative interpretation matters for patterns like `/foo`, which are
anchored to _some_ directory.
Previously, we would generally get the global gitignores correct because
it's most common to use ripgrep without providing a path. Thus, it
searches the current working directory. In this case, no stripping of
the paths is needed in order for the gitignore patterns to be applied
directly.
But if one provides an absolute path (or something else) to ripgrep to
search, the paths aren't stripped correctly. Indeed, in the core, I had
just given up and not provided a "root" path to these global gitignores.
So it had no hope of getting this correct.
We fix this by assigning the CWD to the `Gitignore` values created from
global gitignore files. This was a painful thing to do because we'd
ideally:
1. Call `std::env::current_dir()` at most once for each traversal.
2. Provide a way to avoid the library calling `std::env::current_dir()`
at all. (Since this is global process state and folks might want to
set it to different values for $reasons.)
The `ignore` crate's internals are a total mess. But I think I've
addressed the above 2 points in a semver compatible manner.
Fixes #3179
This was a crazy subtle bug where ripgrep could slow down exponentially
as increasingly larger values of `-A/--after-context` were used. But,
interestingly, this would only occur when searching `stdin` and _not_
when searching the same data as a regular file.
This confounded me because ripgrep, pretty early on, erases the
difference between searching a single file and `stdin`. So it wasn't
like there were different code paths. And I mistakenly assumed they
would otherwise behave the same, since both are just treated as streams.
But... it turns out that calling `read` on `stdin` versus a regular
file seems to behave differently. At least on my Linux system, with
`stdin`, `read` never seems to fill the buffer with more than 64K. But
with a regular file, `read` pretty reliably fills the caller's buffer
completely.
Of course, it is expected that `read` doesn't *have* to fill up the
caller's buffer, and ripgrep is generally fine with that. But when
`-A/--after-context` is used with a very large value---big enough that
the default buffer capacity is too small---then more heap memory needs
to be allocated to correctly handle all cases. This can result in
passing buffers bigger than 64K to `read`.
While we *correctly* handle `read` calls that don't fill the buffer,
it turns out that if we don't fill the buffer, then we get into a
pathological case where we aren't processing as many bytes as we could.
That is, because of the `-A/--after-context` causing us to keep a lot of
bytes around while we roll the buffer and because reading from `stdin`
gives us fewer bytes than normal, we weren't amortizing our `read` calls
as well as we should have been. Indeed, our buffer capacity increases
specifically take this amortization into account, but we weren't taking
advantage of it.
We fix this by putting `read` into an inner loop that ensures our
buffer gets filled up. This fixes the performance bug:
```
$ (time rg ZQZQZQZQZQ bigger.txt --no-mmap -A9999) | wc -l
real 1.330
user 0.767
sys 0.559
maxmem 29 MB
faults 0
10000
$ cat bigger.txt | (time rg ZQZQZQZQZQ --no-mmap -A9999) | wc -l
real 2.355
user 0.860
sys 0.613
maxmem 29 MB
faults 0
10000
$ (time rg ZQZQZQZQZQ bigger.txt --no-mmap -A99999) | wc -l
real 3.636
user 3.091
sys 0.537
maxmem 29 MB
faults 0
100000
$ cat bigger.txt | (time rg ZQZQZQZQZQ --no-mmap -A99999) | wc -l
real 4.918
user 3.236
sys 0.710
maxmem 29 MB
faults 0
100000
$ (time rg ZQZQZQZQZQ bigger.txt --no-mmap -A999999) | wc -l
real 5.430
user 4.666
sys 0.750
maxmem 51 MB
faults 0
1000000
$ cat bigger.txt | (time rg ZQZQZQZQZQ --no-mmap -A999999) | wc -l
real 6.894
user 4.907
sys 0.850
maxmem 51 MB
faults 0
1000000
```
For comparison, here is GNU grep:
```
$ cat bigger.txt | (time grep ZQZQZQZQZQ -A9999) | wc -l
real 1.466
user 0.159
sys 0.839
maxmem 29 MB
faults 0
10000
$ cat bigger.txt | (time grep ZQZQZQZQZQ -A99999) | wc -l
real 1.663
user 0.166
sys 0.941
maxmem 29 MB
faults 0
100000
$ cat bigger.txt | (time grep ZQZQZQZQZQ -A999999) | wc -l
real 1.631
user 0.204
sys 0.910
maxmem 29 MB
faults 0
1000000
```
GNU grep is still notably faster. We'll fix that in the next commit.
Fixes #3184
The abstraction boundary fuck-up is the gift that keeps on giving. It
turns out that the invariant that a match never exceeds the given range
does not always hold. So we kludge around it.
Also, update the CHANGELOG to include the fix for #2111.
Fixes #3180
Building it can consume resources. In particular, on Windows, the
various binaries are eagerly resolved.
I think this originally wasn't done. The eager resolution was added
later for security purposes. But the "eager" part isn't actually
necessary.
It would probably be better to change the decompression reader to do
lazy resolution only when the binary is needed. But this change at
least avoids doing anything when the `-z/--search-zip` flag isn't used;
when it is, ripgrep still eagerly resolves all possible binaries.
Fixes #2111
Every single call site wants to pass a path relative to the directory
the command was created for. So just make it do that automatically,
similar to `Dir::create` and friends.
Note that we skip lz4/brotli/zstd tests on RISC-V.
CI runs the RISC-V tests under cross/QEMU emulation, and the
decompression tools (lz4, brotli, zstd) are x86_64 binaries on the
host that cannot execute in the RISC-V QEMU environment. We skip these
three tests at compile time on RISC-V to avoid spurious failures.
The -z/--search-zip functionality itself works correctly on real RISC-V
hardware where native decompression tools are available.
PR #3165