I was comparing the work being done by fd and find and noticed (with
`strace -f -c -S` calls) that fd was doing a ton of failed `statx`
calls. Upon closer inspection it was stating `.jj` even though I
was passing `--no-ignore`. Eventually I turned up this check in
`Ignore::add_child_path` that was doing stat on `.jj` regardless of
whether the options request it.
With this patch it'll only stat `.jj` if that's relevant to the query.
PR #3212
In my fix for #3184, I actually had two fixes. One was a tweak to how we
read data and the other was a tweak to how we determined how much of the
buffer we needed to keep around. It turns out that fixing #3184 only
required the latter fix, found in commit
d4b77a8d89. The former fix also helped the
specific case of #3184, but it ended up regressing `--line-buffered`.
Specifically, previous to 8c6595c215 (the
first fix), we would do one `read` syscall. This call might not fill our
caller provided buffer. And in particular, `stdin` seemed to fill fewer
bytes than reading from a file. So the "fix" was to put `read` in a loop
and keep calling it until the caller provided buffer was full or until
the stream was exhausted. This helped alleviate #3184 by amortizing
`read` syscalls better.
But of course, in retrospect, this change is clearly contrary to how
`--line-buffered` works. We specifically do _not_ want to wait around
until the buffer is full. We want to read what we can, search it and
move on.
So this reverts the first fix but leaves the second, which still
keeps #3184 fixed and also fixes#3194 (the regression).
This reverts commit 8c6595c215.
Fixes#3194
There seems to be a modest improvement on some workloads:
```
$ time rg -co '\w+' sixteenth.txt
158520346
real 8.457
user 8.426
sys 0.020
maxmem 779 MB
faults 0
$ time rg-lto -co '\w+' sixteenth.txt
158520346
real 8.200
user 8.178
sys 0.012
maxmem 778 MB
faults 0
```
I've somewhat reversed course on my previous thoughts here. The
improvement isn't much, but the hit to compile times in CI isn't
terrible. Mostly I'm doing this out of "good sense," and I think it's
generally unlikely to make it more difficult for me to diagnose
performance problems. (Since I still use the default `release` profile
locally, since it's about an order of magnitude quicker to compile.)
Ref #325, Ref #413, Ref #1187, Ref #1255
Their CI workflows broke for different reasons.
I perceive these as niche platforms that aren't worth blocking
a release on. And not worth my time investigating CI problems.
Somehow, the JSON printer seems to have never emitted correct summary
statistics. And I believe #3178 is the first time anyone has ever
reported it. I believe this bug has persisted for years. That's
surprising.
Anyway, the problem here was that we were bailing out of `finish()` on
the sink if we weren't supposed to print anything. But we bailed out
before we tallied our summary statistics. Obviously we shouldn't do
that.
Fixes#3178
The `ignore` crate currently handles two different kinds of "global"
gitignore files: gitignores from `~/.gitconfig`'s `core.excludesFile`
and gitignores passed in via `WalkBuilder::add_ignore` (corresponding to
ripgrep's `--ignore-file` flag).
In contrast to any other kind of gitignore file, these gitignore files
should have their patterns interpreted relative to the current working
directory. (Arguably there are other choices we could make here, e.g.,
based on the paths given. But the `ignore` infrastructure can't handle
that, and it's not clearly correct to me.) Normally, a gitignore file
has its patterns interpreted relative to where the gitignore file is.
This relative interpretation matters for patterns like `/foo`, which are
anchored to _some_ directory.
Previously, we would generally get the global gitignores correct because
it's most common to use ripgrep without providing a path. Thus, it
searches the current working directory. In this case, no stripping of
the paths is needed in order for the gitignore patterns to be applied
directly.
But if one provides an absolute path (or something else) to ripgrep to
search, the paths aren't stripped correctly. Indeed, in the core, I had
just given up and not provided a "root" path to these global gitignores.
So it had no hope of getting this correct.
We fix this assigning the CWD to the `Gitignore` values created from
global gitignore files. This was a painful thing to do because we'd
ideally:
1. Call `std::env::current_dir()` at most once for each traversal.
2. Provide a way to avoid the library calling `std::env::current_dir()`
at all. (Since this is global process state and folks might want to
set it to different values for $reasons.)
The `ignore` crate's internals are a total mess. But I think I've
addressed the above 2 points in a semver compatible manner.
Fixes#3179