This replaces the use of channels in the parallel directory traversal
with a simple stack. The primary motivation for this change is to reduce
peak memory usage. In particular, when using a channel (which is a
queue), we wind up visiting files in a breadth first fashion. Using a
stack switches us to a depth first traversal. While there are no real
intrinsic differences, depth first traversal generally tends to use less
memory because directory trees are more commonly wide than they are
deep.
In particular, the queue/stack size itself is not the only concern. In
one recent case documented in #1550, a user wanted to search all Rust
crates. The directory structure was shallow but extremely wide, with a
single directory containing all crates. This in turn results in
descending into each of those directories and building a gitignore
matcher for each (since most crates have `.gitignore` files) before ever
searching a single file. This means that ripgrep has all such matchers
in memory simultaneously, which winds up using quite a bit of memory.
In a depth first traversal, peak memory usage is much lower because
gitignore matchers are built and discarded more quickly. In the case of
searching all crates, the peak memory usage decrease is dramatic. On my
system, it shrinks by an order of magnitude, from almost 1GB to 50MB. The
decline in peak memory usage is consistent across other use cases as
well, but is typically more modest. For example, searching the Linux
repo has a 50% decrease in peak memory usage and searching the Chromium
repo has a 25% decrease in peak memory usage.
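
To illustrate why the traversal order matters for matcher lifetimes, here
is a minimal single-threaded sketch. It is not ripgrep's actual walker:
`Matcher`, `Work` and `peak_matchers` are hypothetical names, and the
synthetic tree (one root with `width` subdirectories, each holding `files`
files) only approximates the "all crates in one directory" case. Each
pending work item keeps its parent's matcher alive through an `Rc`.

```rust
use std::cell::Cell;
use std::collections::VecDeque;
use std::rc::Rc;

/// Stand-in for a per-directory gitignore matcher. It only counts how
/// many instances are alive at once.
struct Matcher {
    live: Rc<Cell<usize>>,
}

impl Matcher {
    fn new(live: &Rc<Cell<usize>>, peak: &mut usize) -> Rc<Matcher> {
        live.set(live.get() + 1);
        *peak = (*peak).max(live.get());
        Rc::new(Matcher { live: Rc::clone(live) })
    }
}

impl Drop for Matcher {
    fn drop(&mut self) {
        self.live.set(self.live.get() - 1);
    }
}

/// A pending unit of work. Each item holds a handle to the matcher of
/// the directory it was found in, which keeps that matcher alive.
enum Work {
    Dir { depth: usize, _parent: Option<Rc<Matcher>> },
    File { _matcher: Rc<Matcher> },
}

/// Walks the synthetic tree and returns the peak number of matchers
/// that were alive at the same time. With `use_stack` the pending list
/// is popped from the back (depth first), otherwise from the front
/// (breadth first, like a channel).
fn peak_matchers(width: usize, files: usize, use_stack: bool) -> usize {
    let live = Rc::new(Cell::new(0));
    let mut peak = 0;
    let mut pending: VecDeque<Work> = VecDeque::new();
    pending.push_back(Work::Dir { depth: 0, _parent: None });
    loop {
        let next = if use_stack { pending.pop_back() } else { pending.pop_front() };
        let Some(work) = next else { break };
        match work {
            // "Searching" a file is a no-op here. The item's matcher
            // handle is released when `work` goes out of scope.
            Work::File { .. } => {}
            Work::Dir { depth, .. } => {
                let m = Matcher::new(&live, &mut peak);
                if depth == 0 {
                    for _ in 0..width {
                        pending.push_back(Work::Dir {
                            depth: 1,
                            _parent: Some(Rc::clone(&m)),
                        });
                    }
                } else {
                    for _ in 0..files {
                        pending.push_back(Work::File { _matcher: Rc::clone(&m) });
                    }
                }
            }
        }
    }
    peak
}
```

In this model, `peak_matchers(1000, 3, false)` keeps every subdirectory's
matcher alive until the first file is searched, while
`peak_matchers(1000, 3, true)` only ever holds the root matcher plus one
subdirectory matcher.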
Search times generally remain unchanged, although some ad hoc benchmarks
that I typically run have gotten a bit slower. As far as I can tell,
this appears to be the result of scheduling changes. Namely, the depth first
traversal seems to result in searching some very large files towards the
end of the search, which reduces the effectiveness of parallelism and
makes the overall search take longer. This seems to suggest that a stack
isn't optimal. It would instead perhaps be better to prioritize
searching larger files first, but it's not quite clear how to do this
without introducing more overhead (getting the file size for each file
requires a stat call).
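
The scheduling effect can be shown with a toy model. This is only a
sketch under assumptions: `makespan` and `largest_first` are hypothetical
helpers, the costs are arbitrary units standing in for file sizes, and
nothing here reflects how ripgrep actually hands out work.

```rust
/// Simulates `workers` threads that each pick up the next file as soon
/// as they become free. `costs` are per-file search costs in arbitrary
/// units. Returns the time at which the last worker finishes.
fn makespan(costs: &[u64], workers: usize) -> u64 {
    assert!(workers > 0);
    let mut busy_until = vec![0u64; workers];
    for &cost in costs {
        // The next file goes to whichever worker becomes free first.
        let next_free = busy_until.iter_mut().min().unwrap();
        *next_free += cost;
    }
    busy_until.into_iter().max().unwrap()
}

/// Returns the costs reordered so that the most expensive files are
/// handed out first.
fn largest_first(costs: &[u64]) -> Vec<u64> {
    let mut sorted = costs.to_vec();
    sorted.sort_unstable_by(|a, b| b.cmp(a));
    sorted
}
```

With two workers and costs `[1, 1, 1, 1, 10]`, searching the large file
last takes 12 units, while searching it first takes 10. The sort stands
in for knowing each file's size up front, which is exactly the part that
would cost a stat call per file.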
Fixes #1550
It appears to be intermittently failing. Specifically, a2x seems to be
failing occasionally with no apparent reason why. The error message it
gives is inscrutable. Sigh.
In a prior commit, we fixed a performance problem with the -w flag by
doing a little extra work to extract literals. It turns out that using
literals in this case when the -w flag is NOT used results in a
performance regression. The reason is that we end up using a "fast"
regex as a prefilter when the regex engine itself uses its own
equivalent prefilter, so ripgrep ends up redoing a fair amount of work.
Instead, we only do this extra work when we know the -w flag is enabled.
... and don't replace them with anything because crates.io does not
support GitHub Actions yet. But it's almost there:
https://github.com/rust-lang/crates.io/pull/1838
Thanks @atouchet for noticing this.
We should not assume that the commondir file actually exists. If it
doesn't, then just move on. This otherwise emits an error message when
searching normal submodules, which is not OK.
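
A minimal sketch of the intended behavior, assuming a hypothetical helper
named `read_commondir` (this is not the actual code in the ignore crate):
a missing file is treated as "nothing to do" rather than as an error,
while other I/O errors are still reported.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Reads `<git_dir>/commondir` if it exists.
///
/// Returns `Ok(None)` when the file is absent, so callers can move on
/// without emitting an error message.
fn read_commondir(git_dir: &Path) -> io::Result<Option<PathBuf>> {
    match fs::read_to_string(git_dir.join("commondir")) {
        // Assumption: the file holds a path that is either absolute or
        // relative to the git directory. `join` handles both cases.
        Ok(contents) => Ok(Some(git_dir.join(contents.trim_end()))),
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(err) => Err(err),
    }
}
```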
This regression was introduced in #1446.
Fixes #1520
This also updates the corpora used, so previous times (and counts) are
not comparable.
We also remove some tools, like pt, sift and ucg, since they appear to
be no longer maintained. ag isn't really maintained either, but it still
has significant mind share, so we retain a benchmark for it.
We also upgrade ack to version 3, and remove the clarification on how
`-w` is implemented.
We also add `git grep -P` (uses PCRE2) which appears to be much faster
than `git grep -E`.
Finally, we add ugrep which is a new up and comer in this space.
Fixes #1474
If a literal is entirely whitespace, then it's quite likely that it is
very common. So when that case occurs, just don't do (inner) literal
optimizations at all.
The regex engine may still make sub-optimal decisions here, but that's a
problem for another day.
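
The check itself is simple. As a rough sketch (the function name
`usable_inner_literal` is hypothetical, and treating an empty literal as
unusable is an assumption made here for illustration):

```rust
/// Returns false when the literal consists only of ASCII whitespace,
/// in which case it is likely too common to be a useful prefilter.
fn usable_inner_literal(lit: &[u8]) -> bool {
    // `all` is true for an empty slice, so an empty literal is also
    // rejected by this check.
    !lit.iter().all(|b| b.is_ascii_whitespace())
}
```

For example, `usable_inner_literal(b"   \t")` is false, while
`usable_inner_literal(b"foo bar")` is true.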
Fixes #1087
The purpose of this flag is to force ripgrep to ignore all --ignore-file
flags (whether they come before or after --no-ignore-files).
This flag can be overridden with --ignore-files.
Fixes #1466
It doesn't really belong in the man page since it's an artifact of a
build/runtime configuration. Moreover, it inhibits reproducible builds.
Fixes #1441
This permits switching between the different regex engine modes that
ripgrep supports. The purpose of this flag is to make it easier to
extend ripgrep with additional regex engines.
Closes #1488, Closes #1502
This is in preparation for adding a new --engine flag which is intended
to eventually supplant --auto-hybrid-regex.
While there are no immediate plans to add more regex engines to ripgrep,
this is intended to make it easier to maintain a patch to ripgrep with
an additional regex engine. See #1488 for more details.
This adds one new dependency, maybe-uninit, which is brought in by
crossbeam-channel[1]. This is apparently to fix some unsound code
without bumping the MSRV. Since ripgrep uses the latest stable release
of Rust, the maybe-uninit crate should compile down to nothing and just
re-export std's `MaybeUninit` type.
[1] - https://github.com/crossbeam-rs/crossbeam/pull/458
It's not clear why removing this makes things work. I've submitted
PRs that passed CI with fetch-depth=1. Maybe it only fails when
PRs are submitted from external contributors?
Either way, for now, we remove this and absorb the extra cost in
order to get PRs passing CI again.
PR #1501