This also updates the corpora used, so previous times (and counts) are
not comparable.
We also remove some tools, likt pt, sift and ucg, since they appear to
be no longer maintained. ag isn't really maintained either, but it still
has significant mind share, so we retain a benchmark for it.
We also upgrade ack to version 3, and remove the clarification on how
`-w` is implemented.
We also add `git grep -P` (uses PCRE2) which appears to be much faster
than `git grep -E`.
Finally, we add ugrep which is a new up and comer in this space.
Fixes#1474
If a literal is entirely whitespace, then it's quite likely that it is
very common. So when that case occurs, just don't do (inner) literal
optimizations at all.
The regex engine may still make sub-optimal decisions here, but that's a
problem for another day.
Fixes#1087
The purpose of this flag is to force ripgrep to ignore all --ignore-file
flags (whether they come before or after --no-ignore-files).
This flag can be overridden with --ignore-files.
Fixes#1466
It doesn't really belong in the man page since it's an artifact of a
build/runtime configuration. Moreover, it inhibits reproducible builds.
Fixes#1441
This permits switching between the different regex engine modes that
ripgrep supports. The purpose of this flag is to make it easier to
extend ripgrep with additional regex engines.
Closes#1488, Closes#1502
This is in preparation for adding a new --engine flag which is intended
to eventually supplant --auto-hybrid-regex.
While there are no immediate plans to add more regex engines to ripgrep,
this is intended to make it easier to maintain a patch to ripgrep with
an additional regex engine. See #1488 for more details.
This adds one new dependency, maybe-uninit, which is brought in by
crossbeam-channel[1]. This is to apparently fix some unsound code
without bumping the MSRV. Since ripgrep uses the latest stable release
of Rust, the maybe-uninit crate should compile down to nothing and just
re-export std's `MaybeUninit` type.
[1] - https://github.com/crossbeam-rs/crossbeam/pull/458
It's not clear why removing this makes things work. I've submitted
PRs that passed CI with fetch-depth=1. Maybe it only fails when
PRs are submitted from external contributors?
Either way, for now, we remove this and absorb the extra cost in
order to get PRs passing CI again.
PR #1501
We can just ask the channel whether any work has been loaded. Normally
querying a channel for its length is a strong predictor of bugs, but in
this case, we do it before we ever attempt a `recv`, so it should work.
Kudos to @zsugabubus for suggesting this!
It turns out that the previous version wasn't quite correct. Namely, it
was possible for the following sequence to occur:
1. Consider that all workers, except for one, are `waiting`.
2. The last remaining worker finds one more job to do and sends it on
the channel.
3. One of the previously `waiting` workers wakes up from the job that
the last running worker sent, but `self.resume()` has not been
called yet.
4. The last worker, from (2), calls `get_work` and sees that the
channel has nothing on it, so it executes `self.waiting() ==
1`. Since the worker in (3) hasn't called `self.resume()` yet,
`self.waiting() == 1` evaluates to true.
5. This sets off a chain reaction that stops all workers, despite that
fact that (3) got more work (which could itself spawn more work).
The end result is that the traversal may terminate while their are still
outstanding work items to process. This problem was observed through
spurious failures in CI. I was not actually able to reproduce the bug
locally.
We fix this by changing our strategy to detect termination using a
counter. Namely, we increment the counter just before sending new work
and decrement the counter just after finishing work. In this way, we
guarantee that the counter only ever reaches 0 once there is no more
work to process.
See #1337 for more discussion. Many thanks to @zsugabubus for helping me
work through this.
The transient failures appear to be persisting and they are quite
difficult to debug. So include a full directory listing in the output of
every test failure.
The reason why it wasn't working was the integration tests. Namely, the
integration tests attempted to execute the 'rg' binary directly from
inside cross's docker container. But this obviously doesn't work when
'rg' was compiled for a totally different architecture.
Cross normally does this by hooking into the Rust test infrastructure
and causing tests to run with 'qemu'. But our integration tests didn't
do that. This commit fixes our test setup to check for cross's
environment variable that points to the 'qemu' binary. Once we have
that, we just use 'qemu-foo rg' instead of 'rg'. Piece of cake.
The top-level listing was just getting a bit too long for my taste. So
put all of the code in one directory and shrink the large top-level mess
to a small top-level mess.
NOTE: This commit only contains renames. The subsequent commit will
actually make ripgrep build again. We do it this way with the naive hope
that this will make it easier for git history to track the renames.
Sigh.
This is why I was so intent on clearing the PR queue. This will
effectively invalidate all existing patches, so I wanted to start from a
clean slate.
We do make one little tweak: we put the default type definitions in
their own file and tell rustfmt to keep its grubby mits off of it. We
also sort it lexicographically and hopefully will enforce that from here
on.
Change the meaning of `Quit` message. Now it means terminate. The final
"dance" is unnecessary, because by the time quitting begins, no thread
will ever spawn a new `Work`. The trick was to replace the heuristic
spin-loop with blocking receive.
Closes#1337
Due to how walkdir works if symlinks are not followed, symlinks to
directories are seen as simple files by ripgrep. This caused a panic
in some cases due to receiving a WalkEvent::Exit event without a
corresponding WalkEvent::Dir event.
This is fixed by looking at the metadata of the file in the case of a
symlink to determine if it's a directory. We are careful to only do
this stat check when the depth of the entry is 0, as this bug only
impacts us when 1) we aren't following symlinks generally and 2) the
user provides a symlinked directory that we do follow as a top-level
path to search.
Fixes#1389, Closes#1397
This adds a universal --no-unicode flag that is intended to work for all
supported regex engines. There is no point in retaining
--no-pcre2-unicode, so we make them aliases to the new flags and
deprecate them.
This flag prevents ripgrep from requiring one to search a git repository
in order to respect git-related ignore rules (global, .gitignore and
local excludes). This actually corresponds to behavior ripgrep had long
ago, but #934 changed that. It turns out that users were relying on this
buggy behavior. In most cases, fixing it as simple as converting one's
rules to .ignore or .rgignore files. Unfortunately, there are other use
cases---like Perforce automatically respecting .gitignore files---that
make a strong case for ripgrep to at least support this.
The UX of a flag like this is absolutely atrocious. It's so obscure that
it's really not worth explicitly calling it out anywhere. Moreover, the
error cases that occur when this flag isn't used (but its behavior is
desirable) will not be intuitive, do not seem easily detectable and will
not guide users to this flag. Nevertheless, the motivation for this is
just barely strong enough for me to begrudgingly accept this.
Fixes#1414, Closes#1416
This commit fixes an inconsistency between the serial and the parallel
directory walkers around visiting a directory for which the user holds
insufficient permissions to descend into.
The serial walker does produce a successful entry for a directory that
it cannot descend into due to insufficient permissions. However, before
this change that has not been the case for the parallel walker, which
would produce an `Err` item not only when descending into a directory
that it cannot read from but also for the directory entry itself.
This change brings the behaviour of the parallel variant in line with
that of the serial one.
Fixes#1346, Closes#1365
This appears to be another transcription bug from copying this code from
the prefix literal detection from inside the regex crate. Namely, when
it comes to inner literals, we only want to treat counted repetition as
two separate cases: the case when the minimum match is 0 and the case
when the minimum match is more than 0. In the former case, we treat
`e{0,n}` as `e*` and in the latter we treat `e{m,n}` where `m >= 1` as
just `e`.
We could definitely do better here. e.g., This means regexes like
`(foo){10}` will only have `foo` extracted as a literal, where searching
for the full literal would likely be faster.
The actual bug here was that we were not implementing this logic
correctly. Namely, we weren't always "cutting" the literals in the
second case to prevent them from being expanded.
Fixes#1319, Closes#1367
On top of the parallel-walk's closures, this provides a Visitor API.
This clarifies the role of the two different closures in the `run`
API and allows implementing of `Drop` for post-processing once traversal
is finished.
The closure API is maintained not just for compatibility but also
convinience for simple cases.
Fixes#469, Closes#1430
This makes it so the caller can more easily refactor from
single-threaded to multi-threaded walking. If they want to support both,
this makes it easier to do so with a single initialization code-path. In
particular, it side-steps the need to put everything into an `Arc`.
This is not a breaking change because it strictly increases the number
of allowed inputs to `WalkParallel::run`.
Closes#1410, Closes#1432
Git looks for this file in GIT_COMMON_DIR, which is usually the same
as GIT_DIR (.git). However, when searching inside a linked worktree,
.git is usually a file that contains the path of the actual git dir,
which in turn contains a file "commondir" which references the directory
where info/exclude may reside, alongside other configuration shared across
all worktrees. This directory is usually the git dir of the main worktree.
Unlike git this does *not* read environment variables GIT_DIR and
GIT_COMMON_DIR, because it is not clear how to interpret them when
searching multiple repositories.
Fixes#1445, Closes#1446