The `ignore` crate currently handles two different kinds of "global"
gitignore files: gitignores from `~/.gitconfig`'s `core.excludesFile`
and gitignores passed in via `WalkBuilder::add_ignore` (corresponding to
ripgrep's `--ignore-file` flag).
In contrast to any other kind of gitignore file, these gitignore files
should have their patterns interpreted relative to the current working
directory. (Arguably there are other choices we could make here, e.g.,
based on the paths given. But the `ignore` infrastructure can't handle
that, and it's not clearly correct to me.) Normally, a gitignore file
has its patterns interpreted relative to where the gitignore file is.
This relative interpretation matters for patterns like `/foo`, which are
anchored to _some_ directory.
Previously, we would generally get the global gitignores correct because
it's most common to use ripgrep without providing a path. Thus, it
searches the current working directory. In this case, no stripping of
the paths is needed in order for the gitignore patterns to be applied
directly.
But if one provides an absolute path (or something else) to ripgrep to
search, the paths aren't stripped correctly. Indeed, in the core, I had
just given up and not provided a "root" path to these global gitignores.
So it had no hope of getting this correct.
We fix this assigning the CWD to the `Gitignore` values created from
global gitignore files. This was a painful thing to do because we'd
ideally:
1. Call `std::env::current_dir()` at most once for each traversal.
2. Provide a way to avoid the library calling `std::env::current_dir()`
at all. (Since this is global process state and folks might want to
set it to different values for $reasons.)
The `ignore` crate's internals are a total mess. But I think I've
addressed the above 2 points in a semver compatible manner.
Fixes#3179
... specifically, when the whitelist comes from a _parent_ gitignore
file.
Our handling of parent gitignores is pretty ham-fisted and has been a
source of some unfortunate bugs. The problem is that we need to strip
the parent path from the path we're searching in order to correctly
apply the globs. But getting this stripping correct seems to be a subtle
affair.
Fixes#3173
Maybe 2024 changes?
Note that we now set `edition = "2024"` explicitly in `rustfmt.toml`.
Without this, it seems like it's possible in some cases for rustfmt to
run under an older edition's style. Not sure how though.
When enabled, patterns like `[abc`, `[]`, `[!]` are treated as if the
opening `[` is just a literal. This is in contrast the default behavior,
which prioritizes better error messages, of returning a parse error.
Fixes#3127, Closes#3145
qrc[1] are the resource files for data related to user interfaces, and
ui[2] is the extension that the Qt Designer generates, for Widget based
projects.
Note that the initial PR used `ui` as a name for `*.ui`, but this seems
overly general. Instead, we use `qui` here instead.
Closes#3141
[1]: https://doc.qt.io/qt-6/resources.html
[2]: https://doc.qt.io/qt-6/uic.html
[Gleam] is a general-purpose, concurrent, functional high-level
programming language that compiles to Erlang or JavaScript source code.
Closes#3105
[Gleam]: https://gleam.run/
This PR adds llvm to the list of default types, matching files with
extension ll which is used widely for the textual form of LLVM's
Intermediate Representation.
Ref: https://llvm.org/docs/LangRef.htmlCloses#3079
`.env` or "dotenv" is used quite often in cross-compilation/embedded
development environments to load environment variables, define shell
functions or even to execute shell commands. Just like `.zshenv` in
this list, I think `.env` should also be added here.
Closes#3063
Kconfig files are used to represent the configuration database of
Kbuild build system. Kbuild is developed as part of the Linux kernel.
There are numerous other users including OpenWrt and U-Boot.
Ref: https://docs.kernel.org/kbuild/index.htmlCloses#2942
I was somewhat unsure about adding this, since `.svelte.ts` seems
primarily like a TypeScript file and it could be surprising to show up
in a search for Svelte files. In particular, ripgrep doesn't know how to
only search the Svelte stuff inside of a `.svelte.ts` file, so you could
end up with lots of false positives.
However, I was swayed[1] by the argument that the extension does
actually include `svelte` in it, so maybe this is fine. Please open an
issue if this change ends up being too annoying for most users.
Closes#2874, Closes#2909
[1]: https://github.com/BurntSushi/ripgrep/issues/2874#issuecomment-3126892931
This PR adds the .rake extension to the Ruby type. It's a pretty common
file extension in Rails apps—in my experience, the Rakefile is often
pretty empty and only sets some stuff up while most of the code lives
in various .rake files.
See: https://ruby.github.io/rake/doc/rakefile_rdoc.html#label-Multiple+Rake+FilesCloses#2921
When searching in parallel with many more arguments than threads, the
first arguments are searched last -- unlike in the -j1 case.
This is unexpected for users who know about the parallel nature of rg
and think they can give the scheduler a hint by positioning larger
input files (L1, L2, ..) before smaller ones (█, ██). Instead, this can
result in sub-optimal thread usage and thus longer runtime (simplified
example with 2 threads):
T1: █ ██ █ █ █ █ ██ █ █ █ █ █ ██ ╠═════════════L1════════════╣
T2: █ █ ██ █ █ ██ █ █ █ ██ █ █ ╠═════L2════╣
┏━━━━┳━━━━┳━━━━┳━━━━┓
This is caused by assigning work to ┃ T1 ┃ T2 ┃ T3 ┃ T4 ┃
per-thread stacks in a round-robin ┡━━━━╇━━━━╇━━━━╇━━━━┩
manner, starting here → │ L1 │ L2 │ L3 │ L4 │ ↵
├────├────┼────┼────┤
│ s5 │ s6 │ s7 │ s8 │ ↵
├────┼────┼────┼────┤
╷ .. ╷ .. ╷ .. ╷ .. ╷
├────┼────┼────┼────┤
│ st │ su │ sv │ sw │ ↵
├────┼────┼────┼────┘
│ sx │ sy │ sz │
└────┴────┴────┘
and then processing them bottom-up: ↥ ↥ ↥ ↥
╷ .. ╷ .. ╷ .. ╷ .. ╷
This patch reverses the input order ├────┼────┼────┼────┤
so the two reversals cancel each other │ s7 │ s6 │ s5 │ L4 │ ↵
out. Now at least the first N ├────┼────┼────┼────┘
arguments, N=number-of-threads, are │ L3 │ L2 │ L1 │
processed before any others (then └────┴────┴────┘
work-stealing may happen):
T1: ╠═════════════L1════════════╣ █ ██ █ █ █ █ █ █ ██
T2: ╠═════L2════╣ █ █ ██ █ █ ██ █ █ █ ██ █ █ ██ █ █ █
(With some more shuffling T1 could always be assigned L1 etc., but
that would mostly be for optics).
Closes#2849
The *BSD build systems make use of "Makefile.inc" a lot. Make the
"make" type recognize this file by default. And more generally,
`Makefile.*` seems to be a convention, so just generalize it.
Closes#2846
This makes it so the presence of `.jj` will cause ripgrep to treat it
as a VCS directory, just as if `.git` were present. This is useful for
ripgrep's default behavior when working with jj repositories that don't
have a `.git` but do have `.gitignore`. Namely, ripgrep requires the
presence of a VCS repository in order to respect `.gitignore`.
We don't handle clone-specific exclude rules for jj repositories without
`.git` though. It seems it isn't 100% set yet where we can find
those[1].
Closes#2842
[1]: https://github.com/BurntSushi/ripgrep/pull/2842#discussion_r2020076722
The previous code deleted too many parts of the path when constructing
the absolute path, resulting in a shortened final path. This patch
creates the correct absolute path by only removing the necessary parts.
Fixes#829, Fixes#2731, Fixes#2747, Fixes#2778, Fixes#2836, Fixes#2933, Fixes#3144Closes#2933
These all seem pretty straight-forward. Compared with #2706, I dropped
the changes to the atomic orderings used in `ignore` because I haven't
had time to think through that carefully. But the ops in this PR seem
fine.
Closes#2706
It looks like there is a reference cycle caused by the compiled
matchers (compiled HashMap holds ref to Ignore and Ignore holds ref
to HashMap). Using weak refs fixes issue #2690 in my test project.
Also confirmed via before and after when profiling the code, see the
attached screenshots in #2692.
Fixes#2690
I don't usually like doing this and would prefer to just delete unused
code, but I don't have the context required to understand why this code
is unused. A refresh of this crate is on the (distant) horizon, so I'll
just leave these here for now to squash the warnings.
Previously, every worker would increment the shared num_pending count on
every new work item, and decrement it after finishing them, leading to
lots of contention. Now, we only track the number of workers actively
running, so there is no contention except when workers go to sleep or
wake up.
Closes#2642
This permits the value to be surrounded in double quotes. It's still not
perfect, but probably better than it was. Getting this to be more
correct will likely require writing (or using) a real parser, which I'm
not particularly incliend to do at present.
Fixes#2392, Closes#2629
There's no particular reason for this change. I happened to be looking
at the code again and realized that stealing from your left neighbour
or your right neighbour shouldn't make a difference (and indeed perf is
the same in my benchmarks).
Closes#2624
This represents yet another iteration on how `ignore` enqueues and
distributes work in parallel. The original implementation used a
multi-producer/multi-consumer thread safe queue from crossbeam. At some
point, I migrated to a simple `Arc<Mutex<Vec<_>>>` and treated it as a
stack so that we did depth first traversal. This helped with memory
usage in very wide directories.
But it turns out that a naive stack-behind-a-mutex can be quite a bit
slower than something that's a little smarter, such as a work-stealing
stack used in this commit. My hypothesis for why this helps is that
without the stealing component, work distribution can get stuck in
sub-optimal configurations that depend on which directory entries get
assigned to a particular worker. It's likely that this can result in
some workers getting "more" work than others, just by chance, and thus
remain idle. But the work-stealing approach heads that off.
This does re-introduce a dependency on parts of crossbeam which is kind
of a bummer, but it's carrying its weight for now.
Closes#1823, Closes#2591
Ref https://github.com/sharkdp/fd/issues/28