mirror of https://github.com/BurntSushi/ripgrep.git synced 2025-11-23 21:54:45 +02:00

143 Commits

Author SHA1 Message Date
Andrew Gallant
b610d1cb15 ignore: fix global gitignore bug that arises with absolute paths
The `ignore` crate currently handles two different kinds of "global"
gitignore files: gitignores from `~/.gitconfig`'s `core.excludesFile`
and gitignores passed in via `WalkBuilder::add_ignore` (corresponding to
ripgrep's `--ignore-file` flag).

In contrast to any other kind of gitignore file, these gitignore files
should have their patterns interpreted relative to the current working
directory. (Arguably there are other choices we could make here, e.g.,
based on the paths given. But the `ignore` infrastructure can't handle
that, and it's not clearly correct to me.) Normally, a gitignore file
has its patterns interpreted relative to where the gitignore file is.
This relative interpretation matters for patterns like `/foo`, which are
anchored to _some_ directory.

Previously, we would generally get the global gitignores correct because
it's most common to use ripgrep without providing a path. Thus, it
searches the current working directory. In this case, no stripping of
the paths is needed in order for the gitignore patterns to be applied
directly.

But if one provides an absolute path (or something else) to ripgrep to
search, the paths aren't stripped correctly. Indeed, in the core, I had
just given up and not provided a "root" path to these global gitignores.
So it had no hope of getting this correct.
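The sketch below is a minimal illustration of why the stripping matters. It assumes global gitignores are rooted at the CWD. `relative_to_root` and `anchored_match` are hypothetical helpers, not the `ignore` crate's API, and the matcher only understands a single anchored name like `/foo`.

```rust
use std::ffi::OsStr;
use std::path::Path;

/// Hypothetical helper: strip the root that a gitignore's patterns are
/// relative to (assumed here to be the CWD for global gitignores).
/// Returns `None` if `candidate` is not under `root`.
fn relative_to_root<'a>(root: &Path, candidate: &'a Path) -> Option<&'a Path> {
    candidate.strip_prefix(root).ok()
}

/// Hypothetical, approximate matcher for an anchored pattern like `/foo`:
/// it reports a match when the first component of `relative` is `foo`,
/// i.e. `foo` itself or anything beneath it.
fn anchored_match(pattern: &str, relative: &Path) -> bool {
    let name = pattern.trim_start_matches('/');
    relative
        .components()
        .next()
        .map_or(false, |c| c.as_os_str() == OsStr::new(name))
}
```

Against an absolute path the first component is the root `/`, so `/foo` can never match. Once the CWD prefix is stripped, the path starts with `foo` and the anchored pattern applies as intended.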

We fix this by assigning the CWD to the `Gitignore` values created from
global gitignore files. This was a painful thing to do because we'd
ideally:

1. Call `std::env::current_dir()` at most once for each traversal.
2. Provide a way to avoid the library calling `std::env::current_dir()`
   at all. (Since this is global process state and folks might want to
   set it to different values for $reasons.)
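Both goals can be sketched as a per-traversal holder that either takes a caller-supplied directory or calls `std::env::current_dir()` lazily and caches the result. `TraversalCwd` is a hypothetical name; this is not how the `ignore` crate actually exposes it.

```rust
use std::cell::OnceCell;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical per-traversal CWD holder (not the `ignore` crate's API).
struct TraversalCwd {
    cwd: OnceCell<PathBuf>,
}

impl TraversalCwd {
    /// No override: `current_dir()` is called on first use and cached.
    fn lazy() -> Self {
        TraversalCwd { cwd: OnceCell::new() }
    }

    /// Caller-supplied CWD: process-global state is never consulted.
    fn with_cwd(path: impl Into<PathBuf>) -> Self {
        let cwd = OnceCell::new();
        let _ = cwd.set(path.into());
        TraversalCwd { cwd }
    }

    fn get(&self) -> io::Result<&Path> {
        if let Some(p) = self.cwd.get() {
            return Ok(p.as_path());
        }
        // `OnceCell::get_or_try_init` is unstable, so compute, then store.
        let p = std::env::current_dir()?;
        Ok(self.cwd.get_or_init(|| p).as_path())
    }
}
```

Using `OnceCell` keeps the cache per traversal rather than process-wide, so two walks can use different directories.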

The `ignore` crate's internals are a total mess. But I think I've
addressed the above 2 points in a semver compatible manner.

Fixes #3179
2025-10-15 19:44:23 -04:00
Luke Hannan
9ec08522be ignore/types: add lowercase R extensions
PR #3186
2025-10-14 15:15:07 -04:00
Andrew Gallant
0407e104f6 ignore: fix problem with searching whitelisted hidden files
... specifically, when the whitelist comes from a _parent_ gitignore
file.

Our handling of parent gitignores is pretty ham-fisted and has been a
source of some unfortunate bugs. The problem is that we need to strip
the parent path from the path we're searching in order to correctly
apply the globs. But getting this stripping correct seems to be a subtle
affair.

Fixes #3173
2025-10-08 21:16:59 -04:00
Alvaro Parker
2924d0c4c0 ignore: add min_depth option
This mimics the eponymous option in `walkdir`.
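In `walkdir`, `min_depth` suppresses shallow entries without pruning them, so their descendants are still visited. The sketch below shows that semantics on a hypothetical in-memory tree (`Node` and `walk_min_depth` are illustrative names), with the root at depth 0.

```rust
/// A tiny in-memory directory tree (hypothetical, for illustration only).
struct Node {
    name: &'static str,
    children: Vec<Node>,
}

/// Depth-first walk that still descends into shallow entries but does not
/// yield any entry whose depth is less than `min_depth`.
fn walk_min_depth(root: &Node, min_depth: usize) -> Vec<(usize, &'static str)> {
    let mut out = Vec::new();
    let mut stack = vec![(0usize, root)];
    while let Some((depth, node)) = stack.pop() {
        if depth >= min_depth {
            out.push((depth, node.name));
        }
        // Push children in reverse so they are visited in listed order.
        for child in node.children.iter().rev() {
            stack.push((depth + 1, child));
        }
    }
    out
}
```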

Closes #3158, PR #3162
2025-10-05 10:05:26 -04:00
Andrew Gallant
e42432cc5d ignore: clarify WalkBuilder::filter_entry
Fixes #2913
2025-09-22 21:49:29 -04:00
Andrew Gallant
bb8172fe9b style: apply rustfmt
Maybe 2024 changes?

Note that we now set `edition = "2024"` explicitly in `rustfmt.toml`.
Without this, it seems like it's possible in some cases for rustfmt to
run under an older edition's style. Not sure how though.
2025-09-19 21:08:19 -04:00
mostafa
f596a5d875 globset: add allow_unclosed_class toggle
When enabled, patterns like `[abc`, `[]`, `[!]` are treated as if the
opening `[` is just a literal. This is in contrast to the default
behavior of returning a parse error, which prioritizes better error
messages.
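The toggle can be illustrated with a bracket-only tokenizer. This is a hypothetical sketch, not globset's parser. It assumes the POSIX-style rule that a `]` immediately after `[` or `[!` is a class member, which is why `[]` and `[!]` count as unclosed.

```rust
#[derive(Debug, PartialEq)]
enum Token {
    Literal(char),
    /// A bracket class: whether it is negated, and its raw member text.
    Class { negated: bool, members: String },
}

/// Hypothetical tokenizer handling only literals and `[...]` classes.
fn tokenize(pat: &str, allow_unclosed_class: bool) -> Result<Vec<Token>, String> {
    let chars: Vec<char> = pat.chars().collect();
    let mut tokens = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        if chars[i] != '[' {
            tokens.push(Token::Literal(chars[i]));
            i += 1;
            continue;
        }
        let mut j = i + 1;
        let negated = j < chars.len() && chars[j] == '!';
        if negated {
            j += 1;
        }
        let start = j;
        // A leading `]` is a class member, not the terminator.
        if j < chars.len() && chars[j] == ']' {
            j += 1;
        }
        while j < chars.len() && chars[j] != ']' {
            j += 1;
        }
        if j < chars.len() {
            let members: String = chars[start..j].iter().collect();
            tokens.push(Token::Class { negated, members });
            i = j + 1;
        } else if allow_unclosed_class {
            // Treat the opening `[` as a literal and keep scanning after it.
            tokens.push(Token::Literal('['));
            i += 1;
        } else {
            return Err(format!("unclosed character class in {:?}", pat));
        }
    }
    Ok(tokens)
}
```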

Fixes #3127, Closes #3145
2025-09-19 21:08:19 -04:00
Thomas ten Cate
556623684e ignore/types: add GDScript files (*.gd) for the Godot Engine
Closes #3142
2025-09-19 21:08:19 -04:00
Cristián Maureira-Fredes
3f565b58cc ignore/types: add Qt types for resource files and ui declaration
qrc[1] are the resource files for data related to user interfaces, and
ui[2] is the extension that the Qt Designer generates, for Widget based
projects.

Note that the initial PR used `ui` as a name for `*.ui`, but this seems
overly general, so we use `qui` here instead.

Closes #3141

[1]: https://doc.qt.io/qt-6/resources.html
[2]: https://doc.qt.io/qt-6/uic.html
2025-09-19 21:08:19 -04:00
Porkepix
56d03a1e2f ignore/types: include missing files for the tf type
Existing matches were too restrictive, so we simplify them to match every
type of tfvars file we can encounter.

Closes #3117
2025-09-19 21:08:19 -04:00
Tomek
e166f271df ignore/types: add gleam
[Gleam] is a general-purpose, concurrent, functional high-level
programming language that compiles to Erlang or JavaScript source code.

Closes #3105

[Gleam]: https://gleam.run/
2025-09-19 21:08:19 -04:00
Andrew McNulty
83d94672ae ignore/types: add LLVM to default types
This PR adds llvm to the list of default types, matching files with
extension ll which is used widely for the textual form of LLVM's
Intermediate Representation.

Ref: https://llvm.org/docs/LangRef.html

Closes #3079
2025-09-19 21:08:19 -04:00
James Moberg
6887122e5b ignore/types: add ColdFusion and BoxLang
Closes #3090
2025-09-19 21:08:19 -04:00
Lilian A. Moraru
06210b382a ignore/types: add .env to sh file type
`.env` or "dotenv" is used quite often in cross-compilation/embedded
development environments to load environment variables, define shell
functions or even to execute shell commands. Just like `.zshenv` in
this list, I think `.env` should also be added here.

Closes #3063
2025-09-19 21:08:19 -04:00
bbb651
ba23ced817 ignore/types: add scdoc
Ref https://sr.ht/~sircmpwn/scdoc/

Closes #3007
2025-09-19 21:08:19 -04:00
Thomas Weißschuh
6244e635a1 ignore/types: add Kconfig
Kconfig files are used to represent the configuration database of
Kbuild build system. Kbuild is developed as part of the Linux kernel.
There are numerous other users including OpenWrt and U-Boot.

Ref: https://docs.kernel.org/kbuild/index.html

Closes #2942
2025-09-19 21:08:19 -04:00
Dmitry Gerasimov
75e17fcabe ignore/types: add *.dtso to devicetree type
`dtso` files became recognized as devicetree a
couple of years ago with the following commit:
363547d219

Closes #2938
2025-09-19 21:08:19 -04:00
Martin Pool
99b7957122 ignore/doc: explain that require_git(false) will ascend above git roots
This should hopefully help avoid confusion about #2812 as encountered
in https://github.com/sourcefrog/cargo-mutants/issues/450.

Closes #2937
2025-09-19 21:08:19 -04:00
Colin Heffernan
b0c6d4c34a ignore/types: add *.svelte.ts to Svelte file type glob
I was somewhat unsure about adding this, since `.svelte.ts` seems
primarily like a TypeScript file and it could be surprising to show up
in a search for Svelte files. In particular, ripgrep doesn't know how to
only search the Svelte stuff inside of a `.svelte.ts` file, so you could
end up with lots of false positives.

However, I was swayed[1] by the argument that the extension does
actually include `svelte` in it, so maybe this is fine. Please open an
issue if this change ends up being too annoying for most users.
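The false-positive concern can be seen with a toy type matcher. The `TYPES` table and `simple_match` are hypothetical simplifications that support only basename globs with at most one `*`. ripgrep's real definitions live in the `ignore` crate and are matched with `globset`. Under this model a file like `App.svelte.ts` belongs to two types at once.

```rust
/// Illustrative type table (not ripgrep's actual definitions).
const TYPES: &[(&str, &[&str])] = &[
    ("svelte", &["*.svelte", "*.svelte.ts"]),
    ("ts", &["*.ts", "*.tsx"]),
    ("make", &["Makefile", "Makefile.*", "*.mk"]),
];

/// Matches a glob containing at most one `*` against a file name.
fn simple_match(glob: &str, name: &str) -> bool {
    match glob.split_once('*') {
        None => glob == name,
        Some((prefix, suffix)) => {
            // The length check stops prefix and suffix from overlapping.
            name.len() >= prefix.len() + suffix.len()
                && name.starts_with(prefix)
                && name.ends_with(suffix)
        }
    }
}

/// Returns every type whose globs match `name`.
fn types_for(name: &str) -> Vec<&'static str> {
    TYPES
        .iter()
        .filter(|(_, globs)| globs.iter().any(|g| simple_match(g, name)))
        .map(|(ty, _)| *ty)
        .collect()
}
```

Here `types_for("App.svelte.ts")` reports both `svelte` and `ts`, so a `-t svelte` search would also surface files that are mostly TypeScript.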

Closes #2874, Closes #2909

[1]: https://github.com/BurntSushi/ripgrep/issues/2874#issuecomment-3126892931
2025-09-19 21:08:19 -04:00
Zach Ahn
e83828fc8c ignore/types: add *.rake extension to list of Ruby file types
This PR adds the .rake extension to the Ruby type. It's a pretty common
file extension in Rails apps—in my experience, the Rakefile is often
pretty empty and only sets some stuff up while most of the code lives
in various .rake files.

See: https://ruby.github.io/rake/doc/rakefile_rdoc.html#label-Multiple+Rake+Files

Closes #2921
2025-09-19 21:08:19 -04:00
f3rn0s
72a1303238 ignore/types: add typst
Closes #2914
2025-09-19 21:08:19 -04:00
Melvin Wang
5be67c1244 ignore/types: include msbuild solution filters
Closes #2871
2025-09-19 21:08:19 -04:00
Alex Povel
d869038cf6 ignore: improve multithreading heuristic
This copies the one found in ripgrep.

See also:
71d71d2d98/crates/core/flags/hiargs.rs (L172)

Closes #2854, Closes #2856
2025-09-19 21:08:19 -04:00
Thomas Otto
75970fd16b ignore: don't process command line arguments in reverse order
When searching in parallel with many more arguments than threads, the
first arguments are searched last -- unlike in the -j1 case.

This is unexpected for users who know about the parallel nature of rg
and think they can give the scheduler a hint by positioning larger
input files (L1, L2, ..) before smaller ones (█, ██). Instead, this can
result in sub-optimal thread usage and thus longer runtime (simplified
example with 2 threads):

 T1:  █ ██ █ █ █ █ ██ █ █ █ █ █ ██ ╠═════════════L1════════════╣
 T2:  █ █ ██ █ █ ██ █ █ █ ██ █ █ ╠═════L2════╣

                                       ┏━━━━┳━━━━┳━━━━┳━━━━┓
This is caused by assigning work to    ┃ T1 ┃ T2 ┃ T3 ┃ T4 ┃
 per-thread stacks in a round-robin    ┡━━━━╇━━━━╇━━━━╇━━━━┩
              manner, starting here  → │ L1 │ L2 │ L3 │ L4 │ ↵
                                       ├────┼────┼────┼────┤
                                       │ s5 │ s6 │ s7 │ s8 │ ↵
                                       ├────┼────┼────┼────┤
                                       ╷ .. ╷ .. ╷ .. ╷ .. ╷
                                       ├────┼────┼────┼────┤
                                       │ st │ su │ sv │ sw │ ↵
                                       ├────┼────┼────┼────┘
                                       │ sx │ sy │ sz │
                                       └────┴────┴────┘
   and then processing them bottom-up:   ↥    ↥    ↥    ↥

                                       ╷ .. ╷ .. ╷ .. ╷ .. ╷
This patch reverses the input order    ├────┼────┼────┼────┤
so the two reversals cancel each other │ s7 │ s6 │ s5 │ L4 │ ↵
out. Now at least the first N          ├────┼────┼────┼────┘
arguments, N=number-of-threads, are    │ L3 │ L2 │ L1 │
processed before any others (then      └────┴────┴────┘
work-stealing may happen):

 T1:  ╠═════════════L1════════════╣ █ ██ █ █ █ █ █ █ ██
 T2:  ╠═════L2════╣ █ █ ██ █ █ ██ █ █ █ ██ █ █ ██ █ █ █

(With some more shuffling T1 could always be assigned L1 etc., but
that would mostly be for optics).
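The effect can be reproduced with a simplified model. It assumes inputs are dealt round-robin onto one stack per thread and that each thread starts on the top (most recently pushed) item of its own stack. `first_round` is hypothetical and ignores work stealing entirely.

```rust
/// Returns the item each thread would start on under the model above.
fn first_round(inputs: &[u32], threads: usize, reverse_first: bool) -> Vec<u32> {
    let mut ordered: Vec<u32> = inputs.to_vec();
    if reverse_first {
        ordered.reverse();
    }
    let mut stacks: Vec<Vec<u32>> = vec![Vec::new(); threads];
    for (i, item) in ordered.into_iter().enumerate() {
        stacks[i % threads].push(item);
    }
    // Each thread pops (LIFO) from its own stack, so it starts at the top.
    stacks.iter().filter_map(|s| s.last().copied()).collect()
}
```

With eight inputs on four threads, the unreversed order starts on inputs 5 to 8, while the reversed order starts on 1 to 4. With seven inputs, the reversed order yields `[3, 2, 1, 4]`, matching the T1 to T4 columns of the second diagram.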

Closes #2849
2025-09-19 21:08:19 -04:00
Christoph Badura
380809f1e2 ignore/types: add Makefile.*
The *BSD build systems make use of "Makefile.inc" a lot. Make the
"make" type recognize this file by default. And more generally,
`Makefile.*` seems to be a convention, so just generalize it.

Closes #2846
2025-09-19 21:08:19 -04:00
Matt Kulukundis
94ea38da30 ignore: support .jj as well as .git
This makes it so the presence of `.jj` will cause ripgrep to treat it
as a VCS directory, just as if `.git` were present. This is useful for
ripgrep's default behavior when working with jj repositories that don't
have a `.git` but do have `.gitignore`. Namely, ripgrep requires the
presence of a VCS repository in order to respect `.gitignore`.
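The gist of the check is sketched below. `has_vcs_dir` is a hypothetical helper, not the crate's function. It simply asks whether an entry named `.git` or `.jj` exists directly inside a directory.

```rust
use std::path::Path;

/// Hypothetical helper: treat `dir` as a VCS root if it contains an entry
/// named `.git` or, after this change, `.jj`.
fn has_vcs_dir(dir: &Path) -> bool {
    [".git", ".jj"].iter().any(|name| dir.join(name).exists())
}
```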

We don't handle clone-specific exclude rules for jj repositories without
`.git` though. It seems it isn't 100% set yet where we can find
those[1].

Closes #2842

[1]: https://github.com/BurntSushi/ripgrep/pull/2842#discussion_r2020076722
2025-09-19 21:08:19 -04:00
Stephen Albert-Moore
483628469a ignore/gitignore: skip BOM at start of ignore file
This matches Git's behavior.
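Without stripping, the BOM (U+FEFF) would become part of the first pattern and keep it from matching. A minimal sketch of the idea follows. `strip_bom` is a hypothetical name, not the crate's code.

```rust
/// Strip a leading UTF-8 byte order mark before parsing ignore-file lines.
fn strip_bom(contents: &str) -> &str {
    contents.strip_prefix('\u{feff}').unwrap_or(contents)
}
```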

Fixes #2177, Closes #2782
2025-09-19 21:08:19 -04:00
Keith Smiley
7c004f224e ignore/types: detect WORKSPACE.bzlmod for bazel file type
This file came alongside MODULE.bazel and I should have added it here
previously.

Closes #2726
2025-09-19 21:08:19 -04:00
ChristopherYoung
14f4957b3d ignore: fix filtering searching subdir or .ignore in parent dir
The previous code deleted too many parts of the path when constructing
the absolute path, resulting in a shortened final path. This patch
creates the correct absolute path by only removing the necessary parts.

Fixes #829, Fixes #2731, Fixes #2747, Fixes #2778, Fixes #2836, Fixes #2933, Fixes #3144
Closes #2933
2025-09-19 21:08:19 -04:00
wang384670111
90a680ab45 impl: switch most atomic ops to Relaxed ordering
These all seem pretty straightforward. Compared with #2706, I dropped
the changes to the atomic orderings used in `ignore` because I haven't
had time to think through that carefully. But the ops in this PR seem
fine.
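A typical case where `Relaxed` is enough is a counter that publishes no other data. This illustrative sketch does not show the specific operations changed in the PR. Each increment only needs to be atomic, and the final read happens after `join`, which already establishes a happens-before edge with the workers.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Counts increments from several threads using only `Relaxed` ordering.
fn count_in_parallel(threads: usize, per_thread: usize) -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    counter.fetch_add(1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // `join` synchronizes with each worker, so this load sees every increment.
    counter.load(Ordering::Relaxed)
}
```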

Closes #2706
2025-09-19 21:08:19 -04:00
Pierre Rouleau
5fbc4fee64 ignore/types: fix Seed7 file extension
PR #3023
2025-04-07 10:53:32 -04:00
Pierre Rouleau
004370bd16 ignore/types: add support for Seed7 files
For more info on the Seed7 programming Language see:

- on Wikipedia: https://en.wikipedia.org/wiki/Seed7
- Seed7 home:   https://seed7.sourceforge.net/
- Seed7 repo:   https://github.com/ThomasMertes/seed7

PR #3022
2025-04-07 08:51:22 -04:00
Andrew Gallant
0a0893a765 ignore: add debug log message when opening gitignore file
I'm not sure why it took me this long to add this debug message, but
it's quite useful in determining where ignore rules are coming from.
2024-05-27 14:53:19 -04:00
Linda_pp
2acf25c689 ignore/types: add WGSL to the default file types
[WGSL][1] is a shading language for WebGPU. As defined in [Appendix
A][2], the file extension is `.wgsl`.

PR #2774 

[1]: https://www.w3.org/TR/WGSL/
[2]: https://www.w3.org/TR/WGSL/#text-wgsl-media-type
2024-04-01 23:05:15 -04:00
Vadim Kostin
80007698d3 ignore/types: add Vue
PR #2772
2024-04-01 07:49:29 -04:00
cgzones
3ad0e83471 ignore/walk: correct build_parallel() documentation
The returned closure should return `WalkState`, not `()`.
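In the crate, the closure passed to the parallel walker builds one visitor per thread, and each visitor returns a `WalkState` that steers the traversal. The stand-in below mirrors the variant names of `ignore::WalkState` (`Continue`, `Skip`, `Quit`). The sequential walker and `Node` tree are hypothetical and only show what each variant does.

```rust
/// Stand-in mirroring the variant names of `ignore::WalkState`.
#[derive(Clone, Copy, PartialEq, Debug)]
enum WalkState {
    Continue,
    Skip,
    Quit,
}

struct Node {
    name: &'static str,
    children: Vec<Node>,
}

/// Depth-first toy walker. `Skip` prunes a directory's children and `Quit`
/// stops everything. Returns `false` once a visitor asked to quit.
fn walk(node: &Node, visit: &mut dyn FnMut(&str) -> WalkState) -> bool {
    match visit(node.name) {
        WalkState::Quit => return false,
        WalkState::Skip => return true,
        WalkState::Continue => {}
    }
    for child in &node.children {
        if !walk(child, visit) {
            return false;
        }
    }
    true
}
```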

Closes #2767
2024-03-27 14:50:05 -04:00
Brent Williams
9da0995df4 ignore/types: add 'svelte' to the default file types
Ref: https://svelte.dev/

PR #2759
2024-03-19 13:36:08 -04:00
Andrew Gallant
59212d08d3 style: fix new lints
The Rust compiler seems to have gotten smarter at finding unused or
redundant imports.
2024-03-07 09:37:48 -05:00
fe9lix
b9c774937f ignore: fix reference cycle for compiled matchers
It looks like there is a reference cycle caused by the compiled
matchers (compiled HashMap holds ref to Ignore and Ignore holds ref
to HashMap). Using weak refs fixes issue #2690 in my test project.
Also confirmed via before and after when profiling the code, see the
attached screenshots in #2692.
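The shape of the fix can be sketched with toy types. `Node`, `new_node`, and `lookup` are hypothetical and not the crate's actual types. A node holds a strong handle to a shared cache, and the cache maps keys back to nodes. If the cache held `Arc<Node>`, node to cache to node would be a cycle and neither would ever be freed. Storing `Weak<Node>` breaks the cycle.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, Weak};

type Cache = Arc<Mutex<HashMap<String, Weak<Node>>>>;

struct Node {
    /// Strong reference from the node to the shared cache.
    cache: Cache,
}

fn new_node(cache: &Cache, key: &str) -> Arc<Node> {
    let node = Arc::new(Node { cache: Arc::clone(cache) });
    // The cache only keeps a weak reference back to the node.
    cache.lock().unwrap().insert(key.to_string(), Arc::downgrade(&node));
    node
}

/// Returns the cached node only if something else still keeps it alive.
fn lookup(cache: &Cache, key: &str) -> Option<Arc<Node>> {
    cache.lock().unwrap().get(key).and_then(|w| w.upgrade())
}
```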

Fixes #2690
2024-01-06 12:50:42 -05:00
Andrew Gallant
67dd809a80 ignore: add some 'allow(dead_code)' annotations
I don't usually like doing this and would prefer to just delete unused
code, but I don't have the context required to understand why this code
is unused. A refresh of this crate is on the (distant) horizon, so I'll
just leave these here for now to squash the warnings.
2024-01-06 12:25:06 -05:00
amesgen
56c7ad175a ignore/types: add Lean
Ref: https://lean-lang.org/

PR #2678
2023-12-07 11:46:00 -05:00
Patrick Williams
2a4dba3fbf ignore/types: add meson.options
Starting with meson 1.1, there is a preference for using meson.options
instead of meson_options.txt.  Add the new filename to the meson set.

PR #2666
2023-11-29 19:03:12 -05:00
Tavian Barnes
6d7550d58e ignore: Avoid contention on num_pending
Previously, every worker would increment the shared num_pending count on
every new work item, and decrement it after finishing them, leading to
lots of contention.  Now, we only track the number of workers actively
running, so there is no contention except when workers go to sleep or
wake up.

Closes #2642
2023-11-21 18:39:32 -05:00
Kento Okamoto
922bad2b92 ignore: improve 'excludesFile' parsing
This permits the value to be surrounded in double quotes. It's still not
perfect, but probably better than it was. Getting this to be more
correct will likely require writing (or using) a real parser, which I'm
not particularly inclined to do at present.
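A deliberately minimal, hypothetical parser for a single line like `excludesFile = "~/.gitignore_global"` shows the quote handling. It compares the key case-insensitively, since git config variable names are case-insensitive, and strips one pair of surrounding double quotes. As the commit notes, this is not a real gitconfig parser, and escapes, comments, and sections are not handled.

```rust
/// Hypothetical one-line parser for a `core.excludesFile` value.
fn parse_excludes_file(line: &str) -> Option<&str> {
    let (key, value) = line.split_once('=')?;
    if !key.trim().eq_ignore_ascii_case("excludesfile") {
        return None;
    }
    let value = value.trim();
    // Strip one pair of surrounding double quotes, if present.
    let unquoted = value
        .strip_prefix('"')
        .and_then(|v| v.strip_suffix('"'))
        .unwrap_or(value);
    Some(unquoted)
}
```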

Fixes #2392, Closes #2629
2023-11-20 23:51:53 -05:00
Tavian Barnes
53679e4c43 ignore: simplify the work-stealing strategy
There's no particular reason for this change. I happened to be looking
at the code again and realized that stealing from your left neighbour
or your right neighbour shouldn't make a difference (and indeed perf is
the same in my benchmarks).

Closes #2624
2023-11-20 23:51:53 -05:00
Andrew Gallant
6d17b3ed68 deps: drop thread_local, lazy_static and once_cell
This is largely made possible by the addition of std::sync::OnceLock to
the standard library, and the memory pool available in regex-automata.
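The `OnceLock` half of this change follows a common pattern: a lazily initialized global that needs no `lazy_static` or `once_cell`. The sketch below shows that pattern with a hypothetical table. It does not show the regex-automata memory pool.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Lazily built global table (contents are hypothetical).
fn extension_table() -> &'static HashMap<&'static str, &'static str> {
    static TABLE: OnceLock<HashMap<&'static str, &'static str>> = OnceLock::new();
    // The closure runs at most once; later calls return the same map.
    TABLE.get_or_init(|| {
        let mut m = HashMap::new();
        m.insert("rs", "rust");
        m.insert("gd", "gdscript");
        m
    })
}
```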
2023-10-09 20:29:52 -04:00
Andrew Gallant
f16ea0812d ignore: polish
Like previous commits, we do a bit of polishing and bring the style up
to my current practice.
2023-10-09 20:29:52 -04:00
Andrew Gallant
3ad7a0d95e crates: remove hard-coded links
And use rustdoc's native intra-crate links. So much nicer.
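Intra-doc links let a doc comment name an item by its Rust path instead of a hard-coded URL, and rustdoc's warn-by-default `rustdoc::broken_intra_doc_links` lint flags targets that do not resolve. The `Counter` type is hypothetical and exists only to carry the doc comments.

```rust
/// A hypothetical counter used only to demonstrate intra-doc links.
///
/// Rather than a hard-coded URL such as
/// `https://doc.rust-lang.org/std/path/struct.Path.html`, rustdoc resolves
/// [`std::path::Path`] and [`Counter::increment`] by path.
pub struct Counter {
    count: usize,
}

impl Counter {
    /// Creates a counter starting at zero. See also [`Self::increment`].
    pub fn new() -> Self {
        Counter { count: 0 }
    }

    /// Adds one and returns the new value.
    pub fn increment(&mut self) -> usize {
        self.count += 1;
        self.count
    }
}
```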
2023-10-09 20:29:52 -04:00
Linda_pp
abfa65c2c1 ignore/types: add *.sarif for SARIF format files
[SARIF] is a format for reporting static analysis results. It is [used
by GitHub CodeQL][GH] for example.

Here are some samples from Microsoft's VSCode extension:

https://github.com/microsoft/sarif-vscode-extension/tree/main/samples

The SARIF format is built on top of JSON.

[SARIF]: https://docs.oasis-open.org/sarif/sarif/v2.1.0/csprd01/sarif-v2.1.0-csprd01.html
[GH]: https://docs.github.com/en/code-security/code-scanning/integrating-with-code-scanning/sarif-support-for-code-scanning

PR #2620
2023-10-05 13:23:29 -04:00
Tavian Barnes
d938e955af ignore: use work-stealing stack instead of Arc<Mutex<Vec<_>>>
This represents yet another iteration on how `ignore` enqueues and
distributes work in parallel. The original implementation used a
multi-producer/multi-consumer thread safe queue from crossbeam. At some
point, I migrated to a simple `Arc<Mutex<Vec<_>>>` and treated it as a
stack so that we did depth first traversal. This helped with memory
usage in very wide directories.

But it turns out that a naive stack-behind-a-mutex can be quite a bit
slower than something that's a little smarter, such as a work-stealing
stack used in this commit. My hypothesis for why this helps is that
without the stealing component, work distribution can get stuck in
sub-optimal configurations that depend on which directory entries get
assigned to a particular worker. It's likely that this can result in
some workers getting "more" work than others, just by chance, while the
others remain idle. But the work-stealing approach heads that off.
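The sketch below is a simplified, std-only model of a work-stealing stack. The commit itself uses parts of crossbeam, and this version uses one mutex per worker purely for illustration. Each worker pushes and pops its own work at the back (LIFO, so traversal stays depth-first). A worker with no local work steals the oldest pending item from the front of another worker's deque. `Workers` is a hypothetical name.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical per-worker deques (not the `ignore` crate's types).
struct Workers<T> {
    deques: Vec<Mutex<VecDeque<T>>>,
}

impl<T> Workers<T> {
    fn new(n: usize) -> Self {
        Workers { deques: (0..n).map(|_| Mutex::new(VecDeque::new())).collect() }
    }

    /// Owner pushes onto the back of its own deque.
    fn push(&self, me: usize, item: T) {
        self.deques[me].lock().unwrap().push_back(item);
    }

    /// Take local work first (LIFO); otherwise steal from the front of
    /// another worker's deque. Only one lock is held at a time.
    fn next(&self, me: usize) -> Option<T> {
        if let Some(item) = self.deques[me].lock().unwrap().pop_back() {
            return Some(item);
        }
        let n = self.deques.len();
        (1..n).find_map(|offset| self.deques[(me + offset) % n].lock().unwrap().pop_front())
    }
}
```

Stealing from the opposite end keeps the owner and a thief working on different parts of the deque. This is the usual reason work-stealing deques split the ends this way.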

This does re-introduce a dependency on parts of crossbeam which is kind
of a bummer, but it's carrying its weight for now.

Closes #1823, Closes #2591
Ref https://github.com/sharkdp/fd/issues/28
2023-09-20 11:52:42 -04:00