1
0
mirror of https://github.com/BurntSushi/ripgrep.git synced 2025-06-14 22:15:13 +02:00
Commit Graph

327 Commits

Author SHA1 Message Date
053a1669bb globset-0.4.12 2023-07-26 19:51:38 -04:00
31d3f16254 api: impl Deserialize for GlobSet
PR #2569
2023-07-26 19:51:22 -04:00
304a60e8e9 grep-cli-0.1.9 2023-07-18 13:25:23 -04:00
1d35859861 globset-0.4.11 2023-07-12 12:58:43 -04:00
601e122e9f ignore/types: add Windows Command Prompt files
This PR adds `*.bat` and `*.cmd` file types.

In doing so, it makes a distinction between batch files (old standard
from the MS-DOS era) and command scripts (new flavor - can operate on
batch files, although `*.cmd` is preferred for various reasons, the
main one being batch files will set `ERRORLEVEL` following inconsistent
MS-DOS style rules[1]).

PR #2556

[1]: https://groups.google.com/g/microsoft.public.win2000.cmdprompt.admin/c/XHeUq8oe2wk/m/LIEViGNmkK0J#i106
2023-07-10 15:58:17 -04:00
6abb962f0d cli: fix non-path sorting behavior
Previously, sorting worked by sorting the parents and then sorting the
children within each parent. This was done during traversal, but it only
works when sorting parents preserves the overall order. This generally
only works for '--sort path' in ascending order.

This commit fixes the rest of the sorting behavior by collecting all of
the paths to search and then sorting them before searching. We only
collect all of the paths when sorting was requested.

Fixes #2243, Closes #2361
2023-07-09 10:14:03 -04:00
6d95c130d5 cli: add --stop-on-nonmatch flag
This causes ripgrep to stop searching an individual file after it has
found a non-matching line. But this only occurs after it has found a
matching line.

Fixes #1790, Closes #1930
2023-07-08 18:52:42 -04:00
4782ebd5e0 core: lock stdout before printing an error message to stderr
Adds a new eprintln_locked macro which locks STDOUT before logging
to STDERR. This patch also replaces instances of eprintln with
eprintln_locked to avoid interleaving lines.

Fixes #1941, Closes #1968
2023-07-08 18:52:42 -04:00
4993d29a16 globset: add 'escape' routine
Fixes #2060, Closes #2061
2023-07-08 18:52:42 -04:00
23adbd6795 cli: force binary existance check
Previously, we were only doing a binary existence check on Windows. And
in fact, the main point there wasn't binary existence, but ensuring we
didn't accidentally resolve a binary name relative to the CWD, which
could result in executing a program one didn't mean to run.

However, it is useful to be able to check whether a binary exists on any
platform when associating a glob with a binary. If the binary doesn't
exist, then the association can fail eagerly and let some other glob
apply.

Closes #1946
2023-07-08 18:52:42 -04:00
cb7501ff11 doc: clarify the comment on Worker.work_done
We call `work_done` only once the work has been actually performed
(otherwise `num_pending` could go to 0 before the actual work is done).

Closes #2039
2023-07-08 18:52:42 -04:00
3b66f37a31 doc: improve -r/--replace flag syntax docs
Fixes #2108, Closes #2123
2023-07-08 18:52:42 -04:00
f30a30867e ignore/types: name aliases for file types
We also make py/python, md/markdown and ts/typescript aliases of one
another.

Note that this only introduces aliases at the point where default types
are defined. This just makes them a bit easier to read/write, and also
makes it easier to expose more names that describe the same thing.

Fixes #1857, Closes #1895
2023-07-08 18:52:42 -04:00
7313dca472 ignore/types: add 'typescript' alias for 'ts'
Closes #2009
2023-07-08 18:52:42 -04:00
99bf2b01dc ignore/types: add Ada filetypes, including gprbuild and alire
*.adb and *.ads are the usual extensions for Ada source code,
and *.gpr indicates a GPRbuild project file used for Ada, and
these days often being combined with alire for package dependency
resolution. Alire stores a bunch of files named alire.toml in
different directories in your (gitignored) cache/dependencies/...

Closes #2013
2023-07-08 18:52:42 -04:00
ee1360cc07 ignore/types: add raku extensions to ignore types
Closes #2117
2023-07-08 18:52:42 -04:00
da7c81fb96 ignore/types: add MDX format to Markdown types
Ref https://mdxjs.com/

Closes #2142
2023-07-08 18:52:42 -04:00
a4e3d56de1 ignore/types: add DITA (Darwin Information Typing Architecture)
Closes #2148
2023-07-08 18:52:42 -04:00
7c83b90f95 doc: fix typo
Closes #2153
2023-07-08 18:52:42 -04:00
97b5b7769c doc: fix some typos
Closes #2195
2023-07-08 18:52:42 -04:00
f3241fd657 cli: '--no-ignore-dot' should also '.rgignore'
Fixes #2198, Closes #2202
2023-07-08 18:52:42 -04:00
cfe357188d ignore/types: fix formatting 2023-07-08 18:52:42 -04:00
792451e331 ignore/types: added V type
V (http://vlang.io) uses '.v' files.

Closes #2302
2023-07-08 18:52:42 -04:00
f34fd5c4b6 globset: introduce option to keep empty alternates
Add a method GlobBuilder::empty_alternates and supporting mechanisms.

Ref #1368
Closes #2369
2023-07-08 18:52:42 -04:00
d51c6c005a globset: permit deserializing Glob from String
Closes #2386, Closes #2388
2023-07-08 18:52:42 -04:00
0f6181d309 ignore/types: add USD to the default file types
Closes #2432
2023-07-08 18:52:42 -04:00
e902e2fef4 ignore/types: add Gentoo eclass type
Eclasses are "ebuild libraries" and generally if you're filtering
for/filtering out an ebuild/eclass, you don't want the other either.

Followup to 4dfea016b9

Closes #2437
2023-07-08 18:52:42 -04:00
07cbfee225 ignore/types: improve Elixir globs
Closes #2450
2023-07-08 18:52:42 -04:00
d675844510 core: don't let context flags override eachother
This matches the behavior of GNU grep which does not ignore
before-context and after-context completely if the context flag is also
provided.

Note that this change wasn't done just to match GNU grep. In this case,
GNU grep has the more sensible behavior.

Fixes #2288, Closes #2451
2023-07-08 18:52:42 -04:00
43bbcca06f doc: note '-n' and '-N' override each other
Closes #2460
2023-07-08 18:52:42 -04:00
ad9bfdd981 ignore/gitignore: expose gitconfig_excludes_path
I have reservations about this, but it looks useful and doesn't seem
terribly onerous to support. The `ignore` crate will really always need
to have some kind of logic supporting this in some form I think.

Closes #2482
2023-07-08 18:52:42 -04:00
0c1cbd99f3 ignore: tweak regex crate features
This removes most of the Unicode features as they aren't currently
used. We can always add them back later if necessary.

We can avoid the unicode-perl feature by changing `\s` to `[[:space:]]`,
which uses the ASCII-only definition of `\s`. Since we don't expect
non-ASCII whitespace in git config files, this seems okay.

Closes #2502
2023-07-08 18:52:42 -04:00
96cfc0ed13 ignore/types: add 'graphql' type
GraphQL file extensions: .graphql and .graphqls (schema)

We could also add `.gql`, but perhaps it's less correct to do so. We'll
start conservatively here, and we can always add `.gql` later.

Closes #2439, Closes #2508
2023-07-08 18:52:42 -04:00
da8ecddce9 cli: make resolve_binary take COM executables into account
When `resolve_binary()` attempts to resolve a path to a program on
Windows while searching for a program in `PATH` without an extension,
`ripgrep` will assume the extension of the file to be `.exe` as it's
the *de facto* standard, which will work most (99.99%) of the time...

...unless the binary is a COM executable (we're on Windows, duh).

Closes #2523
2023-07-08 18:52:42 -04:00
545a7dc759 ignore/types: add cml to the default types list
It's used in Fuchsia to mean "component manifest language."[1]

[1]: https://fuchsia.dev/reference/cml?hl=en

Closes #2529
2023-07-08 18:52:42 -04:00
f4d07b9cbd grep-cli-0.1.8 2023-07-05 17:09:09 -04:00
3ac4541e9f regex: remove old inner literal extractor
(It had already been removed from the crate.)
2023-07-05 14:04:29 -04:00
a68db3ac02 deps: drop temporary patch and move to bstr 1.6
Now that regex 1.9 is out, we can depend on it from crates.io.
2023-07-05 14:04:29 -04:00
ca740d9ace regex: add new inner literal extractor
This is mostly a copy of the prefix literal extractor in regex-syntax,
but with a tweaked notion of Seq that keeps track of whether it's a
prefix of an expression or not. If it isn't, then we can't cross it as a
suffix to another Seq.

This new extractor should be a lot more robust than the old one. We
actually will keep going through the regex to try and find the "best"
literals to search for (according to some heuristic).
2023-07-05 14:04:29 -04:00
e80c102dee regex: tweak formatting of regex-automata version spec
This makes it easier to enable the `logging` feature for regex-automata.

I wish I could just enable it unconditionally, but it winds up producing
a lot of output because ripgrep uses regexes for things other than the
primary search (like every glob). Sigh.
2023-07-05 14:04:29 -04:00
8ac66a9e04 regex: refactor matcher construction
This does a little bit of refactoring so that we can pass both a
ConfiguredHIR and a Regex to the inner literal extraction routine.

One downside of this approach is that a regex object hangs on to a
ConfiguredHIR. But the extra memory usage is probably negligible. A
benefit though is that converting the HIR to its concrete syntax is now
lazy and only happens when logging is enabled.
2023-07-05 14:04:29 -04:00
04dde9a4eb regex: tweak DFA settings
This increases the limits a bit for when the regex engine will build and
use a fully compiled DFA. They can faster in some circumstances. For
example, '(?-u)^\w{30,}$' gets a nice speed boost from state
acceleration.

We are also able to remove `regex` proper as a dependency. Wow.
2023-07-05 14:04:29 -04:00
81341702af regex: push more pattern handling to matcher construction
Previously, ripgrep core was responsible for escaping regex patterns and
implementing the --line-regexp flag. This commit moves that
responsibility down into the matchers such that ripgrep just needs to
hand the patterns it gets off to the matcher builder. The builder will
then take care of escaping and all that.

This was done to make pattern construction completely owned by the
matcher builders. With the arrival regex-automata, this means we can
move to the HIR very quickly and then never move back to the concrete
syntax. We can then build our regex directly from the HIR. This overall
can save quite a bit of time, especially when searching for large
dictionaries.

We still aren't quite as fast as GNU grep when searching something on
the scale of /usr/share/dict/words, but we are basically within spitting
distance. Prior to this, we were about an order of magnitude slower.

This architecture in particular lets us write a pretty simple fast path
that avoids AST parsing and HIR translation entirely: the case where one
is just searching for a literal. In that case, we can hand construct the
HIR directly.
2023-07-05 14:04:29 -04:00
d34c5c88a7 globset: fix build error in tests
I guess we haven't been testing with the Serde feature enabled? Weird.
2023-07-05 14:04:29 -04:00
4b8aa91ae5 deps: update to pcre2 0.2.4
0.2.4 updates to PCRE2 10.42 and has a few other nice changes. For
example, when `utf` is enabled, the crate will always set the
PCRE2_MATCH_INVALID_UTF option. That means we no longer need to do
transcoding or UTF-8 validity checks.

Because of this, we actually get to remove one of the two uses of
`unsafe` in ripgrep's `main` program.

(This also updates a couple other dependencies for convenience.)
2023-07-05 14:04:29 -04:00
a775b493fd regex: small cleanups
Just some small polishing. We also get rid of thread_local in favor of
using regex-automata, mostly just in the name of reducing dependencies.
(We should eventually be able to drop thread_local completely.)
2023-07-05 14:04:29 -04:00
a6dbff502f regex: s/locations/captures
Now that we use regex-automata, we no longer use any type with
"locations" in it. Instead, that's mostly legacy from the top-level
regex crate.
2023-07-05 14:04:29 -04:00
51480d57a6 regex: simplify AST analysis a bit
The verbatim literal stuff hasn't been used for a while and I don't
foresee it being used. If it's really needed, it would probably better
to just implement it by looking at the pattern string itself, which
avoids parsing it into an AST altogether.
2023-07-05 14:04:29 -04:00
d9bd261be8 regex: some small cleanup in 'strip.rs'
We also utilize bstr's methods to get rid of some helpers we had written
by hand.
2023-07-05 14:04:29 -04:00
9d62eb997a BREAKING: regex: finally remove CRLF hack
Now that Rust's regex crate finally supports a CRLF mode, we can remove
this giant hack in ripgrep to enable it. (And assuredly did not work in
all cases.)

The way this works in the regex engine is actually subtly different than
what ripgrep previously did. Namely, --crlf would previously treat
either \r\n or \n as a line terminator. But now it treats \r\n, \n and
\r as line terminators. In effect, it is implemented by treating \r and
\n as line terminators, but ^ and $ will never match at a position
between a \r and a \n.

So basically this means that $ will end up matching in more cases than
it might be intended too, but I don't expect this to be a big problem in
practice.

Note that passing --crlf to ripgrep and enabling CRLF mode in the regex
via the `R` inline flag (e.g., `(?R:$)`) are subtly different. The `R`
flag just controls the regex engine, but --crlf instructs all of ripgrep
to use \r\n as a line terminator. There are likely some inconsistencies
or corner cases that are wrong as a result of this cognitive dissonance,
but we choose to leave well enough alone for now.

Fixing this for real will probably require re-thinking how line
terminators are handled in ripgrep. For example, one "problem" with how
they're handled now is that ripgrep will re-insert its own line
terminators when printing output instead of copying the input. This is
maybe not so great and perhaps unexpected. (ripgrep probably can't get
away with not inserting any line terminators. Users probably expect
files that don't end with a line terminator whose last line matches to
have a line terminator inserted.)
2023-07-05 14:04:29 -04:00