mirror of https://github.com/BurntSushi/ripgrep.git synced 2024-12-12 19:18:24 +02:00
399 Commits

Author SHA1 Message Date
Andrew Gallant
9d738ad0c0
regex: fix inner literal extraction that resulted in false negatives
In some rare cases, it was possible for ripgrep's inner literal detector
to extract a set of literals that could produce a false negative. #2884
gives an example: `(?i:e.x|ex)`. In this case, the set extracted can be
discovered by running `rg '(?i:e.x|ex)' --trace`:

    Seq[E("EX"), E("Ex"), E("eX"), E("ex")]

This extraction leads to building a multi-substring matcher for `EX`,
`Ex`, `eX` and `ex`. Searching the haystack `e-x` produces no match,
and thus, ripgrep shows no matches. But the regex `(?i:e.x|ex)` matches
`e-x`.

The issue at play here was that when two extracted literal sequences
were unioned, we were incorrectly unioning their "prefix" attribute.
And this in turn leads to those literal sequences being combined
incorrectly via cross product. This case in particular triggers it
because two different optimizations combine to produce an incorrect
result. Firstly, the regex has a common prefix extracted and is
rewritten as `(?i:e(?:.x|x))`. Secondly, the `x` in the first branch of
the alternation has its `prefix` attribute set to `false` (correctly),
which means it can't be cross producted with another concatenation. But
in this case, it is unioned with the `x` from the second branch, and
this results in the union result having `prefix` set to `true`. This
in turn propagates up and lets it get cross producted with the `e` prefix,
producing an incorrect literal sequence.

We fix this by changing the implementation of `union` to return
`prefix` set to `true` only when *both* literal sequences being unioned
have `prefix` set to `true`.

Doing this exposed a second bug that was present, but was purely
cosmetic: the extracted literals in this case, after the fix, are
`X` and `x`. They were considered "exact" (i.e., lead to a match),
but of course they are not. Observing an `X` or an `x` does not mean
there is a match. This was fixed by making `choose` always return
an inexact literal sequence. This is perhaps too conservative in
aggregate in some cases, but always correct. The idea here is that if
one is choosing between two concatenations, then it is likely the case
that the sequence returned should be considered inexact. The issue
is that this can lead to avoiding cross products in some cases that
would otherwise be correct. This is bad because it means extracting
shorter literals in some cases. (In general, the longer the literal the
better.) But we prioritize correctness for now and fix it. You can see
a few tests where this shortens some extracted literals.
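
The `union` change described above can be illustrated with a minimal
sketch. `LitSeq` and its fields are hypothetical stand-ins invented for
this example, not the actual types in ripgrep's extractor; only the
rule "the union is a prefix only when *both* inputs are" comes from the
message above.

```rust
/// Hypothetical, simplified stand-in for an extracted literal sequence.
#[derive(Clone, Debug, PartialEq)]
struct LitSeq {
    /// The literals in this sequence.
    lits: Vec<String>,
    /// Whether the literals may be cross producted with a preceding
    /// concatenation.
    prefix: bool,
}

impl LitSeq {
    /// Union two sequences, keeping each distinct literal once.
    fn union(mut self, other: LitSeq) -> LitSeq {
        for lit in other.lits {
            if !self.lits.contains(&lit) {
                self.lits.push(lit);
            }
        }
        // The fix: the result is a prefix only if BOTH inputs were.
        self.prefix = self.prefix && other.prefix;
        self
    }
}
```

Under this rule, unioning the `x` from the first branch (`prefix` is
`false`) with the `x` from the second branch (`prefix` is `true`) gives
a result with `prefix` set to `false`, so it is not cross producted
with the `e` prefix.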

Fixes #2884
2024-09-08 22:00:46 -04:00
Cort Spellman
af8c386d5e
doc: fix typo in --heading flag help
PR #2864
2024-08-02 17:32:42 -04:00
Tobias Decking
c9ebcbd8ab
globset: optimize character escaping
Rewrites the char_to_escaped_literal and bytes_to_escaped_literal
functions in a way that minimizes heap allocations. After this, the
resulting string is the only allocation remaining.

I believe when this code was originally written, the routines available
to avoid heap allocations didn't exist.

I'm skeptical that this matters in the grand scheme of things, but I
think this is still worth doing for "good sense" reasons.
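
A rough sketch of the allocation-minimizing approach described above.
This is not the code from the PR: the function names and the `is_meta`
helper (with its list of meta characters) are assumptions made for this
example. The point is only that a stack buffer and `write!` into the
output string avoid intermediate heap-allocated values.

```rust
use std::fmt::Write;

/// Hypothetical meta character check; the real set of escaped
/// characters may differ.
fn is_meta(c: char) -> bool {
    matches!(
        c,
        '\\' | '.' | '+' | '*' | '?' | '(' | ')' | '|' | '[' | ']'
            | '{' | '}' | '^' | '$' | '#' | '&' | '-' | '~'
    )
}

/// Escape bytes for use in a regex. The returned `String` is the only
/// heap-allocated value created here.
fn escape_bytes(bs: &[u8]) -> String {
    let mut out = String::with_capacity(bs.len());
    for &b in bs {
        if b.is_ascii() {
            let c = char::from(b);
            if is_meta(c) {
                out.push('\\');
            }
            out.push(c);
        } else {
            // Non-ASCII bytes are written as hex escapes directly into
            // `out`, with no temporary `String`.
            write!(out, "\\x{:02x}", b).unwrap();
        }
    }
    out
}

/// Escape a single character by encoding it into a stack buffer first.
fn escape_char(c: char) -> String {
    let mut buf = [0u8; 4];
    escape_bytes(c.encode_utf8(&mut buf).as_bytes())
}
```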

PR #2833
2024-06-05 09:56:00 -04:00
Andrew Gallant
0a0893a765
ignore: add debug log message when opening gitignore file
I'm not sure why it took me this long to add this debug message, but
it's quite useful in determining where ignore rules are coming from.
2024-05-27 14:53:19 -04:00
Andrew Gallant
f1d23c06e3
cli: add more logging for stdin heuristic detection
Stdin heuristic detection is complicated and opaque enough that it's
worth having easy access to the complete story that leads ripgrep to
decide whether to search stdin or not.

Ref #2806
2024-05-13 09:43:04 -04:00
tgolang
22b677900f
doc: fix some typos
PR #2754
2024-05-13 07:44:51 -04:00
NicoElbers
bb6f0f5519
doc: fix typo in --vimgrep help message
PR #2802
2024-05-11 07:02:24 -04:00
Nicolas Holzschuch
bb8601b2ba
printer: make compilation on non-unix, non-windows platforms work
Some of the new hyperlink work caused ripgrep to stop compiling
on non-{Unix,Windows} platforms. The most popular of which is WASI.

This commit makes non-{Unix,Windows} compile again. And we add a
very basic WASI test in CI to catch regressions.

More work is needed to make tests on non-{Unix,Windows} platforms
work. And of course, this commit specifically takes the path of disabling
hyperlink support for non-{Unix,Windows} platforms.
2024-04-23 13:12:19 -04:00
redistay
d922b7ac11
doc: fix typo
PR #2776
2024-04-02 09:10:25 -04:00
Linda_pp
2acf25c689
ignore/types: add WGSL to the default file types
[WGSL][1] is a shading language for WebGPU. As defined in [Appendix
A][2], the file extension is `.wgsl`.

PR #2774 

[1]: https://www.w3.org/TR/WGSL/
[2]: https://www.w3.org/TR/WGSL/#text-wgsl-media-type
2024-04-01 23:05:15 -04:00
Vadim Kostin
80007698d3
ignore/types: add Vue
PR #2772
2024-04-01 07:49:29 -04:00
cgzones
3ad0e83471
ignore/walk: correct build_parallel() documentation
The returned closure should return `WalkState`, not `()`.

Closes #2767
2024-03-27 14:50:05 -04:00
Brent Williams
9da0995df4
ignore/types: add 'svelte' to the default file types
Ref: https://svelte.dev/

PR #2759
2024-03-19 13:36:08 -04:00
Andrew Gallant
e9abbc1a02
cargo: nuke 'simd-accel' from orbit
This feature causes nothing but problems and is frequently broken. The
only optimizations it was enabling were SIMD optimizations for
transcoding. In particular, for UTF-16 transcoding. This is performed by
the [`encoding_rs`](https://github.com/hsivonen/encoding_rs) crate,
which specifically uses unstable portable SIMD APIs instead of the
stable non-portable SIMD APIs.

SIMD optimizations that apply to search have long been making use of
stable APIs, and are automatically enabled when your target supports
them. This is, IMO, the correct user experience and one that
`encoding_rs` refuses to support. I'm done dealing with it, so
transcoding will only use scalar code until the SIMD optimizations in
`encoding_rs` work on stable. (This doesn't mean that `encoding_rs` has
to change. This could also be fixed by stabilizing `std::simd`.)

Fixes #2748
2024-03-07 09:47:43 -05:00
Andrew Gallant
59212d08d3
style: fix new lints
The Rust compiler seems to have gotten smarter at finding unused or
redundant imports.
2024-03-07 09:37:48 -05:00
SuperSpecialSweet
6ebebb2aaa
doc: fix typo in comments
PR #2741
2024-02-22 06:57:58 -05:00
Andrew Gallant
e92e2ef813
cli: remove stray dbg!
Whoops, forgot to review my commits before pushing.
2024-02-15 12:02:15 -05:00
Andrew Gallant
4a30819302
cli: tweak how "is one file" predicate works
In effect, we switch from `path.is_file()` to `!path.is_dir()`. In cases
where process substitution is used, for example, the path can actually
have type "fifo" instead of "file." Even if it's a fifo, we want to
treat it as-if it were a file. The real key here is that we basically
always want to consider a lone argument as a file so long as we know it
isn't a directory. Because a directory is the only thing that will
cause us to (potentially) search more than one thing.
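
As a minimal sketch of the predicate change described above (the
function name `is_one_file` is hypothetical, and the real predicate
may check more than this):

```rust
use std::path::Path;

/// Treat a lone path argument as a single file unless it is known to
/// be a directory. A fifo produced by process substitution is not a
/// regular file, so `path.is_file()` would reject it, while
/// `!path.is_dir()` accepts it.
fn is_one_file(path: &Path) -> bool {
    !path.is_dir()
}
```

Note that in this sketch a path that does not exist also counts as
"one file", since `is_dir()` returns `false` for it.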

Fixes #2736
2024-02-15 11:59:59 -05:00
Wilfred Hughes
9b42af96f0
doc: fix typo in --hidden docs
PR #2718
2024-01-22 13:31:11 -05:00
Andrew Gallant
2c3897585d
ignore-0.4.22 2024-01-06 14:27:44 -05:00
Andrew Gallant
c8e4a84519
cli: prefix all non-fatal error messages with 'rg: '
Fixes #2694
2024-01-06 14:15:52 -05:00
fe9lix
b9c774937f
ignore: fix reference cycle for compiled matchers
It looks like there is a reference cycle caused by the compiled
matchers (compiled HashMap holds ref to Ignore and Ignore holds ref
to HashMap). Using weak refs fixes issue #2690 in my test project.
Also confirmed via before and after when profiling the code, see the
attached screenshots in #2692.
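
The general technique can be sketched as follows. The `Matcher` and
`Cache` names and the two helper functions are hypothetical and only
illustrate how a weak reference breaks the cycle; they are not the
types used in the `ignore` crate.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock, Weak};

/// Shared map of compiled matchers. It stores `Weak` handles, so the
/// map alone never keeps a matcher alive.
type Cache = Arc<RwLock<HashMap<String, Weak<Matcher>>>>;

/// Hypothetical matcher that holds a strong reference to the cache.
struct Matcher {
    dir: String,
    cache: Cache,
}

/// Create a matcher and register a weak handle to it in the cache.
fn add_matcher(cache: &Cache, dir: &str) -> Arc<Matcher> {
    let matcher = Arc::new(Matcher {
        dir: dir.to_string(),
        cache: Arc::clone(cache),
    });
    cache
        .write()
        .unwrap()
        .insert(dir.to_string(), Arc::downgrade(&matcher));
    matcher
}

/// Look up a matcher, returning `None` if it was already dropped.
fn lookup(cache: &Cache, dir: &str) -> Option<Arc<Matcher>> {
    let map = cache.read().unwrap();
    let found = map.get(dir).and_then(Weak::upgrade);
    found
}
```

If the map held `Arc<Matcher>` instead, each matcher and the map would
keep each other alive and neither would ever be freed.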

Fixes #2690
2024-01-06 12:50:42 -05:00
Andrew Gallant
67dd809a80
ignore: add some 'allow(dead_code)' annotations
I don't usually like doing this and would prefer to just delete unused
code, but I don't have the context required to understand why this code
is unused. A refresh of this crate is on the (distant) horizon, so I'll
just leave these here for now to squash the warnings.
2024-01-06 12:25:06 -05:00
Jan Verbeek
e0a85678e1
complete/fish: improve shell completions for fish
- Stop using `-n __fish_use_subcommand`. This had the effect of
ignoring options if a positional argument has already been given, but
that's not how ripgrep works.

- Only suggest negation options if the option they're negating is
passed (e.g., only complete `--no-pcre2` if `--pcre2` is present). The
zsh completions already do this.

- Take into account whether an option takes an argument. If an option
is not a switch then it won't suggest further options until the
argument is given, e.g. `-C<tab>` won't suggest options but `-i<tab>`
will.

- Suggest correct arguments for options. We already completed a fixed
set of choices where available, but now we go further:

  - Filenames are only suggested for options that take filenames.

  - `--pre` and `--hostname-bin` suggest binaries from `$PATH`.

  - `-t`/`--type`/&c use `--type-list` for suggestions, like in zsh,
  with a preview of the glob patterns.

  - `--encoding` uses a hardcoded list extracted from the zsh
  completions. This has been refactored into a separate file, and the
  range globs (`{1..5}`) replaced by comma globs (`{1,2,3,4,5}`) since
  those work in both shells. I verified that this produces the same
  list as before in zsh, and the same list in fish (albeit in a
  different order).

PR #2684
2024-01-06 10:39:35 -05:00
amesgen
56c7ad175a
ignore/types: add Lean
Ref: https://lean-lang.org/

PR #2678
2023-12-07 11:46:00 -05:00
Patrick Williams
2a4dba3fbf
ignore/types: add meson.options
Starting with meson 1.1, there is a preference for using meson.options
instead of meson_options.txt.  Add the new filename to the meson set.

PR #2666
2023-11-29 19:03:12 -05:00
Andrew Gallant
daa157b5f9
core: actually implement --sortr=path
This is an embarrassing oversight. A `todo!()` actually made its way
into a release! Oof.

This was working in ripgrep 13, but I had redone some aspects of sorting
and this just got left undone.

Fixes #2664
2023-11-28 16:17:14 -05:00
Andrew Gallant
0096c74c11
grep-0.3.1 2023-11-27 21:36:54 -05:00
Andrew Gallant
8c48355b03
deps: bump grep-printer to 0.2.1 2023-11-27 21:36:44 -05:00
Andrew Gallant
f9b86de963
grep-printer-0.2.1 2023-11-27 21:36:02 -05:00
Andrew Gallant
d23b74975a
deps: bump grep-searcher to 0.1.13 2023-11-27 21:35:53 -05:00
Andrew Gallant
a5cbdb3dfe
grep-searcher-0.1.13 2023-11-27 21:34:58 -05:00
Andrew Gallant
805fa32d18
searcher: work around NUL line terminator bug
As the FIXME comment says, ripgrep is not yet using the new line
terminator option in regex-automata exposed for exactly this purpose.
Because of that, line anchors like `(?m:^)` and `(?m:$)` will only match
`\n` as a line terminator. This means that when --null-data is used in
combination with --line-regexp, the anchors inserted by --line-regexp
will not match correctly. This is only a big deal in the "fast" path,
which requires the regex engine to deal with line terminators itself
correctly. The slow path strips line terminators regardless of what they
are, and so the line anchors can match (begin/end of haystack).

Fixes #2658
2023-11-27 21:17:12 -05:00
Jan Verbeek
8575d26179
complete/fish: Fix syntax for negated options
And also, negated options don't take arguments.

Specifically, the fish completion generator currently forgets to add
`-l` to negation options, leading to a list of these errors:

    complete: too many arguments

    ~/.config/fish/completions/rg.fish (line 146):
    complete -c rg -n '__fish_use_subcommand'  no-sort-files -d '(DEPRECATED) Sort results by file path.'
    ^
    from sourcing file ~/.config/fish/completions/rg.fish

    (Type 'help complete' for related documentation)

To reproduce, run `fish -c 'rg --generate=complete-fish | source'`.

It also potentially suggests a list of choices for negation options,
even though those never take arguments. That case doesn't occur with
any of the current options but it's an easy fix.

Fixes #2659, Closes #2655
2023-11-27 21:17:12 -05:00
Jon Jensen
2e81a7adfe
doc: fix typo that was preventing interpolation
Closes #2662
2023-11-27 21:17:12 -05:00
Andrew Gallant
625743d7c8
grep-0.3.0 2023-11-26 15:24:09 -05:00
Andrew Gallant
3d0171040a
grep-printer-0.2.0 2023-11-26 15:21:40 -05:00
Andrew Gallant
179487aaed
grep-0.2.13 2023-11-26 14:18:17 -05:00
Andrew Gallant
b407d62b63
deps: bump grep-searcher to 0.1.12 2023-11-26 14:18:03 -05:00
Andrew Gallant
9bd1e737bc
grep-searcher-0.1.12 2023-11-26 14:17:26 -05:00
Andrew Gallant
c12231c621
deps: bump grep-pcre2 to 0.1.7 2023-11-26 14:17:11 -05:00
Andrew Gallant
b0df573834
grep-pcre2-0.1.7 2023-11-26 14:16:46 -05:00
Andrew Gallant
85b2ceecd1
deps: bump grep-regex to 0.1.12 2023-11-26 14:16:31 -05:00
Andrew Gallant
fee7ac79f1
grep-regex-0.1.12 2023-11-26 14:15:44 -05:00
Andrew Gallant
54d5540c10
deps: bump grep-matcher to 0.1.7 2023-11-26 14:15:34 -05:00
Andrew Gallant
d0251c77fe
grep-matcher-0.1.7 2023-11-26 14:13:54 -05:00
Andrew Gallant
6aa5993d4b
deps: bump grep-cli to 0.1.10 2023-11-26 14:13:40 -05:00
Andrew Gallant
6f78d211bf
grep-cli-0.1.10 2023-11-26 14:13:03 -05:00
Andrew Gallant
381c521d02
ignore-0.4.21 2023-11-26 14:12:16 -05:00
Andrew Gallant
57495db10e
deps: bump globset to 0.4.14 2023-11-26 14:11:43 -05:00