mirror of https://github.com/BurntSushi/ripgrep.git synced 2025-06-25 14:22:54 +02:00
Commit Graph

1744 Commits

Author SHA1 Message Date
9d62eb997a BREAKING: regex: finally remove CRLF hack
Now that Rust's regex crate finally supports a CRLF mode, we can remove
this giant hack in ripgrep to enable it. (And assuredly did not work in
all cases.)

The way this works in the regex engine is actually subtly different than
what ripgrep previously did. Namely, --crlf would previously treat
either \r\n or \n as a line terminator. But now it treats \r\n, \n and
\r as line terminators. In effect, it is implemented by treating \r and
\n as line terminators, but ^ and $ will never match at a position
between a \r and a \n.

So basically this means that $ will end up matching in more cases than
it might be intended to, but I don't expect this to be a big problem in
practice.

Note that passing --crlf to ripgrep and enabling CRLF mode in the regex
via the `R` inline flag (e.g., `(?R:$)`) are subtly different. The `R`
flag just controls the regex engine, but --crlf instructs all of ripgrep
to use \r\n as a line terminator. There are likely some inconsistencies
or corner cases that are wrong as a result of this cognitive dissonance,
but we choose to leave well enough alone for now.

Fixing this for real will probably require re-thinking how line
terminators are handled in ripgrep. For example, one "problem" with how
they're handled now is that ripgrep will re-insert its own line
terminators when printing output instead of copying the input. This is
maybe not so great and perhaps unexpected. (ripgrep probably can't get
away with not inserting any line terminators. Users probably expect
files that don't end with a line terminator whose last line matches to
have a line terminator inserted.)
2023-07-05 14:04:29 -04:00
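The line-terminator rule described in 9d62eb997a can be modelled in a few lines of std-only Rust. This is a sketch of the stated semantics (both \r and \n end a line, but no match is allowed between a \r and the \n that follows it), not the regex crate's implementation; the function name `crlf_end_of_line_offsets` is hypothetical.

```rust
/// Returns the byte offsets at which an end-of-line anchor could match under
/// the CRLF semantics described in the commit message: `\r` and `\n` are both
/// line terminators, but the position between a `\r` and a following `\n` is
/// excluded. This is a hand-written model, not the regex engine's own code.
fn crlf_end_of_line_offsets(haystack: &[u8]) -> Vec<usize> {
    let mut offsets = Vec::new();
    for i in 0..=haystack.len() {
        let at_end = i == haystack.len();
        // A line ends right before either terminator byte.
        let before_terminator =
            !at_end && (haystack[i] == b'\r' || haystack[i] == b'\n');
        // Offset `i` sits inside a `\r\n` pair when the previous byte is `\r`
        // and the next byte is `\n`.
        let inside_crlf =
            i > 0 && !at_end && haystack[i - 1] == b'\r' && haystack[i] == b'\n';
        if (at_end || before_terminator) && !inside_crlf {
            offsets.push(i);
        }
    }
    offsets
}
```

For the input `a\r\nb\rc\nd`, the model reports offsets 1, 4, 6 and 8: before the `\r\n` pair, before the lone `\r`, before the lone `\n`, and at the end of the haystack. Offset 2, between `\r` and `\n`, is skipped.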
e028ea3792 regex: migrate grep-regex to regex-automata
We just do a "basic" dumb migration. We don't try to improve anything
here.
2023-07-05 14:04:29 -04:00
1035f6b1ff deps: initial migration steps to regex 1.9
This leaves the grep-regex crate in tatters. Pretty much the entire
thing needs to be re-worked. The upshot is that it should result in some
big simplifications. I hope.

The idea here is to drop down and actually use regex-automata 0.3
instead of the regex crate itself.
2023-07-05 14:04:29 -04:00
a7f1276021 readme: update Debian instructions
We probably don't need to mention Buster specifically nor Debian
unstable since ripgrep has been in Debian for a while now.

But we can't just get rid of the `deb` file either, because Debian might
package a very old version.

Fixes #2531
2023-06-12 07:50:13 -04:00
4fcb1b2202 cli: replace atty with std::io::IsTerminal
The `atty` crate is unmaintained[1] and `std::io::IsTerminal` was
stabilized in Rust 1.70.

[1]: https://rustsec.org/advisories/RUSTSEC-2021-0145.html

PR #2526
2023-06-05 14:00:46 -04:00
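As an illustration of the replacement named in 4fcb1b2202, a terminal check with `std::io::IsTerminal` (Rust 1.70 or newer) needs no external crate. The helper names and the "ansi"/"never" strings below are assumptions made for this sketch and are not taken from ripgrep's code.

```rust
use std::io::IsTerminal;

/// Hypothetical helper: choose a color mode from whether the output stream
/// is a terminal. The returned strings are illustrative only.
fn color_mode(is_tty: bool) -> &'static str {
    if is_tty {
        "ansi"
    } else {
        "never"
    }
}

/// Queries stdout through the standard library instead of the `atty` crate.
fn stdout_color_mode() -> &'static str {
    color_mode(std::io::stdout().is_terminal())
}
```

Keeping the decision (`color_mode`) separate from the query (`is_terminal`) makes the decision testable without a real terminal attached.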
949092fd22 ignore/types: add 'mdwn' to Markdown
PR #2520
2023-05-26 14:44:41 -04:00
4a7e7094ad deps: update everything else 2023-05-25 13:06:13 -04:00
fc0d9b90a9 deps: bump regex to 1.8.3
This brings in an update from the regex crate that fixes a matching bug
for particular kinds of alternations of literals.

Fixes #2518
2023-05-25 13:06:13 -04:00
335aa4937a ignore/types: add *.pyi for Python
https://peps.python.org/pep-0484/#stub-files

PR #2517
2023-05-23 07:10:02 -04:00
803c447845 searcher: re-enable mmap on 32-bit architectures
memmap2 v0.3.0 introduced a regression when trying to map files larger than 4GB
on 32-bit architectures[1] which was subsequently fixed in v0.3.1[2].

This commit bumps the locked version of the memmap2 dependency to the current
v0.5.0 and reverts fdfc418be5 to re-enable mmap on 32-bit
architectures as a different approach to fixing [3].

This was tested to report matches from the end of a 5GB file using MinGW and Wine.

Ref #1911, PR #2000 

[1] 5e271224c8
[2] 9aa838aed9
[3] https://github.com/BurntSushi/ripgrep/issues/1911
2023-05-19 08:23:53 -04:00
c5415adbe8 deps: update everything
This does unfortunately bring in both regex-syntax 0.6 and 0.7, but
we'll fix that once regex 1.9 is out.
2023-05-16 13:14:23 -04:00
251376597f deps: update minimum version of grep crate
Ref #2516
2023-05-16 13:13:34 -04:00
e593f5b7ee grep-0.2.12 grep-0.2.12 2023-05-16 13:12:45 -04:00
6b19be2477 crates/grep: remove 'deny(missing_docs)'
This crate is only a shim over a bunch of other crates. I'm not sure
that there's anything to add to each of the `pub extern` items. So
instead of just writing fluff, I removed the lint.

Fixes #2516
2023-05-16 13:10:42 -04:00
041544853c doc: fix --quiet docs
The wording was previously inverted, which gave it the opposite
of the intended meaning.

Fixes #1962
2023-03-28 07:22:59 -04:00
a7ae9e4043 ignore/types: add support for docker-compose files
Default file is docker-compose.yml and the documentation
mentions overrides in the form of docker-compose.*.yml.

PR #2469
2023-03-21 12:56:38 -04:00
595e7845b8 readme: add a link to delta's support for ripgrep
Ref: https://github.com/BurntSushi/ripgrep/issues/86#issuecomment-1469717706
2023-03-15 08:02:04 -04:00
44fb9fce2c ignore/types: add *.sln for msbuild
.sln is the extension for Visual Studio Project Solution files, one of
the file types accepted as inputs by MSBuild.

PR #2415
2023-02-09 21:20:49 -05:00
339c46a6ed ignore/types: enhance terraform default filter
The default filter for terraform only checks for *.tf files, but there
are quite a few other terraform filetypes.

The explanation for all of them can be found below (including links to
documentation from HashiCorp at the time of writing):

- *.tf.json & *.tfvars.json capture the files written in the
  JSON-based variant of the Terraform language
    - https://developer.hashicorp.com/terraform/language/files
- *.tfvars is used to supply variables
    - https://developer.hashicorp.com/terraform/cloud-docs/workspaces/variables#6-auto-tfvars-variable-files
- .terraform.lock.hcl is used as a Dependency lock file
    - https://developer.hashicorp.com/terraform/language/files/dependency-lock
- terraform.rc & .terraformrc, *.tfrc
    - https://developer.hashicorp.com/terraform/cli/config/config-file

PR #2412
2023-02-09 12:57:01 -05:00
fe97c0a152 ignore-0.4.20 ignore-0.4.20 2023-01-15 08:21:02 -05:00
826f3fad5b ignore/api: add Clone and Debug impls for OverrideBuilder
PR #2397
2023-01-15 08:16:27 -05:00
bc55049327 readme: update MSRV in README
... this was apparently long outdated, wow.
2023-01-05 12:09:46 -05:00
d58e9353fc deps: update to grep 0.2.11 2023-01-05 09:13:47 -05:00
ca60fef4db grep-0.2.11 grep-0.2.11 2023-01-05 09:12:49 -05:00
a25307d6c8 deps: update to grep-printer 0.1.7 2023-01-05 09:12:37 -05:00
b80947a8b3 grep-printer-0.1.7 grep-printer-0.1.7 2023-01-05 09:11:16 -05:00
ad793a0d8f deps: update to grep-searcher 0.1.11 2023-01-05 09:07:49 -05:00
120e55e7c7 grep-searcher-0.1.11 grep-searcher-0.1.11 2023-01-05 09:07:09 -05:00
3941a7701d deps: update to grep-pcre2 0.1.6 2023-01-05 09:06:52 -05:00
96e130fbf9 grep-pcre2-0.1.6 grep-pcre2-0.1.6 2023-01-05 09:05:59 -05:00
180c4eaf8b deps: update to grep-regex 0.1.11 2023-01-05 09:05:39 -05:00
81529288cf grep-regex-0.1.11 grep-regex-0.1.11 2023-01-05 09:02:55 -05:00
bcc7473a87 deps: update to grep-matcher 0.1.6 2023-01-05 09:02:40 -05:00
bc78c644db grep-matcher-0.1.6 grep-matcher-0.1.6 2023-01-05 09:00:33 -05:00
dc7267a0fb deps: update to grep-cli 0.1.7 2023-01-05 08:58:47 -05:00
3224324e25 grep-cli-0.1.7 grep-cli-0.1.7 2023-01-05 08:57:31 -05:00
0f61f08eb1 deps: update to ignore 0.4.19 2023-01-05 08:57:05 -05:00
a0e8dbe9df ignore-0.4.19 ignore-0.4.19 2023-01-05 08:55:46 -05:00
e95254a86f deps: remove ignore's dependency on crossbeam-utils
Scoped threads are now part of std.
2023-01-05 08:51:08 -05:00
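For context on e95254a86f, scoped threads in the standard library (`std::thread::scope`, stable since Rust 1.63) let worker threads borrow local data without a helper crate. The sketch below is a generic example under that assumption; `parallel_sum` is a hypothetical function and does not reflect how the ignore crate uses scoped threads.

```rust
use std::thread;

/// Sums `values` by giving each chunk to its own scoped thread. The threads
/// borrow `values` directly; `thread::scope` joins every spawned thread
/// before it returns, which is what makes the borrow sound.
fn parallel_sum(values: &[u64], chunk_size: usize) -> u64 {
    // `chunks` panics on a size of zero, so clamp to at least one.
    let chunk_size = chunk_size.max(1);
    thread::scope(|s| {
        // Collect the handles first so all threads are spawned before joining.
        let handles: Vec<_> = values
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .sum::<u64>()
    })
}
```

Calling `parallel_sum(&[1, 2, 3, 4, 5], 2)` spawns three threads and returns 15.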
2f484d8ce5 deps: update to globset 0.4.10 2023-01-05 08:49:58 -05:00
364772ddd2 globset-0.4.10 globset-0.4.10 2023-01-05 08:45:47 -05:00
2e207833bc deps: upgrade to jemallocator 0.5 2023-01-05 08:33:43 -05:00
92b35a65f8 deps: upgrade to base64 0.20 2023-01-05 08:21:49 -05:00
ac8fecbbf2 deps: upgrade bstr to 1.1 2023-01-05 08:21:15 -05:00
8596817374 deps: do semver compatible upgrades 2023-01-05 08:16:32 -05:00
28bff84a0a deps: remove 'num_cpus'
Now that std::thread::available_parallelism is a thing, we no longer
need num_cpus.
2023-01-05 08:15:09 -05:00
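A minimal sketch of the std-only replacement mentioned in 28bff84a0a: `std::thread::available_parallelism` returns a `Result`, so a caller has to pick a fallback. The fallback of 1 and the function name `default_thread_count` are assumptions for illustration, not ripgrep's actual policy.

```rust
use std::thread;

/// Returns a default worker count using only the standard library, falling
/// back to 1 when the available parallelism cannot be determined.
fn default_thread_count() -> usize {
    thread::available_parallelism()
        .map(|n| n.get()) // `n` is a NonZeroUsize, so the value is at least 1
        .unwrap_or(1)
}
```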
61101289fa cargo: set rust-version
This should hopefully make compilation errors from using
an older-than-supported compiler more helpful.

PR #2373
2022-12-21 07:37:09 -05:00
13faa39b66 deps: update all dependencies within semver
Note that this adds a new dependency, 'unicode-ident', and removes
'unicode-xid'. I looked briefly at 'unicode-ident' and all looks okay.
It is also permissively licensed.
2022-12-20 09:23:29 -05:00
6b61271bbb benchsuite/runs: add another run of the benchmarks
Looks like ripgrep is still the king. ;-)
2022-12-16 11:24:10 -05:00
1be86392e0 benchsuite: pass '-a' to ugrep in some cases
It looks like it incorrectly treats a file that is purely valid UTF-8 as
a binary file, which in turn effectively renders all of the Russian
subtitle benchmarks moot for ugrep. So we pass '-a' to force ugrep to
treat the file as text.

This technically gives ugrep an edge because it now no longer needs to
look to see if the haystack is binary or not. In practice this is
usually implemented using highly optimized SIMD routines (e.g.,
'memchr'), so it tends not to matter much. We might also consider
passing '-a' to all grep commands. But... I think using '-a' is the less
common case and we should try to benchmark the common case.
2022-12-16 11:21:58 -05:00