1
0
mirror of https://github.com/BurntSushi/ripgrep.git synced 2024-12-12 19:18:24 +02:00
Commit Graph

1225 Commits

Author SHA1 Message Date
Andrew Gallant
967e7ad0de ripgrep: add --auto-hybrid-regex flag
This flag, when set, will automatically dispatch to PCRE2 if the given
regex cannot be compiled by Rust's regex engine. If both engines fail to
compile the regex, then both errors are surfaced.

Closes #1155
2019-04-14 19:29:27 -04:00
Andrew Gallant
9952ba2068 deps: update glob dev-dependency 2019-04-14 19:29:27 -04:00
Andrew Gallant
b751758d60 deps: update everything 2019-04-14 19:29:27 -04:00
Andrew Gallant
8f14cb18a5 ripgrep: increase pcre2's default JIT stack size
The default stack size is 32KB, and this increases it to 10MB. 32KB is
pretty paltry in the environments in which ripgrep runs, and 10MB is
easily afforded as a maximum size. (The size limit we set for Rust's
regex engine is considerably larger.)

This was motivated due to the fack that JIT stack limits have been
observed to be hit in the wild:
https://github.com/Microsoft/vscode/issues/64606
2019-04-14 19:29:27 -04:00
Andrew Gallant
da9d720431 ripgrep: add --pcre2-version flag
This flag will output details about the version of PCRE2 that ripgrep
is using (if any).
2019-04-14 19:29:27 -04:00
Andrew Gallant
a9d71a0368 pcre2: add a few re-exports
This adds the top-level is_jit_available and version free functions from
the underlying pcre2 crate, and also forwards the max_jit_stack_size
option.
2019-04-14 19:29:27 -04:00
Andrew Gallant
f3646242cc deps: use pcre2 0.2.0
This comes with PCRE 10.32 and a few new options we'll use in subsequent
commits.
2019-04-14 19:29:27 -04:00
Andrew Gallant
601f212a0b ripgrep: add -I as a short option for --no-filename
This flag is commonly used in pipelines and it can be annoying to write
it out every time you need it.

Ideally, we would use -h for this to match GNU grep, but -h is used to
print help output.

Closes #1185
2019-04-14 19:29:27 -04:00
Andrew Gallant
5a565354f8 versioning: next version will be ripgrep 11
This sets up the release announcement and briefly describes the
versioning change. The actual version change itself won't happen until
the release.

Closes #1172
2019-04-14 19:29:27 -04:00
Andrew Gallant
2a6532ae71 doc: note cases of exorbitant memory usage
Fixes #1189
2019-04-14 19:29:27 -04:00
Andrew Gallant
ece1f50cfe printer: support previews for long lines
This commit adds support for showing a preview of long lines. While the
default still remains as completely suppressing the entire line, this
new functionality will show the first N graphemes of a matching line,
including the number of matches that are suppressed.

This was unfortunately a fairly invasive change to the printer that
required a bit of refactoring. On the bright side, the single line
and multi-line coloring are now more unified than they were before.

Closes #1078
2019-04-14 19:29:27 -04:00
Andrew Gallant
a7d26c8f14 binary: rejigger ripgrep's handling of binary files
This commit attempts to surface binary filtering in a slightly more
user friendly way. Namely, before, ripgrep would silently stop
searching a file if it detected a NUL byte, even if it had previously
printed a match. This can lead to the user quite reasonably assuming
that there are no more matches, since a partial search is fairly
unintuitive. (ripgrep has this behavior by default because it really
wants to NOT search binary files at all, just like it doesn't search
gitignored or hidden files.)

With this commit, if a match has already been printed and ripgrep detects
a NUL byte, then it will print a warning message indicating that the search
stopped prematurely.

Moreover, this commit adds a new flag, --binary, which causes ripgrep to
stop filtering binary files, but in a way that still avoids dumping
binary data into terminals. That is, the --binary flag makes ripgrep
behave more like grep's default behavior.

For files explicitly specified in a search, e.g., `rg foo some-file`,
then no binary filtering is applied (just like no gitignore and no
hidden file filtering is applied). Instead, ripgrep behaves as if you
gave the --binary flag for all explicitly given files.

This was a fairly invasive change, and potentially increases the UX
complexity of ripgrep around binary files. (Before, there were two
binary modes, where as now there are three.) However, ripgrep is now a
bit louder with warning messages when binary file detection might
otherwise be hiding potential matches, so hopefully this is a net
improvement.

Finally, the `-uuu` convenience now maps to `--no-ignore --hidden
--binary`, since this is closer to the actualy intent of the
`--unrestricted` flag, i.e., to reduce ripgrep's smart filtering. As a
consequence, `rg -uuu foo` should now search roughly the same number of
bytes as `grep -r foo`, and `rg -uuua foo` should search roughly the
same number of bytes as `grep -ra foo`. (The "roughly" weasel word is
used because grep's and ripgrep's binary file detection might differ
somewhat---perhaps based on buffer sizes---which can impact exactly what
is and isn't searched.)

See the numerous tests in tests/binary.rs for intended behavior.

Fixes #306, Fixes #855
2019-04-14 19:29:27 -04:00
Andrew Gallant
bd222ae93f regex: fix HIR analysis bug
An alternate can be empty at this point, so we must handle it. We didn't
before because the regex engine actually disallows empty alternates,
however, this code runs before the regex compiler rejects the regex.
2019-04-14 19:29:27 -04:00
hupfdule
4359d8aac0 ignore/types: add more extensions for xml
This includes:

    *.dtd for Document Type Definitions
    *.xsl and *.xslt for XSL Transformation descriptions
    *.xsd for XML Schema definitions
    *.xjb for JAXB bindings
    *.rng for Relax NG files
    *.sch for Schematron files

PR #1243
2019-04-09 15:17:57 -04:00
tonypai
308819fb1f ignore/types: add lock files
Treat anything with a `.lock` extension as a lock file, with
an extra rule or two for special cases, e.g., package-lock.json.
2019-04-09 10:24:48 -04:00
Andrew Gallant
09108b7fda regex: make multi-literal searcher faster
This makes the case of searching for a dictionary of a very large number
of literals much much faster. (~10x or so.) In particular, we achieve this
by short-circuiting the construction of a full regex when we know we have
a simple alternation of literals. Building the regex for a large dictionary
(>100,000 literals) turns out to be quite slow, even if it internally will
dispatch to Aho-Corasick.

Even that isn't quite enough. It turns out that even *parsing* such a regex
is quite slow. So when the -F/--fixed-strings flag is set, we short
circuit regex parsing completely and jump straight to Aho-Corasick.

We aren't quite as fast as GNU grep here, but it's much closer (less than
2x slower).

In general, this is somewhat of a hack. In particular, it seems plausible
that this optimization could be implemented entirely in the regex engine.
Unfortunately, the regex engine's internals are just not amenable to this
at all, so it would require a larger refactoring effort. For now, it's
good enough to add this fairly simple hack at a higher level.

Unfortunately, if you don't pass -F/--fixed-strings, then ripgrep will
be slower, because of the aforementioned missing optimization. Moreover,
passing flags like `-i` or `-S` will cause ripgrep to abandon this
optimization and fall back to something potentially much slower. Again,
this fix really needs to happen inside the regex engine, although we
might be able to special case -i when the input literals are pure ASCII
via Aho-Corasick's `ascii_case_insensitive`.

Fixes #497, Fixes #838
2019-04-07 19:11:03 -04:00
Andrew Gallant
743d64f2e4 deps: update to clap 2.33 2019-04-06 10:35:08 -04:00
lesnyrumcajs
5962abc465 searcher: add option to disable BOM sniffing
This commit adds a new encoding feature where the -E/--encoding flag
will now accept a value of 'none'. When given this value, all encoding
related machinery is disabled and ripgrep will search the raw bytes of
the file, including the BOM if it's present.

Closes #1207, Closes #1208
2019-04-06 10:35:08 -04:00
dana
1604a18db3 ignore/types: add *.am and *.in for C/C++/make
PR #1205
2019-04-06 08:02:04 -04:00
luzpaz
9eeb0b01ce readme: add Repology badge
This adds a badge to the README.md file indicating to users that click
on it if their os/distro carries that latest version of ripgrep.

PR #1213
2019-04-06 08:00:40 -04:00
dana
df4400209a ripgrep: remove extra new-line after Clap output
PR #1222
2019-04-06 07:59:36 -04:00
Andrew Gallant
77439f99a4 deps: add bstr to Cargo.lock 2019-04-05 23:24:08 -04:00
Andrew Gallant
be7d6dd9ce regex: print out final regex in trace mode
This is useful for debugging to see what regex is actually being run.
We put this as a trace since the regex can be quite gnarly. (It is not
pretty printed.)
2019-04-05 23:24:08 -04:00
Andrew Gallant
9f15e3b671 regex: fix a perf bug when using -w flag
When looking for an inner literal to speed up searches, if only a prefix
is found, then we generally give up doing inner literal optimizations since
the regex engine will generally handle it for us. Unfortunately, this
decision was being made *before* we wrap the regex in (^|\W)...($|\W) when
using the -w/--word-regexp flag, which would then defeat the literal
optimizations inside the regex engine.

We fix this with a bit of a hack that says, "if we're doing a word regexp,
then give me back any literal you find, even if it's a prefix."
2019-04-05 23:24:08 -04:00
Andrew Gallant
254b8b67bb globset: small perf improvements
This tweaks the path handling functions slightly to make them a hair
faster. In particular, `file_name` is called on every path that ripgrep
visits, and it was possible to remove a few branches without changing
behavior.
2019-04-05 23:24:08 -04:00
Andrew Gallant
8a7f43b84d globset: use bstr
This simplifies the various path related functions and pushed more platform
dependent code down into bstr. This likely also makes things a bit more
efficient on Windows, since we now only do a single UTF-8 check for each
file path.
2019-04-05 23:24:08 -04:00
Andrew Gallant
d968a27ed5 cli: use bstr
This uses bstr in the unescaping logic. This lets us remove some platform
specific code, and also lets us remove a hacked UTF-8 decoder on raw
bytes.
2019-04-05 23:24:08 -04:00
Andrew Gallant
9b8f5cbaba config: switch to using bstrs
This lets us implement correct Unicode trimming and also simplifies the
parsing logic a bit. This also removes the last platform specific bits of
code in ripgrep core.
2019-04-05 23:24:08 -04:00
Andrew Gallant
c52da74ac3 printer: use bstr
This starts the usage of bstr in the printer. We don't use it too much
yet, but it comes in handy for implementing PrinterPath and lets us push
down some platform specific code into bstr.
2019-04-05 23:24:08 -04:00
Andrew Gallant
7dcbff9a9b searcher: partially migrate to bstr
This commit causes grep-searcher to use byte strings internally for its
line buffer support. We manage to remove a use of `unsafe` by doing this
(by pushing it down into `bstr`).

We stop short of using byte strings everywhere else because we rely
heavily on the `impl ops::Index<[u8]> for grep_matcher::Match` impl,
which isn't available for byte strings. (It is premature to make bstr a
public dep of a core crate like grep-matcher, but maybe some day.)
2019-04-05 23:24:08 -04:00
Andrew Gallant
bef1f0e770
ci: switch to xenial (#1234)
Rust is having problems with trusty, in particular, see this bug I
filed: https://github.com/rust-lang/rust/issues/59411

This was purpotedly fixed in
https://github.com/rust-lang/rust/pull/59468,
but it appears the issue is still occurring.

This commit tries to update to Ubuntu 16.04 in the hope that it will fix
this problem.
2019-04-03 19:52:34 -04:00
Andrew Gallant
cd9815cb37
deps: update to aho-corasick 0.7
We do the simplest possible change to migrate to the new version.

Fixes #1228
2019-04-03 13:51:26 -04:00
Andrew Gallant
3f22c3a658
deps: update everything
This updates all dependencies to their latest versions.

We tolerate a duplicative aho-corasick for now, which we will fix in the
next commit.
2019-04-03 13:07:26 -04:00
Andrew Gallant
0913972104
deps: bump encoding_rs_io
This brings in a new API for disabling BOM sniffing.

This is part of the work toward completing
https://github.com/BurntSushi/ripgrep/issues/1207
2019-03-03 16:36:34 -05:00
Andrew Gallant
f19b84fb23
regex: bump regex dep to fix match bug
See

* 661bf53d5b
* edf45e6f5f

for details on the bug fix, which was in the regex engine.

Fixes #1203
2019-02-27 17:42:14 -05:00
Andrew Gallant
59fc583aeb
readme: include details about filtering
Despite the fact that we mention this in several places, people are
still surprised by ripgrep's "smart" filtering.
2019-02-27 08:01:23 -05:00
Andrew Gallant
1c7c4e6640
deps: update tempfile 2019-02-21 16:32:17 -05:00
Andrew Gallant
69c5e3938d
deps: bump smallvec
This gets rid of the unmaintained crates `unreachable` and `void`. Yay!
2019-02-21 16:31:48 -05:00
Andrew Gallant
d9cf05ad50
deps: update to aho-corasick 0.6.10
This brings in a fix for this bug:
https://github.com/BurntSushi/aho-corasick/issues/37

Fixes #1079
2019-02-16 11:39:33 -05:00
Andrew Gallant
af8b6caebb
deps: update various dependencies 2019-02-16 09:39:42 -05:00
Andrew Gallant
c84cfb6756
grep-regex-0.1.2 2019-02-16 09:30:06 -05:00
Andrew Gallant
895e26a000
ci: don't do releases on all tags
This attempts to make Appveyor more conservative in what tags it thinks
are releases. I don't know for sure, but it looks like the previous
regex could match anywhere, so we anchor it.

Fixes #1195
2019-02-10 12:51:56 -05:00
Andrew Gallant
8c95290ff6
deps: miscellaneous updates 2019-02-10 07:45:08 -05:00
Andrew Gallant
d6feeb7ff2
grep-searcher-0.1.3 2019-02-10 07:42:37 -05:00
Andrew Gallant
626ed00c19
searcher: revert big-endian patch
This undoes the patch to stop using bytecount on big-endian
architectures. In particular, we bump our bytecount dependency to the
latest release, which has a fix.

This reverts commit a4868b8835.

Fixes #1144 (again), Closes #1194
2019-02-10 07:40:32 -05:00
Andrew Gallant
332ad18401
tests: use const constructor for atomics
We did this in 05411b2b for core ripgrep, but didn't carry it over to
tests.
2019-02-09 16:27:25 -05:00
Andrew Gallant
fc3cf41247
grep-searcher-0.1.2 2019-02-09 16:13:07 -05:00
Andrew Gallant
a4868b8835
searcher: use naive line counting on big-endian
This patches out bytecount's "fast" vectorized algorithm on big-endian
machines, where it has been observed to fail. Going forward, bytecount
should probably fix this on their end, but for now, we take a small
performance hit on big-endian machines.

Fixes #1144
2019-02-09 16:13:07 -05:00
John Schmidt
f99b991117 ignore/types: add zig
PR #1191
2019-02-08 08:12:40 -05:00
Andrew Gallant
de0bc78982
deps: bump encoding_rs to 0.8.16
This brings in an updated `encoding_rs` crate that uses `packed_simd`,
which compiles on the latest nightly. Compilation times do appear to be
impacted significantly though.

Fixes #1175 (again)
2019-02-07 17:05:14 -05:00