1
0
mirror of https://github.com/BurntSushi/ripgrep.git synced 2024-12-12 19:18:24 +02:00
Commit Graph

52 Commits

Author SHA1 Message Date
Andrew Gallant
afa06c518a
deps: update libripgrep crate versions
This prepares them for an initial 0.1.0 release.
2018-08-20 17:34:45 -04:00
Andrew Gallant
bb110c1ebe ripgrep: migrate to libripgrep
This commit does the work to delete the old `grep` crate and effectively
rewrite most of ripgrep core to use the new libripgrep crates. The new
`grep` crate is now a facade that collects the various crates that make
up libripgrep.

The most complex part of ripgrep core is now arguably the translation
between command line parameters and the library options, which is
ultimately where we want to be.
2018-08-20 07:10:19 -04:00
Andrew Gallant
94be3bd4bb
grep: remove senseless test
It was pulling in a sizable data file and doesn't appear to be testing
anything meaningful that isn't covered by a variety of other tests.
2018-08-15 19:52:50 -04:00
Andrew Gallant
0fdab0ec5e
grep-0.1.9 2018-08-03 16:12:08 -04:00
Andrew Gallant
7e5a590276
grep: small literal detection fix
This commit tweaks the inner literal detection heuristic such that if it
comes up with any literal that is all whitespace, then it's likely a bad
literal to look for since it's so common. Therefore, we simply reject the
inner literal optimization in this case and let the regex engine do its
thang.
2018-07-17 20:27:04 -04:00
Bastien Orivel
49f36c7dcd deps: update regex to 1.0
We retain the `simd-accel` feature on globset for backwards
compatibility, but will remove it in the next semver release.
2018-05-07 13:07:30 -04:00
Andrew Gallant
42b8132d0a grep: add "perfect" smart case detection
This commit removes the previous smart case detection logic and replaces
it with detection based on the regex AST. This particular AST is a faithful
representation of the concrete syntax, which lets us be very precise in
how we handle it.

Closes #851
2018-03-13 22:55:39 -04:00
Andrew Gallant
cd08707c7c grep: upgrade to regex-syntax 0.5
This update brings with it many bug fixes:

  * Better error messages are printed overall. We also include
    explicit call out for unsupported features like backreferences
    and look-around.
  * Regexes like `\s*{` no longer emit incomprehensible errors.
  * Unicode escape sequences, such as `\u{..}` are now supported.

For the most part, this upgrade was done in a straight-forward way. We
resist the urge to refactor the `grep` crate, in anticipation of it
being rewritten anyway.

Note that we removed the `--fixed-strings` suggestion whenever a regex
syntax error occurs. In practice, I've found that it results in a lot of
false positives, and I believe that its use is not as paramount now that
regex parse errors are much more readable.

Closes #268, Closes #395, Closes #702, Closes #853
2018-03-13 22:55:39 -04:00
Andrew Gallant
1f70e9187c deps: update regex crate
This update brings with it a new feature of the regex crate which will
now use SIMD optimizations automatically at runtime with no necessary
compile time flags. All that's needed is to enable the `unstable` feature.

Other crates, such as bytecount and encoding_rs, are still using the
old-style SIMD support, so we leave the simd-accel and avx-accel features.
However, the binaries we distribute on Github no longer have those
features enabled, which makes them truly portable.

Fixes #135
2018-03-12 23:21:42 -04:00
Andrew Gallant
739f8f596b
grep: release 0.1.8 2018-02-11 13:35:54 -05:00
Andrew Gallant
3535047094 logger: drop env_logger
This commit updates the `log` crate to 0.4 and drops the dependency on
env_logger. In particular, the latest version of env_logger brings in
additional non-optional dependencies such as chrono that I don't think is
worth including into ripgrep.

It turns out ripgrep doesn't need any fancy logging. We just need a concept
of log levels and the ability to print to stderr. Therefore, we just roll
our own super simple logger.

This update is motivated by the persistent configuration task. In
particular, we need the ability to toggle the global log level more than
once, and this doesn't appear to be possible with older versions of the
log crate.
2018-02-04 10:40:20 -05:00
Balaji Sivaraman
b6177f0459 cleanup: replace try! with ? 2018-01-01 09:22:35 -05:00
dana
86c890bcec Improve detection of upper-case characters by smart-case feature
Fixes #717 (partially)

The previous implementation of the smart-case feature was actually *too* smart,
in that it inspected the final character ranges in the AST to determine if the
pattern contained upper-case characters. This meant that patterns like `foo\w`
would not be handled case-insensitively, since `\w` includes the range of
upper-case characters A–Z.

As a medium-term solution to this problem, we now inspect the input pattern
itself for upper-case characters, ignoring any that immediately follow a `\`.
This neatly handles all of the most basic cases like `\w`, `\S`, and `É`, though
it still has problems with more complex features like `\p{Ll}`. Handling those
correctly will require improvements to the AST.
2017-12-18 17:58:26 -05:00
Andrew Gallant
efa4de8126
cargo: bump to 0.7.0 2017-10-21 22:40:10 -04:00
Andrew Gallant
1267f01c24
deps: upgrade to memchr 2 2017-10-21 22:40:09 -04:00
Andrew Gallant
c648eadbaa Bump and update deps. 2017-03-12 21:33:13 -04:00
Andrew Gallant
c1b841e934 Add license files to each crate.
Fixes #381
2017-03-12 16:57:15 -04:00
Andrew Gallant
057ed6305a 0.4.0 2017-01-13 23:46:21 -05:00
Andrew Gallant
163e00677a Update to regex 0.2. 2017-01-01 01:03:21 -05:00
Andrew Gallant
d58236fbdc bump various versions 2016-12-30 15:44:08 -05:00
Andrew Gallant
b65bb37b14 Remove superfluous memmap dependency in grep crate.
Fixes #295.
2016-12-27 15:46:40 -05:00
Andrew Gallant
0473df1ef5 Disable Unicode mode for literal regex.
When ripgrep detects a literal, it emits them as raw hex escaped byte
sequences to Regex::new. This permits literal optimizations for arbitrary
byte sequences (i.e., possibly invalid UTF-8). The problem is that
Regex::new interprets hex escaped byte sequences as *Unicode codepoints*
by default, but we want them to actually stand for their raw byte values.
Therefore, disable Unicode mode.

This is OK, since the regex is composed entirely of literals and literal
extraction does Unicode case folding.

Fixes #251
2016-11-28 18:31:58 -05:00
Andrew Gallant
301a3fd71d Detect more uppercase literals for --smart-case.
This changes the uppercase literal detection for the "smart case"
functionality. In particular, a character class is considered to have an
uppercase literal if at least one of its ranges starts or stops with an
uppercase literal.

Fixes #229
2016-11-28 17:57:26 -05:00
Andrew Gallant
8baa0e56b7 grep-0.1.4 2016-11-06 15:35:17 -05:00
Andrew Gallant
0222e024fe Fixes a bug with --smart-case.
This was a subtle bug, but the big picture was that the smart case
information wasn't being carried through to the literal extraction in
some cases. When this happened, it was possible to get back an incomplete
set of literals, which would therefore miss some valid matches.

The fix to this is to actually parse the regex and determine whether
smart case applies before doing anything else. It's a little extra work,
but parsing is pretty fast.

Fixes #199
2016-11-06 12:07:47 -05:00
Andrew Gallant
d79add341b Move all gitignore matching to separate crate.
This PR introduces a new sub-crate, `ignore`, which primarily provides a
fast recursive directory iterator that respects ignore files like
gitignore and other configurable filtering rules based on globs or even
file types.

This results in a substantial source of complexity moved out of ripgrep's
core and into a reusable component that others can now (hopefully)
benefit from.

While much of the ignore code carried over from ripgrep's core, a
substantial portion of it was rewritten with the following goals in
mind:

1. Reuse matchers built from gitignore files across directory iteration.
2. Design the matcher data structure to be amenable for parallelizing
   directory iteration. (Indeed, writing the parallel iterator is the
   next step.)

Fixes #9, #44, #45
2016-10-29 20:48:59 -04:00
Andrew Gallant
d3e118a786 Fix debug expression statement. 2016-10-10 21:48:34 -04:00
Andrew Gallant
b62195b33f grep 0.1.3 2016-09-25 22:29:35 -04:00
Andrew Gallant
6a8051b258 Don't union inner literals of repetitions.
If we do, this results in extracting `foofoofoo` from `(\wfoo){3}`,
which is wrong. This does prevent us from extracting `foofoofoo` from
`foo{3}`, which is unfortunate, but we miss plenty of other stuff too.
Literal extracting needs a good rethink (all the way down into the regex
engine).

Fixes #93
2016-09-25 20:10:28 -04:00
Andrew Gallant
1595f0faf5 Add --smart-case.
It does what it says on the tin.

Closes #70.
2016-09-24 21:51:04 -04:00
Andrew Gallant
24e14a0341 grep 0.1.2 2016-09-21 19:14:12 -04:00
Andrew Gallant
2a2b1506d4 Fix a performance bug where using -w could result in very bad performance.
The specific issue is that -w causes the regex to be wrapped in Unicode
word boundaries. Regrettably, Unicode word boundaries are the one thing
our regex engine can't handle well in the presence of non-ASCII text. We
work around its slowness by stripping word boundaries in some
circumstances, and using the resulting expression as a way to produce match
candidates that are then verified by the full original regex.

This doesn't fix all cases, but it should fix all cases where -w is used.
2016-09-21 19:12:07 -04:00
Andrew Gallant
4d6b3c727e Bump regex version. 2016-09-21 19:05:15 -04:00
Andrew Gallant
bf5d873099 grep 0.1.1 2016-09-17 11:32:47 -04:00
Andrew Gallant
d22a3ca3e5 Improve the "bad literal" error message.
Incidentally, this was done by using the Debug impl for `char` instead
of the Display impl. Cute.

Fixes #5.
2016-09-16 18:12:00 -04:00
Andrew Gallant
5fdfae2f15 add readme 2016-09-13 21:15:10 -04:00
Andrew Gallant
7057ee91de update grep Cargo.toml 2016-09-13 21:13:33 -04:00
Andrew Gallant
954fbeb1d8 Update regex. 2016-09-11 18:52:42 -04:00
Andrew Gallant
98a48b44bc Fix off-by-one bug in searcher. 2016-09-10 01:35:30 -04:00
Andrew Gallant
a744ec133d Rename xrep to ripgrep. 2016-09-08 16:15:44 -04:00
Andrew Gallant
af3b56a623 Fix grep match iterator. 2016-09-06 21:45:41 -04:00
Andrew Gallant
fd3e5069b6 Fix required literal handling and add debug prints.
In particular, if we had an inner literal and were doing a case insensitive
search, then the literals are dropped because we previously only allowed
a single inner literal to have an effect. Now we allow alternations of
inner literals, but still don't quite take full advantage.
2016-09-06 19:33:03 -04:00
Andrew Gallant
2bda77c414 Fix deps so that others can build it. 2016-09-05 18:22:12 -04:00
Andrew Gallant
0bf278e72f making search work (finally) 2016-09-03 21:48:23 -04:00
Andrew Gallant
d011cea053 The search code is a mess, but...
... we now support inverted matches and line numbers!
2016-08-29 22:44:15 -04:00
Andrew Gallant
1c8379f55a Implementing core functionality.
Initially experimenting with crossbeam to manage synchronization.
2016-08-28 01:37:12 -04:00
Andrew Gallant
957f90c898 docs and small polish 2016-08-24 18:33:35 -04:00
Andrew Gallant
61f49ba716 Remove the buffered reader.
We really need functionality like this when memory maps aren't suitable,
either because they're too slow or because they just aren't available (like
for reading stdin). However, this particular approach was completely bunk.
Namely, the interface was all wrong. The caller needs to maintain some kind
of control over the search buffers for special output features (like
contexts or inverted matching), but this interface as written doesn't
support that kind of pattern at all.

So... back to the drawing board.
2016-08-24 18:06:42 -04:00
Andrew Gallant
e97d75c024 Refactor buffered test. 2016-08-08 19:17:25 -04:00
Andrew Gallant
076eeff3ea update 2016-08-05 00:10:58 -04:00