1
0
mirror of https://github.com/BurntSushi/ripgrep.git synced 2024-12-12 19:18:24 +02:00
Commit Graph

41 Commits

Author SHA1 Message Date
Balaji Sivaraman
b6177f0459 cleanup: replace try! with ? 2018-01-01 09:22:35 -05:00
dana
86c890bcec Improve detection of upper-case characters by smart-case feature
Fixes #717 (partially)

The previous implementation of the smart-case feature was actually *too* smart,
in that it inspected the final character ranges in the AST to determine if the
pattern contained upper-case characters. This meant that patterns like `foo\w`
would not be handled case-insensitively, since `\w` includes the range of
upper-case characters A–Z.

As a medium-term solution to this problem, we now inspect the input pattern
itself for upper-case characters, ignoring any that immediately follow a `\`.
This neatly handles all of the most basic cases like `\w`, `\S`, and `É`, though
it still has problems with more complex features like `\p{Ll}`. Handling those
correctly will require improvements to the AST.
2017-12-18 17:58:26 -05:00
Andrew Gallant
efa4de8126
cargo: bump to 0.7.0 2017-10-21 22:40:10 -04:00
Andrew Gallant
1267f01c24
deps: upgrade to memchr 2 2017-10-21 22:40:09 -04:00
Andrew Gallant
c648eadbaa Bump and update deps. 2017-03-12 21:33:13 -04:00
Andrew Gallant
c1b841e934 Add license files to each crate.
Fixes #381
2017-03-12 16:57:15 -04:00
Andrew Gallant
057ed6305a 0.4.0 2017-01-13 23:46:21 -05:00
Andrew Gallant
163e00677a Update to regex 0.2. 2017-01-01 01:03:21 -05:00
Andrew Gallant
d58236fbdc bump various versions 2016-12-30 15:44:08 -05:00
Andrew Gallant
b65bb37b14 Remove superfluous memmap dependency in grep crate.
Fixes #295.
2016-12-27 15:46:40 -05:00
Andrew Gallant
0473df1ef5 Disable Unicode mode for literal regex.
When ripgrep detects a literal, it emits them as raw hex escaped byte
sequences to Regex::new. This permits literal optimizations for arbitrary
byte sequences (i.e., possibly invalid UTF-8). The problem is that
Regex::new interprets hex escaped byte sequences as *Unicode codepoints*
by default, but we want them to actually stand for their raw byte values.
Therefore, disable Unicode mode.

This is OK, since the regex is composed entirely of literals and literal
extraction does Unicode case folding.

Fixes #251
2016-11-28 18:31:58 -05:00
Andrew Gallant
301a3fd71d Detect more uppercase literals for --smart-case.
This changes the uppercase literal detection for the "smart case"
functionality. In particular, a character class is considered to have an
uppercase literal if at least one of its ranges starts or stops with an
uppercase literal.

Fixes #229
2016-11-28 17:57:26 -05:00
Andrew Gallant
8baa0e56b7 grep-0.1.4 2016-11-06 15:35:17 -05:00
Andrew Gallant
0222e024fe Fixes a bug with --smart-case.
This was a subtle bug, but the big picture was that the smart case
information wasn't being carried through to the literal extraction in
some cases. When this happened, it was possible to get back an incomplete
set of literals, which would therefore miss some valid matches.

The fix to this is to actually parse the regex and determine whether
smart case applies before doing anything else. It's a little extra work,
but parsing is pretty fast.

Fixes #199
2016-11-06 12:07:47 -05:00
Andrew Gallant
d79add341b Move all gitignore matching to separate crate.
This PR introduces a new sub-crate, `ignore`, which primarily provides a
fast recursive directory iterator that respects ignore files like
gitignore and other configurable filtering rules based on globs or even
file types.

This results in a substantial source of complexity moved out of ripgrep's
core and into a reusable component that others can now (hopefully)
benefit from.

While much of the ignore code carried over from ripgrep's core, a
substantial portion of it was rewritten with the following goals in
mind:

1. Reuse matchers built from gitignore files across directory iteration.
2. Design the matcher data structure to be amenable for parallelizing
   directory iteration. (Indeed, writing the parallel iterator is the
   next step.)

Fixes #9, #44, #45
2016-10-29 20:48:59 -04:00
Andrew Gallant
d3e118a786 Fix debug expression statement. 2016-10-10 21:48:34 -04:00
Andrew Gallant
b62195b33f grep 0.1.3 2016-09-25 22:29:35 -04:00
Andrew Gallant
6a8051b258 Don't union inner literals of repetitions.
If we do, this results in extracting `foofoofoo` from `(\wfoo){3}`,
which is wrong. This does prevent us from extracting `foofoofoo` from
`foo{3}`, which is unfortunate, but we miss plenty of other stuff too.
Literal extracting needs a good rethink (all the way down into the regex
engine).

Fixes #93
2016-09-25 20:10:28 -04:00
Andrew Gallant
1595f0faf5 Add --smart-case.
It does what it says on the tin.

Closes #70.
2016-09-24 21:51:04 -04:00
Andrew Gallant
24e14a0341 grep 0.1.2 2016-09-21 19:14:12 -04:00
Andrew Gallant
2a2b1506d4 Fix a performance bug where using -w could result in very bad performance.
The specific issue is that -w causes the regex to be wrapped in Unicode
word boundaries. Regrettably, Unicode word boundaries are the one thing
our regex engine can't handle well in the presence of non-ASCII text. We
work around its slowness by stripping word boundaries in some
circumstances, and using the resulting expression as a way to produce match
candidates that are then verified by the full original regex.

This doesn't fix all cases, but it should fix all cases where -w is used.
2016-09-21 19:12:07 -04:00
Andrew Gallant
4d6b3c727e Bump regex version. 2016-09-21 19:05:15 -04:00
Andrew Gallant
bf5d873099 grep 0.1.1 2016-09-17 11:32:47 -04:00
Andrew Gallant
d22a3ca3e5 Improve the "bad literal" error message.
Incidentally, this was done by using the Debug impl for `char` instead
of the Display impl. Cute.

Fixes #5.
2016-09-16 18:12:00 -04:00
Andrew Gallant
5fdfae2f15 add readme 2016-09-13 21:15:10 -04:00
Andrew Gallant
7057ee91de update grep Cargo.toml 2016-09-13 21:13:33 -04:00
Andrew Gallant
954fbeb1d8 Update regex. 2016-09-11 18:52:42 -04:00
Andrew Gallant
98a48b44bc Fix off-by-one bug in searcher. 2016-09-10 01:35:30 -04:00
Andrew Gallant
a744ec133d Rename xrep to ripgrep. 2016-09-08 16:15:44 -04:00
Andrew Gallant
af3b56a623 Fix grep match iterator. 2016-09-06 21:45:41 -04:00
Andrew Gallant
fd3e5069b6 Fix required literal handling and add debug prints.
In particular, if we had an inner literal and were doing a case insensitive
search, then the literals are dropped because we previously only allowed
a single inner literal to have an effect. Now we allow alternations of
inner literals, but still don't quite take full advantage.
2016-09-06 19:33:03 -04:00
Andrew Gallant
2bda77c414 Fix deps so that others can build it. 2016-09-05 18:22:12 -04:00
Andrew Gallant
0bf278e72f making search work (finally) 2016-09-03 21:48:23 -04:00
Andrew Gallant
d011cea053 The search code is a mess, but...
... we now support inverted matches and line numbers!
2016-08-29 22:44:15 -04:00
Andrew Gallant
1c8379f55a Implementing core functionality.
Initially experimenting with crossbeam to manage synchronization.
2016-08-28 01:37:12 -04:00
Andrew Gallant
957f90c898 docs and small polish 2016-08-24 18:33:35 -04:00
Andrew Gallant
61f49ba716 Remove the buffered reader.
We really need functionality like this when memory maps aren't suitable,
either because they're too slow or because they just aren't available (like
for reading stdin). However, this particular approach was completely bunk.
Namely, the interface was all wrong. The caller needs to maintain some kind
of control over the search buffers for special output features (like
contexts or inverted matching), but this interface as written doesn't
support that kind of pattern at all.

So... back to the drawing board.
2016-08-24 18:06:42 -04:00
Andrew Gallant
e97d75c024 Refactor buffered test. 2016-08-08 19:17:25 -04:00
Andrew Gallant
076eeff3ea update 2016-08-05 00:10:58 -04:00
Andrew Gallant
a3f609222c progress 2016-06-22 21:19:02 -04:00
Andrew Gallant
0163b39faa refactor progress 2016-06-20 16:55:13 -04:00