ripgrep

mirror of https://github.com/BurntSushi/ripgrep.git synced 2025-12-04 14:00:13 +02:00

Author	SHA1	Message	Date
Andrew Gallant	4846d63539	grep-cli: introduce new grep-cli crate This commit moves a lot of "utility" code from ripgrep core into grep-cli. Any one of these things might not be worth creating a new crate, but combining everything together results in a fair number of convenience routines that make up a decent sized crate. There is potentially more we could move into the crate, but much of what remains in ripgrep core is almost entirely dealing with the number of flags we support. In the course of moving things to the grep-cli crate, we clean up a lot of gunk and improve failure modes in a number of cases. In particular, we've fixed a bug where other processes could deadlock if they write too much to stderr. Fixes #990	2018-09-04 23:18:55 -04:00
Andrew Gallant	bb110c1ebe	ripgrep: migrate to libripgrep This commit does the work to delete the old `grep` crate and effectively rewrite most of ripgrep core to use the new libripgrep crates. The new `grep` crate is now a facade that collects the various crates that make up libripgrep. The most complex part of ripgrep core is now arguably the translation between command line parameters and the library options, which is ultimately where we want to be.	2018-08-20 07:10:19 -04:00
Andrew Gallant	94be3bd4bb	grep: remove senseless test It was pulling in a sizable data file and doesn't appear to be testing anything meaningful that isn't covered by a variety of other tests.	2018-08-15 19:52:50 -04:00
Andrew Gallant	7e5a590276	grep: small literal detection fix This commit tweaks the inner literal detection heuristic such that if it comes up with any literal that is all whitespace, then it's likely a bad literal to look for since it's so common. Therefore, we simply reject the inner literal optimization in this case and let the regex engine do its thang.	2018-07-17 20:27:04 -04:00
Andrew Gallant	42b8132d0a	grep: add "perfect" smart case detection This commit removes the previous smart case detection logic and replaces it with detection based on the regex AST. This particular AST is a faithful representation of the concrete syntax, which lets us be very precise in how we handle it. Closes #851	2018-03-13 22:55:39 -04:00
Andrew Gallant	cd08707c7c	grep: upgrade to regex-syntax 0.5 This update brings with it many bug fixes: * Better error messages are printed overall. We also include explicit call out for unsupported features like backreferences and look-around. * Regexes like `\s{` no longer emit incomprehensible errors. * Unicode escape sequences, such as `\u{..}`, are now supported. For the most part, this upgrade was done in a straight-forward way. We resist the urge to refactor the `grep` crate, in anticipation of it being rewritten anyway. Note that we removed the `--fixed-strings` suggestion whenever a regex syntax error occurs. In practice, I've found that it results in a lot of false positives, and I believe that its use is not as paramount now that regex parse errors are much more readable. Closes #268, Closes #395, Closes #702, Closes #853	2018-03-13 22:55:39 -04:00
Balaji Sivaraman	b6177f0459	cleanup: replace try! with ?	2018-01-01 09:22:35 -05:00
dana	86c890bcec	Improve detection of upper-case characters by smart-case feature Fixes #717 (partially) The previous implementation of the smart-case feature was actually too smart, in that it inspected the final character ranges in the AST to determine if the pattern contained upper-case characters. This meant that patterns like `foo\w` would not be handled case-insensitively, since `\w` includes the range of upper-case characters A–Z. As a medium-term solution to this problem, we now inspect the input pattern itself for upper-case characters, ignoring any that immediately follow a `\`. This neatly handles all of the most basic cases like `\w`, `\S`, and `É`, though it still has problems with more complex features like `\p{Ll}`. Handling those correctly will require improvements to the AST.	2017-12-18 17:58:26 -05:00
Andrew Gallant	163e00677a	Update to regex 0.2.	2017-01-01 01:03:21 -05:00
Andrew Gallant	0473df1ef5	Disable Unicode mode for literal regex. When ripgrep detects a literal, it emits them as raw hex escaped byte sequences to Regex::new. This permits literal optimizations for arbitrary byte sequences (i.e., possibly invalid UTF-8). The problem is that Regex::new interprets hex escaped byte sequences as Unicode codepoints by default, but we want them to actually stand for their raw byte values. Therefore, disable Unicode mode. This is OK, since the regex is composed entirely of literals and literal extraction does Unicode case folding. Fixes #251	2016-11-28 18:31:58 -05:00
Andrew Gallant	301a3fd71d	Detect more uppercase literals for --smart-case. This changes the uppercase literal detection for the "smart case" functionality. In particular, a character class is considered to have an uppercase literal if at least one of its ranges starts or stops with an uppercase literal. Fixes #229	2016-11-28 17:57:26 -05:00
Andrew Gallant	0222e024fe	Fixes a bug with --smart-case. This was a subtle bug, but the big picture was that the smart case information wasn't being carried through to the literal extraction in some cases. When this happened, it was possible to get back an incomplete set of literals, which would therefore miss some valid matches. The fix to this is to actually parse the regex and determine whether smart case applies before doing anything else. It's a little extra work, but parsing is pretty fast. Fixes #199	2016-11-06 12:07:47 -05:00
Andrew Gallant	d3e118a786	Fix debug expression statement.	2016-10-10 21:48:34 -04:00
Andrew Gallant	6a8051b258	Don't union inner literals of repetitions. If we do, this results in extracting `foofoofoo` from `(\wfoo){3}`, which is wrong. This does prevent us from extracting `foofoofoo` from `foo{3}`, which is unfortunate, but we miss plenty of other stuff too. Literal extracting needs a good rethink (all the way down into the regex engine). Fixes #93	2016-09-25 20:10:28 -04:00
Andrew Gallant	1595f0faf5	Add --smart-case. It does what it says on the tin. Closes #70.	2016-09-24 21:51:04 -04:00
Andrew Gallant	2a2b1506d4	Fix a performance bug where using -w could result in very bad performance. The specific issue is that -w causes the regex to be wrapped in Unicode word boundaries. Regrettably, Unicode word boundaries are the one thing our regex engine can't handle well in the presence of non-ASCII text. We work around its slowness by stripping word boundaries in some circumstances, and using the resulting expression as a way to produce match candidates that are then verified by the full original regex. This doesn't fix all cases, but it should fix all cases where -w is used.	2016-09-21 19:12:07 -04:00
Andrew Gallant	d22a3ca3e5	Improve the "bad literal" error message. Incidentally, this was done by using the Debug impl for `char` instead of the Display impl. Cute. Fixes #5.	2016-09-16 18:12:00 -04:00
Andrew Gallant	98a48b44bc	Fix off-by-one bug in searcher.	2016-09-10 01:35:30 -04:00
Andrew Gallant	af3b56a623	Fix grep match iterator.	2016-09-06 21:45:41 -04:00
Andrew Gallant	fd3e5069b6	Fix required literal handling and add debug prints. In particular, if we had an inner literal and were doing a case insensitive search, then the literals are dropped because we previously only allowed a single inner literal to have an effect. Now we allow alternations of inner literals, but still don't quite take full advantage.	2016-09-06 19:33:03 -04:00
Andrew Gallant	0bf278e72f	making search work (finally)	2016-09-03 21:48:23 -04:00
Andrew Gallant	d011cea053	The search code is a mess, but... ... we now support inverted matches and line numbers!	2016-08-29 22:44:15 -04:00
Andrew Gallant	1c8379f55a	Implementing core functionality. Initially experimenting with crossbeam to manage synchronization.	2016-08-28 01:37:12 -04:00
Andrew Gallant	957f90c898	docs and small polish	2016-08-24 18:33:35 -04:00
Andrew Gallant	61f49ba716	Remove the buffered reader. We really need functionality like this when memory maps aren't suitable, either because they're too slow or because they just aren't available (like for reading stdin). However, this particular approach was completely bunk. Namely, the interface was all wrong. The caller needs to maintain some kind of control over the search buffers for special output features (like contexts or inverted matching), but this interface as written doesn't support that kind of pattern at all. So... back to the drawing board.	2016-08-24 18:06:42 -04:00
Andrew Gallant	e97d75c024	Refactor buffered test.	2016-08-08 19:17:25 -04:00
Andrew Gallant	076eeff3ea	update	2016-08-05 00:10:58 -04:00
Andrew Gallant	a3f609222c	progress	2016-06-22 21:19:02 -04:00
Andrew Gallant	0163b39faa	refactor progress	2016-06-20 16:55:13 -04:00

29 Commits
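
The smart-case heuristic described in commit 86c890bcec (inspect the input pattern for upper-case characters, ignoring any that immediately follow a `\`) can be sketched roughly as below. This is only an illustration based on the commit message, not the code from that commit: the function names `has_unescaped_uppercase` and `smart_case_insensitive` are hypothetical, and the real implementation was later replaced by AST-based detection (42b8132d0a).

```rust
/// Hypothetical helper: returns true if `pattern` contains an upper-case
/// character that does not immediately follow a backslash.
fn has_unescaped_uppercase(pattern: &str) -> bool {
    let mut escaped = false;
    for ch in pattern.chars() {
        if escaped {
            // This character immediately follows `\` (e.g. the `W` in `\W`),
            // so it is ignored for the purposes of smart case.
            escaped = false;
            continue;
        }
        if ch == '\\' {
            escaped = true;
            continue;
        }
        // `char::is_uppercase` is Unicode-aware, so `É` counts as upper-case.
        if ch.is_uppercase() {
            return true;
        }
    }
    false
}

/// Hypothetical wrapper: under smart case, the search is case-insensitive
/// only when the pattern has no unescaped upper-case characters.
fn smart_case_insensitive(pattern: &str) -> bool {
    !has_unescaped_uppercase(pattern)
}
```

Under this sketch, `foo\w` and `\S` are searched case-insensitively, while `Foo` and `É` are not. A pattern such as `\p{Ll}` is still treated as containing an upper-case character (the `L`), which matches the limitation the commit message acknowledges.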