ripgrep

mirror of https://github.com/BurntSushi/ripgrep.git synced 2025-05-19 05:33:04 +02:00

Author	SHA1	Message	Date
Andrew Gallant	afa06c518a	deps: update libripgrep crate versions This prepares them for an initial 0.1.0 release.	2018-08-20 17:34:45 -04:00
Andrew Gallant	bb110c1ebe	ripgrep: migrate to libripgrep This commit does the work to delete the old `grep` crate and effectively rewrite most of ripgrep core to use the new libripgrep crates. The new `grep` crate is now a facade that collects the various crates that make up libripgrep. The most complex part of ripgrep core is now arguably the translation between command line parameters and the library options, which is ultimately where we want to be.	2018-08-20 07:10:19 -04:00
Andrew Gallant	94be3bd4bb	grep: remove senseless test It was pulling in a sizable data file and doesn't appear to be testing anything meaningful that isn't covered by a variety of other tests.	2018-08-15 19:52:50 -04:00
Andrew Gallant	0fdab0ec5e	grep-0.1.9	2018-08-03 16:12:08 -04:00
Andrew Gallant	7e5a590276	grep: small literal detection fix This commit tweaks the inner literal detection heuristic such that if it comes up with any literal that is all whitespace, then it's likely a bad literal to look for since it's so common. Therefore, we simply reject the inner literal optimization in this case and let the regex engine do its thang.	2018-07-17 20:27:04 -04:00
Bastien Orivel	49f36c7dcd	deps: update regex to 1.0 We retain the `simd-accel` feature on globset for backwards compatibility, but will remove it in the next semver release.	2018-05-07 13:07:30 -04:00
Andrew Gallant	42b8132d0a	grep: add "perfect" smart case detection This commit removes the previous smart case detection logic and replaces it with detection based on the regex AST. This particular AST is a faithful representation of the concrete syntax, which lets us be very precise in how we handle it. Closes #851	2018-03-13 22:55:39 -04:00
Andrew Gallant	cd08707c7c	grep: upgrade to regex-syntax 0.5 This update brings with it many bug fixes: * Better error messages are printed overall. We also include explicit call out for unsupported features like backreferences and look-around. * Regexes like `\s{` no longer emit incomprehensible errors. Unicode escape sequences, such as `\u{..}` are now supported. For the most part, this upgrade was done in a straight-forward way. We resist the urge to refactor the `grep` crate, in anticipation of it being rewritten anyway. Note that we removed the `--fixed-strings` suggestion whenever a regex syntax error occurs. In practice, I've found that it results in a lot of false positives, and I believe that its use is not as paramount now that regex parse errors are much more readable. Closes #268, Closes #395, Closes #702, Closes #853	2018-03-13 22:55:39 -04:00
Andrew Gallant	1f70e9187c	deps: update regex crate This update brings with it a new feature of the regex crate which will now use SIMD optimizations automatically at runtime with no necessary compile time flags. All that's needed is to enable the `unstable` feature. Other crates, such as bytecount and encoding_rs, are still using the old-style SIMD support, so we leave the simd-accel and avx-accel features. However, the binaries we distribute on Github no longer have those features enabled, which makes them truly portable. Fixes #135	2018-03-12 23:21:42 -04:00
Andrew Gallant	739f8f596b	grep: release 0.1.8	2018-02-11 13:35:54 -05:00
Andrew Gallant	3535047094	logger: drop env_logger This commit updates the `log` crate to 0.4 and drops the dependency on env_logger. In particular, the latest version of env_logger brings in additional non-optional dependencies such as chrono that I don't think is worth including into ripgrep. It turns out ripgrep doesn't need any fancy logging. We just need a concept of log levels and the ability to print to stderr. Therefore, we just roll our own super simple logger. This update is motivated by the persistent configuration task. In particular, we need the ability to toggle the global log level more than once, and this doesn't appear to be possible with older versions of the log crate.	2018-02-04 10:40:20 -05:00
Balaji Sivaraman	b6177f0459	cleanup: replace try! with ?	2018-01-01 09:22:35 -05:00
dana	86c890bcec	Improve detection of upper-case characters by smart-case feature Fixes #717 (partially) The previous implementation of the smart-case feature was actually too smart, in that it inspected the final character ranges in the AST to determine if the pattern contained upper-case characters. This meant that patterns like `foo\w` would not be handled case-insensitively, since `\w` includes the range of upper-case characters A–Z. As a medium-term solution to this problem, we now inspect the input pattern itself for upper-case characters, ignoring any that immediately follow a `\`. This neatly handles all of the most basic cases like `\w`, `\S`, and `É`, though it still has problems with more complex features like `\p{Ll}`. Handling those correctly will require improvements to the AST.	2017-12-18 17:58:26 -05:00
Andrew Gallant	efa4de8126	cargo: bump to 0.7.0	2017-10-21 22:40:10 -04:00
Andrew Gallant	1267f01c24	deps: upgrade to memchr 2	2017-10-21 22:40:09 -04:00
Andrew Gallant	c648eadbaa	Bump and update deps.	2017-03-12 21:33:13 -04:00
Andrew Gallant	c1b841e934	Add license files to each crate. Fixes #381	2017-03-12 16:57:15 -04:00
Andrew Gallant	057ed6305a	0.4.0	2017-01-13 23:46:21 -05:00
Andrew Gallant	163e00677a	Update to regex 0.2.	2017-01-01 01:03:21 -05:00
Andrew Gallant	d58236fbdc	bump various versions	2016-12-30 15:44:08 -05:00
Andrew Gallant	b65bb37b14	Remove superfluous memmap dependency in `grep` crate. Fixes #295.	2016-12-27 15:46:40 -05:00
Andrew Gallant	0473df1ef5	Disable Unicode mode for literal regex. When ripgrep detects a literal, it emits them as raw hex escaped byte sequences to Regex::new. This permits literal optimizations for arbitrary byte sequences (i.e., possibly invalid UTF-8). The problem is that Regex::new interprets hex escaped byte sequences as Unicode codepoints by default, but we want them to actually stand for their raw byte values. Therefore, disable Unicode mode. This is OK, since the regex is composed entirely of literals and literal extraction does Unicode case folding. Fixes #251	2016-11-28 18:31:58 -05:00
Andrew Gallant	301a3fd71d	Detect more uppercase literals for --smart-case. This changes the uppercase literal detection for the "smart case" functionality. In particular, a character class is considered to have an uppercase literal if at least one of its ranges starts or stops with an uppercase literal. Fixes #229	2016-11-28 17:57:26 -05:00
Andrew Gallant	8baa0e56b7	grep-0.1.4	2016-11-06 15:35:17 -05:00
Andrew Gallant	0222e024fe	Fixes a bug with --smart-case. This was a subtle bug, but the big picture was that the smart case information wasn't being carried through to the literal extraction in some cases. When this happened, it was possible to get back an incomplete set of literals, which would therefore miss some valid matches. The fix to this is to actually parse the regex and determine whether smart case applies before doing anything else. It's a little extra work, but parsing is pretty fast. Fixes #199	2016-11-06 12:07:47 -05:00
Andrew Gallant	d79add341b	Move all gitignore matching to separate crate. This PR introduces a new sub-crate, `ignore`, which primarily provides a fast recursive directory iterator that respects ignore files like gitignore and other configurable filtering rules based on globs or even file types. This results in a substantial source of complexity moved out of ripgrep's core and into a reusable component that others can now (hopefully) benefit from. While much of the ignore code carried over from ripgrep's core, a substantial portion of it was rewritten with the following goals in mind: 1. Reuse matchers built from gitignore files across directory iteration. 2. Design the matcher data structure to be amenable for parallelizing directory iteration. (Indeed, writing the parallel iterator is the next step.) Fixes #9, #44, #45	2016-10-29 20:48:59 -04:00
Andrew Gallant	d3e118a786	Fix debug expression statement.	2016-10-10 21:48:34 -04:00
Andrew Gallant	b62195b33f	grep 0.1.3	2016-09-25 22:29:35 -04:00
Andrew Gallant	6a8051b258	Don't union inner literals of repetitions. If we do, this results in extracting `foofoofoo` from `(\wfoo){3}`, which is wrong. This does prevent us from extracting `foofoofoo` from `foo{3}`, which is unfortunate, but we miss plenty of other stuff too. Literal extracting needs a good rethink (all the way down into the regex engine). Fixes #93	2016-09-25 20:10:28 -04:00
Andrew Gallant	1595f0faf5	Add --smart-case. It does what it says on the tin. Closes #70.	2016-09-24 21:51:04 -04:00
Andrew Gallant	24e14a0341	grep 0.1.2	2016-09-21 19:14:12 -04:00
Andrew Gallant	2a2b1506d4	Fix a performance bug where using -w could result in very bad performance. The specific issue is that -w causes the regex to be wrapped in Unicode word boundaries. Regrettably, Unicode word boundaries are the one thing our regex engine can't handle well in the presence of non-ASCII text. We work around its slowness by stripping word boundaries in some circumstances, and using the resulting expression as a way to produce match candidates that are then verified by the full original regex. This doesn't fix all cases, but it should fix all cases where -w is used.	2016-09-21 19:12:07 -04:00
Andrew Gallant	4d6b3c727e	Bump regex version.	2016-09-21 19:05:15 -04:00
Andrew Gallant	bf5d873099	grep 0.1.1	2016-09-17 11:32:47 -04:00
Andrew Gallant	d22a3ca3e5	Improve the "bad literal" error message. Incidentally, this was done by using the Debug impl for `char` instead of the Display impl. Cute. Fixes #5.	2016-09-16 18:12:00 -04:00
Andrew Gallant	5fdfae2f15	add readme	2016-09-13 21:15:10 -04:00
Andrew Gallant	7057ee91de	update grep Cargo.toml	2016-09-13 21:13:33 -04:00
Andrew Gallant	954fbeb1d8	Update regex.	2016-09-11 18:52:42 -04:00
Andrew Gallant	98a48b44bc	Fix off-by-one bug in searcher.	2016-09-10 01:35:30 -04:00
Andrew Gallant	a744ec133d	Rename xrep to ripgrep.	2016-09-08 16:15:44 -04:00
Andrew Gallant	af3b56a623	Fix grep match iterator.	2016-09-06 21:45:41 -04:00
Andrew Gallant	fd3e5069b6	Fix required literal handling and add debug prints. In particular, if we had an inner literal and were doing a case insensitive search, then the literals are dropped because we previously only allowed a single inner literal to have an effect. Now we allow alternations of inner literals, but still don't quite take full advantage.	2016-09-06 19:33:03 -04:00
Andrew Gallant	2bda77c414	Fix deps so that others can build it.	2016-09-05 18:22:12 -04:00
Andrew Gallant	0bf278e72f	making search work (finally)	2016-09-03 21:48:23 -04:00
Andrew Gallant	d011cea053	The search code is a mess, but... ... we now support inverted matches and line numbers!	2016-08-29 22:44:15 -04:00
Andrew Gallant	1c8379f55a	Implementing core functionality. Initially experimenting with crossbeam to manage synchronization.	2016-08-28 01:37:12 -04:00
Andrew Gallant	957f90c898	docs and small polish	2016-08-24 18:33:35 -04:00
Andrew Gallant	61f49ba716	Remove the buffered reader. We really need functionality like this when memory maps aren't suitable, either because they're too slow or because they just aren't available (like for reading stdin). However, this particular approach was completely bunk. Namely, the interface was all wrong. The caller needs to maintain some kind of control over the search buffers for special output features (like contexts or inverted matching), but this interface as written doesn't support that kind of pattern at all. So... back to the drawing board.	2016-08-24 18:06:42 -04:00
Andrew Gallant	e97d75c024	Refactor buffered test.	2016-08-08 19:17:25 -04:00
Andrew Gallant	076eeff3ea	update	2016-08-05 00:10:58 -04:00
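
Commit 86c890bcec above describes the interim smart-case heuristic only in prose: scan the pattern text for upper-case characters and ignore any character that immediately follows a `\`. The following is a minimal sketch of that idea, not the code from the commit. The function names `has_unescaped_uppercase` and `smart_case_insensitive` are hypothetical, chosen here for illustration.

```rust
/// Returns true if `pattern` contains an upper-case character that is not
/// immediately preceded by a backslash.
///
/// Hypothetical helper: a sketch of the heuristic described in commit
/// 86c890bcec, not ripgrep's actual implementation.
fn has_unescaped_uppercase(pattern: &str) -> bool {
    let mut escaped = false;
    for c in pattern.chars() {
        if escaped {
            // This character directly follows a `\` (e.g. the `S` in `\S`),
            // so it does not count as a literal upper-case character.
            escaped = false;
            continue;
        }
        if c == '\\' {
            escaped = true;
            continue;
        }
        if c.is_uppercase() {
            return true;
        }
    }
    false
}

/// Smart case: search case-insensitively only when the pattern has no
/// unescaped upper-case character. (Hypothetical name.)
fn smart_case_insensitive(pattern: &str) -> bool {
    !has_unescaped_uppercase(pattern)
}
```

Under this sketch, `foo\w` and `\S` are searched case-insensitively, while `Foo` and `É` are not. A pattern such as `\p{Ll}` is still treated as containing an upper-case character (the `L` inside the braces), which matches the limitation the commit message acknowledges.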


52 Commits