ripgrep

mirror of https://github.com/BurntSushi/ripgrep.git synced 2024-12-12 19:18:24 +02:00

Author	SHA1	Message	Date
Marc Tiehuis	66efbad871	Add dfa-size-limit and regex-size-limit arguments Fixes #362.	2017-04-12 18:14:23 -04:00
Roman Proskuryakov	90a11dec5e	Add `-o/--only-matching` flag. Currently, the `--only-matching` flag conflicts with the `--replace` flag. In the future, this restriction may be relaxed. Fixes #34	2017-04-09 08:47:35 -04:00
Roman Proskuryakov	aed3ccb9c7	Improves Printer, fixes some bugs	2017-03-31 14:44:13 -04:00
Roman Proskuryakov	01deac9427	Add -0 shortcut for --null Fixes #419	2017-03-28 18:37:40 -04:00
Ralf Jung	d352b79294	Add new -M/--max-columns option. This permits setting the maximum line width with respect to the number of bytes in a line. Omitted lines (whether part of a match, replacement or context) are replaced with a message stating that the line was elided. Fixes #129	2017-03-12 21:21:28 -04:00
Andrew Gallant	8bbe58d623	Add support for additional text encodings. This includes, but is not limited to, UTF-16, latin-1, GBK, EUC-JP and Shift_JIS. (Courtesy of the `encoding_rs` crate.) Specifically, this feature enables ripgrep to search files that are encoded in an encoding other than UTF-8. The list of available encodings is tied directly to what the `encoding_rs` crate supports, which is in turn tied to the Encoding Standard. The full list of available encodings can be found here: https://encoding.spec.whatwg.org/#concept-encoding-get This pull request also introduces the notion that text encodings can be automatically detected on a best effort basis. Currently, the only support for this is checking for a UTF-16 BOM. In all other cases, a text encoding of `auto` (the default) implies a UTF-8 or ASCII compatible source encoding. When a text encoding is otherwise specified, it is unconditionally used for all files searched. Since ripgrep's regex engine is fundamentally built on top of UTF-8, this feature works by transcoding the files to be searched from their source encoding to UTF-8. This transcoding only happens when: 1. `auto` is specified and a non-UTF-8 encoding is detected. 2. A specific encoding is given by end users (including UTF-8). When transcoding occurs, errors are handled by automatically inserting the Unicode replacement character. In this case, ripgrep's output is guaranteed to be valid UTF-8 (excluding non-UTF-8 file paths, if they are printed). In all other cases, the source text is searched directly, which implies an assumption that it is at least ASCII compatible, but where UTF-8 is most useful. In this scenario, encoding errors are not detected. In this case, ripgrep's output will match the input exactly, byte-for-byte. This design may not be optimal in all cases, but it has some advantages: 1. The happy path ("UTF-8 everywhere") remains happy. I have not been able to witness any performance regressions. 2. In the non-UTF-8 path, implementation complexity is kept relatively low. The cost here is transcoding itself. A potentially superior implementation might build decoding of any encoding into the regex engine itself. In particular, the fundamental problem with transcoding everything first is that literal optimizations are nearly negated. Future work should entail improving the user experience. For example, we might want to auto-detect more text encodings. A more elaborate UX might permit end users to specify multiple text encodings, although this seems hard to pull off in an ergonomic way. Fixes #1	2017-03-12 19:54:48 -04:00
Andrew Gallant	6ecffec537	Fix test on Windows. (This is what I get for directly pushing to master.)	2017-03-12 16:07:31 -04:00
Andrew Gallant	80e91a1f1d	Fix leading slash bug when used with `!`. When writing paths like `!/foo` in gitignore files (or when using the -g/--glob flag), the presence of `!` would prevent the gitignore builder from noticing the leading slash, which causes absolute path matching to fail. Fixes #405	2017-03-12 15:51:17 -04:00
Marc Tiehuis	adff43fbb4	Remove clap validator + add max-filesize integration tests	2017-03-08 10:17:18 -05:00
tiehuis	714ae82241	Add `--max-filesize` option to cli The --max-filesize option allows filtering files which are larger than the specified limit. This is potentially useful if one is attempting to search a number of large files without common file-types/suffixes. See #369.	2017-03-08 10:17:18 -05:00
Marc Tiehuis	066f97d855	Add enclosing group to alternations in globs Fixes #391.	2017-03-08 10:13:28 -05:00
Andrew Gallant	7a951f103a	Make --column imply --line-number. Closes #243	2017-01-11 18:53:35 -05:00
Andrew Gallant	8751e55706	Add --path-separator flag. This flag permits setting the path separator used for all file paths printed by ripgrep in normal operation. Fixes #275	2017-01-10 18:16:15 -05:00
Andrew Gallant	97e6873b38	Fix type compose test.	2017-01-07 22:50:38 -05:00
Ian Kerins	ed01e80a79	Provide a mechanism to compose type definitions This extends the syntax of the --type-add flag to allow including the globs of other already defined types. Fixes #83.	2017-01-07 18:14:24 -05:00
Andrew Gallant	b65a8c353b	Add --sort-files flag. When used, parallelism is disabled but the results are sorted by file path. Closes #263	2017-01-06 22:43:59 -05:00
Andrew Gallant	bb70f96743	Fix a non-termination bug. This was a very silly bug. Instead of creating a particular atomic once and cloning it, we created a new value for each worker. Fixes #279	2016-12-12 06:55:49 -05:00
Andrew Gallant	d66812102b	Fix leading hyphen bug by updating clap. Fixes #270	2016-12-06 17:29:34 -05:00
Andrew Gallant	7282706b42	Fix bug reading root symlink. When given an explicit file path on the command line like `foo` where `foo` is a symlink, ripgrep should follow it even if `-L` isn't set. This is consistent with the behavior of `foo/`. Fixes #256	2016-12-05 20:05:57 -05:00
Andrew Gallant	0473df1ef5	Disable Unicode mode for literal regex. When ripgrep detects a literal, it emits them as raw hex escaped byte sequences to Regex::new. This permits literal optimizations for arbitrary byte sequences (i.e., possibly invalid UTF-8). The problem is that Regex::new interprets hex escaped byte sequences as Unicode codepoints by default, but we want them to actually stand for their raw byte values. Therefore, disable Unicode mode. This is OK, since the regex is composed entirely of literals and literal extraction does Unicode case folding. Fixes #251	2016-11-28 18:31:58 -05:00
Andrew Gallant	301a3fd71d	Detect more uppercase literals for --smart-case. This changes the uppercase literal detection for the "smart case" functionality. In particular, a character class is considered to have an uppercase literal if at least one of its ranges starts or stops with an uppercase literal. Fixes #229	2016-11-28 17:57:26 -05:00
Andrew Gallant	03f7605322	Rename --files-without-matches to --files-without-match. This is to be consistent with grep.	2016-11-19 20:15:41 -05:00
Daniel Luz	bd3e7eedb1	Add --files-without-matches flag. Performs the opposite of --files-with-matches: only shows paths of files that contain zero matches. Closes #138	2016-11-19 21:48:59 -02:00
Andrew Gallant	e37f783fc0	Fix issue number mixup. Thanks @bluss!	2016-11-17 20:30:18 -05:00
Andrew Gallant	92dc402f7f	Switch from Docopt to Clap. There were two important reasons for the switch: 1. Performance. Docopt does poorly when the argv becomes large, which is a reasonably common use case for search tools. (e.g., use with xargs) 2. Better failure modes. Clap knows a lot more about how a particular argv might be invalid, and can therefore provide much clearer error messages. While both were important, (1) made it urgent. Note that since Clap requires at least Rust 1.11, this will in turn increase the minimum Rust version supported by ripgrep from Rust 1.9 to Rust 1.11. It is therefore a breaking change, so the soonest release of ripgrep with Clap will have to be 0.3. There is also at least one subtle breaking change in real usage. Previous to this commit, this used to work: rg -e -foo Where this would cause ripgrep to search for the string `-foo`. Clap currently has problems supporting this use case (see: https://github.com/kbknapp/clap-rs/issues/742), but it can be worked around by using this instead: rg -e [-]foo or even rg [-]foo and this still works: rg -- -foo This commit also adds Bash, Fish and PowerShell completion files to the release, fixes a bug that prevented ripgrep from working on file paths containing invalid UTF-8 and shows short descriptions in the output of `-h` but longer descriptions in the output of `--help`. Fixes #136, Fixes #189, Fixes #210, Fixes #230	2016-11-17 19:53:41 -05:00
Eric Kidd	e9cd0a1cc3	Allow specifying patterns with `-f FILE` and `-f-` This is a somewhat basic implementation of `-f-` (#7), with unit tests. Changes include: 1. The internals of the `pattern` function have been refactored to avoid code duplication, but there's a lot more we could do. Right now we read the entire pattern list into a `Vec`. 2. There's now a `WorkDir::pipe` command that allows sending standard input to `rg` when testing. Not implemented: aho-corasick.	2016-11-15 13:00:16 -05:00
Andrew Gallant	4b18f82899	Disable symlink tests on Windows. For some reason, these work on AppVeyor but not in other build systems. Let's just disable them. See: https://github.com/rust-lang/rust/pull/37149	2016-11-11 06:44:23 -05:00
Andrew Gallant	2dce0dc0df	Fix a bug with handling --ignore-file. Namely, passing a directory to --ignore-file caused ripgrep to allocate memory without bound. The issue was that I got a bit overzealous with partial error reporting. Namely, when processing a gitignore file, we should try to use every pattern even if some patterns are invalid globs (e.g., a**b). In the process, I applied the same logic to I/O errors. In this case, it manifested by attempting to read lines from a directory, which appears to yield Results forever, where each Result is an error of the form "you can't read from a directory silly." Since I treated it as a partial error, ripgrep was just spinning and accruing each error in memory, which caused the OOM killer to kick in. Fixes #228	2016-11-09 16:45:23 -05:00
Andrew Gallant	58aca2efb2	Add -m/--max-count flag. This flag limits the number of matches printed per file. Closes #159	2016-11-06 13:09:53 -05:00
Andrew Gallant	0222e024fe	Fixes a bug with --smart-case. This was a subtle bug, but the big picture was that the smart case information wasn't being carried through to the literal extraction in some cases. When this happened, it was possible to get back an incomplete set of literals, which would therefore miss some valid matches. The fix to this is to actually parse the regex and determine whether smart case applies before doing anything else. It's a little extra work, but parsing is pretty fast. Fixes #199	2016-11-06 12:07:47 -05:00
Andre Bogus	02de97b8ce	Use the bytecount crate for fast line counting. Fixes #128	2016-11-05 22:29:26 -04:00
Andrew Gallant	16975797fe	Fixes a matching bug in the glob override matcher. This was probably a transcription error when moving the ignore matcher code out of ripgrep core. Specifically, the override glob matcher should not ignore directories if they don't match. Fixes #206	2016-10-31 19:54:38 -04:00
Andrew Gallant	d79add341b	Move all gitignore matching to separate crate. This PR introduces a new sub-crate, `ignore`, which primarily provides a fast recursive directory iterator that respects ignore files like gitignore and other configurable filtering rules based on globs or even file types. This results in a substantial source of complexity moved out of ripgrep's core and into a reusable component that others can now (hopefully) benefit from. While much of the ignore code carried over from ripgrep's core, a substantial portion of it was rewritten with the following goals in mind: 1. Reuse matchers built from gitignore files across directory iteration. 2. Design the matcher data structure to be amenable for parallelizing directory iteration. (Indeed, writing the parallel iterator is the next step.) Fixes #9, #44, #45	2016-10-29 20:48:59 -04:00
Andrew Gallant	f2e1711781	Fix bug when processing parent gitignore files. This particular bug was triggered whenever a search was run in a directory with a parent directory that contains a relevant .gitignore file. In particular, before matching against a parent directory's gitignore rules, a path's leading `./` was not stripped, which results in errant matching. We now make sure `./` is stripped. Fixes #184.	2016-10-16 10:15:11 -04:00
Andrew Gallant	4737326ed3	Update regex-syntax for bug fix. The bug fix was in expression pretty printing. ripgrep parses the regex into an AST and may do some modifications to it, which requires the ability to go from string -> AST -> string' -> AST' where string == string' implies AST == AST'. Also, add a regression test for the specific regex that tripped the bug. Fixes #156.	2016-10-10 22:04:29 -04:00
Andrew Gallant	a3537aa32a	Update darwin cfg attributes.	2016-10-10 21:48:47 -04:00
Andrew Gallant	4e52059ad6	Disable regression_131 test on darwin. It's not clear why it's failing. Maybe it doesn't permit certain characters in file paths?	2016-10-10 21:03:11 -04:00
Andrew Gallant	27a980c1bc	Fix symlink test. We attempt to run it on Windows, but I'm getting "access denied" errors when trying to create a file symlink. So we disable the test on Windows.	2016-10-10 19:34:57 -04:00
Andrew Gallant	e8645dc8ae	style nits	2016-10-10 19:27:12 -04:00
Andrew Gallant	e96d93034a	Finish overhaul of glob matching. This commit completes the initial move of glob matching to an external crate, including fixing up cross platform support, polishing the external crate for others to use and fixing a number of bugs in the process. Fixes #87, #127, #131	2016-10-10 19:24:18 -04:00
Ian Kerins	1c964372ad	Always follow symlinks on explicit file arguments.	2016-10-08 22:40:03 -04:00
Andrew Gallant	175406df01	Refactor and test glob sets. This commit goes a long way toward refactoring glob sets so that the code is easier to maintain going forward. In particular, it makes the literal optimizations that glob sets used a lot more structured and much easier to extend. Tests have also been modified to include glob sets. There's still a bit of polish work left to do before a release. This also fixes the immediate issue where large gitignore files were causing ripgrep to slow way down. While we don't technically fix it for good, we're a lot better about reducing the number of regexes we compile. In particular, if a gitignore file contains thousands of patterns that can't be matched more simply using literals, then ripgrep will slow down again. We could fix this for good by avoiding RegexSet if the number of regexes grows too large. Fixes #134.	2016-10-04 20:28:56 -04:00
Andrew Gallant	925d0db9f0	Add -s/--case-sensitive flag. This flag overrides both --smart-case and --ignore-case. Closes #124.	2016-09-28 16:32:29 -04:00
Garrett Squire	babe80d498	add a max-depth option for directory traversal CR and add integration test	2016-09-27 16:14:53 -07:00
Andrew Gallant	3e78fce3a3	Don't print empty lines in single threaded mode. Fixes #99.	2016-09-26 19:57:23 -04:00
Andrew Gallant	7a3fd1f23f	Add a --null flag. This flag causes a NUL byte to follow any file path in ripgrep's output. Closes #89.	2016-09-26 19:21:17 -04:00
Andrew Gallant	d306403440	Fix an off-by-one error with --column. Fixes #105.	2016-09-26 19:09:59 -04:00
Andrew Gallant	b034b77798	Don't replace NUL bytes when searching binary files as text. This was a result of misinterpreting a feature in grep where NUL bytes are replaced with \n. The primary reason for doing this is to avoid excessive memory usage on truly binary data. However, grep only does this when searching binary files as if they were binary, and which only reports whether the file matched or not. When grep is told to search binary data as text (the -a/--text flag), then it doesn't do any replacement so we shouldn't either. In general, this makes sense, because the user is essentially asserting that a particular file that looks like binary is actually text. In that case, we shouldn't try to replace any NUL bytes. ripgrep doesn't actually support searching binary data for whether it matches or not, so we don't actually need the replace_buf function. However, it does seem like a potentially useful feature.	2016-09-25 21:26:49 -04:00
Andrew Gallant	6a8051b258	Don't union inner literals of repetitions. If we do, this results in extracting `foofoofoo` from `(\wfoo){3}`, which is wrong. This does prevent us from extracting `foofoofoo` from `foo{3}`, which is unfortunate, but we miss plenty of other stuff too. Literal extracting needs a good rethink (all the way down into the regex engine). Fixes #93	2016-09-25 20:10:28 -04:00
Andrew Gallant	ed94aedf27	Permit whitelisting hidden files in ignores. Fixes #90	2016-09-25 18:31:41 -04:00
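
The transcoding rule described in commit 8bbe58d623 (transcode only when `auto` detects a UTF-16 BOM, or when a specific encoding is requested) can be sketched roughly as follows. This is a minimal illustration using only the standard library. The names `Plan`, `sniff_utf16_bom` and `plan` are hypothetical and are not taken from ripgrep's source, and the actual transcoding step (done via `encoding_rs` in the commit) is left out.

```rust
/// Hypothetical outcome type for this sketch; not a type from ripgrep.
#[derive(Debug, PartialEq)]
enum Plan {
    /// Search the bytes as-is, assuming a UTF-8 or ASCII compatible source.
    SearchDirectly,
    /// Transcode from the named encoding to UTF-8 before searching.
    Transcode(String),
}

/// Report which UTF-16 variant, if any, is indicated by a byte order mark
/// at the very start of `bytes`.
fn sniff_utf16_bom(bytes: &[u8]) -> Option<&'static str> {
    match bytes {
        [0xFF, 0xFE, ..] => Some("UTF-16LE"),
        [0xFE, 0xFF, ..] => Some("UTF-16BE"),
        _ => None,
    }
}

/// Decide how to handle a file given the requested encoding label.
/// The label "auto" stands for the default described in the commit message.
fn plan(requested: &str, bytes: &[u8]) -> Plan {
    if requested == "auto" {
        // Case 1: `auto` transcodes only if a non-UTF-8 encoding is detected.
        match sniff_utf16_bom(bytes) {
            Some(enc) => Plan::Transcode(enc.to_string()),
            None => Plan::SearchDirectly,
        }
    } else {
        // Case 2: an explicit encoding (even UTF-8) is used unconditionally.
        Plan::Transcode(requested.to_string())
    }
}
```

Under these assumptions, `plan("auto", b"hello")` yields `Plan::SearchDirectly`, while `plan("utf-8", b"hello")` still yields a transcode plan, which mirrors the "including UTF-8" remark in the commit message.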

71 Commits