ripgrep

mirror of https://github.com/BurntSushi/ripgrep.git synced 2024-12-12 19:18:24 +02:00

Author	SHA1	Message	Date
Andrew Gallant	8bbe58d623	Add support for additional text encodings. This includes, but is not limited to, UTF-16, latin-1, GBK, EUC-JP and Shift_JIS. (Courtesy of the `encoding_rs` crate.) Specifically, this feature enables ripgrep to search files that are encoded in an encoding other than UTF-8. The list of available encodings is tied directly to what the `encoding_rs` crate supports, which is in turn tied to the Encoding Standard. The full list of available encodings can be found here: https://encoding.spec.whatwg.org/#concept-encoding-get This pull request also introduces the notion that text encodings can be automatically detected on a best effort basis. Currently, the only support for this is checking for a UTF-16 bom. In all other cases, a text encoding of `auto` (the default) implies a UTF-8 or ASCII compatible source encoding. When a text encoding is otherwise specified, it is unconditionally used for all files searched. Since ripgrep's regex engine is fundamentally built on top of UTF-8, this feature works by transcoding the files to be searched from their source encoding to UTF-8. This transcoding only happens when: 1. `auto` is specified and a non-UTF-8 encoding is detected. 2. A specific encoding is given by end users (including UTF-8). When transcoding occurs, errors are handled by automatically inserting the Unicode replacement character. In this case, ripgrep's output is guaranteed to be valid UTF-8 (excluding non-UTF-8 file paths, if they are printed). In all other cases, the source text is searched directly, which implies an assumption that it is at least ASCII compatible, but where UTF-8 is most useful. In this scenario, encoding errors are not detected. In this case, ripgrep's output will match the input exactly, byte-for-byte. This design may not be optimal in all cases, but it has some advantages: 1. In the happy path ("UTF-8 everywhere") remains happy. I have not been able to witness any performance regressions. 2. In the non-UTF-8 path, implementation complexity is kept relatively low. The cost here is transcoding itself. A potentially superior implementation might build decoding of any encoding into the regex engine itself. In particular, the fundamental problem with transcoding everything first is that literal optimizations are nearly negated. Future work should entail improving the user experience. For example, we might want to auto-detect more text encodings. A more elaborate UX experience might permit end users to specify multiple text encodings, although this seems hard to pull off in an ergonomic way. Fixes #1	2017-03-12 19:54:48 -04:00
Marc Tiehuis	adff43fbb4	Remove clap validator + add max-filesize integration tests	2017-03-08 10:17:18 -05:00
tiehuis	714ae82241	Add `--max-filesize` option to cli The --max-filesize option allows filtering files which are larger than the specified limit. This is potentially useful if one is attempting to search a number of large files without common file-types/suffixes. See #369.	2017-03-08 10:17:18 -05:00
Andrew Gallant	16de47920c	Permit --heading to override --no-heading. @kbknapp <3 Fixes #327	2017-02-18 15:25:08 -05:00
Andrew Gallant	7a951f103a	Make --column imply --line-number. Closes #243	2017-01-11 18:53:35 -05:00
Andrew Gallant	8751e55706	Add --path-separator flag. This flag permits setting the path separator used for all file paths printed by ripgrep in normal operation. Fixes #275	2017-01-10 18:16:15 -05:00
Andrew Gallant	2143bcf9cb	Add example to -r/--replace docs. Fixes #308	2017-01-10 16:43:28 -05:00
Ian Kerins	ed01e80a79	Provide a mechanism to compose type definitions This extends the syntax of the --type-add flag to allow including the globs of other already defined types. Fixes #83.	2017-01-07 18:14:24 -05:00
Andrew Gallant	851799f42b	Fix spacing issue in --help output.	2017-01-06 22:45:12 -05:00
Andrew Gallant	b65a8c353b	Add --sort-files flag. When used, parallelism is disabled but the results are sorted by file path. Closes #263	2017-01-06 22:43:59 -05:00
Andrew Gallant	b187c1a817	Rejigger bold and intense settings. Previously, ripgrep would only emit the 'bold' ANSI escape sequence if no foreground or background color was set. Instead, it would convert colors to their "intense" versions if bold was set. The intent was to do the same thing on Windows and Unix. However, this had a few negative side effects: 1. Omitting the 'bold' ANSI escape when 'bold' was set is surprising. 2. Intense colors can look quite bad and be hard to read. To fix this, we introduce a new setting called 'intense' in the --colors flag, and thread that down through to the public API of the `termcolor` crate. The 'intense' setting has environment specific behavior: 1. In ANSI mode, it will convert the selected color to its "intense" variant. 2. In the Windows console, it will make the text "intense." There is no longer any "smart" handling of the 'bold' style. The 'bold' ANSI escape is always emitted when it is selected. In the Windows console, the 'bold' setting now has no effect. Note that this is a breaking change. Fixes #266, #293	2017-01-06 20:09:51 -05:00
Andrew Gallant	8396d3ffaa	Make backreference support clear. Fixes #268.	2016-12-12 07:03:37 -05:00
Andrew Gallant	d66812102b	Fix leading hypen bug by updating clap. Fixes #270	2016-12-06 17:29:34 -05:00
Andrew Gallant	d12bdf35a5	Clarify use of --heading/--no-heading. Fixes #247.	2016-11-28 17:40:44 -05:00
Andrew Gallant	e8a30cb893	Completely re-work colored output and tty handling. This commit completely guts all of the color handling code and replaces most of it with two new crates: wincolor and termcolor. wincolor provides a simple API to coloring using the Windows console and termcolor provides a platform independent coloring API tuned for multithreaded command line programs. This required a lot more flexibility than what the `term` crate provided, so it was dropped. We instead switch to writing ANSI escape sequences directly and ignore the TERMINFO database. In addition to fixing several bugs, this commit also permits end users to customize colors to a certain extent. For example, this command will set the match color to magenta and the line number background to yellow: rg --colors 'match:fg:magenta' --colors 'line:bg:yellow' foo For tty handling, we've adopted a hack from `git` to do tty detection in MSYS/mintty terminals. As a result, ripgrep should get both color detection and piping correct on Windows regardless of which terminal you use. Finally, switch to line buffering. Performance doesn't seem to be impacted and it's an otherwise more user friendly option. Fixes #37, Fixes #51, Fixes #94, Fixes #117, Fixes #182, Fixes #231	2016-11-20 11:14:52 -05:00
Andrew Gallant	03f7605322	Rename --files-without-matches to --files-without-match. This is to be consistent with grep.	2016-11-19 20:15:41 -05:00
Daniel Luz	bd3e7eedb1	Add --files-without-matches flag. Performs the opposite of --files-with-matches: only shows paths of files that contain zero matches. Closes #138	2016-11-19 21:48:59 -02:00
Andrew Gallant	92dc402f7f	Switch from Docopt to Clap. There were two important reasons for the switch: 1. Performance. Docopt does poorly when the argv becomes large, which is a reasonable common use case for search tools. (e.g., use with xargs) 2. Better failure modes. Clap knows a lot more about how a particular argv might be invalid, and can therefore provide much clearer error messages. While both were important, (1) made it urgent. Note that since Clap requires at least Rust 1.11, this will in turn increase the minimum Rust version supported by ripgrep from Rust 1.9 to Rust 1.11. It is therefore a breaking change, so the soonest release of ripgrep with Clap will have to be 0.3. There is also at least one subtle breaking change in real usage. Previous to this commit, this used to work: rg -e -foo Where this would cause ripgrep to search for the string `-foo`. Clap currently has problems supporting this use case (see: https://github.com/kbknapp/clap-rs/issues/742), but it can be worked around by using this instead: rg -e [-]foo or even rg [-]foo and this still works: rg -- -foo This commit also adds Bash, Fish and PowerShell completion files to the release, fixes a bug that prevented ripgrep from working on file paths containing invalid UTF-8 and shows short descriptions in the output of `-h` but longer descriptions in the output of `--help`. Fixes #136, Fixes #189, Fixes #210, Fixes #230	2016-11-17 19:53:41 -05:00

18 Commits