ripgrep

mirror of https://github.com/BurntSushi/ripgrep.git synced 2024-12-12 19:18:24 +02:00

Author	SHA1	Message	Date
Andrew Gallant	8bbe58d623	Add support for additional text encodings. This includes, but is not limited to, UTF-16, latin-1, GBK, EUC-JP and Shift_JIS. (Courtesy of the `encoding_rs` crate.) Specifically, this feature enables ripgrep to search files that are encoded in an encoding other than UTF-8. The list of available encodings is tied directly to what the `encoding_rs` crate supports, which is in turn tied to the Encoding Standard. The full list of available encodings can be found here: https://encoding.spec.whatwg.org/#concept-encoding-get This pull request also introduces the notion that text encodings can be automatically detected on a best effort basis. Currently, the only support for this is checking for a UTF-16 BOM. In all other cases, a text encoding of `auto` (the default) implies a UTF-8 or ASCII compatible source encoding. When a text encoding is otherwise specified, it is unconditionally used for all files searched. Since ripgrep's regex engine is fundamentally built on top of UTF-8, this feature works by transcoding the files to be searched from their source encoding to UTF-8. This transcoding only happens when: 1. `auto` is specified and a non-UTF-8 encoding is detected. 2. A specific encoding is given by end users (including UTF-8). When transcoding occurs, errors are handled by automatically inserting the Unicode replacement character. In this case, ripgrep's output is guaranteed to be valid UTF-8 (excluding non-UTF-8 file paths, if they are printed). In all other cases, the source text is searched directly, which implies an assumption that it is at least ASCII compatible, but where UTF-8 is most useful. In this scenario, encoding errors are not detected. In this case, ripgrep's output will match the input exactly, byte-for-byte. This design may not be optimal in all cases, but it has some advantages: 1. The happy path ("UTF-8 everywhere") remains happy. I have not been able to witness any performance regressions. 2. In the non-UTF-8 path, implementation complexity is kept relatively low. The cost here is transcoding itself. A potentially superior implementation might build decoding of any encoding into the regex engine itself. In particular, the fundamental problem with transcoding everything first is that literal optimizations are nearly negated. Future work should entail improving the user experience. For example, we might want to auto-detect more text encodings. A more elaborate UX might permit end users to specify multiple text encodings, although this seems hard to pull off in an ergonomic way. Fixes #1	2017-03-12 19:54:48 -04:00
Andrew Gallant	7c37065911	update deps	2017-03-08 20:23:12 -05:00
Andrew Gallant	4e8c0fc4ad	bump clap to 2.20.5 Fixes #383	2017-02-25 18:43:13 -05:00
Andrew Gallant	a114b86063	update termcolor dep	2017-02-18 15:09:25 -05:00
Andrew Gallant	d825648b86	Remove lazy_static from globset	2017-02-12 15:37:50 -05:00
Andrew Gallant	fecef10c1c	update deps	2017-01-17 19:36:23 -05:00
Andrew Gallant	f5a2d022ec	Replace internal atty module with atty crate. This removes all use of explicit unsafe in ripgrep proper except for one: accessing the contents of a memory map. (Which may never go away.)	2017-01-15 16:32:30 -05:00
Andrew Gallant	057ed6305a	0.4.0	2017-01-13 23:46:21 -05:00
Andrew Gallant	a7ca2d6563	update same-file dep	2017-01-13 20:01:06 -05:00
Andrew Gallant	c3de1f58ea	another bytecount update, weird	2017-01-10 21:13:40 -05:00
Andrew Gallant	e940bc956d	update bytecount Fixes #313	2017-01-10 18:30:16 -05:00
Andrew Gallant	461e0c4e33	Don't search stdout redirected file. When running ripgrep like this: rg foo > output we must be careful not to search `output` since ripgrep is actively writing to it. Searching it can cause massive blowups where the file grows without bound. While this is conceptually easy to fix (check the inode of the redirection and the inode of the file you're about to search), there are a few problems with it. First, inodes are a Unix thing, so we need a Windows specific solution to this as well. To resolve this concern, I created a new crate, `same-file`, which provides a cross platform abstraction. Second, stat'ing every file is costly. This is not avoidable on Windows, but on Unix, we can get the inode number directly from directory traversal. However, this information wasn't exposed, but now it is (through both the ignore and walkdir crates). Fixes #286	2017-01-09 16:12:08 -05:00
Andrew Gallant	aed315e80a	bump deps	2017-01-03 07:27:51 -05:00
Andrew Gallant	163e00677a	Update to regex 0.2.	2017-01-01 01:03:21 -05:00
Andrew Gallant	d58236fbdc	bump various versions	2016-12-30 15:44:08 -05:00
Andrew Gallant	b65bb37b14	Remove superfluous memmap dependency in `grep` crate. Fixes #295.	2016-12-27 15:46:40 -05:00
Andrew Gallant	de5cb7d22e	Remove special ^C handling. This means that ripgrep will no longer try to reset your colors in your terminal if you kill it while searching. This could result in messing up the colors in your terminal, and the fix is to simply run some other command that resets them for you. For example: $ echo -ne "\033[0m" The reason why the ^C handling was removed is because it is irrevocably broken on Windows and is impossible to do correctly and efficiently in ANSI terminals. Fixes #281	2016-12-24 12:53:09 -05:00
Andrew Gallant	82ceb818f3	update deps	2016-12-24 08:32:32 -05:00
Lilian Anatolie Moraru	cbacf4f19e	Update Cargo.lock to bring in clap 2.19.2 fix for ZSH completions. (#287)	2016-12-23 06:47:55 -05:00
Andrew Gallant	de33003527	0.3.2	2016-12-07 10:59:06 -05:00
Andrew Gallant	d66812102b	Fix leading hyphen bug by updating clap. Fixes #270	2016-12-06 17:29:34 -05:00
Andrew Gallant	86f8c3c818	update Cargo.lock	2016-12-05 20:15:45 -05:00
Andrew Gallant	7282706b42	Fix bug reading root symlink. When given an explicit file path on the command line like `foo` where `foo` is a symlink, ripgrep should follow it even if `-L` isn't set. This is consistent with the behavior of `foo/`. Fixes #256	2016-12-05 20:05:57 -05:00
Andrew Gallant	c4a6733f3b	0.3.1	2016-11-21 20:53:52 -05:00
Andrew Gallant	05b26d5986	bump termcolor	2016-11-21 20:33:57 -05:00
Andrew Gallant	aef46beaf2	0.3.0	2016-11-20 16:07:25 -05:00
Andrew Gallant	e8a30cb893	Completely re-work colored output and tty handling. This commit completely guts all of the color handling code and replaces most of it with two new crates: wincolor and termcolor. wincolor provides a simple API to coloring using the Windows console and termcolor provides a platform independent coloring API tuned for multithreaded command line programs. This required a lot more flexibility than what the `term` crate provided, so it was dropped. We instead switch to writing ANSI escape sequences directly and ignore the TERMINFO database. In addition to fixing several bugs, this commit also permits end users to customize colors to a certain extent. For example, this command will set the match color to magenta and the line number background to yellow: rg --colors 'match:fg:magenta' --colors 'line:bg:yellow' foo For tty handling, we've adopted a hack from `git` to do tty detection in MSYS/mintty terminals. As a result, ripgrep should get both color detection and piping correct on Windows regardless of which terminal you use. Finally, switch to line buffering. Performance doesn't seem to be impacted and it's an otherwise more user friendly option. Fixes #37, Fixes #51, Fixes #94, Fixes #117, Fixes #182, Fixes #231	2016-11-20 11:14:52 -05:00
Andrew Gallant	92dc402f7f	Switch from Docopt to Clap. There were two important reasons for the switch: 1. Performance. Docopt does poorly when the argv becomes large, which is a reasonably common use case for search tools. (e.g., use with xargs) 2. Better failure modes. Clap knows a lot more about how a particular argv might be invalid, and can therefore provide much clearer error messages. While both were important, (1) made it urgent. Note that since Clap requires at least Rust 1.11, this will in turn increase the minimum Rust version supported by ripgrep from Rust 1.9 to Rust 1.11. It is therefore a breaking change, so the soonest release of ripgrep with Clap will have to be 0.3. There is also at least one subtle breaking change in real usage. Previous to this commit, this used to work: rg -e -foo Where this would cause ripgrep to search for the string `-foo`. Clap currently has problems supporting this use case (see: https://github.com/kbknapp/clap-rs/issues/742), but it can be worked around by using this instead: rg -e [-]foo or even rg [-]foo and this still works: rg -- -foo This commit also adds Bash, Fish and PowerShell completion files to the release, fixes a bug that prevented ripgrep from working on file paths containing invalid UTF-8 and shows short descriptions in the output of `-h` but longer descriptions in the output of `--help`. Fixes #136, Fixes #189, Fixes #210, Fixes #230	2016-11-17 19:53:41 -05:00
Andrew Gallant	5462af4434	Pin rustc-serialize to 0.3.19. See: https://github.com/rust-lang-nursery/rustc-serialize/pull/159	2016-11-09 20:28:58 -05:00
Andrew Gallant	d2e70da040	0.2.9	2016-11-09 19:07:25 -05:00
Andrew Gallant	64dc9b6709	update deps	2016-11-09 18:54:22 -05:00
Andrew Gallant	18943b9317	0.2.8	2016-11-06 16:16:48 -05:00
Andrew Gallant	2daef51fe5	0.2.7	2016-11-06 15:49:25 -05:00
Andrew Gallant	dada75d2a7	Update sub-crate dependency versions.	2016-11-06 15:48:40 -05:00
Andrew Gallant	5bd0edbbe1	Actually use simd/avx optimizations in bytecount crate. Also update compile script.	2016-11-05 22:44:33 -04:00
Andre Bogus	02de97b8ce	Use the bytecount crate for fast line counting. Fixes #128	2016-11-05 22:29:26 -04:00
Andrew Gallant	b272be25fa	Add parallel recursive directory iterator. This adds a new walk type in the `ignore` crate, `WalkParallel`, which provides a way for recursively iterating over a set of paths in parallel while respecting various ignore rules. The API is a bit strange, as a closure producing a closure isn't something one often sees, but it does seem to work well. This also allowed us to simplify much of the worker logic in ripgrep proper, where MultiWorker is now gone.	2016-11-05 21:45:55 -04:00
Andrew Gallant	1aeae3e22d	update ripgrep	2016-11-04 21:12:08 -04:00
Andrew Gallant	d85a6dd5c8	update ignore dependency	2016-10-31 20:01:31 -04:00
Andrew Gallant	c8e2fa1869	update Cargo.lock	2016-10-31 19:54:38 -04:00
Andrew Gallant	1aae2759ad	update deps	2016-10-29 22:27:29 -04:00
Brian Campbell	79a8d0ab3f	Reset the terminal when Ctrl-C is pressed If a user hits Ctrl-C to exit out of a search in the middle of printing a line, we don't want to leave the terminal colors screwed up for them. Catch Ctrl-C using the ctrlc crate, obtain a stdout lock to ensure that other threads don't continue writing after we do so, reset the terminal, and exit the program. Closes #119	2016-10-29 21:23:05 -04:00
Andrew Gallant	d79add341b	Move all gitignore matching to separate crate. This PR introduces a new sub-crate, `ignore`, which primarily provides a fast recursive directory iterator that respects ignore files like gitignore and other configurable filtering rules based on globs or even file types. This results in a substantial source of complexity moved out of ripgrep's core and into a reusable component that others can now (hopefully) benefit from. While much of the ignore code carried over from ripgrep's core, a substantial portion of it was rewritten with the following goals in mind: 1. Reuse matchers built from gitignore files across directory iteration. 2. Design the matcher data structure to be amenable for parallelizing directory iteration. (Indeed, writing the parallel iterator is the next step.) Fixes #9, #44, #45	2016-10-29 20:48:59 -04:00
Andrew Gallant	94d600e6e1	Update deps.	2016-10-16 10:12:49 -04:00
Andrew Gallant	247a9398f4	Switch to thread_local crate in lieu of thread_local!. This is to work around a bug where using a thread_local! was causing a segfault on macos. Fixes #164.	2016-10-11 18:23:49 -04:00
Andrew Gallant	4737326ed3	Update regex-syntax for bug fix. The bug fix was in expression pretty printing. ripgrep parses the regex into an AST and may do some modifications to it, which requires the ability to go from string -> AST -> string' -> AST' where string == string' implies AST == AST'. Also, add a regression test for the specific regex that tripped the bug. Fixes #156.	2016-10-10 22:04:29 -04:00
Andrew Gallant	e96d93034a	Finish overhaul of glob matching. This commit completes the initial move of glob matching to an external crate, including fixing up cross platform support, polishing the external crate for others to use and fixing a number of bugs in the process. Fixes #87, #127, #131	2016-10-10 19:24:18 -04:00
Andrew Gallant	175406df01	Refactor and test glob sets. This commit goes a long way toward refactoring glob sets so that the code is easier to maintain going forward. In particular, it makes the literal optimizations that glob sets used a lot more structured and much easier to extend. Tests have also been modified to include glob sets. There's still a bit of polish work left to do before a release. This also fixes the immediate issue where large gitignore files were causing ripgrep to slow way down. While we don't technically fix it for good, we're a lot better about reducing the number of regexes we compile. In particular, if a gitignore file contains thousands of patterns that can't be matched more simply using literals, then ripgrep will slow down again. We could fix this for good by avoiding RegexSet if the number of regexes grows too large. Fixes #134.	2016-10-04 20:28:56 -04:00
Andrew Gallant	fdf24317ac	Move glob implementation to new crate. It is isolated and complex enough that it deserves attention all on its own. It's also eminently reusable.	2016-09-30 19:42:41 -04:00
Andrew Gallant	316ffd87b3	bump docopt to 0.6.86	2016-09-28 15:56:59 -04:00
Andrew Gallant	de79be2db2	0.2.1	2016-09-26 20:02:58 -04:00
Andrew Gallant	b1c52b52d6	0.2.0	2016-09-25 22:32:14 -04:00
Andrew Gallant	109bc3f78e	bump grep to 0.1.3	2016-09-25 22:30:17 -04:00
Andrew Gallant	af4dc78537	Update to docopt 0.6.85. The new version won't panic if printing to stdout fails. Fixes #22.	2016-09-24 19:14:19 -04:00
Andrew Gallant	b33e9cba69	0.1.17	2016-09-23 11:26:23 -04:00
Andrew Gallant	25c259112b	0.1.16	2016-09-22 21:32:41 -04:00
Andrew Gallant	2115774c6e	0.1.15	2016-09-22 19:20:11 -04:00
Andrew Gallant	1b14e245be	0.1.14	2016-09-22 17:48:49 -04:00
Andrew Gallant	263e2b012f	0.1.13	2016-09-21 21:07:40 -04:00
Andrew Gallant	525d051172	0.1.12	2016-09-21 20:47:44 -04:00
Andrew Gallant	fe84928c85	0.1.11	2016-09-21 19:37:37 -04:00
Andrew Gallant	c1c92e4fee	0.1.10	2016-09-21 19:27:16 -04:00
Andrew Gallant	b0d8ff6f4a	0.1.9	2016-09-21 16:41:28 -04:00
Andrew Gallant	0263a401f6	0.1.8	2016-09-21 07:08:37 -04:00
Andrew Gallant	f9bff90842	0.1.7	2016-09-20 22:13:49 -04:00
Andrew Gallant	9e2f10b893	0.1.6	2016-09-20 20:25:51 -04:00
Andrew Gallant	e7fb0fd267	0.1.5	2016-09-19 21:56:00 -04:00
Andrew Gallant	6cb604f38f	0.1.3	2016-09-17 12:55:09 -04:00
Andrew Gallant	8f87a4e8ac	0.1.2	2016-09-17 11:36:11 -04:00
Andrew Gallant	d27d3e675f	bump grep	2016-09-17 11:34:27 -04:00
Andrew Gallant	e9ec52b7f9	Update walkdir	2016-09-16 17:56:44 -04:00
Andrew Gallant	0d14c74e63	Some minor performance tweaks. This includes moving basename-only globs into separate regexes. The hope is that if the regex processes less input, it will be faster.	2016-09-16 16:13:28 -04:00
Andrew Gallant	0e46171e3b	Rework glob sets. We try to reduce the pressure on regexes and offload some of it to Aho-Corasick or exact lookups.	2016-09-15 22:06:04 -04:00
Andrew Gallant	c24f8fd50f	Replace crossbeam with deque. deque appears faster.	2016-09-14 07:40:46 -04:00
Andrew Gallant	4212a8b9cb	0.1.1	2016-09-13 21:21:45 -04:00
Andrew Gallant	983c7fd6f9	We don't use thread_local any more, so remove it.	2016-09-13 21:21:36 -04:00
Andrew Gallant	cf3a33cea7	commit Cargo.lock	2016-09-11 19:06:05 -04:00
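
The transcoding rule described in commit 8bbe58d623 (search the raw bytes unless a UTF-16 BOM is found, and substitute the Unicode replacement character on decode errors) can be illustrated with a small sketch. This is not ripgrep's code: the commit says ripgrep relies on the `encoding_rs` crate, whereas this sketch uses only the Rust standard library, and the names `BomEncoding`, `sniff_utf16_bom` and `to_searchable` are hypothetical, introduced purely for illustration.

```rust
use std::borrow::Cow;

/// UTF-16 flavours that can be recognised from a byte-order mark.
/// (Hypothetical type for this sketch; not part of ripgrep.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum BomEncoding {
    Utf16Le,
    Utf16Be,
}

/// Return the UTF-16 flavour indicated by a leading BOM, if any.
pub fn sniff_utf16_bom(bytes: &[u8]) -> Option<BomEncoding> {
    match bytes {
        [0xFF, 0xFE, ..] => Some(BomEncoding::Utf16Le),
        [0xFE, 0xFF, ..] => Some(BomEncoding::Utf16Be),
        _ => None,
    }
}

/// Produce the bytes to search, mimicking the `auto` behaviour:
/// without a BOM the input is returned untouched (byte-for-byte);
/// with a BOM it is transcoded to UTF-8, and malformed sequences
/// become U+FFFD.
pub fn to_searchable(bytes: &[u8]) -> Cow<'_, [u8]> {
    let enc = match sniff_utf16_bom(bytes) {
        Some(enc) => enc,
        None => return Cow::Borrowed(bytes),
    };
    // Skip the two BOM bytes and read the rest as 16-bit code units.
    let units = bytes[2..].chunks(2).map(move |pair| match pair {
        [a, b] => match enc {
            BomEncoding::Utf16Le => u16::from_le_bytes([*a, *b]),
            BomEncoding::Utf16Be => u16::from_be_bytes([*a, *b]),
        },
        // A trailing odd byte cannot form a code unit; emit a lone
        // surrogate so the decoder reports it as an error.
        _ => 0xD800,
    });
    let text: String = char::decode_utf16(units)
        .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect();
    Cow::Owned(text.into_bytes())
}
```

Returning `Cow::Borrowed` in the no-BOM case mirrors the point made in the commit message that the UTF-8 path should pay no transcoding cost; only the detected UTF-16 case allocates a new buffer.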


277 Commits