ripgrep

mirror of https://github.com/BurntSushi/ripgrep.git synced 2025-11-29 05:57:07 +02:00

Author	SHA1	Message	Date
Ralf Jung	d352b79294	Add new -M/--max-columns option. This permits setting the maximum line width with respect to the number of bytes in a line. Omitted lines (whether part of a match, replacement or context) are replaced with a message stating that the line was elided. Fixes #129	2017-03-12 21:21:28 -04:00
Andrew Gallant	4ef4818130	No line numbers when searching only stdin. This changes the default behavior of ripgrep to not show line numbers when it is printing to a tty and is only searching stdin. Fixes #380 [breaking-change]	2017-03-12 20:21:40 -04:00
Andrew Gallant	8db24e1353	Stop aggressive inlining. It's not clear what exactly is happening here, but the Read implementation for text decoding appears a bit sensitive. Small perturbations in the code appear to have a nearly 100% impact on the overall speed of ripgrep when searching UTF-16 files. I haven't had the time to examine the generated code in detail, but `perf stat` seems to think that the instruction cache is performing a lot worse when the code slows down. This might mean that excessive inlining causes a different code structure that leads to less-than-optimal icache usage, but it's at best a guess. Explicitly disabling the inline for the cold path seems to help the optimizer figure out the right thing.	2017-03-12 20:21:22 -04:00
Andrew Gallant	8bbe58d623	Add support for additional text encodings. This includes, but is not limited to, UTF-16, latin-1, GBK, EUC-JP and Shift_JIS. (Courtesy of the `encoding_rs` crate.) Specifically, this feature enables ripgrep to search files that are encoded in an encoding other than UTF-8. The list of available encodings is tied directly to what the `encoding_rs` crate supports, which is in turn tied to the Encoding Standard. The full list of available encodings can be found here: https://encoding.spec.whatwg.org/#concept-encoding-get This pull request also introduces the notion that text encodings can be automatically detected on a best effort basis. Currently, the only support for this is checking for a UTF-16 BOM. In all other cases, a text encoding of `auto` (the default) implies a UTF-8 or ASCII compatible source encoding. When a text encoding is otherwise specified, it is unconditionally used for all files searched. Since ripgrep's regex engine is fundamentally built on top of UTF-8, this feature works by transcoding the files to be searched from their source encoding to UTF-8. This transcoding only happens when: 1. `auto` is specified and a non-UTF-8 encoding is detected. 2. A specific encoding is given by end users (including UTF-8). When transcoding occurs, errors are handled by automatically inserting the Unicode replacement character. In this case, ripgrep's output is guaranteed to be valid UTF-8 (excluding non-UTF-8 file paths, if they are printed). In all other cases, the source text is searched directly, which implies an assumption that it is at least ASCII compatible, but where UTF-8 is most useful. In this scenario, encoding errors are not detected. In this case, ripgrep's output will match the input exactly, byte-for-byte. This design may not be optimal in all cases, but it has some advantages: 1. The happy path ("UTF-8 everywhere") remains happy. I have not been able to witness any performance regressions. 2. In the non-UTF-8 path, implementation complexity is kept relatively low. The cost here is transcoding itself. A potentially superior implementation might build decoding of any encoding into the regex engine itself. In particular, the fundamental problem with transcoding everything first is that literal optimizations are nearly negated. Future work should entail improving the user experience. For example, we might want to auto-detect more text encodings. A more elaborate UX might permit end users to specify multiple text encodings, although this seems hard to pull off in an ergonomic way. Fixes #1	2017-03-12 19:54:48 -04:00
Marc Tiehuis	adff43fbb4	Remove clap validator + add max-filesize integration tests	2017-03-08 10:17:18 -05:00
tiehuis	714ae82241	Add `--max-filesize` option to cli. The --max-filesize option allows filtering files which are larger than the specified limit. This is potentially useful if one is attempting to search a number of large files without common file-types/suffixes. See #369.	2017-03-08 10:17:18 -05:00
Andrew Gallant	79d40d0e20	Tweak how binary files are handled internally. This commit fixes two issues. The first issue is that if a file contained many NUL bytes without any LF bytes, then the InputBuffer would read the entire file into memory. This is not typically a problem, but if you run rg on /proc, then bad things can happen when reading virtual memory mapping files. Arguably, such files should be ignored, but we should also try to avoid exhausting memory too. We fix this by pushing the `-a/--text` flag option down into InputBuffer, so that it knows to stop immediately if it finds a NUL byte. The other issue this fixes is that binary detection is now applied to every buffer instead of just the first one. This helps avoid detecting too many files as plain text if the first parts of a binary file happen to contain no NUL bytes. This issue still persists somewhat in the memory map searcher, since we probably don't want to search the entire file upfront for NUL bytes before actually performing our search. Instead, we search the first 10KB for now. Fixes #52, Fixes #311	2017-02-18 16:20:21 -05:00
Andrew Gallant	525b278049	Don't parse regexes with --files. When the --files flag is given, ripgrep would still try to parse some of the positional arguments as regexes. Don't do that. Fixes #326	2017-02-18 15:34:54 -05:00
Andrew Gallant	16de47920c	Permit --heading to override --no-heading. @kbknapp <3 Fixes #327	2017-02-18 15:25:08 -05:00
Andrew Gallant	8ac5bc0147	Remove Windows deps from ripgrep proper. All Windows specific code has been (mostly) pushed out of ripgrep and into its constituent libraries.	2017-02-18 15:06:20 -05:00
Peter Williams	22cb644eb6	termcolor: add support for output to standard error. This is essentially a rename of the existing `Stdout` type to `StandardStream` and a change of its constructor from a single `new()` function to have two `stdout()` and `stderr()` functions. Under the hood, we add internal IoStandardStream{,Lock} enums that allow us to abstract between Stdout and Stderr conveniently. The rest of the needed changes then fall out fairly naturally. Fixes #324. [breaking-change]	2017-02-09 20:57:23 -05:00
Andrew Gallant	f5a2d022ec	Replace internal atty module with atty crate. This removes all use of explicit unsafe in ripgrep proper except for one: accessing the contents of a memory map. (Which may never go away.)	2017-01-15 16:32:30 -05:00
Andrew Gallant	a7d0e40668	Use basic SGR sequences when possible. In Emacs, its terminal apparently doesn't support "extended" sets of foreground/background colors. Unless we need to set an "intense" color, we should instead use one of the eight basic color codes. Also, remove the "intense" setting from the default set of colors. It doesn't do much anyway and enables the default color settings to work in Emacs out of the box. Fixes #182 (again)	2017-01-13 19:03:03 -05:00
Andrew Gallant	7a951f103a	Make --column imply --line-number. Closes #243	2017-01-11 18:53:35 -05:00
Andrew Gallant	8751e55706	Add --path-separator flag. This flag permits setting the path separator used for all file paths printed by ripgrep in normal operation. Fixes #275	2017-01-10 18:16:15 -05:00
Andrew Gallant	2143bcf9cb	Add example to -r/--replace docs. Fixes #308	2017-01-10 16:43:28 -05:00
Andrew Gallant	461e0c4e33	Don't search stdout redirected file. When running ripgrep like this: rg foo > output we must be careful not to search `output` since ripgrep is actively writing to it. Searching it can cause massive blowups where the file grows without bound. While this is conceptually easy to fix (check the inode of the redirection and the inode of the file you're about to search), there are a few problems with it. First, inodes are a Unix thing, so we need a Windows specific solution to this as well. To resolve this concern, I created a new crate, `same-file`, which provides a cross platform abstraction. Second, stat'ing every file is costly. This is not avoidable on Windows, but on Unix, we can get the inode number directly from directory traversal. However, this information wasn't exposed, but now it is (through both the ignore and walkdir crates). Fixes #286	2017-01-09 16:12:08 -05:00
Daniel Luz	c4633ff187	Remove trivial condition.	2017-01-08 17:02:57 -05:00
Ian Kerins	ed01e80a79	Provide a mechanism to compose type definitions. This extends the syntax of the --type-add flag to allow including the globs of other already defined types. Fixes #83.	2017-01-07 18:14:24 -05:00
Andrew Gallant	851799f42b	Fix spacing issue in --help output.	2017-01-06 22:45:12 -05:00
Andrew Gallant	b65a8c353b	Add --sort-files flag. When used, parallelism is disabled but the results are sorted by file path. Closes #263	2017-01-06 22:43:59 -05:00
Andrew Gallant	b187c1a817	Rejigger bold and intense settings. Previously, ripgrep would only emit the 'bold' ANSI escape sequence if no foreground or background color was set. Instead, it would convert colors to their "intense" versions if bold was set. The intent was to do the same thing on Windows and Unix. However, this had a few negative side effects: 1. Omitting the 'bold' ANSI escape when 'bold' was set is surprising. 2. Intense colors can look quite bad and be hard to read. To fix this, we introduce a new setting called 'intense' in the --colors flag, and thread that down through to the public API of the `termcolor` crate. The 'intense' setting has environment specific behavior: 1. In ANSI mode, it will convert the selected color to its "intense" variant. 2. In the Windows console, it will make the text "intense." There is no longer any "smart" handling of the 'bold' style. The 'bold' ANSI escape is always emitted when it is selected. In the Windows console, the 'bold' setting now has no effect. Note that this is a breaking change. Fixes #266, #293	2017-01-06 20:09:51 -05:00
Andrew Gallant	163e00677a	Update to regex 0.2.	2017-01-01 01:03:21 -05:00
Andrew Gallant	de5cb7d22e	Remove special ^C handling. This means that ripgrep will no longer try to reset your colors in your terminal if you kill it while searching. This could result in messing up the colors in your terminal, and the fix is to simply run some other command that resets them for you. For example: $ echo -ne "\033[0m" The reason why the ^C handling was removed is because it is irrevocably broken on Windows and is impossible to do correctly and efficiently in ANSI terminals. Fixes #281	2016-12-24 12:53:09 -05:00
Andrew Gallant	084d3f4911	Small code cleanups.	2016-12-24 10:06:37 -05:00
Leonardo Yvens	dd5ded2f78	fix some clippy lints (#288)	2016-12-23 14:53:35 -05:00
Andrew Gallant	8396d3ffaa	Make backreference support clear. Fixes #268.	2016-12-12 07:03:37 -05:00
Andrew Gallant	d66812102b	Fix leading hyphen bug by updating clap. Fixes #270	2016-12-06 17:29:34 -05:00
Andrew Gallant	160f04894f	Simplify code. Instead of `Ok(n) if n == 0` we can just write `Ok(0)`.	2016-12-04 12:00:13 -05:00
Andrew Gallant	d12bdf35a5	Clarify use of --heading/--no-heading. Fixes #247.	2016-11-28 17:40:44 -05:00
Andrew Gallant	ae592b11e3	Only emit bold ANSI code if bold is true. This was a simple logic error. Also, avoid emitting ANSI escape codes if there are no color settings. Fixes #242	2016-11-21 20:33:15 -05:00
Andrew Gallant	d06f84ced3	Get rid of special mmap decision on Windows. I spent some quality time on my Windows 10 laptop and it appears to suffer from a similar trade-off as on Linux: mmaps are bad for large directory traversals but good for single large files. Darwin continues to reject memory maps in all cases (unless explicitly requested), but more testing should be done there.	2016-11-20 15:32:50 -05:00
Andrew Gallant	9598331fa8	Propagate no_messages option to worker. Fixes #241	2016-11-20 15:01:37 -05:00
Andrew Gallant	e8a30cb893	Completely re-work colored output and tty handling. This commit completely guts all of the color handling code and replaces most of it with two new crates: wincolor and termcolor. wincolor provides a simple API to coloring using the Windows console and termcolor provides a platform independent coloring API tuned for multithreaded command line programs. This required a lot more flexibility than what the `term` crate provided, so it was dropped. We instead switch to writing ANSI escape sequences directly and ignore the TERMINFO database. In addition to fixing several bugs, this commit also permits end users to customize colors to a certain extent. For example, this command will set the match color to magenta and the line number background to yellow: rg --colors 'match:fg:magenta' --colors 'line:bg:yellow' foo For tty handling, we've adopted a hack from `git` to do tty detection in MSYS/mintty terminals. As a result, ripgrep should get both color detection and piping correct on Windows regardless of which terminal you use. Finally, switch to line buffering. Performance doesn't seem to be impacted and it's an otherwise more user friendly option. Fixes #37, Fixes #51, Fixes #94, Fixes #117, Fixes #182, Fixes #231	2016-11-20 11:14:52 -05:00
Andrew Gallant	03f7605322	Rename --files-without-matches to --files-without-match. This is to be consistent with grep.	2016-11-19 20:15:41 -05:00
Daniel Luz	bd3e7eedb1	Add --files-without-matches flag. Performs the opposite of --files-with-matches: only shows paths of files that contain zero matches. Closes #138	2016-11-19 21:48:59 -02:00
Andrew Gallant	0302d58eb8	Fix stdin bug with --file. When `rg -f-` is used, the default search path should be `./` and not `-`.	2016-11-17 20:48:11 -05:00
Andrew Gallant	92dc402f7f	Switch from Docopt to Clap. There were two important reasons for the switch: 1. Performance. Docopt does poorly when the argv becomes large, which is a reasonably common use case for search tools. (e.g., use with xargs) 2. Better failure modes. Clap knows a lot more about how a particular argv might be invalid, and can therefore provide much clearer error messages. While both were important, (1) made it urgent. Note that since Clap requires at least Rust 1.11, this will in turn increase the minimum Rust version supported by ripgrep from Rust 1.9 to Rust 1.11. It is therefore a breaking change, so the soonest release of ripgrep with Clap will have to be 0.3. There is also at least one subtle breaking change in real usage. Prior to this commit, this used to work: rg -e -foo Where this would cause ripgrep to search for the string `-foo`. Clap currently has problems supporting this use case (see: https://github.com/kbknapp/clap-rs/issues/742), but it can be worked around by using this instead: rg -e [-]foo or even rg [-]foo and this still works: rg -- -foo This commit also adds Bash, Fish and PowerShell completion files to the release, fixes a bug that prevented ripgrep from working on file paths containing invalid UTF-8 and shows short descriptions in the output of `-h` but longer descriptions in the output of `--help`. Fixes #136, Fixes #189, Fixes #210, Fixes #230	2016-11-17 19:53:41 -05:00
Eric Kidd	e9cd0a1cc3	Allow specifying patterns with `-f FILE` and `-f-`. This is a somewhat basic implementation of `-f-` (#7), with unit tests. Changes include: 1. The internals of the `pattern` function have been refactored to avoid code duplication, but there's a lot more we could do. Right now we read the entire pattern list into a `Vec`. 2. There's now a `WorkDir::pipe` command that allows sending standard input to `rg` when testing. Not implemented: aho-corasick.	2016-11-15 13:00:16 -05:00
Andrew Gallant	5b73dcc8ab	Rework parallelism in directory iterator. Previously, ignore::WalkParallel would invoke the callback for all explicitly given file paths in a single thread, which effectively meant that `rg pattern foo bar baz ...` didn't actually search foo, bar and baz in parallel. The code was structured that way to avoid spinning up workers if no directory paths were given. The original intention was probably to have a separate pool of threads responsible for searching, but ripgrep ended up just reusing the ignore::WalkParallel workers themselves for searching, and was thereby subjected to its sub-par performance in this case. The code has been restructured so that file paths are sent to the workers, which brings back parallelism. Fixes #226	2016-11-09 17:19:40 -05:00
Andrew Gallant	2e5c3c05e8	reword	2016-11-06 19:48:49 -05:00
Andrew Gallant	6884eea2f5	reword	2016-11-06 19:48:17 -05:00
Andrew Gallant	f24873c70b	Don't ever search directories.	2016-11-06 19:02:14 -05:00
Andrew Gallant	9fc9f368f5	Always search paths given by user. This permits doing `rg -a test /dev/sda1` for example, whereas before /dev/sda1 was skipped because it wasn't a regular file.	2016-11-06 18:23:50 -05:00
Andrew Gallant	77ad7588ae	Add --no-messages flag. This flag is similar to what's found in grep: it will suppress all error messages, such as those shown when a particular file couldn't be read. Closes #149	2016-11-06 14:36:08 -05:00
Andrew Gallant	58aca2efb2	Add -m/--max-count flag. This flag limits the number of matches printed per file. Closes #159	2016-11-06 13:09:53 -05:00
Andrew Gallant	277dda544c	Include the name "ripgrep" in more places. Fixes #203	2016-11-06 12:21:36 -05:00
Andrew Gallant	598b162fea	Note -e/--regexp's additional usefulness. Specifically, it can be used when searching for patterns that start with a dash. Fixes #215	2016-11-06 12:10:27 -05:00
Andre Bogus	02de97b8ce	Use the bytecount crate for fast line counting. Fixes #128	2016-11-05 22:29:26 -04:00
Andrew Gallant	b272be25fa	Add parallel recursive directory iterator. This adds a new walk type in the `ignore` crate, `WalkParallel`, which provides a way for recursively iterating over a set of paths in parallel while respecting various ignore rules. The API is a bit strange, as a closure producing a closure isn't something one often sees, but it does seem to work well. This also allowed us to simplify much of the worker logic in ripgrep proper, where MultiWorker is now gone.	2016-11-05 21:45:55 -04:00
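Commit 8bbe58d623 describes the `auto` encoding mode: look for a UTF-16 BOM, and if one is found, transcode to UTF-8 while replacing encoding errors with the Unicode replacement character; otherwise search the bytes as-is. The following is a minimal std-only sketch of that decision, assuming a fully buffered input. ripgrep itself delegates decoding to `encoding_rs`; the function name `decode_utf16_with_bom` is hypothetical and not part of ripgrep's API.

```rust
/// Hypothetical helper: if `bytes` starts with a UTF-16 BOM, transcode the rest
/// to UTF-8, inserting U+FFFD for every encoding error. Returns `None` when no
/// UTF-16 BOM is present, meaning "search the raw bytes directly".
fn decode_utf16_with_bom(bytes: &[u8]) -> Option<String> {
    let little_endian = match bytes {
        [0xFF, 0xFE, ..] => true,  // UTF-16LE BOM
        [0xFE, 0xFF, ..] => false, // UTF-16BE BOM
        _ => return None,          // no BOM: treat as UTF-8/ASCII compatible
    };
    let body = &bytes[2..];
    // Pair up bytes into 16-bit code units in the detected byte order.
    let units = body.chunks_exact(2).map(|pair| {
        if little_endian {
            u16::from_le_bytes([pair[0], pair[1]])
        } else {
            u16::from_be_bytes([pair[0], pair[1]])
        }
    });
    // Unpaired surrogates become the replacement character.
    let mut text: String = char::decode_utf16(units)
        .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect();
    // A dangling odd byte at the end is also an encoding error.
    if body.len() % 2 == 1 {
        text.push(char::REPLACEMENT_CHARACTER);
    }
    Some(text)
}
```

Because every error is mapped to U+FFFD, the returned `String` is always valid UTF-8, which mirrors the commit's guarantee for transcoded output.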
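Commit 79d40d0e20 changes binary detection so that every buffer is checked for NUL bytes (not just the first) and reading stops immediately on a NUL unless `-a/--text` is set. A rough sketch of that loop over any `std::io::Read` follows; the name `read_unless_binary` and the "return `None` for binary" convention are assumptions for illustration, not ripgrep's `InputBuffer` API. The `Ok(0)` arm also uses the end-of-file pattern mentioned in commit 160f04894f.

```rust
use std::io::{self, Read};

/// Hypothetical sketch: read `rdr` in chunks of `chunk_size` bytes, checking
/// each chunk for a NUL byte. Returns `Ok(None)` as soon as a NUL is seen
/// (treated as binary), unless `text` (standing in for -a/--text) is set.
/// Otherwise returns all bytes read.
fn read_unless_binary<R: Read>(
    mut rdr: R,
    text: bool,
    chunk_size: usize,
) -> io::Result<Option<Vec<u8>>> {
    assert!(chunk_size > 0, "a zero-sized buffer would look like EOF");
    let mut out = Vec::new();
    let mut buf = vec![0u8; chunk_size];
    loop {
        let n = match rdr.read(&mut buf) {
            Ok(0) => return Ok(Some(out)), // EOF
            Ok(n) => n,
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        let chunk = &buf[..n];
        // Check every chunk, not just the first, and stop early on a NUL so a
        // huge NUL-filled file without LF bytes is never fully buffered.
        if !text && chunk.contains(&0) {
            return Ok(None);
        }
        out.extend_from_slice(chunk);
    }
}
```

In this sketch the NUL byte in `abcdef\0gh` is only found in the second 4-byte chunk, which is exactly the case a first-buffer-only check would miss.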
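Commit a7d0e40668 switches to the eight basic SGR color codes unless an "intense" color is requested. The sketch below shows the difference in the emitted escape sequences, assuming basic foreground colors use `ESC[30m` through `ESC[37m` and intense ones select the bright entries 8 to 15 of the 256-color palette via `ESC[38;5;Nm`. The `BasicColor` enum and `sgr_fg` function are hypothetical names, not the `termcolor` API.

```rust
/// Hypothetical color enum in standard ANSI order, so the discriminant (0..=7)
/// is the basic color index.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum BasicColor { Black, Red, Green, Yellow, Blue, Magenta, Cyan, White }

/// Build a foreground SGR sequence. Basic colors use the widely supported
/// 30..=37 codes; only intense colors fall back to the extended 38;5;N form,
/// which some terminals (such as the Emacs one mentioned in the commit) lack.
fn sgr_fg(color: BasicColor, intense: bool) -> String {
    let n = color as u8;
    if intense {
        format!("\x1B[38;5;{}m", n + 8) // bright variant in the 256-color palette
    } else {
        format!("\x1B[3{}m", n)
    }
}
```

For example, `sgr_fg(BasicColor::Magenta, false)` produces `ESC[35m`, a code even minimal terminals understand.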
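Commit 461e0c4e33 avoids searching the file that stdout is redirected to by comparing inodes. On Unix this boils down to comparing the device and inode numbers of two files, which the following sketch does with `std::os::unix::fs::MetadataExt`. It is Unix-only and hypothetical: ripgrep uses the cross-platform `same-file` crate instead, and it avoids the extra `stat` by taking inode numbers from directory traversal.

```rust
use std::fs::File;
use std::io;
use std::path::Path;

/// Hypothetical Unix-only check: is `candidate` the same underlying file as the
/// already-open `output` handle (e.g. the target of `rg foo > output`)?
#[cfg(unix)]
fn is_same_as_output(output: &File, candidate: &Path) -> io::Result<bool> {
    use std::os::unix::fs::MetadataExt;
    let out_md = output.metadata()?;
    let cand_md = std::fs::metadata(candidate)?;
    // Same device and same inode means same file, even via a different path
    // (for example, a hard link).
    Ok(out_md.dev() == cand_md.dev() && out_md.ino() == cand_md.ino())
}
```

A search loop would skip any candidate for which this returns `true`, so the output file can never feed back into its own input.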
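Commit 5b73dcc8ab restores parallelism for explicitly given file paths by sending those paths to the workers rather than handling them on one thread. A generic work-queue sketch of that idea with std channels and threads follows. It is not the `ignore::WalkParallel` API: `search_in_parallel` and its `search` callback (standing in for searching one file) are hypothetical.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Hypothetical sketch: hand each path to one of `workers` threads through a
/// shared channel, so `rg pattern foo bar baz` can search foo, bar and baz
/// concurrently. `search` returns Some(result) for a hit. Result order is
/// nondeterministic.
fn search_in_parallel<F>(paths: Vec<String>, workers: usize, search: F) -> Vec<String>
where
    F: Fn(&str) -> Option<String> + Send + Sync + 'static,
{
    let (path_tx, path_rx) = mpsc::channel::<String>();
    let path_rx = Arc::new(Mutex::new(path_rx));
    let (result_tx, result_rx) = mpsc::channel::<String>();
    let search = Arc::new(search);

    let handles: Vec<_> = (0..workers.max(1))
        .map(|_| {
            let path_rx = Arc::clone(&path_rx);
            let result_tx = result_tx.clone();
            let search = Arc::clone(&search);
            thread::spawn(move || loop {
                // The lock guard is dropped at the end of this statement, so
                // it is held only while pulling one path off the queue.
                let next = path_rx.lock().unwrap().recv();
                match next {
                    Ok(path) => {
                        if let Some(hit) = (*search)(&path) {
                            result_tx.send(hit).unwrap();
                        }
                    }
                    Err(_) => break, // queue closed and drained: no more work
                }
            })
        })
        .collect();

    for p in paths {
        path_tx.send(p).unwrap();
    }
    drop(path_tx); // close the queue so idle workers exit
    drop(result_tx); // only the workers' clones remain
    for h in handles {
        h.join().unwrap();
    }
    result_rx.into_iter().collect()
}
```

Pulling from a shared queue, rather than pre-assigning paths to threads, lets a worker that finishes a small file immediately pick up the next one.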


216 Commits