ripgrep

mirror of https://github.com/BurntSushi/ripgrep.git synced 2025-08-04 21:52:54 +02:00

Author	SHA1	Message	Date
Andrew Gallant	967e7ad0de	ripgrep: add --auto-hybrid-regex flag This flag, when set, will automatically dispatch to PCRE2 if the given regex cannot be compiled by Rust's regex engine. If both engines fail to compile the regex, then both errors are surfaced. Closes #1155	2019-04-14 19:29:27 -04:00
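The fallback dispatch described for `--auto-hybrid-regex` can be sketched without any external crates. In this sketch the two compilers are hypothetical stand-ins passed in as closures, not the real Rust regex or PCRE2 APIs. The stand-in for Rust's engine rejects look-around, which that engine does not support. If both attempts fail, both error messages are returned together.

```rust
/// Which engine ended up compiling the pattern (hypothetical type).
#[derive(Debug, PartialEq)]
enum Engine {
    RustRegex(String),
    Pcre2(String),
}

/// Try the primary engine first and fall back to the secondary one.
/// If both fail, surface both error messages.
fn compile_hybrid<P, S>(pattern: &str, primary: P, secondary: S) -> Result<Engine, String>
where
    P: Fn(&str) -> Result<String, String>,
    S: Fn(&str) -> Result<String, String>,
{
    match primary(pattern) {
        Ok(compiled) => Ok(Engine::RustRegex(compiled)),
        Err(primary_err) => match secondary(pattern) {
            Ok(compiled) => Ok(Engine::Pcre2(compiled)),
            Err(secondary_err) => Err(format!(
                "regex engine error: {}\nPCRE2 error: {}",
                primary_err, secondary_err
            )),
        },
    }
}

// Hypothetical stand-in: rejects look-around, like Rust's regex engine.
fn fake_rust_regex(p: &str) -> Result<String, String> {
    if p.contains("(?=") || p.contains("(?!") {
        Err("look-around is not supported".to_string())
    } else {
        Ok(p.to_string())
    }
}

// Hypothetical stand-in: only rejects unbalanced parentheses.
fn fake_pcre2(p: &str) -> Result<String, String> {
    if p.matches('(').count() != p.matches(')').count() {
        Err("unbalanced parentheses".to_string())
    } else {
        Ok(p.to_string())
    }
}
```

For example, `compile_hybrid("foo(?=bar)", fake_rust_regex, fake_pcre2)` falls through to the PCRE2 stand-in.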
Andrew Gallant	9952ba2068	deps: update glob dev-dependency	2019-04-14 19:29:27 -04:00
Andrew Gallant	b751758d60	deps: update everything	2019-04-14 19:29:27 -04:00
Andrew Gallant	8f14cb18a5	ripgrep: increase pcre2's default JIT stack size The default stack size is 32KB, and this increases it to 10MB. 32KB is pretty paltry in the environments in which ripgrep runs, and 10MB is easily afforded as a maximum size. (The size limit we set for Rust's regex engine is considerably larger.) This was motivated by the fact that JIT stack limits have been observed to be hit in the wild: https://github.com/Microsoft/vscode/issues/64606	2019-04-14 19:29:27 -04:00
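To put the size change in perspective, here is a small worked calculation. It assumes KB and MB in the message are binary units (1 KB = 1024 bytes), which gives a new maximum 320 times the default.

```rust
// Sizes quoted in the commit message, in bytes (binary units assumed).
const DEFAULT_JIT_STACK: usize = 32 * 1024; // 32KB
const RIPGREP_JIT_STACK: usize = 10 * 1024 * 1024; // 10MB

/// How many times larger the new maximum is than PCRE2's default.
fn growth_factor() -> usize {
    RIPGREP_JIT_STACK / DEFAULT_JIT_STACK
}
```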
Andrew Gallant	da9d720431	ripgrep: add --pcre2-version flag This flag will output details about the version of PCRE2 that ripgrep is using (if any).	2019-04-14 19:29:27 -04:00
Andrew Gallant	a9d71a0368	pcre2: add a few re-exports This adds the top-level is_jit_available and version free functions from the underlying pcre2 crate, and also forwards the max_jit_stack_size option.	2019-04-14 19:29:27 -04:00
Andrew Gallant	f3646242cc	deps: use pcre2 0.2.0 This comes with PCRE2 10.32 and a few new options we'll use in subsequent commits.	2019-04-14 19:29:27 -04:00
Andrew Gallant	601f212a0b	ripgrep: add -I as a short option for --no-filename This flag is commonly used in pipelines and it can be annoying to write it out every time you need it. Ideally, we would use -h for this to match GNU grep, but -h is used to print help output. Closes #1185	2019-04-14 19:29:27 -04:00
Andrew Gallant	5a565354f8	versioning: next version will be ripgrep 11 This sets up the release announcement and briefly describes the versioning change. The actual version change itself won't happen until the release. Closes #1172	2019-04-14 19:29:27 -04:00
Andrew Gallant	2a6532ae71	doc: note cases of exorbitant memory usage Fixes #1189	2019-04-14 19:29:27 -04:00
Andrew Gallant	ece1f50cfe	printer: support previews for long lines This commit adds support for showing a preview of long lines. While the default still remains as completely suppressing the entire line, this new functionality will show the first N graphemes of a matching line, including the number of matches that are suppressed. This was unfortunately a fairly invasive change to the printer that required a bit of refactoring. On the bright side, the single line and multi-line coloring are now more unified than they were before. Closes #1078	2019-04-14 19:29:27 -04:00
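The preview idea, showing the first N units of a long line and counting the matches that get cut off, can be sketched like this. The sketch makes several assumptions. It counts `char`s rather than graphemes, because grapheme segmentation needs a crate outside std. Matches arrive as hypothetical `(start, end)` byte offsets. It does not reproduce ripgrep's actual output format.

```rust
/// Keep the first `max_chars` characters of `line` and count how many
/// matches (given as (start, end) byte offsets) extend past the cut.
/// Characters approximate graphemes in this sketch.
fn preview(line: &str, max_chars: usize, matches: &[(usize, usize)]) -> (String, usize) {
    // Byte offset where the preview ends; the whole line if it is short.
    let cut = line
        .char_indices()
        .nth(max_chars)
        .map(|(i, _)| i)
        .unwrap_or(line.len());
    let shown = line[..cut].to_string();
    // A match counts as suppressed if any part of it lies past the cut.
    let suppressed = matches.iter().filter(|&&(_, end)| end > cut).count();
    (shown, suppressed)
}
```

Cutting on a `char_indices` boundary keeps the slice valid UTF-8 even when the line contains multi-byte characters.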
Andrew Gallant	a7d26c8f14	binary: rejigger ripgrep's handling of binary files This commit attempts to surface binary filtering in a slightly more user-friendly way. Namely, before, ripgrep would silently stop searching a file if it detected a NUL byte, even if it had previously printed a match. This can lead to the user quite reasonably assuming that there are no more matches, since a partial search is fairly unintuitive. (ripgrep has this behavior by default because it really wants to NOT search binary files at all, just like it doesn't search gitignored or hidden files.) With this commit, if a match has already been printed and ripgrep detects a NUL byte, then it will print a warning message indicating that the search stopped prematurely. Moreover, this commit adds a new flag, --binary, which causes ripgrep to stop filtering binary files, but in a way that still avoids dumping binary data into terminals. That is, the --binary flag makes ripgrep behave more like grep's default behavior. For files explicitly specified in a search, e.g., `rg foo some-file`, no binary filtering is applied (just like no gitignore and no hidden file filtering is applied). Instead, ripgrep behaves as if you gave the --binary flag for all explicitly given files. This was a fairly invasive change, and potentially increases the UX complexity of ripgrep around binary files. (Before, there were two binary modes, whereas now there are three.) However, ripgrep is now a bit louder with warning messages when binary file detection might otherwise be hiding potential matches, so hopefully this is a net improvement. Finally, the `-uuu` convenience now maps to `--no-ignore --hidden --binary`, since this is closer to the actual intent of the `--unrestricted` flag, i.e., to reduce ripgrep's smart filtering. As a consequence, `rg -uuu foo` should now search roughly the same number of bytes as `grep -r foo`, and `rg -uuua foo` should search roughly the same number of bytes as `grep -ra foo`.
(The "roughly" weasel word is used because grep's and ripgrep's binary file detection might differ somewhat---perhaps based on buffer sizes---which can impact exactly what is and isn't searched.) See the numerous tests in tests/binary.rs for intended behavior. Fixes #306, Fixes #855	2019-04-14 19:29:27 -04:00
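The default-mode behavior described above (stop at the first NUL byte, and warn only if a match was already printed) can be sketched in simplified form. This is a hypothetical line-oriented model. ripgrep detects NUL bytes per buffer rather than per line, and this sketch does not model the `--binary` mode at all.

```rust
/// Outcome of searching one file under default binary filtering.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// Searched to the end; holds the matching lines.
    Done(Vec<String>),
    /// Hit a NUL byte. `warn` is true when matches had already been
    /// printed, i.e., the case in which a warning should be shown.
    StoppedAtNul { printed: Vec<String>, warn: bool },
}

fn search(haystack: &[u8], needle: &str) -> Outcome {
    let mut printed = Vec::new();
    for line in haystack.split(|&b| b == b'\n') {
        if line.contains(&0) {
            // Stop quietly if nothing was printed yet; otherwise warn.
            let warn = !printed.is_empty();
            return Outcome::StoppedAtNul { printed, warn };
        }
        let text = String::from_utf8_lossy(line);
        if text.contains(needle) {
            printed.push(text.into_owned());
        }
    }
    Outcome::Done(printed)
}
```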
Andrew Gallant	bd222ae93f	regex: fix HIR analysis bug An alternate can be empty at this point, so we must handle it. We didn't before because the regex engine actually disallows empty alternates; however, this code runs before the regex compiler rejects the regex.	2019-04-14 19:29:27 -04:00
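The empty-alternate pitfall can be illustrated with a tiny, hypothetical HIR and a prefix analysis over it. This is not regex-syntax's `Hir` type. An empty branch (as in `foo|`) is modeled as an empty concatenation, and the analysis has to tolerate it instead of assuming every branch has a first element.

```rust
/// A tiny, hypothetical stand-in for a regex HIR.
enum Hir {
    Literal(String),
    Concat(Vec<Hir>),
    Alternation(Vec<Hir>),
}

/// Return a literal that every match must start with, if one exists.
fn required_prefix(hir: &Hir) -> Option<String> {
    match hir {
        Hir::Literal(s) if !s.is_empty() => Some(s.clone()),
        Hir::Literal(_) => None,
        // An empty Concat (an empty alternate) has no first item: None.
        Hir::Concat(items) => items.first().and_then(required_prefix),
        Hir::Alternation(alts) => {
            let mut prefixes = Vec::new();
            for alt in alts {
                // Any branch without a prefix means the alternation has none.
                prefixes.push(required_prefix(alt)?);
            }
            // All branches must agree on the same prefix.
            let first = prefixes.first()?.clone();
            if prefixes.iter().all(|p| *p == first) {
                Some(first)
            } else {
                None
            }
        }
    }
}
```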
hupfdule	4359d8aac0	ignore/types: add more extensions for xml This includes: .dtd for Document Type Definitions, .xsl and .xslt for XSL Transformation descriptions, .xsd for XML Schema definitions, .xjb for JAXB bindings, .rng for Relax NG files, and .sch for Schematron files. PR #1243	2019-04-09 15:17:57 -04:00
tonypai	308819fb1f	ignore/types: add lock files Treat anything with a `.lock` extension as a lock file, with an extra rule or two for special cases, e.g., package-lock.json.	2019-04-09 10:24:48 -04:00
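The matching rule described for lock files (any `.lock` extension, plus special-cased names) can be sketched as a plain predicate. The special-case list here holds only the `package-lock.json` example named in the message, and it is an assumption that ripgrep's actual type definition contains exactly this list.

```rust
use std::path::Path;

/// Hypothetical classifier: `.lock` extension or a special-cased name.
fn is_lock_file(path: &Path) -> bool {
    const SPECIAL: &[&str] = &["package-lock.json"];
    let by_ext = path.extension().map_or(false, |e| e == "lock");
    let by_name = path
        .file_name()
        .and_then(|n| n.to_str())
        .map_or(false, |n| SPECIAL.iter().any(|s| *s == n));
    by_ext || by_name
}
```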
Andrew Gallant	09108b7fda	regex: make multi-literal searcher faster This makes the case of searching for a dictionary of a very large number of literals much much faster. (~10x or so.) In particular, we achieve this by short-circuiting the construction of a full regex when we know we have a simple alternation of literals. Building the regex for a large dictionary (>100,000 literals) turns out to be quite slow, even if it internally will dispatch to Aho-Corasick. Even that isn't quite enough. It turns out that even parsing such a regex is quite slow. So when the -F/--fixed-strings flag is set, we short circuit regex parsing completely and jump straight to Aho-Corasick. We aren't quite as fast as GNU grep here, but it's much closer (less than 2x slower). In general, this is somewhat of a hack. In particular, it seems plausible that this optimization could be implemented entirely in the regex engine. Unfortunately, the regex engine's internals are just not amenable to this at all, so it would require a larger refactoring effort. For now, it's good enough to add this fairly simple hack at a higher level. Unfortunately, if you don't pass -F/--fixed-strings, then ripgrep will be slower, because of the aforementioned missing optimization. Moreover, passing flags like `-i` or `-S` will cause ripgrep to abandon this optimization and fall back to something potentially much slower. Again, this fix really needs to happen inside the regex engine, although we might be able to special case -i when the input literals are pure ASCII via Aho-Corasick's `ascii_case_insensitive`. Fixes #497, Fixes #838	2019-04-07 19:11:03 -04:00
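The dispatch decision described above can be sketched as a function that picks a strategy before any regex is built. The names are hypothetical. The literal path stands in for Aho-Corasick with a naive substring scan so the sketch stays std-only, and `case_insensitive` models `-i` (or `-S` resolving to case-insensitive).

```rust
/// Which search strategy a hypothetical dispatcher picks.
#[derive(Debug, PartialEq)]
enum Strategy {
    /// Skip regex parsing entirely and search for the literals directly.
    Literals(Vec<String>),
    /// Fall back to building one alternation regex.
    Regex(String),
}

fn choose(patterns: &[&str], fixed_strings: bool, case_insensitive: bool) -> Strategy {
    if fixed_strings && !case_insensitive {
        // Fast path: no regex is constructed or parsed.
        return Strategy::Literals(patterns.iter().map(|p| p.to_string()).collect());
    }
    // Slow path: escape fixed strings, then join them into an alternation.
    let alts: Vec<String> = patterns
        .iter()
        .map(|p| if fixed_strings { escape(p) } else { p.to_string() })
        .collect();
    let flags = if case_insensitive { "(?i)" } else { "" };
    Strategy::Regex(format!("{}{}", flags, alts.join("|")))
}

/// Backslash-escape common regex metacharacters.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        if "\\.+*?()|[]{}^$".contains(c) {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

/// Naive stand-in for Aho-Corasick: does any literal occur?
fn any_literal_matches(literals: &[String], haystack: &str) -> bool {
    literals.iter().any(|lit| haystack.contains(lit.as_str()))
}
```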
Andrew Gallant	743d64f2e4	deps: update to clap 2.33	2019-04-06 10:35:08 -04:00
lesnyrumcajs	5962abc465	searcher: add option to disable BOM sniffing This commit adds a new encoding feature where the -E/--encoding flag will now accept a value of 'none'. When given this value, all encoding related machinery is disabled and ripgrep will search the raw bytes of the file, including the BOM if it's present. Closes #1207, Closes #1208	2019-04-06 10:35:08 -04:00
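A minimal sketch of BOM sniffing with an opt-out, assuming the three common BOMs (UTF-8 `EF BB BF`, UTF-16LE `FF FE`, UTF-16BE `FE FF`). The type and function are hypothetical and only detect the encoding; they do not transcode. With `encoding_none` set, the bytes are passed through untouched, BOM included, which mirrors what `-E none` is described as doing.

```rust
/// What a hypothetical BOM sniffer decided about a buffer.
#[derive(Debug, PartialEq)]
enum Sniffed<'a> {
    Utf8(&'a [u8]),
    Utf16Le(&'a [u8]),
    Utf16Be(&'a [u8]),
    /// No BOM found, or sniffing disabled: search the bytes as-is.
    Raw(&'a [u8]),
}

fn sniff(bytes: &[u8], encoding_none: bool) -> Sniffed<'_> {
    if encoding_none {
        // Sniffing disabled: the BOM (if any) stays in the haystack.
        return Sniffed::Raw(bytes);
    }
    if bytes.starts_with(&[0xEF, 0xBB, 0xBF]) {
        Sniffed::Utf8(&bytes[3..])
    } else if bytes.starts_with(&[0xFF, 0xFE]) {
        Sniffed::Utf16Le(&bytes[2..])
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        Sniffed::Utf16Be(&bytes[2..])
    } else {
        Sniffed::Raw(bytes)
    }
}
```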
dana	1604a18db3	ignore/types: add .am and .in for C/C++/make PR #1205	2019-04-06 08:02:04 -04:00
luzpaz	9eeb0b01ce	readme: add Repology badge This adds a badge to the README.md file that tells users who click on it whether their OS/distro carries the latest version of ripgrep. PR #1213	2019-04-06 08:00:40 -04:00
dana	df4400209a	ripgrep: remove extra new-line after Clap output PR #1222	2019-04-06 07:59:36 -04:00
Andrew Gallant	77439f99a4	deps: add bstr to Cargo.lock	2019-04-05 23:24:08 -04:00
Andrew Gallant	be7d6dd9ce	regex: print out final regex in trace mode This is useful for debugging to see what regex is actually being run. We put this as a trace since the regex can be quite gnarly. (It is not pretty printed.)	2019-04-05 23:24:08 -04:00
Andrew Gallant	9f15e3b671	regex: fix a perf bug when using -w flag When looking for an inner literal to speed up searches, if only a prefix is found, then we generally give up doing inner literal optimizations since the regex engine will generally handle it for us. Unfortunately, this decision was being made before we wrap the regex in (^|\W)...($|\W) when using the -w/--word-regexp flag, which would then defeat the literal optimizations inside the regex engine. We fix this with a bit of a hack that says, "if we're doing a word regexp, then give me back any literal you find, even if it's a prefix."	2019-04-05 23:24:08 -04:00
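Both halves of this fix can be sketched as string and boolean logic. Wrapping uses non-capturing groups so the inner pattern's capture indices are unchanged; that is an assumption of this sketch, not necessarily how ripgrep builds the wrapper. `keep_literal` is a hypothetical name for the "give me back any literal, even a prefix" rule.

```rust
/// Wrap a pattern for -w/--word-regexp as (^|\W)...($|\W).
fn word_wrap(pattern: &str) -> String {
    format!(r"(?:^|\W)(?:{})(?:$|\W)", pattern)
}

/// Normally a prefix-only literal is left to the regex engine. With -w,
/// the wrapper hides that prefix from the engine, so we keep it.
fn keep_literal(is_prefix_only: bool, word_regexp: bool) -> bool {
    !is_prefix_only || word_regexp
}
```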
Andrew Gallant	254b8b67bb	globset: small perf improvements This tweaks the path handling functions slightly to make them a hair faster. In particular, `file_name` is called on every path that ripgrep visits, and it was possible to remove a few branches without changing behavior.	2019-04-05 23:24:08 -04:00
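Because `file_name` runs on every visited path, a branch-light byte scan is the kind of thing that pays off. Here is a hypothetical Unix-only sketch (only `/` is treated as a separator) that works on raw bytes. Its edge cases are deliberately simple and do not exactly match `std::path::Path::file_name`. For example, a trailing separator yields `None` here.

```rust
/// Return everything after the last '/', or None for an empty or ".."
/// final component. Unix-style separators only.
fn file_name(path: &[u8]) -> Option<&[u8]> {
    let start = path.iter().rposition(|&b| b == b'/').map_or(0, |i| i + 1);
    let name = &path[start..];
    if name.is_empty() || name == b".." {
        None
    } else {
        Some(name)
    }
}
```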
Andrew Gallant	8a7f43b84d	globset: use bstr This simplifies the various path related functions and pushes more platform dependent code down into bstr. This likely also makes things a bit more efficient on Windows, since we now only do a single UTF-8 check for each file path.	2019-04-05 23:24:08 -04:00
Andrew Gallant	d968a27ed5	cli: use bstr This uses bstr in the unescaping logic. This lets us remove some platform specific code, and also lets us remove a hacked UTF-8 decoder on raw bytes.	2019-04-05 23:24:08 -04:00
Andrew Gallant	9b8f5cbaba	config: switch to using bstrs This lets us implement correct Unicode trimming and also simplifies the parsing logic a bit. This also removes the last platform specific bits of code in ripgrep core.	2019-04-05 23:24:08 -04:00
Andrew Gallant	c52da74ac3	printer: use bstr This starts the usage of bstr in the printer. We don't use it too much yet, but it comes in handy for implementing PrinterPath and lets us push down some platform specific code into bstr.	2019-04-05 23:24:08 -04:00
Andrew Gallant	7dcbff9a9b	searcher: partially migrate to bstr This commit causes grep-searcher to use byte strings internally for its line buffer support. We manage to remove a use of `unsafe` by doing this (by pushing it down into `bstr`). We stop short of using byte strings everywhere else because we rely heavily on the `impl ops::Index<[u8]> for grep_matcher::Match` impl, which isn't available for byte strings. (It is premature to make bstr a public dep of a core crate like grep-matcher, but maybe some day.)	2019-04-05 23:24:08 -04:00
Andrew Gallant	bef1f0e770	ci: switch to xenial (#1234) Rust is having problems with trusty; in particular, see this bug I filed: https://github.com/rust-lang/rust/issues/59411 This was purportedly fixed in https://github.com/rust-lang/rust/pull/59468, but it appears the issue is still occurring. This commit tries to update to Ubuntu 16.04 in the hope that it will fix this problem.	2019-04-03 19:52:34 -04:00
Andrew Gallant	cd9815cb37	deps: update to aho-corasick 0.7 We do the simplest possible change to migrate to the new version. Fixes #1228	2019-04-03 13:51:26 -04:00
Andrew Gallant	3f22c3a658	deps: update everything This updates all dependencies to their latest versions. We tolerate a duplicative aho-corasick for now, which we will fix in the next commit.	2019-04-03 13:07:26 -04:00
Andrew Gallant	0913972104	deps: bump encoding_rs_io This brings in a new API for disabling BOM sniffing. This is part of the work toward completing https://github.com/BurntSushi/ripgrep/issues/1207	2019-03-03 16:36:34 -05:00
Andrew Gallant	f19b84fb23	regex: bump regex dep to fix match bug See `661bf53d5b` and `edf45e6f5f` for details on the bug fix, which was in the regex engine. Fixes #1203	2019-02-27 17:42:14 -05:00
Andrew Gallant	59fc583aeb	readme: include details about filtering Despite the fact that we mention this in several places, people are still surprised by ripgrep's "smart" filtering.	2019-02-27 08:01:23 -05:00
Andrew Gallant	1c7c4e6640	deps: update tempfile	2019-02-21 16:32:17 -05:00
Andrew Gallant	69c5e3938d	deps: bump smallvec This gets rid of the unmaintained crates `unreachable` and `void`. Yay!	2019-02-21 16:31:48 -05:00
Andrew Gallant	d9cf05ad50	deps: update to aho-corasick 0.6.10 This brings in a fix for this bug: https://github.com/BurntSushi/aho-corasick/issues/37 Fixes #1079	2019-02-16 11:39:33 -05:00
Andrew Gallant	af8b6caebb	deps: update various dependencies	2019-02-16 09:39:42 -05:00
Andrew Gallant	c84cfb6756	grep-regex-0.1.2 grep-regex-0.1.2	2019-02-16 09:30:06 -05:00
Andrew Gallant	895e26a000	ci: don't do releases on all tags This attempts to make Appveyor more conservative in what tags it thinks are releases. I don't know for sure, but it looks like the previous regex could match anywhere, so we anchor it. Fixes #1195	2019-02-10 12:51:56 -05:00
Andrew Gallant	8c95290ff6	deps: miscellaneous updates	2019-02-10 07:45:08 -05:00
Andrew Gallant	d6feeb7ff2	grep-searcher-0.1.3 grep-searcher-0.1.3	2019-02-10 07:42:37 -05:00
Andrew Gallant	626ed00c19	searcher: revert big-endian patch This undoes the patch to stop using bytecount on big-endian architectures. In particular, we bump our bytecount dependency to the latest release, which has a fix. This reverts commit `a4868b8835`. Fixes #1144 (again), Closes #1194	2019-02-10 07:40:32 -05:00
Andrew Gallant	332ad18401	tests: use const constructor for atomics We did this in `05411b2b` for core ripgrep, but didn't carry it over to tests.	2019-02-09 16:27:25 -05:00
Andrew Gallant	fc3cf41247	grep-searcher-0.1.2 grep-searcher-0.1.2	2019-02-09 16:13:07 -05:00
Andrew Gallant	a4868b8835	searcher: use naive line counting on big-endian This patches out bytecount's "fast" vectorized algorithm on big-endian machines, where it has been observed to fail. Going forward, bytecount should probably fix this on their end, but for now, we take a small performance hit on big-endian machines. Fixes #1144	2019-02-09 16:13:07 -05:00
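A conditional-compilation sketch of the workaround: count newlines one byte at a time on big-endian targets and use a vectorized routine elsewhere. This is not the actual patch. To keep it std-only, the little-endian branch reuses the naive count where a real build would call something like `bytecount::count(haystack, b'\n')`.

```rust
/// Naive line counting: one comparison per byte.
fn count_lines_naive(haystack: &[u8]) -> usize {
    haystack.iter().filter(|&&b| b == b'\n').count()
}

#[cfg(target_endian = "big")]
fn count_lines(haystack: &[u8]) -> usize {
    // Avoid the vectorized path on big-endian machines.
    count_lines_naive(haystack)
}

#[cfg(target_endian = "little")]
fn count_lines(haystack: &[u8]) -> usize {
    // Placeholder: a real build would call a vectorized routine here.
    count_lines_naive(haystack)
}
```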
John Schmidt	f99b991117	ignore/types: add zig PR #1191	2019-02-08 08:12:40 -05:00
Andrew Gallant	de0bc78982	deps: bump encoding_rs to 0.8.16 This brings in an updated `encoding_rs` crate that uses `packed_simd`, which compiles on the latest nightly. Compilation times do appear to be impacted significantly though. Fixes #1175 (again)	2019-02-07 17:05:14 -05:00


1225 Commits