ripgrep

mirror of https://github.com/BurntSushi/ripgrep.git synced 2025-12-20 14:15:43 +02:00

Author	SHA1	Message	Date
Andrew Gallant	f97cc623f7	grep-0.2.7	2020-05-29 09:17:24 -04:00
Andrew Gallant	f35de5c523	grep: update minimal dependency versions	2020-05-29 09:17:08 -04:00
Andrew Gallant	c9bb78ceba	grep-cli-0.1.5	2020-05-29 09:14:18 -04:00
Andrew Gallant	72bdde6771	ignore-0.4.16	2020-05-29 09:13:02 -04:00
Andy Salerno	e8822ce97a	ignore/doc: update misleading documentation This likely originated from a bad copy/paste. PR #1596	2020-05-24 23:12:53 -04:00
Andrew Gallant	a700b75843	doc: clarify capture group indices And in particular, note the special $0 index, which corresponds to the entire match. Fixes #1591	2020-05-21 22:22:51 -04:00
Gerion Entrup	b72ad8f8aa	ignore/types: add meson filetype Closes #1586, PR #1587	2020-05-18 14:01:35 -04:00
Andrew Gallant	1980630f17	doc: fix egregious markup output We use '+++' syntax to output a literal '*' for a '--glob' example. This '+++' syntax is pretty ugly when rendered literally via --help. We fix this by hackily inserting the '+++' syntax for the one specific case where we need it during man page generation. Not ideal but it works. And --help still has some 'foo*' markup, but we live with that for now. Fixes #1581	2020-05-13 08:13:05 -04:00
Andrew Gallant	72807462e8	deps: update minimal versions for dependencies	2020-05-09 10:39:43 -04:00
Andrew Gallant	08dee094dd	grep-0.2.6	2020-05-09 10:37:29 -04:00
Andrew Gallant	caa53b7b09	grep: update minimal dependency versions	2020-05-09 10:37:08 -04:00
Andrew Gallant	c5d6141562	grep-printer-0.1.5	2020-05-09 10:33:02 -04:00
Andrew Gallant	c0f0492b98	grep-regex-0.1.8	2020-05-09 10:31:29 -04:00
Andrew Gallant	568018386b	ignore-0.4.15	2020-05-09 10:27:19 -04:00
Andrew Gallant	b458cf39f2	deps: update to base64 0.12 No code changes were necessary.	2020-05-09 10:25:37 -04:00
Casey Rodarmor	793c1179cc	ignore: allow filtering with predicate Adds `WalkBuilder::filter_entry` that takes a predicate to be applied to all entries. If the predicate returns `false` on a given entry, that entry and all children will be skipped. Fixes #1555, Closes #1557	2020-05-08 23:24:40 -04:00
Wieland Hoffmann	df7a3bfc7f	grep-cli: support files compressed by compress(1) While Linux distributions (at least Arch Linux, RHEL, Debian) do not support compressing files with compress(1), macOS & AIX do (the utility is part of POSIX). Additionally, gzip is able to uncompress such compressed files and provides an `uncompress` binary. Closes #1547	2020-05-08 23:24:40 -04:00
Andrew Gallant	28f2a93cae	doc: shorten -h/--help prelude It has grown quite long. It would be nice if we could shorten this only when -h is used and keep it long for --help, but it seems clap doesn't let this happen. (It does have `about` and `long_about` options, but they don't work, even when I disable the use of the template.) The longer prelude is now only available in the man page. This addresses #189.	2020-05-08 23:24:40 -04:00
Andrew Gallant	64a4dee495	cli: improve invalid UTF-8 pattern error message When a pattern with invalid UTF-8 is given, the error message suggests unqualified use of hex escape sequences to match arbitrary bytes. But you also need to disable Unicode mode. So include that in the error message. Fixes #1339	2020-05-08 23:24:40 -04:00
Andrew Gallant	50840ea43b	doc: note how to escape a '$' in --replace Fixes #1524	2020-05-08 23:24:40 -04:00
Andrew Gallant	17dcc2bf51	doc: clarify that files override gitignores This attempts to fix some mild confusion that came up as part of #1574. Specifically: https://github.com/BurntSushi/ripgrep/issues/1574#issuecomment-625780436	2020-05-08 23:24:40 -04:00
Andrew Gallant	9a858e4909	doc: add config file note for --type-{add,clear} This clarifies that persistence is possible via a configuration file. Fixes #1571	2020-05-08 23:24:40 -04:00
Andrew Gallant	7ed9a31819	printer: fix --count-matches output In order to implement --count-matches, we simply re-execute the regex on the spans reported by the searcher. The spans always correspond to the lines that participated in the match. This is the correct thing to do, except when the regex contains look-ahead (or look-behind). In particular, the look-around permits the regex's match success to depend on an arbitrary point before or after the lines actually reported as participating in the match. Since only the matched lines are reported to the printer, it is possible for subsequent searching on those lines to fail. A true fix for this would somehow make the total span available to the printer. But that seems tricky since it isn't always available. For PCRE2's case in multiline mode, it is available because we force it to be so for correctness. For now, we simply detect this corner case heuristically. If the match count is zero, then it necessarily means there is some kind of look-around that isn't matching. So we set the match count to 1. This is probably incorrect in some cases, although my brain can't quite come up with a concrete example. Nevertheless, this is strictly better than the status quo. Fixes #1573	2020-05-08 23:24:40 -04:00
Andrew Gallant	139f186e57	crates/ignore: switch to depth first traversal This replaces the use of channels in the parallel directory traversal with a simple stack. The primary motivation for this change is to reduce peak memory usage. In particular, when using a channel (which is a queue), we wind up visiting files in a breadth first fashion. Using a stack switches us to a depth first traversal. While there are no real intrinsic differences, depth first traversal generally tends to use less memory because directory trees are more commonly wide than they are deep. In particular, the queue/stack size itself is not the only concern. In one recent case documented in #1550, a user wanted to search all Rust crates. The directory structure was shallow but extremely wide, with a single directory containing all crates. This in turn results in descending into each of those directories and building a gitignore matcher for each (since most crates have `.gitignore` files) before ever searching a single file. This means that ripgrep has all such matchers in memory simultaneously, which winds up using quite a bit of memory. In a depth first traversal, peak memory usage is much lower because gitignore matchers are built and discarded more quickly. In the case of searching all crates, the peak memory usage decrease is dramatic. On my system, it shrinks by an order of magnitude, from almost 1GB to 50MB. The decline in peak memory usage is consistent across other use cases as well, but is typically more modest. For example, searching the Linux repo has a 50% decrease in peak memory usage and searching the Chromium repo has a 25% decrease in peak memory usage. Search times generally remain unchanged, although some ad hoc benchmarks that I typically run have gotten a bit slower. As far as I can tell, this appears to be the result of scheduling changes. Namely, the depth first traversal seems to result in searching some very large files towards the end of the search, which reduces the effectiveness of parallelism and makes the overall search take longer. This seems to suggest that a stack isn't optimal. It would instead perhaps be better to prioritize searching larger files first, but it's not quite clear how to do this without introducing more overhead (getting the file size for each file requires a stat call). Fixes #1550	2020-04-18 11:33:03 -04:00
Andrew Gallant	a75b4d122a	doc: fix newline escape Fixes #1551	2020-04-13 08:49:27 -04:00
Andrew Gallant	1c4b5adb7b	regex: fix another inner literal bug It looks like `is_simple` wasn't quite correct. I can't wait until this code is rewritten. It is still not quite clearly correct to me. Fixes #1537	2020-04-01 20:37:48 -04:00
Marius Schulz	3d6a58faff	doc: fix typo in help description PR #1536	2020-03-30 17:31:16 -04:00
Andrew Gallant	09a4b75baf	ignore-0.4.14	2020-03-29 18:49:01 -04:00
Zoltan Puskas	4dfea016b9	ignore/types: add ebuild type Add support for Gentoo's portage package manager spec files: https://wiki.gentoo.org/wiki/Portage	2020-03-29 18:44:04 -04:00
Andrew Gallant	67c0f576b6	ignore-0.4.13	2020-03-22 21:08:37 -04:00
Andrew Gallant	543f99dbf1	grep-regex-0.1.7	2020-03-22 21:08:19 -04:00
Andrew Gallant	0ea65efd6d	regex: special case literal extraction In a prior commit, we fixed a performance problem with the -w flag by doing a little extra work to extract literals. It turns out that using literals in this case when the -w flag is NOT used results in a performance regression. The reasoning is that we end up using a "fast" regex as a prefilter when the regex engine itself uses its own equivalent prefilter, so ripgrep ends up redoing a fair amount of work. Instead, we only do this extra work when we know the -w flag is enabled.	2020-03-22 21:02:51 -04:00
Andrew Gallant	8ba6ccd159	ignore: fix failing test This fixes fallout from fixing #1520.	2020-03-16 19:16:24 -04:00
Andrew Gallant	34edb8123a	ignore: squash noisy error message We should not assume that the commondir file actually exists. If it doesn't, then just move on. This otherwise emits an error message when searching normal submodules, which is not OK. This regression was introduced in #1446. Fixes #1520	2020-03-16 18:50:02 -04:00
Andrew Gallant	92daa34eb3	ripgrep: release 12.0.0	2020-03-15 21:42:54 -04:00
Andrew Gallant	e772a95b58	regex: avoid using literal optimizations when whitespace is detected If a literal is entirely whitespace, then it's quite likely that it is very common. So when that case occurs, just don't do (inner) literal optimizations at all. The regex engine may still make sub-optimal decisions here, but that's a problem for another day. Fixes #1087	2020-03-15 13:19:14 -04:00
Andrew Gallant	9dd4bf8d7f	style: fix rust-analyzer lint warnings	2020-03-15 13:19:14 -04:00
Andrew Gallant	c4c43c733e	cli: add --no-ignore-files flag The purpose of this flag is to force ripgrep to ignore all --ignore-file flags (whether they come before or after --no-ignore-files). This flag can be overridden with --ignore-files. Fixes #1466	2020-03-15 13:19:14 -04:00
Andrew Gallant	447506ebe0	doc: clarify globbing behavior Fixes #1442, Fixes #1478	2020-03-15 13:19:14 -04:00
Andrew Gallant	12e4180985	doc: remove CPU features from man pages It doesn't really belong in the man page since it's an artifact of a build/runtime configuration. Moreover, it inhibits reproducible builds. Fixes #1441	2020-03-15 13:19:14 -04:00
Andrew Gallant	daa8319398	doc: note ripgrep's stdin behavior Fixes #1439	2020-03-15 13:19:14 -04:00
pierrenn	3a6a24a52a	cli: add engine flag This permits switching between the different regex engine modes that ripgrep supports. The purpose of this flag is to make it easier to extend ripgrep with additional regex engines. Closes #1488, Closes #1502	2020-03-15 09:30:58 -04:00
pierrenn	aab3d80374	args: refactor to permit adding other engines This is in preparation for adding a new --engine flag which is intended to eventually supplant --auto-hybrid-regex. While there are no immediate plans to add more regex engines to ripgrep, this is intended to make it easier to maintain a patch to ripgrep with an additional regex engine. See #1488 for more details.	2020-03-15 09:24:28 -04:00
Andrew Gallant	1856cda77b	style: fix rust-analyzer lints in core	2020-03-15 09:04:54 -04:00
chip	50d2047ae2	crates: update URLs in Cargo.toml This corrects an oversight when the repo was re-organized to have its crates moved into a 'crates' sub-directory. PR #1505	2020-02-28 20:31:43 -05:00
Wolf Honore	227436624f	ignore/types: add coq type PR #1504	2020-02-28 19:11:29 -05:00
Lucien Greathouse	db7a8cdcb5	globset: Implement serde::{Serialize, Deserialize} for Glob PR #1492	2020-02-21 07:40:47 -05:00
Andrew Gallant	4176050cdd	ignore: another simplification Again, thanks to @zsugabubus!	2020-02-20 17:26:34 -05:00
Andrew Gallant	109460fce2	ignore: simplify parallel worker initialization We can just ask the channel whether any work has been loaded. Normally querying a channel for its length is a strong predictor of bugs, but in this case, we do it before we ever attempt a `recv`, so it should work. Kudos to @zsugabubus for suggesting this!	2020-02-20 16:50:41 -05:00
Andrew Gallant	f314b0d55f	ignore: fix parallel traversal It turns out that the previous version wasn't quite correct. Namely, it was possible for the following sequence to occur: 1. Consider that all workers, except for one, are `waiting`. 2. The last remaining worker finds one more job to do and sends it on the channel. 3. One of the previously `waiting` workers wakes up from the job that the last running worker sent, but `self.resume()` has not been called yet. 4. The last worker, from (2), calls `get_work` and sees that the channel has nothing on it, so it executes `self.waiting() == 1`. Since the worker in (3) hasn't called `self.resume()` yet, `self.waiting() == 1` evaluates to true. 5. This sets off a chain reaction that stops all workers, despite the fact that (3) got more work (which could itself spawn more work). The end result is that the traversal may terminate while there are still outstanding work items to process. This problem was observed through spurious failures in CI. I was not actually able to reproduce the bug locally. We fix this by changing our strategy to detect termination using a counter. Namely, we increment the counter just before sending new work and decrement the counter just after finishing work. In this way, we guarantee that the counter only ever reaches 0 once there is no more work to process. See #1337 for more discussion. Many thanks to @zsugabubus for helping me work through this.	2020-02-20 16:07:51 -05:00
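
Commits a700b75843 and 50840ea43b document how `--replace` refers to capture groups: `$0` is the entire match, `$1` and up are the numbered groups, and in ripgrep a literal `$` is written as `$$`. The sketch below models that rule for numeric groups only. The function `interpolate` is hypothetical, not ripgrep's API, and it deliberately skips named groups (`$name`, `${name}`), which real interpolation also supports.

```rust
/// Hypothetical, simplified `--replace` interpolation.
/// `caps[0]` is the entire match and `caps[1..]` are the capture groups.
/// Only numeric references (`$0`, `$1`, ...) and the `$$` escape are handled.
fn interpolate(replacement: &str, caps: &[&str]) -> String {
    let mut out = String::new();
    let mut chars = replacement.chars().peekable();
    while let Some(c) = chars.next() {
        if c != '$' {
            out.push(c);
            continue;
        }
        // `$$` produces one literal `$`.
        if chars.peek() == Some(&'$') {
            chars.next();
            out.push('$');
            continue;
        }
        // Gather the digits of a numeric group reference.
        let mut digits = String::new();
        while let Some(&d) = chars.peek() {
            if !d.is_ascii_digit() {
                break;
            }
            digits.push(d);
            chars.next();
        }
        if digits.is_empty() {
            // A `$` not followed by a digit is kept as-is in this sketch.
            out.push('$');
        } else if let Ok(i) = digits.parse::<usize>() {
            // Groups that do not exist expand to the empty string.
            out.push_str(caps.get(i).copied().unwrap_or(""));
        }
    }
    out
}

fn main() {
    // Suppose the pattern `(\w+)@(\w+)` matched "alice@example".
    let caps = ["alice@example", "alice", "example"];
    println!("{}", interpolate("$2: $1 (was $0), costs $$5", &caps));
}
```

Running it prints `example: alice (was alice@example), costs $5`: `$0` expands to the whole match, while `$$5` yields the literal text `$5` rather than a reference to group 5.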
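
Commit 793c1179cc adds `WalkBuilder::filter_entry`, whose predicate prunes an entry together with all of its children when it returns `false`. To stay self-contained, the sketch below does not use the `ignore` crate. It models the same pruning rule over a hypothetical in-memory tree, and `Node`, `node`, `walk` and `sample_tree` are all illustrative names.

```rust
/// A hypothetical in-memory directory entry, standing in for the filesystem.
struct Node {
    name: &'static str,
    children: Vec<Node>,
}

fn node(name: &'static str, children: Vec<Node>) -> Node {
    Node { name, children }
}

/// Depth first walk that records each visited path. When `keep` returns
/// false for an entry, the entry and its whole subtree are skipped.
fn walk<F: Fn(&str) -> bool>(entry: &Node, prefix: &str, keep: &F, out: &mut Vec<String>) {
    if !keep(entry.name) {
        return; // prune: children of this entry are never visited
    }
    let path = if prefix.is_empty() {
        entry.name.to_string()
    } else {
        format!("{}/{}", prefix, entry.name)
    };
    out.push(path.clone());
    for child in &entry.children {
        walk(child, &path, keep, out);
    }
}

fn sample_tree() -> Node {
    node("proj", vec![
        node("src", vec![node("main.rs", vec![])]),
        node("target", vec![node("debug", vec![node("app", vec![])])]),
        node("README.md", vec![]),
    ])
}

fn main() {
    let mut out = Vec::new();
    walk(&sample_tree(), "", &|name: &str| name != "target", &mut out);
    for path in &out {
        println!("{}", path);
    }
}
```

Because the predicate is checked before recursing, nothing under `target` is ever visited, which is cheaper than filtering the results after a full walk.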
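
Commit df7a3bfc7f teaches grep-cli to search files produced by compress(1). A minimal sketch of the idea, assuming decompression is selected by file extension and delegated to an external program that writes to stdout: `.Z` files go to `uncompress -c`, a binary that gzip also provides. The table and `decompressor_for` are hypothetical; grep-cli's actual matcher and default command list may differ.

```rust
use std::path::Path;

/// Hypothetical table: (file extension, program, arguments). Each program is
/// expected to write the decompressed data to stdout.
const DECOMPRESSORS: &[(&str, &str, &[&str])] = &[
    ("gz", "gzip", &["-d", "-c"]),
    // Files produced by compress(1); gzip also ships an `uncompress` binary.
    ("Z", "uncompress", &["-c"]),
];

/// Returns the program and arguments to use for `path`, if any.
fn decompressor_for(path: &Path) -> Option<(&'static str, &'static [&'static str])> {
    let ext = path.extension()?.to_str()?;
    DECOMPRESSORS
        .iter()
        .find(|(e, _, _)| *e == ext)
        .map(|&(_, program, args)| (program, args))
}

fn main() {
    for name in ["access.log.Z", "notes.txt"] {
        match decompressor_for(Path::new(name)) {
            // A real tool would spawn the program, e.g. with
            // std::process::Command::new(program).args(args).arg(name).
            Some((program, args)) => println!("{}: {} {}", name, program, args.join(" ")),
            None => println!("{}: search as-is", name),
        }
    }
}
```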
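
Commit 7ed9a31819 explains that `--count-matches` re-runs the regex over the lines the searcher reported, and that look-around can make this re-count come out as zero even though the searcher did find a match. The heuristic it adopts, clamping a zero re-count to 1, can be sketched as follows. `count_matches_in_span` is a hypothetical name, and substring counting stands in for the regex.

```rust
/// Hypothetical sketch of the heuristic: `count_in` re-counts matches inside
/// the span the searcher reported. If the searcher reported a match but the
/// re-count finds none (e.g. because look-around needed context outside the
/// span), report 1 instead of 0.
fn count_matches_in_span<F: Fn(&str) -> usize>(span: &str, count_in: F) -> usize {
    count_in(span).max(1)
}

fn main() {
    // Substring counting stands in for re-running the regex.
    let by_substring = |s: &str| s.matches("foo").count();
    println!("{}", count_matches_in_span("foo bar foo", by_substring));
    // A matcher whose success depended on text outside the span finds nothing.
    let needs_context = |_: &str| 0usize;
    println!("{}", count_matches_in_span("bar", needs_context));
}
```

As the commit notes, this can still undercount when several look-around matches share a span; the clamp only guarantees that a reported match is never counted as zero.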
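
Commit 139f186e57 hinges on the difference between a queue (breadth first) and a stack (depth first) of pending directory entries. The sketch below runs both over a hypothetical wide, shallow tree and reports the visit order plus the peak number of pending entries. That peak is only a rough proxy for the memory the commit measured; in ripgrep the real cost is the gitignore matchers kept alive per directory.

```rust
use std::collections::VecDeque;

/// Traverses a hypothetical tree given as adjacency lists (`children[i]` holds
/// the child indices of node `i`; node 0 is the root). Popping from the back
/// makes `pending` a stack (depth first); popping from the front makes it a
/// queue (breadth first). Returns the visit order and the peak length of
/// `pending`.
fn traverse(children: &[Vec<usize>], depth_first: bool) -> (Vec<usize>, usize) {
    let mut pending = VecDeque::from(vec![0]);
    let mut order = Vec::new();
    let mut peak = pending.len();
    loop {
        let next = if depth_first { pending.pop_back() } else { pending.pop_front() };
        let node = match next {
            Some(node) => node,
            None => break,
        };
        order.push(node);
        pending.extend(children[node].iter().copied());
        peak = peak.max(pending.len());
    }
    (order, peak)
}

fn main() {
    // A root with three "crate" directories, each holding two files.
    let tree = vec![
        vec![1, 2, 3],
        vec![4, 5], vec![6, 7], vec![8, 9],
        vec![], vec![], vec![], vec![], vec![], vec![],
    ];
    println!("breadth first: {:?}", traverse(&tree, false));
    println!("depth first:   {:?}", traverse(&tree, true));
}
```

On this tiny tree the breadth first run peaks at 6 pending entries and the depth first run at 4; the gap widens as the tree gets wider. The stack also visits siblings in reverse push order, which is one reason scheduling (and thus the order in which large files are searched) changes.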
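
Commit 34edb8123a stops treating a missing `commondir` file as an error. A minimal sketch of that pattern in plain Rust: a `NotFound` error becomes `Ok(None)`, while any other I/O error is still reported. `read_commondir` is a hypothetical helper, and the sketch assumes, as git does for worktrees, that a relative path in `commondir` is resolved against the git directory.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: returns the path named by `<git_dir>/commondir`, or
/// `Ok(None)` when that file does not exist (the normal case for ordinary
/// repositories). Other I/O errors are still reported.
fn read_commondir(git_dir: &Path) -> io::Result<Option<PathBuf>> {
    match fs::read_to_string(git_dir.join("commondir")) {
        // A relative path is resolved against the git directory; joining an
        // absolute path simply yields that absolute path.
        Ok(contents) => Ok(Some(git_dir.join(contents.trim_end()))),
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(err) => Err(err),
    }
}

fn main() {
    // Hypothetical location of a repository's git directory.
    match read_commondir(Path::new("path/to/repo/.git")) {
        Ok(Some(dir)) => println!("common dir: {}", dir.display()),
        Ok(None) => println!("no commondir file; nothing to do"),
        Err(err) => eprintln!("error: {}", err),
    }
}
```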
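
Commit e772a95b58 disables inner literal optimizations when the extracted literal consists entirely of whitespace, because such a literal is likely to occur on most lines and would make a poor prefilter. The check itself is small. `worth_using_as_inner_literal` is a hypothetical name, and also rejecting the empty literal is an extra assumption of this sketch.

```rust
/// Hypothetical check: an inner literal that is empty or made up entirely of
/// whitespace is likely to appear on most lines, so it makes a poor
/// prefilter and inner literal optimizations are skipped for it.
fn worth_using_as_inner_literal(lit: &str) -> bool {
    !lit.is_empty() && !lit.chars().all(char::is_whitespace)
}

fn main() {
    for lit in ["    ", "\t", " fn ", "TODO"] {
        println!("{:?} -> {}", lit, worth_using_as_inner_literal(lit));
    }
}
```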
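
Commit c4c43c733e specifies that `--no-ignore-files` disables every `--ignore-file` flag regardless of whether it comes before or after them, and that `--ignore-files` can override it. The sketch below models that precedence over a hypothetical pre-split argument list. It assumes last-one-wins between the two toggles and handles only the space-separated `--ignore-file PATH` form; `effective_ignore_files` is an illustrative name.

```rust
/// Hypothetical model of the flag interaction. `--ignore-file PATH` may be
/// repeated; whether those paths are used depends only on which of
/// `--no-ignore-files` / `--ignore-files` appeared last.
fn effective_ignore_files(args: &[&str]) -> Vec<String> {
    let mut files = Vec::new();
    let mut enabled = true;
    let mut it = args.iter();
    while let Some(&arg) = it.next() {
        match arg {
            "--ignore-file" => {
                if let Some(&path) = it.next() {
                    files.push(path.to_string());
                }
            }
            "--no-ignore-files" => enabled = false,
            "--ignore-files" => enabled = true,
            _ => {} // other flags and patterns are irrelevant here
        }
    }
    if enabled { files } else { Vec::new() }
}

fn main() {
    let args = ["--no-ignore-files", "--ignore-file", "extra.ignore", "foo"];
    println!("{:?}", effective_ignore_files(&args));
}
```

Collecting paths first and applying the toggle at the end is what makes the position of `--no-ignore-files` irrelevant relative to the `--ignore-file` flags.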
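
Commit 3a6a24a52a adds an `--engine` flag for switching between ripgrep's regex engine modes. One common way to model such a flag in Rust is an enum parsed through `FromStr`, so that supporting another engine means adding one variant and one match arm. The value names `default`, `pcre2` and `auto` are an assumption in this sketch; `rg --help` is authoritative for a given version.

```rust
use std::str::FromStr;

/// Hypothetical model of an `--engine` value; the variant names are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Engine {
    Default,
    Pcre2,
    Auto,
}

impl FromStr for Engine {
    type Err = String;

    fn from_str(s: &str) -> Result<Engine, String> {
        match s {
            "default" => Ok(Engine::Default),
            "pcre2" => Ok(Engine::Pcre2),
            "auto" => Ok(Engine::Auto),
            other => Err(format!("unrecognized regex engine: {}", other)),
        }
    }
}

fn main() {
    for value in ["auto", "pcre2", "perl"] {
        match value.parse::<Engine>() {
            Ok(engine) => println!("{} -> {:?}", value, engine),
            Err(err) => println!("error: {}", err),
        }
    }
}
```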
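
Commit f314b0d55f replaces a "last worker waiting" check with a counter of outstanding work: increment just before sending an item, decrement just after finishing one, and stop only when the counter is zero. The sketch below applies that rule to a hypothetical workload in which each item spawns up to two children. It uses a mutex-protected `Vec` as the shared stack instead of the channel discussed in the commit, and the names `Shared`, `children` and `run` are illustrative.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical workload: item `n` spawns items `n - 1` and `n - 2` (when
/// they are non-negative), standing in for a directory with subdirectories.
fn children(n: u32) -> Vec<u32> {
    let mut out = Vec::new();
    if n >= 1 { out.push(n - 1); }
    if n >= 2 { out.push(n - 2); }
    out
}

struct Shared {
    stack: Mutex<Vec<u32>>,
    /// Items that have been sent but not yet fully processed.
    pending: AtomicUsize,
    processed: AtomicUsize,
}

/// Processes `root` and everything it spawns on `workers` threads and
/// returns the number of items processed.
fn run(root: u32, workers: usize) -> usize {
    let shared = Arc::new(Shared {
        stack: Mutex::new(vec![root]),
        pending: AtomicUsize::new(1), // the root counts as outstanding
        processed: AtomicUsize::new(0),
    });
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || loop {
                // The lock guard is dropped at the end of this statement.
                let item = shared.stack.lock().unwrap().pop();
                match item {
                    Some(n) => {
                        for child in children(n) {
                            // Increment *before* the child becomes visible.
                            shared.pending.fetch_add(1, Ordering::SeqCst);
                            shared.stack.lock().unwrap().push(child);
                        }
                        shared.processed.fetch_add(1, Ordering::SeqCst);
                        // Decrement only *after* this item is fully handled.
                        shared.pending.fetch_sub(1, Ordering::SeqCst);
                    }
                    // Empty stack: quit only if no work is outstanding anywhere.
                    None if shared.pending.load(Ordering::SeqCst) == 0 => break,
                    // Another worker may still push more work; try again.
                    None => thread::yield_now(),
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    shared.processed.load(Ordering::SeqCst)
}

fn main() {
    println!("processed {} items", run(10, 4));
}
```

The counter can only rise while some worker is still processing an item that is itself counted, so once it reaches zero it stays zero. That is why no worker can exit while work is outstanding, which was the failure mode of the old check. With these definitions `run(10, 4)` always processes 232 items, whatever the thread interleaving.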

53 Commits