ripgrep

mirror of https://github.com/BurntSushi/ripgrep.git synced 2025-03-03 14:32:22 +02:00

Author	SHA1	Message	Date
Andrew Gallant	72bdde6771	ignore-0.4.16	2020-05-29 09:13:02 -04:00
Gerion Entrup	b72ad8f8aa	ignore/types: add meson filetype Closes #1586, PR #1587	2020-05-18 14:01:35 -04:00
Andrew Gallant	72807462e8	deps: update minimal versions for dependencies	2020-05-09 10:39:43 -04:00
Andrew Gallant	568018386b	ignore-0.4.15	2020-05-09 10:27:19 -04:00
Casey Rodarmor	793c1179cc	ignore: allow filtering with predicate Adds `WalkBuilder::filter_entry` that takes a predicate to be applied to all entries. If the predicate returns `false` on a given entry, that entry and all children will be skipped. Fixes #1555, Closes #1557	2020-05-08 23:24:40 -04:00
Wieland Hoffmann	df7a3bfc7f	grep-cli: support files compressed by compress(1) While Linux distributions (at least Arch Linux, RHEL, Debian) do not support compressing files with compress(1), macOS & AIX do (the utility is part of POSIX). Additionally, gzip is able to uncompress such compressed files and provides an `uncompress` binary. Closes #1547	2020-05-08 23:24:40 -04:00
Andrew Gallant	139f186e57	crates/ignore: switch to depth first traversal This replaces the use of channels in the parallel directory traversal with a simple stack. The primary motivation for this change is to reduce peak memory usage. In particular, when using a channel (which is a queue), we wind up visiting files in a breadth first fashion. Using a stack switches us to a depth first traversal. While there are no real intrinsic differences, depth first traversal generally tends to use less memory because directory trees are more commonly wide than they are deep. In particular, the queue/stack size itself is not the only concern. In one recent case documented in #1550, a user wanted to search all Rust crates. The directory structure was shallow but extremely wide, with a single directory containing all crates. This in turn results in descending into each of those directories and building a gitignore matcher for each (since most crates have `.gitignore` files) before ever searching a single file. This means that ripgrep has all such matchers in memory simultaneously, which winds up using quite a bit of memory. In a depth first traversal, peak memory usage is much lower because gitignore matchers are built and discarded more quickly. In the case of searching all crates, the peak memory usage decrease is dramatic. On my system, it shrinks by an order of magnitude, from almost 1GB to 50MB. The decline in peak memory usage is consistent across other use cases as well, but is typically more modest. For example, searching the Linux repo has a 50% decrease in peak memory usage and searching the Chromium repo has a 25% decrease in peak memory usage. Search times generally remain unchanged, although some ad hoc benchmarks that I typically run have gotten a bit slower. As far as I can tell, this appears to be the result of scheduling changes.
Namely, the depth first traversal seems to result in searching some very large files towards the end of the search, which reduces the effectiveness of parallelism and makes the overall search take longer. This seems to suggest that a stack isn't optimal. It would instead perhaps be better to prioritize searching larger files first, but it's not quite clear how to do this without introducing more overhead (getting the file size for each file requires a stat call). Fixes #1550	2020-04-18 11:33:03 -04:00
Andrew Gallant	09a4b75baf	ignore-0.4.14	2020-03-29 18:49:01 -04:00
Zoltan Puskas	4dfea016b9	ignore/types: add ebuild type Add support for Gentoo's portage package manager spec files: https://wiki.gentoo.org/wiki/Portage	2020-03-29 18:44:04 -04:00
Andrew Gallant	67c0f576b6	ignore-0.4.13	2020-03-22 21:08:37 -04:00
Andrew Gallant	8ba6ccd159	ignore: fix failing test This fixes fallout from fixing #1520.	2020-03-16 19:16:24 -04:00
Andrew Gallant	34edb8123a	ignore: squash noisy error message We should not assume that the commondir file actually exists. If it doesn't, then just move on. This otherwise emits an error message when searching normal submodules, which is not OK. This regression was introduced in #1446. Fixes #1520	2020-03-16 18:50:02 -04:00
Andrew Gallant	92daa34eb3	ripgrep: release 12.0.0	2020-03-15 21:42:54 -04:00
chip	50d2047ae2	crates: update URLs in Cargo.toml This corrects an oversight when the repo was re-organized to have its crates moved into a 'crates' sub-directory. PR #1505	2020-02-28 20:31:43 -05:00
Wolf Honore	227436624f	ignore/types: add coq type PR #1504	2020-02-28 19:11:29 -05:00
Andrew Gallant	4176050cdd	ignore: another simplification Again, thanks to @zsugabubus!	2020-02-20 17:26:34 -05:00
Andrew Gallant	109460fce2	ignore: simplify parallel worker initialization We can just ask the channel whether any work has been loaded. Normally querying a channel for its length is a strong predictor of bugs, but in this case, we do it before we ever attempt a `recv`, so it should work. Kudos to @zsugabubus for suggesting this!	2020-02-20 16:50:41 -05:00
Andrew Gallant	f314b0d55f	ignore: fix parallel traversal It turns out that the previous version wasn't quite correct. Namely, it was possible for the following sequence to occur: 1. Consider that all workers, except for one, are `waiting`. 2. The last remaining worker finds one more job to do and sends it on the channel. 3. One of the previously `waiting` workers wakes up from the job that the last running worker sent, but `self.resume()` has not been called yet. 4. The last worker, from (2), calls `get_work` and sees that the channel has nothing on it, so it executes `self.waiting() == 1`. Since the worker in (3) hasn't called `self.resume()` yet, `self.waiting() == 1` evaluates to true. 5. This sets off a chain reaction that stops all workers, despite the fact that (3) got more work (which could itself spawn more work). The end result is that the traversal may terminate while there are still outstanding work items to process. This problem was observed through spurious failures in CI. I was not actually able to reproduce the bug locally. We fix this by changing our strategy to detect termination using a counter. Namely, we increment the counter just before sending new work and decrement the counter just after finishing work. In this way, we guarantee that the counter only ever reaches 0 once there is no more work to process. See #1337 for more discussion. Many thanks to @zsugabubus for helping me work through this.	2020-02-20 16:07:51 -05:00
asymmetric	b44554c803	ignore/types: add K type Adds support for files used by the K executable semantic framework: http://www.kframework.org/index.php/Main_Page PR #1493	2020-02-19 07:07:09 -05:00
Andrew Gallant	fdd8510fdd	repo: move all source code in crates directory The top-level listing was just getting a bit too long for my taste. So put all of the code in one directory and shrink the large top-level mess to a small top-level mess. NOTE: This commit only contains renames. The subsequent commit will actually make ripgrep build again. We do it this way with the naive hope that this will make it easier for git history to track the renames. Sigh.	2020-02-17 19:24:53 -05:00
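
Commit 139f186e57 argues that replacing a queue with a stack lowers peak memory on wide, shallow trees. The sketch below is a toy model of that claim only, not code from the `ignore` crate: `Node`, `wide_tree` and `peak_pending` are hypothetical names, the tree is synthetic, and the only thing measured is the number of pending work items. The commit notes that the real saving comes mostly from gitignore matchers being discarded sooner, which this model does not capture.

```rust
use std::collections::VecDeque;

/// A toy directory tree node. A node without children stands in for a file.
struct Node {
    children: Vec<Node>,
}

/// Build a shallow, wide tree: one root holding `dirs` directories,
/// each holding `files_per_dir` files.
fn wide_tree(dirs: usize, files_per_dir: usize) -> Node {
    let dir = || Node {
        children: (0..files_per_dir)
            .map(|_| Node { children: Vec::new() })
            .collect(),
    };
    Node {
        children: (0..dirs).map(|_| dir()).collect(),
    }
}

/// Walk the tree and return the largest number of pending entries seen.
/// Popping from the back treats the container as a stack (depth first);
/// popping from the front treats it as a queue (breadth first).
fn peak_pending(root: &Node, depth_first: bool) -> usize {
    let mut pending: VecDeque<&Node> = VecDeque::new();
    pending.push_back(root);
    let mut peak = pending.len();
    loop {
        let next = if depth_first {
            pending.pop_back()
        } else {
            pending.pop_front()
        };
        let node = match next {
            Some(node) => node,
            None => break,
        };
        for child in &node.children {
            pending.push_back(child);
        }
        peak = peak.max(pending.len());
    }
    peak
}

fn main() {
    let tree = wide_tree(100, 10);
    println!("breadth first peak: {}", peak_pending(&tree, false));
    println!("depth first peak:   {}", peak_pending(&tree, true));
}
```

In this model the queue ends up holding every file of every directory at once, while the stack finishes one directory before opening the next.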

20 Commits
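
Commit f314b0d55f describes detecting termination with a counter that is incremented just before new work is sent and decremented just after work is finished. The following is a minimal sketch of that idea under stated assumptions, not the crate's implementation: `traverse`, `outstanding` and `processed` are hypothetical names, work items are plain integers forming a binary tree, and a mutex-protected `Vec` with a yielding idle loop stands in for whatever queueing and waiting the real code uses.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Process a synthetic binary tree of work items with `workers` threads and
/// return how many items were processed. An item of depth `d > 0` spawns two
/// items of depth `d - 1`.
fn traverse(root_depth: u32, workers: usize) -> usize {
    let stack = Arc::new(Mutex::new(Vec::<u32>::new()));
    // Number of items that have been sent but not yet finished.
    let outstanding = Arc::new(AtomicUsize::new(0));
    let processed = Arc::new(AtomicUsize::new(0));

    // Increment before sending the first item.
    outstanding.fetch_add(1, Ordering::SeqCst);
    stack.lock().unwrap().push(root_depth);

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let stack = Arc::clone(&stack);
            let outstanding = Arc::clone(&outstanding);
            let processed = Arc::clone(&processed);
            thread::spawn(move || loop {
                // The lock guard is dropped at the end of this statement.
                let item = stack.lock().unwrap().pop();
                match item {
                    Some(depth) => {
                        processed.fetch_add(1, Ordering::SeqCst);
                        if depth > 0 {
                            for _ in 0..2 {
                                // Increment just before sending new work.
                                outstanding.fetch_add(1, Ordering::SeqCst);
                                stack.lock().unwrap().push(depth - 1);
                            }
                        }
                        // Decrement just after finishing this item.
                        outstanding.fetch_sub(1, Ordering::SeqCst);
                    }
                    None => {
                        // An empty stack alone does not mean the traversal is
                        // over: another worker may be about to send more work.
                        // Only a zero counter does.
                        if outstanding.load(Ordering::SeqCst) == 0 {
                            break;
                        }
                        thread::yield_now();
                    }
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    processed.load(Ordering::SeqCst)
}

fn main() {
    println!("processed {} items", traverse(10, 4));
}
```

Because a worker's own item stays counted until that worker has sent all of its children, the counter cannot reach 0 while any item is still in flight, which is the guarantee the commit message states.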