ripgrep

mirror of https://github.com/BurntSushi/ripgrep.git synced 2024-12-12 19:18:24 +02:00

Author	SHA1	Message	Date
Richard Khoury	a28e664abd	ignore: check ignore rules before issuing stat calls This seems like an obvious optimization, but it becomes critical when filesystem operations, even ones as simple as stat, can carry significant overhead; an example of this was a bespoke filesystem layer in Windows that hosted files remotely and would download them on-demand when particular filesystem operations occurred. Users of this system who ensured correct file-type filters were being used could still get unnecessary file access resulting in large downloads. Fixes #1657, Closes #1660	2021-05-31 21:51:18 -04:00
Raimon Grau	53c4855517	ignore/types: add red See: https://www.red-lang.org/ Closes #1663	2021-05-31 21:51:18 -04:00
Simon Morgan	121e0135c1	ignore/types: replace duplicate glob with .aspx.vb .aspx.cs was listed twice and the VB variant is missing. Closes #1683	2021-05-31 21:51:18 -04:00
Marco Ieni	b3a6a69f9d	ci: check docs for all crates This also replaces '--all' in Cargo commands with '--workspace'. The former has apparently been deprecated. We also fix a couple warnings that this new step detected. Closes #1848	2021-05-31 21:51:18 -04:00
jack1142	ba965962fe	ignore/types: add po files to supported types See: https://www.gnu.org/software/gettext/manual/html_node/PO-Files.html Closes #1875	2021-05-28 12:06:10 -04:00
jgart	4ebe8375ec	ignore/types: add mint PR #1844	2021-04-04 08:00:12 -04:00
Sergei Vorobev	9c8d873a75	ignore/types: improve bazel globs Adds .BUILD and .bazelrc. PR #1789	2021-01-30 18:22:48 -05:00
Ed Page	873abecbf1	ignore: provide underlying IO Error `ignore::Error` wraps `std::io::Error` with additional information (as well as exposing non-IO errors). People who want to inspect what the error is have to recursively match on the enum. This provides `io_error` and `into_io_error` helpers to do this for the user. PR #1740	2020-11-23 10:19:31 -05:00
James Harr	44e69ba627	ignore/types: add yang file type YANG is described in RFC 6020 https://tools.ietf.org/html/rfc6020 PR #1736	2020-11-20 09:41:29 -05:00
Vanessa McHale	e1ac18ef06	ignore/types: add Futhark See: https://futhark-lang.org/ PR #1720	2020-10-31 12:10:15 -04:00
Brandon Adams	ba3f9673ad	ignore/types: generalize bazel type a bit Bazel supports `BUILD.bazel` as well as `WORKSPACE.bazel`. In addition, it is common to ship BUILD/WORKSPACE templates for external repositories suffixed with .bazel for easier tool recognition. Co-authored-by: Brandon Adams <brandon.adams@imc.com> PR #1716	2020-10-23 12:24:30 -04:00
Dương Đỗ Minh Châu	86c843a44b	ignore/types: add a type for minified files Fixes #1710, PR #1711	2020-10-19 09:10:54 -04:00
Andrew Pyatkov	6301e20ee4	ignore/types: add flatbuffers type See: https://google.github.io/flatbuffers/ PR #1707	2020-10-16 20:19:16 -04:00
Andy Freeland	fc2a99bb1f	ignore/types: add vcl (#1659) VCL is the Varnish Configuration Language used by Varnish and Fastly. https://varnish-cache.org/docs/trunk/users-guide/vcl.html PR #1659	2020-08-19 16:28:14 -04:00
Raimon Grau (rgrau)	ffd4c9ccba	ignore/types: add racket PR #1628	2020-06-25 08:51:32 -04:00
jtrakk	a16bfcb3d6	ignore/types: add dvc This provides support for DVC files (https://dvc.org/). PR #1608	2020-06-09 07:44:09 -04:00
Martin Michlmayr	1b2c1dc675	doc: fix typos PR #1605	2020-06-04 09:06:09 -04:00
Gerion Entrup	b72ad8f8aa	ignore/types: add meson filetype Closes #1586, PR #1587	2020-05-18 14:01:35 -04:00
Casey Rodarmor	793c1179cc	ignore: allow filtering with predicate Adds `WalkBuilder::filter_entry` that takes a predicate to be applied to all entries. If the predicate returns `false` on a given entry, that entry and all children will be skipped. Fixes #1555, Closes #1557	2020-05-08 23:24:40 -04:00
Wieland Hoffmann	df7a3bfc7f	grep-cli: support files compressed by compress(1) While Linux distributions (at least Arch Linux, RHEL, Debian) do not support compressing files with compress(1), macOS & AIX do (the utility is part of POSIX). Additionally, gzip is able to uncompress such compressed files and provides an `uncompress` binary. Closes #1547	2020-05-08 23:24:40 -04:00
Andrew Gallant	139f186e57	crates/ignore: switch to depth first traversal This replaces the use of channels in the parallel directory traversal with a simple stack. The primary motivation for this change is to reduce peak memory usage. In particular, when using a channel (which is a queue), we wind up visiting files in a breadth first fashion. Using a stack switches us to a depth first traversal. While there are no real intrinsic differences, depth first traversal generally tends to use less memory because directory trees are more commonly wide than they are deep. In particular, the queue/stack size itself is not the only concern. In one recent case documented in #1550, a user wanted to search all Rust crates. The directory structure was shallow but extremely wide, with a single directory containing all crates. This in turn results in descending into each of those directories and building a gitignore matcher for each (since most crates have `.gitignore` files) before ever searching a single file. This means that ripgrep has all such matchers in memory simultaneously, which winds up using quite a bit of memory. In a depth first traversal, peak memory usage is much lower because gitignore matchers are built and discarded more quickly. In the case of searching all crates, the peak memory usage decrease is dramatic. On my system, it shrinks by an order of magnitude, from almost 1GB to 50MB. The decline in peak memory usage is consistent across other use cases as well, but is typically more modest. For example, searching the Linux repo has a 50% decrease in peak memory usage and searching the Chromium repo has a 25% decrease in peak memory usage. Search times generally remain unchanged, although some ad hoc benchmarks that I typically run have gotten a bit slower. As far as I can tell, this appears to be the result of scheduling changes. Namely, the depth first traversal seems to result in searching some very large files towards the end of the search, which reduces the effectiveness of parallelism and makes the overall search take longer. This seems to suggest that a stack isn't optimal. It would instead perhaps be better to prioritize searching larger files first, but it's not quite clear how to do this without introducing more overhead (getting the file size for each file requires a stat call). Fixes #1550	2020-04-18 11:33:03 -04:00
Zoltan Puskas	4dfea016b9	ignore/types: add ebuild type Add support for Gentoo's portage package manager spec files: https://wiki.gentoo.org/wiki/Portage	2020-03-29 18:44:04 -04:00
Andrew Gallant	8ba6ccd159	ignore: fix failing test This fixes fallout from fixing #1520.	2020-03-16 19:16:24 -04:00
Andrew Gallant	34edb8123a	ignore: squash noisy error message We should not assume that the commondir file actually exists. If it doesn't, then just move on. This otherwise emits an error message when searching normal submodules, which is not OK. This regression was introduced in #1446. Fixes #1520	2020-03-16 18:50:02 -04:00
Wolf Honore	227436624f	ignore/types: add coq type PR #1504	2020-02-28 19:11:29 -05:00
Andrew Gallant	4176050cdd	ignore: another simplification Again, thanks to @zsugabubus!	2020-02-20 17:26:34 -05:00
Andrew Gallant	109460fce2	ignore: simplify parallel worker initialization We can just ask the channel whether any work has been loaded. Normally querying a channel for its length is a strong predictor of bugs, but in this case, we do it before we ever attempt a `recv`, so it should work. Kudos to @zsugabubus for suggesting this!	2020-02-20 16:50:41 -05:00
Andrew Gallant	f314b0d55f	ignore: fix parallel traversal It turns out that the previous version wasn't quite correct. Namely, it was possible for the following sequence to occur: 1. Consider that all workers, except for one, are `waiting`. 2. The last remaining worker finds one more job to do and sends it on the channel. 3. One of the previously `waiting` workers wakes up from the job that the last running worker sent, but `self.resume()` has not been called yet. 4. The last worker, from (2), calls `get_work` and sees that the channel has nothing on it, so it executes `self.waiting() == 1`. Since the worker in (3) hasn't called `self.resume()` yet, `self.waiting() == 1` evaluates to true. 5. This sets off a chain reaction that stops all workers, despite the fact that (3) got more work (which could itself spawn more work). The end result is that the traversal may terminate while there are still outstanding work items to process. This problem was observed through spurious failures in CI. I was not actually able to reproduce the bug locally. We fix this by changing our strategy to detect termination using a counter. Namely, we increment the counter just before sending new work and decrement the counter just after finishing work. In this way, we guarantee that the counter only ever reaches 0 once there is no more work to process. See #1337 for more discussion. Many thanks to @zsugabubus for helping me work through this.	2020-02-20 16:07:51 -05:00
asymmetric	b44554c803	ignore/types: add K type Adds support for files used by the K executable semantic framework: http://www.kframework.org/index.php/Main_Page PR #1493	2020-02-19 07:07:09 -05:00
Andrew Gallant	fdd8510fdd	repo: move all source code into crates directory The top-level listing was just getting a bit too long for my taste. So put all of the code in one directory and shrink the large top-level mess to a small top-level mess. NOTE: This commit only contains renames. The subsequent commit will actually make ripgrep build again. We do it this way with the naive hope that this will make it easier for git history to track the renames. Sigh.	2020-02-17 19:24:53 -05:00

30 Commits
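Commit a28e664abd makes the walker consult ignore rules before issuing any stat call. A minimal sketch of why that ordering matters, assuming a hypothetical `is_ignored` matcher that only inspects the path string (so it is cheap) and a hypothetical `expensive_stat` that stands in for a costly filesystem query and merely counts how often it runs. This is not the `ignore` crate's code; it only illustrates the check-then-stat order.

```rust
use std::cell::Cell;
use std::path::Path;

/// Hypothetical ignore matcher: decides from the path alone, no I/O needed.
fn is_ignored(path: &Path) -> bool {
    path.extension().map_or(false, |ext| ext == "log")
}

/// Hypothetical stand-in for a stat call that may be very expensive
/// (e.g. a remote filesystem that downloads a file when it is touched).
/// It records each invocation in `calls`.
fn expensive_stat(_path: &Path, calls: &Cell<usize>) -> bool {
    calls.set(calls.get() + 1);
    true // pretend every entry is a regular file worth yielding
}

/// Cheap ignore rules first; only paths that survive them pay for a stat.
fn should_yield(path: &Path, calls: &Cell<usize>) -> bool {
    if is_ignored(path) {
        return false; // short-circuit: no filesystem access at all
    }
    expensive_stat(path, calls)
}
```

With the paths `a.rs`, `b.log`, `c.rs` and `d.log`, only the two non-ignored paths reach `expensive_stat`, so the underlying filesystem is queried twice rather than four times.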
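Commit 873abecbf1 adds `io_error` and `into_io_error` helpers so callers no longer have to recursively match on `ignore::Error` themselves. The sketch below shows that recursive unwrapping on a simplified, hypothetical `Error` enum with only a handful of variants; the real type has more, so treat this as an illustration of the idea rather than the crate's implementation.

```rust
use std::io;

/// Simplified, hypothetical stand-in for `ignore::Error`: some variants add
/// context around another error, some hold an I/O error, some are non-I/O.
#[derive(Debug)]
enum Error {
    Partial(Vec<Error>),
    WithPath { path: String, err: Box<Error> },
    WithLineNumber { line: u64, err: Box<Error> },
    Io(io::Error),
    Glob(String),
}

impl Error {
    /// Borrow the underlying I/O error, peeling off context wrappers.
    /// A `Partial` error only qualifies if it holds exactly one error.
    fn io_error(&self) -> Option<&io::Error> {
        match self {
            Error::Partial(errs) if errs.len() == 1 => errs[0].io_error(),
            Error::Partial(_) => None,
            Error::WithPath { err, .. } | Error::WithLineNumber { err, .. } => err.io_error(),
            Error::Io(err) => Some(err),
            Error::Glob(_) => None,
        }
    }

    /// Owned counterpart of `io_error`: consumes the error.
    fn into_io_error(self) -> Option<io::Error> {
        match self {
            Error::Partial(mut errs) if errs.len() == 1 => errs.pop().unwrap().into_io_error(),
            Error::Partial(_) => None,
            Error::WithPath { err, .. } | Error::WithLineNumber { err, .. } => (*err).into_io_error(),
            Error::Io(err) => Some(err),
            Error::Glob(_) => None,
        }
    }
}
```

A caller can then write `err.io_error().map(|e| e.kind())` regardless of how many context layers wrap the original `std::io::Error`.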
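Commit 793c1179cc describes `WalkBuilder::filter_entry`: a predicate applied to every entry, where returning `false` skips that entry and all of its children. The pruning semantics can be sketched on a hypothetical in-memory `Node` tree so the example runs without a filesystem; the `walk` function here is illustrative and is not the crate's walker.

```rust
/// Hypothetical in-memory directory tree used instead of a real filesystem.
struct Node {
    name: &'static str,
    children: Vec<Node>,
}

/// Recursive depth-first walk that applies `filter` to every entry.
/// When `filter` returns `false`, the entry is not reported and its
/// subtree is never descended into.
fn walk<F: Fn(&Node) -> bool>(node: &Node, path: &str, filter: &F, out: &mut Vec<String>) {
    for child in &node.children {
        if !filter(child) {
            continue; // prune: skip this entry and everything below it
        }
        let child_path = format!("{}/{}", path, child.name);
        out.push(child_path.clone());
        walk(child, &child_path, filter, out);
    }
}
```

Filtering on `name != "target"` over a tree containing `src/main.rs` and `target/debug` yields only `./src` and `./src/main.rs`; `./target/debug` is never visited because its parent was rejected.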
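Commit 139f186e57 swaps the parallel walker's channel (a queue) for a stack. The difference in visiting order can be seen in a toy, single-threaded model, assuming a hypothetical tree given as child-id lists and using the number of pending work items as a rough proxy for memory (in ripgrep the memory that mattered was the gitignore matchers held for directories already entered, which this model does not track). The same `VecDeque` acts as a queue or a stack depending on which end is popped.

```rust
use std::collections::VecDeque;

/// Walk a hypothetical tree where `children[i]` lists the ids of node `i`'s
/// children and node 0 is the root. Returns the visit order and the peak
/// number of pending work items.
fn traverse(children: &[Vec<usize>], depth_first: bool) -> (Vec<usize>, usize) {
    let mut work: VecDeque<usize> = VecDeque::new();
    work.push_back(0);
    let mut order = Vec::new();
    let mut peak = work.len();
    loop {
        // Stack: take the most recently pushed item. Queue: take the oldest.
        let next = if depth_first { work.pop_back() } else { work.pop_front() };
        let node = match next {
            Some(n) => n,
            None => break,
        };
        order.push(node);
        for &child in &children[node] {
            work.push_back(child);
        }
        peak = peak.max(work.len());
    }
    (order, peak)
}
```

For a root with three children of two leaves each, the queue peaks at 6 pending items and the stack at 4. With `N` children of `M` leaves the queue peaks at `N * M` while the stack stays near `N + M`, so the gap widens as the tree gets wider, matching the wide-and-shallow case from #1550.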
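Commit 34edb8123a stops treating a missing `commondir` file as an error. In git, the git directory of a linked worktree contains a `commondir` file pointing at the shared repository directory, while an ordinary repository or submodule typically has none. A hypothetical helper `read_commondir` that maps `NotFound` to `Ok(None)` but still reports every other I/O error might look like the following; it is a sketch, not the `ignore` crate's code.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: resolve `<git_dir>/commondir` if it exists.
/// A missing file is the normal case, so it yields `Ok(None)` quietly;
/// other failures (permissions, invalid UTF-8, ...) are still returned.
fn read_commondir(git_dir: &Path) -> io::Result<Option<PathBuf>> {
    match fs::read_to_string(git_dir.join("commondir")) {
        Ok(contents) => {
            // The file usually holds a relative path plus a trailing newline.
            let target = contents.trim_end();
            Ok(Some(git_dir.join(target)))
        }
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(err) => Err(err),
    }
}
```

The design choice is to distinguish "absent" from "broken": only the former is silently skipped, so genuine problems are not hidden along with the noisy message.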
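Commit f314b0d55f replaces the `waiting`-based shutdown check with a counter that is incremented just before new work is sent and decremented just after a work item is finished. A minimal sketch of that termination protocol, assuming a hypothetical binary-tree workload (item `n` spawns two `n - 1` items) and a shared `Mutex<Vec<_>>` as the stack. Idle workers here simply spin with `thread::yield_now`, which is a simplification; ripgrep's actual worker loop is not modeled.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Shared state. `pending` counts items that were sent but not yet finished.
struct Shared {
    stack: Mutex<Vec<u32>>,
    pending: AtomicUsize,
    visited: AtomicUsize,
}

impl Shared {
    /// Count the item *before* it becomes visible to other workers.
    fn send(&self, item: u32) {
        self.pending.fetch_add(1, Ordering::SeqCst);
        self.stack.lock().unwrap().push(item);
    }
}

fn worker(shared: Arc<Shared>) {
    loop {
        // The guard is a temporary, so the lock is released right after pop.
        let item = shared.stack.lock().unwrap().pop();
        match item {
            Some(n) => {
                shared.visited.fetch_add(1, Ordering::SeqCst);
                if n > 0 {
                    shared.send(n - 1);
                    shared.send(n - 1);
                }
                // Finish only after the children are counted, so `pending`
                // cannot touch 0 while any work is queued or in flight.
                shared.pending.fetch_sub(1, Ordering::SeqCst);
            }
            None => {
                if shared.pending.load(Ordering::SeqCst) == 0 {
                    return; // nothing queued and nothing being processed
                }
                thread::yield_now(); // another worker may still send work
            }
        }
    }
}

/// Run the hypothetical workload with `threads` workers; returns items visited.
fn parallel_visit(depth: u32, threads: usize) -> usize {
    let shared = Arc::new(Shared {
        stack: Mutex::new(Vec::new()),
        pending: AtomicUsize::new(0),
        visited: AtomicUsize::new(0),
    });
    shared.send(depth);
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let s = Arc::clone(&shared);
            thread::spawn(move || worker(s))
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    shared.visited.load(Ordering::SeqCst)
}
```

`parallel_visit(10, 4)` should visit all 2047 nodes of a depth-10 binary tree. Unlike a count of idle workers, the counter cannot be fooled by a worker that has received work but not yet announced that it resumed, which is the race described in steps 3 and 4 of the commit message.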