mirror of https://github.com/BurntSushi/ripgrep.git synced 2024-12-12 19:18:24 +02:00

30 Commits

Author SHA1 Message Date
Richard Khoury
a28e664abd ignore: check ignore rules before issuing stat calls
This seems like an obvious optimization but becomes critical when
filesystem operations even as simple as stat can result in significant
overheads; an example of this was a bespoke filesystem layer in Windows
that hosted files remotely and would download them on-demand when
particular filesystem operations occurred. Users of this system who
ensured correct file-type filters were being used could still get
unnecessary file access resulting in large downloads.

Fixes , Closes 
2021-05-31 21:51:18 -04:00
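The ordering described in the commit above can be illustrated with a minimal sketch. It is not ripgrep's code: `matches_type`, `select_files`, and the `stat_is_file` callback are hypothetical names, and the extension check is only a stand-in for real ignore and file-type rules. The point is that the cheap, name-only check runs first, so the expensive filesystem call is made only for paths that survive it.

```rust
use std::path::Path;

/// Cheap, name-only check (a stand-in for ignore/type rules; an assumption
/// for illustration, not the matching logic of the `ignore` crate).
fn matches_type(path: &str, wanted_ext: &str) -> bool {
    Path::new(path).extension().and_then(|e| e.to_str()) == Some(wanted_ext)
}

/// Returns the paths that pass the name filter and that `stat_is_file`
/// reports as files. `stat_is_file` stands in for an expensive stat call
/// and is invoked only for paths that already passed the name filter.
fn select_files<'a>(
    paths: &[&'a str],
    wanted_ext: &str,
    mut stat_is_file: impl FnMut(&str) -> bool,
) -> Vec<&'a str> {
    paths
        .iter()
        .copied()
        .filter(|p| matches_type(*p, wanted_ext)) // cheap check first
        .filter(|p| stat_is_file(*p)) // expensive check only for survivors
        .collect()
}
```

Passing a callback that counts its invocations shows that, for four candidate paths of which two have the wanted extension, the stat stand-in runs only twice.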
Raimon Grau
53c4855517 ignore/types: add red
See: https://www.red-lang.org/

Closes 
2021-05-31 21:51:18 -04:00
Simon Morgan
121e0135c1 ignore/types: replace duplicate glob with *.aspx.vb
*.aspx.cs was listed twice and the VB variant is missing.

Closes 
2021-05-31 21:51:18 -04:00
Marco Ieni
b3a6a69f9d ci: check docs for all crates
This also replaces '--all' in Cargo commands with '--workspace'. The
former has apparently been deprecated.

We also fix a couple warnings that this new step detected.

Closes 
2021-05-31 21:51:18 -04:00
jack1142
ba965962fe
ignore/types: add po files to supported types
See: https://www.gnu.org/software/gettext/manual/html_node/PO-Files.html

Closes 
2021-05-28 12:06:10 -04:00
jgart
4ebe8375ec
ignore/types: add mint
PR 
2021-04-04 08:00:12 -04:00
Sergei Vorobev
9c8d873a75
ignore/types: improve bazel globs
Adds *.BUILD and *.bazelrc.

PR 
2021-01-30 18:22:48 -05:00
Ed Page
873abecbf1
ignore: provide underlying IO Error
`ignore::Error` wraps `std::io::Error` with additional information
(as well as exposing non-IO errors). People wanting to inspect what
the error is have to recursively match the enum. This provides
`io_error` and `into_io_error` helpers to do this for the user.

PR 
2020-11-23 10:19:31 -05:00
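The idea behind the helpers in the commit above can be sketched with a simplified stand-in type. `WalkError` and its variants are hypothetical and only loosely modelled on a wrapping error enum; the real `ignore::Error` has more variants. The sketch shows how a recursive match can be hidden behind `io_error` and `into_io_error` style methods.

```rust
use std::io;

/// Simplified stand-in for a wrapping error type (hypothetical; not the
/// actual definition of `ignore::Error`).
#[derive(Debug)]
enum WalkError {
    Io(io::Error),
    WithPath { path: String, err: Box<WalkError> },
    WithDepth { depth: usize, err: Box<WalkError> },
    InvalidGlob(String),
}

impl WalkError {
    /// Borrows the innermost I/O error, if this error wraps one.
    fn io_error(&self) -> Option<&io::Error> {
        match self {
            WalkError::Io(err) => Some(err),
            WalkError::WithPath { err, .. } | WalkError::WithDepth { err, .. } => {
                err.io_error()
            }
            WalkError::InvalidGlob(_) => None,
        }
    }

    /// Consumes the error and returns the innermost I/O error, if any.
    fn into_io_error(self) -> Option<io::Error> {
        match self {
            WalkError::Io(err) => Some(err),
            WalkError::WithPath { err, .. } | WalkError::WithDepth { err, .. } => {
                err.into_io_error()
            }
            WalkError::InvalidGlob(_) => None,
        }
    }
}
```

With helpers like these, a caller can write `err.io_error().map(|e| e.kind())` instead of matching every wrapper variant by hand.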
James Harr
44e69ba627
ignore/types: add yang file type
YANG is described in RFC 6020
https://tools.ietf.org/html/rfc6020

PR 
2020-11-20 09:41:29 -05:00
Vanessa McHale
e1ac18ef06
ignore/types: add Futhark
See: https://futhark-lang.org/

PR 
2020-10-31 12:10:15 -04:00
Brandon Adams
ba3f9673ad
ignore/types: generalize bazel type a bit
Bazel supports `BUILD.bazel` as well as `WORKSPACE.bazel`. In
addition, it is common to ship BUILD/WORKSPACE templates for
external repositories suffixed with .bazel for easier tool
recognition.

Co-authored-by: Brandon Adams <brandon.adams@imc.com>

PR 
2020-10-23 12:24:30 -04:00
Dương Đỗ Minh Châu
86c843a44b
ignore/types: add a type for minified files
Fixes , PR 
2020-10-19 09:10:54 -04:00
Andrew Pyatkov
6301e20ee4
ignore/types: add flatbuffers type
See: https://google.github.io/flatbuffers/

PR 
2020-10-16 20:19:16 -04:00
Andy Freeland
fc2a99bb1f
ignore/types: add vcl ()
VCL is the Varnish Configuration Language used by Varnish and Fastly.

https://varnish-cache.org/docs/trunk/users-guide/vcl.html

PR 
2020-08-19 16:28:14 -04:00
Raimon Grau (rgrau)
ffd4c9ccba
ignore/types: add racket
PR 
2020-06-25 08:51:32 -04:00
jtrakk
a16bfcb3d6
ignore/types: add dvc
This provides support for DVC files (https://dvc.org/).

PR 
2020-06-09 07:44:09 -04:00
Martin Michlmayr
1b2c1dc675
doc: fix typos
PR 
2020-06-04 09:06:09 -04:00
Gerion Entrup
b72ad8f8aa
ignore/types: add meson filetype
Closes , PR 
2020-05-18 14:01:35 -04:00
Casey Rodarmor
793c1179cc ignore: allow filtering with predicate
Adds `WalkBuilder::filter_entry` that takes a predicate to be applied to
all entries. If the predicate returns `false` on a given entry, that
entry and all children will be skipped.

Fixes , Closes 
2020-05-08 23:24:40 -04:00
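The pruning behaviour described in the commit above (a `false` result skips the entry and all of its children) can be sketched over an in-memory tree. `Entry`, `dir`, `file`, and `walk` are hypothetical names for illustration and do not use the `ignore` crate; the real `WalkBuilder::filter_entry` operates on directory entries from the filesystem.

```rust
/// Hypothetical in-memory directory entry used only for illustration.
struct Entry {
    name: &'static str,
    children: Vec<Entry>,
}

fn dir(name: &'static str, children: Vec<Entry>) -> Entry {
    Entry { name, children }
}

fn file(name: &'static str) -> Entry {
    Entry { name, children: Vec::new() }
}

/// Visits `entry` and its descendants in pre-order. If `keep` returns
/// `false` for an entry, that entry and everything below it is skipped.
fn walk(entry: &Entry, keep: &dyn Fn(&Entry) -> bool, out: &mut Vec<&'static str>) {
    if !keep(entry) {
        return; // prune: neither the entry nor its children are visited
    }
    out.push(entry.name);
    for child in &entry.children {
        walk(child, keep, out);
    }
}
```

For a tree with `target/debug.o` and `src/main.rs` under a root, a predicate that rejects the name `target` yields only the root, `src`, and `main.rs`; `debug.o` is never tested against the predicate.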
Wieland Hoffmann
df7a3bfc7f grep-cli: support files compressed by compress(1)
While Linux distributions (at least Arch Linux, RHEL, Debian) do not support
compressing files with compress(1), macOS & AIX do (the utility is part of
POSIX). Additionally, gzip is able to uncompress such compressed files and
provides an `uncompress` binary.

Closes 
2020-05-08 23:24:40 -04:00
Andrew Gallant
139f186e57 crates/ignore: switch to depth first traversal
This replaces the use of channels in the parallel directory traversal
with a simple stack. The primary motivation for this change is to reduce
peak memory usage. In particular, when using a channel (which is a
queue), we wind up visiting files in a breadth first fashion. Using a
stack switches us to a depth first traversal. While there are no real
intrinsic differences, depth first traversal generally tends to use less
memory because directory trees are more commonly wide than they are
deep.

In particular, the queue/stack size itself is not the only concern. In
one recent case documented in , a user wanted to search all Rust
crates. The directory structure was shallow but extremely wide, with a
single directory containing all crates. This in turn results in
descending into each of those directories and building a gitignore
matcher for each (since most crates have `.gitignore` files) before ever
searching a single file. This means that ripgrep has all such matchers
in memory simultaneously, which winds up using quite a bit of memory.

In a depth first traversal, peak memory usage is much lower because
gitignore matchers are built and discarded more quickly. In the case of
searching all crates, the peak memory usage decrease is dramatic. On my
system, it shrinks by an order of magnitude, from almost 1GB to 50MB. The
decline in peak memory usage is consistent across other use cases as
well, but is typically more modest. For example, searching the Linux
repo has a 50% decrease in peak memory usage and searching the Chromium
repo has a 25% decrease in peak memory usage.

Search times generally remain unchanged, although some ad hoc benchmarks
that I typically run have gotten a bit slower. As far as I can tell,
this appears to be the result of scheduling changes. Namely, the depth first
traversal seems to result in searching some very large files towards the
end of the search, which reduces the effectiveness of parallelism and
makes the overall search take longer. This seems to suggest that a stack
isn't optimal. It would instead perhaps be better to prioritize
searching larger files first, but it's not quite clear how to do this
without introducing more overhead (getting the file size for each file
requires a stat call).

Fixes 
2020-04-18 11:33:03 -04:00
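The memory argument in the commit above can be made concrete with a single-threaded simulation. This is a sketch under stated assumptions, not ripgrep's parallel walker: `Matcher` is a hypothetical stand-in for a per-directory gitignore matcher, every directory is assumed to build one, and each pending entry keeps its parent's matcher alive through a reference count. The same work list is drained either from the front (queue, breadth first) or from the back (stack, depth first), and the peak number of live matchers is recorded.

```rust
use std::cell::Cell;
use std::collections::VecDeque;
use std::rc::Rc;

/// Stand-in for a per-directory gitignore matcher (hypothetical). It only
/// tracks how many matchers are alive at once.
struct Matcher {
    live: Rc<Cell<usize>>,
}

impl Matcher {
    fn new(live: &Rc<Cell<usize>>, peak: &Cell<usize>) -> Rc<Matcher> {
        live.set(live.get() + 1);
        peak.set(peak.get().max(live.get()));
        Rc::new(Matcher { live: Rc::clone(live) })
    }
}

impl Drop for Matcher {
    fn drop(&mut self) {
        self.live.set(self.live.get() - 1);
    }
}

enum Node {
    File,
    Dir(Vec<Node>),
}

/// Builds a shallow, wide tree: one root holding `dirs` directories,
/// each holding `files_per_dir` files.
fn wide_tree(dirs: usize, files_per_dir: usize) -> Node {
    let crates: Vec<Node> = (0..dirs)
        .map(|_| Node::Dir((0..files_per_dir).map(|_| Node::File).collect()))
        .collect();
    Node::Dir(crates)
}

/// Walks the tree and returns the peak number of simultaneously live
/// matchers. Pending entries hold their parent's matcher until visited.
fn peak_live_matchers(root: &Node, depth_first: bool) -> usize {
    let live = Rc::new(Cell::new(0));
    let peak = Cell::new(0);
    let mut pending: VecDeque<(&Node, Option<Rc<Matcher>>)> = VecDeque::new();
    pending.push_back((root, None));
    loop {
        // Stack behaviour pops the newest entry; queue behaviour the oldest.
        let next = if depth_first { pending.pop_back() } else { pending.pop_front() };
        let Some((node, _parent)) = next else { break };
        if let Node::Dir(children) = node {
            let matcher = Matcher::new(&live, &peak);
            for child in children {
                pending.push_back((child, Some(Rc::clone(&matcher))));
            }
        }
        // `_parent` is dropped here, releasing one reference to its matcher.
    }
    peak.get()
}
```

For `wide_tree(100, 3)`, the breadth first walk holds the root matcher plus all 100 directory matchers at its peak, while the depth first walk never holds more than the root matcher and one directory matcher. The absolute numbers are properties of this toy model only; the figures quoted in the commit message come from the author's own measurements.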
Zoltan Puskas
4dfea016b9 ignore/types: add ebuild type
Add support for Gentoo's portage package manager spec files:
https://wiki.gentoo.org/wiki/Portage
2020-03-29 18:44:04 -04:00
Andrew Gallant
8ba6ccd159
ignore: fix failing test
This fixes fallout from fixing .
2020-03-16 19:16:24 -04:00
Andrew Gallant
34edb8123a
ignore: squash noisy error message
We should not assume that the commondir file actually exists. If it
doesn't, then just move on. This otherwise emits an error message when
searching normal submodules, which is not OK.

This regression was introduced in .

Fixes 
2020-03-16 18:50:02 -04:00
Wolf Honore
227436624f
ignore/types: add coq type
PR 
2020-02-28 19:11:29 -05:00
Andrew Gallant
4176050cdd
ignore: another simplification
Again, thanks to @zsugabubus!
2020-02-20 17:26:34 -05:00
Andrew Gallant
109460fce2
ignore: simplify parallel worker initialization
We can just ask the channel whether any work has been loaded. Normally
querying a channel for its length is a strong predictor of bugs, but in
this case, we do it before we ever attempt a `recv`, so it should work.

Kudos to @zsugabubus for suggesting this!
2020-02-20 16:50:41 -05:00
Andrew Gallant
f314b0d55f ignore: fix parallel traversal
It turns out that the previous version wasn't quite correct. Namely, it
was possible for the following sequence to occur:

1. Consider that all workers, except for one, are `waiting`.
2. The last remaining worker finds one more job to do and sends it on
   the channel.
3. One of the previously `waiting` workers wakes up from the job that
   the last running worker sent, but `self.resume()` has not been
   called yet.
4. The last worker, from (2), calls `get_work` and sees that the
   channel has nothing on it, so it executes `self.waiting() ==
   1`. Since the worker in (3) hasn't called `self.resume()` yet,
   `self.waiting() == 1` evaluates to true.
5. This sets off a chain reaction that stops all workers, despite the
   fact that (3) got more work (which could itself spawn more work).

The end result is that the traversal may terminate while there are still
outstanding work items to process. This problem was observed through
spurious failures in CI. I was not actually able to reproduce the bug
locally.

We fix this by changing our strategy to detect termination using a
counter. Namely, we increment the counter just before sending new work
and decrement the counter just after finishing work. In this way, we
guarantee that the counter only ever reaches 0 once there is no more
work to process.

See  for more discussion. Many thanks to @zsugabubus for helping me
work through this.
2020-02-20 16:07:51 -05:00
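The counter-based termination strategy described in the commit above can be sketched as follows. This is a minimal illustration, not the code in the `ignore` crate: `run_jobs`, the shared `Mutex<Vec<u32>>` work list, and the rule that a job `n > 0` spawns two jobs `n - 1` are all assumptions made for the example. The counter is incremented just before new work is made visible and decremented just after a job finishes, so it can only reach 0 when no work is queued or in progress.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Runs `workers` threads over a shared work list and returns how many jobs
/// were processed. A job `n > 0` spawns two jobs `n - 1` (illustrative rule).
fn run_jobs(initial: Vec<u32>, workers: usize) -> usize {
    // Counts jobs that are queued or currently being processed.
    let pending = Arc::new(AtomicUsize::new(initial.len()));
    let stack = Arc::new(Mutex::new(initial));
    let processed = Arc::new(AtomicUsize::new(0));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let stack = Arc::clone(&stack);
            let pending = Arc::clone(&pending);
            let processed = Arc::clone(&processed);
            thread::spawn(move || loop {
                // The lock guard is released at the end of this statement.
                let job = stack.lock().unwrap().pop();
                match job {
                    Some(n) => {
                        processed.fetch_add(1, Ordering::SeqCst);
                        if n > 0 {
                            for _ in 0..2 {
                                // Increment before the new work is visible.
                                pending.fetch_add(1, Ordering::SeqCst);
                                stack.lock().unwrap().push(n - 1);
                            }
                        }
                        // Decrement only after this job is fully finished.
                        pending.fetch_sub(1, Ordering::SeqCst);
                    }
                    None => {
                        // An empty list alone is not enough to stop: another
                        // worker may still be about to push more work.
                        if pending.load(Ordering::SeqCst) == 0 {
                            break;
                        }
                        thread::yield_now();
                    }
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
    processed.load(Ordering::SeqCst)
}
```

Starting from a single job `4`, every worker should exit only after all 31 jobs in the resulting tree have been processed, regardless of how the threads interleave. A production walker would likely block rather than spin while waiting; the busy loop here keeps the sketch short.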
asymmetric
b44554c803
ignore/types: add K type
Adds support for files used by the K executable semantic framework:
http://www.kframework.org/index.php/Main_Page

PR 
2020-02-19 07:07:09 -05:00
Andrew Gallant
fdd8510fdd repo: move all source code in crates directory
The top-level listing was just getting a bit too long for my taste. So
put all of the code in one directory and shrink the large top-level mess
to a small top-level mess.

NOTE: This commit only contains renames. The subsequent commit will
actually make ripgrep build again. We do it this way with the naive hope
that this will make it easier for git history to track the renames.
Sigh.
2020-02-17 19:24:53 -05:00