# ripgrep/crates/ignore/Cargo.toml

[package]
name = "ignore"
version = "0.4.17"  #:version
authors = ["Andrew Gallant <jamslam@gmail.com>"]
description = """
A fast library for efficiently matching ignore files such as `.gitignore`
against file paths.
"""
documentation = "https://docs.rs/ignore"
homepage = "https://github.com/BurntSushi/ripgrep/tree/master/crates/ignore"
repository = "https://github.com/BurntSushi/ripgrep/tree/master/crates/ignore"
readme = "README.md"
keywords = ["glob", "ignore", "gitignore", "pattern", "file"]
license = "Unlicense OR MIT"

[lib]
name = "ignore"
bench = false

[dependencies]
crossbeam-utils = "0.8.0"
globset = { version = "0.4.5", path = "../globset" }
lazy_static = "1.1"
log = "0.4.5"
memchr = "2.1"
regex = "1.1"
same-file = "1.0.4"
thread_local = "1"
walkdir = "2.2.7"

[target.'cfg(windows)'.dependencies.winapi-util]
version = "0.1.2"

[dev-dependencies]
crossbeam-channel = "0.5.0"

[features]
simd-accel = ["globset/simd-accel"]