ripgrep

mirror of https://github.com/BurntSushi/ripgrep.git synced 2025-08-04 21:52:54 +02:00

Author	SHA1	Message	Date
Jakub Wieczorek	322fc75a3d	ignore: make walker visit untraversable directories This commit fixes an inconsistency between the serial and the parallel directory walkers around visiting a directory for which the user holds insufficient permissions to descend into. The serial walker does produce a successful entry for a directory that it cannot descend into due to insufficient permissions. However, before this change that has not been the case for the parallel walker, which would produce an `Err` item not only when descending into a directory that it cannot read from but also for the directory entry itself. This change brings the behaviour of the parallel variant in line with that of the serial one. Fixes #1346, Closes #1365	2020-02-17 17:16:28 -05:00
Jakub Wieczorek	b435eaafc8	grep-regex: fix inner literal extraction bug This appears to be another transcription bug from copying this code from the prefix literal detection from inside the regex crate. Namely, when it comes to inner literals, we only want to treat counted repetition as two separate cases: the case when the minimum match is 0 and the case when the minimum match is more than 0. In the former case, we treat `e{0,n}` as `e*` and in the latter we treat `e{m,n}` where `m >= 1` as just `e`. We could definitely do better here. e.g., This means regexes like `(foo){10}` will only have `foo` extracted as a literal, where searching for the full literal would likely be faster. The actual bug here was that we were not implementing this logic correctly. Namely, we weren't always "cutting" the literals in the second case to prevent them from being expanded. Fixes #1319, Closes #1367	2020-02-17 17:16:28 -05:00
Ed Page	f8e70294d5	ignore: allow post-processing at end-of-thread On top of the parallel-walk's closures, this provides a Visitor API. This clarifies the role of the two different closures in the `run` API and allows implementing `Drop` for post-processing once traversal is finished. The closure API is maintained not just for compatibility but also convenience for simple cases. Fixes #469, Closes #1430	2020-02-17 17:16:28 -05:00
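The end-of-thread post-processing idea can be sketched with the standard library alone. This is not the `ignore` crate's Visitor API: `CollectVisitor` and `collect_parallel` are hypothetical names, and plain scoped threads stand in for the parallel walker. Each thread buffers entries locally and flushes them once, when its visitor is dropped.

```rust
use std::sync::Mutex;
use std::thread;

/// Hypothetical per-thread visitor: buffers entries locally and flushes
/// them into shared storage once, when the visitor is dropped.
struct CollectVisitor<'a> {
    local: Vec<String>,
    shared: &'a Mutex<Vec<String>>,
}

impl<'a> CollectVisitor<'a> {
    fn visit(&mut self, entry: &str) {
        self.local.push(entry.to_string());
    }
}

impl<'a> Drop for CollectVisitor<'a> {
    fn drop(&mut self) {
        // Post-processing at end-of-thread: one lock per thread, not per entry.
        self.shared.lock().unwrap().append(&mut self.local);
    }
}

/// Processes each batch on its own scoped thread (a stand-in for a walker).
fn collect_parallel(batches: &[Vec<&str>]) -> Vec<String> {
    let shared = Mutex::new(Vec::new());
    thread::scope(|s| {
        for batch in batches {
            let shared = &shared;
            s.spawn(move || {
                let mut visitor = CollectVisitor { local: Vec::new(), shared };
                for entry in batch {
                    visitor.visit(entry);
                }
                // `visitor` is dropped here, which flushes its buffer.
            });
        }
    });
    let mut all = shared.into_inner().unwrap();
    all.sort();
    all
}
```

Flushing in `Drop` keeps lock contention to one acquisition per thread, which is the kind of post-processing a per-entry closure cannot express on its own.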
Ed Page	578e2d47a8	core: simplify parallel walking using borrows This changes ripgrep to use ignore's new support for borrowing data when walking in parallel.	2020-02-17 17:16:28 -05:00
Ed Page	9f7c2ebc09	ignore: allow parallel walker to borrow data This makes it so the caller can more easily refactor from single-threaded to multi-threaded walking. If they want to support both, this makes it easier to do so with a single initialization code-path. In particular, it side-steps the need to put everything into an `Arc`. This is not a breaking change because it strictly increases the number of allowed inputs to `WalkParallel::run`. Closes #1410, Closes #1432	2020-02-17 17:16:28 -05:00
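Why borrowing side-steps `Arc` can be illustrated with std scoped threads. This is a minimal sketch, not the `ignore` crate's implementation: `count_matching` and its parameters are hypothetical, and `std::thread::scope` stands in for `WalkParallel::run`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

/// Counts how many `paths` contain `needle`, splitting the work across
/// roughly `threads` scoped workers. Hypothetical helper, not part of `ignore`.
fn count_matching(paths: &[String], needle: &str, threads: usize) -> usize {
    let total = AtomicUsize::new(0);
    let workers = threads.max(1);
    // Round up so every path lands in some chunk; never use a zero chunk size.
    let chunk_size = ((paths.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        for chunk in paths.chunks(chunk_size) {
            // Each worker borrows `needle` and `total` directly; no `Arc` needed.
            let total = &total;
            s.spawn(move || {
                let n = chunk.iter().filter(|p| p.contains(needle)).count();
                total.fetch_add(n, Ordering::Relaxed);
            });
        }
    });
    total.load(Ordering::Relaxed)
}
```

Because the scope guarantees all workers finish before it returns, the same borrowed state can be shared between a single-threaded and a multi-threaded code path.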
Andrew Gallant	5c1eac41a3	changelog: highlight a bad performance regression	2020-02-17 17:16:28 -05:00
Johannes Altmanninger	6f2b79f584	ignore: use git commondir for sourcing .git/info/exclude Git looks for this file in GIT_COMMON_DIR, which is usually the same as GIT_DIR (.git). However, when searching inside a linked worktree, .git is usually a file that contains the path of the actual git dir, which in turn contains a file "commondir" which references the directory where info/exclude may reside, alongside other configuration shared across all worktrees. This directory is usually the git dir of the main worktree. Unlike git this does not read environment variables GIT_DIR and GIT_COMMON_DIR, because it is not clear how to interpret them when searching multiple repositories. Fixes #1445, Closes #1446	2020-02-17 17:16:28 -05:00
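The lookup described above can be sketched as two small steps. The function names are hypothetical and this is not the code from the commit; it assumes the `.git` file has the usual `gitdir: <path>` form and that a relative `commondir` entry is relative to the git dir.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Parses the contents of a linked worktree's `.git` file, which has the
/// form `gitdir: <path>`. Returns `None` if the prefix is missing.
fn parse_gitdir_file(contents: &str) -> Option<PathBuf> {
    let rest = contents.trim().strip_prefix("gitdir:")?;
    Some(PathBuf::from(rest.trim()))
}

/// Given a git dir, returns the directory expected to hold `info/exclude`.
/// If a `commondir` file exists, its (possibly relative) path is joined onto
/// the git dir; otherwise the git dir itself is treated as the common dir.
fn common_dir(git_dir: &Path) -> PathBuf {
    match fs::read_to_string(git_dir.join("commondir")) {
        // `join` keeps an absolute path as-is and appends a relative one.
        Ok(contents) => git_dir.join(contents.trim()),
        Err(_) => git_dir.to_path_buf(),
    }
}
```

The sketch does not normalize `..` components and, like the commit, does not consult `GIT_DIR` or `GIT_COMMON_DIR`.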
Andrew Gallant	0c3b673e4c	cli: make ripgrep work in non-existent directories It turns out that querying the CWD while in a directory that no longer exists results in an error. Since the CWD is queried every time ripgrep starts---whether it needs it or not---for dealing with glob matching, ripgrep winds up being completely useless inside a non-existent directory. We fix this in a few different ways: * Firstly, if std::env::current_dir() fails, then we fall back to trying to read the `PWD` environment variable. * If that fails, then we return a more sensible error message so that a user can at least react to the problem. Previously, the error message was inscrutable. * Finally, we try to avoid the problem altogether by building empty glob matchers if no globs were provided, thus side-stepping querying the CWD completely. Fixes #1291, Closes #1400	2020-02-17 17:16:28 -05:00
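The first two fallbacks listed above can be sketched as follows. `resolve_cwd` and `current_dir_with_fallback` are hypothetical names, and the error wording is illustrative rather than ripgrep's actual message.

```rust
use std::env;
use std::ffi::OsString;
use std::io;
use std::path::PathBuf;

/// Picks a working directory from the result of querying the OS and an
/// optional `PWD` value. Hypothetical helper, kept pure so it is easy to test.
fn resolve_cwd(queried: io::Result<PathBuf>, pwd: Option<OsString>) -> Result<PathBuf, String> {
    match queried {
        Ok(dir) => Ok(dir),
        Err(err) => match pwd {
            // Fall back to PWD only when it is set and non-empty.
            Some(p) if !p.is_empty() => Ok(PathBuf::from(p)),
            // Otherwise report something a user can act on (illustrative text).
            _ => Err(format!(
                "failed to get current working directory: {} \
                 (was the directory deleted?)",
                err
            )),
        },
    }
}

/// Convenience wrapper that reads the real process environment.
fn current_dir_with_fallback() -> Result<PathBuf, String> {
    resolve_cwd(env::current_dir(), env::var_os("PWD"))
}
```

The third measure, not querying the CWD at all when no globs are given, avoids this path entirely and is not shown here.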
Naveen Nathan	297b428c8c	cli: add --no-ignore-exclude flag This commit adds a new --no-ignore-exclude flag that permits disabling the use of .git/info/exclude filtering. Local exclusions are manual configurations to a repository and are not shared, so it is sometimes useful to disable to get a consistent view of a repository. This also adds a new section to the man page that describes automatic filtering. Closes #1420	2020-02-17 17:16:28 -05:00
Manfred Endres	804b43ecd8	globset: implement FromStr for Glob The `globset::Glob` type's [`new`] function takes an `&str` parameter and returns a `Result<Glob, Error>`. This is exactly what [`std::str::FromStr::from_str`][`std::str::FromStr`] defines. Libraries like [`clap`] use [`std::str::FromStr`] to create objects from provided commandline arguments. This change makes this library usable without a newtype wrapper. [`std::str::FromStr`]: https://doc.rust-lang.org/std/str/trait.FromStr.html [`clap`]: https://docs.rs/clap/2.33.0/clap/macro.value_t.html [`new`]: https://docs.rs/globset/0.4.4/globset/struct.Glob.html#method.new Closes #1447	2020-02-17 17:16:28 -05:00
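The pattern is a thin delegation from `from_str` to an existing constructor. The sketch below uses a hypothetical `Pattern` type with toy validation as a stand-in; it is not the real `globset::Glob`.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical stand-in for a glob type; not the real `globset::Glob`.
#[derive(Debug, PartialEq)]
struct Pattern(String);

#[derive(Debug, PartialEq)]
struct PatternError(String);

impl fmt::Display for PatternError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid pattern: {}", self.0)
    }
}

impl Pattern {
    /// Toy validation: reject empty patterns only.
    fn new(s: &str) -> Result<Pattern, PatternError> {
        if s.is_empty() {
            Err(PatternError("empty".to_string()))
        } else {
            Ok(Pattern(s.to_string()))
        }
    }
}

impl FromStr for Pattern {
    type Err = PatternError;

    // Delegate to the existing constructor so both entry points agree.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Pattern::new(s)
    }
}
```

With the trait in place, `"*.rs".parse::<Pattern>()` works, and so does any argument parser that is generic over `FromStr`.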
Lucien Greathouse	2263b8ac92	globset: add GlobMatcher::glob This exposes the underlying `Glob` used to compile the matcher. This can be useful for wrapping up the glob matcher in other types. Closes #1454	2020-02-17 17:16:28 -05:00
Andrew Gallant	cd8ec38a68	grep-regex: add fast path for -w/--word-regexp Previously, ripgrep would always defer to the regex engine's capturing matches in order to implement word matching. Namely, ripgrep would determine the correct match offsets via a capturing group, since the word regex is itself generated from the user supplied regex. Unfortunately, the regex engine's capturing mode is still fairly slow, so this commit adds a fast path to avoid capturing mode in the vast majority of cases. See comments in the code for details.	2020-02-17 17:16:28 -05:00
Andrew Gallant	6a0e0147e0	grep-regex: improve literal detection with -w When the -w/--word-regexp was used, ripgrep would in many cases fail to apply literal optimizations. This occurs specifically when the regex given by the user is an alternation of literals with no common prefixes or suffixes, e.g., rg -w 'foo|bar|baz|quux' In this case, the inner literal detector fails. Normally, this would result in literal prefixes being detected by the regex engine. But because of the -w/--word-regexp flag, the actual regex that we run ends up looking like this: (^|\W)(foo|bar|baz|quux)($|\W) which of course defeats any prefix or suffix literal optimizations in the regex crate's somewhat naive extractor. (A better extractor could still do literal optimizations in the above case.) So this commit fixes this by falling back to prefix or suffix literals when they're available instead of prematurely giving up and assuming the regex engine will do the rest.	2020-02-17 17:16:28 -05:00
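The wrapping quoted in the message can be reproduced with a one-line helper, which makes it easier to see why the user's literals no longer sit at the start or end of the pattern. `word_regex` is a hypothetical name, and the regex ripgrep actually generates may differ in detail from this simplified form.

```rust
/// Wraps a user pattern in the simplified word-boundary form quoted in the
/// commit message. Hypothetical helper for illustration only.
fn word_regex(pattern: &str) -> String {
    // The user's pattern ends up in the middle capture group, so a
    // prefix-only or suffix-only literal extractor sees `(^|\W)` or
    // `($|\W)` first and finds no literal to work with.
    format!(r"(^|\W)({})($|\W)", pattern)
}
```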
Andrew Gallant	ad97e9c93f	grep-regex: improve inner literal detection This fixes an interesting performance bug where the inner literal extractor would sometimes choose a sub-optimal literal. For example, consider the regex: \x20+Sherlock Holmes\x20+ (The `\x20` is the ASCII code for a space character, which we use here to just make it clearer. It otherwise does not matter.) Previously, this would see the initial \x20 and then stop collecting literals after the `+` repetition operator. This was because the inner literal detector was adapted from the prefix literal detector, which had to stop here. Namely, while \x20S would be a valid prefix (for example), \x20\x20S would also be a valid prefix. As would \x20\x20\x20S and so on. So the prefix detector would have to stop at the repetition operator. Otherwise, only searching for \x20S could potentially scan farther than the starting position of the next match. However, for inner literals, this calculus no longer makes sense. We can freely search for, e.g., \x20S without missing matches that start with \x20\x20S precisely because we know this is an inner literal which may not correspond to the start of a match. With this fix, the literal that is now detected is \x20Sherlock Holmes\x20 Which is much better. We achieve this by no longer "cutting" literals after seeing a `+` repetition operator. Instead, we permit literals to continue to be extended. The reason why this is important is because using \x20 as the literal to search for is generally bad juju since it is so common. In fact, we should probably add more logic here to either avoid such things or give up entirely on the inner literal optimization if it detected a literal that we think is very common. But we punt on such things here.	2020-02-17 17:16:28 -05:00
Robert Irelan	24f8a3e5ec	doc: document `all` file type This adds it to the guide and the docs for the --type flag. Fixes #1344, Closes #1472	2020-02-17 17:16:28 -05:00
Mikko Vedru	1bdb767851	doc: improve docs for `--sort` and `--sortr` flags I improved the help documentation in the following manner and for the following reasons: 1. It's only logical to put the default sub-option on the first possible line, as well as to separately mention that it is indeed the default sub-option. 2. Additional options for the flags should describe the main points of their purpose without requiring the user to read the whole help entry. In my opinion, the information about the sub-options' influence on multi-threading and speed is important enough to warrant its inclusion in each sub-option's description line text. Closes #1434	2020-02-17 17:16:28 -05:00
Andreas Stieger	a4897eca23	readme: simplify openSUSE instructions Closes #1436	2020-02-17 17:16:28 -05:00
Collin Styles	a070722ff2	cli: add --include-zero flag This flag, when used in conjunction with --count or --count-matches, will print a result for each file searched even if there were zero matches in that file. This is off by default but can be enabled to make ripgrep behave more like grep. This also clarifies some of the defaults for the grep-printer::SummaryBuilder type. Closes #1370, Closes #1405	2020-02-17 17:16:28 -05:00
Matěj Cepl	4628d77808	ignore/types: add spec file type This is for RPM package SPEC files. Fixes #946, Closes #1449	2020-02-17 17:16:28 -05:00
Ximin Luo	f8418c6a52	explicitly declare lazy_static dependency `benches/bench.rs` uses lazy_static but Cargo.toml does not declare a dependency on it. This causes rustc to use its own internal private copy instead. Sometimes this causes unintuitive errors like this Debian bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=942243 The underlying issue is https://github.com/rust-lang/rust#27812 but it can be avoided by explicitly declaring the dependency, which you are supposed to do anyways. Closes #1435	2020-02-17 17:16:28 -05:00
luh2	040ca45ba0	ignore/types: add xhtml to xml file type Closes #1426	2020-02-17 17:16:28 -05:00
Andrew Gallant	91470572cd	changelog: add notes about new file types	2020-02-17 17:16:28 -05:00
Sven-Hendrik Haase	027adbf485	ignore/types: add 'diff' file type This includes .patch and .diff files. Fixes #1418, Closes #1419	2020-02-17 17:16:28 -05:00
Mohammad AlSaleh	e71eedf0eb	cli: add --no-context-separator flag --context-separator='' still adds a new line separator, which could still potentially be useful. So we add a new `--no-context-separator` flag that completely disables context separators even when the -A/-B/-C context flags are used. Closes #1390	2020-02-17 17:16:28 -05:00
Andrew Gallant	88f46d12f1	tests: remove existing test directory I'm surprised this wasn't caught until now, but if a test directory already exists, then it was reused. This can result in hard to debug problems with tests when, e.g., file names are changed and a recursive search is executed.	2020-02-17 17:16:28 -05:00
sharkdp	a18cf6ec39	ignore: add existence check for ignore files This commit adds a simple `.exists()` check for `.gitignore`, `.ignore`, and other similar files before actually calling `File::open(…)` in `GitIgnoreBuilder::add`. The reason is that a simple existence check via `stat` can be faster than actually trying to `open` the file, see https://stackoverflow.com/a/12774387/704831. As we typically expect(?) the number of directories without ignore files to be much larger than the number of directories with ignore files, this leads to an overall speedup. The performance gain is not huge for `rg`, but can be quite significant if more `.gitignore`-like files are added via `add_custom_ignore_filename`. The speedup is larger for folders with low files-per-directory ratios. Note though that we do not do this check on Windows until a specific analysis there suggests this is beneficial. Namely, Windows generally has slower file system operations, so it's not clear whether this speculative check is actually a benefit or not. 
Benchmark results
-----------------

`rg --files` in my home folder (200k results, 6.5 files per directory):

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./rg-master --files` | 396.4 ± 3.2 | 390.9 | 400.0 | 1.05 |
| `./rg-feature --files` | 376.0 ± 3.6 | 369.3 | 383.5 | 1.00 |

`rg --files --hidden` in my home folder (800k results, 5.4 files per directory)

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `./rg-master --files --hidden` | 1.575 ± 0.012 | 1.560 | 1.597 | 1.06 |
| `./rg-feature --files --hidden` | 1.479 ± 0.011 | 1.464 | 1.496 | 1.00 |

`rg --files` in the chromium-79.0.3915.2 source tree (300k results, 12.7 files per directory)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `~/rg-master --files` | 445.2 ± 5.3 | 435.6 | 453.0 | 1.04 |
| `~/rg-feature --files` | 428.9 ± 7.0 | 418.2 | 440.0 | 1.00 |

`rg --files` in the linux-5.3 source tree (65k results, 15.1 files per directory)

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./rg-master --files` | 94.5 ± 1.9 | 89.8 | 98.5 | 1.02 |
| `./rg-feature --files` | 92.6 ± 2.7 | 88.4 | 98.7 | 1.00 |

Closes #1381	2020-02-17 17:16:28 -05:00
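The check-before-open idea from this commit can be sketched as below. `read_ignore_file` is a hypothetical helper, not the body of `GitIgnoreBuilder::add`; it only shows the shape of the optimization and the Windows exclusion.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// Reads an optional ignore file, returning `Ok(None)` when it is absent.
/// On non-Windows targets a cheap existence check runs before `File::open`.
fn read_ignore_file(path: &Path) -> io::Result<Option<String>> {
    // Mirrors the commit's caveat: skip the speculative check on Windows.
    if cfg!(not(windows)) && !path.exists() {
        return Ok(None);
    }
    let mut file = match File::open(path) {
        Ok(file) => file,
        // The file may vanish between the check and the open.
        Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok(None),
        Err(err) => return Err(err),
    };
    let mut contents = String::new();
    file.read_to_string(&mut contents)?;
    Ok(Some(contents))
}
```

One trade-off of this sketch: `Path::exists` also returns `false` when the metadata query fails for other reasons, such as missing permissions, so those cases are treated as "no ignore file" rather than reported as errors.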
Gibson Fahnestock	c78c3236a8	readme: remove outdated SIMD info Looks like the upstream brew Formula [0][] now has SIMD support, so remove the extraneous info now that the custom tap is no longer needed [1][]. [0]: https://github.com/Homebrew/homebrew-core/blob/master/Formula/ripgrep.rb [1]: `f3083e4574` PR #1431	2020-02-15 17:19:22 -05:00
Sorin Sbarnea	7cf21600cd	readme: document CentOS 8 support ripgrep install instructions are valid even for the 7 version. The tool works without problems on these too. PR #1428	2020-02-15 17:16:57 -05:00
Jonathan Mast	647b0d3977	ignore/types: add HAML and ERB These are commonly used templating languages for Ruby, add their extensions to the filetypes list for convenient filtering. PR #1407	2020-02-15 09:18:32 -05:00
Jeff S	e572fc1683	ignore/types: add slim, slime, and skim templates PR #1391	2020-02-15 09:17:46 -05:00
Andrew Gallant	9cb93abd11	ignore: allow use of Error::description We can remove it in the next semver incompatible release.	2020-02-10 06:44:21 -05:00
Luca Kredel	41695c66fa	ignore/types: add typoscript file type Add the file types for TypoScript - the configuration language of the TYPO3 CMS. PR #1477	2020-02-07 08:41:00 -05:00
Andrew Gallant	cb0dfda936	faq: add section about donations This is asked often enough that it's worth having a canonical answer.	2020-02-05 13:09:11 -05:00
Andrew Gallant	74d1fe59e9	deps: update everything	2020-01-30 18:33:40 -05:00
Andrew Gallant	9fd1e202e0	deps: update regex, regex-syntax and aho-corasick Notably, this brings in a bug fix reported by @okdana: https://github.com/rust-lang/regex/issues/640	2020-01-30 18:32:56 -05:00
Robert Irelan	e76807b1b5	ignore/types: add *.org_archive to org file type .org_archive is the default extension for Org archive files, created when entries from an Org-mode file are archived (see <https://orgmode.org/org.html#Moving-subtrees>). These files are still in Org mode format, so it's worth searching them at the same time as non-archive Org mode files. PR #1475	2020-01-29 13:59:34 -05:00
Andrew Gallant	f8fb65f7e3	globset: fix benchmarks There were apparently a lot of unused things, including lazy_static.	2020-01-27 16:45:12 -05:00
Tristan Waddington	98de8d248a	ignore/types: make 'gradle' its own type This change maintains the existing behavior of the 'groovy' type, which includes both .groovy and .gradle files. PR #1470	2020-01-23 06:51:11 -05:00
Crestwave	c358700dfb	readme: add instructions for Haiku x86_64 and x86_gcc2 PR #1465	2020-01-21 07:34:24 -05:00
Alex Touchet	8670a4a969	readme: update outdated links PR #1463	2020-01-21 07:32:54 -05:00
Oliver Newman	e3b1f86908	doc: add missing "will" to the user guide PR #1462	2020-01-20 17:26:08 -05:00
Jan Verbeek	46b07bb2ee	ignore/types: fix postscript globs The postscript globs were missing asterisks, so they were treated as literal filenames. PR #1461	2020-01-20 07:48:57 -05:00
Andrew Gallant	8bdf84e3a8	deps: update everything	2020-01-16 19:47:23 -05:00
Andrew Gallant	5a6e17fcc1	deps: various updates Most of these updates (sans thread_local) are from crates I maintain that have seen updates recently. Notably, this includes a bump to `termcolor 1.1.0` which includes support for respecting `NO_COLOR`. This commit therefore means that ripgrep now supports `NO_COLOR`. As an added bonus, we drop a dependency on Windows. (Although the total amount of code compiled remains the same.) Closes #1186	2020-01-11 10:09:10 -05:00
Andrew Gallant	00bfcd14a6	ignore-0.4.11 ignore-0.4.11	2020-01-10 15:08:27 -05:00
Andrew Gallant	bf0ddc4675	ci: fix musl docker build Looks like the old japaric images are bunk. We update our docker image to be based on the new rustembedded images and configure cross to use it. Turns out that this wasn't due to a stale docker image, but rather, a bug in cross: https://github.com/rust-embedded/cross/issues/357 We work around that bug by installing the master branch of cross. Sigh.	2020-01-10 15:07:47 -05:00
Andrew Gallant	0fb3f6a159	ci: disable github actions for now The CI build failures are annoying and distracting. Hopefully soon I'll be able to invest more time in the switch.	2020-01-10 15:07:47 -05:00
Andrew Gallant	837fb5e21f	deps: update to crossbeam-channel 0.4 Closes #1427	2020-01-10 15:07:47 -05:00
Andrew Gallant	2e1815606e	deps: update to bytecount 0.6 Looks like there aren't any major changes other than dependency updates.	2020-01-10 15:07:47 -05:00
Andrew Gallant	cb2f6ddc61	deps: update to thread_local 1.0 We also update the pcre2 and regex dependencies, which removes any other lingering uses of thread_local 0.3.	2020-01-10 15:07:47 -05:00


1361 Commits