ripgrep

mirror of https://github.com/BurntSushi/ripgrep.git synced 2025-05-29 21:47:42 +02:00

Author	SHA1	Message	Date
Andrew Gallant	6cdb99ea61	deps: drop bytecount in favor of memchr_iter(..).count() As of the memchr 2.6 release, its Iterator::count method is specialized to only count the number of occurrences instead of finding the offset of each occurrence. This replaces ripgrep's use of the bytecount crate. While micro-benchmarks suggest that memchr's method has better throughput than bytecount, it turned out to be an illusion. Namely, on a ~13GB haystack prior to this change: $ time rg-bytecount 'You killed my friend, my best friend, my lifelong friend!' OpenSubtitles2018.raw.en --line-number 441450441:- You killed my friend, my best friend, my lifelong friend! real 1.473 user 1.186 sys 0.286 maxmem 12512 MB faults 0 And then after: $ time rg 'You killed my friend, my best friend, my lifelong friend!' OpenSubtitles2018.raw.en --line-number 441450441:- You killed my friend, my best friend, my lifelong friend! real 1.532 user 1.280 sys 0.250 maxmem 12512 MB faults 0 But perf is just about in the same ballpark. That's good enough for me at the moment in order to drop the extra dependency. I did this because the marginal cost of adding the Iterator::count() specialization to memchr was extremely small.	2023-09-02 12:25:34 -04:00
Yochem van Rosmalen	d596f6ebd0	ignore/types: add *.vsh to V type PR #2604	2023-08-31 08:51:07 -04:00
Christian Vallentin	6cd9479634	ignore: implement FusedIterator for Walk PR #2567	2023-08-28 22:55:19 -04:00
Andrew Gallant	51765f2f4c	ignore: apply rustfmt I believe this happened because rustfmt now knows how to format `let ... else` constructs.	2023-08-28 20:09:26 -04:00
mataha	962d47e6a1	ignore/types: add Prolog file types This improves the Prolog file type rules. * `.pl` is the most common extension in the wild, though `.pro` is preferred in places where file extension may clash with Perl[1]. * `.P` is used for compatibility with XSB Prolog dialect[2]. PR #2590 [1]: https://www.swi-prolog.org/pldoc/man?section=fileext [2]: https://www.swi-prolog.org/pldoc/man?section=xsb-source	2023-08-21 10:53:56 -04:00
mataha	19b6a45abb	ignore/types: tweak Gradle file types This PR extends Gradle file types with the following: - Kotlin DSL buildscripts (`.gradle.kts`) - Gradle Java properties (`gradle.properties`) - wrapper files (`gradle-wrapper.`) - wrapper scripts (`gradlew`, `gradlew.bat`) PR #2587	2023-08-20 18:49:02 -04:00
Andrew Gallant	61733f6378	globset-0.4.13	2023-08-05 09:34:36 -04:00
Andrew Gallant	7227e94ce5	globset: use non-capture groups in regex transform We currently implement globs by converting them to regexes, and in doing so, sometimes use grouping. In all but one case, we used non-capturing groups. But for alternations, we used capturing groups, which was likely just an oversight. We don't make use of capture groups at all, and while they usually don't have any overhead, they lead to weird cases like this one: https://github.com/rust-lang/regex/issues/1059 That particular issue is also a bug in the regex crate itself, which is fixed in https://github.com/rust-lang/regex/pull/1062. Note though that the bug fix in the regex crate is required. Even with this patch to globset, memory usage is reduced (by about half in rust-lang/regex#1059) but is not returned to where it was prior to the regex 1.9 release.	2023-08-05 09:33:57 -04:00
Andrew Gallant	341a19e0d0	regex: fix fast path for -w/--word-regexp flag (#2576) It turns out our fast path for -w/--word-regexp wasn't quite correct in some cases. Namely, we use `(?m:^|\W)(<original-regex>)(?m:\W|$)` as the implementation of -w/--word-regexp since `\b(<original-regex>)\b` has some unintuitive results in certain cases, specifically when <original-regex> matches non-word characters at match boundaries. The problem is that using this formulation means that you need to extract the capture group around <original-regex> to find the "real" match, since the surrounding (^|\W) and (\W|$) aren't part of the match. This is fine, but the capture group engine is usually slow, so we have a fast path where we try to deduce the correct match boundary after an initial match (before running capture groups). The problem is that doing this is rather tricky because it's hard to know, in general, whether the `^` or the `\W` matched. This still doesn't seem quite right overall, but we at least fix one more case. Fixes #2574	2023-07-31 08:51:09 -04:00
Vidar	fed4fea217	ignore/types: add csproj Supports the .NET C# Project file extension. PR #2575	2023-07-31 07:08:44 -04:00
Andrew Gallant	053a1669bb	globset-0.4.12	2023-07-26 19:51:38 -04:00
David Tolnay	31d3f16254	api: impl Deserialize for GlobSet PR #2569	2023-07-26 19:51:22 -04:00
Andrew Gallant	304a60e8e9	grep-cli-0.1.9	2023-07-18 13:25:23 -04:00
Andrew Gallant	1d35859861	globset-0.4.11	2023-07-12 12:58:43 -04:00
mataha	601e122e9f	ignore/types: add Windows Command Prompt files This PR adds `.bat` and `.cmd` file types. In doing so, it makes a distinction between batch files (old standard from the MS-DOS era) and command scripts (new flavor - can operate on batch files, although `*.cmd` is preferred for various reasons, the main one being batch files will set `ERRORLEVEL` following inconsistent MS-DOS style rules[1]). PR #2556 [1]: https://groups.google.com/g/microsoft.public.win2000.cmdprompt.admin/c/XHeUq8oe2wk/m/LIEViGNmkK0J#i106	2023-07-10 15:58:17 -04:00
nguyenvukhang	6abb962f0d	cli: fix non-path sorting behavior Previously, sorting worked by sorting the parents and then sorting the children within each parent. This was done during traversal, but it only works when sorting parents preserves the overall order. This generally only works for '--sort path' in ascending order. This commit fixes the rest of the sorting behavior by collecting all of the paths to search and then sorting them before searching. We only collect all of the paths when sorting was requested. Fixes #2243, Closes #2361	2023-07-09 10:14:03 -04:00
Edoardo Pirovano	6d95c130d5	cli: add --stop-on-nonmatch flag This causes ripgrep to stop searching an individual file after it has found a non-matching line. But this only occurs after it has found a matching line. Fixes #1790, Closes #1930	2023-07-08 18:52:42 -04:00
Garrett Thornburg	4782ebd5e0	core: lock stdout before printing an error message to stderr Adds a new eprintln_locked macro which locks STDOUT before logging to STDERR. This patch also replaces instances of eprintln with eprintln_locked to avoid interleaving lines. Fixes #1941, Closes #1968	2023-07-08 18:52:42 -04:00
piegames	4993d29a16	globset: add 'escape' routine Fixes #2060, Closes #2061	2023-07-08 18:52:42 -04:00
Seth Stadick	23adbd6795	cli: force binary existence check Previously, we were only doing a binary existence check on Windows. And in fact, the main point there wasn't binary existence, but ensuring we didn't accidentally resolve a binary name relative to the CWD, which could result in executing a program one didn't mean to run. However, it is useful to be able to check whether a binary exists on any platform when associating a glob with a binary. If the binary doesn't exist, then the association can fail eagerly and let some other glob apply. Closes #1946	2023-07-08 18:52:42 -04:00
Michal Terepeta	cb7501ff11	doc: clarify the comment on `Worker.work_done` We call `work_done` only once the work has been actually performed (otherwise `num_pending` could go to 0 before the actual work is done). Closes #2039	2023-07-08 18:52:42 -04:00
Kyle Todeschini	3b66f37a31	doc: improve -r/--replace flag syntax docs Fixes #2108, Closes #2123	2023-07-08 18:52:42 -04:00
kotborealis	f30a30867e	ignore/types: name aliases for file types We also make py/python, md/markdown and ts/typescript aliases of one another. Note that this only introduces aliases at the point where default types are defined. This just makes them a bit easier to read/write, and also makes it easier to expose more names that describe the same thing. Fixes #1857, Closes #1895	2023-07-08 18:52:42 -04:00
Klas Mellbourn	7313dca472	ignore/types: add 'typescript' alias for 'ts' Closes #2009	2023-07-08 18:52:42 -04:00
Tama McGlinn	99bf2b01dc	ignore/types: add Ada filetypes, including gprbuild and alire .adb and .ads are the usual extensions for Ada source code, and *.gpr indicates a GPRbuild project file used for Ada, and these days often being combined with alire for package dependency resolution. Alire stores a bunch of files named alire.toml in different directories in your (gitignored) cache/dependencies/... Closes #2013	2023-07-08 18:52:42 -04:00
Juan Francisco Cantero Hurtado	ee1360cc07	ignore/types: add raku extensions to ignore types Closes #2117	2023-07-08 18:52:42 -04:00
Andrew Gallant	da7c81fb96	ignore/types: add MDX format to Markdown types Ref https://mdxjs.com/ Closes #2142	2023-07-08 18:52:42 -04:00
chrispy	a4e3d56de1	ignore/types: add DITA (Darwin Information Typing Architecture) Closes #2148	2023-07-08 18:52:42 -04:00
Ludi Rehak	7c83b90f95	doc: fix typo Closes #2153	2023-07-08 18:52:42 -04:00
cuishuang	97b5b7769c	doc: fix some typos Closes #2195	2023-07-08 18:52:42 -04:00
Richard Sternagel	f3241fd657	cli: '--no-ignore-dot' should also '.rgignore' Fixes #2198, Closes #2202	2023-07-08 18:52:42 -04:00
Andrew Gallant	cfe357188d	ignore/types: fix formatting	2023-07-08 18:52:42 -04:00
edam	792451e331	ignore/types: added V type V (http://vlang.io) uses '.v' files. Closes #2302	2023-07-08 18:52:42 -04:00
Alex Rawson	f34fd5c4b6	globset: introduce option to keep empty alternates Add a method GlobBuilder::empty_alternates and supporting mechanisms. Ref #1368 Closes #2369	2023-07-08 18:52:42 -04:00
Jérome Eertmans	d51c6c005a	globset: permit deserializing Glob from String Closes #2386, Closes #2388	2023-07-08 18:52:42 -04:00
Mark Sisson	0f6181d309	ignore/types: add USD to the default file types Closes #2432	2023-07-08 18:52:42 -04:00
Sam James	e902e2fef4	ignore/types: add Gentoo eclass type Eclasses are "ebuild libraries" and generally if you're filtering for/filtering out an ebuild/eclass, you don't want the other either. Followup to 4dfea016b915bb1e88679361de83a91e60447835 Closes #2437	2023-07-08 18:52:42 -04:00
angrycandy	07cbfee225	ignore/types: improve Elixir globs Closes #2450	2023-07-08 18:52:42 -04:00
Andrew Gallant	d675844510	core: don't let context flags override each other This matches the behavior of GNU grep which does not ignore before-context and after-context completely if the context flag is also provided. Note that this change wasn't done just to match GNU grep. In this case, GNU grep has the more sensible behavior. Fixes #2288, Closes #2451	2023-07-08 18:52:42 -04:00
Misaki	43bbcca06f	doc: note '-n' and '-N' override each other Closes #2460	2023-07-08 18:52:42 -04:00
Eric Arellano	ad9bfdd981	ignore/gitignore: expose `gitconfig_excludes_path` I have reservations about this, but it looks useful and doesn't seem terribly onerous to support. The `ignore` crate will really always need to have some kind of logic supporting this in some form I think. Closes #2482	2023-07-08 18:52:42 -04:00
Jakub Jirutka	0c1cbd99f3	ignore: tweak regex crate features This removes most of the Unicode features as they aren't currently used. We can always add them back later if necessary. We can avoid the unicode-perl feature by changing `\s` to `[[:space:]]`, which uses the ASCII-only definition of `\s`. Since we don't expect non-ASCII whitespace in git config files, this seems okay. Closes #2502	2023-07-08 18:52:42 -04:00
Jon Parise	96cfc0ed13	ignore/types: add 'graphql' type GraphQL file extensions: .graphql and .graphqls (schema) We could also add `.gql`, but perhaps it's less correct to do so. We'll start conservatively here, and we can always add `.gql` later. Closes #2439, Closes #2508	2023-07-08 18:52:42 -04:00
mataha	da8ecddce9	cli: make resolve_binary take COM executables into account When `resolve_binary()` attempts to resolve a path to a program on Windows while searching for a program in `PATH` without an extension, `ripgrep` will assume the extension of the file to be `.exe` as it's the de facto standard, which will work most (99.99%) of the time... ...unless the binary is a COM executable (we're on Windows, duh). Closes #2523	2023-07-08 18:52:42 -04:00
Yifei Teng	545a7dc759	ignore/types: add cml to the default types list It's used in Fuchsia to mean "component manifest language."[1] [1]: https://fuchsia.dev/reference/cml?hl=en Closes #2529	2023-07-08 18:52:42 -04:00
Andrew Gallant	f4d07b9cbd	grep-cli-0.1.8	2023-07-05 17:09:09 -04:00
Andrew Gallant	3ac4541e9f	regex: remove old inner literal extractor (It had already been removed from the crate.)	2023-07-05 14:04:29 -04:00
Andrew Gallant	a68db3ac02	deps: drop temporary patch and move to bstr 1.6 Now that regex 1.9 is out, we can depend on it from crates.io.	2023-07-05 14:04:29 -04:00
Andrew Gallant	ca740d9ace	regex: add new inner literal extractor This is mostly a copy of the prefix literal extractor in regex-syntax, but with a tweaked notion of Seq that keeps track of whether it's a prefix of an expression or not. If it isn't, then we can't cross it as a suffix to another Seq. This new extractor should be a lot more robust than the old one. We actually will keep going through the regex to try and find the "best" literals to search for (according to some heuristic).	2023-07-05 14:04:29 -04:00
Andrew Gallant	e80c102dee	regex: tweak formatting of regex-automata version spec This makes it easier to enable the `logging` feature for regex-automata. I wish I could just enable it unconditionally, but it winds up producing a lot of output because ripgrep uses regexes for things other than the primary search (like every glob). Sigh.	2023-07-05 14:04:29 -04:00
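Commit 6cdb99ea61 replaces bytecount with memchr_iter(..).count(), presumably for counting line terminators when computing line numbers (its benchmark runs with --line-number). The std-only Rust below is a minimal sketch of those semantics only: it counts terminators with a plain iterator filter and derives a 1-based line number from a byte offset. Both function names are hypothetical, and the real speed comes from memchr's specialized count, not from this loop.

```rust
/// Counts how many times the line terminator `term` occurs in `bytes`.
/// Same result as `memchr::memchr_iter(term, bytes).count()`, but without
/// memchr's SIMD-accelerated counting; this only shows the semantics.
fn count_terminators(bytes: &[u8], term: u8) -> u64 {
    bytes.iter().filter(|&&b| b == term).count() as u64
}

/// Hypothetical helper: the 1-based line number of the line containing
/// byte offset `offset`, i.e. one more than the newlines before it.
fn line_number_at(haystack: &[u8], offset: usize) -> u64 {
    count_terminators(&haystack[..offset], b'\n') + 1
}
```

For the haystack "one\ntwo\nthree\n", byte offset 8 is the start of "three", two newlines precede it, so line_number_at reports line 3.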
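Commit 6cd9479634 implements FusedIterator for ignore's Walk. That marker trait is a promise that once next() returns None, every later call also returns None. The sketch below uses a hypothetical ToyWalk in place of the real Walk just to show what such an impl looks like and what it guarantees to callers.

```rust
use std::iter::FusedIterator;

/// Hypothetical stand-in for a directory walker (not ignore's Walk):
/// it pops paths off a stack until the stack is empty.
struct ToyWalk {
    stack: Vec<String>,
}

impl Iterator for ToyWalk {
    type Item = String;

    fn next(&mut self) -> Option<String> {
        // An empty Vec keeps returning None, so the fused contract holds.
        self.stack.pop()
    }
}

// Marker impl: no methods, only the guarantee that None is permanent.
// Generic code can rely on it, and std's `Iterator::fuse` can avoid
// extra bookkeeping for iterators that implement it.
impl FusedIterator for ToyWalk {}
```

The impl is only sound because next() never yields Some again after the stack empties; a walker that could resume after returning None must not implement the trait.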
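Commit 6abb962f0d fixes non-path sorting by collecting every path to search and sorting the whole list before searching. The sketch below illustrates why that matters, using a hypothetical Entry type and made-up modification times: a global sort can interleave files from different directories, which sorting parents and then children during traversal cannot do.

```rust
use std::path::PathBuf;
use std::time::SystemTime;

/// Hypothetical entry: a path and its modification time.
#[derive(Debug, Clone)]
struct Entry {
    path: PathBuf,
    modified: SystemTime,
}

/// Collect-then-sort: takes every entry gathered during traversal and
/// orders the whole list by modification time, across directories.
fn sort_by_modified(mut entries: Vec<Entry>, descending: bool) -> Vec<PathBuf> {
    entries.sort_by(|a, b| {
        let ord = a.modified.cmp(&b.modified);
        if descending { ord.reverse() } else { ord }
    });
    entries.into_iter().map(|e| e.path).collect()
}
```

With a/x modified at t=3, a/z at t=2 and b/y at t=1, an ascending sort yields b/y, a/z, a/x. A per-directory sort during traversal would instead keep both a/ entries together, which is the behavior the commit describes as only working for ascending path order.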
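Commit 6d95c130d5 adds --stop-on-nonmatch, which stops searching a file at the first non-matching line, but only after at least one line has matched. The sketch below models that rule over a string; search_lines and its closure-based matcher are hypothetical stand-ins for ripgrep's real searcher and regex matcher.

```rust
/// Returns the lines that would be reported for one file.
/// `is_match` stands in for the real matcher. When `stop_on_nonmatch`
/// is true, the first non-matching line after any match ends the search;
/// non-matching lines before the first match are simply skipped.
fn search_lines<'a, F>(haystack: &'a str, is_match: F, stop_on_nonmatch: bool) -> Vec<&'a str>
where
    F: Fn(&str) -> bool,
{
    let mut matches = Vec::new();
    for line in haystack.lines() {
        if is_match(line) {
            matches.push(line);
        } else if stop_on_nonmatch && !matches.is_empty() {
            // A non-matching line after at least one match: stop here.
            break;
        }
    }
    matches
}
```

On the input "a\nfoo 1\nfoo 2\nb\nfoo 3\n" with a matcher for "foo", the flag yields only "foo 1" and "foo 2", because the line "b" ends the search before "foo 3" is reached.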
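Commit 4782ebd5e0 introduces an eprintln_locked macro that locks stdout before writing to stderr. A minimal sketch of that idea follows; the macro actually in ripgrep may differ in detail. Holding the stdout guard means another thread printing results through the same lock has to wait, so an error message cannot land in the middle of a result line when both streams go to one terminal.

```rust
/// Prints to stderr like `eprintln!`, but while holding the stdout lock.
macro_rules! eprintln_locked {
    ($($tt:tt)*) => {{
        // Take the stdout lock for the duration of the stderr write.
        let stdout = std::io::stdout();
        let _guard = stdout.lock();
        eprintln!($($tt)*);
        // `_guard` is dropped here, releasing stdout.
    }};
}
```

Usage mirrors eprintln!, e.g. eprintln_locked!("rg: {}: No such file or directory", path). The lock is released as soon as the block ends, so normal stdout printing resumes immediately afterwards.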
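Commit d675844510 stops the context flags from overriding each other, matching GNU grep: an explicit -A/--after-context or -B/--before-context keeps its value, and -C/--context only fills in a side that was not set. The function below is a hypothetical model of that resolution rule, not ripgrep's argument-parsing code.

```rust
/// Resolves (before, after) context line counts.
/// `context` is -C, `before` is -B, `after` is -A; `None` means "not given".
/// An explicit -A or -B wins for its own side; -C is only a fallback.
fn resolve_context(
    context: Option<usize>,
    before: Option<usize>,
    after: Option<usize>,
) -> (usize, usize) {
    let fallback = context.unwrap_or(0);
    (before.unwrap_or(fallback), after.unwrap_or(fallback))
}
```

So `-C 5 -A 1` resolves to 5 lines before and 1 line after, rather than letting one flag discard the other.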


287 Commits