1
0
mirror of https://github.com/BurntSushi/ripgrep.git synced 2024-12-12 19:18:24 +02:00
ripgrep/README.md

391 lines
15 KiB
Markdown
Raw Normal View History

2016-09-12 01:27:50 +02:00
ripgrep (rg)
------------
`ripgrep` is a line oriented search tool that combines the usability of The
2017-02-24 15:41:08 +02:00
Silver Searcher (similar to `ack`) with the raw speed of GNU grep. `ripgrep`
works by recursively searching your current directory for a regex pattern.
`ripgrep` has first class support on Windows, Mac and Linux, with binary
downloads available for
[every release](https://github.com/BurntSushi/ripgrep/releases).
2016-10-22 23:44:38 +02:00
[![Linux build status](https://travis-ci.org/BurntSushi/ripgrep.svg?branch=master)](https://travis-ci.org/BurntSushi/ripgrep)
2016-09-23 12:56:56 +02:00
[![Windows build status](https://ci.appveyor.com/api/projects/status/github/BurntSushi/ripgrep?svg=true)](https://ci.appveyor.com/project/BurntSushi/ripgrep)
[![](https://img.shields.io/crates/v/ripgrep.svg)](https://crates.io/crates/ripgrep)
Dual-licensed under MIT or the [UNLICENSE](http://unlicense.org).
### Screenshot of search results
[![A screenshot of a sample search with ripgrep](http://burntsushi.net/stuff/ripgrep1.png)](http://burntsushi.net/stuff/ripgrep1.png)
2016-11-07 00:59:57 +02:00
### Quick examples comparing tools
2016-09-23 12:56:56 +02:00
This example searches the entire Linux kernel source tree (after running
`make defconfig && make -j8`) for `[A-Z]+_SUSPEND`, where all matches must be
2016-11-07 01:02:45 +02:00
words. Timings were collected on a system with an Intel i7-6900K 3.2 GHz, and
ripgrep was compiled using the `compile` script in this repo.
2016-09-23 12:56:56 +02:00
Please remember that a single benchmark is never enough! See my
[blog post on `ripgrep`](http://blog.burntsushi.net/ripgrep/)
for a very detailed comparison with more benchmarks and analysis.
| Tool | Command | Line count | Time |
| ---- | ------- | ---------- | ---- |
2016-11-07 00:59:57 +02:00
| ripgrep (Unicode) | `rg -n -w '[A-Z]+_SUSPEND'` | 450 | **0.134s** |
2016-09-23 12:56:56 +02:00
| [The Silver Searcher](https://github.com/ggreer/the_silver_searcher) | `ag -w '[A-Z]+_SUSPEND'` | 450 | 0.753s |
| [git grep](https://www.kernel.org/pub/software/scm/git/docs/git-grep.html) | `LC_ALL=C git grep -E -n -w '[A-Z]+_SUSPEND'` | 450 | 0.823s |
2016-09-28 22:47:10 +02:00
| [git grep (Unicode)](https://www.kernel.org/pub/software/scm/git/docs/git-grep.html) | `LC_ALL=en_US.UTF-8 git grep -E -n -w '[A-Z]+_SUSPEND'` | 450 | 2.880s |
2016-09-23 12:56:56 +02:00
| [sift](https://github.com/svent/sift) | `sift --git -n -w '[A-Z]+_SUSPEND'` | 450 | 3.656s |
| [The Platinum Searcher](https://github.com/monochromegane/the_platinum_searcher) | `pt -w -e '[A-Z]+_SUSPEND'` | 450 | 12.369s |
2016-11-07 01:04:55 +02:00
| [ack](https://github.com/petdance/ack2) | `ack -w '[A-Z]+_SUSPEND'` | 1878 | 16.952s |
2016-09-23 12:56:56 +02:00
(Yes, `ack` [has](https://github.com/petdance/ack2/issues/445) a
[bug](https://github.com/petdance/ack2/issues/14).)
Here's another benchmark that disregards gitignore files and searches with a
whitelist instead. The corpus is the same as in the previous benchmark, and the
2016-11-07 01:49:07 +02:00
flags passed to each command ensures that they are doing equivalent work:
| Tool | Command | Line count | Time |
| ---- | ------- | ---------- | ---- |
2016-11-07 01:01:55 +02:00
| ripgrep | `rg -L -u -tc -n -w '[A-Z]+_SUSPEND'` | 404 | **0.108s** |
| [ucg](https://github.com/gvansickle/ucg) | `ucg --type=cc -w '[A-Z]+_SUSPEND'` | 392 | 0.219s |
| [GNU grep](https://www.gnu.org/software/grep/) | `egrep -R -n --include='*.c' --include='*.h' -w '[A-Z]+_SUSPEND'` | 404 | 0.733s |
2016-11-07 02:45:18 +02:00
(`ucg` [has slightly different behavior in the presence of symbolic links](https://github.com/gvansickle/ucg/issues/106).)
And finally, a straight up comparison between ripgrep and GNU grep on a single
2016-11-07 00:59:57 +02:00
large file (~9.3GB,
[`OpenSubtitles2016.raw.en.gz`](http://opus.lingfil.uu.se/OpenSubtitles2016/mono/OpenSubtitles2016.raw.en.gz)):
| Tool | Command | Line count | Time |
| ---- | ------- | ---------- | ---- |
2016-11-07 00:59:57 +02:00
| ripgrep | `rg -w 'Sherlock [A-Z]\w+'` | 5268 | **2.520s** |
| [GNU grep](https://www.gnu.org/software/grep/) | `LC_ALL=C egrep -w 'Sherlock [A-Z]\w+'` | 5268 | 7.143s |
In the above benchmark, passing the `-n` flag (for showing line numbers)
increases the times to `3.081s` for ripgrep and `11.403s` for GNU grep.
2016-09-23 12:56:56 +02:00
### Why should I use `ripgrep`?
* It can replace both The Silver Searcher and GNU grep because it is faster
than both. (N.B. It is not, strictly speaking, a "drop-in" replacement for
both, but the feature sets are far more similar than different.)
* Like The Silver Searcher, `ripgrep` defaults to recursive directory search
and won't search files ignored by your `.gitignore` files. It also ignores
hidden and binary files by default. `ripgrep` also implements full support
for `.gitignore`, where as there are many bugs related to that functionality
in The Silver Searcher.
* `ripgrep` can search specific types of files. For example, `rg -tpy foo`
limits your search to Python files and `rg -Tjs foo` excludes Javascript
files from your search. `ripgrep` can be taught about new file types with
custom matching rules.
* `ripgrep` supports many features found in `grep`, such as showing the context
of search results, searching multiple patterns, highlighting matches with
color and full Unicode support. Unlike GNU grep, `ripgrep` stays fast while
supporting Unicode (which is always on).
Add support for additional text encodings. This includes, but is not limited to, UTF-16, latin-1, GBK, EUC-JP and Shift_JIS. (Courtesy of the `encoding_rs` crate.) Specifically, this feature enables ripgrep to search files that are encoded in an encoding other than UTF-8. The list of available encodings is tied directly to what the `encoding_rs` crate supports, which is in turn tied to the Encoding Standard. The full list of available encodings can be found here: https://encoding.spec.whatwg.org/#concept-encoding-get This pull request also introduces the notion that text encodings can be automatically detected on a best effort basis. Currently, the only support for this is checking for a UTF-16 bom. In all other cases, a text encoding of `auto` (the default) implies a UTF-8 or ASCII compatible source encoding. When a text encoding is otherwise specified, it is unconditionally used for all files searched. Since ripgrep's regex engine is fundamentally built on top of UTF-8, this feature works by transcoding the files to be searched from their source encoding to UTF-8. This transcoding only happens when: 1. `auto` is specified and a non-UTF-8 encoding is detected. 2. A specific encoding is given by end users (including UTF-8). When transcoding occurs, errors are handled by automatically inserting the Unicode replacement character. In this case, ripgrep's output is guaranteed to be valid UTF-8 (excluding non-UTF-8 file paths, if they are printed). In all other cases, the source text is searched directly, which implies an assumption that it is at least ASCII compatible, but where UTF-8 is most useful. In this scenario, encoding errors are not detected. In this case, ripgrep's output will match the input exactly, byte-for-byte. This design may not be optimal in all cases, but it has some advantages: 1. In the happy path ("UTF-8 everywhere") remains happy. I have not been able to witness any performance regressions. 2. In the non-UTF-8 path, implementation complexity is kept relatively low. The cost here is transcoding itself. A potentially superior implementation might build decoding of any encoding into the regex engine itself. In particular, the fundamental problem with transcoding everything first is that literal optimizations are nearly negated. Future work should entail improving the user experience. For example, we might want to auto-detect more text encodings. A more elaborate UX experience might permit end users to specify multiple text encodings, although this seems hard to pull off in an ergonomic way. Fixes #1
2017-03-09 03:22:48 +02:00
* `ripgrep` supports searching files in text encodings other than UTF-8, such
as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more. (Some support for
automatically detecting UTF-16 is provided. Other text encodings must be
specifically specified with the `-E/--encoding` flag.)
2016-09-23 12:56:56 +02:00
2017-01-10 02:55:56 +02:00
In other words, use `ripgrep` if you like speed, filtering by default, fewer
bugs and Unicode support.
### Why shouldn't I use `ripgrep`?
I'd like to try to convince you why you *shouldn't* use `ripgrep`. This should
give you a glimpse at some important downsides or missing features of
`ripgrep`.
* `ripgrep` uses a regex engine based on finite automata, so if you want fancy
regex features such as backreferences or look around, `ripgrep` won't give
them to you. `ripgrep` does support lots of things though, including, but not
limited to: lazy quantification (e.g., `a+?`), repetitions (e.g., `a{2,5}`),
begin/end assertions (e.g., `^\w+$`), word boundaries (e.g., `\bfoo\b`), and
support for Unicode categories (e.g., `\p{Sc}` to match currency symbols or
`\p{Lu}` to match any uppercase letter). (Fancier regexes will never be
supported.)
* `ripgrep` doesn't yet support searching compressed files. (Likely to be
supported in the future.)
* `ripgrep` doesn't have multiline search. (Unlikely to ever be supported.)
Add support for additional text encodings. This includes, but is not limited to, UTF-16, latin-1, GBK, EUC-JP and Shift_JIS. (Courtesy of the `encoding_rs` crate.) Specifically, this feature enables ripgrep to search files that are encoded in an encoding other than UTF-8. The list of available encodings is tied directly to what the `encoding_rs` crate supports, which is in turn tied to the Encoding Standard. The full list of available encodings can be found here: https://encoding.spec.whatwg.org/#concept-encoding-get This pull request also introduces the notion that text encodings can be automatically detected on a best effort basis. Currently, the only support for this is checking for a UTF-16 bom. In all other cases, a text encoding of `auto` (the default) implies a UTF-8 or ASCII compatible source encoding. When a text encoding is otherwise specified, it is unconditionally used for all files searched. Since ripgrep's regex engine is fundamentally built on top of UTF-8, this feature works by transcoding the files to be searched from their source encoding to UTF-8. This transcoding only happens when: 1. `auto` is specified and a non-UTF-8 encoding is detected. 2. A specific encoding is given by end users (including UTF-8). When transcoding occurs, errors are handled by automatically inserting the Unicode replacement character. In this case, ripgrep's output is guaranteed to be valid UTF-8 (excluding non-UTF-8 file paths, if they are printed). In all other cases, the source text is searched directly, which implies an assumption that it is at least ASCII compatible, but where UTF-8 is most useful. In this scenario, encoding errors are not detected. In this case, ripgrep's output will match the input exactly, byte-for-byte. This design may not be optimal in all cases, but it has some advantages: 1. In the happy path ("UTF-8 everywhere") remains happy. I have not been able to witness any performance regressions. 2. In the non-UTF-8 path, implementation complexity is kept relatively low. The cost here is transcoding itself. A potentially superior implementation might build decoding of any encoding into the regex engine itself. In particular, the fundamental problem with transcoding everything first is that literal optimizations are nearly negated. Future work should entail improving the user experience. For example, we might want to auto-detect more text encodings. A more elaborate UX experience might permit end users to specify multiple text encodings, although this seems hard to pull off in an ergonomic way. Fixes #1
2017-03-09 03:22:48 +02:00
In other words, if you like fancy regexes, searching compressed files or
multiline search, then `ripgrep` may not quite meet your needs (yet).
2016-09-23 12:56:56 +02:00
### Is it really faster than everything else?
Yes. A large number of benchmarks with detailed analysis for each is
[available on my blog](http://blog.burntsushi.net/ripgrep/).
Summarizing, `ripgrep` is fast because:
* It is built on top of
[Rust's regex engine](https://github.com/rust-lang-nursery/regex).
Rust's regex engine uses finite automata, SIMD and aggressive literal
optimizations to make searching very fast.
* Rust's regex library maintains performance with full Unicode support by
building UTF-8 decoding directly into its deterministic finite automaton
engine.
* It supports searching with either memory maps or by searching incrementally
with an intermediate buffer. The former is better for single files and the
latter is better for large directories. `ripgrep` chooses the best searching
strategy for you automatically.
* Applies your ignore patterns in `.gitignore` files using a
[`RegexSet`](https://doc.rust-lang.org/regex/regex/struct.RegexSet.html).
That means a single file path can be matched against multiple glob patterns
simultaneously.
2016-11-07 01:51:00 +02:00
* It uses a lock-free parallel recursive directory iterator, courtesy of
[`crossbeam`](https://docs.rs/crossbeam) and
[`ignore`](https://docs.rs/ignore).
2016-09-23 12:56:56 +02:00
### Installation
The binary name for `ripgrep` is `rg`.
[Binaries for `ripgrep` are available for Windows, Mac and
Linux.](https://github.com/BurntSushi/ripgrep/releases) Linux binaries are
static executables. Windows binaries are available either as built with MinGW
(GNU) or with Microsoft Visual C++ (MSVC). When possible, prefer MSVC over GNU,
but you'll need to have the
[Microsoft VC++ 2015 redistributable](https://www.microsoft.com/en-us/download/details.aspx?id=48145)
2016-09-23 12:56:56 +02:00
installed.
If you're a **Mac OS X Homebrew** user, then you can install ripgrep either
from homebrew-core, (compiled with rust stable, no SIMD):
```
$ brew install ripgrep
```
or you can install a binary compiled with rust nightly (including SIMD and all
optimizations) by utilizing a custom tap:
2016-09-23 12:56:56 +02:00
```
2016-09-26 19:53:19 +02:00
$ brew tap burntsushi/ripgrep https://github.com/BurntSushi/ripgrep.git
$ brew install burntsushi/ripgrep/ripgrep-bin
2016-09-23 12:56:56 +02:00
```
If you're a **Windows Chocolatey** user, then you can install `ripgrep` from the [official repo](https://chocolatey.org/packages/ripgrep):
```
$ choco install ripgrep
```
If you're an **Arch Linux** user, then you can install `ripgrep` from the official repos:
2016-09-23 12:56:56 +02:00
```
$ pacman -S ripgrep
2016-09-23 12:56:56 +02:00
```
2016-11-02 04:01:04 +02:00
If you're a **Gentoo** user, you can install `ripgrep` from the [official repo](https://packages.gentoo.org/packages/sys-apps/ripgrep):
```
$ emerge ripgrep
```
If you're a **Fedora 24+** user, you can install `ripgrep` from [copr](https://copr.fedorainfracloud.org/coprs/carlgeorge/ripgrep/):
```
$ dnf copr enable carlgeorge/ripgrep
$ dnf install ripgrep
```
If you're a **RHEL/CentOS 7** user, you can install `ripgrep` from [copr](https://copr.fedorainfracloud.org/coprs/carlgeorge/ripgrep/):
```
$ yum-config-manager --add-repo=https://copr.fedorainfracloud.org/coprs/carlgeorge/ripgrep/repo/epel-7/carlgeorge-ripgrep-epel-7.repo
$ yum install ripgrep
```
If you're a **Nix** user, you can install `ripgrep` from
[nixpkgs](https://github.com/NixOS/nixpkgs/blob/master/pkgs/tools/text/ripgrep/default.nix):
```
$ nix-env --install ripgrep
$ # (Or using the attribute name, which is also `ripgrep`.)
```
2017-01-14 15:51:30 +02:00
If you're a **Rust programmer**, `ripgrep` can be installed with `cargo`. Note
that this requires you to have **Rust 1.12 or newer** installed.
2016-09-23 12:56:56 +02:00
```
$ cargo install ripgrep
```
`ripgrep` isn't currently in any other package repositories.
[I'd like to change that](https://github.com/BurntSushi/ripgrep/issues/10).
### Whirlwind tour
The command line usage of `ripgrep` doesn't differ much from other tools that
perform a similar function, so you probably already know how to use `ripgrep`.
The full details can be found in `rg --help`, but let's go on a whirlwind tour.
`ripgrep` detects when its printing to a terminal, and will automatically
colorize your output and show line numbers, just like The Silver Searcher.
Coloring works on Windows too! Colors can be controlled more granularly with
the `--color` flag.
One last thing before we get started: `ripgrep` assumes UTF-8 *everywhere*. It
can still search files that are invalid UTF-8 (like, say, latin-1), but it will
simply not work on UTF-16 encoded files or other more exotic encodings.
[Support for other encodings may
happen.](https://github.com/BurntSushi/ripgrep/issues/1)
To recursively search the current directory, while respecting all `.gitignore`
files, ignore hidden files and directories and skip binary files:
```
$ rg foobar
```
The above command also respects all `.ignore` files, including in parent
directories. `.ignore` files can be used when `.gitignore` files are
insufficient. In all cases, `.ignore` patterns take precedence over
2016-09-23 12:56:56 +02:00
`.gitignore`.
To ignore all ignore files, use `-u`. To additionally search hidden files
and directories, use `-uu`. To additionally search binary files, use `-uuu`.
(In other words, "search everything, dammit!") In particular, `rg -uuu` is
similar to `grep -a -r`.
```
$ rg -uu foobar # similar to `grep -r`
$ rg -uuu foobar # similar to `grep -a -r`
```
(Tip: If your ignore files aren't being adhered to like you expect, run your
search with the `--debug` flag.)
Make the search case insensitive with `-i`, invert the search with `-v` or
show the 2 lines before and after every search result with `-C2`.
Force all matches to be surrounded by word boundaries with `-w`.
Search and replace (find first and last names and swap them):
```
$ rg '([A-Z][a-z]+)\s+([A-Z][a-z]+)' --replace '$2, $1'
```
Named groups are supported:
```
$ rg '(?P<first>[A-Z][a-z]+)\s+(?P<last>[A-Z][a-z]+)' --replace '$last, $first'
```
Up the ante with full Unicode support, by matching any uppercase Unicode letter
followed by any sequence of lowercase Unicode letters (good luck doing this
with other search tools!):
```
$ rg '(\p{Lu}\p{Ll}+)\s+(\p{Lu}\p{Ll}+)' --replace '$2, $1'
```
Search only files matching a particular glob:
```
$ rg foo -g 'README.*'
```
<!--*-->
Or exclude files matching a particular glob:
```
$ rg foo -g '!*.min.js'
```
Search and return paths matching a particular glob (i.e., `-g` flag in ag/ack):
```
$ rg -g 'doc*' --files
```
2016-09-23 12:56:56 +02:00
Search only HTML and CSS files:
```
$ rg -thtml -tcss foobar
```
Search everything except for Javascript files:
```
$ rg -Tjs foobar
```
To see a list of types supported, run `rg --type-list`. To add a new type, use
`--type-add`, which must be accompanied by a pattern for searching (`rg` won't
persist your type settings):
2016-09-23 12:56:56 +02:00
```
$ rg --type-add 'foo:*.{foo,foobar}' -tfoo bar
2016-09-23 12:56:56 +02:00
```
The type `foo` will now match any file ending with the `.foo` or `.foobar`
extensions.
### Regex syntax
The syntax supported is
[documented as part of Rust's regex library](https://doc.rust-lang.org/regex/regex/index.html#syntax).
### Shell completions
Shell completion files are included in the release tarball for Bash, Fish, Zsh
and PowerShell.
For **bash**, move `rg.bash-completion` to `$XDG_CONFIG_HOME/bash_completion`
or `/etc/bash_completion.d/`.
For **fish**, move `rg.fish` to `$HOME/.config/fish/completions`.
2016-09-23 12:56:56 +02:00
### Building
`ripgrep` is written in Rust, so you'll need to grab a
[Rust installation](https://www.rust-lang.org/) in order to compile it.
2017-01-10 03:02:29 +02:00
`ripgrep` compiles with Rust 1.12 (stable) or newer. Building is easy:
2016-09-23 12:56:56 +02:00
```
$ git clone https://github.com/BurntSushi/ripgrep
2016-09-23 12:56:56 +02:00
$ cd ripgrep
$ cargo build --release
$ ./target/release/rg --version
0.1.3
```
If you have a Rust nightly compiler, then you can enable optional SIMD
acceleration like so:
```
2016-11-07 00:59:57 +02:00
RUSTFLAGS="-C target-cpu=native" cargo build --release --features 'simd-accel avx-accel'
2016-09-23 12:56:56 +02:00
```
2016-11-07 00:59:57 +02:00
If your machine doesn't support AVX instructions, then simply remove
`avx-accel` from the features list. Similarly for SIMD.
2016-09-23 12:56:56 +02:00
### Running tests
`ripgrep` is relatively well tested, including both unit tests and integration
tests. To run the full test suite, use:
```
$ cargo test
```
from the repository root.
### Known issues
#### I just hit Ctrl+C in the middle of ripgrep's output and now my terminal's foreground color is wrong!
Type in `color` on Windows and `echo -ne "\033[0m"` on Unix to restore your
original foreground color.
PR [#187](https://github.com/BurntSushi/ripgrep/pull/187) fixed this, and it
was later deprecated in
[#281](https://github.com/BurntSushi/ripgrep/issues/281). A full explanation is
available [here][msys issue explanation].
[msys issue explanation]: https://github.com/BurntSushi/ripgrep/issues/281#issuecomment-269093893