mirror of https://github.com/BurntSushi/ripgrep.git synced 2024-12-12 19:18:24 +02:00

1590 Commits

Author SHA1 Message Date
Andrew Gallant
91470572cd changelog: add notes about new file types 2020-02-17 17:16:28 -05:00
Sven-Hendrik Haase
027adbf485 ignore/types: add 'diff' file type
This includes .patch and .diff files.

Fixes #1418, Closes #1419
2020-02-17 17:16:28 -05:00
Mohammad AlSaleh
e71eedf0eb cli: add --no-context-separator flag
`--context-separator=''` still prints an empty line as a separator, which
can still be useful. So we add a new `--no-context-separator` flag that
disables context separators entirely, even when the -A/-B/-C context
flags are used.
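
The difference between an empty separator and no separator can be sketched as follows. This is a hypothetical model, not ripgrep's printer: `render_groups` is an invented helper, and each context group is assumed to be a string that already ends in a newline.

```rust
/// Hypothetical model of the separator setting (not ripgrep's actual type).
/// `Some(sep)` prints `sep` plus a newline between context groups;
/// `None` (the effect of `--no-context-separator`) prints nothing at all.
fn render_groups(groups: &[&str], separator: Option<&str>) -> String {
    let mut out = String::new();
    for (i, group) in groups.iter().enumerate() {
        if i > 0 {
            if let Some(sep) = separator {
                // Even an empty separator still emits its terminating newline,
                // which is why `--context-separator=''` leaves a blank line.
                out.push_str(sep);
                out.push('\n');
            }
        }
        out.push_str(group);
    }
    out
}
```

With groups `"a\n"` and `"b\n"`, `Some("")` yields `"a\n\nb\n"`, whereas `None` yields `"a\nb\n"`.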

Closes #1390
2020-02-17 17:16:28 -05:00
Andrew Gallant
88f46d12f1 tests: remove existing test directory
I'm surprised this wasn't caught until now, but if a test directory
already existed, then it was reused. This can result in hard-to-debug
problems with tests when, e.g., file names are changed and a recursive
search is executed.
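
A minimal sketch of the fix, assuming a hypothetical `fresh_test_dir` helper (ripgrep's actual test harness may be structured differently). Any leftover directory is deleted before the directory is recreated.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: guarantee that a test starts from an empty directory,
/// even if a previous run left files behind.
fn fresh_test_dir(dir: &Path) -> io::Result<()> {
    if dir.exists() {
        // Stale files from an earlier run could otherwise leak into a
        // recursive search and produce confusing test failures.
        fs::remove_dir_all(dir)?;
    }
    fs::create_dir_all(dir)
}
```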
2020-02-17 17:16:28 -05:00
sharkdp
a18cf6ec39 ignore: add existence check for ignore files
This commit adds a simple `.exists()` check for `.gitignore`,
`.ignore`, and other similar files before actually calling
`File::open(…)` in `GitIgnoreBuilder::add`.

The reason is that a simple existence check via `stat` can be faster
than actually trying to `open` the file, see
https://stackoverflow.com/a/12774387/704831. As we typically expect(?)
the number of directories *without* ignore files to be much larger
than the number of directories *with* ignore files, this leads to an
overall speedup.

The performance gain is not huge for `rg`, but can be quite significant
if more `.gitignore`-like files are added via
`add_custom_ignore_filename`. The speedup is *larger* for folders with
*low* files-per-directory ratios.

Note though that we do not do this check on Windows until a specific
analysis there suggests this is beneficial. Namely, Windows generally
has slower file system operations, so it's not clear whether this
speculative check is actually a benefit or not.
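
The check described above might look roughly like the following sketch. `open_ignore_file` is a hypothetical name, not the `ignore` crate's API. It uses `Path::exists` (a metadata query, i.e. a `stat` on Unix) and skips the check on Windows, as the note above describes.

```rust
use std::fs::File;
use std::io;
use std::path::Path;

/// Hypothetical sketch; not the `ignore` crate's code.
/// Returns `Ok(None)` when there is no ignore file to read.
fn open_ignore_file(path: &Path) -> io::Result<Option<File>> {
    // On non-Windows platforms, a cheap metadata check lets us skip the more
    // expensive `open` in the common case where no ignore file exists.
    #[cfg(not(windows))]
    {
        if !path.exists() {
            return Ok(None);
        }
    }
    match File::open(path) {
        Ok(file) => Ok(Some(file)),
        // The file may also disappear between the check and the open
        // (and on Windows, this is the only "missing file" path).
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(None),
        Err(err) => Err(err),
    }
}
```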

Benchmark results
-----------------

`rg --files` in my home folder (200k results, 6.5 files per directory):

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./rg-master --files` | 396.4 ± 3.2 | 390.9 | 400.0 | 1.05 |
| `./rg-feature --files` | 376.0 ± 3.6 | 369.3 | 383.5 | 1.00 |

`rg --files --hidden` in my home folder (800k results, 5.4 files per directory):

| Command | Mean [s] | Min [s] | Max [s] | Relative |
|:---|---:|---:|---:|---:|
| `./rg-master --files --hidden` | 1.575 ± 0.012 | 1.560 | 1.597 | 1.06 |
| `./rg-feature --files --hidden` | 1.479 ± 0.011 | 1.464 | 1.496 | 1.00 |

`rg --files` in the chromium-79.0.3915.2 source tree (300k results, 12.7 files per directory):

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `~/rg-master --files` | 445.2 ± 5.3 | 435.6 | 453.0 | 1.04 |
| `~/rg-feature --files` | 428.9 ± 7.0 | 418.2 | 440.0 | 1.00 |

`rg --files` in the linux-5.3 source tree (65k results, 15.1 files per directory):

| Command | Mean [ms] | Min [ms] | Max [ms] | Relative |
|:---|---:|---:|---:|---:|
| `./rg-master --files` | 94.5 ± 1.9 | 89.8 | 98.5 | 1.02 |
| `./rg-feature --files` | 92.6 ± 2.7 | 88.4 | 98.7 | 1.00 |

Closes #1381
2020-02-17 17:16:28 -05:00
Gibson Fahnestock
c78c3236a8
readme: remove outdated SIMD info
Looks like the upstream brew Formula [0][] now has SIMD support, so
remove the extraneous info now that the custom tap is no longer needed
[1][].

[0]: https://github.com/Homebrew/homebrew-core/blob/master/Formula/ripgrep.rb
[1]: f3083e4574

PR #1431
2020-02-15 17:19:22 -05:00
Sorin Sbarnea
7cf21600cd
readme: document CentOS 8 support
The ripgrep install instructions are also valid for CentOS 7; the tool
works without problems there too.

PR #1428
2020-02-15 17:16:57 -05:00
Jonathan Mast
647b0d3977
ignore/types: add HAML and ERB
These are commonly used templating languages for Ruby. Add their
extensions to the file types list for convenient filtering.

PR #1407
2020-02-15 09:18:32 -05:00
Jeff S
e572fc1683
ignore/types: add slim, slime, and skim templates
PR #1391
2020-02-15 09:17:46 -05:00
Andrew Gallant
9cb93abd11
ignore: allow use of Error::description
We can remove it in the next semver incompatible release.
2020-02-10 06:44:21 -05:00
Luca Kredel
41695c66fa
ignore/types: add typoscript file type
Add the file types for TypoScript - the configuration language of the
TYPO3 CMS.

PR #1477
2020-02-07 08:41:00 -05:00
Andrew Gallant
cb0dfda936
faq: add section about donations
This is asked often enough that it's worth having a canonical answer.
2020-02-05 13:09:11 -05:00
Andrew Gallant
74d1fe59e9
deps: update everything 2020-01-30 18:33:40 -05:00
Andrew Gallant
9fd1e202e0
deps: update regex, regex-syntax and aho-corasick
Notably, this brings in a bug fix reported by @okdana:
https://github.com/rust-lang/regex/issues/640
2020-01-30 18:32:56 -05:00
Robert Irelan
e76807b1b5
ignore/types: add *.org_archive to org file type
.org_archive is the default extension for Org archive files, created when
entries from an Org-mode file are archived (see
<https://orgmode.org/org.html#Moving-subtrees>). These files are still in Org
mode format, so it's worth searching them at the same time as non-archive Org
mode files.

PR #1475
2020-01-29 13:59:34 -05:00
Andrew Gallant
f8fb65f7e3
globset: fix benchmarks
There were apparently a lot of unused things, including lazy_static.
2020-01-27 16:45:12 -05:00
Tristan Waddington
98de8d248a ignore/types: make 'gradle' its own type
This change maintains the existing behavior of the 'groovy' type, which
includes both .groovy and .gradle files.

PR #1470
2020-01-23 06:51:11 -05:00
Crestwave
c358700dfb readme: add instructions for Haiku x86_64 and x86_gcc2
PR #1465
2020-01-21 07:34:24 -05:00
Alex Touchet
8670a4a969 readme: update outdated links
PR #1463
2020-01-21 07:32:54 -05:00
Oliver Newman
e3b1f86908 doc: add missing "will" to the user guide
PR #1462
2020-01-20 17:26:08 -05:00
Jan Verbeek
46b07bb2ee ignore/types: fix postscript globs
The postscript globs were missing asterisks, so they were treated as
literal filenames.
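
To illustrate why a missing asterisk matters, here is a deliberately tiny matcher that understands only a leading `*`. It is a hypothetical sketch (real matching is done by the `globset` crate), and `.ps` is an assumed example of a broken glob, not necessarily the exact one that was fixed.

```rust
/// Hypothetical, heavily simplified glob matcher that understands only a
/// single leading `*`; real globs support far more syntax.
fn matches_file_name(glob: &str, file_name: &str) -> bool {
    match glob.strip_prefix('*') {
        // `*.ps` matches any file name ending in `.ps`.
        Some(suffix) => file_name.ends_with(suffix),
        // Without the `*`, the glob is a literal and must match exactly.
        None => file_name == glob,
    }
}
```

So `*.ps` matches `figure.ps`, while the asterisk-less `.ps` only matches a file literally named `.ps`.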

PR #1461
2020-01-20 07:48:57 -05:00
Andrew Gallant
8bdf84e3a8
deps: update everything 2020-01-16 19:47:23 -05:00
Andrew Gallant
5a6e17fcc1
deps: various updates
Most of these updates (sans thread_local) are from crates I maintain
that have seen updates recently.

Notably, this includes a bump to `termcolor 1.1.0` which includes
support for respecting `NO_COLOR`. This commit therefore means that
ripgrep now supports `NO_COLOR`.
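
A sketch of the kind of environment check involved, assuming `NO_COLOR` disables color whenever it is present, regardless of its value. `env_allows_color` and `colors_enabled` are hypothetical names, and the `TERM` handling is an assumption about typical terminal detection, not a description of `termcolor`'s exact internals.

```rust
use std::env;

/// Hypothetical check; the environment is passed in so it can be tested.
fn env_allows_color(term: Option<&str>, no_color_set: bool) -> bool {
    match term {
        // No TERM at all, or a "dumb" terminal: assume no color support.
        None | Some("dumb") => false,
        // Otherwise, the mere presence of NO_COLOR disables color.
        Some(_) => !no_color_set,
    }
}

/// Reads the real process environment. A non-UTF-8 TERM is treated as unset.
fn colors_enabled() -> bool {
    let term = env::var("TERM").ok();
    env_allows_color(term.as_deref(), env::var_os("NO_COLOR").is_some())
}
```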

As an added bonus, we drop a dependency on Windows. (Although the total
amount of code compiled remains the same.)

Closes #1186
2020-01-11 10:09:10 -05:00
Andrew Gallant
00bfcd14a6
ignore-0.4.11 2020-01-10 15:08:27 -05:00
Andrew Gallant
bf0ddc4675 ci: fix musl docker build
Looks like the old japaric images are bunk. We update our docker image
to be based on the new rustembedded images and configure cross to use
it.

Turns out that this wasn't due to a stale docker image, but rather, a
bug in cross: https://github.com/rust-embedded/cross/issues/357
We work around that bug by installing the master branch of cross. Sigh.
2020-01-10 15:07:47 -05:00
Andrew Gallant
0fb3f6a159 ci: disable github actions for now
The CI build failures are annoying and distracting. Hopefully soon I'll
be able to invest more time in the switch.
2020-01-10 15:07:47 -05:00
Andrew Gallant
837fb5e21f deps: update to crossbeam-channel 0.4
Closes #1427
2020-01-10 15:07:47 -05:00
Andrew Gallant
2e1815606e deps: update to bytecount 0.6
Looks like there aren't any major changes other than dependency updates.
2020-01-10 15:07:47 -05:00
Andrew Gallant
cb2f6ddc61 deps: update to thread_local 1.0
We also update the pcre2 and regex dependencies, which removes any other
lingering uses of thread_local 0.3.
2020-01-10 15:07:47 -05:00
Andrew Gallant
bd7a42602f deps: bump to base64 0.11 2020-01-10 15:07:47 -05:00
Andrew Gallant
528ce56e1b deps: run cargo update
The only new dependency is an unused, target-specific dependency, hermit,
pulled in via the atty crate.
2020-01-10 15:07:47 -05:00
Yevgen Antymyrov
8892bf648c doc: fix typo in FAQ 2019-09-25 08:13:27 -04:00
Jonathan Clem
8cb7271b64 ci: get GitHub Actions running again
Basically, `matrix.os` needs to be defined for every build. We
were commenting out some of the builds in order to debug
CI in the `include` section, but we also need to comment them
out in the `build` section.
2019-09-11 09:08:24 -04:00
Andrew Gallant
4858267f3b ci: initial github actions config 2019-08-31 09:24:44 -04:00
Andrew Gallant
5011dba2fd
ignore: remove unused parameter 2019-08-28 20:21:34 -04:00
Andrew Gallant
e14f9195e5
deps: update everything 2019-08-28 20:18:47 -04:00
Andrew Gallant
ef0e7af56a
deps: update bstr to 0.2.7
The new bstr release contains a small performance bug fix where some
trivial methods weren't being inlined.
2019-08-11 10:41:05 -04:00
Todd Walton
b266818aa5 doc: use XDG_CONFIG_HOME in comments
XDG_CONFIG_DIR does not actually exist.
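
For reference, a sketch of the usual `XDG_CONFIG_HOME` convention: use the variable when it is set and non-empty, otherwise fall back to `$HOME/.config`. The `config_home` helper is hypothetical and takes the variables as arguments so it can be tested without touching the process environment.

```rust
use std::path::PathBuf;

/// Hypothetical helper following the XDG Base Directory convention.
fn config_home(xdg_config_home: Option<&str>, home: Option<&str>) -> Option<PathBuf> {
    match xdg_config_home {
        // A set, non-empty XDG_CONFIG_HOME wins.
        Some(dir) if !dir.is_empty() => Some(PathBuf::from(dir)),
        // Unset or empty: fall back to $HOME/.config (if HOME is known).
        _ => home.map(|h| PathBuf::from(h).join(".config")),
    }
}
```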

PR #1347
2019-08-09 13:37:37 -04:00
LawAbidingCactus
81415ae52d doc: update to reflect glob matching behavior change
Specifically, paths containing a `/` are not allowed to match any
other slash in the path, even as a prefix. So `!.git` is the correct
incantation for ignoring a `.git` directory that occurs anywhere
in the path.
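
The rule can be approximated with a simplified model that handles only literal patterns. This is a hypothetical sketch, not `globset` or `ignore` code: wildcards, the `!` negation prefix, leading-`/` anchoring and directory-only patterns ending in `/` are all left out.

```rust
/// Hypothetical, simplified model of gitignore-style matching for literal
/// patterns. A pattern without `/` may match any single path component;
/// a pattern containing `/` must match the whole (relative) path.
fn literal_glob_matches(pattern: &str, path: &str) -> bool {
    if pattern.contains('/') {
        path == pattern
    } else {
        path.split('/').any(|component| component == pattern)
    }
}
```

Under this model `.git` matches `src/.git`, but `a/.git` does not match `x/a/.git`.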
2019-08-07 13:47:18 -04:00
Andrew Gallant
5c4584aa7c
grep-regex-0.1.5 2019-08-06 09:51:13 -04:00
Andrew Gallant
0972c6e7c7
grep-searcher-0.1.6 2019-08-06 09:50:52 -04:00
Andrew Gallant
0a372bf2e4
deps: update ignore 2019-08-06 09:50:35 -04:00
Andrew Gallant
345124a7fa
ignore-0.4.10 2019-08-06 09:47:45 -04:00
Andrew Gallant
31807f805a
deps: drop tempfile
We were only using it to create temporary directories for `ignore`
tests, but it pulls in a bunch of dependencies and we don't really need
randomness. So just use our own simple wrapper instead.
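
A sketch of what such a wrapper could look like, assuming uniqueness only needs to hold within a test run. Names combine the process ID with an atomic counter instead of random bytes. `TempDir` and the `ignore-test-` prefix are hypothetical, not the names used in the `ignore` crate.

```rust
use std::env;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};
use std::sync::atomic::{AtomicUsize, Ordering};

static COUNTER: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical temporary directory that is removed when dropped.
struct TempDir(PathBuf);

impl TempDir {
    fn new() -> io::Result<TempDir> {
        // Process ID + counter gives distinct names without randomness.
        let id = COUNTER.fetch_add(1, Ordering::SeqCst);
        let name = format!("ignore-test-{}-{}", std::process::id(), id);
        let path = env::temp_dir().join(name);
        // `create_dir` (not `create_dir_all`) fails if the name is taken,
        // so a stale directory is never silently reused.
        fs::create_dir(&path)?;
        Ok(TempDir(path))
    }

    fn path(&self) -> &Path {
        &self.0
    }
}

impl Drop for TempDir {
    fn drop(&mut self) {
        // Best-effort cleanup; `drop` cannot report errors.
        let _ = fs::remove_dir_all(&self.0);
    }
}
```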
2019-08-06 09:46:05 -04:00
Andrew Gallant
4de227fd9a
deps: update everything
Mostly this just updates regex and its assorted dependencies. This does
drop utf8-ranges and ucd-util, in accordance with changes to
regex-syntax and regex.
2019-08-05 13:50:55 -04:00
jimbo1qaz
d7ce274722 readme: Debian Buster is stable now
PR #1338
2019-08-04 08:06:10 -04:00
Andrew Gallant
5b10328f41
changelog: update with bug fix 2019-08-02 07:37:27 -04:00
Andrew Gallant
813c676eca
searcher: fix roll buffer bug
This commit fixes a subtle bug in how the line buffer was rolling its
contents. Specifically, when ripgrep searches without memory maps,
it uses a "roll" buffer for incremental line oriented search without
needing to read the entire file into memory at once. The roll buffer
works by reading a chunk of bytes from the file into memory, and then
searching everything in that buffer up to the last `\n` byte. The bytes
*after* the last `\n` byte are preserved, since they likely correspond
to *part* of the next line. Once ripgrep is done searching the buffer,
it "rolls" the buffer such that the start of the next line is at the
beginning of the buffer, and then ripgrep reads more data into the
buffer starting at the (possibly) partial end of that line.
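
The rolling strategy can be sketched with a simplified buffer. `RollBuffer` and its fields are hypothetical stand-ins for ripgrep's line buffer: `buf[pos..end]` holds bytes read but not yet consumed, and for simplicity `fill` returns an owned copy of the complete lines rather than a borrowed slice.

```rust
use std::io::{self, Read};

/// Hypothetical, simplified roll buffer (not grep-searcher's implementation).
struct RollBuffer {
    buf: Vec<u8>,
    pos: usize, // start of unconsumed bytes
    end: usize, // end of bytes read so far
}

impl RollBuffer {
    fn new(capacity: usize) -> RollBuffer {
        RollBuffer { buf: vec![0; capacity.max(1)], pos: 0, end: 0 }
    }

    /// Moves the unconsumed (possibly partial) line to the front.
    fn roll(&mut self) {
        self.buf.copy_within(self.pos..self.end, 0);
        self.end -= self.pos;
        self.pos = 0;
    }

    /// Reads more data and returns everything up to and including the last
    /// `\n`. At EOF, returns any trailing bytes without a final `\n`, then `None`.
    fn fill<R: Read>(&mut self, rdr: &mut R) -> io::Result<Option<Vec<u8>>> {
        self.roll();
        loop {
            if self.end == self.buf.len() {
                // A single line does not fit, so the buffer must grow.
                let new_len = self.buf.len() * 2;
                self.buf.resize(new_len, 0);
            }
            let n = rdr.read(&mut self.buf[self.end..])?;
            if n == 0 {
                if self.pos == self.end {
                    return Ok(None);
                }
                let rest = self.buf[self.pos..self.end].to_vec();
                self.pos = self.end;
                return Ok(Some(rest));
            }
            self.end += n;
            let pending = &self.buf[self.pos..self.end];
            if let Some(i) = pending.iter().rposition(|&b| b == b'\n') {
                let lines_end = self.pos + i + 1;
                let lines = self.buf[self.pos..lines_end].to_vec();
                self.pos = lines_end; // bytes after the last `\n` stay put
                return Ok(Some(lines));
            }
        }
    }
}
```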

The implication of this strategy, necessarily so, is that a buffer must
be big enough to fit a single line in memory. This is because the regex
engine needs a contiguous block of memory to search, so there is no way
to search anything smaller than a single line. So if a file contains a
single line with 7.5 million bytes, then the buffer will grow to be at
least that size. (Many files have super long lines like this, but they
tend to be *binary* files, which ripgrep will detect and stop searching
unless the user forces it with the `-a/--text` flag. So in practice,
they aren't usually a problem. However, in this case, #1335 found a case
where a plain text file had a line with 7.5 million bytes.)

Now, for performance reasons, ripgrep reuses these buffers across its
search. Typically, it will create `N` of these line buffers when it
starts (where `N` is the number of threads it is using), and then reuse
them without creating any new ones as it searches through files.

This means that if you search a file with a very long line, that buffer
will expand to be big enough to store that line. ripgrep never contracts
these buffers, so once it searches the next file, ripgrep will continue
to use this large buffer. While it might be prudent to contract these
buffers in some circumstances, this isn't otherwise inherently a
problem. The memory has already been allocated, and there isn't much
cost to using it, other than the fact that ripgrep hangs on to it and
never gives it back to the OS.

However, the `roll` implementation described above had a really
important bug in it that was impacted by the size of the buffer.
Specifically, it used the following to "roll" the partial line at the
end of the buffer to the beginning:

    self.buf.copy_within_str(self.pos.., 0);

This means that if the buffer is very large, ripgrep will copy
*everything* from `self.pos` (which might be very small, e.g., for small
files) to the end of the buffer, and move it to the beginning of the
buffer. This will happen repeatedly each time the buffer is used to
search small files, which winds up being quite a large slowdown if the
line was exceptionally large (say, megabytes).

It turns out that copying everything is completely unnecessary. We only
need to copy the remainder of the last read to the beginning of the
buffer. Everything *after* the last read in the buffer is just free
space that can be filled for the next read. So, all we need to do is
copy just those bytes:

    self.buf.copy_within_str(self.pos..self.end, 0);

... which is typically much, much smaller than the rest of the buffer.
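
To make the cost concrete, the following sketch performs both versions of the roll with std's `slice::copy_within` (standing in for bstr's `copy_within_str`) and reports how many bytes each one moves. The `roll` function and its `copy_whole_tail` flag are hypothetical.

```rust
/// Hypothetical comparison of the two roll strategies.
/// Returns `(new_end, bytes_copied)`.
fn roll(buf: &mut [u8], pos: usize, end: usize, copy_whole_tail: bool) -> (usize, usize) {
    let copied = if copy_whole_tail {
        // Old behavior: copy everything from `pos` to the end of the buffer,
        // even though only `pos..end` holds meaningful data.
        buf.copy_within(pos.., 0);
        buf.len() - pos
    } else {
        // Fixed behavior: copy only the unconsumed bytes of the last read.
        buf.copy_within(pos..end, 0);
        end - pos
    };
    (end - pos, copied)
}
```

With a 1 MiB buffer and a 10-byte partial line at `90..100`, both versions leave the same 10 bytes at the front, but the old one moves about a megabyte to do it while the fixed one moves 10 bytes.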

This was likely also causing small performance losses in other cases as
well. For example, when searching a lot of small files, ripgrep would
likely do a lot more copying than necessary. Although, given that the
default buffer size is 8KB, this extra copying was likely pretty small,
and was thus harder to observe.

Fixes #1335
2019-08-02 07:23:27 -04:00
Andrew Gallant
f625d72b6f
pkg: update brew tap to 11.0.2 2019-08-01 19:39:53 -04:00
Andrew Gallant
3de31f7527
ci: fix musl deployment
The docker image that the Linux binary is now built in does not have
AsciiDoc installed, so set up Cross to point to my own image with those
tools installed.
2019-08-01 18:41:44 -04:00