Sadly, there were several tests that are coupled to the size of the
buffer used by ripgrep. Making the tests agnostic to the size is
difficult. And it's annoying to fix the tests. But we rarely change the
buffer size, so ¯\_(ツ)_/¯.
This increases the initial buffer size from 8KB to 64KB. This actually
leads to a reasonably noticeable improvement in at least one work-load,
and is unlikely to regress in any other case. Also, since Rust programs
(at least on Linux) seem to always use a minimum of 6-8MB of memory,
adding an extra 56KB is negligible.
Before:
$ hyperfine -i "rg 'zqzqzqzq' OpenSubtitles2018.raw.en --no-mmap"
Benchmark #1: rg 'zqzqzqzq' OpenSubtitles2018.raw.en --no-mmap
Time (mean ± σ): 2.109 s ± 0.012 s [User: 565.5 ms, System: 1541.6 ms]
Range (min … max): 2.094 s … 2.128 s 10 runs
After:
$ hyperfine -i "rg 'zqzqzqzq' OpenSubtitles2018.raw.en --no-mmap"
Benchmark #1: rg 'zqzqzqzq' OpenSubtitles2018.raw.en --no-mmap
Time (mean ± σ): 1.802 s ± 0.006 s [User: 462.3 ms, System: 1337.9 ms]
Range (min … max): 1.795 s … 1.814 s 10 runs
memmap is unmaintained at this point and it is being flagged as a
RUSTSEC advisory in ripgrep. This doesn't seem like that big of a deal
to me honestly, but memmap2 looks like a fine choice at this point.
Fixes#1785, Closes#1786
It seems that PowerShell uses sockets instead of FIFOs to redirect the
output between commands. So add `is_socket` to our `is_readable_stdin`
check.
This seems unlikely to cause problems and it probably more generally
correct than what we had before. In theory, it could cause problems if
it produces false positives, in which case, ripgrep will try to read
stdin when it should search the current working directory. (And this
usually winds up manifesting as ripgrep blocking forever.) But, if the
stdin handle reports itself as a socket, then it seems like we should
read it.
Fixes#1741, Closes#1742
`ignore::Error` wraps `std::io::Error` with additional information
(as well as expose non-IO errors). For people wanting to inspect what
the error is, they have to recursively match the Enum. This provides
`io_error` and `into_io_error` helpers to do this for the user.
PR #1740
This updates encoding_rs, crossbeam-utils and crossbeam-channel. This
serves two purposes. The encoding_rs update fixes a compilation failure
on the latest nightly. The crossbeam updates are good sense and to
reduce duplicate dependencies such as cfg-if. (Although, we note that
the log crate still pulls in cfg-if 0.1, so ripgrep has a duplicate
dependency there for now. But it's very small.)
Fixes#1721, Closes#1705
Bazel supports `BUILD.bazel` as well as `WORKSPACE.bazel`. In
addition, it is common to ship BUILD/WORKSPACE templates for
external repositories suffixed with .bazel for easier tool
recognition.
Co-authored-by: Brandon Adams <brandon.adams@imc.com>
PR #1716
Since the translation from a glob to a regex always
disables Unicode in the regex, it follows that we shouldn't
need regex's Unicode features enabled.
Now, ripgrep enables Unicode features in its regex
dependency and of course uses them, which will cause
globset to have it enabled in the ripgrep build as well. So
this doesn't actually change anything for ripgrep. But this
does slim thing downs for folks using globset independently
of ripgrep.
PR #1712
We use '+++' syntax to output a literal '**' for a '--glob' example.
This '+++' syntax is pretty ugly when rendered literally via --help. We
fix this by hackily inserting the '+++' syntax for its one specific case
that we need it during man page generation.
Not ideal but it works. And --help still has some '*foo*' markup, but we
live with that for now.
Fixes#1581
Adds `WalkBuilder::filter_entry` that takes a predicate to be applied to
all entries. If the predicate returns `false` on a given entry, that
entry and all children will be skipped.
Fixes#1555, Closes#1557
While Linux distributions (at least Arch Linux, RHEL, Debian) do not support
compressing files with compress(1), macOS & AIX do (the utility is part of
POSIX). Additionally, gzip is able to uncompress such compressed files and
provides an `uncompress` binary.
Closes#1547
It has grown quite long. It would be nice if we could shorten this only
when -h is used and keep it long for --help, but it seems clap doesn't
let this happen. (It does have `about` and `long_about` options, but
they don't work, even when I disable the use of the template.)
The longer prelude is now only available in the man page.
This addresses #189.
When a pattern with invalid UTF-8 is given, the error message suggests
unqualified use of hex escape sequences to match arbitrary bytes. But
you *also* need to disable Unicode mode. So include that in the error
message.
Fixes#1339
In order to implement --count-matches, we simply re-execute the regex on
the spans reported by the searcher. The spans always correspond to the
lines that participated in the match. This is the correct thing to do,
except when the regex contains look-ahead (or look-behind).
In particular, the look-around permits the regex's match success to
depends on an arbitrary point before or after the lines actually
reported as participating in the match. Since only the matched lines are
reported to the printer, it is possible for subsequent searching on
those lines to fail.
A true fix for this would somehow make the total span available to the
printer. But that seems tricky since it isn't always available. For
PCRE2's case in multiline mode, it is available because we force it to
be so for correctness.
For now, we simply detect this corner case heuristically. If the match
count is zero, then it necessarily means there is some kind of
look-around that isn't matching. So we set the match count to 1. This is
probably incorrect in some cases, although my brain can't quite come up
with a concrete example. Nevertheless, this is strictly better than the
status quo.
Fixes#1573