It looks like no new dependencies have been introduced. Yay!
This update was primarily motivated to bring regex 1.5 in with its new
memmem implementation from the memchr crate.
memmap is unmaintained at this point and it is being flagged as a
RUSTSEC advisory in ripgrep. This doesn't seem like that big of a deal
to me honestly, but memmap2 looks like a fine choice at this point.
Fixes#1785, Closes#1786
This brings in all other semver updates.
This did require updating some tests, since bstr changed its debug
output for NUL bytes to be a bit more idiomatic.
This updates encoding_rs, crossbeam-utils and crossbeam-channel. This
serves two purposes. The encoding_rs update fixes a compilation failure
on the latest nightly. The crossbeam updates are good sense and to
reduce duplicate dependencies such as cfg-if. (Although, we note that
the log crate still pulls in cfg-if 0.1, so ripgrep has a duplicate
dependency there for now. But it's very small.)
Fixes#1721, Closes#1705
This should bring a compilation time improvement when building static
buils of PCRE2 by enabling parallelism for C compilation.
Kudos to @JoshTriplett for the tip!
This adds one new dependency, maybe-uninit, which is brought in by
crossbeam-channel[1]. This is to apparently fix some unsound code
without bumping the MSRV. Since ripgrep uses the latest stable release
of Rust, the maybe-uninit crate should compile down to nothing and just
re-export std's `MaybeUninit` type.
[1] - https://github.com/crossbeam-rs/crossbeam/pull/458
The transient failures appear to be persisting and they are quite
difficult to debug. So include a full directory listing in the output of
every test failure.
This makes it so the caller can more easily refactor from
single-threaded to multi-threaded walking. If they want to support both,
this makes it easier to do so with a single initialization code-path. In
particular, it side-steps the need to put everything into an `Arc`.
This is not a breaking change because it strictly increases the number
of allowed inputs to `WalkParallel::run`.
Closes#1410, Closes#1432
Previously, ripgrep would always defer to the regex engine's capturing
matches in order to implement word matching. Namely, ripgrep would
determine the correct match offsets via a capturing group, since the
word regex is itself generated from the user supplied regex.
Unfortunately, the regex engine's capturing mode is still fairly slow,
so this commit adds a fast path to avoid capturing mode in the vast
majority of cases. See comments in the code for details.
`benches/bench.rs` uses lazy_static but Cargo.toml does not declare a
dependency on it. This causes rustc to use its own internal private
copy instead. Sometimes this causes unintuitive errors like this Debian
bug: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=942243
The underlying issue is https://github.com/rust-lang/rust#27812 but it
can be avoided by explicitly declaring the dependency, which you are
supposed to do anyways.
Closes#1435
Most of these updates (sans thread_local) are from crates I maintain
that have seen updates recently.
Notably, this includes a bump to `termcolor 1.1.0` which includes
support for respecting `NO_COLOR`. This commit therefore means that
ripgrep now supports `NO_COLOR`.
As an added bonus, we drop a dependency on Windows. (Although the total
amount of code compiled remains the same.)
Closes#1186
We were only using it to create temporary directories for `ignore`
tests, but it pulls in a bunch of dependencies and we don't really need
randomness. So just use our own simple wrapper instead.
Mostly this just updates regex and its assorted dependencies. This does
drop utf8-ranges and ucd-util, in accordance with changes to
regex-syntax and regex.
This brings in a bug fix that no longer tries to run `git` to update the
submodule if the `git` command doesn't exist.
This is useful is more restricted build contexts where `git` isn't
installed. Such as in the docker image used for running `cross`.
It turns out that musl's allocator is slow enough to cause a fairly
noticeable performance regression when ripgrep is built as a static
binary with musl. We fix this by using jemalloc when building with musl.
We continue to use the default system allocator in all other scenarios.
Namely, glibc's allocator doesn't noticeably regress performance compared
to jemalloc. But we could add more targets to this logic if other
system allocators (macOS, Windows) prove to be slow.
This wasn't necessary before because rustc recently stopped using jemalloc
by default.
Fixes#1268
This makes the case of searching for a dictionary of a very large number
of literals much much faster. (~10x or so.) In particular, we achieve this
by short-circuiting the construction of a full regex when we know we have
a simple alternation of literals. Building the regex for a large dictionary
(>100,000 literals) turns out to be quite slow, even if it internally will
dispatch to Aho-Corasick.
Even that isn't quite enough. It turns out that even *parsing* such a regex
is quite slow. So when the -F/--fixed-strings flag is set, we short
circuit regex parsing completely and jump straight to Aho-Corasick.
We aren't quite as fast as GNU grep here, but it's much closer (less than
2x slower).
In general, this is somewhat of a hack. In particular, it seems plausible
that this optimization could be implemented entirely in the regex engine.
Unfortunately, the regex engine's internals are just not amenable to this
at all, so it would require a larger refactoring effort. For now, it's
good enough to add this fairly simple hack at a higher level.
Unfortunately, if you don't pass -F/--fixed-strings, then ripgrep will
be slower, because of the aforementioned missing optimization. Moreover,
passing flags like `-i` or `-S` will cause ripgrep to abandon this
optimization and fall back to something potentially much slower. Again,
this fix really needs to happen inside the regex engine, although we
might be able to special case -i when the input literals are pure ASCII
via Aho-Corasick's `ascii_case_insensitive`.
Fixes#497, Fixes#838
This commit adds a new encoding feature where the -E/--encoding flag
will now accept a value of 'none'. When given this value, all encoding
related machinery is disabled and ripgrep will search the raw bytes of
the file, including the BOM if it's present.
Closes#1207, Closes#1208