This fixes a bug where using \A or (?-m)^ in combination with
-U/--multiline would permit matches that aren't anchored to the
beginning of the file. The underlying cause was an optimization that
occurred when mmaps couldn't be used. Namely, ripgrep tries to still
read the input incrementally if it knows the pattern can't match through
a new line. But the detection logic was flawed, since it didn't account
for line anchors. This commit fixes that.
Fixes#1878, Fixes#1879
It turned out that --vimgrep wasn't quite getting the column of each
match correctly. Instead of printing column numbers relative to the
current line, it was printing column numbers as byte offsets relative to
where the match began. To fix this, we simply subtract the offset of the
line number from the beginning of the match. If the beginning of the
match came before the start of the current line, then there's really
nothing sensible we can do other than to use a column number of 1, which
we now document.
Interestingly, existing tests were checking that the previous behavior
was intended. My only defense is that I somehow tricked myself into
thinking it was a byte offset instead of a column number.
Kudos to @bfrg for calling this out in #1866:
https://github.com/BurntSushi/ripgrep/issues/1866#issuecomment-841635553
This message will emit the binary detection mechanism being used for
each file.
This does not noticeably increases the number of log messages, as the
'trace' level is already used for emitting messages for every file
searched.
This trace message was added in the course of investigating #1838.
Sadly, there were several tests that are coupled to the size of the
buffer used by ripgrep. Making the tests agnostic to the size is
difficult. And it's annoying to fix the tests. But we rarely change the
buffer size, so ¯\_(ツ)_/¯.
This increases the initial buffer size from 8KB to 64KB. This actually
leads to a reasonably noticeable improvement in at least one work-load,
and is unlikely to regress in any other case. Also, since Rust programs
(at least on Linux) seem to always use a minimum of 6-8MB of memory,
adding an extra 56KB is negligible.
Before:
$ hyperfine -i "rg 'zqzqzqzq' OpenSubtitles2018.raw.en --no-mmap"
Benchmark #1: rg 'zqzqzqzq' OpenSubtitles2018.raw.en --no-mmap
Time (mean ± σ): 2.109 s ± 0.012 s [User: 565.5 ms, System: 1541.6 ms]
Range (min … max): 2.094 s … 2.128 s 10 runs
After:
$ hyperfine -i "rg 'zqzqzqzq' OpenSubtitles2018.raw.en --no-mmap"
Benchmark #1: rg 'zqzqzqzq' OpenSubtitles2018.raw.en --no-mmap
Time (mean ± σ): 1.802 s ± 0.006 s [User: 462.3 ms, System: 1337.9 ms]
Range (min … max): 1.795 s … 1.814 s 10 runs
memmap is unmaintained at this point and it is being flagged as a
RUSTSEC advisory in ripgrep. This doesn't seem like that big of a deal
to me honestly, but memmap2 looks like a fine choice at this point.
Fixes#1785, Closes#1786
It seems that PowerShell uses sockets instead of FIFOs to redirect the
output between commands. So add `is_socket` to our `is_readable_stdin`
check.
This seems unlikely to cause problems and it probably more generally
correct than what we had before. In theory, it could cause problems if
it produces false positives, in which case, ripgrep will try to read
stdin when it should search the current working directory. (And this
usually winds up manifesting as ripgrep blocking forever.) But, if the
stdin handle reports itself as a socket, then it seems like we should
read it.
Fixes#1741, Closes#1742
`ignore::Error` wraps `std::io::Error` with additional information
(as well as expose non-IO errors). For people wanting to inspect what
the error is, they have to recursively match the Enum. This provides
`io_error` and `into_io_error` helpers to do this for the user.
PR #1740
This updates encoding_rs, crossbeam-utils and crossbeam-channel. This
serves two purposes. The encoding_rs update fixes a compilation failure
on the latest nightly. The crossbeam updates are good sense and to
reduce duplicate dependencies such as cfg-if. (Although, we note that
the log crate still pulls in cfg-if 0.1, so ripgrep has a duplicate
dependency there for now. But it's very small.)
Fixes#1721, Closes#1705
Bazel supports `BUILD.bazel` as well as `WORKSPACE.bazel`. In
addition, it is common to ship BUILD/WORKSPACE templates for
external repositories suffixed with .bazel for easier tool
recognition.
Co-authored-by: Brandon Adams <brandon.adams@imc.com>
PR #1716
Since the translation from a glob to a regex always
disables Unicode in the regex, it follows that we shouldn't
need regex's Unicode features enabled.
Now, ripgrep enables Unicode features in its regex
dependency and of course uses them, which will cause
globset to have it enabled in the ripgrep build as well. So
this doesn't actually change anything for ripgrep. But this
does slim thing downs for folks using globset independently
of ripgrep.
PR #1712
We use '+++' syntax to output a literal '**' for a '--glob' example.
This '+++' syntax is pretty ugly when rendered literally via --help. We
fix this by hackily inserting the '+++' syntax for its one specific case
that we need it during man page generation.
Not ideal but it works. And --help still has some '*foo*' markup, but we
live with that for now.
Fixes#1581