mirror of
https://github.com/BurntSushi/ripgrep.git
synced 2024-12-02 02:56:32 +02:00
binary: rejigger ripgrep's handling of binary files
This commit attempts to surface binary filtering in a slightly more user-friendly way. Namely, before, ripgrep would silently stop searching a file if it detected a NUL byte, even if it had previously printed a match. This can lead to the user quite reasonably assuming that there are no more matches, since a partial search is fairly unintuitive. (ripgrep has this behavior by default because it really wants to NOT search binary files at all, just like it doesn't search gitignored or hidden files.)

With this commit, if a match has already been printed and ripgrep detects a NUL byte, then it will print a warning message indicating that the search stopped prematurely.

Moreover, this commit adds a new flag, --binary, which causes ripgrep to stop filtering binary files, but in a way that still avoids dumping binary data into terminals. That is, the --binary flag makes ripgrep behave more like grep's default behavior.

For files explicitly specified in a search, e.g., `rg foo some-file`, no binary filtering is applied (just like no gitignore and no hidden file filtering is applied). Instead, ripgrep behaves as if you gave the --binary flag for all explicitly given files.

This was a fairly invasive change, and potentially increases the UX complexity of ripgrep around binary files. (Before, there were two binary modes, whereas now there are three.) However, ripgrep is now a bit louder with warning messages when binary file detection might otherwise be hiding potential matches, so hopefully this is a net improvement.

Finally, the `-uuu` convenience now maps to `--no-ignore --hidden --binary`, since this is closer to the actual intent of the `--unrestricted` flag, i.e., to reduce ripgrep's smart filtering. As a consequence, `rg -uuu foo` should now search roughly the same number of bytes as `grep -r foo`, and `rg -uuua foo` should search roughly the same number of bytes as `grep -ra foo`. (The "roughly" weasel word is used because grep's and ripgrep's binary file detection might differ somewhat, perhaps based on buffer sizes, which can impact exactly what is and isn't searched.)

See the numerous tests in tests/binary.rs for intended behavior.

Fixes #306, Fixes #855
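The mode selection described in the commit message can be sketched as follows. This is a minimal, standalone illustration and not ripgrep's actual code: the `BinaryMode` enum and the `choose_mode` helper are hypothetical names, and the sketch assumes the `--text`/`--binary` flags have already been resolved so that at most one of them is set.

```rust
/// The three binary modes described in the commit message.
/// (Hypothetical type; the names are chosen for this illustration only.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BinaryMode {
    /// Default filtering: stop searching a file once a NUL byte is seen.
    Filter,
    /// `--binary`: keep searching past NUL bytes, but never print binary data.
    Binary,
    /// `-a/--text`: no binary detection at all.
    Text,
}

/// Pick a mode from the resolved flags and from whether the file was named
/// explicitly on the command line (as in `rg foo some-file`).
fn choose_mode(text: bool, binary: bool, explicit: bool) -> BinaryMode {
    if text {
        BinaryMode::Text
    } else if binary || explicit {
        // Explicitly given files behave as if `--binary` had been passed.
        BinaryMode::Binary
    } else {
        BinaryMode::Filter
    }
}

fn main() {
    // Recursive directory traversal without flags: binary files are filtered.
    println!("{:?}", choose_mode(false, false, false));
    // A file named on the command line is never filtered.
    println!("{:?}", choose_mode(false, false, true));
    // `-a/--text` disables binary detection entirely.
    println!("{:?}", choose_mode(true, false, false));
}
```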
parent
bd222ae93f
commit
a7d26c8f14
10
CHANGELOG.md
@@ -11,6 +11,11 @@ TODO.
   error (e.g., regex syntax error). One exception to this is if ripgrep is run
   with `-q/--quiet`. In that case, if an error occurs and a match is found,
   then ripgrep will exit with a `0` exit status code.
+* Supplying the `-u/--unrestricted` flag three times is now equivalent to
+  supplying `--no-ignore --hidden --binary`. Previously, `-uuu` was equivalent
+  to `--no-ignore --hidden --text`. The difference is that `--binary` disables
+  binary file filtering without potentially dumping binary data into your
+  terminal. That is, `rg -uuu foo` should now be equivalent to `grep -r foo`.
 * The `avx-accel` feature of ripgrep has been removed since it is no longer
   necessary. All uses of AVX in ripgrep are now enabled automatically via
   runtime CPU feature detection. The `simd-accel` feature does remain
@@ -25,6 +30,8 @@ Performance improvements:

 Feature enhancements:

+* [FEATURE #855](https://github.com/BurntSushi/ripgrep/issues/855):
+  Add `--binary` flag for disabling binary file filtering.
 * [FEATURE #1099](https://github.com/BurntSushi/ripgrep/pull/1099):
   Add support for Brotli and Zstd to the `-z/--search-zip` flag.
 * [FEATURE #1138](https://github.com/BurntSushi/ripgrep/pull/1138):
@@ -36,6 +43,9 @@ Feature enhancements:

 Bug fixes:

+* [BUG #306](https://github.com/BurntSushi/ripgrep/issues/306),
+  [BUG #855](https://github.com/BurntSushi/ripgrep/issues/855):
+  Improve the user experience for ripgrep's binary file filtering.
 * [BUG #373](https://github.com/BurntSushi/ripgrep/issues/373),
   [BUG #1098](https://github.com/BurntSushi/ripgrep/issues/1098):
   `**` is now accepted as valid syntax anywhere in a glob.
71
GUIDE.md
@@ -18,6 +18,7 @@ translatable to any command line shell environment.
 * [Replacements](#replacements)
 * [Configuration file](#configuration-file)
 * [File encoding](#file-encoding)
+* [Binary data](#binary-data)
 * [Common options](#common-options)


@@ -680,6 +681,76 @@ $ rg '\w(?-u:\w)\w'
 ```


+### Binary data
+
+In addition to skipping hidden files and files in your `.gitignore` by default,
+ripgrep also attempts to skip binary files. ripgrep does this by default
+because binary files (like PDFs or images) are typically not things you want to
+search when searching for regex matches. Moreover, if content in a binary file
+did match, then it's possible for undesirable binary data to be printed to your
+terminal and wreak havoc.
+
+Unfortunately, unlike skipping hidden files and respecting your `.gitignore`
+rules, a file cannot as easily be classified as binary. In order to figure out
+whether a file is binary, the most effective heuristic that balances
+correctness with performance is to simply look for `NUL` bytes. At that point,
+the determination is simple: a file is considered "binary" if and only if it
+contains a `NUL` byte somewhere in its contents.
+
+The issue is that while most binary files will have a `NUL` byte toward the
+beginning of their contents, this is not necessarily true. The `NUL` byte might
+be the very last byte in a large file, but that file is still considered
+binary. While this leads to a fair amount of complexity inside ripgrep's
+implementation, it also results in some unintuitive user experiences.
+
+At a high level, ripgrep operates in three different modes with respect to
+binary files:
+
+1. The default mode is to attempt to remove binary files from a search
+   completely. This is meant to mirror how ripgrep removes hidden files and
+   files in your `.gitignore` automatically. That is, as soon as a file is
+   detected as binary, searching stops. If a match was already printed (because
+   it was detected long before a `NUL` byte), then ripgrep will print a warning
+   message indicating that the search stopped prematurely. This default mode
+   **only applies to files searched by ripgrep as a result of recursive
+   directory traversal**, which is consistent with ripgrep's other automatic
+   filtering. For example, `rg foo .file` will search `.file` even though it
+   is hidden. Similarly, `rg foo binary-file` searches `binary-file` in "binary"
+   mode automatically.
+2. Binary mode is similar to the default mode, except it will not always
+   stop searching after it sees a `NUL` byte. Namely, in this mode, ripgrep
+   will continue searching a file that is known to be binary until the first
+   of two conditions is met: 1) the end of the file has been reached or 2) a
+   match is or has been seen. This means that in binary mode, if ripgrep
+   reports no matches, then there are no matches in the file. When a match does
+   occur, ripgrep prints a message similar to one it prints when in its default
+   mode indicating that the search has stopped prematurely. This mode can be
+   forcefully enabled for all files with the `--binary` flag. The purpose of
+   binary mode is to provide a way to discover matches in all files, but to
+   avoid having binary data dumped into your terminal.
+3. Text mode completely disables all binary detection and searches all files
+   as if they were text. This is useful when searching a file that is
+   predominantly text but contains a `NUL` byte, or if you are specifically
+   trying to search binary data. This mode can be enabled with the `-a/--text`
+   flag. Note that when using this mode on very large binary files, it is
+   possible for ripgrep to use a lot of memory.
+
+Unfortunately, there is one additional complexity in ripgrep that can make it
+difficult to reason about binary files. That is, the way binary detection works
+depends on the way that ripgrep searches your files. Specifically:
+
+* When ripgrep uses memory maps, then binary detection is only performed on the
+  first few kilobytes of the file in addition to every matching line.
+* When ripgrep doesn't use memory maps, then binary detection is performed on
+  all bytes searched.
+
+This means that whether a file is detected as binary or not can change based
+on the internal search strategy used by ripgrep. If you prefer to keep
+ripgrep's binary file detection consistent, then you can disable memory maps
+via the `--no-mmap` flag. (The cost will be a small performance regression when
+searching very large files on some platforms.)
+
+
 ### Common options

 ripgrep has a lot of flags. Too many to keep in your head at once. This section
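The difference between the two detection strategies described in the guide text above can be illustrated with a small standalone sketch. It is not ripgrep's implementation: the function names are hypothetical, the matcher is a plain substring search, and `PREFIX_LEN` is an assumption (the guide only says "first few kilobytes").

```rust
/// Assumed size of the inspected prefix; the guide does not give a number.
const PREFIX_LEN: usize = 8 * 1024;

fn has_nul(bytes: &[u8]) -> bool {
    bytes.contains(&0)
}

/// Plain substring test standing in for the regex matcher.
fn contains(haystack: &[u8], needle: &[u8]) -> bool {
    !needle.is_empty() && haystack.windows(needle.len()).any(|w| w == needle)
}

/// Without memory maps: every searched byte is inspected.
fn is_binary_buffered(contents: &[u8]) -> bool {
    has_nul(contents)
}

/// With memory maps: only the prefix and the matching lines are inspected.
fn is_binary_mmap(contents: &[u8], needle: &[u8]) -> bool {
    let prefix = &contents[..contents.len().min(PREFIX_LEN)];
    if has_nul(prefix) {
        return true;
    }
    contents
        .split(|&b| b == b'\n')
        .filter(|line| contains(line, needle))
        .any(has_nul)
}

/// A mostly-text file whose only NUL byte is its very last byte.
fn sample_file() -> Vec<u8> {
    let mut file = b"foo on the first line\n".to_vec();
    file.extend(std::iter::repeat(b'x').take(16 * 1024));
    file.extend_from_slice(b"\n\0");
    file
}

fn main() {
    let file = sample_file();
    println!("buffered: binary = {}", is_binary_buffered(&file));
    println!("mmap:     binary = {}", is_binary_mmap(&file, b"foo"));
}
```

In this sketch the same file is classified differently by the two strategies, which is the inconsistency that `--no-mmap` is meant to avoid.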
@@ -227,6 +227,8 @@ _rg() {

     + '(text)' # Binary-search options
     {-a,--text}'[search binary files as if they were text]'
+    "--binary[search binary files, don't print binary data]"
+    $no"--no-binary[don't search binary files]"
     $no"(--null-data)--no-text[don't search binary files as if they were text]"

     + '(threads)' # Thread-count options
@@ -41,6 +41,9 @@ configuration file. The file can specify one shell argument per line. Lines
 starting with *#* are ignored. For more details, see the man page or the
 *README*.

+Tip: to disable all smart filtering and make ripgrep behave a bit more like
+classical grep, use *rg -uuu*.
+

 REGEX SYNTAX
 ------------
@@ -5,6 +5,7 @@ use std::path::Path;
 use std::sync::Arc;
 use std::time::Instant;

+use bstr::BStr;
 use grep_matcher::{Match, Matcher};
 use grep_searcher::{
     LineStep, Searcher,
@@ -743,6 +744,11 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for StandardSink<'p, 's, M, W> {
             stats.add_matches(self.standard.matches.len() as u64);
             stats.add_matched_lines(mat.lines().count() as u64);
         }
+        if searcher.binary_detection().convert_byte().is_some() {
+            if self.binary_byte_offset.is_some() {
+                return Ok(false);
+            }
+        }

         StandardImpl::from_match(searcher, self, mat).sink()?;
         Ok(!self.should_quit())
@@ -764,6 +770,12 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for StandardSink<'p, 's, M, W> {
             self.record_matches(ctx.bytes())?;
             self.replace(ctx.bytes())?;
         }
+        if searcher.binary_detection().convert_byte().is_some() {
+            if self.binary_byte_offset.is_some() {
+                return Ok(false);
+            }
+        }
+
         StandardImpl::from_context(searcher, self, ctx).sink()?;
         Ok(!self.should_quit())
     }
@@ -776,6 +788,15 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for StandardSink<'p, 's, M, W> {
         Ok(true)
     }

+    fn binary_data(
+        &mut self,
+        _searcher: &Searcher,
+        binary_byte_offset: u64,
+    ) -> Result<bool, io::Error> {
+        self.binary_byte_offset = Some(binary_byte_offset);
+        Ok(true)
+    }
+
     fn begin(
         &mut self,
         _searcher: &Searcher,
@@ -793,10 +814,12 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for StandardSink<'p, 's, M, W> {

     fn finish(
         &mut self,
-        _searcher: &Searcher,
+        searcher: &Searcher,
         finish: &SinkFinish,
     ) -> Result<(), io::Error> {
-        self.binary_byte_offset = finish.binary_byte_offset();
+        if let Some(offset) = self.binary_byte_offset {
+            StandardImpl::new(searcher, self).write_binary_message(offset)?;
+        }
         if let Some(stats) = self.stats.as_mut() {
             stats.add_elapsed(self.start_time.elapsed());
             stats.add_searches(1);
@@ -1314,6 +1337,38 @@ impl<'a, M: Matcher, W: WriteColor> StandardImpl<'a, M, W> {
         Ok(())
     }

+    fn write_binary_message(&self, offset: u64) -> io::Result<()> {
+        if self.sink.match_count == 0 {
+            return Ok(());
+        }
+
+        let bin = self.searcher.binary_detection();
+        if let Some(byte) = bin.quit_byte() {
+            self.write(b"WARNING: stopped searching binary file ")?;
+            if let Some(path) = self.path() {
+                self.write_spec(self.config().colors.path(), path.as_bytes())?;
+                self.write(b" ")?;
+            }
+            let remainder = format!(
+                "after match (found {:?} byte around offset {})\n",
+                BStr::new(&[byte]), offset,
+            );
+            self.write(remainder.as_bytes())?;
+        } else if let Some(byte) = bin.convert_byte() {
+            self.write(b"Binary file ")?;
+            if let Some(path) = self.path() {
+                self.write_spec(self.config().colors.path(), path.as_bytes())?;
+                self.write(b" ")?;
+            }
+            let remainder = format!(
+                "matches (found {:?} byte around offset {})\n",
+                BStr::new(&[byte]), offset,
+            );
+            self.write(remainder.as_bytes())?;
+        }
+        Ok(())
+    }
+
     fn write_context_separator(&self) -> io::Result<()> {
         if let Some(ref sep) = *self.config().separator_context {
             self.write(sep)?;
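To make the two message shapes produced by `write_binary_message` easier to see, here is a standalone sketch that builds them as plain strings. It is an approximation under stated assumptions: `Detection` and `binary_message` are hypothetical names, colors are omitted, and the byte is rendered with `std::ascii::escape_default` rather than with `BStr`'s `Debug` output, so the exact escape shown for the byte may differ from what ripgrep prints.

```rust
use std::ascii::escape_default;

/// Stand-in for the searcher's binary detection setting (hypothetical type).
#[derive(Clone, Copy)]
enum Detection {
    Quit(u8),
    Convert(u8),
}

/// Build the message printed after a search that hit binary data.
/// Returns `None` when nothing matched, mirroring the early return above.
fn binary_message(
    det: Detection,
    path: Option<&str>,
    offset: u64,
    match_count: u64,
) -> Option<String> {
    if match_count == 0 {
        return None;
    }
    // The path, when present, is followed by a single space.
    let path = path.map(|p| format!("{} ", p)).unwrap_or_default();
    // Approximation of the `{:?}` rendering of a one-byte string.
    let esc = |b: u8| escape_default(b).map(char::from).collect::<String>();
    Some(match det {
        Detection::Quit(b) => format!(
            "WARNING: stopped searching binary file {}after match \
             (found \"{}\" byte around offset {})\n",
            path, esc(b), offset,
        ),
        Detection::Convert(b) => format!(
            "Binary file {}matches (found \"{}\" byte around offset {})\n",
            path, esc(b), offset,
        ),
    })
}

fn main() {
    let quit = binary_message(Detection::Quit(0), Some("a.bin"), 7, 1);
    let conv = binary_message(Detection::Convert(0), Some("a.bin"), 7, 1);
    print!("{}{}", quit.unwrap(), conv.unwrap());
}
```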
@@ -636,6 +636,34 @@ impl<'p, 's, M: Matcher, W: WriteColor> Sink for SummarySink<'p, 's, M, W> {
             stats.add_bytes_searched(finish.byte_count());
             stats.add_bytes_printed(self.summary.wtr.borrow().count());
         }
+        // If our binary detection method says to quit after seeing binary
+        // data, then we shouldn't print any results at all, even if we've
+        // found a match before detecting binary data. The intent here is to
+        // keep BinaryDetection::quit as a form of filter. Otherwise, we can
+        // present a matching file with a smaller number of matches than
+        // there might be, which can be quite misleading.
+        //
+        // If our binary detection method is to convert binary data, then we
+        // don't quit and therefore search the entire contents of the file.
+        //
+        // There is an unfortunate inconsistency here. Namely, when using
+        // Quiet or PathWithMatch, then the printer can quit after the first
+        // match seen, which could be long before seeing binary data. This
+        // means that using PathWithMatch can print a path whereas using
+        // Count might not print it at all because of binary data.
+        //
+        // It's not possible to fix this without also potentially significantly
+        // impacting the performance of Quiet or PathWithMatch, so we accept
+        // the bug.
+        if self.binary_byte_offset.is_some()
+            && searcher.binary_detection().quit_byte().is_some()
+        {
+            // Squash the match count. The statistics reported will still
+            // contain the match count, but the "official" match count should
+            // be zero.
+            self.match_count = 0;
+            return Ok(());
+        }

         let show_count =
             !self.summary.config.exclude_zero
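The squashing rule in the summary sink comes down to a small decision, sketched here in isolation. `official_match_count` is a hypothetical helper written for this illustration; the real code mutates `self.match_count` and returns early instead.

```rust
/// Match count a summary printer would report for one file.
///
/// `binary_offset` is the offset of the first binary byte, if any was seen,
/// and `quits_on_binary` says whether the "quit" strategy is configured.
fn official_match_count(
    matches: u64,
    binary_offset: Option<u64>,
    quits_on_binary: bool,
) -> u64 {
    if binary_offset.is_some() && quits_on_binary {
        // The file is treated as filtered out, so earlier matches are hidden.
        0
    } else {
        matches
    }
}

fn main() {
    // Three matches, then a NUL byte at offset 4096, default filtering.
    println!("{}", official_match_count(3, Some(4096), true));
    // Same file with the "convert" strategy (as with `--binary`).
    println!("{}", official_match_count(3, Some(4096), false));
}
```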
@@ -317,6 +317,14 @@ pub struct LineBuffer {
 }

 impl LineBuffer {
+    /// Set the binary detection method used on this line buffer.
+    ///
+    /// This permits dynamically changing the binary detection strategy on
+    /// an existing line buffer without needing to create a new one.
+    pub fn set_binary_detection(&mut self, binary: BinaryDetection) {
+        self.config.binary = binary;
+    }
+
     /// Reset this buffer, such that it can be used with a new reader.
     fn clear(&mut self) {
         self.pos = 0;
@@ -90,6 +90,13 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
         self.sink_matched(buf, range)
     }

+    pub fn binary_data(
+        &mut self,
+        binary_byte_offset: u64,
+    ) -> Result<bool, S::Error> {
+        self.sink.binary_data(&self.searcher, binary_byte_offset)
+    }
+
     pub fn begin(&mut self) -> Result<bool, S::Error> {
         self.sink.begin(&self.searcher)
     }
@@ -141,19 +148,28 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
         consumed
     }

-    pub fn detect_binary(&mut self, buf: &[u8], range: &Range) -> bool {
+    pub fn detect_binary(
+        &mut self,
+        buf: &[u8],
+        range: &Range,
+    ) -> Result<bool, S::Error> {
         if self.binary_byte_offset.is_some() {
-            return true;
+            return Ok(self.config.binary.quit_byte().is_some());
         }
         let binary_byte = match self.config.binary.0 {
             BinaryDetection::Quit(b) => b,
-            _ => return false,
+            BinaryDetection::Convert(b) => b,
+            _ => return Ok(false),
         };
         if let Some(i) = B(&buf[*range]).find_byte(binary_byte) {
-            self.binary_byte_offset = Some(range.start() + i);
-            true
+            let offset = range.start() + i;
+            self.binary_byte_offset = Some(offset);
+            if !self.binary_data(offset as u64)? {
+                return Ok(true);
+            }
+            Ok(self.config.binary.quit_byte().is_some())
         } else {
-            false
+            Ok(false)
         }
     }

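The new return-value contract of `detect_binary` (stop only under the "quit" strategy, but report the first binary offset under both strategies) can be modeled with a simplified standalone version. This sketch is an assumption-laden stand-in: `Core` and `Detection` here are toy types, the sink callback is replaced by a `reported` vector, and the error path and the "sink asked to stop" path are left out.

```rust
/// Toy version of the detection setting (hypothetical type).
#[derive(Clone, Copy)]
enum Detection {
    Off,
    Quit(u8),
    Convert(u8),
}

struct Core {
    detection: Detection,
    binary_byte_offset: Option<usize>,
    /// Offsets that would have been passed to the sink's `binary_data`.
    reported: Vec<u64>,
}

impl Core {
    fn new(detection: Detection) -> Core {
        Core { detection, binary_byte_offset: None, reported: Vec::new() }
    }

    fn quits(&self) -> bool {
        matches!(self.detection, Detection::Quit(_))
    }

    /// Returns true when the caller should stop searching.
    fn detect_binary(&mut self, buf: &[u8], start: usize, end: usize) -> bool {
        if self.binary_byte_offset.is_some() {
            // Already known to be binary: only "quit" stops the search.
            return self.quits();
        }
        let byte = match self.detection {
            Detection::Quit(b) | Detection::Convert(b) => b,
            Detection::Off => return false,
        };
        match buf[start..end].iter().position(|&b| b == byte) {
            Some(i) => {
                let offset = start + i;
                self.binary_byte_offset = Some(offset);
                // Reported once, for the first occurrence only.
                self.reported.push(offset as u64);
                self.quits()
            }
            None => false,
        }
    }
}

fn main() {
    let buf = b"ab\0cd";
    let mut quit = Core::new(Detection::Quit(0));
    let mut conv = Core::new(Detection::Convert(0));
    println!("quit stops: {}", quit.detect_binary(buf, 0, buf.len()));
    println!("convert stops: {}", conv.detect_binary(buf, 0, buf.len()));
    println!("convert reported: {:?}", conv.reported);
}
```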
@@ -416,7 +432,7 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
         buf: &[u8],
         range: &Range,
     ) -> Result<bool, S::Error> {
-        if self.binary && self.detect_binary(buf, range) {
+        if self.binary && self.detect_binary(buf, range)? {
             return Ok(false);
         }
         if !self.sink_break_context(range.start())? {
@@ -448,7 +464,7 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
         buf: &[u8],
         range: &Range,
     ) -> Result<bool, S::Error> {
-        if self.binary && self.detect_binary(buf, range) {
+        if self.binary && self.detect_binary(buf, range)? {
             return Ok(false);
         }
         self.count_lines(buf, range.start());
@@ -478,7 +494,7 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
     ) -> Result<bool, S::Error> {
         assert!(self.after_context_left >= 1);

-        if self.binary && self.detect_binary(buf, range) {
+        if self.binary && self.detect_binary(buf, range)? {
             return Ok(false);
         }
         self.count_lines(buf, range.start());
@@ -507,7 +523,7 @@ impl<'s, M: Matcher, S: Sink> Core<'s, M, S> {
         buf: &[u8],
         range: &Range,
     ) -> Result<bool, S::Error> {
-        if self.binary && self.detect_binary(buf, range) {
+        if self.binary && self.detect_binary(buf, range)? {
             return Ok(false);
         }
         self.count_lines(buf, range.start());
@@ -51,6 +51,7 @@ where M: Matcher,
     fn fill(&mut self) -> Result<bool, S::Error> {
         assert!(self.rdr.buffer()[self.core.pos()..].is_empty());

+        let already_binary = self.rdr.binary_byte_offset().is_some();
         let old_buf_len = self.rdr.buffer().len();
         let consumed = self.core.roll(self.rdr.buffer());
         self.rdr.consume(consumed);
@@ -58,7 +59,14 @@ where M: Matcher,
             Err(err) => return Err(S::Error::error_io(err)),
             Ok(didread) => didread,
         };
-        if !didread || self.rdr.binary_byte_offset().is_some() {
+        if !already_binary {
+            if let Some(offset) = self.rdr.binary_byte_offset() {
+                if !self.core.binary_data(offset)? {
+                    return Ok(false);
+                }
+            }
+        }
+        if !didread || self.should_binary_quit() {
             return Ok(false);
         }
         // If rolling the buffer didn't result in consuming anything and if
@@ -71,6 +79,11 @@ where M: Matcher,
         }
         Ok(true)
     }
+
+    fn should_binary_quit(&self) -> bool {
+        self.rdr.binary_byte_offset().is_some()
+            && self.config.binary.quit_byte().is_some()
+    }
 }

 #[derive(Debug)]
@@ -103,7 +116,7 @@ impl<'s, M: Matcher, S: Sink> SliceByLine<'s, M, S> {
                 DEFAULT_BUFFER_CAPACITY,
             );
             let binary_range = Range::new(0, binary_upto);
-            if !self.core.detect_binary(self.slice, &binary_range) {
+            if !self.core.detect_binary(self.slice, &binary_range)? {
                 while
                     !self.slice[self.core.pos()..].is_empty()
                     && self.core.match_by_line(self.slice)?
@@ -155,7 +168,7 @@ impl<'s, M: Matcher, S: Sink> MultiLine<'s, M, S> {
                 DEFAULT_BUFFER_CAPACITY,
             );
             let binary_range = Range::new(0, binary_upto);
-            if !self.core.detect_binary(self.slice, &binary_range) {
+            if !self.core.detect_binary(self.slice, &binary_range)? {
                 let mut keepgoing = true;
                 while !self.slice[self.core.pos()..].is_empty() && keepgoing {
                     keepgoing = self.sink()?;
@@ -75,25 +75,41 @@ impl BinaryDetection {
         BinaryDetection(line_buffer::BinaryDetection::Quit(binary_byte))
     }

-    // TODO(burntsushi): Figure out how to make binary conversion work. This
-    // permits implementing GNU grep's default behavior, which is to zap NUL
-    // bytes but still execute a search (if a match is detected, then GNU grep
-    // stops and reports that a match was found but doesn't print the matching
-    // line itself).
-    //
-    // This behavior is pretty simple to implement using the line buffer (and
-    // in fact, it is already implemented and tested), since there's a fixed
-    // size buffer that we can easily write to. The issue arises when searching
-    // a `&[u8]` (whether on the heap or via a memory map), since this isn't
-    // something we can easily write to.
-
-    /// The given byte is searched in all contents read by the line buffer. If
-    /// it occurs, then it is replaced by the line terminator. The line buffer
-    /// guarantees that this byte will never be observable by callers.
-    #[allow(dead_code)]
-    fn convert(binary_byte: u8) -> BinaryDetection {
+    /// Binary detection is performed by looking for the given byte, and
+    /// replacing it with the line terminator configured on the searcher.
+    /// (If the searcher is configured to use `CRLF` as the line terminator,
+    /// then this byte is replaced by just `LF`.)
+    ///
+    /// When searching is performed using a fixed size buffer, then the
+    /// contents of that buffer are always searched for the presence of this
+    /// byte and replaced with the line terminator. In effect, the caller is
+    /// guaranteed to never observe this byte while searching.
+    ///
+    /// When searching is performed with the entire contents mapped into
+    /// memory, then this setting has no effect and is ignored.
+    pub fn convert(binary_byte: u8) -> BinaryDetection {
         BinaryDetection(line_buffer::BinaryDetection::Convert(binary_byte))
     }
+
+    /// If this binary detection uses the "quit" strategy, then this returns
+    /// the byte that will cause a search to quit. In any other case, this
+    /// returns `None`.
+    pub fn quit_byte(&self) -> Option<u8> {
+        match self.0 {
+            line_buffer::BinaryDetection::Quit(b) => Some(b),
+            _ => None,
+        }
+    }
+
+    /// If this binary detection uses the "convert" strategy, then this returns
+    /// the byte that will be replaced by the line terminator. In any other
+    /// case, this returns `None`.
+    pub fn convert_byte(&self) -> Option<u8> {
+        match self.0 {
+            line_buffer::BinaryDetection::Convert(b) => Some(b),
+            _ => None,
+        }
+    }
 }

 /// An encoding to use when searching.
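The "convert" strategy documented above replaces the binary byte with the line terminator inside a writable buffer. A minimal standalone sketch of that idea follows; `convert_binary` is a hypothetical helper and does not reproduce the line buffer's real bookkeeping.

```rust
/// Replace every occurrence of `binary_byte` in `buf` with `line_term` and
/// return the offset of the first replaced byte, if any.
fn convert_binary(buf: &mut [u8], binary_byte: u8, line_term: u8) -> Option<usize> {
    let mut first = None;
    for (i, b) in buf.iter_mut().enumerate() {
        if *b == binary_byte {
            *b = line_term;
            // Remember only the first offset; later ones are ignored.
            first.get_or_insert(i);
        }
    }
    first
}

fn main() {
    let mut buf = b"a\0b\0".to_vec();
    let first = convert_binary(&mut buf, b'\0', b'\n');
    println!("first NUL at {:?}, buffer is now {:?}", first, buf);
}
```

This only works because the buffer is owned and mutable, which is consistent with the doc comment's note that the setting is ignored when the whole file is mapped into memory.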
@@ -739,6 +755,12 @@ impl Searcher {
         }
     }

+    /// Set the binary detection method used on this searcher.
+    pub fn set_binary_detection(&mut self, detection: BinaryDetection) {
+        self.config.binary = detection.clone();
+        self.line_buffer.borrow_mut().set_binary_detection(detection.0);
+    }
+
     /// Check that the searcher's configuration and the matcher are consistent
     /// with each other.
     fn check_config<M: Matcher>(&self, matcher: M) -> Result<(), ConfigError> {
@@ -778,6 +800,12 @@ impl Searcher {
         self.config.line_term
     }

+    /// Returns the type of binary detection configured on this searcher.
+    #[inline]
+    pub fn binary_detection(&self) -> &BinaryDetection {
+        &self.config.binary
+    }
+
     /// Returns true if and only if this searcher is configured to invert its
     /// search results. That is, matching lines are lines that do **not** match
     /// the searcher's matcher.
@@ -167,6 +167,28 @@ pub trait Sink {
         Ok(true)
     }

+    /// This method is called whenever binary detection is enabled and binary
+    /// data is found. If binary data is found, then this is called at least
+    /// once for the first occurrence with the absolute byte offset at which
+    /// the binary data begins.
+    ///
+    /// If this returns `true`, then searching continues. If this returns
+    /// `false`, then searching is stopped immediately and `finish` is called.
+    ///
+    /// If this returns an error, then searching is stopped immediately,
+    /// `finish` is not called and the error is bubbled back up to the caller
+    /// of the searcher.
+    ///
+    /// By default, it does nothing and returns `true`.
+    #[inline]
+    fn binary_data(
+        &mut self,
+        _searcher: &Searcher,
+        _binary_byte_offset: u64,
+    ) -> Result<bool, Self::Error> {
+        Ok(true)
+    }
+
     /// This method is called when a search has begun, before any search is
     /// executed. By default, this does nothing.
     ///
@@ -228,6 +250,15 @@ impl<'a, S: Sink> Sink for &'a mut S {
         (**self).context_break(searcher)
     }

+    #[inline]
+    fn binary_data(
+        &mut self,
+        searcher: &Searcher,
+        binary_byte_offset: u64,
+    ) -> Result<bool, S::Error> {
+        (**self).binary_data(searcher, binary_byte_offset)
+    }
+
     #[inline]
     fn begin(
         &mut self,
@@ -275,6 +306,15 @@ impl<S: Sink + ?Sized> Sink for Box<S> {
         (**self).context_break(searcher)
     }

+    #[inline]
+    fn binary_data(
+        &mut self,
+        searcher: &Searcher,
+        binary_byte_offset: u64,
+    ) -> Result<bool, S::Error> {
+        (**self).binary_data(searcher, binary_byte_offset)
+    }
+
     #[inline]
     fn begin(
         &mut self,
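The `binary_data` hook follows the same pattern as the other optional `Sink` callbacks: a default method that returns `Ok(true)`, which implementors override when they care. The standalone sketch below uses a reduced, hypothetical trait (no `Searcher` argument, a single required method) to show the default in action and an override that records the offset and stops the search.

```rust
use std::io;

/// Reduced stand-in for the real `Sink` trait (simplified signatures).
trait Sink {
    type Error;

    /// Called for each matching line; `Ok(false)` stops the search.
    fn matched(&mut self, line: &[u8]) -> Result<bool, Self::Error>;

    /// Called when binary data is first found. Continues by default.
    fn binary_data(&mut self, _binary_byte_offset: u64) -> Result<bool, Self::Error> {
        Ok(true)
    }
}

/// Uses the default `binary_data`, so binary data never stops it.
struct Collect {
    lines: Vec<Vec<u8>>,
}

impl Sink for Collect {
    type Error = io::Error;

    fn matched(&mut self, line: &[u8]) -> Result<bool, io::Error> {
        self.lines.push(line.to_vec());
        Ok(true)
    }
}

/// Overrides `binary_data` to remember the offset and stop searching.
struct StopAtBinary {
    offset: Option<u64>,
}

impl Sink for StopAtBinary {
    type Error = io::Error;

    fn matched(&mut self, _line: &[u8]) -> Result<bool, io::Error> {
        Ok(true)
    }

    fn binary_data(&mut self, binary_byte_offset: u64) -> Result<bool, io::Error> {
        self.offset = Some(binary_byte_offset);
        Ok(false)
    }
}

fn main() -> Result<(), io::Error> {
    let mut collect = Collect { lines: Vec::new() };
    collect.matched(b"foo")?;
    println!("default continues: {}", collect.binary_data(10)?);

    let mut stop = StopAtBinary { offset: None };
    println!("override continues: {}", stop.binary_data(10)?);
    println!("recorded offset: {:?}", stop.offset);
    Ok(())
}
```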
73
src/app.rs
@@ -27,6 +27,9 @@ configuration file. The file can specify one shell argument per line. Lines
 starting with '#' are ignored. For more details, see the man page or the
 README.

+Tip: to disable all smart filtering and make ripgrep behave a bit more like
+classical grep, use 'rg -uuu'.
+
 Project home page: https://github.com/BurntSushi/ripgrep

 Use -h for short descriptions and --help for more details.";
@@ -545,6 +548,7 @@ pub fn all_args_and_flags() -> Vec<RGArg> {
     // "positive" flag.
     flag_after_context(&mut args);
     flag_before_context(&mut args);
+    flag_binary(&mut args);
     flag_block_buffered(&mut args);
     flag_byte_offset(&mut args);
     flag_case_sensitive(&mut args);
@@ -691,6 +695,55 @@ This overrides the --context flag.
     args.push(arg);
 }

+fn flag_binary(args: &mut Vec<RGArg>) {
+    const SHORT: &str = "Search binary files.";
+    const LONG: &str = long!("\
+Enabling this flag will cause ripgrep to search binary files. By default,
+ripgrep attempts to automatically skip binary files in order to improve the
+relevance of results and make the search faster.
+
+Binary files are heuristically detected based on whether they contain a NUL
+byte or not. By default (without this flag set), once a NUL byte is seen,
+ripgrep will stop searching the file. Usually, NUL bytes occur in the beginning
+of most binary files. If a NUL byte occurs after a match, then ripgrep will
+still stop searching the rest of the file, but a warning will be printed.
+
+In contrast, when this flag is provided, ripgrep will continue searching a file
+even if a NUL byte is found. In particular, if a NUL byte is found then ripgrep
+will continue searching until either a match is found or the end of the file is
+reached, whichever comes sooner. If a match is found, then ripgrep will stop
+and print a warning saying that the search stopped prematurely.
+
+If you want ripgrep to search a file without any special NUL byte handling at
+all (and potentially print binary data to stdout), then you should use the
+'-a/--text' flag.
+
+The '--binary' flag is a flag for controlling ripgrep's automatic filtering
+mechanism. As such, it does not need to be used when searching a file
+explicitly or when searching stdin. That is, it is only applicable when
+recursively searching a directory.
+
+Note that when the '-u/--unrestricted' flag is provided for a third time, then
+this flag is automatically enabled.
+
+This flag can be disabled with '--no-binary'. It overrides the '-a/--text'
+flag.
+");
+    let arg = RGArg::switch("binary")
+        .help(SHORT).long_help(LONG)
+        .overrides("no-binary")
+        .overrides("text")
+        .overrides("no-text");
+    args.push(arg);
+
+    let arg = RGArg::switch("no-binary")
+        .hidden()
+        .overrides("binary")
+        .overrides("text")
+        .overrides("no-text");
+    args.push(arg);
+}
+
 fn flag_block_buffered(args: &mut Vec<RGArg>) {
     const SHORT: &str = "Force block buffering.";
     const LONG: &str = long!("\
@@ -1874,7 +1927,7 @@ fn flag_pre(args: &mut Vec<RGArg>) {
 For each input FILE, search the standard output of COMMAND FILE rather than the
 contents of FILE. This option expects the COMMAND program to either be an
 absolute path or to be available in your PATH. Either an empty string COMMAND
-or the `--no-pre` flag will disable this behavior.
+or the '--no-pre' flag will disable this behavior.

     WARNING: When this flag is set, ripgrep will unconditionally spawn a
     process for every file that is searched. Therefore, this can incur an
@@ -2208,20 +2261,23 @@ escape codes to be printed that alter the behavior of your terminal.
 When binary file detection is enabled it is imperfect. In general, it uses
 a simple heuristic. If a NUL byte is seen during search, then the file is
 considered binary and search stops (unless this flag is present).
+Alternatively, if the '--binary' flag is used, then ripgrep will only quit
+when it sees a NUL byte after it sees a match (or searches the entire file).

-Note that when the `-u/--unrestricted` flag is provided for a third time, then
-this flag is automatically enabled.
-
-This flag can be disabled with --no-text.
+This flag can be disabled with '--no-text'. It overrides the '--binary' flag.
 ");
     let arg = RGArg::switch("text").short("a")
         .help(SHORT).long_help(LONG)
-        .overrides("no-text");
+        .overrides("no-text")
+        .overrides("binary")
+        .overrides("no-binary");
     args.push(arg);

     let arg = RGArg::switch("no-text")
         .hidden()
-        .overrides("text");
+        .overrides("text")
+        .overrides("binary")
+        .overrides("no-binary");
     args.push(arg);
 }

@ -2350,8 +2406,7 @@ Reduce the level of \"smart\" searching. A single -u won't respect .gitignore
|
||||
(etc.) files. Two -u flags will additionally search hidden files and
|
||||
directories. Three -u flags will additionally search binary files.
|
||||
|
||||
-uu is roughly equivalent to grep -r and -uuu is roughly equivalent to grep -a
|
||||
-r.
|
||||
'rg -uuu' is roughly equivalent to 'grep -r'.
|
||||
");
|
||||
let arg = RGArg::switch("unrestricted").short("u")
|
||||
.help(SHORT).long_help(LONG)
|
||||
|
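The `--unrestricted` help text above describes a cumulative effect: each extra `-u` turns off one more filter, and the commit message says `-uuu` now maps to `--no-ignore --hidden --binary`. A minimal sketch of that mapping follows. The function name `unrestricted_flags` is hypothetical and is not part of ripgrep; the real code checks the count where each filter is configured rather than expanding it into flag strings.

```rust
/// Hypothetical helper: expand a count of `-u` flags into the long flags
/// that the commit message says it is equivalent to.
fn unrestricted_flags(count: u32) -> Vec<&'static str> {
    let mut flags = Vec::new();
    if count >= 1 {
        // One -u: stop respecting .gitignore (etc.) files.
        flags.push("--no-ignore");
    }
    if count >= 2 {
        // Two -u: additionally search hidden files and directories.
        flags.push("--hidden");
    }
    if count >= 3 {
        // Three -u: additionally search binary files (not the same as --text).
        flags.push("--binary");
    }
    flags
}
```

Under this reading, `-uuu` no longer implies `-a/--text`, which is why the help text now compares `rg -uuu` to `grep -r` rather than to `grep -a -r`.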
47
src/args.rs
@ -286,15 +286,18 @@ impl Args {
|
||||
&self,
|
||||
wtr: W,
|
||||
) -> Result<SearchWorker<W>> {
|
||||
let matches = self.matches();
|
||||
let matcher = self.matcher().clone();
|
||||
let printer = self.printer(wtr)?;
|
||||
let searcher = self.matches().searcher(self.paths())?;
|
||||
let searcher = matches.searcher(self.paths())?;
|
||||
let mut builder = SearchWorkerBuilder::new();
|
||||
builder
|
||||
.json_stats(self.matches().is_present("json"))
|
||||
.preprocessor(self.matches().preprocessor())
|
||||
.preprocessor_globs(self.matches().preprocessor_globs()?)
|
||||
.search_zip(self.matches().is_present("search-zip"));
|
||||
.json_stats(matches.is_present("json"))
|
||||
.preprocessor(matches.preprocessor())
|
||||
.preprocessor_globs(matches.preprocessor_globs()?)
|
||||
.search_zip(matches.is_present("search-zip"))
|
||||
.binary_detection_implicit(matches.binary_detection_implicit())
|
||||
.binary_detection_explicit(matches.binary_detection_explicit());
|
||||
Ok(builder.build(matcher, searcher, printer))
|
||||
}
|
||||
|
||||
@ -802,8 +805,7 @@ impl ArgMatches {
|
||||
.before_context(ctx_before)
|
||||
.after_context(ctx_after)
|
||||
.passthru(self.is_present("passthru"))
|
||||
.memory_map(self.mmap_choice(paths))
|
||||
.binary_detection(self.binary_detection());
|
||||
.memory_map(self.mmap_choice(paths));
|
||||
match self.encoding()? {
|
||||
EncodingMode::Some(enc) => {
|
||||
builder.encoding(Some(enc));
|
||||
@ -862,16 +864,39 @@ impl ArgMatches {
|
||||
///
|
||||
/// Methods are sorted alphabetically.
|
||||
impl ArgMatches {
|
||||
/// Returns the form of binary detection to perform.
|
||||
fn binary_detection(&self) -> BinaryDetection {
|
||||
/// Returns the form of binary detection to perform on files that are
|
||||
/// implicitly searched via recursive directory traversal.
|
||||
fn binary_detection_implicit(&self) -> BinaryDetection {
|
||||
let none =
|
||||
self.is_present("text")
|
||||
|| self.is_present("null-data");
|
||||
let convert =
|
||||
self.is_present("binary")
|
||||
|| self.unrestricted_count() >= 3;
|
||||
if none {
|
||||
BinaryDetection::none()
|
||||
} else if convert {
|
||||
BinaryDetection::convert(b'\x00')
|
||||
} else {
|
||||
BinaryDetection::quit(b'\x00')
|
||||
}
|
||||
}
|
||||
|
||||
/// Returns the form of binary detection to perform on files that are
|
||||
/// explicitly searched via the user invoking ripgrep on a particular
|
||||
/// file or files or stdin.
|
||||
///
|
||||
/// In general, this should never be BinaryDetection::quit, since that acts
|
||||
/// as a filter (but quitting immediately once a NUL byte is seen), and we
|
||||
/// should never filter out files that the user wants to explicitly search.
|
||||
fn binary_detection_explicit(&self) -> BinaryDetection {
|
||||
let none =
|
||||
self.is_present("text")
|
||||
|| self.unrestricted_count() >= 3
|
||||
|| self.is_present("null-data");
|
||||
if none {
|
||||
BinaryDetection::none()
|
||||
} else {
|
||||
BinaryDetection::quit(b'\x00')
|
||||
BinaryDetection::convert(b'\x00')
|
||||
}
|
||||
}
|
||||
|
||||
|
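The two `ArgMatches` methods in the hunk above can be summarized as a small decision table. The sketch below restates it with stand-in types so it compiles on its own: `Mode` and `Flags` are hypothetical and only mirror `BinaryDetection::none/convert/quit` and the flags being tested. It also assumes that, in the explicit case, the `unrestricted_count() >= 3` and `BinaryDetection::quit` lines are the removed side of the diff, since the rendered page has lost its `+`/`-` markers.

```rust
/// Stand-in for the three binary detection modes named in the diff.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    /// No detection at all (-a/--text or --null-data).
    None,
    /// Keep searching, but treat NUL bytes specially (--binary behavior).
    Convert,
    /// Stop searching the file once a NUL byte is seen (default filter).
    Quit,
}

/// Stand-in for the command line flags consulted by both methods.
#[derive(Debug, Clone, Copy, Default)]
struct Flags {
    text: bool,
    null_data: bool,
    binary: bool,
    unrestricted: u32,
}

/// Files found through recursive directory traversal.
fn implicit_mode(f: Flags) -> Mode {
    if f.text || f.null_data {
        Mode::None
    } else if f.binary || f.unrestricted >= 3 {
        Mode::Convert
    } else {
        Mode::Quit
    }
}

/// Files (or stdin) named directly by the user: never filtered out.
fn explicit_mode(f: Flags) -> Mode {
    if f.text || f.null_data {
        Mode::None
    } else {
        Mode::Convert
    }
}
```

With default flags this yields `Quit` for implicit files and `Convert` for explicit ones, which matches the commit message: explicitly given files behave as if `--binary` had been passed.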
@ -10,7 +10,7 @@ use grep::matcher::Matcher;
|
||||
use grep::pcre2::{RegexMatcher as PCRE2RegexMatcher};
|
||||
use grep::printer::{JSON, Standard, Summary, Stats};
|
||||
use grep::regex::{RegexMatcher as RustRegexMatcher};
|
||||
use grep::searcher::Searcher;
|
||||
use grep::searcher::{BinaryDetection, Searcher};
|
||||
use ignore::overrides::Override;
|
||||
use serde_json as json;
|
||||
use serde_json::json;
|
||||
@ -27,6 +27,8 @@ struct Config {
|
||||
preprocessor: Option<PathBuf>,
|
||||
preprocessor_globs: Override,
|
||||
search_zip: bool,
|
||||
binary_implicit: BinaryDetection,
|
||||
binary_explicit: BinaryDetection,
|
||||
}
|
||||
|
||||
impl Default for Config {
|
||||
@ -36,6 +38,8 @@ impl Default for Config {
|
||||
preprocessor: None,
|
||||
preprocessor_globs: Override::empty(),
|
||||
search_zip: false,
|
||||
binary_implicit: BinaryDetection::none(),
|
||||
binary_explicit: BinaryDetection::none(),
|
||||
}
|
||||
}
|
||||
}
|
||||
@ -134,6 +138,37 @@ impl SearchWorkerBuilder {
|
||||
self.config.search_zip = yes;
|
||||
self
|
||||
}
|
||||
|
||||
/// Set the binary detection that should be used when searching files
|
||||
/// found via a recursive directory search.
|
||||
///
|
||||
/// Generally, this binary detection may be `BinaryDetection::quit` if
|
||||
/// we want to skip binary files completely.
|
||||
///
|
||||
/// By default, no binary detection is performed.
|
||||
pub fn binary_detection_implicit(
|
||||
&mut self,
|
||||
detection: BinaryDetection,
|
||||
) -> &mut SearchWorkerBuilder {
|
||||
self.config.binary_implicit = detection;
|
||||
self
|
||||
}
|
||||
|
||||
/// Set the binary detection that should be used when searching files
|
||||
/// explicitly supplied by an end user.
|
||||
///
|
||||
/// Generally, this binary detection should NOT be `BinaryDetection::quit`,
|
||||
/// since we never want to automatically filter files supplied by the end
|
||||
/// user.
|
||||
///
|
||||
/// By default, no binary detection is performed.
|
||||
pub fn binary_detection_explicit(
|
||||
&mut self,
|
||||
detection: BinaryDetection,
|
||||
) -> &mut SearchWorkerBuilder {
|
||||
self.config.binary_explicit = detection;
|
||||
self
|
||||
}
|
||||
}
|
||||
|
||||
/// The result of executing a search.
|
||||
@ -308,6 +343,14 @@ impl<W: WriteColor> SearchWorker<W> {
|
||||
|
||||
/// Search the given subject using the appropriate strategy.
|
||||
fn search_impl(&mut self, subject: &Subject) -> io::Result<SearchResult> {
|
||||
let bin =
|
||||
if subject.is_explicit() {
|
||||
self.config.binary_explicit.clone()
|
||||
} else {
|
||||
self.config.binary_implicit.clone()
|
||||
};
|
||||
self.searcher.set_binary_detection(bin);
|
||||
|
||||
let path = subject.path();
|
||||
if subject.is_stdin() {
|
||||
let stdin = io::stdin();
|
||||
|
@ -59,17 +59,12 @@ impl SubjectBuilder {
|
||||
if let Some(ignore_err) = subj.dent.error() {
|
||||
ignore_message!("{}", ignore_err);
|
||||
}
|
||||
// If this entry represents stdin, then we always search it.
|
||||
if subj.dent.is_stdin() {
|
||||
// If this entry was explicitly provided by an end user, then we always
|
||||
// want to search it.
|
||||
if subj.is_explicit() {
|
||||
return Some(subj);
|
||||
}
|
||||
// If this subject has a depth of 0, then it was provided explicitly
|
||||
// by an end user (or via a shell glob). In this case, we always want
|
||||
// to search it if it even smells like a file (e.g., a symlink).
|
||||
if subj.dent.depth() == 0 && !subj.is_dir() {
|
||||
return Some(subj);
|
||||
}
|
||||
// At this point, we only want to search something it's explicitly a
|
||||
// At this point, we only want to search something if it's explicitly a
|
||||
// file. This omits symlinks. (If ripgrep was configured to follow
|
||||
// symlinks, then they have already been followed by the directory
|
||||
// traversal.)
|
||||
@ -127,6 +122,26 @@ impl Subject {
|
||||
self.dent.is_stdin()
|
||||
}
|
||||
|
||||
/// Returns true if and only if this entry corresponds to a subject to
|
||||
/// search that was explicitly supplied by an end user.
|
||||
///
|
||||
/// Generally, this corresponds to either stdin or an explicit file path
|
||||
/// argument. e.g., in `rg foo some-file ./some-dir/`, `some-file` is
|
||||
/// an explicit subject, but, e.g., `./some-dir/some-other-file` is not.
|
||||
///
|
||||
/// However, note that ripgrep does not see through shell globbing. e.g.,
|
||||
/// in `rg foo ./some-dir/*`, `./some-dir/some-other-file` will be treated
|
||||
/// as an explicit subject.
|
||||
pub fn is_explicit(&self) -> bool {
|
||||
// stdin is obvious. When an entry has a depth of 0, that means it
|
||||
// was explicitly provided to our directory iterator, which means it
|
||||
// was in turn explicitly provided by the end user. The !is_dir check
|
||||
// means that we want to search files even if they're symlinks, again,
|
||||
// because they were explicitly provided. (And we never want to try
|
||||
// to search a directory.)
|
||||
self.is_stdin() || (self.dent.depth() == 0 && !self.is_dir())
|
||||
}
|
||||
|
||||
/// Returns true if and only if this subject points to a directory after
|
||||
/// following symbolic links.
|
||||
fn is_dir(&self) -> bool {
|
||||
|
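The `is_explicit` predicate and the selection at the top of `search_impl` work together: the subject decides which of the two configured detection modes the searcher gets. A rough, self-contained sketch of that pairing is below. `Entry` and `pick_detection` are hypothetical names used only for illustration; the real code reads these properties from a directory entry and calls `set_binary_detection` on the searcher.

```rust
/// Hypothetical stand-in for the properties of a search subject.
#[derive(Debug, Clone, Copy)]
struct Entry {
    is_stdin: bool,
    /// Depth in the directory traversal; 0 means it was given on the command line.
    depth: usize,
    /// True if the entry is a directory after following symlinks.
    is_dir: bool,
}

/// Mirrors the predicate in the diff: stdin, or a non-directory at depth 0.
fn is_explicit(e: Entry) -> bool {
    e.is_stdin || (e.depth == 0 && !e.is_dir)
}

/// Choose the detection setting for a subject, as `search_impl` does.
fn pick_detection<T>(e: Entry, explicit: T, implicit: T) -> T {
    if is_explicit(e) {
        explicit
    } else {
        implicit
    }
}
```

Because shell globs are expanded before ripgrep runs, every path produced by `./some-dir/*` arrives at depth 0 and is treated as explicit under this rule.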
315
tests/binary.rs
Normal file
@ -0,0 +1,315 @@
|
||||
use crate::util::{Dir, TestCommand};
|
||||
|
||||
// This file contains a smattering of tests specifically for checking ripgrep's
|
||||
// handling of binary files. There's quite a bit of discussion on this in this
|
||||
// bug report: https://github.com/BurntSushi/ripgrep/issues/306
|
||||
|
||||
// Our haystack is the first 500 lines of Gutenberg's copy of "A Study in
|
||||
// Scarlet," with a NUL byte at line 237: `abcdef\x00`.
|
||||
//
|
||||
// The position and size of the haystack is, unfortunately, significant. In
|
||||
// particular, the NUL byte is specifically inserted at some point *after* the
|
||||
// first 8192 bytes, which corresponds to the initial capacity of the buffer
|
||||
// that ripgrep uses to read files. (grep for DEFAULT_BUFFER_CAPACITY.) The
|
||||
// position of the NUL byte ensures that we can execute some search on the
|
||||
// initial buffer contents without ever detecting any binary data. Moreover,
|
||||
// when using a memory map for searching, only the first 8192 bytes are
|
||||
// scanned for a NUL byte, so no binary bytes are detected at all when using
|
||||
// a memory map (unless our query matches line 237).
|
||||
//
|
||||
// One last note: in the tests below, we use --no-mmap heavily because binary
|
||||
// detection with memory maps is a bit different. Namely, NUL bytes are only
|
||||
// searched for in the first few KB of the file and in a match. Normally, NUL
|
||||
// bytes are searched for everywhere.
|
||||
//
|
||||
// TODO: Add tests for binary file detection when using memory maps.
|
||||
const HAY: &'static [u8] = include_bytes!("./data/sherlock-nul.txt");
|
||||
|
||||
// This tests that ripgrep prints a warning message if it finds and prints a
|
||||
// match in a binary file before detecting that it is a binary file. The point
|
||||
// here is to notify the user that the search of the file is only partially
|
||||
// complete.
|
||||
//
|
||||
// This applies to files that are *implicitly* searched via a recursive
|
||||
// directory traversal. In particular, this results in a WARNING message being
|
||||
// printed. We make our file "implicit" by doing a recursive search with a glob
|
||||
// that matches our file.
|
||||
rgtest!(after_match1_implicit, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "Project Gutenberg EBook", "-g", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
hay:1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
||||
WARNING: stopped searching binary file hay after match (found \"\\u{0}\" byte around offset 9741)
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// Like after_match1_implicit, except we provide a file to search
|
||||
// explicitly. This results in identical behavior, but a different message.
|
||||
rgtest!(after_match1_explicit, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "Project Gutenberg EBook", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
||||
Binary file matches (found \"\\u{0}\" byte around offset 9741)
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// Like after_match1_explicit, except we feed our content on stdin.
|
||||
rgtest!(after_match1_stdin, |_: Dir, mut cmd: TestCommand| {
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "Project Gutenberg EBook",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
||||
Binary file matches (found \"\\u{0}\" byte around offset 9741)
|
||||
";
|
||||
eqnice!(expected, cmd.pipe(HAY));
|
||||
});
|
||||
|
||||
// Like after_match1_implicit, but provides the --binary flag, which
|
||||
// disables binary filtering. Thus, this matches the behavior of ripgrep as
|
||||
// if the file were given explicitly.
|
||||
rgtest!(after_match1_implicit_binary, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "--binary", "Project Gutenberg EBook", "-g", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
hay:1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
||||
Binary file hay matches (found \"\\u{0}\" byte around offset 9741)
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// Like after_match1_implicit, but enables -a/--text, so no binary
|
||||
// detection should be performed.
|
||||
rgtest!(after_match1_implicit_text, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "--text", "Project Gutenberg EBook", "-g", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
hay:1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// Like after_match1_explicit, but enables -a/--text, so no binary
|
||||
// detection should be performed.
|
||||
rgtest!(after_match1_explicit_text, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "--text", "Project Gutenberg EBook", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// Like after_match1_implicit, except this asks ripgrep to print all matching
|
||||
// files.
|
||||
//
|
||||
// This is an interesting corner case that one might consider a bug, however,
|
||||
// it's unlikely to be fixed. Namely, ripgrep probably shouldn't print `hay`
|
||||
// as a matching file since it is in fact a binary file, and thus should be
|
||||
// filtered out by default. However, the --files-with-matches flag will print
|
||||
// out the path of a matching file as soon as a match is seen and then stop
|
||||
// searching completely. Therefore, the NUL byte is never actually detected.
|
||||
//
|
||||
// The only way to fix this would be to kill ripgrep's performance in this case
|
||||
// and continue searching the entire file for a NUL byte. (Similarly if the
|
||||
// --quiet flag is set. See the next test.)
|
||||
rgtest!(after_match1_implicit_path, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-l", "Project Gutenberg EBook", "-g", "hay",
|
||||
]);
|
||||
eqnice!("hay\n", cmd.stdout());
|
||||
});
|
||||
|
||||
// Like after_match1_implicit_path, except this indicates that a match was
|
||||
// found with no other output. (This is the same bug described above, but
|
||||
// manifests as an exit code with no output.)
|
||||
rgtest!(after_match1_implicit_quiet, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-q", "Project Gutenberg EBook", "-g", "hay",
|
||||
]);
|
||||
eqnice!("", cmd.stdout());
|
||||
});
|
||||
|
||||
// This sets up the same test as after_match1_implicit_path, but instead of
|
||||
// just printing the matching files, this includes the full count of matches.
|
||||
// In this case, we need to search the entire file, so ripgrep correctly
|
||||
// detects the binary data and suppresses output.
|
||||
rgtest!(after_match1_implicit_count, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-c", "Project Gutenberg EBook", "-g", "hay",
|
||||
]);
|
||||
cmd.assert_err();
|
||||
});
|
||||
|
||||
// Like after_match1_implicit_count, except the --binary flag is provided,
|
||||
// which makes ripgrep disable binary data filtering even for implicit files.
|
||||
rgtest!(after_match1_implicit_count_binary, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-c", "--binary",
|
||||
"Project Gutenberg EBook",
|
||||
"-g", "hay",
|
||||
]);
|
||||
eqnice!("hay:1\n", cmd.stdout());
|
||||
});
|
||||
|
||||
// Like after_match1_implicit_count, except the file path is provided
|
||||
// explicitly, so binary filtering is disabled and a count is correctly
|
||||
// reported.
|
||||
rgtest!(after_match1_explicit_count, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-c", "Project Gutenberg EBook", "hay",
|
||||
]);
|
||||
eqnice!("1\n", cmd.stdout());
|
||||
});
|
||||
|
||||
// This tests that a match way before the NUL byte is shown, but a match after
|
||||
// the NUL byte is not.
|
||||
rgtest!(after_match2_implicit, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n",
|
||||
"Project Gutenberg EBook|a medical student",
|
||||
"-g", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
hay:1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
||||
WARNING: stopped searching binary file hay after match (found \"\\u{0}\" byte around offset 9741)
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// Like after_match2_implicit, but enables -a/--text, so no binary
|
||||
// detection should be performed.
|
||||
rgtest!(after_match2_implicit_text, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "--text",
|
||||
"Project Gutenberg EBook|a medical student",
|
||||
"-g", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
hay:1:The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
||||
hay:236:\"And yet you say he is not a medical student?\"
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// This tests that ripgrep *silently* quits before finding a match that occurs
|
||||
// after a NUL byte.
|
||||
rgtest!(before_match1_implicit, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "Heaven", "-g", "hay",
|
||||
]);
|
||||
cmd.assert_err();
|
||||
});
|
||||
|
||||
// This tests that ripgrep *does not* silently quit before finding a match that
|
||||
// occurs after a NUL byte when a file is explicitly searched.
|
||||
rgtest!(before_match1_explicit, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "Heaven", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
Binary file matches (found \"\\u{0}\" byte around offset 9741)
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// Like before_match1_implicit, but enables the --binary flag, which
|
||||
// disables binary filtering. Thus, this matches the behavior of ripgrep as if
|
||||
// the file were given explicitly.
|
||||
rgtest!(before_match1_implicit_binary, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "--binary", "Heaven", "-g", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
Binary file hay matches (found \"\\u{0}\" byte around offset 9741)
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// Like before_match1_implicit, but enables -a/--text, so no binary
|
||||
// detection should be performed.
|
||||
rgtest!(before_match1_implicit_text, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "--text", "Heaven", "-g", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
hay:238:\"No. Heaven knows what the objects of his studies are. But here we
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// This tests that ripgrep *silently* quits before finding a match that occurs
|
||||
// before a NUL byte, but within the same buffer as the NUL byte.
|
||||
rgtest!(before_match2_implicit, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "a medical student", "-g", "hay",
|
||||
]);
|
||||
cmd.assert_err();
|
||||
});
|
||||
|
||||
// This tests that ripgrep *does not* silently quit before finding a match that
|
||||
// occurs before a NUL byte, but within the same buffer as the NUL byte. Even
|
||||
// though the match occurs before the NUL byte, ripgrep still doesn't print it
|
||||
// because it has already scanned ahead to detect the NUL byte. (This matches
|
||||
// the behavior of GNU grep.)
|
||||
rgtest!(before_match2_explicit, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "a medical student", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
Binary file matches (found \"\\u{0}\" byte around offset 9741)
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
||||
|
||||
// Like before_match2_implicit, but enables -a/--text, so no binary
|
||||
// detection should be performed.
|
||||
rgtest!(before_match2_implicit_text, |dir: Dir, mut cmd: TestCommand| {
|
||||
dir.create_bytes("hay", HAY);
|
||||
cmd.args(&[
|
||||
"--no-mmap", "-n", "--text", "a medical student", "-g", "hay",
|
||||
]);
|
||||
|
||||
let expected = "\
|
||||
hay:236:\"And yet you say he is not a medical student?\"
|
||||
";
|
||||
eqnice!(expected, cmd.stdout());
|
||||
});
|
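The comment at the top of tests/binary.rs says the haystack only works because its NUL byte sits after the first 8192 bytes, the initial buffer capacity quoted there. A small sketch of how one might check that property for a candidate haystack follows. Both function names are hypothetical, and the constant simply repeats the figure from the comment rather than importing ripgrep's own `DEFAULT_BUFFER_CAPACITY`.

```rust
/// Initial read buffer size quoted in the test-file comment (an assumption
/// copied from that comment, not read from ripgrep itself).
const INITIAL_BUFFER_CAPACITY: usize = 8192;

/// Byte offset of the first NUL byte, if any.
fn first_nul_offset(hay: &[u8]) -> Option<usize> {
    hay.iter().position(|&b| b == 0)
}

/// True if the haystack contains a NUL byte and the first one lies beyond
/// the initial buffer, so an early match can be found before detection.
fn nul_after_initial_buffer(hay: &[u8]) -> bool {
    matches!(first_nul_offset(hay), Some(off) if off >= INITIAL_BUFFER_CAPACITY)
}
```

The tests above report the NUL byte "around offset 9741", which is past 8192 and is consistent with this requirement.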
500
tests/data/sherlock-nul.txt
Normal file
@ -0,0 +1,500 @@
|
||||
The Project Gutenberg EBook of A Study In Scarlet, by Arthur Conan Doyle
|
||||
|
||||
This eBook is for the use of anyone anywhere at no cost and with
|
||||
almost no restrictions whatsoever. You may copy it, give it away or
|
||||
re-use it under the terms of the Project Gutenberg License included
|
||||
with this eBook or online at www.gutenberg.org
|
||||
|
||||
|
||||
Title: A Study In Scarlet
|
||||
|
||||
Author: Arthur Conan Doyle
|
||||
|
||||
Posting Date: July 12, 2008 [EBook #244]
|
||||
Release Date: April, 1995
|
||||
[Last updated: February 17, 2013]
|
||||
|
||||
Language: English
|
||||
|
||||
|
||||
*** START OF THIS PROJECT GUTENBERG EBOOK A STUDY IN SCARLET ***
|
||||
|
||||
|
||||
|
||||
|
||||
Produced by Roger Squires
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
A STUDY IN SCARLET.
|
||||
|
||||
By A. Conan Doyle
|
||||
|
||||
[1]
|
||||
|
||||
|
||||
|
||||
Original Transcriber's Note: This etext is prepared directly
|
||||
from an 1887 edition, and care has been taken to duplicate the
|
||||
original exactly, including typographical and punctuation
|
||||
vagaries.
|
||||
|
||||
Additions to the text include adding the underscore character to
|
||||
indicate italics, and textual end-notes in square braces.
|
||||
|
||||
Project Gutenberg Editor's Note: In reproofing and moving old PG
|
||||
files such as this to the present PG directory system it is the
|
||||
policy to reformat the text to conform to present PG Standards.
|
||||
In this case however, in consideration of the note above of the
|
||||
original transcriber describing his care to try to duplicate the
|
||||
original 1887 edition as to typography and punctuation vagaries,
|
||||
no changes have been made in this ascii text file. However, in
|
||||
the Latin-1 file and this html file, present standards are
|
||||
followed and the several French and Spanish words have been
|
||||
given their proper accents.
|
||||
|
||||
Part II, The Country of the Saints, deals much with the Mormon Church.
|
||||
|
||||
|
||||
|
||||
|
||||
A STUDY IN SCARLET.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
PART I.
|
||||
|
||||
(_Being a reprint from the reminiscences of_ JOHN H. WATSON, M.D., _late
|
||||
of the Army Medical Department._) [2]
|
||||
|
||||
|
||||
|
||||
|
||||
CHAPTER I. MR. SHERLOCK HOLMES.
|
||||
|
||||
|
||||
IN the year 1878 I took my degree of Doctor of Medicine of the
|
||||
University of London, and proceeded to Netley to go through the course
|
||||
prescribed for surgeons in the army. Having completed my studies there,
|
||||
I was duly attached to the Fifth Northumberland Fusiliers as Assistant
|
||||
Surgeon. The regiment was stationed in India at the time, and before
|
||||
I could join it, the second Afghan war had broken out. On landing at
|
||||
Bombay, I learned that my corps had advanced through the passes, and
|
||||
was already deep in the enemy's country. I followed, however, with many
|
||||
other officers who were in the same situation as myself, and succeeded
|
||||
in reaching Candahar in safety, where I found my regiment, and at once
|
||||
entered upon my new duties.
|
||||
|
||||
The campaign brought honours and promotion to many, but for me it had
|
||||
nothing but misfortune and disaster. I was removed from my brigade and
|
||||
attached to the Berkshires, with whom I served at the fatal battle of
|
||||
Maiwand. There I was struck on the shoulder by a Jezail bullet, which
|
||||
shattered the bone and grazed the subclavian artery. I should have
|
||||
fallen into the hands of the murderous Ghazis had it not been for the
|
||||
devotion and courage shown by Murray, my orderly, who threw me across a
|
||||
pack-horse, and succeeded in bringing me safely to the British lines.
|
||||
|
||||
Worn with pain, and weak from the prolonged hardships which I had
|
||||
undergone, I was removed, with a great train of wounded sufferers, to
|
||||
the base hospital at Peshawar. Here I rallied, and had already improved
|
||||
so far as to be able to walk about the wards, and even to bask a little
|
||||
upon the verandah, when I was struck down by enteric fever, that curse
|
||||
of our Indian possessions. For months my life was despaired of, and
|
||||
when at last I came to myself and became convalescent, I was so weak and
|
||||
emaciated that a medical board determined that not a day should be lost
|
||||
in sending me back to England. I was dispatched, accordingly, in the
|
||||
troopship "Orontes," and landed a month later on Portsmouth jetty, with
|
||||
my health irretrievably ruined, but with permission from a paternal
|
||||
government to spend the next nine months in attempting to improve it.
|
||||
|
||||
I had neither kith nor kin in England, and was therefore as free as
|
||||
air--or as free as an income of eleven shillings and sixpence a day will
|
||||
permit a man to be. Under such circumstances, I naturally gravitated to
|
||||
London, that great cesspool into which all the loungers and idlers of
|
||||
the Empire are irresistibly drained. There I stayed for some time at
|
||||
a private hotel in the Strand, leading a comfortless, meaningless
|
||||
existence, and spending such money as I had, considerably more freely
|
||||
than I ought. So alarming did the state of my finances become, that
|
||||
I soon realized that I must either leave the metropolis and rusticate
|
||||
somewhere in the country, or that I must make a complete alteration in
|
||||
my style of living. Choosing the latter alternative, I began by making
|
||||
up my mind to leave the hotel, and to take up my quarters in some less
|
||||
pretentious and less expensive domicile.
|
||||
|
||||
On the very day that I had come to this conclusion, I was standing at
|
||||
the Criterion Bar, when some one tapped me on the shoulder, and turning
|
||||
round I recognized young Stamford, who had been a dresser under me at
|
||||
Barts. The sight of a friendly face in the great wilderness of London is
|
||||
a pleasant thing indeed to a lonely man. In old days Stamford had never
|
||||
been a particular crony of mine, but now I hailed him with enthusiasm,
|
||||
and he, in his turn, appeared to be delighted to see me. In the
|
||||
exuberance of my joy, I asked him to lunch with me at the Holborn, and
|
||||
we started off together in a hansom.
|
||||
|
||||
"Whatever have you been doing with yourself, Watson?" he asked in
|
||||
undisguised wonder, as we rattled through the crowded London streets.
|
||||
"You are as thin as a lath and as brown as a nut."
|
||||
|
||||
I gave him a short sketch of my adventures, and had hardly concluded it
|
||||
by the time that we reached our destination.
|
||||
|
||||
"Poor devil!" he said, commiseratingly, after he had listened to my
|
||||
misfortunes. "What are you up to now?"
|
||||
|
||||
"Looking for lodgings." [3] I answered. "Trying to solve the problem
|
||||
as to whether it is possible to get comfortable rooms at a reasonable
|
||||
price."
|
||||
|
||||
"That's a strange thing," remarked my companion; "you are the second man
|
||||
to-day that has used that expression to me."
|
||||
|
||||
"And who was the first?" I asked.
|
||||
|
||||
"A fellow who is working at the chemical laboratory up at the hospital.
|
||||
He was bemoaning himself this morning because he could not get someone
to go halves with him in some nice rooms which he had found, and which
were too much for his purse."

"By Jove!" I cried, "if he really wants someone to share the rooms and
the expense, I am the very man for him. I should prefer having a partner
to being alone."

Young Stamford looked rather strangely at me over his wine-glass. "You
don't know Sherlock Holmes yet," he said; "perhaps you would not care
for him as a constant companion."

"Why, what is there against him?"

"Oh, I didn't say there was anything against him. He is a little queer
in his ideas--an enthusiast in some branches of science. As far as I
know he is a decent fellow enough."

"A medical student, I suppose?" said I.

"No--I have no idea what he intends to go in for. I believe he is well
up in anatomy, and he is a first-class chemist; but, as far as I know,
he has never taken out any systematic medical classes. His studies are
very desultory and eccentric, but he has amassed a lot of out-of-the way
knowledge which would astonish his professors."

"Did you never ask him what he was going in for?" I asked.

"No; he is not a man that it is easy to draw out, though he can be
communicative enough when the fancy seizes him."

"I should like to meet him," I said. "If I am to lodge with anyone, I
should prefer a man of studious and quiet habits. I am not strong
enough yet to stand much noise or excitement. I had enough of both in
Afghanistan to last me for the remainder of my natural existence. How
could I meet this friend of yours?"

"He is sure to be at the laboratory," returned my companion. "He either
avoids the place for weeks, or else he works there from morning to
night. If you like, we shall drive round together after luncheon."

"Certainly," I answered, and the conversation drifted away into other
channels.

As we made our way to the hospital after leaving the Holborn, Stamford
gave me a few more particulars about the gentleman whom I proposed to
take as a fellow-lodger.

"You mustn't blame me if you don't get on with him," he said; "I know
nothing more of him than I have learned from meeting him occasionally in
the laboratory. You proposed this arrangement, so you must not hold me
responsible."

"If we don't get on it will be easy to part company," I answered. "It
seems to me, Stamford," I added, looking hard at my companion, "that you
have some reason for washing your hands of the matter. Is this fellow's
temper so formidable, or what is it? Don't be mealy-mouthed about it."

"It is not easy to express the inexpressible," he answered with a laugh.
"Holmes is a little too scientific for my tastes--it approaches to
cold-bloodedness. I could imagine his giving a friend a little pinch of
the latest vegetable alkaloid, not out of malevolence, you understand,
but simply out of a spirit of inquiry in order to have an accurate idea
of the effects. To do him justice, I think that he would take it himself
with the same readiness. He appears to have a passion for definite and
exact knowledge."

"Very right too."

"Yes, but it may be pushed to excess. When it comes to beating the
subjects in the dissecting-rooms with a stick, it is certainly taking
rather a bizarre shape."

"Beating the subjects!"

"Yes, to verify how far bruises may be produced after death. I saw him
at it with my own eyes."

"And yet you say he is not a medical student?"
abcdef
"No. Heaven knows what the objects of his studies are. But here we
are, and you must form your own impressions about him." As he spoke, we
turned down a narrow lane and passed through a small side-door, which
opened into a wing of the great hospital. It was familiar ground to me,
and I needed no guiding as we ascended the bleak stone staircase and
made our way down the long corridor with its vista of whitewashed
wall and dun-coloured doors. Near the further end a low arched passage
branched away from it and led to the chemical laboratory.

This was a lofty chamber, lined and littered with countless bottles.
Broad, low tables were scattered about, which bristled with retorts,
test-tubes, and little Bunsen lamps, with their blue flickering flames.
There was only one student in the room, who was bending over a distant
table absorbed in his work. At the sound of our steps he glanced round
and sprang to his feet with a cry of pleasure. "I've found it! I've
found it," he shouted to my companion, running towards us with a
test-tube in his hand. "I have found a re-agent which is precipitated
by hoemoglobin, [4] and by nothing else." Had he discovered a gold mine,
greater delight could not have shone upon his features.

"Dr. Watson, Mr. Sherlock Holmes," said Stamford, introducing us.

"How are you?" he said cordially, gripping my hand with a strength
for which I should hardly have given him credit. "You have been in
Afghanistan, I perceive."

"How on earth did you know that?" I asked in astonishment.

"Never mind," said he, chuckling to himself. "The question now is about
hoemoglobin. No doubt you see the significance of this discovery of
mine?"

"It is interesting, chemically, no doubt," I answered, "but
practically----"

"Why, man, it is the most practical medico-legal discovery for years.
Don't you see that it gives us an infallible test for blood stains. Come
over here now!" He seized me by the coat-sleeve in his eagerness, and
drew me over to the table at which he had been working. "Let us have
some fresh blood," he said, digging a long bodkin into his finger, and
drawing off the resulting drop of blood in a chemical pipette. "Now, I
add this small quantity of blood to a litre of water. You perceive that
the resulting mixture has the appearance of pure water. The proportion
of blood cannot be more than one in a million. I have no doubt, however,
that we shall be able to obtain the characteristic reaction." As he
spoke, he threw into the vessel a few white crystals, and then added
some drops of a transparent fluid. In an instant the contents assumed a
dull mahogany colour, and a brownish dust was precipitated to the bottom
of the glass jar.

"Ha! ha!" he cried, clapping his hands, and looking as delighted as a
child with a new toy. "What do you think of that?"

"It seems to be a very delicate test," I remarked.

"Beautiful! beautiful! The old Guiacum test was very clumsy and
uncertain. So is the microscopic examination for blood corpuscles. The
latter is valueless if the stains are a few hours old. Now, this appears
to act as well whether the blood is old or new. Had this test been
invented, there are hundreds of men now walking the earth who would long
ago have paid the penalty of their crimes."

"Indeed!" I murmured.

"Criminal cases are continually hinging upon that one point. A man is
suspected of a crime months perhaps after it has been committed. His
linen or clothes are examined, and brownish stains discovered upon them.
Are they blood stains, or mud stains, or rust stains, or fruit stains,
or what are they? That is a question which has puzzled many an expert,
and why? Because there was no reliable test. Now we have the Sherlock
Holmes' test, and there will no longer be any difficulty."

His eyes fairly glittered as he spoke, and he put his hand over his
heart and bowed as if to some applauding crowd conjured up by his
imagination.

"You are to be congratulated," I remarked, considerably surprised at his
enthusiasm.

"There was the case of Von Bischoff at Frankfort last year. He would
certainly have been hung had this test been in existence. Then there was
Mason of Bradford, and the notorious Muller, and Lefevre of Montpellier,
and Samson of New Orleans. I could name a score of cases in which it
would have been decisive."

"You seem to be a walking calendar of crime," said Stamford with a
laugh. "You might start a paper on those lines. Call it the 'Police News
of the Past.'"

"Very interesting reading it might be made, too," remarked Sherlock
Holmes, sticking a small piece of plaster over the prick on his finger.
"I have to be careful," he continued, turning to me with a smile, "for I
dabble with poisons a good deal." He held out his hand as he spoke, and
I noticed that it was all mottled over with similar pieces of plaster,
and discoloured with strong acids.

"We came here on business," said Stamford, sitting down on a high
three-legged stool, and pushing another one in my direction with
his foot. "My friend here wants to take diggings, and as you were
complaining that you could get no one to go halves with you, I thought
that I had better bring you together."

Sherlock Holmes seemed delighted at the idea of sharing his rooms with
me. "I have my eye on a suite in Baker Street," he said, "which would
suit us down to the ground. You don't mind the smell of strong tobacco,
I hope?"

"I always smoke 'ship's' myself," I answered.

"That's good enough. I generally have chemicals about, and occasionally
do experiments. Would that annoy you?"

"By no means."

"Let me see--what are my other shortcomings. I get in the dumps at
times, and don't open my mouth for days on end. You must not think I am
sulky when I do that. Just let me alone, and I'll soon be right. What
have you to confess now? It's just as well for two fellows to know the
worst of one another before they begin to live together."

I laughed at this cross-examination. "I keep a bull pup," I said, "and
I object to rows because my nerves are shaken, and I get up at all sorts
of ungodly hours, and I am extremely lazy. I have another set of vices
when I'm well, but those are the principal ones at present."

"Do you include violin-playing in your category of rows?" he asked,
anxiously.

"It depends on the player," I answered. "A well-played violin is a treat
for the gods--a badly-played one----"

"Oh, that's all right," he cried, with a merry laugh. "I think we may
consider the thing as settled--that is, if the rooms are agreeable to
you."

"When shall we see them?"

"Call for me here at noon to-morrow, and we'll go together and settle
everything," he answered.

"All right--noon exactly," said I, shaking his hand.

We left him working among his chemicals, and we walked together towards
my hotel.

"By the way," I asked suddenly, stopping and turning upon Stamford, "how
the deuce did he know that I had come from Afghanistan?"

My companion smiled an enigmatical smile. "That's just his little
peculiarity," he said. "A good many people have wanted to know how he
finds things out."

"Oh! a mystery is it?" I cried, rubbing my hands. "This is very piquant.
I am much obliged to you for bringing us together. 'The proper study of
mankind is man,' you know."

"You must study him, then," Stamford said, as he bade me good-bye.
"You'll find him a knotty problem, though. I'll wager he learns more
about you than you about him. Good-bye."

"Good-bye," I answered, and strolled on to my hotel, considerably
interested in my new acquaintance.




CHAPTER II. THE SCIENCE OF DEDUCTION.


WE met next day as he had arranged, and inspected the rooms at No. 221B,
[5] Baker Street, of which he had spoken at our meeting. They
consisted of a couple of comfortable bed-rooms and a single large
airy sitting-room, cheerfully furnished, and illuminated by two broad
windows. So desirable in every way were the apartments, and so moderate
did the terms seem when divided between us, that the bargain was
concluded upon the spot, and we at once entered into possession.
That very evening I moved my things round from the hotel, and on the
following morning Sherlock Holmes followed me with several boxes and
portmanteaus. For a day or two we were busily employed in unpacking and
laying out our property to the best advantage. That done, we
gradually began to settle down and to accommodate ourselves to our new
surroundings.

Holmes was certainly not a difficult man to live with. He was quiet
in his ways, and his habits were regular. It was rare for him to be
up after ten at night, and he had invariably breakfasted and gone out
before I rose in the morning. Sometimes he spent his day at the chemical
laboratory, sometimes in the dissecting-rooms, and occasionally in long
walks, which appeared to take him into the lowest portions of the City.
Nothing could exceed his energy when the working fit was upon him; but
now and again a reaction would seize him, and for days on end he would
lie upon the sofa in the sitting-room, hardly uttering a word or moving
a muscle from morning to night. On these occasions I have noticed such
a dreamy, vacant expression in his eyes, that I might have suspected him
of being addicted to the use of some narcotic, had not the temperance
and cleanliness of his whole life forbidden such a notion.

As the weeks went by, my interest in him and my curiosity as to his
aims in life, gradually deepened and increased. His very person and
appearance were such as to strike the attention of the most casual
observer. In height he was rather over six feet, and so excessively
lean that he seemed to be considerably taller. His eyes were sharp and
piercing, save during those intervals of torpor to which I have alluded;
and his thin, hawk-like nose gave his whole expression an air of
alertness and decision. His chin, too, had the prominence and squareness
which mark the man of determination. His hands were invariably
blotted with ink and stained with chemicals, yet he was possessed of
extraordinary delicacy of touch, as I frequently had occasion to observe
when I watched him manipulating his fragile philosophical instruments.

The reader may set me down as a hopeless busybody, when I confess how
much this man stimulated my curiosity, and how often I endeavoured
to break through the reticence which he showed on all that concerned
himself. Before pronouncing judgment, however, be it remembered, how
objectless was my life, and how little there was to engage my attention.
My health forbade me from venturing out unless the weather was
exceptionally genial, and I had no friends who would call upon me and
break the monotony of my daily existence. Under these circumstances, I
eagerly hailed the little mystery which hung around my companion, and
spent much of my time in endeavouring to unravel it.

He was not studying medicine. He had himself, in reply to a question,
confirmed Stamford's opinion upon that point. Neither did he appear to
have pursued any course of reading which might fit him for a degree in
science or any other recognized portal which would give him an entrance
into the learned world. Yet his zeal for certain studies was remarkable,
and within eccentric limits his knowledge was so extraordinarily ample
and minute that his observations have fairly astounded me. Surely no man
would work so hard or attain such precise information unless he had some
definite end in view. Desultory readers are seldom remarkable for the
exactness of their learning. No man burdens his mind with small matters
unless he has some very good reason for doing so.

His ignorance was as remarkable as his knowledge. Of contemporary
literature, philosophy and politics he appeared to know next to nothing.
Upon my quoting Thomas Carlyle, he inquired in the naivest way who he
might be and what he had done. My surprise reached a climax, however,
when I found incidentally that he was ignorant of the Copernican Theory
and of the composition of the Solar System. That any civilized human
being in this nineteenth century should not be aware that the earth
travelled round the sun appeared to be to me such an extraordinary fact
that I could hardly realize it.

"You appear to be astonished," he said, smiling at my expression of
surprise. "Now that I do know it I shall do my best to forget it."

"To forget it!"

"You see," he explained, "I consider that a man's brain originally is
like a little empty attic, and you have to stock it with such furniture
as you choose. A fool takes in all the lumber of every sort that he
comes across, so that the knowledge which might be useful to him gets
crowded out, or at best is jumbled up with a lot of other things so that
he has a difficulty in laying his hands upon it. Now the skilful workman
is very careful indeed as to what he takes into his brain-attic. He will
have nothing but the tools which may help him in doing his work, but of
these he has a large assortment, and all in the most perfect order. It
is a mistake to think that that little room has elastic walls and can
distend to any extent. Depend upon it there comes a time when for every
addition of knowledge you forget something that you knew before. It is
of the highest importance, therefore, not to have useless facts elbowing
out the useful ones."
@@ -72,7 +72,7 @@ rgtest!(f7_stdin, |dir: Dir, mut cmd: TestCommand| {
 sherlock:For the Doctor Watsons of this world, as opposed to the Sherlock
 sherlock:be, to a very large extent, the result of luck. Sherlock Holmes
 ";
-    eqnice!(expected, cmd.arg("-f-").pipe("Sherlock"));
+    eqnice!(expected, cmd.arg("-f-").pipe(b"Sherlock"));
 });

 // See: https://github.com/BurntSushi/ripgrep/issues/20
@@ -752,12 +752,11 @@ rgtest!(unrestricted2, |dir: Dir, mut cmd: TestCommand| {

 rgtest!(unrestricted3, |dir: Dir, mut cmd: TestCommand| {
     dir.create("sherlock", SHERLOCK);
-    dir.create("file", "foo\x00bar\nfoo\x00baz\n");
+    dir.create("hay", "foo\x00bar\nfoo\x00baz\n");
     cmd.arg("-uuu").arg("foo");

     let expected = "\
-file:foo\x00bar
-file:foo\x00baz
+Binary file hay matches (found \"\\u{0}\" byte around offset 3)
 ";
     eqnice!(expected, cmd.stdout());
 });
@@ -950,10 +949,35 @@ rgtest!(compressed_failing_gzip, |dir: Dir, mut cmd: TestCommand| {
     cmd.assert_non_empty_stderr();
 });

-rgtest!(binary_nosearch, |dir: Dir, mut cmd: TestCommand| {
+rgtest!(binary_convert, |dir: Dir, mut cmd: TestCommand| {
     dir.create("file", "foo\x00bar\nfoo\x00baz\n");
-    cmd.arg("foo").arg("file");
+    cmd.arg("--no-mmap").arg("foo").arg("file");

+    let expected = "\
+Binary file matches (found \"\\u{0}\" byte around offset 3)
+";
+    eqnice!(expected, cmd.stdout());
+});
+
+rgtest!(binary_convert_mmap, |dir: Dir, mut cmd: TestCommand| {
+    dir.create("file", "foo\x00bar\nfoo\x00baz\n");
+    cmd.arg("--mmap").arg("foo").arg("file");
+
+    let expected = "\
+Binary file matches (found \"\\u{0}\" byte around offset 3)
+";
+    eqnice!(expected, cmd.stdout());
+});
+
+rgtest!(binary_quit, |dir: Dir, mut cmd: TestCommand| {
+    dir.create("file", "foo\x00bar\nfoo\x00baz\n");
+    cmd.arg("--no-mmap").arg("foo").arg("-gfile");
+    cmd.assert_err();
+});
+
+rgtest!(binary_quit_mmap, |dir: Dir, mut cmd: TestCommand| {
+    dir.create("file", "foo\x00bar\nfoo\x00baz\n");
+    cmd.arg("--mmap").arg("foo").arg("-gfile");
     cmd.assert_err();
 });

@@ -88,7 +88,7 @@ rgtest!(stdin, |_: Dir, mut cmd: TestCommand| {
 1:For the Doctor Watsons of this world, as opposed to the Sherlock
 2:Holmeses, success in the province of detective work must always
 ";
-    eqnice!(expected, cmd.pipe(SHERLOCK));
+    eqnice!(expected, cmd.pipe(SHERLOCK.as_bytes()));
 });

 // Test that multiline search and contextual matches work.
@@ -7,6 +7,8 @@ mod hay;
 // Utilities for making tests nicer to read and easier to write.
 mod util;

+// Tests for ripgrep's handling of binary files.
+mod binary;
 // Tests related to most features in ripgrep. If you're adding something new
 // to ripgrep, tests should probably go in here.
 mod feature;
@@ -297,7 +297,7 @@ impl TestCommand {
     }

     /// Pipe `input` to a command, and collect the output.
-    pub fn pipe(&mut self, input: &str) -> String {
+    pub fn pipe(&mut self, input: &[u8]) -> String {
         self.cmd.stdin(process::Stdio::piped());
         self.cmd.stdout(process::Stdio::piped());
         self.cmd.stderr(process::Stdio::piped());
@@ -309,7 +309,7 @@ impl TestCommand {
         let mut stdin = child.stdin.take().expect("expected standard input");
         let input = input.to_owned();
         let worker = thread::spawn(move || {
-            write!(stdin, "{}", input)
+            stdin.write_all(&input)
         });

         let output = self.expect_success(child.wait_with_output().unwrap());