This commit goes a long way toward refactoring glob sets so that the
code is easier to maintain going forward. In particular, it makes the
literal optimizations that glob sets used a lot more structured and much
easier to extend. Tests have also been modified to include glob sets.
There's still a bit of polish work left to do before a release.
This also fixes the immediate issue where large gitignore files were
causing ripgrep to slow way down. While we don't technically fix it for
good, we're a lot better about reducing the number of regexes we
compile. In particular, if a gitignore file contains thousands of
patterns that can't be matched more simply using literals, then ripgrep
will slow down again. We could fix this for good by avoiding RegexSet if
the number of regexes grows too large.
Fixes#134.
This helps #134 by avoiding a slow regex execution path, but doesn't
actually fix the problem. Namely, we've gone from "so slow I'm not going
to keep waiting for rg to finish" to "wow that was slow but at least it
finished before I lost my patience."
It didn't make sense for --quiet to be part of the printer, because --quiet
doesn't just mean "don't print," it also means, "stop after the first
match is found." This needs to be wired all the way up through directory
traversal, and it also needs to cause all of the search workers to quit
as well. We do it with an atomic that is only checked with --quiet is
given.
Fixes#116.
This was already working correctly in multithreaded mode, but in single
threaded mode, a file failing to open caused search to stop. That's bad.
Fixes#98.
This was a result of misinterpreting a feature in grep where NUL bytes
are replaced with \n. The primary reason for doing this is to avoid
excessive memory usage on truly binary data. However, grep only does this
when searching binary files as if they were binary, and which only reports
whether the file matched or not. When grep is told to search binary data
as text (the -a/--text flag), then it doesn't do any replacement so we
shouldn't either.
In general, this makes sense, because the user is essentially asserting
that a particular file that looks like binary is actually text. In that
case, we shouldn't try to replace any NUL bytes.
ripgrep doesn't actually support searching binary data for whether it
matches or not, so we don't actually need the replace_buf function.
However, it does seem like a potentially useful feature.
It seems silly, but on *nix, we can just dump the bytes of the path
straight to the terminal. There's no need to do a UTF-8 check, which
can be costly when printing lots of matches.
This means that `rg pat < file` won't do the expected thing and search
`fil`. Instead, it will recursively search the current directory for `pat`.
This isn't ideal, but is better than the previous behavior, which was to
wait for stdin when running `rg pat`, given the appearance of hanging
forever. The former is an important use case, but the latter is the
*central* use case of ripgrep, so we should make that work.
`rg` can still be used to search stdin on Windows, it just needs to be
done explicitly. e.g., `rg pat - < file` will search for `pat` in `file`.
Fixes#19
If no paths are given to ripgrep, only read from stdin if it's a file or
a FIFO. In particular, if something like `rg foo < /dev/null` is used,
then don't try to read from stdin.
Fixes#35, #81
Closes#26.
Acts like --count but emits only the paths of files with matches,
suitable for piping to xargs. Both mmap and no-mmap searches terminate
after the first match is found. Documentation updated and tests added.
Files like /proc/cpuinfo will advertise themselves as a normal file with
size 0. Normally, this isn't a problem, but if ripgrep decides to use a
memory map, it skipped searching if the file was empty since it's an error
to memory map an empty file. Instead of returning 0, we should just fall
back to standard read calls.
Fixes#55.