1
0
mirror of https://github.com/BurntSushi/ripgrep.git synced 2024-12-12 19:18:24 +02:00
Commit Graph

170 Commits

Author SHA1 Message Date
Andrew Gallant
e36b65a11a
windows: fix OneDrive traversals
This commit fixes a bug on Windows where directory traversals were
completely broken when attempting to scan OneDrive directories that use
the "file on demand" strategy.

The specific problem was that Rust's standard library treats OneDrive
directories as reparse points instead of directories, which causes
methods like `FileType::is_file` and `FileType::is_dir` to always return
false, even when retrieved via methods like `metadata` that purport to
follow symbolic links.

We fix this by peppering our code with checks on the underlying file
attributes exposed by Windows. We consider an entry a directory if and
only if the directory bit is set on the attributes. We are careful to
make sure that the code remains the same on non-Windows platforms.

Note that we also bump the dependency on `walkdir`, which contains a
similar fix for its traversals.

This bug is recorded upstream:
https://github.com/rust-lang/rust/issues/46484

Upstream also has a pending PR:
https://github.com/rust-lang/rust/pull/47956

Fixes #705
2018-02-01 21:11:02 -05:00
llogiq
e05023b406 deps: update bytecount
This improves performance with current nightly rustc.
2018-01-30 16:34:30 -05:00
Balaji Sivaraman
f007f940c5 search: add support for searching compressed files
This commit adds opt-in support for searching compressed files during
recursive search. This behavior is only enabled when the
`-z/--search-zip` flag is passed to ripgrep. When enabled, a limited set
of common compression formats are recognized via file extension, and a
new process is spawned to perform the decompression. ripgrep then
searches the stdout of that spawned process.

Closes #539
2018-01-30 09:13:53 -05:00
Igor Gnatenko
50616935a9 deps: update bytecount to 0.3
Signed-off-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
2018-01-09 07:09:29 -05:00
Igor Gnatenko
5e73075ef5 deps: bump lazy_static to 1 2017-12-30 16:50:18 -05:00
Andrew Gallant
7dd1194a97 deps: update to latest regex crate
The regex update fixes the Rust nightly build failure by in turn updating
its simd dependency to 2.x.

The regex update also includes a literal optimization that uses Tuned
Boyer Moore.

Fixes #617
2017-12-30 16:50:18 -05:00
Lilian A. Moraru
636bbc7c8f Speeding CI builds 2017-12-19 08:16:31 -05:00
Igor Gnatenko
373e0595e6 bump bytecount to 0.2
Signed-off-by: Igor Gnatenko <i.gnatenko.brain@gmail.com>
2017-11-22 09:52:47 -05:00
Dan Burkert
263e8f92b9 Update to memmap 0.6
`memmap` 0.6.0 introduces major API changes in anticipation of a 1.0
release. See https://github.com/danburkert/memmap-rs/releases/tag/0.6.0
for more information. CC danburkert/memmap-rs#33.
2017-11-22 06:57:15 -05:00
Andrew Gallant
c4e1945384
cargo: bump to 0.7.1 2017-10-22 10:33:09 -04:00
Andrew Gallant
8c8c83a1f8
deps: bump ignore to 0.3.1 2017-10-22 10:31:42 -04:00
Andrew Gallant
efa4de8126
cargo: bump to 0.7.0 2017-10-21 22:40:10 -04:00
Andrew Gallant
cd575d99f8
ignore: upgrade to walkdir 2
The uninteresting bits of this commit involve mechanical changes for
updates to walkdir 2. The more interesting bits of this commit are the
breaking changes, although none of them should require any significant
change on users of this library. The breaking changes are as follows:

* `DirEntry::path_is_symbolic_link` has been renamed to
  `DirEntry::path_is_symlink`. This matches the conventions in the
  standard library, and also the corresponding name change in walkdir.
* Removed the `From<walkdir::Error> for ignore::Error` impl. This was
  intended to only be used internally, but was the only thing that
  made `walkdir` a public dependency of `ignore`. Therefore, we remove
  it since it seems unnecessary.
* Renamed `WalkBuilder::sort_by` to `WalkBuilder::sort_by_file_name`,
  and changed the type of the comparator from

    Fn(&OsString, &OsString) -> cmp::Ordering + 'static

  to

    Fn(&OsStr, &OsStr) -> cmp::Ordering + Send + Sync + 'static

  The corresponding change in `walkdir` retains the `sort_by` name, but
  gives the comparator a pair of `&DirEntry` values instead of a pair
  of `&OsStr` values. Ideally, `ignore` would hand off its own pair of
  `&ignore::DirEntry` values, but this requires more design work. So for
  now, we retain previous functionality, but leave room to make a proper
  `sort_by` method.

[breaking-change]
2017-10-21 22:40:09 -04:00
Andrew Gallant
1267f01c24
deps: upgrade to memchr 2 2017-10-21 22:40:09 -04:00
Andrew Gallant
322d5515e5
style: 80 cols 2017-10-21 22:40:09 -04:00
Henri Sivonen
af77dd55a2 Update encoding_rs to 0.7.0 2017-08-28 08:14:33 -04:00
Jack O'Connor
3065a8c9c8 restore the default SIGPIPE behavior as a temporary workaround
See https://github.com/BurntSushi/ripgrep/issues/200.
2017-08-27 15:01:05 -04:00
Andrew Gallant
e10544f819
0.6.0 2017-08-23 19:54:50 -04:00
Andrew Gallant
dc7e39a6ba
cargo: add crates.io badges 2017-08-23 19:54:33 -04:00
Andrew Gallant
36c16eb00c
bump deps 2017-08-23 19:52:13 -04:00
Henri Sivonen
fe7fe74b0a Pass the simd-accel feature to encoding_rs 2017-08-20 08:42:31 -04:00
Vurich
b3a9c34515 Remove unused libc dependency 2017-08-08 07:03:58 -04:00
Igor Gnatenko
a2d4c03c71 bump encoding_rs to 0.6 2017-07-30 18:00:50 -04:00
Andrew Gallant
92e5fad27d
ignore-0.2.2 2017-07-17 17:56:33 -04:00
Andrew Gallant
e0e8f26c56
bump ignore version 2017-07-12 22:26:23 -04:00
Andrew Gallant
5a666b042d
bump ripgrep, ignore, globset
The `ignore` and `globset` crates both got breaking changes in the
course of fixing #444, so increase 0.x to 0.(x+1).
2017-05-11 19:12:20 -04:00
Andrew Gallant
0b685c8429 deps: update clap to 2.24
Fixes #442
2017-05-08 19:24:11 -04:00
Andrew Gallant
ac1c95a6d9 0.5.1 2017-04-09 09:47:00 -04:00
Andrew Gallant
487713aa34 bump ignore 2017-04-09 09:45:00 -04:00
Kevin K
0c298f60a6 updates clap and removes home rolled -h/--help distinction
This commit updates clap to v2.23.0

The update contained a bug fix in clap that results in broken code in
ripgrep. ripgrep was relying on the bug, but this commit fixes that
issue. The bug centered around not being able to override the
auto-generated help message by supplying a flag with a long of `help`.

Normally, supplying a flag with a long of `help` means whenever the user
passes `--help`, the consuming code (e.g. ripgrep) is responsible for
displaying the help message. However, due to the bug in clap this wasn't
necessary for ripgrep to do unless the user passed `-h`. With the bug
fixed, it meant the user passing `--help` and clap expected ripgrep to
display the help, yet ripgrep expected clap to display the help. This
has been fixed in this commit of ripgrep.

All well now!

v2.23.0 also brings the abilty to use `Arg::help` or `Arg::long_help`
allowing one to distinguish between `-h` and `--help`. This commit
leaves all doc strings in the `lazy_static!` hashmap however only for
aesthetic reasons.

This means all home rolled handling of `-h`/`--help` has been removed
from ripgrep, yet functionality *and* appearances are 100% the same.
2017-04-05 11:38:58 -04:00
Andrew Gallant
78847b65c8 0.5.0 2017-03-12 22:32:43 -04:00
Andrew Gallant
8bbe58d623 Add support for additional text encodings.
This includes, but is not limited to, UTF-16, latin-1, GBK, EUC-JP and
Shift_JIS. (Courtesy of the `encoding_rs` crate.)

Specifically, this feature enables ripgrep to search files that are
encoded in an encoding other than UTF-8. The list of available encodings
is tied directly to what the `encoding_rs` crate supports, which is in
turn tied to the Encoding Standard. The full list of available encodings
can be found here: https://encoding.spec.whatwg.org/#concept-encoding-get

This pull request also introduces the notion that text encodings can be
automatically detected on a best effort basis. Currently, the only
support for this is checking for a UTF-16 bom. In all other cases, a
text encoding of `auto` (the default) implies a UTF-8 or ASCII
compatible source encoding. When a text encoding is otherwise specified,
it is unconditionally used for all files searched.

Since ripgrep's regex engine is fundamentally built on top of UTF-8,
this feature works by transcoding the files to be searched from their
source encoding to UTF-8. This transcoding only happens when:

1. `auto` is specified and a non-UTF-8 encoding is detected.
2. A specific encoding is given by end users (including UTF-8).

When transcoding occurs, errors are handled by automatically inserting
the Unicode replacement character. In this case, ripgrep's output is
guaranteed to be valid UTF-8 (excluding non-UTF-8 file paths, if they
are printed).

In all other cases, the source text is searched directly, which implies
an assumption that it is at least ASCII compatible, but where UTF-8 is
most useful. In this scenario, encoding errors are not detected. In this
case, ripgrep's output will match the input exactly, byte-for-byte.

This design may not be optimal in all cases, but it has some advantages:

1. In the happy path ("UTF-8 everywhere") remains happy. I have not been
   able to witness any performance regressions.
2. In the non-UTF-8 path, implementation complexity is kept relatively
   low. The cost here is transcoding itself. A potentially superior
   implementation might build decoding of any encoding into the regex
   engine itself. In particular, the fundamental problem with
   transcoding everything first is that literal optimizations are nearly
   negated.

Future work should entail improving the user experience. For example, we
might want to auto-detect more text encodings. A more elaborate UX
experience might permit end users to specify multiple text encodings,
although this seems hard to pull off in an ergonomic way.

Fixes #1
2017-03-12 19:54:48 -04:00
Marc Tiehuis
33ec988d70 Remove regex build-dependency in Cargo.toml 2017-03-08 10:17:18 -05:00
tiehuis
714ae82241 Add --max-filesize option to cli
The --max-filesize option allows filtering files which are larger than
the specified limit. This is potentially useful if one is attempting to
search a number of large files without common file-types/suffixes.

See #369.
2017-03-08 10:17:18 -05:00
Andrew Gallant
4e8c0fc4ad bump clap to 2.20.5
Fixes #383
2017-02-25 18:43:13 -05:00
Igor Gnatenko
da1764dfd1 update env_logger to 0.4 2017-02-25 17:46:43 -05:00
Andrew Gallant
a114b86063 update termcolor dep 2017-02-18 15:09:25 -05:00
Andrew Gallant
8ac5bc0147 Remove Windows deps from ripgrep proper.
All Windows specific code has been (mostly) pushed out of ripgrep and
into its constituent libraries.
2017-02-18 15:06:20 -05:00
Andrew Gallant
b67886264f Add 'text-processing' category. 2017-01-21 11:31:09 -05:00
Jake Goulding
e67ab459d3 Add categories to Cargo.toml 2017-01-20 14:20:28 -05:00
Andrew Gallant
f5a2d022ec Replace internal atty module with atty crate.
This removes all use of explicit unsafe in ripgrep proper except for
one: accessing the contents of a memory map. (Which may never go away.)
2017-01-15 16:32:30 -05:00
Andrew Gallant
057ed6305a 0.4.0 2017-01-13 23:46:21 -05:00
Andrew Gallant
461e0c4e33 Don't search stdout redirected file.
When running ripgrep like this:

    rg foo > output

we must be careful not to search `output` since ripgrep is actively writing
to it. Searching it can cause massive blowups where the file grows without
bound.

While this is conceptually easy to fix (check the inode of the redirection
and the inode of the file you're about to search), there are a few problems
with it.

First, inodes are a Unix thing, so we need a Windows specific solution to
this as well. To resolve this concern, I created a new crate, `same-file`,
which provides a cross platform abstraction.

Second, stat'ing every file is costly. This is not avoidable on Windows,
but on Unix, we can get the inode number directly from directory traversal.
However, this information wasn't exposed, but now it is (through both the
ignore and walkdir crates).

Fixes #286
2017-01-09 16:12:08 -05:00
Andrew Gallant
163e00677a Update to regex 0.2. 2017-01-01 01:03:21 -05:00
Andrew Gallant
d58236fbdc bump various versions 2016-12-30 15:44:08 -05:00
Andrew Gallant
de5cb7d22e Remove special ^C handling.
This means that ripgrep will no longer try to reset your colors in your
terminal if you kill it while searching. This could result in messing up
the colors in your terminal, and the fix is to simply run some other
command that resets them for you. For example:

    $ echo -ne "\033[0m"

The reason why the ^C handling was removed is because it is irrevocably
broken on Windows and is impossible to do correctly and efficiently in
ANSI terminals.

Fixes #281
2016-12-24 12:53:09 -05:00
Andrew Gallant
9911cd0cd9 Remove ~ dependency on clap.
The point of the ~ dependency was to avoid implicitly increasing the
minimum Rust version required to compile ripgrep. However, clap's policy
is to support at least two prior releases of Rust (which roughly
corresponds to the convention that others use too), and that is probably
good enough.

The problem with using a ~ dependency is that it can make packaging
ripgrep in Linux distros difficult, because it means the packager may be
forced to package multiple compatible versions of the same library.

Fixes #271
2016-12-24 09:58:15 -05:00
Andrew Gallant
de33003527 0.3.2 2016-12-07 10:59:06 -05:00
Andrew Gallant
d66812102b Fix leading hypen bug by updating clap.
Fixes #270
2016-12-06 17:29:34 -05:00
Andrew Gallant
c4a6733f3b 0.3.1 2016-11-21 20:53:52 -05:00
Andrew Gallant
a5e7f176f1 Use clap ~2.18.0.
This is to ensure that we don't silently update a minor version of clap,
which could include a breaking change.

(An update to 2.19 should be done soon.)
2016-11-21 09:20:43 -05:00
Andrew Gallant
aef46beaf2 0.3.0 2016-11-20 16:07:25 -05:00
Andrew Gallant
e8a30cb893 Completely re-work colored output and tty handling.
This commit completely guts all of the color handling code and replaces
most of it with two new crates: wincolor and termcolor. wincolor
provides a simple API to coloring using the Windows console and
termcolor provides a platform independent coloring API tuned for
multithreaded command line programs. This required a lot more
flexibility than what the `term` crate provided, so it was dropped.
We instead switch to writing ANSI escape sequences directly and ignore
the TERMINFO database.

In addition to fixing several bugs, this commit also permits end users
to customize colors to a certain extent. For example, this command will
set the match color to magenta and the line number background to yellow:

    rg --colors 'match:fg:magenta' --colors 'line:bg:yellow' foo

For tty handling, we've adopted a hack from `git` to do tty detection in
MSYS/mintty terminals. As a result, ripgrep should get both color
detection and piping correct on Windows regardless of which terminal you
use.

Finally, switch to line buffering. Performance doesn't seem to be
impacted and it's an otherwise more user friendly option.

Fixes #37, Fixes #51, Fixes #94, Fixes #117, Fixes #182, Fixes #231
2016-11-20 11:14:52 -05:00
Andrew Gallant
92dc402f7f Switch from Docopt to Clap.
There were two important reasons for the switch:

1. Performance. Docopt does poorly when the argv becomes large, which is
   a reasonable common use case for search tools. (e.g., use with xargs)
2. Better failure modes. Clap knows a lot more about how a particular
   argv might be invalid, and can therefore provide much clearer error
   messages.

While both were important, (1) made it urgent.

Note that since Clap requires at least Rust 1.11, this will in turn
increase the minimum Rust version supported by ripgrep from Rust 1.9 to
Rust 1.11. It is therefore a breaking change, so the soonest release of
ripgrep with Clap will have to be 0.3.

There is also at least one subtle breaking change in real usage.
Previous to this commit, this used to work:

    rg -e -foo

Where this would cause ripgrep to search for the string `-foo`. Clap
currently has problems supporting this use case
(see: https://github.com/kbknapp/clap-rs/issues/742),
but it can be worked around by using this instead:

    rg -e [-]foo

or even

    rg [-]foo

and this still works:

    rg -- -foo

This commit also adds Bash, Fish and PowerShell completion files to the
release, fixes a bug that prevented ripgrep from working on file
paths containing invalid UTF-8 and shows short descriptions in the
output of `-h` but longer descriptions in the output of `--help`.

Fixes #136, Fixes #189, Fixes #210, Fixes #230
2016-11-17 19:53:41 -05:00
Andrew Gallant
5462af4434 Pin rustc-serialize to 0.3.19.
See: https://github.com/rust-lang-nursery/rustc-serialize/pull/159
2016-11-09 20:28:58 -05:00
Andrew Gallant
d2e70da040 0.2.9 2016-11-09 19:07:25 -05:00
Andrew Gallant
64dc9b6709 update deps 2016-11-09 18:54:22 -05:00
Andrew Gallant
18943b9317 0.2.8 2016-11-06 16:16:48 -05:00
Andrew Gallant
4ca15a8a51 simd-accel should not invoke avx-accel.
This was a silly transcription error.
2016-11-06 16:15:23 -05:00
Andrew Gallant
2daef51fe5 0.2.7 2016-11-06 15:49:25 -05:00
Andrew Gallant
dada75d2a7 Update sub-crate dependency versions. 2016-11-06 15:48:40 -05:00
Andrew Gallant
5bd0edbbe1 Actually use simd/avx optimizations in bytecount crate.
Also update compile script.
2016-11-05 22:44:33 -04:00
Andre Bogus
02de97b8ce Use the bytecount crate for fast line counting.
Fixes #128
2016-11-05 22:29:26 -04:00
Andrew Gallant
b272be25fa Add parallel recursive directory iterator.
This adds a new walk type in the `ignore` crate, `WalkParallel`, which
provides a way for recursively iterating over a set of paths in parallel
while respecting various ignore rules.

The API is a bit strange, as a closure producing a closure isn't
something one often sees, but it does seem to work well.

This also allowed us to simplify much of the worker logic in ripgrep
proper, where MultiWorker is now gone.
2016-11-05 21:45:55 -04:00
Andrew Gallant
f147f3aa39 0.2.6 2016-10-31 20:01:37 -04:00
Andrew Gallant
d85a6dd5c8 update ignore dependency 2016-10-31 20:01:31 -04:00
Andrew Gallant
6b038511c7 0.2.5 2016-10-29 22:42:28 -04:00
Andrew Gallant
a075a462fa 0.2.4 2016-10-29 22:40:02 -04:00
Andrew Gallant
91646f6cca bump ignore to 0.1.1 2016-10-29 22:19:00 -04:00
Brian Campbell
79a8d0ab3f Reset the terminal when Ctrl-C is pressed
If a user hits Ctrl-C to exit out of a search in the middle of printing
a line, we don't want to leave the terminal colors screwed up for them.
Catch Ctrl-C using the ctrlc crate, obtain a stdout lock to ensure that
other threads don't continue writing after we do so, reset the terminal,
and exit the program.

Closes #119
2016-10-29 21:23:05 -04:00
Andrew Gallant
d79add341b Move all gitignore matching to separate crate.
This PR introduces a new sub-crate, `ignore`, which primarily provides a
fast recursive directory iterator that respects ignore files like
gitignore and other configurable filtering rules based on globs or even
file types.

This results in a substantial source of complexity moved out of ripgrep's
core and into a reusable component that others can now (hopefully)
benefit from.

While much of the ignore code carried over from ripgrep's core, a
substantial portion of it was rewritten with the following goals in
mind:

1. Reuse matchers built from gitignore files across directory iteration.
2. Design the matcher data structure to be amenable for parallelizing
   directory iteration. (Indeed, writing the parallel iterator is the
   next step.)

Fixes #9, #44, #45
2016-10-29 20:48:59 -04:00
Andrew Gallant
d8712daf27 0.2.3 2016-10-11 19:44:54 -04:00
Andrew Gallant
247a9398f4 Switch to thread_local crate in lieu of thread_local!.
This is to work around a bug where using a thread_local! was causing
a segfault on macos.

Fixes #164.
2016-10-11 18:23:49 -04:00
Andrew Gallant
4981991a6e 0.2.2 2016-10-10 22:24:36 -04:00
Andrew Gallant
51440f59cd Don't include HomebrewFormula in crate. 2016-10-10 22:24:28 -04:00
Andrew Gallant
e96d93034a Finish overhaul of glob matching.
This commit completes the initial move of glob matching to an external
crate, including fixing up cross platform support, polishing the
external crate for others to use and fixing a number of bugs in the
process.

Fixes #87, #127, #131
2016-10-10 19:24:18 -04:00
Andrew Gallant
fdf24317ac Move glob implementation to new crate.
It is isolated and complex enough that it deserves attention all on its
own. It's also eminently reusable.
2016-09-30 19:42:41 -04:00
Andrew Gallant
de79be2db2 0.2.1 2016-09-26 20:02:58 -04:00
Andrew Gallant
b1c52b52d6 0.2.0 2016-09-25 22:32:14 -04:00
Andrew Gallant
109bc3f78e bump grep to 0.1.3 2016-09-25 22:30:17 -04:00
Andrew Gallant
b33e9cba69 0.1.17 2016-09-23 11:26:23 -04:00
Andrew Gallant
d5c045469b Don't use panic-on-abort.
We don't really care anyway, it was there as an experiment, and it seems
to be causing problems.

Fixes #14.
2016-09-23 11:25:46 -04:00
Andrew Gallant
25c259112b 0.1.16 2016-09-22 21:32:41 -04:00
Andrew Gallant
2115774c6e 0.1.15 2016-09-22 19:20:11 -04:00
Andrew Gallant
1b14e245be 0.1.14 2016-09-22 17:48:49 -04:00
Andrew Gallant
263e2b012f 0.1.13 2016-09-21 21:07:40 -04:00
Andrew Gallant
525d051172 0.1.12 2016-09-21 20:47:44 -04:00
Andrew Gallant
fe84928c85 0.1.11 2016-09-21 19:37:37 -04:00
Andrew Gallant
c1c92e4fee 0.1.10 2016-09-21 19:27:16 -04:00
Andrew Gallant
aeb3a5ba0f bump grep to 0.1.2 2016-09-21 19:16:28 -04:00
Andrew Gallant
4d6b3c727e Bump regex version. 2016-09-21 19:05:15 -04:00
Andrew Gallant
b0d8ff6f4a 0.1.9 2016-09-21 16:41:28 -04:00
Andrew Gallant
0263a401f6 0.1.8 2016-09-21 07:08:37 -04:00
Andrew Gallant
f9bff90842 0.1.7 2016-09-20 22:13:49 -04:00
Andrew Gallant
9e2f10b893 0.1.6 2016-09-20 20:25:51 -04:00
Andrew Gallant
e7fb0fd267 0.1.5 2016-09-19 21:56:00 -04:00
Andrew Gallant
2cf1a08969 ripgrep 0.1.4 2016-09-18 18:19:02 -04:00
Andrew Gallant
6cb604f38f 0.1.3 2016-09-17 12:55:09 -04:00
Andrew Gallant
8f87a4e8ac 0.1.2 2016-09-17 11:36:11 -04:00
Andrew Gallant
d27d3e675f bump grep 2016-09-17 11:34:27 -04:00