mirror of https://github.com/BurntSushi/ripgrep.git synced 2024-12-12 19:18:24 +02:00
39 Commits

Author SHA1 Message Date
Andrew Gallant
d79add341b Move all gitignore matching to separate crate.
This PR introduces a new sub-crate, `ignore`, which primarily provides a
fast recursive directory iterator that respects ignore files like
gitignore and other configurable filtering rules based on globs or even
file types.

This moves a substantial source of complexity out of ripgrep's core and
into a reusable component that others can now (hopefully) benefit from.

While much of the ignore code carried over from ripgrep's core, a
substantial portion of it was rewritten with the following goals in
mind:

1. Reuse matchers built from gitignore files across directory iteration.
2. Design the matcher data structure to be amenable to parallelizing
   directory iteration. (Indeed, writing the parallel iterator is the
   next step.)

Fixes #9, #44, #45
2016-10-29 20:48:59 -04:00
Andrew Gallant
f2e1711781 Fix bug when processing parent gitignore files.
This particular bug was triggered whenever a search was run in a directory
with a parent directory that contains a relevant .gitignore file. In
particular, before matching against a parent directory's gitignore rules,
a path's leading `./` was not stripped, which results in errant matching.

We now make sure `./` is stripped.

Fixes #184.
2016-10-16 10:15:11 -04:00
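The normalization described in the commit above can be sketched as follows. This is a minimal illustration only, assuming paths are handled as UTF-8 strings; the function name `strip_dot_slash` is hypothetical and is not taken from ripgrep's source.

```rust
/// Strip a single leading `./` from a path, if present.
///
/// Hypothetical helper for illustration; ripgrep's real code works on
/// platform paths rather than `&str`.
fn strip_dot_slash(path: &str) -> &str {
    // `str::strip_prefix` returns `None` when the prefix is absent,
    // in which case the path is returned unchanged.
    path.strip_prefix("./").unwrap_or(path)
}
```

With this in place, `./src/main.rs` and `src/main.rs` compare equal before any parent gitignore rule is consulted, while a name such as `.hidden` is left alone.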
Andrew Gallant
4737326ed3 Update regex-syntax for bug fix.
The bug fix was in expression pretty printing. ripgrep parses the regex
into an AST and may do some modifications to it, which requires the
ability to go from string -> AST -> string' -> AST' where string == string'
implies AST == AST'.

Also, add a regression test for the specific regex that tripped the bug.

Fixes #156.
2016-10-10 22:04:29 -04:00
Andrew Gallant
a3537aa32a Update darwin cfg attributes. 2016-10-10 21:48:47 -04:00
Andrew Gallant
4e52059ad6 Disable regression_131 test on darwin.
It's not clear why it's failing. Maybe it doesn't permit certain
characters in file paths?
2016-10-10 21:03:11 -04:00
Andrew Gallant
27a980c1bc Fix symlink test.
We attempt to run it on Windows, but I'm getting "access denied" errors
when trying to create a file symlink. So we disable the test on Windows.
2016-10-10 19:34:57 -04:00
Andrew Gallant
e8645dc8ae style nits 2016-10-10 19:27:12 -04:00
Andrew Gallant
e96d93034a Finish overhaul of glob matching.
This commit completes the initial move of glob matching to an external
crate, including fixing up cross platform support, polishing the
external crate for others to use and fixing a number of bugs in the
process.

Fixes #87, #127, #131
2016-10-10 19:24:18 -04:00
Ian Kerins
1c964372ad Always follow symlinks on explicit file arguments. 2016-10-08 22:40:03 -04:00
Andrew Gallant
175406df01 Refactor and test glob sets.
This commit goes a long way toward refactoring glob sets so that the
code is easier to maintain going forward. In particular, it makes the
literal optimizations that glob sets used a lot more structured and much
easier to extend. Tests have also been modified to include glob sets.

There's still a bit of polish work left to do before a release.

This also fixes the immediate issue where large gitignore files were
causing ripgrep to slow way down. While we don't technically fix it for
good, we're a lot better about reducing the number of regexes we
compile. In particular, if a gitignore file contains thousands of
patterns that can't be matched more simply using literals, then ripgrep
will slow down again. We could fix this for good by avoiding RegexSet if
the number of regexes grows too large.

Fixes #134.
2016-10-04 20:28:56 -04:00
Andrew Gallant
925d0db9f0 Add -s/--case-sensitive flag.
This flag overrides both --smart-case and --ignore-case.

Closes #124.
2016-09-28 16:32:29 -04:00
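One way to picture the precedence stated in the commit above is the sketch below. The `CaseFlags` struct and `is_case_insensitive` function are hypothetical names, and two points are assumptions rather than facts from the log: that --ignore-case takes priority over --smart-case, and that smart case means "case insensitive unless the pattern contains an uppercase character".

```rust
/// Hypothetical set of parsed flags; field names are illustrative only.
struct CaseFlags {
    case_sensitive: bool, // -s/--case-sensitive
    ignore_case: bool,    // -i/--ignore-case
    smart_case: bool,     // -S/--smart-case
}

/// Decide whether the search should be case insensitive.
fn is_case_insensitive(flags: &CaseFlags, pattern: &str) -> bool {
    // -s overrides both of the other flags.
    if flags.case_sensitive {
        return false;
    }
    // Assumption: an explicit --ignore-case wins over --smart-case.
    if flags.ignore_case {
        return true;
    }
    // Assumption: smart case is insensitive only for all-lowercase patterns.
    if flags.smart_case {
        return !pattern.chars().any(char::is_uppercase);
    }
    false
}
```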
Garrett Squire
babe80d498 add a max-depth option for directory traversal
CR and add integration test
2016-09-27 16:14:53 -07:00
Andrew Gallant
3e78fce3a3 Don't print empty lines in single threaded mode.
Fixes #99.
2016-09-26 19:57:23 -04:00
Andrew Gallant
7a3fd1f23f Add a --null flag.
This flag causes a NUL byte to follow any file path in ripgrep's output.

Closes #89.
2016-09-26 19:21:17 -04:00
Andrew Gallant
d306403440 Fix an off-by-one error with --column.
Fixes #105.
2016-09-26 19:09:59 -04:00
Andrew Gallant
b034b77798 Don't replace NUL bytes when searching binary files as text.
This was a result of misinterpreting a feature in grep where NUL bytes
are replaced with \n. The primary reason for doing this is to avoid
excessive memory usage on truly binary data. However, grep only does this
when searching binary files as if they were binary, in which case it only
reports whether the file matched or not. When grep is told to search binary
data as text (the -a/--text flag), it doesn't do any replacement, so we
shouldn't either.

In general, this makes sense, because the user is essentially asserting
that a particular file that looks like binary is actually text. In that
case, we shouldn't try to replace any NUL bytes.

ripgrep doesn't actually support searching binary data for whether it
matches or not, so we don't actually need the replace_buf function.
However, it does seem like a potentially useful feature.
2016-09-25 21:26:49 -04:00
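The behavior the commit above attributes to grep can be sketched like this. It is not the `replace_buf` function mentioned in the message; `prepare_buffer` and its `search_as_text` parameter are hypothetical names chosen for illustration.

```rust
/// Replace NUL bytes with `\n` unless the caller asked to treat the
/// data as text (the equivalent of -a/--text).
///
/// Hypothetical helper; a sketch of the described policy only.
fn prepare_buffer(buf: &mut [u8], search_as_text: bool) {
    if search_as_text {
        // The user asserted the data is text, so leave it untouched.
        return;
    }
    for byte in buf.iter_mut() {
        if *byte == 0 {
            *byte = b'\n';
        }
    }
}
```

Splitting on the substituted newlines keeps individual "lines" of binary data short, which is the memory-usage motivation the message describes.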
Andrew Gallant
6a8051b258 Don't union inner literals of repetitions.
If we do, this results in extracting `foofoofoo` from `(\wfoo){3}`,
which is wrong. This does prevent us from extracting `foofoofoo` from
`foo{3}`, which is unfortunate, but we miss plenty of other stuff too.
Literal extraction needs a good rethink (all the way down into the regex
engine).

Fixes #93
2016-09-25 20:10:28 -04:00
Andrew Gallant
ed94aedf27 Permit whitelisting hidden files in ignores.
Fixes #90
2016-09-25 18:31:41 -04:00
Andrew Gallant
3d6a39be06 Fix tests on Windows.
Mostly this is just using \\ instead of / in paths reported by the OS.
2016-09-25 15:45:51 -04:00
Andrew Schwartzmeyer
a8f3d9e87e Add --files-with-matches flag.
Closes #26.

Acts like --count but emits only the paths of files with matches,
suitable for piping to xargs. Both mmap and no-mmap searches terminate
after the first match is found. Documentation updated and tests added.
2016-09-24 21:40:17 -07:00
Andrew Gallant
1595f0faf5 Add --smart-case.
It does what it says on the tin.

Closes #70.
2016-09-24 21:51:04 -04:00
Andrew Gallant
8eeb0c0b60 Add --no-ignore-vcs flag.
This flag will respect .ignore but not .gitignore.

Closes #68.
2016-09-24 21:31:24 -04:00
Andrew Gallant
c8227e0cf3 Don't ignore first path when using --files.
This is a docopt oddity, but probably not a bug. If --files is given,
then just interpret the pattern (if not empty) as the first file path.

Fixes #64.
2016-09-24 20:22:02 -04:00
Andrew Gallant
b941c10b90 Fix directory whitelisting.
There was a bug in the translation from a gitignore pattern to a standard
glob where `!/dir` wasn't being interpreted as an absolute path.

Fixes #67.
2016-09-24 20:10:30 -04:00
Andrew Gallant
71ad9bf393 Fix trailing recursive globs in gitignore.
A standard glob of `foo/**` will match `foo`, but gitignore semantics
specify that `foo/**` should only match the contents of `foo` and not
`foo` itself. We capture those semantics by translating `foo/**` to
`foo/**/*`.

Fixes #30.
2016-09-24 19:44:06 -04:00
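The translation described in the commit above is small enough to sketch directly. The function name `fix_trailing_recursive` is hypothetical, and the sketch handles only the trailing `/**` case named in the message.

```rust
/// Rewrite a gitignore pattern ending in `/**` so that it matches only
/// the contents of the directory, not the directory itself.
///
/// Hypothetical helper for illustration.
fn fix_trailing_recursive(glob: &str) -> String {
    if glob.ends_with("/**") {
        // `foo/**` becomes `foo/**/*`.
        format!("{}/*", glob)
    } else {
        glob.to_string()
    }
}
```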
Andrew Gallant
a6e3cab65a Add --no-filename flag.
When this flag is set, a filename is never shown for a match.

Closes #20
2016-09-24 19:24:24 -04:00
Andrew Gallant
7b860affbe Change the default output of --files to elide './'.
This is kind of a ticky-tack change. I do think ./ as a prefix is a
reasonable default, *but* we strip ./ when showing search results, so it
does make sense to be consistent.

Fixes #21.
2016-09-24 19:18:48 -04:00
Andrew Gallant
346bad7dfc Fix handling of absolute patterns in parent gitignore files.
If a gitignore file in a *parent* directory is used, then it must be
matched relative to the directory it's in. ripgrep wasn't actually
adhering to this rule. Consider an example:

  .gitignore
  src
    llvm
      foo

Where `.gitignore` contains `/llvm/` and `foo` contains `test`. When
running `rg test` at the top-level directory, `foo` is correctly searched.
If you `cd` into `src` and re-run the same search, `foo` is ignored because
the `/llvm/` pattern is interpreted with respect to the current working
directory, which is wrong. The problem is that the path of `llvm` is
`./llvm`, which makes it look like it should match.

We fix this by rebuilding the directory path of each file when traversing
gitignores in parent directories. This does come with a small performance
hit.

Fixes #25.
2016-09-24 18:40:50 -04:00
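The rule in the commit above, that an absolute pattern is matched relative to the directory containing the gitignore file, can be sketched as below. This is a simplified illustration: `anchored_dir_matches` is a hypothetical name, it only handles a pattern of the form `/name/`, and it assumes the full path of each file is already known.

```rust
use std::ffi::OsStr;
use std::path::Path;

/// Check whether `file` falls under the anchored pattern `/dir_name/`
/// of a gitignore located in `gitignore_dir`.
///
/// Hypothetical helper; only the first path component is compared.
fn anchored_dir_matches(gitignore_dir: &Path, file: &Path, dir_name: &str) -> bool {
    // Express the file relative to the gitignore's own directory,
    // not relative to the current working directory.
    let rel = match file.strip_prefix(gitignore_dir) {
        Ok(rel) => rel,
        Err(_) => return false, // the gitignore does not apply to this file
    };
    match rel.components().next() {
        Some(first) => first.as_os_str() == OsStr::new(dir_name),
        None => false,
    }
}
```

For the layout in the message, with the gitignore in `/repo`, the file `/repo/src/llvm/foo` is relative path `src/llvm/foo`, so `/llvm/` does not match regardless of where the search was started.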
Andrew Gallant
a3fc4cdded Fix a bug in the translation from a gitignore pattern to a glob.
We were erroneously neglecting to prefix a pattern like `foo/`
with `**/` (to make `**/foo/`) because it had a slash in it. In fact, the
only reason to neglect a **/ prefix is if the pattern already starts
with **/, or if the pattern is absolute.

Fixes #16, #49, #50, #65
2016-09-24 16:29:25 -04:00
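The corrected rule from the commit above can be sketched as follows. The name `add_recursive_prefix` is hypothetical, and the sketch treats "absolute" as "starts with `/`"; what happens to an absolute pattern afterwards is outside its scope.

```rust
/// Prefix a gitignore pattern with `**/` unless it already starts with
/// `**/` or is absolute (begins with `/`).
///
/// Hypothetical helper for illustration.
fn add_recursive_prefix(pattern: &str) -> String {
    if pattern.starts_with("**/") || pattern.starts_with('/') {
        pattern.to_string()
    } else {
        // A slash elsewhere in the pattern is no reason to skip the
        // prefix: `foo/` still becomes `**/foo/`.
        format!("**/{}", pattern)
    }
}
```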
Andrew Gallant
cc90511ab2 Switch from .rgignore to .ignore.
But don't actually remove support for .rgignore until the next semver
bump.

Note that this puts us in line with the silver searcher:
https://github.com/ggreer/the_silver_searcher/pull/974

Fixes #40
2016-09-23 22:44:33 -04:00
Andrew Gallant
6367dd61ba Column numbers should start at 1.
ripgrep was documented to do 1-based indexing, so this is a bug and not
a breaking change.

Fixes #18
2016-09-23 17:11:09 -04:00
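As a small illustration of the 1-based convention in the commit above, a column can be derived from byte offsets like this. The function and parameter names are hypothetical, and the sketch counts bytes, not characters.

```rust
/// Compute the 1-based column of a match from byte offsets.
///
/// `line_start` is the offset of the first byte of the line and
/// `match_start` is the offset of the first byte of the match.
fn column_number(line_start: usize, match_start: usize) -> usize {
    debug_assert!(match_start >= line_start);
    // A match at the very start of the line is column 1, not 0.
    match_start - line_start + 1
}
```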
Andrew Gallant
dfebed6cbe Add --vimgrep flag.
The --vimgrep flag forces a line to be printed for every match, with
line and column numbers.
2016-09-22 21:32:38 -04:00
Andrew Gallant
b80a986721 fix -uuu test on Windows 2016-09-21 21:07:36 -04:00
Andrew Gallant
7402db7b43 Add "unrestricted" flag.
I don't like having multiple flags do the same thing, but -u, -uu and -uuu
are much easier to remember, particularly with -uuu meaning "search
everything."
2016-09-20 20:24:03 -04:00
Andrew Gallant
f7ee914dd3 Add support for searching multiple patterns with -e.
Also, change -Q/--literal to -F/--fixed-strings because compatibility
with grep is probably better.
2016-09-17 16:55:58 -04:00
Andrew Gallant
2b943eda47 Make file type filtering a lot faster.
We do this by avoiding using a RegexSet (*sigh*). In particular, file
type matching has much simpler semantics than gitignore files, so we don't
actually need to care which file type matched. Therefore, we can get away
with a single regex with a giant alternation.
2016-09-11 13:26:53 -04:00
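The "single regex with a giant alternation" from the commit above can be sketched at the string level. This only builds the pattern text; compiling it would need a regex library, which is left out here. The name `build_alternation` is hypothetical, and the inputs are assumed to be regex fragments already translated from file type globs.

```rust
/// Join several regex fragments into one alternation.
///
/// Each fragment is wrapped in a non-capturing group so that the `|`
/// cannot bind to part of a neighbouring fragment.
fn build_alternation(patterns: &[&str]) -> String {
    patterns
        .iter()
        .map(|p| format!("(?:{})", p))
        .collect::<Vec<_>>()
        .join("|")
}
```

Because a single alternation reports only whether anything matched, it fits the case described in the message, where the identity of the matching file type is not needed.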
Andrew Gallant
76331e5fec Fix test that relied on non-deterministic order of results. 2016-09-09 23:24:01 -04:00
Andrew Gallant
1e678d7052 Fix files test. What a pain. 2016-09-09 23:19:46 -04:00
Andrew Gallant
f83cd63b11 Add integration tests. 2016-09-09 22:58:30 -04:00