mirror of https://github.com/BurntSushi/ripgrep.git synced 2025-11-23 21:54:45 +02:00
Andrew Gallant d4b77a8d89 searcher: fix a performance bug with -A/--after-context
Before this change (at the previous commit):

```
$ cat bigger.txt | (time rg ZQZQZQZQZQ -A999) | wc -l

real    2.321
user    0.674
sys     0.735
maxmem  30 MB
faults  0
1000

$ cat bigger.txt | (time rg ZQZQZQZQZQ -A9999) | wc -l

real    2.513
user    0.823
sys     0.686
maxmem  30 MB
faults  0
10000

$ cat bigger.txt | (time rg ZQZQZQZQZQ -A99999) | wc -l

real    5.067
user    3.254
sys     0.676
maxmem  30 MB
faults  0
100000

$ cat bigger.txt | (time rg ZQZQZQZQZQ -A999999) | wc -l

real    6.658
user    4.841
sys     0.778
maxmem  51 MB
faults  0
1000000
```

Now with this commit:

```
$ cat bigger.txt | (time rg ZQZQZQZQZQ -A999) | wc -l

real    1.845
user    0.328
sys     0.757
maxmem  30 MB
faults  0
1000

$ cat bigger.txt | (time rg ZQZQZQZQZQ -A9999) | wc -l

real    1.917
user    0.334
sys     0.771
maxmem  30 MB
faults  0
10000

$ cat bigger.txt | (time rg ZQZQZQZQZQ -A99999) | wc -l

real    1.972
user    0.319
sys     0.812
maxmem  30 MB
faults  0
100000

$ cat bigger.txt | (time rg ZQZQZQZQZQ -A999999) | wc -l

real    2.005
user    0.333
sys     0.855
maxmem  30 MB
faults  0
1000000
```

And compare to GNU grep:

```
$ cat bigger.txt | (time grep ZQZQZQZQZQ -A999) | wc -l

real    1.488
user    0.143
sys     0.866
maxmem  30 MB
faults  0
1000

$ cat bigger.txt | (time grep ZQZQZQZQZQ -A9999) | wc -l

real    1.697
user    0.170
sys     0.986
maxmem  30 MB
faults  1
10000

$ cat bigger.txt | (time grep ZQZQZQZQZQ -A99999) | wc -l

real    1.515
user    0.166
sys     0.856
maxmem  29 MB
faults  0
100000

$ cat bigger.txt | (time grep ZQZQZQZQZQ -A999999) | wc -l

real    1.490
user    0.174
sys     0.851
maxmem  30 MB
faults  0
1000000
```

Interestingly, GNU grep is still a bit faster. But for both commands,
search time now stays roughly constant as `-A` is increased.
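
As a rough illustration of why search time need not grow with `-A`, here
is a minimal sketch of after-context handling driven by a single
countdown. It is not taken from ripgrep's searcher: the function name
`lines_with_after_context` is hypothetical, and plain substring matching
stands in for the regex engine.

```rust
/// Collects every line containing `needle`, plus up to `after` lines of
/// trailing context. Hypothetical helper for illustration only.
fn lines_with_after_context<'a>(
    haystack: &'a str,
    needle: &str,
    after: usize,
) -> Vec<&'a str> {
    let mut out = Vec::new();
    // Context lines still owed after the most recent match.
    let mut remaining = 0usize;
    for line in haystack.lines() {
        if line.contains(needle) {
            out.push(line);
            // A new match resets the countdown.
            remaining = after;
        } else if remaining > 0 {
            out.push(line);
            remaining -= 1;
        }
    }
    out
}
```

In this sketch the bookkeeping per line is one comparison and one
decrement, so a larger `after` only changes how many lines get emitted,
not how much work is done for each line.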

There is definitely something "odd" about searching `stdin`: it seems
substantially slower than searching the file directly. GNU grep shows
the same effect:

```
$ (time grep ZQZQZQZQZQ -A999999 bigger.txt) | wc -l

real    0.692
user    0.184
sys     0.506
maxmem  30 MB
faults  0
1000000

$ cat bigger.txt | (time grep ZQZQZQZQZQ -A999999) | wc -l

real    1.700
user    0.201
sys     0.954
maxmem  30 MB
faults  0
1000000

$ (time rg ZQZQZQZQZQ -A999999 bigger.txt) | wc -l

real    0.640
user    0.428
sys     0.209
maxmem  7734 MB
faults  0
1000000

$ (time rg ZQZQZQZQZQ --no-mmap -A999999 bigger.txt) | wc -l

real    0.866
user    0.282
sys     0.581
maxmem  30 MB
faults  0
1000000

$ cat bigger.txt | (time rg ZQZQZQZQZQ -A999999) | wc -l

real    1.991
user    0.338
sys     0.819
maxmem  30 MB
faults  0
1000000
```
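
One way to probe how much data a single `read` call returns on a given
input is a small helper like the one below. This is only a sketch for
experimentation, not code from ripgrep, and `max_read_size` is a
hypothetical name.

```rust
use std::io::{self, Read};

/// Reads `reader` to the end using a buffer of `buf_len` bytes and returns
/// (largest single read, total bytes read). Hypothetical helper.
fn max_read_size<R: Read>(reader: &mut R, buf_len: usize) -> io::Result<(usize, u64)> {
    let mut buf = vec![0u8; buf_len];
    let mut max = 0usize;
    let mut total = 0u64;
    loop {
        match reader.read(&mut buf) {
            // A zero-length read signals end of input.
            Ok(0) => break,
            Ok(n) => {
                max = max.max(n);
                total += n as u64;
            }
            // Retry reads that were interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
    Ok((max, total))
}
```

Calling it on `io::stdin().lock()` with a large buffer (say 1 MiB) while
piping `bigger.txt` in would report the largest chunk any one `read`
delivered.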

I wonder if this is related to my discovery in the previous commit where
`read` calls on `stdin` seem to never return anything more than ~64K. Oh
well, I'm satisfied at this point, especially given that GNU grep seems
to do a lot worse than ripgrep with bigger values of
`-B/--before-context`:

```
$ cat bigger.txt | (time grep ZQZQZQZQZQ -B9) | wc -l

real    1.568
user    0.170
sys     0.885
maxmem  30 MB
faults  0
1

$ cat bigger.txt | (time grep ZQZQZQZQZQ -B99) | wc -l

real    1.734
user    0.338
sys     0.879
maxmem  30 MB
faults  0
1

$ cat bigger.txt | (time grep ZQZQZQZQZQ -B999) | wc -l

real    2.349
user    1.723
sys     0.620
maxmem  30 MB
faults  0
1

$ cat bigger.txt | (time grep ZQZQZQZQZQ -B9999) | wc -l

real    16.459
user    15.848
sys     0.586
maxmem  30 MB
faults  0
1

$ time grep ZQZQZQZQZQ -B99999 bigger.txt
ZQZQZQZQZQ

real    1:45.06
user    1:44.12
sys     0.772
maxmem  30 MB
faults  0
```

The above pattern occurs regardless of whether you put `bigger.txt` on
stdin or whether you search it directly.

And now ripgrep:

```
$ cat bigger.txt | (time rg ZQZQZQZQZQ -B9) | wc -l

real    1.965
user    0.326
sys     0.814
maxmem  29 MB
faults  0
1

$ cat bigger.txt | (time rg ZQZQZQZQZQ -B99) | wc -l

real    1.941
user    0.423
sys     0.813
maxmem  29 MB
faults  0
1

$ cat bigger.txt | (time rg ZQZQZQZQZQ -B999) | wc -l

real    2.372
user    0.759
sys     0.703
maxmem  30 MB
faults  0
1

$ cat bigger.txt | (time rg ZQZQZQZQZQ -B9999) | wc -l

real    2.638
user    0.895
sys     0.665
maxmem  29 MB
faults  0
1

$ cat bigger.txt | (time rg ZQZQZQZQZQ -B99999) | wc -l

real    5.172
user    3.282
sys     0.748
maxmem  29 MB
faults  0
1
```
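
For comparison with the timings above, here is a minimal sketch of
before-context kept in a bounded window of recent lines. It makes no
claim about how either ripgrep or GNU grep implements `-B`; the function
name `lines_with_before_context` is hypothetical, and substring matching
stands in for the regex engine.

```rust
use std::collections::VecDeque;

/// Collects every line containing `needle`, preceded by up to `before`
/// lines of leading context. Hypothetical helper for illustration only.
fn lines_with_before_context<'a>(
    haystack: &'a str,
    needle: &str,
    before: usize,
) -> Vec<&'a str> {
    let mut out = Vec::new();
    // Most recent unprinted lines, oldest at the front, at most `before`.
    let mut window: VecDeque<&'a str> = VecDeque::new();
    for line in haystack.lines() {
        if line.contains(needle) {
            // Flush the pending context, then the match itself.
            out.extend(window.drain(..));
            out.push(line);
        } else if before > 0 {
            if window.len() == before {
                // Evict the oldest line to keep the window bounded.
                window.pop_front();
            }
            window.push_back(line);
        }
    }
    out
}
```

With the match on the first line, as in `bigger.txt`, the window is empty
when the match is found, so only that one line is emitted regardless of
`before`. In this sketch, each later line costs one push and at most one
pop, independent of the window size.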

NOTE: To get `bigger.txt`:

```
$ curl -LO 'https://burntsushi.net/stuff/opensubtitles/2018/en/sixteenth.txt.gz'
$ gzip -d sixteenth.txt.gz
$ (echo ZQZQZQZQZQ && for ((i=0;i<10;i++)); do cat sixteenth.txt; done) > bigger.txt
```
2025-10-14 14:27:43 -04:00