It looks like it incorrectly treats a file that is purely valid UTF-8 as
a binary file, which in turn effectively renders all of the Russian
subtitle benchmarks moot for ugrep. So we pass '-a' to force ugrep to
treat the file as text.
This technically gives ugrep an edge because it now no longer needs to
look to see if the haystack is binary or not. In practice this is
usually implemented using highly optimized SIMD routines (e.g.,
'memchr'), so it tends not to matter much. We might also consider
passing '-a' to all grep commands. But... I think using '-a' is the less
common case and we should try to benchmark the common case.
This removes the old commented out URLs for the 2016 subtitles that
don't work any more. I should probably upload the files to a more stable
URL.
This also switches to a 'https://' GitHub URL as I believe the 'git://'
URLs are no longer supported.
I'm not sure about putting this in such a prominent spot, and it does
bloat the introductory paragraph a bit, but it seems like an important
special case.
Dependabot automatically files PRs for updatable dependencies. As
configured it watches all workflow files in `.github/workflows` for
possible updates to any of the Actions depended upon.
We specifically do not enable Dependabot for other things, in order to
avoid running in a hamster wheel.
Closes#2315
This adds some lesser known extensions.
Notably, it adds php7 and php8, but not php6. Apparently,
php6 was never a thing: https://wiki.php.net/rfc/php6
PR #2263
It turns out that if there are text anchors (that is, \A or \z, or ^/$
when multi-line is disabled), then the "fast" line searching path isn't
quite correct. Since searching without multi-line mode is exceptionally
rare, we just look for the presence of text anchors and specifically
disable the line terminator option in 'grep-regex'. This in turn
inhibits the "fast" line searching path.
Fixes#2260
When a glob pattern ended with a \/, and since we permit backslash
escapes, the glob parser gave a "dangling escape" error. Which is weird,
because the \ is clearly not dangling.
The issue is that the layer above the glob parser, the gitignore parser,
was stripping the trailing / so that it wouldn't be part of the matching
logic. Of course, stripping the trailing / while it is escaped without
removing the backslash escape is wrong. So we do that here.
Fixes#2236
This furthers our kludge of dealing with PCRE2's look-around in the
printer. Because of our bad abstraction boundaries, we added a kludge to
deal with PCRE2 look-around by extending the bytes we search by a fixed
amount to hopefully permit any look-around to operate. But because of
that kludge, we wind up over extending ourselves in some cases and
dragging along those extra bytes.
We had fixed this for simple searching by simply rejecting any matches
past the end point. But we didn't do the same for replacements. So this
commit extends our kludge to replacements.
Thanks to @sonohgong for diagnosing the problem and proposing a fix. I
mostly went with their solution, but adding the new replacement routine
as an internal helper rather than a new APIn in the 'grep-matcher'
crate.
Fixes#2095, Fixes#2208