Every 256 bytes the lazy match finders process without finding a match,
they will increase their step size by 1. So for bytes [0, 256) they search
every position, for bytes [256, 512) they search every other position,
and so on. However, they currently still insert every position into
their hash tables. This is different from fast & dfast, which only
insert the positions they search.
This PR changes that, so now after we've searched 2KB without finding
any matches, at which point we'll only be searching one in 9 positions,
we'll stop inserting every position, and only insert the positions we
search. The exact cutoff of 2KB isn't terribly important, I've just
selected a cutoff that is reasonably large, to minimize the impact on
"normal" data.
This PR only adds skipping to greedy, lazy, and lazy2, but does not
touch btlazy2.
| Dataset | Level | Compiler | CSize ∆ | Speed ∆ |
|---------|-------|--------------|---------|---------|
| Random | 5 | clang-14.0.6 | 0.0% | +704% |
| Random | 5 | gcc-12.2.0 | 0.0% | +670% |
| Random | 7 | clang-14.0.6 | 0.0% | +679% |
| Random | 7 | gcc-12.2.0 | 0.0% | +657% |
| Random | 12 | clang-14.0.6 | 0.0% | +1355% |
| Random | 12 | gcc-12.2.0 | 0.0% | +1331% |
| Silesia | 5 | clang-14.0.6 | +0.002% | +0.35% |
| Silesia | 5 | gcc-12.2.0 | +0.002% | +2.45% |
| Silesia | 7 | clang-14.0.6 | +0.001% | -1.40% |
| Silesia | 7 | gcc-12.2.0 | +0.007% | +0.13% |
| Silesia | 12 | clang-14.0.6 | +0.011% | +22.70% |
| Silesia | 12 | gcc-12.2.0 | +0.011% | -6.68% |
| Enwik8 | 5 | clang-14.0.6 | 0.0% | -1.02% |
| Enwik8 | 5 | gcc-12.2.0 | 0.0% | +0.34% |
| Enwik8 | 7 | clang-14.0.6 | 0.0% | -1.22% |
| Enwik8 | 7 | gcc-12.2.0 | 0.0% | -0.72% |
| Enwik8 | 12 | clang-14.0.6 | 0.0% | +26.19% |
| Enwik8 | 12 | gcc-12.2.0 | 0.0% | -5.70% |
The speed difference for clang at level 12 is real, but is probably
caused by some sort of alignment or codegen issues. clang is
significantly slower than gcc before this PR, but gets up to parity with
it.
I also measured the ratio difference for the HC match finder, and it
looks basically the same as the row-based match finder. The speedup on
random data looks similar. And performance is about neutral, without the
big difference at level 12 for either clang or gcc.
* Mark all bufferless and block level functions as deprecated
* Update documentation to suggest not using these functions
* Add `_deprecated()` wrappers for functions that we use internally and
call those instead
Part 2 of #3528
Adds hash salt that helps to avoid regressions where consecutive compressions use the same tag space with similar data (running zstd -b5e7 enwik8 -B128K reproduces this regression).
Allocate half the memory for tag space, which means that we get one less slot for an actual tag (needs to be used for next position index).
The results is a slight loss in compression ratio (up to 0.2%) and some regressions/improvements to speed depending on level and sample. In turn, we get to save 16% of the hash table's space (5 bytes per entry instead of 6 bytes per entry).
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora \) -prune -o -type f);
do
sed -i 's/Facebook, Inc\./Meta Platforms, Inc. and affiliates./' $f;
done
```
* first attempt at fast DMS short cache
* significant wins for some scenarios
* fix all clang regressions
* nits
* fix 1.5% gcc11 regression on hot 110Kdict scenario
* fix CI
* nit
* Add tags to doublefast hash table
* use tags in doublefast DMS
* Fix CI
* Clean up some hardcoded logic / constants
* Switch forCCtx to an enum
* nit
* add short cache to ip+1 long search
* Move tag size into hashLog
* Minor nits
* Truncate dictionaries greater than 16MB in short cache mode
* Helper function for tag comparison
* Cap short cache hashLog at 24 to prevent overflow
* size_t dictTagsMatch -> int dictTagsMatch
* nit
* Clean up and comment dictionary truncation
* Move ZSTD_tableFillPurpose_e next to ZSTD_dictTableLoadMethod_e
* Comment and expand helper functions
* Asserts and documentation
* nit
credit to oss-fuzz
This issue could happen when using the new Sequence Compression API in Explicit Delimiter Mode
with a too small dstCapacity.
In which case, there was one place where the buffer size wasn't checked.
directly at ZSTD_storeSeq() interface.
In the process, remove ZSTD_REP_MOVE.
This makes it possible, in future commits,
to update and effectively simplify the naming scheme
to properly label the updated processing pipeline :
offset | repcode => offBase => offCode + offBits
the new contracts seems to make more sense :
updateRep() updates an array of repeat offsets _in place_,
while newRep() generates a new structure with the updated repeat-offset array.
Most callers are actually expecting the in-place variant,
and a limited sub-section, in `zstd_opt.c` mainly, prefer `newRep()`.
to act on values stored / expressed in the sumtype numeric representation required by `storedSeq()`.
This makes it possible to abstract away this representation by using the macros to extract these values.
First user : ZSTD_updateRep() .
this meant to abstract the sumtype representation required
to transfert `offcode` to `ZSTD_storeSeq()`.
Unfortunately, the sumtype numeric representation is currently a leaky abstraction
that has permeated many other parts of the code,
especially within `zstd_lazy.c` and also within `zstd_opt.c` and `zstd_compress.c`.
While this PR makes a good job a transfering a large nb of call sites
to using the new macros, there are still a few sites where this transformation is more complex,
or where the numeric representation itself it used "as is".
One of the problematics area is the decision to use the numeric format of the sumtype
within the match finders of `zstd_lazy`.
This commit doesn't change the behavior, it only introduces and employes the macros,
but eventually the resulting code remains identical.
At target, if the numeric representation of the sumtype can be completely abstracted
and no other part of the code depends on it,
it will be possible to move it towards something slightly more efficient.
since this is effectively what is stored in this field (== matchLength - MINMATCH).
This makes it clearer what needs to be done when reading from / writing to this field.
PR #2850 attempted to fix a determinism bug that was uncovered by OSS-Fuzz. It
succeeded in addressing that source of non-determinism, but introduced a new
one: it was possible, when index reduction occurred, to map indices in the
window to the reserved value, which would cause them to be zeroed, potentially
altering parsing of the input.
This PR addresses this issue. It makes sure that the bottom of the window is
always `>= ZSTD_WINDOW_START_INDEX`.
I'm not sure if this makes #2850 redundant. I think it's probably still
valuable to have that protection as well.
Credit to OSS-Fuzz for discovering this issue.
Switch to a macro `ZSTD_FALLTHROUGH;` instead of a comment. On supported
compilers this uses an attribute, otherwise it becomes a comment.
This is necessary to be compatible with clang's `-Wfall-through`, and
gcc's `-Wfall-through=2` which don't support comments. Without this the
linux build emits a bunch of warnings.
Also add a test to CI to ensure that we don't regress.
turns out, it's possible to constify MatchState* parameter
in some parts of the binary tree algorithm,
making it a pure read-only parameter,
as opposed to a mutable state.
This is supposed to be helpful for both maintenance and the compiler.