Mirror of https://github.com/facebook/zstd.git, synced 2025-03-07 01:10:04 +02:00
For every 256 bytes that the lazy match finders process without finding a match, they increase their step size by 1. So for bytes [0, 256) they search every position, for bytes [256, 512) they search every other position, and so on. However, they currently still insert every position into their hash tables. This is different from fast and dfast, which only insert the positions they search.

This PR changes that. Once we've searched 2KB without finding a match, we are only searching one in every 9 positions. From that point on, we stop inserting every position and only insert the positions we search. The exact 2KB cutoff isn't terribly important. I selected a reasonably large cutoff to minimize the impact on "normal" data.

This PR only adds skipping to greedy, lazy, and lazy2. It does not touch btlazy2.

| Dataset | Level | Compiler     | Compressed size ∆ | Speed ∆ |
|---------|-------|--------------|-------------------|---------|
| Random  | 5     | clang-14.0.6 | 0.0%              | +704%   |
| Random  | 5     | gcc-12.2.0   | 0.0%              | +670%   |
| Random  | 7     | clang-14.0.6 | 0.0%              | +679%   |
| Random  | 7     | gcc-12.2.0   | 0.0%              | +657%   |
| Random  | 12    | clang-14.0.6 | 0.0%              | +1355%  |
| Random  | 12    | gcc-12.2.0   | 0.0%              | +1331%  |
| Silesia | 5     | clang-14.0.6 | +0.002%           | +0.35%  |
| Silesia | 5     | gcc-12.2.0   | +0.002%           | +2.45%  |
| Silesia | 7     | clang-14.0.6 | +0.001%           | -1.40%  |
| Silesia | 7     | gcc-12.2.0   | +0.007%           | +0.13%  |
| Silesia | 12    | clang-14.0.6 | +0.011%           | +22.70% |
| Silesia | 12    | gcc-12.2.0   | +0.011%           | -6.68%  |
| Enwik8  | 5     | clang-14.0.6 | 0.0%              | -1.02%  |
| Enwik8  | 5     | gcc-12.2.0   | 0.0%              | +0.34%  |
| Enwik8  | 7     | clang-14.0.6 | 0.0%              | -1.22%  |
| Enwik8  | 7     | gcc-12.2.0   | 0.0%              | -0.72%  |
| Enwik8  | 12    | clang-14.0.6 | 0.0%              | +26.19% |
| Enwik8  | 12    | gcc-12.2.0   | 0.0%              | -5.70%  |

The speed difference for clang at level 12 is real, but it is probably caused by some sort of alignment or codegen issue. Before this PR, clang is significantly slower than gcc at level 12. With this PR, clang reaches parity with gcc.
I also measured the ratio difference for the HC match finder, and it looks basically the same as for the row-based match finder. The speedup on random data is similar. Otherwise, performance is roughly neutral, and neither clang nor gcc shows the large level-12 swings seen above.
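The step schedule and the 2KB insertion cutoff described above can be modeled in a few lines of C. This is a minimal sketch under stated assumptions, not zstd's actual implementation. The names `search_step`, `lazy_skipping`, `simulate_run`, `print_schedule`, `SEARCH_STRENGTH`, and `SKIP_THRESHOLD` are hypothetical. The sketch assumes the step is computed from the number of bytes scanned since the last match as `(bytes >> 8) + 1`, which reproduces the 256-byte schedule above. It then counts, for a run that never matches, how many positions are searched and how many hash-table insertions happen with and without skipping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical constants modeling the description: the step grows by 1
 * every 256 bytes (a right shift by 8), and insertion skipping starts
 * once the step exceeds 8, i.e. after 2048 bytes without a match. */
#define SEARCH_STRENGTH 8 /* 1 << 8 == 256 bytes per step increment */
#define SKIP_THRESHOLD  8 /* skip insertions once step > 8 (>= 2048 bytes) */

/* Distance to the next searched position, given bytes scanned since the last match. */
static size_t search_step(size_t bytes_since_match)
{
    return (bytes_since_match >> SEARCH_STRENGTH) + 1;
}

/* Nonzero once only the searched positions should be inserted. */
static int lazy_skipping(size_t bytes_since_match)
{
    return search_step(bytes_since_match) > SKIP_THRESHOLD;
}

typedef struct {
    size_t searched;      /* positions probed for a match */
    size_t inserted_all;  /* insertions when every position is inserted (old behavior) */
    size_t inserted_skip; /* insertions when skipping starts after the cutoff (new behavior) */
} RunStats;

/* Model a run of `len` bytes that never produces a match. */
static RunStats simulate_run(size_t len)
{
    RunStats s = {0, 0, 0};
    size_t pos = 0;
    while (pos < len) {
        size_t step = search_step(pos);
        /* Positions covered by this probe: from pos up to, not including, the next probe. */
        size_t span = (len - pos < step) ? len - pos : step;
        s.searched += 1;
        s.inserted_all += span;                           /* old: insert every position */
        s.inserted_skip += lazy_skipping(pos) ? 1 : span; /* new: only the probed one */
        pos += step;
    }
    assert(s.inserted_skip <= s.inserted_all);
    return s;
}

/* Print the step and skipping state at a few offsets, plus totals for a 64 KB run. */
static void print_schedule(void)
{
    const size_t offsets[] = {0, 255, 256, 512, 2047, 2048, 4096};
    for (size_t i = 0; i < sizeof offsets / sizeof offsets[0]; i++) {
        printf("offset %4zu: step %2zu, skip insertions: %s\n", offsets[i],
               search_step(offsets[i]), lazy_skipping(offsets[i]) ? "yes" : "no");
    }
    RunStats s = simulate_run(64 * 1024);
    printf("64 KB, no match: %zu searches, %zu insertions before, %zu after\n",
           s.searched, s.inserted_all, s.inserted_skip);
}
```

Calling `print_schedule()` from `main` shows the step reaching 9 at offset 2048. In this model, that is where skipping begins. A match-less run shorter than 2KB makes exactly as many insertions as before. Only long stretches without matches, such as random data, see fewer insertions.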
Regression tests
The regression tests run zstd in many scenarios and ensure that the size of the compressed results doesn't change. This helps us make sure we don't accidentally regress zstd's compression ratio.
These tests run every night on CircleCI. If the job fails, read the diff printed by the job to check that the change isn't a regression. If all is well, you can download the results.csv artifact and commit the new results. Alternatively, you can rebuild it yourself by following the instructions below.
Rebuilding results.csv
From the root of the zstd repo, run:
```
# Build the zstd binary
make clean
make -j zstd
# Build the regression test binary
cd tests/regression
make clean
make -j test
# Run the regression test
./test --cache data-cache --zstd ../../zstd --output results.csv
# Check results.csv to ensure the new results are okay
git diff
# Then submit the PR
```