Rather than remove the flag entirely, as proposed in #3499, this commit uses
the newest C++ standard the compiler supports. This retains the selection of
using only standardized features (excluding GNU extensions) and keeps the
recency requirements of the codebase explicit.
Tested with various versions of `g++` and `clang++`.
This fixes compilation with clang-cl on Windows. There
is a bug in cmake so that check_linker_flag() doesn't give
the correct result when using link.exe/lld-link.exe.
Details in CMake's gitlab: https://gitlab.kitware.com/cmake/cmake/-/issues/22023Fixes#3522
Every 256 bytes the lazy match finders process without finding a match,
they will increase their step size by 1. So for bytes [0, 256) they search
every position, for bytes [256, 512) they search every other position,
and so on. However, they currently still insert every position into
their hash tables. This is different from fast & dfast, which only
insert the positions they search.
This PR changes that, so now after we've searched 2KB without finding
any matches, at which point we'll only be searching one in 9 positions,
we'll stop inserting every position, and only insert the positions we
search. The exact cutoff of 2KB isn't terribly important, I've just
selected a cutoff that is reasonably large, to minimize the impact on
"normal" data.
This PR only adds skipping to greedy, lazy, and lazy2, but does not
touch btlazy2.
| Dataset | Level | Compiler | CSize ∆ | Speed ∆ |
|---------|-------|--------------|---------|---------|
| Random | 5 | clang-14.0.6 | 0.0% | +704% |
| Random | 5 | gcc-12.2.0 | 0.0% | +670% |
| Random | 7 | clang-14.0.6 | 0.0% | +679% |
| Random | 7 | gcc-12.2.0 | 0.0% | +657% |
| Random | 12 | clang-14.0.6 | 0.0% | +1355% |
| Random | 12 | gcc-12.2.0 | 0.0% | +1331% |
| Silesia | 5 | clang-14.0.6 | +0.002% | +0.35% |
| Silesia | 5 | gcc-12.2.0 | +0.002% | +2.45% |
| Silesia | 7 | clang-14.0.6 | +0.001% | -1.40% |
| Silesia | 7 | gcc-12.2.0 | +0.007% | +0.13% |
| Silesia | 12 | clang-14.0.6 | +0.011% | +22.70% |
| Silesia | 12 | gcc-12.2.0 | +0.011% | -6.68% |
| Enwik8 | 5 | clang-14.0.6 | 0.0% | -1.02% |
| Enwik8 | 5 | gcc-12.2.0 | 0.0% | +0.34% |
| Enwik8 | 7 | clang-14.0.6 | 0.0% | -1.22% |
| Enwik8 | 7 | gcc-12.2.0 | 0.0% | -0.72% |
| Enwik8 | 12 | clang-14.0.6 | 0.0% | +26.19% |
| Enwik8 | 12 | gcc-12.2.0 | 0.0% | -5.70% |
The speed difference for clang at level 12 is real, but is probably
caused by some sort of alignment or codegen issues. clang is
significantly slower than gcc before this PR, but gets up to parity with
it.
I also measured the ratio difference for the HC match finder, and it
looks basically the same as the row-based match finder. The speedup on
random data looks similar. And performance is about neutral, without the
big difference at level 12 for either clang or gcc.
In Python 3.x, a single element of a bytes array is returned as
an integer number. Thus, NEWLINE is an int variable, and attempting
to add it to the line array will fail with a type mismatch error
that may be demonstrated as follows:
[roam@straylight ~]$ python3 -c 'b"hello" + b"\n"[0]'
Traceback (most recent call last):
File "<string>", line 1, in <module>
TypeError: can't concat int to bytes
[roam@straylight ~]$
* Mark all bufferless and block level functions as deprecated
* Update documentation to suggest not using these functions
* Add `_deprecated()` wrappers for functions that we use internally and
call those instead
* patch-from speed optimization: only load portion of dictionary into normal matchfinders
* test regression for x8 multiplier
* fix off-by-one error for bit shift bound
* restrict patchfrom speed optimization to strategy < ZSTD_btultra
* update results.csv
* update regression test
Part 2 of #3528
Adds hash salt that helps to avoid regressions where consecutive compressions use the same tag space with similar data (running zstd -b5e7 enwik8 -B128K reproduces this regression).
- Adds memory type that is guaranteed to have been initialized at least once in the workspace's lifetime.
- Changes tag space in row hash to be based on init once memory.
#3543 decreases the size of the tagTable by a factor of 2, which requires using the first tag position in each row for head position instead of a tag.
Although position 0 stopped being a valid match, it still persisted in mask calculation resulting in the matches loops possibly terminating before it should have. The fix skips position 0 to solve this problem.
Allocate half the memory for tag space, which means that we get one less slot for an actual tag (needs to be used for next position index).
The results is a slight loss in compression ratio (up to 0.2%) and some regressions/improvements to speed depending on level and sample. In turn, we get to save 16% of the hash table's space (5 bytes per entry instead of 6 bytes per entry).
As reported by @P-E-Meunier in https://github.com/facebook/zstd/issues/2662#issuecomment-1443836186,
seekable format ingestion speed can be particularly slow
when selected `FRAME_SIZE` is very small,
especially in combination with the recent row_hash compression mode.
The specific scenario mentioned was `pijul`,
using frame sizes of 256 bytes and level 10.
This is improved in this PR,
by providing approximate parameter adaptation to the compression process.
Tested locally on a M1 laptop,
ingestion of `enwik8` using `pijul` parameters
went from 35sec. (before this PR) to 2.5sec (with this PR).
For the specific corner case of a file full of zeroes,
this is even more pronounced, going from 45sec. to 0.5sec.
These benefits are unrelated to (and come on top of) other improvement efforts currently being made by @yoniko for the row_hash compression method specifically.
The `seekable_compress` test program has been updated to allows setting compression level,
in order to produce these performance results.
Current timeout is too small for some slower machines, e.g. most modern riscv64 boards,
where tests fail with the following diagnostics:
Traceback (most recent call last):
File "/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/./cli-tests/run.py", line 734, in <module>
success = run_tests(tests, opts)
File "/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/./cli-tests/run.py", line 601, in run_tests
tests[test_case.name] = test_case.run()
File "/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/./cli-tests/run.py", line 285, in run
return self.analyze()
File "/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/./cli-tests/run.py", line 275, in analyze
self._join_test()
File "/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/./cli-tests/run.py", line 330, in _join_test
(stdout, stderr) = self._test_process.communicate(timeout=self._opts.timeout)
File "/usr/lib64/python3.10/subprocess.py", line 1154, in communicate
stdout, stderr = self._communicate(input, endtime, timeout)
File "/usr/lib64/python3.10/subprocess.py", line 2006, in _communicate
self._check_timeout(endtime, orig_timeout, stdout, stderr)
File "/usr/lib64/python3.10/subprocess.py", line 1198, in _check_timeout
raise TimeoutExpired(
subprocess.TimeoutExpired: Command '['/usr/src/RPM/BUILD/zstd-1.5.4-alt2/tests/cli-tests/compression/window-resize.sh']' timed out after 60 seconds
* Add ZSTD_setFParams() and ZSTD_setParams()
* Modify ZSTD_setCParams() to use ZSTD_setParameter() to avoid a second path setting parameters
* Add unit tests
* Update documentation to suggest using them to replace deprecated functions
Fixes#3396.
- Initializes clevel in `ZSTD_CCtxParams_init`
- Adds CI workflow for msan fuzzers runs without optimization (`-O0`)
- Fixes Makefile to correctly pass on user defined `MOREFLAGS` and `FUZZER_FLAGS` in cases they have been overwritten