1
0
mirror of https://github.com/facebook/zstd.git synced 2025-03-06 16:56:49 +02:00

4422 Commits

Author SHA1 Message Date
W. Felix Handte
39b7946b95 Define Macros for Possibly-Present Functions; Use Them Rather than Ifdef Guards 2023-05-04 12:18:58 -04:00
W. Felix Handte
b12e8cb3e7 Merge Ultra and Ultra2 Exclusion
Ultra2 does not exist for dict compression, and so uses ultra. So ultra must
be present if ultra2 is.
2023-05-04 12:18:58 -04:00
W. Felix Handte
6761e1c949 Tweak Ultra/Opt Guards 2023-05-04 12:18:58 -04:00
W. Felix Handte
5a75956001 Adjust Strategy in CParams to Avoid Using Excluded Block Compressors 2023-05-04 12:18:58 -04:00
W. Felix Handte
50cdf84f58 Macro-Exclude Block Compressors from Declaration/Definition 2023-05-04 12:18:58 -04:00
W. Felix Handte
81b86a2024 NULL Out Block Compressor Table Entries When Excluded
Don't check about excluding `ZSTD_fast`. It's always included so that we know
we can resolve downwards and hit a strategy that's present.
2023-05-04 12:18:58 -04:00
W. Felix Handte
cbf3e26316 Allow ZSTD_selectBlockCompressor() to Return NULL
Return an error rather than segfaulting.
2023-05-04 12:18:58 -04:00
Daniel Kutenin
4c25ea329b
Disable unused variable warning in msan configurations 2023-04-20 11:14:08 +01:00
Nick Terrell
61efb2a047 Add ZSTD_d_maxBlockSize parameter
Reduces memory when blocks are guaranteed to be smaller than allowed by
the format. This is useful for streaming compression in conjunction with
ZSTD_c_maxBlockSize.

This PR saves 2 * (formatMaxBlockSize - paramMaxBlockSize) when streaming.
Once it is rebased on top of PR #3616 it will save
3 * (formatMaxBlockSize - paramMaxBlockSize).
2023-04-17 22:06:44 -07:00
Nick Terrell
0abf2baef9 Reduce streaming decompression memory by 128KB
The split literals buffer patch increased streaming decompression memory
by 64KB (shrunk lit buffer from 128KB to 64KB, and added 128KB). This
patch removes the added 128KB buffer, because it isn't necessary.

The buffer was there because the literals compression code didn't know
the true `blockSizeMax` of the frame, and always put split literals so
they ended 128KB - 32 from the beginning of the block. Instead, we can
pass down the true `blockSizeMax` and ensure that the split literals
end up at `blockSizeMax - 32` from the beginning of the block. We
already reserve a full `blockSizeMax` bytes in streaming mode, so we
won't be overwriting the extDict window.
2023-04-17 16:31:02 -07:00
Yann Collet
e4120c5513 fixing potential over-reads
detected by @terrelln,
these issue could be triggered in specific scenarios
namely decompression of certain invalid magic-less frames,
or requested properties from certain invalid skippable frames.
2023-04-03 16:52:32 -07:00
Yann Collet
2e29728797 fix #3583
As reported by @georgmu,
the previous fix is undone by the later initialization.
Switch order, so that initialization is adjusted by special case.
2023-04-03 09:45:11 -07:00
Yann Collet
9f58241dcc updated version number to v1.5.5
also : updated man pages
2023-03-31 23:02:08 -07:00
daniellerozenblit
fcaf06ddb4
Check that dest is valid for decompression (#3555)
* add check for valid dest buffer and fuzz on random dest ptr when malloc 0

* add uptrval to linux-kernel

* remove bin files

* get rid of uptrval

* restrict max pointer value check to platforms where sizeof(size_t) == sizeof(void*)
2023-03-31 23:00:55 -07:00
Han Zhu
b558190ac7 Remove clang-only branch hints from ZSTD_decodeSequence
Looking at the __builtin_expect in ZSTD_decodeSequence:

{   size_t offset;
    #if defined(__clang__)
 if (LIKELY(ofBits > 1)) {
    #else
 if (ofBits > 1) {
    #endif
 ZSTD_STATIC_ASSERT(ZSTD_lo_isLongOffset == 1);

From profile-annotated assembly, the probability of ofBits > 1 is about 75%
(101k counts out of 135k counts). This is much smaller than the recommended
likelihood to use __builtin_expect which is 99%. As a result, clang moved the
else block further away which hurts cache locality. Removing this
__built_expect along with two others in ZSTD_decodeSequence gave better
performance when PGO is enabled. I suggest to remove these branch hints and
rely on PGO which leverages runtime profiles from actual workload to calculate
branch probability instead.
2023-03-28 15:36:22 -07:00
Han Zhu
e6dccbf482 Inline BIT_reloadDStream
Inlining `BIT_reloadDStream` provided >3% decompression speed improvement for
clang PGO-optimized zstd binary, measured using the Silesia corpus with
compression level 1. The win comes from improved register allocation which leads
to fewer spills and reloads. Take a look at this comparison of
profile-annotated hot assembly before and after this change:
https://www.diffchecker.com/UjDGIyLz/. The diff is a bit messy, but notice three
fewer moves after inlining.

In general LLVM's register allocator works better when it can see more code. For
example, when the register allocator sees a call instruction, it partitions the
registers into caller registers and callee registers, and it is not free to do
whatever it wants with all the registers for the current function. Inlining the
callee lets the register allocation access all registers and use them more
flexsibly.
2023-03-28 15:36:02 -07:00
daniellerozenblit
3e0550ee52
fix window update (#3556) 2023-03-21 13:28:26 -04:00
Nick Terrell
a3c3a38b9b [lazy] Skip over incompressible data
Every 256 bytes the lazy match finders process without finding a match,
they will increase their step size by 1. So for bytes [0, 256) they search
every position, for bytes [256, 512) they search every other position,
and so on. However, they currently still insert every position into
their hash tables. This is different from fast & dfast, which only
insert the positions they search.

This PR changes that, so now after we've searched 2KB without finding
any matches, at which point we'll only be searching one in 9 positions,
we'll stop inserting every position, and only insert the positions we
search. The exact cutoff of 2KB isn't terribly important, I've just
selected a cutoff that is reasonably large, to minimize the impact on
"normal" data.

This PR only adds skipping to greedy, lazy, and lazy2, but does not
touch btlazy2.

| Dataset | Level | Compiler     | CSize ∆ | Speed ∆ |
|---------|-------|--------------|---------|---------|
| Random  |     5 | clang-14.0.6 |    0.0% |   +704% |
| Random  |     5 | gcc-12.2.0   |    0.0% |   +670% |
| Random  |     7 | clang-14.0.6 |    0.0% |   +679% |
| Random  |     7 | gcc-12.2.0   |    0.0% |   +657% |
| Random  |    12 | clang-14.0.6 |    0.0% |  +1355% |
| Random  |    12 | gcc-12.2.0   |    0.0% |  +1331% |
| Silesia |     5 | clang-14.0.6 | +0.002% |  +0.35% |
| Silesia |     5 | gcc-12.2.0   | +0.002% |  +2.45% |
| Silesia |     7 | clang-14.0.6 | +0.001% |  -1.40% |
| Silesia |     7 | gcc-12.2.0   | +0.007% |  +0.13% |
| Silesia |    12 | clang-14.0.6 | +0.011% | +22.70% |
| Silesia |    12 | gcc-12.2.0   | +0.011% |  -6.68% |
| Enwik8  |     5 | clang-14.0.6 |    0.0% |  -1.02% |
| Enwik8  |     5 | gcc-12.2.0   |    0.0% |  +0.34% |
| Enwik8  |     7 | clang-14.0.6 |    0.0% |  -1.22% |
| Enwik8  |     7 | gcc-12.2.0   |    0.0% |  -0.72% |
| Enwik8  |    12 | clang-14.0.6 |    0.0% | +26.19% |
| Enwik8  |    12 | gcc-12.2.0   |    0.0% |  -5.70% |

The speed difference for clang at level 12 is real, but is probably
caused by some sort of alignment or codegen issues. clang is
significantly slower than gcc before this PR, but gets up to parity with
it.

I also measured the ratio difference for the HC match finder, and it
looks basically the same as the row-based match finder. The speedup on
random data looks similar. And performance is about neutral, without the
big difference at level 12 for either clang or gcc.
2023-03-20 11:18:29 -07:00
Yann Collet
e2208242ac
Merge pull request #3553 from facebook/ldm_dict
added documentation for LDM + dictionary compatibility
2023-03-16 11:20:32 -07:00
Nick Terrell
fbd97f305a Deprecated bufferless and block level APIs
* Mark all bufferless and block level functions as deprecated
* Update documentation to suggest not using these functions
* Add `_deprecated()` wrappers for functions that we use internally and
  call those instead
2023-03-16 10:04:15 -07:00
daniellerozenblit
53bad103ce
patch-from speed optimization (#3545)
* patch-from speed optimization: only load portion of dictionary into normal matchfinders

* test regression for x8 multiplier

* fix off-by-one error for bit shift bound

* restrict patchfrom speed optimization to strategy < ZSTD_btultra

* update results.csv

* update regression test
2023-03-14 20:36:56 -04:00
Yann Collet
f4563d87b9 added documentation for LDM + dictionary compatibility 2023-03-14 17:17:21 -07:00
Yonatan Komornik
91f4c23e63
Add salt into row hash (#3528 part 2) (#3533)
Part 2 of #3528

Adds hash salt that helps to avoid regressions where consecutive compressions use the same tag space with similar data (running zstd -b5e7 enwik8 -B128K reproduces this regression).
2023-03-13 15:34:13 -07:00
Yonatan Komornik
9420bce8a4
Add init once memory (#3528) (#3529)
- Adds memory type that is guaranteed to have been initialized at least once in the workspace's lifetime.
- Changes tag space in row hash to be based on init once memory.
2023-03-13 13:20:49 -07:00
Yonatan Komornik
a91e91d614
[Bugfix] row hash tries to match position 0 (#3548)
#3543 decreases the size of the tagTable by a factor of 2, which requires using the first tag position in each row for head position instead of a tag.
Although position 0 stopped being a valid match, it still persisted in mask calculation resulting in the matches loops possibly terminating before it should have. The fix skips position 0 to solve this problem.
2023-03-13 10:00:03 -07:00
Yonatan Komornik
33e39094e7
Reduce RowHash's tag space size by x2 (#3543)
Allocate half the memory for tag space, which means that we get one less slot for an actual tag (needs to be used for next position index).
The results is a slight loss in compression ratio (up to 0.2%) and some regressions/improvements to speed depending on level and sample. In turn, we get to save 16% of the hash table's space (5 bytes per entry instead of 6 bytes per entry).
2023-03-10 14:15:04 -08:00
Nick Terrell
c40c7378c6 Clarify dstCapacity requirements
Clarify `dstCapacity` requirements for single-pass functions.

Fixes #3524.
2023-03-09 10:18:30 -08:00
Nick Terrell
07a2a33135 Add ZSTD_set{C,F,}Params() helper functions
* Add ZSTD_setFParams() and ZSTD_setParams()
* Modify ZSTD_setCParams() to use ZSTD_setParameter() to avoid a second path setting parameters
* Add unit tests
* Update documentation to suggest using them to replace deprecated functions

Fixes #3396.
2023-03-08 09:57:35 -08:00
Yonatan Komornik
988ce61a0c
Adds initialization of clevel to static cdict (#3525) (#3527)
- Initializes clevel in `ZSTD_CCtxParams_init`
- Adds CI workflow for msan fuzzers runs without optimization (`-O0`)
- Fixes Makefile to correctly pass on user defined `MOREFLAGS` and `FUZZER_FLAGS` in cases they have been overwritten
2023-03-06 18:05:12 -08:00
Yann Collet
bd86e24637
Merge pull request #3513 from DimitriPapadopoulos/codespell
Fix typos found by codespell
2023-02-27 11:44:31 -08:00
Nick Terrell
395a2c5462 [bug-fix] Fix rare corruption bug affecting the block splitter
The block splitter confuses sequences with literal length == 65536 that use a
repeat offset code. It interprets this as literal length == 0 when deciding the
meaning of the repeat offset, and corrupts the repeat offset history. This is
benign, merely causing suboptimal compression performance, if the confused
history is flushed before the end of the block, e.g. if there are 3 consecutive
non-repeat code sequences after the mistake. It also is only triggered if the
block splitter decided to split the block.

All that to say: This is a rare bug, and requires quite a few conditions to
trigger. However, the good news is that if you have a way to validate that the
decompressed data is correct, e.g. you've enabled zstd's checksum or have a
checksum elsewhere, the original data is very likely recoverable. So if you were
affected by this bug please reach out.

The fix is to remind the block splitter that the literal length is actually 64K.
The test case is a bit tricky to set up, but I've managed to reproduce the issue.

Thanks to @danlark1 for alerting us to the issue and providing us a reproducer!
2023-02-23 10:54:31 -08:00
Dimitri Papadopoulos
547794ef40
Fix typos found by codespell 2023-02-18 10:31:48 +01:00
Yonatan Komornik
c78f434aa4
Fix zstd-dll build missing dependencies (#3496)
* Fixes zstd-dll build (https://github.com/facebook/zstd/issues/3492):
- Adds pool.o and threading.o dependency to the zstd-dll target
- Moves custom allocation functions into header to avoid needing to add dependency on common.o
- Adds test target for zstd-dll
- Adds github workflow that buildis zstd-dll
2023-02-12 12:32:31 -08:00
Elliot Gorokhovsky
a7de1d9f49
Fix all MSVC warnings (#3495)
* fix and test MSVC AVX2 build

* treat msbuild warnings as errors

* fix incorrect MSVC 2019 compiler warning

* fix MSVC error D9035: option 'Gm' has been deprecated and will be removed in a future release
2023-02-11 10:56:59 -05:00
Elliot Gorokhovsky
ff42ed1582
Rename "External Matchfinder" to "Block-Level Sequence Producer" (#3484)
* change "external matchfinder" to "external sequence producer"

* migrate contrib/ to new naming convention

* fix contrib build

* fix error message

* update debug strings

* fix def of invalid sequences in zstd.h

* nit

* update CHANGELOG

* fix .gitignore
2023-02-09 17:01:17 -05:00
Yann Collet
c689310b25 rewrite legacy v0.7 bound checks to be independent of address space overflow 2023-02-07 17:11:07 -08:00
Yann Collet
c5bf6b8b88 add requested check for legacy decoder v0.1
which uses a different technique to store literals,
and therefore must check for potential overwrites.
2023-02-07 14:47:16 -08:00
Yann Collet
9419747171 fix legacy decoders v0.4, v0.5 and v0.6 2023-02-07 14:02:12 -08:00
Yann Collet
67d7a659f8 port fix for v0.3 to v0.6
in case it would applicable for this version
2023-02-07 13:55:30 -08:00
Yann Collet
7a1a171658 port fix for v0.3 to v0.5
in case it would be applicable for this version too
2023-02-07 13:55:30 -08:00
Yann Collet
b20e4e95f2 copy fix for v0.3 to v0.4
in case it would be applicable for this legacy version too.
2023-02-07 13:55:30 -08:00
Yann Collet
7eb4471fec adapt v0.3 fix to v0.1
slightly different constraints on end of buffer conditions
2023-02-07 13:55:30 -08:00
Yann Collet
cfec005efd fix for v0.3 blindly ported to v0.2
in case it would be applicable here too.
2023-02-07 13:55:30 -08:00
Yann Collet
e04706c58c fix oss-fuzz case 55714
impacts legacy decoder v0.3 in 32-bit mode
2023-02-07 13:55:30 -08:00
Nick Terrell
71a0259247 Fix ZSTD_getOffsetInfo() when nbSeq == 0
In 32-bit mode, ZSTD_getOffsetInfo() can be called when nbSeq == 0, and
in this case the offset table is uninitialized. The function should just
return 0 for both values, because there are no sequences.

Credit to OSS-Fuzz
2023-02-02 14:26:41 -08:00
Elliot Gorokhovsky
31e41b3d5e
Merge pull request #3471 from embg/fast_seq_parse
Reduce external matchfinder API overhead by 25%
2023-02-01 21:30:36 -05:00
Elliot Gorokhovsky
3fe5f1fbb9 assert externalRepSearch != ZSTD_ps_auto 2023-02-01 18:24:46 -08:00
Nick Terrell
cc3e3acd34 Fix 32-bit decoding with large dictionary
The 32-bit decoder could corrupt the regenerated data by using regular
offset mode when there were actually long offsets. This is because we
were only considering the window size in the calculation, not the
dictionary size. So a large dictionary could allow longer offsets.

Fix this in two ways:
1. Instead of looking at the window size, look at the total referencable
   bytes in the history buffer. Use this in the comparison instead of
   the window size. Additionally, we were comparing against the wrong
   value, it was too low. Fix that by computing exactly the maximum
   offset for regular sequence decoding.
2. If it is possible that we have long offsets due to (1), then check
   the offset code decoding table, and if the decoding table's maximum
   number of additional bits is no more than STREAM_ACCUMULATOR_MIN,
   then we can't have long offsets.

This gates us to be using the long offsets decoder only when we are very
likely to actually have long offsets.

Note that this bug only affects the decoding of the data, and the
original compressed data, if re-read with a patched decoder, will
correctly regenerate the orginal data. Except that the encoder also had
the same issue previously.

This fixes both the open OSS-Fuzz issues.

Credit to OSS-Fuzz
2023-02-01 17:22:44 -08:00
Elliot Gorokhovsky
7f8189ca57 add ZSTD_c_fastExternalSequenceParsing cctxParam 2023-02-01 09:09:53 -08:00
Elliot Gorokhovsky
64052ef57d
Guard against invalid sequences from external matchfinders (#3465) 2023-01-31 13:55:48 -05:00