1
0
mirror of https://github.com/facebook/zstd.git synced 2025-03-07 01:10:04 +02:00

4391 Commits

Author SHA1 Message Date
Dimitri Papadopoulos
547794ef40
Fix typos found by codespell 2023-02-18 10:31:48 +01:00
Yonatan Komornik
c78f434aa4
Fix zstd-dll build missing dependencies (#3496)
* Fixes zstd-dll build (https://github.com/facebook/zstd/issues/3492):
- Adds pool.o and threading.o dependency to the zstd-dll target
- Moves custom allocation functions into header to avoid needing to add dependency on common.o
- Adds test target for zstd-dll
- Adds github workflow that buildis zstd-dll
2023-02-12 12:32:31 -08:00
Elliot Gorokhovsky
a7de1d9f49
Fix all MSVC warnings (#3495)
* fix and test MSVC AVX2 build

* treat msbuild warnings as errors

* fix incorrect MSVC 2019 compiler warning

* fix MSVC error D9035: option 'Gm' has been deprecated and will be removed in a future release
2023-02-11 10:56:59 -05:00
Elliot Gorokhovsky
ff42ed1582
Rename "External Matchfinder" to "Block-Level Sequence Producer" (#3484)
* change "external matchfinder" to "external sequence producer"

* migrate contrib/ to new naming convention

* fix contrib build

* fix error message

* update debug strings

* fix def of invalid sequences in zstd.h

* nit

* update CHANGELOG

* fix .gitignore
2023-02-09 17:01:17 -05:00
Yann Collet
c689310b25 rewrite legacy v0.7 bound checks to be independent of address space overflow 2023-02-07 17:11:07 -08:00
Yann Collet
c5bf6b8b88 add requested check for legacy decoder v0.1
which uses a different technique to store literals,
and therefore must check for potential overwrites.
2023-02-07 14:47:16 -08:00
Yann Collet
9419747171 fix legacy decoders v0.4, v0.5 and v0.6 2023-02-07 14:02:12 -08:00
Yann Collet
67d7a659f8 port fix for v0.3 to v0.6
in case it would applicable for this version
2023-02-07 13:55:30 -08:00
Yann Collet
7a1a171658 port fix for v0.3 to v0.5
in case it would be applicable for this version too
2023-02-07 13:55:30 -08:00
Yann Collet
b20e4e95f2 copy fix for v0.3 to v0.4
in case it would be applicable for this legacy version too.
2023-02-07 13:55:30 -08:00
Yann Collet
7eb4471fec adapt v0.3 fix to v0.1
slightly different constraints on end of buffer conditions
2023-02-07 13:55:30 -08:00
Yann Collet
cfec005efd fix for v0.3 blindly ported to v0.2
in case it would be applicable here too.
2023-02-07 13:55:30 -08:00
Yann Collet
e04706c58c fix oss-fuzz case 55714
impacts legacy decoder v0.3 in 32-bit mode
2023-02-07 13:55:30 -08:00
Nick Terrell
71a0259247 Fix ZSTD_getOffsetInfo() when nbSeq == 0
In 32-bit mode, ZSTD_getOffsetInfo() can be called when nbSeq == 0, and
in this case the offset table is uninitialized. The function should just
return 0 for both values, because there are no sequences.

Credit to OSS-Fuzz
2023-02-02 14:26:41 -08:00
Elliot Gorokhovsky
31e41b3d5e
Merge pull request #3471 from embg/fast_seq_parse
Reduce external matchfinder API overhead by 25%
2023-02-01 21:30:36 -05:00
Elliot Gorokhovsky
3fe5f1fbb9 assert externalRepSearch != ZSTD_ps_auto 2023-02-01 18:24:46 -08:00
Nick Terrell
cc3e3acd34 Fix 32-bit decoding with large dictionary
The 32-bit decoder could corrupt the regenerated data by using regular
offset mode when there were actually long offsets. This is because we
were only considering the window size in the calculation, not the
dictionary size. So a large dictionary could allow longer offsets.

Fix this in two ways:
1. Instead of looking at the window size, look at the total referencable
   bytes in the history buffer. Use this in the comparison instead of
   the window size. Additionally, we were comparing against the wrong
   value, it was too low. Fix that by computing exactly the maximum
   offset for regular sequence decoding.
2. If it is possible that we have long offsets due to (1), then check
   the offset code decoding table, and if the decoding table's maximum
   number of additional bits is no more than STREAM_ACCUMULATOR_MIN,
   then we can't have long offsets.

This gates us to be using the long offsets decoder only when we are very
likely to actually have long offsets.

Note that this bug only affects the decoding of the data, and the
original compressed data, if re-read with a patched decoder, will
correctly regenerate the orginal data. Except that the encoder also had
the same issue previously.

This fixes both the open OSS-Fuzz issues.

Credit to OSS-Fuzz
2023-02-01 17:22:44 -08:00
Elliot Gorokhovsky
7f8189ca57 add ZSTD_c_fastExternalSequenceParsing cctxParam 2023-02-01 09:09:53 -08:00
Elliot Gorokhovsky
64052ef57d
Guard against invalid sequences from external matchfinders (#3465) 2023-01-31 13:55:48 -05:00
Yann Collet
39ceef27f9 bump version number to v1.5.4
start preparation for release
2023-01-30 19:06:39 -08:00
Nick Terrell
2f74507bbd Simplify 32-bit long offsets decoding logic
The previous code had an issue when `bitsConsumed == 32` it would read 0
bits for the `ofBits` read, which violates the precondition of
`BIT_readBitsFast()`. This can happen when the stream is corrupted.

Fix thie issue by always reading the maximum possible number of extra
bits. I've measured neutral decoding performance, likely because this
branch is unlikely, but this should be faster anyways. And if not, it is
only 32-bit decoding, so performance isn't as critical.

Credit to OSS-Fuzz
2023-01-30 12:21:42 -08:00
daniellerozenblit
00176638e3
Merge pull request #3460 from daniellerozenblit/fix-long-offsets-resolution-pointer
fix long offset resolution
2023-01-30 14:02:51 -05:00
Nick Terrell
b3b43f2893 Fix invalid assert in 32-bit decoding
The assert is only correct for valid sequences, so disable it for
everything execpt round trip fuzzers.
2023-01-27 14:40:38 -08:00
daniellerozenblit
2bde9fbf85
Update lib/compress/zstd_compress.c
Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
2023-01-27 16:58:53 -05:00
Nick Terrell
423a74986f [fse] Delete unused functions
Delete all unused FSE functions, now that we are no longer syncing
to/from upstream.

This avoids confusion about Zstd's stack usage like in Issue #3453.
It also removes dead code, which is always a plus.
2023-01-27 13:15:07 -08:00
Danielle Rozenblit
9e4c66b9e9 record long offsets in ZSTD_symbolEncodingTypeStats_t + add test case 2023-01-27 12:04:29 -08:00
Danielle Rozenblit
814f4bfb99 fix long offset resolution 2023-01-27 08:21:47 -08:00
Nick Terrell
bda947e17a [huf] Fix bug in fast C decoders
The input bounds checks were buggy because they were only breaking from
the inner loop, not the outer loop. The fuzzers found this immediately.
The fix is to use `goto _out` instead of `break`.

This condition can happen on corrupted inputs.

I've benchmarked before and after on x86-64 and there were small changes
in performance, some positive, and some negative, and they end up about
balacing out.

Credit to  OSS-Fuzz
2023-01-26 14:39:13 -08:00
Yann Collet
efc9ae3480
Merge pull request #3455 from facebook/fix3454
Provide more accurate error codes for busy-loop scenarios
2023-01-25 15:22:51 -08:00
Nick Terrell
8957fef554 [huf] Add generic C versions of the fast decoding loops
Add generic C versions of the fast decoding loops to serve architectures
that don't have an assembly implementation. Also allow selecting the C
decoding loop over the assembly decoding loop through a zstd
decompression parameter `ZSTD_d_disableHuffmanAssembly`.

I benchmarked on my Intel i9-9900K and my Macbook Air with an M1 processor.
The benchmark command forces zstd to compress without any matches, using
only literals compression, and measures only Huffman decompression speed:

```
zstd -b1e1 --compress-literals --zstd=tlen=131072 silesia.tar
```

The new fast decoding loops outperform the previous implementation uniformly,
but don't beat the x86-64 assembly. Additionally, the fast C decoding loops suffer
from the same stability problems that we've seen in the past, where the assembly
version doesn't. So even though clang gets close to assembly on x86-64, it still
has stability issues.

| Arch    | Function       | Compiler     | Default (MB/s) | Assembly (MB/s) | Fast (MB/s) |
|---------|----------------|--------------|----------------|-----------------|-------------|
| x86-64  | decompress 4X1 | gcc-12.2.0   |         1029.6 |          1308.1 |      1208.1 |
| x86-64  | decompress 4X1 | clang-14.0.6 |         1019.3 |          1305.6 |      1276.3 |
| x86-64  | decompress 4X2 | gcc-12.2.0   |         1348.5 |          1657.0 |      1374.1 |
| x86-64  | decompress 4X2 | clang-14.0.6 |         1027.6 |          1659.9 |      1468.1 |
| aarch64 | decompress 4X1 | clang-12.0.5 |         1081.0 |             N/A |      1234.9 |
| aarch64 | decompress 4X2 | clang-12.0.5 |         1270.0 |             N/A |      1516.6 |
2023-01-25 13:47:51 -08:00
Yann Collet
db18a62f89 Provide more accurate error codes for busy-loop scenarios
fixes #3454
2023-01-25 13:07:53 -08:00
daniellerozenblit
f3255bfeff
Merge pull request #3447 from daniellerozenblit/fuzz-sequence-compression
Fuzz large offsets through sequence compression api
2023-01-25 09:27:34 -05:00
Yonatan Komornik
1d636b4ba0 Bug fix redzones by unpoisoning only the intended buffer and not the followup redzone. 2023-01-24 12:54:43 -08:00
Danielle Rozenblit
7d600c628a fix bound check for ZSTD_copySequencesToSeqStoreNoBlockDelim() 2023-01-24 06:40:40 -08:00
Elliot Gorokhovsky
41682e6293
Merge pull request #3448 from facebook/embg-doc-fix
Fix ZSTD_estimate* and ZSTD_initCStream() docs
2023-01-23 15:04:53 -05:00
daniellerozenblit
9116000be6
Merge pull request #3439 from daniellerozenblit/sequence-validation-bug-fix
Fix sequence validation and seqStore bounds check
2023-01-23 13:50:37 -05:00
Elliot Gorokhovsky
3bfd3be5fb
Fix ZSTD_estimate* and ZSTD_initCStream() docs
Fix the following documentation bugs:
* Note that `ZSTD_estimate*` functions are not compatible with the external matchfinder API
* Note that `ZSTD_estimateCStreamSize_usingCCtxParams()` is not compatible with `nbWorkers >= 1`
* Remove incorrect warning that the legacy streaming API is incompatible with advanced parameters and/or dictionary compression
* Note that `ZSTD_initCStream()` is incompatible with dictionary compression
* Warn that
2023-01-23 13:28:36 -05:00
Nick Terrell
dc2b3e8876 Fix -Wstringop-overflow warning
Backported from kernel patch [0].

I wasn't able to reproduce the warning locally, but could repro it in
the kernel.

[0] https://lore.kernel.org/lkml/20220330193352.GA119296@embeddedor/
2023-01-23 10:12:25 -08:00
Danielle Rozenblit
815d1d4eda update external sequence error to fit error naming scheme 2023-01-23 09:58:34 -08:00
Danielle Rozenblit
1b65727e74 fix nits and add new error code for invalid external sequences 2023-01-23 07:59:02 -08:00
Yann Collet
d9280afb7d fixed minor c89 warning
introduced due to parallel merges
2023-01-20 18:04:20 -08:00
Nick Terrell
b4467c1061 Fix bufferless API with attached dictionary
Fixes #3102.
2023-01-20 16:15:16 -08:00
Nick Terrell
329169189c Replace Huffman boolean args with flags bit set 2023-01-20 14:12:53 -08:00
Nick Terrell
0cc1b0cb22 Delete unused Huffman functions
Remove all Huffman functions that aren't used by zstd.
2023-01-20 14:12:53 -08:00
Yann Collet
6742f20a7f
Merge pull request #3435 from facebook/c89build
added c89 build test to CI
2023-01-20 14:07:12 -08:00
Nick Terrell
666944fbe6 Cap hashLog & chainLog to ensure that we only use 32 bits of hash
* Cap shortCache chainLog to 24
* Cap row match finder hashLog so that rowLog <= 24
* Add unit tests to expose all cases. The row match finder unit tests
  are only run in 64-bit mode, because they allocate ~1GB.

Fixes #3336
2023-01-20 14:05:26 -08:00
Danielle Rozenblit
aa385ece13 fix sequence validation and bounds check in ZSTD_copySequencesToSeqStore() 2023-01-20 10:32:35 -08:00
Yann Collet
ea684c335a added c89 build test to CI 2023-01-19 14:59:30 -08:00
Elliot Gorokhovsky
bce0382c82
Bugfixes for the External Matchfinder API (#3433)
* external matchfinder bugfixes + tests

* small doc fix
2023-01-19 10:41:24 -05:00
daniellerozenblit
dc1c6cc5df
Merge pull request #3418 from daniellerozenblit/fuzz-max-block-size
Fuzz on maxBlockSize
2023-01-19 08:18:04 -05:00