1
0
mirror of https://github.com/facebook/zstd.git synced 2025-03-06 16:56:49 +02:00

709 Commits

Author SHA1 Message Date
Elliot Gorokhovsky
559762da12
Remove duplicate and incorrect docs in zstd_decompress.c (#3967) 2024-03-14 15:55:01 -04:00
Nick Terrell
ff0afbad58 [asm][aarch64] Mark that BTI and PAC are supported
Mark that `huf_decompress_amd64.S` supports BTI and PAC, which it trivially does because it is empty for aarch64.

The issue only requested BTI markings, but it also makes sense to mark PAC, which is the only other feature.

Also run add a test for this mode to the ARM64 QEMU test. Before this PR it warns on `huf_decompress_amd64.S`, after it doesn't.

Fixes Issue #3841.
2024-03-13 16:15:51 -04:00
Elliot Gorokhovsky
f65b9e27ce
Exercise ZSTD_findDecompressedSize() in the simple decompression fuzzer (#3959)
* Improve decompression fuzzer

* Fix legacy frame header fuzzer crash, add unit test
2024-03-12 17:07:06 -04:00
Yann Collet
a9fb8d4c41 new method to deal with offset==0
in this new method, when an `offset==0` is detected,
it's converted into (size_t)(-1), instead of 1.

The logic is that (size_t)(-1) is effectively an extremely large positive number,
which will not pass the offset distance test at next stage (`execSequence()`).
Checked the source code, and offset is always checked (as it should),
using a formula which is not vulnerable to arithmetic overflow:
```
RETURN_ERROR_IF(sequence.offset > (size_t)(oLitEnd - virtualStart),
```

The benefit is that such a case (offset==0) is always detected as corrupted data
as opposed to relying on the checksum to detect the error.
2024-03-08 15:26:06 -08:00
Yann Collet
8689633fdf
Merge pull request #3840 from aimuz/fix-reserved
lib/decompress: check for reserved bit corruption in zstd
2024-03-05 13:40:12 -08:00
Yann Collet
f77f634d41 update API documentation 2024-02-24 01:28:17 -08:00
Yann Collet
4b51526412 fix partial block uncompressed 2024-02-24 01:24:58 -08:00
Yann Collet
4683667785 refactor optimal parser
store stretches as intermediate solution instead of sequences.
makes it possible to link a solution to a predecessor.
2024-01-31 02:51:46 -08:00
aimuz
468bb17378
lib/decompress: check for reserved bit corruption in zstd
The patch adds a validation to ensure that the last field, which is
reserved, must be all-zeroes in ZSTD_decodeSeqHeaders. This prevents
potential corruption from going undetected.

Fixes an issue where corrupted input could lead to undefined behavior
due to improper validation of reserved bits.

Signed-off-by: aimuz <mr.imuz@gmail.com>
2023-11-28 21:04:37 +08:00
Nick Terrell
8193250615 Modernize macros to use do { } while (0)
This PR introduces no functional changes. It attempts to change all
macros currently using `{ }` or some variant of that to to
`do { } while (0)`, and introduces trailing `;` where necessary.
There were no bugs found during this migration.

The bug in Visual Studios warning on this has been fixed since VS2015.
Additionally, we have several instances of `do { } while (0)` which have
been present for several releases, so we don't have to worry about
breaking peoples builds.

Fixes Issue #3830.
2023-11-21 20:05:17 -05:00
Nick Terrell
dd4de1dd7a [huf] Fix null pointer addition
`HUF_DecompressFastArgs_init()` was adding 0 to NULL. Fix it by exiting
early for empty outputs. This is no change in behavior, because the
function was already exiting 0 in this case, just slightly later.
2023-11-20 17:13:01 -05:00
Nick Terrell
5ab78c0418 [huf] Improve fast C & ASM performance on small data
* Rename `ilimit` to `ilowest` and set it equal to `src` instead of
  `src + 6 + 8`. This is safe because the fast decoding loops guarantee
  to never read below `ilowest` already. This allows the fast decoder to
  run for at least two more iterations, because it consumes at most 7
  bytes per iteration.
* Continue the fast loop all the way until the number of safe iterations
 is 0. Initially, I thought that when it got towards the end, the
 computation of how many iterations of safe might become expensive. But
 it ends up being slower to have to decode each of the 4 streams
 individually, which makes sense.

This drastically speeds up the Huffman decoder on the `github` dataset
for the issue raised in #3762, measured with `zstd -b1e1r github/`.

| Decoder  | Speed before | Speed after |
|----------|--------------|-------------|
| Fallback | 477 MB/s     | 477 MB/s    |
| Fast C   | 384 MB/s     | 492 MB/s    |
| Assembly | 385 MB/s     | 501 MB/s    |

We can also look at the speed delta for different block sizes of silesia
using `zstd -b1e1r silesia.tar -B#`.

| Decoder  | -B1K ∆ | -B2K ∆ | -B4K ∆ | -B8K ∆ | -B16K ∆ | -B32K ∆ | -B64K ∆ | -B128K ∆ |
|----------|--------|--------|--------|--------|---------|---------|---------|----------|
| Fast C   | +11.2% | +8.2%  | +6.1%  | +4.4%  | +2.7%   | +1.5%   | +0.6%   | +0.2%    |
| Assembly | +12.5% | +9.0%  | +6.2%  | +3.6%  | +1.5%   | +0.7%   | +0.2%   | +0.03%   |
2023-11-20 17:13:01 -05:00
Nick Terrell
c7269add7e [huf] Improve fast huffman decoding speed in linux kernel
gcc in the linux kernel was not unrolling the inner loops of the Huffman
decoder, which was destroying decoding performance. The compiler was
generating crazy code with all sorts of branches. I suspect because of
Spectre mitigations, but I'm not certain. Once the loops were manually
unrolled, performance was restored.

Additionally, when gcc couldn't prove that the variable left shift in
the 4X2 decode loop wasn't greater than 63, it inserted checks to verify
it. To fix this, mask `entry.nbBits & 0x3F`, which allows gcc to eliete
this check. This is a no op, because `entry.nbBits` is guaranteed to be
less than 64.

Lastly, introduce the `HUF_DISABLE_FAST_DECODE` macro to disable the
fast C loops for Issue #3762. So if even after this change, there is a
performance regression, users can opt-out at compile time.
2023-11-20 14:56:46 -05:00
Yann Collet
c1e588fcb4
Merge pull request #3771 from DimitriPapadopoulos/codespell
Fix new typos found by codespell
2023-10-07 19:29:41 -07:00
Nick Terrell
43118da8a7 Stop suppressing pointer-overflow UBSAN errors
* Remove all pointer-overflow suppressions from our UBSAN builds/tests.
* Add `ZSTD_ALLOW_POINTER_OVERFLOW_ATTR` macro to suppress
  pointer-overflow at a per-function level. This is a superior approach
  because it also applies to users who build zstd with UBSAN.
* Add `ZSTD_wrappedPtr{Diff,Add,Sub}()` that use these suppressions.
  The end goal is to only tag these functions with
  `ZSTD_ALLOW_POINTER_OVERFLOW`. But we can start by annoting functions
  that rely on pointer overflow, and gradually transition to using
  these.
* Add `ZSTD_maybeNullPtrAdd()` to simplify pointer addition when the
  pointer may be `NULL`.
* Fix all the fuzzer issues that came up. I'm sure there will be a lot
  more, but these are the ones that came up within a few minutes of
  running the fuzzers, and while running GitHub CI.
2023-09-28 17:35:05 -04:00
Nick Terrell
3daed7017a Revert "Work around nullptr-with-nonzero-offset warning"
This reverts commit c27fa399042f466080e79bb4fd8a4871bc0bcf28.
2023-09-28 17:35:05 -04:00
Dimitri Papadopoulos
fe34776c20
Fix new typos found by codespell 2023-09-23 18:56:01 +02:00
Nick Terrell
cdceb0fce5 Improve macro guards for ZSTD_assertValidSequence
Refine the macro guards to define the functions exactly when they are
needed.

This fixes the chromium build with zstd.

Thanks to @GregTho for reporting!
2023-09-22 16:36:14 -04:00
Nick Terrell
c27fa39904 Work around nullptr-with-nonzero-offset warning
See comment.
2023-08-25 13:20:59 -04:00
Yann Collet
c123e69ad0 fixed static analyzer false positive regarding @sequence initialization
make a mock initialization to please the tool
2023-06-16 16:24:48 -07:00
Yann Collet
c60dcedcc9 adapted long decoder to new decodeSequences
removed older decodeSequences
2023-06-16 15:52:00 -07:00
Yann Collet
33fca19dd4 changed ZSTD_decompressSequences_bodySplitLitBuffer() decoding loop
to behave more like the regular decoding loop.
2023-06-16 15:32:07 -07:00
Yann Collet
84e898a76c removed _old variant from splitLit 2023-06-16 14:42:28 -07:00
Yann Collet
02134fad12 changed (partially) the decodeSequences flow logic
this allows detecting overflow events without a checksum.
2023-06-16 11:57:12 -07:00
Yann Collet
b46236278a detect extraneous bytes in the Sequences section
when nbSeq == 0.

Reported by @ip7z
2023-06-13 11:43:45 -07:00
Yann Collet
3732a08f5b fixed decoder behavior when nbSeqs==0 is encoded using 2 bytes
The sequence section starts with a number, which tells how sequences are present in the section.
If this number if 0, the section automatically ends.

The number 0 can be represented using the 1 byte or the 2 bytes formats.
That's because the 2-bytes formats fully overlaps the 1 byte format.

However, when 0 is represented using the 2-bytes format,
the decoder was expecting the sequence section to continue,
and was looking for FSE tables, which is incorrect.

Fixed this behavior, in both the reference decoder and the educational behavior.

In practice, this behavior never happens,
because the encoder will always select the 1-byte format to represent 0,
since this is more efficient.

Completed the fix with a new golden sample for tests,
a clarification of the specification,
and a decoder errata paragraph.
2023-06-05 16:03:00 -07:00
Nick Terrell
61efb2a047 Add ZSTD_d_maxBlockSize parameter
Reduces memory when blocks are guaranteed to be smaller than allowed by
the format. This is useful for streaming compression in conjunction with
ZSTD_c_maxBlockSize.

This PR saves 2 * (formatMaxBlockSize - paramMaxBlockSize) when streaming.
Once it is rebased on top of PR #3616 it will save
3 * (formatMaxBlockSize - paramMaxBlockSize).
2023-04-17 22:06:44 -07:00
Nick Terrell
0abf2baef9 Reduce streaming decompression memory by 128KB
The split literals buffer patch increased streaming decompression memory
by 64KB (shrunk lit buffer from 128KB to 64KB, and added 128KB). This
patch removes the added 128KB buffer, because it isn't necessary.

The buffer was there because the literals compression code didn't know
the true `blockSizeMax` of the frame, and always put split literals so
they ended 128KB - 32 from the beginning of the block. Instead, we can
pass down the true `blockSizeMax` and ensure that the split literals
end up at `blockSizeMax - 32` from the beginning of the block. We
already reserve a full `blockSizeMax` bytes in streaming mode, so we
won't be overwriting the extDict window.
2023-04-17 16:31:02 -07:00
Yann Collet
e4120c5513 fixing potential over-reads
detected by @terrelln,
these issue could be triggered in specific scenarios
namely decompression of certain invalid magic-less frames,
or requested properties from certain invalid skippable frames.
2023-04-03 16:52:32 -07:00
daniellerozenblit
fcaf06ddb4
Check that dest is valid for decompression (#3555)
* add check for valid dest buffer and fuzz on random dest ptr when malloc 0

* add uptrval to linux-kernel

* remove bin files

* get rid of uptrval

* restrict max pointer value check to platforms where sizeof(size_t) == sizeof(void*)
2023-03-31 23:00:55 -07:00
Han Zhu
b558190ac7 Remove clang-only branch hints from ZSTD_decodeSequence
Looking at the __builtin_expect in ZSTD_decodeSequence:

{   size_t offset;
    #if defined(__clang__)
 if (LIKELY(ofBits > 1)) {
    #else
 if (ofBits > 1) {
    #endif
 ZSTD_STATIC_ASSERT(ZSTD_lo_isLongOffset == 1);

From profile-annotated assembly, the probability of ofBits > 1 is about 75%
(101k counts out of 135k counts). This is much smaller than the recommended
likelihood to use __builtin_expect which is 99%. As a result, clang moved the
else block further away which hurts cache locality. Removing this
__built_expect along with two others in ZSTD_decodeSequence gave better
performance when PGO is enabled. I suggest to remove these branch hints and
rely on PGO which leverages runtime profiles from actual workload to calculate
branch probability instead.
2023-03-28 15:36:22 -07:00
Nick Terrell
fbd97f305a Deprecated bufferless and block level APIs
* Mark all bufferless and block level functions as deprecated
* Update documentation to suggest not using these functions
* Add `_deprecated()` wrappers for functions that we use internally and
  call those instead
2023-03-16 10:04:15 -07:00
Dimitri Papadopoulos
547794ef40
Fix typos found by codespell 2023-02-18 10:31:48 +01:00
Yonatan Komornik
c78f434aa4
Fix zstd-dll build missing dependencies (#3496)
* Fixes zstd-dll build (https://github.com/facebook/zstd/issues/3492):
- Adds pool.o and threading.o dependency to the zstd-dll target
- Moves custom allocation functions into header to avoid needing to add dependency on common.o
- Adds test target for zstd-dll
- Adds github workflow that buildis zstd-dll
2023-02-12 12:32:31 -08:00
Elliot Gorokhovsky
a7de1d9f49
Fix all MSVC warnings (#3495)
* fix and test MSVC AVX2 build

* treat msbuild warnings as errors

* fix incorrect MSVC 2019 compiler warning

* fix MSVC error D9035: option 'Gm' has been deprecated and will be removed in a future release
2023-02-11 10:56:59 -05:00
Nick Terrell
71a0259247 Fix ZSTD_getOffsetInfo() when nbSeq == 0
In 32-bit mode, ZSTD_getOffsetInfo() can be called when nbSeq == 0, and
in this case the offset table is uninitialized. The function should just
return 0 for both values, because there are no sequences.

Credit to OSS-Fuzz
2023-02-02 14:26:41 -08:00
Nick Terrell
cc3e3acd34 Fix 32-bit decoding with large dictionary
The 32-bit decoder could corrupt the regenerated data by using regular
offset mode when there were actually long offsets. This is because we
were only considering the window size in the calculation, not the
dictionary size. So a large dictionary could allow longer offsets.

Fix this in two ways:
1. Instead of looking at the window size, look at the total referencable
   bytes in the history buffer. Use this in the comparison instead of
   the window size. Additionally, we were comparing against the wrong
   value, it was too low. Fix that by computing exactly the maximum
   offset for regular sequence decoding.
2. If it is possible that we have long offsets due to (1), then check
   the offset code decoding table, and if the decoding table's maximum
   number of additional bits is no more than STREAM_ACCUMULATOR_MIN,
   then we can't have long offsets.

This gates us to be using the long offsets decoder only when we are very
likely to actually have long offsets.

Note that this bug only affects the decoding of the data, and the
original compressed data, if re-read with a patched decoder, will
correctly regenerate the orginal data. Except that the encoder also had
the same issue previously.

This fixes both the open OSS-Fuzz issues.

Credit to OSS-Fuzz
2023-02-01 17:22:44 -08:00
Nick Terrell
2f74507bbd Simplify 32-bit long offsets decoding logic
The previous code had an issue when `bitsConsumed == 32` it would read 0
bits for the `ofBits` read, which violates the precondition of
`BIT_readBitsFast()`. This can happen when the stream is corrupted.

Fix thie issue by always reading the maximum possible number of extra
bits. I've measured neutral decoding performance, likely because this
branch is unlikely, but this should be faster anyways. And if not, it is
only 32-bit decoding, so performance isn't as critical.

Credit to OSS-Fuzz
2023-01-30 12:21:42 -08:00
Nick Terrell
b3b43f2893 Fix invalid assert in 32-bit decoding
The assert is only correct for valid sequences, so disable it for
everything execpt round trip fuzzers.
2023-01-27 14:40:38 -08:00
Nick Terrell
bda947e17a [huf] Fix bug in fast C decoders
The input bounds checks were buggy because they were only breaking from
the inner loop, not the outer loop. The fuzzers found this immediately.
The fix is to use `goto _out` instead of `break`.

This condition can happen on corrupted inputs.

I've benchmarked before and after on x86-64 and there were small changes
in performance, some positive, and some negative, and they end up about
balacing out.

Credit to  OSS-Fuzz
2023-01-26 14:39:13 -08:00
Yann Collet
efc9ae3480
Merge pull request #3455 from facebook/fix3454
Provide more accurate error codes for busy-loop scenarios
2023-01-25 15:22:51 -08:00
Nick Terrell
8957fef554 [huf] Add generic C versions of the fast decoding loops
Add generic C versions of the fast decoding loops to serve architectures
that don't have an assembly implementation. Also allow selecting the C
decoding loop over the assembly decoding loop through a zstd
decompression parameter `ZSTD_d_disableHuffmanAssembly`.

I benchmarked on my Intel i9-9900K and my Macbook Air with an M1 processor.
The benchmark command forces zstd to compress without any matches, using
only literals compression, and measures only Huffman decompression speed:

```
zstd -b1e1 --compress-literals --zstd=tlen=131072 silesia.tar
```

The new fast decoding loops outperform the previous implementation uniformly,
but don't beat the x86-64 assembly. Additionally, the fast C decoding loops suffer
from the same stability problems that we've seen in the past, where the assembly
version doesn't. So even though clang gets close to assembly on x86-64, it still
has stability issues.

| Arch    | Function       | Compiler     | Default (MB/s) | Assembly (MB/s) | Fast (MB/s) |
|---------|----------------|--------------|----------------|-----------------|-------------|
| x86-64  | decompress 4X1 | gcc-12.2.0   |         1029.6 |          1308.1 |      1208.1 |
| x86-64  | decompress 4X1 | clang-14.0.6 |         1019.3 |          1305.6 |      1276.3 |
| x86-64  | decompress 4X2 | gcc-12.2.0   |         1348.5 |          1657.0 |      1374.1 |
| x86-64  | decompress 4X2 | clang-14.0.6 |         1027.6 |          1659.9 |      1468.1 |
| aarch64 | decompress 4X1 | clang-12.0.5 |         1081.0 |             N/A |      1234.9 |
| aarch64 | decompress 4X2 | clang-12.0.5 |         1270.0 |             N/A |      1516.6 |
2023-01-25 13:47:51 -08:00
Yann Collet
db18a62f89 Provide more accurate error codes for busy-loop scenarios
fixes #3454
2023-01-25 13:07:53 -08:00
Nick Terrell
dc2b3e8876 Fix -Wstringop-overflow warning
Backported from kernel patch [0].

I wasn't able to reproduce the warning locally, but could repro it in
the kernel.

[0] https://lore.kernel.org/lkml/20220330193352.GA119296@embeddedor/
2023-01-23 10:12:25 -08:00
Nick Terrell
329169189c Replace Huffman boolean args with flags bit set 2023-01-20 14:12:53 -08:00
Nick Terrell
0cc1b0cb22 Delete unused Huffman functions
Remove all Huffman functions that aren't used by zstd.
2023-01-20 14:12:53 -08:00
Yann Collet
ea684c335a added c89 build test to CI 2023-01-19 14:59:30 -08:00
Elliot Gorokhovsky
5d8cfa6b96
Deprecate advanced streaming functions (#3408)
* deprecate advanced streaming functions

* remove internal usage of the deprecated functions

* nit

* suppress warnings in tests/zstreamtest.c

* purge ZSTD_initDStream_usingDict

* nits

* c90 compat

* zstreamtest.c already disables deprecation warnings!

* fix initDStream() return value

* fix typo

* wasn't able to import private symbol properly, this commit works around that

* new strategy for zbuff

* undo zbuff deprecation warning changes

* move ZSTD_DISABLE_DEPRECATE_WARNINGS from .h to .c
2023-01-13 14:51:47 -05:00
Yann Collet
d5509080bc
Merge pull request #3419 from facebook/fix3416
fix root cause of #3416
2023-01-13 00:21:08 -08:00
Nick Terrell
5b266196a4 Add support for in-place decompression
* Add a function and macro ZSTD_decompressionMargin() that computes the
  decompression margin for in-place decompression. The function computes
  a tight margin that works in all cases, and the macro computes an upper
  bound that will only work if flush isn't used.
* When doing in-place decompression, make sure that our output buffer
  doesn't overlap with the input buffer. This ensures that we don't
  decide to use the portion of the output buffer that overlaps the input
  buffer for temporary memory, like for literals.
* Add a simple unit test.
* Add in-place decompression to the simple_round_trip and
  stream_round_trip fuzzers. This should help verify that our margin stays
  correct.
2023-01-12 16:28:08 -08:00