1
0
mirror of https://github.com/facebook/zstd.git synced 2025-07-04 14:48:45 +02:00
Commit Graph

103 Commits

Author SHA1 Message Date
b46236278a detect extraneous bytes in the Sequences section
when nbSeq == 0.

Reported by @ip7z
2023-06-13 11:43:45 -07:00
3732a08f5b fixed decoder behavior when nbSeqs==0 is encoded using 2 bytes
The sequence section starts with a number, which tells how sequences are present in the section.
If this number if 0, the section automatically ends.

The number 0 can be represented using the 1 byte or the 2 bytes formats.
That's because the 2-bytes formats fully overlaps the 1 byte format.

However, when 0 is represented using the 2-bytes format,
the decoder was expecting the sequence section to continue,
and was looking for FSE tables, which is incorrect.

Fixed this behavior, in both the reference decoder and the educational behavior.

In practice, this behavior never happens,
because the encoder will always select the 1-byte format to represent 0,
since this is more efficient.

Completed the fix with a new golden sample for tests,
a clarification of the specification,
and a decoder errata paragraph.
2023-06-05 16:03:00 -07:00
0abf2baef9 Reduce streaming decompression memory by 128KB
The split literals buffer patch increased streaming decompression memory
by 64KB (shrunk lit buffer from 128KB to 64KB, and added 128KB). This
patch removes the added 128KB buffer, because it isn't necessary.

The buffer was there because the literals compression code didn't know
the true `blockSizeMax` of the frame, and always put split literals so
they ended 128KB - 32 from the beginning of the block. Instead, we can
pass down the true `blockSizeMax` and ensure that the split literals
end up at `blockSizeMax - 32` from the beginning of the block. We
already reserve a full `blockSizeMax` bytes in streaming mode, so we
won't be overwriting the extDict window.
2023-04-17 16:31:02 -07:00
fcaf06ddb4 Check that dest is valid for decompression (#3555)
* add check for valid dest buffer and fuzz on random dest ptr when malloc 0

* add uptrval to linux-kernel

* remove bin files

* get rid of uptrval

* restrict max pointer value check to platforms where sizeof(size_t) == sizeof(void*)
2023-03-31 23:00:55 -07:00
b558190ac7 Remove clang-only branch hints from ZSTD_decodeSequence
Looking at the __builtin_expect in ZSTD_decodeSequence:

{   size_t offset;
    #if defined(__clang__)
 if (LIKELY(ofBits > 1)) {
    #else
 if (ofBits > 1) {
    #endif
 ZSTD_STATIC_ASSERT(ZSTD_lo_isLongOffset == 1);

From profile-annotated assembly, the probability of ofBits > 1 is about 75%
(101k counts out of 135k counts). This is much smaller than the recommended
likelihood to use __builtin_expect which is 99%. As a result, clang moved the
else block further away which hurts cache locality. Removing this
__built_expect along with two others in ZSTD_decodeSequence gave better
performance when PGO is enabled. I suggest to remove these branch hints and
rely on PGO which leverages runtime profiles from actual workload to calculate
branch probability instead.
2023-03-28 15:36:22 -07:00
fbd97f305a Deprecated bufferless and block level APIs
* Mark all bufferless and block level functions as deprecated
* Update documentation to suggest not using these functions
* Add `_deprecated()` wrappers for functions that we use internally and
  call those instead
2023-03-16 10:04:15 -07:00
547794ef40 Fix typos found by codespell 2023-02-18 10:31:48 +01:00
71a0259247 Fix ZSTD_getOffsetInfo() when nbSeq == 0
In 32-bit mode, ZSTD_getOffsetInfo() can be called when nbSeq == 0, and
in this case the offset table is uninitialized. The function should just
return 0 for both values, because there are no sequences.

Credit to OSS-Fuzz
2023-02-02 14:26:41 -08:00
cc3e3acd34 Fix 32-bit decoding with large dictionary
The 32-bit decoder could corrupt the regenerated data by using regular
offset mode when there were actually long offsets. This is because we
were only considering the window size in the calculation, not the
dictionary size. So a large dictionary could allow longer offsets.

Fix this in two ways:
1. Instead of looking at the window size, look at the total referencable
   bytes in the history buffer. Use this in the comparison instead of
   the window size. Additionally, we were comparing against the wrong
   value, it was too low. Fix that by computing exactly the maximum
   offset for regular sequence decoding.
2. If it is possible that we have long offsets due to (1), then check
   the offset code decoding table, and if the decoding table's maximum
   number of additional bits is no more than STREAM_ACCUMULATOR_MIN,
   then we can't have long offsets.

This gates us to be using the long offsets decoder only when we are very
likely to actually have long offsets.

Note that this bug only affects the decoding of the data, and the
original compressed data, if re-read with a patched decoder, will
correctly regenerate the orginal data. Except that the encoder also had
the same issue previously.

This fixes both the open OSS-Fuzz issues.

Credit to OSS-Fuzz
2023-02-01 17:22:44 -08:00
2f74507bbd Simplify 32-bit long offsets decoding logic
The previous code had an issue when `bitsConsumed == 32` it would read 0
bits for the `ofBits` read, which violates the precondition of
`BIT_readBitsFast()`. This can happen when the stream is corrupted.

Fix thie issue by always reading the maximum possible number of extra
bits. I've measured neutral decoding performance, likely because this
branch is unlikely, but this should be faster anyways. And if not, it is
only 32-bit decoding, so performance isn't as critical.

Credit to OSS-Fuzz
2023-01-30 12:21:42 -08:00
b3b43f2893 Fix invalid assert in 32-bit decoding
The assert is only correct for valid sequences, so disable it for
everything execpt round trip fuzzers.
2023-01-27 14:40:38 -08:00
8957fef554 [huf] Add generic C versions of the fast decoding loops
Add generic C versions of the fast decoding loops to serve architectures
that don't have an assembly implementation. Also allow selecting the C
decoding loop over the assembly decoding loop through a zstd
decompression parameter `ZSTD_d_disableHuffmanAssembly`.

I benchmarked on my Intel i9-9900K and my Macbook Air with an M1 processor.
The benchmark command forces zstd to compress without any matches, using
only literals compression, and measures only Huffman decompression speed:

```
zstd -b1e1 --compress-literals --zstd=tlen=131072 silesia.tar
```

The new fast decoding loops outperform the previous implementation uniformly,
but don't beat the x86-64 assembly. Additionally, the fast C decoding loops suffer
from the same stability problems that we've seen in the past, where the assembly
version doesn't. So even though clang gets close to assembly on x86-64, it still
has stability issues.

| Arch    | Function       | Compiler     | Default (MB/s) | Assembly (MB/s) | Fast (MB/s) |
|---------|----------------|--------------|----------------|-----------------|-------------|
| x86-64  | decompress 4X1 | gcc-12.2.0   |         1029.6 |          1308.1 |      1208.1 |
| x86-64  | decompress 4X1 | clang-14.0.6 |         1019.3 |          1305.6 |      1276.3 |
| x86-64  | decompress 4X2 | gcc-12.2.0   |         1348.5 |          1657.0 |      1374.1 |
| x86-64  | decompress 4X2 | clang-14.0.6 |         1027.6 |          1659.9 |      1468.1 |
| aarch64 | decompress 4X1 | clang-12.0.5 |         1081.0 |             N/A |      1234.9 |
| aarch64 | decompress 4X2 | clang-12.0.5 |         1270.0 |             N/A |      1516.6 |
2023-01-25 13:47:51 -08:00
329169189c Replace Huffman boolean args with flags bit set 2023-01-20 14:12:53 -08:00
0cc1b0cb22 Delete unused Huffman functions
Remove all Huffman functions that aren't used by zstd.
2023-01-20 14:12:53 -08:00
089b2797e3 Merge pull request #3398 from facebook/fix3316
spec update : require minimum nb of literals for 4-streams mode
2022-12-22 16:57:05 -08:00
6a9c525903 spec update : require minimum nb of literals for 4-streams mode
Reported by @shulib :
the specification for 4-streams mode
doesn't work when the amount of literals to compress is 5 bytes.
Extending it, it also doesn't work for sizes 1 or 2.

This patch updates the specification and the implementation
to require a minimum of 6 literals to trigger or accept the 4-streams mode.

The impact is expected to be a no-op :
the 4-streams mode is never triggered for such small quantity of literals anyway,
since it would be wasteful (it costs ~7.3 bytes more than single-stream mode).
An informal lower limit is set at ~256 bytes,
so the technical minimum is very far from this limit.

This is just meant for completeness of the specification.
2022-12-22 16:14:34 -08:00
ea2895cef4 Support decompression of compressed blocks of size ZSTD_BLOCKSIZE_MAX exactly 2022-12-22 12:40:27 -08:00
5d693cc38c Coalesce Almost All Copyright Notices to Standard Phrasing
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache \) -prune -o -type f); do sed -i '/Copyright .* \(Yann Collet\)\|\(Meta Platforms\)/ s/Copyright .*/Copyright (c) Meta Platforms, Inc. and affiliates./' $f; done

git checkout HEAD -- build/VS2010/libzstd-dll/libzstd-dll.rc build/VS2010/zstd/zstd.rc tests/test-license.py contrib/linux-kernel/test/include/linux/xxhash.h examples/streaming_compression_thread_pool.c lib/legacy/zstd_v0*.c lib/legacy/zstd_v0*.h
nano ./programs/windres/zstd.rc
nano ./build/VS2010/zstd/zstd.rc
nano ./build/VS2010/libzstd-dll/libzstd-dll.rc
```
2022-12-20 12:52:34 -05:00
8927f985ff Update Copyright Headers 'Facebook' -> 'Meta Platforms'
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora \) -prune -o -type f);
do
  sed -i 's/Facebook, Inc\./Meta Platforms, Inc. and affiliates./' $f;
done
```
2022-12-20 12:37:57 -05:00
a70ca2bd7d Fix off-by-one error in superblock mode (#3221)
Fixes #3212.

Long literal and match lengths had an off-by-one error in ZSTD_getSequenceLength.
Fix the off-by-one error, and add a golden compression test that catches the bug.
Also run all the golden tests in the cli-tests framework.
2022-08-03 11:28:39 -07:00
ec5fdcde19 lib: add hint to generate more pipeline friendly code (#3138)
With statistic data of test data files of silesia
the chance of position beyond highThreshold is very
low (~1.3%@L8 in most cases, all <2.5%), and is in
"lowprob area". Add the branch hint so compiler can
get better pipiline codegen.
With this change it is observed ~1% of mozilla and
xml, and slight (0.3%~0.8%) but consistent uplift on
other files on Arm N1.

Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: Id9ba1d5c767e975290b5c1bf0ecce906544f4ade
2022-07-29 10:28:04 -07:00
558cf20d0d decomp: add prefetch for matched seq on aarch64 (#3164)
match is used for following sequence copy. It is
only updated when extDict is needed, which is a
low probability case. So it can be prefetched to
reduce cache miss.
The benchmarks on various Arm platforms showed
uplift from 1% ~ 14% with gcc-11/clang-14.

Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: If201af4799d2455d74c79f8387404439d7f684ae
2022-07-29 10:27:20 -07:00
43f21a600e Intial commit to address 3090. Added support to decompress empty block. (#3118)
* Intial commit to address 3090. Added support to decompress empty block

* Update zstd_decompress_block.c

Addressed review comments for the case of 'set_basic'

* Update lib/decompress/zstd_decompress_block.c

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>

* Update lib/decompress/zstd_decompress_block.c

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
2022-07-14 11:54:34 -07:00
2491c65937 dec: adjust seqSymbol load on aarch64
ZSTD_seqSymbol is a structure with total of 64 bits
wide. So it can be loaded in one operation and
extract its fields by simply shifting or extracting
on aarch64.
GCC doesn't recognize this and generates more
unnecessary ldr/ldrb/ldrh operations that cause
performance drop.
With this change it is observed 2~4% uplift of
silesia and 2.5~6% of cantrbry @L8 on Arm N1.

Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I7748909204cf78a17eb9d4f2333692d53239daa8
2022-05-30 22:01:38 +08:00
b772f53952 Typo and grammar fixes 2022-03-12 08:58:04 +01:00
db2f4a6532 Move bitwise builtins into bits.h 2022-02-14 11:16:03 -05:00
7616e39f3b adding traces to better track processing of literals 2022-01-26 14:47:21 -08:00
2fbb1d10c1 Reduce bit tables to 8bit
This saves some 1.7Kb in rodata section (x86_64, zstd tool),
while assembler code stays the same except
the type of a few load/extend instructions.

Should not have negative performance implications.
2021-12-14 23:47:57 +01:00
5414dd7978 [bmi2] Add lzcnt and bmi target attributes
* When dynamic dispatching to bmi2 add lzcnt and bmi to the
  TARGET_ATTRIBUTE.
* Centralize the bmi2 TARGET_ATTRIBUTE definition to
  BMI2_TARGET_ATTRIBUTE so we can change it in the future.
* Only enable bmi2 when both bmi1 & bmi2 are supported. There shouldn't
  be any cases where bmi2 is supported but bmi1 isn't. But, since we are
  using the instruction we should check bmi1 as well.
2021-11-30 17:54:56 -08:00
04734ee84a Fix oss fuzz test error (#2837) 2021-10-29 10:29:50 -04:00
6a7ede3dfc Reduce size of dctx by reutilizing dst buffer (#2751)
* Reduce size of dctx by reutilizing dst buffer

Co-authored-by: Binh Vo <binhvo@fb.com>
2021-10-25 10:38:01 -04:00
0d45540695 decompress: conditionally remove bmi2 from context
Use an helper function, which will just return 0 in case
the feature is disabled.
Allows constant propagation and removal of dead code.
2021-09-26 14:41:37 +02:00
189e87bcbe [lib] Make lib compatible with -Wfall-through excepting legacy
Switch to a macro `ZSTD_FALLTHROUGH;` instead of a comment. On supported
compilers this uses an attribute, otherwise it becomes a comment.

This is necessary to be compatible with clang's `-Wfall-through`, and
gcc's `-Wfall-through=2` which don't support comments. Without this the
linux build emits a bunch of warnings.

Also add a test to CI to ensure that we don't regress.
2021-09-23 10:51:18 -07:00
2c2c9e7dfd Add possible improvements for gcc-11 2021-06-29 09:06:47 +01:00
08a3ddbd28 Add comment for gcc-11 2021-06-08 20:54:21 +01:00
6534c0000f Be C89 compliant and fix alignment for gcc11 2021-06-08 20:45:57 +01:00
a80d268700 Optimize ZSTD_decodeSequence by another x% 2021-05-29 18:21:10 +01:00
439e58d060 improved gcc-9 and gcc-10 decoding speed
the new alignment setting is better for gcc-9 and gcc-10
by about ~+5%.

Unfortunately, it's worse for essentially all other compilers.

Make the new alignment setting conditional to gcc-9+.
2021-05-08 00:01:01 -07:00
6755baf940 update decoder hot loop alignment
This seems to bring an additional ~+1.2% decompression speed
on average across 10 compilers x 6 scenarios.
2021-05-07 15:18:16 -07:00
1db5947591 improve decompression speed of long variant by ~+5%
changed strategy,
now unconditionally prefetch the first 2 cache lines,
instead of cache lines corresponding to the first and last bytes of the match.

This better corresponds to cpu expectation,
which should auto-prefetch following cachelines on detecting the sequential nature of the read.

This is globally positive, by +5%,
though exact gains depend on compiler (from -2% to +15%).
The only negative counter-example is gcc-9.
2021-05-07 11:26:14 -07:00
ee425faaa7 Merge branch 'dev' into d_prefetch_refactor 2021-05-06 19:49:26 -07:00
b052b583e5 [lib] Fix UBSAN warning in ZSTD_decompressSequences() 2021-05-06 15:31:30 -07:00
7ef6d7b36c deeper prefetching pipeline for decompressSequencesLong
pipeline increased from 4 to 8 slots.
This change substantially improves decompression speed when there are long distance offsets.
example with enwik9 compressed at level 22 :
gcc-9 : 947 -> 1039 MB/s
clang-10: 884 -> 946 MB/s

I also checked the "cold dictionary" scenario,
and found a smaller benefit, around ~2%
(measurements are more noisy for this scenario).
2021-05-05 10:04:03 -07:00
8cde167a27 Merge branch 'dev' into d_prefetch_refactor 2021-05-05 09:13:38 -07:00
a494308ae9 [copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files
* Switch to yearless copyright per FB policy
* Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources
* Add zstd copyright/license header to the `contrib/linux-kernel` sources
* Update the `tests/test-license.py` to check for yearless copyright
* Improvements to `tests/test-license.py`
* Check `contrib/linux-kernel` in `tests/test-license.py`
2021-03-30 10:30:43 -07:00
f5434663ea Refactor prefetching for the decoding loop
Following #2545,
I noticed that one field in `seq_t` is optional,
and only used in combination with prefetching.
(This may have contributed to static analyzer failure to detect correct initialization).

I then wondered if it would be possible to rewrite the code
so that this optional part is handled directly by the prefetching code
rather than delegated as an option into `ZSTD_decodeSequence()`.

This resulted into this refactoring exercise
where the prefetching responsibility is better isolated into its own function
and `ZSTD_decodeSequence()` is streamlined to contain strictly Sequence decoding operations.
Incidently, due to better code locality,
it reduces the need to send information around,
leading to simplified interface, and smaller state structures.
2021-03-19 15:48:17 -07:00
f9b1e711ba [zstd] Fix NULL pointer addition in ZSTD_checkContinuity()
Don't start a new section when `dstSize == 0` to avoid NULL
pointer addition.
2021-02-05 12:18:06 -08:00
b9748757b0 fixed minor cast warning 2021-02-05 09:55:54 -08:00
66e811d782 [license] Update year to 2021 2021-01-04 17:53:52 -05:00
0b39531d75 moving all references to release branch
was previously `master`
2020-12-16 23:00:35 -08:00