1
0
mirror of https://github.com/facebook/zstd.git synced 2025-03-06 08:49:28 +02:00

341 Commits

Author SHA1 Message Date
Yann Collet
c6e5257240
Merge pull request #3977 from facebook/doc_advanced
Doc update
2024-03-21 12:33:15 -07:00
Elliot Gorokhovsky
741b87bbe1
Fuzzing and bugfixes for magicless-format decoding (#3976)
* fuzzing and bugfixes for magicless format

* reset dctx before each decompression

* do not memcmp empty buffers

* nit: decompressor errata
2024-03-20 19:22:34 -04:00
Yann Collet
c5da438dc0 fix typo 2024-03-18 12:33:22 -07:00
Yann Collet
3d18d9a9ce updated API manual 2024-03-18 12:30:54 -07:00
Yann Collet
686e7e4b4b updated version to v1.5.6 2024-03-14 15:38:14 -07:00
Yann Collet
eb5f7a7fa2 produced golden sample for the offset==0 decoder test
is correctly detected as corrupted by new version,
and is accepted (changed into offset==1) by older version.

updated documentation accordingly, with an hexadecimal representation.
2024-03-09 00:33:44 -08:00
Yann Collet
d2f56ba442 update documentation 2024-03-08 15:55:30 -08:00
Yann Collet
e127139ceb
Merge pull request #3824 from elasota/specify-zero-offset
Specify offset 0 as invalid and specify required fixup behavior
2024-03-08 15:25:48 -08:00
Yann Collet
478e5fedf9
Merge pull request #3816 from elasota/fix-state-table
Fix state table formatting
2024-03-08 15:02:00 -08:00
Yann Collet
f77f634d41 update API documentation 2024-02-24 01:28:17 -08:00
Yann Collet
7971fd16f7
Merge pull request #3817 from elasota/oversized-probs-clarification
Clarify that probability tables must not contain non-zero probabilities for invalid values
2024-01-13 11:37:54 -08:00
elasota
f06b18b3ff Specify offset 0 as invalid 2023-12-28 16:47:09 -05:00
elasota
05059e5a48 Clarify that there must be at least 2 weights, i.e. encoding all weights as 0 is invalid 2023-11-24 16:49:40 -05:00
elasota
dc84e35138 Clarify that the presence of a value with weight 1 is required 2023-11-24 16:49:40 -05:00
elasota
c5bf96fb74 Clarify that a non-zero probability for an invalid symbol is invalid 2023-11-13 00:03:56 -05:00
elasota
52e41b9ac8 Fix malformed state table 2023-11-09 12:28:21 -05:00
elasota
e61e3ff152 Clarify that decoding too many Huffman weights is a failure condition 2023-11-08 20:06:58 -05:00
elasota
324cce4996 Add definition of "log2sup" function 2023-10-31 11:45:10 -04:00
elasota
b38d87b476 Clarify that the log2 of the largest possible symbol is the maximum number of bits consumed 2023-10-31 01:17:23 -04:00
Dimitri Papadopoulos
fe34776c20
Fix new typos found by codespell 2023-09-23 18:56:01 +02:00
Yann Collet
3732a08f5b fixed decoder behavior when nbSeqs==0 is encoded using 2 bytes
The sequence section starts with a number, which tells how sequences are present in the section.
If this number if 0, the section automatically ends.

The number 0 can be represented using the 1 byte or the 2 bytes formats.
That's because the 2-bytes formats fully overlaps the 1 byte format.

However, when 0 is represented using the 2-bytes format,
the decoder was expecting the sequence section to continue,
and was looking for FSE tables, which is incorrect.

Fixed this behavior, in both the reference decoder and the educational behavior.

In practice, this behavior never happens,
because the encoder will always select the 1-byte format to represent 0,
since this is more efficient.

Completed the fix with a new golden sample for tests,
a clarification of the specification,
and a decoder errata paragraph.
2023-06-05 16:03:00 -07:00
Yann Collet
8030342eea
Merge pull request #3659 from facebook/fixHarness
Fixed a bug in the educational decoder
2023-06-05 15:03:14 -04:00
Yann Collet
1f83b7cfc4 fix a minor inefficiency in compress_superblock
and in `decodecorpus`:
the specific case `nbSeq=127` can be represented using the 1-byte format.
Note that both the 1-byte and the 2-bytes formats are valid to represent this case,
so there was no "error", produced data remains valid,
it's just that the 1-byte format is more efficient.

fix #3667

Credit to @ip7z for finding this issue.
2023-06-05 09:51:52 -07:00
Yann Collet
5108c9ac97 Fixed a bug in the educational decoder
Credit to Igor Pavlov
2023-05-27 11:22:30 -07:00
Yann Collet
0d6954b4cc added golden file for the new decompressor erratum 2023-04-19 00:24:35 -07:00
Yann Collet
a29b6ed251 added decoder errata paragraph
for compressed blocks of size exactly 128 KB
which used to be disallowed by the spec
but have become allowed in more recent version of the spec.

While this limitation is fixed in decoders v1.5.4+,
implementers should refrain from generating such block with their custom encoder
as they could be misclassified as corrupted by older decoder versions.
2023-04-17 15:43:27 -07:00
Yann Collet
9f58241dcc updated version number to v1.5.5
also : updated man pages
2023-03-31 23:02:08 -07:00
Yann Collet
64e8511b26 added clarifications for sizes of compressed huffman blocks and streams. 2023-03-08 15:31:36 -08:00
Yann Collet
832f559b0b clarify zstd specification for Huffman blocks
Following detailed comments from @dweiller in #3508.
2023-02-18 18:18:16 -08:00
Yann Collet
95ffc767f6 updated man pages 2023-02-09 14:40:39 -08:00
Yann Collet
39ceef27f9 bump version number to v1.5.4
start preparation for release
2023-01-30 19:06:39 -08:00
Yann Collet
6a9c525903 spec update : require minimum nb of literals for 4-streams mode
Reported by @shulib :
the specification for 4-streams mode
doesn't work when the amount of literals to compress is 5 bytes.
Extending it, it also doesn't work for sizes 1 or 2.

This patch updates the specification and the implementation
to require a minimum of 6 literals to trigger or accept the 4-streams mode.

The impact is expected to be a no-op :
the 4-streams mode is never triggered for such small quantity of literals anyway,
since it would be wasteful (it costs ~7.3 bytes more than single-stream mode).
An informal lower limit is set at ~256 bytes,
so the technical minimum is very far from this limit.

This is just meant for completeness of the specification.
2022-12-22 16:14:34 -08:00
W. Felix Handte
5d693cc38c Coalesce Almost All Copyright Notices to Standard Phrasing
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache \) -prune -o -type f); do sed -i '/Copyright .* \(Yann Collet\)\|\(Meta Platforms\)/ s/Copyright .*/Copyright (c) Meta Platforms, Inc. and affiliates./' $f; done

git checkout HEAD -- build/VS2010/libzstd-dll/libzstd-dll.rc build/VS2010/zstd/zstd.rc tests/test-license.py contrib/linux-kernel/test/include/linux/xxhash.h examples/streaming_compression_thread_pool.c lib/legacy/zstd_v0*.c lib/legacy/zstd_v0*.h
nano ./programs/windres/zstd.rc
nano ./build/VS2010/zstd/zstd.rc
nano ./build/VS2010/libzstd-dll/libzstd-dll.rc
```
2022-12-20 12:52:34 -05:00
W. Felix Handte
7f12f24cf4 Rewrite Copyright Date Ranges from -present to -2022
Apparently it's better. Somehow.

```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache \) -prune -o -type f); do echo $f; sed -i 's/\-present/-2022/' $f; done

g co HEAD -- build/meson/
```
2022-12-20 12:44:56 -05:00
W. Felix Handte
36d5c2f326 Update Copyright Year ('2021' -> 'present')
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache \) -prune -o -type f);
do
  sed -i 's/\-2021/-present/' $f;
done

g co HEAD -- .github/workflows/dev-short-tests.yml # fix bad match
```
2022-12-20 12:42:50 -05:00
W. Felix Handte
8927f985ff Update Copyright Headers 'Facebook' -> 'Meta Platforms'
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora \) -prune -o -type f);
do
  sed -i 's/Facebook, Inc\./Meta Platforms, Inc. and affiliates./' $f;
done
```
2022-12-20 12:37:57 -05:00
Yann Collet
1bc9dfe46e Update documentation link to html format
fix #3319
2022-12-15 15:57:29 -08:00
Danielle Rozenblit
4dffc35f2e Convert references to https from http 2022-12-14 06:58:35 -08:00
Nick Terrell
0f4fd28a64
Deprecate ZSTD_getDecompressedSize() (#3225)
Fixes #3158.

Mark ZSTD_getDecompressedSize() as deprecated and replaced by ZSTD_getFrameContentSize().
2022-08-01 11:52:14 -07:00
Mathew R Gordon
85d633042d
Add transparency and optimize logo (#3218)
Make the front page look better in dark GH themes
2022-07-29 10:17:31 -07:00
Elliot Gorokhovsky
5c382bf110 1.5.3 version bump 2022-06-29 14:45:53 -04:00
Yann Collet
f33ccd2d1b fix small error in format documentation example
reported by @dkcasset
fix #3142
2022-05-24 04:47:49 -07:00
cuishuang
05796796fd fix some typos
Signed-off-by: cuishuang <imcusg@gmail.com>
2022-04-26 17:40:23 +08:00
Dominique Pelle
b772f53952 Typo and grammar fixes 2022-03-12 08:58:04 +01:00
Nick Terrell
696fa2524a [doc] Add decompressor errata document
Add a document that lists the known bugs in the decoder where valid
frames are rejected, along with the version that they were fixed.
2022-03-09 14:40:41 -08:00
Yann Collet
e9dd923fa4 only declare debug functions in debug mode 2022-01-26 14:47:24 -08:00
W. Felix Handte
e1323744b6 Update Docs 2022-01-07 14:14:26 -05:00
Yann Collet
41153071a0 updated manual 2021-12-21 08:52:50 -08:00
Yann Collet
a9e43b37d0
Revert "Limit ZSTD_maxCLevel to 21 for 32-bit binaries." 2021-12-20 11:43:14 -08:00
Yonatan Komornik
ef2cba609d ZSTD_maxCLevel now limited to 21 for 32-bit binaries.
CI tests for constrained memory runs with max level on 32-bit binaries.
2021-11-30 10:31:52 -08:00