mirror of https://github.com/facebook/zstd.git synced 2025-03-06 08:49:28 +02:00

65 Commits

Author SHA1 Message Date
elasota
8cff66f2f5 Remove text specifying probability overflow as invalid, the variable-size value encoding scheme makes this impossible. 2024-04-01 20:08:42 -04:00
Yann Collet
e127139ceb
Merge pull request #3824 from elasota/specify-zero-offset
Specify offset 0 as invalid and specify required fixup behavior
2024-03-08 15:25:48 -08:00
Yann Collet
478e5fedf9
Merge pull request #3816 from elasota/fix-state-table
Fix state table formatting
2024-03-08 15:02:00 -08:00
Yann Collet
7971fd16f7
Merge pull request #3817 from elasota/oversized-probs-clarification
Clarify that probability tables must not contain non-zero probabilities for invalid values
2024-01-13 11:37:54 -08:00
elasota
f06b18b3ff Specify offset 0 as invalid 2023-12-28 16:47:09 -05:00
elasota
05059e5a48 Clarify that there must be at least 2 weights, i.e. encoding all weights as 0 is invalid 2023-11-24 16:49:40 -05:00
elasota
dc84e35138 Clarify that the presence of a value with weight 1 is required 2023-11-24 16:49:40 -05:00
elasota
c5bf96fb74 Clarify that a non-zero probability for an invalid symbol is invalid 2023-11-13 00:03:56 -05:00
elasota
52e41b9ac8 Fix malformed state table 2023-11-09 12:28:21 -05:00
elasota
e61e3ff152 Clarify that decoding too many Huffman weights is a failure condition 2023-11-08 20:06:58 -05:00
elasota
324cce4996 Add definition of "log2sup" function 2023-10-31 11:45:10 -04:00
elasota
b38d87b476 Clarify that the log2 of the largest possible symbol is the maximum number of bits consumed 2023-10-31 01:17:23 -04:00
Yann Collet
3732a08f5b fixed decoder behavior when nbSeqs==0 is encoded using 2 bytes
The sequence section starts with a number, which tells how many sequences are present in the section.
If this number is 0, the section automatically ends.

The number 0 can be represented using either the 1-byte or the 2-byte format.
That's because the 2-byte format fully overlaps the 1-byte format.

However, when 0 is represented using the 2-byte format,
the decoder was expecting the sequence section to continue,
and was looking for FSE tables, which is incorrect.

Fixed this behavior, in both the reference decoder and the educational decoder.

In practice, this behavior never happens,
because the encoder will always select the 1-byte format to represent 0,
since this is more efficient.

Completed the fix with a new golden sample for tests,
a clarification of the specification,
and a decoder errata paragraph.
2023-06-05 16:03:00 -07:00
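
The behavior described in commit 3732a08f5b above can be sketched as follows. This is a minimal illustration, not the reference decoder: the function names are hypothetical, and the 1-, 2- and 3-byte layouts are taken from the `Number_of_Sequences` description in the format specification. The point it shows is that the "section ends" decision depends on the decoded value, not on how many bytes were used to encode it.

```python
from typing import Tuple


def read_number_of_sequences(data: bytes) -> Tuple[int, int]:
    """Return (number_of_sequences, header_bytes_consumed).

    Hypothetical helper; layouts follow the format specification:
      byte0 < 128  -> 1 byte
      byte0 < 255  -> 2 bytes: ((byte0 - 0x80) << 8) + byte1
      byte0 == 255 -> 3 bytes: byte1 + (byte2 << 8) + 0x7F00
    """
    if not data:
        raise ValueError("empty sequences section")
    byte0 = data[0]
    if byte0 < 128:
        return byte0, 1
    if byte0 < 255:
        if len(data) < 2:
            raise ValueError("truncated 2-byte sequence count")
        return ((byte0 - 0x80) << 8) + data[1], 2
    if len(data) < 3:
        raise ValueError("truncated 3-byte sequence count")
    return data[1] + (data[2] << 8) + 0x7F00, 3


def expects_fse_tables(data: bytes) -> bool:
    """True when the section continues after the count (count != 0)."""
    count, _ = read_number_of_sequences(data)
    return count != 0
```

With this sketch, both `b"\x00"` and `b"\x80\x00"` decode to a count of 0, so neither is followed by FSE table descriptions.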
Yann Collet
1f83b7cfc4 fix a minor inefficiency in compress_superblock
and in `decodecorpus`:
the specific case `nbSeq=127` can be represented using the 1-byte format.
Note that both the 1-byte and the 2-byte formats are valid to represent this case,
so there was no "error": produced data remains valid;
the 1-byte format is simply more efficient.

fix #3667

Credit to @ip7z for finding this issue.
2023-06-05 09:51:52 -07:00
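
The `nbSeq=127` case in commit 1f83b7cfc4 above comes down to which boundary the encoder uses when choosing a format. The following is a sketch under assumptions: the function and constant names are hypothetical, and the byte layouts are those of the format specification rather than a copy of `compress_superblock`.

```python
# Hypothetical name for the first value that needs the 3-byte format.
THREE_BYTE_BASE = 0x7F00


def write_number_of_sequences(nb_seq: int) -> bytes:
    """Encode a sequence count using the shortest valid format."""
    if not 0 <= nb_seq <= THREE_BYTE_BASE + 0xFFFF:
        raise ValueError("sequence count out of range")
    if nb_seq < 128:
        # 1-byte format covers 0..127 inclusive, so 127 belongs here.
        return bytes([nb_seq])
    if nb_seq < THREE_BYTE_BASE:
        # 2-byte format: high bits offset by 0x80, then the low byte.
        return bytes([(nb_seq >> 8) + 0x80, nb_seq & 0xFF])
    # 3-byte format: marker 0xFF, then (nb_seq - 0x7F00) as little-endian 16-bit.
    return bytes([0xFF]) + (nb_seq - THREE_BYTE_BASE).to_bytes(2, "little")
```

A boundary of `nb_seq < 127` in the first test would still produce valid data for 127 (the 2-byte form `0x80 0x7F`), only one byte longer than necessary.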
Yann Collet
64e8511b26 added clarifications for sizes of compressed huffman blocks and streams. 2023-03-08 15:31:36 -08:00
Yann Collet
832f559b0b clarify zstd specification for Huffman blocks
Following detailed comments from @dweiller in #3508.
2023-02-18 18:18:16 -08:00
Yann Collet
6a9c525903 spec update : require minimum nb of literals for 4-streams mode
Reported by @shulib :
the specification for 4-streams mode
doesn't work when the amount of literals to compress is 5 bytes.
Extending it, it also doesn't work for sizes 1 or 2.

This patch updates the specification and the implementation
to require a minimum of 6 literals to trigger or accept the 4-streams mode.

The impact is expected to be a no-op :
the 4-streams mode is never triggered for such a small quantity of literals anyway,
since it would be wasteful (it costs ~7.3 bytes more than single-stream mode).
An informal lower limit is set at ~256 bytes,
so the technical minimum is very far from this limit.

This is just meant for completeness of the specification.
2022-12-22 16:14:34 -08:00
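
The sizes named in commit 6a9c525903 above can be reproduced with a little arithmetic. This sketch assumes the stream-size rule from the format specification (each of the first three streams regenerates `(Regenerated_Size + 3) / 4` bytes, and the last stream takes the remainder); the function names are hypothetical.

```python
def four_stream_sizes(regenerated_size: int) -> list:
    """Regenerated size of each of the 4 Huffman streams."""
    segment = (regenerated_size + 3) // 4
    last = regenerated_size - 3 * segment  # remainder left for stream 4
    return [segment, segment, segment, last]


def sizes_with_negative_last_stream(limit: int) -> list:
    """Literal counts in 1..limit where the last stream would be negative."""
    return [n for n in range(1, limit + 1) if four_stream_sizes(n)[3] < 0]
```

Under that rule the last stream comes out negative for exactly 1, 2 and 5 literals, which matches the sizes the commit message lists. Sizes 3 and 4 do not go negative in this arithmetic, but the commit still sets the minimum at 6.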
W. Felix Handte
5d693cc38c Coalesce Almost All Copyright Notices to Standard Phrasing
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache \) -prune -o -type f); do sed -i '/Copyright .* \(Yann Collet\)\|\(Meta Platforms\)/ s/Copyright .*/Copyright (c) Meta Platforms, Inc. and affiliates./' $f; done

git checkout HEAD -- build/VS2010/libzstd-dll/libzstd-dll.rc build/VS2010/zstd/zstd.rc tests/test-license.py contrib/linux-kernel/test/include/linux/xxhash.h examples/streaming_compression_thread_pool.c lib/legacy/zstd_v0*.c lib/legacy/zstd_v0*.h
nano ./programs/windres/zstd.rc
nano ./build/VS2010/zstd/zstd.rc
nano ./build/VS2010/libzstd-dll/libzstd-dll.rc
```
2022-12-20 12:52:34 -05:00
W. Felix Handte
7f12f24cf4 Rewrite Copyright Date Ranges from -present to -2022
Apparently it's better. Somehow.

```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache \) -prune -o -type f); do echo $f; sed -i 's/\-present/-2022/' $f; done

g co HEAD -- build/meson/
```
2022-12-20 12:44:56 -05:00
W. Felix Handte
36d5c2f326 Update Copyright Year ('2021' -> 'present')
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache \) -prune -o -type f);
do
  sed -i 's/\-2021/-present/' $f;
done

g co HEAD -- .github/workflows/dev-short-tests.yml # fix bad match
```
2022-12-20 12:42:50 -05:00
W. Felix Handte
8927f985ff Update Copyright Headers 'Facebook' -> 'Meta Platforms'
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora \) -prune -o -type f);
do
  sed -i 's/Facebook, Inc\./Meta Platforms, Inc. and affiliates./' $f;
done
```
2022-12-20 12:37:57 -05:00
Danielle Rozenblit
4dffc35f2e Convert references to https from http 2022-12-14 06:58:35 -08:00
Yann Collet
f33ccd2d1b fix small error in format documentation example
reported by @dkcasset
fix #3142
2022-05-24 04:47:49 -07:00
Dominique Pelle
b772f53952 Typo and grammar fixes 2022-03-12 08:58:04 +01:00
Dimitris Apostolou
ebbd675998
Fix typos 2021-11-13 10:04:04 +02:00
Yann Collet
0b0b62d1cf minor mention of RFC8878
more recent update
2021-05-15 23:04:46 -07:00
senhuang42
1d6d64afa3 Change year to 2021 for compression format file 2021-01-11 08:53:29 -05:00
W. Felix Handte
2d46d764cf Update Zstd Compression Format to Clarify Repcode Behavior 2020-12-09 20:03:58 -05:00
senhuang42
8adeb9f1e6 Updated repcode documentation to reflect dict content size 2020-09-22 13:24:27 -04:00
senhuang42
9dcfe4d7b7 Update documentation about repcodes in dictionaries 2020-09-22 13:02:26 -04:00
Yann Collet
11a392ce23 minor markdown formatting fix 2020-05-26 13:15:35 -07:00
Yann Collet
bb3c9bf43a updated spec on dictID==0
Specified decoder behavior on receiving a frame with dictID=0.

Pushed paragraph on reserved DictID ranges into the Dictionary Format section.
2020-05-25 08:15:09 -07:00
Yann Collet
098b36e9ab clarifications for Block_Maximum_Size
as a follow up of #1882
2019-11-13 09:50:15 -08:00
Yann Collet
ff7bd16c0a clarifications for the FSE decoding table
requested in #1782
2019-10-18 17:48:12 -07:00
Yann Collet
97bb38635c number instead of nb
suggested by @terrelln
2019-08-17 08:04:42 +02:00
Yann Collet
1e07eb4d5c clarifications on the meaning of field Block_Size
following comments from Intel's Smita Kumar.
2019-08-16 15:15:25 +02:00
W. Felix Handte
a2861d75eb [doc] Bump Format Spec Version 2019-07-17 18:55:45 -04:00
W. Felix Handte
c05b270edc [doc] Remove Limitation that Compressed Block is Smaller than Uncompressed Content
This changes the size limit on compressed blocks to match those of the other
block types: they may not be larger than the `Block_Maximum_Decompressed_Size`,
which is the smaller of the `Window_Size` and 128 KB, removing the additional
restriction that had been placed on `Compressed_Block`s, that they be smaller
than the decompressed content they represent.

Several things motivate removing this restriction. On the one hand, this
restriction is not useful for decoders: the decoder must nonetheless be
prepared to accept compressed blocks that are the full
`Block_Maximum_Decompressed_Size`. And on the other, this bound is actually
artificially limiting. If block representations were entirely independent,
a compressed representation of a block that is larger than the contents of the
block would be ipso facto useless, and it would be strictly better to send it
as a `Raw_Block`. However, blocks are not entirely independent, and it can
make sense to pay the cost of encoding custom entropy tables in a block, even
if that pushes that block size over the size of the data it represents,
because those tables can be re-used by subsequent blocks.

Finally, as far as I can tell, this restriction in the spec is not currently
enforced in any Zstandard implementation, nor has it ever been. This change
should therefore be safe to make.
2019-07-17 18:55:45 -04:00
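
The rule change in commit c05b270edc above can be expressed as two predicates. This is only a sketch: the function and parameter names are hypothetical, and the "old" rule is paraphrased from the commit message rather than from an earlier version of the specification text.

```python
BLOCK_SIZE_CAP = 128 * 1024  # 128 KB, as stated in the commit message


def block_maximum_decompressed_size(window_size: int) -> int:
    """The smaller of Window_Size and 128 KB."""
    return min(window_size, BLOCK_SIZE_CAP)


def old_rule_ok(block_size: int, content_size: int, window_size: int) -> bool:
    """Previous wording: also smaller than the content it represents."""
    limit = block_maximum_decompressed_size(window_size)
    return block_size <= limit and block_size < content_size


def new_rule_ok(block_size: int, window_size: int) -> bool:
    """Current wording: only the common block size limit applies."""
    return block_size <= block_maximum_decompressed_size(window_size)
```

For example, a 1100-byte compressed block that carries custom entropy tables for 1000 bytes of content fails `old_rule_ok` but passes `new_rule_ok`.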
Yann Collet
9bf00707c7 minor clarifications of history update rules 2018-10-26 15:51:51 -07:00
Ulrich Kunitz
f0fe9b0f02 Reverted removal of a trailing space.
My editor removes trailing spaces while saving. To avoid confusing things, I
reverted that change.
2018-10-23 08:43:19 +02:00
Ulrich Kunitz
4f702e4445 Fixed a typo
I fixed a typo in the last commit. Many thanks to @terrelin for pointing
that out.
2018-10-23 08:36:50 +02:00
Ulrich Kunitz
c7942caff0 Clarify special case of offset history update
If the current sequence has literal length of zero then an offset value
of three is handled in a special manner. While I implemented a golang
decoder I had to consult the educational decoder for clarification on
the update of the offset history in that case. This commit provides the
clarification that the offset value Repeated_Offset1-1 is handled as a
new offset and is added to the offset history accordingly.
2018-10-22 23:46:43 +02:00
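
The special case in commit c7942caff0 above is easier to follow next to the other repeat-offset cases. The sketch below is an illustration under assumptions: `resolve_offset` and its parameters are hypothetical names, the history is modeled as a 3-element list `[Repeated_Offset1, Repeated_Offset2, Repeated_Offset3]`, and a resulting offset of 0 is simply rejected rather than handled by any fixup rule.

```python
def resolve_offset(offset_value, literals_length, history):
    """Return (actual_offset, new_history) for one sequence."""
    if offset_value < 1:
        raise ValueError("offset_value must be at least 1")
    rep1, rep2, rep3 = history
    if offset_value > 3:
        # Regular new offset: pushed to the front of the history.
        offset = offset_value - 3
        return offset, [offset, rep1, rep2]
    # A literal length of zero shifts the meaning of values 1..3 by one.
    index = offset_value - 1 + (1 if literals_length == 0 else 0)
    if index == 0:
        return rep1, [rep1, rep2, rep3]   # history unchanged
    if index == 1:
        return rep2, [rep2, rep1, rep3]   # swap first two entries
    if index == 2:
        return rep3, [rep3, rep1, rep2]   # rotate third entry to the front
    # index == 3: offset_value 3 with no literals means Repeated_Offset1 - 1,
    # which then enters the history the same way a new offset does.
    offset = rep1 - 1
    if offset == 0:
        raise ValueError("offset 0 is invalid")
    return offset, [offset, rep1, rep2]
```

For a history of `[10, 20, 30]`, an offset value of 3 with zero literals yields offset 9 and the new history `[9, 10, 20]`.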
Yann Collet
72a3adf826 updated format documentation
to match last edits of RFC8478.
2018-09-25 16:34:26 -07:00
Yann Collet
55a8f84a2c spec clarification
following #1305 comments from @ulikunitz
2018-09-05 12:31:33 -07:00
Nick Terrell
c1a7defee1 Small fixes to zstd specification
Update to keep in sync with the RFC.
2018-07-10 15:07:36 -07:00
Yann Collet
c1e6347717 fixed minor typos, detected by @terrelln 2018-06-21 18:08:11 -07:00
Yann Collet
7639db939f updated Zstandard frame format
adding clarifications from IETF RFC DISCUSS.
2018-06-21 17:55:55 -07:00
Yann Collet
a4c9c4defe update Zstandard format specification
answering a few questions from IETF RFC Discuss stage.
2018-05-31 10:47:44 -07:00
Nick Terrell
73f4c890cd Clarify what happens when Number_of_Sequences == 0 2018-05-22 16:12:33 -07:00
Yann Collet
82ad249645 Clarifications of Zstandard format specification
from IETF RFC review
2018-04-30 12:36:55 -07:00