krak/zstd

mirror of https://github.com/facebook/zstd.git synced 2025-03-06 08:49:28 +02:00

Author	SHA1	Message	Date
elasota	8cff66f2f5	Remove text specifying probability overflow as invalid, the variable-size value encoding scheme makes this impossible.	2024-04-01 20:08:42 -04:00
Yann Collet	e127139ceb	Merge pull request #3824 from elasota/specify-zero-offset Specify offset 0 as invalid and specify required fixup behavior	2024-03-08 15:25:48 -08:00
Yann Collet	478e5fedf9	Merge pull request #3816 from elasota/fix-state-table Fix state table formatting	2024-03-08 15:02:00 -08:00
Yann Collet	7971fd16f7	Merge pull request #3817 from elasota/oversized-probs-clarification Clarify that probability tables must not contain non-zero probabilities for invalid values	2024-01-13 11:37:54 -08:00
elasota	f06b18b3ff	Specify offset 0 as invalid	2023-12-28 16:47:09 -05:00
elasota	05059e5a48	Clarify that there must be at least 2 weights, i.e. encoding all weights as 0 is invalid	2023-11-24 16:49:40 -05:00
elasota	dc84e35138	Clarify that the presence of a value with weight 1 is required	2023-11-24 16:49:40 -05:00
elasota	c5bf96fb74	Clarify that a non-zero probability for an invalid symbol is invalid	2023-11-13 00:03:56 -05:00
elasota	52e41b9ac8	Fix malformed state table	2023-11-09 12:28:21 -05:00
elasota	e61e3ff152	Clarify that decoding too many Huffman weights is a failure condition	2023-11-08 20:06:58 -05:00
elasota	324cce4996	Add definition of "log2sup" function	2023-10-31 11:45:10 -04:00
elasota	b38d87b476	Clarify that the log2 of the largest possible symbol is the maximum number of bits consumed	2023-10-31 01:17:23 -04:00
Yann Collet	3732a08f5b	fixed decoder behavior when nbSeqs==0 is encoded using 2 bytes The sequence section starts with a number, which tells how many sequences are present in the section. If this number is 0, the section automatically ends. The number 0 can be represented using the 1-byte or the 2-byte format. That's because the 2-byte format fully overlaps the 1-byte format. However, when 0 is represented using the 2-byte format, the decoder was expecting the sequence section to continue, and was looking for FSE tables, which is incorrect. Fixed this behavior, in both the reference decoder and the educational decoder. In practice, this behavior never happens, because the encoder will always select the 1-byte format to represent 0, since this is more efficient. Completed the fix with a new golden sample for tests, a clarification of the specification, and a decoder errata paragraph.	2023-06-05 16:03:00 -07:00
Yann Collet	1f83b7cfc4	fix a minor inefficiency in compress_superblock and in `decodecorpus`: the specific case `nbSeq=127` can be represented using the 1-byte format. Note that both the 1-byte and the 2-bytes formats are valid to represent this case, so there was no "error", produced data remains valid, it's just that the 1-byte format is more efficient. fix #3667 Credit to @ip7z for finding this issue.	2023-06-05 09:51:52 -07:00
Yann Collet	64e8511b26	added clarifications for sizes of compressed huffman blocks and streams.	2023-03-08 15:31:36 -08:00
Yann Collet	832f559b0b	clarify zstd specification for Huffman blocks Following detailed comments from @dweiller in #3508.	2023-02-18 18:18:16 -08:00
Yann Collet	6a9c525903	spec update : require minimum nb of literals for 4-streams mode Reported by @shulib : the specification for 4-streams mode doesn't work when the amount of literals to compress is 5 bytes. Extending it, it also doesn't work for sizes 1 or 2. This patch updates the specification and the implementation to require a minimum of 6 literals to trigger or accept the 4-streams mode. The impact is expected to be a no-op : the 4-streams mode is never triggered for such a small quantity of literals anyway, since it would be wasteful (it costs ~7.3 bytes more than single-stream mode). An informal lower limit is set at ~256 bytes, so the technical minimum is very far from this limit. This is just meant for completeness of the specification.	2022-12-22 16:14:34 -08:00
W. Felix Handte	5d693cc38c	Coalesce Almost All Copyright Notices to Standard Phrasing ``` for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache \) -prune -o -type f); do sed -i '/Copyright .* \(Yann Collet\)\|\(Meta Platforms\)/ s/Copyright ./Copyright (c) Meta Platforms, Inc. and affiliates./' $f; done git checkout HEAD -- build/VS2010/libzstd-dll/libzstd-dll.rc build/VS2010/zstd/zstd.rc tests/test-license.py contrib/linux-kernel/test/include/linux/xxhash.h examples/streaming_compression_thread_pool.c lib/legacy/zstd_v0.c lib/legacy/zstd_v0*.h nano ./programs/windres/zstd.rc nano ./build/VS2010/zstd/zstd.rc nano ./build/VS2010/libzstd-dll/libzstd-dll.rc ```	2022-12-20 12:52:34 -05:00
W. Felix Handte	7f12f24cf4	Rewrite Copyright Date Ranges from `-present` to `-2022` Apparently it's better. Somehow. ``` for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache \) -prune -o -type f); do echo $f; sed -i 's/\-present/-2022/' $f; done g co HEAD -- build/meson/ ```	2022-12-20 12:44:56 -05:00
W. Felix Handte	36d5c2f326	Update Copyright Year ('2021' -> 'present') ``` for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache \) -prune -o -type f); do sed -i 's/\-2021/-present/' $f; done g co HEAD -- .github/workflows/dev-short-tests.yml # fix bad match ```	2022-12-20 12:42:50 -05:00
W. Felix Handte	8927f985ff	Update Copyright Headers 'Facebook' -> 'Meta Platforms' ``` for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora \) -prune -o -type f); do sed -i 's/Facebook, Inc\./Meta Platforms, Inc. and affiliates./' $f; done ```	2022-12-20 12:37:57 -05:00
Danielle Rozenblit	4dffc35f2e	Convert references to https from http	2022-12-14 06:58:35 -08:00
Yann Collet	f33ccd2d1b	fix small error in format documentation example reported by @dkcasset fix #3142	2022-05-24 04:47:49 -07:00
Dominique Pelle	b772f53952	Typo and grammar fixes	2022-03-12 08:58:04 +01:00
Dimitris Apostolou	ebbd675998	Fix typos	2021-11-13 10:04:04 +02:00
Yann Collet	0b0b62d1cf	minor mention of RFC8878 more recent update	2021-05-15 23:04:46 -07:00
senhuang42	1d6d64afa3	Change year to 2021 for compression format file	2021-01-11 08:53:29 -05:00
W. Felix Handte	2d46d764cf	Update Zstd Compression Format to Clarify Repcode Behavior	2020-12-09 20:03:58 -05:00
senhuang42	8adeb9f1e6	Updated to repcode documentation to reflect dict content size	2020-09-22 13:24:27 -04:00
senhuang42	9dcfe4d7b7	Update documentation about repcodes in dictionaries	2020-09-22 13:02:26 -04:00
Yann Collet	11a392ce23	minor markdown formatting fix	2020-05-26 13:15:35 -07:00
Yann Collet	bb3c9bf43a	updated spec on dictID==0 Specified decoder behavior on receiving a frame with dictID=0. Pushed paragraph on reserved DictID ranges into the Dictionary Format section.	2020-05-25 08:15:09 -07:00
Yann Collet	098b36e9ab	clarifications for Block_Maximum_Size as a follow up of #1882	2019-11-13 09:50:15 -08:00
Yann Collet	ff7bd16c0a	clarifications for the FSE decoding table requested in #1782	2019-10-18 17:48:12 -07:00
Yann Collet	97bb38635c	`number` instead of `nb` suggested by @terrelln	2019-08-17 08:04:42 +02:00
Yann Collet	1e07eb4d5c	clarifications on the meaning of field `Block_Size` following comments from Intel's Smita Kumar.	2019-08-16 15:15:25 +02:00
W. Felix Handte	a2861d75eb	[doc] Bump Format Spec Version	2019-07-17 18:55:45 -04:00
W. Felix Handte	c05b270edc	[doc] Remove Limitation that Compressed Block is Smaller than Uncompressed Content This changes the size limit on compressed blocks to match those of the other block types: they may not be larger than the `Block_Maximum_Decompressed_Size`, which is the smaller of the `Window_Size` and 128 KB, removing the additional restriction that had been placed on `Compressed_Block`s, that they be smaller than the decompressed content they represent. Several things motivate removing this restriction. On the one hand, this restriction is not useful for decoders: the decoder must nonetheless be prepared to accept compressed blocks that are the full `Block_Maximum_Decompressed_Size`. And on the other, this bound is actually artificially limiting. If block representations were entirely independent, a compressed representation of a block that is larger than the contents of the block would be ipso facto useless, and it would be strictly better to send it as a `Raw_Block`. However, blocks are not entirely independent, and it can make sense to pay the cost of encoding custom entropy tables in a block, even if that pushes that block size over the size of the data it represents, because those tables can be re-used by subsequent blocks. Finally, as far as I can tell, this restriction in the spec is not currently enforced in any Zstandard implementation, nor has it ever been. This change should therefore be safe to make.	2019-07-17 18:55:45 -04:00
Yann Collet	9bf00707c7	minor clarifications of history update rules	2018-10-26 15:51:51 -07:00
Ulrich Kunitz	f0fe9b0f02	Reverted removal of a trailing space. My editor removes trailing spaces while saving. To avoid confusing things, I reverted that change.	2018-10-23 08:43:19 +02:00
Ulrich Kunitz	4f702e4445	Fixed a typo I fixed a typo in the last commit. Many thanks to @terrelin for pointing that out.	2018-10-23 08:36:50 +02:00
Ulrich Kunitz	c7942caff0	Clarify special case of offset history update If the current sequence has a literal length of zero then an offset value of three is handled in a special manner. While I implemented a golang decoder I had to consult the educational decoder for clarification on the update of the offset history in that case. This commit provides the clarification that the offset value Repeated_Offset1-1 is handled as a new offset and is added to the offset history accordingly.	2018-10-22 23:46:43 +02:00
Yann Collet	72a3adf826	updated format documentation to match last edits of RFC8478.	2018-09-25 16:34:26 -07:00
Yann Collet	55a8f84a2c	spec clarification following #1305 comments from @ulikunitz	2018-09-05 12:31:33 -07:00
Nick Terrell	c1a7defee1	Small fixes to zstd specification Update to keep in sync with the RFC.	2018-07-10 15:07:36 -07:00
Yann Collet	c1e6347717	fixed minor typos, detected by @terrelln	2018-06-21 18:08:11 -07:00
Yann Collet	7639db939f	updated Zstandard frame format adding clarifications from IETF RFC DISCUSS.	2018-06-21 17:55:55 -07:00
Yann Collet	a4c9c4defe	update Zstandard format specification answering a few questions from IETF RFC Discuss stage.	2018-05-31 10:47:44 -07:00
Nick Terrell	73f4c890cd	Clarify what happens when Number_of_Sequences == 0	2018-05-22 16:12:33 -07:00
Yann Collet	82ad249645	Clarifications of Zstandard format specification from IETF RFC review	2018-04-30 12:36:55 -07:00
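
The message of commit 3732a08f5b describes how the sequence count at the start of a sequences section can use a 1-byte or a 2-byte form, and that a count of 0 ends the section in either form. A minimal Python sketch of that header decoding follows. The byte layout is taken from my reading of the Zstandard format specification rather than from the commit log itself, and the function name `decode_number_of_sequences` is hypothetical, not part of any zstd API.

```python
from typing import Tuple


def decode_number_of_sequences(data: bytes) -> Tuple[int, int]:
    """Return (number_of_sequences, header_bytes_consumed).

    Hypothetical helper; layout assumed from the Zstandard format spec:
      byte0 < 128  -> 1-byte form, value is byte0
      byte0 < 255  -> 2-byte form, value is ((byte0 - 128) << 8) + byte1
      byte0 == 255 -> 3-byte form, value is byte1 + (byte2 << 8) + 0x7F00
    """
    if len(data) < 1:
        raise ValueError("sequences section is empty")
    byte0 = data[0]
    if byte0 < 128:
        return byte0, 1
    if byte0 < 255:
        if len(data) < 2:
            raise ValueError("truncated 2-byte sequence count")
        return ((byte0 - 128) << 8) + data[1], 2
    if len(data) < 3:
        raise ValueError("truncated 3-byte sequence count")
    return data[1] + (data[2] << 8) + 0x7F00, 3


def section_ends_after_header(data: bytes) -> bool:
    """A count of 0 ends the section, whichever form encoded it."""
    count, _ = decode_number_of_sequences(data)
    return count == 0
```

Under these assumptions, both `b"\x00"` and `b"\x80\x00"` decode to a count of 0, so a decoder should stop before looking for FSE tables in either case. Likewise `b"\x7f"` and `b"\x80\x7f"` both decode to 127, which is the case commit 1f83b7cfc4 refers to.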

65 Commits