krak/zstd

mirror of https://github.com/facebook/zstd.git synced 2025-07-03 22:30:29 +02:00

Author	SHA1	Message	Date
Nick Terrell	329169189c	Replace Huffman boolean args with flags bit set	2023-01-20 14:12:53 -08:00
Yann Collet	ac45e078a5	add explanation about new test as requested by @terrelln	2023-01-12 15:49:01 -08:00
Yann Collet	796699c0bc	fix root cause of #3416. A minor change in `5434de0` changed a `<=` into a `<`, and as an indirect consequence allowed a compression attempt of literals when there are only 6 literals to compress (the previous limit was effectively 7 literals). This is not in itself a problem, as the threshold is merely a heuristic, but it exposed a bug that has always been there and was never triggered so far because of the previous limit. This bug would make the literal compressor believe that all literals are the same symbol, but for the exact case where nbLiterals==6, plus a pretty wild combination of other limit conditions, this outcome could be false, resulting in data corruption. Replaced the blind heuristic by an actual test for all limit cases, so that even if the threshold is changed again in the future, the detection of RLE mode will remain reliable.	2023-01-12 15:41:08 -08:00
Yann Collet	5434de01e2	improve compression ratio of small alphabets. fix #3328. In situations where the alphabet size is very small, the evaluation of literal costs from the Optimal Parser is initially incorrect. It takes some time to converge, during which compression is less efficient. This is especially important for small files, because there will not be enough data to converge, so most of the parsing is selected based on incorrect metrics. After this patch, the scenario #3328 gets fixed, delivering the expected 29 bytes compressed size (smallest known compressed size).	2023-01-03 12:22:37 -08:00
Yann Collet	6a9c525903	spec update: require minimum nb of literals for 4-streams mode. Reported by @shulib: the specification for 4-streams mode doesn't work when the amount of literals to compress is 5 bytes. Extending it, it also doesn't work for sizes 1 or 2. This patch updates the specification and the implementation to require a minimum of 6 literals to trigger or accept the 4-streams mode. The impact is expected to be a no-op: the 4-streams mode is never triggered for such a small quantity of literals anyway, since it would be wasteful (it costs ~7.3 bytes more than single-stream mode). An informal lower limit is set at ~256 bytes, so the technical minimum is very far from this limit. This is just meant for completeness of the specification.	2022-12-22 16:14:34 -08:00
W. Felix Handte	5d693cc38c	Coalesce Almost All Copyright Notices to Standard Phrasing ``` for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache \) -prune -o -type f); do sed -i '/Copyright .* \(Yann Collet\)\|\(Meta Platforms\)/ s/Copyright .*/Copyright (c) Meta Platforms, Inc. and affiliates./' $f; done; git checkout HEAD -- build/VS2010/libzstd-dll/libzstd-dll.rc build/VS2010/zstd/zstd.rc tests/test-license.py contrib/linux-kernel/test/include/linux/xxhash.h examples/streaming_compression_thread_pool.c lib/legacy/zstd_v0.c lib/legacy/zstd_v0*.h; nano ./programs/windres/zstd.rc; nano ./build/VS2010/zstd/zstd.rc; nano ./build/VS2010/libzstd-dll/libzstd-dll.rc ```	2022-12-20 12:52:34 -05:00
W. Felix Handte	8927f985ff	Update Copyright Headers 'Facebook' -> 'Meta Platforms' ``` for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora \) -prune -o -type f); do sed -i 's/Facebook, Inc\./Meta Platforms, Inc. and affiliates./' $f; done ```	2022-12-20 12:37:57 -05:00
Danielle Rozenblit	8888a2ddcc	CI failure fixes	2022-10-11 13:12:19 -07:00
Yann Collet	cad9f8d5f9	fix 44239, credit to oss-fuzz. This issue could happen when using the new Sequence Compression API in Explicit Delimiter Mode with a too small dstCapacity, in which case there was one place where the buffer size wasn't checked.	2022-02-01 10:49:38 -08:00
Yann Collet	a66e8bb437	introduced LitHufLog constant, which properly represents the maximum bit size of compressed literals (11) as defined in the specification. To be preferred over HUF_TABLELOG_DEFAULT, which represents the same value but by accident. Name selected to keep the same convention as existing width definitions, MLFSELog, LLFSELog and OffFSELog.	2022-01-26 14:47:24 -08:00
Yann Collet	e9dd923fa4	only declare debug functions in debug mode	2022-01-26 14:47:24 -08:00
Yann Collet	4684836f4f	update regression tests: minor compression ratio benefits in some cases, no compression ratio regression in the measured scenarios.	2022-01-26 14:47:24 -08:00
Yann Collet	7616e39f3b	adding traces to better track processing of literals	2022-01-26 14:47:21 -08:00
Binh Vo	dc5b693f1e	Proactively skip huffman compression based on sampling where non-compressibility is suspected	2021-06-30 11:02:47 -04:00
Nick Terrell	8383fc828d	Merge pull request #2541 from ihsinme/patch-1 simple fix for using bit operator.	2021-04-02 13:01:09 -07:00
Nick Terrell	a494308ae9	[copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files * Switch to yearless copyright per FB policy * Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources * Add zstd copyright/license header to the `contrib/linux-kernel` sources * Update the `tests/test-license.py` to check for yearless copyright * Improvements to `tests/test-license.py` * Check `contrib/linux-kernel` in `tests/test-license.py`	2021-03-30 10:30:43 -07:00
ihsinme	a5bf09d764	simple fix for using bit operator. good day. It seems to me that the developer intended to use a logical operator. so I suggest a simple fix.	2021-03-17 11:37:42 +03:00
Yann Collet	b9748757b0	fixed minor cast warning	2021-02-05 09:55:54 -08:00
Nick Terrell	66e811d782	[license] Update year to 2021	2021-01-04 17:53:52 -05:00
Nick Terrell	c465f24457	ZSTD_ prefix mem{cpy,move,set},malloc,calloc,free	2020-08-26 12:26:03 -07:00
W. Felix Handte	5e5f262612	Add (Possibly Empty) Info Strings to All Variadic Error Handling Macro Invocations	2020-05-04 10:58:55 -04:00
Nick Terrell	e103d7b4a6	Fix superblock mode (#2100) Fixes: * Enable RLE blocks for superblock mode * Fix the limitation that the literals block must shrink. Instead, when we're within 200 bytes of the next header byte size, we will just use the next one up. That way we should (almost?) always have space for the table. * Remove the limitation that the first sub-block MUST have compressed literals and be compressed. Now one sub-block MUST be compressed (otherwise we fall back to raw block which is okay, since that is streamable). If no block has compressed literals that is okay, we will fix up the next Huffman table. * Handle the case where the last sub-block is uncompressed (maybe it is very small). Before it would skip superblock in this case, now we allow the last sub-block to be uncompressed. To do this we need to regenerate the correct repcodes. * Respect disableLiteralsCompression in superblock mode * Fix superblock mode to handle a block consisting of only compressed literals * Fix an off-by-1 error in superblock mode that disabled it whenever there were last literals * Fix superblock mode with long literals/matches (> 0xFFFF) * Allow superblock mode to repeat Huffman tables * Respect ZSTD_minGain(). Tests: * Simple check for the condition in #2096. * When the simple_round_trip fuzzer enables superblock mode, it checks that the compressed size isn't expanded too much. Remaining limitations: * O(targetCBlockSize^2) because we recompute statistics every sequence * Unable to split literals of length > targetCBlockSize into multiple sequences * Refuses to generate sub-blocks that don't shrink the compressed data, so we could end up with large sub-blocks. We should emit those sections as uncompressed blocks instead. * ... Fixes #2096	2020-05-01 16:11:47 -07:00
Nick Terrell	ac58c8d720	Fix copyright and license lines * All copyright lines now have -2020 instead of -present * All copyright lines include "Facebook, Inc" * All licenses are now standardized The copyright in `threading.{h,c}` is not changed because it comes from zstdmt. The copyright and license of `divsufsort.{h,c}` is not changed.	2020-03-26 17:02:06 -07:00
Bimba Shrestha	3fb5b106da	Replacing some literals with constants	2019-11-05 10:26:57 -08:00
W. Felix Handte	c25283cf00	Disambiguate 'workspace' and 'entropyWorkspace'	2019-09-09 13:34:08 -04:00
Ephraim Park	1dc98de279	Restructure the source files	2019-07-15 17:39:18 -07:00

26 Commits
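
Two of the commits above (796699c0bc and 6a9c525903) describe small decision rules for literals: testing whether all literals are the same symbol before choosing RLE mode, and requiring at least 6 literals for 4-streams mode. The sketch below illustrates those two rules only. It is not the zstd implementation: the function names and the constant name are hypothetical, and treating an empty buffer as "not a single symbol" is an assumption of this sketch. The value 6 is the only detail taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name; the value 6 comes from the commit message
 * ("require a minimum of 6 literals" for 4-streams mode). */
#define MIN_LITERALS_FOR_4_STREAMS 6

/* Returns 1 when every byte in src[0..size) is the same symbol, so the
 * literals could be represented in RLE mode; returns 0 otherwise.
 * Assumption of this sketch: an empty buffer yields 0. */
static int literals_are_single_symbol(const unsigned char *src, size_t size)
{
    size_t i;
    assert(src != NULL || size == 0);
    if (size == 0) return 0;
    /* An explicit scan instead of a size-based heuristic: the answer
     * stays correct whatever threshold the caller applies first. */
    for (i = 1; i < size; i++) {
        if (src[i] != src[0]) return 0;
    }
    return 1;
}

/* Returns 1 when the literal count is large enough for 4-streams mode. */
static int four_streams_allowed(size_t nb_literals)
{
    return nb_literals >= MIN_LITERALS_FOR_4_STREAMS;
}
```

A caller would check `literals_are_single_symbol()` before emitting an RLE literals block, and `four_streams_allowed()` before selecting 4-streams mode. In practice, as the commit message notes, 4-streams mode is only worthwhile for far larger inputs than this technical minimum.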