krak/zstd

mirror of https://github.com/facebook/zstd.git synced 2025-07-04 14:48:45 +02:00

Author	SHA1	Message	Date
Yann Collet	b46236278a	detect extraneous bytes in the Sequences section when nbSeq == 0. Reported by @ip7z	2023-06-13 11:43:45 -07:00
Yann Collet	3732a08f5b	fixed decoder behavior when nbSeqs==0 is encoded using 2 bytes The sequence section starts with a number, which tells how many sequences are present in the section. If this number is 0, the section automatically ends. The number 0 can be represented using the 1-byte or the 2-byte format. That's because the 2-byte format fully overlaps the 1-byte format. However, when 0 is represented using the 2-byte format, the decoder was expecting the sequence section to continue, and was looking for FSE tables, which is incorrect. Fixed this behavior, in both the reference decoder and the educational decoder. In practice, this situation never happens, because the encoder will always select the 1-byte format to represent 0, since this is more efficient. Completed the fix with a new golden sample for tests, a clarification of the specification, and a decoder errata paragraph.	2023-06-05 16:03:00 -07:00
Nick Terrell	0abf2baef9	Reduce streaming decompression memory by 128KB The split literals buffer patch increased streaming decompression memory by 64KB (shrunk lit buffer from 128KB to 64KB, and added 128KB). This patch removes the added 128KB buffer, because it isn't necessary. The buffer was there because the literals compression code didn't know the true `blockSizeMax` of the frame, and always put split literals so they ended 128KB - 32 from the beginning of the block. Instead, we can pass down the true `blockSizeMax` and ensure that the split literals end up at `blockSizeMax - 32` from the beginning of the block. We already reserve a full `blockSizeMax` bytes in streaming mode, so we won't be overwriting the extDict window.	2023-04-17 16:31:02 -07:00
daniellerozenblit	fcaf06ddb4	Check that `dest` is valid for decompression (#3555) * add check for valid dest buffer and fuzz on random dest ptr when malloc 0 * add uptrval to linux-kernel * remove bin files * get rid of uptrval * restrict max pointer value check to platforms where sizeof(size_t) == sizeof(void*)	2023-03-31 23:00:55 -07:00
Han Zhu	b558190ac7	Remove clang-only branch hints from ZSTD_decodeSequence Looking at the __builtin_expect in ZSTD_decodeSequence: { size_t offset; #if defined(__clang__) if (LIKELY(ofBits > 1)) { #else if (ofBits > 1) { #endif ZSTD_STATIC_ASSERT(ZSTD_lo_isLongOffset == 1); From profile-annotated assembly, the probability of ofBits > 1 is about 75% (101k counts out of 135k counts). This is much smaller than the recommended likelihood to use __builtin_expect, which is 99%. As a result, clang moved the else block further away, which hurts cache locality. Removing this __builtin_expect along with two others in ZSTD_decodeSequence gave better performance when PGO is enabled. I suggest removing these branch hints and relying on PGO, which leverages runtime profiles from the actual workload to calculate branch probability instead.	2023-03-28 15:36:22 -07:00
Nick Terrell	fbd97f305a	Deprecated bufferless and block level APIs * Mark all bufferless and block level functions as deprecated * Update documentation to suggest not using these functions * Add `_deprecated()` wrappers for functions that we use internally and call those instead	2023-03-16 10:04:15 -07:00
Dimitri Papadopoulos	547794ef40	Fix typos found by codespell	2023-02-18 10:31:48 +01:00
Nick Terrell	71a0259247	Fix ZSTD_getOffsetInfo() when nbSeq == 0 In 32-bit mode, ZSTD_getOffsetInfo() can be called when nbSeq == 0, and in this case the offset table is uninitialized. The function should just return 0 for both values, because there are no sequences. Credit to OSS-Fuzz	2023-02-02 14:26:41 -08:00
Nick Terrell	cc3e3acd34	Fix 32-bit decoding with large dictionary The 32-bit decoder could corrupt the regenerated data by using regular offset mode when there were actually long offsets. This is because we were only considering the window size in the calculation, not the dictionary size. So a large dictionary could allow longer offsets. Fix this in two ways: 1. Instead of looking at the window size, look at the total referenceable bytes in the history buffer. Use this in the comparison instead of the window size. Additionally, we were comparing against the wrong value; it was too low. Fix that by computing exactly the maximum offset for regular sequence decoding. 2. If it is possible that we have long offsets due to (1), then check the offset code decoding table, and if the decoding table's maximum number of additional bits is no more than STREAM_ACCUMULATOR_MIN, then we can't have long offsets. This gates us to be using the long offsets decoder only when we are very likely to actually have long offsets. Note that this bug only affects the decoding of the data, and the original compressed data, if re-read with a patched decoder, will correctly regenerate the original data. Except that the encoder also had the same issue previously. This fixes both the open OSS-Fuzz issues. Credit to OSS-Fuzz	2023-02-01 17:22:44 -08:00
Nick Terrell	2f74507bbd	Simplify 32-bit long offsets decoding logic The previous code had an issue: when `bitsConsumed == 32` it would read 0 bits for the `ofBits` read, which violates the precondition of `BIT_readBitsFast()`. This can happen when the stream is corrupted. Fix this issue by always reading the maximum possible number of extra bits. I've measured neutral decoding performance, likely because this branch is unlikely, but this should be faster anyways. And if not, it is only 32-bit decoding, so performance isn't as critical. Credit to OSS-Fuzz	2023-01-30 12:21:42 -08:00
Nick Terrell	b3b43f2893	Fix invalid assert in 32-bit decoding The assert is only correct for valid sequences, so disable it for everything except round trip fuzzers.	2023-01-27 14:40:38 -08:00
Nick Terrell	8957fef554	[huf] Add generic C versions of the fast decoding loops Add generic C versions of the fast decoding loops to serve architectures that don't have an assembly implementation. Also allow selecting the C decoding loop over the assembly decoding loop through a zstd decompression parameter `ZSTD_d_disableHuffmanAssembly`. I benchmarked on my Intel i9-9900K and my Macbook Air with an M1 processor. The benchmark command forces zstd to compress without any matches, using only literals compression, and measures only Huffman decompression speed:
```
zstd -b1e1 --compress-literals --zstd=tlen=131072 silesia.tar
```
The new fast decoding loops outperform the previous implementation uniformly, but don't beat the x86-64 assembly. Additionally, the fast C decoding loops suffer from the same stability problems that we've seen in the past, where the assembly version doesn't. So even though clang gets close to assembly on x86-64, it still has stability issues.
| Arch    | Function       | Compiler     | Default (MB/s) | Assembly (MB/s) | Fast (MB/s) |
|---------|----------------|--------------|----------------|-----------------|-------------|
| x86-64  | decompress 4X1 | gcc-12.2.0   | 1029.6         | 1308.1          | 1208.1      |
| x86-64  | decompress 4X1 | clang-14.0.6 | 1019.3         | 1305.6          | 1276.3      |
| x86-64  | decompress 4X2 | gcc-12.2.0   | 1348.5         | 1657.0          | 1374.1      |
| x86-64  | decompress 4X2 | clang-14.0.6 | 1027.6         | 1659.9          | 1468.1      |
| aarch64 | decompress 4X1 | clang-12.0.5 | 1081.0         | N/A             | 1234.9      |
| aarch64 | decompress 4X2 | clang-12.0.5 | 1270.0         | N/A             | 1516.6      |
	2023-01-25 13:47:51 -08:00
Nick Terrell	329169189c	Replace Huffman boolean args with flags bit set	2023-01-20 14:12:53 -08:00
Nick Terrell	0cc1b0cb22	Delete unused Huffman functions Remove all Huffman functions that aren't used by zstd.	2023-01-20 14:12:53 -08:00
Yann Collet	089b2797e3	Merge pull request #3398 from facebook/fix3316 spec update : require minimum nb of literals for 4-streams mode	2022-12-22 16:57:05 -08:00
Yann Collet	6a9c525903	spec update : require minimum nb of literals for 4-streams mode Reported by @shulib : the specification for 4-streams mode doesn't work when the amount of literals to compress is 5 bytes. Extending it, it also doesn't work for sizes 1 or 2. This patch updates the specification and the implementation to require a minimum of 6 literals to trigger or accept the 4-streams mode. The impact is expected to be a no-op : the 4-streams mode is never triggered for such small quantity of literals anyway, since it would be wasteful (it costs ~7.3 bytes more than single-stream mode). An informal lower limit is set at ~256 bytes, so the technical minimum is very far from this limit. This is just meant for completeness of the specification.	2022-12-22 16:14:34 -08:00
Yann Collet	ea2895cef4	Support decompression of compressed blocks of size ZSTD_BLOCKSIZE_MAX exactly	2022-12-22 12:40:27 -08:00
W. Felix Handte	5d693cc38c	Coalesce Almost All Copyright Notices to Standard Phrasing ``` for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache \) -prune -o -type f); do sed -i '/Copyright .* \(Yann Collet\)\|\(Meta Platforms\)/ s/Copyright ./Copyright (c) Meta Platforms, Inc. and affiliates./' $f; done git checkout HEAD -- build/VS2010/libzstd-dll/libzstd-dll.rc build/VS2010/zstd/zstd.rc tests/test-license.py contrib/linux-kernel/test/include/linux/xxhash.h examples/streaming_compression_thread_pool.c lib/legacy/zstd_v0.c lib/legacy/zstd_v0*.h nano ./programs/windres/zstd.rc nano ./build/VS2010/zstd/zstd.rc nano ./build/VS2010/libzstd-dll/libzstd-dll.rc ```	2022-12-20 12:52:34 -05:00
W. Felix Handte	8927f985ff	Update Copyright Headers 'Facebook' -> 'Meta Platforms' ``` for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora \) -prune -o -type f); do sed -i 's/Facebook, Inc\./Meta Platforms, Inc. and affiliates./' $f; done ```	2022-12-20 12:37:57 -05:00
Nick Terrell	a70ca2bd7d	Fix off-by-one error in superblock mode (#3221) Fixes #3212. Long literal and match lengths had an off-by-one error in ZSTD_getSequenceLength. Fix the off-by-one error, and add a golden compression test that catches the bug. Also run all the golden tests in the cli-tests framework.	2022-08-03 11:28:39 -07:00
Jun He	ec5fdcde19	lib: add hint to generate more pipeline friendly code (#3138) Statistics from the silesia test files show that the chance of a position beyond highThreshold is very low (~1.3%@L8 in most cases, all <2.5%), which puts it in the "lowprob area". Add the branch hint so the compiler can generate better pipeline codegen. With this change, an uplift of ~1% is observed on mozilla and xml, and a slight (0.3%~0.8%) but consistent uplift on other files on Arm N1. Signed-off-by: Jun He <jun.he@arm.com> Change-Id: Id9ba1d5c767e975290b5c1bf0ecce906544f4ade	2022-07-29 10:28:04 -07:00
Jun He	558cf20d0d	decomp: add prefetch for matched seq on aarch64 (#3164) match is used for following sequence copy. It is only updated when extDict is needed, which is a low probability case. So it can be prefetched to reduce cache miss. The benchmarks on various Arm platforms showed uplift from 1% ~ 14% with gcc-11/clang-14. Signed-off-by: Jun He <jun.he@arm.com> Change-Id: If201af4799d2455d74c79f8387404439d7f684ae	2022-07-29 10:27:20 -07:00
udayanbapat	43f21a600e	Initial commit to address 3090. Added support to decompress empty block. (#3118) * Initial commit to address 3090. Added support to decompress empty block * Update zstd_decompress_block.c Addressed review comments for the case of 'set_basic' * Update lib/decompress/zstd_decompress_block.c Co-authored-by: Nick Terrell <nickrterrell@gmail.com> * Update lib/decompress/zstd_decompress_block.c Co-authored-by: Nick Terrell <nickrterrell@gmail.com> Co-authored-by: Nick Terrell <nickrterrell@gmail.com>	2022-07-14 11:54:34 -07:00
Jun He	2491c65937	dec: adjust seqSymbol load on aarch64 ZSTD_seqSymbol is a structure that is 64 bits wide in total. So it can be loaded in one operation, and its fields extracted by simple shifting or extracting on aarch64. GCC doesn't recognize this and generates unnecessary ldr/ldrb/ldrh operations that cause a performance drop. With this change, an uplift of 2~4% is observed on silesia and 2.5~6% on cantrbry @L8 on Arm N1. Signed-off-by: Jun He <jun.he@arm.com> Change-Id: I7748909204cf78a17eb9d4f2333692d53239daa8	2022-05-30 22:01:38 +08:00
Dominique Pelle	b772f53952	Typo and grammar fixes	2022-03-12 08:58:04 +01:00
Elliot Gorokhovsky	db2f4a6532	Move bitwise builtins into bits.h	2022-02-14 11:16:03 -05:00
Yann Collet	7616e39f3b	adding traces to better track processing of literals	2022-01-26 14:47:21 -08:00
Norbert Lange	2fbb1d10c1	Reduce bit tables to 8bit This saves some 1.7Kb in rodata section (x86_64, zstd tool), while assembler code stays the same except the type of a few load/extend instructions. Should not have negative performance implications.	2021-12-14 23:47:57 +01:00
Nick Terrell	5414dd7978	[bmi2] Add lzcnt and bmi target attributes * When dynamic dispatching to bmi2 add lzcnt and bmi to the TARGET_ATTRIBUTE. * Centralize the bmi2 TARGET_ATTRIBUTE definition to BMI2_TARGET_ATTRIBUTE so we can change it in the future. * Only enable bmi2 when both bmi1 & bmi2 are supported. There shouldn't be any cases where bmi2 is supported but bmi1 isn't. But, since we are using the instruction we should check bmi1 as well.	2021-11-30 17:54:56 -08:00
binhdvo	04734ee84a	Fix oss fuzz test error (#2837)	2021-10-29 10:29:50 -04:00
binhdvo	6a7ede3dfc	Reduce size of dctx by reutilizing dst buffer (#2751) * Reduce size of dctx by reutilizing dst buffer Co-authored-by: Binh Vo <binhvo@fb.com>	2021-10-25 10:38:01 -04:00
Norbert Lange	0d45540695	decompress: conditionally remove bmi2 from context Use a helper function, which will just return 0 in case the feature is disabled. Allows constant propagation and removal of dead code.	2021-09-26 14:41:37 +02:00
Nick Terrell	189e87bcbe	[lib] Make lib compatible with `-Wfall-through` excepting legacy Switch to a macro `ZSTD_FALLTHROUGH;` instead of a comment. On supported compilers this uses an attribute, otherwise it becomes a comment. This is necessary to be compatible with clang's `-Wfall-through`, and gcc's `-Wfall-through=2` which don't support comments. Without this the linux build emits a bunch of warnings. Also add a test to CI to ensure that we don't regress.	2021-09-23 10:51:18 -07:00
Danila Kutenin	2c2c9e7dfd	Add possible improvements for gcc-11	2021-06-29 09:06:47 +01:00
Danila Kutenin	08a3ddbd28	Add comment for gcc-11	2021-06-08 20:54:21 +01:00
Danila Kutenin	6534c0000f	Be C89 compliant and fix alignment for gcc11	2021-06-08 20:45:57 +01:00
Danila Kutenin	a80d268700	Optimize ZSTD_decodeSequence by another x%	2021-05-29 18:21:10 +01:00
Yann Collet	439e58d060	improved gcc-9 and gcc-10 decoding speed the new alignment setting is better for gcc-9 and gcc-10 by about ~+5%. Unfortunately, it's worse for essentially all other compilers. Make the new alignment setting conditional to gcc-9+.	2021-05-08 00:01:01 -07:00
Yann Collet	6755baf940	update decoder hot loop alignment This seems to bring an additional ~+1.2% decompression speed on average across 10 compilers x 6 scenarios.	2021-05-07 15:18:16 -07:00
Yann Collet	1db5947591	improve decompression speed of long variant by ~+5% changed strategy, now unconditionally prefetch the first 2 cache lines, instead of cache lines corresponding to the first and last bytes of the match. This better corresponds to cpu expectation, which should auto-prefetch following cachelines on detecting the sequential nature of the read. This is globally positive, by +5%, though exact gains depend on compiler (from -2% to +15%). The only negative counter-example is gcc-9.	2021-05-07 11:26:14 -07:00
Yann Collet	ee425faaa7	Merge branch 'dev' into d_prefetch_refactor	2021-05-06 19:49:26 -07:00
Nick Terrell	b052b583e5	[lib] Fix UBSAN warning in ZSTD_decompressSequences()	2021-05-06 15:31:30 -07:00
Yann Collet	7ef6d7b36c	deeper prefetching pipeline for decompressSequencesLong pipeline increased from 4 to 8 slots. This change substantially improves decompression speed when there are long distance offsets. example with enwik9 compressed at level 22 : gcc-9 : 947 -> 1039 MB/s clang-10: 884 -> 946 MB/s I also checked the "cold dictionary" scenario, and found a smaller benefit, around ~2% (measurements are more noisy for this scenario).	2021-05-05 10:04:03 -07:00
Yann Collet	8cde167a27	Merge branch 'dev' into d_prefetch_refactor	2021-05-05 09:13:38 -07:00
Nick Terrell	a494308ae9	[copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files * Switch to yearless copyright per FB policy * Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources * Add zstd copyright/license header to the `contrib/linux-kernel` sources * Update the `tests/test-license.py` to check for yearless copyright * Improvements to `tests/test-license.py` * Check `contrib/linux-kernel` in `tests/test-license.py`	2021-03-30 10:30:43 -07:00
Yann Collet	f5434663ea	Refactor prefetching for the decoding loop Following #2545, I noticed that one field in `seq_t` is optional, and only used in combination with prefetching. (This may have contributed to static analyzer failure to detect correct initialization). I then wondered if it would be possible to rewrite the code so that this optional part is handled directly by the prefetching code rather than delegated as an option into `ZSTD_decodeSequence()`. This resulted in this refactoring exercise, where the prefetching responsibility is better isolated into its own function and `ZSTD_decodeSequence()` is streamlined to contain strictly Sequence decoding operations. Incidentally, due to better code locality, it reduces the need to send information around, leading to a simplified interface and smaller state structures.	2021-03-19 15:48:17 -07:00
Nick Terrell	f9b1e711ba	[zstd] Fix NULL pointer addition in ZSTD_checkContinuity() Don't start a new section when `dstSize == 0` to avoid NULL pointer addition.	2021-02-05 12:18:06 -08:00
Yann Collet	b9748757b0	fixed minor cast warning	2021-02-05 09:55:54 -08:00
Nick Terrell	66e811d782	[license] Update year to 2021	2021-01-04 17:53:52 -05:00
Yann Collet	0b39531d75	moving all references to `release` branch (was previously `master`)	2020-12-16 23:00:35 -08:00
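
The two nbSeq == 0 fixes in this log (b46236278a and 3732a08f5b) both concern the number that opens the Sequences section. As a rough illustration only, the sketch below parses that number in its 1-byte, 2-byte and 3-byte encodings, as laid out in the Zstandard format specification, and rejects a zero-sequence section that carries trailing bytes. The function names `parse_nb_seq` and `is_valid_empty_sequences_section` are hypothetical and are not taken from the zstd sources; the actual decoder is organized differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the zstd API.
 * Parses the sequence count at the start of a Sequences section.
 * Returns the number of header bytes consumed (1, 2 or 3), or 0 if the
 * input is too short. On success, *nbSeq receives the decoded count. */
static size_t parse_nb_seq(const uint8_t *src, size_t srcSize, unsigned *nbSeq)
{
    assert(src != NULL && nbSeq != NULL);
    if (srcSize < 1) return 0;
    if (src[0] < 128) {               /* 1-byte format: values 0..127 */
        *nbSeq = src[0];
        return 1;
    }
    if (src[0] < 255) {               /* 2-byte format: can also encode 0 */
        if (srcSize < 2) return 0;
        *nbSeq = ((unsigned)(src[0] - 128) << 8) + src[1];
        return 2;
    }
    if (srcSize < 3) return 0;        /* 3-byte format */
    *nbSeq = (unsigned)src[1] + ((unsigned)src[2] << 8) + 0x7F00;
    return 3;
}

/* Hypothetical check: returns 1 when the section encodes zero sequences
 * (in either the 1-byte or 2-byte form) and contains nothing else. */
static int is_valid_empty_sequences_section(const uint8_t *src, size_t srcSize)
{
    unsigned nbSeq;
    size_t const hSize = parse_nb_seq(src, srcSize, &nbSeq);
    if (hSize == 0 || nbSeq != 0) return 0;
    return hSize == srcSize;          /* any extra byte is an error */
}
```

Under these assumptions, the byte sequences `{0x00}` and `{0x80, 0x00}` both describe an empty section, while `{0x00, 0x01}` is rejected because of the extraneous byte.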

103 Commits