krak/zstd

mirror of https://github.com/facebook/zstd.git synced 2025-03-06 16:56:49 +02:00

Author	SHA1	Message	Date
Elliot Gorokhovsky	559762da12	Remove duplicate and incorrect docs in zstd_decompress.c (#3967 )	2024-03-14 15:55:01 -04:00
Nick Terrell	ff0afbad58	[asm][aarch64] Mark that BTI and PAC are supported Mark that `huf_decompress_amd64.S` supports BTI and PAC, which it trivially does because it is empty for aarch64. The issue only requested BTI markings, but it also makes sense to mark PAC, which is the only other feature. Also add a test for this mode to the ARM64 QEMU test. Before this PR it warns on `huf_decompress_amd64.S`, after it doesn't. Fixes Issue #3841.	2024-03-13 16:15:51 -04:00
Elliot Gorokhovsky	f65b9e27ce	Exercise ZSTD_findDecompressedSize() in the simple decompression fuzzer (#3959 ) * Improve decompression fuzzer * Fix legacy frame header fuzzer crash, add unit test	2024-03-12 17:07:06 -04:00
Yann Collet	a9fb8d4c41	new method to deal with offset==0 in this new method, when an `offset==0` is detected, it's converted into (size_t)(-1), instead of 1. The logic is that (size_t)(-1) is effectively an extremely large positive number, which will not pass the offset distance test at next stage (`execSequence()`). Checked the source code, and offset is always checked (as it should), using a formula which is not vulnerable to arithmetic overflow: ``` RETURN_ERROR_IF(sequence.offset > (size_t)(oLitEnd - virtualStart), ``` The benefit is that such a case (offset==0) is always detected as corrupted data as opposed to relying on the checksum to detect the error.	2024-03-08 15:26:06 -08:00
Yann Collet	8689633fdf	Merge pull request #3840 from aimuz/fix-reserved lib/decompress: check for reserved bit corruption in zstd	2024-03-05 13:40:12 -08:00
Yann Collet	f77f634d41	update API documentation	2024-02-24 01:28:17 -08:00
Yann Collet	4b51526412	fix partial block uncompressed	2024-02-24 01:24:58 -08:00
Yann Collet	4683667785	refactor optimal parser store stretches as intermediate solution instead of sequences. makes it possible to link a solution to a predecessor.	2024-01-31 02:51:46 -08:00
aimuz	468bb17378	lib/decompress: check for reserved bit corruption in zstd The patch adds a validation to ensure that the last field, which is reserved, must be all-zeroes in ZSTD_decodeSeqHeaders. This prevents potential corruption from going undetected. Fixes an issue where corrupted input could lead to undefined behavior due to improper validation of reserved bits. Signed-off-by: aimuz <mr.imuz@gmail.com>	2023-11-28 21:04:37 +08:00
Nick Terrell	8193250615	Modernize macros to use `do { } while (0)` This PR introduces no functional changes. It attempts to change all macros currently using `{ }` or some variant of that to `do { } while (0)`, and introduces trailing `;` where necessary. There were no bugs found during this migration. The Visual Studio bug that warned on this has been fixed since VS2015. Additionally, we have several instances of `do { } while (0)` which have been present for several releases, so we don't have to worry about breaking people's builds. Fixes Issue #3830.	2023-11-21 20:05:17 -05:00
Nick Terrell	dd4de1dd7a	[huf] Fix null pointer addition `HUF_DecompressFastArgs_init()` was adding 0 to NULL. Fix it by exiting early for empty outputs. This is no change in behavior, because the function was already exiting 0 in this case, just slightly later.	2023-11-20 17:13:01 -05:00
Nick Terrell	5ab78c0418	[huf] Improve fast C & ASM performance on small data * Rename `ilimit` to `ilowest` and set it equal to `src` instead of `src + 6 + 8`. This is safe because the fast decoding loops guarantee to never read below `ilowest` already. This allows the fast decoder to run for at least two more iterations, because it consumes at most 7 bytes per iteration. * Continue the fast loop all the way until the number of safe iterations is 0. Initially, I thought that when it got towards the end, the computation of how many iterations of safe might become expensive. But it ends up being slower to have to decode each of the 4 streams individually, which makes sense. This drastically speeds up the Huffman decoder on the `github` dataset for the issue raised in #3762, measured with `zstd -b1e1r github/`. \| Decoder \| Speed before \| Speed after \| \|----------\|--------------\|-------------\| \| Fallback \| 477 MB/s \| 477 MB/s \| \| Fast C \| 384 MB/s \| 492 MB/s \| \| Assembly \| 385 MB/s \| 501 MB/s \| We can also look at the speed delta for different block sizes of silesia using `zstd -b1e1r silesia.tar -B#`. \| Decoder \| -B1K ∆ \| -B2K ∆ \| -B4K ∆ \| -B8K ∆ \| -B16K ∆ \| -B32K ∆ \| -B64K ∆ \| -B128K ∆ \| \|----------\|--------\|--------\|--------\|--------\|---------\|---------\|---------\|----------\| \| Fast C \| +11.2% \| +8.2% \| +6.1% \| +4.4% \| +2.7% \| +1.5% \| +0.6% \| +0.2% \| \| Assembly \| +12.5% \| +9.0% \| +6.2% \| +3.6% \| +1.5% \| +0.7% \| +0.2% \| +0.03% \|	2023-11-20 17:13:01 -05:00
Nick Terrell	c7269add7e	[huf] Improve fast huffman decoding speed in linux kernel gcc in the linux kernel was not unrolling the inner loops of the Huffman decoder, which was destroying decoding performance. The compiler was generating crazy code with all sorts of branches. I suspect because of Spectre mitigations, but I'm not certain. Once the loops were manually unrolled, performance was restored. Additionally, when gcc couldn't prove that the variable left shift in the 4X2 decode loop wasn't greater than 63, it inserted checks to verify it. To fix this, mask `entry.nbBits & 0x3F`, which allows gcc to elide this check. This is a no-op, because `entry.nbBits` is guaranteed to be less than 64. Lastly, introduce the `HUF_DISABLE_FAST_DECODE` macro to disable the fast C loops for Issue #3762. So if even after this change, there is a performance regression, users can opt-out at compile time.	2023-11-20 14:56:46 -05:00
Yann Collet	c1e588fcb4	Merge pull request #3771 from DimitriPapadopoulos/codespell Fix new typos found by codespell	2023-10-07 19:29:41 -07:00
Nick Terrell	43118da8a7	Stop suppressing pointer-overflow UBSAN errors * Remove all pointer-overflow suppressions from our UBSAN builds/tests. * Add `ZSTD_ALLOW_POINTER_OVERFLOW_ATTR` macro to suppress pointer-overflow at a per-function level. This is a superior approach because it also applies to users who build zstd with UBSAN. * Add `ZSTD_wrappedPtr{Diff,Add,Sub}()` that use these suppressions. The end goal is to only tag these functions with `ZSTD_ALLOW_POINTER_OVERFLOW`. But we can start by annotating functions that rely on pointer overflow, and gradually transition to using these. * Add `ZSTD_maybeNullPtrAdd()` to simplify pointer addition when the pointer may be `NULL`. * Fix all the fuzzer issues that came up. I'm sure there will be a lot more, but these are the ones that came up within a few minutes of running the fuzzers, and while running GitHub CI.	2023-09-28 17:35:05 -04:00
Nick Terrell	3daed7017a	Revert "Work around nullptr-with-nonzero-offset warning" This reverts commit c27fa399042f466080e79bb4fd8a4871bc0bcf28.	2023-09-28 17:35:05 -04:00
Dimitri Papadopoulos	fe34776c20	Fix new typos found by codespell	2023-09-23 18:56:01 +02:00
Nick Terrell	cdceb0fce5	Improve macro guards for ZSTD_assertValidSequence Refine the macro guards to define the functions exactly when they are needed. This fixes the chromium build with zstd. Thanks to @GregTho for reporting!	2023-09-22 16:36:14 -04:00
Nick Terrell	c27fa39904	Work around nullptr-with-nonzero-offset warning See comment.	2023-08-25 13:20:59 -04:00
Yann Collet	c123e69ad0	fixed static analyzer false positive regarding @sequence initialization make a mock initialization to please the tool	2023-06-16 16:24:48 -07:00
Yann Collet	c60dcedcc9	adapted long decoder to new decodeSequences removed older decodeSequences	2023-06-16 15:52:00 -07:00
Yann Collet	33fca19dd4	changed ZSTD_decompressSequences_bodySplitLitBuffer() decoding loop to behave more like the regular decoding loop.	2023-06-16 15:32:07 -07:00
Yann Collet	84e898a76c	removed _old variant from splitLit	2023-06-16 14:42:28 -07:00
Yann Collet	02134fad12	changed (partially) the decodeSequences flow logic this allows detecting overflow events without a checksum.	2023-06-16 11:57:12 -07:00
Yann Collet	b46236278a	detect extraneous bytes in the Sequences section when nbSeq == 0. Reported by @ip7z	2023-06-13 11:43:45 -07:00
Yann Collet	3732a08f5b	fixed decoder behavior when nbSeqs==0 is encoded using 2 bytes The sequence section starts with a number, which tells how many sequences are present in the section. If this number is 0, the section automatically ends. The number 0 can be represented using the 1-byte or the 2-byte format. That's because the 2-byte format fully overlaps the 1-byte format. However, when 0 is represented using the 2-byte format, the decoder was expecting the sequence section to continue, and was looking for FSE tables, which is incorrect. Fixed this behavior, in both the reference decoder and the educational decoder. In practice, this behavior never happens, because the encoder will always select the 1-byte format to represent 0, since this is more efficient. Completed the fix with a new golden sample for tests, a clarification of the specification, and a decoder errata paragraph.	2023-06-05 16:03:00 -07:00
Nick Terrell	61efb2a047	Add ZSTD_d_maxBlockSize parameter Reduces memory when blocks are guaranteed to be smaller than allowed by the format. This is useful for streaming compression in conjunction with ZSTD_c_maxBlockSize. This PR saves 2 * (formatMaxBlockSize - paramMaxBlockSize) when streaming. Once it is rebased on top of PR #3616 it will save 3 * (formatMaxBlockSize - paramMaxBlockSize).	2023-04-17 22:06:44 -07:00
Nick Terrell	0abf2baef9	Reduce streaming decompression memory by 128KB The split literals buffer patch increased streaming decompression memory by 64KB (shrunk lit buffer from 128KB to 64KB, and added 128KB). This patch removes the added 128KB buffer, because it isn't necessary. The buffer was there because the literals compression code didn't know the true `blockSizeMax` of the frame, and always put split literals so they ended 128KB - 32 from the beginning of the block. Instead, we can pass down the true `blockSizeMax` and ensure that the split literals end up at `blockSizeMax - 32` from the beginning of the block. We already reserve a full `blockSizeMax` bytes in streaming mode, so we won't be overwriting the extDict window.	2023-04-17 16:31:02 -07:00
Yann Collet	e4120c5513	fixing potential over-reads detected by @terrelln, these issues could be triggered in specific scenarios namely decompression of certain invalid magic-less frames, or requested properties from certain invalid skippable frames.	2023-04-03 16:52:32 -07:00
daniellerozenblit	fcaf06ddb4	Check that `dest` is valid for decompression (#3555 ) * add check for valid dest buffer and fuzz on random dest ptr when malloc 0 * add uptrval to linux-kernel * remove bin files * get rid of uptrval * restrict max pointer value check to platforms where sizeof(size_t) == sizeof(void*)	2023-03-31 23:00:55 -07:00
Han Zhu	b558190ac7	Remove clang-only branch hints from ZSTD_decodeSequence Looking at the __builtin_expect in ZSTD_decodeSequence: { size_t offset; #if defined(__clang__) if (LIKELY(ofBits > 1)) { #else if (ofBits > 1) { #endif ZSTD_STATIC_ASSERT(ZSTD_lo_isLongOffset == 1); From profile-annotated assembly, the probability of ofBits > 1 is about 75% (101k counts out of 135k counts). This is much smaller than the recommended likelihood to use __builtin_expect which is 99%. As a result, clang moved the else block further away which hurts cache locality. Removing this __builtin_expect along with two others in ZSTD_decodeSequence gave better performance when PGO is enabled. I suggest removing these branch hints and relying on PGO, which leverages runtime profiles from actual workload to calculate branch probability instead.	2023-03-28 15:36:22 -07:00
Nick Terrell	fbd97f305a	Deprecated bufferless and block level APIs * Mark all bufferless and block level functions as deprecated * Update documentation to suggest not using these functions * Add `_deprecated()` wrappers for functions that we use internally and call those instead	2023-03-16 10:04:15 -07:00
Dimitri Papadopoulos	547794ef40	Fix typos found by codespell	2023-02-18 10:31:48 +01:00
Yonatan Komornik	c78f434aa4	Fix zstd-dll build missing dependencies (#3496) * Fixes zstd-dll build (https://github.com/facebook/zstd/issues/3492): - Adds pool.o and threading.o dependency to the zstd-dll target - Moves custom allocation functions into header to avoid needing to add dependency on common.o - Adds test target for zstd-dll - Adds github workflow that builds zstd-dll	2023-02-12 12:32:31 -08:00
Elliot Gorokhovsky	a7de1d9f49	Fix all MSVC warnings (#3495 ) * fix and test MSVC AVX2 build * treat msbuild warnings as errors * fix incorrect MSVC 2019 compiler warning * fix MSVC error D9035: option 'Gm' has been deprecated and will be removed in a future release	2023-02-11 10:56:59 -05:00
Nick Terrell	71a0259247	Fix ZSTD_getOffsetInfo() when nbSeq == 0 In 32-bit mode, ZSTD_getOffsetInfo() can be called when nbSeq == 0, and in this case the offset table is uninitialized. The function should just return 0 for both values, because there are no sequences. Credit to OSS-Fuzz	2023-02-02 14:26:41 -08:00
Nick Terrell	cc3e3acd34	Fix 32-bit decoding with large dictionary The 32-bit decoder could corrupt the regenerated data by using regular offset mode when there were actually long offsets. This is because we were only considering the window size in the calculation, not the dictionary size. So a large dictionary could allow longer offsets. Fix this in two ways: 1. Instead of looking at the window size, look at the total referenceable bytes in the history buffer. Use this in the comparison instead of the window size. Additionally, we were comparing against the wrong value, it was too low. Fix that by computing exactly the maximum offset for regular sequence decoding. 2. If it is possible that we have long offsets due to (1), then check the offset code decoding table, and if the decoding table's maximum number of additional bits is no more than STREAM_ACCUMULATOR_MIN, then we can't have long offsets. This gates us to be using the long offsets decoder only when we are very likely to actually have long offsets. Note that this bug only affects the decoding of the data, and the original compressed data, if re-read with a patched decoder, will correctly regenerate the original data. Except that the encoder also had the same issue previously. This fixes both the open OSS-Fuzz issues. Credit to OSS-Fuzz	2023-02-01 17:22:44 -08:00
Nick Terrell	2f74507bbd	Simplify 32-bit long offsets decoding logic The previous code had an issue when `bitsConsumed == 32` it would read 0 bits for the `ofBits` read, which violates the precondition of `BIT_readBitsFast()`. This can happen when the stream is corrupted. Fix this issue by always reading the maximum possible number of extra bits. I've measured neutral decoding performance, likely because this branch is unlikely, but this should be faster anyways. And if not, it is only 32-bit decoding, so performance isn't as critical. Credit to OSS-Fuzz	2023-01-30 12:21:42 -08:00
Nick Terrell	b3b43f2893	Fix invalid assert in 32-bit decoding The assert is only correct for valid sequences, so disable it for everything except round trip fuzzers.	2023-01-27 14:40:38 -08:00
Nick Terrell	bda947e17a	[huf] Fix bug in fast C decoders The input bounds checks were buggy because they were only breaking from the inner loop, not the outer loop. The fuzzers found this immediately. The fix is to use `goto _out` instead of `break`. This condition can happen on corrupted inputs. I've benchmarked before and after on x86-64 and there were small changes in performance, some positive, and some negative, and they end up about balancing out. Credit to OSS-Fuzz	2023-01-26 14:39:13 -08:00
Yann Collet	efc9ae3480	Merge pull request #3455 from facebook/fix3454 Provide more accurate error codes for busy-loop scenarios	2023-01-25 15:22:51 -08:00
Nick Terrell	8957fef554	[huf] Add generic C versions of the fast decoding loops Add generic C versions of the fast decoding loops to serve architectures that don't have an assembly implementation. Also allow selecting the C decoding loop over the assembly decoding loop through a zstd decompression parameter `ZSTD_d_disableHuffmanAssembly`. I benchmarked on my Intel i9-9900K and my Macbook Air with an M1 processor. The benchmark command forces zstd to compress without any matches, using only literals compression, and measures only Huffman decompression speed: ``` zstd -b1e1 --compress-literals --zstd=tlen=131072 silesia.tar ``` The new fast decoding loops outperform the previous implementation uniformly, but don't beat the x86-64 assembly. Additionally, the fast C decoding loops suffer from the same stability problems that we've seen in the past, where the assembly version doesn't. So even though clang gets close to assembly on x86-64, it still has stability issues. \| Arch \| Function \| Compiler \| Default (MB/s) \| Assembly (MB/s) \| Fast (MB/s) \| \|---------\|----------------\|--------------\|----------------\|-----------------\|-------------\| \| x86-64 \| decompress 4X1 \| gcc-12.2.0 \| 1029.6 \| 1308.1 \| 1208.1 \| \| x86-64 \| decompress 4X1 \| clang-14.0.6 \| 1019.3 \| 1305.6 \| 1276.3 \| \| x86-64 \| decompress 4X2 \| gcc-12.2.0 \| 1348.5 \| 1657.0 \| 1374.1 \| \| x86-64 \| decompress 4X2 \| clang-14.0.6 \| 1027.6 \| 1659.9 \| 1468.1 \| \| aarch64 \| decompress 4X1 \| clang-12.0.5 \| 1081.0 \| N/A \| 1234.9 \| \| aarch64 \| decompress 4X2 \| clang-12.0.5 \| 1270.0 \| N/A \| 1516.6 \|	2023-01-25 13:47:51 -08:00
Yann Collet	db18a62f89	Provide more accurate error codes for busy-loop scenarios fixes #3454	2023-01-25 13:07:53 -08:00
Nick Terrell	dc2b3e8876	Fix -Wstringop-overflow warning Backported from kernel patch [0]. I wasn't able to reproduce the warning locally, but could repro it in the kernel. [0] https://lore.kernel.org/lkml/20220330193352.GA119296@embeddedor/	2023-01-23 10:12:25 -08:00
Nick Terrell	329169189c	Replace Huffman boolean args with flags bit set	2023-01-20 14:12:53 -08:00
Nick Terrell	0cc1b0cb22	Delete unused Huffman functions Remove all Huffman functions that aren't used by zstd.	2023-01-20 14:12:53 -08:00
Yann Collet	ea684c335a	added c89 build test to CI	2023-01-19 14:59:30 -08:00
Elliot Gorokhovsky	5d8cfa6b96	Deprecate advanced streaming functions (#3408 ) * deprecate advanced streaming functions * remove internal usage of the deprecated functions * nit * suppress warnings in tests/zstreamtest.c * purge ZSTD_initDStream_usingDict * nits * c90 compat * zstreamtest.c already disables deprecation warnings! * fix initDStream() return value * fix typo * wasn't able to import private symbol properly, this commit works around that * new strategy for zbuff * undo zbuff deprecation warning changes * move ZSTD_DISABLE_DEPRECATE_WARNINGS from .h to .c	2023-01-13 14:51:47 -05:00
Yann Collet	d5509080bc	Merge pull request #3419 from facebook/fix3416 fix root cause of #3416	2023-01-13 00:21:08 -08:00
Nick Terrell	5b266196a4	Add support for in-place decompression * Add a function and macro ZSTD_decompressionMargin() that computes the decompression margin for in-place decompression. The function computes a tight margin that works in all cases, and the macro computes an upper bound that will only work if flush isn't used. * When doing in-place decompression, make sure that our output buffer doesn't overlap with the input buffer. This ensures that we don't decide to use the portion of the output buffer that overlaps the input buffer for temporary memory, like for literals. * Add a simple unit test. * Add in-place decompression to the simple_round_trip and stream_round_trip fuzzers. This should help verify that our margin stays correct.	2023-01-12 16:28:08 -08:00
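
The `offset==0` handling described in commit a9fb8d4c41 can be sketched roughly as follows. This is a minimal illustration, not zstd's actual code: the helper names `sanitize_offset` and `offset_is_corrupted` are hypothetical, and only the comparison itself is taken from the commit message.

```c
#include <stddef.h>

/* Hypothetical helper (not part of zstd): an offset of 0 is invalid, so map
 * it to the largest size_t value instead of 1. The distance check below can
 * then never accept it. */
static size_t sanitize_offset(size_t offset)
{
    return (offset == 0) ? (size_t)(-1) : offset;
}

/* Hypothetical helper (not part of zstd): returns 1 when the match offset
 * reaches further back than the history available before oLitEnd.
 * The offset is compared against a distance rather than being subtracted
 * from a pointer, so a huge offset cannot cause arithmetic overflow. */
static int offset_is_corrupted(size_t offset,
                               const unsigned char* oLitEnd,
                               const unsigned char* virtualStart)
{
    return offset > (size_t)(oLitEnd - virtualStart);
}
```

With 8 bytes of history, an offset of 8 passes, an offset of 9 is rejected, and an offset of 0 (mapped to `(size_t)(-1)`) is always rejected, with no reliance on a checksum.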

709 Commits
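
Commits 3732a08f5b and b46236278a both concern a Sequences section that encodes zero sequences. The sketch below shows how the sequence count could be parsed and how an empty section could be validated. It assumes the 1-, 2- and 3-byte count layout described in the zstd format specification; the function names `read_nb_sequences` and `is_valid_empty_section` are hypothetical and are not taken from the zstd sources.

```c
#include <stddef.h>

/* Hypothetical parser (not zstd's code). Reads the sequence count at the
 * start of a Sequences section and returns the number of header bytes
 * consumed, or 0 if the input is too short. */
static size_t read_nb_sequences(const unsigned char* src, size_t srcSize,
                                int* nbSeq)
{
    if (srcSize < 1) return 0;
    if (src[0] < 128) {                 /* 1-byte format: 0..127 */
        *nbSeq = src[0];
        return 1;
    }
    if (src[0] < 255) {                 /* 2-byte format, overlaps 1-byte range */
        if (srcSize < 2) return 0;
        *nbSeq = ((src[0] - 128) << 8) + src[1];
        return 2;
    }
    if (srcSize < 3) return 0;          /* 3-byte format */
    *nbSeq = src[1] + (src[2] << 8) + 0x7F00;
    return 3;
}

/* Hypothetical check (not zstd's code). Returns 1 when the section encodes
 * zero sequences and ends right after the count, whichever format was used.
 * Trailing bytes after a zero count are treated as corruption. */
static int is_valid_empty_section(const unsigned char* src, size_t srcSize)
{
    int nbSeq = 0;
    size_t const hSize = read_nb_sequences(src, srcSize, &nbSeq);
    return hSize != 0 && nbSeq == 0 && hSize == srcSize;
}
```

Under these assumptions, both `{0x00}` and `{0x80, 0x00}` are accepted as empty sections, while `{0x00, 0x01}` is rejected because of the extraneous byte.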