krak/zstd

mirror of https://github.com/facebook/zstd.git synced 2025-03-06 16:56:49 +02:00

Author	SHA1	Message	Date
W. Felix Handte	39b7946b95	Define Macros for Possibly-Present Functions; Use Them Rather than Ifdef Guards	2023-05-04 12:18:58 -04:00
W. Felix Handte	b12e8cb3e7	Merge Ultra and Ultra2 Exclusion Ultra2 does not exist for dict compression, and so uses ultra. So ultra must be present if ultra2 is.	2023-05-04 12:18:58 -04:00
W. Felix Handte	6761e1c949	Tweak Ultra/Opt Guards	2023-05-04 12:18:58 -04:00
W. Felix Handte	5a75956001	Adjust Strategy in CParams to Avoid Using Excluded Block Compressors	2023-05-04 12:18:58 -04:00
W. Felix Handte	50cdf84f58	Macro-Exclude Block Compressors from Declaration/Definition	2023-05-04 12:18:58 -04:00
W. Felix Handte	81b86a2024	NULL Out Block Compressor Table Entries When Excluded Don't check about excluding `ZSTD_fast`. It's always included so that we know we can resolve downwards and hit a strategy that's present.	2023-05-04 12:18:58 -04:00
W. Felix Handte	cbf3e26316	Allow `ZSTD_selectBlockCompressor()` to Return NULL Return an error rather than segfaulting.	2023-05-04 12:18:58 -04:00
Daniel Kutenin	4c25ea329b	Disable unused variable warning in msan configurations	2023-04-20 11:14:08 +01:00
Nick Terrell	61efb2a047	Add ZSTD_d_maxBlockSize parameter Reduces memory when blocks are guaranteed to be smaller than allowed by the format. This is useful for streaming compression in conjunction with ZSTD_c_maxBlockSize. This PR saves 2 * (formatMaxBlockSize - paramMaxBlockSize) when streaming. Once it is rebased on top of PR #3616 it will save 3 * (formatMaxBlockSize - paramMaxBlockSize).	2023-04-17 22:06:44 -07:00
Nick Terrell	0abf2baef9	Reduce streaming decompression memory by 128KB The split literals buffer patch increased streaming decompression memory by 64KB (shrunk lit buffer from 128KB to 64KB, and added 128KB). This patch removes the added 128KB buffer, because it isn't necessary. The buffer was there because the literals compression code didn't know the true `blockSizeMax` of the frame, and always put split literals so they ended 128KB - 32 from the beginning of the block. Instead, we can pass down the true `blockSizeMax` and ensure that the split literals end up at `blockSizeMax - 32` from the beginning of the block. We already reserve a full `blockSizeMax` bytes in streaming mode, so we won't be overwriting the extDict window.	2023-04-17 16:31:02 -07:00
Yann Collet	e4120c5513	fixing potential over-reads detected by @terrelln, these issues could be triggered in specific scenarios, namely decompression of certain invalid magic-less frames, or requested properties from certain invalid skippable frames.	2023-04-03 16:52:32 -07:00
Yann Collet	2e29728797	fix #3583 As reported by @georgmu, the previous fix is undone by the later initialization. Switch order, so that initialization is adjusted by special case.	2023-04-03 09:45:11 -07:00
Yann Collet	9f58241dcc	updated version number to v1.5.5 also : updated man pages	2023-03-31 23:02:08 -07:00
daniellerozenblit	fcaf06ddb4	Check that `dest` is valid for decompression (#3555 ) * add check for valid dest buffer and fuzz on random dest ptr when malloc 0 * add uptrval to linux-kernel * remove bin files * get rid of uptrval * restrict max pointer value check to platforms where sizeof(size_t) == sizeof(void*)	2023-03-31 23:00:55 -07:00
Han Zhu	b558190ac7	Remove clang-only branch hints from ZSTD_decodeSequence Looking at the __builtin_expect in ZSTD_decodeSequence: { size_t offset; #if defined(__clang__) if (LIKELY(ofBits > 1)) { #else if (ofBits > 1) { #endif ZSTD_STATIC_ASSERT(ZSTD_lo_isLongOffset == 1); From profile-annotated assembly, the probability of ofBits > 1 is about 75% (101k counts out of 135k counts). This is much smaller than the recommended likelihood to use __builtin_expect which is 99%. As a result, clang moved the else block further away which hurts cache locality. Removing this __builtin_expect along with two others in ZSTD_decodeSequence gave better performance when PGO is enabled. I suggest removing these branch hints and relying on PGO, which leverages runtime profiles from actual workload to calculate branch probability instead.	2023-03-28 15:36:22 -07:00
Han Zhu	e6dccbf482	Inline BIT_reloadDStream Inlining `BIT_reloadDStream` provided >3% decompression speed improvement for clang PGO-optimized zstd binary, measured using the Silesia corpus with compression level 1. The win comes from improved register allocation which leads to fewer spills and reloads. Take a look at this comparison of profile-annotated hot assembly before and after this change: https://www.diffchecker.com/UjDGIyLz/. The diff is a bit messy, but notice three fewer moves after inlining. In general LLVM's register allocator works better when it can see more code. For example, when the register allocator sees a call instruction, it partitions the registers into caller registers and callee registers, and it is not free to do whatever it wants with all the registers for the current function. Inlining the callee lets the register allocator access all registers and use them more flexibly.	2023-03-28 15:36:02 -07:00
daniellerozenblit	3e0550ee52	fix window update (#3556 )	2023-03-21 13:28:26 -04:00
Nick Terrell	a3c3a38b9b	[lazy] Skip over incompressible data Every 256 bytes the lazy match finders process without finding a match, they will increase their step size by 1. So for bytes [0, 256) they search every position, for bytes [256, 512) they search every other position, and so on. However, they currently still insert every position into their hash tables. This is different from fast & dfast, which only insert the positions they search. This PR changes that, so now after we've searched 2KB without finding any matches, at which point we'll only be searching one in 9 positions, we'll stop inserting every position, and only insert the positions we search. The exact cutoff of 2KB isn't terribly important, I've just selected a cutoff that is reasonably large, to minimize the impact on "normal" data. This PR only adds skipping to greedy, lazy, and lazy2, but does not touch btlazy2.

| Dataset | Level | Compiler     | CSize ∆ | Speed ∆ |
|---------|-------|--------------|---------|---------|
| Random  | 5     | clang-14.0.6 | 0.0%    | +704%   |
| Random  | 5     | gcc-12.2.0   | 0.0%    | +670%   |
| Random  | 7     | clang-14.0.6 | 0.0%    | +679%   |
| Random  | 7     | gcc-12.2.0   | 0.0%    | +657%   |
| Random  | 12    | clang-14.0.6 | 0.0%    | +1355%  |
| Random  | 12    | gcc-12.2.0   | 0.0%    | +1331%  |
| Silesia | 5     | clang-14.0.6 | +0.002% | +0.35%  |
| Silesia | 5     | gcc-12.2.0   | +0.002% | +2.45%  |
| Silesia | 7     | clang-14.0.6 | +0.001% | -1.40%  |
| Silesia | 7     | gcc-12.2.0   | +0.007% | +0.13%  |
| Silesia | 12    | clang-14.0.6 | +0.011% | +22.70% |
| Silesia | 12    | gcc-12.2.0   | +0.011% | -6.68%  |
| Enwik8  | 5     | clang-14.0.6 | 0.0%    | -1.02%  |
| Enwik8  | 5     | gcc-12.2.0   | 0.0%    | +0.34%  |
| Enwik8  | 7     | clang-14.0.6 | 0.0%    | -1.22%  |
| Enwik8  | 7     | gcc-12.2.0   | 0.0%    | -0.72%  |
| Enwik8  | 12    | clang-14.0.6 | 0.0%    | +26.19% |
| Enwik8  | 12    | gcc-12.2.0   | 0.0%    | -5.70%  |

The speed difference for clang at level 12 is real, but is probably caused by some sort of alignment or codegen issues. clang is significantly slower than gcc before this PR, but gets up to parity with it. I also measured the ratio difference for the HC match finder, and it looks basically the same as the row-based match finder. The speedup on random data looks similar. And performance is about neutral, without the big difference at level 12 for either clang or gcc.	2023-03-20 11:18:29 -07:00
Yann Collet	e2208242ac	Merge pull request #3553 from facebook/ldm_dict added documentation for LDM + dictionary compatibility	2023-03-16 11:20:32 -07:00
Nick Terrell	fbd97f305a	Deprecated bufferless and block level APIs * Mark all bufferless and block level functions as deprecated * Update documentation to suggest not using these functions * Add `_deprecated()` wrappers for functions that we use internally and call those instead	2023-03-16 10:04:15 -07:00
daniellerozenblit	53bad103ce	patch-from speed optimization (#3545 ) * patch-from speed optimization: only load portion of dictionary into normal matchfinders * test regression for x8 multiplier * fix off-by-one error for bit shift bound * restrict patchfrom speed optimization to strategy < ZSTD_btultra * update results.csv * update regression test	2023-03-14 20:36:56 -04:00
Yann Collet	f4563d87b9	added documentation for LDM + dictionary compatibility	2023-03-14 17:17:21 -07:00
Yonatan Komornik	91f4c23e63	Add salt into row hash (#3528 part 2) (#3533 ) Part 2 of #3528 Adds hash salt that helps to avoid regressions where consecutive compressions use the same tag space with similar data (running zstd -b5e7 enwik8 -B128K reproduces this regression).	2023-03-13 15:34:13 -07:00
Yonatan Komornik	9420bce8a4	Add init once memory (#3528 ) (#3529 ) - Adds memory type that is guaranteed to have been initialized at least once in the workspace's lifetime. - Changes tag space in row hash to be based on init once memory.	2023-03-13 13:20:49 -07:00
Yonatan Komornik	a91e91d614	[Bugfix] row hash tries to match position 0 (#3548 ) #3543 decreases the size of the tagTable by a factor of 2, which requires using the first tag position in each row for head position instead of a tag. Although position 0 stopped being a valid match, it still persisted in mask calculation resulting in the matches loops possibly terminating before it should have. The fix skips position 0 to solve this problem.	2023-03-13 10:00:03 -07:00
Yonatan Komornik	33e39094e7	Reduce RowHash's tag space size by x2 (#3543 ) Allocate half the memory for tag space, which means that we get one less slot for an actual tag (needs to be used for next position index). The result is a slight loss in compression ratio (up to 0.2%) and some regressions/improvements to speed depending on level and sample. In turn, we get to save 16% of the hash table's space (5 bytes per entry instead of 6 bytes per entry).	2023-03-10 14:15:04 -08:00
Nick Terrell	c40c7378c6	Clarify dstCapacity requirements Clarify `dstCapacity` requirements for single-pass functions. Fixes #3524.	2023-03-09 10:18:30 -08:00
Nick Terrell	07a2a33135	Add ZSTD_set{C,F,}Params() helper functions * Add ZSTD_setFParams() and ZSTD_setParams() * Modify ZSTD_setCParams() to use ZSTD_setParameter() to avoid a second path setting parameters * Add unit tests * Update documentation to suggest using them to replace deprecated functions Fixes #3396.	2023-03-08 09:57:35 -08:00
Yonatan Komornik	988ce61a0c	Adds initialization of clevel to static cdict (#3525 ) (#3527 ) - Initializes clevel in `ZSTD_CCtxParams_init` - Adds CI workflow for msan fuzzers runs without optimization (`-O0`) - Fixes Makefile to correctly pass on user defined `MOREFLAGS` and `FUZZER_FLAGS` in cases they have been overwritten	2023-03-06 18:05:12 -08:00
Yann Collet	bd86e24637	Merge pull request #3513 from DimitriPapadopoulos/codespell Fix typos found by codespell	2023-02-27 11:44:31 -08:00
Nick Terrell	395a2c5462	[bug-fix] Fix rare corruption bug affecting the block splitter The block splitter confuses sequences with literal length == 65536 that use a repeat offset code. It interprets this as literal length == 0 when deciding the meaning of the repeat offset, and corrupts the repeat offset history. This is benign, merely causing suboptimal compression performance, if the confused history is flushed before the end of the block, e.g. if there are 3 consecutive non-repeat code sequences after the mistake. It also is only triggered if the block splitter decided to split the block. All that to say: This is a rare bug, and requires quite a few conditions to trigger. However, the good news is that if you have a way to validate that the decompressed data is correct, e.g. you've enabled zstd's checksum or have a checksum elsewhere, the original data is very likely recoverable. So if you were affected by this bug please reach out. The fix is to remind the block splitter that the literal length is actually 64K. The test case is a bit tricky to set up, but I've managed to reproduce the issue. Thanks to @danlark1 for alerting us to the issue and providing us a reproducer!	2023-02-23 10:54:31 -08:00
Dimitri Papadopoulos	547794ef40	Fix typos found by codespell	2023-02-18 10:31:48 +01:00
Yonatan Komornik	c78f434aa4	Fix zstd-dll build missing dependencies (#3496 ) * Fixes zstd-dll build (https://github.com/facebook/zstd/issues/3492): - Adds pool.o and threading.o dependency to the zstd-dll target - Moves custom allocation functions into header to avoid needing to add dependency on common.o - Adds test target for zstd-dll - Adds github workflow that builds zstd-dll	2023-02-12 12:32:31 -08:00
Elliot Gorokhovsky	a7de1d9f49	Fix all MSVC warnings (#3495 ) * fix and test MSVC AVX2 build * treat msbuild warnings as errors * fix incorrect MSVC 2019 compiler warning * fix MSVC error D9035: option 'Gm' has been deprecated and will be removed in a future release	2023-02-11 10:56:59 -05:00
Elliot Gorokhovsky	ff42ed1582	Rename "External Matchfinder" to "Block-Level Sequence Producer" (#3484 ) * change "external matchfinder" to "external sequence producer" * migrate contrib/ to new naming convention * fix contrib build * fix error message * update debug strings * fix def of invalid sequences in zstd.h * nit * update CHANGELOG * fix .gitignore	2023-02-09 17:01:17 -05:00
Yann Collet	c689310b25	rewrite legacy v0.7 bound checks to be independent of address space overflow	2023-02-07 17:11:07 -08:00
Yann Collet	c5bf6b8b88	add requested check for legacy decoder v0.1 which uses a different technique to store literals, and therefore must check for potential overwrites.	2023-02-07 14:47:16 -08:00
Yann Collet	9419747171	fix legacy decoders v0.4, v0.5 and v0.6	2023-02-07 14:02:12 -08:00
Yann Collet	67d7a659f8	port fix for v0.3 to v0.6 in case it would be applicable for this version	2023-02-07 13:55:30 -08:00
Yann Collet	7a1a171658	port fix for v0.3 to v0.5 in case it would be applicable for this version too	2023-02-07 13:55:30 -08:00
Yann Collet	b20e4e95f2	copy fix for v0.3 to v0.4 in case it would be applicable for this legacy version too.	2023-02-07 13:55:30 -08:00
Yann Collet	7eb4471fec	adapt v0.3 fix to v0.1 slightly different constraints on end of buffer conditions	2023-02-07 13:55:30 -08:00
Yann Collet	cfec005efd	fix for v0.3 blindly ported to v0.2 in case it would be applicable here too.	2023-02-07 13:55:30 -08:00
Yann Collet	e04706c58c	fix oss-fuzz case 55714 impacts legacy decoder v0.3 in 32-bit mode	2023-02-07 13:55:30 -08:00
Nick Terrell	71a0259247	Fix ZSTD_getOffsetInfo() when nbSeq == 0 In 32-bit mode, ZSTD_getOffsetInfo() can be called when nbSeq == 0, and in this case the offset table is uninitialized. The function should just return 0 for both values, because there are no sequences. Credit to OSS-Fuzz	2023-02-02 14:26:41 -08:00
Elliot Gorokhovsky	31e41b3d5e	Merge pull request #3471 from embg/fast_seq_parse Reduce external matchfinder API overhead by 25%	2023-02-01 21:30:36 -05:00
Elliot Gorokhovsky	3fe5f1fbb9	assert externalRepSearch != ZSTD_ps_auto	2023-02-01 18:24:46 -08:00
Nick Terrell	cc3e3acd34	Fix 32-bit decoding with large dictionary The 32-bit decoder could corrupt the regenerated data by using regular offset mode when there were actually long offsets. This is because we were only considering the window size in the calculation, not the dictionary size. So a large dictionary could allow longer offsets. Fix this in two ways: 1. Instead of looking at the window size, look at the total referencable bytes in the history buffer. Use this in the comparison instead of the window size. Additionally, we were comparing against the wrong value, it was too low. Fix that by computing exactly the maximum offset for regular sequence decoding. 2. If it is possible that we have long offsets due to (1), then check the offset code decoding table, and if the decoding table's maximum number of additional bits is no more than STREAM_ACCUMULATOR_MIN, then we can't have long offsets. This gates us to be using the long offsets decoder only when we are very likely to actually have long offsets. Note that this bug only affects the decoding of the data, and the original compressed data, if re-read with a patched decoder, will correctly regenerate the original data. Except that the encoder also had the same issue previously. This fixes both the open OSS-Fuzz issues. Credit to OSS-Fuzz	2023-02-01 17:22:44 -08:00
Elliot Gorokhovsky	7f8189ca57	add ZSTD_c_fastExternalSequenceParsing cctxParam	2023-02-01 09:09:53 -08:00
Elliot Gorokhovsky	64052ef57d	Guard against invalid sequences from external matchfinders (#3465 )	2023-01-31 13:55:48 -05:00
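
The skipping rule described in commit a3c3a38b9b ("[lazy] Skip over incompressible data") can be sketched as below. This is only an illustration under stated assumptions, not the code from that commit: `search_step`, `scan_incompressible`, `ScanStats`, `SKIP_LOG` and `INSERT_ALL_LIMIT` are hypothetical names, and the simulation assumes no match is ever found, so the unmatched distance is simply the current position.

```c
#include <stddef.h>

/* Hypothetical constants taken from the commit message, not from zstd's source. */
#define SKIP_LOG 8             /* step grows by 1 for every 256 unmatched bytes */
#define INSERT_ALL_LIMIT 2048  /* the 2KB cutoff after which only searched positions are inserted */

/* Step size once `unmatched` bytes have been processed without a match. */
static size_t search_step(size_t unmatched)
{
    return 1 + (unmatched >> SKIP_LOG);
}

typedef struct {
    size_t searched;  /* positions where a match search was attempted */
    size_t inserted;  /* positions inserted into the hash table */
} ScanStats;

/* Simulate scanning `len` bytes in which no match is ever found. */
static ScanStats scan_incompressible(size_t len)
{
    ScanStats s = {0, 0};
    size_t pos = 0;
    while (pos < len) {
        size_t next = pos + search_step(pos);
        if (next > len) next = len;
        s.searched += 1;
        if (pos < INSERT_ALL_LIMIT) {
            s.inserted += next - pos;  /* before the cutoff: insert every position covered */
        } else {
            s.inserted += 1;           /* after the cutoff: insert only the searched position */
        }
        pos = next;
    }
    return s;
}
```

With these assumptions, `search_step(2048)` evaluates to 9, which agrees with the "one in 9 positions" figure in the commit message, and past the cutoff the number of insertions tracks the number of searches rather than the number of bytes.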

4422 Commits