krak/zstd

mirror of https://github.com/facebook/zstd.git synced 2025-03-06 16:56:49 +02:00

Author	SHA1	Message	Date
Jacob Greenfield	55ff3e4e17	Save one byte on the frame epilogue	2023-07-20 18:59:44 -04:00
Yann Collet	118200f7b9	Merge pull request #3677 from facebook/detectOverflow. Changed the decoding loop to detect more invalid cases of corruption sooner.	2023-07-05 00:59:08 -07:00
Yann Collet	25822342be	Merge pull request #3688 from nidhijaju/hide-asm-apple. Hide ASM symbols on Apple platforms.	2023-06-29 19:40:37 -07:00
Nidhi Jaju	b1a30e2b4a	hide ASM functions on Apple platforms	2023-06-26 00:07:30 +00:00
Elliot Gorokhovsky	c6a888c073	suppress false error message in LDM mode	2023-06-21 19:19:02 -07:00
Yann Collet	e4aeaebc20	fixed an incorrect test in the Win32 pthread wrapper, reported by @Banzai24-yht in #3683	2023-06-20 08:34:26 -07:00
Yann Collet	c123e69ad0	fixed a static analyzer false positive regarding @sequence initialization: added a mock initialization to please the tool	2023-06-16 16:24:48 -07:00
Yann Collet	c60dcedcc9	adapted long decoder to the new decodeSequences; removed the older decodeSequences	2023-06-16 15:52:00 -07:00
Yann Collet	33fca19dd4	changed ZSTD_decompressSequences_bodySplitLitBuffer() decoding loop to behave more like the regular decoding loop.	2023-06-16 15:32:07 -07:00
Yann Collet	84e898a76c	removed _old variant from splitLit	2023-06-16 14:42:28 -07:00
Yann Collet	02134fad12	changed (partially) the decodeSequences flow logic. This allows detecting overflow events without a checksum.	2023-06-16 11:57:12 -07:00
Yann Collet	d9645327b3	fixed MEM_STATIC being already defined in Linux kernel mode	2023-06-14 20:07:18 -07:00
Yann Collet	74c901bbed	fix: unused attribute for FORCE_INLINE functions. fix2: reloadDStreamFast is used by decompress4x2; modified the entry point so that it works in this case too.	2023-06-14 16:32:51 -07:00
Yann Collet	ba50807029	make the bitstream generate only 0-value bits after an overflow	2023-06-14 15:42:37 -07:00
Yann Collet	b46236278a	detect extraneous bytes in the Sequences section when nbSeq == 0. Reported by @ip7z	2023-06-13 11:43:45 -07:00
Yann Collet	3732a08f5b	fixed decoder behavior when nbSeqs==0 is encoded using 2 bytes. The sequence section starts with a number, which tells how many sequences are present in the section. If this number is 0, the section automatically ends. The number 0 can be represented using either the 1-byte or the 2-byte format, because the 2-byte format fully overlaps the 1-byte format. However, when 0 was represented using the 2-byte format, the decoder expected the sequence section to continue and looked for FSE tables, which is incorrect. Fixed this behavior in both the reference decoder and the educational decoder. In practice, this case never happens, because the encoder always selects the 1-byte format to represent 0, since it is more efficient. Completed the fix with a new golden sample for tests, a clarification of the specification, and a decoder errata paragraph.	2023-06-05 16:03:00 -07:00
Yann Collet	3e815f5b3a	Merge pull request #3664 from facebook/llu. Changed LLU suffix into ULL for Visual 2012 and lower.	2023-06-05 15:03:27 -04:00
Yann Collet	1f83b7cfc4	fix a minor inefficiency in compress_superblock and in `decodecorpus`: the specific case `nbSeq=127` can be represented using the 1-byte format. Note that both the 1-byte and the 2-byte formats are valid for this case, so there was no "error" and the produced data remains valid; the 1-byte format is simply more efficient. fix #3667. Credit to @ip7z for finding this issue.	2023-06-05 09:51:52 -07:00
Yann Collet	94a2f2791f	changed LLU suffix into ULL for Visual 2012 and lower. Both suffixes are supposed to be valid, but for some reason Visual 2012 and lower only support ULL.	2023-05-31 13:29:53 -07:00
Nick Terrell	d01a2c6929	Fix UBSAN issue (zero addition to NULL). Fix UBSAN issue that came up internally.	2023-05-26 13:43:47 -07:00
Duncan Horn	1b994cbc57	Get zstd working with ARM64EC on Windows	2023-05-23 18:40:31 -04:00
W. Felix Handte	1b65803fe7	Reorder Definitions in zstd_opt.c to Group Under Macro Guards (Slightly)	2023-05-22 12:41:48 -04:00
W. Felix Handte	59c7b2a492	Reorder Definitions in zstd_lazy.c to Group Under Macro Guards	2023-05-22 12:37:03 -04:00
W. Felix Handte	5490c75dda	Also Allow/Document/Test Excluding dfast and Up	2023-05-04 12:31:41 -04:00
W. Felix Handte	cc1ffe0bd6	Add Documentation to lib/README.md	2023-05-04 12:20:02 -04:00
W. Felix Handte	eb9227935e	Also Reorganize Zstd Opt Declarations	2023-05-04 12:18:58 -04:00
W. Felix Handte	d09f195ceb	Remove blockCompressor NULL Checks	2023-05-04 12:18:58 -04:00
W. Felix Handte	b7add1dd67	Abort if Unsupported Parameters Used	2023-05-04 12:18:58 -04:00
W. Felix Handte	f242f5be8f	Re-Order Lazy Declarations; Minimize ifndefs	2023-05-04 12:18:58 -04:00
W. Felix Handte	bae174960b	Add ZSTD_LIB_EXCLUDE_COMPRESSORS_DFAST_AND_UP Build Variable	2023-05-04 12:18:58 -04:00
W. Felix Handte	39b7946b95	Define Macros for Possibly-Present Functions; Use Them Rather than Ifdef Guards	2023-05-04 12:18:58 -04:00
W. Felix Handte	b12e8cb3e7	Merge Ultra and Ultra2 Exclusion. Ultra2 does not exist for dictionary compression, which uses ultra instead, so ultra must be present if ultra2 is.	2023-05-04 12:18:58 -04:00
W. Felix Handte	6761e1c949	Tweak Ultra/Opt Guards	2023-05-04 12:18:58 -04:00
W. Felix Handte	5a75956001	Adjust Strategy in CParams to Avoid Using Excluded Block Compressors	2023-05-04 12:18:58 -04:00
W. Felix Handte	50cdf84f58	Macro-Exclude Block Compressors from Declaration/Definition	2023-05-04 12:18:58 -04:00
W. Felix Handte	81b86a2024	NULL Out Block Compressor Table Entries When Excluded. Don't check for the exclusion of `ZSTD_fast`: it's always included, so that we know we can resolve downwards and hit a strategy that's present.	2023-05-04 12:18:58 -04:00
W. Felix Handte	cbf3e26316	Allow `ZSTD_selectBlockCompressor()` to Return NULL. Return an error rather than segfaulting.	2023-05-04 12:18:58 -04:00
Daniel Kutenin	4c25ea329b	Disable unused variable warning in msan configurations	2023-04-20 11:14:08 +01:00
Nick Terrell	61efb2a047	Add ZSTD_d_maxBlockSize parameter. Reduces memory when blocks are guaranteed to be smaller than allowed by the format. This is useful for streaming compression in conjunction with ZSTD_c_maxBlockSize. This PR saves 2 * (formatMaxBlockSize - paramMaxBlockSize) when streaming. Once it is rebased on top of PR #3616, it will save 3 * (formatMaxBlockSize - paramMaxBlockSize).	2023-04-17 22:06:44 -07:00
Nick Terrell	0abf2baef9	Reduce streaming decompression memory by 128KB. The split literals buffer patch increased streaming decompression memory by 64KB (shrunk the lit buffer from 128KB to 64KB, and added 128KB). This patch removes the added 128KB buffer, because it isn't necessary. The buffer was there because the literals compression code didn't know the true `blockSizeMax` of the frame, and always placed split literals so that they ended 128KB - 32 from the beginning of the block. Instead, we can pass down the true `blockSizeMax` and ensure that the split literals end at `blockSizeMax - 32` from the beginning of the block. We already reserve a full `blockSizeMax` bytes in streaming mode, so we won't be overwriting the extDict window.	2023-04-17 16:31:02 -07:00
Yann Collet	e4120c5513	fixed potential over-reads detected by @terrelln. These issues could be triggered in specific scenarios, namely decompression of certain invalid magic-less frames, or requesting properties from certain invalid skippable frames.	2023-04-03 16:52:32 -07:00
Yann Collet	2e29728797	fix #3583. As reported by @georgmu, the previous fix was undone by the later initialization. Switched the order, so that the initialization is adjusted by the special case.	2023-04-03 09:45:11 -07:00
Yann Collet	9f58241dcc	updated version number to v1.5.5; also updated man pages	2023-03-31 23:02:08 -07:00
daniellerozenblit	fcaf06ddb4	Check that `dest` is valid for decompression (#3555): add check for valid dest buffer and fuzz on random dest ptr when malloc 0; add uptrval to linux-kernel; remove bin files; get rid of uptrval; restrict max pointer value check to platforms where sizeof(size_t) == sizeof(void*)	2023-03-31 23:00:55 -07:00
Han Zhu	b558190ac7	Remove clang-only branch hints from ZSTD_decodeSequence. Looking at the __builtin_expect in ZSTD_decodeSequence: `{ size_t offset; #if defined(__clang__) if (LIKELY(ofBits > 1)) { #else if (ofBits > 1) { #endif ZSTD_STATIC_ASSERT(ZSTD_lo_isLongOffset == 1);` From profile-annotated assembly, the probability of ofBits > 1 is about 75% (101k counts out of 135k counts). This is much lower than the 99% likelihood recommended for using __builtin_expect. As a result, clang moved the else block further away, which hurts cache locality. Removing this __builtin_expect, along with two others in ZSTD_decodeSequence, gave better performance when PGO is enabled. I suggest removing these branch hints and relying on PGO, which leverages runtime profiles from the actual workload to calculate branch probabilities instead.	2023-03-28 15:36:22 -07:00
Han Zhu	e6dccbf482	Inline BIT_reloadDStream. Inlining `BIT_reloadDStream` provided a >3% decompression speed improvement for a clang PGO-optimized zstd binary, measured using the Silesia corpus with compression level 1. The win comes from improved register allocation, which leads to fewer spills and reloads. Take a look at this comparison of profile-annotated hot assembly before and after this change: https://www.diffchecker.com/UjDGIyLz/. The diff is a bit messy, but notice the three fewer moves after inlining. In general, LLVM's register allocator works better when it can see more code. For example, when the register allocator sees a call instruction, it partitions the registers into caller-saved and callee-saved registers, and it is not free to do whatever it wants with all the registers for the current function. Inlining the callee lets the register allocator access all registers and use them more flexibly.	2023-03-28 15:36:02 -07:00
daniellerozenblit	3e0550ee52	fix window update (#3556)	2023-03-21 13:28:26 -04:00
Nick Terrell	a3c3a38b9b	[lazy] Skip over incompressible data. For every 256 bytes the lazy match finders process without finding a match, they increase their step size by 1. So for bytes [0, 256) they search every position, for bytes [256, 512) they search every other position, and so on. However, they currently still insert every position into their hash tables. This is different from fast & dfast, which only insert the positions they search. This PR changes that: after we've searched 2KB without finding any matches, at which point we're only searching one in 9 positions, we stop inserting every position and only insert the positions we search. The exact cutoff of 2KB isn't terribly important; I've just selected a cutoff that is reasonably large, to minimize the impact on "normal" data. This PR only adds skipping to greedy, lazy, and lazy2, and does not touch btlazy2.
| Dataset | Level | Compiler     | CSize ∆  | Speed ∆  |
|---------|-------|--------------|----------|----------|
| Random  | 5     | clang-14.0.6 | 0.0%     | +704%    |
| Random  | 5     | gcc-12.2.0   | 0.0%     | +670%    |
| Random  | 7     | clang-14.0.6 | 0.0%     | +679%    |
| Random  | 7     | gcc-12.2.0   | 0.0%     | +657%    |
| Random  | 12    | clang-14.0.6 | 0.0%     | +1355%   |
| Random  | 12    | gcc-12.2.0   | 0.0%     | +1331%   |
| Silesia | 5     | clang-14.0.6 | +0.002%  | +0.35%   |
| Silesia | 5     | gcc-12.2.0   | +0.002%  | +2.45%   |
| Silesia | 7     | clang-14.0.6 | +0.001%  | -1.40%   |
| Silesia | 7     | gcc-12.2.0   | +0.007%  | +0.13%   |
| Silesia | 12    | clang-14.0.6 | +0.011%  | +22.70%  |
| Silesia | 12    | gcc-12.2.0   | +0.011%  | -6.68%   |
| Enwik8  | 5     | clang-14.0.6 | 0.0%     | -1.02%   |
| Enwik8  | 5     | gcc-12.2.0   | 0.0%     | +0.34%   |
| Enwik8  | 7     | clang-14.0.6 | 0.0%     | -1.22%   |
| Enwik8  | 7     | gcc-12.2.0   | 0.0%     | -0.72%   |
| Enwik8  | 12    | clang-14.0.6 | 0.0%     | +26.19%  |
| Enwik8  | 12    | gcc-12.2.0   | 0.0%     | -5.70%   |
The speed difference for clang at level 12 is real, but is probably caused by some sort of alignment or codegen issue: clang was significantly slower than gcc before this PR, and reaches parity with it afterwards. I also measured the ratio difference for the HC match finder, and it looks basically the same as for the row-based match finder. The speedup on random data looks similar, and performance is otherwise about neutral, without the big difference at level 12 for either clang or gcc.	2023-03-20 11:18:29 -07:00
Yann Collet	e2208242ac	Merge pull request #3553 from facebook/ldm_dict. Added documentation for LDM + dictionary compatibility.	2023-03-16 11:20:32 -07:00
Nick Terrell	fbd97f305a	Deprecated bufferless and block level APIs: mark all bufferless and block level functions as deprecated; update documentation to suggest not using these functions; add `_deprecated()` wrappers for functions that we use internally, and call those instead	2023-03-16 10:04:15 -07:00

4652 Commits
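Commits 3732a08f5b, b46236278a and 1f83b7cfc4 all concern the sequence count that opens the Sequences section. The sketch below illustrates that field as a hypothetical example, not code taken from zstd: the helper names `decode_nb_seq`, `encode_nb_seq` and `check_seq_section_header` are invented here. It follows the 1/2/3-byte layout of the Number_of_Sequences field in the Zstandard format specification (RFC 8878), and it assumes, going by the wording of commit b46236278a, that any byte following a zero count should be treated as corruption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not zstd API): parses the Number_of_Sequences field.
 * Returns the header size in bytes (1..3), or 0 if the input is truncated. */
static size_t decode_nb_seq(const uint8_t *src, size_t srcSize, unsigned *nbSeq)
{
    if (srcSize < 1) return 0;
    uint8_t const byte0 = src[0];
    if (byte0 < 128) {                 /* 1-byte format: 0..127 */
        *nbSeq = byte0;
        return 1;
    }
    if (byte0 < 255) {                 /* 2-byte format: 0..0x7EFF, overlaps the 1-byte range */
        if (srcSize < 2) return 0;
        *nbSeq = ((unsigned)(byte0 - 128) << 8) + src[1];
        return 2;
    }
    if (srcSize < 3) return 0;         /* 3-byte format: 0x7F00..0x17EFF */
    *nbSeq = src[1] + ((unsigned)src[2] << 8) + 0x7F00;
    return 3;
}

/* Hypothetical helper: always picks the shortest format, so 0..127
 * (including 127, the case fixed in 1f83b7cfc4) use a single byte. */
static size_t encode_nb_seq(uint8_t *dst, unsigned nbSeq)
{
    assert(nbSeq <= 0x17EFF);          /* largest value the field can express */
    if (nbSeq < 128) {
        dst[0] = (uint8_t)nbSeq;
        return 1;
    }
    if (nbSeq < 0x7F00) {
        dst[0] = (uint8_t)((nbSeq >> 8) + 128);
        dst[1] = (uint8_t)nbSeq;       /* low 8 bits */
        return 2;
    }
    unsigned const v = nbSeq - 0x7F00; /* stored little-endian in 2 bytes */
    dst[0] = 255;
    dst[1] = (uint8_t)(v & 0xFF);
    dst[2] = (uint8_t)(v >> 8);
    return 3;
}

/* Hypothetical check of a whole Sequences section:
 * returns 0 for a valid empty section, 1 if sequences (and their
 * compression modes / FSE tables) follow, or -1 on corruption. */
static int check_seq_section_header(const uint8_t *src, size_t srcSize)
{
    unsigned nbSeq;
    size_t const hSize = decode_nb_seq(src, srcSize, &nbSeq);
    if (hSize == 0) return -1;                     /* truncated header */
    if (nbSeq == 0)                                /* section ends here, whatever the format */
        return (hSize == srcSize) ? 0 : -1;        /* extra bytes are an error */
    return 1;
}
```

Because the 2-byte format also covers 0 to 127 (for example `{128, 0}` decodes to 0), the decoder has to end the section on a zero count no matter which format carried it; an encoder that always chooses the shortest form, as `encode_nb_seq` does here, never produces that case in the first place.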