krak/zstd - zstd - Gitea: Git with a cup of tea

krak/zstd

mirror of https://github.com/facebook/zstd.git synced 2025-03-07 01:10:04 +02:00

Author	SHA1	Message	Date
Yann Collet	481a2e1010	Merge pull request #3403 from facebook/setCParams ZSTD_CCtx_setCParams	2022-12-28 14:07:13 -08:00
Elliot Gorokhovsky	2a402626dd	External matchfinder API (#3333 ) * First building commit with sample matchfinder * Set up ZSTD_externalMatchCtx struct * move seqBuffer to ZSTD_Sequence* * support non-contiguous dictionary * clean up parens * add clearExternalMatchfinder, handle allocation errors * Add useExternalMatchfinder cParam * validate useExternalMatchfinder cParam * Disable LDM + external matchfinder * Check for static CCtx * Validate mState and mStateDestructor * Improve LDM check to cover both branches * Error API with optional fallback * handle RLE properly for external matchfinder * nit * Move to a CDict-like model for resource ownership * Add hidden useExternalMatchfinder bool to CCtx_params_s * Eliminate malloc, move to cwksp allocation * Handle CCtx reset properly * Ensure seqStore has enough space for external sequences * fix capitalization * Add DEBUGLOG statements * Add compressionLevel param to matchfinder API * fix c99 issues and add a param combination error code * nits * Test external matchfinder API * C90 compat for simpleExternalMatchFinder * Fix some @nocommits and an ASAN bug * nit * nit * nits * forward declare copySequencesToSeqStore functions in zstd_compress_internal.h * nit * nit * nits * Update copyright headers * Fix CMake zstreamtest build * Fix copyright headers (again) * typo * Add externalMatchfinder demo program to make contrib * Reduce memory consumption for small blockSize * ZSTD_postProcessExternalMatchFinderResult nits * test sum(matchlen) + sum(litlen) == srcSize in debug builds * refExternalMatchFinder -> registerExternalMatchFinder * C90 nit * zstreamtest nits * contrib nits * contrib nits * allow block splitter + external matchfinder, refactor * add windowSize param * add contrib/externalMatchfinder/README.md * docs * go back to old RLE heuristic because of the first block issue * fix initializer element is not a constant expression * ref contrib from zstd.h * extremely pedantic compiler warning fix, meson fix, typo fix * Additional docs on API limitations * minor nits * Refactor maxNbSeq calculation into a helper function * Fix copyright	2022-12-28 16:45:14 -05:00
Yann Collet	b17743e41b	Signal parameter change during MT compression	2022-12-28 13:14:58 -08:00
Yann Collet	89342d1e07	New xp library symbol : ZSTD_CCtx_setCParams() Inspired by #3395, offer a new capability to set all parameters defined in a ZSTD_compressionParameters structure with a single symbol invocation to improve user code brevity.	2022-12-27 23:49:22 -08:00
Yann Collet	089b2797e3	Merge pull request #3398 from facebook/fix3316 spec update : require minimum nb of literals for 4-streams mode	2022-12-22 16:57:05 -08:00
Yann Collet	6a9c525903	spec update : require minimum nb of literals for 4-streams mode Reported by @shulib : the specification for 4-streams mode doesn't work when the amount of literals to compress is 5 bytes. Extending it, it also doesn't work for sizes 1 or 2. This patch updates the specification and the implementation to require a minimum of 6 literals to trigger or accept the 4-streams mode. The impact is expected to be a no-op : the 4-streams mode is never triggered for such small quantity of literals anyway, since it would be wasteful (it costs ~7.3 bytes more than single-stream mode). An informal lower limit is set at ~256 bytes, so the technical minimum is very far from this limit. This is just meant for completeness of the specification.	2022-12-22 16:14:34 -08:00
Yann Collet	ea2895cef4	Support decompression of compressed blocks of size ZSTD_BLOCKSIZE_MAX exactly	2022-12-22 12:40:27 -08:00
Nick Terrell	40a7188130	Fix `make clangbuild` & add CI Fix the errors for: * `-Wdocumentation` * `-Wconversion` except `-Wsign-conversion`	2022-12-21 17:31:04 -08:00
W. Felix Handte	5d693cc38c	Coalesce Almost All Copyright Notices to Standard Phrasing ``` for f in $(find . $ -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache $ -prune -o -type f); do sed -i '/Copyright .* $Yann Collet$\\|$Meta Platforms$/ s/Copyright ./Copyright (c) Meta Platforms, Inc. and affiliates./' $f; done git checkout HEAD -- build/VS2010/libzstd-dll/libzstd-dll.rc build/VS2010/zstd/zstd.rc tests/test-license.py contrib/linux-kernel/test/include/linux/xxhash.h examples/streaming_compression_thread_pool.c lib/legacy/zstd_v0.c lib/legacy/zstd_v0*.h nano ./programs/windres/zstd.rc nano ./build/VS2010/zstd/zstd.rc nano ./build/VS2010/libzstd-dll/libzstd-dll.rc ```	2022-12-20 12:52:34 -05:00
W. Felix Handte	8927f985ff	Update Copyright Headers 'Facebook' -> 'Meta Platforms' ``` for f in $(find . $ -path ./.git -o -path ./tests/fuzz/corpora $ -prune -o -type f); do sed -i 's/Facebook, Inc\./Meta Platforms, Inc. and affiliates./' $f; done ```	2022-12-20 12:37:57 -05:00
Yann Collet	832c1a6a1c	minor reformatting and minor reliability and maintenance changes	2022-12-18 11:26:57 -08:00
Yann Collet	51355e1f70	Merge pull request #3362 from facebook/compressBound check potential overflow of compressBound()	2022-12-16 14:22:22 -08:00
Nick Terrell	ee6475cbbd	Add missing parens around macro definition Fixes #3301.	2022-12-15 17:18:23 -08:00
Yann Collet	45ed0df18a	check potential overflow of compressBound() fixed #3323, reported by @nigeltao Completed documentation around this risk (which is largely theoretical, I can't see that happening in any "real world" scenario, but an erroneous @srcSize value could indeed trigger it).	2022-12-15 15:23:15 -08:00
Nick Terrell	a91e7ec175	Fix corruption that rarely occurs in 32-bit mode with wlog=25 Fix an off-by-one error in the compressor that emits corrupt blocks if: * Zstd is compiled in 32-bit mode * The windowLog == 25 exactly * An offset of 2^25-3, 2^25-2, 2^25-1, or 2^25 is emitted * The bitstream had 7 bits leftover before writing the offset This bug has been present since before v1.0, but wasn't able to easily be triggered, since until somewhat recently zstd wasn't able to find matches that were within 128KB of the window size. Add a test case, and fix 2 bugs in `ZSTD_compressSequences()`: * The `ZSTD_isRLE()` check was incorrect. It wouldn't produce corruption, but it could waste CPU and not emit RLE even if the block was RLE * One windowSize was `1 << windowLog`, not `1u << windowLog` Thanks to @tansy for finding the issue, and giving us a reproducer! Fixes Issue #3350.	2022-12-15 14:41:50 -08:00
daniellerozenblit	e2fc93340f	Merge branch 'dev' into http-to-https	2022-12-15 10:46:13 -05:00
Danielle Rozenblit	4dffc35f2e	Convert references to https from http	2022-12-14 06:58:35 -08:00
Danielle Rozenblit	69ec75f0d5	fix window resizing edge case	2022-12-13 08:35:20 -08:00
Yann Collet	80cf73fd66	Merge pull request #3320 from cwoffenden/msvc-arm64-size_t Fix for MSVC C4267 warning on ARM64 (which becomes error C2220 with /WX)	2022-12-04 18:55:30 -08:00
Yann Collet	ecd7601c36	minor: proper pledgedSrcSize trace	2022-11-22 06:00:45 -08:00
Elliot Gorokhovsky	c8d870fe52	Improve LDM cparam validation logic	2022-11-21 15:39:18 -05:00
Carl Woffenden	0547c3d3f8	Random edit to re-run the CI I don't believe the (x64) Mac failure is related to error since it would take the SSE path.	2022-11-19 19:04:08 +01:00
Carl Woffenden	0168914490	Fix for MSVC C4267 error	2022-11-18 11:31:17 +01:00
Nick Terrell	dcc7228de9	[lazy] Use switch instead of indirect function calls. (#3295 ) Use a switch statement to select the search function instead of an indirect function call. This results in a sizable performance win. This PR is a modification of the approach taken in PR #2828. When I measured performance for that commit, it was neutral. However, I now see a performance regression on gcc, but still neutral on clang. I'm measuring on the same platform, but with newer compilers. The new approach beats both the current dev branch and the baseline before PR #2828 was merged. This PR is necessary for Issue #3275, to update zstd in the kernel. Without this PR there is a large regression in greedy - btlazy2 compression speed. With this PR it is about neutral. gcc version: 12.2.0 clang version: 14.0.6 dataset: silesia.tar \| Compiler \| Level \| Dev Speed (MB/s) \| PR Speed (MB/s) \| Delta \| \|----------\|-------\|------------------\|-----------------\|--------\| \| gcc \| 5 \| 102.6 \| 113.7 \| +10.8% \| \| gcc \| 7 \| 66.6 \| 74.8 \| +12.3% \| \| gcc \| 9 \| 51.5 \| 58.9 \| +14.3% \| \| gcc \| 13 \| 14.3 \| 14.3 \| +0.0% \| \| clang \| 5 \| 108.1 \| 114.8 \| +6.2% \| \| clang \| 7 \| 68.5 \| 72.3 \| +5.5% \| \| clang \| 9 \| 53.2 \| 56.2 \| +5.6% \| \| clang \| 13 \| 14.3 \| 14.7 \| +2.8% \| The binary size stays just about the same for clang and gcc, measured using the `size` command: \| Compiler \| Branch \| Text \| Data \| BSS \| Total \| \|----------\|--------\|---------\|------\|-----\|---------\| \| gcc \| dev \| 1127950 \| 3312 \| 280 \| 1131542 \| \| gcc \| PR \| 1123422 \| 2512 \| 280 \| 1126214 \| \| clang \| dev \| 1046254 \| 3256 \| 216 \| 1049726 \| \| clang \| PR \| 1048198 \| 2296 \| 216 \| 1050710 \|	2022-10-21 17:14:02 -07:00
Danielle Rozenblit	a910489ff5	No longer pass srcSize to minTableLog	2022-10-17 08:03:44 -07:00
Danielle Rozenblit	b34729018c	Minor simplication: no longer need to check src size if using cardinality for minTableLog	2022-10-17 07:55:07 -07:00
Danielle Rozenblit	75cd42afd7	Update regression results and better variable naming for HUF_cardinality	2022-10-14 13:37:19 -07:00
Danielle Rozenblit	e60cae33cf	Additional ratio optimizations	2022-10-14 10:37:35 -07:00
Danielle Rozenblit	fa7d9c1139	Set threshold to use optimal table log	2022-10-11 14:33:25 -07:00
Danielle Rozenblit	8888a2ddcc	CI failure fixes	2022-10-11 13:12:19 -07:00
Qiongsi Wu	1b445c1c2e	Fix hash4Ptr for big endian (#3227 )	2022-08-01 10:41:24 -07:00
udayanbapat	43f21a600e	Intial commit to address 3090. Added support to decompress empty block. (#3118 ) * Intial commit to address 3090. Added support to decompress empty block * Update zstd_decompress_block.c Addressed review comments for the case of 'set_basic' * Update lib/decompress/zstd_decompress_block.c Co-authored-by: Nick Terrell <nickrterrell@gmail.com> * Update lib/decompress/zstd_decompress_block.c Co-authored-by: Nick Terrell <nickrterrell@gmail.com> Co-authored-by: Nick Terrell <nickrterrell@gmail.com>	2022-07-14 11:54:34 -07:00
Elliot Gorokhovsky	cb9e341129	Nits	2022-06-23 16:59:21 -04:00
Elliot Gorokhovsky	93b89fb24b	Add docs	2022-06-22 16:13:07 -04:00
Elliot Gorokhovsky	2a128110d0	Add prefetchCDictTables CCtxParam	2022-06-22 16:13:07 -04:00
Elliot Gorokhovsky	f6ef14329f	"Short cache" optimization for level 1-4 DMS (+5-30% compression speed) (#3152 ) * first attempt at fast DMS short cache * significant wins for some scenarios * fix all clang regressions * nits * fix 1.5% gcc11 regression on hot 110Kdict scenario * fix CI * nit * Add tags to doublefast hash table * use tags in doublefast DMS * Fix CI * Clean up some hardcoded logic / constants * Switch forCCtx to an enum * nit * add short cache to ip+1 long search * Move tag size into hashLog * Minor nits * Truncate dictionaries greater than 16MB in short cache mode * Helper function for tag comparison * Cap short cache hashLog at 24 to prevent overflow * size_t dictTagsMatch -> int dictTagsMatch * nit * Clean up and comment dictionary truncation * Move ZSTD_tableFillPurpose_e next to ZSTD_dictTableLoadMethod_e * Comment and expand helper functions * Asserts and documentation * nit	2022-06-21 17:27:19 -04:00
Daniel Kutenin	05f3f415ce	Fix big endian ARM NEON path It is not using the NEON acceleration but the bit grouping was applied	2022-06-13 09:16:24 +01:00
Elliot Gorokhovsky	f313a773a4	Merge pull request #3157 from embg/huge_dict_bugfix Bugfix for huge dictionaries	2022-06-09 15:35:29 -04:00
Elliot Gorokhovsky	31bd6402c6	Bugfix for huge dictionaries	2022-06-09 11:39:30 -04:00
Nick Terrell	7c05b9aec3	Remove expensive assert in --rsyncable hot loop This assert slows the loop down by 10x. We can get similar coverage by asserting at the beginning & end of the loop. We need this fix because Debian compiles zstd with asserts enabled. Separately, we should ask them why, and if they would consider disabling asserts in their builds. Since we don't optimize for assert enabled builds. Fixes Issue #3150.	2022-06-06 11:56:13 -07:00
ihsinme	5081ccb056	Update zstd_compress.c	2022-05-30 14:08:19 +03:00
Danila Kutenin	9166c6ae20	Again unused error warning. Fixed	2022-05-23 14:51:47 +00:00
Danila Kutenin	6b561d230f	Move NEON version to a separate function and fix indentation	2022-05-23 14:49:35 +00:00
Danila Kutenin	778f639be9	Disable unused variable warning	2022-05-22 10:50:33 +00:00
Danila Kutenin	e11783b04d	[lazy] Optimize ZSTD_row_getMatchMask for level 8-10 We found that movemask is not used properly or consumes too much CPU. This effort helps to optimize the movemask emulation on ARM. For level 8-9 we saw 3-5% improvements. For level 10 we say 1.5% improvement. The key idea is not to use pure movemasks but to have groups of bits. For rowEntries == 16, 32 we are going to have groups of size 4 and 2 respectively. It means that each bit will be duplicated within the group Then we do AND to have only one bit set in the group so that iteration with lowering bit `a &= (a - 1)` works as well. Also, aarch64 does not have rotate instructions for 16 bit, only for 32 and 64, that's why we see more improvements for level 8-9. vshrn_n_u16 instruction is used to achieve that: vshrn_n_u16 shifts by 4 every u16 and narrows to 8 lower bits. See the picture below. It's also used in [Folly](`c570259008/folly/container/detail/F14Table.h (L446)`). It also uses 2 cycles according to Neoverse-N{1,2} guidelines. 64 bit movemask is already well optimized. We have ongoing experiments but were not able to validate other implementations work reliably faster.	2022-05-22 10:44:24 +00:00
Elliot Gorokhovsky	f349d18776	Merge pull request #3127 from embg/repcode_history Correct and clarify repcode offset history logic	2022-05-12 13:50:15 -04:00
Elliot Gorokhovsky	3620a0a565	Nits	2022-05-12 12:53:15 -04:00
W. Felix Handte	1dd046a507	Fix Comments Slightly	2022-05-11 12:38:45 -04:00
W. Felix Handte	cd1f582943	Hoist Hash Table Writes Up into Each Match Found Block Refactoring this way avoids the bad write in the case that `step > 4`, and is a bit more straightforward. It also seems to perform better!	2022-05-11 11:27:34 -04:00
W. Felix Handte	040986a4f4	ZSTD_fast_noDict: Minimize Checks When Writing Hash Table for ip1 This commit avoids checking whether a hashtable write is safe in two of the three match-found paths in `ZSTD_compressBlock_fast_noDict_generic`. This pro- duces a ~0.5% speed-up in compression. A comment in the code describes why we can skip this check in the other two paths (the repcode check and the first match check in the unrolled loop). A downside is that in the new position where we make this check, we have not yet computed `mLength`. We therefore have to avoid writing possibly dangerous positions, rather than the old check which only avoids writing actually dangerous positions. This leads to a miniscule loss in ratio (remember that this scenario can only been triggered in very negative levels or under incomp- ressibility acceleration).	2022-05-10 14:29:39 -07:00

1 2 3 4 5 ...

2168 Commits