The sequence section starts with a number indicating how many sequences are present in the section.
If this number is 0, the section ends immediately.
The value 0 can be represented using either the 1-byte or the 2-byte format,
because the 2-byte format fully overlaps the 1-byte one.
However, when 0 was represented using the 2-byte format,
the decoder expected the sequence section to continue
and went looking for FSE tables, which is incorrect.
Fixed this behavior, in both the reference decoder and the educational decoder.
In practice, this case never arises,
because the encoder always selects the 1-byte format to represent 0,
since it is more efficient.
Completed the fix with a new golden sample for tests,
a clarification of the specification,
and a decoder errata paragraph.
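For illustration, here is a minimal sketch of the header parsing described above, following the sequence-count encoding from the format specification (the function name and error convention are made up, not the reference decoder's):
```
#include <stddef.h>
#include <stdint.h>

/* Illustrative sketch (not the reference decoder): parse the sequence-count
 * header and treat a count of 0 as "no sequences", regardless of whether the
 * encoder used the 1-byte or the 2-byte representation. */
static int parseNbSequences(const uint8_t* src, size_t srcSize, uint32_t* nbSeq)
{
    if (srcSize < 1) return -1;
    {   uint8_t const b0 = src[0];
        if (b0 < 128) {                  /* 1-byte format */
            *nbSeq = b0;
            return 1;
        }
        if (b0 < 255) {                  /* 2-byte format: ((b0 - 0x80) << 8) + b1 */
            if (srcSize < 2) return -1;
            *nbSeq = ((uint32_t)(b0 - 0x80) << 8) + src[1];
            return 2;
        }
        /* 3-byte format: b1 + (b2 << 8) + 0x7F00 */
        if (srcSize < 3) return -1;
        *nbSeq = (uint32_t)src[1] + ((uint32_t)src[2] << 8) + 0x7F00;
        return 3;
    }
}

/* The point of the fix: callers check *nbSeq == 0 after parsing and end the
 * section there, even when the 2-byte encoding (0x80 0x00) was used. */
```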
Inlining `BIT_reloadDStream` provided a >3% decompression speed improvement for
the clang PGO-optimized zstd binary, measured on the Silesia corpus at
compression level 1. The win comes from improved register allocation, which leads
to fewer spills and reloads. Take a look at this comparison of
profile-annotated hot assembly before and after this change:
https://www.diffchecker.com/UjDGIyLz/. The diff is a bit messy, but notice three
fewer moves after inlining.
In general, LLVM's register allocator works better when it can see more code. For
example, when the register allocator sees a call instruction, it partitions the
registers into caller-saved and callee-saved sets, and it is not free to do
whatever it wants with all the registers in the current function. Inlining the
callee lets the register allocator access all registers and use them more
flexibly.
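For illustration only, the shape of such a change, assuming GCC/Clang attribute syntax (the macro and function names below are hypothetical; the real code goes through zstd's own portability macros and `BIT_reloadDStream`'s actual refill logic):
```
/* Hypothetical sketch: force a small, hot helper to be inlined into its
 * callers so the register allocator sees the whole hot loop instead of a
 * call boundary with caller-/callee-saved constraints. */
#if defined(__GNUC__) || defined(__clang__)
#  define FORCE_INLINE_SKETCH static inline __attribute__((always_inline))
#else
#  define FORCE_INLINE_SKETCH static inline
#endif

FORCE_INLINE_SKETCH unsigned reloadDStreamSketch(unsigned bitsConsumed)
{
    /* placeholder for the real refill logic */
    return bitsConsumed & 31;
}
```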
Part 2 of #3528
Adds a hash salt that helps avoid regressions where consecutive compressions use the same tag space with similar data (running `zstd -b5e7 enwik8 -B128K` reproduces this regression).
- Adds a memory type that is guaranteed to have been initialized at least once in the workspace's lifetime.
- Changes the tag space in row hash to be based on init-once memory.
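For illustration, a minimal sketch of the idea (the names and tag computation below are hypothetical, not the actual row-hash code): a per-compression salt, stored in init-once workspace memory, is mixed into the tag derivation so that consecutive compressions of similar data don't reuse the exact same tags.
```
#include <stdint.h>

/* Hypothetical sketch: mix a per-compression salt into the short tag derived
 * from the hash. The salt lives in init-once workspace memory and is
 * refreshed between compressions. */
typedef struct {
    uint64_t hashSalt;
} RowMatchStateSketch;

static uint8_t rowTagSketch(uint64_t hash, const RowMatchStateSketch* ms)
{
    return (uint8_t)((hash ^ ms->hashSalt) & 0xFF);   /* 8-bit tag, salted */
}
```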
* Fixes zstd-dll build (https://github.com/facebook/zstd/issues/3492):
- Adds pool.o and threading.o dependency to the zstd-dll target
- Moves custom allocation functions into a header to avoid needing to add a dependency on common.o
- Adds test target for zstd-dll
- Adds a GitHub workflow that builds zstd-dll
* fix and test MSVC AVX2 build
* treat msbuild warnings as errors
* fix incorrect MSVC 2019 compiler warning
* fix MSVC error D9035: option 'Gm' has been deprecated and will be removed in a future release
The previous code had an issue: when `bitsConsumed == 32` it would read 0
bits for the `ofBits` read, which violates the precondition of
`BIT_readBitsFast()`. This can happen when the stream is corrupted.
Fix this issue by always reading the maximum possible number of extra
bits. I've measured neutral decoding performance, likely because this
branch is rarely taken, but it should be faster anyway. And if not, it
only affects 32-bit decoding, where performance is less critical.
Credit to OSS-Fuzz
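For illustration, a sketch of the pattern behind the fix (the bit reader below is a stand-in stub so the snippet compiles; it is not the real `BIT_*` API, and `MAX_EXTRA_BITS_SKETCH` is a hypothetical cap):
```
#include <stdint.h>

/* Stand-in bit reader so the sketch compiles; not the real BIT_* functions. */
typedef struct { uint64_t bits; unsigned bitsConsumed; } BitReaderSketch;

static uint32_t readBitsFastSketch(BitReaderSketch* br, unsigned nbBits)
{
    /* Like BIT_readBitsFast(): callers must guarantee 1 <= nbBits < 32. */
    uint32_t const v = (uint32_t)(br->bits & ((1u << nbBits) - 1));
    br->bits >>= nbBits;
    br->bitsConsumed += nbBits;
    return v;
}

static void reloadSketch(BitReaderSketch* br) { br->bitsConsumed = 0; /* refill elided */ }

/* The pattern of the fix: when a long offset must be read in two parts on a
 * 32-bit target, always read a fixed, non-zero number of extra bits in the
 * second read, so the "nbBits >= 1" precondition holds even when the first
 * read drained the container. This branch is only taken when ofBits is large
 * enough that ofBits - MAX_EXTRA_BITS_SKETCH >= 1. */
#define MAX_EXTRA_BITS_SKETCH 5

static uint32_t readLongOffsetSketch(BitReaderSketch* br, unsigned ofBits, uint32_t ofBase)
{
    unsigned const extraBits = MAX_EXTRA_BITS_SKETCH;                        /* always > 0 */
    uint32_t offset = ofBase + (readBitsFastSketch(br, ofBits - extraBits) << extraBits);
    reloadSketch(br);                                                        /* refill between the two reads */
    offset += readBitsFastSketch(br, extraBits);                             /* never a 0-bit read */
    return offset;
}
```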
Delete all unused FSE functions, now that we are no longer syncing
to/from upstream.
This avoids confusion about Zstd's stack usage like in Issue #3453.
It also removes dead code, which is always a plus.
Add generic C versions of the fast decoding loops to serve architectures
that don't have an assembly implementation. Also allow selecting the C
decoding loop over the assembly decoding loop through a zstd
decompression parameter `ZSTD_d_disableHuffmanAssembly`.
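For example, assuming the parameter is exposed as an advanced parameter behind `ZSTD_STATIC_LINKING_ONLY` (like other experimental decompression parameters), forcing the C loops looks like this:
```
#define ZSTD_STATIC_LINKING_ONLY   /* ZSTD_d_disableHuffmanAssembly is an advanced parameter */
#include <zstd.h>
#include <stdio.h>

int main(void)
{
    ZSTD_DCtx* const dctx = ZSTD_createDCtx();
    if (dctx == NULL) return 1;
    /* Select the C Huffman decoding loops even where assembly is available. */
    {   size_t const r = ZSTD_DCtx_setParameter(dctx, ZSTD_d_disableHuffmanAssembly, 1);
        if (ZSTD_isError(r)) {
            fprintf(stderr, "setParameter failed: %s\n", ZSTD_getErrorName(r));
            ZSTD_freeDCtx(dctx);
            return 1;
        }
    }
    /* ... decompress as usual, e.g. with ZSTD_decompressDCtx() ... */
    ZSTD_freeDCtx(dctx);
    return 0;
}
```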
I benchmarked on my Intel i9-9900K and my MacBook Air with an M1 processor.
The benchmark command forces zstd to compress without any matches, using
only literals compression, and measures only Huffman decompression speed:
```
zstd -b1e1 --compress-literals --zstd=tlen=131072 silesia.tar
```
The new fast decoding loops uniformly outperform the previous implementation,
but don't beat the x86-64 assembly. Additionally, the fast C decoding loops suffer
from the same stability problems that we've seen in the past, which the assembly
version avoids. So even though clang gets close to the assembly on x86-64, it still
has stability issues.
| Arch | Function | Compiler | Default (MB/s) | Assembly (MB/s) | Fast (MB/s) |
|---------|----------------|--------------|----------------|-----------------|-------------|
| x86-64 | decompress 4X1 | gcc-12.2.0 | 1029.6 | 1308.1 | 1208.1 |
| x86-64 | decompress 4X1 | clang-14.0.6 | 1019.3 | 1305.6 | 1276.3 |
| x86-64 | decompress 4X2 | gcc-12.2.0 | 1348.5 | 1657.0 | 1374.1 |
| x86-64 | decompress 4X2 | clang-14.0.6 | 1027.6 | 1659.9 | 1468.1 |
| aarch64 | decompress 4X1 | clang-12.0.5 | 1081.0 | N/A | 1234.9 |
| aarch64 | decompress 4X2 | clang-12.0.5 | 1270.0 | N/A | 1516.6 |
* Add a function and macro, ZSTD_decompressionMargin(), that compute the
decompression margin for in-place decompression. The function computes
a tight margin that works in all cases, while the macro computes an upper
bound that only works if flush isn't used (see the usage sketch after this list).
* When doing in-place decompression, make sure that our output buffer
doesn't overlap with the input buffer. This ensures that we don't
decide to use the portion of the output buffer that overlaps the input
buffer for temporary memory, like for literals.
* Add a simple unit test.
* Add in-place decompression to the simple_round_trip and
stream_round_trip fuzzers. This should help verify that our margin stays
correct.
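For illustration, a usage sketch under these assumptions: the compressed frame is moved to the end of a caller-provided buffer of at least `decompressedSize + margin` bytes, then decompressed into the front of that same buffer (the helper name and error convention are made up):
```
#define ZSTD_STATIC_LINKING_ONLY   /* ZSTD_decompressionMargin() is an advanced API */
#include <zstd.h>
#include <string.h>

/* Hypothetical helper: in-place decompression using the margin. */
static size_t decompressInPlaceSketch(void* buffer, size_t bufferCapacity,
                                      const void* compressed, size_t compressedSize,
                                      size_t decompressedSize)
{
    size_t const margin = ZSTD_decompressionMargin(compressed, compressedSize);
    if (ZSTD_isError(margin)) return margin;
    if (bufferCapacity < decompressedSize + margin) return (size_t)-1;   /* buffer too small */

    /* Place the compressed data at the very end of the buffer... */
    {   char* const srcStart = (char*)buffer + bufferCapacity - compressedSize;
        memmove(srcStart, compressed, compressedSize);

        /* ...and decompress into the front; the margin guarantees the overlap is safe. */
        return ZSTD_decompress(buffer, decompressedSize, srcStart, compressedSize);
    }
}
```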
A minor change in 5434de0 turned a `<=` into a `<`,
and as an indirect consequence allowed a compression attempt on literals when there are only 6 literals to compress
(the previous limit was effectively 7 literals).
This is not in itself a problem, as the threshold is merely a heuristic,
but it exposed a bug that had always been there and was simply never triggered due to the previous limit.
This bug would make the literal compressor believe that all literals are the same symbol,
but in the exact case where nbLiterals==6, combined with a pretty wild mix of other limit conditions,
that assumption could be false, resulting in data corruption.
Replaced the blind heuristic with an actual test covering all limit cases,
so that even if the threshold is changed again in the future,
RLE mode detection will remain reliable.
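For illustration, an explicit check of this kind (names are illustrative, not the actual literals-compression code):
```
#include <stddef.h>

/* Illustrative check: literals qualify for RLE mode only if every byte is
 * identical. An explicit scan like this stays correct no matter where the
 * "minimum literals to attempt compression" threshold is set. */
static int literalsAreRLE(const unsigned char* literals, size_t nbLiterals)
{
    size_t i;
    if (nbLiterals == 0) return 0;
    for (i = 1; i < nbLiterals; i++) {
        if (literals[i] != literals[0]) return 0;
    }
    return 1;
}
```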
Reported by @shulib:
the specification for the 4-streams mode
doesn't work when the amount of literals to compress is 5 bytes.
Extending it, it also doesn't work for sizes 1 or 2.
This patch updates the specification and the implementation
to require a minimum of 6 literals to trigger or accept the 4-streams mode.
The impact is expected to be a no-op:
the 4-streams mode is never triggered for such a small quantity of literals anyway,
since it would be wasteful (it costs ~7.3 bytes more than the single-stream mode).
An informal lower limit sits at ~256 bytes,
so the technical minimum is very far from that limit.
This is just meant for completeness of the specification.
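For illustration, the stream-split rule from the specification shows why 1, 2, and 5 literals cannot be encoded with 4 streams, and why 6 is the smallest workable count:
```
#include <stdio.h>

/* The 4-streams layout splits nbLiterals as: the first 3 streams each hold
 * (nbLiterals + 3) / 4 bytes, and the 4th stream holds the remainder. For
 * nbLiterals in {1, 2, 5} the remainder is negative, so the layout cannot
 * be encoded; 6 is the smallest count that always works. */
int main(void)
{
    size_t n;
    for (n = 1; n <= 8; n++) {
        size_t const streamSize = (n + 3) / 4;
        long const stream4 = (long)n - 3 * (long)streamSize;
        printf("nbLiterals=%zu  streams 1-3: %zu bytes each  stream 4: %ld bytes%s\n",
               n, streamSize, stream4, stream4 < 0 ? "  (impossible)" : "");
    }
    return 0;
}
```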
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora \) -prune -o -type f);
do
sed -i 's/Facebook, Inc\./Meta Platforms, Inc. and affiliates./' $f;
done
```
When spawning a Windows thread, we use a small worker wrapper function that translates
between the interfaces of Windows and POSIX threads.
This wrapper is given a pointer that might go stale before the worker starts running,
resulting in UB and crashes.
This commit adds synchronization so that we know the wrapper has finished reading the data
it needs before we allow the main thread to resume execution (see the sketch after this entry).
1. If the thread pool is resized, the threads' `ZSTD_pthread_t` structures might move
while a worker still holds a pointer into them (see more details in #3120).
2. The join operation waited for a thread and then returned its `thread.arg`
as a return value, but since the `ZSTD_pthread_t thread` was passed by value it
would hold a stale `arg` that wouldn't match the thread's actual return value.
This fix changes the `ZSTD_pthread_join` API and removes support for returning
a value. This means that we are diverging from the `pthread_join` API and this
is no longer just an alias.
In the future, if needed, we could return a Windows thread's return value using
`GetExitCodeThread`, but since this path wouldn't be exercised in any case, it's
preferable not to add it right now.
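For illustration, a sketch of the synchronization pattern described above, using the Win32 event API (names and the surrounding structure are hypothetical, not the actual threading.c code):
```
#include <windows.h>

/* Hypothetical sketch: the creator passes a pointer to stack-local start data;
 * the wrapper copies what it needs and signals an event before the creator is
 * allowed to return, so the pointer can never go stale while the wrapper still
 * reads from it. */
typedef struct {
    void* (*start_routine)(void*);
    void* arg;
    HANDLE initialized;            /* signaled once the wrapper has copied its data */
} ThreadStartSketch;

static DWORD WINAPI workerWrapperSketch(LPVOID lpParam)
{
    ThreadStartSketch* const start = (ThreadStartSketch*)lpParam;
    void* (*const routine)(void*) = start->start_routine;   /* copy before signaling */
    void* const arg = start->arg;
    SetEvent(start->initialized);                            /* creator may now reuse `start` */
    routine(arg);
    return 0;
}

static int createThreadSketch(HANDLE* thread, void* (*routine)(void*), void* arg)
{
    ThreadStartSketch start;       /* stack-local: only safe because of the event below */
    start.start_routine = routine;
    start.arg = arg;
    start.initialized = CreateEvent(NULL, FALSE, FALSE, NULL);
    if (start.initialized == NULL) return -1;
    *thread = CreateThread(NULL, 0, workerWrapperSketch, &start, 0, NULL);
    if (*thread == NULL) { CloseHandle(start.initialized); return -1; }
    WaitForSingleObject(start.initialized, INFINITE);        /* don't return until the copy is done */
    CloseHandle(start.initialized);
    return 0;
}
```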
Instead of using the packed-attribute hack, just use the aligned attribute. This
improves code generation on armv6 and armv7, and slightly improves code
generation on aarch64. GCC generates code identical to a regular aligned
access on ARMv6 for all versions between 4.5 and trunk, except GCC 5,
which is buggy and generates the same (bad) code as the packed access:
https://gcc.godbolt.org/z/hq37rz7sb
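For illustration, the two approaches side by side, in GCC/Clang attribute syntax (zstd wraps this in its `MEM_*` portability layer; the type and function names here are made up):
```
#include <stdint.h>

/* Old approach: the packed-struct hack. */
typedef struct { uint32_t v; } __attribute__((packed)) unalign32_packed;
static uint32_t read32_packed(const void* ptr)
{
    return ((const unalign32_packed*)ptr)->v;
}

/* New approach: a typedef whose alignment requirement is lowered to 1,
 * which GCC/Clang allow on typedefs. */
typedef __attribute__((aligned(1))) uint32_t unalign32;
static uint32_t read32_aligned1(const void* ptr)
{
    return *(const unalign32*)ptr;
}
```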
Use a switch statement to select the search function instead of an
indirect function call. This results in a sizable performance win.
This PR is a modification of the approach taken in PR #2828.
When I measured performance for that commit, it was neutral.
However, I now see a performance regression on gcc, while it remains
neutral on clang. I'm measuring on the same platform, but with
newer compilers. The new approach beats both the current dev
branch and the baseline from before PR #2828 was merged.
This PR is necessary for Issue #3275, to update zstd in the kernel.
Without this PR there is a large regression in greedy through btlazy2
compression speed. With this PR it is about neutral.
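For illustration, the shape of the dispatch change (the enum and function names below are made up, not the actual lazy-match code):
```
#include <stddef.h>

/* Illustrative stand-ins for the search strategies. */
typedef enum { search_hashChain = 0, search_binaryTree = 1, search_rowHash = 2 } SearchMethodSketch;

static size_t searchHashChainSketch(const void* ip)  { (void)ip; return 0; }
static size_t searchBinaryTreeSketch(const void* ip) { (void)ip; return 0; }
static size_t searchRowHashSketch(const void* ip)    { (void)ip; return 0; }

/* Before: an indirect call through a function pointer, opaque to the
 * optimizer at the call site. */
typedef size_t (*searchFnSketch)(const void* ip);

/* After: a switch dispatch, where each case is a direct, inlinable call. */
static size_t searchDispatchSketch(SearchMethodSketch method, const void* ip)
{
    switch (method) {
        case search_hashChain:  return searchHashChainSketch(ip);
        case search_binaryTree: return searchBinaryTreeSketch(ip);
        case search_rowHash:    return searchRowHashSketch(ip);
        default:                return 0;
    }
}
```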
gcc version: 12.2.0
clang version: 14.0.6
dataset: silesia.tar
| Compiler | Level | Dev Speed (MB/s) | PR Speed (MB/s) | Delta |
|----------|-------|------------------|-----------------|--------|
| gcc | 5 | 102.6 | 113.7 | +10.8% |
| gcc | 7 | 66.6 | 74.8 | +12.3% |
| gcc | 9 | 51.5 | 58.9 | +14.3% |
| gcc | 13 | 14.3 | 14.3 | +0.0% |
| clang | 5 | 108.1 | 114.8 | +6.2% |
| clang | 7 | 68.5 | 72.3 | +5.5% |
| clang | 9 | 53.2 | 56.2 | +5.6% |
| clang | 13 | 14.3 | 14.7 | +2.8% |
The binary size stays just about the same for clang and gcc, measured
using the `size` command:
| Compiler | Branch | Text | Data | BSS | Total |
|----------|--------|---------|------|-----|---------|
| gcc | dev | 1127950 | 3312 | 280 | 1131542 |
| gcc | PR | 1123422 | 2512 | 280 | 1126214 |
| clang | dev | 1046254 | 3256 | 216 | 1049726 |
| clang | PR | 1048198 | 2296 | 216 | 1050710 |