krak/zstd

mirror of https://github.com/facebook/zstd.git synced 2025-07-04 14:48:45 +02:00

Author	SHA1	Message	Date
Nick Terrell	329169189c	Replace Huffman boolean args with flags bit set	2023-01-20 14:12:53 -08:00
Nick Terrell	0cc1b0cb22	Delete unused Huffman functions Remove all Huffman functions that aren't used by zstd.	2023-01-20 14:12:53 -08:00
Yann Collet	796699c0bc	fix root cause of #3416 A minor change in `5434de0` changed a `<=` into a `<`, and as an indirect consequence allowed compression attempts on literals when there are only 6 literals to compress (previous limit was effectively 7 literals). This is not in itself a problem, as the threshold is merely a heuristic, but it exposed a bug that has always been there, and was just never triggered so far due to the previous limit. This bug would make the literal compressor believe that all literals are the same symbol, but for the exact case where nbLiterals==6, plus a pretty wild combination of other limit conditions, this outcome could be false, resulting in data corruption. Replaced the blind heuristic by an actual test for all limit cases, so that even if the threshold is changed again in the future, the detection of RLE mode will remain reliable.	2023-01-12 15:41:08 -08:00
Yann Collet	ebba9ff425	update regression results	2023-01-03 14:04:23 -08:00
Yann Collet	5434de01e2	improve compression ratio of small alphabets fix #3328 In situations where the alphabet size is very small, the evaluation of literal costs from the Optimal Parser is initially incorrect. It takes some time to converge, during which compression is less efficient. This is especially important for small files, because there will not be enough data to converge, so most of the parsing is selected based on incorrect metrics. After this patch, the scenario in #3328 gets fixed, delivering the expected 29 bytes compressed size (smallest known compressed size).	2023-01-03 12:22:37 -08:00
daniellerozenblit	1c818e3a0a	Merge pull request #3302 from daniellerozenblit/optimal-huff-depth-speed Optimal huff depth speed improvements	2023-01-03 12:51:51 -05:00
Danielle Rozenblit	df714ddb0f	implement suggestions	2023-01-03 07:20:21 -08:00
Danielle Rozenblit	c26f348dc8	fix CI errors	2022-12-20 12:43:46 -08:00
Danielle Rozenblit	482689b995	huf log speed optimization: unidirectional scan of logs + break when regressing	2022-12-20 12:27:38 -08:00
W. Felix Handte	5d693cc38c	Coalesce Almost All Copyright Notices to Standard Phrasing ``` for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache \) -prune -o -type f); do sed -i '/Copyright .* \(Yann Collet\)\|\(Meta Platforms\)/ s/Copyright ./Copyright (c) Meta Platforms, Inc. and affiliates./' $f; done git checkout HEAD -- build/VS2010/libzstd-dll/libzstd-dll.rc build/VS2010/zstd/zstd.rc tests/test-license.py contrib/linux-kernel/test/include/linux/xxhash.h examples/streaming_compression_thread_pool.c lib/legacy/zstd_v0.c lib/legacy/zstd_v0*.h nano ./programs/windres/zstd.rc nano ./build/VS2010/zstd/zstd.rc nano ./build/VS2010/libzstd-dll/libzstd-dll.rc ```	2022-12-20 12:52:34 -05:00
W. Felix Handte	8927f985ff	Update Copyright Headers 'Facebook' -> 'Meta Platforms' ``` for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora \) -prune -o -type f); do sed -i 's/Facebook, Inc\./Meta Platforms, Inc. and affiliates./' $f; done ```	2022-12-20 12:37:57 -05:00
Nick Terrell	ee6475cbbd	Add missing parens around macro definition Fixes #3301.	2022-12-15 17:18:23 -08:00
Danielle Rozenblit	db74d043d6	Speed optimizations with macro	2022-10-27 10:20:44 -07:00
Danielle Rozenblit	401331909e	Commit for benchmarking	2022-10-24 12:35:16 -07:00
Danielle Rozenblit	b4f0d364af	Merge	2022-10-17 11:24:24 -07:00
Danielle Rozenblit	a08fabd51a	Rough draft speed optimization	2022-10-17 10:24:29 -07:00
Danielle Rozenblit	a910489ff5	No longer pass srcSize to minTableLog	2022-10-17 08:03:44 -07:00
Danielle Rozenblit	b34729018c	Minor simplification: no longer need to check src size if using cardinality for minTableLog	2022-10-17 07:55:07 -07:00
Danielle Rozenblit	75cd42afd7	Update regression results and better variable naming for HUF_cardinality	2022-10-14 13:37:19 -07:00
Danielle Rozenblit	e60cae33cf	Additional ratio optimizations	2022-10-14 10:37:35 -07:00
Danielle Rozenblit	8888a2ddcc	CI failure fixes	2022-10-11 13:12:19 -07:00
Elliot Gorokhovsky	db2f4a6532	Move bitwise builtins into bits.h	2022-02-14 11:16:03 -05:00
binhdvo	b9566fc558	Add rails for huffman table log calculation (#3047)	2022-02-02 15:12:48 -05:00
Yann Collet	8b46895588	removed new huffman depth heuristic results are now identical to before this PR	2022-01-26 15:22:06 -08:00
Yann Collet	e9dd923fa4	only declare debug functions in debug mode	2022-01-26 14:47:24 -08:00
Yann Collet	5db717af10	proper max limit to 11	2022-01-26 14:47:24 -08:00
Yann Collet	4684836f4f	update regression tests minor compression ratio benefits in some cases, no compression ratio regression in the measured scenarios.	2022-01-26 14:47:24 -08:00
Yann Collet	51da2d2ff2	improved compression of literals in specific corner cases In rare cases, the default huffman depth selector is a bit too harsh, requiring brutal adaptations to the tree, resulting in some loss of compression ratio. This new heuristic avoids the worst cases, favoring compression ratio. As an example, compression of a specific distribution of 771 literals is now improved to 441 bytes, from 601 bytes before.	2022-01-26 14:47:24 -08:00
Yann Collet	7616e39f3b	adding traces to better track processing of literals	2022-01-26 14:47:21 -08:00
Yann Collet	30b9db8ae4	changed macro name to ZSTD_ALIGNOF for better consistency	2021-12-02 12:57:42 -08:00
Yann Collet	39dced092e	fix align conditions for huf_compress	2021-12-01 23:02:00 -08:00
Nick Terrell	5414dd7978	[bmi2] Add lzcnt and bmi target attributes * When dynamic dispatching to bmi2 add lzcnt and bmi to the TARGET_ATTRIBUTE. * Centralize the bmi2 TARGET_ATTRIBUTE definition to BMI2_TARGET_ATTRIBUTE so we can change it in the future. * Only enable bmi2 when both bmi1 & bmi2 are supported. There shouldn't be any cases where bmi2 is supported but bmi1 isn't. But, since we are using the instruction we should check bmi1 as well.	2021-11-30 17:54:56 -08:00
Dimitris Apostolou	ebbd675998	Fix typos	2021-11-13 10:04:04 +02:00
senhuang42	384744888e	Void out unused functions	2021-11-04 14:32:07 +03:00
Nick Terrell	189e87bcbe	[lib] Make lib compatible with `-Wfall-through` excepting legacy Switch to a macro `ZSTD_FALLTHROUGH;` instead of a comment. On supported compilers this uses an attribute, otherwise it becomes a comment. This is necessary to be compatible with clang's `-Wfall-through`, and gcc's `-Wfall-through=2` which don't support comments. Without this the linux build emits a bunch of warnings. Also add a test to CI to ensure that we don't regress.	2021-09-23 10:51:18 -07:00
Nick Terrell	a5f2c45528	Huffman ASM	2021-09-20 14:46:43 -07:00
Nick Terrell	8bf699aa59	[build] Add support for ASM files in Make + CMake * Extract out common portion of `lib/Makefile` into `lib/libzstd.mk`. Most relevantly, the way we find library files. * Use `lib/libzstd.mk` in the other Makefiles instead of repeating the same code. * Add a test `tests/test-variants.sh` that checks that the builds of `make -C programs allVariants` are correct, and run it in Actions. * Adds support for ASM files in the CMake build. The Meson build is not updated because it lists every file in zstd, and supports ASM off the bat, so the Huffman ASM commit will just add the ASM file to the list. The Visual Studio build is not updated because I'm not adding ASM support to Visual Studio yet.	2021-09-17 14:13:53 -07:00
Sen Huang	1daf3c8dbc	Use 32 buckets for log2 bucketing in huffman sort	2021-09-13 12:29:16 -04:00
senhuang42	aa1957477b	Improve Huffman sorting algorithm	2021-08-04 12:43:34 -04:00
Nick Terrell	d8a0797268	[fuzz] Add Huffman round trip fuzzer * Add a Huffman round trip fuzzer * Fix two minor bugs in Huffman that aren't exposed in zstd - Incorrect weight comparison (weights are allowed to be equal to table log). - HUF_compress1X_usingCTable_internal() can return compressed size >= source size, so the assert that `cSize <= 65535` isn't correct, and it needs to be checked instead.	2021-08-03 08:10:06 -07:00
Nick Terrell	46f2710562	[HUF] Improve Huffman encoding speed Improve Huffman encoding speed by 20% for gcc and 10% for clang. \| Compiler \| Benchmark \| Config \| Dataset \| Ratio \| Speed MB/s (dev) \| Speed MB/s (huf-cspeed) \| Speed MB/s (huf-cspeed - dev) \| \|----------\|-------------------\|---------\|-------------\|-------\|------------------\|-------------------------\|-------------------------------\| \| gcc \| compress \| level_1 \| enwik7 \| 2.43 \| 253.70 \| 258.72 \| 2.0% \| \| gcc \| compress \| level_1 \| silesia \| 2.88 \| 341.90 \| 348.15 \| 1.8% \| \| gcc \| compress_literals \| level_1 \| enwik7 \| 1.49 \| 761.83 \| 912.76 \| 19.8% \| \| gcc \| compress_literals \| level_1 \| silesia \| 1.28 \| 754.83 \| 902.37 \| 19.5% \| \| gcc \| compress_literals \| level_7 \| enwik7 \| 1.29 \| 502.81 \| 552.79 \| 9.9% \| \| gcc \| compress_literals \| level_7 \| silesia \| 1.11 \| 675.97 \| 776.44 \| 14.9% \| \| clang \| compress \| level_1 \| enwik7 \| 2.43 \| 277.54 \| 280.98 \| 1.2% \| \| clang \| compress \| level_1 \| silesia \| 2.88 \| 369.98 \| 375.46 \| 1.5% \| \| clang \| compress_literals \| level_1 \| enwik7 \| 1.49 \| 828.83 \| 918.41 \| 10.8% \| \| clang \| compress_literals \| level_1 \| silesia \| 1.28 \| 815.81 \| 905.41 \| 11.0% \| \| clang \| compress_literals \| level_7 \| enwik7 \| 1.29 \| 533.13 \| 553.30 \| 3.8% \| \| clang \| compress_literals \| level_7 \| silesia \| 1.11 \| 714.52 \| 775.38 \| 8.5% \|	2021-07-27 15:10:35 -07:00
Binh Vo	dc5b693f1e	Proactively skip huffman compression based on sampling where non-compressibility is suspected	2021-06-30 11:02:47 -04:00
Nick Terrell	05b6773fbc	[fix] Add missing bounds checks during compression * The block splitter missed a bounds check, so when the buffer is too small it passes an erroneously large size to `ZSTD_entropyCompressSeqStore()`, which can then write the compressed data past the end of the buffer. This is a new regression in v1.5.0 when the block splitter is enabled. It is either enabled explicitly, or implicitly when using the optimal parser and `ZSTD_compress2()` or `ZSTD_compressStream()`. * `HUF_writeCTable_wksp()` omits a bounds check when calling `HUF_compressWeights()`. If it is called with `dstCapacity == 0` it will pass an erroneously large size to `HUF_compressWeights()`, which can then write past the end of the buffer. This bug has been present for ages. However, I believe that zstd cannot trigger the bug, because it never calls `HUF_compress*()` with `dstCapacity == 0` because of [this check][1]. Credit to: Oss-Fuzz [1]: `89127e5ee2/lib/compress/zstd_compress_literals.c (L100)`	2021-06-14 11:35:33 -07:00
Nick Terrell	a494308ae9	[copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files * Switch to yearless copyright per FB policy * Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources * Add zstd copyright/license header to the `contrib/linux-kernel` sources * Update the `tests/test-license.py` to check for yearless copyright * Improvements to `tests/test-license.py` * Check `contrib/linux-kernel` in `tests/test-license.py`	2021-03-30 10:30:43 -07:00
Nick Terrell	5df2a21f1e	Add HUF_writeCTable_wksp() function This saves ~700 bytes of stack space in HUF_writeCTable.	2021-03-05 10:29:18 -08:00
Nick Terrell	66e811d782	[license] Update year to 2021	2021-01-04 17:53:52 -05:00
Yann Collet	a7cb4af573	added emphasis on the alignment condition of workspace and made it a programming mistake (`assert()`) rather than a runtime error.	2020-12-18 15:04:09 -08:00
Nick Terrell	ae85676d44	Fix alignment of scratchBuffer in HUF_compressWeights() The scratch buffer must be 4-byte aligned. This causes test failures in 32-bit systems, where the stack isn't aligned. Fixes Issue #2428.	2020-12-17 14:30:27 -08:00
Yann Collet	b8c3a473ec	Merge pull request #2420 from terrelln/huf-comment [huf_compress] Refactor and comment HUF_buildCTable()	2020-12-14 16:14:07 -08:00
Nick Terrell	1bbcf07bd5	[huf_compress] Refactor and comment HUF_buildCTable() Comment and refactor `HUF_buildCTable()` and the helper functions it calls as I read and understand the code. Hopefully this refactor makes the code a bit more clear.	2020-12-08 13:57:01 -08:00

118 Commits