krak/zstd - zstd - Gitea: Git with a cup of tea

krak/zstd

mirror of https://github.com/facebook/zstd.git synced 2025-03-07 09:26:03 +02:00

Author	SHA1	Message	Date
Elliot Gorokhovsky	ff42ed1582	Rename "External Matchfinder" to "Block-Level Sequence Producer" (#3484 ) * change "external matchfinder" to "external sequence producer" * migrate contrib/ to new naming convention * fix contrib build * fix error message * update debug strings * fix def of invalid sequences in zstd.h * nit * update CHANGELOG * fix .gitignore	2023-02-09 17:01:17 -05:00
Nick Terrell	83f8a05f87	Fix empty-block.zst golden decompression file This frame is invalid because the `Window_Size = 0`, and the `Block_Maximum_Size = min(128 KB, Window_Size) = 0`. But the empty compressed block has a `Block_Content` size of 2, which is invalid. The fix is to switch to using a `Window_Descriptor` instead of the `Single_Segment_Flag`. This sets the `Window_Size = 1024`. Hexdump before this PR: `28b5 2ffd 2000 1500 0000 00` Hexdump after this PR: `28b5 2ffd 0000 1500 0000 00` For issue #3482.	2023-02-08 14:11:22 -08:00
Yann Collet	9cabd155fd	return error code when benchmark fails such scenario can happen, for example, when trying a decompression-only benchmark on invalid data. Other possibilities include an allocation error in an intermediate step. So far, the benchmark would return immediately, but still return 0. On command line, this would be confusing, as the program appears successful (though it does not display any successful message). Now it returns !0, which can be interpreted as an error by command line.	2023-02-07 00:35:51 -08:00
Elliot Gorokhovsky	31e41b3d5e	Merge pull request #3471 from embg/fast_seq_parse Reduce external matchfinder API overhead by 25%	2023-02-01 21:30:36 -05:00
Elliot Gorokhovsky	7f8189ca57	add ZSTD_c_fastExternalSequenceParsing cctxParam	2023-02-01 09:09:53 -08:00
Yann Collet	ac0746ac19	Merge pull request #3470 from facebook/bench_zstd_only ensure that benchmark mode can only be invoked with zstd format	2023-01-31 16:22:20 -08:00
Elliot Gorokhovsky	64052ef57d	Guard against invalid sequences from external matchfinders (#3465 )	2023-01-31 13:55:48 -05:00
Yann Collet	af09777b24	ensure that benchmark mode can only be invoked with zstd format fix #3463	2023-01-31 09:04:29 -08:00
daniellerozenblit	00176638e3	Merge pull request #3460 from daniellerozenblit/fix-long-offsets-resolution-pointer fix long offset resolution	2023-01-30 14:02:51 -05:00
Danielle Rozenblit	66fae56c86	remove big test around large offset with small window size	2023-01-30 06:26:03 -08:00
Danielle Rozenblit	da589a134a	update CI	2023-01-27 14:18:29 -08:00
Danielle Rozenblit	9e4c66b9e9	record long offsets in ZSTD_symbolEncodingTypeStats_t + add test case	2023-01-27 12:04:29 -08:00
Danielle Rozenblit	d210628b0b	initialize long offsets in decodecorpus	2023-01-27 09:52:00 -08:00
Yann Collet	82ca00811a	change logic when stderr is not console : don't update progress status but keep warnings and final operation statement. updated tests/cli-tests/ accordingly	2023-01-26 13:00:52 -08:00
Yann Collet	3c215220e3	modify cli-test logic : ignore stderr message by default Previously, cli-test would, by default, check that a stderr output is strictly identical to a saved outcome. When there was no instructions on how to interpret stderr, it would default to requiring it to be empty. There are many tests cases though where stderr content doesn't matter, and we are mainly interested in the return code of the cli. For these cases, it was possible to set a .ignore document, which would instruct to ignore stderr content. This PR update the logic, to make .ignore the default. When willing to check that stderr content is empty, one must now add an empty .strict file. This will allow status message to evolve without triggering many cli-tests errors. This is especially important when some of these status include compression results, which may change as a result of compression optimizations. It also makes it easier to add new tests which only care about the CLI's return code.	2023-01-26 10:57:41 -08:00
Yann Collet	8c85b29e32	disable --rm on -o command make it more similar to -c (aka `stdout`) convention.	2023-01-25 16:09:25 -08:00
Nick Terrell	321490cd5b	[version-test] Work around bugs in v0.7.3 dict builder Before calling a dictionary good, make sure that it can compress an input. If v0.7.3 rejects v0.7.3's dictionary, fall back to the v1.0 dictionary. This is not the job of the verison test to test it, because we cannot fix this code.	2023-01-25 13:47:51 -08:00
Nick Terrell	8957fef554	[huf] Add generic C versions of the fast decoding loops Add generic C versions of the fast decoding loops to serve architectures that don't have an assembly implementation. Also allow selecting the C decoding loop over the assembly decoding loop through a zstd decompression parameter `ZSTD_d_disableHuffmanAssembly`. I benchmarked on my Intel i9-9900K and my Macbook Air with an M1 processor. The benchmark command forces zstd to compress without any matches, using only literals compression, and measures only Huffman decompression speed: ``` zstd -b1e1 --compress-literals --zstd=tlen=131072 silesia.tar ``` The new fast decoding loops outperform the previous implementation uniformly, but don't beat the x86-64 assembly. Additionally, the fast C decoding loops suffer from the same stability problems that we've seen in the past, where the assembly version doesn't. So even though clang gets close to assembly on x86-64, it still has stability issues. \| Arch \| Function \| Compiler \| Default (MB/s) \| Assembly (MB/s) \| Fast (MB/s) \| \|---------\|----------------\|--------------\|----------------\|-----------------\|-------------\| \| x86-64 \| decompress 4X1 \| gcc-12.2.0 \| 1029.6 \| 1308.1 \| 1208.1 \| \| x86-64 \| decompress 4X1 \| clang-14.0.6 \| 1019.3 \| 1305.6 \| 1276.3 \| \| x86-64 \| decompress 4X2 \| gcc-12.2.0 \| 1348.5 \| 1657.0 \| 1374.1 \| \| x86-64 \| decompress 4X2 \| clang-14.0.6 \| 1027.6 \| 1659.9 \| 1468.1 \| \| aarch64 \| decompress 4X1 \| clang-12.0.5 \| 1081.0 \| N/A \| 1234.9 \| \| aarch64 \| decompress 4X2 \| clang-12.0.5 \| 1270.0 \| N/A \| 1516.6 \|	2023-01-25 13:47:51 -08:00
Danielle Rozenblit	7d600c628a	fix bound check for ZSTD_copySequencesToSeqStoreNoBlockDelim()	2023-01-24 06:40:40 -08:00
Danielle Rozenblit	0a91b31b17	Merge branch 'dev' into fuzz-sequence-compression for testing	2023-01-23 11:11:33 -08:00
daniellerozenblit	9116000be6	Merge pull request #3439 from daniellerozenblit/sequence-validation-bug-fix Fix sequence validation and seqStore bounds check	2023-01-23 13:50:37 -05:00
Danielle Rozenblit	7fc00c18b8	calloc dictionary in sequence compression fuzzer rather than generating a random buffer	2023-01-23 10:42:09 -08:00
Danielle Rozenblit	815d1d4eda	update external sequence error to fit error naming scheme	2023-01-23 09:58:34 -08:00
Danielle Rozenblit	f75afb613f	merge dev	2023-01-23 08:12:19 -08:00
Danielle Rozenblit	1b65727e74	fix nits and add new error code for invalid external sequences	2023-01-23 07:59:02 -08:00
Danielle Rozenblit	638d502002	modify sequence compression api fuzzer	2023-01-23 07:55:11 -08:00
Yann Collet	cee6bec9fa	refactor : --rm is ignored with stdout `zstd` CLI has progressively moved to the policy of ignoring `--rm` command when the output is `stdout`. The primary drive is to feature a behavior more consistent with `gzip`, when `--rm` is the default, but is also ignored when output is `stdout`. Other policies are certainly possible, but would break from this `gzip` convention. The new policy was inconsistenly enforced, depending on the exact list of commands. For example, it was possible to circumvent it by using `-c --rm` in this order, which would re-establish source removal. - Update the CLI so that it necessarily catch these situations and ensure that `--rm` is always disabled when output is `stdout`. - Added a warning message in this case (for verbosity 3 `-v`). - Added an `assert()`, which controls that `--rm` is no longer active with `stdout` - Added tests, which control the behavior, even when `--rm` is added after `-c` - Removed some legacy code which where trying to apply a specific policy for the `stdout` + `--rm` case, which is no longer possible	2023-01-20 18:04:55 -08:00
Felix Handte	3d25502c2d	Merge pull request #3432 from felixhandte/fix-perms Fix CLI Handling of Permissions and Ownership (Again)	2023-01-20 19:19:05 -05:00
Nick Terrell	b4467c1061	Fix bufferless API with attached dictionary Fixes #3102.	2023-01-20 16:15:16 -08:00
Nick Terrell	329169189c	Replace Huffman boolean args with flags bit set	2023-01-20 14:12:53 -08:00
Nick Terrell	0cc1b0cb22	Delete unused Huffman functions Remove all Huffman functions that aren't used by zstd.	2023-01-20 14:12:53 -08:00
Nick Terrell	667eb6d4fd	[versions-test] Work around bug in dictionary builder for older versions Older versions of zstandard have a bug in the dictionary builder, that can cause dictionary building to fail. The process still exits 0, but the dictionary is not created. For reference, the bug is that it creates a dictionary that starts with the zstd dictionary magic, in the process of writing the dictionary header, but the header isn't fully written yet, and zstd fails compressions in this case, because the dictionary is malformated. We fixed this later on by trying to load the dictionary as a zstd dictionary, but if that fails we fallback to content only (by default). The fix is to: 1. Make the dictionary determinsitic by sorting the input files. Previously the bug would only sometimes occur, when the input files were in a particular order. 2. If dictionary creation fails, fallback to the `head` dictionary.	2023-01-20 14:05:36 -08:00
Nick Terrell	666944fbe6	Cap hashLog & chainLog to ensure that we only use 32 bits of hash * Cap shortCache chainLog to 24 * Cap row match finder hashLog so that rowLog <= 24 * Add unit tests to expose all cases. The row match finder unit tests are only run in 64-bit mode, because they allocate ~1GB. Fixes #3336	2023-01-20 14:05:26 -08:00
Danielle Rozenblit	aa385ece13	fix sequence validation and bounds check in ZSTD_copySequencesToSeqStore()	2023-01-20 10:32:35 -08:00
Elliot Gorokhovsky	f593e54ee1	Enable if == 1 rather than if == 0 Co-authored-by: Nick Terrell <nickrterrell@gmail.com>	2023-01-20 11:41:53 -05:00
Elliot Gorokhovsky	3f9f568aa6	Fuzz the external matchfinder API	2023-01-19 13:33:25 -08:00
Elliot Gorokhovsky	bce0382c82	Bugfixes for the External Matchfinder API (#3433 ) * external matchfinder bugfixes + tests * small doc fix	2023-01-19 10:41:24 -05:00
daniellerozenblit	dc1c6cc5df	Merge pull request #3418 from daniellerozenblit/fuzz-max-block-size Fuzz on maxBlockSize	2023-01-19 08:18:04 -05:00
Yann Collet	bbe65d760c	Merge pull request #3423 from facebook/ptime Refactor timefn, restore support for clock_gettime()	2023-01-18 13:27:42 -08:00
W. Felix Handte	7a8c8f3fe7	Easy: Print Mode as Octal in `chmod()` Trace	2023-01-18 11:57:54 -08:00
W. Felix Handte	0d2d460223	Mimic gzip chown(gid), chmod(), chown(uid) Behavior Avoids a race condition in which we unintentionally open up permissions to the wrong group.	2023-01-18 11:57:54 -08:00
W. Felix Handte	1e3eba65a6	Copy Permissions from Source File	2023-01-18 11:57:35 -08:00
W. Felix Handte	0382076af7	Re-Use `stat_t` in `FIO_compressFilename_srcFile()`	2023-01-18 11:33:07 -08:00
Nick Terrell	860548cd5b	[tests] Fix version test determinism The dictionary source files were taken from the `dev` branch before this commit, which could introduce non-determinism on PR jobs. Instead take the sources from the PR checkout. This PR also adds stderr logging, and verbose output for the jobs that are failing, to help catch the failure if it occurs again.	2023-01-17 14:10:46 -08:00
W. Felix Handte	a5ed28f1fb	Use Existing Src File Stat in `*_dstFile()` Funcs One fewer `stat()` call to make per operation!	2023-01-17 14:08:22 -08:00
Danielle Rozenblit	8353a4b095	fix maxBlockSize resolution + add test cases	2023-01-17 12:24:18 -08:00
Yann Collet	2086e7396e	missing #include for Windows	2023-01-13 11:38:27 -08:00
Yann Collet	bcfb7ad03c	refactor timefn The timer storage type is no longer dependent on OS. This will make it possible to re-enable posix precise timers since the timer storage type will no longer be sensible to #include order. See #3168 for details of pbs of previous interface. Suggestion by @terrelln	2023-01-12 19:24:31 -08:00
Nick Terrell	5b266196a4	Add support for in-place decompression * Add a function and macro ZSTD_decompressionMargin() that computes the decompression margin for in-place decompression. The function computes a tight margin that works in all cases, and the macro computes an upper bound that will only work if flush isn't used. * When doing in-place decompression, make sure that our output buffer doesn't overlap with the input buffer. This ensures that we don't decide to use the portion of the output buffer that overlaps the input buffer for temporary memory, like for literals. * Add a simple unit test. * Add in-place decompression to the simple_round_trip and stream_round_trip fuzzers. This should help verify that our margin stays correct.	2023-01-12 16:28:08 -08:00
Yann Collet	423500d1ae	Merge pull request #3413 from facebook/timefn minor refactoring for timefn	2023-01-12 15:34:00 -08:00

1 2 3 4 5 ...

1858 Commits