1
0
mirror of https://github.com/facebook/zstd.git synced 2025-07-03 22:30:29 +02:00
Commit Graph

94 Commits

Author SHA1 Message Date
d09f195ceb Remove blockCompressor NULL Checks 2023-05-04 12:18:58 -04:00
b7add1dd67 Abort if Unsupported Parameters Used 2023-05-04 12:18:58 -04:00
50cdf84f58 Macro-Exclude Block Compressors from Declaration/Definition 2023-05-04 12:18:58 -04:00
cbf3e26316 Allow ZSTD_selectBlockCompressor() to Return NULL
Return an error rather than segfaulting.
2023-05-04 12:18:58 -04:00
5d693cc38c Coalesce Almost All Copyright Notices to Standard Phrasing
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache \) -prune -o -type f); do sed -i '/Copyright .* \(Yann Collet\)\|\(Meta Platforms\)/ s/Copyright .*/Copyright (c) Meta Platforms, Inc. and affiliates./' $f; done

git checkout HEAD -- build/VS2010/libzstd-dll/libzstd-dll.rc build/VS2010/zstd/zstd.rc tests/test-license.py contrib/linux-kernel/test/include/linux/xxhash.h examples/streaming_compression_thread_pool.c lib/legacy/zstd_v0*.c lib/legacy/zstd_v0*.h
nano ./programs/windres/zstd.rc
nano ./build/VS2010/zstd/zstd.rc
nano ./build/VS2010/libzstd-dll/libzstd-dll.rc
```
2022-12-20 12:52:34 -05:00
8927f985ff Update Copyright Headers 'Facebook' -> 'Meta Platforms'
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora \) -prune -o -type f);
do
  sed -i 's/Facebook, Inc\./Meta Platforms, Inc. and affiliates./' $f;
done
```
2022-12-20 12:37:57 -05:00
f6ef14329f "Short cache" optimization for level 1-4 DMS (+5-30% compression speed) (#3152)
* first attempt at fast DMS short cache

* significant wins for some scenarios

* fix all clang regressions

* nits

* fix 1.5% gcc11 regression on hot 110Kdict scenario

* fix CI

* nit

* Add tags to doublefast hash table

* use tags in doublefast DMS

* Fix CI

* Clean up some hardcoded logic / constants

* Switch forCCtx to an enum

* nit

* add short cache to ip+1 long search

* Move tag size into hashLog

* Minor nits

* Truncate dictionaries greater than 16MB in short cache mode

* Helper function for tag comparison

* Cap short cache hashLog at 24 to prevent overflow

* size_t dictTagsMatch -> int dictTagsMatch

* nit

* Clean up and comment dictionary truncation

* Move ZSTD_tableFillPurpose_e next to ZSTD_dictTableLoadMethod_e

* Comment and expand helper functions

* Asserts and documentation

* nit
2022-06-21 17:27:19 -04:00
b772f53952 Typo and grammar fixes 2022-03-12 08:58:04 +01:00
7a18d709ae updated all names to offBase convention 2021-12-29 17:30:43 -08:00
435f5a2e6d fixed regression test assert
optLdm->offset might be == 0 in invalid case.
Only use STORE_OFFSET() after validating it's a correct case.
2021-12-28 09:55:31 -08:00
1aed962216 introduce macros STORE_OFFSET() and STORE_REPCODE()
this meant to abstract the sumtype representation required
to transfert `offcode` to `ZSTD_storeSeq()`.

Unfortunately, the sumtype numeric representation is currently a leaky abstraction
that has permeated many other parts of the code,
especially within `zstd_lazy.c` and also within `zstd_opt.c` and `zstd_compress.c`.

While this PR makes a good job a transfering a large nb of call sites
to using the new macros, there are still a few sites where this transformation is more complex,
or where the numeric representation itself it used "as is".

One of the problematics area is the decision to use the numeric format of the sumtype
within the match finders of `zstd_lazy`.

This commit doesn't change the behavior, it only introduces and employes the macros,
but eventually the resulting code remains identical.

At target, if the numeric representation of the sumtype can be completely abstracted
and no other part of the code depends on it,
it will be possible to move it towards something slightly more efficient.
2021-12-23 22:03:30 -08:00
b77fcac61f change ZSTD_storeSeq() interface to accept matchLength
instead of mlBase.

This removes the need to do `- MINMATCH` at every call site.

The new interface contract is checked with an `assert()`.
2021-12-23 12:03:33 -08:00
ebbd675998 Fix typos 2021-11-13 10:04:04 +02:00
06f42c3bfd Use new paramSwitch enum for LDM 2021-09-21 14:22:09 -04:00
b5c35d7ea3 Use new paramSwitch enum for LCM, row matchfinder, and block splitter 2021-09-21 14:22:02 -04:00
8389a5122b Merge pull request #2602 from terrelln/ldm-opt
[LDM] Speed optimization on repetitive data
2021-05-04 23:13:09 -07:00
32823bc150 [LDM] Speed optimization on repetitive data
LDM does especially poorly on repetitive data when that data's hash happens
to have `(hash & stopMask) == 0`. Either because the `stopMask == 0` or
random chance. Optimize this case by skipping over repetitive patterns.
The detection is very simplistic, but should catch most of the offending
cases.

```
head -c 1G /dev/zero | perf stat -- ./zstd -1 -o /dev/null -v --zstd=ldmHashRateLog=1 --long
      21.187881087 seconds time elapsed

head -c 1G /dev/zero | perf stat -- ./zstd -1 -o /dev/null -v --zstd=ldmHashRateLog=1 --long
       1.149707921 seconds time elapsed

```
2021-05-04 10:57:42 -07:00
34aff7ea06 Bug fix & run overflow correction much more frequently in tests
* Fix overflow correction when `windowLog < cycleLog`. Previously, we
  got the correction wrong in this case, and our chain tables and binary
  trees would be corrupted. Now, we work as long as `maxDist` is a power
  of two, by adding `MAX(maxDist, cycleSize)` to our indices.
* When `ZSTD_WINDOW_OVERFLOW_CORRECT_FREQUENTLY` is defined to non-zero
  run overflow correction as frequently as allowed without impacting
  compression ratio.
* Enable `ZSTD_WINDOW_OVERFLOW_CORRECT_FREQUENTLY` in `fuzzer` and
  `zstreamtest` as well as all the OSS-Fuzz fuzzers. This has a 5-10%
  speed penalty at most, which seems reasonable.
2021-05-03 15:21:47 -07:00
4694423c4f Add and integrate lazy row hash strategy 2021-04-07 09:53:34 -07:00
a494308ae9 [copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files
* Switch to yearless copyright per FB policy
* Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources
* Add zstd copyright/license header to the `contrib/linux-kernel` sources
* Update the `tests/test-license.py` to check for yearless copyright
* Improvements to `tests/test-license.py`
* Check `contrib/linux-kernel` in `tests/test-license.py`
2021-03-30 10:30:43 -07:00
552efcac2d relocate large arrays from the stack to ldmState_t 2021-02-10 16:16:54 +01:00
e2ad174d73 fix some compiler warnings 2021-02-08 20:19:16 +01:00
874a590e5c deal safely with short inputs in ZSTD_ldm_generateSequences
The fuzzer CI found this bug.
2021-02-04 11:15:24 +01:00
9f327c02fd new core ldm algorithm 2021-02-03 22:24:07 +01:00
aee3dc877f fix a variable name to reflect its nature 2021-01-22 02:24:19 -08:00
d6e3de77dc fix warning and remove one more occurrence of makeEntryAndInsertByTag 2021-01-20 01:39:16 -08:00
e0d5eca8fa fix forgotten numTagBits in getTagMask 2021-01-20 00:54:20 -08:00
1e65711ca5 a couple performance improvement changes for ldm 2021-01-20 00:54:20 -08:00
92a2b5ccc9 fixup: lits means literals 2021-01-07 23:30:42 +01:00
f9802d80a0 fix typos (work done by Andrea Gelmini) 2021-01-07 18:47:23 +01:00
66e811d782 [license] Update year to 2021 2021-01-04 17:53:52 -05:00
0953645837 Merge pull request #2362 from senhuang42/fix_ldm_fuzz_issue
Fix long distance matcher OSS-fuzz issue
2020-10-27 11:13:03 -07:00
4d01979b62 Expose and call ZSTD_ldm_skipRawSeqStoreBytes() 2020-10-16 20:30:00 -04:00
d0550bb18f Clarify argument names, fix DEBUGLOG() statements 2020-10-14 15:45:43 -04:00
3f99c9b38d Adjust match backwards count args 2020-10-14 15:23:03 -04:00
bf0d559449 Introduce, implement, and call ZSTD_ldm_countBackwardsMatch_2segments() 2020-10-14 12:58:06 -04:00
a6165c1b28 Change matchState_t::ldmSeqStore to pointer 2020-10-07 14:13:57 -04:00
abce708a56 Move posInSequence correction to correct location 2020-10-07 13:56:25 -04:00
0fac8e07e1 Refactor usage of ms->ldmSeqStore so that it is not modified during compressBlock(), and simplify skipRawSeqStoreBytes 2020-10-07 13:56:25 -04:00
a5500cf2af Refactor separate ldm variables all into one struct 2020-10-07 13:56:25 -04:00
031b7ec15f Disable LDM minMatch adjustment when using opt parser 2020-10-07 13:56:25 -04:00
b8bfc4e63d Add cSize regression test to fuzzer.c 2020-10-07 13:56:25 -04:00
10647924f1 Make function descriptions more accurate 2020-10-07 13:56:25 -04:00
7dee62c287 Reset ldmSeqStore after initStats_ultra() pass for btultra2 2020-10-07 13:56:25 -04:00
ea92fb3a68 Cleanups, add comments and explanations 2020-10-07 13:56:25 -04:00
6ccd97fc96 Fixed end of match boundary update issues 2020-10-07 13:56:25 -04:00
28394b64f2 Add proper bounds check on adding ldms 2020-10-07 13:56:25 -04:00
f57c7e6bbf Add base adjustment correction 2020-10-07 13:56:25 -04:00
84009a076a Add re-copying of ldmSeqStore after processing 2020-10-07 13:56:25 -04:00
35d9f488f5 Modify codepath to use opt parser exclusively if the compression level is high enough 2020-10-07 13:56:24 -04:00