mirror of https://github.com/facebook/zstd.git synced 2025-03-07 09:26:03 +02:00

10074 Commits

Author SHA1 Message Date
Nick Terrell
5b266196a4 Add support for in-place decompression
* Add a function and macro ZSTD_decompressionMargin() that computes the
  decompression margin for in-place decompression. The function computes
  a tight margin that works in all cases, and the macro computes an upper
  bound that will only work if flush isn't used.
* When doing in-place decompression, make sure that our output buffer
  doesn't overlap with the input buffer. This ensures that we don't
  decide to use the portion of the output buffer that overlaps the input
  buffer for temporary memory, like for literals.
* Add a simple unit test.
* Add in-place decompression to the simple_round_trip and
  stream_round_trip fuzzers. This should help verify that our margin stays
  correct.
2023-01-12 16:28:08 -08:00
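The buffer arithmetic behind the in-place decompression commit above (5b266196a4) can be sketched as follows. This is a minimal illustration, assuming the layout described in the `zstd.h` comments as I understand them: the shared buffer holds at least the decompressed size plus the margin, and the compressed input sits at the very end of that buffer. The helper `inplace_input_offset` and its parameters are hypothetical and not part of zstd; the margin value would come from `ZSTD_decompressionMargin()` in real code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of zstd.
 * Assumed layout: one buffer of (originalSize + margin) bytes, with the
 * compressed input placed so that it ends exactly at the buffer's end.
 * Writes the required buffer size to *bufSize and returns the offset at
 * which the compressed input must start. */
static size_t inplace_input_offset(size_t originalSize, size_t margin,
                                   size_t compressedSize, size_t *bufSize)
{
    size_t const total = originalSize + margin;
    /* The compressed data must fit inside the shared buffer. */
    assert(compressedSize <= total);
    *bufSize = total;
    return total - compressedSize;
}
```

Decompression would then read from `buffer + offset` and write from `buffer`, so the output may overwrite input bytes only after they have been consumed, which is the property the margin is meant to guarantee.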
Yann Collet
ac45e078a5 add explanation about new test
as requested by @terrelln
2023-01-12 15:49:01 -08:00
Yann Collet
796699c0bc fix root cause of #3416
A minor change in 5434de0 turned a `<=` into a `<`,
and as an indirect consequence allowed literal compression to be attempted when there are only 6 literals to compress
(the previous limit was effectively 7 literals).

This is not a problem in itself, as the threshold is merely a heuristic,
but it exposed a bug that had always been there and had never been triggered so far because of the previous limit.
This bug makes the literal compressor believe that all literals are the same symbol,
but in the exact case where nbLiterals==6, combined with a pretty wild set of other limit conditions,
this conclusion could be false, resulting in data corruption.

Replaced the blind heuristic with an actual test covering all limit cases,
so that even if the threshold is changed again in the future,
the detection of RLE mode will remain reliable.
2023-01-12 15:41:08 -08:00
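The idea in the fix above (796699c0bc), testing the data rather than inferring RLE mode from a size heuristic, can be sketched as a direct check. This is only an illustration of "an actual test": the function name `all_bytes_identical` is hypothetical, and the real zstd code may perform the check differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not zstd's code.
 * Returns 1 only when every byte in src[0..size) equals the first one.
 * Returns 0 otherwise, including for an empty input. */
static int all_bytes_identical(const unsigned char *src, size_t size)
{
    size_t i;
    assert(src != NULL || size == 0);
    if (size == 0) return 0;
    for (i = 1; i < size; i++) {
        if (src[i] != src[0]) return 0;  /* found a second symbol */
    }
    return 1;
}
```

Because the answer depends only on the bytes themselves, it stays correct whatever minimum-size threshold the caller applies before attempting compression.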
Yann Collet
423500d1ae
Merge pull request #3413 from facebook/timefn
minor refactoring for timefn
2023-01-12 15:34:00 -08:00
Danielle Rozenblit
06b096db47 additional tests and documentation updates + allow maxBlockSize to be set to 0 (goes to default) 2023-01-12 13:41:50 -08:00
Felix Handte
fd2eb8a68e
Merge pull request #3402 from facebook/dependabot/github_actions/ossf/scorecard-action-2.1.2
Bump ossf/scorecard-action from 2.1.0 to 2.1.2
2023-01-12 13:28:04 -05:00
Danielle Rozenblit
53eb5a758c add simple test for maxBlockSize expected functionality 2023-01-12 08:55:39 -08:00
Elliot Gorokhovsky
4f7183d887
Completely overhaul Windows CI (#3410)
* Overhaul windows CI

* upgrade setup-msbuild from v1.1.3 to v1.3

* remove cmake 2019 test

* fix 32-bit gcc mingw test

* merge conflict
2023-01-11 16:29:23 -05:00
Danielle Rozenblit
1fffcfe01d update minimum threshold for max block size 2023-01-11 11:09:57 -08:00
Daniel Kutenin
ca2ff788df Make the producer use the same amount of entropy 2023-01-11 10:09:19 -08:00
Daniel Kutenin
3ac0b91302 Fix fuzzing with ZSTD_MULTITHREAD
At Google we fuzz zstd without ZSTD_MULTITHREAD, but we want inputs to be as reproducible as possible. This allows us to test new fuzzing methods internally for our fuzz team and gives us more horsepower to find bugs.
2023-01-11 10:09:19 -08:00
Yann Collet
98ca8b4456
Merge pull request #3414 from facebook/dependabot/github_actions/actions/checkout-3.3.0
Bump actions/checkout from 3.2.0 to 3.3.0
2023-01-09 11:33:29 -08:00
Danielle Rozenblit
fe08137d9a resolve max block value in cctx and use when calculating the max block size 2023-01-09 07:53:53 -08:00
dependabot[bot]
59a536aa01
Bump actions/upload-artifact from 3.1.1 to 3.1.2
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 3.1.1 to 3.1.2.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](83fd05a356...0b7f8abb15)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-01-09 05:10:50 +00:00
dependabot[bot]
6f17a5d8df
Bump actions/checkout from 3.2.0 to 3.3.0
Bumps [actions/checkout](https://github.com/actions/checkout) from 3.2.0 to 3.3.0.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](755da8c3cf...ac59398561)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2023-01-09 05:10:46 +00:00
Yann Collet
8b130009e3 minor simplification refactoring for timefn
`UTIL_getSpanTimeMicro()` can be factored in a generic way,
reducing OS-dependent code.
2023-01-06 16:12:54 -08:00
Yann Collet
71dbe8f9d4 minor: fix conversion warnings 2023-01-04 20:00:04 -08:00
daniellerozenblit
d913417f72
Merge branch 'dev' into fuzz-max-block-size 2023-01-04 16:34:07 -05:00
Danielle Rozenblit
908e812733 initial commit 2023-01-04 13:01:54 -08:00
Yann Collet
834fd07a99
Merge pull request #3391 from facebook/fix3228
improve compression ratio of small alphabets
2023-01-03 16:01:52 -08:00
Yann Collet
c79fb4d78d update levels.sh test
Comparing level 19 to level 22 and expecting a strictly better result from level 22
is not guaranteed to work,
because levels 19 and 22 are very close to each other,
especially for small files,
so any noise in the final compression result
can make this test fail.

Level 22 could be compared to something much lower, like level 15,
but level 19 is required anyway, because there is a clamping test which depends on it.

Removed level 22, kept level 19
2023-01-03 14:04:41 -08:00
Yann Collet
ebba9ff425 update regression results 2023-01-03 14:04:23 -08:00
Yann Collet
5434de01e2 improve compression ratio of small alphabets
fix #3328

In situations where the alphabet size is very small,
the evaluation of literal costs from the Optimal Parser is initially incorrect.
It takes some time to converge, during which compression is less efficient.
This is especially important for small files,
because there will not be enough data to converge,
so most of the parsing is selected based on incorrect metrics.

After this patch, the scenario from #3328 is fixed,
delivering the expected compressed size of 29 bytes (the smallest known compressed size).
2023-01-03 12:22:37 -08:00
daniellerozenblit
1c818e3a0a
Merge pull request #3302 from daniellerozenblit/optimal-huff-depth-speed
Optimal huff depth speed improvements
2023-01-03 12:51:51 -05:00
Danielle Rozenblit
87becc567d update regression results.csv 2023-01-03 08:41:40 -08:00
Danielle Rozenblit
df714ddb0f implement suggestions 2023-01-03 07:20:21 -08:00
Yann Collet
3248910432
Merge pull request #3248 from facebook/opt_comments1
[easy] add a few comments to the optimal parser code base for improved clarity
2022-12-28 18:03:57 -08:00
Yann Collet
d07e72bb13 fixed incorrect assert
commented Fweight instead
2022-12-28 17:23:40 -08:00
Yann Collet
4a1a79a512 just add some comments to zstd_opt for improved clarity 2022-12-28 16:24:12 -08:00
Yann Collet
9fbbd74871
Merge pull request #3400 from danlark1/dev
Move deprecated annotation before static to allow C++ compilation for clang
2022-12-28 15:50:26 -08:00
Yann Collet
bcbd395c1c
Merge pull request #3395 from terrelln/2022-12-21-deprecated-test
[tests] Remove deprecated function from longmatch.c test
2022-12-28 15:49:50 -08:00
Yann Collet
00c85b28e7 update ZSTD_CCtx_setCParams() inline documentation
specify behavior when changing compression parameters during MT compression,
reported by @embg
2022-12-28 15:08:18 -08:00
Yann Collet
481a2e1010
Merge pull request #3403 from facebook/setCParams
ZSTD_CCtx_setCParams
2022-12-28 14:07:13 -08:00
Elliot Gorokhovsky
2a402626dd
External matchfinder API (#3333)
* First building commit with sample matchfinder

* Set up ZSTD_externalMatchCtx struct

* move seqBuffer to ZSTD_Sequence*

* support non-contiguous dictionary

* clean up parens

* add clearExternalMatchfinder, handle allocation errors

* Add useExternalMatchfinder cParam

* validate useExternalMatchfinder cParam

* Disable LDM + external matchfinder

* Check for static CCtx

* Validate mState and mStateDestructor

* Improve LDM check to cover both branches

* Error API with optional fallback

* handle RLE properly for external matchfinder

* nit

* Move to a CDict-like model for resource ownership

* Add hidden useExternalMatchfinder bool to CCtx_params_s

* Eliminate malloc, move to cwksp allocation

* Handle CCtx reset properly

* Ensure seqStore has enough space for external sequences

* fix capitalization

* Add DEBUGLOG statements

* Add compressionLevel param to matchfinder API

* fix c99 issues and add a param combination error code

* nits

* Test external matchfinder API

* C90 compat for simpleExternalMatchFinder

* Fix some @nocommits and an ASAN bug

* nit

* nit

* nits

* forward declare copySequencesToSeqStore functions in zstd_compress_internal.h

* nit

* nit

* nits

* Update copyright headers

* Fix CMake zstreamtest build

* Fix copyright headers (again)

* typo

* Add externalMatchfinder demo program to make contrib

* Reduce memory consumption for small blockSize

* ZSTD_postProcessExternalMatchFinderResult nits

* test sum(matchlen) + sum(litlen) == srcSize in debug builds

* refExternalMatchFinder -> registerExternalMatchFinder

* C90 nit

* zstreamtest nits

* contrib nits

* contrib nits

* allow block splitter + external matchfinder, refactor

* add windowSize param

* add contrib/externalMatchfinder/README.md

* docs

* go back to old RLE heuristic because of the first block issue

* fix initializer element is not a constant expression

* ref contrib from zstd.h

* extremely pedantic compiler warning fix, meson fix, typo fix

* Additional docs on API limitations

* minor nits

* Refactor maxNbSeq calculation into a helper function

* Fix copyright
2022-12-28 16:45:14 -05:00
Yann Collet
b17743e41b Signal parameter change during MT compression 2022-12-28 13:14:58 -08:00
Yann Collet
89342d1e07 New experimental library symbol: ZSTD_CCtx_setCParams()
Inspired by #3395,
offer a new capability to set all parameters defined in a ZSTD_compressionParameters structure
with a single symbol invocation
to improve user code brevity.
2022-12-27 23:49:22 -08:00
Yann Collet
90597d78ea
Merge pull request #3394 from terrelln/issue-3010
[cli-tests] Test file stat read/write
2022-12-27 16:20:05 -08:00
dependabot[bot]
1f72dca0ff
Bump ossf/scorecard-action from 2.1.0 to 2.1.2
Bumps [ossf/scorecard-action](https://github.com/ossf/scorecard-action) from 2.1.0 to 2.1.2.
- [Release notes](https://github.com/ossf/scorecard-action/releases)
- [Changelog](https://github.com/ossf/scorecard-action/blob/main/RELEASE.md)
- [Commits](937ffa90d7...e38b1902ae)

---
updated-dependencies:
- dependency-name: ossf/scorecard-action
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2022-12-26 05:08:53 +00:00
Yann Collet
6640377783 cmake build: fix nit
reported by @jaimeMF in https://github.com/facebook/zstd/pull/3392#discussion_r1056643794
2022-12-23 14:18:11 -08:00
Daniel Kutenin
48f4aa7307
Move deprecated annotation before static to allow C++ compilation for clang
This fixes the last 2 instances of https://github.com/facebook/zstd/issues/3250
2022-12-23 12:07:31 +00:00
Yann Collet
089b2797e3
Merge pull request #3398 from facebook/fix3316
spec update : require minimum nb of literals for 4-streams mode
2022-12-22 16:57:05 -08:00
Yann Collet
6a9c525903 spec update : require minimum nb of literals for 4-streams mode
Reported by @shulib:
the specification for 4-streams mode
doesn't work when the amount of literals to compress is 5 bytes.
By extension, it also doesn't work for sizes 1 or 2.

This patch updates the specification and the implementation
to require a minimum of 6 literals to trigger or accept the 4-streams mode.

This is expected to have no practical impact:
the 4-streams mode is never triggered for such a small quantity of literals anyway,
since it would be wasteful (it costs ~7.3 bytes more than single-stream mode).
An informal lower limit is set at ~256 bytes,
so the technical minimum is very far from this limit.

This is just meant for completeness of the specification.
2022-12-22 16:14:34 -08:00
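The sizes named in the commit above (6a9c525903) can be reproduced with a small arithmetic check. This sketch assumes the split rule from the Zstandard format specification as I recall it: each of the first three streams holds `(nbLiterals + 3) / 4` bytes, and the fourth stream holds the remainder. The function name `four_streams_split_fits` is hypothetical and is not taken from zstd.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not zstd's code.
 * Assumed rule: the first three streams each hold (nbLiterals + 3) / 4
 * bytes and the fourth holds whatever remains.
 * Returns 1 when the first three streams fit within nbLiterals, and 0 when
 * they would already need more bytes than are available. */
static int four_streams_split_fits(size_t nbLiterals)
{
    size_t const segment = (nbLiterals + 3) / 4;
    assert(nbLiterals > 0);
    return 3 * segment <= nbLiterals;
}
```

Under this assumed rule the split fails for exactly 1, 2, and 5 literals, which matches the sizes listed in the commit message, and it fits for every size of 6 or more. Sizes 3 and 4 fit arithmetically but fall below the new minimum of 6.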
Yann Collet
8209bfc4f3
Merge pull request #3399 from facebook/fix2577
Support decompression of compressed blocks of size ZSTD_BLOCKSIZE_MAX
2022-12-22 14:05:36 -08:00
Yann Collet
ea2895cef4 Support decompression of compressed blocks of size ZSTD_BLOCKSIZE_MAX exactly 2022-12-22 12:40:27 -08:00
Felix Handte
f5ea3a196f
Merge pull request #3397 from felixhandte/man-page-tweaks
Man Page Tweaks, Edits, Formatting Fixes
2022-12-22 14:49:59 -05:00
W. Felix Handte
11aba9b316 make man 2022-12-22 14:13:24 -05:00
W. Felix Handte
382026f096 Man Page Tweaks, Edits, Formatting Fixes
This started as an application of the edits suggested in #3201 and expanded
from there.
2022-12-22 14:13:17 -05:00
Nick Terrell
7fe7a166c2 [cli-tests] Add tests that use --trace-file-stat
Basic tests for (de)compressing in the following modes:
* file to file
* file to stdout
* stdin to file
* stdin to stdout

These are basic tests and don't cover more advanced scenarios, but
they lay the groundwork for more complex tests as needed.

Fixes #3010.
2022-12-21 18:32:12 -08:00
Nick Terrell
4b40e405d3 [tests] Remove deprecated function from longmatch.c test
Thanks to @eli-schwartz for pointing it out!

We should maybe consider adding a helper function for applying
`ZSTD_parameters` and `ZSTD_compressionParameters` to a context.
That would aid the transition to the new API in situations like this.
2022-12-21 17:52:10 -08:00
Nick Terrell
40a7188130 Fix make clangbuild & add CI
Fix the errors for:
* `-Wdocumentation`
* `-Wconversion` except `-Wsign-conversion`
2022-12-21 17:31:04 -08:00