1
0
mirror of https://github.com/facebook/zstd.git synced 2025-07-05 07:00:22 +02:00
Commit Graph

673 Commits

Author SHA1 Message Date
a7fdceeccd changed dynamic fse threshold for offset
recent experienced showed that
default distribution table for offset
can get it wrong pretty quickly with the nb of symbols,
while it remains a reasonable choice much longer for lengths symbols.

Changed the formula,
so that dynamic threshold is now 32 symbols for offsets.
It remains at 64 symbols for lengths.

Detection based on defaultNormLog
2018-05-25 17:41:16 -07:00
4b3a36d5d8 Merge branch 'dev' into lowCompression 2018-05-25 15:45:03 -07:00
5f177f1c53 btultra accepts blocks with poorer compression ratio
zstd rejects blocks which do not compress by at least a certain amount.
In which case, such block is simply emitted uncompressed (even if a little bit of compression could be achieved).
This is better for decompression speed, hence for energy.

The logic is controlled by ZSTD_minGain().
The rule is applied uniformly, at all compression levels.

This change makes btultra accepts blocks with poor compression ratios.
We presume that users of btultra mode prefers compression ratio over some decompress speed gains.

The threshold for minimum gain is lowered for btultra
from s>>6 (~1.5% minimum gain)
to s>>7 (~0.8% minimum gain).

This is a prudent change.
Not sure if it's large enough.
2018-05-25 15:19:52 -07:00
c10d1b4011 Skeleton for In-Place Impl for ZSTD_dfast 2018-05-25 13:13:28 -04:00
f6ad59ab5c Merge branch 'dev' into staticDictCost 2018-05-24 16:21:02 -07:00
08c5be5db3 Merge pull request #1117 from felixhandte/zstd-fast-in-place-dict
ZSTD_fast: Support Searching the Dictionary Context In-Place
2018-05-23 19:32:25 -07:00
06b70179da Work around bug in zstd decoder (#1147)
Work around bug in zstd decoder

Pull request #1144 exercised a new path in the zstd decoder that proved to
be buggy. Avoid the extremely rare bug by emitting an uncompressed block.
2018-05-23 18:02:30 -07:00
d9c7e67125 Assert that Dict and Current Window are Adjacent in Index Space 2018-05-23 17:53:03 -04:00
298d24fa57 Make loadedDictEnd an Index, not the Dict Len 2018-05-23 17:53:03 -04:00
7ef85e0618 Fixes in re Comments 2018-05-23 17:53:03 -04:00
582b7f85ed Don't Attach Empty Dict Contents
In weird corner cases, they produce unexpected results...
2018-05-23 17:53:03 -04:00
a44ab3b475 Remove Out-of-Date Comment 2018-05-23 17:53:03 -04:00
7e0402e738 Also Attach Dict When Source Size is Unknown 2018-05-23 17:53:03 -04:00
3ba70cc759 Clear the Dictionary When Sliding the Window 2018-05-23 17:53:03 -04:00
2d598e6fed Force Working Context Indices Greater than Dict Indices 2018-05-23 17:53:03 -04:00
ae4fcf7816 Respond to PR Comments; Formatting/Style/Lint Fixes 2018-05-23 17:53:03 -04:00
66bc1ca641 Change Cut-Off to 8 KB 2018-05-23 17:53:03 -04:00
b67196f30d Coalesce hasDictMatchState and extDict Checks into One Enum and Rename Stuff 2018-05-23 17:53:03 -04:00
265c2869d1 Split Wrapper Functions to Cause Inlining 2018-05-23 17:53:03 -04:00
d18a405779 Refer to the Dictionary Match State In-Place (Sometimes) 2018-05-23 17:53:03 -04:00
e3959d5eba Fixes 2018-05-22 16:06:33 -07:00
7a8b3496b4 Merge branch 'dev' into staticDictCost 2018-05-22 15:10:05 -07:00
49cf880513 Approximate FSE encoding costs for selection
Estimate the cost for using FSE modes `set_basic`, `set_compressed`, and
`set_repeat`, and select the one with the lowest cost.

* The cost of `set_basic` is computed using the cross-entropy cost
  function `ZSTD_crossEntropyCost()`, using the normalized default count
  and the count.
* The cost of `set_repeat` is computed using `FSE_bitCost()`. We check the
  previous table to see if it is able to represent the distribution.
* The cost of `set_compressed` is computed with the entropy cost function
  `ZSTD_entropyCost()`, together with the cost of writing the normalized
  count `ZSTD_NCountCost()`.
2018-05-22 14:33:22 -07:00
5381369cb1 Merge branch 'dev' into tableLevels 2018-05-18 18:23:27 -07:00
b0b3fb517d updated compression levels for blocks of 256KB 2018-05-18 17:17:12 -07:00
a95e9e80d1 adding some debug functions to observe statistics 2018-05-18 14:09:42 -07:00
a243020d37 slightly improved weight calculation
translating into a tiny compression ratio improvement
2018-05-17 11:19:44 -07:00
63eeeaa1dd update table levels for blocks <= 16K
also : allow hlog to be slighly larger than windowlog,
as it's apparently good for both speed and compression ratio.
2018-05-16 16:13:37 -07:00
f372ffc64d Merge pull request #1127 from facebook/staticDictCost
Improved optimal parser with dictionary
2018-05-14 17:45:50 -07:00
c9227ee16b update table for 128 KB blocks 2018-05-13 17:15:07 -07:00
b4250489cf update compression levels for large inputs 2018-05-13 01:53:38 -07:00
99ddca43a6 fixed wrong assertion
base can actually overflow
2018-05-10 19:48:09 -07:00
1a26ec6e8d opt: init statistics from dictionary
instead of starting from fake "default" statistics.
2018-05-10 17:59:12 -07:00
ac6105463a opt: minor improvements to log traces
slight improvement when using fractional-bit evaluation (opt:dictionay)
2018-05-09 15:46:11 -07:00
4d5bd32a00 added traces to look at symbol costs
evaluation looks correct.
2018-05-09 12:00:12 -07:00
338f738c24 pass entropy tables to optimal parser
for proper estimation of symbol's weights
when using dictionary compression.

Note : using only huffman costs is not good enough,
presumably because sequence symbol costs are incorrect.
2018-05-08 15:37:06 -07:00
a155061328 minor code refactor for readability
removed some useless operations from optimal parser
(should not change performance, too small a difference)
2018-05-08 12:32:44 -07:00
ad4524d605 fix ZSTD_compressBlock() associated with CDict
reported by @let-def.

It's actually a bug in ZSTD_compressBegin_usingCDict()
which would pass a wrong pledgedSrcSize value (0 instead of ZSTD_CONTENTSIZE_UNKNOWN)
resulting in wrong window size, resulting in downsized seqStore,
resulting in segfault when writing into the seqStore later in the process.

Added a test in fuzzer to cover this use case (fails before the patch).
2018-05-07 12:54:13 -07:00
ca77822ddf Fix parameter adjustment with dictionary
The new advanced API basically set `requestedParams = appliedParams` when
using a dictionary. This halted all parameter adjustment, which can hurt
compression ratio if, for example, the window log is small for the first
call, but the rest of the files are large.

This patch fixes the bug, and checks that the `requestedParams` don't change
in the new advanced API when using a dictionary, and generally in the fuzzer.
2018-04-25 16:32:29 -07:00
c0987986e5 Only reset CDict in ZSTD_CCtx_resetParameters() 2018-04-13 11:26:40 -07:00
9f76eebd17 Add ZSTD_CCtx_resetParameters() function
* Fix docs for `ZSTD_CCtx_reset()`.
* Add `ZSTD_CCtx_resetParameters()`.

Fixes #1094.
2018-04-12 16:54:07 -07:00
3c3f59e68f Enforce pledgeSrcSize whenever known (#1106)
The test fails before the patch and passes after.

Fixes #1095.
2018-04-12 16:02:03 -07:00
280a236e9e Add ZSTD_CCtx(Param)?_getParameter() function
Closes #1096.
2018-04-12 11:50:12 -07:00
295ab0dbfa Only load extra table positions for CDicts
Zstdmt uses prefixes to load the overlap between segments. Loading extra
positions makes compression non-deterministic, depending on the previous
job the context was used for. Since loading extra position takes extra
time as well, only do it when creating a `ZSTD_CDict`.

Fixes #1077.
2018-04-02 14:41:30 -07:00
153bc1c004 removed limit ZSTD_TARGETLENGTH_MAX
this makes it possible to specify extremely large negative compression levels,
achieving the side effect as "no compression".

It will also be possible to define larger targetlength for ultra compression mode.

There is no adverse side effect due to removing this limit.
2018-03-21 15:50:05 -07:00
a99c4a3621 Merge branch 'dev' into advancedDecompress 2018-03-21 06:08:28 -07:00
87b0cf05bd Merge pull request #1057 from facebook/lrmSettings
LRM parameters
2018-03-21 05:59:39 -07:00
d1bf609abf Merge pull request #1059 from terrelln/mt-ldm
Integrate ldm with zstdmt
2018-03-20 17:50:20 -07:00
878728dc26 fixed several comments by @terrelln 2018-03-20 16:35:14 -07:00
e1c52faace Merge pull request #1060 from facebook/compressImpl
merge bmi2 implementation of encodeSequence into zstd_compress.c
2018-03-20 16:19:42 -07:00