mirror of https://github.com/facebook/zstd.git synced 2025-07-14 11:14:18 +02:00
1275 Commits

Author SHA1 Message Date
a95e9e80d1 adding some debug functions to observe statistics 2018-05-18 14:09:42 -07:00
af3da079d1 fixed minor conversion warning 2018-05-17 17:27:27 -07:00
8572b4d09f fixed a pretty complex bug when combining ldm + btultra 2018-05-17 16:13:53 -07:00
134388ba6b collect statistics for first block in ultra mode
this patch makes btultra do 2 passes on the first block,
the first one being dedicated to collecting statistics
so that the 2nd pass is more accurate.

It translates into a very small compression ratio gain :

enwik7, level 20:
blocks  4K : 2.142 -> 2.153
blocks 16K : 2.447 -> 2.457
blocks 64K : 2.716 -> 2.726

On the other hand, the cpu cost is doubled.

The trade off looks bad.
Though, that's ultimately a price to pay to reach better compression ratio.
So it's only enabled when setting btultra.
2018-05-17 12:24:30 -07:00
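
To put the enwik7 figures above in perspective, here is a minimal sketch that turns a before/after compression ratio into a relative gain. The helper name `ratio_gain_percent` is hypothetical and is not part of zstd.

```c
#include <assert.h>

/* Relative improvement, in percent, when a compression ratio moves
 * from `before` to `after`. Hypothetical helper, not part of zstd. */
static double ratio_gain_percent(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

Applied to the numbers in the message, the gain is roughly 0.5% for 4K blocks and about 0.4% for 16K and 64K blocks, set against a doubled cpu cost.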
a243020d37 slightly improved weight calculation
translating into a tiny compression ratio improvement
2018-05-17 11:19:44 -07:00
63eeeaa1dd update table levels for blocks <= 16K
also : allow hlog to be slightly larger than windowlog,
as it's apparently good for both speed and compression ratio.
2018-05-16 16:13:37 -07:00
18fc3d3cd5 introduced bit-fractional cost evaluation
this improves compression ratio by a *tiny* amount.
It also reduces speed by a small amount.

Consequently, bit-fractional evaluation is only turned on for btultra.
2018-05-16 14:53:35 -07:00
30d9c84b1a Fix failing Travis tests 2018-05-15 09:46:20 -07:00
0b31304c8d Merge branch 'dev' into staticDictCost 2018-05-14 18:09:26 -07:00
2c26df0e13 opt: removed static prices
after testing, it's actually always better to use dynamic prices
albeit initialised from dictionary.
2018-05-14 18:04:08 -07:00
f372ffc64d Merge pull request #1127 from facebook/staticDictCost
Improved optimal parser with dictionary
2018-05-14 17:45:50 -07:00
c9227ee16b update table for 128 KB blocks 2018-05-13 17:15:07 -07:00
b4250489cf update compression levels for large inputs 2018-05-13 01:53:38 -07:00
761758982e replaced FSE_count by FSE_count_simple
to reduce usage of stack memory.

Also : tweaked a few comments, as suggested by @terrelln
2018-05-11 16:03:37 -07:00
99ddca43a6 fixed wrong assertion
base can actually overflow
2018-05-10 19:48:09 -07:00
09d0fa29ee minor adjusting of weights 2018-05-10 18:13:48 -07:00
1a26ec6e8d opt: init statistics from dictionary
instead of starting from fake "default" statistics.
2018-05-10 17:59:12 -07:00
74b1c75d64 btopt : minor adjustment of update frequencies 2018-05-10 16:32:36 -07:00
ac6105463a opt: minor improvements to log traces
slight improvement when using fractional-bit evaluation (opt:dictionary)
2018-05-09 15:46:11 -07:00
c39061cb7b fixed declaration-after-statement warning 2018-05-09 12:07:25 -07:00
4d5bd32a00 added traces to look at symbol costs
evaluation looks correct.
2018-05-09 12:00:12 -07:00
c0da0f5e9e switchable bit-approximation / fractional-bit accuracy modes
also : makes it possible to select nb of fractional bits.
2018-05-09 10:48:09 -07:00
ba2ad9b6b9 implemented fractional bit cost evaluation
for FSE symbols.

While it seems to work, the gains are negligible compared to rough maxNbBits evaluation.
There are even a few losses sometimes, that still need to be explained.
Furthermore, there are still cases where btlazy2 does a better job than btopt,
which seems rather strange too.
2018-05-08 17:43:13 -07:00
1aff63b114 opt: shift all costs by 8 bits (* 256)
making it possible to represent fractional bit costs.
2018-05-08 16:19:04 -07:00
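
The idea of shifting costs by 8 bits can be illustrated with a small fixed-point sketch: one whole bit is stored as 256 units, so a symbol costing half a bit becomes 128 instead of being rounded to 0 or 1. All names below (`COST_SHIFT`, `scaled_log2`, `symbol_cost`) are hypothetical, and the linear interpolation between powers of two is an assumed approximation, not necessarily what the optimal parser does.

```c
#include <assert.h>
#include <stdint.h>

#define COST_SHIFT   8u                   /* assumed: 8 fractional bits */
#define COST_ONE_BIT (1u << COST_SHIFT)   /* one whole bit == 256 units */

/* Index of the highest set bit; v must be non-zero. */
static unsigned highbit32(uint32_t v)
{
    unsigned r = 0;
    assert(v != 0);
    while (v >>= 1) r++;
    return r;
}

/* Approximate log2(v) * 256: exact at powers of two,
 * linearly interpolated in between. */
static uint32_t scaled_log2(uint32_t v)
{
    unsigned hb;
    assert(v != 0 && v < (1u << 24));     /* keeps v << 8 from overflowing */
    hb = highbit32(v);
    return hb * COST_ONE_BIT + ((v << COST_SHIFT) >> hb) - COST_ONE_BIT;
}

/* Estimated cost, in 1/256th of a bit, of a symbol seen `count`
 * times out of `total` observations. */
static uint32_t symbol_cost(uint32_t count, uint32_t total)
{
    assert(count != 0 && count <= total);
    return scaled_log2(total) - scaled_log2(count);
}
```

For instance, a symbol seen 1 time out of 4 costs 512 units (2 bits), while one seen 3 times out of 4 costs 128 units (half a bit under this approximation), a distinction that whole-bit costs cannot express.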
6a3c34aa58 opt: estimate cost of both Huffman and FSE symbols
For FSE symbols : provide an upper bound,
in nb of bits,
since cost function is not able to store fractional bit costs.
2018-05-08 16:11:21 -07:00
338f738c24 pass entropy tables to optimal parser
for proper estimation of symbol's weights
when using dictionary compression.

Note : using only huffman costs is not good enough,
presumably because sequence symbol costs are incorrect.
2018-05-08 15:37:06 -07:00
a155061328 minor code refactor for readability
removed some useless operations from optimal parser
(should not change performance, too small a difference)
2018-05-08 12:32:44 -07:00
ad4524d605 fix ZSTD_compressBlock() associated with CDict
reported by @let-def.

It's actually a bug in ZSTD_compressBegin_usingCDict()
which would pass a wrong pledgedSrcSize value (0 instead of ZSTD_CONTENTSIZE_UNKNOWN)
resulting in wrong window size, resulting in downsized seqStore,
resulting in segfault when writing into the seqStore later in the process.

Added a test in fuzzer to cover this use case (fails before the patch).
2018-05-07 12:54:13 -07:00
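
The failure chain described above starts with a size of 0 being taken literally instead of meaning "unknown". A minimal sketch of that first step, assuming a window log derived from the pledged source size: `pick_window_log` and `WINDOWLOG_MIN` are hypothetical names, and `CONTENTSIZE_UNKNOWN` is defined locally to mirror the all-ones value of `ZSTD_CONTENTSIZE_UNKNOWN`.

```c
#include <assert.h>

#define CONTENTSIZE_UNKNOWN (0ULL - 1)   /* local stand-in for ZSTD_CONTENTSIZE_UNKNOWN */
#define WINDOWLOG_MIN 10u                /* hypothetical lower bound for this sketch */

/* Pick the smallest window log that covers the pledged source size,
 * or keep the maximum when the size is unknown. Illustrative only. */
static unsigned pick_window_log(unsigned long long pledged_src_size,
                                unsigned max_window_log)
{
    unsigned log = WINDOWLOG_MIN;
    assert(max_window_log >= WINDOWLOG_MIN && max_window_log < 64);
    if (pledged_src_size == CONTENTSIZE_UNKNOWN)
        return max_window_log;           /* unknown size: do not shrink */
    while (log < max_window_log && (1ULL << log) < pledged_src_size)
        log++;
    return log;
}
```

Under these assumptions, passing 0 yields the minimum window (here 10), whereas passing the unknown marker keeps the full window, which is the difference between the buggy and the fixed behavior.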
ca77822ddf Fix parameter adjustment with dictionary
The new advanced API basically set `requestedParams = appliedParams` when
using a dictionary. This halted all parameter adjustment, which can hurt
compression ratio if, for example, the window log is small for the first
call, but the rest of the files are large.

This patch fixes the bug, and checks that the `requestedParams` don't change
in the new advanced API when using a dictionary, and generally in the fuzzer.
2018-04-25 16:32:29 -07:00
c0987986e5 Only reset CDict in ZSTD_CCtx_resetParameters() 2018-04-13 11:26:40 -07:00
9f76eebd17 Add ZSTD_CCtx_resetParameters() function
* Fix docs for `ZSTD_CCtx_reset()`.
* Add `ZSTD_CCtx_resetParameters()`.

Fixes #1094.
2018-04-12 16:54:07 -07:00
3c3f59e68f Enforce pledgeSrcSize whenever known (#1106)
The test fails before the patch and passes after.

Fixes #1095.
2018-04-12 16:02:03 -07:00
280a236e9e Add ZSTD_CCtx(Param)?_getParameter() function
Closes #1096.
2018-04-12 11:50:12 -07:00
295ab0dbfa Only load extra table positions for CDicts
Zstdmt uses prefixes to load the overlap between segments. Loading extra
positions makes compression non-deterministic, depending on the previous
job the context was used for. Since loading extra position takes extra
time as well, only do it when creating a `ZSTD_CDict`.

Fixes #1077.
2018-04-02 14:41:30 -07:00
29b021f9a0 Merge pull request #1067 from facebook/targetLength
removed limit ZSTD_TARGETLENGTH_MAX
2018-03-22 10:38:33 -07:00
ad344033df Fix broken assertion
The `avgJobSize` must not be lower than 256 KB for single-pass mode.
In `zstd.h` we say the minimum value for `ZSTD_p_jobSize` is 1 MB,
so ensure that we always pick a size >= 1 MB.

Found by libFuzzer fuzzer tests with large input limits.
2018-03-21 16:20:30 -07:00
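
The fix amounts to clamping the chosen job size to the documented minimum before asserting on it. A minimal sketch, where `clamp_job_size` and `JOB_SIZE_MIN` are hypothetical names and 1 MB is the minimum stated in the message:

```c
#include <assert.h>
#include <stddef.h>

#define JOB_SIZE_MIN ((size_t)1 << 20)   /* 1 MB, the minimum quoted from zstd.h */

/* Raise any smaller request to the minimum job size. Illustrative only. */
static size_t clamp_job_size(size_t requested)
{
    size_t const size = requested < JOB_SIZE_MIN ? JOB_SIZE_MIN : requested;
    assert(size >= JOB_SIZE_MIN);        /* the invariant the assertion should check */
    return size;
}
```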
153bc1c004 removed limit ZSTD_TARGETLENGTH_MAX
this makes it possible to specify extremely large negative compression levels,
achieving the same effect as "no compression".

It will also be possible to define larger targetlength for ultra compression mode.

There is no adverse side effect due to removing this limit.
2018-03-21 15:50:05 -07:00
a99c4a3621 Merge branch 'dev' into advancedDecompress 2018-03-21 06:08:28 -07:00
87b0cf05bd Merge pull request #1057 from facebook/lrmSettings
LRM parameters
2018-03-21 05:59:39 -07:00
d1bf609abf Merge pull request #1059 from terrelln/mt-ldm
Integrate ldm with zstdmt
2018-03-20 17:50:20 -07:00
878728dc26 fixed several comments by @terrelln 2018-03-20 16:35:14 -07:00
e1c52faace Merge pull request #1060 from facebook/compressImpl
merge bmi2 implementation of encodeSequence into zstd_compress.c
2018-03-20 16:19:42 -07:00
a3b76a77ef Quiet appveyor warnings 2018-03-20 15:34:40 -07:00
6873fec658 changed dictMode for dictContentType
which seems clearer to describe what the variable/argument is about.
2018-03-20 15:13:14 -07:00
136b9e2392 Fix external sequence corner cases
* Clear external sequences when we reset the `ZSTD_CCtx`.
* Skip external sequences when a block is too small to compress.
2018-03-20 14:50:28 -07:00
451357f37f Merge pull request #1058 from facebook/cctxParams
updated CCtxParams API
2018-03-20 12:36:12 -07:00
2ed5af0766 merge bmi2 implementation of encodeSequence into zstd_compress.c 2018-03-19 19:10:31 -07:00
d19f803a3b Fix window size for 1 worker + flushing 2018-03-19 18:56:39 -07:00
24d9edbdd8 Set ldmParams to 0 when disabled 2018-03-19 18:23:54 -07:00
4b92574feb Fix corner cases exposed by zstreamtest 2018-03-19 17:54:04 -07:00