1
0
mirror of https://github.com/facebook/zstd.git synced 2025-07-05 15:09:03 +02:00
Commit Graph

698 Commits

Author SHA1 Message Date
a95e9e80d1 adding some debug functions to observe statistics 2018-05-18 14:09:42 -07:00
a243020d37 slightly improved weight calculation
translating into a tiny compression ratio improvement
2018-05-17 11:19:44 -07:00
63eeeaa1dd update table levels for blocks <= 16K
also : allow hlog to be slighly larger than windowlog,
as it's apparently good for both speed and compression ratio.
2018-05-16 16:13:37 -07:00
f372ffc64d Merge pull request #1127 from facebook/staticDictCost
Improved optimal parser with dictionary
2018-05-14 17:45:50 -07:00
c9227ee16b update table for 128 KB blocks 2018-05-13 17:15:07 -07:00
b4250489cf update compression levels for large inputs 2018-05-13 01:53:38 -07:00
99ddca43a6 fixed wrong assertion
base can actually overflow
2018-05-10 19:48:09 -07:00
1a26ec6e8d opt: init statistics from dictionary
instead of starting from fake "default" statistics.
2018-05-10 17:59:12 -07:00
ac6105463a opt: minor improvements to log traces
slight improvement when using fractional-bit evaluation (opt:dictionay)
2018-05-09 15:46:11 -07:00
4d5bd32a00 added traces to look at symbol costs
evaluation looks correct.
2018-05-09 12:00:12 -07:00
338f738c24 pass entropy tables to optimal parser
for proper estimation of symbol's weights
when using dictionary compression.

Note : using only huffman costs is not good enough,
presumably because sequence symbol costs are incorrect.
2018-05-08 15:37:06 -07:00
a155061328 minor code refactor for readability
removed some useless operations from optimal parser
(should not change performance, too small a difference)
2018-05-08 12:32:44 -07:00
ad4524d605 fix ZSTD_compressBlock() associated with CDict
reported by @let-def.

It's actually a bug in ZSTD_compressBegin_usingCDict()
which would pass a wrong pledgedSrcSize value (0 instead of ZSTD_CONTENTSIZE_UNKNOWN)
resulting in wrong window size, resulting in downsized seqStore,
resulting in segfault when writing into the seqStore later in the process.

Added a test in fuzzer to cover this use case (fails before the patch).
2018-05-07 12:54:13 -07:00
ca77822ddf Fix parameter adjustment with dictionary
The new advanced API basically set `requestedParams = appliedParams` when
using a dictionary. This halted all parameter adjustment, which can hurt
compression ratio if, for example, the window log is small for the first
call, but the rest of the files are large.

This patch fixes the bug, and checks that the `requestedParams` don't change
in the new advanced API when using a dictionary, and generally in the fuzzer.
2018-04-25 16:32:29 -07:00
c0987986e5 Only reset CDict in ZSTD_CCtx_resetParameters() 2018-04-13 11:26:40 -07:00
9f76eebd17 Add ZSTD_CCtx_resetParameters() function
* Fix docs for `ZSTD_CCtx_reset()`.
* Add `ZSTD_CCtx_resetParameters()`.

Fixes #1094.
2018-04-12 16:54:07 -07:00
3c3f59e68f Enforce pledgeSrcSize whenever known (#1106)
The test fails before the patch and passes after.

Fixes #1095.
2018-04-12 16:02:03 -07:00
280a236e9e Add ZSTD_CCtx(Param)?_getParameter() function
Closes #1096.
2018-04-12 11:50:12 -07:00
295ab0dbfa Only load extra table positions for CDicts
Zstdmt uses prefixes to load the overlap between segments. Loading extra
positions makes compression non-deterministic, depending on the previous
job the context was used for. Since loading extra position takes extra
time as well, only do it when creating a `ZSTD_CDict`.

Fixes #1077.
2018-04-02 14:41:30 -07:00
153bc1c004 removed limit ZSTD_TARGETLENGTH_MAX
this makes it possible to specify extremely large negative compression levels,
achieving the side effect as "no compression".

It will also be possible to define larger targetlength for ultra compression mode.

There is no adverse side effect due to removing this limit.
2018-03-21 15:50:05 -07:00
a99c4a3621 Merge branch 'dev' into advancedDecompress 2018-03-21 06:08:28 -07:00
87b0cf05bd Merge pull request #1057 from facebook/lrmSettings
LRM parameters
2018-03-21 05:59:39 -07:00
d1bf609abf Merge pull request #1059 from terrelln/mt-ldm
Integrate ldm with zstdmt
2018-03-20 17:50:20 -07:00
878728dc26 fixed several comments by @terrelln 2018-03-20 16:35:14 -07:00
e1c52faace Merge pull request #1060 from facebook/compressImpl
merge bmi2 implementation of encodeSequence into zstd_compress.c
2018-03-20 16:19:42 -07:00
6873fec658 changed dictMore for dictContentType
which seems clearer to describe what the variable/argument is about.
2018-03-20 15:13:14 -07:00
136b9e2392 Fix external sequence corner cases
* Clear external sequences when we reset the `ZSTD_CCtx`.
* Skip external sequences when a block is too small to compress.
2018-03-20 14:50:28 -07:00
2ed5af0766 merge bmi2 implementation of encodeSequence into zstd_compress.c 2018-03-19 19:10:31 -07:00
c8b3d389fd updated CCtxParams API
to respect naming convention :
ZSTD_CCtxParams_*()
2018-03-19 15:07:26 -07:00
6f4d0778a5 make it possible to express compression parameters in any order 2018-03-19 14:41:23 -07:00
9618c0c804 make it possible to specify LDM parameters in any order 2018-03-19 11:07:04 -07:00
4af1fafeb8 Restore setting loadedDictEnd
Setting `loadedDictEnd` was accidently removed from `ZSTD_loadDictionaryContent()`,
which means that dictionary compression will only be able to reference the parts of
the dictionary within the window. The spec allows us to reference the entire
dictionary so long as even one byte is in the window.

`ZSTD_enforceMaxDist()` incorrectly always allowed offsets up to `loadedDictEnd`
beyond the window, even once the dictionary was out of range.

When overflow protection kicked in, the check `current > loadedDictEnd + maxDist`
is incorrect if `loadedDictEnd` isn't reset back to zero. `current` could be reset
below the value, which would incorrectly allow references beyond the window. This
bug is present in `master`, but is very hard to trigger, since it requires both
dictionaries and data which triggers overflow correction.
2018-03-16 14:54:06 -07:00
192542b63c Merge pull request #1047 from facebook/hufCompress
removed huf_compress_impl.h
2018-03-15 14:14:03 -07:00
1908c92c46 Merge remote-tracking branch 'upstream/dev' into extern-seq
* upstream/dev:
  Fix overflow protection with wlog=31
2018-03-14 17:26:31 -07:00
a909c293c6 Merge branch 'dev' into hufCompress 2018-03-14 16:11:25 -07:00
a9a6dcba63 Expose reference external sequence API
* Expose the reference external sequences API for zstdmt.
  Allows external sequences of any length, which get split when necessary.
* Reset the LDM window when the context is reset.
* Store the maximum number of LDM sequences.
* Sequence generation now returns the number of last literals.
* Fix sequence generation to not throw out the last literals when blocks of
  more than 1 MB are encountered.
2018-03-14 12:29:31 -07:00
33fb966e56 Fix overflow protection with wlog=31
The overflow protection is broken when the window log is `> (3U << 29)`, so 31.
It doesn't work when `current` isn't around `1U << windowLog` ahead of `lowLimit`,
and the the assertion `current > newCurrent` fails. This happens when the same
context is used many times over, but with a large window log, like in zstdmt.

Fix it by triggering correction based on `nextSrc - base` instead of `lowLimit`.

The added test fails before the patch, and passes after.
2018-03-14 11:45:44 -07:00
50f763ec44 fixed several comments are underlined by @terrelln 2018-03-13 14:23:14 -07:00
a95a88af57 removed huf_compress_impl.h
re-imported all functions inside huf_compress.c
for easier source editing.

Also updated a bunch of code comments
for clarification.
2018-03-13 14:14:05 -07:00
2291b85a1e changed ZSTD_p_literalCompression into ZSTD_p_compressLiterals
prefer verb+object construction
2018-03-12 11:44:10 -07:00
6a9b41b731 create command --fast[=#]
access negative compression levels from command line
for both compression and benchmark modes.

also : ensure proper propagation of parameters
through ZSTD_compress_generic() interface.

added relevant cli tests.
2018-03-11 20:01:23 -07:00
a146ee04ae added negative compression levels
negative compression level trade compression ratio for more compression speed.
They turn off huffman compression of literals,
and use row 0 as baseline with a stepSize = -cLevel.

added associated test in fuzzer

also added : new advanced parameter ZSTD_p_literalCompression
2018-03-11 05:21:53 -07:00
facc09aa03 minor compression level adaptation
level 12 compresses slightly more and faster
due to better btlazy2 mode
2018-03-11 03:06:52 -07:00
0a0e64c641 LDM manages its own window round buffer 2018-02-27 12:13:23 -08:00
7e5e226cbf Split the window state into substructure 2018-02-26 13:29:57 -08:00
af866b3a58 Split block compresser out of long range matcher
* `ZSTD_ldm_generateSequences()` generates the LDM sequences and
  stores them in a table. It should work with any chunk size, but
  is currently only called one block at a time.
* `ZSTD_ldm_blockCompress()` emits the pre-defined sequences, and
  instead of encoding the literals directly, it passes them to a
  secondary block compressor. The code to handle chunk sizes greater
  than the block size is currently commented out, since it is unused.
  The next PR will uncomment exercise this code.
* During optimal parsing, ensure LDM `minMatchLength` is at least
  `targetLength`. Also don't emit repcode matches in the LDM block
  compressor. Enabling the LDM with the optimal parser now actually improves
  the compression ratio.
* The compression ratio is very similar to before. It is very slightly
  different, because the repcode handling is slightly different. If I remove
  immediate repcode checking in both branches the compressed size is exactly
  the same.
* The speed looks to be the same or better than before.

Up Next (in a separate PR)
--------------------------

Allow sequence generation to happen prior to compression, and produce more
than a block worth of sequences. Expose some API for zstdmt to consume.
This will test out some currently untested code in
`ZSTD_ldm_blockCompress()`.
2018-02-22 15:18:41 -08:00
b58f01537e [compress] Support BMI2 2018-02-14 19:20:32 -08:00
5f7495371e Merge branch 'dev' into fasterDec 2018-02-10 14:24:44 -08:00
9945e60ac4 Merge branch 'dev' into flexibleLevel 2018-02-10 11:54:49 -08:00
c72091556b fixed minor nit as per @terrelln's comments 2018-02-09 09:46:08 -08:00