mirror of https://github.com/facebook/zstd.git synced 2025-07-06 15:45:37 +02:00
Commit Graph

61 Commits

Author SHA1 Message Date
58adb1059f extended exact window size to greedy/lazy modes 2019-05-31 16:08:48 -07:00
327cf6fac1 nextToUpdate3 does not need to be maintained outside of zstd_opt.c
It's re-synchronized with nextToUpdate at the beginning of each block.
It only needs to be tracked from within the zstd_opt block parser.

Made the logic clear, so that no code outside zstd_opt.c tries to maintain this variable.

An even better solution would be to make nextToUpdate3
an internal variable of ZSTD_compressBlock_opt_generic().
That would make it possible to remove it from ZSTD_matchState_t,
thus restricting its visibility to only where it's actually useful.

This would require deeper changes though,
since the matchState is the natural structure to transport parameters into and inside the parser.
2019-05-28 15:26:52 -07:00
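A minimal sketch of the suggestion above, using simplified stand-in types (not zstd's actual ZSTD_matchState_t layout): the opt parser keeps nextToUpdate3 as a purely local variable, re-synchronized from nextToUpdate at block start, so nothing outside the parser ever has to maintain it.

    /* Illustrative sketch only; everything besides nextToUpdate is a stand-in. */
    typedef struct {
        unsigned nextToUpdate;   /* first index not yet inserted into the main tables */
        /* ... other match-state fields elided ... */
    } MatchStateSketch;

    static void compressBlock_opt_sketch(MatchStateSketch* ms)
    {
        /* local copy: the hash3 table never needs to lag across blocks */
        unsigned nextToUpdate3 = ms->nextToUpdate;

        /* ... the block parsing loop advances nextToUpdate3 as the hash3
         * table is filled, without ever writing it back into ms ... */
        (void)nextToUpdate3;
    }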
e874dacc08 changed searchLength into minMatch
refactored all relevant APIs and calls
for consistency.
2018-11-20 14:56:07 -08:00
626040ab53 changed PREFETCH() macro into PREFETCH_L2()
which is a more accurate name
2018-11-12 17:05:32 -08:00
b8235be865 Avoid Searching Dictionary in ZSTD_btlazy2 When an Optimal Match is Found
Bailing here is important to avoid reading past the end of the input buffer.
2018-10-08 15:59:32 -07:00
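A hedged sketch of the bail-out idea described above (the condition and names are illustrative, not the exact zstd code): once the best match already reaches the end of the input, a dictionary search could only try to extend it further and read past iend, so it is skipped entirely.

    #include <stddef.h>

    /* Illustrative only: skip the dictionary search when nothing longer can exist. */
    static size_t searchWithDictSketch(const unsigned char* ip,
                                       const unsigned char* iend,
                                       size_t bestLength)
    {
        if (ip + bestLength >= iend)   /* hypothetical "optimal match" condition */
            return bestLength;         /* bail: avoids reading past the input buffer */

        /* ... otherwise, probe the dictionary match state for longer candidates ... */
        return bestLength;
    }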
d121b3451c Clean Up Debug Log Statements 2018-10-08 15:59:32 -07:00
08da9ad316 Remove Unused Variable 2018-10-08 15:59:32 -07:00
22ddf3523a fixed msan warning
on the btlazy2 strategy with dictAttach
2018-10-02 18:20:20 -07:00
c2369fedc4 Restore Passing CParams to ZSTD_insertAndFindFirstIndex_internal 2018-09-28 17:12:54 -07:00
e4ac4a0f16 Support Split Logs in ZSTD_greedy..ZSTD_btlazy2 2018-09-28 17:12:54 -07:00
14764de49f Stop Separately Passing CParams in ZSTD_lazy Internal Functions 2018-09-28 17:12:53 -07:00
dcdf437fed Also Remove CParams from Table Filling Functions' Args 2018-09-28 17:10:42 -07:00
6cb2454646 Remove CParams from Block Compressor Functions' Args 2018-09-28 17:10:42 -07:00
f2d6db45cd [zstd] Add -Wmissing-prototypes 2018-09-27 15:24:48 -07:00
3caba150c6 Fix dmsBtLow Test 2018-06-21 15:53:40 -04:00
0c654d22c8 Force Inline BtFindBestMatch 2018-06-14 14:54:39 -04:00
0551de4b5a Search Dict for Matches 2018-06-13 16:06:28 -04:00
d53200a846 Fix Cast Warning 2018-06-13 14:58:36 -04:00
b82063b266 Extend Dictionary Matches Backwards 2018-06-13 14:58:36 -04:00
2162aa9f18 Do Not Inline DMS Search Function 2018-06-13 14:58:36 -04:00
338bede9b5 Also Implement Depth Repcode Checks 2018-06-13 14:58:36 -04:00
555ab9f8cf Apply Match Continuation Bug Fix 2018-06-13 14:58:36 -04:00
6204b6d592 Check Dict Match State in ZSTD_HcFindBestMatch_generic 2018-06-13 14:58:36 -04:00
2e93736a77 Remove Pre-Existing Repcode Check 2018-06-13 14:58:36 -04:00
3b82a23a35 Second Repcode Check 2018-06-13 14:58:36 -04:00
a2a24bebec First Repcode Check 2018-06-13 14:58:36 -04:00
f74c2cd673 Disallow Too-Long Repcodes When Using an Attached Dict 2018-06-13 14:58:36 -04:00
c14db94450 Rename base -> prefixLowest 2018-06-13 14:58:36 -04:00
5d90708a0a Go Back to Separate Intermediate Functions for Different Dict Modes 2018-06-13 14:58:36 -04:00
f84fc63a43 Further Templatize Intermediate Functions on dictMode 2018-06-13 14:58:36 -04:00
529d3a5acd Convert Existing U32 extDict Vars to ZSTD_dictMode Enums 2018-06-13 14:58:36 -04:00
90cfc799e5 Add _dictMatchState Stubs for ZSTD_lazy Functions 2018-06-13 14:58:36 -04:00
a85ecb32bd Add dictMode Param to ZSTD_compressBlock_lazy_generic 2018-06-13 14:58:36 -04:00
7e5e226cbf Split the window state into substructure 2018-02-26 13:29:57 -08:00
9945e60ac4 Merge branch 'dev' into flexibleLevel 2018-02-10 11:54:49 -08:00
de68c2ff10 Merged ZSTD_preserveUnsortedMark() into ZSTD_reduceIndex()
as it's faster, due to one memory scan instead of two
(confirmed by microbenchmark).

Note : as ZSTD_reduceIndex() is rarely invoked,
it does not translate into a visible gain.
Consider it an exercise in auto-vectorization and micro-benchmarking.
2018-02-07 14:22:35 -08:00
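A sketch of the single-scan idea described above, with stand-in names and sentinel value (not zstd's actual constants): the mark-preservation check and the index reduction are fused into one pass over the table instead of two.

    typedef unsigned U32;
    #define UNSORTED_MARK_SKETCH 1   /* hypothetical sentinel for unsorted entries */

    /* Illustrative only: one memory scan does both jobs. */
    static void reduceTableSketch(U32* table, U32 size, U32 reducerValue)
    {
        U32 i;
        for (i = 0; i < size; i++) {
            if (table[i] == UNSORTED_MARK_SKETCH)
                continue;                         /* preserve the mark as-is */
            table[i] = (table[i] < reducerValue) ? 0 : table[i] - reducerValue;
        }
    }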
0170cf9a7a minor : modified ZSTD_preserveUnsortedMark() to be more vectorization friendly 2018-02-05 11:46:02 -08:00
5188749e1c ensure compression parameters are updated when only compression level is changed 2018-02-02 16:31:20 -08:00
887cd4e35e Split ZSTD_CCtx into smaller sub-structures 2018-01-16 11:17:50 -08:00
b9a14900ff changed function name to ZSTD_DUBT_findBestMatch() 2018-01-11 12:38:31 -08:00
b17fb488b0 fixed msan test
a pointer calculation was wrong in a corner case
2018-01-06 20:50:36 +01:00
a927fae2a1 fixed ZSTD_reduceIndex()
following suggestions from @terrelln.
Also added some comments to present logic behind ZSTD_preserveUnsortedMark().
2018-01-06 12:31:26 +01:00
f597f55675 improved btlazy2 : list of unsorted candidates can reach extDict
It used to stop on reaching extDict, for simplification.
As a consequence, there was a small loss of performance each time the round buffer would restart from the beginning.
It's not a large difference though, just several hundred bytes on silesia.
This patch fixes it.
2017-12-30 15:12:59 +01:00
eb52e2f45e simplify ZSTD_preserveUnsortedMark() implementation
since no compiler attempts to auto-vectorize it.
2017-12-30 11:13:52 +01:00
02f64ef955 btlazy2: fixed interaction between unsortedMark and reduceTable 2017-12-29 19:08:51 +01:00
64482c2c97 fixed bug in dubt
the chain of unsorted candidates could grow beyond lowLimit.
2017-12-29 17:04:37 +01:00
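A hedged sketch of the fix described above (the table layout and names are simplified stand-ins): while walking the chain of unsorted candidates backwards, stop as soon as an index falls below the window's lowLimit, so positions that are no longer valid are never visited.

    typedef unsigned U32;

    /* Illustrative only: count pending unsorted candidates without leaving the window. */
    static U32 countUnsortedSketch(const U32* bt, U32 btMask,
                                   U32 matchIndex, U32 lowLimit, U32 unsortedMark)
    {
        U32 nbCandidates = 0;
        while ((matchIndex > lowLimit)                         /* stay inside the valid window */
            && (bt[2 * (matchIndex & btMask) + 1] == unsortedMark)) {
            matchIndex = bt[2 * (matchIndex & btMask)];        /* previous candidate in the chain */
            nbCandidates++;
        }
        return nbCandidates;
    }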
5235d8d6ba first implementation of delayed update for btlazy2
This is a pretty nice speed win.

The new strategy consists of stacking new candidates as if the tree were a hash chain.
Then, only if there is a need to actually consult the chain, they are batch-updated,
before starting the match search itself.
This is supposed to be beneficial when skipping positions,
which happens a lot when using a lazy strategy.

The baseline performance for btlazy2 on my laptop is :
15#calgary.tar       :   3265536 ->    955985 (3.416),  7.06 MB/s , 618.0 MB/s
15#enwik7            :  10000000 ->   3067341 (3.260),  4.65 MB/s , 521.2 MB/s
15#silesia.tar       : 211984896 ->  58095131 (3.649),  6.20 MB/s , 682.4 MB/s
(only level 15 remains for btlazy2, as this strategy is squeezed between lazy2 and btopt)

After this patch, and keeping all parameters identical,
speed is increased by a pretty good margin (+30-50%),
but compression ratio suffers a bit :
15#calgary.tar       :   3265536 ->    958060 (3.408),  9.12 MB/s , 621.1 MB/s
15#enwik7            :  10000000 ->   3078318 (3.249),  6.37 MB/s , 525.1 MB/s
15#silesia.tar       : 211984896 ->  58444111 (3.627),  9.89 MB/s , 680.4 MB/s

That's because I kept `1<<searchLog` as the maximum number of candidates to update.
But for a hash chain, this represents the total number of candidates in the chain,
while for the binary tree, it represents the maximum search depth.
Keep in mind that a lot of candidates won't even be visited in the btree,
since they are filtered out by the binary sort.

As a consequence, in the new implementation,
the effective depth of the binary tree is substantially reduced.

To compensate, it's enough to increase the `searchLog` value.
Here is the result after adding just +1 to searchLog (level 15 setting in this patch):
15#calgary.tar       :   3265536 ->    956311 (3.415),  8.32 MB/s , 611.4 MB/s
15#enwik7            :  10000000 ->   3067655 (3.260),  5.43 MB/s , 535.5 MB/s
15#silesia.tar       : 211984896 ->  58113144 (3.648),  8.35 MB/s , 679.3 MB/s

i.e., almost the same compression ratio as before,
but with a noticeable speed increase (+20-30%).

This modification makes btlazy2 more competitive.
A new round of paramgrill will be necessary to determine which levels are impacted and could adopt the new strategy.
2017-12-28 16:58:57 +01:00
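A hedged sketch of the delayed-update strategy described above (types, names, and the sentinel are illustrative, not zstd's actual code): skipped positions are merely stacked like hash-chain entries and tagged as unsorted; the real binary-tree insertion happens in batch, only when a search actually needs to consult the tree.

    typedef unsigned U32;
    #define UNSORTED_MARK_SKETCH 1   /* hypothetical sentinel */

    typedef struct {
        U32* hashTable;   /* hash -> most recent candidate index */
        U32* bt;          /* 2 slots per position: {previous candidate, sort status} */
        U32  btMask;
    } TreeSketch;

    /* Cheap per-position work: behaves like a hash-chain insertion. */
    static void stackCandidateSketch(TreeSketch* t, U32 h, U32 curr)
    {
        t->bt[2 * (curr & t->btMask)]     = t->hashTable[h];   /* link previous head */
        t->bt[2 * (curr & t->btMask) + 1] = UNSORTED_MARK_SKETCH;
        t->hashTable[h] = curr;                                /* current position becomes head */
    }

    /* Deferred work: right before a search, insert all pending candidates
     * into the sorted binary tree (the insertion itself is elided here). */
    static void batchInsertSketch(TreeSketch* t, U32 h, U32 lowLimit)
    {
        U32 idx = t->hashTable[h];
        while ((idx > lowLimit)
            && (t->bt[2 * (idx & t->btMask) + 1] == UNSORTED_MARK_SKETCH)) {
            U32 const prev = t->bt[2 * (idx & t->btMask)];
            /* ... insert idx into the binary tree here ... */
            t->bt[2 * (idx & t->btMask) + 1] = 0;   /* mark as sorted */
            idx = prev;
        }
    }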
899f2a29f6 strategy ZSTD_btopt pinned to (0) variant (faster one) 2017-11-20 11:53:20 -08:00
3f457264d1 slightly improved compression speed 2017-11-19 14:40:21 -08:00
42c1e64270 slightly improved ratio at -22
merging of the repcode search into btsearch introduced a small compression ratio regression at max level :
1.3.2 : 52728769
after repMerge patch : 52760789 (+32020)

A few minor changes have produced this difference.
They can be hard to spot.

This patch buys back about half of the difference,
by no longer inserting the position into hc3 when a long match is found there.
It feels strangely counter-intuitive, but works :
after this patch : 52742555 (-18234)
2017-11-19 14:00:55 -08:00
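A minimal sketch of the tweak described above, with stand-in names and a hypothetical threshold: after probing the 3-byte hash table, the current position is only recorded there if the match found was not already long, so the older entry stays reachable for later positions.

    typedef unsigned U32;

    /* Illustrative only: conditional hc3 update. */
    static void maybeInsertHash3Sketch(U32* hashTable3, U32 h3, U32 curr,
                                       U32 foundLength, U32 longMatchThreshold)
    {
        if (foundLength < longMatchThreshold)
            hashTable3[h3] = curr;   /* usual behavior: record current position */
        /* else: a long match was found there; keep the previous entry */
    }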