1
0
mirror of https://github.com/facebook/zstd.git synced 2025-07-04 14:48:45 +02:00
Commit Graph

104 Commits

Author SHA1 Message Date
54144285fd small speed improvement for strategy fast
gcc 9.3.0 :
kennedy : 459 -> 466
silesia : 360 -> 365
enwik8  : 267 -> 269

clang 10.0.0 :
kennedy : 436 -> 441
silesia : 364 -> 366
enwik8  : 271 -> 272
2020-05-07 06:15:58 -07:00
5fcbc484c8 Merge pull request #2040 from caoyzh/dev-2
Optimize by prefetching on aarch64
2020-04-08 13:14:47 -07:00
ac58c8d720 Fix copyright and license lines
* All copyright lines now have -2020 instead of -present
* All copyright lines include "Facebook, Inc"
* All licenses are now standardized

The copyright in `threading.{h,c}` is not changed because it comes from
zstdmt.

The copyright and license of `divsufsort.{h,c}` is not changed.
2020-03-26 17:02:06 -07:00
7201980650 Optimize by prefetching on aarch64 2020-03-14 15:25:59 +08:00
43fc88f443 Adding comment and remvoing ivdep 2020-03-10 14:57:27 -05:00
4c72a1a9c2 adding vector to main loop 2020-03-05 09:55:38 -08:00
ddab2a94e8 Pass iend into ZSTD_storeSeq() to allow ZSTD_wildcopy() 2019-09-20 00:56:20 -07:00
243200e5bf minor refactor of ZSTD_fast
- reduced variables lifetime
- more accurate code comments
2019-09-17 14:02:57 -07:00
facbe8b2c2 factored the logic selecting lowest match index
as suggested by @terrelln
2019-08-05 15:18:43 +02:00
98692c2838 fixed compression ratio regression when dictionary-compressing medium-size inputs at levels 1-3 2019-08-01 15:58:17 +02:00
a30febaeeb Made fast strategy compatible with new offset validation strategy
fast mode does the same thing as before :
it pre-emptively invalidates any index that could lead to offset > maxDistance.
It's supposed to help speed.

But this logic is performed inside zstd_fast,
so that other strategies can select a different behavior.
2019-05-31 16:34:55 -07:00
95624b77e4 [libzstd] Speed up single segment zstd_fast by 5%
This PR is based on top of PR #1563.

The optimization is to process two input pointers per loop.
It is based on ideas from [igzip] level 1, and talking to @gbtucker.

| Platform                | Silesia     | Enwik8 |
|-------------------------|-------------|--------|
| OSX clang-10            | +5.3%       | +5.4%  |
| i9 5 GHz gcc-8          | +6.6%       | +6.6%  |
| i9 5 GHz clang-7        | +8.0%       | +8.0%  |
| Skylake 2.4 GHz gcc-4.8 | +6.3%       | +7.9%  |
| Skylake 2.4 GHz clang-7 | +6.2%       | +7.5%  |

Testing on all Silesia files on my Intel i9-9900k with gcc-8

| Silesia File | Ratio Change | Speed Change |
|--------------|--------------|--------------|
| silesia.tar  | +0.17%       | +6.6%        |
| dickens      | +0.25%       | +7.0%        |
| mozilla      | +0.02%       | +6.8%        |
| mr           | -0.30%       | +10.9%       |
| nci          | +1.28%       | +4.5%        |
| ooffice      | -0.35%       | +10.7%       |
| osdb         | +0.75%       | +9.8%        |
| reymont      | +0.65%       | +4.6%        |
| samba        | +0.70%       | +5.9%        |
| sao          | -0.01%       | +14.0%       |
| webster      | +0.30%       | +5.5%        |
| xml          | +0.92%       | +5.3%        |
| x-ray        | -0.00%       | +1.4%        |

Same tests on Calgary. For brevity, I've only included files
where compression ratio regressed or was much better.

| Calgary File | Ratio Change | Speed Change |
|--------------|--------------|--------------|
| calgary.tar  | +0.30%       | +7.1%        |
| geo          | -0.14%       | +25.0%       |
| obj1         | -0.46%       | +15.2%       |
| obj2         | -0.18%       | +6.0%        |
| pic          | +1.80%       | +9.3%        |
| trans        | -0.35%       | +5.5%        |

We gain 0.1% of compression ratio on Silesia.
We gain 0.3% of compression ratio on enwik8.
I also tested on the GitHub and hg-commands datasets without a dictionary,
and we gain a small amount of compression ratio on each, as well as speed.

I tested the negative compression levels on Silesia on my
Intel i9-9900k with gcc-8:

| Level | Ratio Change | Speed Change |
|-------|--------------|--------------|
| -1    | +0.13%       | +6.4%        |
| -2    | +4.6%        | -1.5%        |
| -3    | +7.5%        | -4.8%        |
| -4    | +8.5%        | -6.9%        |
| -5    | +9.1%        | -9.1%        |

Roughly, the negative levels now scale half as quickly. E.g. the new
level 16 is roughly equivalent to the old level 8, but a bit quicker
and smaller.  If you don't think this is the right trade off, we can
change it to multiply the step size by 2, instead of adding 1. I think
this makes sense, because it gives a bit slower ratio decay.

[igzip]: https://github.com/01org/isa-l/tree/master/igzip
2019-04-02 19:02:50 -07:00
f00407b640 Split out zstd_fast dict match state function 2019-03-29 10:39:16 -06:00
ed2fb6bd57 fixed : better error message when dictionary missing
during benchmark.
Also : refactored ZSTD_fillHashTable(),
just for readability (it does the same thing)
2018-12-20 17:20:07 -08:00
e874dacc08 changed searchLength into minMatch
refactored all relevant API and calls
for consistency.
2018-11-20 14:56:07 -08:00
bad74c4781 Use Working Ctx Logs when not in DMS Mode
We pre-hash the ptr for the dict match state sometimes. When that actually
happens, a hashlog of 0 can produce undefined behavior (right shift a long
long by 64). Only applies to unoptimized compilations, since when
optimizations are applied, those hash operations are dropped when we're not
actually in dms mode.
2018-09-28 17:12:54 -07:00
fe96e98f81 Support a Separate Hash Log in ZSTD_fast 2018-09-28 17:12:54 -07:00
bc880ebe8f Stop Passing in hashLog and stepSize to ZSTD_compressBlock_fast_generic 2018-09-28 17:12:54 -07:00
dcdf437fed Also Remove CParams from Table Filling Functions' Args 2018-09-28 17:10:42 -07:00
6cb2454646 Remove CParams from Block Compressor Functions' Args 2018-09-28 17:10:42 -07:00
f98c69d77c fix : huge (>4GB) stream of blocks
experimental function ZSTD_compressBlock() is designed for very small data in mind,
for situation where saving the ~12 bytes of frame header can actually make a difference.

Some systems though may have to deal with small and large data entangled.
If it's larger than a block (> 128KB), compressBlock() cannot compress them in one round.

That's why it's possible to compress in multiple rounds.
This is a chain of compressed blocks.

Some users push this capability to the limit, encoding gigantic chain of blocks.
On crossing the 4GB limit, some internal overflow occurs.

This fix moves the overflow correction mechanism higher in the call chain,
so that it's applied also to gigantic chains of blocks.

Added a test case in fuzzer.c, which crashes before the fix, and pass now.
2018-09-26 14:24:28 -07:00
b048af5999 ZSTD_fast: Don't Search Dict Context When Mismatch Was Found 2018-09-14 15:23:35 -07:00
c2c47e24e0 support targetlen==0 with strategy==ZSTD_fast
to mean "normal compression",
targetlen >= 1 now means "disable huffman compression of literals"
2018-06-07 15:49:01 -07:00
357c648c3f changed a few variable names
to unify naming convention
2018-06-04 17:10:50 -07:00
2108decb41 Fixed a nasty corruption bug
recently introduce into the new dictionary mode.
The bug could be reproduced with this command :
./zstreamtest -v --opaqueapi --no-big-tests -s4092 -t639

error was in function ZSTD_count_2segments() :
the beginning of the 2nd segment corresponds to prefixStart
and not the beginning of the current block (istart == src).
This would result in comparing the wrong byte.
2018-06-01 18:54:34 -07:00
d9c7e67125 Assert that Dict and Current Window are Adjacent in Index Space 2018-05-23 17:53:03 -04:00
298d24fa57 Make loadedDictEnd an Index, not the Dict Len 2018-05-23 17:53:03 -04:00
7ef85e0618 Fixes in re Comments 2018-05-23 17:53:03 -04:00
9c92223468 Avoid Undefined Behavior in Match Ptr Calculation 2018-05-23 17:53:03 -04:00
95bdf20a87 Moar Renames 2018-05-23 17:53:03 -04:00
b05ae9b608 Refine ip Initialization to Avoid ARM Weirdness 2018-05-23 17:53:03 -04:00
1a7b34ef28 Use New Index Invariant to Simplify Conditionals 2018-05-23 17:53:03 -04:00
d005e5daf4 Whitespace Fix 2018-05-23 17:53:03 -04:00
154eb09419 Switch to Original Match Calc for noDict Repcode Check 2018-05-23 17:53:03 -04:00
191fc74a51 Rename 'hasDict' to 'dictMode' 2018-05-23 17:53:03 -04:00
ae4fcf7816 Respond to PR Comments; Formatting/Style/Lint Fixes 2018-05-23 17:53:03 -04:00
ca26cecc7a Rename and Reformat 2018-05-23 17:53:03 -04:00
c31ee3c7f8 Fix Rep Code Initialization 2018-05-23 17:53:03 -04:00
b67196f30d Coalesce hasDictMatchState and extDict Checks into One Enum and Rename Stuff 2018-05-23 17:53:03 -04:00
265c2869d1 Split Wrapper Functions to Cause Inlining 2018-05-23 17:53:03 -04:00
6929964d65 Add bounds check in repcode tests 2018-05-23 17:53:03 -04:00
70a537d1d7 Initial Repcode Check Support for Ext Dict Ctx 2018-05-23 17:53:03 -04:00
8d24ff0353 Preliminary Support in ZSTD_compressBlock_fast_generic() for Ext Dict Ctx 2018-05-23 17:53:03 -04:00
295ab0dbfa Only load extra table positions for CDicts
Zstdmt uses prefixes to load the overlap between segments. Loading extra
positions makes compression non-deterministic, depending on the previous
job the context was used for. Since loading extra position takes extra
time as well, only do it when creating a `ZSTD_CDict`.

Fixes #1077.
2018-04-02 14:41:30 -07:00
a146ee04ae added negative compression levels
negative compression level trade compression ratio for more compression speed.
They turn off huffman compression of literals,
and use row 0 as baseline with a stepSize = -cLevel.

added associated test in fuzzer

also added : new advanced parameter ZSTD_p_literalCompression
2018-03-11 05:21:53 -07:00
7e5e226cbf Split the window state into substructure 2018-02-26 13:29:57 -08:00
5188749e1c ensure compression parameters are updated when only compression level is changed 2018-02-02 16:31:20 -08:00
887cd4e35e Split ZSTD_CCtx into smaller sub-structures 2018-01-16 11:17:50 -08:00
9a211d1f05 Load more dictionary positions into table if empty
If the hash table is empty load positions into the hash table
that we would otherwise skip.

| Level | Data Set     | Improvement |
|-------|--------------|-------------|
| 1     | github       | 0.44%       |
| 1     | hg-changelog | 0.13%       |
| 1     | hg-commands  | 1.28%       |
| 1     | hg-manifest  | 0.70%       |
| 3     | github       | 0.74%       |
| 3     | hg-changelog | 0.87%       |
| 3     | hg-commands  | 1.74%       |
| 3     | hg-manifest  | 0.23%       |
2018-01-12 16:17:22 -08:00
ee441d5d2b renamed zstd_compress.h into zstd_compress_internal.h
to emphasize the fact that all definitions it contains
must remain private, accross lib/compress modules.
2017-11-07 16:15:23 -08:00