mirror of https://github.com/facebook/zstd.git synced 2025-07-16 20:24:32 +02:00

4801 Commits

SHA1 Message Date
0a0e64c641 LDM manages its own window round buffer 2018-02-27 12:13:23 -08:00
d18d43aaf9 Merge pull request #1024 from terrelln/window-split
Split the window state into substructure
2018-02-26 17:18:33 -08:00
6b88d592fd Reduce ZSTD_CHAINLOG_MAX to 29 in 32-bit mode 2018-02-26 13:30:24 -08:00
7e5e226cbf Split the window state into substructure 2018-02-26 13:29:57 -08:00
50bc2ce95e Merge pull request #1021 from terrelln/lrm-split
Split block compressor out of long range matcher
2018-02-23 17:36:51 -08:00
653383f74a minor nit from Mac XCode 2018-02-22 15:44:26 -08:00
7e2bf4ebad Remove long range matcher immediate repcode check
The compression ratio gets about 0.01% worse on the files I tested, but the
code is much simpler.
2018-02-22 15:18:47 -08:00
af866b3a58 Split block compressor out of long range matcher
* `ZSTD_ldm_generateSequences()` generates the LDM sequences and
  stores them in a table. It should work with any chunk size, but
  is currently only called one block at a time.
* `ZSTD_ldm_blockCompress()` emits the pre-defined sequences, and
  instead of encoding the literals directly, it passes them to a
  secondary block compressor. The code to handle chunk sizes greater
  than the block size is currently commented out, since it is unused.
  The next PR will uncomment and exercise this code.
* During optimal parsing, ensure LDM `minMatchLength` is at least
  `targetLength`. Also don't emit repcode matches in the LDM block
  compressor. Enabling the LDM with the optimal parser now actually improves
  the compression ratio.
* The compression ratio is very similar to before. It is very slightly
  different, because the repcode handling is slightly different. If I remove
  immediate repcode checking in both branches the compressed size is exactly
  the same.
* The speed looks to be the same or better than before.

Up Next (in a separate PR)
--------------------------

Allow sequence generation to happen prior to compression, and produce more
than a block worth of sequences. Expose some API for zstdmt to consume.
This will test out some currently untested code in
`ZSTD_ldm_blockCompress()`.
2018-02-22 15:18:41 -08:00
4fb071ec3c Merge pull request #1022 from facebook/bmi2IntoC
Implemented BMI2 functions directly within huf_decompress.c
2018-02-22 14:30:43 -08:00
0fd4df6ed3 Implemented BMI2 functions directly within huf_decompress.c
This makes it easier to edit for maintenance and future evolution
(I plan to experiment with modifications to the Huffman decompression functions).

The methodology followed seems broadly applicable to other BMI2 modules.

Performance was tracked rigorously at each step:
there is no noticeable loss (or gain) in performance compared to the `#include` version.

Note however that 4X decoder variants tend to be extremely sensitive to code alignment.
This source code resulted in pretty good performance for gcc 7.2 and 7.3,
but future changes (even in other parts of the code) might trigger the issue again.
2018-02-22 10:51:47 -08:00
4d6632c8f3 Merge pull request #1020 from facebook/betterBench
updated fullbench measurement methodology
2018-02-21 14:51:39 -08:00
6e481504ee fullbench includes assert.h
as it is missing for Windows
2018-02-21 11:42:23 -08:00
9c5a8040a9 fixed huf_compress workspace size 2018-02-21 11:34:49 -08:00
364ce19463 update fullbench measurement methodology
to use fewer calls to time(), like bench.c.

Also upgraded timer accuracy to nanoseconds.
2018-02-21 09:43:32 -08:00
993ffffba3 Merge pull request #1019 from facebook/betterBench
improve benchmark measurement for small inputs
2018-02-21 05:47:08 -08:00
25d00d10fc fixed minor conversion warning 2018-02-20 16:52:28 -08:00
010ba5f71f Merge pull request #1017 from terrelln/c-bmi2
[compress] Support BMI2
2018-02-20 15:34:59 -08:00
3538a535bf use TIMELOOP_NANOSEC
as suggested by @terrelln
2018-02-20 15:33:56 -08:00
d3364aa39e improve benchmark measurement for small inputs
by invoking time() once per batch, instead of once per compression / decompression.
The batch is dynamically resized so that each round lasts approximately 1 second.

Also: increases timer accuracy to nanoseconds
2018-02-20 14:58:40 -08:00
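The batching idea described in d3364aa39e can be sketched roughly as follows. This is an illustration only, not the code from bench.c or fullbench: the names `now_ns`, `next_batch_size`, `run_round`, `ROUND_TARGET_NS` and `MAX_BATCH` are hypothetical, and the use of C11 `timespec_get()` as the nanosecond clock is an assumption of this sketch.

```c
#include <stdint.h>
#include <time.h>

#define ROUND_TARGET_NS 1000000000ULL  /* aim for rounds of about 1 second */
#define MAX_BATCH (1u << 30)           /* arbitrary cap chosen for this sketch */

/* Current time in nanoseconds, using C11 timespec_get() (wall clock). */
uint64_t now_ns(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Choose the next batch size so that a round lasts about target_ns. */
unsigned next_batch_size(unsigned batch, uint64_t elapsed_ns, uint64_t target_ns)
{
    uint64_t next;
    if (elapsed_ns == 0)               /* too fast to measure: double the batch */
        next = (uint64_t)batch * 2;
    else
        next = (uint64_t)batch * target_ns / elapsed_ns;
    if (next < 1) next = 1;
    if (next > MAX_BATCH) next = MAX_BATCH;
    return (unsigned)next;
}

/* Run one round: the clock is read twice per batch, not twice per call.
 * Returns the average cost of one call in nanoseconds and resizes *batch. */
double run_round(void (*work)(void *), void *arg, unsigned *batch)
{
    unsigned i;
    uint64_t const start = now_ns();
    for (i = 0; i < *batch; i++) work(arg);
    {   uint64_t const elapsed = now_ns() - start;
        double const ns_per_call = (double)elapsed / (double)*batch;
        *batch = next_batch_size(*batch, elapsed, ROUND_TARGET_NS);
        return ns_per_call;
    }
}
```

For a small input that compresses in about 1 ms, the first round (batch of 1) would measure roughly 1,000,000 ns, and the next round would then run about 1000 calls between two clock reads, so timer overhead and resolution matter far less.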
6e128d3534 [BMI2] Add comments to the bmi2 variable in the contexts 2018-02-20 14:12:11 -08:00
70163bf0d3 added clarification comments in zstd_errors.h
answering some points in #1018
2018-02-20 12:54:49 -08:00
7117ea8bec Merge pull request #1011 from terrelln/bmi2
[decompress] Support BMI2
2018-02-15 11:40:34 -08:00
b58f01537e [compress] Support BMI2 2018-02-14 19:20:32 -08:00
4319132312 [decompress] Support BMI2 2018-02-13 17:00:15 -08:00
5cb1144872 fixed --single-thread
was incorrectly set to -T0 (use as many cores as possible) previously
2018-02-13 14:56:35 -08:00
9716250197 Merge pull request #1014 from facebook/fasterDec
Faster decoding speed
2018-02-13 12:05:54 -08:00
9b184359e2 prettify last unit test output 2018-02-13 10:09:01 -08:00
2524cbd847 added code comment on how to generate default tables
as suggested by @terrelln
2018-02-13 10:02:25 -08:00
71c07966bb added SEQSYMBOL_TABLE_SIZE()
as suggested by @terrelln's comment
2018-02-12 16:52:15 -08:00
821efa466e fixed logo path 2018-02-10 21:05:48 -08:00
5f7495371e Merge branch 'dev' into fasterDec 2018-02-10 14:24:44 -08:00
992c2370f6 Merge pull request #1010 from facebook/flexibleLevel
Updatable compression parameters
2018-02-10 14:19:54 -08:00
9945e60ac4 Merge branch 'dev' into flexibleLevel 2018-02-10 11:54:49 -08:00
4e3db17cab Merge pull request #1013 from facebook/fasterDec32
Disable Long Offset mode in 32-bits
2018-02-09 16:13:55 -08:00
75689838e4 specify new command --single-thread 2018-02-09 15:55:41 -08:00
c72091556b fixed minor nit as per @terrelln's comments 2018-02-09 09:46:08 -08:00
4beaeaace5 Merge branch 'dev' into flexibleLevel 2018-02-09 09:15:05 -08:00
6bfe50ad48 re-enabled ZSTD_decompressSequencesLong() 2018-02-09 09:14:25 -08:00
1850597eaa pre-calculated default decoding tables 2018-02-09 06:01:02 -08:00
ab75df21ed fixed mono-symbol distribution 2018-02-09 05:12:13 -08:00
421a2716d8 fixed default fse distributions
but would be better to pre-calculate tables, for speed
2018-02-09 04:50:58 -08:00
95424409ea addBits and baseline into FSE decoding table
note: unfinished
- need new default tables
- need to modify long mode
2018-02-09 04:25:15 -08:00
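As a rough illustration of what storing "addBits and baseline" in the decoding table (commit 95424409ea) could look like, here is a minimal sketch. The struct layout, field names and the `decode_value` helper are hypothetical and not taken from the zstd sources. The assumption is that each table entry carries the baseline value and the number of additional bits, so that a decoded value can be formed from a single table lookup.

```c
#include <stdint.h>

/* Hypothetical decoding-table entry; names and layout are illustrative only. */
typedef struct {
    uint16_t next_state;  /* base of the next FSE state */
    uint8_t  nb_bits;     /* bits to read to reach the next state */
    uint8_t  add_bits;    /* number of extra bits that refine the decoded value */
    uint32_t baseline;    /* smallest value represented by this symbol */
} seq_symbol;

/* Decoded value = baseline + extra bits read from the bitstream.
 * `extra` is assumed to hold bits already read by the caller;
 * only the low add_bits bits are used. */
uint32_t decode_value(seq_symbol const *entry, uint32_t extra)
{
    uint32_t const mask = (entry->add_bits >= 32) ? 0xFFFFFFFFu
                                                  : ((1u << entry->add_bits) - 1u);
    return entry->baseline + (extra & mask);
}
```

With an entry whose baseline is 16 and which takes 4 additional bits, extra bits of 5 would decode to 21.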
cc61a3694a Merge branch 'dev' into fasterDec 2018-02-09 02:41:02 -08:00
d6e841d609 fixed streaming_memory_usage example
also:
ensure zstd.h is read from ../lib (instead of /usr/include)
2018-02-07 23:42:09 -08:00
de68c2ff10 Merged ZSTD_preserveUnsortedMark() into ZSTD_reduceIndex()
as it's faster, due to one memory scan instead of two
(confirmed by microbenchmark).

Note : as ZSTD_reduceIndex() is rarely invoked,
it does not translate into a visible gain.
Consider it an exercise in auto-vectorization and micro-benchmarking.
2018-02-07 14:22:35 -08:00
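The "one memory scan instead of two" idea from de68c2ff10 can be sketched as below. This is not the actual `ZSTD_reduceIndex()` code: the function name `reduce_table`, the `UNSORTED_MARK` value and the `preserve_mark` flag are hypothetical. The sketch assumes the mark is a sentinel value that has to survive the index reduction unchanged.

```c
#include <stddef.h>
#include <stdint.h>

#define UNSORTED_MARK 1u   /* hypothetical sentinel value for this sketch */

/* Subtract `reducer` from every index in the table, clamping at 0.
 * When preserve_mark is set, entries equal to UNSORTED_MARK are left as-is.
 * Both jobs are done in the same pass over memory. */
void reduce_table(uint32_t *table, size_t size, uint32_t reducer, int preserve_mark)
{
    size_t i;
    for (i = 0; i < size; i++) {
        uint32_t const v = table[i];
        if (preserve_mark && v == UNSORTED_MARK)
            continue;                        /* keep the sentinel untouched */
        table[i] = (v < reducer) ? 0 : v - reducer;
    }
}
```

A two-pass variant would first walk the table to protect the marked entries and then walk it again to reduce the indices. Folding the check into the reduction loop touches each cache line once.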
0170cf9a7a minor : modified ZSTD_preserveUnsortedMark() to be more vectorization friendly 2018-02-05 11:46:02 -08:00
94efb1749d faster decoding in 32-bits mode for long offsets (tentative)
On my laptop:
Before:
./zstd32 -b --zstd=wlog=27 silesia.tar enwik8 -S
 3#silesia.tar       : 211984896 ->  66683478 (3.179),  97.6 MB/s , 400.7 MB/s
 3#enwik8            : 100000000 ->  35643153 (2.806),  76.5 MB/s , 303.2 MB/s

After:
./zstd32 -b --zstd=wlog=27 silesia.tar enwik8 -S
 3#silesia.tar       : 211984896 ->  66683478 (3.179),  97.4 MB/s , 435.0 MB/s
 3#enwik8            : 100000000 ->  35643153 (2.806),  76.2 MB/s , 338.1 MB/s

Mileage varies, depending on the file and the CPU type.
But a general rule is: x86 benefits less from "long-offset mode" than x64,
maybe due to register pressure.
On "entropy", long-mode is _never_ a win for x86.
On my laptop though, it can be a win, depending on the file and compression level
(enwik8 benefits more from "long-mode" than silesia).
2018-02-04 01:49:31 -08:00
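To put the before/after figures of 94efb1749d in relative terms, a small helper like the one below can be used. The function name `percent_change` is made up for this illustration.

```c
/* Relative change from `before` to `after`, in percent. */
double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

Applied to the decompression speeds quoted above, `percent_change(400.7, 435.0)` gives about +8.6% for silesia.tar and `percent_change(303.2, 338.1)` gives about +11.5% for enwik8. Compression speed is essentially unchanged (97.6 to 97.4 MB/s and 76.5 to 76.2 MB/s), and the compressed sizes are identical.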
5188749e1c ensure compression parameters are updated when only compression level is changed 2018-02-02 16:31:20 -08:00
4b525af53a zstdmt: applies new parameters on the fly
when invoked from ZSTD_compress_generic()
2018-02-02 15:58:13 -08:00
90eca318a7 fileio: create dedicated function to generate zstd frames
like other formats
2018-02-02 14:24:56 -08:00