1
0
mirror of https://github.com/facebook/zstd.git synced 2025-07-03 22:30:29 +02:00
Commit Graph

2243 Commits

Author SHA1 Message Date
b34729018c Minor simplication: no longer need to check src size if using cardinality for minTableLog 2022-10-17 07:55:07 -07:00
75cd42afd7 Update regression results and better variable naming for HUF_cardinality 2022-10-14 13:37:19 -07:00
e60cae33cf Additional ratio optimizations 2022-10-14 10:37:35 -07:00
fa7d9c1139 Set threshold to use optimal table log 2022-10-11 14:33:25 -07:00
8888a2ddcc CI failure fixes 2022-10-11 13:12:19 -07:00
1b445c1c2e Fix hash4Ptr for big endian (#3227) 2022-08-01 10:41:24 -07:00
43f21a600e Intial commit to address 3090. Added support to decompress empty block. (#3118)
* Intial commit to address 3090. Added support to decompress empty block

* Update zstd_decompress_block.c

Addressed review comments for the case of 'set_basic'

* Update lib/decompress/zstd_decompress_block.c

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>

* Update lib/decompress/zstd_decompress_block.c

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
2022-07-14 11:54:34 -07:00
cb9e341129 Nits 2022-06-23 16:59:21 -04:00
93b89fb24b Add docs 2022-06-22 16:13:07 -04:00
2a128110d0 Add prefetchCDictTables CCtxParam 2022-06-22 16:13:07 -04:00
f6ef14329f "Short cache" optimization for level 1-4 DMS (+5-30% compression speed) (#3152)
* first attempt at fast DMS short cache

* significant wins for some scenarios

* fix all clang regressions

* nits

* fix 1.5% gcc11 regression on hot 110Kdict scenario

* fix CI

* nit

* Add tags to doublefast hash table

* use tags in doublefast DMS

* Fix CI

* Clean up some hardcoded logic / constants

* Switch forCCtx to an enum

* nit

* add short cache to ip+1 long search

* Move tag size into hashLog

* Minor nits

* Truncate dictionaries greater than 16MB in short cache mode

* Helper function for tag comparison

* Cap short cache hashLog at 24 to prevent overflow

* size_t dictTagsMatch -> int dictTagsMatch

* nit

* Clean up and comment dictionary truncation

* Move ZSTD_tableFillPurpose_e next to ZSTD_dictTableLoadMethod_e

* Comment and expand helper functions

* Asserts and documentation

* nit
2022-06-21 17:27:19 -04:00
05f3f415ce Fix big endian ARM NEON path
It is not using the NEON acceleration but the bit grouping was applied
2022-06-13 09:16:24 +01:00
f313a773a4 Merge pull request #3157 from embg/huge_dict_bugfix
Bugfix for huge dictionaries
2022-06-09 15:35:29 -04:00
31bd6402c6 Bugfix for huge dictionaries 2022-06-09 11:39:30 -04:00
7c05b9aec3 Remove expensive assert in --rsyncable hot loop
This assert slows the loop down by 10x. We can get similar
coverage by asserting at the beginning & end of the loop.

We need this fix because Debian compiles zstd with asserts
enabled. Separately, we should ask them why, and if they would
consider disabling asserts in their builds. Since we don't
optimize for assert enabled builds.

Fixes Issue #3150.
2022-06-06 11:56:13 -07:00
5081ccb056 Update zstd_compress.c 2022-05-30 14:08:19 +03:00
9166c6ae20 Again unused error warning. Fixed 2022-05-23 14:51:47 +00:00
6b561d230f Move NEON version to a separate function and fix indentation 2022-05-23 14:49:35 +00:00
778f639be9 Disable unused variable warning 2022-05-22 10:50:33 +00:00
e11783b04d [lazy] Optimize ZSTD_row_getMatchMask for level 8-10
We found that movemask is not used properly or consumes too much CPU.
This effort helps to optimize the movemask emulation on ARM.

For level 8-9 we saw 3-5% improvements. For level 10 we say 1.5%
improvement.

The key idea is not to use pure movemasks but to have groups of bits.
For rowEntries == 16, 32 we are going to have groups of size 4 and 2
respectively. It means that each bit will be duplicated within the group

Then we do AND to have only one bit set in the group so that iteration
with lowering bit `a &= (a - 1)` works as well.

Also, aarch64 does not have rotate instructions for 16 bit, only for 32
and 64, that's why we see more improvements for level 8-9.

vshrn_n_u16 instruction is used to achieve that: vshrn_n_u16 shifts by
4 every u16 and narrows to 8 lower bits. See the picture below. It's
also used in
[Folly](c570259008/folly/container/detail/F14Table.h (L446)).
It also uses 2 cycles according to Neoverse-N{1,2} guidelines.

64 bit movemask is already well optimized. We have ongoing experiments
but were not able to validate other implementations work reliably faster.
2022-05-22 10:44:24 +00:00
f349d18776 Merge pull request #3127 from embg/repcode_history
Correct and clarify repcode offset history logic
2022-05-12 13:50:15 -04:00
3620a0a565 Nits 2022-05-12 12:53:15 -04:00
1dd046a507 Fix Comments Slightly 2022-05-11 12:38:45 -04:00
cd1f582943 Hoist Hash Table Writes Up into Each Match Found Block
Refactoring this way avoids the bad write in the case that `step > 4`, and
is a bit more straightforward. It also seems to perform better!
2022-05-11 11:27:34 -04:00
040986a4f4 ZSTD_fast_noDict: Minimize Checks When Writing Hash Table for ip1
This commit avoids checking whether a hashtable write is safe in two of the
three match-found paths in `ZSTD_compressBlock_fast_noDict_generic`. This pro-
duces a ~0.5% speed-up in compression.

A comment in the code describes why we can skip this check in the other two
paths (the repcode check and the first match check in the unrolled loop).

A downside is that in the new position where we make this check, we have not
yet computed `mLength`. We therefore have to avoid writing *possibly* dangerous
positions, rather than the old check which only avoids writing *actually*
dangerous positions. This leads to a miniscule loss in ratio (remember that
this scenario can only been triggered in very negative levels or under incomp-
ressibility acceleration).
2022-05-10 14:29:39 -07:00
22875ece61 Nits 2022-05-09 21:01:38 -04:00
97aabc496e Correct and clarify repcode offset history logic 2022-05-09 21:01:38 -04:00
ac371be27b Remove hasStep variant (not enough wins to justify the code size increase) 2022-04-28 18:06:24 -04:00
ce6b69f5c5 Final nit 2022-04-28 14:49:45 -04:00
6a2e1f7c69 Revert "Hardcode repcode safety check, fix cosmetic nits"
This reverts commit 518cb83833.
2022-04-27 18:16:21 -04:00
518cb83833 Hardcode repcode safety check, fix cosmetic nits 2022-04-26 17:54:25 -04:00
809f652912 Optimize repcode predicate, hardcode hasStep == 0 scenario, cosmetic fixes 2022-04-20 14:40:52 -04:00
2820efe7ec Nits 2022-04-19 11:39:52 -04:00
3536262f70 Port noDict pipeline 2022-04-15 12:16:16 -04:00
64efba4c5e Software pipeline for ZSTD_compressBlock_fast_dictMatchState (#3086)
* prefetch dict content inside loop

* ip0/ip1 pipeline

* add L2_4 prefetch to dms pipeline

* Remove L1 prefetch

* Remove L2 prefetching

* Reduce # of gotos

* Cosmetic fixes

* Check final position sometimes

* Track step size as in bc768bc

* Fix nits
2022-03-17 12:35:11 -04:00
b772f53952 Typo and grammar fixes 2022-03-12 08:58:04 +01:00
529cd7b821 Fix nits 2022-02-14 14:24:50 -05:00
db2f4a6532 Move bitwise builtins into bits.h 2022-02-14 11:16:03 -05:00
b9566fc558 Add rails for huffman table log calculation (#3047) 2022-02-02 15:12:48 -05:00
cad9f8d5f9 fix 44239
credit to oss-fuzz

This issue could happen when using the new Sequence Compression API in Explicit Delimiter Mode
with a too small dstCapacity.
In which case, there was one place where the buffer size wasn't checked.
2022-02-01 10:49:38 -08:00
637b2d7a24 fixed bug 44168
discovered by oss-fuzz

It's a bug in the test itself :
ZSTD_compressBound() as an upper bound of the compress size
only works for data compressed "normally".
But in situations where many flushes are forcefully introduced,
this creates many more blocks,
each of which has a potential to increase the size by 3 bytes.
In extreme cases (lots of small incompressible blocks), the expansion can go beyond ZSTD_compressBound().

This situation is similar when using the CompressSequences() API
with Explicit Block Delimiters.
In which case, each explicit block acts like a deliberate flush.
When employed by a fuzzer, it's possible to generate scenarios like the one described above,
with tons of incompressible blocks of small sizes,
thus going beyond ZSTD_compressBound().

fix : when using Explicit Block Delimiters, use a larger bound, to account for this scenario.
2022-01-29 16:36:20 -08:00
9a68840176 minor refactor to blocksplit
notably simplication of ZSTD_deriveSeqStoreChunk()
2022-01-27 20:24:35 -08:00
5d70ec0bc4 Merge pull request #3033 from facebook/fix44108
fix issue 44108
2022-01-27 10:57:48 -08:00
bad7f82300 Merge pull request #2974 from facebook/fix2966_part3
Lazy parameters adaptation (part 1 - ZSTD_c_stableInBuffer)
2022-01-27 06:14:04 -08:00
8df1257c3c fix issue 44108
credit to oss-fuzz

In rare circumstances, the block-splitter might cut a block at the exact beginning of a repcode.
In which case, since litlength=0, if the repcode expected 1+ literals in front, its signification changes.
This scenario is controlled in ZSTD_seqStore_resolveOffCodes(),
and the repcode is transformed into a raw offset when its new meaning is incorrect.

In more complex scenarios, the previous block might be emitted as uncompressed after all,
thus modifying the expected repcode history.
In the case discovered by oss-fuzz, the first block is emitted as uncompressed,
so the repcode history remains at default values: 1,4,8.

But since the starting repcode is repcode3, and the literal length is == 0,
its meaning is : = repcode1 - 1.
Since repcode1==1, it results in an offset value of 0, which is invalid.

So that's what the `assert()` was verifying : the result of the repcode translation should be a valid offset.

But actually, it doesn't matter, because this result will then be compared to reality,
and since it's an invalid offset, it will necessarily be discarded if incorrect,
then the repcode will be replaced by a raw offset.

So the `assert()` is not useful.
Furthermore, it's incorrect, because it assumes this situation cannot happen, but it does, as described in above scenario.
2022-01-27 05:49:59 -08:00
f2d9652ad8 more usage of new error code stabilityCondition_notRespected
as suggested by @terrelln
2022-01-26 18:30:55 -08:00
8b46895588 removed new huffman depth heuristic
results are now identical to before this PR
2022-01-26 15:22:06 -08:00
a66e8bb437 introduced LitHufLog constant
which properly represents the maximum bit size of compressed literals (11) as defined in the specification.

To be preferred from HUF_TABLELOG_DEFAULT which represents the same value but by accident.

Name selected to keep the same convention as existing width definitions,
MLFSELog, LLFSELog and OffFSELog.
2022-01-26 14:47:24 -08:00
32a5d95dcb moved HufLog to lib/decompress
it's only used to size decompression tables
2022-01-26 14:47:24 -08:00
e9dd923fa4 only declare debug functions in debug mode 2022-01-26 14:47:24 -08:00