1
0
mirror of https://github.com/facebook/zstd.git synced 2025-03-07 01:10:04 +02:00

2168 Commits

Author SHA1 Message Date
Yann Collet
481a2e1010
Merge pull request #3403 from facebook/setCParams
ZSTD_CCtx_setCParams
2022-12-28 14:07:13 -08:00
Elliot Gorokhovsky
2a402626dd
External matchfinder API (#3333)
* First building commit with sample matchfinder

* Set up ZSTD_externalMatchCtx struct

* move seqBuffer to ZSTD_Sequence*

* support non-contiguous dictionary

* clean up parens

* add clearExternalMatchfinder, handle allocation errors

* Add useExternalMatchfinder cParam

* validate useExternalMatchfinder cParam

* Disable LDM + external matchfinder

* Check for static CCtx

* Validate mState and mStateDestructor

* Improve LDM check to cover both branches

* Error API with optional fallback

* handle RLE properly for external matchfinder

* nit

* Move to a CDict-like model for resource ownership

* Add hidden useExternalMatchfinder bool to CCtx_params_s

* Eliminate malloc, move to cwksp allocation

* Handle CCtx reset properly

* Ensure seqStore has enough space for external sequences

* fix capitalization

* Add DEBUGLOG statements

* Add compressionLevel param to matchfinder API

* fix c99 issues and add a param combination error code

* nits

* Test external matchfinder API

* C90 compat for simpleExternalMatchFinder

* Fix some @nocommits and an ASAN bug

* nit

* nit

* nits

* forward declare copySequencesToSeqStore functions in zstd_compress_internal.h

* nit

* nit

* nits

* Update copyright headers

* Fix CMake zstreamtest build

* Fix copyright headers (again)

* typo

* Add externalMatchfinder demo program to make contrib

* Reduce memory consumption for small blockSize

* ZSTD_postProcessExternalMatchFinderResult nits

* test sum(matchlen) + sum(litlen) == srcSize in debug builds

* refExternalMatchFinder -> registerExternalMatchFinder

* C90 nit

* zstreamtest nits

* contrib nits

* contrib nits

* allow block splitter + external matchfinder, refactor

* add windowSize param

* add contrib/externalMatchfinder/README.md

* docs

* go back to old RLE heuristic because of the first block issue

* fix initializer element is not a constant expression

* ref contrib from zstd.h

* extremely pedantic compiler warning fix, meson fix, typo fix

* Additional docs on API limitations

* minor nits

* Refactor maxNbSeq calculation into a helper function

* Fix copyright
2022-12-28 16:45:14 -05:00
Yann Collet
b17743e41b Signal parameter change during MT compression 2022-12-28 13:14:58 -08:00
Yann Collet
89342d1e07 New xp library symbol : ZSTD_CCtx_setCParams()
Inspired by #3395,
offer a new capability to set all parameters defined in a ZSTD_compressionParameters structure
with a single symbol invocation
to improve user code brevity.
2022-12-27 23:49:22 -08:00
Yann Collet
089b2797e3
Merge pull request #3398 from facebook/fix3316
spec update : require minimum nb of literals for 4-streams mode
2022-12-22 16:57:05 -08:00
Yann Collet
6a9c525903 spec update : require minimum nb of literals for 4-streams mode
Reported by @shulib :
the specification for 4-streams mode
doesn't work when the amount of literals to compress is 5 bytes.
Extending it, it also doesn't work for sizes 1 or 2.

This patch updates the specification and the implementation
to require a minimum of 6 literals to trigger or accept the 4-streams mode.

The impact is expected to be a no-op :
the 4-streams mode is never triggered for such small quantity of literals anyway,
since it would be wasteful (it costs ~7.3 bytes more than single-stream mode).
An informal lower limit is set at ~256 bytes,
so the technical minimum is very far from this limit.

This is just meant for completeness of the specification.
2022-12-22 16:14:34 -08:00
Yann Collet
ea2895cef4 Support decompression of compressed blocks of size ZSTD_BLOCKSIZE_MAX exactly 2022-12-22 12:40:27 -08:00
Nick Terrell
40a7188130 Fix make clangbuild & add CI
Fix the errors for:
* `-Wdocumentation`
* `-Wconversion` except `-Wsign-conversion`
2022-12-21 17:31:04 -08:00
W. Felix Handte
5d693cc38c Coalesce Almost All Copyright Notices to Standard Phrasing
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache \) -prune -o -type f); do sed -i '/Copyright .* \(Yann Collet\)\|\(Meta Platforms\)/ s/Copyright .*/Copyright (c) Meta Platforms, Inc. and affiliates./' $f; done

git checkout HEAD -- build/VS2010/libzstd-dll/libzstd-dll.rc build/VS2010/zstd/zstd.rc tests/test-license.py contrib/linux-kernel/test/include/linux/xxhash.h examples/streaming_compression_thread_pool.c lib/legacy/zstd_v0*.c lib/legacy/zstd_v0*.h
nano ./programs/windres/zstd.rc
nano ./build/VS2010/zstd/zstd.rc
nano ./build/VS2010/libzstd-dll/libzstd-dll.rc
```
2022-12-20 12:52:34 -05:00
W. Felix Handte
8927f985ff Update Copyright Headers 'Facebook' -> 'Meta Platforms'
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora \) -prune -o -type f);
do
  sed -i 's/Facebook, Inc\./Meta Platforms, Inc. and affiliates./' $f;
done
```
2022-12-20 12:37:57 -05:00
Yann Collet
832c1a6a1c minor reformatting
and minor reliability and maintenance changes
2022-12-18 11:26:57 -08:00
Yann Collet
51355e1f70
Merge pull request #3362 from facebook/compressBound
check potential overflow of compressBound()
2022-12-16 14:22:22 -08:00
Nick Terrell
ee6475cbbd Add missing parens around macro definition
Fixes #3301.
2022-12-15 17:18:23 -08:00
Yann Collet
45ed0df18a check potential overflow of compressBound()
fixed #3323, reported by @nigeltao

Completed documentation around this risk
(which is largely theoretical,
I can't see that happening in any "real world" scenario,
but an erroneous @srcSize value could indeed trigger it).
2022-12-15 15:23:15 -08:00
Nick Terrell
a91e7ec175 Fix corruption that rarely occurs in 32-bit mode with wlog=25
Fix an off-by-one error in the compressor that emits corrupt blocks if:

* Zstd is compiled in 32-bit mode
* The windowLog == 25 exactly
* An offset of 2^25-3, 2^25-2, 2^25-1, or 2^25 is emitted
* The bitstream had 7 bits leftover before writing the offset

This bug has been present since before v1.0, but wasn't able to easily
be triggered, since until somewhat recently zstd wasn't able to find
matches that were within 128KB of the window size.

Add a test case, and fix 2 bugs in `ZSTD_compressSequences()`:
* The `ZSTD_isRLE()` check was incorrect. It wouldn't produce
  corruption, but it could waste CPU and not emit RLE even if the block
  was RLE
* One windowSize was `1 << windowLog`, not `1u << windowLog`

Thanks to @tansy for finding the issue, and giving us a reproducer!

Fixes Issue #3350.
2022-12-15 14:41:50 -08:00
daniellerozenblit
e2fc93340f
Merge branch 'dev' into http-to-https 2022-12-15 10:46:13 -05:00
Danielle Rozenblit
4dffc35f2e Convert references to https from http 2022-12-14 06:58:35 -08:00
Danielle Rozenblit
69ec75f0d5 fix window resizing edge case 2022-12-13 08:35:20 -08:00
Yann Collet
80cf73fd66
Merge pull request #3320 from cwoffenden/msvc-arm64-size_t
Fix for MSVC C4267 warning on ARM64 (which becomes error C2220 with /WX)
2022-12-04 18:55:30 -08:00
Yann Collet
ecd7601c36 minor: proper pledgedSrcSize trace 2022-11-22 06:00:45 -08:00
Elliot Gorokhovsky
c8d870fe52 Improve LDM cparam validation logic 2022-11-21 15:39:18 -05:00
Carl Woffenden
0547c3d3f8 Random edit to re-run the CI
I don't believe the (x64) Mac failure is related to error since it would take the SSE path.
2022-11-19 19:04:08 +01:00
Carl Woffenden
0168914490 Fix for MSVC C4267 error 2022-11-18 11:31:17 +01:00
Nick Terrell
dcc7228de9
[lazy] Use switch instead of indirect function calls. (#3295)
Use a switch statement to select the search function instead of an
indirect function call. This results in a sizable performance win.

This PR is a modification of the approach taken in PR #2828.
When I measured performance for that commit, it was neutral.
However, I now see a performance regression on gcc, but still
neutral on clang. I'm measuring on the same platform, but with
newer compilers. The new approach beats both the current dev
branch and the baseline before PR #2828 was merged.

This PR is necessary for Issue #3275, to update zstd in the kernel.
Without this PR there is a large regression in greedy - btlazy2
compression speed. With this PR it is about neutral.

gcc version: 12.2.0
clang version: 14.0.6
dataset: silesia.tar

| Compiler | Level | Dev Speed (MB/s) | PR Speed (MB/s) | Delta  |
|----------|-------|------------------|-----------------|--------|
| gcc      |     5 |            102.6 |           113.7 | +10.8% |
| gcc      |     7 |             66.6 |            74.8 | +12.3% |
| gcc      |     9 |             51.5 |            58.9 | +14.3% |
| gcc      |    13 |             14.3 |            14.3 |  +0.0% |
| clang    |     5 |            108.1 |           114.8 |  +6.2% |
| clang    |     7 |             68.5 |            72.3 |  +5.5% |
| clang    |     9 |             53.2 |            56.2 |  +5.6% |
| clang    |    13 |             14.3 |            14.7 |  +2.8% |

The binary size stays just about the same for clang and gcc, measured
using the `size` command:

| Compiler | Branch | Text    | Data | BSS | Total   |
|----------|--------|---------|------|-----|---------|
| gcc      | dev    | 1127950 | 3312 | 280 | 1131542 |
| gcc      | PR     | 1123422 | 2512 | 280 | 1126214 |
| clang    | dev    | 1046254 | 3256 | 216 | 1049726 |
| clang    | PR     | 1048198 | 2296 | 216 | 1050710 |
2022-10-21 17:14:02 -07:00
Danielle Rozenblit
a910489ff5 No longer pass srcSize to minTableLog 2022-10-17 08:03:44 -07:00
Danielle Rozenblit
b34729018c Minor simplication: no longer need to check src size if using cardinality for minTableLog 2022-10-17 07:55:07 -07:00
Danielle Rozenblit
75cd42afd7 Update regression results and better variable naming for HUF_cardinality 2022-10-14 13:37:19 -07:00
Danielle Rozenblit
e60cae33cf Additional ratio optimizations 2022-10-14 10:37:35 -07:00
Danielle Rozenblit
fa7d9c1139 Set threshold to use optimal table log 2022-10-11 14:33:25 -07:00
Danielle Rozenblit
8888a2ddcc CI failure fixes 2022-10-11 13:12:19 -07:00
Qiongsi Wu
1b445c1c2e
Fix hash4Ptr for big endian (#3227) 2022-08-01 10:41:24 -07:00
udayanbapat
43f21a600e
Intial commit to address 3090. Added support to decompress empty block. (#3118)
* Intial commit to address 3090. Added support to decompress empty block

* Update zstd_decompress_block.c

Addressed review comments for the case of 'set_basic'

* Update lib/decompress/zstd_decompress_block.c

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>

* Update lib/decompress/zstd_decompress_block.c

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
2022-07-14 11:54:34 -07:00
Elliot Gorokhovsky
cb9e341129 Nits 2022-06-23 16:59:21 -04:00
Elliot Gorokhovsky
93b89fb24b Add docs 2022-06-22 16:13:07 -04:00
Elliot Gorokhovsky
2a128110d0 Add prefetchCDictTables CCtxParam 2022-06-22 16:13:07 -04:00
Elliot Gorokhovsky
f6ef14329f
"Short cache" optimization for level 1-4 DMS (+5-30% compression speed) (#3152)
* first attempt at fast DMS short cache

* significant wins for some scenarios

* fix all clang regressions

* nits

* fix 1.5% gcc11 regression on hot 110Kdict scenario

* fix CI

* nit

* Add tags to doublefast hash table

* use tags in doublefast DMS

* Fix CI

* Clean up some hardcoded logic / constants

* Switch forCCtx to an enum

* nit

* add short cache to ip+1 long search

* Move tag size into hashLog

* Minor nits

* Truncate dictionaries greater than 16MB in short cache mode

* Helper function for tag comparison

* Cap short cache hashLog at 24 to prevent overflow

* size_t dictTagsMatch -> int dictTagsMatch

* nit

* Clean up and comment dictionary truncation

* Move ZSTD_tableFillPurpose_e next to ZSTD_dictTableLoadMethod_e

* Comment and expand helper functions

* Asserts and documentation

* nit
2022-06-21 17:27:19 -04:00
Daniel Kutenin
05f3f415ce
Fix big endian ARM NEON path
It is not using the NEON acceleration but the bit grouping was applied
2022-06-13 09:16:24 +01:00
Elliot Gorokhovsky
f313a773a4
Merge pull request #3157 from embg/huge_dict_bugfix
Bugfix for huge dictionaries
2022-06-09 15:35:29 -04:00
Elliot Gorokhovsky
31bd6402c6 Bugfix for huge dictionaries 2022-06-09 11:39:30 -04:00
Nick Terrell
7c05b9aec3 Remove expensive assert in --rsyncable hot loop
This assert slows the loop down by 10x. We can get similar
coverage by asserting at the beginning & end of the loop.

We need this fix because Debian compiles zstd with asserts
enabled. Separately, we should ask them why, and if they would
consider disabling asserts in their builds. Since we don't
optimize for assert enabled builds.

Fixes Issue #3150.
2022-06-06 11:56:13 -07:00
ihsinme
5081ccb056
Update zstd_compress.c 2022-05-30 14:08:19 +03:00
Danila Kutenin
9166c6ae20 Again unused error warning. Fixed 2022-05-23 14:51:47 +00:00
Danila Kutenin
6b561d230f Move NEON version to a separate function and fix indentation 2022-05-23 14:49:35 +00:00
Danila Kutenin
778f639be9 Disable unused variable warning 2022-05-22 10:50:33 +00:00
Danila Kutenin
e11783b04d [lazy] Optimize ZSTD_row_getMatchMask for level 8-10
We found that movemask is not used properly or consumes too much CPU.
This effort helps to optimize the movemask emulation on ARM.

For level 8-9 we saw 3-5% improvements. For level 10 we say 1.5%
improvement.

The key idea is not to use pure movemasks but to have groups of bits.
For rowEntries == 16, 32 we are going to have groups of size 4 and 2
respectively. It means that each bit will be duplicated within the group

Then we do AND to have only one bit set in the group so that iteration
with lowering bit `a &= (a - 1)` works as well.

Also, aarch64 does not have rotate instructions for 16 bit, only for 32
and 64, that's why we see more improvements for level 8-9.

vshrn_n_u16 instruction is used to achieve that: vshrn_n_u16 shifts by
4 every u16 and narrows to 8 lower bits. See the picture below. It's
also used in
[Folly](c570259008/folly/container/detail/F14Table.h (L446)).
It also uses 2 cycles according to Neoverse-N{1,2} guidelines.

64 bit movemask is already well optimized. We have ongoing experiments
but were not able to validate other implementations work reliably faster.
2022-05-22 10:44:24 +00:00
Elliot Gorokhovsky
f349d18776
Merge pull request #3127 from embg/repcode_history
Correct and clarify repcode offset history logic
2022-05-12 13:50:15 -04:00
Elliot Gorokhovsky
3620a0a565 Nits 2022-05-12 12:53:15 -04:00
W. Felix Handte
1dd046a507 Fix Comments Slightly 2022-05-11 12:38:45 -04:00
W. Felix Handte
cd1f582943 Hoist Hash Table Writes Up into Each Match Found Block
Refactoring this way avoids the bad write in the case that `step > 4`, and
is a bit more straightforward. It also seems to perform better!
2022-05-11 11:27:34 -04:00
W. Felix Handte
040986a4f4 ZSTD_fast_noDict: Minimize Checks When Writing Hash Table for ip1
This commit avoids checking whether a hashtable write is safe in two of the
three match-found paths in `ZSTD_compressBlock_fast_noDict_generic`. This pro-
duces a ~0.5% speed-up in compression.

A comment in the code describes why we can skip this check in the other two
paths (the repcode check and the first match check in the unrolled loop).

A downside is that in the new position where we make this check, we have not
yet computed `mLength`. We therefore have to avoid writing *possibly* dangerous
positions, rather than the old check which only avoids writing *actually*
dangerous positions. This leads to a miniscule loss in ratio (remember that
this scenario can only been triggered in very negative levels or under incomp-
ressibility acceleration).
2022-05-10 14:29:39 -07:00