1
0
mirror of https://github.com/facebook/zstd.git synced 2025-07-03 22:30:29 +02:00
Commit Graph

4440 Commits

Author SHA1 Message Date
efef80b75e Fix make variable 2022-08-19 12:06:43 +02:00
ae5f273a92 drop -E flag in sed 2022-08-19 12:00:32 +02:00
ef60302af9 Merge pull request #3230 from grossws/fix3229-docs
Add description for ZSTD_decompressStream and ZSTD_initDStream
2022-08-16 12:48:23 -04:00
1c847e2e32 Add description for ZSTD_decompressStream and ZSTD_initDStream
With that these functions become visible in generated docs.

Fixes #3229
2022-08-08 18:02:50 +03:00
a70ca2bd7d Fix off-by-one error in superblock mode (#3221)
Fixes #3212.

Long literal and match lengths had an off-by-one error in ZSTD_getSequenceLength.
Fix the off-by-one error, and add a golden compression test that catches the bug.
Also run all the golden tests in the cli-tests framework.
2022-08-03 11:28:39 -07:00
7e6278a706 Merge pull request #3196 from mileshu/dev
[T124890272] Mark 2 Obsolete Functions(ZSTD_copy*Ctx) Deprecated in Zstd
2022-08-02 12:34:04 -04:00
c450f9f952 [T124890272] Mark 2 Obsolete Functions(ZSTD_copy*Ctx) Deprecated in Zstd
The discussion for this task is here: facebook/zstd#3128.

This task can probably be scoped to the first part: marking these functions deprecated.
We'll later look at removal when we roll out v1.6.0.
2022-08-01 22:45:52 -07:00
0f4fd28a64 Deprecate ZSTD_getDecompressedSize() (#3225)
Fixes #3158.

Mark ZSTD_getDecompressedSize() as deprecated and replaced by ZSTD_getFrameContentSize().
2022-08-01 11:52:14 -07:00
1b445c1c2e Fix hash4Ptr for big endian (#3227) 2022-08-01 10:41:24 -07:00
ec5fdcde19 lib: add hint to generate more pipeline friendly code (#3138)
With statistic data of test data files of silesia
the chance of position beyond highThreshold is very
low (~1.3%@L8 in most cases, all <2.5%), and is in
"lowprob area". Add the branch hint so compiler can
get better pipiline codegen.
With this change it is observed ~1% of mozilla and
xml, and slight (0.3%~0.8%) but consistent uplift on
other files on Arm N1.

Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: Id9ba1d5c767e975290b5c1bf0ecce906544f4ade
2022-07-29 10:28:04 -07:00
558cf20d0d decomp: add prefetch for matched seq on aarch64 (#3164)
match is used for following sequence copy. It is
only updated when extDict is needed, which is a
low probability case. So it can be prefetched to
reduce cache miss.
The benchmarks on various Arm platforms showed
uplift from 1% ~ 14% with gcc-11/clang-14.

Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: If201af4799d2455d74c79f8387404439d7f684ae
2022-07-29 10:27:20 -07:00
43f21a600e Intial commit to address 3090. Added support to decompress empty block. (#3118)
* Intial commit to address 3090. Added support to decompress empty block

* Update zstd_decompress_block.c

Addressed review comments for the case of 'set_basic'

* Update lib/decompress/zstd_decompress_block.c

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>

* Update lib/decompress/zstd_decompress_block.c

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>

Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
2022-07-14 11:54:34 -07:00
a5655e4017 Revert "T119975957"
This reverts commit 962746edff.
2022-07-12 11:17:25 -07:00
962746edff T119975957
Signed-off-by: Miles HU <yuanpu@fb.com>
2022-07-08 15:01:36 -07:00
5c382bf110 1.5.3 version bump 2022-06-29 14:45:53 -04:00
cb9e341129 Nits 2022-06-23 16:59:21 -04:00
93b89fb24b Add docs 2022-06-22 16:13:07 -04:00
2a128110d0 Add prefetchCDictTables CCtxParam 2022-06-22 16:13:07 -04:00
91aeade735 Streaming decompression can detect incorrect header ID sooner
Streaming decompression used to wait for a minimum of 5 bytes before attempting decoding.
This meant that, in the case that only a few bytes (<5) were provided,
and assuming these bytes are incorrect,
there would be no error reported.
The streaming API would simply request more data, waiting for at least 5 bytes.

This PR makes it possible to detect incorrect Frame IDs as soon as the first byte is provided.

Fix #3169
2022-06-21 23:09:03 -07:00
f6ef14329f "Short cache" optimization for level 1-4 DMS (+5-30% compression speed) (#3152)
* first attempt at fast DMS short cache

* significant wins for some scenarios

* fix all clang regressions

* nits

* fix 1.5% gcc11 regression on hot 110Kdict scenario

* fix CI

* nit

* Add tags to doublefast hash table

* use tags in doublefast DMS

* Fix CI

* Clean up some hardcoded logic / constants

* Switch forCCtx to an enum

* nit

* add short cache to ip+1 long search

* Move tag size into hashLog

* Minor nits

* Truncate dictionaries greater than 16MB in short cache mode

* Helper function for tag comparison

* Cap short cache hashLog at 24 to prevent overflow

* size_t dictTagsMatch -> int dictTagsMatch

* nit

* Clean up and comment dictionary truncation

* Move ZSTD_tableFillPurpose_e next to ZSTD_dictTableLoadMethod_e

* Comment and expand helper functions

* Asserts and documentation

* nit
2022-06-21 17:27:19 -04:00
05f3f415ce Fix big endian ARM NEON path
It is not using the NEON acceleration but the bit grouping was applied
2022-06-13 09:16:24 +01:00
3b1bd91852 Merge pull request #3141 from JunHe77/seqDec
dec: adjust seqSymbol load on aarch64
2022-06-09 13:40:51 -07:00
3b915cd94b Merge pull request #3145 from JunHe77/wildcopy
common: apply two stage copy to aarch64
2022-06-09 13:38:30 -07:00
f313a773a4 Merge pull request #3157 from embg/huge_dict_bugfix
Bugfix for huge dictionaries
2022-06-09 15:35:29 -04:00
31bd6402c6 Bugfix for huge dictionaries 2022-06-09 11:39:30 -04:00
7c05b9aec3 Remove expensive assert in --rsyncable hot loop
This assert slows the loop down by 10x. We can get similar
coverage by asserting at the beginning & end of the loop.

We need this fix because Debian compiles zstd with asserts
enabled. Separately, we should ask them why, and if they would
consider disabling asserts in their builds. Since we don't
optimize for assert enabled builds.

Fixes Issue #3150.
2022-06-06 11:56:13 -07:00
9f346dbe45 Merge pull request #3147 from animalize/dev
fix leaking thread handles on Windows
2022-06-02 10:04:55 -07:00
2491c65937 dec: adjust seqSymbol load on aarch64
ZSTD_seqSymbol is a structure with total of 64 bits
wide. So it can be loaded in one operation and
extract its fields by simply shifting or extracting
on aarch64.
GCC doesn't recognize this and generates more
unnecessary ldr/ldrb/ldrh operations that cause
performance drop.
With this change it is observed 2~4% uplift of
silesia and 2.5~6% of cantrbry @L8 on Arm N1.

Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I7748909204cf78a17eb9d4f2333692d53239daa8
2022-05-30 22:01:38 +08:00
5081ccb056 Update zstd_compress.c 2022-05-30 14:08:19 +03:00
95073b1af1 fix leaking thread handles on Windows
On Windows, thread handle should be closed explicitly.

Co-authored-by: luben karavelov <luben@users.noreply.github.com>
2022-05-30 16:35:44 +08:00
d7249dafb4 common: apply two stage copy to aarch64
On aarch64 ZSTD_wildcopy uses a simple loop to do
16B based memory copy. There is existing optimized
two stage copy that can achieve better performance.
By applying this to aarch64 it is also observed ~1%
uplift in silesia corpus.

Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: Ic1253308e7a8a7df2d08963ba544e086c81ce8be
2022-05-26 14:40:21 +08:00
9166c6ae20 Again unused error warning. Fixed 2022-05-23 14:51:47 +00:00
6b561d230f Move NEON version to a separate function and fix indentation 2022-05-23 14:49:35 +00:00
778f639be9 Disable unused variable warning 2022-05-22 10:50:33 +00:00
e11783b04d [lazy] Optimize ZSTD_row_getMatchMask for level 8-10
We found that movemask is not used properly or consumes too much CPU.
This effort helps to optimize the movemask emulation on ARM.

For level 8-9 we saw 3-5% improvements. For level 10 we say 1.5%
improvement.

The key idea is not to use pure movemasks but to have groups of bits.
For rowEntries == 16, 32 we are going to have groups of size 4 and 2
respectively. It means that each bit will be duplicated within the group

Then we do AND to have only one bit set in the group so that iteration
with lowering bit `a &= (a - 1)` works as well.

Also, aarch64 does not have rotate instructions for 16 bit, only for 32
and 64, that's why we see more improvements for level 8-9.

vshrn_n_u16 instruction is used to achieve that: vshrn_n_u16 shifts by
4 every u16 and narrows to 8 lower bits. See the picture below. It's
also used in
[Folly](c570259008/folly/container/detail/F14Table.h (L446)).
It also uses 2 cycles according to Neoverse-N{1,2} guidelines.

64 bit movemask is already well optimized. We have ongoing experiments
but were not able to validate other implementations work reliably faster.
2022-05-22 10:44:24 +00:00
f349d18776 Merge pull request #3127 from embg/repcode_history
Correct and clarify repcode offset history logic
2022-05-12 13:50:15 -04:00
3620a0a565 Nits 2022-05-12 12:53:15 -04:00
1dd046a507 Fix Comments Slightly 2022-05-11 12:38:45 -04:00
cd1f582943 Hoist Hash Table Writes Up into Each Match Found Block
Refactoring this way avoids the bad write in the case that `step > 4`, and
is a bit more straightforward. It also seems to perform better!
2022-05-11 11:27:34 -04:00
040986a4f4 ZSTD_fast_noDict: Minimize Checks When Writing Hash Table for ip1
This commit avoids checking whether a hashtable write is safe in two of the
three match-found paths in `ZSTD_compressBlock_fast_noDict_generic`. This pro-
duces a ~0.5% speed-up in compression.

A comment in the code describes why we can skip this check in the other two
paths (the repcode check and the first match check in the unrolled loop).

A downside is that in the new position where we make this check, we have not
yet computed `mLength`. We therefore have to avoid writing *possibly* dangerous
positions, rather than the old check which only avoids writing *actually*
dangerous positions. This leads to a miniscule loss in ratio (remember that
this scenario can only been triggered in very negative levels or under incomp-
ressibility acceleration).
2022-05-10 14:29:39 -07:00
22875ece61 Nits 2022-05-09 21:01:38 -04:00
97aabc496e Correct and clarify repcode offset history logic 2022-05-09 21:01:38 -04:00
7915c1164e Merge pull request #3114 from embg/fast_extdict_pipeline2
Software pipeline for ZSTD_compressBlock_fast_extDict
2022-05-05 15:06:47 -04:00
ac371be27b Remove hasStep variant (not enough wins to justify the code size increase) 2022-04-28 18:06:24 -04:00
ce6b69f5c5 Final nit 2022-04-28 14:49:45 -04:00
6a2e1f7c69 Revert "Hardcode repcode safety check, fix cosmetic nits"
This reverts commit 518cb83833.
2022-04-27 18:16:21 -04:00
518cb83833 Hardcode repcode safety check, fix cosmetic nits 2022-04-26 17:54:25 -04:00
05796796fd fix some typos
Signed-off-by: cuishuang <imcusg@gmail.com>
2022-04-26 17:40:23 +08:00
809f652912 Optimize repcode predicate, hardcode hasStep == 0 scenario, cosmetic fixes 2022-04-20 14:40:52 -04:00
2820efe7ec Nits 2022-04-19 11:39:52 -04:00