1
0
mirror of https://github.com/facebook/zstd.git synced 2025-03-06 16:56:49 +02:00

4652 Commits

Author SHA1 Message Date
Yann Collet
bbaba45589 change experimental parameter name
from ZSTD_c_useBlockSplitter to ZSTD_c_splitAfterSequences.
2024-10-31 13:43:40 -07:00
Yann Collet
4f93206d62 changed variable name to ZSTD_c_blockSplitterLevel
suggested by @terrelln
2024-10-29 11:12:09 -07:00
Yann Collet
fcbf6b014a fixed minor conversion warning 2024-10-28 16:47:38 -07:00
Yann Collet
37706a677c added a test
test both that the new parameter works as intended,
and that the over-split protection works as intended
2024-10-28 16:31:15 -07:00
Yann Collet
226ae73311 expose new parameter ZSTD_c_blockSplitter_level 2024-10-28 16:31:15 -07:00
Yann Collet
01474bf73b add internal compression parameter preBlockSplitter_level
not yet exposed to the interface.

Also: renames `useBlockSplitter` to `postBlockSplitter`
to better qualify the difference between the 2 settings.
2024-10-28 16:31:15 -07:00
Yann Collet
5b4ce643f0 update ZSTD_splitBlock() documentation 2024-10-25 16:25:02 -07:00
Yann Collet
e557abc8a0 new block splitting variant _fromBorders
less precise but still suitable for `fast` strategy.
2024-10-25 16:13:55 -07:00
Yann Collet
da2c0dffd8 add faster block splitting heuristic, suitable for dfast strategy 2024-10-24 14:37:00 -07:00
Yann Collet
2366a87ddc fix minor visual conversion warning 2024-10-24 13:38:12 -07:00
Yann Collet
326c45bb8e complete sample11 with reduced fingerprint size 2024-10-24 13:17:56 -07:00
Yann Collet
ca6e55cbf5 reduce splitBlock arguments 2024-10-24 13:17:56 -07:00
Yann Collet
94d7b07425 organize specialization at recordFingerprint level 2024-10-24 13:17:56 -07:00
Yann Collet
566763fdc9 new variant, sampling by 11 2024-10-24 13:17:56 -07:00
Yann Collet
90095f056d apply limit conditions for all splitting strategies
instead of just for blind split.

This is in anticipation of adversarial input,
that would intentionally target the sampling pattern of the split detector.

Note that, even without this protection, splitting can never expand beyond ZSTD_COMPRESSBOUND(),
because this upper limit uses a 1KB block size worst case scenario,
and splitting never creates blocks thath small.

The protection is more to ensure that data is not expanded by more than 3-bytes per 128 KB full block,
which is a much stricter limit.
2024-10-24 11:36:56 -07:00
Yann Collet
c80645a055 stricter limits to ensure expansion factor with blind-split strategy
issue reported by @terrelln
2024-10-23 14:55:10 -07:00
Yann Collet
7d3e5e3ba1 split all full 128 KB blocks
this helps make the streaming behavior more consistent,
since it does no longer depend on having more data presented on the input.

suggested by @terrelln
2024-10-23 14:18:48 -07:00
Yann Collet
b68ddce818 rewrite fingerprint storage to no longer need 64-bit members
so that it can be stored using standard alignment requirement (sizeof(void*)).

Distance function still requires 64-bit signed multiplication though,
so it won't change the issue regarding the bug in ubsan for clang 32-bit on github ci.
2024-10-23 11:50:57 -07:00
Yann Collet
57239c4d3b fixed minor strict pedantic C90 issue 2024-10-23 11:50:57 -07:00
Yann Collet
18b1e67223 fixed extraneous return
strict C90 compliance test
2024-10-23 11:50:57 -07:00
Yann Collet
0be334d208 fixes static state allocation check
detected by @felixhandte
2024-10-23 11:50:57 -07:00
Yann Collet
06b7cfabf8 rewrote ZSTD_cwksp_initialAllocStart() to be easier to read
following a discussion with @felixhandte
2024-10-23 11:50:57 -07:00
Yann Collet
16450d0732 rewrite penalty update
suggested by @terrelln
2024-10-23 11:50:57 -07:00
Yann Collet
1ec5f9f1f6 changed loop exit condition so that there is no need to assert() within the loop. 2024-10-23 11:50:57 -07:00
Yann Collet
4662f6e646 renamed: FingerPrint => Fingerprint
suggested by @terrelln
2024-10-23 11:50:57 -07:00
Yann Collet
ea85dc7af6 conservatively estimate over-splitting in presence of incompressible loss
ensure data can never be expanded by more than 3 bytes per full block.
2024-10-23 11:50:57 -07:00
Yann Collet
5ae34e4c96 ensure lastBlock is correctly determined
reported by @terrelln
2024-10-23 11:50:57 -07:00
Yann Collet
7bad787d8b made ZSTD_isPower2() an inline function 2024-10-23 11:50:57 -07:00
Yann Collet
a167571db5 added a faster block splitter variant
that samples 1 in 5 positions.

This variant is fast enough for lazy2 and btlazy2,
but it's less good in combination with post-splitter at higher levels (>= btopt).
2024-10-23 11:50:57 -07:00
Yann Collet
1c62e714ab minor split optimization
let's fill the initial stats directly into target fingerprint
2024-10-23 11:50:57 -07:00
Yann Collet
4ce91cbf2b fixed workspace alignment on non 64-bit systems 2024-10-23 11:50:57 -07:00
Yann Collet
cae8d13294 splitter workspace is now provided by ZSTD_CCtx* 2024-10-23 11:50:56 -07:00
Yann Collet
4685eafa81 fix alignment test
for non 64-bit systems
2024-10-23 11:50:56 -07:00
Yann Collet
73a6653653 ZSTD_splitBlock_4k() uses externally provided workspace
ideally, this workspace would be provided from the ZSTD_CCtx* state
2024-10-23 11:50:56 -07:00
Yann Collet
31d48e9ffa fixing minor formatting issue in 32-bit mode with logs enabled 2024-10-23 11:50:56 -07:00
Yann Collet
6dc52122e6 fixed c90 comment style 2024-10-23 11:50:56 -07:00
Yann Collet
20c3d176cd fix assert 2024-10-23 11:50:56 -07:00
Yann Collet
0d4b520657 only split full blocks
short term simplification
2024-10-23 11:50:56 -07:00
Yann Collet
8b3887f579 fixed kernel build 2024-10-23 11:50:56 -07:00
Yann Collet
f83ed087f6 fixed RLE detection test 2024-10-23 11:50:56 -07:00
Yann Collet
83a3402a92 fix overlap write scenario in presence of incompressible data 2024-10-23 11:50:56 -07:00
Yann Collet
fa147cbb4d more ZSTD_memset() to apply 2024-10-23 11:50:56 -07:00
Yann Collet
6021b6663a minor C++-ism
though I really wonder if this is a property worth maintaining.
2024-10-23 11:50:56 -07:00
Yann Collet
e2d7d08888 use ZSTD_memset()
for better portability on Linux kernel
2024-10-23 11:50:56 -07:00
Yann Collet
586ca96fec do not use new as variable name 2024-10-23 11:50:56 -07:00
Yann Collet
9e52789962 fixed strict C90 semantic 2024-10-23 11:50:56 -07:00
Yann Collet
a5bce4ae84 XP: add a pre-splitter
instead of ingesting only full blocks, make an analysis of data, and infer where to split.
2024-10-23 11:50:56 -07:00
Yann Collet
47d4f5662d rewrite code in the manner suggested by @terrelln 2024-10-17 09:37:23 -07:00
Yann Collet
6326775166 slightly improved compression ratio at levels 3 & 4
The compression ratio benefits are small but consistent, i.e. always positive.
On `silesia.tar` corpus, this modification saves ~75 KB at level 3.
The measured speed cost is negligible, i.e. below noise level, between 0 and -1%.
2024-10-17 09:37:23 -07:00
Yann Collet
c2abfc5ba4 minor improvement to level 3 dictionary compression ratio 2024-10-15 17:58:33 -07:00