Yann Collet
18b1e67223
fixed extraneous return
...
strict C90 compliance test
2024-10-23 11:50:57 -07:00
Yann Collet
0be334d208
fixes static state allocation check
...
detected by @felixhandte
2024-10-23 11:50:57 -07:00
Yann Collet
06b7cfabf8
rewrote ZSTD_cwksp_initialAllocStart() to be easier to read
...
following a discussion with @felixhandte
2024-10-23 11:50:57 -07:00
Yann Collet
16450d0732
rewrite penalty update
...
suggested by @terrelln
2024-10-23 11:50:57 -07:00
Yann Collet
1ec5f9f1f6
changed loop exit condition so that there is no need to assert() within the loop.
2024-10-23 11:50:57 -07:00
Yann Collet
4662f6e646
renamed: FingerPrint => Fingerprint
...
suggested by @terrelln
2024-10-23 11:50:57 -07:00
Yann Collet
ea85dc7af6
conservatively estimate over-splitting in the presence of incompressible loss
...
ensure data can never be expanded by more than 3 bytes per full block.
2024-10-23 11:50:57 -07:00
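The 3-bytes-per-block bound above follows from the zstd block format; here is a minimal sketch of the arithmetic, using illustrative macro names rather than the library's own constants:

    /* a zstd block carries a 3-byte header, so a block stored raw
     * (no compression applied) costs at most srcSize + 3 bytes */
    #define BLOCK_SIZE_MAX     (1 << 17)   /* zstd's full block size: 128 KB */
    #define BLOCK_HEADER_SIZE  3           /* per-block header, in bytes */
    /* worst case for one full, incompressible block:
     * 131072 + 3 = 131075 bytes, i.e. 3 bytes of expansion */
    #define WORST_CASE_FULL_BLOCK_OUT  (BLOCK_SIZE_MAX + BLOCK_HEADER_SIZE)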
Yann Collet
5ae34e4c96
ensure lastBlock is correctly determined
...
reported by @terrelln
2024-10-23 11:50:57 -07:00
Yann Collet
7bad787d8b
made ZSTD_isPower2() an inline function
2024-10-23 11:50:57 -07:00
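For reference, the bit trick such a helper typically wraps (a sketch, not necessarily zstd's exact definition):

    #include <stddef.h>

    /* a power of two has exactly one bit set, so clearing the lowest
     * set bit with (v & (v - 1)) leaves zero */
    static int isPower2(size_t v)
    {
        return (v != 0) && ((v & (v - 1)) == 0);
    }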
Yann Collet
a167571db5
added a faster block splitter variant
...
that samples 1 in 5 positions.
This variant is fast enough for lazy2 and btlazy2,
but it's less effective in combination with the post-splitter at higher levels (>= btopt).
2024-10-23 11:50:57 -07:00
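A minimal sketch of the sampling idea, assuming the splitter's fingerprint is a byte histogram (types and names are illustrative, not zstd's):

    #include <stddef.h>

    typedef struct {
        unsigned events[256];
        size_t nbEvents;
    } Fingerprint;

    /* accumulate statistics over 1 position in 5 instead of every position,
     * trading fingerprint accuracy for speed */
    static void recordFingerprint_sampled5(Fingerprint* fp,
                                           const unsigned char* src, size_t srcSize)
    {
        size_t i;
        for (i = 0; i < srcSize; i += 5) {
            fp->events[src[i]]++;
            fp->nbEvents++;
        }
    }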
Yann Collet
1c62e714ab
minor split optimization
...
let's fill the initial stats directly into the target fingerprint
2024-10-23 11:50:57 -07:00
Yann Collet
4ce91cbf2b
fixed workspace alignment on non-64-bit systems
2024-10-23 11:50:57 -07:00
Yann Collet
cae8d13294
splitter workspace is now provided by ZSTD_CCtx*
2024-10-23 11:50:56 -07:00
Yann Collet
4685eafa81
fix alignment test
...
for non-64-bit systems
2024-10-23 11:50:56 -07:00
Yann Collet
73a6653653
ZSTD_splitBlock_4k() uses externally provided workspace
...
ideally, this workspace would be provided from the ZSTD_CCtx* state
2024-10-23 11:50:56 -07:00
Yann Collet
31d48e9ffa
fixing minor formatting issue in 32-bit mode with logs enabled
2024-10-23 11:50:56 -07:00
Yann Collet
6dc52122e6
fixed c90 comment style
2024-10-23 11:50:56 -07:00
Yann Collet
20c3d176cd
fix assert
2024-10-23 11:50:56 -07:00
Yann Collet
0d4b520657
only split full blocks
...
short term simplification
2024-10-23 11:50:56 -07:00
Yann Collet
8b3887f579
fixed kernel build
2024-10-23 11:50:56 -07:00
Yann Collet
f83ed087f6
fixed RLE detection test
2024-10-23 11:50:56 -07:00
Yann Collet
83a3402a92
fix overlap write scenario in the presence of incompressible data
2024-10-23 11:50:56 -07:00
Yann Collet
fa147cbb4d
apply ZSTD_memset() in more places
2024-10-23 11:50:56 -07:00
Yann Collet
6021b6663a
minor C++-ism
...
though I really wonder if this is a property worth maintaining.
2024-10-23 11:50:56 -07:00
Yann Collet
e2d7d08888
use ZSTD_memset()
...
for better portability on Linux kernel
2024-10-23 11:50:56 -07:00
Yann Collet
586ca96fec
do not use new as a variable name
2024-10-23 11:50:56 -07:00
Yann Collet
9e52789962
fixed strict C90 semantics
2024-10-23 11:50:56 -07:00
Yann Collet
a5bce4ae84
XP: add a pre-splitter
...
instead of ingesting only full blocks, analyze the data and infer where to split.
2024-10-23 11:50:56 -07:00
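One plausible shape of such an analysis, sketched under the assumption that the pre-splitter compares byte histograms of consecutive chunks and splits where they diverge (names and threshold are placeholders, not zstd's actual logic):

    #include <stdlib.h>
    #include <stddef.h>

    /* crude divergence measure between two byte histograms:
     * sum of absolute differences of event counts */
    static size_t histDistance(const unsigned hist1[256], const unsigned hist2[256])
    {
        size_t d = 0;
        int i;
        for (i = 0; i < 256; i++) {
            d += (size_t)abs((int)hist1[i] - (int)hist2[i]);
        }
        return d;
    }

    /* split before the next chunk when its statistics differ too much
     * from the statistics accumulated so far */
    static int shouldSplit(const unsigned accumulated[256], const unsigned nextChunk[256],
                           size_t threshold)
    {
        return histDistance(accumulated, nextChunk) > threshold;
    }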
Yann Collet
47d4f5662d
rewrite code in the manner suggested by @terrelln
2024-10-17 09:37:23 -07:00
Yann Collet
6326775166
slightly improved compression ratio at levels 3 & 4
...
The compression ratio benefits are small but consistent, i.e. always positive.
On `silesia.tar` corpus, this modification saves ~75 KB at level 3.
The measured speed cost is negligible, i.e. below noise level, between 0 and -1%.
2024-10-17 09:37:23 -07:00
Yann Collet
c2abfc5ba4
minor improvement to level 3 dictionary compression ratio
2024-10-15 17:58:33 -07:00
Yann Collet
e63896eb58
small dictionary compression speed improvement
...
not as good as small-blocks improvement,
but generally positive.
2024-10-15 17:48:35 -07:00
Yann Collet
8c38bda935
Merge pull request #4165 from facebook/cspeed_cmov
...
Improve compression speed on small blocks
2024-10-11 16:20:19 -07:00
Yann Collet
8e5823b65c
rename variable
...
findMatch -> matchFound
since it's a test, as opposed to an active search operation.
suggested by @terrelln
2024-10-11 15:38:12 -07:00
Yann Collet
83de00316c
fixed parameter ordering in dfast
...
noticed by @terrelln
2024-10-11 15:36:15 -07:00
Yann Collet
fa1fcb08ab
minor: better variable naming
2024-10-10 16:07:20 -07:00
Yann Collet
d45aee43f4
make __asm__ usage __GNUC__-specific
2024-10-08 16:38:35 -07:00
Yann Collet
741b860fc1
store dummy bytes within ZSTD_match4Found_cmov()
...
feels more logical, better contained
2024-10-08 16:34:40 -07:00
Yann Collet
197c258a79
introduce memory barrier to force test order
...
suggested by @terrelln
2024-10-08 15:54:48 -07:00
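For context, a sketch of a compiler-only barrier of the kind involved here, with the __asm__ extension guarded for GNUC-compatible compilers as the earlier commit does (assumed shape, not zstd's actual macro):

    #if defined(__GNUC__)
    /* an empty asm statement with a "memory" clobber emits no instructions,
     * but prevents the compiler from reordering memory accesses across it */
    #  define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")
    #else
    #  define COMPILER_BARRIER() ((void)0)   /* fallback: no barrier */
    #endif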
Yann Collet
186b132495
made search strategy switchable
...
between cmov and branch
and use a simple heuristic based on wlog to select between them.
note: performance is not good on clang (yet)
2024-10-08 13:52:56 -07:00
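A sketch of what a wlog-based selection might look like (the cutoff value and the names are placeholders, not zstd's actual heuristic):

    typedef enum { check_branch, check_cmov } matchCheck_e;

    /* larger windows make the bounds check harder to predict, which favors
     * the branchless (cmov) variant; small windows keep the plain branch */
    static matchCheck_e selectMatchCheck(unsigned windowLog)
    {
        return (windowLog > 19) ? check_cmov : check_branch;   /* 19 is an illustrative cutoff */
    }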
Yann Collet
2cc600bab2
refactor search into an inline function
...
for easier swapping with a parameter
2024-10-08 11:10:48 -07:00
Yann Collet
1e7fa242f4
minor refactor zstd_fast
...
make hot variables more local
2024-10-07 11:22:40 -07:00
Ilya Tokar
e8fce38954
Optimize compression by avoiding unpredictable branches
...
Avoid unpredictable branch. Use conditional move to generate the address
that is guaranteed to be safe and compare unconditionally.
Instead of
if (idx < limit && x[idx] == val ) // mispredicted idx < limit branch
Do
addr = cmov(safe,x+idx)
if (*addr == val && idx < limit) // almost always false so well predicted
Using microbenchmarks from https://github.com/google/fleetbench,
I get about a 10% speed-up:
name old cpu/op new cpu/op delta
BM_ZSTD_COMPRESS_Fleet/compression_level:-7/window_log:15 1.46ns ± 3% 1.31ns ± 7% -9.88% (p=0.000 n=35+38)
BM_ZSTD_COMPRESS_Fleet/compression_level:-7/window_log:16 1.41ns ± 3% 1.28ns ± 3% -9.56% (p=0.000 n=36+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-5/window_log:15 1.61ns ± 1% 1.43ns ± 3% -10.70% (p=0.000 n=30+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-5/window_log:16 1.54ns ± 2% 1.39ns ± 3% -9.21% (p=0.000 n=37+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-3/window_log:15 1.82ns ± 2% 1.61ns ± 3% -11.31% (p=0.000 n=37+40)
BM_ZSTD_COMPRESS_Fleet/compression_level:-3/window_log:16 1.73ns ± 3% 1.56ns ± 3% -9.50% (p=0.000 n=38+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-1/window_log:15 2.12ns ± 2% 1.79ns ± 3% -15.55% (p=0.000 n=34+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-1/window_log:16 1.99ns ± 3% 1.72ns ± 3% -13.70% (p=0.000 n=38+38)
BM_ZSTD_COMPRESS_Fleet/compression_level:0/window_log:15 3.22ns ± 3% 2.94ns ± 3% -8.67% (p=0.000 n=38+40)
BM_ZSTD_COMPRESS_Fleet/compression_level:0/window_log:16 3.19ns ± 4% 2.86ns ± 4% -10.55% (p=0.000 n=40+38)
BM_ZSTD_COMPRESS_Fleet/compression_level:1/window_log:15 2.60ns ± 3% 2.22ns ± 3% -14.53% (p=0.000 n=40+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:1/window_log:16 2.46ns ± 3% 2.13ns ± 2% -13.67% (p=0.000 n=39+36)
BM_ZSTD_COMPRESS_Fleet/compression_level:2/window_log:15 2.69ns ± 3% 2.46ns ± 3% -8.63% (p=0.000 n=37+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:2/window_log:16 2.63ns ± 3% 2.36ns ± 3% -10.47% (p=0.000 n=40+40)
BM_ZSTD_COMPRESS_Fleet/compression_level:3/window_log:15 3.20ns ± 2% 2.95ns ± 3% -7.94% (p=0.000 n=35+40)
BM_ZSTD_COMPRESS_Fleet/compression_level:3/window_log:16 3.20ns ± 4% 2.87ns ± 4% -10.33% (p=0.000 n=40+40)
I've also measured the impact on internal workloads and saw a similar
~10% improvement in performance, measured by CPU usage per byte of data.
2024-09-20 16:07:01 -04:00
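A compilable sketch of the pattern this commit message describes (function and parameter names are illustrative; the real change lives in zstd's fast match finders):

    #include <stddef.h>

    /* returns 1 when idx is in range and x[idx] equals val.
     * The address is selected without a branch (the ternary typically compiles
     * to a conditional move), so the only branch left is the equality test,
     * which is almost always false and therefore well predicted. */
    static int matchFound_cmov(const unsigned char* x, size_t idx, size_t limit,
                               unsigned char val, const unsigned char* safeAddr)
    {
        const unsigned char* addr = (idx < limit) ? x + idx : safeAddr;
        return (*addr == val) && (idx < limit);
    }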
Yann Collet
7a48dc230c
fix doc nit: ZDICT_DICTSIZE_MIN
...
fix #4142
2024-09-19 09:50:30 -07:00
Yann Collet
09cb37cbb1
Limit range of operations on Indexes in 32-bit mode
...
and use an unsigned type.
This reduces the risk that an operation produces a negative number when crossing the 2 GB limit.
2024-08-21 11:03:43 -07:00
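A hedged illustration of the hazard being guarded against (not code from the patch): with signed 32-bit arithmetic, an index past 2 GB can turn negative, whereas unsigned arithmetic stays well defined modulo 2^32.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        uint32_t curr = 0x80000100u;        /* an index just past the 2 GB mark */
        int32_t asSigned = (int32_t)curr;   /* implementation-defined, typically negative */
        uint32_t delta = curr - 0x100u;     /* unsigned subtraction: well-defined result */
        printf("signed view: %d, unsigned delta: %u\n", (int)asSigned, (unsigned)delta);
        return 0;
    }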
Yann Collet
1eb32ff594
Merge pull request #4115 from Adenilson/leak01
...
[zstd][leak] Avoid memory leak on early return of ZSTD_generateSequence
2024-08-09 14:09:17 -07:00
Yann Collet
ee1fc7ee5c
Merge pull request #4114 from Adenilson/trace01
...
[riscv] Enable support for weak symbols
2024-08-09 14:08:57 -07:00
Adenilson Cavalcanti
a40bad8ec0
[zstd][leak] Avoid memory leak on early return of ZSTD_generateSequence
...
Sanity checks on a few of the context parameters (i.e. workers and block size)
may prompt an early return from ZSTD_generateSequences.
Allocating the destination buffer past those return points avoids a potential
memory leak.
This patch should fix issue #4112 .
2024-08-06 18:01:20 -07:00
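The general pattern the fix follows, sketched with illustrative names (not the actual ZSTD_generateSequences signature): run the parameter sanity checks before allocating, so an early error return cannot leak the buffer.

    #include <stdlib.h>
    #include <string.h>

    static int generateOutput(void** result, size_t capacity, int nbWorkers)
    {
        void* buf;
        /* sanity checks first: returning here leaks nothing */
        if (nbWorkers != 0) return -1;
        if (capacity == 0) return -1;
        /* allocate only once the early returns are behind us */
        buf = malloc(capacity);
        if (buf == NULL) return -1;
        memset(buf, 0, capacity);   /* stand-in for the real work filling the buffer */
        *result = buf;
        return 0;
    }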
Adenilson Cavalcanti
6dbd49bcd0
[riscv] Enable support for weak symbols
...
Both gcc and clang support weak symbols on RISC-V, therefore
let's enable it.
This should fix issue #4069 .
2024-08-06 16:55:32 -07:00
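For context, the kind of declaration this enables, as gcc and clang support it on RISC-V (a sketch; the symbol name is illustrative, not zstd's tracing API):

    /* a weak declaration resolves to a null address when no definition is
     * linked in, letting the library probe for an optional hook at run time */
    __attribute__((__weak__)) void optional_trace_hook(void);

    static void maybe_trace(void)
    {
        if (optional_trace_hook != 0) optional_trace_hook();
    }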
Yann Collet
cb784edf5d
added android-ndk-build
2024-07-30 11:34:49 -07:00