1
0
mirror of https://github.com/facebook/zstd.git synced 2025-03-06 16:56:49 +02:00

4617 Commits

Author SHA1 Message Date
Yann Collet
6dc52122e6 fixed c90 comment style 2024-10-23 11:50:56 -07:00
Yann Collet
20c3d176cd fix assert 2024-10-23 11:50:56 -07:00
Yann Collet
0d4b520657 only split full blocks
short term simplification
2024-10-23 11:50:56 -07:00
Yann Collet
8b3887f579 fixed kernel build 2024-10-23 11:50:56 -07:00
Yann Collet
f83ed087f6 fixed RLE detection test 2024-10-23 11:50:56 -07:00
Yann Collet
83a3402a92 fix overlap write scenario in presence of incompressible data 2024-10-23 11:50:56 -07:00
Yann Collet
fa147cbb4d more ZSTD_memset() to apply 2024-10-23 11:50:56 -07:00
Yann Collet
6021b6663a minor C++-ism
though I really wonder if this is a property worth maintaining.
2024-10-23 11:50:56 -07:00
Yann Collet
e2d7d08888 use ZSTD_memset()
for better portability on Linux kernel
2024-10-23 11:50:56 -07:00
Yann Collet
586ca96fec do not use new as variable name 2024-10-23 11:50:56 -07:00
Yann Collet
9e52789962 fixed strict C90 semantic 2024-10-23 11:50:56 -07:00
Yann Collet
a5bce4ae84 XP: add a pre-splitter
instead of ingesting only full blocks, make an analysis of data, and infer where to split.
2024-10-23 11:50:56 -07:00
Yann Collet
47d4f5662d rewrite code in the manner suggested by @terrelln 2024-10-17 09:37:23 -07:00
Yann Collet
6326775166 slightly improved compression ratio at levels 3 & 4
The compression ratio benefits are small but consistent, i.e. always positive.
On `silesia.tar` corpus, this modification saves ~75 KB at level 3.
The measured speed cost is negligible, i.e. below noise level, between 0 and -1%.
2024-10-17 09:37:23 -07:00
Yann Collet
c2abfc5ba4 minor improvement to level 3 dictionary compression ratio 2024-10-15 17:58:33 -07:00
Yann Collet
e63896eb58 small dictionary compression speed improvement
not as good as small-blocks improvement,
but generally positive.
2024-10-15 17:48:35 -07:00
Yann Collet
8c38bda935
Merge pull request #4165 from facebook/cspeed_cmov
Improve compression speed on small blocks
2024-10-11 16:20:19 -07:00
Yann Collet
8e5823b65c rename variable name
findMatch -> matchFound
since it's a test, as opposed to an active search operation.
suggested by @terrelln
2024-10-11 15:38:12 -07:00
Yann Collet
83de00316c fixed parameter ordering in dfast
noticed by @terrelln
2024-10-11 15:36:15 -07:00
Yann Collet
fa1fcb08ab minor: better variable naming 2024-10-10 16:07:20 -07:00
Yann Collet
d45aee43f4 make __asm__ a __GNUC__ specific 2024-10-08 16:38:35 -07:00
Yann Collet
741b860fc1 store dummy bytes within ZSTD_match4Found_cmov()
feels more logical, better contained
2024-10-08 16:34:40 -07:00
Yann Collet
197c258a79 introduce memory barrier to force test order
suggested by @terrelln
2024-10-08 15:54:48 -07:00
Yann Collet
186b132495 made search strategy switchable
between cmov and branch
and use a simple heuristic based on wlog to select between them.

note: performance is not good on clang (yet)
2024-10-08 13:52:56 -07:00
Yann Collet
2cc600bab2 refactor search into an inline function
for easier swapping with a parameter
2024-10-08 11:10:48 -07:00
Yann Collet
1e7fa242f4 minor refactor zstd_fast
make hot variables more local
2024-10-07 11:22:40 -07:00
Ilya Tokar
e8fce38954 Optimize compression by avoiding unpredictable branches
Avoid unpredictable branch. Use conditional move to generate the address
that is guaranteed to be safe and compare unconditionally.
Instead of

if (idx < limit && x[idx] == val ) // mispredicted idx < limit branch

Do

addr = cmov(safe,x+idx)
if (*addr == val && idx < limit) // almost always false so well predicted

Using microbenchmarks from https://github.com/google/fleetbench,
I get about ~10% speed-up:

name                                                                                          old cpu/op   new cpu/op    delta
BM_ZSTD_COMPRESS_Fleet/compression_level:-7/window_log:15                                     1.46ns ± 3%   1.31ns ± 7%   -9.88%  (p=0.000 n=35+38)
BM_ZSTD_COMPRESS_Fleet/compression_level:-7/window_log:16                                     1.41ns ± 3%   1.28ns ± 3%   -9.56%  (p=0.000 n=36+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-5/window_log:15                                     1.61ns ± 1%   1.43ns ± 3%  -10.70%  (p=0.000 n=30+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-5/window_log:16                                     1.54ns ± 2%   1.39ns ± 3%   -9.21%  (p=0.000 n=37+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-3/window_log:15                                     1.82ns ± 2%   1.61ns ± 3%  -11.31%  (p=0.000 n=37+40)
BM_ZSTD_COMPRESS_Fleet/compression_level:-3/window_log:16                                     1.73ns ± 3%   1.56ns ± 3%   -9.50%  (p=0.000 n=38+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-1/window_log:15                                     2.12ns ± 2%   1.79ns ± 3%  -15.55%  (p=0.000 n=34+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-1/window_log:16                                     1.99ns ± 3%   1.72ns ± 3%  -13.70%  (p=0.000 n=38+38)
BM_ZSTD_COMPRESS_Fleet/compression_level:0/window_log:15                                      3.22ns ± 3%   2.94ns ± 3%   -8.67%  (p=0.000 n=38+40)
BM_ZSTD_COMPRESS_Fleet/compression_level:0/window_log:16                                      3.19ns ± 4%   2.86ns ± 4%  -10.55%  (p=0.000 n=40+38)
BM_ZSTD_COMPRESS_Fleet/compression_level:1/window_log:15                                      2.60ns ± 3%   2.22ns ± 3%  -14.53%  (p=0.000 n=40+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:1/window_log:16                                      2.46ns ± 3%   2.13ns ± 2%  -13.67%  (p=0.000 n=39+36)
BM_ZSTD_COMPRESS_Fleet/compression_level:2/window_log:15                                      2.69ns ± 3%   2.46ns ± 3%   -8.63%  (p=0.000 n=37+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:2/window_log:16                                      2.63ns ± 3%   2.36ns ± 3%  -10.47%  (p=0.000 n=40+40)
BM_ZSTD_COMPRESS_Fleet/compression_level:3/window_log:15                                      3.20ns ± 2%   2.95ns ± 3%   -7.94%  (p=0.000 n=35+40)
BM_ZSTD_COMPRESS_Fleet/compression_level:3/window_log:16                                      3.20ns ± 4%   2.87ns ± 4%  -10.33%  (p=0.000 n=40+40)

I've also measured the impact on internal workloads and saw similar
~10% improvement in performance, measured by cpu usage/byte of data.
2024-09-20 16:07:01 -04:00
Yann Collet
7a48dc230c fix doc nit: ZDICT_DICTSIZE_MIN
fix #4142
2024-09-19 09:50:30 -07:00
Yann Collet
09cb37cbb1 Limit range of operations on Indexes in 32-bit mode
and use unsigned type.
This reduce risks that an operation produces a negative number when crossing the 2 GB limit.
2024-08-21 11:03:43 -07:00
Yann Collet
1eb32ff594
Merge pull request #4115 from Adenilson/leak01
[zstd][leak] Avoid memory leak on early return of ZSTD_generateSequence
2024-08-09 14:09:17 -07:00
Yann Collet
ee1fc7ee5c
Merge pull request #4114 from Adenilson/trace01
[riscv] Enable support for weak symbols
2024-08-09 14:08:57 -07:00
Adenilson Cavalcanti
a40bad8ec0 [zstd][leak] Avoid memory leak on early return of ZSTD_generateSequence
Sanity checks on a few of the context parameters (i.e. workers and block size)
may prompt an early return on ZSTD_generateSequences.

Allocating the destination buffer past those return points avoids a potential
memory leak.

This patch should fix issue #4112.
2024-08-06 18:01:20 -07:00
Adenilson Cavalcanti
6dbd49bcd0 [riscv] Enable support for weak symbols
Both gcc and clang support weak symbols on RISC-V, therefore
let's enable it.

This should fix issue #4069.
2024-08-06 16:55:32 -07:00
Yann Collet
cb784edf5d added android-ndk-build 2024-07-30 11:34:49 -07:00
Adenilson Cavalcanti
c3c28c4d5a [zstd][android] Fix build with NDK r27
The NDK cross compiler declares the target as __linux (which is
not technically incorrect), which triggers the enablement of _GNU_SOURCE
in the newly added code that requires the presence of qsort_r() used
in the COVER dictionary code.

Even though the NDK uses llvm/libc, it doesn't declare qsort_r()
in the stdlib.h header.

The build fix is to only activate the _GNU_SOURCE macro if the OS is
*not* Android, as then we will fallback to the C90 compliant code.

This patch should solve the reported issue number #4103.
2024-07-29 17:13:58 -07:00
Thomas Petazzoni
5d63f186cc lib/libzstd.mk: fix typo in the definition of LIB_BINDIR
Commit f4dbfce79cb2b82fb496fcd2518ecd3315051b7d ("define LIB_SRCDIR
and LIB_BINDIR") significantly reworked the build logic, but in its
introduction of LIB_BINDIR a typo was made.

It was introduced as such:

+LIB_SRCDIR ?= $(dir $(realpath $(lastword $(MAKEFILE_LIST))))
+LIB_BINDIR ?= $(LIBSRC_DIR)

But the definition of LIB_BINDIR has a typo: it should use
$(LIB_SRCDIR) not $(LIBSRC_DIR).

Due to this, $(LIB_BINDIR) is empty, therefore in programs/Makefile,
-L$(LIB_BINDIR) is expanded to just -L, and consequently when trying
to link the "zstd" binary with the libzstd library, it cannot find it:

host/lib/gcc/powerpc64-buildroot-linux-gnu/13.3.0/../../../../powerpc64-buildroot-linux-gnu/bin/ld: cannot find -lzstd: No such file or directory

This commit fixes the build by fixing this typo.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
2024-07-13 13:53:53 +02:00
Adenilson Cavalcanti
345bcb5ff7 [zstd][dict] Ensure that dictionary training functions are fully reentrant
The two main functions used for dictionary training using the COVER
algorithm require initialization of a COVER_ctx_t where a call
to qsort() is performed.

The issue is that the standard C99 qsort() function doesn't offer
a way to pass an extra parameter for the comparison function callback
(e.g. a pointer to a context) and currently zstd relies on a *global*
static variable to hold a pointer to a context needed to perform
the sort operation.

If a zstd library user invokes either ZDICT_trainFromBuffer_cover or
ZDICT_optimizeTrainFromBuffer_cover from multiple threads, the
global context may be overwritten before/during the call/execution to qsort()
in the initialization of the COVER_ctx_t, thus yielding to crashes
and other bad things (Tm) as reported on issue #4045.

Enters qsort_r(): it was designed to address precisely this situation,
to quote from the documention [1]: "the comparison function does not need to
use global variables to pass through arbitrary arguments, and is therefore
reentrant and safe to use in threads."

It is available with small variations for multiple OSes (GNU, BSD[2],
Windows[3]), and the ISO C11 [4] standard features on annex B-21 qsort_s() as
part of the <stdlib.h>. Let's hope that compilers eventually catch up
with it.

For now, we have to handle the small variations in function parameters
for each platform.

The current fix solves the problem by allowing each executing thread
pass its own COVER_ctx_t instance to qsort_r(), removing the use of
a global pointer and allowing the code to be reentrant.

Unfortunately for *BSD, we cannot leverage qsort_r() given that its API
has changed on newer versions of FreeBSD (14.0) and the other BSD variants
(e.g. NetBSD, OpenBSD) don't implement it.

For such cases we provide a fallback that will work only requiring support
for compilers implementing support for C90.

[1] https://man7.org/linux/man-pages/man3/qsort_r.3.html
[2] https://man.freebsd.org/cgi/man.cgi?query=qsort_r
[3] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/qsort-s?view=msvc-170
[4] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf
2024-07-01 23:52:31 -07:00
Yann Collet
3de0541aef
Merge pull request #4079 from elasota/truncated-huff-state-error
Throw error if Huffman weight initial states are truncated
2024-06-30 16:17:03 -07:00
elasota
0938308ff6 Throw error if Huffman weight initial states are truncated 2024-06-20 17:46:16 -04:00
Dimitri Papadopoulos
44e83e9180
Fix typos not found by codespell 2024-06-20 20:16:25 +02:00
Dimitri Papadopoulos
2d736d9c50
Fix new typos found by codespell 2024-06-20 20:12:16 +02:00
Yann Collet
80170f6aad fix macos build
weird: after replacing the UNAME line with an identical one,
it does work properly now(??).
Possibly a case of hidden special character?
2024-06-18 20:24:00 -07:00
Quentin Boswank
f19c98228f Fix $filter and Msys/Cygwin
- switched the patter and input of $filter into the right places
- added pattern wildcard to MSYS_NT & CYGWIN_NT as they change with windows versions
- correctly identify MSYS2, even in an env like MINGW64
2024-06-05 18:37:27 +02:00
Yann Collet
2acf90431a minor:doc: specify decompression behavior in presence of multiple concatenated frames
directly at ZSTD_decompress() level.
2024-06-03 18:30:23 -07:00
Yann Collet
f70fb7c870
Merge pull request #4046 from josepho0918/iar
Improve support for IAR compiler with attributes and intrinsics
2024-05-29 15:33:19 -07:00
Federico Maresca
5e9a6c2fe4
Refactor dictionary matchfinder index safety check (#4039) 2024-05-29 12:35:24 -04:00
Adenilson Cavalcanti
1872688e0a [fix] Add check on failed allocation in legacy/zstd_v06
As reported by Ben Hawkes in #4026, a failure to allocate a zstd context
would lead to a dereference of a NULL pointer due to a missing check
on the returned result of ZSTDv06_createDCtx().

This patch fix the issue by adding a check for valid returned pointer.
2024-05-17 15:40:28 -07:00
Joseph Chen
5fadd8e6b1 revert FSE_readNCount_body attribute 2024-05-15 10:47:50 +08:00
Joseph Chen
2955d92ac0 Improve support for IAR compiler with attributes and intrinsics 2024-05-14 17:01:19 +08:00
Richard Barnes
97291fc502 Increase x-compatibility 2024-04-29 09:19:24 -04:00