1
0
mirror of https://github.com/facebook/zstd.git synced 2025-03-07 01:10:04 +02:00

4594 Commits

Author SHA1 Message Date
Yann Collet
8e5823b65c rename variable name
findMatch -> matchFound
since it's a test, as opposed to an active search operation.
suggested by @terrelln
2024-10-11 15:38:12 -07:00
Yann Collet
83de00316c fixed parameter ordering in dfast
noticed by @terrelln
2024-10-11 15:36:15 -07:00
Yann Collet
fa1fcb08ab minor: better variable naming 2024-10-10 16:07:20 -07:00
Yann Collet
d45aee43f4 make __asm__ a __GNUC__ specific 2024-10-08 16:38:35 -07:00
Yann Collet
741b860fc1 store dummy bytes within ZSTD_match4Found_cmov()
feels more logical, better contained
2024-10-08 16:34:40 -07:00
Yann Collet
197c258a79 introduce memory barrier to force test order
suggested by @terrelln
2024-10-08 15:54:48 -07:00
Yann Collet
186b132495 made search strategy switchable
between cmov and branch
and use a simple heuristic based on wlog to select between them.

note: performance is not good on clang (yet)
2024-10-08 13:52:56 -07:00
Yann Collet
2cc600bab2 refactor search into an inline function
for easier swapping with a parameter
2024-10-08 11:10:48 -07:00
Yann Collet
1e7fa242f4 minor refactor zstd_fast
make hot variables more local
2024-10-07 11:22:40 -07:00
Ilya Tokar
e8fce38954 Optimize compression by avoiding unpredictable branches
Avoid unpredictable branch. Use conditional move to generate the address
that is guaranteed to be safe and compare unconditionally.
Instead of

if (idx < limit && x[idx] == val ) // mispredicted idx < limit branch

Do

addr = cmov(safe,x+idx)
if (*addr == val && idx < limit) // almost always false so well predicted

Using microbenchmarks from https://github.com/google/fleetbench,
I get about ~10% speed-up:

name                                                                                          old cpu/op   new cpu/op    delta
BM_ZSTD_COMPRESS_Fleet/compression_level:-7/window_log:15                                     1.46ns ± 3%   1.31ns ± 7%   -9.88%  (p=0.000 n=35+38)
BM_ZSTD_COMPRESS_Fleet/compression_level:-7/window_log:16                                     1.41ns ± 3%   1.28ns ± 3%   -9.56%  (p=0.000 n=36+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-5/window_log:15                                     1.61ns ± 1%   1.43ns ± 3%  -10.70%  (p=0.000 n=30+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-5/window_log:16                                     1.54ns ± 2%   1.39ns ± 3%   -9.21%  (p=0.000 n=37+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-3/window_log:15                                     1.82ns ± 2%   1.61ns ± 3%  -11.31%  (p=0.000 n=37+40)
BM_ZSTD_COMPRESS_Fleet/compression_level:-3/window_log:16                                     1.73ns ± 3%   1.56ns ± 3%   -9.50%  (p=0.000 n=38+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-1/window_log:15                                     2.12ns ± 2%   1.79ns ± 3%  -15.55%  (p=0.000 n=34+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:-1/window_log:16                                     1.99ns ± 3%   1.72ns ± 3%  -13.70%  (p=0.000 n=38+38)
BM_ZSTD_COMPRESS_Fleet/compression_level:0/window_log:15                                      3.22ns ± 3%   2.94ns ± 3%   -8.67%  (p=0.000 n=38+40)
BM_ZSTD_COMPRESS_Fleet/compression_level:0/window_log:16                                      3.19ns ± 4%   2.86ns ± 4%  -10.55%  (p=0.000 n=40+38)
BM_ZSTD_COMPRESS_Fleet/compression_level:1/window_log:15                                      2.60ns ± 3%   2.22ns ± 3%  -14.53%  (p=0.000 n=40+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:1/window_log:16                                      2.46ns ± 3%   2.13ns ± 2%  -13.67%  (p=0.000 n=39+36)
BM_ZSTD_COMPRESS_Fleet/compression_level:2/window_log:15                                      2.69ns ± 3%   2.46ns ± 3%   -8.63%  (p=0.000 n=37+39)
BM_ZSTD_COMPRESS_Fleet/compression_level:2/window_log:16                                      2.63ns ± 3%   2.36ns ± 3%  -10.47%  (p=0.000 n=40+40)
BM_ZSTD_COMPRESS_Fleet/compression_level:3/window_log:15                                      3.20ns ± 2%   2.95ns ± 3%   -7.94%  (p=0.000 n=35+40)
BM_ZSTD_COMPRESS_Fleet/compression_level:3/window_log:16                                      3.20ns ± 4%   2.87ns ± 4%  -10.33%  (p=0.000 n=40+40)

I've also measured the impact on internal workloads and saw similar
~10% improvement in performance, measured by cpu usage/byte of data.
2024-09-20 16:07:01 -04:00
Yann Collet
cb784edf5d added android-ndk-build 2024-07-30 11:34:49 -07:00
Adenilson Cavalcanti
c3c28c4d5a [zstd][android] Fix build with NDK r27
The NDK cross compiler declares the target as __linux (which is
not technically incorrect), which triggers the enablement of _GNU_SOURCE
in the newly added code that requires the presence of qsort_r() used
in the COVER dictionary code.

Even though the NDK uses llvm/libc, it doesn't declare qsort_r()
in the stdlib.h header.

The build fix is to only activate the _GNU_SOURCE macro if the OS is
*not* Android, as then we will fallback to the C90 compliant code.

This patch should solve the reported issue number #4103.
2024-07-29 17:13:58 -07:00
Thomas Petazzoni
5d63f186cc lib/libzstd.mk: fix typo in the definition of LIB_BINDIR
Commit f4dbfce79cb2b82fb496fcd2518ecd3315051b7d ("define LIB_SRCDIR
and LIB_BINDIR") significantly reworked the build logic, but in its
introduction of LIB_BINDIR a typo was made.

It was introduced as such:

+LIB_SRCDIR ?= $(dir $(realpath $(lastword $(MAKEFILE_LIST))))
+LIB_BINDIR ?= $(LIBSRC_DIR)

But the definition of LIB_BINDIR has a typo: it should use
$(LIB_SRCDIR) not $(LIBSRC_DIR).

Due to this, $(LIB_BINDIR) is empty, therefore in programs/Makefile,
-L$(LIB_BINDIR) is expanded to just -L, and consequently when trying
to link the "zstd" binary with the libzstd library, it cannot find it:

host/lib/gcc/powerpc64-buildroot-linux-gnu/13.3.0/../../../../powerpc64-buildroot-linux-gnu/bin/ld: cannot find -lzstd: No such file or directory

This commit fixes the build by fixing this typo.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
2024-07-13 13:53:53 +02:00
Adenilson Cavalcanti
345bcb5ff7 [zstd][dict] Ensure that dictionary training functions are fully reentrant
The two main functions used for dictionary training using the COVER
algorithm require initialization of a COVER_ctx_t where a call
to qsort() is performed.

The issue is that the standard C99 qsort() function doesn't offer
a way to pass an extra parameter for the comparison function callback
(e.g. a pointer to a context) and currently zstd relies on a *global*
static variable to hold a pointer to a context needed to perform
the sort operation.

If a zstd library user invokes either ZDICT_trainFromBuffer_cover or
ZDICT_optimizeTrainFromBuffer_cover from multiple threads, the
global context may be overwritten before/during the call/execution to qsort()
in the initialization of the COVER_ctx_t, thus yielding to crashes
and other bad things (Tm) as reported on issue #4045.

Enters qsort_r(): it was designed to address precisely this situation,
to quote from the documention [1]: "the comparison function does not need to
use global variables to pass through arbitrary arguments, and is therefore
reentrant and safe to use in threads."

It is available with small variations for multiple OSes (GNU, BSD[2],
Windows[3]), and the ISO C11 [4] standard features on annex B-21 qsort_s() as
part of the <stdlib.h>. Let's hope that compilers eventually catch up
with it.

For now, we have to handle the small variations in function parameters
for each platform.

The current fix solves the problem by allowing each executing thread
pass its own COVER_ctx_t instance to qsort_r(), removing the use of
a global pointer and allowing the code to be reentrant.

Unfortunately for *BSD, we cannot leverage qsort_r() given that its API
has changed on newer versions of FreeBSD (14.0) and the other BSD variants
(e.g. NetBSD, OpenBSD) don't implement it.

For such cases we provide a fallback that will work only requiring support
for compilers implementing support for C90.

[1] https://man7.org/linux/man-pages/man3/qsort_r.3.html
[2] https://man.freebsd.org/cgi/man.cgi?query=qsort_r
[3] https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/qsort-s?view=msvc-170
[4] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n1548.pdf
2024-07-01 23:52:31 -07:00
Yann Collet
3de0541aef
Merge pull request #4079 from elasota/truncated-huff-state-error
Throw error if Huffman weight initial states are truncated
2024-06-30 16:17:03 -07:00
elasota
0938308ff6 Throw error if Huffman weight initial states are truncated 2024-06-20 17:46:16 -04:00
Dimitri Papadopoulos
44e83e9180
Fix typos not found by codespell 2024-06-20 20:16:25 +02:00
Dimitri Papadopoulos
2d736d9c50
Fix new typos found by codespell 2024-06-20 20:12:16 +02:00
Yann Collet
80170f6aad fix macos build
weird: after replacing the UNAME line with an identical one,
it does work properly now(??).
Possibly a case of hidden special character?
2024-06-18 20:24:00 -07:00
Quentin Boswank
f19c98228f Fix $filter and Msys/Cygwin
- switched the patter and input of $filter into the right places
- added pattern wildcard to MSYS_NT & CYGWIN_NT as they change with windows versions
- correctly identify MSYS2, even in an env like MINGW64
2024-06-05 18:37:27 +02:00
Yann Collet
2acf90431a minor:doc: specify decompression behavior in presence of multiple concatenated frames
directly at ZSTD_decompress() level.
2024-06-03 18:30:23 -07:00
Yann Collet
f70fb7c870
Merge pull request #4046 from josepho0918/iar
Improve support for IAR compiler with attributes and intrinsics
2024-05-29 15:33:19 -07:00
Federico Maresca
5e9a6c2fe4
Refactor dictionary matchfinder index safety check (#4039) 2024-05-29 12:35:24 -04:00
Adenilson Cavalcanti
1872688e0a [fix] Add check on failed allocation in legacy/zstd_v06
As reported by Ben Hawkes in #4026, a failure to allocate a zstd context
would lead to a dereference of a NULL pointer due to a missing check
on the returned result of ZSTDv06_createDCtx().

This patch fix the issue by adding a check for valid returned pointer.
2024-05-17 15:40:28 -07:00
Joseph Chen
5fadd8e6b1 revert FSE_readNCount_body attribute 2024-05-15 10:47:50 +08:00
Joseph Chen
2955d92ac0 Improve support for IAR compiler with attributes and intrinsics 2024-05-14 17:01:19 +08:00
Richard Barnes
97291fc502 Increase x-compatibility 2024-04-29 09:19:24 -04:00
Richard Barnes
d7cb47036c Make zstd.h compatible with -Wzero-as-null-pointer-constant 2024-04-29 09:19:24 -04:00
Yann Collet
a86f5f3f33 update documentation of ZSTD_decompressStream()
slightly more precise, by recommending to check the return value.
fix #4030
2024-04-23 09:31:35 -07:00
Christian Marangi
f1f1ae369a
Provide variant pkg-config file for multi-threaded static lib
Multi-threaded static library require -pthread to correctly link and works.
The pkg-config we provide tho only works with dynamic multi-threaded library
and won't provide the correct libs and cflags values if lib-mt is used.

To handle this, introduce an env variable MT to permit advanced user to
install and generate a correct pkg-config file for lib-mt or detect if
lib-mt target is called.

With MT env set on calling make install-pc, libzstd.pc.in is a
pkg-config file for a multi-threaded static library.

On calling make lib-mt, a libzstd.pc is generated for a multi-threaded
static library as it's what asked by the user by forcing it.

libzstd.pc is changed to PHONY to force regeneration of it on calling
lib targets or install-pc to handle case where the same directory is
used for mixed compilation.

This was notice while migrating from meson to make build system where
meson generates a correct .pc file while make doesn't.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
2024-04-08 17:44:52 +02:00
Yuriy Chernyshov
72c16b187d Fix building on windows-x86 if clang already includes
[D101338](https://reviews.llvm.org/D101338) landed in 2021, so clang16 should have it
2024-04-01 09:53:08 -07:00
Po-Chuan Hsieh
103a85e6f6
Use md5sum rather than gmd5sum for FreeBSD
Reference:	https://man.freebsd.org/cgi/man.cgi?query=md5sum
2024-03-27 20:53:52 +08:00
Yann Collet
c6e5257240
Merge pull request #3977 from facebook/doc_advanced
Doc update
2024-03-21 12:33:15 -07:00
Nick Terrell
731f4b70fc Fix & fuzz ZSTD_generateSequences
This function was seriously flawed:
* It didn't do output bounds checks
* It produced invalid sequences when an uncompressed or RLE block was emitted
* It produced invalid sequences when the block splitter was enabled
* It produced invalid sequences when ZSTD_c_targetCBlockSize was enabled

I've attempted to fix these issues, but this function is just a bad idea,
so I've marked it as deprecated and unsafe. We should replace it with
`ZSTD_extractSequences()` which operates on a compressed frame.
2024-03-21 07:18:05 -07:00
Elliot Gorokhovsky
741b87bbe1
Fuzzing and bugfixes for magicless-format decoding (#3976)
* fuzzing and bugfixes for magicless format

* reset dctx before each decompression

* do not memcmp empty buffers

* nit: decompressor errata
2024-03-20 19:22:34 -04:00
Yann Collet
6f1215b874 fix ZSTD_TARGETCBLOCKSIZE_MIN test
when requested CBlockSize is too low,
bound it to the minimum
instead of returning an error.
2024-03-18 14:10:08 -07:00
Yann Collet
c5da438dc0 fix typo 2024-03-18 12:33:22 -07:00
Yann Collet
902c7ec1fe add doc on CCtx UB state 2024-03-18 12:30:35 -07:00
Yann Collet
5d82c2b57c add a paragraph on UB DCtx state after error 2024-03-18 12:17:41 -07:00
Yann Collet
f5728da365 update targetCBlockSize documentation 2024-03-18 12:04:02 -07:00
Elliot Gorokhovsky
7d970bd83c
Implement one-shot fallback for magicless format (#3971) 2024-03-18 10:55:53 -04:00
Yann Collet
0fcdc62c2d
Merge pull request #3969 from facebook/v156_prep
bump version number
2024-03-15 10:26:49 -07:00
Yann Collet
dff5407cd0
Merge pull request #3966 from facebook/debug_lineNumber
add line number to debug traces
2024-03-15 10:26:06 -07:00
Yann Collet
686e7e4b4b updated version to v1.5.6 2024-03-14 15:38:14 -07:00
Elliot Gorokhovsky
559762da12
Remove duplicate and incorrect docs in zstd_decompress.c (#3967) 2024-03-14 15:55:01 -04:00
Yann Collet
9cc3304614 add line number to debug traces 2024-03-14 12:11:11 -07:00
Felix Handte
515c07a131
Merge pull request #3964 from felixhandte/promote-tgt-c-blk-size-to-stable
Promote `ZSTD_c_targetCBlockSize` Parameter to Stable API
2024-03-14 13:06:46 -04:00
Yann Collet
216099a73f
Merge pull request #3962 from facebook/cover_lessIncludes
reduce the amount of #include in cover.h
2024-03-13 16:52:54 -07:00
W. Felix Handte
3613448fb8 Promote ZSTD_c_targetCBlockSize Parameter to Stable API
This feature has demonstrated itself to be useful in web compression and we
want to encourage other folks to use it. But we currently make it difficult
to do so since it's locked away in the experimental API.

The API itself is really straightforward and I think it's fine to commit to
maintaining support / compatibility for this API even if in the future the
underlying implementation may continue to evolve.

Note that this commit changes its enum name and also its numeric value. Users
who respected the instructions of using the experimental API should be fine
with both of these changes since they should only have referred to it by the.

Conceivably someone could have done bad feature detection of this capability
by doing `#ifdef ZSTD_c_targetCBlockSize` which will now return false since
it's no longer a macro... but I think that's an acceptable hypothetical
breakage.
2024-03-13 17:07:10 -04:00
Nick Terrell
ff0afbad58 [asm][aarch64] Mark that BTI and PAC are supported
Mark that `huf_decompress_amd64.S` supports BTI and PAC, which it trivially does because it is empty for aarch64.

The issue only requested BTI markings, but it also makes sense to mark PAC, which is the only other feature.

Also run add a test for this mode to the ARM64 QEMU test. Before this PR it warns on `huf_decompress_amd64.S`, after it doesn't.

Fixes Issue #3841.
2024-03-13 16:15:51 -04:00