mirror of https://github.com/facebook/zstd.git synced 2025-12-25 07:27:19 +02:00
4880 Commits

Author SHA1 Message Date
Yann Collet
ae9f20ca27 Merge pull request #4554 from facebook/no_legacy
Remove legacy support by default
2025-12-18 16:46:12 -08:00
Yann Collet
3a3c506b51 Fix #4553
This is a bug in the streaming implementation of the v0.5 decoder.
The bug has always been there.
It requires an uncommon block configuration, which wasn't tested at the time.

v0.5 is now deprecated;
the latest version to produce this format is v0.5.1, from February 2016.
It was superseded in April 2016,
so the format is both short-lived and very old.

Another PR will remove support for this format,
but it will still be possible to explicitly request this support on demand,
so it is better to fix the issue.
2025-12-18 15:52:11 -08:00
Yann Collet
f818f97be6 build: set ZSTD_LEGACY_SUPPORT=0 in remaining build systems
Summary:
Completes the transition to disabled legacy support by default across all build systems. This follows up on the previous Makefile and CMake changes to ensure consistent default behavior regardless of the build system used.

Updated build configurations: Meson, tests/Makefile, Visual Studio 2008/2010 projects, and BUCK.

Test Plan:
Verified changes compile correctly via `make lib-release`. Build system configurations have been updated consistently across all platforms.
2025-12-18 13:25:47 -08:00
Yann Collet
6c3e805e50 doc: legacy support is now disabled by default 2025-12-18 13:19:11 -08:00
Yann Collet
073c7fb6ea update dev version number to v1.6.0
to reflect the relatively large change in scope from removing support for legacy formats.
2025-12-18 13:13:56 -08:00
Yann Collet
38cce02684 Makefile: remove support of legacy formats by default
can still be changed manually by setting `ZSTD_LEGACY_SUPPORT` to a different value
2025-12-18 12:59:14 -08:00
Lukas Kollmer
88ff5c2769 modulemap: remove config_macros 2025-11-25 16:38:08 +01:00
Arpad Panyik
0dffae42e3 AArch64: Remove 32-bit code from ZSTD_decodeSequence
Remove the 32-bit code paths from the AArch64 only sections of
ZSTD_decodeSequence.
2025-10-08 18:59:24 +00:00
Arpad Panyik
33618c89e5 AArch64: Revert previous branch optimization
Revert a branch optimization that was based on an incorrect
assumption in the AArch64 part of ZSTD_decodeSequence. In extreme
cases the existing implementation could lead to data corruption.

Insert an UNLIKELY hint to guide the compilers toward generating more
efficient machine code.
2025-10-08 18:58:45 +00:00
ZijianLi
87cc127705 - Modify the GCC version used for CI testing of the RISCV architecture
- Fix a bug in the ZSTD_row_getRVVMask function
- Improve the performance of ZSTD_copy16()
2025-09-26 22:34:57 +08:00
Yann Collet
17888b3fbe fix minor initialization warnings 2025-09-24 22:08:03 -07:00
Yann Collet
c15fa3cd40 update documentation of ZSTD_getFrameContentSize()
hopefully answering #4495
2025-09-23 23:17:11 -07:00
Yann Collet
4c1f86c777 fix minor warning in legacy decoders
for mingw + clang CI test
2025-09-23 13:01:38 -07:00
Yann Collet
be072c708e Added documentation details for Makefile installation and pkg-config. 2025-09-20 16:33:41 +00:00
Yann Collet
085cc9319a Merge pull request #4486 from rlefko/fix-pthread-init-memleak
Fix memory leak in pthread init functions on failure
2025-09-19 21:42:21 -08:00
Ryan Lefkowitz
c59812e558 🔧 Fix memory leak in pthread init functions on failure
When pthread_mutex_init() or pthread_cond_init() fails in the debug
implementation (DEBUGLEVEL >= 1), the previously allocated memory was
not freed, causing a memory leak.

This fix ensures that allocated memory is properly freed when pthread
initialization functions fail, preventing resource leaks in error
conditions.

The issue affects:
- ZSTD_pthread_mutex_init() at lib/common/threading.c:146
- ZSTD_pthread_cond_init() at lib/common/threading.c:167

This is particularly important for long-running applications or
scenarios with resource constraints where pthread initialization
might fail due to system limits.
2025-09-15 18:20:01 -04:00
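
The leak pattern described in this commit can be sketched as follows. This is a minimal illustration and not the code in lib/common/threading.c: `boxed_init`, `init_fn`, `init_ok`, and `init_fail` are hypothetical names, and the callback stands in for pthread_mutex_init() or pthread_cond_init() so that a failure can be forced deterministically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for pthread_mutex_init() / pthread_cond_init():
 * returns 0 on success, non-zero on failure. */
typedef int (*init_fn)(void* object);

/* Allocate an object on the heap, then initialize it.
 * On any failure, *out is left NULL and nothing stays allocated. */
static int boxed_init(void** out, size_t size, init_fn init)
{
    assert(out != NULL);
    *out = malloc(size);
    if (*out == NULL) return 1;   /* allocation itself failed */
    {
        int const ret = init(*out);
        if (ret != 0) {
            /* The step that was missing: release the allocation
             * when the initialization function reports an error. */
            free(*out);
            *out = NULL;
        }
        return ret;
    }
}

/* Two hypothetical initializers used to exercise both paths. */
static int init_ok(void* object)   { (void)object; return 0; }
static int init_fail(void* object) { (void)object; return 22; }
```

Resetting the pointer to NULL after the free also keeps a later destroy call from touching freed memory, assuming the destroy path checks for NULL.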
w1m024
fb7a86f20f Refactor ZSTD_row_getMatchMask for RVV optimization
Performance (vs. SWAR)
- 16-byte data: 5.87x speedup
- 32-byte data: 9.63x speedup
- 64-byte data: 17.98x speedup

Co-authored-by: gong-flying <gongxiaofei24@iscas.ac.cn>
2025-09-11 20:45:54 +00:00
w1m024
c9d2cbd5ba add RVV optimization for ZSTD_row_getMatchMask
Co-authored-by: gong-flying <gongxiaofei24@iscas.ac.cn>
2025-09-09 06:20:55 +00:00
Yann Collet
b5c294ea01 Merge pull request #4440 from arpadpanyik-arm/convert_seq_sve2
AArch64: Add SVE2 path for convertSequences_noRepcodes
2025-08-21 17:20:33 -07:00
Arpad Panyik
2849f3a5d1 AArch64: Add SVE2 path for convertSequences_noRepcodes
Add an 8-way vector length agnostic (VLA) SVE2 code path for
convertSequences_noRepcodes. It works with any SVE vector length.

Relative performance to GCC-13 using: `./fullbench -b18 -l5 enwik5`

               Neon      SVE2
Neoverse-V2   before     after    uplift
GCC-13:      100.000%  103.209%   1.032x
GCC-14:      100.309%  134.872%   1.344x
GCC-15:      100.355%  134.827%   1.343x
Clang-18:    123.614%  128.565%   1.040x
Clang-19:    123.587%  132.984%   1.076x
Clang-20:    123.629%  133.023%   1.075x

               Neon      SVE2
Cortex-A720   before     after    uplift
GCC-13:      100.000%  116.032%   1.160x
GCC-14:       99.700%  116.648%   1.169x
GCC-15:      100.354%  117.047%   1.166x
Clang-18:    100.447%  116.762%   1.162x
Clang-19:    100.454%  116.627%   1.160x
Clang-20:    100.452%  116.649%   1.161x
2025-08-21 17:37:41 +00:00
Yann Collet
290e692ef8 Merge pull request #4463 from brad0/gnu_source_qsort
Check for build environment instead of just _GNU_SOURCE
2025-08-21 09:30:29 -07:00
Thirumalai Nagalingam
42243c3d46 CI: Update build_package.bat for CMake builds 2025-08-20 17:12:05 +05:30
Brad Smith
0d1f8de9ad Check for build environment instead of just _GNU_SOURCE
Fixes the build on OpenBSD and NetBSD. It is too easy for _GNU_SOURCE
to be defined even on non-Linux systems. Found via py-zstandard with
the embedded copy of zstandard and Python defines _GNU_SOURCE.

Also simplify the Linux checking, there is no need to check the rest
of the symbol names.
2025-08-19 20:06:24 -04:00
Yann Collet
40c285e0ba Merge pull request #4419 from AZero13/patch-1
Check for job before releasing resources
2025-08-19 17:02:48 -07:00
Yann Collet
e128976193 Merge pull request #4448 from Cyan4973/install_oses
regroup list of OSes for install inside common variable
2025-07-28 11:01:58 -08:00
Yann Collet
8bca04ba9f regroup list of OSes for install inside common variable
within lib/install_oses.mk.

fixes #4445
2025-07-28 11:33:22 -07:00
Yann Collet
34f3a0ab11 Merge pull request #4413 from arpadpanyik-arm/huf_decode2x
AArch64: Enhance struct access in Huffman decode 2X
2025-07-23 15:03:37 -08:00
Yann Collet
6f1cb87ade Merge pull request #4443 from facebook/opt_simplify_4442
simplify sequence resolution in zstd_opt
2025-07-23 15:01:36 -08:00
Yann Collet
0055ce7a02 simplify sequence resolution in zstd_opt
initially hinted by @pitaj in #4442
2025-07-18 21:21:47 -07:00
Yann Collet
f9e26bb42b Merge pull request #4394 from AZero13/zstd
Remove redundant setting of allJobsCompleted to 1
2025-07-18 18:55:47 -08:00
Yann Collet
8c651868ff Merge pull request #4418 from arpadpanyik-arm/decode_seq_opt
AArch64: Improve ZSTD_decodeSequence performance
2025-07-18 18:54:49 -08:00
Yann Collet
a1e11db08a Merge pull request #4435 from zijianli1234/dev
add riscv ci
2025-07-18 18:54:24 -08:00
Arpad Panyik
07cd78d366 AArch64: Add Neon path for convertSequences_noRepcodes
Add a 4-way Neon implementation for the convertSequences_noRepcodes
function. Remove 'static' keywords from all of its implementations to
be able to add unit tests.

Relative performance to Clang-18 using: `./fullbench -b18 -l5 enwik5`

Neoverse-V2   before     after
Clang-18:    100.000%  311.703%
Clang-19:    100.191%  311.714%
Clang-20:    100.181%  311.723%
GCC-13:      107.520%  252.309%
GCC-14:      107.652%  253.158%
GCC-15:      107.674%  253.168%

Cortex-A720   before     after
Clang-18:    100.000%  204.512%
Clang-19:    102.825%  204.600%
Clang-20:    102.807%  204.558%
GCC-13:      110.668%  203.594%
GCC-14:      110.684%  203.978%
GCC-15:      102.864%  204.299%

Co-authored-by: Thomas Daubney <Thomas.Daubney@arm.com>
2025-07-10 18:20:57 +00:00
Arpad Panyik
8e4400463a Improve ZSTD_get1BlockSummary
Add a faster scalar implementation of ZSTD_get1BlockSummary which
removes the data dependency of the accumulators in the hot loop to
leverage the superscalar potential of recent out-of-order CPUs.
The new algorithm leverages SWAR (SIMD Within A Register) methodology
to exploit the capabilities of 64-bit architectures. It achieves this
by packing two 32-bit data elements into a single 64-bit register,
enabling parallel operations on these subcomponents while ensuring
that the 32-bit boundaries prevent overflow, thereby optimizing
computational efficiency.

Corresponding unit tests are included.

Relative performance to GCC-13 using: `./fullbench -b19 -l5 enwik5`

Neoverse-V2   before     after
GCC-13:      100.000%  290.527%
GCC-14:      100.000%  291.714%
GCC-15:       99.914%  291.495%
Clang-18:    148.072%  264.524%
Clang-19:    148.075%  264.512%
Clang-20:    148.062%  264.490%

Cortex-A720   before     after
GCC-13:      100.000%  235.261%
GCC-14:      101.064%  234.903%
GCC-15:      112.977%  218.547%
Clang-18:    127.135%  180.359%
Clang-19:    127.149%  180.297%
Clang-20:    127.154%  180.260%

Co-authored-by: Thomas Daubney <Thomas.Daubney@arm.com>
2025-07-10 18:20:49 +00:00
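
The SWAR idea in this commit message can be sketched roughly as below. This is an illustration under stated assumptions, not the actual ZSTD_get1BlockSummary code: `SeqSketch`, `SummarySketch`, `pack_pair`, and `sum_swar` are hypothetical names, and each running total is assumed to stay below 2^32 so that the low lane never carries into the high lane.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical element type: two 32-bit values to be summed separately. */
typedef struct { uint32_t litLength; uint32_t matchLength; } SeqSketch;
typedef struct { uint32_t litTotal;  uint32_t matchTotal;  } SummarySketch;

/* Pack both 32-bit values into one 64-bit word: low lane and high lane. */
static uint64_t pack_pair(SeqSketch s)
{
    return (uint64_t)s.litLength | ((uint64_t)s.matchLength << 32);
}

/* Sum both fields with a single 64-bit addition per element.
 * Two accumulators are used so consecutive additions do not depend
 * on each other, which lets an out-of-order core overlap them.
 * Assumption: each total fits in 32 bits, so lanes never overflow. */
static SummarySketch sum_swar(const SeqSketch* seqs, size_t n)
{
    uint64_t acc0 = 0, acc1 = 0;
    size_t i = 0;
    SummarySketch r;
    assert(seqs != NULL || n == 0);
    for (; i + 2 <= n; i += 2) {
        acc0 += pack_pair(seqs[i]);
        acc1 += pack_pair(seqs[i + 1]);
    }
    if (i < n) acc0 += pack_pair(seqs[i]);   /* odd leftover element */
    acc0 += acc1;                            /* merge the two chains */
    r.litTotal   = (uint32_t)(acc0 & 0xFFFFFFFFu);
    r.matchTotal = (uint32_t)(acc0 >> 32);
    return r;
}
```

For example, summing the pairs (1, 10), (2, 20), (3, 30) yields a literal total of 6 and a match total of 60.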
ZijianLi
d04e7944dd add compiler version check. 2025-07-07 23:07:39 +08:00
ZijianLi
2c3f23b018 fix dereferencing type-punned pointer error 2025-06-29 15:36:25 +08:00
Rose
4efbd56749 Check for job before releasing
ZSTDMT_freeCCtx calls ZSTDMT_releaseAllJobResources. When ZSTDMT_freeCCtx is called after a failed initialization, ZSTDMT_releaseAllJobResources can therefore run on a partially initialized context, resulting in a NULL pointer dereference.
2025-06-24 14:05:08 -04:00
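
A guard of the kind this commit describes might look like the sketch below. The types and functions (`JobSketch`, `CtxSketch`, `release_all_jobs`, `free_ctx`) are hypothetical and only mirror the shape of the problem: a free function that releases per-job resources must tolerate a context whose job table was never allocated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified context: a table of jobs owning buffers. */
typedef struct { void* buffer; } JobSketch;
typedef struct { JobSketch* jobs; unsigned jobCount; } CtxSketch;

/* Release every job's buffer; returns the number of jobs visited. */
static unsigned release_all_jobs(CtxSketch* ctx)
{
    unsigned n;
    assert(ctx != NULL);
    /* The guard: initialization may have failed before the job
     * table existed, so there is nothing to walk in that case. */
    if (ctx->jobs == NULL) return 0;
    for (n = 0; n < ctx->jobCount; n++) {
        free(ctx->jobs[n].buffer);
        ctx->jobs[n].buffer = NULL;
    }
    return ctx->jobCount;
}

/* Safe to call on NULL and on a partially initialized context. */
static void free_ctx(CtxSketch* ctx)
{
    if (ctx == NULL) return;
    release_all_jobs(ctx);
    free(ctx->jobs);
    free(ctx);
}
```

Without the NULL check, a context with a non-zero `jobCount` but no job table would dereference a NULL pointer inside the loop.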
Rose
50f169411b Remove redundant setting of allJobsCompleted to 1
This will do it automatically.
2025-06-24 14:04:21 -04:00
Arpad Panyik
a28e8182b1 AArch64: Improve ZSTD_decodeSequence performance
LLVM's alias-analysis sometimes fails to see that a static-array member
of a struct cannot alias other members. This patch:

- Reduces array accesses via struct indirection to aid load/store alias
  analysis under Clang.
- Converts dynamic array indexing into conditional-move arithmetic,
  eliminating branches and extra loads/stores on out-of-order CPUs.
- Reloads the bitstream only when match-length bits are consumed
  (assuming each reload only needs to happen once per match-length
  read), improving branch-prediction rates.
- Removes the UNLIKELY() hint, which recent compilers already handle
  well without cost.

Decompression uplifts on a Neoverse V2 system, using Zstd-1.5.8
compiled with "-O3 -march=armv8.2-a+sve2":

                 Clang-19  Clang-20   Clang-*    GCC-14    GCC-15
 1#silesia.tar:  +11.556%  +16.203%   +0.240%   +2.216%   +7.891%
 2#silesia.tar:  +15.493%  +21.140%   -0.041%   +2.850%   +9.926%
 3#silesia.tar:  +16.887%  +22.570%   -0.183%   +3.056%  +10.660%
 4#silesia.tar:  +17.785%  +23.315%   -0.262%   +3.343%  +11.187%
 5#silesia.tar:  +18.125%  +24.175%   -0.466%   +3.350%  +11.228%
 6#silesia.tar:  +17.607%  +23.339%   -0.591%   +3.175%  +10.851%
 7#silesia.tar:  +17.463%  +22.837%   -0.486%   +3.292%  +10.868%

* Requires Clang-21 support from LLVM commit hash
  `a53003fe23cb6c871e72d70ff2d3a075a7490da2`
   (Clang-21 hasn’t been released as of this writing)

Co-authored-by: David Sherwood <David.Sherwood@arm.com>
Co-authored-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
2025-06-24 12:22:23 +00:00
Arpad Panyik
bd38fc2c5f AArch64: Enhance struct access in Huffman decode 2X
In the multi-stream multi-symbol Huffman decoder GCC generates
suboptimal code - emitting more loads for HUF_DEltX2 struct member
accesses. Forcing it to use 32-bit loads and bit arithmetic to extract
the necessary parts (UBFX) improves the overall decode speed.

Also avoid integer type conversions in the symbol decodes, which
leads to better instruction selection in table lookup accesses.

On AArch64 the decoder no longer runs into register-pressure limits,
so we can simplify the hot path and improve throughput.

Decompression uplifts on a Neoverse V2 system, using Zstd-1.5.8
compiled with "-O3 -march=armv8.2-a+sve2":

                 Clang-20   Clang-*    GCC-13    GCC-14    GCC-15
 1#silesia.tar:   +0.820%   +1.365%   +2.480%   +1.348%   +0.987%
 2#silesia.tar:   +0.426%   +0.784%   +1.218%   +0.665%   +0.554%
 3#silesia.tar:   +0.112%   +0.389%   +0.508%   +0.188%   +0.261%

* Requires Clang-21 support from LLVM commit hash
  `a53003fe23cb6c871e72d70ff2d3a075a7490da2`
  (Clang-21 hasn’t been released as of this writing)
2025-06-23 14:16:25 +00:00
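
The "one 32-bit load plus bit arithmetic" approach mentioned in this commit can be illustrated with the sketch below. It assumes a little-endian target and a 4-byte table entry with a 16-bit field followed by two 8-bit fields; `EntrySketch` and the helper names are hypothetical, and the actual decoder code may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-byte decode-table entry. */
typedef struct { uint16_t sequence; uint8_t nbBits; uint8_t length; } EntrySketch;

/* Compile-time check that the entry really occupies 4 bytes. */
typedef char entry_is_4_bytes[(sizeof(EntrySketch) == 4) ? 1 : -1];

/* Read the whole entry with one 32-bit load.
 * memcpy avoids aliasing problems and compiles to a plain load. */
static uint32_t load_entry32(const EntrySketch* e)
{
    uint32_t v;
    assert(e != NULL);
    memcpy(&v, e, sizeof(v));
    return v;
}

/* Field extraction by shift and mask (little-endian layout assumed).
 * On AArch64 each of these can map to a single UBFX instruction. */
static uint32_t entry_sequence(uint32_t v) { return v & 0xFFFFu; }
static uint32_t entry_nbBits(uint32_t v)   { return (v >> 16) & 0xFFu; }
static uint32_t entry_length(uint32_t v)   { return v >> 24; }
```

The point of the pattern is that the compiler sees one load and three register-only extractions instead of up to three separate narrow loads.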
Arpad Panyik
1e9d2006ae AArch64: Use better block copy8
The vector copy is only necessary for 16-byte blocks on AArch64.

Decompression uplifts on a Neoverse V2 system, using Zstd-1.5.8
compiled with "-O3 -march=armv8.2-a+sve2":

                 Clang-19  Clang-20    GCC-14    GCC-15
 1#silesia.tar:   +0.316%   +0.865%   +0.025%   +0.096%
 2#silesia.tar:   +0.689%   +1.374%   +0.027%   +0.065%
 3#silesia.tar:   +0.811%   +1.654%   +0.034%   +0.033%
 4#silesia.tar:   +0.912%   +1.755%   +0.027%   +0.042%
 5#silesia.tar:   +0.995%   +1.826%   +0.062%   +0.094%
 6#silesia.tar:   +0.976%   +1.777%   +0.065%   +0.104%
 7#silesia.tar:   +0.910%   +1.738%   +0.077%   +0.110%
2025-06-20 17:05:41 +00:00
Yann Collet
7eefc22169 Merge pull request #4367 from ClickHouse/cfi
Add unwind information in huf_decompress_amd64.S
2025-06-19 23:41:38 -07:00
Arpad Panyik
7e4937bc75 AArch64: Add SVE2 implementation of histogram computation
The existing scalar implementation uses a 4-way pipelined histogram
calculation which is very efficient on out-of-order CPUs. However,
this can be further accelerated using the SVE2 HISTSEG instructions -
which compute a histogram for 16 byte chunks in a vector register.

On a system with 128-bit vectors (VL128) we need 16 HISTSEG executions
to compute the histogram for the whole symbol space (0..255) of 16
bytes of input. However, we can only accumulate 15 such 16-byte strips
before a possible overflow, so we need to widen the 8-bit histogram
accumulators to 16 bits and save them after every 240-byte chunk of input.
To keep everything in registers we would need 32 128-bit registers. Longer
SVE2 vectors could help here, if such machines become available.

The maximum input block size in Zstd is 128 KiB, so 16-bit accumulators
would not be enough on their own. However, an LZ pass precedes the
histogram calculation, so it should be impossible (my assumption) to
overflow the 16-bit accumulators.

The symbol distribution is also not uniform: the lower values are more
common, so we used a 3-pass algorithm to prevent stack spilling. In the
first pass we only compute histograms for 64 symbols (4-way SIMD) while
also computing the maximum symbol value. If we have symbol values
larger than 64 we start the second pass to compute the next 96 elements
of the histogram. The final pass calculates the remaining part of the
histogram (256 symbols in total) if needed. This split of histogram
generation gave the best overall performance.

This implementation is the best performing of a number of different
cache blocking schemes tested.

Compression uplifts on a Neoverse V2 system, using Zstd-1.5.8
(e26dde3d) as a baseline, compiled with "-O3 -march=armv8.2-a+sve2":

                 Clang-20    GCC-14
 1#silesia.tar:   +6.173%   +5.987%
 2#silesia.tar:   +5.200%   +5.011%
 3#silesia.tar:   +4.332%   +5.031%
 4#silesia.tar:   +2.789%   +3.064%
 5#silesia.tar:   +2.028%   +1.838%
 6#silesia.tar:   +1.562%   +1.340%
 7#silesia.tar:   +1.160%   +0.959%
2025-06-11 12:14:22 +00:00
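
The overflow arithmetic behind the 240-byte flush interval can be checked with a scalar sketch. This is not the SVE2 implementation: `histogram_narrow` and the macro names are hypothetical, plain C loops replace HISTSEG, and the wide accumulators are 32-bit here (rather than the 16-bit ones described above) so the sketch stays correct for any input size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STRIP_BYTES      16   /* bytes counted per HISTSEG-like step */
#define STRIPS_PER_FLUSH 15   /* 15 * 16 = 240 <= 255: a uint8_t cannot wrap */
#define CHUNK_BYTES      (STRIP_BYTES * STRIPS_PER_FLUSH)

/* Count byte values using 8-bit accumulators that are flushed into
 * wider counters after at most 240 bytes of input. */
static void histogram_narrow(const uint8_t* src, size_t size, uint32_t wide[256])
{
    uint8_t narrow[256];
    size_t pos = 0;
    assert(src != NULL || size == 0);
    memset(wide, 0, 256 * sizeof(wide[0]));
    while (pos < size) {
        size_t const left  = size - pos;
        size_t const chunk = (left < CHUNK_BYTES) ? left : CHUNK_BYTES;
        size_t i;
        memset(narrow, 0, sizeof(narrow));
        /* Worst case: all bytes in the chunk are equal, giving a
         * count of 240, which still fits in 8 bits. */
        for (i = 0; i < chunk; i++) narrow[src[pos + i]]++;
        /* Widen and accumulate before the narrow counters can wrap. */
        for (i = 0; i < 256; i++) wide[i] += narrow[i];
        pos += chunk;
    }
}
```

A sixteenth strip would allow a single counter to reach 256, which does not fit in 8 bits; that is why the flush happens after 15 strips.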
Michael Kolupaev
a480191f9e Fix Darwin build of huf_decompress_amd64.S 2025-06-08 05:07:09 +00:00
Michael Kolupaev
80cac404c7 Add unwind information in huf_decompress_amd64.S 2025-06-08 05:07:09 +00:00
李子建
d95123f2e6 Improve speed of ZSTD_compressSequencesAndLiterals() using RVV 2025-06-02 17:21:02 +08:00
Nobuhiro Iwamatsu
2d224dc745 Add License variable to pkg-config file
The pkg-config file format has a License variable that declares the license of
the software. This sets License to 'BSD-3-Clause OR GPL-2.0-only'.

Ref: https://github.com/pkgconf/pkgconf/blob/master/man/pc.5#L116
Signed-off-by: Nobuhiro Iwamatsu <iwamatsu@nigauri.org>
2025-05-06 12:16:28 -07:00
Etienne Cordonnier
8929d3b09f Fix duplicate LC_RPATH error on MacOS
After the update to MacOS 15.4, the dynamic loader dyld treats duplicated LC_RPATH as an error.
The `FLAGS` variable already contains `LDFLAGS`, thus using both `FLAGS` and `LDFLAGS`
duplicates all `LDFLAGS`, including `-Wl,rpath` parameters.

The duplicate LC_RPATH causes errors like this:

```
dyld[29361]: Library not loaded: @loader_path/../lib/libzstd.1.dylib
      Referenced from: <7131C877-3CF0-33AC-AA05-257BA4FDD770> /Users/foobar/...
      Reason: tried: '/Users/foobar/..../lib/libzstd.1.dylib' (duplicate LC_RPATH '/usr/mypath.../lib')
```

Closes https://github.com/facebook/zstd/issues/4369

Signed-off-by: Etienne Cordonnier <ecordonnier@snap.com>
2025-04-18 15:59:06 +02:00
Yann Collet
2fec3989c1 add an assert
to help static analyzers understand there is no overflow risk there.
2025-03-22 18:23:31 -07:00
Z. Liu
cd8ca9d92e lib/zstd.h: move pragma before static
Otherwise, building dev-python/zstandard with clang fails,
as reported at https://bugs.gentoo.org/950259

The root cause is in pycparser, which has remained unfixed since it was
reported 2.5 years ago :(

Signed-off-by: Z. Liu <zhixu.liu@gmail.com>
2025-03-20 03:40:42 +00:00