This is a bug in the streaming implementation of the v0.5 decoder.
The bug has always been there, but it requires an uncommon block
configuration which wasn't tested at the time.
v0.5 is deprecated now; the latest version to produce this format is
v0.5.1 from February 2016, and it was superseded in April 2016,
so the format is both short-lived and very old.
Another PR will remove support for this format, but it will still be
possible to explicitly request this support on demand, so it's better
to fix the issue.
Summary:
Completes the transition to disabling legacy support by default across all build systems. This follows up on the previous Makefile and CMake changes to ensure consistent default behavior regardless of the build system used.
Updated build configurations: Meson, tests/Makefile, Visual Studio 2008/2010 projects, and BUCK.
Test Plan:
Verified changes compile correctly via `make lib-release`. Build system configurations have been updated consistently across all platforms.
Revert a branch optimization that was based on an incorrect
assumption in the AArch64 part of ZSTD_decodeSequence. In extreme
cases the existing implementation could lead to data corruption.
Insert an UNLIKELY hint to guide the compilers toward generating more
efficient machine code.
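For context, such a hint is usually a thin wrapper around a compiler builtin. A minimal sketch, assuming a GCC/Clang-style `__builtin_expect` wrapper rather than the project's exact macro:
```c
/* Minimal sketch of an UNLIKELY-style hint, assuming GCC/Clang builtins;
 * not necessarily the project's exact macro definition. */
#if defined(__GNUC__) || defined(__clang__)
#  define UNLIKELY(expr) __builtin_expect(!!(expr), 0)
#else
#  define UNLIKELY(expr) (expr)
#endif

/* Usage: mark a branch as rarely taken so the compiler keeps the common
 * path contiguous in the generated machine code. */
int check_input(int bitsLeft)
{
    if (UNLIKELY(bitsLeft < 0)) return -1;   /* rare error path */
    return 0;                                /* hot path */
}
```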
When pthread_mutex_init() or pthread_cond_init() fails in the debug
implementation (DEBUGLEVEL >= 1), the previously allocated memory was
not freed, causing a memory leak.
This fix ensures that allocated memory is properly freed when pthread
initialization functions fail, preventing resource leaks in error
conditions.
The issue affects:
- ZSTD_pthread_mutex_init() at lib/common/threading.c:146
- ZSTD_pthread_cond_init() at lib/common/threading.c:167
This is particularly important for long-running applications or
scenarios with resource constraints where pthread initialization
might fail due to system limits.
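A minimal sketch of the corrected error path, using hypothetical wrapper names rather than the actual threading.c code, and assuming the debug build heap-allocates the mutex:
```c
#include <pthread.h>
#include <stdlib.h>

typedef struct { pthread_mutex_t* mtx; } debug_mutex_t;  /* illustrative wrapper */

int debug_mutex_init(debug_mutex_t* m)
{
    m->mtx = (pthread_mutex_t*)malloc(sizeof(pthread_mutex_t));
    if (m->mtx == NULL) return 1;             /* allocation failed */
    {   int const ret = pthread_mutex_init(m->mtx, NULL);
        if (ret != 0) {
            free(m->mtx);                     /* previously leaked on failure */
            m->mtx = NULL;
        }
        return ret;
    }
}
```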
Fixes the build on OpenBSD and NetBSD. It is too easy for _GNU_SOURCE
to be defined even on non-Linux systems; this was found via
py-zstandard, which ships an embedded copy of zstd, while Python
defines _GNU_SOURCE.
Also simplify the Linux check: there is no need to check the rest
of the symbol names.
Add a 4-way Neon implementation of the convertSequences_noRepcodes
function. Remove the 'static' keyword from all of its implementations
so that unit tests can be added.
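As a rough illustration of the 4-way style only, and not the actual convertSequences_noRepcodes kernel, a Neon loop can process four 32-bit elements per iteration:
```c
#include <arm_neon.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative 4-way Neon loop: add a constant bias to four 32-bit
 * values per iteration; a scalar loop handles the tail. */
void add_bias_4way(uint32_t* dst, const uint32_t* src, size_t n, uint32_t bias)
{
    uint32x4_t const vbias = vdupq_n_u32(bias);
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        uint32x4_t const v = vld1q_u32(src + i);
        vst1q_u32(dst + i, vaddq_u32(v, vbias));
    }
    for (; i < n; i++) dst[i] = src[i] + bias;
}
```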
Relative performance to Clang-18 using: `./fullbench -b18 -l5 enwik5`
Neoverse-V2 before after
Clang-18: 100.000% 311.703%
Clang-19: 100.191% 311.714%
Clang-20: 100.181% 311.723%
GCC-13: 107.520% 252.309%
GCC-14: 107.652% 253.158%
GCC-15: 107.674% 253.168%
Cortex-A720 before after
Clang-18: 100.000% 204.512%
Clang-19: 102.825% 204.600%
Clang-20: 102.807% 204.558%
GCC-13: 110.668% 203.594%
GCC-14: 110.684% 203.978%
GCC-15: 102.864% 204.299%
Co-authored-by: Thomas Daubney <Thomas.Daubney@arm.com>
Add a faster scalar implementation of ZSTD_get1BlockSummary which
removes the data dependency between the accumulators in the hot loop
to exploit the superscalar potential of recent out-of-order CPUs.
The new algorithm uses SWAR (SIMD Within A Register) to exploit the
capabilities of 64-bit architectures: it packs two 32-bit data
elements into a single 64-bit register, so one 64-bit operation
advances both halves in parallel, while the 32-bit boundaries keep
the halves from overflowing into each other.
Corresponding unit tests are included.
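A minimal SWAR sketch of the packing idea, with illustrative field names rather than the actual ZSTD_get1BlockSummary code:
```c
#include <stddef.h>
#include <stdint.h>

/* Two 32-bit running sums live in the low and high halves of one 64-bit
 * accumulator, so a single 64-bit add advances both sums in parallel.
 * This stays correct as long as each half is guaranteed not to reach 2^32. */
typedef struct { uint32_t sumA; uint32_t sumB; } two_sums_t;  /* hypothetical */

two_sums_t swar_two_sums(const uint32_t* a, const uint32_t* b, size_t n)
{
    uint64_t acc = 0;   /* low 32 bits: sum of a[], high 32 bits: sum of b[] */
    size_t i;
    for (i = 0; i < n; i++) {
        acc += (uint64_t)a[i] | ((uint64_t)b[i] << 32);
    }
    {   two_sums_t r;
        r.sumA = (uint32_t)acc;
        r.sumB = (uint32_t)(acc >> 32);
        return r;
    }
}
```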
Relative performance to GCC-13 using: `./fullbench -b19 -l5 enwik5`
Neoverse-V2 before after
GCC-13: 100.000% 290.527%
GCC-14: 100.000% 291.714%
GCC-15: 99.914% 291.495%
Clang-18: 148.072% 264.524%
Clang-19: 148.075% 264.512%
Clang-20: 148.062% 264.490%
Cortex-A720 before after
GCC-13: 100.000% 235.261%
GCC-14: 101.064% 234.903%
GCC-15: 112.977% 218.547%
Clang-18: 127.135% 180.359%
Clang-19: 127.149% 180.297%
Clang-20: 127.154% 180.260%
Co-authored-by: Thomas Daubney <Thomas.Daubney@arm.com>
ZSTDMT_freeCCtx calls ZSTDMT_releaseAllJobResources. When ZSTDMT_freeCCtx is invoked after a failed initialization, ZSTDMT_releaseAllJobResources can end up dereferencing a NULL pointer.
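A minimal sketch of the defensive pattern, with hypothetical types and names rather than the actual ZSTDMT code:
```c
#include <stdlib.h>

typedef struct { void** jobs; unsigned nbJobs; } mt_ctx_t;  /* illustrative */

/* Bail out early if initialization never allocated the job table, so the
 * cleanup path cannot dereference NULL. */
void release_all_job_resources(mt_ctx_t* ctx)
{
    unsigned u;
    if (ctx == NULL || ctx->jobs == NULL) return;
    for (u = 0; u < ctx->nbJobs; u++) {
        free(ctx->jobs[u]);
        ctx->jobs[u] = NULL;
    }
}

void free_ctx(mt_ctx_t* ctx)
{
    if (ctx == NULL) return;
    release_all_job_resources(ctx);   /* safe even after a failed init */
    free(ctx->jobs);
    free(ctx);
}
```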
LLVM's alias-analysis sometimes fails to see that a static-array member
of a struct cannot alias other members. This patch:
- Reduces array accesses via struct indirection to aid load/store alias
analysis under Clang.
- Converts dynamic array indexing into conditional-move arithmetic
  (sketched after this list), eliminating branches and extra
  loads/stores on out-of-order CPUs.
- Reloads the bitstream only when match-length bits are consumed
(assuming each reload only needs to happen once per match-length
read), improving branch-prediction rates.
- Removes the UNLIKELY() hint, which recent compilers already handle
well without cost.
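A sketch of the conditional-move transformation referenced in the list above, using hypothetical example values rather than the patched decoder code:
```c
#include <stdint.h>

/* Before: the load address depends on a data-dependent index, so each
 * access costs a table load. */
uint32_t select_indexed(const uint32_t bits[2], int useLong)
{
    return bits[useLong & 1];
}

/* After: both candidates are already in registers and one is chosen with
 * mask arithmetic, which compilers typically lower to csel/cmov. */
uint32_t select_cmov(uint32_t shortBits, uint32_t longBits, int useLong)
{
    uint32_t const mask = (uint32_t)0u - (uint32_t)(useLong & 1);  /* 0 or 0xFFFFFFFF */
    return (shortBits & ~mask) | (longBits & mask);
}
```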
Decompression uplifts on a Neoverse V2 system, using Zstd-1.5.8
compiled with "-O3 -march=armv8.2-a+sve2":
Clang-19 Clang-20 Clang-* GCC-14 GCC-15
1#silesia.tar: +11.556% +16.203% +0.240% +2.216% +7.891%
2#silesia.tar: +15.493% +21.140% -0.041% +2.850% +9.926%
3#silesia.tar: +16.887% +22.570% -0.183% +3.056% +10.660%
4#silesia.tar: +17.785% +23.315% -0.262% +3.343% +11.187%
5#silesia.tar: +18.125% +24.175% -0.466% +3.350% +11.228%
6#silesia.tar: +17.607% +23.339% -0.591% +3.175% +10.851%
7#silesia.tar: +17.463% +22.837% -0.486% +3.292% +10.868%
* Requires Clang-21 support from LLVM commit hash
`a53003fe23cb6c871e72d70ff2d3a075a7490da2`
(Clang-21 hasn’t been released as of this writing)
Co-authored-by: David Sherwood <David.Sherwood@arm.com>
Co-authored-by: Ola Liljedahl <Ola.Liljedahl@arm.com>
In the multi-stream, multi-symbol Huffman decoder, GCC generates
suboptimal code, emitting more loads than necessary for HUF_DEltX2
struct member accesses. Forcing it to use 32-bit loads and bit
arithmetic to extract the required fields (UBFX) improves the overall
decode speed.
Also avoid integer type conversions in the symbol decodes, which
leads to better instruction selection for the table lookup accesses.
On AArch64 the decoder no longer runs into register-pressure limits,
so we can simplify the hot path and improve throughput.
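A sketch of the load-then-extract pattern; the struct layout below is assumed for illustration and the field extraction assumes little-endian byte order, so this is not the exact HUF_DEltX2 handling:
```c
#include <stdint.h>
#include <string.h>

typedef struct { uint16_t sequence; uint8_t nbBits; uint8_t length; } elt_t; /* assumed layout */

/* One 32-bit load per table entry instead of separate member loads. */
static uint32_t elt_load_u32(const elt_t* table, uint32_t idx)
{
    uint32_t v;
    memcpy(&v, &table[idx], sizeof(v));
    return v;
}

/* Field extraction via shifts and masks; on AArch64 these typically
 * compile to UBFX bitfield extracts (little-endian assumed). */
static uint32_t elt_sequence(uint32_t v) { return v & 0xFFFFu; }
static uint32_t elt_nbBits(uint32_t v)   { return (v >> 16) & 0xFFu; }
static uint32_t elt_length(uint32_t v)   { return v >> 24; }
```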
Decompression uplifts on a Neoverse V2 system, using Zstd-1.5.8
compiled with "-O3 -march=armv8.2-a+sve2":
Clang-20 Clang-* GCC-13 GCC-14 GCC-15
1#silesia.tar: +0.820% +1.365% +2.480% +1.348% +0.987%
2#silesia.tar: +0.426% +0.784% +1.218% +0.665% +0.554%
3#silesia.tar: +0.112% +0.389% +0.508% +0.188% +0.261%
* Requires Clang-21 support from LLVM commit hash
`a53003fe23cb6c871e72d70ff2d3a075a7490da2`
(Clang-21 hasn’t been released as of this writing)
The existing scalar implementation uses a 4-way pipelined histogram
calculation which is very efficient on out-of-order CPUs. However,
this can be further accelerated using the SVE2 HISTSEG instructions,
which compute a histogram of a 16-byte chunk in a vector register.
On a system with 128-bit vectors (VL128) we need 16 HISTSEG executions
to compute the histogram over the whole symbol space (0..255) for 16
bytes of input. However, we can accumulate only 15 such 16-byte strips
before the 8-bit counters could overflow, so the 8-bit histogram
accumulators must be widened and saved to 16-bit after every 240-byte
chunk of input. To keep all accumulators in registers we would need
32 128-bit registers; longer SVE2 vectors could help here, if such
machines become available.
The maximum input block size in Zstd is 128 KiB, so 16-bit accumulators
would not be enough on their own. However, an LZ pass precedes the
histogram calculation, so it is impossible (my assumption) to overflow
the 16-bit accumulators.
The symbol distribution is also not uniform: lower values are more
common. We therefore use a 3-pass algorithm to prevent stack spilling.
The first pass computes histograms for only 64 symbols (4-way SIMD)
while also computing the maximum symbol value. If symbol values larger
than 64 are present, a second pass computes the next 96 elements of
the histogram, and a final pass calculates the remaining part of the
histogram (256 symbols in total) if needed. This split of the
histogram generation gave the best overall performance.
This implementation is the best performing of a number of different
cache blocking schemes tested.
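A scalar sketch of the chunk-and-widen schedule described above, with a plain byte count standing in for the HISTSEG strips; the real kernel uses SVE2 vector registers and intrinsics:
```c
#include <stddef.h>
#include <stdint.h>

/* Accumulate 8-bit partial counts over at most 240 bytes (15 strips of
 * 16 bytes), then widen them into the 16-bit totals, so the 8-bit
 * counters can never overflow. count16[] must be zero-initialized by
 * the caller. */
void histogram_240(const uint8_t* src, size_t srcSize, uint16_t count16[256])
{
    size_t pos = 0;
    while (pos < srcSize) {
        uint8_t count8[256] = {0};
        size_t const chunkEnd = (pos + 240 < srcSize) ? pos + 240 : srcSize;
        int s;
        for (; pos < chunkEnd; pos++) {
            count8[src[pos]]++;            /* stand-in for the HISTSEG strips */
        }
        for (s = 0; s < 256; s++) {        /* widen 8-bit partials to 16-bit */
            count16[s] = (uint16_t)(count16[s] + count8[s]);
        }
    }
}
```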
Compression uplifts on a Neoverse V2 system, using Zstd-1.5.8
(e26dde3d) as a baseline, compiled with "-O3 -march=armv8.2-a+sve2":
Clang-20 GCC-14
1#silesia.tar: +6.173% +5.987%
2#silesia.tar: +5.200% +5.011%
3#silesia.tar: +4.332% +5.031%
4#silesia.tar: +2.789% +3.064%
5#silesia.tar: +2.028% +1.838%
6#silesia.tar: +1.562% +1.340%
7#silesia.tar: +1.160% +0.959%
After the update to macOS 15.4, the dynamic loader dyld treats a
duplicated LC_RPATH as an error.
The `FLAGS` variable already contains `LDFLAGS`, so using both `FLAGS`
and `LDFLAGS` duplicates all `LDFLAGS`, including the `-Wl,-rpath`
parameters.
The duplicated LC_RPATH causes this kind of error:
```
dyld[29361]: Library not loaded: @loader_path/../lib/libzstd.1.dylib
Referenced from: <7131C877-3CF0-33AC-AA05-257BA4FDD770> /Users/foobar/...
Reason: tried: '/Users/foobar/..../lib/libzstd.1.dylib' (duplicate LC_RPATH '/usr/mypath.../lib')
```
Closes https://github.com/facebook/zstd/issues/4369
Signed-off-by: Etienne Cordonnier <ecordonnier@snap.com>
Otherwise dev-python/zstandard fails to build when compiling with
clang, as reported at https://bugs.gentoo.org/950259.
The root cause is pycparser, which has remained unfixed since it was
reported 2.5 years ago. :(
Signed-off-by: Z. Liu <zhixu.liu@gmail.com>