1
0
mirror of https://github.com/facebook/zstd.git synced 2025-07-06 15:45:37 +02:00
Commit Graph

124 Commits

Author SHA1 Message Date
1f83b7cfc4 fix a minor inefficiency in compress_superblock
and in `decodecorpus`:
the specific case `nbSeq=127` can be represented using the 1-byte format.
Note that both the 1-byte and the 2-bytes formats are valid to represent this case,
so there was no "error", produced data remains valid,
it's just that the 1-byte format is more efficient.

fix #3667

Credit to @ip7z for finding this issue.
2023-06-05 09:51:52 -07:00
fbd97f305a Deprecated bufferless and block level APIs
* Mark all bufferless and block level functions as deprecated
* Update documentation to suggest not using these functions
* Add `_deprecated()` wrappers for functions that we use internally and
  call those instead
2023-03-16 10:04:15 -07:00
329169189c Replace Huffman boolean args with flags bit set 2023-01-20 14:12:53 -08:00
0cc1b0cb22 Delete unused Huffman functions
Remove all Huffman functions that aren't used by zstd.
2023-01-20 14:12:53 -08:00
bcfb7ad03c refactor timefn
The timer storage type is no longer dependent on OS.
This will make it possible to re-enable posix precise timers
since the timer storage type will no longer be sensible to #include order.
See #3168 for details of pbs of previous interface.

Suggestion by @terrelln
2023-01-12 19:24:31 -08:00
8b130009e3 minor simplification refactoring for timefn
`UTIL_getSpanTimeMicro()` can be factored in a generic way,
reducing OS-dependent code.
2023-01-06 16:12:54 -08:00
40a7188130 Fix make clangbuild & add CI
Fix the errors for:
* `-Wdocumentation`
* `-Wconversion` except `-Wsign-conversion`
2022-12-21 17:31:04 -08:00
5d693cc38c Coalesce Almost All Copyright Notices to Standard Phrasing
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora -o -path ./tests/regression/data-cache -o -path ./tests/regression/cache \) -prune -o -type f); do sed -i '/Copyright .* \(Yann Collet\)\|\(Meta Platforms\)/ s/Copyright .*/Copyright (c) Meta Platforms, Inc. and affiliates./' $f; done

git checkout HEAD -- build/VS2010/libzstd-dll/libzstd-dll.rc build/VS2010/zstd/zstd.rc tests/test-license.py contrib/linux-kernel/test/include/linux/xxhash.h examples/streaming_compression_thread_pool.c lib/legacy/zstd_v0*.c lib/legacy/zstd_v0*.h
nano ./programs/windres/zstd.rc
nano ./build/VS2010/zstd/zstd.rc
nano ./build/VS2010/libzstd-dll/libzstd-dll.rc
```
2022-12-20 12:52:34 -05:00
8927f985ff Update Copyright Headers 'Facebook' -> 'Meta Platforms'
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora \) -prune -o -type f);
do
  sed -i 's/Facebook, Inc\./Meta Platforms, Inc. and affiliates./' $f;
done
```
2022-12-20 12:37:57 -05:00
7a18d709ae updated all names to offBase convention 2021-12-29 17:30:43 -08:00
681c81f06c abstracted storeSeq() sumtype numeric representation from decodecorpus.c 2021-12-28 11:58:33 -08:00
1aed962216 introduce macros STORE_OFFSET() and STORE_REPCODE()
this meant to abstract the sumtype representation required
to transfert `offcode` to `ZSTD_storeSeq()`.

Unfortunately, the sumtype numeric representation is currently a leaky abstraction
that has permeated many other parts of the code,
especially within `zstd_lazy.c` and also within `zstd_opt.c` and `zstd_compress.c`.

While this PR makes a good job a transfering a large nb of call sites
to using the new macros, there are still a few sites where this transformation is more complex,
or where the numeric representation itself it used "as is".

One of the problematics area is the decision to use the numeric format of the sumtype
within the match finders of `zstd_lazy`.

This commit doesn't change the behavior, it only introduces and employes the macros,
but eventually the resulting code remains identical.

At target, if the numeric representation of the sumtype can be completely abstracted
and no other part of the code depends on it,
it will be possible to move it towards something slightly more efficient.
2021-12-23 22:03:30 -08:00
aeff128331 change seqDef.offset into seqDef.offBase
to better reflect the value stored in this field.
2021-12-23 17:56:08 -08:00
e145b58cfd changed seqDef.matchLength into seqDef.mlBase
since this is effectively what is stored in this field (== matchLength - MINMATCH).
This makes it clearer what needs to be done when reading from / writing to this field.
2021-12-23 13:39:46 -08:00
b77fcac61f change ZSTD_storeSeq() interface to accept matchLength
instead of mlBase.

This removes the need to do `- MINMATCH` at every call site.

The new interface contract is checked with an `assert()`.
2021-12-23 12:03:33 -08:00
46f2710562 [HUF] Improve Huffman encoding speed
Improve Huffman encoding speed by 20% for gcc and 10% for clang.

| Compiler |     Benchmark     | Config  |   Dataset   | Ratio | Speed MB/s (dev) | Speed MB/s (huf-cspeed) | Speed MB/s (huf-cspeed - dev) |
|----------|-------------------|---------|-------------|-------|------------------|-------------------------|-------------------------------|
| gcc      | compress          | level_1 | enwik7      | 2.43  | 253.70           | 258.72                  | 2.0%                          |
| gcc      | compress          | level_1 | silesia     | 2.88  | 341.90           | 348.15                  | 1.8%                          |
| gcc      | compress_literals | level_1 | enwik7      | 1.49  | 761.83           | 912.76                  | 19.8%                         |
| gcc      | compress_literals | level_1 | silesia     | 1.28  | 754.83           | 902.37                  | 19.5%                         |
| gcc      | compress_literals | level_7 | enwik7      | 1.29  | 502.81           | 552.79                  | 9.9%                          |
| gcc      | compress_literals | level_7 | silesia     | 1.11  | 675.97           | 776.44                  | 14.9%                         |
| clang    | compress          | level_1 | enwik7      | 2.43  | 277.54           | 280.98                  | 1.2%                          |
| clang    | compress          | level_1 | silesia     | 2.88  | 369.98           | 375.46                  | 1.5%                          |
| clang    | compress_literals | level_1 | enwik7      | 1.49  | 828.83           | 918.41                  | 10.8%                         |
| clang    | compress_literals | level_1 | silesia     | 1.28  | 815.81           | 905.41                  | 11.0%                         |
| clang    | compress_literals | level_7 | enwik7      | 1.29  | 533.13           | 553.30                  | 3.8%                          |
| clang    | compress_literals | level_7 | silesia     | 1.11  | 714.52           | 775.38                  | 8.5%                          |
2021-07-27 15:10:35 -07:00
a494308ae9 [copyright][license] Switch to yearless copyright and some cleanup in the linux-kernel files
* Switch to yearless copyright per FB policy
* Fix up SPDX-License-Identifier lines in `contrib/linux-kernel` sources
* Add zstd copyright/license header to the `contrib/linux-kernel` sources
* Update the `tests/test-license.py` to check for yearless copyright
* Improvements to `tests/test-license.py`
* Check `contrib/linux-kernel` in `tests/test-license.py`
2021-03-30 10:30:43 -07:00
66e811d782 [license] Update year to 2021 2021-01-04 17:53:52 -05:00
5c0a3489a5 fix aliasing warning in decodecorpus 2020-12-04 19:21:40 -08:00
a90779397a [lib] Reduce zstd stack usage by 1KB 2020-09-09 14:35:39 -07:00
8f8bd2d1ac [regression] Update results.csv 2020-08-20 12:41:35 -07:00
cc7c29595d Fixed tests to use correct workspace size 2020-05-01 13:45:48 -07:00
ac58c8d720 Fix copyright and license lines
* All copyright lines now have -2020 instead of -present
* All copyright lines include "Facebook, Inc"
* All licenses are now standardized

The copyright in `threading.{h,c}` is not changed because it comes from
zstdmt.

The copyright and license of `divsufsort.{h,c}` is not changed.
2020-03-26 17:02:06 -07:00
e068bd01df [tests] Fix decodecorpus 2019-09-20 01:09:47 -07:00
a3ce0c9d04 Fixing decodecorpus test issue 2019-07-18 14:32:09 -07:00
734eff70b8 enable repeat mode on rle 2019-06-26 16:39:00 -07:00
a880ca239b Spelling (#1582)
* spelling: accidentally

* spelling: across

* spelling: additionally

* spelling: addresses

* spelling: appropriate

* spelling: assumed

* spelling: available

* spelling: builder

* spelling: capacity

* spelling: compiler

* spelling: compressibility

* spelling: compressor

* spelling: compression

* spelling: contract

* spelling: convenience

* spelling: decompress

* spelling: description

* spelling: deflate

* spelling: deterministically

* spelling: dictionary

* spelling: display

* spelling: eliminate

* spelling: preemptively

* spelling: exclude

* spelling: failure

* spelling: independence

* spelling: independent

* spelling: intentionally

* spelling: matching

* spelling: maximum

* spelling: meaning

* spelling: mishandled

* spelling: memory

* spelling: occasionally

* spelling: occurrence

* spelling: official

* spelling: offsets

* spelling: original

* spelling: output

* spelling: overflow

* spelling: overridden

* spelling: parameter

* spelling: performance

* spelling: probability

* spelling: receives

* spelling: redundant

* spelling: recompression

* spelling: resources

* spelling: sanity

* spelling: segment

* spelling: series

* spelling: specified

* spelling: specify

* spelling: subtracted

* spelling: successful

* spelling: return

* spelling: translation

* spelling: update

* spelling: unrelated

* spelling: useless

* spelling: variables

* spelling: variety

* spelling: verbatim

* spelling: verification

* spelling: visited

* spelling: warming

* spelling: workers

* spelling: with
2019-04-12 11:18:11 -07:00
59a7116cc2 benchfn dependencies reduced to only timefn
benchfn used to rely on mem.h, and util,
which in turn relied on platform.h.
Using benchfn outside of zstd required to bring all these dependencies.

Now, dependency is reduced to timefn only.
This required to create a separate timefn from util,
and rewrite benchfn and timefn to no longer need mem.h.

Separating timefn from util has a wide effect accross the code base,
as usage of time functions is widespread.
A lot of build scripts had to be updated to also include timefn.
2019-04-10 12:37:03 -07:00
03e040a966 Replace Uses of CHECK_E with RETURN_ERROR_IF(*_isError(... 2019-01-28 17:33:01 -05:00
ededcfca57 fix confusion between unsigned <-> U32
as suggested in #1441.

generally U32 and unsigned are the same thing,
except when they are not ...

case : 32-bit compilation for MIPS (uint32_t == unsigned long)

A vast majority of transformation consists in transforming U32 into unsigned.
In rare cases, it's the other way around (typically for internal code, such as seeds).

Among a few issues this patches solves :
- some parameters were declared with type `unsigned` in *.h,
  but with type `U32` in their implementation *.c .
- some parameters have type unsigned*,
  but the caller user a pointer to U32 instead.

These fixes are useful.

However, the bulk of changes is about %u formating,
which requires unsigned type,
but generally receives U32 values instead,
often just for brevity (U32 is shorter than unsigned).
These changes are generally minor, or even annoying.

As a consequence, the amount of code changed is larger than I would expect for such a patch.

Testing is also a pain :
it requires manually modifying `mem.h`,
in order to lie about `U32`
and force it to be an `unsigned long` typically.
On a 64-bit system, this will break the equivalence unsigned == U32.
Unfortunately, it will also break a few static_assert(), controlling structure sizes.
So it also requires modifying `debug.h` to make `static_assert()` a noop.
And then reverting these changes.

So it's inconvenient, and as a consequence,
this property is currently not checked during CI tests.
Therefore, these problems can emerge again in the future.

I wonder if it is worth ensuring proper distinction of U32 != unsigned in CI tests.
It's another restriction for coding, adding more frustration during merge tests,
since most platforms don't need this distinction (hence contributor will not see it),
and while this can matter in theory, the number of platforms impacted seems minimal.

Thoughts ?
2018-12-21 18:09:41 -08:00
7b74405150 refactor HUF_compress_internal for clarity
changed workspace parameter convention
to always provide workspaceSize,
so that size can be explicitly checked.

Also, use more enum to make the meaning of some parameters more explicit.
2018-10-26 13:21:37 -07:00
f181799082 fix decodecorpus incorrect frame generation
fix #1379
decodecorpus was generating one extraneous byte when `nbSeq==0`.
This is disallowed by the specification.

The reference decoder was just skipping the extraneous byte.
It is now stricter, and flag such situation as an error.
2018-10-20 18:56:21 -07:00
e3b5286197 Fix decodecorpus 2018-08-28 13:56:47 -07:00
3b56bb1e4c Fix decodecorpus 2018-08-23 17:48:06 -07:00
2d76defbfe grouped all histogram functions into hist.c
renamed functions with HIST_* prefix
2018-06-13 19:49:31 -04:00
dab8cfa3c7 Combine definitions of SEC_TO_MICRO 2017-11-30 19:40:53 -08:00
9a2f6f477b Use util.h for timing 2017-11-30 14:57:25 -08:00
7f6a783862 fixed a small error in decodeCorpus
a compressed block must be strictly smaller than its decompressed size.
2017-10-07 15:19:52 -07:00
bbe77212ef [libzstd] Increase MaxOff 2017-09-25 13:36:18 -07:00
963558a072 Fix implicit conversion error 2017-09-13 16:01:16 -07:00
40bf0ced7d Add flag to limit max decompressed size in decodeCorpus 2017-09-13 15:16:56 -07:00
e89065506e Make decodecorpus generate raw compressed blocks 2017-09-12 17:18:45 -07:00
3128e03be6 updated license header
to clarify dual-license meaning as "or"
2017-09-08 00:09:23 -07:00
32fb407c9d updated a bunch of headers
for the new license
2017-08-18 16:52:05 -07:00
7ac4724bd2 removed fnum from DISPLAY statements 2017-06-28 13:00:49 -07:00
e667d33b0b fixed generation of buggy test, corrected DISPLAY statements for errors 2017-06-28 12:19:37 -07:00
20eeb243d1 Merge pull request #729 from paulcruz74/corpus
Corpus
2017-06-26 17:47:28 -07:00
3a295a91f8 added additional condition so large offsets into the dictionary are not generated past windowSize 2017-06-23 15:54:51 -07:00
2085375816 fixed bug detected by the API test 2017-06-23 13:44:24 -07:00
8cd134559d type warnings 2017-06-23 12:00:48 -07:00