1
0
mirror of https://github.com/facebook/zstd.git synced 2025-07-06 07:37:29 +02:00
Commit Graph

1490 Commits

Author SHA1 Message Date
add7ed2d4a [lib] Fix bug in loading LDM dictionary in MT mode
Exposed when loading a dictionary < LDM minMatch bytes in MT mode.

Test Plan:
```
CC=clang make -j zstreamtest MOREFLAGS="-O0 -fsanitize=address"
./zstreamtest -vv -i100000000 -t1 --newapi -s7065 -t3925297
```

TODO: Add an explicit test that loads a small dictionary in MT mode
2020-05-14 11:52:28 -07:00
70c80e19e6 [greedy] Fix performance instability 2020-05-12 17:51:16 -07:00
c3e921c639 Merge pull request #2131 from terrelln/raw-dict-fuzzer
Fix rare scenario with lazy parser, dictionary, and repcodes
2020-05-12 17:44:31 -07:00
3c1eba4d99 [lib] Fix lazy repcode validity checks 2020-05-12 12:25:06 -07:00
4e0515916d [lib] Fix repcode validation in no dict mode 2020-05-12 11:57:15 -07:00
6d687a8816 [lib] Fix dictionary + repcodes + optimal parser 2020-05-12 10:36:53 -07:00
4b88bd3ee0 [lib][fuzz] Assert sequences are valid in round trip tests 2020-05-11 20:38:49 -07:00
80d3585e31 [lib] Fix lazy parser with dictionary + repcodes 2020-05-11 19:04:30 -07:00
608f1bfc4c fixed context downsize with initStatic
When context is created using initStatic,
no resize is possible.

fix : only bump oversizeDuration when !initStatic
2020-05-11 18:16:38 -07:00
c6636afbbb Fix ZSTD_estimateCCtxSize() Under ASAN
`ZSTD_estimateCCtxSize()` provides estimates for one-shot compression, which
is guaranteed not to buffer inputs or outputs. So it ignores the sizes of the
buffers, assuming they'll be zero. However, the actual workspace allocation
logic always allocates those buffers, and when running under ASAN, the
workspace surrounds every allocation with 256 bytes of redzone. So the 0-sized
buffers end up consuming 512 bytes of space, which is accounted for in the
actual allocation path through the use of `ZSTD_cwksp_alloc_size()` but isn't
in the estimation path, since it ignores the buffers entirely.

This commit fixes this.
2020-05-11 18:58:19 -04:00
54144285fd small speed improvement for strategy fast
gcc 9.3.0 :
kennedy : 459 -> 466
silesia : 360 -> 365
enwik8  : 267 -> 269

clang 10.0.0 :
kennedy : 436 -> 441
silesia : 364 -> 366
enwik8  : 271 -> 272
2020-05-07 06:15:58 -07:00
ad8dbae1b7 Merge pull request #2103 from felixhandte/relative-includes
Migrate Includes to Relative Paths
2020-05-06 09:42:23 -07:00
c29fd7cd8b some more conversion warnings
hunting down some static analyzer warnings
2020-05-05 10:16:59 -07:00
c1b836f4c3 fix minor conversion warnings 2020-05-04 14:43:09 -07:00
6028827fee Rewrite Include Paths to be Relative
Addresses #1998.
2020-05-04 15:20:26 -04:00
7e9aabd652 Merge pull request #2099 from felixhandte/compile-under-pedantic
Compile Under `-pedantic -Werror` and `-std=c90`
2020-05-04 10:07:13 -07:00
816ed80774 Merge pull request #1984 from MeghnaM/1636-Reduce-stack-usage-of-HUF_sort
Reduce stack usage of HUF_sort()
2020-05-04 08:15:31 -07:00
c7da66c9cf Purge C++-Style Comments (// ...), Make Compilation Succeed Under C90 2020-05-04 10:59:15 -04:00
6696933b32 Make All Invocations Start With Literal Format String 2020-05-04 10:59:15 -04:00
5e5f262612 Add (Possibly Empty) Info Strings to All Variadic Error Handling Macro Invocations 2020-05-04 10:58:55 -04:00
e103d7b4a6 Fix superblock mode (#2100)
Fixes:

Enable RLE blocks for superblock mode
Fix the limitation that the literals block must shrink. Instead, when we're within 200 bytes of the next header byte size, we will just use the next one up. That way we should (almost?) always have space for the table.
Remove the limitation that the first sub-block MUST have compressed literals and be compressed. Now one sub-block MUST be compressed (otherwise we fall back to raw block which is okay, since that is streamable). If no block has compressed literals that is okay, we will fix up the next Huffman table.
Handle the case where the last sub-block is uncompressed (maybe it is very small). Before it would skip superblock in this case, now we allow the last sub-block to be uncompressed. To do this we need to regenerate the correct repcodes.
Respect disableLiteralsCompression in superblock mode
Fix superblock mode to handle a block consisting of only compressed literals
Fix a off by 1 error in superblock mode that disabled it whenever there were last literals
Fix superblock mode with long literals/matches (> 0xFFFF)
Allow superblock mode to repeat Huffman tables
Respect ZSTD_minGain().
Tests:

Simple check for the condition in #2096.
When the simple_round_trip fuzzer enables superblock mode, it checks that the compressed size isn't expanded too much.
Remaining limitations:

O(targetCBlockSize^2) because we recompute statistics every sequence
Unable to split literals of length > targetCBlockSize into multiple sequences
Refuses to generate sub-blocks that don't shrink the compressed data, so we could end up with large sub-blocks. We should emit those sections as uncompressed blocks instead.
...
Fixes #2096
2020-05-01 16:11:47 -07:00
0adfc8dfce Fix broken CI; make changes in response to the comments 2020-05-01 13:45:48 -07:00
53d76dc20f Remove magic constant and made other changes addressing the comments 2020-05-01 13:45:48 -07:00
fe8402b522 WIP: Still getting an error 2020-05-01 13:45:48 -07:00
a084d959bd WIP: Increased wksp size, but it's segfaulting 2020-05-01 13:45:48 -07:00
fdb2780c47 Move rank table into HUF_buildCTable_wksp() 2020-05-01 13:45:48 -07:00
1875f616ce passing dictContentType instead of rawContent every time 2020-04-21 22:29:35 -07:00
5b0a452cac Adding --long support for --patch-from (#1959)
* adding long support for patch-from

* adding refPrefix to dictionary_decompress

* adding refPrefix to dictionary_loader

* conversion nit

* triggering log mode on chainLog < fileLog and removing old threshold

* adding refPrefix to dictionary_round_trip

* adding docs

* adding enableldm + forceWindow test for dict

* separate patch-from logic into FIO_adjustParamsForPatchFromMode

* moving memLimit adjustment to outside ifdefs (need for decomp)

* removing refPrefix gate on dictionary_round_trip

* rebase on top of dev refPrefix change

* making sure refPrefx + ldm is < 1% of srcSize

* combining notes for patch-from

* moving memlimit logic inside fileio.c

* adding display for optimal parser and long mode trigger

* conversion nit

* fuzzer found heap-overflow fix

* another conversion nit

* moving FIO_adjustMemLimitForPatchFromMode outside ifndef

* making params immutable

* moving memLimit update before createDictBuffer call

* making maxSrcSize unsigned long long

* making dictSize and maxSrcSize params unsigned long long

* error on files larger than 4gb

* extend refPrefix test to include round trip

* conversion to size_t

* making sure ldm is at least 10x better

* removing break

* including zstd_compress_internal and removing redundant macros

* exposing ZSTD_cycleLog()

* using cycleLog instead of chainLog

* add some more docs about user optimizations

* formatting
2020-04-17 15:58:53 -05:00
5fcbc484c8 Merge pull request #2040 from caoyzh/dev-2
Optimize by prefetching on aarch64
2020-04-08 13:14:47 -07:00
c0d4b2b5a3 Merge pull request #2075 from bimbashrestha/dict_fuzzer_ref
[bug] handling case where prefix is NULL or 0 sized in refPrefix_advanced
2020-04-07 17:37:19 -05:00
1658ae75cd handling nil case for refprefix 2020-04-07 14:41:53 -07:00
a93fadfcd9 Further replication removed
`CHECK_F` is now in `error_private.h`. Minor tidy.
2020-04-07 11:25:16 +02:00
7af7735fa3 Merge remote-tracking branch 'upstream/dev' into single-file-lib 2020-04-07 11:13:02 +02:00
edd9a07322 Code replicated in compression and decompression moved to shared headers
`CHECK_F` macro moved to `error_private.h` (shared between `fse_compress.c` and `fse_decompress.c`). `ZSTD_limitCopy()` moved to `zstd_internal.h` (shared between `zstd_compress.c` and `zstd_decompress.c`). Erroneous build artefact `zstd.h` removed from repo.
2020-04-07 11:02:06 +02:00
0154866749 moving consts to zstd_internal and reusing them 2020-04-03 14:26:15 -07:00
7c420344d2 Single-file decoder script can now (optionally) create an encoder
To complement the single-file decoder a new script was added to create an amalgamated single-file of all of the Zstd source, along with examples and (simple) tests.
2020-04-03 19:07:46 +02:00
ac58c8d720 Fix copyright and license lines
* All copyright lines now have -2020 instead of -present
* All copyright lines include "Facebook, Inc"
* All licenses are now standardized

The copyright in `threading.{h,c}` is not changed because it comes from
zstdmt.

The copyright and license of `divsufsort.{h,c}` is not changed.
2020-03-26 17:02:06 -07:00
d34204a7b7 Merge pull request #2029 from terrelln/minor-opt
[opt] Update repcodes less often
2020-03-23 18:12:32 -07:00
7201980650 Optimize by prefetching on aarch64 2020-03-14 15:25:59 +08:00
66607d0eac Merge pull request #2033 from bimbashrestha/icc
[opt] Small icc level 1 compression speed gain using #pragma vector
2020-03-10 20:42:19 -05:00
a89c45bdbd Typo 2020-03-10 15:19:48 -05:00
43fc88f443 Adding comment and remvoing ivdep 2020-03-10 14:57:27 -05:00
dba3abc95a Missed returns 2020-03-05 12:20:59 -08:00
a75e5f2ffc bitscan add undef check 2020-03-05 11:52:15 -08:00
4c72a1a9c2 adding vector to main loop 2020-03-05 09:55:38 -08:00
81fda0419e [opt] Only update repcodes upon arrival 2020-03-04 17:57:15 -08:00
04744e52dc Merge pull request #2028 from terrelln/minor-opt
[opt] Don't recompute initial literals price
2020-03-04 17:40:59 -08:00
0f9882deb9 [opt] Don't recompute repcodes while emitting sequences 2020-03-04 17:23:00 -08:00
c6caa2d04e [opt] Delete ZSTD_litLengthContribution 2020-03-04 16:35:26 -08:00
610171ed86 [opt] Explain why we don't include literals price 2020-03-04 16:29:19 -08:00