fixed#3323, reported by @nigeltao
Completed documentation around this risk
(which is largely theoretical,
I can't see that happening in any "real world" scenario,
but an erroneous @srcSize value could indeed trigger it).
Fix an off-by-one error in the compressor that emits corrupt blocks if:
* Zstd is compiled in 32-bit mode
* The windowLog == 25 exactly
* An offset of 2^25-3, 2^25-2, 2^25-1, or 2^25 is emitted
* The bitstream had 7 bits leftover before writing the offset
This bug has been present since before v1.0, but wasn't able to easily
be triggered, since until somewhat recently zstd wasn't able to find
matches that were within 128KB of the window size.
Add a test case, and fix 2 bugs in `ZSTD_compressSequences()`:
* The `ZSTD_isRLE()` check was incorrect. It wouldn't produce
corruption, but it could waste CPU and not emit RLE even if the block
was RLE
* One windowSize was `1 << windowLog`, not `1u << windowLog`
Thanks to @tansy for finding the issue, and giving us a reproducer!
Fixes Issue #3350.
Delete unaligned memory access code from the legacy codebase by removing all the
non-memcpy functions. We don't care about speed at all for this codebase, only
simplicity.
Fix an instance of `NULL + 0` in `ZSTD_decompressStream()`. Also, improve our
`stream_decompress` fuzzer to pass `NULL` in/out buffers to
`ZSTD_decompressStream()`, and fix 2 issues that were immediately surfaced.
Fixes#3351
Instead of using packed attribute hack, just use aligned attribute. It
improves code generation on armv6 and armv7, and slightly improves code
generation on aarch64. GCC generates identical code to regular aligned
access on ARMv6 for all versions between 4.5 and trunk, except GCC 5
which is buggy and generates the same (bad) code as packed access:
https://gcc.godbolt.org/z/hq37rz7sb
Use a switch statement to select the search function instead of an
indirect function call. This results in a sizable performance win.
This PR is a modification of the approach taken in PR #2828.
When I measured performance for that commit, it was neutral.
However, I now see a performance regression on gcc, but still
neutral on clang. I'm measuring on the same platform, but with
newer compilers. The new approach beats both the current dev
branch and the baseline before PR #2828 was merged.
This PR is necessary for Issue #3275, to update zstd in the kernel.
Without this PR there is a large regression in greedy - btlazy2
compression speed. With this PR it is about neutral.
gcc version: 12.2.0
clang version: 14.0.6
dataset: silesia.tar
| Compiler | Level | Dev Speed (MB/s) | PR Speed (MB/s) | Delta |
|----------|-------|------------------|-----------------|--------|
| gcc | 5 | 102.6 | 113.7 | +10.8% |
| gcc | 7 | 66.6 | 74.8 | +12.3% |
| gcc | 9 | 51.5 | 58.9 | +14.3% |
| gcc | 13 | 14.3 | 14.3 | +0.0% |
| clang | 5 | 108.1 | 114.8 | +6.2% |
| clang | 7 | 68.5 | 72.3 | +5.5% |
| clang | 9 | 53.2 | 56.2 | +5.6% |
| clang | 13 | 14.3 | 14.7 | +2.8% |
The binary size stays just about the same for clang and gcc, measured
using the `size` command:
| Compiler | Branch | Text | Data | BSS | Total |
|----------|--------|---------|------|-----|---------|
| gcc | dev | 1127950 | 3312 | 280 | 1131542 |
| gcc | PR | 1123422 | 2512 | 280 | 1126214 |
| clang | dev | 1046254 | 3256 | 216 | 1049726 |
| clang | PR | 1048198 | 2296 | 216 | 1050710 |
Currently this function actually reads the dict ID from the dictionary's
header, via `ZSTD_getDictID_fromDict()`. But during decompression the decomp-
ressor actually compares the dict ID in the frame header with the dict ID in
the DDict. Now of course the dict ID in the dictionary contents and the dict
ID in the DDict struct *should* be the same. But in cases of memory corrupt-
ion, where they can drift out of sync, it's misleading for this function to
read it again from the dict buffer rather then return the dict ID that will
actually be used.
Also doing it this way avoids rechecking the magic and so on and so it is a
tiny bit more efficient.
Clang doesn't allow [[deprecated(...)]] attribute after __attribute__.
Move [[deprecated(...)]] before __attribute__ to fix C++14/C++17 uses
with Clang.
Fix#3250
Comparing 4B instead of comparing 1B in ZSTD_noDict
mode, thus it can avoid cases like match in match[ml]
but mismatch in match[ml-3]..match[ml-1]. So the call
count of ZSTD_count can be reduced.
Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I3449ea423d5c8e8344f75341f19a2d1643c703f6