A minor change in 5434de0 changed a `<=` into a `<`,
and as an indirect consequence allowed compression attempt of literals when there are only 6 literals to compress
(previous limit was effectively 7 literals).
This is not in itself a problem, as the threshold is merely an heuristic,
but it emerged a bug that has always been there, and was just never triggered so far due to the previous limit.
This bug would make the literal compressor believes that all literals are the same symbol,
but for the exact case where nbLiterals==6, plus a pretty wild combination of other limit conditions,
this outcome could be false, resulting in data corruption.
Replaced the blind heuristic by an actual test for all limit cases,
so that even if the threshold is changed again in the future,
the detection of RLE mode will remain reliable.
Reported by @shulib :
the specification for 4-streams mode
doesn't work when the amount of literals to compress is 5 bytes.
Extending it, it also doesn't work for sizes 1 or 2.
This patch updates the specification and the implementation
to require a minimum of 6 literals to trigger or accept the 4-streams mode.
The impact is expected to be a no-op :
the 4-streams mode is never triggered for such small quantity of literals anyway,
since it would be wasteful (it costs ~7.3 bytes more than single-stream mode).
An informal lower limit is set at ~256 bytes,
so the technical minimum is very far from this limit.
This is just meant for completeness of the specification.
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora \) -prune -o -type f);
do
sed -i 's/Facebook, Inc\./Meta Platforms, Inc. and affiliates./' $f;
done
```
Fix an instance of `NULL + 0` in `ZSTD_decompressStream()`. Also, improve our
`stream_decompress` fuzzer to pass `NULL` in/out buffers to
`ZSTD_decompressStream()`, and fix 2 issues that were immediately surfaced.
Fixes#3351
Currently this function actually reads the dict ID from the dictionary's
header, via `ZSTD_getDictID_fromDict()`. But during decompression the decomp-
ressor actually compares the dict ID in the frame header with the dict ID in
the DDict. Now of course the dict ID in the dictionary contents and the dict
ID in the DDict struct *should* be the same. But in cases of memory corrupt-
ion, where they can drift out of sync, it's misleading for this function to
read it again from the dict buffer rather then return the dict ID that will
actually be used.
Also doing it this way avoids rechecking the magic and so on and so it is a
tiny bit more efficient.
Fixes#3212.
Long literal and match lengths had an off-by-one error in ZSTD_getSequenceLength.
Fix the off-by-one error, and add a golden compression test that catches the bug.
Also run all the golden tests in the cli-tests framework.
With statistic data of test data files of silesia
the chance of position beyond highThreshold is very
low (~1.3%@L8 in most cases, all <2.5%), and is in
"lowprob area". Add the branch hint so compiler can
get better pipiline codegen.
With this change it is observed ~1% of mozilla and
xml, and slight (0.3%~0.8%) but consistent uplift on
other files on Arm N1.
Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: Id9ba1d5c767e975290b5c1bf0ecce906544f4ade
match is used for following sequence copy. It is
only updated when extDict is needed, which is a
low probability case. So it can be prefetched to
reduce cache miss.
The benchmarks on various Arm platforms showed
uplift from 1% ~ 14% with gcc-11/clang-14.
Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: If201af4799d2455d74c79f8387404439d7f684ae
* Intial commit to address 3090. Added support to decompress empty block
* Update zstd_decompress_block.c
Addressed review comments for the case of 'set_basic'
* Update lib/decompress/zstd_decompress_block.c
Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
* Update lib/decompress/zstd_decompress_block.c
Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
Co-authored-by: Nick Terrell <nickrterrell@gmail.com>
Streaming decompression used to wait for a minimum of 5 bytes before attempting decoding.
This meant that, in the case that only a few bytes (<5) were provided,
and assuming these bytes are incorrect,
there would be no error reported.
The streaming API would simply request more data, waiting for at least 5 bytes.
This PR makes it possible to detect incorrect Frame IDs as soon as the first byte is provided.
Fix#3169
ZSTD_seqSymbol is a structure with total of 64 bits
wide. So it can be loaded in one operation and
extract its fields by simply shifting or extracting
on aarch64.
GCC doesn't recognize this and generates more
unnecessary ldr/ldrb/ldrh operations that cause
performance drop.
With this change it is observed 2~4% uplift of
silesia and 2.5~6% of cantrbry @L8 on Arm N1.
Signed-off-by: Jun He <jun.he@arm.com>
Change-Id: I7748909204cf78a17eb9d4f2333692d53239daa8
Apparently, even when the assembly file is empty (because
`ZSTD_ENABLE_ASM_X86_64_BMI2` is false), it still is marked as possibly
needing an executable stack and so the whole library is marked as such. This
commit applies a simple patch for this problem by moving the noexecstack
indication outside the macro guard.
This commit builds on #2857.
This commit addresses #2963.
This saves some 1.7Kb in rodata section (x86_64, zstd tool),
while assembler code stays the same except
the type of a few load/extend instructions.
Should not have negative performance implications.
Move portability macros to `lib/common/portability_macros.h`. This file
only contains platform/feature detection (e.g. 0/1 macros). This file is
shared between C and ASM code, so it cannot include any C code.
Rename `HUF_` ASM macros to be `ZSTD_` prefixed, and move to the new
header.
Restrict `ZSTD_ASM_SUPPORTED` to `__GNUC__`, because we need the GAS
assembler.
Finally, only include the ASM code if we are actually going to use it.
This disables it on all Windows platforms, which should resolve the
problem brought up in Issue #2789.
short-tests-0 were silently failing. I think because of the && make clean construction. Switch to ; instead.
Also fix all the test failures that were exposed.
`make all` is failing on CircleCI because it is missing Docker. Move that test
to GitHub actions, and switch the pedantic CircleCI test to `make allmost`.
* When dynamic dispatching to bmi2 add lzcnt and bmi to the
TARGET_ATTRIBUTE.
* Centralize the bmi2 TARGET_ATTRIBUTE definition to
BMI2_TARGET_ATTRIBUTE so we can change it in the future.
* Only enable bmi2 when both bmi1 & bmi2 are supported. There shouldn't
be any cases where bmi2 is supported but bmi1 isn't. But, since we are
using the instruction we should check bmi1 as well.
Putting stack marking into every assembly files is required to indicate
that the stack does not need to be executable.
Executable flag on stack conflicts with some security measures, Systemd
MemoryDenyWriteExecute=yes for example.