short-tests-0 was silently failing, I think because of the `&& make clean` construction. Switch to `;` instead.
Also fix all the test failures that were exposed.
`make all` is failing on CircleCI because it is missing Docker. Move that test
to GitHub actions, and switch the pedantic CircleCI test to `make allmost`.
gcc-5 didn't like the l-value overload for the defaulted `operator=`. There is
no reason it needs to be l-value overloaded, so just remove it.
I'm not sure why the build broke for @mckaygerhard in Issue #2811, since
this code hasn't changed since it was added. But there is no harm in
fixing it.
Fixes issue #2811.
Use `readelf` to test that `libzstd.so` doesn't have the exec-stack bit
set. If the stack is marked executable, systemd will refuse to link
against zstd. We now test that it isn't set on every PR.
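The property the CI test checks can be sketched in C instead of via `readelf`. The sketch looks for the `PT_GNU_STACK` program header and reports whether its `PF_X` flag is set. The function name is hypothetical. It only handles 64-bit ELF in host byte order. Treating a missing `PT_GNU_STACK` header as "executable" is a conservative assumption, not something this change specifies.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper. Returns 1 if the image requests an executable stack,
 * 0 if it does not, and -1 if buf is not a well-formed 64-bit ELF image.
 * Assumes the file's byte order matches the host's (no byte swapping). */
static int elf64_stack_is_exec(const unsigned char *buf, size_t len)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return -1;
    memcpy(&eh, buf, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof(Elf64_Phdr) ||
        eh.e_phoff > len)
        return -1;

    /* Walk the program headers looking for PT_GNU_STACK. */
    for (size_t i = 0; i < eh.e_phnum; i++) {
        size_t const off = (size_t)eh.e_phoff + i * sizeof(Elf64_Phdr);
        Elf64_Phdr ph;
        if (off > len || len - off < sizeof ph)
            return -1;
        memcpy(&ph, buf + off, sizeof ph);
        if (ph.p_type == PT_GNU_STACK)
            return (ph.p_flags & PF_X) ? 1 : 0;
    }
    /* No PT_GNU_STACK header: conservatively report an executable stack. */
    return 1;
}
```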
Adds a test for PR #2857
Fixes Issue #2865
* When dynamically dispatching to bmi2, add lzcnt and bmi to the
TARGET_ATTRIBUTE.
* Centralize the bmi2 TARGET_ATTRIBUTE definition to
BMI2_TARGET_ATTRIBUTE so we can change it in the future.
* Only enable bmi2 when both bmi1 & bmi2 are supported. There shouldn't
be any cases where bmi2 is supported but bmi1 isn't, but since we use
bmi1 instructions we should check for bmi1 as well.
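The dispatch described above might look roughly like the sketch below, assuming GCC or Clang on x86. The `BMI2_TARGET_ATTRIBUTE` name comes from this change, but the definition given here is an assumption. The `cpu_has_bmi2()` helper, built on the compilers' `__builtin_cpu_supports`, is hypothetical, and so are the `lowest_bit_*` functions. None of this is zstd's actual code.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Assumed definition: enable lzcnt and bmi alongside bmi2 so the compiler
 * may emit lzcnt/tzcnt inside functions compiled for the bmi2 path. */
#  define BMI2_TARGET_ATTRIBUTE __attribute__((target("lzcnt,bmi,bmi2")))
#  define HAVE_X86_DISPATCH 1
#else
#  define BMI2_TARGET_ATTRIBUTE
#  define HAVE_X86_DISPATCH 0
#endif

/* Hypothetical: report bmi2 only when bmi1 is also available. */
static int cpu_has_bmi2(void)
{
#if HAVE_X86_DISPATCH
    __builtin_cpu_init();
    return __builtin_cpu_supports("bmi") && __builtin_cpu_supports("bmi2");
#else
    return 0;
#endif
}

/* Generic build of a hypothetical helper: index of the lowest set bit. */
static unsigned lowest_bit_default(uint64_t x)
{
    return x ? (unsigned)__builtin_ctzll(x) : 64u;
}

/* Same body, compiled with the extended target attribute. */
static BMI2_TARGET_ATTRIBUTE unsigned lowest_bit_bmi2(uint64_t x)
{
    return x ? (unsigned)__builtin_ctzll(x) : 64u;
}

/* Runtime dispatch: pick the bmi2 build only when the CPU supports it. */
static unsigned lowest_bit(uint64_t x, int bmi2)
{
    return bmi2 ? lowest_bit_bmi2(x) : lowest_bit_default(x);
}
```

Callers would compute `cpu_has_bmi2()` once and pass the result down, so the feature check stays out of hot loops.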
PR #2850 attempted to fix a determinism bug that was uncovered by OSS-Fuzz. It
succeeded in addressing that source of non-determinism, but introduced a new
one: it was possible, when index reduction occurred, to map indices in the
window to the reserved value, which would cause them to be zeroed, potentially
altering parsing of the input.
This PR addresses this issue. It makes sure that the bottom of the window is
always `>= ZSTD_WINDOW_START_INDEX`.
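One way to express this invariant is to clamp any proposed index-reduction correction so that the bottom of the window never drops below the start index. This is a sketch, not the actual patch. `clamp_correction()` is hypothetical. `WINDOW_START_INDEX` stands in for `ZSTD_WINDOW_START_INDEX` with an assumed value of 2, with 0 taken to mean an empty slot and 1 being `ZSTD_DUBT_UNSORTED_MARK`.

```c
#include <stdint.h>

/* Assumed value: indices 0 and 1 are reserved, so valid positions start at 2. */
#define WINDOW_START_INDEX 2u

/* Hypothetical: limit a correction so that lowLimit - correction stays
 * >= WINDOW_START_INDEX. Every in-window index is >= lowLimit, so after
 * subtracting the clamped correction none of them lands on a reserved value. */
static uint32_t clamp_correction(uint32_t lowLimit, uint32_t correction)
{
    uint32_t const maxCorrection =
        lowLimit > WINDOW_START_INDEX ? lowLimit - WINDOW_START_INDEX : 0;
    return correction < maxCorrection ? correction : maxCorrection;
}
```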
I'm not sure if this makes #2850 redundant. I think it's probably still
valuable to have that protection as well.
Credit to OSS-Fuzz for discovering this issue.
-O3 is no longer necessary to get good performance: there is only a small
speed difference between -O2 and -O3, so just stick to the default of
-O2. I've measured neutral compression speed and a ~3% decompression
speed loss in userspace with clang & gcc. I've also measured neutral
compression speed and a ~1% decompression speed loss in the kernel
benchmarks.
This also fixes the excessive stack usage on parisc. The compiler was buggy
at -O3 and used ~3KB of stack space for several functions. With -O2 the
problem is completely resolved, and stack space is back to a few hundred
bytes.
Additionally, we get a large code size win on gcc:
| Compiler | Before (Bytes) | After (Bytes) | Delta (Bytes) |
|----------|----------------|---------------|---------------|
| gcc-11 | 952754 | 738954 | -213800 |
| clang-12 | 976290 | 938826 | -37464 |
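Relative to the original sizes, the deltas in the table above work out to roughly:

```latex
\frac{213800}{952754} \approx 22.4\%\ (\text{gcc-11}), \qquad
\frac{37464}{976290} \approx 3.8\%\ (\text{clang-12})
```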
The optimal parser is unlikely to be used in the Linux kernel in
practice, so there is no reason to force-inline these functions: we
gain nothing and lose build size.
| Compiler | Before (Bytes) | After (Bytes) | Delta (Bytes) |
|----------|----------------|---------------|---------------|
| gcc-11 | 1142090 | 952754 | -189336 |
| clang-12 | 1228402 | 976290 | -252112 |
This is a temporary solution pending the resolution of PR #2862 in the
`dev` branch.
Take the same approach as in PR #2828 [0] to remove functions that force-inline
many function bodies and `switch` between them. Instead, create one function per
"template" combination, and then switch between these functions. This
allows the compiler to break the large function into many small
functions, which generally helps codegen.
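A minimal sketch of the pattern follows. All names are hypothetical, and a toy loop stands in for a real match finder. The generic body stays force-inlined, each parameter combination gets its own small `noinline` wrapper, and the `switch` only chooses between those wrappers. The attribute macros assume GCC or Clang.

```c
#include <stddef.h>

#if defined(__GNUC__)
#  define FORCE_INLINE_TEMPLATE static inline __attribute__((always_inline))
#  define NOINLINE __attribute__((noinline))
#else
#  define FORCE_INLINE_TEMPLATE static inline
#  define NOINLINE
#endif

/* Hypothetical "template": mls and extDict become compile-time constants
 * once this body is inlined into one of the specializations below. */
FORCE_INLINE_TEMPLATE size_t
search_generic(const unsigned char *p, size_t n, unsigned mls, int extDict)
{
    size_t acc = 0;
    for (size_t i = 0; i + mls <= n; i += mls)
        acc += p[i] * (extDict ? 2u : 1u);   /* toy stand-in for real work */
    return acc;
}

/* One small outlined function per template combination. */
#define GEN_SEARCH(mls, ext, name)                                  \
    static NOINLINE size_t name(const unsigned char *p, size_t n)   \
    { return search_generic(p, n, mls, ext); }

GEN_SEARCH(4, 0, search_4_noDict)
GEN_SEARCH(5, 0, search_5_noDict)
GEN_SEARCH(4, 1, search_4_extDict)
GEN_SEARCH(5, 1, search_5_extDict)

/* The switch now selects between small functions instead of living
 * inside one huge force-inlined body. */
static size_t search(const unsigned char *p, size_t n, unsigned mls, int extDict)
{
    switch (mls) {
    default:
    case 4: return extDict ? search_4_extDict(p, n) : search_4_noDict(p, n);
    case 5: return extDict ? search_5_extDict(p, n) : search_5_noDict(p, n);
    }
}
```

Because every wrapper is a separate function, the compiler allocates registers and stack for each one independently rather than for one giant body.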
Also, in the `extDict` modes when there is no ext-dict, call the top
level function instead of the force inlined one, to save on code size.
I'm specifically doing this because gcc on the parisc architecture doesn't
handle the large function body well, and ends up using a lot of excess
stack space. Outlining these functions fixes it.
Every assembly file needs a stack marking to indicate that the stack
does not need to be executable.
An executable stack conflicts with some security measures, for example
systemd's `MemoryDenyWriteExecute=yes`.
Previously, if an index was equal to `reducerValue + 1`, it would get remapped
during index reduction to 1, i.e. `ZSTD_DUBT_UNSORTED_MARK`. This can affect the
parsing of the input slightly, by causing tree nodes to be nullified when they
otherwise wouldn't be. This hardly matters from a correctness or efficiency
perspective, but it does impact determinism.
So this commit changes index reduction so that remapped indices can no longer
collide with `ZSTD_DUBT_UNSORTED_MARK`.
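A threshold-based reduction can be sketched as follows. The sketch assumes that 0 marks an empty slot, that 1 is `ZSTD_DUBT_UNSORTED_MARK`, and that valid indices start at 2; the names and values are illustrative, not copied from zstd. Anything below `reducerValue + 2` is zeroed rather than shifted. An entry equal to `reducerValue + 1`, which previously would have become 1, therefore can no longer collide with the mark.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed constants mirroring the reserved values described above. */
#define DUBT_UNSORTED_MARK 1u   /* 0 = empty slot, 1 = unsorted mark */
#define WINDOW_START_INDEX 2u   /* first index that is never reserved */

/* Hypothetical: subtract reducerValue from every entry. Entries that would
 * land on a reserved value (0 or 1) are zeroed instead, so a real position
 * is never confused with DUBT_UNSORTED_MARK after reduction.
 * Assumes reducerValue + WINDOW_START_INDEX does not overflow. */
static void reduce_table(uint32_t *table, size_t size, uint32_t reducerValue)
{
    uint32_t const reducerThreshold = reducerValue + WINDOW_START_INDEX;
    for (size_t i = 0; i < size; i++) {
        if (table[i] < reducerThreshold)
            table[i] = 0;
        else
            table[i] -= reducerValue;
    }
}
```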