1
0
mirror of https://github.com/facebook/zstd.git synced 2025-07-05 15:09:03 +02:00
Commit Graph

15 Commits

Author SHA1 Message Date
8957fef554 [huf] Add generic C versions of the fast decoding loops
Add generic C versions of the fast decoding loops to serve architectures
that don't have an assembly implementation. Also allow selecting the C
decoding loop over the assembly decoding loop through a zstd
decompression parameter `ZSTD_d_disableHuffmanAssembly`.

I benchmarked on my Intel i9-9900K and my Macbook Air with an M1 processor.
The benchmark command forces zstd to compress without any matches, using
only literals compression, and measures only Huffman decompression speed:

```
zstd -b1e1 --compress-literals --zstd=tlen=131072 silesia.tar
```

The new fast decoding loops outperform the previous implementation uniformly,
but don't beat the x86-64 assembly. Additionally, the fast C decoding loops suffer
from the same stability problems that we've seen in the past, where the assembly
version doesn't. So even though clang gets close to assembly on x86-64, it still
has stability issues.

| Arch    | Function       | Compiler     | Default (MB/s) | Assembly (MB/s) | Fast (MB/s) |
|---------|----------------|--------------|----------------|-----------------|-------------|
| x86-64  | decompress 4X1 | gcc-12.2.0   |         1029.6 |          1308.1 |      1208.1 |
| x86-64  | decompress 4X1 | clang-14.0.6 |         1019.3 |          1305.6 |      1276.3 |
| x86-64  | decompress 4X2 | gcc-12.2.0   |         1348.5 |          1657.0 |      1374.1 |
| x86-64  | decompress 4X2 | clang-14.0.6 |         1027.6 |          1659.9 |      1468.1 |
| aarch64 | decompress 4X1 | clang-12.0.5 |         1081.0 |             N/A |      1234.9 |
| aarch64 | decompress 4X2 | clang-12.0.5 |         1270.0 |             N/A |      1516.6 |
2023-01-25 13:47:51 -08:00
8927f985ff Update Copyright Headers 'Facebook' -> 'Meta Platforms'
```
for f in $(find . \( -path ./.git -o -path ./tests/fuzz/corpora \) -prune -o -type f);
do
  sed -i 's/Facebook, Inc\./Meta Platforms, Inc. and affiliates./' $f;
done
```
2022-12-20 12:37:57 -05:00
e74ca7979e Simplify HUF_decompress4X2_usingDTable_internal_bmi2_asm_loop
Get rid of three divisions. The original expression was:

    opmin := min((oend0 - op0) / 10, (oend1 - op1) / 10, (oend2 - op2) / 10, (oend3 - op3) / 10)
    r15   := min(r15, opmin)

The division by 10 can be moved outside the `min`:

    opmin := min(oend0 - op0, oend1 - op1, oend2 - op2, oend3 - op3)
    r15   := min(r15, opmin/10)
2022-01-19 18:38:46 +01:00
568c69a4eb x86-64: Hide internal assembly functions
Hide x86-64 internal assembly functions. Before

$ nm -D lib/libzstd.so.1 | grep usingDTable_internal_bmi2_asm_loop
00000000000c23c0 T _HUF_decompress4X1_usingDTable_internal_bmi2_asm_loop
00000000000c23c0 T HUF_decompress4X1_usingDTable_internal_bmi2_asm_loop
00000000000c283d T _HUF_decompress4X2_usingDTable_internal_bmi2_asm_loop
00000000000c283d T HUF_decompress4X2_usingDTable_internal_bmi2_asm_loop
$

After

$ nm -D lib/libzstd.so.1 | grep usingDTable_internal_bmi2_asm_loop
$

This fixes issue #2990.
2022-01-11 10:12:24 -08:00
c7b03c217c [license] Fix license header of huf_decompress_amd64.S
* Add the license header for `huf_decompress_amd64.S`
* Add `.S` files to the `test-license.py` test
2022-01-07 09:35:27 -08:00
ef1f9e80ff Restrict GNU-stack Note to GNU Assemblers 2022-01-05 16:03:32 -05:00
b12edddb37 Write GNU-stack Section on All ELF Architectures
Previously we did this only on Linux, which missed other Unices.
2022-01-05 15:44:40 -05:00
9a9d1ec6f4 Mark Huffman Decoder Assembly noexecstack on All Architectures
Apparently, even when the assembly file is empty (because
`ZSTD_ENABLE_ASM_X86_64_BMI2` is false), it still is marked as possibly
needing an executable stack and so the whole library is marked as such. This
commit applies a simple patch for this problem by moving the noexecstack
indication outside the macro guard.

This commit builds on #2857.

This commit addresses #2963.
2021-12-29 17:47:12 -08:00
c284569457 [asm] Share portability macros and restrict ASM further
Move portability macros to `lib/common/portability_macros.h`. This file
only contains platform/feature detection (e.g. 0/1 macros). This file is
shared between C and ASM code, so it cannot include any C code.

Rename `HUF_` ASM macros to be `ZSTD_` prefixed, and move to the new
header.

Restrict `ZSTD_ASM_SUPPORTED` to `__GNUC__`, because we need the GAS
assembler.

Finally, only include the ASM code if we are actually going to use it.
This disables it on all Windows platforms, which should resolve the
problem brought up in Issue #2789.
2021-12-02 16:58:04 -08:00
91f5891dd0 [CircleCI] Fix short-tests-0
short-tests-0 were silently failing. I think because of the && make clean construction. Switch to ; instead.

Also fix all the test failures that were exposed.

`make all` is failing on CircleCI because it is missing Docker. Move that test
to GitHub actions, and switch the pedantic CircleCI test to `make allmost`.
2021-12-01 17:43:46 -08:00
a37a8df532 Merge pull request #2856 from rex4539/typos
Fix typos
2021-11-17 13:04:30 -08:00
c67e07f34e Remove executable flag from GNU_STACK section
Putting stack marking into every assembly files is required to indicate
that the stack does not need to be executable.
Executable flag on stack conflicts with some security measures, Systemd
MemoryDenyWriteExecute=yes for example.
2021-11-13 22:58:33 +09:00
ebbd675998 Fix typos 2021-11-13 10:04:04 +02:00
abd717a5fa [asm] Switch to C style comments
Switch to C style comments for increased portability, and consistency.
2021-10-20 11:37:05 -07:00
a5f2c45528 Huffman ASM 2021-09-20 14:46:43 -07:00