1
0
mirror of https://github.com/facebook/zstd.git synced 2025-03-07 09:26:03 +02:00
zstd/lib/decompress
Nick Terrell 5ab78c0418 [huf] Improve fast C & ASM performance on small data
* Rename `ilimit` to `ilowest` and set it equal to `src` instead of
  `src + 6 + 8`. This is safe because the fast decoding loops guarantee
  to never read below `ilowest` already. This allows the fast decoder to
  run for at least two more iterations, because it consumes at most 7
  bytes per iteration.
* Continue the fast loop all the way until the number of safe iterations
 is 0. Initially, I thought that when it got towards the end, the
 computation of how many iterations of safe might become expensive. But
 it ends up being slower to have to decode each of the 4 streams
 individually, which makes sense.

This drastically speeds up the Huffman decoder on the `github` dataset
for the issue raised in #3762, measured with `zstd -b1e1r github/`.

| Decoder  | Speed before | Speed after |
|----------|--------------|-------------|
| Fallback | 477 MB/s     | 477 MB/s    |
| Fast C   | 384 MB/s     | 492 MB/s    |
| Assembly | 385 MB/s     | 501 MB/s    |

We can also look at the speed delta for different block sizes of silesia
using `zstd -b1e1r silesia.tar -B#`.

| Decoder  | -B1K ∆ | -B2K ∆ | -B4K ∆ | -B8K ∆ | -B16K ∆ | -B32K ∆ | -B64K ∆ | -B128K ∆ |
|----------|--------|--------|--------|--------|---------|---------|---------|----------|
| Fast C   | +11.2% | +8.2%  | +6.1%  | +4.4%  | +2.7%   | +1.5%   | +0.6%   | +0.2%    |
| Assembly | +12.5% | +9.0%  | +6.2%  | +3.6%  | +1.5%   | +0.7%   | +0.2%   | +0.03%   |
2023-11-20 17:13:01 -05:00
..