* Rename `ilimit` to `ilowest` and set it equal to `src` instead of `src + 6 + 8`. This is safe because the fast decoding loops already guarantee never to read below `ilowest`. It lets the fast decoder run for at least two more iterations, since it consumes at most 7 input bytes per iteration.
* Continue the fast loop all the way until the number of safe iterations reaches 0. Initially, I thought that recomputing how many iterations are safe might become expensive towards the end of the block, but it turns out to be slower to fall back and decode each of the 4 streams individually, which makes sense. (A sketch of this loop-control idea follows the tables below.)

This drastically speeds up the Huffman decoder on the `github` dataset for the issue raised in #3762, measured with `zstd -b1e1r github/`.

| Decoder  | Speed before | Speed after |
|----------|--------------|-------------|
| Fallback | 477 MB/s     | 477 MB/s    |
| Fast C   | 384 MB/s     | 492 MB/s    |
| Assembly | 385 MB/s     | 501 MB/s    |

We can also look at the speed delta for different block sizes of silesia using `zstd -b1e1r silesia.tar -B#`.

| Decoder  | -B1K ∆ | -B2K ∆ | -B4K ∆ | -B8K ∆ | -B16K ∆ | -B32K ∆ | -B64K ∆ | -B128K ∆ |
|----------|--------|--------|--------|--------|---------|---------|---------|----------|
| Fast C   | +11.2% | +8.2%  | +6.1%  | +4.4%  | +2.7%   | +1.5%   | +0.6%   | +0.2%    |
| Assembly | +12.5% | +9.0%  | +6.2%  | +3.6%  | +1.5%   | +0.7%   | +0.2%   | +0.03%   |
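
For illustration only, here is a minimal C sketch of the loop-control shape described in the second bullet: keep recomputing how many iterations are provably safe and stay in the fast path until that count hits 0. This is not the actual zstd `HUF` decoder; the names (`StreamState`, `fast_loop_sketch`, `safe_iterations`, the per-iteration byte constants, the batch size) are made up, and the loop body is a placeholder standing in for real symbol decoding.

```c
/*
 * Sketch of "stay in the fast loop until 0 safe iterations remain".
 * NOT the zstd implementation: names and constants are hypothetical,
 * and the body is a placeholder memcpy instead of Huffman decoding.
 */
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define BYTES_READ_PER_ITER    7  /* upper bound on input consumed per iteration */
#define BYTES_WRITTEN_PER_ITER 5  /* output produced per iteration (illustrative) */
#define MAX_ITERS_PER_BATCH    4  /* how often the safe-iteration count is recomputed */

typedef struct {
    const unsigned char* ip;      /* input pointer, moves downward toward ilowest */
    const unsigned char* ilowest; /* == src: lowest address the loop may read     */
    unsigned char* op;            /* output pointer                               */
    unsigned char* oend;          /* end of the output buffer                     */
} StreamState;

/* Iterations guaranteed not to read below ilowest nor write past oend. */
static size_t safe_iterations(const StreamState* s)
{
    size_t const in_iters  = (size_t)(s->ip - s->ilowest) / BYTES_READ_PER_ITER;
    size_t const out_iters = (size_t)(s->oend - s->op) / BYTES_WRITTEN_PER_ITER;
    return in_iters < out_iters ? in_iters : out_iters;
}

static void fast_loop_sketch(StreamState* s)
{
    size_t iters;
    /* Re-enter the fast path until 0 iterations remain provably safe;
     * only then would a real decoder hand off to the careful tail loop. */
    while ((iters = safe_iterations(s)) > 0) {
        size_t batch = iters < MAX_ITERS_PER_BATCH ? iters : MAX_ITERS_PER_BATCH;
        do {
            /* Placeholder for decoding a few symbols without bounds checks:
             * consumes at most BYTES_READ_PER_ITER input bytes and writes
             * exactly BYTES_WRITTEN_PER_ITER output bytes. */
            s->ip -= BYTES_READ_PER_ITER;
            memcpy(s->op, s->ip, BYTES_WRITTEN_PER_ITER);
            s->op += BYTES_WRITTEN_PER_ITER;
        } while (--batch > 0);
    }
    /* Careful decoding of the last few symbols would go here. */
}

int main(void)
{
    unsigned char src[64] = {0};
    unsigned char dst[40];
    StreamState s = { src + sizeof(src), src, dst, dst + sizeof(dst) };
    fast_loop_sketch(&s);
    printf("wrote %zu bytes, %zu input bytes left above ilowest\n",
           (size_t)(s.op - dst), (size_t)(s.ip - s.ilowest));
    return 0;
}
```

The point is only the `while`/`do-while` shape: the cost of recomputing the safe-iteration count is amortized over a batch of unchecked iterations, and the slow per-stream fallback is left with as little work as possible.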