Zstandard is a fast lossless compression algorithm targeting real-time compression scenarios at zlib-level and better compression ratios. It's backed by a very fast entropy stage, provided by Huff0 and FSE library.
Zstandard version >= 1.0 is required, which is generally only available on newer distributions.
An upcoming feature requires new parameters for storagePosixNew() and this causes a lot of churn because almost every test creates a Posix storage object. Some refactoring in the tests might reduce this duplication but storagePosixNew() is collecting a lot of parameters so converting to storagePosixNewP() makes sense in any case.
There are relatively few call sites in the core code but they still benefit from better readability after this change.
It makes sense to do this check right after the first compression so any issues are caught early.
Also, none of the current compression formats omit decompressCmd so make the test mandatory.
LZ4 compresses data faster than gzip but at a lower ratio. This can be a good tradeoff in certain scenarios.
Note that setting compress-type=lz4 will make new backups and archive incompatible (unrestorable) with prior versions of pgBackRest.
Add compress-type option and deprecate compress option. Since the compress option is boolean it won't work with multiple compression types. Add logic to cfgLoadUpdateOption() to update compress-type if it is not set directly. The compress option should no longer be referenced outside the cfgLoadUpdateOption() function.
Add common/compress/helper module to contain interface functions that work with multiple compression types. Code outside this module should no longer call specific compression drivers, though it may be OK to reference a specific compression type using the new interface (e.g., saving backup history files in gz format).
Unit tests only test compression using the gz format because other formats may not be available in all builds. It is the job of integration tests to exercise all compression types.
Additional compression types will be added in future commits.
This was a minor optimization used in protocol layer compression. Even though it was slightly faster, it omitted the crc-32 that is generated during normal compression which could lead to corrupt data after a bad network transmission. This would be caught on restore by our checksum but it seems better to catch an issue like this early.
The raw option also made the function signature different than future compression formats which may not support raw, or require different code to support raw.
In general, it doesn't seem worth the extra testing to support a format that has minimal benefit and is seldom used, since protocol compression is only enabled when the transmitted data is uncompressed.
"gz" was used as the extension but "gzip" was generally used for function and type naming.
With a new compression format on the way, it makes sense to standardize on a single abbreviation to represent a compression format in the code. Since the extension is standard and we must use it, also use the extension for all naming.