When restoring a cluster that will be promoted but is not intended to be the new primary, it is important to disable archiving to avoid polluting the repository with useless WAL. This option makes disabling archiving a bit easier.
Automatically retrieve the role and temporary credentials for S3 when the AWS instance is associated with an IAM role. Credentials are automatically updated when they are <= 5 minutes from expiring.
Basic configuration is to set repo1-s3-key-type=auto. repo1-s3-role can be used to set a specific role, otherwise it will be retrieved automatically.
We use the Z suffix in many functions to indicate that we are expecting a zero-terminated string so make this function conform to the pattern.
As a bonus the new name is a bit shorter, which is a good quality in a commonly-used function.
A shared access signature (SAS) provides granular, delegated access to resources in a storage account. This is often preferable to using a shared key which provides more access and is a greater security risk if compromised.
Azure and Azure-compatible object stores can now be used for repository storage.
Currently only shared key authentication is supported but SAS will be added soon.
The prior default was determined by benchmarking the Perl code prior to the 1.0 release. In general buffer allocation was more expensive in Perl so large buffers gave the best performance. This was due to multiple buffer allocations for each filter in an IO operation.
The C code allocates fixed buffers for each IO operation so the cost for buffer allocation is lower than Perl. That being the case it made sense to benchmark the C code to determine the optimal buffer default.
The performance/storage tests were used to measure the performance of a variety of filters. 1GiB of data was processed by each filter 10 times and the results of the tests were averaged.
While most buffer sizes gave similar performance, 1MiB appeared to perform the best overall. Of course, different architectures are likely to yield different results but this seems like a sensible default. The buffer-size option may still need to be manually configured to give optimal results.
Raw test data for reference:
4MB buffer (prior default)
copy time 1807ms, avg time 180ms, avg throughput: 5942MB/s
md5 time 14200ms, avg time 1420ms, avg throughput: 756MB/s
sha1 time 11431ms, avg time 1143ms, avg throughput: 939MB/s
sha256 time 23463ms, avg time 2346ms, avg throughput: 457MB/s
gzip -6 time 381199ms, avg time 38119ms, avg throughput: 28MB/s
lz4 -1 time 15484ms, avg time 1548ms, avg throughput: 693MB/s
1MB buffer (new default)
copy time 1760ms, avg time 176ms, avg throughput: 6100MB/s
md5 time 13739ms, avg time 1373ms, avg throughput: 781MB/s
sha1 time 11025ms, avg time 1102ms, avg throughput: 973MB/s
sha256 time 22539ms, avg time 2253ms, avg throughput: 476MB/s
gzip -6 time 372995ms, avg time 37299ms, avg throughput: 28MB/s
lz4 -1 time 15118ms, avg time 1511ms, avg throughput: 710MB/s
512K buffer
copy time 1782ms, avg time 178ms, avg throughput: 6025MB/s
md5 time 13724ms, avg time 1372ms, avg throughput: 782MB/s
sha1 time 10959ms, avg time 1095ms, avg throughput: 979MB/s
sha256 time 22982ms, avg time 2298ms, avg throughput: 467MB/s
gzip -6 time 378120ms, avg time 37812ms, avg throughput: 28MB/s
lz4 -1 time 15484ms, avg time 1548ms, avg throughput: 693MB/s
256K buffer
copy time 1805ms, avg time 180ms, avg throughput: 5948MB/s
md5 time 13706ms, avg time 1370ms, avg throughput: 783MB/s
sha1 time 11074ms, avg time 1107ms, avg throughput: 969MB/s
sha256 time 22588ms, avg time 2258ms, avg throughput: 475MB/s
gzip -6 time 372645ms, avg time 37264ms, avg throughput: 28MB/s
lz4 -1 time 16346ms, avg time 1634ms, avg throughput: 656MB/s
The --repo-retention-full-type option allows retention of full backups based on a time period, specified in days.
The new option will default to 'count' and therefore will not affect current installations. Setting repo-retention-full-type to 'time' will allow the user to use a time period, in days, to indicate full backup retention. Using this method, a full backup can be expired only if the time the backup completed is older than the number of days set with repo-retention-full (calculated from the moment the 'expire' command is run) and at least one full backup meets the retention period. If archive retention has not been configured, then the default settings will expire archives that are prior to the oldest retained full backup. For example, if there are three full backups ending in times that are 25 days old (F1), 20 days old (F2) and 10 days old (F3), then if the full retention period is 15 days, then only F1 will be expired; F2 will be retained because F1 is not at least 15 days old.
An upcoming feature requires new parameters for storagePosixNew() and this causes a lot of churn because almost every test creates a Posix storage object. Some refactoring in the tests might reduce this duplication but storagePosixNew() is collecting a lot of parameters so converting to storagePosixNewP() makes sense in any case.
There are relatively few call sites in the core code but they still benefit from better readability after this change.
Timeout used for connections and read/write operations.
Note that the entire read/write operation does not need to complete within this timeout but some progress must be made, even if it is only a single byte.
This is really a socket option so the new name is clearer.
Since common/io/socket/tcp will contains a mix of options it makes sense to rename it to socket and cascade name changes as needed.
Prior to 2.25 the individual TCP keep-alive options were not being configured due to a missing header. In 2.25 they were being configured incorrectly due to a disconnect between the timeout specified in ms and what was expected by the TCP options, i.e. seconds.
Instead make the TCP keep-alive options directly configurable, with correct units and better testing. Keep-alive is enabled by default (though it can be defaulted to the system setting instead) and the rest of the options are not set by default. This is in line with what PostgreSQL does, though PostgreSQL does not allow keep-alive to be defaulted.
Also move configuration of TCP options before connect() as PostgreSQL does.
Add compress-type option and deprecate compress option. Since the compress option is boolean it won't work with multiple compression types. Add logic to cfgLoadUpdateOption() to update compress-type if it is not set directly. The compress option should no longer be referenced outside the cfgLoadUpdateOption() function.
Add common/compress/helper module to contain interface functions that work with multiple compression types. Code outside this module should no longer call specific compression drivers, though it may be OK to reference a specific compression type using the new interface (e.g., saving backup history files in gz format).
Unit tests only test compression using the gz format because other formats may not be available in all builds. It is the job of integration tests to exercise all compression types.
Additional compression types will be added in future commits.
These commands (e.g. restore, archive-get) never used the compress options but allowed them to be passed on the command line. Now they will error when these options are passed on the command line. If these errors occur then remove the unused options.
Although path-style URIs have been deprecated by AWS, they may still be used with products like Minio because no additional DNS configuration is required.
Path-style URIs must be explicitly enabled since it is not clear how they can be auto-detected reliably. More importantly, faulty detection could cause regressions in current installations.
This macro was created before the String object existed so subsequent usage with String always included a lot of strPtr() wrapping.
TEST_RESULT_STR_Z() had already been introduced but a wholesale replacement of TEST_RESULT_STR() was not done since the priority was on the C migration.
Update all calls to (old) TEST_RESULT_STR() with one of the following variants: (new) TEST_RESULT_STR(), TEST_RESULT_STR_Z(), TEST_RESULT_Z(), TEST_RESULT_Z_STR().
Adding a dummy column which is always set by the P() macro allows a single macro to be used for parameters or no parameters without violating C's prohibition on the {} initializer.
-Wmissing-field-initializers remains disabled because it still gives wildly different results between versions of gcc.
If this option is set then ports appended to repo-s3-endpoint or repo-s3-host will be ignored.
Setting this option explicitly may be the only way to use a bare ipv6 address with S3 (since multiple colons confuse the parser) but we plan to improve this in the future.
Secure options could show up in the help as "current". While the user must have permissions to see the source of the options (e.g. environment, config file) it's still not a good idea to display them in an unexpected context.
Instead show secure options as <redacted> in the help command.
The new name is preferred because pgBackRest does not support any SSL protocol versions (they are all considered to be insecure).
The old name will continue to be accepted.
Remove "File" and "Driver" from object names so they are shorter and easier to keep consistent.
Also remove the "driver" directory so storage implementations are visible directly under "storage".
The function pointer casting used when creating drivers made changing interfaces difficult and led to slightly divergent driver implementations. Unit testing caught production-level errors but there were a lot of small issues and the process was harder than it should have been.
Use void pointers instead so that no casts are required. Introduce the THIS_VOID and THIS() macros to make dealing with void pointers a little safer.
Since we don't want to expose void pointers in header files, driver functions have been removed from the headers and the various driver objects return their interface type. This cuts down on accessor methods and the vast majority of those functions were not being used. Move functions that are still required to .intern.h.
Remove the special "C" crypto functions that were used in libc and instead use the standard interface.
There is only one instance in the core code where this helps. It is mostly helpful in the tests.
There is an argument to be made that only THROW_SYS_ERROR*() variants should be used in the core code to improve test coverage. If so, that will be the subject of a future commit.