Errors should be rare enough that it makes sense to log them at debug level. Right now an error is not logged at debug level, which makes it harder to tell why the main process may have terminated the local/remote process.
Coverity complained about time_t being cast directly to unsigned int, so instead cast the result of the operation.
We are confident in both cases that the time_t values will not be out of unsigned int range but Coverity has no way to know that.
One of these is new (introduced by 9efd5cd0) but the other one (from a9867cb0) remained unnoticed for a while, though it has not caused any production impact.
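A minimal sketch of the cast change described above. The helper name elapsedSeconds() and the "flagged" form shown in the comment are assumptions for illustration, not the actual code touched by this commit:

```c
#include <assert.h>
#include <time.h>

// Hypothetical helper: elapsed seconds between two timestamps as an unsigned int.
// The subtraction is done in time_t and only the result is narrowed, so the cast
// happens in one place after the arithmetic.
static unsigned int
elapsedSeconds(const time_t begin, const time_t end)
{
    // The caller is trusted to keep the difference in unsigned int range
    assert(end >= begin);

    // Assumed flagged form: (unsigned int)end - (unsigned int)begin
    return (unsigned int)(end - begin);
}
```

For example, elapsedSeconds(1700000000, 1700000042) returns 42.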
The initial implementation used simple waits when having to loop due to getting a LIBSSH2_ERROR_EAGAIN, but we don't want to just wait some amount of time; we want to wait until we're able to read or write on the fd that we would have blocked on.
This change removes all of the wait code from the SFTP driver and changes the loops to call the newly introduced storageSftpWaitFd(), which in turn checks with libssh2 to determine the appropriate direction to wait on (read, write, or both) and then calls fdReady() to perform the wait using the provided timeout.
This also removes the need to pass ioSession or timeout down into the SFTP read/write code.
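A rough sketch of waiting on an fd by direction, using poll() directly. This is not the storageSftpWaitFd() or fdReady() implementation. waitFd(), WAIT_READ, and WAIT_WRITE are hypothetical names. In the driver the direction would come from libssh2 (presumably via libssh2_session_block_directions()) rather than being passed in by hand:

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>

// Hypothetical direction flags standing in for the mask reported by libssh2
#define WAIT_READ                                                   1
#define WAIT_WRITE                                                  2

// Wait until fd reports an event in the requested direction(s) or timeoutMs elapses.
// Returns true when poll() reports an event (including error/hangup) and false on
// timeout or poll() failure. Retry on EINTR is omitted to keep the sketch short.
static bool
waitFd(const int fd, const int direction, const int timeoutMs)
{
    assert(direction != 0);

    struct pollfd poller = {.fd = fd, .events = 0};

    if (direction & WAIT_READ)
        poller.events |= POLLIN;

    if (direction & WAIT_WRITE)
        poller.events |= POLLOUT;

    return poll(&poller, 1, timeoutMs) > 0;
}
```

A loop that gets LIBSSH2_ERROR_EAGAIN would call something like waitFd() and retry only when it returns true, instead of sleeping for a fixed interval.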
In the case that no backups were expired but time-based retention was met no archive expiration would occur and the following would be logged:
INFO: time-based archive retention not met - archive logs will not be expired
In most cases this was harmless, but when retention was first met or if retention was increased, it would require one additional backup to expire earlier WAL. After that, expiration worked as normal.
Even once expiration was working normally the message would continue to be output, which was pretty misleading since retention had been met, even though there was nothing to do.
Bring this code in line with count-based retention, i.e. always log what should be expired at detail level (even if nothing will be expired) and then log info about what was expired (even if nothing is expired). For example:
DETAIL: repo1: 11-1 archive retention on backup 20181119-152138F, start = 000000010000000000000002
INFO: repo1: 11-1 no archive to remove
Seems easiest just to make the additional config required since it tests that custom ports are being used correctly. The test for synthetic was a noop since SFTP is not used in synthetic tests.
If there were at least two full backups and the last one was expired, it was impossible to make either a differential or incremental backup without first making a new full backup. The backupLabelCreate() function identified this situation as clock skew because the new backup label was compared with the label of the expired full backup.
If the new backup is differential or incremental, then its label is now compared with the labels of differential or incremental backups related to the same full backup.
Also convert a hard-coded date length to a macro.
9ca492c missed adding auditing to this macro and as a result a few memory leaks have slipped through. Add auditing to the macro to close this hole.
Of the leaks found the only possibly serious one is in blockIncrProcess(), which would leak a PackRead of about eight bytes with every superblock. Since superblocks max out at a few thousand per file this was probably not too bad.
Also change the ordering of auditing in FUNCTION_TEST_RETURN_VOID(). Even though the order does not matter, having it different from the other macros is confusing and looks like an error.
This behavior is different than regular options where a repeated value will result in an error. It appears to be a legacy of the original Perl implementation, which used a hash as the underlying data type in the built-in command-line parser, and the C command-line parser was written to match.
This was missed in the C unit test migration and since then a new test was added that was not setting its timezone correctly.
This feature exists to make sure the tests will run on systems with different timezones and has no impact on the core code.
Deleting a stanza after all the storage driver stanzas were created was causing problems because the SFTP driver is slow and the GCS driver has no server (so it threw errors). This delayed the shutdown of PostgreSQL, which for some reason caused systemctl to hang when the documentation was being built on a RHEL host.
Move the section up and add a comment about why the location is required. Also add a comment to the GCS section about its location.
This does not address the issue of systemctl hanging on RHEL container hosts but it will hopefully make it less common.
Bring PostgreSQL >= 12 behavior in line with other versions when recovery type=none.
We are fairly sure this did not work correctly when PostgreSQL 12 was released, but apparently the issue has been fixed since then. Either way, after testing we have determined that the behavior is now as expected.
Some features are conditionally compiled into pgBackRest (e.g. lz4). Previously checking to see if the feature existed was the responsibility of the feature's module.
Centralize this logic in the config/parse module to make the errors more detailed and consistent.
This also fixes the assert that is thrown when SFTP storage is specified but SFTP support is not compiled into pgBackRest.
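One way to picture the centralized check is a single table consulted during option parsing. This is only a sketch under assumptions: the macros HAVE_LIBLZ4 and HAVE_LIBSSH2, the function optionValueCheck(), and the error text are hypothetical and do not reflect the actual config/parse code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

// Hypothetical build macros. A real build system would define these when the library is found.
#ifdef HAVE_LIBLZ4
    #define FEATURE_LZ4                                             true
#else
    #define FEATURE_LZ4                                             false
#endif

#ifdef HAVE_LIBSSH2
    #define FEATURE_SFTP                                            true
#else
    #define FEATURE_SFTP                                            false
#endif

// One table consulted by the parser instead of a separate check in each feature module
static const struct
{
    const char *value;                                              // Option value that selects the feature
    bool compiled;                                                  // Was support compiled in?
} featureList[] =
{
    {"lz4", FEATURE_LZ4},
    {"sftp", FEATURE_SFTP},
};

// Returns true when the value is usable. Otherwise writes an error message to error and
// returns false. Values not in the table are assumed to be always available.
static bool
optionValueCheck(const char *const option, const char *const value, char *const error, const size_t errorSize)
{
    assert(option != NULL && value != NULL && error != NULL);

    for (size_t idx = 0; idx < sizeof(featureList) / sizeof(featureList[0]); idx++)
    {
        if (strcmp(featureList[idx].value, value) == 0 && !featureList[idx].compiled)
        {
            snprintf(error, errorSize, "'%s' is not valid for '%s': support not compiled into this build", value, option);
            return false;
        }
    }

    return true;
}
```

Because every option goes through the same function, the error wording is identical for each missing feature, and an unsupported value is rejected at parse time rather than reaching an assert later.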
Combine StringId and int checking into a single loop. This seems more compact and makes it easier to add code that affects both types (and possibly more types in the future).
This was not tested in 87087fac and the generated config was only valid for pushing from the primary. Also do some general cleanup.
Update the SFTP server user to be "pgbackrest" instead of "postgres".
Even though sftp-all=y now creates a valid configuration, the user guide build still fails because SFTP is too slow and operations time out (particularly starting PostgreSQL). This will need to be addressed in a future commit.
This parameter is now optional and defaults to none so there is no reason to explicitly show it in user-facing documentation.
Also make the vm parameter in ci.pl optional to be consistent with how test.pl behaves.
The --no-log-timestamp option was missed when unit test building was migrated to C, which caused test timings to show up in the contributing guide. This caused no harm but did create churn in this file during releases.
Also improve the formatting when test timing is disabled.
Features:
* Block incremental backup. (Reviewed by John Morris, Stephen Frost, Stefan Fercot.)
* SFTP support for repository storage. (Contributed by Reid Thompson. Reviewed by Stephen Frost, David Steele.)
* PostgreSQL 16 support. (Reviewed by Stefan Fercot.)
Improvements:
* Allow page header checks to be skipped. (Reviewed by David Christensen. Suggested by David Christensen.)
* Avoid chown() on recovery files during restore. (Reviewed by Stefan Fercot, Marcelo Henrique Neppel. Suggested by Marcelo Henrique Neppel.)
* Add error retry detail for HTTP retries.
Documentation Improvements:
* Add warning about using recovery type=none. (Reviewed by Stefan Fercot.)
* Add note about running stanza-create on already-created repositories.
The prior timeouts were a bit aggressive and were causing timeouts in the Azure tests. There have also been occasional timeouts in other storage drivers.
The performance of CI environments is pretty variable so increased timeouts should make the tests more stable.
Double spaces have fallen out of favor in recent years because they no longer contribute to readability.
We have been using single spaces and editing related paragraphs for some time, but now it seems best to update the remaining instances to avoid churn in unrelated commits and to make it clearer what spacing contributors should use.
Remove beta status and update documentation to remove beta references and warnings.
The repo-block-* sub-options have been marked internal. Most users will be best off with the default behavior and we may still decide to change these options or remove them in the future.