Eliminate summing and passing of copied files sizes for logging backup size.
Instead, utilize infoBackupDataByLabel() to pull the backup size for the log message.
This allows boolean boolean command-line options to work like their config file equivalents.
At least for now this behavior will remain undocumented since all examples in the documentation will continue to use the standard syntax. The idea is that it will "just work" when options are copied out of config files rather than generating an error.
Previously the archive was only checked at the end of the backup to ensure all WAL required to make the backup consistent was present. The problem was that if archiving was not functioning then the backup had to complete before the user found out, which could be a while if the database was large enough.
Add an archive check immediately after backup start so failures are reported earlier.
The trick is to determine which WAL to check. If the repo is new there may not be any WAL in it and pg_start_backup() will not switch the WAL segment if it is empty. These are both likely scenarios when setting up and/or testing pgBackRest.
If the WAL segment is switched by pg_start_backup(), then check the archive for the segment that was detected prior to backup start. This should be common on normal running clusters with regular activity. Note that this might not be the segment immediately prior to the backup start segment if WAL volume is high.
If pg_start_backup() did not switch the WAL then we can force a switch on PostgreSQL >= 9.3 by creating a restore point. In that case the WAL to check will be the backup start WAL. This is most likely to happen on idle systems, during testing, or immediately after a repo switch.
An advantage of this approach other than earlier notification is that the backup directory will not be created so no resume will be attempted on the next backup.
Note that some additional churn was created in backup.c because the load of archive.info needs to be done earlier.
This is easier to read than using infoBackupDataByLabel() != NULL.
It also allows an assertion to be added to infoBackupDataByLabel() to ensure that a NULL return value is not used unsafely.
This test was lost due to a syntax issue in a58635ac.
Update the test to use system() to better mimic what postgres does and add logging so pgBackRest timing can be determined.
Properly log the size of files copied during the backup, matching the backup size returned from the info command.
In the reference issue, the incremental backup after switchover logs the size of all files evaluated rather than only the size of the files copied in the backup.
This appears to have been an attempt to not delete files that we don't recognize, but it only works in narrow cases and could leave the user is a position of not being able to complete the stanza delete without manual intervention. It seems better just to proceed with the delete, especially since the info files have already been removed.
In addition, deleting the manifests individually could be slow on object stores if there were a very large number of backups.
Size option default and allowed values were displayed in bytes, which was confusing for the user.
This also lays the groundwork for adding units to time options.
Move option parsing functions into a common module so they can be used from the build module.
Allows users to provide an executable to be used when pgbackrest generates command strings that expect to invoke pgbackrest. These generated commands are written to files by pgbackrest, e.g. recovery.conf.
The error handler used a loop to process try, catch, and finally blocks. This worked fine but static analysis tools like Coverity did not understand that the finally block would always run and so there were false positives about double-free, unfreed resource, etc.
This implementation removes the loop, which simplifies everything, and makes it clear that the finally block will always run. This cuts down on Coverity false positives.
This implementation also catches lack of coverage on empty catch blocks so a few test fixes were committed separately in d74fe7a.
A small refactor in backup.c is required because gcc 10.3.1 on Fedora 33 complains that the reason variable may be used uninitialized. It's not clear why this is the case, but reducing the scope of the TRY block fixes the issue.
Rather the converting String to StringIds at runtime, store defaults in StringId format in parse.auto.c and convert user input to StringId during parsing.
The compress-type, repo-type and log-level-* options have allow lists, which means it is more efficient to treat them as StringIds.
For compress-type and log-level-* also update the functions that convert them to enums.
The strIdFrom*() forced the caller to pick an encoding, which led to a number of TRY...CATCH blocks in the code. In practice the caller does not care which encoding is used as long as the string is valid for some encoding.
Update the strIdFrom*() function to try all possible encodings and only throw an error when the string is not valid for any of them.
Bug Fixes:
* Allow "global" as a stanza prefix. (Reviewed by Stefan Fercot. Reported by Younes Alhroub.)
* Fix segfault on invalid GCS key file. (Reviewed by Stephen Frost. Reported by Henrik Feldt.)
Improvements:
* Allow link-map option to create new links. (Reviewed by Don Seiler, Stefan Fercot, Chris Bandy. Suggested by Don Seiler.)
* Increase max index allowed for pg/repo options to 256. (Reviewed by Cynthia Shang.)
* Add WebIdentity authentication for AWS S3. (Reviewed by James Callahan, Reid Thompson, Benjamin Blattberg, Andrew L'Ecuyer.)
* Report backup file validation errors in backup.info. (Contributed by Stefan Fercot. Reviewed by David Steele.)
* Add recovery start time to online backup restore log. (Reviewed by Tom Swartz, Stefan Fercot. Suggested by Tom Swartz.)
* Report original error and retries on local job failure. (Reviewed by Stefan Fercot.)
* Rename page checksum error to error list in info text output. (Reviewed by Stefan Fercot.)
* Add hints to standby replay timeout message. (Reviewed by Cynthia Shang, Stefan Fercot. Suggested by Leigh Downs.)
The new rendering behavior is correct in normal cases, but for the pre-rendered HTML blocks in the command and configuration references it causes a lot of churn. This would be OK if the new HTML was diff-able, but it is not.
Go back to the old behavior of using br tags for this case to reduce churn until a more permanent solution is found.
Since CentOS 8 will be EOL at the end of the year it makes sense to do this now. The centos:8 image is still used in documentation.xml because changes there require manual testing, which will need to be done at a later date. The changes are not user-facing, however, and can be done at any time.
Also update CentOS references to RHEL since that is what we are emulating for testing purposes.
Currently empty CATCH() blocks are always marked as covered because of the loop structure of error handling.
A prototype implementation of error handling without looping has shown that these CATCH() blocks are not covered without new tests. Whether or not that prototype gets committed it is worth adding the tests.
This is mostly to revert some comment changes in b11ab9f7 that will break the ppc64le patch, but at the same time keep the spelling consistent in all comments and documentation.
Also revert some space changes for the same reason.
Azurite released another breaking change (see fbd018cd, 096829b3, c38d6926, and Azurite issue 1039) so make adjustments as needed to documentation and tests.
Also remove some dead code that hid the repo-storage-host option and was made obsolete by all these changes.
Checking the return value is not terribly important here, but if setsockopt() fails it is likely that bind() will fail as well. May as well get it over with and this makes Coverity happy.