If PostgreSQL crashes it can leave behind a pg_internal.init temp file with the pid as the extension, as discussed in https://www.postgresql.org/message-id/flat/20200131045352.GB2631%40paquier.xyz#7700b9481ef5b0dd5f09cc410b4750f6. On restart this file is not cleaned up so it can persist for the lifetime of the cluster or until another process with the same id happens to write pg_internal.init.
This is arguably a bug in PostgreSQL, but in any case it makes sense not to backup this file.
This error was lost during the migration to C. The error that occurred instead (generally an SSH auth error) was hard to debug.
Restore the original behavior by throwing an error immediately if pg1-host is configured for any of these commands. reset-pg1-host can be used to suppress the error when required.
The main improvement is a double-fork to prevent zombie processes if the parent process exits after the (child) async process. This is a real possibility since the parent process sticks around to monitor the results of the async process.
In the first fork, ignore SIGCHLD in the very unlikely case that the async process exits before the first fork. This is probably only possible if the async process exits immediately, perhaps due to a chdir() failure. Set SIGCHLD back to default in the async process so waitpid() will work as expected.
Also update the comment on chdir() to more accurately reflect what is happening.
Finally, add a test in certain debug builds to ensure the first fork exits very quickly. This only works when valgrind is not in use because valgrind makes forking so slow that it is hard to tell if the async process performed work or not (in the case that the second fork goes missing and the async process is a direct child).
In this case the resumable backup should be ignored, but the C code was not able to load the partial manifest written by Perl since the format differs slightly. Add validations to catch this case and continue gracefully.
2a06df93 removed the error file so an old error would not be reported before the async process had a chance to try again. However, if the async process was already running this might lead to a timeout error before reporting the correct error.
Instead, remove the error files once we know that the async process will start, i.e. after the archive lock has been acquired.
This effectively reverts 2a06df93.
Generally, the content-size or content-encoding headers will be used to specify how much content should be expected.
There is a special case where the server sends 'Connection:close' without the content headers and the content may be read up until eof.
This appears to be an atypical usage but it is required by the specification.
Auto-selection is performed only when --set is not specified. If a backup set for the given target time cannot not be found, the latest (default) backup set will be used.
Currently a limited number of date formats are recognized and timezone names are not allowed, only timezone offsets.
Add tzPartsValid() and tzOffsetSecond() to calculate timezone offsets from user provided values.
Update epochFromParts() to accept a timezone offset in seconds.
The test that checks for no output from the server was leaving a connection open which valgrind was complaining about.
Wait on the server long enough to cause the error on the client then close the connection to free the memory.
Validate that checksums exist for zero size files. This means that the checksums for zero size files are explicitly set by backup even though they'll always be the same. Also validate that zero length files have the correct checksum.
Validate that repo size is > 0 if size is > 0. No matter what compression type is used a non-zero amount of data cannot be stored in zero bytes.
Bug Fixes:
* Fix missing files corrupting the manifest. If a file was removed by PostgreSQL during the backup (or was missing from the standby) then the next file might not be copied and updated in the manifest. If this happened then the backup would error when restored. (Reviewed by Cynthia Shang. Reported by Vitaliy Kukharik.)
Improvements:
* Use pkg-config instead of xml2-config for libxml2 build options. (Contributed by David Steele, Adrian Vondendriesch.)
* Validate checksums are set in the manifest on backup/restore. (Reviewed by Cynthia Shang.)
This is a modest start but it addresses the specific issue that was caused by the bug fixed in 45ec694a. This validation will produce an immediate error rather than erroring out partway through the restore.
More validations are planned but this is the most important one and seems safest for this release.
If a file was removed by PostgreSQL during the backup (or was missing from the standby) then the next file might not be copied and updated in the manifest. If this happened then the backup would error when restored.
The issue was that removing files from the manifest invalidated the pointers stored in the processing queues. When a file was removed, all the pointers shifted to the next file in the list, causing a file to be unprocessed. Since the unprocessed file was still in the manifest it would be saved with no checksum, causing a failure on restore.
When process-max was > 1 then the bug would often not express since the file had already been pulled from the queue and updates to the manifest are done by name rather than by pointer.
The last queue was not being sorted when a primary queue was added first.
This did not affect the backup or integrity but could lead to slightly lower performance since large files were not always copied first.
pkg-config is a generic way to get build options rather than relying on a package-specific utility.
XML2_CONFIG can be used to override this utility for systems that do not ship pkg-config.
Previously memNew() used memset() to initialize all struct members to 0, NULL, false, etc. While this appears to work in practice, it is a violation of the C specification. For instance, NULL == 0 must be true but neither NULL nor 0 must be represented with all zero bits.
Instead use designated initializers to initialize structs. These guarantee that struct members will be properly initialized even if they are not specified in the initializer. Note that due to a quirk in the C99 specification at least one member must be explicitly initialized even if it needs to be the default value.
Since pre-zeroed memory is no longer required, adjust memAllocInternal()/memReallocInternal() to return raw memory and update dependent functions accordingly. All instances of memset() have been removed except in debug/test code where needed.
Add memMewPtrArray() to allocate an array of pointers and automatically set all pointers to NULL.
Rename memGrowRaw() to the more logical memResize().
Bug Fixes:
* Fix error in timeline conversion. The timeline is required to verify WAL segments in the archive after a backup. The conversion was performed base 10 instead of 16, which led to errors when the timeline was ≥ 0xA. (Reported by Lukas Ertl, Eric Veldhuyzen.)
The timeline is required to verify WAL segments in the archive after a backup. The conversion was performed base 10 instead of 16, which led to errors when the timeline was ≥ 0xA.