Multiple status files were being created by asynchronous archiving if a high-level error occurred after one or more WAL segments had already been transferred successfully. Error files were being written for every file in the queue regardless of whether it had already succeeded. To fix this, add an option to skip writing error files when an ok file already exists.
There are other situations where both files might exist (various fsync and filesystem error scenarios) so it seems best to retry in the case that multiple status files are found rather than throwing a hard error (which then means that archiving is completely stuck). In the case of multiple status files, a warning will be logged to alert the user that something unusual is happening and the command will be retried.
Reported by fpa-postgres, Joe Ayers, Douglas J Hunley.
This calculation was missed when the WAL segment size was made dynamic in preparation for PostgreSQL 11.
Fix the calculation by checking the actual WAL file sizes instead of using an estimate based on WAL segment size. This is more accurate because it takes into account .history and .backup files, which are smaller. Since the calculation is done in the async process the additional processing time should not adversely affect performance.
Remove the PG_WAL_SIZE constant and instead use local constants where the old value is still required. This is only the case for some tests and PostgreSQL 8.3 which does not provide a way to get the WAL segment size from pg_control.
If an error occurred while acquiring a lock on a remote server the error would be reported correctly, but the queue max detection code was not reached. The tests failed to detect this because they fixed the connection before queue max, allowing the ccde to be reached.
Move the queue max code before the lock so it will run even when remote connections are not working. This means that no attempt will be made to transfer WAL once queue max has been exceeded, but it makes it much more likely that the code will be reach without error.
Update tests to continue errors up to the point where queue max is exceeded.
Reported by Lardière Sébastien.
The Perl process was exiting directly when called but that interfered with proper locking for the forked async process. Now Perl returns results to the C process which handles all errors, including signals.
Now only two types of locks can be taken: archive and backup. Most commands use one or the other but the stanza-* commands acquire both locks. This provides better protection than the old command-based locking scheme.
Move command begin to C except when it must be called after another command in Perl (e.g. expire after backup). Command begin logs correctly for complex data types like hash and list. Specify which commands will log to file immediately and set the default log level for log messages that are common to all commands. File logging is initiated from C.