Unable to reproduce this anymore. It seems to have been fixed with the last round of config changes. Add regression tests to make sure it doesn't happen again.
Most tests are working now. What's not working:
1) --target-resume option fails because pause_on_recovery setting was removed. Need to implement to the new 9.5 option and make that work with older versions in a consistent way.
2) No tests for the new .partial WAL segments that can be generated on timeline switch.
* Major refactoring of the protocol layer to support this work.
* Fixed protocol issue that was preventing ssh errors (especially connect) from being logged.
Replaced IPC::System::Simple and Net::OpenSSH with IPC::Open3 to eliminate CPAN dependency for multiple distros. Using open3 will also be used for local processes so it make sense to switch now.
Also stopped replacing FORMAT number which explains the large number of test log changes. FORMAT should change very rarely and cause test log failures when it does.
More separation of the protocol and remote layers than was done in issue #106.
Settings are passed to the remote via command-line parameters rather than in the protocol.
* Includes updating the manifest to format 4. It turns out the manifest and .info files were not very good for providing information. A format update was required anyway so worked through the backlog of changes that would require a format change.
* Multiple database versions are now supported in the archive. Does't actually work yet but the structure should be good.
* Tests use more constants now that test logs can catch name regressions.
1) Re-checksums files that have checksums in the manifest
2) Recopies files that do not have a checksum
3) Saves the manifest at regular intervals to preserve checksums
4) Unit tests for all cases (that I can think of)
* Fixed a buffering error that could occur on large, highly-compressible files when copying to an uncompressed remote destination. The error was detected in the decompression code and resulted in a failed backup rather than corruption so it should not affect successful backups made with previous versions.
* Pushing duplicate WAL now generates an error. This worked before only if checksums were disabled.
* Database System IDs are used to make sure that all WAL in an archive matches up. This should help prevent misconfigurations that send WAL from multiple clusters to the same archive.
* Regression tests working back to PostgreSQL 8.3.
* Improved threading model by starting threads early and terminating them late.
* Added restore functionality.
* All options can now be set on the command-line making pg_backrest.conf optional.
* De/compression is now performed without threads and checksum/size is calculated in stream. That means file checksums are no longer optional.
* Added option `--no-start-stop` to allow backups when Postgres is shut down. If `postmaster.pid` is present then `--force` is required to make the backup run (though if Postgres is running an inconsistent backup will likely be created). This option was added primarily for the purpose of unit testing, but there may be applications in the real world as well.
* Fixed broken checksums and now they work with normal and resumed backups. Finally realized that checksums and checksum deltas should be functionally separated and this simplied a number of things. Issue #28 has been created for checksum deltas.
* Fixed an issue where a backup could be resumed from an aborted backup that didn't have the same type and prior backup.
* Removed dependency on Moose. It wasn't being used extensively and makes for longer startup times.
* Checksum for backup.manifest to detect corrupted/modified manifest.
* Link `latest` always points to the last backup. This has been added for convenience and to make restores simpler.
* More comprehensive unit tests in all areas.
* Complete rewrite of BackRest::File module to use a custom protocol for remote operations and Perl native GZIP and SHA operations. Compression is performed in threads rather than forked processes.
* Fairly comprehensive unit tests for all the basic operations. More work to be done here for sure, but then there is always more work to be done on unit tests.
* Removed dependency on Storable and replaced with a custom ini file implementation.
* Added much needed documentation (see INSTALL.md).
* Numerous other changes that can only be identified with a diff.