Keepalives are now used to make sure the remote for the main process does not time out while the thread remotes do all the work. The error messages for timeouts were also improved to make debugging easier.
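A minimal sketch of the keepalive idea, not the actual implementation: while worker connections do the transfer, the idle main connection is pinged often enough that it never reaches its timeout. The names KeepaliveTimer, wait_for_workers, send_keepalive and the half-timeout interval are all assumptions made for illustration.

```python
import time


class KeepaliveTimer:
    """Tracks when the next keepalive is due on an otherwise idle connection."""

    def __init__(self, timeout, clock=time.monotonic):
        # Assumption: pinging at half the timeout leaves a comfortable margin.
        self.interval = timeout / 2
        self.clock = clock
        self.last_sent = clock()

    def due(self):
        return self.clock() - self.last_sent >= self.interval

    def mark_sent(self):
        self.last_sent = self.clock()


def wait_for_workers(workers_done, send_keepalive, timer, poll=0.1, sleep=time.sleep):
    """Poll until the workers finish, pinging the main connection when due.

    Returns the number of keepalives sent.
    """
    sent = 0
    while not workers_done():
        if timer.due():
            send_keepalive()
            timer.mark_sent()
            sent += 1
        sleep(poll)
    return sent
```

The clock and sleep functions are injectable so the loop can be exercised in a test without waiting in real time.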
Unable to reproduce this anymore. It seems to have been fixed with the last round of config changes. Add regression tests to make sure it doesn't happen again.
Most tests are working now. What's not working:
1) --target-resume option fails because the pause_on_recovery setting was removed. Need to implement the new 9.5 option and make it work with older versions in a consistent way.
2) No tests for the new .partial WAL segments that can be generated on timeline switch.
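One possible shape for item 1, as a sketch only. It assumes the removed setting is PostgreSQL's pause_at_recovery_target (9.1 through 9.4) and that the new 9.5 option is recovery_target_action; the function name pause_setting is hypothetical.

```python
def pause_setting(pg_version, pause):
    """Return the recovery.conf line that controls pausing at the recovery target.

    Assumes pg_version looks like "9.4" or "9.5" and is at least 9.1.
    """
    major, minor = (int(part) for part in pg_version.split(".")[:2])

    if (major, minor) >= (9, 5):
        # 9.5 and later: a single option selects the action taken at the target.
        action = "pause" if pause else "promote"
        return f"recovery_target_action = '{action}'"

    # Before 9.5: a boolean setting controls pausing.
    value = "true" if pause else "false"
    return f"pause_at_recovery_target = '{value}'"
```

Mapping the boolean onto pause/promote keeps the caller's option the same across versions, which is the consistency the item asks for.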
* Major refactoring of the protocol layer to support this work.
* Fixed protocol issue that was preventing ssh errors (especially connect) from being logged.
Also stopped replacing the FORMAT number, which explains the large number of test log changes. FORMAT should change very rarely and should cause test log failures when it does.
More separation of the protocol and remote layers than was done in issue #106.
Settings are passed to the remote via command-line parameters rather than in the protocol.
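A rough sketch of passing settings on the remote's command line instead of inside the protocol. The executable name, the "remote" command word and the option names are placeholders, not the real interface.

```python
import shlex


def build_remote_command(host, user, settings, exe="remote-exe"):
    """Build an ssh command that carries settings as --key=value parameters.

    exe and the trailing "remote" command are hypothetical placeholders.
    """
    args = [exe]
    for key in sorted(settings):
        args.append(f"--{key}={settings[key]}")
    args.append("remote")

    # Quote each argument so values with spaces survive the remote shell.
    remote_line = " ".join(shlex.quote(arg) for arg in args)
    return ["ssh", f"{user}@{host}", remote_line]
```

With this approach the remote is fully configured before the first protocol message is exchanged, so the protocol itself only has to carry commands and data.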
* Includes updating the manifest to format 4. It turns out the manifest and .info files were not very good for providing information. A format update was required anyway so worked through the backlog of changes that would require a format change.
* Multiple database versions are now supported in the archive. Doesn't actually work yet but the structure should be good.
* Tests use more constants now that test logs can catch name regressions.
1) Re-checksums files that have checksums in the manifest
2) Recopies files that do not have a checksum
3) Saves the manifest at regular intervals to preserve checksums
4) Unit tests for all cases (that I can think of)
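The resume steps above can be sketched roughly as follows. This is an illustration under assumptions, not the real code: the manifest is a flat dict of file name to checksum (None when no checksum was saved), SHA-1 is the hash, a checksum mismatch triggers a recopy, and save_every stands in for "regular intervals".

```python
import hashlib
import json
import os
import shutil


def file_checksum(path):
    """Return the SHA-1 hex digest of a file (hash choice is an assumption)."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def save_manifest(manifest, manifest_path):
    with open(manifest_path, "w") as handle:
        json.dump(manifest, handle)


def resume_backup(source_dir, backup_dir, manifest, manifest_path, save_every=2):
    """Resume an interrupted backup and return the names that were recopied."""
    recopied = []
    for count, name in enumerate(sorted(manifest), start=1):
        source = os.path.join(source_dir, name)
        target = os.path.join(backup_dir, name)
        expected = manifest[name]

        # Keep a file only if it has a saved checksum that still matches.
        valid = (
            expected is not None
            and os.path.exists(target)
            and file_checksum(target) == expected
        )
        if not valid:
            shutil.copyfile(source, target)
            manifest[name] = file_checksum(target)
            recopied.append(name)

        # Save periodically so checksums survive another interruption.
        if count % save_every == 0:
            save_manifest(manifest, manifest_path)

    save_manifest(manifest, manifest_path)
    return recopied
```

Saving the manifest during the loop is what makes a second resume cheaper than the first: files copied before the interruption already have checksums and only need to be verified.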
All tests were run locally over SSH with rsync default compression, 4 threads and default compression on backrest. The backrest default is gzip level 6; rsync is assumed to be the same.
On a 1GB DB:
rsync time = 32.82
backrest time = 19.48
backrest runs at about 168% of rsync's speed (32.82 / 19.48 ≈ 1.68).
On a 5GB DB:
rsync time = 171.16
backrest time = 86.97
backrest runs at about 197% of rsync's speed (171.16 / 86.97 ≈ 1.97).
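For reference, the speed comparison can be recomputed from the times listed above by dividing the rsync time by the backrest time. The helper name speed_ratio is made up for this sketch.

```python
def speed_ratio(baseline_time, candidate_time):
    """Return how many times faster the candidate is than the baseline."""
    return baseline_time / candidate_time


# Times taken from the two test runs listed above.
runs = (("1GB", 32.82, 19.48), ("5GB", 171.16, 86.97))

for label, rsync_time, backrest_time in runs:
    ratio = speed_ratio(rsync_time, backrest_time)
    print(f"{label}: backrest is {ratio:.2f}x as fast as rsync")
```

A ratio is less ambiguous than a "percent faster" figure, which can be read either as the ratio itself or as the improvement over the baseline.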
Moved Remote code from pg_backrest.pl to Config.pm.
Added version specific code to regression tests and Db.pm.
archive-push checks for duplicate WAL in the archive.
archive-push reads the db sys id to match up WAL to the correct archive.
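A simplified sketch of the two archive-push checks. It assumes archives are looked up by database system id in a plain dict and that a duplicate WAL segment is treated as an error; the real behavior on duplicates may differ. ArchiveError and the function signature are hypothetical.

```python
import os
import shutil


class ArchiveError(Exception):
    """Raised when a WAL segment cannot be safely archived."""


def archive_push(wal_path, db_system_id, archives):
    """Copy a WAL segment into the archive that matches the database.

    archives maps a database system id to an archive directory.
    """
    # Match the WAL to the correct archive using the database system id.
    archive_dir = archives.get(db_system_id)
    if archive_dir is None:
        raise ArchiveError(f"no archive matches database system id {db_system_id}")

    # Refuse to overwrite a segment that is already in the archive.
    segment = os.path.basename(wal_path)
    target = os.path.join(archive_dir, segment)
    if os.path.exists(target):
        raise ArchiveError(f"WAL segment {segment} is already in the archive")

    shutil.copyfile(wal_path, target)
    return target
```

Checking the system id first means WAL from a different cluster is rejected before the duplicate check can give a misleading result.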