Bug Fixes:
* Fixed an issue where a tablespace link that referenced another link would not produce an error, but instead skip the tablespace entirely. (Reported by Michael Vitale.)
* Fixed an issue where options that should not allow multiple values could be specified multiple times in pgbackrest.conf without an error being raised. (Reported by Michael Vitale.)
* Fixed an issue where the protocol-timeout option was not automatically increased when the db-timeout option was increased. (Reported by Todd Vernick.)
Features:
* Backup from a standby cluster. A connection to the primary cluster is still required to start/stop the backup and copy files that are not replicated, but the vast majority of files are copied from the standby in order to reduce load on the master.
* More flexible configuration for databases. Master and standby can both be configured on the backup server and pgBackRest will automatically determine which is the master. This means no configuration changes for backup are required after failing over from a master to standby when a separate backup server is used.
* Exclude directories during backup that are cleaned, recreated, or zeroed by PostgreSQL at startup. These include pgsql_tmp and pg_stat_tmp. The postgresql.auto.conf.tmp file is now excluded in addition to files that were already excluded: backup_label.old, postmaster.opts, postmaster.pid, recovery.conf, recovery.done.
* Experimental support for non-exclusive backups in PostgreSQL 9.6 beta4. Changes to the control/catalog/WAL versions in subsequent betas may break compatibility but pgBackRest will be updated with each release to keep pace.
Refactoring:
* Simplify protocol creation and identifying which host is local/remote.
* Removed all OP_* function constants that were used only for debugging, not in the protocol, and replaced with __PACKAGE__.
* Improvements in Db module: separated out connect() function, allow executeSql() calls that do not return data, and improve error handling.
* Improve error message for links that reference links in manifest build.
* Added hints to error message when relative paths are detected in archive-push or archive-get.
* Improve backup log messages to indicate which host the files are being copied from.
Bug Fixes:
* Fixed an issue where tablespace paths that had $PGDATA as a substring would be identified as subdirectories of $PGDATA even when they were not. Also hardened relative path checking a bit. (Reported by Chris Fort.)
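The substring problem described in this fix can be illustrated with a short sketch. This is a Python illustration rather than pgBackRest's own Perl code; the function names `is_subdir_naive` and `is_subdir` and the example paths are hypothetical.

```python
import posixpath


def is_subdir_naive(pgdata: str, path: str) -> bool:
    """Buggy check: any path containing the $PGDATA text is treated as inside it."""
    return pgdata in path


def is_subdir(pgdata: str, path: str) -> bool:
    """Stricter check: path must sit below pgdata followed by a path separator."""
    pgdata = posixpath.normpath(pgdata)
    path = posixpath.normpath(path)
    return path.startswith(pgdata + "/")


# Hypothetical layout: the tablespace lives beside $PGDATA, not inside it.
pgdata = "/var/lib/pgsql/data"
tablespace = "/var/lib/pgsql/data_ts1"

print(is_subdir_naive(pgdata, tablespace))  # True (false positive)
print(is_subdir(pgdata, tablespace))        # False
print(is_subdir(pgdata, pgdata + "/base"))  # True
```

Normalizing both paths first also collapses `..` components, which is one way relative path tricks could otherwise slip past a prefix comparison.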
Bug Fixes:
* Fixed an issue where an extraneous remote was created, causing threaded backup/restore to possibly time out and/or throw a lock conflict. (Reported by Michael Vitale.)
* Fixed an issue where db-path was not required for the check command so an assert was raised when it was missing rather than a polite error message. (Reported by Michael Vitale.)
* Fixed check command to throw an error when database version/id does not match that of the archive. (Fixed by Cynthia Shang.)
* Fixed an issue where a remote could try to start its own remote when the backup-host option was not present in pgbackrest.conf on the database server. (Reported by Lardière Sébastien.)
* Fixed an issue where the contents of pg_xlog were being backed up if the directory was symlinked. This didn't cause any issues during restore but was a waste of space.
* Fixed an invalid log() call in lock routines.
Features:
* Experimental support for non-exclusive backups in PostgreSQL 9.6 beta3. Changes to the control/catalog/WAL versions in subsequent betas may break compatibility but pgBackRest will be updated with each release to keep pace.
Refactoring:
* Enhancements to the protocol layer for improved reliability and error handling.
* All remote types now take locks. The exceptions date to when the test harness and pgBackRest were running in the same VM and no longer apply.
* Exceptions are now passed back from threads as messages when possible rather than raised directly.
* Temp files created during backup are now placed in the same directory as the target file.
* Output lock file name when a lock cannot be acquired to aid in debugging.
* Reduce calls to protocolGet() in backup/restore.
* Suppress banners on SSH protocol connections.
* Improved remote error messages to identify the host where the error was raised.
Bug Fixes:
* Fixed an issue where keep-alives could be starved out by lots of small files during multi-threaded backup. They were also completely absent from single/multi-threaded backup resume and restore checksumming. (Reported by Janice Parkinson, Chris Barber.)
* Fixed an issue where the expire command would refuse to run when explicitly called from the command line if the db-host option was set. This was not an issue when expire was run automatically after a backup. (Reported by Chris Barber.)
* Fixed an issue where validation was being run on archive_command even when the archive-check option was disabled.
Features:
* Added check command to validate that pgBackRest is configured correctly for archiving and backups. (Contributed by Cynthia Shang.)
* Added the protocol-timeout option. Previously protocol-timeout was set as db-timeout + 30 seconds.
* Failure to shut down remotes at the end of the backup no longer throws an exception. Instead a warning is generated that recommends a higher protocol-timeout.
* Experimental support for non-exclusive backups in PostgreSQL 9.6 beta2. Changes to the control/catalog/WAL versions in subsequent betas may break compatibility but pgBackRest will be updated with each release to keep pace.
Refactoring:
* The pg_xlogfile_name() function is no longer used to construct WAL filenames from LSNs. While this function is convenient it is not available on a standby. Instead, the archive is searched for the LSN in order to find the timeline. If due to some misadventure the LSN appears on multiple timelines then an error will be thrown, whereas before this condition would have passed unnoticed.
* Option handling is now far more strict. Previously it was possible for a command to use an option that was not explicitly assigned to it. This was especially true for the backup-host and db-host options which are used to determine locality.
* Improved handling of users/groups captured during backup that do not exist on the restore host. Also explicitly handle the case where user/group is not mapped to a name.
* Changed version variable to a constant. It had originally been designed to play nice with a specific packaging tool but that tool was never used.
* Fix usage of sprintf() due to new constraints in Perl 5.22. Parameters not referenced in the format string are no longer allowed. (Fixed by Adrian Vondendriesch.)
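The archive search that replaces pg_xlogfile_name() in the notes above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the actual Perl implementation: it assumes 16MB WAL segments, the PostgreSQL >= 9.3 segment naming scheme, and archive file names that begin with the 24-character segment name. The function names and sample file names are hypothetical.

```python
import re
from typing import Iterable

# A WAL segment name is 24 hex digits: timeline (8), log (8), segment (8).
WAL_SEGMENT_RE = re.compile(r"^[0-9A-F]{24}")


def segment_suffix(lsn: str) -> str:
    """Return the log + segment part (16 hex digits) of the file holding lsn."""
    high, low = (int(part, 16) for part in lsn.split("/"))
    # With 16MB segments the segment number is the top byte of the low word.
    return "%08X%08X" % (high, low >> 24)


def find_wal_segment(lsn: str, archive_files: Iterable[str]) -> str:
    """Search archive file names for the segment containing lsn.

    Raises LookupError if the segment is missing or is present on more
    than one timeline.
    """
    suffix = segment_suffix(lsn)
    matches = sorted({
        name[:24] for name in archive_files
        if WAL_SEGMENT_RE.match(name) and name[8:24] == suffix
    })
    if not matches:
        raise LookupError("no WAL segment found for LSN %s" % lsn)
    if len(matches) > 1:
        raise LookupError("LSN %s found on multiple timelines: %s" % (lsn, matches))
    return matches[0]


files = ["000000010000000100000029-aaa.gz", "00000001000000010000002A-bbb.gz"]
print(find_wal_segment("1/2A000060", files))  # 00000001000000010000002A
```

Because the timeline is discovered from the archive rather than asked of the server, the same lookup works when connected to a standby.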
Added an execution cache so that documentation can be generated without setting up the full container environment. This is useful for packaging, keeps the documentation consistent for a release, and speeds up generation when no changes are made in the execution list.
Release notes are now broken into sections so that bugs, features, and refactors are clearly delineated. An "Additional Notes" section has been added for changes to documentation and the test suite that do not affect the core code.
The change log was the last piece of documentation to be rendered in Markdown only. Wrote a converter so the document can be output by the standard renderers. The change log will now be located on the website and has been renamed to "Releases".
* Enhanced text output of `info` command to include timestamps, sizes, and the reference list for all backups. Contributed by Cynthia Shang.
* Allow selective restore of databases from a cluster backup. This feature can result in major space and time savings when only specific databases are restored. Unrestored databases will not be accessible but must be manually dropped before they will be removed from the shared catalogue.
* Experimental support for non-exclusive backups in PostgreSQL 9.6 beta1. Changes to the control/catalog/WAL versions in subsequent betas may break compatibility but pgBackRest will be updated with each release to keep pace.
* Fixed an issue where specifying --no-archive-check would throw a configuration error. Reported by Jason O'Donnell.
* Fixed an issue where a temp WAL file left over after a well-timed system crash could cause the next archive-push to fail.
* Fixed an issue where document generation failed because some OSs are not tolerant of having multiple installed versions of PostgreSQL. A separate VM is now created for each version. Also added a sleep after database starts during document generation to ensure the database is running before the next command runs. Reported by John Harvey.
* The retention-archive option can now be safely set to less than backup retention (retention-full or retention-diff) without also specifying archive-copy=n. The WAL required to make the backups that fall outside of archive retention consistent will be preserved in the archive. However, in this case PITR will not be possible for the backups that fall outside of archive retention.
* When backing up and restoring tablespaces pgBackRest only operates on the subdirectory created for the version of PostgreSQL being run against. Since multiple versions can live in a tablespace (especially during a binary upgrade) this prevents too many files from being copied during a backup and other versions possibly being wiped out during a restore. This only applies to PostgreSQL >= 9.0 -- prior versions of PostgreSQL could not share a tablespace directory.
* Generate an error when archive-check=y but archive_command does not execute pgbackrest. Contributed by Jason O'Donnell.
* Improved error message when repo-path or repo-remote-path does not exist.
* Added checks for --delta and --force restore options to ensure that the destination is a valid $PGDATA directory. pgBackRest will check for the presence of PG_VERSION or backup.manifest (left over from an aborted restore). If neither file is found then --delta and --force will be disabled but the restore will proceed unless there are files in the $PGDATA directory (or any tablespace directories) in which case the operation will be aborted.
* When restore --set=latest (the default) the actual backup restored will be output to the log.
* Support for PostgreSQL 9.5 partial WAL segments and recovery_target_action setting. The archive_mode = 'always' setting is not yet supported.
* Support for recovery_target = 'immediate' recovery setting introduced in PostgreSQL 9.4.
* The following tablespace checks have been added: paths or files in pg_tblspc, relative links in pg_tblspc, tablespaces in $PGDATA. All three will generate errors.
* Fixed an issue where longer-running backups/restores would time out when remote and threaded. Keepalives are now used to make sure the remote for the main process does not time out while the thread remotes do all the work. The error message for timeouts was also improved to make debugging easier.
* Allow restores to be performed on a read-only repository by using --no-lock and --log-level-file=off. The --no-lock option can only be used with restores.
* Minor styling changes, clarifications and rewording in the user guide.
* The dev branch has been renamed to master and for the time being the master branch has been renamed to release, though it will probably be removed at some point -- thus ends the gitflow experiment for pgBackRest. It is recommended that any forks get re-forked and clones get re-cloned.
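The --delta and --force destination checks listed above can be read as a small decision procedure. The following is a minimal Python sketch of that reading, not the actual Perl code; the function name `resolve_delta_force`, its parameters, and the error text are hypothetical.

```python
import os

# Files whose presence marks the destination as a valid $PGDATA directory.
MARKER_FILES = ("PG_VERSION", "backup.manifest")


def resolve_delta_force(pgdata, tablespace_paths, delta, force):
    """Return the effective (delta, force) flags for a restore destination.

    If neither marker file exists, both options are disabled, and the
    restore is aborted when $PGDATA or any tablespace path holds files.
    """
    has_marker = any(
        os.path.isfile(os.path.join(pgdata, name)) for name in MARKER_FILES
    )
    if has_marker:
        return delta, force

    for path in [pgdata, *tablespace_paths]:
        if os.path.isdir(path) and os.listdir(path):
            raise RuntimeError(
                "%s is not empty and does not look like a $PGDATA directory" % path
            )

    # Destination is empty: proceed, but as a plain restore.
    return False, False
```

The point of the check is that --delta and --force remove files from the destination, so they are only honored when the destination is recognizably a PostgreSQL data directory.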
Most tests are working now. What's not working:
1) --target-resume option fails because pause_on_recovery setting was removed. Need to implement the new 9.5 option and make that work with older versions in a consistent way.
2) No tests for the new .partial WAL segments that can be generated on timeline switch.
* Major refactoring of the protocol layer to support this work.
* Fixed protocol issue that was preventing ssh errors (especially connect) from being logged.
* Removed pg_backrest_remote and added the functionality to pg_backrest as remote command.
* Added file and directory syncs to the File object for additional safety during backup/restore and archiving. Suggested by Andres Freund.
* Support for Perl 5.10.1 and OpenSSH 5.3 which are default for CentOS/RHEL 6. Found by Eric Radman.
* Improved error message when backup is run without archive_command set and without --no-archive-check specified. Found by Eric Radman.
* Moved version number out of the VERSION file to Version.pm to better support packaging. Suggested by Michael Renner.
* Replaced IPC::System::Simple and Net::OpenSSH with IPC::Open3 to eliminate CPAN dependency for multiple distros. open3 will also be used for local processes so it makes sense to switch now.
Also stopped replacing FORMAT number which explains the large number of test log changes. FORMAT should change very rarely and cause test log failures when it does.
* IMPORTANT NOTE: This flag day release breaks compatibility with older versions of PgBackRest. The manifest format, on-disk structure, and the binary names have all changed. You must create a new repository to hold backups for this version of PgBackRest and keep your older repository for a time in case you need to do a restore. The `pg_backrest.conf` file has not changed but you'll need to change any references to `pg_backrest.pl` in cron (or elsewhere) to `pg_backrest` (without the `.pl` extension).
* Add info command.
* More efficient file ordering for backup. Files are copied in descending size order so a single thread does not end up copying a large file at the end. This had already been implemented for restore.
* Logging now uses unbuffered output. This should make log files that are being written by multiple threads less chaotic. Suggested by Michael Renner.
* Experimental support for PostgreSQL 9.5. This may break when the control version or WAL magic changes but will be updated in each release.
* Includes updating the manifest to format 4. It turns out the manifest and .info files were not very good for providing information. A format update was required anyway so worked through the backlog of changes that would require a format change.
* Multiple database versions are now supported in the archive. Doesn't actually work yet but the structure should be good.
* Tests use more constants now that test logs can catch name regressions.
* Fixed an issue where archive-copy would fail on an incr/diff backup when hardlink=n. In this case the pg_xlog path does not already exist and must be created. Reported by Michael Renner.
* Allow duplicate WAL segments to be archived when the checksum matches. This is necessary for some recovery scenarios.
* Allow comments/disabling in pg_backrest.conf using #. Suggested by Michael Renner.
* Better logging before pg_start_backup() to make it clear when the backup is waiting on a checkpoint. Suggested by Michael Renner.
* Various command behavior, help and logging fixes. Reported by Michael Renner.
* Fixed an issue in async archiving where archive-push was not properly returning 0 when archive-max-mb was reached and moved the async check after transfer to avoid having to remove the stop file twice. Also added unit tests for this case and improved error messages to make it clearer to the user what went wrong. Reported by Michael Renner.
* Fixed a locking issue that could allow multiple operations of the same type against a single stanza. This appeared to be benign in terms of data integrity but caused spurious errors while archiving and could lead to errors in backup/restore. Reported by Michael Renner.
* Replaced JSON module with JSON::PP which ships with core Perl.
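The benefit of copying files in descending size order, mentioned in the list above, can be shown with a small simulation. This is a hypothetical Python sketch, not pgBackRest code: it assumes copy time is proportional to file size and that each idle thread simply takes the next file in the queue. The function name `makespan` and the sample sizes are made up.

```python
import heapq


def makespan(sizes, threads):
    """Time until the last thread finishes when idle threads take files in order."""
    finish = [0] * threads          # current finish time of each thread
    heapq.heapify(finish)
    for size in sizes:
        earliest = heapq.heappop(finish)      # next thread to become idle
        heapq.heappush(finish, earliest + size)
    return max(finish)


sizes = [1, 1, 1, 1, 1, 1, 10]

# Ascending order: the large file starts last and one thread runs alone.
print(makespan(sorted(sizes), 2))                # 13

# Descending order: the large file starts first and overlaps the small ones.
print(makespan(sorted(sizes, reverse=True), 2))  # 10
```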
ASSERTs still dump stack traces to the console and file in all cases. ERRORs only dump stack traces to the file when the file log level is DEBUG or TRACE.
* Better resume support. Resumed files are checked to be sure they have not been modified and the manifest is saved more often to preserve checksums as the backup progresses. More unit tests to verify each resume case.
* Resume is now optional. Use the `resume` setting or `--no-resume` from the command line to disable.
* More info messages during restore. Previously, most of the restore messages were debug level so not a lot was output in the log.
* Fixed an issue where an absolute path was not written into recovery.conf when the restore was run with a relative path.
* Added `tablespace` setting to allow tablespaces to be restored into the `pg_tblspc` path. This produces compact restores that are convenient for development, staging, etc. Currently these restores cannot be backed up as PgBackRest expects only links in the `pg_tblspc` path.
1) Re-checksums files that have checksums in the manifest
2) Recopies files that do not have a checksum
3) Saves the manifest at regular intervals to preserve checksums
4) Unit tests for all cases (that I can think of)
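The first two resume steps above can be sketched as follows. This is a hypothetical Python illustration, not the actual Perl code: the manifest is modeled as a plain dict of file name to SHA-1 checksum (or `None`), and it assumes that a file whose checksum no longer matches is recopied along with files that have no checksum.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def plan_resume(manifest, partial_files):
    """Split manifest entries into files to keep and files to recopy.

    manifest:      file name -> checksum recorded before the abort, or None
    partial_files: file name -> bytes already present in the aborted backup
    """
    keep, recopy = [], []
    for name in sorted(manifest):
        checksum = manifest[name]
        data = partial_files.get(name)
        if checksum is not None and data is not None and sha1_hex(data) == checksum:
            keep.append(name)      # step 1: re-checksummed and still valid
        else:
            recopy.append(name)    # step 2: no checksum, missing, or modified
    return keep, recopy


manifest = {"a": sha1_hex(b"aaa"), "b": None, "c": sha1_hex(b"ccc")}
partial = {"a": b"aaa", "b": b"bbb", "c": b"changed"}
print(plan_resume(manifest, partial))  # (['a'], ['b', 'c'])
```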
* Fixed a buffering error that could occur on large, highly-compressible files when copying to an uncompressed remote destination. The error was detected in the decompression code and resulted in a failed backup rather than corruption so it should not affect successful backups made with previous versions.
* Pushing duplicate WAL now generates an error. This worked before only if checksums were disabled.
* Database System IDs are used to make sure that all WAL in an archive matches up. This should help prevent misconfigurations that send WAL from multiple clusters to the same archive.
* Regression tests working back to PostgreSQL 8.3.
* Improved threading model by starting threads early and terminating them late.
* Added restore functionality.
* All options can now be set on the command-line making pg_backrest.conf optional.
* De/compression is now performed without threads and checksum/size is calculated in stream. That means file checksums are no longer optional.
* Added option `--no-start-stop` to allow backups when Postgres is shut down. If `postmaster.pid` is present then `--force` is required to make the backup run (though if Postgres is running an inconsistent backup will likely be created). This option was added primarily for the purpose of unit testing, but there may be applications in the real world as well.
* Fixed broken checksums and now they work with normal and resumed backups. Finally realized that checksums and checksum deltas should be functionally separated and this simplified a number of things. Issue #28 has been created for checksum deltas.
* Fixed an issue where a backup could be resumed from an aborted backup that didn't have the same type and prior backup.
* Removed dependency on Moose. It wasn't being used extensively and makes for longer startup times.
* Checksum for backup.manifest to detect corrupted/modified manifest.
* Link `latest` always points to the last backup. This has been added for convenience and to make restores simpler.
* More comprehensive unit tests in all areas.
* Complete rewrite of BackRest::File module to use a custom protocol for remote operations and Perl native GZIP and SHA operations. Compression is performed in threads rather than forked processes.
* Fairly comprehensive unit tests for all the basic operations. More work to be done here for sure, but then there is always more work to be done on unit tests.
* Removed dependency on Storable and replaced with a custom ini file implementation.
* Added much needed documentation (see INSTALL.md).
* Numerous other changes that can only be identified with a diff.
* Working on improving error handling in the file object. This is not complete, but works well enough to find a few errors that have been causing us problems (notably, find is occasionally failing building the archive async manifest when system is under load).
* Found and squashed a nasty bug where file_copy was defaulted to ignore errors. There was also an issue in file_exists that was causing the test to fail when the file actually did exist. Together they could have resulted in a corrupt backup with no errors, though it is very unlikely.
* The archive-get function returns a 1 when the archive file is missing to differentiate from hard errors (ssh connection failure, file copy error, etc.). This lets Postgres know that the archive stream has terminated normally. However, this does not take into account possible holes in the archive stream.
* If an archive directory which should be empty could not be deleted backrest was throwing an error. There's a good fix for that coming, but for the time being it has been changed to a warning so processing can continue. This was impacting backups as sometimes the final archive file would not get pushed if the first archive file had been in a different directory (plus some bad luck).
Tweaking a few settings after running backups for about a month.
* Removed master_stderr_discard option on database SSH connections. There have been occasional lockups and they could be related to issues originally seen in the file code.
* Changed lock file conflicts on backup and expire commands to ERROR. They were set to DEBUG due to a copy-and-paste from the archive locks.
This version has been put into production at Resonate, so it does work, but there are a number of major caveats.
* No restore functionality, but the backup directories are consistent Postgres data directories. You'll need to either uncompress the files or turn off compression in the backup. Uncompressed backups on a ZFS (or similar) filesystem are a good option because backups can be restored locally via a snapshot to create logical backups or do spot data recovery.
* Archiving is single-threaded. This has not posed an issue on our multi-terabyte databases with heavy write volume. Recommend a large WAL volume or to use the async option with a large volume nearby.
* Backups are multi-threaded, but the Net::OpenSSH library does not appear to be 100% threadsafe so it will very occasionally lock up on a thread. There is an overall process timeout that resolves this issue by killing the process. Yes, very ugly.
* Checksums are lost on any resumed backup. Only the final backup will record checksum on multiple resumes. Checksums from previous backups are correctly recorded and a full backup will reset everything.
* The backup.manifest is being written as Storable because Config::IniFile does not seem to handle large files well. Would definitely like to save these as human-readable text.
* Absolutely no documentation (outside the code). Well, excepting these release notes.
* Lots of other little things and not so little things. Much refactoring to follow.