Bug Fixes:
* Fixed missing variable replacements.
* Removed hard-coded host names from configuration file paths.
Documentation Features:
* Allow command-line length to be configured using cmd-line-len param.
* Added compact param to allow CSS to be embedded in HTML file.
* Added pretty param to produce HTML with proper indenting.
* Only generate HTML menu when required and don't require index page.
* Assign numbers to sections by default.
* VM mount points are now optional.
This meant that the queue would never be cleared without manual intervention (such as calling archive-push directly). PostgreSQL now receives errors when there is not enough space to store new WAL segments but the async process will still be started so that space is eventually freed.
Reported by Jens Wilke.
Controls whether console log messages are sent to stderr or stdout. By default this is set to warn which represents a change in behavior from previous versions, even though it may be more intuitive. Setting log-level-stderr=off will preserve the old behavior.
Suggested by Sascha Biberhofer.
* Fixed error message to properly display the archive command when an invalid archive command is detected.
* Check that archive_mode is enabled when archive-check option enabled.
* pgBackRest version number included in command start INFO log output.
* Process ID logged for local process start/stop INFO log output.
* Fixed missing expect output for help module.
* Fixed an issue where local processes were not disconnecting when complete and could later timeout. (Reported by Todd Vernick.)
* Fixed an issue where the protocol layer could timeout while waiting for WAL segments to arrive in the archive. (Reported by Todd Vernick.)
* Fixed an issue where retention-archive was not automatically being set when retention-archive-type=diff, resulting in a less aggressive than intended expiration of archive.
* Additional warnings when archive retention settings may not have the intended effect or would allow indefinite retention.
* Closed#235: "Retention policy question" by adding documentation for archive retention.
Contributed by Cynthia Shang.
A connection to the primary cluster is still required to start/stop the backup and copy files that are not replicated, but the vast majority of files are copied from the standby in order to reduce load on the master.
Master and standby can both be configured on the backup server and pgBackRest will automatically determine which is the master. This means no configuration changes for backup are required after failing over from a master to standby when a separate backup server is used.
These include (depending on the version where they were introduced): pgsql_tmp, pg_dynshmem, pg_notify, pg_replslot, pg_serial, pg_snapshots, pg_stat_tmp, pg_subtrans. The postgresql.auto.conf.tmp file is now excluded in addition to files that were already excluded: backup_label.old, postmaster.opts, postmaster.pid, recovery.conf, recovery.done.
* Tablespace paths that had $PGDATA as a substring would be identified as a subdirectories of $PGDATA even when they were not.
* Also hardened relative path checking a bit.
This is a better approach than 93320b8 (reverted in this commit) because it ensures that the remote type will be none so any functions that utilize optionRemoteTypeTest will work correctly.
This bug was only an issue when backup-host was not properly configured on the database host.
Improved handling of users/groups captured during backup that do not exist on the restore host. Also explicitly handle the case where user/group is not mapped to a name.
This was worked out as part of the test suite refactor [c8f806a] but not committed with it because of the large number of expect logs changes involved. Keeping them separate made it easier to audit the changes in the refactor.
* Make the code more modular and object-oriented.
* Multiple Docker containers can now be created for a single test to simulate more realistic environments.
The pg_xlogfile_name() function is no longer used to construct WAL filenames from LSNs. While this function is convenient it is not available on a standby. Instead, the archive is searched for the LSN in order to find the timeline. If due to some misadventure the LSN appears on multiple timelines then an error will be thrown, whereas before this condition would have passed unnoticed.
* Fixed an issue where keep-alives could be starved out by lots of small files during multi-threaded operation and were completely absent during single-threaded operation when resuming from a previous incomplete backup.
Reported by Janice Parkinson.
* Added the protocol-timeout option. Previously protocol-timeout was set as db-timeout + 30 seconds.
* Failure to shutdown remotes at the end of the backup no longer throws an exception. A warning is still generated that recommends a higher protocol-timeout.
* Fixed an issue where the expire command would refuse to run when explicitly called from the command line if the db-host option was set. This was not an issue when expire was run after a backup, which is the usual case.
* Option handling is now far more strict. Previously it was possible for a command to use an option that was not explicitly assigned to it. This was especially true for the backup-host and db-host options which are used to determine locality.
Reported by Chris Barber.
* Containers now use a squid proxy for apt/yum to speed builds.
* Obsolete containers are removed by the <br-option>--vm-force</br-option> option.
* Greatly reduced the quantity of Docker containers built by default. Containers are only built for PostgreSQL versions specified in db-minimal and those required to build documentation. Additional containers can be built with --db-version=all or by specifying a version, e.g. --db-version=9.4.
Added an execution cache so that documentation can be generated without setting up the full container environment. This is useful for packaging, keeps the documentation consistent for a release, and speeds up generation when no changes are made in the execution list.
* This will help catch Perl errors in the doc code since it is not run across multiple OSs like the core and test code.
* It is to be hoped that a newer kernel will make Docker more stable.
Release notes are now broken into sections so that bugs, features, and refactors are clearly delineated. An "Additional Notes" section has been added for changes to documentation and the test suite that do not affect the core code.
The change log was the last piece of documentation to be rendered in Markdown only. Wrote a converter so the document can be output by the standard renderers. The change log will now be located on the website and has been renamed to "Releases".
Added database version constants and changed version identification code to use hash tables instead of if-else. Propagated the db version constants to the rest of the code and in passing fixed some path/filename constants.
Added new regression tests to check that specific files are never copied.
This feature can result in major space and time savings when only specific databases are restored. Unrestored databases will not be accessible but must be manually dropped before they will be removed from the shared catalogue.
This change assigns each version of PostgreSQL to a specific OS version for testing to minimize the number of tests being run. In general, older versions of PostgreSQL are assigned to older OS versions.
The old behavior can be enabled with `--db-version=all`.
This change allows for easier testing since all files are local on the host VM and can be easily accessed without using `docker exec`. In addition, this change is required to allow multiple Docker containers per test case which is coming soon.
* All files and directories linked from PGDATA are now included in the backup. By default links will be restored directly into PGDATA as files or directories. The --link-all option can be used to restore all links to their original locations. The --link-map option can be used to remap a link to a new location.
* Removed --tablespace option and replaced with --tablespace-map-all option which should more clearly indicate its function.
* Added detail log level which will output more information than info without being as verbose as debug.
* The repo-path option now always refers to the repository where backups and archive are stored, whether local or remote, so the repo-remote-path option has been removed. The new spool-path option can be used to define a location for queueing WAL segments when archiving asynchronously. Otherwise, a local repository is no longer required.
* Implemented a new config format which should be far simpler to use. See the User Guide and Configuration Reference for details but for a simple configuration all options can now be placed in the stanza section. Options that are shared between stanzas can be placed in the [global] section. More complex configurations can still make use of command sections though this should be a rare use case.
* The default configuration filename is now pgbackrest.conf instead of pg_backrest.conf. This was done for consistency with other naming changes but also to prevent old config files from being loaded accidentally.
* The default repository name was changed from /var/lib/backup to /var/lib/pgbackrest.
* Lock files are now stored in /tmp/pgbackrest by default. These days /run/pgbackrest would be the preferred location but that would require init scripts which are not part of this release. The lock-path option can be used to configure the lock directory.
* Log files are now stored in /var/log/pgbackrest by default and no longer have the date appended so they can be managed with logrotate. The log-path option can be used to configure the lock directory.
* Executable filename changed from pg_backrest to pgbackrest.
This makes make the migrated file functions available to parts of the code that don't have access to a File object. They still exist as wrappers in the File object to support remote calls.
* Specific VMs can now be built by using --vm along with --vm-build.
* Docker caching can be disabled with --vm-force.
* ControlMaster is now used for al VMs to improve test speed.
Fixed an issue where the master process was passing --repo-remote-path instead of --repo-path to the remote and causing the lock files to be created in the default repository directory (/var/lib/backup), generally ending in failure. This was only an issue when --repo-remote-path was defined on the command line rather than in pg_backrest.conf.
Perl Critic added and passes on gentle. A policy file has been created with some permanent exceptions and a list of policies to be fixed in approximately the order they should be fixed in.
Added checks for `--delta` and `--force` restore options to ensure that the destination is a valid $PGDATA directory. pgBackRest will check for the presence of `PG_VERSION` or `backup.manifest` (left over from an aborted restore). If neither is found then `--delta` and `--force` will be disabled but the restore will proceed unless there are files in the $PGDATA directory (or any tablespace directories) in which case the operation will be aborted.
When backing up and restoring tablespaces pgBackRest only operates on the subdirectory created for the version of PostgreSQL being run against. Since multiple versions can live in a tablespace (especially during a binary upgrade) this prevents too many files from being copied during a backup and other versions possibly being wiped out during a `--delta` restore. This only applies to PostgreSQL >= 9.0 -- before that only one PostgreSQL version could use a tablespace.
Fixed an issue where document generation failed because some OSs are not tolerant of having multiple installed versions of PostgreSQL. A separate VM is now created for each version. Also added a sleep after database starts during document generation to ensure the database is running before the next command runs.
Reported by John Harvey.
1) Tests for all operating systems can now be run with a single command.
2) Tests can be run in parallel with --process-max.
3) Container generation now integrated into test.pl
4) Some basic test documentation.
Keepalives are now used to make sure the remote for the main process does not timeout while the thread remotes do all the work. The error messages for timeouts was also improved to make debugging easier.
1) Started on a general markdown renderer
2) Internal links now work in PDF
3) Improvements to PDF styling
4) Some comment and formatting fixes
5) User guide edits.
* Better messaging for expiration.
* Fixed already stopped message.
* retention-archive and retention-archive-type now use retention-full and 'full' when not specified.
* Fixed issue where backup-user was required (should default to backrest).
* ExecuteTest now supports retries.
* Fixed issue where log test was not comparing test logs.
* Fixed issue where test logs would not match for ssh connection errors
Unable to reproduce this anymore. It seems to have been fixed with the last round of config changes. Add regression tests to make sure it doesn't happen again.
Most tests are working now. What's not working:
1) --target-resume option fails because pause_on_recovery setting was removed. Need to implement to the new 9.5 option and make that work with older versions in a consistent way.
2) No tests for the new .partial WAL segments that can be generated on timeline switch.
* Major refactoring of the protocol layer to support this work.
* Fixed protocol issue that was preventing ssh errors (especially connect) from being logged.
Replaced IPC::System::Simple and Net::OpenSSH with IPC::Open3 to eliminate CPAN dependency for multiple distros. Using open3 will also be used for local processes so it make sense to switch now.
Also stopped replacing FORMAT number which explains the large number of test log changes. FORMAT should change very rarely and cause test log failures when it does.
More separation of the protocol and remote layers than was done in issue #106.
Settings are passed to the remote via command-line parameters rather than in the protocol.
* Includes updating the manifest to format 4. It turns out the manifest and .info files were not very good for providing information. A format update was required anyway so worked through the backlog of changes that would require a format change.
* Multiple database versions are now supported in the archive. Does't actually work yet but the structure should be good.
* Tests use more constants now that test logs can catch name regressions.
1) Re-checksums files that have checksums in the manifest
2) Recopies files that do not have a checksum
3) Saves the manifest at regular intervals to preserve checksums
4) Unit tests for all cases (that I can think of)
* Fixed a buffering error that could occur on large, highly-compressible files when copying to an uncompressed remote destination. The error was detected in the decompression code and resulted in a failed backup rather than corruption so it should not affect successful backups made with previous versions.
* Pushing duplicate WAL now generates an error. This worked before only if checksums were disabled.
* Database System IDs are used to make sure that all WAL in an archive matches up. This should help prevent misconfigurations that send WAL from multiple clusters to the same archive.
* Regression tests working back to PostgreSQL 8.3.
* Improved threading model by starting threads early and terminating them late.
* Added restore functionality.
* All options can now be set on the command-line making pg_backrest.conf optional.
* De/compression is now performed without threads and checksum/size is calculated in stream. That means file checksums are no longer optional.
* Added option `--no-start-stop` to allow backups when Postgres is shut down. If `postmaster.pid` is present then `--force` is required to make the backup run (though if Postgres is running an inconsistent backup will likely be created). This option was added primarily for the purpose of unit testing, but there may be applications in the real world as well.
* Fixed broken checksums and now they work with normal and resumed backups. Finally realized that checksums and checksum deltas should be functionally separated and this simplied a number of things. Issue #28 has been created for checksum deltas.
* Fixed an issue where a backup could be resumed from an aborted backup that didn't have the same type and prior backup.
* Removed dependency on Moose. It wasn't being used extensively and makes for longer startup times.
* Checksum for backup.manifest to detect corrupted/modified manifest.
* Link `latest` always points to the last backup. This has been added for convenience and to make restores simpler.
* More comprehensive unit tests in all areas.
* Complete rewrite of BackRest::File module to use a custom protocol for remote operations and Perl native GZIP and SHA operations. Compression is performed in threads rather than forked processes.
* Fairly comprehensive unit tests for all the basic operations. More work to be done here for sure, but then there is always more work to be done on unit tests.
* Removed dependency on Storable and replaced with a custom ini file implementation.
* Added much needed documentation (see INSTALL.md).
* Numerous other changes that can only be identified with a diff.
This version has been put into production at Resonate, so it does work, but there are a number of major caveats.
* No restore functionality, but the backup directories are consistent Postgres data directories. You'll need to either uncompress the files or turn off compression in the backup. Uncompressed backups on a ZFS (or similar) filesystem are a good option because backups can be restored locally via a snapshot to create logical backups or do spot data recovery.
* Archiving is single-threaded. This has not posed an issue on our multi-terabyte databases with heavy write volume. Recommend a large WAL volume or to use the async option with a large volume nearby.
* Backups are multi-threaded, but the Net::OpenSSH library does not appear to be 100% threadsafe so it will very occasionally lock up on a thread. There is an overall process timeout that resolves this issue by killing the process. Yes, very ugly.
* Checksums are lost on any resumed backup. Only the final backup will record checksum on multiple resumes. Checksums from previous backups are correctly recorded and a full backup will reset everything.
* The backup.manifest is being written as Storable because Config::IniFile does not seem to handle large files well. Would definitely like to save these as human-readable text.
* Absolutely no documentation (outside the code). Well, excepting these release notes.
* Lots of other little things and not so little things. Much refactoring to follow.