diff --git a/CHANGELOG.md b/CHANGELOG.md new file mode 100644 index 000000000..c5902dfb5 --- /dev/null +++ b/CHANGELOG.md @@ -0,0 +1,198 @@ +# pgBackRest - Change Log + +## v0.80: DALLAS MILESTONE - UNDER DEVELOPMENT +__No Release Date Set__ + +* Fixed an issue that caused the formatted timestamp for both the oldest and newest backups to be reported as the current time by the `info` command. Only `text` output was affected -- `json` output reported the correct epoch values. _Reported by Michael Renner_. + +* Fixed protocol issue that was preventing ssh errors (especially on connection) from being logged. + +* Now using Perl DBI for connections to PostgreSQL rather than psql. The `cmd-psql` and `cmd-psql-option` settings have been removed and replaced with `db-port` and `db-socket-path`. + +* Remove `pg_control` file at the beginning of the restore and copy it back at the very end. This prevents the possibility that a partial restore can be started by PostgreSQL. + +* The repository is now created and updated with consistent directory and file modes. By default `umask` is set to `0000` but this can be disabled with the `neutral-umask` setting. + +* Added checks to be sure the `db-path` setting is consistent with `db-port` by comparing the `data_directory` as reported by the cluster against the `db-path` setting and the version as reported by the cluster against the value read from `pg_control`. The `db-socket-path` setting is checked to be sure it is an absolute path. + +* Experimental support for PostgreSQL 9.5 alpha1. This may break when the control version or WAL magic changes in future versions but will be updated in each pgBackRest release to keep pace. All regression tests pass except for `--target-resume` tests (this functionality has changed in 9.5) and there is no testing yet for `.partial` WAL segments. + +* Major refactoring of the protocol layer to support future development. + +* Added vagrant test configurations for Ubuntu 14.04 and CentOS 7. 
+ +* Split most of `README.md` out into `USERGUIDE.md` and `CHANGELOG.md` because it was becoming unwieldy. + +## v0.78: Remove CPAN Dependencies, Stability Improvements +__Released July 13, 2015__ + +* Removed dependency on CPAN packages for multi-threaded operation. While it might not be a bad idea to update the `threads` and `Thread::Queue` packages, it is no longer necessary. + +* Added vagrant test configurations for Ubuntu 12.04 and CentOS 6. + +* Modified wait backoff to use a Fibonacci rather than geometric sequence. This will make wait time grow less aggressively while still giving reasonable values. + +* More options for regression tests and improved code to run in a variety of environments. + +## v0.77: CentOS/RHEL 6 Support and Protocol Improvements +__Released June 30, 2015__ + +* Removed `pg_backrest_remote` and added the functionality to `pg_backrest` as the `remote` command. + +* Added file and directory syncs to the `File` object for additional safety during backup/restore and archiving. _Suggested by Andres Freund_. + +* Support for Perl 5.10.1 and OpenSSH 5.3 which are default for CentOS/RHEL 6. _Reported by Eric Radman._ + +* Improved error message when backup is run without `archive_command` set and without `--no-archive-check` specified. _Reported by Eric Radman_. + +* Moved version number out of the `VERSION` file to `Version.pm` to better support packaging. _Suggested by Michael Renner_. + +* Replaced `IPC::System::Simple` and `Net::OpenSSH` with `IPC::Open3` to eliminate CPAN dependency for multiple operating systems. + +## v0.75: New Repository Format, Info Command and Experimental 9.5 Support +__Released June 14, 2015__ + +* **IMPORTANT NOTE**: This flag day release breaks compatibility with older versions of pgBackRest. The manifest format, on-disk structure, and the binary names have all changed. 
You must create a new repository to hold backups for this version of pgBackRest and keep your older repository for a time in case you need to do a restore. The `pg_backrest.conf` file has not changed but you'll need to change any references to `pg_backrest.pl` in cron (or elsewhere) to `pg_backrest` (without the `.pl` extension). + +* Add `info` command. + +* More efficient file ordering for `backup`. Files are copied in descending size order so a single thread does not end up copying a large file at the end. This had already been implemented for `restore`. + +* Logging now uses unbuffered output. This should make log files that are being written by multiple threads less chaotic. _Suggested by Michael Renner_. + +* Experimental support for PostgreSQL 9.5. This may break when the control version or WAL magic changes but will be updated in each release. + +## v0.70: Stability Improvements for Archiving, Improved Logging and Help +__Released June 1, 2015__ + +* Fixed an issue where `archive-copy` would fail on an incr/diff backup when `hardlink=n`. In this case the `pg_xlog` path does not already exist and must be created. _Reported by Michael Renner_. + +* Allow duplicate WAL segments to be archived when the checksum matches. This is necessary for some recovery scenarios. + +* Allow comments/disabling in `pg_backrest.conf` using the `#` character. Only a `#` in the first character position of a line is honored. _Suggested by Michael Renner_. + +* Better logging before `pg_start_backup()` to make it clear when the backup is waiting on a checkpoint. _Suggested by Michael Renner_. + +* Various command behavior, help and logging fixes. _Reported by Michael Renner_. + +* Fixed an issue in async archiving where `archive-push` was not properly returning 0 when `archive-max-mb` was reached and moved the async check after transfer to avoid having to remove the stop file twice. 
Also added unit tests for this case and improved error messages to make it clearer to the user what went wrong. _Reported by Michael Renner_. + +* Fixed a locking issue that could allow multiple operations of the same type against a single stanza. This appeared to be benign in terms of data integrity but caused spurious errors while archiving and could lead to errors in backup/restore. _Reported by Michael Renner_. + +* Replaced `JSON` module with `JSON::PP` which ships with core Perl. + +## v0.65: Improved Resume and Restore Logging, Compact Restores +__Released May 11, 2015__ + +* Better resume support. Resumed files are checked to be sure they have not been modified and the manifest is saved more often to preserve checksums as the backup progresses. More unit tests to verify each resume case. + +* Resume is now optional. Use the `resume` setting or `--no-resume` from the command line to disable. + +* More info messages during restore. Previously, most of the restore messages were debug level so not a lot was output in the log. + +* Fixed an issue where an absolute path was not written into recovery.conf when the restore was run with a relative path. + +* Added `tablespace` setting to allow tablespaces to be restored into the `pg_tblspc` path. This produces compact restores that are convenient for development, staging, etc. Currently these restores cannot be backed up as pgBackRest expects only links in the `pg_tblspc` path. + +## v0.61: Bug Fix for Uncompressed Remote Destination +__Released April 21, 2015__ + +* Fixed a buffering error that could occur on large, highly-compressible files when copying to an uncompressed remote destination. The error was detected in the decompression code and resulted in a failed backup rather than corruption so it should not affect successful backups made with previous versions. + +## v0.60: Better Version Support and WAL Improvements +__Released April 19, 2015__ + +* Pushing duplicate WAL now generates an error. 
This worked before only if checksums were disabled. + +* Database System IDs are used to make sure that all WAL in an archive matches up. This should help prevent misconfigurations that send WAL from multiple clusters to the same archive. + +* Regression tests working back to PostgreSQL 8.3. + +* Improved threading model by starting threads early and terminating them late. + +## v0.50: Restore and Much More +__Released March 25, 2015__ + +* Added restore functionality. + +* All options can now be set on the command-line making `pg_backrest.conf` optional. + +* De/compression is now performed without threads and checksum/size is calculated in stream. That means file checksums are no longer optional. + +* Added option `--no-start-stop` to allow backups when Postgres is shut down. If `postmaster.pid` is present then `--force` is required to make the backup run (though if Postgres is running an inconsistent backup will likely be created). This option was added primarily for the purpose of unit testing, but there may be applications in the real world as well. + +* Fixed broken checksums and now they work with normal and resumed backups. Finally realized that checksums and checksum deltas should be functionally separated and this simplified a number of things. Issue #28 has been created for checksum deltas. + +* Fixed an issue where a backup could be resumed from an aborted backup that didn't have the same type and prior backup. + +* Removed dependency on `Moose`. It wasn't being used extensively and makes for longer startup times. + +* Checksum for `backup.manifest` to detect a corrupted/modified manifest. + +* Link `latest` always points to the last backup. This has been added for convenience and to make restores simpler. + +* More comprehensive unit tests in all areas. 
+ +## v0.30: Core Restructuring and Unit Tests +__Released October 5, 2014__ + +* Complete rewrite of `BackRest::File` module to use a custom protocol for remote operations and Perl native GZIP and SHA operations. Compression is performed in threads rather than forked processes. + +* Fairly comprehensive unit tests for all the basic operations. More work to be done here for sure, but then there is always more work to be done on unit tests. + +* Removed dependency on `Storable` and replaced with a custom ini file implementation. + +* Added much needed documentation. + +* Numerous other changes that can only be identified with a diff. + +## v0.19: Improved Error Reporting/Handling +__Released May 13, 2014__ + +* Working on improving error handling in the `File` object. This is not complete, but works well enough to find a few errors that have been causing us problems (notably, find is occasionally failing building the archive async manifest when system is under load). + +* Found and squashed a nasty bug where `file_copy()` was defaulted to ignore errors. There was also an issue in `file_exists()` that was causing the test to fail when the file actually did exist. Together they could have resulted in a corrupt backup with no errors, though it is very unlikely. + +## v0.18: Return Soft Error When Archive Missing +__Released April 13, 2014__ + +* The `archive-get` command returns a 1 when the archive file is missing to differentiate from hard errors (ssh connection failure, file copy error, etc.). This lets PostgreSQL know that the archive stream has terminated normally. However, this does not take into account possible holes in the archive stream. + +## v0.17: Warn When Archive Directories Cannot Be Deleted +__Released April 3, 2014__ + +* If an archive directory which should be empty could not be deleted, backrest was throwing an error. There's a good fix for that coming, but for the time being it has been changed to a warning so processing can continue. 
This was impacting backups as sometimes the final archive file would not get pushed if the first archive file had been in a different directory (plus some bad luck). + +## v0.16: RequestTTY=yes for SSH Sessions +__Released April 1, 2014__ + +* Added `RequestTTY=yes` to ssh sessions. Hoping this will prevent random lockups. + +## v0.15: Added archive-get +__Released March 29, 2014__ + +* Added `archive-get` functionality to aid in restores. + +* Added option to force a checkpoint when starting the backup, `start-fast=y`. + +## v0.11: Minor Fixes +__Released March 26, 2014__ + +* Removed `master_stderr_discard` option on database SSH connections. There have been occasional lockups and they could be related to issues originally seen in the file code. + +* Changed lock file conflicts on `backup` and `expire` commands to `ERROR`. They were set to `DEBUG` due to a copy-and-paste from the archive locks. + +## v0.10: Backup and Archiving are Functional +__Released March 5, 2014__ + +* No restore functionality, but the backup directories are consistent PostgreSQL data directories. You'll need to either uncompress the files or turn off compression in the backup. Uncompressed backups on a ZFS (or similar) filesystem are a good option because backups can be restored locally via a snapshot to create logical backups or do spot data recovery. + +* Archiving is single-threaded. This has not posed an issue on our multi-terabyte databases with heavy write volume. Recommend a large WAL volume or to use the async option with a large volume nearby. + +* Backups are multi-threaded, but the `Net::OpenSSH` library does not appear to be 100% thread-safe so it will very occasionally lock up on a thread. There is an overall process timeout that resolves this issue by killing the process. Yes, very ugly. + +* Checksums are lost on any resumed backup. Only the final backup will record checksum on multiple resumes. 
Checksums from previous backups are correctly recorded and a full backup will reset everything. + +* The `backup.manifest` is being written as `Storable` because `Config::IniFile` does not seem to handle large files well. Would definitely like to save these as human-readable text. + +* Absolutely no documentation (outside the code). Well, excepting these release notes. diff --git a/README.md b/README.md index 8ef965002..cf0f657a5 100644 --- a/README.md +++ b/README.md @@ -24,940 +24,40 @@ Instead of relying on traditional backup tools like tar and rsync, pgBackRest im pgBackRest uses the gitflow model of development. This means that the master branch contains only the release history, i.e. each commit represents a single release and release tags are always from the master branch. The dev branch contains a single commit for each feature or fix and more accurately depicts the development history. Actual development is done on feature (dev_*) branches and squashed into dev after regression tests have passed. In this model dev is considered stable and can be released at any time. As such, the dev branch does not have any special version modifiers. -## Install +## Getting Started -pgBackRest is written entirely in Perl. Some additional modules will need to be installed depending on the OS. +pgBackRest strives to be easy to configure and operate: -### Ubuntu 12.04/14.04 Setup +* [Installation instructions](USERGUIDE.md#installation) for major operating systems. -* Install required Perl modules: -``` -apt-get install libdbd-pg-perl -``` +* [Sample configurations](USERGUIDE.md#examples) that cover most basic use cases. -### CentOS 6 Setup +* [Command guide](USERGUIDE.md#commands) for command-line operations. -* Install Perl and required modules: -``` -yum install perl perl-Time-HiRes perl-IO-String perl-parent perl-JSON perl-Digest-SHA perl-DBD-Pg -``` +* [Settings documentation](USERGUIDE.md#setttings) for creating complex configurations and more detail on options. 
-### Software Installation +## Contributing -pgBackRest can be installed by downloading the most recent release: +Contributions to pgBackRest are always welcome! -https://github.com/pgmasters/backrest/releases +Code fixes or new features can be submitted via pull requests. Ideas for new features and improvements to existing functionality or documentation can be [submitted as issues](http://github.com/pgmasters/backrest/issues). -pgBackRest can be installed anywhere but it's best (though not required) to install it in the same location on all systems. +Bug reports should be [submitted as issues](http://github.com/pgmasters/backrest/issues). Please provide as much information as possible to aid in determining the cause of the problem. -### Regression Test Setup +You will always receive credit in the [change log](https://github.com/pgmasters/backrest/blob/master/CHANGELOG.md) for your contributions. -* Create the backrest user +## Support -The backrest user must be created on the same system and in the same group as the user you will use for testing (which can be any user you prefer). For example: -``` -adduser -g backrest -``` -* Setup password-less SSH login between the test user and the backrest user +pgBackRest is completely free and open source under the [MIT](https://github.com/pgmasters/backrest/blob/master/LICENSE) license. You may use it for personal or commercial purposes without any restrictions whatsoever. Bug reports are taken very seriously and will be addressed as quickly as possible. -The test user should be able to `ssh backrest@127.0.0.1` and the backrest user should be able to `ssh @127.0.0.1` without requiring any passwords. This article (http://archive.oreilly.com/pub/h/66) has details on how to accomplish this. Do the logons both ways at the command line before running regression tests. +Creating a robust disaster recovery policy with proper replication and backup strategies can be a very complex and daunting task. 
You may find that you need help during the architecture phase and ongoing support to ensure that your enterprise continues running smoothly. -* Give group read and execute permissions to `~/backrest/test`: +[Crunchy Data](http://www.crunchydatasolutions.com) provides packaged versions of pgBackRest for major operating systems and expert full life-cycle commercial support for pgBackRest and all things PostgreSQL. [Crunchy Data](http://www.crunchydatasolutions.com) is committed to providing open source solutions with no vendor lock-in so cross-compatibility with the community version of pgBackRest is always strictly maintained. -Usually this can be accomplished by running the following as the test user: -``` -chmod 750 ~ -``` -* Running regression: - -Running the full regression suite is generally not necessary. Run the following first: -``` -./test.pl --module=backup --test=full --db-version=all --thread-max=<# threads> -``` -This will run full backup/restore regression with a variety of options on all installed versions of PostgreSQL. If you are only interested in one version then modify the `db-version` setting to X.X (e.g. 9.4). `--thread-max` can be omitted if you are running single-threaded. - -If there are errors in this test then run full regression to help isolate problems: -``` -./test.pl --db-version=all --thread-max=<# threads> -``` -Report regression test failures at https://github.com/pgmasters/backrest/issues. - - -## Configuration - -pgBackRest can be used entirely with command-line parameters but a configuration file is more practical for installations that are complex or set a lot of options. The default location for the configuration file is `/etc/pg_backrest.conf`. 
- -### Examples - -#### Configuring Postgres for Archiving - -Modify the following settings in `postgresql.conf`: -``` -wal_level = archive -archive_mode = on -archive_command = '/path/to/backrest/bin/pg_backrest --stanza=db archive-push %p' -``` -Replace the path with the actual location where pgBackRest was installed. The stanza parameter should be changed to the actual stanza name for your database. - - -#### Minimal Configuration - -The absolute minimum required to run pgBackRest (if all defaults are accepted) is the database path. - -`/etc/pg_backrest.conf`: -``` -[main] -db-path=/data/db -``` -The `db-path` option could also be provided on the command line, but it's best to use a configuration file as options tend to pile up quickly. - -#### Simple Single Host Configuration - -This configuration is appropriate for a small installation where backups are being made locally or to a remote file system that is mounted locally. A number of additional options are set: - -- `db-port` - Custom port for PostgreSQL. -- `compress` - Disable compression (handy if the file system is already compressed). -- `repo-path` - Path to the pgBackRest repository where backups and WAL archive are stored. -- `log-level-file` - Set the file log level to debug (Lots of extra info if something is not working as expected). -- `hardlink` - Create hardlinks between backups (but never between full backups). -- `thread-max` - Use 2 threads for backup/restore operations. - -`/etc/pg_backrest.conf`: -``` -[global:general] -compress=n -repo-path=/path/to/db/repo - -[global:log] -log-level-file=debug - -[global:backup] -hardlink=y -thread-max=2 - -[main] -db-path=/data/db -db-port=5555 -``` - - -#### Simple Multiple Host Configuration - -This configuration is appropriate for a small installation where backups are being made remotely. Make sure that postgres@db-host has trusted ssh to backrest@backup-host and vice versa. 
This configuration assumes that you have pg_backrest in the same path on both servers. - -`/etc/pg_backrest.conf` on the db host: -``` -[global:general] -repo-path=/path/to/db/repo -repo-remote-path=/path/to/backup/repo - -[global:backup] -backup-host=backup.mydomain.com -backup-user=backrest - -[global:archive] -archive-async=y - -[main] -db-path=/data/db -``` -`/etc/pg_backrest.conf` on the backup host: -``` -[global:general] -repo-path=/path/to/backup/repo - -[main] -db-host=db.mydomain.com -db-path=/data/db -db-user=postgres -``` - - -### Options - -#### `command` section - -The `command` section defines the location of external commands that are used by pgBackRest. - -##### `cmd-remote` key - -Defines the location of `pg_backrest_remote.pl`. - -Required only if the path to `pg_backrest_remote.pl` is different on the local and remote systems. If not defined, the remote path will be assumed to be the same as the local path. -``` -required: n -default: same as local -example: cmd-remote=/usr/lib/backrest/bin/pg_backrest_remote.pl -``` - -#### `log` section - -The `log` section defines logging-related settings. The following log levels are supported: - -- `off` - No logging at all (not recommended) -- `error` - Log only errors -- `warn` - Log warnings and errors -- `info` - Log info, warnings, and errors -- `debug` - Log debug, info, warnings, and errors -- `trace` - Log trace (very verbose debugging), debug, info, warnings, and errors - - -##### `log-level-file` key - -Sets file log level. -``` -required: n -default: info -example: log-level-file=debug -``` - -##### `log-level-console` key - -Sets console log level. -``` -required: n -default: warn -example: log-level-console=error -``` - -#### `general` section - -The `general` section defines settings that are shared between multiple operations. - -##### `buffer-size` key - -Set the buffer size used for copy, compress, and uncompress functions. A maximum of 3 buffers will be in use at a time per thread. 
An additional maximum of 256K per thread may be used for zlib buffers. -``` -required: n -default: 4194304 -allow: 16384 - 8388608 -example: buffer-size=32768 -``` - -##### `compress` key - -Enable gzip compression. Backup files are compatible with command-line gzip tools. -``` -required: n -default: y -example: compress=n -``` - -##### `compress-level` key - -Sets the zlib level to be used for file compression when `compress=y`. -``` -required: n -default: 6 -allow: 0-9 -example: compress-level=9 -``` - -##### `compress-level-network` key - -Sets the zlib level to be used for protocol compression when `compress=n` and the database is not on the same host as the backup. Protocol compression is used to reduce network traffic but can be disabled by setting `compress-level-network=0`. When `compress=y` the `compress-level-network` setting is ignored and `compress-level` is used instead so that the file is only compressed once. SSH compression is always disabled. -``` -required: n -default: 3 -allow: 0-9 -example: compress-level-network=1 -``` - -##### `neutral-umask` key - -Sets the umask to 0000 so modes in the repository are created in a sensible way. The default directory mode is 0750 and default file mode is 0640. The lock and log directories set the directory and file mode to 0770 and 0660 respectively. - -To use the executing user's umask instead specify `neutral-umask=n` in the config file or `--no-neutral-umask` on the command line. -``` -required: n -default: y -example: neutral-umask=n -``` - -##### `repo-path` key - -Path to the backrest repository where WAL segments, backups, logs, etc. are stored. -``` -required: n -default: /var/lib/backup -example: repo-path=/data/db/backrest -``` - -##### `repo-remote-path` key - -Path to the remote backrest repository where WAL segments, backups, logs, etc. are stored. -``` -required: n -example: repo-remote-path=/backup/backrest -``` - -#### `backup` section - -The `backup` section defines settings related to backup. 
- -##### `backup-host` key - -Sets the backup host when backing up remotely via SSH. Make sure that trusted SSH authentication is configured between the db host and the backup host. - -When backing up to a locally mounted network filesystem this setting is not required. -``` -required: n -example: backup-host=backup.domain.com -``` - -##### `backup-user` key - -Sets user account on the backup host. -``` -required: n -example: backup-user=backrest -``` - -##### `start-fast` key - -Forces a checkpoint (by passing `true` to the `fast` parameter of `pg_start_backup()`) so the backup begins immediately. Otherwise the backup will start after the next regular checkpoint. -``` -required: n -default: n -example: start-fast=y -``` - -##### `hardlink` key - -Enable hard-linking of files in differential and incremental backups to their full backups. This gives the appearance that each backup is a full backup. Be careful, though, because modifying files that are hard-linked can affect all the backups in the set. -``` -required: n -default: n -example: hardlink=y -``` - -##### `manifest-save-threshold` key - -Defines how often the manifest will be saved during a backup (in bytes). Saving the manifest is important because it stores the checksums and allows the resume function to work efficiently. The actual threshold used is 1% of the backup size or `manifest-save-threshold`, whichever is greater. -``` -required: n -default: 1073741824 -example: manifest-save-threshold=5368709120 -``` - -##### `resume` key - -Defines whether the resume feature is enabled. Resume can greatly reduce the amount of time required to run a backup after a previous backup of the same type has failed. It adds complexity, however, so it may be desirable to disable in environments that do not require the feature. -``` -required: n -default: y -example: resume=false -``` - -##### `thread-max` key - -Defines the number of threads to use for backup or restore. 
Each thread will perform compression and transfer to make the backup run faster, but don't set `thread-max` so high that it impacts database performance during backup. -``` -required: n -default: 1 -example: thread-max=4 -``` - -##### `thread-timeout` key - -Maximum amount of time (in seconds) that a backup thread should run. This limits the amount of time that a thread might be stuck due to unforeseen issues during the backup. Has no effect when `thread-max=1`. -``` -required: n -example: thread-timeout=3600 -``` - -##### `archive-check` key - -Checks that all WAL segments required to make the backup consistent are present in the WAL archive. It's a good idea to leave this as the default unless you are using another method for archiving. -``` -required: n -default: y -example: archive-check=n -``` - -##### `archive-copy` key - -Store WAL segments required to make the backup consistent in the backup's pg_xlog path. This slightly paranoid option protects against corruption or premature expiration in the WAL segment archive. PITR won't be possible without the WAL segment archive and this option also consumes more space. - -Even though WAL segments will be restored with the backup, PostgreSQL will ignore them if a `recovery.conf` file exists and instead use `restore_command` to fetch WAL segments. Specifying `type=none` when restoring will not create `recovery.conf` and force PostgreSQL to use the WAL segments in pg_xlog. This will get the database to a consistent state. -``` -required: n -default: n -example: archive-copy=y -``` - -#### `archive` section - -The `archive` section defines parameters when doing async archiving. This means that the archive files will be stored locally, then a background process will pick them up and move them to the backup. - -##### `archive-async` key - -Archive WAL segments asynchronously. WAL segments will be copied to the local repo, then a process will be forked to compress the segment and transfer it to the remote repo if configured. 
Control will be returned to PostgreSQL as soon as the WAL segment is copied locally. -``` -required: n -default: n -example: archive-async=y -``` - -##### `archive-max-mb` key - -Limits the amount of archive log that will be written locally when `archive-async=y`. After the limit is reached, the following will happen: - -- pgBackRest will notify Postgres that the archive was successfully backed up, then DROP IT. -- An error will be logged to the console and also to the Postgres log. -- A stop file will be written in the lock directory and no more archive files will be backed up until it is removed. - -If this occurs then the archive log stream will be interrupted and PITR will not be possible past that point. A new backup will be required to regain full restore capability. - -The purpose of this feature is to prevent the log volume from filling up at which point Postgres will stop completely. Better to lose the backup than have the database go down. - -To start normal archiving again you'll need to remove the stop file which will be located at `${repo-path}/lock/${stanza}-archive.stop` where `${repo-path}` is the path set in the `general` section, and `${stanza}` is the backup stanza. -``` -required: n -example: archive-max-mb=1024 -``` - -#### `restore` section - -The `restore` section defines settings used for restoring backups. - -##### `tablespace` key - -Defines whether tablespaces will be restored into their original (or remapped) locations or stored directly under the `pg_tblspc` path. Disabling this setting produces compact restores that are convenient for development, staging, etc. Currently these restores cannot be backed up as pgBackRest expects only links in the `pg_tblspc` path. If no tablespaces are present this setting has no effect. -``` -required: n -default: y -example: tablespace=n -``` - -#### `expire` section - -The `expire` section defines how long backups will be retained. 
Expiration only occurs when the number of complete backups exceeds the allowed retention. In other words, if `retention-full` is set to 2, then there must be 3 complete backups before the oldest will be expired. Make sure you always have enough space for retention + 1 backups. - -##### `retention-full` key - -Number of full backups to keep. When a full backup expires, all differential and incremental backups associated with the full backup will also expire. When not defined then all full backups will be kept. -``` -required: n -example: retention-full=2 -``` - -##### `retention-diff` key - -Number of differential backups to keep. When a differential backup expires, all incremental backups associated with the differential backup will also expire. When not defined all differential backups will be kept. -``` -required: n -example: retention-diff=3 -``` - -##### `retention-archive-type` key - -Type of backup to use for archive retention (full or differential). If set to full, then pgBackRest will keep archive logs for the number of full backups defined by `retention-archive`. If set to differential, then pgBackRest will keep archive logs for the number of differential backups defined by `retention-archive`. - -If not defined then archive logs will be kept indefinitely. In general it is not useful to keep archive logs that are older than the oldest backup, but there may be reasons for doing so. -``` -required: n -default: full -example: retention-archive-type=diff -``` - -##### `retention-archive` key - -Number of backups worth of archive log to keep. If this is set less than your backup retention then be sure you set `archive-copy=y` or you won't be able to restore some older backups. - -For example, if `retention-archive=2` and `retention-full=4`, then any backups older than the most recent two full backups will not have WAL segments in the archive to make them consistent. To solve this, set `archive-copy=y` and use `type=none` when restoring. 
This issue will be addressed in a future release but for now be careful with this setting. -``` -required: n -example: retention-archive=2 -``` - -#### `stanza` section - -A stanza defines a backup for a specific database. The stanza section must define the base database path and host/user if the database is remote. Also, any global configuration sections can be overridden to define stanza-specific settings. - -##### `db-host` key - -Defines the database host. Used for backups where the database host is different from the backup host. -``` -required: n -example: db-host=db.domain.com -``` - -##### `db-user` key - -Defines the logon user when `db-host` is defined. This user will also own the remote pgBackRest process and will initiate connections to PostgreSQL. For this to work correctly the user should be the PostgreSQL cluster owner which is generally `postgres`, the default. -``` -required: n -default: postgres -example: db-user=test_user -``` - -##### `db-path` key - -Path to the db data directory (data_directory setting in postgresql.conf). -``` -required: y -example: db-path=/data/db -``` - -##### `db-port` key - -Port that PostgreSQL is running on. This usually does not need to be specified as most clusters run on the default port. -``` -required: n -default: 5432 -example: db-port=6543 -``` - -##### `db-socket-path` key - -The unix socket directory that was specified when PostgreSQL was started. pgBackRest will automatically look in the standard location for your OS so there is usually no need to specify this setting unless the socket directory was explicitly modified with the `unix_socket_directory` setting in `postgresql.conf`. -``` -required: n -example: db-socket-path=/var/run/postgresql -``` - -## Operation - -### General Options - -These options are either global or used by all commands. - -#### `config` option - -By default pgBackRest expects its configuration file to be located at `/etc/pg_backrest.conf`. Use this option to specify another location.
-``` -required: n -default: /etc/pg_backrest.conf -example: config=/var/lib/backrest/pg_backrest.conf -``` - -#### `stanza` option - -Defines the stanza for the command. A stanza is the configuration for a database that defines where it is located, how it will be backed up, archiving options, etc. Most db servers will only have one Postgres cluster and therefore one stanza, whereas backup servers will have a stanza for every database that needs to be backed up. - -Examples of how to configure a stanza can be found in the `configuration examples` section. -``` -required: y -example: stanza=main -``` - -#### `help` option - -Displays the pgBackRest help. -``` -required: n -``` - -#### `version` option - -Displays the pgBackRest version. -``` -required: n -``` - -### Commands - -#### `backup` command - -Perform a database backup. pgBackRest does not have a built-in scheduler so it's best to run it from cron or some other scheduling mechanism. - -##### `type` option - -The following backup types are supported: - -- `full` - all database files will be copied and there will be no dependencies on previous backups. -- `incr` - incremental from the last successful backup. -- `diff` - like an incremental backup but always based on the last full backup. - -``` -required: n -default: incr -example: --type=full -``` - -##### `no-start-stop` option - -This option prevents pgBackRest from running `pg_start_backup()` and `pg_stop_backup()` on the database. In order for this to work PostgreSQL should be shut down and pgBackRest will generate an error if it is not. - -The purpose of this option is to allow cold backups. The `pg_xlog` directory is copied as-is and `archive-check` is automatically disabled for the backup. -``` -required: n -default: n -``` - -##### `force` option - -When used with `--no-start-stop` a backup will be run even if pgBackRest thinks that PostgreSQL is running. 
**This option should be used with extreme care as it will likely result in a bad backup.** - -There are some scenarios where a backup might still be desirable under these conditions. For example, if a server crashes and the database volume can only be mounted read-only, it would be a good idea to take a backup even if `postmaster.pid` is present. In this case it would be better to revert to the prior backup and replay WAL, but possibly there is a very important transaction in a WAL segment that did not get archived. -``` -required: n -default: n -``` - -##### Example: Full Backup - -``` -/path/to/pg_backrest --stanza=db --type=full backup -``` -Run a `full` backup on the `db` stanza. `--type` can also be set to `incr` or `diff` for incremental or differential backups. However, if no `full` backup exists then a `full` backup will be forced even if `incr` or `diff` is requested. - -#### `archive-push` command - -Archive a WAL segment to the repository. - -##### Example - -``` -/path/to/pg_backrest --stanza=db archive-push %p -``` -Accepts a WAL segment from PostgreSQL and archives it in the repository defined by `repo-path`. `%p` is how PostgreSQL specifies the location of the WAL segment to be archived. - -#### `archive-get` command - -Get a WAL segment from the repository. - -##### Example - -``` -/path/to/pg_backrest --stanza=db archive-get %f %p -``` -Retrieves a WAL segment from the repository. This command is used in `recovery.conf` to restore a backup, perform PITR, or as an alternative to streaming for keeping a replica up to date. `%f` is how PostgreSQL specifies the WAL segment it needs and `%p` is the location where it should be copied. - -#### `expire` command - -pgBackRest does backup rotation, but is not concerned with when the backups were created. So if two full backups are configured for retention, pgBackRest will keep two full backups no matter whether they occur, two hours apart or two weeks apart. 
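The count-based rotation described above can be sketched roughly as follows. This is only an illustration in Python, not pgBackRest's actual implementation (which is written in Perl); the function name `select_expired_full_backups` and the backup labels are hypothetical. The point it shows is that only the number of full backups matters, never their age, and that nothing expires until the count exceeds the configured retention.

```python
from typing import List


def select_expired_full_backups(full_backups: List[str], retention_full: int) -> List[str]:
    """Return the full backups that exceed retention.

    Hypothetical helper: full_backups is assumed to be sorted oldest first.
    """
    if retention_full < 1:
        raise ValueError("retention_full must be at least 1")
    excess = len(full_backups) - retention_full
    # Nothing expires until the number of backups exceeds the retention.
    return full_backups[:excess] if excess > 0 else []


# Hypothetical labels, oldest first; their spacing in time is irrelevant.
backups = ["20150101-010000F", "20150108-010000F", "20150115-010000F"]
expired = select_expired_full_backups(backups, retention_full=2)
print(expired)
```

With a retention of 2 and three full backups, only the oldest is selected; with two or fewer backups the result is empty, which is why space for retention + 1 backups is needed.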
- -##### Example - -``` -/path/to/pg_backrest --stanza=db expire -``` -Expire (rotate) any backups that exceed the defined retention. Expiration is run automatically after every successful backup, so there is no need to run this command separately unless you have reduced retention, usually to free up some space. - -#### `restore` command - -Perform a database restore. This command is generally run manually, but there are instances where it might be automated. - -##### `set` option - -The backup set to be restored. `latest` will restore the latest backup, otherwise provide the name of the backup to restore. -``` -required: n -default: latest -example: --set=20150131-153358F_20150131-153401I -``` - -##### `delta` option - -By default the PostgreSQL data and tablespace directories are expected to be present but empty. This option performs a delta restore using checksums. -``` -required: n -default: n -``` - -##### `force` option - -By itself this option forces the PostgreSQL data and tablespace paths to be completely overwritten. In combination with `--delta` a timestamp/size delta will be performed instead of using checksums. -``` -required: n -default: n -``` - -##### `type` option - -The following recovery types are supported: - -- `default` - recover to the end of the archive stream. -- `name` - recover the restore point specified in `--target`. -- `xid` - recover to the transaction id specified in `--target`. -- `time` - recover to the time specified in `--target`. -- `preserve` - preserve the existing `recovery.conf` file. -- `none` - no `recovery.conf` file is written so PostgreSQL will attempt to achieve consistency using WAL segments present in `pg_xlog`. Provide the required WAL segments or use the `archive-copy` setting to include them with the backup. - -``` -required: n -default: default -example: --type=xid -``` - -##### `target` option - -Defines the recovery target when `--type` is `name`, `xid`, or `time`. 
-``` -required: y -example: "--target=2015-01-30 14:15:11 EST" -``` - -##### `target-exclusive` option - -Defines whether recovery to the target would be exclusive (the default is inclusive) and is only valid when `--type` is `time` or `xid`. For example, using `--target-exclusive` would exclude the contents of transaction `1007` when `--type=xid` and `--target=1007`. See `recovery_target_inclusive` option in the PostgreSQL docs for more information. -``` -required: n -default: n -``` - -##### `target-resume` option - -Specifies whether recovery should resume when the recovery target is reached. See `pause_at_recovery_target` in the PostgreSQL docs for more information. -``` -required: n -default: n -``` - -##### `target-timeline` option - -Recovers along the specified timeline. See `recovery_target_timeline` in the PostgreSQL docs for more information. -``` -required: n -example: --target-timeline=3 -``` - -##### `recovery-setting` option - -Settings for `recovery.conf` can be specified with this option. See http://www.postgresql.org/docs/X.X/static/recovery-config.html for details on recovery.conf options (replace X.X with your database version). This option can be used multiple times. - -Note: `restore_command` will be automatically generated but can be overridden with this option. Be careful about specifying your own `restore_command` as pgBackRest is designed to handle this for you. Target Recovery options (recovery_target_name, recovery_target_time, etc.) are generated automatically by pgBackRest and should not be set with this option. - -Recovery settings can also be set in the `restore:recovery-setting` section of pg_backrest.conf. For example: -``` -[restore:recovery-setting] -primary_conninfo=db.mydomain.com -standby_mode=on -``` -Since pgBackRest does not start PostgreSQL after writing the `recovery.conf` file, it is always possible to edit/check `recovery.conf` before manually restarting.
-``` -required: n -example: --recovery-setting primary_conninfo=db.mydomain.com -``` - -##### `tablespace-map` option - -Moves a tablespace to a new location during the restore. This is useful when tablespace locations are not the same on a replica, or an upgraded system has different mount points. - -Since PostgreSQL 9.2 tablespace locations are not stored in pg_tablespace so moving tablespaces can be done with impunity. However, moving a tablespace to the `data_directory` is not recommended and may cause problems. For more information on moving tablespaces http://www.databasesoup.com/2013/11/moving-tablespaces.html is a good resource. -``` -required: n -example: --tablespace-map ts_01=/db/ts_01 -``` - -##### Example: Restore Latest - -``` -/path/to/pg_backrest --stanza=db --type=name --target=release restore -``` -Restores the latest database backup and then recovers to the `release` restore point. - -#### `info` command - -Retrieve information about backups for a single stanza or for all stanzas. Text output is the default and gives a human-readable summary of backups for the stanza(s) requested. This format is subject to change with any release. - -For machine-readable output use `--output=json`. The JSON output contains far more information than the text output, however **this feature is currently experimental so the format may change between versions**. - -##### `output` option - -The following output types are supported: - -- `text` - Human-readable summary of backup information. -- `json` - Exhaustive machine-readable backup information in JSON format. - -``` -required: n -default: text -example: --output=json -``` - -##### Example: Backup information - -``` -/path/to/pg_backrest --stanza=db --output=json info -``` - -Get information about backups in the `db` stanza. 
- -## Release Notes - -### v0.80: DALLAS MILESTONE - UNDER DEVELOPMENT - -* Fixed an issue that caused the formatted timestamp for both the oldest and newest backups to be reported as the current time by the `info` command. Only `text` output was affected -- `json` output reported the correct epoch values. Found by Michael Renner. - -* Fixed protocol issue that was preventing ssh errors (especially on connection) from being logged. - -* Now using Perl DBI for connections to PostgreSQL rather than psql. The `cmd-psql` and `cmd-psql-option` settings have been removed and replaced with `db-port` and `db-socket-path`. - -* Remove `pg_control` file at the beginning of the restore and copy it back at the very end. This prevents the possibility that a partial restore can be started by PostgreSQL. - -* The repository is now created and updated with consistent directory and file modes. By default `umask` is set to `0000` but this can be disabled with the `neutral-umask` setting - -* Added checks to be sure the `db-path` setting is consistent with `db-port` by comparing the `data_directory` as reported by the cluster against the `db-path` setting and the version as reported by the cluster against the value read from `pg_control`. The `db-socket-path` setting is checked to be sure it is an absolute path. - -* Major refactoring of the protocol layer to support future development. - -* Added vagrant test configurations for Ubuntu 14.04 and CentOS 7. - -* Experimental support for PostgreSQL 9.5 alpha1. This may break when the control version or WAL magic changes in future versions but will be updated in each pgBackRest release to keep pace. All regression tests pass except for `--target-resume` tests (this functionality has changed in 9.5) and there is no testing yet for `.partial` WAL segments. - -### v0.78: Remove CPAN dependencies, stability improvements - -* Removed dependency on CPAN packages for multi-threaded operation. 
While it might not be a bad idea to update the `threads` and `Thread::Queue` packages, it is no longer necessary. - -* Added vagrant test configurations for Ubuntu 12.04 and CentOS 6. - -* Modified wait backoff to use a Fibonacci rather than geometric sequence. This will make wait time grow less aggressively while still giving reasonable values. - -* More options for regression tests and improved code to run in a variety of environments. - -### v0.77: CentOS/RHEL 6 support and protocol improvements - -* Removed `pg_backrest_remote` and added the functionality to `pg_backrest` as the `remote` command. - -* Added file and directory syncs to the `File` object for additional safety during backup/restore and archiving. Suggested by Andres Freund. - -* Support for Perl 5.10.1 and OpenSSH 5.3 which are default for CentOS/RHEL 6. Found by Eric Radman. - -* Improved error message when backup is run without `archive_command` set and without `--no-archive-check` specified. Found by Eric Radman. - -* Moved version number out of the `VERSION` file to `Version.pm` to better support packaging. Suggested by Michael Renner. - -* Replaced `IPC::System::Simple` and `Net::OpenSSH` with `IPC::Open3` to eliminate CPAN dependency for multiple distros. - -### v0.75: New repository format, info command and experimental 9.5 support - -* IMPORTANT NOTE: This flag day release breaks compatibility with older versions of pgBackRest. The manifest format, on-disk structure, and the binary names have all changed. You must create a new repository to hold backups for this version of pgBackRest and keep your older repository for a time in case you need to do a restore. The `pg_backrest.conf` file has not changed but you'll need to change any references to `pg_backrest.pl` in cron (or elsewhere) to `pg_backrest` (without the `.pl` extension). - -* Add `info` command. - -* More efficient file ordering for `backup`. 
Files are copied in descending size order so a single thread does not end up copying a large file at the end. This had already been implemented for `restore`. - -* Logging now uses unbuffered output. This should make log files that are being written by multiple threads less chaotic. Suggested by Michael Renner. - -* Experimental support for PostgreSQL 9.5. This may break when the control version or WAL magic changes but will be updated in each release. - -### v0.70: Stability improvements for archiving, improved logging and help - -* Fixed an issue where archive-copy would fail on an incr/diff backup when hardlink=n. In this case the `pg_xlog` path does not already exist and must be created. Reported by Michael Renner - -* Allow duplicate WAL segments to be archived when the checksum matches. This is necessary for some recovery scenarios. - -* Allow comments/disabling in pg_backrest.conf using #. Suggested by Michael Renner. - -* Better logging before `pg_start_backup()` to make it clear when the backup is waiting on a checkpoint. Suggested by Michael Renner. - -* Various command behavior, help and logging fixes. Reported by Michael Renner. - -* Fixed an issue in async archiving where `archive-push` was not properly returning 0 when `archive-max-mb` was reached and moved the async check after transfer to avoid having to remove the stop file twice. Also added unit tests for this case and improved error messages to make it clearer to the user what went wrong. Reported by Michael Renner. - -* Fixed a locking issue that could allow multiple operations of the same type against a single stanza. This appeared to be benign in terms of data integrity but caused spurious errors while archiving and could lead to errors in backup/restore. Reported by Michael Renner. - -* Replaced JSON module with JSON::PP which ships with core Perl. - -### v0.65: Improved resume and restore logging, compact restores - -* Better resume support. 
Resumed files are checked to be sure they have not been modified and the manifest is saved more often to preserve checksums as the backup progresses. More unit tests to verify each resume case. - -* Resume is now optional. Use the `resume` setting or `--no-resume` from the command line to disable. - -* More info messages during restore. Previously, most of the restore messages were debug level so not a lot was output in the log. - -* Fixed an issue where an absolute path was not written into recovery.conf when the restore was run with a relative path. - -* Added `tablespace` setting to allow tablespaces to be restored into the `pg_tblspc` path. This produces compact restores that are convenient for development, staging, etc. Currently these restores cannot be backed up as pgBackRest expects only links in the `pg_tblspc` path. - -### v0.61: Bug fix for uncompressed remote destination - -* Fixed a buffering error that could occur on large, highly-compressible files when copying to an uncompressed remote destination. The error was detected in the decompression code and resulted in a failed backup rather than corruption so it should not affect successful backups made with previous versions. - -### v0.60: Better version support and WAL improvements - -* Pushing duplicate WAL now generates an error. This worked before only if checksums were disabled. - -* Database System IDs are used to make sure that all WAL in an archive matches up. This should help prevent misconfigurations that send WAL from multiple clusters to the same archive. - -* Regression tests working back to PostgreSQL 8.3. - -* Improved threading model by starting threads early and terminating them late. - -### v0.50: Restore and much more - -* Added restore functionality. - -* All options can now be set on the command-line making pg_backrest.conf optional. - -* De/compression is now performed without threads and checksum/size is calculated in stream. That means file checksums are no longer optional. 
- -* Added option `--no-start-stop` to allow backups when Postgres is shut down. If `postmaster.pid` is present then `--force` is required to make the backup run (though if Postgres is running an inconsistent backup will likely be created). This option was added primarily for the purpose of unit testing, but there may be applications in the real world as well. - -* Fixed broken checksums and now they work with normal and resumed backups. Finally realized that checksums and checksum deltas should be functionally separated and this simplified a number of things. Issue #28 has been created for checksum deltas. - -* Fixed an issue where a backup could be resumed from an aborted backup that didn't have the same type and prior backup. - -* Removed dependency on Moose. It wasn't being used extensively and makes for longer startup times. - -* Checksum for backup.manifest to detect corrupted/modified manifest. - -* Link `latest` always points to the last backup. This has been added for convenience and to make restores simpler. - -* More comprehensive unit tests in all areas. - -### v0.30: Core Restructuring and Unit Tests - -* Complete rewrite of BackRest::File module to use a custom protocol for remote operations and Perl native GZIP and SHA operations. Compression is performed in threads rather than forked processes. - -* Fairly comprehensive unit tests for all the basic operations. More work to be done here for sure, but then there is always more work to be done on unit tests. - -* Removed dependency on Storable and replaced with a custom ini file implementation. - -* Added much needed documentation - -* Numerous other changes that can only be identified with a diff. - -### v0.19: Improved Error Reporting/Handling - -* Working on improving error handling in the file object. 
This is not complete, but works well enough to find a few errors that have been causing us problems (notably, find is occasionally failing building the archive async manifest when system is under load). - -* Found and squashed a nasty bug where `file_copy()` was defaulted to ignore errors. There was also an issue in file_exists that was causing the test to fail when the file actually did exist. Together they could have resulted in a corrupt backup with no errors, though it is very unlikely. - -### v0.18: Return Soft Error When Archive Missing - -* The `archive-get` operation returns a 1 when the archive file is missing to differentiate from hard errors (ssh connection failure, file copy error, etc.). This lets Postgres know that the archive stream has terminated normally. However, this does not take into account possible holes in the archive stream. - -### v0.17: Warn When Archive Directories Cannot Be Deleted - -* If an archive directory which should be empty could not be deleted backrest was throwing an error. There's a good fix for that coming, but for the time being it has been changed to a warning so processing can continue. This was impacting backups as sometimes the final archive file would not get pushed if the first archive file had been in a different directory (plus some bad luck). - -### v0.16: RequestTTY=yes for SSH Sessions - -* Added `RequestTTY=yes` to ssh sessions. Hoping this will prevent random lockups. - -### v0.15: RequestTTY=yes for SSH Sessions - -* Added archive-get functionality to aid in restores. - -* Added option to force a checkpoint when starting the backup `start-fast=y`. - -### v0.11: Minor Fixes - -* Removed `master_stderr_discard` option on database SSH connections. There have been occasional lockups and they could be related to issues originally seen in the file code. - -* Changed lock file conflicts on backup and expire commands to ERROR. They were set to DEBUG due to a copy-and-paste from the archive locks.
- -### v0.10: Backup and Archiving are Functional - -* No restore functionality, but the backup directories are consistent Postgres data directories. You'll need to either uncompress the files or turn off compression in the backup. Uncompressed backups on a ZFS (or similar) filesystem are a good option because backups can be restored locally via a snapshot to create logical backups or do spot data recovery. - -* Archiving is single-threaded. This has not posed an issue on our multi-terabyte databases with heavy write volume. Recommend a large WAL volume or to use the async option with a large volume nearby. - -* Backups are multi-threaded, but the Net::OpenSSH library does not appear to be 100% thread-safe so it will very occasionally lock up on a thread. There is an overall process timeout that resolves this issue by killing the process. Yes, very ugly. - -* Checksums are lost on any resumed backup. Only the final backup will record checksum on multiple resumes. Checksums from previous backups are correctly recorded and a full backup will reset everything. - -* The backup.manifest is being written as Storable because Config::IniFile does not seem to handle large files well. Would definitely like to save these as human-readable text. - -* Absolutely no documentation (outside the code). Well, excepting these release notes. +Please visit [Crunchy Backup Manager](http://crunchydatasolutions.com/crunchy-backup-manager) for more information. ## Recognition Primary recognition goes to Stephen Frost for all his valuable advice and criticism during the development of pgBackRest. -Crunchy Data Solutions (http://www.crunchydata.com) has contributed time and resources to pgBackRest and continues to support development. Resonate (http://www.resonate.com/) also contributed to the development of pgBackRest and allowed me to install early (but well tested) versions as their primary PostgreSQL backup solution. 
+[Crunchy Data](http://www.crunchydatasolutions.com) has contributed significant time and resources to pgBackRest and continues to actively support development. [Resonate](http://www.resonate.com) also contributed to the development of pgBackRest and allowed early (but well tested) versions to be installed as their primary PostgreSQL backup solution. diff --git a/USERGUIDE.md b/USERGUIDE.md new file mode 100644 index 000000000..f26103220 --- /dev/null +++ b/USERGUIDE.md @@ -0,0 +1,764 @@ +# pgBackRest - User Guide + +## Installation + +pgBackRest is written entirely in Perl. Some additional modules will need to be installed depending on the OS. + +### Ubuntu 12.04/14.04 Setup + +* Install required Perl modules: +``` +apt-get install libdbd-pg-perl +``` + +### CentOS 6 Setup + +* Install Perl and required modules: +``` +yum install perl perl-Time-HiRes perl-IO-String perl-parent perl-JSON perl-Digest-SHA perl-DBD-Pg +``` + +### CentOS 7 Setup + +* Install Perl and required modules: +``` +yum install perl perl-IO-String perl-Thread-Queue perl-JSON-PP perl-Digest-SHA perl-DBD-Pg +``` + +### Software Installation + +pgBackRest can be installed by downloading the most recent release: + +https://github.com/pgmasters/backrest/releases + +pgBackRest can be installed anywhere but it's best (though not required) to install it in the same location on all systems. + +### Regression Test Setup + +* Create the backrest user + +The backrest user must be created on the same system and in the same group as the user you will use for testing (which can be any user you prefer). For example: +``` +adduser -g backrest +``` +* Setup password-less SSH login between the test user and the backrest user + +The test user should be able to `ssh backrest@127.0.0.1` and the backrest user should be able to `ssh @127.0.0.1` without requiring any passwords. This article (http://archive.oreilly.com/pub/h/66) has details on how to accomplish this. 
Do the logons both ways at the command line before running regression tests. + +* Give group read and execute permissions to `~/backrest/test`: + +Usually this can be accomplished by running the following as the test user: +``` +chmod 750 ~ +``` +* Running regression: + +Running the full regression suite is generally not necessary. Run the following first: +``` +./test.pl --module=backup --test=full --db-version=all --thread-max=<# threads> +``` +This will run full backup/restore regression with a variety of options on all installed versions of PostgreSQL. If you are only interested in one version then modify the `db-version` setting to X.X (e.g. 9.4). `--thread-max` can be omitted if you are running single-threaded. + +If there are errors in this test then run full regression to help isolate problems: +``` +./test.pl --db-version=all --thread-max=<# threads> +``` +Report regression test failures at https://github.com/pgmasters/backrest/issues. + +## Configuration + +pgBackRest can be used entirely with command-line parameters but a configuration file is more practical for installations that are complex or set a lot of options. The default location for the configuration file is `/etc/pg_backrest.conf`. + +### Examples + +#### Configuring Postgres for Archiving + +Modify the following settings in `postgresql.conf`: +``` +wal_level = archive +archive_mode = on +archive_command = '/path/to/backrest/bin/pg_backrest --stanza=db archive-push %p' +``` +Replace the path with the actual location where pgBackRest was installed. The stanza parameter should be changed to the actual stanza name for your database. + +#### Minimal Configuration + +The absolute minimum required to run pgBackRest (if all defaults are accepted) is the database path. + +`/etc/pg_backrest.conf`: +``` +[main] +db-path=/data/db +``` +The `db-path` option could also be provided on the command line, but it's best to use a configuration file as options tend to pile up quickly.
+ +#### Simple Single Host Configuration + +This configuration is appropriate for a small installation where backups are being made locally or to a remote file system that is mounted locally. A number of additional options are set: + +- `db-port` - Custom port for PostgreSQL. +- `compress` - Disable compression (handy if the file system is already compressed). +- `repo-path` - Path to the pgBackRest repository where backups and WAL archive are stored. +- `log-level-file` - Set the file log level to debug (lots of extra info if something is not working as expected). +- `hardlink` - Create hardlinks between backups (but never between full backups). +- `thread-max` - Use 2 threads for backup/restore operations. + +`/etc/pg_backrest.conf`: +``` +[global:general] +compress=n +repo-path=/path/to/db/repo + +[global:log] +log-level-file=debug + +[global:backup] +hardlink=y +thread-max=2 + +[main] +db-path=/data/db +db-port=5555 +``` + +#### Simple Multiple Host Configuration + +This configuration is appropriate for a small installation where backups are being made remotely. Make sure that postgres@db-host has trusted ssh to backrest@backup-host and vice versa. This configuration assumes that you have pg_backrest in the same path on both servers. + +`/etc/pg_backrest.conf` on the db host: +``` +[global:general] +repo-path=/path/to/db/repo +repo-remote-path=/path/to/backup/repo + +[global:backup] +backup-host=backup.mydomain.com +backup-user=backrest + +[global:archive] +archive-async=y + +[main] +db-path=/data/db +``` +`/etc/pg_backrest.conf` on the backup host: +``` +[global:general] +repo-path=/path/to/backup/repo + +[main] +db-host=db.mydomain.com +db-path=/data/db +db-user=postgres +``` + +### Settings + +#### `command` section + +The `command` section defines the location of external commands that are used by pgBackRest. + +##### `cmd-remote` key + +Defines the location of `pg_backrest_remote.pl`.
+ +Required only if the path to `pg_backrest_remote.pl` is different on the local and remote systems. If not defined, the remote path will be assumed to be the same as the local path. +``` +required: n +default: same as local +example: cmd-remote=/usr/lib/backrest/bin/pg_backrest_remote.pl +``` + +#### `log` section + +The `log` section defines logging-related settings. The following log levels are supported: + +- `off` - No logging at all (not recommended) +- `error` - Log only errors +- `warn` - Log warnings and errors +- `info` - Log info, warnings, and errors +- `debug` - Log debug, info, warnings, and errors +- `trace` - Log trace (very verbose debugging), debug, info, warnings, and errors + + +##### `log-level-file` key + +Sets file log level. +``` +required: n +default: info +example: log-level-file=debug +``` + +##### `log-level-console` key + +Sets console log level. +``` +required: n +default: warn +example: log-level-console=error +``` + +#### `general` section + +The `general` section defines settings that are shared between multiple operations. + +##### `buffer-size` key + +Set the buffer size used for copy, compress, and uncompress functions. A maximum of 3 buffers will be in use at a time per thread. An additional maximum of 256K per thread may be used for zlib buffers. +``` +required: n +default: 4194304 +allow: 16384 - 8388608 +example: buffer-size=32768 +``` + +##### `compress` key + +Enable gzip compression. Backup files are compatible with command-line gzip tools. +``` +required: n +default: y +example: compress=n +``` + +##### `compress-level` key + +Sets the zlib level to be used for file compression when `compress=y`. +``` +required: n +default: 6 +allow: 0-9 +example: compress-level=9 +``` + +##### `compress-level-network` key + +Sets the zlib level to be used for protocol compression when `compress=n` and the database is not on the same host as the backup. 
Protocol compression is used to reduce network traffic but can be disabled by setting `compress-level-network=0`. When `compress=y` the `compress-level-network` setting is ignored and `compress-level` is used instead so that the file is only compressed once. SSH compression is always disabled. +``` +required: n +default: 3 +allow: 0-9 +example: compress-level-network=1 +``` + +##### `neutral-umask` key + +Sets the umask to 0000 so modes in the repository are created in a sensible way. The default directory mode is 0750 and default file mode is 0640. The lock and log directories set the directory and file mode to 0770 and 0660 respectively. + +To use the executing user's umask instead specify `neutral-umask=n` in the config file or `--no-neutral-umask` on the command line. +``` +required: n +default: y +example: neutral-umask=n +``` + +##### `repo-path` key + +Path to the backrest repository where WAL segments, backups, logs, etc. are stored. +``` +required: n +default: /var/lib/backup +example: repo-path=/data/db/backrest +``` + +##### `repo-remote-path` key + +Path to the remote backrest repository where WAL segments, backups, logs, etc. are stored. +``` +required: n +example: repo-remote-path=/backup/backrest +``` + +#### `backup` section + +The `backup` section defines settings related to backup. + +##### `backup-host` key + +Sets the backup host when backing up remotely via SSH. Make sure that trusted SSH authentication is configured between the db host and the backup host. + +When backing up to a locally mounted network filesystem this setting is not required. +``` +required: n +example: backup-host=backup.domain.com +``` + +##### `backup-user` key + +Sets the user account on the backup host. +``` +required: n +example: backup-user=backrest +``` + +##### `start-fast` key + +Forces a checkpoint (by passing `true` to the `fast` parameter of `pg_start_backup()`) so the backup begins immediately. Otherwise the backup will start after the next regular checkpoint.
+``` +required: n +default: n +example: start-fast=y +``` + +##### `hardlink` key + +Enable hard-linking of files in differential and incremental backups to their full backups. This gives the appearance that each backup is a full backup. Be careful, though, because modifying files that are hard-linked can affect all the backups in the set. +``` +required: n +default: n +example: hardlink=y +``` + +##### `manifest-save-threshold` key + +Defines how often the manifest will be saved during a backup (in bytes). Saving the manifest is important because it stores the checksums and allows the resume function to work efficiently. The actual threshold used is 1% of the backup size or `manifest-save-threshold`, whichever is greater. +``` +required: n +default: 1073741824 +example: manifest-save-threshold=5368709120 +``` + +##### `resume` key + +Defines whether the resume feature is enabled. Resume can greatly reduce the amount of time required to run a backup after a previous backup of the same type has failed. It adds complexity, however, so it may be desirable to disable it in environments that do not require the feature. +``` +required: n +default: y +example: resume=n +``` + +##### `thread-max` key + +Defines the number of threads to use for backup or restore. Each thread will perform compression and transfer to make the backup run faster, but don't set `thread-max` so high that it impacts database performance during backup. +``` +required: n +default: 1 +example: thread-max=4 +``` + +##### `thread-timeout` key + +Maximum amount of time (in seconds) that a backup thread should run. This limits the amount of time that a thread might be stuck due to unforeseen issues during the backup. Has no effect when `thread-max=1`. +``` +required: n +example: thread-timeout=3600 +``` + +##### `archive-check` key + +Checks that all WAL segments required to make the backup consistent are present in the WAL archive.
It's a good idea to leave this as the default unless you are using another method for archiving. +``` +required: n +default: y +example: archive-check=n +``` + +##### `archive-copy` key + +Store WAL segments required to make the backup consistent in the backup's pg_xlog path. This slightly paranoid option protects against corruption or premature expiration in the WAL segment archive. PITR won't be possible without the WAL segment archive and this option also consumes more space. + +Even though WAL segments will be restored with the backup, PostgreSQL will ignore them if a `recovery.conf` file exists and instead use `restore_command` to fetch WAL segments. Specifying `type=none` when restoring will skip creating `recovery.conf` and force PostgreSQL to use the WAL segments in pg_xlog. This will get the database to a consistent state. +``` +required: n +default: n +example: archive-copy=y +``` + +#### `archive` section + +The `archive` section defines parameters when doing async archiving. This means that the archive files will be stored locally, then a background process will pick them up and move them to the backup. + +##### `archive-async` key + +Archive WAL segments asynchronously. WAL segments will be copied to the local repo, then a process will be forked to compress the segment and transfer it to the remote repo if configured. Control will be returned to PostgreSQL as soon as the WAL segment is copied locally. +``` +required: n +default: n +example: archive-async=y +``` + +##### `archive-max-mb` key + +Limits the amount of archive log that will be written locally when `archive-async=y`. After the limit is reached, the following will happen: + +- pgBackRest will notify Postgres that the archive was successfully backed up, then DROP IT. +- An error will be logged to the console and also to the Postgres log. +- A stop file will be written in the lock directory and no more archive files will be backed up until it is removed.
+ +If this occurs then the archive log stream will be interrupted and PITR will not be possible past that point. A new backup will be required to regain full restore capability. + +The purpose of this feature is to prevent the log volume from filling up at which point Postgres will stop completely. Better to lose the backup than have the database go down. + +To start normal archiving again you'll need to remove the stop file which will be located at `${repo-path}/lock/${stanza}-archive.stop` where `${repo-path}` is the path set in the `general` section, and `${stanza}` is the backup stanza. +``` +required: n +example: archive-max-mb=1024 +``` + +#### `restore` section + +The `restore` section defines settings used for restoring backups. + +##### `tablespace` key + +Defines whether tablespaces will be restored into their original (or remapped) locations or stored directly under the `pg_tblspc` path. Disabling this setting produces compact restores that are convenient for development, staging, etc. Currently these restores cannot be backed up as pgBackRest expects only links in the `pg_tblspc` path. If no tablespaces are present this setting has no effect. +``` +required: n +default: y +example: tablespace=n +``` + +#### `expire` section + +The `expire` section defines how long backups will be retained. Expiration only occurs when the number of complete backups exceeds the allowed retention. In other words, if `retention-full` is set to 2, then there must be 3 complete backups before the oldest will be expired. Make sure you always have enough space for retention + 1 backups. + +##### `retention-full` key + +Number of full backups to keep. When a full backup expires, all differential and incremental backups associated with the full backup will also expire. When not defined then all full backups will be kept. +``` +required: n +example: retention-full=2 +``` + +##### `retention-diff` key + +Number of differential backups to keep.
When a differential backup expires, all incremental backups associated with the differential backup will also expire. When not defined all differential backups will be kept. +``` +required: n +example: retention-diff=3 +``` + +##### `retention-archive-type` key + +Type of backup to use for archive retention (full or differential). If set to full, then pgBackRest will keep archive logs for the number of full backups defined by `retention-archive`. If set to differential, then pgBackRest will keep archive logs for the number of differential backups defined by `retention-archive`. + +If not defined then archive logs will be kept indefinitely. In general it is not useful to keep archive logs that are older than the oldest backup, but there may be reasons for doing so. +``` +required: n +default: full +example: retention-archive-type=diff +``` + +##### `retention-archive` key + +Number of backups worth of archive log to keep. If this is set less than your backup retention then be sure you set `archive-copy=y` or you won't be able to restore some older backups. + +For example, if `retention-archive=2` and `retention-full=4`, then any backups older than the most recent two full backups will not have WAL segments in the archive to make them consistent. To solve this, set `archive-copy=y` and use `type=none` when restoring. This issue will be addressed in a future release but for now be careful with this setting. +``` +required: n +example: retention-archive=2 +``` + +#### `stanza` section + +A stanza defines a backup for a specific database. The stanza section must define the base database path and host/user if the database is remote. Also, any global configuration sections can be overridden to define stanza-specific settings. + +##### `db-host` key + +Define the database host. Used for backups where the database host is different from the backup host. 
+``` +required: n +example: db-host=db.domain.com +``` + +##### `db-user` key + +Defines the logon user when `db-host` is defined. This user will also own the remote pgBackRest process and will initiate connections to PostgreSQL. For this to work correctly the user should be the PostgreSQL cluster owner which is generally `postgres`, the default. +``` +required: n +default: postgres +example: db-user=test_user +``` + +##### `db-path` key + +Path to the db data directory (data_directory setting in postgresql.conf). +``` +required: y +example: db-path=/data/db +``` + +##### `db-port` key + +Port that PostgreSQL is running on. This usually does not need to be specified as most clusters run on the default port. +``` +required: n +default: 5432 +example: db-port=6543 +``` + +##### `db-socket-path` key + +The unix socket directory that was specified when PostgreSQL was started. pgBackRest will automatically look in the standard location for your OS so there is usually no need to specify this setting unless the socket directory was explicitly modified with the `unix_socket_directory` setting in `postgresql.conf`. +``` +required: n +example: db-socket-path=/var/run/postgresql +``` + +## Commands + +### General Options + +These options are either global or used by all commands. + +#### `config` option + +By default pgBackRest expects its configuration file to be located at `/etc/pg_backrest.conf`. Use this option to specify another location. +``` +required: n +default: /etc/pg_backrest.conf +example: config=/var/lib/backrest/pg_backrest.conf +``` + +#### `stanza` option + +Defines the stanza for the command. A stanza is the configuration for a database that defines where it is located, how it will be backed up, archiving options, etc. Most db servers will only have one Postgres cluster and therefore one stanza, whereas backup servers will have a stanza for every database that needs to be backed up.
+ +Examples of how to configure a stanza can be found in the `configuration examples` section. +``` +required: y +example: stanza=main +``` + +#### `help` option + +Displays the pgBackRest help. +``` +required: n +``` + +#### `version` option + +Displays the pgBackRest version. +``` +required: n +``` + +### Commands + +#### `backup` command + +Perform a database backup. pgBackRest does not have a built-in scheduler so it's best to run it from cron or some other scheduling mechanism. + +##### `type` option + +The following backup types are supported: + +- `full` - all database files will be copied and there will be no dependencies on previous backups. +- `incr` - incremental from the last successful backup. +- `diff` - like an incremental backup but always based on the last full backup. + +``` +required: n +default: incr +example: --type=full +``` + +##### `no-start-stop` option + +This option prevents pgBackRest from running `pg_start_backup()` and `pg_stop_backup()` on the database. In order for this to work PostgreSQL should be shut down and pgBackRest will generate an error if it is not. + +The purpose of this option is to allow cold backups. The `pg_xlog` directory is copied as-is and `archive-check` is automatically disabled for the backup. +``` +required: n +default: n +``` + +##### `force` option + +When used with `--no-start-stop` a backup will be run even if pgBackRest thinks that PostgreSQL is running. **This option should be used with extreme care as it will likely result in a bad backup.** + +There are some scenarios where a backup might still be desirable under these conditions. For example, if a server crashes and the database volume can only be mounted read-only, it would be a good idea to take a backup even if `postmaster.pid` is present. In this case it would be better to revert to the prior backup and replay WAL, but possibly there is a very important transaction in a WAL segment that did not get archived. 
+``` +required: n +default: n +``` + +##### Example: Full Backup + +``` +/path/to/pg_backrest --stanza=db --type=full backup +``` +Run a `full` backup on the `db` stanza. `--type` can also be set to `incr` or `diff` for incremental or differential backups. However, if no `full` backup exists then a `full` backup will be forced even if `incr` or `diff` is requested. + +#### `archive-push` command + +Archive a WAL segment to the repository. + +##### Example + +``` +/path/to/pg_backrest --stanza=db archive-push %p +``` +Accepts a WAL segment from PostgreSQL and archives it in the repository defined by `repo-path`. `%p` is how PostgreSQL specifies the location of the WAL segment to be archived. + +#### `archive-get` command + +Get a WAL segment from the repository. + +##### Example + +``` +/path/to/pg_backrest --stanza=db archive-get %f %p +``` +Retrieves a WAL segment from the repository. This command is used in `recovery.conf` to restore a backup, perform PITR, or as an alternative to streaming for keeping a replica up to date. `%f` is how PostgreSQL specifies the WAL segment it needs and `%p` is the location where it should be copied. + +#### `expire` command + +pgBackRest does backup rotation, but is not concerned with when the backups were created. So if two full backups are configured for retention, pgBackRest will keep two full backups whether they were taken two hours apart or two weeks apart. + +##### Example + +``` +/path/to/pg_backrest --stanza=db expire +``` +Expire (rotate) any backups that exceed the defined retention. Expiration is run automatically after every successful backup, so there is no need to run this command separately unless you have reduced retention, usually to free up some space. + +#### `restore` command + +Perform a database restore. This command is generally run manually, but there are instances where it might be automated. + +##### `set` option + +The backup set to be restored.
`latest` will restore the latest backup, otherwise provide the name of the backup to restore. +``` +required: n +default: latest +example: --set=20150131-153358F_20150131-153401I +``` + +##### `delta` option + +By default the PostgreSQL data and tablespace directories are expected to be present but empty. This option performs a delta restore using checksums. +``` +required: n +default: n +``` + +##### `force` option + +By itself this option forces the PostgreSQL data and tablespace paths to be completely overwritten. In combination with `--delta` a timestamp/size delta will be performed instead of using checksums. +``` +required: n +default: n +``` + +##### `type` option + +The following recovery types are supported: + +- `default` - recover to the end of the archive stream. +- `name` - recover the restore point specified in `--target`. +- `xid` - recover to the transaction id specified in `--target`. +- `time` - recover to the time specified in `--target`. +- `preserve` - preserve the existing `recovery.conf` file. +- `none` - no `recovery.conf` file is written so PostgreSQL will attempt to achieve consistency using WAL segments present in `pg_xlog`. Provide the required WAL segments or use the `archive-copy` setting to include them with the backup. + +``` +required: n +default: default +example: --type=xid +``` + +##### `target` option + +Defines the recovery target when `--type` is `name`, `xid`, or `time`. +``` +required: y +example: "--target=2015-01-30 14:15:11 EST" +``` + +##### `target-exclusive` option + +Defines whether recovery to the target would be exclusive (the default is inclusive) and is only valid when `--type` is `time` or `xid`. For example, using `--target-exclusive` would exclude the contents of transaction `1007` when `--type=xid` and `--target=1007`. See `recovery_target_inclusive` option in the PostgreSQL docs for more information. 
+``` +required: n +default: n +``` + +##### `target-resume` option + +Specifies whether recovery should resume when the recovery target is reached. See `pause_at_recovery_target` in the PostgreSQL docs for more information. +``` +required: n +default: n +``` + +##### `target-timeline` option + +Recovers along the specified timeline. See `recovery_target_timeline` in the PostgreSQL docs for more information. +``` +required: n +example: --target-timeline=3 +``` + +##### `recovery-setting` option + +Settings for `recovery.conf` can be specified with this option. See http://www.postgresql.org/docs/X.X/static/recovery-config.html for details on recovery.conf options (replace X.X with your database version). This option can be used multiple times. + +Note: `restore_command` will be automatically generated but can be overridden with this option. Be careful about specifying your own `restore_command` as pgBackRest is designed to handle this for you. Target Recovery options (recovery_target_name, recovery_target_time, etc.) are generated automatically by pgBackRest and should not be set with this option. + +Recovery settings can also be set in the `restore:recovery-setting` section of pg_backrest.conf. For example: +``` +[restore:recovery-setting] +primary_conninfo=db.mydomain.com +standby_mode=on +``` +Since pgBackRest does not start PostgreSQL after writing the `recovery.conf` file, it is always possible to edit/check `recovery.conf` before manually restarting. +``` +required: n +example: --recovery-setting primary_conninfo=db.mydomain.com +``` + +##### `tablespace-map` option + +Moves a tablespace to a new location during the restore. This is useful when tablespace locations are not the same on a replica, or an upgraded system has different mount points. + +Since PostgreSQL 9.2 tablespace locations are not stored in pg_tablespace so moving tablespaces can be done with impunity.
However, moving a tablespace to the `data_directory` is not recommended and may cause problems. For more information on moving tablespaces http://www.databasesoup.com/2013/11/moving-tablespaces.html is a good resource. +``` +required: n +example: --tablespace-map ts_01=/db/ts_01 +``` + +##### Example: Restore Latest + +``` +/path/to/pg_backrest --stanza=db --type=name --target=release restore +``` +Restores the latest database backup and then recovers to the `release` restore point. + +#### `info` command + +Retrieve information about backups for a single stanza or for all stanzas. Text output is the default and gives a human-readable summary of backups for the stanza(s) requested. This format is subject to change with any release. + +For machine-readable output use `--output=json`. The JSON output contains far more information than the text output, however **this feature is currently experimental so the format may change between versions**. + +##### `output` option + +The following output types are supported: + +- `text` - Human-readable summary of backup information. +- `json` - Exhaustive machine-readable backup information in JSON format. + +``` +required: n +default: text +example: --output=json +``` + +##### Example: Information for a single stanza + +``` +/path/to/pg_backrest --stanza=db --output=json info +``` + +Get information about backups in the `db` stanza. + +##### Example: Information for all stanzas + +``` +/path/to/pg_backrest --output=json info +``` + +Get information about backups for all stanzas in the repository. diff --git a/doc/doc.pl b/doc/doc.pl index 618d62db5..0dcb8fcab 100755 --- a/doc/doc.pl +++ b/doc/doc.pl @@ -12,14 +12,15 @@ use Carp qw(confess); $SIG{__DIE__} = sub { Carp::confess @_ }; +use Cwd qw(abs_path); use File::Basename qw(dirname); -use Pod::Usage qw(pod2usage); use Getopt::Long qw(GetOptions); +use Pod::Usage qw(pod2usage); use XML::Checker::Parser; use lib dirname($0) . 
'/../lib'; -use BackRest::Utility; use BackRest::Config; +use BackRest::Utility; #################################################################################################################################### # Usage @@ -38,6 +39,9 @@ doc.pl [options] [operation] =cut +my $strProjectName = 'pgBackRest'; +my $strExeName = 'pg_backrest'; + #################################################################################################################################### # DOC_RENDER_TAG - render a tag to another markup language #################################################################################################################################### @@ -59,8 +63,8 @@ my $oRenderTag = 'setting' => ['`', '`'], 'code' => ['`', '`'], 'code-block' => ['```', '```'], - 'exe' => ['ERROR - EXE NOT SET', ''], - 'backrest' => ['ERROR - TITLE NOT SET', ''], + 'exe' => [$strExeName, ''], + 'backrest' => [$strProjectName, ''], 'postgres' => ['PostgreSQL', ''] }, @@ -345,73 +349,87 @@ sub doc_write close($hFile); } -#################################################################################################################################### -# Load command line parameters and config -#################################################################################################################################### -my $bHelp = false; # Display usage -my $bVersion = false; # Display version -my $bQuiet = false; # Sets log level to ERROR -my $strLogLevel = 'info'; # Log level for tests - -GetOptions ('help' => \$bHelp, - 'version' => \$bVersion, - 'quiet' => \$bQuiet, - 'log-level=s' => \$strLogLevel) - or pod2usage(2); - -# Display version and exit if requested -if ($bHelp || $bVersion) +sub doc_out_get { - print 'pg_backrest ' . version_get() . 
" doc builder\n"; + my $oNode = shift; + my $strName = shift; + my $bRequired = shift; - if ($bHelp) + foreach my $oChild (@{$$oNode{children}}) { - print "\n"; - pod2usage(); + if ($$oChild{name} eq $strName) + { + return $oChild; + } } - exit 0; -} - -# Set console log level -if ($bQuiet) -{ - $strLogLevel = 'off'; -} - -log_level_set(undef, uc($strLogLevel)); - -#################################################################################################################################### -# Load the doc file -#################################################################################################################################### -# Initialize parser object and parse the file -my $oParser = XML::Checker::Parser->new(ErrorContext => 2, Style => 'Tree'); -my $strFile = dirname($0) . '/doc.xml'; -my $oTree; - -eval -{ - local $XML::Checker::FAIL = sub + if (!defined($bRequired) || $bRequired) { - my $iCode = shift; + confess "unable to find child node '${strName}' in node '$$oNode{name}'"; + } - die XML::Checker::error_string($iCode, @_); - }; - - $oTree = $oParser->parsefile(dirname($0) . '/doc.xml'); -}; - -# Report any error that stopped parsing -if ($@) -{ - $@ =~ s/at \/.*?$//s; # remove module line number - die "malformed xml in '$strFile}':\n" . 
trim($@); + return undef; } -#################################################################################################################################### -# Build the document from xml -#################################################################################################################################### -my $oDocIn = doc_parse(${$oTree}[0], ${$oTree}[1]); +sub doc_option_list_process +{ + my $oOptionListOut = shift; + my $strOperation = shift; + my $oOptionFoundRef = shift; + my $oOptionRuleRef = shift; + + foreach my $oOptionOut (@{$$oOptionListOut{children}}) + { + my $strOption = $$oOptionOut{param}{id}; + + # if (defined($oOptionFound{$strOption})) + # { + # confess "option ${strOption} has already been found"; + # } + + if ($strOption eq 'help' || $strOption eq 'version') + { + next; + } + + $$oOptionFoundRef{$strOption} = true; + + if (!defined($$oOptionRuleRef{$strOption}{&OPTION_RULE_TYPE})) + { + confess "unable to find option $strOption"; + } + + $$oOptionOut{field}{default} = optionDefault($strOption, $strOperation); + + if (defined($$oOptionOut{field}{default})) + { + $$oOptionOut{field}{required} = false; + + if ($$oOptionRuleRef{$strOption}{&OPTION_RULE_TYPE} eq &OPTION_TYPE_BOOLEAN) + { + $$oOptionOut{field}{default} = $$oOptionOut{field}{default} ? 'y' : 'n'; + } + } + else + { + $$oOptionOut{field}{required} = optionRequired($strOption, $strOperation); + } + + if (defined($strOperation)) + { + $$oOptionOut{field}{cmd} = true; + } + + if ($strOption eq 'cmd-remote') + { + $$oOptionOut{field}{default} = 'same as local'; + } + + # &log(INFO, "operation " . (defined($strOperation) ? $strOperation : '[undef]') . + # ", option ${strOption}, required $$oOptionOut{field}{required}" . + # ", default " . (defined($$oOptionOut{field}{default}) ? 
$$oOptionOut{field}{default} : 'undef')); + } +} sub doc_build { @@ -459,157 +477,6 @@ sub doc_build return $oOut; } -my $oDocOut = doc_build($oDocIn); - -#################################################################################################################################### -# Build commands pulled from the code -#################################################################################################################################### -# Get the option rules -my $oOptionRule = optionRuleGet(); -my %oOptionFound; - -sub doc_out_get -{ - my $oNode = shift; - my $strName = shift; - my $bRequired = shift; - - foreach my $oChild (@{$$oNode{children}}) - { - if ($$oChild{name} eq $strName) - { - return $oChild; - } - } - - if (!defined($bRequired) || $bRequired) - { - confess "unable to find child node '${strName}' in node '$$oNode{name}'"; - } - - return undef; -} - -sub doc_option_list_process -{ - my $oOptionListOut = shift; - my $strOperation = shift; - - foreach my $oOptionOut (@{$$oOptionListOut{children}}) - { - my $strOption = $$oOptionOut{param}{id}; - - # if (defined($oOptionFound{$strOption})) - # { - # confess "option ${strOption} has already been found"; - # } - - if ($strOption eq 'help' || $strOption eq 'version') - { - next; - } - - $oOptionFound{$strOption} = true; - - if (!defined($$oOptionRule{$strOption}{&OPTION_RULE_TYPE})) - { - confess "unable to find option $strOption"; - } - - $$oOptionOut{field}{default} = optionDefault($strOption, $strOperation); - - if (defined($$oOptionOut{field}{default})) - { - $$oOptionOut{field}{required} = false; - - if ($$oOptionRule{$strOption}{&OPTION_RULE_TYPE} eq &OPTION_TYPE_BOOLEAN) - { - $$oOptionOut{field}{default} = $$oOptionOut{field}{default} ? 
'y' : 'n'; - } - } - else - { - $$oOptionOut{field}{required} = optionRequired($strOption, $strOperation); - } - - if (defined($strOperation)) - { - $$oOptionOut{field}{cmd} = true; - } - - if ($strOption eq 'cmd-remote') - { - $$oOptionOut{field}{default} = 'same as local'; - } - - # &log(INFO, "operation " . (defined($strOperation) ? $strOperation : '[undef]') . - # ", option ${strOption}, required $$oOptionOut{field}{required}" . - # ", default " . (defined($$oOptionOut{field}{default}) ? $$oOptionOut{field}{default} : 'undef')); - } -} - -# Ouput general options -my $oOperationGeneralOptionListOut = doc_out_get(doc_out_get(doc_out_get($oDocOut, 'operation'), 'operation-general'), 'option-list'); -doc_option_list_process($oOperationGeneralOptionListOut); - -# Ouput commands -my $oCommandListOut = doc_out_get(doc_out_get($oDocOut, 'operation'), 'command-list'); - -foreach my $oCommandOut (@{$$oCommandListOut{children}}) -{ - my $strOperation = $$oCommandOut{param}{id}; - - my $oOptionListOut = doc_out_get($oCommandOut, 'option-list', false); - - if (defined($oOptionListOut)) - { - doc_option_list_process($oOptionListOut, $strOperation); - } - - my $oExampleListOut = doc_out_get($oCommandOut, 'command-example-list'); - - foreach my $oExampleOut (@{$$oExampleListOut{children}}) - { - if (defined($$oExampleOut{param}{title})) - { - $$oExampleOut{param}{title} = 'Example: ' . 
$$oExampleOut{param}{title}; - } - else - { - $$oExampleOut{param}{title} = 'Example'; - } - } - - # $$oExampleListOut{param}{title} = 'Examples'; -} - -# Ouput config section -my $oConfigSectionListOut = doc_out_get(doc_out_get($oDocOut, 'config'), 'config-section-list'); - -foreach my $oConfigSectionOut (@{$$oConfigSectionListOut{children}}) -{ - my $oOptionListOut = doc_out_get($oConfigSectionOut, 'config-key-list', false); - - if (defined($oOptionListOut)) - { - doc_option_list_process($oOptionListOut); - } -} - -# Mark undocumented features as processed -$oOptionFound{'no-fork'} = true; -$oOptionFound{'test'} = true; -$oOptionFound{'test-delay'} = true; - -# Make sure all options were processed -foreach my $strOption (sort(keys($oOptionRule))) -{ - if (!defined($oOptionFound{$strOption})) - { - confess "option ${strOption} was not found"; - } -} - #################################################################################################################################### # Render the document #################################################################################################################################### @@ -637,12 +504,6 @@ sub doc_render if (defined($$oDoc{param}{title})) { - if ($iDepth == 1) - { - $$oRenderTag{'markdown'}{'backrest'}[0] = $$oDoc{param}{title}; - $$oRenderTag{'markdown'}{'exe'}[0] = $$oDoc{param}{exe}; - } - $strBuffer = ('#' x $iDepth) . ' '; if (defined($$oDoc{param}{version})) @@ -650,17 +511,30 @@ sub doc_render $strBuffer .= "v$$oDoc{param}{version}: "; } - $strBuffer .= $$oDoc{param}{title}; - } + $strBuffer .= ($iDepth == 1 ? "${strProjectName} - " : '') . $$oDoc{param}{title}; - if (defined($$oDoc{param}{subtitle})) - { - if (!defined($$oDoc{param}{subtitle})) + if (defined($$oDoc{param}{date})) { - confess "subtitle not valid without title"; - } + my $strDate = $$oDoc{param}{date}; - $strBuffer .= " - " . 
$$oDoc{param}{subtitle}; + if ($strDate !~ /^((XXXX-XX-XX)|([0-9]{4}-[0-9]{2}-[0-9]{2}))$/) + { + confess "invalid date ${strDate}"; + } + + if ($strDate =~ /^X/) + { + $strBuffer .= "\n__No Release Date Set__"; + } + else + { + my @stryMonth = ('January', 'February', 'March', 'April', 'May', 'June', + 'July', 'August', 'September', 'October', 'November', 'December'); + + $strBuffer .= "\n__Released " . $stryMonth[(substr($strDate, 5, 2) - 1)] . ' ' . + (substr($strDate, 8, 2) + 0) . ', ' . substr($strDate, 0, 4) . '__'; + } + } } if ($strBuffer ne "") @@ -773,5 +647,161 @@ sub doc_render return $strBuffer; } +#################################################################################################################################### +# Load command line parameters and config +#################################################################################################################################### +my $bHelp = false; # Display usage +my $bVersion = false; # Display version +my $bQuiet = false; # Sets log level to ERROR +my $strLogLevel = 'info'; # Log level for tests + +GetOptions ('help' => \$bHelp, + 'version' => \$bVersion, + 'quiet' => \$bQuiet, + 'log-level=s' => \$strLogLevel) + or pod2usage(2); + +# Display version and exit if requested +if ($bHelp || $bVersion) +{ + print 'pg_backrest ' . version_get() .
" doc builder\n"; + + if ($bHelp) + { + print "\n"; + pod2usage(); + } + + exit 0; +} + +# Set console log level +if ($bQuiet) +{ + $strLogLevel = 'off'; +} + +log_level_set(undef, uc($strLogLevel)); + +my $strBasePath = abs_path(dirname($0)); + +sub doc_process +{ + my $strXmlIn = shift; + my $strMdOut = shift; + my $bManual = shift; + +#################################################################################################################################### +# Load the doc file +#################################################################################################################################### +# Initialize parser object and parse the file +my $oParser = XML::Checker::Parser->new(ErrorContext => 2, Style => 'Tree'); +$oParser->set_sgml_search_path("${strBasePath}/xml/dtd"); + +my $oTree; + +eval +{ + local $XML::Checker::FAIL = sub + { + my $iCode = shift; + + die XML::Checker::error_string($iCode, @_); + }; + + $oTree = $oParser->parsefile($strXmlIn); +}; + +# Report any error that stopped parsing +if ($@) +{ + $@ =~ s/at \/.*?$//s; # remove module line number + die "malformed xml in '${strXmlIn}':\n" . 
trim($@); +} + +#################################################################################################################################### +# Build the document from xml +#################################################################################################################################### +my $oDocIn = doc_parse(${$oTree}[0], ${$oTree}[1]); + +my $oDocOut = doc_build($oDocIn); + +#################################################################################################################################### +# Build commands pulled from the code +#################################################################################################################################### +if ($bManual) +{ +# Get the option rules +my $oOptionRule = optionRuleGet(); +my %oOptionFound; + +# Ouput general options +my $oOperationGeneralOptionListOut = doc_out_get(doc_out_get(doc_out_get($oDocOut, 'operation'), 'operation-general'), 'option-list'); +doc_option_list_process($oOperationGeneralOptionListOut, undef, \%oOptionFound, $oOptionRule); + +# Ouput commands +my $oCommandListOut = doc_out_get(doc_out_get($oDocOut, 'operation'), 'command-list'); + +foreach my $oCommandOut (@{$$oCommandListOut{children}}) +{ + my $strOperation = $$oCommandOut{param}{id}; + + my $oOptionListOut = doc_out_get($oCommandOut, 'option-list', false); + + if (defined($oOptionListOut)) + { + doc_option_list_process($oOptionListOut, $strOperation, \%oOptionFound, $oOptionRule); + } + + my $oExampleListOut = doc_out_get($oCommandOut, 'command-example-list'); + + foreach my $oExampleOut (@{$$oExampleListOut{children}}) + { + if (defined($$oExampleOut{param}{title})) + { + $$oExampleOut{param}{title} = 'Example: ' . 
$$oExampleOut{param}{title}; + } + else + { + $$oExampleOut{param}{title} = 'Example'; + } + } + + # $$oExampleListOut{param}{title} = 'Examples'; +} + +# Ouput config section +my $oConfigSectionListOut = doc_out_get(doc_out_get($oDocOut, 'config'), 'config-section-list'); + +foreach my $oConfigSectionOut (@{$$oConfigSectionListOut{children}}) +{ + my $oOptionListOut = doc_out_get($oConfigSectionOut, 'config-key-list', false); + + if (defined($oOptionListOut)) + { + doc_option_list_process($oOptionListOut, undef, \%oOptionFound, $oOptionRule); + } +} + +# Mark undocumented features as processed +$oOptionFound{'no-fork'} = true; +$oOptionFound{'test'} = true; +$oOptionFound{'test-delay'} = true; + +# Make sure all options were processed +foreach my $strOption (sort(keys($oOptionRule))) +{ + if (!defined($oOptionFound{$strOption})) + { + confess "option ${strOption} was not found"; + } +} +} + # Write markdown -doc_write(dirname($0) . '/../README.md', doc_render($oDocOut, 'markdown', 1)); +doc_write($strMdOut, doc_render($oDocOut, 'markdown', 1)); +} + +doc_process("${strBasePath}/xml/readme.xml", "${strBasePath}/../README.md", false); +doc_process("${strBasePath}/xml/userguide.xml", "${strBasePath}/../USERGUIDE.md", true); +doc_process("${strBasePath}/xml/changelog.xml", "${strBasePath}/../CHANGELOG.md", false); diff --git a/doc/doc.xml b/doc/doc.xml deleted file mode 100644 index cc95fbe0b..000000000 --- a/doc/doc.xml +++ /dev/null @@ -1,1041 +0,0 @@ - - - - - aims to be a simple backup and restore system that can seamlessly scale up to the largest databases and workloads. - - Primary features: -
    -
  • Local or remote backup
  • Multi-threaded backup/restore for performance
  • Checksums
  • Safe backups (checks that logs required for consistency are present before backup completes)
  • Full, differential, and incremental backups
  • Backup rotation (and minimum retention rules with optional separate retention for archive)
  • In-stream compression/decompression
  • Archiving and retrieval of logs for replicas/restores built in
  • Async archiving for very busy systems (including space limits)
  • Backup directories are consistent clusters (when hardlinks are on and compression is off)
  • Tablespace support
  • Restore delta option
  • Restore using timestamp/size or checksum
  • Restore remapping base/tablespaces
  • Support for PostgreSQL >= 8.3
- Instead of relying on traditional backup tools like tar and rsync, implements all backup features internally and uses a custom protocol for communicating with remote systems. Removing reliance on tar and rsync allows for better solutions to database-specific backup issues. The custom remote protocol limits the types of connections that are required to perform a backup which increases security. - - uses the gitflow model of development. This means that the master branch contains only the release history, i.e. each commit represents a single release and release tags are always from the master branch. The dev branch contains a single commit for each feature or fix and more accurately depicts the development history. Actual development is done on feature (dev_*) branches and squashed into dev after regression tests have passed. In this model dev is considered stable and can be released at any time. As such, the dev branch does not have any special version modifiers.
-
- - - is written entirely in Perl. Some additional modules will need to be installed depending on the OS. - - - - * Install required Perl modules: - - apt-get install libdbd-pg-perl - - - - - * Install Perl and required modules: - - yum install perl perl-Time-HiRes perl-IO-String perl-parent perl-JSON perl-Digest-SHA perl-DBD-Pg - - - - - can be installed by downloading the most recent release: - - https://github.com/pgmasters/backrest/releases - - can be installed anywhere but it's best (though not required) to install it in the same location on all systems. - - - - * Create the backrest user - - The backrest user must be created on the same system and in the same group as the user you will use for testing (which can be any user you prefer). For example: - - adduser -g <test-user-group> backrest - - * Setup password-less SSH login between the test user and the backrest user - - The test user should be able to `ssh backrest@127.0.0.1` and the backrest user should be able to `ssh <testuser>@127.0.0.1` without requiring any passwords. This article (http://archive.oreilly.com/pub/h/66) has details on how to accomplish this. Do the logons both ways at the command line before running regression tests. - - * Give group read and execute permissions to ~/backrest/test: - - Usually this can be accomplished by running the following as the test user: - - chmod 750 ~ - - * Running regression: - - Running the full regression suite is generally not necessary. Run the following first: - - ./test.pl --module=backup --test=full --db-version=all --thread-max=<# threads> - - This will run full backup/restore regression with a variety of options on all installed versions of . If you are only interested in one version then modify the db-version setting to X.X (e.g. 9.4). --thread-max can be omitted if you are running single-threaded. 
- - If there are errors in this test then run full regression to help isolate problems: - - ./test.pl --db-version=all --thread-max=<# threads> - - Report regression test failures at https://github.com/pgmasters/backrest/issues. - - - - - - - can be used entirely with command-line parameters but a configuration file is more practical for installations that are complex or set a lot of options. The default location for the configuration file is /etc/pg_backrest.conf. - - - - Modify the following settings in postgresql.conf: - - wal_level = archive - archive_mode = on - archive_command = '/path/to/backrest/bin/ --stanza=db archive-push %p' - - Replace the path with the actual location where was installed. The stanza parameter should be changed to the actual stanza name for your database. - - - - - The absolute minimum required to run (if all defaults are accepted) is the database path. - - /etc/pg_backrest.conf: - - [main] - db-path=/data/db - - The db-path option could also be provided on the command line, but it's best to use a configuration file as options tend to pile up quickly. - - - - This configuration is appropriate for a small installation where backups are being made locally or to a remote file system that is mounted locally. A number of additional options are set: -
    -
  • db-port - Custom port for PostgreSQL.
  • compress - Disable compression (handy if the file system is already compressed).
  • repo-path - Path to the repository where backups and WAL archive are stored.
  • log-level-file - Set the file log level to debug (lots of extra info if something is not working as expected).
  • hardlink - Create hardlinks between backups (but never between full backups).
  • thread-max - Use 2 threads for backup/restore operations.
- /etc/pg_backrest.conf: - - [global:general] - compress=n - repo-path=/path/to/db/repo - - [global:log] - log-level-file=debug - - [global:backup] - hardlink=y - thread-max=2 - - [main] - db-path=/data/db - db-port=5555 - -
-
- - - This configuration is appropriate for a small installation where backups are being made remotely. Make sure that postgres@db-host has trusted ssh to backrest@backup-host and vice versa. This configuration assumes that you have in the same path on both servers. - - /etc/pg_backrest.conf on the db host: - - [global:general] - repo-path=/path/to/db/repo - repo-remote-path=/path/to/backup/repo - - [global:backup] - backup-host=backup.mydomain.com - backup-user=backrest - - [global:archive] - archive-async=y - - [main] - db-path=/data/db - - /etc/pg_backrest.conf on the backup host: - - [global:general] - repo-path=/path/to/backup/repo - - [main] - db-host=db.mydomain.com - db-path=/data/db - db-user=postgres - - - -
- - - - - The command section defines the location of external commands that are used by . - - - - - Defines the location of pg_backrest_remote.pl. - - Required only if the path to pg_backrest_remote.pl is different on the local and remote systems. If not defined, the remote path will be assumed to be the same as the local path. - - same as local - /usr/lib/backrest/bin/pg_backrest_remote.pl - - - - - - - The log section defines logging-related settings. The following log levels are supported: -
    -
  • off - No logging at all (not recommended)
  • error - Log only errors
  • warn - Log warnings and errors
  • info - Log info, warnings, and errors
  • debug - Log debug, info, warnings, and errors
  • trace - Log trace (very verbose debugging), debug, info, warnings, and errors
- - - - - Sets file log level. - - debug - - - - - Sets console log level. - - error - - -
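The cumulative behavior of the log levels listed above (each level also logs everything from the less verbose levels) can be sketched as a simple rank comparison. This is only an illustration: `log_level_allows` is a hypothetical helper, not a function from the pgBackRest source, and the actual logging code may be organized differently.

```perl
use strict;
use warnings;

# Levels in increasing verbosity, in the order given in the log section
my @stryLevel = qw(off error warn info debug trace);
my %oLevelRank;
@oLevelRank{@stryLevel} = (0 .. $#stryLevel);

# Hypothetical helper (an assumption for illustration): returns 1 when a message
# at $strMessageLevel would be written for a configured $strConfigLevel, else 0
sub log_level_allows
{
    my ($strConfigLevel, $strMessageLevel) = @_;

    for my $strLevel ($strConfigLevel, $strMessageLevel)
    {
        die "invalid log level '${strLevel}'" if !defined($oLevelRank{$strLevel});
    }

    # 'off' only makes sense as a configured level, never as a message level
    die "messages cannot be logged at level 'off'" if $strMessageLevel eq 'off';

    return $oLevelRank{$strMessageLevel} <= $oLevelRank{$strConfigLevel} ? 1 : 0;
}

print log_level_allows('info', 'warn'), "\n";     # warnings are included at info
print log_level_allows('error', 'debug'), "\n";   # debug is excluded at error
```

With the levels ranked this way, setting a level to `off` suppresses every message because no message level ranks at or below it.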
- - - - The general section defines settings that are shared between multiple operations. - - - - - Set the buffer size used for copy, compress, and uncompress functions. A maximum of 3 buffers will be in use at a time per thread. An additional maximum of 256K per thread may be used for zlib buffers. - - 16384 - 8388608 - 32768 - - - - - Enable gzip compression. Backup files are compatible with command-line gzip tools. - - n - - - - - Sets the zlib level to be used for file compression when compress=y. - - 0-9 - 9 - - - - - Sets the zlib level to be used for protocol compression when compress=n and the database is not on the same host as the backup. Protocol compression is used to reduce network traffic but can be disabled by setting compress-level-network=0. When compress=y the compress-level-network setting is ignored and compress-level is used instead so that the file is only compressed once. SSH compression is always disabled. - - 0-9 - 1 - - - - - Sets the umask to 0000 so modes in the repository are created in a sensible way. The default directory mode is 0750 and default file mode is 0640. The lock and log directories set the directory and file mode to 0770 and 0660 respectively. - - To use the executing user's umask instead specify neutral-umask=n in the config file or --no-neutral-umask on the command line. - - n - - - - - Path to the backrest repository where WAL segments, backups, logs, etc. are stored. - - /data/db/backrest - - - - - Path to the remote backrest repository where WAL segments, backups, logs, etc. are stored. - - /backup/backrest - - - - - - - The backup section defines settings related to backup. - - - - - Sets the backup host when backing up remotely via SSH. Make sure that trusted SSH authentication is configured between the db host and the backup host. - - When backing up to a locally mounted network filesystem this setting is not required. - - backup.domain.com - - - - - Sets user account on the backup host.
- - backrest - - - - - Forces a checkpoint (by passing true to the fast parameter of pg_start_backup()) so the backup begins immediately. Otherwise the backup will start after the next regular checkpoint. - - y - - - - - Enable hard-linking of files in differential and incremental backups to their full backups. This gives the appearance that each backup is a full backup. Be careful, though, because modifying files that are hard-linked can affect all the backups in the set. - - y - - - - - Defines how often the manifest will be saved during a backup (in bytes). Saving the manifest is important because it stores the checksums and allows the resume function to work efficiently. The actual threshold used is 1% of the backup size or manifest-save-threshold, whichever is greater. - - 5368709120 - - - - - Defines whether the resume feature is enabled. Resume can greatly reduce the amount of time required to run a backup after a previous backup of the same type has failed. It adds complexity, however, so it may be desirable to disable in environments that do not require the feature. - false - - - - - Defines the number of threads to use for backup or restore. Each thread will perform compression and transfer to make the backup run faster, but don't set thread-max so high that it impacts database performance during backup. - 4 - - - - - Maximum amount of time (in seconds) that a backup thread should run. This limits the amount of time that a thread might be stuck due to unforeseen issues during the backup. Has no effect when thread-max=1. - - 3600 - - - - - Checks that all WAL segments required to make the backup consistent are present in the WAL archive. It's a good idea to leave this as the default unless you are using another method for archiving. - - n - - - - - Store WAL segments required to make the backup consistent in the backup's pg_xlog path. This slightly paranoid option protects against corruption or premature expiration in the WAL segment archive.
PITR won't be possible without the WAL segment archive and this option also consumes more space. - - Even though WAL segments will be restored with the backup, will ignore them if a recovery.conf file exists and instead use archive_command to fetch WAL segments. Specifying type=none when restoring will not create recovery.conf and force to use the WAL segments in pg_xlog. This will get the database to a consistent state. - - y - - - - - - - The archive section defines parameters when doing async archiving. This means that the archive files will be stored locally, then a background process will pick them and move them to the backup. - - - - - Archive WAL segments asynchronously. WAL segments will be copied to the local repo, then a process will be forked to compress the segment and transfer it to the remote repo if configured. Control will be returned to as soon as the WAL segment is copied locally. - y - - - - - Limits the amount of archive log that will be written locally when archive-async=y. After the limit is reached, the following will happen: -
    -
  1. pgBackRest will notify Postgres that the archive was successfully backed up, then DROP IT.
  2. An error will be logged to the console and also to the Postgres log.
  3. A stop file will be written in the lock directory and no more archive files will be backed up until it is removed.
- If this occurs then the archive log stream will be interrupted and PITR will not be possible past that point. A new backup will be required to regain full restore capability. - - The purpose of this feature is to prevent the log volume from filling up at which point Postgres will stop completely. Better to lose the backup than have the database go down. - - To start normal archiving again you'll need to remove the stop file which will be located at ${repo-path}/lock/${stanza}-archive.stop where ${repo-path} is the path set in the general section, and ${stanza} is the backup stanza.
- - 1024 -
-
-
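The stop file location given above, ${repo-path}/lock/${stanza}-archive.stop, can be built from the two settings as in the following minimal sketch. `archive_stop_file` is a hypothetical helper written for this illustration, and the repo path and stanza name are the sample values used in the configuration examples, not values read from a real installation.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration): build the stop file path
# from the repo-path setting and the stanza name, following the pattern
# ${repo-path}/lock/${stanza}-archive.stop
sub archive_stop_file
{
    my ($strRepoPath, $strStanza) = @_;

    return "${strRepoPath}/lock/${strStanza}-archive.stop";
}

# Sample values taken from the configuration examples
my $strStopFile = archive_stop_file('/path/to/db/repo', 'main');

print "stop file: ${strStopFile}\n";

# Archiving stays disabled for as long as the stop file exists
print((-e $strStopFile) ? "archiving is stopped\n" : "no stop file present\n");
```

Checking for this file before removing it is a cheap way to confirm that archiving was actually halted by the space limit rather than by some other error.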
- - - - The restore section defines settings used for restoring backups. - - - - - Defines whether tablespaces will be restored into their original (or remapped) locations or stored directly under the pg_tblspc path. Disabling this setting produces compact restores that are convenient for development, staging, etc. Currently these restores cannot be backed up as pgBackRest expects only links in the pg_tblspc path. If no tablespaces are present this setting has no effect. - n - - - - - - - The expire section defines how long backups will be retained. Expiration only occurs when the number of complete backups exceeds the allowed retention. In other words, if retention-full is set to 2, then there must be 3 complete backups before the oldest will be expired. Make sure you always have enough space for retention + 1 backups. - - - - - Number of full backups to keep. When a full backup expires, all differential and incremental backups associated with the full backup will also expire. When not defined then all full backups will be kept. - - 2 - - - - - Number of differential backups to keep. When a differential backup expires, all incremental backups associated with the differential backup will also expire. When not defined all differential backups will be kept. - - 3 - - - - - Type of backup to use for archive retention (full or differential). If set to full, then pgBackRest will keep archive logs for the number of full backups defined by retention-archive. If set to differential, then pgBackRest will keep archive logs for the number of differential backups defined by retention-archive. - - If not defined then archive logs will be kept indefinitely. In general it is not useful to keep archive logs that are older than the oldest backup, but there may be reasons for doing so. - - diff - - - - - Number of backups worth of archive log to keep. If this is set less than your backup retention then be sure you set archive-copy=y or you won't be able to restore some older backups.
- - For example, if retention-archive=2 and retention-full=4, then any backups older than the most recent two full backups will not have WAL segments in the archive to make them consistent. To solve this, set archive-copy=y and use type=none when restoring. This issue will be addressed in a future release but for now be careful with this setting. - - 2 - - - - - - - A stanza defines a backup for a specific database. The stanza section must define the base database path and host/user if the database is remote. Also, any global configuration sections can be overridden to define stanza-specific settings. - - - - - Define the database host. Used for backups where the database host is different from the backup host. - - db.domain.com - - - - - Defines the logon user when db-host is defined. This user will also own the remote process and will initiate connections to PostgreSQL. For this to work correctly the user should be the cluster owner which is generally postgres, the default. - - test_user - - - - - Path to the db data directory (data_directory setting in postgresql.conf). - - /data/db - - - - - Port that PostgreSQL is running on. This usually does not need to be specified as most clusters run on the default port. - - 6543 - - - - - The unix socket directory that was specified when PostgreSQL was started. pgBackRest will automatically look in the standard location for your OS so there is usually no need to specify this setting unless the socket directory was explicitly modified with the unix_socket_directory setting in postgresql.conf. - - /var/run/postgresql - - -
-
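The retention rule from the expire section (expiration only occurs when the number of complete backups exceeds the allowed retention) can be illustrated with a short sketch. `full_backup_expire_list` and the backup labels are hypothetical and exist only for this example. The real expire code also removes the differential and incremental backups that depend on an expired full backup, which is not modeled here.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration): given full backup labels
# ordered oldest to newest and the retention-full value, return the labels that
# would be expired
sub full_backup_expire_list
{
    my ($stryBackup, $iRetention) = @_;

    # When retention is not defined all full backups are kept
    return () if !defined($iRetention);

    # Only backups beyond the allowed retention are candidates for expiration
    my $iExpireTotal = @{$stryBackup} - $iRetention;

    return $iExpireTotal > 0 ? @{$stryBackup}[0 .. $iExpireTotal - 1] : ();
}

# With retention-full=2, two backups are kept and nothing expires
my @stryExpireTwo = full_backup_expire_list(['full-1', 'full-2'], 2);
print 'expired with two backups: ' . scalar(@stryExpireTwo) . "\n";

# A third complete backup pushes the oldest one out
my @stryExpireThree = full_backup_expire_list(['full-1', 'full-2', 'full-3'], 2);
print "expired with three backups: @stryExpireThree\n";
```

This is why the repository needs space for retention + 1 backups: the newest backup must complete before the oldest one is removed.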
- - - - These options are either global or used by all commands. - - - - - - - - - - - - - - - - - - - - Perform a database backup. does not have a built-in scheduler so it's best to run it from cron or some other scheduling mechanism. - - - - - - - - - - - - - - - -/path/to/ --stanza=db --type=full backup - -Run a full backup on the db stanza. --type can also be set to incr or diff for incremental or differential backups. However, if no full backup exists then a full backup will be forced even if incr or diff is requested. - - - - - - - Archive a WAL segment to the repository. - - - - - /path/to/ --stanza=db archive-push %p - - Accepts a WAL segment from and archives it in the repository defined by repo-path. %p is how specifies the location of the WAL segment to be archived. - - - - - - - Get a WAL segment from the repository. - - - - - /path/to/ --stanza=db archive-get %f %p - - Retrieves a WAL segment from the repository. This command is used in recovery.conf to restore a backup, perform PITR, or as an alternative to streaming for keeping a replica up to date. %f is how specifies the WAL segment it needs and %p is the location where it should be copied. - - - - - - - does backup rotation, but is not concerned with when the backups were created. So if two full backups are configured for retention, will keep two full backups no matter whether they occur, two hours apart or two weeks apart. - - - - - /path/to/ --stanza=db expire - - Expire (rotate) any backups that exceed the defined retention. Expiration is run automatically after every successful backup, so there is no need to run this command separately unless you have reduced retention, usually to free up some space. - - - - - - - Perform a database restore. This command is generally run manually, but there are instances where it might be automated. 
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - /path/to/ --stanza=db --type=name --target=release restore - - Restores the latest database backup and then recovers to the release restore point. - - - - - - - Retrieve information about backups for a single stanza or for all stanzas. Text output is the default and gives a human-readable summary of backups for the stanza(s) requested. This format is subject to change with any release. - - For machine-readable output use --output=json. The JSON output contains far more information than the text output, however this feature is currently experimental so the format may change between versions. - - - - - - - - - - /path/to/ --stanza=db --output=json info - - - Get information about backups in the db stanza. - - - - - - - - - - - - Fixed an issue that caused the formatted timestamp for both the oldest and newest backups to be reported as the current time by the info command. Only text output was affected -- json output reported the correct epoch values. Found by Michael Renner. - - - Fixed protocol issue that was preventing ssh errors (especially on connection) from being logged. - - - Now using Perl DBI for connections to rather than psql. The cmd-psql and cmd-psql-option settings have been removed and replaced with db-port and db-socket-path. - - - Remove pg_control file at the beginning of the restore and copy it back at the very end. This prevents the possibility that a partial restore can be started by . - - - The repository is now created and updated with consistent directory and file modes. By default umask is set to 0000 but this can be disabled with the neutral-umask setting - - - Added checks to be sure the db-path setting is consistent with db-port by comparing the data_directory as reported by the cluster against the db-path setting and the version as reported by the cluster against the value read from pg_control. The db-socket-path setting is checked to be sure it is an absolute path. 
- - - Major refactoring of the protocol layer to support future development. - - - Added vagrant test configurations for Ubuntu 14.04 and CentOS 7. - - - Experimental support for 9.5 alpha1. This may break when the control version or WAL magic changes in future versions but will be updated in each release to keep pace. All regression tests pass except for --target-resume tests (this functionality has changed in 9.5) and there is no testing yet for .partial WAL segments. - - - - - - - - Removed dependency on CPAN packages for multi-threaded operation. While it might not be a bad idea to update the threads and Thread::Queue packages, it is no longer necessary. - - - Added vagrant test configurations for Ubuntu 12.04 and CentOS 6. - - - Modified wait backoff to use a Fibonacci rather than geometric sequence. This will make wait time grow less aggressively while still giving reasonable values. - - - More options for regression tests and improved code to run in a variety of environments. - - - - - - - - Removed pg_backrest_remote and added the functionality to pg_backrest as the remote command. - - - Added file and directory syncs to the File object for additional safety during backup/restore and archiving. Suggested by Andres Freund. - - - Support for Perl 5.10.1 and OpenSSH 5.3 which are default for CentOS/RHEL 6. Found by Eric Radman. - - - Improved error message when backup is run without archive_command set and without --no-archive-check specified. Found by Eric Radman. - - - Moved version number out of the VERSION file to Version.pm to better support packaging. Suggested by Michael Renner. - - - Replaced IPC::System::Simple and Net::OpenSSH with IPC::Open3 to eliminate CPAN dependency for multiple distros. - - - - - - - - IMPORTANT NOTE: This flag day release breaks compatibility with older versions of . The manifest format, on-disk structure, and the binary names have all changed. 
You must create a new repository to hold backups for this version of and keep your older repository for a time in case you need to do a restore. The pg_backrest.conf file has not changed but you'll need to change any references to pg_backrest.pl in cron (or elsewhere) to pg_backrest (without the .pl extension). - - - Add info command. - - - More efficient file ordering for backup. Files are copied in descending size order so a single thread does not end up copying a large file at the end. This had already been implemented for restore. - - - Logging now uses unbuffered output. This should make log files that are being written by multiple threads less chaotic. Suggested by Michael Renner. - - - Experimental support for 9.5. This may break when the control version or WAL magic changes but will be updated in each release. - - - - - - - - Fixed an issue where archive-copy would fail on an incr/diff backup when hardlink=n. In this case the pg_xlog path does not already exist and must be created. Reported by Michael Renner - - - Allow duplicate WAL segments to be archived when the checksum matches. This is necessary for some recovery scenarios. - - - Allow comments/disabling in pg_backrest.conf using #. Suggested by Michael Renner. - - - Better logging before pg_start_backup() to make it clear when the backup is waiting on a checkpoint. Suggested by Michael Renner. - - - Various command behavior, help and logging fixes. Reported by Michael Renner. - - - Fixed an issue in async archiving where archive-push was not properly returning 0 when archive-max-mb was reached and moved the async check after transfer to avoid having to remove the stop file twice. Also added unit tests for this case and improved error messages to make it clearer to the user what went wrong. Reported by Michael Renner. - - - Fixed a locking issue that could allow multiple operations of the same type against a single stanza. 
This appeared to be benign in terms of data integrity but caused spurious errors while archiving and could lead to errors in backup/restore. Reported by Michael Renner. - - - Replaced JSON module with JSON::PP which ships with core Perl. - - - - - - - - Better resume support. Resumed files are checked to be sure they have not been modified and the manifest is saved more often to preserve checksums as the backup progresses. More unit tests to verify each resume case. - - - Resume is now optional. Use the resume setting or --no-resume from the command line to disable. - - - More info messages during restore. Previously, most of the restore messages were debug level so not a lot was output in the log. - - - Fixed an issue where an absolute path was not written into recovery.conf when the restore was run with a relative path. - - - Added tablespace setting to allow tablespaces to be restored into the pg_tblspc path. This produces compact restores that are convenient for development, staging, etc. Currently these restores cannot be backed up as expects only links in the pg_tblspc path. - - - - - - - - Fixed a buffering error that could occur on large, highly-compressible files when copying to an uncompressed remote destination. The error was detected in the decompression code and resulted in a failed backup rather than corruption so it should not affect successful backups made with previous versions. - - - - - - - - Pushing duplicate WAL now generates an error. This worked before only if checksums were disabled. - - - Database System IDs are used to make sure that all WAL in an archive matches up. This should help prevent misconfigurations that send WAL from multiple clusters to the same archive. - - - Regression tests working back to 8.3. - - - Improved threading model by starting threads early and terminating them late. - - - - - - - - Added restore functionality. - - - All options can now be set on the command-line making pg_backrest.conf optional. 
- - - De/compression is now performed without threads and checksum/size is calculated in stream. That means file checksums are no longer optional. - - - Added option --no-start-stop to allow backups when Postgres is shut down. If postmaster.pid is present then --force is required to make the backup run (though if Postgres is running an inconsistent backup will likely be created). This option was added primarily for the purpose of unit testing, but there may be applications in the real world as well. - - - Fixed broken checksums and now they work with normal and resumed backups. Finally realized that checksums and checksum deltas should be functionally separated and this simplified a number of things. Issue #28 has been created for checksum deltas. - - - Fixed an issue where a backup could be resumed from an aborted backup that didn't have the same type and prior backup. - - - Removed dependency on Moose. It wasn't being used extensively and makes for longer startup times. - - - Checksum for backup.manifest to detect corrupted/modified manifest. - - - Link latest always points to the last backup. This has been added for convenience and to make restores simpler. - - - More comprehensive unit tests in all areas. - - - - - - - - Complete rewrite of BackRest::File module to use a custom protocol for remote operations and Perl native GZIP and SHA operations. Compression is performed in threads rather than forked processes. - - - Fairly comprehensive unit tests for all the basic operations. More work to be done here for sure, but then there is always more work to be done on unit tests. - - - Removed dependency on Storable and replaced with a custom ini file implementation. - - - Added much needed documentation - - - Numerous other changes that can only be identified with a diff. - - - - - - - - Working on improving error handling in the file object. 
This is not complete, but works well enough to find a few errors that have been causing us problems (notably, find is occasionally failing building the archive async manifest when system is under load). - - - Found and squashed a nasty bug where file_copy() was defaulted to ignore errors. There was also an issue in file_exists that was causing the test to fail when the file actually did exist. Together they could have resulted in a corrupt backup with no errors, though it is very unlikely. - - - - - - - - The archive-get operation returns a 1 when the archive file is missing to differentiate from hard errors (ssh connection failure, file copy error, etc.) This lets Postgres know that that the archive stream has terminated normally. However, this does not take into account possible holes in the archive stream. - - - - - - - - If an archive directory which should be empty could not be deleted backrest was throwing an error. There's a good fix for that coming, but for the time being it has been changed to a warning so processing can continue. This was impacting backups as sometimes the final archive file would not get pushed if the first archive file had been in a different directory (plus some bad luck). - - - - - - - - Added RequestTTY=yes to ssh sessions. Hoping this will prevent random lockups. - - - - - - - - Added archive-get functionality to aid in restores. - - - Added option to force a checkpoint when starting the backup start-fast=y. - - - - - - - - Removed master_stderr_discard option on database SSH connections. There have been occasional lockups and they could be related to issues originally seen in the file code. - - - Changed lock file conflicts on backup and expire commands to ERROR. They were set to DEBUG due to a copy-and-paste from the archive locks. - - - - - - - - No restore functionality, but the backup directories are consistent Postgres data directories. You'll need to either uncompress the files or turn off compression in the backup. 
Uncompressed backups on a ZFS (or similar) filesystem are a good option because backups can be restored locally via a snapshot to create logical backups or do spot data recovery. - - - Archiving is single-threaded. This has not posed an issue on our multi-terabyte databases with heavy write volume. Recommend a large WAL volume or to use the async option with a large volume nearby. - - - Backups are multi-threaded, but the Net::OpenSSH library does not appear to be 100% thread-safe so it will very occasionally lock up on a thread. There is an overall process timeout that resolves this issue by killing the process. Yes, very ugly. - - - Checksums are lost on any resumed backup. Only the final backup will record checksum on multiple resumes. Checksums from previous backups are correctly recorded and a full backup will reset everything. - - - The backup.manifest is being written as Storable because Config::IniFile does not seem to handle large files well. Would definitely like to save these as human-readable text. - - - Absolutely no documentation (outside the code). Well, excepting these release notes. - - - - - - - - Primary recognition goes to Stephen Frost for all his valuable advice and criticism during the development of . - - Crunchy Data Solutions (http://www.crunchydata.com) has contributed time and resources to and continues to support development. Resonate (http://www.resonate.com/) also contributed to the development of and allowed me to install early (but well tested) versions as their primary backup solution. - -
diff --git a/doc/xml/changelog.xml b/doc/xml/changelog.xml new file mode 100644 index 000000000..b82e96e1b --- /dev/null +++ b/doc/xml/changelog.xml @@ -0,0 +1,313 @@ + + + + + + + + + + + + Fixed an issue that caused the formatted timestamp for both the oldest and newest backups to be reported as the current time by the info command. Only text output was affected -- json output reported the correct epoch values. Reported by Michael Renner. + + + Fixed protocol issue that was preventing ssh errors (especially on connection) from being logged. + + + Now using Perl DBI for connections to rather than psql. The cmd-psql and cmd-psql-option settings have been removed and replaced with db-port and db-socket-path. + + + Remove pg_control file at the beginning of the restore and copy it back at the very end. This prevents the possibility that a partial restore can be started by . + + + The repository is now created and updated with consistent directory and file modes. By default umask is set to 0000 but this can be disabled with the neutral-umask setting. + + + Added checks to be sure the db-path setting is consistent with db-port by comparing the data_directory as reported by the cluster against the db-path setting and the version as reported by the cluster against the value read from pg_control. The db-socket-path setting is checked to be sure it is an absolute path. + + + Experimental support for 9.5 alpha1. This may break when the control version or WAL magic changes in future versions but will be updated in each release to keep pace. All regression tests pass except for --target-resume tests (this functionality has changed in 9.5) and there is no testing yet for .partial WAL segments. + + + Major refactoring of the protocol layer to support future development. + + + Added vagrant test configurations for Ubuntu 14.04 and CentOS 7. + + + Split most of README.md out into USERGUIDE.md and CHANGELOG.md because it was becoming unwieldy. 
+ + + + + + + + Removed dependency on CPAN packages for multi-threaded operation. While it might not be a bad idea to update the threads and Thread::Queue packages, it is no longer necessary. + + + Added vagrant test configurations for Ubuntu 12.04 and CentOS 6. + + + Modified wait backoff to use a Fibonacci rather than geometric sequence. This will make wait time grow less aggressively while still giving reasonable values. + + + More options for regression tests and improved code to run in a variety of environments. + + + + + + + + Removed pg_backrest_remote and added the functionality to pg_backrest as the remote command. + + + Added file and directory syncs to the File object for additional safety during backup/restore and archiving. Suggested by Andres Freund. + + + Support for Perl 5.10.1 and OpenSSH 5.3 which are default for CentOS/RHEL 6. Reported by Eric Radman. + + + Improved error message when backup is run without archive_command set and without --no-archive-check specified. Reported by Eric Radman. + + + Moved version number out of the VERSION file to Version.pm to better support packaging. Suggested by Michael Renner. + + + Replaced IPC::System::Simple and Net::OpenSSH with IPC::Open3 to eliminate CPAN dependency for multiple operating systems. + + + + + + + + IMPORTANT NOTE: This flag day release breaks compatibility with older versions of . The manifest format, on-disk structure, and the binary names have all changed. You must create a new repository to hold backups for this version of and keep your older repository for a time in case you need to do a restore. The pg_backrest.conf file has not changed but you'll need to change any references to pg_backrest.pl in cron (or elsewhere) to pg_backrest (without the .pl extension). + + + Add info command. + + + More efficient file ordering for backup. Files are copied in descending size order so a single thread does not end up copying a large file at the end. This had already been implemented for restore. 
+ + + Logging now uses unbuffered output. This should make log files that are being written by multiple threads less chaotic. Suggested by Michael Renner. + + + Experimental support for 9.5. This may break when the control version or WAL magic changes but will be updated in each release. + + + + + + + + Fixed an issue where archive-copy would fail on an incr/diff backup when hardlink=n. In this case the pg_xlog path does not already exist and must be created. Reported by Michael Renner. + + + Allow duplicate WAL segments to be archived when the checksum matches. This is necessary for some recovery scenarios. + + + Allow comments/disabling in pg_backrest.conf using the # character. Only # characters in the first character of the line are honored. Suggested by Michael Renner. + + + Better logging before pg_start_backup() to make it clear when the backup is waiting on a checkpoint. Suggested by Michael Renner. + + + Various command behavior, help and logging fixes. Reported by Michael Renner. + + + Fixed an issue in async archiving where archive-push was not properly returning 0 when archive-max-mb was reached and moved the async check after transfer to avoid having to remove the stop file twice. Also added unit tests for this case and improved error messages to make it clearer to the user what went wrong. Reported by Michael Renner. + + + Fixed a locking issue that could allow multiple operations of the same type against a single stanza. This appeared to be benign in terms of data integrity but caused spurious errors while archiving and could lead to errors in backup/restore. Reported by Michael Renner. + + + Replaced JSON module with JSON::PP which ships with core Perl. + + + + + + + + Better resume support. Resumed files are checked to be sure they have not been modified and the manifest is saved more often to preserve checksums as the backup progresses. More unit tests to verify each resume case. + + + Resume is now optional.
Use the resume setting or --no-resume from the command line to disable. + + + More info messages during restore. Previously, most of the restore messages were debug level so not a lot was output in the log. + + + Fixed an issue where an absolute path was not written into recovery.conf when the restore was run with a relative path. + + + Added tablespace setting to allow tablespaces to be restored into the pg_tblspc path. This produces compact restores that are convenient for development, staging, etc. Currently these restores cannot be backed up as expects only links in the pg_tblspc path. + + + + + + + + Fixed a buffering error that could occur on large, highly-compressible files when copying to an uncompressed remote destination. The error was detected in the decompression code and resulted in a failed backup rather than corruption so it should not affect successful backups made with previous versions. + + + + + + + + Pushing duplicate WAL now generates an error. This worked before only if checksums were disabled. + + + Database System IDs are used to make sure that all WAL in an archive matches up. This should help prevent misconfigurations that send WAL from multiple clusters to the same archive. + + + Regression tests working back to 8.3. + + + Improved threading model by starting threads early and terminating them late. + + + + + + + + Added restore functionality. + + + All options can now be set on the command-line making pg_backrest.conf optional. + + + De/compression is now performed without threads and checksum/size is calculated in stream. That means file checksums are no longer optional. + + + Added option --no-start-stop to allow backups when Postgres is shut down. If postmaster.pid is present then --force is required to make the backup run (though if Postgres is running an inconsistent backup will likely be created). This option was added primarily for the purpose of unit testing, but there may be applications in the real world as well. 
+ + + Fixed broken checksums and now they work with normal and resumed backups. Finally realized that checksums and checksum deltas should be functionally separated and this simplified a number of things. Issue #28 has been created for checksum deltas. + + + Fixed an issue where a backup could be resumed from an aborted backup that didn't have the same type and prior backup. + + + Removed dependency on Moose. It wasn't being used extensively and makes for longer startup times. + + + Checksum for backup.manifest to detect a corrupted/modified manifest. + + + Link latest always points to the last backup. This has been added for convenience and to make restores simpler. + + + More comprehensive unit tests in all areas. + + + + + + + + Complete rewrite of BackRest::File module to use a custom protocol for remote operations and Perl native GZIP and SHA operations. Compression is performed in threads rather than forked processes. + + + Fairly comprehensive unit tests for all the basic operations. More work to be done here for sure, but then there is always more work to be done on unit tests. + + + Removed dependency on Storable and replaced with a custom ini file implementation. + + + Added much needed documentation + + + Numerous other changes that can only be identified with a diff. + + + + + + + + Working on improving error handling in the File object. This is not complete, but works well enough to find a few errors that have been causing us problems (notably, find is occasionally failing building the archive async manifest when system is under load). + + + Found and squashed a nasty bug where file_copy() was defaulted to ignore errors. There was also an issue in file_exists() that was causing the test to fail when the file actually did exist. Together they could have resulted in a corrupt backup with no errors, though it is very unlikely. 
+ + + + + + + + The archive-get command returns a 1 when the archive file is missing to differentiate from hard errors (ssh connection failure, file copy error, etc.). This lets know that the archive stream has terminated normally. However, this does not take into account possible holes in the archive stream. + + + + + + + + If an archive directory which should be empty could not be deleted backrest was throwing an error. There's a good fix for that coming, but for the time being it has been changed to a warning so processing can continue. This was impacting backups as sometimes the final archive file would not get pushed if the first archive file had been in a different directory (plus some bad luck). + + + + + + + + Added RequestTTY=yes to ssh sessions. Hoping this will prevent random lockups. + + + + + + + + Added archive-get functionality to aid in restores. + + + Added option to force a checkpoint when starting the backup, start-fast=y. + + + + + + + + Removed master_stderr_discard option on database SSH connections. There have been occasional lockups and they could be related to issues originally seen in the file code. + + + Changed lock file conflicts on backup and expire commands to ERROR. They were set to DEBUG due to a copy-and-paste from the archive locks. + + + + + + + + No restore functionality, but the backup directories are consistent data directories. You'll need to either uncompress the files or turn off compression in the backup. Uncompressed backups on a ZFS (or similar) filesystem are a good option because backups can be restored locally via a snapshot to create logical backups or do spot data recovery. + + + Archiving is single-threaded. This has not posed an issue on our multi-terabyte databases with heavy write volume. Recommend a large WAL volume or to use the async option with a large volume nearby.
+ + + Backups are multi-threaded, but the Net::OpenSSH library does not appear to be 100% thread-safe so it will very occasionally lock up on a thread. There is an overall process timeout that resolves this issue by killing the process. Yes, very ugly. + + + Checksums are lost on any resumed backup. Only the final backup will record checksum on multiple resumes. Checksums from previous backups are correctly recorded and a full backup will reset everything. + + + The backup.manifest is being written as Storable because Config::IniFile does not seem to handle large files well. Would definitely like to save these as human-readable text. + + + Absolutely no documentation (outside the code). Well, excepting these release notes. + + + + + diff --git a/doc/doc.dtd b/doc/xml/dtd/doc.dtd similarity index 80% rename from doc/doc.dtd rename to doc/xml/dtd/doc.dtd index 5f638909f..879dc72ba 100644 --- a/doc/doc.dtd +++ b/doc/xml/dtd/doc.dtd @@ -1,10 +1,11 @@ - + - - + + + @@ -60,22 +61,26 @@ - - + - - - - - + + + + + + + + + + diff --git a/doc/xml/readme.xml b/doc/xml/readme.xml new file mode 100644 index 000000000..6bfdb7571 --- /dev/null +++ b/doc/xml/readme.xml @@ -0,0 +1,67 @@ + + + + + aims to be a simple backup and restore system that can seamlessly scale up to the largest databases and workloads. + + Primary features: +
    +
  • Local or remote backup
  • Multi-threaded backup/restore for performance
  • Checksums
  • Safe backups (checks that logs required for consistency are present before backup completes)
  • Full, differential, and incremental backups
  • Backup rotation (and minimum retention rules with optional separate retention for archive)
  • In-stream compression/decompression
  • Archiving and retrieval of logs for replicas/restores built in
  • Async archiving for very busy systems (including space limits)
  • Backup directories are consistent clusters (when hardlinks are on and compression is off)
  • Tablespace support
  • Restore delta option
  • Restore using timestamp/size or checksum
  • Restore remapping base/tablespaces
  • Support for >= 8.3
+ Instead of relying on traditional backup tools like tar and rsync, implements all backup features internally and uses a custom protocol for communicating with remote systems. Removing reliance on tar and rsync allows for better solutions to database-specific backup issues. The custom remote protocol limits the types of connections that are required to perform a backup which increases security. + + uses the gitflow model of development. This means that the master branch contains only the release history, i.e. each commit represents a single release and release tags are always from the master branch. The dev branch contains a single commit for each feature or fix and more accurately depicts the development history. Actual development is done on feature (dev_*) branches and squashed into dev after regression tests have passed. In this model dev is considered stable and can be released at any time. As such, the dev branch does not have any special version modifiers.
+
+ + + strives to be easy to configure and operate: + + * [Installation instructions](USERGUIDE.md#installation) for major operating systems. + + * [Sample configurations](USERGUIDE.md#examples) that cover most basic use cases. + + * [Command guide](USERGUIDE.md#commands) for command-line operations. + + * [Settings documentation](USERGUIDE.md#setttings) for creating complex configurations and more detail on options. + + + + Contributions to are always welcome! + + Code fixes or new features can be submitted via pull requests. Ideas for new features and improvements to existing functionality or documentation can be [submitted as issues](http://github.com/pgmasters/backrest/issues). + + Bug reports should be [submitted as issues](http://github.com/pgmasters/backrest/issues). Please provide as much information as possible to aid in determining the cause of the problem. + + You will always receive credit in the [change log](https://github.com/pgmasters/backrest/blob/master/CHANGELOG.md) for your contributions. + + + + is completely free and open source under the [MIT](https://github.com/pgmasters/backrest/blob/master/LICENSE) license. You may use it for personal or commercial purposes without any restrictions whatsoever. Bug reports are taken very seriously and will be addressed as quickly as possible. + + Creating a robust disaster recovery policy with proper replication and backup strategies can be a very complex and daunting task. You may find that you need help during the architecture phase and ongoing support to ensure that your enterprise continues running smoothly. + + [Crunchy Data](http://www.crunchydatasolutions.com) provides packaged versions of for major operating systems and expert full life-cycle commercial support for and all things . [Crunchy Data](http://www.crunchydatasolutions.com) is committed to providing open source solutions with no vendor lock-in so cross-compatibility with the community version of is always strictly maintained. 
+ + Please visit [Crunchy Backup Manager](http://crunchydatasolutions.com/crunchy-backup-manager) for more information. + + + + Primary recognition goes to Stephen Frost for all his valuable advice and criticism during the development of . + + [Crunchy Data](http://www.crunchydatasolutions.com) has contributed significant time and resources to and continues to actively support development. [Resonate](http://www.resonate.com) also contributed to the development of and allowed early (but well tested) versions to be installed as their primary backup solution. + +
diff --git a/doc/xml/userguide.xml b/doc/xml/userguide.xml new file mode 100644 index 000000000..70c9f3d19 --- /dev/null +++ b/doc/xml/userguide.xml @@ -0,0 +1,726 @@ + + + + + + + + + is written entirely in Perl. Some additional modules will need to be installed depending on the OS. + + + + * Install required Perl modules: + + apt-get install libdbd-pg-perl + + + + + * Install Perl and required modules: + + yum install perl perl-Time-HiRes perl-IO-String perl-parent perl-JSON perl-Digest-SHA perl-DBD-Pg + + + + + * Install Perl and required modules: + + yum install perl perl-IO-String perl-Thread-Queue perl-JSON-PP perl-Digest-SHA perl-DBD-Pg + + + + + can be installed by downloading the most recent release: + + https://github.com/pgmasters/backrest/releases + + can be installed anywhere but it's best (though not required) to install it in the same location on all systems. + + + + * Create the backrest user + + The backrest user must be created on the same system and in the same group as the user you will use for testing (which can be any user you prefer). For example: + + adduser -g <test-user-group> backrest + + * Setup password-less SSH login between the test user and the backrest user + + The test user should be able to `ssh backrest@127.0.0.1` and the backrest user should be able to `ssh <testuser>@127.0.0.1` without requiring any passwords. This article (http://archive.oreilly.com/pub/h/66) has details on how to accomplish this. Do the logons both ways at the command line before running regression tests. + + * Give group read and execute permissions to ~/backrest/test: + + Usually this can be accomplished by running the following as the test user: + + chmod 750 ~ + + * Running regression: + + Running the full regression suite is generally not necessary. 
Run the following first: + + ./test.pl --module=backup --test=full --db-version=all --thread-max=<# threads> + + This will run full backup/restore regression with a variety of options on all installed versions of . If you are only interested in one version then modify the db-version setting to X.X (e.g. 9.4). --thread-max can be omitted if you are running single-threaded. + + If there are errors in this test then run full regression to help isolate problems: + + ./test.pl --db-version=all --thread-max=<# threads> + + Report regression test failures at https://github.com/pgmasters/backrest/issues. + + + + + + can be used entirely with command-line parameters but a configuration file is more practical for installations that are complex or set a lot of options. The default location for the configuration file is /etc/pg_backrest.conf. + + + + Modify the following settings in postgresql.conf: + + wal_level = archive + archive_mode = on + archive_command = '/path/to/backrest/bin/ --stanza=db archive-push %p' + + Replace the path with the actual location where was installed. The stanza parameter should be changed to the actual stanza name for your database. + + + + The absolute minimum required to run (if all defaults are accepted) is the database path. + + /etc/pg_backrest.conf: + + [main] + db-path=/data/db + + The db-path option could also be provided on the command line, but it's best to use a configuration file as options tend to pile up quickly. + + + + This configuration is appropriate for a small installation where backups are being made locally or to a remote file system that is mounted locally. A number of additional options are set: +
    +
  • db-port - Custom port for .
  • compress - Disable compression (handy if the file system is already compressed).
  • repo-path - Path to the repository where backups and WAL archive are stored.
  • log-level-file - Set the file log level to debug (Lots of extra info if something is not working as expected).
  • hardlink - Create hardlinks between backups (but never between full backups).
  • thread-max - Use 2 threads for backup/restore operations.
+ /etc/pg_backrest.conf: + + [global:general] + compress=n + repo-path=/path/to/db/repo + + [global:log] + log-level-file=debug + + [global:backup] + hardlink=y + thread-max=2 + + [main] + db-path=/data/db + db-port=5555 +
+
+ + + This configuration is appropriate for a small installation where backups are being made remotely. Make sure that postgres@db-host has trusted ssh to backrest@backup-host and vice versa. This configuration assumes that you have in the same path on both servers. + + /etc/pg_backrest.conf on the db host: + + [global:general] + repo-path=/path/to/db/repo + repo-remote-path=/path/to/backup/repo + + [global:backup] + backup-host=backup.mydomain.com + backup-user=backrest + + [global:archive] + archive-async=y + + [main] + db-path=/data/db + + /etc/pg_backrest.conf on the backup host: + + [global:general] + repo-path=/path/to/backup/repo + + [main] + db-host=db.mydomain.com + db-path=/data/db + db-user=postgres + + +
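The section layout used in these samples (`[global:general]`, `[global:backup]`, a stanza such as `[main]`) is plain ini syntax, so a config can be sanity-checked before it is deployed. The following is only a sketch, assuming Python's standard `configparser` is an acceptable stand-in reader; the tool itself uses its own parser, and `load_config` is a hypothetical helper name, not part of the project.

```python
import configparser
import textwrap

# Sample taken from the db-host configuration shown above.
SAMPLE = textwrap.dedent("""\
    [global:general]
    repo-path=/path/to/db/repo
    repo-remote-path=/path/to/backup/repo

    [global:backup]
    backup-host=backup.mydomain.com
    backup-user=backrest

    [global:archive]
    archive-async=y

    [main]
    db-path=/data/db
""")


def load_config(text):
    """Parse ini-style text into {section: {option: value}} (hypothetical helper)."""
    # Only '=' is treated as a delimiter; interpolation is disabled so that
    # values are returned exactly as written.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


config = load_config(SAMPLE)
print(config["main"]["db-path"])
print(config["global:backup"]["backup-host"])
```

A check like this only confirms that the file is well-formed ini with the expected sections; it does not validate option names or values.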
+ + + + + The command section defines the location of external commands that are used by . + + + + + Defines the location of pg_backrest_remote.pl. + + Required only if the path to pg_backrest_remote.pl is different on the local and remote systems. If not defined, the remote path will be assumed to be the same as the local path. + + same as local + /usr/lib/backrest/bin/pg_backrest_remote.pl + + + + + + + The log section defines logging-related settings. The following log levels are supported: +
    +
  • off - No logging at all (not recommended)
  • error - Log only errors
  • warn - Log warnings and errors
  • info - Log info, warnings, and errors
  • debug - Log debug, info, warnings, and errors
  • trace - Log trace (very verbose debugging), debug, info, warnings, and errors
+ + + + + Sets file log level. + + debug + + + + + Sets console log level. + + error + + +
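The log levels listed for the log section are cumulative: each level also emits everything from the less verbose levels before it. A minimal sketch of that rule, assuming the ordering given in the list; `LOG_LEVELS` and `should_log` are hypothetical names for illustration and are not taken from the project's code.

```python
# Levels ordered from least to most verbose, as listed in the log section.
LOG_LEVELS = ["off", "error", "warn", "info", "debug", "trace"]


def should_log(message_level, configured_level):
    """Return True if a message at message_level is emitted under configured_level.

    Hypothetical helper: a message is emitted when its level is no more
    verbose than the configured level.
    """
    if message_level == "off":
        raise ValueError("'off' is a setting, not a message level")
    return LOG_LEVELS.index(message_level) <= LOG_LEVELS.index(configured_level)


# With log-level-file=debug, debug messages are written but trace messages are not.
print(should_log("debug", "debug"))
print(should_log("trace", "debug"))
```

Under this ordering, a configured level of `off` suppresses every message, including errors.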
+ + + + The general section defines settings that are shared between multiple operations. + + + + + Set the buffer size used for copy, compress, and uncompress functions. A maximum of 3 buffers will be in use at a time per thread. An additional maximum of 256K per thread may be used for zlib buffers. + + 16384 - 8388608 + 32768 + + + + + Enable gzip compression. Backup files are compatible with command-line gzip tools. + + n + + + + + Sets the zlib level to be used for file compression when compress=y. + + 0-9 + 9 + + + + + Sets the zlib level to be used for protocol compression when compress=n and the database is not on the same host as the backup. Protocol compression is used to reduce network traffic but can be disabled by setting compress-level-network=0. When compress=y the compress-level-network setting is ignored and compress-level is used instead so that the file is only compressed once. SSH compression is always disabled. + + 0-9 + 1 + + + + + Sets the umask to 0000 so modes in the repository are created in a sensible way. The default directory mode is 0750 and default file mode is 0640. The lock and log directories set the directory and file mode to 0770 and 0660 respectively. + + To use the executing user's umask instead specify neutral-umask=n in the config file or --no-neutral-umask on the command line. + + n + + + + + Path to the backrest repository where WAL segments, backups, logs, etc are stored. + + /data/db/backrest + + + + + Path to the remote backrest repository where WAL segments, backups, logs, etc are stored. + + /backup/backrest + + + + + + + The backup section defines settings related to backup. + + + + + Sets the backup host when backing up remotely via SSH. Make sure that trusted SSH authentication is configured between the db host and the backup host. + + When backing up to a locally mounted network filesystem this setting is not required. + + backup.domain.com + + + + + Sets user account on the backup host.
+ + backrest + + + + + Forces a checkpoint (by passing true to the fast parameter of pg_start_backup()) so the backup begins immediately. Otherwise the backup will start after the next regular checkpoint. + + y + + + + + Enable hard-linking of files in differential and incremental backups to their full backups. This gives the appearance that each backup is a full backup. Be careful, though, because modifying files that are hard-linked can affect all the backups in the set. + + y + + + + + Defines how often the manifest will be saved during a backup (in bytes). Saving the manifest is important because it stores the checksums and allows the resume function to work efficiently. The actual threshold used is 1% of the backup size or manifest-save-threshold, whichever is greater. + + 5368709120 + + + + + Defines whether the resume feature is enabled. Resume can greatly reduce the amount of time required to run a backup after a previous backup of the same type has failed. It adds complexity, however, so it may be desirable to disable in environments that do not require the feature. + + false + + + + + Defines the number of threads to use for backup or restore. Each thread will perform compression and transfer to make the backup run faster, but don't set thread-max so high that it impacts database performance during backup. + + 4 + + + + + Maximum amount of time (in seconds) that a backup thread should run. This limits the amount of time that a thread might be stuck due to unforeseen issues during the backup. Has no effect when thread-max=1. + + 3600 + + + + + Checks that all WAL segments required to make the backup consistent are present in the WAL archive. It's a good idea to leave this as the default unless you are using another method for archiving. + + n + + + + + Store WAL segments required to make the backup consistent in the backup's pg_xlog path. This slightly paranoid option protects against corruption or premature expiration in the WAL segment archive.
PITR won't be possible without the WAL segment archive and this option also consumes more space. + + Even though WAL segments will be restored with the backup, will ignore them if a recovery.conf file exists and instead use archive_command to fetch WAL segments. Specifying type=none when restoring will not create recovery.conf and force to use the WAL segments in pg_xlog. This will get the database to a consistent state. + + y + + + + + + + The archive section defines parameters when doing async archiving. This means that the archive files will be stored locally, then a background process will pick them and move them to the backup. + + + + + Archive WAL segments asynchronously. WAL segments will be copied to the local repo, then a process will be forked to compress the segment and transfer it to the remote repo if configured. Control will be returned to as soon as the WAL segment is copied locally. + + y + + + + + Limits the amount of archive log that will be written locally when archive-async=y. After the limit is reached, the following will happen: +
    +
  1. will notify Postgres that the archive was successfully backed up, then DROP IT.
  2. An error will be logged to the console and also to the Postgres log.
  3. A stop file will be written in the lock directory and no more archive files will be backed up until it is removed.
+ If this occurs then the archive log stream will be interrupted and PITR will not be possible past that point. A new backup will be required to regain full restore capability. + + The purpose of this feature is to prevent the log volume from filling up at which point Postgres will stop completely. Better to lose the backup than have the database go down. + + To start normal archiving again you'll need to remove the stop file which will be located at ${repo-path}/lock/${stanza}-archive.stop where ${repo-path} is the path set in the general section, and ${stanza} is the backup stanza.
+ + 1024 +
+
+
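The archive-max-mb behaviour described above can be sketched in a few lines. This is only an illustration, not pgBackRest's actual code: the function names are hypothetical, and it assumes that one MB means 1024 * 1024 bytes and that the limit counts as reached once the locally queued size is greater than or equal to it. The stop file location follows the ${repo-path}/lock/${stanza}-archive.stop pattern given in the text; the repo path used in the example is made up.

```python
import posixpath

# Assumption: archive-max-mb is measured in units of 1024 * 1024 bytes.
BYTES_PER_MB = 1024 * 1024


def stop_file_path(repo_path, stanza):
    """Build ${repo-path}/lock/${stanza}-archive.stop from its two parts."""
    return posixpath.join(repo_path, "lock", f"{stanza}-archive.stop")


def archive_limit_reached(queued_bytes, archive_max_mb):
    """Return True once the locally queued archive log hits the limit.

    Assumption: "reached" means greater than or equal to the limit.
    """
    return queued_bytes >= archive_max_mb * BYTES_PER_MB


if __name__ == "__main__":
    # Hypothetical repo path and stanza name, for illustration only.
    print(stop_file_path("/var/lib/backup", "db"))
    # 2 GiB queued against the example limit of 1024 MB.
    print(archive_limit_reached(2 * 1024 ** 3, 1024))
```

Removing the file at the path returned by `stop_file_path` corresponds to the manual step needed to start normal archiving again.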
+ + + + The restore section defines settings used for restoring backups. + + + + + Defines whether tablespaces will be restored into their original (or remapped) locations or stored directly under the pg_tblspc path. Disabling this setting produces compact restores that are convenient for development, staging, etc. Currently these restores cannot be backed up as pgBackRest expects only links in the pg_tblspc path. If no tablespaces are present this setting has no effect. + + n + + + + + + + The expire section defines how long backups will be retained. Expiration only occurs when the number of complete backups exceeds the allowed retention. In other words, if retention-full is set to 2, then there must be 3 complete backups before the oldest will be expired. Make sure you always have enough space for retention + 1 backups. + + + + + Number of full backups to keep. When a full backup expires, all differential and incremental backups associated with the full backup will also expire. When not defined then all full backups will be kept. + + 2 + + + + + Number of differential backups to keep. When a differential backup expires, all incremental backups associated with the differential backup will also expire. When not defined all differential backups will be kept. + + 3 + + + + + Type of backup to use for archive retention (full or differential). If set to full, then pgBackRest will keep archive logs for the number of full backups defined by retention-archive. If set to differential, then pgBackRest will keep archive logs for the number of differential backups defined by retention-archive. + + If not defined then archive logs will be kept indefinitely. In general it is not useful to keep archive logs that are older than the oldest backup, but there may be reasons for doing so. + + diff + + + + + Number of backups worth of archive log to keep. If this is set less than your backup retention then be sure you set archive-copy=y or you won't be able to restore some older backups. 
+ + For example, if retention-archive=2 and retention-full=4, then any backups older than the most recent two full backups will not have WAL segments in the archive to make them consistent. To solve this, set archive-copy=y and use type=none when restoring. This issue will be addressed in a future release but for now be careful with this setting. + + 2 + + + + + + + A stanza defines a backup for a specific database. The stanza section must define the base database path and host/user if the database is remote. Also, any global configuration sections can be overridden to define stanza-specific settings. + + + + + Define the database host. Used for backups where the database host is different from the backup host. + + db.domain.com + + + + + Defines the logon user when db-host is defined. This user will also own the remote process and will initiate connections to PostgreSQL. For this to work correctly the user should be the cluster owner which is generally postgres, the default. + + test_user + + + + + Path to the db data directory (data_directory setting in postgresql.conf). + + /data/db + + + + + Port that PostgreSQL is running on. This usually does not need to be specified as most clusters run on the default port. + + 6543 + + + + + The unix socket directory that was specified when PostgreSQL was started. pgBackRest will automatically look in the standard location for your OS so there is usually no need to specify this setting unless the socket directory was explicitly modified with the unix_socket_directory setting in postgresql.conf. + + /var/run/postgresql + + + +
+
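The retention-archive example in the expire section (retention-archive=2 with retention-full=4) can be worked through with a small sketch. It is an illustration only and assumes retention-archive-type is full; the function name and the backup labels are hypothetical and do not come from pgBackRest. Following the wording of the example, any full backup older than the most recent retention-archive full backups is treated as having no WAL segments left in the archive.

```python
def backups_without_archive(full_backups, retention_archive):
    """List the full backups whose WAL is no longer kept in the archive.

    full_backups is ordered oldest to newest. retention_archive is the
    number of most recent full backups whose archive logs are kept, or
    None when the setting is not defined (archive logs kept indefinitely).
    """
    if retention_archive is None:
        return []
    if retention_archive < 1:
        raise ValueError("retention_archive must be at least 1")
    # Everything except the newest `retention_archive` entries.
    return full_backups[:-retention_archive]


if __name__ == "__main__":
    # Hypothetical labels for four retained full backups (retention-full=4).
    fulls = ["full-1", "full-2", "full-3", "full-4"]
    # With retention-archive=2 the two oldest rely on archive-copy=y.
    print(backups_without_archive(fulls, 2))
```

The backups returned here are the ones that can only be made consistent if archive-copy=y was set when they were taken and type=none is used when restoring.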
+ + + + These options are either global or used by all commands. + + + + + + + + + + + + + + + + + + + + Perform a database backup. pgBackRest does not have a built-in scheduler so it's best to run it from cron or some other scheduling mechanism. + + + + + + + + + + + + + + + + /path/to/ --stanza=db --type=full backup + + Run a full backup on the db stanza. --type can also be set to incr or diff for incremental or differential backups. However, if no full backup exists then a full backup will be forced even if incr or diff is requested. + + + + + + + Archive a WAL segment to the repository. + + + + + /path/to/ --stanza=db archive-push %p + + Accepts a WAL segment from PostgreSQL and archives it in the repository defined by repo-path. %p is how PostgreSQL specifies the location of the WAL segment to be archived. + + + + + + + Get a WAL segment from the repository. + + + + + /path/to/ --stanza=db archive-get %f %p + + Retrieves a WAL segment from the repository. This command is used in recovery.conf to restore a backup, perform PITR, or as an alternative to streaming for keeping a replica up to date. %f is how PostgreSQL specifies the WAL segment it needs and %p is the location where it should be copied. + + + + + + + pgBackRest does backup rotation, but is not concerned with when the backups were created. So if two full backups are configured for retention, pgBackRest will keep two full backups no matter whether they occur two hours apart or two weeks apart. + + + + + /path/to/ --stanza=db expire + + Expire (rotate) any backups that exceed the defined retention. Expiration is run automatically after every successful backup, so there is no need to run this command separately unless you have reduced retention, usually to free up some space. + + + + + + + Perform a database restore. This command is generally run manually, but there are instances where it might be automated. 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + /path/to/ --stanza=db --type=name --target=release restore + + Restores the latest database backup and then recovers to the release restore point. + + + + + + + Retrieve information about backups for a single stanza or for all stanzas. Text output is the default and gives a human-readable summary of backups for the stanza(s) requested. This format is subject to change with any release. + + For machine-readable output use --output=json. The JSON output contains far more information than the text output, however this feature is currently experimental so the format may change between versions. + + + + + + + + + + /path/to/ --stanza=db --output=json info + + + Get information about backups in the db stanza. + + + + + /path/to/ --output=json info + + + Get information about backups for all stanzas in the repository. + + + + + +
diff --git a/test/test.pl b/test/test.pl index 9b42f978d..a9d699f58 100755 --- a/test/test.pl +++ b/test/test.pl @@ -205,29 +205,30 @@ if ($iThreadMax < 1 || $iThreadMax > 32) } #################################################################################################################################### -# Make sure version number matches in README.md and VERSION +# Make sure version number matches in the change log. #################################################################################################################################### my $hReadMe; my $strLine; my $bMatch = false; +my $strChangeLogFile = abs_path(dirname($0) . '/../CHANGELOG.md'); -if (!open($hReadMe, '<', dirname($0) . '/../README.md')) +if (!open($hReadMe, '<', $strChangeLogFile)) { - confess 'unable to open README.md'; + confess "unable to open ${strChangeLogFile}"; } while ($strLine = readline($hReadMe)) { - if ($strLine =~ /^\#\#\# v/) + if ($strLine =~ /^\#\# v/) { - $bMatch = substr($strLine, 5, length(BACKREST_VERSION)) eq BACKREST_VERSION; + $bMatch = substr($strLine, 4, length(BACKREST_VERSION)) eq BACKREST_VERSION; last; } } if (!$bMatch) { - confess 'unable to find version ' . BACKREST_VERSION . ' as last revision in README.md'; + confess 'unable to find version ' . BACKREST_VERSION . " as last revision in ${strChangeLogFile}"; } ####################################################################################################################################