**pg_probackup** is a utility to manage backup and recovery of PostgreSQL database clusters. It is designed to perform periodic backups of the PostgreSQL instance that enable you to restore the server in case of a failure. pg_probackup supports PostgreSQL 9.5 or higher.
As compared to other backup solutions, pg_probackup offers the following benefits that can help you implement different backup strategies and deal with large amounts of data:
- Incremental backup: page-level incremental backup of three different modes allows you to devise the backup strategy in accordance with your data flow
- Verification: On-demand verification of PostgreSQL instance via dedicated command `checkdb`
- Retention: Managing backups in accordance with retention policies - Time and/or Redundancy based, with two retention methods: `delete expired` and `merge expired`
- Parallelization: Running backup, restore, merge, verificaton and validation processes on multiple parallel threads
- External directories: Add to backup content of directories located outside of the PostgreSQL data directory (PGDATA), such as scripts, configs, logs and pg_dump files
To manage backup data, pg_probackup creates a **backup catalog**. This is a directory that stores all backup files with additional meta information, as well as WAL archives required for point-in-time recovery. You can store backups for different instances in separate subdirectories of a single **backup catalog**.
- Incremental backups only store the data that has changed since the previous backup. It allows to decrease the backup size and speed up backup operations. pg_probackup supports the following modes of incremental backups:
- DELTA backup. In this mode, pg_probackup reads all data files in the data directory and copies only those pages that has changed since the previous backup. Note that this mode can impose read-only I/O pressure equal to a full backup.
- PAGE backup. In this mode, pg_probackup scans all WAL files in the archive from the moment the previous full or incremental backup was taken. Newly created backups contain only the pages that were mentioned in WAL records. This requires all the WAL files since the previous backup to be present in the WAL archive. If the size of these files is comparable to the total size of the database cluster files, speedup is smaller, but the backup still takes less space. You have to configure WAL archiving as explained in the section [Setting up continuous WAL archiving](#setting-up-continuous-wal-archiving) to make PAGE backups.
- PTRACK backup. In this mode, PostgreSQL tracks page changes on the fly. Continuous archiving is not necessary for it to operate. Each time a relation page is updated, this page is marked in a special PTRACK bitmap for this relation. As one page requires just one bit in the PTRACK fork, such bitmaps are quite small. Tracking implies some minor overhead on the database server operation, but speeds up incremental backups significantly.
pg_probackup can take only physical online backups, and online backups require WAL for consistent recovery. So regardless of the chosen **backup mode** (FULL, PAGE, DELTA, etc), all backups taken with pg_probackup must use one of the following **WAL delivery methods**:
- ARCHIVE. Such backups rely on [continuous archiving](#setting-up-continuous-wal-archiving) to ensure consistent recovery. This is the default WAL delivery method.
- STREAM. Such backups include all the files required to restore the cluster to a consistent state at the time the backup was taken. Regardless of [continuous archiving](#setting-up-continuous-wal-archiving) been set up or not, the WAL segments required for consistent recovery are streamed (hence STREAM) via replication protocol during backup and included into the backup files.
- The PostgreSQL server from which the backup was taken and the restored server must be compatible by the [block_size](https://www.postgresql.org/docs/current/runtime-config-preset.html#GUC-BLOCK-SIZE) and [wal_block_size](https://www.postgresql.org/docs/current/runtime-config-preset.html#GUC-WAL-BLOCK-SIZE) parameters and have the same major release number.
pg_probackup can store backups for multiple database clusters in a single backup catalog. To set up the required subdirectories, you must add a backup instance to the backup catalog for each database cluster you are going to back up.
To add a new backup instance, run the following command:
- **data_dir** is the data directory of the cluster you are going to back up. To set up and use pg_probackup, write access to this directory is required.
- **instance_name** is the name of the subdirectories that will store WAL and backup files for this cluster.
pg_probackup creates the **instance_name** subdirectories under the **backups/** and **wal/** directories of the backup catalog. The **backups/instance_name** directory contains the **pg_probackup.conf** configuration file that controls backup and restore settings for this backup instance. If you run this command with the [remote_options](#remote-mode-options), used parameters will be added to **pg_probackup.conf**. For details on how to fine-tune pg_probackup configuration, see the section [Configuring pg_probackup](#configuring-pg_probackup).
The user launching pg_probackup must have full access to **backup_dir** directory and at least read-only access to **data_dir** directory. If you specify the path to the backup catalog in the *BACKUP_PATH* environment variable, you can omit the corresponding option when running pg_probackup commands.
Although pg_probackup can be used by a superuser, it is recommended to create a separate role with the minimum permissions required for the chosen backup strategy. In these configuration instructions, the **backup** role is used as an example.
To enable backups, the following rights are required:
```CREATE ROLE backup WITH LOGIN;
GRANT USAGE ON SCHEMA pg_catalog TO backup;
GRANT EXECUTE ON FUNCTION current_setting(text) TO backup;
GRANT EXECUTE ON FUNCTION pg_is_in_recovery() TO backup;
GRANT EXECUTE ON FUNCTION pg_start_backup(text, boolean, boolean) TO backup;
GRANT EXECUTE ON FUNCTION pg_stop_backup() TO backup;
GRANT EXECUTE ON FUNCTION pg_stop_backup(boolean, boolean) TO backup;
GRANT EXECUTE ON FUNCTION pg_create_restore_point(text) TO backup;
GRANT EXECUTE ON FUNCTION pg_switch_wal() TO backup;
GRANT EXECUTE ON FUNCTION txid_current() TO backup;
GRANT EXECUTE ON FUNCTION txid_current_snapshot() TO backup;
GRANT EXECUTE ON FUNCTION txid_snapshot_xmax(txid_snapshot) TO backup;
```
Since pg_probackup needs to read cluster files directly, pg_probackup must be started on behalf of an OS user that has read access to all files and directories inside the data directory (PGDATA) you are going to back up.
Depending on whether you are plan to take STREAM and/or ARCHIVE backups, PostgreSQL cluster configuration will differ, as specified in the sections below. To back up the database cluster from a standby server or create PTRACK backups, additional setup is required. For details, see the sections [Setting up STREAM Backups](#setting-up-stream-backups), [Setting up continuous WAL archiving](#setting-up-continuous-wal-archiving), [Setting up PTRACK Backups](#setting-up-ptrack-backups) and [Setting up Backup from Standby](#backup-from-standby).
- Make sure the parameter [max_wal_senders](https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-MAX-WAL-SENDERS) is set high enough to leave at least one session available for the backup process.
- Set the parameter [wal_level](https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-WAL-LEVEL) to be higher than `minimal`.
If you are planning to take PAGE backups in STREAM mode, you still have to configure WAL archiving as explained in the section [Setting up continuous WAL archiving](#setting-up-continuous-wal-archiving).
ARCHIVE backups require [continious WAL archiving](https://www.postgresql.org/docs/current/continuous-archiving.html) to be enabled. To set up continious archiving in the cluster, complete the following steps:
- If you are configuring backups on master, [archive_mode](https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-ARCHIVE-MODE) must be set to `on`. To perform archiving on standby, set this parameter to `always`.
Where **backup_dir** and **instance_name** refer to the already initialized **backup catalog** instance for this database cluster and optional parameters [remote_options](#remote-mode-options) should be used to archive WAL to the remote machine.
>NOTE: Instead of `archive_mode`+`archive_command` method you may opt to use the utility [pg_receivewal](https://www.postgresql.org/docs/current/app-pgreceivewal.html). In this case pg_receivewal `-D directory` option should point to **backup_dir/wal/instance_name** directory. WAL compression that could be done by pg_receivewal is supported by pg_probackup. `Zero data loss` archive strategy can be achieved only by using `pg_receivewal`.
- On the standby server, set the parameter [hot_standby](https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-HOT-STANDBY) to `on`.
- On the master server, set the parameter [full_page_writes](https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-FULL-PAGE-WRITES) to `on`.
Once these steps are complete, you can start taking FULL, PAGE, DELTA or PTRACK backups of appropriate WAL delivery method: ARCHIVE or STREAM, from the standby server.
- All WAL records required for the backup must contain sufficient full-page writes. This requires you to enable `full_page_writes` on the master, and not to use a tools like **pg_compresslog** as [archive_command](https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-ARCHIVE-COMMAND) to remove full-page writes from WAL files.
This section describes pg_probackup commands. Some commands require mandatory parameters and can take additional options. Optional parameters encased in square brackets. For detailed descriptions of options, see the section [Options](#options).
Displays the synopsis of pg_probackup commands. If one of the pg_probackup commands is specified, shows detailed information about the options that can be used with this command.
Initializes the backup catalog in **backup_dir** that will store backup copies, WAL archive and meta information for the backed up database clusters. If the specified **backup_dir** already exists, it must be empty. Otherwise, pg_probackup displays a corresponding error message.
For details, see the secion [Initializing the Backup Catalog](#initializing-the-backup-catalog).
Initializes a new backup instance inside the backup catalog backup_dir and generates the pg_probackup.conf configuration file that controls backup and restore settings for the cluster with the specified data_dir data directory. For details, see the section [Adding a New Backup Instance](#adding-a-new-backup-instance).
Adds the specified connection, compression, retention, logging and external directory settings into the pg_probackup.conf configuration file, or modifies the previously defined values.
Displays the contents of the pg_probackup.conf configuration file located in the **backup_dir/backups/instance_name** directory. You can specify the `--format=json` option to return the result in the JSON format. By default, configuration settings are shown as plain text.
To edit pg_probackup.conf, use the [set-config](#set-config) command.
It is **not recommended** to edit pg_probackup.conf manually.
Shows the contents of the backup catalog. If instance_name and backup_id are specified, shows detailed information about this backup. You can specify the --format=json option to return the result in the JSON format. By default, the contents of the backup catalog is shown as plain text.
Restores the PostgreSQL instance from a backup copy located in the backup_dir backup catalog. If you specify a recovery target option, pg_probackup will find the closest backup and restores it to the specified recovery target. Otherwise, the most recent backup is used.
Verifies that all the files required to restore the cluster are present and not corrupted. If **instance_name** is not specified, pg_probackup validates all backups available in the backup catalog. If you specify the **instance_name** without any additional options, pg_probackup validates all the backups available for this backup instance. If you specify the **instance_name** with a [recovery target option](#recovery-target-options) and/or a **backup_id**, pg_probackup checks whether it is possible to restore the cluster using these options.
For details, see the section [Validating a Backup](#validating-a-backup).
Merges the specified incremental backup to its parent full backup, together with all incremental backups between them, if any. As a result, the full backup takes in all the merged data, and the incremental backups are removed as redundant.
For details, see the section [Merging Backups](#merging-backups).
Deletes backup with specified **backip_id** or launches the retention purge of backups and archived WAL that do not satisfy the current retention policies.
For details, see the sections [Deleting Backups](#deleting-backups), [Retention Options](#retention-otions) and [Configuring Backup Retention Policy](#configuring-backup-retention-policy).
Copies WAL files into the corresponding subdirectory of the backup catalog, and validates the backup instance by instance_name, system-identifier. If parameters of the backup instance and the cluster do not match, this command fails with the following error message: “Refuse to push WAL segment segment_name into archive. Instance parameters mismatch.” For each WAL file moved to the backup catalog, you will see the following message in PostgreSQL logfile: “pg_probackup archive-push completed successfully”. If the files to be copied already exist in the backup catalog, pg_probackup computes and compares their checksums. If the checksums match, archive-push skips the corresponding file and returns successful execution code. Otherwise, archive-push fails with an error. If you would like to replace WAL files in the case of checksum mismatch, run the archive-push command with the --overwrite option.
Copying is done to temporary file with `.partial` suffix or, if [compression](#compression-options) is used, with `.gz.partial` suffix. After copy is done, atomic rename is performed. This algorihtm ensures that failed archive-push will not stall continuous archiving and that concurrent archiving from multiple sources into single WAL archive has no risk of archive corruption.
Copied to archive WAL segments are synced to disk.
You can use archive-push in [archive_command](https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-ARCHIVE-COMMAND) PostgreSQL parameter to set up [continous WAl archiving](#setting-up-continuous-wal-archiving).
Copies WAL files from the corresponding subdirectory of the backup catalog to the cluster's write-ahead log location. This command is automatically set by pg_probackup as part of the `restore_command` in **recovery.conf** when restoring backups using a WAL archive. You do not need to set it manually.
This section describes all command-line options for pg_probackup commands. If the option value can be derived from an environment variable, this variable is specified below the command-line option, in the uppercase. Some values can be taken from the pg_probackup.conf configuration file located in the backup catalog.
For details, see the section [Configuring pg_probackup](#configuring-pg_probackup).
If an option is specified using more than one method, command-line input has the highest priority, while the pg_probackup.conf settings have the lowest priority.
Specifies the absolute path to the **backup catalog**. Backup catalog is a directory where all backup files and meta information are stored. Since this option is required for most of the pg_probackup commands, you are recommended to specify it once in the BACKUP_PATH environment variable. In this case, you do not need to use this option each time on the command line.
Specifies the absolute path to the data directory of the database cluster. This option is mandatory only for the [add-instance](#add-instance) command. Other commands can take its value from the PGDATA environment variable, or from the pg_probackup.conf configuration file.
The following options can be used together with the [backup](#backup) command. Additionally [Connection Options](#connection-options), [Retention Options](#retention-options), [Remote Mode Options](#remote-mode-options), [Compression Options](#compression-options), [Logging Options](#logging-options) and [Common Options](#common-options) can be used.
Creates a temporary physical replication slot for streaming WAL from the backed up PostgreSQL instance. It ensures that all the required WAL segments remain available if WAL is rotated while the backup is in progress. This option can only be used together with the --stream option. Default slot name is `pg_probackup_slot`, which can be changed via option `-S / --slot`.
Includes the specified directory into the backup. This option is useful to back up scripts, sql dumps and configuration files located outside of the data directory. If you would like to back up several external directories, separate their paths by a colon on Unix and a semicolon on Windows.
Skips automatic validation after successfull backup. You can use this option if you validate backups regularly and would like to save time when running backup operations.
The following options can be used together with the [restore](#restore) command. Additionally [Recovery Target Options](#recovery-target-options), [Remote Mode Options](#remote-mode-options), [Logging Options](#logging-options) and [Common Options](#common-options) can be used.
Writes a minimal recovery.conf in the output directory to facilitate setting up a standby server. The password is not included. If the replication connection requires a password, you must specify the password manually.
-T OLDDIR=NEWDIR
--tablespace-mapping=OLDDIR=NEWDIR
Relocates the tablespace from the OLDDIR to the NEWDIR directory at the time of recovery. Both OLDDIR and NEWDIR must be absolute paths. If the path contains the equals sign (=), escape it with a backslash. This option can be specified multiple times for multiple tablespaces.
--external-mapping=OLDDIR=NEWDIR
Relocates an external directory included into the backup from the OLDDIR to the NEWDIR directory at the time of recovery. Both OLDDIR and NEWDIR must be absolute paths. If the path contains the equals sign (=), escape it with a backslash. This option can be specified multiple times for multiple directories.
Disables block-level checksum verification to speed up validation. During automatic validation before restore only file-level checksums will be verified.
The following options can be used together with the [checkdb](#checkdb) command. For details on verifying PostgreSQL database cluster, see section [Verifying a Cluster](#verifying-a-cluster)
Performs logical verification of indexes for the specified PostgreSQL instance if no corruption was found while checking data files. You must have the `amcheck` extention or the `amcheck_next` extension installed in the database to check its indexes. For databases without amcheck, index verification will be skipped.
Checks that all heap tuples that should be indexed are actually indexed. You can use this option only together with the `--amcheck` option. Can be used only with `amcheck` extension of version 2.0 and `amcheck_next` extension of any version.
If [continuous WAL archiving](#setting-up-continuous-wal-archiving) is configured, you can use one of these options together with [restore](#restore) or [validate](#validate) commands to specify the moment up to which the database cluster must be restored.
- The immediate value stops the recovery after reaching the consistent state of the specified backup, or the latest available backup if the `-i/--backup_id` option is omitted.
Specifies the LSN of the write-ahead log location up to which recovery will proceed. Can be used only when restoring database cluster of major version 10 or higher.
Specifies whether to stop just after the specified recovery target (true), or just before the recovery target (false). This option can only be used together with `--recovery-target-name`, `--recovery-target-time`, `--recovery-target-lsn`, or `--recovery-target-xid` options. The default value is taken from the [recovery_target_inclusive](https://www.postgresql.org/docs/current/recovery-target-settings.html#RECOVERY-TARGET-INCLUSIVE) parameter.
Specifies [the action](https://www.postgresql.org/docs/current/recovery-target-settings.html#RECOVERY-TARGET-ACTION) the server should take when the recovery target is reached.
You can use these options together with [backup](#backup) and [delete](#delete) commands. For details on configuring retention policy, see the section [Configuring Backup Retention Policy](#configuring-backup-retention-policy).
Controls which message levels are sent to the console log. Valid values are verbose, log, info, notice, warning, error, and off. Each level includes all the levels that follow it. The later the level, the fewer messages are sent. The off level disables console logging.
--log-level-file=log_level
Default: off
Controls which message levels are sent to a log file. Valid values are verbose, log, info, notice, warning, error, and off. Each level includes all the levels that follow it. The later the level, the fewer messages are sent. The off level disables file logging.
--log-filename=log_filename
Default: pg_probackup.log
Defines the filenames of the created log files. The filenames are treated as a strftime pattern, so you can use %-escapes to specify time-varying filenames, as explained in log_filename. For example, if you specify the pg_probackup-%u.log pattern, pg_probackup generates a separate log file for each day of the week, with %u replaced by the corresponding decimal number: pg_probackup-1.log for Monday, pg_probackup-2.log for Tuesday, and so on.
This option takes effect if file logging is enabled by the log-level-file option.
--error-log-filename=error_log_filename
Default: none
Defines the filenames of log files for error messages. The filenames are treated as a strftime pattern, so you can use %-escapes to specify time-varying file names, as explained in log_filename.
If error-log-filename is not set, pg_probackup writes all error messages to stderr.
Defines the directory in which log files will be created. You must specify the absolute path. This directory is created lazily, when the first log message is written.
--log-rotation-size=log_rotation_size
Default: 0
Maximum size of an individual log file. If this value is reached, the log file is rotated once a pg_probackup command is launched, except help and version commands. The zero value disables size-based rotation. Supported units: kB, MB, GB, TB (kB by default).
--log-rotation-age=log_rotation_age
Default: 0
Maximum lifetime of an individual log file. If this value is reached, the log file is rotated once a pg_probackup command is launched, except help and version commands. The time of the last log file creation is stored in $BACKUP_PATH/log/log_rotation. The zero value disables time-based rotation. Supported units: ms, s, min, h, d (min by default).
Specifies the name of the database to connect to. The connection is used only for managing backup process, so you can connect to any existing database. If this option is not provided on the command line, PGDATABASE environment variable, or the pg_probackup.conf configuration file, pg_probackup tries to take this value from the PGUSER environment variable, or from the current user name if PGUSER variable is not set.
-h host
--host=host
PGHOST
Default: local socket
Specifies the host name of the system on which the server is running. If the value begins with a slash, it is used as a directory for the Unix domain socket.
-p port
--port=port
PGPORT
Default: 5432
Specifies the TCP port or the local Unix domain socket file extension on which the server is listening for connections.
-U username
--username=username
PGUSER
User name to connect as.
-w
--no-password
Disables a password prompt. If the server requires password authentication and a password is not available by other means such as a .pgpass file, the connection attempt will fail. This option can be useful in batch jobs and scripts where no user is present to enter a password.
Defines the algorithm to use for compressing data files. Possible values are zlib, pglz, and none. If set to zlib or pglz, this option enables compression. By default, compression is disabled.
For the `archive-push` command, the pglz compression algorithm is not supported.
Defines compression level (0 through 9, 0 being no compression and 9 being best compression). This option can be used together with --compress-algorithm option.
Overwrites archived WAL file. Use this option together with the archive-push command if the specified subdirectory of the backup catalog already contains this WAL file and it needs to be replaced with its newer copy. Otherwise, archive-push reports that a WAL segment already exists, and aborts the operation. If the file to replace has not changed, archive-push skips this file regardless of the `--overwrite` option.
This section describes the options related to running pg_probackup operations remotely via SSH. These options can be used with [add-instance](#add-instance), [set-config](#set-config), [backup](#backup), [restore](#restore), [archive-push](#archive-push) and [archive-get](#archive-get) commands. For details on configuring remote operation mode, see the section [Using pg_probackup in the Remote Mode](#using-pg_probackup-in-the-remote-mode).
This section describes the options related to taking a backup from standby.
>NOTE: Starting from pg_probackup 2.0.24, backups can be taken from standby without connecting to the master server, so these options are no longer required. In lower versions, pg_probackup had to connect to the master to determine recovery time — the earliest moment for which you can restore a consistent state of the database cluster.
--master-db=dbname
Default: postgres, the default PostgreSQL database.
Deprecated. Specifies the name of the database on the master server to connect to. The connection is used only for managing the backup process, so you can connect to any existing database. Can be set in the pg_probackup.conf using the [set-config](#set-config) command.
Deprecated. Wait time for WAL segment streaming via replication, in seconds. By default, pg_probackup waits 300 seconds. You can also define this parameter in the pg_probackup.conf configuration file using the [set-config](#set-config) command.
- FULL — creates a full backup that contains all the data files of the cluster to be restored.
- DELTA — reads all data files in the data directory and creates an incremental backup for pages that have changed since the previous backup.
- PAGE — creates an incremental PAGE backup based on the WAL files that have changed since the previous full or incremental backup was taken.
- PTRACK — creates an incremental PTRACK backup tracking page changes on the fly.
When restoring a cluster from an incremental backup, pg_probackup relies on the previous full backup to restore all the data files first. Thus, you must create at least one full backup before taking incremental ones.
If [data checksums](https://www.postgresql.org/docs/current/runtime-config-preset.html#GUC-DATA-CHECKSUMS) are enabled in the database cluster, pg_probackup uses this information to check correctness of data files. While reading each page, pg_probackup checks whether the calculated checksum coincides with the checksum stored in the page header. This guarantees that the PostgreSQL instance and backup itself are free of corrupted pages.
Note that pg_probackup reads database files directly from filesystem, so under heavy write load during backup it can show false positive checksum failures because of partial writes. In case of page checksumm mismatch, page is readed again and checksumm comparison repeated.
Page is considered corrupted if checksumm comparison failed more than 100 times, is this case backup is aborted.
STREAM backups include all the WAL segments required to restore the cluster to a consistent state at the time the backup was taken. To restore a cluster from an incremental STREAM backup, pg_probackup still requires the full backup and all the incremental backups it depends on.
3. Creating backup from standby of a server that generates small amount of WAL traffic and using [archive_timeout](https://www.postgresql.org/docs/9.6/runtime-config-wal.html#GUC-ARCHIVE-TIMEOUT) is not an option.
To back up a directory located outside of the data directory, use the optional **--external-dirs** parameter that specifies the path to this directory. If you would like to add more than one external directory, provide several paths separated by colons. For example, to include '/etc/dir1/' and '/etc/dir2/' directories into the full backup of your node instance that will be stored under the node_backup directory, run:
pg_probackup creates a separate subdirectory in the backup directory for each external directory. Since external directories included into different backups do not have to be the same, when you are restoring the cluster from an incremental backup, only those directories that belong to this particular backup will be restored. Any external directories stored in the previous backups will be ignored. To include the same directories into each backup of your instance, you can specify them in the pg_probackup.conf configuration file using the `set-config` command with the **--external-dirs** option.
-`checkdb` do not strictly require **the backup catalog**, so it can be used to verify database clusters that are **not** [added to the backup catalog](#adding-a-new-backup-instance).
If **backup_dir** is omitted, then [connection options](#connection-options) and **data_dir** must be provided via environment variables or command-line options.
Physical verification cannot detect logical inconsistencies, missing and nullified blocks or entire files.
Extensions [amcheck](https://www.postgresql.org/docs/current/amcheck.html) and [amcheck_next](https://github.com/petergeoghegan/amcheck) provide a limited solution to this problems.
If you would like, in addition to physical verification, to verify all indexes in all databases using these extensions, you can specify `--amcheck` option when running [checkdb](#checkdb) command:
pg_probackup checkdb -D data_dir --amcheck
Physical verification can be skipped if `--skip-block-validation` option is used. For logical only verification **backup_dir** and **data_dir** are optional, only [connection options](#connection-options) are mandatory:
Logical verification can be done more thoroughly with option `--heapallindexed` by checking that all heap tuples that should be indexed are actually indexed, but at the higher cost of CPU, memory and I/O comsumption.
pg_probackup calculates checksums for each file in a backup during `backup` process. The process of checking checksumms of backup data files is called **backup validation**. By default validation is run immediately after backup is taken and right before restore, to detect possible backup corruptions. If you would like to skip backup validation, you can specify the `--no-validate` option when running `backup` and `restore` commands.
To ensure that all the required backup files are present and can be used to restore the database cluster, you can run the [validate](#validate) command with the exact recovery target options you are going to use for recovery. If you omit all the parameters, all backups are validated.
If validation completes successfully, pg_probackup displays the corresponding message. If validation fails, you will receive an error message with the exact time, transaction ID and LSN up to which the recovery is possible.
- **backup_dir** is the backup catalog that stores all backup files and meta information.
- **instance_name** is the backup instance for the cluster to be restored.
- **backup_id** specifies the backup to restore the cluster from. If you omit this option, pg_probackup uses the latest valid backup available for the specified instance. If you specify an incremental backup to restore, pg_probackup automatically restores the underlying full backup and then sequentially applies all the necessary increments.
If the cluster to restore contains tablespaces, pg_probackup restores them to their original location by default. To restore tablespaces to a different location, use the `--tablespace-mapping` option. Otherwise, restoring the cluster on the same host will fail if tablespaces are in use, because the backup would have to be written to the same directories.
When using the `--tablespace-mapping` option, you must provide absolute paths to the old and new tablespace directories. If a path happens to contain an equals sign (=), escape it with a backslash. This option can be specified multiple times for multiple tablespaces. For example:
Once the restore command is complete, start the database service. If you are restoring an STREAM backup, the restore is complete at once, with the cluster returned to a self-consistent state at the point when the backup was taken. For ARCHIVE backups, PostgreSQL replays all available archived WAL segments, so the cluster is restored to the latest state possible. You can change this behavior by using the [recovery target options](#recovery-target-options) with the `restore` command. Note that using the [recovery target options](#recovery-target-options) when restoring STREAM backup is possible if the WAL archive is available at least starting from the time the STREAM backup was taken.
>NOTE: By default, the [restore](#restore) command validates the specified backup before restoring the cluster. If you run regular backup validations and would like to save time when restoring the cluster, you can specify the `--no-validate` option to skip validation and speed up the recovery.
If you have enabled [continuous WAL archiving](#setting-up-continuous-wal-archiving) before taking backups, you can restore the cluster to its state at an arbitrary point in time (recovery target) using [recovery target options](#recovery-target-options) with the [restore](#restore) and [validate](#validate) commands. If `-i/--backup-id` option is omitted, pg_probackup automatically chooses the backup that is the closest to the specified **recovery target** and starts the restore process, otherwise pg_probackup will try to restore **backup_id** to the specified **recovery target**.
pg_probackup supports the remote mode that allows to perform `backup` and `restore` operations remotely via SSH. In this mode, the backup catalog is stored on a local system, while PostgreSQL instance to be backed up is located on a remote system. You must have pg_probackup installed on both systems.
- On your local system, configure pg_probackup as explained in the section [Installation and Setup](#installation-and-setup). For the [add-instance](#add-instance) and [set-config](#set-config) commands, make sure to specify [remote options](#remote-mode-options) that point to the remote server with the PostgreSQL instance.
- If you would like to take ARCHIVE backups, configure continuous WAL archiving on the remote system as explained in the section [Setting up continuous WAL archiving](#setting-up-continuous-wal-archiving). For the [archive-push](#archive-push) and [archive-get](#archive-get) commands, you must specify the [remote options](#remote-mode-options) that point to host with **backup catalog**.
- Run [backup](#backup) or [restore](#restore) commands with remote options on system with **backup catalog**. pg_probackup connects to the remote system via SSH and creates a backup locally or restores the previously taken backup on the remote system, respectively.
[Backup](#backup), [restore](#restore), [merge](#merge), [delete](#delete), [checkdb](#checkdb) and validate[#validate] processes can be executed on several parallel threads. This can significantly speed up pg_probackup operation given enough resources (CPU cores, disk, and network throughput).
>NOTE: Parallel restore applies only to copying data from the backup catalog to the data directory of the cluster. When PostgreSQL server is started, WAL records need to be replayed, and this cannot be done in parallel.
Once the backup catalog is initialized and a new backup instance is added, you can use the pg_probackup.conf configuration file located in the **backup_dir/backups/instance_name** directory to fine-tune pg_probackup configuration.
For example, [backup](#backup) and [checkdb](#checkdb) commands uses a regular PostgreSQL connection. To avoid specifying these options each time on the command line, you can set them in the pg_probackup.conf configuration file using the [set-config](#set-config) command.
> Note: It is **not recommended** to edit pg_probackup.conf manually.
Additionally, you can define [remote](#remote-mode-options), [retention](#retention-options), [logging](#logging-options) and [compression](#compression-options) settings using the `set-config` command:
You can override the settings defined in pg_probackup.conf when running the backup command via corresponding environment variables and/or command line options.
If you define connection settings in the **pg_probackup.conf** configuration file, you can omit connection options in all the subsequent pg_probackup commands. However, if the corresponding environment variables are set, they get higher priority. The options provided on the command line overwrite both environment variables and configuration file settings.
If nothing is given, the default values are taken. By default pg_probackup tries to use local connection via Unix socket (localhost on Windows) and tries to get the database name and the user name from the PGUSER environment variable or the current OS user name.
By default, all backup copies created with pg_probackup are stored in the specified backup catalog. To save disk space, you can configure retention policy and periodically clean up redundant backup copies accordingly.
To configure retention policy, set one or more of the following variables in the pg_probackup.conf file:
- retention-redundancy — specifies the number of full backup copies to keep in the backup catalog.
- retention-window — defines the earliest point in time for which pg_probackup can complete the recovery. This option is set in the number of days from the current moment. For example, if retention-window=7, pg_probackup must keep at least one full backup copy that is older than seven days, with all the corresponding WAL files.
If both retention-redundancy and retention-window options are set, pg_probackup keeps backup copies that satisfy at least one condition. For example, if you set retention-redundancy=2 and retention-window=7, pg_probackup cleans up the backup directory to keep only two full backup copies and all backups that are newer than seven days.
To clean up the backup catalog in accordance with retention policy, run:
>NOTE: Alternatively, you can use the --delete-expired, --merge-expired and --delete-wal options together with the `backup` command to remove and merge the outdated backup copies once the new backup is created.
Since incremental backups require that their parent full backup and all the preceding incremental backups are available, if any of such backups expire, they still cannot be removed while at least one incremental backup in this chain satisfies the retention policy. To avoid keeping expired backups that are still required to restore an active incremental one, you can merge them with this backup using the --merge-expired option when running `backup` or `delete` commands.
Suppose you have backed up the node instance in the node-backup directory, with the retention-window option is set to 7, and you have the following backups available on April 10, 2019:
Even though P7XDHB and P7XDHU backups are outside the retention window, P7XDHB and P7XDHU cannot be removed as it invalidates the succeeding incremental backups P7XDJA and P7XDQV that are still required, so if you run the `delete` command with the `--delete-expired` option, only the P7XDFT full backup will be removed.
With the `--merge-expired` option, the P7XDJA backup is merged with the underlying P7XDHU and P7XDHB backups and becomes a full one, so there is no need to keep these expired backups anymore:
As you take more and more incremental backups, the total size of the backup catalog can substantially grow. To save disk space, you can merge incremental backups to their parent full backup by running the merge command, specifying the backup ID of the most recent incremental backup you would like to merge:
This command merges the specified incremental backup to its parent full backup, together with all incremental backups between them. Once the merge is complete, the incremental backups are removed as redundant. Thus, the merge operation is virtually equivalent to retaking a full backup and removing all the outdated backups, but it allows to save much time, especially for large data volumes.
Before the merge, pg_probackup validates all the affected backups to ensure that they are valid. You can check the current backup status by running the show command with the backup ID. If the merge is still in progress, the backup status is displayed as MERGING. The merge is idempotent, so you can restart the merge if it was interrupted.
This command will delete the backup with the specified **backup_id**, together with all the incremental backups that followed, if any. This way, you can delete some recent incremental backups, retaining the underlying full backup and some of the incremental backups that follow it.
Note that expired backups cannot be removed while at least one incremental backup that satisfies the retention policy is based on them. If you would like to minimize the number of backups still required to keep incremental backups valid, specify the `--merge-expired` option when running this command:
In this case, pg_probackup searches for the oldest incremental backup that satisfies the retention policy and merges this backup with the underlying full and incremental backups that have already expired, thus making it a full backup. Once the merge is complete, the remaining expired backups are deleted.
Before merging or deleting backups, you can run the delete command with the `--dry-run` option, which displays the status of all the available backups according to the current retention policy, without performing any irreversible actions.