mirror of https://github.com/pgbackrest/pgbackrest.git synced 2024-12-14 10:13:05 +02:00
48 Commits

Author SHA1 Message Date
Cynthia Shang
34c63276cd Automatically enable backup checksum delta when anomalies (e.g. timeline switch) are detected.
There are a number of cases where a checksum delta is more appropriate than the default time-based delta:

* Timeline has switched since the prior backup
* File timestamp is older than recorded in the prior backup
* File size changed but timestamp did not
* File timestamp is in the future compared to the start of the backup
* Online option has changed since the prior backup

A practical example is that checksum delta will be enabled after a failover to standby due to the timeline switch.  In this case, timestamps can't be trusted and our recommendation has been to run a full backup, which can impact the retention schedule and requires manual intervention.

Now, a checksum delta will be performed if the backup type is incr/diff.  This means more CPU will be used during the backup but the backup size will be smaller and the retention schedule will not be impacted.

Contributed by Cynthia Shang.
2018-11-01 11:31:25 -04:00
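The anomaly list in the commit above can be read as a simple decision function. The following is a minimal Python sketch of that reading, not pgBackRest's implementation: the `FileInfo` and `BackupInfo` types, the function names, and the way a prior backup is represented are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FileInfo:
    size: int
    mtime: int  # modification time, seconds since epoch


@dataclass
class BackupInfo:
    # Hypothetical summary of a backup; not pgBackRest's manifest format.
    timeline: int
    online: bool
    start_time: int
    files: Dict[str, FileInfo]


def delta_reasons(prior: BackupInfo, current: BackupInfo) -> List[str]:
    """Return the anomalies from the commit message that apply."""
    reasons = []
    if current.timeline != prior.timeline:
        reasons.append("timeline switch")
    if current.online != prior.online:
        reasons.append("online option changed")
    for name, cur in current.files.items():
        if cur.mtime > current.start_time:
            reasons.append(f"{name}: timestamp in the future")
        old = prior.files.get(name)
        if old is None:
            continue  # new file, nothing to compare against
        if cur.mtime < old.mtime:
            reasons.append(f"{name}: timestamp older than prior backup")
        elif cur.mtime == old.mtime and cur.size != old.size:
            reasons.append(f"{name}: size changed but timestamp did not")
    return reasons


def use_checksum_delta(backup_type, prior, current, delta_option=False):
    """Checksum delta is used when requested, or for incr/diff with anomalies."""
    if delta_option:
        return True
    return backup_type in ("incr", "diff") and bool(delta_reasons(prior, current))
```

With this sketch, an incremental taken after a failover (timeline 1 to timeline 2) returns `True` from `use_checksum_delta` even when every file timestamp looks unchanged.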
Cynthia Shang
880fbb5e57 Add checksum delta for incremental backups.
Use checksums rather than timestamps to determine if files have changed.  This is useful in cases where the timestamps may not be trustworthy, e.g. when performing an incremental after failing over to a standby.

If checksum delta is enabled then checksums will be used for verification of resumed backups, even if they are full.  Resumes have always used checksums to verify the files in the repository; enabling delta performs checksums on the database files as well.

Note that the user must manually enable this feature in cases where it would be useful or just keep it enabled all the time.  A future commit will address automatically enabling the feature in cases where it seems likely to be useful.

Contributed by Cynthia Shang.
2018-09-19 11:12:45 -04:00
David Steele
375ff9f9d2 Ignore all files in a linked tablespace directory except the subdirectory for the current version of PostgreSQL.
Previously an error would be generated if other files were present and not owned by the PostgreSQL user.  This hasn't been a big deal in practice but it could cause issues.

Also add tests to make sure the same logic applies with links to files, i.e. all other files in the directory should be ignored.  This was actually working correctly, but there were no tests for it before.
2018-08-31 16:06:40 -04:00
David Steele
70514061fd Fix issue where relative links in $PGDATA could be stored in the backup with the wrong path.
Relative link paths were being combined with the paths of previous links (relative or absolute) due to the $strPath variable being modified in the current iteration rather than simply being passed to the next level of recursion.

This issue did not affect absolute links and relative tablespace links were caught by other checks, though the error was confusing.

Reported by Cynthia Shang.
2018-08-30 16:27:36 -04:00
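The class of bug described in the commit above (a path variable overwritten inside the loop instead of a new value being passed to the recursive call) can be reproduced in a few lines. This is an illustrative Python sketch, not the original Perl: the in-memory `TREE`, the directory names, and both function names are hypothetical.

```python
import posixpath

# Hypothetical directory tree: directory -> {entry name: link target or None}.
TREE = {
    "/pgdata": {"link_a": "../links/a", "link_b": "../b", "PG_VERSION": None},
    "/links/a": {"file_a": None},
    "/b": {"file_b": None},
}


def resolve_links_buggy(path, resolved):
    for name, target in sorted(TREE.get(path, {}).items()):
        if target is not None:
            # Bug: `path` is overwritten, so the next link in this same
            # directory is resolved relative to the previous link's target.
            path = posixpath.normpath(posixpath.join(path, target))
            resolved[name] = path
            resolve_links_buggy(path, resolved)
    return resolved


def resolve_links_fixed(path, resolved):
    for name, target in sorted(TREE.get(path, {}).items()):
        if target is not None:
            # Fix: keep `path` intact and hand a separate value to the recursion.
            link_path = posixpath.normpath(posixpath.join(path, target))
            resolved[name] = link_path
            resolve_links_fixed(link_path, resolved)
    return resolved
```

In the buggy version `link_b` resolves to `/links/b` because `../b` is applied to `/links/a`; the fixed version resolves it to `/b`. Absolute targets are unaffected in either version because `posixpath.join` discards the base when the second argument is absolute, which matches the commit's note that absolute links were not affected.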
David Steele
14cde54b37 Limit manifest build recursion (i.e. links followed) to sixteen levels to detect link loops. 2018-08-28 16:27:10 -04:00
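A depth counter is one simple way to detect link loops without tracking visited paths. The sketch below shows the idea in Python under stated assumptions: the in-memory `tree` layout, the `LinkLoopError` name, and the exact boundary (whether the sixteenth or seventeenth level fails) are all hypothetical; only the limit of sixteen comes from the commit subject.

```python
import posixpath

MAX_LINK_DEPTH = 16  # limit named in the commit subject


class LinkLoopError(Exception):
    """Raised when more links are followed than the limit allows."""


def list_files(tree, path, depth=0):
    """List files under `path`, following links; `depth` counts links followed."""
    if depth > MAX_LINK_DEPTH:
        raise LinkLoopError(f"link recursion exceeded {MAX_LINK_DEPTH} levels at {path}")
    files = []
    for name, target in sorted(tree.get(path, {}).items()):
        if target is None:
            files.append(posixpath.join(path, name))
        else:
            dest = posixpath.normpath(posixpath.join(path, target))
            # Pass depth + 1 down rather than mutating a shared counter.
            files.extend(list_files(tree, dest, depth + 1))
    return files
```

A link that points back at its own directory, such as `{"/pgdata": {"self": "."}}`, raises `LinkLoopError` instead of recursing until the interpreter's own limit is hit.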
David Steele
a6cecf7d5e Prevent manifest from being built more than once. 2018-08-28 16:22:30 -04:00
David Steele
bef58a7974 Allow arbitrary directories and/or files to be excluded from a backup.
Misuse of this feature can lead to inconsistent backups so read the --exclude documentation carefully before using.
2018-08-27 15:51:05 -04:00
Cynthia Shang
eb30d88b6a Allow zero-size files in backup manifest to reference a prior manifest regardless of timestamp delta.
Contributed by Cynthia Shang.
2018-08-24 16:50:33 -04:00
Cynthia Shang
bec4c176dc Exclude temporary and unlogged relation (table/index) files from backup.
Implemented using the same logic as the patches adding this feature to PostgreSQL, 8694cc96 and 920a5e50. Temporary relation exclusion is enabled in PostgreSQL ≥ 9.0. Unlogged relation exclusion is enabled in PostgreSQL ≥ 9.1, where the feature was introduced.

Contributed by Cynthia Shang.
2018-07-30 18:53:34 -04:00
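As a rough illustration of the exclusion rules in the commit above, the Python sketch below filters the file names of one database directory. It assumes the usual PostgreSQL naming conventions (temporary relations named `t<backend>_<filenode>`, and unlogged relations identified by the presence of a `<filenode>_init` fork, which is itself kept); those patterns and every name in the code are assumptions for illustration, not taken from the commit or from pgBackRest.

```python
import re

# Assumed naming patterns; optional fork suffix and optional segment number.
TEMP_RE = re.compile(r"^t\d+_\d+(?:_(?:fsm|vm|init))?(?:\.\d+)?$")
REL_RE = re.compile(r"^(?P<node>\d+)(?:_(?P<fork>fsm|vm|init))?(?:\.\d+)?$")


def backup_files(names, pg_version):
    """Return the names to back up; pg_version is a tuple such as (9, 6)."""
    # Unlogged exclusion applies from 9.1: collect filenodes that have an init fork.
    init_nodes = set()
    if pg_version >= (9, 1):
        for name in names:
            m = REL_RE.match(name)
            if m and m.group("fork") == "init":
                init_nodes.add(m.group("node"))

    keep = []
    for name in names:
        # Temporary relation exclusion applies from 9.0.
        if pg_version >= (9, 0) and TEMP_RE.match(name):
            continue
        m = REL_RE.match(name)
        # Drop every fork of an unlogged relation except the init fork.
        if m and m.group("node") in init_nodes and m.group("fork") != "init":
            continue
        keep.append(name)
    return keep
```

The version thresholds follow the commit text: nothing is filtered below 9.0, only temporary relations on 9.0, and both kinds from 9.1 onward.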
Cynthia Shang
0acf705416 Require PostgreSQL catalog version when instantiating a Manifest object (and not loading it from disk).
Contributed by Cynthia Shang.
2018-07-16 17:25:15 -04:00
David Steele
350b30fa49 Move cryptographic hash functions to C using OpenSSL. 2018-06-11 14:52:26 -04:00
David Steele
5e090ba305 Fix failure in manifest build when two or more files in PGDATA are linked to the same directory.
Reported by Vitaliy Kukharik.
2018-05-02 12:19:54 -04:00
David Steele
be90028100 Rename db-* options to pg-* and backup-* options to repo-* to improve consistency.
* repo-* options are now indexed although only one is allowed.
* List deprecated option names in documentation and command-line help.
2018-02-03 18:27:38 -05:00
Cynthia Shang
00f58ec8c0 Fixed inability to restore a single database contained in a tablespace using --db-include.
Fixed by Cynthia Shang.
2018-01-30 16:13:54 -05:00
Cynthia Shang
bd74711ceb Add unit tests for the Manifest module.
Also minor changes to Manifest module, mostly for test reproducibility.

Contributed by Cynthia Shang.
2017-11-28 11:44:24 -05:00
Cynthia Shang
b03c26968a Repository encryption support.
Contributed by Cynthia Shang.
2017-11-06 12:51:12 -05:00
David Steele
6343fdd584 Additional backup exclusions.
* Exclude contents of pg_snapshots, pg_serial, pg_notify, and pg_dynshmem from backup since they are rebuilt on startup.
* Exclude pg_internal.init files from backup since they are rebuilt on startup.
2017-09-04 08:26:57 -04:00
David Steele
fcb7c6fd1d PostgreSQL 10 support. 2017-09-01 12:29:34 -04:00
David Steele
1e0ed07455 Configuration rules are now pulled from the C library when present. 2017-08-25 16:47:47 -04:00
David Steele
d5c1f02c72 Include archive_status directory in online backups.
The archive_status directory is now recreated on restore to support PostgreSQL 8.3 which does not recreate it automatically like more recent versions do.

Also fixed log checking after PostgreSQL shuts down to include FATAL messages and disallow immediate shutdowns which can throw FATAL errors in the log.

Reported by Stephen Frost.
2017-07-24 07:57:47 -04:00
David Steele
2310e423e9 Fixed an issue that prevented tablespaces from being backed up on PostgreSQL ≤ 8.4.
The integration tests that were supposed to prevent this regression did not work as intended.  They verified the contents of a table in the (supposedly) restored tablespace, deleted the table, and then deleted the tablespace.  All of this was deemed sufficient to prove that the tablespace had been restored correctly and was valid.

However, PostgreSQL will happily recreate a tablespace on the basis of a single full-page write, at least in the affected versions.  Since writes to the test table were replayed from WAL with each recovery, all the tests passed even though the tablespace was missing after the restore.

The tests have been updated to include direct comparisons against the file system and a new table that is not replayed after a restore because it is created before the backup and never modified again.

Versions ≥ 9.0 were not affected due to numerous synthetic integration tests that verify backups and restores file by file.
2017-06-27 16:47:40 -04:00
David Steele
de7fc37f88 Storage and IO layer refactor:
Refactor storage layer to allow for new repository filesystems using drivers. (Reviewed by Cynthia Shang.)
Refactor IO layer to allow for new compression formats, checksum types, and other capabilities using filters. (Reviewed by Cynthia Shang.)
2017-06-09 17:51:41 -04:00
David Steele
3d84f2ce5e Improvements to Ini.pm.
* Refactor Ini.pm to facilitate testing.
* Complete statement/branch coverage for Ini.pm.
* Improved functions used to test/munge manifest and info files.
2017-04-10 13:24:45 -04:00
David Steele
02730526fc Fixed an issue where databases created with a non-default tablespace would raise bogus warnings about pg_filenode.map and pg_internal.init not being page aligned.
Reported by blogh.
2017-03-02 13:50:29 -05:00
David Steele
36a5349b1c Added the --checksum-page option.
This option allows pgBackRest to validate page checksums in data files when checksums are enabled on PostgreSQL >= 9.3. Note that this functionality requires a C library which may not initially be available in OS packages. The option will automatically be enabled when the library is present and checksums are enabled on the cluster.
2016-12-12 18:54:07 -05:00
David Steele
a850335015 Simplified the result hash of File->manifest(), Db->tablespaceMapGet(), and Db->databaseMapGet(). 2016-11-30 14:36:39 -05:00
David Steele
dd621081b9 Fixed an issue where tablespace paths with the same prefix would cause an invalid link error.
Reported by Nikhilchandra Kulkarni.
2016-11-07 16:37:16 +02:00
David Steele
f43e5bc52d Removed extraneous use lib directives from Perl modules.
Suggested by Devrim Gündüz.
2016-11-04 13:56:26 +02:00
David Steele
a701309453 Converted Perl threads to processes. 2016-09-06 09:35:02 -04:00
David Steele
bcdb5cdac8 Fixed an issue where tablespaces were copied from the master during standby backup. 2016-09-04 09:19:44 -04:00
David Steele
2feaaf225e Exclude contents of $PGDATA/pg_replslot directory. 2016-09-04 09:13:13 -04:00
David Steele
5ada189a92 Backup from a standby cluster.
A connection to the primary cluster is still required to start/stop the backup and copy files that are not replicated, but the vast majority of files are copied from the standby in order to reduce load on the master.
2016-08-25 11:25:46 -04:00
David Steele
cd6278e5af Revert some backup exclusions until they have been tested more thoroughly. 2016-08-24 12:27:48 -04:00
David Steele
f1412baccf Exclude directories during backup that are cleaned, recreated, or zeroed by PostgreSQL at startup.
These include (depending on the version where they were introduced): pgsql_tmp, pg_dynshmem, pg_notify, pg_replslot, pg_serial, pg_snapshots, pg_stat_tmp, pg_subtrans. The postgresql.auto.conf.tmp file is now excluded in addition to files that were already excluded: backup_label.old, postmaster.opts, postmaster.pid, recovery.conf, recovery.done.
2016-08-16 09:35:16 -04:00
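The exclusion list in the commit above maps naturally onto a path predicate. The Python sketch below is one possible shape for it, with several assumptions made for illustration: paths are relative to PGDATA, the listed directories are kept as empty entries while their contents are skipped, `pgsql_tmp` may appear at any depth, and the function name is hypothetical. Per-version gating ("depending on the version where they were introduced") is left out.

```python
from pathlib import PurePosixPath

# Names taken from the commit message.
TOP_LEVEL_DIRS = {
    "pg_dynshmem", "pg_notify", "pg_replslot", "pg_serial",
    "pg_snapshots", "pg_stat_tmp", "pg_subtrans",
}
TEMP_DIR = "pgsql_tmp"
TOP_LEVEL_FILES = {
    "postgresql.auto.conf.tmp", "backup_label.old", "postmaster.opts",
    "postmaster.pid", "recovery.conf", "recovery.done",
}


def is_excluded(rel_path):
    """Return True if a path relative to PGDATA should be skipped."""
    parts = PurePosixPath(rel_path).parts
    if not parts:
        return False
    if len(parts) == 1:
        # Top-level entry: only the listed files are skipped outright.
        return parts[0] in TOP_LEVEL_FILES
    # Anything inside a listed top-level directory, or inside any pgsql_tmp.
    return parts[0] in TOP_LEVEL_DIRS or TEMP_DIR in parts[:-1]
```

For example, `is_excluded("pg_notify/0000")` is true while `is_excluded("pg_notify")` is false, so the directory itself can still be recreated on restore.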
David Steele
1e0f15f425 Improve error message for links that reference links in manifest build. 2016-08-15 17:23:37 -04:00
David Steele
f9fa1270b2 Fixed #236: Recursive user tablespace symlink.
A tablespace link that referenced another link would not produce an error, but instead skip the tablespace entirely.
2016-08-15 17:11:45 -04:00
David Steele
17b79d6279 Database version refactoring.
* Refactor db version constants into a separate module.
* Update synthetic backup tests to PostgreSQL 9.4.
2016-08-11 22:35:24 -04:00
David Steele
bff262ac47 Removed all OP_* function constants that were used only for debugging, not in the protocol, and replaced with __PACKAGE__. 2016-08-11 17:32:28 -04:00
David Steele
34afe5e85b Fixed issue with tablespace link checking.
* Tablespace paths that had $PGDATA as a substring would be identified as subdirectories of $PGDATA even when they were not.
* Also hardened relative path checking a bit.
2016-08-09 09:05:27 -04:00
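The substring problem described in the commit above is a common pitfall in path containment checks. The Python sketch below contrasts a substring test with a component-aware test; both function names and the example paths are hypothetical, and `posixpath.normpath` only collapses `.` and `..` lexically (it does not resolve symlinks).

```python
import posixpath


def is_within_naive(path, parent):
    # Substring test: wrongly matches siblings such as /var/lib/pgsql/data_ts.
    return parent in path


def is_within(path, parent):
    """True only if `path` is `parent` itself or lies underneath it."""
    path = posixpath.normpath(path)
    parent = posixpath.normpath(parent)
    # Requiring a trailing separator stops a shared prefix from matching.
    return path == parent or path.startswith(parent.rstrip("/") + "/")
```

With `parent="/var/lib/pgsql/data"`, the naive check accepts `/var/lib/pgsql/data_ts` while `is_within` rejects it, and a relative escape such as `/var/lib/pgsql/data/../ts` is also rejected after normalization.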
David Steele
a3b8808f94 Fixed an issue where the contents of pg_xlog were being copied if the directory was symlinked. 2016-07-29 18:44:53 -04:00
David Steele
cc2a8777d5 User/group permissions improvements.
Improved handling of users/groups captured during backup that do not exist on the restore host. Also explicitly handle the case where user/group is not mapped to a name.
2016-06-26 21:01:20 -04:00
David Steele
23a3911830 Stop using pg_xlogfile_name().
The pg_xlogfile_name() function is no longer used to construct WAL filenames from LSNs. While this function is convenient it is not available on a standby. Instead, the archive is searched for the LSN in order to find the timeline. If due to some misadventure the LSN appears on multiple timelines then an error will be thrown, whereas before this condition would have passed unnoticed.
2016-06-24 08:06:20 -04:00
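The archive search described in the commit above can be sketched as follows. This Python example assumes the default 16MB WAL segment size, the standard 24-hex-digit segment name (8 digits of timeline followed by 16 digits derived from the LSN), and an archive listing whose entries begin with that segment name; the function names and the error type are hypothetical and this is not pgBackRest's code.

```python
import re

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default segment size
WAL_NAME_RE = re.compile(r"^[0-9A-F]{24}")


def wal_segment_suffix(lsn):
    """Return the 16 hex digits of the segment name that follow the timeline."""
    high, low = (int(part, 16) for part in lsn.split("/"))
    return f"{high:08X}{low // WAL_SEGMENT_SIZE:08X}"


def find_wal_segment(lsn, archive_files):
    """Find the archived segment containing `lsn`, whatever its timeline."""
    suffix = wal_segment_suffix(lsn)
    matches = sorted({
        name[:24] for name in archive_files
        if WAL_NAME_RE.match(name) and name[8:24] == suffix
    })
    if not matches:
        raise LookupError(f"no WAL segment found for LSN {lsn}")
    if len(matches) > 1:
        # The same LSN on more than one timeline is treated as an error.
        raise LookupError(f"LSN {lsn} found on multiple timelines: {matches}")
    return matches[0]
```

For LSN `1/A3000028` the suffix is `00000001000000A3`, so an archive containing a file that starts with `0000000100000001000000A3` yields that segment on timeline 1.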
David Steele
0451d3afdd Support for non-exclusive backups in PostgreSQL 9.6. 2016-05-16 17:59:26 -04:00
David Steele
9b5a27f657 Add Manifest->addFile().
Some files need to be added to the manifest after the initial build.  This is currently done in only one place but usage will expand in the future so the functionality has been encapsulated in addFile().
2016-05-14 10:39:56 -04:00
David Steele
512d006346 Refactor database version identification for archive and backup commands.
Added database version constants and changed version identification code to use hash tables instead of if-else.  Propagated the db version constants to the rest of the code and in passing fixed some path/filename constants.

Added new regression tests to check that specific files are never copied.
2016-05-14 10:33:12 -04:00
David Steele
0c320e7df7 Allow selective restore of databases from a cluster backup.
This feature can result in major space and time savings when only specific databases are restored. Unrestored databases will not be accessible but must be manually dropped before they will be removed from the shared catalogue.
2016-05-11 09:21:39 -04:00
David Steele
9457e15347 New manifest format.
* All files and directories linked from PGDATA are now included in the backup. By default links will be restored directly into PGDATA as files or directories. The --link-all option can be used to restore all links to their original locations. The --link-map option can be used to remap a link to a new location.

* Removed --tablespace option and replaced with --tablespace-map-all option which should more clearly indicate its function.

* Added detail log level which will output more information than info without being as verbose as debug.
2016-04-14 22:50:02 -04:00
David Steele
18fd25233b New simpler configuration and consistent project/exe/path naming.
* The repo-path option now always refers to the repository where backups and archive are stored, whether local or remote, so the repo-remote-path option has been removed. The new spool-path option can be used to define a location for queueing WAL segments when archiving asynchronously. Otherwise, a local repository is no longer required.

* Implemented a new config format which should be far simpler to use. See the User Guide and Configuration Reference for details but for a simple configuration all options can now be placed in the stanza section. Options that are shared between stanzas can be placed in the [global] section. More complex configurations can still make use of command sections though this should be a rare use case.

* The default configuration filename is now pgbackrest.conf instead of pg_backrest.conf. This was done for consistency with other naming changes but also to prevent old config files from being loaded accidentally.

* The default repository name was changed from /var/lib/backup to /var/lib/pgbackrest.

* Lock files are now stored in /tmp/pgbackrest by default. These days /run/pgbackrest would be the preferred location but that would require init scripts which are not part of this release. The lock-path option can be used to configure the lock directory.

* Log files are now stored in /var/log/pgbackrest by default and no longer have the date appended so they can be managed with logrotate. The log-path option can be used to configure the log directory.

* Executable filename changed from pg_backrest to pgbackrest.
2016-04-14 09:30:54 -04:00
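The stanza and [global] layering described in the configuration commit above can be illustrated with a small lookup routine. The Python sketch below uses the standard `configparser` module; the stanza name `main`, the `db-path` option, the sample values, and the assumption that a stanza value takes precedence over a [global] value are all illustrative and are not confirmed by the commit message. The defaults are the ones the commit lists. Command sections are not modelled.

```python
import configparser

# Defaults named in the commit message.
DEFAULTS = {
    "repo-path": "/var/lib/pgbackrest",
    "lock-path": "/tmp/pgbackrest",
    "log-path": "/var/log/pgbackrest",
}

# Hypothetical pgbackrest.conf contents.
SAMPLE = """
[global]
repo-path=/backup/repo
log-path=/var/log/pgbackrest

[main]
db-path=/var/lib/postgresql/9.5/main
log-path=/var/log/pgbackrest/main
"""


def load_config(text):
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(text)
    return config


def resolve_option(config, stanza, option):
    """Look in the stanza first, then [global], then the built-in defaults."""
    for section in (stanza, "global"):
        if config.has_option(section, option):
            return config.get(section, option)
    return DEFAULTS.get(option)
```

With the sample above, `repo-path` for stanza `main` comes from [global], `log-path` comes from the stanza, and `lock-path` falls back to the default of /tmp/pgbackrest.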