1
0
mirror of https://github.com/pgbackrest/pgbackrest.git synced 2024-12-14 10:13:05 +02:00
Commit Graph

363 Commits

Author SHA1 Message Date
David Steele
bf873be4aa Redact authentication header when throwing S3 errors.
The authentication header contains the access key (not the secret key) so don't include it in errors that can be seen at any log level.

Suggested by Brad Nicholson.
2018-12-05 12:51:13 -05:00
David Steele
1ad67644da Remove request for S3 object info directly after putting it.
After a file is copied during backup the size is requested from the storage in case it differs from what was written so that repo-size can be reported accurately. This is useful for situations where compression is being done by the filesystem (e.g. ZFS) and what is stored can differ in size from what was written.

In S3 the reported size will always be exactly what was written so there is no need to check the size and doing so immediately can cause problems because the new file might not appear in list commands. This has not been observed on S3 (though it seems to be possible) but it has been reported on the Swift S3 gateway.

Add a driver capability to determine if size needs to be called after a file is written and if not then simply use the number of bytes written for repo-size.

Reported by Matt Kunkel.
2018-11-30 10:38:02 -05:00
David Steele
801e2a5a2c Rename PGBACKREST/BACKREST constants to PROJECT.
This brings consistency between the C and Perl constants and allows for easier code reuse.
2018-11-24 19:05:03 -05:00
David Steele
b0659278cc Add ServiceError for errors from a service that can be retried.
An example is HTTP 5xx errors which should mostly be retried.
2018-11-16 17:22:22 -05:00
David Steele
6532912d51 Begin v2.08 development. 2018-11-16 10:04:14 -05:00
David Steele
04d9e4d5a8 v2.07: Automatic Backup Checksum Delta
Bug Fixes:

* Fix issue with archive-push-queue-max not being honored on connection error. (Reported by Lardière Sébastien.)
* Fix static WAL segment size used to determine if archive-push-queue-max has been exceeded.
* Fix error after log file open failure when processing should continue. (Reported by vthriller.)

Features:

* Automatically enable backup checksum delta when anomalies (e.g. timeline switch) are detected. (Contributed by Cynthia Shang.)

Improvements:

* Retry all S3 5xx errors rather than just 500 internal errors. (Suggested by Craig A. James.)
2018-11-16 09:50:50 -05:00
David Steele
72ea47bfb3 Add KernelError to report miscellaneous kernel errors. 2018-11-11 18:07:56 -05:00
David Steele
48d2795f31 Merge crypto/random module into crypto/crypto.
There wasn't enough code to justify a separate module/test and it seems to fit just fine in crypto/crypto.
2018-11-06 20:04:16 -05:00
David Steele
8efa5e6a6a Rename CipherError to CryptoError.
This aligns with the general renaming from cipher to crypto.
2018-11-06 19:38:38 -05:00
Cynthia Shang
34c63276cd Automatically enable backup checksum delta when anomalies (e.g. timeline switch) are detected.
There are a number of cases where a checksum delta is more appropriate than the default time-based delta:

* Timeline has switched since the prior backup
* File timestamp is older than recorded in the prior backup
* File size changed but timestamp did not
* File timestamp is in the future compared to the start of the backup
* Online option has changed since the prior backup

A practical example is that checksum delta will be enabled after a failover to standby due to the timeline switch.  In this case, timestamps can't be trusted and our recommendation has been to run a full backup, which can impact the retention schedule and requires manual intervention.

Now, a checksum delta will be performed if the backup type is incr/diff.  This means more CPU will be used during the backup but the backup size will be smaller and the retention schedule will not be impacted.

Contributed by Cynthia Shang.
2018-11-01 11:31:25 -04:00
David Steele
cca7a4ffd4 Retry all S3 5xx errors rather than just 500 internal errors.
We were already retrying 500 errors but 503 (rate-limiting) errors were not being retried and would cause an instant failure which aborted the command.

There are only two 5xx errors currently implemented by S3 but instead of adding 503 simply retry all 5xx errors. This is consistent with the http definition of this error class, "the server failed to fulfill an apparently valid request."

Suggested by Craig A. James.
2018-10-30 16:45:42 -04:00
David Steele
286f7e5011 Fix static WAL segment size used to determine if archive-push-queue-max has been exceeded.
This calculation was missed when the WAL segment size was made dynamic in preparation for PostgreSQL 11.

Fix the calculation by checking the actual WAL file sizes instead of using an estimate based on WAL segment size.  This is more accurate because it takes into account .history and .backup files, which are smaller.  Since the calculation is done in the async process the additional processing time should not adversely affect performance.

Remove the PG_WAL_SIZE constant and instead use local constants where the old value is still required.  This is only the case for some tests and PostgreSQL 8.3 which does not provide a way to get the WAL segment size from pg_control.
2018-10-27 20:00:00 +01:00
David Steele
41b00dc204 Fix issue with archive-push-queue-max not being honored on connection error.
If an error occurred while acquiring a lock on a remote server the error would be reported correctly, but the queue max detection code was not reached.  The tests failed to detect this because they fixed the connection before queue max, allowing the ccde to be reached.

Move the queue max code before the lock so it will run even when remote connections are not working.  This means that no attempt will be made to transfer WAL once queue max has been exceeded, but it makes it much more likely that the code will be reach without error.

Update tests to continue errors up to the point where queue max is exceeded.

Reported by Lardière Sébastien.
2018-10-27 16:57:57 +01:00
David Steele
06d68eada0 Begin v2.07 development. 2018-10-16 17:21:01 +01:00
David Steele
904550c97f v2.06: Checksum Delta Backup and PostgreSQL 11 Support
Bug Fixes:

* Fix missing missing URI encoding in S3 driver. (Reported by Dan Farrell.)
* Fix incorrect error message for duplicate options in configuration files. (Reported by Jesper St John.)
* Fix incorrectly reported error return in info logging. A return code of 1 from the archive-get was being logged as an error message at info level but otherwise worked correctly.

Features:

* Add checksum delta for incremental backups which uses checksums rather than timestamps to determine if files have changed. (Contributed by Cynthia Shang.)
* PostgreSQL 11 support, including configurable WAL segment size.

Improvements:

* Ignore all files in a linked tablespace directory except the subdirectory for the current version of PostgreSQL. Previously an error would be generated if other files were present and not owned by the PostgreSQL user.
* Improve info command to display the stanza cipher type. (Contributed by Cynthia Shang. Suggested by Douglas J Hunley.)
* Improve support for special characters in filenames.
* Allow delta option to be specified in the pgBackRest configuration file. (Contributed by Cynthia Shang.)
2018-10-16 14:56:51 +01:00
David Steele
d038b9a029 Support configurable WAL segment size.
PostgreSQL 11 introduces configurable WAL segment sizes, from 1MB to 1GB.

There are two areas that needed to be updated to support this: building the archive-get queue and checking that WAL has been archived after a backup.  Both operations require the WAL segment size to properly build a list.

Checking the archive after a backup is still implemented in Perl and has an active database connection, so just get the WAL segment size from the database.

The archive-get command does not have a connection to the database, so get the WAL segment size from pg_control instead.  This requires a deeper inspection of pg_control than has been done in the past, so it seemed best to copy the relevant data structures from each version of PostgreSQL and build a generic interface layer to address them.  While this approach is a bit verbose, it has the advantage of being relatively simple, and can easily be updated for new versions of PostgreSQL.

Since the integration tests generate pg_control files for testing, teach Perl how to generate files with the correct offsets for both 32-bit and 64-bit architectures.
2018-09-25 10:24:42 +01:00
David Steele
c0b0b4e541 PostgreSQL 11 Beta 4 support.
Catalog version changed for this release, so update it.

Also update and upload a new container with beta 4 installed.
2018-09-21 13:25:27 -04:00
Cynthia Shang
880fbb5e57 Add checksum delta for incremental backups.
Use checksums rather than timestamps to determine if files have changed.  This is useful in cases where the timestamps may not be trustworthy, e.g. when performing an incremental after failing over to a standby.

If checksum delta is enabled then checksums will be used for verification of resumed backups, even if they are full.  Resumes have always used checksums to verify the files in the repository, enabling delta performs checksums on the database files as well.

Note that the user must manually enable this feature in cases were it would be useful or just keep in enabled all the time.  A future commit will address automatically enabling the feature in cases where it seems likely to be useful.

Contributed by Cynthia Shang.
2018-09-19 11:12:45 -04:00
Cynthia Shang
b6b2c915b2 Allow hashSize() to run on remote storage.
Apparently we never needed to run this function remotely.

It will be needed by the backup checksum delta feature, so implement it now.

Contributed by Cynthia Shang.
2018-09-18 11:39:48 -04:00
Cynthia Shang
052e483057 Restore bIgnoreMissing flag in backupFile() lost in storage refactor.
The test to make sure that some files (e.g. pg_control) do not get removed during the backup was lost during the storage refactor committed at de7fc37f.

This did not impact the integrity of the backups, but bring it back since it is a nice sanity check.

Contributed by Cynthia Shang.
2018-09-18 10:18:39 -04:00
David Steele
9e574a37dc Make archive-get info messages consistent between C and Perl implementations.
The info messages were spread around and logged differently based on the execution path and in some cases logged nothing at all.

Temporarily track the async server status with a flag so that info messages are not output in the async process.  The async process will be refactored as a separate command to be exec'd in a future commit.
2018-09-11 12:30:48 -04:00
Cynthia Shang
e351b8c67c Improve info command to display the stanza cipher type.
Contributed by Cynthia Shang.
Suggested by Douglas J Hunley.
2018-09-10 13:09:45 -04:00
David Steele
c688bc8627 Improve support for special characters in filenames.
% characters caused issues in backup/restore due to filenames being appended directly into a format string.

Reserved XML characters (<>&') caused issues in the S3 driver due to improper escaping.

Add a file with all common special characters to regression testing.
2018-09-10 10:54:34 -04:00
David Steele
80ef6fce75 Fix missing missing URI encoding in S3 driver.
File names with uncommon characters (e.g. @) caused authentication failures due to S3 encoding them correctly while the S3 driver did not.

Reported by Dan Farrell.
2018-09-10 10:47:00 -04:00
David Steele
6361a06181 Fix incorrectly reported error return in info logging.
A return code of 1 from the archive-get was being logged as an error message at info level but otherwise worked correctly.

Also improve info messages when an archive segment is or is not found.
2018-09-04 21:46:41 -04:00
David Steele
375ff9f9d2 Ignore all files in a linked tablespace directory except the subdirectory for the current version of PostgreSQL.
Previously an error would be generated if other files were present and not owned by the PostgreSQL user.  This hasn't been a big deal in practice but it could cause issues.

Also add tests to make sure the same logic applies with links to files, i.e. all other files in the directory should be ignored.  This was actually working correctly, but there were no tests for it before.
2018-08-31 16:06:40 -04:00
David Steele
41746b53cd Begin v2.06 development. 2018-08-31 14:24:36 -04:00
David Steele
bc7462d86d v2.05: Environment Variable Options and Exclude Temporary/Unlogged Relations
Bug Fixes:

* Fix issue where relative links in $PGDATA could be stored in the backup with the wrong path. This issue did not affect absolute links and relative tablespace links were caught by other checks. (Reported by Cynthia Shang.)
* Remove incompletely implemented online option from the check command. Offline operation runs counter to the purpose of this command, which is to check if archiving and backups are working correctly. (Reported by Jason O'Donnell.)
* Fix issue where errors raised in C were not logged when called from Perl. pgBackRest properly terminated with the correct error code but lacked an error message to aid in debugging. (Reported by Douglas J Hunley.)
* Fix issue when a boolean option (e.g. delta) was specified more than once. (Reported by Yogesh Sharma.)

Features:

* Allow any option to be set in an environment variable. This includes options that previously could only be specified on the command line, e.g. stanza, and secret options that could not be specified on the command-line, e.g. repo1-s3-key-secret.
* Exclude temporary and unlogged relation (table/index) files from backup. Implemented using the same logic as the patches adding this feature to PostgreSQL, 8694cc96 and 920a5e50. Temporary relation exclusion is enabled in PostgreSQL ≥ 9.0. Unlogged relation exclusion is enabled in PostgreSQL ≥ 9.1, where the feature was introduced. (Contributed by Cynthia Shang.)
* Allow arbitrary directories and/or files to be excluded from a backup. Misuse of this feature can lead to inconsistent backups so read the --exclude documentation carefully before using. (Reviewed by Cynthia Shang.)
* Add log-subprocess option to allow file logging for local and remote subprocesses.
* PostgreSQL 11 Beta 3 support.

Improvements:

* Allow zero-size files in backup manifest to reference a prior manifest regardless of timestamp delta. (Contributed by Cynthia Shang.)
* Improve asynchronous archive-get/archive-push performance by directly checking status files. (Contributed by Stephen Frost.)
* Improve error message when a command is missing the stanza option. (Suggested by Sarah Conway.)
2018-08-31 13:19:43 -04:00
David Steele
d41570c37a Improve log file names for remote processes started by locals.
The log-subprocess feature added in 22765670 failed to take into account the naming for remote processes spawned by local processes.  Not only was the local command used for the naming of log files but the process id was not pass through.  This meant every remote log was named "[stanza]-local-remote-000" which is confusing and meant multiple processes were writing to the same log.

Instead, pass the real command and process id to the remote.  This required a minor change in locking to ignore locks if process id is greater than 0 since remotes started by locals never lock.
2018-08-31 11:31:13 -04:00
David Steele
70514061fd Fix issue where relative links in $PGDATA could be stored in the backup with the wrong path.
Relative link paths were being combined with the paths of previous links (relative or absolute) due to the $strPath variable being modified in the current iteration rather than simply being passed to the next level of recursion.

This issue did not affect absolute links and relative tablespace links were caught by other checks, though the error was confusing.

Reported by Cynthia Shang.
2018-08-30 16:27:36 -04:00
David Steele
14cde54b37 Limit manifest build recursion (i.e. links followed) to sixteen levels to detect link loops. 2018-08-28 16:27:10 -04:00
David Steele
a6cecf7d5e Prevent manifest from being built more than once. 2018-08-28 16:22:30 -04:00
David Steele
bef58a7974 Allow arbitrary directories and/or files to be excluded from a backup.
Misuse of this feature can lead to inconsistent backups so read the --exclude documentation carefully before using.
2018-08-27 15:51:05 -04:00
David Steele
77dca5b968 Allow command/option constants to autonumber in both C and Perl to reduce churn when a new command/option is added. 2018-08-24 19:31:45 -04:00
Cynthia Shang
eb30d88b6a Allow zero-size files in backup manifest to reference a prior manifest regardless of timestamp delta.
Contributed by Cynthia Shang.
2018-08-24 16:50:33 -04:00
David Steele
0ed37ab9e7 Update Archive::Info->archiveIdList() to return a valid error code instead of unknown. 2018-08-24 12:13:10 -04:00
David Steele
2276567027 Add log-subprocess option to allow file logging for local and remote subprocesses. 2018-08-22 20:05:49 -04:00
David Steele
3434240097 Remove incompletely implemented online option from the check command.
Offline operation runs counter to the purpose of this command, which is to check if archiving and backups are working correctly.

Reported by Jason O'Donnell.
2018-08-12 19:24:21 -04:00
David Steele
7993f1a966 Add basic C JSON parser. 2018-08-09 08:06:23 -04:00
Cynthia Shang
bec4c176dc Exclude temporary and unlogged relation (table/index) files from backup.
Implemented using the same logic as the patches adding this feature to PostgreSQL, 8694cc96 and 920a5e50. Temporary relation exclusion is enabled in PostgreSQL ≥ 9.0. Unlogged relation exclusion is enabled in PostgreSQL ≥ 9.1, where the feature was introduced.

Contributed by Cynthia Shang.
2018-07-30 18:53:34 -04:00
David Steele
1359e2908c Fix issue where errors raised in C were not logged when called from Perl.
pgBackRest properly terminated with the correct error code but lacked an error message to aid in debugging.

Reported by Douglas J Hunley.
2018-07-20 08:11:34 -04:00
David Steele
d3cfeebdf9 Rename error-handling variables in Main.pm to conform to standard. 2018-07-20 08:03:44 -04:00
Cynthia Shang
0acf705416 Require PostgreSQL catalog version when instantiating a Manifest object (and not loading it from disk).
Contributed by Cynthia Shang.
2018-07-16 17:25:15 -04:00
David Steele
b1bc53657d Begin v2.05 development. 2018-07-09 08:15:16 -04:00
David Steele
a8143ec125 v2.04: Critical Bug Fix for Backup Resume
IMPORTANT NOTE: This release fixes a critical bug in the backup resume feature. All resumed backups prior to this release should be considered inconsistent. A backup will be resumed after a prior backup fails, unless resume=n has been specified. A resumed backup can be identified by checking the backup log for the message "aborted backup of same type exists, will be cleaned to remove invalid files and resumed". If the message exists, do not use this backup or any backup in the same set for a restore and check the restore logs to see if a resumed backup was restored. If so, there may be inconsistent data in the cluster.

Bug Fixes:

* Fix critical bug in resume that resulted in inconsistent backups. A regression in v0.82 removed the timestamp comparison when deciding which files from the aborted backup to keep on resume. See note above for more details. (Reported by David Youatt, Yogesh Sharma, Stephen Frost.)
* Fix error in selective restore when only one user database exists in the cluster. (Fixed by Cynthia Shang. Reported by Nj Baliyan.)
* Fix non-compliant ISO-8601 timestamp format in S3 authorization headers. AWS and some gateways were tolerant of space rather than zero-padded hours while others were not. (Fixed by Andrew Schwartz.)

Features:

* PostgreSQL 11 Beta 2 support.

Improvements:

* Improve the HTTP client to set content-length to 0 when not specified by the server. S3 (and gateways) always set content-length or transfer-encoding but HTTP 1.1 does not require it and proxies (e.g. HAProxy) may not include either. (Suggested by Adam K. Sumner.)
* Set search_path = 'pg_catalog' on PostgreSQL connections. (Suggested by Stephen Frost.)
2018-07-05 20:16:41 -04:00
David Steele
db17973cd0 Fix critical bug in resume that resulted in inconsistent backups.
A regression in v0.82 removed the timestamp comparison when deciding which files from the aborted backup to keep on resume. All resumed backups should be considered inconsistent. A resumed backup can be identified by checking the log for the message "aborted backup of same type exists, will be cleaned to remove invalid files and resumed".

Reported by David Youatt, Yogesh Sharma, Stephen Frost.
2018-07-03 14:01:57 -04:00
Andrew Schwartz
1bd98b61df Fix non-compliant ISO-8601 timestamp format in S3 authorization headers.
AWS and some gateways were tolerant of space rather than zero-padded hours while others were not.

Fixed by Andrew Schwartz.
2018-07-01 08:17:27 -04:00
David Steele
7e65ddad34 PostgreSQL 11 Beta 2 support. 2018-06-30 14:55:25 -04:00
David Steele
7b0e65d488 Improve the HTTP client to set content-length to 0 when not specified by the server.
S3 (and gateways) always set content-length or transfer-encoding but HTTP 1.1 does not require it and proxies (e.g. HAProxy) may not include either.

Suggested by Adam K. Sumner.
2018-06-26 17:27:22 -04:00
David Steele
aa41e00c9c Set search_path = 'pg_catalog' on PostgreSQL connections.
Suggested by Stephen Frost.
2018-06-21 11:39:37 -04:00