1
0
mirror of https://github.com/pgbackrest/pgbackrest.git synced 2024-12-14 10:13:05 +02:00
Commit Graph

400 Commits

Author SHA1 Message Date
David Steele
c45ae5f221 Begin v2.14 development. 2019-04-19 08:41:17 -04:00
David Steele
41f3874822 v2.13: Bug Fixes
Bug Fixes:

* Fix zero-length reads causing problems for IO filters that did not expect them. (Reported by brunre01, jwpit, Tomasz Kontusz, guruguruguru.)
* Fix reliability of error reporting from local/remote processes.
* Fix Posix/CIFS error messages reporting the wrong filename on write/sync/close.
2019-04-18 21:26:02 -04:00
David Steele
867690c08d Begin v2.13 development.
Also update CentOS packages so documentation builds.
2019-04-12 08:33:10 -04:00
David Steele
4e57b68916 v2.12: C Implementation of Archive Push
IMPORTANT NOTE: The new TLS/SSL implementation forbids dots in S3 bucket names per RFC-2818. This security fix is required for compliant hostname verification.

Bug Fixes:

* Fix issues when a path option is / terminated. (Reported by Marc Cousin.)
* Fix issues when log-level-file=off is set for the archive-get command. (Reported by Brad Nicholson.)
* Fix C code to recognize host:port option format like Perl does. (Reported by Kyle Nevins.)
* Fix issues with remote/local command logging options.

Improvements:

* The archive-push command is implemented entirely in C.
* Increase process-max limit to 999. (Suggested by Rakshitha-BR.)
* Improve error message when an S3 bucket name contains dots.

Documentation Improvements:

* Clarify that S3-compatible object stores are supported. (Suggested by Magnus Hagander.)
2019-04-11 09:14:22 -04:00
David Steele
9a7eab9428 Allow three-digits process IDs in logging.
This is required to support process-max > 99 or else there will be formatting/alignment issues in the logs.
2019-04-07 18:12:07 -04:00
David Steele
1b48684713 The archive-push command is implemented entirely in C.
This new implementation should behave exactly like the old Perl code with the exception of updated log messages.

Remove as much of the Perl code as possible without breaking other commands.
2019-03-29 13:26:33 +00:00
David Steele
fc974626cd Add a note regarding verifying checkpoint against replay position. 2019-03-17 08:35:40 +04:00
David Steele
9382283586 Fix issues when a path option is / terminated.
This condition was not being properly checked for in the C code and it caused problems in the info command, at the very least.

Instead of applying a local fix, introduce a new path option type that will rigorously check the format of any incoming paths.

Reported by Marc Cousin.
2019-03-14 13:48:33 +04:00
David Steele
b8ebea6b1c Add separate archive-push-async command.
This command was previously forked off from the archive-push command which required a bit of artificial option and log manipulation.

A separate command is easier to test and will work on platforms that don't have fork(), e.g. Windows.
2019-03-14 13:38:55 +04:00
David Steele
cf5a5b7b9a Begin v2.12 development. 2019-03-11 10:43:35 +02:00
David Steele
68d20edea6 v2.11: C Implementation of Archive Get
Bug Fixes:

* Fix possible truncated WAL segments when an error occurs mid-write. (Reported by blogh.)
* Fix info command missing WAL min/max when stanza specified. (Fixed by Stefan Fercot.)
* Fix non-compliant JSON for options passed from C to Perl. (Reported by Leo Khomenko.)

Improvements:

* The archive-get command is implemented entirely in C.
* Enable socket keep-alive on older Perl versions. (Contributed by Marc Cousin.)
* Error when parameters are passed to a command that does not accept parameters. (Suggested by Jason O'Donnell.)
* Add hints when unable to find a WAL segment in the archive. (Suggested by Hans-Jürgen Schönig.)
* Improve error when hostname cannot be found in a certificate. (Suggested by James Badger.)
* Add additional options to backup.manifest for debugging purposes. (Contributed by blogh.)
2019-03-10 18:56:00 +02:00
blogh
e4e2606fce Add additional options to backup.manifest for debugging purposes.
Add the buffer-size, compress-level, compress-level-network, and process-max options to the backup:option section in backup.manifest to aid in debugging.

It may also make sense to propagate these options up to backup.info so they can be displayed in the info command, but for now this is deemed sufficient.

Contributed by blogh.
2019-03-10 11:03:52 +02:00
David Steele
21f56f64eb Add hints when unable to find a WAL segment in the archive.
When this error happens in the context of a backup it can be a bit mystifying as to why the backup is failing.  Add some hints to get the user started.

These hints will appear any time a WAL segment can't be found, which makes the hint about the check command redundant when the user is actually running the check command, but it doesn't seem worth trying to exclude the hint in that case.

Suggested by Hans-Jürgen Schönig.
2019-03-10 10:38:12 +02:00
Marc Cousin
cb3b4fa24b Enable socket keep-alive on older Perl versions.
The prior method depended on IO:Socket:SSL to push the keep-alive options down to the socket but it only worked for recent versions of the module.

Instead, create the socket directly using IO::Socket::IP if available or IO:Socket:INET as a fallback.  The keep-alive option is set directly on the socket before it is passed to IO:Socket:SSL.

Contributed by Marc Cousin.
2019-02-28 14:33:29 +02:00
David Steele
db4b447be8 The archive-get command is implemented entirely in C.
This new implementation should behave exactly like the old Perl code with the exception of a few updated log messages.

Remove as much of the Perl code as possible without breaking other commands.
2019-02-27 23:03:02 +02:00
David Steele
1f66bda02e Fix non-compliant JSON for options passed from C to Perl.
We have been using a hacked-up JSON generator to pass options from C to Perl since the C binary was introduced.  This generator was not very compliant which led to issues with \n, ", etc. inside strings.

We have a fully-compliant JSON generator now so use that instead.

Reported by Leo Khomenko.
2019-02-22 12:02:26 +02:00
David Steele
b0b5989aca Migrate remote archive-get command to C.
All required protocol commands are implemented so this is mostly a matter of enabling the feature and updating expect logs.
2019-02-20 22:57:18 +02:00
David Steele
73be64ce49 Add separate archive-get-async command.
This command was previously forked off from the archive-get command which required a bit of artificial option and log manipulation.

A separate command is easier to test and will work on platforms that don't have fork(), e.g. Windows.
2019-02-20 15:52:07 +02:00
David Steele
d211c2b8b5 Fix possible truncated WAL segments when an error occurs mid-write.
The file write object destructors called close() and finalized the file even if it was not completely written.  This was an issue in both the C and Perl code.

Rewrite the destructors to simply free resources (like file handles) rather than calling the close() method.  This leaves the temp file in place for filesystems that use temp files.

Add unit tests to prevent regression.

Reported by blogh.
2019-02-15 11:52:39 +02:00
David Steele
a5f6f801d7 Begin v2.11 development. 2019-02-12 14:11:16 +02:00
David Steele
35903b94d9 v2.10: Bug Fixes
Bug Fixes:

* Add unimplemented S3 driver method required for archive-get. (Reported by mibiio.)
* Fix check for improperly configured pg-path. (Reported by James Chanco Jr.)
2019-02-09 19:52:31 +02:00
David Steele
6e88f93991 Fix check for improperly configured pg-path.
The check to verify that pg-path and data_directory are equal was not working because pg-path was getting overwritten with data_directory before validation took place.

Reported by James Chanco Jr.
2019-02-05 18:55:07 +02:00
David Steele
abc613b454 Begin v2.10 development. 2019-02-02 14:50:24 +02:00
David Steele
a89a376119 v2.09: Minor Improvements and Bug Fixes
Bug Fixes:

* Fix issue with multiple async status files causing a hard error. (Reported by Vidhya Gurumoorthi, Joe Ayers, Douglas J Hunley.)

Improvements:

* The info command is implemented entirely in C.
* Simplify info command text message when no stanzas are present by replacing the repository path with "the repository".
* Add _DARWIN_C_SOURCE flag to Makefile for MacOS builds. (Contributed by Douglas J Hunley.)
* Update address lookup in C TLS client to use modern methods. (Suggested by Bruno Friedmann.)
* Include Posix-compliant header for strcasecmp() and fd_set. (Suggested by ucando.)
2019-01-30 22:37:35 +02:00
David Steele
8f6d324b2c Fix issue with multiple async status files causing a hard error.
Multiple status files were being created by asynchronous archiving if a high-level error occurred after one or more WAL segments had already been transferred successfully.  Error files were being written for every file in the queue regardless of whether it had already succeeded.  To fix this, add an option to skip writing error files when an ok file already exists.

There are other situations where both files might exist (various fsync and filesystem error scenarios) so it seems best to retry in the case that multiple status files are found rather than throwing a hard error (which then means that archiving is completely stuck).  In the case of multiple status files, a warning will be logged to alert the user that something unusual is happening and the command will be retried.

Reported by fpa-postgres, Joe Ayers, Douglas J Hunley.
2019-01-26 16:59:54 +02:00
David Steele
d245f8eb42 The info command is implemented entirely in C.
The C info code has already been committed but this commit wires it into main.

Also remove the info Perl code and tests since they are no longer called.
2019-01-21 13:51:45 +02:00
David Steele
9cac403f61 Add Exec object.
Executes a child process and allows the calling process to communicate with it using read/write io.

This object is specially tailored to implement the protocol layer and may or may not be generally applicable to general purpose
execution.
2019-01-18 11:45:40 +02:00
David Steele
e68d1e7304 Simplify info command text message when no stanza are present.
Replace the repository path with just "the repository".  The path is not important in this context and it is clearer to state where the stanzas are missing from.
2019-01-16 19:23:10 +02:00
David Steele
ef9dc89e08 Update Storage::Local->list() to accept an undefined path.
The Perl code has a tendency to generate absolute paths even when they are not needed. This change helps the C and Perl storage work together via the protocol layer.
2019-01-16 18:49:12 +02:00
David Steele
b4146b6bff Update Perl repo rules to work when stanza is not specified.
The C storage object strives to use rules whenever possible instead of generating absolute paths.  This change helps the C and Perl storage work together via the protocol layer.
2019-01-16 18:45:19 +02:00
David Steele
50717aa846 Begin v2.09 development. 2019-01-04 11:00:59 +02:00
David Steele
db24ff8df4 v2.08: Minor Improvements and Bug Fixes
Bug Fixes:

* Remove request for S3 object info directly after putting it. (Reported by Matt Kunkel.)
* Correct archive-get-queue-max to be size type. (Reported by Ronan Dunklau.)
* Add error message when current user uid/gid does not map to a name. (Reported by Camilo Aguilar.)
* Error when --target-action=shutdown specified for PostgreSQL < 9.5.

Improvements:

* Set TCP keepalives on S3 connections. (Suggested by Ronan Dunklau.)
* Reorder info command text output so most recent backup is output last. (Contributed by Cynthia Shang. Suggested by Ryan Lambert.)
* Change file ownership only when required.
* Redact authentication header when throwing S3 errors. (Suggested by Brad Nicholson.)
2019-01-02 22:04:47 +02:00
David Steele
23b583336f Set TCP keepalives on S3 connections.
Keepalives may help in situations where RST packets are being blocked by a firewall or otherwise do not arrive.

The C code uses select on all reads so it should never block, but add keepalives just in case.

Suggested by Ronan Dunklau.
2018-12-18 22:12:59 +02:00
Cynthia Shang
35bbb5bd68 Reorder info command text output so most recent backup is output last.
After a stanza-upgrade backups for the old cluster are displayed until they expire.  Cluster info was output newest to oldest which meant after an upgrade the most recent backup would no longer be output last.

Update the text output ordering so the most recent backup is always output last.

Contributed by Cynthia Shang.
Suggested by Ryan Lambert.
2018-12-14 18:25:31 -05:00
David Steele
e6abdfb5b8 Add error message when current user uid/gid does not map to a name.
This condition resulted in a nasty stack trace dump when the undefined value was used later on.

Reported by Camilo Aguilar.
2018-12-07 07:41:26 -05:00
David Steele
e73416e9e3 Change file ownership only when required.
Previously chown() would be called even when no ownership changes were required.

In most cases changes are not required and it seems better to perform an extra stat() rather than an extra chown().

Also add unit tests for owner() since there weren't any.
2018-12-05 17:56:47 -05:00
David Steele
e96986a4e1 Error when --target-action=shutdown specified for PostgreSQL < 9.5.
This equaled "promote" on unsupported versions which qualifies as a surprising behavior.
2018-12-05 16:21:45 -05:00
David Steele
bf873be4aa Redact authentication header when throwing S3 errors.
The authentication header contains the access key (not the secret key) so don't include it in errors that can be seen at any log level.

Suggested by Brad Nicholson.
2018-12-05 12:51:13 -05:00
David Steele
1ad67644da Remove request for S3 object info directly after putting it.
After a file is copied during backup the size is requested from the storage in case it differs from what was written so that repo-size can be reported accurately. This is useful for situations where compression is being done by the filesystem (e.g. ZFS) and what is stored can differ in size from what was written.

In S3 the reported size will always be exactly what was written so there is no need to check the size and doing so immediately can cause problems because the new file might not appear in list commands. This has not been observed on S3 (though it seems to be possible) but it has been reported on the Swift S3 gateway.

Add a driver capability to determine if size needs to be called after a file is written and if not then simply use the number of bytes written for repo-size.

Reported by Matt Kunkel.
2018-11-30 10:38:02 -05:00
David Steele
801e2a5a2c Rename PGBACKREST/BACKREST constants to PROJECT.
This brings consistency between the C and Perl constants and allows for easier code reuse.
2018-11-24 19:05:03 -05:00
David Steele
b0659278cc Add ServiceError for errors from a service that can be retried.
An example is HTTP 5xx errors which should mostly be retried.
2018-11-16 17:22:22 -05:00
David Steele
6532912d51 Begin v2.08 development. 2018-11-16 10:04:14 -05:00
David Steele
04d9e4d5a8 v2.07: Automatic Backup Checksum Delta
Bug Fixes:

* Fix issue with archive-push-queue-max not being honored on connection error. (Reported by Lardière Sébastien.)
* Fix static WAL segment size used to determine if archive-push-queue-max has been exceeded.
* Fix error after log file open failure when processing should continue. (Reported by vthriller.)

Features:

* Automatically enable backup checksum delta when anomalies (e.g. timeline switch) are detected. (Contributed by Cynthia Shang.)

Improvements:

* Retry all S3 5xx errors rather than just 500 internal errors. (Suggested by Craig A. James.)
2018-11-16 09:50:50 -05:00
David Steele
72ea47bfb3 Add KernelError to report miscellaneous kernel errors. 2018-11-11 18:07:56 -05:00
David Steele
48d2795f31 Merge crypto/random module into crypto/crypto.
There wasn't enough code to justify a separate module/test and it seems to fit just fine in crypto/crypto.
2018-11-06 20:04:16 -05:00
David Steele
8efa5e6a6a Rename CipherError to CryptoError.
This aligns with the general renaming from cipher to crypto.
2018-11-06 19:38:38 -05:00
Cynthia Shang
34c63276cd Automatically enable backup checksum delta when anomalies (e.g. timeline switch) are detected.
There are a number of cases where a checksum delta is more appropriate than the default time-based delta:

* Timeline has switched since the prior backup
* File timestamp is older than recorded in the prior backup
* File size changed but timestamp did not
* File timestamp is in the future compared to the start of the backup
* Online option has changed since the prior backup

A practical example is that checksum delta will be enabled after a failover to standby due to the timeline switch.  In this case, timestamps can't be trusted and our recommendation has been to run a full backup, which can impact the retention schedule and requires manual intervention.

Now, a checksum delta will be performed if the backup type is incr/diff.  This means more CPU will be used during the backup but the backup size will be smaller and the retention schedule will not be impacted.

Contributed by Cynthia Shang.
2018-11-01 11:31:25 -04:00
David Steele
cca7a4ffd4 Retry all S3 5xx errors rather than just 500 internal errors.
We were already retrying 500 errors but 503 (rate-limiting) errors were not being retried and would cause an instant failure which aborted the command.

There are only two 5xx errors currently implemented by S3 but instead of adding 503 simply retry all 5xx errors. This is consistent with the http definition of this error class, "the server failed to fulfill an apparently valid request."

Suggested by Craig A. James.
2018-10-30 16:45:42 -04:00
David Steele
286f7e5011 Fix static WAL segment size used to determine if archive-push-queue-max has been exceeded.
This calculation was missed when the WAL segment size was made dynamic in preparation for PostgreSQL 11.

Fix the calculation by checking the actual WAL file sizes instead of using an estimate based on WAL segment size.  This is more accurate because it takes into account .history and .backup files, which are smaller.  Since the calculation is done in the async process the additional processing time should not adversely affect performance.

Remove the PG_WAL_SIZE constant and instead use local constants where the old value is still required.  This is only the case for some tests and PostgreSQL 8.3 which does not provide a way to get the WAL segment size from pg_control.
2018-10-27 20:00:00 +01:00
David Steele
41b00dc204 Fix issue with archive-push-queue-max not being honored on connection error.
If an error occurred while acquiring a lock on a remote server the error would be reported correctly, but the queue max detection code was not reached.  The tests failed to detect this because they fixed the connection before queue max, allowing the ccde to be reached.

Move the queue max code before the lock so it will run even when remote connections are not working.  This means that no attempt will be made to transfer WAL once queue max has been exceeded, but it makes it much more likely that the code will be reach without error.

Update tests to continue errors up to the point where queue max is exceeded.

Reported by Lardière Sébastien.
2018-10-27 16:57:57 +01:00