1
0
mirror of https://github.com/pgbackrest/pgbackrest.git synced 2025-09-16 09:06:18 +02:00

2343 Commits

Author SHA1 Message Date
David Steele
9c7f064864 Unnest HTTP/TLS/socket timeouts.
These timeouts were nested, i.e. an error in the socket connection would also retry in the TLS and HTTP layers. This led to a multiplying effect such that it took nine minutes to fully timeout with the default io-timeout of 60 seconds.

A fix for this was attempted in 5314dbff but it reduced retries by too much and had to be reverted in fa5b2d44.

Instead fix by moving connection attempts to the lower layer (e.g. TLS -> socket) out of the exception block but leave it within the retry loop. So, for example, if the socket connection fails after retries then the error will not be retried by TLS. But if the TLS session fails the socket will be reconnected.
2025-09-05 11:42:00 -04:00
Zsolt Parragi
adc5e5b23b Fix possible segfault in page checksum error message.
The STRDEF definition of plural is out of scope in the error message just below it, which may lead to either garbage in the log message or a crash with SIGSEGV.

This is particularly visible with pg_tde, which encrypts tables: if pgbackrest is run with checksum enabled, it tries to emit this message for all encrypted files.

Fix by using simple char * constants in the error message.
2025-08-25 11:51:54 -04:00
David Steele
87339004e5 Add HTTP retries for 408 and 429 errors.
HTTP client errors 408 and 429 were not being retried but there seems to be some benefit to doing so.

408 has only been seen once and in that case the server was returning 400 (with Request Timeout in the response body) but it seems worth doing since this could happen during times of high congestion. Requests are not sent until request content is ready so there is not much else to be done to handle this error.

429 has been seen occasionally in the past but now seems to be common on Cloudflare R2. Ideally we would have a large initial back off here but it is not clear that is worth it at this time. The existing Fibonacci back off should be enough to allow operations to proceed if possible during the configured timeout.
2025-08-13 10:12:29 -04:00
David Steele
fcd00a45f5 Update user guide and tests to Ubuntu 22.04.
20.04 is EOL so update the user guide and tests to 22.04.

The contributing guide was updated to 22.04 in d55836e7e.
2025-08-04 17:42:16 -04:00
David Steele
03cabe2644 Begin v2.57.0 development. 2025-07-21 13:08:40 -04:00
David Steele
eca9290a2a v2.56.0: Progress Info Improvements
Bug Fixes:

* Fix issue with adhoc expiration when no backups in a repository. (Reviewed by Stefan Fercot. Reported by Anup Gupta.)

Features:

* Add restore progress to info command output. (Contributed by Denis Garsh, Maxim Michkov. Reviewed by David Steele.)
* Add progress-only detail level for info command output. (Contributed by Denis Garsh. Reviewed by David Steele, Stefan Fercot.)

Improvements:

* Retry failed reads on object stores. (Reviewed by David Christensen.)
* Fix defaults in command-line help. (Reviewed by David Christensen, Chris Bandy.)

Documentation Improvements:

* Describe discrete option values in a list where appropriate. (Contributed by Anton Kurochkin. Reviewed by David Steele.)
* Fix "less than" in help output for archive-mode option. (Contributed by Anton Kurochkin. Reviewed by David Steele.)
2025-07-21 10:12:22 -04:00
Denis Garsh
8cdd9ce1c4 Add restore progress to info command output.
The prior implementation of the restore command did not provide any progress information. Implement it similarly to the backup command and also update the info command to display restore progress alongside backup progress.

Restore is a read operation and should not block other commands. The only exception is that multiple restores with the same lock and repository path are not allowed, as each restore must write progress to a separate file. Therefore, no lock is needed for restores with a remote role and the restore command should not be terminated when the stop command is executed.
2025-07-16 12:49:34 -04:00
David Steele
d55836e7e8 Update contributing guide to Ubuntu 22.04.
20.04 is EOL and Docker is now noisy during installation so update.
2025-07-15 09:45:36 -04:00
David Steele
ff944585ca Add funding information. 2025-07-07 16:57:08 -04:00
Denis Garsh
2706168f60 Add progress-only detail level for info command output.
The info command fetches a lot of information from the repository about backups and archives, so this operation can be slow. Because progress data is stored in local lock files, accessing the repository is unnecessary when only progress information is required.

This patch introduces the `--detail-level=[progress|full]` option, with `full` as the default. The `progress` level limits the info command output to progress details without querying the repository. The only remaining operations are scanning the folder structure to list available stanzas and reading lock files.

Note: When `progress` is selected, the info command performs no checks beyond verifying stanza availability.
2025-06-26 17:40:05 -04:00
David Steele
638815b4cd Fix issue with adhoc expiration when no backups in a repository.
If there are no backups in one or more repositories then the following error occurs during adhoc expiration:

ASSERT: [025]: cannot get index 0 from list with 0 value(s)

Fix this by skipping the adhoc logic when there are no backups in a repository.
2025-06-24 12:35:18 -04:00
David Steele
44e3ee33d2 Improve defaults and allow lists with option value dependencies.
Some defaults and allow lists were determined in cfgLoadUpdateOption() because they depended on the values of other options, .e.g. compress-level.

Instead build this functionality into the config parser. Not only does this standardize the defaults and allow lists but it makes it possible to automate the documentation, which is also done in this commit.
2025-06-16 17:21:33 -04:00
David Steele
5e14692588 Retry failed reads on object stores.
There is currently a retry if the initial get request fails (depending on the error code) but if the read fails later on while fetching blocks of data it is fatal. In most cases there is a higher level retry (e.g. restore) but restarting the restore job might be expensive depending on how many files are being restored.

Add a retry that will catch read errors and retry from where the last data was successfully read.

A bit of history -- this patch was first started three years ago but the memory context model at that time would not allow the interface (StorageRead) to own the driver (e.g. StorageReadS3). Subsequent improvements in memory contexts have allowed this ownership model and in fact it is now the default so no ownership changes are required in this patch except in StorageReadRemote which was not updated in f6e3073.
2025-06-16 16:25:50 -04:00
David Steele
1e23e38730 Update support and recognition documentation. 2025-06-16 16:04:35 -04:00
Anton Kurochkin
2e76e9fc69 Describe discrete option values in a list where appropriate.
In the documentation and help output the possible values for most options are described as a list but there were several commands for which formatting was done in a different style.

Formatting has been applied to commands and their options to display option values as a list:

* `info` command, `--type` option
* `verify` command, `--output` option
* `version` command, `--output` option
* `manifest` command, `--output` option
2025-06-09 10:34:17 -04:00
David Steele
ad7ba46bc6 Improve the predictability of floating point numbers formatted as strings.
Some C libraries (e.g.musl) render floating point numbers differently when using printf(). This does not cause any problems for the core code but the unit tests require more predictability to function smoothly without a lot of exceptions.

To accomplish this, wrap the various floating point operations in functions that mostly continue doing math using double but format the output string using integers. This leads to more predictable output at the cost of some complexity.

Rounding could also be accomplished using nearbyint() but this would require linking the math library, which does not seem worth it for a fairly simple operation.
2025-06-05 13:27:45 -04:00
Anton Kurochkin
cb673225e9 Fix "less than" in help output for archive-mode option. 2025-05-28 15:33:38 -04:00
David Steele
8737cad566 Fix defaults in command-line help.
This is another attempt to fix broken defaults in the command-line help. PR #2395 tried to do this by removing current and default values (at least from the list) but this removed much of the value of command-line help being context sensitive. Improvements were made to current values in b3ca2e34 but defaults were left as-is, i.e. broken in some cases.

This new approach handles defaults much the same way as current values. If there are multiple defaults for an option then it is displayed in the option list as <multi> and enumerated in the specific option help. For example, if repo1-type=s3 and repo2-type=azure then the defaults for repo1-storage-upload-chunk-size and repo2-storage-upload-chunk-size will differ.

However, getting the above right means we can no longer display defaults for options that are not valid in the current context. For instance, if repo1-type=posix then none of the repo1-s3-* options are valid and therefore their defaults are not displayed in help. In this case they don't have defaults and if you tried to set one of these options you would get an error.
2025-05-12 13:33:42 -04:00
David Steele
1e98684148 Add default type for current pgbackrest binary.
These defaults were being set in cfgLoadUpdate() but they can be better documented if they are part of the option rules. This is part of a general trend to move away from custom-coded defaults.
2025-05-08 09:16:30 -04:00
David Steele
6869543b7a Make default-literal option configuration with default-type.
This allows default types to be added without adding more boolean flags.
2025-05-08 09:12:38 -04:00
David Steele
e38e3e0090 Merge v2.55.1 release. 2025-05-05 13:40:14 -04:00
David Steele
96dfdce085 Revert "calculate content-md5 on S3 only when required".
20bfd14 removed content-md5 where allowed by the specification but failed to notice that either content-md5 or x-amz-content-* is required for PUT when object lock is enabled.

On top of that it appears Scality S3 (at least?) won't accept alternate content checksums when object lock is enabled. Technically this is a violation of the specification but nonetheless the change breaks working installations.

For now it seems safer to revert this change and pursue a better solution for a future feature release.
2025-04-29 15:12:11 -04:00
David Steele
9958b17834 Fix lower bounds checking for option keys.
Specifying an indexed option with a key of 0, e.g. pg0-path, would lead to a segfault.

Add lower bounds checking to fix the issue.
2025-04-28 14:33:56 -04:00
David Steele
1d89e3d75b Add details about notifying packagers to release instructions. 2025-04-23 13:17:19 -04:00
David Steele
01c8c0f864 Begin v2.56.0 development. 2025-04-21 18:51:50 -04:00
David Steele
6c90196e74 v2.55.0: Verification Improvements and PostgreSQL 18 Support
Bug Fixes:

* Fix block incremental restore issue on non-default repository. (Reviewed by David Christensen, Aleksander Łukasz. Reported by Aleksander Łukasz.)
* Do not set recovery_target_timeline=current for PostgreSQL < 12. (Reviewed by Stefan Fercot.)
* Fix expire archive range logging. (Reviewed by Stefan Fercot. Reported by Aleš Zelený.)
* Fix error reporting for queries with no results. (Reviewed by Stefan Fercot. Reported by Susantha Bathige.)

Features:

* Verify recovery target timeline. (Reviewed by Stefan Fercot.)
* Allow verification of a specified backup. (Contributed by Maxim Michkov. Reviewed by David Steele.)
* Add support for S3/GCS requester pays. (Contributed by Timothée Peignier. Reviewed by David Steele.)
* PostgreSQL 18 experimental support. (Reviewed by Stefan Fercot.)
* Allow connections to PostgreSQL on abstract domain sockets. (Reviewed by Chris Bandy. Suggested by Chris Bandy.)
* Add numeric output to version command. (Contributed by Stefan Fercot. Reviewed by David Steele.)

Improvements:

* Allow backup command to operate on remote repositories. (Reviewed by Stefan Fercot.)
* Use lz4 for protocol compression. (Reviewed by Stefan Fercot.)
* Calculate content-md5 on S3 only when required. (Reviewed by David Christensen.)
* Warn when a value for a multi-key option is overwritten. (Reviewed by David Christensen, Stefan Fercot.)
* Add detail logging for expired archive path. (Contributed by Stefan Fercot. Reviewed by David Steele.)
* Remove support for PostgreSQL 9.4. (Reviewed by Stefan Fercot.)
* Remove autoconf/make build. (Reviewed by David Christensen.)

Documentation Improvements:

* Fix documentation for specifying multiple stanzas with tls-server-auth. (Reviewed by David Christensen, Stefan Fercot. Suggested by Terry MacAndrew.)
* Clarify incremental backup expiration. (Reviewed by Stefan Fercot.)
* Clarify requirement for local/remote pgBackRest versions to match. (Contributed by Greg Clough. Reviewed by David Steele.)
* Add FAQ about exporting self-contained cluster. (Contributed by Stefan Fercot. Reviewed by David Steele.)
* Caveat --tablespace-map-all regarding tablespace creation. (Reviewed by Stefan Fercot, Christophe Courtois. Suggested by Christophe Courtois.)
* Clarify behavior of --repo-retention-full-type. (Reviewed by Antoine Beaupré. Suggested by Antoine Beaupré.)
* Change --process-max recommendation for object stores to --repo-bundle. (Reviewed by Stefan Fercot.)
* Update unix_socket_directory to unix_socket_directories. (Contributed by hyunkyu han. Reviewed by David Steele.)
* Recommend not placing spool-path within pg_xlog/pg_wal. (Reviewed by Martín Marqués, Don Seiler. Suggested by Martín Marqués.)
2025-04-21 18:25:00 -04:00
Yegor
4bb7cb1ac6 Fix incorrect not-equal comparison in TEST_RESULT_Z_NE().
This comparison was not being used for strings in the tests but best to fix it so it does not cause confusion in the future.
2025-04-21 09:02:56 -04:00
Stefan Fercot
9386920b79 Add FAQ about exporting self-contained cluster. 2025-04-16 12:26:15 -05:00
David Steele
c2f64bb03b Fix highlight error in user guide.
87776bc9 updated the RHEL documentation to PostgreSQL 13/14 but did not update recovery highlighting to be compatible with RHEL. This was not caught because the RHEL documentation was being build as PDF, which does not do highlighting.

Instead build the RHEL documentation as HTML in the first stage (and PDF in the second) so the error is caught.

Finally, fix the RHEL documentation to generate the highlight by concatenating the log.
2025-04-15 16:50:40 -05:00
David Steele
8bdba74798 Clarify incremental backup expiration.
The documentation was a bit misleading regarding how incremental backups are expired. Update the misleading part ("Differential and Incremental backups are count-based...") and move the explanation of how incremental expiration works out of differential expiration into the introductory paragraph.

Also add a note about how full backups are considered as differential for the purpose of expiration.
2025-04-14 13:34:35 -05:00
David Steele
565d2e0c47 PostgreSQL 18 experimental support.
Support is experimental since PostgreSQL 18 is still in development and has not released a beta, but it may be useful for testing.
2025-04-14 12:29:31 -05:00
David Steele
dd3a7d99a7 Recommend not placing spool-path within pg_xlog/pg_wal.
This is generally not a good idea and it can cause problems for PostgreSQL tools such as pg_rewind.
2025-04-10 12:08:35 -05:00
David Steele
cab4d6fd5c Warn when a value for a multi-key option is overwritten.
Some options can contain multiple key/value pairs. However, if if the key is specified again the value will be silently overwritten. In most cases one value per key is appropriate, but it makes sense to warn the user about the overwrite.
2025-04-10 10:26:50 -05:00
David Steele
68f22aea66 Allow connections to PostgreSQL on abstract domain sockets.
Currently the pg-socket-path option is required to be a valid absolute path but this precludes the use of abstract domain sockets.

Set the option type to string so abstract domain sockets are allowed. This removes some validation but libpq will report if the path is invalid and we don't use it for any other purpose.
2025-04-09 14:50:16 -05:00
David Steele
20bfd14b73 Calculate content-md5 on S3 only when required.
The content-md5 header was generated for all requests with content but it is only required for batch delete requests. It is not clear why this header is required when x-amz-content-sha256 is also provided or why it
is required only for this request but the documentation is clear on the matter. However, the content for these requests is relatively small compared to uploading files so omitting content-md5 where possible will
save some CPU cycles.

Current AWS S3 and recent Minio don't complain if this header is missing but since it is still required by older versions of Minio and it is specified in the documentation for batch delete it is makes sense to
keep it.
2025-04-09 12:27:27 -05:00
David Steele
53a45d25c3 Fix documentation for specifying multiple stanzas with tls-server-auth.
The prior documentation said that multiple stanzas should be specified by repeating the tls-server-auth option. This is incorrect -- in fact a comma-separated list of stanza should be used.

Fix the documentation and add a test to confirm this behavior.

In passing add some const qualifiers that were missing in the relevant code.
2025-04-08 10:45:06 -05:00
Yegor
e18ca19895 Remove extraneous const from cvtIntToZ() prototype. 2025-04-04 10:26:03 -05:00
Yegor
aeea81a388 Fix test logging in storageHelperDryRunInit(). 2025-04-04 10:07:32 -05:00
David Steele
cc50e1aecd Revert "full/incremental backup method".
This method was introduced in cad595f but on further reflection it does not seem worth the added complexity just to make restore consistency faster without improving the speed of the overall backup. The most common recovery case is PITR and this method produces diminishing returns as the recovery time gets further from the backup end time.

A better solution (not implemented here) is to copy unmodified files from prior backups. This is much faster than recopying and compressing files from the cluster (especially on object stores with a copy command) and can even be done after the backup window to further reduce WAL replay required for consistency. It also reduces load on the host where the backup is made.
2025-04-02 15:25:23 -05:00
David Steele
33655095f4 Use latest Minio image for testing.
It has been a while since a Minio bug has broken the tests so go back to testing against the latest version until the trust is broken again.
2025-03-31 19:30:41 -05:00
albertchae
dcfe30e322 Fix instances of "of of". 2025-03-31 14:18:29 -05:00
David Steele
b28ae98cec Update unix_socket_directory to unix_socket_directories.
unix_socket_directory has not been valid since PostgreSQL 9.2 and since 9.5 is now the minimum supported version there is no reason to document both.
2025-03-31 12:24:03 -05:00
Maxim Michkov
e4def8845e Allow verification of a specified backup.
Add support for the verify command --set option. This (internal) option was already accepted without errors but was not implemented.

The default behavior for verify is to check all the backups present. With the --set option only the specified backup will be verified. If the specified backup label is not valid an error is added to the result and verification is skipped. In addition, only WAL required to make the specified backup consistent will be verified.
2025-03-26 08:44:29 -06:00
David Steele
1cfafa386c Strip version from RHEL label in user guide.
RHEL 7 is EOL so remove it from the label. Rather than update the version range just remove it from the label since the user guide is generally applicable to RHEL.
2025-03-25 11:00:53 -06:00
David Steele
87776bc9cf Update PostgresQL versions in user guide.
PostgreSQL 12 is EOL and no longer available in the yum.postgresql.org repository.

Update the base and update versions of the RHEL and Debian documentation to better cover supported versions.
2025-03-25 10:28:46 -06:00
David Steele
84eb9a742d Clarify behavior of --repo-retention-full-type.
Make some clarifications and add explicit documentation for repo-retention-full-type=count.
2025-03-24 12:21:31 -06:00
David Steele
5d2c67c3c6 Fix block incremental restore issue on non-default repository.
If the selected backup to restore was not in the default (lowest number) repository and block incremental was used, then restore would erroneously try to load the file super block list from the default repository. Specifying --repo would fix this since it changed the default repository.

Fix by updating the super block read to the specified repository.
2025-03-21 10:37:20 -06:00
Yegor
723f900eaa Fix missing return in FUNCTION_LOG_RETURN_VOID().
This macro is only ever called last in functions so this is not an active issue, but it makes sense to fix since it would pose a risk for future development.
2025-03-21 09:01:02 -06:00
Yegor
d10ad3780a Fix test logging. 2025-03-19 11:17:59 -06:00
Yegor
f00c0d4e85 Fix comments in unit tests. 2025-03-19 09:34:32 -06:00