Recovery may error unless --type=immediate is specified. This is because after consistency is reached PostgreSQL will flag zeroed pages as errors even for a full-page write.
For PostgreSQL ≥ 13 the ignore_invalid_pages setting may be used to ignore invalid pages. In this case it is important to check the logs after recovery to ensure that no invalid pages were reported in the selected databases.
It is best if the archive-push and backup commands have the same compress-type (e.g. lz4) when using archive-copy. Otherwise, the WAL segments will need to be recompressed with the compress-type used by the backup, which can be fairly expensive depending on how much WAL was generated during the backup.
When the FUNCTION_*_RESULT*() macros were renamed to FUNCTION_*_RETURN_*() in the core code the test harness macros were missed.
Update them to make the naming consistent.
There was already leakage here but when the compression transcoding was added it became a deluge.
There is some argument to be made that the filters should clean themselves up better but a temp mem context makes sense here anyway so do that.
The stanza-create, stanza-upgrade and stanza-delete commands were required to be run on the repository host. When only one repository was allowed this was not a problem.
However, with the introduction of multiple repository support this becomes more of a burden to the user, so the stanza-create, stanza-upgrade and stanza-delete commands have been improved to allow them to be run remotely.
Moving to YAML allows the configuration data to be read by C programs.
Also go back to using YAML::XS since it is the only implementation that has proper boolean support.
Up to four repositories may be configured. A potential benefit is the ability to have a local repository for fast restores and a remote repository for redundancy.
Some commands, e.g. stanza-create/stanza-upgrade, will automatically work with all configured repositories while others, e.g. stanza-delete, will require a repository to be specified using the repo option. See the command reference for details on which commands require the repository to be specified.
Note that the repo option is not required when only repo1 is configured in order to maintain backward compatibility. However, the repo option is required when a single repo is configured as, e.g. repo2. This is to prevent command breakage if a new repository is added later.
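The backward compatibility rule above can be sketched as follows for a command that needs an explicit repository. This is only an illustration: the function name repoOptionRequired and its parameters are hypothetical and are not taken from the pgBackRest source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical helper: decide whether the repo option must be given.
// repoKeyList holds the configured repo keys, e.g. {1} when only repo1 is
// configured or {2} when a single repo is configured as repo2.
bool
repoOptionRequired(const unsigned int *repoKeyList, size_t repoTotal)
{
    assert(repoKeyList != NULL);
    assert(repoTotal > 0);

    // Only repo1 configured: the option stays optional for backward compatibility
    if (repoTotal == 1 && repoKeyList[0] == 1)
        return false;

    // A lone repo2 (or any multi-repo setup) requires the option so that adding
    // another repository later does not change what existing commands do
    return true;
}
```

With this rule a configuration containing only repo2 behaves the same before and after repo1 is added.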
The archive-push command will always push WAL to the archive in all configured repositories but backups will need to be scheduled individually for each repository. In many cases this is desirable since backup types and retention will vary by repository. Likewise, restores must specify a repository. It is generally better to specify a repository for restores that has low latency/cost even if that means more recovery time. Only restore testing can determine which repository will be most efficient.
For single repository configurations there should be no change in behavior.
The HTML command reference was showing some options that were not valid because it did not properly understand the new role validity system. Also, the custom section for the new repo option was not being honored.
This is a bit messy because it leads to some duplicated code in help.c but there doesn't seem to be any way to fix that with the Perl data structures as they are.
This code is being migrated to C so it doesn't seem worth messing with it too much with the risk of breaking other things.
Some commands (repo-*, verify) still required the --repo option but it makes sense to give them the same treatment as backup and simply use the first repo when one is not specified.
This leaves stanza-delete as the only remaining command that requires --repo. This is by design to enhance safe usage.
The following options are renamed as specified:
repo1-azure-ca-file -> repo1-storage-ca-file
repo1-azure-ca-path -> repo1-storage-ca-path
repo1-azure-host -> repo1-storage-host
repo1-azure-port -> repo1-storage-port
repo1-azure-verify-tls -> repo1-storage-verify-tls
repo1-s3-ca-file -> repo1-storage-ca-file
repo1-s3-ca-path -> repo1-storage-ca-path
repo1-s3-host -> repo1-storage-host
repo1-s3-port -> repo1-storage-port
repo1-s3-verify-tls -> repo1-storage-verify-tls
The old option names (e.g. repo1-s3-port) will continue to work for repo1, but repo2, etc. will require the new names.
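One way to picture the rename list is as a lookup table that only contains repo1 entries, so that old names for repo2, etc. are never translated. The sketch below is an assumption about how such a table could look, not the actual option parser. The names OptionRename and optionNameCurrent are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct OptionRename
{
    const char *oldName;
    const char *newName;
} OptionRename;

// Renames listed above. Only repo1 names appear, so old names for other
// repositories are never mapped.
static const OptionRename optionRenameList[] =
{
    {"repo1-azure-ca-file", "repo1-storage-ca-file"},
    {"repo1-azure-ca-path", "repo1-storage-ca-path"},
    {"repo1-azure-host", "repo1-storage-host"},
    {"repo1-azure-port", "repo1-storage-port"},
    {"repo1-azure-verify-tls", "repo1-storage-verify-tls"},
    {"repo1-s3-ca-file", "repo1-storage-ca-file"},
    {"repo1-s3-ca-path", "repo1-storage-ca-path"},
    {"repo1-s3-host", "repo1-storage-host"},
    {"repo1-s3-port", "repo1-storage-port"},
    {"repo1-s3-verify-tls", "repo1-storage-verify-tls"},
};

// Return the new name for an old repo1 option, or the name unchanged when no
// rename applies. An unmapped old name (e.g. repo2-s3-port) is returned as-is
// and would then be rejected by whatever validates option names.
const char *
optionNameCurrent(const char *name)
{
    assert(name != NULL);

    for (size_t idx = 0; idx < sizeof(optionRenameList) / sizeof(optionRenameList[0]); idx++)
    {
        if (strcmp(name, optionRenameList[idx].oldName) == 0)
            return optionRenameList[idx].newName;
    }

    return name;
}
```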
This allows the removal of the callback in the S3/Azure storage drivers that existed only to parse the size/time information.
The extra callback was required because not all callers of storage*ListInternal() want size/time info, so it was wasteful to add it to storage*ListInternal(). Now those callers can request type info only.
This wasn't exposed before because the remote protocol directly uses the storage driver, which bypasses the writeable checks.
However, the upcoming GCS driver explicitly requests write permissions so remote operations fail when a write is required.
It would be far better if the remote itself was marked as writeable but that will require much more work.
Warning on missing breaks in switch statements works great until the fall-through is intended.
Suppressing on a case by case basis varies by compiler and version so is not very practical. Our tests should be sufficient to the task of finding missing breaks.
The archive-push command will continue to push even after it gets a write error on one or more repos. The idea is to archive to as many repos as possible even though we still need to throw an error to PostgreSQL to prevent it from removing the WAL file.
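The push-to-all behavior can be sketched as a loop that never stops early and only reports failure at the end. This is a simplified illustration: RepoPushFn, archivePushAll and examplePush are hypothetical names and the real command does considerably more.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Hypothetical per-repo push callback: returns true when the write succeeded
typedef bool (*RepoPushFn)(unsigned int repoIdx, const char *walFile);

// Try every repo, even after a failure, and return how many repos failed. A
// non-zero result means the caller must still report an error so PostgreSQL
// keeps the WAL file.
unsigned int
archivePushAll(RepoPushFn push, unsigned int repoTotal, const char *walFile)
{
    assert(push != NULL);
    assert(walFile != NULL);

    unsigned int errorTotal = 0;

    for (unsigned int repoIdx = 0; repoIdx < repoTotal; repoIdx++)
    {
        // Do not stop on failure: the remaining repos should still get the WAL
        if (!push(repoIdx, walFile))
            errorTotal++;
    }

    return errorTotal;
}

// Simulated push used only for illustration: the second repo always fails
bool
examplePush(unsigned int repoIdx, const char *walFile)
{
    (void)walFile;
    return repoIdx != 1;
}
```

With three repos and the simulated callback, the first and third repos receive the WAL segment and the function returns 1, which the caller would turn into an error.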
Add --with-confdir=DIR option to configure, which can be used to override the default configuration directory of /etc/pgbackrest.
In the future it would probably be better to just leverage ${sysconfdir}, which is based on prefix. However, since the config directory was previously hard coded to /etc/pgbackrest, retain that default value for now by not relying on sysconfdir.
The real/all test could fill the ramdisk depending on which vm and pg version were selected.
Debug level should be fine for most purposes and the level can be increased when needed.
The restore command automatically defaults to selecting the latest backup from a single repository. With multiple repositories configured, the restore command will now default to selecting the latest backup from the first repository where backups exist. The order in which the repositories are checked is dictated by the pgbackrest.conf order.
To select from a specific repository, the --repo option can be passed (e.g. --repo=1). The --set option can be passed if a backup other than the latest is desired.
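The default selection described above amounts to picking the first repository, in configuration order, that contains any backup. A minimal sketch follows, assuming the number of backups per repository is already known. The function name restoreRepoSelect is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

// Return the index of the first repo (in configuration order) that holds at
// least one backup, or -1 when no repo has a backup. backupTotalList[idx] is
// the number of backups found in the repo at that position.
int
restoreRepoSelect(const unsigned int *backupTotalList, unsigned int repoTotal)
{
    assert(backupTotalList != NULL || repoTotal == 0);

    for (unsigned int repoIdx = 0; repoIdx < repoTotal; repoIdx++)
    {
        if (backupTotalList[repoIdx] > 0)
            return (int)repoIdx;
    }

    return -1;
}
```

Note that the first repository with backups wins even when a later repository holds a newer backup, which is why --repo exists as an override.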
Repositories will be searched in order for the requested archive file.
Errors will be reported as warnings as long as a valid copy of the archive file is found.
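The search order and the downgrade of errors to warnings can be sketched as below. The sketch assumes the search stops at the first valid copy, which the text does not state explicitly. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

// Hypothetical outcome of looking for the archive file in one repo
typedef enum {repoFileFound, repoFileMissing, repoFileError} RepoFileStatus;

typedef struct ArchiveGetResult
{
    int repoIdx;                // First repo with a valid copy, -1 if none
    unsigned int warnTotal;     // Errors downgraded to warnings (a copy was found)
    unsigned int errorTotal;    // Errors that must be reported (no copy was found)
} ArchiveGetResult;

ArchiveGetResult
archiveGetSearch(const RepoFileStatus *statusList, unsigned int repoTotal)
{
    assert(statusList != NULL || repoTotal == 0);

    ArchiveGetResult result = {.repoIdx = -1};
    unsigned int problemTotal = 0;

    // Repositories are checked in order (assumption: stop at the first valid copy)
    for (unsigned int repoIdx = 0; repoIdx < repoTotal; repoIdx++)
    {
        if (statusList[repoIdx] == repoFileFound)
        {
            result.repoIdx = (int)repoIdx;
            break;
        }

        if (statusList[repoIdx] == repoFileError)
            problemTotal++;
    }

    // Errors only stay errors when no valid copy was found anywhere
    if (result.repoIdx >= 0)
        result.warnTotal = problemTotal;
    else
        result.errorTotal = problemTotal;

    return result;
}
```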
Errors are logged to the log file rather than thrown. If, after processing all repos, one or more errors occurred, then a single error will be thrown to indicate there were errors and the log file should be inspected.
Also update log messages to be more consistent with new patterns.
This is more efficient and the error case can be an assert rather than a runtime error.
For extra safety initialize destinationSize to SIZE_MAX to increase the chances of an error if the switch fails.
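The pattern of a switch plus a SIZE_MAX sentinel can be illustrated as follows. The enum values and the function name are chosen for the example and may not match the real code. The size formulas are the standard ones for padded base64 and for hex.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Illustrative encoding types (assumed names)
typedef enum {encodeBase64, encodeHex} EncodeType;

// Return the number of characters needed to encode sourceSize bytes
size_t
encodeToStrSize(EncodeType type, size_t sourceSize)
{
    // Sentinel: if no case below assigns a value, the result is absurdly large
    // and far more likely to be noticed than an uninitialized or zero size
    size_t destinationSize = SIZE_MAX;

    switch (type)
    {
        case encodeBase64:
            // Padded base64 emits four characters per three-byte group
            destinationSize = (sourceSize + 2) / 3 * 4;
            break;

        case encodeHex:
            // Two hex digits per byte
            destinationSize = sourceSize * 2;
            break;
    }

    // An unknown type is a programming error, so assert rather than raise a
    // runtime error
    assert(destinationSize != SIZE_MAX);

    return destinationSize;
}
```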
There is not enough code here to justify multiple files and declaring the functions for each encoding as static allows the compiler to inline where appropriate.
These constructors wrap encodeToStr() and decodeToBin(), making them convenient and safe by eliminating the need to create intermediate buffers. Encoding/decoding is performed directly into the target String/Buffer. Sizing of the destination buffer is handled by the new functions so it doesn't have to be done at each call site.
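The idea of a constructor that sizes the destination itself can be sketched with plain C strings. This is not the String/Buffer API described above. The function hexStrNewEncode is a hypothetical stand-in that uses malloc() and hex encoding to keep the example self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

// Allocate a string of exactly the right size and encode directly into it, so
// call sites never compute sizes or create intermediate buffers. The caller
// frees the result. Returns NULL if allocation fails.
char *
hexStrNewEncode(const unsigned char *source, size_t sourceSize)
{
    assert(source != NULL || sourceSize == 0);

    static const char hexDigit[] = "0123456789abcdef";

    // Two characters per byte plus the terminator
    char *result = malloc(sourceSize * 2 + 1);

    if (result == NULL)
        return NULL;

    for (size_t idx = 0; idx < sourceSize; idx++)
    {
        result[idx * 2] = hexDigit[source[idx] >> 4];
        result[idx * 2 + 1] = hexDigit[source[idx] & 0x0F];
    }

    result[sourceSize * 2] = '\0';

    return result;
}
```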
If the second letter is capital or a digit then the word is likely an acronym so don't lower-case the first letter.
For now only the digit case is checked since there are no summaries with a capital as the second letter.
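A minimal sketch of the digit check, assuming the summary is a mutable, null-terminated ASCII string. The function name summaryFirstLower is hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

// Lower-case the first letter of summary in place, unless the second character
// is a digit, in which case the word is treated as an acronym and left alone.
// The capital-letter case is intentionally not checked, as described above.
void
summaryFirstLower(char *summary)
{
    assert(summary != NULL);

    // Nothing to do for an empty string
    if (summary[0] == '\0')
        return;

    // Likely an acronym such as "S3": keep the capital
    if (isdigit((unsigned char)summary[1]))
        return;

    summary[0] = (char)tolower((unsigned char)summary[0]);
}
```

For example, "Add option" becomes "add option" while "S3 support" is left unchanged.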
GCS requires mixed encoding in the path so encoding inside HttpRequest does not work.
Instead, require the path to be correctly encoded before being passed to HttpRequest.
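Encoding before the request is built lets the caller decide, per path component, whether a slash is a separator or data. The sketch below shows one way a caller might do that. The function pathEncode and its keepSlash flag are hypothetical and are not part of HttpRequest.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

// Percent-encode source into dest, which has room for destSize bytes including
// the terminator. Unreserved characters are copied as-is. When keepSlash is
// true '/' is also copied as-is (path separators), otherwise it is encoded as
// %2F (e.g. an object name embedded in a path). Returns false if dest is too
// small. Assumes the default "C" locale for isalnum().
bool
pathEncode(const char *source, bool keepSlash, char *dest, size_t destSize)
{
    assert(source != NULL);
    assert(dest != NULL);
    assert(destSize > 0);

    static const char hexDigit[] = "0123456789ABCDEF";
    size_t destIdx = 0;

    for (const unsigned char *chr = (const unsigned char *)source; *chr != '\0'; chr++)
    {
        bool literal =
            isalnum(*chr) || *chr == '-' || *chr == '_' || *chr == '.' || *chr == '~' ||
            (keepSlash && *chr == '/');

        // Need room for the output bytes plus the terminator
        if (destIdx + (literal ? 1 : 3) >= destSize)
            return false;

        if (literal)
            dest[destIdx++] = (char)*chr;
        else
        {
            dest[destIdx++] = '%';
            dest[destIdx++] = hexDigit[*chr >> 4];
            dest[destIdx++] = hexDigit[*chr & 0x0F];
        }
    }

    dest[destIdx] = '\0';

    return true;
}
```

A caller needing mixed encoding would encode each component with the appropriate flag and concatenate the results before handing the path to the request.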
The path was originally named uri due to the canonicalized path being called "canonicalized uri" in the S3 authentication documentation. The name got propagated everywhere from there.
This is not correct for general usage, however, so rename to path when describing the path component of an HTTP request.
ASCII may occasionally be encoded (e.g. &) to prevent ambiguity depending on where the JSON is located.
Only ASCII can be decoded. In general Unicode should not be encoded in JSON.
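The ASCII-only restriction can be sketched as a decoder for the four hex digits that follow \u in a JSON string. This is an illustration under the assumption that any code point above 0x7F is rejected. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// Convert one hex digit to its value, or return -1 if it is not a hex digit
static int
hexValue(char chr)
{
    if (chr >= '0' && chr <= '9')
        return chr - '0';

    if (chr >= 'a' && chr <= 'f')
        return chr - 'a' + 10;

    if (chr >= 'A' && chr <= 'F')
        return chr - 'A' + 10;

    return -1;
}

// Decode the four hex digits following "\u". Returns true and sets *result
// only when the digits are valid and the code point is ASCII (<= 0x7F).
bool
jsonUnicodeDecodeAscii(const char *hex4, char *result)
{
    assert(hex4 != NULL);
    assert(result != NULL);

    unsigned int codePoint = 0;

    for (size_t idx = 0; idx < 4; idx++)
    {
        int value = hexValue(hex4[idx]);

        // Also catches a premature terminator, so nothing is read past the end
        if (value < 0)
            return false;

        codePoint = codePoint * 16 + (unsigned int)value;
    }

    // Anything outside ASCII is not supported
    if (codePoint > 0x7F)
        return false;

    *result = (char)codePoint;

    return true;
}
```

For instance the digits 0026 decode to an ampersand, while 00e9 is rejected because it is outside the ASCII range.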