3b8f0ef missed some cases that could cause archive-push to fail:
* Checking archive info.
* Checking to see if a WAL segment already exists.
These cases are now handled so archive-push can succeed on any valid repo.
This improvement reduces the number of errors thrown; most errors are now reported as a status for the stanza or repo, as appropriate. Errors for invalid option configurations are still thrown, but all other errors are caught, formatted, and reported. This was necessary for multiple repositories so that the command can finish gathering information from each repository and report the results rather than aborting as soon as an error occurs.
Two new error codes were introduced:
6 = requested backup not found
99 = other, which is used to indicate an error has occurred that requires more details to be provided
A new stanza name of "[invalid]" was created for instances where a stanza was not specified and no stanza can be found.
If only one repository is configured, the error moves up to the stanza level with the standard error formatting of 'error (message)', where the message is "other" and the details of the error are listed on the following line(s):
stanza: stanza1
status: error (other)
[CryptoError] unable to load info file '/var/lib/pgbackrest/repo/backup/stanza1/backup.info' or '/var/lib/pgbackrest/repo/backup/stanza1/backup.info.copy':
CryptoError: cipher header invalid
HINT: is or was the repo encrypted?
FileMissingError: unable to open missing file '/var/lib/pgbackrest/repo/backup/stanza1/backup.info.copy' for read
HINT: backup.info cannot be opened and is required to perform a backup.
HINT: has a stanza-create been performed?
HINT: use option --stanza if encryption settings are different for the stanza than the global
cipher: aes-256-cbc
If a backup set is requested but is not found on any repo, a stanza-level status error of 'requested backup not found' is reported when there are no other errors:
pgbackrest info --stanza=demo --set=bogus
stanza: demo
status: error (requested backup not found)
cipher: mixed
repo1: aes-256-cbc
repo2: none
If multiple repositories are configured and a single repo is in error while the other repos are ok or have a different error, the stanza status is reported as 'mixed' and the status of each repo is listed:
pgbackrest info --stanza=demo --set=20210322-171211F
stanza: demo
status: mixed
repo1: error
[CryptoError] unable to load info file '/var/lib/pgbackrest/repo/backup/stanza1/backup.info' or '/var/lib/pgbackrest/repo/backup/stanza1/backup.info.copy':
CryptoError: cipher header invalid
HINT: is or was the repo encrypted?
FileMissingError: unable to open missing file '/var/lib/pgbackrest/repo/backup/stanza1/backup.info.copy' for read
HINT: backup.info cannot be opened and is required to perform a backup.
HINT: has a stanza-create been performed?
HINT: use option --stanza if encryption settings are different for the stanza than the global
repo2: ok
cipher: mixed
repo1: aes-256-cbc
repo2: none
db (current)
wal archive min/max (12): 000000010000000000000001/000000010000000000000003
full backup: 20210322-171211F
timestamp start/stop: 2021-03-22 17:12:11 / 2021-03-22 17:12:28
wal start/stop: 000000010000000000000002 / 000000010000000000000002
database size: 23.4MB, database backup size: 23.4MB
repo2: backup set size: 2.8MB, backup size: 2.8MB
database list: postgres (13359)
JSON output includes the repository information and any error information. If no stanzas are found, then [invalid] is set as the name:
[
{
"archive":[],
"backup":[],
"cipher":"none",
"db":[],
"name":"[invalid]",
"repo":[
{
"cipher":"none",
"key":1,
"status":{
"code":99,
"message":"[PathOpenError] unable to list file info for path '/var/lib/pgbackrest/repo2/backup': [13] Permission denied"
}
}
],
"status":{
"code":99,
"lock":{"backup":{"held":false}},
"message":"other"
}
}
]
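As a rough illustration of how a monitoring script might consume this output, the sketch below walks the JSON and collects per-repo errors. The sample is trimmed from the output above; the function name repo_errors is hypothetical, and treating a status code of 0 as "ok" is an assumption not stated in this message.

```python
import json

# Sample trimmed from the JSON output shown above.
INFO_JSON = """
[
  {
    "name": "[invalid]",
    "repo": [
      {
        "cipher": "none",
        "key": 1,
        "status": {
          "code": 99,
          "message": "[PathOpenError] unable to list file info for path '/var/lib/pgbackrest/repo2/backup': [13] Permission denied"
        }
      }
    ],
    "status": {"code": 99, "lock": {"backup": {"held": false}}, "message": "other"}
  }
]
"""


def repo_errors(info_text):
    """Return (stanza name, repo key, code, message) for each repo in error.

    Assumption: a status code of 0 means the repo is ok.
    """
    errors = []
    for stanza in json.loads(info_text):
        for repo in stanza.get("repo", []):
            status = repo.get("status", {})
            if status.get("code", 0) != 0:
                errors.append(
                    (stanza["name"], repo["key"], status["code"], status["message"])
                )
    return errors


for stanza_name, repo_key, code, message in repo_errors(INFO_JSON):
    print(f"stanza {stanza_name} repo{repo_key}: code {code}: {message}")
```

Because the details of a code 99 error live in the repo-level message, a script along these lines would need to read the message text rather than rely on the code alone.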
The content-length header was being signed since it was the only header that didn't need to be and it seemed simpler just to sign it as well. Also, the S3 documentation encourages signing as many headers as possible to avoid tampering.
However, some proxies munge this header causing authentication failure, so skip signing content-length.
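The idea can be sketched as follows. This is a minimal Python illustration, not the C implementation: the function name signed_header_names and its skip parameter are hypothetical, and the only behavior it relies on is that the signed header list is made of lowercase header names, sorted and joined with semicolons.

```python
def signed_header_names(headers, skip=("content-length",)):
    """Build a signed-headers string, leaving out headers that proxies may rewrite.

    headers: mapping of header name -> value.
    skip: lowercase header names to exclude from signing (hypothetical parameter).
    """
    names = sorted(name.lower() for name in headers if name.lower() not in skip)
    return ";".join(names)


request_headers = {
    "Host": "bucket.s3.amazonaws.com",
    "Content-Length": "0",
    "x-amz-date": "20210322T171211Z",
    "x-amz-content-sha256": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
}

print(signed_header_names(request_headers))
```

The header is still sent with the request; it is only left out of the signature, so a proxy that rewrites it no longer invalidates the signature.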
Make protocol handlers have one function per command. This allows the logic of finding the handler to be in ProtocolServer, isolates each command to a function, and removes the need to test the "not found" condition for each handler.
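A minimal sketch of this dispatch pattern, written in Python rather than C for brevity. The class below is only a stand-in for ProtocolServer, and the handler names and the process() method are hypothetical.

```python
class ProtocolServer:
    """Stand-in for the server: owns the command lookup so handlers do not have to."""

    def __init__(self, handlers):
        # handlers: mapping of command name -> function taking a parameter list
        self._handlers = dict(handlers)

    def process(self, command, params):
        handler = self._handlers.get(command)
        if handler is None:
            # The "not found" condition is checked once, here, for all commands
            raise LookupError(f"invalid command '{command}'")
        return handler(params)


# Hypothetical handlers, one function per command
def handle_echo(params):
    return params


def handle_count(params):
    return len(params)


server = ProtocolServer({"echo": handle_echo, "count": handle_count})
print(server.process("count", ["a", "b"]))
```

With one function per command, each handler only needs tests for its own behavior; the lookup failure is tested once against the server.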
S3 returns 200 for HEAD /, which indicates it is a file, but does not return the expected headers, which causes an error.
Rather than fix this for S3, just automatically return / as not existing for any storage that does not support paths.
Also add some defensive checks to prevent this from generating a segfault if it happens again.
Some standard system databases (e.g. postgres) may be recreated by the user and have an OID that makes them look like user databases.
Identify the standard three system databases (template0, template1, postgres) and restore them non-zeroed no matter what OID they have.
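The check can be sketched as below. This is an illustrative Python version under stated assumptions: the function name is_system_database is hypothetical, and 16384 as the first OID available to user objects is an assumption about how the OID-based test is done, not something this message states.

```python
# The three standard system databases named above
SYSTEM_DATABASES = frozenset({"template0", "template1", "postgres"})

# Assumption: OIDs below this value are reserved for system objects
FIRST_USER_OID = 16384


def is_system_database(name, oid):
    """Treat a database as a system database by name first, then by OID."""
    return name in SYSTEM_DATABASES or oid < FIRST_USER_OID


# A recreated postgres database has a user-range OID but is still a system database
print(is_system_database("postgres", 16400))
print(is_system_database("app", 16400))
```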
Cipher type was inferred from the presence of cipherSubPass rather than being passed explicitly in order to maintain compatibility with Perl backupFile().
Now that Perl is gone it makes sense to pass it explicitly, as we do elsewhere.
This test was added to take the place of another test, which turned out not to be workable.
Even so, it adds coverage at little cost so it seems worth keeping.
Recovery may error unless --type=immediate is specified. This is because after consistency is reached PostgreSQL will flag zeroed pages as errors even for a full-page write.
For PostgreSQL ≥ 13 the ignore_invalid_pages setting may be used to ignore invalid pages. In this case it is important to check the logs after recovery to ensure that no invalid pages were reported in the selected databases.
It is best if the archive-push and backup commands have the same compress-type (e.g. lz4) when using archive-copy. Otherwise, the WAL segments will need to be recompressed with the compress-type used by the backup, which can be fairly expensive depending on how much WAL was generated during the backup.
When the FUNCTION_*_RESULT*() macros were renamed to FUNCTION_*_RETURN_*() in the core code the test harness macros were missed.
Update them to make the naming consistent.
There was already leakage here but when the compression transcoding was added it became a deluge.
There is some argument to be made that the filters should clean themselves up better but a temp mem context makes sense here anyway so do that.
The stanza-create, stanza-upgrade and stanza-delete commands were required to be run on the repository host. When only one repository was allowed this was not a problem.
However, with the introduction of multiple repository support this becomes more of a burden to the user, so the stanza-create, stanza-upgrade and stanza-delete commands can now be run remotely.
Moving to YAML allows the configuration data to be read by C programs.
Also go back to using YAML::XS since it is the only implementation that has proper boolean support.
Up to four repositories may be configured. A potential benefit is the ability to have a local repository for fast restores and a remote repository for redundancy.
Some commands, e.g. stanza-create/stanza-upgrade, will automatically work with all configured repositories while others, e.g. stanza-delete, will require a repository to be specified using the repo option. See the command reference for details on which commands require the repository to be specified.
Note that the repo option is not required when only repo1 is configured in order to maintain backward compatibility. However, the repo option is required when a single repo is configured as, e.g. repo2. This is to prevent command breakage if a new repository is added later.
The archive-push command will always push WAL to the archive in all configured repositories but backups will need to be scheduled individually for each repository. In many cases this is desirable since backup types and retention will vary by repository. Likewise, restores must specify a repository. It is generally better to specify a repository for restores that has low latency/cost even if that means more recovery time. Only restore testing can determine which repository will be most efficient.
For single repository configurations there should be no change in behavior.
The HTML command reference was showing some options that were not valid because it did not properly understand the new role validity system. Also, the custom section for the new repo option was not being honored.
This is a bit messy because it leads to some duplicated code in help.c but there doesn't seem to be any way to fix that with the Perl data structures as they are.
This code is being migrated to C so it doesn't seem worth messing with it too much with the risk of breaking other things.
Some commands (repo-*, verify) still required the --repo option but it makes sense to give them the same treatment as backup and simply use the first repo when one is not specified.
This leaves stanza-delete as the only remaining command that requires --repo. This is by design to enhance safe usage.
The following options are renamed as specified:
repo1-azure-ca-file -> repo1-storage-ca-file
repo1-azure-ca-path -> repo1-storage-ca-path
repo1-azure-host -> repo1-storage-host
repo1-azure-port -> repo1-storage-port
repo1-azure-verify-tls -> repo1-storage-verify-tls
repo1-s3-ca-file -> repo1-storage-ca-file
repo1-s3-ca-path -> repo1-storage-ca-path
repo1-s3-host -> repo1-storage-host
repo1-s3-port -> repo1-storage-port
repo1-s3-verify-tls -> repo1-storage-verify-tls
The old option names (e.g. repo1-s3-port) will continue to work for repo1, but repo2, etc. will require the new names.
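The rename rule can be expressed as a small lookup, sketched here in Python. The table is taken from the list above; the function name canonical_option and the choice to raise an error for old names on repo2 and higher are illustrative assumptions, not the actual option parser.

```python
import re

# Old suffix -> new suffix, taken from the rename list above
RENAMED_SUFFIXES = {
    "azure-ca-file": "storage-ca-file",
    "azure-ca-path": "storage-ca-path",
    "azure-host": "storage-host",
    "azure-port": "storage-port",
    "azure-verify-tls": "storage-verify-tls",
    "s3-ca-file": "storage-ca-file",
    "s3-ca-path": "storage-ca-path",
    "s3-host": "storage-host",
    "s3-port": "storage-port",
    "s3-verify-tls": "storage-verify-tls",
}


def canonical_option(name):
    """Map an old option name to its new name; old names are accepted for repo1 only."""
    match = re.fullmatch(r"repo(\d+)-(.+)", name)
    if match is None:
        return name  # not a repo option
    index, suffix = match.groups()
    if suffix not in RENAMED_SUFFIXES:
        return name  # already a current name
    if index != "1":
        raise ValueError(f"{name}: old option names are only accepted for repo1")
    return f"repo{index}-{RENAMED_SUFFIXES[suffix]}"


print(canonical_option("repo1-s3-port"))
```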
This allows the removal of the callback in the S3/Azure storage drivers that existed only to parse the size/time information.
The extra callback was required because not all callers of storage*ListInternal() want size/time info, so it was wasteful to add it to storage*ListInternal(). Now those callers can request type info only.
This wasn't exposed before because the remote protocol directly uses the storage driver, which bypasses the writeable checks.
However, the upcoming GCS driver explicitly requests write permissions so remote operations fail when a write is required.
It would be far better if the remote itself was marked as writeable but that will require much more work.
Warning on missing breaks in switch statements works great until a fall-through is intended.
Suppressing the warning on a case-by-case basis varies by compiler and version, so it is not very practical. Our tests should be sufficient to the task of finding missing breaks.
The archive-push command will continue to push even after it gets a write error on one or more repos. The idea is to archive to as many repos as possible even though we still need to throw an error to PostgreSQL to prevent it from removing the WAL file.
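The control flow described here can be sketched as follows. This is a simplified Python illustration, not the C implementation: push_to_all, the repo callables, and the error types are all hypothetical.

```python
def push_to_all(repos, segment):
    """Try every repo, then fail at the end if any write failed.

    repos: mapping of repo name -> callable that writes the segment or raises OSError.
    """
    errors = []
    for name, write in repos.items():
        try:
            write(segment)
        except OSError as exc:
            # Remember the failure but keep going so the other repos still get the WAL
            errors.append(f"{name}: {exc}")
    if errors:
        # Raising here stands in for returning an error to PostgreSQL,
        # which keeps it from removing the WAL file
        raise RuntimeError("archive-push failed on: " + "; ".join(errors))


written = []


def write_ok(segment):
    written.append(segment)


def write_fail(segment):
    raise OSError("disk full")


try:
    push_to_all({"repo1": write_fail, "repo2": write_ok}, "000000010000000000000001")
except RuntimeError as exc:
    print(exc)
print(written)
```

In this sketch repo2 receives the segment even though repo1 failed first, and the caller still sees an error.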
Add --with-confdir=DIR option to configure, which can be used to override the default configuration directory of /etc/pgbackrest.
Probably in the future it would be better to just leverage ${sysconfdir} which is based on prefix, but since previously the config directory was hard coded to /etc/pgbackrest, we retain that default value by not relying on sysconfdir for now.
The real/all test could fill the ramdisk depending on which vm and pg version were selected.
Debug level should be fine for most purposes and the level can be increased when needed.