Check that archive files exist in the main process instead of the local process. This means that the archive.info file only needs to be loaded once per execution rather than once per file to get.
Stop looking when a file is missing or in error. PostgreSQL will never request anything past the missing file so there is no point in getting them. This also reduces "unable to find" logging in the async process.
Cache results of storageList() when looking for multiple files to reduce storage I/O.
Look for all requested archive files in the archive-id where the first file is found. They may not all be there, but this reduces the number of list calls. If subsequent files are in another archive id they will be found on the next archive-get call.
Append "asynchronously" to messages when the async process fetched the file (not in the actual async process log, though).
Add "repo1" to make it clear what archive we are talking about. This is not very useful by itself but soon we'll be able to add the archive id, which is very useful.
Add constants for messages that are used multiple times to ensure they stay consistent.
The FUNCTION_LOG_RETURN() macro requires logging macros (e.g. FUNCTION_LOG_*_TYPE and FUNCTION_LOG_*_FORMAT) when returning a struct but these macros don't deliver much value since they only output the name of the struct rather than the contents. A copy of the struct is also made during this operation, which is wasteful.
FUNCTION_LOG_RETURN_STRUCT() does not make a copy of the struct and does not require any logging macros. Returned structures are logged as "struct" but this could be made more accurate using __typeof in the future.
Structures as parameters are not addressed here and work as before, i.e. they require logging macros.
Missing files would indicate that another process is running on the same spool path, which would be a very bad thing.
This check doesn't cost any additional I/O so it seems like a good idea.
If files other than backup.manifest.copy were left in a backup path by a prior resume then the next resume would skip the backup rather than removing it. Since the backup path still existed, it would be found during backup label generation and cause an error if it appeared to be later than the new backup label. This occurred if the skipped backup was full.
The error was only likely on object stores such as S3 because of the order of file deletion. Posix file systems delete from the bottom up because directories containing files cannot be deleted. Object stores do not have directories so files are deleted in whatever order they are provided by the list command. However, the issue can be reproduced on a Posix file system by manually deleting backup.manifest.copy from a resumable backup path.
Fix the issue by removing the resumable backup if it has no manifest files. Also add a new warning message for this condition.
Note that this issue could be resolved by running expire or a new full backup.
These options specify the number of local worker job retries and the retry interval after one immediate retry.
There is some value in allowing retries to be specified by the user but for the most part these options are for suppressing retries during testing, which can save a lot of time. The bug introduced in d1d25c7 and fixed in 8b86d5e also suggests it is better not to use retries in tests.
Remove the default delayed retries for archive-get/archive-push, leaving only the immediate retry. These commands are retried by PostgreSQL so it doesn't make sense to do too many retries internally.
These options are currently internal.
This call was removed by d1d25c71, which worked for archivePushProtocol() and verifyProtocol() since the encryption options are passed from the main process.
archiveGetProtocol() still retrieves these options in the local process so the repo storage must be loaded first.
The test was pretty old and written in stages during the migration, so storage use was a bit archaic and the organization was poor.
Update using the new storage macros and reorganize the tests to provide better coverage.
The macros should make it much easier to write complex tests, especially when compression and encryption are involved.
Update the command/archiveGet test to show how the new macros are used.
This avoids the need for strLstJoin() when testing lists.
Lists are \n delimited (rather than command or pipe) so that non-trivial lists can be more easily diff'd.
Add separation and some visual cues to help identify the start of a test.
Also add a counter which can be used to search for a specific test, which is useful if there is a lot of debug output to search through.
These were required to deal with the legacy Perl code being unable to load new options between tests.
The C code does not have this issue so remove the forks and update process ids in the log tests.
No timeout is expected here but the small timeout prevents errors from being thrown.
This is not a bug since the error would be thrown on the next archive-get call but it does make the tests harder to debug when there is an error.
It is not clear why there was a timeout here at all. It is likely cruft from a prior test or a copy/paste error.
Tests that are duplicated are being removed from the info command unit tests. Specifically tests where the only thing different was whether a lock was held or not which affects only the status display. Removing these tests will reduce churn in the upcoming multi-repo support.
The data returned by the protocol has not been sorted yet so it is vulnerable to differences in collation.
Multiple records are not needed for this test so limit it to one path to solve this issue.
These options were explicitly excluded because it was possible for them to be mangled by SSH if they contained spaces.
They are now excluded by command role validity rules.
The pg option only has one current usage, to let the backup local know which pg index it should copy files from.
There are other possible uses for this option, but they need thought, tests, and documentation.
This option was added in advance of the multi-repo functionality but it has no purpose and it is not clear what the validity rules should be.
The option will be added back when multi-repo functionality is committed.
This results in fewer data duplications and makes the code less fragile since new data add in storageRemoteInfoParse() does not need to be added to an additional list for duplication.
There is an inconsistency when the JSON is output for the case when a stanza is requested and it does not exist in the repo. This was the only case where the archive array was not added to the JSON. Adding it will simplify the upcoming multi-repo support code.
Also, a redundant test was removed rather than updating it for this case.
This was a hack to prevent the remote from loading host settings, which is now handled by option validity for command roles.
These options are still useful so don't remove them, but do leave them internal for now.
Building on 23f5712, limit option validity by role. This is mostly for options that weren't needed for certain roles but were harmless. However, the upcoming multi repository functionality requires the granularity implemented here.
The remote role benefits since host options can automatically excluded when building the options. Also, many options that are only required for the default role (e.g. repo-retention-full) no longer need to be passed in tests for other roles.
Some tests used options in contexts that are currently valid but are not correct usage, i.e. usage of internal options for the default role.
Update these tests in advance of the option validity becoming stricter.
Validity by command was not granular enough so numerous options needed be marked internal so users would not stumble across them. Options were also needlessly being passed to roles that had no use for them.
Introduce per-role validity lists that depend on what roles are valid per command. Also add a check to ensure that only valid roles are used with a command.
This commit adds the functionality but does not introduce any new behavior, i.e. all options are valid for all roles that the command is valid for. A subsequent commit will introduce the new role restrictions to make the changes easier to audit.
Data required for parsing was spread between the config and defined modules, mostly for historical reasons because the same data was used by Perl.
Requiring all the parse rules to be accessed with function interfaces makes the code more complicated and new rules harder to implement.
Instead, move the data to the parse module so in the most complex cases no interface functions are needed. This reduces the total amount of code and paves the way for more complex parse rules.
The help data can be represented more compactly in a pack and this separates data needed for help from data needed for parsing, freeing each to have a more appropriate representation.
This was done in the internal versions but not the user-facing function. That meant the field had to be explicitly read after determining it was NULL, which is wasteful.
Since there is only one behavior now, remove pckReadDefaultNull() and move the logic to pckReadNullInternal().
Testing on Travis-CI has been getting slower (from ~18 minutes to 3-6 hours) and the travis-ci.org service will be terminated at the end of the year. Moving to travis-ci.com is an option but the quotas are too low for our purposes.
Instead use Github Actions, which does not currently have quotas, and runs our current tests with just a few tweaks.
This still leaves multi-architecture tests on Travis-CI but we may be able to run those and stay within the new quotas.
Also fix a minor bug in restoreTest.c exposed by Github Actions using a different name for the user and group.
The pack type is an architecture-independent format for serializing data compactly, inspired by ProtocolBuffers and Avro.
Also add ioReadSmall(), which is optimized for small binary reads, similar to ioReadLineParam().