Bug Fixes:
* Allow [, #, and space as the first character in database names. (Reviewed by Stefan Fercot, Cynthia Shang. Reported by Jefferson Alexandre.)
* Create standby.signal only on PostgreSQL 12 when restore type is standby. (Fixed by Stefan Fercot. Reviewed by David Steele. Reported by Keith Fiske.)
Features:
* Expire history files. (Contributed by Stefan Fercot. Reviewed by David Steele.)
* Report page checksum errors in info command text output. (Contributed by Stefan Fercot. Reviewed by Cynthia Shang.)
* Add repo-azure-endpoint option. (Reviewed by Cynthia Shang, Brian Peterson. Suggested by Brian Peterson.)
* Add pg-database option. (Reviewed by Cynthia Shang.)
Improvements:
* Improve info command output when a stanza is specified but missing. (Contributed by Stefan Fercot. Reviewed by Cynthia Shang, David Steele. Suggested by uspen.)
* Improve performance of large file lists in backup/restore commands. (Reviewed by Cynthia Shang, Oscar.)
* Add retries to PostgreSQL sleep when starting a backup. (Reviewed by Cynthia Shang. Suggested by Vitaliy Kukharik.)
Documentation Improvements:
* Replace RHEL/CentOS 6 documentation with RHEL/CentOS 8.
Update RHEL/CentOS 7 to cover the versions that were previously covered by RHEL/CentOS 6.
Since RHEL/CentOS 7/8 work the same update the documentation logic and labels to reflect this compatibility.
Inaccuracies in sleep time or clock skew might make a single sleep insufficient to reach the next second.
Add a few retries to make the process more reliable but still avoid an infinite loop if something is seriously wrong.
CentOS6 EOL'd and the mirrors were swiftly deleted, leading to failures in tests and documentation.
Remove CentOS 6 for now to get builds going again with the intention to replace it in the near future with CentOS 8.
There is not a lot to be done in this case since it looks like PostgreSQL disconnected while the query was running, but at least improve the error message and remove the assert, which indicates a coding error.
These calls are not required since cipher info is passed explicitly. They are probably a copy-pasto from some past time when one of these functions required it.
Refactor the code to allow a dynamic number of indexes for indexed options, e.g. pg-path. Our reliance on getopt_long() still limits the number of indexes we can have per group, but once this limitation is removed the rest of the code should be happy with dynamic numbers of indexes (with a reasonable maximum).
Add an option to set a default in each group. This was previously handled by the host-id option but now there is a specific option for each group, pg and repo. These remain internal until they can be fully tested with multi-repo support. They are fully tested for internal usage.
Remove the ConfigDefineOption enum and use the ConfigOption enum instead. They are now equal since the indexed options (e.g. cfgOptRepoHost2) have been removed from ConfigOption.
Remove the config/config test module and add required tests to the config/parse test module. Parsing is now the only way to load a config so this removes some redundancy.
Split new internal config structures and functions into a new header file, config.intern.h. More functions will need to be moved over from config.h but that will need to be done in a future commit to reduce churn.
Add repoIdx to repoIsLocal() and storageRepo*(). Multi-repository support requires that repo locality and storage be accessible by index. This allows, for example, multiple repos to be iterated in a loop. This could be done in a separate commit but doesn't seem worth it since the code is related.
Remove the type parameter from storageRepoGet(). This parameter existed solely to provide coverage for the case where the storage type was invalid. A better pattern is to check that the type is S3 once all other types have been ruled out.
Improve locking on remote processes by introducing an exec-id that is unique to the main process and passed to all remote processes. This allows the remote processes to determine if a lock is held by a remote from the same main process. If so, the lock is allowed.
The exec-id is also useful for associating remote logs with main logs for debugging purposes.
When restore type standby is provided, the recovery.signal isn't needed and may lead to some confusion (see #1236).
Lately, when using pg_basebackup --write-recovery-conf, only the standby.signal file is created. This change would then align with that behaviour.
The result structure for the archive id being processed only needs to be retrieved once so moving it outside of the WAL path list processing loop is more efficient.
If a user reset an option such as pg-default on the command-line then an override in the code would not take effect.
Ignore a reset when the code explicitly sets an option to prevent this.
This call to storageRepo() was used to fetch cipher options from a remote to determine if a repo cipher was enabled.
Now the main process does this work and passes the cipher options directly to the local so there is no need to pre-load the repo storage here.
If the push queue limit has been exceeded then nothing will be pushed to the repo so there is no point in checking it. Worse, a failure in the check would cause drop not to run and potentially fill up the disk, exactly the case this feature was designed to prevent.
The async version already checks the push queue limit before checking the repository so now both versions have the same behavior.
These warnings were only being reported to PostgreSQL on the console. Now they are also recorded in the async log increasing the chance that they will be seen.
This also improves coverage by requiring a warning during async processing to have a test case, which has been added.
Checking the default here was fragile. If the default were to change the code would break.
This also removes the only dependency on cfgOptionDefault() outside of the help command.
Three exclamation points are used by convention as a marker for code that needs attention before it can be committed to integration.
If the markers are in this file they come up in every search.
The tests were originally written by loading values directly into the configuration before the parser was available.
Update to use harnessCfgLoad() to simplify the tests and make them compatible with upcoming config changes.
Return a path missing error when a stanza is specified for the info command but the stanza does not exist in the repository.
Previously [] was returned, which is still the case if no stanza is specified and the repository does not exist.
lstRemoveIdx(list, 0) resulted in the entire list being moved down to the first position which could take a long time for big lists. This is a common pattern in backup/restore when processing file queues.
Instead simply move the list pointer up when first item is removed. Then on insert check if there is space at the beginning when there is no longer space at the end and do the move then. This way if a list is built and then drained without any new inserts then no move is required.
There were a number of places in the code where "hostId" was used, but hostId is just the option group index + 1 so this led to a lot of +1 and -1 to convert the id to an index and vice versa.
Instead just use the zero based index wherever possible. This is pretty much everywhere except when the host-id option is read or set, or where a message is being formatted for the user.
Also fix a bug in protocolRemoteParam() where remotes spawned from the main process could get process ids that were not 0. Only the locals should spawn remotes with process id > 0. This seems to have been harmless since the process id is only a label, but it could be confusing when debugging.
The defines for FUNCTION_LOG_VERIFY_WAL_RANGE* are not used in the current verify.c and are currently not planned in the continuing development of the verify command, so they are dead code and are therefore being removed.
bufSize() should only be used whem checking the total size of the buffer, not how much of it is currently used.
In these cases bufUsed() and bufSize() are returning the same value but benign-looking code changes could break this assumption.
iniLoad() was trimming lines which meant that a leading space would not pass checksum validation when a manifest was reloaded. Remove the trims since files we write should never contain extraneous spaces. This further diverges the format for the functions that read conf files (e.g. pgbackrest.conf) and those that read info (e.g. manifest) files.
While we are at it also allow [ and # as initial characters. # was reserved for comments but we never put comments into info files. [ denotes a section but we can get around this by never allowing arrays as values in info files, so if a line ends in ] it must be a section. This is currently the case but enforce it by adding an assert to info/info.c.
The tests were originally written by loading values directly into the configuration before the parser was available.
Update to use harnessCfgLoadRaw() to simplify the tests and make them compatible with upcoming config changes.
Note that some unreachable conditions were removed since they could not be reached via a parsed config, only by munging values directly into the config. cfgOptionTest(optionId) was removed because a non-default value must always be set. cfgOptionValid(cfgOptLogTimestamp) was removed because it is true for all commands except for cfgCmdNone, which is checked with an assert.
cfgOptionId() did not recognize deprecated options which made the help command throw errors when they were specified on the command line. cfgParseOption() will correctly identify deprecated options.
cfgParseOption() can also be used in cfgParse() to reduce code duplication when parsing info out of the option value returned by optionFind().
Finally, code the option key index separately in parse.auto.c. For now they are simply added back together but future code will need them separated.
This has always been equivalent to the ConfigCommand enum so it just adds complexity.
It was created for symmetry with ConfigDefineOption, which will also be removed soon.
Currently indexes above 1 do not have dependencies checked, so this doesn't error.
In a future commit we will enable those checks and this will error if it is not fixed.
These constants don't scale well as the index total is increased for an option.
The core code rarely uses these options and they are easily replaced with cfgOptionName().
The tests had started to make use of the constants, so provide functions that build the option name from the optionId and, optionally, the optionKey.
WAL timeline history files were not being expired because they were small and generally not very plentiful.
However, in some cases large numbers of history files may be generated so it makes sense to remove useless history files to keep things tidy.
The history file for the oldest retained timeline is kept for debugging purposes even though it is not used for recovery.
Instead of using memmove() to manage the internal output buffer for every small read, track the current buffer position and only move data when the small read cannot be satisfied and more data is needed.
Group related options together so operations (e.g. valid, test, index total) can be performed on all options in the group.
Previously, options at the top of the hierarchy of the related options were used to do these tests. This was prone to error as option relationships changed and it was not always clear which option (or options) should be used.