Recovery settings are now written into postgresql.auto.conf instead of recovery.conf. Existing recovery_target* settings will be commented out to help avoid conflicts.
A comment is added before recovery settings to identify them as written by pgBackRest since it is unclear how, in general, old settings will be removed.
recovery.signal and standby.signal are automatically created based on the recovery settings.
The additional details include databases that can be used for selective restore and a list of tablespaces and symlinks with their default destinations.
This information is not included in the JSON output because it requires reading the manifest which is too IO intensive to do for all manifests. We plan to include this information for JSON in a future release.
Most of these lists should be quite small with the exception of the list in get.c, but it doesn't cost much to sort them and may help in corner cases we have not thought of.
Separate the generation of recovery values and formatting them into recovery.conf format. This is generally a good idea, but also makes the code ready to deal with a different recovery file in PostgreSQL 12.
Also move the recovery file logic out of cmdRestore() into restoreRecoveryWrite().
This restore type automatically adds standby_mode=on to recovery.conf.
This could be accomplished previously by setting --recovery-option=standby_mode=on but PostgreSQL 12 requires standby mode to be enabled by a special file named standby.signal.
The new restore type allows us to maintain a common interface between PostgreSQL versions.
For the most part this is a direct migration of the Perl code into C.
There is one important behavioral change with regard to how file permissions are handled. The Perl code tried to set ownership as it was in the manifest even when running as an unprivileged user. This usually just led to errors and frustration.
The C code works like this:
If a restore is run as a non-root user (the typical scenario) then all files restored will belong to the user/group executing pgBackRest. If existing files are not owned by the executing user/group then an error will result if the ownership cannot be updated to the executing user/group. In that case the file ownership will need to be updated by a privileged user before the restore can be retried.
If a restore is run as the root user then pgBackRest will attempt to recreate the ownership recorded in the manifest when the backup was made. Only user/group names are stored in the manifest so the same names must exist on the restore host for this to work. If the user/group name cannot be found locally then the user/group of the PostgreSQL data directory will be used and finally root if the data directory user/group cannot be mapped to a name.
Reviewed by Cynthia Shang.
cfgExecParam() was originally written to provide options for remote processes. Remotes processes do not have access to the local config so it was necessary to pass every non-default option.
Local processes on the other hand, e.g. archive-get, archive-get-async, archive-push-async, and local, do have access to the local config and therefore don't need every parameter to be passed on the command-line. The previous way was not wrong, but it was overly verbose and did not align with the way Perl had worked.
Update cfgExecParam() to accept a local option which excludes options from the command line which can be read from local configs.
Loading jobs in advance uses a lot of memory in the case that there are millions of jobs to be performed. We haven't seen this yet, but with backup and restore on the horizon it will become the norm.
Instead, use a callback so that jobs are only created as they are needed and can be freed as soon as they are completed.
It's possible (even likely) that the ls output is being piped to something like head which will exit when it gets what it needs and leave us writing to a broken pipe.
It would be better to just ignore the broken pipe error but currently we don't store system error codes.
These features finally make the ls command practical.
Currently the JSON contains only name, type, and size. We may add more fields in the future, but these seem like the minimum needed to be useful.
Push the responsibility for sort and find down to the List object by introducing a general comparator function that can be used for both sorting and finding.
Update insert and add functions to return the item added rather than the list. This is more useful in the core code, though numerous updates to the tests were required.
The control and catalog versions were stored a variety of places in the optimistic hope that they would be useful. In fact they never were.
We can't remove them from the backup.info and backup.manifest files due to backwards compatibility concerns, but we can at least avoid loading and storing them in C structures.
Add functions to the PostgreSQL interface which will return the control and catalog versions for any supported version of PostgreSQL to allow backwards compatibility for backup.info and backup.manifest. These functions will be useful in other ways, e.g. generating the tablespace identifier in PostgreSQL >= 9.0.
Info files required three copies in memory to be loaded (the original string, an ini representation, and the final info object). Not only was this memory inefficient but the Ini object does sequential scans when searching for keys making large files very slow to load.
This has not been an issue since archive.info and backup.info are very small, but it becomes a big deal when loading manifests with hundreds of thousands of files.
Instead of holding copies of the data in memory, use a callback to deliver the ini data directly to the object when loading. Use a similar method for save to avoid having an intermediate copy. Save is a bit complex because sections/keys must be written in alpha order or older versions of pgBackRest will not calculate the correct checksum.
Also move the load retry logic to helper functions rather than embedding it in the Info object. This allows for more flexibility in loading and ensures that stack traces will be available when developing unit tests.
Reviewed by Cynthia Shang.
The manifest is not an info file so if anything it should be called backupManifest. But that seems too long for such a commonly used object so manifest seems better.
Note that unlike Perl there is no storage manifest method so this stands as the only manifest in the C code, as befits its importance.
Checking the PostgreSQL-reported path and version against the pgBackRest configuration helps ensure that pgBackRest is operating against the correct cluster.
In Perl this functionality was in the Db object, but check seems like a better place for it in C.
Contributed by Cynthia Shang.
Previously storageLocal() was being used internally but loading pg_control from remote storage is often required.
Also, storagePg() is more appropriate than storageLocal() for all current usage.
Contributed by Cynthia Shang.
The Perl versions remain because they are still being used by the Perl stanza commands. Once the stanza commands are migrated they can be removed.
Contributed by Cynthia Shang.
Implement switch WAL and archive check in C but leave the rest in Perl for now.
The main idea was to have some real integration tests for the new database code so the rest of the migration can wait.
Reviewed by Cynthia Shang.
Migrate functionality from the Perl Db module to C. For now this is just enough to implement the WAL switch check.
Add the dbGet() helper function to get Db objects easily.
Create macros in harnessPq to make writing pq scripts easier by grouping commonly used functions together.
Reviewed by Cynthia Shang.
Keep trying to locate the WAL segment until timeout. This is useful for the check and backup commands which must wait for segments to arrive in the archive.
The local process is now entirely migrated to C. Since all major I/O operations are performed in the local process, the vast majority of I/O is now performed in C.
Contributed by David Steele, Cynthia Shang.
Previously only a single filter could be pushed to the remote since order was not being maintained. Now the filters are strictly ordered.
Results are returned from the remote and set in the local IoFilterGroup so they can be retrieved.
Expand remote filter support to include all filters.
Read all data from an IoRead object and discard it. This is handy for calculating size, hash, etc. when the output is not needed.
Update code where a loop was used before.
gcc < 9 makes all compound literals function scope, even though the C spec requires them to be invalid outside the current scope. Since the compiler and valgrind were not enforcing this we had a few violations which caused problems in gcc >= 9.
Even though we are not quite ready to support gcc 9 officially, fix the scoping violations that currently exist in the codebase.
Reported by chrlange, Ned T. Crigler.
Maintaining the storage layer/drivers in two languages is burdensome. Since the integration tests require the Perl storage layer/drivers we'll need them even after the core code is migrated to C. Create an interface layer so the Perl code can be removed and new storage drivers/features introduced without adding Perl equivalents.
The goal is to move the integration tests to C so this interface will eventually be removed. That being the case, the interface was designed for maximum compatibility to ease the transition. The result looks a bit hacky but we'll improve it as needed until it can be retired.
Secure options could show up in the help as "current". While the user must have permissions to see the source of the options (e.g. environment, config file) it's still not a good idea to display them in an unexpected context.
Instead show secure options as <redacted> in the help command.
Amend commit 434cd832 to error when the db history in archive.info and backup.info do not match.
The Perl code would attempt to reconcile the history by matching on system id and version but we are not planning to migrate that code to C. It's possible that there are users with mismatches but if so they should have been getting errors from info for the last six months. It's easy enough to manually fix these files if there are any mismatches in the field.
Contributed by Cynthia Shang.
If the file is compressible (i.e. not encrypted or already compressed) it can be marked as such in storageNewRead()/storageNewWrite(). If the file is being read from/written to a remote it will be compressed in transit using gzip.
Simplify filter group handling by having the IoRead/IoWrite objects create the filter group automatically. This removes the need for a lot of NULL checking and has a negligible effect on performance since a filter group needs to be created eventually unless the source file is missing.
Allow filters to be created using a VariantList so filter parameters can be passed to the remote.
This implementation duplicates the functionality of the Perl code but does so with different logic and includes full unit tests.
Along the way at least one bug was fixed, see issue #748.
Contributed by Cynthia Shang.
This filter exactly mimics the behavior of the Perl filter so is a drop-in replacement.
The filter is not integrated yet since it requires the Perl-to-C storage layer interface coming in a future commit.
These names more accurately reflect what the functions do and follow the convention started in Info and InfoPg.
Also remove the ignoreMissing parameter since it was never used.
Contributed by Cynthia Shang.
Allow commands to be skipped by default in the command help but still work if help is requested for the command directly. There may be other uses for the flag in the future.
Update help for ls now that it is exposed.
Allows listing repo paths/files from the command-line, to be used primarily for testing and debugging.
This command is internal-only so the interface may change at any time without notice.