The backup manifest stores a complete list of all files, links, and paths in a backup along with metadata such as checksums, sizes,
timestamps, etc. A list of databases is also included for selective restore.
The purpose of the manifest is to allow the restore command to confidently reconstruct the PostgreSQL data directory and ensure that
nothing is missing or corrupt. It is also useful for reporting, e.g. size of backup, backup time, etc.
For now, migrate enough functionality to implement the restore command.
Reviewed by Cynthia Shang.
cfgExecParam() was originally written to provide options for remote processes. Remotes processes do not have access to the local config so it was necessary to pass every non-default option.
Local processes on the other hand, e.g. archive-get, archive-get-async, archive-push-async, and local, do have access to the local config and therefore don't need every parameter to be passed on the command-line. The previous way was not wrong, but it was overly verbose and did not align with the way Perl had worked.
Update cfgExecParam() to accept a local option which excludes options from the command line which can be read from local configs.
strPathAbsolute() generates an absolute path from an absolute base path and an absolute/relative path.
strLstRemoveIdx() is a support function based on lstRemoveIdx().
In general we don't care about path and link times since they are easily recreated when restoring.
So, outside of storageInfo() we don't need to bother testing them.
Loading jobs in advance uses a lot of memory in the case that there are millions of jobs to be performed. We haven't seen this yet, but with backup and restore on the horizon it will become the norm.
Instead, use a callback so that jobs are only created as they are needed and can be freed as soon as they are completed.
It's possible (even likely) that the ls output is being piped to something like head which will exit when it gets what it needs and leave us writing to a broken pipe.
It would be better to just ignore the broken pipe error but currently we don't store system error codes.
These features finally make the ls command practical.
Currently the JSON contains only name, type, and size. We may add more fields in the future, but these seem like the minimum needed to be useful.
This warning gives very unpredictable results between compiler versions and seems unrealistic since most of our structs are zeroed for initialization.
This warning has been disabled in the Makefile for a long time.
Broken vendor packages have been causing builds to break due to an error on apt-get update.
Ignore errors and proceed directory to apt-get install. It's possible that we'll try to reference an expired package version and get an error anyway, but that seems better than a guaranteed hard error.
Push the responsibility for sort and find down to the List object by introducing a general comparator function that can be used for both sorting and finding.
Update insert and add functions to return the item added rather than the list. This is more useful in the core code, though numerous updates to the tests were required.
Connection reuse and pipelining are not the same thing and should not have been conflated.
Update comments and release notes to reflect the correct usage.
Update StorageWritePosix to use the new functions.
A side effect is that storageWritePosixOpen() will no longer error when the user/group name does not exist. It will simply retain the original user/group, i.e. the user that executed the restore.
In general this is a feature since completing a restore is more important than setting permissions exactly from the source host. However, some notification of this omission to the user would be beneficial.
Travis will timeout after 10 minutes with no output. Emit a warning every 5 minutes to keep Travis alive and increase the total timeout to 20 minutes.
Documentation builds have been timing out a lot recently so hopefully this will help.
The control and catalog versions were stored a variety of places in the optimistic hope that they would be useful. In fact they never were.
We can't remove them from the backup.info and backup.manifest files due to backwards compatibility concerns, but we can at least avoid loading and storing them in C structures.
Add functions to the PostgreSQL interface which will return the control and catalog versions for any supported version of PostgreSQL to allow backwards compatibility for backup.info and backup.manifest. These functions will be useful in other ways, e.g. generating the tablespace identifier in PostgreSQL >= 9.0.
Info files required three copies in memory to be loaded (the original string, an ini representation, and the final info object). Not only was this memory inefficient but the Ini object does sequential scans when searching for keys making large files very slow to load.
This has not been an issue since archive.info and backup.info are very small, but it becomes a big deal when loading manifests with hundreds of thousands of files.
Instead of holding copies of the data in memory, use a callback to deliver the ini data directly to the object when loading. Use a similar method for save to avoid having an intermediate copy. Save is a bit complex because sections/keys must be written in alpha order or older versions of pgBackRest will not calculate the correct checksum.
Also move the load retry logic to helper functions rather than embedding it in the Info object. This allows for more flexibility in loading and ensures that stack traces will be available when developing unit tests.
Reviewed by Cynthia Shang.
The manifest is not an info file so if anything it should be called backupManifest. But that seems too long for such a commonly used object so manifest seems better.
Note that unlike Perl there is no storage manifest method so this stands as the only manifest in the C code, as befits its importance.
286a106a updated the documentation to build pgBackRest as an unprivileged user, but the wget command was missed. This command is not actually run, just displayed, because the release is not yet available when the documentation is built.
Update the wget command to run as the local user.
Bug Fixes:
* Improve slow manifest build for very large quantities of tables/segments. (Reported by Jens Wilke.)
* Fix exclusions for special files. (Reported by CluelessTechnologist, Janis Puris, Rachid Broum.)
Improvements:
* The stanza-create/update/delete commands are implemented entirely in C. (Contributed by Cynthia Shang.)
* The start/stop commands are implemented entirely in C. (Contributed by Cynthia Shang.)
* Create log directories/files with 0750/0640 mode. (Suggested by Damiano Albani.)
Documentation Bug Fixes:
* Fix yum.p.o package being installed when custom package specified. (Reported by Joe Ayers, John Harvey.)
Documentation Improvements:
* Build pgBackRest as an unprivileged user. (Suggested by Laurenz Albe.)
The {[os-type-is-centos]} expression was missing parens which meant "and" expressions built on it would always evaluate true if the os-type was centos6.
Reported by Joe Ayers, John Harvey.
Decoding a manifest from the JSON provided by C to the hash required by Perl is an expensive process. If manifest() was called on a remote it was being decoded into a hash and then immediately re-encoded into JSON for transmission over the protocol layer.
Instead, provide a function for the remote to get the raw JSON which can be transmitted as is and decoded in the calling process instead.
This makes remote manifest calls as fast as they were before 2.16, but local calls must still pay the decoding penalty and are therefore slower. This will continue to be true until the Perl storage interface is retired at the end of the C migration.
Note that for reasonable numbers of tables there is no detectable difference. The case in question involved 250K tables with a 10 minute decode time (which was being doubled) on a fast workstation.
These constants should be kept separate because the implementation of any info file might change in the future and only the interface should be expected to remain consistent.
In any case, infoBackup requires Variant constants while infoManifest uses String constants so they are not shareable. Modern compilers should combine the underlying const char * constants.
This test is commonly used for sanity checking but the combination of S3 and encryption makes it hard to use and encourages temporary changes to make it usable.
Acknowledge this and disable S3 and encryption for this test and move them to mock/all/2.