Multi-part upload may fail despite returning an HTTP success code. Check for the ETag field in the result and if not present consider the upload to have failed. This will trigger a retry at the local job level.
Links were followed before they were checked for validity so a circular link would send the manifest build into endless recursion leading to a crash. Fix by moving the recursion after the link check.
Note that this issue has existed since the C migration and was not introduced by the refactor in eba013b.
Data directory creation was added during the C migration, but creation of the base data directory (PGDATA) was prevented by a check migrated from Perl.
Remove the check and update tests to create the data directory at least once.
Includes archiveCommon, archiveGet and archivePush.
Also fixed a test that was looking in repo instead of repo3 in the original archivePush to use the repo3 path as stated by the comment (line 879 in original tests and line 855 in new tests).
It seems better to use TEST_PATH in combination with a constant string rather than have a number of different path constants. This improves readability and reduces confusion about which constant should be used.
For tests already updated as part of the macro-replacement effort, the output tests (TEST_ERROR, TEST_RESULT_LOG, TEST_STORAGE_LIST and TEST_RESULT_STR) have been simplified for readability to remove all but the TEST_PATH constants. The ongoing macro-replacement effort will include these changes.
Updated: expireTest, stanzaTest, checkTest, infoTest, verifyTest (infoArchive and infoBackup had no changes).
Switch from JSON-based to binary protocol for communicating with local and remote process. The pack type is used to implement the binary protocol.
There are a number advantages:
* The pack type is more compact than JSON and are more efficient to render/parse.
* Packs are more strictly typed than JSON.
* Each protocol message is written entirely within ProtocolServer/ProtocolClient so is less likely to get interrupted by an error and leave the protocol in a bad state.
* There is no limit on message size. Previously this was limited by buffer size without a custom implementation, as was done for read/writing files.
Some cruft from the Perl days was removed, specifically allowing NULL messages and stack traces. This is no longer possible in C.
There is room for improvement here, in particular locking down the allowed sequence of protocol messages and building a state machine to enforce it. This will be useful for resetting the protocol when it gets in a bad state.
Some tests had to be reordered or updated, as follows:
* Reordered tests at line 317 and 331 to avoid unnecessary file removal.
* Change "stanza found" test at line 1735 to reflect real-life scenario. Originally this test had the cipher-pass environment key set up which caused the RepoGrp to be 2 but with no valid repo path. This resulted in the repo loops executing for the repo2 but since the path was not defined, the tests just reported "none" for cipher which is incorrect since the repo IS encrypted.
* Moved order of HRN_CFG_LOAD in some tests when able to avoid using storageTest.
It is better to clear errors after the catch block completes rather than leave them set until the next error. This also make is possible to tell when a error is currently being handled, which a function further down the stack might use to modify its behavior. Currently this is only useful in testing, but clearing the error seems like a good idea in general.
Two places used errors outside the CATCH() block. Mem context cleanup now uses a FINALLY() which is a better implementation anyway. The error handling in main() now calls exitSafe() from withing the CATCH() block.
Add StringList, which is not a primitive type but rather an array of String types.
Also update pckWriteToLog() to work after pckWriteEnd(), i.e. this->tagStackTop is NULL.
Move a PackRead or PackWrite object to a new mem context.
Also note that these functions may not work as expected with pack objects created by pckReadNewBuf() and pckWriteNewBuf() since the pack object does not have ownership of the passed buffer and cannot move it.
The hrnErrorThrowP() macro allows errors with specified fields to be generated, which simplifies testing.
Update the common/exit test to use the new macro.
Azurite, which is used for testing, did not enforce this before so the capital letters were not a problem. Now Azurite enforces the same rules as Azure so use lower-case identifiers instead.
These names were only used in integration tests so there was no production impact.
This allows TEST_STORAGE_EXISTS() to be used in most cases where TEST_STORAGE_REMOVE() was used before.
Rename TEST_STORAGE_REMOVE() to HRN_STORAGE_REMOVE() now that is is no longer used as a test. Still allow an error when the file is missing just to help keep tests tidy.
Since the pack type was stored in 4 bits, only 15 values were allowed (0 was reserved).
Allow virtually unlimited types by storing type info in a base-128 encoded integer following the tag when the type bits in the tag are set to 0xF.
Also separate the type IDs used in the pack (PackTypeMap) from those presented to the user (PackType). The prior PackType enum exposed implementation details to the user, e.g. pckTypeUnknown.
The functions were named with short integer representations (e.g. I32) but the param structs were using longer ones, e.g. UInt32. Shorten the integer representations in the param structs to match.
Also rename pckReadUInt64Internal() to pckReadU64Internal() for the same reason.
The pg storage must be started before the repo storage to set the max remotes allowed to 2. The protocol helper expects all remotes to have the same type so we are cheating here a bit, but without this ordering the second remote will never be sent an explicit exit and may not save coverage data.
Bug Fixes:
* Fix issues with leftover spool files from a prior restore. (Reviewed by Cynthia Shang, Stefan Fercot, Floris van Nee. Reported by Floris van Nee.)
* Fix issue when checking links for large numbers of tablespaces. (Reviewed by Cynthia Shang, Avinash Vallarapu. Reported by Avinash Vallarapu.)
* Free no longer needed remotes so they do not timeout during restore. (Reviewed by Cynthia Shang. Reported by Francisco Miguel Biete.)
* Fix help when a valid option is invalid for the specified command. (Reviewed by Stefan Fercot. Reported by Cynthia Shang.)
Features:
* Add PostgreSQL 14 support. (Reviewed by Cynthia Shang.)
* Add automatic GCS authentication for GCE instances. (Reviewed by Jan Wieck, Daniel Farina.)
* Add repo-retention-history option to expire backup history. (Contributed by Stefan Fercot. Reviewed by Cynthia Shang, David Steele.)
* Add db-exclude option. (Contributed by Stefan Fercot. Reviewed by Cynthia Shang.)
Improvements:
* Change archive expiration logging from detail to info level. (Contributed by Cynthia Shang. Reviewed by David Steele.)
* Remove stanza archive spool path on restore. (Reviewed by Cynthia Shang, Stefan Fercot.)
* Do not write files atomically or sync paths during backup copy. (Reviewed by Stephen Frost, Stefan Fercot, Cynthia Shang.)
Documentation Improvements:
* Update contributing documentation. (Contributed by Cynthia Shang. Reviewed by David Steele, Stefan Fercot.)
* Consolidate RHEL/CentOS user guide into a single document. (Reviewed by Cynthia Shang.)
* Clarify that repo-s3-role is not an ARN. (Contributed by Isaac Yuen. Reviewed by David Steele.)
HRN_CFG_LOAD() handles the majority of test configuration loads and has various options for special cases.
It was not clear when to use harnessCfgLoadRaw() vs harnessCfgLoad(). Now "raw" functionality is granular and enabled by parameters, e.g. noStd.
Make the macros more consistent in format and make sure that each macro outputs a line number before doing any work so when errors happen it is clear where they happened.
Add noRecurse option to TEST_STORAGE_LIST().
Add comment option to all storage macros.
Instead store the line number in hrnTestLogPrefix() so it doesn't need to be passed to hrnTestResultBegin().
Also add missing linefeed in hrnStorageList().
All instances of storageTest are better represented with storagePg*(), which allows TEST_PATH and TEST_PATH_PG to be omitted.
Also remove some headers which are no longer needed.
The default is to keep all backup history to match the current behavior. In minimal configuration (0 days), unexpired backups are always kept in history.
When a full backup manifest expires, all dependent differential/incremental manifests expire as well.
Run the remote process inside a forked child process instead of exec'ing it. This allows coverage to accumulate in the remote process rather than needing to test the remote protocol functions directly, resulting in better end-to-end testing and less test duplication. Another advantage is that the pgbackrest binary does not need to be built for the test and the test does not need to run in a container.
This allows protocolRemoteExec() to be shimmed, which means the remote can be run as a child of the test process, simplifying coverage testing.
The shim does not need SSH parameters, so also split those out into a separate function and update the tests to match.
Add executable to parameter list to avoid first option being lost. The backup, restore, and verify tests worked OK with their first option being defaulted because it ended up being job-retry which worked fine as the default.
Add hrnProtocolLocalShimUninstall() allow the shim to be uninstalled.
Log shim at debug level to make it obvious in the logs when a shim is in use.
There are no code changes from PostgreSQL 13 so simply add the new version.
Add CATALOG_VERSION_NO_MAX to allow the catalog version to "float" during the PostgreSQL beta/rc period so new pgBackRest versions are not required when the catalog version changes.
Update the integration tests to handle new PostgreSQL startup messages.
manifestLinkCheck() was pretty inefficient so large numbers of links caused it to use a lot of memory and eventually crash. This is a more efficient implementation which runs O(nlogn) and uses far less memory.
Checking for duplicate file links has been added, which represents a change in behavior, but hopefully a good one.
Test and remove WAL segment 000000010000000100000002 in the test where it is created rather than as a byproduct of a much later test.
Remove incorrect local role from test. It worked, but was not the correct command role to be using when calling cmdArchiveGet().
Move some harness headers down to the correct section.
Comment formatting was not used much but it incurred a heavy cost in each macro to process possible formatting.
Remove formatted comments where they did not contain valuable information and replace with strZ(strNewFmt()) otherwise.
A define was already added for TEST_PATH but it was not widely used. Replace all occurrences of testPath() with TEST_PATH in the tests.
Replace testUser() with TEST_USER, testGroup() with TEST_GROUP, testRepoPath() with HRN_PATH_REPO, testDataPath() with HRN_PATH, testProjectExe() with TEST_PROJECT_EXE, and testScale() with TEST_SCALE.
Replace {[path]}, {[user]}, {[group]}, etc. with defines and remove hrnReplaceKey(). This is better than having two ways to deal with replacements.
In some cases the original test*() getters were kept because they are used by the harness, which does not have access to the new defines. Move them to harnessTest.intern.h to indicate that the tests should no longer use them.
Replace all instances of strNew("") with strNew() and use strNewZ() for non-empty zero-terminated strings. Besides saving a useless parameter, this will allow smarter memory allocation in a future commit by signaling intent, in general, to append or not.
In the tests use STRDEF() or VARSTRDEF() where more appropriate rather than blindly replacing with strNewZ(). Also replace strLstAdd() with strLstAddZ() where appropriate for the same reason.
Run the local process inside a forked child process instead of exec'ing it. This allows coverage to accumulate in the local process rather than needing to test the local protocol functions directly, resulting in better end-to-end testing and less test duplication. Another advantage is that the pgbackrest binary does not need to be built for the test.
The backup, restore, and verify command tests have been updated to use the new shim for coverage.
A shim allows a test harness to access static functions and variables in a C module, and also allows functions to be shimmed (i.e. overridden) for the purposes of testing.
For instance, coverage testing works when a process that is normally exec'd is run as a forked child process instead.
getopt_long() requires an exhaustive list of all possible options that may be found on the command line. Because of the way options are indexed (e.g. repo1-4, pg1-8) optionList[] has 827 entries and we have kept it small by curtailing the maximum indexes very severely. Another issue is that getopt_long() scans the array sequentially so parsing gets slower as the index maximums increase.
Replace getopt_long() with a custom implementation that behaves the same but allows options to be parsed with a function instead of using optionList[]. This commit leaves the list in place in order to focus on the getopt_long() replacement, but cfgParseOption() could be replaced with a more efficient implementation that removes the need for optionList[].
This implementation also fixes an issue where invalid options were misreported in the error message if they only had one dash, e.g. -config. This seems to have been some kind of problem in getopt_long(), but no investigation was done since the new implementation fixes it.
Tests were added at 0825428, 2b8d2da, 34dd663, and 384f247 to check that previously untested getopt_long() behavior doesn't change.
This makes the macro useful when subpaths are present.
Identify types other than files (path, link, etc.) with a single appended character for easier debugging.
Remove stanza archive spool path so existing files do not interfere with the new cluster. For instance, old archive-push acknowledgements could cause a new cluster to skip archiving. This should not happen if a new timeline is selected but better to be safe. Missing stanza spool paths are ignored.
Also add new path expression STORAGE_SPOOL_ARCHIVE to easily access this path.
When running on a GCE instance the authentication token can be pulled directly from the instance metadata. This is configured with repo-gcs-key-type=auto.
In a separate commit (26fefa6), move the code that parses the token response into a separate function, storageGcsAuthToken(), since it is now needed by two key types. This drastically improves the readability of the main commit.
When running outside of our standard Vagrantfile the default will not be set correctly, so require the user to set it.
In any case, this option is primarily useful for reporting so note that in the command line help.
Some version interface test functions were integrated into the core code because they relied on the PostgreSQL versioned interface. Even though they were compiled out for production builds they cluttered the core code and made it harder to determine what was required by core.
Create a PostgreSQL version interface in a test harness to contain these functions. This does require some duplication but the cleaner core code seems a good tradeoff. It is possible for some of this code to be auto-generated but since it is only updated once per year the matter is not pressing.
If an ok file (which indicates the WAL segment was not found) is present on the first iteration of the loop then remove it and spawn the async process to retry. This action also resets the queue.
Also error if no response is received from the async process rather than returning not found. PostgreSQL will respond the same either way, but this allows us to determine when something is going wrong with the async process.
Update archiveAsyncStatus() to allow warnings to be suppressed. It is better to retry if no WAL segment was found before warning because the warning might be stale.
If an option name has a space at the beginning then it will be considered an invalid command, but a space at the end is an invalid option. Add tests for these conditions.
Spaces in option arguments should be preserved, so add a test to be sure this is true.
Convert most of the remaining options that benefit from being StringIds. Since all the command modules can include config.h directly it makes sense to auto-generate these values instead of manually creating an enum for each one.
For the time being StringIds are not being auto-generated because the StringId code does not exist in Perl. However, the *_Z zero-terminated constants for each allowed option value are now auto-generated.
The CentOS 7 documentation test relies on PostgreSQL 9.5 which has been removed from the yum.p.o repository package. Switch the test to CentOS 8 to fix the immediate issue, but a decision on the PostgreSQL 9.5 documentation will need to be made before the next release.
The tests worked fine on multiple architectures, but would only run "bare metal", i.e. tests that required containers could not be run.
Enable basic multi-architecture support by allowing containers to be built using whatever architecture the host supports. Also allow cached containers to be defined for multiple architectures in container.yaml.
Add a Dockerfile which can be used as a container for other containers to provide a consistent development environment.
The primary goal is to allow development on Mac M1 but other architectures should find these improvements useful.
Allows removal of backupType()/backupTypeStr() and improves debug logging of the enum.
Move BackupType enum and string constants to info/infoBackup.h so they are available to more modules. Also convert InfoBackup to use BackupType instead of a String.
Centralize the formatting of the configuration value for display to the user or passing on a command line.
For the new functions, if the value was set by the user via the command line, config, etc., then that exact value will be displayed. This makes it easier for the user to recognize the value and saves having to format it into something reasonable, especially for time and size option types.
Note that cfgOptTypeHash and cfgOptTypeList option types are not supported by these functions, but they are generally not displayed to the user as a whole.
This also fixes a bug in config/load.c where time values where not being formatted correctly in an error message.
Use StringIds for the storage types (e.g. STORAGE_S3_TYPE) and configuration settings, e.g. cfgOptS3KeyType.
Also add new config functions and harness config functions to support StringIds.
There is no need to write the file atomically (e.g. via a temp file on Posix) because checksums are tested on resume after a failed backup. The path does not need be synced for each file because all paths are synced at the end of the backup.
This functionality was not lost during the migration -- it never existed in the Perl code, though these settings are used in restore. See 59f1353 where backupFile() was migrated to C.
Fix the segfault when getting help for an internal option is requested by adding help for all internal options that are valid for a default command role.
Also print warnings about internal options in code rather than putting in each command/option description.