Command-line help is now generated at build time so it does not need to be committed. This reduces churn on commits that add configuration and/or update the help.
Since churn is no longer an issue, help.auto.c is bzip2 compressed to save space in the binary.
The Perl config parser (Data.pm) has been moved to doc/lib since the Perl build path is no longer required.
Likewise doc/xml/reference.xml has been moved to src/build/help/help.xml since it is required at build time.
The newer version of valgrind helps with some arm64 issues that have been fixed since the architecture has become more popular. Also add the valgrind builds to the Vagrantfile and Dockerfile.
Move the CA cert install from the base container to the test container. This means the CA cert can be changed without rebuilding all the base containers.
The primary benefit is that objects can allocate memory for their struct with the context, which saves an additional allocation and makes it easier to read context/allocation dumps. Also, the memory context does not need to be stored with the object since it can be determined using the object pointer.
Object pointers cannot be moved, so this means whatever additional memory is allocated cannot be resized. That makes the additional memory ideal for object structs, but not so much for allocating a list that might change size.
Mem contexts can no longer be reused since they will probably be the wrong size so their memory is freed on memContextFree(). This still means fewer allocations and frees overall.
Interfaces still need to be freed by mem context so the old objMove() and objFree() have been preserved as objMoveContext() and objFreeContext(). This will be addressed in a future commit.
The prior limitations were based on using getopt_long() to parse command-line options, which required a static list of allowed options. Setting index max too high bloated the binary unacceptably. 45a4e80 replaced the functionality of getopt_long() but the static list remained.
Improve cfgParseOption() to use available option data and remove the need for a static list. This also allows the option deprecations to be represented more compactly.
Index max is still capped at 256 because a large enough index could cause parseOptionIdxValue() to run out of memory since it allocates a static list based on the highest index found. If that function were improved with a map of found index values then index max could be set to UINT64_MAX.
Note that deprecations no longer set an index max or define whether reset is valid. These were space-saving measures which are no longer required. This means that indexed deprecated options will also be valid up to 256 and always allow reset, but it doesn't seem worth additional code to limit this behavior.
cfgParseOptionId() is no longer needed because calling cfgParseOption() with .ignoreMissingIndex = true duplicates the functionality of cfgParseOptionId(). This leads to some simplification in the help code.
The certs are available in test/certificate so it makes more sense to use them there. In addition the container does not need to be rebuilt unless the CA cert changes.
contextParentIdx was introduced in 90709dfd to improve the performance of mem context frees. memContextMove() did not get the message, however, and continued to use a loop to find the mem context in the old parent.
Use contextParentIdx to find the mem context in the old parent to avoid a loop.
IMPORTANT NOTE: The log level for copied files in the backup/restore commands has been changed to detail. This makes the info log level less noisy but if these messages are required then set the log level for the backup/restore commands to detail.
Bug Fixes:
* Detect errors in S3 multi-part upload finalize. (Reviewed by Cynthia Shang, Marco Montagna. Reported by Marco Montagna, Lev Kokotov, Anderson A. Mallmann.)
* Fix detection of circular symlinks. (Reviewed by Stefan Fercot. Reported by Rohit Raveendran.)
* Only pass selected repo options to the remote. (Reviewed by David Christensen, Cynthia Shang. Reported by Greg Sabino Mullane, David Christensen.)
Improvements:
* Binary protocol. (Reviewed by Cynthia Shang.)
* Automatically create data directory on restore. (Contributed by Stefan Fercot. Reviewed by David Steele. Suggested by Chris Bandy.)
* Allow restore --type=lsn. (Contributed by Stefan Fercot. Reviewed by Cynthia Shang. Suggested by James Coleman.)
* Change level of backup/restore copied file logging to detail. (Reviewed by Stefan Fercot. Suggested by Jens Wilke.)
* Loop while waiting for checkpoint LSN to reach replay LSN. (Contributed by Stefan Fercot. Reviewed by David Steele. Suggested by Fatih Mencutekin.)
* Log backup file total and restore size/file total. (Reviewed by Cynthia Shang.)
Documentation Bug Fixes:
* Fix incorrect host names in user guide. (Reviewed by Stefan Fercot. Reported by Greg Sabino Mullane.)
Documentation Improvements:
* Update contributing documentation and add pull request template. (Contributed by Cynthia Shang. Reviewed by David Steele.)
* Rearrange backup documentation in user guide. (Reviewed by Cynthia Shang.)
* Clarify restore --type behavior in command reference. (Contributed by Cynthia Shang. Reviewed by David Steele.)
* Fix documentation and comment typos. (Contributed by Eric Radman. Reviewed by David Steele.)
Test Suite Improvements:
* Add check for test path inside repo path. (Reviewed by Greg Sabino Mullane. Suggested by Greg Sabino Mullane.)
* Add CodeQL static code analysis. (Reviewed by Cynthia Shang.)
* Update tests to use standard patterns. (Contributed by Cynthia Shang. Reviewed by David Steele.)
The error was written to the client and then another command read. If the write did not fail then the loop would never exit.
Instead exit on any error that is not raised by the command handler as we can pretty safely assume this is an unrecoverable protocol error. The command handler might throw a protocol error itself, but this should be caught in the next read or write in the main loop.
If the buffer was not full at EOF then ioReadSmall() would get stuck in an infinite loop. Instead, return on EOF even if the buffer is not full.
This is not an issue in released versions since ioReadSmall() is not being used.
Also fix a comment typo.
The backup size was a bit off because it did not include any files (e.g. backup_label, WAL files) that were added to the manifest after the main copy. To fix this move the log message to the very end of the backup.
Add size/file total log message to restore since it did not exist before.
The storageInfoList() test was broken by 54c4eb0c when the remote was changed to use writeable storage. Since the test driver was being injected into the wrong location, new default storage was created and the test effectively did nothing but still "succeeded".
To prevent this type of regression, add checks to ensure the expected test driver is being used and the callback runs the expected number of times.
Cleanup all clients inherited from the parent process so they cannot be accidentally used to send messages to servers that do not belong to this process.
We need to do this carefully so that exit commands are not sent and processes are not terminated, so clear the mem context callback on each object before freeing it.
Options for other repos can cause conflicts and should never be used. Each remote can address exactly one repo or pg cluster.
Also fix an outdated comment.
This function was included in a header but not declared inline, so linker errors happened when the header was included into more than one file.
Because of the setjmp() in TRY_BEGIN() it can't be inlined so put it in a C file.
Also add some missing headers.
There have been intermittent failures on f33 (with coverage) but not on u16 (without coverage).
Reproducing this reliably has been very difficult, so just try increasing the timeouts. This is based on the observation that tests with coverage take longer than tests without, which may lead the f33 tests to fail if CI is running slower than usual.
This will not increase the runtime of the test unless there is an error.
If configure/make has been run in the src path it can conflict with tests, which may require different build options.
Also add a comment when rebuilding for code generation.
This file duplicated the command list that already exists in parse.auto.c.
Combine the data from config.auto.c into parse.auto.c and adjust the interface functions as needed. Quite a few were able to be moved to parse.c as static.
If the test path is inside the repo path then it can cause strange issues during testing because the entire repo path is duplicated into the test path so that all tests see a consistent view of the repo.
Another solution might be to pick a better test path name and exclude it from the rsync, but this fix at least addresses the immediate issue.
This was started in c5ae047e but did not include generation of parse.auto.c.
The parser has also been improved with better errors and multiple passes to reduce dependency on ordering and produce and cleaner output.
Option order resolution now includes cycle detection.
Remove strIdGenerate() since bldStrId() performs the same function without cluttering the core code. Since bldStrId() is intended to work in non-debug builds, move the validity checks for input strings out of the DEBUG block.
StringIds are generated as 5/6 bit, whichever is most efficient, for each option value. cfgOptionStrIdInternal() has been updated for this logic.
This allows a local/remote to be started independently of server initialization, which will be useful for implementing new transport types, e.g. TLS.
Also remove some dead code in localTest.c.
protocolServerNew() does not automatically process a noop so this made the handshake in the constructors asymmetric. This made testing a bit confusing since an extra noop was needed when cmdLocal()/cmdRemote() were not called (since they processed the noop sent by protocolClientNew()).
The extra noops also make complex protocol negotiation (coming in a future commit) more complicated and slower because of the additional round trips.
The storage tests were not modified to the HRN_STORAGE_* nor TEST_STORAGE_* macros as these test are testing the storage drivers.
Note that posixTest.c removed an extraneous #endif // TEST_CONTAINER_REQUIRED and #ifdef TEST_CONTAINER_REQUIRED.
This PR includes all files in the storage/* test directory, namely: azureTest.c, cifsTest.c, gcsTest.c, posixTest.c, remoteTest.c, s3Test.c
--smart is now the default mode. Since --dev is now just an alias for --no-optimize, remove it. --dev-test has been a noop for a while, so this seems like a good time to remove it.
Also make the C auto-generator skip writing files that have not changed to avoid updating the timestamp.
Note that the logging output display of a parent/child test may look jumbled on some systems since the child and parent are attempting to log information at the same time. This is not an issue with the actual test, rather a harness issue that would be beyond the scope of this project to fix.
Parse enough of config.yaml to auto-generate config.auto.h and config.auto.c.
This commit implements most of the infrastructure needed to migrate the rest of the build code to C, but each set of auto-generated files will present its own challenges.
The build is now dependent on libyaml. At this point there is no need for a hard requirement, but that will come soon so it seems better to add the dependency now.
Update Ubuntu 12.04 to 16.04. Version 16.04 is recently EOL but testing on an old version is beneficial.
Update Ubuntu 18.04 to 20.04.
Update Fedora 32 to 33. Version 34 would have been preferred but there were some build issues, i.e. the default shell did not work with configure, and after ksh was installed configure locked up.
Add --no-install-recommends to apt-get commands to save a bit of time and space.
Update test Dockerfile to run in multiple steps. This makes the container larger but also makes rebuilding after changes faster. The --squash option may be used to keep the container small.
Remove obsolete casts in protocol/parallel module. These casts were included in the original migration because Ubuntu 12.04 32-bit gcc required them, but Ubuntu 16.04 32-bit gcc complains. There is no production issue here since at this point in the code the file descriptors are guaranteed to be >= 0.
In the first test (helpRenderSplitSize) added test for empty list and in that and some other tests, the test comment was updated to clarify a bit more what the actual tests is trying to accomplish.
Note that help test parameters can only use the harnessConfig system when testing option values that have been set since options passed to the help command are not "set" options.
Includes backup and backupCommon tests.
Some tests in backupTest were split out where they were originally combined into a single boolean check - which made it difficult to determine which part of the conditional failed.
String values were also removed where they were no longer needed.
It is possible for the checkpoint LSN to lag slightly behind the replay LSN until pg_control has been updated.
Add a loop to keep checking rather than failing when the checkpoint LSN has not been updated.
This removes a lot of boiler plate where every instance needs to create these interfaces.
Also add HRN_FORK_*_NOTIFY*() macros to standardize synchronizing between the parent and child processes.
In both cases update the tests with the new macros.
Simplify HRN_FORK_CHILD_BEGIN() by adding optional parameters with the common defaults.
Add _FD() to macros that retrieve file descriptors to make their purpose clearer.
The log level for copied files in the backup/restore commands has been changed to detail. This makes the info log level less noisy but if these messages are required then set the log level for the backup/restore commands to detail.
The protocol does not put an end message on exit so there was a race between the main process exiting and the remote processes exiting. If the main process exited first then the remote processes might not write coverage data causing coverage to fail.
Fix by calling exit explicitly at the end of the test and update the harness to put an end message so the exits are synchronized.
In the commandTest the HRN_STORAGE_REMOVE replacement uses .errorOnMissing when the code being tested added the file. The reason for this is 3 fold:
1. to ensure that an inadvertent typo in the path/file name does not go undetected,
2. to ensure that nothing else has removed the file prior to the call, and
3. consistency
Also, added "stanza" to comment when a stanza stop file is removed vs an "all" stop file.
Multi-part upload may fail despite returning an HTTP success code. Check for the ETag field in the result and if not present consider the upload to have failed. This will trigger a retry at the local job level.
Links were followed before they were checked for validity so a circular link would send the manifest build into endless recursion leading to a crash. Fix by moving the recursion after the link check.
Note that this issue has existed since the C migration and was not introduced by the refactor in eba013b.
Data directory creation was added during the C migration, but creation of the base data directory (PGDATA) was prevented by a check migrated from Perl.
Remove the check and update tests to create the data directory at least once.
Includes archiveCommon, archiveGet and archivePush.
Also fixed a test that was looking in repo instead of repo3 in the original archivePush to use the repo3 path as stated by the comment (line 879 in original tests and line 855 in new tests).
It seems better to use TEST_PATH in combination with a constant string rather than have a number of different path constants. This improves readability and reduces confusion about which constant should be used.
For tests already updated as part of the macro-replacement effort, the output tests (TEST_ERROR, TEST_RESULT_LOG, TEST_STORAGE_LIST and TEST_RESULT_STR) have been simplified for readability to remove all but the TEST_PATH constants. The ongoing macro-replacement effort will include these changes.
Updated: expireTest, stanzaTest, checkTest, infoTest, verifyTest (infoArchive and infoBackup had no changes).
Switch from JSON-based to binary protocol for communicating with local and remote process. The pack type is used to implement the binary protocol.
There are a number advantages:
* The pack type is more compact than JSON and are more efficient to render/parse.
* Packs are more strictly typed than JSON.
* Each protocol message is written entirely within ProtocolServer/ProtocolClient so is less likely to get interrupted by an error and leave the protocol in a bad state.
* There is no limit on message size. Previously this was limited by buffer size without a custom implementation, as was done for read/writing files.
Some cruft from the Perl days was removed, specifically allowing NULL messages and stack traces. This is no longer possible in C.
There is room for improvement here, in particular locking down the allowed sequence of protocol messages and building a state machine to enforce it. This will be useful for resetting the protocol when it gets in a bad state.
Some tests had to be reordered or updated, as follows:
* Reordered tests at line 317 and 331 to avoid unnecessary file removal.
* Change "stanza found" test at line 1735 to reflect real-life scenario. Originally this test had the cipher-pass environment key set up which caused the RepoGrp to be 2 but with no valid repo path. This resulted in the repo loops executing for the repo2 but since the path was not defined, the tests just reported "none" for cipher which is incorrect since the repo IS encrypted.
* Moved order of HRN_CFG_LOAD in some tests when able to avoid using storageTest.
It is better to clear errors after the catch block completes rather than leave them set until the next error. This also make is possible to tell when a error is currently being handled, which a function further down the stack might use to modify its behavior. Currently this is only useful in testing, but clearing the error seems like a good idea in general.
Two places used errors outside the CATCH() block. Mem context cleanup now uses a FINALLY() which is a better implementation anyway. The error handling in main() now calls exitSafe() from withing the CATCH() block.
Add StringList, which is not a primitive type but rather an array of String types.
Also update pckWriteToLog() to work after pckWriteEnd(), i.e. this->tagStackTop is NULL.
Move a PackRead or PackWrite object to a new mem context.
Also note that these functions may not work as expected with pack objects created by pckReadNewBuf() and pckWriteNewBuf() since the pack object does not have ownership of the passed buffer and cannot move it.
The hrnErrorThrowP() macro allows errors with specified fields to be generated, which simplifies testing.
Update the common/exit test to use the new macro.
Azurite, which is used for testing, did not enforce this before so the capital letters were not a problem. Now Azurite enforces the same rules as Azure so use lower-case identifiers instead.
These names were only used in integration tests so there was no production impact.
This allows TEST_STORAGE_EXISTS() to be used in most cases where TEST_STORAGE_REMOVE() was used before.
Rename TEST_STORAGE_REMOVE() to HRN_STORAGE_REMOVE() now that is is no longer used as a test. Still allow an error when the file is missing just to help keep tests tidy.
Since the pack type was stored in 4 bits, only 15 values were allowed (0 was reserved).
Allow virtually unlimited types by storing type info in a base-128 encoded integer following the tag when the type bits in the tag are set to 0xF.
Also separate the type IDs used in the pack (PackTypeMap) from those presented to the user (PackType). The prior PackType enum exposed implementation details to the user, e.g. pckTypeUnknown.
The functions were named with short integer representations (e.g. I32) but the param structs were using longer ones, e.g. UInt32. Shorten the integer representations in the param structs to match.
Also rename pckReadUInt64Internal() to pckReadU64Internal() for the same reason.
The pg storage must be started before the repo storage to set the max remotes allowed to 2. The protocol helper expects all remotes to have the same type so we are cheating here a bit, but without this ordering the second remote will never be sent an explicit exit and may not save coverage data.
Bug Fixes:
* Fix issues with leftover spool files from a prior restore. (Reviewed by Cynthia Shang, Stefan Fercot, Floris van Nee. Reported by Floris van Nee.)
* Fix issue when checking links for large numbers of tablespaces. (Reviewed by Cynthia Shang, Avinash Vallarapu. Reported by Avinash Vallarapu.)
* Free no longer needed remotes so they do not timeout during restore. (Reviewed by Cynthia Shang. Reported by Francisco Miguel Biete.)
* Fix help when a valid option is invalid for the specified command. (Reviewed by Stefan Fercot. Reported by Cynthia Shang.)
Features:
* Add PostgreSQL 14 support. (Reviewed by Cynthia Shang.)
* Add automatic GCS authentication for GCE instances. (Reviewed by Jan Wieck, Daniel Farina.)
* Add repo-retention-history option to expire backup history. (Contributed by Stefan Fercot. Reviewed by Cynthia Shang, David Steele.)
* Add db-exclude option. (Contributed by Stefan Fercot. Reviewed by Cynthia Shang.)
Improvements:
* Change archive expiration logging from detail to info level. (Contributed by Cynthia Shang. Reviewed by David Steele.)
* Remove stanza archive spool path on restore. (Reviewed by Cynthia Shang, Stefan Fercot.)
* Do not write files atomically or sync paths during backup copy. (Reviewed by Stephen Frost, Stefan Fercot, Cynthia Shang.)
Documentation Improvements:
* Update contributing documentation. (Contributed by Cynthia Shang. Reviewed by David Steele, Stefan Fercot.)
* Consolidate RHEL/CentOS user guide into a single document. (Reviewed by Cynthia Shang.)
* Clarify that repo-s3-role is not an ARN. (Contributed by Isaac Yuen. Reviewed by David Steele.)
HRN_CFG_LOAD() handles the majority of test configuration loads and has various options for special cases.
It was not clear when to use harnessCfgLoadRaw() vs harnessCfgLoad(). Now "raw" functionality is granular and enabled by parameters, e.g. noStd.
Make the macros more consistent in format and make sure that each macro outputs a line number before doing any work so when errors happen it is clear where they happened.
Add noRecurse option to TEST_STORAGE_LIST().
Add comment option to all storage macros.
Instead store the line number in hrnTestLogPrefix() so it doesn't need to be passed to hrnTestResultBegin().
Also add missing linefeed in hrnStorageList().
All instances of storageTest are better represented with storagePg*(), which allows TEST_PATH and TEST_PATH_PG to be omitted.
Also remove some headers which are no longer needed.
The default is to keep all backup history to match the current behavior. In minimal configuration (0 days), unexpired backups are always kept in history.
When a full backup manifest expires, all dependent differential/incremental manifests expire as well.
Run the remote process inside a forked child process instead of exec'ing it. This allows coverage to accumulate in the remote process rather than needing to test the remote protocol functions directly, resulting in better end-to-end testing and less test duplication. Another advantage is that the pgbackrest binary does not need to be built for the test and the test does not need to run in a container.
This allows protocolRemoteExec() to be shimmed, which means the remote can be run as a child of the test process, simplifying coverage testing.
The shim does not need SSH parameters, so also split those out into a separate function and update the tests to match.
Add executable to parameter list to avoid first option being lost. The backup, restore, and verify tests worked OK with their first option being defaulted because it ended up being job-retry which worked fine as the default.
Add hrnProtocolLocalShimUninstall() allow the shim to be uninstalled.
Log shim at debug level to make it obvious in the logs when a shim is in use.
There are no code changes from PostgreSQL 13 so simply add the new version.
Add CATALOG_VERSION_NO_MAX to allow the catalog version to "float" during the PostgreSQL beta/rc period so new pgBackRest versions are not required when the catalog version changes.
Update the integration tests to handle new PostgreSQL startup messages.
manifestLinkCheck() was pretty inefficient so large numbers of links caused it to use a lot of memory and eventually crash. This is a more efficient implementation which runs O(nlogn) and uses far less memory.
Checking for duplicate file links has been added, which represents a change in behavior, but hopefully a good one.
Test and remove WAL segment 000000010000000100000002 in the test where it is created rather than as a byproduct of a much later test.
Remove incorrect local role from test. It worked, but was not the correct command role to be using when calling cmdArchiveGet().
Move some harness headers down to the correct section.
Comment formatting was not used much but it incurred a heavy cost in each macro to process possible formatting.
Remove formatted comments where they did not contain valuable information and replace with strZ(strNewFmt()) otherwise.
A define was already added for TEST_PATH but it was not widely used. Replace all occurrences of testPath() with TEST_PATH in the tests.
Replace testUser() with TEST_USER, testGroup() with TEST_GROUP, testRepoPath() with HRN_PATH_REPO, testDataPath() with HRN_PATH, testProjectExe() with TEST_PROJECT_EXE, and testScale() with TEST_SCALE.
Replace {[path]}, {[user]}, {[group]}, etc. with defines and remove hrnReplaceKey(). This is better than having two ways to deal with replacements.
In some cases the original test*() getters were kept because they are used by the harness, which does not have access to the new defines. Move them to harnessTest.intern.h to indicate that the tests should no longer use them.
Replace all instances of strNew("") with strNew() and use strNewZ() for non-empty zero-terminated strings. Besides saving a useless parameter, this will allow smarter memory allocation in a future commit by signaling intent, in general, to append or not.
In the tests use STRDEF() or VARSTRDEF() where more appropriate rather than blindly replacing with strNewZ(). Also replace strLstAdd() with strLstAddZ() where appropriate for the same reason.
Run the local process inside a forked child process instead of exec'ing it. This allows coverage to accumulate in the local process rather than needing to test the local protocol functions directly, resulting in better end-to-end testing and less test duplication. Another advantage is that the pgbackrest binary does not need to be built for the test.
The backup, restore, and verify command tests have been updated to use the new shim for coverage.
A shim allows a test harness to access static functions and variables in a C module, and also allows functions to be shimmed (i.e. overridden) for the purposes of testing.
For instance, coverage testing works when a process that is normally exec'd is run as a forked child process instead.
getopt_long() requires an exhaustive list of all possible options that may be found on the command line. Because of the way options are indexed (e.g. repo1-4, pg1-8) optionList[] has 827 entries and we have kept it small by curtailing the maximum indexes very severely. Another issue is that getopt_long() scans the array sequentially so parsing gets slower as the index maximums increase.
Replace getopt_long() with a custom implementation that behaves the same but allows options to be parsed with a function instead of using optionList[]. This commit leaves the list in place in order to focus on the getopt_long() replacement, but cfgParseOption() could be replaced with a more efficient implementation that removes the need for optionList[].
This implementation also fixes an issue where invalid options were misreported in the error message if they only had one dash, e.g. -config. This seems to have been some kind of problem in getopt_long(), but no investigation was done since the new implementation fixes it.
Tests were added at 0825428, 2b8d2da, 34dd663, and 384f247 to check that previously untested getopt_long() behavior doesn't change.
This makes the macro useful when subpaths are present.
Identify types other than files (path, link, etc.) with a single appended character for easier debugging.
Remove stanza archive spool path so existing files do not interfere with the new cluster. For instance, old archive-push acknowledgements could cause a new cluster to skip archiving. This should not happen if a new timeline is selected but better to be safe. Missing stanza spool paths are ignored.
Also add new path expression STORAGE_SPOOL_ARCHIVE to easily access this path.
When running on a GCE instance the authentication token can be pulled directly from the instance metadata. This is configured with repo-gcs-key-type=auto.
In a separate commit (26fefa6), move the code that parses the token response into a separate function, storageGcsAuthToken(), since it is now needed by two key types. This drastically improves the readability of the main commit.
When running outside of our standard Vagrantfile the default will not be set correctly, so require the user to set it.
In any case, this option is primarily useful for reporting so note that in the command line help.
Some version interface test functions were integrated into the core code because they relied on the PostgreSQL versioned interface. Even though they were compiled out for production builds they cluttered the core code and made it harder to determine what was required by core.
Create a PostgreSQL version interface in a test harness to contain these functions. This does require some duplication but the cleaner core code seems a good tradeoff. It is possible for some of this code to be auto-generated but since it is only updated once per year the matter is not pressing.
If an ok file (which indicates the WAL segment was not found) is present on the first iteration of the loop then remove it and spawn the async process to retry. This action also resets the queue.
Also error if no response is received from the async process rather than returning not found. PostgreSQL will respond the same either way, but this allows us to determine when something is going wrong with the async process.
Update archiveAsyncStatus() to allow warnings to be suppressed. It is better to retry if no WAL segment was found before warning because the warning might be stale.
If an option name has a space at the beginning then it will be considered an invalid command, but a space at the end is an invalid option. Add tests for these conditions.
Spaces in option arguments should be preserved, so add a test to be sure this is true.
Convert most of the remaining options that benefit from being StringIds. Since all the command modules can include config.h directly it makes sense to auto-generate these values instead of manually creating an enum for each one.
For the time being StringIds are not being auto-generated because the StringId code does not exist in Perl. However, the *_Z zero-terminated constants for each allowed option value are now auto-generated.
The CentOS 7 documentation test relies on PostgreSQL 9.5 which has been removed from the yum.p.o repository package. Switch the test to CentOS 8 to fix the immediate issue, but a decision on the PostgreSQL 9.5 documentation will need to be made before the next release.
The tests worked fine on multiple architectures, but would only run "bare metal", i.e. tests that required containers could not be run.
Enable basic multi-architecture support by allowing containers to be built using whatever architecture the host supports. Also allow cached containers to be defined for multiple architectures in container.yaml.
Add a Dockerfile which can be used as a container for other containers to provide a consistent development environment.
The primary goal is to allow development on Mac M1 but other architectures should find these improvements useful.
Allows removal of backupType()/backupTypeStr() and improves debug logging of the enum.
Move BackupType enum and string constants to info/infoBackup.h so they are available to more modules. Also convert InfoBackup to use BackupType instead of a String.
Centralize the formatting of the configuration value for display to the user or passing on a command line.
For the new functions, if the value was set by the user via the command line, config, etc., then that exact value will be displayed. This makes it easier for the user to recognize the value and saves having to format it into something reasonable, especially for time and size option types.
Note that cfgOptTypeHash and cfgOptTypeList option types are not supported by these functions, but they are generally not displayed to the user as a whole.
This also fixes a bug in config/load.c where time values where not being formatted correctly in an error message.
Use StringIds for the storage types (e.g. STORAGE_S3_TYPE) and configuration settings, e.g. cfgOptS3KeyType.
Also add new config functions and harness config functions to support StringIds.
There is no need to write the file atomically (e.g. via a temp file on Posix) because checksums are tested on resume after a failed backup. The path does not need be synced for each file because all paths are synced at the end of the backup.
This functionality was not lost during the migration -- it never existed in the Perl code, though these settings are used in restore. See 59f1353 where backupFile() was migrated to C.
Fix the segfault when getting help for an internal option is requested by adding help for all internal options that are valid for a default command role.
Also print warnings about internal options in code rather than putting in each command/option description.
The enum truncation observed was due to the value getting passed via a protocol function which silently narrowed the enum.
Even so, add some tests to ensure tested platforms support 64-bit enums.
Although kvAdd() works like kvPut() on the first call, kvPut() is more efficient when a key has a single value.
Update the comment to clarify that kvAdd() is seldom required.
Getting help for a valid option that was invalid for the command would segfault.
Add a check to ensure the option is valid for the command's default role.
This lets the compiler know that these variables are not modified which should lead to better optimization.
Smart compilers should be able to figure this out on their own, but marking parameters const is still good for documentation.
It is often useful to represent identifiers as strings when they cannot easily be represented as an enum/integer, e.g. because they are distributed among a number of unrelated modules or need to be passed to remote processes. Strings are also more helpful in debugging since they can be recognized without cross-referencing the source. However, strings are awkward to work with in C since they cannot be directly used in switch statements leading to less efficient if-else structures.
A StringId encodes a short string into an integer so it can be used in switch statements but may also be readily converted back into a string for debugging purposes. StringIds may also be suitable for matching user input providing the strings are short enough.
This patch includes a sample of StringId usage by converting protocol commands to StringIds. There are many other possible use cases. To list a few:
* All "types" in storage, filters. IO , etc. These types are primarily for identification and debugging so they fit well with this model.
* MemContext names would work well as StringIds since these are entirely for debugging.
* Option values could be represented as StringIds which would mean we could remove the functions that convert strings to enums, e.g. CipherType.
* There are a number of places where enums need to be converted back to strings for logging/debugging purposes. An example is protocolParallelJobToConstZ. If ProtocolParallelJobState were defined as:
typedef enum
{
protocolParallelJobStatePending = STRID5("pend", ...),
protocolParallelJobStateRunning = STRID5("run", ...),
protocolParallelJobStateDone = STRID5("done", ...),
} ProtocolParallelJobState;
then protocolParallelJobToConstZ() could be replaced with strIdToZ(). This also applies to many enums that we don't covert to strings for logging, such as CipherMode.
As an example of usage, convert all protocol commands from strings to StringIds.
Restore excluding the specified databases. Databases excluded will be restored as sparse, zeroed files to save space but still allow PostgreSQL to perform recovery. After recovery, those databases will not be accessible but can be removed with the drop database command. The --db-exclude option can be passed multiple times to specify more than one database to exclude.
When used in combination with the --db-include option, --db-exclude will only apply to standard system databases (template0, template1, and postgres).
This function has not been used since the switch to the fork/exec model.
lockClear() was still used in one test (other than the lock test) so update the test and remove the function.
Both NDEBUG and DEBUG were used in the code, which was a bit confusing.
Define DEBUG in build.auto.c so it is available in all C and header files and stop using NDEBUG. This is preferable to using NDEBUG everywhere since there are multiple DEBUG* defines, e.g. DEBUG_COVERAGE.
Note that NDEBUG is still required since it is used by the C libraries.
Inline functions are more efficient and if they are not used are automatically omitted from the binary.
This also makes the implementation of these functions easier to find and removes the need for a declaration. That is, the complete implementation is located in the header rather than being spread between the header and C file.
OBJECT_DEFINE_MOVE() and OBJECT_DEFINE_FREE() will be replaced with inlines so this would be the only macro left that is constructing functions.
It is not a great pattern anyway since it makes it hard to find the function implementation.
This macro was originally intended to simplify the creation of simple getters but it has been superseded by the pattern introduced in 79a2d02c.
Remove instances of OBJECT_DEFINE_GET() to avoid confusion with the new pattern.
Introduce a standard pattern for exposing public struct members (as documented in CODING.md) and use it to inline lstSize() which should improve the performance of iterating large lists.
Since many functions in these modules are just thin wrappers of other functions, inline where appropriate.
Remove strLstExistsZ() and strLstInsertZ() since they were only used in tests, where the String version of the function is sufficient.
Move strLstNewSplitSizeZ() to command/help/help.c and remove strLstNewSplitSize(). This function has only ever been used by help and does not seem widely applicable.
Bug Fixes:
* Fix option warnings breaking async archive-get/archive-push. (Reviewed by Cynthia Shang. Reported by Lev Kokotov.)
* Fix memory leak in backup during archive copy. (Reviewed by Cynthia Shang. Reported by Christian ROUX, Efremov Egor.)
* Fix stack overflow in cipher passphrase generation. (Reviewed by Cynthia Shang. Reported by bsiara.)
* Fix repo-ls / on S3 repositories. (Reviewed by Cynthia Shang. Reported by Lesovsky Alexey.)
Features:
* Multiple repository support. (Contributed by Cynthia Shang, David Steele. Reviewed by Stefan Fercot, Stephen Frost.)
* GCS support for repository storage. (Reviewed by Cynthia Shang.)
* Add archive-header-check option. (Reviewed by Stephen Frost, Cynthia Shang. Suggested by Hans-Jürgen Schönig.)
Improvements:
* Include recreated system databases during selective restore. (Contributed by Stefan Fercot. Reviewed by Cynthia Shang.)
* Exclude content-length from S3 signed headers. (Reviewed by Cynthia Shang. Suggested by Brian P Bockelman.)
* Consolidate less commonly used repository storage options. (Reviewed by Cynthia Shang.)
* Allow custom config-path default with ./configure --with-configdir. (Contributed by Michael Schout. Reviewed by David Steele.)
* Log archive copy during backup. (Reviewed by Cynthia Shang, Stefan Fercot.)
Documentation Improvements:
* Update reference to include links to user guide examples. (Contributed by Cynthia Shang. Reviewed by David Steele.)
* Update selective restore documentation with caveats. (Reviewed by Cynthia Shang, Stefan Fercot.)
* Add compress-type clarification to archive-copy documentation. (Reviewed by Cynthia Shang, Stefan Fercot.)
* Add compress-level defaults per compress-type value. (Contributed by Cynthia Shang. Reviewed by David Steele.)
* Add note about required NFS settings being the same as PostgreSQL. (Contributed by Cynthia Shang. Reviewed by David Steele.)
hrnReplaceKey() was added to the TEST_ERROR*() macros in 58760486 but some calls to TEST_ERROR*() already used it. This led to the function being called twice on the same buffer which had no effect but valgrind definitely did not like.
Remove extraneous calls to make valgrind happy. Since this is test code there are no implications for production.
The command-example and command-example-list elements were removed from the documentation rendering some time ago so these tags were dead code. The tags, however, contained some examples and information that were pertinent to the command, so where possible, the information was included in the description of the command and/or the user-guide and links to the relevant user guide sections were added.
Note that some commands could not be updated with user guide references since doing so would cause a cyclical reference in the user guide. These commands have an internal comment to indicate this.
In addition, some clarifications were added (e.g. expire --set option) where information was lacking.
Enabled by default, this option checks the WAL header against the PostgreSQL version and system identifier to ensure that the WAL is being copied to the correct stanza. This is in addition to checking pg_control against the stanza and verifying that WAL is being copied from the same PostgreSQL data directory where pg_control is located.
Therefore, disabling this check is fairly safe but should only be done when required, e.g. if the WAL is encrypted.
3b8f0ef missed some cases that could cause archive-push to fail:
* Checking archive info.
* Checking to see if a WAL segment already exists.
These cases are now handled so archive-push can succeed on any valid repos.
This improvement reduces the number of errors thrown; these errors will now be reported as a status for the stanza or repo as appropriate. Invalid option configurations are still thrown but all other errors are caught, formatted and reported. This was necessary for multiple repositories so that the command can complete gathering information from each repository and report the results rather than immediately aborting when an error occurs.
Two new error codes were introduced:
6 = requested backup not found
99 = other, which is used to indicate an error has occurred that requires more details to be provided
A new stanza name of "[invalid]" was created for instances where a stanza was not specified and no stanza can be found.
If there is only one repository configured the error will move up to the stanza level with the standard error formatting of 'error (message)' where the message will be "other" and the details of the error will be listed on the next line(s):
stanza: stanza1
status: error (other)
[CryptoError] unable to load info file '/var/lib/pgbackrest/repo/backup/stanza1/backup.info' or '/var/lib/pgbackrest/repo/backup/stanza1/backup.info.copy':
CryptoError: cipher header invalid
HINT: is or was the repo encrypted?
FileMissingError: unable to open missing file '/var/lib/pgbackrest/repo/backup/stanza1/backup.info.copy' for read
HINT: backup.info cannot be opened and is required to perform a backup.
HINT: has a stanza-create been performed?
HINT: use option --stanza if encryption settings are different for the stanza than the global
cipher: aes-256-cbc
If a backup set is requested but is not found on any repo, a stanza-level status error of 'requested backup not found' is reported when there are no other errors:
pgbackrest info --stanza=demo --set=bogus
stanza: demo
status: error (requested backup not found)
cipher: mixed
repo1: aes-256-cbc
repo2: none
If there are multiple repositories configured and a single repo is in error but the other repos are ok or have a different error:
pgbackrest info --stanza=demo --set=20210322-171211F
stanza: demo
status: mixed
repo1: error
[CryptoError] unable to load info file '/var/lib/pgbackrest/repo/backup/stanza1/backup.info' or '/var/lib/pgbackrest/repo/backup/stanza1/backup.info.copy':
CryptoError: cipher header invalid
HINT: is or was the repo encrypted?
FileMissingError: unable to open missing file '/var/lib/pgbackrest/repo/backup/stanza1/backup.info.copy' for read
HINT: backup.info cannot be opened and is required to perform a backup.
HINT: has a stanza-create been performed?
HINT: use option --stanza if encryption settings are different for the stanza than the global
repo2: ok
cipher: mixed
repo1: aes-256-cbc
repo2: none
db (current)
wal archive min/max (12): 000000010000000000000001/000000010000000000000003
full backup: 20210322-171211F
timestamp start/stop: 2021-03-22 17:12:11 / 2021-03-22 17:12:28
wal start/stop: 000000010000000000000002 / 000000010000000000000002
database size: 23.4MB, database backup size: 23.4MB
repo2: backup set size: 2.8MB, backup size: 2.8MB
database list: postgres (13359)
Json output will include the repository information and any error information. If no stanzas are found, then [invalid] will be set as the name:
[
{
"archive":[],
"backup":[],
"cipher":"none",
"db":[],
"name":"[invalid]",
"repo":[
{
"cipher":"none",
"key":1,
"status":{
"code":99,
"message":"[PathOpenError] unable to list file info for path '/var/lib/pgbackrest/repo2/backup': [13] Permission denied"
}
}
],
"status":{
"code":99,
"lock":{"backup":{"held":false}},
"message":"other"
}
}
]
The content-length header was being signed since it was the only header that didn't need to be and it seemed simpler just to sign it as well. Also, the S3 documentation encourages signing as many headers as possible to avoid tampering.
However, some proxies munge this header causing authentication failure, so skip signing content-length.