- Add detail to errors when info files are loaded with incorrect encryption settings.
- Throw FileMissingError rather than FileOpenError when both copies of the info file are missing.
- If one file is present (but errors) and the other is missing, then return the error for the file that was present.
Contributed by Cynthia Shang.
Previously chown() would be called even when no ownership changes were required.
In most cases changes are not required and it seems better to perform an extra stat() rather than an extra chown().
Also add unit tests for owner() since there weren't any.
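The stat-before-chown idea can be sketched as follows. This is a minimal illustration, not the project's implementation; the function name `chown_if_needed` is hypothetical.

```python
import os


def chown_if_needed(path, uid, gid):
    """Change ownership only when it differs; return True if chown() ran."""
    # One extra stat() is cheap compared to an unnecessary chown().
    info = os.stat(path)

    if info.st_uid == uid and info.st_gid == gid:
        # Ownership already matches, so skip the chown() call entirely.
        return False

    os.chown(path, uid, gid)
    return True
```

In the common case where ownership already matches, the function returns without modifying the file.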
This got missed in 1f8931f7 when the test binary was renamed.
Also output call graph along with the flat report. The flat report is generally most useful but it doesn't hurt to have both.
The previous error message only showed the last error. In addition, some errors were missed (such as directory permission errors) that could prevent the copy from being checked.
Show both errors below a generic "unable to load" error. Details are now given explaining exactly why the primary and copy failed.
Previously if one file could not be loaded a warning would be output. This has been removed because it is not clear what the user should do in this case. Should they do a stanza-create --force? Maybe the best idea is to automatically repair the corrupt file, but on the other hand that might just spread corruption if pgBackRest makes the wrong choice.
The decryption filter was added in archiveGetFile() and archiveGetCheck() was modified to return the WAL decryption key stored in archive.info. The rest was plumbing.
The mock/archive/1 integration test added encryption to provide coverage for the new code paths while mock/archive/2 dropped encryption to provide coverage for the existing code paths. This caused some churn in the expect logs but there was no change in behavior.
The only change required was to remove the filter that prevented S3 storage from being used. The archive-get command did not require any modification which demonstrates that the storage interface is working as intended.
The mock/archive/3 integration test was modified to run S3 storage locally to provide coverage for the new code paths while mock/stanza/3 was modified to run S3 storage remotely to provide coverage for the existing code paths. This caused some churn in the expect logs but there was no change in behavior.
Test certificates were generated dynamically but there are advantages to using static certificates. For example, it is possible to use the same certificate between container versions. Mostly, it is easier to document the certificates if they are not buried deep in the container code.
The new test certificates are initially intended to be used with the C unit tests but they will eventually be used for integration tests as well.
Two new certificates have been defined. See test/certificate/README.md for details.
The old dynamic certificates will be retained until they are replaced.
Add XmlDocument, XmlNode, and XmlNodeList objects as a thin interface layer on libxml2.
This interface is not intended to be comprehensive. Only a few libxml2 capabilities are exposed but more can be added as needed.
This allows a C unit test to access data in the code repository that might be useful for testing.
Add testRepoPathSet() to set the repository path.
In passing remove extra whitespace in the TEST_RESULT_VOID() macro.
If the last } of a function was marked as uncovered then the context selection would overrun into the next function.
Start checking context on the current line to prevent this. Make the same change for start context even though it doesn't seem to have an issue.
Too few lines were shown for coverage context so show the entire function if it has any missing coverage.
Update colors to work with light and dark browser modes.
The report HTML generated by lcov is overly verbose and cumbersome to navigate. Since we maintain 100% coverage it's far more interesting to look at what is not covered than what is.
The new report presents all missing coverage on a single page and excludes code that is covered for brevity.
Code generation saved files even when they had not changed, which often caused code generation cascades. So, don't save files unless they have changed.
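A minimal sketch of the "save only when changed" approach, assuming generated files are text; the helper name `save_if_changed` is hypothetical and not taken from the code generator.

```python
import os


def save_if_changed(path, content):
    """Write content to path only when it differs; return True if written."""
    if os.path.exists(path):
        with open(path, "r", encoding="utf-8") as handle:
            if handle.read() == content:
                # Unchanged: leave the file (and its modification time) alone
                # so that dependent build steps are not triggered.
                return False

    with open(path, "w", encoding="utf-8") as handle:
        handle.write(content)

    return True
```

Because the modification time is preserved for unchanged output, tools that rebuild based on timestamps see no reason to cascade.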
Use rsync to determine which files have changed since the last test run. The manifest of changed files is saved and not removed until all code generation and builds have completed. If an error occurs the work will be redone on the next run.
The eventual goal is to do all the builds from the test/repo directory created by rsync but for now it is only used to track changes.
Improve on 7794ab50 by including the build flag files directly into the Makefile as dependencies (even though they are not includes). This simplifies some of the rsync logic and allows make to do what it does best.
Also split build flag files into test, harness, and build to reduce rebuilds. Test flags are used to build test.c, harness flags are used to build the rest of the files in the test harness, and build flags are used for the files that are not directly involved in testing.
The contents were already preserved between tests in a single test.pl run but for a separate execution the entire project had to be built from scratch, which was getting slower as we added code.
Save the important build flags in a file so the new execution knows whether the build contents can be reused.
There are a number of cases where a checksum delta is more appropriate than the default time-based delta:
* Timeline has switched since the prior backup
* File timestamp is older than recorded in the prior backup
* File size changed but timestamp did not
* File timestamp is in the future compared to the start of the backup
* Online option has changed since the prior backup
A practical example is that checksum delta will be enabled after a failover to standby due to the timeline switch. In this case, timestamps can't be trusted and our recommendation has been to run a full backup, which can impact the retention schedule and requires manual intervention.
Now, a checksum delta will be performed if the backup type is incr/diff. This means more CPU will be used during the backup but the backup size will be smaller and the retention schedule will not be impacted.
Contributed by Cynthia Shang.
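The conditions listed above could be combined roughly as shown below. This is a sketch under stated assumptions: the `FileInfo` type, the function name, and all parameters are hypothetical, and the real manifest comparison is more involved.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    size: int
    timestamp: int  # seconds since the epoch


def needs_checksum_delta(prior_timeline, current_timeline, prior_online,
                         current_online, backup_start, prior_files,
                         current_files):
    """Return True when timestamps cannot be trusted for an incr/diff backup."""
    # Timeline has switched since the prior backup.
    if prior_timeline != current_timeline:
        return True

    # Online option has changed since the prior backup.
    if prior_online != current_online:
        return True

    for name, current in current_files.items():
        # File timestamp is in the future compared to the backup start.
        if current.timestamp > backup_start:
            return True

        prior = prior_files.get(name)
        if prior is None:
            continue

        # File timestamp is older than recorded in the prior backup.
        if current.timestamp < prior.timestamp:
            return True

        # File size changed but timestamp did not.
        if current.size != prior.size and current.timestamp == prior.timestamp:
            return True

    return False
```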
We were already retrying 500 errors but 503 (rate-limiting) errors were not being retried and would cause an instant failure which aborted the command.
There are only two 5xx errors currently implemented by S3 but instead of adding 503 simply retry all 5xx errors. This is consistent with the HTTP definition of this error class, "the server failed to fulfill an apparently valid request."
Suggested by Craig A. James.
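Retrying the whole 5xx class rather than individual codes might look like the sketch below. The function name, the `send` callable, and the attempt limit are illustrative assumptions; a real client would presumably also wait between attempts.

```python
def send_with_retry(send, max_attempts=3):
    """Call send() until it returns a non-5xx status or attempts run out.

    send is any callable returning an integer HTTP status code.
    """
    status = None

    for _ in range(max_attempts):
        status = send()

        # Any 5xx status is a server-side failure, so it is worth retrying.
        if not 500 <= status <= 599:
            break

    return status
```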
This calculation was missed when the WAL segment size was made dynamic in preparation for PostgreSQL 11.
Fix the calculation by checking the actual WAL file sizes instead of using an estimate based on WAL segment size. This is more accurate because it takes into account .history and .backup files, which are smaller. Since the calculation is done in the async process the additional processing time should not adversely affect performance.
Remove the PG_WAL_SIZE constant and instead use local constants where the old value is still required. This is only the case for some tests and PostgreSQL 8.3 which does not provide a way to get the WAL segment size from pg_control.
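Summing actual file sizes instead of multiplying a file count by the segment size can be sketched as below. This assumes the calculation concerns the total size of files in a single directory; the function name `queue_size` is hypothetical.

```python
import os


def queue_size(path):
    """Return the total size in bytes of the regular files in path."""
    total = 0

    for entry in os.scandir(path):
        # Count real sizes so that small .history and .backup files are not
        # overestimated as full WAL segments.
        if entry.is_file():
            total += entry.stat().st_size

    return total
```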
If an error occurred while acquiring a lock on a remote server the error would be reported correctly, but the queue max detection code was not reached. The tests failed to detect this because they fixed the connection before queue max, allowing the code to be reached.
Move the queue max code before the lock so it will run even when remote connections are not working. This means that no attempt will be made to transfer WAL once queue max has been exceeded, but it makes it much more likely that the code will be reached without error.
Update tests to continue errors up to the point where queue max is exceeded.
Reported by Lardière Sébastien.
The standard npm packages on Ubuntu 18.04 suddenly required libssl1.0 which broke the pgbackrest package builds. Installing nodejs from deb.nodesource.com seems to work fine with standard libssl.
This package is required by ScalityS3 which is used for local S3 testing.
PostgreSQL 11 introduces configurable WAL segment sizes, from 1MB to 1GB.
There are two areas that needed to be updated to support this: building the archive-get queue and checking that WAL has been archived after a backup. Both operations require the WAL segment size to properly build a list.
Checking the archive after a backup is still implemented in Perl and has an active database connection, so just get the WAL segment size from the database.
The archive-get command does not have a connection to the database, so get the WAL segment size from pg_control instead. This requires a deeper inspection of pg_control than has been done in the past, so it seemed best to copy the relevant data structures from each version of PostgreSQL and build a generic interface layer to address them. While this approach is a bit verbose, it has the advantage of being relatively simple, and can easily be updated for new versions of PostgreSQL.
Since the integration tests generate pg_control files for testing, teach Perl how to generate files with the correct offsets for both 32-bit and 64-bit architectures.
Use checksums rather than timestamps to determine if files have changed. This is useful in cases where the timestamps may not be trustworthy, e.g. when performing an incremental after failing over to a standby.
If checksum delta is enabled then checksums will be used for verification of resumed backups, even if they are full. Resumes have always used checksums to verify the files in the repository; enabling delta performs checksums on the database files as well.
Note that the user must manually enable this feature in cases where it would be useful or just keep it enabled all the time. A future commit will address automatically enabling the feature in cases where it seems likely to be useful.
Contributed by Cynthia Shang.
Apparently we never needed to run this function remotely.
It will be needed by the backup checksum delta feature, so implement it now.
Contributed by Cynthia Shang.
This is a workaround for inefficient handling of many setjmps in gcc >= 4.9. Setjmp is used in all error handling, but in the unit tests each test macro contains an error handling block so they add up pretty quickly for large unit tests.
Enabling -ftree-coalesce-vars in affected versions reduces build time and memory requirements by nearly an order of magnitude. Even so, compiles are much slower than gcc <= 4.8.
We submitted a bug for this at: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87316
Which was marked as a duplicate of: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63155
For read-only repositories the Posix and CIFS drivers behave exactly the same. Since that's all we support in C right now it's valid to treat them as the same thing. An assertion has been added to remind us to add the CIFS driver before allowing the repository to be writable.
Mostly we want to make sure that the C code does not blow up when the repository type is CIFS.
Storing the expect log (created by common/harnessLog) in the regular test directory was not ideal. It showed up in tests and made it difficult to clear the test directory between each run.
Move the expect log to a purpose-built directory one level up so it does not interfere with regular testing.
C or Perl coverage tests can now be run on any VM provided a recent enough version of Devel::Cover or lcov is available.
For now, leave u18 as the only VM to run coverage tests due to some issues with older versions of lcov.
% characters caused issues in backup/restore due to filenames being appended directly into a format string.
Reserved XML characters (<>&') caused issues in the S3 driver due to improper escaping.
Add a file with all common special characters to regression testing.
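Both classes of bug can be illustrated with a short sketch. The message text and the `<Key>` element are illustrative assumptions and are not taken from the backup/restore or S3 driver code.

```python
from xml.sax.saxutils import escape


def unsafe_message(file_name, size):
    # BUG: the file name becomes part of the format string, so any % in the
    # name is interpreted as a conversion specifier.
    return ("backup file " + file_name + " (%d bytes)") % size


def safe_message(file_name, size):
    # The file name is passed as an argument and is never parsed as a format.
    return "backup file %s (%d bytes)" % (file_name, size)


def xml_key(file_name):
    # Escape the reserved XML characters before embedding the name in markup.
    extra = {"'": "&apos;", '"': "&quot;"}
    return "<Key>" + escape(file_name, extra) + "</Key>"
```

A file named `100%done` makes `unsafe_message()` raise an exception, while `safe_message()` reproduces the name unchanged.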
By default Valgrind does not exit with an error code when a non-fatal error is detected, e.g. unfreed memory. Use the --error-exitcode option to enable this behavior.
Update some minor issues discovered in the tests as a result. Luckily, no issues were missed in the core code.
The new archive-get C code can't run (yet) when encryption is enabled. Therefore move the encryption tests so we can test the new C code. We'll move it back when encryption is enabled in C.
Also, push one WAL segment with compression to test decompression in the C code.
A return code of 1 from the archive-get was being logged as an error message at info level but otherwise worked correctly.
Also improve info messages when an archive segment is or is not found.
Previously an error would be generated if other files were present and not owned by the PostgreSQL user. This hasn't been a big deal in practice but it could cause issues.
Also add tests to make sure the same logic applies with links to files, i.e. all other files in the directory should be ignored. This was actually working correctly, but there were no tests for it before.
The log-subprocess feature added in 22765670 failed to take into account the naming for remote processes spawned by local processes. Not only was the local command used for the naming of log files but the process id was not passed through. This meant every remote log was named "[stanza]-local-remote-000" which is confusing and meant multiple processes were writing to the same log.
Instead, pass the real command and process id to the remote. This required a minor change in locking to ignore locks if process id is greater than 0 since remotes started by locals never lock.
Relative link paths were being combined with the paths of previous links (relative or absolute) due to the $strPath variable being modified in the current iteration rather than simply being passed to the next level of recursion.
This issue did not affect absolute links and relative tablespace links were caught by other checks, though the error was confusing.
Reported by Cynthia Shang.
Offline operation runs counter to the purpose of this command, which is to check if archiving and backups are working correctly.
Reported by Jason O'Donnell.
Implemented using the same logic as the patches adding this feature to PostgreSQL, 8694cc96 and 920a5e50. Temporary relation exclusion is enabled in PostgreSQL ≥ 9.0. Unlogged relation exclusion is enabled in PostgreSQL ≥ 9.1, where the feature was introduced.
Contributed by Cynthia Shang.
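The exclusion rules can be sketched from PostgreSQL's file naming conventions: temporary relation files are named `t<backend>_<relfilenode>`, and an unlogged relation is identified by the presence of an `_init` fork. The function below is a simplified illustration with hypothetical names that operates on the file names of one database directory and does not model the version checks.

```python
import re

# Temporary relation files, including fsm/vm forks and numbered segments.
TEMP_RELATION = re.compile(r"^t[0-9]+_[0-9]+(_(fsm|vm))?(\.[0-9]+)?$")

# Regular relation files: relfilenode, optional fork, optional segment number.
RELATION_FILE = re.compile(r"^([0-9]+)(_(fsm|vm|init))?(\.[0-9]+)?$")


def excluded_relation_files(file_names):
    """Return the set of file names that can be skipped during backup."""
    # A relation with an init fork is unlogged.
    unlogged = set()
    for name in file_names:
        match = RELATION_FILE.match(name)
        if match and match.group(3) == "init":
            unlogged.add(match.group(1))

    excluded = set()
    for name in file_names:
        if TEMP_RELATION.match(name):
            excluded.add(name)
            continue

        # Skip every fork of an unlogged relation except the init fork,
        # which is needed to reset the relation during recovery.
        match = RELATION_FILE.match(name)
        if match and match.group(1) in unlogged and match.group(3) != "init":
            excluded.add(name)

    return excluded
```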
This allows setting the test log level independently from the general test harness setting, but currently only works for the C tests. It is useful for seeing log output from functions on the console while a test is running.
This is more efficient overall and allows the caller to specify how many bytes will be read on each call. Reads are appended if the buffer already contains data but the buffer size will never increase.
Allow Buffer object "used size" to be different than "allocated size". Add functions to manage used size and remaining size and update automatically when possible.
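The distinction between allocated size and used size can be modeled with the sketch below. It is an analogy only: the class and member names are hypothetical and do not mirror the C Buffer API.

```python
class Buffer:
    """Fixed-size buffer that tracks how many bytes are in use."""

    def __init__(self, size):
        self._data = bytearray(size)  # allocated size never changes
        self.used = 0

    @property
    def size(self):
        return len(self._data)

    @property
    def remains(self):
        return self.size - self.used

    def append(self, chunk):
        """Append as much of chunk as fits; return the bytes accepted."""
        count = min(len(chunk), self.remains)
        self._data[self.used:self.used + count] = chunk[:count]
        self.used += count  # used size is updated automatically
        return count

    def contents(self):
        return bytes(self._data[:self.used])
```

A reader can append into a partially filled buffer and stop once `remains` reaches zero, without the buffer ever growing.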
A regression in v0.82 removed the timestamp comparison when deciding which files from the aborted backup to keep on resume. All resumed backups should be considered inconsistent. A resumed backup can be identified by checking the log for the message "aborted backup of same type exists, will be cleaned to remove invalid files and resumed".
Reported by David Youatt, Yogesh Sharma, Stephen Frost.
S3 (and gateways) always set content-length or transfer-encoding but HTTP 1.1 does not require it and proxies (e.g. HAProxy) may not include either.
Suggested by Adam K. Sumner.
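One way to handle responses that carry neither header is to fall back to reading until the server closes the connection, which HTTP 1.1 permits. The sketch below only shows the decision; the function name and return values are hypothetical, and it is an assumption that the client resolves the case this way.

```python
def body_read_mode(headers):
    """Decide how a response body should be read from its headers."""
    normalized = {name.lower(): value.lower() for name, value in headers.items()}

    if "chunked" in normalized.get("transfer-encoding", ""):
        return "chunked"

    if "content-length" in normalized:
        return "length"

    # Neither header is present (e.g. behind some proxies), so the body ends
    # when the connection is closed.
    return "until-close"
```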
* Build containers from scratch for more accurate testing.
* Allow environment load to be skipped.
* Allow bash wrapping to be skipped.
* Allow forcing a command to run as a user without sudo.
Bug Fixes:
* Fix potential buffer overrun in error message handling. (Reported by Lætitia.)
* Fix archive write lock being taken for the synchronous archive-get command. (Reported by Uspen.)
Improvements:
* Embed exported C functions and Perl modules directly into the pgBackRest executable.
* Use time_t instead of __time_t for better portability. (Suggested by Nick Floersch.)
* Print total runtime in milliseconds at command end.
Low-level functions only include stack trace in test builds while higher-level functions ship with stack trace built-in. Stack traces include all parameters passed to the function but production builds only create the parameter list when the log level is set high enough, i.e. debug or trace depending on the function.
* Allow more than one test to provide coverage for the same module.
* Add option to disable valgrind.
* Add option to disable coverage.
* Add option to disable debug build.
* Add option to disable compiler optimization.
* Add --dev-test mode.
pgBackRest currently has no way to request new credentials so the entire command (e.g. backup, restore) must complete before the credentials expire.
Contributed by Yogesh Sharma.
Many options that were set per test can instead be inferred from the types, i.e. container, c, expect, and individual.
Also finish renaming Perl unit tests with the -perl suffix.
Configuration files are loaded from the directory specified by the --config-include-path option.
Add --config-path option for overriding the default base path of the --config and --config-include-path options.
Contributed by Cynthia Shang.
Mainly this helps with unit tests that need to do log expect testing. Add harnessCfgLoad() test function, which allows a new config to be loaded for unit testing without resetting log functions, opening a log file, or taking locks.
The Perl process was exiting directly when called but that interfered with proper locking for the forked async process. Now Perl returns results to the C process which handles all errors, including signals.
Now only two types of locks can be taken: archive and backup. Most commands use one or the other but the stanza-* commands acquire both locks. This provides better protection than the old command-based locking scheme.
This makes it easier to create objects and then copy them to another context when they are complete without having to worry about freeing them on error. Update List, StringList, and Buffer to allow moves. Update Ini and Storage to take advantage of moves.
Switch from Devel::Cover because it would not report on branch coverage for reports converted from gcov.
Branch coverage is not complete, so for the time being errors will only be generated when statement coverage is not complete. Coverage of unit tests is not displayed in the report unless they are incomplete for either statement or branch coverage.
* Replace remaining NDEBUG blocks with the more granular DEBUG_UNIT.
* Remove some debug memset() calls in MemContext since valgrind is more useful for these checks.
Move command begin to C except when it must be called after another command in Perl (e.g. expire after backup). Command begin logs correctly for complex data types like hash and list. Specify which commands will log to file immediately and set the default log level for log messages that are common to all commands. File logging is initiated from C.
Buffering now takes the pending bytes on the socket into account (when present) rather than relying entirely on select(). In some instances the final bytes would not be flushed until the connection was closed.
The coverage report shows some code as never being run -- but that makes no sense because the tests pass. This may be due to trying to combine the C and Perl coverage reports and overwriting some runs.
Suppress for now with a plan to implement LCOV for the C unit tests.
This provides correct matching in the event there are system-id and db-version duplicates (e.g. after reverting a pg_upgrade).
Fixed by Cynthia Shang.
Reported by Adam K. Sumner.
* Add strCmp*() and strFirst*() to String.
* Add strLstSort() and strLstNewSplitSize() to StringList.
* Add strLstNewSplitZ() to StringList and update calls to strLstNewSplit() as needed.
* Add lstSort to List.
* Add strBeginsWith(), strEndsWith(), strEq(), and strBase().
* Enable compiler type checking for strNewFmt() and strCatFmt().
* Rename strNewSzN() to strNewN().
This allows specific options in pgbackrest.conf to be ignored (and set to default) which reduces the need to write new configuration files for specific needs.
Note that boolean, non-command-line options are already negatable.
When a backup host is present, backups should only be allowed on the backup host and restores should only be allowed on the database host unless an alternate configuration is created that ignores the remote host.
Reported by Lardière Sébastien.
Required to test restores on the backup server, a fairly common scenario.
Improve the restore function to accept optional parameters rather than a long list of parameters. In passing, clean up extraneous use of strType and strComment variables.
When more than one db was specified the path, port, and socket path for db1 were passed no matter which db was actually being addressed.
Reported by Uspen.
If the backup cannot map a group to a name, it stores the group in the manifest as false and then uses either the owner of $PGDATA or, failing that, the group of the current user to set the group during restore. This logic was not working correctly because the selected group was overwriting the user on restore, leaving the group undefined and the user incorrectly set to the group. (Reported by Jeff McCormick.)
The existing static files would not work with 32-bit or big-endian systems so create functions to generate these files dynamically rather than creating a bunch of new static files.
Running coverage testing on multiple distros takes time but doesn't add significant value. Also ensure that the distro designated to run coverage tests is one of the default test distros.
After a stanza-upgrade it should still be possible to restore backups from the previous version and perform recovery with archive-get. However, archive-get only checked the most recent db version/id and failed.
Also clean up some issues when the same db version/id appears multiple times in the history.
Fixed by Cynthia Shang.
Reported by Clinton Adams.
* Exclude contents of pg_snapshots, pg_serial, pg_notify, and pg_dynshmem from backup since they are rebuilt on startup.
* Exclude pg_internal.init files from backup since they are rebuilt on startup.
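The two exclusion rules above can be expressed as a small predicate. This is a sketch assuming paths relative to the data directory with `/` separators; the function name `is_excluded` is hypothetical.

```python
import posixpath

# Directories whose contents are rebuilt on startup. The directories
# themselves are kept; only what is inside them is skipped.
EXCLUDED_CONTENT_DIRS = ("pg_snapshots", "pg_serial", "pg_notify", "pg_dynshmem")


def is_excluded(path):
    """Return True if a path relative to the data directory can be skipped."""
    parts = path.split("/")

    if len(parts) > 1 and parts[0] in EXCLUDED_CONTENT_DIRS:
        return True

    # pg_internal.init can appear in any database directory.
    return posixpath.basename(path) == "pg_internal.init"
```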
The archive_status directory is now recreated on restore to support PostgreSQL 8.3 which does not recreate it automatically like more recent versions do.
Also fixed log checking after PostgreSQL shuts down to include FATAL messages and disallow immediate shutdowns which can throw FATAL errors in the log.
Reported by Stephen Frost.
Modified the info command (both text and JSON output) to display the archive ID and minimum/maximum WAL currently present in the archive for the current and prior, if any, database cluster versions.
Contributed by Cynthia Shang.
The integration tests that were supposed to prevent this regression did not work as intended. They verified the contents of a table in the (supposedly) restored tablespace, deleted the table, and then deleted the tablespace. All of this was deemed sufficient to prove that the tablespace had been restored correctly and was valid.
However, PostgreSQL will happily recreate a tablespace on the basis of a single full-page write, at least in the affected versions. Since writes to the test table were replayed from WAL with each recovery, all the tests passed even though the tablespace was missing after the restore.
The tests have been updated to include direct comparisons against the file system and a new table that is not replayed after a restore because it is created before the backup and never modified again.
Versions ≥ 9.0 were not affected due to numerous synthetic integration tests that verify backups and restores file by file.
* More optimized container suite that greatly improves build time.
* Added static Debian packages for Devel::Cover to reduce build time.
* Add deprecated state for containers. Deprecated containers may only be used to build packages.
* Remove Debian 8 from CI because it does not provide additional coverage over Ubuntu 14.04 and Ubuntu 16.04.
The options accommodate systems where CAs are not automatically found by IO::Socket::SSL, i.e. RHEL7, or to load custom CAs.
Suggested by Scott Frazer.
* Combine hardlink and non/compressed in synthetic tests to reduce test time and improve coverage.
* Change log level of hardlink logging to detail.
* Cast size in S3 manifest to integer.
Refactor storage layer to allow for new repository filesystems using drivers. (Reviewed by Cynthia Shang.)
Refactor IO layer to allow for new compression formats, checksum types, and other capabilities using filters. (Reviewed by Cynthia Shang.)
* Refactor Ini.pm to facilitate testing.
* Complete statement/branch coverage for Ini.pm.
* Improved functions used to test/munge manifest and info files.
* Full coverage is verified when specified.
* Modules marked with partial coverage will error if they are actually fully covered.
* Simplified test representation in DefineTest.
* Added new representation for queries in DefineTest and added API functions.
* Update modules using DefineTest to use new API.
* Fixed an issue where read-only operations that used local worker processes (i.e. restore) were creating write locks that could interfere with parallel archive-push. (Reported by Jens Wilke.)
* Simplify locking scheme. Now, only the master process will hold write locks (archive-push, backup) and not all the local and remote worker processes as before.
The stanza-upgrade command provides a mechanism for upgrading a stanza after upgrading to a new major version of PostgreSQL.
Contributed by Cynthia Shang.
* Automated builds of Debian packages for all supported distributions.
* Added --dev option to aggregate commonly used dev options.
* Added --no-package option to skip package builds.
* C library and packages are built by default. Added -smart option to rebuild only when file changes are detected.
* The --libc-only option has been changed to --build-only now that package builds have been added.
* Documentation can now be built with reusable blocks to reduce duplication.
* Added ability to pass options to containers within the documentation.
* Add proper tag to slightly emphasize proper nouns.
* Allow logging to be suppressed via logDisable() and logEnable().
* Added more flexibility in initializing and cleaning up after modules and tests.
* testResult() suppresses logging and reports exceptions.
* testException() allows messages to be matched with regular expressions.
* Refactor name/locations of common modules that setup test environments.
This option allows pgBackRest to validate page checksums in data files when checksums are enabled on PostgreSQL >= 9.3. Note that this functionality requires a C library which may not initially be available in OS packages. The option will automatically be enabled when the library is present and checksums are enabled on the cluster.
* The options were ignored and did not cause any change in behavior, but it did lead to some confusion. Invalid options will now generate an error.
* Removed erroneous --no-config option in help test module.
* Changed the --no-fork test option to --fork with negation to match all other boolean parameters.
That is, while parsing options. Error codes were still being returned accurately so this would not have made a process look like it succeeded when it did not.
Allow internal symlinks to be suppressed when the repository is located on a filesystem that does not support symlinks. This does not affect any pgBackRest functionality, but the convenience link latest will not be created and neither will internal tablespace symlinks, which will affect the ability to bring up clusters in-place manually using filesystem snapshots.
This regression was introduced in v1.09 and affected efficiency only; all WAL segments were correctly archived in asynchronous mode.
Reported by Stephen Frost.
Bug Fixes:
* Fixed missing variable replacements.
* Removed hard-coded host names from configuration file paths.
Documentation Features:
* Allow command-line length to be configured using cmd-line-len param.
* Added compact param to allow CSS to be embedded in HTML file.
* Added pretty param to produce HTML with proper indenting.
* Only generate HTML menu when required and don't require index page.
* Assign numbers to sections by default.
* VM mount points are now optional.
Controls whether console log messages are sent to stderr or stdout. By default this is set to warn which represents a change in behavior from previous versions, even though it may be more intuitive. Setting log-level-stderr=off will preserve the old behavior.
Suggested by Sascha Biberhofer.
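The routing rule can be sketched as follows, assuming that messages at or above the configured severity go to stderr and everything else goes to stdout. The function name and the level ordering used here are illustrative.

```python
import sys

# Levels ordered from most to least severe; "off" disables stderr output.
LEVELS = ["off", "error", "warn", "info", "detail", "debug", "trace"]


def console_stream(message_level, log_level_stderr="warn"):
    """Return the stream a console message of message_level is written to."""
    if (log_level_stderr != "off"
            and LEVELS.index(message_level) <= LEVELS.index(log_level_stderr)):
        return sys.stderr

    return sys.stdout
```

With the default of warn, error and warn messages go to stderr; with off, every console message goes to stdout as before.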