Now that our tests are more diversified it makes sense to load only the packages that are needed for each test.
Move the package loads from .travis.yaml to test/travis.pl where we have more control over what is loaded.
Three major changes were required to get this working:
1) Provide the path to pgbackrest in the build directory when running outside a container. Tests in a container will continue to install and run against /usr/bin/pgbackrest.
1) Set a per-test lock path so tests don't conflict on the default /tmp/pgbackrest path. Also set a per-test log-path while we are at it.
2) Use localhost instead of a custom host for TLS test connections. Tests in containers will continue to update /etc/hosts and use the custom host.
Add infrastructure and update harnessCfgLoad*() to get the correct exe and paths loaded for testing.
Since new tests are required to verify that running outside a container works, also rework the tests in Travis CI to provide coverage within a reasonable amount of time. Mainly, break up to doc tests by VM and run an abbreviated unit test suite on co6 and co7.
Recovery settings are now written into postgresql.auto.conf instead of recovery.conf. Existing recovery_target* settings will be commented out to help avoid conflicts.
A comment is added before recovery settings to identify them as written by pgBackRest since it is unclear how, in general, old settings will be removed.
recovery.signal and standby.signal are automatically created based on the recovery settings.
Scaling allows the starting values to be increased from the command-line without code changes.
Also suppress valgrind and assertions when running performance testing. Optimization is left at -O0 because we should not be depending on compiler optimizations to make our code performant, and it makes profiling more informative.
For the most part this is a direct migration of the Perl code into C.
There is one important behavioral change with regard to how file permissions are handled. The Perl code tried to set ownership as it was in the manifest even when running as an unprivileged user. This usually just led to errors and frustration.
The C code works like this:
If a restore is run as a non-root user (the typical scenario) then all files restored will belong to the user/group executing pgBackRest. If existing files are not owned by the executing user/group then an error will result if the ownership cannot be updated to the executing user/group. In that case the file ownership will need to be updated by a privileged user before the restore can be retried.
If a restore is run as the root user then pgBackRest will attempt to recreate the ownership recorded in the manifest when the backup was made. Only user/group names are stored in the manifest so the same names must exist on the restore host for this to work. If the user/group name cannot be found locally then the user/group of the PostgreSQL data directory will be used and finally root if the data directory user/group cannot be mapped to a name.
Reviewed by Cynthia Shang.
This warning gives very unpredictable results between compiler versions and seems unrealistic since most of our structs are zeroed for initialization.
This warning has been disabled in the Makefile for a long time.
Broken vendor packages have been causing builds to break due to an error on apt-get update.
Ignore errors and proceed directory to apt-get install. It's possible that we'll try to reference an expired package version and get an error anyway, but that seems better than a guaranteed hard error.
Travis will timeout after 10 minutes with no output. Emit a warning every 5 minutes to keep Travis alive and increase the total timeout to 20 minutes.
Documentation builds have been timing out a lot recently so hopefully this will help.
Bug Fixes:
* Improve slow manifest build for very large quantities of tables/segments. (Reported by Jens Wilke.)
* Fix exclusions for special files. (Reported by CluelessTechnologist, Janis Puris, Rachid Broum.)
Improvements:
* The stanza-create/update/delete commands are implemented entirely in C. (Contributed by Cynthia Shang.)
* The start/stop commands are implemented entirely in C. (Contributed by Cynthia Shang.)
* Create log directories/files with 0750/0640 mode. (Suggested by Damiano Albani.)
Documentation Bug Fixes:
* Fix yum.p.o package being installed when custom package specified. (Reported by Joe Ayers, John Harvey.)
Documentation Improvements:
* Build pgBackRest as an unprivileged user. (Suggested by Laurenz Albe.)
In versions <= 2.15 the old regexp caused any file or directory beginning with . to be ignored during a backup. This has caused behavioral differences in 2.16 because the new C code correctly excludes ./.. directories.
This Perl code is only used for testing now, but it should still match the output of the C functions.
Sometimes it is useful to get at the internals of a module that is not being tested for coverage in order to provide coverage for another module that is being tested. The include directive allows this.
Update modules that had previously been added to coverage that only need to be included.
This direct interface to libpq allows simple queries to be run against PostgreSQL and supports timeouts.
Testing is performed using a shim that can use scripted responses to test all aspects of the client code. The shim will be very useful for testing backup scenarios on complex topologies.
Reviewed by Cynthia Shang.
Files (especially build.auto.h) were being removed and forcing a full build between separate invocations of test.pl.
This affected ad-hoc testing at the command-line, not a full test run in CI.
This analysis never produced anything but false positives (var might be NULL) but took over a minute per test run and added 600MB to the test container.
Since 2.91 JSON::PP has a bias for saving variables that look like numbers as numbers even if they were declared as strings.
Force versions to strings where needed by appending ''.
Update the json-pp-perl package on Ubuntu 18.04 to 2.97 to provide test coverage.
No new Perl code is being developed, so these tools are just taking up time and making migrations to newer platforms harder. There are only a few Perl tests remaining with full coverage so the coverage tool does not warn of loss of coverage in most cases.
Remove both tools and associated libraries.
ScalityS3 has not received any maintenance in years and is slow to start which is bad for testing. Replace it with minio which starts quickly and ships as a single executable or a tiny container.
Minio has stricter limits on allowable characters but should still provide enough coverage to show that our encoding is working correctly.
This commit also includes the upgrade to openssl 1.1.1 in the Ubuntu 18.04 container.
Maintaining the storage layer/drivers in two languages is burdensome. Since the integration tests require the Perl storage layer/drivers we'll need them even after the core code is migrated to C. Create an interface layer so the Perl code can be removed and new storage drivers/features introduced without adding Perl equivalents.
The goal is to move the integration tests to C so this interface will eventually be removed. That being the case, the interface was designed for maximum compatibility to ease the transition. The result looks a bit hacky but we'll improve it as needed until it can be retired.
The tests and documentation have been using the core storage layer but soon that will depend entirely on the C library, creating a bootstrap problem (i.e. the storage layer will be needed to build the C library).
Create a simplified Posix storage layer to be used by documentation and the parts of the test code that build and execute the actual tests. The actual tests will still use the core storage driver so they can interact with any type of storage.
The documentation was relying on a ScalityS3 container built for testing which wasn't very transparent. Instead, use the stock minio container and configure it in the documentation.
Also, install certificates and CA so that TLS verification can be enabled.
This report replaces the lcov report that was generated manually for each release.
The lcov report was overly verbose just to say that we have virtually 100% coverage.
The branch coverage exclusion rules were overly broad and included functions that ended in a capital letter, which disabled all coverage for the statement. Improve matching so that all characters in the name must be upper-case for a match.
Some macros with internal branches accepted parameters that might contain conditionals. This made it impossible to tell which branches belonged to which, and in any case an overzealous exclusion rule was ignoring all branches in such cases. Add the DEBUG_COVERAGE flag to build a modified version of the macros without any internal branches to be used for coverage testing. In most cases, the branches were optimizations (like checking logWill()) that improve production performance but are not needed for testing. In other cases, a parameter needed to be added to the underlying function to handle the branch during coverage testing.
Also tweak the coverage rules so that macros without conditionals are automatically excluded from branch coverage as long as they are not themselves a parameter.
Finally, update tests and code where missing coverage was exposed by these changes. Some code was updated to remove existing coverage exclusions when it was a simple change.
Use autoconf to provide a basic configure script. WITH_BACKTRACE is yet to be migrated to configure and the unit tests still use a custom Makefile.
Each C file must include "build.auto.conf" before all other includes and defines. This is enforced by test.pl for includes, but it won't detect incorrect define ordering.
Update packages to call configure and use standard flags to pass options.
Update RHEL repos that have changed upstream. Remove PostgreSQL 9.3 since the RHEL6/7 packages have disappeared.
Remove PostgreSQL versions from U12 that are still getting minor updates so the container does not need to be rebuilt.
LZ4 is included for future development, but this seems like a good time to add it to the containers.
Bug Fixes:
* Fix zero-length reads causing problems for IO filters that did not expect them. (Reported by brunre01, jwpit, Tomasz Kontusz, guruguruguru.)
* Fix reliability of error reporting from local/remote processes.
* Fix Posix/CIFS error messages reporting the wrong filename on write/sync/close.
The test harness was not being built with warnings which caused some wackiness with an improperly structured switch. Just use the same warnings as the code being tested.
Also enable warnings on code that is not directly being tested since other code modules are frequently modified during testing.
This amends 70c30dfb which disabled test tracing in general.
Instead, only enable test tracing by default for modules that are being unit tested. This saves lots of time but still ensures that test tracing is working and helps with debugging in unit tests.
Also rename the option to --debug-test-trace for a clarity.
Detailed stack traces for low-level functions (e.g. strCat, bufMove) can be very useful for debugging but leaving them on for all tests has become quite burdensome in terms of time. Complex operations like generating JSON on a large KevValue can lead to timeouts even with generous values.
Add a new param, --debug-trace, to enable test-level stack trace, but leave it off by default.
Bug Fixes:
* Remove request for S3 object info directly after putting it. (Reported by Matt Kunkel.)
* Correct archive-get-queue-max to be size type. (Reported by Ronan Dunklau.)
* Add error message when current user uid/gid does not map to a name. (Reported by Camilo Aguilar.)
* Error when --target-action=shutdown specified for PostgreSQL < 9.5.
Improvements:
* Set TCP keepalives on S3 connections. (Suggested by Ronan Dunklau.)
* Reorder info command text output so most recent backup is output last. (Contributed by Cynthia Shang. Suggested by Ryan Lambert.)
* Change file ownership only when required.
* Redact authentication header when throwing S3 errors. (Suggested by Brad Nicholson.)
This got missed in 1f8931f7 when the test binary was renamed.
Also output call graph along with the flat report. The flat report is generally most useful but it doesn't hurt to have both.
Test certificates were generated dynamically but there are advantages to using static certificates. For example, it possible to use the same certificate between container versions. Mostly, it is easier to document the certificates if they are not buried deep in the container code.
The new test certificates are initially intended to be used with the C unit tests but they will eventually be used for integration tests as well.
Two new certificates have been defined. See test/certificate/README.md for details.
The old dynamic certificates will be retained until they are replaced.
Add XmlDocument, XmlNode, and XmlNodeList objects as a thin interface layer on libxml2.
This interface is not intended to be comprehensive. Only a few libxml2 capabilities are exposed but more can be added as needed.
This allows a C unit test to access data in the code repository that might be useful for testing.
Add testRepoPathSet() to set the repository path.
In passing remove extra whitespace in the TEST_RESULT_VOID() macro.
If the last } of a function was marked as uncovered then the context selection would overrun into the next function.
Start checking context on the current line to prevent this. Make the same change for start context even though it doesn't seem to have an issue.
Too few lines were shown for coverage context so show the entire function if it has any missing coverage.
Update colors to work with light and dark browser modes.
The report HTML generated by lcov is overly verbose and cumbersome to navigate. Since we maintain 100% coverage it's far more interesting to look at what is not covered than what is.
The new report presents all missing coverage on a single page and excludes code that is covered for brevity.
Code generation saved files even when they had not changed, which often caused code generation cascades. So, don't save files unless they have changed.
Use rsync to determine which files have changed since the last test run. The manifest of changed files is saved and not removed until all code generation and builds have completed. If an error occurs the work will be redone on the next run.
The eventual goal is to do all the builds from the test/repo directory created by rsync but for now it is only used to track changes.
Improve on 7794ab50 by including the build flag files directly into the Makefile as dependencies (even though they are not includes). This simplifies some of the rsync logic and allows make to do what it does best.
Also split build flag files into test, harness, and build to reduce rebuilds. Test flags are used to build test.c, harness flags are used to build the rest of the files in the test harness, and build flags are used for the files that are not directly involved in testing.
The contents were already preserved between tests in a single test.pl run but for a separate execution the entire project had to be built from scratch, which was getting slower as we added code.
Save the important build flags in a file so the new execution knows whether the build contents can be reused.
The standard npm packages on Ubuntu 18.04 suddenly required libssl1.0 which broke the pgbackrest package builds. Installing nodejs from deb.nodesource.com seems to work fine with standard libssl.
This package is required by ScalityS3 which is used for local S3 testing.
This is a workaround for inefficient handling of many setjmps in gcc >= 4.9. Setjmp is used in all error handling, but in the unit tests each test macro contains an error handling block so they add up pretty quickly for large unit tests.
Enabling -ftree-coalesce-vars in affected versions reduces build time and memory requirements by nearly an order of magnitude. Even so, compiles are much slower than gcc <= 4.8.
We submitted a bug for this at: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87316
Which was marked as a duplicate of: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63155
Storing the expect log (created by common/harnessLog) in the regular test directory was not ideal. It showed up in tests and made it difficult to clear the test directory between each run.
Move the expect log to a purpose-built directory one level up so it does not interfere with regular testing.
C or Perl coverage tests can now be run on any VM provided a recent enough version of Devel::Cover or lcov is available.
For now, leave u18 as the only VM to run coverage tests due to some issues with older versions of lcov.
By default Valgrind does not exit with an error code when a non-fatal error is detected, e.g. unfreed memory. Use the --error-exitcode option to enabled this behavior.
Update some minor issues discovered in the tests as a result. Luckily, no issues were missed in the core code.
This allows setting the test log level independently from the general test harness setting, but current only works for the C tests. It is useful for seeing log output from functions on the console while a test is running.
This is more efficient overall and allows the caller to specify how many bytes will be read on each call. Reads are appended if the buffer already contains data but the buffer size will never increase.
Allow Buffer object "used size" to be different than "allocated size". Add functions to manage used size and remaining size and update automatically when possible.
* Build containers from scratch for more accurate testing.
* Allow environment load to be skipped.
* Allow bash wrapping to be skipped.
* Allow forcing a command to run as a user without sudo.
Bug Fixes:
* Fix potential buffer overrun in error message handling. (Reported by Lætitia.)
* Fix archive write lock being taken for the synchronous archive-get command. (Reported by Uspen.)
Improvements:
* Embed exported C functions and Perl modules directly into the pgBackRest executable.
* Use time_t instead of __time_t for better portability. (Suggested by Nick Floersch.)
* Print total runtime in milliseconds at command end.
Low-level functions only include stack trace in test builds while higher-level functions ship with stack trace built-in. Stack traces include all parameters passed to the function but production builds only create the parameter list when the log level is set high enough, i.e. debug or trace depending on the function.
* Allow more than one test to provide coverage for the same module.
* Add option to disable valgrind.
* Add option to disabled coverage.
* Add option to disable debug build.
* Add option to disable compiler optimization.
* Add --dev-test mode.
Many options that were set per test can instead be inferred from the types, i.e. container, c, expect, and individual.
Also finish renaming Perl unit tests with the -perl suffix.
The Perl process was exiting directly when called but that interfered with proper locking for the forked async process. Now Perl returns results to the C process which handles all errors, including signals.
This makes it easier to create objects and then copy them to another context when they are complete without having to worry about freeing them on error. Update List, StringList, and Buffer to allow moves. Update Ini and Storage to take advantage of moves.