Bug Fixes:
* Retry S3 RequestTimeTooSkewed errors instead of immediately terminating. (Reported by sean0101n, Tim Garton, Jesper St John, Aleš Zelený.)
* Fix incorrect handling of transfer-encoding response to HEAD request. (Reported by Pavel Suderevsky.)
* Fix scoping violations exposed by optimizations in gcc 9. (Reported by Christian Lange, Ned T. Crigler.)
Features:
* Add repo-s3-port option for setting a non-standard S3 service port.
Improvements:
* The local command for backup is implemented entirely in C. (Contributed by David Steele, Cynthia Shang.)
* The check command is implemented partly in C. (Reviewed by Cynthia Shang.)
Implement switch WAL and archive check in C but leave the rest in Perl for now.
The main idea was to have some real integration tests for the new database code so the rest of the migration can wait.
Reviewed by Cynthia Shang.
Migrate functionality from the Perl Db module to C. For now this is just enough to implement the WAL switch check.
Add the dbGet() helper function to get Db objects easily.
Create macros in harnessPq to make writing pq scripts easier by grouping commonly used functions together.
Reviewed by Cynthia Shang.
The cause of this error seems to be that a failed request takes so long that a subsequent retry at the http level uses outdated headers.
We're not sure if pgBackRest it to blame here (in one case a kernel downgrade fixed it, in another case an incorrect network driver was the problem) so add retries to hopefully deal with the issue if it is not too persistent. If SSL_write() has long delays before reporting an error then this will obviously affect backup performance.
Reported by sean0101n, Tim Garton, Jesper St John, Aleš Zelený.
Error codes were not being caught for SSL_write() so it was hard to see exactly what was happening in error cases. Report errors to aid in debugging.
Also add a retry for SSL_ERROR_WANT_READ. Even though we have not been able to reproduce this case it is required by SSL_write() so go ahead and implement it.
Multiple PostgreSQL hosts were supported via the host-id option but there are cases where it is useful to be able to directly specify the host id required, e.g. to iterate through pg* hosts when looking for candidate primaries and standbys during backup.
Keep trying to locate the WAL segment until timeout. This is useful for the check and backup commands which must wait for segments to arrive in the archive.
The remotes have their own config options (repo-host-config, etc.) so don't pass the local config* options.
This was a regression from the behavior of the Perl code and while there have been no field reports it caused breakage on test systems with multiple configurations.
Sometimes it is useful to get at the internals of a module that is not being tested for coverage in order to provide coverage for another module that is being tested. The include directive allows this.
Update modules that had previously been added to coverage that only need to be included.
If this option is set then ports appended to repo-s3-endpoint or repo-s3-host will be ignored.
Setting this option explicitly may be the only way to use a bare ipv6 address with S3 (since multiple colons confuse the parser) but we plan to improve this in the future.
This direct interface to libpq allows simple queries to be run against PostgreSQL and supports timeouts.
Testing is performed using a shim that can use scripted responses to test all aspects of the client code. The shim will be very useful for testing backup scenarios on complex topologies.
Reviewed by Cynthia Shang.
The local process is now entirely migrated to C. Since all major I/O operations are performed in the local process, the vast majority of I/O is now performed in C.
Contributed by David Steele, Cynthia Shang.
Add bool, array, and int64 as valid array subtypes.
Pretty print for the array subtype is not correct but is currently not in use (this can be seen at line 328 in typeJsonTest.c).
Discard all data passed to the filter. Useful for calculating size/checksum on a remote system when no data needs to be returned.
Update ioReadDrain() to automatically use the IoSink filter.
The HTTP server can use either content-length or transfer-encoding to indicate that there is content in the response. HEAD requests do not include content but return all the same headers as GET. In the HEAD case we were ignoring content-length but not transfer-encoding which led to unexpected eof errors on AWS S3. Our test server, minio, uses content-length so this was not caught in integration testing.
Ignore all content for HEAD requests (no matter how it is reported) and add a unit test for transfer-encoding to prevent a regression.
Found by Pavel Suderevsky.
This feature denotes storage that can compress files so that they take up less space than what was written. Currently this includes the Posix and CIFS drivers. The stored size of the file will be rechecked after write to determine if the reported size is different. This check would be wasted on object stores such as S3, and they might not report the file as existing immediately after write.
Also add tests to each storage driver to check features.
Previously only a single filter could be pushed to the remote since order was not being maintained. Now the filters are strictly ordered.
Results are returned from the remote and set in the local IoFilterGroup so they can be retrieved.
Expand remote filter support to include all filters.
Read all data from an IoRead object and discard it. This is handy for calculating size, hash, etc. when the output is not needed.
Update code where a loop was used before.
For offline backups the upper bound was being set to 0x0000FFFF0000FFFF rather than UINT64_MAX. This meant that page checksum errors might be ignored for databases with a lot of past WAL in offline mode.
Online mode is not affected since the upper bound is retrieved from pg_start_backup().
Using version 2.15.1 fixed the duplicate tarball problem but broke the auto-generated links. Fix them manually since this should not be a common problem.
Reported by Mohamad El-Rifai.
Files (especially build.auto.h) were being removed and forcing a full build between separate invocations of test.pl.
This affected ad-hoc testing at the command-line, not a full test run in CI.
This analysis never produced anything but false positives (var might be NULL) but took over a minute per test run and added 600MB to the test container.
Since 2.91 JSON::PP has a bias for saving variables that look like numbers as numbers even if they were declared as strings.
Force versions to strings where needed by appending ''.
Update the json-pp-perl package on Ubuntu 18.04 to 2.97 to provide test coverage.
No new Perl code is being developed, so these tools are just taking up time and making migrations to newer platforms harder. There are only a few Perl tests remaining with full coverage so the coverage tool does not warn of loss of coverage in most cases.
Remove both tools and associated libraries.
gcc < 9 makes all compound literals function scope, even though the C spec requires them to be invalid outside the current scope. Since the compiler and valgrind were not enforcing this we had a few violations which caused problems in gcc >= 9.
Even though we are not quite ready to support gcc 9 officially, fix the scoping violations that currently exist in the codebase.
Reported by chrlange, Ned T. Crigler.
ScalityS3 has not received any maintenance in years and is slow to start which is bad for testing. Replace it with minio which starts quickly and ships as a single executable or a tiny container.
Minio has stricter limits on allowable characters but should still provide enough coverage to show that our encoding is working correctly.
This commit also includes the upgrade to openssl 1.1.1 in the Ubuntu 18.04 container.
Some HTTP error tests were failing after the upgrade to openssl 1.1.1, though the rest of the unit and integration tests worked fine. This seemed to be related to the very small messages used in the error testing, but it pointed to an issue with the code not being fully compliant, made worse by auto-retry being enabled by default.
Disable auto-retry and implement better error handling to bring the code in line with openssl recommendations.
There's no evidence this is a problem in the field, but having all the tests pass seems like a good idea and the new code is certainly more robust.
Coverage will be complete in the next commit when openssl 1.1.1 is introduced.