Maintaining the storage layer/drivers in two languages is burdensome. Since the integration tests require the Perl storage layer/drivers we'll need them even after the core code is migrated to C. Create an interface layer so the Perl code can be removed and new storage drivers/features introduced without adding Perl equivalents.
The goal is to move the integration tests to C so this interface will eventually be removed. That being the case, the interface was designed for maximum compatibility to ease the transition. The result looks a bit hacky but we'll improve it as needed until it can be retired.
Bug Fixes:
* Fix archive retention expiring too aggressively. (Fixed by Cynthia Shang. Reported by Mohamad El-Rifai.)
Improvements:
* The expire command is implemented entirely in C. (Contributed by Cynthia Shang.)
* The local command for restore is implemented entirely in C.
* Remove hard-coded PostgreSQL user so $PGUSER works. (Suggested by Julian Zhang, Janis Puris.)
* Honor configure --prefix option. (Suggested by Daniel Westermann.)
* Rename repo-s3-verify-ssl option to repo-s3-verify-tls. The new name is preferred because pgBackRest does not support any SSL protocol versions (they are all considered to be insecure). The old name will continue to be accepted.
Documentation Improvements:
* Add FAQ to the documentation. (Contributed by Cynthia Shang.)
* Use wal_level=replica in the documentation for PostgreSQL ≥ 9.6. (Suggested by Patrick McLaughlin.)
Secure options could show up in the help as "current". While the user must have permissions to see the source of the options (e.g. environment, config file) it's still not a good idea to display them in an unexpected context.
Instead show secure options as <redacted> in the help command.
Amend commit 434cd832 to error when the db history in archive.info and backup.info do not match.
The Perl code would attempt to reconcile the history by matching on system id and version but we are not planning to migrate that code to C. It's possible that there are users with mismatches but if so they should have been getting errors from info for the last six months. It's easy enough to manually fix these files if there are any mismatches in the field.
Contributed by Cynthia Shang.
If the file is compressible (i.e. not encrypted or already compressed) it can be marked as such in storageNewRead()/storageNewWrite(). If the file is being read from/written to a remote it will be compressed in transit using gzip.
Simplify filter group handling by having the IoRead/IoWrite objects create the filter group automatically. This removes the need for a lot of NULL checking and has a negligible effect on performance since a filter group needs to be created eventually unless the source file is missing.
Allow filters to be created using a VariantList so filter parameters can be passed to the remote.
This implementation duplicates the functionality of the Perl code but does so with different logic and includes full unit tests.
Along the way at least one bug was fixed, see issue #748.
Contributed by Cynthia Shang.
The tests and documentation have been using the core storage layer but soon that will depend entirely on the C library, creating a bootstrap problem (i.e. the storage layer will be needed to build the C library).
Create a simplified Posix storage layer to be used by documentation and the parts of the test code that build and execute the actual tests. The actual tests will still use the core storage driver so they can interact with any type of storage.
This filter exactly mimics the behavior of the Perl filter so is a drop-in replacement.
The filter is not integrated yet since it requires the Perl-to-C storage layer interface coming in a future commit.
These names more accurately reflect what the functions do and follow the convention started in Info and InfoPg.
Also remove the ignoreMissing parameter since it was never used.
Contributed by Cynthia Shang.
Some filters (e.g. encryption and compression) produce output even if there is no input. Since the filter group was marked as "done" initially, processing would not run when there was zero input and that resulted in zero output.
All filters start not done so start the filter group the same way.
The prior method of tailing the docker log no longer seems reliable. Instead, keep retrying the make bucket command until it works and show the error if it times out.
This allows copying from one S3 object to another. We generally try to avoid doing this but there are a few cases where it is needed and the tests do it quite a bit.
One thing to look out for here is that reads require the http client to be explicitly released by calling httpClientDone(). This means than clients could grow if they are not released properly. The http statistics will hopefully alert us if this is happening.
This cache manages multiple http clients and returns one to the caller that is not busy. It is the responsibility of the caller to indicate when they are done with a client. If returnContent is set then the client will automatically be marked done.
Also add special handing for HEAD requests to recognize that content-length is informational only and no content is expected.
This was not enforced at parse time because repo1-cipher-type could be passed on the command-line even in cases where encryption was not needed by the subprocess.
Filter repo-cipher-type so it is never passed on the command line. If the subprocess does not have access to the passphrase then knowing the encryption type is useless anyway.
Filter groups could not be manipulated once they had been assigned to an IO object. Now they can be freely manipulated up to the time the IO object is opened.
Also, move the filter group into the IO object's context so they don't need to be tracked separately.
The previous implementation searched for the file in a list which worked but was not optimal. For arbitrary bucket structures it would also produce a false negative if a match was not found in the first 1000 entries. This was not an issue for our repo structure since the max hits on exists calls is two but it seems worth fixing to avoid future complications.
The C code was passing the host (if specified) with the request which could force the server into path-style URLs, which are not supported.
Instead, use the Perl logic of always passing bucket.endpoint in the request no matter what host is used for the HTTPS connection.
It's an open question whether we should support path-style URLs but since we don't it's useless to tell the server otherwise. Note that Amazon S3 has deprecated path-style URLs and they are no longer supported on newly created buckets.
This was being removed by rsync which forced a full build even when a partial should have been fine. Rewrite the file after the rsync so it is preserved.
Allow commands to be skipped by default in the command help but still work if help is requested for the command directly. There may be other uses for the flag in the future.
Update help for ls now that it is exposed.
Allows listing repo paths/files from the command-line, to be used primarily for testing and debugging.
This command is internal-only so the interface may change at any time without notice.
The documentation was relying on a ScalityS3 container built for testing which wasn't very transparent. Instead, use the stock minio container and configure it in the documentation.
Also, install certificates and CA so that TLS verification can be enabled.
Not all storage types support paths as a physical thing that must be created/destroyed. Add a feature to determine which drivers use paths and simplify the driver API as much as possible given that knowledge and by implementing as much path logic as possible in the Storage object.
Remove the ignoreMissing parameter from pathSync() since it is not used and makes little sense.
Create a standard list of error messages for the drivers to use and apply them where the code was modified -- there is plenty of work still to be done here.
These functions are not required for repository storage so make them optional and error if they are not implemented for non-repository storage, .e.g. pg or spool.
The goal is to simplify the drivers (e.g. S3) that are intended only for repository storage.
The prior behavior was to return NULL so the caller would know the path was missing, but this is rarely useful, complicates the calling code, and increases the chance of segfaults.
The .nullOnMissing param has been added to enable the prior behavior.
The release notes are generally a direct reflection of the git log. So, ease the burden of maintaining the release notes by using the git log to determine what needs to be added.
Currently only non-dev items are required to be matched to a git commit but the goal is to account for all commits.
The git history cache is generated from the git log but can be modified to correct typos and match the release notes as they evolve. The commit hash is used to identify commits that have already been added to the cache.
There's plenty more to do here. For instance, links to the commits for each release item should be added to the release notes.
The C code is designed to be efficient rather than deterministic at the debug log level. As we move more testing from integration to unit tests it makes less sense to try and maintain the expect logs at this log level.
Most of the expect logs have already been moved to detail level but mock/all still had tests at debug level. Change the logging defaults in the config file and remove as many references to log-level-console as possible.
/ is escaped in the spec but the Perl renderer we use does not escape it which leads to checksum mismatches between the two sets of code.
This particular escape seems to be a more recent addition to the spec and is targeted toward embedding JSON in JavaScript.
\/ is still allowed when parsing JSON.