The local process is now entirely migrated to C. Since all major I/O operations are performed in the local process, the vast majority of I/O is now performed in C.
Contributed by David Steele, Cynthia Shang.
Discard all data passed to the filter. Useful for calculating size/checksum on a remote system when no data needs to be returned.
Update ioReadDrain() to automatically use the IoSink filter.
No new Perl code is being developed, so these tools are just taking up time and making migrations to newer platforms harder. There are only a few Perl tests remaining with full coverage so the coverage tool does not warn of loss of coverage in most cases.
Remove both tools and associated libraries.
Some HTTP error tests were failing after the upgrade to openssl 1.1.1, though the rest of the unit and integration tests worked fine. This seemed to be related to the very small messages used in the error testing, but it pointed to an issue with the code not being fully compliant, made worse by auto-retry being enabled by default.
Disable auto-retry and implement better error handling to bring the code in line with openssl recommendations.
There's no evidence this is a problem in the field, but having all the tests pass seems like a good idea and the new code is certainly more robust.
Coverage will be complete in the next commit when openssl 1.1.1 is introduced.
Maintaining the storage layer/drivers in two languages is burdensome. Since the integration tests require the Perl storage layer/drivers we'll need them even after the core code is migrated to C. Create an interface layer so the Perl code can be removed and new storage drivers/features introduced without adding Perl equivalents.
The goal is to move the integration tests to C so this interface will eventually be removed. That being the case, the interface was designed for maximum compatibility to ease the transition. The result looks a bit hacky but we'll improve it as needed until it can be retired.
Amend commit 434cd832 to error when the db history in archive.info and backup.info do not match.
The Perl code would attempt to reconcile the history by matching on system id and version but we are not planning to migrate that code to C. It's possible that there are users with mismatches but if so they should have been getting errors from info for the last six months. It's easy enough to manually fix these files if there are any mismatches in the field.
Contributed by Cynthia Shang.
If the file is compressible (i.e. not encrypted or already compressed) it can be marked as such in storageNewRead()/storageNewWrite(). If the file is being read from/written to a remote it will be compressed in transit using gzip.
Simplify filter group handling by having the IoRead/IoWrite objects create the filter group automatically. This removes the need for a lot of NULL checking and has a negligible effect on performance since a filter group needs to be created eventually unless the source file is missing.
Allow filters to be created using a VariantList so filter parameters can be passed to the remote.
This implementation duplicates the functionality of the Perl code but does so with different logic and includes full unit tests.
Along the way at least one bug was fixed, see issue #748.
Contributed by Cynthia Shang.
This filter exactly mimics the behavior of the Perl filter so is a drop-in replacement.
The filter is not integrated yet since it requires the Perl-to-C storage layer interface coming in a future commit.
This cache manages multiple http clients and returns one to the caller that is not busy. It is the responsibility of the caller to indicate when they are done with a client. If returnContent is set then the client will automatically be marked done.
Also add special handing for HEAD requests to recognize that content-length is informational only and no content is expected.
Allows listing repo paths/files from the command-line, to be used primarily for testing and debugging.
This command is internal-only so the interface may change at any time without notice.
Not all storage types support paths as a physical thing that must be created/destroyed. Add a feature to determine which drivers use paths and simplify the driver API as much as possible given that knowledge and by implementing as much path logic as possible in the Storage object.
Remove the ignoreMissing parameter from pathSync() since it is not used and makes little sense.
Create a standard list of error messages for the drivers to use and apply them where the code was modified -- there is plenty of work still to be done here.
This is just the part of restore run by the local helper processes, not the entire command.
Even so, various optimizations in the code (like pipelining and optimizations for zero-length files) should make the restore command faster on object stores.
Most of the *Free() functions are pretty generic so add macros to make creating them as easy as possible.
Create a distinction between *Free() functions that the caller uses to free memory and callbacks that free third-party resources. There are a number of cases where a driver needs to free resources but does not need a normal *Free() because it is handled by the interface.
Add common/object.h for macros that make object maintenance easier. This pattern can also be used for many more object functions.
Remove "File" and "Driver" from object names so they are shorter and easier to keep consistent.
Also remove the "driver" directory so storage implementations are visible directly under "storage".
The function pointer casting used when creating drivers made changing interfaces difficult and led to slightly divergent driver implementations. Unit testing caught production-level errors but there were a lot of small issues and the process was harder than it should have been.
Use void pointers instead so that no casts are required. Introduce the THIS_VOID and THIS() macros to make dealing with void pointers a little safer.
Since we don't want to expose void pointers in header files, driver functions have been removed from the headers and the various driver objects return their interface type. This cuts down on accessor methods and the vast majority of those functions were not being used. Move functions that are still required to .intern.h.
Remove the special "C" crypto functions that were used in libc and instead use the standard interface.
The function provides all the file/path/link information required to build a backup manifest.
Also update storageInfo() to provide the same information for a single file.
At the same time change the way that load constructors work (and are named) so that Ini objects do not persist after the constructors complete.
infoArchiveSave() is excluded from this commit since it is just a trivial call to infoPgSave() and won't be required soon.
In most cases the JSON type is known so this is more efficient than converting to Variant first, both in terms of memory and time.
Also rename some of the existing functions for consistency.
Removed the "anchor" parameter because it was never used in any calls in the Perl code so it was just a dead parameter that always defaulted to true.
Contributed by Cynthia Shang.
This was not an intentional feature in Perl, but it works, so it makes sense to implement the same syntax in C.
This is a break from other places where a -port option is explicitly supplied, so it may make sense to support both styles going forward. This commit does not address that, however.
Reported by Kyle Nevins.
This new implementation should behave exactly like the old Perl code with the exception of updated log messages.
Remove as much of the Perl code as possible without breaking other commands.
We deal with some pretty big lists in archive-push so a nested-loop anti-join looked like it would not be efficient enough.
This merge anti-join should do the trick even though both lists must be sorted first.
Now that repositories are writable the storage drivers that don't yet support file writes need to be updated to do so.
Note that the part size for multi-part upload has not been defined as a proper constant. This will become an option in the near future so it doesn't seem worth creating a constant that we might then forget to remove.
Logging was being enable on local/remote processes even if --log-subprocess was not specified, so fix that.
Also, make sure that stderr is enabled at error level as it was on Perl. This helps expose error information for debugging.
For remotes, suppress log and lock paths since these are not applicable on remote hosts. These options should be set in the local config if they need to be overridden.
This driver borrows heavily from the Posix driver.
At this point the only difference is that CIFS does not allow explicit directory fsyncs so they need to be suppressed. At some point the CIFS diver will also omit link support.
With the addition of this driver repository storage is now writable.
The same test configurations are run on all four test VMs, which seems a real waste of resources.
Vary the tests per VM to increase coverage while reducing the total number of tests. Be sure to include each major feature (remote, s3, encryption) in each VM at least once.
The same test configurations are run on all four test VMs, which seems a real waste of resources.
Vary the tests per VM to increase coverage while reducing the total number of tests.
The same test configurations are run on all four test VMs, which seems a real waste of resources.
Vary the tests per VM to increase coverage while reducing the total number of tests. Be sure to include each major feature (remote, s3, encryption) in each VM at least once.
This new implementation should behave exactly like the old Perl code with the exception of a few updated log messages.
Remove as much of the Perl code as possible without breaking other commands.
The C local is only used for C commands in the main process.
Some tweaking of the existing protocolGet() command was required. Originally the idea was to share the function for local and remote requests but the differences (as in Perl) were too great to make that practical.
The same test configurations are run on all four test VMs, which seems a real waste of resources.
Vary the tests per VM to increase coverage while reducing the total number of tests. Be sure to include each major feature (remote, s3, encryption) in each VM at least once.
The same test configurations are run on all four test VMs, which seems a real waste of resources.
Vary the tests per VM to increase coverage while reducing the total number of tests. Be sure to include each major feature (remote, s3, encryption) in each VM at least once.
Prior to this the Perl remote was used to satisfy C requests. This worked fine but since the remote needed to be migrated to C anyway there was no reason to wait.
Add the ProtocolServer object and tweak ProtocolClient to work with it. It was also necessary to add a mechanism to get option values from the remote so that encryption settings could be read and used in the storage object.
Update the remote storage objects to comply with the protocol changes and add the storage protocol handler.
Ideally this commit would have been broken up into smaller chunks but there are cross-dependencies in the protocol layer and it didn't seem worth the extra effort.
Rename FUNCTION_DEBUG_* macros to FUNCTION_LOG_* to more accurately reflect what they do. Further rename FUNCTION_DEBUG_RESULT* macros to FUNCTION_LOG_RETURN* to make it clearer that they return from the function as well as logging. Leave FUNCTION_TEST_* macros as they are.
Consolidate the various ASSERT* macros into a single ASSERT macro that is always compiled out of production builds. It was difficult to figure out when an assert would be checked with all the different types in play. When ASSERTs are compiled in they will always be checked regardless of the log level -- tying these two concepts together was not a good idea.
The C info code has already been committed but this commit wires it into main.
Also remove the info Perl code and tests since they are no longer called.
This is a partial implementation of remote storage with just enough functionality to get the info command working. The client is written in C but the server is still in Perl, which limits progress until a C server is written.
This is a complete protocol client implementation in C.
Currently there is no C server implementation so the C client is talking to a Perl server. This won't work very long, though, as the protocol format, even though in JSON, has a lot of language-specific structure. While it would be possible to maintain compatibility between C and Perl it's probably not worth the effort in the long run.
Just as in Perl there are helper functions to make constructing protocol objects easier. Currently only repository remotes are supported.
Executes a child process and allows the calling process to communicate with it using read/write io.
This object is specially tailored to implement the protocol layer and may or may not be generally applicable to general purpose
execution.
Parameters for the local/remote commands are based on parameters that are passed to the current command.
Generate parameters for the new command based on the intersection of parameters between the current command and the command to be executed.
General i/o objects for reading and writing file descriptors, in particular those that can block. In other words, these are not generally to be used with file descriptors for actual files, but rather pipes, sockets, etc.
The info command will only be executed in C if the repository is local, i.e. not located on a remote repository host. S3 is considered "local" in this case.
This is a direct migration from Perl to integrate as seamlessly with the remaining Perl code as possible. It should not be possible to determine if the C version is running unless debug-level logging is enabled.
Contributed by Cynthia Shang.
The infoBackup object is the counterpart to the infoArchive object which encapsulates the archive.info file.
Currently the object is read-only, i.e. it is not possible to create a new or modify an existing backup.info file.
There a number of constants that will also be used in the infoManifest object so go ahead and create a module to contain them so they don't need to be moved later.
Contributed by Cynthia Shang.
Previously chown() would be called even when no ownership changes were required.
In most cases changes are not required and it seems better to perform an extra stat() rather than an extra chown().
Also add unit tests for owner() since there weren't any.
This allows CipherBlock to be used as a filter in an IoFilterGroup. The C-style functions used by Perl are now deprecated and should not be used for any new code.
Also add functions to convert between cipher names and CipherType.
Add boolean and one-dimensional list types to jsonToKv().
Add varToJson() and kvToJson() to convert Variants and KeyValues to JSON.
Contributed by Cynthia Shang.
A robust HTTP client with pipelining support and automatic retries.
Using a single object to make multiple requests is more efficient because requests are pipelined whenever possible. Requests are automatically retried when the connection has been closed by the server. Any 5xx response is also retried.
Only the HTTPS protocol is currently supported.
A simple, secure TLS client intended to allow access to services that are exposed via HTTPS. We call it TLS instead of SSL because SSL methods are disabled so only TLS connections are allowed.
This object is intended to be used for multiple TLS connections against a service so tlsClientOpen() can be called each time a new connection is needed. By default, an open connection will be reused for pipelining so the user must be prepared to retry their transaction on a read/write error if the server closes the connection before it can be reused. If this behavior is not desirable then tlsClientClose() may be used to ensure that the next call to tlsClientOpen() will create a new TLS session.
Note that tlsClientRead() is non-blocking unless there are *zero* bytes to be read from the session in which case it will raise an error after the defined timeout. In any case the tlsClientRead()/tlsClientWrite()/tlsClientEof() functions should not generally be called directly. Instead use the read/write interfaces available from tlsClientIoRead()/tlsClientIoWrite().
Add XmlDocument, XmlNode, and XmlNodeList objects as a thin interface layer on libxml2.
This interface is not intended to be comprehensive. Only a few libxml2 capabilities are exposed but more can be added as needed.
There are many places (and the number is growing) where a zero-terminated string constant must be transformed into a String object to be usable. This pattern wastes time and memory, especially since the created string is generally used in a read-only fashion.
Define macros to create constant String objects that are initialized at compile time rather than at run time.
The storageList() command accepts a regular expression as a filter. This works fine for local filesystems where it is relatively cheap to get a complete list of files and filter them in code. However, for remote filesystems like S3 it can be expensive to fetch a complete list of files only to discard the bulk of them locally.
S3 does not filter on regular expressions but it can accept a static prefix so this function extracts a prefix from a regular expression when possible.
Even a few characters can drastically reduce the amount of data that must be fetched remotely so the function does not try to be too clever. It requires a ^ anchor and stops scanning when the first special character is found.
Improve on 7794ab50 by including the build flag files directly into the Makefile as dependencies (even though they are not includes). This simplifies some of the rsync logic and allows make to do what it does best.
Also split build flag files into test, harness, and build to reduce rebuilds. Test flags are used to build test.c, harness flags are used to build the rest of the files in the test harness, and build flags are used for the files that are not directly involved in testing.
There are a number of cases where a checksum delta is more appropriate than the default time-based delta:
* Timeline has switched since the prior backup
* File timestamp is older than recorded in the prior backup
* File size changed but timestamp did not
* File timestamp is in the future compared to the start of the backup
* Online option has changed since the prior backup
A practical example is that checksum delta will be enabled after a failover to standby due to the timeline switch. In this case, timestamps can't be trusted and our recommendation has been to run a full backup, which can impact the retention schedule and requires manual intervention.
Now, a checksum delta will be performed if the backup type is incr/diff. This means more CPU will be used during the backup but the backup size will be smaller and the retention schedule will not be impacted.
Contributed by Cynthia Shang.
The C code was warning on failure and continuing but the Perl logging code was never updated with the same feature.
Rather than add the feature to Perl, just disable file logging if the log file cannot be opened. Log files are always opened by C first, so this will eliminate the error in Perl.
Reported by vthriller.
This test has been flapping since 9b9396c7. It seems to be some kind of timing issue since all integration tests pass and this unit passes on all other VMs. It only happens on Travis and is not reproducible in any development environment that we have tried.
For now, disable the test since the constant flapping is causing major delays in testing and quite a bit of time has been spent trying to identify the root cause. We are actively developing these tests and hope the issue will be identified during the course of normal development.
A number of improvements were made to the tests while searching for this issue. While none of them helped, it makes sense to keep the improvements.
There doesn't seem to be any need to implement this as a filter since current use cases (S3 authentication) work on small datasets.
So, use the single function method provided by OpenSSL for simplicity.
PostgreSQL 11 introduces configurable WAL segment sizes, from 1MB to 1GB.
There are two areas that needed to be updated to support this: building the archive-get queue and checking that WAL has been archived after a backup. Both operations require the WAL segment size to properly build a list.
Checking the archive after a backup is still implemented in Perl and has an active database connection, so just get the WAL segment size from the database.
The archive-get command does not have a connection to the database, so get the WAL segment size from pg_control instead. This requires a deeper inspection of pg_control than has been done in the past, so it seemed best to copy the relevant data structures from each version of PostgreSQL and build a generic interface layer to address them. While this approach is a bit verbose, it has the advantage of being relatively simple, and can easily be updated for new versions of PostgreSQL.
Since the integration tests generate pg_control files for testing, teach Perl how to generate files with the correct offsets for both 32-bit and 64-bit architectures.
Use checksums rather than timestamps to determine if files have changed. This is useful in cases where the timestamps may not be trustworthy, e.g. when performing an incremental after failing over to a standby.
If checksum delta is enabled then checksums will be used for verification of resumed backups, even if they are full. Resumes have always used checksums to verify the files in the repository, enabling delta performs checksums on the database files as well.
Note that the user must manually enable this feature in cases were it would be useful or just keep in enabled all the time. A future commit will address automatically enabling the feature in cases where it seems likely to be useful.
Contributed by Cynthia Shang.
As we add storage drivers it's important to keep the tests for each completely separate. Rather than have three tests for each driver, standardize on having a single test unit for each driver.
These are separated the same way in the Perl code where the remote storage driver is located in the Protocol module. However, in the C code the intention is to implement the remote storage driver as a regular driver in the storage layer rather than making a special case out of it.
So, merge the storage helpers. This also has the benefit of making the code a bit simpler.
Also separate storageSpool() and storageSpoolWrite() to make it clearer which operations require write access and to maintain consistency with the other storage helper functions.
The posix driver was developed over time and the naming is not very consistent.
Rename the files and functions to work well with other drivers and generally favor longer names since the driver functions are seldom (eventually never) used outside the driver itself.
Previously, debug log functions had to handle NULLs and truncate output to the available buffer size. This was verbose for both coding and testing.
Instead, create a function/macro combination that allows log functions to return a simple String object. The wrapper function takes care of the memory context, handles NULLs, and truncates the log string based on the available buffer size.
The archive-get command will only be executed in C if the repository is local, unencrypted, and type posix or cifs. Admittedly a limited use case, but this is just the first step in migrating the archive-get command entirely into C.
This is a direct migration from the Perl code (including messages) to integrate as seamlessly with the remaining Perl code as possible. It should not be possible to determine if the C version is running unless debug-level logging is enabled.
Basic functions to detect the presence of stanza or all stop files and error when they are present.
The functionality to detect stop files without error was not migrated. This functionality is only used by stanza-delete and will be migrated with that command.
Implement rules for generating paths within the archive part of the repository. Add a helper function, storageRepo(), to create the repository storage based on configuration settings.
The repository storage helper is located in the protocol module because it will support remote file systems in the future, just as the Perl version does.
Also, improve the existing helper functions a bit using string functions that were not available when they were written.
Use JSON code now that it is available and remove temporary hacks used to get things working initially.
Use passed storage objects rather than using storageLocal(). All storage objects in C are still local but this won't always be the case.
Also, move Postgres version conversion functions to postgres/info.c since they have no dependency on the info objects and will likely be useful elsewhere.
common/harnessLog was not ideally suited for general testing and made all the tests quite awkward. Instead, move all code used to test the common/log module into the logTest module and repurpose common/harnessLog to do log expect testing for all other tests in a cleaner way.
Add a few exceptions for config testing since the log levels are reset by default in config/parse.
Low-level functions only include stack trace in test builds while higher-level functions ship with stack trace built-in. Stack traces include all parameters passed to the function but production builds only create the parameter list when the log level is set high enough, i.e. debug or trace depending on the function.
Many options that were set per test can instead be inferred from the types, i.e. container, c, expect, and individual.
Also finish renaming Perl unit tests with the -perl suffix.
* Add storageCopy(), storageMove(), and storagePathSync().
* Separate StorageFile object into separate read and write objects.
* Abstract out Posix file read/write objects.
Configuration files are loaded from the directory specified by the --config-include-path option.
Add --config-path option for overriding the default base path of the --config and --config-include-path option.
Contributed by Cynthia Shang.
The Perl process was exiting directly when called but that interfered with proper locking for the forked async process. Now Perl returns results to the C process which handles all errors, including signals.
Now only two types of locks can be taken: archive and backup. Most commands use one or the other but the stanza-* commands acquire both locks. This provides better protection than the old command-based locking scheme.
This implementation should be faster because it does not stat each file. It simply assumes that most directory entries are files so attempts an unlink() first. If the entry is reported by error codes to be a directory then it attempts an rmdir().