The TLS server is an alternative to using SSH for protocol connections to remote hosts.
This command is currently experimental and intended only for trial and testing. As such, the new commands and options will not show up in the command-line help unless directly requested.
3b8f0ef missed some cases that could cause archive-push to fail:
* Checking archive info.
* Checking to see if a WAL segment already exists.
These cases are now handled so archive-push can succeed on any valid repos.
Repositories will be searched in order for the requested archive file.
Errors will be reported as warnings as long as a valid copy of the archive file is found.
Multi-repository implementations for the archive-push, check, info, stanza-create, stanza-upgrade, and stanza-delete commands.
Multi-repo configuration is disabled so there should be no behavioral changes between these commands and their current single-repo implementations.
Multi-repo documentation and integration tests are still in the multi-repo development branch. All unit tests work as multi-repo since they are able to bypass the configuration restrictions.
Check that archive files exist in the main process instead of the local process. This means that the archive.info file only needs to be loaded once per execution rather than once per file to get.
Stop looking when a file is missing or in error. PostgreSQL will never request anything past the missing file so there is no point in getting them. This also reduces "unable to find" logging in the async process.
Cache results of storageList() when looking for multiple files to reduce storage I/O.
Look for all requested archive files in the archive-id where the first file is found. They may not all be there, but this reduces the number of list calls. If subsequent files are in another archive id they will be found on the next archive-get call.
Append "asynchronously" to messages when the async process fetched the file (not in the actual async process log, though).
Add "repo1" to make it clear what archive we are talking about. This is not very useful by itself but soon we'll be able to add the archive id, which is very useful.
Add constants for messages that are used multiple times to ensure they stay consistent.
These options specify the number of local worker job retries and the retry interval after one immediate retry.
There is some value in allowing retries to be specified by the user but for the most part these options are for suppressing retries during testing, which can save a lot of time. The bug introduced in d1d25c7 and fixed in 8b86d5e also suggests it is better not to use retries in tests.
Remove the default delayed retries for archive-get/archive-push, leaving only the immediate retry. These commands are retried by PostgreSQL so it doesn't make sense to do too many retries internally.
These options are currently internal.
No timeout is expected here but the small timeout prevents errors from being thrown.
This is not a bug since the error would be thrown on the next archive-get call but it does make the tests harder to debug when there is an error.
It is not clear why there was a timeout here at all. It is likely cruft from a prior test or a copy/paste error.
Improve locking on remote processes by introducing an exec-id that is unique to the main process and passed to all remote processes. This allows the remote processes to determine if a lock is held by a remote from the same main process. If so, the lock is allowed.
The exec-id is also useful for associating remote logs with main logs for debugging purposes.
This aligns better with general PostgreSQL usage and our own documentation (updated in 4bcef702).
Usage in the backup.manifest tests has not been updated since it might break the file format.
Rather than bS3 use strStorage which can indicate more than two storage types.
For the moment there are still only two storage types but this change is required before more can be added.
LZ4 compresses data faster than gzip but at a lower ratio. This can be a good tradeoff in certain scenarios.
Note that setting compress-type=lz4 will make new backups and archive incompatible (unrestorable) with prior versions of pgBackRest.
Add compress-type option and deprecate compress option. Since the compress option is boolean it won't work with multiple compression types. Add logic to cfgLoadUpdateOption() to update compress-type if it is not set directly. The compress option should no longer be referenced outside the cfgLoadUpdateOption() function.
Add common/compress/helper module to contain interface functions that work with multiple compression types. Code outside this module should no longer call specific compression drivers, though it may be OK to reference a specific compression type using the new interface (e.g., saving backup history files in gz format).
Unit tests only test compression using the gz format because other formats may not be available in all builds. It is the job of integration tests to exercise all compression types.
Additional compression types will be added in future commits.
These commands (e.g. restore, archive-get) never used the compress options but allowed them to be passed on the command line. Now they will error when these options are passed on the command line. If these errors occur then remove the unused options.
Set log-level-file=off when more that one test will run. In this case is it impossible to see the logs anyway since they will be automatically cleaned up after the test. This improves performance pretty dramatically since trace-level logging is expensive. If a singe integration test is run then log-level-file is trace by default but can be changed with the --log-level-test-file option.
Reduce buffer-size to 64k to save memory during testing and allow more processes to run in parallel.
Update log replacement rules so that these options can change without affecting expect logs.
Info files required three copies in memory to be loaded (the original string, an ini representation, and the final info object). Not only was this memory inefficient but the Ini object does sequential scans when searching for keys making large files very slow to load.
This has not been an issue since archive.info and backup.info are very small, but it becomes a big deal when loading manifests with hundreds of thousands of files.
Instead of holding copies of the data in memory, use a callback to deliver the ini data directly to the object when loading. Use a similar method for save to avoid having an intermediate copy. Save is a bit complex because sections/keys must be written in alpha order or older versions of pgBackRest will not calculate the correct checksum.
Also move the load retry logic to helper functions rather than embedding it in the Info object. This allows for more flexibility in loading and ensures that stack traces will be available when developing unit tests.
Reviewed by Cynthia Shang.
Not all storage types support paths as a physical thing that must be created/destroyed. Add a feature to determine which drivers use paths and simplify the driver API as much as possible given that knowledge and by implementing as much path logic as possible in the Storage object.
Remove the ignoreMissing parameter from pathSync() since it is not used and makes little sense.
Create a standard list of error messages for the drivers to use and apply them where the code was modified -- there is plenty of work still to be done here.
The C code is designed to be efficient rather than deterministic at the debug log level. As we move more testing from integration to unit tests it makes less sense to try and maintain the expect logs at this log level.
Most of the expect logs have already been moved to detail level but mock/all still had tests at debug level. Change the logging defaults in the config file and remove as many references to log-level-console as possible.
This new implementation should behave exactly like the old Perl code with the exception of updated log messages.
Remove as much of the Perl code as possible without breaking other commands.
The same test configurations are run on all four test VMs, which seems a real waste of resources.
Vary the tests per VM to increase coverage while reducing the total number of tests. Be sure to include each major feature (remote, s3, encryption) in each VM at least once.
The expect tests were originally a rough-and-ready type of unit test so monitoring changes in the expect log helped us detect changes in behavior.
Now the archive code is heavily unit-tested so the detailed logs mainly cause churn and don't have any measurable benefit.
Reduce the log level to DETAIL to make the logs less verbose and volatile, yet still check user-facing log messages.
This was not being caught because the integration tests for S3 were running remotely and going through the Perl code rather than the new C code.
Implement the exists method for the S3 driver and add tests to prevent a regression.
Reported by mibiio.
- Add detail to errors when info files are loaded with incorrect encryption settings.
- Throw FileMissingError rather than FileOpenError when both copies of the info file are missing.
- If one file is present (but errors) and the other is missing, then return the error for the file that was present.
Contributed by Cynthia Shang.
The previous error message only showed the last error. In addition, some errors were missed (such as directory permission errors) that could prevent the copy from being checked.
Show both errors below a generic "unable to load" error. Details are now given explaining exactly why the primary and copy failed.
Previously if one file could not be loaded a warning would be output. This has been removed because it is not clear what the user should do in this case. Should they do a stanza-create --force? Maybe the best idea is to automatically repair the corrupt file, but on the other hand that might just spread corruption if pgBackRest makes the wrong choice.
The decryption filter was added in archiveGetFile() and archiveGetCheck() was modified to return the WAL decryption key stored in archive.info. The rest was plumbing.
The mock/archive/1 integration test added encryption to provide coverage for the new code paths while mock/archive/2 dropped encryption to provide coverage for the existing code paths. This caused some churn in the expect logs but there was no change in behavior.
TlsClient introduced a non-blocking read which is required to read protocol messages that are linefeed-terminated rather than a known size. However, in many cases the expected number of bytes is known in advance so in that case it is more efficient to have tlsClientRead() block until all the bytes are read.
Add block parameter to all read functions and use it when a blocking read is required. For most read functions this is a noop, i.e. if the read function never blocks then it can ignore the parameter.
In passing, set the log level of storageNew*() functions to debug to expose more high-level I/O operations.
Allow buffers to report a lower size than their allocated size. This means a larger buffer can be used to do the work of a smaller buffer without having to create a new buffer and concatenate.
This is useful for blocking I/O where the buffer may be too large for the amount of data that is available to read.
The InfoPg object was partially modified in 960ad732 to place the current history item in position 0, but infoPgDataCurrent() didn't get updated correctly.
Remove this->indexCurrent and make the current position always equal 0. Use the new lstInsert() function when adding new history items via infoPgAdd(), but continue to use lstAdd() when loading from a file for efficiency.
This does not appear to be a live bug because infoPgDataCurrent() and infoPgAdd() are not yet used in any production code. The archive-get command is the only C code using InfoPG and it always looks at the entire list of items rather than just the current item.
Suggested by Cynthia Shang.
PostgreSQL 11 introduces configurable WAL segment sizes, from 1MB to 1GB.
There are two areas that needed to be updated to support this: building the archive-get queue and checking that WAL has been archived after a backup. Both operations require the WAL segment size to properly build a list.
Checking the archive after a backup is still implemented in Perl and has an active database connection, so just get the WAL segment size from the database.
The archive-get command does not have a connection to the database, so get the WAL segment size from pg_control instead. This requires a deeper inspection of pg_control than has been done in the past, so it seemed best to copy the relevant data structures from each version of PostgreSQL and build a generic interface layer to address them. While this approach is a bit verbose, it has the advantage of being relatively simple, and can easily be updated for new versions of PostgreSQL.
Since the integration tests generate pg_control files for testing, teach Perl how to generate files with the correct offsets for both 32-bit and 64-bit architectures.
Apparently we never needed to run this function remotely.
It will be needed by the backup checksum delta feature, so implement it now.
Contributed by Cynthia Shang.
For read-only repositories the Posix and CIFS drivers behave exactly the same. Since that's all we support in C right now it's valid to treat them as the same thing. An assertion has been added to remind us to add the CIFS driver before allowing the repository to be writable.
Mostly we want to make sure that the C code does not blow up when the repository type is CIFS.
The external storage interfaces (Storage, StorageFileRead, etc.) have been stable for a while, but internally they were calling the posix driver functions directly.
Create driver interfaces for storage, fileRead, and fileWrite and remove all references to the posix driver outside storage/driver/posix (with the exception of a direct call to pathRemove() in Perl LibC).
Posix is still the only available driver so more adjustment may be needed, but this should represent the bulk of the changes.
Previously, debug log functions had to handle NULLs and truncate output to the available buffer size. This was verbose for both coding and testing.
Instead, create a function/macro combination that allows log functions to return a simple String object. The wrapper function takes care of the memory context, handles NULLs, and truncates the log string based on the available buffer size.
The archive-get command will only be executed in C if the repository is local, unencrypted, and type posix or cifs. Admittedly a limited use case, but this is just the first step in migrating the archive-get command entirely into C.
This is a direct migration from the Perl code (including messages) to integrate as seamlessly with the remaining Perl code as possible. It should not be possible to determine if the C version is running unless debug-level logging is enabled.
The info messages were spread around and logged differently based on the execution path and in some cases logged nothing at all.
Temporarily track the async server status with a flag so that info messages are not output in the async process. The async process will be refactored as a separate command to be exec'd in a future commit.
The new archive-get C code can't run (yet) when encryption is enabled. Therefore move the encryption tests so we can test the new C code. We'll move it back when encryption is enabled in C.
Also, push one WAL segment with compression to test decompression in the C code.
A return code of 1 from the archive-get was being logged as an error message at info level but otherwise worked correctly.
Also improve info messages when an archive segment is or is not found.
Low-level functions only include stack trace in test builds while higher-level functions ship with stack trace built-in. Stack traces include all parameters passed to the function but production builds only create the parameter list when the log level is set high enough, i.e. debug or trace depending on the function.