Test points will not be available in the C code so update these tests as best as possible without using them.
This represents a loss of coverage for the Perl code (soon to be removed) which will be made up in the C code with unit tests.
These tests require test points which are not being implemented in the C code.
This functionality is fully tested in the command/control unit tests so integration tests are no longer required.
This expression determines which files contain page checksums but it was also including the directory above the relation directories. In a real PostgreSQL installation this not a problem because these directories don't contain any files.
However, our tests place a file in `base` which the Perl code thought should have page checksums while the new C code says no.
Update the expression to document the change and avoid churn in the expect logs later.
The protocol timeout tests have been superceded by unit tests.
The TEST_BACKUP_RESUME test point was incorrectly included into a number of tests, probably a copy pasto. It didn't hurt anything but it did add 200ms to each test where it appeared.
Catalog and control version tests were redundant. The database version and system id tests covered the important code paths and the C code gets these values from a lookup table.
Finally, fix an incomplete update to the backup.info file while munging for tests.
It wasn't practical for the main process to be ignorant of the remote path, and in any case knowing the path makes debugging easier.
Pull the remote path when connecting and pass the result of local storagePath() to the remote when making calls.
Previously, options were being filtered based on what was currently valid. For chained commands (e.g. backup then expire) some options may be valid for the first command but not the second.
Filter based on the command definition rather than what is currently valid to avoid logging options that are not valid for subsequent commands. This reduces the number of options logged and will hopefully help avoid confusion and expect log churn.
This user was created before we tested in containers to ensure isolation between the pg and repo hosts which were then just directories. The downside is that this resulted in a lot of sudos to set the pgbackrest user and to remove files which did not belong to the main test user.
Containers provide isolation without needing separate users so we can now safely remove the pgbackrest user. This allows us to remove most sudos, except where they are explicitly needed in tests.
While we're at it, remove the code that installed the Perl C library (which also required sudo) and simply add the build path to @INC instead.
Note that building the manifest on each host has been temporarily removed.
This feature will likely be brought back as a non-default option (after the manifest code has been fully migrated to C) since it can be fairly expensive.
Recovery settings are now written into postgresql.auto.conf instead of recovery.conf. Existing recovery_target* settings will be commented out to help avoid conflicts.
A comment is added before recovery settings to identify them as written by pgBackRest since it is unclear how, in general, old settings will be removed.
recovery.signal and standby.signal are automatically created based on the recovery settings.
PostgreSQL 12 will shutdown in these cases which seems to be the correct action (according to the documentation) when hot_standby = off, but older versions are promoting instead. Set target_action explicitly so all versions will behave the same way.
This does beg the question of whether the PostgreSQL 12 behavior is wrong (though it matches the docs) or the previous versions are.
This restore type automatically adds standby_mode=on to recovery.conf.
This could be accomplished previously by setting --recovery-option=standby_mode=on but PostgreSQL 12 requires standby mode to be enabled by a special file named standby.signal.
The new restore type allows us to maintain a common interface between PostgreSQL versions.
For the most part this is a direct migration of the Perl code into C.
There is one important behavioral change with regard to how file permissions are handled. The Perl code tried to set ownership as it was in the manifest even when running as an unprivileged user. This usually just led to errors and frustration.
The C code works like this:
If a restore is run as a non-root user (the typical scenario) then all files restored will belong to the user/group executing pgBackRest. If existing files are not owned by the executing user/group then an error will result if the ownership cannot be updated to the executing user/group. In that case the file ownership will need to be updated by a privileged user before the restore can be retried.
If a restore is run as the root user then pgBackRest will attempt to recreate the ownership recorded in the manifest when the backup was made. Only user/group names are stored in the manifest so the same names must exist on the restore host for this to work. If the user/group name cannot be found locally then the user/group of the PostgreSQL data directory will be used and finally root if the data directory user/group cannot be mapped to a name.
Reviewed by Cynthia Shang.
Info files required three copies in memory to be loaded (the original string, an ini representation, and the final info object). Not only was this memory inefficient but the Ini object does sequential scans when searching for keys making large files very slow to load.
This has not been an issue since archive.info and backup.info are very small, but it becomes a big deal when loading manifests with hundreds of thousands of files.
Instead of holding copies of the data in memory, use a callback to deliver the ini data directly to the object when loading. Use a similar method for save to avoid having an intermediate copy. Save is a bit complex because sections/keys must be written in alpha order or older versions of pgBackRest will not calculate the correct checksum.
Also move the load retry logic to helper functions rather than embedding it in the Info object. This allows for more flexibility in loading and ensures that stack traces will be available when developing unit tests.
Reviewed by Cynthia Shang.
Prior to 2.16 the Perl manifest code would skip any file that began with a dot. This was not intentional but it allowed PostgreSQL socket files to be located in the data directory. The new C code in 2.16 did not have this unintentional exclusion so socket files in the data directory caused errors.
Worse, the file type error was being thrown before the exclusion check so there was really no way around the issue except to move the socket files out of the data directory.
Special file types (e.g. socket, pipe) will now be automatically skipped and a warning logged to notify the user of the exclusion. The warning can be suppressed with an explicit --exclude.
Reported by CluelessTechnologist, Janis Puris, Rachid Broum.
Putting the checksum at the beginning of the file made it impossible to stream the file out when saving. The entire file had to be held in memory while it was checksummed so the checksum could be written at the beginning.
Instead place the checksum at the end. This does not break the existing Perl or C code since the read is not order dependent.
There are no plans to improve the Perl code to take advantage of this change, but it will make the C implementation more efficient.
Reviewed by Cynthia Shang.
Logging stayed in the backup log until the Perl code started. Fix this so it logs to the correct file and will still work after the Perl code is removed.
For offline backups the upper bound was being set to 0x0000FFFF0000FFFF rather than UINT64_MAX. This meant that page checksum errors might be ignored for databases with a lot of past WAL in offline mode.
Online mode is not affected since the upper bound is retrieved from pg_start_backup().
ScalityS3 has not received any maintenance in years and is slow to start which is bad for testing. Replace it with minio which starts quickly and ships as a single executable or a tiny container.
Minio has stricter limits on allowable characters but should still provide enough coverage to show that our encoding is working correctly.
This commit also includes the upgrade to openssl 1.1.1 in the Ubuntu 18.04 container.
Maintaining the storage layer/drivers in two languages is burdensome. Since the integration tests require the Perl storage layer/drivers we'll need them even after the core code is migrated to C. Create an interface layer so the Perl code can be removed and new storage drivers/features introduced without adding Perl equivalents.
The goal is to move the integration tests to C so this interface will eventually be removed. That being the case, the interface was designed for maximum compatibility to ease the transition. The result looks a bit hacky but we'll improve it as needed until it can be retired.
Amend commit 434cd832 to error when the db history in archive.info and backup.info do not match.
The Perl code would attempt to reconcile the history by matching on system id and version but we are not planning to migrate that code to C. It's possible that there are users with mismatches but if so they should have been getting errors from info for the last six months. It's easy enough to manually fix these files if there are any mismatches in the field.
Contributed by Cynthia Shang.
This implementation duplicates the functionality of the Perl code but does so with different logic and includes full unit tests.
Along the way at least one bug was fixed, see issue #748.
Contributed by Cynthia Shang.
Not all storage types support paths as a physical thing that must be created/destroyed. Add a feature to determine which drivers use paths and simplify the driver API as much as possible given that knowledge and by implementing as much path logic as possible in the Storage object.
Remove the ignoreMissing parameter from pathSync() since it is not used and makes little sense.
Create a standard list of error messages for the drivers to use and apply them where the code was modified -- there is plenty of work still to be done here.
The C code is designed to be efficient rather than deterministic at the debug log level. As we move more testing from integration to unit tests it makes less sense to try and maintain the expect logs at this log level.
Most of the expect logs have already been moved to detail level but mock/all still had tests at debug level. Change the logging defaults in the config file and remove as many references to log-level-console as possible.
The new name is preferred because pgBackRest does not support any SSL protocol versions (they are all considered to be insecure).
The old name will continue to be accepted.
Releasing the lock too early was allowing other async processes to sneak in and start running before the current process was completely shut down.
The only symptom seems to have been mixed up log messages so not a very serious issue.
This new implementation should behave exactly like the old Perl code with the exception of updated log messages.
Remove as much of the Perl code as possible without breaking other commands.
This condition was not being properly checked for in the C code and it caused problems in the info command, at the very least.
Instead of applying a local fix, introduce a new path option type that will rigorously check the format of any incoming paths.
Reported by Marc Cousin.
Add the buffer-size, compress-level, compress-level-network, and process-max options to the backup:option section in backup.manifest to aid in debugging.
It may also make sense to propagate these options up to backup.info so they can be displayed in the info command, but for now this is deemed sufficient.
Contributed by blogh.
The same test configurations are run on all four test VMs, which seems a real waste of resources.
Vary the tests per VM to increase coverage while reducing the total number of tests. Be sure to include each major feature (remote, s3, encryption) in each VM at least once.
The same test configurations are run on all four test VMs, which seems a real waste of resources.
Vary the tests per VM to increase coverage while reducing the total number of tests. Be sure to include each major feature (remote, s3, encryption) in each VM at least once.
The command option was not being set correctly when a remote was started from a local. It was being set as 'local' rather than the command that the local was running as.
Also automatically select the remote protocol id based on whether it is started from a local (use the local protocol id) or from the main process (use 0).
These were not live issues but could cause strange behaviors as new features are added that might be hard to diagnose.
The same test configurations are run on all four test VMs, which seems a real waste of resources.
Vary the tests per VM to increase coverage while reducing the total number of tests. Be sure to include each major feature (remote, s3, encryption) in each VM at least once.
The expect tests were originally a rough-and-ready type of unit test so monitoring changes in the expect log helped us detect changes in behavior.
Now the stanza code is heavily unit-tested so the detailed logs mainly cause churn and don't have any measurable benefit.
Reduce the log level to DETAIL to make the logs less verbose and volatile, yet still check user-facing log messages.
The same test configurations are run on all four test VMs, which seems a real waste of resources.
Vary the tests per VM to increase coverage while reducing the total number of tests. Be sure to include each major feature (remote, s3, encryption) in each VM at least once.
The expect tests were originally a rough-and-ready type of unit test so monitoring changes in the expect log helped us detect changes in behavior.
Now the archive code is heavily unit-tested so the detailed logs mainly cause churn and don't have any measurable benefit.
Reduce the log level to DETAIL to make the logs less verbose and volatile, yet still check user-facing log messages.
Expressions such as <REPO:ARCHIVE> require a stanza name in order to be resolved correctly. However, if the stanza name is passed to the remote then that remote will only work correctly for that one stanza.
Instead, resolved the expressions locally but still pass a relative path to the remote. That way, a storage path that is only configured on the remote does not need to be known locally.
This command was previously forked off from the archive-get command which required a bit of artificial option and log manipulation.
A separate command is easier to test and will work on platforms that don't have fork(), e.g. Windows.
This was not being caught because the integration tests for S3 were running remotely and going through the Perl code rather than the new C code.
Implement the exists method for the S3 driver and add tests to prevent a regression.
Reported by mibiio.
The C info code has already been committed but this commit wires it into main.
Also remove the info Perl code and tests since they are no longer called.
Replace the repository path with just "the repository". The path is not important in this context and it is clearer to state where the stanzas are missing from.
- Add detail to errors when info files are loaded with incorrect encryption settings.
- Throw FileMissingError rather than FileOpenError when both copies of the info file are missing.
- If one file is present (but errors) and the other is missing, then return the error for the file that was present.
Contributed by Cynthia Shang.
The previous error message only showed the last error. In addition, some errors were missed (such as directory permission errors) that could prevent the copy from being checked.
Show both errors below a generic "unable to load" error. Details are now given explaining exactly why the primary and copy failed.
Previously if one file could not be loaded a warning would be output. This has been removed because it is not clear what the user should do in this case. Should they do a stanza-create --force? Maybe the best idea is to automatically repair the corrupt file, but on the other hand that might just spread corruption if pgBackRest makes the wrong choice.
The decryption filter was added in archiveGetFile() and archiveGetCheck() was modified to return the WAL decryption key stored in archive.info. The rest was plumbing.
The mock/archive/1 integration test added encryption to provide coverage for the new code paths while mock/archive/2 dropped encryption to provide coverage for the existing code paths. This caused some churn in the expect logs but there was no change in behavior.