The --repo-retention-full-type option allows retention of full backups based on a time period, specified in days.
The new option will default to 'count' and therefore will not affect current installations. Setting repo-retention-full-type to 'time' will allow the user to use a time period, in days, to indicate full backup retention. Using this method, a full backup can be expired only if the time the backup completed is older than the number of days set with repo-retention-full (calculated from the moment the 'expire' command is run) and at least one full backup meets the retention period. If archive retention has not been configured, then the default settings will expire archives that are prior to the oldest retained full backup. For example, if there are three full backups ending in times that are 25 days old (F1), 20 days old (F2) and 10 days old (F3), then if the full retention period is 15 days, then only F1 will be expired; F2 will be retained because F1 is not at least 15 days old.
Zstandard is a fast lossless compression algorithm targeting real-time compression scenarios at zlib-level and better compression ratios. It's backed by a very fast entropy stage, provided by Huff0 and FSE library.
Zstandard version >= 1.0 is required, which is generally only available on newer distributions.
Previously when retention-archive was set (either by the user or by default), archives prior to the archive-start of the oldest remaining full backup (after backup expiration occurred) would be expired even though the retention-archive threshold had not been met. For example, if there were 1 full backup remaining after backup expiration and the retention-archive was set to 2 and retention-archive-type=full, then archives prior to the archive-start of the remaining full backup would still be removed even though retention-archive required 2 full backups remaining before archives should be expired.
The thought was to keep the archive directory clean and since the full backup did not require prior archives, it was safe to delete them. However, this has caused problems for some users in the past (because they needed the WAL for other purposes) and with the new adhoc and time-based retention features, it was decided that the archives should remain until the threshold was met. The archives will eventually be removed and if having them causes space issues, the expire command and the retention-archive can always be run and adjusted.
The specified backup set (i.e. the backup label provided and all of its dependent backups, if any) will be expired regardless of backup retention rules except that at least one full backup must remain in the repository.
This functionality was embedded into TlsClient but that was starting to get unwieldy.
Add SocketClient to contain all socket-related client functionality.
In the ExpireEnvTest.pm backupCreate() function, backup-prior was incorrectly set for diff backups to the previous backup regardless of what backup type the previous backup was. This did not cause any issues in the Mock Expire tests before because it was not being checked. However, in order to reduce churn in the expect logs for a new feature where the backup-prior is utilized, this is being fixed so that the full backup is always used as backup-prior.
LZ4 compresses data faster than gzip but at a lower ratio. This can be a good tradeoff in certain scenarios.
Note that setting compress-type=lz4 will make new backups and archive incompatible (unrestorable) with prior versions of pgBackRest.
Add compress-type option and deprecate compress option. Since the compress option is boolean it won't work with multiple compression types. Add logic to cfgLoadUpdateOption() to update compress-type if it is not set directly. The compress option should no longer be referenced outside the cfgLoadUpdateOption() function.
Add common/compress/helper module to contain interface functions that work with multiple compression types. Code outside this module should no longer call specific compression drivers, though it may be OK to reference a specific compression type using the new interface (e.g., saving backup history files in gz format).
Unit tests only test compression using the gz format because other formats may not be available in all builds. It is the job of integration tests to exercise all compression types.
Additional compression types will be added in future commits.
These commands (e.g. restore, archive-get) never used the compress options but allowed them to be passed on the command line. Now they will error when these options are passed on the command line. If these errors occur then remove the unused options.
Auto-selection is performed only when --set is not specified. If a backup set for the given target time cannot not be found, the latest (default) backup set will be used.
Currently a limited number of date formats are recognized and timezone names are not allowed, only timezone offsets.
Most of these tests are just checking that errors are thrown when required. These are well covered in various unit tests.
The "cannot resume" tests are also well covered in the backup unit tests.
Finally, config warnings are well covered in the config unit tests.
There is more to be done here, but this accounts for the low-hanging fruit.
Set log-level-file=off when more that one test will run. In this case is it impossible to see the logs anyway since they will be automatically cleaned up after the test. This improves performance pretty dramatically since trace-level logging is expensive. If a singe integration test is run then log-level-file is trace by default but can be changed with the --log-level-test-file option.
Reduce buffer-size to 64k to save memory during testing and allow more processes to run in parallel.
Update log replacement rules so that these options can change without affecting expect logs.
For the most part this is a direct migration of the Perl code into C except as noted below.
A backup can now be initiated from a linked directory. The link will not be stored in the manifest or recreated on restore. If a link or directory does not already exist in the restore location then a directory will be created.
The logic for creating backup labels has been improved and it should no longer be possible to get a backup label earlier than the latest backup even with timezone changes or clock skew. This has never been an issue in the field that we know of, but we found it in testing.
For online backups all times are fetched from the PostgreSQL primary host (before only copy start was). This doesn't affect backup integrity but it does prevent clock skew between hosts affecting backup duration reporting.
Archive copy now works as expected when the archive and backup have different compression settings, i.e. when one is compressed and the other is not. This was a long-standing bug in the Perl code.
Resume will now work even if hardlink settings have been changed.
Reviewed by Cynthia Shang.
82df7e6f and 9856fef5 updated tests that used test points in preparation for the feature not being available in the C code.
Since tests points are no longer used remove the infrastructure.
Also remove one stray --test option in mock/all that was essentially a noop but no longer works now that the option has been removed.
Archive check does not run when in offline backup mode but the option was set to true in the manifest. It's harmless since these options are informational only but it could cause confusion when debugging.
Test points are not supported by the new C code so these will be replaced with unit tests.
The fact that the tests still pass even when the changes aren't made mid-backup (except application_name) shows how weak they were in the first place.
Even so, this does represent a regression in (soon to be be removed) Perl coverage.
Test points will not be available in the C code so update these tests as best as possible without using them.
This represents a loss of coverage for the Perl code (soon to be removed) which will be made up in the C code with unit tests.
These tests require test points which are not being implemented in the C code.
This functionality is fully tested in the command/control unit tests so integration tests are no longer required.
This expression determines which files contain page checksums but it was also including the directory above the relation directories. In a real PostgreSQL installation this not a problem because these directories don't contain any files.
However, our tests place a file in `base` which the Perl code thought should have page checksums while the new C code says no.
Update the expression to document the change and avoid churn in the expect logs later.
The protocol timeout tests have been superceded by unit tests.
The TEST_BACKUP_RESUME test point was incorrectly included into a number of tests, probably a copy pasto. It didn't hurt anything but it did add 200ms to each test where it appeared.
Catalog and control version tests were redundant. The database version and system id tests covered the important code paths and the C code gets these values from a lookup table.
Finally, fix an incomplete update to the backup.info file while munging for tests.
It wasn't practical for the main process to be ignorant of the remote path, and in any case knowing the path makes debugging easier.
Pull the remote path when connecting and pass the result of local storagePath() to the remote when making calls.
Previously, options were being filtered based on what was currently valid. For chained commands (e.g. backup then expire) some options may be valid for the first command but not the second.
Filter based on the command definition rather than what is currently valid to avoid logging options that are not valid for subsequent commands. This reduces the number of options logged and will hopefully help avoid confusion and expect log churn.
This user was created before we tested in containers to ensure isolation between the pg and repo hosts which were then just directories. The downside is that this resulted in a lot of sudos to set the pgbackrest user and to remove files which did not belong to the main test user.
Containers provide isolation without needing separate users so we can now safely remove the pgbackrest user. This allows us to remove most sudos, except where they are explicitly needed in tests.
While we're at it, remove the code that installed the Perl C library (which also required sudo) and simply add the build path to @INC instead.
Note that building the manifest on each host has been temporarily removed.
This feature will likely be brought back as a non-default option (after the manifest code has been fully migrated to C) since it can be fairly expensive.
Recovery settings are now written into postgresql.auto.conf instead of recovery.conf. Existing recovery_target* settings will be commented out to help avoid conflicts.
A comment is added before recovery settings to identify them as written by pgBackRest since it is unclear how, in general, old settings will be removed.
recovery.signal and standby.signal are automatically created based on the recovery settings.
PostgreSQL 12 will shutdown in these cases which seems to be the correct action (according to the documentation) when hot_standby = off, but older versions are promoting instead. Set target_action explicitly so all versions will behave the same way.
This does beg the question of whether the PostgreSQL 12 behavior is wrong (though it matches the docs) or the previous versions are.
This restore type automatically adds standby_mode=on to recovery.conf.
This could be accomplished previously by setting --recovery-option=standby_mode=on but PostgreSQL 12 requires standby mode to be enabled by a special file named standby.signal.
The new restore type allows us to maintain a common interface between PostgreSQL versions.
For the most part this is a direct migration of the Perl code into C.
There is one important behavioral change with regard to how file permissions are handled. The Perl code tried to set ownership as it was in the manifest even when running as an unprivileged user. This usually just led to errors and frustration.
The C code works like this:
If a restore is run as a non-root user (the typical scenario) then all files restored will belong to the user/group executing pgBackRest. If existing files are not owned by the executing user/group then an error will result if the ownership cannot be updated to the executing user/group. In that case the file ownership will need to be updated by a privileged user before the restore can be retried.
If a restore is run as the root user then pgBackRest will attempt to recreate the ownership recorded in the manifest when the backup was made. Only user/group names are stored in the manifest so the same names must exist on the restore host for this to work. If the user/group name cannot be found locally then the user/group of the PostgreSQL data directory will be used and finally root if the data directory user/group cannot be mapped to a name.
Reviewed by Cynthia Shang.
Info files required three copies in memory to be loaded (the original string, an ini representation, and the final info object). Not only was this memory inefficient but the Ini object does sequential scans when searching for keys making large files very slow to load.
This has not been an issue since archive.info and backup.info are very small, but it becomes a big deal when loading manifests with hundreds of thousands of files.
Instead of holding copies of the data in memory, use a callback to deliver the ini data directly to the object when loading. Use a similar method for save to avoid having an intermediate copy. Save is a bit complex because sections/keys must be written in alpha order or older versions of pgBackRest will not calculate the correct checksum.
Also move the load retry logic to helper functions rather than embedding it in the Info object. This allows for more flexibility in loading and ensures that stack traces will be available when developing unit tests.
Reviewed by Cynthia Shang.
Prior to 2.16 the Perl manifest code would skip any file that began with a dot. This was not intentional but it allowed PostgreSQL socket files to be located in the data directory. The new C code in 2.16 did not have this unintentional exclusion so socket files in the data directory caused errors.
Worse, the file type error was being thrown before the exclusion check so there was really no way around the issue except to move the socket files out of the data directory.
Special file types (e.g. socket, pipe) will now be automatically skipped and a warning logged to notify the user of the exclusion. The warning can be suppressed with an explicit --exclude.
Reported by CluelessTechnologist, Janis Puris, Rachid Broum.
Putting the checksum at the beginning of the file made it impossible to stream the file out when saving. The entire file had to be held in memory while it was checksummed so the checksum could be written at the beginning.
Instead place the checksum at the end. This does not break the existing Perl or C code since the read is not order dependent.
There are no plans to improve the Perl code to take advantage of this change, but it will make the C implementation more efficient.
Reviewed by Cynthia Shang.
Logging stayed in the backup log until the Perl code started. Fix this so it logs to the correct file and will still work after the Perl code is removed.
For offline backups the upper bound was being set to 0x0000FFFF0000FFFF rather than UINT64_MAX. This meant that page checksum errors might be ignored for databases with a lot of past WAL in offline mode.
Online mode is not affected since the upper bound is retrieved from pg_start_backup().
ScalityS3 has not received any maintenance in years and is slow to start which is bad for testing. Replace it with minio which starts quickly and ships as a single executable or a tiny container.
Minio has stricter limits on allowable characters but should still provide enough coverage to show that our encoding is working correctly.
This commit also includes the upgrade to openssl 1.1.1 in the Ubuntu 18.04 container.
Maintaining the storage layer/drivers in two languages is burdensome. Since the integration tests require the Perl storage layer/drivers we'll need them even after the core code is migrated to C. Create an interface layer so the Perl code can be removed and new storage drivers/features introduced without adding Perl equivalents.
The goal is to move the integration tests to C so this interface will eventually be removed. That being the case, the interface was designed for maximum compatibility to ease the transition. The result looks a bit hacky but we'll improve it as needed until it can be retired.
Amend commit 434cd832 to error when the db history in archive.info and backup.info do not match.
The Perl code would attempt to reconcile the history by matching on system id and version but we are not planning to migrate that code to C. It's possible that there are users with mismatches but if so they should have been getting errors from info for the last six months. It's easy enough to manually fix these files if there are any mismatches in the field.
Contributed by Cynthia Shang.
This implementation duplicates the functionality of the Perl code but does so with different logic and includes full unit tests.
Along the way at least one bug was fixed, see issue #748.
Contributed by Cynthia Shang.
Not all storage types support paths as a physical thing that must be created/destroyed. Add a feature to determine which drivers use paths and simplify the driver API as much as possible given that knowledge and by implementing as much path logic as possible in the Storage object.
Remove the ignoreMissing parameter from pathSync() since it is not used and makes little sense.
Create a standard list of error messages for the drivers to use and apply them where the code was modified -- there is plenty of work still to be done here.
The C code is designed to be efficient rather than deterministic at the debug log level. As we move more testing from integration to unit tests it makes less sense to try and maintain the expect logs at this log level.
Most of the expect logs have already been moved to detail level but mock/all still had tests at debug level. Change the logging defaults in the config file and remove as many references to log-level-console as possible.
The new name is preferred because pgBackRest does not support any SSL protocol versions (they are all considered to be insecure).
The old name will continue to be accepted.
Releasing the lock too early was allowing other async processes to sneak in and start running before the current process was completely shut down.
The only symptom seems to have been mixed up log messages so not a very serious issue.