Some C libraries (e.g.musl) render floating point numbers differently when using printf(). This does not cause any problems for the core code but the unit tests require more predictability to function smoothly without a lot of exceptions.
To accomplish this, wrap the various floating point operations in functions that mostly continue doing math using double but format the output string using integers. This leads to more predictable output at the cost of some complexity.
Rounding could also be accomplished using nearbyint() but this would require linking the math library, which does not seem worth it for a fairly simple operation.
Per our policy to support five EOL versions of PostgreSQL, 9.4 is no longer supported by pgBackRest. Remove all logic associated with 9.4 and update the tests.
This includes a small fix in infoPg.c to allow backup.info files with old versions to be saved. This allows expire to function when old versions are present. Even though those older versions cannot be used, they can be expired.
Tests for 9.4 are left in the expire/info tests to demonstrate that these commands work with old versions present.
The backup reference list can be very long so it seems better to summarize the list by default for text output and keep the full list when --set is specified.
The prior locking only allowed one backup per stanza, which was required by PostgreSQL <= 9.5 and didn't present a problem when only one stanza could be created.
Now that multiple stanzas are allowed relax this restriction so that backups can run concurrently for PostgreSQL > 9.5. To do this, update the locking to be per stanza and repo rather than per stanza. Remotes are not aware of the repos that require locking so send an explicit list of files to be locked to the remote. Also remove the advisory lock for PostgreSQL > 9.5.
For info output the running backups are combined for progress output in order to avoid changing the JSON format. It definitely makes sense to have per repo progress as well but that will be left for a future commit.
Refactor the lock module to split command-specific logic from the basic file locking functionality. Command specific logic is now in command/lock.c. This will make it easier to implement new features such as repository locking and updating lock file contents on remotes.
This implementation is essentially a drop-in replacement but there are a few differences. First, the lock names no longer require a path (the path is added in the lock module). Second, the timeout functionality has been removed since it was not being used.
3c8819e1 replaced gmtime/localtime with gmtime_r/localtime_r but did not take into account a subtle difference in how they operate. While gmtime/localtime operate as if tzset() has been called, i.e. they operate on the TZ env variable directly, gmtime_r/localtime_r require tzset() to be called after changing TZ for consistent results.
Rather than call tzset() every time TZ is changed, add hrnTzSet() to encapsulate both operations.
Per our policy to support five EOL versions of PostgreSQL, 9.3 is no longer supported by pgBackRest.
Remove all logic associated with 9.3 and update the tests.
Each call to lockAcquireP() passed enough information to initialize the lock system. This was somewhat inefficient and as locks become more complicated it will lead to more code duplication. Since a process can only take one type of lock it makes sense to do most of the initialization up front.
Also reduce the log level of lockRelease() since it is only called at exit and the lock will be released in any case.
As calculated this size is not correct since it does not include the parts of prior block incrementals that are required to make the current block incremental valid. At best this could be approximated and the resulting values might be very confusing.
For now, at least, exclude this metric for block incremental backups.
The code is not completely reflowed yet so there are some cases that uncrustify will not catch. The formatting will be improved over time.
Some block of code require special formatting so have been surrounded with the {uncrustify-off}/{uncrustify-on} markers. These exceptions should be kept to a minimum.
Add --code-format (to reformat code) and --code-format-check (to check formatting) to test.pl.
Add a CI test that will check code formatting. Code must be correctly formatted before it can be merge to integration.
Add documentation to the coding standards for code formatting.
uncrustify has been configured to be as close to the current format as possible but the following changes were required:
* Break long struct initializiers out of function calls.
* Bit fields get extra spacing.
* Strings that continue from the previous line no longer indented.
* Ternary operators that do not fit on a single line moved to the next line first.
* Align under parens for multi-line if statements.
* Macros in header #if blocks are no longer indented.
* Purposeful lack of function indentation in tests has been removed.
Currently uncrustify does not completely reflow the code so there are some edge cases that might not be caught. However, this still represents a huge improvement and the formatting can be refined going forward.
Support code for uncrustify will be in a followup commit.
The primary goal of the block incremental backup is to save space in the repository by only storing changed parts of a file rather than the entire file. This implementation is focused on restore performance more than saving space in the repository, though there may be substantial savings depending on the workload.
The repo-block option enables the feature (when repo-bundle is already enabled). The block size is determined based on the file size and age. Very old or very small files will not use block incremental.
Allow key/value annotations to be added with the backup command and added/modified/removed with the new annotate command.
Annotations can be viewed with the info command in text mode when --set is specified and are always included in JSON output.
In the JSON output the percent complete is storage as an integer of the percent complete * 100. So, before display it should be converted to double and divided by 100, or split using integer mod and div.
Note that percent complete will only be displayed on the host where the backup was executed. Remote hosts will show a backup/expire running with no percent complete.
This column has been removed in PostgreSQL 15. Rather than add a lot of special handling, it seems better just to update all versions to not depend on this column.
Add centralized functions to identify the type of database (i.e. system or user) by name and use FirstNormalObjectId when a name is not available.
The new query in the db module will still return the prior result for PostgreSQL <= 15, which will be stored in the manifest. This is important to preserve behavior when downgrading pgBackRest. There are no concerns here for PostgreSQL 15 since older versions of pgBackRest won't be able to restore backups for PostgreSQL 15 anyway.
This flag was only being used by the backup command after manifestNewBuild() and had no other uses. There was a time when it was important for integration testing but the unit tests now fulfill this role.
Since backup is the only code concerned with the primary flag, move the code into the backup module.
We don't have any cross-version testing but this change was tested manually with the most recent version of pgBackRest to make sure it was tolerant of the missing primary info. When an older version of pgBackRest loads a newer manifest the primary flag will always be set to false, which is fine since it is not used.
The backup LSNs are useful for performing LSN-based PITR. LSNs will not be displayed in the general text output (without --set) because they are probably not useful enough to deserve their own line.
Currently errors found during the backup are only available in text output when specifying --set.
Add a flag to backup.info that is available in both the text and json output when --set is not specified. This at least provides the basic info that an error was found in the cluster during the backup, though details are still only available as described above.
"error list" makes it clearer that other errors may be reported. For example, if checksum-page is true in the manifest but no checksum-page-error list is provided then the error is in alignment, i.e. the file size is not a multiple of the page size, with allowances made for a valid-looking partial page at the end of the file.
It is still not possible to differentiate between alignment and page checksum errors in the output but this will be addressed in a future commit.
This removes a lot of boiler plate where every instance needs to create these interfaces.
Also add HRN_FORK_*_NOTIFY*() macros to standardize synchronizing between the parent and child processes.
In both cases update the tests with the new macros.
Simplify HRN_FORK_CHILD_BEGIN() by adding optional parameters with the common defaults.
Add _FD() to macros that retrieve file descriptors to make their purpose clearer.
It seems better to use TEST_PATH in combination with a constant string rather than have a number of different path constants. This improves readability and reduces confusion about which constant should be used.
For tests already updated as part of the macro-replacement effort, the output tests (TEST_ERROR, TEST_RESULT_LOG, TEST_STORAGE_LIST and TEST_RESULT_STR) have been simplified for readability to remove all but the TEST_PATH constants. The ongoing macro-replacement effort will include these changes.
Updated: expireTest, stanzaTest, checkTest, infoTest, verifyTest (infoArchive and infoBackup had no changes).
Some tests had to be reordered or updated, as follows:
* Reordered tests at line 317 and 331 to avoid unnecessary file removal.
* Change "stanza found" test at line 1735 to reflect real-life scenario. Originally this test had the cipher-pass environment key set up which caused the RepoGrp to be 2 but with no valid repo path. This resulted in the repo loops executing for the repo2 but since the path was not defined, the tests just reported "none" for cipher which is incorrect since the repo IS encrypted.
* Moved order of HRN_CFG_LOAD in some tests when able to avoid using storageTest.
HRN_CFG_LOAD() handles the majority of test configuration loads and has various options for special cases.
It was not clear when to use harnessCfgLoadRaw() vs harnessCfgLoad(). Now "raw" functionality is granular and enabled by parameters, e.g. noStd.
A define was already added for TEST_PATH but it was not widely used. Replace all occurrences of testPath() with TEST_PATH in the tests.
Replace testUser() with TEST_USER, testGroup() with TEST_GROUP, testRepoPath() with HRN_PATH_REPO, testDataPath() with HRN_PATH, testProjectExe() with TEST_PROJECT_EXE, and testScale() with TEST_SCALE.
Replace {[path]}, {[user]}, {[group]}, etc. with defines and remove hrnReplaceKey(). This is better than having two ways to deal with replacements.
In some cases the original test*() getters were kept because they are used by the harness, which does not have access to the new defines. Move them to harnessTest.intern.h to indicate that the tests should no longer use them.
Replace all instances of strNew("") with strNew() and use strNewZ() for non-empty zero-terminated strings. Besides saving a useless parameter, this will allow smarter memory allocation in a future commit by signaling intent, in general, to append or not.
In the tests use STRDEF() or VARSTRDEF() where more appropriate rather than blindly replacing with strNewZ(). Also replace strLstAdd() with strLstAddZ() where appropriate for the same reason.
This improvement reduces the number of errors thrown; these errors will now be reported as a status for the stanza or repo as appropriate. Invalid option configurations are still thrown but all other errors are caught, formatted and reported. This was necessary for multiple repositories so that the command can complete gathering information from each repository and report the results rather than immediately aborting when an error occurs.
Two new error codes were introduced:
6 = requested backup not found
99 = other, which is used to indicate an error has occurred that requires more details to be provided
A new stanza name of "[invalid]" was created for instances where a stanza was not specified and no stanza can be found.
If there is only one repository configured the error will move up to the stanza level with the standard error formatting of 'error (message)' where the message will be "other" and the details of the error will be listed on the next line(s):
stanza: stanza1
status: error (other)
[CryptoError] unable to load info file '/var/lib/pgbackrest/repo/backup/stanza1/backup.info' or '/var/lib/pgbackrest/repo/backup/stanza1/backup.info.copy':
CryptoError: cipher header invalid
HINT: is or was the repo encrypted?
FileMissingError: unable to open missing file '/var/lib/pgbackrest/repo/backup/stanza1/backup.info.copy' for read
HINT: backup.info cannot be opened and is required to perform a backup.
HINT: has a stanza-create been performed?
HINT: use option --stanza if encryption settings are different for the stanza than the global
cipher: aes-256-cbc
If a backup set is requested but is not found on any repo, a stanza-level status error of 'requested backup not found' is reported when there are no other errors:
pgbackrest info --stanza=demo --set=bogus
stanza: demo
status: error (requested backup not found)
cipher: mixed
repo1: aes-256-cbc
repo2: none
If there are multiple repositories configured and a single repo is in error but the other repos are ok or have a different error:
pgbackrest info --stanza=demo --set=20210322-171211F
stanza: demo
status: mixed
repo1: error
[CryptoError] unable to load info file '/var/lib/pgbackrest/repo/backup/stanza1/backup.info' or '/var/lib/pgbackrest/repo/backup/stanza1/backup.info.copy':
CryptoError: cipher header invalid
HINT: is or was the repo encrypted?
FileMissingError: unable to open missing file '/var/lib/pgbackrest/repo/backup/stanza1/backup.info.copy' for read
HINT: backup.info cannot be opened and is required to perform a backup.
HINT: has a stanza-create been performed?
HINT: use option --stanza if encryption settings are different for the stanza than the global
repo2: ok
cipher: mixed
repo1: aes-256-cbc
repo2: none
db (current)
wal archive min/max (12): 000000010000000000000001/000000010000000000000003
full backup: 20210322-171211F
timestamp start/stop: 2021-03-22 17:12:11 / 2021-03-22 17:12:28
wal start/stop: 000000010000000000000002 / 000000010000000000000002
database size: 23.4MB, database backup size: 23.4MB
repo2: backup set size: 2.8MB, backup size: 2.8MB
database list: postgres (13359)
Json output will include the repository information and any error information. If no stanzas are found, then [invalid] will be set as the name:
[
{
"archive":[],
"backup":[],
"cipher":"none",
"db":[],
"name":"[invalid]",
"repo":[
{
"cipher":"none",
"key":1,
"status":{
"code":99,
"message":"[PathOpenError] unable to list file info for path '/var/lib/pgbackrest/repo2/backup': [13] Permission denied"
}
}
],
"status":{
"code":99,
"lock":{"backup":{"held":false}},
"message":"other"
}
}
]
When the FUNCTION_*_RESULT*() macros were renamed to FUNCTION_*_RETURN_*() in the core code the test harness macros were missed.
Update them to make the naming consistent.
The info command provides total sizes for files in the backup on the database as well as the repository. The text output and associated user documentation has been updated to provide more clarity regarding the sizes being displayed.
In addition, the info command is updated to allow a user to optionally specify the repository when requesting a specific backup set. In this case, the text output will reflect the status of the stanza, the cipher types and archive min/max over all the repositories instead of a single repository when the repo option is specified.
Multi-repository implementations for the archive-push, check, info, stanza-create, stanza-upgrade, and stanza-delete commands.
Multi-repo configuration is disabled so there should be no behavioral changes between these commands and their current single-repo implementations.
Multi-repo documentation and integration tests are still in the multi-repo development branch. All unit tests work as multi-repo since they are able to bypass the configuration restrictions.
Tests that are duplicated are being removed from the info command unit tests. Specifically tests where the only thing different was whether a lock was held or not which affects only the status display. Removing these tests will reduce churn in the upcoming multi-repo support.
There is an inconsistency when the JSON is output for the case when a stanza is requested and it does not exist in the repo. This was the only case where the archive array was not added to the JSON. Adding it will simplify the upcoming multi-repo support code.
Also, a redundant test was removed rather than updating it for this case.
Improve locking on remote processes by introducing an exec-id that is unique to the main process and passed to all remote processes. This allows the remote processes to determine if a lock is held by a remote from the same main process. If so, the lock is allowed.
The exec-id is also useful for associating remote logs with main logs for debugging purposes.