1
0
mirror of https://github.com/pgbackrest/pgbackrest.git synced 2024-12-14 10:13:05 +02:00
Commit Graph

1693 Commits

Author SHA1 Message Date
David Steele
c30d3e439b
Block incremental map fixes and improvements.
Bug Fixes:

* Remove the distinction between maps where super block size is equal to block size and maps where they are not. In practice, maps with equal blocks are now rare and most of the optimizations can be applied directly to super blocks where the blocks are equal. This fixes a bug where a map that was created with equal size blocks and then converted to differing block sizes would generate an invalid map.

* Free reads during restore to avoid running out of file handles.

Improvements:

* Store super block sizes in the block map. This allows the final block size to be removed from the block list and provides a more optimal restore and better potential for analysis.

* Always round the super block size up to the next block size. This makes the number of blocks per super block more predictable.

* Allow super block sizes to be changed at will in the map. The first case for this is to store the reduced super block size required when the last super block is short but it could be used to dynamically change the super block size to optimize compression.

* Store a block count rather than a list of blocks in a super block. Blocks must always be sequential, though there may be an offset to the first block in a super block. This saves 11-14% on space for checksum sizes 6-7.

* In the case that all the blocks for a super block are present, and there is no offset, the block size is omitted.
2023-03-14 17:48:25 +07:00
David Steele
5c1f78d4dd Fix typo in blockIncrProcess(). 2023-03-12 22:38:38 +07:00
David Steele
1281a6eaf8 Ensure no continuations when block size equals super block size.
In this case each super block contains a single block so continuations are not possible.
2023-03-12 16:21:43 +07:00
David Steele
6b409d049e Update default block size and super block values based on testing.
Block sizes are incremented when the size of the map becomes as large as a single block. This is arbitrary but it appears to give a good balance of block size vs map size.

The full backup super block size is set to minimize loss of compression efficiency since most blocks in the database will likely never be modified. For diff/incr backup super blocks, a smaller size is allowable since only modified blocks are stored. The overall savings of not storing unmodified blocks offsets the small loss in compression efficiency due to the smaller super block and allows more granular fetches during restore.
2023-03-10 14:01:38 +07:00
David Steele
1119a53539 Rename BlockHash to BlockChecksum.
Checksum is the generally used terminology in the code base, even when a hash is being used as a checksum.
2023-03-09 11:04:03 +07:00
David Steele
6252c0e448 Exclude backup set size from info for block incremental backups.
As calculated this size is not correct since it does not include the parts of prior block incrementals that are required to make the current block incremental valid. At best this could be approximated and the resulting values might be very confusing.

For now, at least, exclude this metric for block incremental backups.
2023-03-09 10:30:57 +07:00
David Steele
210bed4511 Use xxHash instead of SHA-1 for block incremental checksums.
xxHash is significantly faster than SHA-1 so this helps reduce the overhead of the feature.

A variable number of bytes are used from the xxHash depending on the block size with a minimum of six bytes for the smallest block size. This keeps the maps smaller while still providing enough bits to detect block changes.
2023-03-09 10:02:04 +07:00
David Steele
8b5153ad21
Block-level incremental backup super blocks.
Small blocks sizes can lead to reduced compression efficiency, so allow multiple blocks to be compressed together in a super block. The disadvantage is that the super block must be read sequentially to retrieve blocks. However, different super block sizes can be used for different backup types, so the full backup super block sizes are large for compression efficiency and diff/incr are smaller for retrieval efficiency.
2023-03-09 09:39:54 +07:00
Stefan Fercot
740c2258e3
Add pg-version-force option for fork integration.
Forks may update pg_control version or WAL magic without affecting the structures that pgBackRest depends on.

This option forces pgBackRest to treat a cluster as the specified version when it cannot be automatically identified.
2023-03-09 08:23:15 +07:00
David Steele
2fa7e53c5d
Skip writing recovery.signal by default for restores of offline backups.
When restoring an offline backup on PostgreSQL >= 12, skip writing recovery.signal by default since this will error if the backup was made with wal_level=minimal. If the user explicitly sets the type option to something other than none, then write recovery.signal as usual since it is possible to do Point-In-Time-Recovery from an offline backup as long as wal_level was not minimal.
2023-03-08 19:05:23 +07:00
David Steele
7e5adc0359 Use raw compression/encryption to bundling and block incremental backup.
Raw encryption was already being used for block incremental. This commit adds raw compression to block incremental where possible (see da918587).

Raw compression/encryption is also added to bundling for a backup set when block incremental is enabled on the full backup. This prevents a break in backward compatibility since block incremental is not backward compatible.
2023-03-07 18:46:24 +07:00
David Steele
da91858702 Add optional raw format for compression types.
Raw format saves 12 bytes of header for gzip and 4 bytes of checksum for lz4 (plus CPU overhead). This may not seem like much, but over millions of small files or incremental blocks can really add up. Even though it may be a relatively small percentage of the overall backup size it is still objectively a large amount of data.

Use raw format for protocol compression to exercise the feature.

Raw compression format will be added to bundling and block incremental in a followup commit.
2023-03-07 18:31:17 +07:00
David Steele
1648c133d6
Keep only one all-default group index.
It is possible for a group index to be created for an option that is later found to not meet dependencies. In this case all values would be default leading to a phantom group, which can be quite confusing.

Remove group indexes that are all default (except the final one) and make sure the key for the final all default group index is 1.
2023-03-04 12:45:08 +07:00
David Steele
16ac5ee8d3 Rename block incremental manifest keys.
Since the keys need to be read/written in order, these keys make the logic a bit simpler.
2023-02-26 16:13:44 +07:00
David Steele
a9867cb0b8 Add repo-block-age-map and repo-block-size-map options.
Make these options configurable. This is primarily for testing purposes so the new options will be kept internal.
2023-02-26 14:49:34 +07:00
Christophe Courtois
15d5dcdd3b
Add explicit instructions for upgrading between 2.x versions.
Add an explicit statement that there is nothing special to do when upgrading between 2.x versions.

Leave the previous paragraph about the default location that changed between 2.00 and 2.02, as it is more a matter of transitioning from 1.x to 2.x.
2023-02-26 14:41:32 +07:00
David Steele
dffc933384 Rename DeltaMap to BlockHash.
This more accurately describes what the object does.
2023-02-13 09:17:30 +07:00
David Steele
779efe0d7a Consistently declare block incremental size as size_t.
The block is often allocated in memory so size_t makes more sense than uint64_t.
2023-02-09 13:01:56 +07:00
David Steele
d520816acf Remove parameter list from deltaMapNew().
Since this filter cannot be used remotely (yet) there is no reason to create a parameter list.
2023-02-09 08:11:05 +07:00
David Steele
3feed389a2 Improve IoChunkedRead end-of-file handling.
Determine end-of-file earlier to improve throughput.

Also clean up some comments and formatting.
2023-02-08 22:34:23 +07:00
David Steele
089fae035b Add block incremental to real/all test output. 2023-02-07 14:09:50 +07:00
David Steele
8e7e9d36a1 Fix contributors in release notes. 2023-01-31 21:28:28 +07:00
David Steele
c5907a2e71 Remove references to SSH made obsolete when TLS was introduced.
Also remove details about SSH compression that are not helpful.
2023-01-31 08:28:32 +07:00
David Steele
240312110c Begin v2.45 development. 2023-01-30 09:27:04 +07:00
David Steele
053468bfb1 v2.44: Remove PostgreSQL 9.0/9.1/9.2 Support
Improvements:

* Remove support for PostgreSQL 9.0/9.1/9.2. (Reviewed by Stefan Fercot.)
* Restore errors when no backup matches the current version of PostgreSQL. (Contributed by Stefan Fercot. Reviewed by David Steele. Suggested by Soulou.)
* Add compress-level range checking for each compress-type. (Reviewed by Stefan Fercot. Suggested by gkleen, ViperRu.)

Documentation Improvements:

* Add warning about enabling "hierarchical namespace" on Azure storage. (Reviewed by Stefan Fercot. Suggested by Vojtech Galda, Pluggi, asjonos.)
* Add replacement for linefeeds in monitoring example. (Reviewed by Stefan Fercot. Suggested by rudonx, gmustdie, Ivan Shelestov.)
* Clarify target-action behavior on various PostgreSQL versions. (Contributed by Chris Bandy. Reviewed by David Steele, Anton Kurochkin, Stefan Fercot. Suggested by Anton Kurochkin, Chris Bandy.)
* Updates and clarifications to index page. (Reviewed by Stefan Fercot.)
* Add dark mode to the website. (Suggested by Stephen Frost.)
2023-01-30 09:15:44 +07:00
David Steele
3aea997df5
Add warning about enabling "hierarchical namespace" on Azure storage.
If this feature is enabled expire will fail since directories need to be deleted separately.

Ideally we would add support for this feature but for now we'll just document the issue.
2023-01-25 10:59:13 +07:00
David Steele
ed818b3186 Add dark mode to the website.
The colors could use more tweaking but at least the website will no longer blind users running dark mode.
2023-01-25 10:35:03 +07:00
David Steele
912eec63bb
Block-level incremental backup.
The primary goal of the block incremental backup is to save space in the repository by only storing changed parts of a file rather than the entire file. This implementation is focused on restore performance more than saving space in the repository, though there may be substantial savings depending on the workload.

The repo-block option enables the feature (when repo-bundle is already enabled). The block size is determined based on the file size and age. Very old or very small files will not use block incremental.
2023-01-20 16:48:57 +07:00
David Steele
34e4835ff3
Refactor common/ini module to remove callbacks and duplicated code.
The callbacks in iniLoad() made the downstream code more complicated than it needed to be so use an iterator model instead.

Combine the two functions that were used to load the ini data to remove code duplication. In theory it would be nice to use iniValueNext() in the config/parse module rather than loading a KeyValue store but this would mean a big change to the parser, which does not seem worthwhile at this time.
2023-01-12 21:24:28 +07:00
David Steele
f018912908 Split VR_EXTERN/FN_EXTERN macros from FV_EXTERN.
This should make it a little clearer what the variable (VR) macros are doing since the declaration/definition cannot both be set to extern (but functions can).

Splitting the variable macros out also allows them to be changed in the future with little churn, while changing the function macro creates a large amount of churn.
2023-01-02 15:24:51 +07:00
David Steele
4fb8a0ecdd Add meson unity build and tests.
This is immediately useful because it will detect any extern'd functions or variables that are not being used. It also detects functions or variables that are declared but not defined.

If a FV/VR_EXTERN macro is missing it will be detected either because of a mismatch in the declaration/definition or because a new defined symbol will appear in the nm test.

Eventually the unity build will be used to create a more optimized pgbackrest binary but that will need to wait.
2022-12-31 17:13:41 +07:00
Stefan Fercot
b9be4fa540
Restore errors when no backup matches the current version of PostgreSQL.
It is probably not a good idea to restore the latest backup when it was not made from the current PostgreSQL version. If there is no backup after a stanza-upgrade then replicas might be built with a prior version leading to failures.

Add an error in this case if the latest backup would be used, i.e. --set or --type=time/lsn is not specified.
2022-12-29 15:37:27 +07:00
David Steele
36ee30d118
Updates and clarifications to index page.
In particular the section about other backup solutions not supporting parallel processing was no longer accurate, so reword it.

Also update some other sections that used older nomenclature, had awkward wording, or needed clarification.
2022-12-28 19:15:44 +07:00
Chris Bandy
84a3ff8b7a
Clarify target-action behavior on various PostgreSQL versions.
The behavior of pause depends on the hot_standby parameter and the PostgreSQL version so mention both.

This behavior has been verified on PostgreSQL 9.6–15. PostgreSQL 12 is an inflection point because the behavior of an unset recovery_target_action with hot_standby=off changed in https://git.postgresql.org/gitweb/?p=postgresql.git;h=2dedf4d9a899b36d1a8ed29be5efbd1b31a8fe85.
2022-12-28 10:48:44 +07:00
David Steele
ae258f604e
Add replacement for linefeeds in monitoring example.
The copy command was converting \n to a linefeed, which the json conversion did not like. In a healthy repository there won't be any linefeeds but certain errors can contain them.

Fix by loading into a text field and then replacing the linefeed when converting to jsonb.
2022-12-27 20:28:38 +07:00
David Steele
44da314adb
Add compress-level range checking for each compress-type.
The prior range checking was done based on the valid values for gz. While this worked it was a subset of what is available for lz4 and zst.

Allow the range to be specified for each compress-type. Adding this functionality to the parse module would be a better solution but that is a bigger project than this fix deserves, at least for now.
2022-12-27 20:05:08 +07:00
David Steele
56b55f81e8
Add repository checksum to make verify and resume more efficient.
Calculate a checksum of the data stored in the repository when a file is transformed (e.g. compressed). This allows resume and verify to operate without needing to decompress/decrypt the data.

This can also be used to verify more complex formats such as block incremental and allow backups from the repository without needing to decompress the data to verify the checksum.

Add some basic encrypted tests to maintain coverage. These will be expanded in a future commit.
2022-12-22 09:26:26 +07:00
David Steele
2ab845e263
Store manifest checksums in memory more efficiently.
Manifest checksums were stored as hex-encoded strings due to legacy compatibility with Perl. Storing the checksums as binary in memory uses half the space and avoids many conversions.

There is no change to the on-disk manifest format which stores the checksum as a hex-encoded string.
2022-12-20 16:35:27 +07:00
David Steele
77c721eb63
Remove support for PostgreSQL 9.0/9.1/9.2.
Our new policy is to support ten versions of PostgreSQL, the five supported releases and the last five EOL releases. As of PostgreSQL 15, that means 9.0/9.1/9.2 are no longer supported by pgBackRest.

Remove all logic associated with 9.0/9.1/9.2 and update the tests.

Document the new support policy.

Update InfoPg to read/write control versions for the history in backup.info, since we can no longer rely on the mappings being available. In theory this could have been an issue after removing 8.3/8.4 if anybody was using a version that old.
2022-12-20 12:20:47 +07:00
David Steele
c972a9359b Begin v2.44 development. 2022-11-28 17:56:59 +08:00
David Steele
cc2ffd8264 v2.43: Bug Fix
Bug Fixes:

* Fix missing reference in diff/incr backup. (Reviewed by Stefan Fercot. Reported by Marcel Borger, ulfedf, jaymefSO.)

Improvements:

* Add hint when an option is specified without an index. (Reviewed by Stefan Fercot.)
2022-11-28 17:47:48 +08:00
David Steele
c4bf775099
Fix missing reference in diff/incr backup.
When loading prior manifests without the new reference list, the code failed to add the current backup to the reference list. Since the current backup is never explicitly referenced, building references from the file list was not sufficient to generate a complete list.

The main problem here was a bad test, fixed in 28f6604. This masked the issue and prevented it from being found. Now it is clear in the test that the current label is missing from the reference list.

Fix by adding the current label to the reference list if a reference list is not stored in the manifest.
2022-11-28 16:42:35 +08:00
David Steele
3f363cb3ae
Add hint when an option is specified without an index.
Hopefully this will make it a little clearer to the user what is wrong when they specify an indexed option without an index.

Also fix an ambiguous use of cfgParseOptionP(). The prior code worked in that it set prefixMatch = true but it was not very readable.
2022-11-22 15:04:13 +08:00
David Steele
092e254794 Begin v2.43 development. 2022-11-22 10:27:20 +08:00
David Steele
70b75532bf v2.42: Bug Fixes
Bug Fixes:

* Fix memory leak in file bundle backup/restore. (Reviewed by John Morris, Oscar. Reported by Oscar.)
* Fix protocol error on short read of remote file. (Reviewed by Stephen Frost.)

Improvements:

* Do not store references for zero-length files when bundling. (Reviewed by Stefan Fercot.)
* Use more generic descriptions for pg_start_backup()/pg_stop_backup(). (Reviewed by Greg Sabino Mullane, David Christensen. Suggested by Greg Sabino Mullane.)

Test Suite Improvements:

* Update test.pl --psql-bin option to match command-line help. (Contributed by Koshi Shibagaki. Reviewed by David Steele.)
2022-11-22 10:20:59 +08:00
k_zshiba
3ad588443b
Update test.pl --psql-bin option to match command-line help.
The option to specify the path to psql was shown in the command-line help as --psql-bin but the option was actually named --pgsql-bin.

Rename to match the help so they are consistent.
2022-11-14 12:47:27 +08:00
David Steele
58b3c91bab Add raw mode to CipherBlock to save space.
The magic in the header is only required so that command-line openssl will recognize the file as being encrypted. In cases where the encrypted data cannot be read with the command-line tool it makes sense to omit the header magic to save some space.

Unfortunately this cannot be enabled for file bundling because it would break backward compatibility. However, it should be possible to enable it for the combination of bundling and block incremental.
2022-11-10 10:28:49 +09:30
David Steele
c9db7bc274 Update cipherBlockNew() to allow optional parameters.
This simplifies calls a bit since digest is never passed and allows for new optional parameters.
2022-11-06 16:12:23 +09:30
David Steele
221db610d2 Shorten names in real/all integration test matrix.
This should allow one or two more parameters to be added without going to a new line, which keeps the matrix easier to read.
2022-10-18 18:02:17 +13:00
David Steele
5fceee88a9 Add backupFileRepoPathP().
The path for a backup file in the repository was being generated in four different places, so move the logic to a function.
2022-10-18 17:39:59 +13:00