pgbackrest

mirror of https://github.com/pgbackrest/pgbackrest.git synced 2024-12-12 10:04:14 +02:00

Author	SHA1	Message	Date
David Steele	210bed4511	Use xxHash instead of SHA-1 for block incremental checksums. xxHash is significantly faster than SHA-1 so this helps reduce the overhead of the feature. A variable number of bytes are used from the xxHash depending on the block size with a minimum of six bytes for the smallest block size. This keeps the maps smaller while still providing enough bits to detect block changes.	2023-03-09 10:02:04 +07:00
David Steele	8b5153ad21	Block-level incremental backup super blocks. Small block sizes can lead to reduced compression efficiency, so allow multiple blocks to be compressed together in a super block. The disadvantage is that the super block must be read sequentially to retrieve blocks. However, different super block sizes can be used for different backup types, so the full backup super block sizes are large for compression efficiency and diff/incr are smaller for retrieval efficiency.	2023-03-09 09:39:54 +07:00
David Steele	dffc933384	Rename DeltaMap to BlockHash. This more accurately describes what the object does.	2023-02-13 09:17:30 +07:00
David Steele	1da2666a9e	Add manifest test harness. These macros make adding paths/files/etc to a manifest simpler and easier to read.	2023-01-21 14:03:27 +07:00
David Steele	912eec63bb	Block-level incremental backup. The primary goal of the block incremental backup is to save space in the repository by only storing changed parts of a file rather than the entire file. This implementation is focused on restore performance more than saving space in the repository, though there may be substantial savings depending on the workload. The repo-block option enables the feature (when repo-bundle is already enabled). The block size is determined based on the file size and age. Very old or very small files will not use block incremental.	2023-01-20 16:48:57 +07:00
David Steele	34e4835ff3	Refactor common/ini module to remove callbacks and duplicated code. The callbacks in iniLoad() made the downstream code more complicated than it needed to be so use an iterator model instead. Combine the two functions that were used to load the ini data to remove code duplication. In theory it would be nice to use iniValueNext() in the config/parse module rather than loading a KeyValue store but this would mean a big change to the parser, which does not seem worthwhile at this time.	2023-01-12 21:24:28 +07:00
David Steele	9ca492cecf	Audit mem contexts returned from functions into the calling context. It is possible for functions to accidentally leak child contexts into the calling context, which may use a lot of memory depending on the use case and where it happens. Use the function return type to determine what should be returned and error when something else is returned. Add FUNCTION_AUDIT_*() macros to handle exceptions. This checking is only performed during unit tests on the code being covered by the specific unit test. Note that this does not work yet for memory allocations, i.e. memNew(). These are pretty rare so are not as much of an issue and they can be added in the future.	2023-01-12 17:36:57 +07:00
David Steele	de1dfb66ca	Refactor logging functions to never allocate memory. Allocating memory made these functions simpler but it meant that memory was leaking into the calling context when logging was enabled. It is not clear that this was an issue but it seems that trace level logging could result in a lot of memory usage depending on the use case. This also makes it possible to audit allocations returned to the calling context, which will be done in a followup commit. Also rename objToLog() to objNameToLog() since it seemed logical to name the new function objToLog().	2023-01-12 17:14:36 +07:00
David Steele	0becb6da31	Enhance libbacktrace to handle incomplete stack traces. This fills in backtrace info at the bottom of the call stack when the stack trace is incomplete due to testing. This does not affect release builds, which is why it did not make the first cut, but it turns out to be useful for testing and barely changes the release code (when we do release this). The recursion test in common/error was simplified because it would now return a very large trace.	2023-01-12 10:22:26 +07:00
David Steele	4429bc82f5	Add unit tests for the unit test build code. When this code was migrated to C the unit tests were not included because there were more important priorities at the time. This also requires some adjustments to coverage because of the new code location.	2023-01-05 12:59:06 +07:00
David Steele	2332ce8ffc	Move storageHelperFree() to storageHelper test harness. This function was only used for testing so move into a test harness.	2022-12-31 13:14:27 +07:00
David Steele	8b218158ae	Move regExpMatchPtr()/regExpMatchStr() to build/common/regExp module. Similar to `b9be4fa5`, these functions are not used by the core code so move them to the build module. The new implementation is a little less efficient but that is much less of a worry in the build/test code. Also remove regExpMatchSize() since it was no longer needed.	2022-12-31 12:54:33 +07:00
David Steele	fa9d831f9f	Move xmlNodeAttribute() to build/common/xml module. Similar to `b9be4fa5`, this function was not used by the core code so move it to the build module.	2022-12-31 11:09:50 +07:00
David Steele	163a004f30	Move strReplace() and strUpper() to build/common/string module. Neither of these functions were used by the core code. strReplace() is only used in the tests but it doesn't hurt to put it in build since the build code is not distributed.	2022-12-31 10:26:11 +07:00
David Steele	d517d4a328	Add explicit keyword for covered modules included in other modules. This was done by checking the extension but it is possible to include a module that does not have a vendor or auto extension. Instead make it explicit that the module is included in another module. Also change the variable from "include" to "included" to make it clearer what it indicates.	2022-12-31 10:10:44 +07:00
David Steele	cebbf0d012	Remove unused functions. These functions were either added with the intention that they would be used or they became obsolete over time.	2022-12-30 16:26:48 +07:00
David Steele	010efffb0c	Add hex encode/decoding to decode module. This replaces the bufHex() function and also allows hex to be decoded.	2022-12-11 19:46:48 +07:00
David Steele	4dc632d570	Add backup test harness. This allows test backups to be run in other test modules. It is likely that more logic will be moved here but for now this suffices to get test backups working in the restore module.	2022-12-05 14:15:15 +08:00
David Steele	fee38c2c7c	Pass filters to remote storage as a handler array. The prior code required coverage in the storage/remote module for all filters that could be used remotely. Now the filter handlers are set at runtime so any filter list can be used with a remote. This is more flexible and makes coverage testing easier. It also resolves a test dependency. Move the command/remote unit test near the end so it will have access to all filters without using depends.	2022-10-18 16:11:35 +13:00
David Steele	909be412f8	Swap command/backup and command/restore unit tests. Logically restore belongs after backup and in a future commit restore will have a dependency on some backup objects.	2022-10-14 12:08:40 +13:00
David Steele	5602f179a1	Add varint-128 encode/decode to IoRead/IoWrite. This makes it more efficient to read/write (especially read) varint-128 to/from IO. Update the Pack type to take advantage of the more efficient read and remove some duplicate code.	2022-10-05 17:01:35 -10:00
Reid Thompson	01b81f9d37	Move link creation to storage interface. Direct link creation via Posix functions has been moved to the Posix driver. This change allows adding SFTP softlink creation in the SFTP driver using the standard interface.	2022-10-01 15:26:44 -10:00
Stefan Fercot	381fd0a5a4	Backup key/value annotations. Allow key/value annotations to be added with the backup command and added/modified/removed with the new annotate command. Annotations can be viewed with the info command in text mode when --set is specified and are always included in JSON output.	2022-08-24 10:52:33 -04:00
David Steele	75623d4583	Create snapshot when listing contents of a path. Previously a callback was used to list path contents and if no sort was specified then a snapshot was not required. When deleting files from the path some filesystems could omit files that still existed, which meant the path could not be removed. Filter . out of lists in the Posix driver since this special entry was only used by test code (and filtered everywhere in the core code). Also remove callbacks from the storage interface and replace with an iterator that should be easier to use and guarantees efficient use of the snapshots.	2022-07-08 17:21:39 -04:00
David Steele	55bcb933ee	Move protocol module from common to command. This module has dependencies on command/command so it does not make sense for it to be in the common module. Also move protocolFree() to main() since this is a very large dependency. Adjust the tests so command/exit can be tested later. This is a bit messy but will get adjusted as we improve the test harness.	2022-06-17 11:17:52 -04:00
David Steele	f92ce674f7	Automatically create PostgreSQL version interfaces. Maintaining the version interfaces was complicated by the fact that each interface needed to be in separate compilation unit to avoid type conflicts. This also meant that various build/test files needed to be updated to add the new interfaces. Solve these problems by auto-generating all the interfaces into a single file. This is made possible by parsing defines and types out of the header files and creating macros to rename the types. At the end of the version interface everything is undef'd. Another benefit is that the auto-generated interfaces can be static and included directly into postgres/interface.c. Since some code generation is now always required for tests, change --no-gen to --min-gen in test.pl. It would also make sense to auto-generate the version defines in postgres/version.h, but that will be left for a future commit.	2022-06-06 13:52:56 -04:00
David Steele	148956aed8	Remove useless command/check unit test. This test was a placeholder and did not provide any coverage, but it did give inconsistent results on different shell versions.	2022-06-01 10:13:57 -04:00
David Steele	68a410779a	Add zNewFmt(). This replaces strZ(strNewFmt()), making the code simpler and reducing indentation.	2022-05-06 12:32:49 -04:00
Reid Thompson	65d22e4325	Add verify output and verbose options. These options allow the user to control how the verify results will be output to the console and log.	2022-05-06 11:11:36 -04:00
David Steele	20782c88bc	PostgreSQL 15 support. PostgreSQL 15 drops support for exclusive backup and renames the start/stop backup commands. This is based on the pgdg-testing repo since beta1 has not been released yet, but it seems unlikely that breaking changes will be made at this point. beta1 should be tagged just before our next release so we'll retest before the release.	2022-05-04 11:55:59 -04:00
David Steele	692fe496bd	Remove dependency on pg_database.datlastsysoid. This column has been removed in PostgreSQL 15. Rather than add a lot of special handling, it seems better just to update all versions to not depend on this column. Add centralized functions to identify the type of database (i.e. system or user) by name and use FirstNormalObjectId when a name is not available. The new query in the db module will still return the prior result for PostgreSQL <= 15, which will be stored in the manifest. This is important to preserve behavior when downgrading pgBackRest. There are no concerns here for PostgreSQL 15 since older versions of pgBackRest won't be able to restore backups for PostgreSQL 15 anyway.	2022-05-04 08:22:45 -04:00
David Steele	45c3f4d53c	Improve JSON handling. Previously reading/writing JSON required parsing/rendering via a variant, which adds many more memory allocations and loops. Instead allow JSON to be read/written serially to improve performance and simplify the code. This also allows us to get rid of many String and Variant constants which are no longer required. The goal is to be able to read/write very large (e.g. gigabyte manifest) JSON structures, which would not be practical with the current code. Note that external JSON (GCS, S3, etc) is still handled using variants. Converting these will require more consideration about key ordering since it cannot be guaranteed as in our own formats.	2022-04-25 09:06:26 -04:00
David Steele	79b2041663	Add lockRead*() functions for reading locks from another process. Sometimes we need to read a lock from another process. This was done two different ways and in the case of cmdStop() was definitely hacky. Centralize the logic to make it easier to read the locks for another process. This will also make it easier to add new lock data.	2022-04-08 15:55:41 -04:00
Reid Thompson	5ae84d5e47	Improve path validation for repo-* commands. Check for invalid path in repo-* commands. Perform path validation and throw an error when appropriate. Path may not contain '//'. Strip trailing '/' from path. Absolute path must fall under repo path.	2022-03-22 07:50:26 -06:00
Reid Thompson	f7ab002aa7	Improve stop command to honor stanza option. Improve the stop command, when force and stanza options are specified, to terminate only processes holding lock files for the given stanza. Prior to these changes, termination of all processes holding lock files regardless of stanza occurred.	2022-03-08 12:18:23 -06:00
David Steele	b489707793	Move command/backup-common tests into the command/backup module. As much as possible it is better to get coverage with more realistic tests. Merging these modules will allow the page checksum code to be covered with real backups.	2022-02-18 17:50:05 -06:00
David Steele	61ce58692f	Pack manifest file structs to save memory. Manifests with a very large number of files can use a considerable amount of memory. There are a lot of zeroes in the data so it can be stored more efficiently by using base-128 varint encoding for the integers and storing the strings in the same allocation. The downside is that the data needs to be unpacked in order to be used, but in most cases this seems fast enough (about 10% slower than before) except for saving the manifest, which is 10% slower up to 10 million files and then gets about 5x slower by 100 million (two minutes on my M1 Mac). Profiling does not show this slowdown so I wonder if this is related to the change in memory layout. Curiously, the function that increased most was jsonFromStrInternal(), which was not modified. That gives more weight to the idea that there is some kind of memory issue going on here and one hopes that servers would be less affected. Either way, the largest use cases we have seen are for about 6 million files so if we can improve that case I believe we will be better off. Further analysis showed that most of the time was taken up writing the size and timestamp fields, which makes almost no sense. The same amount of time was used if they were hard-coded to 0, which points to some odd memory issue on the M1 architecture. This change has been planned for a while, but the particular impetus at this time is that small file support requires additional fields that would increase manifest memory usage by about 20%, even if the feature is not used. Note that the Pack code has been updated to use the new varint encoder, but the decoder remains separate because it needs to fetch one byte at a time.	2022-01-21 17:05:07 -05:00
David Steele	4a73a02863	Simplify manifest defaults. Manifest defaults for user, group, and mode were previously generated by scanning the data to find the most common values. This was very accurate but slow and complicated. It could also lead to surprising changes in the manifest when a default value suddenly changed. Instead, use the $PGDATA path to generate defaults. In the vast majority of cases the same user/group should own all the path/files and the default file mode is easily derived from the path mode. There may be some edge cases where this generates larger manifests, but in general it reduces time and complexity when saving the manifest. Remove the MCV code since it is no longer used.	2022-01-21 15:22:48 -05:00
David Steele	47954774c6	Combine encrypted backupFile() tests with unencrypted tests. This makes it easier to comment out all the tests while developing without getting unused variable errors.	2022-01-09 10:11:00 -05:00
David Steele	bb4b30ddd3	Remove support for PostgreSQL 8.3/8.4. There is no evidence that users need 8.3/8.4 anymore but it does cost us in terms of development and testing, especially now that we have a number of new backup/restore features planned. It seems to make sense to remove this support now. If there are users who need to use/migrate from these versions they can use an older version of pgBackRest.	2022-01-06 15:34:04 -05:00
David Steele	615bdff403	Fix socket leak on connection retries. This leak was caused by the file descriptor variable getting clobbered after a long jump. Mark it as volatile to fix. Testing this is a bit complex because the issue only happens in optimized builds, if at all. Put the test into the performance suite, which is always optimized, until a better idea presents itself.	2021-12-14 14:53:41 -05:00
David Steele	0895cfcdf7	Add HRN_PG_CONTROL_PUT() and HRN_PG_CONTROL_TIME(). These macros simplify management of pg_control test files. Centralize time updates for pg_control in the command/backup module. This caused some time updates in the logs. Finally, move the postgres module after the storage module so it can use storage macros.	2021-11-30 13:23:11 -05:00
David Steele	3f7409019d	Ensure ASSERT() macro is always available in test modules. Tests that run without DEBUG for performance did not have ASSERT() and were using CHECK() instead. Instead ensure that the ASSERT() macro is always available in tests.	2021-11-24 16:09:45 -05:00
David Steele	43cfa9cef7	Revive archive performance test. This test was lost due to a syntax issue in `a58635ac`. Update the test to use system() to better mimic what postgres does and add logging so pgBackRest timing can be determined.	2021-11-10 12:14:41 -05:00
David Steele	038abaa71d	Display size option default and allowed values with appropriate units. Size option default and allowed values were displayed in bytes, which was confusing for the user. This also lays the groundwork for adding units to time options. Move option parsing functions into a common module so they can be used from the build module.	2021-11-03 15:23:08 -04:00
David Steele	ccc255d3e0	Add TLS Server. The TLS server is an alternative to using SSH for protocol connections to remote hosts. This command is currently experimental and intended only for trial and testing. As such, the new commands and options will not show up in the command-line help unless directly requested.	2021-10-18 14:32:41 -04:00
David Steele	fb3f6928c9	Add configurable storage helpers to create repository storage. Remove the hardcoded storage helpers from storageRepoGet() except for the built-in Posix helper and the special remote helper. The goal is to make storage driver development a bit easier by isolating as much of the code as possible into the driver module. This also makes coverage reporting much simpler for additional drivers since they do not need to provide coverage for storage/helper. Consolidate the CIFS tests into the Posix tests since CIFS is just a special case of the Posix. Test all storage features in the Posix test so that other storage driver tests do not need to provide coverage for storage/storage. Remove some dead code in the storage/s3 test.	2021-10-06 19:27:04 -04:00
David Steele	136d309dd4	Allow stack trace to be specified for errorInternalThrow(). This allows the stack trace to be set when an error is received by the protocol, rather than appending it to the message. Now these errors will look no different than any other error and the stack trace will be reported in the same way. One immediate benefit is that test.pl --vm-out --log-level-test=debug will work for tests that check expect log results. Previously, the test would error at the first check because the stack trace included in the message would not match the expected log output.	2021-10-01 15:29:31 -04:00
David Steele	0e76ccb5b7	Convert filter param/result to Pack type. The Pack type is more compact and flexible than the Variant type. The Pack type also allows binary data to be stored, which is useful for transferring the passphrase in the CipherBlock filter. The primary purpose is to allow more (and more complex) result data to be returned efficiently from the PageChecksum filter. For now the PageChecksum filter still returns the original Variant. Converting the result data will be the subject of a future commit. Also convert filter types to StringId.	2021-09-22 10:48:21 -04:00
David Steele	f4e1babf6b	Migrate command-line help generation to C. Command-line help is now generated at build time so it does not need to be committed. This reduces churn on commits that add configuration and/or update the help. Since churn is no longer an issue, help.auto.c is bzip2 compressed to save space in the binary. The Perl config parser (Data.pm) has been moved to doc/lib since the Perl build path is no longer required. Likewise doc/xml/reference.xml has been moved to src/build/help/help.xml since it is required at build time.	2021-09-08 18:16:06 -04:00
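
Commits 5602f179a1 and 61ce58692f above refer to base-128 varint encoding. The following is a minimal sketch of that general technique, assuming the common little-endian scheme in which each byte carries seven bits of the value and the high bit marks continuation. It is written in Python for brevity, the function names are hypothetical, and it is not taken from the pgBackRest C source.

```python
from typing import Tuple


def varint128_encode(value: int) -> bytes:
    """Encode an unsigned integer as a base-128 varint (hypothetical helper)."""
    if value < 0:
        raise ValueError("only unsigned integers are supported")

    out = bytearray()

    # Emit seven bits at a time, lowest bits first, setting the high bit
    # on every byte except the last to signal that more bytes follow
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)
        value >>= 7

    out.append(value)
    return bytes(out)


def varint128_decode(data: bytes, pos: int = 0) -> Tuple[int, int]:
    """Decode one varint starting at pos; return (value, next position)."""
    result = 0
    shift = 0

    while True:
        if pos >= len(data):
            raise ValueError("truncated varint")

        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift

        # A clear high bit marks the final byte of the value
        if not (byte & 0x80):
            return result, pos

        shift += 7
```

Values below 128 occupy a single byte under this scheme, which is why data dominated by zeroes and small integers packs compactly.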

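
Commit 5ae84d5e47 above lists three path rules for the repo-* commands: no '//' in the path, trailing '/' stripped, and absolute paths must fall under the repo path. A rough sketch of those rules follows, in Python with a hypothetical function name and hypothetical error messages. Returning the path relative to the repo is an assumption made for illustration, not behavior confirmed by the commit message.

```python
def normalize_repo_path(path: str, repo_path: str) -> str:
    """Validate path against the three rules and return it repo-relative.

    Hypothetical helper: the relative return value is an assumption.
    """
    # Rule 1: the path may not contain '//'
    if "//" in path:
        raise ValueError(f"path '{path}' cannot contain //")

    # Rule 2: strip the trailing '/' (a lone '/' is left alone)
    if path != "/":
        path = path.rstrip("/")

    repo_path = repo_path.rstrip("/") or "/"

    # Rule 3: an absolute path must fall under the repo path
    if path.startswith("/"):
        if path == repo_path:
            return ""

        # Compare against the repo path plus a separator so that a
        # sibling such as /repo2 is not mistaken for a child of /repo
        prefix = repo_path if repo_path.endswith("/") else repo_path + "/"

        if not path.startswith(prefix):
            raise ValueError(f"absolute path '{path}' is not in base path '{repo_path}'")

        return path[len(prefix):]

    # Relative paths are taken as already relative to the repo
    return path
```

Appending the separator before the prefix comparison is the one design point worth noting: a bare string prefix check would wrongly accept a sibling directory whose name merely starts with the repo path.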
296 Commits