pgbackrest

mirror of https://github.com/pgbackrest/pgbackrest.git synced 2024-12-14 10:13:05 +02:00

Author	SHA1	Message	Date
David Steele	4a075b7252	Add support for more Pack types. Since the pack type was stored in 4 bits, only 15 values were allowed (0 was reserved). Allow virtually unlimited types by storing type info in a base-128 encoded integer following the tag when the type bits in the tag are set to 0xF. Also separate the type IDs used in the pack (PackTypeMap) from those presented to the user (PackType). The prior PackType enum exposed implementation details to the user, e.g. pckTypeUnknown.	2021-06-08 12:55:00 -04:00
David Steele	a76cb57c80	Remove parse.auto.h. At one time there was a lot more in this header but it got merged with the enums and constants in config.auto.c, leaving only the ConfigOptionType enum. Auto-generating the ConfigOptionType enum is not very useful since new option types require code in parse.c.	2021-06-07 08:48:09 -04:00
David Steele	7a449f0e49	Remove unused command string constants.	2021-06-07 08:18:02 -04:00
David Steele	831ee81466	Rename default command role to main. Main makes more sense because we refer to the main process in the code, not the default process. The word default is pretty overloaded anyway.	2021-05-20 14:39:47 -04:00
David Steele	5464ac83d1	Convert option values in commands to StringId. Convert most of the remaining options that benefit from being StringIds. Since all the command modules can include config.h directly it makes sense to auto-generate these values instead of manually creating an enum for each one. For the time being StringIds are not being auto-generated because the StringId code does not exist in Perl. However, the *_Z zero-terminated constants for each allowed option value are now auto-generated.	2021-05-11 17:24:30 -04:00
David Steele	95fb002b85	Preserve unused YAML fix for older Debian/Ubuntu distributions. This fix was not used since backports worked for older Debians and the old affected Ubuntus are EOL. Still, seems worth preserving just in case it comes up elsewhere.	2021-04-27 12:04:50 -04:00
David Steele	468aa79ea8	Remove auto-generated option String constants. These are rarely used and end up using more space than they save since most of them are never referenced. Replace VARSTR() with VARSTRDEF() where these constants are being used.	2021-04-22 19:10:13 -04:00
David Steele	fe4ba455ed	Move configuration definition to src/build/config/config.yaml. Moving to YAML allows the configuration data to be read by C programs. Also go back to using YAML::XS since it is the only implementation that has proper boolean support.	2021-03-08 16:01:05 -05:00
David Steele	1dbb3bf50b	Multiple repository support. Up to four repositories may be configured. A potential benefit is the ability to have a local repository for fast restores and a remote repository for redundancy. Some commands, e.g. stanza-create/stanza-upgrade, will automatically work with all configured repositories while others, e.g. stanza-delete, will require a repository to be specified using the repo option. See the command reference for details on which commands require the repository to be specified. Note that the repo option is not required when only repo1 is configured in order to maintain backward compatibility. However, the repo option is required when a single repo is configured as, e.g. repo2. This is to prevent command breakage if a new repository is added later. The archive-push command will always push WAL to the archive in all configured repositories but backups will need to be scheduled individually for each repository. In many cases this is desirable since backup types and retention will vary by repository. Likewise, restores must specify a repository. It is generally better to specify a repository for restores that has low latency/cost even if that means more recovery time. Only restore testing can determine which repository will be most efficient. For single repository configurations there should be no change in behavior.	2021-03-08 13:31:13 -05:00
David Steele	088662d986	GCS support for repository storage. GCS and GCS-compatible object stores can now be used for repository storage.	2021-03-05 12:13:51 -05:00
David Steele	d1aa765a9d	Consolidate less commonly used repository storage options. The following options are renamed as specified: repo1-azure-ca-file -> repo1-storage-ca-file, repo1-azure-ca-path -> repo1-storage-ca-path, repo1-azure-host -> repo1-storage-host, repo1-azure-port -> repo1-storage-port, repo1-azure-verify-tls -> repo1-storage-verify-tls, repo1-s3-ca-file -> repo1-storage-ca-file, repo1-s3-ca-path -> repo1-storage-ca-path, repo1-s3-host -> repo1-storage-host, repo1-s3-port -> repo1-storage-port, repo1-s3-verify-tls -> repo1-storage-verify-tls. The old option names (e.g. repo1-s3-port) will continue to work for repo1, but repo2, etc. will require the new names.	2021-03-02 13:51:40 -05:00
David Steele	6a717e032f	Set config path in configure script. This allows the config path to be modified with a parameter to the configure script, though this commit does not do that. Update the Perl code generator to allow literals so that defaults can be C defines rather than static strings.	2021-02-10 14:46:26 -05:00
David Steele	b65c370346	Add repo-get command.	2021-02-05 10:39:03 -05:00
David Steele	218cd078a6	Add repo-ls command.	2021-02-05 10:07:43 -05:00
Stefan Fercot	4b46115345	Add archive-mode-check option. This option disallows the PostgreSQL archive_mode=always setting and disabling it allows the setting.	2021-02-02 13:43:14 -05:00
David Steele	d2057c53bd	Use YAML::Any module instead of YAML::XS in Perl. YAML::XS requires libyaml so it is not as portable as pure Perl versions of YAML. Instead of using YAML::PP just use the general YAML::Any module which uses whatever is installed. We are not concerned about performance for YAML so whatever works is fine.	2021-01-24 15:06:38 -05:00
Cynthia Shang	f32eb9b94e	Partial multi-repository implementation. Multi-repository implementations for the archive-push, check, info, stanza-create, stanza-upgrade, and stanza-delete commands. Multi-repo configuration is disabled so there should be no behavioral changes between these commands and their current single-repo implementations. Multi-repo documentation and integration tests are still in the multi-repo development branch. All unit tests work as multi-repo since they are able to bypass the configuration restrictions.	2021-01-21 15:21:50 -05:00
David Steele	96fd678662	Add job-retry and job-retry-interval options. These options specify the number of local worker job retries and the retry interval after one immediate retry. There is some value in allowing retries to be specified by the user but for the most part these options are for suppressing retries during testing, which can save a lot of time. The bug introduced in `d1d25c7` and fixed in `8b86d5e` also suggests it is better not to use retries in tests. Remove the default delayed retries for archive-get/archive-push, leaving only the immediate retry. These commands are retried by PostgreSQL so it doesn't make sense to do too many retries internally. These options are currently internal.	2021-01-11 15:15:25 -05:00
David Steele	0e1612cda1	Remove explicit command lists where they equal the default. This reduces noise in the file and new commands will automatically get these options.	2020-12-31 12:29:11 -05:00
David Steele	108038292c	Audit options valid for expire command.	2020-12-31 12:13:20 -05:00
David Steele	0acfcb669e	Audit options valid for start/stop commands.	2020-12-31 11:10:48 -05:00
David Steele	09fdde359c	Limit pg option validity and make it command-line only. The pg option only has one current usage, to let the backup local know which pg index it should copy files from. There are other possible uses for this option, but they need thought, tests, and documentation.	2020-12-31 10:08:58 -05:00
David Steele	951cfa9e90	Remove repo option. This option was added in advance of the multi-repo functionality but it has no purpose and it is not clear what the validity rules should be. The option will be added back when multi-repo functionality is committed.	2020-12-31 08:12:35 -05:00
David Steele	9bf7dbf6a2	Do not pass pg-local/repo-local to a remote process. This was a hack to prevent the remote from loading host settings, which is now handled by option validity for command roles. These options are still useful so don't remove them, but do leave them internal for now.	2020-12-30 16:03:49 -05:00
David Steele	141466875f	Remove redundant command list in repo-s3-key option. Use the repo-type command list as similar repo options do.	2020-12-30 10:51:26 -05:00
David Steele	abb8ebe58b	Limit option validity by command role. Building on `23f5712`, limit option validity by role. This is mostly for options that weren't needed for certain roles but were harmless. However, the upcoming multi repository functionality requires the granularity implemented here. The remote role benefits since host options can be automatically excluded when building the options. Also, many options that are only required for the default role (e.g. repo-retention-full) no longer need to be passed in tests for other roles.	2020-12-29 15:49:37 -05:00
David Steele	23f5712d02	Allow option validity to be determined by command role. Validity by command was not granular enough so numerous options needed to be marked internal so users would not stumble across them. Options were also needlessly being passed to roles that had no use for them. Introduce per-role validity lists that depend on what roles are valid per command. Also add a check to ensure that only valid roles are used with a command. This commit adds the functionality but does not introduce any new behavior, i.e. all options are valid for all roles that the command is valid for. A subsequent commit will introduce the new role restrictions to make the changes easier to audit.	2020-12-28 09:43:23 -05:00
David Steele	9e9e7c4a0d	Move all parse-related rules to parse module. Data required for parsing was spread between the config and defined modules, mostly for historical reasons because the same data was used by Perl. Requiring all the parse rules to be accessed with function interfaces makes the code more complicated and new rules harder to implement. Instead, move the data to the parse module so in the most complex cases no interface functions are needed. This reduces the total amount of code and paves the way for more complex parse rules.	2020-12-17 09:32:31 -05:00
David Steele	f520ecc89a	Move help data from define.auto.c/config.auto.c to a pack. The help data can be represented more compactly in a pack and this separates data needed for help from data needed for parsing, freeing each to have a more appropriate representation.	2020-12-16 15:59:36 -05:00
David Steele	996de0a3e6	Remove cfgCmdNone from CFG_COMMAND_TOTAL. cfgCmdNone is used to indicate a missing or invalid command so should not be used in the total used for command process.	2020-12-16 11:33:51 -05:00
David Steele	39963f6aa5	Remove cfgDefOptionIndexTotal(). This function was only used in one place, which was better served by cfgOptionGroupIdxTotal().	2020-12-14 14:37:23 -05:00
David Steele	87996558d2	Replace double type with time in config module. The C code does not use doubles to represent seconds like the Perl code did so time can be represented as an integer which reduces the number of data types that config has to understand. Also remove Variant doubles since they are no longer used. Note that not all double code was removed since we still need to display times to the user in seconds and it is possible for the times to be fractional. In the future this will likely be simplified by storing the original user input and using that value when the time needs to be displayed.	2020-12-09 08:59:51 -05:00
David Steele	b0ea337965	Add pg-database option. In some rare cases there is no postgres database so this option may be used to specify an alternate database.	2020-12-02 22:42:50 -05:00
David Steele	117f03eba1	Prepare configuration module for multi-repository support. Refactor the code to allow a dynamic number of indexes for indexed options, e.g. pg-path. Our reliance on getopt_long() still limits the number of indexes we can have per group, but once this limitation is removed the rest of the code should be happy with dynamic numbers of indexes (with a reasonable maximum). Add an option to set a default in each group. This was previously handled by the host-id option but now there is a specific option for each group, pg and repo. These remain internal until they can be fully tested with multi-repo support. They are fully tested for internal usage. Remove the ConfigDefineOption enum and use the ConfigOption enum instead. They are now equal since the indexed options (e.g. cfgOptRepoHost2) have been removed from ConfigOption. Remove the config/config test module and add required tests to the config/parse test module. Parsing is now the only way to load a config so this removes some redundancy. Split new internal config structures and functions into a new header file, config.intern.h. More functions will need to be moved over from config.h but that will need to be done in a future commit to reduce churn. Add repoIdx to repoIsLocal() and storageRepo*(). Multi-repository support requires that repo locality and storage be accessible by index. This allows, for example, multiple repos to be iterated in a loop. This could be done in a separate commit but doesn't seem worth it since the code is related. Remove the type parameter from storageRepoGet(). This parameter existed solely to provide coverage for the case where the storage type was invalid. A better pattern is to check that the type is S3 once all other types have been ruled out.	2020-11-23 15:55:46 -05:00
David Steele	7fda83b31e	Allow multiple remote locks from the same main process. Improve locking on remote processes by introducing an exec-id that is unique to the main process and passed to all remote processes. This allows the remote processes to determine if a lock is held by a remote from the same main process. If so, the lock is allowed. The exec-id is also useful for associating remote logs with main logs for debugging purposes.	2020-11-23 12:41:54 -05:00
David Steele	41789d70d1	Remove cfgOptionId() and replace it with cfgParseOption(). cfgOptionId() did not recognize deprecated options which made the help command throw errors when they were specified on the command line. cfgParseOption() will correctly identify deprecated options. cfgParseOption() can also be used in cfgParse() to reduce code duplication when parsing info out of the option value returned by optionFind(). Finally, code the option key index separately in parse.auto.c. For now they are simply added back together but future code will need them separated.	2020-10-20 11:24:26 -04:00
David Steele	6414ae9707	Remove ConfigDefineCommand enum. This has always been equivalent to the ConfigCommand enum so it just adds complexity. It was created for symmetry with ConfigDefineOption, which will also be removed soon.	2020-10-19 18:17:47 -04:00
David Steele	7d069a2b91	Remove indexed option constants. These constants don't scale well as the index total is increased for an option. The core code rarely uses these options and they are easily replaced with cfgOptionName(). The tests had started to make use of the constants, so provide functions that build the option name from the optionId and, optionally, the optionKey.	2020-10-19 14:03:48 -04:00
David Steele	e0f09687e4	Add option groups. Group related options together so operations (e.g. valid, test, index total) can be performed on all options in the group. Previously, options at the top of the hierarchy of the related options were used to do these tests. This was prone to error as option relationships changed and it was not always clear which option (or options) should be used.	2020-10-08 10:52:19 -04:00
David Steele	98f6a1cffd	Remove unused cfgDefOptionPrefix() function and data. This function was made obsolete when Perl was removed in `f0ef73db`.	2020-10-07 12:15:50 -04:00
David Steele	9377d05072	Add repo-azure-endpoint option. This option allows alternate endpoints (e.g. Azure Government) to be configured.	2020-10-06 17:15:48 -04:00
Cynthia Shang	ad79932ba5	Add internal verify command. Scan the WAL archive for missing or invalid files and build up ranges of WAL that will be used to verify backup integrity. A number of errors and warnings are currently emitted but they should not be considered authoritative (yet). The command is incomplete so is marked internal.	2020-09-22 11:57:38 -04:00
David Steele	8c2960fab3	Add archive-mode option to disable archiving on restore. When restoring a cluster that will be promoted but is not intended to be the new primary, it is important to disable archiving to avoid polluting the repository with useless WAL. This option makes disabling archiving a bit easier.	2020-08-25 15:05:41 -04:00
David Steele	851f2e814e	Automatically retrieve temporary S3 credentials on AWS instances. Automatically retrieve the role and temporary credentials for S3 when the AWS instance is associated with an IAM role. Credentials are automatically updated when they are <= 5 minutes from expiring. Basic configuration is to set repo1-s3-key-type=auto. repo1-s3-role can be used to set a specific role, otherwise it will be retrieved automatically.	2020-08-25 10:38:49 -04:00
Stefan Fercot	d3dd32a031	Add expire-auto option. This allows automatic expiration after a successful backup to be disabled.	2020-07-14 08:12:25 -04:00
David Steele	2f7823c627	Add shared access signature (SAS) authorization for Azure. A shared access signature (SAS) provides granular, delegated access to resources in a storage account. This is often preferable to using a shared key which provides more access and is a greater security risk if compromised.	2020-07-09 14:46:48 -04:00
David Steele	3f4371d7a2	Azure support for repository storage. Azure and Azure-compatible object stores can now be used for repository storage. Currently only shared key authentication is supported but SAS will be added soon.	2020-07-02 16:24:34 -04:00
David Steele	da4f15663b	Improve error when pg1-path option missing for archive-get command. The assert thrown was not as descriptive as a proper option missing error.	2020-06-10 11:41:08 -04:00
David Steele	20d8c76b6c	Ignore pg-host* and repo-host* options for the remote command. The purpose of the remote command is to get access to local resources, so a remote should never start another remote. However, this could happen if there were host settings on the remote host, which ended badly with lock errors, loops, etc. Add pg-local and repo-local options to indicate that the resource is local even if there are host settings. Note that for the time being these options are internal and not intended for general usage. However, this is likely the direction needed to allow for more symmetric and manageable configurations.	2020-05-22 13:51:26 -04:00
David Steele	ea9147e2e0	Reduce buffer-size default to 1MiB. The prior default was determined by benchmarking the Perl code prior to the 1.0 release. In general buffer allocation was more expensive in Perl so large buffers gave the best performance. This was due to multiple buffer allocations for each filter in an IO operation. The C code allocates fixed buffers for each IO operation so the cost for buffer allocation is lower than Perl. That being the case it made sense to benchmark the C code to determine the optimal buffer default. The performance/storage tests were used to measure the performance of a variety of filters. 1GiB of data was processed by each filter 10 times and the results of the tests were averaged. While most buffer sizes gave similar performance, 1MiB appeared to perform the best overall. Of course, different architectures are likely to yield different results but this seems like a sensible default. The buffer-size option may still need to be manually configured to give optimal results. 
Raw test data for reference:
4MB buffer (prior default)
copy time 1807ms, avg time 180ms, avg throughput: 5942MB/s
md5 time 14200ms, avg time 1420ms, avg throughput: 756MB/s
sha1 time 11431ms, avg time 1143ms, avg throughput: 939MB/s
sha256 time 23463ms, avg time 2346ms, avg throughput: 457MB/s
gzip -6 time 381199ms, avg time 38119ms, avg throughput: 28MB/s
lz4 -1 time 15484ms, avg time 1548ms, avg throughput: 693MB/s
1MB buffer (new default)
copy time 1760ms, avg time 176ms, avg throughput: 6100MB/s
md5 time 13739ms, avg time 1373ms, avg throughput: 781MB/s
sha1 time 11025ms, avg time 1102ms, avg throughput: 973MB/s
sha256 time 22539ms, avg time 2253ms, avg throughput: 476MB/s
gzip -6 time 372995ms, avg time 37299ms, avg throughput: 28MB/s
lz4 -1 time 15118ms, avg time 1511ms, avg throughput: 710MB/s
512K buffer
copy time 1782ms, avg time 178ms, avg throughput: 6025MB/s
md5 time 13724ms, avg time 1372ms, avg throughput: 782MB/s
sha1 time 10959ms, avg time 1095ms, avg throughput: 979MB/s
sha256 time 22982ms, avg time 2298ms, avg throughput: 467MB/s
gzip -6 time 378120ms, avg time 37812ms, avg throughput: 28MB/s
lz4 -1 time 15484ms, avg time 1548ms, avg throughput: 693MB/s
256K buffer
copy time 1805ms, avg time 180ms, avg throughput: 5948MB/s
md5 time 13706ms, avg time 1370ms, avg throughput: 783MB/s
sha1 time 11074ms, avg time 1107ms, avg throughput: 969MB/s
sha256 time 22588ms, avg time 2258ms, avg throughput: 475MB/s
gzip -6 time 372645ms, avg time 37264ms, avg throughput: 28MB/s
lz4 -1 time 16346ms, avg time 1634ms, avg throughput: 656MB/s	2020-05-19 16:58:49 -04:00


160 Commits