1
0
mirror of https://github.com/pgbackrest/pgbackrest.git synced 2025-02-07 13:42:41 +02:00

356 Commits

Author SHA1 Message Date
David Steele
c6aaf66e9d Add FUNCTION_LOG_RETURN_STRUCT() and update where appropriate.
The FUNCTION_LOG_RETURN() macro requires logging macros (e.g. FUNCTION_LOG_*_TYPE and FUNCTION_LOG_*_FORMAT) when returning a struct but these macros don't deliver much value since they only output the name of the struct rather than the contents. A copy of the struct is also made during this operation, which is wasteful.

FUNCTION_LOG_RETURN_STRUCT() does not make a copy of the struct and does not require any logging macros. Returned structures are logged as "struct" but this could be made more accurate using __typeof in the future.

Structures as parameters are not addressed here and work as before, i.e. they require logging macros.
2021-01-13 07:49:47 -05:00
David Steele
b21ed97982 Check for missing files in queueNeed().
Missing files would indicate that another process is running on the same spool path, which would be a very bad thing.

This check doesn't cost any additional I/O so it seems like a good idea.
2021-01-12 18:47:19 -05:00
David Steele
aeee83044d
Fix resume after partial delete of backup by prior resume.
If files other than backup.manifest.copy were left in a backup path by a prior resume then the next resume would skip the backup rather than removing it. Since the backup path still existed, it would be found during backup label generation and cause an error if it appeared to be later than the new backup label. This occurred if the skipped backup was full.

The error was only likely on object stores such as S3 because of the order of file deletion. Posix file systems delete from the bottom up because directories containing files cannot be deleted. Object stores do not have directories so files are deleted in whatever order they are provided by the list command. However, the issue can be reproduced on a Posix file system by manually deleting backup.manifest.copy from a resumable backup path.

Fix the issue by removing the resumable backup if it has no manifest files. Also add a new warning message for this condition.

Note that this issue could be resolved by running expire or a new full backup.
2021-01-12 12:38:32 -05:00
David Steele
96fd678662
Add job-retry and job-retry-interval options.
These options specify the number of local worker job retries and the retry interval after one immediate retry.

There is some value in allowing retries to be specified by the user but for the most part these options are for suppressing retries during testing, which can save a lot of time. The bug introduced in d1d25c7 and fixed in 8b86d5e also suggests it is better not to use retries in tests.

Remove the default delayed retries for archive-get/archive-push, leaving only the immediate retry. These commands are retried by PostgreSQL so it doesn't make sense to do too many retries internally.

These options are currently internal.
2021-01-11 15:15:25 -05:00
David Steele
8b86d5ea7a Restore storageRepo() call in archiveGetProtocol().
This call was removed by d1d25c71, which worked for archivePushProtocol() and verifyProtocol() since the encryption options are passed from the main process.

archiveGetProtocol() still retrieves these options in the local process so the repo storage must be loaded first.
2021-01-11 11:34:03 -05:00
David Steele
8567b7f733 Make archive-get locality error generate a global.error file.
Moving this error into the try block ensures that a global.error file is generated, which will be seen by archive-get.
2021-01-08 16:29:56 -05:00
David Steele
951cfa9e90 Remove repo option.
This option was added in advance of the multi-repo functionality but it has no purpose and it is not clear what the validity rules should be.

The option will be added back when multi-repo functionality is committed.
2020-12-31 08:12:35 -05:00
Cynthia Shang
cc90163233
Add empty archive array to info command JSON when stanza is missing.
There is an inconsistency when the JSON is output for the case when a stanza is requested and it does not exist in the repo. This was the only case where the archive array was not added to the JSON. Adding it will simplify the upcoming multi-repo support code.

Also, a redundant test was removed rather than updating it for this case.
2020-12-30 16:17:56 -05:00
David Steele
23f5712d02
Allow option validity to be determined by command role.
Validity by command was not granular enough so numerous options needed be marked internal so users would not stumble across them. Options were also needlessly being passed to roles that had no use for them.

Introduce per-role validity lists that depend on what roles are valid per command. Also add a check to ensure that only valid roles are used with a command.

This commit adds the functionality but does not introduce any new behavior, i.e. all options are valid for all roles that the command is valid for. A subsequent commit will introduce the new role restrictions to make the changes easier to audit.
2020-12-28 09:43:23 -05:00
David Steele
9e9e7c4a0d Move all parse-related rules to parse module.
Data required for parsing was spread between the config and defined modules, mostly for historical reasons because the same data was used by Perl.

Requiring all the parse rules to be accessed with function interfaces makes the code more complicated and new rules harder to implement.

Instead, move the data to the parse module so in the most complex cases no interface functions are needed. This reduces the total amount of code and paves the way for more complex parse rules.
2020-12-17 09:32:31 -05:00
David Steele
f520ecc89a Move help data from define.auto.c/config.auto.c to a pack.
The help data can be represented more compactly in a pack and this separates data needed for help from data needed for parsing, freeing each to have a more appropriate representation.
2020-12-16 15:59:36 -05:00
David Steele
996de0a3e6 Remove cfgCmdNone from CFG_COMMAND_TOTAL.
cfgCmdNone is used to indicate a missing or invalid command so should not be used in the total used for command process.
2020-12-16 11:33:51 -05:00
David Steele
87996558d2
Replace double type with time in config module.
The C code does not use doubles to represent seconds like the Perl code did so time can be represented as an integer which reduces the number of data types that config has to understand.

Also remove Variant doubles since they are no longer used.

Note that not all double code was removed since we still need to display times to the user in seconds and it is possible for the times to be fractional. In the future this will likely be simplified by storing the original user input and using that value when the time needs to be displayed.
2020-12-09 08:59:51 -05:00
David Steele
d4211d3aaf Add retries to PostgreSQL sleep when starting a backup.
Inaccuracies in sleep time or clock skew might make a single sleep insufficient to reach the next second.

Add a few retries to make the process more reliable but still avoid an infinite loop if something is seriously wrong.
2020-12-02 22:41:14 -05:00
David Steele
d1d25c710d Remove extraneous storageRepo() calls.
These calls are not required since cipher info is passed explicitly. They are probably a copy-pasto from some past time when one of these functions required it.
2020-11-30 18:03:24 -05:00
Stefan Fercot
5488de8b6a
Report page checksum errors in info command text output.
This feature currently only works for text output. JSON output is planned for the future.
2020-11-25 12:14:03 -05:00
David Steele
117f03eba1 Prepare configuration module for multi-repository support.
Refactor the code to allow a dynamic number of indexes for indexed options, e.g. pg-path. Our reliance on getopt_long() still limits the number of indexes we can have per group, but once this limitation is removed the rest of the code should be happy with dynamic numbers of indexes (with a reasonable maximum).

Add an option to set a default in each group. This was previously handled by the host-id option but now there is a specific option for each group, pg and repo. These remain internal until they can be fully tested with multi-repo support. They are fully tested for internal usage.

Remove the ConfigDefineOption enum and use the ConfigOption enum instead. They are now equal since the indexed options (e.g. cfgOptRepoHost2) have been removed from ConfigOption.

Remove the config/config test module and add required tests to the config/parse test module. Parsing is now the only way to load a config so this removes some redundancy.

Split new internal config structures and functions into a new header file, config.intern.h. More functions will need to be moved over from config.h but that will need to be done in a future commit to reduce churn.

Add repoIdx to repoIsLocal() and storageRepo*(). Multi-repository support requires that repo locality and storage be accessible by index. This allows, for example, multiple repos to be iterated in a loop. This could be done in a separate commit but doesn't seem worth it since the code is related.

Remove the type parameter from storageRepoGet(). This parameter existed solely to provide coverage for the case where the storage type was invalid. A better pattern is to check that the type is S3 once all other types have been ruled out.
2020-11-23 15:55:46 -05:00
David Steele
7fda83b31e
Allow multiple remote locks from the same main process.
Improve locking on remote processes by introducing an exec-id that is unique to the main process and passed to all remote processes. This allows the remote processes to determine if a lock is held by a remote from the same main process. If so, the lock is allowed.

The exec-id is also useful for associating remote logs with main logs for debugging purposes.
2020-11-23 12:41:54 -05:00
Stefan Fercot
191b8ec18b
Create standby.signal only on PostgreSQL 12 when restore type is standby.
When restore type standby is provided, the recovery.signal isn't needed and may lead to some confusion (see #1236).

Lately, when using pg_basebackup --write-recovery-conf, only the standby.signal file is created. This change would then align with that behaviour.
2020-11-19 16:57:19 -05:00
Cynthia Shang
9aacd3c54a
Move retrieval of archiveResult before while loop in verifyArchive().
The result structure for the archive id being processed only needs to be retrieved once so moving it outside of the WAL path list processing loop is more efficient.
2020-11-18 18:19:49 -05:00
David Steele
3f7a66fffc Remove obsolete call to storageRepo() in archivePushProtocol().
This call to storageRepo() was used to fetch cipher options from a remote to determine if a repo cipher was enabled.

Now the main process does this work and passes the cipher options directly to the local so there is no need to pre-load the repo storage here.
2020-11-09 16:19:37 -05:00
David Steele
cbb9b8fd2b Check archive push queue limit before checking repository.
If the push queue limit has been exceeded then nothing will be pushed to the repo so there is no point in checking it. Worse, a failure in the check would cause drop not to run and potentially fill up the disk, exactly the case this feature was designed to prevent.

The async version already checks the push queue limit before checking the repository so now both versions have the same behavior.
2020-11-09 16:16:41 -05:00
David Steele
8cf47f82e4 Log warnings in archive-push async log.
These warnings were only being reported to PostgreSQL on the console. Now they are also recorded in the async log increasing the chance that they will be seen.

This also improves coverage by requiring a warning during async processing to have a test case, which has been added.
2020-11-09 16:10:59 -05:00
David Steele
d5d1ec6f6f Use a constant to check restore target action.
Checking the default here was fragile. If the default were to change the code would break.

This also removes the only dependency on cfgOptionDefault() outside of the help command.
2020-11-04 11:09:05 -05:00
Stefan Fercot
abe9d90c89
Improve info command output when a stanza is specified but missing.
Return a path missing error when a stanza is specified for the info command but the stanza does not exist in the repository.

Previously [] was returned, which is still the case if no stanza is specified and the repository does not exist.
2020-10-27 08:34:18 -04:00
David Steele
d452e9cc38
Use zero-based indexes when referring to option indexes.
There were a number of places in the code where "hostId" was used, but hostId is just the option group index + 1 so this led to a lot of +1 and -1 to convert the id to an index and vice versa.

Instead just use the zero based index wherever possible. This is pretty much everywhere except when the host-id option is read or set, or where a message is being formatted for the user.

Also fix a bug in protocolRemoteParam() where remotes spawned from the main process could get process ids that were not 0. Only the locals should spawn remotes with process id > 0. This seems to have been harmless since the process id is only a label, but it could be confusing when debugging.
2020-10-26 10:25:16 -04:00
Cynthia Shang
ae35c4f029
Remove unused FUNCTION_LOG_VERIFY_WAL_RANGE* defines.
The defines for FUNCTION_LOG_VERIFY_WAL_RANGE* are not used in the current verify.c and are currently not planned in the continuing development of the verify command, so they are dead code and are therefore being removed.
2020-10-26 07:41:08 -04:00
David Steele
176cf0bf60 Use harnessCfgLoadRaw() in command/command and common/exit unit tests.
The tests were originally written by loading values directly into the configuration before the parser was available.

Update to use harnessCfgLoadRaw() to simplify the tests and make them compatible with upcoming config changes.

Note that some unreachable conditions were removed since they could not be reached via a parsed config, only by munging values directly into the config. cfgOptionTest(optionId) was removed because a non-default value must always be set. cfgOptionValid(cfgOptLogTimestamp) was removed because it is true for all commands except for cfgCmdNone, which is checked with an assert.
2020-10-20 14:54:28 -04:00
David Steele
156b7d48cc Get target-action default from cfgOptionDefault() in restore command.
cfgDefOptionDefault() worked but the default is available without having to peek into config definitions.
2020-10-20 12:39:23 -04:00
David Steele
41789d70d1 Remove cfgOptionId() and replace it with cfgParseOption().
cfgOptionId() did not recognize deprecated options which made the help command throw errors when they were specified on the command line. cfgParseOption() will correctly identify deprecated options.

cfgParseOption() can also be used in cfgParse() to reduce code duplication when parsing info out of the option value returned by optionFind().

Finally, code the option key index separately in parse.auto.c. For now they are simply added back together but future code will need them separated.
2020-10-20 11:24:26 -04:00
David Steele
6414ae9707 Remove ConfigDefineCommand enum.
This has always been equivalent to the ConfigCommand enum so it just adds complexity.

It was created for symmetry with ConfigDefineOption, which will also be removed soon.
2020-10-19 18:17:47 -04:00
David Steele
7d069a2b91 Remove indexed option constants.
These constants don't scale well as the index total is increased for an option.

The core code rarely uses these options and they are easily replaced with cfgOptionName().

The tests had started to make use of the constants, so provide functions that build the option name from the optionId and, optionally, the optionKey.
2020-10-19 14:03:48 -04:00
Stefan Fercot
86275c4f85
Expire history files.
WAL timeline history files were not being expired because they were small and generally not very plentiful.

However, in some cases large numbers of history files may be generated so it makes sense to remove useless history files to keep things tidy.

The history file for the oldest retained timeline is kept for debugging purposes even though it is not used for recovery.
2020-10-16 07:42:03 -04:00
David Steele
e0f09687e4
Add option groups.
Group related options together so operations (e.g. valid, test, index total) can be performed on all options in the group.

Previously, options at the top of the hierarchy of the related options were used to do these tests. This was prone to error as option relationships changed and it was not always clear which option (or options) should be used.
2020-10-08 10:52:19 -04:00
Cynthia Shang
ad79932ba5
Add internal verify command.
Scan the WAL archive for missing or invalid files and build up ranges of WAL that will be used to verify backup integrity. A number of errors and warnings are currently emitted but they should not be considered authoritative (yet).

The command is incomplete so is marked internal.
2020-09-22 11:57:38 -04:00
David Steele
927d9adbee
Improve PostgreSQL version identification.
Previously, catalog versions were fixed for all versions which made maintaining the catalog versions during PostgreSQL beta and release candidate cycles very painful. A version of pgBackRest which was functionally compatible was rendered useless by a catalog version bump in PostgreSQL.

Instead use only the control version to identify a PostgreSQL version when possible. Some older versions require a catalog version to positively identify a PostgreSQL version, so include them when required.

Since the catalog number is required to work with tablespaces it will need to be stored. There's already a copy of it in backup.info so use that (even though we have been ignoring it in the C versions).
2020-09-18 16:55:26 -04:00
David Steele
fc77c51182
Improve working directory error message.
Improve the wording of the error message and add a hint to make it clearer what is wrong and how the user can fix it.

Also change the assert to a regular error since this is not an internal error.
2020-09-11 10:10:25 -04:00
Cynthia Shang
b8efb13bcb Move archiveIdComparator() to archive/common module. 2020-09-08 12:28:56 -04:00
David Christensen
9fd31913a8 Add hint about starting the stanza when WAL segment not found.
If a stop command has been issued the check command fails due to archiving timing out.

Provide a hint to document this situation and point the user in the proper direction.
2020-09-03 07:49:49 -04:00
David Steele
8c2960fab3
Add archive-mode option to disable archiving on restore.
When restoring a cluster that will be promoted but is not intended to be the new primary, it is important to disable archiving to avoid polluting the repository with useless WAL. This option makes disabling archiving a bit easier.
2020-08-25 15:05:41 -04:00
David Steele
1812725c8e Fix typo. 2020-08-25 12:50:06 -04:00
David Steele
859b8a50fd Remove unused parameter from cmdBegin(). 2020-08-20 14:16:36 -04:00
David Steele
fccca0d716 Refactor option logging into a general-purpose function. 2020-08-20 14:11:40 -04:00
David Steele
959f77cd6a
Add general-purpose statistics collector.
Currently each module that needs to collect statistics implements custom code to do so. This is cumbersome.

Create a general purpose module for collecting and reporting statistics. Statistics are output in the log at detail level, but there are other uses they could be put to eventually.

No new functionality is added. This is just a drop-in replacement for the current statistics, with the advantage of being more flexible.

The new stats are slower because they involve a list lookup, but performance testing shows stats can be updated at about 40,000/ms which seems fast enough for our purposes.
2020-08-20 14:04:26 -04:00
David Steele
7fdbd94e39
Implement IoClient/IoSession interfaces for SocketClient/SocketSession.
Following up on 111d33c, implement the new interfaces for socket client/session. Now HTTP objects can be used over TLS or plain sockets.

This required adding ioSessionFd() and ioSessionRole() to provide the functionality of sckSessionFd() and sckSessionType(). sckClientHost() and sckClientPort don't make sense in a generic interface so they were replaced with ioSessionName().
2020-08-10 16:03:38 -04:00
Floris van Nee
54c3c39645
Delay backup remote connection close until after archive check.
Only close the remote connection after verifying that the WAL files have been received. This is necessary if the archive_command on the PostgreSQL host is conditional, i.e. archiving only happens while a backup lock is held, to ensure all WAL segments are archived.
2020-08-10 11:35:09 -04:00
David Steele
4d22d6eeca
Move file descriptor read/write ready into IoRead/IoWrite.
Move sckSessionReadyRead()/Write() into the IoRead/IoWrite interfaces. This is a more logical place for them and the alternative would be to add them to the IoSession interface, which does not seem like a good idea.

This is mostly a refactor, but a big change is the select() logic in fdRead.c has been replaced by ioReadReady(). This was duplicated code that was being used by our protocol but not TLS. Since we have not had any problems with requiring poll() in the field this seems like a good time to remove our dependence on select().

Also, IoFdWrite now requires a timeout so update where required, mostly in the tests.
2020-08-08 11:23:37 -04:00
David Steele
cde2c756ea Rename handle to fd.
Pretty much everywhere handle is used what is really meant is file descriptor (fd). This terminology got migrated over from Perl and is just not quite correct, or at least not as correct as fd.

There were also plenty of places fd was used so now all uses are consistent.

The Perl code was not updated but might be in a future commit.
2020-08-05 18:25:07 -04:00
David Steele
94d3a01f73
Proactively close file descriptors after forking async process.
PostgreSQL may be using most of the available file descriptors when it executes the the archive-get/archive-push commands (especially archive-get). This can lead to problems depending on how many file descriptors are needed for parallelism in the async process.

Proactively free file descriptors between 3 and 1023 to help ensure there are enough available for reasonable values of process-max, i.e. <= 300.
2020-08-04 13:20:01 -04:00
David Steele
3e9dce0d76 Rename strPtr()/strPtrNull() to strZ()/strZNull().
We use the Z suffix in many functions to indicate that we are expecting a zero-terminated string so make this function conform to the pattern.

As a bonus the new name is a bit shorter, which is a good quality in a commonly-used function.
2020-07-30 07:49:06 -04:00