Allow a single linefeed-terminated line to be read or written. This is useful for various protocol implementations, including HTTP and pgBackRest's protocol.
On read the maximum line size is limited to buffer-size to prevent runaway memory usage in case a linefeed is not found. This seems fine for HTTP but we may need to revisit this decision when implementing the pgBackRest protocol. Another option would be to increase the minimum buffer size (currently 16KB).
This test has been flapping since 9b9396c7. It seems to be some kind of timing issue since all integration tests pass and this unit passes on all other VMs. It only happens on Travis and is not reproducible in any development environment that we have tried.
For now, disable the test since the constant flapping is causing major delays in testing and quite a bit of time has been spent trying to identify the root cause. We are actively developing these tests and hope the issue will be identified during the course of normal development.
A number of improvements were made to the tests while searching for this issue. While none of them helped, it makes sense to keep the improvements.
Duplicating a non-multi-value option was not throwing the correct message when the option was a boolean.
The reason was that the option was being validated as a boolean before the multi-value check was being done. The validation code assumed it was operating on a string but was instead operating on a string list causing an assertion to fail.
Since it's not safe to do the multi-value check so late, move it up to the command-line and configuration file parse phases instead.
Reported by Jesper St John.
Previously this was done in two separate places by checking if an option was type hash or list.
Bad enough that it was in two places, but an upcoming bug fix will add another instance so make it a function.
There doesn't seem to be any need to implement this as a filter since current use cases (S3 authentication) work on small datasets.
So, use the single function method provided by OpenSSL for simplicity.
This constructor creates a Buffer object directly from a zero-terminated string. The old way was to create a String object first, then convert that to a Buffer using bufNewStr().
Updated in all places that used the old pattern.
PostgreSQL 11 introduces configurable WAL segment sizes, from 1MB to 1GB.
There are two areas that needed to be updated to support this: building the archive-get queue and checking that WAL has been archived after a backup. Both operations require the WAL segment size to properly build a list.
Checking the archive after a backup is still implemented in Perl and has an active database connection, so just get the WAL segment size from the database.
The archive-get command does not have a connection to the database, so get the WAL segment size from pg_control instead. This requires a deeper inspection of pg_control than has been done in the past, so it seemed best to copy the relevant data structures from each version of PostgreSQL and build a generic interface layer to address them. While this approach is a bit verbose, it has the advantage of being relatively simple, and can easily be updated for new versions of PostgreSQL.
Since the integration tests generate pg_control files for testing, teach Perl how to generate files with the correct offsets for both 32-bit and 64-bit architectures.
Unsecured, passwordless SSH can be a scary thing. If an attacker gains access to one system they can easily hop to other systems.
Add documentation on how to use the command parameter in authorized_keys to limit ssh to running a single command, pgbackrest. There is more that could be done for security but this likely addresses most needs.
Also change references to "trusted ssh" to "passwordless ssh" since this seems more correct.
Suggested by Stephen Frost, Magnus Hagander.
Use checksums rather than timestamps to determine if files have changed. This is useful in cases where the timestamps may not be trustworthy, e.g. when performing an incremental after failing over to a standby.
If checksum delta is enabled then checksums will be used for verification of resumed backups, even if they are full. Resumes have always used checksums to verify the files in the repository, enabling delta performs checksums on the database files as well.
Note that the user must manually enable this feature in cases were it would be useful or just keep in enabled all the time. A future commit will address automatically enabling the feature in cases where it seems likely to be useful.
Contributed by Cynthia Shang.
This option was previously allowed on the command-line only for no particular reason that we could determine.
Being able to specify it in the config file seems like a good idea and won't change current usage.
Contributed by Cynthia Shang.
Apparently we never needed to run this function remotely.
It will be needed by the backup checksum delta feature, so implement it now.
Contributed by Cynthia Shang.
The test to make sure that some files (e.g. pg_control) do not get removed during the backup was lost during the storage refactor committed at de7fc37f.
This did not impact the integrity of the backups, but bring it back since it is a nice sanity check.
Contributed by Cynthia Shang.
As we add storage drivers it's important to keep the tests for each completely separate. Rather than have three tests for each driver, standardize on having a single test unit for each driver.
This is a workaround for inefficient handling of many setjmps in gcc >= 4.9. Setjmp is used in all error handling, but in the unit tests each test macro contains an error handling block so they add up pretty quickly for large unit tests.
Enabling -ftree-coalesce-vars in affected versions reduces build time and memory requirements by nearly an order of magnitude. Even so, compiles are much slower than gcc <= 4.8.
We submitted a bug for this at: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87316
Which was marked as a duplicate of: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63155
For read-only repositories the Posix and CIFS drivers behave exactly the same. Since that's all we support in C right now it's valid to treat them as the same thing. An assertion has been added to remind us to add the CIFS driver before allowing the repository to be writable.
Mostly we want to make sure that the C code does not blow up when the repository type is CIFS.
Previously it was the responsibility of the individual tests to clean up after themselves. Now the test harness now does the cleanup automatically.
This means that some paths/files need to be recreated with each run but that doesn't happen very often.
An attempt has been made to remove all redundant cleanup code but it's hard to know if everything has been caught. No issues will be caused by anything that was missed, but they will continue to chew up time in the tests.
Storing the expect log (created by common/harnessLog) in the regular test directory was not ideal. It showed up in tests and made it difficult to clear the test directory between each run.
Move the expect log to a purpose-built directory one level up so it does not interfere with regular testing.
These are separated the same way in the Perl code where the remote storage driver is located in the Protocol module. However, in the C code the intention is to implement the remote storage driver as a regular driver in the storage layer rather than making a special case out of it.
So, merge the storage helpers. This also has the benefit of making the code a bit simpler.
Also separate storageSpool() and storageSpoolWrite() to make it clearer which operations require write access and to maintain consistency with the other storage helper functions.
If the total bytes read from the expect log file was 0 then the last byte of whatever was in memory before harnessLogBuffer would be set to 0.
On 32-bit systems this expressed as the high order byte of a pointer being cleared and wackiness (in the form of segfaults) ensued.
Fixed parameter constructors made adding new interface functions a burden, so we switched to using structs to define interfaces in the storage module at c49eaec7.
While propagating this pattern to the IO interfaces it became obvious that the existing variable parameter function pattern (begun in the storage module) was more succinct and consistent with the existing code.
So, use variable parameter functions to define all interfaces. This assumes that the non-interface parameters will be fixed, which seems reasonable for low-level code.
C or Perl coverage tests can now be run on any VM provided a recent enough version of Devel::Cover or lcov is available.
For now, leave u18 as the only VM to run coverage tests due to some issues with older versions of lcov.
The external storage interfaces (Storage, StorageFileRead, etc.) have been stable for a while, but internally they were calling the posix driver functions directly.
Create driver interfaces for storage, fileRead, and fileWrite and remove all references to the posix driver outside storage/driver/posix (with the exception of a direct call to pathRemove() in Perl LibC).
Posix is still the only available driver so more adjustment may be needed, but this should represent the bulk of the changes.
The posix driver was developed over time and the naming is not very consistent.
Rename the files and functions to work well with other drivers and generally favor longer names since the driver functions are seldom (eventually never) used outside the driver itself.
The Storage object represents some some optional parameters as negated if the default is true. This allows sensible defaults without having to specify most optional parameters.
However, there's no need to propagate this down to functions that require all parameters to be passed -- it makes the code and logging more confusing. Rename the parameters and update logic to remove negations.
Previously, debug log functions had to handle NULLs and truncate output to the available buffer size. This was verbose for both coding and testing.
Instead, create a function/macro combination that allows log functions to return a simple String object. The wrapper function takes care of the memory context, handles NULLs, and truncates the log string based on the available buffer size.
The archive-get command will only be executed in C if the repository is local, unencrypted, and type posix or cifs. Admittedly a limited use case, but this is just the first step in migrating the archive-get command entirely into C.
This is a direct migration from the Perl code (including messages) to integrate as seamlessly with the remaining Perl code as possible. It should not be possible to determine if the C version is running unless debug-level logging is enabled.
The lock is now released before the fork and reacquired after the fork so the parent process no longer needs to worry about clearing the lock.
This is the same locking mechanism that will be used once archive-get-async is exec'd as a separate command, so introduce it now to simplify testing.
The info messages were spread around and logged differently based on the execution path and in some cases logged nothing at all.
Temporarily track the async server status with a flag so that info messages are not output in the async process. The async process will be refactored as a separate command to be exec'd in a future commit.
% characters caused issues in backup/restore due to filenames being appended directly into a format string.
Reserved XML characters (<>&') caused issues in the S3 driver due to improper escaping.
Add a file with all common special characters to regression testing.
File names with uncommon characters (e.g. @) caused authentication failures due to S3 encoding them correctly while the S3 driver did not.
Reported by Dan Farrell.
By default Valgrind does not exit with an error code when a non-fatal error is detected, e.g. unfreed memory. Use the --error-exitcode option to enabled this behavior.
Update some minor issues discovered in the tests as a result. Luckily, no issues were missed in the core code.
Basic functions to detect the presence of stanza or all stop files and error when they are present.
The functionality to detect stop files without error was not migrated. This functionality is only used by stanza-delete and will be migrated with that command.
Implement rules for generating paths within the archive part of the repository. Add a helper function, storageRepo(), to create the repository storage based on configuration settings.
The repository storage helper is located in the protocol module because it will support remote file systems in the future, just as the Perl version does.
Also, improve the existing helper functions a bit using string functions that were not available when they were written.
Use JSON code now that it is available and remove temporary hacks used to get things working initially.
Use passed storage objects rather than using storageLocal(). All storage objects in C are still local but this won't always be the case.
Also, move Postgres version conversion functions to postgres/info.c since they have no dependency on the info objects and will likely be useful elsewhere.
A return code of 1 from the archive-get was being logged as an error message at info level but otherwise worked correctly.
Also improve info messages when an archive segment is or is not found.
The Perl functions do so and the integration tests rely on checking for these errors. This has been exposed as more functionality is moved into C.
Passing the errors types is now a bit complicated so instead use a flag to determine which errors to throw.
Previously an error would be generated if other files were present and not owned by the PostgreSQL user. This hasn't been a big deal in practice but it could cause issues.
Also add tests to make sure the same logic applies with links to files, i.e. all other files in the directory should be ignored. This was actually working correctly, but there were no tests for it before.
Bug Fixes:
* Fix issue where relative links in $PGDATA could be stored in the backup with the wrong path. This issue did not affect absolute links and relative tablespace links were caught by other checks. (Reported by Cynthia Shang.)
* Remove incompletely implemented online option from the check command. Offline operation runs counter to the purpose of this command, which is to check if archiving and backups are working correctly. (Reported by Jason O'Donnell.)
* Fix issue where errors raised in C were not logged when called from Perl. pgBackRest properly terminated with the correct error code but lacked an error message to aid in debugging. (Reported by Douglas J Hunley.)
* Fix issue when a boolean option (e.g. delta) was specified more than once. (Reported by Yogesh Sharma.)
Features:
* Allow any option to be set in an environment variable. This includes options that previously could only be specified on the command line, e.g. stanza, and secret options that could not be specified on the command-line, e.g. repo1-s3-key-secret.
* Exclude temporary and unlogged relation (table/index) files from backup. Implemented using the same logic as the patches adding this feature to PostgreSQL, 8694cc96 and 920a5e50. Temporary relation exclusion is enabled in PostgreSQL ≥ 9.0. Unlogged relation exclusion is enabled in PostgreSQL ≥ 9.1, where the feature was introduced. (Contributed by Cynthia Shang.)
* Allow arbitrary directories and/or files to be excluded from a backup. Misuse of this feature can lead to inconsistent backups so read the --exclude documentation carefully before using. (Reviewed by Cynthia Shang.)
* Add log-subprocess option to allow file logging for local and remote subprocesses.
* PostgreSQL 11 Beta 3 support.
Improvements:
* Allow zero-size files in backup manifest to reference a prior manifest regardless of timestamp delta. (Contributed by Cynthia Shang.)
* Improve asynchronous archive-get/archive-push performance by directly checking status files. (Contributed by Stephen Frost.)
* Improve error message when a command is missing the stanza option. (Suggested by Sarah Conway.)
Relative link paths were being combined with the paths of previous links (relative or absolute) due to the $strPath variable being modified in the current iteration rather than simply being passed to the next level of recursion.
This issue did not affect absolute links and relative tablespace links were caught by other checks, though the error was confusing.
Reported by Cynthia Shang.
Prior to this commit, an expression was used to search the spool directory for ok/error files for a specific WAL segment. This involved setting up a regular expression and using opendir/readdir.
Instead, directly probe for the status files, checking directly if a '.ok' or '.error' file exists, avoiding the regular expression and eliminating the directory scan.
Only the two files now probed for could have ever matched the regular expression which had been provided and it's unlikely that many more additional files will be added, so this is a good improvement, and optimization, with little downside.
Contributed by Stephen Frost.
Offline operation runs counter to the purpose of this command, which is to check if archiving and backups are working correctly.
Reported by Jason O'Donnell.
Contributor names have always been presented in the release notes exactly as given, but we tried to assign internal IDs based on last/first name which can be hard to determine and ultimately doesn't make sense.
Inspired by Christophe Pettus' PostgresOpen 2017 talk, "Human Beings Do Not Have a Primary Key".
Implemented using the same logic as the patches adding this feature to PostgreSQL, 8694cc96 and 920a5e50. Temporary relation exclusion is enabled in PostgreSQL ≥ 9.0. Unlogged relation exclusion is enabled in PostgreSQL ≥ 9.1, where the feature was introduced.
Contributed by Cynthia Shang.
This includes PostgreSQL installation which had previously been included in the documentation. This way produces faster builds and there is no need for us to document PostgreSQL installation.
This allows setting the test log level independently from the general test harness setting, but current only works for the C tests. It is useful for seeing log output from functions on the console while a test is running.
common/harnessLog was not ideally suited for general testing and made all the tests quite awkward. Instead, move all code used to test the common/log module into the logTest module and repurpose common/harnessLog to do log expect testing for all other tests in a cleaner way.
Add a few exceptions for config testing since the log levels are reset by default in config/parse.
This is more efficient overall and allows the caller to specify how many bytes will be read on each call. Reads are appended if the buffer already contains data but the buffer size will never increase.
Allow Buffer object "used size" to be different than "allocated size". Add functions to manage used size and remaining size and update automatically when possible.
Use strtoll() instead of sprintf() for conversion. Also use available integer min/max constants rather than hard-coded values.
Reviewed by Stephen Frost.
Suggested by Stephen Frost.
IMPORTANT NOTE: This release fixes a critical bug in the backup resume feature. All resumed backups prior to this release should be considered inconsistent. A backup will be resumed after a prior backup fails, unless resume=n has been specified. A resumed backup can be identified by checking the backup log for the message "aborted backup of same type exists, will be cleaned to remove invalid files and resumed". If the message exists, do not use this backup or any backup in the same set for a restore and check the restore logs to see if a resumed backup was restored. If so, there may be inconsistent data in the cluster.
Bug Fixes:
* Fix critical bug in resume that resulted in inconsistent backups. A regression in v0.82 removed the timestamp comparison when deciding which files from the aborted backup to keep on resume. See note above for more details. (Reported by David Youatt, Yogesh Sharma, Stephen Frost.)
* Fix error in selective restore when only one user database exists in the cluster. (Fixed by Cynthia Shang. Reported by Nj Baliyan.)
* Fix non-compliant ISO-8601 timestamp format in S3 authorization headers. AWS and some gateways were tolerant of space rather than zero-padded hours while others were not. (Fixed by Andrew Schwartz.)
Features:
* PostgreSQL 11 Beta 2 support.
Improvements:
* Improve the HTTP client to set content-length to 0 when not specified by the server. S3 (and gateways) always set content-length or transfer-encoding but HTTP 1.1 does not require it and proxies (e.g. HAProxy) may not include either. (Suggested by Adam K. Sumner.)
* Set search_path = 'pg_catalog' on PostgreSQL connections. (Suggested by Stephen Frost.)
A regression in v0.82 removed the timestamp comparison when deciding which files from the aborted backup to keep on resume. All resumed backups should be considered inconsistent. A resumed backup can be identified by checking the log for the message "aborted backup of same type exists, will be cleaned to remove invalid files and resumed".
Reported by David Youatt, Yogesh Sharma, Stephen Frost.
S3 (and gateways) always set content-length or transfer-encoding but HTTP 1.1 does not require it and proxies (e.g. HAProxy) may not include either.
Suggested by Adam K. Sumner.
* Build containers from scratch for more accurate testing.
* Allow environment load to be skipped.
* Allow bash wrapping to be skipped.
* Allow forcing a command to run as a user without sudo.
Bug Fixes:
* Fix potential buffer overrun in error message handling. (Reported by Lætitia.)
* Fix archive write lock being taken for the synchronous archive-get command. (Reported by Uspen.)
Improvements:
* Embed exported C functions and Perl modules directly into the pgBackRest executable.
* Use time_t instead of __time_t for better portability. (Suggested by Nick Floersch.)
* Print total runtime in milliseconds at command end.
Low-level functions only include stack trace in test builds while higher-level functions ship with stack trace built-in. Stack traces include all parameters passed to the function but production builds only create the parameter list when the log level is set high enough, i.e. debug or trace depending on the function.
* Allow more than one test to provide coverage for the same module.
* Add option to disable valgrind.
* Add option to disabled coverage.
* Add option to disable debug build.
* Add option to disable compiler optimization.
* Add --dev-test mode.
Bug Fixes:
* Fix directory syncs running recursively when only the specified directory should be synced. (Reported by Craig A. James.)
* Fix archive-copy throwing "path not found" error for incr/diff backups. (Reported by yummyliu, Vitaliy Kukharik.)
* Fix failure in manifest build when two or more files in PGDATA are linked to the same directory. (Reported by Vitaliy Kukharik.)
* Fix delta restore failing when a linked file is missing.
* Fix rendering of key/value and list options in help. (Reported by Clinton Adams.)
Features:
* Add asynchronous, parallel archive-get. This feature maintains a queue of WAL segments to help reduce latency when PostgreSQL requests a WAL segment with restore_command.
* Add support for additional pgBackRest configuration files in the directory specified by the --config-include-path option. Add --config-path option for overriding the default base path of the --config and --config-include-path option. (Contributed by Cynthia Shang.)
* Add repo-s3-token option to allow temporary credentials tokens to be configured. pgBackRest currently has no way to request new credentials so the entire command (e.g. backup, restore) must complete before the credentials expire. (Contributed by Yogesh Sharma.)
Improvements:
* Update the archive-push-queue-max, manifest-save-threshold, and buffer-size options to accept values in KB, MB, GB, TB, or PB where the multiplier is a power of 1024. (Contributed by Cynthia Shang.)
* Make backup/restore path sync more efficient. Scanning the entire directory can be very expensive if there are a lot of small tables. The backup manifest contains the path list so use it to perform syncs instead of scanning the backup/restore path.
* Show command parameters as well as command options in initial info log message.
* Rename archive-queue-max option to archive-push-queue-max to avoid confusion with the new archive-get-queue-max option. The old option name will continue to be accepted.
pgBackRest currently has no way to request new credentials so the entire command (e.g. backup, restore) must complete before the credentials expire.
Contributed by Yogesh Sharma.
Many options that were set per test can instead be inferred from the types, i.e. container, c, expect, and individual.
Also finish renaming Perl unit tests with the -perl suffix.
* Add storageCopy(), storageMove(), and storagePathSync().
* Separate StorageFile object into separate read and write objects.
* Abstract out Posix file read/write objects.
Configuration files are loaded from the directory specified by the --config-include-path option.
Add --config-path option for overriding the default base path of the --config and --config-include-path option.
Contributed by Cynthia Shang.
Mainly this helps with unit tests that need to do log expect testing. Add harnessCfgLoad() test function, which allows a new config to be loaded for unit testing without resetting log functions, opening a log file, or taking locks.
The Perl process was exiting directly when called but that interfered with proper locking for the forked async process. Now Perl returns results to the C process which handles all errors, including signals.
Now only two types of locks can be taken: archive and backup. Most commands use one or the other but the stanza-* commands acquire both locks. This provides better protection than the old command-based locking scheme.
This implementation should be faster because it does not stat each file. It simply assumes that most directory entries are files so attempts an unlink() first. If the entry is reported by error codes to be a directory then it attempts an rmdir().
This makes it easier to create objects and then copy them to another context when they are complete without having to worry about freeing them on error. Update List, StringList, and Buffer to allow moves. Update Ini and Storage to take advantage of moves.
Scanning the entire backup directory can be very expensive if there are a lot of small tables. The backup manifest contains the backup directory list so use it to perform syncs instead of scanning the backup directory.