Vendorized code is copied from another project when a library is not available and a git subproject won't work. Currently all the vendorized code is copied from PostgreSQL but it makes sense to have a more general mechanism for indicating vendorized code.
The .vendor extension will be used to denote vendorized code in the same way that .auto is used to denote auto-generated code.
This was an oversight in 438b957f which added multiple compression type support. The booleans were interpreted as none and gz which works fine for the CompressType enum until the position of gz or none changes.
An upcoming feature requires new parameters for storagePosixNew() and this causes a lot of churn because almost every test creates a Posix storage object. Some refactoring in the tests might reduce this duplication but storagePosixNew() is collecting a lot of parameters so converting to storagePosixNewP() makes sense in any case.
There are relatively few call sites in the core code but they still benefit from better readability after this change.
Make the restore clean process look more like manifest build, i.e. do cleanup of each target root directory outside the main cleanup callback. This means some code duplication but removes the logic handling "dot" paths.
Add tests for both restore and backup (which already worked but was not tested).
The prior behavior introduced in dcddf3a5 could possibly lead to postgresql.conf or postgresql.auto.conf being truncated in the backup since they are copied via tmp files and could change size during the backup.
In general it seems safer to limit this feature to WAL-logged files which will be reconstructed during recovery.
If a file grows during the backup it will be reconstructed by WAL replay during recovery so there is no need to copy the additional data.
This also reduces the likelihood of seeing torn pages during the copy. Torn pages can still occur in the middle of the file, though, so they must be handled.
The manifest is excellent for validation but including the entire manifest is too noisy and some values are architecture/algorithm dependent.
Output a redacted version that contains the most important information which can be improved on over time.
Add compress-type option and deprecate compress option. Since the compress option is boolean it won't work with multiple compression types. Add logic to cfgLoadUpdateOption() to update compress-type if it is not set directly. The compress option should no longer be referenced outside the cfgLoadUpdateOption() function.
Add common/compress/helper module to contain interface functions that work with multiple compression types. Code outside this module should no longer call specific compression drivers, though it may be OK to reference a specific compression type using the new interface (e.g., saving backup history files in gz format).
Unit tests only test compression using the gz format because other formats may not be available in all builds. It is the job of integration tests to exercise all compression types.
Additional compression types will be added in future commits.
Page size is passed around a lot but in fact it can only have one value, PG_PAGE_SIZE_DEFAULT, which is checked when pg_control is loaded. There may be an argument for supporting multiple page sizes in the future but for now just use the constant to simplify the code.
There is also a significant performance benefit. Because pageSize was being used in pageChecksumBlock() the main loop was neither unrolled nor vectorized (-funroll-loops -ftree-vectorize) as it is now with a constant loop boundary.
These data structures were copied a few places (but only once in the core code) so put them in a place where everyone can use them.
To do this create a new file, static.auto.h, to contain data types and macros that have stayed the same through all the versions of PostgreSQL that we support. This allows us to have single, non-versioned set of headers and code for stable data structures like page headers.
Migrate a few types from version.auto.h that are required for page header structures and pull the remaining types from PostgreSQL directly.
We had previously renamed xlog to wal so update those where required since we won't be modifying the PostgreSQL names anymore.
This was a minor optimization used in protocol layer compression. Even though it was slightly faster, it omitted the crc-32 that is generated during normal compression which could lead to corrupt data after a bad network transmission. This would be caught on restore by our checksum but it seems better to catch an issue like this early.
The raw option also made the function signature different than future compression formats which may not support raw, or require different code to support raw.
In general, it doesn't seem worth the extra testing to support a format that has minimal benefit and is seldom used, since protocol compression is only enabled when the transmitted data is uncompressed.
"gz" was used as the extension but "gzip" was generally used for function and type naming.
With a new compression format on the way, it makes sense to standardize on a single abbreviation to represent a compression format in the code. Since the extension is standard and we must use it, also use the extension for all naming.
If a file was removed by PostgreSQL during the backup (or was missing from the standby) then the next file might not be copied and updated in the manifest. If this happened then the backup would error when restored.
The issue was that removing files from the manifest invalidated the pointers stored in the processing queues. When a file was removed, all the pointers shifted to the next file in the list, causing a file to be unprocessed. Since the unprocessed file was still in the manifest it would be saved with no checksum, causing a failure on restore.
When process-max was > 1 then the bug would often not express since the file had already been pulled from the queue and updates to the manifest are done by name rather than by pointer.
Previously memNew() used memset() to initialize all struct members to 0, NULL, false, etc. While this appears to work in practice, it is a violation of the C specification. For instance, NULL == 0 must be true but neither NULL nor 0 must be represented with all zero bits.
Instead use designated initializers to initialize structs. These guarantee that struct members will be properly initialized even if they are not specified in the initializer. Note that due to a quirk in the C99 specification at least one member must be explicitly initialized even if it needs to be the default value.
Since pre-zeroed memory is no longer required, adjust memAllocInternal()/memReallocInternal() to return raw memory and update dependent functions accordingly. All instances of memset() have been removed except in debug/test code where needed.
Add memMewPtrArray() to allocate an array of pointers and automatically set all pointers to NULL.
Rename memGrowRaw() to the more logical memResize().
The timeline is required to verify WAL segments in the archive after a backup. The conversion was performed base 10 instead of 16, which led to errors when the timeline was ≥ 0xA.
This macro was created before the String object existed so subsequent usage with String always included a lot of strPtr() wrapping.
TEST_RESULT_STR_Z() had already been introduced but a wholesale replacement of TEST_RESULT_STR() was not done since the priority was on the C migration.
Update all calls to (old) TEST_RESULT_STR() with one of the following variants: (new) TEST_RESULT_STR(), TEST_RESULT_STR_Z(), TEST_RESULT_Z(), TEST_RESULT_Z_STR().
PostgreSQL >= 9.6 uses non-exclusive backup which has implicit stop-auto since the backup will stop when the connection is terminated.
The warning was made more verbose in 1f2ce45e but this now seems like a bad idea since there are likely users with mixed version environments where stop-auto is enabled globally. There's no reason to fill their logs with warnings over a harmless option. If anything we should warn when stop-auto is explicitly set to false but this doesn't seem very important either.
Revert to the prior behavior, which is to warn and reset when stop-auto is enabled on PostgreSQL < 9.3.
For the most part this is a direct migration of the Perl code into C except as noted below.
A backup can now be initiated from a linked directory. The link will not be stored in the manifest or recreated on restore. If a link or directory does not already exist in the restore location then a directory will be created.
The logic for creating backup labels has been improved and it should no longer be possible to get a backup label earlier than the latest backup even with timezone changes or clock skew. This has never been an issue in the field that we know of, but we found it in testing.
For online backups all times are fetched from the PostgreSQL primary host (before only copy start was). This doesn't affect backup integrity but it does prevent clock skew between hosts affecting backup duration reporting.
Archive copy now works as expected when the archive and backup have different compression settings, i.e. when one is compressed and the other is not. This was a long-standing bug in the Perl code.
Resume will now work even if hardlink settings have been changed.
Reviewed by Cynthia Shang.
A recopy would occur if the size or checksum was invalid but on error the backup would terminate.
Instead, recopy the resumed file on any error. If the error is systemic (e.g. network failure) then it should show up again during the recopy.
Adding a dummy column which is always set by the P() macro allows a single macro to be used for parameters or no parameters without violating C's prohibition on the {} initializer.
-Wmissing-field-initializers remains disabled because it still gives wildly different results between versions of gcc.
Three major changes were required to get this working:
1) Provide the path to pgbackrest in the build directory when running outside a container. Tests in a container will continue to install and run against /usr/bin/pgbackrest.
1) Set a per-test lock path so tests don't conflict on the default /tmp/pgbackrest path. Also set a per-test log-path while we are at it.
2) Use localhost instead of a custom host for TLS test connections. Tests in containers will continue to update /etc/hosts and use the custom host.
Add infrastructure and update harnessCfgLoad*() to get the correct exe and paths loaded for testing.
Since new tests are required to verify that running outside a container works, also rework the tests in Travis CI to provide coverage within a reasonable amount of time. Mainly, break up to doc tests by VM and run an abbreviated unit test suite on co6 and co7.
The local process is now entirely migrated to C. Since all major I/O operations are performed in the local process, the vast majority of I/O is now performed in C.
Contributed by David Steele, Cynthia Shang.