This warning was being output when getting help if retention was not set:
WARN: option repo1-retention-full is not set, the repository may run out of space
Suppress this when getting help since the warning will display by default on a system that is not completely configured.
The same test configurations are run on all four test VMs, which seems a real waste of resources.
Vary the tests per VM to increase coverage while reducing the total number of tests. Be sure to include each major feature (remote, s3, encryption) in each VM at least once.
The same test configurations are run on all four test VMs, which seems a real waste of resources.
Vary the tests per VM to increase coverage while reducing the total number of tests.
The same test configurations are run on all four test VMs, which seems a real waste of resources.
Vary the tests per VM to increase coverage while reducing the total number of tests. Be sure to include each major feature (remote, s3, encryption) in each VM at least once.
This is very inefficient in terms of memory and time and dynamic context names were never utilized.
Just require that context names be valid for the life of the context.
In practice they are all static strings.
Allocations required a sequential scan through the allocation list for both contexts and memory. This was very inefficient since for the most part individual memory allocations are seldom freed directly, rather they are freed when their context is freed.
For both types of allocations track an index for the lowest free position. After an allocation of the free position, a sequential search will be required for the next allocation but this is still far better than doing a scan for every allocation.
With a moderately-sized dataset (500 history entries in backup.info), there is a 237X performance improvement when combined with the f74e88bb refactor.
Before:
% cumulative self
time seconds seconds name
65.11 331.37 331.37 memContextAlloc
16.19 413.78 82.40 memContextCurrent
14.74 488.81 75.03 memContextTop
2.65 502.29 13.48 memContextNewIndex
1.18 508.31 6.02 memFind
After:
% cumulative self
time seconds seconds name
94.69 2.14 2.14 memFind
Finding memory allocations in order to free or resize them is the next bottleneck, but this does not seem to be a major issue presently.
The prior method depended on IO:Socket:SSL to push the keep-alive options down to the socket but it only worked for recent versions of the module.
Instead, create the socket directly using IO::Socket::IP if available or IO:Socket:INET as a fallback. The keep-alive option is set directly on the socket before it is passed to IO:Socket:SSL.
Contributed by Marc Cousin.
The command option was not being set correctly when a remote was started from a local. It was being set as 'local' rather than the command that the local was running as.
Also automatically select the remote protocol id based on whether it is started from a local (use the local protocol id) or from the main process (use 0).
These were not live issues but could cause strange behaviors as new features are added that might be hard to diagnose.
This new implementation should behave exactly like the old Perl code with the exception of a few updated log messages.
Remove as much of the Perl code as possible without breaking other commands.
The C local is only used for C commands in the main process.
Some tweaking of the existing protocolGet() command was required. Originally the idea was to share the function for local and remote requests but the differences (as in Perl) were too great to make that practical.
Some IO objects have file descriptors which can be useful for monitoring with select().
It might also be useful to expose handles for write objects but there is currently no use case.
There was a lot of extra boilerplate involved in setting up pipes so that is now automated.
In some cases testing with multiple children is useful so allow that as well.
This amends 70c30dfb which disabled test tracing in general.
Instead, only enable test tracing by default for modules that are being unit tested. This saves lots of time but still ensures that test tracing is working and helps with debugging in unit tests.
Also rename the option to --debug-test-trace for a clarity.
The same test configurations are run on all four test VMs, which seems a real waste of resources.
Vary the tests per VM to increase coverage while reducing the total number of tests. Be sure to include each major feature (remote, s3, encryption) in each VM at least once.
The expect tests were originally a rough-and-ready type of unit test so monitoring changes in the expect log helped us detect changes in behavior.
Now the stanza code is heavily unit-tested so the detailed logs mainly cause churn and don't have any measurable benefit.
Reduce the log level to DETAIL to make the logs less verbose and volatile, yet still check user-facing log messages.
The same test configurations are run on all four test VMs, which seems a real waste of resources.
Vary the tests per VM to increase coverage while reducing the total number of tests. Be sure to include each major feature (remote, s3, encryption) in each VM at least once.
The expect tests were originally a rough-and-ready type of unit test so monitoring changes in the expect log helped us detect changes in behavior.
Now the archive code is heavily unit-tested so the detailed logs mainly cause churn and don't have any measurable benefit.
Reduce the log level to DETAIL to make the logs less verbose and volatile, yet still check user-facing log messages.
Update error message with the hostname and more detail about what went wrong. Hopefully this will help in diagnosing certificate/hostname issues.
Suggested by James Badger.
We have been using a hacked-up JSON generator to pass options from C to Perl since the C binary was introduced. This generator was not very compliant which led to issues with \n, ", etc. inside strings.
We have a fully-compliant JSON generator now so use that instead.
Reported by Leo Khomenko.
Detailed stack traces for low-level functions (e.g. strCat, bufMove) can be very useful for debugging but leaving them on for all tests has become quite burdensome in terms of time. Complex operations like generating JSON on a large KevValue can lead to timeouts even with generous values.
Add a new param, --debug-trace, to enable test-level stack trace, but leave it off by default.
The remote protocol was calling into the Storage object but this required some translation which will get more awkward as time goes by.
Instead, call directly into the local driver so the communication is directly driver to driver. This still requires resolving the path and may eventually have more duplication with the Storage object methods but it seems the right thing to do.
Expressions such as <REPO:ARCHIVE> require a stanza name in order to be resolved correctly. However, if the stanza name is passed to the remote then that remote will only work correctly for that one stanza.
Instead, resolved the expressions locally but still pass a relative path to the remote. That way, a storage path that is only configured on the remote does not need to be known locally.
This issue was a result of STORAGE_REPO_PATH prepending an extra stanza when the stanza was specified on the command line.
The tests missed this because by some strange coincidence the WAL dirs were empty for each test that specified a stanza. Add new tests to prevent a regression.
Fixed by Stefan Fercot.
Free all cached objects in the storage helper, especially the stanza name.
This clears the storage environment for tests that switch stanza names or go from a stanza name to no stanza name or vice versa. This is only useful for testing right now, but may be used in the future for commands than act on multiple stanzas.
This command was previously forked off from the archive-get command which required a bit of artificial option and log manipulation.
A separate command is easier to test and will work on platforms that don't have fork(), e.g. Windows.
Prior to this the Perl remote was used to satisfy C requests. This worked fine but since the remote needed to be migrated to C anyway there was no reason to wait.
Add the ProtocolServer object and tweak ProtocolClient to work with it. It was also necessary to add a mechanism to get option values from the remote so that encryption settings could be read and used in the storage object.
Update the remote storage objects to comply with the protocol changes and add the storage protocol handler.
Ideally this commit would have been broken up into smaller chunks but there are cross-dependencies in the protocol layer and it didn't seem worth the extra effort.
The file write object destructors called close() and finalized the file even if it was not completely written. This was an issue in both the C and Perl code.
Rewrite the destructors to simply free resources (like file handles) rather than calling the close() method. This leaves the temp file in place for filesystems that use temp files.
Add unit tests to prevent regression.
Reported by blogh.
This was not being caught because the integration tests for S3 were running remotely and going through the Perl code rather than the new C code.
Implement the exists method for the S3 driver and add tests to prevent a regression.
Reported by mibiio.
This already worked in reverse, but this case is needed when a command that only uses protocol-timeout (e.g. info) calls a remote process where protocol-timeout and db-timeout can be set. If protocol-timeout was set to less than the default db-timeout then an error resulted.
Rather than create _P/_PP variants for every type that needs to pass/return pointers, create FUNCTION_*_P/PP() macros that will properly pass or return any single/double pointer types.
There remain a few unresolved edge cases such as CHARPY but this handles the majority of types well.
The string object was reallocating memory with every concatenation which is not very efficient. This is especially true for JSON rendering which does a lot of concatenations.
Instead allocate a pool of extra memory on the first concatenation (50% of size) to be used for future concatenations and reallocate when needed.
Also add a 1GB size limit to ensure that there are no overflows.
Multiple status files were being created by asynchronous archiving if a high-level error occurred after one or more WAL segments had already been transferred successfully. Error files were being written for every file in the queue regardless of whether it had already succeeded. To fix this, add an option to skip writing error files when an ok file already exists.
There are other situations where both files might exist (various fsync and filesystem error scenarios) so it seems best to retry in the case that multiple status files are found rather than throwing a hard error (which then means that archiving is completely stuck). In the case of multiple status files, a warning will be logged to alert the user that something unusual is happening and the command will be retried.
Reported by fpa-postgres, Joe Ayers, Douglas J Hunley.
The implementation using gethostbyname() was only intended to be used during prototyping but was forgotten when the code was finalized.
Replace it with gettaddrinfo() which is more modern and supports IPv6.
Suggested by Bruno Friedmann.
Rename FUNCTION_DEBUG_* macros to FUNCTION_LOG_* to more accurately reflect what they do. Further rename FUNCTION_DEBUG_RESULT* macros to FUNCTION_LOG_RETURN* to make it clearer that they return from the function as well as logging. Leave FUNCTION_TEST_* macros as they are.
Consolidate the various ASSERT* macros into a single ASSERT macro that is always compiled out of production builds. It was difficult to figure out when an assert would be checked with all the different types in play. When ASSERTs are compiled in they will always be checked regardless of the log level -- tying these two concepts together was not a good idea.
The C info code has already been committed but this commit wires it into main.
Also remove the info Perl code and tests since they are no longer called.
This is a partial implementation of remote storage with just enough functionality to get the info command working. The client is written in C but the server is still in Perl, which limits progress until a C server is written.
This is a complete protocol client implementation in C.
Currently there is no C server implementation so the C client is talking to a Perl server. This won't work very long, though, as the protocol format, even though in JSON, has a lot of language-specific structure. While it would be possible to maintain compatibility between C and Perl it's probably not worth the effort in the long run.
Just as in Perl there are helper functions to make constructing protocol objects easier. Currently only repository remotes are supported.
Executes a child process and allows the calling process to communicate with it using read/write io.
This object is specially tailored to implement the protocol layer and may or may not be generally applicable to general purpose
execution.
Parameters for the local/remote commands are based on parameters that are passed to the current command.
Generate parameters for the new command based on the intersection of parameters between the current command and the command to be executed.
General i/o objects for reading and writing file descriptors, in particular those that can block. In other words, these are not generally to be used with file descriptors for actual files, but rather pipes, sockets, etc.
The C code can't get the cipher type from the storage object because the C storage object does not have encryption baked in like the Perl code does.
Instead, check backup.info to see if encryption is enabled. This will need to rethought if another cipher type is added but for now it works fine.
Replace the repository path with just "the repository". The path is not important in this context and it is clearer to state where the stanzas are missing from.
The prior behavior was to throw an exception but this was not very helpful when something unexpected happened. Better to at least emit the error message even if the error code is not very helpful.
There were some small differences in ordering and how the C version handled missing directories. It may be that the C version is more consistent, but for now it is more important to be compatible with the Perl version.
These differences were missed because the C info command was not wired into main.c so it was not being tested in regression. This commit does not fix the wiring issue because there will likely be a release soon and it is too big a change to put in at the last moment.
Casting to int caused large values to be slightly inaccurate so cast to uint64_t instead.
Also, use multiplication where possible since the compiler should precompute multiplied values.
SIGPIPE immediately terminates the process but we would rather catch the EPIPE error and gracefully shutdown.
Ignore SIGPIPE and throw the EPIPE error via normal error handling.
Including the C module after the headers required for testing meant that if headers were missing from the C module they were not caught while directly testing the C module.
The missing headers were caught in general testing, but it is frustrating to get an error in a module that has already passed while testing another module or running CI.
Move the C module include to the very top so missing headers cause immediate failures.
Bug Fixes:
* Remove request for S3 object info directly after putting it. (Reported by Matt Kunkel.)
* Correct archive-get-queue-max to be size type. (Reported by Ronan Dunklau.)
* Add error message when current user uid/gid does not map to a name. (Reported by Camilo Aguilar.)
* Error when --target-action=shutdown specified for PostgreSQL < 9.5.
Improvements:
* Set TCP keepalives on S3 connections. (Suggested by Ronan Dunklau.)
* Reorder info command text output so most recent backup is output last. (Contributed by Cynthia Shang. Suggested by Ryan Lambert.)
* Change file ownership only when required.
* Redact authentication header when throwing S3 errors. (Suggested by Brad Nicholson.)
Admonitions call out places where the user should take special care.
Support added for HTML, PDF, Markdown and help text renderers. XML files have been updated accordingly.
Contributed by Cynthia Shang.
After a stanza-upgrade backups for the old cluster are displayed until they expire. Cluster info was output newest to oldest which meant after an upgrade the most recent backup would no longer be output last.
Update the text output ordering so the most recent backup is always output last.
Contributed by Cynthia Shang.
Suggested by Ryan Lambert.
The info command will only be executed in C if the repository is local, i.e. not located on a remote repository host. S3 is considered "local" in this case.
This is a direct migration from Perl to integrate as seamlessly with the remaining Perl code as possible. It should not be possible to determine if the C version is running unless debug-level logging is enabled.
Contributed by Cynthia Shang.
The infoBackup object is the counterpart to the infoArchive object which encapsulates the archive.info file.
Currently the object is read-only, i.e. it is not possible to create a new or modify an existing backup.info file.
There a number of constants that will also be used in the infoManifest object so go ahead and create a module to contain them so they don't need to be moved later.
Contributed by Cynthia Shang.
Some commands (e.g. info) do not take a stanza or the stanza is optional. In that case it is the job of the command to construct the repository path with a stanza as needed.
Update helper functions to omit the stanza from the constructed path when it is NULL.
Contributed by Cynthia Shang.
- Add detail to errors when info files are loaded with incorrect encryption settings.
- Throw FileMissingError rather than FileOpenError when both copies of the info file are missing.
- If one file is present (but errors) and the other is missing, then return the error for the file that was present.
Contributed by Cynthia Shang.
Previously chown() would be called even when no ownership changes were required.
In most cases changes are not required and it seems better to perform an extra stat() rather than an extra chown().
Also add unit tests for owner() since there weren't any.
This got missed in 1f8931f7 when the test binary was renamed.
Also output call graph along with the flat report. The flat report is generally most useful but it doesn't hurt to have both.
The previous error message only showed the last error. In addition, some errors were missed (such as directory permission errors) that could prevent the copy from being checked.
Show both errors below a generic "unable to load" error. Details are now given explaining exactly why the primary and copy failed.
Previously if one file could not be loaded a warning would be output. This has been removed because it is not clear what the user should do in this case. Should they do a stanza-create --force? Maybe the best idea is to automatically repair the corrupt file, but on the other hand that might just spread corruption if pgBackRest makes the wrong choice.
The decryption filter was added in archiveGetFile() and archiveGetCheck() was modified to return the WAL decryption key stored in archive.info. The rest was plumbing.
The mock/archive/1 integration test added encryption to provide coverage for the new code paths while mock/archive/2 dropped encryption to provide coverage for the existing code paths. This caused some churn in the expect logs but there was no change in behavior.
If InOut filters were placed next to each other then the second filter would never get a NULL input signaling it to flush. This arrangement only worked if the second filter had some other indication that it should flush, such as a decompression filter where the flush is indicated in the input stream.
This is not a live issue because currently no InOut filters are chained together.
This allows CipherBlock to be used as a filter in an IoFilterGroup. The C-style functions used by Perl are now deprecated and should not be used for any new code.
Also add functions to convert between cipher names and CipherType.