Add production checks to ensure no filter gets a zero-size input buffer.
Also, optimize the case where a filter returns no output. There's no sense in running downstream filters if they have no new input.
The IoRead object was passing zero-length buffers into the filter processing code but not all the filters were happy about getting them.
In particular, the gzip compression filter failed if it was given no input directly after it had flushed all of its buffers. This made the problem rather intermittent even though a zero-length buffer was being passed to the filter at the end of every file. It also explains why tweaking compress-level or buffer-size allowed the file to go through.
Since this error was happening after all processing had completed, there does not appear to be any risk that successfully processed files were corrupted.
Reported by brunre01, jwpit, Tomasz Kontusz, guruguruguru.
Releasing the lock too early was allowing other async processes to sneak in and start running before the current process was completely shut down.
The only symptom seems to have been mixed up log messages so not a very serious issue.
Asserts were only only reported on stderr rather than being returned through the protocol layer. This did not appear to be very reliable.
Instead, report the assert through the protocol layer like any other error. Add a stack trace if an assert error or debug logging is enabled.
These work almost exactly like the String constant macros. However, a struct per variant type was required which meant custom constructors and destructors for each type.
Propagate the variant constants out into the codebase wherever they are useful.
The STRING_CONST() macro worked fine for constants but was not able to constify strings created at runtime.
Add the STR() macro to do this by using strlen() to get the size.
Also rename STRING_CONST() to STRDEF() for brevity and to match the other macro name.
Removed the "anchor" parameter because it was never used in any calls in the Perl code so it was just a dead parameter that always defaulted to true.
Contributed by Cynthia Shang.
These constants are easier than using cfgOptionName() and cfgCommandName() and lead to cleaner code and simpler to construct messages.
String versions are provided. Eventually all the strings will be used in the config structures, but for now they are useful to avoid wrapping with strNew().
IMPORTANT NOTE: The new TLS/SSL implementation forbids dots in S3 bucket names per RFC-2818. This security fix is required for compliant hostname verification.
Bug Fixes:
* Fix issues when a path option is / terminated. (Reported by Marc Cousin.)
* Fix issues when log-level-file=off is set for the archive-get command. (Reported by Brad Nicholson.)
* Fix C code to recognize host:port option format like Perl does. (Reported by Kyle Nevins.)
* Fix issues with remote/local command logging options.
Improvements:
* The archive-push command is implemented entirely in C.
* Increase process-max limit to 999. (Suggested by Rakshitha-BR.)
* Improve error message when an S3 bucket name contains dots.
Documentation Improvements:
* Clarify that S3-compatible object stores are supported. (Suggested by Magnus Hagander.)
This was not an intentional feature in Perl, but it works, so it makes sense to implement the same syntax in C.
This is a break from other places where a -port option is explicitly supplied, so it may make sense to support both styles going forward. This commit does not address that, however.
Reported by Kyle Nevins.
The Perl lib we have been using for TLS allows dots in wildcards, but this is forbidden by RFC-2818. The new TLS implementation in C forbids this pattern, just as PostgreSQL and curl do.
However, this does present a problem for users who have been using bucket names with dots in older versions of pgBackRest. Since this limitation exists for security reasons there appears to be no option but to take a hard line and do our best to notify the user of the issue as clearly as possible.
This problem was not specific to archive-get, but that was the only place it was expressing in the last release. The new archive-push was also affected.
The issue was with daemon processes that had closed all their file descriptors. When exec'ing and setting up pipes to communicate with a child process the dup2() function created file descriptors that overlapped with the first descriptor (stdout) that was being duped into. This descriptor was subsequently closed and wackiness ensued.
If logging was enabled (the default) that increased all the file descriptors by one and everything worked.
Fix this by checking if the file descriptor to be closed is the same one being dup'd into. This solution may not be generally applicable but it works fine in this case.
Reported by Brad Nicholson.
The documentation mentioned Amazon S3 frequently but failed to mention that other S3-compatible object stores are also supported.
Tone down the specific mentions of Amazon S3 and replace them with "S3-compatible object store" when appropriate.
Suggested by Magnus Hagander.
This new implementation should behave exactly like the old Perl code with the exception of updated log messages.
Remove as much of the Perl code as possible without breaking other commands.
When a repository server is configured, commands that modify the repository acquire a remote lock as well as a local lock for extra protection against multiple writers.
Instead of the custom logic used in Perl, make remote locking part of the command configuration.
This also means that the C remote needs the stanza since it is used to construct the lock name. We may need to revisit this at a later date.
While the local processes are doing their jobs the remote connection from the main process may timeout.
Send occasional noops to ensure that doesn't happen.
This may not be the best way to detect 64-bit platforms but it seems to be working fine so far.
Create a macro to make it clearer what is being done and to make it easier to change the implementation.
This was missed because the unit tests were reusing a buffer without resetting it to zero, so this flag ended up still set when the test function was called.
This was not a live issue since it only expressed in tests and this code is not used in master yet.
The test harness was not being built with warnings which caused some wackiness with an improperly structured switch. Just use the same warnings as the code being tested.
Also enable warnings on code that is not directly being tested since other code modules are frequently modified during testing.
We deal with some pretty big lists in archive-push so a nested-loop anti-join looked like it would not be efficient enough.
This merge anti-join should do the trick even though both lists must be sorted first.
The prior behavior on a global error (i.e. not file specific) was to write an individual error file for each WAL file being processed. On retry each of these error files would be removed, and if the error was persistent, they would then be recreated. In a busy environment this could mean tens or hundreds of thousands of files.
Another issue was that the error files could not be written until a list of WAL files to process had been generated. This was easy enough for archive-get but archive-push requires more processing and any errors that happened when generating the list would only be reported in the pgBackRest log rather than the PostgreSQL log.
Instead write a global.error file that applies to any WAL file that does not have an explicit ok or error file. This reduces churn and allows more errors to be reported directly to PostgreSQL.