This allows a C unit test to access data in the code repository that might be useful for testing.
Add testRepoPathSet() to set the repository path.
In passing remove extra whitespace in the TEST_RESULT_VOID() macro.
Bug Fixes:
* Fix issue with archive-push-queue-max not being honored on connection error. (Reported by Lardière Sébastien.)
* Fix static WAL segment size used to determine if archive-push-queue-max has been exceeded.
* Fix error after log file open failure when processing should continue. (Reported by vthriller.)
Features:
* Automatically enable backup checksum delta when anomalies (e.g. timeline switch) are detected. (Contributed by Cynthia Shang.)
Improvements:
* Retry all S3 5xx errors rather than just 500 internal errors. (Suggested by Craig A. James.)
These interfaces previously used the memory context of the object they were associated with and did not have their own destructors.
There are times when it is useful to free the interface without also freeing the underlying object so give IoRead and IoWrite their own memory contexts and destructors.
In passing fix a comment type in bufferRead.c.
By default the IoWrite object does not write until the output buffer is full but this is a problem for protocol messages that must be sent in order to get a response.
ioWriteFlush() is not called internally by IoWrite but can be used at any time to immediately write all bytes from the output buffer without closing the IoWrite object.
The prior message stated that there had been a buffer overrun which is not true since the code prevents that.
In fact, this message means the parameter buffer filled while building the parameter list. Rather than display a partial list we output this message instead.
Also remove !!! which by convention we use as a marker for code that needs attention before it can be committed to master.
These macros provide a convenient way to output debug information in tests.
They are not intended to be left in test code when it is committed to master.
ioReadLine() calls ioRead(), which aggressively tries to fill the output buffer, but this doesn't play well with blocking reads.
Give ioReadLine() an option that tells it to read only what is available. That doesn't mean the function will never block but at least it won't do so by reading too far.
The report HTML generated by lcov is overly verbose and cumbersome to navigate. Since we maintain 100% coverage it's far more interesting to look at what is not covered than what is.
The new report presents all missing coverage on a single page and excludes code that is covered for brevity.
Add HTML tags for table elements.
The strExtra parameter allows adhoc tags to be added to an element for features that can't be implemented with CSS, e.g. colspan.
There are many places (and the number is growing) where a zero-terminated string constant must be transformed into a String object to be usable. This pattern wastes time and memory, especially since the created string is generally used in a read-only fashion.
Define macros to create constant String objects that are initialized at compile time rather than at run time.
The storageList() command accepts a regular expression as a filter. This works fine for local filesystems where it is relatively cheap to get a complete list of files and filter them in code. However, for remote filesystems like S3 it can be expensive to fetch a complete list of files only to discard the bulk of them locally.
S3 does not filter on regular expressions but it can accept a static prefix so this function extracts a prefix from a regular expression when possible.
Even a few characters can drastically reduce the amount of data that must be fetched remotely so the function does not try to be too clever. It requires a ^ anchor and stops scanning when the first special character is found.
Allow buffers to report a lower size than their allocated size. This means a larger buffer can be used to do the work of a smaller buffer without having to create a new buffer and concatenate.
This is useful for blocking I/O where the buffer may be too large for the amount of data that is available to read.
The Wait object accepted a double in the constructor for wait time but used TimeMSec internally. This was done for compatibility with the Perl code.
Instead, use TimeMSec in the Wait constructor and make changes as needed to calling code.
Note that Perl still uses a double for its Wait object so translation is needed in some places. There are no plans to update the Perl code as it will become obsolete.
If an object free() method was called manually when a callback was set then the callback would call free() again. This meant that each free() method had to protect against a subsequent call.
Instead, clear the callback (if present) before calling memContextFree(). This is faster (since there is no unecessary callback) and removes the need for semaphores to protect against a double free().
Code generation saved files even when they had not changed, which often caused code generation cascades. So, don't save files unless they have changed.
Use rsync to determine which files have changed since the last test run. The manifest of changed files is saved and not removed until all code generation and builds have completed. If an error occurs the work will be redone on the next run.
The eventual goal is to do all the builds from the test/repo directory created by rsync but for now it is only used to track changes.
The contents were already preserved between tests in a single test.pl run but for a separate execution the entire project had to be built from scratch, which was getting slower as we added code.
Save the important build flags in a file so the new execution knows whether the build contents can be reused.
Mounting/unmounting tmpfs on /home/[user]/test takes time, forces at least 3GB of memory to be available for tests, and makes it harder to preserve data between tests.
Instead, move mounting of tmpfs to the Vagrantfile and add it to fstab so it survives reboots.
There are a number of cases where a checksum delta is more appropriate than the default time-based delta:
* Timeline has switched since the prior backup
* File timestamp is older than recorded in the prior backup
* File size changed but timestamp did not
* File timestamp is in the future compared to the start of the backup
* Online option has changed since the prior backup
A practical example is that checksum delta will be enabled after a failover to standby due to the timeline switch. In this case, timestamps can't be trusted and our recommendation has been to run a full backup, which can impact the retention schedule and requires manual intervention.
Now, a checksum delta will be performed if the backup type is incr/diff. This means more CPU will be used during the backup but the backup size will be smaller and the retention schedule will not be impacted.
Contributed by Cynthia Shang.
We were already retrying 500 errors but 503 (rate-limiting) errors were not being retried and would cause an instant failure which aborted the command.
There are only two 5xx errors currently implemented by S3 but instead of adding 503 simply retry all 5xx errors. This is consistent with the http definition of this error class, "the server failed to fulfill an apparently valid request."
Suggested by Craig A. James.
This calculation was missed when the WAL segment size was made dynamic in preparation for PostgreSQL 11.
Fix the calculation by checking the actual WAL file sizes instead of using an estimate based on WAL segment size. This is more accurate because it takes into account .history and .backup files, which are smaller. Since the calculation is done in the async process the additional processing time should not adversely affect performance.
Remove the PG_WAL_SIZE constant and instead use local constants where the old value is still required. This is only the case for some tests and PostgreSQL 8.3 which does not provide a way to get the WAL segment size from pg_control.
If an error occurred while acquiring a lock on a remote server the error would be reported correctly, but the queue max detection code was not reached. The tests failed to detect this because they fixed the connection before queue max, allowing the ccde to be reached.
Move the queue max code before the lock so it will run even when remote connections are not working. This means that no attempt will be made to transfer WAL once queue max has been exceeded, but it makes it much more likely that the code will be reach without error.
Update tests to continue errors up to the point where queue max is exceeded.
Reported by Lardière Sébastien.
The C code was warning on failure and continuing but the Perl logging code was never updated with the same feature.
Rather than add the feature to Perl, just disable file logging if the log file cannot be opened. Log files are always opened by C first, so this will eliminate the error in Perl.
Reported by vthriller.
The existing tests were not adequate to ensure the history was being added in the correct order when some entries were loaded from a file and others added with infoPgAdd().
Contributed by Cynthia Shang.
The InfoPg object was partially modified in 960ad732 to place the current history item in position 0, but infoPgDataCurrent() didn't get updated correctly.
Remove this->indexCurrent and make the current position always equal 0. Use the new lstInsert() function when adding new history items via infoPgAdd(), but continue to use lstAdd() when loading from a file for efficiency.
This does not appear to be a live bug because infoPgDataCurrent() and infoPgAdd() are not yet used in any production code. The archive-get command is the only C code using InfoPG and it always looks at the entire list of items rather than just the current item.
Suggested by Cynthia Shang.