bzip2 is a widely available, high-quality data compressor. It typically compresses files to within 10% to 15% of the best available techniques (the PPM family of statistical compressors), while being around twice as fast at compression and six times faster at decompression.
bzip2 is currently available on all supported platforms.
Zstandard is a fast lossless compression algorithm targeting real-time compression scenarios at zlib-level and better compression ratios. It's backed by a very fast entropy stage, provided by Huff0 and FSE library.
Zstandard version >= 1.0 is required, which is generally only available on newer distributions.
The previous location was too late to allow --var=s3-all=y to work with --require=/repo-host, which depends on /quickstart/configure-archiving.
Since the section is not included in production documentation, the position is not very important to flow so just move it to where it works.
This is implemented by checking for a backup lock on the host where info is running so there are a few limitations:
* It is not currently possible to know which command is running: backup, expire, or stanza-*. The stanza commands are very unlikely to be running so it's pretty safe to guess backup/expire. Command information may be added to the lock file to improve the accuracy of the reported command.
* If the info command is run on a host that is not participating in the backup, e.g. a standby, then there will be no backup lock. This seems like a minor limitation since running info on the repo or primary host is preferred.
LZ4 compresses data faster than gzip but at a lower ratio. This can be a good tradeoff in certain scenarios.
Note that setting compress-type=lz4 will make new backups and archive incompatible (unrestorable) with prior versions of pgBackRest.
This was the interface between Perl and C introduced in 36a5349b but since f0ef73db has only been used by the Perl integration tests. This is expensive code to maintain just for testing.
The main dependency was the interface to storage, no matter where it was located, e.g. S3. Replace this with the new-introduced repo commands (d3c83453) that allow access to repo storage via the command line.
The other dependency was on various cfgOption* functions and CFGOPT_ constants that were convenient but not necessary. Replace these with hard-coded strings in most places and create new constants for commonly used values.
Remove all auto-generated Perl code. This means that the error list will no longer be maintained automatically so copy used errors to Common::Exception.pm. This file will need to be maintained manually going forward but there is not likely to be much churn as the Perl integration tests are being retired.
Update test.pl and related code to remove LibC builds.
Ding, dong, LibC is dead.
These commands are generally useful but more importantly they allow removing LibC by providing the Perl integration tests an alternate way to work with repository storage.
All the commands are currently internal only and should not be used on production repositories.
This command only makes sense for the repository storage since other storage (e.g. pg and spool) must be located on a local Posix filesystem and can be listed using standard unix commands. Since the repo storage can be located lots of places having a common way to list it makes sense.
Prefix with repo- to make the scope of this command clear.
Update documentation to reflect this change.
Adding a sleep before was necessary since only adding a sleep after did not always work. This helps to ensure the backup stop time for the previous backup does not equal time-recovery-timestamp. The sleep after allows enough time between the time retrieval and dropping important_table so PostgreSQL can consistently recover to before the table drop.
Note that these issues were caused by picking a timestamp too close to the restore command or a database operation, not due to any problem in backup selection of the restore command.
Auto-selection is performed only when --set is not specified. If a backup set for the given target time cannot not be found, the latest (default) backup set will be used.
Currently a limited number of date formats are recognized and timezone names are not allowed, only timezone offsets.
pkg-config is a generic way to get build options rather than relying on a package-specific utility.
XML2_CONFIG can be used to override this utility for systems that do not ship pkg-config.
Remove embedded Perl from the distributed binary. This includes code, configure, Makefile, and packages. The distributed binary is now pure C.
Remove storagePathEnforceSet() from the C Storage object which allowed Perl to write outside of the storage base directory. Update mock/all and real/all integration tests to use storageLocal() where they were violating this rule.
Remove "c" option that allowed the remote to tell if it was being called from C or Perl.
Code to convert options to JSON for passing to Perl (perl/config.c) has been moved to LibC since it is still required for Perl integration tests.
Update build and installation instructions in the user guide.
Remove all Perl unit tests.
Remove obsolete Perl code. In particular this included all the Perl protocol code which required modifications to the Perl storage, manifest, and db objects that are still required for integration testing but only run locally. Any remaining Perl code is required for testing, documentation, or code generation.
Rename perlReq to binReq in define.yaml to indicate that the binary is required for a test. This had been the actual meaning for quite some time but the key was never renamed.
For the most part this is a direct migration of the Perl code into C except as noted below.
A backup can now be initiated from a linked directory. The link will not be stored in the manifest or recreated on restore. If a link or directory does not already exist in the restore location then a directory will be created.
The logic for creating backup labels has been improved and it should no longer be possible to get a backup label earlier than the latest backup even with timezone changes or clock skew. This has never been an issue in the field that we know of, but we found it in testing.
For online backups all times are fetched from the PostgreSQL primary host (before only copy start was). This doesn't affect backup integrity but it does prevent clock skew between hosts affecting backup duration reporting.
Archive copy now works as expected when the archive and backup have different compression settings, i.e. when one is compressed and the other is not. This was a long-standing bug in the Perl code.
Resume will now work even if hardlink settings have been changed.
Reviewed by Cynthia Shang.
We had some problems with newer versions so had held off on updating. Those problems appear to have been resolved.
In addition, the --compat flag is no longer required. Prior versions of MinIO required all parts of a multi-part upload (except the last) to be of equal size. The --compat flag was introduced to restore the default S3 behavior. Now --compat is only required when ETag is being used for MD5 verification, which we don't do.
Note that building the manifest on each host has been temporarily removed.
This feature will likely be brought back as a non-default option (after the manifest code has been fully migrated to C) since it can be fairly expensive.
Recovery settings are now written into postgresql.auto.conf instead of recovery.conf. Existing recovery_target* settings will be commented out to help avoid conflicts.
A comment is added before recovery settings to identify them as written by pgBackRest since it is unclear how, in general, old settings will be removed.
recovery.signal and standby.signal are automatically created based on the recovery settings.
The additional details include databases that can be used for selective restore and a list of tablespaces and symlinks with their default destinations.
This information is not included in the JSON output because it requires reading the manifest which is too IO intensive to do for all manifests. We plan to include this information for JSON in a future release.
This restore type automatically adds standby_mode=on to recovery.conf.
This could be accomplished previously by setting --recovery-option=standby_mode=on but PostgreSQL 12 requires standby mode to be enabled by a special file named standby.signal.
The new restore type allows us to maintain a common interface between PostgreSQL versions.
We haven't had the time to complete this documentation and it has suffered bit rot.
This prevents us from building the docs on PostgreSQL >= 11 so just comment it all out until it can be updated.
For the most part this is a direct migration of the Perl code into C.
There is one important behavioral change with regard to how file permissions are handled. The Perl code tried to set ownership as it was in the manifest even when running as an unprivileged user. This usually just led to errors and frustration.
The C code works like this:
If a restore is run as a non-root user (the typical scenario) then all files restored will belong to the user/group executing pgBackRest. If existing files are not owned by the executing user/group then an error will result if the ownership cannot be updated to the executing user/group. In that case the file ownership will need to be updated by a privileged user before the restore can be retried.
If a restore is run as the root user then pgBackRest will attempt to recreate the ownership recorded in the manifest when the backup was made. Only user/group names are stored in the manifest so the same names must exist on the restore host for this to work. If the user/group name cannot be found locally then the user/group of the PostgreSQL data directory will be used and finally root if the data directory user/group cannot be mapped to a name.
Reviewed by Cynthia Shang.
286a106a updated the documentation to build pgBackRest as an unprivileged user, but the wget command was missed. This command is not actually run, just displayed, because the release is not yet available when the documentation is built.
Update the wget command to run as the local user.
The {[os-type-is-centos]} expression was missing parens which meant "and" expressions built on it would always evaluate true if the os-type was centos6.
Reported by Joe Ayers, John Harvey.
pgBackRest was being built by root in the documentation which is definitely not best practice.
Instead build as the unprivileged default container user. Sudo privileges are still required to install.
Suggested by Laurenz Albe.
Implement switch WAL and archive check in C but leave the rest in Perl for now.
The main idea was to have some real integration tests for the new database code so the rest of the migration can wait.
Reviewed by Cynthia Shang.
This direct interface to libpq allows simple queries to be run against PostgreSQL and supports timeouts.
Testing is performed using a shim that can use scripted responses to test all aspects of the client code. The shim will be very useful for testing backup scenarios on complex topologies.
Reviewed by Cynthia Shang.
Maintaining the storage layer/drivers in two languages is burdensome. Since the integration tests require the Perl storage layer/drivers we'll need them even after the core code is migrated to C. Create an interface layer so the Perl code can be removed and new storage drivers/features introduced without adding Perl equivalents.
The goal is to move the integration tests to C so this interface will eventually be removed. That being the case, the interface was designed for maximum compatibility to ease the transition. The result looks a bit hacky but we'll improve it as needed until it can be retired.
This variable needs to be replaced right before being used without being added to the cache since the host repo path will vary from system to system.
This is frankly a bit of a hack to get the documentation to build in the Debian packages for the upcoming release. We'll need to come up with something more flexible going forward.
The documentation was using wal_level=hot_standby which is a deprecated setting.
Also remove the reference to wal_level=archive since it is no longer supported and is not recommended for older versions.
Suggested by Patrick McLaughlin.
Allows listing repo paths/files from the command-line, to be used primarily for testing and debugging.
This command is internal-only so the interface may change at any time without notice.
The documentation was relying on a ScalityS3 container built for testing which wasn't very transparent. Instead, use the stock minio container and configure it in the documentation.
Also, install certificates and CA so that TLS verification can be enabled.
It would be better if the documentation could be generated on multiple operating systems all in one go, but the doc system currently does not allow vars to be changed once they are set.
The solution is to run the docs for each required OS and stitch the documentation together. It's not pretty but it works and the automation in release.pl should at least make it easy to use.
Use autoconf to provide a basic configure script. WITH_BACKTRACE is yet to be migrated to configure and the unit tests still use a custom Makefile.
Each C file must include "build.auto.conf" before all other includes and defines. This is enforced by test.pl for includes, but it won't detect incorrect define ordering.
Update packages to call configure and use standard flags to pass options.