Having files satisfy both conditions seems somewhat awkward, as users
would usually choose either the number of generations to keep or the
number of days to keep the files. Hence deletion of a backup is now
bypassed only when both parameters are set to infinite.
At the same time, correct some typos and inaccuracies in the deletion
code.
Backups from standbys should use a method based on the replication
protocol, similar to what pg_basebackup does, as pg_start_backup and
pg_stop_backup cannot be used on a standby. As I am not sure what the
right approach would be, it is better for the time being to block
backups taken from a standby. This does not penalize functionality
much, as taking disk snapshots is not forbidden, and a user can still
recover from those. This commit also removes some home-made functions
that created custom backup label files; this was not reliable,
especially if the Postgres core format for this file changes across
versions. Removing them will at least prevent some bugs.
The WAL segment file name was generated using XLogFileName from
xlog_internal.h, which takes an XLogSegNo and not an XLogRecPtr as the
previous code assumed. This led to incorrect backups, actually copying
too many WAL files in the archive code path, because the analysis was
based on a completely wrong file name. This commit also fixes an issue
in search_next_wal where the function could loop for too long,
consuming a lot of CPU when looking for the next WAL file.
Regression tests are passing cleanly with this patch.
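To illustrate the segment number versus record pointer distinction, here
is a minimal C sketch that mimics XLByteToSeg and XLogFileName instead of
including the real xlog_internal.h. It assumes the default 16MB segment
size, and the helper names lsn_to_segno and wal_file_name are
hypothetical. The LSN 1/4000090 falls in segment 260 on timeline 1, so
the correct name is 000000010000000100000004, while passing the LSN
itself as if it were a segment number yields the bogus
000000010104000000000090.

```c
#include <stdint.h>
#include <stdio.h>

/* Simplified stand-ins for the PostgreSQL 9.3 WAL types. */
typedef uint64_t XLogRecPtr;  /* WAL location (LSN) */
typedef uint64_t XLogSegNo;   /* WAL segment number */
typedef uint32_t TimeLineID;

/* Assumption: default 16MB WAL segments. */
#define XLOG_SEG_SIZE (16 * 1024 * 1024)
#define XLOG_SEGMENTS_PER_XLOGID (UINT64_C(0x100000000) / XLOG_SEG_SIZE)
#define WAL_FNAME_LEN 25  /* 24 hex digits + terminating NUL */

/* Like XLByteToSeg: map an LSN to the number of the segment holding it. */
static XLogSegNo
lsn_to_segno(XLogRecPtr lsn)
{
    return lsn / XLOG_SEG_SIZE;
}

/* Like XLogFileName: the third argument is a segment number, not an LSN. */
static void
wal_file_name(char *fname, TimeLineID tli, XLogSegNo segno)
{
    snprintf(fname, WAL_FNAME_LEN, "%08X%08X%08X",
             (unsigned) tli,
             (unsigned) (segno / XLOG_SEGMENTS_PER_XLOGID),
             (unsigned) (segno % XLOG_SEGMENTS_PER_XLOGID));
}
```

Calling wal_file_name(fname, 1, lsn_to_segno(lsn)) with a buffer of
WAL_FNAME_LEN bytes gives the correct name; skipping the lsn_to_segno
step reproduces the kind of wrong name described above.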
This commit makes the presence of a full backup mandatory when taking
an incremental or archive backup on an existing timeline. If there is
none, the process now simply errors out without taking any backup. This
looks like a safer default, as users will be forced to take a full
backup once a recovery has been done.
The database backup code also contained the following condition when
taking an incremental backup:
prev_backup->tli != current.tli
This means that an incremental backup cannot be taken unless a full
backup is already present on the same timeline. The same condition
should also apply to archive backups, but that did not seem to be the
case.
This bug was introduced by some older code. It looks like it will be
necessary to re-create a battery of regression tests to check all those
things automatically, as the former tests contain nothing that tests
archive mode directly.
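The rule can be sketched in C as below. The BackupEntry structure, the
enums and both functions are hypothetical simplifications, not pg_rman's
actual catalog code; the sketch only shows the check itself, which
mirrors the prev_backup->tli != current.tli comparison and applies it to
archive backups as well as incremental ones. It assumes the catalog
array is sorted newest first.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t TimeLineID;

/* Hypothetical, simplified view of a backup catalog entry. */
typedef enum
{
    BACKUP_MODE_ARCHIVE,
    BACKUP_MODE_INCREMENTAL,
    BACKUP_MODE_FULL
} BackupMode;

typedef enum
{
    BACKUP_STATUS_OK,
    BACKUP_STATUS_ERROR
} BackupStatus;

typedef struct
{
    BackupMode   mode;
    BackupStatus status;
    TimeLineID   tli;
} BackupEntry;

/*
 * Return the most recent valid full backup taken on timeline tli,
 * or NULL if there is none. catalog is assumed sorted newest first.
 */
static const BackupEntry *
find_full_backup_on_tli(const BackupEntry *catalog, size_t n, TimeLineID tli)
{
    for (size_t i = 0; i < n; i++)
    {
        const BackupEntry *b = &catalog[i];

        if (b->mode == BACKUP_MODE_FULL &&
            b->status == BACKUP_STATUS_OK &&
            b->tli == tli)
            return b;
    }
    return NULL;
}

/*
 * Same rule for incremental and archive backups: refuse (return 0)
 * unless a valid full backup exists on the current timeline.
 */
static int
can_take_backup(BackupMode mode, const BackupEntry *catalog, size_t n,
                TimeLineID current_tli)
{
    if (mode == BACKUP_MODE_FULL)
        return 1;
    return find_full_backup_on_tli(catalog, n, current_tli) != NULL;
}
```

With only a full backup on timeline 1, both an incremental and an
archive backup are refused once the server has moved to timeline 2,
until a new full backup is taken there.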
Those macros were mainly used in code paths where they did not make
much sense, heavily complicating the code. At the same time, correct
some code comments.
It was unclear what error was being raised at the beginning of the
process. It turns out that it is just necessary to check whether the
running backup is an archive-only backup, and if so, return a NULL file
list before continuing the process. This should be part of some safety
checks, though.
The documentation found on the internet is rather unclear about the role
and the goal of this feature, which looks more like a kludge to work
around the fact that most of the system XLOG functions do not work on
standby nodes. Now that this restriction has been removed by using the
control file to look up the current timeline, this feature is no longer
needed.
The system function used up to now was pg_xlogfile_name_offset, which
cannot be used on a node in recovery, and it was the only available way
to fetch the timeline ID of a backup, either incremental or full. So
instead, read the server's control file and fetch the timeline from it.
This also removes the restriction that prevented a backup from being
taken on a standby node, the next step being to make it possible to
take backups from streams.
It is just troublesome to have to type a subcommand for something that
could be merged into a single table. The output could be made smarter,
though...
Due to the changes to XLogRecPtr in 9.3, older versions of pg_rman are
already incompatible either way, and it is a pain to maintain code
duplicated from past versions of Postgres, so rely as much as possible
on the core structures.
The history file format changed between 9.2 and 9.3 to indicate the WAL
position at which the timeline branched off. In 9.2 the complete WAL
file name was used, while 9.3 uses the WAL record position (like
1/4000090). pg_rman contained a copy of a Postgres core function to
parse the history file that was no longer compatible, leading to errors
related to timelines.
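A minimal sketch of parsing one data line of the 9.3 format, assuming
lines made of the parent timeline ID, a tab, the switch point, a tab and
a free-form reason, read with the same "%u\t%X/%X" pattern the 9.3 core
code uses. The function name parse_history_line is hypothetical, and
error reporting is reduced to a return code.

```c
#include <stdint.h>
#include <stdio.h>

typedef uint32_t TimeLineID;
typedef uint64_t XLogRecPtr;

/*
 * Parse one line of a 9.3-style timeline history file, for example
 *   "1\t1/4000090\tno recovery target specified"
 * into the parent timeline ID and the switch point.
 * Returns 1 on success, 0 for blank lines, comments or malformed lines.
 */
static int
parse_history_line(const char *line, TimeLineID *tli, XLogRecPtr *switchpoint)
{
    unsigned int parent, hi, lo;

    /* Skip leading whitespace; ignore blank lines and comments. */
    while (*line == ' ' || *line == '\t')
        line++;
    if (*line == '\0' || *line == '\n' || *line == '#')
        return 0;

    /* 9.3 writes the switch point as "%X/%X", where 9.2 wrote a file name. */
    if (sscanf(line, "%u\t%X/%X", &parent, &hi, &lo) != 3)
        return 0;

    *tli = parent;
    *switchpoint = ((XLogRecPtr) hi << 32) | lo;
    return 1;
}
```

For the example line above, this yields timeline 1 and the switch point
0x104000090, that is 1/4000090.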
In Postgres 9.3, XLogRecPtr was changed to a single uint64, making the
old structure based on two uint32 fields obsolete. Note that this makes
pg_rman incompatible with PG <= 9.2.
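For illustration, here is the conversion from the pre-9.3 two-field
layout to the 9.3 64-bit value, plus the usual "%X/%X" display format.
OldXLogRecPtr, old_to_new_lsn and format_lsn are names made up for this
sketch; only the field names xlogid and xrecoff come from the pre-9.3
core structure.

```c
#include <stdint.h>
#include <stdio.h>

/* Pre-9.3 layout: a WAL location split into two 32-bit halves. */
typedef struct
{
    uint32_t xlogid;   /* high 32 bits (log file number) */
    uint32_t xrecoff;  /* low 32 bits (byte offset) */
} OldXLogRecPtr;

/* 9.3 and later: a single 64-bit integer. */
typedef uint64_t XLogRecPtr;

/* Combine the two halves into the 64-bit representation. */
static XLogRecPtr
old_to_new_lsn(OldXLogRecPtr old)
{
    return ((XLogRecPtr) old.xlogid << 32) | old.xrecoff;
}

/* Print an LSN the way Postgres displays it, e.g. "1/4000090". */
static void
format_lsn(char *buf, size_t len, XLogRecPtr lsn)
{
    snprintf(buf, len, "%X/%X", (unsigned) (lsn >> 32), (unsigned) lsn);
}
```

With a single integer, comparing or subtracting two WAL positions is
plain arithmetic, which is why the old two-field helpers become
unnecessary.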