When cloning from Barman, and --no-upstream-connection was supplied,
the server version number will not be available at this point in the
code. It will however later be extracted from the Barman metadata,
so move the check for the --waldir pg_basebackup option to after
this point.
Also add an explicit check that a server version number has been
obtained (falling back to extracting it from the cloned data
directory if not), as subsequent operations need to know the version
in order to be performed correctly.
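
The fallback described above can be illustrated with a minimal sketch,
written in Python rather than repmgr's own C. It assumes the cloned
data directory contains the standard PG_VERSION file; the function
name and the conversion to a numeric version are hypothetical and not
taken from repmgr.

```python
import os


def version_num_from_data_dir(data_dir):
    """Derive a server_version_num-style integer from PG_VERSION.

    Hypothetical helper: PG_VERSION holds only the major version
    ("13", or "9.6" for releases before 10), so the minor part is 0.
    """
    with open(os.path.join(data_dir, "PG_VERSION")) as f:
        text = f.read().strip()

    parts = [int(p) for p in text.split(".")]
    if parts[0] >= 10:
        # PostgreSQL 10 and later: single-component major version
        return parts[0] * 10000
    # Before 10: two-component major version, e.g. "9.6"
    return parts[0] * 10000 + parts[1] * 100
```

Because PG_VERSION records only the major version, a value obtained
this way is sufficient for feature checks such as "13 or later", but
not for distinguishing minor releases.
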
From PostgreSQL 13, pg_rewind will automatically handle an unclean
shutdown itself, so as long as --force-rewind was provided, there
is no need to fail with an error.
Note that pg_rewind handles the unclean shutdown by starting PostgreSQL
in single user mode, which it does before performing any checks as
to whether a rewind is actually necessary.
However, pg_rewind doesn't take into account the possible presence
of a standby.signal file, so we remove that and recreate it after
pg_rewind has been executed.
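
The remove-and-recreate step can be sketched as follows. This is an
illustration in Python, not repmgr's C implementation; the function
name is hypothetical, "action" stands in for the pg_rewind invocation,
and recreating the file only if it was originally present (and even
if the action fails) is an assumption of this sketch.

```python
import os


def run_with_standby_signal_removed(data_dir, action):
    """Remove standby.signal (if present), run action(), then put it back."""
    signal_path = os.path.join(data_dir, "standby.signal")
    had_signal = os.path.exists(signal_path)
    if had_signal:
        os.remove(signal_path)
    try:
        # Stand-in for executing pg_rewind against data_dir
        return action()
    finally:
        if had_signal:
            # standby.signal is an empty marker file
            open(signal_path, "w").close()
```
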
When executing "repmgr standby clone" in Barman mode, and --waldir
is set in pg_basebackup options, properly report an error if the
target WAL directory could not be created or is not empty.
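
A minimal sketch of the two failure conditions named above (directory
cannot be created, directory is not empty), in Python for illustration
only. The function name and the message wording are hypothetical and
do not reproduce repmgr's actual output.

```python
import os


def prepare_wal_dir(path):
    """Return None if path is usable as a WAL directory, else an error message."""
    try:
        # Create the directory (and any missing parents) if it does not exist
        os.makedirs(path, exist_ok=True)
    except OSError as exc:
        return f"unable to create WAL directory '{path}': {exc}"

    if os.listdir(path):
        return f"WAL directory '{path}' exists but is not empty"

    return None
```
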
By setting --waldir in "pg_basebackup_options", standbys cloned using
pg_basebackup would have their WAL directory set to the specified
location and symlinked from the data directory.
This commit causes repmgr to honour that setting even when cloning
from Barman.
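
The resulting layout (WAL directory outside the data directory,
symlinked from inside it) can be sketched like this. It is a Python
illustration under two assumptions: the link is named "pg_wal", which
is the WAL directory name in PostgreSQL 10 and later, and no "pg_wal"
entry exists yet in the data directory. The function name is
hypothetical.

```python
import os


def link_wal_dir(data_dir, wal_dir):
    """Symlink <data_dir>/pg_wal to wal_dir and return the link path."""
    link_path = os.path.join(data_dir, "pg_wal")
    # Raises FileExistsError if pg_wal is already present in data_dir
    os.symlink(wal_dir, link_path)
    return link_path
```
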
As of PostgreSQL 13, changes to the fundamental replication
configuration can be applied with a simple SIGHUP, no restart
required.
In case the old behaviour is desired, i.e. a full restart to apply
the configuration changes, the new configuration parameter
"standby_follow_restart" can be set. This parameter has no effect
in PostgreSQL 12 and earlier.
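
The decision described above amounts to the following, shown as a
small Python sketch. The function name and the use of a numeric
version in the style of server_version_num are assumptions made for
illustration; only the "standby_follow_restart" parameter comes from
this commit.

```python
def follow_action(server_version_num, standby_follow_restart=False):
    """Return "reload" or "restart" for applying replication config changes."""
    if server_version_num >= 130000 and not standby_follow_restart:
        # PostgreSQL 13 and later: a SIGHUP is sufficient
        return "reload"
    # PostgreSQL 12 and earlier always restart; 13+ only on request
    return "restart"
```
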
Previously the check verifying that a node has connected to its upstream
merely assumed that the presence of a record in pg_stat_replication
indicates a successful replication connection. However, the record may
contain a state other than "streaming", typically "startup" (which
occurs when a node has diverged from its upstream and will therefore
never transition to "streaming"). This state needs to be taken into
account when evaluating the replication connection, to avoid false
positives.
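
The stricter check can be sketched as below, in Python for
illustration. The query text and function name are hypothetical;
"state" and "application_name" are existing pg_stat_replication
columns, and matching the node by application_name is an assumption
of this sketch.

```python
# Illustrative query; the parameter is the downstream node's name
ATTACHED_QUERY = (
    "SELECT state FROM pg_catalog.pg_stat_replication "
    "WHERE application_name = %s"
)


def is_attached(state):
    """state: pg_stat_replication.state for the node, or None if no row."""
    # A row alone is not enough: "startup" must not count as attached
    return state == "streaming"
```
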
repmgr creates a file with a list of tablespace files to fetch from
Barman; however, the file may not actually have been flushed to disk
at the point the rsync operation is executed, so it may be incomplete
or empty.
Also fix handling of tablespace remapping.
Addresses GitHub #650.
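
The flush problem described above is a general one: buffered output
is not guaranteed to be visible to another process until the writer
flushes or closes the file. A minimal Python sketch of the safe
ordering follows; the function name is hypothetical, and the fsync
call is an extra durability step beyond what a reader on the same
host strictly needs.

```python
import os


def write_file_list(path, entries):
    """Write one entry per line and flush it out before returning."""
    with open(path, "w") as f:
        for entry in entries:
            f.write(entry + "\n")
        f.flush()              # push the userspace buffer to the OS
        os.fsync(f.fileno())   # ask the OS to write it to disk
    # The file is closed here; only now should it be handed to rsync
    return path
```
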
We omitted to do this with the connections used when checking the system
identifier, which means libpq calls by the teardown function using the
pointer risk using unallocated memory.
Addresses issue reported in GitHub #644.
Add a sanity check that repmgr, when remotely executed on the demotion
candidate, is able to connect as superuser. If not, emit a diagnostic
command as a hint.
It's useful to have a confirmation of which database repmgr is trying
to connect to when the -S/--superuser connection is provided.
It will always be the database defined in the repmgr.conf "conninfo"
parameter, but having the name available is useful when e.g.
troubleshooting issues with .pgpass configuration.
Output a command which, when executed on the local node (promotion
candidate) will attempt to remotely connect to the demotion candidate
and display both the connection message encountered and the connection
parameters used.
This is useful for corner cases where the connection normally succeeds
because a particular environment variable (e.g. PGPORT) is set, but
that variable is not set in the environment where SSH is executed.
Explicitly log if a database connection failure caused the check
to fail.
It's unlikely this situation will be encountered, as the data directory
check will already have run and checked for connection failure, however
there's a small chance the connection could fail between checks.
It's possible that the remote data directory check will fail if e.g.
connection configuration is not consistent across all nodes. This
modification ensures a database connection error is reported, rather
than a spurious issue with the data directory configuration.
Commit 0574279 set the file permissions to 0600 rather than the user's
umask, but if initdb was executed with -g/--allow-group-access, the
file is maintained with 0640, so we'll just maintain the existing
permissions.
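
Keeping the existing permissions when a file is rewritten can be
sketched as follows. This is a Python illustration, not the C code
from this commit; the function name and the write-to-temporary-file
approach are assumptions made for the example.

```python
import os
import stat


def rewrite_preserving_mode(path, new_contents):
    """Replace the contents of path, keeping its existing permission bits."""
    # Remember the current mode, e.g. 0600 or 0640
    mode = stat.S_IMODE(os.stat(path).st_mode)

    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(new_contents)

    # An explicit chmod is not affected by the process umask
    os.chmod(tmp_path, mode)
    os.replace(tmp_path, path)
```
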