get_superuser_connection() was erroneously using the local node record
to connect to as a superuser, which works when registering the primary
but obviously not when cloning a standby.
Addresses GitHub #380.
This is used for determining a timeout when reconnecting to the standby
after executing the "follow_command". This will normally not need to be
set explicitly, but may be useful in cases where the standby's startup
phase can last longer than usual.
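As a rough illustration of what such a timeout governs, here is a minimal
sketch of a reconnection loop bounded by a configurable number of seconds;
the parameter name, value and connection string are placeholders, not
repmgr's actual configuration or code:

    /* Sketch only: reconnect to the standby, giving up after a configurable
     * number of seconds. Conninfo and timeout value are placeholders. */
    #include <stdio.h>
    #include <unistd.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        int     reconnect_timeout = 60;   /* hypothetical setting, in seconds */
        int     elapsed;
        PGconn *conn = NULL;

        for (elapsed = 0; elapsed < reconnect_timeout; elapsed++)
        {
            conn = PQconnectdb("host=standby-node dbname=repmgr user=repmgr");

            if (PQstatus(conn) == CONNECTION_OK)
                break;

            PQfinish(conn);
            conn = NULL;
            sleep(1);
        }

        if (conn == NULL)
        {
            fprintf(stderr, "standby did not become reachable within %i seconds\n",
                    reconnect_timeout);
            return 1;
        }

        printf("reconnected to standby after %i second(s)\n", elapsed);
        PQfinish(conn);
        return 0;
    }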
Refactor the original code which generates "recovery.conf" to place the
output into a buffer, which can either be output as "recovery.conf"
or copied to a buffer specified by the caller.
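A minimal sketch of the buffer-based approach; function and parameter names
are illustrative only, not repmgr's actual API:

    /* Sketch: build recovery.conf contents in a buffer, which the caller can
     * then write out or copy. Names and values are placeholders. */
    #include <stdio.h>

    #define RECOVERY_BUF_LEN 8192

    static void
    generate_recovery_conf(char *buf, size_t buflen,
                           const char *primary_conninfo, const char *slot_name)
    {
        size_t  len = 0;

        len += snprintf(buf + len, buflen - len, "standby_mode = 'on'\n");
        len += snprintf(buf + len, buflen - len, "primary_conninfo = '%s'\n",
                        primary_conninfo);

        if (slot_name != NULL)
            len += snprintf(buf + len, buflen - len, "primary_slot_name = '%s'\n",
                            slot_name);
    }

    int
    main(void)
    {
        char    recovery_conf[RECOVERY_BUF_LEN] = "";

        generate_recovery_conf(recovery_conf, sizeof(recovery_conf),
                               "host=node1 user=repmgr application_name=node2",
                               "repmgr_slot_2");

        /* The caller can now either write this out as "recovery.conf"
         * or copy it into a buffer it supplied. */
        fputs(recovery_conf, stdout);
        return 0;
    }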
This will generate "recovery.conf" for an existing standby.
The typical use case is a standby cloned manually from an external data
source (e.g. Barman), where "recovery.conf" needs to be created
(and if required a replication slot).
The --dry-run option will check the prerequisites but not actually
create "recovery.conf" or a replication slot.
This requires that the upstream node is running, that a replication
connection can be made, and, if required, that a replication slot can
be created.
Implements GitHub #382.
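Where a replication slot is required, the creation step on the upstream
amounts to something like the following sketch (connection string and slot
name are placeholders, not repmgr's actual values):

    /* Sketch: create a physical replication slot on the upstream via libpq. */
    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("host=upstream-node dbname=repmgr user=repmgr");
        PGresult   *res;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "unable to connect to upstream: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        res = PQexec(conn, "SELECT pg_create_physical_replication_slot('repmgr_slot_2')");

        if (PQresultStatus(res) != PGRES_TUPLES_OK)
            fprintf(stderr, "unable to create replication slot: %s", PQerrorMessage(conn));
        else
            printf("replication slot created\n");

        PQclear(res);
        PQfinish(conn);
        return 0;
    }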
If repmgrd is running in degraded mode on a primary which has been stopped
and then manually brought back online as a standby (e.g. by creating
recovery.conf and starting the server), ensure it not only detects the
change but also automatically updates the node record so it can resume
monitoring the node as a standby.
Previously, repmgrd was looping waiting for the record to be updated
(as is done transparently when executing "repmgr node rejoin"), but
if the record was not updated within the timeout period (e.g. by
"repmgr standby register") it would fail to resume monitoring as a
standby.
It seems reasonable to have repmgrd automatically update the node record,
as this will restore failover capability as quickly as possible. If this
is not desired, then the onus is on the user to shut down repmgrd while
making the desired changes.
Check that it's actually possible for the demotion candidate to attach to
the promotion candidate before executing the switchover.
As with other checks of this nature, there's a faint possibility the
situation could change between the time the check is carried out and
the demotion candidate is restarted to connect to the promotion candidate,
but there's not a lot we can do about that. The main purpose is to
be able to catch existing misconfigurations before anything gets changed.
Implements GitHub #370.
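The essence of such a check is attempting a replication connection from the
demotion candidate to the promotion candidate; the following is a minimal
libpq sketch (host and user are placeholders, and this is not the actual
repmgr code):

    /* Sketch: verify that a physical replication connection can be made
     * to the promotion candidate. Conninfo values are placeholders. */
    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        /* "replication=1" requests a walsender (physical replication) connection */
        PGconn     *conn = PQconnectdb("host=promotion-candidate user=repmgr replication=1");
        PGresult   *res;

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "replication connection failed: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        /* IDENTIFY_SYSTEM confirms we really are talking to a walsender */
        res = PQexec(conn, "IDENTIFY_SYSTEM");
        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            printf("attached to system %s, timeline %s\n",
                   PQgetvalue(res, 0, 0), PQgetvalue(res, 0, 1));

        PQclear(res);
        PQfinish(conn);
        return 0;
    }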
Check that sufficient walsenders will be available on the promotion
candidate, and, if replication slots are in use, check whether enough of
those will be available.
Note these checks can't guarantee that the walsenders/slots will
be available at the appropriate points during the switchover process,
but do ensure that existing configuration problems will be caught.
Implements GitHub #371.
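As a rough illustration of the kind of query involved (a sketch, not the
actual repmgr implementation; the connection string is a placeholder), the
promotion candidate can be asked how many walsenders and replication slots
are still free:

    /* Sketch: query a node for spare walsenders and replication slots. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <libpq-fe.h>

    static int
    single_int_query(PGconn *conn, const char *query)
    {
        PGresult   *res = PQexec(conn, query);
        int         value = -1;

        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            value = atoi(PQgetvalue(res, 0, 0));

        PQclear(res);
        return value;
    }

    int
    main(void)
    {
        PGconn     *conn = PQconnectdb("host=promotion-candidate dbname=repmgr user=repmgr");

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            PQfinish(conn);
            return 1;
        }

        printf("free walsenders: %d\n",
               single_int_query(conn, "SELECT current_setting('max_wal_senders')::int")
               - single_int_query(conn, "SELECT count(*) FROM pg_stat_replication"));

        printf("free replication slots: %d\n",
               single_int_query(conn, "SELECT current_setting('max_replication_slots')::int")
               - single_int_query(conn, "SELECT count(*) FROM pg_replication_slots"));

        PQfinish(conn);
        return 0;
    }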
By checking the PID file in the same way pg_ctl does, we can be pretty
much certain whether the target data directory contains an active
PostgreSQL instance.
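A minimal sketch of that approach, assuming a pg_ctl-style check (read the
first line of "postmaster.pid" and probe the PID with signal 0); this is
not the actual repmgr implementation:

    /* Sketch only: approximate pg_ctl's PID file check. Error handling
     * and paths are simplified. */
    #include <errno.h>
    #include <signal.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>

    /* Return 1 if "postmaster.pid" in data_dir names a live process, 0 if not,
     * -1 if no PID file is present or it cannot be parsed. */
    static int
    data_directory_in_use(const char *data_dir)
    {
        char    pid_file[1024];
        char    line[64];
        FILE   *fp;
        long    pid;

        snprintf(pid_file, sizeof(pid_file), "%s/postmaster.pid", data_dir);

        fp = fopen(pid_file, "r");
        if (fp == NULL)
            return -1;          /* no PID file: almost certainly not running */

        if (fgets(line, sizeof(line), fp) == NULL)
        {
            fclose(fp);
            return -1;
        }
        fclose(fp);

        pid = strtol(line, NULL, 10);
        if (pid <= 0)
            return -1;

        /* Signal 0 checks for existence without sending anything; EPERM
         * still means the process exists (owned by another user). */
        if (kill((pid_t) pid, 0) == 0 || errno == EPERM)
            return 1;

        return 0;
    }

    int
    main(int argc, char **argv)
    {
        const char *data_dir = (argc > 1) ? argv[1] : ".";

        printf("data directory \"%s\": %d\n", data_dir, data_directory_in_use(data_dir));
        return 0;
    }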
"--upstream-node-id", if provided, was not being passed through to
the SQL query executed via the Barman server.
Also modified the query to select the primary node if "--upstream-node-id"
is not provided.
Note: this is a very niche use case.
Attempting to use the contents of pg_control to tell whether the directory
is in use by PostgreSQL can result in false positives; we should use
a check based on the pidfile.
Also change the HINT to indicate a data directory can be overwritten
if -F/--force is provided.
This ensures any connection errors are displayed by default in a
comprehensible, easily reportable way, and saves having to request/filter
DEBUG output.
Implements GitHub #369.
In particular, if running "repmgr cluster show" against a database
without the repmgr metadata, showing the error (rather than just
"no records found" etc.) will provide some clues about the problem.