Explicitly check whether the "repmgr node rejoin" command on the
demotion candidate succeeded. Due to the way SSH execution is
currently implemented, we can return either the command execution
status or the command output; to ensure any errors are available,
log them to a temporary file on the demotion candidate and note
its location in case of an error.
While we're at it, improve error message handling when the demotion
candidate fails to rejoin.
Unfortunately it hasn't been possible yet to include all available
configuration items in the main documentation, but we should at least
make it easier to find the full list.
If a PostgreSQL instance was shut down while repmgrd was running, and
repmgrd was subsequently restarted (this chain of events could occur
during e.g. a server reboot), the node record will have been set to
"inactive". Previously, in this case repmgrd would refuse to start up.
However, as we can determine the node is running, it should normally
be no problem to automatically set the node record to "active".
The old behaviour can be restored by setting the new parameter
"repmgrd_exit_on_inactive_node" to "true".
RM19604.
As it's possible to specify the connection information for any available
node, but currently not possible to rejoin to a node other than the
primary, explicitly mention what the rejoin target will be.
Previously, repmgr would forcibly change the permissions on a data
directory to 0700. However from PostgreSQL 11, 0750 is also valid,
so that value should not be changed.
If neither the local node nor the upstream are available, and
"standby_disconnect_on_failover" is set, attempting to fetch
the walreceiver PID will result in repmgrd terminating.
Add a check that the connection is valid before attempting to
fetch the walreceiver PID.
Addresses GitHub #675.
This overrides the equivalent setting in repmgr.conf, if present.
Note this option was available in repmgr versions prior to 4.0, but
was assumed to be redundant. However recently a use-case was made
for its reintroduction.
From PostgreSQL 13, pg_rewind will automatically handle an unclean
shutdown itself, so as long as --force-rewind was provided, so there
is no need to fail with an error.
Note that pg_rewind handles the unclean shutdown by starting PostgreSQL
in single user mode, which it does before performing any checks as
to whether a rewind is actually necessary.
However pg_rewind doesn't take into account the possible presence
of a standby.signal file, so we remove that and recreate it after
pg_rewind was executed.
By setting --waldir in "pg_basebackup_options", standbys cloned using
pg_basebackup would have their WAL directory set to the specified
location and symlinked from the data directory.
This commit causes repmgr to honour that setting even when cloning
from Barman.
As of PostgreSQL 13, changes to the fundamental replication
configuration can be applied with a simple SIGHUP, no restart
required.
In case the old behaviour is desired, i.e. a full restart to apply
the configuration changes, the new configuration parameter
"standby_follow_restart" can be set. This parameter has no effect
in PostgreSQL 12 and earlier.
In certain corner cases, it's possible repmgrd may end up monitoring
a standby which was a former primary, but the node record has not
yet been updated.
Previously repmgrd would abort the promotion with a cryptic message
about being unable to find a node record for node_id -1 (the
default value for an unknown node id).
This commit addes a new configuration option "always_promote", which
determines whether repmgrd should promote the node in this case.
The default is "false", to effectively maintain the existing behaviour.
Logging output has also been improved to make it clearer what has
happened when this situation occurs.