Previously, if an issue was encountered with the old primary, but user
provided -F/--force to have repmgr promote the standby anyway, repmgr
would exit with the log message "STANDBY SWITCHOVER is complete"
and exit code 0 (SUCCESS).
To better report this partial completion, repmgr will now emit the message
"STANDBY SWITCHOVER has completed with issues" (and a HINT to check preceding
log messages) and new exit code 22 (ERR_SWITCHOVER_INCOMPLETE).
If logging output not explicitly rediretced with "-l" in the pg_ctl
options, repmgr would hang waiting for pg_ctl output.
Note that we recommend using the OS-level service commands where
available.
It's possible that a node was registered with "use_replication_slots=false"
but that was later changed to "use_replication_slots=true". If the node
was not subsequently re-registered, the node record will contain an empty
slot name, which will cause any slot creation operation during
"standby follow" or "node rejoin" to fail.
To prevent this happening, check for an empty slot name and automatically
set before proceeding.
Addresses GitHub #343.
The check was being carried out regardless of whether --copy-external-config-files
was specified, which means cloning will fail if no SSH connection is available.
Addresses GitHub #342
The standby may not always be available for connections right after it's
restarted, so attempting to connect and get the node's upstream record
after the restart may fail. Record is now retrieved before the restart.
Addresses GitHub #333.
This situation can occur in provisioning environments, where a node's
upstream may not exist at the point it's cloned. If replication slots
are in use, we'll need to make sure no attempt is made to create
the replication slot on the designated upstream, as that will end in
tears. We assume the user will be prepared to complete this step manually.
As that's what we really want to know. Also return "UNCLEAN_SHUTDOWN"
if that's the case, rather than "RUNNING" which is confusing, even
though it's a command for internal use.
Specifically state which server is being promoted; this is particularly
important when the promotion occurs as part of a series of other operations,
e.g. "standby switchover".
Also no need to disconnect/reconnect while the server is promoted.
In an automatic failover situation, after a standby has been promoted
there's a risk the original primary may become available again before
"standby follow" is issued on another standby node, in which case "standby
follow" will reconnect to the original primary.
As the standby's repmgrd will have received a notification from the new
primary, it will know the primary's ID and can therefore explicitly
direct "standby follow" to follow that primary.