This situation can occur in provisioning environments, where a node's
upstream may not exist at the point it's cloned. If replication slots
are in use, we'll need to make sure no attempt is made to create
the replication slot on the designated upstream, as that will end in
tears. We assume the user will be prepared to complete this step manually.
As that's what we really want to know. Also return "UNCLEAN_SHUTDOWN"
if that's the case, rather than "RUNNING" which is confusing, even
though it's a command for internal use.
Specifically state which server is being promoted; this is particularly
important when the promotion occurs as part of a series of other operations,
e.g. "standby switchover".
Also no need to disconnect/reconnect while the server is promoted.
In an automatic failover situation, after a standby has been promoted
there's a risk the original primary may become available again before
"standby follow" is issued on another standby node, in which case "standby
follow" will reconnect to the original primary.
As the standby's repmgrd will have received a notification from the new
primary, it will know the primary's ID and can therefore explicitly
direct "standby follow" to follow that primary.
Also add improved error detection.
Basically in the worst case we want to enable the user to clone a standby
from Barman even if the upstream node is not running/reachable, as long as
the user explicitly provides a string to use for "primary_conninfo".
Before this we were always forcing a restart, which is technically not
a problem but produces some potentially confusing log entries along the
lines:
pg_ctl: PID file "/path/to/postmaster.pid" does not exist
Is server running?
starting server anyway
This is a remnant of the early repmgr days when there were no alternative
mechanisms for ensuring sufficient WAL remains available while cloning a
standby.
The purpose of this setting was to override a check for an (arbitrary)
minimum setting for "wal_keep_segments". As there's no reliable way
of determining a sensible value for this, and improvements in
pg_basebackup mean WALs can be streamed (possibly using a replication
slot) while the backup is in progress, there's no point in keeping
this around.
We will however still emit a warning about setting "wal_keep_segments"
if the configuration doesn't appear to provide any other way of
ensuring WAL is available during/after the cloning process and
"wal_keep_segments" is not set.
There are now too many options to sensibly fit into general --help
output; we'll add separate output for each repmgr command, e.g.
"repmgr node --help".
Previously repmgr would write all the default libpq parameters
into "primary_conninfo" on "standby clone", but not for
"standby follow", which is inconsistent.
For repmgr4 we'll determine that the upstream node's conninfo
must be canonical and contain all required connection parameters,
even if these are available as defaults or environment variables
in the local environment, as those are transient and may not
be available in all environments/situations.
recovery.conf's "primary_conninfo" will be generated using the
upstream's conninfo parameters, except for those specific
to the downstream node. These are:
- "application_name": this will always be set to the
"node_name" of the downstream node
- "passfile" and "servicefile": these, must of course
reference files on the downstream node so will be extracted
from the downstream node's conninfo, if set
"standby follow" was originally co-opted to start up a demoted node;
this functionality is now delegated to "node rejoin", with the core
functionality of "standby follow" implemented as an internal function.