For some reason we were taking the trouble to extract an appliction_name
from the local node's conninfo, but this was being subsequently overwritten
with the node name (which is what we want anyway).
Previously it was being parsed (a step which ensures any "application_name"
set by the caller is changed to the node name), but the original string
was being copied to "primary_conninfo" anyway.
Note that the next release is intended to be 5.0 to coincide with the
release of PostgreSQL 12; 4.5 is currently a placeholder in case we
need to push out a feature release before then.
Also create an actual event notification for both actions, rather
than just creating the event record.
This is presumably an oversight from the original conversion to
repmgr4 which no-one has noticed before.
Now the "xlog nomenclature" Pg versions are fading into the past,
rename things related to handling pg_basebackup's -X option
(was: --xlog-method, now: --wal-method) to start with "wal_"
rather than "xlog_".
This is a cosmetic change for code clarity.
Previously, the check was attempting to make replication connections
to the source node, and if these were failing, inferring that
insufficient walsenders were available.
However it's quite likely that the connections are refused due to
insufficient user connection permissions. So before performing
the connection check, query the number of potentially available
walsenders on the source node and compare it with the number
required (either 1 or 2) - if insufficient, exit with error and
hint about increasing "max_wal_senders".
Once we've established sufficient walsenders are available, inability
to connect is most likely related to permissions issues on the source
node.
Make it clearer we're dealing with the upstream node record.
Also avoid "overloading" the upstream record when checking for an
existing record with the same node name; this was not technically
a problem but mildly confusing when reading the code.
This helps provide a better picture of the state of the cluster, i.e.
making it more obvious whether there's been a timeline divergence.
This also provides infrastructure for further improvements in cluster
status display and diagnosis.
Note this is only available in PostgreSQL 9.6 and later as it relies
on the SQL functions for interrogating pg_control, which can be executed
remotely. As PostgreSQL 9.5 will shortly be the only community-supported
version without these functions, it's not worth the effort of trying
to duplicate their functionality.
Previously repmgr would happily clone from whatever server
it found at the provided source server address. We should
ensure that a standby can only be cloned from a node which
is part of the main replication cluster.
This check fetches a list of nodes from the source server,
connects to the first non-witness server it finds, and
compares the system identifiers of the source node and the
node it has connected to. If there is a mismatch, then the
source server is clearly not part of the main replication
cluster, and is most likely the witness server.
As the witness server does not, by definition, ever have an entry in pg_stat_replication,
we need to check its "attached" status by connecting to the witness server itself
and querying the reported upstream node ID (which should be set by the witness
server repmgrd). If this matches the current primary node ID, we count it as attached.
This enables us to better determine whether a node is definitively
attached, definitively not attached, or if it was not possible to
determine the attached state.