Append "_sql" to the respective function names, as we'll later be
creating equivalent functions which use the replication protocol
so need a way to distinguish between them.
repmgr has always insisted on determining the upstream's data directory
location, which requires superuser permissions (or from PostgreSQL 10,
membership of the default role "pg_read_all_settings").
Knowledge of the data directory location was required to implement rsync
cloning (now deprecated), but with pg_basebackup the minimum permission
requirement is now only a normal user with access to the repmgr metadata
and a user with replication permissions. The ability to determine the
data directory location is only required if the user specifies the
--copy-external-config-files option, which needs to be able to determine
the data directory to work out which configuration files are located
outside it.
This patch makes it possible to clone a standby with minimum
permissions, with appropriate checks for available permissions if
--copy-external-config-files is provided.
Implements part of GitHub #536 and addresses issue raised in #586.
A more generic option name to cover pre- and post-Pg12 replication
configuration methods.
--recovery-conf-only is retained as an alias for backwards
compatibility.
Now we no longer care about the upstream's data directory, and
normally expect to find the data directory in repmgr.conf, we
can just exit with an error in the corner case that no repmgr.conf
is provided and no data directory specified with -D/--pgdata.
In early repmgr versions, this used to be a requirement for cloning
via rsync, and/or as a fallback location if the user didn't supply
a data directory to clone into. However as rsync cloning has been
deprecated, and the data directory must be specified in repmgr.conf,
this is no longer required, and removing it simplifies user privilege
requirements.
Note that it is still possible to explicitly provide a target data
directory with -D/--pgdata, though this is primarily useful for
the niche use case where repmgr is used as a convenience tool to
clone a node which is not intended to become part of a repmgr
cluster.
This is part of the implementation of GitHub #536 for the minimizing
of user privilege requirements.
"repmgr daemon" can be interpreted to mean the commands affect the local
daemon process only. Rename the commands which affect the entire cluster
to "repmgr service ...".
The "repmgr daemon ..." form of the affected commands is retained for backwards
compatibility.
Previously repmgr would emit the "repmgr extension not found on source node"
which depending on context is somewhat misleading, as it may exist
but not be installed, or the user may be attempting to clone from the
wrong database.
While it's technically possible to have a standby follow another
standby while the primary is not available, repmgr will not be able
to update its metadata, which will cause Confusion and Chaos.
Update the documentation to make this clear, and provide a more helpful
error message if this situation occurs. The operation previously
failed anyway, but with an unhelpful message about not being able to
find a node record.
For some reason we were taking the trouble to extract an appliction_name
from the local node's conninfo, but this was being subsequently overwritten
with the node name (which is what we want anyway).
Previously it was being parsed (a step which ensures any "application_name"
set by the caller is changed to the node name), but the original string
was being copied to "primary_conninfo" anyway.
Now the "xlog nomenclature" Pg versions are fading into the past,
rename things related to handling pg_basebackup's -X option
(was: --xlog-method, now: --wal-method) to start with "wal_"
rather than "xlog_".
This is a cosmetic change for code clarity.
Previously, the check was attempting to make replication connections
to the source node, and if these were failing, inferring that
insufficient walsenders were available.
However it's quite likely that the connections are refused due to
insufficient user connection permissions. So before performing
the connection check, query the number of potentially available
walsenders on the source node and compare it with the number
required (either 1 or 2) - if insufficient, exit with error and
hint about increasing "max_wal_senders".
Once we've established sufficient walsenders are available, inability
to connect is most likely related to permissions issues on the source
node.
Make it clearer we're dealing with the upstream node record.
Also avoid "overloading" the upstream record when checking for an
existing record with the same node name; this was not technically
a problem but mildly confusing when reading the code.
Previously repmgr would happily clone from whatever server
it found at the provided source server address. We should
ensure that a standby can only be cloned from a node which
is part of the main replication cluster.
This check fetches a list of nodes from the source server,
connects to the first non-witness server it finds, and
compares the system identifiers of the source node and the
node it has connected to. If there is a mismatch, then the
source server is clearly not part of the main replication
cluster, and is most likely the witness server.
This enables us to better determine whether a node is definitively
attached, definitively not attached, or if it was not possible to
determine the attached state.
It's not attached to the primary per-se, but needs to know what
the current primary is in order to correctly synchronise its
copy of the metadata.
Per GitHub #560.
Previously "repmgr standby switchover" would abort if any node was unreachable,
as that means it was unable to check if repmgrd is running.
However if the node has been marked as inactive in the repmgr metadata, it's
reasonable to assume the node is no longer part of the replication cluster
and does not need to be checked.
If --siblings-follow is not supplied, list all nodes which repmgr considers
to be siblings (this will include the witness server, if in use), and
which will remain attached to the old primary.
In "recovery.conf", the configuration parameter "node_name" is used
as the "application_name" value, which will be truncated by PostgreSQL
to 63 characters (NAMEDATALEN - 1).
repmgr sometimes needs to be able to extract the application name from
pg_stat_replication to determine if a node is connected (e.g. when
executing "repmgr standby register"), so the comparison will fail
if "node_name" exceeds 63 characters.