These indicate:
- the number of visible nodes sharing the current upstream
- the number of nodes on the current upstream
- the total number of nodes in the entire repmgr cluster.
This allows the failover_validation_command to be used to perform
more thorough validations, including cross-referencing external
cluster management state (e.g. if managed by kubernetes).
GitHub #651.
If the provided connection does not have sufficient permission to read
"pg_stat_replication.state", and there is an entry for the node in
"pg_stat_replication", assume it's connected. Finer-grained detection
requires additional user permissions, nothing we can do about that.
Previously the check verifying that a node has connected to its upstream
merely assumed the presence of a record in pg_stat_replication indicates
a successful replication connection. However the record may contain a
state other than "streaming", typically "startup" (which will occur when
a node has diverged from its upstream and will therefore never
transition to "streaming"), which needs to be taken into account when
considering the state of the replication connection to avoid false
positives.
Commit 0574279 set the file permissions to 0600 rather than the user's
umask, but if initdb was executed with -g/--allow-group-access, the
file is maintained with 0640, so we'll just maintain the existing
permssions.
From PostgreSQL 12, the SQL-level function "pg_promote()" can be used
to promote a PostgreSQL instance, however usage is restricted to
superusers and users to whom explicit execution permission for this
function has been granted.
Therefore, if execution permission is not available, fall back to
"pg_ctl promote".
In a few places, replication connections are generated from the
parameters used by existing connections. This has resulted in a
number of similar blocks of code which do more-or-less the same
thing almost but not quite identically. In two cases, the code
omitted to set "dbname=replication", which can cause problems
in some contexts.
These code blocks have now been consolidated into standardized
functions.
This also resolves the issue addressed by GitHub #619.
Make the code previously only used by "standby follow" generally
available - we'll want to use this from "node rejoin" as well.
While we're at it, when reporting failure due to lack of free
replication slots, report the current value of "max_replication_slots".
Enable operations which create or drop replication slots to be carried
out with the minimum necessary user permissions, i.e. a user with the
REPLICATION attribute.
This can be the repmgr user, or a dedicated replication user.
In the latter case, if the dedicated replication user is only
permitted to make replication connections, the streaming
replication protocol is used to create/drop slots.
Implements part of GitHub #536.
Append "_sql" to the respective function names, as we'll later be
creating equivalent functions which use the replication protocol
so need a way to distinguish between them.
repmgr has always insisted on determining the upstream's data directory
location, which requires superuser permissions (or from PostgreSQL 10,
membership of the default role "pg_read_all_settings").
Knowledge of the data directory location was required to implement rsync
cloning (now deprecated), but with pg_basebackup the minimum permission
requirement is now only a normal user with access to the repmgr metadata
and a user with replication permissions. The ability to determine the
data directory location is only required if the user specifies the
--copy-external-config-files option, which needs to be able to determine
the data directory to work out which configuration files are located
outside it.
This patch makes it possible to clone a standby with minimum
permissions, with appropriate checks for available permissions if
--copy-external-config-files is provided.
Implements part of GitHub #536 and addresses issue raised in #586.
From PostgreSQL 10, a member of the default roles "pg_monitor" and/or
"pg_read_all_settings" can read pg_settings without requiring superuser
privileges.
Previously, a hint was being emitted about making the repmgr user a
member of one of those groups, but no check for membership was being
made, meaning the check could only be run by a superuser.
This helps provide a better picture of the state of the cluster, i.e.
making it more obvious whether there's been a timeline divergence.
This also provides infrastructure for further improvements in cluster
status display and diagnosis.
Note this is only available in PostgreSQL 9.6 and later as it relies
on the SQL functions for interrogating pg_control, which can be executed
remotely. As PostgreSQL 9.5 will shortly be the only community-supported
version without these functions, it's not worth the effort of trying
to duplicate their functionality.
Previously repmgr would happily clone from whatever server
it found at the provided source server address. We should
ensure that a standby can only be cloned from a node which
is part of the main replication cluster.
This check fetches a list of nodes from the source server,
connects to the first non-witness server it finds, and
compares the system identifiers of the source node and the
node it has connected to. If there is a mismatch, then the
source server is clearly not part of the main replication
cluster, and is most likely the witness server.
This enables us to better determine whether a node is definitively
attached, definitively not attached, or if it was not possible to
determine the attached state.
This functionality enables repmgrd (when running on the primary) to
monitor connected child nodes. It will log connections and disconnections
and generate events.
Additionally, repmgrd can execute a custom script if the number of connected
child nodes falls below a configurable threshold. This script can be used
e.g. to "fence" the primary following a failover situation where a new primary
has been promoted and all standbys are now child nodes of that primary.