In early repmgr versions, this used to be a requirement for cloning
via rsync, and/or as a fallback location if the user didn't supply
a data directory to clone into. However as rsync cloning has been
deprecated, and the data directory must be specified in repmgr.conf,
this is no longer required, and removing it simplifies user privilege
requirements.
Note that it is still possible to explicitly provide a target data
directory with -D/--pgdata, though this is primarily useful for
the niche use case where repmgr is used as a convenience tool to
clone a node which is not intended to become part of a repmgr
cluster.
This is part of the implementation of GitHub #536 for the minimizing
of user privilege requirements.
repmgrd has a check to see if the upstream node has unexpectedly
changed, e.g. if the repmgrd service is paused and the PostgreSQL
instance has been pointed to another node.
However this check was relying on the node record on the local node
being up-to-date, which may not be the case immediately after a
failover, when the node is still replaying records updated prior
to the node's own record being updated. In this case it will
mistakenly assume the node is following the original primary
and attempt to restart monitoring, which will fail as the original
primary is no longer available.
To prevent this, we check against the node's record on the upstream
node.
Addresses issue noted in GitHub #587 and #588.
This makes it possible to return log output when executing repmgr
remotely at a different level to the one defined in the remote
repmgr's repmgr.conf.
This is particularly useful when DEBUG output is required.
"repmgr daemon" can be interpreted to mean the commands affect the local
daemon process only. Rename the commands which affect the entire cluster
to "repmgr service ...".
The "repmgr daemon ..." form of the affected commands is retained for backwards
compatibility.
Previously repmgr would emit the "repmgr extension not found on source node"
which depending on context is somewhat misleading, as it may exist
but not be installed, or the user may be attempting to clone from the
wrong database.
From PostgreSQL 10, a member of the default roles "pg_monitor" and/or
"pg_read_all_settings" can read pg_settings without requiring superuser
privileges.
Previously, a hint was being emitted about making the repmgr user a
member of one of those groups, but no check for membership was being
made, meaning the check could only be run by a superuser.
Previously, repmgr was using a very simple ad-hoc string-based parser,
which had various limitations and allowed configuration files to be
created in a way which could cause confusion and/or unexpected
behaviour.
For example, it accepted strings enclosed in single quotes, but treated
strings enclosed in double quotes literally. A node_name defined thusly:
node_name="somenode"
would result in the literal value '"somenode"' being used, which could
lead to unobvious errors along the lines of:
no record found for ""somenode""
The configuration file parser has been adapted from the one used by
PostgreSQL itself, so behaves more-or-less identically (though some
functions such as file inclusion are not supported in repmgr).
This makes configuration parsing more robust and consistent;
additionally, error reporting will be more precise.
Note this does mean that some repmgr.conf items previously accepted
as valid by repmgr will now be rejected; in particular this includes
strings containing spaces which are not enclosed in single quotes.
This reverts commit c6ca183247.
Backing out this patch for now as the Debian build system doesn't
seem to like it, even though it builds just fine on Debian itself.
Previously, repmgr was using a very simple ad-hoc string-based parser,
which had various limitations and allowed configuration files to be
created in a way which could cause confusion and/or unexpected
behaviour.
For example, it accepted strings enclosed in single quotes, but treated
strings enclosed in double quotes literally. A node_name defined thusly:
node_name="somenode"
would result in the literal value '"somenode"' being used, which could
lead to unobvious errors along the lines of:
no record found for ""somenode""
The configuration file parser has been adapted from the one used by
PostgreSQL itself, so behaves more-or-less identically (though some
functions such as file inclusion are not supported in repmgr).
This makes configuration parsing more robust and consistent;
additionally, error reporting will be more precise.
Note this does mean that some repmgr.conf items previously accepted
as valid by repmgr will now be rejected; in particular this includes
strings containing spaces which are not enclosed in single quotes.
While it's technically possible to have a standby follow another
standby while the primary is not available, repmgr will not be able
to update its metadata, which will cause Confusion and Chaos.
Update the documentation to make this clear, and provide a more helpful
error message if this situation occurs. The operation previously
failed anyway, but with an unhelpful message about not being able to
find a node record.
This helps provide a better picture of the state of the cluster, i.e.
making it more obvious whether there's been a timeline divergence.
This also provides infrastructure for further improvements in cluster
status display and diagnosis.
Note this is only available in PostgreSQL 9.6 and later as it relies
on the SQL functions for interrogating pg_control, which can be executed
remotely. As PostgreSQL 9.5 will shortly be the only community-supported
version without these functions, it's not worth the effort of trying
to duplicate their functionality.
Previously repmgr would happily clone from whatever server
it found at the provided source server address. We should
ensure that a standby can only be cloned from a node which
is part of the main replication cluster.
This check fetches a list of nodes from the source server,
connects to the first non-witness server it finds, and
compares the system identifiers of the source node and the
node it has connected to. If there is a mismatch, then the
source server is clearly not part of the main replication
cluster, and is most likely the witness server.
When showing node information, check if the node's copy of its
record shows a different upstream to the one expected according
to the node where the command is executed.
This helps visualise situations where the cluster is in an
unexpected state, and provide a better idea of the actual state.
For example, if a cluster has divided somehow and a set of nodes are
following a new primary, when running "cluster show" etc., repmgr
will now show the name of the primary those nodes are actually
following, rather than the now outdated node name recorded
on the other side of the split. A warning will also be issued
about the situation.
In Barman mode, if there is an existing, populated data directory, and
the "--force" option is provided, the entire directory was being deleted,
and later recreated as part of the rsync process, but with the default
permissions.
Fix this by recreating the data directory with the correct permissions
after deleting it.
This enable selection of the method repmgrd uses to check whether the upstream
node is available. Possible values are:
- "ping" (default): uses PQping() to check server availability
- "connection": executes a query on the connection to check server
availability (similar to repmgr3.x).
With long-running copy operations, it's possible the connection(s) to
the primary/source server may go away for some reason, so recheck
their availability before attempting to reuse.
We only want to check the status of physical replication slots
to determine whether a streaming replication standby has become
detached and there is therefore a risk of uncontrolled WAL buildup
on the local node.
It's not feasible to second-guess the state of logical replication
slots.
If, for whatever reason, repmgrd is not running on a node, but that
node qualifies as promotion candidate, failover will not take place
as that node will never promote itself.
We therefore discount nodes where repmgrd is running as promotion
candidates, which will ensure one node is always promoted.
There is a slight risk here that the node(s) where repmgrd is not running
are further ahead, leading to a timeline fork. It might be possible
to mitigate that by having the "election" leader perform the promote
(or follow) operation.
Previously, if the witness server connection details were provided
to "repmgr witness register" rather than those of the primary server,
repmgr a) write the node record to the witness server rather than
the primary, and b) would loop indefinitely trying to copy the
node table to itself.
Addresses GitHub #538.
Ensure repmgr checks the standby (promotion candidate) is currently
attached to the primary (demotion candidate).
Addresses issue reported in GitHub #519.
Immediately after the demotion candidate (primary) has shut down, we can't
be absolutely sure that the walreceiver has flushed all WAL to disk, so
checking pg_last_wal_receive_lsn() at that point might not reflect
the actual last available WAL location.
To handle this, we'll loop for a while (timeout controlled by configuration
parameter "wal_receive_check_timeout") before finally deciding whether
the standby is still behind the shut-down primary.
Addresses issue raised in GitHub #518.