Previously, repmgr would forcibly change the permissions on a data
directory to 0700. However from PostgreSQL 11, 0750 is also valid,
so that value should not be changed.
This overrides the equivalent setting in repmgr.conf, if present.
Note this option was available in repmgr versions prior to 4.0, but
was assumed to be redundant. However recently a use-case was made
for its reintroduction.
In PostgreSQL 12 and later we need to append replication configuration
to "postgresql.auto.conf" to guarantee it will be read last, and hence
override any preceding replication configuration which may be haunting
the configuration files.
We've been assuming that "postgresql.auto.conf" will always be present,
but at least one corner case has been observed where that was not the
case on the node being cloned from. Moreover it's perfectly acceptable
that this file does not exist (it will be recreated the next time
ALTER SYSTEM is executed), so we should be prepared to handle that case.
In passing, improve handling of more unlikely errors which might be
encountered when processing "postgresql.auto.conf".
From PostgreSQL 13, pg_rewind will automatically handle an unclean
shutdown itself, so as long as --force-rewind was provided, so there
is no need to fail with an error.
Note that pg_rewind handles the unclean shutdown by starting PostgreSQL
in single user mode, which it does before performing any checks as
to whether a rewind is actually necessary.
However pg_rewind doesn't take into account the possible presence
of a standby.signal file, so we remove that and recreate it after
pg_rewind was executed.
By setting --waldir in "pg_basebackup_options", standbys cloned using
pg_basebackup would have their WAL directory set to the specified
location and symlinked from the data directory.
This commit causes repmgr to honour that setting even when cloning
from Barman.
We have a --downstream option to check for attached nodes, but it
would be useful to have a corresponding --upstream option too.
A following patch will adapt the behaviour of this option when executed
on the primary node.
This is mainly useful for the --data-directory-config option, which
requires permission to read pg_settings to verify that the data
directory configured in "repmgr.conf" matches the data directory
actually in use.
If pg_settings read permission is not available, repmgr will fall
back to a simple check that the data directory configured in
"repmgr.conf" is a valid PostgreSQL directory. This is not entirely
foolproof, as it's possible PostgreSQL could be using a different
data directory.
From PostgreSQL 12, the SQL-level function "pg_promote()" can be used
to promote a PostgreSQL instance, however usage is restricted to
superusers and users to whom explicit execution permission for this
function has been granted.
Therefore, if execution permission is not available, fall back to
"pg_ctl promote".
There were some corner cases where "repmgr standby switchover"
would erroneously report a successful switchover, even if the
demotion candidate had not reattached to the promotion candidate.
Also improve the logging in various places to make it clearer
what is happening on which node.
repmgr has always insisted on determining the upstream's data directory
location, which requires superuser permissions (or from PostgreSQL 10,
membership of the default role "pg_read_all_settings").
Knowledge of the data directory location was required to implement rsync
cloning (now deprecated), but with pg_basebackup the minimum permission
requirement is now only a normal user with access to the repmgr metadata
and a user with replication permissions. The ability to determine the
data directory location is only required if the user specifies the
--copy-external-config-files option, which needs to be able to determine
the data directory to work out which configuration files are located
outside it.
This patch makes it possible to clone a standby with minimum
permissions, with appropriate checks for available permissions if
--copy-external-config-files is provided.
Implements part of GitHub #536 and addresses issue raised in #586.
In early repmgr versions, this used to be a requirement for cloning
via rsync, and/or as a fallback location if the user didn't supply
a data directory to clone into. However as rsync cloning has been
deprecated, and the data directory must be specified in repmgr.conf,
this is no longer required, and removing it simplifies user privilege
requirements.
Note that it is still possible to explicitly provide a target data
directory with -D/--pgdata, though this is primarily useful for
the niche use case where repmgr is used as a convenience tool to
clone a node which is not intended to become part of a repmgr
cluster.
This is part of the implementation of GitHub #536 for the minimizing
of user privilege requirements.
repmgrd has a check to see if the upstream node has unexpectedly
changed, e.g. if the repmgrd service is paused and the PostgreSQL
instance has been pointed to another node.
However this check was relying on the node record on the local node
being up-to-date, which may not be the case immediately after a
failover, when the node is still replaying records updated prior
to the node's own record being updated. In this case it will
mistakenly assume the node is following the original primary
and attempt to restart monitoring, which will fail as the original
primary is no longer available.
To prevent this, we check against the node's record on the upstream
node.
Addresses issue noted in GitHub #587 and #588.
This makes it possible to return log output when executing repmgr
remotely at a different level to the one defined in the remote
repmgr's repmgr.conf.
This is particularly useful when DEBUG output is required.
"repmgr daemon" can be interpreted to mean the commands affect the local
daemon process only. Rename the commands which affect the entire cluster
to "repmgr service ...".
The "repmgr daemon ..." form of the affected commands is retained for backwards
compatibility.
Previously repmgr would emit the "repmgr extension not found on source node"
which depending on context is somewhat misleading, as it may exist
but not be installed, or the user may be attempting to clone from the
wrong database.
From PostgreSQL 10, a member of the default roles "pg_monitor" and/or
"pg_read_all_settings" can read pg_settings without requiring superuser
privileges.
Previously, a hint was being emitted about making the repmgr user a
member of one of those groups, but no check for membership was being
made, meaning the check could only be run by a superuser.
Previously, repmgr was using a very simple ad-hoc string-based parser,
which had various limitations and allowed configuration files to be
created in a way which could cause confusion and/or unexpected
behaviour.
For example, it accepted strings enclosed in single quotes, but treated
strings enclosed in double quotes literally. A node_name defined thusly:
node_name="somenode"
would result in the literal value '"somenode"' being used, which could
lead to unobvious errors along the lines of:
no record found for ""somenode""
The configuration file parser has been adapted from the one used by
PostgreSQL itself, so behaves more-or-less identically (though some
functions such as file inclusion are not supported in repmgr).
This makes configuration parsing more robust and consistent;
additionally, error reporting will be more precise.
Note this does mean that some repmgr.conf items previously accepted
as valid by repmgr will now be rejected; in particular this includes
strings containing spaces which are not enclosed in single quotes.
This reverts commit c6ca183247.
Backing out this patch for now as the Debian build system doesn't
seem to like it, even though it builds just fine on Debian itself.
Previously, repmgr was using a very simple ad-hoc string-based parser,
which had various limitations and allowed configuration files to be
created in a way which could cause confusion and/or unexpected
behaviour.
For example, it accepted strings enclosed in single quotes, but treated
strings enclosed in double quotes literally. A node_name defined thusly:
node_name="somenode"
would result in the literal value '"somenode"' being used, which could
lead to unobvious errors along the lines of:
no record found for ""somenode""
The configuration file parser has been adapted from the one used by
PostgreSQL itself, so behaves more-or-less identically (though some
functions such as file inclusion are not supported in repmgr).
This makes configuration parsing more robust and consistent;
additionally, error reporting will be more precise.
Note this does mean that some repmgr.conf items previously accepted
as valid by repmgr will now be rejected; in particular this includes
strings containing spaces which are not enclosed in single quotes.
While it's technically possible to have a standby follow another
standby while the primary is not available, repmgr will not be able
to update its metadata, which will cause Confusion and Chaos.
Update the documentation to make this clear, and provide a more helpful
error message if this situation occurs. The operation previously
failed anyway, but with an unhelpful message about not being able to
find a node record.