When cloning a standby, it's possible to do a "raw" clone by providing
-D/--data-directory but no repmgr.conf file. However, the code which
creates "standby.signal" assumed the presence of a valid
repmgr.conf complete with a "data_directory" setting.
This is very much a niche use case.
This overrides the equivalent setting in repmgr.conf, if present.
Note that this option was available in repmgr versions prior to 4.0, but
was assumed to be redundant. However, a use case has recently been made
for its reintroduction.
From PostgreSQL 13, pg_rewind will automatically handle an unclean
shutdown itself, so as long as --force-rewind was provided, there
is no need to fail with an error.
Note that pg_rewind handles the unclean shutdown by starting PostgreSQL
in single-user mode, which it does before performing any checks as
to whether a rewind is actually necessary.
However, pg_rewind doesn't take into account the possible presence
of a standby.signal file, so we remove that file and recreate it after
pg_rewind has been executed.
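
The remove-and-recreate step can be sketched roughly as follows. This is
an illustration in Python, not repmgr's actual C code; the helper name
standby_signal_removed() is hypothetical, and the sketch assumes it is
enough to recreate an empty file, since PostgreSQL only checks for the
presence of standby.signal.

```python
import os
from contextlib import contextmanager


@contextmanager
def standby_signal_removed(data_directory):
    """Temporarily remove standby.signal (hypothetical helper).

    Yields True if the file existed and was removed. On exit the file is
    recreated, but only if it was present in the first place.
    """
    signal_path = os.path.join(data_directory, "standby.signal")
    existed = os.path.exists(signal_path)
    if existed:
        os.remove(signal_path)
    try:
        yield existed
    finally:
        if existed:
            # Recreate an empty marker file; only its presence matters.
            with open(signal_path, "w"):
                pass
```

The pg_rewind invocation would go inside the "with" block, so the file is
restored even if the rewind raises an error.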
If two diverged nodes are on the same timeline, there's currently
no way of establishing the divergence point, and pg_rewind
is ineffective.
Update the log messages to make this clear.
Previously, the check verifying that a node has connected to its upstream
merely assumed that the presence of a record in pg_stat_replication
indicates a successful replication connection. However, the record may
contain a state other than "streaming", typically "startup", which will
occur when a node has diverged from its upstream and will therefore never
transition to "streaming". The state needs to be taken into account when
considering the state of the replication connection, to avoid false
positives.
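
A minimal sketch of the stricter check, in Python rather than repmgr's C.
It assumes the rows have already been fetched from pg_stat_replication as
dictionaries, and that a node is identified by its "application_name";
the function name is_attached() is hypothetical.

```python
def is_attached(rows, node_name):
    """Return True only if node_name has a record in state "streaming".

    rows: iterable of dicts with "application_name" and "state" keys,
    as might be read from pg_stat_replication (assumed shape).
    """
    for row in rows:
        if row["application_name"] == node_name:
            # A record alone is not enough: a diverged node can sit in
            # "startup" indefinitely without ever streaming.
            return row["state"] == "streaming"
    # No record at all: the node is not connected.
    return False
```

With this check, a record in state "startup" is reported as not attached
instead of being counted as a successful connection.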
This implements storing the configuration file parameter definitions in
an iterable list. This will replace the existing way of populating the
configuration struct, which is a long and cumbersome if/else structure,
and will make it possible to later dump the imported configuration.
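
The general idea can be illustrated as below. This is a Python sketch of
the pattern only, not the C structures used in repmgr; ParamDef,
PARAM_DEFS, load_config() and dump_config() are hypothetical names, and
the defaults shown are placeholders.

```python
from dataclasses import dataclass
from typing import Any, Callable


def parse_bool(value):
    """Interpret common boolean spellings (simplified for the sketch)."""
    return value.strip().lower() in {"1", "true", "on", "yes"}


@dataclass
class ParamDef:
    name: str                      # parameter name as it appears in the file
    parser: Callable[[str], Any]   # converts the raw string value
    default: Any                   # value used when the parameter is absent


# One entry per parameter replaces one branch of the if/else chain.
PARAM_DEFS = [
    ParamDef("node_id", int, None),
    ParamDef("node_name", str, None),
    ParamDef("data_directory", str, None),
    ParamDef("use_replication_slots", parse_bool, False),
]


def load_config(raw):
    """Build the configuration from raw name/value string pairs."""
    config = {d.name: d.default for d in PARAM_DEFS}
    for d in PARAM_DEFS:
        if d.name in raw:
            config[d.name] = d.parser(raw[d.name])
    return config


def dump_config(config):
    """Dumping becomes a simple walk over the same list."""
    return [f"{d.name} = {config[d.name]!r}" for d in PARAM_DEFS]
```

Because parsing and dumping both iterate over the same list, adding a
parameter means adding one definition rather than touching several code
paths.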
We have a --downstream option to check for attached nodes, but it
would be useful to have a corresponding --upstream option too.
A subsequent patch will adapt the behaviour of this option when executed
on the primary node.
This is mainly useful for the --data-directory-config option, which
requires permission to read pg_settings to verify that the data
directory configured in "repmgr.conf" matches the data directory
actually in use.
If pg_settings read permission is not available, repmgr will fall
back to a simple check that the data directory configured in
"repmgr.conf" is a valid PostgreSQL directory. This is not entirely
foolproof, as it's possible PostgreSQL could be using a different
data directory.
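
A rough sketch of what such a fallback check might look like, in Python.
It assumes the presence of a "PG_VERSION" file is used as the marker of a
PostgreSQL data directory; the function name is hypothetical and the real
check may differ.

```python
import os


def looks_like_pg_data_directory(path):
    """Heuristic: an existing directory which contains a PG_VERSION file.

    This only shows the directory was initialised by PostgreSQL; it does
    not prove that the running instance is actually using it.
    """
    return os.path.isdir(path) and os.path.isfile(
        os.path.join(path, "PG_VERSION")
    )
```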
In a few places, replication connections are generated from the
parameters used by existing connections. This has resulted in a
number of similar blocks of code which do more or less the same
thing, but not quite identically. In two cases, the code
omitted to set "dbname=replication", which can cause problems
in some contexts.
These code blocks have now been consolidated into standardized
functions.
This also resolves the issue addressed by GitHub #619.
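
The consolidation can be pictured with the following Python sketch, which
is not the C code in repmgr. The function name is hypothetical, and the
"replication" parameter set alongside "dbname" is an assumption about
what a replication connection needs beyond what is stated above.

```python
def replication_conn_params(params):
    """Derive replication connection parameters from an existing
    connection's parameters (hypothetical helper).

    The input mapping is left unmodified.
    """
    # Carry over host, port, user etc., but never the original dbname.
    repl = {k: v for k, v in params.items() if k != "dbname"}
    # Always set explicitly, so no caller can forget it.
    repl["dbname"] = "replication"
    # Assumption: request a physical replication connection.
    repl["replication"] = "1"
    return repl
```

Having a single function responsible for this means every caller gets
"dbname=replication", rather than relying on each code block to remember
to set it.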
Within a PostgreSQL data directory, all files should have the same
ownership as the data directory itself. PostgreSQL itself expects
this, and ownership of files by another user is likely to cause
problems.
In PostgreSQL 11 and earlier, if "recovery.conf" cannot be moved
by PostgreSQL (e.g. because it is owned by root), it will not be
possible to promote the standby to primary.
In PostgreSQL 12 and later, if "postgresql.auto.conf" on the demotion
candidate (current primary) has incorrect ownership (e.g. owned by
root), repmgr will very likely not be able to modify this file and
write the replication configuration required for the node to rejoin
the cluster as a standby.
Checks have been added to catch both cases before a switchover is executed.
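
An ownership check along these lines could be sketched as follows. This
is Python for illustration only; the function name is hypothetical, and
the sketch assumes it is sufficient to compare each file's owner with the
owner of the data directory itself.

```python
import os


def files_with_wrong_owner(data_directory, filenames):
    """Return the names of existing files whose owner differs from the
    owner of the data directory (hypothetical helper)."""
    expected_uid = os.stat(data_directory).st_uid
    mismatched = []
    for name in filenames:
        path = os.path.join(data_directory, name)
        # Files which don't exist are simply skipped.
        if os.path.exists(path) and os.stat(path).st_uid != expected_uid:
            mismatched.append(name)
    return mismatched
```

Before a switchover, this might be called with ["recovery.conf"] on the
promotion candidate (PostgreSQL 11 and earlier) or with
["postgresql.auto.conf"] on the demotion candidate (PostgreSQL 12 and
later); a non-empty result would abort the operation.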
This is for back branches, to prevent them from running against newer
PostgreSQL versions with which they are not compatible, for example
4.4.x with PostgreSQL 12 and later.
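
As an illustration, an upper bound can be added next to an existing
minimum version check. The Python sketch below is not the repmgr code;
the function and parameter names are hypothetical, and versions are
assumed to be in the integer "server_version_num" format (for example
120000 for PostgreSQL 12).

```python
def check_server_version(server_version_num, min_version_num,
                         first_unsupported_version_num=None):
    """Return None if the server version is supported, otherwise an
    error message. The upper bound is exclusive and optional."""
    if server_version_num < min_version_num:
        return (f"server version {server_version_num} is older than "
                f"the minimum supported version {min_version_num}")
    if (first_unsupported_version_num is not None
            and server_version_num >= first_unsupported_version_num):
        return (f"server version {server_version_num} is not supported "
                f"by this release; a newer release is required")
    return None
```

A back branch would set the upper bound (120000 in the 4.4.x example
above), while the current development branch would leave it unset.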
Make the code previously only used by "standby follow" generally
available, as we'll want to use this from "node rejoin" as well.
While we're at it, when reporting failure due to lack of free
replication slots, report the current value of "max_replication_slots".