repmgr node rejoin
rejoin a dormant (stopped) node to the replication cluster
Description
Enables a dormant (stopped) node to be rejoined to the replication cluster.
This can optionally use pg_rewind to re-integrate
a node which has diverged from the rest of the cluster, typically a failed primary.
If the node is running and needs to be attached to the current primary, use
repmgr standby follow.
Note that repmgr standby follow can only be used for standbys which have not diverged
from the rest of the cluster.
Usage
repmgr node rejoin -d '$conninfo'
where $conninfo is the conninfo string of any reachable node in the cluster.
The repmgr.conf file for the stopped node must be supplied explicitly (with -f)
if repmgr cannot otherwise locate it.
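As a minimal sketch of the usage above, the following shell function always passes the stopped node's repmgr.conf explicitly with -f. The function name rejoin_node, the configuration path, and the conninfo values are hypothetical placeholders and not part of repmgr.

```shell
# Hypothetical wrapper around "repmgr node rejoin".
# REPMGR_CONF and TARGET_CONNINFO are assumed example values; adjust to your setup.
REPMGR_CONF="${REPMGR_CONF:-/etc/repmgr.conf}"
TARGET_CONNINFO="${TARGET_CONNINFO:-host=node1 dbname=repmgr user=repmgr}"

rejoin_node() {
    # Extra arguments (for example --dry-run) are passed through unchanged.
    repmgr node rejoin -f "$REPMGR_CONF" -d "$TARGET_CONNINFO" "$@"
}
```

Calling rejoin_node --dry-run would then check the prerequisites without executing the rejoin.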
Options
--dry-run:
Check prerequisites but don't actually execute the rejoin.
--force-rewind[=/path/to/pg_rewind]:
Execute pg_rewind.
It is only necessary to provide the pg_rewind path
if using PostgreSQL 9.3 or 9.4, and pg_rewind
is not installed in the PostgreSQL bin directory.
--config-files:
Comma-separated list of configuration files to retain after
executing pg_rewind.
Currently pg_rewind will overwrite
the local node's configuration files with the files from the source node,
so it's advisable to use this option to ensure they are kept.
--config-archive-dir:
Directory to temporarily store configuration files specified with
--config-files; default: /tmp.
--no-wait:
Don't wait for the node to rejoin the cluster.
If this option is supplied, repmgr will restart the node but
not wait for it to connect to the primary.
Configuration file settings
node_rejoin_timeout:
the maximum length of time (in seconds) to wait for
the node to reconnect to the replication cluster (defaults to
the value set in standby_reconnect_timeout,
60 seconds).
Note that standby_reconnect_timeout must be
set to a value equal to or greater than
node_rejoin_timeout.
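As an illustration, a repmgr.conf excerpt that satisfies this constraint might look like the following. The values are arbitrary examples, not recommendations.

```
# repmgr.conf (excerpt); illustrative values only
node_rejoin_timeout=60          # seconds to wait for the node to reconnect
standby_reconnect_timeout=120   # must be equal to or greater than node_rejoin_timeout
```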
Event notifications
A node_rejoin event notification will be generated.
Exit codes
One of the following exit codes will be emitted by repmgr node rejoin:
The node rejoin succeeded; or, if --dry-run was provided,
no issues were detected which would prevent the node rejoin.
A configuration issue was detected which prevented repmgr from
continuing with the node rejoin.
The node could not be restarted.
The node rejoin operation failed.
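The exit code can be used in scripts to decide whether to proceed. The following is a minimal sketch, assuming only that an exit code of 0 indicates success and any other value indicates one of the failures listed above; the function name checked_rejoin and its arguments are hypothetical.

```shell
# Hypothetical helper: run a dry run first and only attempt the real
# rejoin if the dry run exits with 0.
#   $1: path to the local repmgr.conf
#   $2: conninfo string of a reachable node in the cluster
checked_rejoin() {
    local conf=$1 conninfo=$2 status=0

    repmgr node rejoin -f "$conf" -d "$conninfo" --dry-run || status=$?
    if [ "$status" -ne 0 ]; then
        echo "dry run failed with exit code $status; not attempting rejoin" >&2
        return "$status"
    fi

    repmgr node rejoin -f "$conf" -d "$conninfo" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "node rejoin failed with exit code $status" >&2
    fi
    return "$status"
}
```

The helper returns the exit code of whichever repmgr invocation failed, so a calling script can still distinguish the individual failure cases.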
Notes
Currently repmgr node rejoin can only be used to attach
a standby to the current primary, not another standby.
The node must have been shut down cleanly; if this was not the case, it will
need to be manually started (remove any existing recovery.conf file first)
until it has reached a consistent recovery point, then shut down cleanly.
If PostgreSQL is started in single-user mode and
input is directed from /dev/null, it will perform recovery
then immediately quit, and will then be in a state suitable for use by
pg_rewind.
rm -f /var/lib/pgsql/data/recovery.conf
postgres --single -D /var/lib/pgsql/data/ < /dev/null
repmgr will attempt to verify whether the node can rejoin as-is, or whether
pg_rewind must be used (see following section).
Using pg_rewind
repmgr node rejoin can optionally use pg_rewind to re-integrate a
node which has diverged from the rest of the cluster, typically a failed primary.
pg_rewind is available in PostgreSQL 9.5 and later as part of the core distribution,
and can be installed from external sources for PostgreSQL 9.3 and 9.4.
pg_rewind requires that either
wal_log_hints is enabled, or that
data checksums were enabled when the cluster was initialized. See the
pg_rewind documentation for details.
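Because the node is stopped, these settings cannot be queried over SQL, but they are recorded in the control file. The following is a minimal sketch that reads them with pg_controldata; it assumes pg_controldata is on the PATH and produces English output, and the function name rewind_prereqs_met is hypothetical. Note that the control file reflects the state as of the last checkpoint.

```shell
# Hypothetical check: succeeds if either wal_log_hints was on or data
# checksums are enabled, according to the control file of the data directory.
#   $1: PostgreSQL data directory of the stopped node
rewind_prereqs_met() {
    local datadir=$1 info hints checksums

    # Force the C locale so that the field labels are in English.
    info=$(LC_ALL=C pg_controldata "$datadir") || return 2

    hints=$(printf '%s\n' "$info" |
        sed -n 's/^wal_log_hints setting:[[:space:]]*//p')
    checksums=$(printf '%s\n' "$info" |
        sed -n 's/^Data page checksum version:[[:space:]]*//p')

    # A checksum version of 0 means data checksums are disabled.
    [ "$hints" = "on" ] || [ "${checksums:-0}" != "0" ]
}
```

For example, rewind_prereqs_met /var/lib/pgsql/data && echo "pg_rewind prerequisites look OK" would report on the data directory used elsewhere on this page.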
We strongly recommend familiarizing yourself with pg_rewind before attempting
to use it with repmgr. While it is an extremely useful tool, it is not
a "magic bullet" which can resolve all problematic replication situations.
A typical use-case for pg_rewind is when a scenario like the following
is encountered:
$ repmgr node rejoin -f /etc/repmgr.conf -d 'host=node3 dbname=repmgr user=repmgr' \
--verbose --dry-run
INFO: replication connection to the rejoin target node was successful
INFO: local and rejoin target system identifiers match
DETAIL: system identifier is 6652184002263212600
ERROR: this node cannot attach to rejoin target node 3
DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/610D710
HINT: use --force-rewind to execute pg_rewind
Here, node3 was promoted to a primary while the local node was
still attached to the previous primary; this can potentially happen during e.g. a
network split. pg_rewind can re-sync the local node with node3,
removing the need for a full reclone.
To have repmgr node rejoin use pg_rewind,
pass the command line option --force-rewind, which will tell repmgr
to execute pg_rewind to ensure the node can be rejoined successfully.
Be aware that if pg_rewind is executed and actually performs a
rewind operation, any configuration files in the PostgreSQL data directory will be
overwritten with those from the source server.
To prevent this happening, provide a comma-separated list of files to retain
using the --config-files command line option; the specified files
will be archived in a temporary directory (whose parent directory can be specified with
--config-archive-dir) and restored once the rewind operation is
complete.
The following example first uses --dry-run, then actually executes the
node rejoin command.
$ repmgr node rejoin -f /etc/repmgr.conf -d 'host=node3 dbname=repmgr user=repmgr' \
--config-files=postgresql.local.conf,postgresql.conf --verbose --force-rewind --dry-run
INFO: replication connection to the rejoin target node was successful
INFO: local and rejoin target system identifiers match
DETAIL: system identifier is 6652460429293670710
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 3
DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/610D710
INFO: prerequisites for using pg_rewind are met
INFO: file "postgresql.local.conf" would be copied to "/tmp/repmgr-config-archive-node2/postgresql.local.conf"
INFO: file "postgresql.replication-setup.conf" would be copied to "/tmp/repmgr-config-archive-node2/postgresql.replication-setup.conf"
INFO: pg_rewind would now be executed
DETAIL: pg_rewind command is:
pg_rewind -D '/var/lib/postgresql/data' --source-server='host=node3 dbname=repmgr user=repmgr'
INFO: prerequisites for executing NODE REJOIN are met
If --force-rewind is used with the --dry-run option,
this checks the prerequisites for using pg_rewind, but is
not an absolute guarantee that actually executing pg_rewind
will succeed. See also the section "Caveats when using repmgr node rejoin" below.
$ repmgr node rejoin -f /etc/repmgr.conf -d 'host=node3 dbname=repmgr user=repmgr' \
--config-files=postgresql.local.conf,postgresql.conf --verbose --force-rewind
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 3
DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/610D710
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "pg_rewind -D '/var/lib/postgresql/data' --source-server='host=node3 dbname=repmgr user=repmgr'"
NOTICE: 2 files copied to /var/lib/postgresql/data
NOTICE: setting node 2's upstream to node 3
NOTICE: starting server using "pg_ctl -l /var/log/postgres/startup.log -w -D '/var/lib/pgsql/data' start"
NOTICE: NODE REJOIN successful
DETAIL: node 2 is now attached to node 3
Caveats when using repmgr node rejoin
repmgr node rejoin attempts to determine whether it will succeed by
comparing the timelines and relative WAL positions of the local node (rejoin candidate) and primary
(rejoin target). This is particularly important if planning to use pg_rewind,
which currently (as of PostgreSQL 12) may appear to succeed (or indicate there is no action
needed) but potentially allow an impossible action, such as trying to rejoin a standby to a
primary which is behind the standby. repmgr will prevent this situation from occurring.
Currently it is not possible to detect a situation where the rejoin target
is a standby which has been "promoted" by removing recovery.conf
(PostgreSQL 12 and later: standby.signal) and restarting it.
In this case there will be no information about the point the rejoin target diverged
from the current standby; the rejoin operation will fail and
the current standby's PostgreSQL log will contain entries with the text
"record with incorrect prev-link".
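To confirm this failure mode after an unsuccessful rejoin, the standby's log can be searched for that text. A minimal sketch follows; the function name has_prev_link_error is hypothetical, and the log file location depends on the local logging configuration.

```shell
# Hypothetical check: succeeds if the given PostgreSQL log file contains
# the message associated with this failure mode.
#   $1: path to the standby's PostgreSQL log file (site-specific)
has_prev_link_error() {
    # -F matches the text literally; -q suppresses output.
    grep -qF 'record with incorrect prev-link' "$1"
}
```

This assumes the server logs messages in English; with a different lc_messages setting the text to search for will differ.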
We strongly recommend running repmgr node rejoin with the
--dry-run option first. Additionally it might be a good idea
to execute the pg_rewind command displayed by
repmgr with the pg_rewind
--dry-run option. Note that pg_rewind does not indicate that it
is running in --dry-run mode.
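A minimal sketch of such a manual check, using pg_rewind's own --dry-run option, is shown below. The function name rewind_dry_run is hypothetical; the data directory and conninfo should be taken from the pg_rewind command that repmgr displays.

```shell
# Hypothetical helper: run pg_rewind in dry-run mode, which makes no
# changes to the data directory.
#   $1: local PostgreSQL data directory (the server must be stopped)
#   $2: conninfo string of the rejoin target (source server)
rewind_dry_run() {
    local datadir=$1 source_conninfo=$2
    pg_rewind -D "$datadir" --source-server="$source_conninfo" --dry-run
}
```

For the example on this page, that would be rewind_dry_run /var/lib/postgresql/data 'host=node3 dbname=repmgr user=repmgr', typically run as the operating system user that owns the data directory.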