Pausing repmgrd
In normal operation, repmgrd monitors the state of the
PostgreSQL node it is running on, and will take appropriate action if problems
are detected, e.g. (if so configured) promote the node to primary if the existing
primary has been determined to have failed.
However, repmgrd is unable to distinguish between
planned outages (such as performing a switchover
or installing PostgreSQL maintenance releases) and an actual server outage. In versions prior to
&repmgr; 4.2 it was necessary to stop repmgrd on all nodes (or at least
on all nodes where repmgrd is
configured for automatic failover)
to prevent repmgrd from making unintentional changes to the
replication cluster.
From &repmgr; 4.2, repmgrd
can now be "paused", i.e. instructed not to take any action such as performing a failover.
This can be done from any node in the cluster, removing the need to stop/restart
each repmgrd individually.
For major PostgreSQL upgrades, e.g. from PostgreSQL 10 to PostgreSQL 11,
repmgrd should be shut down completely and only started up
once the &repmgr; packages for the new PostgreSQL major version have been installed.
Prerequisites for pausing repmgrd
In order to be able to pause/unpause repmgrd, the following
prerequisites must be met:
&repmgr; 4.2 or later must be installed on all nodes.
The same major &repmgr; version (e.g. 4.2) must be installed on all nodes (and preferably the same minor version).
PostgreSQL on all nodes must be accessible from the node where the
pause/unpause operation is executed, using the
conninfo string shown by repmgr cluster show.
These conditions are required for normal &repmgr; operation in any case.
Pausing/unpausing repmgrd
To pause repmgrd, execute repmgr daemon pause, e.g.:
$ repmgr -f /etc/repmgr.conf daemon pause
NOTICE: node 1 (node1) paused
NOTICE: node 2 (node2) paused
NOTICE: node 3 (node3) paused
The state of repmgrd on each node can be checked with
repmgr daemon status, e.g.:
$ repmgr -f /etc/repmgr.conf daemon status
ID | Name | Role | Status | repmgrd | PID | Paused?
----+-------+---------+---------+---------+------+---------
1 | node1 | primary | running | running | 7851 | yes
2 | node2 | standby | running | running | 7889 | yes
3 | node3 | standby | running | running | 7918 | yes
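Before starting maintenance, it can be useful to confirm that every node is reported as paused. The following is a minimal sketch, assuming the pipe-separated layout shown above with "Paused?" as the last column; the function name all_nodes_paused is hypothetical and not part of &repmgr;.

```shell
# Hypothetical helper (not part of repmgr): reads "repmgr daemon status"
# output on stdin and succeeds only if every node row shows "yes" in the
# last ("Paused?") column. Assumes the pipe-separated layout shown above.
all_nodes_paused() {
    awk -F'|' '
        # Node rows have a numeric ID in the first column; this skips the
        # header row and the separator line.
        $1 ~ /^[ \t]*[0-9]+[ \t]*$/ {
            rows++
            paused = $NF
            gsub(/[ \t\r]/, "", paused)
            if (paused != "yes") bad++
        }
        # Fail if no node rows were seen or any node is not paused.
        END { exit ((rows > 0 && bad == 0) ? 0 : 1) }
    '
}
```

For example, piping the output of repmgr -f /etc/repmgr.conf daemon status into all_nodes_paused would yield a zero exit status for the table shown above, and a non-zero exit status if any node were listed with "no".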
If executing a switchover with repmgr standby switchover,
&repmgr; will automatically pause/unpause repmgrd as part of the switchover process.
If the primary (in this example, node1) is stopped, repmgrd
running on one of the standbys (here: node2) will react like this:
[2018-09-20 12:22:21] [WARNING] unable to connect to upstream node "node1" (node ID: 1)
[2018-09-20 12:22:21] [INFO] checking state of node 1, 1 of 5 attempts
[2018-09-20 12:22:21] [INFO] sleeping 1 seconds until next reconnection attempt
...
[2018-09-20 12:22:24] [INFO] sleeping 1 seconds until next reconnection attempt
[2018-09-20 12:22:25] [INFO] checking state of node 1, 5 of 5 attempts
[2018-09-20 12:22:25] [WARNING] unable to reconnect to node 1 after 5 attempts
[2018-09-20 12:22:25] [NOTICE] node is paused
[2018-09-20 12:22:33] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in degraded state
[2018-09-20 12:22:33] [DETAIL] repmgrd paused by administrator
[2018-09-20 12:22:33] [HINT] execute "repmgr daemon unpause" to resume normal failover mode
If the primary becomes available again (e.g. following a software upgrade), repmgrd
will automatically reconnect, e.g.:
[2018-09-20 13:12:41] [NOTICE] reconnected to upstream node 1 after 8 seconds, resuming monitoring
To unpause repmgrd, execute repmgr daemon unpause, e.g.:
$ repmgr -f /etc/repmgr.conf daemon unpause
NOTICE: node 1 (node1) unpaused
NOTICE: node 2 (node2) unpaused
NOTICE: node 3 (node3) unpaused
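The pause and unpause steps can be combined around a planned maintenance task. The following is a minimal sketch, not part of &repmgr;: the function name with_repmgrd_paused and the variable REPMGR_CONF are hypothetical, and the configuration file path is taken from the examples above.

```shell
# Hypothetical wrapper (not part of repmgr): pauses repmgrd on all nodes,
# runs the given maintenance command, then unpauses repmgrd again.
# REPMGR_CONF is an assumed variable name; adjust the path as needed.
REPMGR_CONF="${REPMGR_CONF:-/etc/repmgr.conf}"

with_repmgrd_paused() {
    # Do not start maintenance if pausing failed.
    repmgr -f "$REPMGR_CONF" daemon pause || return 1

    # Run the maintenance command and remember its exit status.
    rc=0
    "$@" || rc=$?

    # Unpause even if the maintenance command failed, so that the cluster
    # is not left paused unintentionally.
    repmgr -f "$REPMGR_CONF" daemon unpause || return 1
    return "$rc"
}
```

It could be invoked as with_repmgrd_paused ./apply_maintenance.sh, where apply_maintenance.sh stands for any site-specific maintenance script (a hypothetical name). Unpausing after a failed maintenance command is a design choice in this sketch; an administrator may prefer to leave repmgrd paused until the failure has been investigated.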
If the previous primary is no longer accessible when repmgrd
is unpaused, no failover action will be taken. Instead, a new primary must be manually promoted using
repmgr standby promote,
and any standbys attached to the new primary with
repmgr standby follow.
This is to prevent repmgr daemon unpause
resulting in the automatic promotion of a new primary, which may be a problem particularly
in larger clusters, where repmgrd could select a different promotion
candidate to the one intended by the administrator.
Details on the repmgrd pausing mechanism
The pause state of each node is preserved across a PostgreSQL restart.
repmgr daemon pause and
repmgr daemon unpause can be
executed even if repmgrd is not running; in this case,
repmgrd will start up in whichever pause state has been set.
repmgr daemon pause and
repmgr daemon unpause do not stop/start repmgrd.