diff --git a/doc/repmgrd-automatic-failover.sgml b/doc/repmgrd-automatic-failover.sgml
index 9fc2e0c5..42b6ef88 100644
--- a/doc/repmgrd-automatic-failover.sgml
+++ b/doc/repmgrd-automatic-failover.sgml
@@ -168,6 +168,225 @@
+
+
+ Primary visibility consensus
+
+
+ repmgrd
+ primary visibility consensus
+
+
+
+ primary_visibility_consensus
+
+
+
+ In more complex replication setups, particularly where replication occurs between
+ multiple datacentres, it's possible that some but not all standbys get cut off from the
+ primary (but not from the other standbys).
+
+
+ In this situation it is not normally desirable for any of the standbys which
+ have been cut off to initiate a failover, as the primary is still functioning
+ and other standbys remain connected to it. Beginning with &repmgr; 4.4,
+ the affected standbys can build a consensus about whether
+ the primary is still visible to at least one standby ("primary visibility consensus").
+ This is done by polling each standby for the time it last saw the primary;
+ if any standby has seen the primary very recently, it is reasonable
+ to infer that the primary is still available and a failover should not be started.
+
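
The decision rule described above can be sketched as a small shell function (this is an illustrative sketch of the consensus logic, not repmgrd's actual implementation; the threshold value and argument convention are hypothetical):

```shell
# Sketch of the "primary visibility consensus" rule: given a visibility
# threshold in seconds and, for each sibling standby, the number of seconds
# since that sibling last saw the primary, report whether any sibling has
# seen the primary recently enough to consider it still available.
primary_still_visible() {
    threshold=$1
    shift
    for last_seen in "$@"; do
        # At least one sibling saw the primary within the threshold:
        # the primary is considered still visible, so no failover.
        if [ "$last_seen" -lt "$threshold" ]; then
            return 0
        fi
    done
    # No sibling has seen the primary recently: a failover may proceed.
    return 1
}

# Siblings last saw the primary 1 and 0 seconds ago (as in the log below):
primary_still_visible 10 1 0 && echo "cancelling failover: primary still visible"
```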
+
+
+ The time the primary was last seen by each node can be checked by executing
+ repmgr daemon status,
+ which includes this information in its output, e.g.:
+ $ repmgr -f /etc/repmgr.conf daemon status
+ ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
+----+-------+---------+-----------+----------+---------+-------+---------+--------------------
+ 1 | node1 | primary | * running | | running | 96563 | no | n/a
+ 2 | node2 | standby | running | node1 | running | 96572 | no | 1 second(s) ago
+ 3 | node3 | standby | running | node1 | running | 96584 | no | 0 second(s) ago
+
+
+
+
+ To enable this functionality, in repmgr.conf set:
+
+ primary_visibility_consensus=true
+
+
+
+ primary_visibility_consensus must be set to
+ true on all nodes for it to be effective.
+
+
+
+
+ The following sample &repmgrd; log output demonstrates the behaviour in a situation
+ where one of three standbys is no longer able to connect to the primary, but can
+ connect to the two other standbys ("sibling nodes"):
+
+ [2019-05-17 05:36:12] [WARNING] unable to reconnect to node 1 after 3 attempts
+ [2019-05-17 05:36:12] [INFO] 2 active sibling nodes registered
+ [2019-05-17 05:36:12] [INFO] local node's last receive lsn: 0/7006E58
+ [2019-05-17 05:36:12] [INFO] checking state of sibling node "node3" (ID: 3)
+ [2019-05-17 05:36:12] [INFO] node "node3" (ID: 3) reports its upstream is node 1, last seen 1 second(s) ago
+ [2019-05-17 05:36:12] [NOTICE] node 3 last saw primary node 1 second(s) ago, considering primary still visible
+ [2019-05-17 05:36:12] [INFO] last receive LSN for sibling node "node3" (ID: 3) is: 0/7006E58
+ [2019-05-17 05:36:12] [INFO] node "node3" (ID: 3) has same LSN as current candidate "node2" (ID: 2)
+ [2019-05-17 05:36:12] [INFO] checking state of sibling node "node4" (ID: 4)
+ [2019-05-17 05:36:12] [INFO] node "node4" (ID: 4) reports its upstream is node 1, last seen 0 second(s) ago
+ [2019-05-17 05:36:12] [NOTICE] node 4 last saw primary node 0 second(s) ago, considering primary still visible
+ [2019-05-17 05:36:12] [INFO] last receive LSN for sibling node "node4" (ID: 4) is: 0/7006E58
+ [2019-05-17 05:36:12] [INFO] node "node4" (ID: 4) has same LSN as current candidate "node2" (ID: 2)
+ [2019-05-17 05:36:12] [INFO] 2 nodes can see the primary
+ [2019-05-17 05:36:12] [DETAIL] following nodes can see the primary:
+ - node "node3" (ID: 3): 1 second(s) ago
+ - node "node4" (ID: 4): 0 second(s) ago
+ [2019-05-17 05:36:12] [NOTICE] cancelling failover as some nodes can still see the primary
+ [2019-05-17 05:36:12] [NOTICE] election cancelled
+ [2019-05-17 05:36:14] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in degraded state
+ In this situation &repmgrd; will cancel the failover and enter degraded monitoring mode,
+ waiting for the primary to reappear.
+
+
+
+
+ Standby disconnection on failover
+
+
+ repmgrd
+ standby disconnection on failover
+
+
+
+ standby disconnection on failover
+
+
+
+ If standby_disconnect_on_failover is set to true in
+ repmgr.conf, in a failover situation &repmgrd; will forcibly disconnect
+ the local node's WAL receiver before making a failover decision.
+
+
+
+ standby_disconnect_on_failover is available in PostgreSQL 9.5 and later.
+ Additionally it requires that the repmgr database user is a superuser.
+
+
+
+ By doing this, it's possible to ensure that, at the point the failover decision is made,
+ no nodes are receiving data from the primary and their LSN positions are static.
+
+
+
+ standby_disconnect_on_failover must be set to the same value on
+ all nodes.
+
+
+
+ Note that when using standby_disconnect_on_failover, there will be
+ a delay of 5 seconds, plus however many seconds it takes to confirm the WAL receiver
+ is disconnected, before &repmgrd; proceeds with the failover decision.
+
+
+ Following the failover operation, no matter what the outcome, each node will reconnect its WAL receiver.
+
+
+ If using standby_disconnect_on_failover, we recommend that the
+ primary_visibility_consensus option is also used.
+
+
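
Taken together, the recommended combination looks like this in repmgr.conf (a sketch based on the two options discussed above; adjust to your environment):

```
# repmgr.conf: disconnect the WAL receiver before deciding on failover,
# and only fail over if no standby can still see the primary
standby_disconnect_on_failover=true
primary_visibility_consensus=true
```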
+
+
+
+ Failover validation
+
+
+ repmgrd
+ failover validation
+
+
+
+ failover validation
+
+
+
+ Beginning with &repmgr; 4.3, it is possible to provide a script
+ to &repmgrd; which, in a failover situation,
+ will be executed by the promotion candidate (the node which has been selected
+ to be the new primary) to confirm whether the node should actually be promoted.
+
+
+ To use this, in repmgr.conf set failover_validation_command
+ to a script executable by the postgres system user, e.g.:
+
+ failover_validation_command=/path/to/script.sh %n %a
+
+
+ The %n parameter will be replaced with the node ID, and the
+ %a parameter will be replaced by the node name when the script is executed.
+
+
+ This script must return an exit code of 0 to indicate the node should promote itself.
+ Any other value will result in the promotion being aborted and the election rerun.
+ There is a pause of election_rerun_interval seconds before the election is rerun.
+
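
A validation script might look something like the following (a hypothetical sketch: the flag-file policy and file path are examples, not part of &repmgr;; the script is shown as a shell function for illustration):

```shell
# Sketch of a failover validation check. In practice this would be a
# standalone script referenced in repmgr.conf as:
#   failover_validation_command=/path/to/script.sh %n %a
# where %n is replaced with the node ID and %a with the node name.
validate_promotion() {
    node_id=$1
    node_name=$2

    # Hypothetical policy: veto promotion while a maintenance flag file exists.
    if [ -f /tmp/repmgr_no_promote ]; then
        echo "node ${node_id} (${node_name}): promotion vetoed by flag file"
        return 1   # non-zero: promotion aborted, election rerun
    fi

    echo "node ${node_id} (${node_name}): promotion approved"
    return 0       # zero: node should promote itself
}

validate_promotion 2 node2
```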
+
+ Sample &repmgrd; log file output from a situation in which the failover validation
+ script rejects the proposed promotion candidate:
+
+[2019-03-13 21:01:30] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 4 seconds
+[2019-03-13 21:01:30] [NOTICE] promotion candidate is "node2" (ID: 2)
+[2019-03-13 21:01:30] [NOTICE] executing "failover_validation_command"
+[2019-03-13 21:01:30] [DETAIL] /usr/local/bin/failover-validation.sh 2
+[2019-03-13 21:01:30] [INFO] output returned by failover validation command:
+Node ID: 2
+
+[2019-03-13 21:01:30] [NOTICE] failover validation command returned a non-zero value: "1"
+[2019-03-13 21:01:30] [NOTICE] promotion candidate election will be rerun
+[2019-03-13 21:01:30] [INFO] 1 followers to notify
+[2019-03-13 21:01:30] [NOTICE] notifying node "node3" (ID: 3) to rerun promotion candidate selection
+INFO: node 3 received notification to rerun promotion candidate election
+[2019-03-13 21:01:30] [NOTICE] rerunning election after 15 seconds ("election_rerun_interval")
+
+
+
+
+
+
+ repmgrd and cascading replication
+
+
+ repmgrd
+ cascading replication
+
+
+
+ cascading replication
+ repmgrd
+
+
+
+ Cascading replication - where a standby connects to an upstream standby
+ rather than to the primary server itself - was introduced in PostgreSQL 9.2.
+ &repmgr; and &repmgrd; support cascading replication by keeping track of the
+ relationships between standby servers - each node record is stored with the
+ node ID of its upstream ("parent") server (except, of course, the primary server).
+
+
+ In a failover situation where the primary node fails and a top-level standby
+ is promoted, a standby connected to another standby will not be affected
+ and will continue working as normal (even if the upstream standby it is connected
+ to becomes the primary node). If, however, the node's direct upstream fails,
+ the "cascaded standby" will attempt to reconnect to that node's parent
+ (unless failover is set to manual in
+ repmgr.conf).
+
+
+
+
Monitoring standby disconnections on the primary node
@@ -310,7 +529,7 @@
[2019-04-24 15:28:19] [NOTICE] node "node3" (ID: 3) has disconnected
[2019-04-24 15:28:19] [NOTICE] 1 (of 2) child nodes are connected, but at least 2 child nodes required
[2019-04-24 15:28:19] [INFO] most recently detached child node was 3 (ca. 0 seconds ago), not triggering "child_nodes_disconnect_command"
-[2019-04-24 15:28:19] [DETAIL] "child_nodes_disconnect_timeout" set to 30 seconds
+[2019-04-24 15:28:19] [DETAIL] "child_nodes_disconnect_timeout" set to 30 seconds
(...)
@@ -646,140 +865,5 @@ $ repmgr cluster event --event=child_nodes_disconnect_command
-
- Standby disconnection on failover
-
-
- repmgrd
- standby disconnection on failover
-
-
-
- standby disconnection on failover
-
-
-
- If is set to true in
- repmgr.conf, in a failover situation &repmgrd; will forcibly disconnect
- the local node's WAL receiver before making a failover decision.
-
-
-
- is available from PostgreSQL 9.5 and later.
- Additionally this requires that the repmgr database user is a superuser.
-
-
-
- By doing this, it's possible to ensure that, at the point the failover decision is made, no nodes
- are receiving data from the primary and their LSN location will be static.
-
-
-
- must be set to the same value on
- all nodes.
-
-
-
- Note that when using there will be a delay of 5 seconds
- plus however many seconds it takes to confirm the WAL receiver is disconnected before
- &repmgrd; proceeds with the failover decision.
-
-
- Following the failover operation, no matter what the outcome, each node will reconnect its WAL receiver.
-
-
- If using , we recommend that the
- option is also used.
-
-
-
-
-
- Failover validation
-
-
- repmgrd
- failover validation
-
-
-
- failover validation
-
-
-
- From repmgr 4.3, &repmgr; makes it possible to provide a script
- to &repmgrd; which, in a failover situation,
- will be executed by the promotion candidate (the node which has been selected
- to be the new primary) to confirm whether the node should actually be promoted.
-
-
- To use this, in repmgr.conf
- to a script executable by the postgres system user, e.g.:
-
- failover_validation_command=/path/to/script.sh %n %a
-
-
- The %n parameter will be replaced with the node ID, and the
- %a parameter will be replaced by the node name when the script is executed.
-
-
- This script must return an exit code of 0 to indicate the node should promote itself.
- Any other value will result in the promotion being aborted and the election rerun.
- There is a pause of seconds before the election is rerun.
-
-
- Sample &repmgrd; log file output during which the failover validation
- script rejects the proposed promotion candidate:
-
-[2019-03-13 21:01:30] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 4 seconds
-[2019-03-13 21:01:30] [NOTICE] promotion candidate is "node2" (ID: 2)
-[2019-03-13 21:01:30] [NOTICE] executing "failover_validation_command"
-[2019-03-13 21:01:30] [DETAIL] /usr/local/bin/failover-validation.sh 2
-[2019-03-13 21:01:30] [INFO] output returned by failover validation command:
-Node ID: 2
-
-[2019-03-13 21:01:30] [NOTICE] failover validation command returned a non-zero value: "1"
-[2019-03-13 21:01:30] [NOTICE] promotion candidate election will be rerun
-[2019-03-13 21:01:30] [INFO] 1 followers to notify
-[2019-03-13 21:01:30] [NOTICE] notifying node "node3" (ID: 3) to rerun promotion candidate selection
-INFO: node 3 received notification to rerun promotion candidate election
-[2019-03-13 21:01:30] [NOTICE] rerunning election after 15 seconds ("election_rerun_interval")
-
-
-
-
-
-
- repmgrd and cascading replication
-
-
- repmgrd
- cascading replication
-
-
-
- cascading replication
- repmgrd
-
-
-
- Cascading replication - where a standby can connect to an upstream node and not
- the primary server itself - was introduced in PostgreSQL 9.2. &repmgr; and
- &repmgrd; support cascading replication by keeping track of the relationship
- between standby servers - each node record is stored with the node id of its
- upstream ("parent") server (except of course the primary server).
-
-
- In a failover situation where the primary node fails and a top-level standby
- is promoted, a standby connected to another standby will not be affected
- and continue working as normal (even if the upstream standby it's connected
- to becomes the primary node). If however the node's direct upstream fails,
- the "cascaded standby" will attempt to reconnect to that node's parent
- (unless failover is set to manual in
- repmgr.conf).
-
-
-
-