From cbaa890a228211e56b264fb399409cb3391a2204 Mon Sep 17 00:00:00 2001
From: Ian Barwick
Date: Fri, 17 May 2019 14:55:51 +0900
Subject: [PATCH] doc: document "primary_visibility_consensus"

---
 doc/repmgrd-automatic-failover.sgml | 356 +++++++++++++++++----------
 1 file changed, 220 insertions(+), 136 deletions(-)

diff --git a/doc/repmgrd-automatic-failover.sgml b/doc/repmgrd-automatic-failover.sgml
index 9fc2e0c5..42b6ef88 100644
--- a/doc/repmgrd-automatic-failover.sgml
+++ b/doc/repmgrd-automatic-failover.sgml
@@ -168,6 +168,225 @@
+
+  Primary visibility consensus
+
+   repmgrd
+   primary visibility consensus
+
+   primary_visibility_consensus
+
+   In more complex replication setups, particularly where replication occurs between
+   multiple datacentres, it's possible that some, but not all, standbys become cut off
+   from the primary (but not from the other standbys).
+
+   In this situation it is normally not desirable for any of the standbys which have
+   been cut off to initiate a failover, as the primary is still functioning and some
+   standbys remain connected to it. Beginning with &repmgr; 4.4, it is possible for
+   the affected standbys to build a consensus about whether the primary is still
+   available to some standbys ("primary visibility consensus"). This is done by
+   polling each standby for the time it last saw the primary; if any standby has seen
+   the primary very recently, it is reasonable to infer that the primary is still
+   available and a failover should not be started.
+
+   The time the primary was last seen by each node can be checked by executing
+   repmgr daemon status, which includes this information in its output, e.g.:
+
+    $ repmgr -f /etc/repmgr.conf daemon status
+     ID | Name  | Role    | Status    | Upstream | repmgrd | PID   | Paused? | Upstream last seen
+    ----+-------+---------+-----------+----------+---------+-------+---------+--------------------
+     1  | node1 | primary | * running |          | running | 96563 | no      | n/a
+     2  | node2 | standby |   running | node1    | running | 96572 | no      | 1 second(s) ago
+     3  | node3 | standby |   running | node1    | running | 96584 | no      | 0 second(s) ago
+
+   To enable this functionality, in repmgr.conf set:
+
+    primary_visibility_consensus=true
+
+   primary_visibility_consensus must be set to true on all
+   nodes for it to be effective.
+
+   The following sample &repmgrd; log output demonstrates the behaviour in a situation
+   where one of three standbys is no longer able to connect to the primary, but can
+   connect to the two other standbys ("sibling nodes"):
+
+    [2019-05-17 05:36:12] [WARNING] unable to reconnect to node 1 after 3 attempts
+    [2019-05-17 05:36:12] [INFO] 2 active sibling nodes registered
+    [2019-05-17 05:36:12] [INFO] local node's last receive lsn: 0/7006E58
+    [2019-05-17 05:36:12] [INFO] checking state of sibling node "node3" (ID: 3)
+    [2019-05-17 05:36:12] [INFO] node "node3" (ID: 3) reports its upstream is node 1, last seen 1 second(s) ago
+    [2019-05-17 05:36:12] [NOTICE] node 3 last saw primary node 1 second(s) ago, considering primary still visible
+    [2019-05-17 05:36:12] [INFO] last receive LSN for sibling node "node3" (ID: 3) is: 0/7006E58
+    [2019-05-17 05:36:12] [INFO] node "node3" (ID: 3) has same LSN as current candidate "node2" (ID: 2)
+    [2019-05-17 05:36:12] [INFO] checking state of sibling node "node4" (ID: 4)
+    [2019-05-17 05:36:12] [INFO] node "node4" (ID: 4) reports its upstream is node 1, last seen 0 second(s) ago
+    [2019-05-17 05:36:12] [NOTICE] node 4 last saw primary node 0 second(s) ago, considering primary still visible
+    [2019-05-17 05:36:12] [INFO] last receive LSN for sibling node "node4" (ID: 4) is: 0/7006E58
+    [2019-05-17 05:36:12] [INFO] node "node4" (ID: 4) has same LSN as current candidate "node2" (ID: 2)
+    [2019-05-17 05:36:12] [INFO] 2 nodes can see the primary
+    [2019-05-17 05:36:12] [DETAIL] following nodes can see the primary:
+     - node "node3" (ID: 3): 1 second(s) ago
+     - node "node4" (ID: 4): 0 second(s) ago
+    [2019-05-17 05:36:12] [NOTICE] cancelling failover as some nodes can still see the primary
+    [2019-05-17 05:36:12] [NOTICE] election cancelled
+    [2019-05-17 05:36:14] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in degraded state
+
+   In this situation &repmgrd; will cancel the failover and enter degraded monitoring
+   mode, waiting for the primary to reappear.
+
+  Standby disconnection on failover
+
+   repmgrd
+   standby disconnection on failover
+
+   standby disconnection on failover
+
+   If standby_disconnect_on_failover is set to true in
+   repmgr.conf, in a failover situation &repmgrd; will forcibly disconnect the local
+   node's WAL receiver before making a failover decision.
+
+   This option is available in PostgreSQL 9.5 and later. Additionally, it requires
+   that the repmgr database user is a superuser.
+
+   By doing this, it's possible to ensure that, at the point the failover decision is
+   made, no nodes are receiving data from the primary and their LSN locations will be
+   static.
+
+   standby_disconnect_on_failover must be set to the same
+   value on all nodes.
+
+   Note that when using standby_disconnect_on_failover there
+   will be a delay of 5 seconds, plus however long it takes to confirm the WAL
+   receiver is disconnected, before &repmgrd; proceeds with the failover decision.
+
+   Following the failover operation, whatever the outcome, each node will reconnect
+   its WAL receiver.
+
+   If using standby_disconnect_on_failover, we recommend that
+   the primary_visibility_consensus option is also used.
+
+  Failover validation
+
+   repmgrd
+   failover validation
+
+   failover validation
+
+   From &repmgr; 4.3, it is possible to provide a script to &repmgrd; which, in a
+   failover situation, will be executed by the promotion candidate (the node which
+   has been selected to be the new primary) to confirm whether the node should
+   actually be promoted.
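To make the contract concrete, here is a minimal sketch of such a validation script. It is an illustration only, not part of &repmgr;: the maintenance-flag check is a hypothetical site-specific condition, and the only fixed contract is that anything written to stdout is reproduced in the &repmgrd; log and the exit code decides whether promotion proceeds.

```shell
#!/bin/sh
# Hypothetical failover validation script, invoked by repmgrd as e.g.:
#   failover_validation_command=/path/to/script.sh %n %a
# Exit code 0 permits promotion; any other value aborts it and reruns the election.

validate() {
    node_id="$1"    # substituted for %n
    node_name="$2"  # substituted for %a

    # Anything printed here appears in the repmgrd log
    echo "Node ID: ${node_id}"

    # Example check (an assumption for illustration): refuse promotion
    # while a maintenance flag file exists on this node
    if [ -e "/tmp/repmgr_no_promote_${node_name}" ]; then
        return 1
    fi
    return 0
}

validate "$@"
```

Invoked as `/path/to/script.sh 2 node2`, this prints `Node ID: 2` and exits 0, permitting promotion, unless the flag file is present.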
+
+   To use this, in repmgr.conf set
+   failover_validation_command to a script executable by the
+   postgres system user, e.g.:
+
+    failover_validation_command=/path/to/script.sh %n %a
+
+   The %n parameter will be replaced with the node ID, and the %a parameter will be
+   replaced by the node name when the script is executed.
+
+   This script must return an exit code of 0 to indicate the node should promote
+   itself. Any other value will result in the promotion being aborted and the
+   election rerun. There is a pause of election_rerun_interval
+   seconds before the election is rerun.
+
+   Sample &repmgrd; log file output during which the failover validation script
+   rejects the proposed promotion candidate:
+
+    [2019-03-13 21:01:30] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 4 seconds
+    [2019-03-13 21:01:30] [NOTICE] promotion candidate is "node2" (ID: 2)
+    [2019-03-13 21:01:30] [NOTICE] executing "failover_validation_command"
+    [2019-03-13 21:01:30] [DETAIL] /usr/local/bin/failover-validation.sh 2
+    [2019-03-13 21:01:30] [INFO] output returned by failover validation command:
+    Node ID: 2
+
+    [2019-03-13 21:01:30] [NOTICE] failover validation command returned a non-zero value: "1"
+    [2019-03-13 21:01:30] [NOTICE] promotion candidate election will be rerun
+    [2019-03-13 21:01:30] [INFO] 1 followers to notify
+    [2019-03-13 21:01:30] [NOTICE] notifying node "node3" (ID: 3) to rerun promotion candidate selection
+    INFO: node 3 received notification to rerun promotion candidate election
+    [2019-03-13 21:01:30] [NOTICE] rerunning election after 15 seconds ("election_rerun_interval")
+
+  repmgrd and cascading replication
+
+   repmgrd
+   cascading replication
+
+   cascading replication
+   repmgrd
+
+   Cascading replication - where a standby can connect to an upstream node rather
+   than to the primary server itself - was introduced in PostgreSQL 9.2. &repmgr; and
+   &repmgrd; support cascading replication by keeping track of the relationship
+   between standby servers - each node record is stored with the node ID of its
+   upstream ("parent") server (except, of course, the primary server).
+
+   In a failover situation where the primary node fails and a top-level standby is
+   promoted, a standby connected to another standby will not be affected and will
+   continue working as normal (even if the upstream standby it's connected to becomes
+   the primary node). If, however, the node's direct upstream fails, the "cascaded
+   standby" will attempt to reconnect to that node's parent (unless
+   failover is set to manual in repmgr.conf).
+
+

  Monitoring standby disconnections on the primary node

@@ -310,7 +529,7 @@
 [2019-04-24 15:28:19] [NOTICE] node "node3" (ID: 3) has disconnected
 [2019-04-24 15:28:19] [NOTICE] 1 (of 2) child nodes are connected, but at least 2 child nodes required
 [2019-04-24 15:28:19] [INFO] most recently detached child node was 3 (ca. 0 seconds ago), not triggering "child_nodes_disconnect_command"
-[2019-04-24 15:28:19] [DETAIL] "child_nodes_disconnect_timeout" set To 30 seconds
+[2019-04-24 15:28:19] [DETAIL] "child_nodes_disconnect_timeout" set to 30 seconds
 (...)

@@ -646,140 +865,5 @@
 $ repmgr cluster event --event=child_nodes_disconnect_command
-
-  Standby disconnection on failover
-
-   repmgrd
-   standby disconnection on failover
-
-   standby disconnection on failover
-
-   If is set to true in
-   repmgr.conf, in a failover situation &repmgrd; will forcibly disconnect
-   the local node's WAL receiver before making a failover decision.
-
-   is available from PostgreSQL 9.5 and later.
-   Additionally this requires that the repmgr database user is a superuser.
-
-   By doing this, it's possible to ensure that, at the point the failover decision is made, no nodes
-   are receiving data from the primary and their LSN location will be static.
-
-   must be set to the same value on
-   all nodes.
- - - - Note that when using there will be a delay of 5 seconds - plus however many seconds it takes to confirm the WAL receiver is disconnected before - &repmgrd; proceeds with the failover decision. - - - Following the failover operation, no matter what the outcome, each node will reconnect its WAL receiver. - - - If using , we recommend that the - option is also used. - - - - - - Failover validation - - - repmgrd - failover validation - - - - failover validation - - - - From repmgr 4.3, &repmgr; makes it possible to provide a script - to &repmgrd; which, in a failover situation, - will be executed by the promotion candidate (the node which has been selected - to be the new primary) to confirm whether the node should actually be promoted. - - - To use this, in repmgr.conf - to a script executable by the postgres system user, e.g.: - - failover_validation_command=/path/to/script.sh %n %a - - - The %n parameter will be replaced with the node ID, and the - %a parameter will be replaced by the node name when the script is executed. - - - This script must return an exit code of 0 to indicate the node should promote itself. - Any other value will result in the promotion being aborted and the election rerun. - There is a pause of seconds before the election is rerun. 
- - - Sample &repmgrd; log file output during which the failover validation - script rejects the proposed promotion candidate: - -[2019-03-13 21:01:30] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 4 seconds -[2019-03-13 21:01:30] [NOTICE] promotion candidate is "node2" (ID: 2) -[2019-03-13 21:01:30] [NOTICE] executing "failover_validation_command" -[2019-03-13 21:01:30] [DETAIL] /usr/local/bin/failover-validation.sh 2 -[2019-03-13 21:01:30] [INFO] output returned by failover validation command: -Node ID: 2 - -[2019-03-13 21:01:30] [NOTICE] failover validation command returned a non-zero value: "1" -[2019-03-13 21:01:30] [NOTICE] promotion candidate election will be rerun -[2019-03-13 21:01:30] [INFO] 1 followers to notify -[2019-03-13 21:01:30] [NOTICE] notifying node "node3" (ID: 3) to rerun promotion candidate selection -INFO: node 3 received notification to rerun promotion candidate election -[2019-03-13 21:01:30] [NOTICE] rerunning election after 15 seconds ("election_rerun_interval") - - - - - - - repmgrd and cascading replication - - - repmgrd - cascading replication - - - - cascading replication - repmgrd - - - - Cascading replication - where a standby can connect to an upstream node and not - the primary server itself - was introduced in PostgreSQL 9.2. &repmgr; and - &repmgrd; support cascading replication by keeping track of the relationship - between standby servers - each node record is stored with the node id of its - upstream ("parent") server (except of course the primary server). - - - In a failover situation where the primary node fails and a top-level standby - is promoted, a standby connected to another standby will not be affected - and continue working as normal (even if the upstream standby it's connected - to becomes the primary node). 
If however the node's direct upstream fails, - the "cascaded standby" will attempt to reconnect to that node's parent - (unless failover is set to manual in - repmgr.conf). - - - -
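The "primary visibility consensus" decision documented earlier in this patch can be sketched as follows. This is an illustration only, not &repmgrd; source: the sibling node names and last-seen ages are sample data mirroring the sample log output, and the 10-second visibility threshold is an assumption chosen for the example.

```shell
#!/bin/sh
# Sketch of the consensus rule: if any sibling standby reports having seen
# the primary recently enough, the failover is cancelled.
# Sample data (hypothetical): "node_name:seconds_since_primary_last_seen"
SIBLINGS="node3:1 node4:0"
THRESHOLD=10   # assumed visibility window in seconds, for illustration only

visible=0
for entry in $SIBLINGS; do
    name=${entry%%:*}
    age=${entry##*:}
    if [ "$age" -le "$THRESHOLD" ]; then
        echo "node \"$name\" last saw the primary $age second(s) ago: primary still visible"
        visible=$((visible + 1))
    fi
done

if [ "$visible" -gt 0 ]; then
    echo "cancelling failover: $visible node(s) can still see the primary"
else
    echo "no nodes can see the primary: proceeding with the election"
fi
```

With the sample data above, both siblings fall within the threshold, so the sketch reports that two nodes can still see the primary and the failover is cancelled, matching the behaviour shown in the sample &repmgrd; log output.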