diff --git a/doc/filelist.sgml b/doc/filelist.sgml
index 246b7504..0ac4507a 100644
--- a/doc/filelist.sgml
+++ b/doc/filelist.sgml
@@ -49,6 +49,9 @@
+<!ENTITY repmgrd-cascading-replication SYSTEM "repmgrd-cascading-replication.sgml">
+<!ENTITY repmgrd-network-split         SYSTEM "repmgrd-network-split.sgml">
+<!ENTITY repmgrd-degraded-monitoring   SYSTEM "repmgrd-degraded-monitoring.sgml">
diff --git a/doc/repmgr-cluster-cleanup.sgml b/doc/repmgr-cluster-cleanup.sgml
index bafc34f1..df207d0c 100644
--- a/doc/repmgr-cluster-cleanup.sgml
+++ b/doc/repmgr-cluster-cleanup.sgml
@@ -16,7 +16,8 @@
Monitoring history will only be written if repmgrd is active, and
- monitoring_history is set to true in repmgr.conf.
+ monitoring_history is set to true in
+ repmgr.conf.
diff --git a/doc/repmgr.sgml b/doc/repmgr.sgml
index 475f42f6..989efda0 100644
--- a/doc/repmgr.sgml
+++ b/doc/repmgr.sgml
@@ -81,6 +81,9 @@
&repmgrd-automatic-failover;
&repmgrd-configuration;
&repmgrd-demonstration;
+ &repmgrd-cascading-replication;
+ &repmgrd-network-split;
+ &repmgrd-degraded-monitoring;
&repmgrd-monitoring;
diff --git a/doc/repmgrd-cascading-replication.sgml b/doc/repmgrd-cascading-replication.sgml
new file mode 100644
index 00000000..b8e00514
--- /dev/null
+++ b/doc/repmgrd-cascading-replication.sgml
@@ -0,0 +1,17 @@
+
+ repmgrd and cascading replication
+
+ Cascading replication - where a standby connects to another standby rather
+ than directly to the primary server - was introduced in PostgreSQL 9.2. &repmgr; and
+ repmgrd support cascading replication by keeping track of the relationship
+ between standby servers - each node record stores the node ID of its
+ upstream ("parent") server (with the obvious exception of the primary server,
+ which has no upstream).
+
+
+ In a failover situation where the primary node fails and a top-level standby
+ is promoted, a standby connected to another standby will not be affected
+ and will continue working as normal (even if the upstream standby it is connected
+ to becomes the primary node). If however the node's direct upstream fails,
+ the "cascaded standby" will attempt to reconnect to that node's parent.
+
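+
+ As an illustrative sketch (node names, IDs and connection parameters are
+ assumptions, not taken from this document), a cascaded standby can be
+ attached to another standby rather than to the primary by specifying the
+ upstream standby's node ID when cloning, after which the node is registered
+ as usual:
+
+   repmgr -h node2 -U repmgr -d repmgr standby clone --upstream-node-id=2
+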
+
diff --git a/doc/repmgrd-degraded-monitoring.sgml b/doc/repmgrd-degraded-monitoring.sgml
new file mode 100644
index 00000000..adae7236
--- /dev/null
+++ b/doc/repmgrd-degraded-monitoring.sgml
@@ -0,0 +1,69 @@
+
+ "degraded monitoring" mode
+
+ In certain circumstances, repmgrd is not able to fulfill its primary mission
+ of monitoring the node's upstream server. In these cases it enters "degraded
+ monitoring" mode, where repmgrd remains active but is waiting for the situation
+ to be resolved.
+
+
+ Situations where this happens are:
+
+
+
+ a failover situation has occurred, but no nodes in the primary node's location are visible
+
+
+
+ a failover situation has occurred, but no promotion candidate is available
+
+
+
+ a failover situation has occurred, but the promotion candidate could not be promoted
+
+
+
+ a failover situation has occurred, but the node was unable to follow the new primary
+
+
+
+ a failover situation has occurred, but no primary has become available
+
+
+
+ a failover situation has occurred, but automatic failover is not enabled for the node
+
+
+
+ repmgrd is monitoring the primary node, but the primary node is not available
+
+
+
+
+
+ Example output in a situation where there is only one standby with failover=manual,
+ and the primary node is unavailable (but is later restarted):
+
+ [2017-08-29 10:59:19] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state (automatic failover disabled)
+ [2017-08-29 10:59:33] [WARNING] unable to connect to upstream node "node1" (node ID: 1)
+ [2017-08-29 10:59:33] [INFO] checking state of node 1, 1 of 5 attempts
+ [2017-08-29 10:59:33] [INFO] sleeping 1 seconds until next reconnection attempt
+ (...)
+ [2017-08-29 10:59:37] [INFO] checking state of node 1, 5 of 5 attempts
+ [2017-08-29 10:59:37] [WARNING] unable to reconnect to node 1 after 5 attempts
+ [2017-08-29 10:59:37] [NOTICE] this node is not configured for automatic failover so will not be considered as promotion candidate
+ [2017-08-29 10:59:37] [NOTICE] no other nodes are available as promotion candidate
+ [2017-08-29 10:59:37] [HINT] use "repmgr standby promote" to manually promote this node
+ [2017-08-29 10:59:37] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in degraded state (automatic failover disabled)
+ [2017-08-29 10:59:53] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in degraded state (automatic failover disabled)
+ [2017-08-29 11:00:45] [NOTICE] reconnected to upstream node 1 after 68 seconds, resuming monitoring
+ [2017-08-29 11:00:57] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state (automatic failover disabled)
+
+
+
+ By default, repmgrd will continue in degraded monitoring mode indefinitely.
+ However, a timeout (in seconds) can be set with
+ degraded_monitoring_timeout, after which repmgrd will terminate.
+
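+
+ For example, the following repmgr.conf setting (the value shown is purely
+ illustrative) would limit degraded monitoring mode to five minutes:
+
+   degraded_monitoring_timeout=300
+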
+
+
+
diff --git a/doc/repmgrd-network-split.sgml b/doc/repmgrd-network-split.sgml
new file mode 100644
index 00000000..934bf0b8
--- /dev/null
+++ b/doc/repmgrd-network-split.sgml
@@ -0,0 +1,43 @@
+
+ Handling network splits with repmgrd
+
+ A common pattern for replication cluster setups is to spread servers over
+ more than one datacentre. This can provide benefits such as geographically
+ distributed read replicas and disaster recovery (DR) capability. However
+ this also means there is a risk of disconnection at the network level between
+ datacentre locations, which would result in a split-brain scenario if
+ servers in a secondary datacentre were no longer able to see the primary
+ in the main datacentre and promoted a standby among themselves.
+
+
+ Previous &repmgr; versions used the concept of a "witness server" to
+ artificially create a quorum of servers in a particular location, ensuring
+ that nodes in another location would not elect a new primary if they
+ were unable to see the majority of nodes. However, this approach does not
+ scale well, particularly with more complex replication setups, e.g.
+ where the majority of nodes are located outside of the primary datacentre.
+ It also means the witness node needs to be managed as an
+ extra PostgreSQL instance outside of the main replication cluster, which
+ adds administrative and programming complexity.
+
+
+ &repmgr; 4 introduces the concept of location:
+ each node is associated with an arbitrary location string (the default is
+ "default"); this is set in repmgr.conf, e.g.:
+
+ node_id=1
+ node_name=node1
+ conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'
+ data_directory='/var/lib/postgresql/data'
+ location='dc1'
+
+
+ In a failover situation, repmgrd will check whether any servers in the
+ same location as the current primary node are visible. If not, repmgrd
+ will assume a network interruption and not promote any node in any
+ other location (it will however enter "degraded monitoring" mode until
+ a primary becomes visible).
+
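+ As an illustration (the hostname and node ID are assumptions, not taken
+ from this document), a node in a second datacentre would simply set a
+ different location string in its repmgr.conf:
+
+   node_id=3
+   node_name=node3
+   conninfo='host=node3 user=repmgr dbname=repmgr connect_timeout=2'
+   data_directory='/var/lib/postgresql/data'
+   location='dc2'
+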
+
+
+