diff --git a/doc/cloning-standbys.sgml b/doc/cloning-standbys.sgml
index 05b513a8..2c1fe095 100644
--- a/doc/cloning-standbys.sgml
+++ b/doc/cloning-standbys.sgml
@@ -308,7 +308,7 @@
After starting the standby, the cluster will look like this, showing that node3
- is attached to node3, not the primary (node1).
+ is attached to node2, not the primary (node1).
$ repmgr -f /etc/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Connection string
diff --git a/doc/command-reference.sgml b/doc/command-reference.sgml
index 2627c5a5..6fb419f9 100644
--- a/doc/command-reference.sgml
+++ b/doc/command-reference.sgml
@@ -148,6 +148,26 @@
+
+
+ repmgr standby promote
+
+ repmgr standby promote
+
+ Promotes a standby to a primary if the current primary has failed. This
+ command requires a valid repmgr.conf file for the standby, either
+ specified explicitly with -f/--config-file or located in a
+ default location; no additional arguments are required.
+
+
+ If the standby promotion succeeds, the server will not need to be
+ restarted. However, any other standbys will need to follow the new server,
+ by using ; if repmgrd is active, it will
+ handle this automatically.
+
+
+
+
repmgr standby follow
@@ -170,6 +190,7 @@
+
repmgr node rejoin
@@ -179,8 +200,191 @@
Enables a dormant (stopped) node to be rejoined to the replication cluster.
- This can optionally use `pg_rewind` to re-integrate a node which has diverged
+ This can optionally use pg_rewind to re-integrate a node which has diverged
from the rest of the cluster, typically a failed primary.
+
+
+
+ repmgr cluster show
+
+ repmgr cluster show
+
+ Displays information about each active node in the replication cluster. This
+ command polls each registered server directly and shows its role (primary /
+ standby / bdr) and status. It can be run on any node
+ in the cluster; this is also useful when analyzing
+ connectivity from a particular node.
+
+
+ This command requires either a valid repmgr.conf file or a database
+ connection string to one of the registered nodes; no additional arguments are needed.
+
+
+
+ Example:
+
+ $ repmgr -f /etc/repmgr.conf cluster show
+
+ ID | Name | Role | Status | Upstream | Location | Connection string
+ ----+-------+---------+-----------+----------+----------+-----------------------------------------
+ 1 | node1 | primary | * running | | default | host=db_node1 dbname=repmgr user=repmgr
+ 2 | node2 | standby | running | node1 | default | host=db_node2 dbname=repmgr user=repmgr
+ 3 | node3 | standby | running | node1 | default | host=db_node3 dbname=repmgr user=repmgr
+
+
+
+ To show database connection errors when polling nodes, run the command in
+ --verbose mode.
+
+
+ The cluster show command accepts an optional parameter --csv, which
+ outputs the replication cluster's status in a simple CSV format, suitable for
+ parsing by scripts:
+
+ $ repmgr -f /etc/repmgr.conf cluster show --csv
+ 1,-1,-1
+ 2,0,0
+ 3,0,1
+
+
+ The columns have the following meanings:
+
+
+
+ node ID
+
+
+ availability (0 = available, -1 = unavailable)
+
+
+ recovery state (0 = not in recovery, 1 = in recovery, -1 = unknown)
+
+
+
+
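The CSV columns are straightforward to consume from a monitoring script. The following sketch (a hypothetical helper, not part of repmgr) translates the sample --csv output shown above into readable status lines; in practice the live output of repmgr cluster show --csv would be piped in instead:

```shell
#!/bin/sh
# Hypothetical helper: translate "cluster show --csv" rows into readable text.
describe_csv() {
  while IFS=, read -r node_id avail recovery; do
    case "$avail" in
      0) avail_txt="available" ;;
      *) avail_txt="unavailable" ;;
    esac
    case "$recovery" in
      0) rec_txt="not in recovery" ;;
      1) rec_txt="in recovery" ;;
      *) rec_txt="unknown" ;;
    esac
    echo "node $node_id: $avail_txt, $rec_txt"
  done
}

# Sample output from the example above; in practice use:
#   repmgr -f /etc/repmgr.conf cluster show --csv | describe_csv
printf '1,-1,-1\n2,0,0\n3,0,1\n' | describe_csv
```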
+
+
+ Note that availability is tested by connecting from the node where
+ repmgr cluster show is executed, so a failed connection does not
+ necessarily mean the node itself is down. See and to get
+ a better overview of connections between nodes.
+
+
+
+
+
+ repmgr cluster matrix
+
+ repmgr cluster matrix
+
+ repmgr cluster matrix runs repmgr cluster show on each
+ node and arranges the results in a matrix, recording success or failure.
+
+
+ repmgr cluster matrix requires a valid repmgr.conf
+ file on each node. Additionally, passwordless ssh connections are required between
+ all nodes.
+
+
+ Example 1 (all nodes up):
+
+ $ repmgr -f /etc/repmgr.conf cluster matrix
+
+ Name | Id | 1 | 2 | 3
+ -------+----+----+----+----
+ node1 | 1 | * | * | *
+ node2 | 2 | * | * | *
+ node3 | 3 | * | * | *
+
+
+ Example 2 (node1 and node2 up, node3 down):
+
+ $ repmgr -f /etc/repmgr.conf cluster matrix
+
+ Name | Id | 1 | 2 | 3
+ -------+----+----+----+----
+ node1 | 1 | * | * | x
+ node2 | 2 | * | * | x
+ node3 | 3 | ? | ? | ?
+
+
+
+ Each row corresponds to one server, and indicates the result of
+ testing an outbound connection from that server.
+
+
+ Since node3 is down, all the entries in its row are filled with
+ ?, meaning that we cannot test its outbound connections.
+
+
+ The other two nodes are up; the corresponding rows have x in the
+ column corresponding to node3, meaning that inbound connections to
+ that node have failed, and * in the columns corresponding to
+ node1 and node2, meaning that inbound connections
+ to these nodes have succeeded.
+
+
+ Example 3 (all nodes up, firewall dropping packets originating
+ from node1 and directed to port 5432 on node3) -
+ running repmgr cluster matrix from node1 gives the following output:
+
+ $ repmgr -f /etc/repmgr.conf cluster matrix
+
+ Name | Id | 1 | 2 | 3
+ -------+----+----+----+----
+ node1 | 1 | * | * | x
+ node2 | 2 | * | * | *
+ node3 | 3 | ? | ? | ?
+
+
+ Note this may take some time depending on the connect_timeout
+ setting in the node conninfo strings; the default is
+ 1 minute, which means that without modification the above
+ command would take around 2 minutes to run; see the note elsewhere about
+ setting connect_timeout.
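As an illustration (an assumed configuration, not taken from this document), a short timeout can be added to a node's conninfo string in repmgr.conf using the standard libpq connect_timeout parameter, so that unreachable nodes fail fast:

```
conninfo='host=db_node3 dbname=repmgr user=repmgr connect_timeout=2'
```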
+
+
+ The matrix tells us that we cannot connect from node1 to node3,
+ and that (therefore) we don't know the state of any outbound
+ connection from node3.
+
+
+ In this case, the command will produce a more
+ useful result.
+
+
+
+
+
+
+ repmgr cluster crosscheck
+
+ repmgr cluster crosscheck
+
+ repmgr cluster crosscheck is similar to ,
+ but cross-checks connections between each combination of nodes. In "Example 3" in
+ we have no information about the state of node3.
+ However by running repmgr cluster crosscheck it's possible to get a better
+ overview of the cluster situation:
+
+ $ repmgr -f /etc/repmgr.conf cluster crosscheck
+
+ Name | Id | 1 | 2 | 3
+ -------+----+----+----+----
+ node1 | 1 | * | * | x
+ node2 | 2 | * | * | *
+ node3 | 3 | * | * | *
+
+
+ What happened is that repmgr cluster crosscheck merged its own
+ repmgr cluster matrix with the repmgr cluster matrix
+ output from node2; the latter is able to connect to node3
+ and therefore determine the state of outbound connections from that node.
+
+
+
+
+
diff --git a/doc/filelist.sgml b/doc/filelist.sgml
index e4e2a6b8..5d16624a 100644
--- a/doc/filelist.sgml
+++ b/doc/filelist.sgml
@@ -40,6 +40,7 @@
+
diff --git a/doc/promoting-standby.sgml b/doc/promoting-standby.sgml
new file mode 100644
index 00000000..de515951
--- /dev/null
+++ b/doc/promoting-standby.sgml
@@ -0,0 +1,74 @@
+
+ Promoting a standby server with repmgr
+
+ If a primary server fails or needs to be removed from the replication cluster,
+ a new primary server must be designated, to ensure the cluster continues
+ to function correctly. This can be done with ,
+ which promotes the standby on the current server to primary.
+
+
+
+ To demonstrate this, set up a replication cluster with a primary and two attached
+ standby servers so that the cluster looks like this:
+
+ $ repmgr -f /etc/repmgr.conf cluster show
+ ID | Name | Role | Status | Upstream | Location | Connection string
+ ----+-------+---------+-----------+----------+----------+--------------------------------------
+ 1 | node1 | primary | * running | | default | host=node1 dbname=repmgr user=repmgr
+ 2 | node2 | standby | running | node1 | default | host=node2 dbname=repmgr user=repmgr
+ 3 | node3 | standby | running | node1 | default | host=node3 dbname=repmgr user=repmgr
+
+
+ Stop the current primary with e.g.:
+
+ $ pg_ctl -D /var/lib/postgresql/data -m fast stop
+
+
+ At this point the replication cluster will be in a partially disabled state, with
+ both standbys accepting read-only connections while attempting to connect to the
+ stopped primary. Note that the &repmgr; metadata table will not yet have been updated;
+ executing will note the discrepancy:
+
+ $ repmgr -f /etc/repmgr.conf cluster show
+ ID | Name | Role | Status | Upstream | Location | Connection string
+ ----+-------+---------+---------------+----------+----------+--------------------------------------
+ 1 | node1 | primary | ? unreachable | | default | host=node1 dbname=repmgr user=repmgr
+ 2 | node2 | standby | running | node1 | default | host=node2 dbname=repmgr user=repmgr
+ 3 | node3 | standby | running | node1 | default | host=node3 dbname=repmgr user=repmgr
+
+ WARNING: following issues were detected
+ node "node1" (ID: 1) is registered as an active primary but is unreachable
+
+
+ Now promote the first standby with:
+
+ $ repmgr -f /etc/repmgr.conf standby promote
+
+
+ This will produce output similar to the following:
+
+ INFO: connecting to standby database
+ NOTICE: promoting standby
+ DETAIL: promoting server using "pg_ctl -l /var/log/postgresql/startup.log -w -D '/var/lib/postgresql/data' promote"
+ server promoting
+ INFO: reconnecting to promoted server
+ NOTICE: STANDBY PROMOTE successful
+ DETAIL: node 2 was successfully promoted to primary
+
+
+ Executing will show the current state; as there is now an
+ active primary, the previous warning will not be displayed:
+
+ $ repmgr -f /etc/repmgr.conf cluster show
+ ID | Name | Role | Status | Upstream | Location | Connection string
+ ----+-------+---------+-----------+----------+----------+--------------------------------------
+ 1 | node1 | primary | - failed | | default | host=node1 dbname=repmgr user=repmgr
+ 2 | node2 | primary | * running | | default | host=node2 dbname=repmgr user=repmgr
+ 3 | node3 | standby | running | node1 | default | host=node3 dbname=repmgr user=repmgr
+
+
+ However, the sole remaining standby (node3) is still trying to replicate from the failed
+ primary; must now be executed to rectify this situation.
+
+
+
diff --git a/doc/repmgr.sgml b/doc/repmgr.sgml
index a8a8b055..a6b17330 100644
--- a/doc/repmgr.sgml
+++ b/doc/repmgr.sgml
@@ -69,6 +69,7 @@
&configuration;
&cloning-standbys;
+ &promoting-standby;
&command-reference;