From 5f92fbddf2a76cfd2f7ce63b2716a837f21d3dc5 Mon Sep 17 00:00:00 2001 From: Ian Barwick Date: Wed, 13 Mar 2019 16:55:32 +0900 Subject: [PATCH] doc: various updates --- doc/configuring-witness-server.sgml | 8 +-- doc/repmgrd-automatic-failover.sgml | 20 +++--- doc/repmgrd-demonstration.sgml | 96 ----------------------------- doc/repmgrd-overview.sgml | 7 +++ 4 files changed, 24 insertions(+), 107 deletions(-) delete mode 100644 doc/repmgrd-demonstration.sgml diff --git a/doc/configuring-witness-server.sgml b/doc/configuring-witness-server.sgml index 6f798acf..54b0aee9 100644 --- a/doc/configuring-witness-server.sgml +++ b/doc/configuring-witness-server.sgml @@ -1,7 +1,6 @@ witness server - Using a witness server with repmgrd @@ -9,8 +8,9 @@ A is a normal PostgreSQL instance which is not part of the streaming replication cluster; its purpose is, if a - failover situation occurs, to provide proof that the primary server - itself is unavailable. + failover situation occurs, to provide proof that it is the primary server + itself which is unavailable, rather than e.g. a network split between + different physical locations. @@ -53,7 +53,7 @@ in the same physical location as the cluster's primary server. - This instance should *not* be on the same physical host as the primary server, + This instance should not be on the same physical host as the primary server, as otherwise if the primary server fails due to hardware issues, the witness server will be lost too. diff --git a/doc/repmgrd-automatic-failover.sgml b/doc/repmgrd-automatic-failover.sgml index d89b6de5..8d893b06 100644 --- a/doc/repmgrd-automatic-failover.sgml +++ b/doc/repmgrd-automatic-failover.sgml @@ -27,13 +27,13 @@ Using a witness server with repmgrd In a situation caused e.g. 
by a network interruption between two - data centres, it's important to avoid a "split-brain" situation where + data centres, it's important to avoid a "split-brain" situation where both sides of the network assume they are the active segment and the side without an active primary unilaterally promotes one of its standbys. To prevent this situation happening, it's essential to ensure that one - network segment has a "voting majority", so other segments will know + network segment has a "voting majority", so other segments will know they're in the minority and not attempt to promote a new primary. Where an odd number of servers exists, this is not an issue. However, if each network has an even number of nodes, it's necessary to provide some way @@ -41,13 +41,19 @@ This is not a fully-fledged standby node and is not integrated into - replication, but it effectively represents the "casting vote" when + replication, but it effectively represents the "casting vote" when deciding which network segment has a majority. A witness server can - be set up using . Note that it only - makes sense to create a witness server in conjunction with running - repmgrd; the witness server will require its own - repmgrd instance. + be set up using repmgr witness register; + see also section Using a witness server. + + + It only + makes sense to create a witness server in conjunction with running + repmgrd; the witness server will require its own + repmgrd instance. 
+ + diff --git a/doc/repmgrd-demonstration.sgml b/doc/repmgrd-demonstration.sgml deleted file mode 100644 index 2a0530a9..00000000 --- a/doc/repmgrd-demonstration.sgml +++ /dev/null @@ -1,96 +0,0 @@ - - repmgrd demonstration - - To demonstrate automatic failover, set up a 3-node replication cluster (one primary - and two standbys streaming directly from the primary) so that the cluster looks - something like this: - - $ repmgr -f /etc/repmgr.conf cluster show - ID | Name | Role | Status | Upstream | Location | Connection string - ----+-------+---------+-----------+----------+----------+-------------------------------------- - 1 | node1 | primary | * running | | default | host=node1 dbname=repmgr user=repmgr - 2 | node2 | standby | running | node1 | default | host=node2 dbname=repmgr user=repmgr - 3 | node3 | standby | running | node1 | default | host=node3 dbname=repmgr user=repmgr - - - Start repmgrd on each standby and verify that it's running by examining the - log output, which at log level INFO will look like this: - - [2017-08-24 17:31:00] [NOTICE] using configuration file "/etc/repmgr.conf" - [2017-08-24 17:31:00] [INFO] connecting to database "host=node2 dbname=repmgr user=repmgr" - [2017-08-24 17:31:00] [NOTICE] starting monitoring of node node2 (ID: 2) - [2017-08-24 17:31:00] [INFO] monitoring connection to upstream node "node1" (node ID: 1) - - - Each repmgrd should also have recorded its successful startup as an event: - - $ repmgr -f /etc/repmgr.conf cluster event --event=repmgrd_start - Node ID | Name | Event | OK | Timestamp | Details - ---------+-------+---------------+----+---------------------+------------------------------------------------------------- - 3 | node3 | repmgrd_start | t | 2017-08-24 17:35:54 | monitoring connection to upstream node "node1" (node ID: 1) - 2 | node2 | repmgrd_start | t | 2017-08-24 17:35:50 | monitoring connection to upstream node "node1" (node ID: 1) - 1 | node1 | repmgrd_start | t | 2017-08-24 17:35:46 | monitoring 
cluster primary "node1" (node ID: 1) - - - Now stop the current primary server with e.g.: - - pg_ctl -D /var/lib/postgresql/data -m immediate stop - - - This will force the primary to shut down straight away, aborting all processes - and transactions. This will cause a flurry of activity in the repmgrd log - files as each repmgrd detects the failure of the primary and a failover - decision is made. This is an extract from the log of a standby server (node2) - which has promoted to new primary after failure of the original primary (node1). - - [2017-08-24 23:32:01] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state - [2017-08-24 23:32:08] [WARNING] unable to connect to upstream node "node1" (node ID: 1) - [2017-08-24 23:32:08] [INFO] checking state of node 1, 1 of 5 attempts - [2017-08-24 23:32:08] [INFO] sleeping 1 seconds until next reconnection attempt - [2017-08-24 23:32:09] [INFO] checking state of node 1, 2 of 5 attempts - [2017-08-24 23:32:09] [INFO] sleeping 1 seconds until next reconnection attempt - [2017-08-24 23:32:10] [INFO] checking state of node 1, 3 of 5 attempts - [2017-08-24 23:32:10] [INFO] sleeping 1 seconds until next reconnection attempt - [2017-08-24 23:32:11] [INFO] checking state of node 1, 4 of 5 attempts - [2017-08-24 23:32:11] [INFO] sleeping 1 seconds until next reconnection attempt - [2017-08-24 23:32:12] [INFO] checking state of node 1, 5 of 5 attempts - [2017-08-24 23:32:12] [WARNING] unable to reconnect to node 1 after 5 attempts - INFO: setting voting term to 1 - INFO: node 2 is candidate - INFO: node 3 has received request from node 2 for electoral term 1 (our term: 0) - [2017-08-24 23:32:12] [NOTICE] this node is the winner, will now promote self and inform other nodes - INFO: connecting to standby database - NOTICE: promoting standby - DETAIL: promoting server using 'pg_ctl -l /var/log/postgres/startup.log -w -D '/var/lib/pgsql/data' promote' - INFO: reconnecting to promoted server - 
NOTICE: STANDBY PROMOTE successful - DETAIL: node 2 was successfully promoted to primary - INFO: node 3 received notification to follow node 2 - [2017-08-24 23:32:13] [INFO] switching to primary monitoring mode - - - The cluster status will now look like this, with the original primary (node1) - marked as inactive, and standby node3 now following the new primary - (node2): - - $ repmgr -f /etc/repmgr.conf cluster show - ID | Name | Role | Status | Upstream | Location | Connection string - ----+-------+---------+-----------+----------+----------+---------------------------------------------------- - 1 | node1 | primary | - failed | | default | host=node1 dbname=repmgr user=repmgr - 2 | node2 | primary | * running | | default | host=node2 dbname=repmgr user=repmgr - 3 | node3 | standby | running | node2 | default | host=node3 dbname=repmgr user=repmgr - - - - repmgr cluster event will display a summary of what happened to each server - during the failover: - - $ repmgr -f /etc/repmgr.conf cluster event - Node ID | Name | Event | OK | Timestamp | Details - ---------+-------+--------------------------+----+---------------------+----------------------------------------------------------------------------------- - 3 | node3 | repmgrd_failover_follow | t | 2017-08-24 23:32:16 | node 3 now following new upstream node 2 - 3 | node3 | standby_follow | t | 2017-08-24 23:32:16 | node 3 is now attached to node 2 - 2 | node2 | repmgrd_failover_promote | t | 2017-08-24 23:32:13 | node 2 promoted to primary; old primary 1 marked as failed - 2 | node2 | standby_promote | t | 2017-08-24 23:32:13 | node 2 was successfully promoted to primary - - diff --git a/doc/repmgrd-overview.sgml b/doc/repmgrd-overview.sgml index 5ec26447..5be2805a 100644 --- a/doc/repmgrd-overview.sgml +++ b/doc/repmgrd-overview.sgml @@ -29,6 +29,13 @@ 2 | node2 | standby | running | node1 | default | host=node2 dbname=repmgr user=repmgr 3 | node3 | standby | running | node1 | default | host=node3 dbname=repmgr 
user=repmgr + + + + See the section Required configuration for automatic failover + for an example of the minimal repmgr.conf settings suitable for use with repmgrd. + + Start repmgrd on each standby and verify that it's running by examining the log output, which at log level INFO will look like this:
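The overview hunk above points readers at the "Required configuration for automatic failover" section for a minimal repmgr.conf. As a non-authoritative sketch of what such a minimal configuration typically contains — the parameter names are repmgr's documented configuration settings, but the node ID, host name, and paths below are hypothetical examples, not values from this patch:

```ini
# /etc/repmgr.conf -- illustrative minimal settings for a standby running repmgrd
# (node_id, host name and file paths are hypothetical examples)
node_id=2
node_name='node2'
conninfo='host=node2 dbname=repmgr user=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/data'

# settings needed for automatic failover with repmgrd
failover='automatic'
promote_command='repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
```

A witness server, as described in the witness-server hunk, would carry its own repmgr.conf and its own repmgrd instance, and is registered with `repmgr witness register` (run on the witness node, pointed at the cluster primary).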