mirror of
https://github.com/EnterpriseDB/repmgr.git
synced 2026-03-25 16:16:29 +00:00
Further repmgrd documentation
@@ -46,6 +46,7 @@

<!ENTITY repmgrd-automatic-failover SYSTEM "repmgrd-automatic-failover.sgml">
<!ENTITY repmgrd-configuration SYSTEM "repmgrd-configuration.sgml">
<!ENTITY repmgrd-demonstration SYSTEM "repmgrd-demonstration.sgml">

<!ENTITY repmgr-primary-register SYSTEM "repmgr-primary-register.sgml">
<!ENTITY repmgr-primary-unregister SYSTEM "repmgr-primary-unregister.sgml">
@@ -79,6 +79,7 @@
<title>Using repmgrd</title>
&repmgrd-automatic-failover;
&repmgrd-configuration;
&repmgrd-demonstration;
</part>

<part id="repmgr-command-reference">
@@ -47,5 +47,28 @@
<command>repmgr standby follow</command> will result in the node continuing to follow
the original primary.
</para>
<sect1 id="repmgrd-connection-settings">
<title>repmgrd connection settings</title>
<para>
In addition to the &repmgr; configuration settings, parameters in the
<varname>conninfo</varname> string influence how &repmgr; makes a network connection to
PostgreSQL. In particular, if another server in the replication cluster
is unreachable at network level, system network settings will influence
the length of time it takes to determine that the connection is not possible.
</para>
<para>
Consider explicitly setting <literal>connect_timeout</literal> in the
<varname>conninfo</varname> string; the effective minimum value of <literal>2</literal>
(seconds) ensures that a connection failure at network level is reported
as soon as possible. Otherwise, depending on the system settings (e.g.
<varname>tcp_syn_retries</varname> in Linux), a delay of a minute or more
is possible.
</para>
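<para>
As a minimal sketch, a <varname>conninfo</varname> entry in <filename>repmgr.conf</filename>
with an explicit timeout might look like the following; the host, database and
user names are assumptions borrowed from the demonstration cluster, not required values:
<programlisting>
conninfo='host=node2 dbname=repmgr user=repmgr connect_timeout=2'</programlisting>
</para>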
<para>
For further details on <varname>conninfo</varname> network connection
parameters, see the
<ulink url="https://www.postgresql.org/docs/current/static/libpq-connect.html#LIBPQ-PARAMKEYWORDS">PostgreSQL documentation</ulink>.
</para>
</sect1>

</chapter>
96
doc/repmgrd-demonstration.sgml
Normal file
@@ -0,0 +1,96 @@
<chapter id="repmgrd-demonstration">
<title>repmgrd demonstration</title>
<para>
To demonstrate automatic failover, set up a 3-node replication cluster (one primary
and two standbys streaming directly from the primary) so that the cluster looks
something like this:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Connection string
----+-------+---------+-----------+----------+----------+--------------------------------------
 1  | node1 | primary | * running |          | default  | host=node1 dbname=repmgr user=repmgr
 2  | node2 | standby |   running | node1    | default  | host=node2 dbname=repmgr user=repmgr
 3  | node3 | standby |   running | node1    | default  | host=node3 dbname=repmgr user=repmgr</programlisting>
</para>
<para>
Start <command>repmgrd</command> on each node and verify that it's running by examining the
log output, which on a standby at log level <literal>INFO</literal> will look like this:
<programlisting>
[2017-08-24 17:31:00] [NOTICE] using configuration file "/etc/repmgr.conf"
[2017-08-24 17:31:00] [INFO] connecting to database "host=node2 dbname=repmgr user=repmgr"
[2017-08-24 17:31:00] [NOTICE] starting monitoring of node "node2" (ID: 2)
[2017-08-24 17:31:00] [INFO] monitoring connection to upstream node "node1" (node ID: 1)</programlisting>
</para>
<para>
Each <command>repmgrd</command> should also have recorded its successful startup as an event:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster event --event=repmgrd_start
 Node ID | Name  | Event         | OK | Timestamp           | Details
---------+-------+---------------+----+---------------------+-------------------------------------------------------------
 3       | node3 | repmgrd_start | t  | 2017-08-24 17:35:54 | monitoring connection to upstream node "node1" (node ID: 1)
 2       | node2 | repmgrd_start | t  | 2017-08-24 17:35:50 | monitoring connection to upstream node "node1" (node ID: 1)
 1       | node1 | repmgrd_start | t  | 2017-08-24 17:35:46 | monitoring cluster primary "node1" (node ID: 1)</programlisting>
</para>
<para>
Now stop the current primary server, for example with:
<programlisting>
pg_ctl -D /var/lib/postgresql/data -m immediate stop</programlisting>
</para>
<para>
This forces the primary to shut down straight away, aborting all processes
and transactions, and causes a flurry of activity in the <command>repmgrd</command> log
files as each <command>repmgrd</command> detects the failure of the primary and a failover
decision is made. The following is an extract from the log of a standby server (<literal>node2</literal>)
which was promoted to new primary after failure of the original primary (<literal>node1</literal>).
<programlisting>
[2017-08-24 23:32:01] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state
[2017-08-24 23:32:08] [WARNING] unable to connect to upstream node "node1" (node ID: 1)
[2017-08-24 23:32:08] [INFO] checking state of node 1, 1 of 5 attempts
[2017-08-24 23:32:08] [INFO] sleeping 1 seconds until next reconnection attempt
[2017-08-24 23:32:09] [INFO] checking state of node 1, 2 of 5 attempts
[2017-08-24 23:32:09] [INFO] sleeping 1 seconds until next reconnection attempt
[2017-08-24 23:32:10] [INFO] checking state of node 1, 3 of 5 attempts
[2017-08-24 23:32:10] [INFO] sleeping 1 seconds until next reconnection attempt
[2017-08-24 23:32:11] [INFO] checking state of node 1, 4 of 5 attempts
[2017-08-24 23:32:11] [INFO] sleeping 1 seconds until next reconnection attempt
[2017-08-24 23:32:12] [INFO] checking state of node 1, 5 of 5 attempts
[2017-08-24 23:32:12] [WARNING] unable to reconnect to node 1 after 5 attempts
INFO: setting voting term to 1
INFO: node 2 is candidate
INFO: node 3 has received request from node 2 for electoral term 1 (our term: 0)
[2017-08-24 23:32:12] [NOTICE] this node is the winner, will now promote self and inform other nodes
INFO: connecting to standby database
NOTICE: promoting standby
DETAIL: promoting server using '/home/barwick/devel/builds/HEAD/bin/pg_ctl -l /tmp/postgres.5602.log -w -D '/tmp/repmgr-test/node_2/data' promote'
INFO: reconnecting to promoted server
NOTICE: STANDBY PROMOTE successful
DETAIL: node 2 was successfully promoted to primary
INFO: node 3 received notification to follow node 2
[2017-08-24 23:32:13] [INFO] switching to primary monitoring mode</programlisting>
</para>
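<para>
The reconnection pattern in the extract above (five attempts at one-second intervals)
is governed by <filename>repmgr.conf</filename> settings. A sketch of settings that
would be consistent with it follows; the values and command lines are illustrative
assumptions, not the configuration actually used to produce the log:
<programlisting>
failover=automatic
reconnect_attempts=5
reconnect_interval=1
promote_command='repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'</programlisting>
</para>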
<para>
The cluster status will now look like this, with the original primary (<literal>node1</literal>)
marked as failed, and standby <literal>node3</literal> now following the new primary
(<literal>node2</literal>):
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster show
 ID | Name  | Role    | Status    | Upstream | Location | Connection string
----+-------+---------+-----------+----------+----------+----------------------------------------------------
 1  | node1 | primary | - failed  |          | default  | host=node1 dbname=repmgr user=repmgr
 2  | node2 | primary | * running |          | default  | host=node2 dbname=repmgr user=repmgr
 3  | node3 | standby |   running | node2    | default  | host=node3 dbname=repmgr user=repmgr</programlisting>

</para>
<para>
<command>repmgr cluster event</command> will display a summary of what happened to each server
during the failover:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster event
 Node ID | Name  | Event                    | OK | Timestamp           | Details
---------+-------+--------------------------+----+---------------------+-----------------------------------------------------------------------------------
 3       | node3 | repmgrd_failover_follow  | t  | 2017-08-24 23:32:16 | node 3 now following new upstream node 2
 3       | node3 | standby_follow           | t  | 2017-08-24 23:32:16 | node 3 is now attached to node 2
 2       | node2 | repmgrd_failover_promote | t  | 2017-08-24 23:32:13 | node 2 promoted to primary; old primary 1 marked as failed
 2       | node2 | standby_promote          | t  | 2017-08-24 23:32:13 | node 2 was successfully promoted to primary</programlisting>
</para>
</chapter>