mirror of https://github.com/EnterpriseDB/repmgr.git (synced 2026-03-26 08:36:30 +00:00)

doc: various updates
@@ -1,7 +1,6 @@
 <chapter id="using-witness-server">
 <indexterm>
 <primary>witness server</primary>
-<seealso>Using a witness server with repmgrd</seealso>
 </indexterm>


@@ -9,8 +8,9 @@
 <para>
 A <xref linkend="witness-server"> is a normal PostgreSQL instance which
 is not part of the streaming replication cluster; its purpose is, if a
-failover situation occurs, to provide proof that the primary server
-itself is unavailable.
+failover situation occurs, to provide proof that it is the primary server
+itself which is unavailable, rather than e.g. a network split between
+different physical locations.
 </para>

 <para>
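As an aside, a minimal sketch of preparing such a witness instance (hostnames and paths are hypothetical, not taken from this commit): the witness is a plain standalone PostgreSQL server with the usual repmgr user and metadata database, and no replication configuration of its own.

    # On the witness host: an ordinary, standalone instance -- not a standby,
    # with no streaming replication configuration.
    $ initdb -D /var/lib/postgresql/witness
    $ pg_ctl -D /var/lib/postgresql/witness -l /tmp/witness.log start
    # repmgr expects its metadata user and database on the witness as well.
    $ createuser --superuser repmgr
    $ createdb repmgr --owner=repmgr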
@@ -53,7 +53,7 @@
 in the same physical location as the cluster's primary server.
 </para>
 <para>
-This instance should *not* be on the same physical host as the primary server,
+This instance should <emphasis>not</emphasis> be on the same physical host as the primary server,
 as otherwise if the primary server fails due to hardware issues, the witness
 server will be lost too.
 </para>
@@ -27,13 +27,13 @@
 <title>Using a witness server with repmgrd</title>
 <para>
 In a situation caused e.g. by a network interruption between two
 data centres, it's important to avoid a "split-brain" situation where
 both sides of the network assume they are the active segment and the
 side without an active primary unilaterally promotes one of its standbys.
 </para>
 <para>
 To prevent this situation happening, it's essential to ensure that one
 network segment has a "voting majority", so other segments will know
 they're in the minority and not attempt to promote a new primary. Where
 an odd number of servers exists, this is not an issue. However, if each
 network has an even number of nodes, it's necessary to provide some way
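To make the voting arithmetic concrete, consider a hypothetical 2+2 topology (node names, locations and the witness row are invented for illustration, in the style of the cluster show listings used elsewhere in these docs): after a network split, two nodes in each location would deadlock at two votes apiece, but a witness placed in the primary's location, as recommended above, gives that side a 3-2 majority, so the minority side knows not to promote.

    $ repmgr -f /etc/repmgr.conf cluster show
     ID | Name    | Role    | Status    | Upstream | Location | Connection string
    ----+---------+---------+-----------+----------+----------+----------------------------------------
     1  | node1   | primary | * running |          | dc1      | host=node1 dbname=repmgr user=repmgr
     2  | node2   | standby |   running | node1    | dc1      | host=node2 dbname=repmgr user=repmgr
     3  | node3   | standby |   running | node1    | dc2      | host=node3 dbname=repmgr user=repmgr
     4  | node4   | standby |   running | node1    | dc2      | host=node4 dbname=repmgr user=repmgr
     5  | witness | witness | * running | node1    | dc1      | host=witness dbname=repmgr user=repmgr

Here dc1 holds node1, node2 and the witness (three votes), while dc2 holds only node3 and node4 (two votes).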
@@ -41,13 +41,19 @@
 </para>
 <para>
 This is not a fully-fledged standby node and is not integrated into
 replication, but it effectively represents the "casting vote" when
 deciding which network segment has a majority. A witness server can
-be set up using <xref linkend="repmgr-witness-register">. Note that it only
-makes sense to create a witness server in conjunction with running
-<application>repmgrd</application>; the witness server will require its own
-<application>repmgrd</application> instance.
+be set up using <link linkend="repmgr-witness-register"><command>repmgr witness register</command></link>;
+see also section <link linkend="using-witness-server">Using a witness server</link>.
 </para>
+<note>
+ <para>
+  It only
+  makes sense to create a witness server in conjunction with running
+  <application>repmgrd</application>; the witness server will require its own
+  <application>repmgrd</application> instance.
+ </para>
+</note>

 </sect1>

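As a sketch of the registration step the new text links to (assuming the witness host already has its own repmgr.conf, and that node1 is the current primary, as in this document's other examples):

    # Run on the witness server itself; -h points at the cluster's
    # current primary node.
    $ repmgr -f /etc/repmgr.conf witness register -h node1
    # Per the <note> added above, the witness then needs its own
    # repmgrd instance:
    $ repmgrd -f /etc/repmgr.conf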
@@ -1,96 +0,0 @@
-<chapter id="repmgrd-demonstration">
-<title>repmgrd demonstration</title>
-<para>
-To demonstrate automatic failover, set up a 3-node replication cluster (one primary
-and two standbys streaming directly from the primary) so that the cluster looks
-something like this:
-<programlisting>
-$ repmgr -f /etc/repmgr.conf cluster show
-ID | Name | Role | Status | Upstream | Location | Connection string
-----+-------+---------+-----------+----------+----------+--------------------------------------
-1 | node1 | primary | * running | | default | host=node1 dbname=repmgr user=repmgr
-2 | node2 | standby | running | node1 | default | host=node2 dbname=repmgr user=repmgr
-3 | node3 | standby | running | node1 | default | host=node3 dbname=repmgr user=repmgr</programlisting>
-</para>
-<para>
-Start <application>repmgrd</application> on each standby and verify that it's running by examining the
-log output, which at log level <literal>INFO</literal> will look like this:
-<programlisting>
-[2017-08-24 17:31:00] [NOTICE] using configuration file "/etc/repmgr.conf"
-[2017-08-24 17:31:00] [INFO] connecting to database "host=node2 dbname=repmgr user=repmgr"
-[2017-08-24 17:31:00] [NOTICE] starting monitoring of node <literal>node2</literal> (ID: 2)
-[2017-08-24 17:31:00] [INFO] monitoring connection to upstream node "node1" (node ID: 1)</programlisting>
-</para>
-<para>
-Each <application>repmgrd</application> should also have recorded its successful startup as an event:
-<programlisting>
-$ repmgr -f /etc/repmgr.conf cluster event --event=repmgrd_start
-Node ID | Name | Event | OK | Timestamp | Details
----------+-------+---------------+----+---------------------+-------------------------------------------------------------
-3 | node3 | repmgrd_start | t | 2017-08-24 17:35:54 | monitoring connection to upstream node "node1" (node ID: 1)
-2 | node2 | repmgrd_start | t | 2017-08-24 17:35:50 | monitoring connection to upstream node "node1" (node ID: 1)
-1 | node1 | repmgrd_start | t | 2017-08-24 17:35:46 | monitoring cluster primary "node1" (node ID: 1) </programlisting>
-</para>
-<para>
-Now stop the current primary server with e.g.:
-<programlisting>
-pg_ctl -D /var/lib/postgresql/data -m immediate stop</programlisting>
-</para>
-<para>
-This will force the primary to shut down straight away, aborting all processes
-and transactions. This will cause a flurry of activity in the <application>repmgrd</application> log
-files as each <application>repmgrd</application> detects the failure of the primary and a failover
-decision is made. This is an extract from the log of a standby server (<literal>node2</literal>)
-which has promoted to new primary after failure of the original primary (<literal>node1</literal>).
-<programlisting>
-[2017-08-24 23:32:01] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state
-[2017-08-24 23:32:08] [WARNING] unable to connect to upstream node "node1" (node ID: 1)
-[2017-08-24 23:32:08] [INFO] checking state of node 1, 1 of 5 attempts
-[2017-08-24 23:32:08] [INFO] sleeping 1 seconds until next reconnection attempt
-[2017-08-24 23:32:09] [INFO] checking state of node 1, 2 of 5 attempts
-[2017-08-24 23:32:09] [INFO] sleeping 1 seconds until next reconnection attempt
-[2017-08-24 23:32:10] [INFO] checking state of node 1, 3 of 5 attempts
-[2017-08-24 23:32:10] [INFO] sleeping 1 seconds until next reconnection attempt
-[2017-08-24 23:32:11] [INFO] checking state of node 1, 4 of 5 attempts
-[2017-08-24 23:32:11] [INFO] sleeping 1 seconds until next reconnection attempt
-[2017-08-24 23:32:12] [INFO] checking state of node 1, 5 of 5 attempts
-[2017-08-24 23:32:12] [WARNING] unable to reconnect to node 1 after 5 attempts
-INFO: setting voting term to 1
-INFO: node 2 is candidate
-INFO: node 3 has received request from node 2 for electoral term 1 (our term: 0)
-[2017-08-24 23:32:12] [NOTICE] this node is the winner, will now promote self and inform other nodes
-INFO: connecting to standby database
-NOTICE: promoting standby
-DETAIL: promoting server using 'pg_ctl -l /var/log/postgres/startup.log -w -D '/var/lib/pgsql/data' promote'
-INFO: reconnecting to promoted server
-NOTICE: STANDBY PROMOTE successful
-DETAIL: node 2 was successfully promoted to primary
-INFO: node 3 received notification to follow node 2
-[2017-08-24 23:32:13] [INFO] switching to primary monitoring mode</programlisting>
-</para>
-<para>
-The cluster status will now look like this, with the original primary (<literal>node1</literal>)
-marked as inactive, and standby <literal>node3</literal> now following the new primary
-(<literal>node2</literal>):
-<programlisting>
-$ repmgr -f /etc/repmgr.conf cluster show
-ID | Name | Role | Status | Upstream | Location | Connection string
-----+-------+---------+-----------+----------+----------+----------------------------------------------------
-1 | node1 | primary | - failed | | default | host=node1 dbname=repmgr user=repmgr
-2 | node2 | primary | * running | | default | host=node2 dbname=repmgr user=repmgr
-3 | node3 | standby | running | node2 | default | host=node3 dbname=repmgr user=repmgr</programlisting>
-
-</para>
-<para>
-<command>repmgr cluster event</command> will display a summary of what happened to each server
-during the failover:
-<programlisting>
-$ repmgr -f /etc/repmgr.conf cluster event
-Node ID | Name | Event | OK | Timestamp | Details
----------+-------+--------------------------+----+---------------------+-----------------------------------------------------------------------------------
-3 | node3 | repmgrd_failover_follow | t | 2017-08-24 23:32:16 | node 3 now following new upstream node 2
-3 | node3 | standby_follow | t | 2017-08-24 23:32:16 | node 3 is now attached to node 2
-2 | node2 | repmgrd_failover_promote | t | 2017-08-24 23:32:13 | node 2 promoted to primary; old primary 1 marked as failed
-2 | node2 | standby_promote | t | 2017-08-24 23:32:13 | node 2 was successfully promoted to primary</programlisting>
-</para>
-</chapter>
@@ -29,6 +29,13 @@
 2 | node2 | standby | running | node1 | default | host=node2 dbname=repmgr user=repmgr
 3 | node3 | standby | running | node1 | default | host=node3 dbname=repmgr user=repmgr</programlisting>
 </para>

+<tip>
+ <para>
+  See section <link linkend="repmgrd-automatic-failover-configuration">Required configuration for automatic failover</link>
+  for an example of minimal <filename>repmgr.conf</filename> file settings suitable for use with <application>repmgrd</application>.
+ </para>
+</tip>
 <para>
 Start <application>repmgrd</application> on each standby and verify that it's running by examining the
 log output, which at log level <literal>INFO</literal> will look like this:
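For orientation, a sketch of the kind of minimal repmgr.conf settings the tip above refers to (values mirror node2 from the listings in this document; the linked section remains the authoritative reference):

    node_id=2
    node_name='node2'
    conninfo='host=node2 dbname=repmgr user=repmgr'
    data_directory='/var/lib/postgresql/data'

    # required for repmgrd automatic failover
    failover='automatic'
    promote_command='repmgr standby promote -f /etc/repmgr.conf --log-to-file'
    follow_command='repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'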