<chapter id="repmgrd-automatic-failover" xreflabel="Automatic failover with repmgrd">
  <indexterm>
    <primary>repmgrd</primary>
    <secondary>automatic failover</secondary>
  </indexterm>

  <title>Automatic failover with repmgrd</title>

  <para>
    <application>repmgrd</application> is a management and monitoring daemon which runs
    on each node in a replication cluster. It can automate actions such as
    failover and updating standbys to follow the new primary, as well as
    provide monitoring information about the state of each standby.
  </para>

  <sect1 id="repmgrd-witness-server" xreflabel="Using a witness server with repmgrd">
    <indexterm>
      <primary>repmgrd</primary>
      <secondary>witness server</secondary>
    </indexterm>

    <indexterm>
      <primary>witness server</primary>
      <secondary>repmgrd</secondary>
    </indexterm>

    <title>Using a witness server with repmgrd</title>
    <para>
      In a situation caused e.g. by a network interruption between two
      data centres, it's important to avoid a "split-brain" situation where
      both sides of the network assume they are the active segment and the
      side without an active primary unilaterally promotes one of its standbys.
    </para>
    <para>
      To prevent this situation from arising, it's essential to ensure that one
      network segment has a "voting majority", so other segments will know
      they're in the minority and will not attempt to promote a new primary. Where
      an odd number of servers exists, this is not an issue. However, if each
      network segment has an even number of nodes, it's necessary to provide some way
      of ensuring a majority, which is where the witness server becomes useful.
    </para>
    <para>
      The witness server is not a fully-fledged standby node and is not integrated into
      replication, but it effectively represents the "casting vote" when
      deciding which network segment has a majority. A witness server can
      be set up using <link linkend="repmgr-witness-register"><command>repmgr witness register</command></link>;
      see also the section <link linkend="using-witness-server">Using a witness server</link>.
    </para>
    <note>
      <para>
        It only makes sense to create a witness server in conjunction with running
        <application>repmgrd</application>; the witness server will require its own
        <application>repmgrd</application> instance.
      </para>
    </note>
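    <para>
      As an illustrative sketch (hostnames and file paths here are assumptions,
      not defaults), a witness server with its own <filename>repmgr.conf</filename>
      might be registered and then monitored as follows:
<programlisting>
# on the witness node; "node1" is assumed to be the current primary
repmgr -f /etc/repmgr.conf witness register -h node1

# the witness also needs its own repmgrd instance
repmgrd -f /etc/repmgr.conf</programlisting>
    </para>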
</sect1>
  <sect1 id="repmgrd-network-split" xreflabel="Handling network splits with repmgrd">
    <indexterm>
      <primary>repmgrd</primary>
      <secondary>network splits</secondary>
    </indexterm>

    <indexterm>
      <primary>network splits</primary>
    </indexterm>

    <title>Handling network splits with repmgrd</title>
    <para>
      A common pattern for replication cluster setups is to spread servers over
      more than one datacentre. This can provide benefits such as geographically
      distributed read replicas and disaster recovery (DR) capability. However,
      this also means there is a risk of disconnection at the network level between
      datacentre locations, which would result in a split-brain scenario if
      servers in a secondary datacentre were no longer able to see the primary
      in the main datacentre and promoted a standby among themselves.
    </para>
    <para>
      &repmgr; enables provision of a "<xref linkend="witness-server"/>" to
      artificially create a quorum of servers in a particular location, ensuring
      that nodes in another location will not elect a new primary if they
      are unable to see the majority of nodes. However, this approach does not
      scale well, particularly with more complex replication setups, e.g.
      where the majority of nodes are located outside of the primary datacentre.
      It also means the <literal>witness</literal> node needs to be managed as an
      extra PostgreSQL instance outside of the main replication cluster, which
      adds administrative and programming complexity.
    </para>
    <para>
      <literal>repmgr4</literal> introduces the concept of <literal>location</literal>:
      each node is associated with an arbitrary location string (default is
      <literal>default</literal>); this is set in <filename>repmgr.conf</filename>, e.g.:
<programlisting>
node_id=1
node_name=node1
conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/data'
location='dc1'</programlisting>
    </para>
    <para>
      In a failover situation, <application>repmgrd</application> will check if any servers in the
      same location as the current primary node are visible. If not, <application>repmgrd</application>
      will assume a network interruption and not promote any node in any
      other location (it will however enter <link linkend="repmgrd-degraded-monitoring">degraded monitoring</link>
      mode until a primary becomes visible).
    </para>
</sect1>
  <sect1 id="repmgrd-failover-validation" xreflabel="Failover validation">
    <indexterm>
      <primary>repmgrd</primary>
      <secondary>failover validation</secondary>
    </indexterm>

    <indexterm>
      <primary>failover validation</primary>
    </indexterm>

    <title>Failover validation</title>
    <para>
      From <link linkend="release-4.3">repmgr 4.3</link>, &repmgr; makes it possible to provide a script
      to <application>repmgrd</application> which, in a failover situation,
      will be executed by the promotion candidate (the node which has been selected
      to be the new primary) to confirm whether the node should actually be promoted.
    </para>
    <para>
      To use this, set <option>failover_validation_command</option> in <filename>repmgr.conf</filename>
      to a script executable by the <literal>postgres</literal> system user, e.g.:
<programlisting>
failover_validation_command=/path/to/script.sh %n %a</programlisting>
    </para>
    <para>
      The <literal>%n</literal> parameter will be replaced with the node ID, and the
      <literal>%a</literal> parameter will be replaced with the node name when the script is executed.
    </para>
    <para>
      This script must return an exit code of <literal>0</literal> to indicate the node should promote itself.
      Any other value will result in the promotion being aborted and the election rerun.
      There is a pause of <option>election_rerun_interval</option> seconds before the election is rerun.
    </para>
    <para>
      Sample <application>repmgrd</application> log file output during which the failover validation
      script rejects the proposed promotion candidate:
<programlisting>
[2019-03-13 21:01:30] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 4 seconds
[2019-03-13 21:01:30] [NOTICE] promotion candidate is "node2" (ID: 2)
[2019-03-13 21:01:30] [NOTICE] executing "failover_validation_command"
[2019-03-13 21:01:30] [DETAIL] /usr/local/bin/failover-validation.sh 2
[2019-03-13 21:01:30] [INFO] output returned by failover validation command:
Node ID: 2

[2019-03-13 21:01:30] [NOTICE] failover validation command returned a non-zero value: "1"
[2019-03-13 21:01:30] [NOTICE] promotion candidate election will be rerun
[2019-03-13 21:01:30] [INFO] 1 followers to notify
[2019-03-13 21:01:30] [NOTICE] notifying node "node3" (node ID: 3) to rerun promotion candidate selection
INFO: node 3 received notification to rerun promotion candidate election
[2019-03-13 21:01:30] [NOTICE] rerunning election after 15 seconds ("election_rerun_interval")</programlisting>
    </para>
</sect1>
  <sect1 id="cascading-replication" xreflabel="Cascading replication">
    <indexterm>
      <primary>repmgrd</primary>
      <secondary>cascading replication</secondary>
    </indexterm>

    <indexterm>
      <primary>cascading replication</primary>
      <secondary>repmgrd</secondary>
    </indexterm>

    <title>repmgrd and cascading replication</title>
    <para>
      Cascading replication - where a standby connects to an upstream standby rather
      than to the primary server itself - was introduced in PostgreSQL 9.2. &repmgr; and
      <application>repmgrd</application> support cascading replication by keeping track of the relationship
      between standby servers - each node record is stored with the node ID of its
      upstream ("parent") server (except of course the primary server).
    </para>
    <para>
      In a failover situation where the primary node fails and a top-level standby
      is promoted, a standby connected to another standby will not be affected
      and will continue working as normal (even if the upstream standby it's connected
      to becomes the primary node). If however the node's direct upstream fails,
      the "cascaded standby" will attempt to reconnect to that node's parent
      (unless <varname>failover</varname> is set to <literal>manual</literal> in
      <filename>repmgr.conf</filename>).
    </para>
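    <para>
      For example, to prevent a cascaded standby from automatically reconnecting
      to its failed upstream's parent, its <filename>repmgr.conf</filename> would
      contain:
<programlisting>
failover=manual</programlisting>
    </para>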
  </sect1>

</chapter>