mirror of
https://github.com/EnterpriseDB/repmgr.git
synced 2026-03-23 07:06:30 +00:00
In some circumstances, e.g. while performing a switchover, it is essential
that repmgrd does not take any kind of failover action, as this will put
the cluster into an incorrect state.
Previously it was necessary to stop repmgrd on all nodes (or at least
those nodes which repmgrd would consider as promotion candidates), however
this is a cumbersome and potentially risk-prone operation, particularly if the
replication cluster contains more than a couple of servers.
To prevent this issue from occurring, this patch introduces the ability
to "pause" repmgrd on all nodes with a single command ("repmgr daemon pause")
which notifies repmgrd not to take any failover action until the node
is "unpaused" ("repmgr daemon unpause").
"repmgr daemon status" provides an overview of each node and whether repmgrd
is running, and if so whether it is paused.
"repmgr standby switchover" has been modified to automatically pause repmgrd
while carrying out the switchover.
See documentation for further details.
<chapter id="repmgrd-pausing" xreflabel="Pausing repmgrd">

<indexterm>
<primary>repmgrd</primary>
<secondary>pausing</secondary>
</indexterm>

<indexterm>
<primary>pausing repmgrd</primary>
</indexterm>

<title>Pausing repmgrd</title>

<para>
In normal operation, <application>repmgrd</application> monitors the state of the
PostgreSQL node it is running on and takes appropriate action if problems
are detected, for example (if so configured) promoting the node to primary if the existing
primary is determined to have failed.
</para>

<para>
However, <application>repmgrd</application> is unable to distinguish between
planned outages (such as performing a <link linkend="performing-switchover">switchover</link>
or upgrading a server) and an actual server outage. In versions prior to &repmgr; 4.2
it was necessary to stop <application>repmgrd</application> on all nodes (or at least
on all nodes where <application>repmgrd</application> is
<link linkend="repmgrd-automatic-failover">configured for automatic failover</link>)
to prevent <application>repmgrd</application> from making changes to the
replication cluster.
</para>
<para>
From <link linkend="release-4.2">&repmgr; 4.2</link>, <application>repmgrd</application>
can be "paused", i.e. instructed not to take any action such as performing a failover.
This can be done from any node in the cluster, removing the need to stop and restart
each <application>repmgrd</application> individually.
</para>

<sect1 id="repmgrd-pausing-prerequisites">
<title>Prerequisites for pausing <application>repmgrd</application></title>
<para>
In order to be able to pause/unpause <application>repmgrd</application>, the following
prerequisites must be met:
<itemizedlist spacing="compact" mark="bullet">

<listitem>
<simpara><link linkend="release-4.2">&repmgr; 4.2</link> or later must be installed on all nodes.</simpara>
</listitem>

<listitem>
<simpara>The same major &repmgr; version (e.g. 4.2) must be installed on all nodes (and preferably the same minor version).</simpara>
</listitem>

<listitem>
<simpara>
PostgreSQL on all nodes must be accessible from the node where the
<literal>pause</literal>/<literal>unpause</literal> operation is executed, using the
<varname>conninfo</varname> string shown by <link linkend="repmgr-cluster-show"><command>repmgr cluster show</command></link>.
</simpara>
</listitem>
</itemizedlist>
</para>
<note>
<para>
These conditions are required for normal &repmgr; operation in any case.
</para>
</note>
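<para>
The connectivity prerequisite can be checked before pausing. The following is a minimal
sketch, not part of &repmgr;: <literal>check_nodes_reachable</literal> is a hypothetical helper
which reads the table printed by <command>repmgr cluster show</command> on standard input,
assumes that the connection string is the last column of each node row, and probes each
one with PostgreSQL's <command>pg_isready</command>. Note that <command>pg_isready</command>
only reports whether the server accepts connections; it does not verify authentication.
</para>
<programlisting>
```shell
#!/bin/sh
# Sketch only: check_nodes_reachable is a hypothetical helper, not part of repmgr.
# Assumption: the connection string is the last "|"-separated column of each node row.
check_nodes_reachable() {
    awk -F'|' '
        # Node rows begin with a numeric ID; the header and separator do not.
        $1 ~ /^[ \t]*[0-9]+[ \t]*$/ {
            conninfo = $NF
            sub(/^[ \t]+/, "", conninfo)
            sub(/[ \t]+$/, "", conninfo)
            print conninfo
        }
    ' | (
        failed=0
        while IFS= read -r conninfo; do
            # pg_isready accepts a connection string via -d; -q suppresses output.
            if ! pg_isready -q -d "$conninfo"; then
                echo "unreachable: $conninfo"
                failed=1
            fi
        done
        # Exit status of this subshell becomes the status of the function.
        exit "$failed"
    )
}
```
</programlisting>
<para>
It could be used as <command>repmgr -f /etc/repmgr.conf cluster show | check_nodes_reachable</command>;
a non-zero exit status indicates that at least one node did not respond.
</para>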

</sect1>

<sect1 id="repmgrd-pausing-execution">
<title>Pausing/unpausing <application>repmgrd</application></title>
<para>
To pause <application>repmgrd</application>, execute <link linkend="repmgr-daemon-pause"><command>repmgr daemon pause</command></link>, e.g.:
<programlisting>
$ repmgr -f /etc/repmgr.conf daemon pause
NOTICE: node 1 (node1) paused
NOTICE: node 2 (node2) paused
NOTICE: node 3 (node3) paused</programlisting>
</para>
<para>
The state of <application>repmgrd</application> on each node can be checked with
<link linkend="repmgr-daemon-status"><command>repmgr daemon status</command></link>, e.g.:
<programlisting>$ repmgr -f /etc/repmgr.conf daemon status
 ID | Name  | Role    | Status  | repmgrd | PID  | Paused?
----+-------+---------+---------+---------+------+---------
 1  | node1 | primary | running | running | 7851 | yes
 2  | node2 | standby | running | running | 7889 | yes
 3  | node3 | standby | running | running | 7918 | yes</programlisting>
</para>
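<para>
When scripting maintenance, it can be useful to confirm that every node reports itself as
paused before proceeding. The following is a minimal sketch, not part of &repmgr;:
<literal>all_nodes_paused</literal> is a hypothetical helper which parses the table layout
shown above and assumes that <literal>Paused?</literal> is the last column.
</para>
<programlisting>
```shell
#!/bin/sh
# Sketch only: all_nodes_paused is a hypothetical helper, not part of repmgr.
# It reads the table printed by "repmgr daemon status" on standard input and
# succeeds only if every node row shows "yes" in the last ("Paused?") column.
all_nodes_paused() {
    awk -F'|' '
        # Node rows begin with a numeric ID; the header and separator do not.
        $1 ~ /^[ \t]*[0-9]+[ \t]*$/ {
            rows++
            paused = $NF
            gsub(/[ \t]/, "", paused)
            if (paused != "yes") bad++
        }
        # Fail if no node rows were seen or if any node is not paused.
        END { if (rows == 0 || bad > 0) exit 1 }
    '
}
```
</programlisting>
<para>
For example, <command>repmgr -f /etc/repmgr.conf daemon status | all_nodes_paused</command>
returns a zero exit status only when all listed nodes are paused.
</para>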

<note>
<para>
If executing a switchover with <link linkend="repmgr-standby-switchover"><command>repmgr standby switchover</command></link>,
&repmgr; will automatically pause/unpause <application>repmgrd</application> as part of the switchover process.
</para>
</note>

<para>
If the primary (in this example, <literal>node1</literal>) is stopped while <application>repmgrd</application>
is paused, <application>repmgrd</application> running on one of the standbys (here: <literal>node2</literal>)
will react like this:
<programlisting>
[2018-09-20 12:22:21] [WARNING] unable to connect to upstream node "node1" (node ID: 1)
[2018-09-20 12:22:21] [INFO] checking state of node 1, 1 of 5 attempts
[2018-09-20 12:22:21] [INFO] sleeping 1 seconds until next reconnection attempt
...
[2018-09-20 12:22:24] [INFO] sleeping 1 seconds until next reconnection attempt
[2018-09-20 12:22:25] [INFO] checking state of node 1, 5 of 5 attempts
[2018-09-20 12:22:25] [WARNING] unable to reconnect to node 1 after 5 attempts
[2018-09-20 12:22:25] [NOTICE] node is paused
[2018-09-20 12:22:33] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in degraded state
[2018-09-20 12:22:33] [DETAIL] repmgrd paused by administrator
[2018-09-20 12:22:33] [HINT] execute "repmgr daemon unpause" to resume normal failover mode</programlisting>
</para>
<para>
If the primary becomes available again (e.g. following a software upgrade), <application>repmgrd</application>
will automatically reconnect, e.g.:
<programlisting>
[2018-09-20 13:12:41] [NOTICE] reconnected to upstream node 1 after 8 seconds, resuming monitoring</programlisting>
</para>

<para>
To unpause <application>repmgrd</application>, execute <link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link>, e.g.:
<programlisting>
$ repmgr -f /etc/repmgr.conf daemon unpause
NOTICE: node 1 (node1) unpaused
NOTICE: node 2 (node2) unpaused
NOTICE: node 3 (node3) unpaused</programlisting>
</para>

<note>
<para>
If the previous primary is no longer accessible when <application>repmgrd</application>
is unpaused, no failover action will be taken. Instead, a new primary must be manually promoted using
<link linkend="repmgr-standby-promote"><command>repmgr standby promote</command></link>,
and any standbys attached to the new primary with
<link linkend="repmgr-standby-follow"><command>repmgr standby follow</command></link>.
</para>
<para>
This is to prevent <link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link>
from resulting in the automatic promotion of a new primary, which may be a problem particularly
in larger clusters, where <application>repmgrd</application> could select a different promotion
candidate from the one intended by the administrator.
</para>
</note>
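<para>
For planned maintenance, the pause and unpause steps can be wrapped around the maintenance
command so that <application>repmgrd</application> is unpaused again even if that command fails.
The following is a minimal sketch under stated assumptions: <literal>run_paused</literal> and
<literal>REPMGR_CONF</literal> are hypothetical names, not part of &repmgr;, and only the
<command>repmgr daemon pause</command> and <command>repmgr daemon unpause</command> commands
described in this section are used.
</para>
<programlisting>
```shell
#!/bin/sh
# Sketch only: run_paused and REPMGR_CONF are hypothetical names, not part of repmgr.
REPMGR_CONF="${REPMGR_CONF:-/etc/repmgr.conf}"

# Pause repmgrd on all nodes, run the given command, then unpause again
# whether or not the command succeeded. Returns the exit status of the command.
run_paused() {
    repmgr -f "$REPMGR_CONF" daemon pause || return 1
    rc=0
    "$@" || rc=$?
    repmgr -f "$REPMGR_CONF" daemon unpause || return 1
    return "$rc"
}
```
</programlisting>
<para>
It would be invoked with the maintenance command as its arguments, e.g.
<command>run_paused ./upgrade-node.sh</command>, where the script name is a placeholder.
Unpausing unconditionally is a design choice; an administrator may prefer to leave
<application>repmgrd</application> paused after a failed maintenance step and investigate first.
</para>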

<sect2 id="repmgrd-pausing-details">
<title>Details on the <application>repmgrd</application> pausing mechanism</title>

<para>
The pause state of each node persists across a PostgreSQL restart.
</para>

<para>
<link linkend="repmgr-daemon-pause"><command>repmgr daemon pause</command></link> and
<link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link> can be
executed even if <application>repmgrd</application> is not running; in this case,
<application>repmgrd</application> will start up in whichever pause state has been set.
</para>
<note>
<para>
<link linkend="repmgr-daemon-pause"><command>repmgr daemon pause</command></link> and
<link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link>
<emphasis>do not</emphasis> stop/start <application>repmgrd</application>.
</para>
</note>
</sect2>
</sect1>
</chapter>