mirror of
https://github.com/EnterpriseDB/repmgr.git
synced 2026-03-23 15:16:29 +00:00
170 lines
7.5 KiB
Plaintext
<chapter id="repmgrd-pausing" xreflabel="Pausing repmgrd">

<indexterm>
<primary>repmgrd</primary>
<secondary>pausing</secondary>
</indexterm>

<indexterm>
<primary>pausing repmgrd</primary>
</indexterm>

<title>Pausing repmgrd</title>

<para>
In normal operation, <application>repmgrd</application> monitors the state of the
PostgreSQL node it is running on, and will take appropriate action if problems
are detected, e.g. (if so configured) promote the node to primary if the existing
primary is determined to have failed.
</para>

<para>
However, <application>repmgrd</application> is unable to distinguish between
planned outages (such as performing a <link linkend="performing-switchover">switchover</link>
or upgrading a server) and an actual server outage. In versions prior to &repmgr; 4.2,
it was necessary to stop <application>repmgrd</application> on all nodes (or at least
on all nodes where <application>repmgrd</application> is
<link linkend="repmgrd-automatic-failover">configured for automatic failover</link>)
to prevent <application>repmgrd</application> from making changes to the
replication cluster.
</para>
<para>
From <link linkend="release-4.2">&repmgr; 4.2</link> onwards, <application>repmgrd</application>
can be "paused", i.e. instructed not to take any action such as performing a failover.
This can be done from any node in the cluster, removing the need to stop/restart
each <application>repmgrd</application> individually.
</para>

<sect1 id="repmgrd-pausing-prerequisites">
<title>Prerequisites for pausing <application>repmgrd</application></title>
<para>
In order to be able to pause/unpause <application>repmgrd</application>, the following
prerequisites must be met:
<itemizedlist spacing="compact" mark="bullet">

<listitem>
<simpara><link linkend="release-4.2">&repmgr; 4.2</link> or later must be installed on all nodes.</simpara>
</listitem>

<listitem>
<simpara>The same major &repmgr; version (e.g. 4.2) must be installed on all nodes (and preferably the same minor version).</simpara>
</listitem>

<listitem>
<simpara>
PostgreSQL on all nodes must be accessible from the node where the
<literal>pause</literal>/<literal>unpause</literal> operation is executed, using the
<varname>conninfo</varname> string shown by <link linkend="repmgr-cluster-show"><command>repmgr cluster show</command></link>.
</simpara>
</listitem>
</itemizedlist>
</para>
<note>
<para>
These conditions are required for normal &repmgr; operation in any case.
</para>
</note>

</sect1>

<sect1 id="repmgrd-pausing-execution">
<title>Pausing/unpausing <application>repmgrd</application></title>
<para>
To pause <application>repmgrd</application>, execute <link linkend="repmgr-daemon-pause"><command>repmgr daemon pause</command></link>, e.g.:
<programlisting>
$ repmgr -f /etc/repmgr.conf daemon pause
NOTICE: node 1 (node1) paused
NOTICE: node 2 (node2) paused
NOTICE: node 3 (node3) paused</programlisting>
</para>
<para>
The state of <application>repmgrd</application> on each node can be checked with
<link linkend="repmgr-daemon-status"><command>repmgr daemon status</command></link>, e.g.:
<programlisting>$ repmgr -f /etc/repmgr.conf daemon status
 ID | Name  | Role    | Status  | repmgrd | PID  | Paused?
----+-------+---------+---------+---------+------+---------
 1  | node1 | primary | running | running | 7851 | yes
 2  | node2 | standby | running | running | 7889 | yes
 3  | node3 | standby | running | running | 7918 | yes</programlisting>
</para>

<note>
<para>
If executing a switchover with <link linkend="repmgr-standby-switchover"><command>repmgr standby switchover</command></link>,
&repmgr; will automatically pause/unpause <application>repmgrd</application> as part of the switchover process.
</para>
</note>

<para>
If the primary (in this example, <literal>node1</literal>) is stopped, <application>repmgrd</application>
running on one of the standbys (here: <literal>node2</literal>) will react like this:
<programlisting>
[2018-09-20 12:22:21] [WARNING] unable to connect to upstream node "node1" (node ID: 1)
[2018-09-20 12:22:21] [INFO] checking state of node 1, 1 of 5 attempts
[2018-09-20 12:22:21] [INFO] sleeping 1 seconds until next reconnection attempt
...
[2018-09-20 12:22:24] [INFO] sleeping 1 seconds until next reconnection attempt
[2018-09-20 12:22:25] [INFO] checking state of node 1, 5 of 5 attempts
[2018-09-20 12:22:25] [WARNING] unable to reconnect to node 1 after 5 attempts
[2018-09-20 12:22:25] [NOTICE] node is paused
[2018-09-20 12:22:33] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in degraded state
[2018-09-20 12:22:33] [DETAIL] repmgrd paused by administrator
[2018-09-20 12:22:33] [HINT] execute "repmgr daemon unpause" to resume normal failover mode</programlisting>
</para>
<para>
If the primary becomes available again (e.g. following a software upgrade), <application>repmgrd</application>
will automatically reconnect, e.g.:
<programlisting>
[2018-09-20 13:12:41] [NOTICE] reconnected to upstream node 1 after 8 seconds, resuming monitoring</programlisting>
</para>

<para>
To unpause <application>repmgrd</application>, execute <link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link>, e.g.:
<programlisting>
$ repmgr -f /etc/repmgr.conf daemon unpause
NOTICE: node 1 (node1) unpaused
NOTICE: node 2 (node2) unpaused
NOTICE: node 3 (node3) unpaused</programlisting>
</para>
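<para>
Taken together, the commands in this section suggest the following sequence for a
planned maintenance window. This is a minimal sketch rather than a prescribed procedure:
the configuration file path is the one used in the examples above, and the maintenance
step itself is a placeholder.
<programlisting>
# 1. pause repmgrd on all nodes (can be run from any node in the cluster)
$ repmgr -f /etc/repmgr.conf daemon pause

# 2. confirm that every node reports "yes" in the "Paused?" column
$ repmgr -f /etc/repmgr.conf daemon status

# 3. carry out the planned maintenance (placeholder step, e.g. a server upgrade)

# 4. once all nodes are back in service, resume normal failover behaviour
$ repmgr -f /etc/repmgr.conf daemon unpause</programlisting>
</para>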

<note>
<para>
If the previous primary is no longer accessible when <application>repmgrd</application>
is unpaused, no failover action will be taken. Instead, a new primary must be manually promoted using
<link linkend="repmgr-standby-promote"><command>repmgr standby promote</command></link>,
and any standbys attached to the new primary with
<link linkend="repmgr-standby-follow"><command>repmgr standby follow</command></link>.
</para>
<para>
This is to prevent <link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link>
from resulting in the automatic promotion of a new primary, which may be a problem particularly
in larger clusters, where <application>repmgrd</application> could select a different promotion
candidate from the one intended by the administrator.
</para>
</note>
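<para>
As an illustration of the manual recovery described in the note above, the following
sketch assumes that <literal>node2</literal> is the standby chosen for promotion and
<literal>node3</literal> is the only other standby. The node names and configuration
file path are taken from the earlier examples and will differ in other clusters.
<programlisting>
# on node2, the standby chosen by the administrator as the new primary
$ repmgr -f /etc/repmgr.conf standby promote

# on node3 (and any other remaining standby), attach to the new primary
$ repmgr -f /etc/repmgr.conf standby follow</programlisting>
</para>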

<sect2 id="repmgrd-pausing-details">
<title>Details on the <application>repmgrd</application> pausing mechanism</title>

<para>
The pause state of each node is retained across a PostgreSQL restart.
</para>

<para>
<link linkend="repmgr-daemon-pause"><command>repmgr daemon pause</command></link> and
<link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link> can be
executed even if <application>repmgrd</application> is not running; in this case,
<application>repmgrd</application> will start up in whichever pause state has been set.
</para>
<note>
<para>
<link linkend="repmgr-daemon-pause"><command>repmgr daemon pause</command></link> and
<link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link>
<emphasis>do not</emphasis> stop/start <application>repmgrd</application>.
</para>
</note>
</sect2>
</sect1>
</chapter>