doc: define entity for repmgrd

This commit is contained in:
Ian Barwick
2019-05-01 10:36:54 +09:00
parent 4d1e11533e
commit dbeffbf29a
31 changed files with 298 additions and 297 deletions

View File

@@ -21,42 +21,42 @@
<title>Pausing repmgrd</title>
<para>
In normal operation, <application>repmgrd</application> monitors the state of the
In normal operation, &repmgrd; monitors the state of the
PostgreSQL node it is running on, and will take appropriate action if problems
are detected, e.g. (if so configured) promote the node to primary, if the existing
primary has been determined as failed.
</para>
<para>
However, <application>repmgrd</application> is unable to distinguish between
However, &repmgrd; is unable to distinguish between
planned outages (such as performing a <link linkend="performing-switchover">switchover</link>
or installing PostgreSQL maintenance released), and an actual server outage. In versions prior to
&repmgr; 4.2 it was necessary to stop <application>repmgrd</application> on all nodes (or at least
on all nodes where <application>repmgrd</application> is
&repmgr; 4.2 it was necessary to stop &repmgrd; on all nodes (or at least
on all nodes where &repmgrd; is
<link linkend="repmgrd-automatic-failover">configured for automatic failover</link>)
to prevent <application>repmgrd</application> from making unintentional changes to the
to prevent &repmgrd; from making unintentional changes to the
replication cluster.
</para>
<para>
From <link linkend="release-4.2">&repmgr; 4.2</link>, <application>repmgrd</application>
From <link linkend="release-4.2">&repmgr; 4.2</link>, &repmgrd;
can now be &quot;paused&quot;, i.e. instructed not to take any action such as performing a failover.
This can be done from any node in the cluster, removing the need to stop/restart
each <application>repmgrd</application> individually.
each &repmgrd; individually.
</para>
<note>
<para>
For major PostgreSQL upgrades, e.g. from PostgreSQL 10 to PostgreSQL 11,
<application>repmgrd</application> should be shut down completely and only started up
&repmgrd; should be shut down completely and only started up
once the &repmgr; packages for the new PostgreSQL major version have been installed.
</para>
</note>
<sect2 id="repmgrd-pausing-prerequisites">
<title>Prerequisites for pausing <application>repmgrd</application></title>
<title>Prerequisites for pausing &repmgrd;</title>
<para>
In order to be able to pause/unpause <application>repmgrd</application>, following
In order to be able to pause/unpause &repmgrd;, following
prerequisites must be met:
<itemizedlist spacing="compact" mark="bullet">
@@ -86,9 +86,9 @@
</sect2>
<sect2 id="repmgrd-pausing-execution">
<title>Pausing/unpausing <application>repmgrd</application></title>
<title>Pausing/unpausing &repmgrd;</title>
<para>
To pause <application>repmgrd</application>, execute <link linkend="repmgr-daemon-pause"><command>repmgr daemon pause</command></link>, e.g.:
To pause &repmgrd;, execute <link linkend="repmgr-daemon-pause"><command>repmgr daemon pause</command></link>, e.g.:
<programlisting>
$ repmgr -f /etc/repmgr.conf daemon pause
NOTICE: node 1 (node1) paused
@@ -96,7 +96,7 @@ NOTICE: node 2 (node2) paused
NOTICE: node 3 (node3) paused</programlisting>
</para>
<para>
The state of <application>repmgrd</application> on each node can be checked with
The state of &repmgrd; on each node can be checked with
<link linkend="repmgr-daemon-status"><command>repmgr daemon status</command></link>, e.g.:
<programlisting>$ repmgr -f /etc/repmgr.conf daemon status
ID | Name | Role | Status | repmgrd | PID | Paused?
@@ -109,12 +109,12 @@ NOTICE: node 3 (node3) paused</programlisting>
<note>
<para>
If executing a switchover with <link linkend="repmgr-standby-switchover"><command>repmgr standby switchover</command></link>,
&repmgr; will automatically pause/unpause <application>repmgrd</application> as part of the switchover process.
&repmgr; will automatically pause/unpause &repmgrd; as part of the switchover process.
</para>
</note>
<para>
If the primary (in this example, <literal>node1</literal>) is stopped, <application>repmgrd</application>
If the primary (in this example, <literal>node1</literal>) is stopped, &repmgrd;
running on one of the standbys (here: <literal>node2</literal>) will react like this:
<programlisting>
[2018-09-20 12:22:21] [WARNING] unable to connect to upstream node "node1" (ID: 1)
@@ -130,14 +130,14 @@ NOTICE: node 3 (node3) paused</programlisting>
[2018-09-20 12:22:33] [HINT] execute "repmgr daemon unpause" to resume normal failover mode</programlisting>
</para>
<para>
If the primary becomes available again (e.g. following a software upgrade), <application>repmgrd</application>
If the primary becomes available again (e.g. following a software upgrade), &repmgrd;
will automatically reconnect, e.g.:
<programlisting>
[2018-09-20 13:12:41] [NOTICE] reconnected to upstream node 1 after 8 seconds, resuming monitoring</programlisting>
</para>
<para>
To unpause <application>repmgrd</application>, execute <link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link>, e.g.:
To unpause &repmgrd;, execute <link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link>, e.g.:
<programlisting>
$ repmgr -f /etc/repmgr.conf daemon unpause
NOTICE: node 1 (node1) unpaused
@@ -147,7 +147,7 @@ NOTICE: node 3 (node3) unpaused</programlisting>
<note>
<para>
If the previous primary is no longer accessible when <application>repmgrd</application>
If the previous primary is no longer accessible when &repmgrd;
is unpaused, no failover action will be taken. Instead, a new primary must be manually promoted using
<link linkend="repmgr-standby-promote"><command>repmgr standby promote</command></link>,
and any standbys attached to the new primary with
@@ -156,13 +156,13 @@ NOTICE: node 3 (node3) unpaused</programlisting>
<para>
This is to prevent <link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link>
resulting in the automatic promotion of a new primary, which may be a problem particularly
in larger clusters, where <application>repmgrd</application> could select a different promotion
in larger clusters, where &repmgrd; could select a different promotion
candidate to the one intended by the administrator.
</para>
</note>
</sect2>
<sect2 id="repmgrd-pausing-details">
<title>Details on the <application>repmgrd</application> pausing mechanism</title>
<title>Details on the &repmgrd; pausing mechanism</title>
<para>
The pause state of each node will be stored over a PostgreSQL restart.
@@ -171,14 +171,14 @@ NOTICE: node 3 (node3) unpaused</programlisting>
<para>
<link linkend="repmgr-daemon-pause"><command>repmgr daemon pause</command></link> and
<link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link> can be
executed even if <application>repmgrd</application> is not running; in this case,
<application>repmgrd</application> will start up in whichever pause state has been set.
executed even if &repmgrd; is not running; in this case,
&repmgrd; will start up in whichever pause state has been set.
</para>
<note>
<para>
<link linkend="repmgr-daemon-pause"><command>repmgr daemon pause</command></link> and
<link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link>
<emphasis>do not</emphasis> stop/start <application>repmgrd</application>.
<emphasis>do not</emphasis> stop/start &repmgrd;.
</para>
</note>
</sect2>
@@ -194,7 +194,7 @@ NOTICE: node 3 (node3) unpaused</programlisting>
<para>
If WAL replay has been paused (using <command>pg_wal_replay_pause()</command>,
on PostgreSQL 9.6 and earlier <command>pg_xlog_replay_pause()</command>),
in a failover situation <application>repmgrd</application> will
in a failover situation &repmgrd; will
automatically resume WAL replay.
</para>
<para>
@@ -225,9 +225,9 @@ NOTICE: node 3 (node3) unpaused</programlisting>
<title>"degraded monitoring" mode</title>
<para>
In certain circumstances, <application>repmgrd</application> is not able to fulfill its primary mission
In certain circumstances, &repmgrd; is not able to fulfill its primary mission
of monitoring the node's upstream server. In these cases it enters &quot;degraded monitoring&quot;
mode, where <application>repmgrd</application> remains active but is waiting for the situation
mode, where &repmgrd; remains active but is waiting for the situation
to be resolved.
</para>
<para>
@@ -287,12 +287,12 @@ NOTICE: node 3 (node3) unpaused</programlisting>
<para>
By default, <literal>repmgrd</literal> will continue in degraded monitoring mode indefinitely.
However a timeout (in seconds) can be set with <varname>degraded_monitoring_timeout</varname>,
after which <application>repmgrd</application> will terminate.
after which &repmgrd; will terminate.
</para>
<note>
<para>
If <application>repmgrd</application> is monitoring a primary mode which has been stopped
If &repmgrd; is monitoring a primary mode which has been stopped
and manually restarted as a standby attached to a new primary, it will automatically detect
the status change and update the node record to reflect the node's new status
as an active standby. It will then resume monitoring the node as a standby.
@@ -313,7 +313,7 @@ NOTICE: node 3 (node3) unpaused</programlisting>
<title>Storing monitoring data</title>
<para>
When <application>repmgrd</application> is running with the option <literal>monitoring_history=true</literal>,
When &repmgrd; is running with the option <literal>monitoring_history=true</literal>,
it will constantly write standby node status information to the
<varname>monitoring_history</varname> table, providing a near-real time
overview of replication status on all nodes
@@ -351,7 +351,7 @@ NOTICE: node 3 (node3) unpaused</programlisting>
specify how many day's worth of data should be retained.
</para>
<para>
It's possible to use <application>repmgrd</application> to run in monitoring
It's possible to use &repmgrd; to run in monitoring
mode only (without automatic failover capability) for some or all
nodes by setting <literal>failover=manual</literal> in the node's
<filename>repmgr.conf</filename> file. In the event of the node's upstream failing,