mirror of https://github.com/EnterpriseDB/repmgr.git
synced 2026-03-25 16:16:29 +00:00
doc: define entity for repmgrd
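The hunks below do not include the declaration of the entity that this commit relies on. The following is a minimal sketch of such a declaration. It rests on two assumptions: that the sources use XML syntax (the repmgr documentation may instead be built as SGML), and that &repmgrd; expands to exactly the markup it replaces. The entity is declared as a general entity in the internal subset of the document type declaration. The book root element and the sample paragraph are illustrative only.

```xml
<?xml version="1.0"?>
<!DOCTYPE book [
  <!-- Illustrative declaration only; the real one is not part of this diff.
       Every reference to the entity expands to the inline markup it replaces. -->
  <!ENTITY repmgrd "<application>repmgrd</application>">
]>
<book>
  <para>In normal operation, &repmgrd; monitors the state of the
  PostgreSQL node it is running on.</para>
</book>
```

Every occurrence is a reference to a single declaration. A later change to how repmgrd is marked up therefore only requires editing that declaration, not every paragraph.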
@@ -21,42 +21,42 @@
<title>Pausing repmgrd</title>

<para>
-In normal operation, <application>repmgrd</application> monitors the state of the
+In normal operation, &repmgrd; monitors the state of the
PostgreSQL node it is running on, and will take appropriate action if problems
are detected, e.g. (if so configured) promote the node to primary if the existing
primary has been determined to have failed.
</para>

<para>
-However, <application>repmgrd</application> is unable to distinguish between
+However, &repmgrd; is unable to distinguish between
planned outages (such as performing a <link linkend="performing-switchover">switchover</link>
or installing PostgreSQL maintenance releases) and an actual server outage. In versions prior to
-&repmgr; 4.2 it was necessary to stop <application>repmgrd</application> on all nodes (or at least
-on all nodes where <application>repmgrd</application> is
+&repmgr; 4.2 it was necessary to stop &repmgrd; on all nodes (or at least
+on all nodes where &repmgrd; is
<link linkend="repmgrd-automatic-failover">configured for automatic failover</link>)
-to prevent <application>repmgrd</application> from making unintentional changes to the
+to prevent &repmgrd; from making unintentional changes to the
replication cluster.
</para>

<para>
-From <link linkend="release-4.2">&repmgr; 4.2</link>, <application>repmgrd</application>
+From <link linkend="release-4.2">&repmgr; 4.2</link>, &repmgrd;
can now be "paused", i.e. instructed not to take any action such as performing a failover.
This can be done from any node in the cluster, removing the need to stop/restart
-each <application>repmgrd</application> individually.
+each &repmgrd; individually.
</para>

<note>
<para>
For major PostgreSQL upgrades, e.g. from PostgreSQL 10 to PostgreSQL 11,
-<application>repmgrd</application> should be shut down completely and only started up
+&repmgrd; should be shut down completely and only started up
once the &repmgr; packages for the new PostgreSQL major version have been installed.
</para>
</note>

<sect2 id="repmgrd-pausing-prerequisites">
-<title>Prerequisites for pausing <application>repmgrd</application></title>
+<title>Prerequisites for pausing &repmgrd;</title>
<para>
-In order to be able to pause/unpause <application>repmgrd</application>, the following
+In order to be able to pause/unpause &repmgrd;, the following
prerequisites must be met:
<itemizedlist spacing="compact" mark="bullet">
@@ -86,9 +86,9 @@
</sect2>

<sect2 id="repmgrd-pausing-execution">
-<title>Pausing/unpausing <application>repmgrd</application></title>
+<title>Pausing/unpausing &repmgrd;</title>
<para>
-To pause <application>repmgrd</application>, execute <link linkend="repmgr-daemon-pause"><command>repmgr daemon pause</command></link>, e.g.:
+To pause &repmgrd;, execute <link linkend="repmgr-daemon-pause"><command>repmgr daemon pause</command></link>, e.g.:
<programlisting>
$ repmgr -f /etc/repmgr.conf daemon pause
NOTICE: node 1 (node1) paused
@@ -96,7 +96,7 @@ NOTICE: node 2 (node2) paused
NOTICE: node 3 (node3) paused</programlisting>
</para>
<para>
-The state of <application>repmgrd</application> on each node can be checked with
+The state of &repmgrd; on each node can be checked with
<link linkend="repmgr-daemon-status"><command>repmgr daemon status</command></link>, e.g.:
<programlisting>$ repmgr -f /etc/repmgr.conf daemon status
ID | Name | Role | Status | repmgrd | PID | Paused?
@@ -109,12 +109,12 @@ NOTICE: node 3 (node3) paused</programlisting>
<note>
<para>
If executing a switchover with <link linkend="repmgr-standby-switchover"><command>repmgr standby switchover</command></link>,
-&repmgr; will automatically pause/unpause <application>repmgrd</application> as part of the switchover process.
+&repmgr; will automatically pause/unpause &repmgrd; as part of the switchover process.
</para>
</note>

<para>
-If the primary (in this example, <literal>node1</literal>) is stopped, <application>repmgrd</application>
+If the primary (in this example, <literal>node1</literal>) is stopped, &repmgrd;
running on one of the standbys (here: <literal>node2</literal>) will react like this:
<programlisting>
[2018-09-20 12:22:21] [WARNING] unable to connect to upstream node "node1" (ID: 1)
@@ -130,14 +130,14 @@ NOTICE: node 3 (node3) paused</programlisting>
[2018-09-20 12:22:33] [HINT] execute "repmgr daemon unpause" to resume normal failover mode</programlisting>
</para>
<para>
-If the primary becomes available again (e.g. following a software upgrade), <application>repmgrd</application>
+If the primary becomes available again (e.g. following a software upgrade), &repmgrd;
will automatically reconnect, e.g.:
<programlisting>
[2018-09-20 13:12:41] [NOTICE] reconnected to upstream node 1 after 8 seconds, resuming monitoring</programlisting>
</para>

<para>
-To unpause <application>repmgrd</application>, execute <link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link>, e.g.:
+To unpause &repmgrd;, execute <link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link>, e.g.:
<programlisting>
$ repmgr -f /etc/repmgr.conf daemon unpause
NOTICE: node 1 (node1) unpaused
@@ -147,7 +147,7 @@ NOTICE: node 3 (node3) unpaused</programlisting>

<note>
<para>
-If the previous primary is no longer accessible when <application>repmgrd</application>
+If the previous primary is no longer accessible when &repmgrd;
is unpaused, no failover action will be taken. Instead, a new primary must be manually promoted using
<link linkend="repmgr-standby-promote"><command>repmgr standby promote</command></link>,
and any standbys attached to the new primary with
@@ -156,13 +156,13 @@ NOTICE: node 3 (node3) unpaused</programlisting>
<para>
This is to prevent <link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link>
resulting in the automatic promotion of a new primary, which may be a problem particularly
-in larger clusters, where <application>repmgrd</application> could select a different promotion
+in larger clusters, where &repmgrd; could select a different promotion
candidate to the one intended by the administrator.
</para>
</note>
</sect2>
<sect2 id="repmgrd-pausing-details">
-<title>Details on the <application>repmgrd</application> pausing mechanism</title>
+<title>Details on the &repmgrd; pausing mechanism</title>

<para>
The pause state of each node will persist across a PostgreSQL restart.
@@ -171,14 +171,14 @@ NOTICE: node 3 (node3) unpaused</programlisting>
<para>
<link linkend="repmgr-daemon-pause"><command>repmgr daemon pause</command></link> and
<link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link> can be
-executed even if <application>repmgrd</application> is not running; in this case,
-<application>repmgrd</application> will start up in whichever pause state has been set.
+executed even if &repmgrd; is not running; in this case,
+&repmgrd; will start up in whichever pause state has been set.
</para>
<note>
<para>
<link linkend="repmgr-daemon-pause"><command>repmgr daemon pause</command></link> and
<link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link>
-<emphasis>do not</emphasis> stop/start <application>repmgrd</application>.
+<emphasis>do not</emphasis> stop/start &repmgrd;.
</para>
</note>
</sect2>
@@ -194,7 +194,7 @@ NOTICE: node 3 (node3) unpaused</programlisting>
<para>
If WAL replay has been paused (using <command>pg_wal_replay_pause()</command> or,
on PostgreSQL 9.6 and earlier, <command>pg_xlog_replay_pause()</command>),
-in a failover situation <application>repmgrd</application> will
+in a failover situation &repmgrd; will
automatically resume WAL replay.
</para>
<para>
@@ -225,9 +225,9 @@ NOTICE: node 3 (node3) unpaused</programlisting>

<title>"degraded monitoring" mode</title>
<para>
-In certain circumstances, <application>repmgrd</application> is not able to fulfill its primary mission
+In certain circumstances, &repmgrd; is not able to fulfill its primary mission
of monitoring the node's upstream server. In these cases it enters "degraded monitoring"
-mode, where <application>repmgrd</application> remains active but is waiting for the situation
+mode, where &repmgrd; remains active but is waiting for the situation
to be resolved.
</para>
<para>
@@ -287,12 +287,12 @@ NOTICE: node 3 (node3) unpaused</programlisting>
<para>
By default, <literal>repmgrd</literal> will continue in degraded monitoring mode indefinitely.
However, a timeout (in seconds) can be set with <varname>degraded_monitoring_timeout</varname>,
-after which <application>repmgrd</application> will terminate.
+after which &repmgrd; will terminate.
</para>

<note>
<para>
-If <application>repmgrd</application> is monitoring a primary node which has been stopped
+If &repmgrd; is monitoring a primary node which has been stopped
and manually restarted as a standby attached to a new primary, it will automatically detect
the status change and update the node record to reflect the node's new status
as an active standby. It will then resume monitoring the node as a standby.
@@ -313,7 +313,7 @@ NOTICE: node 3 (node3) unpaused</programlisting>

<title>Storing monitoring data</title>
<para>
-When <application>repmgrd</application> is running with the option <literal>monitoring_history=true</literal>,
+When &repmgrd; is running with the option <literal>monitoring_history=true</literal>,
it will constantly write standby node status information to the
<varname>monitoring_history</varname> table, providing a near-real-time
overview of replication status on all nodes
@@ -351,7 +351,7 @@ NOTICE: node 3 (node3) unpaused</programlisting>
specify how many days' worth of data should be retained.
</para>
<para>
-It's possible to run <application>repmgrd</application> in monitoring
+It's possible to run &repmgrd; in monitoring
mode only (without automatic failover capability) for some or all
nodes by setting <literal>failover=manual</literal> in the node's
<filename>repmgr.conf</filename> file. In the event of the node's upstream failing,