mirror of
https://github.com/EnterpriseDB/repmgr.git
synced 2026-03-22 22:56:29 +00:00
doc: define entity for repmgrd
This commit is contained in:
@@ -19,7 +19,7 @@
|
||||
<para>
|
||||
&repmgr; 3.x builds on the improved replication facilities added
|
||||
in PostgreSQL 9.3, as well as improved automated failover support
|
||||
via <application>repmgrd</application>, and is not compatible with PostgreSQL 9.2
|
||||
via &repmgrd;, and is not compatible with PostgreSQL 9.2
|
||||
and earlier. We recommend upgrading to &repmgr; 4, as the &repmgr; 3.x
|
||||
series is no longer maintained.
|
||||
</para>
|
||||
@@ -135,7 +135,7 @@
|
||||
No.
|
||||
</para>
|
||||
<para>
|
||||
&repmgr; (together with <application>repmgrd</application>) assists with
|
||||
&repmgr; (together with &repmgrd;) assists with
|
||||
<emphasis>managing</emphasis> replication. It does not actually perform replication, which
|
||||
is part of the core PostgreSQL functionality.
|
||||
</para>
|
||||
@@ -152,8 +152,8 @@
|
||||
<title>Does it matter if different &repmgr; versions are present in the replication cluster?</title>
|
||||
<para>
|
||||
Yes. If different "major" &repmgr; versions (e.g. 3.3.x and 4.1.x) are present,
|
||||
&repmgr; (in particular <application>repmgrd</application>)
|
||||
may not run, or run properly, or in the worst case (if different <application>repmgrd</application>
|
||||
&repmgr; (in particular &repmgrd;)
|
||||
may not run, or run properly, or in the worst case (if different &repmgrd;
|
||||
versions are running and there are differences in the failover implementation) break
|
||||
your replication cluster.
|
||||
</para>
|
||||
@@ -282,19 +282,19 @@
|
||||
|
||||
<sect2 id="faq-repmgr-shared-preload-libaries-no-repmgrd" xreflabel="shared_preload_libraries without repmgrd">
|
||||
<title>Do I need to include <literal>shared_preload_libraries = 'repmgr'</literal>
|
||||
in <filename>postgresql.conf</filename> if I'm not using <application>repmgrd</application>?</title>
|
||||
in <filename>postgresql.conf</filename> if I'm not using &repmgrd;?</title>
|
||||
<para>
|
||||
No, the <literal>repmgr</literal> shared library is only needed when running <application>repmgrd</application>.
|
||||
If you later decide to run <application>repmgrd</application>, you just need to add
|
||||
No, the <literal>repmgr</literal> shared library is only needed when running &repmgrd;.
|
||||
If you later decide to run &repmgrd;, you just need to add
|
||||
<literal>shared_preload_libraries = 'repmgr'</literal> and restart PostgreSQL.
|
||||
</para>
|
||||
</sect2>
|
||||
|
||||
<sect2 id="faq-repmgr-permissions" xreflabel="Replication permission problems">
|
||||
<title>I've provided replication permission for the <literal>repmgr</literal> user in <filename>pg_hba.conf</filename>
|
||||
but <command>repmgr</command>/<application>repmgrd</application> complains it can't connect to the server... Why?</title>
|
||||
but <command>repmgr</command>/&repmgrd; complains it can't connect to the server... Why?</title>
|
||||
<para>
|
||||
<command>repmgr</command> and <application>repmgrd</application> need to be able to connect to the repmgr database
|
||||
<command>repmgr</command> and &repmgrd; need to be able to connect to the repmgr database
|
||||
with a normal connection to query metadata. The <literal>replication</literal> connection
|
||||
permission is for PostgreSQL's streaming replication (and doesn't necessarily need to be the <literal>repmgr</literal> user).
|
||||
</para>
|
||||
@@ -349,7 +349,7 @@
|
||||
</sect1>
|
||||
|
||||
<sect1 id="faq-repmgrd" xreflabel="repmgrd">
|
||||
<title><application>repmgrd</application></title>
|
||||
<title>&repmgrd;</title>
|
||||
|
||||
|
||||
<sect2 id="faq-repmgrd-prevent-promotion" xreflabel="Prevent standby from being promoted to primary">
|
||||
@@ -365,12 +365,12 @@
|
||||
</sect2>
|
||||
|
||||
<sect2 id="faq-repmgrd-delayed-standby" xreflabel="Delayed standby support">
|
||||
<title>Does <application>repmgrd</application> support delayed standbys?</title>
|
||||
<title>Does &repmgrd; support delayed standbys?</title>
|
||||
<para>
|
||||
<application>repmgrd</application> can monitor delayed standbys - those set up with
|
||||
&repmgrd; can monitor delayed standbys - those set up with
|
||||
<varname>recovery_min_apply_delay</varname> set to a non-zero value
|
||||
in <filename>recovery.conf</filename> - but as it's not currently possible
|
||||
to directly examine the value applied to the standby, <application>repmgrd</application>
|
||||
to directly examine the value applied to the standby, &repmgrd;
|
||||
may not be able to properly evaluate the node as a promotion candidate.
|
||||
</para>
|
||||
<para>
|
||||
@@ -379,13 +379,13 @@
|
||||
<filename>repmgr.conf</filename>.
|
||||
</para>
|
||||
<para>
|
||||
Note that after registering a delayed standby, <application>repmgrd</application> will only start
|
||||
Note that after registering a delayed standby, &repmgrd; will only start
|
||||
once the metadata added in the primary node has been replicated.
|
||||
</para>
|
||||
</sect2>
|
||||
|
||||
<sect2 id="faq-repmgrd-logfile-rotate" xreflabel="repmgrd logfile rotation">
|
||||
<title>How can I get <application>repmgrd</application> to rotate its logfile?</title>
|
||||
<title>How can I get &repmgrd; to rotate its logfile?</title>
|
||||
<para>
|
||||
Configure your system's <literal>logrotate</literal> service to do this; see <xref linkend="repmgrd-log-rotation">.
|
||||
</para>
|
||||
@@ -393,11 +393,11 @@
|
||||
</sect2>
|
||||
|
||||
<sect2 id="faq-repmgrd-recloned-no-start" xreflabel="repmgrd not restarting after node cloned">
|
||||
<title>I've recloned a failed primary as a standby, but <application>repmgrd</application> refuses to start?</title>
|
||||
<title>I've recloned a failed primary as a standby, but &repmgrd; refuses to start?</title>
|
||||
<para>
|
||||
Check you registered the standby after recloning. If unregistered, the standby
|
||||
cannot be considered as a promotion candidate even if <varname>failover</varname> is set to
|
||||
<literal>automatic</literal>, which is probably not what you want. <application>repmgrd</application> will start if
|
||||
<literal>automatic</literal>, which is probably not what you want. &repmgrd; will start if
|
||||
<varname>failover</varname> is set to <literal>manual</literal> so the node's replication status can still
|
||||
be monitored, if desired.
|
||||
</para>
|
||||
@@ -405,7 +405,7 @@
|
||||
|
||||
<sect2 id="faq-repmgrd-pg-bindir" xreflabel="repmgrd does not apply pg_bindir to promote_command or follow_command">
|
||||
<title>
|
||||
<application>repmgrd</application> ignores pg_bindir when executing <varname>promote_command</varname> or <varname>follow_command</varname>
|
||||
&repmgrd; ignores pg_bindir when executing <varname>promote_command</varname> or <varname>follow_command</varname>
|
||||
</title>
|
||||
<para>
|
||||
<varname>promote_command</varname> or <varname>follow_command</varname> can be user-defined scripts,
|
||||
@@ -416,13 +416,13 @@
|
||||
|
||||
<sect2 id="faq-repmgrd-startup-no-upstream" xreflabel="repmgrd does not start if upstream node is not running">
|
||||
<title>
|
||||
<application>repmgrd</application> aborts startup with the error "<literal>upstream node must be running before repmgrd can start</literal>"
|
||||
&repmgrd; aborts startup with the error "<literal>upstream node must be running before repmgrd can start</literal>"
|
||||
</title>
|
||||
<para>
|
||||
<application>repmgrd</application> does this to avoid starting up on a replication cluster
|
||||
which is not in a healthy state. If the upstream is unavailable, <application>repmgrd</application>
|
||||
&repmgrd; does this to avoid starting up on a replication cluster
|
||||
which is not in a healthy state. If the upstream is unavailable, &repmgrd;
|
||||
may initiate a failover immediately after starting up, which could have unintended side-effects,
|
||||
particularly if <application>repmgrd</application> is not running on other nodes.
|
||||
particularly if &repmgrd; is not running on other nodes.
|
||||
</para>
|
||||
<para>
|
||||
In particular, it's possible that the node's local copy of the <literal>repmgr.nodes</literal> copy
|
||||
@@ -430,7 +430,7 @@
|
||||
</para>
|
||||
<para>
|
||||
The onus is therefore on the adminstrator to manually set the cluster to a stable, healthy state before
|
||||
starting <application>repmgrd</application>.
|
||||
starting &repmgrd;.
|
||||
</para>
|
||||
</sect2>
|
||||
|
||||
|
||||
@@ -310,7 +310,7 @@
|
||||
</para>
|
||||
<para>
|
||||
See also <xref linkend="repmgrd-configuration-debian-ubuntu"> for some specifics related
|
||||
to configuring the <application>repmgrd</application> daemon.
|
||||
to configuring the &repmgrd; daemon.
|
||||
</para>
|
||||
|
||||
<table id="debian-9-packages">
|
||||
@@ -552,7 +552,7 @@ repmgr96-4.1.1-0.0git320.g5113ab0.1.el7.x86_64.rpm</programlisting>
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
PID file location: the default <application>repmgrd</application> PID file
|
||||
PID file location: the default &repmgrd; PID file
|
||||
location can be hard-coded by patching <varname>package_pid_file</varname>
|
||||
in <filename>repmgrd.c</filename>:
|
||||
<programlisting>
|
||||
|
||||
@@ -105,7 +105,7 @@
|
||||
documentation section <link linkend="upgrading-major-version">Upgrading a major version release</link>.
|
||||
</para>
|
||||
<para>
|
||||
If <application>repmgrd</application> is in use, a PostgreSQL restart <emphasis>is</emphasis> required;
|
||||
If &repmgrd; is in use, a PostgreSQL restart <emphasis>is</emphasis> required;
|
||||
in that case we suggest combining this &repmgr; upgrade with the next PostgreSQL
|
||||
minor release, which will require a PostgreSQL restart in any case.
|
||||
</para>
|
||||
@@ -113,7 +113,7 @@
|
||||
|
||||
<important>
|
||||
<para>
|
||||
On Debian-based systems, including Ubuntu, if using <application>repmgrd</application>
|
||||
On Debian-based systems, including Ubuntu, if using &repmgrd;
|
||||
please ensure that in the file <filename>/etc/init.d/repmgrd</filename>, the parameter
|
||||
<varname>REPMGRD_OPTS</varname> contains "<literal>--daemonize=false</literal>", e.g.:
|
||||
<programlisting>
|
||||
@@ -156,7 +156,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
<para>
|
||||
New commands <link linkend="repmgr-daemon-start"><command>repmgr daemon start</command></link> and
|
||||
<link linkend="repmgr-daemon-stop"><command>repmgr daemon stop</command></link>:
|
||||
these provide a standardized way of starting and stopping <application>repmgrd</application>.
|
||||
these provide a standardized way of starting and stopping &repmgrd;.
|
||||
GitHub #528.
|
||||
</para>
|
||||
<note>
|
||||
@@ -172,7 +172,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
<para>
|
||||
<link linkend="repmgr-daemon-status"><command>repmgr daemon status</command></link>
|
||||
additionally displays the node priority and the interval (in seconds) since the
|
||||
<application>repmgrd</application> instance last verified its upstream node was available.
|
||||
&repmgrd; instance last verified its upstream node was available.
|
||||
</para>
|
||||
</listitem>
|
||||
|
||||
@@ -240,20 +240,20 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
<application>repmgrd</application> will no longer consider nodes where <application>repmgrd</application>
|
||||
&repmgrd; will no longer consider nodes where &repmgrd;
|
||||
is not running as promotion candidates.
|
||||
</para>
|
||||
<para>
|
||||
Previously, if <application>repmgrd</application> was not running on a node, but
|
||||
Previously, if &repmgrd; was not running on a node, but
|
||||
that node qualified as the promotion candidate, it would never be promoted due to
|
||||
the absence of a running <application>repmgrd</application>.
|
||||
the absence of a running &repmgrd;.
|
||||
</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
Add option <option>connection_check_type</option> to enable selection of the method
|
||||
<application>repmgrd</application> uses to determine whether the upstream node is available.
|
||||
&repmgrd; uses to determine whether the upstream node is available.
|
||||
</para>
|
||||
<para>
|
||||
Possible values are <literal>ping</literal> (default; uses <command>PQping()</command> to
|
||||
@@ -266,7 +266,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
<listitem>
|
||||
<para>
|
||||
New configuration option <link linkend="repmgrd-failover-validation"><option>failover_validation_command</option></link>
|
||||
to allow an external mechanism to validate the failover decision made by <application>repmgrd</application>.
|
||||
to allow an external mechanism to validate the failover decision made by &repmgrd;.
|
||||
</para>
|
||||
</listitem>
|
||||
|
||||
@@ -279,7 +279,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
In a failover situation, <application>repmgrd</application> will not attempt to promote a
|
||||
In a failover situation, &repmgrd; will not attempt to promote a
|
||||
node if another primary has already appeared (e.g. by being promoted manually).
|
||||
GitHub #420.
|
||||
</para>
|
||||
@@ -364,7 +364,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
<application>repmgrd</application>: on a cascaded standby, don't fail over if
|
||||
&repmgrd;: on a cascaded standby, don't fail over if
|
||||
<literal>failover=manual</literal>. GitHub #531.
|
||||
</para>
|
||||
</listitem>
|
||||
@@ -393,7 +393,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
|
||||
<important>
|
||||
<para>
|
||||
On Debian-based systems, including Ubuntu, if using <application>repmgrd</application>
|
||||
On Debian-based systems, including Ubuntu, if using &repmgrd;
|
||||
please ensure that the in the file <filename>/etc/init.d/repmgrd</filename>, the parameter
|
||||
<varname>REPMGRD_OPTS</varname> contains "<literal>--daemonize=false</literal>", e.g.:
|
||||
<programlisting>
|
||||
@@ -488,12 +488,12 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
<application>repmgrd</application> can now be "paused", i.e. instructed
|
||||
&repmgrd; can now be "paused", i.e. instructed
|
||||
not to take any action such as a failover, even if the prerequisites for such an
|
||||
action are detected.
|
||||
</para>
|
||||
<para>
|
||||
This removes the need to stop <application>repmgrd</application> on all nodes when
|
||||
This removes the need to stop &repmgrd; on all nodes when
|
||||
performing a planned operation such as a switchover.
|
||||
</para>
|
||||
<para>
|
||||
@@ -519,7 +519,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
<application>repmgrd</application>: fix parsing of <option>-d/--daemonize</option> option.
|
||||
&repmgrd;: fix parsing of <option>-d/--daemonize</option> option.
|
||||
</para>
|
||||
</listitem>
|
||||
|
||||
@@ -537,7 +537,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
<para>
|
||||
We recommend upgrading to this version as soon as possible.
|
||||
This release can be installed as a simple package upgrade from repmgr 4.0 ~ 4.1.0;
|
||||
<application>repmgrd</application> (if running) should be restarted.
|
||||
&repmgrd; (if running) should be restarted.
|
||||
See <xref linkend="upgrading-repmgr"> for more details.
|
||||
</para>
|
||||
|
||||
@@ -619,8 +619,8 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
<para>
|
||||
Check <varname>promote_command</varname> and <varname>follow_command</varname>
|
||||
are defined when reloading configuration. These were checked on startup but
|
||||
not reload by <application>repmgrd</application>, which made it possible to
|
||||
make <application>repmgrd</application> with invalid values. It's unlikely
|
||||
not reload by &repmgrd;, which made it possible to
|
||||
make &repmgrd; with invalid values. It's unlikely
|
||||
anyone would want to do this, but we should make it impossible anyway.
|
||||
(GitHub #486).
|
||||
</para>
|
||||
@@ -661,7 +661,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
<application>repmgrd</application>: fix startup on witness node when local data is stale. (GitHub #488, #489).
|
||||
&repmgrd;: fix startup on witness node when local data is stale. (GitHub #488, #489).
|
||||
</para>
|
||||
</listitem>
|
||||
|
||||
@@ -687,7 +687,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
<title>Release 4.1.0</title>
|
||||
<para><emphasis>Tue July 31, 2018</emphasis></para>
|
||||
<para>
|
||||
&repmgr; 4.1.0 introduces some changes to <application>repmgrd</application>
|
||||
&repmgr; 4.1.0 introduces some changes to &repmgrd;
|
||||
behaviour and some additional configuration parameters.
|
||||
</para>
|
||||
<para>
|
||||
@@ -703,7 +703,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<para>
|
||||
<application>repmgrd</application> must be restarted on all nodes where it is running.
|
||||
&repmgrd; must be restarted on all nodes where it is running.
|
||||
</para>
|
||||
</listitem>
|
||||
|
||||
@@ -825,14 +825,14 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
<application>repmgrd</application>: create a PID file by default
|
||||
&repmgrd;: create a PID file by default
|
||||
(GitHub #457). For details, see <xref linkend="repmgrd-pid-file">.
|
||||
</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
<application>repmgrd</application>: daemonize process by default.
|
||||
&repmgrd;: daemonize process by default.
|
||||
In case, for whatever reason, the user does not wish to daemonize the
|
||||
process, provide <option>--daemonize=false</option>.
|
||||
(GitHub #458).
|
||||
@@ -901,7 +901,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
<para>
|
||||
We recommend upgrading to this version as soon as possible.
|
||||
This release can be installed as a simple package upgrade from repmgr 4.0 ~ 4.0.5;
|
||||
<application>repmgrd</application> (if running) should be restarted. See <xref linkend="upgrading-repmgr">
|
||||
&repmgrd; (if running) should be restarted. See <xref linkend="upgrading-repmgr">
|
||||
for more details.
|
||||
</para>
|
||||
|
||||
@@ -988,7 +988,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
<application>repmgrd</application>: ensure local node is counted as quorum member
|
||||
&repmgrd;: ensure local node is counted as quorum member
|
||||
(GitHub #439)
|
||||
</para>
|
||||
</listitem>
|
||||
@@ -1005,7 +1005,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
<para>
|
||||
&repmgr; 4.0.5 contains a number of usability enhancements related to
|
||||
<application>pg_rewind</application> usage, <filename>recovery.conf</filename>
|
||||
generation and (in <application>repmgrd</application>) handling of various
|
||||
generation and (in &repmgrd;) handling of various
|
||||
corner-case situations, as well as a number of bug fixes.
|
||||
</para>
|
||||
|
||||
@@ -1070,7 +1070,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
<application>repmgrd</application>: set <literal>connect_timeout=2</literal> (if not explicitly set)
|
||||
&repmgrd;: set <literal>connect_timeout=2</literal> (if not explicitly set)
|
||||
when pinging a server.
|
||||
</para>
|
||||
</listitem>
|
||||
@@ -1126,20 +1126,20 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
<application>repmgrd</application>: handle <command>pg_ctl promote</command> timeout (GitHub #425).
|
||||
&repmgrd;: handle <command>pg_ctl promote</command> timeout (GitHub #425).
|
||||
</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
<application>repmgrd</application>: handle failover situation with only two nodes in the primary
|
||||
&repmgrd;: handle failover situation with only two nodes in the primary
|
||||
location, and at least one node in another location (GitHub #407).
|
||||
</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
<application>repmgrd</application>: prevent standby connection handle from going stale.
|
||||
&repmgrd;: prevent standby connection handle from going stale.
|
||||
</para>
|
||||
</listitem>
|
||||
|
||||
@@ -1163,7 +1163,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
</para>
|
||||
<para>
|
||||
This release can be installed as a simple package upgrade from repmgr 4.0 ~ 4.0.3;
|
||||
<application>repmgrd</application> (if running) should be restarted. See <xref linkend="upgrading-repmgr">
|
||||
&repmgrd; (if running) should be restarted. See <xref linkend="upgrading-repmgr">
|
||||
for more details.
|
||||
</para>
|
||||
|
||||
@@ -1242,14 +1242,14 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
<application>repmgrd</application>: improve detection of status change from primary to
|
||||
&repmgrd;: improve detection of status change from primary to
|
||||
standby
|
||||
</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
<application>repmgrd</application>: improve reconnection to the local node after a
|
||||
&repmgrd;: improve reconnection to the local node after a
|
||||
failover (previously a connection error due to the node starting up was being
|
||||
interpreted as the node being unavailable)
|
||||
</para>
|
||||
@@ -1257,14 +1257,14 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
<application>repmgrd</application>: when running on a witness server, correctly connect
|
||||
&repmgrd;: when running on a witness server, correctly connect
|
||||
to new primary after a failover
|
||||
</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
<application>repmgrd</application>: add <link linkend="event-notifications">event notification</link>
|
||||
&repmgrd;: add <link linkend="event-notifications">event notification</link>
|
||||
<literal>repmgrd_shutdown</literal> (GitHub #393)
|
||||
</para>
|
||||
</listitem>
|
||||
@@ -1433,7 +1433,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
</para>
|
||||
<para>
|
||||
This release can be installed as a simple package upgrade from &repmgr; 4.0.1 or 4.0;
|
||||
<application>repmgrd</application> (if running) should be restarted.
|
||||
&repmgrd; (if running) should be restarted.
|
||||
</para>
|
||||
|
||||
<sect2>
|
||||
@@ -1653,10 +1653,10 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
<listitem>
|
||||
<para>
|
||||
<emphasis>improved logging output</emphasis>:
|
||||
&repmgr; (and <application>repmgrd</application>) now provide more explicit
|
||||
&repmgr; (and &repmgrd;) now provide more explicit
|
||||
logging output giving a better picture of what is going on. Where appropriate,
|
||||
<literal>DETAIL</literal> and <literal>HINT</literal> log lines provide additional
|
||||
detail and suggestions for resolving problems. Additionally, <application>repmgrd</application>
|
||||
detail and suggestions for resolving problems. Additionally, &repmgrd;
|
||||
now emits informational log lines at regular, configurable intervals
|
||||
to confirm that it's running correctly and which node(s) it's monitoring.
|
||||
</para>
|
||||
@@ -1703,11 +1703,11 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
<emphasis>automatic failover</emphasis>:
|
||||
improved detection of node status; promotion decision based on a consensual
|
||||
model, with the promoted primary explicitly informing other standbys to
|
||||
follow it. The <application>repmgrd</application> daemon will continue
|
||||
follow it. The &repmgrd; daemon will continue
|
||||
functioning even if the monitored PostgreSQL instance is down, and resume
|
||||
monitoring if it reappears. Additionally, if the instance's role has changed
|
||||
(typically from a primary to a standby, e.g. following reintegration of a
|
||||
failed primary using <xref linkend="repmgr-node-rejoin">) <application>repmgrd</application>
|
||||
failed primary using <xref linkend="repmgr-node-rejoin">) &repmgrd;
|
||||
will automatically resume monitoring it as a standby.
|
||||
</para>
|
||||
</listitem>
|
||||
@@ -1793,7 +1793,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
</para>
|
||||
|
||||
<para>
|
||||
<application>repmgrd</application>
|
||||
&repmgrd;
|
||||
<itemizedlist>
|
||||
|
||||
<listitem><para>
|
||||
@@ -1915,7 +1915,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
|
||||
<listitem>
|
||||
<simpara>
|
||||
new parameter <varname>log_status_interval</varname>, which causes
|
||||
<application>repmgrd</application> to emit a status log
|
||||
&repmgrd; to emit a status log
|
||||
line at the specified interval
|
||||
</simpara>
|
||||
</listitem>
|
||||
|
||||
@@ -80,7 +80,7 @@
|
||||
the maximum level of logging output.
|
||||
</para>
|
||||
<para>
|
||||
If issues are encountered with <application>repmgrd</application>,
|
||||
If issues are encountered with &repmgrd;,
|
||||
please provide relevant extracts from the &repmgr; log files
|
||||
and if possible the PostgreSQL log itself. Please ensure these
|
||||
logs do not contain any confidential data.
|
||||
|
||||
@@ -10,7 +10,7 @@
|
||||
<title>Log settings</title>
|
||||
|
||||
<para>
|
||||
By default, &repmgr; and <application>repmgrd</application> write log output to
|
||||
By default, &repmgr; and &repmgrd; write log output to
|
||||
<literal>STDERR</literal>. An alternative log destination can be specified
|
||||
(either a file or <literal>syslog</literal>).
|
||||
</para>
|
||||
@@ -24,7 +24,7 @@
|
||||
<para>
|
||||
This behaviour can be overriden with the command line option <option>--log-to-file</option>,
|
||||
which will redirect all logging output to the configured log destination. This is recommended
|
||||
when &repmgr; is executed by another application, particularly <application>repmgrd</application>,
|
||||
when &repmgr; is executed by another application, particularly &repmgrd;,
|
||||
to enable log output generated by the &repmgr; application to be stored for later reference.
|
||||
</para>
|
||||
</note>
|
||||
@@ -93,9 +93,9 @@
|
||||
</term>
|
||||
<listitem>
|
||||
<para>
|
||||
This setting causes <application>repmgrd</application> to emit a status log
|
||||
This setting causes &repmgrd; to emit a status log
|
||||
line at the specified interval (in seconds, default <literal>300</literal>)
|
||||
describing <application>repmgrd</application>'s current state, e.g.:
|
||||
describing &repmgrd;'s current state, e.g.:
|
||||
</para>
|
||||
<programlisting>
|
||||
[2018-07-12 00:47:32] [INFO] monitoring connection to upstream node "node1" (ID: 1)</programlisting>
|
||||
|
||||
@@ -99,7 +99,7 @@
|
||||
<ulink url="https://raw.githubusercontent.com/2ndQuadrant/repmgr/master/repmgr.conf.sample">repmgr.conf.sample</ulink>.
|
||||
</para>
|
||||
<para>
|
||||
For <application>repmgrd</application>-specific settings, see <xref linkend="repmgrd-configuration">.
|
||||
For &repmgrd;-specific settings, see <xref linkend="repmgrd-configuration">.
|
||||
</para>
|
||||
|
||||
<note>
|
||||
|
||||
@@ -10,7 +10,7 @@
|
||||
<title>Service command settings</title>
|
||||
|
||||
<para>
|
||||
In some circumstances, &repmgr; (and <application>repmgrd</application>) need to
|
||||
In some circumstances, &repmgr; (and &repmgrd;) need to
|
||||
be able to stop, start or restart PostgreSQL. &repmgr; commands which need to do this
|
||||
include <link linkend="repmgr-standby-follow"><command>repmgr standby follow</command></link>,
|
||||
<link linkend="repmgr-standby-switchover"><command>repmgr standby switchover</command></link> and
|
||||
@@ -68,7 +68,7 @@
|
||||
</para>
|
||||
<para>
|
||||
Do not confuse this with <varname>promote_command</varname>, which is used
|
||||
by <application>repmgrd</application> to execute <xref linkend="repmgr-standby-promote">.
|
||||
by &repmgrd; to execute <xref linkend="repmgr-standby-promote">.
|
||||
</para>
|
||||
</note>
|
||||
|
||||
|
||||
@@ -11,7 +11,7 @@
|
||||
<title>Configuration file</title>
|
||||
|
||||
<para>
|
||||
<application>repmgr</application> and <application>repmgrd</application>
|
||||
<application>repmgr</application> and &repmgrd;
|
||||
use a common configuration file, by default called
|
||||
<filename>repmgr.conf</filename> (although any name can be used if explicitly specified).
|
||||
<filename>repmgr.conf</filename> must contain a number of required parameters, including
|
||||
|
||||
@@ -6,7 +6,7 @@
|
||||
|
||||
<title>Event Notifications</title>
|
||||
<para>
|
||||
Each time &repmgr; or <application>repmgrd</application> perform a significant event, a record
|
||||
Each time &repmgr; or &repmgrd; perform a significant event, a record
|
||||
of that event is written into the <literal>repmgr.events</literal> table together with
|
||||
a timestamp, an indication of failure or success, and further details
|
||||
if appropriate. This is useful for gaining an overview of events
|
||||
@@ -207,7 +207,7 @@
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Events generated by <application>repmgrd</application> (streaming replication mode):
|
||||
Events generated by &repmgrd; (streaming replication mode):
|
||||
|
||||
<itemizedlist spacing="compact" mark="bullet">
|
||||
<listitem>
|
||||
@@ -274,7 +274,7 @@
|
||||
</para>
|
||||
|
||||
<para>
|
||||
Events generated by <application>repmgrd</application> (BDR mode):
|
||||
Events generated by &repmgrd; (BDR mode):
|
||||
<itemizedlist spacing="compact" mark="bullet">
|
||||
<listitem>
|
||||
<simpara><literal>bdr_failover</literal></simpara>
|
||||
|
||||
@@ -45,14 +45,14 @@
|
||||
</simpara>
|
||||
<simpara>
|
||||
If different "major" &repmgr; versions (e.g. 3.3.x and 4.1.x)
|
||||
are installed on different nodes, in the best case &repmgr; (in particular <application>repmgrd</application>)
|
||||
are installed on different nodes, in the best case &repmgr; (in particular &repmgrd;)
|
||||
will not run. In the worst case, you will end up with a broken cluster.
|
||||
</simpara>
|
||||
</note>
|
||||
|
||||
<para>
|
||||
A dedicated system user for &repmgr; is <emphasis>not</emphasis> required; as many &repmgr; and
|
||||
<application>repmgrd</application> actions require direct access to the PostgreSQL data directory,
|
||||
&repmgrd; actions require direct access to the PostgreSQL data directory,
|
||||
these commands should be executed by the <literal>postgres</literal> user.
|
||||
</para>
|
||||
|
||||
|
||||
@@ -58,7 +58,7 @@
|
||||
<listitem>
|
||||
<simpara>
|
||||
This is the action which occurs if a primary server fails and a suitable standby
|
||||
is promoted as the new primary. The <application>repmgrd</application> daemon supports automatic failover
|
||||
is promoted as the new primary. The &repmgrd; daemon supports automatic failover
|
||||
to minimise downtime.
|
||||
</simpara>
|
||||
</listitem>
|
||||
@@ -107,7 +107,7 @@
|
||||
promotes a (local) standby.
|
||||
</para>
|
||||
<para>
|
||||
A witness server only needs to be created if <application>repmgrd</application>
|
||||
A witness server only needs to be created if &repmgrd;
|
||||
is in use.
|
||||
</para>
|
||||
</listitem>
|
||||
@@ -198,7 +198,7 @@
|
||||
</listitem>
|
||||
<listitem>
|
||||
<simpara><literal>repmgr.monitoring_history</literal>: historical standby monitoring information
|
||||
written by <application>repmgrd</application></simpara>
|
||||
written by &repmgrd;</simpara>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
</para>
|
||||
@@ -214,7 +214,7 @@
|
||||
name of the server's upstream node</simpara>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<simpara>repmgr.replication_status: when <application>repmgrd</application>'s monitoring is enabled, shows
|
||||
<simpara>repmgr.replication_status: when &repmgrd;'s monitoring is enabled, shows
|
||||
current monitoring status for each standby.</simpara>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
|
||||
@@ -352,7 +352,7 @@
|
||||
slot_name |
|
||||
config_file | /etc/repmgr.conf</programlisting>
|
||||
<para>
|
||||
Each server in the replication cluster will have its own record. If <application>repmgrd</application>
|
||||
Each server in the replication cluster will have its own record. If &repmgrd;
|
||||
is in use, the fields <literal>upstream_node_id</literal>, <literal>active</literal> and
|
||||
<literal>type</literal> will be updated when the node's status or role changes.
|
||||
</para>
|
||||
|
||||
@@ -38,7 +38,7 @@
|
||||
<title>Notes</title>
|
||||
|
||||
<para>
|
||||
Monitoring history will only be written if <application>repmgrd</application> is active, and
|
||||
Monitoring history will only be written if &repmgrd; is active, and
|
||||
<varname>monitoring_history</varname> is set to <literal>true</literal> in
|
||||
<filename>repmgr.conf</filename>.
|
||||
</para>
|
||||
|
||||
@@ -14,30 +14,30 @@
|
||||
|
||||
<refnamediv>
|
||||
<refname>repmgr daemon pause</refname>
|
||||
<refpurpose>Instruct all <application>repmgrd</application> instances in the replication cluster to pause failover operations</refpurpose>
|
||||
<refpurpose>Instruct all &repmgrd; instances in the replication cluster to pause failover operations</refpurpose>
|
||||
</refnamediv>
|
||||
|
||||
<refsect1>
|
||||
<title>Description</title>
|
||||
<para>
|
||||
This command can be run on any active node in the replication cluster to instruct all
|
||||
running <application>repmgrd</application> instances to "pause" themselves, i.e. take no
|
||||
running &repmgrd; instances to "pause" themselves, i.e. take no
|
||||
action (such as promoting themselves or following a new primary) if a failover event is detected.
|
||||
</para>
|
||||
<para>
|
||||
This functionality is useful for performing maintenance operations, such as switchovers
|
||||
or upgrades, which might otherwise trigger a failover if <application>repmgrd</application>
|
||||
or upgrades, which might otherwise trigger a failover if &repmgrd;
|
||||
is running normally.
|
||||
</para>
|
||||
<note>
|
||||
<para>
|
||||
It's important to wait a few seconds after restarting PostgreSQL on any node before running
|
||||
<command>repmgr daemon pause</command>, as the <application>repmgrd</application> instance
|
||||
<command>repmgr daemon pause</command>, as the &repmgrd; instance
|
||||
on the restarted node will take a second or two before it has updated its status.
|
||||
</para>
|
||||
</note>
|
||||
<para>
|
||||
<xref linkend="repmgr-daemon-unpause"> will instruct all previously paused <application>repmgrd</application>
|
||||
<xref linkend="repmgr-daemon-unpause"> will instruct all previously paused &repmgrd;
|
||||
instances to resume normal failover operation.
|
||||
</para>
|
||||
</refsect1>
|
||||
@@ -69,7 +69,7 @@ NOTICE: node 3 (node3) paused</programlisting>
|
||||
<term><option>--dry-run</option></term>
|
||||
<listitem>
|
||||
<para>
|
||||
Check if nodes are reachable but don't pause <application>repmgrd</application>.
|
||||
Check if nodes are reachable but don't pause &repmgrd;.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
@@ -87,7 +87,7 @@ NOTICE: node 3 (node3) paused</programlisting>
|
||||
<term><option>SUCCESS (0)</option></term>
|
||||
<listitem>
|
||||
<para>
|
||||
<application>repmgrd</application> could be paused on all nodes.
|
||||
&repmgrd; could be paused on all nodes.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
@@ -96,7 +96,7 @@ NOTICE: node 3 (node3) paused</programlisting>
|
||||
<term><option>ERR_REPMGRD_PAUSE (26)</option></term>
|
||||
<listitem>
|
||||
<para>
|
||||
<application>repmgrd</application> could not be paused on one or mode nodes.
|
||||
&repmgrd; could not be paused on one or mode nodes.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
@@ -14,17 +14,17 @@
|
||||
|
||||
<refnamediv>
|
||||
<refname>repmgr daemon start</refname>
|
||||
<refpurpose>Start the <application>repmgrd</application> daemon</refpurpose>
|
||||
<refpurpose>Start the &repmgrd; daemon</refpurpose>
|
||||
</refnamediv>
|
||||
|
||||
<refsect1>
|
||||
<title>Description</title>
|
||||
<para>
|
||||
This command starts the <application>repmgrd</application> daemon on the
|
||||
This command starts the &repmgrd; daemon on the
|
||||
local node.
|
||||
</para>
|
||||
<para>
|
||||
By default, &repmgr; will wait for up to 15 seconds to confirm that <application>repmgrd</application>
|
||||
By default, &repmgr; will wait for up to 15 seconds to confirm that &repmgrd;
|
||||
started. This behaviour can be overridden by specifying a diffent value using the <option>--wait</option>
|
||||
option, or disabled altogether with the <option>--no-wait</option> option.
|
||||
</para>
|
||||
@@ -50,7 +50,7 @@
|
||||
<term><option>--dry-run</option></term>
|
||||
<listitem>
|
||||
<para>
|
||||
Check prerequisites but don't actually attempt to start <application>repmgrd</application>.
|
||||
Check prerequisites but don't actually attempt to start &repmgrd;.
|
||||
</para>
|
||||
<para>
|
||||
This action will output the command which would be executed.
|
||||
@@ -63,7 +63,7 @@
|
||||
<term><option>--wait</option></term>
|
||||
<listitem>
|
||||
<para>
|
||||
Wait for the specified number of seconds to confirm that <application>repmgrd</application>
|
||||
Wait for the specified number of seconds to confirm that &repmgrd;
|
||||
started successfully.
|
||||
</para>
|
||||
<para>
|
||||
@@ -77,7 +77,7 @@
|
||||
<term><option>--no-wait</option></term>
|
||||
<listitem>
|
||||
<para>
|
||||
Don't wait to confirm that <application>repmgrd</application>
|
||||
Don't wait to confirm that &repmgrd;
|
||||
started successfully.
|
||||
</para>
|
||||
<para>
|
||||
@@ -109,7 +109,7 @@
|
||||
<para>
|
||||
<command>repmgr daemon start</command> will execute the command defined by the
|
||||
<varname>repmgrd_service_start_command</varname> parameter in <filename>repmgr.conf</filename>.
|
||||
This must be set to a shell command which will start <application>repmgrd</application>;
|
||||
This must be set to a shell command which will start &repmgrd;;
|
||||
if &repmgr; was installed from a package, this will be the service command defined by the
|
||||
package. For more details see <link linkend="appendix-packages">Appendix: &repmgr; package details</link>.
|
||||
</para>
|
||||
@@ -117,7 +117,7 @@
|
||||
<para>
|
||||
If &repmgr; was installed from a system package, and you do not configure
|
||||
<varname>repmgrd_service_start_command</varname> to an appropriate service command, this may
|
||||
result in the system becoming confused about the state of the <application>repmgrd</application>
|
||||
result in the system becoming confused about the state of the &repmgrd;
|
||||
service; this is particularly the case with <literal>systemd</literal>.
|
||||
</para>
|
||||
</important>
|
||||
@@ -139,12 +139,12 @@
|
||||
<term><option>SUCCESS (0)</option></term>
|
||||
<listitem>
|
||||
<para>
|
||||
The <application>repmgrd</application> start command (defined in
|
||||
The &repmgrd; start command (defined in
|
||||
<varname>repmgrd_service_start_command</varname>) was successfully executed.
|
||||
</para>
|
||||
<para>
|
||||
If the <option>--wait</option> option was provided, &repmgr; will confirm that
|
||||
<application>repmgrd</application> has actually started up.
|
||||
&repmgrd; has actually started up.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
@@ -167,10 +167,10 @@
|
||||
&repmgr; was unable to connect to the local PostgreSQL node.
|
||||
</para>
|
||||
<para>
|
||||
PostgreSQL must be running before <application>repmgrd</application>
|
||||
PostgreSQL must be running before &repmgrd;
|
||||
can be started. Additionally, unless the <option>--no-wait</option> option was
|
||||
provided, &repmgr; needs to be able to connect to the local PostgreSQL node
|
||||
to determine the state of <application>repmgrd</application>.
|
||||
to determine the state of &repmgrd;.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
@@ -180,11 +180,11 @@
|
||||
<term><option>ERR_REPMGRD_SERVICE (27)</option></term>
|
||||
<listitem>
|
||||
<para>
|
||||
The <application>repmgrd</application> start command (defined in
|
||||
The &repmgrd; start command (defined in
|
||||
<varname>repmgrd_service_start_command</varname>) was not successfully executed.
|
||||
</para>
|
||||
<para>
|
||||
This can also mean that &repmgr; was unable to confirm whether <application>repmgrd</application>
|
||||
This can also mean that &repmgr; was unable to confirm whether &repmgrd;
|
||||
successfully started (unless the <option>--no-wait</option> option was provided).
|
||||
</para>
|
||||
</listitem>
|
||||
|
||||
@@ -14,14 +14,14 @@
|
||||
|
||||
<refnamediv>
|
||||
<refname>repmgr daemon status</refname>
|
||||
<refpurpose>display information about the status of <application>repmgrd</application> on each node in the cluster</refpurpose>
|
||||
<refpurpose>display information about the status of &repmgrd; on each node in the cluster</refpurpose>
|
||||
</refnamediv>
|
||||
|
||||
<refsect1>
|
||||
<title>Description</title>
|
||||
<para>
|
||||
This command provides an overview over all active nodes in the cluster and the state
|
||||
of each node's <application>repmgrd</application> instance. It can be used to check
|
||||
of each node's &repmgrd; instance. It can be used to check
|
||||
the result of <xref linkend="repmgr-daemon-pause"> and <xref linkend="repmgr-daemon-unpause">
|
||||
operations.
|
||||
</para>
|
||||
@@ -35,13 +35,13 @@
|
||||
</para>
|
||||
<para>
|
||||
If PostgreSQL is not running on a node, &repmgr; will not be able to determine the
|
||||
status of that node's <application>repmgrd</application> instance.
|
||||
status of that node's &repmgrd; instance.
|
||||
</para>
|
||||
<note>
|
||||
<para>
|
||||
After restarting PostgreSQL on any node, the <application>repmgrd</application> instance
|
||||
After restarting PostgreSQL on any node, the &repmgrd; instance
|
||||
will take a second or two before it is able to update its status. Until then,
|
||||
<application>repmgrd</application> will be shown as not running.
|
||||
&repmgrd; will be shown as not running.
|
||||
</para>
|
||||
</note>
|
||||
|
||||
@@ -50,7 +50,7 @@
|
||||
<refsect1>
|
||||
<title>Examples</title>
|
||||
<para>
|
||||
<application>repmgrd</application> running normally on all nodes:
|
||||
&repmgrd; running normally on all nodes:
|
||||
<programlisting>$ repmgr -f /etc/repmgr.conf daemon status
|
||||
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
|
||||
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
|
||||
@@ -60,7 +60,7 @@
|
||||
</para>
|
||||
|
||||
<para>
|
||||
<application>repmgrd</application> paused on all nodes (using <xref linkend="repmgr-daemon-pause">):
|
||||
&repmgrd; paused on all nodes (using <xref linkend="repmgr-daemon-pause">):
|
||||
<programlisting>$ repmgr -f /etc/repmgr.conf daemon status
|
||||
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
|
||||
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
|
||||
@@ -70,7 +70,7 @@
|
||||
</para>
|
||||
|
||||
<para>
|
||||
<application>repmgrd</application> not running on one node:
|
||||
&repmgrd; not running on one node:
|
||||
<programlisting>$ repmgr -f /etc/repmgr.conf daemon status
|
||||
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
|
||||
----+-------+---------+-----------+----------+-------------+-------+---------+--------------------
|
||||
@@ -127,25 +127,25 @@
|
||||
|
||||
<listitem>
|
||||
<simpara>
|
||||
<application>repmgrd</application> running (1 = running, 0 = not running, -1 = unknown)
|
||||
&repmgrd; running (1 = running, 0 = not running, -1 = unknown)
|
||||
</simpara>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<simpara>
|
||||
<application>repmgrd</application> PID (-1 if not running or status unknown)
|
||||
&repmgrd; PID (-1 if not running or status unknown)
|
||||
</simpara>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<simpara>
|
||||
<application>repmgrd</application> paused (1 = paused, 0 = not paused, -1 = unknown)
|
||||
&repmgrd; paused (1 = paused, 0 = not paused, -1 = unknown)
|
||||
</simpara>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<simpara>
|
||||
<application>repmgrd</application> node priority
|
||||
&repmgrd; node priority
|
||||
</simpara>
|
||||
</listitem>
|
||||
|
||||
|
||||
@@ -14,25 +14,25 @@
|
||||
|
||||
<refnamediv>
|
||||
<refname>repmgr daemon stop</refname>
|
||||
<refpurpose>Stop the <application>repmgrd</application> daemon</refpurpose>
|
||||
<refpurpose>Stop the &repmgrd; daemon</refpurpose>
|
||||
</refnamediv>
|
||||
|
||||
<refsect1>
|
||||
<title>Description</title>
|
||||
<para>
|
||||
This command stops the <application>repmgrd</application> daemon on the
|
||||
This command stops the &repmgrd; daemon on the
|
||||
local node.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
By default, &repmgr; will wait for up to 15 seconds to confirm that <application>repmgrd</application>
|
||||
By default, &repmgr; will wait for up to 15 seconds to confirm that &repmgrd;
|
||||
stopped. This behaviour can be overridden by specifying a diffent value using the <option>--wait</option>
|
||||
option, or disabled altogether with the <option>--no-wait</option> option.
|
||||
</para>
|
||||
<note>
|
||||
<para>
|
||||
If PostgreSQL is not running on the local node, under some circumstances &repmgr; may not
|
||||
be able to confirm if <application>repmgrd</application> has actually stopped.
|
||||
be able to confirm if &repmgrd; has actually stopped.
|
||||
</para>
|
||||
</note>
|
||||
|
||||
@@ -50,7 +50,7 @@
|
||||
<para>
|
||||
<command>repmgr daemon stop</command> will execute the command defined by the
|
||||
<varname>repmgrd_service_stop_command</varname> parameter in <filename>repmgr.conf</filename>.
|
||||
This must be set to a shell command which will stop <application>repmgrd</application>;
|
||||
This must be set to a shell command which will stop &repmgrd;;
|
||||
if &repmgr; was installed from a package, this will be the service command defined by the
|
||||
package. For more details see <link linkend="appendix-packages">Appendix: &repmgr; package details</link>.
|
||||
</para>
|
||||
@@ -59,7 +59,7 @@
|
||||
<para>
|
||||
If &repmgr; was installed from a system package, and you do not configure
|
||||
<varname>repmgrd_service_stop_command</varname> to an appropriate service command, this may
|
||||
result in the system becoming confused about the state of the <application>repmgrd</application>
|
||||
result in the system becoming confused about the state of the &repmgrd;
|
||||
service; this is particularly the case with <literal>systemd</literal>.
|
||||
</para>
|
||||
</important>
|
||||
@@ -76,7 +76,7 @@
|
||||
<term><option>--dry-run</option></term>
|
||||
<listitem>
|
||||
<para>
|
||||
Check prerequisites but don't actually attempt to stop <application>repmgrd</application>.
|
||||
Check prerequisites but don't actually attempt to stop &repmgrd;.
|
||||
</para>
|
||||
<para>
|
||||
This action will output the command which would be executed.
|
||||
@@ -89,7 +89,7 @@
|
||||
<term><option>--wait</option></term>
|
||||
<listitem>
|
||||
<para>
|
||||
Wait for the specified number of seconds to confirm that <application>repmgrd</application>
|
||||
Wait for the specified number of seconds to confirm that &repmgrd;
|
||||
stopped successfully.
|
||||
</para>
|
||||
<para>
|
||||
@@ -103,7 +103,7 @@
|
||||
<term><option>--no-wait</option></term>
|
||||
<listitem>
|
||||
<para>
|
||||
Don't wait to confirm that <application>repmgrd</application>
|
||||
Don't wait to confirm that &repmgrd;
|
||||
stopped successfully.
|
||||
</para>
|
||||
<para>
|
||||
@@ -134,7 +134,7 @@
|
||||
<para>
|
||||
<command>repmgr daemon stop</command> will execute the command defined by the
|
||||
<varname>repmgrd_service_stop_command</varname> parameter in <filename>repmgr.conf</filename>.
|
||||
This must be set to a shell command which will stop <application>repmgrd</application>;
|
||||
This must be set to a shell command which will stop &repmgrd;;
|
||||
if &repmgr; was installed from a package, this will be the service command defined by the
|
||||
package. For more details see <link linkend="appendix-packages">Appendix: &repmgr; package details</link>.
|
||||
</para>
|
||||
@@ -142,7 +142,7 @@
|
||||
<para>
|
||||
If &repmgr; was installed from a system package, and you do not configure
|
||||
<varname>repmgrd_service_stop_command</varname> to an appropriate service command, this may
|
||||
result in the system becoming confused about the state of the <application>repmgrd</application>
|
||||
result in the system becoming confused about the state of the &repmgrd;
|
||||
service; this is particularly the case with <literal>systemd</literal>.
|
||||
</para>
|
||||
</important>
|
||||
@@ -163,7 +163,7 @@
|
||||
<term><option>SUCCESS (0)</option></term>
|
||||
<listitem>
|
||||
<para>
|
||||
<application>repmgrd</application> could be stopped.
|
||||
&repmgrd; could be stopped.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
@@ -182,7 +182,7 @@
|
||||
<term><option>ERR_REPMGRD_SERVICE (27)</option></term>
|
||||
<listitem>
|
||||
<para>
|
||||
<application>repmgrd</application> could not be stopped.
|
||||
&repmgrd; could not be stopped.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
@@ -15,14 +15,14 @@
|
||||
|
||||
<refnamediv>
|
||||
<refname>repmgr daemon unpause</refname>
|
||||
<refpurpose>Instruct all <application>repmgrd</application> instances in the replication cluster to resume failover operations</refpurpose>
|
||||
<refpurpose>Instruct all &repmgrd; instances in the replication cluster to resume failover operations</refpurpose>
|
||||
</refnamediv>
|
||||
|
||||
<refsect1>
|
||||
<title>Description</title>
|
||||
<para>
|
||||
This command can be run on any active node in the replication cluster to instruct all
|
||||
running <application>repmgrd</application> instances to "unpause"
|
||||
running &repmgrd; instances to "unpause"
|
||||
(following a previous execution of <xref linkend="repmgr-daemon-pause">)
|
||||
and resume normal failover/monitoring operation.
|
||||
</para>
|
||||
@@ -30,7 +30,7 @@
|
||||
<note>
|
||||
<para>
|
||||
It's important to wait a few seconds after restarting PostgreSQL on any node before running
|
||||
<command>repmgr daemon pause</command>, as the <application>repmgrd</application> instance
|
||||
<command>repmgr daemon pause</command>, as the &repmgrd; instance
|
||||
on the restarted node will take a second or two before it has updated its status.
|
||||
</para>
|
||||
</note>
|
||||
@@ -64,7 +64,7 @@ NOTICE: node 3 (node3) unpaused</programlisting>
|
||||
<term><option>--dry-run</option></term>
|
||||
<listitem>
|
||||
<para>
|
||||
Check if nodes are reachable but don't unpause <application>repmgrd</application>.
|
||||
Check if nodes are reachable but don't unpause &repmgrd;.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
@@ -82,7 +82,7 @@ NOTICE: node 3 (node3) unpaused</programlisting>
|
||||
<term><option>SUCCESS (0)</option></term>
|
||||
<listitem>
|
||||
<para>
|
||||
<application>repmgrd</application> could be unpaused on all nodes.
|
||||
&repmgrd; could be unpaused on all nodes.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
@@ -91,7 +91,7 @@ NOTICE: node 3 (node3) unpaused</programlisting>
|
||||
<term><option>ERR_REPMGRD_PAUSE (26)</option></term>
|
||||
<listitem>
|
||||
<para>
|
||||
<application>repmgrd</application> could not be unpaused on one or mode nodes.
|
||||
&repmgrd; could not be unpaused on one or mode nodes.
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
@@ -122,7 +122,7 @@
|
||||
If not provided, &repmgr; will attempt to follow the current primary node.
|
||||
</para>
|
||||
<para>
|
||||
Note that when using <application>repmgrd</application>, <option>--upstream-node-id</option>
|
||||
Note that when using &repmgrd;, <option>--upstream-node-id</option>
|
||||
should always be configured;
|
||||
see <link linkend="repmgrd-automatic-failover-configuration">Automatic failover configuration</link>
|
||||
for details.
|
||||
|
||||
@@ -23,7 +23,7 @@
|
||||
<para>
|
||||
If the standby promotion succeeds, the server will not need to be
|
||||
restarted. However any other standbys will need to follow the new server,
|
||||
by using <xref linkend="repmgr-standby-follow">; if <application>repmgrd</application>
|
||||
by using <xref linkend="repmgr-standby-follow">; if &repmgrd;
|
||||
is active, it will handle this automatically.
|
||||
</para>
|
||||
<para>
|
||||
|
||||
@@ -17,7 +17,7 @@
|
||||
<para>
|
||||
<command>repmgr standby register</command> adds a standby's information to
|
||||
the &repmgr; metadata. This command needs to be executed to enable
|
||||
promote/follow operations and to allow <application>repmgrd</application> to work with the node.
|
||||
promote/follow operations and to allow &repmgrd; to work with the node.
|
||||
An existing standby can be registered using this command. Execute with the
|
||||
<literal>--dry-run</literal> option to check what would happen without actually registering the
|
||||
standby.
|
||||
@@ -59,7 +59,7 @@
|
||||
<para>
|
||||
Depending on your environment and workload, it may take some time for the standby's node record
|
||||
to propagate from the primary to the standby. Some actions (such as starting
|
||||
<application>repmgrd</application>) require that the standby's node record
|
||||
&repmgrd;) require that the standby's node record
|
||||
is present and up-to-date to function correctly.
|
||||
</para>
|
||||
<para>
|
||||
|
||||
@@ -48,12 +48,12 @@
|
||||
<note>
|
||||
<para>
|
||||
From <link linkend="release-4.2">repmgr 4.2</link>, &repmgr; will instruct any running
|
||||
<application>repmgrd</application> instances to pause operations while the switchover
|
||||
is being carried out, to prevent <application>repmgrd</application> from
|
||||
&repmgrd; instances to pause operations while the switchover
|
||||
is being carried out, to prevent &repmgrd; from
|
||||
unintentionally promoting a node. For more details, see <xref linkend="repmgrd-pausing">.
|
||||
</para>
|
||||
<para>
|
||||
Users of &repmgr; versions prior to 4.2 should ensure that <application>repmgrd</application>
|
||||
Users of &repmgr; versions prior to 4.2 should ensure that &repmgrd;
|
||||
is not running on any nodes while a switchover is being executed.
|
||||
</para>
|
||||
</note>
|
||||
@@ -134,11 +134,11 @@
|
||||
<term><option>--repmgrd-no-pause</option></term>
|
||||
<listitem>
|
||||
<para>
|
||||
Don't pause <application>repmgrd</application> while executing a switchover.
|
||||
Don't pause &repmgrd; while executing a switchover.
|
||||
</para>
|
||||
<para>
|
||||
This option should not be used unless you take steps by other means
|
||||
to ensure <application>repmgrd</application> is paused or not
|
||||
to ensure &repmgrd; is paused or not
|
||||
running on all nodes.
|
||||
</para>
|
||||
</listitem>
|
||||
|
||||
@@ -20,7 +20,7 @@
|
||||
record to the &repmgr; metadata, and if necessary initialises the witness
|
||||
node by installing the &repmgr; extension and copying the &repmgr; metadata
|
||||
to the witness server. This command needs to be executed to enable
|
||||
use of the witness server with <application>repmgrd</application>.
|
||||
use of the witness server with &repmgrd;.
|
||||
</para>
|
||||
<para>
|
||||
When executing <command>repmgr witness register</command>, database connection
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
<!-- doc/src/sgml/postgres.sgml -->
|
||||
<!-- doc/repmgr.sgml -->
|
||||
|
||||
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.2//EN" [
|
||||
|
||||
@@ -9,6 +9,7 @@
|
||||
%filelist;
|
||||
|
||||
<!ENTITY repmgr "<productname>repmgr</productname>">
|
||||
<!ENTITY repmgrd "<productname>repmgrd</productname>">
|
||||
<!ENTITY postgres "<productname>PostgreSQL</productname>">
|
||||
]>
|
||||
|
||||
|
||||
@@ -7,7 +7,7 @@
|
||||
<title>Automatic failover with repmgrd</title>
|
||||
|
||||
<para>
|
||||
<application>repmgrd</application> is a management and monitoring daemon which runs
|
||||
&repmgrd; is a management and monitoring daemon which runs
|
||||
on each node in a replication cluster. It can automate actions such as
|
||||
failover and updating standbys to follow the new primary, as well as
|
||||
providing monitoring information about the state of each standby.
|
||||
@@ -60,7 +60,7 @@
|
||||
|
||||
<note>
|
||||
<simpara>
|
||||
A witness server will only be useful if <application>repmgrd</application>
|
||||
A witness server will only be useful if &repmgrd;
|
||||
is in use.
|
||||
</simpara>
|
||||
</note>
|
||||
@@ -96,11 +96,11 @@
|
||||
<simpara>
|
||||
As the witness server is not part of the replication cluster, further
|
||||
changes to the &repmgr; metadata will be synchronised by
|
||||
<application>repmgrd</application>.
|
||||
&repmgrd;.
|
||||
</simpara>
|
||||
</note>
|
||||
<para>
|
||||
Once the witness server has been configured, <application>repmgrd</application>
|
||||
Once the witness server has been configured, &repmgrd;
|
||||
should be started.
|
||||
</para>
|
||||
|
||||
@@ -156,8 +156,8 @@
|
||||
location='dc1'</programlisting>
|
||||
</para>
|
||||
<para>
|
||||
In a failover situation, <application>repmgrd</application> will check if any servers in the
|
||||
same location as the current primary node are visible. If not, <application>repmgrd</application>
|
||||
In a failover situation, &repmgrd; will check if any servers in the
|
||||
same location as the current primary node are visible. If not, &repmgrd;
|
||||
will assume a network interruption and not promote any node in any
|
||||
other location (it will however enter <link linkend="repmgrd-degraded-monitoring">degraded monitoring</link>
|
||||
mode until a primary becomes visible).
|
||||
@@ -184,7 +184,7 @@
|
||||
</para>
|
||||
</note>
|
||||
<para>
|
||||
When running on the primary node, <application>repmgrd</application> can
|
||||
When running on the primary node, &repmgrd; can
|
||||
monitor connections and in particular disconnections by its attached
|
||||
child nodes (standbys), and optionally execute a custom command
|
||||
if certain criteria are met (such as the number of attached nodes falling to
|
||||
@@ -195,7 +195,7 @@
|
||||
|
||||
<note>
|
||||
<para>
|
||||
Currently <application>repmgrd</application> can only detect disconnections
|
||||
Currently &repmgrd; can only detect disconnections
|
||||
of streaming replication standbys and cannot determine whether a standby
|
||||
has disconnected and fallen back to archive recovery.
|
||||
</para>
|
||||
@@ -207,7 +207,7 @@
|
||||
<sect2 id="repmgrd-primary-child-disconnection-monitoring-process">
|
||||
<title>Standby disconnections monitoring process and criteria</title>
|
||||
<para>
|
||||
<application>repmgrd</application> monitors attach child nodes and decides
|
||||
&repmgrd; monitors attach child nodes and decides
|
||||
whether to invoke the user-defined command based on the following process
|
||||
and criteria:
|
||||
<itemizedlist>
|
||||
@@ -215,7 +215,7 @@
|
||||
<listitem>
|
||||
<para>
|
||||
Every few seconds (defined by the configuration parameter <varname>child_nodes_check_interval</varname>;
|
||||
default: <literal>5</literal> seconds, a value of <literal>0</literal> disables this altogether), <application>repmgrd</application> queries
|
||||
default: <literal>5</literal> seconds, a value of <literal>0</literal> disables this altogether), &repmgrd; queries
|
||||
the <literal>pg_stat_replication</literal> system view and compares
|
||||
the nodes present there against the list of nodes registered with &repmgr; which
|
||||
should be attached to the primary.
|
||||
@@ -225,7 +225,7 @@
|
||||
<listitem>
|
||||
<para>
|
||||
If a child node (standby) is no longer present in <literal>pg_stat_replication</literal>,
|
||||
<application>repmgrd</application> notes the time it detected the node's absence, and additionally generates a
|
||||
&repmgrd; notes the time it detected the node's absence, and additionally generates a
|
||||
<literal>child_node_disconnect</literal> event.
|
||||
</para>
|
||||
</listitem>
|
||||
@@ -233,14 +233,14 @@
|
||||
<listitem>
|
||||
<para>
|
||||
If a chile node (standby) which was absent from <literal>pg_stat_replication</literal> reappears,
|
||||
<application>repmgrd</application> clears the time it detected the node's absence, and additionally generates a
|
||||
&repmgrd; clears the time it detected the node's absence, and additionally generates a
|
||||
<literal>child_node_reconnect</literal> event.
|
||||
</para>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
If an entirely new child node (standby) is detected, <application>repmgrd</application> adds it to its internal list
|
||||
If an entirely new child node (standby) is detected, &repmgrd; adds it to its internal list
|
||||
and additionally generates a <literal>child_node_new_connect</literal> event.
|
||||
</para>
|
||||
</listitem>
|
||||
@@ -248,10 +248,10 @@
|
||||
<listitem>
|
||||
<para>
|
||||
If the <varname>child_nodes_disconnect_command</varname> parameter is set in
|
||||
<filename>repmgr.conf</filename>, <application>repmgrd</application> will then loop through all child nodes.
|
||||
<filename>repmgr.conf</filename>, &repmgrd; will then loop through all child nodes.
|
||||
If it determines that insufficient child nodes are connected, and a
|
||||
minimum of <varname>child_nodes_disconnect_timeout</varname> seconds (default: <literal>30</literal>)
|
||||
has elapsed since the last node became disconnected, <application>repmgrd</application> will then execute the
|
||||
has elapsed since the last node became disconnected, &repmgrd; will then execute the
|
||||
<varname>child_nodes_disconnect_command</varname> script.
|
||||
</para>
|
||||
<para>
|
||||
@@ -267,8 +267,8 @@
|
||||
|
||||
<listitem>
|
||||
<para>
|
||||
Note that child nodes which are not attached when <application>repmgrd</application>
|
||||
starts will <emphasis>not</emphasis> be considered as missing, as <application>repmgrd</application>
|
||||
Note that child nodes which are not attached when &repmgrd;
|
||||
starts will <emphasis>not</emphasis> be considered as missing, as &repmgrd;
|
||||
cannot know why they are not attached.
|
||||
</para>
|
||||
</listitem>
|
||||
@@ -280,12 +280,12 @@
|
||||
<sect2 id="repmgrd-primary-child-disconnection-example">
|
||||
<title>Standby disconnections monitoring process example</title>
|
||||
<para>
|
||||
This example shows typical <application>repmgrd</application> log output from a three-node cluster
|
||||
This example shows typical &repmgrd; log output from a three-node cluster
|
||||
(primary and two child nodes), with <varname>child_nodes_connected_min_count</varname>
|
||||
set to <literal>2</literal>.
|
||||
</para>
|
||||
<para>
|
||||
<application>repmgrd</application> on the primary has started up, while two child
|
||||
&repmgrd; on the primary has started up, while two child
|
||||
nodes are being provisioned:
|
||||
<programlisting>
|
||||
[2019-04-24 15:25:33] [INFO] monitoring primary node "node1" (ID: 1) in normal state
|
||||
@@ -298,7 +298,7 @@
|
||||
(...)</programlisting>
|
||||
</para>
|
||||
<para>
|
||||
One of the child nodes has disconnected; <application>repmgrd</application>
|
||||
One of the child nodes has disconnected; &repmgrd;
|
||||
is now waiting <varname>child_nodes_disconnect_timeout</varname> seconds
|
||||
before executing <varname>child_nodes_disconnect_command</varname>:
|
||||
<programlisting>
|
||||
@@ -333,7 +333,7 @@
|
||||
<para>
|
||||
If a child node is configured to use archive recovery, it's possible that
|
||||
the child node will disconnect from the primary node and fall back to
|
||||
archive recovery. In this case <application>repmgrd</application>
|
||||
archive recovery. In this case &repmgrd;
|
||||
will nevertheless register a node disconnection.
|
||||
</para>
|
||||
</listitem>
|
||||
@@ -374,7 +374,7 @@
|
||||
<term><varname>child_nodes_check_interval</varname></term>
|
||||
<listitem>
|
||||
<para>
|
||||
Interval (in seconds) after which <application>repmgrd</application> queries the
|
||||
Interval (in seconds) after which &repmgrd; queries the
|
||||
<literal>pg_stat_replication</literal> system view and compares the nodes present
|
||||
there against the list of nodes registered with repmgr which should be attached to the primary.
|
||||
</para>
|
||||
@@ -393,7 +393,7 @@
|
||||
<term><varname>child_nodes_disconnect_command</varname></term>
|
||||
<listitem>
|
||||
<para>
|
||||
User-definable script to be executed when <application>repmgrd</application>
|
||||
User-definable script to be executed when &repmgrd;
|
||||
determines that an insufficient number of child nodes are connected. By default
|
||||
the script is executed when no child nodes are executed, but the execution
|
||||
threshold can be modified by setting one of <varname>child_nodes_connected_min_count</varname>
|
||||
@@ -435,7 +435,7 @@
|
||||
</para>
|
||||
<para>
|
||||
The <varname>child_nodes_disconnect_command</varname> script will not be executed if
|
||||
<application>repmgrd</application> is <link linkend="repmgrd-pausing">paused</link>.
|
||||
&repmgrd; is <link linkend="repmgrd-pausing">paused</link>.
|
||||
</para>
|
||||
|
||||
</listitem>
|
||||
@@ -449,7 +449,7 @@
|
||||
<term><varname>child_nodes_disconnect_timeout</varname></term>
|
||||
<listitem>
|
||||
<para>
|
||||
If <application>repmgrd</application> determines that an insufficient number of
|
||||
If &repmgrd; determines that an insufficient number of
|
||||
child nodes are connected, it will wait for the specified number of seconds
|
||||
to execute the <varname>child_nodes_disconnect_command</varname>.
|
||||
</para>
|
||||
@@ -543,7 +543,7 @@
|
||||
<term><varname>child_node_disconnect</varname></term>
|
||||
<listitem>
|
||||
<para>
|
||||
This event is generated after <application>repmgrd</application>
|
||||
This event is generated after &repmgrd;
|
||||
detects that a child node is no longer streaming from the primary node.
|
||||
</para>
|
||||
<para>
|
||||
@@ -565,7 +565,7 @@ $ repmgr cluster event --event=child_node_disconnect
|
||||
<term><varname>child_node_reconnect</varname></term>
|
||||
<listitem>
|
||||
<para>
|
||||
This event is generated after <application>repmgrd</application>
|
||||
This event is generated after &repmgrd;
|
||||
detects that a child node has resumed streaming from the primary node.
|
||||
</para>
|
||||
<para>
|
||||
@@ -587,7 +587,7 @@ $ repmgr cluster event --event=child_node_reconnect
|
||||
<term><varname>child_node_new_connect</varname></term>
|
||||
<listitem>
|
||||
<para>
|
||||
This event is generated after <application>repmgrd</application>
|
||||
This event is generated after &repmgrd;
|
||||
detects that a new child node has been registered with &repmgr; and has
|
||||
connected to the primary.
|
||||
</para>
|
||||
@@ -610,7 +610,7 @@ $ repmgr cluster event --event=child_node_new_connect
|
||||
<term><varname>child_nodes_disconnect_command</varname></term>
|
||||
<listitem>
|
||||
<para>
|
||||
This event is generated after <application>repmgrd</application> detects
|
||||
This event is generated after &repmgrd; detects
|
||||
that sufficient child nodes have been disconnected for a sufficient amount
|
||||
of time to trigger execution of the <varname>child_nodes_disconnect_command</varname>.
|
||||
</para>
|
||||
@@ -645,7 +645,7 @@ $ repmgr cluster event --event=child_nodes_disconnect_command
|
||||
<title>Standby disconnection on failover</title>
|
||||
<para>
|
||||
If <option>standby_disconnect_on_failover</option> is set to <literal>true</literal> in
|
||||
<filename>repmgr.conf</filename>, in a failover situation <application>repmgrd</application> will forcibly disconnect
|
||||
<filename>repmgr.conf</filename>, in a failover situation &repmgrd; will forcibly disconnect
|
||||
the local node's WAL receiver before making a failover decision.
|
||||
</para>
|
||||
<note>
|
||||
@@ -667,7 +667,7 @@ $ repmgr cluster event --event=child_nodes_disconnect_command
|
||||
<para>
|
||||
Note that when using <option>standby_disconnect_on_failover</option> there will be a delay of 5 seconds
|
||||
plus however many seconds it takes to confirm the WAL receiver is disconnected before
|
||||
<application>repmgrd</application> proceeds with the failover decision.
|
||||
&repmgrd; proceeds with the failover decision.
|
||||
</para>
|
||||
<para>
|
||||
Following the failover operation, no matter what the outcome, each node will reconnect its WAL receiver.
|
||||
@@ -692,7 +692,7 @@ $ repmgr cluster event --event=child_nodes_disconnect_command
|
||||
<title>Failover validation</title>
|
||||
<para>
|
||||
From <link linkend="release-4.3">repmgr 4.3</link>, &repmgr; makes it possible to provide a script
|
||||
to <application>repmgrd</application> which, in a failover situation,
|
||||
to &repmgrd; which, in a failover situation,
|
||||
will be executed by the promotion candidate (the node which has been selected
|
||||
to be the new primary) to confirm whether the node should actually be promoted.
|
||||
</para>
|
||||
@@ -712,7 +712,7 @@ $ repmgr cluster event --event=child_nodes_disconnect_command
|
||||
There is a pause of <option>election_rerun_interval</option> seconds before the election is rerun.
|
||||
</para>
|
||||
<para>
|
||||
Sample <application>repmgrd</application> log file output during which the failover validation
|
||||
Sample &repmgrd; log file output during which the failover validation
|
||||
script rejects the proposed promotion candidate:
|
||||
<programlisting>
|
||||
[2019-03-13 21:01:30] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 4 seconds
|
||||
@@ -748,7 +748,7 @@ INFO: node 3 received notification to rerun promotion candidate election
|
||||
<para>
|
||||
Cascading replication - where a standby can connect to an upstream node and not
|
||||
the primary server itself - was introduced in PostgreSQL 9.2. &repmgr; and
|
||||
<application>repmgrd</application> support cascading replication by keeping track of the relationship
|
||||
&repmgrd; support cascading replication by keeping track of the relationship
|
||||
between standby servers - each node record is stored with the node id of its
|
||||
upstream ("parent") server (except of course the primary server).
|
||||
</para>
|
||||
|
||||
@@ -24,10 +24,10 @@
|
||||
<para>
|
||||
In contrast to streaming replication, there's no concept of "promoting" a new
|
||||
primary node with BDR. Instead, "failover" involves monitoring both nodes
|
||||
with <application>repmgrd</application> and redirecting queries from the failed node to the remaining
|
||||
with &repmgrd; and redirecting queries from the failed node to the remaining
|
||||
active node. This can be done by using an
|
||||
<link linkend="event-notifications">event notification</link> script
|
||||
which is called by <application>repmgrd</application> to dynamically
|
||||
which is called by &repmgrd; to dynamically
|
||||
reconfigure a proxy server/connection pooler such as <application>PgBouncer</application>.
|
||||
</para>
|
||||
|
||||
@@ -60,7 +60,7 @@
|
||||
<para>
|
||||
Application database connections *must* be passed through a proxy server/
|
||||
connection pooler such as <application>PgBouncer</application>, and it must be possible to dynamically
|
||||
reconfigure that from <application>repmgrd</application>. The example demonstrated in this document
|
||||
reconfigure that from &repmgrd;. The example demonstrated in this document
|
||||
will use <application>PgBouncer</application>
|
||||
</para>
|
||||
<para>
|
||||
@@ -296,7 +296,7 @@
|
||||
</listitem>
|
||||
<listitem>
|
||||
<simpara>recreates the <application>PgBouncer</application> configuration file on each
|
||||
node using the information provided by <application>repmgrd</application>
|
||||
node using the information provided by &repmgrd;
|
||||
(primarily the <varname>conninfo</varname> string) to configure
|
||||
<application>PgBouncer</application></simpara>
|
||||
</listitem>
|
||||
@@ -318,21 +318,21 @@
|
||||
<title>Node monitoring and failover</title>
|
||||
<para>
|
||||
At the intervals specified by <varname>monitor_interval_secs</varname>
|
||||
in <filename>repmgr.conf</filename>, <application>repmgrd</application>
|
||||
in <filename>repmgr.conf</filename>, &repmgrd;
|
||||
will ping each node to check if it's available. If a node isn't available,
|
||||
<application>repmgrd</application> will enter failover mode and check <varname>reconnect_attempts</varname>
|
||||
&repmgrd; will enter failover mode and check <varname>reconnect_attempts</varname>
|
||||
times at intervals of <varname>reconnect_interval</varname> to confirm the node is definitely unreachable.
|
||||
This buffer period is necessary to avoid false positives caused by transient
|
||||
network outages.
|
||||
</para>
|
||||
<para>
|
||||
If the node is still unavailable, <application>repmgrd</application> will enter failover mode and execute
|
||||
If the node is still unavailable, &repmgrd; will enter failover mode and execute
|
||||
the script defined in <varname>event_notification_command</varname>; an entry will be logged
|
||||
in the <literal>repmgr.events</literal> table and <application>repmgrd</application> will
|
||||
in the <literal>repmgr.events</literal> table and &repmgrd; will
|
||||
(unless otherwise configured) resume monitoring of the node in "degraded" mode until it reappears.
|
||||
</para>
|
||||
<para>
|
||||
<application>repmgrd</application> logfile output during a failover event will look something like this
|
||||
&repmgrd; logfile output during a failover event will look something like this
|
||||
on one node (usually the node which has failed, here <literal>node2</literal>):
|
||||
<programlisting>
|
||||
...
|
||||
@@ -388,8 +388,8 @@
|
||||
</para>
|
||||
<para>
|
||||
This assumes only the PostgreSQL instance on <literal>node2</literal> has failed. In this case the
|
||||
<application>repmgrd</application> instance running on <literal>node2</literal> has performed the failover. However if
|
||||
the entire server becomes unavailable, <application>repmgrd</application> on <literal>node1</literal> will perform
|
||||
&repmgrd; instance running on <literal>node2</literal> has performed the failover. However if
|
||||
the entire server becomes unavailable, &repmgrd; on <literal>node1</literal> will perform
|
||||
the failover.
|
||||
</para>
|
||||
</sect1>
|
||||
@@ -404,7 +404,7 @@
|
||||
</para>
|
||||
<para>
|
||||
If the failed node comes back up and connects correctly, output similar to this
|
||||
will be visible in the <application>repmgrd</application> log:
|
||||
will be visible in the &repmgrd; log:
|
||||
<programlisting>
|
||||
[2017-07-27 21:25:30] [DETAIL] monitoring node "node2" (ID: 2) in degraded mode
|
||||
[2017-07-27 21:25:46] [INFO] monitoring BDR replication status on node "node2" (ID: 2)
|
||||
@@ -417,10 +417,10 @@
|
||||
<sect1 id="bdr-complete-shutdown" xreflabel="Shutdown of both nodes">
|
||||
<title>Shutdown of both nodes</title>
|
||||
<para>
|
||||
If both PostgreSQL instances are shut down, <application>repmgrd</application> will try and handle the
|
||||
If both PostgreSQL instances are shut down, &repmgrd; will try and handle the
|
||||
situation as gracefully as possible, though with no failover candidates available
|
||||
there's not much it can do. Should this case ever occur, we recommend shutting
|
||||
down <application>repmgrd</application> on both nodes and restarting it once the PostgreSQL instances
|
||||
down &repmgrd; on both nodes and restarting it once the PostgreSQL instances
|
||||
are running properly.
|
||||
</para>
|
||||
</sect1>
|
||||
|
||||
@@ -8,13 +8,13 @@
|
||||
<title>repmgrd setup and configuration</title>
|
||||
|
||||
<para>
|
||||
<application>repmgrd</application> is a daemon which runs on each PostgreSQL node,
|
||||
&repmgrd; is a daemon which runs on each PostgreSQL node,
|
||||
monitoring the local node, and (unless it's the primary node) the upstream server
|
||||
(the primary server or with cascading replication, another standby) which it's
|
||||
connected to.
|
||||
</para>
|
||||
<para>
|
||||
<application>repmgrd</application> can be configured to provide failover
|
||||
&repmgrd; can be configured to provide failover
|
||||
capability in case the primary upstream node becomes unreachable, and/or
|
||||
provide monitoring data to the &repmgr; metadatabase.
|
||||
</para>
|
||||
@@ -23,7 +23,7 @@
|
||||
<title>repmgrd configuration</title>
|
||||
|
||||
<para>
|
||||
To use <application>repmgrd</application>, its associated function library <emphasis>must</emphasis> be
|
||||
To use &repmgrd;, its associated function library <emphasis>must</emphasis> be
|
||||
included via <filename>postgresql.conf</filename> with:
|
||||
|
||||
<programlisting>
|
||||
@@ -35,7 +35,7 @@
|
||||
</para>
|
||||
|
||||
<para>
|
||||
The following configuraton options apply to <application>repmgrd</application> in all circumstances:
|
||||
The following configuraton options apply to &repmgrd; in all circumstances:
|
||||
</para>
|
||||
<variablelist>
|
||||
|
||||
@@ -62,7 +62,7 @@
|
||||
<listitem>
|
||||
<para>
|
||||
The option <option>connection_check_type</option> is used to select the method
|
||||
<application>repmgrd</application> uses to determine whether the upstream node is available.
|
||||
&repmgrd; uses to determine whether the upstream node is available.
|
||||
</para>
|
||||
<para>
|
||||
Possible values are:
|
||||
@@ -132,7 +132,7 @@
|
||||
<term><option>degraded_monitoring_timeout</option></term>
|
||||
<listitem>
|
||||
<para>
|
||||
Interval (in seconds) after which <application>repmgrd</application> will terminate if
|
||||
Interval (in seconds) after which &repmgrd; will terminate if
|
||||
either of the servers (local node and or upstream node) being monitored is no longer available
|
||||
(<link linkend="repmgrd-degraded-monitoring">degraded monitoring mode</link>).
|
||||
</para>
|
||||
@@ -152,7 +152,7 @@
|
||||
<title>Required configuration for automatic failover</title>
|
||||
|
||||
<para>
|
||||
The following <application>repmgrd</application> options <emphasis>must</emphasis> be set in
|
||||
The following &repmgrd; options <emphasis>must</emphasis> be set in
|
||||
<filename>repmgr.conf</filename>:
|
||||
|
||||
<itemizedlist spacing="compact" mark="bullet">
|
||||
@@ -192,7 +192,7 @@
|
||||
</para>
|
||||
<note>
|
||||
<para>
|
||||
If <option>failover</option> is set to <literal>manual</literal>, <application>repmgrd</application>
|
||||
If <option>failover</option> is set to <literal>manual</literal>, &repmgrd;
|
||||
will not take any action if a failover situation is detected, and the node may need to
|
||||
be modified manually (e.g. by executing <command><link linkend="repmgr-standby-follow">repmgr standby follow</link></command>).
|
||||
</para>
|
||||
@@ -209,7 +209,7 @@
|
||||
<listitem>
|
||||
<para>
|
||||
The program or script defined in <option>promote_command</option> will be executed
|
||||
in a failover situation when <application>repmgrd</application> determines that
|
||||
in a failover situation when &repmgrd; determines that
|
||||
the current node is to become the new primary node.
|
||||
</para>
|
||||
<para>
|
||||
@@ -231,8 +231,8 @@
|
||||
|
||||
<para>
|
||||
Note that the <literal>--log-to-file</literal> option will cause
|
||||
output generated by the &repmgr; command, when executed by <application>repmgrd</application>,
|
||||
to be logged to the same destination configured to receive log output for <application>repmgrd</application>.
|
||||
output generated by the &repmgr; command, when executed by &repmgrd;,
|
||||
to be logged to the same destination configured to receive log output for &repmgrd;.
|
||||
</para>
|
||||
<note>
|
||||
<para>
|
||||
@@ -252,7 +252,7 @@
|
||||
<listitem>
|
||||
<para>
|
||||
The program or script defined in <option>follow_command</option> will be executed
|
||||
in a failover situation when <application>repmgrd</application> determines that
|
||||
in a failover situation when &repmgrd; determines that
|
||||
the current node is to follow the new primary node.
|
||||
</para>
|
||||
<para>
|
||||
@@ -263,7 +263,7 @@
|
||||
The <option>follow_command</option> parameter
|
||||
should provide the <literal>--upstream-node-id=%n</literal>
|
||||
option to <command>repmgr standby follow</command>; the <literal>%n</literal> will be replaced by
|
||||
<application>repmgrd</application> with the ID of the new primary node. If this is not provided,
|
||||
&repmgrd; with the ID of the new primary node. If this is not provided,
|
||||
<command>repmgr standby follow</command> will attempt to determine the new primary by itself, but if the
|
||||
original primary comes back online after the new primary is promoted, there is a risk that
|
||||
<command>repmgr standby follow</command> will result in the node continuing to follow
|
||||
@@ -284,8 +284,8 @@
|
||||
|
||||
<para>
|
||||
Note that the <literal>--log-to-file</literal> option will cause
|
||||
output generated by the &repmgr; command, when executed by <application>repmgrd</application>,
|
||||
to be logged to the same destination configured to receive log output for <application>repmgrd</application>.
|
||||
output generated by the &repmgr; command, when executed by &repmgrd;,
|
||||
to be logged to the same destination configured to receive log output for &repmgrd;.
|
||||
</para>
|
||||
<note>
|
||||
<para>
|
||||
@@ -337,7 +337,7 @@
|
||||
<listitem>
|
||||
<para>
|
||||
User-defined script to execute for an external mechanism to validate the failover
|
||||
decision made by <application>repmgrd</application>.
|
||||
decision made by &repmgrd;.
|
||||
</para>
|
||||
<note>
|
||||
<para>
|
||||
@@ -408,7 +408,7 @@
|
||||
for this option.
|
||||
</para>
|
||||
<para>
|
||||
<application>repmgrd</application> will refuse to start if this option is set
|
||||
&repmgrd; will refuse to start if this option is set
|
||||
but either of these prerequisites is not met.
|
||||
</para>
|
||||
</note>
|
||||
@@ -469,14 +469,14 @@
|
||||
</indexterm>
|
||||
<title>PostgreSQL service configuration</title>
|
||||
<para>
|
||||
If using automatic failover, currently <application>repmgrd</application> will need to execute
|
||||
If using automatic failover, currently &repmgrd; will need to execute
|
||||
<link linkend="repmgr-standby-follow"><command>repmgr standby follow</command></link>
|
||||
to restart PostgreSQL on standbys to have them follow a new primary.
|
||||
</para>
|
||||
<para>
|
||||
To ensure this happens smoothly, it's essential to provide the appropriate system/service restart
|
||||
command appropriate to your operating system via <varname>service_restart_command</varname>
|
||||
in <filename>repmgr.conf</filename>. If you don't do this, <application>repmgrd</application>
|
||||
in <filename>repmgr.conf</filename>. If you don't do this, &repmgrd;
|
||||
will default to using <command>pg_ctl</command>, which can result in unexpected problems,
|
||||
particularly on <application>systemd</application>-based systems.
|
||||
</para>
|
||||
@@ -549,21 +549,21 @@ repmgrd_service_stop_command='sudo systemctl repmgr11 stop'
|
||||
</indexterm>
|
||||
<title>Applying configuration changes to repmgrd</title>
|
||||
<para>
|
||||
To apply configuration file changes to a running <application>repmgrd</application>
|
||||
daemon, execute the operating system's <application>repmgrd</application> service reload command
|
||||
To apply configuration file changes to a running &repmgrd;
|
||||
daemon, execute the operating system's &repmgrd; service reload command
|
||||
(see <xref linkend="appendix-packages"> for examples),
|
||||
or for instances which were manually started, execute <command>kill -HUP</command>, e.g.
|
||||
<command>kill -HUP `cat /tmp/repmgrd.pid`</command>.
|
||||
</para>
|
||||
<tip>
|
||||
<para>
|
||||
Check the <application>repmgrd</application> log to see what changes were
|
||||
Check the &repmgrd; log to see what changes were
|
||||
applied, or if any issues were encountered when reloading the configuration.
|
||||
</para>
|
||||
</tip>
|
||||
<para>
|
||||
Note that only the following subset of configuration file parameters can be changed on a
|
||||
running <application>repmgrd</application> daemon:
|
||||
running &repmgrd; daemon:
|
||||
</para>
|
||||
<itemizedlist spacing="compact" mark="bullet">
|
||||
|
||||
@@ -770,7 +770,7 @@ repmgrd_service_stop_command='sudo systemctl repmgr11 stop'
|
||||
<note>
|
||||
<para>
|
||||
After executing <command><link linkend="repmgr-standby-register">repmgr standby register --force</link></command>,
|
||||
<application>repmgrd</application> <emphasis>must</emphasis> be restarted for the changes to take effect.
|
||||
&repmgrd; <emphasis>must</emphasis> be restarted for the changes to take effect.
|
||||
</para>
|
||||
</note>
|
||||
|
||||
@@ -785,7 +785,7 @@ repmgrd_service_stop_command='sudo systemctl repmgr11 stop'
|
||||
</indexterm>
|
||||
<title>repmgrd daemon</title>
|
||||
<para>
|
||||
If installed from a package, the <application>repmgrd</application> can be started
|
||||
If installed from a package, the &repmgrd; can be started
|
||||
via the operating system's service command, e.g. in <application>systemd</application>
|
||||
using <command>systemctl</command>.
|
||||
</para>
|
||||
@@ -796,7 +796,7 @@ repmgrd_service_stop_command='sudo systemctl repmgr11 stop'
|
||||
<para>
|
||||
The commands <link linkend="repmgr-daemon-start"><command>repmgr daemon start</command></link> and
|
||||
<link linkend="repmgr-daemon-stop"><command>repmgr daemon stop</command></link> can be used
|
||||
as convenience wrappers to start and stop <application>repmgrd</application>.
|
||||
as convenience wrappers to start and stop &repmgrd;.
|
||||
</para>
|
||||
<important>
|
||||
<para>
|
||||
@@ -808,7 +808,7 @@ repmgrd_service_stop_command='sudo systemctl repmgr11 stop'
|
||||
</para>
|
||||
</important>
|
||||
<para>
|
||||
<application>repmgrd</application> can be started manually like this:
|
||||
&repmgrd; can be started manually like this:
|
||||
<programlisting>
|
||||
repmgrd -f /etc/repmgr.conf --pid-file /tmp/repmgrd.pid</programlisting>
|
||||
and stopped with <command>kill `cat /tmp/repmgrd.pid`</command>. Adjust paths as appropriate.
|
||||
@@ -825,7 +825,7 @@ repmgrd_service_stop_command='sudo systemctl repmgr11 stop'
|
||||
</indexterm>
|
||||
<title>repmgrd's PID file</title>
|
||||
<para>
|
||||
<application>repmgrd</application> will generate a PID file by default.
|
||||
&repmgrd; will generate a PID file by default.
|
||||
</para>
|
||||
<note>
|
||||
<simpara>
|
||||
@@ -845,12 +845,12 @@ repmgrd_service_stop_command='sudo systemctl repmgr11 stop'
|
||||
<option>--pid-file</option> may be deprecated in future releases.
|
||||
</para>
|
||||
<para>
|
||||
If a PID file location was specified by the package maintainer, <application>repmgrd</application>
|
||||
If a PID file location was specified by the package maintainer, &repmgrd;
|
||||
will use that. This only applies if &repmgr; was installed from a package and the package
|
||||
maintainer has specified the PID file location.
|
||||
</para>
|
||||
<para>
|
||||
If none of the above apply, <application>repmgrd</application> will create a PID file
|
||||
If none of the above apply, &repmgrd; will create a PID file
|
||||
in the operating system's temporary directory (as setermined by the environment variable
|
||||
<varname>TMPDIR</varname>, or if that is not set, will use <filename>/tmp</filename>).
|
||||
</para>
|
||||
@@ -859,10 +859,10 @@ repmgrd_service_stop_command='sudo systemctl repmgr11 stop'
|
||||
<option>--no-pid-file</option>.
|
||||
</para>
|
||||
<para>
|
||||
To see which PID file <application>repmgrd</application> would use, execute <application>repmgrd</application>
|
||||
with the option <option>--show-pid-file</option>. <application>repmgrd</application>
|
||||
To see which PID file &repmgrd; would use, execute &repmgrd;
|
||||
with the option <option>--show-pid-file</option>. &repmgrd;
|
||||
will not start if this option is provided. Note that the value shown is the
|
||||
file <application>repmgrd</application> would use next time it starts, and is
|
||||
file &repmgrd; would use next time it starts, and is
|
||||
not necessarily the PID file currently in use.
|
||||
</para>
|
||||
</sect2>
|
||||
@@ -881,7 +881,7 @@ repmgrd_service_stop_command='sudo systemctl repmgr11 stop'
|
||||
|
||||
<para>
|
||||
If &repmgr; was installed from Debian/Ubuntu packages, additional configuration
|
||||
is required before <application>repmgrd</application> is started as a daemon.
|
||||
is required before &repmgrd; is started as a daemon.
|
||||
</para>
|
||||
<para>
|
||||
This is done via the file <filename>/etc/default/repmgrd</filename>, which by default
|
||||
@@ -920,12 +920,12 @@ REPMGRD_OPTS="--daemonize=false"
|
||||
</para>
|
||||
</tip>
|
||||
<para>
|
||||
From <application>repmgrd</application> 4.1, ensure <varname>REPMGRD_OPTS</varname> includes
|
||||
From &repmgrd; 4.1, ensure <varname>REPMGRD_OPTS</varname> includes
|
||||
<option>--daemonize=false</option>, as daemonization is handled by the service command.
|
||||
</para>
|
||||
<para>
|
||||
If using <application>systemd</application>, you may need to execute <command>systemctl daemon-reload</command>.
|
||||
Also, if you attempted to start <application>repmgrd</application> using <command>systemctl start repmgrd</command>,
|
||||
Also, if you attempted to start &repmgrd; using <command>systemctl start repmgrd</command>,
|
||||
you'll need to execute <command>systemctl stop repmgrd</command>. Because that's how <application>systemd</application>
|
||||
rolls.
|
||||
</para>
|
||||
@@ -972,7 +972,7 @@ REPMGRD_OPTS="--daemonize=false"
|
||||
|
||||
<title>repmgrd log rotation</title>
|
||||
<para>
|
||||
To ensure the current <application>repmgrd</application> logfile
|
||||
To ensure the current &repmgrd; logfile
|
||||
(specified in <filename>repmgr.conf</filename> with the parameter
|
||||
<option>log_file</option>) does not grow indefinitely, configure your
|
||||
system's <command>logrotate</command> to regularly rotate it.
|
||||
|
||||
@@ -21,42 +21,42 @@
|
||||
<title>Pausing repmgrd</title>
|
||||
|
||||
<para>
|
||||
In normal operation, <application>repmgrd</application> monitors the state of the
|
||||
In normal operation, &repmgrd; monitors the state of the
|
||||
PostgreSQL node it is running on, and will take appropriate action if problems
|
||||
are detected, e.g. (if so configured) promote the node to primary, if the existing
|
||||
primary has been determined as failed.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
However, <application>repmgrd</application> is unable to distinguish between
|
||||
However, &repmgrd; is unable to distinguish between
|
||||
planned outages (such as performing a <link linkend="performing-switchover">switchover</link>
|
||||
or installing PostgreSQL maintenance released), and an actual server outage. In versions prior to
|
||||
&repmgr; 4.2 it was necessary to stop <application>repmgrd</application> on all nodes (or at least
|
||||
on all nodes where <application>repmgrd</application> is
|
||||
&repmgr; 4.2 it was necessary to stop &repmgrd; on all nodes (or at least
|
||||
on all nodes where &repmgrd; is
|
||||
<link linkend="repmgrd-automatic-failover">configured for automatic failover</link>)
|
||||
to prevent <application>repmgrd</application> from making unintentional changes to the
|
||||
to prevent &repmgrd; from making unintentional changes to the
|
||||
replication cluster.
|
||||
</para>
|
||||
|
||||
<para>
|
||||
From <link linkend="release-4.2">&repmgr; 4.2</link>, <application>repmgrd</application>
|
||||
From <link linkend="release-4.2">&repmgr; 4.2</link>, &repmgrd;
|
||||
can now be "paused", i.e. instructed not to take any action such as performing a failover.
|
||||
This can be done from any node in the cluster, removing the need to stop/restart
|
||||
each <application>repmgrd</application> individually.
|
||||
each &repmgrd; individually.
|
||||
</para>
|
||||
|
||||
<note>
|
||||
<para>
|
||||
For major PostgreSQL upgrades, e.g. from PostgreSQL 10 to PostgreSQL 11,
|
||||
<application>repmgrd</application> should be shut down completely and only started up
|
||||
&repmgrd; should be shut down completely and only started up
|
||||
once the &repmgr; packages for the new PostgreSQL major version have been installed.
|
||||
</para>
|
||||
</note>
|
||||
|
||||
<sect2 id="repmgrd-pausing-prerequisites">
|
||||
<title>Prerequisites for pausing <application>repmgrd</application></title>
|
||||
<title>Prerequisites for pausing &repmgrd;</title>
|
||||
<para>
|
||||
In order to be able to pause/unpause <application>repmgrd</application>, following
|
||||
In order to be able to pause/unpause &repmgrd;, following
|
||||
prerequisites must be met:
|
||||
<itemizedlist spacing="compact" mark="bullet">
|
||||
|
||||
@@ -86,9 +86,9 @@
|
||||
</sect2>
|
||||
|
||||
<sect2 id="repmgrd-pausing-execution">
|
||||
<title>Pausing/unpausing <application>repmgrd</application></title>
|
||||
<title>Pausing/unpausing &repmgrd;</title>
|
||||
<para>
|
||||
To pause <application>repmgrd</application>, execute <link linkend="repmgr-daemon-pause"><command>repmgr daemon pause</command></link>, e.g.:
|
||||
To pause &repmgrd;, execute <link linkend="repmgr-daemon-pause"><command>repmgr daemon pause</command></link>, e.g.:
|
||||
<programlisting>
|
||||
$ repmgr -f /etc/repmgr.conf daemon pause
|
||||
NOTICE: node 1 (node1) paused
|
||||
@@ -96,7 +96,7 @@ NOTICE: node 2 (node2) paused
|
||||
NOTICE: node 3 (node3) paused</programlisting>
|
||||
</para>
|
||||
<para>
|
||||
The state of <application>repmgrd</application> on each node can be checked with
|
||||
The state of &repmgrd; on each node can be checked with
|
||||
<link linkend="repmgr-daemon-status"><command>repmgr daemon status</command></link>, e.g.:
|
||||
<programlisting>$ repmgr -f /etc/repmgr.conf daemon status
|
||||
ID | Name | Role | Status | repmgrd | PID | Paused?
|
||||
@@ -109,12 +109,12 @@ NOTICE: node 3 (node3) paused</programlisting>
|
||||
<note>
|
||||
<para>
|
||||
If executing a switchover with <link linkend="repmgr-standby-switchover"><command>repmgr standby switchover</command></link>,
|
||||
&repmgr; will automatically pause/unpause <application>repmgrd</application> as part of the switchover process.
|
||||
&repmgr; will automatically pause/unpause &repmgrd; as part of the switchover process.
|
||||
</para>
|
||||
</note>
|
||||
|
||||
<para>
|
||||
If the primary (in this example, <literal>node1</literal>) is stopped, <application>repmgrd</application>
|
||||
If the primary (in this example, <literal>node1</literal>) is stopped, &repmgrd;
|
||||
running on one of the standbys (here: <literal>node2</literal>) will react like this:
|
||||
<programlisting>
|
||||
[2018-09-20 12:22:21] [WARNING] unable to connect to upstream node "node1" (ID: 1)
|
||||
@@ -130,14 +130,14 @@ NOTICE: node 3 (node3) paused</programlisting>
|
||||
[2018-09-20 12:22:33] [HINT] execute "repmgr daemon unpause" to resume normal failover mode</programlisting>
|
||||
</para>
|
||||
<para>
|
||||
If the primary becomes available again (e.g. following a software upgrade), <application>repmgrd</application>
|
||||
If the primary becomes available again (e.g. following a software upgrade), &repmgrd;
|
||||
will automatically reconnect, e.g.:
|
||||
<programlisting>
|
||||
[2018-09-20 13:12:41] [NOTICE] reconnected to upstream node 1 after 8 seconds, resuming monitoring</programlisting>
|
||||
</para>
|
||||
|
||||
<para>
|
||||
To unpause <application>repmgrd</application>, execute <link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link>, e.g.:
|
||||
To unpause &repmgrd;, execute <link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link>, e.g.:
|
||||
<programlisting>
|
||||
$ repmgr -f /etc/repmgr.conf daemon unpause
|
||||
NOTICE: node 1 (node1) unpaused
|
||||
@@ -147,7 +147,7 @@ NOTICE: node 3 (node3) unpaused</programlisting>
|
||||
|
||||
<note>
|
||||
<para>
|
||||
If the previous primary is no longer accessible when <application>repmgrd</application>
|
||||
If the previous primary is no longer accessible when &repmgrd;
|
||||
is unpaused, no failover action will be taken. Instead, a new primary must be manually promoted using
|
||||
<link linkend="repmgr-standby-promote"><command>repmgr standby promote</command></link>,
|
||||
and any standbys attached to the new primary with
|
||||
@@ -156,13 +156,13 @@ NOTICE: node 3 (node3) unpaused</programlisting>
|
||||
<para>
|
||||
This is to prevent <link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link>
|
||||
resulting in the automatic promotion of a new primary, which may be a problem particularly
|
||||
in larger clusters, where <application>repmgrd</application> could select a different promotion
|
||||
in larger clusters, where &repmgrd; could select a different promotion
|
||||
candidate to the one intended by the administrator.
|
||||
</para>
|
||||
</note>
|
||||
</sect2>
|
||||
<sect2 id="repmgrd-pausing-details">
|
||||
<title>Details on the <application>repmgrd</application> pausing mechanism</title>
|
||||
<title>Details on the &repmgrd; pausing mechanism</title>
|
||||
|
||||
<para>
|
||||
The pause state of each node will be stored over a PostgreSQL restart.
|
||||
@@ -171,14 +171,14 @@ NOTICE: node 3 (node3) unpaused</programlisting>
|
||||
<para>
|
||||
<link linkend="repmgr-daemon-pause"><command>repmgr daemon pause</command></link> and
|
||||
<link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link> can be
|
||||
executed even if <application>repmgrd</application> is not running; in this case,
|
||||
<application>repmgrd</application> will start up in whichever pause state has been set.
|
||||
executed even if &repmgrd; is not running; in this case,
|
||||
&repmgrd; will start up in whichever pause state has been set.
|
||||
</para>
|
||||
<note>
|
||||
<para>
|
||||
<link linkend="repmgr-daemon-pause"><command>repmgr daemon pause</command></link> and
|
||||
<link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link>
|
||||
<emphasis>do not</emphasis> stop/start <application>repmgrd</application>.
|
||||
<emphasis>do not</emphasis> stop/start &repmgrd;.
|
||||
</para>
|
||||
</note>
|
||||
</sect2>
|
||||
@@ -194,7 +194,7 @@ NOTICE: node 3 (node3) unpaused</programlisting>
|
||||
<para>
|
||||
If WAL replay has been paused (using <command>pg_wal_replay_pause()</command>,
|
||||
on PostgreSQL 9.6 and earlier <command>pg_xlog_replay_pause()</command>),
|
||||
in a failover situation <application>repmgrd</application> will
|
||||
in a failover situation &repmgrd; will
|
||||
automatically resume WAL replay.
|
||||
</para>
|
||||
<para>
|
||||
@@ -225,9 +225,9 @@ NOTICE: node 3 (node3) unpaused</programlisting>
|
||||
|
||||
<title>"degraded monitoring" mode</title>
|
||||
<para>
|
||||
In certain circumstances, <application>repmgrd</application> is not able to fulfill its primary mission
|
||||
In certain circumstances, &repmgrd; is not able to fulfill its primary mission
|
||||
of monitoring the node's upstream server. In these cases it enters "degraded monitoring"
|
||||
mode, where <application>repmgrd</application> remains active but is waiting for the situation
|
||||
mode, where &repmgrd; remains active but is waiting for the situation
|
||||
to be resolved.
|
||||
</para>
|
||||
<para>
|
||||
@@ -287,12 +287,12 @@ NOTICE: node 3 (node3) unpaused</programlisting>
|
||||
<para>
|
||||
By default, <literal>repmgrd</literal> will continue in degraded monitoring mode indefinitely.
|
||||
However a timeout (in seconds) can be set with <varname>degraded_monitoring_timeout</varname>,
|
||||
after which <application>repmgrd</application> will terminate.
|
||||
after which &repmgrd; will terminate.
|
||||
</para>
|
||||
|
||||
<note>
|
||||
<para>
|
||||
If <application>repmgrd</application> is monitoring a primary mode which has been stopped
|
||||
If &repmgrd; is monitoring a primary mode which has been stopped
|
||||
and manually restarted as a standby attached to a new primary, it will automatically detect
|
||||
the status change and update the node record to reflect the node's new status
|
||||
as an active standby. It will then resume monitoring the node as a standby.
|
||||
@@ -313,7 +313,7 @@ NOTICE: node 3 (node3) unpaused</programlisting>
|
||||
|
||||
<title>Storing monitoring data</title>
|
||||
<para>
|
||||
When <application>repmgrd</application> is running with the option <literal>monitoring_history=true</literal>,
|
||||
When &repmgrd; is running with the option <literal>monitoring_history=true</literal>,
|
||||
it will constantly write standby node status information to the
|
||||
<varname>monitoring_history</varname> table, providing a near-real time
|
||||
overview of replication status on all nodes
|
||||
@@ -351,7 +351,7 @@ NOTICE: node 3 (node3) unpaused</programlisting>
|
||||
specify how many day's worth of data should be retained.
|
||||
</para>
|
||||
<para>
|
||||
It's possible to use <application>repmgrd</application> to run in monitoring
|
||||
It's possible to use &repmgrd; to run in monitoring
|
||||
mode only (without automatic failover capability) for some or all
|
||||
nodes by setting <literal>failover=manual</literal> in the node's
|
||||
<filename>repmgr.conf</filename> file. In the event of the node's upstream failing,
|
||||
|
||||
@@ -7,18 +7,18 @@
|
||||
<title>repmgrd overview</title>
|
||||
|
||||
<para>
|
||||
<application>repmgrd</application> ("<literal>replication manager daemon</literal>")
|
||||
&repmgrd; ("<literal>replication manager daemon</literal>")
|
||||
is a management and monitoring daemon which runs
|
||||
on each node in a replication cluster. It can automate actions such as
|
||||
failover and updating standbys to follow the new primary, as well as
|
||||
providing monitoring information about the state of each standby.
|
||||
</para>
|
||||
<para>
|
||||
<application>repmgrd</application> is designed to be straightforward to set up
|
||||
&repmgrd; is designed to be straightforward to set up
|
||||
and does not require additional external infrastructure.
|
||||
</para>
|
||||
<para>
|
||||
Functionality provided by <application>repmgrd</application> includes:
|
||||
Functionality provided by &repmgrd; includes:
|
||||
<itemizedlist spacing="compact" mark="bullet">
|
||||
|
||||
<listitem>
|
||||
@@ -93,11 +93,11 @@
|
||||
<tip>
|
||||
<para>
|
||||
See section <link linkend="repmgrd-automatic-failover-configuration">Required configuration for automatic failover</link>
|
||||
for an example of minimal <filename>repmgr.conf</filename> file settings suitable for use with <application>repmgrd</application>.
|
||||
for an example of minimal <filename>repmgr.conf</filename> file settings suitable for use with &repmgrd;.
|
||||
</para>
|
||||
</tip>
|
||||
<para>
|
||||
Start <application>repmgrd</application> on each standby and verify that it's running by examining the
|
||||
Start &repmgrd; on each standby and verify that it's running by examining the
|
||||
log output, which at log level <literal>INFO</literal> will look like this:
|
||||
<programlisting>
|
||||
[2019-03-15 06:32:05] [NOTICE] repmgrd (repmgrd 4.3) starting up
|
||||
@@ -107,7 +107,7 @@
|
||||
[2019-03-15 06:32:05] [INFO] monitoring connection to upstream node "node1" (ID: 1)</programlisting>
|
||||
</para>
|
||||
<para>
|
||||
Each <application>repmgrd</application> should also have recorded its successful startup as an event:
|
||||
Each &repmgrd; should also have recorded its successful startup as an event:
|
||||
<programlisting>
|
||||
$ repmgr -f /etc/repmgr.conf cluster event --event=repmgrd_start
|
||||
Node ID | Name | Event | OK | Timestamp | Details
|
||||
@@ -123,8 +123,8 @@
|
||||
</para>
|
||||
<para>
|
||||
This will force the primary to shut down straight away, aborting all processes
|
||||
and transactions. This will cause a flurry of activity in the <application>repmgrd</application> log
|
||||
files as each <application>repmgrd</application> detects the failure of the primary and a failover
|
||||
and transactions. This will cause a flurry of activity in the &repmgrd; log
|
||||
files as each &repmgrd; detects the failure of the primary and a failover
|
||||
decision is made. This is an extract from the log of a standby server (<literal>node2</literal>)
|
||||
which has promoted to new primary after failure of the original primary (<literal>node1</literal>).
|
||||
<programlisting>
|
||||
|
||||
@@ -159,12 +159,12 @@
|
||||
<note>
|
||||
<para>
|
||||
From <link linkend="release-4.2">repmgr 4.2</link>, &repmgr; will instruct any running
|
||||
<application>repmgrd</application> instances to pause operations while the switchover
|
||||
is being carried out, to prevent <application>repmgrd</application> from
|
||||
&repmgrd; instances to pause operations while the switchover
|
||||
is being carried out, to prevent &repmgrd; from
|
||||
unintentionally promoting a node. For more details, see <xref linkend="repmgrd-pausing">.
|
||||
</para>
|
||||
<para>
|
||||
Users of &repmgr; versions prior to 4.2 should ensure that <application>repmgrd</application>
|
||||
Users of &repmgr; versions prior to 4.2 should ensure that &repmgrd;
|
||||
is not running on any nodes while a switchover is being executed.
|
||||
</para>
|
||||
</note>
|
||||
@@ -313,13 +313,13 @@
|
||||
</programlisting>
|
||||
</para>
|
||||
<para>
|
||||
If <application>repmgrd</application> is in use, it's worth double-checking that
|
||||
If &repmgrd; is in use, it's worth double-checking that
|
||||
all nodes are unpaused by executing <command><link linkend="repmgr-daemon-status">repmgr-daemon-status</link></command>.
|
||||
</para>
|
||||
|
||||
<note>
|
||||
<para>
|
||||
Users of &repmgr; versions prior to 4.2 will need to manually restart <application>repmgrd</application>
|
||||
Users of &repmgr; versions prior to 4.2 will need to manually restart &repmgrd;
|
||||
on all nodes after the switchover is completed.
|
||||
</para>
|
||||
</note>
|
||||
|
||||
@@ -24,7 +24,7 @@
|
||||
|
||||
<listitem>
|
||||
<simpara>
|
||||
the <application>repmgr</application> and <application>repmgrd</application> executables
|
||||
the <application>repmgr</application> and &repmgrd; executables
|
||||
</simpara>
|
||||
</listitem>
|
||||
|
||||
@@ -37,7 +37,7 @@
|
||||
|
||||
<listitem>
|
||||
<simpara>
|
||||
the shared library module used by <application>repmgrd</application> which
|
||||
the shared library module used by &repmgrd; which
|
||||
is resident in the PostgreSQL backend
|
||||
</simpara>
|
||||
</listitem>
|
||||
@@ -45,8 +45,8 @@
|
||||
</para>
|
||||
<para>
|
||||
With <emphasis>minor releases</emphasis>, usually changes are only made to the <application>repmgr</application>
|
||||
and <application>repmgrd</application> executables. In this case, the upgrade is quite straightforward,
|
||||
and is simply a case of installing the new version, and restarting <application>repmgrd</application>
|
||||
and &repmgrd; executables. In this case, the upgrade is quite straightforward,
|
||||
and is simply a case of installing the new version, and restarting &repmgrd;
|
||||
(if running).
|
||||
</para>
|
||||
|
||||
@@ -82,7 +82,7 @@
|
||||
|
||||
<listitem>
|
||||
<simpara>
|
||||
restart <application>repmgrd</application> on all nodes where it is running
|
||||
restart &repmgrd; on all nodes where it is running
|
||||
</simpara>
|
||||
</listitem>
|
||||
|
||||
@@ -93,7 +93,7 @@
|
||||
<note>
|
||||
<para>
|
||||
Some packaging systems (e.g. <link linkend="packages-debian-ubuntu">Debian/Ubuntu</link>
|
||||
may restart <application>repmgrd</application> as part of the package upgrade process.
|
||||
may restart &repmgrd; as part of the package upgrade process.
|
||||
</para>
|
||||
</note>
|
||||
|
||||
@@ -126,7 +126,7 @@
|
||||
<para>
|
||||
"major version" upgrades need to be planned more carefully, as they may include
|
||||
changes to the &repmgr; metadata (which need to be propagated from the primary to all
|
||||
standbys) and/or changes to the shared object file used by <application>repmgrd</application>
|
||||
standbys) and/or changes to the shared object file used by &repmgrd;
|
||||
(which require a PostgreSQL restart).
|
||||
</para>
|
||||
<para>
|
||||
@@ -138,14 +138,14 @@
|
||||
|
||||
<listitem>
|
||||
<simpara>
|
||||
Stop <application>repmgrd</application> (if in use) on all nodes where it is running.
|
||||
Stop &repmgrd; (if in use) on all nodes where it is running.
|
||||
</simpara>
|
||||
</listitem>
|
||||
|
||||
<listitem>
|
||||
<simpara>
|
||||
Disable the <application>repmgrd</application> service on all nodes where it is in use;
|
||||
this is to prevent packages from prematurely restarting <application>repmgrd</application>.
|
||||
Disable the &repmgrd; service on all nodes where it is in use;
|
||||
this is to prevent packages from prematurely restarting &repmgrd;.
|
||||
</simpara>
|
||||
</listitem>
|
||||
|
||||
@@ -167,12 +167,12 @@ systemctl daemon-reload</programlisting>
|
||||
<listitem>
|
||||
<simpara>
|
||||
If the &repmgr; shared library module has been updated (check the <link linkend="appendix-release-notes">release notes</link>!),
|
||||
restart PostgreSQL, then <application>repmgrd</application> (if in use) on each node,
|
||||
restart PostgreSQL, then &repmgrd; (if in use) on each node,
|
||||
The order in which this is applied to individual nodes is not critical,
|
||||
and it's also fine to restart PostgreSQL on all nodes first before starting <application>repmgrd</application>.
|
||||
and it's also fine to restart PostgreSQL on all nodes first before starting &repmgrd;.
|
||||
</simpara>
|
||||
<simpara>
|
||||
Note that if the upgrade requires a PostgreSQL restart, <application>repmgrd</application>
|
||||
Note that if the upgrade requires a PostgreSQL restart, &repmgrd;
|
||||
will only function correctly once all nodes have been restarted.
|
||||
</simpara>
|
||||
</listitem>
|
||||
@@ -188,7 +188,7 @@ ALTER EXTENSION repmgr UPDATE</programlisting>
|
||||
|
||||
<listitem>
|
||||
<simpara>
|
||||
Reenable the <application>repmgrd</application> service on all nodes where it is in use, and
|
||||
Reenable the &repmgrd; service on all nodes where it is in use, and
|
||||
ensure it is running.
|
||||
</simpara>
|
||||
</listitem>
|
||||
@@ -212,7 +212,7 @@ ALTER EXTENSION repmgr UPDATE</programlisting>
|
||||
<title>Checking repmgrd status after an upgrade</title>
|
||||
<para>
|
||||
From <link linkend="release-4.2">repmgr 4.2</link>, once the upgrade is complete, execute the <command><link linkend="repmgr-daemon-status">repmgr daemon status</link></command>
|
||||
command (on any node) to show an overview of the status of <application>repmgrd</application> on all nodes.
|
||||
command (on any node) to show an overview of the status of &repmgrd; on all nodes.
|
||||
</para>
|
||||
</sect2>
|
||||
</sect1>
|
||||
@@ -332,7 +332,7 @@ ALTER EXTENSION repmgr UPDATE</programlisting>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<simpara><varname>monitoring_history</varname>: this replaces the
|
||||
<application>repmgrd</application> command line option
|
||||
&repmgrd; command line option
|
||||
<literal>--monitoring-history</literal></simpara>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
@@ -433,7 +433,7 @@ ALTER EXTENSION repmgr UPDATE</programlisting>
|
||||
<sect2>
|
||||
<title>Upgrading the repmgr schema</title>
|
||||
<para>
|
||||
Ensure <application>repmgrd</application> is not running, or any cron jobs which execute the
|
||||
Ensure &repmgrd; is not running, or any cron jobs which execute the
|
||||
<command>repmgr</command> binary.
|
||||
</para>
|
||||
<para>
|
||||
@@ -499,7 +499,7 @@ ALTER EXTENSION repmgr UPDATE</programlisting>
|
||||
</para>
|
||||
<para>
|
||||
Check the data is updated as expected by examining the <structname>repmgr.nodes</structname>
|
||||
table; restart <application>repmgrd</application> if required.
|
||||
table; restart &repmgrd; if required.
|
||||
</para>
|
||||
<para>
|
||||
The original <literal>repmgr_$cluster</literal> schema can be dropped at any time.
|
||||
|
||||
Reference in New Issue
Block a user