note that "standby follow" requires a primary to be available

While it's technically possible to have a standby follow another
standby while the primary is not available, repmgr will not be able
to update its metadata, which will cause Confusion and Chaos.

Update the documentation to make this clear, and provide a more helpful
error message if this situation occurs. The operation previously
failed anyway, but with an unhelpful message about not being able to
find a node record.
Ian Barwick
2019-06-11 15:14:17 +09:00
parent 3469152314
commit 09979eaa91
4 changed files with 66 additions and 41 deletions

@@ -24,6 +24,7 @@
 repmgr: ensure BDR2-specific functionality cannot be used on
 BDR3 and later (Ian)
 repmgr: canonicalize the data directory path (Ian)
+repmgr: note that "standby follow" requires a primary to be available (Ian)
 repmgrd: monitor standbys attached to primary (Ian)
 repmgrd: add "primary visibility consensus" functionality (Ian)
 repmgrd: fix memory leak which occurs while the monitored PostgreSQL

@@ -43,6 +43,14 @@
 </para>
 </listitem>
+<listitem>
+<para>
+<link linkend="repmgr-standby-follow"><command>repmgr standby follow</command></link>:
+note that an active, reachable cluster primary is required for this command;
+and provide a more helpful error message if no reachable primary could be found.
+</para>
+</listitem>
 <listitem>
 <para>
 &repmgr;: when executing <link linkend="repmgr-standby-switchover"><command>repmgr standby switchover</command></link>,
@@ -75,7 +83,6 @@
 </para>
 </listitem>
 <listitem>
 <para>
 <link linkend="repmgr-standby-promote"><command>repmgr standby promote</command></link>:

@@ -20,49 +20,54 @@
 (&quot;follow target&quot;). Typically this will be the primary, but this
 command can also be used to attach the standby to another standby.
 </para>
 <para>
-This command requires a valid
-<filename>repmgr.conf</filename> file for the standby, either specified
-explicitly with <literal>-f/--config-file</literal> or located in a
+This command requires a valid <filename>repmgr.conf</filename> file for the standby,
+either specified explicitly with <literal>-f/--config-file</literal> or located in a
 default location; no additional arguments are required.
 </para>
-<para>
-By default &repmgr; will attempt to attach the standby to the current primary.
-If <option>--upstream-node-id</option> is provided, &repmgr; will attempt
-to attach the standby to the specified node, which can be another standby.
-</para>
-<para>
-This command will force a restart of the standby server, which must be
-running.
+<para>The standby node (&quot;follow candidate&quot;) <emphasis>must</emphasis>
+be running. If the new upstream (&quot;follow target&quot;) is not the primary,
+the cluster primary <emphasis>must</emphasis> be running and accessible from the
+standby node.
 </para>
 <tip>
 <para>
 To re-add an inactive node to the replication cluster, use
 <xref linkend="repmgr-node-rejoin"/>.
 </para>
 </tip>
 <para>
-<command>repmgr standby follow</command> will wait up to
-<varname>standby_follow_timeout</varname> seconds (default: <literal>30</literal>)
-to verify the standby has actually connected to the new upstream node.
+By default &repmgr; will attempt to attach the standby to the current primary.
+If <option>--upstream-node-id</option> is provided, &repmgr; will attempt
+to attach the standby to the specified node, which can be another standby.
 </para>
-<note>
-<para>
-If <option>recovery_min_apply_delay</option> is set for the standby, it
-will not attach to the new upstream node until it has replayed available
-WAL.
-</para>
-<para>
-Conversely, if the standby is attached to an upstream standby
-which has <option>recovery_min_apply_delay</option> set, the upstream
-standby's replay state may actually be behind that of its new downstream node.
-</para>
-</note>
+<para>
+This command will force a restart of PostgreSQL on the standby node.
+</para>
+
+<para>
+<command>repmgr standby follow</command> will wait up to
+<varname>standby_follow_timeout</varname> seconds (default: <literal>30</literal>)
+to verify the standby has actually connected to the new upstream node.
+</para>
+
+<note>
+<para>
+If <option>recovery_min_apply_delay</option> is set for the standby, it
+will not attach to the new upstream node until it has replayed available
+WAL.
+</para>
+<para>
+Conversely, if the standby is attached to an upstream standby
+which has <option>recovery_min_apply_delay</option> set, the upstream
+standby's replay state may actually be behind that of its new downstream node.
+</para>
+</note>
 </refsect1>
@@ -124,7 +129,7 @@
 <para>
 Note that when using &repmgrd;, <option>--upstream-node-id</option>
 should always be configured;
 see <link linkend="repmgrd-automatic-failover-configuration">Automatic failover configuration</link>
 for details.
 </para>
 </listitem>

@@ -2784,12 +2784,6 @@ do_standby_follow(void)
     PQfinish(local_conn);
-    if (runtime_options.dry_run == true)
-    {
-        log_info(_("prerequisites for executing STANDBY FOLLOW are met"));
-        exit(SUCCESS);
-    }
     /*
      * Here we'll need a connection to the primary, if the upstream is not a primary.
      */
@@ -2802,12 +2796,30 @@ do_standby_follow(void)
         primary_conn = get_primary_connection_quiet(follow_target_conn,
                                                     &primary_node_id,
                                                     NULL);
+        /*
+         * If follow target is not primary and no other primary could be found,
+         * abort because we won't be able to update the node record.
+         */
+        if (PQstatus(primary_conn) != CONNECTION_OK)
+        {
+            log_error(_("unable to determine the cluster primary"));
+            log_detail(_("an active primary node is required for \"repmgr standby follow\""));
+            PQfinish(follow_target_conn);
+            exit(ERR_FOLLOW_FAIL);
+        }
     }
     else
     {
         primary_conn = follow_target_conn;
     }
+    if (runtime_options.dry_run == true)
+    {
+        log_info(_("prerequisites for executing STANDBY FOLLOW are met"));
+        exit(SUCCESS);
+    }
     initPQExpBuffer(&follow_output);
     success = do_standby_follow_internal(
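The reordered checks in do_standby_follow() can be modelled in isolation. What follows is a minimal sketch under stated assumptions, not repmgr code: the names `follow_result`, `follow_context` and `check_follow_prerequisites()` are hypothetical, the libpq connection lookup is reduced to a `primary_reachable` flag, and log_error()/log_detail()/log_info() are replaced by plain fprintf()/printf(). It shows the two behavioural points of this commit: following a non-primary node now fails early when no primary can be found, and `--dry-run` reports success only after that check has passed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical outcomes standing in for the exits in do_standby_follow():
 * exit(SUCCESS) after a dry run, exit(ERR_FOLLOW_FAIL) when no primary is
 * found, or carrying on to do_standby_follow_internal().
 */
enum follow_result
{
    FOLLOW_DRY_RUN_OK,
    FOLLOW_FAIL_NO_PRIMARY,
    FOLLOW_PROCEED
};

/*
 * Hypothetical summary of what is known at this point; the real code
 * inspects libpq connection status (PQstatus) rather than booleans.
 */
struct follow_context
{
    bool target_is_primary;   /* the follow target is itself the primary */
    bool primary_reachable;   /* another primary was found and connected to */
    bool dry_run;             /* --dry-run was supplied */
};

enum follow_result
check_follow_prerequisites(const struct follow_context *ctx)
{
    assert(ctx != NULL);

    /*
     * Following a non-primary node: the node record can only be updated
     * via the primary, so give up early if none is reachable.
     */
    if (!ctx->target_is_primary && !ctx->primary_reachable)
    {
        fprintf(stderr, "ERROR: unable to determine the cluster primary\n");
        fprintf(stderr, "DETAIL: an active primary node is required for \"repmgr standby follow\"\n");
        return FOLLOW_FAIL_NO_PRIMARY;
    }

    /* A dry run reports success only once the primary check has passed. */
    if (ctx->dry_run)
    {
        printf("INFO: prerequisites for executing STANDBY FOLLOW are met\n");
        return FOLLOW_DRY_RUN_OK;
    }

    return FOLLOW_PROCEED;
}
```

For example, a context of `{ false, false, true }` (follow target is a standby, no reachable primary, `--dry-run` given) yields `FOLLOW_FAIL_NO_PRIMARY`. In the code before this commit the dry-run exit came before the primary lookup, so the same situation would have reported that the prerequisites were met.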