mirror of
https://github.com/EnterpriseDB/repmgr.git
synced 2026-03-23 07:06:30 +00:00
Currently the (very generic sounding) "standby_reconnect_timeout" configuration file parameter is used in several different contexts and it would be useful to have more granular control over the different timeouts it's used to configure. This patch introduces "node_rejoin_timeout", used in place of "standby_reconnect_timeout" (which wasn't documented) when "repmgr node rejoin" is executed, to determine how long to wait for the node to rejoin the replication cluster. Additionally "repmgrd_standby_startup_timeout" is introduced as a timeout for failover situations, when repmgrd executes "repmgr standby follow" to follow a new primary, and waits for the standby to restart and become available for connections. "standby_reconnect_timeout" is now only relevant for "repmgr standby switchover". Implements GitHub #454.
251 lines
9.1 KiB
Plaintext
251 lines
9.1 KiB
Plaintext
<refentry id="repmgr-node-rejoin">
|
|
|
|
<indexterm>
|
|
<primary>repmgr node rejoin</primary>
|
|
</indexterm>
|
|
|
|
<refmeta>
|
|
<refentrytitle>repmgr node rejoin</refentrytitle>
|
|
</refmeta>
|
|
|
|
<refnamediv>
|
|
<refname>repmgr node rejoin</refname>
|
|
<refpurpose>rejoin a dormant (stopped) node to the replication cluster</refpurpose>
|
|
</refnamediv>
|
|
|
|
<refsect1>
|
|
<title>Description</title>
|
|
<para>
|
|
Enables a dormant (stopped) node to be rejoined to the replication cluster.
|
|
</para>
|
|
<para>
|
|
This can optionally use <application>pg_rewind</application> to re-integrate
|
|
a node which has diverged from the rest of the cluster, typically a failed primary.
|
|
</para>
|
|
|
|
<tip>
|
|
<para>
|
|
If the node is running and needs to be attached to the current primary, use
|
|
<xref linkend="repmgr-standby-follow">.
|
|
</para>
|
|
</tip>
|
|
</refsect1>
|
|
|
|
|
|
<refsect1>
|
|
<title>Usage</title>
|
|
|
|
<para>
|
|
<programlisting>
|
|
repmgr node rejoin -d '$conninfo'</programlisting>
|
|
|
|
where <literal>$conninfo</literal> is the conninfo string of any reachable node in the cluster.
|
|
<filename>repmgr.conf</filename> for the stopped node *must* be supplied explicitly if not
|
|
otherwise available.
|
|
</para>
|
|
</refsect1>
|
|
|
|
<refsect1>
|
|
|
|
<title>Options</title>
|
|
<variablelist>
|
|
|
|
<varlistentry>
|
|
<term><option>--dry-run</option></term>
|
|
<listitem>
|
|
<para>
|
|
Check prerequisites but don't actually execute the rejoin.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><option>--force-rewind[=/path/to/pg_rewind]</option></term>
|
|
<listitem>
|
|
<para>
|
|
Execute <application>pg_rewind</application> if necessary.
|
|
</para>
|
|
<para>
|
|
It is only necessary to provide the <application>pg_rewind</application>
|
|
if using PostgreSQL 9.3 or 9.4, and <application>pg_rewind</application>
|
|
is not installed in the PostgreSQL <filename>bin</filename> directory.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><option>--config-files</option></term>
|
|
<listitem>
|
|
<para>
|
|
comma-separated list of configuration files to retain after
|
|
executing <application>pg_rewind</application>.
|
|
</para>
|
|
<para>
|
|
Currently <application>pg_rewind</application> will overwrite
|
|
the local node's configuration files with the files from the source node,
|
|
so it's advisable to use this option to ensure they are kept.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
<varlistentry>
|
|
<term><option>--config-archive-dir</option></term>
|
|
<listitem>
|
|
<para>
|
|
Directory to temporarily store configuration files specified with
|
|
<option>--config-files</option>; default: <filename>/tmp</filename>.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
<varlistentry>
|
|
<term><option>-W/--no-wait</option></term>
|
|
<listitem>
|
|
<para>
|
|
Don't wait for the node to rejoin cluster.
|
|
</para>
|
|
<para>
|
|
If this option is supplied, &repmgr; will restart the node but
|
|
not wait for it to connect to the primary.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
</variablelist>
|
|
</refsect1>
|
|
<refsect1>
|
|
<title>Configuration file settings</title>
|
|
|
|
<para>
|
|
<itemizedlist spacing="compact" mark="bullet">
|
|
<listitem>
|
|
<simpara>
|
|
<literal>node_rejoin_timeout</literal>:
|
|
the maximum length of time (in seconds) to wait for
|
|
the node to reconnect to the replication cluster (defaults to
|
|
the value set in <literal>standby_reconnect_timeout</literal>,
|
|
60 seconds).
|
|
</simpara>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
|
|
|
|
</refsect1>
|
|
<refsect1>
|
|
<title>Event notifications</title>
|
|
<para>
|
|
A <literal>node_rejoin</literal> <link linkend="event-notifications">event notification</link> will be generated.
|
|
</para>
|
|
</refsect1>
|
|
|
|
<refsect1>
|
|
<title>Notes</title>
|
|
<para>
|
|
Currently <command>repmgr node rejoin</command> can only be used to attach
|
|
a standby to the current primary, not another standby.
|
|
</para>
|
|
<para>
|
|
The node must have been shut down cleanly; if this was not the case, it will
|
|
need to be manually started (remove any existing <filename>recovery.conf</filename> file first)
|
|
until it has reached a consistent recovery point, then shut down cleanly.
|
|
</para>
|
|
<tip>
|
|
<para>
|
|
If <application>PostgreSQL</application> is started in single-user mode and
|
|
input is directed from <filename>/dev/null/</filename>, it will perform recovery
|
|
then immediately quit, and will then be in a state suitable for use by
|
|
<application>pg_rewind</application>.
|
|
<programlisting>
|
|
rm -f /var/lib/pgsql/data/recovery.conf
|
|
postgres --single -D /var/lib/pgsql/data/ < /dev/null</programlisting>
|
|
</para>
|
|
</tip>
|
|
</refsect1>
|
|
|
|
<refsect1 id="repmgr-node-rejoin-pg-rewind" xreflabel="Using pg_rewind">
|
|
|
|
<indexterm>
|
|
<primary>pg_rewind</primary>
|
|
<secondary>using with "repmgr node rejoin"</secondary>
|
|
</indexterm>
|
|
|
|
<title>Using <command>pg_rewind</command></title>
|
|
<para>
|
|
<command>repmgr node rejoin</command> can optionally use <command>pg_rewind</command> to re-integrate a
|
|
node which has diverged from the rest of the cluster, typically a failed primary.
|
|
<command>pg_rewind</command> is available in PostgreSQL 9.5 and later as part of the core distribution,
|
|
and can be installed from external sources for PostgreSQL 9.3 and 9.4.
|
|
</para>
|
|
<note>
|
|
<para>
|
|
<command>pg_rewind</command> <emphasis>requires</emphasis> that either
|
|
<varname>wal_log_hints</varname> is enabled, or that
|
|
data checksums were enabled when the cluster was initialized. See the
|
|
<ulink url="https://www.postgresql.org/docs/current/static/app-pgrewind.html"><command>pg_rewind</command> documentation</ulink> for details.
|
|
</para>
|
|
</note>
|
|
|
|
<para>
|
|
To have <command>repmgr node rejoin</command> use <command>pg_rewind</command> if required,
|
|
pass the command line option <literal>--force-rewind</literal>, which will tell &repmgr;
|
|
to execute <command>pg_rewind</command> to ensure the node can be rejoined successfully.
|
|
</para>
|
|
|
|
<para>
|
|
Be aware that if <command>pg_rewind</command> is executed and actually performs a
|
|
rewind operation, any configuration files in the PostgreSQL data directory will be
|
|
overwritten with those from the source server.
|
|
</para>
|
|
<para>
|
|
To prevent this happening, provide a comma-separated list of files to retain
|
|
using the <literal>--config-file</literal> command line option; the specified files
|
|
will be archived in a temporary directory (whose parent directory can be specified with
|
|
<literal>--config-archive-dir</literal>) and restored once the rewind operation is
|
|
complete.
|
|
</para>
|
|
|
|
<para>
|
|
Example, first using <literal>--dry-run</literal>, then actually executing the
|
|
<literal>node rejoin command</literal>.
|
|
<programlisting>
|
|
$ repmgr node rejoin -f /etc/repmgr.conf -d 'host=node1 dbname=repmgr user=repmgr' \
|
|
--force-rewind --config-files=postgresql.local.conf,postgresql.conf --verbose --dry-run
|
|
NOTICE: using provided configuration file "/etc/repmgr.conf"
|
|
INFO: prerequisites for using pg_rewind are met
|
|
INFO: file "postgresql.local.conf" would be copied to "/tmp/repmgr-config-archive-node1/postgresql.local.conf"
|
|
INFO: file "postgresql.conf" would be copied to "/tmp/repmgr-config-archive-node1/postgresql.local.conf"
|
|
INFO: 2 files would have been copied to "/tmp/repmgr-config-archive-node1"
|
|
INFO: directory "/tmp/repmgr-config-archive-node1" deleted
|
|
INFO: pg_rewind would now be executed
|
|
DETAIL: pg_rewind command is:
|
|
pg_rewind -D '/var/lib/postgresql/data' --source-server='host=node1 dbname=repmgr user=repmgr'</programlisting>
|
|
<programlisting>
|
|
$ repmgr node rejoin -f /etc/repmgr.conf -d 'host=node1 dbname=repmgr user=repmgr' \
|
|
--force-rewind --config-files=postgresql.local.conf,postgresql.conf --verbose
|
|
NOTICE: using provided configuration file "/etc/repmgr.conf"
|
|
INFO: prerequisites for using pg_rewind are met
|
|
INFO: 2 files copied to "/tmp/repmgr-config-archive-node1"
|
|
NOTICE: executing pg_rewind
|
|
NOTICE: 2 files copied to /var/lib/pgsql/data
|
|
INFO: directory "/tmp/repmgr-config-archive-node1" deleted
|
|
INFO: deleting "recovery.done"
|
|
INFO: setting node 1's primary to node 2
|
|
NOTICE: starting server using "pg_ctl-l /var/log/postgres/startup.log -w -D '/var/lib/pgsql/data' start"
|
|
waiting for server to start.... done
|
|
server started
|
|
NOTICE: NODE REJOIN successful
|
|
DETAIL: node 1 is now attached to node 2</programlisting>
|
|
</para>
|
|
|
|
</refsect1>
|
|
|
|
<refsect1>
|
|
<title>See also</title>
|
|
<para>
|
|
<xref linkend="repmgr-standby-follow">
|
|
</para>
|
|
</refsect1>
|
|
</refentry>
|