mirror of
https://github.com/EnterpriseDB/repmgr.git
synced 2026-03-22 22:56:29 +00:00
* Added check for pg_checkpoint role presence This commit provides the needed infrastructure in `repmgr` so if the `repmgr` database user is a member of the `pg_checkpoint` role, and inherits its privileges, there is no need for such a user to be a superuser. Co-authored-by: Martín Marqués <martin.marques@enterprisedb.com>
450 lines
16 KiB
XML
450 lines
16 KiB
XML
<refentry id="repmgr-standby-switchover">
|
|
<indexterm>
|
|
<primary>repmgr standby switchover</primary>
|
|
</indexterm>
|
|
|
|
<refmeta>
|
|
<refentrytitle>repmgr standby switchover</refentrytitle>
|
|
</refmeta>
|
|
|
|
<refnamediv>
|
|
<refname>repmgr standby switchover</refname>
|
|
<refpurpose>promote a standby to primary and demote the existing primary to a standby</refpurpose>
|
|
</refnamediv>
|
|
|
|
|
|
<refsect1>
|
|
<title>Description</title>
|
|
|
|
<para>
|
|
Promotes a standby to primary and demotes the existing primary to a standby.
|
|
This command must be run on the standby to be promoted, and requires a
|
|
passwordless SSH connection to the current primary.
|
|
</para>
|
|
<para>
|
|
If other nodes are connected to the demotion candidate, &repmgr; can instruct
|
|
these to follow the new primary if the option <literal>--siblings-follow</literal>
|
|
is specified. This requires a passwordless SSH connection between the promotion
|
|
candidate (new primary) and the nodes attached to the demotion candidate
|
|
(existing primary). Note that a witness server, if in use, is also
|
|
counted as a "sibling node" as it needs to be instructed to
|
|
synchronise its metadata with the new primary.
|
|
</para>
|
|
<note>
|
|
<para>
|
|
Performing a switchover is a non-trivial operation. In particular it
|
|
relies on the current primary being able to shut down cleanly and quickly.
|
|
&repmgr; will attempt to check for potential issues but cannot guarantee
|
|
a successful switchover.
|
|
</para>
|
|
<para>
|
|
&repmgr; will refuse to perform the switchover if an exclusive backup is running on
|
|
the current primary, or if WAL replay is paused on the standby.
|
|
</para>
|
|
</note>
|
|
<para>
|
|
For more details on performing a switchover, including preparation and configuration,
|
|
see section <xref linkend="performing-switchover"/>.
|
|
</para>
|
|
|
|
<note>
|
|
<para>
|
|
From <link linkend="release-4.2">repmgr 4.2</link>, &repmgr; will instruct any running
|
|
&repmgrd; instances to pause operations while the switchover
|
|
is being carried out, to prevent &repmgrd; from
|
|
unintentionally promoting a node. For more details, see <xref linkend="repmgrd-pausing"/>.
|
|
</para>
|
|
<para>
|
|
Users of &repmgr; versions prior to 4.2 should ensure that &repmgrd;
|
|
is not running on any nodes while a switchover is being executed.
|
|
</para>
|
|
</note>
|
|
|
|
</refsect1>
|
|
|
|
<refsect1>
|
|
<title>User permission requirements</title>
|
|
<para><emphasis>data_directory</emphasis></para>
|
|
<para>
|
|
&repmgr; needs to be able to determine the location of the data directory on the
|
|
demotion candidate. If the &repmgr; is not a superuser or member of the <varname>pg_read_all_settings</varname>
|
|
<ulink url="https://www.postgresql.org/docs/current/predefined-roles.html">predefined roles</ulink>,
|
|
the name of a superuser should be provided with the <option>-S</option>/<option>--superuser</option> option.
|
|
</para>
|
|
<para><emphasis>CHECKPOINT</emphasis></para>
|
|
<para>
|
|
&repmgr; executes <command>CHECKPOINT</command> on the demotion candidate as part of the shutdown
|
|
process to ensure it shuts down as smoothly as possible.
|
|
</para>
|
|
<para>
|
|
Note that <command>CHECKPOINT</command> requires database superuser permissions to execute.
|
|
If the <literal>repmgr</literal> user is not a superuser, the name of a superuser should be
|
|
provided with the <option>-S</option>/<option>--superuser</option> option. From PostgreSQL 15 the <varname>pg_checkpoint</varname>
|
|
predefined role removes the need for superuser permissions to perform <command>CHECKPOINT</command> command.
|
|
</para>
|
|
<para>
|
|
If &repmgr; is unable to execute the <command>CHECKPOINT</command> command, the switchover
|
|
can still be carried out, albeit at a greater risk that the demotion candidate may not
|
|
be able to shut down as smoothly as might otherwise have been the case.
|
|
</para>
|
|
<para><emphasis>pg_promote() (PostgreSQL 12 and later)</emphasis></para>
|
|
<para>
|
|
From PostgreSQL 12, &repmgr; defaults to using the built-in <command>pg_promote()</command> function to
|
|
promote a standby to primary.
|
|
</para>
|
|
<para>
|
|
Note that execution of <function>pg_promote()</function> is restricted to superusers or to
|
|
any user who has been granted execution permission for this function. If the &repmgr; user
|
|
is not permitted to execute <function>pg_promote()</function>, &repmgr; will fall back to using
|
|
"<command>pg_ctl promote</command>". For more details see
|
|
<link linkend="repmgr-standby-promote">repmgr standby promote</link>.
|
|
</para>
|
|
</refsect1>
|
|
|
|
<refsect1>
|
|
|
|
<title>Options</title>
|
|
<variablelist>
|
|
|
|
<varlistentry>
|
|
<term><option>--always-promote</option></term>
|
|
<listitem>
|
|
<para>
|
|
Promote standby to primary, even if it is behind or has diverged
|
|
from the original primary. The original primary will be shut down in any case,
|
|
and will need to be manually reintegrated into the replication cluster.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><option>--dry-run</option></term>
|
|
<listitem>
|
|
<para>
|
|
Check prerequisites but don't actually execute a switchover.
|
|
</para>
|
|
<important>
|
|
<para>
|
|
Success of <option>--dry-run</option> does not imply the switchover will
|
|
complete successfully, only that
|
|
the prerequisites for performing the operation are met.
|
|
</para>
|
|
</important>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><option>-F</option></term>
|
|
<term><option>--force</option></term>
|
|
<listitem>
|
|
<para>
|
|
Ignore warnings and continue anyway.
|
|
</para>
|
|
<para>
|
|
Specifically, if a problem is encountered when shutting down the current primary,
|
|
using <option>-F/--force</option> will cause &repmgr; to continue by promoting
|
|
the standby to be the new primary, and if <option>--siblings-follow</option> is
|
|
specified, attach any other standbys to the new primary.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><option>--force-rewind[=/path/to/pg_rewind]</option></term>
|
|
<listitem>
|
|
<para>
|
|
Use <application>pg_rewind</application> to reintegrate the old primary if necessary
|
|
(and the prerequisites for using <application>pg_rewind</application> are met).
|
|
</para>
|
|
<para>
|
|
If using PostgreSQL 9.4, and the <application>pg_rewind</application>
|
|
binary is not installed in the PostgreSQL <filename>bin</filename> directory,
|
|
provide its full path. For more details see also <xref linkend="switchover-pg-rewind"/>
|
|
and <xref linkend="repmgr-node-rejoin-pg-rewind"/>.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><option>-R</option></term>
|
|
<term><option>--remote-user</option></term>
|
|
<listitem>
|
|
<para>
|
|
System username for remote SSH operations (defaults to local system user).
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><option>--repmgrd-no-pause</option></term>
|
|
<listitem>
|
|
<para>
|
|
Don't pause &repmgrd; while executing a switchover.
|
|
</para>
|
|
<para>
|
|
This option should not be used unless you take steps by other means
|
|
to ensure &repmgrd; is paused or not
|
|
running on all nodes.
|
|
</para>
|
|
<para>
|
|
This option cannot be used together with <option>--repmgrd-force-unpause</option>.
|
|
</para>
|
|
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><option>--repmgrd-force-unpause</option></term>
|
|
<listitem>
|
|
<para>
|
|
Always unpause all &repmgrd; instances after executing a switchover. This will ensure that
|
|
any &repmgrd; instances which were paused before the switchover will be
|
|
unpaused.
|
|
</para>
|
|
<para>
|
|
This option cannot be used together with <option>--repmgrd-no-pause</option>.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term><option>--siblings-follow</option></term>
|
|
<listitem>
|
|
<para>
|
|
Have nodes attached to the old primary follow the new primary.
|
|
</para>
|
|
<para>
|
|
This will also ensure that a witness node, if in use, is updated
|
|
with the new primary's data.
|
|
</para>
|
|
<note>
|
|
<para>
|
|
In a future &repmgr; release, <option>--siblings-follow</option> will be applied
|
|
by default.
|
|
</para>
|
|
</note>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><option>-S</option>/<option>--superuser</option></term>
|
|
<listitem>
|
|
<para>
|
|
Use the named superuser instead of the normal &repmgr; user to perform
|
|
actions requiring superuser permissions.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
</variablelist>
|
|
|
|
</refsect1>
|
|
|
|
<refsect1>
|
|
<title>Configuration file settings</title>
|
|
|
|
<para>
|
|
The following parameters in <filename>repmgr.conf</filename> are relevant to the
|
|
switchover operation:
|
|
</para>
|
|
|
|
<variablelist>
|
|
|
|
<varlistentry>
|
|
<term><option>replication_lag_critical</option></term>
|
|
<listitem>
|
|
|
|
<indexterm>
|
|
<primary>replication_lag_critical</primary>
|
|
<secondary>with "repmgr standby switchover"</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
If replication lag (in seconds) on the standby exceeds this value, the
|
|
switchover will be aborted (unless the <literal>-F/--force</literal> option
|
|
is provided)
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
|
|
<term><option>shutdown_check_timeout</option></term>
|
|
<listitem>
|
|
<indexterm>
|
|
<primary>shutdown_check_timeout</primary>
|
|
<secondary>with "repmgr standby switchover"</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
The maximum number of seconds to wait for the
|
|
demotion candidate (current primary) to shut down, before aborting the switchover.
|
|
</para>
|
|
<para>
|
|
Note that this parameter is set on the node where <command>repmgr standby switchover</command>
|
|
is executed (promotion candidate); setting it on the demotion candidate (former primary) will
|
|
have no effect.
|
|
</para>
|
|
<note>
|
|
<para>
|
|
In versions prior to <link linkend="release-4.2">&repmgr; 4.2</link>, <command>repmgr standby switchover</command> would
|
|
use the values defined in <literal>reconnect_attempts</literal> and <literal>reconnect_interval</literal>
|
|
to determine the timeout for demotion candidate shutdown.
|
|
</para>
|
|
</note>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
<varlistentry>
|
|
<term><option>wal_receive_check_timeout</option></term>
|
|
<listitem>
|
|
<indexterm>
|
|
<primary>wal_receive_check_timeout</primary>
|
|
<secondary>with "repmgr standby switchover"</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
After the primary has shut down, the maximum number of seconds to wait for the
|
|
walreceiver on the standby to flush WAL to disk before comparing WAL receive location
|
|
with the primary's shut down location.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
|
|
<varlistentry>
|
|
|
|
<term><option>standby_reconnect_timeout</option></term>
|
|
<listitem>
|
|
<indexterm>
|
|
<primary>standby_reconnect_timeout</primary>
|
|
<secondary>with "repmgr standby switchover"</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
The maximum number of seconds to attempt to wait for the demotion candidate (former primary)
|
|
to reconnect to the promoted primary (default: 60 seconds)
|
|
</para>
|
|
<para>
|
|
Note that this parameter is set on the node where <command>repmgr standby switchover</command>
|
|
is executed (promotion candidate); setting it on the demotion candidate (former primary) will
|
|
have no effect.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
|
|
<term><option>node_rejoin_timeout</option></term>
|
|
<listitem>
|
|
|
|
<indexterm>
|
|
<primary>node_rejoin_timeout</primary>
|
|
<secondary>with "repmgr standby switchover"</secondary>
|
|
</indexterm>
|
|
|
|
<para>
|
|
maximum number of seconds to attempt to wait for the demotion candidate (former primary)
|
|
to reconnect to the promoted primary (default: 60 seconds)
|
|
</para>
|
|
<para>
|
|
Note that this parameter is set on the the demotion candidate (former primary);
|
|
setting it on the node where <command>repmgr standby switchover</command> is
|
|
executed will have no effect.
|
|
</para>
|
|
<para>
|
|
However, this value <emphasis>must</emphasis> be less than <option>standby_reconnect_timeout</option> on the
|
|
promotion candidate (the node where <command>repmgr standby switchover</command> is executed).
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
</variablelist>
|
|
|
|
</refsect1>
|
|
|
|
|
|
<refsect1>
|
|
<title>Execution</title>
|
|
|
|
<para>
|
|
Execute with the <literal>--dry-run</literal> option to test the switchover as far as
|
|
possible without actually changing the status of either node.
|
|
</para>
|
|
|
|
<para>
|
|
External database connections, e.g. from an application, should not be permitted while
|
|
the switchover is taking place. In particular, active transactions on the primary
|
|
can potentially disrupt the shutdown process.
|
|
</para>
|
|
</refsect1>
|
|
|
|
<refsect1 id="repmgr-standby-switchover-events">
|
|
<title>Event notifications</title>
|
|
<para>
|
|
<literal>standby_switchover</literal> and <literal>standby_promote</literal>
|
|
<link linkend="event-notifications">event notifications</link> will be generated for the new primary,
|
|
and a <literal>node_rejoin</literal> event notification for the former primary (new standby).
|
|
</para>
|
|
<para>
|
|
If using an event notification script, <literal>standby_switchover</literal>
|
|
will populate the placeholder parameter <literal>%p</literal> with the node ID of
|
|
the former primary.
|
|
</para>
|
|
</refsect1>
|
|
|
|
<refsect1>
|
|
<title>Exit codes</title>
|
|
<para>
|
|
One of the following exit codes will be emitted by <command>repmgr standby switchover</command>:
|
|
</para>
|
|
<variablelist>
|
|
|
|
<varlistentry>
|
|
<term><option>SUCCESS (0)</option></term>
|
|
<listitem>
|
|
<para>
|
|
The switchover completed successfully; or if <option>--dry-run</option> was provided,
|
|
no issues were detected which would prevent the switchover operation.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><option>ERR_SWITCHOVER_FAIL (18)</option></term>
|
|
<listitem>
|
|
<para>
|
|
The switchover could not be executed.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><option>ERR_SWITCHOVER_INCOMPLETE (22)</option></term>
|
|
<listitem>
|
|
<para>
|
|
The switchover was executed but a problem was encountered.
|
|
Typically this means the former primary could not be reattached
|
|
as a standby. Check preceding log messages for more information.
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
</variablelist>
|
|
</refsect1>
|
|
|
|
<refsect1>
|
|
<title>See also</title>
|
|
<para>
|
|
<xref linkend="repmgr-standby-follow"/>, <xref linkend="repmgr-node-rejoin"/>
|
|
</para>
|
|
<para>
|
|
For more details on performing a switchover operation, see the section <xref linkend="performing-switchover"/>.
|
|
</para>
|
|
</refsect1>
|
|
|
|
</refentry>
|