mirror of
https://github.com/EnterpriseDB/repmgr.git
synced 2026-03-24 23:56:29 +00:00
"standby switchover": avoid potential race condition with WAL location check
Immediately after the demotion candidate (primary) has shut down, we can't be absolutely sure that the walreceiver has flushed all WAL to disk, so checking pg_last_wal_receive_lsn() at that point might not reflect the actual last available WAL location. To handle this, we'll loop for a while (timeout controlled by configuration parameter "wal_receive_check_timeout") before finally deciding whether the standby is still behind the shut-down primary. Addresses issue raised in GitHub #518.
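The commit message describes a wait-then-compare loop. Below is a minimal C sketch of that idea (C being repmgr's implementation language), not repmgr's actual code. The names `parse_lsn`, `wait_for_wal_receive`, the callback typedefs and the `fake_standby` simulation are all hypothetical, and the one-second poll interval is an assumption. The sketch treats the standby as caught up once its receive location, as text from `pg_last_wal_receive_lsn()`, is at or past the primary's shutdown location. `timeout_secs` stands in for `wal_receive_check_timeout`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Parse a textual LSN such as "0/3000148" (the text form of PostgreSQL's
 * pg_lsn type: two hexadecimal halves separated by '/') into 64 bits. */
static bool parse_lsn(const char *text, uint64_t *lsn)
{
    unsigned int hi, lo;
    char trailing;

    /* Require exactly two hex fields and nothing after them. */
    if (sscanf(text, "%X/%X%c", &hi, &lo, &trailing) != 2)
        return false;
    *lsn = ((uint64_t) hi << 32) | lo;
    return true;
}

/* Hypothetical callback returning the standby's receive location as text,
 * e.g. the result of "SELECT pg_last_wal_receive_lsn()"; NULL on error. */
typedef const char *(*receive_lsn_fn)(void *ctx);

/* Same signature as POSIX sleep(3); injectable so the loop can be
 * exercised without real delays. */
typedef unsigned int (*sleep_fn)(unsigned int secs);

/* Poll once per second (assumed interval) for up to timeout_secs seconds.
 * Returns true as soon as the receive location is at or past shutdown_lsn,
 * false if the standby is still behind when the timeout expires. */
static bool wait_for_wal_receive(receive_lsn_fn get_lsn, void *ctx,
                                 uint64_t shutdown_lsn, int timeout_secs,
                                 sleep_fn do_sleep)
{
    for (int elapsed = 0;; elapsed++)
    {
        const char *text = get_lsn(ctx);
        uint64_t received;

        if (text != NULL && parse_lsn(text, &received) && received >= shutdown_lsn)
            return true;  /* walreceiver has caught up */
        if (elapsed >= timeout_secs)
            return false; /* timeout expired: standby is still behind */
        do_sleep(1);
    }
}

/* ---- Illustration only: a fake standby and a no-op sleep ---- */

/* Returns successive receive locations, as if the walreceiver were still
 * flushing WAL; repeats the last value once the list is exhausted. */
struct fake_standby
{
    const char *const *lsns;
    int count;
    int next;
};

static const char *fake_receive_lsn(void *ctx)
{
    struct fake_standby *s = ctx;

    assert(s->count > 0);
    if (s->next < s->count)
        return s->lsns[s->next++];
    return s->lsns[s->count - 1];
}

static unsigned int no_sleep(unsigned int secs)
{
    (void) secs;
    return 0;
}
```

Here is how the sketch behaves with the fake standby:

- **Catches up:** the standby reports `0/2FFFF00`, then `0/3000000`, then `0/3000148`. Against a shutdown location of `0x3000148`, the call returns `true` on the third poll.
- **Times out:** with a shutdown location of `0x3000200` and a 2-second timeout, it returns `false` after three polls.

In real use, POSIX `sleep` can be passed directly as `do_sleep`, since its signature matches `sleep_fn`.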
@@ -105,6 +105,15 @@
 </para>
 </listitem>

+<listitem>
+<para>
+Add check <link linkend="repmgr-standby-switchover"><command>repmgr standby switchover</command></link>
+when comparing received WAL on the standby to the primary's shutdown location to avoid a potential
+race condition if the standby's walreceiver has not yet flushed all received WAL to disk.
+GitHub #518.
+</para>
+</listitem>
+
 </itemizedlist>
 </para>
 </sect2>
@@ -168,20 +168,6 @@

 <variablelist>

-<varlistentry>
-<indexterm>
-<primary>x</primary>
-<secondary>with "repmgr standby switchover "</secondary>
-</indexterm>
-
-<term><option></option></term>
-<listitem>
-<para>
-</para>
-</listitem>
-</varlistentry>
-
-
 <varlistentry>
 <indexterm>
 <primary>replication_lag_critical</primary>
@@ -207,7 +193,7 @@
 <term><option>shutdown_check_timeout</option></term>
 <listitem>
 <para>
-maximum number of seconds to wait for the
+The maximum number of seconds to wait for the
 demotion candidate (current primary) to shut down, before aborting the switchover.
 </para>
 <para>
@@ -225,7 +211,25 @@
 </listitem>
 </varlistentry>

-<varlistentry>
+
+<varlistentry>
+<indexterm>
+<primary>wal_receive_check_timeout</primary>
+<secondary>with "repmgr standby switchover "</secondary>
+</indexterm>
+
+<term><option>wal_receive_check_timeout</option></term>
+<listitem>
+<para>
+After the primary has shut down, the maximum number of seconds to wait for the
+walreceiver on the standby to flush WAL to disk before comparing WAL receive location
+with the primary's shut down location.
+</para>
+</listitem>
+</varlistentry>
+
+
+<varlistentry>
 <indexterm>
 <primary>standby_reconnect_timeout</primary>
 <secondary>with "repmgr standby switchover "</secondary>
@@ -234,8 +238,8 @@
 <term><option>standby_reconnect_timeout</option></term>
 <listitem>
 <para>
-maximum number of seconds to attempt to wait for the demotion candidate (former primary)
-to reconnect to the promoted primary (default: 60 seconds)
+The maximum number of seconds to attempt to wait for the demotion candidate (former primary)
+to reconnect to the promoted primary (default: 60 seconds)
 </para>
 <para>
 Note that this parameter is set on the node where <command>repmgr standby switchover</command>
@@ -245,7 +249,6 @@
 </listitem>
 </varlistentry>

-
 <varlistentry>
 <indexterm>
 <primary>node_rejoin_timeout</primary>
@@ -265,7 +268,7 @@
 </para>
 <para>
 However, this value <emphasis>must</emphasis> be less than <option>standby_reconnect_timeout</option> on the
-promotion candidate (node where <command>repmgr standby switchover</command> is executed).
+promotion candidate (the node where <command>repmgr standby switchover</command> is executed).
 </para>
 </listitem>
 </varlistentry>