standby switchover: improve handling of node rejoin failure

Explicitly check whether the "repmgr node rejoin" command on the demotion candidate succeeded. Due to the way SSH execution is currently implemented, we can return either the command's exit status or its output, but not both. To ensure any errors remain available, they are logged to a temporary file on the demotion candidate, and the file's location is noted if an error occurs. While we're at it, improve error message handling when the demotion candidate fails to rejoin.
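
As a rough illustration of the approach described above, here is a minimal Python sketch. It is not repmgr's actual implementation: repmgr is written in C and runs the command over SSH, whereas this sketch runs a local command as a stand-in. The function names `run_and_log` and `describe_outcome` and the log file prefix are hypothetical. The idea shown is that the caller receives only the exit status, while the command's output goes to a temporary file whose path is reported on failure.

```python
import subprocess
import tempfile


def run_and_log(argv):
    """Run argv with stdout and stderr redirected to a temporary file.

    Returns (exit_status, log_path). Hypothetical helper: a local
    stand-in for a remote command where only the exit status can be
    returned to the caller.
    """
    # delete=False keeps the file so it can be inspected after a failure.
    with tempfile.NamedTemporaryFile(
        mode="w", prefix="rejoin-sketch.", suffix=".log", delete=False
    ) as log:
        result = subprocess.run(argv, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode, log.name


def describe_outcome(status, log_path):
    """Build a message for the operator (wording is an assumption)."""
    if status == 0:
        return "command succeeded"
    # On failure, point at the log file rather than waiting for a timeout.
    return f"command failed with exit status {status}; see {log_path} for details"
```

A caller would check the returned status immediately and, if it is non-zero, report the log file location instead of waiting for the node to attach.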
2026-06-01 03:39:05 +00:00 · 2021-07-28 11:42:08 +09:00
parent 55efbe60ea
commit 5f1ba6db3d
4 changed files with 159 additions and 60 deletions
@@ -30,6 +30,21 @@
      <title>Improvements</title>
      <para>
        <itemizedlist>
+          <listitem>
+            <para>
+              <link linkend="repmgr-standby-switchover"><command>repmgr standby switchover</command></link>:
+              Improve handling of node rejoin failure on the demotion candidate.
+            </para>
+            <para>
+              Previously &repmgr; did not check whether <command>repmgr node rejoin</command> actually
+              succeeded on the demotion candidate, and would always wait up to <varname>node_rejoin_timeout</varname>
+              seconds for it to attach to the promotion candidate, even if this would never happen.
+            </para>
+            <para>
+              This change makes it easier to identify unexpected events during a switchover operation, such as
+              the demotion candidate being restarted by an external process.
+            </para>
+          </listitem>
          <listitem>
            <para>
              &repmgrd;: at startup, if node record is marked as "inactive", attempt