Mirror of https://github.com/EnterpriseDB/repmgr.git (synced 2026-03-26 16:46:28 +00:00)
"standby switchover": improve log messages and add new exit code
Previously, if an issue was encountered with the old primary but the user provided -F/--force to have repmgr promote the standby anyway, repmgr would exit with the log message "STANDBY SWITCHOVER is complete" and exit code 0 (SUCCESS). To better report this partial completion, repmgr now emits the message "STANDBY SWITCHOVER has completed with issues" (plus a HINT to check the preceding log messages) and exits with the new exit code 22 (ERR_SWITCHOVER_INCOMPLETE).
HISTORY
@@ -1,8 +1,10 @@
 4.0.3 2018-02-
 repmgr: improve switchover handling when "pg_ctl" used to control the
 server and logging output is not explicitly redirected (Ian)
+repmgr: improve switchover log messages and exit code when old primary could
+not be shut down cleanly (Ian)
 
-4.0.2 2018-01-
+4.0.2 2018-01-18
 repmgr: add missing -W option to getopt_long() invocation; GitHub #350 (Ian)
 repmgr: automatically create slot name if missing; GitHub #343 (Ian)
 repmgr: fixes to parsing output of remote repmgr invocations; GitHub #349 (Ian)
@@ -22,9 +22,17 @@
 </para>
 <para>
 If other standbys are connected to the demotion candidate, &repmgr; can instruct
 these to follow the new primary if the option <literal>--siblings-follow</literal>
 is specified.
 </para>
+<note>
+<para>
+Performing a switchover is a non-trivial operation. In particular it
+relies on the current primary being able to shut down cleanly and quickly.
+&repmgr; will attempt to check for potential issues but cannot guarantee
+a successful switchover.
+</para>
+</note>
 </refsect1>
 
 <refsect1>
@@ -47,6 +55,13 @@
 <para>
 Check prerequisites but don't actually execute a switchover.
 </para>
+<important>
+<para>
+Success of <option>--dry-run</option> does not imply the switchover will
+complete successfully, only that
+the prerequisites for performing the operation are met.
+</para>
+</important>
 </listitem>
 </varlistentry>
 
@@ -57,6 +72,12 @@
 <para>
 Ignore warnings and continue anyway.
 </para>
+<para>
+Specifically, if a problem is encountered when shutting down the current primary,
+using <option>-F/--force</option> will cause &repmgr; to continue by promoting
+the standby to be the new primary, and if <option>--siblings-follow</option> is
+specified, attach any other standbys to the new primary.
+</para>
 </listitem>
 </varlistentry>
 
@@ -103,6 +124,11 @@
 <application>repmgrd</application> should not be active on any nodes while a switchover is being
 executed. This restriction may be lifted in a later version.
 </para>
+<para>
+External database connections, e.g. from an application, should not be permitted while
+the switchover is taking place. In particular, active transactions on the primary
+can potentially disrupt the shutdown process.
+</para>
 </refsect1>
 
 <refsect1>
@@ -119,6 +145,44 @@
 </para>
 </refsect1>
 
+<refsect1>
+<title>Exit codes</title>
+<para>
+Following exit codes can be emitted by <literal>repmgr standby switchover</literal>:
+</para>
+<variablelist>
+
+<varlistentry>
+<term><option>SUCCESS (0)</option></term>
+<listitem>
+<para>
+The switchover completed successfully.
+</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term><option>ERR_SWITCHOVER_FAIL (18)</option></term>
+<listitem>
+<para>
+The switchover could not be executed.
+</para>
+</listitem>
+</varlistentry>
+
+<varlistentry>
+<term><option>ERR_SWITCHOVER_INCOMPLETE (22)</option></term>
+<listitem>
+<para>
+The switchover was executed but a problem was encountered.
+Typically this means the former primary could not be reattached
+as a standby.
+</para>
+</listitem>
+</varlistentry>
+
+</variablelist>
+</refsect1>
 
 <refsect1>
 <title>See also</title>
@@ -43,6 +43,6 @@
 #define ERR_BARMAN 19
 #define ERR_REGISTRATION_SYNC 20
 #define ERR_OUT_OF_MEMORY 21
-#define ERR_REJOIN_FAIL 22
+#define ERR_SWITCHOVER_INCOMPLETE 22
 
 #endif /* _ERRCODE_H_ */
@@ -2036,6 +2036,10 @@ do_standby_switchover(void)
 i;
 bool command_success = false;
 bool shutdown_success = false;
+
+/* this flag will use to generate the final message generated */
+bool switchover_success = true;
+
 XLogRecPtr remote_last_checkpoint_lsn = InvalidXLogRecPtr;
 ReplInfo replication_info = T_REPLINFO_INTIALIZER;
 
@@ -2894,12 +2898,17 @@ do_standby_switchover(void)
 /* clean up remote node */
 remote_conn = establish_db_connection(remote_node_record.conninfo, false);
 
-/* check replication status */
+/* check new standby (old primary) is reachable */
 if (PQstatus(remote_conn) != CONNECTION_OK)
 {
-log_error(_("unable to reestablish connection to remote node \"%s\""),
-remote_node_record.node_name);
-/* log_hint(_("")); // depends on replication status */
+switchover_success = false;
+
+/* TODO: double-check whether new standby has attached */
+
+log_warning(_("switchover did not fully complete"));
+log_detail(_("node \"%s\" is now primary but node \"%s\" is not reachable"),
+local_node_record.node_name,
+remote_node_record.node_name);
 }
 else
 {
@@ -2910,17 +2919,20 @@ do_standby_switchover(void)
 local_node_record.slot_name);
 }
 /* TODO warn about any inactive replication slots */
+
+log_notice(_("switchover was successful"));
+log_detail(_("node \"%s\" is now primary and node \"%s\" is attached as standby"),
+local_node_record.node_name,
+remote_node_record.node_name);
+
 }
 
 PQfinish(remote_conn);
 
-log_notice(_("switchover was successful"));
-log_detail(_("node \"%s\" is now primary"),
-local_node_record.node_name);
 
 /*
 * If --siblings-follow specified, attempt to make them follow the new
-* standby
+* primary
 */
 
 if (runtime_options.siblings_follow == true && sibling_nodes.node_count > 0)
@@ -2993,7 +3005,17 @@ do_standby_switchover(void)
 
 PQfinish(local_conn);
 
-log_notice(_("STANDBY SWITCHOVER is complete"));
+if (switchover_success == true)
+{
+log_notice(_("STANDBY SWITCHOVER has completed successfully"));
+}
+else
+{
+log_notice(_("STANDBY SWITCHOVER has completed with issues"));
+log_hint(_("see preceding log message(s) for details"));
+exit(ERR_SWITCHOVER_INCOMPLETE);
+}
+
 
 return;
 }