repmgr: cluster check commands - non-zero exit code if node(s) unavailable

Return ERR_CLUSTER_CHECK if one or more nodes were not reachable.

Implements GitHub #447.
Ian Barwick
2018-06-11 12:39:35 +09:00
parent 00704913a6
commit 3b0cde2846
7 changed files with 117 additions and 20 deletions


@@ -1,4 +1,4 @@
-4.0.6 2018-06-??
+4.0.6 2018-06-14
 repmgr: (witness register) prevent registration of a witness server with the
 same name as an existing node (Ian)
 repmgr: (standby follow) check node has actually connected to new primary
@@ -13,6 +13,8 @@
 GitHub #442 (Ian)
 repmgr: when using --dry-run, force log level to INFO to ensure output
 will always be displayed; GitHub #441 (Ian)
+repmgr: (cluster matrix/crosscheck) return non-zero exit code if node
+connection issues detected; GitHub #447 (Ian)
 repmgrd: ensure local node is counted as quorum member; GitHub #439 (Ian)
 4.0.5 2018-05-02


@@ -28,6 +28,40 @@
 for more details.
 </para>
+<sect2>
+<title>Usability enhancements</title>
+<para>
+<itemizedlist>
+<listitem>
+<para>
+<command><link linkend="repmgr-cluster-crosscheck">repmgr cluster crosscheck</link></command> and
+<command><link linkend="repmgr-cluster-matrix">repmgr cluster matrix</link></command>:
+return non-zero exit code if node connection issues detected (GitHub #447)
+</para>
+</listitem>
+<listitem>
+<para>
+<command><link linkend="repmgr-standby-clone">repmgr standby clone</link></command>:
+Improve handling of external configuration file copying, including consideration in
+<option>--dry-run</option> check
+(GitHub #443)
+</para>
+</listitem>
+<listitem>
+<para>
+When using <option>--dry-run</option>, force log level to <literal>INFO</literal>
+to ensure output will always be displayed
+(GitHub #441)
+</para>
+</listitem>
+</itemizedlist>
+</para>
+</sect2>
 <sect2>
 <title>Bug fixes</title>
 <para>
@@ -51,15 +85,6 @@
 </listitem>
-<listitem>
-<para>
-<command><link linkend="repmgr-standby-clone">repmgr standby clone</link></command>:
-Improve handling of external configuration file copying, including consideration in
-<option>--dry-run</option> check
-(GitHub #443)
-</para>
-</listitem>
 <listitem>
 <para>
 <command><link linkend="repmgr-standby-clone">repmgr standby clone</link></command>:
@@ -84,14 +109,6 @@
 </para>
 </listitem>
-<listitem>
-<para>
-When using <option>--dry-run</option>, force log level to <literal>INFO</literal>
-to ensure output will always be displayed
-(GitHub #441)
-</para>
-</listitem>
 <listitem>
 <para>
 <application>repmgrd</application>: ensure local node is counted as quorum member


@@ -38,5 +38,34 @@
 and therefore determine the state of outbound connections from that node.
 </para>
 </refsect1>
+<refsect1>
+<title>Exit codes</title>
+<para>
+Following exit codes can be emitted by <command>repmgr cluster crosscheck</command>:
+</para>
+<variablelist>
+<varlistentry>
+<term><option>SUCCESS (0)</option></term>
+<listitem>
+<para>
+The check completed successfully and all nodes are reachable.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term><option>ERR_CLUSTER_CHECK (25)</option></term>
+<listitem>
+<para>
+One or more nodes could not be reached.
+</para>
+</listitem>
+</varlistentry>
+</variablelist>
+</refsect1>
 </refentry>


@@ -97,5 +97,35 @@
 useful result.
 </para>
 </refsect1>
+<refsect1>
+<title>Exit codes</title>
+<para>
+Following exit codes can be emitted by <command>repmgr cluster matrix</command>:
+</para>
+<variablelist>
+<varlistentry>
+<term><option>SUCCESS (0)</option></term>
+<listitem>
+<para>
+The check completed successfully and all nodes are reachable.
+</para>
+</listitem>
+</varlistentry>
+<varlistentry>
+<term><option>ERR_CLUSTER_CHECK (25)</option></term>
+<listitem>
+<para>
+One or more nodes could not be reached.
+</para>
+</listitem>
+</varlistentry>
+</variablelist>
+</refsect1>
 </refentry>


@@ -199,7 +199,7 @@
 <refsect1>
 <title>Exit codes</title>
 <para>
-Following exit codes can be emitted by <literal>repmgr standby switchover</literal>:
+Following exit codes can be emitted by <command>repmgr standby switchover</command>:
 </para>
 <variablelist>
@@ -227,7 +227,7 @@
 <para>
 The switchover was executed but a problem was encountered.
 Typically this means the former primary could not be reattached
-as a standby.
+as a standby. Check preceding log messages for more information.
 </para>
 </listitem>
 </varlistentry>


@@ -46,5 +46,6 @@
 #define ERR_SWITCHOVER_INCOMPLETE 22
 #define ERR_FOLLOW_FAIL 23
 #define ERR_REJOIN_FAIL 24
+#define ERR_CLUSTER_CHECK 25
 #endif /* _ERRCODE_H_ */


@@ -569,6 +569,8 @@ do_cluster_crosscheck(void)
 t_node_status_cube **cube;
+bool error_found = false;
 n = build_cluster_crosscheck(&cube, &name_length);
 if (runtime_options.output_mode == OM_CSV)
 {
@@ -648,9 +650,11 @@ do_cluster_crosscheck(void)
 {
 case -2:
 c = '?';
+error_found = true;
 break;
 case -1:
 c = 'x';
+error_found = true;
 break;
 case 0:
 c = '*';
@@ -689,6 +693,11 @@ do_cluster_crosscheck(void)
 free(cube);
 }
+if (error_found == true)
+{
+exit(ERR_CLUSTER_CHECK);
+}
 }
@@ -704,6 +713,8 @@ do_cluster_matrix()
 t_node_matrix_rec **matrix_rec_list;
+bool error_found = false;
 n = build_cluster_matrix(&matrix_rec_list, &name_length);
 if (runtime_options.output_mode == OM_CSV)
@@ -742,9 +753,11 @@ do_cluster_matrix()
 {
 case -2:
 c = '?';
+error_found = true;
 break;
 case -1:
 c = 'x';
+error_found = true;
 break;
 case 0:
 c = '*';
@@ -770,6 +783,11 @@ do_cluster_matrix()
 }
 free(matrix_rec_list);
+if (error_found == true)
+{
+exit(ERR_CLUSTER_CHECK);
+}
 }
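
Note on the pattern used in both functions: the change records an error flag while each status value is mapped to its display character, and only exits with ERR_CLUSTER_CHECK once all output has been printed and memory freed. The following is a minimal standalone sketch of that idea, not the repmgr implementation. The helper names status_to_char() and cluster_check_exit_code() are hypothetical, as is the ' ' fallback for other status values; only the -2/-1/0 mappings and the values 0 and 25 come from the diff above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Values as documented in the diff above. */
#define SUCCESS           0
#define ERR_CLUSTER_CHECK 25

/*
 * Hypothetical helper: map one status value to its display character,
 * setting *error_found for the two values the diff treats as errors.
 */
static char
status_to_char(int status, bool *error_found)
{
	switch (status)
	{
		case -2:
			*error_found = true;
			return '?';
		case -1:
			*error_found = true;
			return 'x';
		case 0:
			return '*';
		default:
			/* Assumption: any other value is shown as a blank. */
			return ' ';
	}
}

/*
 * Hypothetical helper: render n status values into out (which must hold
 * n + 1 bytes) and return the exit code the check should finish with.
 */
static int
cluster_check_exit_code(const int *statuses, size_t n, char *out)
{
	bool		error_found = false;
	size_t		i;

	for (i = 0; i < n; i++)
		out[i] = status_to_char(statuses[i], &error_found);
	out[n] = '\0';

	return error_found ? ERR_CLUSTER_CHECK : SUCCESS;
}
```

Returning the code instead of calling exit() inside the loop mirrors the ordering in the diff, where the complete matrix is still displayed before the process reports failure. A caller would pass the returned value to exit() after its own cleanup.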