mirror of
https://github.com/EnterpriseDB/repmgr.git
synced 2026-03-22 22:56:29 +00:00
This provides a simple way for checking whether the node's repmgrd is running. GitHub #719.
<refentry id="repmgr-node-check">
  <indexterm>
    <primary>repmgr node check</primary>
  </indexterm>

  <refmeta>
    <refentrytitle>repmgr node check</refentrytitle>
  </refmeta>

  <refnamediv>
    <refname>repmgr node check</refname>
    <refpurpose>performs some health checks on a node from a replication perspective</refpurpose>
  </refnamediv>

  <refsect1>
    <title>Description</title>
    <para>
      Performs some health checks on a node from a replication perspective.
      This command must be run on the local node.
    </para>
    <note>
      <para>
        Currently &repmgr; performs health checks on physical replication
        slots only, with the aim of warning about streaming replication standbys which
        have become detached and the associated risk of uncontrolled WAL file
        growth.
      </para>
    </note>
  </refsect1>

  <refsect1>
    <title>Example</title>
    <para>
      Execution on the primary server:
      <programlisting>
$ repmgr -f /etc/repmgr.conf node check
Node "node1":
    Server role: OK (node is primary)
    Replication lag: OK (N/A - node is primary)
    WAL archiving: OK (0 pending files)
    Upstream connection: OK (N/A - is primary)
    Downstream servers: OK (2 of 2 downstream nodes attached)
    Replication slots: OK (node has no physical replication slots)
    Missing replication slots: OK (node has no missing physical replication slots)
    Configured data directory: OK (configured "data_directory" is "/var/lib/postgresql/data")</programlisting>
    </para>
    <para>
      Execution on a standby server:
      <programlisting>
$ repmgr -f /etc/repmgr.conf node check
Node "node2":
    Server role: OK (node is standby)
    Replication lag: OK (0 seconds)
    WAL archiving: OK (0 pending archive ready files)
    Upstream connection: OK (node "node2" (ID: 2) is attached to expected upstream node "node1" (ID: 1))
    Downstream servers: OK (this node has no downstream nodes)
    Replication slots: OK (node has no physical replication slots)
    Missing physical replication slots: OK (node has no missing physical replication slots)
    Configured data directory: OK (configured "data_directory" is "/var/lib/postgresql/data")</programlisting>
    </para>
  </refsect1>
  <refsect1>
    <title>Individual checks</title>
    <para>
      Each check can be performed individually by supplying
      an additional command line parameter, e.g.:
      <programlisting>
$ repmgr node check --role
OK (node is primary)</programlisting>
    </para>
    <para>
      Parameters for individual checks are as follows:
      <itemizedlist spacing="compact" mark="bullet">

        <listitem>
          <simpara>
            <option>--role</option>: checks if the node has the expected role
          </simpara>
        </listitem>

        <listitem>
          <simpara>
            <option>--replication-lag</option>: checks if the node is lagging by more than
            <varname>replication_lag_warning</varname> or <varname>replication_lag_critical</varname> seconds
          </simpara>
        </listitem>

        <listitem>
          <simpara>
            <option>--archive-ready</option>: checks for WAL files which have not yet been archived,
            and returns <literal>WARNING</literal> or <literal>CRITICAL</literal> if the number
            exceeds <varname>archive_ready_warning</varname> or <varname>archive_ready_critical</varname> respectively.
          </simpara>
        </listitem>

        <listitem>
          <simpara>
            <option>--downstream</option>: checks that the expected downstream nodes are attached
          </simpara>
        </listitem>

        <listitem>
          <simpara>
            <option>--upstream</option>: checks that the node is attached to its expected upstream
          </simpara>
        </listitem>

        <listitem>
          <simpara>
            <option>--slots</option>: checks there are no inactive physical replication slots
          </simpara>
        </listitem>

        <listitem>
          <simpara>
            <option>--missing-slots</option>: checks there are no missing physical replication slots
          </simpara>
        </listitem>

        <listitem>
          <simpara>
            <option>--data-directory-config</option>: checks the data directory configured in
            <filename>repmgr.conf</filename> matches the actual data directory.
            This check is not directly related to replication, but is useful to verify &repmgr;
            is correctly configured.
          </simpara>
        </listitem>
      </itemizedlist>
    </para>
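    <para>
      The individual checks listed above can also be run one after another from a script.
      The following is a minimal sketch and not part of &repmgr; itself: the function name
      <literal>run_individual_checks</literal> and the variables <literal>REPMGR_CMD</literal>
      and <literal>REPMGR_CONF</literal> are hypothetical, and the configuration file is
      assumed to be <filename>/etc/repmgr.conf</filename>. It prints the exit status of each
      check and returns the highest status seen.
      <programlisting>
# Hypothetical wrapper, not shipped with repmgr.
# Assumes repmgr is on the PATH and repmgr.conf is at /etc/repmgr.conf.
REPMGR_CMD="${REPMGR_CMD:-repmgr}"
REPMGR_CONF="${REPMGR_CONF:-/etc/repmgr.conf}"

run_individual_checks() {
    worst=0
    # Options taken from the list of individual checks above.
    for opt in --role --replication-lag --archive-ready --downstream \
               --upstream --slots --missing-slots --data-directory-config
    do
        "$REPMGR_CMD" -f "$REPMGR_CONF" node check "$opt"
        rc=$?
        printf '%s exited with status %s\n' "$opt" "$rc"
        # Remember the highest exit status seen so far.
        if [ "$rc" -gt "$worst" ]; then
            worst=$rc
        fi
    done
    return "$worst"
}

# Usage: run_individual_checks; echo "highest exit status: $?"</programlisting>
    </para>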
  </refsect1>


  <refsect1>
    <title>repmgrd</title>
    <para>
      A separate check is available to verify whether &repmgrd; is running.
      This is not included in the general output, as it does not
      per se constitute a check of the node's replication status.
    </para>
    <itemizedlist spacing="compact" mark="bullet">
      <listitem>
        <simpara>
          <option>--repmgrd</option>: checks whether &repmgrd; is running.
          If &repmgrd; is running but paused, status <literal>1</literal>
          (<literal>WARNING</literal>) is returned.
        </simpara>
      </listitem>
    </itemizedlist>
  </refsect1>

  <refsect1>
    <title>Additional checks</title>
    <para>
      Several checks are provided for diagnostic purposes and are not
      included in the general output:
      <itemizedlist spacing="compact" mark="bullet">

        <listitem>
          <simpara>
            <option>--db-connection</option>: checks if &repmgr; can connect to the
            database on the local node.
          </simpara>
          <simpara>
            This option is particularly useful in combination with <command>SSH</command>, as
            it can be used to troubleshoot connection issues encountered when &repmgr; is
            executed remotely (e.g. during a switchover operation).
          </simpara>

        </listitem>

        <listitem>
          <simpara>
            <option>--replication-config-owner</option>: checks if the file containing replication
            configuration (PostgreSQL 12 and later: <filename>postgresql.auto.conf</filename>;
            PostgreSQL 11 and earlier: <filename>recovery.conf</filename>) is
            owned by the same user who owns the data directory.
          </simpara>
          <simpara>
            Incorrect ownership of these files (e.g. if they are owned by <literal>root</literal>)
            will cause operations which need to update the replication configuration
            (e.g. <link linkend="repmgr-standby-follow"><command>repmgr standby follow</command></link>
            or <link linkend="repmgr-standby-promote"><command>repmgr standby promote</command></link>)
            to fail.
          </simpara>
        </listitem>

      </itemizedlist>
    </para>
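    <para>
      To approximate what happens when &repmgr; is executed remotely, the
      <option>--db-connection</option> check can be run on another node over SSH.
      The following is a minimal sketch under stated assumptions: the function name
      <literal>check_remote_db_connection</literal> and the variables <literal>SSH_CMD</literal>
      and <literal>REPMGR_CONF</literal> are hypothetical, passwordless SSH access to the
      target host is assumed, and <filename>repmgr.conf</filename> is assumed to be at
      <filename>/etc/repmgr.conf</filename> on that host.
      <programlisting>
# Hypothetical helper, not shipped with repmgr.
# Assumes passwordless SSH to the target host and the same
# repmgr.conf location on every node.
SSH_CMD="${SSH_CMD:-ssh}"
REPMGR_CONF="${REPMGR_CONF:-/etc/repmgr.conf}"

check_remote_db_connection() {
    # $1: SSH destination of the node to test, for example node2
    "$SSH_CMD" "$1" repmgr -f "$REPMGR_CONF" node check --db-connection
}

# Usage: check_remote_db_connection node2; echo "exit status: $?"</programlisting>
    </para>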
  </refsect1>

  <refsect1>
    <title>Connection options</title>
    <para>
      <itemizedlist spacing="compact" mark="bullet">

        <listitem>
          <simpara>
            <option>-S</option>/<option>--superuser</option>: connect as the
            named superuser instead of the &repmgr; user
          </simpara>
        </listitem>

      </itemizedlist>
    </para>
  </refsect1>

  <refsect1>
    <title>Output format</title>
    <para>
      <itemizedlist spacing="compact" mark="bullet">

        <listitem>
          <simpara>
            <option>--csv</option>: generate output in CSV format (not available
            for individual checks)
          </simpara>
        </listitem>

        <listitem>
          <simpara>
            <option>--nagios</option>: generate output in a Nagios-compatible format
            (for individual checks only)
          </simpara>
        </listitem>
      </itemizedlist>
    </para>
  </refsect1>


  <refsect1>
    <title>Exit codes</title>

    <para>
      When executing <command>repmgr node check</command> with one of the individual
      checks listed above, &repmgr; will emit one of the following Nagios-style exit codes
      (even if <option>--nagios</option> is not supplied):

      <itemizedlist spacing="compact" mark="bullet">

        <listitem>
          <simpara>
            <literal>0</literal>: OK
          </simpara>
        </listitem>

        <listitem>
          <simpara>
            <literal>1</literal>: WARNING
          </simpara>
        </listitem>

        <listitem>
          <simpara>
            <literal>2</literal>: CRITICAL
          </simpara>
        </listitem>

        <listitem>
          <simpara>
            <literal>3</literal>: UNKNOWN
          </simpara>
        </listitem>

      </itemizedlist>
    </para>


    <para>
      One of the following exit codes will be emitted by <command>repmgr node check</command>
      if no individual check was specified.
    </para>

    <variablelist>

      <varlistentry>
        <term><option>SUCCESS (0)</option></term>
        <listitem>
          <para>
            No issues were detected.
          </para>
        </listitem>
      </varlistentry>

      <varlistentry>
        <term><option>ERR_NODE_STATUS (25)</option></term>
        <listitem>
          <para>
            One or more issues were detected.
          </para>
        </listitem>
      </varlistentry>

    </variablelist>
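    <para>
      A calling script can branch on these two exit codes. The following is a minimal
      sketch and not part of &repmgr; itself: the function name
      <literal>describe_node_check_exit</literal> is hypothetical, and any value other than
      <literal>0</literal> or <literal>25</literal> is simply reported as unexpected.
      <programlisting>
# Hypothetical helper, not shipped with repmgr.
# Describes the exit code of "repmgr node check" when it was run
# without an individual check option.
describe_node_check_exit() {
    case "$1" in
        0)  echo "SUCCESS: no issues were detected" ;;
        25) echo "ERR_NODE_STATUS: one or more issues were detected" ;;
        *)  echo "unexpected exit code: $1" ;;
    esac
}

# Usage:
#   repmgr -f /etc/repmgr.conf node check
#   describe_node_check_exit $?</programlisting>
    </para>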

  </refsect1>


  <refsect1>
    <title>See also</title>
    <para>
      <xref linkend="repmgr-node-status"/>, <xref linkend="repmgr-cluster-show"/>
    </para>
  </refsect1>

</refentry>