Files
repmgr/doc/repmgrd-degraded-monitoring.sgml
2017-10-19 12:22:58 +09:00

76 lines
3.4 KiB
Plaintext

<chapter id="repmgrd-degraded-monitoring">
<indexterm>
<primary>repmgrd</primary>
<secondary>degraded monitoring</secondary>
</indexterm>
<title>"degraded monitoring" mode</title>
<para>
In certain circumstances, <application>repmgrd</application> is not able to fulfill its primary mission
of monitoring the nodes' upstream server. In these cases it enters "degraded
monitoring" mode, where <application>repmgrd</application> remains active but is waiting for the situation
to be resolved.
</para>
<para>
Situations where this happens are:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>a failover situation has occurred, no nodes in the primary node's location are visible</simpara>
</listitem>
<listitem>
<simpara>a failover situation has occurred, but no promotion candidate is available</simpara>
</listitem>
<listitem>
<simpara>a failover situation has occurred, but the promotion candidate could not be promoted</simpara>
</listitem>
<listitem>
<simpara>a failover situation has occurred, but the node was unable to follow the new primary</simpara>
</listitem>
<listitem>
<simpara>a failover situation has occurred, but no primary has become available</simpara>
</listitem>
<listitem>
<simpara>a failover situation has occurred, but automatic failover is not enabled for the node</simpara>
</listitem>
<listitem>
<simpara>repmgrd is monitoring the primary node, but it is not available</simpara>
</listitem>
</itemizedlist>
</para>
<para>
Example output in a situation where there is only one standby with <literal>failover=manual</literal>,
and the primary node is unavailable (but is later restarted):
<programlisting>
[2017-08-29 10:59:19] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state (automatic failover disabled)
[2017-08-29 10:59:33] [WARNING] unable to connect to upstream node "node1" (node ID: 1)
[2017-08-29 10:59:33] [INFO] checking state of node 1, 1 of 5 attempts
[2017-08-29 10:59:33] [INFO] sleeping 1 seconds until next reconnection attempt
(...)
[2017-08-29 10:59:37] [INFO] checking state of node 1, 5 of 5 attempts
[2017-08-29 10:59:37] [WARNING] unable to reconnect to node 1 after 5 attempts
[2017-08-29 10:59:37] [NOTICE] this node is not configured for automatic failover so will not be considered as promotion candidate
[2017-08-29 10:59:37] [NOTICE] no other nodes are available as promotion candidate
[2017-08-29 10:59:37] [HINT] use "repmgr standby promote" to manually promote this node
[2017-08-29 10:59:37] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in degraded state (automatic failover disabled)
[2017-08-29 10:59:53] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in degraded state (automatic failover disabled)
[2017-08-29 11:00:45] [NOTICE] reconnected to upstream node 1 after 68 seconds, resuming monitoring
[2017-08-29 11:00:57] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state (automatic failover disabled)</programlisting>
</para>
<para>
By default, <literal>repmgrd</literal> will continue in degraded monitoring mode indefinitely.
However a timeout (in seconds) can be set with <varname>degraded_monitoring_timeout</varname>,
after which <application>repmgrd</application> will terminate.
</para>
</chapter>