repmgr/doc/repmgrd-degraded-monitoring.sgml

<chapter id="repmgrd-degraded-monitoring">
 <indexterm>
   <primary>repmgrd</primary>
   <secondary>degraded monitoring</secondary>
 </indexterm>

 <title>"degraded monitoring" mode</title>
 <para>
  In certain circumstances, <application>repmgrd</application> is not able to fulfill its primary mission
  of monitoring the nodes' upstream server. In these cases it enters "degraded
  monitoring" mode, where <application>repmgrd</application> remains active but is waiting for the situation
  to be resolved.
 </para>
 <para>
  Situations where this happens are:
  <itemizedlist spacing="compact" mark="bullet">

   <listitem>
    <simpara>a failover situation has occurred, no nodes in the primary node's location are visible</simpara>
   </listitem>

   <listitem>
    <simpara>a failover situation has occurred, but no promotion candidate is available</simpara>
   </listitem>

   <listitem>
    <simpara>a failover situation has occurred, but the promotion candidate could not be promoted</simpara>
   </listitem>

   <listitem>
    <simpara>a failover situation has occurred, but the node was unable to follow the new primary</simpara>
   </listitem>

   <listitem>
    <simpara>a failover situation has occurred, but no primary has become available</simpara>
   </listitem>

   <listitem>
    <simpara>a failover situation has occurred, but automatic failover is not enabled for the node</simpara>
   </listitem>

   <listitem>
    <simpara>repmgrd is monitoring the primary node, but it is not available</simpara>
   </listitem>
  </itemizedlist>
 </para>

 <para>
  Example output in a situation where there is only one standby with <literal>failover=manual</literal>,
  and the primary node is unavailable (but is later restarted):
  <programlisting>
    [2017-08-29 10:59:19] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state (automatic failover disabled)
    [2017-08-29 10:59:33] [WARNING] unable to connect to upstream node "node1" (node ID: 1)
    [2017-08-29 10:59:33] [INFO] checking state of node 1, 1 of 5 attempts
    [2017-08-29 10:59:33] [INFO] sleeping 1 seconds until next reconnection attempt
    (...)
    [2017-08-29 10:59:37] [INFO] checking state of node 1, 5 of 5 attempts
    [2017-08-29 10:59:37] [WARNING] unable to reconnect to node 1 after 5 attempts
    [2017-08-29 10:59:37] [NOTICE] this node is not configured for automatic failover so will not be considered as promotion candidate
    [2017-08-29 10:59:37] [NOTICE] no other nodes are available as promotion candidate
    [2017-08-29 10:59:37] [HINT] use "repmgr standby promote" to manually promote this node
    [2017-08-29 10:59:37] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in degraded state (automatic failover disabled)
    [2017-08-29 10:59:53] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in degraded state (automatic failover disabled)
    [2017-08-29 11:00:45] [NOTICE] reconnected to upstream node 1 after 68 seconds, resuming monitoring
    [2017-08-29 11:00:57] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state (automatic failover disabled)</programlisting>

 </para>
 <para>
  By default, <literal>repmgrd</literal> will continue in degraded monitoring mode indefinitely.
  However a timeout (in seconds) can be set with <varname>degraded_monitoring_timeout</varname>,
  after which <application>repmgrd</application> will terminate.

 </para>

</chapter>