More updates

This commit is contained in:
Ian Barwick
2017-10-05 15:21:22 +09:00
parent fee4569887
commit 87ea7850ca
5 changed files with 295 additions and 1 deletions

View File

@@ -0,0 +1,181 @@
<chapter id="event-notifications" xreflabel="event notifications">
<title>Event Notifications</title>
<para>
Each time `repmgr` or `repmgrd` perform a significant event, a record
of that event is written into the `repmgr.events` table together with
a timestamp, an indication of failure or success, and further details
if appropriate. This is useful for gaining an overview of events
affecting the replication cluster. However note that this table has
advisory character and should be used in combination with the `repmgr`
and PostgreSQL logs to obtain details of any events.
</para>
<para>
Example output after a primary was registered and a standby cloned
and registered:
<programlisting>
repmgr=# SELECT * from repmgr.events ;
node_id | event | successful | event_timestamp | details
---------+------------------+------------+-------------------------------+-------------------------------------------------------------------------------------
1 | primary_register | t | 2016-01-08 15:04:39.781733+09 |
2 | standby_clone | t | 2016-01-08 15:04:49.530001+09 | Cloned from host 'repmgr_node1', port 5432; backup method: pg_basebackup; --force: N
2 | standby_register | t | 2016-01-08 15:04:50.621292+09 |
(3 rows)</programlisting>
</para>
<para>
Alternatively, use <xref linkend="repmgr-cluster-event"> to output a
formatted list of events.
</para>
<para>
Additionally, event notifications can be passed to a user-defined program
or script which can take further action, e.g. send email notifications.
This is done by setting the `event_notification_command` parameter in
`repmgr.conf`.
</para>
<para>
This parameter accepts the following format placeholders:
</para>
<variablelist>
<varlistentry>
<term><option>%n</option></term>
<listitem>
<para>
node ID
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>%e</option></term>
<listitem>
<para>
event type
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>%t</option></term>
<listitem>
<para>
success (1 or 0)
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>%t</option></term>
<listitem>
<para>
timestamp
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>%d</option></term>
<listitem>
<para>
details
</para>
</listitem>
</varlistentry>
</variablelist>
<para>
The values provided for <literal>%t</literal> and <literal>%d</literal>
will probably contain spaces, so should be quoted in the provided command
configuration, e.g.:
<programlisting>
event_notification_command='/path/to/some/script %n %e %s "%t" "%d"'
</programlisting>
</para>
<para>
Additionally the following format placeholders are available for the event
type <varname>bdr_failover</varname> and optionally <varname>bdr_recovery</varname>:
</para>
<variablelist>
<varlistentry>
<term><option>%c</option></term>
<listitem>
<para>
conninfo string of the next available node
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>%a</option></term>
<listitem>
<para>
name of the next available node
</para>
</listitem>
</varlistentry>
</variablelist>
<para>
These should always be quoted.
</para>
<para>
By default, all notification types will be passed to the designated script;
the notification types can be filtered to explicitly named ones:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><literal>primary_register</literal></simpara>
</listitem>
<listitem>
<simpara><literal>standby_register</literal></simpara>
</listitem>
<listitem>
<simpara><literal>standby_unregister</literal></simpara>
</listitem>
<listitem>
<simpara><literal>standby_clone</literal></simpara>
</listitem>
<listitem>
<simpara><literal>standby_promote</literal></simpara>
</listitem>
<listitem>
<simpara><literal>standby_follow</literal></simpara>
</listitem>
<listitem>
<simpara><literal>standby_disconnect_manual</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_start</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_shutdown</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_failover_promote</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_failover_follow</literal></simpara>
</listitem>
<listitem>
<simpara><literal>bdr_failover</literal></simpara>
</listitem>
<listitem>
<simpara><literal>bdr_reconnect</literal></simpara>
</listitem>
<listitem>
<simpara><literal>bdr_recovery</literal></simpara>
</listitem>
<listitem>
<simpara><literal>bdr_register</literal></simpara>
</listitem>
<listitem>
<simpara><literal>bdr_unregister</literal></simpara>
</listitem>
</itemizedlist>
</para>
<para>
Note that under some circumstances (e.g. when no replication cluster primary
could be located), it will not be possible to write an entry into the
<literal>repmgr.events</literal>
table, in which case executing a script via <varname>event_notification_command</varname>
can serve as a fallback by generating some form of notification.
</para>
</chapter>

View File

@@ -43,10 +43,12 @@
<!ENTITY promoting-standby SYSTEM "promoting-standby.sgml">
<!ENTITY follow-new-primary SYSTEM "follow-new-primary.sgml">
<!ENTITY switchover SYSTEM "switchover.sgml">
<!ENTITY event-notifications SYSTEM "event-notifications.sgml">
<!ENTITY repmgrd-automatic-failover SYSTEM "repmgrd-automatic-failover.sgml">
<!ENTITY repmgrd-configuration SYSTEM "repmgrd-configuration.sgml">
<!ENTITY repmgrd-demonstration SYSTEM "repmgrd-demonstration.sgml">
<!ENTITY repmgrd-monitoring SYSTEM "repmgrd-monitoring.sgml">
<!ENTITY repmgr-primary-register SYSTEM "repmgr-primary-register.sgml">
<!ENTITY repmgr-primary-unregister SYSTEM "repmgr-primary-unregister.sgml">
@@ -62,9 +64,9 @@
<!ENTITY repmgr-cluster-show SYSTEM "repmgr-cluster-show.sgml">
<!ENTITY repmgr-cluster-matrix SYSTEM "repmgr-cluster-matrix.sgml">
<!ENTITY repmgr-cluster-crosscheck SYSTEM "repmgr-cluster-crosscheck.sgml">
<!ENTITY repmgr-cluster-event SYSTEM "repmgr-cluster-event.sgml">
<!ENTITY repmgr-cluster-cleanup SYSTEM "repmgr-cluster-cleanup.sgml">
<!ENTITY appendix-signatures SYSTEM "appendix-signatures.sgml">
<!ENTITY bookindex SYSTEM "bookindex.sgml">

View File

@@ -0,0 +1,37 @@
<chapter id="repmgr-cluster-event" xreflabel="repmgr cluster event">
<indexterm>
<primary>repmgr cluster event</primary>
</indexterm>
<title>repmgr cluster event</title>
<para>
This outputs a formatted list of cluster events, as stored in the
<literal>repmgr.events</literal> table. Output is in reverse chronological order, and
can be filtered with the following options:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><literal>--all</literal>: outputs all entries</simpara>
</listitem>
<listitem>
<simpara><literal>--limit</literal>: set the maximum number of entries to output (default: 20)</simpara>
</listitem>
<listitem>
<simpara><literal>--node-id</literal>: restrict entries to node with this ID</simpara>
</listitem>
<listitem>
<simpara><literal>--node-name</literal>: restrict entries to node with this name</simpara>
</listitem>
<listitem>
<simpara><literal>--event</literal>: filter specific event</simpara>
</listitem>
</itemizedlist>
</para>
<para>
Example:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster event --event=standby_register
Node ID | Name | Event | OK | Timestamp | Details
---------+-------+------------------+----+---------------------+--------------------------------
3 | node3 | standby_register | t | 2017-08-17 10:28:55 | standby registration succeeded
2 | node2 | standby_register | t | 2017-08-17 10:28:53 | standby registration succeeded</programlisting>
</para>
</chapter>

View File

@@ -73,6 +73,7 @@
&promoting-standby;
&follow-new-primary;
&switchover;
&event-notifications;
</part>
<part id="using-repmgrd">
@@ -80,6 +81,7 @@
&repmgrd-automatic-failover;
&repmgrd-configuration;
&repmgrd-demonstration;
&repmgrd-monitoring;
</part>
<part id="repmgr-command-reference">
@@ -99,6 +101,7 @@
&repmgr-cluster-show;
&repmgr-cluster-matrix;
&repmgr-cluster-crosscheck;
&repmgr-cluster-event;
&repmgr-cluster-cleanup;
</part>

View File

@@ -0,0 +1,71 @@
<chapter id="repmgrd-monitoring">
<title>Monitoring with repmgrd</title>
<para>
When `repmgrd` is running with the option <literal>monitoring_history=true</literal>,
it will constantly write standby node status information to the
<varname>monitoring_history</varname> table, providing a near-real time
overview of replication status on all nodes
in the cluster.
</para>
<para>
The view <literal>replication_status</literal> shows the most recent state
for each node, e.g.:
<programlisting>
repmgr=# select * from repmgr.replication_status;
-[ RECORD 1 ]-------------+------------------------------
primary_node_id | 1
standby_node_id | 2
standby_name | node2
node_type | standby
active | t
last_monitor_time | 2017-08-24 16:28:41.260478+09
last_wal_primary_location | 0/6D57A00
last_wal_standby_location | 0/5000000
replication_lag | 29 MB
replication_time_lag | 00:00:11.736163
apply_lag | 15 MB
communication_time_lag | 00:00:01.365643</programlisting>
</para>
<para>
The interval in which monitoring history is written is controlled by the
configuration parameter <varname>monitor_interval_secs</varname>;
default is 2.
</para>
<para>
As this can generate a large amount of monitoring data in the table
<literal>repmgr.monitoring_history</literal>. it's advisable to regularly
purge historical data using the <xref linkend="repmgr-cluster-cleanup">
command; use the <literal>-k/--keep-history</literal> option to
specify how many day's worth of data should be retained.
</para>
<para>
It's possible to use <command>repmgrd</command> to run in monitoring
mode only (without automatic failover capability) for some or all
nodes by setting <literal>failover=manual</literal> in the node's
<filename>repmgr.conf</filename> file. In the event of the node's upstream failing,
no failover action will be taken and the node will require manual intervention to
be reattached to replication. If this occurs, an
<link linkend="event-notifications">event notification</link>
<varname>standby_disconnect_manual</varname> will be created.
</para>
<para>
Note that when a standby node is not streaming directly from its upstream
node, e.g. recovering WAL from an archive, <varname>apply_lag</varname> will always appear as
<literal>0 bytes</literal>.
</para>
<tip>
<para>
If monitoring history is enabled, the contents of the <literal>repmgr.monitoring_history</literal>
table will be replicated to attached standbys. This means there will be a small but
constant stream of replication activity which may not be desirable. To prevent
this, convert the table to an <literal>UNLOGGED</literal> one with:
<programlisting>
ALTER TABLE repmgr.monitoring_history SET UNLOGGED;</programlisting>
</para>
<para>
This will however mean that monitoring history will not be available on
another node following a failover, and the view <literal>repmgr.replication_status</literal>
will not work on standbys.
</para>
</tip>
</chapter>