mirror of
https://github.com/EnterpriseDB/repmgr.git
synced 2026-03-22 22:56:29 +00:00
More updates
181
doc/event-notifications.sgml
Normal file
@@ -0,0 +1,181 @@
|
||||
<chapter id="event-notifications" xreflabel="event notifications">
|
||||
<title>Event Notifications</title>
|
||||
<para>
|
||||
Each time <command>repmgr</command> or <command>repmgrd</command> performs a significant event, a record
of that event is written to the <literal>repmgr.events</literal> table together with
a timestamp, an indication of failure or success, and further details
if appropriate. This is useful for gaining an overview of events
affecting the replication cluster. Note, however, that this table is
advisory only and should be used in combination with the <command>repmgr</command>
and PostgreSQL logs to obtain details of any events.
|
||||
</para>
|
||||
<para>
|
||||
Example output after a primary was registered and a standby cloned
|
||||
and registered:
|
||||
<programlisting>
|
||||
repmgr=# SELECT * from repmgr.events ;
 node_id |      event       | successful |        event_timestamp        |                                       details
---------+------------------+------------+-------------------------------+-------------------------------------------------------------------------------------
       1 | primary_register | t          | 2016-01-08 15:04:39.781733+09 |
       2 | standby_clone    | t          | 2016-01-08 15:04:49.530001+09 | Cloned from host 'repmgr_node1', port 5432; backup method: pg_basebackup; --force: N
       2 | standby_register | t          | 2016-01-08 15:04:50.621292+09 |
(3 rows)</programlisting>
|
||||
</para>
|
||||
<para>
|
||||
Alternatively, use <xref linkend="repmgr-cluster-event"> to output a
|
||||
formatted list of events.
|
||||
</para>
|
||||
<para>
|
||||
Additionally, event notifications can be passed to a user-defined program
|
||||
or script which can take further action, e.g. send email notifications.
|
||||
This is done by setting the <varname>event_notification_command</varname> parameter in
<filename>repmgr.conf</filename>.
|
||||
</para>
|
||||
<para>
|
||||
This parameter accepts the following format placeholders:
|
||||
</para>
|
||||
|
||||
<variablelist>
|
||||
<varlistentry>
|
||||
<term><option>%n</option></term>
|
||||
<listitem>
|
||||
<para>
|
||||
node ID
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term><option>%e</option></term>
|
||||
<listitem>
|
||||
<para>
|
||||
event type
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term><option>%s</option></term>
|
||||
<listitem>
|
||||
<para>
|
||||
success (1 or 0)
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term><option>%t</option></term>
|
||||
<listitem>
|
||||
<para>
|
||||
timestamp
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
|
||||
<varlistentry>
|
||||
<term><option>%d</option></term>
|
||||
<listitem>
|
||||
<para>
|
||||
details
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
</variablelist>
|
||||
<para>
|
||||
The values provided for <literal>%t</literal> and <literal>%d</literal>
are likely to contain spaces and should therefore be quoted in the command
configuration, e.g.:
|
||||
<programlisting>
|
||||
event_notification_command='/path/to/some/script %n %e %s "%t" "%d"'
|
||||
</programlisting>
|
||||
</para>
|
||||
<para>
|
||||
Additionally, the following format placeholders are available for the event
type <varname>bdr_failover</varname> and optionally <varname>bdr_recovery</varname>:
|
||||
</para>
|
||||
<variablelist>
|
||||
<varlistentry>
|
||||
<term><option>%c</option></term>
|
||||
<listitem>
|
||||
<para>
|
||||
conninfo string of the next available node
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
<varlistentry>
|
||||
<term><option>%a</option></term>
|
||||
<listitem>
|
||||
<para>
|
||||
name of the next available node
|
||||
</para>
|
||||
</listitem>
|
||||
</varlistentry>
|
||||
</variablelist>
|
||||
<para>
|
||||
These should always be quoted.
|
||||
</para>
|
||||
<para>
|
||||
By default, all notification types are passed to the designated script;
notifications can instead be restricted to explicitly named types from the following list:
|
||||
<itemizedlist spacing="compact" mark="bullet">
|
||||
|
||||
<listitem>
|
||||
<simpara><literal>primary_register</literal></simpara>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<simpara><literal>standby_register</literal></simpara>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<simpara><literal>standby_unregister</literal></simpara>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<simpara><literal>standby_clone</literal></simpara>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<simpara><literal>standby_promote</literal></simpara>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<simpara><literal>standby_follow</literal></simpara>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<simpara><literal>standby_disconnect_manual</literal></simpara>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<simpara><literal>repmgrd_start</literal></simpara>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<simpara><literal>repmgrd_shutdown</literal></simpara>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<simpara><literal>repmgrd_failover_promote</literal></simpara>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<simpara><literal>repmgrd_failover_follow</literal></simpara>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<simpara><literal>bdr_failover</literal></simpara>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<simpara><literal>bdr_reconnect</literal></simpara>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<simpara><literal>bdr_recovery</literal></simpara>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<simpara><literal>bdr_register</literal></simpara>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<simpara><literal>bdr_unregister</literal></simpara>
|
||||
</listitem>
|
||||
|
||||
</itemizedlist>
|
||||
</para>
|
||||
<para>
|
||||
Note that under some circumstances (e.g. when no replication cluster primary
|
||||
could be located), it will not be possible to write an entry into the
|
||||
<literal>repmgr.events</literal>
|
||||
table, in which case executing a script via <varname>event_notification_command</varname>
|
||||
can serve as a fallback by generating some form of notification.
|
||||
</para>
|
||||
|
||||
|
||||
</chapter>
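The notification command described in the chapter above could hand its arguments to a small handler script. What follows is a minimal sketch, not part of repmgr: it assumes the command is configured as `/path/to/some/script %n %e %s "%t" "%d"`, so the script receives exactly five arguments. The names `Event`, `parse_event`, `format_alert` and the `ALERT_EVENTS` set are hypothetical, and filtering inside the script is only one possible design.

```python
import sys
from typing import NamedTuple, Optional, Sequence

# Hypothetical choice: successful events of these types are still reported.
# The names come from the event type list in the chapter above.
ALERT_EVENTS = {
    "repmgrd_failover_promote",
    "repmgrd_failover_follow",
    "standby_disconnect_manual",
}


class Event(NamedTuple):
    node_id: int
    event_type: str
    successful: bool
    timestamp: str
    details: str


def parse_event(argv: Sequence[str]) -> Event:
    """Parse the five positional arguments: %n %e %s "%t" "%d"."""
    if len(argv) != 5:
        raise ValueError(f"expected 5 arguments, got {len(argv)}")
    node_id, event_type, success, timestamp, details = argv
    # %s is documented as 1 (success) or 0 (failure).
    return Event(int(node_id), event_type, success == "1", timestamp, details)


def format_alert(event: Event) -> Optional[str]:
    """Return a one-line message for failures and ALERT_EVENTS, else None."""
    if event.successful and event.event_type not in ALERT_EVENTS:
        return None
    status = "OK" if event.successful else "FAILED"
    line = (
        f"[{event.timestamp}] node {event.node_id}: "
        f"{event.event_type} {status} {event.details}"
    )
    return line.rstrip()


def main(argv: Sequence[str]) -> int:
    message = format_alert(parse_event(argv))
    if message is not None:
        # Printing stands in for sending mail or calling a pager.
        print(message)
    return 0


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

Because the timestamp and details values are quoted in the command, each arrives as a single argument even when it contains spaces, which is what lets the handler rely on a fixed argument count.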
|
||||
@@ -43,10 +43,12 @@
|
||||
<!ENTITY promoting-standby SYSTEM "promoting-standby.sgml">
|
||||
<!ENTITY follow-new-primary SYSTEM "follow-new-primary.sgml">
|
||||
<!ENTITY switchover SYSTEM "switchover.sgml">
|
||||
<!ENTITY event-notifications SYSTEM "event-notifications.sgml">
|
||||
|
||||
<!ENTITY repmgrd-automatic-failover SYSTEM "repmgrd-automatic-failover.sgml">
|
||||
<!ENTITY repmgrd-configuration SYSTEM "repmgrd-configuration.sgml">
|
||||
<!ENTITY repmgrd-demonstration SYSTEM "repmgrd-demonstration.sgml">
|
||||
<!ENTITY repmgrd-monitoring SYSTEM "repmgrd-monitoring.sgml">
|
||||
|
||||
<!ENTITY repmgr-primary-register SYSTEM "repmgr-primary-register.sgml">
|
||||
<!ENTITY repmgr-primary-unregister SYSTEM "repmgr-primary-unregister.sgml">
|
||||
@@ -62,9 +64,9 @@
|
||||
<!ENTITY repmgr-cluster-show SYSTEM "repmgr-cluster-show.sgml">
|
||||
<!ENTITY repmgr-cluster-matrix SYSTEM "repmgr-cluster-matrix.sgml">
|
||||
<!ENTITY repmgr-cluster-crosscheck SYSTEM "repmgr-cluster-crosscheck.sgml">
|
||||
<!ENTITY repmgr-cluster-event SYSTEM "repmgr-cluster-event.sgml">
|
||||
<!ENTITY repmgr-cluster-cleanup SYSTEM "repmgr-cluster-cleanup.sgml">
|
||||
|
||||
|
||||
<!ENTITY appendix-signatures SYSTEM "appendix-signatures.sgml">
|
||||
|
||||
<!ENTITY bookindex SYSTEM "bookindex.sgml">
|
||||
|
||||
37
doc/repmgr-cluster-event.sgml
Normal file
@@ -0,0 +1,37 @@
|
||||
<chapter id="repmgr-cluster-event" xreflabel="repmgr cluster event">
|
||||
<indexterm>
|
||||
<primary>repmgr cluster event</primary>
|
||||
</indexterm>
|
||||
<title>repmgr cluster event</title>
|
||||
<para>
|
||||
This outputs a formatted list of cluster events, as stored in the
|
||||
<literal>repmgr.events</literal> table. Output is in reverse chronological order, and
|
||||
can be filtered with the following options:
|
||||
<itemizedlist spacing="compact" mark="bullet">
|
||||
<listitem>
|
||||
<simpara><literal>--all</literal>: outputs all entries</simpara>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<simpara><literal>--limit</literal>: set the maximum number of entries to output (default: 20)</simpara>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<simpara><literal>--node-id</literal>: restrict entries to node with this ID</simpara>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<simpara><literal>--node-name</literal>: restrict entries to node with this name</simpara>
|
||||
</listitem>
|
||||
<listitem>
|
||||
<simpara><literal>--event</literal>: filter specific event</simpara>
|
||||
</listitem>
|
||||
</itemizedlist>
|
||||
</para>
|
||||
<para>
|
||||
Example:
|
||||
<programlisting>
|
||||
$ repmgr -f /etc/repmgr.conf cluster event --event=standby_register
 Node ID | Name  | Event            | OK | Timestamp           | Details
---------+-------+------------------+----+---------------------+--------------------------------
 3       | node3 | standby_register | t  | 2017-08-17 10:28:55 | standby registration succeeded
 2       | node2 | standby_register | t  | 2017-08-17 10:28:53 | standby registration succeeded</programlisting>
|
||||
</para>
|
||||
</chapter>
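When the command above is driven from a script, the filter options can be assembled programmatically. The helper below is an illustrative sketch only: `cluster_event_argv` is a hypothetical name, only options listed in the chapter are used, and treating `--all` and `--limit` as mutually exclusive is an assumption made here rather than documented behaviour.

```python
from typing import List, Optional


def cluster_event_argv(
    config_file: str,
    *,
    event: Optional[str] = None,
    node_id: Optional[int] = None,
    limit: Optional[int] = None,
    show_all: bool = False,
) -> List[str]:
    """Build an argument list for `repmgr cluster event`."""
    argv = ["repmgr", "-f", config_file, "cluster", "event"]
    if show_all:
        # Assumption: --all makes a --limit value pointless, so it is skipped.
        argv.append("--all")
    elif limit is not None:
        if limit < 1:
            raise ValueError("limit must be a positive integer")
        argv.append(f"--limit={limit}")
    if node_id is not None:
        argv.append(f"--node-id={node_id}")
    if event is not None:
        argv.append(f"--event={event}")
    return argv
```

The returned list can be passed as-is to `subprocess.run`, which avoids shell quoting problems with event names or paths.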
|
||||
@@ -73,6 +73,7 @@
|
||||
&promoting-standby;
|
||||
&follow-new-primary;
|
||||
&switchover;
|
||||
&event-notifications;
|
||||
</part>
|
||||
|
||||
<part id="using-repmgrd">
|
||||
@@ -80,6 +81,7 @@
|
||||
&repmgrd-automatic-failover;
|
||||
&repmgrd-configuration;
|
||||
&repmgrd-demonstration;
|
||||
&repmgrd-monitoring;
|
||||
</part>
|
||||
|
||||
<part id="repmgr-command-reference">
|
||||
@@ -99,6 +101,7 @@
|
||||
&repmgr-cluster-show;
|
||||
&repmgr-cluster-matrix;
|
||||
&repmgr-cluster-crosscheck;
|
||||
&repmgr-cluster-event;
|
||||
&repmgr-cluster-cleanup;
|
||||
</part>
|
||||
|
||||
|
||||
71
doc/repmgrd-monitoring.sgml
Normal file
@@ -0,0 +1,71 @@
|
||||
<chapter id="repmgrd-monitoring">
|
||||
<title>Monitoring with repmgrd</title>
|
||||
<para>
|
||||
When <command>repmgrd</command> is running with the option <literal>monitoring_history=true</literal>,
it continuously writes standby node status information to the
<varname>monitoring_history</varname> table, providing a near-real-time
overview of replication status on all nodes in the cluster.
|
||||
</para>
|
||||
<para>
|
||||
The view <literal>replication_status</literal> shows the most recent state
|
||||
for each node, e.g.:
|
||||
<programlisting>
|
||||
repmgr=# select * from repmgr.replication_status;
-[ RECORD 1 ]-------------+------------------------------
primary_node_id           | 1
standby_node_id           | 2
standby_name              | node2
node_type                 | standby
active                    | t
last_monitor_time         | 2017-08-24 16:28:41.260478+09
last_wal_primary_location | 0/6D57A00
last_wal_standby_location | 0/5000000
replication_lag           | 29 MB
replication_time_lag      | 00:00:11.736163
apply_lag                 | 15 MB
communication_time_lag    | 00:00:01.365643</programlisting>
|
||||
</para>
|
||||
<para>
|
||||
The interval at which monitoring history is written is controlled by the
configuration parameter <varname>monitor_interval_secs</varname>;
the default is 2 seconds.
|
||||
</para>
|
||||
<para>
|
||||
As this can generate a large amount of monitoring data in the table
<literal>repmgr.monitoring_history</literal>, it's advisable to regularly
purge historical data using the <xref linkend="repmgr-cluster-cleanup">
command; use the <literal>-k/--keep-history</literal> option to
specify how many days' worth of data should be retained.
|
||||
</para>
|
||||
<para>
|
||||
It's possible to run <command>repmgrd</command> in monitoring
mode only (without automatic failover capability) for some or all
|
||||
nodes by setting <literal>failover=manual</literal> in the node's
|
||||
<filename>repmgr.conf</filename> file. In the event of the node's upstream failing,
|
||||
no failover action will be taken and the node will require manual intervention to
|
||||
be reattached to replication. If this occurs, an
|
||||
<link linkend="event-notifications">event notification</link>
|
||||
<varname>standby_disconnect_manual</varname> will be created.
|
||||
</para>
|
||||
<para>
|
||||
Note that when a standby node is not streaming directly from its upstream
node, e.g. when it is recovering WAL from an archive, <varname>apply_lag</varname> will always appear as
<literal>0 bytes</literal>.
|
||||
</para>
|
||||
<tip>
|
||||
<para>
|
||||
If monitoring history is enabled, the contents of the <literal>repmgr.monitoring_history</literal>
|
||||
table will be replicated to attached standbys. This means there will be a small but
|
||||
constant stream of replication activity which may not be desirable. To prevent
|
||||
this, convert the table to an <literal>UNLOGGED</literal> one with:
|
||||
<programlisting>
|
||||
ALTER TABLE repmgr.monitoring_history SET UNLOGGED;</programlisting>
|
||||
</para>
|
||||
<para>
|
||||
This will however mean that monitoring history will not be available on
|
||||
another node following a failover, and the view <literal>repmgr.replication_status</literal>
|
||||
will not work on standbys.
|
||||
</para>
|
||||
</tip>
|
||||
</chapter>
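Rows from the `repmgr.replication_status` view shown in the chapter above could feed a simple lag check. The sketch below is an assumption-laden illustration: it supposes the rows have already been fetched by a database driver as mappings keyed by column name, and that the driver returns the interval column `replication_time_lag` as a `datetime.timedelta` (as common PostgreSQL drivers for Python do). The function name `lagging_standbys` and the threshold are hypothetical.

```python
from datetime import timedelta
from typing import Any, Iterable, List, Mapping


def lagging_standbys(
    rows: Iterable[Mapping[str, Any]], max_lag: timedelta
) -> List[str]:
    """Return names of active standbys whose time lag exceeds max_lag."""
    names = []
    for row in rows:
        lag = row["replication_time_lag"]
        # Skip inactive nodes and rows where no lag value is available.
        if row["active"] and lag is not None and lag > max_lag:
            names.append(row["standby_name"])
    return names


# Row modelled on the sample record in the chapter above.
sample = [
    {
        "standby_name": "node2",
        "active": True,
        "replication_time_lag": timedelta(seconds=11, microseconds=736163),
    }
]
slow = lagging_standbys(sample, max_lag=timedelta(seconds=10))
```

Comparing on the time-based lag avoids parsing the human-readable size strings (such as `29 MB`) that the view reports for the byte-based columns.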
|
||||