mirror of
https://github.com/EnterpriseDB/repmgr.git
synced 2026-03-26 08:36:30 +00:00
More updates
181
doc/event-notifications.sgml
Normal file
@@ -0,0 +1,181 @@
|
|||||||
|
<chapter id="event-notifications" xreflabel="event notifications">
|
||||||
|
<title>Event Notifications</title>
|
||||||
|
<para>
|
||||||
|
Each time <application>repmgr</application> or <application>repmgrd</application> performs a significant event, a record
of that event is written into the <literal>repmgr.events</literal> table together with
a timestamp, an indication of failure or success, and further details
if appropriate. This is useful for gaining an overview of events
affecting the replication cluster. Note, however, that this table is
advisory only and should be used in combination with the <application>repmgr</application>
and PostgreSQL logs to obtain details of any events.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
Example output after a primary was registered and a standby cloned
|
||||||
|
and registered:
|
||||||
|
<programlisting>
|
||||||
|
    repmgr=# SELECT * from repmgr.events ;
     node_id |      event       | successful |        event_timestamp        |                                       details
    ---------+------------------+------------+-------------------------------+-------------------------------------------------------------------------------------
           1 | primary_register | t          | 2016-01-08 15:04:39.781733+09 |
           2 | standby_clone    | t          | 2016-01-08 15:04:49.530001+09 | Cloned from host 'repmgr_node1', port 5432; backup method: pg_basebackup; --force: N
           2 | standby_register | t          | 2016-01-08 15:04:50.621292+09 |
    (3 rows)</programlisting>
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
Alternatively, use <xref linkend="repmgr-cluster-event"> to output a
|
||||||
|
formatted list of events.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
Additionally, event notifications can be passed to a user-defined program
or script which can take further action, e.g. sending email notifications.
This is done by setting the <varname>event_notification_command</varname> parameter in
<filename>repmgr.conf</filename>.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
This parameter accepts the following format placeholders:
|
||||||
|
</para>
|
||||||
|
|
||||||
|
<variablelist>
|
||||||
|
<varlistentry>
|
||||||
|
<term><option>%n</option></term>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
node ID
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
</varlistentry>
|
||||||
|
|
||||||
|
<varlistentry>
|
||||||
|
<term><option>%e</option></term>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
event type
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
</varlistentry>
|
||||||
|
|
||||||
|
<varlistentry>
|
||||||
|
<term><option>%s</option></term>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
success (1 or 0)
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
</varlistentry>
|
||||||
|
<varlistentry>
|
||||||
|
<term><option>%t</option></term>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
timestamp
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
</varlistentry>
|
||||||
|
|
||||||
|
<varlistentry>
|
||||||
|
<term><option>%d</option></term>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
details
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
</varlistentry>
|
||||||
|
</variablelist>
|
||||||
|
<para>
|
||||||
|
The values provided for <literal>%t</literal> and <literal>%d</literal>
are likely to contain spaces, so they should be quoted in the command
configuration, e.g.:
|
||||||
|
<programlisting>
|
||||||
|
event_notification_command='/path/to/some/script %n %e %s "%t" "%d"'
|
||||||
|
</programlisting>
|
||||||
|
</para>
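<para>
As an illustration, a script receiving the placeholders in the order shown
above might look like the following minimal sketch. The function name
<literal>format_event</literal> and the message layout are assumptions made
for this example and are not part of <application>repmgr</application>; a real
script would typically hand the formatted line to a mailer or logging tool.
</para>

```shell
#!/bin/sh
# Hypothetical handler for event_notification_command, assumed to be invoked as:
#   /path/to/some/script %n %e %s "%t" "%d"
# The function name and output layout are illustrative, not defined by repmgr.

format_event() {
    node_id=$1      # %n: node ID
    event=$2        # %e: event type
    success=$3      # %s: 1 on success, 0 on failure
    timestamp=$4    # %t: timestamp (contains spaces, hence the quoting)
    details=$5      # %d: details (may be empty or contain spaces)

    if [ "$success" = "1" ]; then
        status="OK"
    else
        status="FAILED"
    fi

    printf '[%s] node %s: %s %s' "$timestamp" "$node_id" "$event" "$status"
    # Append the details only when some were supplied.
    if [ -n "$details" ]; then
        printf ' (%s)' "$details"
    fi
    printf '\n'
}

# When called with all five arguments, print the formatted event.
if [ "$#" -eq 5 ]; then
    format_event "$@"
fi
```

<para>
Because <literal>%t</literal> and <literal>%d</literal> are quoted in the
configuration, each arrives as a single argument even when it contains spaces.
</para>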
|
||||||
|
<para>
|
||||||
|
Additionally, the following format placeholders are available for the event
type <varname>bdr_failover</varname> and optionally <varname>bdr_recovery</varname>:
|
||||||
|
</para>
|
||||||
|
<variablelist>
|
||||||
|
<varlistentry>
|
||||||
|
<term><option>%c</option></term>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
conninfo string of the next available node
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
</varlistentry>
|
||||||
|
<varlistentry>
|
||||||
|
<term><option>%a</option></term>
|
||||||
|
<listitem>
|
||||||
|
<para>
|
||||||
|
name of the next available node
|
||||||
|
</para>
|
||||||
|
</listitem>
|
||||||
|
</varlistentry>
|
||||||
|
</variablelist>
|
||||||
|
<para>
|
||||||
|
These should always be quoted.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
By default, all notification types are passed to the designated script.
Notifications can instead be restricted to explicitly named types from the following list:
|
||||||
|
<itemizedlist spacing="compact" mark="bullet">
|
||||||
|
|
||||||
|
<listitem>
|
||||||
|
<simpara><literal>primary_register</literal></simpara>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<simpara><literal>standby_register</literal></simpara>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<simpara><literal>standby_unregister</literal></simpara>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<simpara><literal>standby_clone</literal></simpara>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<simpara><literal>standby_promote</literal></simpara>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<simpara><literal>standby_follow</literal></simpara>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<simpara><literal>standby_disconnect_manual</literal></simpara>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<simpara><literal>repmgrd_start</literal></simpara>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<simpara><literal>repmgrd_shutdown</literal></simpara>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<simpara><literal>repmgrd_failover_promote</literal></simpara>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<simpara><literal>repmgrd_failover_follow</literal></simpara>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<simpara><literal>bdr_failover</literal></simpara>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<simpara><literal>bdr_reconnect</literal></simpara>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<simpara><literal>bdr_recovery</literal></simpara>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<simpara><literal>bdr_register</literal></simpara>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<simpara><literal>bdr_unregister</literal></simpara>
|
||||||
|
</listitem>
|
||||||
|
|
||||||
|
</itemizedlist>
|
||||||
|
</para>
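<para>
A notification script can also decide for itself which event types to act on.
The sketch below is independent of any filtering done by
<application>repmgr</application>; the function name
<literal>should_alert</literal> and the choice of event types are assumptions
made purely for illustration.
</para>

```shell
# Hypothetical script-side filter: succeed only for event types that this
# example treats as worth alerting on. The selection is an assumption.
should_alert() {
    case "$1" in
        repmgrd_failover_promote|repmgrd_failover_follow|standby_disconnect_manual)
            return 0 ;;   # events an operator probably wants to hear about
        *)
            return 1 ;;   # every other event type is ignored by this sketch
    esac
}
```

<para>
A handler could call <literal>should_alert "$2" || exit 0</literal> before doing
any work, where <literal>$2</literal> is assumed to hold the <literal>%e</literal> value.
</para>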
|
||||||
|
<para>
|
||||||
|
Note that under some circumstances (e.g. when no replication cluster primary
|
||||||
|
could be located), it will not be possible to write an entry into the
|
||||||
|
<literal>repmgr.events</literal>
|
||||||
|
table, in which case executing a script via <varname>event_notification_command</varname>
|
||||||
|
can serve as a fallback by generating some form of notification.
|
||||||
|
</para>
|
||||||
|
|
||||||
|
|
||||||
|
</chapter>
|
||||||
@@ -43,10 +43,12 @@
|
|||||||
<!ENTITY promoting-standby SYSTEM "promoting-standby.sgml">
|
<!ENTITY promoting-standby SYSTEM "promoting-standby.sgml">
|
||||||
<!ENTITY follow-new-primary SYSTEM "follow-new-primary.sgml">
|
<!ENTITY follow-new-primary SYSTEM "follow-new-primary.sgml">
|
||||||
<!ENTITY switchover SYSTEM "switchover.sgml">
|
<!ENTITY switchover SYSTEM "switchover.sgml">
|
||||||
|
<!ENTITY event-notifications SYSTEM "event-notifications.sgml">
|
||||||
|
|
||||||
<!ENTITY repmgrd-automatic-failover SYSTEM "repmgrd-automatic-failover.sgml">
|
<!ENTITY repmgrd-automatic-failover SYSTEM "repmgrd-automatic-failover.sgml">
|
||||||
<!ENTITY repmgrd-configuration SYSTEM "repmgrd-configuration.sgml">
|
<!ENTITY repmgrd-configuration SYSTEM "repmgrd-configuration.sgml">
|
||||||
<!ENTITY repmgrd-demonstration SYSTEM "repmgrd-demonstration.sgml">
|
<!ENTITY repmgrd-demonstration SYSTEM "repmgrd-demonstration.sgml">
|
||||||
|
<!ENTITY repmgrd-monitoring SYSTEM "repmgrd-monitoring.sgml">
|
||||||
|
|
||||||
<!ENTITY repmgr-primary-register SYSTEM "repmgr-primary-register.sgml">
|
<!ENTITY repmgr-primary-register SYSTEM "repmgr-primary-register.sgml">
|
||||||
<!ENTITY repmgr-primary-unregister SYSTEM "repmgr-primary-unregister.sgml">
|
<!ENTITY repmgr-primary-unregister SYSTEM "repmgr-primary-unregister.sgml">
|
||||||
@@ -62,9 +64,9 @@
|
|||||||
<!ENTITY repmgr-cluster-show SYSTEM "repmgr-cluster-show.sgml">
|
<!ENTITY repmgr-cluster-show SYSTEM "repmgr-cluster-show.sgml">
|
||||||
<!ENTITY repmgr-cluster-matrix SYSTEM "repmgr-cluster-matrix.sgml">
|
<!ENTITY repmgr-cluster-matrix SYSTEM "repmgr-cluster-matrix.sgml">
|
||||||
<!ENTITY repmgr-cluster-crosscheck SYSTEM "repmgr-cluster-crosscheck.sgml">
|
<!ENTITY repmgr-cluster-crosscheck SYSTEM "repmgr-cluster-crosscheck.sgml">
|
||||||
|
<!ENTITY repmgr-cluster-event SYSTEM "repmgr-cluster-event.sgml">
|
||||||
<!ENTITY repmgr-cluster-cleanup SYSTEM "repmgr-cluster-cleanup.sgml">
|
<!ENTITY repmgr-cluster-cleanup SYSTEM "repmgr-cluster-cleanup.sgml">
|
||||||
|
|
||||||
|
|
||||||
<!ENTITY appendix-signatures SYSTEM "appendix-signatures.sgml">
|
<!ENTITY appendix-signatures SYSTEM "appendix-signatures.sgml">
|
||||||
|
|
||||||
<!ENTITY bookindex SYSTEM "bookindex.sgml">
|
<!ENTITY bookindex SYSTEM "bookindex.sgml">
|
||||||
|
|||||||
37
doc/repmgr-cluster-event.sgml
Normal file
@@ -0,0 +1,37 @@
|
|||||||
|
<chapter id="repmgr-cluster-event" xreflabel="repmgr cluster event">
|
||||||
|
<indexterm>
|
||||||
|
<primary>repmgr cluster event</primary>
|
||||||
|
</indexterm>
|
||||||
|
<title>repmgr cluster event</title>
|
||||||
|
<para>
|
||||||
|
This outputs a formatted list of cluster events, as stored in the
|
||||||
|
<literal>repmgr.events</literal> table. Output is in reverse chronological order, and
|
||||||
|
can be filtered with the following options:
|
||||||
|
<itemizedlist spacing="compact" mark="bullet">
|
||||||
|
<listitem>
|
||||||
|
<simpara><literal>--all</literal>: outputs all entries</simpara>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<simpara><literal>--limit</literal>: set the maximum number of entries to output (default: 20)</simpara>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<simpara><literal>--node-id</literal>: restrict entries to node with this ID</simpara>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<simpara><literal>--node-name</literal>: restrict entries to node with this name</simpara>
|
||||||
|
</listitem>
|
||||||
|
<listitem>
|
||||||
|
<simpara><literal>--event</literal>: restrict entries to this event type</simpara>
|
||||||
|
</listitem>
|
||||||
|
</itemizedlist>
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
Example:
|
||||||
|
<programlisting>
|
||||||
|
$ repmgr -f /etc/repmgr.conf cluster event --event=standby_register
|
||||||
|
     Node ID | Name  | Event            | OK | Timestamp           | Details
    ---------+-------+------------------+----+---------------------+--------------------------------
     3       | node3 | standby_register | t  | 2017-08-17 10:28:55 | standby registration succeeded
     2       | node2 | standby_register | t  | 2017-08-17 10:28:53 | standby registration succeeded</programlisting>
|
||||||
|
</para>
|
||||||
|
</chapter>
|
||||||
@@ -73,6 +73,7 @@
|
|||||||
&promoting-standby;
|
&promoting-standby;
|
||||||
&follow-new-primary;
|
&follow-new-primary;
|
||||||
&switchover;
|
&switchover;
|
||||||
|
&event-notifications;
|
||||||
</part>
|
</part>
|
||||||
|
|
||||||
<part id="using-repmgrd">
|
<part id="using-repmgrd">
|
||||||
@@ -80,6 +81,7 @@
|
|||||||
&repmgrd-automatic-failover;
|
&repmgrd-automatic-failover;
|
||||||
&repmgrd-configuration;
|
&repmgrd-configuration;
|
||||||
&repmgrd-demonstration;
|
&repmgrd-demonstration;
|
||||||
|
&repmgrd-monitoring;
|
||||||
</part>
|
</part>
|
||||||
|
|
||||||
<part id="repmgr-command-reference">
|
<part id="repmgr-command-reference">
|
||||||
@@ -99,6 +101,7 @@
|
|||||||
&repmgr-cluster-show;
|
&repmgr-cluster-show;
|
||||||
&repmgr-cluster-matrix;
|
&repmgr-cluster-matrix;
|
||||||
&repmgr-cluster-crosscheck;
|
&repmgr-cluster-crosscheck;
|
||||||
|
&repmgr-cluster-event;
|
||||||
&repmgr-cluster-cleanup;
|
&repmgr-cluster-cleanup;
|
||||||
</part>
|
</part>
|
||||||
|
|
||||||
|
|||||||
71
doc/repmgrd-monitoring.sgml
Normal file
@@ -0,0 +1,71 @@
|
|||||||
|
<chapter id="repmgrd-monitoring">
|
||||||
|
<title>Monitoring with repmgrd</title>
|
||||||
|
<para>
|
||||||
|
When <application>repmgrd</application> is running with the option <literal>monitoring_history=true</literal>,
it will continuously write standby node status information to the
<varname>monitoring_history</varname> table, providing a near-real-time
overview of replication status on all nodes
in the cluster.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
The view <literal>replication_status</literal> shows the most recent state
|
||||||
|
for each node, e.g.:
|
||||||
|
<programlisting>
|
||||||
|
repmgr=# select * from repmgr.replication_status;
|
||||||
|
    -[ RECORD 1 ]-------------+------------------------------
    primary_node_id           | 1
    standby_node_id           | 2
    standby_name              | node2
    node_type                 | standby
    active                    | t
    last_monitor_time         | 2017-08-24 16:28:41.260478+09
    last_wal_primary_location | 0/6D57A00
    last_wal_standby_location | 0/5000000
    replication_lag           | 29 MB
    replication_time_lag      | 00:00:11.736163
    apply_lag                 | 15 MB
    communication_time_lag    | 00:00:01.365643</programlisting>
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
The interval at which monitoring history is written is controlled by the
configuration parameter <varname>monitor_interval_secs</varname>;
the default is 2 seconds.
|
||||||
|
</para>
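<para>
To get a feel for how quickly the table can grow, the following rough
calculation estimates the number of rows written per day. It assumes one row
per monitored standby per interval, which is an assumption of this sketch
rather than something stated above; the function name
<literal>rows_per_day</literal> is likewise hypothetical.
</para>

```shell
# Rough estimate of rows written to repmgr.monitoring_history per day.
# Assumption: one row per monitored standby per monitoring interval.
rows_per_day() {
    interval_secs=$1   # value of monitor_interval_secs
    standbys=$2        # number of standbys with monitoring history enabled
    echo $(( 86400 / interval_secs * standbys ))
}

rows_per_day 2 1   # default interval, one standby
rows_per_day 2 3   # default interval, three standbys
```

<para>
Under that assumption, the default interval of 2 seconds yields 43200 rows per
standby per day.
</para>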
|
||||||
|
<para>
|
||||||
|
As this can generate a large amount of monitoring data in the table
<literal>repmgr.monitoring_history</literal>, it's advisable to regularly
purge historical data using the <xref linkend="repmgr-cluster-cleanup">
command; use the <literal>-k/--keep-history</literal> option to
specify how many days' worth of data should be retained.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
It's possible to run <command>repmgrd</command> in monitoring
mode only (without automatic failover capability) for some or all
nodes by setting <literal>failover=manual</literal> in the node's
<filename>repmgr.conf</filename> file. If the node's upstream fails,
no failover action will be taken and the node will require manual intervention to
be reattached to replication. If this occurs, an
<link linkend="event-notifications">event notification</link>
<varname>standby_disconnect_manual</varname> will be created.
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
Note that when a standby node is not streaming directly from its upstream
|
||||||
|
node, e.g. recovering WAL from an archive, <varname>apply_lag</varname> will always appear as
|
||||||
|
<literal>0 bytes</literal>.
|
||||||
|
</para>
|
||||||
|
<tip>
|
||||||
|
<para>
|
||||||
|
If monitoring history is enabled, the contents of the <literal>repmgr.monitoring_history</literal>
|
||||||
|
table will be replicated to attached standbys. This means there will be a small but
|
||||||
|
constant stream of replication activity which may not be desirable. To prevent
|
||||||
|
this, convert the table to an <literal>UNLOGGED</literal> one with:
|
||||||
|
<programlisting>
|
||||||
|
ALTER TABLE repmgr.monitoring_history SET UNLOGGED;</programlisting>
|
||||||
|
</para>
|
||||||
|
<para>
|
||||||
|
This will however mean that monitoring history will not be available on
|
||||||
|
another node following a failover, and the view <literal>repmgr.replication_status</literal>
|
||||||
|
will not work on standbys.
|
||||||
|
</para>
|
||||||
|
</tip>
|
||||||
|
</chapter>
|
||||||