mirror of
https://github.com/EnterpriseDB/repmgr.git
synced 2026-03-23 07:06:30 +00:00
429 lines
19 KiB
Plaintext
<chapter id="repmgrd-bdr">
|
|
<indexterm>
|
|
<primary>repmgrd</primary>
|
|
<secondary>BDR</secondary>
|
|
</indexterm>
|
|
|
|
<indexterm>
|
|
<primary>BDR</primary>
|
|
</indexterm>
|
|
|
|
<title>BDR failover with repmgrd</title>
|
|
<para>
|
|
&repmgr; 4.x provides support for monitoring a pair of BDR 2.x nodes and taking action in
|
|
case one of the nodes fails.
|
|
</para>
|
|
<note>
|
|
<simpara>
|
|
Due to the nature of BDR 1.x/2.x, it's only safe to use this solution for
|
|
a two-node scenario. Introducing additional nodes will create an inherent
|
|
risk of node desynchronisation if a node goes down without being cleanly
|
|
removed from the cluster.
|
|
</simpara>
|
|
</note>
|
|
<para>
|
|
In contrast to streaming replication, there's no concept of "promoting" a new
|
|
primary node with BDR. Instead, "failover" involves monitoring both nodes
|
|
with <application>repmgrd</application> and redirecting queries from the failed node to the remaining
|
|
active node. This can be done by using an
|
|
<link linkend="event-notifications">event notification</link> script
|
|
which is called by <application>repmgrd</application> to dynamically
|
|
reconfigure a proxy server/connection pooler such as <application>PgBouncer</application>.
|
|
</para>
|
|
|
|
<note>
|
|
<simpara>
|
|
This &repmgr; functionality is for BDR 2.x only running on PostgreSQL 9.4/9.6.
|
|
It is <emphasis>not</emphasis> required for later BDR versions.
|
|
</simpara>
|
|
</note>
|
|
|
|
<sect1 id="bdr-prerequisites" xreflabel="BDR prerequisites">
|
|
<title>Prerequisites</title>
|
|
<important>
|
|
<para>
|
|
This &repmgr; functionality is for BDR 2.x only running on PostgreSQL 9.4/9.6.
|
|
It is <emphasis>not</emphasis> required for later BDR versions.
|
|
</para>
|
|
</important>
|
|
<para>
|
|
&repmgr; 4 requires PostgreSQL 9.4 or 9.6 with the BDR 2 extension
|
|
enabled and configured for a two-node BDR network. &repmgr; 4 packages
|
|
must be installed on each node before attempting to configure
|
|
<application>repmgr</application>.
|
|
</para>
|
|
<note>
|
|
<simpara>
|
|
&repmgr; 4 will refuse to install if it detects more than two BDR nodes.
|
|
</simpara>
|
|
</note>
|
|
<para>
|
|
Application database connections <emphasis>must</emphasis> be passed through a proxy server or
connection pooler such as <application>PgBouncer</application>, and it must be possible to
reconfigure it dynamically from <application>repmgrd</application>. The example in this document
uses <application>PgBouncer</application>.
|
|
</para>
|
|
<para>
|
|
The proxy server / connection poolers must <emphasis>not</emphasis>
|
|
be installed on the database servers.
|
|
</para>
|
|
<para>
|
|
For this example, it's assumed password-less SSH connections are available
|
|
from the PostgreSQL servers to the servers where <application>PgBouncer</application>
|
|
runs, and that the user on those servers has permission to alter the
|
|
<application>PgBouncer</application> configuration files.
|
|
</para>
|
|
<para>
|
|
PostgreSQL connections must be possible between each node, and each node
|
|
must be able to connect to each PgBouncer instance.
|
|
</para>
|
|
</sect1>
|
|
|
|
<sect1 id="bdr-configuration" xreflabel="BDR configuration">
|
|
<title>Configuration</title>
|
|
<para>
|
|
A sample configuration for <filename>repmgr.conf</filename> on each
|
|
BDR node would look like this:
|
|
<programlisting>
|
|
# Node information
|
|
node_id=1
|
|
node_name='node1'
|
|
conninfo='host=node1 dbname=bdrtest user=repmgr connect_timeout=2'
|
|
data_directory='/var/lib/postgresql/data'
|
|
replication_type='bdr'
|
|
|
|
# Event notification configuration
|
|
event_notifications=bdr_failover
|
|
event_notification_command='/path/to/bdr-pgbouncer.sh %n %e %s "%c" "%a" >> /tmp/bdr-failover.log 2>&1'
|
|
|
|
# repmgrd options
|
|
monitor_interval_secs=5
|
|
reconnect_attempts=6
|
|
reconnect_interval=5</programlisting>
|
|
</para>
|
|
<para>
|
|
Adjust settings as appropriate; copy and adjust for the second node (particularly
|
|
the values <varname>node_id</varname>, <varname>node_name</varname>
|
|
and <varname>conninfo</varname>).
|
|
</para>
|
|
<para>
|
|
Note that the values provided for the <varname>conninfo</varname> string
|
|
must be valid for connections from <emphasis>both</emphasis> nodes in the
|
|
replication cluster. The database must be the BDR-enabled database.
|
|
</para>
|
|
<para>
|
|
If defined, the <varname>event_notifications</varname> parameter will restrict
|
|
execution of the script defined in <varname>event_notification_command</varname>
|
|
to the specified event(s).
|
|
</para>
|
|
<note>
|
|
<simpara>
|
|
<varname>event_notification_command</varname> is the script which does the actual "heavy lifting"
|
|
of reconfiguring the proxy server/connection pooler. It is fully
|
|
user-definable; see section <xref linkend="bdr-event-notification-command"> for a reference
|
|
implementation.
|
|
</simpara>
|
|
</note>
|
|
|
|
</sect1>
|
|
|
|
<sect1 id="bdr-repmgr-setup" xreflabel="repmgr setup with BDR">
|
|
<title>repmgr setup</title>
|
|
<para>
|
|
Register both nodes; example on <literal>node1</literal>:
|
|
<programlisting>
|
|
$ repmgr -f /etc/repmgr.conf bdr register
|
|
NOTICE: attempting to install extension "repmgr"
|
|
NOTICE: "repmgr" extension successfully installed
|
|
NOTICE: node record created for node 'node1' (ID: 1)
|
|
NOTICE: BDR node 1 registered (conninfo: host=node1 dbname=bdrtest user=repmgr)</programlisting>
|
|
</para>
|
|
<para>
|
|
and on <literal>node2</literal>:
|
|
<programlisting>
|
|
$ repmgr -f /etc/repmgr.conf bdr register
|
|
NOTICE: node record created for node 'node2' (ID: 2)
|
|
NOTICE: BDR node 2 registered (conninfo: host=node2 dbname=bdrtest user=repmgr)</programlisting>
|
|
</para>
|
|
<para>
|
|
The <literal>repmgr</literal> extension will be automatically created
|
|
when the first node is registered, and will be propagated to the second
|
|
node.
|
|
</para>
|
|
<important>
|
|
<simpara>
|
|
Ensure the &repmgr; package is available on both nodes before
|
|
attempting to register the first node.
|
|
</simpara>
|
|
</important>
|
|
<para>
|
|
At this point the metadata for both nodes has been created; executing
|
|
<xref linkend="repmgr-cluster-show"> (on either node) should produce output like this:
|
|
<programlisting>
|
|
$ repmgr -f /etc/repmgr.conf cluster show
|
|
 ID | Name  | Role | Status    | Upstream | Location | Connection string
----+-------+------+-----------+----------+----------+--------------------------------------------------------
 1  | node1 | bdr  | * running |          | default  | host=node1 dbname=bdrtest user=repmgr connect_timeout=2
 2  | node2 | bdr  | * running |          | default  | host=node2 dbname=bdrtest user=repmgr connect_timeout=2</programlisting>
|
|
</para>
|
|
<para>
|
|
Additionally, it's possible to display a log of significant events; executing
|
|
<xref linkend="repmgr-cluster-event"> (on either node) should produce output like this:
|
|
<programlisting>
|
|
$ repmgr -f /etc/repmgr.conf cluster event
|
|
 Node ID | Event        | OK | Timestamp           | Details
---------+--------------+----+---------------------+----------------------------------------------
 2       | bdr_register | t  | 2017-07-27 17:51:48 | node record created for node 'node2' (ID: 2)
 1       | bdr_register | t  | 2017-07-27 17:51:00 | node record created for node 'node1' (ID: 1)
|
|
</programlisting>
|
|
</para>
|
|
<para>
|
|
At this point there will only be records for the two node registrations (displayed here
|
|
in reverse chronological order).
|
|
</para>
|
|
</sect1>
|
|
|
|
<sect1 id="bdr-event-notification-command" xreflabel="Defining the BDR failover &quot;event_notification_command&quot;">
|
|
<title>Defining the BDR failover "event_notification_command"</title>
|
|
<para>
|
|
Key to "failover" execution is the <literal>event_notification_command</literal>,
|
|
which is a user-definable script specified in <filename>repmgr.conf</filename>
|
|
and which can use a &repmgr; <link linkend="event-notifications">event notification</link>
|
|
to reconfigure the proxy server / connection pooler so it points to the other, still-active node.
|
|
Details of the event will be passed as parameters to the script.
|
|
</para>
|
|
<para>
|
|
The following parameter placeholders are available for the script definition in <filename>repmgr.conf</filename>;
|
|
these will be replaced with the appropriate value when the script is executed:
|
|
</para>
|
|
|
|
<variablelist>
|
|
<varlistentry>
|
|
<term><option>%n</option></term>
|
|
<listitem>
|
|
<para>
|
|
node ID
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><option>%e</option></term>
|
|
<listitem>
|
|
<para>
|
|
event type
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><option>%s</option></term>
|
|
<listitem>
|
|
<para>
|
|
success (1 or 0)
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>%t</option></term>
|
|
<listitem>
|
|
<para>
|
|
timestamp
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
|
|
<varlistentry>
|
|
<term><option>%d</option></term>
|
|
<listitem>
|
|
<para>
|
|
details
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>%c</option></term>
|
|
<listitem>
|
|
<para>
|
|
conninfo string of the next available node (<varname>bdr_failover</varname> and <varname>bdr_recovery</varname>)
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
<varlistentry>
|
|
<term><option>%a</option></term>
|
|
<listitem>
|
|
<para>
|
|
name of the next available node (<varname>bdr_failover</varname> and <varname>bdr_recovery</varname>)
|
|
</para>
|
|
</listitem>
|
|
</varlistentry>
|
|
</variablelist>
|
|
|
|
<para>
|
|
Note that <literal>%c</literal> and <literal>%a</literal> are only provided with
|
|
particular failover events, in this case <varname>bdr_failover</varname>.
|
|
</para>
|
|
<para>
|
|
The provided sample script
|
|
(<literal><ulink url="https://raw.githubusercontent.com/2ndQuadrant/repmgr/master/scripts/bdr-pgbouncer.sh">scripts/bdr-pgbouncer.sh</ulink></literal>)
|
|
is configured as follows:
|
|
<programlisting>
|
|
event_notification_command='/path/to/bdr-pgbouncer.sh %n %e %s "%c" "%a"'</programlisting>
|
|
</para>
|
|
<para>
|
|
and parses the placeholder parameters like this:
|
|
<programlisting>
|
|
NODE_ID=$1
|
|
EVENT_TYPE=$2
|
|
SUCCESS=$3
|
|
NEXT_CONNINFO=$4
|
|
NEXT_NODE_NAME=$5</programlisting>
|
|
</para>
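<para>
As a rough illustration of the parsing step, the sketch below wraps the same five
positional parameters in a function that also validates them. The function name
<literal>parse_event_args</literal> and the validation rules are assumptions made for
this example; they are not taken from the sample script. Quoting <literal>"%c"</literal>
in <filename>repmgr.conf</filename> matters because the conninfo string contains spaces
and would otherwise be split into several arguments.
</para>

```shell
#!/bin/sh
# Hypothetical helper: check the five positional parameters that repmgrd
# passes for '%n %e %s "%c" "%a"' and print a one-line summary.
parse_event_args() {
    if [ "$#" -ne 5 ]; then
        echo "expected 5 arguments, got $#" >&2
        return 2
    fi

    NODE_ID=$1
    EVENT_TYPE=$2
    SUCCESS=$3
    NEXT_CONNINFO=$4
    NEXT_NODE_NAME=$5

    # The node ID must be a non-empty string of digits.
    case "$NODE_ID" in
        ''|*[!0-9]*) echo "invalid node ID: $NODE_ID" >&2; return 2 ;;
    esac

    # The success flag is documented as 1 or 0.
    case "$SUCCESS" in
        0|1) ;;
        *) echo "invalid success flag: $SUCCESS" >&2; return 2 ;;
    esac

    echo "node=$NODE_ID event=$EVENT_TYPE success=$SUCCESS next=$NEXT_NODE_NAME"
}
```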
|
|
<note>
|
|
<para>
|
|
The sample script also contains some hard-coded values for the <application>PgBouncer</application>
|
|
configuration for both nodes; these will need to be adjusted for your local environment
|
|
(ideally the scripts would be maintained as templates and generated by some
|
|
kind of provisioning system).
|
|
</para>
|
|
</note>
|
|
|
|
<para>
|
|
The script performs the following steps:
|
|
<itemizedlist spacing="compact" mark="bullet">
|
|
<listitem>
|
|
<simpara>pauses <application>PgBouncer</application> on all nodes</simpara>
|
|
</listitem>
|
|
<listitem>
|
|
<simpara>recreates the <application>PgBouncer</application> configuration file on each
|
|
node using the information provided by <application>repmgrd</application>
|
|
(primarily the <varname>conninfo</varname> string) to configure
|
|
<application>PgBouncer</application></simpara>
|
|
</listitem>
|
|
<listitem>
|
|
<simpara>reloads the <application>PgBouncer</application> configuration</simpara>
|
|
</listitem>
|
|
<listitem>
|
|
<simpara>executes the <command>RESUME</command> command (in <application>PgBouncer</application>)</simpara>
|
|
</listitem>
|
|
</itemizedlist>
|
|
</para>
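<para>
The steps above could be sketched roughly as follows. This is not the sample script
itself: the host names <literal>pgb1</literal> and <literal>pgb2</literal>, the port,
the database user, the configuration file path and the <literal>RUN</literal> dry-run
switch are all assumptions chosen for illustration. <command>PAUSE</command>,
<command>RELOAD</command> and <command>RESUME</command> are issued through
<application>PgBouncer</application>'s admin console via <application>psql</application>.
</para>

```shell
#!/bin/sh
# Sketch only: every value below is an assumption for illustration.
PGBOUNCER_HOSTS="pgb1 pgb2"
PGBOUNCER_PORT=6432
PGBOUNCER_DATABASE_INI="/etc/pgbouncer/pgbouncer.database.ini"
DBNAME="bdrtest"
RUN=${RUN-echo}    # unset: dry run that only prints each command

# Build a [databases] section that points at the surviving node.
render_databases_section() {
    printf '[databases]\n%s = %s\n' "$DBNAME" "$1"
}

# Send one admin command (PAUSE, RELOAD or RESUME) to a PgBouncer instance.
pgbouncer_admin() {
    $RUN psql -tc "$2" -h "$1" -p "$PGBOUNCER_PORT" -U postgres pgbouncer
}

reconfigure_pgbouncer() {
    next_conninfo=$1

    # 1. Pause PgBouncer on all nodes.
    for host in $PGBOUNCER_HOSTS; do
        pgbouncer_admin "$host" "PAUSE"
    done

    for host in $PGBOUNCER_HOSTS; do
        # 2. Recreate the configuration file from the conninfo string.
        render_databases_section "$next_conninfo" |
            $RUN ssh "$host" "cat > $PGBOUNCER_DATABASE_INI"
        # 3. Reload the configuration, then 4. resume client traffic.
        pgbouncer_admin "$host" "RELOAD"
        pgbouncer_admin "$host" "RESUME"
    done
}
```

<para>
With <literal>RUN</literal> unset the functions only print the commands they would run,
which makes it possible to review the sequence before pointing it at real
<application>PgBouncer</application> instances; setting <literal>RUN</literal> to an
empty string executes them.
</para>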
|
|
<para>
|
|
Following successful script execution, any connections to PgBouncer on the failed BDR node
|
|
will be redirected to the active node.
|
|
</para>
|
|
</sect1>
|
|
|
|
<sect1 id="bdr-monitoring-failover" xreflabel="Node monitoring and failover">
|
|
<title>Node monitoring and failover</title>
|
|
<para>
|
|
At the intervals specified by <varname>monitor_interval_secs</varname>
|
|
in <filename>repmgr.conf</filename>, <application>repmgrd</application>
|
|
will ping each node to check if it's available. If a node isn't available,
|
|
<application>repmgrd</application> will enter failover mode and check <varname>reconnect_attempts</varname>
|
|
times at intervals of <varname>reconnect_interval</varname> to confirm the node is definitely unreachable.
|
|
This buffer period is necessary to avoid false positives caused by transient
|
|
network outages.
|
|
</para>
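<para>
To get a feel for how long this buffer period can last, the sketch below adds up the
relevant settings. It assumes the node fails just after a monitoring check and that
every reconnection attempt is followed by a full sleep of
<varname>reconnect_interval</varname> seconds; connection timeouts are ignored, so the
result is an estimate and not a statement of how <application>repmgrd</application>
schedules its checks. The function name is hypothetical.
</para>

```shell
#!/bin/sh
# Rough upper bound, in seconds, between a node failing and repmgrd
# giving up on reconnecting to it (see the assumptions in the text above).
failure_detection_secs() {
    monitor_interval_secs=$1
    reconnect_attempts=$2
    reconnect_interval=$3
    echo $(( monitor_interval_secs + reconnect_attempts * reconnect_interval ))
}

# Values from the sample repmgr.conf in the Configuration section:
# monitor_interval_secs=5, reconnect_attempts=6, reconnect_interval=5
failure_detection_secs 5 6 5    # prints 35
```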
|
|
<para>
|
|
If the node is still unavailable, <application>repmgrd</application> will enter failover mode and execute
|
|
the script defined in <varname>event_notification_command</varname>; an entry will be logged
|
|
in the <literal>repmgr.events</literal> table and <application>repmgrd</application> will
|
|
(unless otherwise configured) resume monitoring of the node in "degraded" mode until it reappears.
|
|
</para>
|
|
<para>
|
|
<application>repmgrd</application> logfile output during a failover event will look something like this
|
|
on one node (usually the node which has failed, here <literal>node2</literal>):
|
|
<programlisting>
|
|
...
|
|
[2017-07-27 21:08:39] [INFO] starting continuous BDR node monitoring
|
|
[2017-07-27 21:08:39] [INFO] monitoring BDR replication status on node "node2" (ID: 2)
|
|
[2017-07-27 21:08:55] [INFO] monitoring BDR replication status on node "node2" (ID: 2)
|
|
[2017-07-27 21:09:11] [INFO] monitoring BDR replication status on node "node2" (ID: 2)
|
|
[2017-07-27 21:09:23] [WARNING] unable to connect to node node2 (ID 2)
|
|
[2017-07-27 21:09:23] [INFO] checking state of node 2, 0 of 5 attempts
|
|
[2017-07-27 21:09:23] [INFO] sleeping 1 seconds until next reconnection attempt
|
|
[2017-07-27 21:09:24] [INFO] checking state of node 2, 1 of 5 attempts
|
|
[2017-07-27 21:09:24] [INFO] sleeping 1 seconds until next reconnection attempt
|
|
[2017-07-27 21:09:25] [INFO] checking state of node 2, 2 of 5 attempts
|
|
[2017-07-27 21:09:25] [INFO] sleeping 1 seconds until next reconnection attempt
|
|
[2017-07-27 21:09:26] [INFO] checking state of node 2, 3 of 5 attempts
|
|
[2017-07-27 21:09:26] [INFO] sleeping 1 seconds until next reconnection attempt
|
|
[2017-07-27 21:09:27] [INFO] checking state of node 2, 4 of 5 attempts
|
|
[2017-07-27 21:09:27] [INFO] sleeping 1 seconds until next reconnection attempt
|
|
[2017-07-27 21:09:28] [WARNING] unable to reconnect to node 2 after 5 attempts
|
|
[2017-07-27 21:09:28] [NOTICE] setting node record for node 2 to inactive
|
|
[2017-07-27 21:09:28] [INFO] executing notification command for event "bdr_failover"
|
|
[2017-07-27 21:09:28] [DETAIL] command is:
|
|
/path/to/bdr-pgbouncer.sh 2 bdr_failover 1 "host=node1 dbname=bdrtest user=repmgr connect_timeout=2" "node1"
|
|
[2017-07-27 21:09:28] [INFO] node 'node2' (ID: 2) detected as failed; next available node is 'node1' (ID: 1)
|
|
[2017-07-27 21:09:28] [INFO] monitoring BDR replication status on node "node2" (ID: 2)
|
|
[2017-07-27 21:09:28] [DETAIL] monitoring node "node2" (ID: 2) in degraded mode
|
|
...</programlisting>
|
|
</para>
|
|
<para>
|
|
Output on the other node (<literal>node1</literal>) during the same event will look like this:
|
|
<programlisting>
|
|
...
|
|
[2017-07-27 21:08:35] [INFO] starting continuous BDR node monitoring
|
|
[2017-07-27 21:08:35] [INFO] monitoring BDR replication status on node "node1" (ID: 1)
|
|
[2017-07-27 21:08:51] [INFO] monitoring BDR replication status on node "node1" (ID: 1)
|
|
[2017-07-27 21:09:07] [INFO] monitoring BDR replication status on node "node1" (ID: 1)
|
|
[2017-07-27 21:09:23] [WARNING] unable to connect to node node2 (ID 2)
|
|
[2017-07-27 21:09:23] [INFO] checking state of node 2, 0 of 5 attempts
|
|
[2017-07-27 21:09:23] [INFO] sleeping 1 seconds until next reconnection attempt
|
|
[2017-07-27 21:09:24] [INFO] checking state of node 2, 1 of 5 attempts
|
|
[2017-07-27 21:09:24] [INFO] sleeping 1 seconds until next reconnection attempt
|
|
[2017-07-27 21:09:25] [INFO] checking state of node 2, 2 of 5 attempts
|
|
[2017-07-27 21:09:25] [INFO] sleeping 1 seconds until next reconnection attempt
|
|
[2017-07-27 21:09:26] [INFO] checking state of node 2, 3 of 5 attempts
|
|
[2017-07-27 21:09:26] [INFO] sleeping 1 seconds until next reconnection attempt
|
|
[2017-07-27 21:09:27] [INFO] checking state of node 2, 4 of 5 attempts
|
|
[2017-07-27 21:09:27] [INFO] sleeping 1 seconds until next reconnection attempt
|
|
[2017-07-27 21:09:28] [WARNING] unable to reconnect to node 2 after 5 attempts
|
|
[2017-07-27 21:09:28] [NOTICE] other node's repmgrd is handling failover
|
|
[2017-07-27 21:09:28] [INFO] monitoring BDR replication status on node "node1" (ID: 1)
|
|
[2017-07-27 21:09:28] [DETAIL] monitoring node "node2" (ID: 2) in degraded mode
|
|
...</programlisting>
|
|
</para>
|
|
<para>
|
|
This assumes only the PostgreSQL instance on <literal>node2</literal> has failed. In this case the
|
|
<application>repmgrd</application> instance running on <literal>node2</literal> has performed the failover. However, if
|
|
the entire server becomes unavailable, <application>repmgrd</application> on <literal>node1</literal> will perform
|
|
the failover.
|
|
</para>
|
|
</sect1>
|
|
<sect1 id="bdr-node-recovery" xreflabel="Node recovery">
|
|
<title>Node recovery</title>
|
|
<para>
|
|
Following failure of a BDR node, if the node subsequently becomes available again,
|
|
a <varname>bdr_recovery</varname> event will be generated. This could potentially be used to
|
|
reconfigure PgBouncer automatically to bring the node back into the available pool;
however, it would be prudent to manually verify the node's status before
|
|
exposing it to the application.
|
|
</para>
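<para>
One cautious way to act on this advice is to have the notification script treat the two
events differently: reconfigure on <varname>bdr_failover</varname>, but only record a
reminder on <varname>bdr_recovery</varname>. The sketch below is an assumption-laden
illustration, not part of the sample script: the function name and messages are
invented, it assumes <literal>%n</literal> identifies the node the event relates to, and
it assumes <varname>bdr_recovery</varname> has been added to
<varname>event_notifications</varname> so that the script is called for that event at all.
</para>

```shell
#!/bin/sh
# Hypothetical dispatcher: decide what to do for each event type.
# $1 is the event type (%e), $2 the node ID (%n).
handle_event() {
    event_type=$1
    node_id=$2
    case "$event_type" in
        bdr_failover)
            echo "reconfigure: route traffic away from node $node_id"
            ;;
        bdr_recovery)
            # Deliberately no automatic change; leave it to an operator.
            echo "hold: verify node $node_id manually before re-adding it to the pool"
            ;;
        *)
            echo "ignore: $event_type"
            ;;
    esac
}
```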
|
|
<para>
|
|
If the failed node comes back up and connects correctly, output similar to this
|
|
will be visible in the <application>repmgrd</application> log:
|
|
<programlisting>
|
|
[2017-07-27 21:25:30] [DETAIL] monitoring node "node2" (ID: 2) in degraded mode
|
|
[2017-07-27 21:25:46] [INFO] monitoring BDR replication status on node "node2" (ID: 2)
|
|
[2017-07-27 21:25:46] [DETAIL] monitoring node "node2" (ID: 2) in degraded mode
|
|
[2017-07-27 21:25:55] [INFO] active replication slot for node "node1" found after 1 seconds
|
|
[2017-07-27 21:25:55] [NOTICE] node "node2" (ID: 2) has recovered after 986 seconds</programlisting>
|
|
</para>
|
|
</sect1>
|
|
|
|
<sect1 id="bdr-complete-shutdown" xreflabel="Shutdown of both nodes">
|
|
<title>Shutdown of both nodes</title>
|
|
<para>
|
|
If both PostgreSQL instances are shut down, <application>repmgrd</application> will try to handle the
|
|
situation as gracefully as possible, though with no failover candidates available
|
|
there's not much it can do. Should this case ever occur, we recommend shutting
|
|
down <application>repmgrd</application> on both nodes and restarting it once the PostgreSQL instances
|
|
are running properly.
|
|
</para>
|
|
</sect1>
|
|
</chapter>
|
|
|