BDR failover with repmgrd
=========================

`repmgr 4` provides support for monitoring BDR nodes and taking action in case
one of the nodes fails.

*NOTE* Due to the nature of BDR, it's only safe to use this solution for
a two-node scenario. Introducing additional nodes will create an inherent
risk of node desynchronisation if a node goes down without being cleanly
removed from the cluster.

In contrast to streaming replication, there's no concept of "promoting" a new
primary node with BDR. Instead, "failover" involves monitoring both nodes
with `repmgrd` and redirecting queries from the failed node to the remaining
active node. This can be done by using the event notification facility provided
by `repmgrd` to dynamically reconfigure a proxy server/connection pooler such
as PgBouncer.

Prerequisites
-------------

`repmgr 4` requires PostgreSQL 9.6 with the BDR 2 extension enabled and
configured for a two-node BDR network. `repmgr 4` packages
must be installed on each node before attempting to configure repmgr.

*NOTE* `repmgr 4` will refuse to install if it detects more than two
BDR nodes.

Application database connections *must* be passed through a proxy server/
connection pooler such as PgBouncer, and it must be possible to dynamically
reconfigure that from `repmgrd`. The example demonstrated in this document
uses PgBouncer.

The proxy server/connection pooler must not be installed on the database
servers.

For this example, it's assumed password-less SSH connections are available
from the PostgreSQL servers to the servers where PgBouncer runs, and
that the user on those servers has permission to alter the PgBouncer
configuration files.

PostgreSQL connections must be possible between each node, and each node
must be able to connect to each PgBouncer instance.

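As an illustrative sketch of what this implies, a minimal PgBouncer configuration
routing the application database via "node1" might look like the following; the
database name `bdrtest` matches the examples below, while the port, authentication
settings and paths are assumptions to be adapted to your environment:

    ; /etc/pgbouncer/pgbouncer.ini (illustrative sketch)
    [databases]
    ; applications connect to "bdrtest" on the PgBouncer host; PgBouncer
    ; forwards the connection to the currently active BDR node
    bdrtest = host=node1 port=5432 dbname=bdrtest

    [pgbouncer]
    listen_addr = *
    listen_port = 6432
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    ; the failover script will connect as an admin user to pause/resume
    admin_users = postgres

It is this `[databases]` entry which the failover mechanism described below
rewrites so that it points at the remaining active node.
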
Configuration
-------------

Sample configuration for `repmgr.conf`:

    node_id=1
    node_name='node1'
    conninfo='host=node1 dbname=bdrtest user=repmgr connect_timeout=2'
    replication_type='bdr'

    event_notifications=bdr_failover
    event_notification_command='/path/to/bdr-pgbouncer.sh %n %e %s "%c" "%a" >> /tmp/bdr-failover.log 2>&1'

    # repmgrd options
    monitor_interval_secs=5
    reconnect_attempts=6
    reconnect_interval=5

Adjust settings as appropriate; copy and adjust for the second node (particularly
the values `node_id`, `node_name` and `conninfo`).

Note that the values provided for the `conninfo` string must be valid for
connections from *both* nodes in the cluster. The database must be the BDR
database.

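For illustration, the corresponding `repmgr.conf` on the second node would differ
only in those three values (assuming the host name `node2`, matching the
`cluster show` output later in this document):

    node_id=2
    node_name='node2'
    conninfo='host=node2 dbname=bdrtest user=repmgr connect_timeout=2'
    replication_type='bdr'

    # remaining settings identical to node1
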
If defined, `event_notifications` will restrict execution of `event_notification_command`
to the specified events.

`event_notification_command` is the script which does the actual "heavy lifting"
of reconfiguring the proxy server/connection pooler. It is fully user-definable;
a sample implementation is documented below.

repmgr user permissions
-----------------------

`repmgr` will create an extension in the BDR database containing objects
for administering `repmgr` metadata. The user defined in the `conninfo`
setting must be able to access all objects. Additionally, superuser permissions
are required to install the `repmgr` extension. The easiest way to do this
is to create the `repmgr` user as a superuser; however, if this is not
desirable, the `repmgr` user can be created as a normal user and a
superuser specified with `--superuser` when registering a BDR node.

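A minimal sketch of the two approaches (run the SQL as an existing superuser; the
superuser name `postgres` below is an assumption for illustration):

    -- either: create the repmgr user as a superuser
    CREATE USER repmgr SUPERUSER;

    -- or: create repmgr as a normal user
    CREATE USER repmgr;

With the second approach, name an existing superuser when registering the node:

    $ repmgr -f /etc/repmgr.conf bdr register --superuser=postgres
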
repmgr setup
------------

Register both nodes:

    $ repmgr -f /etc/repmgr.conf bdr register
    NOTICE: attempting to install extension "repmgr"
    NOTICE: "repmgr" extension successfully installed
    NOTICE: node record created for node 'node1' (ID: 1)
    NOTICE: BDR node 1 registered (conninfo: host=localhost dbname=bdrtest user=repmgr port=5501)

    $ repmgr -f /etc/repmgr.conf bdr register
    NOTICE: node record created for node 'node2' (ID: 2)
    NOTICE: BDR node 2 registered (conninfo: host=localhost dbname=bdrtest user=repmgr port=5502)

The `repmgr` extension will be automatically created when the first
node is registered, and will be propagated to the second node.

*IMPORTANT* Ensure the repmgr package is available on both nodes before
attempting to register the first node.

At this point the metadata for both nodes has been created; executing
`repmgr cluster show` (on either node) should produce output like this:

    $ repmgr -f /etc/repmgr.conf cluster show
     ID | Name  | Role | Status    | Upstream | Connection string
    ----+-------+------+-----------+----------+----------------------------------------------------------
     1  | node1 | bdr  | * running |          | host=node1 dbname=bdrtest user=repmgr connect_timeout=2
     2  | node2 | bdr  | * running |          | host=node2 dbname=bdrtest user=repmgr connect_timeout=2

Additionally it's possible to see a log of significant events; so far
this will only record the two node registrations (in reverse chronological order):

     Node ID | Event        | OK | Timestamp           | Details
    ---------+--------------+----+---------------------+----------------------------------------------
     2       | bdr_register | t  | 2017-07-27 17:51:48 | node record created for node 'node2' (ID: 2)
     1       | bdr_register | t  | 2017-07-27 17:51:00 | node record created for node 'node1' (ID: 1)

Defining the "event_notification_command"
-----------------------------------------

Key to "failover" execution is the `event_notification_command`, which is a
user-definable script which should reconfigure the proxy server/
connection pooler.

Each time `repmgr` (or `repmgrd`) records an event, it can optionally
execute the script defined in `event_notification_command` to
take further action; details of the event will be passed as parameters.
The following placeholders are available to the script:

    %n - node ID
    %e - event type
    %s - success (1 or 0)
    %t - timestamp
    %d - details
    %c - conninfo string of the next available node
    %a - name of the next available node

Note that `%c` and `%a` will only be provided during `bdr_failover`
events, which is what is of interest here.

The provided sample script (`scripts/bdr-pgbouncer.sh`) is configured like
this:

    event_notification_command='/path/to/bdr-pgbouncer.sh %n %e %s "%c" "%a"'

and parses the configured parameters like this:

    NODE_ID=$1
    EVENT_TYPE=$2
    SUCCESS=$3
    NEXT_CONNINFO=$4
    NEXT_NODE_NAME=$5

It also contains some hard-coded values for the PgBouncer configuration on
both nodes; these will of course need to be adjusted for your local environment
(ideally the scripts would be maintained as templates and generated by some
kind of provisioning system).

The script performs the following steps (a simplified sketch follows the list):

- pauses PgBouncer on all nodes
- recreates the PgBouncer configuration file on each node using the information
  provided by `repmgrd` (mainly the `conninfo` string) to configure PgBouncer
  to point to the remaining node
- reloads the PgBouncer configuration
- resumes PgBouncer

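Below is a simplified, illustrative sketch of such a script; it is not the shipped
`scripts/bdr-pgbouncer.sh`, and the PgBouncer host names, port, admin user and
file paths are assumptions for the example:

    #!/bin/bash
    # Illustrative sketch of an event notification script for "bdr_failover".
    # Assumptions (adjust for your environment): two PgBouncer hosts, PgBouncer
    # listening on port 6432, admin console access as user "postgres", and
    # passwordless SSH from this host to the PgBouncer hosts.

    NODE_ID=$1
    EVENT_TYPE=$2
    SUCCESS=$3
    NEXT_CONNINFO=$4
    NEXT_NODE_NAME=$5

    PGBOUNCER_HOSTS="pgbouncer1 pgbouncer2"
    PGBOUNCER_PORT=6432
    PGBOUNCER_DATABASE=bdrtest
    PGBOUNCER_INI=/etc/pgbouncer/pgbouncer.ini

    # 1. Pause PgBouncer on all hosts so no new connections are handed out
    for host in $PGBOUNCER_HOSTS; do
        psql -h "$host" -p $PGBOUNCER_PORT -U postgres pgbouncer -c "PAUSE"
    done

    # 2. Rewrite the configuration on each PgBouncer host so the pooled
    #    database points at the remaining active node ($NEXT_CONNINFO)
    for host in $PGBOUNCER_HOSTS; do
        ssh "$host" "cat > $PGBOUNCER_INI" <<EOF
    [databases]
    $PGBOUNCER_DATABASE = $NEXT_CONNINFO

    [pgbouncer]
    listen_addr = *
    listen_port = $PGBOUNCER_PORT
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    admin_users = postgres
    EOF
    done

    # 3. Reload the configuration and resume connection handling
    for host in $PGBOUNCER_HOSTS; do
        psql -h "$host" -p $PGBOUNCER_PORT -U postgres pgbouncer -c "RELOAD"
        psql -h "$host" -p $PGBOUNCER_PORT -U postgres pgbouncer -c "RESUME"
    done

A production version would also need error handling, and should verify that
`EVENT_TYPE` is `bdr_failover` if `event_notifications` is not restricted to
that event.
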
From that point, any connections to PgBouncer on the failed BDR node will be redirected
to the active node.


repmgrd
-------

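`repmgrd` must be running on both nodes for monitoring and failover to take place.
As a minimal sketch, assuming the configuration file location used in the preceding
examples, it would be started on each node with:

    $ repmgrd -f /etc/repmgr.conf

The exact daemonising and logging options depend on your installation and packaging;
consult `repmgrd --help` for the options available.
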
Node monitoring and failover
----------------------------

At the intervals specified by `monitor_interval_secs` in `repmgr.conf`, `repmgrd`
will ping each node to check if it's available. If a node isn't available,
`repmgrd` will enter failover mode and check `reconnect_attempts` times
at intervals of `reconnect_interval` to confirm the node is definitely unreachable.
This buffer period is necessary to avoid false positives caused by transient
network outages. With the sample settings above (`reconnect_attempts=6`,
`reconnect_interval=5`), roughly 30 seconds will elapse between the first
failed check and the decision that the node is down.

If the node is still unavailable, `repmgrd` will proceed with failover and execute
the script defined in `event_notification_command`; an entry will be logged
in the `repmgr.events` table and `repmgrd` will (unless otherwise configured)
resume monitoring of the node in "degraded" mode until it reappears.

`repmgrd` logfile output during a failover event will look something like this
on one node (usually the node which has failed, here "node2"):

    ...
    [2017-07-27 21:08:39] [INFO] starting continuous BDR node monitoring
    [2017-07-27 21:08:39] [INFO] monitoring BDR replication status on node "node2" (ID: 2)
    [2017-07-27 21:08:55] [INFO] monitoring BDR replication status on node "node2" (ID: 2)
    [2017-07-27 21:09:11] [INFO] monitoring BDR replication status on node "node2" (ID: 2)
    [2017-07-27 21:09:23] [WARNING] unable to connect to node node2 (ID 2)
    [2017-07-27 21:09:23] [INFO] checking state of node 2, 0 of 5 attempts
    [2017-07-27 21:09:23] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-07-27 21:09:24] [INFO] checking state of node 2, 1 of 5 attempts
    [2017-07-27 21:09:24] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-07-27 21:09:25] [INFO] checking state of node 2, 2 of 5 attempts
    [2017-07-27 21:09:25] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-07-27 21:09:26] [INFO] checking state of node 2, 3 of 5 attempts
    [2017-07-27 21:09:26] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-07-27 21:09:27] [INFO] checking state of node 2, 4 of 5 attempts
    [2017-07-27 21:09:27] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-07-27 21:09:28] [WARNING] unable to reconnect to node 2 after 5 attempts
    [2017-07-27 21:09:28] [NOTICE] setting node record for node 2 to inactive
    [2017-07-27 21:09:28] [INFO] executing notification command for event "bdr_failover"
    [2017-07-27 21:09:28] [DETAIL] command is:
      /path/to/bdr-pgbouncer.sh 2 bdr_failover 1 "host=host=node1 dbname=bdrtest user=repmgr connect_timeout=2" "node1"
    [2017-07-27 21:09:28] [INFO] node 'node2' (ID: 2) detected as failed; next available node is 'node1' (ID: 1)
    [2017-07-27 21:09:28] [INFO] monitoring BDR replication status on node "node2" (ID: 2)
    [2017-07-27 21:09:28] [DETAIL] monitoring node "node2" (ID: 2) in degraded mode
    ...

Output on the other node ("node1") during the same event will look like this:

    [2017-07-27 21:08:35] [INFO] starting continuous BDR node monitoring
    [2017-07-27 21:08:35] [INFO] monitoring BDR replication status on node "node1" (ID: 1)
    [2017-07-27 21:08:51] [INFO] monitoring BDR replication status on node "node1" (ID: 1)
    [2017-07-27 21:09:07] [INFO] monitoring BDR replication status on node "node1" (ID: 1)
    [2017-07-27 21:09:23] [WARNING] unable to connect to node node2 (ID 2)
    [2017-07-27 21:09:23] [INFO] checking state of node 2, 0 of 5 attempts
    [2017-07-27 21:09:23] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-07-27 21:09:24] [INFO] checking state of node 2, 1 of 5 attempts
    [2017-07-27 21:09:24] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-07-27 21:09:25] [INFO] checking state of node 2, 2 of 5 attempts
    [2017-07-27 21:09:25] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-07-27 21:09:26] [INFO] checking state of node 2, 3 of 5 attempts
    [2017-07-27 21:09:26] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-07-27 21:09:27] [INFO] checking state of node 2, 4 of 5 attempts
    [2017-07-27 21:09:27] [INFO] sleeping 1 seconds until next reconnection attempt
    [2017-07-27 21:09:28] [WARNING] unable to reconnect to node 2 after 5 attempts
    [2017-07-27 21:09:28] [NOTICE] other node's repmgrd is handling failover
    [2017-07-27 21:09:28] [INFO] monitoring BDR replication status on node "node1" (ID: 1)
    [2017-07-27 21:09:28] [DETAIL] monitoring node "node2" (ID: 2) in degraded mode

This assumes only the PostgreSQL instance on "node2" has failed. In this case the
`repmgrd` instance running on "node2" has performed the failover. However if
the entire server becomes unavailable, `repmgrd` on "node1" will perform
the failover.

Node recovery
-------------

Following failure of a BDR node, if the node subsequently becomes available again,
a `bdr_recovery` event will be generated. This could potentially be used to
reconfigure PgBouncer automatically to bring the node back into the available pool;
however, it would be prudent to manually verify the node's status before
exposing it to the application.

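To have the notification script invoked for recovery events as well, `bdr_recovery`
would need to be added to the `event_notifications` list in `repmgr.conf`, e.g.:

    event_notifications=bdr_failover,bdr_recovery

The script would then need to distinguish the two cases by inspecting the event
type passed via the `%e` placeholder.
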
If the failed node comes back up and connects correctly, output similar to this
will be visible in the `repmgrd` log:

    [2017-07-27 21:25:30] [DETAIL] monitoring node "node2" (ID: 2) in degraded mode
    [2017-07-27 21:25:46] [INFO] monitoring BDR replication status on node "node2" (ID: 2)
    [2017-07-27 21:25:46] [DETAIL] monitoring node "node2" (ID: 2) in degraded mode
    [2017-07-27 21:25:55] [INFO] active replication slot for node "node1" found after 1 seconds
    [2017-07-27 21:25:55] [NOTICE] node "node2" (ID: 2) has recovered after 986 seconds

Shutdown of both nodes
----------------------

If both PostgreSQL instances are shut down, `repmgrd` will try to handle the
situation as gracefully as possible, though with no failover candidates available
there's not much it can do. Should this case ever occur, we recommend shutting
down `repmgrd` on both nodes and restarting it once the PostgreSQL instances
are running properly.