mirror of
https://github.com/EnterpriseDB/repmgr.git
synced 2026-03-22 22:56:29 +00:00
Add/update document "repmgrd-node-fencing.md"
This commit is contained in:
153
doc/repmgrd-node-fencing.md
Normal file
153
doc/repmgrd-node-fencing.md
Normal file
@@ -0,0 +1,153 @@
|
||||
Fencing a failed master node with repmgrd and pgbouncer
|
||||
=======================================================
|
||||
|
||||
With automatic failover, it's essential to ensure that a failed master
|
||||
remains inaccessible to your application, even if it comes back online
|
||||
again, to avoid a split-brain situation.
|
||||
|
||||
By using `pgbouncer` together with `repmgrd`, it's possible to combine
|
||||
automatic failover with a process to isolate the failed master from
|
||||
your application and ensure that all connections which should go to
|
||||
the master are directed there smoothly without having to reconfigure
|
||||
your application. (Note that as a connection pooler, `pgbouncer` can
|
||||
benefit your application in other ways, but those are beyond the scope
|
||||
of this document).
|
||||
|
||||
* * *
|
||||
|
||||
> *WARNING*: automatic failover is tricky to get right. This document
|
||||
> demonstrates one possible implementation method, however you should
|
||||
> carefully configure and test any setup to suit the needs of your own
|
||||
> replication cluster/application.
|
||||
|
||||
* * *
|
||||
|
||||
In a failover situation, `repmgrd` promotes a standby to master by executing
|
||||
the command defined in `promote_command`. Normally this would be something like:
|
||||
|
||||
repmgr standby promote -f /etc/repmgr.conf
|
||||
|
||||
By wrapping this in a custom script which adjusts the `pgbouncer` configuration
|
||||
on all nodes, it's possible to fence the failed master and redirect write
|
||||
connections to the new master.
|
||||
|
||||
The script consists of two sections:
|
||||
|
||||
* the promotion command itself
|
||||
* commands to reconfigure `pgbouncer` on all nodes
|
||||
|
||||
Note that it requires password-less SSH access from the `repmgr` nodes
|
||||
to all the `pgbouncer` nodes to be able to update the `pgbouncer`
|
||||
configuration files.
|
||||
|
||||
For the purposes of this demonstration, we'll assume there are 3 nodes (master
|
||||
and two standbys), with `pgbouncer` listening on port 6432 handling connections
|
||||
to a database called `appdb`. The `postgres` system user must have write
|
||||
access to the `pgbouncer` configuration files on all nodes. We'll assume
|
||||
there's a main `pgbouncer` configuration file, `/etc/pgbouncer.ini`, which uses
|
||||
the `%include` directive (available from PgBouncer 1.6) to include a separate
|
||||
configuration file, `/etc/pgbouncer.database.ini`, which will be modified by
|
||||
`repmgr`.
|
||||
|
||||
* * *
|
||||
|
||||
> *NOTE*: in this self-contained demonstration, `pgbouncer` is running on the
|
||||
> database servers, however in a production environment it will make more
|
||||
> sense to run `pgbouncer` on either separate nodes or the application server.
|
||||
|
||||
* * *
|
||||
|
||||
`/etc/pgbouncer.ini` should look something like this:
|
||||
|
||||
[pgbouncer]
|
||||
|
||||
logfile = /var/log/pgbouncer/pgbouncer.log
|
||||
pidfile = /var/run/pgbouncer/pgbouncer.pid
|
||||
|
||||
listen_addr = *
|
||||
listen_port = 6532
|
||||
unix_socket_dir = /tmp
|
||||
|
||||
auth_type = trust
|
||||
auth_file = /etc/pgbouncer.auth
|
||||
|
||||
admin_users = postgres
|
||||
stats_users = postgres
|
||||
|
||||
pool_mode = transaction
|
||||
|
||||
max_client_conn = 100
|
||||
default_pool_size = 20
|
||||
min_pool_size = 5
|
||||
reserve_pool_size = 5
|
||||
reserve_pool_timeout = 3
|
||||
|
||||
log_connections = 1
|
||||
log_disconnections = 1
|
||||
log_pooler_errors = 1
|
||||
|
||||
%include /etc/pgbouncer.database.ini
|
||||
|
||||
The actual script is as follows; adjust the configurable items as appropriate:
|
||||
|
||||
`/var/lib/postgres/repmgr/promote.sh`
|
||||
|
||||
|
||||
#!/usr/bin/env bash
|
||||
set -u
|
||||
set -e
|
||||
|
||||
# Configurable items
|
||||
PGBOUNCER_HOSTS="node1 node2 node3"
|
||||
PGBOUNCER_DATABASE_INI="/etc/pgbouncer.database.ini"
|
||||
PGBOUNCER_DATABASE="appdb"
|
||||
PGBOUNCER_PORT=6432
|
||||
|
||||
REPMGR_DB="repmgr"
|
||||
REPMGR_USER="repmgr"
|
||||
|
||||
# 1. Promote this node from standby to master
|
||||
|
||||
repmgr standby promote -f /etc/repmgr.conf
|
||||
|
||||
# 2. Reconfigure pgbouncer instances
|
||||
|
||||
PGBOUNCER_DATABASE_INI_NEW="/tmp/pgbouncer.database.ini"
|
||||
|
||||
for HOST in $PGBOUNCER_HOSTS
|
||||
do
|
||||
# Recreate the pgbouncer config file
|
||||
echo -e "[databases]\n" > $PGBOUNCER_DATABASE_INI_NEW
|
||||
|
||||
psql -d $REPMGR_DB -U $REPMGR_USER -t -A \
|
||||
-c "SELECT '${PGBOUNCER_DATABASE}-rw= ' || conninfo || ' application_name=pgbouncer_${HOST}' \
|
||||
FROM repmgr.nodes \
|
||||
WHERE active = TRUE AND type='primary'" >> $PGBOUNCER_DATABASE_INI_NEW
|
||||
|
||||
psql -d $REPMGR_DB -U $REPMGR_USER -t -A \
|
||||
-c "SELECT '${PGBOUNCER_DATABASE}-ro= ' || conninfo || ' application_name=pgbouncer_${HOST}' \
|
||||
FROM repmgr.nodes \
|
||||
WHERE node_name='${HOST}'" >> $PGBOUNCER_DATABASE_INI_NEW
|
||||
|
||||
|
||||
rsync $PGBOUNCER_DATABASE_INI_NEW $HOST:$PGBOUNCER_DATABASE_INI
|
||||
|
||||
psql -tc "reload" -h $HOST -p $PGBOUNCER_PORT -U postgres pgbouncer
|
||||
|
||||
done
|
||||
|
||||
# Clean up generated file
|
||||
rm $PGBOUNCER_DATABASE_INI_NEW
|
||||
|
||||
echo "Reconfiguration of pgbouncer complete"
|
||||
|
||||
Script and template file should be installed on each node where
|
||||
`repmgrd` is running.
|
||||
|
||||
Finally, set `promote_command` in `repmgr.conf` on each node to
|
||||
point to the custom promote script:
|
||||
|
||||
promote_command=/var/lib/postgres/repmgr/promote.sh
|
||||
|
||||
and reload/restart any running `repmgrd` instances for the changes to take
|
||||
effect.
|
||||
Reference in New Issue
Block a user