mirror of
https://github.com/EnterpriseDB/repmgr.git
synced 2026-03-25 08:06:29 +00:00
Add documentation for repmgrd failover process and failed node fencing
Addresses GitHub #200.
21
README.md
@@ -580,13 +580,13 @@ base backups and WAL files.
|
|||||||
|
|
||||||
Barman support provides the following advantages:
|
Barman support provides the following advantages:
|
||||||
|
|
||||||
- the primary node does not need to perform a new backup every time a
|
- the master node does not need to perform a new backup every time a
|
||||||
new standby is cloned;
|
new standby is cloned;
|
||||||
- a standby node can be disconnected for longer periods without losing
|
- a standby node can be disconnected for longer periods without losing
|
||||||
the ability to catch up, and without causing accumulation of WAL
|
the ability to catch up, and without causing accumulation of WAL
|
||||||
files on the primary node;
|
files on the master node;
|
||||||
- therefore, `repmgr` does not need to use replication slots, and the
|
- therefore, `repmgr` does not need to use replication slots, and the
|
||||||
primary node does not need to set `wal_keep_segments`.
|
master node does not need to set `wal_keep_segments`.
|
||||||
|
|
||||||
> *NOTE*: In view of the above, Barman support is incompatible with
|
> *NOTE*: In view of the above, Barman support is incompatible with
|
||||||
> the `use_replication_slots` setting in `repmgr.conf`.
|
> the `use_replication_slots` setting in `repmgr.conf`.
|
||||||
@@ -1743,6 +1743,21 @@ which contains connection details for the local database.
|
|||||||
the current working directory; no additional arguments are required.
|
the current working directory; no additional arguments are required.
|
||||||
|
|
||||||
|
|
||||||
|
### Further documentation
|
||||||
|
|
||||||
|
As well as this README, the `repmgr` source contains the following additional
|
||||||
|
documentation files:
|
||||||
|
|
||||||
|
* FAQ.md - frequently asked questions
|
||||||
|
* CONTRIBUTING.md - how to contribute to `repmgr`
|
||||||
|
* PACKAGES.md - details on building packages
|
||||||
|
* SSH-RSYNC.md - how to set up passwordless SSH between nodes
|
||||||
|
* docs/repmgrd-failover-mechanism.md - how repmgrd picks which node to promote
|
||||||
|
* docs/repmgrd-node-fencing.md - how to "fence" a failed master node
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
### Error codes
|
### Error codes
|
||||||
|
|
||||||
`repmgr` or `repmgrd` will return one of the following error codes on program
|
`repmgr` or `repmgrd` will return one of the following error codes on program
|
||||||
|
|||||||
75
docs/repmgrd-failover-mechanism.md
Normal file
@@ -0,0 +1,75 @@
|
|||||||
|
repmgrd's failover algorithm
|
||||||
|
============================
|
||||||
|
|
||||||
|
When implementing automatic failover, there are two factors which are critical in
|
||||||
|
ensuring the desired result is achieved:
|
||||||
|
|
||||||
|
- has the master node genuinely failed?
|
||||||
|
- which is the best node to promote to the new master?
|
||||||
|
|
||||||
|
This document outlines repmgrd's decision-making process during automatic failover
|
||||||
|
for standbys directly connected to the master node.
|
||||||
|
|
||||||
|
|
||||||
|
Master node failure detection
|
||||||
|
-----------------------------
|
||||||
|
|
||||||
|
If a `repmgrd` instance running on a PostgreSQL standby node is unable to connect to
|
||||||
|
the master node, this doesn't necessarily mean that the master is down and a
|
||||||
|
failover is required. Factors such as network connectivity issues could mean that
|
||||||
|
even though the standby node is isolated, the replication cluster as a whole
|
||||||
|
is functioning correctly, and promoting the standby without further verification
|
||||||
|
could result in a "split-brain" situation.
|
||||||
|
|
||||||
|
In the event that `repmgrd` is unable to connect to the master node, it will attempt
|
||||||
|
to reconnect to the master server several times (as defined by the `reconnect_attempts`
|
||||||
|
parameter in `repmgr.conf`), with reconnection attempts occurring at the interval
|
||||||
|
specified by `reconnect_interval`. This happens to verify that the master is definitively
|
||||||
|
not accessible (e.g. that connection was not lost due to a brief network glitch).
|
||||||
|
|
||||||
|
Appropriate values for these settings will depend very much on the replication
|
||||||
|
cluster environment. There will necessarily be a trade-off between the time it
|
||||||
|
takes to assume the master is not reachable, and the reliability of that conclusion.
|
||||||
|
A standby in a different physical location to the master will probably need a longer
|
||||||
|
check interval to rule out possible network issues, whereas one located in the same
|
||||||
|
rack with a direct connection between servers could perform the check very quickly.
|
||||||
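As a rough illustration of that trade-off, the time before `repmgrd` gives up on the master is approximately `reconnect_attempts` multiplied by `reconnect_interval`. The sketch below assumes each attempt fails immediately (real attempts may also block on connection timeouts), and the function name and values are hypothetical, not recommendations:

```shell
#!/usr/bin/env bash
# Approximate time (in seconds) before the master is treated as unreachable.
# Assumption: every reconnection attempt fails instantly, so only the
# pauses between attempts contribute to the total.
detection_window() {
    local attempts="$1" interval="$2"
    echo $(( attempts * interval ))
}

# Illustrative values only: a standby in the same rack versus a remote site.
echo "same rack:   $(detection_window 3 2)s"
echo "remote site: $(detection_window 6 10)s"
```

A larger window makes a false failover less likely, at the cost of a longer period during which the cluster has no writable master.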
|
|
||||||
|
Note that it's possible the master comes back online after this point is reached,
|
||||||
|
but before a new master has been selected; in this case it will be noticed
|
||||||
|
during the selection of a new master and no actual failover will take place.
|
||||||
|
|
||||||
|
Promotion candidate selection
|
||||||
|
-----------------------------
|
||||||
|
|
||||||
|
Once `repmgrd` has decided the master is definitively unreachable, the following checks
|
||||||
|
will be carried out:
|
||||||
|
|
||||||
|
* attempts to connect to all other nodes in the cluster (including the witness
|
||||||
|
node, if defined) to establish the state of the cluster, including their
|
||||||
|
current LSN
|
||||||
|
|
||||||
|
* If less than half of the nodes are visible (from the viewpoint
|
||||||
|
of this node), `repmgrd` will not take any further action. This is to ensure that
|
||||||
|
e.g. if a replication cluster is spread over multiple data centres, a split-brain
|
||||||
|
situation does not occur if there is a network failure between data centres. Note
|
||||||
|
that if nodes are split evenly between data centres, a witness server can be
|
||||||
|
used to establish the "majority" data centre.
|
||||||
|
|
||||||
|
* `repmgrd` polls all visible servers and waits for each node to return a valid LSN;
|
||||||
|
it updates the LSN previously stored for this node if it has increased since
|
||||||
|
the initial check
|
||||||
|
|
||||||
|
* once all LSNs have been retrieved, `repmgrd` will check for the highest LSN; if
|
||||||
|
its own node has the highest LSN, it will attempt to promote itself (using the
|
||||||
|
command defined in `promote_command` in `repmgr.conf`). Note that if using
|
||||||
|
`repmgr standby promote` as the promotion command, and the original master becomes available
|
||||||
|
before the promotion takes effect, `repmgr` will return an error and no promotion
|
||||||
|
will take place, and `repmgrd` will resume monitoring as usual.
|
||||||
|
|
||||||
|
* if the node is not the promotion candidate, `repmgrd` will execute the
|
||||||
|
`follow_command` defined in `repmgr.conf`. If using `repmgr standby follow` here,
|
||||||
|
`repmgr` will attempt to detect the new master node and attach to that.
|
||||||
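The visibility check and the LSN comparison described in the list above can be sketched in shell as follows. This is not `repmgrd`'s actual implementation: the function names are hypothetical, the node count is assumed to include every registered node, and ties are resolved in favour of the first node listed, which is purely an assumption of this sketch.

```shell
#!/usr/bin/env bash
# Convert a PostgreSQL LSN such as "0/3000060" into an integer so that
# two positions can be compared numerically.
lsn_to_int() {
    local hi="${1%%/*}" lo="${1##*/}"
    echo $(( (0x$hi << 32) + 0x$lo ))
}

# has_quorum VISIBLE TOTAL
# Succeeds unless fewer than half of the nodes are visible.
has_quorum() {
    [ $(( $1 * 2 )) -ge "$2" ]
}

# pick_candidate NAME=LSN ...
# Prints the name of the node with the highest LSN.
pick_candidate() {
    local best_name="" best_lsn=-1 entry name lsn
    for entry in "$@"; do
        name="${entry%%=*}"
        lsn="$(lsn_to_int "${entry#*=}")"
        if [ "$lsn" -gt "$best_lsn" ]; then
            best_name="$name"
            best_lsn="$lsn"
        fi
    done
    echo "$best_name"
}

# Example: 2 of 3 nodes visible, so a candidate is selected.
if has_quorum 2 3; then
    pick_candidate "node2=0/3000060" "node3=0/3000148"
fi
```

In this example `node3` has received more WAL than `node2`, so it would promote itself while `node2` would run its `follow_command`.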
|
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
150
docs/repmgrd-node-fencing.md
Normal file
@@ -0,0 +1,150 @@
|
|||||||
|
Fencing a failed master node with repmgrd and pgbouncer
|
||||||
|
=======================================================
|
||||||
|
|
||||||
|
With automatic failover, it's essential to ensure that a failed master
|
||||||
|
remains inaccessible to your application, even if it comes back online
|
||||||
|
again, to avoid a split-brain situation.
|
||||||
|
|
||||||
|
By using `pgbouncer` together with `repmgrd`, it's possible to combine
|
||||||
|
automatic failover with a process to isolate the failed master from
|
||||||
|
your application and ensure that all connections which should go to
|
||||||
|
the master are directed there smoothly without having to reconfigure
|
||||||
|
your application. (Note that as a connection pooler, `pgbouncer` can
|
||||||
|
benefit your application in other ways, but those are beyond the scope
|
||||||
|
of this document).
|
||||||
|
|
||||||
|
* * *
|
||||||
|
|
||||||
|
> *WARNING*: automatic failover is tricky to get right. This document
|
||||||
|
> demonstrates one possible implementation method; however, you should
|
||||||
|
> carefully configure and test any setup to suit the needs of your own
|
||||||
|
> replication cluster/application.
|
||||||
|
|
||||||
|
* * *
|
||||||
|
|
||||||
|
In a failover situation, `repmgrd` promotes a standby to master by
|
||||||
|
executing the command defined in `promote_command`. Normally this
|
||||||
|
would be something like:
|
||||||
|
|
||||||
|
repmgr standby promote -f /etc/repmgr.conf
|
||||||
|
|
||||||
|
By wrapping this in a custom script which adjusts the `pgbouncer`
|
||||||
|
configuration on all nodes, it's possible to fence the failed master
|
||||||
|
and redirect write connections to the new master.
|
||||||
|
|
||||||
|
The script consists of three sections:
|
||||||
|
|
||||||
|
* commands to pause `pgbouncer` on all nodes
|
||||||
|
* the promotion command itself
|
||||||
|
* commands to reconfigure and restart `pgbouncer` on all nodes
|
||||||
|
|
||||||
|
Note that it requires password-less SSH access between all nodes to be
|
||||||
|
able to update the `pgbouncer` configuration files.
|
||||||
|
|
||||||
|
For the purposes of this demonstration, we'll assume there are 3 nodes
|
||||||
|
(master and two standbys), with `pgbouncer` listening on port 6432
|
||||||
|
handling connections to a database called `appdb`. The `postgres`
|
||||||
|
system user must have write access to the `pgbouncer` configuration
|
||||||
|
file on all nodes, assumed to be at `/etc/pgbouncer.ini`.
|
||||||
|
|
||||||
|
The script also requires a template file containing global `pgbouncer`
|
||||||
|
configuration, which should look something like this (adjust
|
||||||
|
settings appropriately for your environment):
|
||||||
|
|
||||||
|
`/var/lib/postgres/repmgr/pgbouncer.ini.template`
|
||||||
|
|
||||||
|
[pgbouncer]
|
||||||
|
|
||||||
|
logfile = /var/log/pgbouncer/pgbouncer.log
|
||||||
|
pidfile = /var/run/pgbouncer/pgbouncer.pid
|
||||||
|
|
||||||
|
listen_addr = *
|
||||||
|
listen_port = 6432
|
||||||
|
unix_socket_dir = /tmp
|
||||||
|
|
||||||
|
auth_type = trust
|
||||||
|
auth_file = /etc/pgbouncer.auth
|
||||||
|
|
||||||
|
admin_users = postgres
|
||||||
|
stats_users = postgres
|
||||||
|
|
||||||
|
pool_mode = transaction
|
||||||
|
|
||||||
|
max_client_conn = 100
|
||||||
|
default_pool_size = 20
|
||||||
|
min_pool_size = 5
|
||||||
|
reserve_pool_size = 5
|
||||||
|
reserve_pool_timeout = 3
|
||||||
|
|
||||||
|
log_connections = 1
|
||||||
|
log_disconnections = 1
|
||||||
|
log_pooler_errors = 1
|
||||||
|
|
||||||
|
The actual script is as follows; adjust the configurable items as appropriate:
|
||||||
|
|
||||||
|
`/var/lib/postgres/repmgr/promote.sh`
|
||||||
|
|
||||||
|
|
||||||
|
#!/usr/bin/env bash
|
||||||
|
set -u
|
||||||
|
set -e
|
||||||
|
|
||||||
|
# Configurable items
|
||||||
|
PGBOUNCER_HOSTS="node1 node2 node3"
|
||||||
|
REPMGR_DB="repmgr"
|
||||||
|
REPMGR_USER="repmgr"
|
||||||
|
REPMGR_SCHEMA="repmgr_test"
|
||||||
|
PGBOUNCER_CONFIG="/etc/pgbouncer.ini"
|
||||||
|
PGBOUNCER_INI_TEMPLATE="/var/lib/postgres/repmgr/pgbouncer.ini.template"
|
||||||
|
PGBOUNCER_DATABASE="appdb"
|
||||||
|
|
||||||
|
# 1. Pause running pgbouncer instances
|
||||||
|
for HOST in $PGBOUNCER_HOSTS
|
||||||
|
do
|
||||||
|
psql -t -c "pause" -h $HOST -p "${PGBOUNCER_PORT:-6432}" -U postgres pgbouncer
|
||||||
|
done
|
||||||
|
|
||||||
|
|
||||||
|
# 2. Promote this node from standby to master
|
||||||
|
|
||||||
|
repmgr standby promote -f /etc/repmgr.conf
|
||||||
|
|
||||||
|
|
||||||
|
# 3. Reconfigure pgbouncer instances
|
||||||
|
|
||||||
|
PGBOUNCER_INI_NEW="/tmp/pgbouncer.ini.new"
|
||||||
|
|
||||||
|
for HOST in $PGBOUNCER_HOSTS
|
||||||
|
do
|
||||||
|
# Recreate the pgbouncer config file
|
||||||
|
echo -e "[databases]\n" > $PGBOUNCER_INI_NEW
|
||||||
|
|
||||||
|
psql -d $REPMGR_DB -U $REPMGR_USER -t -A \
|
||||||
|
-c "SELECT '$PGBOUNCER_DATABASE= ' || conninfo || ' application_name=pgbouncer_$HOST' \
|
||||||
|
FROM $REPMGR_SCHEMA.repl_nodes \
|
||||||
|
WHERE active = TRUE AND type='master'" >> $PGBOUNCER_INI_NEW
|
||||||
|
|
||||||
|
cat $PGBOUNCER_INI_TEMPLATE >> $PGBOUNCER_INI_NEW
|
||||||
|
|
||||||
|
rsync $PGBOUNCER_INI_NEW $HOST:$PGBOUNCER_CONFIG
|
||||||
|
|
||||||
|
psql -tc "reload" -h $HOST -p "${PGBOUNCER_PORT:-6432}" -U postgres pgbouncer
|
||||||
|
psql -tc "resume" -h $HOST -p "${PGBOUNCER_PORT:-6432}" -U postgres pgbouncer
|
||||||
|
|
||||||
|
done
|
||||||
|
|
||||||
|
# Clean up generated file
|
||||||
|
rm $PGBOUNCER_INI_NEW
|
||||||
|
|
||||||
|
echo "Reconfiguration of pgbouncer complete"
|
||||||
|
|
||||||
|
The script and template file should be installed on each node where
|
||||||
|
`repmgrd` is running.
|
||||||
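To make the generated configuration easier to picture, the following sketch prints the `[databases]` section that the promote script places ahead of the template contents for a single `pgbouncer` host. The function name and the conninfo value are hypothetical; in the script itself the conninfo string is read from the `repl_nodes` table.

```shell
#!/usr/bin/env bash
# Print the [databases] section written for one pgbouncer host.
# Arguments: database name, conninfo of the new master, pgbouncer host name.
databases_section() {
    local dbname="$1" conninfo="$2" host="$3"
    printf '[databases]\n\n%s= %s application_name=pgbouncer_%s\n' \
        "$dbname" "$conninfo" "$host"
}

# Hypothetical conninfo for a newly promoted node2, as seen from node1.
databases_section "appdb" "host=node2 port=5432" "node1"
```

Each host receives the same master connection details; only the `application_name` suffix differs, which makes it possible to tell on the master which `pgbouncer` instance a connection came from.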
|
|
||||||
|
Finally, set `promote_command` in `repmgr.conf` on each node to
|
||||||
|
point to the custom promote script:
|
||||||
|
|
||||||
|
promote_command=/var/lib/postgres/repmgr/promote.sh
|
||||||
|
|
||||||
|
and reload/restart any running `repmgrd` instances for the changes to take
|
||||||
|
effect.
|
||||||