Compare commits


8 Commits
v3.3.1 ... v3.2

Ian Barwick  fefa43e3a6  2016-10-05 16:49:01 +09:00
    Minor README fix

Ian Barwick  c1a1fe6f82  2016-10-05 16:48:40 +09:00
    Update README

    `--ignore-external-config-files` deprecated

Ian Barwick  4dc3a05e8d  2016-10-05 13:58:05 +09:00
    Update history

Ian Barwick  5945accd84  2016-10-05 13:58:01 +09:00
    Add documentation for repmgrd failover process and failed node fencing

    Addresses GitHub #200.

Ian Barwick  15cbda9ec3  2016-10-05 13:57:57 +09:00
    repmgr: consistent error message style

Ian Barwick  358559acc4  2016-10-03 16:04:02 +09:00
    Update barman-wal-restore documentation

    Barman 2.0 provides this in a separate, more convenient `barman-cli` package;
    document this and add note about previous `barman-wal-restore.py` script.

Ian Barwick  0a9f8e160a  2016-10-03 16:03:59 +09:00
    Tweak repmgr.conf.sample

    Put `monitor_interval_secs` at the start of the `repmgrd` section, as it's
    a very fundamental configuration item.

Ian Barwick  a2d67e85de  2016-09-30 15:13:56 +09:00
    Bump version

    3.2
7 changed files with 272 additions and 42 deletions

HISTORY

@@ -1,4 +1,4 @@
-3.2 2016-
+3.2 2016-10-05
     repmgr: add support for cloning from a Barman backup (Gianni)
     repmgr: add commands `standby matrix` and `standby crosscheck` (Gianni)
     repmgr: suppress connection error display in `repmgr cluster show`
@@ -16,9 +16,15 @@
     repmgr: add option `--copy-external-config-files` for files outside
       of the data directory (Ian)
     repmgr: add configuration options to override the default pg_ctl
-      commands (Jarkko Oranen)
+      commands (Jarkko Oranen, Ian)
+    repmgr: only require `wal_keep_segments` to be set in certain corner
+      cases (Ian)
+    repmgr: better support cloning from a node other than the one to
+      stream from (Ian)
+    repmgrd: don't start if node is inactive and failover=automatic (Ian)
     packaging: improve "repmgr-auto" Debian package (Gianni)
 3.1.5 2016-08-15
     repmgrd: in a failover situation, prevent endless looping when
       attempting to establish the status of a node with

README.md

@@ -7,7 +7,7 @@ replication capabilities with utilities to set up standby servers, monitor
 replication, and perform administrative tasks such as failover or switchover
 operations.
 
-The current `repmgr` version, 3.1.5, supports all PostgreSQL versions from
+The current `repmgr` version, 3.2, supports all PostgreSQL versions from
 9.3, including the upcoming 9.6.
 
 Overview
@@ -580,13 +580,13 @@ base backups and WAL files.
 Barman support provides the following advantages:
 
-- the primary node does not need to perform a new backup every time a
+- the master node does not need to perform a new backup every time a
   new standby is cloned;
 - a standby node can be disconnected for longer periods without losing
   the ability to catch up, and without causing accumulation of WAL
-  files on the primary node;
+  files on the master node;
 - therefore, `repmgr` does not need to use replication slots, and the
-  primary node does not need to set `wal_keep_segments`.
+  master node does not need to set `wal_keep_segments`.
 
 > *NOTE*: In view of the above, Barman support is incompatible with
 > the `use_replication_slots` setting in `repmgr.conf`.
@@ -599,8 +599,8 @@ ensure that:
 - the `barman_server` setting in `repmgr.conf` is set to the SSH
   hostname of the Barman server;
 - the `restore_command` setting in `repmgr.conf` is configured to
-  use a copy of the `barman-wal-restore.py` script shipped with Barman
-  (see below);
+  use a copy of the `barman-wal-restore` script shipped with the
+  `barman-cli` package (see below);
 - the Barman catalogue includes at least one valid backup for this
   server.
@@ -616,26 +616,24 @@ ensure that:
 > corresponding to the value of `barman_server` in `repmgr.conf`. See
 > the "Host" section in `man 5 ssh_config` for more details.
 
-`barman-wal-restore.py` is a Python script provided by the Barman
-development team, which must be copied in a location accessible to
-`repmgr`, and marked as executable; `restore_command` must then be
-set in `repmgr.conf` as follows:
+`barman-wal-restore` is a Python script provided by the Barman
+development team as part of the `barman-cli` package (Barman 2.0
+and later; for Barman 1.x the script is provided separately as
+`barman-wal-restore.py`).
+
+`restore_command` must then be set in `repmgr.conf` as follows:
 
     <script> <Barman hostname> <cluster_name> %f %p
 
 For instance, suppose that we have installed Barman on the `barmansrv`
-host, and that we have placed a copy of `barman-wal-restore.py` into
-the `/usr/local/bin` directory. First, we ensure that the script is
-executable:
-
-    sudo chmod +x /usr/local/bin/barman-wal-restore.py
-
-Then we check that `repmgr.conf` includes the following lines:
+host, and that `barman-wal-restore` is located as an executable at
+`/usr/bin/barman-wal-restore`; `repmgr.conf` should include the following
+lines:
 
     barman_server=barmansrv
-    restore_command=/usr/local/bin/barman-wal-restore.py barmansrv test %f %p
+    restore_command=/usr/bin/barman-wal-restore barmansrv test %f %p
 
-To use a non-default Barman configuration file on the Barman server,
+NOTE: to use a non-default Barman configuration file on the Barman server,
 specify this in `repmgr.conf` with `barman_config`:
 
     barman_config=/path/to/barman.conf
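As an aside on the `%f`/`%p` placeholders in `restore_command`: the sketch below shows roughly what command PostgreSQL ends up running during recovery. The WAL file name and destination path are made-up example values, and `sed` here merely stands in for PostgreSQL's own placeholder substitution:

```shell
# restore_command as configured above
restore_command='/usr/bin/barman-wal-restore barmansrv test %f %p'

wal_file="000000010000000000000003"   # example WAL segment name requested (%f)
wal_path="pg_xlog/RECOVERYXLOG"       # example destination path (%p)

# Substitute the placeholders the way the server would before executing
cmd=$(echo "$restore_command" | sed -e "s|%f|$wal_file|g" -e "s|%p|$wal_path|g")
echo "$cmd"
```

The resulting command fetches the requested segment from the Barman server's archive and writes it to the path the recovering server expects.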
@@ -688,24 +686,10 @@ and destination server as the contents of files existing on both servers need
 to be compared, meaning this method is not necessarily faster than making a
 fresh clone with `pg_basebackup`.
 
-> *NOTE*: `barman-wal-restore.py` supports command line switches to
+> *NOTE*: `barman-wal-restore` supports command line switches to
 > control parallelism (`--parallel=N`) and compression (`--bzip2`,
 > `--gzip`).
 
-### Dealing with PostgreSQL configuration files
-
-By default, `repmgr` will attempt to copy the standard configuration files
-(`postgresql.conf`, `pg_hba.conf` and `pg_ident.conf`) even if they are located
-outside of the data directory (though currently they will be copied
-into the standby's data directory). To prevent this happening, when executing
-`repmgr standby clone` provide the `--ignore-external-config-files` option.
-
-If using `rsync` to clone a standby, additional control over which files
-not to transfer is possible by configuring `rsync_options` in `repmgr.conf`,
-which enables any valid `rsync` options to be passed to that command, e.g.:
-
-    rsync_options='--exclude=postgresql.local.conf'
-
 ### Controlling `primary_conninfo` in `recovery.conf`
 
 The `primary_conninfo` setting in `recovery.conf` generated by `repmgr`
@@ -1028,7 +1012,7 @@ should have been updated to reflect this:
   at a two-server master/standby replication cluster and currently does
   not support additional standbys.
 - `repmgr standby switchover` is designed to use the `pg_rewind` utility,
-  standard in 9.5 and later and available for separately in 9.3 and 9.4
+  standard in 9.5 and later and available separately in 9.3 and 9.4
   (see note below)
 - `pg_rewind` *requires* that either `wal_log_hints` is enabled, or that
   data checksums were enabled when the cluster was initialized. See the
@@ -1745,6 +1729,21 @@ which contains connection details for the local database.
 the current working directory; no additional arguments are required.
 
+### Further documentation
+
+As well as this README, the `repmgr` source contains the following additional
+documentation files:
+
+* FAQ.md - frequently asked questions
+* CONTRIBUTING.md - how to contribute to `repmgr`
+* PACKAGES.md - details on building packages
+* SSH-RSYNC.md - how to set up passwordless SSH between nodes
+* docs/repmgrd-failover-mechanism.md - how repmgrd picks which node to promote
+* docs/repmgrd-node-fencing.md - how to "fence" a failed master node
+
 ### Error codes
 
 `repmgr` or `repmgrd` will return one of the following error codes on program

docs/repmgrd-failover-mechanism.md

@@ -0,0 +1,75 @@
repmgrd's failover algorithm
============================
When implementing automatic failover, there are two factors which are critical in
ensuring the desired result is achieved:
- has the master node genuinely failed?
- which is the best node to promote to the new master?
This document outlines repmgrd's decision-making process during automatic failover
for standbys directly connected to the master node.
Master node failure detection
-----------------------------
If a `repmgrd` instance running on a PostgreSQL standby node is unable to connect to
the master node, this doesn't necessarily mean that the master is down and a
failover is required. Factors such as network connectivity issues could mean that
even though the standby node is isolated, the replication cluster as a whole
is functioning correctly, and promoting the standby without further verification
could result in a "split-brain" situation.
In the event that `repmgrd` is unable to connect to the master node, it will attempt
to reconnect to the master server several times (as defined by the `reconnect_attempts`
parameter in `repmgr.conf`), with reconnection attempts occurring at the interval
specified by `reconnect_interval`. This is done to verify that the master really is
not accessible (e.g. that the connection was not lost due to a brief network glitch).
Appropriate values for these settings will depend very much on the replication
cluster environment. There will necessarily be a trade-off between the time it
takes to assume the master is not reachable, and the reliability of that conclusion.
A standby in a different physical location to the master will probably need a longer
check interval to rule out possible network issues, whereas one located in the same
rack with a direct connection between servers could perform the check very quickly.
Note that it's possible the master comes back online after this point is reached,
but before a new master has been selected; in this case it will be noticed
during the selection of a new master and no actual failover will take place.
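To make the reconnect trade-off concrete: the worst-case time before `repmgrd` concludes the master is unreachable is roughly the product of the two settings. The values below are illustrative examples only, not recommendations:

```shell
# Example values as they might appear in repmgr.conf; tune for your environment.
reconnect_attempts=6
reconnect_interval=10   # seconds between attempts

# Approximate worst-case detection time before failover action begins
detection_time=$(( reconnect_attempts * reconnect_interval ))
echo "master declared unreachable after ~${detection_time}s"
```

A same-rack standby might use far smaller values; geographically separated nodes will usually want larger ones to ride out transient network issues.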
Promotion candidate selection
-----------------------------
Once `repmgrd` has decided the master is definitively unreachable, the following
checks will be carried out:
* `repmgrd` attempts to connect to all other nodes in the cluster (including the
  witness node, if defined) to establish the state of the cluster, including their
  current LSN
* if fewer than half of the nodes are visible (from the viewpoint
  of this node), `repmgrd` will not take any further action. This is to ensure that
  e.g. if a replication cluster is spread over multiple data centres, a split-brain
  situation does not occur if there is a network failure between data centres. Note
  that if nodes are split evenly between data centres, a witness server can be
  used to establish the "majority" data centre.
* `repmgrd` polls all visible servers and waits for each node to return a valid LSN;
it updates the LSN previously stored for this node if it has increased since
the initial check
* once all LSNs have been retrieved, `repmgrd` will check for the highest LSN; if
  its own node has the highest LSN, it will attempt to promote itself (using the
  command defined in `promote_command` in `repmgr.conf`). Note that if using
  `repmgr standby promote` as the promotion command, and the original master becomes
  available before the promotion takes effect, `repmgr` will return an error, no
  promotion will take place, and `repmgrd` will resume monitoring as usual.
* if the node is not the promotion candidate, `repmgrd` will execute the
`follow_command` defined in `repmgr.conf`. If using `repmgr standby follow` here,
`repmgr` will attempt to detect the new master node and attach to that.
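The "highest LSN wins" comparison at the heart of candidate selection can be sketched as follows. The `lsn_to_int` helper is hypothetical, not part of `repmgr` (repmgr does this in C, and PostgreSQL compares LSNs natively via the `pg_lsn` type); it only illustrates that an LSN such as `0/5000060` is a pair of 32-bit hex numbers:

```shell
# Convert an LSN of the form "hi/lo" (both hexadecimal) to a single integer
# so that two LSNs can be compared numerically.
lsn_to_int() {
    local hi=${1%%/*} lo=${1##*/}
    echo $(( (16#$hi << 32) + 16#$lo ))
}

node_a="0/5000060"   # example LSN reported by one standby
node_b="0/5000108"   # example LSN reported by another standby

# The node that has replayed further (higher LSN) is the promotion candidate
if [ "$(lsn_to_int $node_b)" -gt "$(lsn_to_int $node_a)" ]; then
    echo "promotion candidate: node at $node_b"
fi
```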

docs/repmgrd-node-fencing.md

@@ -0,0 +1,150 @@
Fencing a failed master node with repmgrd and pgbouncer
=======================================================
With automatic failover, it's essential to ensure that a failed master
remains inaccessible to your application, even if it comes back online
again, to avoid a split-brain situation.
By using `pgbouncer` together with `repmgrd`, it's possible to combine
automatic failover with a process to isolate the failed master from
your application and ensure that all connections which should go to
the master are directed there smoothly without having to reconfigure
your application. (Note that as a connection pooler, `pgbouncer` can
benefit your application in other ways, but those are beyond the scope
of this document).
* * *
> *WARNING*: automatic failover is tricky to get right. This document
> demonstrates one possible implementation method, however you should
> carefully configure and test any setup to suit the needs of your own
> replication cluster/application.
* * *
In a failover situation, `repmgrd` promotes a standby to master by
executing the command defined in `promote_command`. Normally this
would be something like:
repmgr standby promote -f /etc/repmgr.conf
By wrapping this in a custom script which adjusts the `pgbouncer`
configuration on all nodes, it's possible to fence the failed master
and redirect write connections to the new master.
The script consists of three sections:
* commands to pause `pgbouncer` on all nodes
* the promotion command itself
* commands to reconfigure and restart `pgbouncer` on all nodes
Note that it requires password-less SSH access between all nodes to be
able to update the `pgbouncer` configuration files.
For the purposes of this demonstration, we'll assume there are 3 nodes
(master and two standbys), with `pgbouncer` listening on port 6432
handling connections to a database called `appdb`. The `postgres`
system user must have write access to the `pgbouncer` configuration
file on all nodes, assumed to be at `/etc/pgbouncer.ini`.
The script also requires a template file containing global `pgbouncer`
configuration, which should look something like this (adjust
settings appropriately for your environment):
`/var/lib/postgres/repmgr/pgbouncer.ini.template`
[pgbouncer]
logfile = /var/log/pgbouncer/pgbouncer.log
pidfile = /var/run/pgbouncer/pgbouncer.pid
listen_addr = *
listen_port = 6432
unix_socket_dir = /tmp
auth_type = trust
auth_file = /etc/pgbouncer.auth
admin_users = postgres
stats_users = postgres
pool_mode = transaction
max_client_conn = 100
default_pool_size = 20
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1
The actual script is as follows; adjust the configurable items as appropriate:
`/var/lib/postgres/repmgr/promote.sh`
#!/usr/bin/env bash

set -u
set -e

# Configurable items
PGBOUNCER_HOSTS="node1 node2 node3"
PGBOUNCER_PORT="6432"
REPMGR_DB="repmgr"
REPMGR_USER="repmgr"
REPMGR_SCHEMA="repmgr_test"
PGBOUNCER_CONFIG="/etc/pgbouncer.ini"
PGBOUNCER_INI_TEMPLATE="/var/lib/postgres/repmgr/pgbouncer.ini.template"
PGBOUNCER_DATABASE="appdb"

# 1. Pause running pgbouncer instances
for HOST in $PGBOUNCER_HOSTS
do
    psql -t -c "pause" -h $HOST -p $PGBOUNCER_PORT -U postgres pgbouncer
done

# 2. Promote this node from standby to master
repmgr standby promote -f /etc/repmgr.conf

# 3. Reconfigure pgbouncer instances
PGBOUNCER_INI_NEW="/tmp/pgbouncer.ini.new"

for HOST in $PGBOUNCER_HOSTS
do
    # Recreate the pgbouncer config file
    echo -e "[databases]\n" > $PGBOUNCER_INI_NEW

    psql -d $REPMGR_DB -U $REPMGR_USER -t -A \
      -c "SELECT '$PGBOUNCER_DATABASE= ' || conninfo || ' application_name=pgbouncer_$HOST' \
          FROM $REPMGR_SCHEMA.repl_nodes \
          WHERE active = TRUE AND type='master'" >> $PGBOUNCER_INI_NEW

    cat $PGBOUNCER_INI_TEMPLATE >> $PGBOUNCER_INI_NEW

    rsync $PGBOUNCER_INI_NEW $HOST:$PGBOUNCER_CONFIG

    psql -t -c "reload" -h $HOST -p $PGBOUNCER_PORT -U postgres pgbouncer
    psql -t -c "resume" -h $HOST -p $PGBOUNCER_PORT -U postgres pgbouncer
done

# Clean up generated file
rm $PGBOUNCER_INI_NEW
echo "Reconfiguration of pgbouncer complete"
The script and template file should be installed on each node where
`repmgrd` is running.
Finally, set `promote_command` in `repmgr.conf` on each node to
point to the custom promote script:
promote_command=/var/lib/postgres/repmgr/promote.sh
and reload/restart any running `repmgrd` instances for the changes to take
effect.
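For reference, the `[databases]` entry the script writes for `pgbouncer` has the following shape; the conninfo and host values below are made up (in the real script they come from the `repl_nodes` table and the node list):

```shell
# Hypothetical conninfo for the newly promoted master, as stored in repl_nodes
conninfo="host=node2 port=5432 dbname=repmgr user=repmgr"
host="node1"   # the pgbouncer host this config is being generated for

entry="appdb= ${conninfo} application_name=pgbouncer_${host}"
echo "[databases]"
echo "$entry"
```

Every `pgbouncer` instance thus points `appdb` at the new master, while connections to the fenced old master are no longer possible through the pooler.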

repmgr.c

@@ -6119,7 +6119,7 @@ do_witness_unregister(void)
 					 NULL, NULL);
 
 	if (PQstatus(master_conn) != CONNECTION_OK)
 	{
-		log_err(_("Unable to connect to master server\n"));
+		log_err(_("unable to connect to master server\n"));
 		exit(ERR_BAD_CONFIG);
 	}

repmgr.conf.sample

@@ -160,6 +160,9 @@
 # These settings are only applied when repmgrd is running. Values shown
 # are defaults.
 
+# monitoring interval in seconds; default is 2
+#monitor_interval_secs=2
+
 # Number of seconds to wait for a response from the primary server before
 # deciding it has failed.
@@ -187,9 +190,6 @@
 #promote_command='repmgr standby promote -f /path/to/repmgr.conf'
 #follow_command='repmgr standby follow -f /path/to/repmgr.conf -W'
 
-# monitoring interval in seconds; default is 2
-#monitor_interval_secs=2
-
 # change wait time for primary; before we bail out and exit when the primary
 # disappears, we wait 'reconnect_attempts' * 'retry_promote_interval_secs'
 # seconds; by default this would be half an hour, as 'retry_promote_interval_secs'

version.h

@@ -1,6 +1,6 @@
 #ifndef _VERSION_H_
 #define _VERSION_H_
 
-#define REPMGR_VERSION "3.2dev"
+#define REPMGR_VERSION "3.2"
 
 #endif