Compare commits


8 Commits
v3.3.1 ... v3.2

Ian Barwick  fefa43e3a6  2016-10-05 16:49:01 +09:00
    Minor README fix

Ian Barwick  c1a1fe6f82  2016-10-05 16:48:40 +09:00
    Update README

    `--ignore-external-config-files` deprecated

Ian Barwick  4dc3a05e8d  2016-10-05 13:58:05 +09:00
    Update history

Ian Barwick  5945accd84  2016-10-05 13:58:01 +09:00
    Add documentation for repmgrd failover process and failed node fencing

    Addresses GitHub #200.

Ian Barwick  15cbda9ec3  2016-10-05 13:57:57 +09:00
    repmgr: consistent error message style

Ian Barwick  358559acc4  2016-10-03 16:04:02 +09:00
    Update barman-wal-restore documentation

    Barman 2.0 provides this in a separate, more convenient `barman-cli` package;
    document this and add note about previous `barman-wal-restore.py` script.

Ian Barwick  0a9f8e160a  2016-10-03 16:03:59 +09:00
    Tweak repmgr.conf.sample

    Put `monitor_interval_secs` at the start of the `repmgrd` section, as it's
    a very fundamental configuration item.

Ian Barwick  a2d67e85de  2016-09-30 15:13:56 +09:00
    Bump version

    3.2
7 changed files with 272 additions and 42 deletions

HISTORY

@@ -1,4 +1,4 @@
-3.2 2016-
+3.2 2016-10-05
     repmgr: add support for cloning from a Barman backup (Gianni)
     repmgr: add commands `standby matrix` and `standby crosscheck` (Gianni)
     repmgr: suppress connection error display in `repmgr cluster show`
@@ -16,9 +16,15 @@
     repmgr: add option `--copy-external-config-files` for files outside
       of the data directory (Ian)
     repmgr: add configuration options to override the default pg_ctl
-      commands (Jarkko Oranen)
+      commands (Jarkko Oranen, Ian)
+    repmgr: only require `wal_keep_segments` to be set in certain corner
+      cases (Ian)
+    repmgr: better support cloning from a node other than the one to
+      stream from (Ian)
+    repmgrd: don't start if node is inactive and failover=automatic (Ian)
     packaging: improve "repmgr-auto" Debian package (Gianni)
 3.1.5 2016-08-15
     repmgrd: in a failover situation, prevent endless looping when
       attempting to establish the status of a node with

README.md

@@ -7,7 +7,7 @@ replication capabilities with utilities to set up standby servers, monitor
 replication, and perform administrative tasks such as failover or switchover
 operations.
 
-The current `repmgr` version, 3.1.5, supports all PostgreSQL versions from
+The current `repmgr` version, 3.2, supports all PostgreSQL versions from
 9.3, including the upcoming 9.6.
 
 Overview
@@ -580,13 +580,13 @@ base backups and WAL files.
 Barman support provides the following advantages:
 
-- the primary node does not need to perform a new backup every time a
+- the master node does not need to perform a new backup every time a
   new standby is cloned;
 - a standby node can be disconnected for longer periods without losing
   the ability to catch up, and without causing accumulation of WAL
-  files on the primary node;
+  files on the master node;
 - therefore, `repmgr` does not need to use replication slots, and the
-  primary node does not need to set `wal_keep_segments`.
+  master node does not need to set `wal_keep_segments`.
 
 > *NOTE*: In view of the above, Barman support is incompatible with
 > the `use_replication_slots` setting in `repmgr.conf`.
@@ -599,8 +599,8 @@ ensure that:
 - the `barman_server` setting in `repmgr.conf` is set to the SSH
   hostname of the Barman server;
 - the `restore_command` setting in `repmgr.conf` is configured to
-  use a copy of the `barman-wal-restore.py` script shipped with Barman
-  (see below);
+  use a copy of the `barman-wal-restore` script shipped with the
+  `barman-cli` package (see below);
 - the Barman catalogue includes at least one valid backup for this
   server.
@@ -616,26 +616,24 @@ ensure that:
 > corresponding to the value of `barman_server` in `repmgr.conf`. See
 > the "Host" section in `man 5 ssh_config` for more details.
 
-`barman-wal-restore.py` is a Python script provided by the Barman
-development team, which must be copied in a location accessible to
-`repmgr`, and marked as executable; `restore_command` must then be
-set in `repmgr.conf` as follows:
+`barman-wal-restore` is a Python script provided by the Barman
+development team as part of the `barman-cli` package (Barman 2.0
+and later; for Barman 1.x the script is provided separately as
+`barman-wal-restore.py`).
+
+`restore_command` must then be set in `repmgr.conf` as follows:
 
     <script> <Barman hostname> <cluster_name> %f %p
 
 For instance, suppose that we have installed Barman on the `barmansrv`
-host, and that we have placed a copy of `barman-wal-restore.py` into
-the `/usr/local/bin` directory. First, we ensure that the script is
-executable:
-
-    sudo chmod +x /usr/local/bin/barman-wal-restore.py
-
-Then we check that `repmgr.conf` includes the following lines:
+host, and that `barman-wal-restore` is located as an executable at
+`/usr/bin/barman-wal-restore`; `repmgr.conf` should include the following
+lines:
 
     barman_server=barmansrv
-    restore_command=/usr/local/bin/barman-wal-restore.py barmansrv test %f %p
+    restore_command=/usr/bin/barman-wal-restore barmansrv test %f %p
 
-To use a non-default Barman configuration file on the Barman server,
+NOTE: to use a non-default Barman configuration file on the Barman server,
 specify this in `repmgr.conf` with `barman_config`:
 
     barman_config=/path/to/barman.conf
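As an aside on the `%f`/`%p` placeholders in `restore_command`: the sketch below shows roughly what command PostgreSQL ends up running during recovery. The WAL file name and destination path are made-up example values, and `sed` here merely stands in for PostgreSQL's own placeholder substitution:

```shell
# restore_command as configured above
restore_command='/usr/bin/barman-wal-restore barmansrv test %f %p'

wal_file="000000010000000000000003"   # example WAL segment name requested (%f)
wal_path="pg_xlog/RECOVERYXLOG"       # example destination path (%p)

# Substitute the placeholders the way the server would before executing
cmd=$(echo "$restore_command" | sed -e "s|%f|$wal_file|g" -e "s|%p|$wal_path|g")
echo "$cmd"
```

The resulting command fetches the requested segment from the Barman server's archive and writes it to the path the recovering server expects.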
@@ -688,24 +686,10 @@ and destination server as the contents of files existing on both servers need
 to be compared, meaning this method is not necessarily faster than making a
 fresh clone with `pg_basebackup`.
 
-> *NOTE*: `barman-wal-restore.py` supports command line switches to
+> *NOTE*: `barman-wal-restore` supports command line switches to
 > control parallelism (`--parallel=N`) and compression (`--bzip2`,
 > `--gzip`).
 
-### Dealing with PostgreSQL configuration files
-
-By default, `repmgr` will attempt to copy the standard configuration files
-(`postgresql.conf`, `pg_hba.conf` and `pg_ident.conf`) even if they are located
-outside of the data directory (though currently they will be copied
-into the standby's data directory). To prevent this happening, when executing
-`repmgr standby clone` provide the `--ignore-external-config-files` option.
-
-If using `rsync` to clone a standby, additional control over which files
-not to transfer is possible by configuring `rsync_options` in `repmgr.conf`,
-which enables any valid `rsync` options to be passed to that command, e.g.:
-
-    rsync_options='--exclude=postgresql.local.conf'
-
 ### Controlling `primary_conninfo` in `recovery.conf`
 
 The `primary_conninfo` setting in `recovery.conf` generated by `repmgr`
@@ -1028,7 +1012,7 @@ should have been updated to reflect this:
   at a two-server master/standby replication cluster and currently does
   not support additional standbys.
 - `repmgr standby switchover` is designed to use the `pg_rewind` utility,
-  standard in 9.5 and later and available for separately in 9.3 and 9.4
+  standard in 9.5 and later and available separately in 9.3 and 9.4
   (see note below)
 - `pg_rewind` *requires* that either `wal_log_hints` is enabled, or that
   data checksums were enabled when the cluster was initialized. See the
@@ -1745,6 +1729,21 @@ which contains connection details for the local database.
 the current working directory; no additional arguments are required.
 
+### Further documentation
+
+As well as this README, the `repmgr` source contains the following additional
+documentation files:
+
+* FAQ.md - frequently asked questions
+* CONTRIBUTING.md - how to contribute to `repmgr`
+* PACKAGES.md - details on building packages
+* SSH-RSYNC.md - how to set up passwordless SSH between nodes
+* docs/repmgrd-failover-mechanism.md - how repmgrd picks which node to promote
+* docs/repmgrd-node-fencing.md - how to "fence" a failed master node
+
 ### Error codes
 
 `repmgr` or `repmgrd` will return one of the following error codes on program

docs/repmgrd-failover-mechanism.md

@@ -0,0 +1,75 @@
repmgrd's failover algorithm
============================
When implementing automatic failover, there are two factors which are critical in
ensuring the desired result is achieved:
- has the master node genuinely failed?
- which is the best node to promote to the new master?
This document outlines repmgrd's decision-making process during automatic failover
for standbys directly connected to the master node.
Master node failure detection
-----------------------------
If a `repmgrd` instance running on a PostgreSQL standby node is unable to connect to
the master node, this doesn't necessarily mean that the master is down and a
failover is required. Factors such as network connectivity issues could mean that
even though the standby node is isolated, the replication cluster as a whole
is functioning correctly, and promoting the standby without further verification
could result in a "split-brain" situation.
In the event that `repmgrd` is unable to connect to the master node, it will attempt
to reconnect to the master server several times (as defined by the `reconnect_attempts`
parameter in `repmgr.conf`), with reconnection attempts occurring at the interval
specified by `reconnect_interval`. This is done to verify that the master really is
not accessible (e.g. that the connection was not lost due to a brief network glitch).
Appropriate values for these settings will depend very much on the replication
cluster environment. There will necessarily be a trade-off between the time it
takes to assume the master is not reachable, and the reliability of that conclusion.
A standby in a different physical location to the master will probably need a longer
check interval to rule out possible network issues, whereas one located in the same
rack with a direct connection between servers could perform the check very quickly.
Note that it's possible the master comes back online after this point is reached,
but before a new master has been selected; in this case it will be noticed
during the selection of a new master and no actual failover will take place.
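To make the reconnect trade-off concrete: the worst-case time before `repmgrd` concludes the master is unreachable is roughly the product of the two settings. The values below are illustrative examples only, not recommendations:

```shell
# Example values as they might appear in repmgr.conf; tune for your environment.
reconnect_attempts=6
reconnect_interval=10   # seconds between attempts

# Approximate worst-case detection time before failover action begins
detection_time=$(( reconnect_attempts * reconnect_interval ))
echo "master declared unreachable after ~${detection_time}s"
```

A same-rack standby might use far smaller values; geographically separated nodes will usually want larger ones to ride out transient network issues.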
Promotion candidate selection
-----------------------------
Once `repmgrd` has decided the master is definitively unreachable, the following
checks will be carried out:
* `repmgrd` attempts to connect to all other nodes in the cluster (including the
  witness node, if defined) to establish the state of the cluster, including their
  current LSN
* if fewer than half of the nodes are visible (from the viewpoint
  of this node), `repmgrd` will not take any further action. This is to ensure that
  e.g. if a replication cluster is spread over multiple data centres, a split-brain
  situation does not occur if there is a network failure between data centres. Note
  that if nodes are split evenly between data centres, a witness server can be
  used to establish the "majority" data centre.
* `repmgrd` polls all visible servers and waits for each node to return a valid LSN;
it updates the LSN previously stored for this node if it has increased since
the initial check
* once all LSNs have been retrieved, `repmgrd` will check for the highest LSN; if
  its own node has the highest LSN, it will attempt to promote itself (using the
  command defined in `promote_command` in `repmgr.conf`). Note that if using
  `repmgr standby promote` as the promotion command, and the original master becomes
  available before the promotion takes effect, `repmgr` will return an error, no
  promotion will take place, and `repmgrd` will resume monitoring as usual.
* if the node is not the promotion candidate, `repmgrd` will execute the
`follow_command` defined in `repmgr.conf`. If using `repmgr standby follow` here,
`repmgr` will attempt to detect the new master node and attach to that.
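The "highest LSN wins" comparison at the heart of candidate selection can be sketched as follows. The `lsn_to_int` helper is hypothetical, not part of `repmgr` (repmgr does this in C, and PostgreSQL compares LSNs natively via the `pg_lsn` type); it only illustrates that an LSN such as `0/5000060` is a pair of 32-bit hex numbers:

```shell
# Convert an LSN of the form "hi/lo" (both hexadecimal) to a single integer
# so that two LSNs can be compared numerically.
lsn_to_int() {
    local hi=${1%%/*} lo=${1##*/}
    echo $(( (16#$hi << 32) + 16#$lo ))
}

node_a="0/5000060"   # example LSN reported by one standby
node_b="0/5000108"   # example LSN reported by another standby

# The node that has replayed further (higher LSN) is the promotion candidate
if [ "$(lsn_to_int $node_b)" -gt "$(lsn_to_int $node_a)" ]; then
    echo "promotion candidate: node at $node_b"
fi
```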

docs/repmgrd-node-fencing.md

@@ -0,0 +1,150 @@
Fencing a failed master node with repmgrd and pgbouncer
=======================================================
With automatic failover, it's essential to ensure that a failed master
remains inaccessible to your application, even if it comes back online
again, to avoid a split-brain situation.
By using `pgbouncer` together with `repmgrd`, it's possible to combine
automatic failover with a process to isolate the failed master from
your application and ensure that all connections which should go to
the master are directed there smoothly without having to reconfigure
your application. (Note that as a connection pooler, `pgbouncer` can
benefit your application in other ways, but those are beyond the scope
of this document).
* * *
> *WARNING*: automatic failover is tricky to get right. This document
> demonstrates one possible implementation method, however you should
> carefully configure and test any setup to suit the needs of your own
> replication cluster/application.
* * *
In a failover situation, `repmgrd` promotes a standby to master by
executing the command defined in `promote_command`. Normally this
would be something like:
repmgr standby promote -f /etc/repmgr.conf
By wrapping this in a custom script which adjusts the `pgbouncer`
configuration on all nodes, it's possible to fence the failed master
and redirect write connections to the new master.
The script consists of three sections:
* commands to pause `pgbouncer` on all nodes
* the promotion command itself
* commands to reconfigure and restart `pgbouncer` on all nodes
Note that it requires password-less SSH access between all nodes to be
able to update the `pgbouncer` configuration files.
For the purposes of this demonstration, we'll assume there are 3 nodes
(master and two standbys), with `pgbouncer` listening on port 6432
handling connections to a database called `appdb`. The `postgres`
system user must have write access to the `pgbouncer` configuration
file on all nodes, assumed to be at `/etc/pgbouncer.ini`.
The script also requires a template file containing global `pgbouncer`
configuration, which should look something like this (adjust
settings appropriately for your environment):
`/var/lib/postgres/repmgr/pgbouncer.ini.template`
[pgbouncer]
logfile = /var/log/pgbouncer/pgbouncer.log
pidfile = /var/run/pgbouncer/pgbouncer.pid
listen_addr = *
listen_port = 6432
unix_socket_dir = /tmp
auth_type = trust
auth_file = /etc/pgbouncer.auth
admin_users = postgres
stats_users = postgres
pool_mode = transaction
max_client_conn = 100
default_pool_size = 20
min_pool_size = 5
reserve_pool_size = 5
reserve_pool_timeout = 3
log_connections = 1
log_disconnections = 1
log_pooler_errors = 1
The actual script is as follows; adjust the configurable items as appropriate:
`/var/lib/postgres/repmgr/promote.sh`
#!/usr/bin/env bash

set -u
set -e

# Configurable items
PGBOUNCER_HOSTS="node1 node2 node3"
PGBOUNCER_PORT="6432"
REPMGR_DB="repmgr"
REPMGR_USER="repmgr"
REPMGR_SCHEMA="repmgr_test"
PGBOUNCER_CONFIG="/etc/pgbouncer.ini"
PGBOUNCER_INI_TEMPLATE="/var/lib/postgres/repmgr/pgbouncer.ini.template"
PGBOUNCER_DATABASE="appdb"

# 1. Pause running pgbouncer instances
for HOST in $PGBOUNCER_HOSTS
do
    psql -t -c "pause" -h $HOST -p $PGBOUNCER_PORT -U postgres pgbouncer
done

# 2. Promote this node from standby to master
repmgr standby promote -f /etc/repmgr.conf

# 3. Reconfigure pgbouncer instances
PGBOUNCER_INI_NEW="/tmp/pgbouncer.ini.new"

for HOST in $PGBOUNCER_HOSTS
do
    # Recreate the pgbouncer config file
    echo -e "[databases]\n" > $PGBOUNCER_INI_NEW

    psql -d $REPMGR_DB -U $REPMGR_USER -t -A \
      -c "SELECT '$PGBOUNCER_DATABASE= ' || conninfo || ' application_name=pgbouncer_$HOST' \
          FROM $REPMGR_SCHEMA.repl_nodes \
          WHERE active = TRUE AND type='master'" >> $PGBOUNCER_INI_NEW

    cat $PGBOUNCER_INI_TEMPLATE >> $PGBOUNCER_INI_NEW

    rsync $PGBOUNCER_INI_NEW $HOST:$PGBOUNCER_CONFIG

    psql -t -c "reload" -h $HOST -p $PGBOUNCER_PORT -U postgres pgbouncer
    psql -t -c "resume" -h $HOST -p $PGBOUNCER_PORT -U postgres pgbouncer
done

# Clean up generated file
rm $PGBOUNCER_INI_NEW
echo "Reconfiguration of pgbouncer complete"
The script and template file should be installed on each node where
`repmgrd` is running.
Finally, set `promote_command` in `repmgr.conf` on each node to
point to the custom promote script:
promote_command=/var/lib/postgres/repmgr/promote.sh
and reload/restart any running `repmgrd` instances for the changes to take
effect.
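For reference, the `[databases]` entry the script writes for `pgbouncer` has the following shape; the conninfo and host values below are made up (in the real script they come from the `repl_nodes` table and the node list):

```shell
# Hypothetical conninfo for the newly promoted master, as stored in repl_nodes
conninfo="host=node2 port=5432 dbname=repmgr user=repmgr"
host="node1"   # the pgbouncer host this config is being generated for

entry="appdb= ${conninfo} application_name=pgbouncer_${host}"
echo "[databases]"
echo "$entry"
```

Every `pgbouncer` instance thus points `appdb` at the new master, while connections to the fenced old master are no longer possible through the pooler.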

repmgr.c

@@ -6119,7 +6119,7 @@ do_witness_unregister(void)
 					 NULL, NULL);
 
 	if (PQstatus(master_conn) != CONNECTION_OK)
 	{
-		log_err(_("Unable to connect to master server\n"));
+		log_err(_("unable to connect to master server\n"));
 		exit(ERR_BAD_CONFIG);
 	}

repmgr.conf.sample

@@ -160,6 +160,9 @@
 # These settings are only applied when repmgrd is running. Values shown
 # are defaults.
 
+# monitoring interval in seconds; default is 2
+#monitor_interval_secs=2
+
 # Number of seconds to wait for a response from the primary server before
 # deciding it has failed.
@@ -187,9 +190,6 @@
 #promote_command='repmgr standby promote -f /path/to/repmgr.conf'
 #follow_command='repmgr standby follow -f /path/to/repmgr.conf -W'
 
-# monitoring interval in seconds; default is 2
-#monitor_interval_secs=2
-
 # change wait time for primary; before we bail out and exit when the primary
 # disappears, we wait 'reconnect_attempts' * 'retry_promote_interval_secs'
 # seconds; by default this would be half an hour, as 'retry_promote_interval_secs'

version.h

@@ -1,6 +1,6 @@
 #ifndef _VERSION_H_
 #define _VERSION_H_
 
-#define REPMGR_VERSION "3.2dev"
+#define REPMGR_VERSION "3.2"
 
 #endif