diff --git a/FAILOVER.rst b/FAILOVER.rst deleted file mode 100644 index 72128632..00000000 --- a/FAILOVER.rst +++ /dev/null @@ -1,238 +0,0 @@ -==================================================== - PostgreSQL Automatic Failover - User Documentation -==================================================== - -Automatic Failover -================== - -repmgr allows for automatic failover when it detects the failure of the master node. -Following is a quick setup for this. - -Installation -============ - -For convenience, we define: - -**node1** - is the fully qualified domain name of the Master server, IP 192.168.1.10 -**node2** - is the fully qualified domain name of the Standby server, IP 192.168.1.11 -**witness** - is the fully qualified domain name of the server used as a witness, IP 192.168.1.12 - -**Note:** We don't recommend using names with the status of a server like «masterserver», -because it would be confusing once a failover takes place and the Master is -now on the «standbyserver». - -Summary -------- - -2 PostgreSQL servers are involved in the replication. Automatic failover needs -a vote to decide what server it should promote, so an odd number is required. -A witness-repmgrd is installed in a third server where it uses a PostgreSQL -cluster to communicate with other repmgrd daemons. - -1. Install PostgreSQL in all the servers involved (including the witness server) - -2. Install repmgr in all the servers involved (including the witness server) - -3. Configure the Master PostreSQL - -4. Clone the Master to the Standby using "repmgr standby clone" command - -5. Configure repmgr in all the servers involved (including the witness server) - -6. Register Master and Standby nodes - -7. Initiate witness server - -8. Start the repmgrd daemons in all nodes - -**Note** A complete High-Availability design needs at least 3 servers to still have -a backup node after a first failure. 
- -Install PostgreSQL ------------------- - -You can install PostgreSQL using any of the recommended methods. You should ensure -it's 9.0 or later. - -Install repmgr --------------- - -Install repmgr following the steps in the README file. - -Configure PostreSQL -------------------- - -Log in to node1. - -Edit the file postgresql.conf and modify the parameters:: - - listen_addresses='*' - wal_level = 'hot_standby' - archive_mode = on - archive_command = 'cd .' # we can also use exit 0, anything that - # just does nothing - max_wal_senders = 10 - wal_keep_segments = 5000 # 80 GB required on pg_xlog - hot_standby = on - shared_preload_libraries = 'repmgr_funcs' - -Edit the file pg_hba.conf and add lines for the replication:: - - host repmgr repmgr 127.0.0.1/32 trust - host repmgr repmgr 192.168.1.10/30 trust - host replication all 192.168.1.10/30 trust - -**Note:** It is also possible to use a password authentication (md5), .pgpass file -should be edited to allow connection between each node. - -Create the user and database to manage replication:: - - su - postgres - createuser -s repmgr - createdb -O repmgr repmgr - -Restart the PostgreSQL server:: - - pg_ctl -D $PGDATA restart - -And check everything is fine in the server log. - -Create the ssh-key for the postgres user and copy it to other servers:: - - su - postgres - ssh-keygen # /!\ do not use a passphrase /!\ - cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys - chmod 600 ~/.ssh/authorized_keys - exit - rsync -avz ~postgres/.ssh/authorized_keys node2:~postgres/.ssh/ - rsync -avz ~postgres/.ssh/authorized_keys witness:~postgres/.ssh/ - rsync -avz ~postgres/.ssh/id_rsa* node2:~postgres/.ssh/ - rsync -avz ~postgres/.ssh/id_rsa* witness:~postgres/.ssh/ - -Clone Master ------------- - -Log in to node2. - -Clone node1 (the current Master):: - - su - postgres - repmgr -d repmgr -U repmgr -h node1 standby clone - -Start the PostgreSQL server:: - - pg_ctl -D $PGDATA start - -And check everything is fine in the server log. 
- -Configure repmgr ----------------- - -Log in to each server and configure repmgr by editing the file -/etc/repmgr/repmgr.conf:: - - cluster=my_cluster - node=1 - node_name=earth - conninfo='host=192.168.1.10 dbname=repmgr user=repmgr' - master_response_timeout=60 - reconnect_attempts=6 - reconnect_interval=10 - failover=automatic - promote_command='promote_command.sh' - follow_command='repmgr standby follow -f /etc/repmgr/repmgr.conf' - -**cluster** - is the name of the current replication. -**node** - is the number of the current node (1, 2 or 3 in the current example). -**node_name** - is an identifier for every node. -**conninfo** - is used to connect to the local PostgreSQL server (where the configuration file is) from any node. In the witness server configuration you need to add a 'port=5499' to the conninfo. -**master_response_timeout** - is the maximum amount of time we are going to wait before deciding the master has died and start the failover procedure. -**reconnect_attempts** - is the number of times we will try to reconnect to master after a failure has been detected and before start the failover procedure. -**reconnect_interval** - is the amount of time between retries to reconnect to master after a failure has been detected and before start the failover procedure. -**failover** - configure behavior: *manual* or *automatic*. -**promote_command** - the command executed to do the failover (including the PostgreSQL failover itself). The command must return 0 on success. -**follow_command** - the command executed to address the current standby to another Master. The command must return 0 on success. - -Register Master and Standby ---------------------------- - -Log in to node1. - -Register the node as master:: - - su - postgres - repmgr -f /etc/repmgr/repmgr.conf master register - -This will also create the repmgr schema and functions. - -Log in to node2. 
Register it as a standby:: - - su - postgres - repmgr -f /etc/repmgr/repmgr.conf standby register - -Initialize witness server -------------------------- - -Log in to witness. - -Initialize the witness server:: - - su - postgres - repmgr -d repmgr -U repmgr -h 192.168.1.10 -D $WITNESS_PGDATA -f /etc/repmgr/repmgr.conf witness create - -The witness server needs the following information from the command -line: - -* Connection details for the current master, to copy the cluster - configuration. -* A location for initializing its own $PGDATA. - -repmgr will also ask for the superuser password on the witness database so -it can reconnect when needed (the command line option --initdb-no-pwprompt -will set up a password-less superuser). - -By default the witness server will listen on port 5499; this value can be -overridden by explicitly providing the port number in the conninfo string -in repmgr.conf. (Note that it is also possible to specify the port number -with the -l/--local-port option, however this option is now deprecated and -will be overridden by a port setting in the conninfo string). - -Start the repmgrd daemons -------------------------- - -Log in to node2 and witness:: - - su - postgres - repmgrd -f /etc/repmgr/repmgr.conf --daemonize -> /var/log/postgresql/repmgr.log 2>&1 - -**Note:** The Master does not need a repmgrd daemon. - -Suspend Automatic behavior -========================== - -Edit the repmgr.conf of the node to remove from automatic processing and change:: - - failover=manual - -Then, signal repmgrd daemon:: - - su - postgres - kill -HUP $(pidof repmgrd) - -Usage -===== - -The repmgr documentation is in the README file (how to build, options, etc.) diff --git a/QUICKSTART.md b/QUICKSTART.md deleted file mode 100644 index 2dd3e492..00000000 --- a/QUICKSTART.md +++ /dev/null @@ -1,118 +0,0 @@ -repmgr quickstart guide -======================= - -This quickstart guide provides some annotated examples on basic -`repmgr` setup. 
It assumes you are familiar with PostgreSQL replication -concepts setup and Linux/UNIX system administration. - -For the purposes of this guide, we'll assume the database user will be -`repmgr_usr` and the database will be `repmgr_db`. - - -Master setup ------------- - -1. Configure PostgreSQL - - - create user and database: - - ``` - CREATE ROLE repmgr_usr LOGIN SUPERUSER; - CREATE DATABASE repmgr_db OWNER repmgr_usr; - ``` - - - configure `postgresql.conf` for replication (see README.md for sample - settings) - - - update `pg_hba.conf`, e.g.: - - ``` - host repmgr_db repmgr_usr 192.168.1.0/24 trust - host replication repmgr_usr 192.168.1.0/24 trust - ``` - - Restart the PostgreSQL server after making these changes. - -2. Create the `repmgr` configuration file: - - $ cat /path/to/repmgr/node1/repmgr.conf - cluster=test - node=1 - node_name=node1 - conninfo='host=repmgr_node1 user=repmgr_usr dbname=repmgr_db' - pg_bindir=/path/to/postgres/bin - - (For an annotated `repmgr.conf` file, see `repmgr.conf.sample` in the - repository's root directory). - -3. Register the master node with `repmgr`: - - $ repmgr -f /path/to/repmgr/node1/repmgr.conf --verbose master register - [2015-03-03 17:45:53] [INFO] repmgr connecting to master database - [2015-03-03 17:45:53] [INFO] repmgr connected to master, checking its state - [2015-03-03 17:45:53] [INFO] master register: creating database objects inside the repmgr_test schema - [2015-03-03 17:45:53] [NOTICE] Master node correctly registered for cluster test with id 1 (conninfo: host=localhost user=repmgr_usr dbname=repmgr_db) - -Standby setup -------------- - -1. 
Use `repmgr standby clone` to clone a standby from the master: - - repmgr -D /path/to/standby/data -d repmgr_db -U repmgr_usr --verbose standby clone 192.168.1.2 - [2015-03-03 18:18:21] [NOTICE] No configuration file provided and default file './repmgr.conf' not found - continuing with default values - [2015-03-03 18:18:21] [NOTICE] repmgr Destination directory ' /path/to/standby/data' provided - [2015-03-03 18:18:21] [INFO] repmgr connecting to upstream node - [2015-03-03 18:18:21] [INFO] repmgr connected to upstream node, checking its state - [2015-03-03 18:18:21] [INFO] Successfully connected to upstream node. Current installation size is 27 MB - [2015-03-03 18:18:21] [NOTICE] Starting backup... - [2015-03-03 18:18:21] [INFO] creating directory " /path/to/standby/data"... - [2015-03-03 18:18:21] [INFO] Executing: 'pg_basebackup -l "repmgr base backup" -h localhost -p 9595 -U repmgr_usr -D /path/to/standby/data ' - NOTICE: pg_stop_backup complete, all required WAL segments have been archived - [2015-03-03 18:18:23] [NOTICE] repmgr standby clone (using pg_basebackup) complete - [2015-03-03 18:18:23] [NOTICE] HINT: You can now start your postgresql server - [2015-03-03 18:18:23] [NOTICE] for example : pg_ctl -D /path/to/standby/data start - - Note that the `repmgr.conf` file is not required when cloning a standby. - However we recommend providing a valid `repmgr.conf` if you wish to use - replication slots, or want `repmgr` to log the clone event to the - `repl_events` table. - - This will clone the PostgreSQL database files from the master, including its - `postgresql.conf` and `pg_hba.conf` files, and additionally automatically create - the `recovery.conf` file containing the correct parameters to start streaming - from the primary node. - -2. Start the PostgreSQL server - -3. 
Create the `repmgr` configuration file: - - $ cat /path/node2/repmgr/repmgr.conf - cluster=test - node=2 - node_name=node2 - conninfo='host=repmgr_node2 user=repmgr_usr dbname=repmgr_db' - pg_bindir=/path/to/postgres/bin - -4. Register the standby node with `repmgr`: - - $ repmgr -f /path/to/repmgr/node2/repmgr.conf --verbose standby register - [2015-03-03 18:24:34] [NOTICE] Opening configuration file: /path/to/repmgr/node2/repmgr.conf - [2015-03-03 18:24:34] [INFO] repmgr connecting to standby database - [2015-03-03 18:24:34] [INFO] repmgr connecting to master database - [2015-03-03 18:24:34] [INFO] finding node list for cluster 'test' - [2015-03-03 18:24:34] [INFO] checking role of cluster node '1' - [2015-03-03 18:24:34] [INFO] repmgr connected to master, checking its state - [2015-03-03 18:24:34] [INFO] repmgr registering the standby - [2015-03-03 18:24:34] [INFO] repmgr registering the standby complete - [2015-03-03 18:24:34] [NOTICE] Standby node correctly registered for cluster test with id 2 (conninfo: host=localhost user=repmgr_usr dbname=repmgr_db) - - -This concludes the basic `repmgr` setup of master and standby. 
The records -created in the `repl_nodes` table should look something like this: - - repmgr_db=# SELECT * from repmgr_test.repl_nodes; - id | type | upstream_node_id | cluster | name | conninfo | slot_name | priority | active - ----+---------+------------------+---------+-------+----------------------------------------------------+-----------+----------+-------- - 1 | primary | | test | node1 | host=repmgr_node1 user=repmgr_usr dbname=repmgr_db | | 0 | t - 2 | standby | 1 | test | node2 | host=repmgr_node2 user=repmgr_usr dbname=repmgr_db | | 0 | t - (2 rows) diff --git a/README.md b/README.md index 6ce9dcbd..59959b03 100644 --- a/README.md +++ b/README.md @@ -1,278 +1,944 @@ repmgr: Replication Manager for PostgreSQL ========================================== -`repmgr` is an open-source tool to manage replication and failover -between multiple PostgreSQL servers. It enhances PostgreSQL's built-in -hot-standby capabilities with tools to set up standby servers, monitor +`repmgr` is a suite of open-source tools to manage replication and failover +within a cluster of PostgreSQL servers. It enhances PostgreSQL's built-in +replication capabilities with utilities to set up standby servers, monitor replication, and perform administrative tasks such as failover or manual switchover operations. -This document covers `repmgr 3`, which supports PostgreSQL 9.3 and later. -This version can use `pg_basebackup` to clone standby servers, supports -replication slots and cascading replication, doesn't require a restart -after promotion, and has many usability improvements. - -Please continue to use `repmgr 2` with PostgreSQL 9.2 and earlier. -For a list of changes since `repmgr 2` and instructions on upgrading to -`repmgr 3`, see the "Upgrading from repmgr 2" section below. - -For a list of frequently asked questions about `repmgr`, please refer -to the file `FAQ.md`. 
Overview -------- -The `repmgr` command-line tool is used to perform administrative tasks, -and the `repmgrd` daemon is used to optionally monitor replication and -manage automatic failover. +The `repmgr` suite provides two main tools: -To get started, each PostgreSQL node in your cluster must have a -`repmgr.conf` file. The current master node must be registered using -`repmgr master register`. Existing standby servers can be registered -using `repmgr standby register`. A new standby server can be created -using `repmgr standby clone` followed by `repmgr standby register`. +- `repmgr` - a command-line tool used to perform administrative tasks such as: + - setting up standby servers + - promoting a standby server to master + - switching over master and standby servers + - displaying the status of servers in the replication cluster -See the `QUICKSTART.md` file for examples of how to use these commands. +- `repmgrd` - a daemon which actively monitors servers in a replication cluster + and performs the following tasks: + - monitoring and recording replication performance + - performing failover by detecting failure of the master and + promoting the most suitable standby server + - providing notifications about events in the cluster to a user-defined + script which can perform tasks such as sending alerts by email -Once the cluster is in operation, run `repmgr cluster show` to see the -status of the registered primary and standby nodes. Any standby can be -manually promoted using `repmgr standby promote`. Other standby nodes -can be told to follow the new master using `repmgr standby follow`. We -show examples of these commands below. -Next, for detailed monitoring, you must run `repmgrd` (with the same -configuration file) on all your nodes. Replication status information is -stored in a custom schema along with information about registered nodes. -You also need `repmgrd` to configure automatic failover in your cluster.
+`repmgr` supports and enhances PostgreSQL's built-in streaming replication, which +provides a single read/write master server and one or more read-only standbys +containing near-real-time copies of the master server's database. -See the `FAILOVER.rst` file for an explanation of how to set up -automatic failover. +For a multi-master replication solution, please see 2ndQuadrant's BDR +(bi-directional replication) extension. For selective replication, e.g. +of individual tables or databases from one server to another, please +see 2ndQuadrant's pglogical extension. -Requirements ------------- -`repmgr` is developed and tested on Linux and OS X, but it should work -on any UNIX-like system which PostgreSQL itself supports. +### Concepts -All nodes must be running the same major version of PostgreSQL, and we -recommend that they also run the same minor version. This version of -`repmgr` (v3) supports PostgreSQL 9.3 and later. +This guide assumes that you are familiar with PostgreSQL administration and +streaming replication concepts. For further details on streaming +replication, see this link: -Earlier versions of `repmgr` needed password-less SSH access between -nodes in order to clone standby servers using `rsync`. `repmgr 3` can -use `pg_basebackup` instead in most circumstances; ssh is not required. + http://www.postgresql.org/docs/current/interactive/warm-standby.html#STREAMING-REPLICATION -You will need to use rsync only if your PostgreSQL configuration files -are outside your data directory (as on Debian) and you wish these to -be copied by `repmgr`. See the `SSH-RSYNC.md` file for details on -configuring password-less SSH between your nodes. +The following terms are used throughout the `repmgr` documentation. + +- `replication cluster` + +In the `repmgr` documentation, "replication cluster" refers to the network +of PostgreSQL servers connected by streaming replication. + +- `node` + +A `node` is a server within a replication cluster.
+ +- `upstream node` + +This is the node a standby server is connected to; either the master server or in +the case of cascading replication, another standby. + +- `failover` + +If a master server fails, generally it's desirable to promote a suitable +standby as the new master. The `repmgrd` daemon supports automatic failover +in this kind of case. + +- `switchover` + +In certain circumstances, such as hardware or operating system maintenance, +it's necessary to take a master server offline; in this case a controlled +switchover is necessary, whereby a suitable standby is promoted and the +existing master removed from the replication cluster in a controlled manner. +The `repmgr` command line client provides this functionality. + +- `witness server` + +`repmgr` provides functionality to set up a so-called "witness server" to +assist in determining a new master server in a failover situation with more +than one standby. The witness server itself is not part of the replication +cluster, although it does contain a copy of the repmgr metadata schema +(see below). + +The purpose of a witness server is to provide a "casting vote" where servers +in the replication cluster are split over more than one location. In the event +of a loss of connectivity between locations, the presence or absence of +the witness server will decide whether a server at that location is promoted +to master; this is to prevent a "split-brain" situation where an isolated +location interprets a network outage as a failure of the (remote) master and +promotes a (local) standby. + +A witness server only needs to be created if `repmgrd` is in use. + +### repmgr user and metadata + +In order to effectively manage a replication cluster, `repmgr` needs to store +information about the servers in the cluster in a dedicated database schema. 
+This schema is automatically created during the first step in initialising +a `repmgr`-controlled cluster (`repmgr master register`) and contains the +following objects: + +tables: + - `repl_events`: records events of interest + - `repl_nodes`: connection and status information for each server in the + replication cluster + - `repl_monitor`: historical standby monitoring information written by `repmgrd` + +views: + - `repl_show_nodes`: based on the `repl_nodes` table, additionally showing the + name of the server's upstream node + - `repl_status`: when `repmgrd`'s monitoring is enabled, shows current monitoring + status for each node + +The `repmgr` metadata schema can be stored in an existing database or in its own +dedicated database. + +A dedicated superuser is required to own the meta-database as well as carry out +administrative actions. Installation ------------ -`repmgr` must be installed on each PostgreSQL server node. +### System requirements -* Packages - - PGDG publishes RPM packages for RedHat-based distributions - - Debian/Ubuntu provide .deb packages. - - See `PACKAGES.md` for details on building .deb and .rpm packages - from the `repmgr` source code. +`repmgr` is developed and tested on Linux and OS X, but should work on any +UNIX-like system supported by PostgreSQL itself. + +`repmgr` supports PostgreSQL from version 9.3. + +All servers in the replication cluster must be running the same major version of +PostgreSQL, and we recommend that they also run the same minor version. + +The `repmgr` tools must be installed on each server in the replication cluster. + +A dedicated system user for `repmgr` is *not* required; as many `repmgr` and +`repmgrd` actions require direct access to the PostgreSQL data directory, +these tools are usually executed by the `postgres` user. + +Additionally, we recommend installing `rsync` and enabling password-less +`ssh` connectivity between all servers in the replication cluster.
+ +### Packages + +We recommend installing `repmgr` using the available packages for your +system. + +- RedHat/CentOS: RPM packages for `repmgr` are available via Yum through + the PostgreSQL Global Development Group RPM repository ( http://yum.postgresql.org/ ). + You need to follow the instructions for your distribution (RedHat, CentOS, + Fedora, etc.) and architecture as detailed at yum.postgresql.org. + +- Debian/Ubuntu: the most recent `repmgr` packages are available from the + PostgreSQL Community APT repository ( http://apt.postgresql.org/ ). + Instructions can be found in the APT section of the PostgreSQL Wiki + ( https://wiki.postgresql.org/wiki/Apt ). + +See `PACKAGES.md` for details on building .deb and .rpm packages from the +`repmgr` source code. + + +### Source installation + +`repmgr` source code can be obtained directly from the project GitHub repository: + + git clone https://github.com/2ndQuadrant/repmgr + +Release tarballs are also available: -* Source installation - - `git clone https://github.com/2ndQuadrant/repmgr` - - Or download tar.gz files from https://github.com/2ndQuadrant/repmgr/releases - - To install from source, run `sudo make USE_PGXS=1 install` + http://repmgr.org/downloads.php -After installation, you should be able to run `repmgr --version` and -`repmgrd --version`. These binaries should be installed in the same -directory as other PostgreSQL binaries, such as `psql`. +`repmgr` is compiled in the same way as a PostgreSQL extension using the PGXS +infrastructure, e.g.: -Configuration -------------- + sudo make USE_PGXS=1 install -### Server configuration +`repmgr` can be built in any environment suitable for building PostgreSQL itself. -By default, `repmgr` uses PostgreSQL's built-in replication protocol to -clone a primary and create a standby server. If your configuration files -live outside your data directory, however, you will still need to set up -password-less SSH so that rsync can be used. 
See the `SSH-RSYNC.md` file -for details. + +### Configuration + +`repmgr` and `repmgrd` use a common configuration file, usually named +`repmgr.conf` (although any name can be used if explicitly specified). +At the very least, `repmgr.conf` must contain the connection parameters +for the local `repmgr` database. + +The configuration file will be looked for in the following locations: + +- a configuration file specified by the `-f/--config-file` command line option +- `repmgr.conf` in the local directory +- `/etc/repmgr.conf` +- the directory reported by `pg_config --sysconfdir` + +Note that if a file is explicitly specified with `-f/--config-file`, an error will +be raised if it is not found or not readable and no attempt will be made to check +default locations. + +For a full list of annotated configuration items, see the file `repmgr.conf.sample`. + +These parameters in the configuration file can be overridden with command line +options: + +- `-L/--log-level` +- `-b/--pg_bindir` + + +Setting up a simple replication cluster with repmgr +--------------------------------------------------- + +The following section will describe how to set up a basic replication cluster +with a master and a standby server using the `repmgr` command line tool. +It is assumed PostgreSQL is installed on both servers in the cluster, +`rsync` is available and password-less SSH connections are possible between +both servers. + + *TIP*: for testing `repmgr`, it's possible to use multiple PostgreSQL + instances running on different ports on the same computer, with + password-less SSH access to `localhost` enabled. ### PostgreSQL configuration -The primary server needs to be configured for replication with settings -like the following in `postgresql.conf`: +On the master server, a PostgreSQL instance must be initialised and running. +The following replication settings must be included in `postgresql.conf`: - # Allow read-only queries on standby servers. 
The number of WAL - # senders should be larger than the number of standby servers. + # Ensure WAL files contain enough information to enable read-only queries + # on the standby - hot_standby = on wal_level = 'hot_standby' + + # Enable up to 10 replication connections + max_wal_senders = 10 - # How much WAL to retain on the primary to allow a temporarily + # How much WAL to retain on the master to allow a temporarily # disconnected standby to catch up again. The larger this is, the # longer the standby can be disconnected. This is needed only in # 9.3; from 9.4, replication slots can be used instead (see below). wal_keep_segments = 5000 - # Enable archiving, but leave it unconfigured (so that it can be - # configured without a restart later). Recommended, not required. + # Enable read-only queries on a standby + # (Note: this will be ignored on a master but we recommend including + # it anyway) - archive_mode = on - archive_command = 'cd .' + hot_standby = on - # If you plan to use repmgrd, ensure that shared_preload_libraries - # is configured to load 'repmgr_funcs' - shared_preload_libraries = 'repmgr_funcs' - -PostgreSQL 9.4 makes it possible to use replication slots, which means -the value of `wal_keep_segments` need no longer be set. See section -"Replication slots" below for more details. - -With PostgreSQL 9.3, `repmgr` expects `wal_keep_segments` to be set to -at least 5000 (= 80GB of WAL) by default, though this can be overriden -with the `-w N` argument. - -A dedicated PostgreSQL superuser account and a database in which to -store monitoring and replication data are required. Create them by -running the following commands: +Create a dedicated PostgreSQL superuser account and a database for +the `repmgr` metadata, e.g. 
createuser -s repmgr createdb repmgr -O repmgr -We recommend using the name `repmgr` for both user and database, but you -can use whatever name you like (and you need to set the names you chose -in the `conninfo` string in `repmgr.conf`; see below). We also recommend -that you set the `repmgr` user's search path to include the `repmgr` schema -for convenience when querying the metadata tables and views. +For the examples in this document, the name `repmgr` will be used for both +user and database, but any names can be used. -The `repmgr` application will create its metadata schema in the `repmgr` -database when the master server is registered. +Ensure the `repmgr` user has appropriate permissions in `pg_hba.conf` and +can connect in replication mode; `pg_hba.conf` should contain entries +similar to the following: -### repmgr configuration + local replication repmgr trust + host replication repmgr 127.0.0.1/32 trust + host replication repmgr 192.168.1.0/24 trust -Create a `repmgr.conf` file on each server. Here's a minimal sample: + local repmgr repmgr trust + host repmgr repmgr 127.0.0.1/32 trust + host repmgr repmgr 192.168.1.0/24 trust + +Adjust according to your network environment and authentication requirements. + +On the standby, do not create a PostgreSQL instance, but do ensure an empty +directory is available for the `postgres` system user to create a data +directory. + + +### repmgr configuration file + +Create a `repmgr.conf` file on the master server. The file must contain at +least the following parameters: cluster=test node=1 node_name=node1 conninfo='host=repmgr_node1 user=repmgr dbname=repmgr' -The `cluster` name must be the same on all nodes. The `node` (an -integer) and `node_name` must be unique to each node.
+- `cluster`: an arbitrary name for the replication cluster; this must be identical + on all nodes +- `node`: a unique integer identifying the node +- `node_name`: a unique string identifying the node; we recommend a name + specific to the server (e.g. 'server_1'); avoid names indicating the + current replication role like 'master' or 'standby' as the server's + role could change. +- `conninfo`: a valid connection string for the `repmgr` database on the + *current* server. (On the standby, the database will not yet exist, but + `repmgr` needs to know the connection details to complete the setup + process). -The `conninfo` string must point to repmgr's database *on this node*. -The host must be an IP or a name that all the nodes in the cluster can -resolve (not `localhost`!). All nodes must use the same username and -database name, but other parameters, such as the port, can vary between -nodes. +`repmgr.conf` should not be stored inside the PostgreSQL data directory, +as it could be overwritten when setting up or reinitialising the PostgreSQL +server. See section `Configuration` above for further details about `repmgr.conf`. -Your `repmgr.conf` should not be stored inside the PostgreSQL data -directory. We recommend `/etc/repmgr/repmgr.conf`, but you can place it -anywhere and use the `-f /path/to/repmgr.conf` option to tell `repmgr` -where it is. If not specified, `repmgr` will search for `repmgr.conf` in -the current working directory. +`repmgr` will create a schema named after the cluster and prefixed with `repmgr_`, +e.g. `repmgr_test`; we also recommend that you set the `repmgr` user's search path +to include this schema name, e.g. 
-If your PostgreSQL binaries (`pg_ctl`, `pg_basebackup`) are not in your -`PATH`, you can specify an alternate location in `repmgr.conf`: + ALTER USER repmgr SET search_path TO repmgr_test, "$user", public; - pg_bindir=/path/to/postgres/bin +### Initialise the master server -See `repmgr.conf.sample` for an example configuration file with all -available configuration settings annotated. +To enable `repmgr` to support a replication cluster, the master node must +be registered with `repmgr`, which creates the `repmgr` metadata schema and adds +a metadata record for the server: -### Starting up + $ repmgr -f repmgr.conf master register + [2016-01-07 16:56:46] [NOTICE] master node correctly registered for cluster test with id 1 (conninfo: host=repmgr_node1 user=repmgr dbname=repmgr) -The master node must be registered first using `repmgr master register`, -and standby servers must be registered using `repmgr standby register`; -this inserts details about each node into the control database. Use -`repmgr cluster show` to see the result. +The metadata record looks like this: -See the `QUICKSTART.md` file for examples of how to use these commands. + repmgr=# SELECT * FROM repmgr_test.repl_nodes; + id | type | upstream_node_id | cluster | name | conninfo | slot_name | priority | active + ----+---------+------------------+---------+-------+---------------------------------------------+-----------+----------+-------- + 1 | master | | test | node1 | host=repmgr_node1 dbname=repmgr user=repmgr | | 100 | t + (1 row) -Failover --------- +Each server in the replication cluster will have its own record, which will be updated +when its status or role changes. -To promote a standby to master, on the standby execute e.g.: +### Clone the standby server - repmgr -f /etc/repmgr/repmgr.conf --verbose standby promote +Create a `repmgr.conf` file on the standby server.
It must contain at +least the same parameters as the master's `repmgr.conf`, but with +the values of `node`, `node_name` and `conninfo` adjusted accordingly, e.g.: -`repmgr` will attempt to connect to the current master to verify that it -is not available (if it is, `repmgr` will not promote the standby). + cluster=test + node=2 + node_name=node2 + conninfo='host=repmgr_node2 user=repmgr dbname=repmgr' -Other standby servers need to be told to follow the new master with e.g.: +Clone the standby with: - repmgr -f /etc/repmgr/repmgr.conf --verbose standby follow + $ repmgr -h repmgr_node1 -U repmgr -d repmgr -D /path/to/node2/data/ -f /etc/repmgr.conf standby clone + [2016-01-07 17:21:26] [NOTICE] destination directory '/path/to/node2/data/' provided + [2016-01-07 17:21:26] [NOTICE] starting backup... + [2016-01-07 17:21:26] [HINT] this may take some time; consider using the -c/--fast-checkpoint option + NOTICE: pg_stop_backup complete, all required WAL segments have been archived + [2016-01-07 17:21:28] [NOTICE] standby clone (using pg_basebackup) complete + [2016-01-07 17:21:28] [NOTICE] you can now start your PostgreSQL server + [2016-01-07 17:21:28] [HINT] for example : pg_ctl -D /path/to/node2/data/ start -See file `FAILOVER.rst` for details on setting up automated failover. +This will clone the PostgreSQL data directory files from the master using +PostgreSQL's pg_basebackup utility. A `recovery.conf` file containing the +correct parameters to start streaming from the master server will +be created automatically, and unless otherwise specified the `postgresql.conf` and `pg_hba.conf` +files will be copied. + +Make any adjustments to the configuration files now, then start the standby server. + +*NOTE*: `repmgr standby clone` does not require `repmgr.conf`, however we +recommend providing this as `repmgr` will set the `application_name` parameter +in `recovery.conf` to the value provided in `node_name`, making it easier to identify +the node in `pg_stat_replication`.
It's also possible to provide some advanced +options for controlling the standby cloning process; see next section for +details. + +### Verify replication is functioning + +Connect to the master server and execute: + + repmgr=# SELECT * FROM pg_stat_replication; + -[ RECORD 1 ]----+------------------------------ + pid | 7704 + usesysid | 16384 + usename | repmgr + application_name | node2 + client_addr | 192.168.1.2 + client_hostname | + client_port | 46196 + backend_start | 2016-01-07 17:32:58.322373+09 + backend_xmin | + state | streaming + sent_location | 0/3000220 + write_location | 0/3000220 + flush_location | 0/3000220 + replay_location | 0/3000220 + sync_priority | 0 + sync_state | async -Converting a failed master to a standby ---------------------------------------- +### Register the standby -Often it's desirable to bring a failed master back into replication -as a standby. First, ensure that the master's PostgreSQL server is -no longer running; then use `repmgr standby clone` to re-sync its -data directory with the current master, e.g.: +Register the standby server with: - repmgr -f /etc/repmgr/repmgr.conf \ - --force --rsync-only \ - -h node2 -d repmgr -U repmgr --verbose \ - standby clone + repmgr -f /etc/repmgr.conf standby register + [2016-01-08 11:13:16] [NOTICE] standby node correctly registered for cluster test with id 2 (conninfo: host=repmgr_node2 user=repmgr dbname=repmgr) -Here it's essential to use the command line options `--force`, to -ensure `repmgr` will re-use the existing data directory, and -`--rsync-only`, which causes `repmgr` to use `rsync` rather than -`pg_basebackup`, as the latter can only be used to clone a fresh -standby. +Connect to the standby servers' `repmgr` database and check the `repl_nodes` +table: -The node can then be restarted. 
+ repmgr=# SELECT * FROM repmgr_test.repl_nodes ORDER BY id; + id | type | upstream_node_id | cluster | name | conninfo | slot_name | priority | active + ----+---------+------------------+---------+-------+---------------------------------------------+-----------+----------+-------- + 1 | master | | test | node1 | host=repmgr_node1 dbname=repmgr user=repmgr | | 100 | t + 2 | standby | 1 | test | node2 | host=repmgr_node2 dbname=repmgr user=repmgr | | 100 | t + (2 rows) -The node will then need to be re-registered with `repmgr`; again -the `--force` option is required to update the existing record: - - repmgr -f /etc/repmgr/repmgr.conf \ - --force \ - standby register +The standby server now has a copy of records for all servers in the replication +cluster. Note that the relationship between master and standby is explicitly +defined via the `upstream_node_id` value, which shows here that the standby's +upstream server is the replication cluster master. While of limited use +in a simple 2-level master/standby replication cluster, this information is +required to effectively manage cascading replication (see below). +Advanced options for cloning a standby +-------------------------------------- -Replication management with repmgrd +The above section demonstrates the simplest possible way to clone +a standby server. Depending on your situation, finer-grained control +over the cloning process may be necessary. + +### pg_basebackup options when cloning a standby + +By default, `pg_basebackup` performs a checkpoint before beginning the +backup process. However, a normal checkpoint may take some time to complete; +a fast checkpoint can be forced with `repmgr`'s `-c/--fast-checkpoint` +option. This may impact performance of the server being cloned from +so should be used with care. + +Further options can be passed to the `pg_basebackup` utility via +the `pg_basebackup_options` in `repmgr.conf`. 
See the PostgreSQL +documentation for more details: + http://www.postgresql.org/docs/current/static/app-pgbasebackup.html + +### Using rsync to clone a standby + +By default `repmgr` uses the `pg_basebackup` utility to clone a standby's +data directory from the master. Under some circumstances it may be +desirable to use `rsync` to do this, such as when resyncing the data +directory of a failed server with an active replication node. + +To use `rsync` instead of `pg_basebackup`, provide the `-r/--rsync-only` +option when executing `repmgr standby clone`. + +Note that `repmgr` forces `rsync` to use `--checksum` mode to ensure that all +the required files are copied. This results in additional I/O on both source +and destination server as the contents of files existing on both servers need +to be compared, meaning this method is not necessarily faster than making a +fresh clone with `pg_basebackup`. + + +### Dealing with configuration files + +By default, `repmgr` will attempt to copy the standard configuration files +(`postgresql.conf`, `pg_hba.conf` and `pg_ident.conf`) even if they are located +outside of the data directory (though note currently they will be copied +into the standby's data directory). To prevent this happening, when executing +`repmgr standby clone` provide the `--ignore-external-config-files` option. + +If using `rsync` to clone a standby, additional control over which files +not to transfer is possible by configuring `rsync_options` in `repmgr.conf`, +which enables any valid `rsync` options to be passed to that command, e.g.: + + rsync_options='--exclude=postgresql.local.conf' + + +Setting up cascading replication with repmgr +-------------------------------------------- + +Cascading replication, introduced with PostgreSQL 9.2, enables a standby server +to replicate from another standby server rather than directly from the master, +meaning replication changes "cascade" down through a hierarchy of servers. 
This +can be used to reduce load on the master and minimize bandwidth usage between +sites. + +`repmgr` supports cascading replication. When cloning a standby, in `repmgr.conf` +set the parameter `upstream_node` to the id of the server the standby +should connect to, and `repmgr` will perform the clone using this server +and create `recovery.conf` to point to it. Note that if `upstream_node` +is not explicitly provided, `repmgr` will use the master as the server +to clone from. + +To demonstrate cascading replication, ensure you have a master and standby +set up as shown above in the section "Setting up a simple replication cluster +with repmgr". Create an additional standby server with `repmgr.conf` looking +like this: + + cluster=test + node=3 + node_name=node3 + conninfo='host=repmgr_node3 user=repmgr dbname=repmgr' + upstream_node=2 + +Ensure `upstream_node` contains the `node` id of the previously +created standby. Clone this standby (using the connection parameters +for the existing standby) and register it: + + $ repmgr -h repmgr_node2 -U repmgr -d repmgr -D /path/to/node3/data/ -f /etc/repmgr.conf standby clone + [2016-01-08 13:44:52] [NOTICE] destination directory 'node_3/data/' provided + [2016-01-08 13:44:52] [NOTICE] starting backup (using pg_basebackup)...
+ [2016-01-08 13:44:52] [HINT] this may take some time; consider using the -c/--fast-checkpoint option + [2016-01-08 13:44:52] [NOTICE] standby clone (using pg_basebackup) complete + [2016-01-08 13:44:52] [NOTICE] you can now start your PostgreSQL server + [2016-01-08 13:44:52] [HINT] for example : pg_ctl -D /path/to/node_3/data start + + $ repmgr -f node_3/repmgr.conf standby register + [2016-01-08 14:04:32] [NOTICE] standby node correctly registered for cluster test with id 3 (conninfo: host=repmgr_node3 dbname=repmgr user=repmgr) + +After starting the standby, the `repl_nodes` table will look like this: + + repmgr=# SELECT * FROM repmgr_test.repl_nodes ORDER BY id; + id | type | upstream_node_id | cluster | name | conninfo | slot_name | priority | active + ----+---------+------------------+---------+-------+---------------------------------------------+-----------+----------+-------- + 1 | master | | test | node1 | host=repmgr_node1 dbname=repmgr user=repmgr | | 100 | t + 2 | standby | 1 | test | node2 | host=repmgr_node2 dbname=repmgr user=repmgr | | 100 | t + 3 | standby | 2 | test | node3 | host=repmgr_node3 dbname=repmgr user=repmgr | | 100 | t + (3 rows) + + +Using replication slots with repmgr ----------------------------------- +Replication slots were introduced with PostgreSQL 9.4 and are designed to ensure +that any standby connected to the master using a replication slot will always +be able to retrieve the required WAL files. This removes the need to manually +manage WAL file retention by estimating the number of WAL files that need to +be maintained on the master using `wal_keep_segments`. Do however be aware +that if a standby is disconnected, WAL will continue to accumulate on the master +until either the standby reconnects or the replication slot is dropped.
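To make the `wal_keep_segments` estimate mentioned above concrete, here is a minimal sketch of the arithmetic. The helper name `wal_retention_mb` is hypothetical, and the calculation assumes the default WAL segment size of 16 MB; a server built with a different segment size would need that value passed as the second argument.

```shell
# Hypothetical helper: estimate the disk space (in MB) that a given
# wal_keep_segments setting can retain on the master.
# Assumption: WAL segments are 16 MB each (the PostgreSQL default).
wal_retention_mb() {
    segments=$1
    segment_mb=${2:-16}
    echo $(( segments * segment_mb ))
}

wal_retention_mb 5000   # prints 80000, i.e. roughly 80 GB
```

With replication slots this estimate is no longer an upper bound: a disconnected standby's slot can retain more WAL than any fixed segment count.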
+ +To enable `repmgr` to use replication slots, set the boolean parameter +`use_replication_slots` in `repmgr.conf`: + + use_replication_slots=1 + +Note that `repmgr` will fail with an error if this option is specified when +working with PostgreSQL 9.3. + +When cloning a standby, `repmgr` will automatically generate an appropriate +slot name, which is stored in the `repl_nodes` table, and create the slot +on the master: + + +Be aware that when initially cloning a standby, you will need to ensure +that all required WAL files remain available while the cloning is taking +place. If using the default `pg_basebackup` method, we recommend setting +`pg_basebackup`'s `--xlog-method` parameter to `stream` like this: + + pg_basebackup_options='--xlog-method=stream' + +See the `pg_basebackup` documentation for details: + http://www.postgresql.org/docs/current/static/app-pgbasebackup.html + +Otherwise it's necessary to set `wal_keep_segments` to an appropriately high +value. + +Further information on replication slots in the PostgreSQL documentation: + http://www.postgresql.org/docs/current/interactive/warm-standby.html#STREAMING-REPLICATION-SLOTS + + +Promoting a standby server with repmgr +-------------------------------------- + +If a master server fails or needs to be removed from the replication cluster, +a new master server must be designated to ensure the cluster continues +working correctly. 
This can be done with `repmgr standby promote`, which promotes +the standby on the current server to master. + +To demonstrate this, set up a replication cluster with a master and two attached +standby servers so that the `repl_nodes` table looks like this: + + repmgr=# SELECT * FROM repmgr_test.repl_nodes ORDER BY id; + id | type | upstream_node_id | cluster | name | conninfo | slot_name | priority | active + ----+---------+------------------+---------+-------+---------------------------------------------+-----------+----------+-------- + 1 | master | | test | node1 | host=repmgr_node1 dbname=repmgr user=repmgr | | 100 | t + 2 | standby | 1 | test | node2 | host=repmgr_node2 dbname=repmgr user=repmgr | | 100 | t + 3 | standby | 1 | test | node3 | host=repmgr_node3 dbname=repmgr user=repmgr | | 100 | t + (3 rows) + +Stop the current master with e.g.: + + $ pg_ctl -D /path/to/node_1/data -m fast stop + +At this point the replication cluster will be in a partially disabled state with +both standbys accepting read-only connections while attempting to connect to the +stopped master. Note that the `repl_nodes` table will not yet have been updated +and will still show the master as active. + +Promote the first standby with: + + $ repmgr -f /etc/repmgr.conf standby promote + +This will produce output similar to the following: + + [2016-01-08 16:07:31] [ERROR] connection to database failed: could not connect to server: Connection refused + Is the server running on host "repmgr_node1" (192.161.2.1) and accepting + TCP/IP connections on port 5432? + could not connect to server: Connection refused + Is the server running on host "repmgr_node1" (192.161.2.1) and accepting + TCP/IP connections on port 5432?
+ + [2016-01-08 16:07:31] [NOTICE] promoting standby + [2016-01-08 16:07:31] [NOTICE] promoting server using '/usr/bin/postgres/pg_ctl -D /path/to/node_2/data promote' + server promoting + [2016-01-08 16:07:33] [NOTICE] STANDBY PROMOTE successful + +Note: the first `[ERROR]` is `repmgr` attempting to connect to the current +master to verify that it has failed. If a valid master is found, `repmgr` +will refuse to promote a standby. + +The `repl_nodes` table will now look like this: + + id | type | upstream_node_id | cluster | name | conninfo | slot_name | priority | active + ----+---------+------------------+---------+-------+---------------------------------------------+-----------+----------+-------- + 1 | master | | test | node1 | host=repmgr_node1 dbname=repmgr user=repmgr | | 100 | f + 2 | master | | test | node2 | host=repmgr_node2 dbname=repmgr user=repmgr | | 100 | t + 3 | standby | 1 | test | node3 | host=repmgr_node3 dbname=repmgr user=repmgr | | 100 | t + (3 rows) + +The previous master has been marked as inactive, and `node2`'s `upstream_node_id` +has been cleared as it's now the "topmost" server in the replication cluster. + +However the sole remaining standby is still trying to replicate from the failed +master; `repmgr standby follow` must now be executed to rectify this situation. + + +Following a new master server with repmgr +----------------------------------------- + +Following the failure or removal of the replication cluster's existing master +server, `repmgr standby follow` can be used to make 'orphaned' standbys +follow the new master. 
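As a rough illustration of what "orphaned" means in terms of the `repl_nodes` metadata, the sketch below lists active standbys whose upstream node is marked inactive. The function name `orphaned_standbys` and the four-column input format are assumptions made for this example, not part of `repmgr`; the input is expected as unaligned, pipe-separated rows of `id|type|upstream_node_id|active`.

```shell
# Hypothetical helper: read "id|type|upstream_node_id|active" rows on stdin
# and print the id of every active standby whose upstream node is inactive.
orphaned_standbys() {
    awk -F'|' '
        # First remember every row, keyed by node id.
        { type[$1] = $2; upstream[$1] = $3; active[$1] = $4; order[NR] = $1 }
        END {
            for (i = 1; i <= NR; i++) {
                id = order[i]
                if (type[id] == "standby" && active[id] == "t" && active[upstream[id]] == "f")
                    print id
            }
        }'
}

# State after node2 has been promoted but before node3 follows it:
printf '1|master||f\n2|master||t\n3|standby|1|t\n' | orphaned_standbys   # prints 3
```

In practice the rows could come from something like `psql -At -c "SELECT id, type, upstream_node_id, active FROM repmgr_test.repl_nodes" repmgr`, with the schema name adjusted to the cluster.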
+ +To demonstrate this, assuming a replication cluster in the same state as the +end of the preceding section ("Promoting a standby server with repmgr"), +execute this: + + $ repmgr -f /etc/repmgr.conf -D /path/to/node_3/data/ -h repmgr_node2 -U repmgr -d repmgr standby follow + [2016-01-08 16:57:06] [NOTICE] restarting server using '/usr/bin/postgres/pg_ctl -D /path/to/node_3/data/ -w -m fast restart' + waiting for server to shut down.... done + server stopped + waiting for server to start.... done + server started + +The standby is now replicating from the new master and `repl_nodes` has been +updated to reflect this: + + id | type | upstream_node_id | cluster | name | conninfo | slot_name | priority | active + ----+---------+------------------+---------+-------+---------------------------------------------+-----------+----------+-------- + 1 | master | | test | node1 | host=repmgr_node1 dbname=repmgr user=repmgr | | 100 | f + 2 | master | | test | node2 | host=repmgr_node2 dbname=repmgr user=repmgr | | 100 | t + 3 | standby | 2 | test | node3 | host=repmgr_node3 dbname=repmgr user=repmgr | | 100 | t + (3 rows) + + +Note that `repmgr standby follow` can also be used to detach a standby from its +current upstream server and follow another upstream server, including the +master. + + +Performing a switchover with repmgr +----------------------------------- + +A typical use-case for replication is a combination of master and standby +server, with the standby serving as a backup which can easily be activated +in case of a problem with the master. Such an unplanned failover would +normally be handled by promoting the standby, after which appropriate action +is taken to restore the old master. + +In some cases however it's desirable to promote the standby in a planned +way, e.g. so maintenance can be performed on the master; this kind of switchover +is supported by the `repmgr standby switchover` command.
+ +`repmgr standby switchover` differs from other `repmgr` actions in that it +also performs actions on another server, for which reason both passwordless +SSH access and the path of `repmgr.conf` on that server are required. + + *NOTE* `repmgr standby switchover` performs a relatively complex series + of operations on two servers, and should therefore be performed after + careful preparation and with adequate attention. In particular you should + be confident that your network environment is stable and reliable. + + We recommend running `repmgr standby switchover` at the most verbose + logging level (`--log-level DEBUG --verbose`) and capturing all output + to assist troubleshooting any problems. + + Please read carefully the list of caveats below. + +To demonstrate switchover, we will assume a replication cluster running on +PostgreSQL 9.5 or later with a master (`node1`) and a standby (`node2`); +after the switchover `node2` should become the master with `node1` following it. + +The switchover command must be run from the standby which is to be promoted, +and in its simplest form looks like this: + + repmgr -f /etc/repmgr.conf -C /etc/repmgr.conf standby switchover + +`-f /etc/repmgr.conf` is, as usual, the local repmgr node's configuration file. +`-C /etc/repmgr.conf` is the path to the configuration file on the current +master, which is required to execute `repmgr` remotely on that server; +if it is not provided with `-C`, `repmgr` will check the same path as on the +local server, as well as the normal default locations. `repmgr` will check +this file can be found before performing any further actions. + + $ repmgr -f /etc/repmgr.conf -C /etc/repmgr.conf standby switchover -v + [2016-01-27 16:38:33] [NOTICE] using configuration file "/etc/repmgr.conf" + [2016-01-27 16:38:33] [NOTICE] switching current node 2 to master server and demoting current master to standby...
+ [2016-01-27 16:38:34] [NOTICE] 5 files copied to /tmp/repmgr-node1-archive + [2016-01-27 16:38:34] [NOTICE] connection to database failed: FATAL: the database system is shutting down + + [2016-01-27 16:38:34] [NOTICE] current master has been stopped + [2016-01-27 16:38:34] [ERROR] connection to database failed: FATAL: the database system is shutting down + + [2016-01-27 16:38:34] [NOTICE] promoting standby + [2016-01-27 16:38:34] [NOTICE] promoting server using '/usr/local/bin/pg_ctl -D /var/lib/postgresql/9.5/node_2/data promote' + server promoting + [2016-01-27 16:38:36] [NOTICE] STANDBY PROMOTE successful + [2016-01-27 16:38:36] [NOTICE] Executing pg_rewind on old master server + [2016-01-27 16:38:36] [NOTICE] 5 files copied to /var/lib/postgresql/9.5/data + [2016-01-27 16:38:36] [NOTICE] restarting server using '/usr/local/bin/pg_ctl -w -D /var/lib/postgresql/9.5/node_1/data -m fast restart' + pg_ctl: PID file "/var/lib/postgresql/9.5/node_1/data/postmaster.pid" does not exist + Is server running? + starting server anyway + [2016-01-27 16:38:37] [NOTICE] node 1 is replicating in state "streaming" + [2016-01-27 16:38:37] [NOTICE] switchover was successful + +Messages containing the line `connection to database failed: FATAL: the database +system is shutting down` are not errors - `repmgr` is polling the old master database +to make sure it has shut down correctly. `repmgr` will also archive any +configuration files in the old master's data directory as they will otherwise +be overwritten by `pg_rewind`; they are restored once the `pg_rewind` operation +has completed. 
+ +The old master is now replicating as a standby from the new master and `repl_nodes` +should have been updated to reflect this: + + repmgr=# SELECT * from repl_nodes ORDER BY id; + id | type | upstream_node_id | cluster | name | conninfo | slot_name | priority | active + ----+---------+------------------+---------+-------+------------------------------------------+-----------+----------+-------- + 1 | standby | 2 | test | node1 | host=localhost dbname=repmgr user=repmgr | | 100 | t + 2 | master | | test | node2 | host=localhost dbname=repmgr user=repmgr | | 100 | t + (2 rows) + + +### Caveats + +- the functionality provided by `repmgr standby switchover` is primarily aimed + at a two-server master/standby replication cluster and currently does + not support additional standbys. +- `repmgr standby switchover` is designed to use the `pg_rewind` utility, + standard in 9.5 and later and available separately in 9.3 and 9.4 + (see note below). +- `pg_rewind` *requires* that either `wal_log_hints` is enabled, or that + data checksums were enabled when the cluster was initialized. See the + `pg_rewind` documentation for details: + http://www.postgresql.org/docs/current/static/app-pgrewind.html +- `repmgrd` should not be running when a switchover is carried out, otherwise + `repmgrd` may try to promote a standby by itself. +- Any other standbys attached to the old master will need to be manually + instructed to point to the new master (e.g. with `repmgr standby follow`). + +We hope to remove some of these restrictions in future versions of `repmgr`. + + +### Switchover and PostgreSQL 9.3/9.4 + +In order to efficiently reintegrate a demoted master into the replication +cluster as a standby, it's necessary to resynchronise its data directory +with that of the current master, as it's very likely that their timelines +will have diverged slightly following the shutdown of the old master.
+ +The utility `pg_rewind` provides an efficient way of doing this, however +it is not included in the core PostgreSQL distribution for versions 9.3 and 9.4. +However, `pg_rewind` is available separately for these versions and we +strongly recommend its installation. To use it with versions 9.3 and 9.4, +provide the command line option `--pg_rewind`, optionally with the +path to the `pg_rewind` binary location if not installed in the PostgreSQL +`bin` directory. + +`pg_rewind` for versions 9.3 and 9.4 can be obtained from: + https://github.com/vmware/pg_rewind + +If `pg_rewind` is not available, as a fallback `repmgr` will use `repmgr +standby clone` to resynchronise the old master's data directory using +`rsync`. However, in order to ensure all files are synchronised, the +entire data directory on both servers must be scanned, a process which +can take some time on larger databases, in which case you should +consider making a fresh standby clone. + + +Unregistering a standby from a replication cluster +-------------------------------------------------- + +To unregister a running standby, execute: + + repmgr standby unregister -f /etc/repmgr.conf + +This will remove the standby record from `repmgr`'s internal metadata +table (`repl_nodes`). A `standby_unregister` event notification will be +recorded in the `repl_events` table. + +Note that this command will not stop the server itself or remove +it from the replication cluster. + +If the standby is not running, the standby record must be manually +removed from the `repl_nodes` table with e.g.: + + DELETE FROM repmgr_test.repl_nodes WHERE id = 3; + +Adjust schema and node ID accordingly. A future `repmgr` release +will make it possible to unregister failed standbys.
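Adjusting the schema and node ID by hand is easy to get wrong, so the following sketch builds the `DELETE` statement from a cluster name and node ID. The function `unregister_sql` is a hypothetical helper, not a `repmgr` command; it relies only on the convention that the metadata schema is the cluster name prefixed with `repmgr_`, and it does not validate the cluster name.

```shell
# Hypothetical helper: print the DELETE statement that removes a failed
# standby's record, given the cluster name and the node id.
unregister_sql() {
    cluster=$1
    node_id=$2
    # Refuse anything that is not a plain non-negative integer.
    case $node_id in
        ''|*[!0-9]*) echo "node id must be an integer" >&2; return 1 ;;
    esac
    printf 'DELETE FROM repmgr_%s.repl_nodes WHERE id = %s;\n' "$cluster" "$node_id"
}

unregister_sql test 3   # prints: DELETE FROM repmgr_test.repl_nodes WHERE id = 3;
```

The printed statement can be reviewed before being run against the `repmgr` database on the master.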
+ + +Automatic failover with repmgrd +------------------------------- + `repmgrd` is a management and monitoring daemon which runs on standby nodes and which can automate actions such as failover and updating standbys to -follow the new master.`repmgrd` can be started simply with e.g.: +follow the new master. - repmgrd -f /etc/repmgr/repmgr.conf --verbose > $HOME/repmgr/repmgr.log 2>&1 +To use `repmgrd` for automatic failover, the following `repmgrd` options must +be set in `repmgr.conf`: -or alternatively: + failover=automatic + promote_command='repmgr standby promote -f /etc/repmgr/repmgr.conf' + follow_command='repmgr standby follow -f /etc/repmgr/repmgr.conf' - repmgrd -f /etc/repmgr/repmgr.conf --verbose --monitoring-history > $HOME/repmgr/repmgrd.log 2>&1 +(See `repmgr.conf.sample` for further `repmgrd`-specific settings). -which will track replication advance or lag on all registered standbys. +When `failover` is set to `automatic`, upon detecting failure of the current +master, `repmgrd` will execute one of `promote_command` or `follow_command`, +depending on whether the current server is becoming the new master or +needs to follow another server which has become the new master. Note that +these commands can be any valid shell script which results in one of these +actions happening, but we strongly recommend executing `repmgr` directly. + +`repmgrd` can be started simply with e.g.: + + repmgrd -f /etc/repmgr.conf --verbose > $HOME/repmgr/repmgr.log 2>&1 For permanent operation, we recommend using the options `-d/--daemonize` to detach the `repmgrd` process, and `-p/--pid-file` to write the process PID to a file. -Example log output (at default log level): +Note that currently `repmgrd` is not required to run on the master server. 
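Before starting `repmgrd`, it can be useful to confirm that the three failover-related settings named above are present in `repmgr.conf`. The sketch below is one way to do that; `missing_failover_settings` is a hypothetical helper, and it only checks that each key is assigned, not that `failover` is actually set to `automatic` or that the commands are valid.

```shell
# Hypothetical helper: print each failover-related setting that is not
# assigned anywhere in the given repmgr.conf file.
missing_failover_settings() {
    conf=$1
    for key in failover promote_command follow_command; do
        # A setting counts as present if a line starts with "key =" or "key=".
        grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$conf" || echo "$key"
    done
}

# Usage (path is an example): missing_failover_settings /etc/repmgr.conf
```

An empty result means all three keys are assigned; any key printed still needs to be added.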
- [2015-03-11 13:15:40] [INFO] checking cluster configuration with schema 'repmgr_test' - [2015-03-11 13:15:40] [INFO] checking node 2 in cluster 'test' - [2015-03-11 13:15:40] [INFO] reloading configuration file and updating repmgr tables - [2015-03-11 13:15:40] [INFO] starting continuous standby node monitoring +To demonstrate automatic failover, set up a 3-node replication cluster (one master +and two standbys streaming directly from the master) so that the `repl_nodes` +table looks like this: + + repmgr=# SELECT * FROM repmgr_test.repl_nodes ORDER BY id; + id | type | upstream_node_id | cluster | name | conninfo | slot_name | priority | active + ----+---------+------------------+---------+-------+---------------------------------------------+-----------+----------+-------- + 1 | master | | test | node1 | host=repmgr_node1 dbname=repmgr user=repmgr | | 100 | t + 2 | standby | 1 | test | node2 | host=repmgr_node2 dbname=repmgr user=repmgr | | 100 | t + 3 | standby | 1 | test | node3 | host=repmgr_node3 dbname=repmgr user=repmgr | | 100 | t + (3 rows) + + +Start `repmgrd` on each standby and verify that it's running by examining +the log output, which at default log level will look like this: + + [2016-01-05 13:15:40] [INFO] checking cluster configuration with schema 'repmgr_test' + [2016-01-05 13:15:40] [INFO] checking node 2 in cluster 'test' + [2016-01-05 13:15:40] [INFO] reloading configuration file and updating repmgr tables + [2016-01-05 13:15:40] [INFO] starting continuous standby node monitoring + +Each `repmgrd` should also have noted its successful startup in the `repl_events` +table: + + repmgr=# SELECT * FROM repl_events WHERE event = 'repmgrd_start'; + node_id | event | successful | event_timestamp | details + ---------+---------------+------------+-------------------------------+--------- + 2 | repmgrd_start | t | 2016-01-27 18:22:38.080231+09 | + 3 | repmgrd_start | t | 2016-01-27 18:22:38.08756+09 | + (2 rows) + +Now stop the current master server 
with e.g.: + + pg_ctl -D /path/to/node1/data -m immediate stop + +This will force the master node to shut down straight away, aborting all +processes and transactions. This will cause a flurry of activity in +the `repmgrd` log files as each `repmgrd` detects the failure of the master +and a failover decision is made. Here are extracts from the standby server +promoted to new master: + + [2016-01-06 18:32:58] [WARNING] connection to upstream has been lost, trying to recover... 15 seconds before failover decision + [2016-01-06 18:33:03] [WARNING] connection to upstream has been lost, trying to recover... 10 seconds before failover decision + [2016-01-06 18:33:08] [WARNING] connection to upstream has been lost, trying to recover... 5 seconds before failover decision + ... + [2016-01-06 18:33:18] [NOTICE] this node is the best candidate to be the new master, promoting... + ... + [2016-01-06 18:33:20] [NOTICE] STANDBY PROMOTE successful + +and here from the standby server which is now following the new master: + + [2016-01-06 18:32:58] [WARNING] connection to upstream has been lost, trying to recover... 15 seconds before failover decision + [2016-01-06 18:33:03] [WARNING] connection to upstream has been lost, trying to recover... 10 seconds before failover decision + [2016-01-06 18:33:08] [WARNING] connection to upstream has been lost, trying to recover... 5 seconds before failover decision + ... + [2016-01-06 18:33:23] [NOTICE] node 2 is the best candidate for new master, attempting to follow... + [2016-01-06 18:33:23] [INFO] changing standby's master + ...
+ [2016-01-06 18:33:25] [NOTICE] node 3 now following new upstream node 2 + +The `repl_nodes` table should have been updated to reflect the new situation, +with the original master (`node1`) marked as inactive, and standby `node3` +now following the new master (`node2`): + + repmgr=# SELECT * from repl_nodes ORDER BY id; + id | type | upstream_node_id | cluster | name | conninfo | slot_name | priority | active + ----+---------+------------------+---------+-------+------------------------------------------+-----------+----------+-------- + 1 | master | | test | node1 | host=localhost dbname=repmgr user=repmgr | | 100 | f + 2 | master | | test | node2 | host=localhost dbname=repmgr user=repmgr | | 100 | t + 3 | standby | 2 | test | node3 | host=localhost dbname=repmgr user=repmgr | | 100 | t + (3 rows) + +The `repl_events` table will contain a summary of what happened to each server +during the failover: + + repmgr=# SELECT * from repmgr_test.repl_events where event_timestamp>='2016-01-06 18:30'; + node_id | event | successful | event_timestamp | details + ---------+--------------------------+------------+-------------------------------+---------------------------------------------------------- + 2 | standby_promote | t | 2016-01-06 18:33:20.061736+09 | node 2 was successfully promoted to master + 2 | repmgrd_failover_promote | t | 2016-01-06 18:33:20.067132+09 | node 2 promoted to master; old master 1 marked as failed + 3 | repmgrd_failover_follow | t | 2016-01-06 18:33:25.331012+09 | node 3 now following new upstream node 2 + (3 rows) + + +repmgrd log rotation +-------------------- Note that currently `repmgrd` does not provide logfile rotation. To ensure the current logfile does not grow indefinitely, configure your system's `logrotate` to do this. 
Sample configuration to rotate logfiles weekly with retention for up to 52 weeks and rotation forced if a file grows beyond 100Mb: - /var/log/postgresql/repmgr-9.4.log { + /var/log/postgresql/repmgr-9.5.log { missingok compress rotate 52 @@ -281,9 +947,39 @@ for up to 52 weeks and rotation forced if a file grows beyond 100Mb: create 0600 postgres postgres } +Monitoring +---------- -Witness server --------------- +When `repmgrd` is running with the option `-m/--monitoring-history`, it will +constantly write node status information to the `repl_monitor` table, which can +be queried easily using the view `repl_status`: + + repmgr=# SELECT * FROM repmgr_test.repl_status; + -[ RECORD 1 ]-------------+----------------------------- + primary_node | 1 + standby_node | 2 + standby_name | node2 + node_type | standby + active | t + last_monitor_time | 2016-01-05 14:02:34.51713+09 + last_wal_primary_location | 0/3012AF0 + last_wal_standby_location | 0/3012AF0 + replication_lag | 0 bytes + replication_time_lag | 00:00:03.463085 + apply_lag | 0 bytes + communication_time_lag | 00:00:00.955385 + +The interval in which monitoring history is written is controlled by the +configuration parameter `monitor_interval_secs`; default is 2. + +As this can generate a large amount of monitoring data in the `repl_monitor` +table, it's advisable to regularly purge historical data with +`repmgr cluster cleanup`; use the `-k/--keep-history` option to specify how +many days' worth of data should be retained. + + +Using a witness server with repmgrd +------------------------------------ In a situation caused e.g. by a network interruption between two data centres, it's important to avoid a "split-brain" situation where @@ -306,83 +1002,9 @@ makes sense to create a witness server in conjunction with running `repmgrd`; the witness server will require its own `repmgrd` instance.
-Monitoring ----------- -When `repmgrd` is running with the option `-m/--monitoring-history`, it will -constantly write node status information to the `repl_monitor` table, which can -be queried easily using the view `repl_status`: - - repmgr=# SELECT * FROM repmgr_test.repl_status; - -[ RECORD 1 ]-------------+----------------------------- - primary_node | 1 - standby_node | 2 - standby_name | node2 - node_type | standby - active | t - last_monitor_time | 2015-03-11 14:02:34.51713+09 - last_wal_primary_location | 0/3012AF0 - last_wal_standby_location | 0/3012AF0 - replication_lag | 0 bytes - replication_time_lag | 00:00:03.463085 - apply_lag | 0 bytes - communication_time_lag | 00:00:00.955385 - - -Event logging and notifications -------------------------------- - -To help understand what significant events (e.g. failure of a node) happened -when and for what reason, `repmgr` logs such events into the `repl_events` -table, e.g.: - - repmgr_db=# SELECT * from repmgr_test.repl_events ; - node_id | event | successful | event_timestamp | details - ---------+------------------+------------+-------------------------------+----------------------------------------------------------------------------------- - 1 | master_register | t | 2015-03-16 17:36:21.711796+09 | - 2 | standby_clone | t | 2015-03-16 17:36:31.286934+09 | Cloned from host 'localhost', port 5500; backup method: pg_basebackup; --force: N - 2 | standby_register | t | 2015-03-16 17:36:32.391567+09 | - (3 rows) - - -Additionally `repmgr` can execute an external program each time an event is -logged. 
This program is defined with the configuration variable -`event_notification_command`; the command string can contain the following -placeholders, which will be replaced with the same content which is -written to the `repl_events` table: - - %n - node id - %e - event type - %s - success (1 or 0) - %t - timestamp - %d - description - -Example: - - event_notification_command=/path/to/some-script %n %e %s "%t" "%d" - -By default the program defined with `event_notification_command` will be -executed for every event; to restrict execution to certain events, list -these in the parameter `event_notifications` - - event_notifications=master_register,standby_register - -Following event types currently exist: - - master_register - standby_register - standby_unregister - standby_clone - standby_promote - witness_create - repmgrd_start - repmgrd_monitor - repmgrd_failover_promote - repmgrd_failover_follow - - -Cascading replication ---------------------- +repmgrd and cascading replication +--------------------------------- Cascading replication - where a standby can connect to an upstream node and not the master server itself - was introduced in PostgreSQL 9.2. `repmgr` and @@ -396,79 +1018,113 @@ and continue working as normal (even if the upstream standby it's connected to becomes the master node). If however the node's direct upstream fails, the "cascaded standby" will attempt to reconnect to that node's parent. 
-To configure standby servers for cascading replication, add the parameter -`upstream_node` to `repmgr.conf` and set it to the id of the node it should -connect to, e.g.: - cluster=test - node=2 - node_name=node2 - upstream_node=1 +Generating event notifications with repmgr/repmgrd +-------------------------------------------------- -Replication slots ------------------ +Each time a significant event occurs in `repmgr` or `repmgrd`, a record +of that event is written into the `repl_events` table together with +a timestamp, an indication of failure or success, and further details +if appropriate. This is useful for gaining an overview of events +affecting the replication cluster. Note, however, that this table is +advisory in nature and should be used in combination with the `repmgr` +and PostgreSQL logs to obtain details of any events. -Replication slots were introduced with PostgreSQL 9.4 and enable standbys to -notify the master of their WAL consumption, ensuring that the master will -not remove any WAL files until they have been received by all standbys. -This mitigates the requirement to manage WAL file retention using -`wal_keep_segments` etc., with the caveat that if a standby fails, no WAL -files will be removed until the standby's replication slot is deleted.
+Example output after a master was registered and a standby cloned +and registered: -To enable replication slots, set the boolean parameter `use_replication_slots` -in `repmgr.conf`: + repmgr=# SELECT * from repmgr_test.repl_events ; + node_id | event | successful | event_timestamp | details + ---------+------------------+------------+-------------------------------+------------------------------------------------------------------------------------- + 1 | master_register | t | 2016-01-08 15:04:39.781733+09 | + 2 | standby_clone | t | 2016-01-08 15:04:49.530001+09 | Cloned from host 'repmgr_node1', port 5432; backup method: pg_basebackup; --force: N + 2 | standby_register | t | 2016-01-08 15:04:50.621292+09 | + (3 rows) - use_replication_slots=1 +Additionally, event notifications can be passed to a user-defined program +or script which can take further action, e.g. send email notifications. +This is done by setting the `event_notification_command` parameter in +`repmgr.conf`. -`repmgr` will automatically generate an appropriate slot name, which is -stored in the `repl_nodes` table. +This parameter accepts the following format placeholders: -Note that `repmgr` will fail with an error if this option is specified when -working with PostgreSQL 9.3. + %n - node ID + %e - event type + %s - success (1 or 0) + %t - timestamp + %d - details -Be aware that when initially cloning a standby, you will need to ensure -that all required WAL files remain available while the cloning is taking -place. If using the default `pg_basebackup` method, we recommend setting -`pg_basebackup`'s `--xlog-method` parameter to `stream` like this: +The values provided for "%t" and "%d" will probably contain spaces, +so they should be quoted in the provided command configuration, e.g.: - pg_basebackup_options='--xlog-method=stream' + event_notification_command='/path/to/some/script %n %e %s "%t" "%d"' -See the `pg_basebackup` documentation [*] for details.
Otherwise you'll need -to set `wal_keep_segments` to an appropriately high value. +By default, all notifications will be passed; the notification types +can be filtered to explicitly named ones: -[*] http://www.postgresql.org/docs/current/static/app-pgbasebackup.html + event_notifications=master_register,standby_register,witness_create -Further reading: - * http://www.postgresql.org/docs/current/interactive/warm-standby.html#STREAMING-REPLICATION-SLOTS - * http://blog.2ndquadrant.com/postgresql-9-4-slots/ +The following event types are available: -Upgrading from repmgr 2 ------------------------ + * master_register + * standby_register + * standby_unregister + * standby_clone + * standby_promote + * standby_follow + * standby_switchover + * witness_create + * repmgrd_start + * repmgrd_shutdown + * repmgrd_failover_promote + * repmgrd_failover_follow -`repmgr 3` is largely compatible with `repmgr 2`; the only step required -to upgrade is to update the `repl_nodes` table to the definition needed -by `repmgr 3`. See the file `sql/repmgr2_repmgr3.sql` for details on how -to do this. +Note that under some circumstances (e.g. no replication cluster master could +be located), it will not be possible to write an entry into the `repl_events` +table, in which case `event_notification_command` can serve as a fallback. -`repmgrd` must *not* be running while `repl_nodes` is being updated. +Upgrading repmgr +---------------- -Existing `repmgr.conf` files can be retained as-is. +`repmgr` is updated regularly with point releases (e.g. 3.0.2 to 3.0.3) +containing bugfixes and other minor improvements. Any substantial new +functionality will be included in a feature release (e.g. 3.0.x to 3.1.x). ---------------------------------------- +In general `repmgr` can be upgraded as-is without any further action required; +however, feature releases may require the `repmgr` database to be upgraded.
+An SQL script will be provided - please check the release notes for details. Reference --------- -### repmgr command reference +### Default values -Not all of these commands need the ``repmgr.conf`` file, but they need to be able to -connect to the remote and local databases. +For some command line and most configuration file parameters, `repmgr` falls +back to default values if values for these are not explicitly provided. + +The file `repmgr.conf.sample` documents the default value of configuration +parameters if one is set. Of particular note is the log level, which +defaults to NOTICE; particularly when using repmgr from the command line +it may be useful to set this to a more verbose level with `-L/--log-level`, +e.g. to `INFO`. + +Execute `repmgr --help` to see the default values for various command +line parameters, particularly database connection parameters. + +See the section `Configuration` above for information on how the +configuration file is located if `-f/--config-file` is not supplied. + +### repmgr commands + +The `repmgr` command line tool accepts commands for specific servers in the +replication cluster in the format "`server type` `action`", or for the entire +replication cluster in the format "`cluster` `action`". Each command is +described below. + +In general, each command needs to be provided with the path to `repmgr.conf`, +which contains connection details for the local database. -You can teach it which is the remote database by using the -h parameter or -as a last parameter in standby clone and standby follow. If you need to specify -a port different then the default 5432 you can specify a -p parameter. -Standby is always considered as localhost and a second -p parameter will indicate -its port if is different from the default one. * `master register` @@ -486,7 +1142,7 @@ its port if is different from the default one. * `standby unregister` Unregisters a standby with `repmgr`. This command does not affect the actual - replication.
+ replication, just removes the standby's entry from the `repl_nodes` table. * `standby clone [node to be cloned]` @@ -520,6 +1176,27 @@ its port if is different from the default one. This command will not function if the current master is still running. +* `standby switchover` + + Promotes a standby to master and demotes the existing master to a standby. + This command must be run on the standby to be promoted, and requires a + password-less SSH connection to the current master. Additionally the + location of the master's `repmgr.conf` file must be provided with + `-C/--remote-config-file`. + + `repmgrd` should not be active if a switchover is attempted. This + restriction may be lifted in a later version. + +* `standby follow` + + Attaches the standby to a new master. This command requires a valid + `repmgr.conf` file for the standby, either specified explicitly with + `-f/--config-file` or located in the current working directory; no + additional arguments are required. + + This command will force a restart of the standby server. It can only be used + to attach a standby to a new master node. + * `witness create` Creates a witness server as a separate PostgreSQL instance. This instance @@ -534,36 +1211,26 @@ its port if is different from the default one. By default the witness server will use port 5499 to facilitate easier setup on a server running an existing node. -* `standby follow` - - Attaches the standby to a new master. This command requires a valid - `repmgr.conf` file for the standby, either specified explicitly with - `-f/--config-file` or located in the current working directory; no - additional arguments are required. - - This command will force a restart of the standby server. It can only be used - to attach a standby to a new master node. - * `cluster show` - Displays information about each node in the replication cluster. This + Displays information about each active node in the replication cluster. 
This command polls each registered server and shows its role (master / standby / witness) or "FAILED" if the node doesn't respond. It polls each server directly and can be run on any node in the cluster; this is also useful when analyzing connectivity from a particular node. - This command requires a valid `repmgr.conf` file for the node on which it is - executed, either specified explicitly with `-f/--config-file` or located in - the current working directory; no additional arguments are required. + This command requires a valid `repmgr.conf` file to be provided; no + additional arguments are required. Example: - repmgr -f /path/to/repmgr.conf cluster show - Role | Connection String - * master | host=node1 dbname=repmgr user=repmgr - standby | host=node2 dbname=repmgr user=repmgr - standby | host=node3 dbname=repmgr user=repmgr + $ repmgr -f /etc/repmgr.conf cluster show + Role | Name | Upstream | Connection String + ----------+-------|----------|-------------------------------------------- + * master | node1 | | host=repmgr_node1 dbname=repmgr user=repmgr + standby | node2 | node1 | host=repmgr_node1 dbname=repmgr user=repmgr + standby | node3 | node2 | host=repmgr_node1 dbname=repmgr user=repmgr * `cluster cleanup` @@ -576,31 +1243,6 @@ its port if is different from the default one. executed, either specified explicitly with `-f/--config-file` or located in the current working directory; no additional arguments are required. -### repmgr configuration file - -See `repmgr.conf.sample` for an example configuration file with available -configuration settings annotated. - -### repmgr database schema - -`repmgr` creates a small schema for its own use in the database specified in -each node's `conninfo` configuration parameter. This database can in principle -be any database. The schema name is the global `cluster` name prefixed -with `repmgr_`, so for the example setup above the schema name is -`repmgr_test`. 
- -The schema contains two tables: - -* `repl_nodes` - stores information about all registered servers in the cluster -* `repl_monitor` - stores monitoring information about each node (generated by `repmgrd` with - `-m/--monitoring-history` option enabled) - -and one view: -* `repl_status` - summarizes the latest monitoring information for each node (generated by `repmgrd` with - `-m/--monitoring-history` option enabled) ### Error codes @@ -625,17 +1267,22 @@ exit: Support and Assistance ---------------------- -2ndQuadrant provides 24x7 production support for repmgr, including +2ndQuadrant provides 24x7 production support for `repmgr`, including configuration assistance, installation verification and training for running a robust replication cluster. For further details see: * http://2ndquadrant.com/en/support/ -There is a mailing list/forum to discuss contributions or issues -http://groups.google.com/group/repmgr +There is a mailing list/forum to discuss contributions or issues: + +* http://groups.google.com/group/repmgr The IRC channel #repmgr is registered with freenode. +Please report bugs and other issues to: + +* https://github.com/2ndQuadrant/repmgr + Further information is available at http://www.repmgr.org/ We'd love to hear from you about how you use repmgr. Case studies and @@ -661,6 +1308,5 @@ Thanks from the repmgr core team. Further reading --------------- -* http://blog.2ndquadrant.com/announcing-repmgr-2-0/ * http://blog.2ndquadrant.com/managing-useful-clusters-repmgr/ * http://blog.2ndquadrant.com/easier_postgresql_90_clusters/
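
As an illustration of the `event_notification_command` placeholders described under "Generating event notifications", the following is a minimal sketch of a handler script, assuming it is invoked as `/path/to/notify.py %n %e %s "%t" "%d"`. The script name `notify.py` and the helpers `Event`, `parse_event` and `format_message` are hypothetical and not part of `repmgr`; a real handler would probably send an email or post to a monitoring system rather than print.

```python
#!/usr/bin/env python3
"""Hypothetical handler for repmgr's event_notification_command (a sketch)."""
import sys
from typing import NamedTuple, Sequence


class Event(NamedTuple):
    node_id: int       # %n
    event: str         # %e
    successful: bool   # %s (1 or 0)
    timestamp: str     # %t
    details: str       # %d (may be empty)


def parse_event(argv: Sequence[str]) -> Event:
    """Map the five positional arguments (%n %e %s "%t" "%d") to an Event."""
    if len(argv) != 5:
        raise ValueError("expected 5 arguments: %n %e %s %t %d")
    node_id, event, success, timestamp, details = argv
    return Event(int(node_id), event, success == "1", timestamp, details)


def format_message(ev: Event) -> str:
    """Build a one-line summary suitable for a log entry or mail subject."""
    status = "ok" if ev.successful else "FAILED"
    text = f"[{ev.timestamp}] node {ev.node_id}: {ev.event} ({status})"
    return f"{text}: {ev.details}" if ev.details else text


# Only act when arguments were actually supplied on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    try:
        print(format_message(parse_event(sys.argv[1:])))
    except ValueError as exc:
        sys.exit(f"notify.py: {exc}")
```

Quoting `"%t"` and `"%d"` in the configured command matters here: without the quotes, a timestamp or details string containing spaces would be split into several arguments and the handler would reject the call.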