Add a quick setup for autofailover

2026-06-01 03:39:05 +00:00 · 2012-06-26 07:49:43 -05:00
parent e3c3c22b6e
commit d58ea77798
2 changed files with 215 additions and 3 deletions
@@ -23,7 +23,7 @@ databases as a single cluster.  repmgr includes two components:
 Supported Releases
 ------------------
-repmgr works with PostgreSQL versions 9.0 and 9.1.
+repmgr works with PostgreSQL versions 9.0 and superior.
 There are currently no incompatibilities when upgrading repmgr from 9.0 to 9.1,
 so your 9.0 configuration will work with 9.1
@@ -389,7 +389,7 @@ walkthrough assumes the following setup:
 * Another standby server called "node3" with a similar configuration to "node2".
-* The Postgress installation in each of the above is defined as $PGDATA, 
+* The Postgres installation in each of the above is defined as $PGDATA, 
  which is represented here as ``/var/lib/pgsql/9.0/data``
 Creating some sample data
@@ -514,12 +514,14 @@ following the standard directory structure of a RHEL system.  It should contain:
  cluster=test
  node=1
  node_name=earth
  conninfo='host=node1 user=repmgr dbname=pgbench'
 On "node2" create the file ``/var/lib/pgsql/repmgr/repmgr.conf`` with::
  cluster=test
  node=2
  node_name=mars
  conninfo='host=node2 user=repmgr dbname=pgbench'
 The STANDBY CLONE process should have created a recovery.conf file on
@@ -712,12 +714,14 @@ and it should contain::
  cluster=test
  node=1
  node_name=earth
  conninfo='host=127.0.0.1 dbname=testdb'
 On "standby" create the file ``/home/standby/repmgr/repmgr.conf`` with::
  cluster=test
  node=2
  node_name=mars
  conninfo='host=127.0.0.1 dbname=testdb'
 Next, with "prime" server running, we want to use the ``clone standby`` command
@@ -1133,4 +1137,3 @@ Jaime Casanova
 Simon Riggs
 Greg Smith
 Cedric Villemain
@@ -0,0 +1,209 @@
 =====================================================
 PostgreSQL Automatic Fail-Over - User Documentation
 =====================================================
 Automatic Failover
 ==================
 repmgr allows setups for automatic failover when it detects the failure of the master node.
 Following is a quick setup for this.
 Installation
 ============
 For convenience, we define:
  * node1 is the hostname fully qualified of the Master server, IP 192.168.1.10
  * node2 is the hostname fully qualified of the Standby server, IP 192.168.1.11
  * witness is the hostname fully qualified of the server used for witness, IP 192.168.1.12
 :Note: It is not recommanded to use name defining status of a server like «masterserver»,
       this is a name leading to confusion once a failover take place and the Master is
       now on the «standbyserver».
 Summary
 -------
 2 PostgreSQL servers are involved in the replication.  Automatic fail-over need
 to vote to decide what server it should promote, thus an odd number is required
 and a witness-repmgrd is installed in a third server where it uses a PostgreSQL
 cluster to communicate with other repmgrd daemons.
 1. Install PostgreSQL in all the servers involved (including the server used for
 witness)
 2. Install repmgr in all the servers involved (including the server used for witness)
 3. Configure the Master PostreSQL
 4. Clone the Master to the Standby using "repmgr standby clone" command
 5. Configure repmgr in all the servers involved (including the server used for witness)
 6. Register Master and Standby nodes
 7. Initiate witness server
 8. Start the repmgrd daemons in all nodes
 :Note: A complete Hight-Availability design need at least 3 servers to still have
       a backup node after a first failure.
 Install PostgreSQL
 ------------------
 You can install PostgreSQL using any of the recommended methods. You should ensure
 it's 9.0 or superior.
 Install repmgr
 --------------
 Install repmgr following the steps in the README.
 Configure PostreSQL
 -------------------
 Log in node1.
 Edit the file postgresql.conf and modify the parameters::
  listen_addresses='*'
  wal_level = 'hot_standby'
  archive_mode = on
  archive_command = 'cd .'	 # we can also use exit 0, anything that 
                             # just does nothing
  max_wal_senders = 10
  wal_keep_segments = 5000   # 80 GB required on pg_xlog
  hot_standby = on
  shared_preload_libraries = 'repmgr_funcs'
 Edit the file pg_hba.conf and add lines for the replication::
  host     repmgr           repmgr      127.0.0.1/32            trust
  host     repmgr           repmgr      192.168.1.10/30         trust
  host     replication      all         192.168.1.10/30         trust
 :Note: It is also possible to use a password authentication (md5), .pgpass file
       should be edited to allow connection between each node.
 Create the user and database to manage replication::
  su - postgres
  createuser -s repmgr
  createdb -O repmgr repmgr
  psql -f /usr/share/postgresql/9.0/contrib/repmgr_funcs.sql repmgr
 Restart the PostgreSQL server::
  pg_ctl -D $PGDATA restart
 And check everything is fine in the server log.
 Create the ssh-key for the postgres user and copy it to other servers::
  su - postgres
  ssh-keygen             # /!\ do not use a passphrase /!\
  cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
  chmod 600 ~/.ssh/authorized_keys
  exit
  rsync -avz ~postgres/.ssh/authorized_keys node2:~postgres/.ssh/
  rsync -avz ~postgres/.ssh/authorized_keys witness:~postgres/.ssh/
  rsync -avz ~postgres/.ssh/id_rsa* node2:~postgres/.ssh/
  rsync -avz ~postgres/.ssh/id_rsa* witness:~postgres/.ssh/
 Clone Master
 ------------
 Log in node2.
 Clone the node1 (the current Master)::
  su - postgres
  repmgr -d repmgr -U repmgr standby clone node1
 Start the PostgreSQL server::
  pg_ctl -D $PGDATA start
 And check everything is fine in the server log.
 Configure repmgr
 ----------------
 Log in each server and configure repmgr by editing the file
 /etc/repmgr/repmgr.conf::
  cluster=my_cluster
  node=1
  node_name=earth
  conninfo='host=192.168.1.10 dbname=repmgr user=repmgr'
  master_response_timeout=60
  failover=automatic
  promote_command='promote_command.sh'
  follow_command='repmgr standby follow -f /etc/repmgr/repmgr.conf'
 * *cluster* is the name of the current replication.
 * *node* is the number of the current node (1, 2 or 3 in the current example).
 * *node_name* is an identifier for every node.
 * *conninfo* is used to connect to the local PostgreSQL server (where the configuration file is) from any node. In the witness server configuration it is needed to add a 'port=5499' to the conninfo.
 * *master_response_timeout* is the maximum amount of time we are going to wait before deciding the master has died and start failover procedure.
 * *failover* configure behavior : *manual* or *automatic*.
 * *promote_command* the command executed to do the failover (including the PostgreSQL failover itself). The command must return 0 on success.
 * *follow_command* the command executed to address the current standby to another Master. The command must return 0 on success.
 Register Master and Standby
 ---------------------------
 Log in node1.
 Register the node as Master::
  su - postgres
  repmgr -f /etc/repmgr/repmgr.conf master register
 Log in node2.
 Register the node as Standby::
  su - postgres
  repmgr -f /etc/repmgr/repmgr.conf standby register
 Initialize witness server
 -------------------------
 Log in witness.
 Initialize the witness server::
  su - postgres
  repmgr -d repmgr -U repmgr -h 192.168.1.10 -D $WITNESS_PGDATA -f /etc/repmgr/repmgr.conf witness create node1
 It needs information to connect to the master to copy the configuration of the cluster, also it needs to know where it should initialize it's own $PGDATA.
 As part of the procees it also ask for the superuser password so it can connect when needed.
 Start the repmgrd daemons
 -------------------------
 Log in node2 and witness.
  su - postgres
  repmgrd -f /etc/repmgr/repmgr.conf > /var/log/postgresql/repmgr.log 2>&1
 :Note: The Master does not need a repmgrd daemon.
 Suspend Automatic behavior
 ==========================
 Edit the repmgr.conf of the node to remove from automatic processing and change::
 	failover=manual
 Then, signal repmgrd daemon::
 	su - postgres
 	kill -HUP `pidoff repmgrd`
 TODO : -HUP configuration update is not implemented and it should check its
 	   configuration file  against its configuration in DB, updating
 	   accordingly the SQL conf (especialy the failover manual or auto)
 	   this allow witness-standby and standby-not-promotable features
 	   and simpler usage of the tool ;)
 Usage
 =====
 The repmgr documentation is in the README file (how to build, options, etc.)