mirror of
https://github.com/EnterpriseDB/repmgr.git
synced 2026-03-26 08:36:30 +00:00
Add a quick setup for autofailover
@@ -23,7 +23,7 @@ databases as a single cluster. repmgr includes two components:
|
|||||||
Supported Releases
|
Supported Releases
|
||||||
------------------
|
------------------
|
||||||
|
|
||||||
repmgr works with PostgreSQL versions 9.0 and 9.1.
|
repmgr works with PostgreSQL versions 9.0 and later.
|
||||||
|
|
||||||
There are currently no incompatibilities when upgrading repmgr from 9.0 to 9.1,
|
There are currently no incompatibilities when upgrading repmgr from 9.0 to 9.1,
|
||||||
so your 9.0 configuration will work with 9.1
|
so your 9.0 configuration will work with 9.1
|
||||||
@@ -389,7 +389,7 @@ walkthrough assumes the following setup:
|
|||||||
|
|
||||||
* Another standby server called "node3" with a similar configuration to "node2".
|
* Another standby server called "node3" with a similar configuration to "node2".
|
||||||
|
|
||||||
* The Postgress installation in each of the above is defined as $PGDATA,
|
* The Postgres installation in each of the above is defined as $PGDATA,
|
||||||
which is represented here as ``/var/lib/pgsql/9.0/data``
|
which is represented here as ``/var/lib/pgsql/9.0/data``
|
||||||
|
|
||||||
Creating some sample data
|
Creating some sample data
|
||||||
@@ -514,12 +514,14 @@ following the standard directory structure of a RHEL system. It should contain:
|
|||||||
|
|
||||||
cluster=test
|
cluster=test
|
||||||
node=1
|
node=1
|
||||||
|
node_name=earth
|
||||||
conninfo='host=node1 user=repmgr dbname=pgbench'
|
conninfo='host=node1 user=repmgr dbname=pgbench'
|
||||||
|
|
||||||
On "node2" create the file ``/var/lib/pgsql/repmgr/repmgr.conf`` with::
|
On "node2" create the file ``/var/lib/pgsql/repmgr/repmgr.conf`` with::
|
||||||
|
|
||||||
cluster=test
|
cluster=test
|
||||||
node=2
|
node=2
|
||||||
|
node_name=mars
|
||||||
conninfo='host=node2 user=repmgr dbname=pgbench'
|
conninfo='host=node2 user=repmgr dbname=pgbench'
|
||||||
|
|
||||||
The STANDBY CLONE process should have created a recovery.conf file on
|
The STANDBY CLONE process should have created a recovery.conf file on
|
||||||
@@ -712,12 +714,14 @@ and it should contain::
|
|||||||
|
|
||||||
cluster=test
|
cluster=test
|
||||||
node=1
|
node=1
|
||||||
|
node_name=earth
|
||||||
conninfo='host=127.0.0.1 dbname=testdb'
|
conninfo='host=127.0.0.1 dbname=testdb'
|
||||||
|
|
||||||
On "standby" create the file ``/home/standby/repmgr/repmgr.conf`` with::
|
On "standby" create the file ``/home/standby/repmgr/repmgr.conf`` with::
|
||||||
|
|
||||||
cluster=test
|
cluster=test
|
||||||
node=2
|
node=2
|
||||||
|
node_name=mars
|
||||||
conninfo='host=127.0.0.1 dbname=testdb'
|
conninfo='host=127.0.0.1 dbname=testdb'
|
||||||
|
|
||||||
Next, with "prime" server running, we want to use the ``clone standby`` command
|
Next, with "prime" server running, we want to use the ``clone standby`` command
|
||||||
@@ -1133,4 +1137,3 @@ Jaime Casanova
|
|||||||
Simon Riggs
|
Simon Riggs
|
||||||
Greg Smith
|
Greg Smith
|
||||||
Cedric Villemain
|
Cedric Villemain
|
||||||
|
|
||||||
|
|||||||
209
autofailover_quick_setup.rst
Normal file
@@ -0,0 +1,209 @@
|
|||||||
|
=====================================================
|
||||||
|
PostgreSQL Automatic Fail-Over - User Documentation
|
||||||
|
=====================================================
|
||||||
|
|
||||||
|
Automatic Failover
|
||||||
|
==================
|
||||||
|
|
||||||
|
repmgr can be set up to fail over automatically when it detects that the master node has failed.
|
||||||
|
The following is a quick setup for this.
|
||||||
|
|
||||||
|
Installation
|
||||||
|
============
|
||||||
|
|
||||||
|
For convenience, we define:
|
||||||
|
|
||||||
|
* node1 is the fully qualified hostname of the Master server, IP 192.168.1.10
|
||||||
|
* node2 is the fully qualified hostname of the Standby server, IP 192.168.1.11
|
||||||
|
* witness is the fully qualified hostname of the server used as witness, IP 192.168.1.12
|
||||||
|
|
||||||
|
:Note: It is not recommended to use names that describe the role of a server, such as «masterserver»:
|
||||||
|
such a name becomes confusing once a failover takes place and the Master is
|
||||||
|
now on the «standbyserver».
|
||||||
|
|
||||||
|
Summary
|
||||||
|
-------
|
||||||
|
|
||||||
|
Two PostgreSQL servers are involved in the replication. Automatic fail-over needs
|
||||||
|
a vote to decide which server to promote, so an odd number of voters is required:
|
||||||
|
a witness repmgrd is installed on a third server, where it uses a PostgreSQL
|
||||||
|
cluster to communicate with the other repmgrd daemons.
|
||||||
|
|
||||||
|
1. Install PostgreSQL in all the servers involved (including the server used for
|
||||||
|
witness)
|
||||||
|
2. Install repmgr in all the servers involved (including the server used for witness)
|
||||||
|
3. Configure the Master PostgreSQL
|
||||||
|
4. Clone the Master to the Standby using "repmgr standby clone" command
|
||||||
|
5. Configure repmgr in all the servers involved (including the server used for witness)
|
||||||
|
6. Register Master and Standby nodes
|
||||||
|
7. Initialize the witness server
|
||||||
|
8. Start the repmgrd daemons in all nodes
|
||||||
|
|
||||||
|
:Note: A complete High-Availability design needs at least 3 servers to still have
|
||||||
|
a backup node after a first failure.
|
||||||
|
|
||||||
|
Install PostgreSQL
|
||||||
|
------------------
|
||||||
|
|
||||||
|
You can install PostgreSQL using any of the recommended methods. You should ensure
|
||||||
|
it is version 9.0 or later.
|
||||||
|
|
||||||
|
Install repmgr
|
||||||
|
--------------
|
||||||
|
|
||||||
|
Install repmgr following the steps in the README.
|
||||||
|
|
||||||
|
Configure PostgreSQL
|
||||||
|
--------------------
|
||||||
|
|
||||||
|
Log in to node1.
|
||||||
|
|
||||||
|
Edit the file postgresql.conf and modify the parameters::
|
||||||
|
|
||||||
|
listen_addresses='*'
|
||||||
|
wal_level = 'hot_standby'
|
||||||
|
archive_mode = on
|
||||||
|
archive_command = 'cd .' # we can also use exit 0, anything that
|
||||||
|
# just does nothing
|
||||||
|
max_wal_senders = 10
|
||||||
|
wal_keep_segments = 5000 # 80 GB required on pg_xlog
|
||||||
|
hot_standby = on
|
||||||
|
shared_preload_libraries = 'repmgr_funcs'
|
||||||
|
|
||||||
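The disk figure quoted next to ``wal_keep_segments`` can be checked with simple arithmetic. This is a minimal sketch that assumes the default WAL segment size of 16 MB; the variable names are illustrative only.

```shell
#!/bin/sh
# Estimate how much WAL the wal_keep_segments setting keeps in pg_xlog.
wal_keep_segments=5000
segment_mb=16   # assumption: default WAL segment size, adjust for a custom build

total_mb=$((wal_keep_segments * segment_mb))
echo "wal_keep_segments=${wal_keep_segments} keeps about ${total_mb} MB of WAL"
```

With the values above this gives 80000 MB, which matches the "80 GB" comment in the configuration.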
|
Edit the file pg_hba.conf and add lines for the replication::
|
||||||
|
|
||||||
|
host repmgr repmgr 127.0.0.1/32 trust
|
||||||
|
host repmgr repmgr 192.168.1.10/30 trust
|
||||||
|
host replication all 192.168.1.10/30 trust
|
||||||
|
|
||||||
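A /30 mask covers only four addresses, so it is worth checking which of the example hosts an entry such as ``192.168.1.10/30`` actually matches. The following is a minimal sketch in plain shell arithmetic; the helper names ``ip_to_int`` and ``in_cidr`` are hypothetical and are not part of PostgreSQL or repmgr.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Return 0 if address $1 falls inside CIDR block $2 (e.g. 192.168.1.10/30).
in_cidr() {
    addr=$(ip_to_int "$1")
    base=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (4294967295 << (32 - bits)) & 4294967295 ))
    [ $(( addr & mask )) -eq $(( base & mask )) ]
}

# Check the three hosts used in this guide against the pg_hba.conf entry.
for host in 192.168.1.10 192.168.1.11 192.168.1.12; do
    if in_cidr "$host" 192.168.1.10/30; then
        echo "$host matches"
    else
        echo "$host does not match"
    fi
done
```

With the addresses used in this guide, 192.168.1.10 and 192.168.1.11 match, while 192.168.1.12 (witness) falls outside the block. If the witness has to connect to node1, a wider mask such as /29 may be needed.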
|
:Note: Password authentication (md5) can be used instead; in that case, the .pgpass file
|
||||||
|
should be edited on each node to allow connections between the nodes.
|
||||||
|
|
||||||
|
Create the user and database to manage replication::
|
||||||
|
|
||||||
|
su - postgres
|
||||||
|
createuser -s repmgr
|
||||||
|
createdb -O repmgr repmgr
|
||||||
|
psql -f /usr/share/postgresql/9.0/contrib/repmgr_funcs.sql repmgr
|
||||||
|
|
||||||
|
Restart the PostgreSQL server::
|
||||||
|
|
||||||
|
pg_ctl -D $PGDATA restart
|
||||||
|
|
||||||
|
Then check the server log to confirm that everything is fine.
|
||||||
|
|
||||||
|
Create the ssh-key for the postgres user and copy it to other servers::
|
||||||
|
|
||||||
|
su - postgres
|
||||||
|
ssh-keygen # /!\ do not use a passphrase /!\
|
||||||
|
cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
|
||||||
|
chmod 600 ~/.ssh/authorized_keys
|
||||||
|
exit
|
||||||
|
rsync -avz ~postgres/.ssh/authorized_keys node2:~postgres/.ssh/
|
||||||
|
rsync -avz ~postgres/.ssh/authorized_keys witness:~postgres/.ssh/
|
||||||
|
rsync -avz ~postgres/.ssh/id_rsa* node2:~postgres/.ssh/
|
||||||
|
rsync -avz ~postgres/.ssh/id_rsa* witness:~postgres/.ssh/
|
||||||
|
|
||||||
|
Clone Master
|
||||||
|
------------
|
||||||
|
|
||||||
|
Log in to node2.
|
||||||
|
|
||||||
|
Clone node1 (the current Master)::
|
||||||
|
|
||||||
|
su - postgres
|
||||||
|
repmgr -d repmgr -U repmgr standby clone node1
|
||||||
|
|
||||||
|
Start the PostgreSQL server::
|
||||||
|
|
||||||
|
pg_ctl -D $PGDATA start
|
||||||
|
|
||||||
|
Then check the server log to confirm that everything is fine.
|
||||||
|
|
||||||
|
Configure repmgr
|
||||||
|
----------------
|
||||||
|
|
||||||
|
Log in to each server and configure repmgr by editing the file
|
||||||
|
/etc/repmgr/repmgr.conf::
|
||||||
|
|
||||||
|
cluster=my_cluster
|
||||||
|
node=1
|
||||||
|
node_name=earth
|
||||||
|
conninfo='host=192.168.1.10 dbname=repmgr user=repmgr'
|
||||||
|
master_response_timeout=60
|
||||||
|
failover=automatic
|
||||||
|
promote_command='promote_command.sh'
|
||||||
|
follow_command='repmgr standby follow -f /etc/repmgr/repmgr.conf'
|
||||||
|
|
||||||
|
* *cluster* is the name of the current replication.
|
||||||
|
* *node* is the number of the current node (1, 2 or 3 in the current example).
|
||||||
|
* *node_name* is an identifier for every node.
|
||||||
|
* *conninfo* is used to connect to the local PostgreSQL server (where the configuration file is) from any node. In the witness server configuration, 'port=5499' must be added to the conninfo.
|
||||||
|
* *master_response_timeout* is the maximum amount of time to wait before deciding that the master has died and starting the failover procedure.
|
||||||
|
* *failover* configures the failover behavior: *manual* or *automatic*.
|
||||||
|
* *promote_command* is the command executed to perform the failover (including the PostgreSQL failover itself). The command must return 0 on success.
|
||||||
|
* *follow_command* is the command executed to point the current standby at another Master. The command must return 0 on success.
|
||||||
|
|
||||||
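Since each node carries its own copy of this file, it can help to read back a setting before starting or signalling repmgrd. The sketch below assumes plain ``key=value`` lines with no spaces around ``=``, as in the example above; ``conf_value`` is a hypothetical helper, not a repmgr command.

```shell
#!/bin/sh
# Print the value of key $1 from the key=value file $2.
# If the key appears more than once, the last occurrence is printed.
conf_value() {
    sed -n "s/^$1=//p" "$2" | tail -n 1
}

# Demo against a throwaway sample file; on a real node, point this at
# /etc/repmgr/repmgr.conf instead.
sample=$(mktemp)
cat > "$sample" <<'EOF'
cluster=my_cluster
node=1
node_name=earth
failover=automatic
EOF

conf_value failover "$sample"
conf_value node "$sample"
rm -f "$sample"
```

Because the pattern is anchored on ``key=``, asking for ``node`` does not pick up the ``node_name`` line.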
|
Register Master and Standby
|
||||||
|
---------------------------
|
||||||
|
|
||||||
|
Log in to node1.
|
||||||
|
|
||||||
|
Register the node as Master::
|
||||||
|
|
||||||
|
su - postgres
|
||||||
|
repmgr -f /etc/repmgr/repmgr.conf master register
|
||||||
|
|
||||||
|
Log in to node2.
|
||||||
|
|
||||||
|
Register the node as Standby::
|
||||||
|
|
||||||
|
su - postgres
|
||||||
|
repmgr -f /etc/repmgr/repmgr.conf standby register
|
||||||
|
|
||||||
|
Initialize witness server
|
||||||
|
-------------------------
|
||||||
|
|
||||||
|
Log in to witness.
|
||||||
|
|
||||||
|
Initialize the witness server::
|
||||||
|
|
||||||
|
su - postgres
|
||||||
|
repmgr -d repmgr -U repmgr -h 192.168.1.10 -D $WITNESS_PGDATA -f /etc/repmgr/repmgr.conf witness create node1
|
||||||
|
|
||||||
|
It needs the connection information for the master in order to copy the configuration of the cluster, and it also needs to know where it should initialize its own $PGDATA.
|
||||||
|
As part of the process, it also asks for the superuser password so that it can connect when needed.
|
||||||
|
|
||||||
|
Start the repmgrd daemons
|
||||||
|
-------------------------
|
||||||
|
|
||||||
|
Log in to node2 and witness, and start the repmgrd daemon on each::
|
||||||
|
|
||||||
|
su - postgres
|
||||||
|
repmgrd -f /etc/repmgr/repmgr.conf > /var/log/postgresql/repmgr.log 2>&1
|
||||||
|
|
||||||
|
:Note: The Master does not need a repmgrd daemon.
|
||||||
|
|
||||||
|
|
||||||
|
Suspend Automatic behavior
|
||||||
|
==========================
|
||||||
|
|
||||||
|
Edit the repmgr.conf of the node to be removed from automatic processing and set::
|
||||||
|
|
||||||
|
failover=manual
|
||||||
|
|
||||||
|
Then signal the repmgrd daemon::
|
||||||
|
|
||||||
|
su - postgres
|
||||||
|
kill -HUP `pidof repmgrd`
|
||||||
|
|
||||||
|
TODO: configuration reload on -HUP is not implemented. repmgrd should check its
|
||||||
|
configuration file against its configuration in the DB and update
|
||||||
|
the SQL configuration accordingly (especially the manual or automatic failover setting).
|
||||||
|
This would allow witness-standby and standby-not-promotable features
|
||||||
|
and simpler usage of the tool ;)
|
||||||
|
|
||||||
|
Usage
|
||||||
|
=====
|
||||||
|
|
||||||
|
The repmgr documentation is in the README file (how to build, options, etc.)
|
||||||