fixed documentation and line endings

Christian Kruse
2014-01-23 10:39:21 +01:00
parent 680f23fb1d
commit 9d6ac2ebf9

=====================================================
PostgreSQL Automatic Fail-Over - User Documentation
=====================================================

Automatic Failover
==================

repmgr allows setups with automatic failover: when it detects the failure of
the master node, a standby is promoted to take its place. The following is a
quick setup guide.
Installation
============

For convenience, we define:

**node1**
  is the fully qualified hostname of the Master server, IP 192.168.1.10
**node2**
  is the fully qualified hostname of the Standby server, IP 192.168.1.11
**witness**
  is the fully qualified hostname of the server used as witness, IP 192.168.1.12

**Note:** It is not recommended to use a name describing the status of a
server, such as "masterserver": such a name leads to confusion once a failover
takes place and the Master is now on "standbyserver".
Summary
-------

Two PostgreSQL servers are involved in the replication. Automatic fail-over
requires a vote to decide which server should be promoted, and a vote needs an
odd number of participants; therefore a witness repmgrd is installed on a third
server, where it uses a PostgreSQL cluster to communicate with the other
repmgrd daemons.

1. Install PostgreSQL on all the servers involved (including the witness
   server)

2. Install repmgr on all the servers involved (including the witness server)

3. Configure the Master PostgreSQL

4. Clone the Master to the Standby using the "repmgr standby clone" command

5. Configure repmgr on all the servers involved (including the witness server)

6. Register the Master and Standby nodes

7. Initialize the witness server

8. Start the repmgrd daemons on all nodes

**Note:** A complete High-Availability design needs at least 3 database servers
in order to still have a backup node after a first failure.
Install PostgreSQL
------------------

You can install PostgreSQL using any of the recommended methods. Make sure the
version is 9.0 or later.

Install repmgr
--------------

Install repmgr following the steps in the README.

Configure PostgreSQL
--------------------

Log in to node1.

Edit the file postgresql.conf and modify the parameters::

  listen_addresses = '*'
  wal_level = 'hot_standby'
  archive_mode = on
  archive_command = 'cd .'   # we can also use exit 0, anything that
                             # just does nothing
  max_wal_senders = 10
  wal_keep_segments = 5000   # 80 GB required on pg_xlog
  hot_standby = on
  shared_preload_libraries = 'repmgr_funcs'
Edit the file pg_hba.conf and add lines for the replication::

  host   repmgr        repmgr   127.0.0.1/32      trust
  host   repmgr        repmgr   192.168.1.10/30   trust
  host   replication   all      192.168.1.10/30   trust

**Note:** It is also possible to use password authentication (md5); in that
case the .pgpass file should be edited to allow connections between the nodes.
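
If you opt for md5 authentication, a minimal sketch of the .pgpass file for
the postgres user on each node could look like the following (the format is
host:port:database:user:password; the password shown is a placeholder, and the
witness port 5499 matches the conninfo note further below)::

  # ~postgres/.pgpass -- must be chmod 0600, one line per reachable node
  192.168.1.10:5432:repmgr:repmgr:some_secret_password
  192.168.1.11:5432:repmgr:repmgr:some_secret_password
  192.168.1.12:5499:repmgr:repmgr:some_secret_password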
Create the user and database used to manage the replication::

  su - postgres
  createuser -s repmgr
  createdb -O repmgr repmgr
  psql -f /usr/share/postgresql/9.0/contrib/repmgr_funcs.sql repmgr

Restart the PostgreSQL server::

  pg_ctl -D $PGDATA restart

And check that everything is fine in the server log.

Create the ssh key for the postgres user and copy it to the other servers::

  su - postgres
  ssh-keygen   # /!\ do not use a passphrase /!\
  cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
  chmod 600 ~/.ssh/authorized_keys
  exit
  rsync -avz ~postgres/.ssh/authorized_keys node2:~postgres/.ssh/
  rsync -avz ~postgres/.ssh/authorized_keys witness:~postgres/.ssh/
  rsync -avz ~postgres/.ssh/id_rsa* node2:~postgres/.ssh/
  rsync -avz ~postgres/.ssh/id_rsa* witness:~postgres/.ssh/
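
Since repmgr relies on passwordless ssh between the nodes, it is worth
verifying it now rather than during the clone; this quick check is an addition
to the original procedure (hostnames as defined above)::

  su - postgres
  ssh node2 true   && echo "node1 -> node2 OK"
  ssh witness true && echo "node1 -> witness OK"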
Clone Master
------------

Log in to node2.

Clone node1 (the current Master)::

  su - postgres
  repmgr -d repmgr -U repmgr -h node1 standby clone

Start the PostgreSQL server::

  pg_ctl -D $PGDATA start

And check that everything is fine in the server log.
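
To confirm that node2 is really running as a standby, you can query it
directly; this check is an addition to the original procedure, and the
function pg_is_in_recovery() is available from PostgreSQL 9.0 onwards::

  psql -U repmgr -d repmgr -c "SELECT pg_is_in_recovery();"
  # should return 't' on the standby, 'f' on the master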
Configure repmgr
----------------

Log in to each server and configure repmgr by editing the file
/etc/repmgr/repmgr.conf::

  cluster=my_cluster
  node=1
  node_name=earth
  conninfo='host=192.168.1.10 dbname=repmgr user=repmgr'
  master_response_timeout=60
  reconnect_attempts=6
  reconnect_interval=10
  failover=automatic
  promote_command='promote_command.sh'
  follow_command='repmgr standby follow -f /etc/repmgr/repmgr.conf'

**cluster**
  is the name of the current replication cluster.
**node**
  is the number of the current node (1, 2 or 3 in the current example).
**node_name**
  is an identifier for each node.
**conninfo**
  is used to connect from any node to the local PostgreSQL server (the one
  this configuration file belongs to). In the witness server configuration,
  'port=5499' has to be added to the conninfo.
**master_response_timeout**
  is the maximum amount of time to wait before deciding that the master has
  died and starting the failover procedure.
**reconnect_attempts**
  is the number of reconnection attempts to the master after a failure has
  been detected and before the failover procedure is started.
**reconnect_interval**
  is the amount of time between reconnection attempts to the master after a
  failure has been detected and before the failover procedure is started.
**failover**
  configures the behavior: *manual* or *automatic*.
**promote_command**
  is the command executed to perform the failover (including the PostgreSQL
  promotion itself). The command must return 0 on success.
**follow_command**
  is the command executed to point the current standby at another Master. The
  command must return 0 on success.
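
The configuration above references promote_command='promote_command.sh'
without showing the script itself. A minimal sketch could be as follows; the
filename and any fencing of the failed master are up to you, the essential
parts are running "repmgr standby promote" and returning 0 on success::

  #!/bin/sh
  # promote_command.sh -- minimal sketch; add fencing/STONITH of the old
  # master here if your setup requires it
  repmgr standby promote -f /etc/repmgr/repmgr.conf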
Register Master and Standby
---------------------------

Log in to node1.

Register the node as Master::

  su - postgres
  repmgr -f /etc/repmgr/repmgr.conf master register

Log in to node2. Register it as a Standby::

  su - postgres
  repmgr -f /etc/repmgr/repmgr.conf standby register
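
At this point the cluster metadata can be inspected; assuming cluster=my_cluster
as configured above, repmgr keeps its node table in the schema
repmgr_my_cluster, so a quick check that both nodes are registered is::

  psql -U repmgr -d repmgr -c "SELECT * FROM repmgr_my_cluster.repl_nodes;"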
Initialize witness server
-------------------------

Log in to witness.

Initialize the witness server::

  su - postgres
  repmgr -d repmgr -U repmgr -h 192.168.1.10 -D $WITNESS_PGDATA -f /etc/repmgr/repmgr.conf witness create

The command needs the connection information of the master in order to copy
the configuration of the cluster, and it needs to know where to initialize its
own $PGDATA. As part of the process it also asks for the superuser password so
that it can connect when needed.
Start the repmgrd daemons
-------------------------

Log in to node2 and witness, then start the daemon on each::

  su - postgres
  repmgrd -f /etc/repmgr/repmgr.conf > /var/log/postgresql/repmgr.log 2>&1

**Note:** The Master does not need a repmgrd daemon.
Suspend Automatic behavior
==========================

Edit the repmgr.conf of the node you want to remove from automatic processing
and change::

  failover=manual

Then, signal the repmgrd daemon::

  su - postgres
  kill -HUP `pidof repmgrd`

Usage
=====

The repmgr documentation is in the README file (how to build, options, etc.).