===================================================
repmgr: Replication Manager for PostgreSQL clusters
===================================================

Introduction to repmgr commands
===============================

Suppose we have 3 nodes: node1 (the initial master), node2 and node3.

To make node2 and node3 be standbys of node1, execute this on both nodes
(node2 and node3)::

    repmgr -D /var/lib/pgsql/9.0 standby clone node1

In order to get full monitoring and easier state transitions,
you register each of the nodes, by creating a ``repmgr.conf`` file
and executing commands like this on the appropriate nodes::

    repmgr -f /var/lib/pgsql/repmgr/repmgr.conf --verbose master register
    repmgr -f /var/lib/pgsql/repmgr/repmgr.conf --verbose standby register

Once everything is registered, you start the repmgrd daemon. It
will maintain a view showing the state of all the nodes in the cluster,
including how far they are lagging behind the master.

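For instance, with repmgrd running and monitoring enabled, you can query
the ``repl_status`` view it maintains (covered in the "Lag monitoring"
section below; the ``repmgr_test`` schema name assumes the cluster is
named "test")::

    psql -d pgbench -c "SELECT * FROM repmgr_test.repl_status"
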
If you lose node1 you can then run this on node2::

    repmgr -f /var/lib/pgsql/repmgr/repmgr.conf standby promote

to make node2 the new master. Then on node3 run::

    repmgr -f /var/lib/pgsql/repmgr/repmgr.conf standby follow

to make node3 follow node2 (rather than node1).

If we now want to add a new node, we can prepare a new server (node4)
and run::

    repmgr -D /var/lib/pgsql/9.0 standby clone node2

And if a previously failed node becomes available again, such as
the lost node1 above, you can get it to resynchronize by only copying
over changes made while it was down. That happens with what's
called a forced clone, which overwrites existing data rather than
assuming it starts with an empty database directory tree::

    repmgr -D /var/lib/pgsql/9.0 --force standby clone node1

This can be much faster than creating a brand new node that must
copy over every file in the database.

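At any point after registration, you can check each node's current role
with the ``cluster show`` command (described in the command reference
below)::

    repmgr -f /var/lib/pgsql/repmgr/repmgr.conf cluster show
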
Installation Outline
====================

To install and use repmgr and repmgrd follow these steps:

1. Build repmgr programs

2. Set up trusted copy between postgres accounts, needed for the
   ``STANDBY CLONE`` step (see the SSH sketch after this list)

3. Check your primary server is correctly configured

4. Write a suitable ``repmgr.conf`` for the node

5. Set up repmgrd to aid in failover transitions

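Step 2 usually means passwordless SSH between the ``postgres`` accounts,
since ``STANDBY CLONE`` copies files with rsync over SSH. A minimal
sketch, assuming the ``postgres`` system user exists on every node::

    # Run as the postgres user on each node in the cluster
    ssh-keygen -t rsa -N ''            # accept the default key location
    ssh-copy-id postgres@node1         # repeat for every other node
    ssh postgres@node1 true            # confirm login needs no password
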
Confirm software was built correctly
------------------------------------

You should now find the repmgr programs available in the subdirectory where
the rest of your PostgreSQL binary files are located. You can confirm the
software is available by checking its version::

    repmgr --version
    repmgrd --version

You may need to include the full path of the binary instead, such as this
RHEL example::

    /usr/pgsql-9.0/bin/repmgr --version
    /usr/pgsql-9.0/bin/repmgrd --version

Or in this Debian example::

    /usr/lib/postgresql/9.0/bin/repmgr --version
    /usr/lib/postgresql/9.0/bin/repmgrd --version

Below, this binary installation base directory is referred to as PGDIR.

Primary server configuration
----------------------------

PostgreSQL should have been previously built and installed on the system. Here
is a sample of changes to the ``postgresql.conf`` file::

    listen_addresses='*'
    wal_level = 'hot_standby'
    archive_mode = on
    archive_command = 'cd .'    # we can also use exit 0, anything that
                                # just does nothing
    max_wal_senders = 10
    wal_keep_segments = 5000    # 80 GB required on pg_xlog
    hot_standby = on

You also need to add the machines that will participate in the cluster to the
``pg_hba.conf`` file. One possibility is to trust all connections from the
replication users from all internal addresses, such as::

    host    all            all    192.168.1.0/24    trust
    host    replication    all    192.168.1.0/24    trust

A more secure setup adds a repmgr user and database, just giving
access to that user::

    host    repmgr         repmgr    192.168.1.0/24    trust
    host    replication    all       192.168.1.0/24    trust

If you give a password to the user, you need to create a ``.pgpass`` file for
them as well to allow automatic login. In this case you might use the
``md5`` authentication method instead of ``trust`` for the repmgr user.

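A sketch of such a ``.pgpass`` entry, assuming the more secure setup above
(the format is hostname:port:database:username:password, and the password
shown is only a placeholder)::

    # ~/.pgpass for the postgres account on each node; chmod 0600 ~/.pgpass
    node1:5432:*:repmgr:SECRET
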
Don't forget to restart the database server after making all these changes.

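For example, using ``pg_ctl`` (assuming $PGDATA points at the data
directory)::

    pg_ctl -D $PGDATA restart
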
Usage walkthrough
=================

This assumes you've already followed the steps in "Installation Outline" to
install repmgr and repmgrd on the system.

A typical production installation of ``repmgr`` might involve two PostgreSQL
instances on separate servers, both running under the ``postgres`` user account
and both using the default port (5432). This walkthrough assumes the following
setup:

* A primary (master) server called "node1," running as the "postgres" user
  who is also the owner of the files. This server is operating on port 5432.
  This server will be known as "node1" in the cluster "test".

* A secondary (standby) server called "node2," running as the "postgres" user
  who is also the owner of the files. This server is operating on port 5432.
  This server will be known as "node2" in the cluster "test".

* Another standby server called "node3" with a similar configuration to "node2".

* The PostgreSQL data directory in each of the above is defined as $PGDATA,
  which is represented here as ``/var/lib/pgsql/9.0/data``

Creating some sample data
-------------------------

If you already have a database with useful data to replicate, you can
skip this step and use it instead. But if you do not already have
data in this cluster to replicate, you can create some like this::

    createdb pgbench
    pgbench -i -s 10 pgbench

Examples below will use the database name ``pgbench`` to match this.
Substitute the name of your database instead. Note that the standby
nodes created here will include information for every database in the
cluster, not just the specified one. Needing the database name is
mainly for user authentication purposes.

Setting up a repmgr user
------------------------

Make sure that the repmgr user has a role in the database, "pgbench" in this
case, and can login. On "node1"::

    createuser --login --superuser repmgr

Alternately you could start ``psql`` on the pgbench database on "node1" and at
the ``pgbench=#`` prompt type::

    CREATE ROLE repmgr SUPERUSER LOGIN;

The main advantage of the latter is that you can do it remotely on any
system you already have superuser access to.

Clearing the PostgreSQL installation on the Standby
---------------------------------------------------

To set up a new streaming replica, start by removing any PostgreSQL
installation on the existing standby nodes.

* Stop any server on "node2" and "node3". You can confirm whether database
  servers are running using a command like this::

      ps -eaf | grep postgres

  Look for the various database server processes: server, logger,
  wal writer, and autovacuum launcher.

* Go to the "node2" and "node3" database directories and remove the PostgreSQL
  installation::

      cd $PGDATA
      rm -rf *

This will delete the entire database installation in ``/var/lib/pgsql/9.0/data``.
Be careful that $PGDATA is defined here; executing ``ls`` to confirm you're
in the right place is always a good idea before executing ``rm``.

Testing remote access to the master
-----------------------------------

On the "node2" server, first test that you can connect to "node1" the
way repmgr will by executing::

    psql -h node1 -U repmgr -d pgbench

Possible sources for a problem here include:

* The login role specified was not created on "node1"

* The database configuration on "node1" is not listening on a TCP/IP port.
  That could be because the ``listen_addresses`` parameter was not updated,
  or because it was but the server wasn't restarted afterwards. You can
  test this on "node1" itself the same way::

      psql -h node1 -U repmgr -d pgbench

  The "-h" parameter forces a connection over TCP/IP, rather
  than the default UNIX socket method.

* There is a firewall setup that prevents incoming access to the
  PostgreSQL port (defaulting to 5432) used to access "node1". In
  this situation you would be able to connect to the "node1" server
  on itself, but not from any other host, and you'd just get a timeout
  when trying rather than a proper error message. (See the quick check
  after this list.)

* The ``pg_hba.conf`` file does not list appropriate statements to allow
  this user to login. In this case you will be able to connect to the server,
  but will see an error message mentioning ``pg_hba.conf``.

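One way to distinguish the firewall case from the others, assuming the
``nc`` (netcat) utility is available on "node2", is to test whether the
port itself is reachable::

    # A firewall typically produces a hang or timeout; an open port
    # reports success and a closed one is refused immediately
    nc -vz node1 5432
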
Cloning the standby
-------------------

With the "node1" server running, we want to use the ``standby clone`` command
in repmgr to copy over the entire PostgreSQL database cluster onto the
"node2" server. Execute the clone process with::

    repmgr -D $PGDATA -d pgbench -p 5432 -U repmgr -R postgres --verbose standby clone node1

Here "-U" specifies the database user to connect to the master as, while
"-R" specifies what user to run the rsync command as. Potentially you
could leave out one or both of these, in situations where the user and/or
role setup is the same on each node.

If this fails with an error message about accessing the master database,
you should return to the previous step and confirm access to "node1"
from "node2" with ``psql``, using the same parameters given to repmgr.

NOTE: you need to have $PGDIR/bin (where the PostgreSQL binaries are installed)
in your path for the above to work. If you don't want that as a permanent
setting, you can temporarily set it before running individual commands like
this::

    PATH=$PGDIR/bin:$PATH repmgr -D $PGDATA ...

Setup repmgr configuration file
-------------------------------

Create a directory to store the repmgr configuration on each node.
In it, there needs to be a ``repmgr.conf`` file for each node in the cluster.
For each node we'll assume this is stored as ``/var/lib/pgsql/repmgr/repmgr.conf``,
following the standard directory structure of a RHEL system. On "node1" it
should contain::

    cluster=test
    node=1
    node_name=earth
    conninfo='host=node1 user=repmgr dbname=pgbench'

On "node2" create the file ``/var/lib/pgsql/repmgr/repmgr.conf`` with::

    cluster=test
    node=2
    node_name=mars
    conninfo='host=node2 user=repmgr dbname=pgbench'

The STANDBY CLONE process should have created a ``recovery.conf`` file on
"node2" in the $PGDATA directory that reads as follows::

    standby_mode = 'on'
    primary_conninfo = 'host=node1 port=5432'

Registering the master and standby
----------------------------------

First, register the master by typing on "node1"::

    repmgr -f /var/lib/pgsql/repmgr/repmgr.conf --verbose master register

Then start the standby server on "node2".

You could now register the standby by typing on "node2"::

    repmgr -f /var/lib/pgsql/repmgr/repmgr.conf --verbose standby register

However, you can instead start repmgrd::

    repmgrd -f /var/lib/pgsql/repmgr/repmgr.conf --daemonize --verbose > /var/lib/pgsql/repmgr/repmgr.log 2>&1

which will automatically register your standby system. And eventually
you need repmgrd running anyway, to save lag monitoring information.
repmgrd will log the daemon activity to the listed file. You can
watch what it is doing with::

    tail -f /var/lib/pgsql/repmgr/repmgr.log

Hit control-C to exit this tail command when you are done.

Monitoring and testing
----------------------

At this point, you have a functioning primary on "node1" and a functioning
standby server running on "node2". You can confirm the master knows
about the standby, and that it is keeping it current, by looking at
``repl_status``::

    postgres@node2 $ psql -x -d pgbench -c "SELECT * FROM repmgr_test.repl_status"
    -[ RECORD 1 ]-------------+------------------------------
    primary_node              | 1
    standby_node              | 2
    last_monitor_time         | 2011-02-23 08:19:39.791974-05
    last_wal_primary_location | 0/1902D5E0
    last_wal_standby_location | 0/1902D5E0
    replication_lag           | 0 bytes
    apply_lag                 | 0 bytes
    time_lag                  | 00:26:13.30293

Some tests you might do at this point include (a quick sketch of these
checks follows the list):

* Insert some records into the primary server here, confirm they appear
  very quickly (within milliseconds) on the standby, and that the
  repl_status view advances accordingly.

* Verify that you can run queries against the standby server, but
  cannot make insertions into the standby database.

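A quick way to run these checks, assuming the pgbench database from
earlier (the table name ``repmgr_smoke_test`` is just an illustration)::

    # On node1 (the primary); this should succeed
    psql -d pgbench -c "CREATE TABLE repmgr_smoke_test (t timestamptz DEFAULT now())"
    psql -d pgbench -c "INSERT INTO repmgr_smoke_test DEFAULT VALUES"

    # On node2 (the standby): the row should appear almost immediately...
    psql -d pgbench -c "SELECT * FROM repmgr_smoke_test"

    # ...while writes should fail with a read-only transaction error
    psql -d pgbench -c "INSERT INTO repmgr_smoke_test DEFAULT VALUES"
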
Simulating the failure of the primary server
--------------------------------------------

To simulate the loss of the primary server, simply stop the "node1" server.
At this point, the standby contains the database as it existed at the time of
the "failure" of the primary server. If you look at ``repl_status`` on
"node2", you should see the time_lag value increase the longer "node1"
is down.

Promoting the Standby to be the Primary
---------------------------------------

Now you can promote the standby server to be the primary, to allow
applications to read and write to the database again, by typing::

    repmgr -f /var/lib/pgsql/repmgr/repmgr.conf --verbose standby promote

The server restarts and now has read/write ability.

Bringing the former Primary up as a Standby
-------------------------------------------

To make the former primary act as a standby, which is necessary before
restoring the original roles, type the following on node1::

    repmgr -D $PGDATA -d pgbench -p 5432 -U repmgr -R postgres --verbose --force standby clone node2

Then start the "node1" server, which is now acting as a standby server.

Check that the record(s) inserted in the earlier step are still available on
the new standby, and confirm the database on "node1" is read-only.

Restoring the original roles of node1 as primary and node2 as standby
----------------------------------------------------------------------

Now restore the original configuration by stopping
"node2" (now acting as a primary), promoting "node1" again to be the
primary server, then bringing up "node2" as a standby with a valid
``recovery.conf`` file.

Stop the "node2" server and type the following on the "node1" server::

    repmgr -f /var/lib/pgsql/repmgr/repmgr.conf standby promote

Now the original primary, "node1", is acting again as primary.

Then re-clone "node2" from the restored primary by typing this on "node2",
and start the "node2" server again::

    repmgr -D $PGDATA -d pgbench -p 5432 -U repmgr -R postgres --verbose --force standby clone node1

Verify the roles have been restored by attempting to insert a record on
"node1" (which should succeed) and on "node2" (which should fail).

The servers are now again acting as primary on "node1" and standby on "node2".

Alternate setup: both servers on one host
==========================================

Another test setup assumes you might be using the default installation of
PostgreSQL on port 5432 for some other purpose, and instead relocates these
instances onto different ports running as different users. In places where
``127.0.0.1`` is used as a host name, a more traditional configuration
would instead use the name of the relevant host for that parameter.
You can usually leave out changes to the port number in this case too.

* A primary (master) server called "prime," running as the "prime" user, who
  is also the owner of the files. This server is operating on port 5433. This
  server will be known as "node1" in the cluster "test"

* A standby server called "standby", with a user of "standby", who is the
  owner of the files. This server is operating on port 5434. This server
  will be known as "node2" in the cluster "test".

* A database exists on "prime" called "testdb."

* The PostgreSQL data directory in each of the above is defined as $PGDATA,
  which is represented here with ``/data/prime`` on the "prime" server and
  ``/data/standby`` on the "standby" server.

You might set up such an installation by adjusting the login script for the
"prime" and "standby" users as in these two examples::

    # prime
    PGDATA=/data/prime
    PGENGINE=/usr/pgsql-9.0/bin
    PGPORT=5433
    export PGDATA PGENGINE PGPORT
    PATH="$PATH:$PGENGINE"

    # standby
    PGDATA=/data/standby
    PGENGINE=/usr/pgsql-9.0/bin
    PGPORT=5434
    export PGDATA PGENGINE PGPORT
    PATH="$PATH:$PGENGINE"

You can then start and stop each installation as needed using the ``pg_ctl``
utility.

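For example, as the "prime" user (the "standby" user works the same way
thanks to the $PGDATA and $PGPORT settings above; the log file path is
just an illustration)::

    $PGENGINE/pg_ctl -D $PGDATA -l /tmp/prime.log start
    $PGENGINE/pg_ctl -D $PGDATA stop
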
Note: naming your nodes based on their starting role is not a recommended
best practice! As you'll see in this example, once there is a failover, names
strongly associated with one particular role (primary or standby) can become
confusing, once that node no longer has that role. Future versions of this
walkthrough are expected to use more generic terminology for these names.

Clearing the PostgreSQL installation on the Standby
---------------------------------------------------

To set up a streaming replica, strip away any PostgreSQL installation on the
existing replica:

* Stop both servers.

* Go to the "standby" database directory and remove the PostgreSQL
  installation::

      cd $PGDATA
      rm -rf *

This will delete the entire database installation in ``/data/standby``.

Building the standby
--------------------

Create a directory to store the repmgr configuration on each node.
In it, there needs to be a ``repmgr.conf`` file for each node in the cluster.
For "prime" we'll assume this is stored in ``/home/prime/repmgr``
and it should contain::

    cluster=test
    node=1
    node_name=earth
    conninfo='host=127.0.0.1 dbname=testdb'

On "standby" create the file ``/home/standby/repmgr/repmgr.conf`` with::

    cluster=test
    node=2
    node_name=mars
    conninfo='host=127.0.0.1 dbname=testdb'

Next, with the "prime" server running, we want to use the ``standby clone``
command in repmgr to copy over the entire PostgreSQL database cluster onto the
"standby" server. On the "standby" server, type::

    repmgr -D $PGDATA -p 5433 -U prime -R prime --verbose standby clone localhost

Next, we need a ``recovery.conf`` file on "standby" in the $PGDATA directory
that reads as follows::

    standby_mode = 'on'
    primary_conninfo = 'host=127.0.0.1 port=5433'

Make sure that standby has a qualifying role in the database, "testdb" in this
case, and can login. Start ``psql`` on the testdb database on "prime" and at
the ``testdb=#`` prompt type::

    CREATE ROLE standby SUPERUSER LOGIN;

Registering the master and standby
----------------------------------

First, register the master by typing on "prime"::

    repmgr -f /home/prime/repmgr/repmgr.conf --verbose master register

On "standby," edit the ``postgresql.conf`` file and change the port to 5434.

Start the "standby" server.

Register the standby by typing on "standby"::

    repmgr -f /home/standby/repmgr/repmgr.conf --verbose standby register

At this point, you have a functioning primary on "prime" and a functioning
standby server running on "standby." You can confirm the master knows
about the standby, and that it is keeping it current, by running the
following on the master::

    psql -x -d testdb -c "SELECT * FROM repmgr_test.repl_status"

Some tests you might do at this point include:

* Insert some records into the primary server here, confirm they appear
  very quickly (within milliseconds) on the standby, and that the
  repl_status view advances accordingly.

* Verify that you can run queries against the standby server, but
  cannot make insertions into the standby database.

Simulating the failure of the primary server
--------------------------------------------

To simulate the loss of the primary server, simply stop the "prime" server.
At this point, the standby contains the database as it existed at the time of
the "failure" of the primary server.

Promoting the Standby to be the Primary
---------------------------------------

Now you can promote the standby server to be the primary, to allow
applications to read and write to the database again, by typing::

    repmgr -f /home/standby/repmgr/repmgr.conf --verbose standby promote

The server restarts and now has read/write ability.

Bringing the former Primary up as a Standby
-------------------------------------------

To make the former primary act as a standby, which is necessary before
restoring the original roles, type the following on "prime" (note that the
clone must connect to the new primary, which is listening on port 5434)::

    repmgr -U standby -R prime -h 127.0.0.1 -p 5434 -d testdb --force --verbose standby clone

Change the port back to 5433 in the ``postgresql.conf`` file on "prime" (the
clone copied the 5434 setting from "standby"), then start the "prime" server,
which is now acting as a standby server.

Make sure the record(s) inserted in the earlier step are still available on
the new standby ("prime"), and confirm the database on "prime" is read-only.

Restoring the original roles of prime to primary and standby to standby
------------------------------------------------------------------------

Now restore the original configuration by stopping
"standby" (now acting as a primary), promoting "prime" again to be the
primary server, then bringing up "standby" as a standby with a valid
``recovery.conf`` file on "standby".

Stop the "standby" server and type the following on "prime"::

    repmgr -f /home/prime/repmgr/repmgr.conf standby promote

Now the original primary, "prime", is acting again as primary.

Then re-clone "standby" from the restored primary by typing this on "standby"::

    repmgr standby clone --force -h 127.0.0.1 -p 5433 -U prime -R prime --verbose

Change the port back to 5434 in the ``postgresql.conf`` file on "standby" (the
clone copied the 5433 setting from "prime"), then start the "standby" server.

Verify the roles have been restored by attempting to insert a record on
"prime" (which should succeed) and on "standby" (which should fail).

The servers are now again acting as primary on "prime" and standby on "standby".

Maintenance of monitor history
------------------------------

Once you have changed roles (with a failover or to restore the original roles)
you would end up with some records saying that node1 is the primary and other
records saying that node2 is the primary, which could be confusing.
Also, if you don't do anything about it, the monitor history will keep growing.
For both of those reasons you will sometimes want to perform some maintenance
on the ``repl_monitor`` table.

If you want to clean the history after a few days you can execute the
CLUSTER CLEANUP command from a cron job. For example, to keep just one day of
history you can put this in your crontab::

    0 1 * * * repmgr cluster cleanup -k 1 -f ~/repmgr.conf

Configuration and command reference
===================================

Configuration File
------------------

``repmgr.conf`` is looked for in the directory repmgrd or repmgr is run from.
The configuration file should have 3 lines:

1. cluster: A string (single quoted) that identifies the cluster we are on

2. node: An integer that identifies our node in the cluster

3. conninfo: A string (single quoted) specifying how we can connect to this node's PostgreSQL service

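A minimal example, matching the first walkthrough above (the ``node_name``
setting shown in the earlier examples is also accepted)::

    cluster=test
    node=1
    node_name=earth
    conninfo='host=node1 user=repmgr dbname=pgbench'
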
repmgr
------

Command line syntax
~~~~~~~~~~~~~~~~~~~

The current supported syntax for the program can be seen using::

    repmgr --help

The output from this program looks like this::

    repmgr: Replicator manager
    Usage:
     repmgr [OPTIONS] master  {register}
     repmgr [OPTIONS] standby {register|clone|promote|follow}

    General options:
      --help                     show this help, then exit
      --version                  output version information, then exit
      --verbose                  output verbose activity information

    Connection options:
      -d, --dbname=DBNAME        database to connect to
      -h, --host=HOSTNAME        database server host or socket directory
      -p, --port=PORT            database server port
      -U, --username=USERNAME    database user name to connect as

    Configuration options:
      -D, --data-dir=DIR             local directory where the files will be copied to
      -f, --config-file=PATH         path to the configuration file
      -R, --remote-user=USERNAME     database server username for rsync
      -w, --wal-keep-segments=VALUE  minimum value for the GUC wal_keep_segments (default: 5000)
      -I, --ignore-rsync-warning     ignore rsync partial transfer warning
      -F, --force                    force potentially dangerous operations to happen

    repmgr performs some tasks like clone a node, promote it or making follow another node and then exits.
    COMMANDS:
     master register       - registers the master in a cluster
     standby register      - registers a standby in a cluster
     standby clone [node]  - allows creation of a new standby
     standby promote       - allows manual promotion of a specific standby into a new master in the event of a failover
     standby follow        - allows the standby to re-point itself to a new master

The ``--verbose`` option can be useful in troubleshooting issues with
the program.

repmgr commands
---------------

Not all of these commands need the ``repmgr.conf`` file, but they all need to
be able to connect to the remote and local databases.

You can specify the remote database either with the -h parameter or as the
last parameter of standby clone and standby follow. If you need to specify
a port different than the default 5432, you can use the -p parameter.
The standby is always assumed to be localhost, and a second -p parameter
indicates its port if it differs from the default one.

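For example, these two invocations name the remote node in the two
equivalent ways just described::

    ./repmgr standby clone node1
    ./repmgr -h node1 -p 5432 standby clone
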
* master register

  * Registers a master in a cluster; it needs to be executed before any
    standby nodes are registered

* standby register

  * Registers a standby in a cluster; it needs to be executed before
    repmgrd will function on the node.

* standby clone [node to be cloned]

  * Does a backup via ``rsync`` of the data directory of the primary, and
    creates the recovery file we need to start a new hot standby server.
    It doesn't need the ``repmgr.conf``, so it can be executed anywhere on
    the new node. You can change to the directory you want the new database
    cluster in and execute::

        ./repmgr standby clone node1

    or run from wherever you are with a full path::

        ./repmgr -D /path/to/new/data/directory standby clone node1

    That will make a backup of the primary; then you only need to start the
    server using a command like::

        pg_ctl -D /your_data_directory_path start

    Note that some installations will also redirect the output log file when
    executing ``pg_ctl``; check the server startup script you are using
    and try to match what it does.

* standby promote

  * Allows manual promotion of a specific standby into a new primary in the
    event of a failover. This needs to be executed in the same directory
    where the ``repmgr.conf`` is on the standby, or you can use the ``-f``
    option to indicate where the ``repmgr.conf`` is at. It doesn't need any
    additional arguments::

        ./repmgr standby promote

    That will restart your standby PostgreSQL service.

* standby follow

  * Makes the standby re-point itself to the new primary. This needs to be
    executed in the same directory where the ``repmgr.conf`` is on the
    standby, or you can use the ``-f`` option to indicate where the
    ``repmgr.conf`` is at. Example::

        ./repmgr standby follow

* cluster show

  * Shows the role (standby/master) and connection string for all nodes
    configured in the cluster, or "FAILED" if a node doesn't respond. This
    lets us know which nodes are alive and which need attention, and gives
    a notion of the structure of clusters we have access to. Example::

        ./repmgr cluster show

* cluster cleanup

  * Cleans the monitor's history from the repmgr tables. This prevents the
    repl_monitor table from growing excessively, which in turn affects
    repl_status view performance; it also keeps the disk space used by
    repmgr under control. This command can be run manually or from a cron
    job to do it periodically.
    There is also a --keep-history (-k) option to indicate how many days of
    history to keep, so the command will clean up history older than
    "keep-history" days. Example::

        ./repmgr cluster cleanup -k 2

repmgrd Daemon
--------------

Command line syntax
~~~~~~~~~~~~~~~~~~~

The current supported syntax for the program can be seen using::

    repmgrd --help

The output from this program looks like this::

    repmgrd: Replicator manager daemon
    Usage:
     repmgrd [OPTIONS]

    Options:
      --help                    show this help, then exit
      --version                 output version information, then exit
      --verbose                 output verbose activity information
      --monitoring-history      track advance or lag of the replication in every standby in repl_monitor
      -f, --config-file=PATH    path to the configuration file
      -d, --daemonize           detach process from foreground
      -p, --pid-file=PATH       write a PID file

    repmgrd monitors a cluster of servers.

The ``--verbose`` option can be useful in troubleshooting issues with
the program.

Usage
-----

repmgrd reads the ``repmgr.conf`` file in the current directory, or as
indicated with the -f parameter. If run on a standby, it checks whether that
standby is in ``repl_nodes`` and adds it if not.

Before you can run repmgrd you need to register a master in a cluster
using the ``MASTER REGISTER`` command. If run on a master,
repmgrd will exit, as it has nothing to do there yet. It is only
targeted at running on standby servers currently. If converting
a former master into a standby, you will need to start repmgrd
in order to make it fully operational in its new role.

The repmgr daemon creates 2 connections: one to the master and another to the
standby.

Lag monitoring
--------------

repmgrd helps monitor a set of master and standby servers. You can
see which node is the current master, as well as how far behind it each
standby is.

To activate the monitoring capabilities of repmgr you must include the
--monitoring-history option when running it::

    repmgrd --monitoring-history --config-file=/path/to/repmgr.conf &

To look at the current lag between the primary and each node listed
in ``repl_node``, consult the ``repl_status`` view::

    psql -d postgres -c "SELECT * FROM repmgr_test.repl_status"

This view shows the latest monitor info from every node.

* replication_lag: in bytes. This is how far the latest xlog record
  we have received is from the master.

* apply_lag: in bytes. This is how far the latest xlog record
  we have applied is from the latest record we have received.

* time_lag: in seconds. How many seconds behind the master this node is.

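For example, to watch just the lag columns for the cluster "test" from the
walkthrough above::

    psql -d postgres -c "SELECT standby_node, replication_lag, apply_lag, time_lag FROM repmgr_test.repl_status"
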
Error codes
-----------

When the repmgr or repmgrd program exits, it will set one of the
following exit codes:

* SUCCESS (0)  Program ran successfully.
* ERR_BAD_CONFIG (1)  One of the configuration checks the program makes failed.
* ERR_BAD_RSYNC (2)  An rsync call made by the program returned an error.
* ERR_NO_RESTART (4)  An attempt to restart a PostgreSQL instance failed.
* ERR_DB_CON (6)  Error when trying to connect to a database.
* ERR_DB_QUERY (7)  Error executing a database query.
* ERR_PROMOTED (8)  Exiting program because the node has been promoted to master.
* ERR_BAD_PASSWORD (9)  Password used to connect to a database was rejected.
* ERR_STR_OVERFLOW (10)  String overflow error.
* ERR_FAILOVER_FAIL (11)  Error encountered during failover (repmgrd only).
* ERR_BAD_SSH (12)  Error when connecting to a remote host via SSH.
* ERR_SYS_FAILURE (13)  Error when forking (repmgrd only).
* ERR_BAD_BASEBACKUP (14)  Error when executing pg_basebackup.

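These exit codes make the programs straightforward to drive from shell
scripts; a minimal sketch::

    #!/bin/sh
    # Promote the standby and report the repmgr exit code on failure
    repmgr -f /var/lib/pgsql/repmgr/repmgr.conf standby promote
    rc=$?
    if [ $rc -ne 0 ]; then
        echo "repmgr exited with code $rc" >&2
        exit $rc
    fi
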
License and Contributions
=========================

repmgr is licensed under the GPL v3. All of its code and documentation is
Copyright 2010-2015, 2ndQuadrant Limited. See the files COPYRIGHT and LICENSE
for details.

Main sponsorship of repmgr has been from 2ndQuadrant customers.

Additional work has been sponsored by the 4CaaST project for cloud computing,
which has received funding from the European Union's Seventh Framework Programme
(FP7/2007-2013) under grant agreement 258862.

Contributions to repmgr are welcome, and will be listed in the file CREDITS.
2ndQuadrant Limited requires that any contributions provide a copyright
assignment and a disclaimer of any work-for-hire ownership claims from the
employer of the developer. This lets us make sure that all of the repmgr
distribution remains free code. Please contact info@2ndQuadrant.com for a
copy of the relevant Copyright Assignment Form.

Code style
----------

Code in repmgr is formatted to a consistent style using the following command::

    astyle --style=ansi --indent=tab --suffix=none *.c *.h

Contributors should reformat their code similarly before submitting code to
the project, in order to minimize merge conflicts with other work.

Support and Assistance
======================

2ndQuadrant provides 24x7 production support for repmgr, and can also help
you configure it correctly, verify an installation, and train you in running
a robust replication cluster.

There is a mailing list/forum to discuss contributions or issues:
http://groups.google.com/group/repmgr

The #repmgr channel is registered on freenode IRC.

Further information is available at http://www.repmgr.org/

We'd love to hear from you about how you use repmgr. Case studies and
news are always welcome. Send us an email at info@2ndQuadrant.com, or
send a postcard to

    repmgr
    c/o 2ndQuadrant
    7200 The Quorum
    Oxford Business Park North
    Oxford
    OX4 2JZ

Thanks from the repmgr core team:

    Jaime Casanova
    Simon Riggs
    Greg Smith
    Cedric Villemain