From 419ce28d3dd9b09f7c0927b6dc6bedf332eb96dd Mon Sep 17 00:00:00 2001 From: Ian Barwick Date: Thu, 26 Mar 2015 11:44:09 +0900 Subject: [PATCH] Remove old README --- OLD-README.rst | 905 ------------------------------------------------- 1 file changed, 905 deletions(-) delete mode 100644 OLD-README.rst diff --git a/OLD-README.rst b/OLD-README.rst deleted file mode 100644 index b4293ddd..00000000 --- a/OLD-README.rst +++ /dev/null @@ -1,905 +0,0 @@ -=================================================== -repmgr: Replication Manager for PostgreSQL clusters -=================================================== - -Introduction -============ - -Introduction to repmgr commands -=============================== - -Suppose we have 3 nodes: node1 (the initial master), node2 and node3. -To make node2 and node3 be standbys of node1, execute this on both nodes -(node2 and node3):: - - repmgr -D /var/lib/pgsql/9.0 standby clone node1 - -In order to get full monitoring and easier state transitions, -you register each of the nodes, by creating a ``repmgr.conf`` file -and executing commands like this on the appropriate nodes:: - - repmgr -f /var/lib/pgsql/repmgr/repmgr.conf --verbose master register - repmgr -f /var/lib/pgsql/repmgr/repmgr.conf --verbose standby register - -Once everything is registered, you start the repmgrd daemon. It -will maintain a view showing the state of all the nodes in the cluster, -including how far they are lagging behind the master. - -If you lose node1 you can then run this on node2:: - - repmgr -f /var/lib/pgsql/repmgr/repmgr.conf standby promote - -To make node2 the new master. Then on node3 run:: - - repmgr -f /var/lib/pgsql/repmgr/repmgr.conf standby follow - -To make node3 follow node2 (rather than node1). - -If now we want to add a new node, we can a prepare a new server (node4) -and run:: - - repmgr -D /var/lib/pgsql/9.0 standby clone node2 - -And if a previously failed node becomes available again, such as -the lost node1 above, you can get it to resynchronize by only copying -over changes made while it was down. That happens with what's -called a forced clone, which overwrites existing data rather than -assuming it starts with an empty database directory tree:: - - repmgr -D /var/lib/pgsql/9.0 --force standby clone node1 - -This can be much faster than creating a brand new node that must -copy over every file in the database. - -Installation Outline -==================== - -To install and use repmgr and repmgrd follow these steps: - -1. Build repmgr programs - -2. Set up trusted copy between postgres accounts, needed for the - ``STANDBY CLONE`` step - -3. Check your primary server is correctly configured - -4. Write a suitable ``repmgr.conf`` for the node - -5. Setup repmgrd to aid in failover transitions - -Confirm software was built correctly -~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - -You should now find the repmgr programs available in the subdirectory where -the rest of your PostgreSQL binary files are located. You can confirm the -software is available by checking its version:: - - repmgr --version - repmgrd --version - -You may need to include the full path of the binary instead, such as this -RHEL example:: - - /usr/pgsql-9.0/bin/repmgr --version - /usr/pgsql-9.0/bin/repmgrd --version - -Or in this Debian example:: - - /usr/lib/postgresql/9.0/bin/repmgr --version - /usr/lib/postgresql/9.0/bin/repmgrd --version - -Below this binary installation base directory is referred to as PGDIR. - -Primary server configuration ----------------------------- - -PostgreSQL should have been previously built and installed on the system. Here -is a sample of changes to the ``postgresql.conf`` file:: - - listen_addresses='*' - wal_level = 'hot_standby' - archive_mode = on - archive_command = 'cd .' # we can also use exit 0, anything that - # just does nothing - max_wal_senders = 10 - wal_keep_segments = 5000 # 80 GB required on pg_xlog - hot_standby = on - -Also you need to add the machines that will participate in the cluster in -``pg_hba.conf`` file. One possibility is to trust all connections from the -replication users from all internal addresses, such as:: - - host all all 192.168.1.0/24 trust - host replication all 192.168.1.0/24 trust - -A more secure setup adds a repmgr user and database, just giving -access to that user:: - - host repmgr repmgr 192.168.1.0/24 trust - host replication all 192.168.1.0/24 trust - -If you give a password to the user, you need to create a ``.pgpass`` file for -them as well to allow automatic login. In this case you might use the -``md5`` authentication method instead of ``trust`` for the repmgr user. - -Don't forget to restart the database server after making all these changes. - -Usage walkthrough -================= - -This assumes you've already followed the steps in "Installation Outline" to -install repmgr and repmgrd on the system. - -A typical production installation of ``repmgr`` might involve two PostgreSQL -instances on seperate servers, both running under the ``postgres`` user account -and both using the default port (5432). This walkthrough assumes the following -setup: - -* A primary (master) server called "node1," running as the "postgres" user - who is also the owner of the files. This server is operating on port 5432. This - server will be known as "node1" in the cluster "test". - -* A secondary (standby) server called "node2," running as the "postgres" user - who is also the owner of the files. This server is operating on port 5432. This - server will be known as "node2" in the cluster "test". - -* Another standby server called "node3" with a similar configuration to "node2". - -* The Postgres installation in each of the above is defined as $PGDATA, - which is represented here as ``/var/lib/pgsql/9.0/data`` - -Creating some sample data -------------------------- - -If you already have a database with useful data to replicate, you can -skip this step and use it instead. But if you do not already have -data in this cluster to replication, you can create some like this:: - - createdb pgbench - pgbench -i -s 10 pgbench - -Examples below will use the database name ``pgbench`` to match this. -Substitute the name of your database instead. Note that the standby -nodes created here will include information for every database in the -cluster, not just the specified one. Needing the database name is -mainly for user authentication purposes. - -Setting up a repmgr user ------------------------- - -Make sure that the "standby" user has a role in the database, "pgbench" in this -case, and can login. On "node1":: - - createuser --login --superuser repmgr - -Alternately you could start ``psql`` on the pgbench database on "node1" and at -the node1b# prompt type:: - - CREATE ROLE repmgr SUPERUSER LOGIN; - -The main advantage of the latter is that you can do it remotely to any -system you already have superuser access to. - -Clearing the PostgreSQL installation on the Standby ---------------------------------------------------- - -To setup a new streaming replica, startin by removing any PostgreSQL -installation on the existing standby nodes. - -* Stop any server on "node2" and "node3". You can confirm the database - servers running using a command like this:: - - ps -eaf | grep postgres - - And looking for the various database server processes: server, logger, - wal writer, and autovacuum launcher. - -* Go to "node2" and "node3" database directories and remove the PostgreSQL installation:: - - cd $PGDATA - rm -rf * - - This will delete the entire database installation in ``/var/lib/pgsql/9.0/data``. - Be careful that $PGDATA is defined here; executing ``ls`` to confirm you're - in the right place is always a good idea before executing ``rm``. - -Testing remote access to the master ------------------------------------ - -On the "node2" server, first test that you can connect to "node1" the -way repmgr will by executing:: - - psql -h node1 -U repmgr -d pgbench - -Possible sources for a problem here include: - -* Login role specified was not created on "node1" - -* The database configuration on "node1" is not listening on a TCP/IP port. - That could be because the ``listen_addresses`` parameter was not updated, - or if it was but the server wasn't restarted afterwards. You can - test this on "node1" itself the same way:: - - psql -h node1 -U repmgr -d pgbench - - With the "-h" parameter forcing a connnection over TCP/IP, rather - than the default UNIX socket method. - -* There is a firewall setup that prevents incoming access to the - PostgreSQL port (defaulting to 5432) used to access "node1". In - this situation you would be able to connect to the "node1" server - on itself, but not from any other host, and you'd just get a timeout - when trying rather than a proper error message. - -* The ``pg_hba.conf`` file does not list appropriate statements to allow - this user to login. In this case you should connect to the server, - but see an error message mentioning the ``pg_hba.conf``. - -Cloning the standby -------------------- - -With "node1" server running, we want to use the ``clone standby`` command -in repmgr to copy over the entire PostgreSQL database cluster onto the -"node2" server. Execute the clone process with:: - - repmgr -D $PGDATA -d pgbench -p 5432 -U repmgr -R postgres --verbose standby clone node1 - -Here "-U" specifies the database user to connect to the master as, while -"-R" specifies what user to run the rsync command as. Potentially you -could leave out one or both of these, in situations where the user and/or -role setup is the same on each node. - -If this fails with an error message about accessing the master database, -you should return to the previous step and confirm access to "node1" -from "node2" with ``psql``, using the same parameters given to repmgr. - -NOTE: you need to have $PGDIR/bin (where the PostgreSQL binaries are installed) -in your path for the above to work. If you don't want that as a permanent -setting, you can temporarily set it before running individual commands like -this:: - - PATH=$PGDIR/bin:$PATH repmgr -D $PGDATA ... - -Setup repmgr configuration file -------------------------------- - -Create a directory to store each repmgr configuration in for each node. -In that, there needs to be a ``repmgr.conf`` file for each node in the cluster. -For each node we'll assume this is stored in ``/var/lib/pgsql/repmgr/repmgr.conf`` -following the standard directory structure of a RHEL system. It should contain:: - - cluster=test - node=1 - node_name=earth - conninfo='host=node1 user=repmgr dbname=pgbench' - -On "node2" create the file ``/var/lib/pgsql/repmgr/repmgr.conf`` with:: - - cluster=test - node=2 - node_name=mars - conninfo='host=node2 user=repmgr dbname=pgbench' - -The STANDBY CLONE process should have created a recovery.conf file on -"node2" in the $PGDATA directory that reads as follows:: - - standby_mode = 'on' - primary_conninfo = 'host=node1 port=5432' - -Registering the master and standby ----------------------------------- - -First, register the master by typing on "node1":: - - repmgr -f /var/lib/pgsql/repmgr/repmgr.conf --verbose master register - -Then start the "standby" server. - -You could now register the standby by typing on "node2":: - - repmgr -f /var/lib/pgsql/repmgr/repmgr.conf --verbose standby register - -However, you can instead start repmgrd:: - - repmgrd -f /var/lib/pgsql/repmgr/repmgr.conf --daemonize --verbose > /var/lib/pgsql/repmgr/repmgr.log 2>&1 - -Which will automatically register your standby system. And eventually -you need repmgrd running anyway, to save lag monitoring information. -repmgrd will log the deamon activity to the listed file. You can -watch what it is doing with:: - - tail -f /var/lib/pgsql/repmgr/repmgr.log - -Hit control-C to exit this tail command when you are done. - -Monitoring and testing ----------------------- - -At this point, you have a functioning primary on "node1" and a functioning -standby server running on "node2". You can confirm the master knows -about the standby, and that it is keeping it current, by looking at -``repl_status``:: - - postgres@node2 $ psql -x -d pgbench -c "SELECT * FROM repmgr_test.repl_status" - -[ RECORD 1 ]-------------+------------------------------ - primary_node | 1 - standby_node | 2 - last_monitor_time | 2011-02-23 08:19:39.791974-05 - last_wal_primary_location | 0/1902D5E0 - last_wal_standby_location | 0/1902D5E0 - replication_lag | 0 bytes - apply_lag | 0 bytes - time_lag | 00:26:13.30293 - -Some tests you might do at this point include: - -* Insert some records into the primary server here, confirm they appear - very quickly (within milliseconds) on the standby, and that the - repl_status view advances accordingly. - -* Verify that you can run queries against the standby server, but - cannot make insertions into the standby database. - -Simulating the failure of the primary server --------------------------------------------- - -To simulate the loss of the primary server, simply stop the "node1" server. -At this point, the standby contains the database as it existed at the time of -the "failure" of the primary server. If looking at ``repl_status`` on -"node2", you should see the time_lag value increase the longer "node1" -is down. - -Promoting the Standby to be the Primary ---------------------------------------- - -Now you can promote the standby server to be the primary, to allow -applications to read and write to the database again, by typing:: - - repmgr -f /var/lib/pgsql/repmgr/repmgr.conf --verbose standby promote - -The server restarts and now has read/write ability. - -Bringing the former Primary up as a Standby -------------------------------------------- - -To make the former primary act as a standby, which is necessary before -restoring the original roles, type the following on node1:: - - repmgr -D $PGDATA -d pgbench -p 5432 -U repmgr -R postgres --verbose --force standby clone node2 - -Then start the "node1" server, which is now acting as a standby server. -Check - -Make sure the record(s) inserted the earlier step are still available on the -now standby (prime). Confirm the database on "node1" is read-only. - -Restoring the original roles of prime to primary and standby to standby ------------------------------------------------------------------------ - -Now restore to the original configuration by stopping -"node2" (now acting as a primary), promoting "node1" again to be the -primary server, then bringing up "node2" as a standby with a valid -``recovery.conf`` file. - -Stop the "node2" server and type the following on "node1" server:: - - repmgr -f /var/lib/pgsql/repmgr/repmgr.conf standby promote - -Now the original primary, "node1", is acting again as primary. - -Start the "node2" server and type this on "node2":: - - repmgr standby clone --force -h node2 -p 5432 -U postgres -R postgres --verbose - -Verify the roles have reversed by attempting to insert a record on "node1" -and on "node2". - -The servers are now again acting as primary on "node1" and standby on "node2". - -Alternate setup: both servers on one host -========================================== - -Another test setup assumes you might be using the default installation of -PostgreSQL on port 5432 for some other purpose, and instead relocates these -instances onto different ports running as different users. In places where -``127.0.0.1`` is used as a host name, a more traditional configuration -would instead use the name of the relevant host for that parameter. -You can usually leave out changes to the port number in this case too. - -* A primary (master) server called "prime," with a user as "prime," who is - also the owner of the files. This server is operating on port 5433. This - server will be known as "node1" in the cluster "test" - -* A standby server called "standby", with a user of "standby", who is the - owner of the files. This server is operating on port 5434. This server - will be known and "node2" on the cluster "test." - -* A database exists on "prime" called "testdb." - -* The Postgres installation in each of the above is defined as $PGDATA, - which is represented here with ``/data/prime`` as the "prime" server and - ``/data/standby`` as the "standby" server. - -You might setup such an installation by adjusting the login script for the -"prime" and "standby" users as in these two examples:: - - # prime - PGDATA=/data/prime - PGENGINE=/usr/pgsql-9.0/bin - PGPORT=5433 - export PGDATA PGENGINE PGPORT - PATH="$PATH:$PGENGINE" - - # standby - PGDATA=/data/standby - PGENGINE=/usr/pgsql-9.0/bin - PGPORT=5434 - export PGDATA PGENGINE PGPORT - PATH="$PATH:$PGENGINE" - -And then starting/stopping each installation as needed using the ``pg_ctl`` -utility. - -Note: naming your nodes based on their starting role is not a recommended -best practice! As you'll see in this example, once there is a failover, names -strongly associated with one particular role (primary or standby) can become -confusing, once that node no longer has that role. Future versions of this -walkthrough are expected to use more generic terminology for these names. - -Clearing the PostgreSQL installation on the Standby ---------------------------------------------------- - -Setup a streaming replica, strip away any PostgreSQL installation on the existing replica: - -* Stop both servers. - -* Go to "standby" database directory and remove the PostgreSQL installation:: - - cd $PGDATA - rm -rf * - - This will delete the entire database installation in ``/data/standby``. - -Building the standby --------------------- - -Create a directory to store each repmgr configuration in for each node. -In that, there needs to be a ``repmgr.conf`` file for each node in the cluster. -For "prime" we'll assume this is stored in ``/home/prime/repmgr`` -and it should contain:: - - cluster=test - node=1 - node_name=earth - conninfo='host=127.0.0.1 dbname=testdb' - -On "standby" create the file ``/home/standby/repmgr/repmgr.conf`` with:: - - cluster=test - node=2 - node_name=mars - conninfo='host=127.0.0.1 dbname=testdb' - -Next, with "prime" server running, we want to use the ``clone standby`` command -in repmgr to copy over the entire PostgreSQL database cluster onto the -"standby" server. On the "standby" server, type:: - - repmgr -D $PGDATA -p 5433 -U prime -R prime --verbose standby clone localhost - -Next, we need a recovery.conf file on "standby" in the $PGDATA directory -that reads as follows:: - - standby_mode = 'on' - primary_conninfo = 'host=127.0.0.1 port=5433' - -Make sure that standby has a qualifying role in the database, "testdb" in this -case, and can login. Start ``psql`` on the testdb database on "prime" and at -the testdb# prompt type:: - - CREATE ROLE standby SUPERUSER LOGIN - -Registering the master and standby ----------------------------------- - -First, register the master by typing on "prime":: - - repmgr -f /home/prime/repmgr/repmgr.conf --verbose master register - -On "standby," edit the ``postgresql.conf`` file and change the port to 5434. - -Start the "standby" server. - -Register the standby by typing on "standby":: - - repmgr -f /home/standby/repmgr/repmgr.conf --verbose standby register - -At this point, you have a functioning primary on "prime" and a functioning -standby server running on "standby." You can confirm the master knows -about the standby, and that it is keeping it current, by running the -following on the master:: - - psql -x -d pgbench -c "SELECT * FROM repmgr_test.repl_status" - -Some tests you might do at this point include: - -* Insert some records into the primary server here, confirm they appear - very quickly (within milliseconds) on the standby, and that the - repl_status view advances accordingly. - -* Verify that you can run queries against the standby server, but - cannot make insertions into the standby database. - -Simulating the failure of the primary server --------------------------------------------- - -To simulate the loss of the primary server, simply stop the "prime" server. -At this point, the standby contains the database as it existed at the time of -the "failure" of the primary server. - -Promoting the Standby to be the Primary ---------------------------------------- - -Now you can promote the standby server to be the primary, to allow -applications to read and write to the database again, by typing:: - - repmgr -f /home/standby/repmgr/repmgr.conf --verbose standby promote - -The server restarts and now has read/write ability. - -Bringing the former Primary up as a Standby -------------------------------------------- - -To make the former primary act as a standby, which is necessary before -restoring the original roles, type:: - - repmgr -U standby -R prime -h 127.0.0.1 -p 5433 -d testdb --force --verbose standby clone - -Stop and restart the "prime" server, which is now acting as a standby server. - -Make sure the record(s) inserted the earlier step are still available on the -now standby (prime). Confirm the database on "prime" is read-only. - -Restoring the original roles of prime to primary and standby to standby ------------------------------------------------------------------------ - -Now restore to the original configuration by stopping the -"standby" (now acting as a primary), promoting "prime" again to be the -primary server, then bringing up "standby" as a standby with a valid -``recovery.conf`` file on "standby". - -Stop the "standby" server:: - - repmgr -f /home/prime/repmgr/repmgr.conf standby promote - -Now the original primary, "prime" is acting again as primary. - -Start the "standby" server and type this on "prime":: - - repmgr standby clone --force -h 127.0.0.1 -p 5434 -U prime -R standby --verbose - -Stop the "standby" and change the port to be 5434 in the ``postgresql.conf`` -file. - -Verify the roles have reversed by attempting to insert a record on "standby" -and on "prime." - -The servers are now again acting as primary on "prime" and standby on "standby". - -Maintainance of monitor history -------------------------------- - -Once you have changed roles (with a failover or to restore original roles) -you would end up with records saying that node1 is primary and other records -saying that node2 is the primary. Which could be confusing. -Also, if you don't do anything about it the monitor history will keep growing. -For both of those reasons you sometime want to make some maintainance of the -``repl_monitor`` table. - -If you want to clean the history after a few days you can execute the -CLUSTER CLEANUP command in a cron. For example to keep just one day of history -you can put this in your crontab:: - -0 1 * * * repmgr cluster cleanup -k 1 -f ~/repmgr.conf - -Configuration and command reference -=================================== - -Configuration File ------------------- - -``repmgr.conf`` is looked for in the directory repmgrd or repmgr exists in. -The configuration file should have 3 lines: - -1. cluster: A string (single quoted) that identify the cluster we are on - -2. node: An integer that identify our node in the cluster - -3. conninfo: A string (single quoted) specifying how we can connect to this node's PostgreSQL service - -repmgr ------- - -Command line syntax -~~~~~~~~~~~~~~~~~~~ - -The current supported syntax for the program can be seen using:: - - repmgr --help - -The output from this program looks like this:: - - repmgr: Replicator manager - Usage: - repmgr [OPTIONS] master {register} - repmgr [OPTIONS] standby {register|clone|promote|follow} - - General options: - --help show this help, then exit - --version output version information, then exit - --verbose output verbose activity information - - Connection options: - -d, --dbname=DBNAME database to connect to - -h, --host=HOSTNAME database server host or socket directory - -p, --port=PORT database server port - -U, --username=USERNAME database user name to connect as - - Configuration options: - -D, --data-dir=DIR local directory where the files will be copied to - -f, --config-file=PATH path to the configuration file - -R, --remote-user=USERNAME database server username for rsync - -w, --wal-keep-segments=VALUE minimum value for the GUC wal_keep_segments (default: 5000) - -I, --ignore-rsync-warning ignore rsync partial transfer warning - -F, --force force potentially dangerous operations to happen - - repmgr performs some tasks like clone a node, promote it or making follow another node and then exits. - COMMANDS: - master register - registers the master in a cluster - standby register - registers a standby in a cluster - standby clone [node] - allows creation of a new standby - standby promote - allows manual promotion of a specific standby into a new master in the event of a failover - standby follow - allows the standby to re-point itself to a new master - -The ``--verbose`` option can be useful in troubleshooting issues with -the program. - -repmgr commands ---------------- - -Not all of these commands need the ``repmgr.conf`` file, but they need to be able to -connect to the remote and local databases. - -You can teach it which is the remote database by using the -h parameter or -as a last parameter in standby clone and standby follow. If you need to specify -a port different then the default 5432 you can specify a -p parameter. -Standby is always considered as localhost and a second -p parameter will indicate -its port if is different from the default one. - -* master register - - * Registers a master in a cluster, it needs to be executed before any - standby nodes are registered - -* standby register - - * Registers a standby in a cluster, it needs to be executed before - repmgrd will function on the node. - -* standby clone [node to be cloned] - - * Does a backup via ``rsync`` of the data directory of the primary. And it - creates the recovery file we need to start a new hot standby server. - It doesn't need the ``repmgr.conf`` so it can be executed anywhere on the - new node. You can change to the directory you want the new database - cluster at and execute:: - - ./repmgr standby clone node1 - - or run from wherever you are with a full path:: - - ./repmgr -D /path/to/new/data/directory standby clone node1 - - That will make a backup of the primary then you only need to start the server - using a command like:: - - pg_ctl -D /your_data_directory_path start - - Note that some installations will also redirect the output log file when - executing ``pg_ctl``; check the server startup script you are using - and try to match what it does. - -* standby promote - - * Allows manual promotion of a specific standby into a new primary in the - event of a failover. This needs to be executed on the same directory - where the ``repmgr.conf`` is in the standby, or you can use the ``-f`` option - to indicate where the ``repmgr.conf`` is at. It doesn't need any - additional arguments:: - - ./repmgr standby promote - - That will restart your standby postgresql service. - -* standby follow - - * Allows the standby to base itself to the new primary passed as a - parameter. This needs to be executed on the same directory where the - ``repmgr.conf`` is in the standby, or you can use the ``-f`` option - to indicate where the ``repmgr.conf`` is at. Example:: - - ./repmgr standby follow - -* cluster show - - * Shows the role (standby/master) and connection string for all nodes configured - in the cluster or "FAILED" if the node doesn't respond. This allow us to know - which nodes are alive and which one needs attention and to have a notion of the - structure of clusters we just have access to. Example:: - - ./repmgr cluster show - -* cluster cleanup - - * Cleans the monitor's history from repmgr tables. This avoids the repl_monitor table - to grow excesivelly which in turns affects repl_status view performance, also - keeps controlled the space in disk used by repmgr. This command can be used manually - or in a cron to make it periodically. - There is also a --keep-history (-k) option to indicate how many days of history we - want to keep, so the command will clean up history older than "keep-history" days. Example:: - - ./repmgr cluster cleanup -k 2 - -repmgrd Daemon --------------- - -Command line syntax -~~~~~~~~~~~~~~~~~~~ - -The current supported syntax for the program can be seen using:: - - repmgrd --help - -The output from this program looks like this:: - - repmgrd: Replicator manager daemon - Usage: - repmgrd [OPTIONS] - - Options: - --help show this help, then exit - --version output version information, then exit - --verbose output verbose activity information - --monitoring-history track advance or lag of the replication in every standby in repl_monitor - -f, --config-file=PATH path to the configuration file - -d, --daemonize detach process from foregroun - -p, --pid-file=PATH write a PID file - - repmgrd monitors a cluster of servers. - -The ``--verbose`` option can be useful in troubleshooting issues with -the program. - -Usage ------ - -repmgrd reads the ``repmgr.conf`` file in current directory, or as -indicated with -f parameter. If run on a standby, it checks if that -standby is in ``repl_nodes`` and adds it if not. - -Before you can run repmgrd you need to register a master in a cluster -using the ``MASTER REGISTER`` command. If run on a master, -repmgrd will exit, as it has nothing to do on them yet. It is only -targeted at running on standby servers currently. If converting -a former master into a standby, you will need to start repmgrd -in order to make it fully operational in its new role. - -The repmgr daemon creates 2 connections: one to the master and another to the -standby. - -Lag monitoring --------------- - -repmgrd helps monitor a set of master and standby servers. You can -see which node is the current master, as well as how far behind each -is from current. -To activate the monitor capabilities of repmgr you must include the -option --monitoring-history when running it:: - - repmgrd --monitoring-history --config-file=/path/to/repmgr.conf & - -To look at the current lag between primary and each node listed -in ``repl_node``, consult the ``repl_status`` view:: - - psql -d postgres -c "SELECT * FROM repmgr_test.repl_status" - -This view shows the latest monitor info from every node. - -* replication_lag: in bytes. This is how far the latest xlog record - we have received is from master. - -* apply_lag: in bytes. This is how far the latest xlog record - we have applied is from the latest record we have received. - -* time_lag: in seconds. How many seconds behind the master is this node. - -Error codes ------------ - -When the repmgr or repmgrd program exits, it will set one of the -following - -* SUCCESS (0) Program ran successfully. -* ERR_BAD_CONFIG (1) One of the configuration checks the program makes failed. -* ERR_BAD_RSYNC (2) An rsync call made by the program returned an error. -* ERR_NO_RESTART (4) An attempt to restart a PostgreSQL instance failed. -* ERR_DB_CON (6) Error when trying to connect to a database. -* ERR_DB_QUERY (7) Error executing a database query. -* ERR_PROMOTED (8) Exiting program because the node has been promoted to master. -* ERR_BAD_PASSWORD (9) Password used to connect to a database was rejected. -* ERR_STR_OVERFLOW (10) String overflow error -* ERR_FAILOVER_FAIL (11) Error encountered during failover (repmgrd only) -* ERR_BAD_SSH (12) Error when connecting to remote host via SSH -* ERR_SYS_FAILURE (13) Error when forking (repmgrd only) -* ERR_BAD_BASEBACKUP (14) Error when executing pg_basebackup - -License and Contributions -========================= - -repmgr is licensed under the GPL v3. All of its code and documentation is -Copyright 2010-2015, 2ndQuadrant Limited. See the files COPYRIGHT and LICENSE for -details. - -Main sponsorship of repmgr has been from 2ndQuadrant customers. - -Additional work has been sponsored by the 4CaaST project for cloud computing, -which has received funding from the European Union's Seventh Framework Programme -(FP7/2007-2013) under grant agreement 258862. - -Contributions to repmgr are welcome, and will be listed in the file CREDITS. -2ndQuadrant Limited requires that any contributions provide a copyright -assignment and a disclaimer of any work-for-hire ownership claims from the -employer of the developer. This lets us make sure that all of the repmgr -distribution remains free code. Please contact info@2ndQuadrant.com for a -copy of the relevant Copyright Assignment Form. - -Code style ----------- - -Code in repmgr is formatted to a consistent style using the following command:: - - astyle --style=ansi --indent=tab --suffix=none *.c *.h - -Contributors should reformat their code similarly before submitting code to -the project, in order to minimize merge conflicts with other work. - -Support and Assistance -====================== - -2ndQuadrant provides 24x7 production support for repmgr, as well as help you -configure it correctly, verify an installation and train you in running a -robust replication cluster. - -There is a mailing list/forum to discuss contributions or issues -http://groups.google.com/group/repmgr - -#repmgr is registered in freenode IRC - -Further information is available at http://www.repmgr.org/ - -We'd love to hear from you about how you use repmgr. Case studies and -news are always welcome. Send us an email at info@2ndQuadrant.com, or -send a postcard to - -repmgr -c/o 2ndQuadrant -7200 The Quorum -Oxford Business Park North -Oxford -OX4 2JZ - -Thanks from the repmgr core team - -Jaime Casanova -Simon Riggs -Greg Smith -Cedric Villemain