Update README with standard two-host examples
README.rst

@@ -250,7 +250,7 @@ on their partner node without a password.
First generate an ssh key, using an empty passphrase, and copy the resulting
keys and a matching authorization file to a privileged user on the other
system::

  [postgres@node1]$ ssh-keygen -t rsa
  Generating public/private rsa key pair.
  Enter file in which to save the key (/var/lib/pgsql/.ssh/id_rsa):
  Enter passphrase (empty for no passphrase):

@@ -259,19 +259,19 @@ keys and a matching authorization file to a privileged user on the other system::
  Your public key has been saved in /var/lib/pgsql/.ssh/id_rsa.pub.
  The key fingerprint is:
  aa:bb:cc:dd:ee:ff:aa:11:22:33:44:55:66:77:88:99 postgres@node1.domain.com
  [postgres@node1]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  [postgres@node1]$ chmod go-rwx ~/.ssh/*
  [postgres@node1]$ cd ~/.ssh
  [postgres@node1]$ scp id_rsa.pub id_rsa authorized_keys postgres@node2:

Log in as a user on the other system, and install the files into the postgres
user's account::

  [user@node2 ~]$ sudo chown postgres.postgres authorized_keys id_rsa.pub id_rsa
  [user@node2 ~]$ sudo mkdir -p ~postgres/.ssh
  [user@node2 ~]$ sudo chown postgres.postgres ~postgres/.ssh
  [user@node2 ~]$ sudo mv authorized_keys id_rsa.pub id_rsa ~postgres/.ssh
  [user@node2 ~]$ sudo chmod -R go-rwx ~postgres/.ssh

Now test that ssh in both directions works; you may have to accept some new
known hosts in the process.

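For example, a quick round-trip check (a sketch, assuming the two hosts are
reachable as "node1" and "node2")::

  [postgres@node1]$ ssh postgres@node2 true && echo "node1 -> node2 OK"
  [postgres@node2]$ ssh postgres@node1 true && echo "node2 -> node1 OK"

If either command prompts for a password, revisit the key setup above.
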
@@ -381,13 +381,13 @@ its port if it is different from the default one.
* master register

  * Registers a master in a cluster; it needs to be executed before any
    standby nodes are registered (typical invocations are sketched after
    this list).

* standby register

  * Registers a standby in a cluster; it needs to be executed before
    repmgrd will function on the node.

* standby clone [node to be cloned]

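As a rough sketch of typical invocations (the configuration file path is an
assumption; the detailed walkthrough below uses concrete values)::

  repmgr -f /path/to/repmgr.conf --verbose master register
  repmgr -f /path/to/repmgr.conf --verbose standby register
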
@@ -432,8 +432,8 @@ its port if it is different from the default one.
  ./repmgr standby follow

Brief examples
==============

Suppose we have 3 nodes: node1 (the initial master), node2 and node3

@@ -464,8 +464,8 @@ this::
  PATH=$PGDIR/bin:$PATH repmgr standby promote

repmgrd Daemon
==============

Command line syntax
-------------------

@@ -507,7 +507,7 @@ Lag monitoring
To look at the current lag between primary and each node listed
in ``repl_node``, consult the ``repl_status`` view::

  psql -d postgres -c "SELECT * FROM repmgr_test.repl_status"

This view shows the latest monitor info from every node.

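For continuous monitoring, one option (a sketch, assuming the standard
``watch`` utility is available) is to re-run that query every few seconds::

  watch -n 5 'psql -x -d postgres -c "SELECT * FROM repmgr_test.repl_status"'
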
@@ -539,52 +539,260 @@ and the same in the standby.
The repmgr daemon creates 2 connections: one to the master and another to the
standby.

Detailed walkthrough
====================

This assumes you've already followed the steps in "Installation Outline" to
install repmgr and repmgrd on the system.

A normal production installation of ``repmgr`` will involve two
different systems running on the same port, typically the default of 5432,
with both using files owned by the ``postgres`` user account. This
walkthrough assumes the following setup:

* A primary (master) server called "node1", running as the "postgres" user,
  who is also the owner of the files. This server is operating on port 5432
  and will be known as "node1" in the cluster "test".

* A secondary (standby) server called "node2", running as the "postgres" user,
  who is also the owner of the files. This server is operating on port 5432
  and will be known as "node2" in the cluster "test".

* Another standby server called "node3" with a similar configuration to "node2".

* The PostgreSQL installation in each of the above is defined as $PGDATA,
  which is represented here as ``/var/lib/pgsql/9.0/data``.

Creating some sample data
-------------------------

If you already have a database with useful data to replicate, you can
skip this step and use it instead. But if you do not already have
data in this cluster to replicate, you can create some like this::

  createdb pgbench
  pgbench -i -s 10 pgbench

Examples below will use the database name ``pgbench`` to match this;
substitute the name of your database instead. Note that the standby
nodes created here will include information for every database in the
cluster, not just the specified one. Needing the database name is
mainly for user authentication purposes.

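If you want ongoing write activity to observe on the standby later, one
option (purely an illustration; repmgr does not require it) is a longer
pgbench run against the new database::

  pgbench -c 4 -T 600 pgbench
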
Setting up a repmgr user
------------------------

Make sure that the "repmgr" user has a role in the database, "pgbench" in this
case, and can log in. On "node1"::

  createuser --login --superuser repmgr

Alternately you could start ``psql`` on the pgbench database on "node1" and at
the pgbench# prompt type::

  CREATE ROLE repmgr SUPERUSER LOGIN;

The main advantage of the latter is that you can do it remotely to any
system you already have superuser access to, as shown in the sketch below.

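For example, a sketch of creating the role remotely, assuming the ``postgres``
superuser can already connect to "node1" from where you are::

  psql -h node1 -U postgres -d pgbench -c "CREATE ROLE repmgr SUPERUSER LOGIN;"
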
Clearing the PostgreSQL installation on the Standby
---------------------------------------------------

To set up a new streaming replica, start by removing any PostgreSQL
installation on the existing standby nodes.

* Stop any server on "node2" and "node3". You can confirm whether any
  database server is still running using a command like this::

    ps -eaf | grep postgres

  and looking for the various database server processes: server, logger,
  wal writer, and autovacuum launcher.

* Go to the "node2" and "node3" database directories and remove the
  PostgreSQL installation::

    cd $PGDATA
    rm -rf *

This will delete the entire database installation in ``/var/lib/pgsql/9.0/data``.
Be careful that $PGDATA is defined here; executing ``ls`` to confirm you're
in the right place is always a good idea before executing ``rm``. A more
defensive variant is sketched below.

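A minimal sketch of that more defensive variant, assuming $PGDATA should point
at a real data directory (the ``PG_VERSION`` check is just an illustration)::

  if [ -n "$PGDATA" ] && [ -f "$PGDATA/PG_VERSION" ]; then
      # Remove the cluster contents, but only if this looks like a data directory
      rm -rf "$PGDATA"/*
  else
      echo "PGDATA unset or not a data directory; nothing removed" >&2
  fi
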
Testing remote access to the master
-----------------------------------

On the "node2" server, first test that you can connect to "node1" the
way repmgr will by executing::

  psql -h node1 -U repmgr -d pgbench

Possible sources for a problem here include the following; an example
``pg_hba.conf`` entry is sketched after this list.

* The login role specified was not created on "node1".

* The database configuration on "node1" is not listening on a TCP/IP port.
  That could be because the ``listen_addresses`` parameter was not updated,
  or it was but the server wasn't restarted afterwards. You can
  test this on "node1" itself the same way::

    psql -h node1 -U repmgr -d pgbench

  with the "-h" parameter forcing a connection over TCP/IP, rather
  than the default UNIX socket method.

* There is a firewall setup that prevents incoming access to the
  PostgreSQL port (defaulting to 5432) used to access "node1". In
  this situation you would be able to connect to the "node1" server
  on itself, but not from any other host, and you'd just get a timeout
  when trying rather than a proper error message.

* The ``pg_hba.conf`` file does not list appropriate statements to allow
  this user to log in. In this case you should connect to the server,
  but see an error message mentioning ``pg_hba.conf``.

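As a hedged illustration of such an entry (the subnet is an assumption;
substitute your own network, and prefer an authenticated method such as
``md5`` outside a test lab)::

  # TYPE  DATABASE  USER    ADDRESS          METHOD
  host    pgbench   repmgr  192.168.1.0/24   trust

Remember to reload the server after editing ``pg_hba.conf``.
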
Cloning the standby
-------------------

With the "node1" server running, we want to use the ``standby clone`` command
in repmgr to copy over the entire PostgreSQL database cluster onto the
"node2" server. Execute the clone process with::

  repmgr -D $PGDATA -d pgbench -p 5432 -U repmgr -R postgres --verbose standby clone node1

Here "-U" specifies the database user to connect to the master as, while
"-R" specifies what user to run the rsync command as. Potentially you
could leave out one or both of these, in situations where the user and/or
role setup is the same on each node.

If this fails with an error message about accessing the master database,
you should return to the previous step and confirm access to "node1"
from "node2" with ``psql``, using the same parameters given to repmgr.

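For instance, a connection check mirroring the parameters given to repmgr
above::

  psql -h node1 -p 5432 -U repmgr -d pgbench -c "SELECT version()"
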
Setup repmgr configuration file
-------------------------------

Create a directory to hold the repmgr configuration on each node;
in it, there needs to be a ``repmgr.conf`` file for each node in the cluster.
For each node we'll assume this is stored as ``/var/lib/pgsql/repmgr/repmgr.conf``,
following the standard directory structure of a RHEL system. On "node1" it
should contain::

  cluster=test
  node=1
  conninfo='host=node1 user=repmgr dbname=pgbench'

On "node2" create the file ``/var/lib/pgsql/repmgr/repmgr.conf`` with::

  cluster=test
  node=2
  conninfo='host=node2 user=repmgr dbname=pgbench'

The STANDBY CLONE process should have created a ``recovery.conf`` file on
"node2" in the $PGDATA directory that reads as follows::

  standby_mode = 'on'
  primary_conninfo = 'host=node1 port=5432'

Registering the master and standby
----------------------------------

First, register the master by typing on "node1"::

  repmgr -f /var/lib/pgsql/repmgr/repmgr.conf --verbose master register

Start the "node2" server.

Register the standby by typing on "node2"::

  repmgr -f /var/lib/pgsql/repmgr/repmgr.conf --verbose standby register

At this point, you have a functioning primary on "node1" and a functioning
standby server running on "node2". You can confirm the master knows
about the standby, and that it is keeping it current, by running the
following on the master::

  psql -x -d pgbench -c "SELECT * FROM repmgr_test.repl_status"

Some tests you might do at this point include (a sketch of the first
appears after this list):

* Insert some records into the primary server here, confirm they appear
  very quickly (within milliseconds) on the standby, and that the
  repl_status view advances accordingly.

* Verify that you can run queries against the standby server, but
  cannot make insertions into the standby database.

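A minimal sketch of that first test (the table name is just an illustration)::

  # On node1, the primary:
  psql -d pgbench -c "CREATE TABLE repl_test (noted timestamptz)"
  psql -d pgbench -c "INSERT INTO repl_test VALUES (now())"

  # On node2, the standby; the SELECT should show the row almost
  # immediately, while the INSERT should fail with a read-only error:
  psql -d pgbench -c "SELECT * FROM repl_test"
  psql -d pgbench -c "INSERT INTO repl_test VALUES (now())"
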
Simulating the failure of the primary server
--------------------------------------------

To simulate the loss of the primary server, simply stop the "node1" server.
At this point, the standby contains the database as it existed at the time of
the "failure" of the primary server.

Promoting the Standby to be the Primary
---------------------------------------

Now you can promote the standby server to be the primary, to allow
applications to read and write to the database again, by typing::

  repmgr -f /var/lib/pgsql/repmgr/repmgr.conf --verbose standby promote

The server restarts and now has read/write ability.

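One quick way to confirm the promotion took effect (``pg_is_in_recovery()``
returns false on a primary and true on a standby)::

  psql -h node2 -d pgbench -c "SELECT pg_is_in_recovery()"
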
Bringing the former Primary up as a Standby
-------------------------------------------

To make the former primary act as a standby, which is necessary before
restoring the original roles, type this on "node1" (note that it clones
from "node2", the current primary)::

  repmgr -U postgres -R postgres -h node2 -p 5432 -d pgbench --force --verbose standby clone

Stop and restart the "node1" server, which is now acting as a standby server.

Make sure the record(s) inserted in the earlier step are still available on
the now-standby "node1". Confirm the database on "node1" is read-only.

Restoring the original roles of node1 to primary and node2 to standby
----------------------------------------------------------------------

Now restore the original configuration by stopping
"node2" (now acting as the primary), promoting "node1" again to be the
primary server, then bringing up "node2" as a standby with a valid
``recovery.conf`` file.

Stop the "node2" server, then promote "node1" by typing on "node1"::

  repmgr -f /var/lib/pgsql/repmgr/repmgr.conf standby promote

Now the original primary, "node1", is acting again as primary.

Re-clone "node2" from the restored primary by typing this on "node2"::

  repmgr standby clone --force -h node1 -p 5432 -U postgres -R postgres --verbose

Then start the "node2" server.

Verify the roles have reversed by attempting to insert a record on "node2"
and on "node1".

The servers are now again acting as primary on "node1" and standby on "node2".

Alternate setup: both servers on one host
==========================================

Another test setup assumes you might be using the default installation of
PostgreSQL on port 5432 for some other purpose, and instead relocates these
instances onto different ports running as different users. In places where
``127.0.0.1`` is used as a host name, a more traditional configuration
would instead use the name of the relevant host for that parameter.
You can usually leave out changes to the port number in this case too.

* A primary (master) server called "prime", with a user named "prime", who is
  also the owner of the files. This server is operating on port 5433. This
  server will be known as "node1" in the cluster "test".

* A standby server called "standby", with a user named "standby", who is the
  owner of the files. This server is operating on port 5434. This server
  will be known as "node2" in the cluster "test."

* A database exists on "prime" called "testdb."

* The PostgreSQL installation in each of the above is defined as $PGDATA,
  which is represented here with ``/data/prime`` as the "prime" server and
  ``/data/standby`` as the "standby" server.

@@ -623,7 +831,7 @@ Setup a streaming replica, strip away any PostgreSQL installation on the existin
* Stop both servers.

* Go to the "standby" database directory and remove the PostgreSQL
  installation::

    cd $PGDATA
    rm -rf *

@@ -635,33 +843,33 @@ Building the standby
Create a directory to hold the repmgr configuration on each node;
in it, there needs to be a ``repmgr.conf`` file for each node in the cluster.
For "prime" we'll assume this is stored in ``/home/prime/repmgr``
and it should contain::

  cluster=test
  node=1
  conninfo='host=127.0.0.1 dbname=testdb'

On "standby" create the file ``/home/standby/repmgr/repmgr.conf`` with::

  cluster=test
  node=2
  conninfo='host=127.0.0.1 dbname=testdb'

Next, with the "prime" server running, we want to use the ``standby clone``
command in repmgr to copy over the entire PostgreSQL database cluster onto
the "standby" server. On the "standby" server, type::

  repmgr -D $PGDATA -p 5433 -U prime -R prime --verbose standby clone localhost

Next, we need a ``recovery.conf`` file on "standby" in the $PGDATA directory
that reads as follows::

  standby_mode = 'on'
  primary_conninfo = 'host=127.0.0.1 port=5433'

Make sure that "standby" has a qualifying role in the database, "testdb" in
this case, and can log in. Start ``psql`` on the testdb database on "prime"
and at the testdb# prompt type::

  CREATE ROLE standby SUPERUSER LOGIN;

@@ -669,30 +877,40 @@ the testdb# prompt type::
Registering the master and standby
----------------------------------

First, register the master by typing on "prime"::

  repmgr -f /home/prime/repmgr/repmgr.conf --verbose master register

On "standby", edit the ``postgresql.conf`` file and change the port to 5434.

Start the "standby" server.

Register the standby by typing on "standby"::

  repmgr -f /home/standby/repmgr/repmgr.conf --verbose standby register

At this point, you have a functioning primary on "prime" and a functioning
standby server running on "standby." You can confirm the master knows
about the standby, and that it is keeping it current, by running the
following on the master::

  psql -x -d testdb -c "SELECT * FROM repmgr_test.repl_status"

Some tests you might do at this point include:

* Insert some records into the primary server here, confirm they appear
  very quickly (within milliseconds) on the standby, and that the
  repl_status view advances accordingly.

* Verify that you can run queries against the standby server, but
  cannot make insertions into the standby database.

Simulating the failure of the primary server
--------------------------------------------

To simulate the loss of the primary server, simply stop the "prime" server.
At this point, the standby contains the database as it existed at the time of
the "failure" of the primary server.

Promoting the Standby to be the Primary
---------------------------------------

@@ -712,36 +930,54 @@ restoring the original roles, type::
  repmgr -U standby -R prime -h 127.0.0.1 -p 5433 -d testdb --force --verbose standby clone

Stop and restart the "prime" server, which is now acting as a standby server.

Make sure the record(s) inserted in the earlier step are still available on
the now-standby "prime". Confirm the database on "prime" is read-only.

Restoring the original roles of prime to primary and standby to standby
-----------------------------------------------------------------------

Now restore the original configuration by stopping the
"standby" (now acting as a primary), promoting "prime" again to be the
primary server, then bringing up "standby" as a standby with a valid
``recovery.conf`` file on "standby".

Stop the "standby" server, then type::

  repmgr -f /home/prime/repmgr/repmgr.conf standby promote

Now the original primary, "prime", is acting again as primary.

Start the "standby" server and type this on "prime"::

  repmgr standby clone --force -h 127.0.0.1 -p 5434 -U prime -R standby --verbose

Stop the "standby" and change the port back to 5434 in the
``postgresql.conf`` file.

Verify the roles have reversed by attempting to insert a record on "standby"
and on "prime."

The servers are now again acting as primary on "prime" and standby on "standby".

Error codes
===========

When the repmgr or repmgrd program exits, it will set one of the following
error codes:

* SUCCESS 0: Program ran successfully.

* ERR_BAD_CONFIG 1: One of the configuration checks the program makes failed.
* ERR_BAD_RSYNC 2: An rsync call made by the program returned an error.
* ERR_STOP_BACKUP 3: A ``pg_stop_backup()`` call made by the program didn't succeed.
* ERR_NO_RESTART 4: An attempt to restart a PostgreSQL instance failed.
* ERR_NEEDS_XLOG 5: Could not create the ``pg_xlog`` directory when cloning.
* ERR_DB_CON 6: Error when trying to connect to a database.
* ERR_DB_QUERY 7: Error executing a database query.
* ERR_PROMOTED 8: Exiting program because the node has been promoted to master.
* ERR_BAD_PASSWORD 9: Password used to connect to a database was rejected.

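A hedged sketch of acting on these codes from a wrapper script (only a couple
of the codes are mapped here, as an illustration)::

  repmgr -f /var/lib/pgsql/repmgr/repmgr.conf --verbose standby register
  rc=$?
  case $rc in
      0) echo "registered successfully" ;;
      1) echo "configuration check failed (ERR_BAD_CONFIG)" >&2 ;;
      6) echo "could not connect to a database (ERR_DB_CON)" >&2 ;;
      *) echo "repmgr exited with code $rc" >&2 ;;
  esac
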
License and Contributions
=========================