Greg Smith 3a950c9f8b Squashed commit of the following:
commit e7ef17117efe6679e154a4905d587c808b48df50
Merge: cd3a280... 43268f2...
Author: Greg Smith <greg@2ndQuadrant.com>
Date:   Tue Jun 7 01:40:08 2011 -0400

    Merge commit 'origin/master' into autofailover

    Conflicts:
    	repmgr.c

commit cd3a280804a01c5270c5c743e5822c7beb9ac77a
Merge: 72ad378... 8200b68...
Author: Greg Smith <greg@2ndQuadrant.com>
Date:   Tue Jun 7 00:52:42 2011 -0400

    Merge commit 'origin/master' into autofailover

    Conflicts:
    	config.c

commit 72ad378bed21d74dab743fec411fe10b19007481
Merge: 17bafa1... 367d0b1...
Author: Greg Smith <greg@2ndQuadrant.com>
Date:   Tue Jun 7 00:38:01 2011 -0400

    Merge commit 'origin/master' into autofailover

    Conflicts:
    	config.c
    	dbutils.c
    	repmgr.c
    	repmgrd.c

commit 17bafa1ca509c1f6614810bab2538e570ebc599e
Author: Greg Smith <greg@2ndQuadrant.com>
Date:   Tue Jun 7 00:31:28 2011 -0400

    Run astyle to fix recent changes

commit a5fbbaecce8fe86bc17c0ebeb1324f9262967316
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Tue May 10 00:46:58 2011 +0200

    Fix a critical bug in the decision process

    If the postgresql on the first node returned by the query to find
    candidates in do_failover is down then the initialization of the
    bestCandidate is done with non assigned variables.

    Fix the situation by moving the initialization in the loop above.
    And loop until we have a find_best. Added a log message if no candidate
    is found

commit 42b21475ac248db8f0e50f5956ef96808e92c68c
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Mon May 9 22:39:21 2011 +0200

    Add test_ssh_connection

    The feature was written by Jaime and reworked to fix
    https://github.com/greg2ndQuadrant/repmgr/issues/5

commit 86f01afae631e9541600af6578e649d88c3ece98
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Mon May 9 21:39:42 2011 +0200

    Improve log output

commit db2f29fc1c8ea03a8ff85717873f8a876846b844
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Mon May 9 01:41:34 2011 -0500

    Only compare getenv("USER") when it's actually set, otherwise it
    will segfault

commit ea4f3f20747e2e0294551d5e61869bdde6d3cd7b
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Mon May 9 01:03:39 2011 -0500

    Fix a message to only show when log_info is requested and the verbose
    flag is set.
    This is because it needs a calculation that is only done when the
    verbose flag is set, so if i have requested log INFO level but haven't
    set the flag it shows a null

commit 35a53bac7e341cfdbb64d2c15fa77c9c4e18bd40
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Mon May 9 01:00:54 2011 -0500

    Use log_* functions in do_witness_create()

commit 8c526f758a46ad53b4d391fc76360561d4ff8bdd
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Sun May 8 19:30:34 2011 -0500

    Add a fallback_application_name parameter to the conninfo identify
    the connection if application_name is not set

commit 01057fc12cbc1fb656d619f483044f28a5f08d37
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Fri May 6 23:57:27 2011 +0200

    Fix the best_candidate loop

    there was an overflow in the loop, already fixed but lost during merge.

commit e80effa3daf56f08005704fc1a5bbe69c1324212
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Fri May 6 23:55:15 2011 +0200

    Fix check in do_failover (merge failure)

    And also remove an unused variable as I was here.

commit 79ba37e2933f4e87523a77375dfda1d96150e7d3
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Thu May 5 21:15:46 2011 +0200

    Fix compile error

commit 67c7b5d68c95a60bb4cd0cfb750b4c8d047fa2a0
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Sun Apr 24 23:27:57 2011 +0200

    And apply astyle  ....

commit 9a321722537d96983b8162227ff629a267b6ed67
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Sun Apr 24 23:27:09 2011 +0200

    Cosmetic change to reduce diff with master

commit 09037efea3fa2c31896b5dc78b0340516a743ba6
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Sun Apr 24 22:26:03 2011 +0200

    Apply astyle

commit 7c4786f662943558be967be4a8dad976f52155dd
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Sun Apr 24 02:22:12 2011 +0200

    Improve the standby clone action

    By default, all config files and directories are cloned from the master in the
    same place in the slave.
    If a destination directory is provided (-D), everything is copied in the
    provided dir, and if the master have tablespaces repmgr exit without cloning.

commit a6d7f765b9403a2cff7e2e1df8ae45a5a7ee1665
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Fri Apr 22 23:31:09 2011 +0200

    Add success message for repmgr standby register

commit 26bf3b08e661137dd3f3c0d3c00fd6b3b90b08b3
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Fri Apr 22 22:51:28 2011 +0200

    Change the exit to a return in config.c

commit 1bd8f4c119e1dbf9a94b2eaec884abce96eeb174
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Fri Apr 22 22:32:57 2011 +0200

    Reduce duplicate code

commit db553fab45ca075f95f09bdb2147de68948b60c8
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Fri Apr 22 22:24:04 2011 +0200

    Some cosmetic

commit f19d0ad714ebcf7df7726772e887c873d005d350
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Fri Apr 22 22:23:06 2011 +0200

    Move a function declaration into header file

commit 1f328bc438c896a9f2067069d756f901b58d41f2
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Mon Apr 11 00:38:30 2011 -0500

    We don't use conninfo as a separate variable anymore

commit f6ade0d63b8a5dd43377f546f5311b4a151b2bfb
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Sun Apr 10 20:53:22 2011 -0500

    Fix a few typos

commit ceca9fa983c8dbde61a7a78da29a1e1871756d8c
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Sun Apr 10 19:32:57 2011 -0500

    Fix code to allow the code to compile:
    - some log_* had problems with parenthesis
    - some uses of variables without the runtime_options prefix

commit 73431f955afd77560bca5370924e09329566c4b7
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Sun Apr 10 23:21:37 2011 +0200

    Fix the debian package name

commit 688eab371110083ae8715b35f414e29c6d87e1ac
Merge: 5c23375... 7995c42...
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Sun Apr 10 23:17:58 2011 +0200

    Merge branch 'autofailover' of git.2ndquadrant.it:repmgr into autofailover

commit 5c23375f88a53ed469e9d13934d618f7a74669be
Merge: cc3315c... c4ae574...
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Sun Apr 10 23:08:36 2011 +0200

    Merge branch 'master' into autofailover

    Conflicts:
    	repmgr.c

commit 7995c428161566cfc54a67eb16f9134c859e7381
Merge: 788ff98... 1303e49...
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Sun Apr 10 16:14:30 2011 -0500

    Merge branch 'autofailover' of git+ssh://git.2ndquadrant.it/git/repmgr into autofailover

commit cc3315ce235b898711c34fd1f2fa1116dbee4e16
Merge: 1303e49... d77186c...
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Sun Apr 10 23:03:11 2011 +0200

    Merge commit 'd77186c90444b9c5ca2de201651841f56a7ded02' into autofailover

commit 1303e49852705046e15ef64f5f7ab739a1689431
Merge: 7ff621b... 4c792c8...
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Sun Apr 10 22:28:08 2011 +0200

    Merge commit '4c792c8013f5713589f53dbdb47721febf139a85' into autofailover

commit 788ff98e94311a33e3e6f7d85a303cbc61288e5f
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Tue Mar 15 19:00:56 2011 -0500

    A few fixes after merge to unbroke what the merge broke, and to make
    the new logging system more consistent through the system

commit 7ff621b96784dfaf40baab4f0f8e7857b4aed6ce
Author: Dan Farina <drfarina@acm.org>
Date:   Tue Dec 7 21:30:44 2010 -0800

    Install install/uninstall SQL also.

    Signed-off-by: Dan Farina <drfarina@acm.org>
    Signed-off-by: Peter van Hardenberg <pvh@heroku.com>

commit c9147dad8223eff20bf5d52ced8a35eed6d82110
Author: Dan Farina <drfarina@acm.org>
Date:   Tue Dec 7 21:30:20 2010 -0800

    Split up install/uninstall actions more like a standard contrib

    Signed-off-by: Dan Farina <drfarina@acm.org>
    Signed-off-by: Peter van Hardenberg <pvh@heroku.com>

commit c8028780b50f2c7fb4384cb9891796647f356e19
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Sat Feb 12 13:29:32 2011 +0100

    Fixing SLEEP times and RETRY

commit 39a1bf3d29f3e33fbf0e1b066a311e8a72f2dc38
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Sat Feb 12 01:17:37 2011 +0100

    Add a pause after update_shared_memory() in do_failover

    we pause for SLEEP_MONITOR+1 to let other nodes update themselves.

commit 527af2baa945e3b640352c01c6dd181d93c9529a
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Fri Feb 11 21:14:22 2011 +0100

    change the debian package filename too

commit c8cb27c7039b2b3a838554187a8add850a42027a
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Fri Feb 11 15:14:40 2011 +0100

    Change package name for the automatic fail-over branch of repmgr

commit 7427988628f754e57069453d65a71f79117c3a3d
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Fri Feb 11 14:28:03 2011 +0100

    Exit 1 when SIGINT

commit af366fe731b70e24ead056e50b69269392bd15a1
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Fri Feb 11 14:27:46 2011 +0100

    Improve log output when reloading configuration

commit 6cc18ce081d7bf55ba9993e9d87567879da35c4d
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Fri Feb 11 14:20:36 2011 +0100

    Add reload conf on (re)start

commit 4259e2c410fd0ef1273c7d1b4ab8fcf1e778e968
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Fri Feb 11 14:01:37 2011 +0100

    avoid double free on repmgrd exit as master
    Per commit from Charles Duffy <charles@tippr.com>
    and failure to cherry-pick it correctly.

    Conflicts:

    	repmgrd.c

commit 431e27b1c005e000f9a346d982419979b4363d77
Author: Greg Smith <greg@2ndQuadrant.com>
Date:   Thu Feb 10 15:09:18 2011 -0500

    Tweak .gitignore to ignore more doc build artifacts

commit b725fa9ae65c7bd5fea7a4e944db5685dee2e8bd
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Sun Mar 13 15:16:27 2011 -0500

    Delete a paragraph that appears twice, because a merge problem

commit d990c77b327a282c1903b7a339f35a22b6a89958
Author: trbs <trbs@trbs.net>
Date:   Tue Jan 11 18:24:17 2011 +0100

    added note about postgresql-server-dev-9.0 and use libxslt-dev instead of version specific package name

commit 69bc1cd3772103b529598978160327e1f9025157
Author: trbs <trbs@trbs.net>
Date:   Fri Jan 7 01:32:31 2011 +0100

    fix line

commit f7b1d1e5e3764c85cec7afa81c164fac3679e1ea
Author: trbs <trbs@trbs.net>
Date:   Thu Dec 23 15:02:23 2010 +0100

    Updated README with Debian/Ubuntu install information

commit 77d28960ff78c3936be0e1029305b0b578e260a9
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Fri Feb 11 13:34:49 2011 +0100

    Create the function used for shared memory access in create_schema, note that this is incompatible with current master

commit 4a73043f232f0a143ede898841530f4d7442c95b
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Fri Feb 11 10:00:34 2011 +0100

    improve log output

commit 62c90a4e86b2cd56ec14255adcfef564945d0769
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Fri Feb 11 00:40:05 2011 -0500

    Close local connection on witness before exit on error of primary

commit e5156865e05670fa9944d74d472127082556d0a0
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Fri Feb 11 00:34:25 2011 -0500

    Remove a semicolon which is just a typo

commit 7586a09bc321241932adacf6a1431029964dc46f
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Fri Feb 11 00:07:02 2011 -0500

    Fix the computation of quorum, we need to count master and the
    division should not be an integer division

commit a19c0ad2059a00e9e7415fc6ea280c109c809c9c
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Thu Feb 10 23:54:35 2011 +0100

    move the functions back into public schema

commit 19fc8ffb1dc0fd9acddad5d22bf5c01704687474
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Thu Feb 10 00:48:00 2011 -0500

    A few more fixes.
    Make repmgr functions exists in repmgr schema and fix a typo that
    caused a seg fault.

commit c6d2b8c6421f93074d7d616980feb0175ee4ef36
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Wed Feb 9 17:56:44 2011 -0500

    A few places where i forgot to update the priority field

commit 0ff0bb8d981b868693c6a751e7e80473b25f2399
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Wed Feb 9 14:24:43 2011 -0500

    Fix a few bugs from last commit and make reload configuration also
    update registration on repl_nodes

commit 508c34e9dfb2bfb7e47d5c6836ead7992e6112fe
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Wed Feb 9 13:45:20 2011 -0500

    Add a way for the user to indicate its preference about which node
    should be promoted in case of a conflict (ie: two nodes with the
    same wal location).

    This will be provided as a parameter in repmgr.conf called priority,
    and will be registered in the repl_nodes table.

commit 6005f1bbf90de61b4c5ebc34302307fa05b019a7
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Wed Feb 9 11:15:30 2011 -0500

    Add a heartbeat for the witness, this should write to repl_monitor
    table so we can see the witness in repl_status and monitor if it
    is working.

    Also close connection at the end of do_witness_create in repmgr.c

commit ac1c6367ab689aeae2eff3dda22db42337f300c1
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Wed Feb 9 01:26:41 2011 -0500

    Add a sighup handler to reload the configuration

commit 7df2fb7b74a3c5287319e56112840d9c2a3e7d5b
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Thu Feb 3 18:42:36 2011 +0100

    Change the is_pgup () check test

    remove spurious 'return'

commit 7e58e6aa91ab3f681854a44fe282b44da81768fa
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Thu Feb 3 16:53:17 2011 +0100

    Add constant for the sleep times and retry, rework monitor functions

    Rename MonitorExecute() to StandbyMonitor()
    Add    WitnessMonitor() # very simple version to start service mode isolation

commit 1b270dab2e2c3c60527b86a33cd0fc9c0d11c08c
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Thu Feb 3 16:23:01 2011 +0100

    Improve PrimaryCheck

    add a function "bool is_pgup()"

    Now, repmgrd-master can work.

commit c6f07229713c8f2b77596459c06184edddd8d77e
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Wed Feb 2 19:31:06 2011 +0100

    Fix strcmp in config parser, now failover parameter should be set correctly

commit 0b690698a0d9aa87d3e8f1e462ee0771aa2ae9e8
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Wed Feb 2 16:23:50 2011 +0100

    fix sprintf extra param

commit 6050da315824048661be9c425ae6005576e5870f
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Wed Feb 2 13:53:29 2011 +0100

    Add some other files to ignore

commit a146dd581b46ea0e26b7b56b087d6b0d4ae15d44
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Wed Feb 2 13:53:04 2011 +0100

    Fix SQL query

commit 8f5db0f9c0f68ce2519afda72b6a778536427eab
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Wed Feb 2 00:51:54 2011 +0100

    Some more minor fix and remove TODO

commit c9299ad74e8f929bdc24804a6a834f24b66b7074
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Wed Feb 2 00:39:18 2011 +0100

    fix some memory leak and fix testlogic for is_standby is_witness

    * is_standby() must be tested *after* is_witness else we think we are in a master
    * remove SELECT * in favor of SELECT witness

commit cc5d06ea8bf1dcde4c264e95eb90f7fb1e821af3
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Tue Feb 1 23:40:15 2011 +0100

    Forgot to remove a param from fprintf

commit 426e22fa8dfd78f0c256bda1b166a31807de9ec6
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Tue Feb 1 22:47:58 2011 +0100

    Restore previous usage of --force and rsync tablespace before data_dir

    The --force option is used to reduce the time needed to restore a failed
    node: it will overwrite existing files thanks to rsync --delete option

    The tablespaces need to be copied first, because there are symlinks to
    them from the data_directory

commit 1937973fced703d14159e6aae1cbdabb5619accb
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Tue Feb 1 21:09:12 2011 +0100

    Improve message of repmgrd

commit 035a9bcc1eea55cd95790bc72276727cc492694a
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Tue Feb 1 21:08:38 2011 +0100

    Fix (bool *)PQgetval

commit bf9181654213f898949e9c8f094b974915f82258
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Tue Feb 1 01:54:49 2011 +0100

    Fix pg_hba on witness and connection

    * Copy the pg_hba.conf file from master to witness server
    * createdb and createuser in witness if they are different from getenv(USER)

commit a2d8dcb2fd105d8f02bd76856969aca6605c66fa
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Tue Feb 1 01:01:43 2011 +0100

    Improve initialization of repmgr (+ critical bug and minor fixes)

    * standby clone now *clone* the master files and dir to the *same*
      place on the standby if destination_directory is not provided
    * add preload library to the witness configuration
    * sleep 2 seconds after starting the witness postgresql to let it
      start enough to be able to connect to it.
    * Fix rsync files
    * Fix insert configuration into witness

commit bc1a265d272e4805ac7859c208b51b57edd10fc7
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Mon Jan 31 12:25:20 2011 +0100

    Fix some error message new line

commit e087bd5de5ab43ffac90c6a20df6ef3fb19eed6d
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Mon Jan 31 11:37:08 2011 +0100

    Guess data_directory from master in 'standby clone' and remove --force for dir

    --force does not overwrite directories anymore (it was not working very well anyway)
    dest_dir is the same as the master's one by default.
    Move down the tablespace check directories process

commit 0a961e7ef05f26c87af1946b8141a639076fc488
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Mon Jan 31 11:21:40 2011 +0100

    Add new function: create_pgdir (and fix 2 bugs in the process).
    It also fix function create_schema.

    Reduce repmgr code

commit 7e5958dcc1daa9b54cb6f295af96fbef750c7952
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Mon Jan 31 10:34:58 2011 +0100

    Improve an ERROR message

commit f3a66a65a361f919727fc2d0ff9bf9544a10a822
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Mon Jan 31 10:25:45 2011 +0100

    Improve error message about 'wal_keep_segments'

commit 150dbcc0fe53ce4eff08797210fd2e9e4dd0e17a
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Sat Jan 29 23:35:00 2011 -0500

    Add witness server support

commit 6281e22a9c467da883ad960567f8ab6bdbc155ba
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Thu Jan 27 21:32:11 2011 +0100

    Build all at once and update debian makefile to include the sql/

commit 50d752bf1ead7c9343900d4b494844284b7aac6c
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Thu Jan 27 02:10:31 2011 +0100

    Adding information for debian and --version test

commit 16d56dbfa05314eea69869ee2a7a705636432ad9
Author: Cédric Villemain <cedric@2ndQuadrant.fr>
Date:   Thu Jan 27 02:03:20 2011 +0100

    Add a hint at the end of the standby clone
    and minor typo and message shuffle

commit 6404ba247de1e2e3b995f30b6e7626e459849136
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Wed Jan 26 06:13:30 2011 -0500

    Fix compiler warning about variables being used uninitialized

commit a4f48993d5fe3b22bdd2aaefcff315115f8764b7
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Fri Jan 21 21:09:03 2011 -0500

    Fix a new typo

commit 904e61c9edcbbce6b1027c80ff77317d7cbd4919
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Fri Jan 21 19:30:56 2011 -0500

    Use a function to make the call to repmgr_update_standby_location()
    so i avoid typos like the one i fixed in a previous commit. It also
    makes the code cleaner.

commit 4ed388726f4bc0a52cc88d044d1f81697f348a7c
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Wed Jan 19 09:17:16 2011 -0500

    Fix a typo when calling the sql function that writes shared memory

commit d9232266561306eabef90e13c084c051a0e7f458
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Tue Jan 18 01:25:23 2011 -0500

    Define the variable that we are using to test the result status of
    the system() call.

commit 4d131c212b91e40ca027f76637c182456ab12514
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Tue Jan 18 01:04:12 2011 -0500

    Makes repmgrd warn if promote_command or follow_command fails, add
    a "still alive" check for primary.
    Add a few messages and fix a bug in do_failover() in which we were
    using a closed PGresult.

commit a5189e68cf4c8cf84259ea667a35e96de56fa4c9
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Thu Jan 13 15:45:50 2011 -0500

    Initial attempt to get autofailover

commit d0e09010a9d4610997c900b62ea1df2a71b01015
Author: Jaime Casanova <jaime@2ndQuadrant.com>
Date:   Wed Jan 12 14:40:29 2011 -0500

    Add options failover, promote_command and follow_command
    to repmgr.conf, in pass also rename sample repmgr.conf to
    repmgr.conf.sample
    promote_command and follow_command allows to use a custom script
    for those actions.
2011-06-07 01:42:15 -04:00

===================================================
repmgr: Replication Manager for PostgreSQL clusters
===================================================

Introduction
============

PostgreSQL 9.0 allows us to have replicated Hot Standby servers
which we can query and/or use for high availability.

While the main components of the feature are included with
PostgreSQL, the user is expected to manage the high availability
part of it.

repmgr allows you to monitor and manage your replicated PostgreSQL
databases as a single cluster.  repmgr includes two components:

* repmgr: command program that performs tasks and then exits

* repmgrd: management and monitoring daemon that watches the cluster
  and can automate remote actions.

Requirements
------------

repmgr is currently aimed at installation on UNIX-like systems that include
development tools such as ``gcc`` and ``gmake``.  It also requires that the
``rsync`` utility is available in the PATH of the user running the repmgr
programs.  Some operations also require PostgreSQL components such
as ``pg_config`` and ``pg_ctl`` to be in the PATH.
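
A quick way to confirm these prerequisites before building is to look up
each tool by name.  This is only a sketch: ``check_tools`` is a
hypothetical helper, not part of repmgr, and on many Linux systems GNU
make is installed as ``make`` rather than ``gmake``, so adjust the list
to match your platform::

  # check_tools is a hypothetical helper, not something repmgr ships.
  # It prints the names that are not found in PATH and returns
  # non-zero if any are missing.
  check_tools() {
      missing=""
      for tool in "$@"; do
          if ! command -v "$tool" >/dev/null 2>&1; then
              missing="$missing $tool"
          fi
      done
      if [ -z "$missing" ]; then
          return 0
      fi
      # Strip the leading space before printing the list.
      echo "${missing# }"
      return 1
  }

  check_tools gcc gmake rsync pg_config pg_ctl ||
      echo "The tools listed above were not found in PATH"

Run it as the same user that will run the repmgr programs, since that
is the PATH that matters.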

Introduction to repmgr commands
===============================

Suppose we have 3 nodes: node1 (the initial master), node2 and node3.
To make node2 and node3 be standbys of node1, execute this on both nodes
(node2 and node3)::

  repmgr -D /var/lib/pgsql/9.0 standby clone node1

To get full monitoring and easier state transitions, register each of
the nodes by creating a ``repmgr.conf`` file and executing commands
like this on the appropriate nodes::

  repmgr -f /var/lib/pgsql/repmgr/repmgr.conf --verbose master register
  repmgr -f /var/lib/pgsql/repmgr/repmgr.conf --verbose standby register

Once everything is registered, you start the repmgrd daemon.  It
will maintain a view showing the state of all the nodes in the cluster,
including how far they are lagging behind the master.

If you lose node1, you can make node2 the new master by running this on node2::

  repmgr -f /var/lib/pgsql/repmgr/repmgr.conf standby promote

Then, to make node3 follow node2 (rather than node1), run this on node3::

  repmgr -f /var/lib/pgsql/repmgr/repmgr.conf standby follow

If we now want to add a new node, we can prepare a new server (node4)
and run::

  repmgr -D /var/lib/pgsql/9.0 standby clone node2
  
And if a previously failed node becomes available again, such as
the lost node1 above, you can get it to resynchronize by copying
over only the changes made while it was down.  That happens with what's
called a forced clone, which overwrites existing data rather than
assuming it starts with an empty database directory tree::

  repmgr -D /var/lib/pgsql/9.0 --force standby clone node2

This can be much faster than creating a brand new node that must
copy over every file in the database.

Installation Outline
====================

To install and use repmgr and repmgrd follow these steps:

1. Build repmgr programs 

2. Set up trusted copy between postgres accounts, needed for the
   ``STANDBY CLONE`` step

3. Check your primary server is correctly configured

4. Write a suitable ``repmgr.conf`` for the node

5. Set up repmgrd to aid in failover transitions

Build repmgr programs
---------------------

There are two ways to build repmgr.  Both place the binaries in the same
location as your PostgreSQL binaries, such as ``psql``.  The first uses the
PostgreSQL Extension System (PGXS) to install.  For this method to work, you
will need the ``pg_config`` program available in your PATH.  In some
distributions of PostgreSQL, this requires installing a separate development
package in addition to the basic server software.  The second requires a full
PostgreSQL source code tree to install the program directly into.

Build repmgr programs - PGXS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you are using a packaged PostgreSQL build and have ``pg_config``
available, the package can be built and installed using PGXS::

  tar xvzf repmgr-1.0.tar.gz
  cd repmgr
  make USE_PGXS=1
  make USE_PGXS=1 install

This is preferred to building from the ``contrib`` subdirectory of the main
source code tree.

If you need to remove the source code temporary files from this directory,
that can be done like this::

  make USE_PGXS=1 clean
  
See below for building notes specific to RedHat Linux variants.

Using a full source code tree
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In this method, the repmgr distribution is copied into the PostgreSQL source
code tree, assumed to be at ``${postgresql_sources}`` for this example.
The resulting subdirectory must be named ``contrib/repmgr``, without any
version number::

  cp repmgr-1.0.tar.gz ${postgresql_sources}/contrib
  cd ${postgresql_sources}/contrib 
  tar xvzf repmgr-1.0.tar.gz
  cd repmgr
  make
  make install

If you need to remove the source code temporary files from this directory,
that can be done like this::

  make clean

Notes on RedHat Linux, Fedora, and CentOS Builds
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The RPM packages of PostgreSQL put ``pg_config`` into the ``postgresql-devel``
package, not the main server one.  And if you have an RPM install of PostgreSQL
9.0, the entire PostgreSQL binary directory will not be in your PATH by default
either.  Individual utilities are made available via the ``alternatives``
mechanism, but not all commands will be wrapped that way.  The files installed
by repmgr will certainly not be in the default PATH for the postgres user
on such a system.  They will instead be in ``/usr/pgsql-9.0/bin/``.
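
To reach the installed programs by name, the postgres user can add that
directory to its PATH.  A minimal sketch, assuming the
``/usr/pgsql-9.0/bin`` location mentioned above; ``prepend_path`` is a
hypothetical helper, and which shell startup file it belongs in depends
on your setup::

  # prepend_path is a hypothetical helper: it puts a directory at the
  # front of PATH unless it is already listed there.
  prepend_path() {
      case ":$PATH:" in
          *":$1:"*) ;;                # already present, leave PATH alone
          *) PATH="$1:$PATH" ;;
      esac
      export PATH
  }

  prepend_path /usr/pgsql-9.0/bin

Checking for an existing entry first keeps PATH from growing each time
the startup file is sourced.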

When building repmgr against an RPM-packaged build, you may discover that some
development packages are needed as well.  The following build errors can
occur::

  /usr/bin/ld: cannot find -lxslt
  /usr/bin/ld: cannot find -lpam
  
Install the following packages to correct those::

  yum install libxslt-devel
  yum install pam-devel

If you build repmgr as a regular user and then install into the system
directories using sudo, the syntax is awkward, because ``pg_config`` won't be
in root's PATH either.  The following recipe should work::

  sudo PATH="/usr/pgsql-9.0/bin:$PATH" make USE_PGXS=1 install

Issues with 32 and 64 bit RPMs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When building, you may receive a series of errors of this form::

  /usr/bin/ld: skipping incompatible /usr/pgsql-9.0/lib/libpq.so when searching for -lpq

This is likely because you have both the 32 and 64 bit versions of the
``postgresql90-devel`` package installed.  You can check that like this::

  rpm -qa --queryformat '%{NAME}\t%{ARCH}\n'  | grep postgresql90-devel

If two packages appear, one for i386 and one for x86_64, you have hit this
problem.  That combination is not supposed to be allowed.

This can happen when using the PGDG repo to install that package;
here is an example session showing the problem::

  # yum install postgresql-devel
  ..
  Setting up Install Process
  Resolving Dependencies
  --> Running transaction check
  ---> Package postgresql90-devel.i386 0:9.0.2-2PGDG.rhel5 set to be updated
  ---> Package postgresql90-devel.x86_64 0:9.0.2-2PGDG.rhel5 set to be updated
  --> Finished Dependency Resolution
  
  Dependencies Resolved

  =========================================================================
   Package               Arch      Version              Repository    Size
  =========================================================================
  Installing:
   postgresql90-devel    i386      9.0.2-2PGDG.rhel5    pgdg90        1.5 M
   postgresql90-devel    x86_64    9.0.2-2PGDG.rhel5    pgdg90        1.6 M

Note how both the i386 and x86_64 platform architectures are selected for
installation.  Your main PostgreSQL package will only be compatible with one of
those, and if the repmgr build finds the wrong postgresql90-devel these
"skipping incompatible" messages appear.

In this case, you can temporarily remove both packages, then just install the
correct one for your architecture.  Example::

  rpm -e postgresql90-devel --allmatches
  yum install postgresql90-devel-9.0.2-2PGDG.rhel5.x86_64

Just deleting the package for the wrong platform might not leave behind the
correct files, due to the way the two packages interact.  If you already tried
to build repmgr before doing this, you'll need to get rid of leftover files
from the wrong architecture::

  make USE_PGXS=1 clean

Notes on Ubuntu, Debian or other Debian-based Builds
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Debian packages of PostgreSQL put ``pg_config`` into the development package
called ``postgresql-server-dev-$version``.

When building repmgr against a Debian-packaged build, you may discover that some
development packages are needed as well. You will need the following development
packages installed::

  sudo apt-get install libxslt-dev libxml2-dev libpam-dev libedit-dev

If you're using Debian packages for PostgreSQL and are building repmgr with the
USE_PGXS option, you also need to install the corresponding development package::

  sudo apt-get install postgresql-server-dev-9.0

If you build and install repmgr manually it will not be on the system path. The
binaries will be installed in /usr/lib/postgresql/$version/bin/ which is not on
the default path. The reason behind this is that Ubuntu/Debian systems manage
multiple installed versions of PostgreSQL on the same system through a wrapper
called pg_wrapper and repmgr is not (yet) known to this wrapper.

You can solve this in many different ways; the most Debian-like is to create
alternatives for repmgr and repmgrd::

  sudo update-alternatives --install /usr/bin/repmgr repmgr /usr/lib/postgresql/9.0/bin/repmgr 10
  sudo update-alternatives --install /usr/bin/repmgrd repmgrd /usr/lib/postgresql/9.0/bin/repmgrd 10

You can also make a deb package of repmgr using::

  make USE_PGXS=1 deb

This will build a Debian package one level up from where you build, normally the 
same directory that you have your repmgr/ directory in.

Confirm software was built correctly
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You should now find the repmgr programs available in the same directory as
the rest of your PostgreSQL binaries.  You can confirm the software
is available by checking its version::

  repmgr --version
  repmgrd --version

You may need to include the full path of the binary instead, such as this
RHEL example::

  /usr/pgsql-9.0/bin/repmgr --version
  /usr/pgsql-9.0/bin/repmgrd --version

Or in this Debian example::

  /usr/lib/postgresql/9.0/bin/repmgr --version
  /usr/lib/postgresql/9.0/bin/repmgrd --version

Below, the base directory of this binary installation is referred to as PGDIR.

Set up trusted copy between postgres accounts
---------------------------------------------

Initial copy between nodes uses the rsync program running over ssh.  For this 
to work, the postgres accounts on each system need to be able to access files 
on their partner node without a password.

First generate an ssh key, using an empty passphrase, and copy the resulting
keys and a matching authorization file to a privileged user on the other system::

  [postgres@node1]$ ssh-keygen -t rsa
  Generating public/private rsa key pair.
  Enter file in which to save the key (/var/lib/pgsql/.ssh/id_rsa): 
  Enter passphrase (empty for no passphrase): 
  Enter same passphrase again: 
  Your identification has been saved in /var/lib/pgsql/.ssh/id_rsa.
  Your public key has been saved in /var/lib/pgsql/.ssh/id_rsa.pub.
  The key fingerprint is:
  aa:bb:cc:dd:ee:ff:aa:11:22:33:44:55:66:77:88:99 postgres@db1.domain.com
  [postgres@node1]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
  [postgres@node1]$ chmod go-rwx ~/.ssh/*
  [postgres@node1]$ cd ~/.ssh
  [postgres@node1]$ scp id_rsa.pub id_rsa authorized_keys user@node2:

Log in as a user on the other system, and install the files into the postgres
user's account::

  [user@node2 ~]$ sudo chown postgres:postgres authorized_keys id_rsa.pub id_rsa
  [user@node2 ~]$ sudo mkdir -p ~postgres/.ssh
  [user@node2 ~]$ sudo chown postgres:postgres ~postgres/.ssh
  [user@node2 ~]$ sudo mv authorized_keys id_rsa.pub id_rsa ~postgres/.ssh
  [user@node2 ~]$ sudo chmod -R go-rwx ~postgres/.ssh

Now test that ssh in both directions works.  You may have to accept some new 
known hosts in the process.
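
One way to script this check is sketched below.  The ``check_ssh`` function
name is hypothetical; the ``BatchMode`` and ``ConnectTimeout`` options are
standard OpenSSH client options.  Because ``BatchMode`` also suppresses the
host key prompt, accept any new known hosts with one interactive ``ssh``
first.

```shell
# check_ssh HOST: succeed only if a key-based login to HOST works
# without any prompt.  (Hypothetical helper, not part of repmgr.)
check_ssh() {
    # BatchMode=yes makes ssh fail instead of asking for a password;
    # ConnectTimeout=5 gives up after five seconds on an unreachable host.
    ssh -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

# Usage, as the postgres user:
#   on node1:  check_ssh node2 && echo "node1 -> node2 OK"
#   on node2:  check_ssh node1 && echo "node2 -> node1 OK"
```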

Primary server configuration
----------------------------

PostgreSQL should have been previously built and installed on the system.  Here
is a sample of changes to the ``postgresql.conf`` file::

  listen_addresses='*'
  wal_level = 'hot_standby'
  archive_mode = on
  archive_command = 'cd .'     # we can also use exit 0, anything that
                               # just does nothing
  max_wal_senders = 10
  wal_keep_segments = 5000     # 80 GB required on pg_xlog
  hot_standby = on
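
The 80 GB figure in the ``wal_keep_segments`` comment above can be checked
with a little arithmetic.  This sketch assumes the default WAL segment size
of 16 MB; a server built with a different segment size needs a different
multiplier.

```shell
# Rough disk estimate for pg_xlog with wal_keep_segments = 5000.
wal_keep_segments=5000
segment_mb=16    # assumption: default WAL segment size in megabytes
total_mb=$(( wal_keep_segments * segment_mb ))
echo "${total_mb} MB"    # prints "80000 MB", i.e. roughly 80 GB
```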

You also need to add the machines that will participate in the cluster to the
``pg_hba.conf`` file.  One possibility is to trust all connections, including
replication connections, from all internal addresses, such as::

  host     all              all         192.168.1.0/24         trust
  host     replication      all         192.168.1.0/24         trust

A more secure setup adds a repmgr user and database, just giving
access to that user::

  host     repmgr           repmgr      192.168.1.0/24         trust
  host     replication      all         192.168.1.0/24         trust

If you give a password to the user, you need to create a ``.pgpass`` file for
them as well to allow automatic login.  In this case you might use the
``md5`` authentication method instead of ``trust`` for the repmgr user.
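
A minimal sketch of creating such a ``.pgpass`` entry follows.  The
``write_pgpass`` helper and the ``change_me`` password are placeholders for
illustration only.  The line format and the owner-only file permissions are
what libpq expects.

```shell
# write_pgpass FILE HOST PORT DATABASE USER PASSWORD
# (Hypothetical helper, not part of repmgr.)
write_pgpass() {
    # Each line has the form hostname:port:database:username:password
    printf '%s:%s:%s:%s:%s\n' "$2" "$3" "$4" "$5" "$6" >> "$1"
    # libpq ignores the file unless only its owner can read it
    chmod 0600 "$1"
}

# Usage, as the postgres user on each node:
#   write_pgpass ~/.pgpass node1 5432 repmgr repmgr 'change_me'
```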

Don't forget to restart the database server after making all these changes.

Usage walkthrough
=================

This assumes you've already followed the steps in "Installation Outline" to
install repmgr and repmgrd on the system.

A production installation of ``repmgr`` will normally involve two
different systems running on the same port, typically the default of 5432,
with both using files owned by the ``postgres`` user account.  This
walkthrough assumes the following setup:

* A primary (master) server called "node1", running as the "postgres" user
  who is also the owner of the files. This server is operating on port 5432.  This
  server will be known as "node1" in the cluster "test".

* A secondary (standby) server called "node2", running as the "postgres" user
  who is also the owner of the files. This server is operating on port 5432.  This
  server will be known as "node2" in the cluster "test".

* Another standby server called "node3" with a similar configuration to "node2".

* The PostgreSQL data directory on each of the above is defined as $PGDATA,
  which is represented here as ``/var/lib/pgsql/9.0/data``
  
Creating some sample data
-------------------------

If you already have a database with useful data to replicate, you can
skip this step and use it instead.  But if you do not already have
data in this cluster to replicate, you can create some like this::

    createdb pgbench
    pgbench -i -s 10 pgbench
	
Examples below will use the database name ``pgbench`` to match this.
Substitute the name of your database instead.  Note that the standby
nodes created here will include information for every database in the
cluster, not just the specified one.  Needing the database name is
mainly for user authentication purposes.

Setting up a repmgr user
------------------------

Make sure that the "repmgr" user has a role in the database, "pgbench" in this
case, and can log in.  On "node1"::

  createuser --login --superuser repmgr

Alternately you could start ``psql`` on the pgbench database on "node1" and at
the ``pgbench=#`` prompt type::

  CREATE ROLE repmgr SUPERUSER LOGIN;

The main advantage of the latter is that you can do it remotely to any
system you already have superuser access to.

Clearing the PostgreSQL installation on the Standby
---------------------------------------------------

To set up a new streaming replica, start by removing any PostgreSQL
installation on the existing standby nodes.

* Stop any server on "node2" and "node3".  You can check for running database
  servers using a command like this::
  
    ps -eaf | grep postgres
	
  And looking for the various database server processes:  server, logger,
  wal writer, and autovacuum launcher.
  
* Go to "node2" and "node3" database directories and remove the PostgreSQL installation::

    cd $PGDATA
    rm -rf *

  This will delete the entire database installation in ``/var/lib/pgsql/9.0/data``.
  Be careful that $PGDATA is defined here; executing ``ls`` to confirm you're
  in the right place is always a good idea before executing ``rm``.
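
The precautions above can be sketched as a small guard around the ``rm``.
The ``clear_pgdata`` function is hypothetical.  It relies on the fact that an
initialized data directory contains a ``PG_VERSION`` file at its top level,
and it refuses to run when $PGDATA is empty.

```shell
# clear_pgdata: remove the contents of $PGDATA, but only after sanity checks.
# (Hypothetical helper, not part of repmgr.)
clear_pgdata() {
    # Refuse to run if PGDATA is unset or empty
    if [ -z "${PGDATA:-}" ]; then
        echo "refusing: PGDATA is not set" >&2
        return 1
    fi
    # PG_VERSION exists at the top of every initialized data directory
    if [ ! -f "$PGDATA/PG_VERSION" ]; then
        echo "refusing: $PGDATA does not look like a data directory" >&2
        return 1
    fi
    # Same effect as "cd $PGDATA; rm -rf *", without changing directory
    rm -rf "$PGDATA"/*
}
```

Run it on "node2" and "node3" only, after the servers there are stopped.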

Testing remote access to the master
-----------------------------------

On the "node2" server, first test that you can connect to "node1" the
way repmgr will by executing::

  psql -h node1 -U repmgr -d pgbench

Possible sources for a problem here include:

* Login role specified was not created on "node1"

* The database configuration on "node1" is not listening on a TCP/IP port.
  That could be because the ``listen_addresses`` parameter was not updated,
  or because it was updated but the server wasn't restarted afterwards.  You can
  test this on "node1" itself the same way::

    psql -h node1 -U repmgr -d pgbench

  With the "-h" parameter forcing a connection over TCP/IP, rather
  than the default UNIX socket method.

* There is a firewall setup that prevents incoming access to the
  PostgreSQL port (defaulting to 5432) used to access "node1".  In
  this situation you would be able to connect to the "node1" server
  on itself, but not from any other host, and you'd just get a timeout
  when trying rather than a proper error message.
	 
* The ``pg_hba.conf`` file does not list appropriate statements to allow
  this user to login.  In this case you should connect to the server,
  but see an error message mentioning the ``pg_hba.conf``.
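
A non-interactive version of this connection test is sketched below.  The
``can_connect`` function name is hypothetical.  ``PGCONNECT_TIMEOUT`` is a
standard libpq environment variable and ``-w`` is the standard ``psql`` option
that never prompts for a password, so a firewalled port or a missing
``.pgpass`` entry fails quickly instead of hanging or waiting for input.

```shell
# can_connect HOST: try the same connection repmgr will use, without prompting.
# (Hypothetical helper, not part of repmgr.)
can_connect() {
    # Give up after 5 seconds instead of waiting on a blocked port,
    # and fail rather than prompt if a password would be required.
    PGCONNECT_TIMEOUT=5 psql -w -h "$1" -U repmgr -d pgbench -c 'SELECT 1' >/dev/null
}

# Usage, on "node2":
#   can_connect node1 && echo "node1 is reachable as repmgr"
```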

Cloning the standby
-------------------

With the "node1" server running, we want to use the ``standby clone`` command
in repmgr to copy over the entire PostgreSQL database cluster onto the
"node2" server.  Execute the clone process with::

  repmgr -D $PGDATA -d pgbench -p 5432 -U repmgr -R postgres --verbose standby clone node1

Here "-U" specifies the database user to connect to the master as, while
"-R" specifies what user to run the rsync command as.  Potentially you
could leave out one or both of these, in situations where the user and/or
role setup is the same on each node.

If this fails with an error message about accessing the master database,
you should return to the previous step and confirm access to "node1"
from "node2" with ``psql``, using the same parameters given to repmgr.

NOTE: you need to have $PGDIR/bin (where the PostgreSQL binaries are installed)
in your path for the above to work.  If you don't want that as a permanent
setting, you can temporarily set it before running individual commands like
this::

  PATH=$PGDIR/bin:$PATH repmgr -D $PGDATA ...

Setup repmgr configuration file
-------------------------------

Create a directory on each node to store its repmgr configuration.
In that directory, there needs to be a ``repmgr.conf`` file for that node.
For each node we'll assume this is stored in ``/var/lib/pgsql/repmgr/repmgr.conf``
following the standard directory structure of a RHEL system.  On "node1" it should contain::

  cluster=test
  node=1
  conninfo='host=node1 user=repmgr dbname=pgbench'

On "node2" create the file ``/var/lib/pgsql/repmgr/repmgr.conf`` with::

  cluster=test
  node=2
  conninfo='host=node2 user=repmgr dbname=pgbench'

The STANDBY CLONE process should have created a recovery.conf file on
"node2" in the $PGDATA directory that reads as follows::

  standby_mode = 'on'
  primary_conninfo = 'host=node1 port=5432'

Registering the master and standby
----------------------------------

First, register the master by typing on "node1"::

  repmgr -f /var/lib/pgsql/repmgr/repmgr.conf --verbose master register

Then start the standby server on "node2".

You could now register the standby by typing on "node2"::

  repmgr -f /var/lib/pgsql/repmgr/repmgr.conf --verbose standby register

However, you can instead start repmgrd::

  repmgrd -f /var/lib/pgsql/repmgr/repmgr.conf --verbose > /var/lib/pgsql/repmgr/repmgr.log 2>&1

This will automatically register your standby system.  You will eventually
need repmgrd running anyway, to save lag monitoring information.
repmgrd will log the daemon activity to the listed file.  You can
watch what it is doing with::

  tail -f /var/lib/pgsql/repmgr/repmgr.log

Hit control-C to exit this tail command when you are done.

Monitoring and testing
----------------------

At this point, you have a functioning primary on "node1" and a functioning
standby server running on "node2".  You can confirm the master knows
about the standby, and that it is keeping it current, by looking at
``repl_status``::

  postgres@node2 $ psql -x -d pgbench -c "SELECT * FROM repmgr_test.repl_status"
  -[ RECORD 1 ]-------------+------------------------------
  primary_node              | 1
  standby_node              | 2
  last_monitor_time         | 2011-02-23 08:19:39.791974-05
  last_wal_primary_location | 0/1902D5E0
  last_wal_standby_location | 0/1902D5E0
  replication_lag           | 0 bytes
  apply_lag                 | 0 bytes
  time_lag                  | 00:26:13.30293

Some tests you might do at this point include:

* Insert some records into the primary server here, confirm they appear
  very quickly (within milliseconds) on the standby, and that the
  repl_status view advances accordingly.

* Verify that you can run queries against the standby server, but
  cannot make insertions into the standby database.  
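
A quick way to see which role a node currently has is sketched below.  The
``node_role`` function name is hypothetical.  It uses the built-in
``pg_is_in_recovery()`` function, which returns true on a hot standby, and it
assumes the ``repmgr`` user and ``pgbench`` database from this walkthrough.

```shell
# node_role HOST: print "standby" if HOST is in recovery, "primary" otherwise.
# (Hypothetical helper, not part of repmgr.)
node_role() {
    # -At prints just the bare value: "t" or "f"
    in_recovery=$(psql -At -h "$1" -U repmgr -d pgbench \
        -c 'SELECT pg_is_in_recovery()') || return 1
    if [ "$in_recovery" = "t" ]; then
        echo standby
    else
        echo primary
    fi
}

# Usage:
#   node_role node1
#   node_role node2
```

The same check is handy after each promotion step later in this walkthrough.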

Simulating the failure of the primary server
--------------------------------------------

To simulate the loss of the primary server, simply stop the "node1" server.
At this point, the standby contains the database as it existed at the time of
the "failure" of the primary server.  If you look at ``repl_status`` on
"node2", you should see the time_lag value increase the longer "node1"
is down.

Promoting the Standby to be the Primary
---------------------------------------

Now you can promote the standby server to be the primary, to allow
applications to read and write to the database again, by typing::

  repmgr -f /var/lib/pgsql/repmgr/repmgr.conf --verbose standby promote

The server restarts and now has read/write ability.

Bringing the former Primary up as a Standby
-------------------------------------------

To make the former primary act as a standby, which is necessary before
restoring the original roles, type the following on node1::

  repmgr -D $PGDATA -d pgbench -p 5432 -U repmgr -R postgres --verbose --force standby clone node2

Then start the "node1" server, which is now acting as a standby server.

Make sure the record(s) inserted in the earlier step are still available on the
now standby (node1).  Confirm the database on "node1" is read-only.

Restoring the original roles of node1 to primary and node2 to standby
---------------------------------------------------------------------

Now restore to the original configuration by stopping
"node2" (now acting as a primary), promoting "node1" again to be the
primary server, then bringing up "node2" as a standby with a valid
``recovery.conf`` file.

Stop the "node2" server, then type the following on "node1" to promote it::

  repmgr -f /var/lib/pgsql/repmgr/repmgr.conf standby promote

Now the original primary, "node1", is acting again as primary.

Start the "node2" server and type this on "node1"::

  repmgr standby clone --force -h node2 -p 5432 -U postgres -R postgres --verbose

Verify the roles have reversed by attempting to insert a record on "node2"
and on "node1".

The servers are now again acting as primary on "node1" and standby on "node2".

Alternate setup:  both servers on one host
==========================================

Another test setup assumes you might be using the default installation of
PostgreSQL on port 5432 for some other purpose, and instead relocates these
instances onto different ports running as different users.  In places where
``127.0.0.1`` is used as a host name, a more traditional configuration
would instead use the name of the relevant host for that parameter. 
You can usually leave out changes to the port number in this case too.

* A primary (master) server called "prime", with a user named "prime" who is
  also the owner of the files. This server is operating on port 5433.  This
  server will be known as "node1" in the cluster "test".

* A standby server called "standby", with a user named "standby" who is the
  owner of the files.  This server is operating on port 5434.  This server
  will be known as "node2" in the cluster "test".

* A database exists on "prime" called "testdb".

* The PostgreSQL data directory for each of the above is defined as $PGDATA,
  which is represented here with ``/data/prime`` for the "prime" server and
  ``/data/standby`` for the "standby" server.

You might set up such an installation by adjusting the login script for the
"prime" and "standby" users as in these two examples::

  # prime
  PGDATA=/data/prime
  PGENGINE=/usr/pgsql-9.0/bin
  PGPORT=5433
  export PGDATA PGENGINE PGPORT
  PATH="$PATH:$PGENGINE"

  # standby
  PGDATA=/data/standby
  PGENGINE=/usr/pgsql-9.0/bin
  PGPORT=5434
  export PGDATA PGENGINE PGPORT
  PATH="$PATH:$PGENGINE"

And then starting/stopping each installation as needed using the ``pg_ctl``
utility.

Note:  naming your nodes based on their starting role is not a recommended
best practice!  As you'll see in this example, once there is a failover, names
strongly associated with one particular role (primary or standby) can become
confusing, once that node no longer has that role.  Future versions of this
walkthrough are expected to use more generic terminology for these names.

Clearing the PostgreSQL installation on the Standby
---------------------------------------------------

To set up a streaming replica, start by removing any PostgreSQL installation on the existing replica:

* Stop both servers.

* Go to "standby" database directory and remove the PostgreSQL installation::

    cd $PGDATA
    rm -rf *

  This will delete the entire database installation in ``/data/standby``.

Building the standby
--------------------

Create a directory for each node to store its repmgr configuration.
In that directory, there needs to be a ``repmgr.conf`` file for that node.
For "prime" we'll assume this is stored in ``/home/prime/repmgr/repmgr.conf``
and it should contain::

  cluster=test
  node=1
  conninfo='host=127.0.0.1 dbname=testdb'

On "standby" create the file ``/home/standby/repmgr/repmgr.conf`` with::

  cluster=test
  node=2
  conninfo='host=127.0.0.1 dbname=testdb'

Next, with the "prime" server running, we want to use the ``standby clone`` command
in repmgr to copy over the entire PostgreSQL database cluster onto the
"standby" server.  On the "standby" server, type::

  repmgr -D $PGDATA -p 5433 -U prime -R prime --verbose standby clone localhost

Next, we need a recovery.conf file on "standby" in the $PGDATA directory
that reads as follows::

  standby_mode = 'on'
  primary_conninfo = 'host=127.0.0.1 port=5433'

Make sure that standby has a qualifying role in the database, "testdb" in this
case, and can log in. Start ``psql`` on the testdb database on "prime" and at
the ``testdb=#`` prompt type::

  CREATE ROLE standby SUPERUSER LOGIN;

Registering the master and standby
----------------------------------

First, register the master by typing on "prime"::

  repmgr -f /home/prime/repmgr/repmgr.conf --verbose master register

On "standby", edit the ``postgresql.conf`` file and change the port to 5434.

Start the "standby" server.

Register the standby by typing on "standby"::

  repmgr -f /home/standby/repmgr/repmgr.conf --verbose standby register

At this point, you have a functioning primary on "prime" and a functioning
standby server running on "standby".  You can confirm the master knows
about the standby, and that it is keeping it current, by running the
following on the master::

  psql -x -d testdb -c "SELECT * FROM repmgr_test.repl_status"

Some tests you might do at this point include:

* Insert some records into the primary server here, confirm they appear
  very quickly (within milliseconds) on the standby, and that the
  repl_status view advances accordingly.

* Verify that you can run queries against the standby server, but
  cannot make insertions into the standby database.  

Simulating the failure of the primary server
--------------------------------------------

To simulate the loss of the primary server, simply stop the "prime" server.
At this point, the standby contains the database as it existed at the time of
the "failure" of the primary server.

Promoting the Standby to be the Primary
---------------------------------------

Now you can promote the standby server to be the primary, to allow
applications to read and write to the database again, by typing::

  repmgr -f /home/standby/repmgr/repmgr.conf --verbose standby promote

The server restarts and now has read/write ability.

Bringing the former Primary up as a Standby
-------------------------------------------

To make the former primary act as a standby, which is necessary before
restoring the original roles, type::

  repmgr -U standby -R prime -h 127.0.0.1 -p 5433 -d testdb --force --verbose standby clone

Stop and restart the "prime" server, which is now acting as a standby server.

Make sure the record(s) inserted in the earlier step are still available on the
now standby (prime).  Confirm the database on "prime" is read-only.

Restoring the original roles of prime to primary and standby to standby
-----------------------------------------------------------------------

Now restore to the original configuration by stopping the
"standby" (now acting as a primary), promoting "prime" again to be the
primary server, then bringing up "standby" as a standby with a valid
``recovery.conf`` file on "standby".

Stop the "standby" server, then type the following on "prime" to promote it::

  repmgr -f /home/prime/repmgr/repmgr.conf standby promote

Now the original primary, "prime", is acting again as primary.

Start the "standby" server and type this on "prime"::

  repmgr standby clone --force -h 127.0.0.1 -p 5434 -U prime -R standby --verbose

Stop the "standby" and change the port to be 5434 in the ``postgresql.conf``
file.

Verify the roles have reversed by attempting to insert a record on "standby"
and on "prime".

The servers are now again acting as primary on "prime" and standby on "standby".

Configuration and command reference
===================================

Configuration File
------------------

By default, ``repmgr.conf`` is looked for in the directory that repmgr or
repmgrd is run from; use the ``-f`` option to point to a different location.
The configuration file should have 3 lines:

1. cluster: A string (single quoted) that identifies the cluster we are on

2. node: An integer that identifies our node in the cluster

3. conninfo: A string (single quoted) specifying how we can connect to this node's PostgreSQL service

repmgr
------

Command line syntax
~~~~~~~~~~~~~~~~~~~

The current supported syntax for the program can be seen using::

  repmgr --help
  
The output from this program looks like this::

  repmgr: Replicator manager 
  Usage:
   repmgr [OPTIONS] master  {register}
   repmgr [OPTIONS] standby {register|clone|promote|follow}

  General options:
    --help                     show this help, then exit
    --version                  output version information, then exit
    --verbose                  output verbose activity information

  Connection options:
    -d, --dbname=DBNAME        database to connect to
    -h, --host=HOSTNAME        database server host or socket directory
    -p, --port=PORT            database server port
    -U, --username=USERNAME    database user name to connect as

  Configuration options:
    -D, --data-dir=DIR         local directory where the files will be copied to
    -f, --config_file=PATH     path to the configuration file
    -R, --remote-user=USERNAME database server username for rsync
    -w, --wal-keep-segments=VALUE  minimum value for the GUC wal_keep_segments (default: 5000)
    -I, --ignore-rsync-warning ignore rsync partial transfer warning
    -F, --force                force potentially dangerous operations to happen

  repmgr performs some tasks like clone a node, promote it or making follow another node and then exits.
  COMMANDS:
   master register       - registers the master in a cluster
   standby register      - registers a standby in a cluster
   standby clone [node]  - allows creation of a new standby
   standby promote       - allows manual promotion of a specific standby into a new master in the event of a failover
   standby follow        - allows the standby to re-point itself to a new master

The ``--verbose`` option can be useful in troubleshooting issues with
the program.

repmgr commands
---------------

Not all of these commands need the ``repmgr.conf`` file, but they need to be able to
connect to the remote and local databases.

You can tell repmgr which is the remote database by using the -h parameter or
as the last parameter in standby clone and standby follow. If you need to specify
a port different than the default 5432, you can specify a -p parameter.
The standby is always considered to be localhost, and a second -p parameter will indicate
its port if it is different from the default one.

* master register

  * Registers a master in a cluster.  It needs to be executed before any
    standby nodes are registered.

* standby register

  * Registers a standby in a cluster.  It needs to be executed before
    repmgrd will function on the node.

* standby clone [node to be cloned] 

  * Does a backup via ``rsync`` of the data directory of the primary, and
    creates the recovery file we need to start a new hot standby server.
    It doesn't need the ``repmgr.conf`` so it can be executed anywhere on the
    new node.  You can change to the directory you want the new database
    cluster in and execute::

      ./repmgr standby clone node1

    or run from wherever you are with a full path::

      ./repmgr -D /path/to/new/data/directory standby clone node1

    That will make a backup of the primary; then you only need to start the server
    using a command like::

      pg_ctl -D /your_data_directory_path start

    Note that some installations will also redirect the output log file when
    executing ``pg_ctl``; check the server startup script you are using
    and try to match what it does.

* standby promote 

  * Allows manual promotion of a specific standby into a new primary in the
    event of a failover.  This needs to be executed in the directory
    where the standby's ``repmgr.conf`` is located, or you can use the ``-f`` option
    to indicate where the ``repmgr.conf`` is.  It doesn't need any
    additional arguments::

      ./repmgr standby promote

    That will restart your standby PostgreSQL service.

* standby follow 

  * Allows the standby to re-point itself to the new primary passed as a
    parameter.  This needs to be executed in the directory where the
    standby's ``repmgr.conf`` is located, or you can use the ``-f`` option
    to indicate where the ``repmgr.conf`` is.  Example::

      ./repmgr standby follow

repmgrd Daemon
--------------

Command line syntax
~~~~~~~~~~~~~~~~~~~

The current supported syntax for the program can be seen using::

  repmgrd --help
  
The output from this program looks like this::

  repmgrd: Replicator manager daemon 
  Usage:
   repmgrd [OPTIONS]
  
  Options:
    --help                    show this help, then exit
    --version                 output version information, then exit
    --verbose                 output verbose activity information
    -f, --config_file=PATH    database to connect to
  
  repmgrd monitors a cluster of servers.

The ``--verbose`` option can be useful in troubleshooting issues with
the program.

Usage
-----

repmgrd reads the ``repmgr.conf`` file in the current directory, or as
indicated with the -f parameter.  If run on a standby, it checks if that
standby is in ``repl_nodes`` and adds it if not.

Before you can run repmgrd you need to register a master in a cluster
using the ``MASTER REGISTER`` command.  If run on a master,
repmgrd will exit, as it has nothing to do there yet.  It is currently
only targeted at running on standby servers.  If converting
a former master into a standby, you will need to start repmgrd
in order to make it fully operational in its new role.

The repmgr daemon creates 2 connections: one to the master and another to the
standby.

Lag monitoring
--------------

repmgrd helps monitor a set of master and standby servers.  You can
see which node is the current master, as well as how far behind the
master each standby is.

To look at the current lag between the primary and each node listed
in ``repl_nodes``, consult the ``repl_status`` view::

  psql -d postgres -c "SELECT * FROM repmgr_test.repl_status"

This view shows the latest monitor info from every node.
 
* replication_lag: in bytes.  This is how far the latest xlog record 
  we have received is from master.

* apply_lag: in bytes.  This is how far the latest xlog record
  we have applied is from the latest record we have received.

* time_lag: in seconds.  How many seconds behind the master is this node.
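
As a sketch of how this view might feed a simple alert, the hypothetical
``lagging_standbys`` function below lists standbys whose time_lag exceeds a
threshold.  It assumes time_lag is an interval, as in the sample
``repl_status`` output earlier, and that the view lives in the ``pgbench``
database unless another name is given.

```shell
# lagging_standbys MAX_SECONDS [DBNAME]: print "node seconds" for each
# standby whose time_lag is above MAX_SECONDS.
# (Hypothetical helper, not part of repmgr.)
lagging_standbys() {
    # -At gives one "node|seconds" line per standby, with no headers
    psql -At -d "${2:-pgbench}" -c \
        "SELECT standby_node, EXTRACT(EPOCH FROM time_lag)::int FROM repmgr_test.repl_status" |
    while IFS='|' read -r node lag; do
        # Skip rows with no lag value, then compare against the threshold
        if [ -n "$lag" ] && [ "$lag" -gt "$1" ]; then
            echo "$node $lag"
        fi
    done
}

# Usage: report standbys more than five minutes behind
#   lagging_standbys 300
```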

Error codes
-----------

When the repmgr or repmgrd program exits, it will set one of the
following exit codes:

* SUCCESS 0:  Program ran successfully.
* ERR_BAD_CONFIG 1:  One of the configuration checks the program makes failed.
* ERR_BAD_RSYNC 2:  An rsync call made by the program returned an error.
* ERR_STOP_BACKUP 3:  A ``pg_stop_backup()`` call made by the program didn't succeed.
* ERR_NO_RESTART 4:  An attempt to restart a PostgreSQL instance failed.
* ERR_NEEDS_XLOG 5:  Could not create the ``pg_xlog`` directory when cloning.
* ERR_DB_CON 6:  Error when trying to connect to a database.
* ERR_DB_QUERY 7:  Error executing a database query.
* ERR_PROMOTED 8:  Exiting program because the node has been promoted to master.
* ERR_BAD_PASSWORD 9:  Password used to connect to a database was rejected.
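
In wrapper scripts it can help to turn these numbers back into names.  The
``repmgr_exit_name`` function below is a hypothetical helper built only from
the list above.

```shell
# repmgr_exit_name CODE: print the symbolic name for a repmgr/repmgrd exit code.
# (Hypothetical helper, not part of repmgr.)
repmgr_exit_name() {
    case "$1" in
        0) echo SUCCESS ;;
        1) echo ERR_BAD_CONFIG ;;
        2) echo ERR_BAD_RSYNC ;;
        3) echo ERR_STOP_BACKUP ;;
        4) echo ERR_NO_RESTART ;;
        5) echo ERR_NEEDS_XLOG ;;
        6) echo ERR_DB_CON ;;
        7) echo ERR_DB_QUERY ;;
        8) echo ERR_PROMOTED ;;
        9) echo ERR_BAD_PASSWORD ;;
        *) echo "UNKNOWN($1)" ;;
    esac
}

# Usage:
#   repmgr -f /var/lib/pgsql/repmgr/repmgr.conf standby register
#   status=$?
#   echo "repmgr exited with $status ($(repmgr_exit_name "$status"))"
```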

License and Contributions
=========================

repmgr is licensed under the GPL v3.  All of its code and documentation is
Copyright 2010-2011, 2ndQuadrant Limited.  See the files COPYRIGHT and LICENSE for
details.

Contributions to repmgr are welcome, and listed in the file CREDITS.
2ndQuadrant Limited requires that any contributions provide a copyright
assignment and a disclaimer of any work-for-hire ownership claims from the
employer of the developer.  This lets us make sure that all of the repmgr
distribution remains free code.  Please contact info@2ndQuadrant.com for a
copy of the relevant Copyright Assignment Form.

Code style
----------

Code in repmgr is formatted to a consistent style using the following command::

  astyle --style=ansi --indent=tab --suffix=none *.c *.h

Contributors should reformat their code similarly before submitting code to
the project, in order to minimize merge conflicts with other work.