===================================================
repmgr: Replication Manager for PostgreSQL clusters
===================================================
Introduction
============
PostgreSQL 9.0 allows us to have replicated Hot Standby servers
which we can query and/or use for high availability.
While the main components of the feature are included with
PostgreSQL, the user is expected to manage the high availability
part of it.
repmgr allows you to monitor and manage your replicated PostgreSQL
databases as a single cluster. repmgr includes two components:
* repmgr: command program that performs tasks and then exits
* repmgrd: management and monitoring daemon that watches the cluster
and can automate remote actions.
Requirements
------------
repmgr is currently intended for installation on UNIX-like systems that include
development tools such as ``gcc`` and ``gmake``. It also requires that the
``rsync`` utility is available in the PATH of the user running the repmgr
programs. Some operations also require PostgreSQL components such
as ``pg_config`` and ``pg_ctl`` to be in the PATH.
Introduction to repmgr commands
===============================
Suppose we have 3 nodes: node1 (the initial master), node2 and node3.
To make node2 and node3 be standbys of node1, execute this on both nodes
(node2 and node3)::
repmgr -D /var/lib/pgsql/9.0 standby clone node1
In order to get full monitoring and easier state transitions,
you register each of the nodes, by creating a ``repmgr.conf`` file
and executing commands like this on the appropriate nodes::
repmgr -f /var/lib/pgsql/repmgr/repmgr.conf --verbose master register
repmgr -f /var/lib/pgsql/repmgr/repmgr.conf --verbose standby register
Once everything is registered, you start the repmgrd daemon. It
will maintain a view showing the state of all the nodes in the cluster,
including how far they are lagging behind the master.
If you lose node1 you can then run this on node2::
repmgr -f /var/lib/pgsql/repmgr/repmgr.conf standby promote
To make node2 the new master. Then on node3 run::
repmgr -f /var/lib/pgsql/repmgr/repmgr.conf standby follow
To make node3 follow node2 (rather than node1).
If we now want to add a new node, we can prepare a new server (node4)
and run::
repmgr -D /var/lib/pgsql/9.0 standby clone node2
And if a previously failed node becomes available again, such as
the lost node1 above, you can get it to resynchronize by only copying
over the changes made while it was down. That happens with what's
called a forced clone, which overwrites existing data rather than
assuming it starts with an empty database directory tree::
repmgr -D /var/lib/pgsql/9.0 --force standby clone node2
This can be much faster than creating a brand new node that must
copy over every file in the database.
Installation Outline
====================
To install and use repmgr and repmgrd follow these steps:
1. Build repmgr programs
2. Set up trusted copy between postgres accounts, needed for the
``STANDBY CLONE`` step
3. Check your primary server is correctly configured
4. Write a suitable ``repmgr.conf`` for the node
5. Set up repmgrd to aid in failover transitions
Build repmgr programs
---------------------
There are two ways to build repmgr. Both place the binaries in the same location
as your postgres binaries, such as ``psql``. The first uses the PostgreSQL
extension building infrastructure (PGXS). For this method to work, you will
need the ``pg_config`` program available in your PATH. In some distributions of
PostgreSQL, this requires installing a separate development package in addition
to the basic server software. The second requires a full PostgreSQL source code
tree to install the program directly into.
Build repmgr programs - PGXS
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If you are using a packaged PostgreSQL build and have ``pg_config``
available, the package can be built and installed using PGXS::
tar xvzf repmgr-1.0.tar.gz
cd repmgr
make USE_PGXS=1
make USE_PGXS=1 install
This is preferred to building from the ``contrib`` subdirectory of the main
source code tree.
If you need to remove the source code temporary files from this directory,
that can be done like this::
make USE_PGXS=1 clean
See below for building notes specific to RedHat Linux variants.
Using a full source code tree
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In this method, the repmgr distribution is copied into the PostgreSQL source
code tree, assumed to be at ``${postgresql_sources}`` for this example.
The resulting subdirectory must be named ``contrib/repmgr``, without any
version number::
cp repmgr-1.0.tar.gz ${postgresql_sources}/contrib
cd ${postgresql_sources}/contrib
tar xvzf repmgr-1.0.tar.gz
cd repmgr
make
make install
If you need to remove the source code temporary files from this directory,
that can be done like this::
make clean
Notes on RedHat Linux, Fedora, and CentOS Builds
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The RPM packages of PostgreSQL put ``pg_config`` into the ``postgresql-devel``
package, not the main server one. And if you have an RPM install of PostgreSQL
9.0, the entire PostgreSQL binary directory will not be in your PATH by default
either. Individual utilities are made available via the ``alternatives``
mechanism, but not all commands will be wrapped that way. The files installed
by repmgr will certainly not be in the default PATH for the postgres user
on such a system. They will instead be in /usr/pgsql-9.0/bin/ on this
type of system.
When building repmgr against an RPM-packaged build, you may discover that some
development packages are needed as well. The following build errors can
occur::
/usr/bin/ld: cannot find -lxslt
/usr/bin/ld: cannot find -lpam
Install the following packages to correct those::
yum install libxslt-devel
yum install pam-devel
If you build repmgr as a regular user and then install into the system
directories using sudo, the syntax is awkward, because ``pg_config`` won't be in
root's path either. The following recipe should work::
sudo PATH="/usr/pgsql-9.0/bin:$PATH" make USE_PGXS=1 install
Issues with 32 and 64 bit RPMs
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
When building, you may receive a series of errors of this form::
/usr/bin/ld: skipping incompatible /usr/pgsql-9.0/lib/libpq.so when searching for -lpq
This is likely because you have both the 32 and 64 bit versions of the
``postgresql90-devel`` package installed. You can check that like this::
rpm -qa --queryformat '%{NAME}\t%{ARCH}\n' | grep postgresql90-devel
If two packages appear, one for i386 and one for x86_64, you have hit this
problem; the two are not supposed to be installed together.
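That check can also be scripted. The following is a minimal sketch, not part of
repmgr: the function name ``count_devel_arches`` is hypothetical, and it assumes
the tab-separated name and architecture output produced by the ``rpm`` query
shown above:

```shell
# Hypothetical helper: reads "NAME<TAB>ARCH" lines on stdin (the output of
# the rpm query shown above) and prints how many distinct architectures of
# the postgresql90-devel package are installed.
count_devel_arches() {
    awk -F'\t' '$1 == "postgresql90-devel" { print $2 }' | sort -u | wc -l
}
```

Piping the query into it, as in
``rpm -qa --queryformat '%{NAME}\t%{ARCH}\n' | count_devel_arches``, should
print 1 on a healthy system. A larger number indicates the mixed-architecture
situation described here.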
This can happen when using the PGDG repo to install that package;
here is an example session demonstrating the problem::
# yum install postgresql-devel
..
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package postgresql90-devel.i386 0:9.0.2-2PGDG.rhel5 set to be updated
---> Package postgresql90-devel.x86_64 0:9.0.2-2PGDG.rhel5 set to be updated
--> Finished Dependency Resolution
Dependencies Resolved
=========================================================================
Package Arch Version Repository Size
=========================================================================
Installing:
postgresql90-devel i386 9.0.2-2PGDG.rhel5 pgdg90 1.5 M
postgresql90-devel x86_64 9.0.2-2PGDG.rhel5 pgdg90 1.6 M
Note how both the i386 and x86_64 platform architectures are selected for
installation. Your main PostgreSQL package will only be compatible with one of
those, and if the repmgr build finds the wrong ``postgresql90-devel``, these
"skipping incompatible" messages appear.
In this case, you can temporarily remove both packages, then just install the
correct one for your architecture. Example::
rpm -e postgresql90-devel --allmatches
yum install postgresql90-devel-9.0.2-2PGDG.rhel5.x86_64
Just deleting the package for the wrong platform instead might not leave behind
the correct files, because of the way the two packages interact.
If you already tried to build repmgr before doing this, you'll need to do::
make USE_PGXS=1 clean
To get rid of leftover files from the wrong architecture.
Notes on Ubuntu, Debian or other Debian-based Builds
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The Debian packages of PostgreSQL put ``pg_config`` into the development package
called ``postgresql-server-dev-$version``.
When building repmgr against a Debian-packaged build, you may discover that some
development packages are needed as well. You will need the following development
packages installed::
sudo apt-get install libxslt-dev libxml2-dev libpam-dev libedit-dev
If you're using Debian packages for PostgreSQL and are building repmgr with the
USE_PGXS option you also need to install the corresponding development package::
sudo apt-get install postgresql-server-dev-9.0
If you build and install repmgr manually it will not be on the system path. The
binaries will be installed in /usr/lib/postgresql/$version/bin/ which is not on
the default path. The reason behind this is that Ubuntu/Debian systems manage
multiple installed versions of PostgreSQL on the same system through a wrapper
called pg_wrapper and repmgr is not (yet) known to this wrapper.
You can solve this in many different ways; the most Debian-like is to create an
alternative for repmgr and repmgrd::
sudo update-alternatives --install /usr/bin/repmgr repmgr /usr/lib/postgresql/9.0/bin/repmgr 10
sudo update-alternatives --install /usr/bin/repmgrd repmgrd /usr/lib/postgresql/9.0/bin/repmgrd 10
You can also make a deb package of repmgr using::
make USE_PGXS=1 deb
This will build a Debian package one level up from where you build, normally the
same directory that you have your repmgr/ directory in.
Confirm software was built correctly
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You should now find the repmgr programs available in the subdirectory where
the rest of your PostgreSQL installation is. You can confirm the software
is available by checking its version::
repmgr --version
repmgrd --version
You may need to include the full path of the binary instead, such as this
RHEL example::
/usr/pgsql-9.0/bin/repmgr --version
/usr/pgsql-9.0/bin/repmgrd --version
Or in this Debian example::
/usr/lib/postgresql/9.0/bin/repmgr --version
/usr/lib/postgresql/9.0/bin/repmgrd --version
Below, this binary installation directory is referred to as PGDIR.
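If you are unsure which directory that is on a given system, the candidate
locations mentioned above can be probed in turn. This is a minimal sketch;
``find_pgdir`` is a hypothetical helper name and not something repmgr provides:

```shell
# Hypothetical helper: print the first directory among the arguments that
# contains an executable named repmgr; return non-zero if none does.
find_pgdir() {
    for dir in "$@"; do
        if [ -x "$dir/repmgr" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}
```

For example,
``PGDIR=$(find_pgdir "$(pg_config --bindir)" /usr/pgsql-9.0/bin /usr/lib/postgresql/9.0/bin)``
tries the directory reported by ``pg_config`` first, then the RHEL and Debian
locations used in the examples above.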
Set up trusted copy between postgres accounts
---------------------------------------------
Initial copy between nodes uses the rsync program running over ssh. For this
to work, the postgres accounts on each system need to be able to access files
on their partner node without a password.
First generate an ssh key, using an empty passphrase, and copy the resulting
keys and a matching authorization file to a privileged user on the other system::
[postgres@node1]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/var/lib/pgsql/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /var/lib/pgsql/.ssh/id_rsa.
Your public key has been saved in /var/lib/pgsql/.ssh/id_rsa.pub.
The key fingerprint is:
aa:bb:cc:dd:ee:ff:aa:11:22:33:44:55:66:77:88:99 postgres@db1.domain.com
[postgres@node1]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[postgres@node1]$ chmod go-rwx ~/.ssh/*
[postgres@node1]$ cd ~/.ssh
[postgres@node1]$ scp id_rsa.pub id_rsa authorized_keys postgres@node2:
Log in as a user on the other system, and install the files into the postgres
user's account::
[user@node2 ~]$ sudo chown postgres.postgres authorized_keys id_rsa.pub id_rsa
[user@node2 ~]$ sudo mkdir -p ~postgres/.ssh
[user@node2 ~]$ sudo chown postgres.postgres ~postgres/.ssh
[user@node2 ~]$ sudo mv authorized_keys id_rsa.pub id_rsa ~postgres/.ssh
[user@node2 ~]$ sudo chmod -R go-rwx ~postgres/.ssh
Now test that ssh in both directions works. You may have to accept some new
known hosts in the process.
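Because ssh refuses to use a private key that other users can read, it can help to check the permissions before testing the connection. The following is a minimal Python sketch, not part of repmgr; the function name ``loose_permission_files`` is hypothetical, and it tests the same group/other bits that ``chmod go-rwx`` clears:

```python
import os
import stat


def loose_permission_files(ssh_dir):
    """Return names in ssh_dir that grant any group/other permission bits."""
    loose = []
    for name in sorted(os.listdir(ssh_dir)):
        mode = os.stat(os.path.join(ssh_dir, name)).st_mode
        # Same bits that `chmod go-rwx` clears: rwx for group and for others.
        if mode & (stat.S_IRWXG | stat.S_IRWXO):
            loose.append(name)
    return loose
```

Calling ``loose_permission_files(os.path.expanduser("~/.ssh"))`` as the postgres user on each node should return an empty list once the steps above are complete.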
Primary server configuration
----------------------------
PostgreSQL should have been previously built and installed on the system. Here
is a sample of changes to the ``postgresql.conf`` file::
listen_addresses='*'
wal_level = 'hot_standby'
archive_mode = on
archive_command = 'cd .' # we can also use exit 0, anything that
# just does nothing
max_wal_senders = 10
wal_keep_segments = 5000 # 80 GB required on pg_xlog
hot_standby = on
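The "80 GB" figure next to ``wal_keep_segments`` follows from the default WAL segment size of 16 MB. A small Python sketch of the arithmetic, assuming that default segment size (the helper name ``wal_keep_bytes`` is made up for illustration):

```python
WAL_SEGMENT_BYTES = 16 * 1024 * 1024  # assumed default WAL segment size (16 MB)


def wal_keep_bytes(segments, segment_bytes=WAL_SEGMENT_BYTES):
    """Disk space needed to retain `segments` WAL files."""
    return segments * segment_bytes


needed = wal_keep_bytes(5000)
# 5000 segments x 16 MB = 80,000 MB, hence the rounded "80 GB" in the comment.
print(needed // 2**20, "MB")
```

If ``pg_xlog`` sits on a smaller volume, lowering ``wal_keep_segments`` reduces the requirement proportionally.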
You also need to add the machines that will participate in the cluster to the
``pg_hba.conf`` file. One possibility is to trust all connections from
all internal addresses, including replication connections, such as::
host all all 192.168.1.0/24 trust
host replication all 192.168.1.0/24 trust
A more secure setup adds a repmgr user and database, just giving
access to that user::
host repmgr repmgr 192.168.1.0/24 trust
host replication all 192.168.1.0/24 trust
If you give a password to the user, you need to create a ``.pgpass`` file for
them as well to allow automatic login. In this case you might use the
``md5`` authentication method instead of ``trust`` for the repmgr user.
Don't forget to restart the database server after making all these changes.
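For the password case mentioned above, each ``.pgpass`` line has the form ``hostname:port:database:username:password``, and libpq ignores the file unless its permissions are restricted to the owner. The following Python sketch writes such a file; the helper names and the example password are hypothetical:

```python
import os


def pgpass_line(host, port, database, user, password):
    """Format one .pgpass entry, backslash-escaping ':' and backslashes in fields."""
    def esc(field):
        return str(field).replace("\\", "\\\\").replace(":", "\\:")
    return ":".join(esc(f) for f in (host, port, database, user, password))


def write_pgpass(path, lines):
    """Write the entries to `path` with mode 0600."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        fh.write("\n".join(lines) + "\n")
    os.chmod(path, 0o600)  # enforce the mode even if the file already existed
```

For example, ``write_pgpass(os.path.expanduser("~/.pgpass"), [pgpass_line("node1", 5432, "repmgr", "repmgr", "example-password")])`` run as the postgres user on each standby would let the repmgr role log in to "node1" without a prompt.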
Usage walkthrough
=================
This assumes you've already followed the steps in "Installation Outline" to
install repmgr and repmgrd on the system.
A production installation of ``repmgr`` will normally involve two
different systems running PostgreSQL on the same port, typically the default
of 5432, with both using files owned by the ``postgres`` user account. This
walkthrough assumes the following setup:
* A primary (master) server called "node1," running as the "postgres" user
who is also the owner of the files. This server is operating on port 5432. This
server will be known as "node1" in the cluster "test".
* A secondary (standby) server called "node2," running as the "postgres" user
who is also the owner of the files. This server is operating on port 5432. This
server will be known as "node2" in the cluster "test".
* Another standby server called "node3" with a similar configuration to "node2".
* The PostgreSQL data directory on each of the above is defined as $PGDATA,
which is represented here as ``/var/lib/pgsql/9.0/data``
Creating some sample data
-------------------------
If you already have a database with useful data to replicate, you can
skip this step and use it instead. But if you do not already have
data in this cluster to replicate, you can create some like this::
createdb pgbench
pgbench -i -s 10 pgbench
Examples below will use the database name ``pgbench`` to match this.
Substitute the name of your database instead. Note that the standby
nodes created here will include information for every database in the
cluster, not just the specified one. Needing the database name is
mainly for user authentication purposes.
Setting up a repmgr user
------------------------
Make sure that the "repmgr" user has a role in the database, "pgbench" in this
case, and can log in. On "node1"::
createuser --login --superuser repmgr
Alternatively you could start ``psql`` on the pgbench database on "node1" and at
the pgbench=# prompt type::
CREATE ROLE repmgr SUPERUSER LOGIN;
The main advantage of the latter is that you can do it remotely to any
system you already have superuser access to.
Clearing the PostgreSQL installation on the Standby
---------------------------------------------------
To set up a new streaming replica, start by removing any PostgreSQL
installation on the existing standby nodes.
* Stop any server on "node2" and "node3". You can check for running database
servers using a command like this::
ps -eaf | grep postgres
And looking for the various database server processes: server, logger,
wal writer, and autovacuum launcher.
* Go to "node2" and "node3" database directories and remove the PostgreSQL installation::
cd $PGDATA
rm -rf *
This will delete the entire database installation in ``/var/lib/pgsql/9.0/data``.
Be careful that $PGDATA is defined here; executing ``ls`` to confirm you're
in the right place is always a good idea before executing ``rm``.
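The check described above can also be scripted. This is a minimal Python sketch, assuming the directory being cleared is an initialized data directory (which always contains a ``PG_VERSION`` file); the function name ``looks_like_data_dir`` is hypothetical:

```python
import os


def looks_like_data_dir(path):
    """True if path is set, is a directory, and holds a PG_VERSION file."""
    if not path:  # unset or empty $PGDATA
        return False
    return os.path.isdir(path) and os.path.isfile(os.path.join(path, "PG_VERSION"))
```

Running ``looks_like_data_dir(os.environ.get("PGDATA"))`` before the ``rm`` guards against the case where $PGDATA is empty and ``cd`` silently lands in the home directory.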
Testing remote access to the master
-----------------------------------
On the "node2" server, first test that you can connect to "node1" the
way repmgr will by executing::
psql -h node1 -U repmgr -d pgbench
Possible sources for a problem here include:
* Login role specified was not created on "node1"
* The database configuration on "node1" is not listening on a TCP/IP port.
That could be because the ``listen_addresses`` parameter was not updated,
or because it was updated but the server wasn't restarted afterwards. You can
test this on "node1" itself the same way::
psql -h node1 -U repmgr -d pgbench
The "-h" parameter forces a connection over TCP/IP, rather
than the default UNIX socket method.
* There is a firewall setup that prevents incoming access to the
PostgreSQL port (defaulting to 5432) used to access "node1". In
this situation you would be able to connect to the "node1" server
on itself, but not from any other host, and you'd just get a timeout
when trying rather than a proper error message.
* The ``pg_hba.conf`` file does not list appropriate statements to allow
this user to log in. In this case you should reach the server,
but see an error message mentioning ``pg_hba.conf``.
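To tell these cases apart before involving ``psql``, a plain TCP probe is often enough. The Python sketch below is not part of repmgr and the name ``probe_port`` is hypothetical. A result of "refused" points at ``listen_addresses`` or a stopped server, "timeout" suggests a firewall, and "open" means the remaining suspects are the role and ``pg_hba.conf``:

```python
import socket


def probe_port(host, port, timeout=3.0):
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout' or 'error'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        # Anything else, e.g. the host name does not resolve.
        return "error"
```

For the walkthrough setup you would call ``probe_port("node1", 5432)`` from "node2".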
Cloning the standby
-------------------
With the "node1" server running, we want to use the ``standby clone`` command
in repmgr to copy over the entire PostgreSQL database cluster onto the
"node2" server. Execute the clone process with::
repmgr -D $PGDATA -d pgbench -p 5432 -U repmgr -R postgres --verbose standby clone node1
Here "-U" specifies the database user to connect to the master as, while
"-R" specifies what user to run the rsync command as. Potentially you
could leave out one or both of these, in situations where the user and/or
role setup is the same on each node.
If this fails with an error message about accessing the master database,
you should return to the previous step and confirm access to "node1"
from "node2" with ``psql``, using the same parameters given to repmgr.
NOTE: you need to have $PGDIR/bin (where the PostgreSQL binaries are installed)
in your path for the above to work. If you don't want that as a permanent
setting, you can temporarily set it before running individual commands like
this::
PATH=$PGDIR/bin:$PATH repmgr -D $PGDATA ...
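If you script the clone step, it can be easier to assemble the command from its parts, since "-U" and "-R" are optional. The sketch below uses only options that appear in this document; the function name ``clone_command`` and its parameter names are hypothetical:

```python
import shlex


def clone_command(master_host, data_dir, dbname=None, port=None,
                  db_user=None, rsync_user=None, verbose=False, force=False):
    """Build the argv list for `repmgr ... standby clone <master_host>`."""
    argv = ["repmgr", "-D", data_dir]
    for flag, value in (("-d", dbname), ("-p", port),
                        ("-U", db_user), ("-R", rsync_user)):
        if value is not None:  # omitted options are simply left off the command line
            argv += [flag, str(value)]
    if verbose:
        argv.append("--verbose")
    if force:
        argv.append("--force")
    return argv + ["standby", "clone", master_host]


print(shlex.join(clone_command("node1", "/var/lib/pgsql/9.0/data",
                               dbname="pgbench", port=5432,
                               db_user="repmgr", rsync_user="postgres",
                               verbose=True)))
```

The returned list can be passed to ``subprocess.run`` directly, which avoids shell quoting problems with unusual paths.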
Setup repmgr configuration file
-------------------------------
Create a directory on each node to store its repmgr configuration.
Each node in the cluster needs its own ``repmgr.conf`` file there.
Here we'll assume it is stored in ``/var/lib/pgsql/repmgr/repmgr.conf``,
following the standard directory structure of a RHEL system. On "node1" it should contain::
cluster=test
node=1
conninfo='host=node1 user=repmgr dbname=pgbench'
On "node2" create the file ``/var/lib/pgsql/repmgr/repmgr.conf`` with::
cluster=test
node=2
conninfo='host=node2 user=repmgr dbname=pgbench'
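Since the files differ only in the node number and host name, they can be generated rather than typed. A minimal Python sketch, with a hypothetical ``render_repmgr_conf`` helper, that prints the configuration for the three nodes of this walkthrough:

```python
def render_repmgr_conf(cluster, node_id, host, user, dbname):
    """Return the three-line repmgr.conf text for one node."""
    conninfo = f"host={host} user={user} dbname={dbname}"
    return f"cluster={cluster}\nnode={node_id}\nconninfo='{conninfo}'\n"


for node_id, host in enumerate(["node1", "node2", "node3"], start=1):
    print(f"# {host}")
    print(render_repmgr_conf("test", node_id, host, "repmgr", "pgbench"))
```

Each node's text would then be saved as ``/var/lib/pgsql/repmgr/repmgr.conf`` on that node.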
The STANDBY CLONE process should have created a recovery.conf file on
"node2" in the $PGDATA directory that reads as follows::
standby_mode = 'on'
primary_conninfo = 'host=node1 port=5432'
Registering the master and standby
----------------------------------
First, register the master by typing on "node1"::
repmgr -f /var/lib/pgsql/repmgr/repmgr.conf --verbose master register
Then start the standby server on "node2".
You could now register the standby by typing on "node2"::
repmgr -f /var/lib/pgsql/repmgr/repmgr.conf --verbose standby register
However, you can instead start repmgrd::
repmgrd -f /var/lib/pgsql/repmgr/repmgr.conf --verbose > /var/lib/pgsql/repmgr/repmgr.log 2>&1
This will automatically register your standby system. Eventually
you need repmgrd running anyway, to save lag monitoring information.
repmgrd will log the daemon activity to the listed file. You can
watch what it is doing with::
tail -f /var/lib/pgsql/repmgr/repmgr.log
Hit control-C to exit this tail command when you are done.
Monitoring and testing
----------------------
At this point, you have a functioning primary on "node1" and a functioning
standby server running on "node2". You can confirm the master knows
about the standby, and that it is keeping it current, by looking at
``repl_status``::
postgres@node2 $ psql -x -d pgbench -c "SELECT * FROM repmgr_test.repl_status"
-[ RECORD 1 ]-------------+------------------------------
primary_node | 1
standby_node | 2
last_monitor_time | 2011-02-23 08:19:39.791974-05
last_wal_primary_location | 0/1902D5E0
last_wal_standby_location | 0/1902D5E0
replication_lag | 0 bytes
apply_lag | 0 bytes
time_lag | 00:26:13.30293
Some tests you might do at this point include:
* Insert some records into the primary server here, confirm they appear
very quickly (within milliseconds) on the standby, and that the
repl_status view advances accordingly.
* Verify that you can run queries against the standby server, but
cannot make insertions into the standby database.
Simulating the failure of the primary server
--------------------------------------------
To simulate the loss of the primary server, simply stop the "node1" server.
At this point, the standby contains the database as it existed at the time of
the "failure" of the primary server. If looking at ``repl_status`` on
"node2", you should see the time_lag value increase the longer "node1"
is down.
Promoting the Standby to be the Primary
---------------------------------------
Now you can promote the standby server to be the primary, to allow
applications to read and write to the database again, by typing::
repmgr -f /var/lib/pgsql/repmgr/repmgr.conf --verbose standby promote
The server restarts and now has read/write ability.
Bringing the former Primary up as a Standby
-------------------------------------------
To make the former primary act as a standby, which is necessary before
restoring the original roles, type the following on node1::
repmgr -D $PGDATA -d pgbench -p 5432 -U repmgr -R postgres --verbose --force standby clone node2
Then start the "node1" server, which is now acting as a standby server.
Make sure the record(s) inserted in the earlier step are still available on the
new standby ("node1"). Confirm the database on "node1" is read-only.
Restoring the original roles of node1 to primary and node2 to standby
-----------------------------------------------------------------------
Now restore to the original configuration by stopping
"node2" (now acting as a primary), promoting "node1" again to be the
primary server, then bringing up "node2" as a standby with a valid
``recovery.conf`` file.
Stop the "node2" server, then type the following on "node1" to promote it::
repmgr -f /var/lib/pgsql/repmgr/repmgr.conf standby promote
Now the original primary, "node1" is acting again as primary.
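With the "node2" server still stopped, type this on "node2" to re-clone it from "node1"::
repmgr -D $PGDATA -d pgbench -p 5432 -U repmgr -R postgres --verbose --force standby clone node1
Then start the "node2" server, which is acting as a standby again.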
Verify the roles have reversed by attempting to insert a record on "node2"
and on "node1".
The servers are now again acting as primary on "node1" and standby on "node2".
Alternate setup: both servers on one host
==========================================
Another test setup assumes you might be using the default installation of
PostgreSQL on port 5432 for some other purpose, and instead relocates these
instances onto different ports running as different users. In places where
``127.0.0.1`` is used as a host name, a more traditional configuration
would instead use the name of the relevant host for that parameter.
You can usually leave out changes to the port number in this case too.
* A primary (master) server called "prime", with a user named "prime" who is
also the owner of the files. This server is operating on port 5433. This
server will be known as "node1" in the cluster "test".
* A standby server called "standby", with a user named "standby" who is the
owner of the files. This server is operating on port 5434. This server
will be known as "node2" in the cluster "test".
* A database exists on "prime" called "testdb".
* The PostgreSQL data directory in each of the above is defined as $PGDATA,
which is represented here with ``/data/prime`` for the "prime" server and
``/data/standby`` for the "standby" server.
You might set up such an installation by adjusting the login script for the
"prime" and "standby" users as in these two examples::
# prime
PGDATA=/data/prime
PGENGINE=/usr/pgsql-9.0/bin
PGPORT=5433
export PGDATA PGENGINE PGPORT
PATH="$PATH:$PGENGINE"
# standby
PGDATA=/data/standby
PGENGINE=/usr/pgsql-9.0/bin
PGPORT=5434
export PGDATA PGENGINE PGPORT
PATH="$PATH:$PGENGINE"
And then starting/stopping each installation as needed using the ``pg_ctl``
utility.
Note: naming your nodes based on their starting role is not a recommended
best practice! As you'll see in this example, once there is a failover, names
strongly associated with one particular role (primary or standby) can become
confusing, once that node no longer has that role. Future versions of this
walkthrough are expected to use more generic terminology for these names.
Clearing the PostgreSQL installation on the Standby
---------------------------------------------------
To set up a streaming replica, first strip away any PostgreSQL installation on the existing replica:
* Stop both servers.
* Go to "standby" database directory and remove the PostgreSQL installation::
cd $PGDATA
rm -rf *
This will delete the entire database installation in ``/data/standby``.
Building the standby
--------------------
Create a directory on each node to store its repmgr configuration.
Each node in the cluster needs its own ``repmgr.conf`` file there.
For "prime" we'll assume this is stored in ``/home/prime/repmgr/repmgr.conf``
and it should contain::
cluster=test
node=1
conninfo='host=127.0.0.1 dbname=testdb'
On "standby" create the file ``/home/standby/repmgr/repmgr.conf`` with::
cluster=test
node=2
conninfo='host=127.0.0.1 dbname=testdb'
Next, with the "prime" server running, we want to use the ``standby clone`` command
in repmgr to copy over the entire PostgreSQL database cluster onto the
"standby" server. On the "standby" server, type::
repmgr -D $PGDATA -p 5433 -U prime -R prime --verbose standby clone localhost
Next, we need a recovery.conf file on "standby" in the $PGDATA directory
that reads as follows::
standby_mode = 'on'
primary_conninfo = 'host=127.0.0.1 port=5433'
Make sure that standby has a qualifying role in the database, "testdb" in this
case, and can log in. Start ``psql`` on the testdb database on "prime" and at
the testdb=# prompt type::
CREATE ROLE standby SUPERUSER LOGIN;
Registering the master and standby
----------------------------------
First, register the master by typing on "prime"::
repmgr -f /home/prime/repmgr/repmgr.conf --verbose master register
On "standby," edit the ``postgresql.conf`` file and change the port to 5434.
Start the "standby" server.
Register the standby by typing on "standby"::
repmgr -f /home/standby/repmgr/repmgr.conf --verbose standby register
At this point, you have a functioning primary on "prime" and a functioning
standby server running on "standby." You can confirm the master knows
about the standby, and that it is keeping it current, by running the
following on the master::
psql -x -d testdb -c "SELECT * FROM repmgr_test.repl_status"
Some tests you might do at this point include:
* Insert some records into the primary server here, confirm they appear
very quickly (within milliseconds) on the standby, and that the
repl_status view advances accordingly.
* Verify that you can run queries against the standby server, but
cannot make insertions into the standby database.
Simulating the failure of the primary server
--------------------------------------------
To simulate the loss of the primary server, simply stop the "prime" server.
At this point, the standby contains the database as it existed at the time of
the "failure" of the primary server.
Promoting the Standby to be the Primary
---------------------------------------
Now you can promote the standby server to be the primary, to allow
applications to read and write to the database again, by typing::
repmgr -f /home/standby/repmgr/repmgr.conf --verbose standby promote
The server restarts and now has read/write ability.
Bringing the former Primary up as a Standby
-------------------------------------------
To make the former primary act as a standby, which is necessary before
restoring the original roles, type::
repmgr -U standby -R prime -h 127.0.0.1 -p 5433 -d testdb --force --verbose standby clone
Stop and restart the "prime" server, which is now acting as a standby server.
Make sure the record(s) inserted in the earlier step are still available on the
new standby (prime). Confirm the database on "prime" is read-only.
Restoring the original roles of prime to primary and standby to standby
-----------------------------------------------------------------------
Now restore to the original configuration by stopping the
"standby" (now acting as a primary), promoting "prime" again to be the
primary server, then bringing up "standby" as a standby with a valid
``recovery.conf`` file on "standby".
Stop the "standby" server, then type the following on "prime" to promote it::
repmgr -f /home/prime/repmgr/repmgr.conf standby promote
Now the original primary, "prime" is acting again as primary.
Start the "standby" server and type this on "prime"::
repmgr standby clone --force -h 127.0.0.1 -p 5434 -U prime -R standby --verbose
Stop the "standby" and change the port to be 5434 in the ``postgresql.conf``
file.
Verify the roles have reversed by attempting to insert a record on "standby"
and on "prime."
The servers are now again acting as primary on "prime" and standby on "standby".
Configuration and command reference
===================================
Configuration File
------------------
By default, ``repmgr.conf`` is looked for in the directory repmgr or repmgrd is run from.
The configuration file should have 3 lines:
1. cluster: A string (single quoted) that identifies the cluster we are on
2. node: An integer that identifies our node in the cluster
3. conninfo: A string (single quoted) specifying how we can connect to this node's PostgreSQL service
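As an illustration of this format, the following Python sketch reads the three settings and checks that each is present. It is not repmgr's own parser; the function name ``parse_repmgr_conf`` is hypothetical, and it makes the simplifying assumption of one ``key=value`` setting per line with optional single quotes around the value:

```python
def parse_repmgr_conf(text):
    """Parse key=value lines into a dict, checking the three documented keys."""
    conf = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Split on the first '=' only; conninfo values contain '=' themselves.
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] == "'":
            value = value[1:-1]  # drop surrounding single quotes
        conf[key.strip()] = value
    missing = {"cluster", "node", "conninfo"} - conf.keys()
    if missing:
        raise ValueError(f"missing settings: {sorted(missing)}")
    conf["node"] = int(conf["node"])  # node must be an integer
    return conf
```

A check like this can catch a mistyped file before repmgr or repmgrd exits with ERR_BAD_CONFIG.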
repmgr
------
Command line syntax
~~~~~~~~~~~~~~~~~~~
The current supported syntax for the program can be seen using::
repmgr --help
The output from this program looks like this::
repmgr: Replicator manager
Usage:
repmgr [OPTIONS] master {register}
repmgr [OPTIONS] standby {register|clone|promote|follow}
General options:
--help show this help, then exit
--version output version information, then exit
--verbose output verbose activity information
Connection options:
-d, --dbname=DBNAME database to connect to
-h, --host=HOSTNAME database server host or socket directory
-p, --port=PORT database server port
-U, --username=USERNAME database user name to connect as
Configuration options:
-D, --data-dir=DIR local directory where the files will be copied to
-f, --config_file=PATH path to the configuration file
-R, --remote-user=USERNAME database server username for rsync
-w, --wal-keep-segments=VALUE minimum value for the GUC wal_keep_segments (default: 5000)
-I, --ignore-rsync-warning ignore rsync partial transfer warning
-F, --force force potentially dangerous operations to happen
repmgr performs some tasks like clone a node, promote it or making follow another node and then exits.
COMMANDS:
master register - registers the master in a cluster
standby register - registers a standby in a cluster
standby clone [node] - allows creation of a new standby
standby promote - allows manual promotion of a specific standby into a new master in the event of a failover
standby follow - allows the standby to re-point itself to a new master
The ``--verbose`` option can be useful in troubleshooting issues with
the program.
repmgr commands
---------------
Not all of these commands need the ``repmgr.conf`` file, but they need to be able to
connect to the remote and local databases.
You can tell repmgr which database is the remote one by using the -h parameter or
by passing it as the last parameter in standby clone and standby follow. If you need to
specify a port different than the default 5432 you can specify a -p parameter.
The standby is always considered to be localhost, and a second -p parameter will indicate
its port if it is different from the default one.
* master register
* Registers a master in a cluster. It needs to be executed before any
standby nodes are registered.
* standby register
* Registers a standby in a cluster. It needs to be executed before
repmgrd will function on the node.
* standby clone [node to be cloned]
* Does a backup via ``rsync`` of the data directory of the primary, and
creates the recovery file needed to start a new hot standby server.
It doesn't need the ``repmgr.conf``, so it can be executed anywhere on the
new node. You can change to the directory you want the new database
cluster in and execute::
./repmgr standby clone node1
or run from wherever you are with a full path::
./repmgr -D /path/to/new/data/directory standby clone node1
That will make a backup of the primary; then you only need to start the server
using a command like::
pg_ctl -D /your_data_directory_path start
Note that some installations will also redirect the output log file when
executing ``pg_ctl``; check the server startup script you are using
and try to match what it does.
* standby promote
* Allows manual promotion of a specific standby into a new primary in the
event of a failover. This needs to be executed in the directory that
holds the standby's ``repmgr.conf``, or you can use the ``-f`` option
to indicate where the ``repmgr.conf`` is. It doesn't need any
additional arguments::
./repmgr standby promote
That will restart your standby postgresql service.
* standby follow
* Allows the standby to re-point itself to the new primary passed as a
parameter. This needs to be executed in the directory that holds the
standby's ``repmgr.conf``, or you can use the ``-f`` option
to indicate where the ``repmgr.conf`` is. Example::
./repmgr standby follow
repmgrd Daemon
--------------
Command line syntax
~~~~~~~~~~~~~~~~~~~
The current supported syntax for the program can be seen using::
repmgrd --help
The output from this program looks like this::
repmgrd: Replicator manager daemon
Usage:
repmgrd [OPTIONS]
Options:
--help show this help, then exit
--version output version information, then exit
--verbose output verbose activity information
-f, --config_file=PATH database to connect to
repmgrd monitors a cluster of servers.
The ``--verbose`` option can be useful in troubleshooting issues with
the program.
Usage
-----
repmgrd reads the ``repmgr.conf`` file in the current directory, or the one
indicated with the -f parameter. If run on a standby, it checks if that
standby is in ``repl_nodes`` and adds it if not.
Before you can run repmgrd you need to register a master in a cluster
using the ``MASTER REGISTER`` command. If run on a master,
repmgrd will exit, as it has nothing to do on a master yet. It is currently
only targeted at running on standby servers. If converting
a former master into a standby, you will need to start repmgrd
in order to make it fully operational in its new role.
The repmgr daemon creates 2 connections: one to the master and another to the
standby.
Lag monitoring
--------------
repmgrd helps monitor a set of master and standby servers. You can
see which node is the current master, as well as how far behind each
standby is.
To look at the current lag between the primary and each node listed
in ``repl_nodes``, consult the ``repl_status`` view::
psql -d postgres -c "SELECT * FROM repmgr_test.repl_status"
This view shows the latest monitor info from every node.
* replication_lag: in bytes. This is how far the latest xlog record
we have received is from master.
* apply_lag: in bytes. This is how far the latest xlog record
we have applied is from the latest record we have received.
* time_lag: in seconds. How many seconds behind the master is this node.
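The byte lags above are differences between WAL locations such as ``0/1902D5E0``. The Python sketch below shows one way to do that arithmetic by hand. It assumes the PostgreSQL 9.0-era layout, in which each xlog id covers 255 segments of 16 MB; later PostgreSQL releases map locations to bytes differently, and this is not necessarily how repmgrd itself computes the values. The function names are hypothetical:

```python
BYTES_PER_XLOGID = 255 * 16 * 1024 * 1024  # assumption: 255 segments of 16 MB per xlog id


def wal_location_to_bytes(location):
    """Convert an 'XLOGID/XRECOFF' hex pair such as '0/1902D5E0' to a byte position."""
    xlogid, xrecoff = (int(part, 16) for part in location.split("/"))
    return xlogid * BYTES_PER_XLOGID + xrecoff


def lag_bytes(ahead, behind):
    """Bytes by which position `behind` trails position `ahead`."""
    return wal_location_to_bytes(ahead) - wal_location_to_bytes(behind)
```

With this, replication_lag corresponds to ``lag_bytes(last_wal_primary_location, last_wal_standby_location)``; two identical locations give a lag of 0 bytes, as in the sample ``repl_status`` output earlier in this document.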
Error codes
-----------
When the repmgr or repmgrd program exits, it will set one of the
following exit codes:
* SUCCESS 0: Program ran successfully.
* ERR_BAD_CONFIG 1: One of the configuration checks the program makes failed.
* ERR_BAD_RSYNC 2: An rsync call made by the program returned an error.
* ERR_STOP_BACKUP 3: A ``pg_stop_backup()`` call made by the program didn't succeed.
* ERR_NO_RESTART 4: An attempt to restart a PostgreSQL instance failed.
* ERR_NEEDS_XLOG 5: Could not create the ``pg_xlog`` directory when cloning.
* ERR_DB_CON 6: Error when trying to connect to a database.
* ERR_DB_QUERY 7: Error executing a database query.
* ERR_PROMOTED 8: Exiting program because the node has been promoted to master.
* ERR_BAD_PASSWORD 9: Password used to connect to a database was rejected.
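Wrapper scripts that call repmgr can turn these numbers back into names. A minimal Python sketch using the values listed above; the class name ``RepmgrExit`` and the helper ``describe_exit`` are hypothetical:

```python
from enum import IntEnum


class RepmgrExit(IntEnum):
    """Exit codes as listed in this document."""
    SUCCESS = 0
    ERR_BAD_CONFIG = 1
    ERR_BAD_RSYNC = 2
    ERR_STOP_BACKUP = 3
    ERR_NO_RESTART = 4
    ERR_NEEDS_XLOG = 5
    ERR_DB_CON = 6
    ERR_DB_QUERY = 7
    ERR_PROMOTED = 8
    ERR_BAD_PASSWORD = 9


def describe_exit(code):
    """Return the symbolic name for an exit status, or a fallback string."""
    try:
        return RepmgrExit(code).name
    except ValueError:
        return f"UNKNOWN({code})"
```

For example, after ``result = subprocess.run(["repmgr", "--version"])``, ``describe_exit(result.returncode)`` gives a readable label for logs.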
License and Contributions
=========================
repmgr is licensed under the GPL v3. All of its code and documentation is
Copyright 2010-2011, 2ndQuadrant Limited. See the files COPYRIGHT and LICENSE for
details.
Contributions to repmgr are welcome, and listed in the file CREDITS.
2ndQuadrant Limited requires that any contributions provide a copyright
assignment and a disclaimer of any work-for-hire ownership claims from the
employer of the developer. This lets us make sure that all of the repmgr
distribution remains free code. Please contact info@2ndQuadrant.com for a
copy of the relevant Copyright Assignment Form.
Code style
----------
Code in repmgr is formatted to a consistent style using the following command::
astyle --style=ansi --indent=tab --suffix=none *.c *.h
Contributors should reformat their code similarly before submitting code to
the project, in order to minimize merge conflicts with other work.