Compare commits

117 Commits

Author SHA1 Message Date
Christian Kruse
9c3d79147b now version.h contains the right version 2014-02-07 21:47:39 +01:00
Christian Kruse
ca470647cb cleanup of usage text
Now it properly aligns and breaks at 78 characters.
2014-01-30 14:26:17 +01:00
Christian Kruse
62ee287e3f updated TODO 2014-01-30 14:10:14 +01:00
Christian Kruse
729a1b848a release notes for 2.0 stable 2014-01-30 13:59:17 +01:00
Christian Kruse
701cf043fd fix: seems as if I misread -hackers 2014-01-23 16:46:49 +01:00
Christian Kruse
bbb67c55f6 simple past of set is set 2014-01-23 10:50:37 +01:00
Christian Kruse
c2c48a9fe6 removed already finished TODO tasks 2014-01-23 10:48:04 +01:00
Christian Kruse
9d6ac2ebf9 fixed documentation and line endings 2014-01-23 10:39:21 +01:00
Christian Kruse
680f23fb1d copyright push 2014-01-23 10:37:49 +01:00
Christian Kruse
1159113c58 ignore the dynamic shared memory directory, too 2014-01-23 10:02:32 +01:00
Christian Kruse
f25a709454 added an explicit type cast to avoid compiler warnings 2014-01-22 15:17:47 +01:00
Christian Kruse
897daddcc7 removed not needed arguments to avoid compiler warnings 2014-01-22 15:17:28 +01:00
Christian Kruse
0fdcce0477 use if instead of switch and avoid a warning 2014-01-22 15:12:29 +01:00
Christian Kruse
de58eff7c1 added a chdir() for proper daemonizing 2014-01-22 14:30:38 +01:00
Christian Kruse
f2a0b31a20 more log format fixes 2014-01-22 14:30:24 +01:00
Christian Kruse
e007a55967 fix: do not use fsync()
We do not need fsync(), the fflush() is enough to avoid concurrent
logs.
2014-01-22 11:47:50 +01:00
Christian Kruse
d235c696af fix: do not newline at the start of a log line
This breaks the log file format since it will have a line break directly
after the timestamp
2014-01-22 11:47:02 +01:00
Christian Kruse
4ef6fbb5fe do not close stderr but reopen it to /dev/null
We want stderr to be always a valid file descriptor
2014-01-21 16:25:57 +01:00
Christian Kruse
2e61d7b156 refactoring: daemonizing is now a function 2014-01-21 16:19:49 +01:00
Christian Kruse
4496a0761e we now use a function and are more sophisticated
Refactoring part: we now use a function to generate the PID
file. Sophistication: we now check whether the PID contained in the file is a
valid PID. We ignore the file if it is not.
2014-01-21 16:18:15 +01:00
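The PID check described in this commit message could look roughly like the following. This is a minimal sketch, assuming that "valid" means "a process with that PID currently exists"; the helper name `pid_is_alive` is hypothetical and not taken from repmgr, and probing with `kill(pid, 0)` is one common technique, not necessarily what the commit does.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>

/* Hypothetical helper, not repmgr's actual code.
 * Returns 1 if a process with this PID exists, 0 otherwise. */
static int
pid_is_alive(pid_t pid)
{
    if (pid <= 0)
        return 0;   /* 0 and negative values address process groups */

    /* Signal 0 delivers nothing; only the existence and permission
     * checks are performed. */
    if (kill(pid, 0) == 0)
        return 1;

    /* EPERM: the process exists but belongs to another user. */
    return errno == EPERM;
}
```

A stale PID file left behind by a crashed daemon would fail this check, so a new instance could safely overwrite it.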
Christian Kruse
3978ead184 use a second fork to avoid a terminal
After the setsid() we are the session leader. And as a session leader we
are able to acquire a controlling terminal, even if we currently don't own
one. So we do another fork without calling setsid() again; the child is not
a session leader, which avoids that.
2014-01-21 15:51:33 +01:00
Christian Kruse
b36dbf61fe reopening stdin and stdout to /dev/null now
stdin, stdout and stderr should always be valid file handles. Thus we
don't close them but reopen them to /dev/null
2014-01-21 15:31:38 +01:00
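The idea of reopening a standard descriptor to /dev/null instead of closing it can be sketched as below. This is an illustration under assumptions: `reopen_to_devnull` is a hypothetical name, and the `open()` plus `dup2()` combination is one usual way to do it, not necessarily the code in this commit.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: point an already-open descriptor at /dev/null
 * instead of closing it. Returns 0 on success, -1 on failure. */
static int
reopen_to_devnull(int fd)
{
    int nullfd = open("/dev/null", O_RDWR);

    if (nullfd < 0)
        return -1;

    if (nullfd != fd)
    {
        /* dup2() closes fd if it is open and makes it refer to /dev/null. */
        if (dup2(nullfd, fd) < 0)
        {
            close(nullfd);
            return -1;
        }
        close(nullfd);
    }
    return 0;
}
```

Keeping descriptors 0, 1 and 2 occupied means a later `open()` can never be handed one of those numbers by accident, and child processes started through `system()` still inherit usable streams.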
Christian Kruse
84466ecca5 log_crit() is more appropriate 2014-01-21 15:23:20 +01:00
Christian Kruse
649086e5e4 use unlink() instead of remove()
`remove()` will do a rmdir if necessary - we don't want that. So we use `unlink()`
2014-01-21 15:22:31 +01:00
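The difference between the two calls can be shown with a small sketch. The helper name `remove_pid_file` is hypothetical, and the statement about directories reflects Linux behaviour (other systems may report a different error for the same case).

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: delete a PID file, but never a directory.
 * Returns 0 on success (or if the file is already gone), -1 otherwise. */
static int
remove_pid_file(const char *path)
{
    /* On Linux unlink() refuses directories, whereas remove() would
     * fall back to rmdir() for them. */
    if (unlink(path) == 0 || errno == ENOENT)
        return 0;

    fprintf(stderr, "could not remove PID file \"%s\"\n", path);
    return -1;
}
```

If a misconfigured PID file path happens to name an empty directory, this version reports an error and leaves the directory in place.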
Christian Kruse
7cf2eb440d renamed config options to a much more descriptive name 2014-01-21 15:19:50 +01:00
Christian Kruse
388bbfb773 split install target into install_prog and install_ext
Patch by Marco Nenciarini <mnencia@debian.org>
2014-01-21 14:23:33 +01:00
Christian Kruse
a89aa02c68 fix: make pg_config be settable from outside the makefile
Patch by Marco Nenciarini <mnencia@debian.org>
2014-01-21 14:22:59 +01:00
Christian Kruse
c81793b63f fix: added forgotten options.priority value
Patch by Marco Nenciarini <mnencia@debian.org>
2014-01-21 14:18:12 +01:00
Christian Kruse
b4e83cf188 Add format attribute checking for printf() like functions
Patch by Marco Nenciarini <mnencia@debian.org>
2014-01-21 14:14:36 +01:00
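Format attribute checking for printf()-like functions generally works as in the sketch below. The macro `PRINTF_LIKE` and the function `format_msg` are hypothetical names chosen for illustration; the attribute itself is the GCC/Clang `format` function attribute.

```c
#include <stdarg.h>
#include <stdio.h>

/* Only GCC-compatible compilers understand the attribute. */
#if defined(__GNUC__)
#define PRINTF_LIKE(fmt_idx, first_arg) \
    __attribute__((format(printf, fmt_idx, first_arg)))
#else
#define PRINTF_LIKE(fmt_idx, first_arg)
#endif

/* Hypothetical helper: argument 3 is the format string and the
 * variadic arguments start at position 4. */
static int format_msg(char *buf, size_t len, const char *fmt, ...)
    PRINTF_LIKE(3, 4);

static int
format_msg(char *buf, size_t len, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(buf, len, fmt, ap);
    va_end(ap);
    return n;
}
```

With the attribute in place, a call such as `format_msg(buf, sizeof buf, "%s", 42)` should draw a `-Wformat` warning at compile time instead of failing at run time.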
Christian Kruse
1db61ce277 fix: fail when repmgr_funcs is not pre-loaded
when repmgr_funcs is not pre-loaded `repmgr_update_standby_location()`
will return false and `repmgr_get_last_standby_location()` will return
an empty string. Thus we may end in an endless loop. To avoid that we fail.
2014-01-21 13:54:10 +01:00
Christian Kruse
41abf9a7ef fix: flushing and fsync()ing the log file
When not flushing and fsync()ing it the output may be garbled due to
concurrent writes to the file (system() spawns a child process with
stdin/stdout/stderr inherited from its parent)
2014-01-21 13:52:27 +01:00
Christian Kruse
abebc53ddc fix: sscanf() does not set variables to 0 on error 2014-01-21 13:48:41 +01:00
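The pitfall named in this commit is that a failed conversion leaves the target variable untouched, so it must be initialized and the return value checked. A minimal sketch, with the hypothetical helper name `parse_int_or_default`:

```c
#include <stdio.h>

/* Hypothetical helper: parse a decimal integer, falling back to a
 * caller-supplied default when the text cannot be converted. */
static int
parse_int_or_default(const char *text, int fallback)
{
    int value = fallback;   /* sscanf() does not assign on failure */

    /* sscanf() returns the number of successful conversions,
     * or EOF if the input ends before the first conversion. */
    if (sscanf(text, "%d", &value) != 1)
        return fallback;
    return value;
}
```

Without the initialization and the return-value check, `value` would hold whatever happened to be on the stack when the input is not numeric.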
Christian Kruse
5fc4a0382f added config options sleep_delay and sleep_monitor
sleep_monitor replaces the old SLEEP_MONITOR define and makes it
configurable; this is the interval in which we monitor

sleep_delay replaces the old sleep(300) when waiting for the master to
recover.
2014-01-17 14:35:50 +01:00
Christian Kruse
a7d3c9b93a fix: also close stderr when using syslog logging 2014-01-17 12:14:26 +01:00
Christian Kruse
ee9dc9e247 do not use exit()
We avoid using exit() to be able to clean up when we have to
terminate. This includes removal of the PID file as well as closing
database connections.
2014-01-17 11:28:55 +01:00
Christian Kruse
94cb5b94e7 fix: reopen log file on SIGHUP 2014-01-16 17:16:45 +01:00
Christian Kruse
a08aa50f92 fix: close stdin and stdout only in repmgrd
closing stdin and stdout might cause problems when using system(), so we
avoid it.
2014-01-16 16:01:58 +01:00
Christian Kruse
9563877fbb new config option, stdout/stdin closed
Now stdin and stdout get closed. Additionally stderr gets closed and
reopened to the new config option „logfile“ if specified
2014-01-16 15:22:34 +01:00
Christian Kruse
4f3bd6612c do not exit in getMasterConnection() 2014-01-16 15:07:15 +01:00
Christian Kruse
192ee3cdb0 do not exit in get_cluster_size 2014-01-16 15:07:06 +01:00
Christian Kruse
6f149ead8f do not exit in guc_setted and guc_setted_typed 2014-01-16 14:48:46 +01:00
Christian Kruse
77aa6aa326 do not exit in pg_version 2014-01-16 14:48:42 +01:00
Christian Kruse
18206b3a64 do not exit() in is_witness 2014-01-16 14:28:56 +01:00
Christian Kruse
91446bcf93 fix: do not try to reconnect infinitely 2014-01-10 17:26:02 +01:00
Christian Kruse
dcdf8788ae fix: handle connection loss to standby
We do basically the same as we do for the master since connections drop
from time to time
2014-01-10 17:12:03 +01:00
Christian Kruse
4fabfbbbd0 fix: do not exit in is_standby()
Instead we now return an int with 0 meaning „not a standby,“ 1 meaning
„is a standby“ and -1 meaning „connection dropped“
2014-01-10 17:11:16 +01:00
Christian Kruse
c41030b40e Merge branch 'REL2_0_STABLE'
Conflicts:
	HISTORY
	dbutils.h
	repmgr.c
	repmgrd.c
	version.h
2014-01-10 16:07:33 +01:00
Christian Kruse
a0fdadd5d2 this way it is much cleaner 2014-01-09 15:35:44 +01:00
Christian Kruse
4c3d7f80ed now code compiles with -ansi -pedantic and has fewer warnings 2014-01-09 14:45:07 +01:00
Christian Kruse
6e3fe059d8 added config options pg_bindir and pg_ctl_options 2014-01-09 14:44:34 +01:00
Christian Kruse
9f26254ac3 fix: added some missing initializers to avoid compiler warning 2014-01-09 13:33:22 +01:00
Christian Kruse
0e8ff1730e added handling of a PID file 2014-01-09 13:04:40 +01:00
Christian Kruse
634fdff303 fix: do not call setup_event_handlers() on WIN32
If we put setup_event_handlers() in #ifdef WIN32, we have to do it for
the call and the declaration, too
2014-01-09 12:57:16 +01:00
Christian Kruse
cbce29f009 fixed typos 2014-01-08 11:55:03 +01:00
Christian Kruse
920f925e4b added a new cli option --daemonize
This option forks the process and generates a new session. This
effectively detaches it from the shell. Don't forget to redirect stderr
or use syslog for logging!
2014-01-08 11:53:15 +01:00
Christian Kruse
9fe2d6886e white space cleanup 2014-01-07 16:42:06 +01:00
Christian Kruse
0068dd573a fix: do not compare pointers but the strings 2014-01-07 15:52:29 +01:00
Christian Kruse
d0f3cb59c7 fix: create data directory after sanity check 2014-01-07 14:42:55 +01:00
Christian Kruse
7428e92e10 fix: correctly check the return value of PQexec()
not only check if return value is not NULL but also check that the
returned result is a PGRES_COMMAND_OK (e.g. the INSERT was successful)
2014-01-07 14:27:31 +01:00
Christian Kruse
a97065113d fix: remove own node earlier if force is set
We have to remove our own node before we check for a new master if force
is set; else master register would fail on the second time since there
already is a master (ourselves), even if we specify -F
2014-01-07 14:16:58 +01:00
Christian Kruse
9e2f276fcf fix: do not exit after pg_start_backup() w/o pg_stop_backup() 2014-01-07 14:02:29 +01:00
Christian Kruse
b0cd2b5e43 fix: do not exit() in create_pgdir()
This could leave the database in a locked state (pg_start_backup()).
And since all calls to create_pgdir() handle the return value correctly
we simply replace the exit() by a return false
2014-01-07 14:01:46 +01:00
Jaime Casanova
9209248420 Fix oversight in the header of guc_setted_typed() 2013-12-19 11:09:08 -05:00
Jaime Casanova
6693b99288 Files to create the debian package
Patch by: Christian Kruse
2013-12-19 01:43:12 -05:00
Jaime Casanova
8e7b487838 Update debian control file 2013-12-19 01:41:24 -05:00
Jaime Casanova
7f796e2d15 Update history and credit files 2013-12-19 01:40:00 -05:00
Jaime Casanova
5e04ab6eae Add a ssh_options parameter to allow ssh checking
to consider non-default values (ie: a different port)

Patch by Jay Taylor
2013-12-19 01:22:55 -05:00
Jaime Casanova
a1f4285e2b Add guc_setted_typed() function to allow
wal_keep_segments to be checked as an integer instead
of text

Patch by Jay Taylor
2013-12-19 01:22:42 -05:00
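Why a text comparison gives wrong answers for a numeric setting can be seen in a short sketch. The helper `setting_at_least` is hypothetical and only illustrates the integer comparison; it is not the implementation of guc_setted_typed(), which is not shown on this page.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if the setting, given as text, is at
 * least `required` when read as an integer, 0 if it is smaller, and
 * -1 if the text is not a valid integer. */
static int
setting_at_least(const char *value, long required)
{
    char *end;
    long parsed;

    errno = 0;
    parsed = strtol(value, &end, 10);
    if (errno != 0 || end == value || *end != '\0')
        return -1;
    return parsed >= required;
}
```

Compared as strings, "5000" sorts before "900" because '5' is less than '9', so a setting of 5000 would be flagged as too small for a requirement of 900; the integer comparison above does not have that problem.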
Jaime Casanova
493133986d Add timestamps to log line in stderr
Patch by Christian Kruse
2013-12-19 01:15:28 -05:00
Jaime Casanova
8b370dc581 Fix some typos
Patch by Krzysztof Gajdemski
2013-12-07 13:25:46 -05:00
Jaime Casanova
43af00aa12 Ignore pg_log when cloning, just like we ignore pg_xlog 2013-12-04 01:23:48 -05:00
Jaime Casanova
3c8df59eb9 Make repmgr compile in 9.3.
Patch provided by Shawn Ellis with some fixes by me.
2013-11-14 00:43:35 -05:00
Jaime Casanova
b410772627 Rework algorithm to coordinate voting
Make this by waiting for all nodes to finish a step, before starting
a new one. So everyone starts promoting or following in a coordinated
fashion.
Also make a few fixes.
2013-09-26 13:24:31 -05:00
Jaime Casanova
d99024ba11 Make repmgrd survive the failover
To do this it needs to reconnect to the new master
2013-09-26 11:58:59 -05:00
Jaime Casanova
1afaa3a26f Rearrange the logic in do_failover() for further improvements.
In particular, make this a more coordinated process by making all
nodes wait for the others before going to the next step.

This is one step further in following Andres Freund's advice
but there is still a lot to do in order to complete that;
in particular, it may be necessary to add more fields to repl_nodes
and to the shm area.
2013-09-23 18:28:58 -05:00
Jaime Casanova
079a7c9f16 In a failover situation get the nodes in a well defined order.
When deciding which node will be the new master, we should get the
nodes in a well defined order otherwise two standbys could process
nodes with the same priority in different order and end up with
a two master situation.
2013-07-26 00:59:50 -05:00
Jaime Casanova
3b66a31ac9 In a failover situation get the nodes in a well defined order.
When deciding which node will be the new master, we should get the
nodes in a well defined order otherwise two standbys could process
nodes with the same priority in different order and end up with
a two master situation.
2013-07-26 00:52:31 -05:00
Jaime Casanova
bdf957ca52 Add a missing ')'. This is a typo introduced in commit
2bc8044fda

Per complaint from Carlos Chapi when compiling for a customer.
2013-07-13 12:39:13 -05:00
Jaime Casanova
ad3630e7a9 Add a missing ')'. This is a typo introduced in commit
2bc8044fda

Per complaint from Carlos Chapi when compiling for a customer.
2013-07-13 12:37:15 -05:00
Jaime Casanova
67b451aa45 If PQgetCancel() returns NULL we should also return false.
Noted by Andres Freund.
2013-07-12 08:03:36 -05:00
Jaime Casanova
0a70d907ae Improve messages in wait_connection_availability, so we know which
error makes the failover procedure start

By gripe from Andres Freund
2013-07-12 08:03:25 -05:00
Jaime Casanova
2e7acf03c4 If PQgetCancel() returns NULL we should also return false.
Noted by Andres Freund.
2013-07-12 08:01:01 -05:00
Jaime Casanova
2bc8044fda Improve messages in wait_connection_availability, so we know which
error makes the failover procedure start

By gripe from Andres Freund
2013-07-10 19:25:58 -05:00
Jaime Casanova
ab1d380843 If PQcancel() fails, consider it as if the master is failing.
Because PQcancel() establishes a new synchronous connection to the
database, if it fails it means something has gone wrong with the
master. So instead of just ignoring the failure, CancelQuery() now
reports a failure condition so we can detect the master's death in
that situation.

This is especially important when only the postmaster crashes but
other children/backend connections are still there, because the
child connections won't fail and the CancelQuery() failure is our
only indication that something is going wrong.
Currently we just ignore the PQcancel() failure, which leads us to
a situation in which we just loop forever
trying to cancel the async query.

Reported by: Martin Euser <martin.euser@nl.abnamro.com>
Problem analyzed and bug spotted by: Andres Freund <andres@2ndquadrant.com>
Patch by: Jaime Casanova <jaime@2ndquadrant.com>
2013-07-10 10:21:51 -05:00
Jaime Casanova
b0b44a157f If PQcancel() fails, consider it as if the master is failing.
Because PQcancel() establishes a new synchronous connection to the
database, if it fails it means something has gone wrong with the
master. So instead of just ignoring the failure, CancelQuery() now
reports a failure condition so we can detect the master's death in
that situation.

This is especially important when only the postmaster crashes but
other children/backend connections are still there, because the
child connections won't fail and the CancelQuery() failure is our
only indication that something is going wrong.
Currently we just ignore the PQcancel() failure, which leads us to
a situation in which we just loop forever
trying to cancel the async query.

Reported by: Martin Euser <martin.euser@nl.abnamro.com>
Problem analyzed and bug spotted by: Andres Freund <andres@2ndquadrant.com>
Patch by: Jaime Casanova <jaime@2ndquadrant.com>
2013-07-10 09:53:45 -05:00
Jaime Casanova
49a2531930 Options -F -W -I -v don't accept arguments, which means that in
getopt_long they shouldn't be marked with the colon (:) character.

This has been wrong since day one, so backpatching all the way until
1.1
2013-01-13 16:37:39 -05:00
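The effect of the colon can be illustrated with a small parser. This is a sketch under assumptions: the struct, the function name `parse_flags` and the two long option names are hypothetical, chosen only so the example is complete; the four short flags are the ones named in the commit message.

```c
#include <getopt.h>
#include <stddef.h>

struct flags
{
    int force, wait, ignore, verbose;
};

/* Hypothetical parser. In the option string a trailing ':' means
 * "takes an argument", so the four flags below carry none.
 * Returns 0 on success, -1 on an unknown option. */
static int
parse_flags(int argc, char **argv, struct flags *out)
{
    /* Long names are assumptions for illustration only. */
    static const struct option long_opts[] = {
        {"force", no_argument, NULL, 'F'},
        {"verbose", no_argument, NULL, 'v'},
        {NULL, 0, NULL, 0}
    };
    int c;

    optind = 1;     /* restart scanning so repeated calls work */
    while ((c = getopt_long(argc, argv, "FWIv", long_opts, NULL)) != -1)
    {
        switch (c)
        {
            case 'F': out->force = 1; break;
            case 'W': out->wait = 1; break;
            case 'I': out->ignore = 1; break;
            case 'v': out->verbose = 1; break;
            default:  return -1;
        }
    }
    return 0;
}
```

Had the option string been written as "F:W:I:v:", the command line `prog -F -v` would treat `-v` as the argument of `-F` and the verbose flag would never be set.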
Jaime Casanova
672b237c4e Options -F -W -I -v don't accept arguments, which means that in
getopt_long they shouldn't be marked with the colon (:) character.

This has been wrong since day one, so backpatching all the way until
1.1
2013-01-13 16:32:56 -05:00
Jaime Casanova
7d94151494 If the node is a witness don't bother asking its position, it always
will be 0/0. We just need to check that we can connect to it to determine
if we are in the majority.
2013-01-11 03:44:50 -05:00
Jaime Casanova
4191b77e70 If the node is a witness don't bother asking its position, it always
will be 0/0. We just need to check that we can connect to it to determine
if we are in the majority.
2013-01-11 03:42:08 -05:00
Jaime Casanova
2a5d431481 Fix a problem that caused a standby to promote itself without going to
voting procedure.

This is because of a race condition inside CheckPrimaryConnection().

This has been independently reported by Alex Railean and Dumitru, and Frank Jördens.
Analyzed and fixed by Cédric Villemain.

The fix has been verified to work by Frank
2012-12-19 12:01:27 -05:00
Jaime Casanova
81b8a944de Fix a problem that caused a standby to promote itself without going to
voting procedure.

This is because of a race condition inside CheckPrimaryConnection().

This has been independently reported by Alex Railean and Dumitru, and Frank Jördens.
Analyzed and fixed by Cédric Villemain.

The fix has been verified to work by Frank
2012-12-19 11:45:58 -05:00
Jaime Casanova
93a999adc7 Formatting code using astyle 2012-12-11 11:49:07 -05:00
Jaime Casanova
1b69282df9 Formatting code using astyle 2012-12-11 11:47:59 -05:00
Jaime Casanova
06dd252f69 To select new master it needs to know which standby has received more
xlog records from master, so each standby should use pg_last_xlog_receive_location()
to report their positions. This solves a possible situation in which
a standby that is considered as new master when promoted is no longer
the best option.
2012-12-03 09:27:12 -05:00
Jaime Casanova
088ca29fe3 To select new master it needs to know which standby has received more
xlog records from master, so each standby should use pg_last_xlog_receive_location()
to report their positions. This solves a possible situation in which
a standby that is considered as new master when promoted is no longer
the best option.
2012-12-03 09:18:08 -05:00
Jaime Casanova
30e9d06172 Add an option for STANDBY FOLLOW to wait for a master to appear.
This is important for autofailover to do the right thing when
standbys detected master death at different times.

While this is a new option, it seems important for the autofailover
to work properly so I will consider the lack of it a bug and
will backpatch to 2.0 where autofailover was introduced.

Per gripe from Alex Railean, about a standby not finding the new
master because the new master hasn't finished promoting.
2012-11-14 15:09:26 -05:00
Jaime Casanova
d6bd5aa381 Add an option for STANDBY FOLLOW to wait for a master to appear.
This is important for autofailover to do the right thing when
standbys detected master death at different times.

While this is a new option, it seems important for the autofailover
to work properly so I will consider the lack of it a bug and
will backpatch to 2.0 where autofailover was introduced.

Per gripe from Alex Railean, about a standby not finding the new
master because the new master hasn't finished promoting.
2012-11-14 15:07:59 -05:00
Gabriele Bartolini
bbdcffa813 Fixed typos notified by lintian 2012-11-09 18:09:43 +01:00
Jaime Casanova
cd1a84252e Fix node decision logic when priorities are involved. Currently if
two nodes with different priorities are equally good to be promoted
the second one (with a lower priority, considering them
in descending order) will win.

Per report from Brailean Dumitru
2012-09-16 02:47:02 -05:00
Jaime Casanova
5f33d9d715 Fix node decision logic when priorities are involved. Currently if
two nodes with different priorities are equally good to be promoted
the second one (with a lower priority, considering them
in descending order) will win.

Per report from Brailean Dumitru
2012-09-16 02:38:28 -05:00
Jaime Casanova
2e19b3688b Add a comment 2012-09-16 02:26:18 -05:00
Jaime Casanova
877f4cf82e Add a comment 2012-09-16 02:23:16 -05:00
Jaime Casanova
de883a4c84 Keep compiler quiet. Noted when compiling on FreeBSD, where I
get a warning for an uninitialized variable.

Also, define InvalidXLogRecPtr. We don't really need it but using
it makes the initialization future proof (considering that in 9.3
XLogRecPtr will change its structure).
2012-09-16 02:21:18 -05:00
Jaime Casanova
949f5ee498 Keep compiler quiet. Noted when compiling on FreeBSD, where I
get a warning for an uninitialized variable.

Also, define InvalidXLogRecPtr. We don't really need it but using
it makes the initialization future proof (considering that in 9.3
XLogRecPtr will change its structure).
2012-09-16 02:10:02 -05:00
Jaime Casanova
eb2f7efb4a When we have more command-line arguments than we should have we
need to show that last value and we should use only optind for that
instead of optind+1
2012-09-15 17:39:10 -05:00
Jaime Casanova
85ff3ec286 Fix documentation to always use -h syntax to refer to the node we
want to clone or connect to, instead of relying on the fact that
for some time putting that argument at last worked.
2012-09-15 17:38:42 -05:00
Jaime Casanova
499a501afd Make repmgr compatible with FreeBSD.
We need to add an #include and make it use a different path for the
"true" binary.

Maybe we need to make these changes for all BSD systems but having no
evidence of that I prefer to make this only for systems with __FreeBSD__
2012-09-15 17:37:59 -05:00
Jaime Casanova
0a9107d76d Improve sample of commands for promote and follow 2012-09-15 17:37:43 -05:00
Jaime Casanova
2803bb92a8 Make repmgr compatible with FreeBSD.
We need to add an #include and make it use a different path for the
"true" binary.

Maybe we need to make these changes for all BSD systems but having no
evidence of that I prefer to make this only for systems with __FreeBSD__
2012-09-15 17:32:38 -05:00
Jaime Casanova
16fe41eecf Improve sample of commands for promote and follow 2012-09-11 15:53:57 -05:00
Jaime Casanova
95ec0450da When we have more command-line arguments than we should have we
need to show that last value and we should use only optind for that
instead of optind+1
2012-08-30 02:11:48 -05:00
Jaime Casanova
57aa95f674 Fix documentation to always use -h syntax to refer to the node we
want to clone or connect to, instead of relying on the fact that
for some time putting that argument at last worked.
2012-08-30 02:10:10 -05:00
Jaime Casanova
d365a309fc Fix HISTORY to show from newest to oldest 2012-07-27 11:29:07 -05:00
Jaime Casanova
d5a41bb587 Fix tabs in HISTORY 2012-07-27 11:22:04 -05:00
Jaime Casanova
474d3217b4 Fix typos in RELEASE NOTES 2012-07-27 11:21:49 -05:00
Jaime Casanova
7a00d5a9a4 Now that we can have no monitoring we need to check all nodes at failover
not only those in repl_monitor
2012-07-21 17:53:15 -05:00
Jaime Casanova
5683b905dd New development branch is 2.1dev 2012-07-21 12:22:04 -05:00
29 changed files with 1531 additions and 776 deletions

@@ -1,4 +1,4 @@
Copyright (c) 2010-2012, 2ndQuadrant Limited
Copyright (c) 2010-2014, 2ndQuadrant Limited
All rights reserved.
This program is free software: you can redistribute it and/or modify

@@ -10,3 +10,7 @@ Hannu Krosing <hannu@2ndQuadrant.com>
Cédric Villemain <cedric@2ndquadrant.com>
Charles Duffy <charles@dyfis.net>
Daniel Farina <daniel@heroku.com>
Shawn Ellis <shawn.ellis17@gmail.com>
Jay Taylor <jay@jaytaylor.com>
Christian Kruse <christian@2ndQuadrant.com>
Krzysztof Gajdemski <songo@debian.org.pl>

HISTORY

@@ -1,4 +1,22 @@
2.0beta 2012-07-27
2.0stable 2014-01-30
Documentation fixes (Christian)
General refactoring, code quality improvements and stabilization work (Christian)
Added proper daemonizing (-d/--daemonize) (Christian)
Added PID file handling (-p/--pid-file) (Christian)
New config option: monitor_interval_secs (Christian)
New config option: retry_promote_interval (Christian)
New config option: logfile (Christian)
New config option: pg_bindir (Christian)
New config option: pgctl_options (Christian)
2.0beta2 2013-12-19
Improve autofailover logic and algorithms (Jaime, Andres)
Ignore pg_log when cloning (Jaime)
Add timestamps to log line in stderr (Christian)
Correctly check wal_keep_segments (Jay Taylor)
Add a ssh_options parameter (Jay Taylor)
2.0beta1 2012-07-27
Make CLONE command try to make an exact copy including $PGDATA location (Cedric)
Add detection of master failure (Jaime)
Add the notion of a witness server (Jaime)

@@ -1,6 +1,6 @@
#
# Makefile
# Copyright (c) 2ndQuadrant, 2010-2012
# Copyright (c) 2ndQuadrant, 2010-2014
repmgrd_OBJS = dbutils.o config.o repmgrd.o log.o strutil.o
repmgr_OBJS = dbutils.o check_dir.o config.o repmgr.o log.o strutil.o
@@ -21,7 +21,8 @@ repmgr: $(repmgr_OBJS)
$(CC) $(CFLAGS) $(repmgr_OBJS) $(PG_LIBS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o repmgr
ifdef USE_PGXS
PGXS := $(shell pg_config --pgxs)
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
else
subdir = contrib/repmgr
@@ -32,9 +33,13 @@ endif
# XXX: Try to use PROGRAM construct (see pgxs.mk) someday. Right now
# is overriding pgxs install.
install:
install: install_prog install_ext
install_prog:
$(INSTALL_PROGRAM) repmgrd$(X) '$(DESTDIR)$(bindir)'
$(INSTALL_PROGRAM) repmgr$(X) '$(DESTDIR)$(bindir)'
install_ext:
$(MAKE) -C sql install
ifneq (,$(DATA)$(DATA_built))

@@ -1085,7 +1085,7 @@ License and Contributions
=========================
repmgr is licensed under the GPL v3. All of its code and documentation is
Copyright 2010-2012, 2ndQuadrant Limited. See the files COPYRIGHT and LICENSE for
Copyright 2010-2014, 2ndQuadrant Limited. See the files COPYRIGHT and LICENSE for
details.
Main sponsorship of repmgr has been from 2ndQuadrant customers.

TODO

@@ -1,21 +1,18 @@
Known issues in repmgr
======================
* The check for whether ``wal_keep_segments`` is considered large enough
does a string comparison rather than an integer one. It can give both
false positive (setting is large enough but flagged as too small) and
false negative (setting is too small but not noted as such) errors.
* When running repmgr against a remote machine, operations that start
the database server using the ``pg_ctl`` command may accidentally
terminate after their associated ssh session ends.
* After running repmgrd as a regular foreground application, hitting
control-C causes the program to crash.
Planned feature improvements
============================
* Before running ``pg_start_backup()``, a sanity check that there is a
a working ssh connection to the destination would help find
configuration errors before disturbing the database.
* Timeline increases when promoting a standby
* A better check which standby did receive most of the data
* Make the fact that a standby may be delayed a factor in the voting
algorithm
* include support for delayed standbys

@@ -1,213 +1,225 @@
=====================================================
PostgreSQL Automatic Fail-Over - User Documentation
=====================================================
Automatic Failover
==================
repmgr allows setups for automatic failover when it detects the failure of the master node.
Following is a quick setup for this.
Installation
============
For convenience, we define:
* node1 is the hostname fully qualified of the Master server, IP 192.168.1.10
* node2 is the hostname fully qualified of the Standby server, IP 192.168.1.11
* witness is the hostname fully qualified of the server used for witness, IP 192.168.1.12
:Note: It is not recommanded to use name defining status of a server like «masterserver»,
this is a name leading to confusion once a failover take place and the Master is
now on the «standbyserver».
Summary
-------
2 PostgreSQL servers are involved in the replication. Automatic fail-over need
to vote to decide what server it should promote, thus an odd number is required
and a witness-repmgrd is installed in a third server where it uses a PostgreSQL
cluster to communicate with other repmgrd daemons.
1. Install PostgreSQL in all the servers involved (including the server used for
witness)
2. Install repmgr in all the servers involved (including the server used for witness)
3. Configure the Master PostreSQL
4. Clone the Master to the Standby using "repmgr standby clone" command
5. Configure repmgr in all the servers involved (including the server used for witness)
6. Register Master and Standby nodes
7. Initiate witness server
8. Start the repmgrd daemons in all nodes
:Note: A complete Hight-Availability design need at least 3 servers to still have
a backup node after a first failure.
Install PostgreSQL
------------------
You can install PostgreSQL using any of the recommended methods. You should ensure
it's 9.0 or superior.
Install repmgr
--------------
Install repmgr following the steps in the README.
Configure PostreSQL
-------------------
Log in node1.
Edit the file postgresql.conf and modify the parameters::
listen_addresses='*'
wal_level = 'hot_standby'
archive_mode = on
archive_command = 'cd .' # we can also use exit 0, anything that
# just does nothing
max_wal_senders = 10
wal_keep_segments = 5000 # 80 GB required on pg_xlog
hot_standby = on
shared_preload_libraries = 'repmgr_funcs'
Edit the file pg_hba.conf and add lines for the replication::
host repmgr repmgr 127.0.0.1/32 trust
host repmgr repmgr 192.168.1.10/30 trust
host replication all 192.168.1.10/30 trust
:Note: It is also possible to use a password authentication (md5), .pgpass file
should be edited to allow connection between each node.
Create the user and database to manage replication::
su - postgres
createuser -s repmgr
createdb -O repmgr repmgr
psql -f /usr/share/postgresql/9.0/contrib/repmgr_funcs.sql repmgr
Restart the PostgreSQL server::
pg_ctl -D $PGDATA restart
And check everything is fine in the server log.
Create the ssh-key for the postgres user and copy it to other servers::
su - postgres
ssh-keygen # /!\ do not use a passphrase /!\
cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
exit
rsync -avz ~postgres/.ssh/authorized_keys node2:~postgres/.ssh/
rsync -avz ~postgres/.ssh/authorized_keys witness:~postgres/.ssh/
rsync -avz ~postgres/.ssh/id_rsa* node2:~postgres/.ssh/
rsync -avz ~postgres/.ssh/id_rsa* witness:~postgres/.ssh/
Clone Master
------------
Log in node2.
Clone the node1 (the current Master)::
su - postgres
repmgr -d repmgr -U repmgr standby clone node1
Start the PostgreSQL server::
pg_ctl -D $PGDATA start
And check everything is fine in the server log.
Configure repmgr
----------------
Log in each server and configure repmgr by editing the file
/etc/repmgr/repmgr.conf::
cluster=my_cluster
node=1
node_name=earth
conninfo='host=192.168.1.10 dbname=repmgr user=repmgr'
master_response_timeout=60
reconnect_attempts=6
reconnect_interval=10
failover=automatic
promote_command='promote_command.sh'
follow_command='repmgr standby follow -f /etc/repmgr/repmgr.conf'
* *cluster* is the name of the current replication.
* *node* is the number of the current node (1, 2 or 3 in the current example).
* *node_name* is an identifier for every node.
* *conninfo* is used to connect to the local PostgreSQL server (where the configuration file is) from any node. In the witness server configuration it is needed to add a 'port=5499' to the conninfo.
* *master_response_timeout* is the maximum amount of time we are going to wait before deciding the master has died and start failover procedure.
* *reconnect_attempts* is the number of times we will try to reconnect to master after a failure has been detected and before start failover procedure.
* *reconnect_interval* is the amount of time between retries to reconnect to master after a failure has been detected and before start failover procedure.
* *failover* configure behavior : *manual* or *automatic*.
* *promote_command* the command executed to do the failover (including the PostgreSQL failover itself). The command must return 0 on success.
* *follow_command* the command executed to address the current standby to another Master. The command must return 0 on success.
Register Master and Standby
---------------------------
Log in node1.
Register the node as Master::
su - postgres
repmgr -f /etc/repmgr/repmgr.conf master register
Log in node2.
Register the node as Standby::
su - postgres
repmgr -f /etc/repmgr/repmgr.conf standby register
Initialize witness server
-------------------------
Log in witness.
Initialize the witness server::
su - postgres
repmgr -d repmgr -U repmgr -h 192.168.1.10 -D $WITNESS_PGDATA -f /etc/repmgr/repmgr.conf witness create node1
It needs information to connect to the master to copy the configuration of the cluster, also it needs to know where it should initialize it's own $PGDATA.
As part of the procees it also ask for the superuser password so it can connect when needed.
Start the repmgrd daemons
-------------------------
Log in node2 and witness.
su - postgres
repmgrd -f /etc/repmgr/repmgr.conf > /var/log/postgresql/repmgr.log 2>&1
:Note: The Master does not need a repmgrd daemon.
Suspend Automatic behavior
==========================
Edit the repmgr.conf of the node to remove from automatic processing and change::
failover=manual
Then, signal repmgrd daemon::
su - postgres
kill -HUP `pidof repmgrd`
TODO : -HUP configuration update is not implemented and it should check its
configuration file against its configuration in DB, updating
accordingly the SQL conf (especially the failover manual or auto)
this allows witness-standby and standby-not-promotable features
and simpler usage of the tool ;)
Usage
=====
The repmgr documentation is in the README file (how to build, options, etc.)
=====================================================
PostgreSQL Automatic Fail-Over - User Documentation
=====================================================
Automatic Failover
==================
repmgr can be set up to fail over automatically when it detects the failure of the master node.
A quick setup for this follows.
Installation
============
For convenience, we define:
**node1**
is the fully qualified hostname of the Master server, IP 192.168.1.10
**node2**
is the fully qualified hostname of the Standby server, IP 192.168.1.11
**witness**
is the fully qualified hostname of the server used as witness, IP 192.168.1.12
**Note:** It is not recommended to use names that describe the status of a server, like «masterserver»;
such a name leads to confusion once a failover takes place and the Master is
now on the «standbyserver».
Summary
-------
Two PostgreSQL servers are involved in the replication. Automatic fail-over needs
a vote to decide which server to promote, so an odd number of voters is required:
a witness repmgrd is installed on a third server, where it uses a PostgreSQL
cluster to communicate with the other repmgrd daemons.
1. Install PostgreSQL in all the servers involved (including the server used for
witness)
2. Install repmgr in all the servers involved (including the server used for witness)
3. Configure the Master PostgreSQL
4. Clone the Master to the Standby using "repmgr standby clone" command
5. Configure repmgr in all the servers involved (including the server used for witness)
6. Register Master and Standby nodes
7. Initiate witness server
8. Start the repmgrd daemons in all nodes
**Note** A complete high-availability design needs at least 3 servers so that a
backup node is still available after a first failure.
Install PostgreSQL
------------------
You can install PostgreSQL using any of the recommended methods. Make sure
it is version 9.0 or later.
Install repmgr
--------------
Install repmgr following the steps in the README.
Configure PostgreSQL
--------------------
Log in node1.
Edit the file postgresql.conf and modify the parameters::
listen_addresses='*'
wal_level = 'hot_standby'
archive_mode = on
archive_command = 'cd .' # we can also use exit 0, anything that
# just does nothing
max_wal_senders = 10
wal_keep_segments = 5000 # 80 GB required on pg_xlog
hot_standby = on
shared_preload_libraries = 'repmgr_funcs'
Edit the file pg_hba.conf and add lines for the replication::
host repmgr repmgr 127.0.0.1/32 trust
host repmgr repmgr 192.168.1.10/30 trust
host replication all 192.168.1.10/30 trust
**Note:** Password authentication (md5) can also be used; in that case the .pgpass file
should be edited to allow connections between the nodes.
Create the user and database to manage replication::
su - postgres
createuser -s repmgr
createdb -O repmgr repmgr
psql -f /usr/share/postgresql/9.0/contrib/repmgr_funcs.sql repmgr
Restart the PostgreSQL server::
pg_ctl -D $PGDATA restart
And check everything is fine in the server log.
Create the ssh-key for the postgres user and copy it to other servers::
su - postgres
ssh-keygen # /!\ do not use a passphrase /!\
cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
exit
rsync -avz ~postgres/.ssh/authorized_keys node2:~postgres/.ssh/
rsync -avz ~postgres/.ssh/authorized_keys witness:~postgres/.ssh/
rsync -avz ~postgres/.ssh/id_rsa* node2:~postgres/.ssh/
rsync -avz ~postgres/.ssh/id_rsa* witness:~postgres/.ssh/
Clone Master
------------
Log in node2.
Clone the node1 (the current Master)::
su - postgres
repmgr -d repmgr -U repmgr -h node1 standby clone
Start the PostgreSQL server::
pg_ctl -D $PGDATA start
And check everything is fine in the server log.
Configure repmgr
----------------
Log in each server and configure repmgr by editing the file
/etc/repmgr/repmgr.conf::
cluster=my_cluster
node=1
node_name=earth
conninfo='host=192.168.1.10 dbname=repmgr user=repmgr'
master_response_timeout=60
reconnect_attempts=6
reconnect_interval=10
failover=automatic
promote_command='promote_command.sh'
follow_command='repmgr standby follow -f /etc/repmgr/repmgr.conf'
**cluster**
is the name of the current replication.
**node**
is the number of the current node (1, 2 or 3 in the current example).
**node_name**
is an identifier for every node.
**conninfo**
is used to connect to the local PostgreSQL server (where the configuration file is) from any node. In the witness server configuration, 'port=5499' must be added to the conninfo.
**master_response_timeout**
is the maximum amount of time to wait before deciding that the master has died and starting the failover procedure.
**reconnect_attempts**
is the number of times to try to reconnect to the master after a failure has been detected and before starting the failover procedure.
**reconnect_interval**
is the amount of time between attempts to reconnect to the master after a failure has been detected and before starting the failover procedure.
**failover**
configures the behavior: *manual* or *automatic*.
**promote_command**
is the command executed to perform the failover (including the PostgreSQL failover itself). The command must return 0 on success.
**follow_command**
is the command executed to point the current standby at another Master. The command must return 0 on success.
Register Master and Standby
---------------------------
Log in node1.
Register the node as Master::
su - postgres
repmgr -f /etc/repmgr/repmgr.conf master register
Log in node2. Register it as a standby::
su - postgres
repmgr -f /etc/repmgr/repmgr.conf standby register
Initialize witness server
-------------------------
Log in witness.
Initialize the witness server::
su - postgres
repmgr -d repmgr -U repmgr -h 192.168.1.10 -D $WITNESS_PGDATA -f /etc/repmgr/repmgr.conf witness create
It needs connection information for the master so it can copy the cluster configuration, and it needs to know where to initialize its own $PGDATA.
As part of the process it also asks for the superuser password so it can connect when needed.
Start the repmgrd daemons
-------------------------
Log in node2 and witness.
su - postgres
repmgrd -f /etc/repmgr/repmgr.conf > /var/log/postgresql/repmgr.log 2>&1
**Note:** The Master does not need a repmgrd daemon.
Suspend Automatic behavior
==========================
Edit the repmgr.conf of the node that should be removed from automatic processing and change::
failover=manual
Then, signal repmgrd daemon::
su - postgres
kill -HUP `pidof repmgrd`
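On the daemon side, a HUP signal is usually handled by setting a flag in the signal handler and acting on it from the main loop. The following is a minimal sketch of that general pattern, not repmgrd's actual code; the names `install_sighup_handler` and `reload_requested` are hypothetical and chosen only for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for sigaction() under strict -std modes */
#endif

#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <string.h>

/* Set by the handler, polled by the main loop. */
static volatile sig_atomic_t got_sighup = 0;

/* A signal handler should only do async-signal-safe work: set a flag. */
static void
handle_sighup(int signum)
{
	(void) signum;
	got_sighup = 1;
}

/* Install the SIGHUP handler; returns false if sigaction() fails. */
bool
install_sighup_handler(void)
{
	struct sigaction sa;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = handle_sighup;
	sigemptyset(&sa.sa_mask);
	return sigaction(SIGHUP, &sa, NULL) == 0;
}

/*
 * Meant to be called once per monitoring iteration. Returns true if a
 * reload was requested since the last call, and clears the request.
 */
bool
reload_requested(void)
{
	if (!got_sighup)
		return false;
	got_sighup = 0;
	return true;
}
```

A monitoring loop would call `reload_requested()` on each iteration and re-read the configuration file only when it returns true, so that the file is never parsed from inside the signal handler.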
Usage
=====
The repmgr documentation is in the README file (how to build, options, etc.)

@@ -1,6 +1,6 @@
/*
* check_dir.c - Directories management functions
* Copyright (C) 2ndQuadrant, 2010-2012
* Copyright (C) 2ndQuadrant, 2010-2014
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -127,10 +127,10 @@ mkdir_p(char *path, mode_t omode)
{
struct stat sb;
mode_t numask,
oumask;
oumask;
int first,
last,
retval;
last,
retval;
char *p;
p = path;
@@ -225,12 +225,12 @@ is_pg_dir(char *dir)
struct stat sb;
int r;
// test pgdata
/* test pgdata */
xsnprintf(path, buf_sz, "%s/PG_VERSION", dir);
if (stat(path, &sb) == 0)
return true;
// test tablespace dir
/* test tablespace dir */
sprintf(path, "ls %s/PG_*/ -I*", dir);
r = system(path);
if (r == 0)
@@ -256,7 +256,7 @@ create_pgdir(char *dir, bool force)
{
log_err(_("couldn't create directory \"%s\"...\n"),
dir);
exit(ERR_BAD_CONFIG);
return false;
}
break;
case 1:
@@ -268,7 +268,7 @@ create_pgdir(char *dir, bool force)
{
log_err(_("could not change permissions of directory \"%s\": %s\n"),
dir, strerror(errno));
exit(ERR_BAD_CONFIG);
return false;
}
break;
case 2:
@@ -293,7 +293,7 @@ create_pgdir(char *dir, bool force)
"If you are sure you want to clone here, "
"please check there is no PostgreSQL server "
"running and use the --force option\n"));
exit(ERR_BAD_CONFIG);
return false;
}
return false;
@@ -301,7 +301,7 @@ create_pgdir(char *dir, bool force)
/* Trouble accessing directory */
log_err(_("could not access directory \"%s\": %s\n"),
dir, strerror(errno));
exit(ERR_BAD_CONFIG);
return false;
}
return true;
}
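The hunks above replace `exit(ERR_BAD_CONFIG)` with `return false` in `create_pgdir()`, which leaves the decision about cleanup and exit codes to the caller. A minimal sketch of the same idea, using a hypothetical helper `ensure_directory()` that is much simpler than the real function:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper in the spirit of create_pgdir(): it reports
 * failure to the caller instead of calling exit(), so the caller can
 * release resources (for example database connections) first.
 */
bool
ensure_directory(const char *dir)
{
	struct stat sb;

	assert(dir != NULL);

	if (stat(dir, &sb) == 0)
	{
		if (S_ISDIR(sb.st_mode))
			return true;
		fprintf(stderr, "\"%s\" exists but is not a directory\n", dir);
		return false;
	}

	if (errno != ENOENT)
	{
		fprintf(stderr, "could not access directory \"%s\": %s\n",
				dir, strerror(errno));
		return false;
	}

	if (mkdir(dir, 0700) != 0)
	{
		fprintf(stderr, "could not create directory \"%s\": %s\n",
				dir, strerror(errno));
		return false;
	}
	return true;
}
```

A caller would test the return value, close any open connections, and only then exit with the error code of its choice.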

@@ -1,6 +1,6 @@
/*
* check_dir.h
* Copyright (c) 2ndQuadrant, 2010-2012
* Copyright (c) 2ndQuadrant, 2010-2014
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by

@@ -1,6 +1,6 @@
/*
* config.c - Functions to parse the config file
* Copyright (C) 2ndQuadrant, 2010-2012
* Copyright (C) 2ndQuadrant, 2010-2014
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -41,6 +41,9 @@ parse_config(const char *config_file, t_configuration_options *options)
memset(options->promote_command, 0, sizeof(options->promote_command));
memset(options->follow_command, 0, sizeof(options->follow_command));
memset(options->rsync_options, 0, sizeof(options->rsync_options));
memset(options->ssh_options, 0, sizeof(options->ssh_options));
memset(options->pg_bindir, 0, sizeof(options->pg_bindir));
memset(options->pgctl_options, 0, sizeof(options->pgctl_options));
/* if nothing has been provided defaults to 60 */
options->master_response_timeout = 60;
@@ -49,6 +52,9 @@ parse_config(const char *config_file, t_configuration_options *options)
options->reconnect_attempts = 6;
options->reconnect_intvl = 10;
options->monitor_interval_secs = 2;
options->retry_promote_interval_secs = 300;
/*
* Since some commands don't require a config file at all, not
* having one isn't necessarily a problem.
@@ -78,6 +84,8 @@ parse_config(const char *config_file, t_configuration_options *options)
strncpy (options->conninfo, value, MAXLEN);
else if (strcmp(name, "rsync_options") == 0)
strncpy (options->rsync_options, value, QUERY_STR_LEN);
else if (strcmp(name, "ssh_options") == 0)
strncpy (options->ssh_options, value, QUERY_STR_LEN);
else if (strcmp(name, "loglevel") == 0)
strncpy (options->loglevel, value, MAXLEN);
else if (strcmp(name, "logfacility") == 0)
@@ -111,6 +119,16 @@ parse_config(const char *config_file, t_configuration_options *options)
options->reconnect_attempts = atoi(value);
else if (strcmp(name, "reconnect_interval") == 0)
options->reconnect_intvl = atoi(value);
else if (strcmp(name, "pg_bindir") == 0)
strncpy (options->pg_bindir, value, MAXLEN);
else if (strcmp(name, "pg_ctl_options") == 0)
strncpy (options->pgctl_options, value, MAXLEN);
else if (strcmp(name, "logfile") == 0)
strncpy(options->logfile, value, MAXLEN);
else if (strcmp(name, "monitor_interval_secs") == 0)
options->monitor_interval_secs = atoi(value);
else if (strcmp(name, "retry_promote_interval_secs") == 0)
options->retry_promote_interval_secs = atoi(value);
else
log_warning(_("%s/%s: Unknown name/value pair!\n"), name, value);
}
@@ -148,6 +166,12 @@ parse_config(const char *config_file, t_configuration_options *options)
log_err(_("Reconnect intervals must be zero or greater. Check the configuration file.\n"));
exit(ERR_BAD_CONFIG);
}
if (*options->pg_bindir == '\0')
{
log_err(_("pg_bindir config value not found. Check the configuration file.\n"));
exit(ERR_BAD_CONFIG);
}
}
@@ -218,49 +242,49 @@ reload_configuration(char *config_file, t_configuration_options *orig_options)
parse_config(config_file, &new_options);
if (new_options.node == -1)
{
log_warning(_("\nCannot load new configuration, will keep current one.\n"));
log_warning(_("Cannot load new configuration, will keep current one.\n"));
return false;
}
if (strcmp(new_options.cluster_name, orig_options->cluster_name) != 0)
{
log_warning(_("\nCannot change cluster name, will keep current configuration.\n"));
log_warning(_("Cannot change cluster name, will keep current configuration.\n"));
return false;
}
if (new_options.node != orig_options->node)
{
log_warning(_("\nCannot change node number, will keep current configuration.\n"));
log_warning(_("Cannot change node number, will keep current configuration.\n"));
return false;
}
if (new_options.node_name != orig_options->node_name)
if (strcmp(new_options.node_name, orig_options->node_name) != 0)
{
log_warning(_("\nCannot change standby name, will keep current configuration.\n"));
log_warning(_("Cannot change standby name, will keep current configuration.\n"));
return false;
}
if (new_options.failover != MANUAL_FAILOVER && new_options.failover != AUTOMATIC_FAILOVER)
{
log_warning(_("\nNew value for failover is not valid. Should be MANUAL or AUTOMATIC.\n"));
log_warning(_("New value for failover is not valid. Should be MANUAL or AUTOMATIC.\n"));
return false;
}
if (new_options.master_response_timeout <= 0)
{
log_warning(_("\nNew value for master_response_timeout is not valid. Should be greater than zero.\n"));
log_warning(_("New value for master_response_timeout is not valid. Should be greater than zero.\n"));
return false;
}
if (new_options.reconnect_attempts < 0)
{
log_warning(_("\nNew value for reconnect_attempts is not valid. Should be greater or equal than zero.\n"));
log_warning(_("New value for reconnect_attempts is not valid. Should be greater or equal than zero.\n"));
return false;
}
if (new_options.reconnect_intvl < 0)
{
log_warning(_("\nNew value for reconnect_interval is not valid. Should be greater or equal than zero.\n"));
log_warning(_("New value for reconnect_interval is not valid. Should be greater or equal than zero.\n"));
return false;
}
@@ -268,7 +292,7 @@ reload_configuration(char *config_file, t_configuration_options *orig_options)
conn = establishDBConnection(new_options.conninfo, false);
if (!conn || (PQstatus(conn) != CONNECTION_OK))
{
log_warning(_("\nconninfo string is not valid, will keep current configuration.\n"));
log_warning(_("conninfo string is not valid, will keep current configuration.\n"));
return false;
}
PQfinish(conn);
@@ -283,6 +307,7 @@ reload_configuration(char *config_file, t_configuration_options *orig_options)
strcpy(orig_options->promote_command, new_options.promote_command);
strcpy(orig_options->follow_command, new_options.follow_command);
strcpy(orig_options->rsync_options, new_options.rsync_options);
strcpy(orig_options->ssh_options, new_options.ssh_options);
orig_options->master_response_timeout = new_options.master_response_timeout;
orig_options->reconnect_attempts = new_options.reconnect_attempts;
orig_options->reconnect_intvl = new_options.reconnect_intvl;
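One of the fixes in `reload_configuration()` above replaces `new_options.node_name != orig_options->node_name` with a `strcmp()` call. Because `node_name` is a character array, the `!=` form compares the addresses of two different buffers and is therefore always true. A small sketch of the corrected comparison, using a hypothetical `demo_options` struct rather than the real `t_configuration_options`:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NAME_LEN 64				/* illustrative size, not repmgr's MAXLEN */

/* Hypothetical stand-in for the relevant part of t_configuration_options. */
typedef struct
{
	int			node;
	char		node_name[NAME_LEN];
} demo_options;

/*
 * Returns true when the node name differs between the two option sets.
 * Comparing the arrays with != would compare their addresses, which
 * always differ for two distinct structs, so strcmp() is required.
 */
bool
node_name_changed(const demo_options *new_opts, const demo_options *orig_opts)
{
	assert(new_opts != NULL && orig_opts != NULL);
	return strcmp(new_opts->node_name, orig_opts->node_name) != 0;
}
```

With the pointer comparison, every reload would have been rejected with "Cannot change standby name", even when the name was unchanged.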

@@ -1,6 +1,6 @@
/*
* config.h
* Copyright (c) 2ndQuadrant, 2010-2012
* Copyright (c) 2ndQuadrant, 2010-2014
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -36,11 +36,19 @@ typedef struct
char loglevel[MAXLEN];
char logfacility[MAXLEN];
char rsync_options[QUERY_STR_LEN];
char ssh_options[QUERY_STR_LEN];
int master_response_timeout;
int reconnect_attempts;
int reconnect_intvl;
char pg_bindir[MAXLEN];
char pgctl_options[MAXLEN];
char logfile[MAXLEN];
int monitor_interval_secs;
int retry_promote_interval_secs;
} t_configuration_options;
#define T_CONFIGURATION_OPTIONS_INITIALIZER { "", -1, "", MANUAL_FAILOVER, -1, "", "", "", "", "", "", "", -1, -1, -1, "", "", "", 0, 0 }
void parse_config(const char *config_file, t_configuration_options *options);
void parse_line(char *buff, char *name, char *value);
char *trim(char *s);

dbutils.c

@@ -1,6 +1,6 @@
/*
* dbutils.c - Database connection/management functions
* Copyright (C) 2ndQuadrant, 2010-2012
* Copyright (C) 2ndQuadrant, 2010-2014
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -71,25 +71,22 @@ establishDBConnectionByParams(const char *keywords[], const char *values[],const
return conn;
}
bool
int
is_standby(PGconn *conn)
{
PGresult *res;
bool result = false;
int result = 0;
res = PQexec(conn, "SELECT pg_is_in_recovery()");
if (PQresultStatus(res) != PGRES_TUPLES_OK)
if (res == NULL || PQresultStatus(res) != PGRES_TUPLES_OK)
{
log_err(_("Can't query server mode: %s"),
PQerrorMessage(conn));
PQclear(res);
PQfinish(conn);
exit(ERR_DB_QUERY);
result = -1;
}
if (PQntuples(res) == 1 && strcmp(PQgetvalue(res, 0, 0), "t") == 0)
result = true;
else if (PQntuples(res) == 1 && strcmp(PQgetvalue(res, 0, 0), "t") == 0)
result = 1;
PQclear(res);
return result;
@@ -97,11 +94,11 @@ is_standby(PGconn *conn)
bool
int
is_witness(PGconn *conn, char *schema, char *cluster, int node_id)
{
PGresult *res;
bool result = false;
int result = 0;
char sqlquery[QUERY_STR_LEN];
sqlquery_snprintf(sqlquery, "SELECT witness from %s.repl_nodes where cluster = '%s' and id = %d",
@@ -110,13 +107,10 @@ is_witness(PGconn *conn, char *schema, char *cluster, int node_id)
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
log_err(_("Can't query server mode: %s"), PQerrorMessage(conn));
PQclear(res);
PQfinish(conn);
exit(ERR_DB_QUERY);
result = -1;
}
if (PQntuples(res) == 1 && strcmp(PQgetvalue(res, 0, 0), "t") == 0)
result = true;
else if (PQntuples(res) == 1 && strcmp(PQgetvalue(res, 0, 0), "t") == 0)
result = 1;
PQclear(res);
return result;
@@ -138,7 +132,7 @@ is_pgup(PGconn *conn, int timeout)
{
if (twice)
return false;
PQreset(conn); // reconnect
PQreset(conn); /* reconnect */
twice = true;
}
else
@@ -146,15 +140,16 @@ is_pgup(PGconn *conn, int timeout)
/*
* Send a SELECT 1 just to check if the connection is OK
*/
CancelQuery(conn, timeout);
if (!CancelQuery(conn, timeout))
goto failed;
if (wait_connection_availability(conn, timeout) != 1)
goto failed;
sqlquery_snprintf(sqlquery, "SELECT 1");
if (PQsendQuery(conn, sqlquery) == 0)
{
log_warning(_("PQsendQuery: Query could not be sent to primary. %s\n"),
PQerrorMessage(conn));
log_warning(_("PQsendQuery: Query could not be sent to primary. %s\n"),
PQerrorMessage(conn));
goto failed;
}
if (wait_connection_availability(conn, timeout) != 1)
@@ -163,10 +158,10 @@ is_pgup(PGconn *conn, int timeout)
break;
failed:
// we need to retry, because we might just have loose the connection once
/* we need to retry, because we might just have loose the connection once */
if (twice)
return false;
PQreset(conn); // reconnect
PQreset(conn); /* reconnect */
twice = true;
}
}
@@ -197,8 +192,7 @@ pg_version(PGconn *conn, char* major_version)
log_err(_("Version check PQexec failed: %s"),
PQerrorMessage(conn));
PQclear(res);
PQfinish(conn);
exit(ERR_DB_QUERY);
return NULL;
}
major_version1 = atoi(PQgetvalue(res, 0, 0));
@@ -219,12 +213,13 @@ pg_version(PGconn *conn, char* major_version)
}
bool
guc_setted(PGconn *conn, const char *parameter, const char *op,
int
guc_set(PGconn *conn, const char *parameter, const char *op,
const char *value)
{
PGresult *res;
char sqlquery[QUERY_STR_LEN];
int retval = 1;
sqlquery_snprintf(sqlquery, "SELECT true FROM pg_settings "
" WHERE name = '%s' AND setting %s '%s'",
@@ -235,18 +230,49 @@ guc_setted(PGconn *conn, const char *parameter, const char *op,
{
log_err(_("GUC setting check PQexec failed: %s"),
PQerrorMessage(conn));
PQclear(res);
PQfinish(conn);
exit(ERR_DB_QUERY);
retval = -1;
}
if (PQntuples(res) == 0)
else if (PQntuples(res) == 0)
{
PQclear(res);
return false;
retval = 0;
}
PQclear(res);
return true;
return retval;
}
/**
* Just like guc_set except with an extra parameter containing the name of
* the pg datatype so that the comparison can be done properly.
*/
int
guc_set_typed(PGconn *conn, const char *parameter, const char *op,
const char *value, const char *datatype)
{
PGresult *res;
char sqlquery[QUERY_STR_LEN];
int retval = 1;
sqlquery_snprintf(sqlquery, "SELECT true FROM pg_settings "
" WHERE name = '%s' AND setting::%s %s '%s'::%s",
parameter, datatype, op, value, datatype);
res = PQexec(conn, sqlquery);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
log_err(_("GUC setting check PQexec failed: %s"),
PQerrorMessage(conn));
retval = -1;
}
else if (PQntuples(res) == 0)
{
retval = 0;
}
PQclear(res);
return retval;
}
@@ -254,7 +280,7 @@ const char *
get_cluster_size(PGconn *conn)
{
PGresult *res;
const char *size;
const char *size = NULL;
char sqlquery[QUERY_STR_LEN];
sqlquery_snprintf(
@@ -267,11 +293,12 @@ get_cluster_size(PGconn *conn)
{
log_err(_("Get cluster size PQexec failed: %s"),
PQerrorMessage(conn));
PQclear(res);
PQfinish(conn);
exit(ERR_DB_QUERY);
}
size = PQgetvalue(res, 0, 0);
else
{
size = PQgetvalue(res, 0, 0);
}
PQclear(res);
return size;
}
@@ -332,8 +359,7 @@ getMasterConnection(PGconn *standby_conn, char *schema, char *cluster,
log_err(_("Can't get nodes info: %s\n"),
PQerrorMessage(standby_conn));
PQclear(res1);
PQfinish(standby_conn);
exit(ERR_DB_QUERY);
return NULL;
}
for (i = 0; i < PQntuples(res1); i++)
@@ -396,7 +422,7 @@ getMasterConnection(PGconn *standby_conn, char *schema, char *cluster,
/*
* wait until current query finishes ignoring any results, this could be an async command
* or a cancelation of a query
* or a cancelation of a query
* return 1 if Ok; 0 if any error ocurred; -1 if timeout reached
*/
int
@@ -408,11 +434,11 @@ wait_connection_availability(PGconn *conn, int timeout)
{
if (PQconsumeInput(conn) == 0)
{
log_warning(_("PQconsumeInput: Query could not be sent to primary. %s\n"),
PQerrorMessage(conn));
log_warning(_("wait_connection_availability: could not receive data from connection. %s\n"),
PQerrorMessage(conn));
return 0;
}
if (PQisBusy(conn) == 0)
{
res = PQgetResult(conn);
@@ -424,23 +450,40 @@ wait_connection_availability(PGconn *conn, int timeout)
}
if (timeout >= 0)
return 1;
else
else {
log_warning(_("wait_connection_availability: timeout reached"));
return -1;
}
}
void
bool
CancelQuery(PGconn *conn, int timeout)
{
char errbuf[ERRBUFF_SIZE];
PGcancel *pgcancel;
wait_connection_availability(conn, timeout);
if (wait_connection_availability(conn, timeout) != 1)
return false;
pgcancel = PQgetCancel(conn);
if (!pgcancel || PQcancel(pgcancel, errbuf, ERRBUFF_SIZE) == 0)
if (pgcancel == NULL)
return false;
/*
* PQcancel can only return 0 if socket()/connect()/send()
* fails, in any of those cases we can assume something
* bad happened to the connection
*/
if (PQcancel(pgcancel, errbuf, ERRBUFF_SIZE) == 0)
{
log_warning(_("Can't stop current query: %s\n"), errbuf);
PQfreeCancel(pgcancel);
return false;
}
PQfreeCancel(pgcancel);
return true;
}

@@ -1,6 +1,6 @@
/*
* dbutils.h
* Copyright (c) 2ndQuadrant, 2010-2012
* Copyright (c) 2ndQuadrant, 2010-2014
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -26,16 +26,19 @@ PGconn *establishDBConnection(const char *conninfo, const bool exit_on_error);
PGconn *establishDBConnectionByParams(const char *keywords[],
const char *values[],
const bool exit_on_error);
bool is_standby(PGconn *conn);
bool is_witness(PGconn *conn, char *schema, char *cluster, int node_id);
int is_standby(PGconn *conn);
int is_witness(PGconn *conn, char *schema, char *cluster, int node_id);
bool is_pgup(PGconn *conn, int timeout);
char *pg_version(PGconn *conn, char* major_version);
bool guc_setted(PGconn *conn, const char *parameter, const char *op,
const char *value);
int guc_set(PGconn *conn, const char *parameter, const char *op,
const char *value);
int guc_set_typed(PGconn *conn, const char *parameter, const char *op,
const char *value, const char *datatype);
const char *get_cluster_size(PGconn *conn);
PGconn *getMasterConnection(PGconn *standby_conn, char *schema, char *cluster,
int *master_id, char *master_conninfo_out);
int wait_connection_availability(PGconn *conn, int timeout);
void CancelQuery(PGconn *conn, int timeout);
bool CancelQuery(PGconn *conn, int timeout);
#endif
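With `is_standby()`, `is_witness()` and `guc_set()` now returning `int` (-1 on error, 0 for no, 1 for yes), callers can no longer treat the result as a plain boolean: -1 is truthy in C. The sketch below shows one way a caller might dispatch on the three outcomes; the `next_action` enum and `action_for_probe()` are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Hypothetical follow-up actions for a caller of is_standby(). */
typedef enum
{
	ACT_CONTINUE_AS_STANDBY,
	ACT_CONTINUE_AS_MASTER,
	ACT_CLEANUP_AND_FAIL
} next_action;

/*
 * Maps a tri-state probe result (-1 error, 0 no, 1 yes) to an action.
 * The error case is tested explicitly; a bare `if (result)` would
 * wrongly treat -1 as "yes".
 */
next_action
action_for_probe(int result)
{
	switch (result)
	{
		case 1:
			return ACT_CONTINUE_AS_STANDBY;
		case 0:
			return ACT_CONTINUE_AS_MASTER;
		default:
			return ACT_CLEANUP_AND_FAIL;
	}
}
```

In the failing case the caller, not the helper, is responsible for closing the connection and choosing the exit code.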

@@ -1,9 +1,9 @@
Package: repmgr-auto
Version: 1.0-1
Version: 2.0beta2
Section: database
Priority: optional
Architecture: all
Depends: rsync, postgresql-9.0
Maintainer: Greg Smith <greg@2ndQuadrant.com>
Depends: rsync, postgresql-9.0 | postgresql-9.1 | postgresql-9.2 | postgresql-9.3
Maintainer: Jaime Casanova <jaime@2ndQuadrant.com>
Description: PostgreSQL replication setup, management and monitoring
has two main executables

debian/repmgr.repmgrd.default

@@ -0,0 +1,14 @@
#!/bin/sh
# default settings for repmgrd. This file is sourced by /bin/sh from
# /etc/init.d/repmgrd
# Options for repmgrd
REPMGRD_OPTS=""
# repmgrd binary (the init script reads REPMGRD_BIN)
REPMGRD_BIN="/usr/bin/repmgrd"
# pid file (the init script reads REPMGRD_PIDFILE)
REPMGRD_PIDFILE="/var/run/repmgrd.pid"

debian/repmgr.repmgrd.init

@@ -0,0 +1,48 @@
#!/bin/sh
### BEGIN INIT INFO
# Provides: repmgrd
# Required-Start: $local_fs $remote_fs $network $syslog $postgresql
# Required-Stop: $local_fs $remote_fs $network $syslog $postgresql
# Should-Start: $syslog $postgresql
# Should-Stop: $syslog $postgresql
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Start/stop repmgrd
### END INIT INFO
set -e
if test -f /etc/default/repmgrd; then
. /etc/default/repmgrd
fi
if [ -z "$REPMGRD_BIN" ]; then
REPMGRD_BIN="/usr/bin/repmgrd"
fi
if [ -z "$REPMGRD_PIDFILE" ]; then
REPMGRD_PIDFILE="/var/run/repmgrd.pid"
fi
test -x $REPMGRD_BIN || exit 0
case "$1" in
start)
start-stop-daemon --start --quiet --make-pidfile --pidfile $REPMGRD_PIDFILE --exec $REPMGRD_BIN $REPMGRD_OPTS
;;
stop)
start-stop-daemon --stop --oknodo --quiet --pidfile $REPMGRD_PIDFILE
;;
restart)
$0 stop && $0 start || exit 1
;;
*)
echo "Usage: $0 {start|stop|restart}"
exit 1
;;
esac
exit 0

@@ -1,6 +1,6 @@
/*
* errcode.h
* Copyright (C) 2ndQuadrant, 2010-2012
* Copyright (C) 2ndQuadrant, 2010-2014
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -35,5 +35,6 @@
#define ERR_STR_OVERFLOW 10
#define ERR_FAILOVER_FAIL 11
#define ERR_BAD_SSH 12
#define ERR_SYS_FAILURE 13
#endif /* _ERRCODE_H_ */

log.c

@@ -1,6 +1,6 @@
/*
* log.c - Logging methods
* Copyright (C) 2ndQuadrant, 2010-2012
* Copyright (C) 2ndQuadrant, 2010-2014
*
* This module is a set of methods for logging (currently only syslog)
*
@@ -25,9 +25,11 @@
#ifdef HAVE_SYSLOG
#include <syslog.h>
#include <stdarg.h>
#endif
#include <stdarg.h>
#include <time.h>
#include "log.h"
#define DEFAULT_IDENT "repmgr"
@@ -37,13 +39,36 @@
/* #define REPMGR_DEBUG */
void stderr_log_with_level(const char *level_name, int level, const char *fmt, ...) {
size_t len = strlen(fmt);
char fmt1[len + 150];
time_t t;
struct tm *tm;
char buff[100];
va_list ap;
if(log_level >= level) {
time(&t);
tm = localtime(&t);
va_start(ap, fmt);
strftime(buff, 100, "[%Y-%m-%d %H:%M:%S]", tm);
snprintf(fmt1, len + 150, "%s [%s] %s", buff, level_name, fmt);
vfprintf(stderr, fmt1, ap);
va_end(ap);
}
}
static int detect_log_level(const char* level);
static int detect_log_facility(const char* facility);
int log_type = REPMGR_STDERR;
int log_level = LOG_NOTICE;
bool logger_init(const char* ident, const char* level, const char* facility)
bool logger_init(t_configuration_options *opts, const char* ident, const char* level, const char* facility)
{
int l;
@@ -115,6 +140,11 @@ bool logger_init(const char* ident, const char* level, const char* facility)
#endif
if (*opts->logfile)
{
freopen(opts->logfile, "a", stderr);
}
return true;
}
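The new `stderr_log_with_level()` prefixes each message with a timestamp and the level name. The sketch below shows the same output layout in a form that can be checked in isolation: a hypothetical `format_log_line()` that writes into a caller-supplied buffer. Unlike the function in the diff, it formats the prefix and the message in two steps instead of building a combined format string; that is a choice made for this example, not the project's approach.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <time.h>

/*
 * Formats "[YYYY-MM-DD HH:MM:SS] [LEVEL] message" into buf.
 * Returns the total length the line needs (as vsnprintf does),
 * or -1 on failure.
 */
int
format_log_line(char *buf, size_t buflen, time_t when,
				const char *level_name, const char *fmt,...)
{
	char		stamp[32];
	struct tm  *tm;
	va_list		ap;
	int			prefix_len;
	int			msg_len;

	assert(buf != NULL && buflen > 0);

	tm = localtime(&when);
	if (tm == NULL ||
		strftime(stamp, sizeof(stamp), "[%Y-%m-%d %H:%M:%S]", tm) == 0)
		return -1;

	/* The level name is passed as data, never as part of the format. */
	prefix_len = snprintf(buf, buflen, "%s [%s] ", stamp, level_name);
	if (prefix_len < 0 || (size_t) prefix_len >= buflen)
		return -1;

	va_start(ap, fmt);
	msg_len = vsnprintf(buf + prefix_len, buflen - prefix_len, fmt, ap);
	va_end(ap);

	return msg_len < 0 ? -1 : prefix_len + msg_len;
}
```

Writing the line with a single `fputs(buf, stderr)` afterwards keeps each log entry in one write to the stream.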

log.h

@@ -1,6 +1,6 @@
/*
* log.h
* Copyright (c) 2ndQuadrant, 2010-2012
* Copyright (c) 2ndQuadrant, 2010-2014
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -25,15 +25,17 @@
#define REPMGR_SYSLOG 1
#define REPMGR_STDERR 2
void stderr_log_with_level(const char *level_name, int level, const char *fmt, ...) __attribute__ ((format (PG_PRINTF_ATTRIBUTE, 3, 4)));
/* Standard error logging */
#define stderr_log_debug(...) if (log_level >= LOG_DEBUG) fprintf(stderr, __VA_ARGS__)
#define stderr_log_info(...) if (log_level >= LOG_INFO) fprintf(stderr, __VA_ARGS__)
#define stderr_log_notice(...) if (log_level >= LOG_NOTICE) fprintf(stderr, __VA_ARGS__)
#define stderr_log_warning(...) if (log_level >= LOG_WARNING) fprintf(stderr, __VA_ARGS__)
#define stderr_log_err(...) if (log_level >= LOG_ERR) fprintf(stderr, __VA_ARGS__)
#define stderr_log_crit(...) if (log_level >= LOG_CRIT) fprintf(stderr, __VA_ARGS__)
#define stderr_log_alert(...) if (log_level >= LOG_ALERT) fprintf(stderr, __VA_ARGS__)
#define stderr_log_emerg(...) if (log_level >= LOG_EMERG) fprintf(stderr, __VA_ARGS__)
#define stderr_log_debug(...) stderr_log_with_level("DEBUG", LOG_DEBUG, __VA_ARGS__)
#define stderr_log_info(...) stderr_log_with_level("INFO", LOG_INFO, __VA_ARGS__)
#define stderr_log_notice(...) stderr_log_with_level("NOTICE", LOG_NOTICE, __VA_ARGS__)
#define stderr_log_warning(...) stderr_log_with_level("WARNING", LOG_WARNING, __VA_ARGS__)
#define stderr_log_err(...) stderr_log_with_level("ERROR", LOG_ERR, __VA_ARGS__)
#define stderr_log_crit(...) stderr_log_with_level("CRITICAL", LOG_CRIT, __VA_ARGS__)
#define stderr_log_alert(...) stderr_log_with_level("ALERT", LOG_ALERT, __VA_ARGS__)
#define stderr_log_emerg(...) stderr_log_with_level("EMERGENCY", LOG_EMERG, __VA_ARGS__)
#ifdef HAVE_SYSLOG
@@ -112,7 +114,7 @@
/* Logger initialisation and shutdown */
bool logger_shutdown(void);
bool logger_init(const char* ident, const char* level, const char* facility);
bool logger_init(t_configuration_options *opts, const char* ident, const char* level, const char* facility);
void logger_min_verbose(int minimum);
extern int log_type;

repmgr.c

@@ -1,6 +1,6 @@
/*
* repmgr.c - Command interpreter for the repmgr
* Copyright (C) 2ndQuadrant, 2010-2012
* Copyright (C) 2ndQuadrant, 2010-2014
*
* This module is a command-line utility to easily setup a cluster of
* hot standby servers for an HA environment
@@ -30,6 +30,7 @@
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>
@@ -55,7 +56,7 @@
static bool create_recovery_file(const char *data_dir);
static int test_ssh_connection(char *host, char *remote_user);
static int copy_remote_files(char *host, char *remote_user, char *remote_path,
char *local_path, bool is_directory);
char *local_path, bool is_directory);
static bool check_parameters_for_action(const int action);
static bool create_schema(PGconn *conn);
static bool copy_configuration(PGconn *masterconn, PGconn *witnessconn);
@@ -84,8 +85,8 @@ bool need_a_node = true;
bool require_password = false;
/* Initialization of runtime options */
t_runtime_options runtime_options = { "", "", "", "", "", "", DEFAULT_WAL_KEEP_SEGMENTS, false, false, false, "", "", 0 };
t_configuration_options options = { "", -1, "", MANUAL_FAILOVER, -1, "", "", "", "", "", "", -1 };
t_runtime_options runtime_options = T_RUNTIME_OPTIONS_INITIALIZER;
t_configuration_options options = T_CONFIGURATION_OPTIONS_INITIALIZER;
static char *server_mode = NULL;
static char *server_cmd = NULL;
@@ -104,8 +105,9 @@ main(int argc, char **argv)
{"config-file", required_argument, NULL, 'f'},
{"remote-user", required_argument, NULL, 'R'},
{"wal-keep-segments", required_argument, NULL, 'w'},
{"keep-history", required_argument, NULL, 'k'},
{"keep-history", required_argument, NULL, 'k'},
{"force", no_argument, NULL, 'F'},
{"wait", no_argument, NULL, 'W'},
{"ignore-rsync-warning", no_argument, NULL, 'I'},
{"verbose", no_argument, NULL, 'v'},
{NULL, 0, NULL, 0}
@@ -132,7 +134,7 @@ main(int argc, char **argv)
}
while ((c = getopt_long(argc, argv, "d:h:p:U:D:l:f:R:w:k:F:I:v", long_options,
while ((c = getopt_long(argc, argv, "d:h:p:U:D:l:f:R:w:k:FWIv", long_options,
&optindex)) != -1)
{
switch (c)
@@ -171,11 +173,14 @@ main(int argc, char **argv)
if (atoi(optarg) > 0)
runtime_options.keep_history = atoi(optarg);
else
runtime_options.keep_history = 0;
runtime_options.keep_history = 0;
break;
case 'F':
runtime_options.force = true;
break;
case 'W':
runtime_options.wait_for_master = true;
break;
case 'I':
runtime_options.ignore_rsync_warn = true;
break;
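The option-string change in this hunk ("F:I:" becoming "FWI") matters because a trailing colon tells getopt that the short option takes an argument. A minimal sketch of the difference follows. It assumes a system that provides `getopt_long` in `<getopt.h>`; `struct flags` and `parse_flags` are hypothetical names used only for illustration and are not part of repmgr.

```c
#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical container for the options exercised in this sketch. */
struct flags
{
	bool		force;
	bool		wait_for_master;
	int			keep_history;
};

/*
 * Parse argv with the given short-option string and return the index of
 * the first unparsed argument, or -1 on an unknown option.
 */
static int
parse_flags(int argc, char **argv, const char *optstring, struct flags *out)
{
	static const struct option long_options[] = {
		{"keep-history", required_argument, NULL, 'k'},
		{"force", no_argument, NULL, 'F'},
		{"wait", no_argument, NULL, 'W'},
		{NULL, 0, NULL, 0}
	};
	int			c;

	assert(out != NULL);
	optind = 1;					/* restart scanning from the first argument */
	opterr = 0;					/* keep getopt quiet in this sketch */

	while ((c = getopt_long(argc, argv, optstring, long_options, NULL)) != -1)
	{
		switch (c)
		{
			case 'k':
				/* same clamping as the hunk above: non-positive becomes 0 */
				out->keep_history = atoi(optarg) > 0 ? atoi(optarg) : 0;
				break;
			case 'F':
				out->force = true;
				break;
			case 'W':
				out->wait_for_master = true;
				break;
			default:
				return -1;
		}
	}
	return optind;
}
```

With "k:FW" the arguments `-F -W` set both flags. With the old-style "k:F:W" the `-W` is consumed as the argument of `-F`, so the wait flag is never set.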
@@ -263,13 +268,10 @@ main(int argc, char **argv)
}
}
switch (optind < argc)
if (optind < argc)
{
case 0:
break;
default:
log_err(_("%s: too many command-line arguments (first extra is \"%s\")\n"),
progname, argv[optind + 1]);
progname, argv[optind]);
usage();
exit(ERR_BAD_CONFIG);
}
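This hunk also changes the reported argument from `argv[optind + 1]` to `argv[optind]`. After option parsing, `optind` already indexes the first unconsumed argument, so adding one would skip it and could read the terminating NULL entry. A small sketch of the corrected lookup follows; `first_extra_argument` is a hypothetical helper, not a repmgr function, and `next_index` stands in for `optind`.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: return the first argument left over after option
 * parsing, or NULL when nothing is left. next_index plays the role of
 * getopt's optind.
 */
static const char *
first_extra_argument(int argc, char *const argv[], int next_index)
{
	assert(argv != NULL);
	assert(next_index >= 0);

	if (next_index < argc)
		return argv[next_index];	/* not argv[next_index + 1] */

	return NULL;
}
```

A caller would report the returned string in the "too many command-line arguments" message and exit, as the patched code does.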
@@ -317,7 +319,7 @@ main(int argc, char **argv)
* at, but it often requires detailed logging to troubleshoot
* problems.
*/
logger_init(progname, options.loglevel, options.logfacility);
logger_init(&options, progname, options.loglevel, options.logfacility);
if (runtime_options.verbose)
logger_min_verbose(LOG_INFO);
@@ -391,7 +393,7 @@ do_cluster_show(void)
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
log_err(_("Can't get nodes informations, have you regitered them?\n%s\n"), PQerrorMessage(conn));
log_err(_("Can't get nodes information, have you registered them?\n%s\n"), PQerrorMessage(conn));
PQclear(res);
PQfinish(conn);
exit(ERR_BAD_CONFIG);
@@ -420,7 +422,7 @@ do_cluster_show(void)
PQclear(res);
}
static void
static void
do_cluster_cleanup(void)
{
int master_id;
@@ -429,14 +431,14 @@ do_cluster_cleanup(void)
PGresult *res;
char sqlquery[QUERY_STR_LEN];
/* We need to connect to check configuration */
log_info(_("%s connecting to database\n"), progname);
conn = establishDBConnection(options.conninfo, true);
/* We need to connect to check configuration */
log_info(_("%s connecting to database\n"), progname);
conn = establishDBConnection(options.conninfo, true);
/* check if there is a master in this cluster */
log_info(_("%s connecting to master database\n"), progname);
master_conn = getMasterConnection(conn, repmgr_schema, options.cluster_name,
&master_id, NULL);
&master_id, NULL);
if (!master_conn)
{
log_err(_("cluster cleanup: cannot connect to master\n"));
@@ -448,8 +450,8 @@ do_cluster_cleanup(void)
if (runtime_options.keep_history > 0)
{
sqlquery_snprintf(sqlquery, "DELETE FROM %s.repl_monitor "
" WHERE age(now(), last_monitor_time) >= '%d days'::interval;",
repmgr_schema, runtime_options.keep_history);
" WHERE age(now(), last_monitor_time) >= '%d days'::interval;",
repmgr_schema, runtime_options.keep_history);
}
else
{
@@ -481,29 +483,35 @@ do_master_register(void)
{
PGconn *conn;
PGresult *res;
char sqlquery[QUERY_STR_LEN];
char sqlquery[QUERY_STR_LEN], *ret_ver;
bool schema_exists = false;
char schema_quoted[MAXLEN];
char master_version[MAXVERSIONSTR];
int ret;
conn = establishDBConnection(options.conninfo, true);
/* master should be v9 or better */
log_info(_("%s connecting to master database\n"), progname);
pg_version(conn, master_version);
if (strcmp(master_version, "") == 0)
ret_ver = pg_version(conn, master_version);
if (ret_ver == NULL || strcmp(master_version, "") == 0)
{
PQfinish(conn);
log_err( _("%s needs master to be PostgreSQL 9.0 or better\n"), progname);
if (ret_ver != NULL)
log_err( _("%s needs master to be PostgreSQL 9.0 or better\n"), progname);
return;
}
/* Check we are a master */
log_info(_("%s connected to master, checking its state\n"), progname);
if (is_standby(conn))
ret = is_standby(conn);
if (ret)
{
log_err(_("Trying to register a standby node as a master\n"));
log_err(_(ret == 1 ? "Trying to register a standby node as a master\n" :
"Connection to node lost!\n"));
PQfinish(conn);
exit(ERR_BAD_CONFIG);
}
@@ -560,8 +568,24 @@ do_master_register(void)
PGconn *master_conn;
int id;
if (runtime_options.force)
{
sqlquery_snprintf(sqlquery, "DELETE FROM %s.repl_nodes "
" WHERE id = %d",
repmgr_schema, options.node);
log_debug(_("master register: %s\n"), sqlquery);
if (!PQexec(conn, sqlquery))
{
log_warning(_("Cannot delete node details, %s\n"),
PQerrorMessage(conn));
PQfinish(conn);
exit(ERR_BAD_CONFIG);
}
}
/* Ensure there isn't any other master already registered */
master_conn = getMasterConnection(conn, repmgr_schema,
master_conn = getMasterConnection(conn, repmgr_schema,
options.cluster_name, &id,NULL);
if (master_conn != NULL)
{
@@ -572,26 +596,11 @@ do_master_register(void)
}
/* Now register the master */
if (runtime_options.force)
{
sqlquery_snprintf(sqlquery, "DELETE FROM %s.repl_nodes "
" WHERE id = %d",
repmgr_schema, options.node);
log_debug(_("master register: %s\n"), sqlquery);
if (!PQexec(conn, sqlquery))
{
log_warning(_("Cannot delete node details, %s\n"),
PQerrorMessage(conn));
PQfinish(conn);
exit(ERR_BAD_CONFIG);
}
}
sqlquery_snprintf(sqlquery, "INSERT INTO %s.repl_nodes (id, cluster, name, conninfo, priority) "
"VALUES (%d, '%s', '%s', '%s', %d)",
repmgr_schema, options.node, options.cluster_name, options.node_name,
options.conninfo, options.priority);
repmgr_schema, options.node, options.cluster_name, options.node_name,
options.conninfo, options.priority);
log_debug(_("master register: %s\n"), sqlquery);
if (!PQexec(conn, sqlquery))
@@ -614,10 +623,10 @@ do_standby_register(void)
{
PGconn *conn;
PGconn *master_conn;
int master_id;
int master_id, ret;
PGresult *res;
char sqlquery[QUERY_STR_LEN];
char sqlquery[QUERY_STR_LEN], *ret_ver;
char schema_quoted[MAXLEN];
char master_version[MAXVERSIONSTR];
@@ -630,18 +639,22 @@ do_standby_register(void)
/* should be v9 or better */
log_info(_("%s connected to standby, checking its state\n"), progname);
pg_version(conn, standby_version);
if (strcmp(standby_version, "") == 0)
ret_ver = pg_version(conn, standby_version);
if (ret_ver == NULL || strcmp(standby_version, "") == 0)
{
PQfinish(conn);
log_err(_("%s needs standby to be PostgreSQL 9.0 or better\n"), progname);
if (ret_ver != NULL)
log_err(_("%s needs standby to be PostgreSQL 9.0 or better\n"), progname);
exit(ERR_BAD_CONFIG);
}
/* Check we are a standby */
if (!is_standby(conn))
ret = is_standby(conn);
if (ret == 0 || ret == -1)
{
log_err(_("repmgr: This node should be a standby (%s)\n"), options.conninfo);
log_err(_(ret == 0 ? "repmgr: This node should be a standby (%s)\n" :
"repmgr: connection to node (%s) lost\n"), options.conninfo);
PQfinish(conn);
exit(ERR_BAD_CONFIG);
}
@@ -692,12 +705,13 @@ do_standby_register(void)
/* master should be v9 or better */
log_info(_("%s connected to master, checking its state\n"), progname);
pg_version(master_conn, master_version);
if (strcmp(master_version, "") == 0)
ret_ver = pg_version(master_conn, master_version);
if (ret_ver == NULL || strcmp(master_version, "") == 0)
{
PQfinish(conn);
PQfinish(master_conn);
log_err(_("%s needs master to be PostgreSQL 9.0 or better\n"), progname);
if (ret_ver != NULL)
log_err(_("%s needs master to be PostgreSQL 9.0 or better\n"), progname);
exit(ERR_BAD_CONFIG);
}
@@ -733,11 +747,12 @@ do_standby_register(void)
sqlquery_snprintf(sqlquery, "INSERT INTO %s.repl_nodes(id, cluster, name, conninfo, priority) "
"VALUES (%d, '%s', '%s', '%s', %d)",
repmgr_schema, options.node, options.cluster_name, options.node_name,
options.conninfo, options.priority);
repmgr_schema, options.node, options.cluster_name, options.node_name,
options.conninfo, options.priority);
log_debug(_("standby register: %s\n"), sqlquery);
if (!PQexec(master_conn, sqlquery))
res = PQexec(master_conn, sqlquery);
if (!res || PQresultStatus(res) != PGRES_COMMAND_OK)
{
log_err(_("Cannot insert node details, %s\n"),
PQerrorMessage(master_conn));
@@ -760,10 +775,11 @@ do_standby_clone(void)
{
PGconn *conn;
PGresult *res;
char sqlquery[QUERY_STR_LEN];
char sqlquery[QUERY_STR_LEN], *ret;
const char *cluster_size;
int r = 0;
int i;
int r = 0, retval = SUCCESS;
int i, is_standby_retval;
bool flag_success = false;
bool test_mode = false;
@@ -814,45 +830,60 @@ do_standby_clone(void)
/* primary should be v9 or better */
log_info(_("%s connected to master, checking its state\n"), progname);
pg_version(conn, master_version);
if (strcmp(master_version, "") == 0)
ret = pg_version(conn, master_version);
if (ret == NULL || strcmp(master_version, "") == 0)
{
PQfinish(conn);
log_err(_("%s needs master to be PostgreSQL 9.0 or better\n"), progname);
if (ret != NULL)
log_err(_("%s needs master to be PostgreSQL 9.0 or better\n"), progname);
exit(ERR_BAD_CONFIG);
}
/* Check we are cloning a primary node */
if (is_standby(conn))
is_standby_retval = is_standby(conn);
if (is_standby_retval)
{
log_err(_(is_standby_retval == 1 ? "The command should clone a primary node\n" :
"Connection to node lost!\n"));
PQfinish(conn);
log_err(_("\nThe command should clone a primary node\n"));
exit(ERR_BAD_CONFIG);
}
/* And check if it is well configured */
if (!guc_setted(conn, "wal_level", "=", "hot_standby"))
i = guc_set(conn, "wal_level", "=", "hot_standby");
if (i == 0 || i == -1)
{
PQfinish(conn);
log_err(_("%s needs parameter 'wal_level' to be set to 'hot_standby'\n"), progname);
if (i == 0)
log_err(_("%s needs parameter 'wal_level' to be set to 'hot_standby'\n"), progname);
exit(ERR_BAD_CONFIG);
}
if (!guc_setted(conn, "wal_keep_segments", ">=", runtime_options.wal_keep_segments))
i = guc_set_typed(conn, "wal_keep_segments", ">=", runtime_options.wal_keep_segments, "integer");
if (i == 0 || i == -1)
{
PQfinish(conn);
log_err(_("%s needs parameter 'wal_keep_segments' to be set to %s or greater (see the '-w' option or edit the postgresql.conf of the PostgreSQL master.)\n"), progname, runtime_options.wal_keep_segments);
if (i == 0)
log_err(_("%s needs parameter 'wal_keep_segments' to be set to %s or greater (see the '-w' option or edit the postgresql.conf of the PostgreSQL master.)\n"), progname, runtime_options.wal_keep_segments);
exit(ERR_BAD_CONFIG);
}
if (!guc_setted(conn, "archive_mode", "=", "on"))
i = guc_set(conn, "archive_mode", "=", "on");
if (i == 0 || i == -1)
{
PQfinish(conn);
log_err(_("%s needs parameter 'archive_mode' to be set to 'on'\n"), progname);
if (i == 0)
log_err(_("%s needs parameter 'archive_mode' to be set to 'on'\n"), progname);
exit(ERR_BAD_CONFIG);
}
if (!guc_setted(conn, "hot_standby", "=", "on"))
i = guc_set(conn, "hot_standby", "=", "on");
if (i == 0 || i == -1)
{
PQfinish(conn);
log_err(_("%s needs parameter 'hot_standby' to be set to 'on'\n"), progname);
if (i == 0)
log_err(_("%s needs parameter 'hot_standby' to be set to 'on'\n"), progname);
exit(ERR_BAD_CONFIG);
}
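The checks in this hunk treat the return value as three-valued: 1 means the condition holds, 0 means it does not, and -1 means the query itself failed. Both 0 and -1 abort, but the requirement message is logged only for 0, presumably because a failed query has already been reported where it occurred. The sketch below captures that convention; `struct check_outcome` and `interpret_check` are hypothetical names and do not exist in repmgr.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of what the caller should do with a check result. */
struct check_outcome
{
	bool		fatal;				/* caller should clean up and exit */
	bool		log_requirement;	/* caller should explain the unmet requirement */
};

/*
 * Map a tri-state result (1 = satisfied, 0 = not satisfied,
 * -1 = the check could not be performed) to the caller's actions.
 */
static struct check_outcome
interpret_check(int result)
{
	struct check_outcome out;

	assert(result >= -1 && result <= 1);

	out.fatal = (result == 0 || result == -1);
	out.log_requirement = (result == 0);
	return out;
}
```

Keeping the mapping in one place would avoid repeating the `i == 0 || i == -1` test for every parameter, though the patch itself spells it out inline each time.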
@@ -920,7 +951,7 @@ do_standby_clone(void)
PQfinish(conn);
exit(ERR_BAD_CONFIG);
}
/* We need all 5 parameters, and they can be retrieved only by superusers */
if (PQntuples(res) != 5)
{
@@ -947,7 +978,10 @@ do_standby_clone(void)
}
PQclear(res);
log_info(_("Succesfully connected to primary. Current installation size is %s\n"), get_cluster_size(conn));
cluster_size = get_cluster_size(conn);
if (cluster_size == NULL)
exit(ERR_DB_QUERY);
log_info(_("Successfully connected to primary. Current installation size is %s\n"), cluster_size);
/*
* XXX master_xlog_directory should be discovered from master configuration
@@ -983,8 +1017,8 @@ do_standby_clone(void)
log_notice(_("Starting backup...\n"));
/*
* in pg 9.1 default is to wait for a sync standby to ack,
/*
* in pg 9.1 default is to wait for a sync standby to ack,
* avoid that by turning off sync rep for this session
*/
sqlquery_snprintf(sqlquery, "SET synchronous_commit TO OFF");
@@ -1031,6 +1065,8 @@ do_standby_clone(void)
{
log_err(_("%s: couldn't use directory %s ...\nUse --force option to force\n"),
progname, local_data_directory);
r = ERR_BAD_CONFIG;
retval = ERR_BAD_CONFIG;
goto stop_backup;
}
@@ -1170,7 +1206,7 @@ stop_backup:
log_err(_("Can't stop backup: %s\n"), PQerrorMessage(conn));
PQclear(res);
PQfinish(conn);
exit(ERR_STOP_BACKUP);
exit(retval);
}
last_wal_segment = PQgetvalue(res, 0, 0);
@@ -1238,13 +1274,13 @@ do_standby_promote(void)
{
PGconn *conn;
PGresult *res;
char sqlquery[QUERY_STR_LEN];
char sqlquery[QUERY_STR_LEN], *ret;
char script[MAXLEN];
PGconn *old_master_conn;
int old_master_id;
int r;
int r, retval;
char data_dir[MAXLEN];
char recovery_file_path[MAXFILENAME];
char recovery_done_path[MAXFILENAME];
@@ -1257,18 +1293,22 @@ do_standby_promote(void)
/* we need v9 or better */
log_info(_("%s connected to master, checking its state\n"), progname);
pg_version(conn, standby_version);
if (strcmp(standby_version, "") == 0)
ret = pg_version(conn, standby_version);
if (ret == NULL || strcmp(standby_version, "") == 0)
{
log_err(_("%s needs standby to be PostgreSQL 9.0 or better\n"), progname);
if (ret != NULL)
log_err(_("%s needs standby to be PostgreSQL 9.0 or better\n"), progname);
PQfinish(conn);
exit(ERR_BAD_CONFIG);
}
/* Check we are in a standby node */
if (!is_standby(conn))
retval = is_standby(conn);
if (retval == 0 || retval == -1)
{
log_err(_("%s: The command should be executed on a standby node\n"), progname);
log_err(_(retval == 0 ? "%s: The command should be executed on a standby node\n" :
"%s: connection to node lost!\n"), progname);
PQfinish(conn);
exit(ERR_BAD_CONFIG);
}
@@ -1308,13 +1348,12 @@ do_standby_promote(void)
rename(recovery_file_path, recovery_done_path);
/*
* We assume the pg_ctl script is in the PATH. Restart and wait for
* the server to finish starting, so that the check below will
* find an active server rather than one starting up. This may
* Restart and wait for the server to finish starting, so that the check
* below will find an active server rather than one starting up. This may
* hang for up to the default timeout (60 seconds).
*/
log_notice(_("%s: restarting server using pg_ctl\n"), progname);
maxlen_snprintf(script, "pg_ctl -D %s -w -m fast restart", data_dir);
log_notice(_("%s: restarting server using %s/pg_ctl\n"), progname, options.pg_bindir);
maxlen_snprintf(script, "%s/pg_ctl %s -D %s -w -m fast restart", options.pg_bindir, options.pgctl_options, data_dir);
r = system(script);
if (r != 0)
{
@@ -1325,13 +1364,15 @@ do_standby_promote(void)
/* reconnect to check we got promoted */
log_info(_("%s connecting to now restarted database\n"), progname);
conn = establishDBConnection(options.conninfo, true);
if (is_standby(conn))
retval = is_standby(conn);
if (retval)
{
log_err(_("\n%s: STANDBY PROMOTE failed, this is still a standby node.\n"), progname);
log_err(_(retval == 1 ? "%s: STANDBY PROMOTE failed, this is still a standby node.\n" :
"%s: connection to node lost!\n"), progname);
}
else
{
log_err(_("\n%s: STANDBY PROMOTE successful. You should REINDEX any hash indexes you have.\n"), progname);
log_err(_("%s: STANDBY PROMOTE successful. You should REINDEX any hash indexes you have.\n"), progname);
}
PQfinish(conn);
return;
@@ -1343,13 +1384,13 @@ do_standby_follow(void)
{
PGconn *conn;
PGresult *res;
char sqlquery[QUERY_STR_LEN];
char sqlquery[QUERY_STR_LEN], *ret;
char script[MAXLEN];
char master_conninfo[MAXLEN];
PGconn *master_conn;
int master_id;
int r;
int r, retval;
char data_dir[MAXLEN];
char master_version[MAXVERSIONSTR];
@@ -1361,26 +1402,44 @@ do_standby_follow(void)
/* Check we are in a standby node */
log_info(_("%s connected to standby, checking its state\n"), progname);
if (!is_standby(conn))
retval = is_standby(conn);
if (retval == 0 || retval == -1)
{
log_err(_("\n%s: The command should be executed in a standby node\n"), progname);
log_err(_(retval == 0 ? "%s: The command should be executed in a standby node\n" :
"%s: connection to node lost!\n"), progname);
PQfinish(conn);
exit(ERR_BAD_CONFIG);
}
/* should be v9 or better */
pg_version(conn, standby_version);
if (strcmp(standby_version, "") == 0)
ret = pg_version(conn, standby_version);
if (ret == NULL || strcmp(standby_version, "") == 0)
{
log_err(_("\n%s needs standby to be PostgreSQL 9.0 or better\n"), progname);
if (ret != NULL)
log_err(_("%s needs standby to be PostgreSQL 9.0 or better\n"), progname);
PQfinish(conn);
exit(ERR_BAD_CONFIG);
}
/* we also need to check if there is any master in the cluster */
log_info(_("%s connecting to master database\n"), progname);
master_conn = getMasterConnection(conn, repmgr_schema,
options.cluster_name, &master_id,(char *) &master_conninfo);
/*
* we also need to check if there is any master in the cluster
* or wait for one to appear if we have set the wait option
*/
log_info(_("%s discovering new master...\n"), progname);
do
{
if (!is_pgup(conn, options.master_response_timeout))
{
conn = establishDBConnection(options.conninfo, true);
}
master_conn = getMasterConnection(conn, repmgr_schema,
options.cluster_name, &master_id,(char *) &master_conninfo);
}
while (master_conn == NULL && runtime_options.wait_for_master);
if (master_conn == NULL)
{
log_err(_("There isn't a master to follow in this cluster\n"));
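The new do/while loop keeps looking for a master for as long as none is found and the wait option is set. The sketch below shows the same loop shape in isolation. `poll_until`, `probe_fn` and `succeeds_on_third_call` are hypothetical, and the `max_attempts` cap is an addition for illustration only; the patched code has no such limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical probe signature: returns true once the target is available. */
typedef bool (*probe_fn) (void *ctx);

/*
 * Call probe at least once; keep calling while it fails and keep_waiting
 * is set. max_attempts == 0 means no limit (the behaviour in the patch).
 */
static bool
poll_until(probe_fn probe, void *ctx, bool keep_waiting, unsigned max_attempts)
{
	unsigned	attempt = 0;
	bool		found;

	assert(probe != NULL);

	do
	{
		found = probe(ctx);
		attempt++;
	}
	while (!found && keep_waiting &&
		   (max_attempts == 0 || attempt < max_attempts));

	return found;
}

/* Example probe for demonstration: succeeds on the third call. */
static bool
succeeds_on_third_call(void *ctx)
{
	int		   *calls = ctx;

	return ++(*calls) >= 3;
}
```

Without the wait flag the probe runs exactly once, which matches the behaviour before the patch.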
@@ -1389,9 +1448,12 @@ do_standby_follow(void)
}
/* Check we are going to point to a master */
if (is_standby(master_conn))
retval = is_standby(master_conn);
if (retval)
{
log_err(_("%s: The node to follow should be a master\n"), progname);
log_err(_(retval == 1 ? "%s: The node to follow should be a master\n" :
"%s: connection to node lost!\n"), progname);
PQfinish(conn);
PQfinish(master_conn);
exit(ERR_BAD_CONFIG);
@@ -1399,10 +1461,11 @@ do_standby_follow(void)
/* should be v9 or better */
log_info(_("%s connected to master, checking its state\n"), progname);
pg_version(master_conn, master_version);
if (strcmp(master_version, "") == 0)
ret = pg_version(master_conn, master_version);
if (ret == NULL || strcmp(master_version, "") == 0)
{
log_err(_("%s needs master to be PostgreSQL 9.0 or better\n"), progname);
if (ret != NULL)
log_err(_("%s needs master to be PostgreSQL 9.0 or better\n"), progname);
PQfinish(conn);
PQfinish(master_conn);
exit(ERR_BAD_CONFIG);
@@ -1427,7 +1490,7 @@ do_standby_follow(void)
strncpy(runtime_options.masterport, PQport(master_conn), MAXLEN);
PQfinish(master_conn);
log_info(_("%s Changing standby's master"),progname);
log_info(_("%s Changing standby's master\n"),progname);
/* Get the data directory full path */
sqlquery_snprintf(sqlquery, "SELECT setting "
@@ -1450,8 +1513,7 @@ do_standby_follow(void)
exit(ERR_BAD_CONFIG);
/* Finally, restart the service */
/* We assume the pg_ctl script is in the PATH */
maxlen_snprintf(script, "pg_ctl -w -D %s -m fast restart", data_dir);
maxlen_snprintf(script, "%s/pg_ctl %s -w -D %s -m fast restart", options.pg_bindir, options.pgctl_options, data_dir);
r = system(script);
if (r != 0)
{
@@ -1469,27 +1531,19 @@ do_witness_create(void)
PGconn *masterconn;
PGconn *witnessconn;
PGresult *res;
char sqlquery[QUERY_STR_LEN];
char sqlquery[QUERY_STR_LEN], *ret;
char script[MAXLEN];
char buf[MAXLEN];
FILE *pg_conf = NULL;
int r = 0;
int r = 0, retval;
int i;
char master_version[MAXVERSIONSTR];
char master_hba_file[MAXLEN];
/* Check this directory could be used as a PGDATA dir */
if (!create_pgdir(runtime_options.dest_dir, runtime_options.force))
{
log_err(_("witness create: couldn't create data directory (\"%s\") for witness"),
runtime_options.dest_dir);
exit(ERR_BAD_CONFIG);
}
/* Connection parameters for master only */
keywords[0] = "host";
values[0] = runtime_options.host;
@@ -1505,23 +1559,27 @@ do_witness_create(void)
}
/* primary should be v9 or better */
pg_version(masterconn, master_version);
if (strcmp(master_version, "") == 0)
ret = pg_version(masterconn, master_version);
if (ret == NULL || strcmp(master_version, "") == 0)
{
log_err(_("%s needs master to be PostgreSQL 9.0 or better\n"), progname);
if (ret != NULL)
log_err(_("%s needs master to be PostgreSQL 9.0 or better\n"), progname);
PQfinish(masterconn);
exit(ERR_BAD_CONFIG);
}
/* Check we are connecting to a primary node */
if (is_standby(masterconn))
retval = is_standby(masterconn);
if (retval)
{
log_err(_("The command should not run on a standby node\n"));
log_err(_(retval == 1 ? "The command should not run on a standby node\n" :
"Connection to node lost!\n"));
PQfinish(masterconn);
exit(ERR_BAD_CONFIG);
}
log_info(_("Succesfully connected to primary.\n"));
log_info(_("Successfully connected to primary.\n"));
r = test_ssh_connection(runtime_options.host, runtime_options.remote_user);
if (r != 0)
@@ -1531,6 +1589,15 @@ do_witness_create(void)
exit(ERR_BAD_SSH);
}
/* Check this directory could be used as a PGDATA dir */
if (!create_pgdir(runtime_options.dest_dir, runtime_options.force))
{
log_err(_("witness create: couldn't create data directory (\"%s\") for witness"),
runtime_options.dest_dir);
exit(ERR_BAD_CONFIG);
}
/*
* To create a witness server we need to:
* 1) initialize the cluster
@@ -1539,8 +1606,7 @@ do_witness_create(void)
*/
/* Create the cluster for witness */
/* We assume the pg_ctl script is in the PATH */
sprintf(script, "pg_ctl -D %s init -o \"-W\"", runtime_options.dest_dir);
sprintf(script, "%s/pg_ctl %s -D %s init -o \"-W\"", options.pg_bindir, options.pgctl_options, runtime_options.dest_dir);
log_info("Initialize cluster for witness: %s.\n", script);
r = system(script);
@@ -1559,7 +1625,7 @@ do_witness_create(void)
pg_conf = fopen(buf, "a");
if (pg_conf == NULL)
{
log_err(_("\n%s: could not open \"%s\" for adding extra config: %s\n"), progname, buf, strerror(errno));
log_err(_("%s: could not open \"%s\" for adding extra config: %s\n"), progname, buf, strerror(errno));
PQfinish(masterconn);
exit(ERR_BAD_CONFIG);
}
@@ -1582,8 +1648,8 @@ do_witness_create(void)
/* Get the pg_hba.conf full path */
sqlquery_snprintf(sqlquery, "SELECT name, setting "
" FROM pg_settings "
" WHERE name IN ('hba_file')");
" FROM pg_settings "
" WHERE name IN ('hba_file')");
log_debug(_("witness create: %s"), sqlquery);
res = PQexec(masterconn, sqlquery);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
@@ -1613,7 +1679,7 @@ do_witness_create(void)
}
/* start new instance */
sprintf(script, "pg_ctl -w -D %s start", runtime_options.dest_dir);
sprintf(script, "%s/pg_ctl %s -w -D %s start", options.pg_bindir, options.pgctl_options, runtime_options.dest_dir);
log_info(_("Start cluster for witness: %s"), script);
r = system(script);
if (r != 0)
@@ -1626,7 +1692,7 @@ do_witness_create(void)
/* register ourselves in the master */
sqlquery_snprintf(sqlquery, "INSERT INTO %s.repl_nodes(id, cluster, name, conninfo, priority, witness) "
"VALUES (%d, '%s', '%s', '%s', %d, true)",
repmgr_schema, options.node, options.cluster_name, options.node_name, options.conninfo);
repmgr_schema, options.node, options.cluster_name, options.node_name, options.conninfo, options.priority);
log_debug(_("witness create: %s"), sqlquery);
if (!PQexec(masterconn, sqlquery))
@@ -1658,7 +1724,7 @@ do_witness_create(void)
PQfinish(masterconn);
PQfinish(witnessconn);
log_notice(_("Configuration has been succesfully copied to the witness\n"));
log_notice(_("Configuration has been successfully copied to the witness\n"));
}
@@ -1666,8 +1732,8 @@ do_witness_create(void)
static void
usage(void)
{
log_err(_("\n\n%s: Replicator manager \n"), progname);
log_err(_("Try \"%s --help\" for more information.\n"), progname);
fprintf(stderr, _("\n\n%s: Replicator manager \n"), progname);
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname);
}
@@ -1677,40 +1743,46 @@ help(const char *progname)
{
printf(_("\n%s: Replicator manager \n"), progname);
printf(_("Usage:\n"));
printf(_(" %s [OPTIONS] master {register}\n"), progname);
printf(_(" %s [OPTIONS] master {register}\n"), progname);
printf(_(" %s [OPTIONS] standby {register|clone|promote|follow}\n"),
progname);
printf(_(" %s [OPTIONS] cluster {show|cleanup}\n"), progname);
printf(_("\nGeneral options:\n"));
printf(_(" --help show this help, then exit\n"));
printf(_(" --version output version information, then exit\n"));
printf(_(" --verbose output verbose activity information\n"));
printf(_(" --help show this help, then exit\n"));
printf(_(" --version output version information, then exit\n"));
printf(_(" --verbose output verbose activity information\n"));
printf(_("\nConnection options:\n"));
printf(_(" -d, --dbname=DBNAME database to connect to\n"));
printf(_(" -h, --host=HOSTNAME database server host or socket directory\n"));
printf(_(" -p, --port=PORT database server port\n"));
printf(_(" -U, --username=USERNAME database user name to connect as\n"));
printf(_(" -d, --dbname=DBNAME database to connect to\n"));
printf(_(" -h, --host=HOSTNAME database server host or socket directory\n"));
printf(_(" -p, --port=PORT database server port\n"));
printf(_(" -U, --username=USERNAME database user name to connect as\n"));
printf(_("\nConfiguration options:\n"));
printf(_(" -D, --data-dir=DIR local directory where the files will be copied to\n"));
printf(_(" -l, --local-port=PORT standby or witness server local port\n"));
printf(_(" -f, --config_file=PATH path to the configuration file\n"));
printf(_(" -R, --remote-user=USERNAME database server username for rsync\n"));
printf(_(" -w, --wal-keep-segments=VALUE minimum value for the GUC wal_keep_segments (default: 5000)\n"));
printf(_(" -I, --ignore-rsync-warning ignore rsync partial transfer warning\n"));
printf(_(" -k, --keep-history=VALUE keeps indicated number of days of history\n"));
printf(_(" -F, --force force potentially dangerous operations to happen\n"));
printf(_(" -D, --data-dir=DIR local directory where the files will be\n" \
" copied to\n"));
printf(_(" -l, --local-port=PORT standby or witness server local port\n"));
printf(_(" -f, --config-file=PATH path to the configuration file\n"));
printf(_(" -R, --remote-user=USERNAME database server username for rsync\n"));
printf(_(" -w, --wal-keep-segments=VALUE minimum value for the GUC\n" \
" wal_keep_segments (default: 5000)\n"));
printf(_(" -I, --ignore-rsync-warning ignore rsync partial transfer warning\n"));
printf(_(" -k, --keep-history=VALUE keeps indicated number of days of\n" \
" history\n"));
printf(_(" -F, --force force potentially dangerous operations\n" \
" to happen\n"));
printf(_(" -W, --wait wait for a master to appear\n"));
printf(_("\n%s performs some tasks like clone a node, promote it "), progname);
printf(_("or making follow another node and then exits.\n"));
printf(_("\n%s performs tasks such as cloning a node, promoting it, or making it\n"), progname);
printf(_("follow another node, and then exits.\n\n"));
printf(_("COMMANDS:\n"));
printf(_(" master register - registers the master in a cluster\n"));
printf(_(" standby register - registers a standby in a cluster\n"));
printf(_(" standby clone [node] - allows creation of a new standby\n"));
printf(_(" standby promote - allows manual promotion of a specific standby into a "));
printf(_("new master in the event of a failover\n"));
printf(_(" standby follow - allows the standby to re-point itself to a new master\n"));
printf(_(" cluster show - print node informations\n"));
printf(_(" cluster cleanup - cleans monitor's history\n"));
printf(_(" master register - registers the master in a cluster\n"));
printf(_(" standby register - registers a standby in a cluster\n"));
printf(_(" standby clone [node] - allows creation of a new standby\n"));
printf(_(" standby promote - allows manual promotion of a specific standby into\n" \
" a new master in the event of a failover\n"));
printf(_(" standby follow - allows the standby to re-point itself to a new\n" \
" master\n"));
printf(_(" cluster show - print node information\n"));
printf(_(" cluster cleanup - cleans monitor's history\n"));
}
@@ -1762,13 +1834,20 @@ test_ssh_connection(char *host, char *remote_user)
char script[MAXLEN];
int r;
/* On some OS, true is located in a different place than in Linux */
#ifdef __FreeBSD__
#define TRUEBIN_PATH "/usr/bin/true"
#else
#define TRUEBIN_PATH "/bin/true"
#endif
/* Check if we have ssh connectivity to host before trying to rsync */
if (!remote_user[0])
maxlen_snprintf(script, "ssh -o Batchmode=yes %s /bin/true", host);
maxlen_snprintf(script, "ssh -o Batchmode=yes %s %s %s", options.ssh_options, host, TRUEBIN_PATH);
else
maxlen_snprintf(script, "ssh -o Batchmode=yes %s -l %s /bin/true", host, remote_user);
maxlen_snprintf(script, "ssh -o Batchmode=yes %s %s -l %s %s", options.ssh_options, host, remote_user, TRUEBIN_PATH);
log_debug(_("command is: %s"), script);
log_debug(_("command is: %s\n"), script);
r = system(script);
if (r != 0)
log_info(_("Can not connect to the remote host (%s)\n"), host);
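The connectivity probe is now assembled from the configured ssh options, the host, an optional remote user, and a platform-dependent path to `true`. The sketch below builds the same command string with an explicit truncation check. `build_ssh_probe` is a hypothetical function; the format strings mirror the ones in the hunk, and the inputs are assumed to be trusted configuration values because the result is handed to `system()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* On some systems true lives in /usr/bin rather than /bin. */
#ifdef __FreeBSD__
#define TRUEBIN_PATH "/usr/bin/true"
#else
#define TRUEBIN_PATH "/bin/true"
#endif

/*
 * Hypothetical helper: write the ssh probe command into buf.
 * Returns 0 on success, -1 if the host is empty or the command
 * does not fit into the buffer.
 */
static int
build_ssh_probe(char *buf, size_t size, const char *ssh_options,
				const char *host, const char *remote_user)
{
	int			n;

	assert(buf != NULL && ssh_options != NULL && host != NULL);

	if (strlen(host) == 0)
		return -1;

	if (remote_user == NULL || remote_user[0] == '\0')
		n = snprintf(buf, size, "ssh -o Batchmode=yes %s %s %s",
					 ssh_options, host, TRUEBIN_PATH);
	else
		n = snprintf(buf, size, "ssh -o Batchmode=yes %s %s -l %s %s",
					 ssh_options, host, remote_user, TRUEBIN_PATH);

	/* snprintf reports the length it wanted; treat truncation as an error */
	return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

Rejecting a truncated command is a design choice of this sketch: running a cut-off ssh command line could otherwise fail in confusing ways.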
@@ -1805,7 +1884,7 @@ copy_remote_files(char *host, char *remote_user, char *remote_path,
if (is_directory)
{
strcat(rsync_flags, " --exclude=pg_xlog* --exclude=pg_control --exclude=*.pid");
strcat(rsync_flags, " --exclude=pg_xlog* --exclude=pg_log* --exclude=pg_control --exclude=*.pid");
maxlen_snprintf(script, "rsync %s %s:%s/* %s",
rsync_flags, host_string, remote_path, local_path);
}
@@ -1835,7 +1914,7 @@ copy_remote_files(char *host, char *remote_user, char *remote_path,
log_info(_("rsync partial transfer warning ignored\n"));
}
else
log_warning( _("\nrsync completed with return code 24: "
log_warning( _("rsync completed with return code 24: "
"\"Partial transfer due to vanished source files\".\n"
"This can happen because of normal operation "
"on the master server, but it may indicate an "
@@ -2004,7 +2083,7 @@ create_schema(PGconn *conn)
sqlquery_snprintf(sqlquery, "CREATE TABLE %s.repl_nodes ( "
" id integer primary key, "
" cluster text not null, "
" name text not null, "
" name text not null, "
" conninfo text not null, "
" priority integer not null, "
" witness boolean not null default false)", repmgr_schema);
@@ -2037,8 +2116,8 @@ create_schema(PGconn *conn)
/* a view */
sqlquery_snprintf(sqlquery, "CREATE VIEW %s.repl_status AS "
" SELECT primary_node, standby_node, name AS standby_name, last_monitor_time, "
" last_wal_primary_location, last_wal_standby_location, "
" pg_size_pretty(replication_lag) replication_lag, "
" last_wal_primary_location, last_wal_standby_location, "
" pg_size_pretty(replication_lag) replication_lag, "
" pg_size_pretty(apply_lag) apply_lag, "
" age(now(), last_monitor_time) AS time_lag "
" FROM %s.repl_monitor JOIN %s.repl_nodes ON standby_node = id "
@@ -2056,8 +2135,8 @@ create_schema(PGconn *conn)
/* an index to improve performance of the view */
sqlquery_snprintf(sqlquery, "CREATE INDEX idx_repl_status_sort "
" ON %s.repl_monitor (last_monitor_time, standby_node) ",
repmgr_schema);
" ON %s.repl_monitor (last_monitor_time, standby_node) ",
repmgr_schema);
log_debug(_("master register: %s\n"), sqlquery);
if (!PQexec(conn, sqlquery))
{
@@ -2069,9 +2148,9 @@ create_schema(PGconn *conn)
/* XXX Here we MUST try to load the repmgr_function.sql not hardcode it here */
sqlquery_snprintf(sqlquery,
"CREATE OR REPLACE FUNCTION %s.repmgr_update_standby_location(text) RETURNS boolean "
"AS '$libdir/repmgr_funcs', 'repmgr_update_standby_location' "
"LANGUAGE C STRICT ", repmgr_schema);
"CREATE OR REPLACE FUNCTION %s.repmgr_update_standby_location(text) RETURNS boolean "
"AS '$libdir/repmgr_funcs', 'repmgr_update_standby_location' "
"LANGUAGE C STRICT ", repmgr_schema);
if (!PQexec(conn, sqlquery))
{
fprintf(stderr, "Cannot create the function repmgr_update_standby_location: %s\n",
@@ -2080,9 +2159,9 @@ create_schema(PGconn *conn)
}
sqlquery_snprintf(sqlquery,
"CREATE OR REPLACE FUNCTION %s.repmgr_get_last_standby_location() RETURNS text "
"AS '$libdir/repmgr_funcs', 'repmgr_get_last_standby_location' "
"LANGUAGE C STRICT ", repmgr_schema);
"CREATE OR REPLACE FUNCTION %s.repmgr_get_last_standby_location() RETURNS text "
"AS '$libdir/repmgr_funcs', 'repmgr_get_last_standby_location' "
"LANGUAGE C STRICT ", repmgr_schema);
if (!PQexec(conn, sqlquery))
{
fprintf(stderr, "Cannot create the function repmgr_get_last_standby_location: %s\n",
@@ -2154,30 +2233,35 @@ write_primary_conninfo(char* line)
/* Environment variable for password (UGLY, please use .pgpass!) */
const char *password = getenv("PGPASSWORD");
if (password != NULL) {
if (password != NULL)
{
maxlen_snprintf(password_buf, " password=%s", password);
}
else if (require_password) {
else if (require_password)
{
log_err(_("%s: PGPASSWORD not set, but having one is required\n"),
progname);
progname);
exit(ERR_BAD_PASSWORD);
}
if (runtime_options.host[0]) {
if (runtime_options.host[0])
{
maxlen_snprintf(host_buf, " host=%s", runtime_options.host);
}
if (runtime_options.username[0]) {
if (runtime_options.username[0])
{
maxlen_snprintf(user_buf, " user=%s", runtime_options.username);
}
if (options.node_name[0]) {
if (options.node_name[0])
{
maxlen_snprintf(appname_buf, " application_name=%s", options.node_name);
}
maxlen_snprintf(conn_buf, "port=%s%s%s%s%s",
(runtime_options.masterport[0]) ? runtime_options.masterport : "5432", host_buf, user_buf, password_buf,
appname_buf);
(runtime_options.masterport[0]) ? runtime_options.masterport : "5432", host_buf, user_buf, password_buf,
appname_buf);
maxlen_snprintf(line, "primary_conninfo = '%s'", conn_buf);
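The hunk above assembles `primary_conninfo` from optional fragments and falls back to port 5432 when no master port is given. A minimal standalone sketch of that pattern follows, assuming plain `snprintf` in place of repmgr's `maxlen_snprintf`; the function name `build_primary_conninfo`, its parameters, and `SKETCH_MAXLEN` are hypothetical and not part of repmgr. The password and application_name fragments are left out for brevity.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAXLEN 1024	/* hypothetical buffer size for this sketch */

/*
 * Hypothetical helper, not repmgr's API: formats a primary_conninfo line
 * from optional port, host and user values.
 * Returns 0 on success, -1 if the result did not fit into `line`.
 */
static int
build_primary_conninfo(char *line, size_t size,
					   const char *port, const char *host, const char *user)
{
	char		host_buf[SKETCH_MAXLEN] = "";
	char		user_buf[SKETCH_MAXLEN] = "";
	int			n;

	assert(line != NULL && size > 0);

	/* Each fragment is emitted only when its value is non-empty */
	if (host != NULL && host[0])
		snprintf(host_buf, sizeof(host_buf), " host=%s", host);
	if (user != NULL && user[0])
		snprintf(user_buf, sizeof(user_buf), " user=%s", user);

	/* Fall back to 5432 when no port was given, as the diff does */
	n = snprintf(line, size, "primary_conninfo = 'port=%s%s%s'",
				 (port != NULL && port[0]) ? port : "5432",
				 host_buf, user_buf);

	return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

Called with an empty port, host `192.168.204.104` and a hypothetical user `repmgr`, this produces `primary_conninfo = 'port=5432 host=192.168.204.104 user=repmgr'`.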

@@ -11,7 +11,8 @@ node_name=standby2
# Connection information
conninfo='host=192.168.204.104'
rsync_options=--archive --checksum --compress --progress --rsh=ssh
rsync_options=--archive --checksum --compress --progress --rsh="ssh -o \"StrictHostKeyChecking no\""
ssh_options=-o "StrictHostKeyChecking no"
# How many seconds we wait for master response before declaring master failure
master_response_timeout=60
@@ -23,8 +24,8 @@ reconnect_interval=10
# Autofailover options
failover=automatic
priority=-1
promote_command='repmgr promote'
follow_command='repmgr follow'
promote_command='repmgr standby promote -f /path/to/repmgr.conf'
follow_command='repmgr standby follow -f /path/to/repmgr.conf -W'
# Log level: possible values are DEBUG, INFO, NOTICE, WARNING, ERR, ALERT, CRIT or EMERG
# Default: NOTICE
@@ -33,3 +34,29 @@ loglevel=NOTICE
# Logging facility: possible values are STDERR or - for Syslog integration - one of LOCAL0, LOCAL1, ..., LOCAL7, USER
# Default: STDERR
logfacility=STDERR
# path to pg_ctl executable
pg_bindir=/usr/bin/
#
# you may add command line arguments for pg_ctl
#
# pg_ctl_options='-s'
#
# redirect stderr to a logfile
#
# logfile='/var/log/repmgr.log'
#
# change monitoring interval; default is 2s
#
# monitor_interval_secs=2
#
# change wait time for master; before we bail out and exit when the
# master disappears, we wait 6 * retry_promote_interval_secs seconds;
# by default this would be half an hour (since sleep_delay default
# value is 300)
#
# retry_promote_interval_secs=300
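The comment above states that the daemon waits 6 * retry_promote_interval_secs before bailing out. The arithmetic can be sketched as below; the function name and the constant `SKETCH_PROMOTE_RETRIES` are hypothetical, and only the factor 6 and the default of 300 come from the comment itself.

```c
#include <assert.h>

/* Factor taken from the config comment; the name is hypothetical */
#define SKETCH_PROMOTE_RETRIES 6

/*
 * Hypothetical helper: total number of seconds waited for the master
 * before giving up, for a given retry_promote_interval_secs value.
 */
static int
total_master_wait_secs(int retry_promote_interval_secs)
{
	assert(retry_promote_interval_secs >= 0);
	return SKETCH_PROMOTE_RETRIES * retry_promote_interval_secs;
}
```

With the default of 300 this gives 1800 seconds, which is the half hour mentioned in the comment.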

@@ -1,6 +1,6 @@
/*
* repmgr.h
* Copyright (c) 2ndQuadrant, 2010-2012
* Copyright (c) 2ndQuadrant, 2010-2014
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -59,6 +59,7 @@ typedef struct
char wal_keep_segments[MAXLEN];
bool verbose;
bool force;
bool wait_for_master;
bool ignore_rsync_warn;
char masterport[MAXLEN];
@@ -68,6 +69,6 @@ typedef struct
int keep_history;
} t_runtime_options;
#define SLEEP_MONITOR 2
#define T_RUNTIME_OPTIONS_INITIALIZER { "", "", "", "", "", "", DEFAULT_WAL_KEEP_SEGMENTS, false, false, false, false, "", "", 0 }
#endif

@@ -1,7 +1,7 @@
/*
* repmgr.sql
*
* Copyright (C) 2ndQuadrant, 2010-2012
* Copyright (C) 2ndQuadrant, 2010-2014
*
*/

921
repmgrd.c

File diff suppressed because it is too large

@@ -9,7 +9,8 @@ DATA=uninstall_repmgr_funcs.sql
OBJS=repmgr_funcs.o
ifdef USE_PGXS
PGXS := $(shell pg_config --pgxs)
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
else
subdir = contrib/repmgr/sql

@@ -1,7 +1,7 @@
/*
* strutil.c
*
* Copyright (C) 2ndQuadrant, 2010-2012
* Copyright (C) 2ndQuadrant, 2010-2014
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -25,7 +25,7 @@
#include "log.h"
#include "strutil.h"
static int xvsnprintf(char *str, size_t size, const char *format, va_list ap);
static int xvsnprintf(char *str, size_t size, const char *format, va_list ap) __attribute__ ((format (PG_PRINTF_ATTRIBUTE, 3, 0)));
/* Add strnlen on platforms that don't have it, like OS X */
#ifndef strnlen
@@ -44,7 +44,7 @@ xvsnprintf(char *str, size_t size, const char *format, va_list ap)
retval = vsnprintf(str, size, format, ap);
if (retval >= size)
if (retval >= (int)size)
{
log_err(_("Buffer of size not large enough to format entire string '%s'\n"),
str);
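The change above silences a signed/unsigned comparison warning by casting `size` to `int`. One alternative, sketched here as an assumption rather than as what repmgr does, is to handle the negative (error) return of `vsnprintf` first and then compare in `size_t`, which also avoids narrowing a large `size`. The wrapper name `format_fits` is hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical wrapper, not part of repmgr: formats into str and
 * returns 1 if the whole string fit, 0 on truncation or output error.
 */
static int
format_fits(char *str, size_t size, const char *format, ...)
{
	va_list		ap;
	int			retval;

	assert(str != NULL && size > 0);

	va_start(ap, format);
	retval = vsnprintf(str, size, format, ap);
	va_end(ap);

	/* A negative return signals an output error */
	if (retval < 0)
		return 0;

	/* retval is non-negative here, so the cast to size_t is safe */
	return (size_t) retval < size;
}
```

On truncation `vsnprintf` still NUL-terminates the buffer, so the caller can log what did fit.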

@@ -1,6 +1,6 @@
/*
* strutil.h
* Copyright (C) 2ndQuadrant, 2010-2012
* Copyright (C) 2ndQuadrant, 2010-2014
*
*
* This program is free software: you can redistribute it and/or modify
@@ -31,9 +31,9 @@
#define MAXCONNINFO 1024
extern int xsnprintf(char *str, size_t size, const char *format, ...);
extern int sqlquery_snprintf(char *str, const char *format, ...);
extern int maxlen_snprintf(char *str, const char *format, ...);
extern int xsnprintf(char *str, size_t size, const char *format, ...) __attribute__ ((format (PG_PRINTF_ATTRIBUTE, 3, 4)));
extern int sqlquery_snprintf(char *str, const char *format, ...) __attribute__ ((format (PG_PRINTF_ATTRIBUTE, 2, 3)));
extern int maxlen_snprintf(char *str, const char *format, ...) __attribute__ ((format (PG_PRINTF_ATTRIBUTE, 2, 3)));
/* Add strnlen on platforms that don't have it, like OS X */
#ifndef strnlen
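The declarations above gain a `format` attribute so that GCC-compatible compilers check the variadic arguments against the format string. A minimal sketch of the same idea outside PostgreSQL follows. It assumes the plain `printf` archetype instead of `PG_PRINTF_ATTRIBUTE`, and the names `SKETCH_PRINTF_LIKE`, `SKETCH_MAXLEN` and `sketch_maxlen_snprintf` are hypothetical.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAXLEN 64	/* hypothetical fixed buffer size */

/* Expands to nothing on compilers without GNU-style attributes */
#if defined(__GNUC__)
#define SKETCH_PRINTF_LIKE(fmt_idx, first_arg) \
	__attribute__ ((format (printf, fmt_idx, first_arg)))
#else
#define SKETCH_PRINTF_LIKE(fmt_idx, first_arg)
#endif

/* Argument 2 is the format string, variadic arguments start at 3 */
static int	sketch_maxlen_snprintf(char *str, const char *format, ...)
			SKETCH_PRINTF_LIKE(2, 3);

static int
sketch_maxlen_snprintf(char *str, const char *format, ...)
{
	va_list		ap;
	int			retval;

	assert(str != NULL);

	va_start(ap, format);
	retval = vsnprintf(str, SKETCH_MAXLEN, format, ap);
	va_end(ap);

	return retval;
}
```

With the attribute in place, a call such as `sketch_maxlen_snprintf(buf, " port=%s", 5432)` should draw a format warning under `-Wall` on GCC or Clang, because an `int` is passed where `%s` expects a string. For functions taking a `va_list`, the second index is 0, as in the `xvsnprintf` declaration in strutil.c.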

@@ -1,7 +1,7 @@
/*
* uninstall_repmgr.sql
*
* Copyright (C) 2ndQuadrant, 2010-2012
* Copyright (C) 2ndQuadrant, 2010-2014
*
*/

@@ -1,4 +1,5 @@
#ifndef _VERSION_H_
#define _VERSION_H_
#define REPMGR_VERSION "2.0beta1"
#define REPMGR_VERSION "2.0RC1"
#endif