Commit Graph

314 Commits

Author SHA1 Message Date
Christian Kruse
a0fdadd5d2 this way it is much cleaner 2014-01-09 15:35:44 +01:00
Christian Kruse
4c3d7f80ed now code compiles with -ansi -pedantic and has less warnings 2014-01-09 14:45:07 +01:00
Christian Kruse
6e3fe059d8 added config options pg_bindir and pg_ctl_options 2014-01-09 14:44:34 +01:00
Christian Kruse
9f26254ac3 fix: added some missing initializers to avoid compiler warning 2014-01-09 13:33:22 +01:00
Christian Kruse
0e8ff1730e added handling of a PID file 2014-01-09 13:04:40 +01:00
Christian Kruse
634fdff303 fix: do not call setup_event_handlers() on WIN32
If we put setup_event_handlers() in #ifdef WIN32, we have to do it for
the call and the declaration, too
2014-01-09 12:57:16 +01:00
Christian Kruse
cbce29f009 fixed typos 2014-01-08 11:55:03 +01:00
Christian Kruse
920f925e4b added a new cli option --daemonize
This option forks the process and generates a new session. This
effectively detaches it from the shell. Don't forget to redirect stderr
or use syslog for logging!
2014-01-08 11:53:15 +01:00
Christian Kruse
9fe2d6886e white space cleanup 2014-01-07 16:42:06 +01:00
Christian Kruse
0068dd573a fix: do not compare pointers but the strings 2014-01-07 15:52:29 +01:00
Christian Kruse
d0f3cb59c7 fix: create data directory after sanity check 2014-01-07 14:42:55 +01:00
Christian Kruse
7428e92e10 fix: correctly check the return value of PQexec()
not only check if return value is not NULL but also check that the
returned result is a PGRES_COMMAND_OK (e.g. the INSERT was successful)
2014-01-07 14:27:31 +01:00
Christian Kruse
a97065113d fix: remove own node earlier if force is set
We have to remove our own node before we check for a new master if force
is set; else master register would fail on the second time since there
already is a master (ourselves), even if we specify -F
2014-01-07 14:16:58 +01:00
Christian Kruse
9e2f276fcf fix: do not exit after pg_start_backup() w/o pg_stop_backup() 2014-01-07 14:02:29 +01:00
Christian Kruse
b0cd2b5e43 fix: do not exit() in create_pgdir()
This could leave the database in a locked state (pg_start_backup()).
And since all calls to create_pgdir() handle the return value correctly
we simply replace the exit() by a return false
2014-01-07 14:01:46 +01:00
Jaime Casanova
9209248420 Fix oversight in the header of guc_setted_typed() v2.0beta2 2013-12-19 11:09:08 -05:00
Jaime Casanova
6693b99288 Files to create the debian package
Patch by: Christian Kruse
2013-12-19 01:43:12 -05:00
Jaime Casanova
8e7b487838 Update debian control file 2013-12-19 01:41:24 -05:00
Jaime Casanova
7f796e2d15 Update history and credit files 2013-12-19 01:40:00 -05:00
Jaime Casanova
5e04ab6eae Add a ssh_options parameter to allow ssh checking
to consider non-default values (ie: a different port)

Patch by Jay Taylor
2013-12-19 01:22:55 -05:00
Jaime Casanova
a1f4285e2b Add guc_setted_typed() function to allow
wal_keep_segmeents to be checked as an integer instead
of text

Patch by Jay Taylor
2013-12-19 01:22:42 -05:00
Jaime Casanova
493133986d Add timestamps to log line in stderr
Patch by Christian Kruse
2013-12-19 01:15:28 -05:00
Jaime Casanova
8b370dc581 Fix some typos
Patch by Krzysztof Gajdemski
2013-12-07 13:25:46 -05:00
Jaime Casanova
43af00aa12 Ignore pg_log when cloning, just like we ignore pg_xlog 2013-12-04 01:23:48 -05:00
Jaime Casanova
3c8df59eb9 Make repmgr compile in 9.3.
Patch provided by Shawn Ellis with some fixes by me.
2013-11-14 00:43:35 -05:00
Jaime Casanova
b410772627 Rework algorithm to coordinate voting
Make this by waiting for all nodes to finish a step, before starting
a new one. So everyone starts promoting or following in a coordinated
fashion.
Also make a few fixes.
2013-09-26 13:24:31 -05:00
Jaime Casanova
d99024ba11 Make repmgrd survive to the failover
To do this it needs to reconnect to the new master
2013-09-26 11:58:59 -05:00
Jaime Casanova
1afaa3a26f Rearrange the logic in do_failover() for further improvements.
Specially, make this a more coordinated process by making all
nodes waiting for the others before going to the next step.

This is one step further in following Andres Freund advices
but there is still a lot to do in order to complete that,
specially it could be needed to add more fields to repl_nodes
and to the shm area.
2013-09-23 18:28:58 -05:00
Jaime Casanova
3b66a31ac9 In a failover situation get the nodes in a well defined order.
When deciding which node will be the new master, we should get the
nodes in a well defined order otherwise two standbys could process
nodes with the same priority in different order and end up with
a two master situation.
2013-07-26 00:52:31 -05:00
Jaime Casanova
ad3630e7a9 Add a missing ')'. This is a typo introduced in commit
2bc8044fda

Per complaint from Carlos Chapi when compiling for a customer.
2013-07-13 12:37:15 -05:00
Jaime Casanova
2e7acf03c4 If PQgetCancel() returns NULL we should also return false.
Noted by Andres Freund.
2013-07-12 08:01:01 -05:00
Jaime Casanova
2bc8044fda Improve messages in wait_connection_availability, so we know what
error makes the failover procedure to start

By gripe from Andres Freund
2013-07-10 19:25:58 -05:00
Jaime Casanova
b0b44a157f If PQcancel() fails, consider it as if the master is failing.
Because PQcancel() establish a new synchronous connection to the
database, if it fails it means something wrong has happenned with
master. So instead of just ignore the failure, CancelQuery() now
reports a failure condition so we can detect master's death in
that situation.

This is very important specially when only postmaster crashes but
other children/backend connections are still there. Because the
children connection won't fail and CancelQuery() failure is our
only indication of something wrong happenning.
Currently we just ignore the PQcancel() failure which leads us to
a situation in which we just loop forever
trying to cancel the async query.

Reported by: Martin Euser <martin.euser@nl.abnamro.com>
Problem analyzed and bug spotted by: Andres Freund <andres@2ndquadrant.com>
Patch by: Jaime Casanova <jaime@2ndquadrant.com>
2013-07-10 09:53:45 -05:00
Jaime Casanova
49a2531930 Options -F -W -I -v doesn't accept arguments, which means that on
getopt_long shouldn't be marked with the colon (:) character.

This has been wrong since day one, so backpatching all the way until
1.1
2013-01-13 16:37:39 -05:00
Jaime Casanova
4191b77e70 If the node is a witness don't bother asking its position, it always
will be 0/0. We just need to check that we can connect to it to determine
if we are in the majority.
2013-01-11 03:42:08 -05:00
Jaime Casanova
2a5d431481 Fix a problem that caused a standby to promote itself without going to
voting procedure.

This is because of a race condition inside CheckPrimaryConnection().

This has independently reported by Alex Railean and Dumitru, and Frank Jördens.
Analyzed and fixed by Cédric Villemain.

The fix have been verified to work by Frank
2012-12-19 12:01:27 -05:00
Jaime Casanova
93a999adc7 Formatting code using astyle 2012-12-11 11:49:07 -05:00
Jaime Casanova
088ca29fe3 To select new master it needs to know which standby has received more
xlog records from master, so it standby should use pg_last_xlog_receive_location()
to report their positions. This solves a possible situation in which
a standby that is considered as new master when promoted is no longer
the best option.
2012-12-03 09:18:08 -05:00
Jaime Casanova
30e9d06172 Add an option for STANDBY FOLLOW to wait for a master to appear.
This is important for autofailover to do the right thing when
standbys detected master death at different times.

While this is a new option, seems important for the autofailover
to work properly so i will consider the lack of it a bug and
will backpatch to 2.0 where autofailover was introduced.

For gripe from Alex Railean, about a standby not finding the new
master because the new master hasn't finish promoting.
2012-11-14 15:09:26 -05:00
Jaime Casanova
cd1a84252e Fix node decision logic when priorities are involved. Currently if
two nodes with different prorities are equally good to be promoted
the second one (with a lower priority, considering them
in descending order) will win.

Per report from Brailean Dumitru
2012-09-16 02:47:02 -05:00
Jaime Casanova
2e19b3688b Add a comment 2012-09-16 02:26:18 -05:00
Jaime Casanova
de883a4c84 Keep compiler quiet. Noted when compiling in FreeBSD in which i
get a warning for an uninitialized variable.

Also, define InvalidXLogRecPtr. We don't really need it but using
it make the initialization future proof (considering that in 9.3
XLogRecPtr will change its structure).
2012-09-16 02:21:18 -05:00
Jaime Casanova
499a501afd Make repmgr compatible with FreeBSD.
We need to add an #include and make it use a different path for the
"true" binary.

Maybe we need to make this changes for all BSD systems but having no
evidence of that i prefer to make this only for systems with __FreeBSD__
2012-09-15 17:37:59 -05:00
Jaime Casanova
0a9107d76d Improve sample of commands for promote and follow 2012-09-15 17:37:43 -05:00
Jaime Casanova
95ec0450da When we have more command-line arguments than we should have we
need to show that last value and we should use only optind for that
instead of optind+1
2012-08-30 02:11:48 -05:00
Jaime Casanova
57aa95f674 Fix documentation to always use -h sintax to refer to the node we
want to clone or connect to, instead of relying on the fact that
for some time putting that argument at last worked.
2012-08-30 02:10:10 -05:00
Jaime Casanova
56d2ae4e81 Fix HISTORY to show from newest to oldest v2.0beta1 2012-07-27 11:26:18 -05:00
Jaime Casanova
3edd87a041 Fix tabs in HISTORY 2012-07-27 11:20:56 -05:00
Jaime Casanova
740208da1c Fix typos in RELEASE NOTES 2012-07-27 11:15:50 -05:00
Jaime Casanova
664e1a8321 Now that we can have no monitoring we need to check all nodes at failover
not only those in repl_monitor
2012-07-21 17:49:38 -05:00