Commit Graph

42 Commits

Author SHA1 Message Date
Ian Barwick
31b856dd9f Add "witness register" functionality 2017-11-15 14:03:54 +09:00
Ian Barwick
e16eb42693 repmgrd: detect role change from primary to standby
If repmgrd is monitoring a primary which is taken off-line, then later
restored as a standby, detect this change and resume monitoring
in standby node.

Addresses GitHub #338.
2017-11-15 14:03:26 +09:00
Ian Barwick
cbc97d84ac repmgrd: updates related to node_id handling 2017-11-15 14:03:15 +09:00
Ian Barwick
96fe7dd2d6 repmgrd: catch corner cases where monitoring data is not available 2017-11-15 14:03:12 +09:00
Ian Barwick
13935a88c9 repmgrd: ensure shmem is reinitialised after a restart 2017-11-09 19:51:31 +09:00
Ian Barwick
5275890467 repmgrd: misc fixes 2017-11-09 19:51:26 +09:00
Ian Barwick
7f865fdaf3 repmgrd: fix priority/node_id tie-break check 2017-11-09 19:51:22 +09:00
Ian Barwick
a3428e4d8a repmgrd: simplify the candidate selection logic
All disconnected nodes will be in a static, known state, so as long as
each node has the same meta-information (repmgr.nodes) and is able
to retrieve the last receive LSN of the other nodes, it is possible
for each node to independently determine the best promotion candidate,
thereby reaching consensus without an explicit "voting" process.
2017-11-09 19:51:13 +09:00
Ian Barwick
03b9475755 repmgrd: fixes to failover handling
get_new_primary() returns NULL if no notification for the new primary has
been received, but the code was expecting it to return UNKNOWN_NODE_ID,
which was causing repmgrd to prematurely drop out of the new primary
detection loop if no notification had been received by the time the loop
started.

Also store the electoral term as a single row, single column table,
to ensure that all repmgrds see the same turn. It is then bumped
by the winning node after it gets promoted.

Various logging improvements.
2017-11-09 19:51:09 +09:00
Ian Barwick
d6c27f8938 Standardize quoting in log messages 2017-10-04 09:34:59 +09:00
Ian Barwick
a9f4a027a7 pgindent run 2017-09-11 11:14:13 +09:00
Ian Barwick
3447257ae4 repmgrd: minor fixes and comment updates 2017-09-08 20:59:21 +09:00
Ian Barwick
e4f7dc8234 Add copyright notices 2017-09-08 13:27:39 +09:00
Ian Barwick
1ef00f5a3b repmgrd: parse "follow_command" during cascaded standby failover 2017-09-05 11:19:25 +09:00
Ian Barwick
78e6bdeebe Have repmgrd parse "standby follow --upstream-node-id=%n" 2017-09-04 13:42:50 +09:00
Ian Barwick
ab6702891a Minor fixes to cascading standby failover. 2017-09-01 13:09:17 +09:00
Ian Barwick
154c76e5e7 repmgrd: improve cascaded standby failover
Check primary is available.
2017-08-29 15:29:17 +09:00
Ian Barwick
e0888c1f62 repmgrd: handle SIGHUP 2017-08-29 12:55:13 +09:00
Ian Barwick
df827c6518 Update repmgrd documentation 2017-08-29 11:04:30 +09:00
Ian Barwick
4a11551c2f repmgrd: handle local node failure 2017-08-28 10:31:43 +09:00
Ian Barwick
fcd111ac4c Improve logging output during failover process 2017-08-24 22:44:03 +09:00
Ian Barwick
db157ad9bc Update README 2017-08-24 17:43:01 +09:00
Ian Barwick
eee8d65259 Update view "replication_status" 2017-08-24 15:05:13 +09:00
Ian Barwick
a659132ea4 repmgrd: write monitoring statistics 2017-08-24 11:49:44 +09:00
Ian Barwick
8dfb7bbc7d repmgrd: handle promotion failure properly 2017-08-23 21:44:18 +09:00
Ian Barwick
6259463007 repmgrd: various fixes for "manual" failover mode 2017-08-23 10:56:55 +09:00
Ian Barwick
791640e3b4 repmgrd: never execute "service_promote_command" directly 2017-08-02 12:09:25 +09:00
Ian Barwick
7cf3b9b618 repmgrd: improve logging of BDR monitoring
Also always log information about event_notification command
2017-07-27 21:12:41 +09:00
Ian Barwick
56b2e9bb84 Rename/add configuration file options
In previous versions of repmgr, some options had ambiguous meanings,
and/or were used for slightly different purposes. This way we end
up with a couple more options (most of which probably won't need
adjusting) but greater clarity and flexibility.

Removed:

  master_reponse_timeout:
    renamed to "async_query_timeout", as this was its main usage

  retry_promote_interval_secs:
    replaced by "primary_notification_timeout"

Added:
  async_query_timeout:
    timeout (in seconds) when executing asynchronous queries

  primary_notification_timeout:
    number of seconds to wait for notification from the new primary
    after a failover

  primary_follow_timeout:
    number of seconds to wait for the new primary to become available
    when executing "repmgr standby follow"
2017-07-25 11:13:32 +09:00
Ian Barwick
d3776ad13e repmgrd: consolidate some code 2017-07-19 15:28:25 +09:00
Ian Barwick
a7b7d86ecc repmgrd: handle manual failover mode correctly 2017-07-19 14:01:01 +09:00
Ian Barwick
23e6440dfd repmgrd: initiate primary monitoring when local node is promoted manually 2017-07-19 11:15:38 +09:00
Ian Barwick
9558d0d3b8 repmgrd: prevent promotion of sole candidate if priority set to zero 2017-07-19 09:38:32 +09:00
Ian Barwick
6e270b2faf repmgrd: catch cases where more than one node has initiated voting
The node(s) with higher ID will "yield", leaving the decision making
up to the node with the lower ID.

This happens very rarely, usually when the random delay is close
enough on two or mode nodes that vote initiation is simultaneous.
2017-07-18 17:04:24 +09:00
Ian Barwick
48a0aa3bf7 repmgrd: improve failover handling
Make retry frequency/interval configurable as per streaming replication.
2017-07-17 14:56:52 +09:00
Ian Barwick
a56bb41891 Remove redundant fields from node record struct 2017-07-17 14:11:14 +09:00
Ian Barwick
0dcd479322 Store node status in node record struct 2017-07-17 13:50:17 +09:00
Ian Barwick
46acf75286 Fix usage of get_primary_node_record() 2017-07-17 12:12:59 +09:00
Ian Barwick
ec00202a31 Add configure option --with-bdr-only
Builds repmgr with only BDR functionality; other code is disabled
at critical points.
2017-07-16 17:18:34 +09:00
Ian Barwick
951c7dbd07 repmgrd: in BDR mode, have each repmgrd monitor each node
This will cover both the case when an entire node including
repmgrd goes down, and when one PostgreSQL instance goes down
but repmgrd is still up (in which case only one of the repmgrds
will handle the failover).
2017-07-14 15:01:18 +09:00
Ian Barwick
e3b3fb65f0 repmgrd: restrict BDR monitoring to two node setup
It's not safe to have more than two nodes with this kind of
"failover", so we don't need to select alternative nodes by
priority.
2017-07-14 12:56:11 +09:00
Ian Barwick
d77e8d4d22 repmgrd: split physical and BDR functionality into separate files 2017-07-13 17:21:29 +09:00