Commit Graph

52 Commits

Author SHA1 Message Date
Ian Barwick
ec068e38a2 Remove --bdr-only configuration option
This was required for a specific use case during pre-release
development and is no longer needed now that physical streaming
replication handling is implemented.
2018-01-25 10:48:09 +09:00
Ian Barwick
e64d965c6a repmgrd: document standby_[failure|recovery] event notifications
Also clean up the relevant code section.

Addresses GitHub #359.
2018-01-04 09:33:37 +09:00
Ian Barwick
26a9e848fd Update copyright notices to 2018 2018-01-02 10:19:46 +09:00
Ian Barwick
8c422d6084 Remove unneeded functions 2017-11-20 15:18:21 +09:00
Ian Barwick
08b443dce0 repmgrd: re-enable monitoring data recording when in archive recovery.
The warning emitted gives the impression that monitoring data shouldn't
be written if there's no streaming replication, but we can and should
do this as long as we have a primary connection.

Explicitly document this in the code.

Also remove an unused variable warning.
2017-11-16 17:17:17 +09:00
Ian Barwick
9d432546bf repmgrd: don't fail over unless more than 50% of active nodes are visible. 2017-11-15 13:48:28 +09:00
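
The "more than 50% of active nodes" rule in the commit above can be illustrated with a small sketch. This is not repmgrd's code: the function name and arguments are hypothetical, and it assumes only that the count of visible nodes is compared against the count of active nodes with a strict majority required.

```python
def has_visible_majority(visible_nodes: int, active_nodes: int) -> bool:
    """Return True only if strictly more than 50% of active nodes are visible.

    Hypothetical helper for illustration; not taken from repmgrd.
    """
    if active_nodes <= 0:
        # With no active nodes there is nothing to form a majority from.
        return False
    # Integer comparison avoids floating-point rounding at exactly 50%.
    return visible_nodes * 2 > active_nodes
```

Under this rule an exact half is not sufficient: with four active nodes, two visible nodes would not permit a failover, while three would.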
Ian Barwick
3c557ebd8e repmgrd: finalize witness failover handling 2017-11-15 13:48:25 +09:00
Ian Barwick
4efeb52cba repmgrd: synchronise repmgr.nodes table on witness server 2017-11-15 13:48:21 +09:00
Ian Barwick
60422c66f9 repmgrd: handle witness server 2017-11-15 13:48:17 +09:00
Ian Barwick
a31980b590 repmgrd: basic witness node monitoring 2017-11-15 13:48:11 +09:00
Ian Barwick
a6cc4d80f0 Add "witness register" functionality 2017-11-15 13:47:45 +09:00
Ian Barwick
9908a9c662 repmgrd: detect role change from primary to standby
If repmgrd is monitoring a primary which is taken off-line, then later
restored as a standby, detect this change and resume monitoring
in standby mode.

Addresses GitHub #338.
2017-11-10 17:19:30 +09:00
Ian Barwick
0230bafae1 repmgrd: updates related to node_id handling 2017-11-10 12:07:31 +09:00
Ian Barwick
de577adc67 repmgrd: catch corner cases where monitoring data is not available 2017-11-09 22:27:09 +09:00
Ian Barwick
fed17d49e3 repmgrd: ensure shmem is reinitialised after a restart 2017-11-09 19:31:21 +09:00
Ian Barwick
d80763f974 repmgrd: misc fixes 2017-11-09 19:31:16 +09:00
Ian Barwick
331e982bdb repmgrd: fix priority/node_id tie-break check 2017-11-09 19:31:12 +09:00
Ian Barwick
6ac6e0733a repmgrd: simplify the candidate selection logic
All disconnected nodes will be in a static, known state, so as long as
each node has the same meta-information (repmgr.nodes) and is able
to retrieve the last receive LSN of the other nodes, it is possible
for each node to independently determine the best promotion candidate,
thereby reaching consensus without an explicit "voting" process.
2017-11-09 19:31:04 +09:00
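
The idea in the commit above, that nodes holding identical metadata can each arrive at the same promotion candidate without voting, can be sketched as a deterministic selection function. The ordering used here (highest last receive LSN, then highest priority, then lowest node ID, with priority-zero nodes excluded) is an assumption pieced together from other commit subjects in this log, and all names are hypothetical; repmgrd's actual logic may differ.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class NodeInfo:
    """Hypothetical per-node record; not repmgr's actual struct."""
    node_id: int
    priority: int
    last_receive_lsn: str  # PostgreSQL LSN text form, e.g. "0/3000060"


def lsn_to_int(lsn: str) -> int:
    """Convert an LSN of the form 'X/Y' (two hex numbers) to an integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def best_candidate(nodes: Iterable[NodeInfo]) -> Optional[NodeInfo]:
    """Pick the promotion candidate deterministically.

    Assumed ordering: highest LSN, then highest priority, then lowest
    node_id. Nodes with priority 0 are never candidates.
    """
    eligible = [n for n in nodes if n.priority > 0]
    if not eligible:
        return None
    return min(
        eligible,
        key=lambda n: (-lsn_to_int(n.last_receive_lsn), -n.priority, n.node_id),
    )
```

Because the sort key is a total order over the shared data, every node that evaluates the same list gets the same answer regardless of the order in which it enumerates its peers.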
Ian Barwick
79d21b516b repmgrd: fixes to failover handling
get_new_primary() returns NULL if no notification for the new primary has
been received, but the code was expecting it to return UNKNOWN_NODE_ID,
which was causing repmgrd to prematurely drop out of the new primary
detection loop if no notification had been received by the time the loop
started.

Also store the electoral term as a single row, single column table,
to ensure that all repmgrds see the same term. It is then bumped
by the winning node after it gets promoted.

Various logging improvements.
2017-11-08 14:28:08 +09:00
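
The sentinel mismatch described in the commit above can be illustrated with a short polling sketch. It is not the repmgrd implementation: the function and its parameters are hypothetical, Python's None stands in for the NULL return, and the sleep between polls that a real loop would need is omitted.

```python
from typing import Callable, Optional


def wait_for_new_primary(
    get_notification: Callable[[], Optional[int]],
    max_polls: int,
) -> Optional[int]:
    """Poll until a new-primary notification arrives or polls run out.

    Hypothetical sketch: get_notification returns None while no
    notification has been received, otherwise the new primary's node ID.
    """
    for _ in range(max_polls):
        node_id = get_notification()
        if node_id is not None:
            return node_id
        # None means "nothing received yet", so keep polling rather
        # than leaving the loop early.
    return None
```

The point of the sketch is that the loop tests for the value the lookup actually returns when nothing has arrived; testing for a different sentinel would end the loop on the first empty poll.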
Ian Barwick
d6c27f8938 Standardize quoting in log messages 2017-10-04 09:34:59 +09:00
Ian Barwick
a9f4a027a7 pgindent run 2017-09-11 11:14:13 +09:00
Ian Barwick
3447257ae4 repmgrd: minor fixes and comment updates 2017-09-08 20:59:21 +09:00
Ian Barwick
e4f7dc8234 Add copyright notices 2017-09-08 13:27:39 +09:00
Ian Barwick
1ef00f5a3b repmgrd: parse "follow_command" during cascaded standby failover 2017-09-05 11:19:25 +09:00
Ian Barwick
78e6bdeebe Have repmgrd parse "standby follow --upstream-node-id=%n" 2017-09-04 13:42:50 +09:00
Ian Barwick
ab6702891a Minor fixes to cascading standby failover. 2017-09-01 13:09:17 +09:00
Ian Barwick
154c76e5e7 repmgrd: improve cascaded standby failover
Check primary is available.
2017-08-29 15:29:17 +09:00
Ian Barwick
e0888c1f62 repmgrd: handle SIGHUP 2017-08-29 12:55:13 +09:00
Ian Barwick
df827c6518 Update repmgrd documentation 2017-08-29 11:04:30 +09:00
Ian Barwick
4a11551c2f repmgrd: handle local node failure 2017-08-28 10:31:43 +09:00
Ian Barwick
fcd111ac4c Improve logging output during failover process 2017-08-24 22:44:03 +09:00
Ian Barwick
db157ad9bc Update README 2017-08-24 17:43:01 +09:00
Ian Barwick
eee8d65259 Update view "replication_status" 2017-08-24 15:05:13 +09:00
Ian Barwick
a659132ea4 repmgrd: write monitoring statistics 2017-08-24 11:49:44 +09:00
Ian Barwick
8dfb7bbc7d repmgrd: handle promotion failure properly 2017-08-23 21:44:18 +09:00
Ian Barwick
6259463007 repmgrd: various fixes for "manual" failover mode 2017-08-23 10:56:55 +09:00
Ian Barwick
791640e3b4 repmgrd: never execute "service_promote_command" directly 2017-08-02 12:09:25 +09:00
Ian Barwick
7cf3b9b618 repmgrd: improve logging of BDR monitoring
Also always log information about event_notification command
2017-07-27 21:12:41 +09:00
Ian Barwick
56b2e9bb84 Rename/add configuration file options
In previous versions of repmgr, some options had ambiguous meanings,
and/or were used for slightly different purposes. This way we end
up with a couple more options (most of which probably won't need
adjusting) but greater clarity and flexibility.

Removed:

  master_response_timeout:
    renamed to "async_query_timeout", as this was its main usage

  retry_promote_interval_secs:
    replaced by "primary_notification_timeout"

Added:
  async_query_timeout:
    timeout (in seconds) when executing asynchronous queries

  primary_notification_timeout:
    number of seconds to wait for notification from the new primary
    after a failover

  primary_follow_timeout:
    number of seconds to wait for the new primary to become available
    when executing "repmgr standby follow"
2017-07-25 11:13:32 +09:00
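
When updating a configuration file for the renames listed in the commit above, a small checker along these lines may help. It is a sketch under assumptions: it treats the file as simple "name=value" lines with "#" comments, the helper name is hypothetical, and it only reports the removed options without attempting to convert their values.

```python
# Removed option -> option named as its replacement in the commit message.
REPLACEMENTS = {
    "master_response_timeout": "async_query_timeout",
    "retry_promote_interval_secs": "primary_notification_timeout",
}


def find_removed_options(conf_text: str) -> dict:
    """Return {removed_option: replacement} for options set in conf_text.

    Hypothetical helper; assumes "name=value" lines and "#" comments.
    """
    found = {}
    for raw_line in conf_text.splitlines():
        # Drop comments; only the option name is needed here.
        line = raw_line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        if name in REPLACEMENTS:
            found[name] = REPLACEMENTS[name]
    return found
```

Whether an old value carries over unchanged to the replacement option is not stated in the commit message, so the values should be reviewed by hand.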
Ian Barwick
d3776ad13e repmgrd: consolidate some code 2017-07-19 15:28:25 +09:00
Ian Barwick
a7b7d86ecc repmgrd: handle manual failover mode correctly 2017-07-19 14:01:01 +09:00
Ian Barwick
23e6440dfd repmgrd: initiate primary monitoring when local node is promoted manually 2017-07-19 11:15:38 +09:00
Ian Barwick
9558d0d3b8 repmgrd: prevent promotion of sole candidate if priority set to zero 2017-07-19 09:38:32 +09:00
Ian Barwick
6e270b2faf repmgrd: catch cases where more than one node has initiated voting
The node(s) with higher ID will "yield", leaving the decision making
up to the node with the lower ID.

This happens very rarely, usually when the random delay is close
enough on two or more nodes that vote initiation is simultaneous.
2017-07-18 17:04:24 +09:00
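
The yield rule in the commit above can be sketched as follows. The function name and inputs are hypothetical, and the sketch assumes each node knows the IDs of all nodes that have initiated voting; it is an illustration, not the repmgrd code.

```python
from typing import Iterable


def should_yield(local_node_id: int, initiating_node_ids: Iterable[int]) -> bool:
    """Return True if another initiating node has a lower ID than ours.

    Hypothetical helper: the node with the lowest ID keeps the decision,
    every other initiating node yields.
    """
    return any(
        other_id < local_node_id
        for other_id in initiating_node_ids
        if other_id != local_node_id
    )
```

If nodes 2 and 3 both initiate voting, node 3 yields and node 2 proceeds, so exactly one node is left making the decision.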
Ian Barwick
48a0aa3bf7 repmgrd: improve failover handling
Make retry frequency/interval configurable as per streaming replication.
2017-07-17 14:56:52 +09:00
Ian Barwick
a56bb41891 Remove redundant fields from node record struct 2017-07-17 14:11:14 +09:00
Ian Barwick
0dcd479322 Store node status in node record struct 2017-07-17 13:50:17 +09:00
Ian Barwick
46acf75286 Fix usage of get_primary_node_record() 2017-07-17 12:12:59 +09:00
Ian Barwick
ec00202a31 Add configure option --with-bdr-only
Builds repmgr with only BDR functionality; other code is disabled
at critical points.
2017-07-16 17:18:34 +09:00
Ian Barwick
951c7dbd07 repmgrd: in BDR mode, have each repmgrd monitor each node
This will cover both the case when an entire node including
repmgrd goes down, and when one PostgreSQL instance goes down
but repmgrd is still up (in which case only one of the repmgrds
will handle the failover).
2017-07-14 15:01:18 +09:00