Commit Graph

46 Commits

Author SHA1 Message Date
Ian Barwick
56d9f5b856 Ensure witness node sets last upstream seen time 2019-03-14 10:53:47 +09:00
Ian Barwick
c3c58df7b9 repmgrd: improve logging output when executing "failover_validate_command" 2019-03-13 21:07:26 +09:00
Ian Barwick
1615353f48 repmgrd: optionally disconnect WAL receivers during failover
This is intended to ensure that all nodes have a constant LSN while
making the failover decision.

This feature is experimental and needs to be explicitly enabled with the
configuration file option "standby_disconnect_on_failover".

Note that enabling this option will result in a delay in the failover decision
until the WAL receiver is disconnected on all nodes.
2019-03-06 15:53:57 +09:00
Ian Barwick
4b89cbd98d Rename "..._primary_last_seen" functions to "..._upstream_last_seen"
As that better reflects what they do.
2019-02-28 15:36:55 +09:00
Ian Barwick
067ed82931 Remove unneeded debugging output 2019-02-26 21:16:11 +09:00
Ian Barwick
7dce3ed234 Update copyright notices to 2019 2019-01-21 14:54:35 +09:00
Ian Barwick
784c9c4793 repmgrd: return predictable default values for get_primary_last_seen()
Return 0 if the node is not in recovery, in which case it's probably
rather pointless calling this function anyway.

Return -1 if the "last_seen" field has never been set (i.e. repmgrd
hasn't started yet).
2018-11-21 11:30:32 +09:00
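
The default values described in 784c9c4793 can be illustrated with a minimal sketch. This is not the repmgr implementation (which is C code inside a PostgreSQL extension). The function name `upstream_last_seen`, its parameters, and the assumption that the normal case reports elapsed seconds are all hypothetical, chosen only to show the two documented defaults.

```python
from typing import Optional


def upstream_last_seen(in_recovery: bool,
                       last_seen_ts: Optional[float],
                       now: float) -> int:
    """Hypothetical model of the documented return values.

    in_recovery  -- whether the node is a standby (in recovery)
    last_seen_ts -- timestamp when the upstream was last seen, or None
                    if the "last_seen" field has never been set
    now          -- current timestamp, in the same units as last_seen_ts
    """
    if not in_recovery:
        # Documented default: a node that is not in recovery returns 0.
        return 0
    if last_seen_ts is None:
        # Documented default: -1 if "last_seen" was never set
        # (i.e. repmgrd hasn't started yet).
        return -1
    # Assumption: otherwise report whole seconds elapsed since last seen.
    return int(now - last_seen_ts)
```

A caller can then treat a negative value as "no information yet" rather than as a very recent sighting.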
Ian Barwick
1458f6e6aa add functions to determine when primary last seen by repmgrd node 2018-11-21 11:30:22 +09:00
Ian Barwick
a459c60145 Avoid defining variable-length arrays
As of PostgreSQL commit d9dd406f, variable length arrays are no longer
permitted. As they're not actually required anyway, just define appropriate
constants.

Also noted in GitHub #510.
2018-10-26 10:09:45 +09:00
Ian Barwick
fd66d93937 Fix LWLockRelease() call in unset_bdr_failover_handler() 2018-10-08 09:36:50 +09:00
Ian Barwick
2491b8ae52 Add functionality to "pause" repmgrd
In some circumstances, e.g. while performing a switchover, it is essential
that repmgrd does not take any kind of failover action, as this will put
the cluster into an incorrect state.

Previously it was necessary to stop repmgrd on all nodes (or at least
those nodes which repmgrd would consider as promotion candidates), however
this is a cumbersome and potentially risk-prone operation, particularly if the
replication cluster contains more than a couple of servers.

To prevent this issue from occurring, this patch introduces the ability
to "pause" repmgrd on all nodes with a single command ("repmgr daemon pause")
which notifies repmgrd not to take any failover action until the node
is "unpaused" ("repmgr daemon unpause").

"repmgr daemon status" provides an overview of each node and whether repmgrd
is running, and if so whether it is paused.

"repmgr standby switchover" has been modified to automatically pause repmgrd
while carrying out the switchover.

See documentation for further details.
2018-09-27 16:42:10 +09:00
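
The pause behaviour described in 2491b8ae52 can be modelled roughly as follows. This is a sketch under stated assumptions, not repmgr code: `DaemonState`, `may_fail_over` and `paused_cluster` are hypothetical names, and the real commands operate on running repmgrd processes rather than in-memory objects. It shows a paused daemon declining failover, and a switchover-style block that pauses every running daemon and afterwards resumes only those it paused itself.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Dict, Iterator


@dataclass
class DaemonState:
    """Hypothetical per-node view, loosely mirroring "repmgr daemon status"."""
    running: bool = True
    paused: bool = False

    def may_fail_over(self) -> bool:
        # A daemon takes failover action only if it is running and not paused.
        return self.running and not self.paused


@contextmanager
def paused_cluster(daemons: Dict[str, DaemonState]) -> Iterator[None]:
    """Pause all running, unpaused daemons for the duration of the block."""
    to_resume = [d for d in daemons.values() if d.running and not d.paused]
    for d in to_resume:
        d.paused = True
    try:
        yield
    finally:
        # Only unpause daemons this block paused; leave manual pauses alone.
        for d in to_resume:
            d.paused = False
```

Wrapping the switchover steps in `with paused_cluster(daemons):` keeps every daemon from acting until the block exits, including when the block raises an error.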
Ian Barwick
ec068e38a2 Remove --bdr-only configuration option
This was required for a specific use case during pre-release
development and is no longer needed now the physical streaming
replication handling is implemented.
2018-01-25 10:48:09 +09:00
Ian Barwick
26a9e848fd Update copyright notices to 2018 2018-01-02 10:19:46 +09:00
Ian Barwick
8c422d6084 Remove unneeded functions 2017-11-20 15:18:21 +09:00
Ian Barwick
aa089820ab repmgrd: check shared library is loaded
If this isn't the case, "repmgrd" will appear to run but not handle
failover correctly.

Address GitHub #337.
2017-11-10 14:35:17 +09:00
Ian Barwick
0230bafae1 repmgrd: updates related to node_id handling 2017-11-10 12:07:31 +09:00
Ian Barwick
4ca7e6a6bf repmgrd: remove unneeded functions 2017-11-09 19:31:08 +09:00
Ian Barwick
79d21b516b repmgrd: fixes to failover handling
get_new_primary() returns NULL if no notification for the new primary has
been received, but the code was expecting it to return UNKNOWN_NODE_ID,
which was causing repmgrd to prematurely drop out of the new primary
detection loop if no notification had been received by the time the loop
started.

Also store the electoral term as a single row, single column table,
to ensure that all repmgrds see the same term. It is then bumped
by the winning node after it gets promoted.

Various logging improvements.
2017-11-08 14:28:08 +09:00
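
The first fix in 79d21b516b concerns a mismatch between what a function returns for "no answer yet" and what the caller checks for. A minimal sketch of the corrected loop shape, assuming a hypothetical `poll` callable that stands in for get_new_primary() and an illustrative sentinel value for `UNKNOWN_NODE_ID` (the real constant's value is not given in the commit message):

```python
from typing import Callable, Optional

UNKNOWN_NODE_ID = -1  # illustrative value only, not taken from repmgr


def wait_for_new_primary(poll: Callable[[], Optional[int]],
                         max_polls: int) -> int:
    """Poll until a new primary is announced or attempts run out.

    poll returns None while no notification has been received, which
    corresponds to the NULL result described in the commit message.
    """
    for _ in range(max_polls):
        node_id = poll()
        if node_id is None:
            # No notification yet: keep waiting instead of leaving the loop.
            continue
        return node_id
    # Nothing arrived within the allowed attempts.
    return UNKNOWN_NODE_ID
```

A real loop would also sleep between polls; that is omitted here to keep the sketch short.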
Ian Barwick
7232187f4d Ensure shared memory functions handle NULL parameters correctly 2017-11-08 12:19:07 +09:00
Ian Barwick
4ef2b111da Fix lock acquisition in shared memory functions 2017-11-08 11:55:08 +09:00
Ian Barwick
23c011fe5e Update regression tests 2017-10-04 09:35:21 +09:00
Ian Barwick
7c3f6a00bd Add some sanity checks for calls to repmgrd functions 2017-10-04 09:35:13 +09:00
Ian Barwick
750a776f1d Fixes for PostgreSQL 9.3 support 2017-09-18 11:00:39 +09:00
Ian Barwick
31c7cb4e9a Fixes for 9.3 support 2017-09-15 17:13:17 +09:00
Ian Barwick
687c8b4e27 Initial changes for 9.3 support 2017-09-15 10:27:37 +09:00
Ian Barwick
a9f4a027a7 pgindent run 2017-09-11 11:14:13 +09:00
Ian Barwick
e4f7dc8234 Add copyright notices 2017-09-08 13:27:39 +09:00
Ian Barwick
fcd111ac4c Improve logging output during failover process 2017-08-24 22:44:03 +09:00
Ian Barwick
eee8d65259 Update view "replication_status" 2017-08-24 15:05:13 +09:00
Ian Barwick
6e270b2faf repmgrd: catch cases where more than one node has initiated voting
The node(s) with higher ID will "yield", leaving the decision making
up to the node with the lower ID.

This happens very rarely, usually when the random delay is close
enough on two or more nodes that vote initiation is simultaneous.
2017-07-18 17:04:24 +09:00
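
The tie-break rule in 6e270b2faf (higher IDs yield to the lowest ID) can be sketched as below. The function names are hypothetical, and the sketch assumes each initiating node already knows which other nodes have initiated voting; how repmgrd discovers that is not covered by the commit message.

```python
from typing import Iterable


def should_yield(own_id: int, other_initiators: Iterable[int]) -> bool:
    """True if another initiating node has a lower ID than this node."""
    return any(other < own_id for other in other_initiators)


def deciding_node(initiators: Iterable[int]) -> int:
    """The node left to make the decision: the lowest initiating ID."""
    return min(initiators)
```

With nodes 2 and 3 initiating at the same moment, node 3 yields and node 2 carries on with the decision.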
Ian Barwick
437cb26b7e Fixes to function request_vote() 2017-07-17 12:04:56 +09:00
Ian Barwick
084e0429fc Disable non-BDR functions for BDR-only builds 2017-07-17 08:44:49 +09:00
Ian Barwick
951c7dbd07 repmgrd: in BDR mode, have each repmgrd monitor each node
This will cover both the case when an entire node including
repmgrd goes down, and when one PostgreSQL instance goes down
but repmgrd is still up (in which case only one of the repmgrds
will handle the failover).
2017-07-14 15:01:18 +09:00
Ian Barwick
618a2346e1 repmgrd: various fixes, mainly clearing status after a failover event 2017-07-04 11:55:03 +09:00
Ian Barwick
890b88d644 More failover fixes 2017-07-03 17:37:32 +09:00
Ian Barwick
debe5a18c5 have new primary communicate to standbys 2017-06-30 21:45:25 +09:00
Ian Barwick
a666a49977 Execute promote command 2017-06-30 16:04:47 +09:00
Ian Barwick
9caa715eb0 minor fixes 2017-06-30 14:30:41 +09:00
Ian Barwick
fc4f276844 Improve handling
not sure if we need to store the electoral term...
2017-06-30 13:40:19 +09:00
Ian Barwick
3514e20367 poke it around until it works less badly 2017-06-29 09:35:09 +09:00
Ian Barwick
fa86fe4ad8 Basic voting 2017-06-29 01:11:21 +09:00
Ian Barwick
d6b6255144 interim commit 2017-06-28 18:20:03 +09:00
Ian Barwick
f4e8bf891d interim commit 2017-06-28 17:28:26 +09:00
Ian Barwick
ded8d95e5a interim commit 2017-06-28 16:38:41 +09:00
Ian Barwick
35b6178e07 placeholder code for function 2017-06-27 09:50:47 +09:00
Ian Barwick
e6237cc81a Makefiles and placeholder code 2017-04-18 11:26:51 +09:00