Ian Barwick
7cf3b9b618
repmgrd: improve logging of BDR monitoring
...
Also always log information about event_notification command
2017-07-27 21:12:41 +09:00
Ian Barwick
fed6fba4ef
repmgrd: more fixes for BDR node recovery
2017-07-27 14:13:39 +09:00
Ian Barwick
dc24d62009
repmgrd: improve BDR recovery handling
2017-07-27 11:53:55 +09:00
Ian Barwick
8a2e4db1bc
Add "repmgr node status"
...
Outputs an overview of a node's status, and emits warnings if any
issues are detected.
2017-07-25 00:39:04 +09:00
Ian Barwick
93c35618a2
Use bdr.bdr_is_active_in_db() when checking for BDR presence
2017-07-24 19:09:09 +09:00
Ian Barwick
e9cdf1c870
Add note
2017-07-20 23:57:28 +09:00
Ian Barwick
1a45287e76
Misc updates and fixes
2017-07-20 21:15:55 +09:00
Ian Barwick
b99443b0c8
Improvements to repmgr cluster show
...
Add documentation; show recovery status in --csv mode.
2017-07-20 10:25:13 +09:00
Ian Barwick
49ac9cf9ca
Add "repmgr cluster show"
2017-07-19 17:36:21 +09:00
Ian Barwick
a7b7d86ecc
repmgrd: handle manual failover mode correctly
2017-07-19 14:01:01 +09:00
Ian Barwick
23e6440dfd
repmgrd: initiate primary monitoring when local node is promoted manually
2017-07-19 11:15:38 +09:00
Ian Barwick
6e270b2faf
repmgrd: catch cases where more than one node has initiated voting
...
The node(s) with higher ID will "yield", leaving the decision making
up to the node with the lower ID.
This happens very rarely, usually when the random delay is close
enough on two or more nodes that vote initiation is simultaneous.
2017-07-18 17:04:24 +09:00
Ian Barwick
2c8dd49831
repmgrd: additional check to ensure only one node handles failover
...
It's possible the "failover" is completed by one repmgrd before the
other has a chance to react, in which case the am_bdr_failover_handler()
check will not apply. Instead check if the node record has already been
set to "inactive".
2017-07-17 16:47:42 +09:00
Ian Barwick
a56bb41891
Remove redundant fields from node record struct
2017-07-17 14:11:14 +09:00
Ian Barwick
ec554e5694
Improve connection handling
...
Set "connect_timeout" and "fallback_application_name" if not present.
2017-07-17 11:10:37 +09:00
Ian Barwick
951c7dbd07
repmgrd: in BDR mode, have each repmgrd monitor each node
...
This will cover both the case when an entire node including
repmgrd goes down, and when one PostgreSQL instance goes down
but repmgrd is still up (in which case only one of the repmgrds
will handle the failover).
2017-07-14 15:01:18 +09:00
Ian Barwick
e3b3fb65f0
repmgrd: restrict BDR monitoring to two node setup
...
It's not safe to have more than two nodes with this kind of
"failover", so we don't need to select alternative nodes by
priority.
2017-07-14 12:56:11 +09:00
Ian Barwick
d653888c65
Support pre-10 WAL functions
2017-07-14 10:40:11 +09:00
Ian Barwick
dfcf85a62f
repmgrd: further BDR sanity checks
2017-07-14 10:27:28 +09:00
Ian Barwick
0320f409aa
Detect BDR capability via presence of extension
2017-07-13 14:13:46 +09:00
Ian Barwick
7eadbf6b17
Various improvements to "repmgr bdr register/unregister"
2017-07-12 22:38:03 +09:00
Ian Barwick
0a1addfdc0
When registering a BDR node, sync repmgr.nodes from another node
...
If a BDR node is added via bdr_group_join(), repmgr.nodes will
start off empty, so we'll need to sync it ourselves before adding
it to the repmgr replication set.
2017-07-12 10:11:25 +09:00
Ian Barwick
1cccb1dd5a
Add "repmgr bdr unregister"
2017-07-12 10:11:21 +09:00
Ian Barwick
71a0871232
Add "repmgr bdr register"
2017-07-11 15:38:58 +09:00
Ian Barwick
2962ffe605
repmgrd: initial BDR monitoring support
2017-07-10 23:58:59 +09:00
Ian Barwick
dddea9814b
Add BDR-related database functions
2017-07-10 21:52:39 +09:00
Ian Barwick
5fbcf3e476
Remove witness server references
2017-07-10 09:31:31 +09:00
Ian Barwick
9e3d942917
Handle various (unlikely) failure states
2017-07-10 09:00:18 +09:00
Ian Barwick
5bf7098139
repmgrd: consolidate clear_node_info_list() calls
2017-07-09 11:10:49 +09:00
Ian Barwick
2787994a6e
Make repmgrd failover settings configurable
2017-07-07 21:11:22 +09:00
Ian Barwick
0d226867b4
Add "location" column
2017-07-06 01:17:00 +09:00
Ian Barwick
614287548d
Fix function get_primary_node_record()
2017-07-05 11:20:32 +09:00
Ian Barwick
617dee6bd6
Add function create_event_record()
...
For logging an event to the event table without generating an external
event notification.
Rename existing create_event_record*() functions to create_event_notification*()
as this describes their function better.
2017-07-05 09:52:22 +09:00
Ian Barwick
24c6b2c9f1
repmgrd: initial code for cascaded standby failover
2017-07-04 23:14:05 +09:00
Ian Barwick
618a2346e1
repmgrd: various fixes, mainly clearing status after a failover event
2017-07-04 11:55:03 +09:00
Ian Barwick
c12bf01b5a
When clearing a node info list, reset the node count to 0
2017-07-03 21:59:02 +09:00
Ian Barwick
890b88d644
More failover fixes
2017-07-03 17:37:32 +09:00
Ian Barwick
debe5a18c5
Have new primary communicate with standbys
2017-06-30 21:45:25 +09:00
Ian Barwick
fc4f276844
Improve handling
...
not sure if we need to store the electoral term...
2017-06-30 13:40:19 +09:00
Ian Barwick
3514e20367
poke it around until it works less badly
2017-06-29 09:35:09 +09:00
Ian Barwick
fa86fe4ad8
Basic voting
2017-06-29 01:11:21 +09:00
Ian Barwick
d6b6255144
interim commit
2017-06-28 18:20:03 +09:00
Ian Barwick
f4e8bf891d
interim commit
2017-06-28 17:28:26 +09:00
Ian Barwick
ded8d95e5a
interim commit
2017-06-28 16:38:41 +09:00
Ian Barwick
78a16d746d
Initial primary node monitoring
2017-06-27 00:15:29 +09:00
Ian Barwick
46c956e61a
Use "primary" instead of "master"
2017-06-23 21:33:54 +09:00
Ian Barwick
28808a02ab
Fix return value of _get_node_record()
2017-06-23 20:44:40 +09:00
Ian Barwick
1b2652037d
Rename enum types for consistency
2017-06-23 16:38:14 +09:00
Ian Barwick
dbaa2e0b44
Add a RecordStatus return type for functions which populate record structures
...
Unify a bunch of slightly different ways of handling the result.
2017-06-23 16:16:46 +09:00
Ian Barwick
6cdf73b4cb
repmgr standby promote: suppress master database connection error message
...
Otherwise the first line of output is an ERROR, which is confusing,
even though it's expected.
2017-06-21 13:21:44 +09:00