repmgr

mirror of https://github.com/EnterpriseDB/repmgr.git synced 2026-03-22 22:56:29 +00:00

Author	SHA1	Message	Date
Ian Barwick	5e4bdb5a1b	repmgrd: handle failover with two nodes in the primary location If two nodes were in the primary location, and at least one node in another location, the non-failed node in the primary location was not recognising itself as a promotion candidate. Addresses GitHub #407.	2018-04-02 20:51:27 +09:00
Ian Barwick	a403da67bc	Consolidate connection closure calls	2018-03-27 16:43:59 +09:00
Ian Barwick	0e55a60660	Add event "repmgrd_failover_aborted"	2018-03-21 13:23:06 +09:00
Ian Barwick	81c69e3677	repmgrd: fix typo	2018-03-21 12:36:15 +09:00
Ian Barwick	2a99dfa15b	repmgrd: fix failover handling in "manual" mode Regression was introduced in commit `c7a585c555`	2018-03-07 19:21:40 +09:00
Ian Barwick	cdb504d700	Add event "repmgrd_shutdown" Implements GitHub #393	2018-03-06 11:00:03 +09:00
Ian Barwick	0af2077bed	repmgrd: add debug log output for "monitor_interval_secs" sleep in all modes	2018-03-06 10:56:21 +09:00
Ian Barwick	bc766a48ed	repmgrd: retry standby connection after cascading standby failover	2018-03-02 11:05:07 +09:00
Ian Barwick	55441f2729	repmgrd: add configuration file parameter "standby_reconnect_timeout" This is used for determining a timeout when reconnecting to the standby after executing the "follow_command". This will normally not need to be set explicitly, but may be useful in cases where the standby's startup phase can last longer than usual.	2018-03-02 11:04:56 +09:00
Ian Barwick	c1356b9e0d	repmgrd: retry standby connection after "follow_command" executed It's possible that the standby is still starting up after the "follow_command" completes, so poll for a while until we get a connection.	2018-03-02 11:04:19 +09:00
Ian Barwick	22b3a74fa0	repmgrd: improve detection of status change from primary to standby If repmgrd is running in degraded mode on a primary which has been stopped, then manually brought back online as a standby (e.g. by creating recovery.conf and starting the server), ensure it not only detects the change but automatically updates the node record so it can resume monitoring the node as a standby. Previously, repmgrd was looping waiting for the record to be updated (as is done transparently when executing "repmgr node rejoin") but if the record was not updated within the timeout period (e.g. by "repmgr standby register") it would fail to resume monitoring as a standby. It seems reasonable to have repmgrd automatically update the node record, as this will restore failover capability as quickly as possible. If this is not desired, then the onus is on the user to shut down repmgrd while making the desired changes.	2018-02-22 15:50:45 +09:00
Ian Barwick	ec068e38a2	Remove --bdr-only configuration option This was required for a specific use case during pre-release development and is no longer needed now the physical streaming replication handling is implemented.	2018-01-25 10:48:09 +09:00
Ian Barwick	e64d965c6a	repmgrd: document standby_[failure|recovery] event notifications Also clean up the relevant code section. Addresses GitHub #359.	2018-01-04 09:33:37 +09:00
Ian Barwick	26a9e848fd	Update copyright notices to 2018	2018-01-02 10:19:46 +09:00
Ian Barwick	8c422d6084	Remove unneeded functions	2017-11-20 15:18:21 +09:00
Ian Barwick	08b443dce0	repmgrd: re-enable monitoring data recording when in archive recovery. The warning emitted gives the impression that monitoring data shouldn't be written if there's no streaming replication, but we can and should do this as long as we have a primary connection. Explicitly document this in the code. Also remove an unused variable warning.	2017-11-16 17:17:17 +09:00
Ian Barwick	9d432546bf	repmgrd: don't fail over unless more than 50% of active nodes are visible.	2017-11-15 13:48:28 +09:00
Ian Barwick	3c557ebd8e	repmgrd: finalize witness failover handling	2017-11-15 13:48:25 +09:00
Ian Barwick	4efeb52cba	repmgrd: synchronise repmgr.nodes table on witness server	2017-11-15 13:48:21 +09:00
Ian Barwick	60422c66f9	repmgrd: handle witness server	2017-11-15 13:48:17 +09:00
Ian Barwick	a31980b590	repmgrd: basic witness node monitoring	2017-11-15 13:48:11 +09:00
Ian Barwick	a6cc4d80f0	Add "witness register" functionality	2017-11-15 13:47:45 +09:00
Ian Barwick	9908a9c662	repmgrd: detect role change from primary to standby If repmgrd is monitoring a primary which is taken off-line, then later restored as a standby, detect this change and resume monitoring in standby mode. Addresses GitHub #338.	2017-11-10 17:19:30 +09:00
Ian Barwick	0230bafae1	repmgrd: updates related to node_id handling	2017-11-10 12:07:31 +09:00
Ian Barwick	de577adc67	repmgrd: catch corner cases where monitoring data is not available	2017-11-09 22:27:09 +09:00
Ian Barwick	fed17d49e3	repmgrd: ensure shmem is reinitialised after a restart	2017-11-09 19:31:21 +09:00
Ian Barwick	d80763f974	repmgrd: misc fixes	2017-11-09 19:31:16 +09:00
Ian Barwick	331e982bdb	repmgrd: fix priority/node_id tie-break check	2017-11-09 19:31:12 +09:00
Ian Barwick	6ac6e0733a	repmgrd: simplify the candidate selection logic All disconnected nodes will be in a static, known state, so as long as each node has the same meta-information (repmgr.nodes) and is able to retrieve the last receive LSN of the other nodes, it is possible for each node to independently determine the best promotion candidate, thereby reaching consensus without an explicit "voting" process.	2017-11-09 19:31:04 +09:00
Ian Barwick	79d21b516b	repmgrd: fixes to failover handling get_new_primary() returns NULL if no notification for the new primary has been received, but the code was expecting it to return UNKNOWN_NODE_ID, which was causing repmgrd to prematurely drop out of the new primary detection loop if no notification had been received by the time the loop started. Also store the electoral term as a single row, single column table, to ensure that all repmgrds see the same term. It is then bumped by the winning node after it gets promoted. Various logging improvements.	2017-11-08 14:28:08 +09:00
Ian Barwick	d6c27f8938	Standardize quoting in log messages	2017-10-04 09:34:59 +09:00
Ian Barwick	a9f4a027a7	pgindent run	2017-09-11 11:14:13 +09:00
Ian Barwick	3447257ae4	repmgrd: minor fixes and comment updates	2017-09-08 20:59:21 +09:00
Ian Barwick	e4f7dc8234	Add copyright notices	2017-09-08 13:27:39 +09:00
Ian Barwick	1ef00f5a3b	repmgrd: parse "follow_command" during cascaded standby failover	2017-09-05 11:19:25 +09:00
Ian Barwick	78e6bdeebe	Have repmgrd parse "standby follow --upstream-node-id=%n"	2017-09-04 13:42:50 +09:00
Ian Barwick	ab6702891a	Minor fixes to cascading standby failover.	2017-09-01 13:09:17 +09:00
Ian Barwick	154c76e5e7	repmgrd: improve cascaded standby failover Check primary is available.	2017-08-29 15:29:17 +09:00
Ian Barwick	e0888c1f62	repmgrd: handle SIGHUP	2017-08-29 12:55:13 +09:00
Ian Barwick	df827c6518	Update repmgrd documentation	2017-08-29 11:04:30 +09:00
Ian Barwick	4a11551c2f	repmgrd: handle local node failure	2017-08-28 10:31:43 +09:00
Ian Barwick	fcd111ac4c	Improve logging output during failover process	2017-08-24 22:44:03 +09:00
Ian Barwick	db157ad9bc	Update README	2017-08-24 17:43:01 +09:00
Ian Barwick	eee8d65259	Update view "replication_status"	2017-08-24 15:05:13 +09:00
Ian Barwick	a659132ea4	repmgrd: write monitoring statistics	2017-08-24 11:49:44 +09:00
Ian Barwick	8dfb7bbc7d	repmgrd: handle promotion failure properly	2017-08-23 21:44:18 +09:00
Ian Barwick	6259463007	repmgrd: various fixes for "manual" failover mode	2017-08-23 10:56:55 +09:00
Ian Barwick	791640e3b4	repmgrd: never execute "service_promote_command" directly	2017-08-02 12:09:25 +09:00
Ian Barwick	7cf3b9b618	repmgrd: improve logging of BDR monitoring Also always log information about event_notification command	2017-07-27 21:12:41 +09:00
Ian Barwick	56b2e9bb84	Rename/add configuration file options In previous versions of repmgr, some options had ambiguous meanings, and/or were used for slightly different purposes. This way we end up with a couple more options (most of which probably won't need adjusting) but greater clarity and flexibility. Removed: master_reponse_timeout: renamed to "async_query_timeout", as this was its main usage retry_promote_interval_secs: replaced by "primary_notification_timeout" Added: async_query_timeout: timeout (in seconds) when executing asynchronous queries primary_notification_timeout: number of seconds to wait for notification from the new primary after a failover primary_follow_timeout: number of seconds to wait for the new primary to become available when executing "repmgr standby follow"	2017-07-25 11:13:32 +09:00
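
Commits 6ac6e0733a and 9d432546bf above describe two rules: each node independently picks the best promotion candidate from shared metadata and last receive LSNs, and no failover happens unless more than 50% of active nodes are visible. The following is a minimal illustrative sketch of those two rules, not repmgrd's actual C implementation. The names `NodeState`, `majority_visible` and `select_candidate` are hypothetical, and the tie-break order used here (highest LSN, then highest priority, then lowest node_id) is an assumption based on the "priority/node_id tie-break" wording in commit 331e982bdb.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class NodeState:
    """Hypothetical per-node snapshot, as seen by one monitoring daemon."""
    node_id: int
    priority: int
    last_receive_lsn: int  # LSN expressed as a plain integer for comparison
    visible: bool          # True if this node can currently be reached


def majority_visible(nodes: Sequence[NodeState]) -> bool:
    """Return True only if strictly more than 50% of the nodes are visible."""
    visible = sum(1 for n in nodes if n.visible)
    return visible * 2 > len(nodes)


def select_candidate(nodes: Sequence[NodeState]) -> Optional[NodeState]:
    """Pick a promotion candidate deterministically, or None if no majority.

    Every daemon that sees the same input computes the same answer, so no
    explicit voting round is needed.
    """
    if not majority_visible(nodes):
        return None
    candidates = [n for n in nodes if n.visible]
    # Assumed ordering: highest LSN, then highest priority, then lowest node_id.
    return max(
        candidates,
        key=lambda n: (n.last_receive_lsn, n.priority, -n.node_id),
    )
```

Because the selection is a pure function of the shared node records, two daemons given identical inputs cannot disagree on the result, which is the property the commit message relies on.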

213 Commits