repmgr

mirror of https://github.com/EnterpriseDB/repmgr.git synced 2026-03-24 15:46:29 +00:00

Author	SHA1	Message	Date
Ian Barwick	9aea5b8aa7	repmgrd: fix failover handling in "manual" mode Regression was introduced in commit `c7a585c555`	2018-03-06 22:35:51 +09:00
Ian Barwick	9c72c0d66e	Add event "repmgrd_shutdown" Implements GitHub #393	2018-03-06 10:59:54 +09:00
Ian Barwick	5a52917421	repmgrd: add debug log output for "monitor_interval_secs" sleep in all modes	2018-03-05 14:23:58 +09:00
Ian Barwick	fe594c95ad	repmgrd: retry standby connection after cascading standby failover	2018-02-28 21:15:11 +09:00
Ian Barwick	60e63feaca	repmgrd: add configuration file parameter "standby_reconnect_timeout" This is used for determining a timeout when reconnecting to the standby after executing the "follow_command". This will normally not need to be set explicitly, but may be useful in cases where the standby's startup phase can last longer than usual.	2018-02-28 18:56:33 +09:00
Ian Barwick	5e8b41e221	repmgrd: retry standby connection after "follow_command" executed It's possible that the standby is still starting up after the "follow_command" completes, so poll for a while until we get a connection.	2018-02-28 15:35:47 +09:00
Ian Barwick	c7a585c555	repmgrd: improve log output - emit explicit startup NOTICE - emit NOTICE when falling back to degraded monitoring on a primary node - improve log message and event notification details when monitoring a former primary which has been reconnected as a standby	2018-02-28 12:35:13 +09:00
Ian Barwick	829cf5cca4	repmgrd: improve detection of status change from primary to standby If repmgrd is running in degraded mode on a primary which has been stopped, then manually brought back online as a standby (e.g. by creating recovery.conf and starting the server), ensure it not only detects the change but automatically updates the node record so it can resume monitoring the node as a standby. Previously, repmgrd was looping waiting for the record to be updated (as is done transparently when executing "repmgr node rejoin") but if the record was not updated within the timeout period (e.g. by "repmgr standby register") it would fail to resume monitoring as a standby. It seems reasonable to have repmgrd automatically update the node record, as this will restore failover capability as quickly as possible. If this is not desired, then the onus is on the user to shut down repmgrd while making the desired changes.	2018-02-22 11:35:47 +09:00
Ian Barwick	6dc1969ad5	Remove --bdr-only configuration option This was required for a specific use case during pre-release development and is no longer needed now the physical streaming replication handling is implemented.	2018-01-18 13:30:47 +09:00
Ian Barwick	486f8e5a2c	repmgrd: document standby_[failure|recovery] event notifications Also clean up the relevant code section. Addresses GitHub #359.	2018-01-04 09:34:49 +09:00
Ian Barwick	1521657965	Update copyright notices to 2018	2018-01-02 10:20:09 +09:00
Ian Barwick	f6a6df3600	repmgrd: re-enable monitoring data recording when in archive recovery. The warning emitted gives the impression that monitoring data shouldn't be written if there's no streaming replication, but we can and should do this as long as we have a primary connection. Explicitly document this in the code. Also remove an unused variable warning.	2017-11-20 15:29:21 +09:00
Ian Barwick	67e27f9ecd	Remove unneeded functions	2017-11-20 15:26:32 +09:00
Ian Barwick	53ebde8f33	repmgrd: don't fail over unless more than 50% of active nodes are visible.	2017-11-15 14:04:41 +09:00
Ian Barwick	5e9d50f8ca	repmgrd: finalize witness failover handling	2017-11-15 14:04:37 +09:00
Ian Barwick	347e753c27	repmgrd: synchronise repmgr.nodes table on witness server	2017-11-15 14:04:34 +09:00
Ian Barwick	2f978847b1	repmgrd: handle witness server	2017-11-15 14:04:30 +09:00
Ian Barwick	e02ddd0f37	repmgrd: basic witness node monitoring	2017-11-15 14:04:23 +09:00
Ian Barwick	31b856dd9f	Add "witness register" functionality	2017-11-15 14:03:54 +09:00
Ian Barwick	e16eb42693	repmgrd: detect role change from primary to standby If repmgrd is monitoring a primary which is taken off-line, then later restored as a standby, detect this change and resume monitoring in standby node. Addresses GitHub #338.	2017-11-15 14:03:26 +09:00
Ian Barwick	cbc97d84ac	repmgrd: updates related to node_id handling	2017-11-15 14:03:15 +09:00
Ian Barwick	96fe7dd2d6	repmgrd: catch corner cases where monitoring data is not available	2017-11-15 14:03:12 +09:00
Ian Barwick	13935a88c9	repmgrd: ensure shmem is reinitialised after a restart	2017-11-09 19:51:31 +09:00
Ian Barwick	5275890467	repmgrd: misc fixes	2017-11-09 19:51:26 +09:00
Ian Barwick	7f865fdaf3	repmgrd: fix priority/node_id tie-break check	2017-11-09 19:51:22 +09:00
Ian Barwick	a3428e4d8a	repmgrd: simplify the candidate selection logic All disconnected nodes will be in a static, known state, so as long as each node has the same meta-information (repmgr.nodes) and is able to retrieve the last receive LSN of the other nodes, it is possible for each node to independently determine the best promotion candidate, thereby reaching consensus without an explicit "voting" process.	2017-11-09 19:51:13 +09:00
Ian Barwick	03b9475755	repmgrd: fixes to failover handling get_new_primary() returns NULL if no notification for the new primary has been received, but the code was expecting it to return UNKNOWN_NODE_ID, which was causing repmgrd to prematurely drop out of the new primary detection loop if no notification had been received by the time the loop started. Also store the electoral term as a single-row, single-column table, to ensure that all repmgrds see the same term. It is then bumped by the winning node after it gets promoted. Various logging improvements.	2017-11-09 19:51:09 +09:00
Ian Barwick	d6c27f8938	Standardize quoting in log messages	2017-10-04 09:34:59 +09:00
Ian Barwick	a9f4a027a7	pgindent run	2017-09-11 11:14:13 +09:00
Ian Barwick	3447257ae4	repmgrd: minor fixes and comment updates	2017-09-08 20:59:21 +09:00
Ian Barwick	e4f7dc8234	Add copyright notices	2017-09-08 13:27:39 +09:00
Ian Barwick	1ef00f5a3b	repmgrd: parse "follow_command" during cascaded standby failover	2017-09-05 11:19:25 +09:00
Ian Barwick	78e6bdeebe	Have repmgrd parse "standby follow --upstream-node-id=%n"	2017-09-04 13:42:50 +09:00
Ian Barwick	ab6702891a	Minor fixes to cascading standby failover.	2017-09-01 13:09:17 +09:00
Ian Barwick	154c76e5e7	repmgrd: improve cascaded standby failover Check primary is available.	2017-08-29 15:29:17 +09:00
Ian Barwick	e0888c1f62	repmgrd: handle SIGHUP	2017-08-29 12:55:13 +09:00
Ian Barwick	df827c6518	Update repmgrd documentation	2017-08-29 11:04:30 +09:00
Ian Barwick	4a11551c2f	repmgrd: handle local node failure	2017-08-28 10:31:43 +09:00
Ian Barwick	fcd111ac4c	Improve logging output during failover process	2017-08-24 22:44:03 +09:00
Ian Barwick	db157ad9bc	Update README	2017-08-24 17:43:01 +09:00
Ian Barwick	eee8d65259	Update view "replication_status"	2017-08-24 15:05:13 +09:00
Ian Barwick	a659132ea4	repmgrd: write monitoring statistics	2017-08-24 11:49:44 +09:00
Ian Barwick	8dfb7bbc7d	repmgrd: handle promotion failure properly	2017-08-23 21:44:18 +09:00
Ian Barwick	6259463007	repmgrd: various fixes for "manual" failover mode	2017-08-23 10:56:55 +09:00
Ian Barwick	791640e3b4	repmgrd: never execute "service_promote_command" directly	2017-08-02 12:09:25 +09:00
Ian Barwick	7cf3b9b618	repmgrd: improve logging of BDR monitoring Also always log information about event_notification command	2017-07-27 21:12:41 +09:00
Ian Barwick	56b2e9bb84	Rename/add configuration file options In previous versions of repmgr, some options had ambiguous meanings, and/or were used for slightly different purposes. This way we end up with a couple more options (most of which probably won't need adjusting) but greater clarity and flexibility. Removed: master_response_timeout (renamed to "async_query_timeout", as this was its main usage); retry_promote_interval_secs (replaced by "primary_notification_timeout"). Added: async_query_timeout (timeout in seconds when executing asynchronous queries); primary_notification_timeout (number of seconds to wait for notification from the new primary after a failover); primary_follow_timeout (number of seconds to wait for the new primary to become available when executing "repmgr standby follow").	2017-07-25 11:13:32 +09:00
Ian Barwick	d3776ad13e	repmgrd: consolidate some code	2017-07-19 15:28:25 +09:00
Ian Barwick	a7b7d86ecc	repmgrd: handle manual failover mode correctly	2017-07-19 14:01:01 +09:00
Ian Barwick	23e6440dfd	repmgrd: initiate primary monitoring when local node is promoted manually	2017-07-19 11:15:38 +09:00
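
The candidate selection described in commits a3428e4d8a, 7f865fdaf3 and 53ebde8f33 can be illustrated with a short sketch. This is not repmgrd's code (repmgrd is written in C); it is a minimal Python model under stated assumptions. The `NodeState` type, the function names, and the exact ordering of the tie-break (highest last receive LSN, then higher priority, then lower node_id) are assumptions made for illustration, as is treating priority 0 as "never promote". The commit messages only state that each node can independently work out the best candidate from shared metadata plus the other nodes' last receive LSN, that a priority/node_id tie-break exists, and that failover requires more than 50% of active nodes to be visible.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NodeState:
    """Hypothetical per-node view; not a repmgr data structure."""
    node_id: int
    priority: int          # assumed: higher is preferred, 0 means never promote
    last_receive_lsn: int  # LSN as an integer byte position
    visible: bool          # whether this node can currently be reached


def parse_lsn(text: str) -> int:
    """Convert a PostgreSQL LSN such as '0/3000060' to an integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def has_quorum(active_nodes: List[NodeState]) -> bool:
    """True only if strictly more than 50% of active nodes are visible."""
    visible = sum(1 for n in active_nodes if n.visible)
    return visible * 2 > len(active_nodes)


def pick_candidate(active_nodes: List[NodeState]) -> Optional[NodeState]:
    """Deterministically pick a promotion candidate, or None if unsafe.

    Every node that runs this over the same input gets the same answer,
    so no explicit voting round is needed.
    """
    if not has_quorum(active_nodes):
        return None
    eligible = [n for n in active_nodes if n.visible and n.priority > 0]
    if not eligible:
        return None
    # Assumed ordering: most advanced LSN, then priority, then lowest node_id.
    return max(
        eligible,
        key=lambda n: (n.last_receive_lsn, n.priority, -n.node_id),
    )
```

Because the function is pure and its input is the same on every node, two standbys that evaluate it independently arrive at the same candidate, which is the property the commit message relies on to avoid an explicit "voting" process.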

60 Commits