repmgr

mirror of https://github.com/EnterpriseDB/repmgr.git synced 2026-03-24 15:46:29 +00:00

Author	SHA1	Message	Date
Ian Barwick	63bdc19132	repmgrd: ensure local node is counted as quorum member Rename "standby_nodes" to "sibling_nodes" to make it clearer in the code what total is actually provided by the struct. Addresses GitHub #439.	2018-06-01 17:19:40 +09:00
Ian Barwick	0ffaff75df	repmgrd: ensure degraded monitoring timeout works on standby Parameter "degraded_monitoring_timeout" was not being acted on when monitoring a streaming replication standby. Addresses GitHub #439.	2018-05-31 17:53:31 +09:00
Martín Marqués	2dfe1d18e9	Fix typo in a code comment	2018-05-19 12:29:04 -03:00
Ian Barwick	67ccd4dcb3	repmgrd: don't explicitly close connections on shutdown	2018-04-30 15:13:30 +09:00
Ian Barwick	f86e89ba45	repmgrd: notify sibling nodes to follow new primary after pg_ctl timeout If "pg_ctl promote" fails due to a timeout, but the promotion itself succeeds, have repmgrd on the new primary explicitly notify any sibling nodes to follow it. Previously the sibling nodes would wait "primary_notification_timeout" seconds before attempting to discover the new primary. This (and preceding commit `eac80ae`) address GitHub #425.	2018-04-27 11:59:00 +09:00
Ian Barwick	a6d0ba07ed	repmgrd: handle pg_ctl timeout It's possible "pg_ctl promote" will time out, causing "repmgr standby follow" to return with an error; however the promotion itself will usually succeed, so detect this case and handle accordingly.	2018-04-26 19:23:26 +09:00
Ian Barwick	242fa287b4	repmgrd: catch corner case in standby connection handle check If repmgrd marks the local node as unavailable, and it was actually restarting but a failover event occurred before the next local node check, failover will continue with the stale connection handle. Add a final local node check just before starting the failover process, so repmgrd can reconnect if it wasn't able to before.	2018-04-24 21:55:36 +09:00
Ian Barwick	fa908432c8	Minor doc and log output tweaks	2018-04-24 21:08:31 +09:00
Ian Barwick	afa942fef6	repmgrd: prevent standby connection handle from going stale If monitoring history is not in use, there's no activity on the standby's connection handle, so if e.g. the standby is restarted, PQstatus() never returns CONNECTION_BAD and repmgrd never notices the connection is stale. Therefore execute a throw-away statement at "monitor_interval_secs".	2018-04-23 23:51:03 +09:00
Ian Barwick	90cba78f52	repmgrd: tweak event notifications on standby failure The event notification was only being created if there was a valid primary connection; it should be created in any case, so an event notification script can be executed.	2018-04-17 10:27:25 +09:00
Ian Barwick	65371489c6	repmgrd: handle failover with two nodes in the primary location If two nodes were in the primary location, and at least one node in another location, the non-failed node in the primary location was not recognising itself as a promotion candidate. Addresses GitHub #407.	2018-03-30 12:17:34 +09:00
Ian Barwick	37e53108a2	Consolidate connection closure calls	2018-03-27 08:52:23 +09:00
Ian Barwick	7e2af17783	repmgrd: tweak log notices when marking a standby as failed Announce what we're going to do (set the node record inactive) before performing the action. Makes reading the log slightly easier.	2018-03-23 13:27:37 +08:00
Ian Barwick	b4272853e7	Add event "repmgrd_failover_aborted"	2018-03-23 10:44:00 +08:00
Ian Barwick	d9cc09cee4	repmgrd: fix typo	2018-03-21 12:36:51 +09:00
Ian Barwick	9aea5b8aa7	repmgrd: fix failover handling in "manual" mode Regression was introduced in commit `c7a585c555`	2018-03-06 22:35:51 +09:00
Ian Barwick	9c72c0d66e	Add event "repmgrd_shutdown" Implements GitHub #393	2018-03-06 10:59:54 +09:00
Ian Barwick	5a52917421	repmgrd: add debug log output for "monitor_interval_secs" sleep in all modes	2018-03-05 14:23:58 +09:00
Ian Barwick	fe594c95ad	repmgrd: retry standby connection after cascading standby failover	2018-02-28 21:15:11 +09:00
Ian Barwick	60e63feaca	repmgrd: add configuration file parameter "standby_reconnect_timeout" This is used for determining a timeout when reconnecting to the standby after executing the "follow_command". This will normally not need to be set explicitly, but may be useful in cases where the standby's startup phase can last longer than usual.	2018-02-28 18:56:33 +09:00
Ian Barwick	5e8b41e221	repmgrd: retry standby connection after "follow_command" executed It's possible that the standby is still starting up after the "follow_command" completes, so poll for a while until we get a connection.	2018-02-28 15:35:47 +09:00
Ian Barwick	c7a585c555	repmgrd: improve log output - emit explicit startup NOTICE - emit NOTICE when falling back to degraded monitoring on a primary node - improve log message and event notification details when monitoring a former primary which has been reconnected as a standby	2018-02-28 12:35:13 +09:00
Ian Barwick	829cf5cca4	repmgrd: improve detection of status change from primary to standby If repmgrd is running in degraded mode on a primary which has been stopped, then manually brought back online as a standby (e.g. by creating recovery.conf and starting the server), ensure it not only detects the change but automatically updates the node record so it can resume monitoring the node as a standby. Previously, repmgrd was looping waiting for the record to be updated (as is done transparently when executing "repmgr node rejoin") but if the record was not updated within the timeout period (e.g. by "repmgr standby register") it would fail to resume monitoring as a standby. It seems reasonable to have repmgrd automatically update the node record, as this will restore failover capability as quickly as possible. If this is not desired, then the onus is on the user to shut down repmgrd while making the desired changes.	2018-02-22 11:35:47 +09:00
Ian Barwick	6dc1969ad5	Remove --bdr-only configuration option This was required for a specific use case during pre-release development and is no longer needed now the physical streaming replication handling is implemented.	2018-01-18 13:30:47 +09:00
Ian Barwick	486f8e5a2c	repmgrd: document standby_[failure|recovery] event notifications Also clean up the relevant code section. Addresses GitHub #359.	2018-01-04 09:34:49 +09:00
Ian Barwick	1521657965	Update copyright notices to 2018	2018-01-02 10:20:09 +09:00
Ian Barwick	f6a6df3600	repmgrd: re-enable monitoring data recording when in archive recovery. The warning emitted gives the impression that monitoring data shouldn't be written if there's no streaming replication, but we can and should do this as long as we have a primary connection. Explicitly document this in the code. Also remove an unused variable warning.	2017-11-20 15:29:21 +09:00
Ian Barwick	67e27f9ecd	Remove unneeded functions	2017-11-20 15:26:32 +09:00
Ian Barwick	53ebde8f33	repmgrd: don't fail over unless more than 50% of active nodes are visible.	2017-11-15 14:04:41 +09:00
Ian Barwick	5e9d50f8ca	repmgrd: finalize witness failover handling	2017-11-15 14:04:37 +09:00
Ian Barwick	347e753c27	repmgrd: synchronise repmgr.nodes table on witness server	2017-11-15 14:04:34 +09:00
Ian Barwick	2f978847b1	repmgrd: handle witness server	2017-11-15 14:04:30 +09:00
Ian Barwick	e02ddd0f37	repmgrd: basic witness node monitoring	2017-11-15 14:04:23 +09:00
Ian Barwick	31b856dd9f	Add "witness register" functionality	2017-11-15 14:03:54 +09:00
Ian Barwick	e16eb42693	repmgrd: detect role change from primary to standby If repmgrd is monitoring a primary which is taken off-line, then later restored as a standby, detect this change and resume monitoring in standby mode. Addresses GitHub #338.	2017-11-15 14:03:26 +09:00
Ian Barwick	cbc97d84ac	repmgrd: updates related to node_id handling	2017-11-15 14:03:15 +09:00
Ian Barwick	96fe7dd2d6	repmgrd: catch corner cases where monitoring data is not available	2017-11-15 14:03:12 +09:00
Ian Barwick	13935a88c9	repmgrd: ensure shmem is reinitialised after a restart	2017-11-09 19:51:31 +09:00
Ian Barwick	5275890467	repmgrd: misc fixes	2017-11-09 19:51:26 +09:00
Ian Barwick	7f865fdaf3	repmgrd: fix priority/node_id tie-break check	2017-11-09 19:51:22 +09:00
Ian Barwick	a3428e4d8a	repmgrd: simplify the candidate selection logic All disconnected nodes will be in a static, known state, so as long as each node has the same meta-information (repmgr.nodes) and is able to retrieve the last receive LSN of the other nodes, it is possible for each node to independently determine the best promotion candidate, thereby reaching consensus without an explicit "voting" process.	2017-11-09 19:51:13 +09:00
Ian Barwick	03b9475755	repmgrd: fixes to failover handling get_new_primary() returns NULL if no notification for the new primary has been received, but the code was expecting it to return UNKNOWN_NODE_ID, which was causing repmgrd to prematurely drop out of the new primary detection loop if no notification had been received by the time the loop started. Also store the electoral term as a single-row, single-column table, to ensure that all repmgrds see the same term. It is then bumped by the winning node after it gets promoted. Various logging improvements.	2017-11-09 19:51:09 +09:00
Ian Barwick	d6c27f8938	Standardize quoting in log messages	2017-10-04 09:34:59 +09:00
Ian Barwick	a9f4a027a7	pgindent run	2017-09-11 11:14:13 +09:00
Ian Barwick	3447257ae4	repmgrd: minor fixes and comment updates	2017-09-08 20:59:21 +09:00
Ian Barwick	e4f7dc8234	Add copyright notices	2017-09-08 13:27:39 +09:00
Ian Barwick	1ef00f5a3b	repmgrd: parse "follow_command" during cascaded standby failover	2017-09-05 11:19:25 +09:00
Ian Barwick	78e6bdeebe	Have repmgrd parse "standby follow --upstream-node-id=%n"	2017-09-04 13:42:50 +09:00
Ian Barwick	ab6702891a	Minor fixes to cascading standby failover.	2017-09-01 13:09:17 +09:00
Ian Barwick	154c76e5e7	repmgrd: improve cascaded standby failover Check primary is available.	2017-08-29 15:29:17 +09:00
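
Several of the commits above (`53ebde8f33`, `63bdc19132`, `a3428e4d8a`, `7f865fdaf3`) describe a failover decision that each node can reach independently: count visible nodes, including the local one, and pick a promotion candidate from shared metadata and last receive LSNs. The following is a minimal illustrative sketch of that idea in Python, not repmgrd's actual C implementation. The `Node` type, the function names, the LSN, then priority, then node_id ordering, and the treatment of priority 0 as "never promote" are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for a row of node metadata (not repmgr's struct)."""
    node_id: int
    priority: int          # assumption: higher is preferred, 0 means "never promote"
    last_receive_lsn: int  # LSN represented as a plain integer for simplicity
    visible: bool = True   # whether the local node can currently reach this node


def has_quorum(local: Node, siblings: List[Node]) -> bool:
    """Return True if more than 50% of nodes are visible.

    The local node is always counted as a visible quorum member.
    """
    total = len(siblings) + 1
    visible = 1 + sum(1 for n in siblings if n.visible)
    return visible * 2 > total


def best_candidate(local: Node, siblings: List[Node]) -> Optional[Node]:
    """Pick a promotion candidate deterministically, without a voting round.

    Assumed ordering: highest last receive LSN, then highest priority,
    then lowest node_id as the final tie-break.
    """
    candidates = [n for n in [local, *siblings] if n.visible and n.priority > 0]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda n: (-n.last_receive_lsn, -n.priority, n.node_id),
    )
```

Because every node evaluates the same inputs with the same ordering, each one arrives at the same candidate, which is the property the commit message for `a3428e4d8a` relies on to avoid an explicit vote.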

75 Commits