repmgr

mirror of https://github.com/EnterpriseDB/repmgr.git synced 2026-03-24 15:46:29 +00:00

Author	SHA1	Message	Date
Ian Barwick	6f315c1b3c	repmgrd: don't explicitly close connections on shutdown	2018-05-01 10:21:10 +09:00
Ian Barwick	16048a879e	repmgrd: notify sibling nodes to follow new primary after pg_ctl timeout If "pg_ctl promote" fails due to a timeout, but the promotion itself succeeds, have repmgrd on the new primary explicitly notify any sibling nodes to follow it. Previously the sibling nodes would wait "primary_notification_timeout" seconds before attempting to discover the new primary. This (and preceding commit `eac80ae`) address GitHub #425.	2018-04-27 11:54:21 +09:00
Ian Barwick	eac80ae9c1	repmgrd: handle pg_ctl timeout It's possible "pg_ctl promote" will time out, causing "repmgr standby follow" to return with an error; however the promotion itself will usually succeed, so detect this case and handle accordingly.	2018-04-26 19:19:42 +09:00
Ian Barwick	7822aa784f	repmgrd: catch corner case in standby connection handle check If repmgrd marks the local node as unavailable, and it was actually restarting but a failover event occurred before the next local node check, failover will continue with the stale connection handle. Add a final local node check just before starting the failover process, so repmgrd can reconnect if it wasn't able to before.	2018-04-24 21:56:57 +09:00
Ian Barwick	4455ded935	repmgrd: prevent standby connection handle from going stale If monitoring history is not in use, there's no activity on the standby's connection handle, so if e.g. the standby is restarted, PQstatus() never returns CONNECTION_BAD and repmgrd never notices the connection is stale. Therefore execute a throw-away statement at "monitor_interval_secs".	2018-04-24 21:56:52 +09:00
Ian Barwick	fd0b850f41	Minor doc and log output tweaks	2018-04-24 21:08:05 +09:00
Ian Barwick	85ab2d94b7	repmgrd: tweak event notifications on standby failure The event notification was only being created if there was a valid primary connection; it should be created in any case, so an event notification script can be executed.	2018-04-20 10:15:08 +09:00
Ian Barwick	96811ccc01	repmgrd: tweak log notices when marking a standby as failed Announce what we're going to do (set the node record inactive) before performing the action. Makes reading the log slightly easier.	2018-04-03 14:37:43 +09:00
Ian Barwick	73982859f6	repmgrd: improve log output - emit explicit startup NOTICE - emit NOTICE when falling back to degraded monitoring on a primary node - improve log message and event notification details when monitoring a former primary which has been reconnected as a standby	2018-04-03 14:37:06 +09:00
Ian Barwick	5e4bdb5a1b	repmgrd: handle failover with two nodes in the primary location If two nodes were in the primary location, and at least one node in another location, the non-failed node in the primary location was not recognising itself as a promotion candidate. Addresses GitHub #407.	2018-04-02 20:51:27 +09:00
Ian Barwick	a403da67bc	Consolidate connection closure calls	2018-03-27 16:43:59 +09:00
Ian Barwick	0e55a60660	Add event "repmgrd_failover_aborted"	2018-03-21 13:23:06 +09:00
Ian Barwick	81c69e3677	repmgrd: fix typo	2018-03-21 12:36:15 +09:00
Ian Barwick	2a99dfa15b	repmgrd: fix failover handling in "manual" mode Regression was introduced in commit `c7a585c555`	2018-03-07 19:21:40 +09:00
Ian Barwick	cdb504d700	Add event "repmgrd_shutdown" Implements GitHub #393	2018-03-06 11:00:03 +09:00
Ian Barwick	0af2077bed	repmgrd: add debug log output for "monitor_interval_secs" sleep in all modes	2018-03-06 10:56:21 +09:00
Ian Barwick	bc766a48ed	repmgrd: retry standby connection after cascading standby failover	2018-03-02 11:05:07 +09:00
Ian Barwick	55441f2729	repmgrd: add configuration file parameter "standby_reconnect_timeout" This is used for determining a timeout when reconnecting to the standby after executing the "follow_command". This will normally not need to be set explicitly, but may be useful in cases where the standby's startup phase can last longer than usual.	2018-03-02 11:04:56 +09:00
Ian Barwick	c1356b9e0d	repmgrd: retry standby connection after "follow_command" executed It's possible that the standby is still starting up after the "follow_command" completes, so poll for a while until we get a connection.	2018-03-02 11:04:19 +09:00
Ian Barwick	22b3a74fa0	repmgrd: improve detection of status change from primary to standby If repmgrd is running in degraded mode on a primary which has been stopped, then manually brought back online as a standby (e.g. by creating recovery.conf and starting the server), ensure it not only detects the change but automatically updates the node record so it can resume monitoring the node as a standby. Previously, repmgrd was looping waiting for the record to be updated (as is done transparently when executing "repmgr node rejoin") but if the record was not updated within the timeout period (e.g. by "repmgr standby register") it would fail to resume monitoring as a standby. It seems reasonable to have repmgrd automatically update the node record, as this will restore failover capability as quickly as possible. If this is not desired, then the onus is on the user to shut down repmgrd while making the desired changes.	2018-02-22 15:50:45 +09:00
Ian Barwick	ec068e38a2	Remove --bdr-only configuration option This was required for a specific use case during pre-release development and is no longer needed now the physical streaming replication handling is implemented.	2018-01-25 10:48:09 +09:00
Ian Barwick	e64d965c6a	repmgrd: document standby_[failure\|recovery] event notifications Also clean up the relevant code section. Addresses GitHub #359.	2018-01-04 09:33:37 +09:00
Ian Barwick	26a9e848fd	Update copyright notices to 2018	2018-01-02 10:19:46 +09:00
Ian Barwick	8c422d6084	Remove unneeded functions	2017-11-20 15:18:21 +09:00
Ian Barwick	08b443dce0	repmgrd: re-enable monitoring data recording when in archive recovery. The warning emitted gives the impression that monitoring data shouldn't be written if there's no streaming replication, but we can and should do this as long as we have a primary connection. Explicitly document this in the code. Also remove an unused variable warning.	2017-11-16 17:17:17 +09:00
Ian Barwick	9d432546bf	repmgrd: don't fail over unless more than 50% of active nodes are visible.	2017-11-15 13:48:28 +09:00
Ian Barwick	3c557ebd8e	repmgrd: finalize witness failover handling	2017-11-15 13:48:25 +09:00
Ian Barwick	4efeb52cba	repmgrd: synchronise repmgr.nodes table on witness server	2017-11-15 13:48:21 +09:00
Ian Barwick	60422c66f9	repmgrd: handle witness server	2017-11-15 13:48:17 +09:00
Ian Barwick	a31980b590	repmgrd: basic witness node monitoring	2017-11-15 13:48:11 +09:00
Ian Barwick	a6cc4d80f0	Add "witness register" functionality	2017-11-15 13:47:45 +09:00
Ian Barwick	9908a9c662	repmgrd: detect role change from primary to standby If repmgrd is monitoring a primary which is taken off-line, then later restored as a standby, detect this change and resume monitoring in standby mode. Addresses GitHub #338.	2017-11-10 17:19:30 +09:00
Ian Barwick	0230bafae1	repmgrd: updates related to node_id handling	2017-11-10 12:07:31 +09:00
Ian Barwick	de577adc67	repmgrd: catch corner cases where monitoring data is not available	2017-11-09 22:27:09 +09:00
Ian Barwick	fed17d49e3	repmgrd: ensure shmem is reinitialised after a restart	2017-11-09 19:31:21 +09:00
Ian Barwick	d80763f974	repmgrd: misc fixes	2017-11-09 19:31:16 +09:00
Ian Barwick	331e982bdb	repmgrd: fix priority/node_id tie-break check	2017-11-09 19:31:12 +09:00
Ian Barwick	6ac6e0733a	repmgrd: simplify the candidate selection logic All disconnected nodes will be in a static, known state, so as long as each node has the same meta-information (repmgr.nodes) and is able to retrieve the last receive LSN of the other nodes, it is possible for each node to independently determine the best promotion candidate, thereby reaching consensus without an explicit "voting" process.	2017-11-09 19:31:04 +09:00
Ian Barwick	79d21b516b	repmgrd: fixes to failover handling get_new_primary() returns NULL if no notification for the new primary has been received, but the code was expecting it to return UNKNOWN_NODE_ID, which was causing repmgrd to prematurely drop out of the new primary detection loop if no notification had been received by the time the loop started. Also store the electoral term as a single row, single column table, to ensure that all repmgrds see the same term. It is then bumped by the winning node after it gets promoted. Various logging improvements.	2017-11-08 14:28:08 +09:00
Ian Barwick	d6c27f8938	Standardize quoting in log messages	2017-10-04 09:34:59 +09:00
Ian Barwick	a9f4a027a7	pgindent run	2017-09-11 11:14:13 +09:00
Ian Barwick	3447257ae4	repmgrd: minor fixes and comment updates	2017-09-08 20:59:21 +09:00
Ian Barwick	e4f7dc8234	Add copyright notices	2017-09-08 13:27:39 +09:00
Ian Barwick	1ef00f5a3b	repmgrd: parse "follow_command" during cascaded standby failover	2017-09-05 11:19:25 +09:00
Ian Barwick	78e6bdeebe	Have repmgrd parse "standby follow --upstream-node-id=%n"	2017-09-04 13:42:50 +09:00
Ian Barwick	ab6702891a	Minor fixes to cascading standby failover.	2017-09-01 13:09:17 +09:00
Ian Barwick	154c76e5e7	repmgrd: improve cascaded standby failover Check primary is available.	2017-08-29 15:29:17 +09:00
Ian Barwick	e0888c1f62	repmgrd: handle SIGHUP	2017-08-29 12:55:13 +09:00
Ian Barwick	df827c6518	Update repmgrd documentation	2017-08-29 11:04:30 +09:00
Ian Barwick	4a11551c2f	repmgrd: handle local node failure	2017-08-28 10:31:43 +09:00
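
Commits `6ac6e0733a`, `331e982bdb` and `9d432546bf` above describe candidate selection without an explicit vote: every node applies the same deterministic rule to the same metadata, and failover proceeds only if more than 50% of active nodes are visible. The following is a minimal Python sketch of that idea, not repmgrd's actual C implementation. The `Node` type, the function names, the tie-break order (highest last receive LSN, then highest priority, then lowest node_id) and the treatment of priority 0 as "never promote" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for one row of node metadata."""
    node_id: int
    priority: int           # assumption: 0 means "never promote"
    last_receive_lsn: int   # LSN expressed as an integer byte position
    visible: bool = True    # whether this node could be reached


def has_visible_majority(visible_count: int, active_count: int) -> bool:
    """True only if strictly more than 50% of active nodes are visible."""
    return visible_count * 2 > active_count


def best_candidate(nodes: Iterable[Node]) -> Optional[Node]:
    """Pick a promotion candidate deterministically.

    Assumed ordering: highest last receive LSN first, then highest
    priority, then lowest node_id as the final tie-break. Because the
    rule depends only on the shared inputs, every node that sees the
    same data reaches the same answer independently.
    """
    eligible = [n for n in nodes if n.visible and n.priority > 0]
    if not eligible:
        return None
    return min(
        eligible,
        key=lambda n: (-n.last_receive_lsn, -n.priority, n.node_id),
    )


if __name__ == "__main__":
    cluster = [
        Node(node_id=1, priority=100, last_receive_lsn=5000),
        Node(node_id=2, priority=100, last_receive_lsn=6000),
        Node(node_id=3, priority=200, last_receive_lsn=6000),
    ]
    visible = sum(1 for n in cluster if n.visible)
    if has_visible_majority(visible, len(cluster)):
        print(best_candidate(cluster))
```

The result does not depend on the order in which nodes are listed, which is what allows each repmgrd instance to reach the same conclusion without exchanging votes.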

72 Commits