repmgr

mirror of https://github.com/EnterpriseDB/repmgr.git synced 2026-07-16 14:29:05 +00:00

Author	SHA1	Message	Date
John Naylor	4e414d2ea0	Fix typo	2019-02-24 10:50:09 +07:00
Ian Barwick	ea36609159	Add some missing query error logging	2019-02-23 16:54:07 +09:00
Ian Barwick	0c68018631	repmgrd: log details of nodes which can see primary If a failover is cancelled because other nodes can still see the primary, log the identities of those nodes.	2019-02-23 15:55:06 +09:00
Ian Barwick	b72c894db4	repmgrd: during failover, check if other nodes have seen the primary In a situation where only some standbys are cut off from the primary, a failover would result in a split brain/split cluster situation, as it's likely one of the cut-off standbys will promote itself, and other cut-off standbys (but not all standbys) will follow it. To prevent this happening, interrogate the other sibling nodes to check whether they've seen the primary within a reasonably short interval; if this is the case, do not take any failover action. This feature is experimental.	2019-02-23 13:03:22 +09:00
Ian Barwick	07097575b1	daemon status: add column "upstream last seen" This displays the interval (in seconds) since the repmgrd instance on each node last confirmed its upstream node is available.	2019-02-23 13:03:16 +09:00
Ian Barwick	71d151ca87	Don't check status of logical replication slots We only want to check the status of physical replication slots to determine whether a streaming replication standby has become detached and there is therefore a risk of uncontrolled WAL buildup on the local node. It's not feasible to second-guess the state of logical replication slots.	2019-02-23 10:09:43 +09:00
Ian Barwick	5abec2bb97	doc: clarify replication slot usage with Barman Barman will usually use one replication slot, but that's generally preferable to multiple slots.	2019-02-22 13:52:02 +09:00
Ian Barwick	de70fd42dc	node check: simplify output generation in --is-shutdown-cleanly check	2019-02-22 10:49:06 +09:00
Ian Barwick	99550b91bd	standby register: warn if standby is running and connection params provided Addresses GitHub #552.	2019-02-22 10:31:00 +09:00
John Naylor	70190c37c4	Bring list of supported versions on the doc front page in line with the supported version matrix	2019-02-20 11:41:17 +07:00
Ian Barwick	f3fc4e5afb	Minor syntax formatting tweak For consistency.	2019-02-15 19:58:35 +09:00
Ian Barwick	629c552348	primary unregister: ensure correct behaviour when executed on a witness Fixes GitHub #548.	2019-02-15 19:49:17 +09:00
Ian Barwick	85a97c933f	Handle unhandled NodeStatus in switch statement	2019-02-15 19:31:06 +09:00
Ian Barwick	3a5a4388c7	cluster show: differentiate unreachable status Differentiate between unreachable nodes and nodes which are running but rejecting connections.	2019-02-15 16:01:55 +09:00
Ian Barwick	9338a9e233	Improve logging output Avoid emitting blank detail line	2019-02-15 10:49:56 +09:00
Ian Barwick	7fad2ed2c8	standby switchover: improve error output It wasn't clear why repmgr thinks the demotion candidate is not the upstream of the promotion candidate.	2019-02-14 17:22:24 +09:00
Ian Barwick	9305953bd2	Fix history file parsing Also add additional debugging output.	2019-02-14 15:52:40 +09:00
Ian Barwick	aeb9639ed9	node rejoin: add more log detail during rejoin success check Stating what is actually being checked where might be useful when diagnosing potential issues.	2019-02-13 15:29:39 +09:00
Ian Barwick	bc9e725d05	node rejoin: always emit detail about relative LSNs Previously repmgr only emitted that if there was a timeline/LSN mismatch, but it's useful to have confirmation of how it came to the conclusion that rejoin will succeed.	2019-02-13 15:16:40 +09:00
Ian Barwick	905e108f8f	doc: fix typos etc. in "standby follow" reference	2019-02-12 17:24:56 +09:00
Ian Barwick	f2362a06fa	doc: update "standby switchover" reference	2019-02-12 16:39:13 +09:00
Ian Barwick	7b85cb9f12	doc: update "standby follow" reference Add note about handling of timeline forks etc.	2019-02-12 16:39:06 +09:00
Ian Barwick	790bec21dd	node rejoin: handle case where node to rejoin was primary In that case the minRecoveryPoint* fields may be empty.	2019-02-12 13:31:25 +09:00
Ian Barwick	a0dc673439	"node rejoin": use minRecoveryPointTLI for comparing timelines	2019-02-12 13:31:21 +09:00
Ian Barwick	25019d1cc5	Refactor is_wal_replay_paused() query Make sure it doesn't emit an error if executed on a node not in recovery. The caller should theoretically only execute it on nodes in recovery, but there are sure to be corner cases where the node has come out of recovery.	2019-02-12 10:21:05 +09:00
Ian Barwick	d00cb767a6	cluster show: don't try to run WAL replay pause query on unreachable node	2019-02-12 10:15:06 +09:00
Ian Barwick	8e0d28d8dc	Fix "repmgr daemon --help" output Per report from Shaun.	2019-02-12 09:20:29 +09:00
yonj1e	e146fb4fc3	Fix undeclared 'TRUE' error GitHub #547.	2019-02-11 16:55:54 +09:00
Ian Barwick	8773543e10	doc: update "daemon (start|stop)" documentation Clarify various aspects related to configuration.	2019-02-11 10:55:10 +09:00
Ian Barwick	a4cd4ee553	doc: fix quoting in "standby switchover" index entries	2019-02-11 10:34:02 +09:00
Ian Barwick	a61dd8a750	doc: tweak support text	2019-02-08 15:28:12 +09:00
Ian Barwick	2c84716e66	doc: add information about reporting issues etc. Useful to have a linkable document listing the information required to have a chance of troubleshooting issues.	2019-02-08 11:55:42 +09:00
Ian Barwick	f1667a7e98	repmgrd: don't consider nodes where repmgrd is not running If, for whatever reason, repmgrd is not running on a node, but that node qualifies as promotion candidate, failover will not take place as that node will never promote itself. We therefore discount nodes where repmgrd is not running as promotion candidates, which will ensure one node is always promoted. There is a slight risk here that the node(s) where repmgrd is not running are further ahead, leading to a timeline fork. It might be possible to mitigate that by having the "election" leader perform the promote (or follow) operation.	2019-02-07 17:07:13 +09:00
Ian Barwick	b91900f831	doc: clarify "repmgr daemon status" CSV output	2019-02-07 14:55:42 +09:00
Ian Barwick	aa1e64ec11	Warn about redundant use of --compact option	2019-02-07 14:35:30 +09:00
Ian Barwick	5d6037303b	"daemon status": display node priority GitHub #541.	2019-02-07 14:35:24 +09:00
Ian Barwick	8aaf6571a0	"cluster show": display node priority GitHub #541.	2019-02-07 14:35:21 +09:00
Ian Barwick	9433f80364	"cluster show": warn about nodes with paused WAL replay We do this in "repmgr daemon status" already, so do it here too for consistency. Related to GitHub #540.	2019-02-07 13:48:46 +09:00
Ian Barwick	aee13aee52	doc: note repmgrd behaviour when WAL replay is paused Related to GitHub #540.	2019-02-07 13:28:29 +09:00
Ian Barwick	f0a0be0248	Remove pointless default allocation in _get_node_record()	2019-02-07 11:41:08 +09:00
Ian Barwick	c4332d9a52	repmgrd: forcibly resume WAL replay if paused If WAL replay is paused, and there is WAL pending replay, a promote command will be queued until replay is resumed. As it's conceivable that there are corner cases where one standby with replay paused has actually received the most WAL, we'll forcibly resume WAL replay so it can be reliably promoted, if needed. Related to GitHub #540.	2019-02-07 11:39:48 +09:00
Ian Barwick	c7b325e2a4	Add function resume_wal_replay()	2019-02-07 11:33:02 +09:00
Ian Barwick	b89941f218	Store WAL replay pause status in ReplInfo struct	2019-02-07 10:24:42 +09:00
Ian Barwick	2b3b1faa20	refactor query in function get_replication_info() In particular handle all cases where one of the functions called in the query can return NULL in the query itself.	2019-02-06 15:40:27 +09:00
Ian Barwick	b9cd321aed	repmgrd: skip LSN checks of 0 priority node The node will never become a candidate so we can save the round trip to fetch its LSN.	2019-02-06 14:27:01 +09:00
Ian Barwick	984ce7420b	"daemon status": emit warning if WAL replay is paused Specifically, if WAL replay is paused and WAL is pending replay, this node cannot be promoted until WAL replay is unpaused. In this state it is not a suitable promotion candidate in a failover situation.	2019-02-06 13:32:20 +09:00
Ian Barwick	464ec6bec3	Ensure conninfo param list is initialized for --recovery-conf-only option	2019-02-06 12:58:09 +09:00
Ian Barwick	3bbbf6daa9	"recovery_file_path" is MAXPGPATH	2019-02-06 10:42:09 +09:00
Ian Barwick	cd3312496e	Rename functions which return an LSN for clarity	2019-02-06 09:32:53 +09:00
Ian Barwick	cce8b76171	"standby switchover": abort if promotion candidate has WAL replay paused If replay is paused, we can't be sure that no more WAL will be received between the check and the promote operation, which would risk the promote operation not taking place during the switchover (it would happen as soon as WAL replay is resumed and pending WAL is replayed). Therefore we simply quit with an informative slew of messages and leave the user to sort it out. GitHub #540.	2019-02-05 16:32:39 +09:00

1187 Commits