If, for whatever reason, repmgrd is not running on a node, but that
node qualifies as the promotion candidate, failover will not take place,
as that node will never promote itself.
We therefore discount nodes where repmgrd is not running as promotion
candidates, which will ensure one node is always promoted.
There is a slight risk here that the node(s) where repmgrd is not running
are further ahead, leading to a timeline fork. It might be possible
to mitigate that by having the "election" leader perform the promote
(or follow) operation.
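A minimal sketch of the resulting candidate filter, assuming a simple
array of candidate records with an illustrative "repmgrd_running" flag
(repmgrd's actual node list structures and field names differ):

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct
    {
        int  node_id;
        bool repmgrd_running;   /* illustrative flag */
    } candidate;

    /* return the first node where repmgrd is running, or -1 if none */
    static int
    pick_promotion_candidate(candidate *nodes, size_t n)
    {
        for (size_t i = 0; i < n; i++)
        {
            /*
             * a node without repmgrd will never promote itself,
             * so discount it as a promotion candidate
             */
            if (nodes[i].repmgrd_running)
                return nodes[i].node_id;
        }
        return -1;
    }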
If WAL replay is paused, and there is WAL pending replay, a promote command
will be queued until replay is resumed.
As it's conceivable that there are corner cases where one standby with
replay paused has actually received the most WAL, we'll forcibly
resume WAL replay so it can be reliably promoted, if needed.
Related to GitHub #540.
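The forced resume can be expressed with stock PostgreSQL functions via
libpq; a sketch, with error handling reduced to the essentials (in
PostgreSQL 9.6 and earlier the equivalent functions are
pg_is_xlog_replay_paused() and pg_xlog_replay_resume()):

    #include <string.h>
    #include <libpq-fe.h>

    /* if WAL replay is paused, resume it so pending WAL is applied */
    static void
    resume_wal_replay_if_paused(PGconn *conn)
    {
        PGresult *res = PQexec(conn, "SELECT pg_is_wal_replay_paused()");

        if (PQresultStatus(res) == PGRES_TUPLES_OK
            && strcmp(PQgetvalue(res, 0, 0), "t") == 0)
        {
            PQclear(res);
            res = PQexec(conn, "SELECT pg_wal_replay_resume()");
        }

        PQclear(res);
    }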
Specifically, if WAL replay is paused *and* WAL is pending replay,
this node cannot be promoted until WAL replay is unpaused. In this
state it is not a suitable promotion candidate in a failover situation.
If replay is paused, we can't rule out that more WAL will be received
between the check and the promote operation, which would risk the promote
operation not taking place during the switchover (it would happen only
once WAL replay is resumed and the pending WAL replayed).
Therefore we simply quit with an informative slew of messages and
leave the user to sort it out.
GitHub #540.
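The underlying condition boils down to a single query against the
standby; a hedged sketch (the function name is illustrative; a NULL
receive position, i.e. the walreceiver never started, yields false
here):

    #include <stdbool.h>
    #include <string.h>
    #include <libpq-fe.h>

    /*
     * true if WAL replay is paused *and* received WAL is still pending
     * replay -- in this state a promote request would be queued
     */
    static bool
    promote_would_be_queued(PGconn *conn)
    {
        bool      queued = false;
        PGresult *res = PQexec(conn,
            "SELECT pg_is_wal_replay_paused() "
            "   AND pg_last_wal_replay_lsn() < pg_last_wal_receive_lsn()");

        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            queued = (strcmp(PQgetvalue(res, 0, 0), "t") == 0);

        PQclear(res);
        return queued;
    }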
If WAL replay is paused but WAL is still pending replay, PostgreSQL will ignore
the promote request until WAL replay is unpaused. This may lead to the standby
being promoted at an unpredictable point in time outside of repmgr's
control. Moreover it may not be obvious that this is happening, or why,
and an apparently successful promotion attempt will turn out not to
have worked.
To prevent this from happening, repmgr will now refuse to promote the
standby if WAL replay is paused *and* WAL is still pending replay.
GitHub #540.
Previously, if the witness server connection details were provided
to "repmgr witness register" rather than those of the primary server,
repmgr would a) write the node record to the witness server rather
than the primary, and b) loop indefinitely trying to copy the
node table to itself.
Addresses GitHub #538.
Eventually we'll want this to include the optional replication info
currently held in the t_node_info struct, which should then contain a
pointer to a ReplInfo struct.
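A rough sketch of the intended shape (field names here are illustrative,
not repmgr's actual definitions):

    typedef struct ReplInfo
    {
        /* replication details, e.g. timeline and LSN positions */
        int timeline_id;
    } ReplInfo;

    typedef struct t_node_info
    {
        int       node_id;
        char      node_name[64];
        ReplInfo *replinfo;       /* optional; NULL until populated */
    } t_node_info;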
Ensure repmgr checks that the standby (promotion candidate) is currently
attached to the primary (demotion candidate).
Addresses issue reported in GitHub #519.
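One way to express the attachment check is to look for the standby in
the primary's pg_stat_replication view; a sketch, assuming the standby's
application_name is known (repmgr's actual check may use different
criteria):

    #include <stdbool.h>
    #include <libpq-fe.h>

    /* does the standby appear in the primary's pg_stat_replication? */
    static bool
    standby_is_attached(PGconn *primary_conn, const char *application_name)
    {
        const char *params[1] = { application_name };
        bool        attached = false;
        PGresult   *res = PQexecParams(primary_conn,
            "SELECT 1 FROM pg_stat_replication WHERE application_name = $1",
            1, NULL, params, NULL, NULL, 0);

        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            attached = (PQntuples(res) == 1);

        PQclear(res);
        return attached;
    }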
Immediately after the demotion candidate (primary) has shut down, we can't
be absolutely sure that the walreceiver has flushed all WAL to disk, so
checking pg_last_wal_receive_lsn() at that point might not reflect
the actual last available WAL location.
To handle this, we'll loop for a while (timeout controlled by configuration
parameter "wal_receive_check_timeout") before finally deciding whether
the standby is still behind the shut-down primary.
Addresses issue raised in GitHub #518.
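One plausible shape for that loop, polling once per second until the
standby's receive position reaches the expected location or the timeout
expires (a plain string comparison of the LSNs is used for brevity):

    #include <stdbool.h>
    #include <string.h>
    #include <unistd.h>
    #include <libpq-fe.h>

    /* wait until pg_last_wal_receive_lsn() reports the expected LSN */
    static bool
    wait_for_wal_receive(PGconn *conn, const char *expected_lsn,
                         int wal_receive_check_timeout)
    {
        for (int i = 0; i < wal_receive_check_timeout; i++)
        {
            bool      caught_up = false;
            PGresult *res = PQexec(conn,
                "SELECT pg_last_wal_receive_lsn()");

            if (PQresultStatus(res) == PGRES_TUPLES_OK)
                caught_up = (strcmp(PQgetvalue(res, 0, 0),
                                    expected_lsn) == 0);

            PQclear(res);

            if (caught_up)
                return true;

            sleep(1);
        }

        return false;   /* still behind the shut-down primary */
    }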
Previously the code would do nothing if an attempt was made to add a
parameter when the array was already full.
As the array is designed to contain all valid libpq connection parameters,
there's no reason it should ever "overflow" like this. If there is, then
it means the caller is attempting to add invalid values. Add an Assert()
so we can easily detect this in the unlikely event it ever occurs.
Noted after examining the issue raised in GitHub #533, which is nonsensical
as it implies we'd be OK with writing beyond the end of the array; however,
it doesn't hurt to make it a bit clearer what is happening and why.
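A condensed sketch of the pattern (array and function names are
illustrative, and the standard assert() stands in for the Assert()
macro):

    #include <assert.h>
    #include <stddef.h>
    #include <string.h>

    #define MAX_CONNINFO_PARAMS 64   /* illustrative size */

    static const char *keywords[MAX_CONNINFO_PARAMS];
    static const char *values[MAX_CONNINFO_PARAMS];

    static void
    param_set(const char *keyword, const char *value)
    {
        int i;

        /* overwrite the value if the keyword is already present */
        for (i = 0; i < MAX_CONNINFO_PARAMS && keywords[i] != NULL; i++)
        {
            if (strcmp(keywords[i], keyword) == 0)
            {
                values[i] = value;
                return;
            }
        }

        /*
         * The array is sized to hold every valid libpq parameter, so
         * running out of slots means the caller passed an invalid
         * keyword; make that visible rather than silently ignoring it.
         */
        assert(i < MAX_CONNINFO_PARAMS);

        keywords[i] = keyword;
        values[i] = value;
    }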
If the rejoin target is not in recovery but is not registered as primary
(we detect this by attempting to connect to the registered primary),
we abort and suggest fixing the repmgr metadata first.
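A minimal helper for the underlying recovery check, as might be run
against both the rejoin target and the node registered as primary
(a sketch; the actual detection also has to handle connection failures):

    #include <stdbool.h>
    #include <string.h>
    #include <libpq-fe.h>

    /* true if the server behind "conn" is operating as a primary */
    static bool
    is_primary(PGconn *conn)
    {
        bool      primary = false;
        PGresult *res = PQexec(conn, "SELECT pg_is_in_recovery()");

        if (PQresultStatus(res) == PGRES_TUPLES_OK)
            primary = (strcmp(PQgetvalue(res, 0, 0), "f") == 0);

        PQclear(res);
        return primary;
    }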
In some places we were still providing "false" from the original
implementation, which was intended to indicate whether a negative value
was allowed. This has not been a problem, as it merely means we have been
providing "0", which is the same thing; however we can fine-tune some of
the calls (e.g. node ID must be 1 or greater).
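A sketch of the affected pattern, using an illustrative conversion
helper (not repmgr's actual function) whose final argument is a minimum
value rather than an "allow negative" flag:

    #include <stdio.h>
    #include <stdlib.h>

    /* illustrative helper: parse an integer option, enforcing a minimum */
    static int
    parse_int_option(const char *value, const char *name, int minval)
    {
        int parsed = atoi(value);

        if (parsed < minval)
        {
            fprintf(stderr, "%s must be %d or greater\n", name, minval);
            exit(1);
        }

        return parsed;
    }

    int
    main(void)
    {
        /* previously: parse_int_option(value, name, false), i.e. minimum 0 */
        int node_id = parse_int_option("3", "--node-id", 1);

        printf("node_id = %d\n", node_id);
        return 0;
    }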
The initial implementation was designed to fall back to "manual"
start/stop of repmgrd if the "repmgrd_service_..._command" parameters
were not set.
However, on reflection, this is too much of a potential footgun, so
we will mandate provision of these parameters.
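With this change, repmgr.conf must provide both commands explicitly,
along these lines (the actual service manager invocations will vary
from system to system):

    repmgrd_service_start_command = 'sudo systemctl start repmgrd'
    repmgrd_service_stop_command  = 'sudo systemctl stop repmgrd'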