In a few places, replication connections are generated from the
parameters used by existing connections. This has resulted in a
number of similar blocks of code which do more or less the same
thing, but not quite identically. In two cases, the code failed
to set "dbname=replication", which can cause problems in some
contexts.
These code blocks have now been consolidated into standardized
functions.
This also resolves the issue addressed by GitHub #619.
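For illustration, a connection string generated by the consolidated
functions would take roughly this form (host and user values
hypothetical; the remaining parameters are copied from the existing
connection):

    host=node1 port=5432 user=repmgr dbname=replication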
Within a PostgreSQL data directory, all files should have the same
ownership as the data directory itself. PostgreSQL itself expects
this, and ownership of files by another user is likely to cause
problems.
In PostgreSQL 11 or earlier, if "recovery.conf" cannot be moved
by PostgreSQL (e.g. because it is owned by root), it will not be
possible to promote the standby to primary.
In PostgreSQL 12 and later, if "postgresql.auto.conf" on the demotion
candidate (current primary) has incorrect ownership (e.g. owned by
root), repmgr will very likely not be able to modify this file and
write the replication configuration required for the node to rejoin
the cluster as a standby.
Checks have been added to catch both cases before a switchover is
executed.
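A manual equivalent of the new ownership checks, assuming the data
directory is owned by "postgres" and $PGDATA points to it, might
look like this:

    # list any files in the data directory not owned by the
    # expected user; any output indicates a potential problem
    find "$PGDATA" ! -user postgres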
From PostgreSQL 10, a member of the default roles "pg_monitor" and/or
"pg_read_all_settings" can read pg_settings without requiring superuser
privileges.
Previously, a hint was emitted suggesting the repmgr user be made
a member of one of those roles, but membership itself was never
checked, meaning the check could only be run by a superuser.
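A minimal sketch of granting and then verifying such membership,
assuming the repmgr user is named "repmgr":

    -- make the repmgr user a member of the built-in monitoring role
    GRANT pg_monitor TO repmgr;

    -- verify membership; this query itself requires no superuser
    -- privileges
    SELECT pg_has_role('repmgr', 'pg_monitor', 'MEMBER');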
This enables us to better determine whether a node is definitively
attached, definitively not attached, or in a state where attachment
cannot be determined.
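The WAL receiver state on the standby can be inspected along these
lines (a sketch; the actual query used may differ):

    -- returns a row while a WAL receiver process exists;
    -- status "streaming" indicates the node is attached
    SELECT status FROM pg_stat_wal_receiver;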
If the WAL receiver has been temporarily disabled, we don't want to
wait for it to start up, as it may not be able to do so at that
point; we do however need to reset "wal_retrieve_retry_interval".
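Assuming the interval was previously raised via ALTER SYSTEM (an
assumption for illustration, not necessarily the exact mechanism
used), the reset step would look like this:

    -- restore the default retry interval and signal the server
    -- to reread its configuration
    ALTER SYSTEM RESET wal_retrieve_retry_interval;
    SELECT pg_reload_conf();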
This is intended to ensure that all nodes have a constant LSN while
making the failover decision.
This feature is experimental and needs to be explicitly enabled with the
configuration file option "standby_disconnect_on_failover".
Note that enabling this option will delay the failover decision
until the WAL receiver has been disconnected on all nodes.
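To enable the feature, set the option in repmgr.conf:

    standby_disconnect_on_failover=true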
We only want to check the status of physical replication slots
to determine whether a streaming replication standby has become
detached and there is therefore a risk of uncontrolled WAL buildup
on the local node.
It's not feasible to second-guess the state of logical replication
slots.
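A sketch of the kind of check involved (the actual query may differ):

    -- inactive physical slots mean the server will retain WAL
    -- indefinitely for a standby which may never reconnect
    SELECT slot_name
      FROM pg_replication_slots
     WHERE slot_type = 'physical'
       AND active IS FALSE;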
If replay is paused, we can't really be sure whether more WAL will
be received between the check and the promote operation, which
risks the promote operation not taking place during the switchover
(it would only happen once WAL replay is resumed and any pending
WAL has been replayed). Therefore we simply quit with an
informative set of messages and leave the user to sort the
situation out.
GitHub #540.
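The paused state can be checked, and later resolved by the user,
along these lines (function names as of PostgreSQL 10):

    -- returns true if WAL replay is currently paused
    SELECT pg_is_wal_replay_paused();

    -- resume replay so pending WAL is applied and a subsequent
    -- promote can proceed
    SELECT pg_wal_replay_resume();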
Eventually we'll want this to hold the optional replication info
currently contained in the t_node_info struct, which should then
simply contain a pointer to a ReplInfo struct.
If the rejoin target is not in recovery, but is not registered as
primary (we detect this by attempting to connect to the registered
primary), we abort and suggest fixing the repmgr metadata first.
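The recovery state itself can be established with the standard check:

    -- false on a primary, true on a standby
    SELECT pg_is_in_recovery();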
This commit adds infrastructure for repmgr to be able to check
whether one standby can attach to another node, regardless of
whether that node is a standby or a primary.
This is intended to prevent a node from attempting to follow a
node whose timeline has diverged. The --dry-run option makes
it possible to test a follow operation before it is carried out.
As a useful side-effect this makes it possible for a standby to
follow another standby.
This is an initial implementation; documentation and possibly
further changes to follow.
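For example, to test whether a follow is possible before carrying
it out (configuration file path hypothetical):

    repmgr -f /etc/repmgr.conf standby follow --dry-run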
Most of the time we can simply get the version number directly from
the connection handle. Previously it was held in a global variable,
which was an icky way of doing things.
In a few special cases we also need the actual version string, which
is obtained directly from the database.
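In libpq terms, the number comes from PQserverVersion() on the
connection handle; the string, where needed, can be retrieved with
a query such as (exact query an assumption):

    -- e.g. "PostgreSQL 11.2 on x86_64-pc-linux-gnu ..."
    SELECT version();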
On the rejoined node, if a replication slot for the new upstream exists
(which is typically the case after a failover), delete that slot.
Also emit a warning about any inactive replication slots which may need
to be cleaned up manually.
GitHub #499.
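A sketch of the slot cleanup, with a hypothetical slot name:

    -- drop the slot formerly used by the new upstream when it
    -- was attached to this node
    SELECT pg_drop_replication_slot('repmgr_slot_2');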
Currently the (very generic-sounding) "standby_reconnect_timeout"
configuration file parameter is used in several different contexts,
and it would be useful to have more granular control over the
individual timeouts it configures.
This patch introduces "node_rejoin_timeout", used in place of "standby_reconnect_timeout"
(which wasn't documented) when "repmgr node rejoin" is executed, to determine
how long to wait for the node to rejoin the replication cluster.
Additionally "repmgrd_standby_startup_timeout" is introduced as a timeout for
failover situations, when repmgrd executes "repmgr standby follow" to follow
a new primary, and waits for the standby to restart and become available
for connections.
"standby_reconnect_timeout" is now only relevant for "repmgr standby switchover".
Implements GitHub #454.
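In repmgr.conf terms (values in seconds, chosen purely for
illustration):

    node_rejoin_timeout=60
    repmgrd_standby_startup_timeout=60
    standby_reconnect_timeout=60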
In the default text output mode, list inactive slots.
In CSV output mode, list inactive slots as additional information;
add an output line with the number of missing slots and a list thereof.
Also document the --csv output mode.