repmgr

mirror of https://github.com/EnterpriseDB/repmgr.git synced 2026-03-22 22:56:29 +00:00

Author	SHA1	Message	Date
Ian Barwick	53546b1c88	node rejoin: remove unneeded PQfinish()	2020-06-10 10:36:55 +09:00
Ian Barwick	bc566f7a42	standby check: ignore upstream/downstream connections if node is witness Per report in GitHub #641.	2020-05-08 09:37:30 +09:00
Ian Barwick	38b3447bd3	Add repmgr home page to --help output Per PostgreSQL commit 1933ae629e7b706c6c23673a381e778819db307d it seems to be all the rage these days.	2020-04-24 09:41:56 +09:00
Ian Barwick	45e96f21a5	node check: add option --db-connection This is intended for diagnostic purposes, primarily when diagnosing the connection parameters used when repmgr is being executed on a remote node.	2020-04-15 17:48:23 +09:00
Ian Barwick	cfd35852b7	standby switchover: improve archive check error handling Explicitly log if a database connection failure caused the check to fail. It's unlikely this situation will be encountered, as the data directory check will already have run and checked for connection failure, however there's a small chance the connection could fail between checks.	2020-04-15 14:08:33 +09:00
Ian Barwick	32dde4eaaf	standby switchover: improve directory check failure handling It's possible that the remote data directory check will fail if e.g. connection configuration is not consistent across all nodes. This modification ensures a database connection error is reported, rather than a spurious issue with the data directory configuration.	2020-04-15 14:08:29 +09:00
Ian Barwick	78f89a4d47	node check: report connection error if --optformat provided The --optformat option is intended for use when repmgr is being invoked remotely by another repmgr instance, typically during a switchover operation. Previously no output was returned if the local repmgr was unable to connect to its local PostgreSQL instance, which made diagnosing various corner-case problems trickier than it should be.	2020-04-15 14:08:25 +09:00
Ian Barwick	e59da2d74e	node check: display upstream info before downstream info It makes more sense that way.	2020-03-31 11:08:28 +09:00
Ian Barwick	bffb8fa11b	node check: handle --upstream option when node is primary	2020-03-31 10:49:30 +09:00
Ian Barwick	d9cb38c7f0	node check: add --upstream option We have a --downstream option to check for attached nodes, but it would be useful to have a corresponding --upstream option too. A following patch will adapt the behaviour of this option when executed on the primary node.	2020-03-30 17:54:52 +09:00
Ian Barwick	6895916914	node service: explicitly note the node identity where CHECKPOINT issued This output is logged during "standby switchover", so it's useful to be aware of which node the activity is being performed on.	2020-03-25 14:00:55 +09:00
Ian Barwick	2e9bc31c8c	Consolidate code for establishing a superuser connection	2020-03-25 11:02:31 +09:00
Ian Barwick	325e3ea541	Update "repmgr node" help output	2020-03-25 09:37:49 +09:00
Ian Barwick	2b06f2d1ae	node service: enable provision of the -S/--superuser option This is required to be able to execute a CHECKPOINT if the normal repmgr user is not a superuser.	2020-03-24 17:25:34 +09:00
Ian Barwick	304c1391cc	node action: don't attempt to issue a CHECKPOINT as non-superuser Raise a warning instead.	2020-03-24 17:25:27 +09:00
Ian Barwick	e561ddc8d3	node check: accept -S/--superuser option This is mainly useful for the --data-directory-config option, which requires permission to read pg_settings to verify that the data directory configured in "repmgr.conf" matches the data directory actually in use. If pg_settings read permission is not available, repmgr will fall back to a simple check that the data directory configured in "repmgr.conf" is a valid PostgreSQL directory. This is not entirely foolproof, as it's possible PostgreSQL could be using a different data directory.	2020-03-23 17:14:04 +09:00
Ian Barwick	9de31428f1	Consolidate replication connection code In a few places, replication connections are generated from the parameters used by existing connections. This has resulted in a number of similar blocks of code which do more-or-less the same thing almost but not quite identically. In two cases, the code omitted to set "dbname=replication", which can cause problems in some contexts. These code blocks have now been consolidated into standardized functions. This also resolves the issue addressed by GitHub #619.	2020-03-05 17:21:37 +09:00
Ian Barwick	8f6058c676	standby switchover: check replication configuration file ownership Within a PostgreSQL data directory, all files should have the same ownership as the data directory itself. PostgreSQL itself expects this, and ownership of files by another user is likely to cause problems. In PostgreSQL 11 or earlier, if "recovery.conf" cannot be moved by PostgreSQL (because e.g. it is owned by root), it will not be possible to promote the standby to primary. In PostgreSQL 12 and later, if "postgresql.auto.conf" on the demotion candidate (current primary) has incorrect ownership (e.g. owned by root), repmgr will very likely not be able to modify this file and write the replication configuration required for the node to rejoin the cluster as a standby. Checks added to catch both cases before a switchover is executed.	2020-03-04 17:21:22 +09:00
Ian Barwick	7ed0a99d70	Make code to check standby join status available globally This makes it possible to check the standby join status from another node, e.g. the promotion candidate during a switchover operation.	2020-02-04 12:52:55 +09:00
Ian Barwick	e2a362a171	"node rejoin": check for available replication slots on the rejoin target "standby follow" did this already, but "node rejoin" didn't.	2020-02-03 17:03:20 +09:00
Ian Barwick	4d4ed3bcd6	Remove BDR 2.x support The BDR 2.x support was conceptual only and was never used in production. As BDR 2.x will be EOL'd shortly, there is no risk it will be needed.	2020-01-16 09:52:42 +09:00
Ian Barwick	7fdf2f1778	Update copyright notices to 2020	2020-01-13 14:06:20 +09:00
Ian Barwick	f5044465cb	Add function to safely modify postgresql.auto.conf This is required for PostgreSQL 12 and later.	2019-08-14 16:57:42 +09:00
Ian Barwick	4ebc43fd63	Clean up variable usage in do_node_status() Variable with the same name existed both at function level and within local code blocks.	2019-08-14 14:15:41 +09:00
Ian Barwick	38b373e6df	"node check": check role membership when trying to read pg_settings From PostgreSQL 10, a member of the default roles "pg_monitor" and/or "pg_read_all_settings" can read pg_settings without requiring superuser privileges. Previously, a hint was being emitted about making the repmgr user a member of one of those groups, but no check for membership was being made, meaning the check could only be run by a superuser.	2019-08-07 14:26:48 +09:00
Ian Barwick	3c8bab97d8	Fix variable declarations	2019-05-22 17:26:34 +09:00
Ian Barwick	dd78a16006	Change return type of is_downstream_node_attached() from bool to NodeAttached This enables us to better determine whether a node is definitively attached, definitively not attached, or if it was not possible to determine the attached state.	2019-05-14 15:57:20 +09:00
Ian Barwick	52905f1eb3	Standardize on "ID: %i" when logging node IDs Previously there was a mix of "id:", "node id:", "node ID:" and "node_id:".	2019-04-30 17:07:33 +09:00
Ian Barwick	bb42d8cba6	Fix calculation of maximum filename length	2019-03-28 12:40:29 +09:00
Ian Barwick	fe822a9eea	Prevent potential file descriptor resource leak	2019-03-28 12:29:10 +09:00
Ian Barwick	03cd5a6028	Put closedir call in correct location	2019-03-28 12:08:42 +09:00
Ian Barwick	1e1c596446	Add various missing close() calls	2019-03-28 11:32:25 +09:00
Ian Barwick	314a1e8f4f	use a constant to denote unknown replication lag	2019-03-20 17:26:04 +09:00
Ian Barwick	19bf4d7434	Count witness and zero-priority nodes in visibility check	2019-03-14 11:17:51 +09:00
Ian Barwick	9823978f41	repmgrd: don't wait for WAL receiver to reconnect during failover If the WAL receiver has been temporarily disabled, we don't want to wait for it to start up as it may not be able to at that point; we do however need to reset "wal_retrieve_retry_interval".	2019-03-06 15:54:56 +09:00
Ian Barwick	1615353f48	repmgrd: optionally disconnect WAL receivers during failover This is intended to ensure that all nodes have a constant LSN while making the failover decision. This feature is experimental and needs to be explicitly enabled with the configuration file option "standby_disconnect_on_failover". Note enabling this option will result in a delay in the failover decision until the WAL receiver is disconnected on all nodes.	2019-03-06 15:53:57 +09:00
Ian Barwick	71d151ca87	Don't check status of logical replication slots We only want to check the status of physical replication slots to determine whether a streaming replication standby has become detached and there is therefore a risk of uncontrolled WAL buildup on the local node. It's not feasible to second-guess the state of logical replication slots.	2019-02-23 10:09:43 +09:00
Ian Barwick	de70fd42dc	node check: simplify output generation in --is-shutdown-cleanly check	2019-02-22 10:49:06 +09:00
Ian Barwick	aeb9639ed9	node rejoin: add more log detail during rejoin success check Stating what is actually being checked where might be useful when diagnosing potential issues.	2019-02-13 15:29:39 +09:00
Ian Barwick	790bec21dd	node rejoin: handle case where node to rejoin was primary In that case the minRecoveryPoint* fields may be empty.	2019-02-12 13:31:25 +09:00
Ian Barwick	a0dc673439	"node rejoin": use minRecoveryPointTLI for comparing timelines	2019-02-12 13:31:21 +09:00
Ian Barwick	cce8b76171	"standby switchover": abort if promotion candidate has WAL replay paused If replay is paused, we can't be really sure that more WAL will be received between the check and the promote operation, which would risk the promote operation not taking place during the switchover (it would happen as soon as WAL replay is resumed and pending WAL is replayed). Therefore we simply quit with an informative slew of messages and leave the user to sort it out. GitHub #540.	2019-02-05 16:32:39 +09:00
Ian Barwick	f9a1861ded	Refactor ReplInfo struct handling Eventually we'll want to have this contain the optional replication info contained in the t_node_info struct, which should then contain a pointer to a ReplInfo struct.	2019-02-02 18:39:24 +09:00
Ian Barwick	20b79f998c	Define some previously magic numbers	2019-02-01 19:14:16 +09:00
Ian Barwick	64bb034d34	"node rejoin": catch corner case where repmgr metadata is outdated If the rejoin target is not in recovery, but not registered as primary (we detect this by attempting to connect to the registered primary) we abort and suggest fixing the repmgr metadata first.	2019-01-31 11:54:05 +09:00
Ian Barwick	59eca2be30	node rejoin: improve error code handling - return ERR_REJOIN_FAIL in all cases where the rejoin operation fails - ensure ERR_FOLLOW_FAIL is not returned - document error codes	2019-01-24 10:31:45 +09:00
Ian Barwick	dfe57d2406	"node rejoin": log pg_rewind command as DETAIL rather than DEBUG	2019-01-23 17:15:07 +09:00
Ian Barwick	061932d023	"node rejoin": verify status of rejoin target This adapts the code previously added to "standby follow" to verify whether the rejoin target can actually be rejoined.	2019-01-23 17:08:55 +09:00
Ian Barwick	3f5762e03a	Refactor upstream attachment check code Move it from the "standby follow" code to an independent function so it can be used in other contexts, e.g. "node rejoin".	2019-01-23 15:11:42 +09:00
Ian Barwick	42fa9a2a88	Log node rejoin failure as ERROR	2019-01-23 13:55:40 +09:00

147 Commits