repmgr

mirror of https://github.com/EnterpriseDB/repmgr.git synced 2026-03-22 22:56:29 +00:00

Author	SHA1	Message	Date
Ian Barwick	09979eaa91	note that "standby follow" requires a primary to be available While it's technically possible to have a standby follow another standby while the primary is not available, repmgr will not be able to update its metadata, which will cause Confusion and Chaos. Update the documentation to make this clear, and provide a more helpful error message if this situation occurs. The operation previously failed anyway, but with an unhelpful message about not being able to find a node record.	2019-06-11 15:14:17 +09:00
Ian Barwick	341421f8e0	standby follow: remove some ineffective code For some reason we were taking the trouble to extract an application_name from the local node's conninfo, but this was being subsequently overwritten with the node name (which is what we want anyway).	2019-06-06 12:12:33 +09:00
Ian Barwick	f5d29f6591	doc: update release notes	2019-06-06 11:30:30 +09:00
Ian Barwick	c0ea5ffa04	Ensure parsed value of --upstream-conninfo is written to recovery.conf Previously it was being parsed (a step which ensures any "application_name" set by the caller is changed to the node name), but the original string was being copied to "primary_conninfo" anyway.	2019-06-06 11:30:24 +09:00
Ian Barwick	45e17223b9	Update variable/field names relating to pg_basebackup's -X option Now the "xlog nomenclature" Pg versions are fading into the past, rename things related to handling pg_basebackup's -X option (was: --xlog-method, now: --wal-method) to start with "wal_" rather than "xlog_". This is a cosmetic change for code clarity.	2019-05-30 09:32:06 +09:00
Ian Barwick	c153e2fc02	standby clone: improve --dry-run output Log positive check results as an additional confirmation that the upstream configuration appears to be correct.	2019-05-28 00:54:39 +09:00
Ian Barwick	44a39760a1	standby clone: improve source node replication connection check Previously, the check was attempting to make replication connections to the source node, and if these were failing, inferring that insufficient walsenders were available. However it's quite likely that the connections are refused due to insufficient user connection permissions. So before performing the connection check, query the number of potentially available walsenders on the source node and compare it with the number required (either 1 or 2) - if insufficient, exit with error and hint about increasing "max_wal_senders". Once we've established sufficient walsenders are available, inability to connect is most likely related to permissions issues on the source node.	2019-05-28 00:11:53 +09:00
Ian Barwick	b959f771c1	Improve naming/usage of node record variables in "standby clone" Make it clearer we're dealing with the upstream node record. Also avoid "overloading" the upstream record when checking for an existing record with the same node name; this was not technically a problem but mildly confusing when reading the code.	2019-05-27 23:39:49 +09:00
Ian Barwick	c9e85996f5	repmgr: prevent a standby being cloned from a witness server Previously repmgr would happily clone from whatever server it found at the provided source server address. We should ensure that a standby can only be cloned from a node which is part of the main replication cluster. This check fetches a list of nodes from the source server, connects to the first non-witness server it finds, and compares the system identifiers of the source node and the node it has connected to. If there is a mismatch, then the source server is clearly not part of the main replication cluster, and is most likely the witness server.	2019-05-22 16:52:25 +09:00
Ian Barwick	dd78a16006	Change return type of is_downstream_node_attached() from bool to NodeAttached This enables us to better determine whether a node is definitively attached, definitively not attached, or if it was not possible to determine the attached state.	2019-05-14 15:57:20 +09:00
Ian Barwick	d8e4c54ea4	"standby switchover": add "--repmgrd-force-unpause" Implements GitHub #559.	2019-05-10 16:04:07 +09:00
Ian Barwick	4b37562444	Make it clearer that a witness node counts as a "sibling node" It's not attached to the primary per se, but needs to know what the current primary is in order to correctly synchronise its copy of the metadata. Per GitHub #560.	2019-05-02 14:22:53 +09:00
Ian Barwick	fed09ecaae	standby promote: have former siblings follow new primary	2019-05-02 12:04:49 +09:00
Ian Barwick	98d09f83b5	standby (promote|switchover): improve --dry-run functionality Continue checks as far as possible.	2019-05-02 12:04:43 +09:00
Ian Barwick	7bbe938e19	Separate promotion candidate walsender/slot checks into discrete functions For use by "standby promote" as well as "standby follow"	2019-05-02 12:04:40 +09:00
Ian Barwick	63c7f758c3	Remove unneeded server version number variables No need to pass these around.	2019-05-02 12:04:33 +09:00
Ian Barwick	b9f07f6a91	standby promote: use variable name "local_conn" for the local connection handle This is consistent with usage in other functions, and makes it easier to differentiate between the local node connection and the primary connection.	2019-05-02 12:04:26 +09:00
Ian Barwick	e4615b4666	Refactor code for executing --siblings-follow This will enable provision of "--siblings-follow" to "repmgr standby promote"	2019-05-02 12:04:15 +09:00
Ian Barwick	52905f1eb3	Standardize on "ID: %i" when logging node IDs Previously there was a mix of "id:", "node id:", "node ID:" and "node_id:".	2019-04-30 17:07:33 +09:00
Ian Barwick	89a7261483	Always quote node names in log messages	2019-04-30 15:52:56 +09:00
Ian Barwick	e32acda8c0	standby switchover: ignore nodes which are unreachable and marked as inactive Previously "repmgr standby switchover" would abort if any node was unreachable, as that means it was unable to check if repmgrd is running. However if the node has been marked as inactive in the repmgr metadata, it's reasonable to assume the node is no longer part of the replication cluster and does not need to be checked.	2019-04-29 14:35:49 +09:00
Ian Barwick	ad28cf95bd	standby register: add upstream node ID in event details	2019-04-16 11:01:22 +09:00
Ian Barwick	dd454a8374	Miscellaneous string handling cleanup This is mainly to prevent effectively spurious truncation warnings in recent GCC versions.	2019-04-10 16:18:56 +09:00
Ian Barwick	77b9887d61	standby clone: improve --dry-run behaviour in barman mode - emit additional informational output - ensure that provision of --force does not result in an existing data directory being modified in any way	2019-04-08 15:12:22 +09:00
Ian Barwick	67e977592c	standby switchover: list nodes which will remain attached to the old primary If --siblings-follow is not supplied, list all nodes which repmgr considers to be siblings (this will include the witness server, if in use), and which will remain attached to the old primary.	2019-04-02 10:46:59 +09:00
Ian Barwick	79613af8d0	Handle potential NULL return from string_skip_prefix()	2019-03-28 12:45:53 +09:00
Ian Barwick	e44c048ae2	Update code comment	2019-03-28 12:44:30 +09:00
Ian Barwick	1e1c596446	Add various missing close() calls	2019-03-28 11:32:25 +09:00
Ian Barwick	ba1f05ece9	Restrict "node_name" to maximum 63 characters In "recovery.conf", the configuration parameter "node_name" is used as the "application_name" value, which will be truncated by PostgreSQL to 63 characters (NAMEDATALEN - 1). repmgr sometimes needs to be able to extract the application name from pg_stat_replication to determine if a node is connected (e.g. when executing "repmgr standby register"), so the comparison will fail if "node_name" exceeds 63 characters.	2019-03-28 10:37:57 +09:00
Ian Barwick	73ad689390	standby register: fail if --upstream-node-id is the local node ID	2019-03-27 14:22:55 +09:00
Ian Barwick	6f0f338968	standby follow: set replication user when connecting to local node	2019-03-21 16:43:39 +09:00
Ian Barwick	bd26eb3025	standby switchover: don't attempt to pause repmgrd on unreachable nodes	2019-03-21 13:48:59 +09:00
Ian Barwick	314a1e8f4f	use a constant to denote unknown replication lag	2019-03-20 17:26:04 +09:00
Ian Barwick	46efe57cd0	Improve database connection failure logging Log the output of PQerrorStatus() in a couple of places where it was missing. Additionally, always log the output of PQerrorStatus() starting with a blank line, otherwise the first line looks like it was emitted by repmgr, and it's harder to scan the error message.
    Before:
    [2019-03-20 11:24:15] [DETAIL] could not connect to server: Connection refused
        Is the server running on host "localhost" (::1) and accepting
        TCP/IP connections on port 5501?
    could not connect to server: Connection refused
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 5501?
    After:
    [2019-03-20 11:27:21] [DETAIL]
    could not connect to server: Connection refused
        Is the server running on host "localhost" (::1) and accepting
        TCP/IP connections on port 5501?
    could not connect to server: Connection refused
        Is the server running on host "localhost" (127.0.0.1) and accepting
        TCP/IP connections on port 5501?
	2019-03-20 11:47:28 +09:00
Ian Barwick	19bf4d7434	Count witness and zero-priority nodes in visibility check	2019-03-14 11:17:51 +09:00
Ian Barwick	b1875a8d91	Split command execution functions into separate library These may need to be executed by repmgrd.	2019-02-27 14:41:17 +09:00
Ian Barwick	0578053875	standby clone: check upstream connections after data copy operation With long-running copy operations, it's possible the connection(s) to the primary/source server may go away for some reason, so recheck their availability before attempting to reuse.	2019-02-26 14:37:05 +09:00
Ian Barwick	99550b91bd	standby register: warn if standby is running and connection params provided Addresses GitHub #552.	2019-02-22 10:31:00 +09:00
Ian Barwick	f3fc4e5afb	Minor syntax formatting tweak For consistency.	2019-02-15 19:58:35 +09:00
Ian Barwick	7fad2ed2c8	standby switchover: improve error output It wasn't clear why repmgr thinks the demotion candidate is not the upstream of the promotion candidate.	2019-02-14 17:22:24 +09:00
Ian Barwick	464ec6bec3	Ensure conninfo param list is initialized for --recovery-conf-only option	2019-02-06 12:58:09 +09:00
Ian Barwick	3bbbf6daa9	"recovery_file_path" is MAXPGPATH	2019-02-06 10:42:09 +09:00
Ian Barwick	cd3312496e	Rename functions which return an LSN for clarity	2019-02-06 09:32:53 +09:00
Ian Barwick	cce8b76171	"standby switchover": abort if promotion candidate has WAL replay paused If replay is paused, we can't be sure that no more WAL will be received between the check and the promote operation, which would risk the promote operation not taking place during the switchover (it would happen as soon as WAL replay is resumed and pending WAL is replayed). Therefore we simply quit with an informative slew of messages and leave the user to sort it out. GitHub #540.	2019-02-05 16:32:39 +09:00
Ian Barwick	2a529e7e8b	"standby promote": don't promote if replay paused and in archive recovery It does not appear feasible to predict if there is still WAL waiting to be replayed from archive. In this case take no action. GitHub #540.	2019-02-05 14:39:08 +09:00
Ian Barwick	701944c194	"standby promote": add check for WAL replay status if replay is paused If WAL replay is paused but WAL is still pending replay, PostgreSQL will ignore the promote request until WAL replay is unpaused. This may lead to the standby being promoted at an unpredictable point in time outside of repmgr's control. Moreover it may not be obvious that this is happening, or why, and it will appear that an apparently successful promotion attempt has not actually worked. To prevent this from happening, repmgr will now refuse to promote the standby if WAL replay is paused and WAL is still pending replay. GitHub #540.	2019-02-05 13:30:37 +09:00
Ian Barwick	f9a1861ded	Refactor ReplInfo struct handling Eventually we'll want to have this contain the optional replication info contained in the t_node_info struct, which should then contain a pointer to a ReplInfo struct.	2019-02-02 18:39:24 +09:00
Ian Barwick	b9ba97a36d	"standby switchover": check replication connection to upstream Ensure repmgr checks the standby (promotion candidate) is currently attached to the primary (demotion candidate). Addresses issue reported in GitHub #519.	2019-02-01 15:28:06 +09:00
Ian Barwick	9273e7af73	"standby switchover": avoid potential race condition with WAL location check Immediately after the demotion candidate (primary) has shut down, we can't be absolutely sure that the walreceiver has flushed all WAL to disk, so checking pg_last_wal_receive_lsn() at that point might not reflect the actual last available WAL location. To handle this, we'll loop for a while (timeout controlled by configuration parameter "wal_receive_check_timeout") before finally deciding whether the standby is still behind the shut-down primary. Addresses issue raised in GitHub #518.	2019-02-01 12:06:22 +09:00
Ian Barwick	d7420d7274	daemon (start|stop): verify that repmgrd starts/stops. Note this may not always be possible for "daemon stop" if we are unable to determine the repmgrd PID.	2019-01-30 14:41:31 +09:00
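The 63-character limit on "node_name" (commit ba1f05ece9) comes down to simple arithmetic: PostgreSQL's NAMEDATALEN is 64 in a default build, so the application_name reported in pg_stat_replication is cut to NAMEDATALEN - 1 = 63, and a longer name can never match it. The Python sketch below illustrates that failure mode. It assumes single-byte (ASCII) names, since PostgreSQL actually clips by bytes, and every helper name in it is hypothetical rather than part of repmgr.

```python
# Illustrative sketch only: why a "node_name" longer than 63 characters breaks
# the attachment check. Helper names are hypothetical, not repmgr functions.
# Assumes ASCII names (PostgreSQL clips application_name by bytes).

NAMEDATALEN = 64                     # PostgreSQL compile-time default
MAX_NODE_NAME_LEN = NAMEDATALEN - 1  # 63


def reported_application_name(application_name: str) -> str:
    """Approximate what pg_stat_replication shows: the value clipped to 63 characters."""
    return application_name[:MAX_NODE_NAME_LEN]


def node_is_listed(node_name: str, replication_app_names: list[str]) -> bool:
    """Naive attachment check: look for node_name among the reported application names."""
    return node_name in replication_app_names


def validate_node_name(node_name: str) -> None:
    """Reject names PostgreSQL would truncate, mirroring the restriction in the commit."""
    if len(node_name) > MAX_NODE_NAME_LEN:
        raise ValueError(
            f'"node_name" is {len(node_name)} characters; maximum is {MAX_NODE_NAME_LEN}'
        )
```

With a 70-character name, `node_is_listed(name, [reported_application_name(name)])` is false even though the node is connected, which is exactly why rejecting such names up front is simpler than trying to match truncated values.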

295 Commits
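Commit 9273e7af73 describes polling the standby for a while, bounded by the "wal_receive_check_timeout" parameter, before deciding that it is still behind the shut-down primary. The Python sketch below shows only the shape of such a loop and is not repmgr's implementation. `get_receive_lsn` is a hypothetical stand-in for running `SELECT pg_last_wal_receive_lsn()` on the standby, `target_lsn` for the last WAL location of the shut-down primary, and the one-second polling interval is an assumption.

```python
import time
from typing import Callable


def parse_lsn(lsn: str) -> int:
    """Convert PostgreSQL's textual LSN (e.g. "16/B374D848") to a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wait_for_wal_receipt(
    get_receive_lsn: Callable[[], str],   # hypothetical: returns pg_last_wal_receive_lsn()
    target_lsn: str,                      # hypothetical: primary's final WAL location
    timeout_s: float,                     # plays the role of wal_receive_check_timeout
    interval_s: float = 1.0,              # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the standby has received WAL up to target_lsn, or give up after timeout_s."""
    target = parse_lsn(target_lsn)
    waited = 0.0
    while True:
        if parse_lsn(get_receive_lsn()) >= target:
            return True   # walreceiver has caught up
        if waited >= timeout_s:
            return False  # still behind after the timeout: treat the standby as lagging
        sleep(interval_s)
        waited += interval_s
```

Injecting `sleep` keeps the loop testable without real delays. A real check would also have to handle `pg_last_wal_receive_lsn()` returning NULL (for example when streaming has not started), which this sketch ignores.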