repmgr

mirror of https://github.com/EnterpriseDB/repmgr.git synced 2026-03-23 15:16:29 +00:00

Author	SHA1	Message	Date
Ian Barwick	32b81e7d49	"daemon start": initial implementation	2019-01-29 13:01:14 +09:00
Ian Barwick	1980deb480	repmgrd: check for a change to the upstream node If the upstream node has changed, for example after "repmgr standby follow" was manually executed, restart monitoring to ensure repmgrd is monitoring the correct node.	2019-01-22 13:33:13 +09:00
Ian Barwick	7dce3ed234	Update copyright notices to 2019	2019-01-21 14:54:35 +09:00
Ian Barwick	d4e993a240	Improve handling of connection URIs when executing remote commands Previously, if connection URIs were in use and "repmgr standby switchover" was executed, repmgr would pass the connection URI as-is to the demotion candidate to execute "repmgr node rejoin". However the presence of unescaped ampersands in the connection URI was causing the rejoin command to be incorrectly executed. Addresses GitHub #525.	2019-01-14 11:11:51 +09:00
Ian Barwick	81eb9d99e7	Add missing comma	2019-01-08 11:44:32 +09:00
Ian Barwick	40408a1734	repmgrd: check binary and extension major versions match repmgr requires that the same "major version" (e.g. 4.3) is present on all nodes, otherwise - particularly in the case of repmgrd - it's highly likely things won't work as expected. Implements part of GitHub #515.	2019-01-07 15:39:40 +09:00
Ian Barwick	66b40ffc68	Simplify function create_replication_slot() Following the changes in `793d83b`, it's no longer necessary to pass the server version number.	2018-11-29 14:35:01 +09:00
Ian Barwick	b498db87aa	Remove redundant function declaration	2018-11-28 13:51:14 +09:00
Ian Barwick	793d83b22c	Refactor server version detection Most of the time we can simply get the version number directly from the connection handle. Previously it was held in a global variable, which was an icky way of doing things. In a few special cases we also need the actual version string, which is obtained directly from the database.	2018-11-22 21:30:31 +09:00
Ian Barwick	0f4e04e61e	Add function get_current_lsn() This is a somewhat convoluted attempt to retrieve the current LSN of any node, regardless of whether in recovery or not, and if in recovery, independent of whether streaming or recovering from archive.	2018-11-22 19:31:49 +09:00
Ian Barwick	80a280cbf4	Add function get_timeline_history() This will be required for verifying whether one node is able to follow another node.	2018-11-22 15:26:50 +09:00
Ian Barwick	0caec90d81	repmgrd: set primary last seen	2018-11-21 11:30:27 +09:00
Ian Barwick	c3bc5585d9	Add sanity check for extension version This should cover the cases where the "repmgr" extension was installed manually but not updated, or an upgrade was not fully completed.	2018-10-31 11:16:36 +09:00
Ian Barwick	c336e384ab	Support "pg_promote()" function (PostgreSQL 12 and later) This is an experimental feature.	2018-10-26 11:02:45 +09:00
Ian Barwick	bc1956dee9	Formatting standardization	2018-10-26 10:42:13 +09:00
Ian Barwick	2491b8ae52	Add functionality to "pause" repmgrd In some circumstances, e.g. while performing a switchover, it is essential that repmgrd does not take any kind of failover action, as this will put the cluster into an incorrect state. Previously it was necessary to stop repmgrd on all nodes (or at least those nodes which repmgrd would consider as promotion candidates), however this is a cumbersome and potentially risk-prone operation, particularly if the replication cluster contains more than a couple of servers. To prevent this issue from occurring, this patch introduces the ability to "pause" repmgrd on all nodes with a single command ("repmgr daemon pause") which notifies repmgrd not to take any failover action until the node is "unpaused" ("repmgr daemon unpause"). "repmgr daemon status" provides an overview of each node and whether repmgrd is running, and if so whether it is paused. "repmgr standby switchover" has been modified to automatically pause repmgrd while carrying out the switchover. See documentation for further details.	2018-09-27 16:42:10 +09:00
Ian Barwick	688337dec3	repmgr: add "--node-id" option to "cluster cleanup" Implements GitHub #493.	2018-09-25 15:56:40 +09:00
Ian Barwick	b0a2ee2259	get_all_node_records(): display any error encountered and return success status In many cases we'll want to bail out with an error if the node list can't be retrieved for any reason. This saves some repetitive coding.	2018-09-13 10:14:43 +09:00
Ian Barwick	17e75f6b31	repmgrd: improve reconnection handling Previously, if the server being monitored was not available, repmgrd would always close the existing connection handle and open a new one. However, in some cases, e.g. a brief network outage, the existing connection handle is still good and does not need to be reopened. This could be particularly problematic if monitoring_history is on, as this risks leaving orphan sessions on the primary which (given a sufficiently unstable network) could lead to all available backends being occupied. Instead, during an outage we now use a new connection to verify the server is accessible; if the old connection is still available (e.g. following a short network interruption) we continue using that; if not (e.g. the server was restarted), we use the new one.	2018-08-30 15:46:08 +09:00
Ian Barwick	7ecfb333b9	doc: add note about switchover and exclusive backups Also rename server_not_in_exclusive_backup_mode() to avoid double negatives. GitHub #476.	2018-07-19 16:02:31 +09:00
Martín Marqués	8f13a66aaa	Check that there is no exclusive backup taking place while we perform a switchover. We've found that this can cause some issues with postgres control metadata (could be a postgres bug) so the best thing is not to switch over if there's a backup taking place. It's also a bad idea from an architectural point of view, as a switchover is supposed to be planned, so why perform it when we are taking backups? GitHub #476.	2018-07-19 16:02:21 +09:00
Ian Barwick	fcf237fe31	node status: improve output and documentation In the default text output mode, list inactive slots. In CSV output mode, list inactive slots as additional information; add output line with number of missing slots and a list thereof. Also document --csv output mode.	2018-06-22 11:46:50 +09:00
Ian Barwick	836d2125fe	Improve BDR3 node query We can get everything we need from bdr.node_summary	2018-06-15 14:30:06 +09:00
Ian Barwick	bf0d67c60a	Add repmgr.nodes to the BDR replication set	2018-06-15 14:29:08 +09:00
Ian Barwick	108c3a36fb	Enable creation of repmgr extension on BDR3 node	2018-06-15 14:26:47 +09:00
Ian Barwick	8377704596	Convert BDR query functions to handle BDR2/BDR3	2018-06-15 14:26:07 +09:00
Ian Barwick	276239422b	standby clone: don't assume existence of "user" in upstream conninfo Usually a separate user (typically "repmgr") is set up specifically to manage the repmgr metadata, however there's no compelling requirement to do this, and it's possible the database owner (usually: "postgres") will be used, in which case it's possible the username will be left out of the conninfo string. Addresses GitHub #437.	2018-05-24 15:52:51 +09:00
Ian Barwick	4455ded935	repmgrd: prevent standby connection handle from going stale If monitoring history is not in use, there's no activity on the standby's connection handle, so if e.g. the standby is restarted, PQstatus() never returns CONNECTION_BAD and repmgrd never notices the connection is stale. Therefore execute a throw-away statement at "monitor_interval_secs".	2018-04-24 21:56:52 +09:00
Ian Barwick	1e1b4b1a65	"standby register/follow": provide primary node details for event notifications For events generated by these commands, it may be useful to know details of the primary node. This makes the following additional parameters available to event notification scripts: - %p: node ID of the primary - %a: node name of the primary - %c: conninfo string for the primary Implements GitHub #375	2018-04-03 14:32:19 +09:00
Ian Barwick	3ccf1cf182	Enable pg_rewind to be used with PostgreSQL 9.3/9.4 pg_rewind is not part of the core distribution for those, but we provided support in repmgr 3.3 so should extend it to repmgr 4. Note that there is no check in place whether the pg_rewind binary exists, so it's up to the user to ensure it's present. Addresses GitHub #413.	2018-04-02 20:54:29 +09:00
Ian Barwick	a403da67bc	Consolidate connection closure calls	2018-03-27 16:43:59 +09:00
Ian Barwick	0219f4c91f	Always set "connect_timeout" when pinging a PostgreSQL instance Insert "connect_timeout=2" into the connection parameters, if not explicitly set by the user. This will prevent excessive wait time for the host operating system to report a connection timeout.	2018-03-21 11:48:57 +09:00
Ian Barwick	d7702b3444	Correctly handle error message pointer when parsing strings. When parsing conninfo strings, ensure the error message pointer is actually returned to the caller. Not a critical issue, just meant the contents of the error message were not being displayed.	2018-03-10 14:29:12 +09:00
Ian Barwick	9981ede1af	"standby clone": fix --superuser handling get_superuser_connection() was erroneously using the local node record to connect to as a superuser, which works when registering the primary but obviously not when cloning a standby. Addresses GitHub #380.	2018-03-02 16:43:19 +09:00
Ian Barwick	29cb153643	"node status": improve replication slot warnings Addresses GitHub #385	2018-02-23 11:19:33 +09:00
Ian Barwick	c644ddde51	Fix typo in function name	2018-02-22 15:50:57 +09:00
Ian Barwick	22b3a74fa0	repmgrd: improve detection of status change from primary to standby If repmgrd is running in degraded mode on a primary which has been stopped, then manually brought back online as a standby (e.g. by creating recovery.conf and starting the server), ensure it not only detects the change but automatically updates the node record so it can resume monitoring the node as a standby. Previously, repmgrd was looping waiting for the record to be updated (as is done transparently when executing "repmgr node rejoin") but if the record was not updated within the timeout period (e.g. by "repmgr standby register") it would fail to resume monitoring as a standby. It seems reasonable to have repmgrd automatically update the node record, as this will restore failover capability as quickly as possible. If this is not desired, then the onus is on the user to shut down repmgrd while making the desired changes.	2018-02-22 15:50:45 +09:00
Ian Barwick	6b7f6089ba	"node status": add warning about missing replication slots Implements GitHub #364.	2018-02-12 11:38:27 +09:00
Ian Barwick	927bf038a0	"standby switchover": check demotion candidate can make replication connection Check it's actually possible for the demotion candidate to attach to the promotion candidate before executing the switchover. As with other checks of this nature, there's a faint possibility the situation could change between the time the check is carried out and the demotion candidate is restarted to connect to the promotion candidate, but there's not a lot we can do about that. The main purpose is to be able to catch existing misconfigurations before anything gets changed. Implements GitHub #370.	2018-02-09 10:00:54 +09:00
Ian Barwick	657ed83921	"cluster show": improve handling of database errors In particular, if running "repmgr cluster show" against a database without the repmgr metadata, showing the error (rather than just "no records found" etc.) will provide some clues about the problem.	2018-02-05 10:35:56 +09:00
Ian Barwick	6c81e54f76	"standby follow": check for replication slot availability on target node	2018-02-02 17:18:43 +09:00
Ian Barwick	8fd0c4ad83	repmgr: assume node is actually shutting down if pingable and that's the reported status	2018-01-12 21:53:37 +09:00
Ian Barwick	7ccae6c2b1	repmgr: automatically create slot name if missing It's possible that a node was registered with "use_replication_slots=false" but that was later changed to "use_replication_slots=true". If the node was not subsequently re-registered, the node record will contain an empty slot name, which will cause any slot creation operation during "standby follow" or "node rejoin" to fail. To prevent this happening, check for an empty slot name and automatically set it before proceeding. Addresses GitHub #343.	2018-01-11 14:47:50 +09:00
Ian Barwick	61d46172b9	repmgr: catch possible corner case when checking node shutdown status It's conceivable that PQping is returning "no response" but the shutdown hasn't quite completed.	2018-01-10 15:09:21 +09:00
Ian Barwick	810471b2f2	repmgr: during switchover, correctly detect unclean shutdown status	2018-01-10 12:25:16 +09:00
Ian Barwick	5bd8cf958a	repmgr standby switchover: add "%p" event notification parameter This will contain the node ID of the former primary.	2018-01-10 12:25:12 +09:00
Ian Barwick	fcb7e7a29b	"repmgr bdr register": create missing connection replication set if needed Previously the assumption was that the "repmgr" replication set would be set up when the nodes are created, however no checks were implemented and this was not well-documented. Addresses GitHub #347.	2018-01-04 17:46:49 +09:00
Ian Barwick	26e404b1f3	"repmgr bdr register": improve node name check We'll use "bdr.bdr_get_local_node_name()" to check the local BDR node name and the repmgr one match.	2018-01-04 17:46:44 +09:00
Ian Barwick	cad12b1fb7	"repmgr cluster event": move query to dbutils.c	2018-01-04 14:55:46 +09:00
Ian Barwick	26a9e848fd	Update copyright notices to 2018	2018-01-02 10:19:46 +09:00

149 Commits