Previously repmgrd would refuse to start if the primary was not reachable,
the thinking being that it's pointless trying to monitor an incomplete
cluster.
However, following an aborted failover situation, repmgrd will restart
monitoring, and on the witness server this will lead to it aborting
itself due to the continuing absence of the primary.
To resolve this, witness repmgrd will now start monitoring in degraded
mode if no primary is found, in the hope that a primary will reappear at
some point.
As of PostgreSQL commit d9dd406f, variable-length arrays are no longer
permitted. As they're not actually required anyway, just define appropriate
constants.
Also noted in GitHub #510.
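A minimal sketch of the kind of change involved (names and sizes are
illustrative, not the actual repmgr code):

    #include <stdio.h>

    #define MAXLEN 1024             /* compile-time constant */

    int
    main(void)
    {
        const char *name = "node1";

        /* before: char buf[strlen(name) + 1];  -- a VLA, now disallowed */
        char buf[MAXLEN];           /* after: fixed-size buffer */

        snprintf(buf, MAXLEN, "%s", name);
        puts(buf);
        return 0;
    }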
In some circumstances, e.g. while performing a switchover, it is essential
that repmgrd does not take any kind of failover action, as this will put
the cluster into an incorrect state.
Previously it was necessary to stop repmgrd on all nodes (or at least
those nodes which repmgrd would consider as promotion candidates);
however, this is a cumbersome and potentially risk-prone operation,
particularly if the replication cluster contains more than a couple of
servers.
To prevent this issue from occurring, this patch introduces the ability
to "pause" repmgrd on all nodes with a single command ("repmgr daemon pause"),
which notifies repmgrd not to take any failover action until the node
is "unpaused" ("repmgr daemon unpause").
"repmgr daemon status" provides an overview of each node and whether repmgrd
is running, and if so whether it is paused.
"repmgr standby switchover" has been modified to automatically pause repmgrd
while carrying out the switchover.
See documentation for further details.
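A typical maintenance sequence using the new commands might look like
this (illustrative; see the documentation for full details):

    repmgr daemon status     # check whether repmgrd is running/paused
    repmgr daemon pause      # suspend failover actions on all nodes
    # ... perform maintenance ...
    repmgr daemon unpause    # resume normal failover behaviour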
Previously, if the server being monitored was not available, repmgrd
would always close the existing connection handle and open a new one.
However, in some cases, e.g. a brief network outage, the existing
connection handle is still good and does not need to be reopened.
This could be particularly problematic if monitoring_history is on,
as this risks leaving orphan sessions on the primary which (given
a sufficiently unstable network) could lead to all available backends
being occupied.
Instead, during an outage we now use a new connection to verify
the server is accessible; if the old connection is still available
(e.g. following a short network interruption) we continue using that;
if not (e.g. the server was restarted), we use the new one.
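A minimal sketch of this logic using libpq (function and variable names
are illustrative, not the actual repmgrd code):

    #include <libpq-fe.h>

    /* Return the handle to keep using; probe with a fresh connection
     * rather than unconditionally discarding the old one. */
    PGconn *
    check_node_connection(PGconn *old_conn, const char *conninfo)
    {
        PGconn     *new_conn = PQconnectdb(conninfo);
        PGresult   *res;

        if (PQstatus(new_conn) != CONNECTION_OK)
        {
            /* node genuinely unreachable; keep the old handle and wait */
            PQfinish(new_conn);
            return old_conn;
        }

        /* node is reachable again; did the old handle survive? */
        res = PQexec(old_conn, "SELECT 1");

        if (PQresultStatus(res) == PGRES_TUPLES_OK)
        {
            /* e.g. brief network interruption: keep the old connection */
            PQclear(res);
            PQfinish(new_conn);
            return old_conn;
        }

        /* e.g. server restart: old handle is dead, use the new one */
        PQclear(res);
        PQfinish(old_conn);
        return new_conn;
    }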
Add more granular logging to help diagnose issues, and also keep track
of when the monitoring statistics were last updated, emitting that
as DETAIL every time we emit a log status update.
It's unlikely we'll get an error in these cases, but you never know.
Also, with queries which return a list of node records, it's necessary
to call _populate_node_records() even if the query fails, so a properly
initialised (albeit empty) list is returned to the caller.
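A simplified sketch of that pattern (hypothetical types and simplified
signatures; the real repmgr structures differ):

    #include <libpq-fe.h>
    #include <stdbool.h>

    typedef struct NodeList { int node_count; /* ... */ } NodeList;

    static void
    populate_node_list(PGresult *res, NodeList *list)
    {
        /* initialise unconditionally, so callers never see garbage */
        list->node_count =
            (res != NULL && PQresultStatus(res) == PGRES_TUPLES_OK)
            ? PQntuples(res) : 0;
        /* ... copy row data when present ... */
    }

    static bool
    get_node_records(PGconn *conn, NodeList *list)
    {
        PGresult *res = PQexec(conn, "SELECT node_id FROM repmgr.nodes");
        bool      ok = (PQresultStatus(res) == PGRES_TUPLES_OK);

        /* called even on failure, so an empty but valid list is returned */
        populate_node_list(res, list);
        PQclear(res);
        return ok;
    }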
Previously query texts were always logged at log level DEBUG, but
that doesn't help much in a normal production environment when
trying to identify the cause of issues.
Also make various other minor improvements to query logging and
handling of database errors.
Implements GitHub #498.
Some distributions may add extra information to PG_VERSION after
the actual version number (e.g. "10.4 (Debian 10.4-2.pgdg90+1)"), so
copy the version number string up until the first space is found.
GitHub #490.
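A self-contained sketch of the parsing logic (buffer size illustrative):

    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
        const char *pg_version = "10.4 (Debian 10.4-2.pgdg90+1)";
        char        version[64];
        size_t      len = strcspn(pg_version, " \n");

        if (len >= sizeof(version))
            len = sizeof(version) - 1;

        memcpy(version, pg_version, len);
        version[len] = '\0';

        puts(version);              /* prints "10.4" */
        return 0;
    }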
Check whether a backup is in progress before performing a switchover.
We've found that switching over during a backup can cause some issues
with postgres control metadata (could be a postgres bug), so the safest
thing is *not* to perform a switchover if there's a backup taking place.
It's also a bad idea from an architectural point of view, as a switchover
is supposed to be planned, so there's no reason to perform one while a
backup is being taken.
GitHub #476.
In the default text output mode, list inactive slots.
In CSV output mode, list inactive slots as additional information, and
add an output line with the number of missing slots and a list thereof.
Also document --csv output mode.
Usually a separate user (typically "repmgr") is set up specifically to manage
the repmgr metadata; however, there's no compelling requirement to do this, and
it's possible the database owner (usually "postgres") will be used, in which
case the username may be left out of the conninfo string.
Addresses GitHub #437.
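For example, a conninfo string like the following (illustrative) contains
no "user" element, in which case libpq falls back to the default username:

    host=node1 dbname=repmgr port=5432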
If repmgrd marks the local node as unavailable when it was actually
just restarting, and a failover event occurs before the next local node
check, failover will continue with the stale connection handle.
Add a final local node check just before starting the failover
process, so repmgrd can reconnect if it wasn't able to before.
If monitoring history is not in use, there's no activity on the standby's
connection handle, so if e.g. the standby is restarted, PQstatus()
never returns CONNECTION_BAD and repmgrd never notices the connection
is stale. Therefore, execute a throw-away statement every
"monitor_interval_secs" seconds.
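A minimal sketch of such a keepalive check (names illustrative, not the
actual repmgrd code):

    #include <libpq-fe.h>
    #include <stdbool.h>

    /* Run once per "monitor_interval_secs": a throw-away statement
     * ensures a dead connection is actually noticed. */
    static bool
    connection_ping_ok(PGconn *conn)
    {
        PGresult *res = PQexec(conn, "SELECT 1");
        bool      ok = (PQresultStatus(res) == PGRES_TUPLES_OK);

        PQclear(res);
        return ok && PQstatus(conn) != CONNECTION_BAD;
    }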
When establishing a superuser connection, the connection parameters
were being copied from the existing (non-superuser) connection, which
in some circumstances can lead to that user's password being
included in the copied parameter list. The password parameter, if set, will
now always be removed, which will cause libpq to retrieve the correct
one from the .pgpass file.
Addresses GitHub #400.
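A simplified sketch of the parameter-copying logic (function name and
fixed-size arrays are illustrative):

    #include <libpq-fe.h>
    #include <string.h>

    PGconn *
    duplicate_connection_as_user(PGconn *conn, const char *new_user)
    {
        PQconninfoOption *opts = PQconninfo(conn);
        PQconninfoOption *opt;
        const char *keywords[64];
        const char *values[64];
        int         i = 0;

        for (opt = opts; opt != NULL && opt->keyword != NULL; opt++)
        {
            if (opt->val == NULL)
                continue;
            if (strcmp(opt->keyword, "password") == 0)
                continue;           /* never copy the password; let
                                     * libpq consult .pgpass instead */
            if (strcmp(opt->keyword, "user") == 0)
            {
                keywords[i] = "user";
                values[i++] = new_user;     /* e.g. the superuser */
                continue;
            }
            keywords[i] = opt->keyword;
            values[i++] = opt->val;
        }
        keywords[i] = NULL;
        values[i] = NULL;

        PGconn *new_conn = PQconnectdbParams(keywords, values, 0);

        PQconninfoFree(opts);
        return new_conn;
    }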
For events generated by these commands, it may be useful to know details
of the primary node. This makes the following additional parameters available
to event notification scripts:
- %p: node ID of the primary
- %a: node name of the primary
- %c: conninfo string for the primary
Implements GitHub #375.
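For example, in repmgr.conf (script path illustrative; %e and %n are the
existing event type and node ID placeholders):

    event_notification_command = '/usr/local/bin/notify.sh %e %n %p %a %c'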
pg_rewind is not part of the core distribution for PostgreSQL versions
prior to 9.5, but we provided support in repmgr 3.3, so we should extend
it to repmgr 4.
Note that there is no check in place whether the pg_rewind binary
exists, so it's up to the user to ensure it's present.
Addresses GitHub #413.
Insert "connect_timeout=2" into the connection parameters, if not
explicitly set by the user. This will prevent excessive wait time
for the host operating system to report a connection timeout.
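The effective connection string will then contain something like the
following (illustrative):

    host=node1 dbname=repmgr user=repmgr connect_timeout=2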
When parsing conninfo strings, ensure the error message pointer is
actually returned to the caller.
Not a critical issue; it just meant the contents of the error message
were not being displayed.
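A minimal sketch of the corrected behaviour with PQconninfoParse()
(otherwise illustrative, not the actual repmgr code):

    #include <libpq-fe.h>
    #include <stdio.h>

    int
    main(void)
    {
        char             *errmsg = NULL;
        PQconninfoOption *opts = PQconninfoParse("host", &errmsg);

        if (opts == NULL)
        {
            /* the error message pointer must reach the caller, so the
             * message can actually be displayed */
            if (errmsg != NULL)
            {
                fprintf(stderr, "%s", errmsg);
                PQfreemem(errmsg);
            }
            return 1;
        }

        PQconninfoFree(opts);
        return 0;
    }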
get_superuser_connection() was erroneously using the local node record
when establishing a superuser connection, which works when registering
the primary but obviously not when cloning a standby.
Addresses GitHub #380.
This will generate "recovery.conf" for an existing standby.
Typical use-case is a standby cloned manually from an external data
source (e.g. Barman), where "recovery.conf" needs to be created
(and if required a replication slot).
The --dry-run option will check the pre-requisites but not actually
create "recovery.conf" or a replication slot.
This requires that the upstream node is running, that a replication
connection can be made, and, if required, that a replication slot can
be created.
Implements GitHub #382.
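Example invocation (assuming the option is named --recovery-conf-only;
configuration file path illustrative):

    repmgr -f /etc/repmgr.conf standby clone --recovery-conf-only --dry-run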