repmgr

mirror of https://github.com/EnterpriseDB/repmgr.git synced 2026-03-23 15:16:29 +00:00

Author	SHA1	Message	Date
Ian Barwick	ceeb6d7130	repmgrd: improve monitoring statistics logging Add more granular logging to help diagnose issues, and also keep track of when the last monitoring statistics update was set and emit that as DETAIL every time we emit a log status update.	2018-08-30 12:36:59 +09:00
Ian Barwick	3573950425	Add additional query error logging It's unlikely we'll get an error in these cases, but you never know. Also, with queries which return a list of node records, it's necessary to call _populate_node_records() even if the query fails, so a properly initialised, albeit empty list is returned to the caller.	2018-08-29 10:25:43 +09:00
Ian Barwick	c1586e39b7	Log text of failed queries at log level ERROR Previously query texts were always logged at log level DEBUG, but that doesn't help much in a normal production environment when trying to identify the cause of issues. Also make various other minor improvements to query logging and handling of database errors. Implements GitHub #498.	2018-08-29 10:08:52 +09:00
Ian Barwick	e1e59e85d7	repmgr: add "cluster_cleanup" event GitHub #492.	2018-08-24 09:20:05 +09:00
Ian Barwick	34c4f4c3f8	repmgr: truncate version string if necessary Some distributions may add extra information to PG_VERSION after the actual version number (e.g. "10.4 (Debian 10.4-2.pgdg90+1)"), so copy the version number string up until the first space is found. GitHub #490.	2018-08-14 09:55:23 +09:00
Ian Barwick	44a224ad92	repmgrd: fix configuration file reloading Don't allow "promote_command" or "follow_command" to be empty. GitHub #486.	2018-08-02 16:35:26 +09:00
Ian Barwick	7ecfb333b9	doc: add note about switchover and exclusive backups Also rename server_not_in_exclusive_backup_mode() to avoid double negatives. GitHub #476.	2018-07-19 16:02:31 +09:00
Martín Marqués	8f13a66aaa	Check that there is no exclusive backup taking place while we perform a switchover. We've found that this can cause some issues with postgres control metadata (could be a postgres bug) so the best thing is not to switch over if there's a backup taking place. It's also a bad idea from an architectural point of view, as a switchover is supposed to be planned, so why perform it when we are taking backups? GitHub #476.	2018-07-19 16:02:21 +09:00
Ian Barwick	ef35d071bf	Fix is_active_bdr_node() query for BDR 2.x Copy/paste error when adapting the query for BDR 3.x.	2018-07-19 09:50:30 +09:00
Ian Barwick	7decc7975f	Fix BDR version check regexp_match() is only available from PostgreSQL 10 and later.	2018-07-18 10:54:16 +09:00
Ian Barwick	fcf237fe31	node status: improve output and documentation In the default text output mode, list inactive slots. In CSV output mode, list inactive slots as additional information; add output line with number of missing slots and a list thereof. Also document --csv output mode.	2018-06-22 11:46:50 +09:00
Ian Barwick	0f97a98f28	repmgr: don't count witness node as a standby when running "node status" Addresses GitHub #451.	2018-06-21 13:06:18 +09:00
Ian Barwick	836d2125fe	Improve BDR3 node query We can get everything we need from bdr.node_summary	2018-06-15 14:30:06 +09:00
Ian Barwick	bf0d67c60a	Add repmgr.nodes to the BDR replication set	2018-06-15 14:29:08 +09:00
Ian Barwick	108c3a36fb	Enable creation of repmgr extension on BDR3 node	2018-06-15 14:26:47 +09:00
Ian Barwick	8377704596	Convert BDR query functions to handle BDR2/BDR3	2018-06-15 14:26:07 +09:00
Ian Barwick	4f642f8332	Detect and store BDR major version number when executing "is_bdr_db()" BDR3 metadata structure is very different to BDR1/2, so we'll need to generate queries according to version.	2018-06-15 14:25:55 +09:00
Ian Barwick	bcab4bc391	_create_event(): log event and node ID for debugging	2018-06-12 10:30:30 +09:00
Ian Barwick	b1b49748a7	"config_file" is MAXPGPATH, not MAXLEN The two values are the same anyway, so change is more for consistency.	2018-05-24 15:52:57 +09:00
Ian Barwick	276239422b	standby clone: don't assume existence of "user" in upstream conninfo Usually a separate user (typically "repmgr") is set up specifically to manage the repmgr metadata, however there's no compelling requirement to do this, and it's possible the database owner (usually: "postgres") will be used, in which case it's possible the username will be left out of the conninfo string. Addresses GitHub #437.	2018-05-24 15:52:51 +09:00
Ian Barwick	bd63948937	Include "arpa/inet.h" in dbutils.c Needed for htonl() on FreeBSD.	2018-05-10 12:03:04 +09:00
Ian Barwick	887b845aa0	repmgrd: always close the connection if the pointer is not NULL	2018-04-26 10:04:07 +09:00
Ian Barwick	7822aa784f	repmgrd: catch corner case in standby connection handle check If repmgrd marks the local node as unavailable, and it was actually restarting but a failover event occurred before the next local node check, failover will continue with the stale connection handle. Add a final local node check just before starting the failover process, so repmgrd can reconnect if it wasn't able to before.	2018-04-24 21:56:57 +09:00
Ian Barwick	4455ded935	repmgrd: prevent standby connection handle from going stale If monitoring history is not in use, there's no activity on the standby's connection handle, so if e.g. the standby is restarted, PQstatus() never returns CONNECTION_BAD and repmgrd never notices the connection is stale. Therefore execute a throw-away statement at "monitor_interval_secs".	2018-04-24 21:56:52 +09:00
Ian Barwick	1bbb2ef213	Fix superuser password handling When establishing a superuser connection, the connection parameters were being copied from the existing (non-superuser) connection, which in some circumstances can lead to that user's password being included in the copied parameter list. The password parameter, if set, will now always be removed, which will cause libpq to retrieve the correct one from the .pgpass file. Addresses GitHub #400.	2018-04-12 12:49:41 +09:00
Ian Barwick	1e1b4b1a65	"standby register/follow": provide primary node details for event notifications For events generated by these commands, it may be useful to know details of the primary node. This makes the following additional parameters available to event notification scripts: - %p: node ID of the primary - %a: node name of the primary - %c: conninfo string for the primary Implements GitHub #375	2018-04-03 14:32:19 +09:00
Ian Barwick	cf64f9e95c	Always initialise t_conninfo_param_list structures	2018-04-03 14:31:24 +09:00
Ian Barwick	3ccf1cf182	Enable pg_rewind to be used with PostgreSQL 9.3/9.4 pg_rewind is not part of the core distribution for those, but we provided support in repmgr 3.3 so should extend it to repmgr 4. Note that there is no check in place whether the pg_rewind binary exists, so it's up to the user to ensure it's present. Addresses GitHub #413.	2018-04-02 20:54:29 +09:00
Ian Barwick	50321bb95d	Log pg_control access errors as WARNINGs rather than DEBUG This will make it easier to diagnose issues, possibly with an incorrect "data_directory" setting in "repmgr.conf".	2018-04-02 09:28:56 +09:00
Ian Barwick	22c40ae62d	doc: update HISTORY and release notes	2018-03-30 09:41:48 +09:00
Ian Barwick	a403da67bc	Consolidate connection closure calls	2018-03-27 16:43:59 +09:00
Ian Barwick	462fdca4b4	Tidy up queries in dbutils.c - standardize formatting - prefix various internal function calls with "pg_catalog.", to mitigate possible risks from CVE-2018-1058	2018-03-23 10:28:28 +08:00
Ian Barwick	0219f4c91f	Always set "connect_timeout" when pinging a PostgreSQL instance Insert "connect_timeout=2" into the connection parameters, if not explicitly set by the user. This will prevent excessive wait time for the host operating system to report a connection timeout.	2018-03-21 11:48:57 +09:00
Ian Barwick	85a4adc99c	Update HISTORY	2018-03-21 06:48:32 +09:00
Martín Marqués	208d7d418e	While reviewing `7cb6e5af8d` before merging I noticed that besides the result cleanup added, there was still a missing spot inside the if condition. Adding the PQclear that was missing.	2018-03-13 11:43:36 -03:00
Andrzej Nowicki	d2a2df13d5	One more memory leak fixed	2018-03-13 11:23:33 +01:00
Andrzej Nowicki	358e001218	Clear node list to avoid memory leak, fixes #402	2018-03-13 11:05:24 +01:00
Ian Barwick	d7702b3444	Correctly handle error message pointer when parsing strings. When parsing conninfo strings, ensure the error message pointer is actually returned to the caller. Not a critical issue, just meant the contents of the error message were not being displayed.	2018-03-10 14:29:12 +09:00
Ian Barwick	9981ede1af	"standby clone": fix --superuser handling get_superuser_connection() was erroneously using the local node record to connect to as a superuser, which works when registering the primary but obviously not when cloning a standby. Addresses GitHub #380.	2018-03-02 16:43:19 +09:00
Ian Barwick	29cb153643	"node status": improve replication slot warnings Addresses GitHub #385	2018-02-23 11:19:33 +09:00
Ian Barwick	c644ddde51	Fix typo in function name	2018-02-22 15:50:57 +09:00
Ian Barwick	ee98a3a58e	"standby clone": add --recovery-conf-only option This will generate "recovery.conf" for an existing standby. Typical use-case is a standby cloned manually from an external data source (e.g. Barman), where "recovery.conf" needs to be created (and if required a replication slot). The --dry-run option will check the pre-requisites but not actually create "recovery.conf" or a replication slot. This requires that the upstream node is running, a replication connection can be made and if required a replication slot can be created. Implements GitHub #382.	2018-02-22 15:50:51 +09:00
Ian Barwick	22b3a74fa0	repmgrd: improve detection of status change from primary to standby If repmgrd is running in degraded mode on a primary which has been stopped, then manually brought back online as a standby (e.g. by creating recovery.conf and starting the server), ensure it not only detects the change but automatically updates the node record so it can resume monitoring the node as a standby. Previously, repmgrd was looping waiting for the record to be updated (as is done transparently when executing "repmgr node rejoin") but if the record was not updated within the timeout period (e.g. by "repmgr standby register") it would fail to resume monitoring as a standby. It seems reasonable to have repmgrd automatically update the node record, as this will restore failover capability as quickly as possible. If this is not desired, then the onus is on the user to shut down repmgrd while making the desired changes.	2018-02-22 15:50:45 +09:00
Ian Barwick	f5f02ae0ee	Replace remaining instances of strcpy() with strncpy() Also use strncmp() to match.	2018-02-15 13:31:55 +09:00
Ian Barwick	6b7f6089ba	"node status": add warning about missing replication slots Implements GitHub #364.	2018-02-12 11:38:27 +09:00
Ian Barwick	ee2df36a76	"standby switchover": additional sanity checks Check that sufficient walsenders will be available on the promotion candidate, and if replication slots are in use check if enough of those will be available. Note these checks can't guarantee that the walsenders/slots will be available at the appropriate points during the switchover process, but do ensure that existing configuration problems will be caught. Implements GitHub #371.	2018-02-08 15:19:24 +09:00
Ian Barwick	657ed83921	"cluster show": improve handling of database errors In particular, if running "repmgr cluster show" against a database without the repmgr metadata, showing the error (rather than just "no records found" etc.) will provide some clues about the problem.	2018-02-05 10:35:56 +09:00
Ian Barwick	6c81e54f76	"standby follow": check for replication slot availability on target node	2018-02-02 17:18:43 +09:00
Ian Barwick	375a96a5c8	repmgrd: log execution error in "repmgrd_get_local_node_id()" That shouldn't happen, but if it does it will make it easier to identify the issue.	2018-01-16 11:16:19 +09:00
Ian Barwick	8fd0c4ad83	repmgr: assume node is actually shutting down if pingable and that's the reported status	2018-01-12 21:53:37 +09:00


196 Commits