repmgr

mirror of https://github.com/EnterpriseDB/repmgr.git synced 2026-03-22 22:56:29 +00:00

Author	SHA1	Message	Date
Ian Barwick	a48d408e4e	Consistently log strerror output as DETAIL	2019-01-29 12:10:55 +09:00
Ian Barwick	1980deb480	repmgrd: check for a change to the upstream node If the upstream node has changed, for example after "repmgr standby follow" was manually executed, restart monitoring to ensure repmgrd is monitoring the correct node.	2019-01-22 13:33:13 +09:00
Ian Barwick	7dce3ed234	Update copyright notices to 2019	2019-01-21 14:54:35 +09:00
Ian Barwick	d4e993a240	Improve handling of connection URIs when executing remote commands Previously, if connection URIs were in use and "repmgr standby switchover" was executed, repmgr would pass the connection URI as-is to the demotion candidate to execute "repmgr node rejoin". However the presence of unescaped ampersands in the connection URI was causing the rejoin command to be incorrectly executed. Addresses GitHub #525.	2019-01-14 11:11:51 +09:00
Ian Barwick	40408a1734	repmgrd: check binary and extension major versions match repmgr requires that the same "major version" (e.g. 4.3) is present on all nodes, otherwise - particularly in the case of repmgrd - it's highly likely things won't work as expected. Implements part of GitHub #515.	2019-01-07 15:39:40 +09:00
Ian Barwick	313aa3c5d7	Refactor follow verification to reduce need for CHECKPOINT A CHECKPOINT is not always required; hopefully we can narrow it down to one corner case where we need to determine the minimum recovery location. Also get local timeline ID via IDENTIFY_SYSTEM, as fetching it from pg_control risks returning the prior timeline ID if the timeline switch has just taken place and no restart point has yet occurred.	2018-12-04 15:27:22 +09:00
Ian Barwick	c53782cda3	Fix typo in query	2018-11-29 15:24:49 +09:00
Ian Barwick	66b40ffc68	Simplify function create_replication_slot() Following the changes in `793d83b`, it's no longer necessary to pass the server version number.	2018-11-29 14:35:01 +09:00
Ian Barwick	a6a2be2239	Teach witness repmgrd to deal with the absence of a primary Previously it would refuse to start if the primary was not reachable, the thinking being that it's pointless trying to monitor an incomplete cluster. However following an aborted failover situation, repmgrd will restart monitoring and on the witness server, this will lead to it aborting itself due to the continuing absence of a primary. To resolve this, witness repmgrd will now start monitoring in degraded mode if no primary is found in the hope a primary will reappear at some point.	2018-11-29 12:15:41 +09:00
Ian Barwick	bdcc4d9e83	Check correct result status in ...primary_last_seen() functions	2018-11-29 11:08:28 +09:00
Ian Barwick	793d83b22c	Refactor server version detection Most of the time we can simply get the version number directly from the connection handle. Previously it was held in a global variable, which was an icky way of doing things. In a few special cases we also need the actual version string, which is obtained directly from the database.	2018-11-22 21:30:31 +09:00
Ian Barwick	0f4e04e61e	Add function get_current_lsn() This is a somewhat convoluted attempt to retrieve the current LSN of any node, regardless of whether in recovery or not, and if in recovery, independent of whether streaming or recovering from archive.	2018-11-22 19:31:49 +09:00
Ian Barwick	80a280cbf4	Add function get_timeline_history() This will be required for verifying whether one node is able to follow another node.	2018-11-22 15:26:50 +09:00
Ian Barwick	784c9c4793	repmgrd: return predictable default values for get_primary_last_seen() Return 0 if the node is not in recovery. In which case it's probably rather pointless calling this function anyway. Return -1 if the "last_seen" field has never been set (i.e. repmgrd hasn't started yet).	2018-11-21 11:30:32 +09:00
Ian Barwick	0caec90d81	repmgrd: set primary last seen	2018-11-21 11:30:27 +09:00
Ian Barwick	c3bc5585d9	Add sanity check for extension version This should cover the cases where the "repmgr" extension was installed manually but not updated, or an upgrade was not fully completed.	2018-10-31 11:16:36 +09:00
Ian Barwick	c336e384ab	Support "pg_promote()" function (PostgreSQL 12 and later) This is an experimental feature.	2018-10-26 11:02:45 +09:00
Ian Barwick	a459c60145	Avoid defining variable-length arrays As of PostgreSQL commit d9dd406f, variable length arrays are no longer permitted. As they're not actually required anyway, just define appropriate constants. Also noted in GitHub #510.	2018-10-26 10:09:45 +09:00
Ian Barwick	3e38759c02	use appendPQExpBufferStr/-Char() consistently	2018-10-04 08:42:42 +09:00
Ian Barwick	2491b8ae52	Add functionality to "pause" repmgrd In some circumstances, e.g. while performing a switchover, it is essential that repmgrd does not take any kind of failover action, as this will put the cluster into an incorrect state. Previously it was necessary to stop repmgrd on all nodes (or at least those nodes which repmgrd would consider as promotion candidates), however this is a cumbersome and potentially risk-prone operation, particularly if the replication cluster contains more than a couple of servers. To prevent this issue from occurring, this patch introduces the ability to "pause" repmgrd on all nodes with a single command ("repmgr daemon pause") which notifies repmgrd not to take any failover action until the node is "unpaused" ("repmgr daemon unpause"). "repmgr daemon status" provides an overview of each node and whether repmgrd is running, and if so whether it is paused. "repmgr standby switchover" has been modified to automatically pause repmgrd while carrying out the switchover. See documentation for further details.	2018-09-27 16:42:10 +09:00
Ian Barwick	688337dec3	repmgr: add "--node-id" option to "cluster cleanup" Implements GitHub #493.	2018-09-25 15:56:40 +09:00
Ian Barwick	b0a2ee2259	get_all_node_records(): display any error encountered and return success status In many cases we'll want to bail out with an error if the node list can't be retrieved for any reason. This saves some repetitive coding.	2018-09-13 10:14:43 +09:00
Ian Barwick	17e75f6b31	repmgrd: improve reconnection handling Previously, if the server being monitored was not available, repmgrd would always close the existing connection handle and open a new one. However, in some cases, e.g. a brief network outage, the existing connection handle is still good and does not need to be reopened. This could be particularly problematic if monitoring_history is on, as this risks leaving orphan sessions on the primary which (given a sufficiently unstable network) could lead to all available backends being occupied. Instead, during an outage we now use a new connection to verify the server is accessible; if the old connection is still available (e.g. following a short network interruption) we continue using that; if not (e.g. the server was restarted), we use the new one.	2018-08-30 15:46:08 +09:00
Ian Barwick	ceeb6d7130	repmgrd: improve monitoring statistics logging Add more granular logging to help diagnose issues, and also keep track of when the last monitoring statistics update was set and emit that as DETAIL every time we emit a log status update.	2018-08-30 12:36:59 +09:00
Ian Barwick	3573950425	Add additional query error logging It's unlikely we'll get an error in these cases, but you never know. Also, with queries which return a list of node records, it's necessary to call _populate_node_records() even if the query fails, so a properly initialised, albeit empty list is returned to the caller.	2018-08-29 10:25:43 +09:00
Ian Barwick	c1586e39b7	Log text of failed queries at log level ERROR Previously query texts were always logged at log level DEBUG, but that doesn't help much in a normal production environment when trying to identify the cause of issues. Also make various other minor improvements to query logging and handling of database errors. Implements GitHub #498.	2018-08-29 10:08:52 +09:00
Ian Barwick	e1e59e85d7	repmgr: add "cluster_cleanup" event GitHub #492.	2018-08-24 09:20:05 +09:00
Ian Barwick	34c4f4c3f8	repmgr: truncate version string if necessary Some distributions may add extra information to PG_VERSION after the actual version number (e.g. "10.4 (Debian 10.4-2.pgdg90+1)"), so copy the version number string up until the first space is found. GitHub #490.	2018-08-14 09:55:23 +09:00
Ian Barwick	44a224ad92	repmgrd: fix configuration file reloading Don't allow "promote_command" or "follow_command" to be empty. GitHub #486.	2018-08-02 16:35:26 +09:00
Ian Barwick	7ecfb333b9	doc: add note about switchover and exclusive backups Also rename server_not_in_exclusive_backup_mode() to avoid double negatives. GitHub #476.	2018-07-19 16:02:31 +09:00
Martín Marqués	8f13a66aaa	Check that there is no exclusive backup taking place while we perform a switchover. We've found that this can cause some issues with postgres control metadata (could be a postgres bug) so the best thing is not to switch over if there's a backup taking place. It's also a bad idea from an architectural point of view, as a switchover is supposed to be planned, so why perform it when we are taking backups. GitHub #476.	2018-07-19 16:02:21 +09:00
Ian Barwick	ef35d071bf	Fix is_active_bdr_node() query for BDR 2.x Copy/paste error when adapting the query for BDR 3.x.	2018-07-19 09:50:30 +09:00
Ian Barwick	7decc7975f	Fix BDR version check regexp_match() is only available from PostgreSQL 10 and later.	2018-07-18 10:54:16 +09:00
Ian Barwick	fcf237fe31	node status: improve output and documentation In the default text output mode, list inactive slots. In CSV output mode, list inactive slots as additional information; add output line with number of missing slots and a list thereof. Also document --csv output mode.	2018-06-22 11:46:50 +09:00
Ian Barwick	0f97a98f28	repmgr: don't count witness node as a standby when running "node status" Addresses GitHub #451.	2018-06-21 13:06:18 +09:00
Ian Barwick	836d2125fe	Improve BDR3 node query We can get everything we need from bdr.node_summary	2018-06-15 14:30:06 +09:00
Ian Barwick	bf0d67c60a	Add repmgr.nodes to the BDR replication set	2018-06-15 14:29:08 +09:00
Ian Barwick	108c3a36fb	Enable creation of repmgr extension on BDR3 node	2018-06-15 14:26:47 +09:00
Ian Barwick	8377704596	Convert BDR query functions to handle BDR2/BDR3	2018-06-15 14:26:07 +09:00
Ian Barwick	4f642f8332	Detect and store BDR major version number when executing "is_bdr_db()" BDR3 metadata structure is very different to BDR1/2, so we'll need to generate queries according to version.	2018-06-15 14:25:55 +09:00
Ian Barwick	bcab4bc391	_create_event(): log event and node ID for debugging	2018-06-12 10:30:30 +09:00
Ian Barwick	b1b49748a7	"config_file" is MAXPGPATH, not MAXLEN The two values are the same anyway, so change is more for consistency.	2018-05-24 15:52:57 +09:00
Ian Barwick	276239422b	standby clone: don't assume existence of "user" in upstream conninfo Usually a separate user (typically "repmgr") is set up specifically to manage the repmgr metadata, however there's no compelling requirement to do this, and it's possible the database owner (usually: "postgres") will be used, in which case it's possible the username will be left out of the conninfo string. Addresses GitHub #437.	2018-05-24 15:52:51 +09:00
Ian Barwick	bd63948937	Include "arpa/inet.h" in dbutils.c Needed for htonl() on FreeBSD.	2018-05-10 12:03:04 +09:00
Ian Barwick	887b845aa0	repmgrd: always close the connection if the pointer is not NULL	2018-04-26 10:04:07 +09:00
Ian Barwick	7822aa784f	repmgrd: catch corner case in standby connection handle check If repmgrd marks the local node as unavailable, and it was actually restarting but a failover event occurred before the next local node check, failover will continue with the stale connection handle. Add a final local node check just before starting the failover process, so repmgrd can reconnect if it wasn't able to before.	2018-04-24 21:56:57 +09:00
Ian Barwick	4455ded935	repmgrd: prevent standby connection handle from going stale If monitoring history is not in use, there's no activity on the standby's connection handle, so if e.g. the standby is restarted, PQstatus() never returns CONNECTION_BAD and repmgrd never notices the connection is stale. Therefore execute a throw-away statement at "monitor_interval_secs".	2018-04-24 21:56:52 +09:00
Ian Barwick	1bbb2ef213	Fix superuser password handling When establishing a superuser connection, the connection parameters were being copied from the existing (non-superuser) connection, which in some circumstances can lead to that user's password being included in the copied parameter list. The password parameter, if set, will now always be removed, which will cause libpq to retrieve the correct one from the .pgpass file. Addresses GitHub #400.	2018-04-12 12:49:41 +09:00
Ian Barwick	1e1b4b1a65	"standby register/follow": provide primary node details for event notifications For events generated by these commands, it may be useful to know details of the primary node. This makes the following additional parameters available to event notification scripts: - %p: node ID of the primary - %a: node name of the primary - %c: conninfo string for the primary Implements GitHub #375	2018-04-03 14:32:19 +09:00
Ian Barwick	cf64f9e95c	Always initialise t_conninfo_param_list structures	2018-04-03 14:31:24 +09:00


219 Commits