repmgr

mirror of https://github.com/EnterpriseDB/repmgr.git synced 2026-07-16 22:39:04 +00:00

Author	SHA1	Message	Date
Ian Barwick	9439467958	doc: add troubleshooting section to switchover documentation	2018-09-25 13:47:58 +09:00
Ian Barwick	38e3aae053	repmgr: add parameter "shutdown_check_timeout" Previously, "repmgr standby switchover" used the configuration file parameters "reconnect_interval" and "reconnect_attempts" to define a timeout to determine whether the current primary (demotion candidate) has shut down. However, these parameters are intended for primary failure detection and are generally lower in value, while a controlled shutdown may take longer, resulting in the switchover being aborted as repmgr was not waiting long enough. To prevent this happening, parameter "shutdown_check_timeout" has been added. This complements the existing "standby_reconnect_timeout" parameter used by "repmgr standby switchover". Implements GitHub #504.	2018-09-25 11:34:06 +09:00
Ian Barwick	80bef0eb28	doc: minor fixes to "repmgr.conf.sample"	2018-09-25 10:53:24 +09:00
Ian Barwick	bea4b03cc2	doc: update "repmgr node rejoin" documentation Clarify various points related to --force-rewind and pg_rewind usage.	2018-09-14 14:08:34 +09:00
Ian Barwick	97905b02ae	repmgrd: fix comment	2018-09-13 10:15:22 +09:00
Ian Barwick	b0a2ee2259	get_all_node_records(): display any error encountered and return success status In many cases we'll want to bail out with an error if the node list can't be retrieved for any reason. This saves some repetitive coding.	2018-09-13 10:14:43 +09:00
Ian Barwick	bb4fdcda98	doc: update link	2018-09-12 14:17:14 +09:00
Ian Barwick	7b33faa09b	repmgr: improve "cluster show" output Only output full contents of connection error messages in --verbose mode, otherwise it can spew a lot of text onto the screen.	2018-09-07 16:59:54 +09:00
Ian Barwick	5de2b1ee13	repmgrd: update local node id in shared memory after local node restart Also ensure local node restarts are handled more elegantly, so we're not surprised by a stale connection handle. GitHub #502.	2018-09-07 11:59:53 +09:00
Ian Barwick	f184b1e68a	doc: update 4.1.1 release notes	2018-09-04 12:35:46 +09:00
Ian Barwick	bd2f6db1e1	doc: update 4.1.1 release notes	2018-09-04 09:47:38 +09:00
Ian Barwick	1693ec0e90	repmgrd: fix syntax	2018-08-30 16:27:07 +09:00
Ian Barwick	17e75f6b31	repmgrd: improve reconnection handling Previously, if the server being monitored was not available, repmgrd would always close the existing connection handle and open a new one. However, in some cases, e.g. a brief network outage, the existing connection handle is still good and does not need to be reopened. This could be particularly problematic if monitoring_history is on, as this risks leaving orphan sessions on the primary which (given a sufficiently unstable network) could lead to all available backends being occupied. Instead, during an outage we now use a new connection to verify the server is accessible; if the old connection is still available (e.g. following a short network interruption) we continue using that; if not (e.g. the server was restarted), we use the new one.	2018-08-30 15:46:08 +09:00
Ian Barwick	3b8586d82a	doc: update release notes	2018-08-30 13:05:17 +09:00
Ian Barwick	6acec3e041	doc: fix internal link	2018-08-30 12:40:08 +09:00
Ian Barwick	1d830bf0e2	doc: update package signing key link	2018-08-30 12:40:05 +09:00
Ian Barwick	3f99ee8ede	doc: update source requirement links Per report from Daymel Bonne.	2018-08-30 12:40:02 +09:00
Ian Barwick	b5f640d04d	doc: improve event notification documentation - add undocumented events (per report from Daymel Bonne) - split up list into sections for better overview - where feasible, add cross-links	2018-08-30 12:39:58 +09:00
Ian Barwick	92a62a958e	doc: clarify statement about BDR HA support	2018-08-30 12:39:54 +09:00
Ian Barwick	a4a956593c	doc: clarify when "standby follow" can be used. The unqualified wording previously implied that any running server could be rejoined with "standby follow", which is not the case with a "split brain" primary.	2018-08-30 12:39:51 +09:00
Ian Barwick	ceeb6d7130	repmgrd: improve monitoring statistics logging Add more granular logging to help diagnose issues, and also keep track of when the last monitoring statistics update was set and emit that as DETAIL every time we emit a log status update.	2018-08-30 12:36:59 +09:00
Ian Barwick	9681708b1a	repmgr: improve slot handling in "node rejoin" On the rejoined node, if a replication slot for the new upstream exists (which is typically the case after a failover), delete that slot. Also emit a warning about any inactive replication slots which may need to be cleaned up manually. GitHub #499.	2018-08-30 12:24:13 +09:00
Ian Barwick	3573950425	Add additional query error logging It's unlikely we'll get an error in these cases, but you never know. Also, with queries which return a list of node records, it's necessary to call _populate_node_records() even if the query fails, so a properly initialised, albeit empty list is returned to the caller.	2018-08-29 10:25:43 +09:00
Ian Barwick	c1586e39b7	Log text of failed queries at log level ERROR Previously query texts were always logged at log level DEBUG, but that doesn't help much in a normal production environment when trying to identify the cause of issues. Also make various other minor improvements to query logging and handling of database errors. Implements GitHub #498.	2018-08-29 10:08:52 +09:00
Ian Barwick	7745844078	"standby switchover": improve replication connection check Previously repmgr would first check that a replication connection can be made from the demotion candidate to the promotion candidate, however it's preferable to sanity-check the number of available walsenders first, to provide a more useful error message.	2018-08-24 16:31:25 +09:00
Ian Barwick	e1e59e85d7	repmgr: add "cluster_cleanup" event GitHub #492.	2018-08-24 09:20:05 +09:00
Cédric Villemain	6fc79470fc	Fix grep to find conninfo It used to use \t*, but [[:space:]] should be better as it matches more kinds of whitespace (the current pattern being broken in my case on RH7).	2018-08-23 18:33:55 +02:00
Ian Barwick	b7d576863d	doc: update FAQ Add note about why repmgrd refuses to start up if the upstream is not running.	2018-08-20 15:33:55 +09:00
Ian Barwick	c1338df5e3	doc: clarify repmgrd FAQ item "priority" must be 0 or greater.	2018-08-20 15:30:43 +09:00
Ian Barwick	221fb63e92	repmgrd: fix startup on witness node when local data is stale Previously, when running on a witness server, repmgrd didn't consider the local cache of the "repmgr.nodes" table might be outdated, e.g. as repmgrd wasn't running on the witness server during a failover, so could potentially end up monitoring a former primary now running as a standby. When running on a witness server, at startup repmgrd will now scan all nodes to determine the current primary, and refresh its local cache from there. This will also ensure it can start up even if the node currently registered as primary in the local cache is not available. Implements GitHub #488 and #489.	2018-08-20 15:29:29 +09:00
Ian Barwick	987823861f	doc: document sources of old package versions	2018-08-20 15:25:00 +09:00
Ian Barwick	7a6eb6321b	doc: add information about snapshot packages	2018-08-20 15:24:57 +09:00
Ian Barwick	f4df6696ba	doc: update release notes	2018-08-20 15:24:43 +09:00
Ian Barwick	bc584d84f6	repmgrd: improve cascaded standby failover handling In particular, improve handling of the case where the standby follow command fails due to the primary not being available. GitHub #480.	2018-08-20 15:23:54 +09:00
Ian Barwick	76f5bcf3cd	repmgrd: fix PQExpBuffer handling in upstream failover handler Was sometimes leading to blank log lines.	2018-08-20 15:23:50 +09:00
Ian Barwick	b1aab930af	repmgrd: don't imply primary is in recovery if it's not available	2018-08-20 15:23:46 +09:00
Ian Barwick	58994365ff	repmgrd: fix "repmgrd_upstream_reconnect" event notification Upstream node is not always the primary node. Per report in GitHub #480.	2018-08-20 15:23:42 +09:00
Ian Barwick	c3949b2aea	"standby clone" - don't copy external config files in dry run mode Avoid copying files during a --dry-run as it may introduce unexpected changes on the target node. During an actual clone operation, any problems with copying files will be detected early and the operation aborted before the actual database cloning commences. GitHub #491.	2018-08-20 15:23:37 +09:00
Ian Barwick	6ba49de44e	"standby promote": improve log messages Make it clearer what repmgr is waiting for, and what to do if the promotion appears to fail.	2018-08-16 11:52:01 +09:00
Ian Barwick	b61f853a69	repmgrd: ensure primary connection handle is refreshed after reconnect In some circumstances, if monitoring history was in use, repmgrd was attempting to fetch the primary's current LSN on a stale connection handle.	2018-08-15 16:55:03 +09:00
Ian Barwick	f2bc898761	repmgr: fix handling of slot creation error when cloning If cloning from a node other than the intended upstream, and replication slots are in use, once the cloning process is complete, repmgr will attempt to connect to the intended upstream to create the replication slot. Previously it would abort with a connection error, but as this issue is not fatal to the cloning process itself, and in some situations may be intentional, it's better to log a warning and continue. We should probably collate this (and any similar items needing attention after the cloning operation) into a list output at the end, otherwise the warning may get overlooked.	2018-08-15 15:12:23 +09:00
Ian Barwick	7bcf87b8ed	doc: update FAQ Explain why some values in recovery.conf are surrounded by pairs of single quotes.	2018-08-15 14:42:56 +09:00
Ian Barwick	6983547325	doc: improve "repmgr cluster cleanup" documentation	2018-08-14 10:09:52 +09:00
Ian Barwick	34c4f4c3f8	repmgr: truncate version string if necessary Some distributions may add extra information to PG_VERSION after the actual version number (e.g. "10.4 (Debian 10.4-2.pgdg90+1)"), so copy the version number string up until the first space is found. GitHub #490.	2018-08-14 09:55:23 +09:00
Ian Barwick	f8667c1aac	doc: better explain where pg_bindir won't be applied Basically any setting which can contain a user-defined script must have the full path set, even if it's repmgr being executed. We could potentially apply some heuristics to detect if the first item in the setting is "repmgr" (or more precisely repmgrd's program name), but this will require some careful thought and testing that it works as intended.	2018-08-14 09:54:27 +09:00
Ian Barwick	08ab6290c1	Add dummy 4.2 extension SQL file	2018-08-14 09:54:27 +09:00
Abhijit Menon-Sen	97cafd8c54	Fix upstream node name in warning This log_warning is supposed to reproduce the error in the block above, but used the current node's name instead of the intended upstream node.	2018-08-12 09:15:13 +05:30
Ian Barwick	78b969f208	repmgrd: report version number after logger initialisation This ensures the version number always makes it into the log destination. Implements GitHub #487.	2018-08-08 15:44:06 +09:00
Ian Barwick	3f558416f3	doc: clarify witness server location	2018-08-07 13:10:30 +09:00
Ian Barwick	410fa5e54d	Bump master branch to 4.2dev	2018-08-07 13:03:28 +09:00
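
Commit 34c4f4c3f8 above describes truncating a version string at the first space, so that distribution suffixes such as "(Debian 10.4-2.pgdg90+1)" are dropped. The following is a minimal Python sketch of that idea only. It is not repmgr's actual C implementation, and the function name `truncate_version` is hypothetical, chosen for illustration.

```python
def truncate_version(pg_version: str) -> str:
    """Return the part of a version string before the first space.

    Hypothetical helper: mirrors the behaviour described in the commit
    message (copy the version number up to the first space), not the
    actual repmgr source code.
    """
    # split() with maxsplit=1 stops at the first space; if there is no
    # space, the whole string is returned unchanged.
    return pg_version.split(" ", 1)[0]


if __name__ == "__main__":
    # Example string taken from the commit message.
    print(truncate_version("10.4 (Debian 10.4-2.pgdg90+1)"))
    # A string with no suffix passes through untouched.
    print(truncate_version("9.6.10"))
```

Splitting on the first space keeps the sketch free of assumptions about the suffix format; anything a distribution appends after the version number is simply discarded.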

960 Commits