repmgr

mirror of https://github.com/EnterpriseDB/repmgr.git synced 2026-07-16 14:29:05 +00:00

Author	SHA1	Message	Date
Ian Barwick	531194fa27	Initial implementation of "failover_validation_command"	2019-03-08 15:29:06 +09:00
Ian Barwick	2aa67c992c	Make recently added configuration options reloadable	2019-03-08 15:28:59 +09:00
Ian Barwick	37892afcfc	Add configuration option "primary_visibility_consensus" This determines whether repmgrd should continue with a failover if one or more nodes report they can still see the primary.	2019-03-08 15:28:53 +09:00
Ian Barwick	e4e5e35552	Add configuration option "sibling_nodes_disconnect_timeout" This controls the maximum length of time in seconds that repmgrd will wait for other standbys to disconnect their WAL receivers in a failover situation. This setting is only used when "standby_disconnect_on_failover" is set to "true".	2019-03-08 15:28:48 +09:00
Ian Barwick	b320c1f0ae	Reset "wal_retrieve_retry_interval" for all nodes	2019-03-08 15:28:42 +09:00
Ian Barwick	280654bed6	repmgrd: don't wait for WAL receiver to reconnect during failover If the WAL receiver has been temporarily disabled, we don't want to wait for it to start up as it may not be able to at that point; we do however need to reset "wal_retrieve_retry_interval".	2019-03-08 15:28:27 +09:00
Ian Barwick	ae675059c0	Improve logging/sanity checking for "node control" options	2019-03-08 15:28:22 +09:00
Ian Barwick	454ebabe89	Improve logging when disabling/enabling WAL receiver Also check action is being run on node which is in recovery.	2019-03-08 15:28:17 +09:00
Ian Barwick	d1d6ef8d12	Check for WAL receiver start up	2019-03-08 15:28:11 +09:00
Ian Barwick	5d6eab74f6	Log warning if "standby_disconnect_on_failover" used on pre-9.5 "standby_disconnect_on_failover" requires availability of "wal_retrieve_retry_interval", which is available from PostgreSQL 9.5. 9.4 will fall out of community support this year, so it doesn't seem productive at this point to do anything more than put the onus on the user to read the documentation and heed any warning messages in the logs.	2019-03-08 15:28:01 +09:00
Ian Barwick	59b7453bbf	repmgrd: optionally disconnect WAL receivers during failover This is intended to ensure that all nodes have a constant LSN while making the failover decision. This feature is experimental and needs to be explicitly enabled with the configuration file option "standby_disconnect_on_failover". Note enabling this option will result in a delay in the failover decision until the WAL receiver is disconnected on all nodes.	2019-03-08 15:27:54 +09:00
Ian Barwick	bde8c7e29c	repmgrd: handle reconnect to restarted server when using "connection" checks	2019-03-08 15:27:49 +09:00
Ian Barwick	bc6584a90d	*_transaction() functions: log error message text as DETAIL Per behaviour elsewhere.	2019-03-06 13:23:57 +09:00
Ian Barwick	074d79b44f	repmgrd: add option "connection_check_type" This enables selection of the method repmgrd uses to check whether the upstream node is available. Possible values are: - "ping" (default): uses PQping() to check server availability - "connection": executes a query on the connection to check server availability (similar to repmgr 3.x).	2019-03-06 13:23:53 +09:00
Ian Barwick	2eeb288573	repmgrd: ignore invalid "upstream_last_seen" value	2019-03-06 13:23:47 +09:00
Ian Barwick	48a2274b11	Use appendPQExpBufferStr where appropriate	2019-03-06 13:23:38 +09:00
Ian Barwick	19bcfa7264	Rename "..._primary_last_seen" functions to "..._upstream_last_seen" As that better reflects what they do.	2019-03-06 13:23:33 +09:00
Ian Barwick	486877c3d5	repmgrd: log details of nodes which can see primary If a failover is cancelled because other nodes can still see the primary, log the identities of those nodes.	2019-03-06 13:23:27 +09:00
Ian Barwick	9753bcc8c3	repmgrd: during failover, check if other nodes have seen the primary In a situation where only some standbys are cut off from the primary, a failover would result in a split brain/split cluster situation, as it's likely one of the cut-off standbys will promote itself, and other cut-off standbys (but not all standbys) will follow it. To prevent this happening, interrogate the other sibling nodes to check whether they've seen the primary within a reasonably short interval; if this is the case, do not take any failover action. This feature is experimental.	2019-03-06 13:23:22 +09:00
Ian Barwick	bd35b450da	daemon status: with csv output, show repmgrd status as unknown where appropriate Previously, if PostgreSQL was not running on the node, repmgrd and pause status were shown as "0", implying their status was known. This brings the csv output in line with the human-readable output, which displays "n/a" in this case.	2019-02-28 12:28:04 +09:00
Ian Barwick	1f256d4d73	doc: update release notes	2019-02-28 10:02:05 +09:00
Ian Barwick	1524e2449f	Split command execution functions into separate library These may need to be executed by repmgrd.	2019-02-27 14:41:38 +09:00
Ian Barwick	0cd2bd2e91	repmgrd: add additional logging during a failover operation	2019-02-27 11:45:34 +09:00
Ian Barwick	98b78df16c	Remove unneeded debugging output	2019-02-26 21:17:17 +09:00
Ian Barwick	b946dce2f0	doc: update introductory blurb	2019-02-26 15:19:41 +09:00
Ian Barwick	39234afcbf	standby clone: check upstream connections after data copy operation With long-running copy operations, it's possible the connection(s) to the primary/source server may go away for some reason, so recheck their availability before attempting to reuse.	2019-02-26 14:37:51 +09:00
John Naylor	23569a19b1	Doc fix: PostgreSQL 9.4 is no longer considered recent	2019-02-25 13:02:56 +09:00
John Naylor	c650fd3412	Fix typo	2019-02-25 13:02:51 +09:00
Ian Barwick	c30e65b3f2	Add some missing query error logging	2019-02-25 13:02:45 +09:00
Ian Barwick	07097575b1	daemon status: add column "upstream last seen" This displays the interval (in seconds) since the repmgrd instance on each node last confirmed its upstream node is available.	2019-02-23 13:03:16 +09:00
Ian Barwick	71d151ca87	Don't check status of logical replication slots We only want to check the status of physical replication slots to determine whether a streaming replication standby has become detached and there is therefore a risk of uncontrolled WAL buildup on the local node. It's not feasible to second-guess the state of logical replication slots.	2019-02-23 10:09:43 +09:00
Ian Barwick	5abec2bb97	doc: clarify replication slot usage with Barman Barman will usually use one replication slot, but that's generally preferable to multiple slots.	2019-02-22 13:52:02 +09:00
Ian Barwick	de70fd42dc	node check: simplify output generation in --is-shutdown-cleanly check	2019-02-22 10:49:06 +09:00
Ian Barwick	99550b91bd	standby register: warn if standby is running and connection params provided Addresses GitHub #552.	2019-02-22 10:31:00 +09:00
John Naylor	70190c37c4	Bring list of supported versions on the doc front page in line with the supported version matrix	2019-02-20 11:41:17 +07:00
Ian Barwick	f3fc4e5afb	Minor syntax formatting tweak For consistency.	2019-02-15 19:58:35 +09:00
Ian Barwick	629c552348	primary unregister: ensure correct behaviour when executed on a witness Fixes GitHub #548.	2019-02-15 19:49:17 +09:00
Ian Barwick	85a97c933f	Handle unhandled NodeStatus in switch statement	2019-02-15 19:31:06 +09:00
Ian Barwick	3a5a4388c7	cluster show: differentiate unreachable status Differentiate between unreachable nodes and nodes which are running but rejecting connections.	2019-02-15 16:01:55 +09:00
Ian Barwick	9338a9e233	Improve logging output Avoid emitting blank detail line	2019-02-15 10:49:56 +09:00
Ian Barwick	7fad2ed2c8	standby switchover: improve error output It wasn't clear why repmgr thinks the demotion candidate is not the upstream of the promotion candidate.	2019-02-14 17:22:24 +09:00
Ian Barwick	9305953bd2	Fix history file parsing Also add additional debugging output.	2019-02-14 15:52:40 +09:00
Ian Barwick	aeb9639ed9	node rejoin: add more log detail during rejoin success check Stating what is actually being checked where might be useful when diagnosing potential issues.	2019-02-13 15:29:39 +09:00
Ian Barwick	bc9e725d05	node rejoin: always emit detail about relative LSNs Previously repmgr only emitted that if there was a timeline/LSN mismatch, but it's useful to have confirmation of how it came to the conclusion that rejoin will succeed.	2019-02-13 15:16:40 +09:00
Ian Barwick	905e108f8f	doc: fix typos etc. in "standby follow" reference	2019-02-12 17:24:56 +09:00
Ian Barwick	f2362a06fa	doc: update "standby switchover" reference	2019-02-12 16:39:13 +09:00
Ian Barwick	7b85cb9f12	doc: update "standby follow" reference Add note about handling of timeline forks etc.	2019-02-12 16:39:06 +09:00
Ian Barwick	790bec21dd	node rejoin: handle case where node to rejoin was primary In that case the minRecoveryPoint* fields may be empty.	2019-02-12 13:31:25 +09:00
Ian Barwick	a0dc673439	"node rejoin": use minRecoveryPointTLI for comparing timelines	2019-02-12 13:31:21 +09:00
Ian Barwick	25019d1cc5	Refactor is_wal_replay_paused() query Make sure it doesn't emit an error if executed on a node not in recovery. The caller should theoretically only execute it on nodes in recovery, but there are sure to be corner cases where the node has come out of recovery.	2019-02-12 10:21:05 +09:00

1212 Commits