Commit Graph

1917 Commits

Author SHA1 Message Date
Ian Barwick 2a8f8d8400 doc: expand repmgrd configuration section 2019-03-11 14:50:33 +09:00
Ian Barwick 4ef706c2ca Execute "failover_validation_command" when only one standby exists 2019-03-08 12:19:37 +09:00
Ian Barwick 663c2e75b4 Make "failover_validation_command" reloadable 2019-03-08 09:27:19 +09:00
Ian Barwick db0d71c6a7 Initial implementation of "failover_validation_command" 2019-03-08 08:49:15 +09:00
Ian Barwick 6f4f56dd8c Make recently added configuration options reloadable 2019-03-07 10:58:25 +09:00
Ian Barwick 33fefd9f52 Add configuration option "primary_visibility_consensus"
This determines whether repmgrd should continue with a failover if
one or more nodes report they can still see the standby.
2019-03-07 10:41:42 +09:00
Ian Barwick a3f90d2bba Add configuration option "sibling_nodes_disconnect_timeout"
This controls the maximum length of time in seconds that repmgrd will
wait for other standbys to disconnect their WAL receivers in a failover
situation.

This setting is only used when "standby_disconnect_on_failover" is set to "true".
2019-03-06 15:56:21 +09:00
Ian Barwick 2ed044c358 Reset "wal_retrieve_retry_interval" for all nodes 2019-03-06 15:55:03 +09:00
Ian Barwick 9823978f41 repmgrd: don't wait for WAL receiver to reconnect during failover
If the WAL receiver has been temporarily disabled, we don't want to
wait for it to start up as it may not be able to at that point; we do
however need to reset "wal_retrieve_retry_interval".
2019-03-06 15:54:56 +09:00
Ian Barwick ae8171e461 Improve logging/sanity checking for "node control" options 2019-03-06 15:54:30 +09:00
Ian Barwick 1f8f64d57c Improve logging when disabling/enabling WAL receiver
Also check action is being run on node which is in recovery.
2019-03-06 15:54:26 +09:00
Ian Barwick 13c650fa83 Check for WAL receiver start up 2019-03-06 15:54:23 +09:00
Ian Barwick f85b4cd98e Log warning if "standby_disconnect_on_failover" used on pre-9.5
"standby_disconnect_on_failover" requires availability of "wal_retrieve_retry_interval",
which is available from PostgreSQL 9.5.

9.4 will fall out of community support this year, so it doesn't seem
productive at this point to do anything more than put the onus on the user
to read the documentation and heed any warning messages in the logs.
2019-03-06 15:54:15 +09:00
Ian Barwick 1615353f48 repmgrd: optionally disconnect WAL receivers during failover
This is intended to ensure that all nodes have a constant LSN while
making the failover decision.

This feature is experimental and needs to be explicitly enabled with the
configuration file option "standby_disconnect_on_failover".

Note enabling this option will result in a delay in the failover decision
until the WAL receiver is disconnected on all nodes.
2019-03-06 15:53:57 +09:00
Ian Barwick dd04ebb809 repmgrd: handle reconnect to restarted server when using "connection" checks 2019-03-06 14:54:05 +09:00
Ian Barwick b4dcda37a1 *_transaction() functions: log error message text as DETAIL
Per behaviour elsewhere.
2019-03-06 12:12:47 +09:00
Ian Barwick 63f7ad546e repmgrd: add option "connection_check_type"
This enable selection of the method repmgrd uses to check whether the upstream
node is available. Possible values are:

 - "ping" (default): uses PQping() to check server availability
 - "connection":  executes a query on the connection to check server
   availability (similar to repmgr3.x).
2019-03-06 12:09:54 +09:00
Ian Barwick 4f83111033 repmgrd: ignore invalid "upstream_last_seen" value 2019-03-05 11:00:29 +09:00
Ian Barwick 92103c5338 Use appendPQExpBufferStr where approrpriate 2019-03-01 16:42:00 +09:00
Ian Barwick 4b89cbd98d Rename "..._primary_last_seen" functions to "..._upstream_last_seen"
As that better reflects what they do.
2019-02-28 15:36:55 +09:00
Ian Barwick 0330fa6e62 daemon status: with csv output, show repmgrd status as unknown where appropriate
Previously, if PostgreSQL was not running on the node, repmgrd and
pause status were shown as "0", implying their status was known.

This brings the csv output in line with the human-readable output,
which displays "n/a" in this case.
2019-02-28 12:27:39 +09:00
Ian Barwick 4006f8af3c doc: upate release notes 2019-02-28 10:01:51 +09:00
Ian Barwick b1875a8d91 Split command execution functions into separate library
These may need to be executed by repmgrd.
2019-02-27 14:41:17 +09:00
Ian Barwick 5c2264eb8d Update .gitignore
Ignore artefacts from failed patch application.
2019-02-27 13:02:30 +09:00
Ian Barwick a6c16541c2 doc: tweak wording in event notification documentation 2019-02-27 13:01:19 +09:00
Ian Barwick 790a1cc492 repmgrd: add additional logging during a failover operation 2019-02-27 11:46:05 +09:00
Ian Barwick 067ed82931 Remove unneeded debugging output 2019-02-26 21:16:11 +09:00
Ian Barwick 59f32d74df doc: update introductory blurb 2019-02-26 15:16:46 +09:00
Ian Barwick 0578053875 standby clone: check upstream connections after data copy operation
With long-running copy operations, it's possible the connection(s) to
the primary/source server may go away for some reason, so recheck
their availability before attempting to reuse.
2019-02-26 14:37:05 +09:00
John Naylor 897e3bee14 Doc fix: PostgreSQL 9.4 is no longer considered recent 2019-02-24 12:44:10 +07:00
John Naylor 4e414d2ea0 Fix typo 2019-02-24 10:50:09 +07:00
Ian Barwick ea36609159 Add some missing query error logging 2019-02-23 16:54:07 +09:00
Ian Barwick 0c68018631 repmgrd: log details of nodes which can see primary
If a failover is cancelled because other nodes can still see the primary,
log the identies of those nodes.
2019-02-23 15:55:06 +09:00
Ian Barwick b72c894db4 repmgrd: during failover, check if other nodes have seen the primary
In a situation where only some standbys are cut off from the primary,
a failover would result in a split brain/split cluster situation,
as it's likely one of the cut-off standbys will promote itself, and
other cut-off standbys (but not all standbys) will follow it.

To prevent this happening, interrogate the other sibiling nodes to
check whether they've seen the primary within a reasonably short interval;
if this is the case, do not take any failover action.

This feature is experimental.
2019-02-23 13:03:22 +09:00
Ian Barwick 07097575b1 daemon status: add column "upstream last seen"
This displays the interval (in seconds) since the repmgrd instance on
each node last confirmed its upstream node is available.
2019-02-23 13:03:16 +09:00
Ian Barwick 71d151ca87 Don't check status of logical replication slots
We only want to check the status of physical replication slots
to determine whether a streaming replication standby has become
detached and there is therefore a risk of uncontrolled WAL buildup
on the local node.

It's not feasible to second-guess the state of logical replication
slots.
2019-02-23 10:09:43 +09:00
Ian Barwick 5abec2bb97 doc: clarify replication slot usage with Barman
Barman will usually use one replication slot, but that's generally
preferable to multiple slots.
2019-02-22 13:52:02 +09:00
Ian Barwick de70fd42dc node check: simplify output generation in --is-shutdown-cleanly check 2019-02-22 10:49:06 +09:00
Ian Barwick 99550b91bd standby register: warn if standby is running and connection params provided
Addresses GitHub #552.
2019-02-22 10:31:00 +09:00
John Naylor 70190c37c4 Bring list of supported versions on the doc front page in line with the supported version matrix 2019-02-20 11:41:17 +07:00
Ian Barwick f3fc4e5afb Minor syntax formatting tweak
For consistency.
2019-02-15 19:58:35 +09:00
Ian Barwick 629c552348 primary unregister: ensure correct behaviour when executed on a witness
Fixes GitHub #548.
2019-02-15 19:49:17 +09:00
Ian Barwick 85a97c933f Handle unhandled NodeStatus in switch statement 2019-02-15 19:31:06 +09:00
Ian Barwick 3a5a4388c7 cluster show: differentiate unreachable status
Differentiate between unreachable nodes and nodes which are running
but rejecting connections.
2019-02-15 16:01:55 +09:00
Ian Barwick 9338a9e233 Improve logging output
Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail line
2019-02-15 10:49:56 +09:00
Ian Barwick 7fad2ed2c8 standby switchover: improve error output
It wasn't clear why repmgr thinks the demotion candidate is not
the upstream of the promotion candidate.
2019-02-14 17:22:24 +09:00
Ian Barwick 9305953bd2 Fix history file parsing
Also add additional debugging output.
2019-02-14 15:52:40 +09:00
Ian Barwick aeb9639ed9 node rejoin: add more log detail during rejoin success check
Stating what is actually being checked where might be useful
when diagnosing potential issues.
2019-02-13 15:29:39 +09:00
Ian Barwick bc9e725d05 node rejoin: always emit detail about relative LSNs
Previously repmgr only emitted that if there was a timeline/LSN
mismatch, but it's useful to have confirmation of how it came
to the conclusion that rejoin will succeed.
2019-02-13 15:16:40 +09:00
Ian Barwick 905e108f8f doc: fix typos etc. in "standby follow" reference 2019-02-12 17:24:56 +09:00