Commit Graph

704 Commits

Author SHA1 Message Date
John Naylor e06d3de444 Correct some doc typos 2019-03-14 11:58:31 +08:00
Ian Barwick 9d056b2f72 doc: expand "standby_disconnect_on_failover" documentation 2019-03-14 12:08:13 +09:00
Ian Barwick c1d6753081 doc: fix option name typo 2019-03-14 09:32:06 +09:00
Ian Barwick 2b59b4894a doc: expand "failover_validate_command" documentation 2019-03-13 21:10:03 +09:00
Ian Barwick 0e2f3e563a doc: various updates 2019-03-13 16:55:32 +09:00
Ian Barwick 8c4421d110 doc: merge repmgrd witness server description into failover section 2019-03-13 16:12:17 +09:00
Ian Barwick 69cb3f1e82 doc: merge repmgrd split network handling description into failover section 2019-03-13 16:12:14 +09:00
Ian Barwick 960acfeb3c doc: merge repmgrd monitoring description into operating section 2019-03-13 16:12:11 +09:00
Ian Barwick a8d50a5b98 doc: merge repmgrd degraded monitoring description into operation section 2019-03-13 16:12:06 +09:00
Ian Barwick 11e5993bf5 doc: merge repmgrd notes into operation documentation 2019-03-13 16:12:03 +09:00
Ian Barwick 09861a5604 doc: merge repmgrd pause documentation into overview 2019-03-13 16:11:59 +09:00
Ian Barwick 89bba77d4d doc: initial repmgrd doc refactoring 2019-03-13 16:11:55 +09:00
Ian Barwick dd6ece326f doc: update repmgrd configuration documentation 2019-03-13 13:34:08 +09:00
Ian Barwick fc397f25f6 repmgrd: enable election rerun
If "failover_validation_command" is set, and the command returns an error,
rerun the election.

There is a pause between reruns to avoid "churn"; the length of this pause
is controlled by the configuration parameter "election_rerun_interval".
2019-03-12 17:12:19 +09:00
Ian Barwick b9cdcd55e7 doc: update list of reloadable repmgrd configuration options 2019-03-11 16:18:10 +09:00
Ian Barwick db87ff46fd doc: document "failover_validation_command" 2019-03-11 15:02:33 +09:00
Ian Barwick 2a8f8d8400 doc: expand repmgrd configuration section 2019-03-11 14:50:33 +09:00
Ian Barwick 63f7ad546e repmgrd: add option "connection_check_type"
This enable selection of the method repmgrd uses to check whether the upstream
node is available. Possible values are:

 - "ping" (default): uses PQping() to check server availability
 - "connection":  executes a query on the connection to check server
   availability (similar to repmgr3.x).
2019-03-06 12:09:54 +09:00
Ian Barwick 0330fa6e62 daemon status: with csv output, show repmgrd status as unknown where appropriate
Previously, if PostgreSQL was not running on the node, repmgrd and
pause status were shown as "0", implying their status was known.

This brings the csv output in line with the human-readable output,
which displays "n/a" in this case.
2019-02-28 12:27:39 +09:00
Ian Barwick 4006f8af3c doc: upate release notes 2019-02-28 10:01:51 +09:00
Ian Barwick a6c16541c2 doc: tweak wording in event notification documentation 2019-02-27 13:01:19 +09:00
Ian Barwick 59f32d74df doc: update introductory blurb 2019-02-26 15:16:46 +09:00
Ian Barwick 0578053875 standby clone: check upstream connections after data copy operation
With long-running copy operations, it's possible the connection(s) to
the primary/source server may go away for some reason, so recheck
their availability before attempting to reuse.
2019-02-26 14:37:05 +09:00
John Naylor 897e3bee14 Doc fix: PostgreSQL 9.4 is no longer considered recent 2019-02-24 12:44:10 +07:00
John Naylor 4e414d2ea0 Fix typo 2019-02-24 10:50:09 +07:00
Ian Barwick 07097575b1 daemon status: add column "upstream last seen"
This displays the interval (in seconds) since the repmgrd instance on
each node last confirmed its upstream node is available.
2019-02-23 13:03:16 +09:00
Ian Barwick 71d151ca87 Don't check status of logical replication slots
We only want to check the status of physical replication slots
to determine whether a streaming replication standby has become
detached and there is therefore a risk of uncontrolled WAL buildup
on the local node.

It's not feasible to second-guess the state of logical replication
slots.
2019-02-23 10:09:43 +09:00
Ian Barwick 5abec2bb97 doc: clarify replication slot usage with Barman
Barman will usually use one replication slot, but that's generally
preferable to multiple slots.
2019-02-22 13:52:02 +09:00
John Naylor 70190c37c4 Bring list of supported versions on the doc front page in line with the supported version matrix 2019-02-20 11:41:17 +07:00
Ian Barwick 629c552348 primary unregister: ensure correct behaviour when executed on a witness
Fixes GitHub #548.
2019-02-15 19:49:17 +09:00
Ian Barwick 3a5a4388c7 cluster show: differentiate unreachable status
Differentiate between unreachable nodes and nodes which are running
but rejecting connections.
2019-02-15 16:01:55 +09:00
Ian Barwick 905e108f8f doc: fix typos etc. in "standby follow" reference 2019-02-12 17:24:56 +09:00
Ian Barwick f2362a06fa doc: update "standby switchover" reference 2019-02-12 16:39:13 +09:00
Ian Barwick 7b85cb9f12 doc: update "standby follow" reference
Add note about handling of timeline forks etc.
2019-02-12 16:39:06 +09:00
Ian Barwick 8773543e10 doc: update "daemon (start|stop)" documentation
Clarify various aspects related to configuration.
2019-02-11 10:55:10 +09:00
Ian Barwick a4cd4ee553 doc: fix quoting in "standby switchover" index entries 2019-02-11 10:34:02 +09:00
Ian Barwick a61dd8a750 doc: tweak support text 2019-02-08 15:28:12 +09:00
Ian Barwick 2c84716e66 doc: add information about reporting issues etc.
Useful to have a linkable document listing the information required
to have a chance of troubleshooting issues.
2019-02-08 11:55:42 +09:00
Ian Barwick f1667a7e98 repmgrd: don't consider nodes where repmgrd is not running
If, for whatever reason, repmgrd is not running on a node, but that
node qualifies as promotion candidate, failover will not take place
as that node will never promote itself.

We therefore discount nodes where repmgrd is running as promotion
candidates, which will ensure one node is always promoted.

There is a slight risk here that the node(s) where repmgrd is not running
are further ahead, leading to a timeline fork. It might be possible
to mitigate that by having the "election" leader perform the promote
(or follow) operation.
2019-02-07 17:07:13 +09:00
Ian Barwick b91900f831 doc: clarify "repmgr daemon status" CSV output 2019-02-07 14:55:42 +09:00
Ian Barwick 5d6037303b "daemon status": display node priority
GitHub #541.
2019-02-07 14:35:24 +09:00
Ian Barwick 8aaf6571a0 "cluster show": display node priority
GitHUb #541.
2019-02-07 14:35:21 +09:00
Ian Barwick aee13aee52 doc: note repmgrd behaviour when WAL replay is paused
Related to GitHub #540.
2019-02-07 13:28:29 +09:00
Ian Barwick cce8b76171 "standby switchover": abort if promotion candidate has WAL replay paused
If replay is paused, we can't be really sure that more WAL will be received
between the check and the promote operation, which would risk the promote
operation not taking place during the switchover (it would happen
as soon as WAL replay is resumed and pending WAL is replayed).

Therefore we simply quit with an informative slew of messages and
leave the user to sort it out.

GitHub #540.
2019-02-05 16:32:39 +09:00
Ian Barwick 2a529e7e8b "standby promote": don't promote if replay paused and in archive recovery
It does not appear feasible to predict if there is still WAL waiting to
be replayed from archive. In this case take no action.

GitHub #540.
2019-02-05 14:39:08 +09:00
Ian Barwick 701944c194 "standby promote": add check for WAL replay status if replay is paused
If WAL replay is paused but WAL is still pending replay, PostgreSQL will ignore
the promote request until WAL replay is unpaused. This may lead to the standby
being promoted at an unpredictable point in time outside of repmgr's
control. Moreover it may not be obvious that this is happening, or why, and
it will appear that an apparently successful promotion attempt has not
actually worked.

To prevent this from happening, repmgr will now refuse to promote the
standy if WAL replay is paused *and* WAL is still pending replay.

GitHub #540.
2019-02-05 13:30:37 +09:00
Ian Barwick d8048060a2 doc: rephrase exit code preamble
Previously it kind of implied more than one code can be emitted.
2019-02-05 11:06:26 +09:00
Ian Barwick 31f25856a2 doc: update "repmgr node rejoin" reference 2019-02-05 10:57:23 +09:00
Ian Barwick 90909e2e42 doc: update source install instructions
Note packages required to compile if the package "build dep"
option is not viable.
2019-02-04 17:09:11 +09:00
Martín Marqués b036870c83 doc: fix typo in the release notes for 4.3
GitHub #539

Signed-off-by: Martín Marqués <martin.marques@2ndquadrant.com>
2019-02-04 16:39:58 +09:00