Commit Graph

1330 Commits

Author SHA1 Message Date
Ian Barwick de70fd42dc node check: simplify output generation in --is-shutdown-cleanly check 2019-02-22 10:49:06 +09:00
Ian Barwick 99550b91bd standby register: warn if standby is running and connection params provided
Addresses GitHub #552.
2019-02-22 10:31:00 +09:00
John Naylor 70190c37c4 Bring list of supported versions on the doc front page in line with the supported version matrix 2019-02-20 11:41:17 +07:00
Ian Barwick f3fc4e5afb Minor syntax formatting tweak
For consistency.
2019-02-15 19:58:35 +09:00
Ian Barwick 629c552348 primary unregister: ensure correct behaviour when executed on a witness
Fixes GitHub #548.
2019-02-15 19:49:17 +09:00
Ian Barwick 85a97c933f Handle unhandled NodeStatus in switch statement 2019-02-15 19:31:06 +09:00
Ian Barwick 3a5a4388c7 cluster show: differentiate unreachable status
Differentiate between unreachable nodes and nodes which are running
but rejecting connections.
2019-02-15 16:01:55 +09:00
Ian Barwick 9338a9e233 Improve logging output
Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail line
2019-02-15 10:49:56 +09:00
Ian Barwick 7fad2ed2c8 standby switchover: improve error output
It wasn't clear why repmgr thinks the demotion candidate is not
the upstream of the promotion candidate.
2019-02-14 17:22:24 +09:00
Ian Barwick 9305953bd2 Fix history file parsing
Also add additional debugging output.
2019-02-14 15:52:40 +09:00
Ian Barwick aeb9639ed9 node rejoin: add more log detail during rejoin success check
Stating what is actually being checked where might be useful
when diagnosing potential issues.
2019-02-13 15:29:39 +09:00
Ian Barwick bc9e725d05 node rejoin: always emit detail about relative LSNs
Previously repmgr only emitted that if there was a timeline/LSN
mismatch, but it's useful to have confirmation of how it came
to the conclusion that rejoin will succeed.
2019-02-13 15:16:40 +09:00
Ian Barwick 905e108f8f doc: fix typos etc. in "standby follow" reference 2019-02-12 17:24:56 +09:00
Ian Barwick f2362a06fa doc: update "standby switchover" reference 2019-02-12 16:39:13 +09:00
Ian Barwick 7b85cb9f12 doc: update "standby follow" reference
Add note about handling of timeline forks etc.
2019-02-12 16:39:06 +09:00
Ian Barwick 790bec21dd node rejoin: handle case where node to rejoin was primary
In that case the minRecoveryPoint* fields may be empty.
2019-02-12 13:31:25 +09:00
Ian Barwick a0dc673439 "node rejoin": use minRecoveryPointTLI for comparing timelines 2019-02-12 13:31:21 +09:00
Ian Barwick 25019d1cc5 Refactor is_wal_replay_paused() query
Make sure it doesn't emit an error if executed on a node not
in recovery.

The caller should theoretically only execute it on nodes in
recovery, but there are sure to be corner cases where the node
has come out of recovery.
2019-02-12 10:21:05 +09:00
Ian Barwick d00cb767a6 cluster show: don't try to run WAL replay pause query on unreachable node 2019-02-12 10:15:06 +09:00
Ian Barwick 8e0d28d8dc Fix "repmgr daemon --help" output
Per report from Shaun.
2019-02-12 09:20:29 +09:00
yonj1e e146fb4fc3 Fix undeclared 'TRUE' error
GitHub #547.
2019-02-11 16:55:54 +09:00
Ian Barwick 8773543e10 doc: update "daemon (start|stop)" documentation
Clarify various aspects related to configuration.
2019-02-11 10:55:10 +09:00
Ian Barwick a4cd4ee553 doc: fix quoting in "standby switchover" index entries 2019-02-11 10:34:02 +09:00
Ian Barwick a61dd8a750 doc: tweak support text 2019-02-08 15:28:12 +09:00
Ian Barwick 2c84716e66 doc: add information about reporting issues etc.
Useful to have a linkable document listing the information required
to have a chance of troubleshooting issues.
2019-02-08 11:55:42 +09:00
Ian Barwick f1667a7e98 repmgrd: don't consider nodes where repmgrd is not running
If, for whatever reason, repmgrd is not running on a node, but that
node qualifies as promotion candidate, failover will not take place
as that node will never promote itself.

We therefore discount nodes where repmgrd is running as promotion
candidates, which will ensure one node is always promoted.

There is a slight risk here that the node(s) where repmgrd is not running
are further ahead, leading to a timeline fork. It might be possible
to mitigate that by having the "election" leader perform the promote
(or follow) operation.
2019-02-07 17:07:13 +09:00
Ian Barwick b91900f831 doc: clarify "repmgr daemon status" CSV output 2019-02-07 14:55:42 +09:00
Ian Barwick aa1e64ec11 Warn about redundant use of --compact option 2019-02-07 14:35:30 +09:00
Ian Barwick 5d6037303b "daemon status": display node priority
GitHub #541.
2019-02-07 14:35:24 +09:00
Ian Barwick 8aaf6571a0 "cluster show": display node priority
GitHUb #541.
2019-02-07 14:35:21 +09:00
Ian Barwick 9433f80364 "cluster show": warn about nodes with paused WAL replay
We do this in "repmgr daemon status" already, so do it here too for consistency.

Related to GitHub #540.
2019-02-07 13:48:46 +09:00
Ian Barwick aee13aee52 doc: note repmgrd behaviour when WAL replay is paused
Related to GitHub #540.
2019-02-07 13:28:29 +09:00
Ian Barwick f0a0be0248 Remove pointless default allocation in _get_node_record() 2019-02-07 11:41:08 +09:00
Ian Barwick c4332d9a52 repmgrd: forcibly resume WAL replay if paused
If WAL replay is paused, and there is WAL pending replay, a promote command
will be queued until replay is resumed.

As it's conceivable that there are corner cases where one standby with
replay paused has actually received the most WAL, we'll forcibly
resume WAL replay so it can be reliably promoted, if needed.

Related to GitHub #540.
2019-02-07 11:39:48 +09:00
Ian Barwick c7b325e2a4 Add function resume_wal_replay() 2019-02-07 11:33:02 +09:00
Ian Barwick b89941f218 Store WAL replay pause status in ReplInfo struct 2019-02-07 10:24:42 +09:00
Ian Barwick 2b3b1faa20 refactor query in function get_replication_info()
In particular handle all cases where one of the functions called
in the query can return NULL in the query itself.
2019-02-06 15:40:27 +09:00
Ian Barwick b9cd321aed repmgrd: skip LSN checks of 0 priority node
The node will never become a candidate so we can save the round trip
to fetch its LSN.
2019-02-06 14:27:01 +09:00
Ian Barwick 984ce7420b "daemon status": emit warning if WAL replay is paused
Specifically, if WAL replay is paused *and* WAL is pending replay,
this node cannot be promoted until WAL replay is unpaused. In this
state it is not a suitable promotion candidate in a failover situation.
2019-02-06 13:32:20 +09:00
Ian Barwick 464ec6bec3 Ensure conninfo param list is initialized for --recovery-conf-only option 2019-02-06 12:58:09 +09:00
Ian Barwick 3bbbf6daa9 "recovery_file_path" is MAXPGPATH 2019-02-06 10:42:09 +09:00
Ian Barwick cd3312496e Rename functions which return an LSN for clarity 2019-02-06 09:32:53 +09:00
Ian Barwick cce8b76171 "standby switchover": abort if promotion candidate has WAL replay paused
If replay is paused, we can't be really sure that more WAL will be received
between the check and the promote operation, which would risk the promote
operation not taking place during the switchover (it would happen
as soon as WAL replay is resumed and pending WAL is replayed).

Therefore we simply quit with an informative slew of messages and
leave the user to sort it out.

GitHub #540.
2019-02-05 16:32:39 +09:00
Ian Barwick 2a529e7e8b "standby promote": don't promote if replay paused and in archive recovery
It does not appear feasible to predict if there is still WAL waiting to
be replayed from archive. In this case take no action.

GitHub #540.
2019-02-05 14:39:08 +09:00
Ian Barwick f62b3b2868 Fix Pg10+ function names 2019-02-05 13:37:35 +09:00
Ian Barwick 701944c194 "standby promote": add check for WAL replay status if replay is paused
If WAL replay is paused but WAL is still pending replay, PostgreSQL will ignore
the promote request until WAL replay is unpaused. This may lead to the standby
being promoted at an unpredictable point in time outside of repmgr's
control. Moreover it may not be obvious that this is happening, or why, and
it will appear that an apparently successful promotion attempt has not
actually worked.

To prevent this from happening, repmgr will now refuse to promote the
standy if WAL replay is paused *and* WAL is still pending replay.

GitHub #540.
2019-02-05 13:30:37 +09:00
Ian Barwick d8048060a2 doc: rephrase exit code preamble
Previously it kind of implied more than one code can be emitted.
2019-02-05 11:06:26 +09:00
Ian Barwick 31f25856a2 doc: update "repmgr node rejoin" reference 2019-02-05 10:57:23 +09:00
Ian Barwick 92c73b68a0 Clean up dbutils.c
Put functions into the same "section" as noted in the header file.
2019-02-05 09:36:54 +09:00
Ian Barwick 90909e2e42 doc: update source install instructions
Note packages required to compile if the package "build dep"
option is not viable.
2019-02-04 17:09:11 +09:00