John Naylor
23569a19b1
Doc fix: PostgreSQL 9.4 is no longer considered recent
2019-02-25 13:02:56 +09:00
John Naylor
c650fd3412
Fix typo
2019-02-25 13:02:51 +09:00
Ian Barwick
c30e65b3f2
Add some missing query error logging
2019-02-25 13:02:45 +09:00
Ian Barwick
07097575b1
daemon status: add column "upstream last seen"
...
This displays the interval (in seconds) since the repmgrd instance on
each node last confirmed its upstream node is available.
2019-02-23 13:03:16 +09:00
Ian Barwick
71d151ca87
Don't check status of logical replication slots
...
We only want to check the status of physical replication slots
to determine whether a streaming replication standby has become
detached and there is therefore a risk of uncontrolled WAL buildup
on the local node.
It's not feasible to second-guess the state of logical replication
slots.
2019-02-23 10:09:43 +09:00
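The check described in the commit above can be sketched roughly as follows. This is an illustration, not repmgr's C implementation: the function name `inactive_physical_slots` is hypothetical, and the rows are assumed to be dictionaries carrying the `slot_name`, `slot_type` and `active` columns of PostgreSQL's `pg_replication_slots` view.

```python
def inactive_physical_slots(slots):
    """Return the names of inactive physical replication slots.

    Logical slots are skipped entirely, since their state says nothing
    about whether a streaming replication standby has become detached.
    """
    return [
        row["slot_name"]
        for row in slots
        if row["slot_type"] == "physical" and not row["active"]
    ]


# Hypothetical sample rows shaped like pg_replication_slots output
sample_slots = [
    {"slot_name": "repmgr_slot_2", "slot_type": "physical", "active": True},
    {"slot_name": "repmgr_slot_3", "slot_type": "physical", "active": False},
    {"slot_name": "logical_sub_1", "slot_type": "logical", "active": False},
]
```

With the sample rows above, only the inactive physical slot would be flagged; the inactive logical slot is ignored.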
Ian Barwick
5abec2bb97
doc: clarify replication slot usage with Barman
...
Barman will usually use one replication slot, but that's generally
preferable to multiple slots.
2019-02-22 13:52:02 +09:00
Ian Barwick
de70fd42dc
node check: simplify output generation in --is-shutdown-cleanly check
2019-02-22 10:49:06 +09:00
Ian Barwick
99550b91bd
standby register: warn if standby is running and connection params provided
...
Addresses GitHub #552 .
2019-02-22 10:31:00 +09:00
John Naylor
70190c37c4
Bring list of supported versions on the doc front page in line with the supported version matrix
2019-02-20 11:41:17 +07:00
Ian Barwick
f3fc4e5afb
Minor syntax formatting tweak
...
For consistency.
2019-02-15 19:58:35 +09:00
Ian Barwick
629c552348
primary unregister: ensure correct behaviour when executed on a witness
...
Fixes GitHub #548 .
2019-02-15 19:49:17 +09:00
Ian Barwick
85a97c933f
Handle unhandled NodeStatus in switch statement
2019-02-15 19:31:06 +09:00
Ian Barwick
3a5a4388c7
cluster show: differentiate unreachable status
...
Differentiate between unreachable nodes and nodes which are running
but rejecting connections.
2019-02-15 16:01:55 +09:00
Ian Barwick
9338a9e233
Improve logging output
...
Avoid emitting blank detail line
2019-02-15 10:49:56 +09:00
Ian Barwick
7fad2ed2c8
standby switchover: improve error output
...
It wasn't clear why repmgr thinks the demotion candidate is not
the upstream of the promotion candidate.
2019-02-14 17:22:24 +09:00
Ian Barwick
9305953bd2
Fix history file parsing
...
Also add additional debugging output.
2019-02-14 15:52:40 +09:00
Ian Barwick
aeb9639ed9
node rejoin: add more log detail during rejoin success check
...
Stating what is actually being checked where might be useful
when diagnosing potential issues.
2019-02-13 15:29:39 +09:00
Ian Barwick
bc9e725d05
node rejoin: always emit detail about relative LSNs
...
Previously repmgr only emitted that if there was a timeline/LSN
mismatch, but it's useful to have confirmation of how it came
to the conclusion that rejoin will succeed.
2019-02-13 15:16:40 +09:00
Ian Barwick
905e108f8f
doc: fix typos etc. in "standby follow" reference
2019-02-12 17:24:56 +09:00
Ian Barwick
f2362a06fa
doc: update "standby switchover" reference
2019-02-12 16:39:13 +09:00
Ian Barwick
7b85cb9f12
doc: update "standby follow" reference
...
Add note about handling of timeline forks etc.
2019-02-12 16:39:06 +09:00
Ian Barwick
790bec21dd
node rejoin: handle case where node to rejoin was primary
...
In that case the minRecoveryPoint* fields may be empty.
2019-02-12 13:31:25 +09:00
Ian Barwick
a0dc673439
"node rejoin": use minRecoveryPointTLI for comparing timelines
2019-02-12 13:31:21 +09:00
Ian Barwick
25019d1cc5
Refactor is_wal_replay_paused() query
...
Make sure it doesn't emit an error if executed on a node not
in recovery.
The caller should theoretically only execute it on nodes in
recovery, but there are sure to be corner cases where the node
has come out of recovery.
2019-02-12 10:21:05 +09:00
Ian Barwick
d00cb767a6
cluster show: don't try to run WAL replay pause query on unreachable node
2019-02-12 10:15:06 +09:00
Ian Barwick
8e0d28d8dc
Fix "repmgr daemon --help" output
...
Per report from Shaun.
2019-02-12 09:20:29 +09:00
yonj1e
e146fb4fc3
Fix undeclared 'TRUE' error
...
GitHub #547 .
2019-02-11 16:55:54 +09:00
Ian Barwick
8773543e10
doc: update "daemon (start|stop)" documentation
...
Clarify various aspects related to configuration.
2019-02-11 10:55:10 +09:00
Ian Barwick
a4cd4ee553
doc: fix quoting in "standby switchover" index entries
2019-02-11 10:34:02 +09:00
Ian Barwick
a61dd8a750
doc: tweak support text
2019-02-08 15:28:12 +09:00
Ian Barwick
2c84716e66
doc: add information about reporting issues etc.
...
Useful to have a linkable document listing the information required
to have a chance of troubleshooting issues.
2019-02-08 11:55:42 +09:00
Ian Barwick
f1667a7e98
repmgrd: don't consider nodes where repmgrd is not running
...
If, for whatever reason, repmgrd is not running on a node, but that
node qualifies as promotion candidate, failover will not take place
as that node will never promote itself.
We therefore discount nodes where repmgrd is not running as promotion
candidates, which will ensure one node is always promoted.
There is a slight risk here that the node(s) where repmgrd is not running
are further ahead, leading to a timeline fork. It might be possible
to mitigate that by having the "election" leader perform the promote
(or follow) operation.
2019-02-07 17:07:13 +09:00
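A minimal sketch of the candidate filtering described in the commit above, written in Python rather than repmgr's C. The `Node` type, its field names, and the tie-breaking order (most WAL received, then highest priority, then lowest node ID) are assumptions made for illustration and are not taken from the repmgr source.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    node_id: int
    priority: int          # 0 means "never promote this node"
    last_lsn: int          # last received WAL position, as an integer
    repmgrd_running: bool


def pick_promotion_candidate(nodes: List[Node]) -> Optional[Node]:
    """Pick a promotion candidate among the given standbys.

    Nodes without a running repmgrd are discounted, because they would
    never promote themselves. Priority 0 nodes are skipped as well.
    """
    eligible = [n for n in nodes if n.repmgrd_running and n.priority > 0]
    if not eligible:
        return None
    # Assumed ordering: most WAL, then highest priority, then lowest node ID
    return max(eligible, key=lambda n: (n.last_lsn, n.priority, -n.node_id))


# Node 1 is furthest ahead but has no repmgrd; node 3 has priority 0
standbys = [
    Node(node_id=1, priority=100, last_lsn=500, repmgrd_running=False),
    Node(node_id=2, priority=100, last_lsn=400, repmgrd_running=True),
    Node(node_id=3, priority=0, last_lsn=450, repmgrd_running=True),
]
```

In this example node 2 is selected even though node 1 has received more WAL, which is exactly the timeline fork risk the commit message mentions.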
Ian Barwick
b91900f831
doc: clarify "repmgr daemon status" CSV output
2019-02-07 14:55:42 +09:00
Ian Barwick
aa1e64ec11
Warn about redundant use of --compact option
2019-02-07 14:35:30 +09:00
Ian Barwick
5d6037303b
"daemon status": display node priority
...
GitHub #541 .
2019-02-07 14:35:24 +09:00
Ian Barwick
8aaf6571a0
"cluster show": display node priority
...
GitHub #541 .
2019-02-07 14:35:21 +09:00
Ian Barwick
9433f80364
"cluster show": warn about nodes with paused WAL replay
...
We do this in "repmgr daemon status" already, so do it here too for consistency.
Related to GitHub #540 .
2019-02-07 13:48:46 +09:00
Ian Barwick
aee13aee52
doc: note repmgrd behaviour when WAL replay is paused
...
Related to GitHub #540 .
2019-02-07 13:28:29 +09:00
Ian Barwick
f0a0be0248
Remove pointless default allocation in _get_node_record()
2019-02-07 11:41:08 +09:00
Ian Barwick
c4332d9a52
repmgrd: forcibly resume WAL replay if paused
...
If WAL replay is paused, and there is WAL pending replay, a promote command
will be queued until replay is resumed.
As it's conceivable that there are corner cases where one standby with
replay paused has actually received the most WAL, we'll forcibly
resume WAL replay so it can be reliably promoted, if needed.
Related to GitHub #540 .
2019-02-07 11:39:48 +09:00
Ian Barwick
c7b325e2a4
Add function resume_wal_replay()
2019-02-07 11:33:02 +09:00
Ian Barwick
b89941f218
Store WAL replay pause status in ReplInfo struct
2019-02-07 10:24:42 +09:00
Ian Barwick
2b3b1faa20
refactor query in function get_replication_info()
...
In particular handle all cases where one of the functions called
in the query can return NULL in the query itself.
2019-02-06 15:40:27 +09:00
Ian Barwick
b9cd321aed
repmgrd: skip LSN checks of 0 priority node
...
The node will never become a candidate so we can save the round trip
to fetch its LSN.
2019-02-06 14:27:01 +09:00
Ian Barwick
984ce7420b
"daemon status": emit warning if WAL replay is paused
...
Specifically, if WAL replay is paused *and* WAL is pending replay,
this node cannot be promoted until WAL replay is unpaused. In this
state it is not a suitable promotion candidate in a failover situation.
2019-02-06 13:32:20 +09:00
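The condition in the commit above (replay paused *and* WAL pending replay) can be illustrated with a short Python sketch. The helper names `parse_lsn` and `is_promotable` are hypothetical. The LSN strings are assumed to use PostgreSQL's usual `hi/lo` hexadecimal form, as returned by functions such as `pg_last_wal_receive_lsn()` and `pg_last_wal_replay_lsn()` in PostgreSQL 10 and later.

```python
def parse_lsn(text: str) -> int:
    """Convert an LSN such as '0/3000060' into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def is_promotable(receive_lsn: str, replay_lsn: str, replay_paused: bool) -> bool:
    """Return False only when replay is paused and WAL is still pending.

    In that state a promote command would be queued until replay is
    resumed, so the node is not a suitable failover candidate.
    """
    wal_pending = parse_lsn(receive_lsn) > parse_lsn(replay_lsn)
    return not (replay_paused and wal_pending)
```

A node with replay paused but no pending WAL is still treated as promotable here, matching the "paused *and* pending" wording of the commit message.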
Ian Barwick
464ec6bec3
Ensure conninfo param list is initialized for --recovery-conf-only option
2019-02-06 12:58:09 +09:00
Ian Barwick
3bbbf6daa9
"recovery_file_path" is MAXPGPATH
2019-02-06 10:42:09 +09:00
Ian Barwick
cd3312496e
Rename functions which return an LSN for clarity
2019-02-06 09:32:53 +09:00
Ian Barwick
cce8b76171
"standby switchover": abort if promotion candidate has WAL replay paused
...
If replay is paused, we can't be really sure that no more WAL will be received
between the check and the promote operation, which would risk the promote
operation not taking place during the switchover (it would happen
as soon as WAL replay is resumed and pending WAL is replayed).
Therefore we simply quit with an informative slew of messages and
leave the user to sort it out.
GitHub #540 .
2019-02-05 16:32:39 +09:00
Ian Barwick
2a529e7e8b
"standby promote": don't promote if replay paused and in archive recovery
...
It does not appear feasible to predict if there is still WAL waiting to
be replayed from archive. In this case take no action.
GitHub #540 .
2019-02-05 14:39:08 +09:00