Ian Barwick
48a2274b11
Use appendPQExpBufferStr where appropriate
2019-03-06 13:23:38 +09:00
Ian Barwick
19bcfa7264
Rename "..._primary_last_seen" functions to "..._upstream_last_seen"
...
As that better reflects what they do.
2019-03-06 13:23:33 +09:00
Ian Barwick
486877c3d5
repmgrd: log details of nodes which can see primary
...
If a failover is cancelled because other nodes can still see the primary,
log the identities of those nodes.
2019-03-06 13:23:27 +09:00
Ian Barwick
9753bcc8c3
repmgrd: during failover, check if other nodes have seen the primary
...
In a situation where only some standbys are cut off from the primary,
a failover would result in a split brain/split cluster situation,
as it's likely one of the cut-off standbys will promote itself, and
other cut-off standbys (but not all standbys) will follow it.
To prevent this happening, interrogate the other sibling nodes to
check whether they've seen the primary within a reasonably short interval;
if this is the case, do not take any failover action.
This feature is experimental.
2019-03-06 13:23:22 +09:00
Ian Barwick
bd35b450da
daemon status: with csv output, show repmgrd status as unknown where appropriate
...
Previously, if PostgreSQL was not running on the node, repmgrd and
pause status were shown as "0", implying their status was known.
This brings the csv output in line with the human-readable output,
which displays "n/a" in this case.
2019-02-28 12:28:04 +09:00
Ian Barwick
1f256d4d73
doc: update release notes
2019-02-28 10:02:05 +09:00
Ian Barwick
1524e2449f
Split command execution functions into separate library
...
These may need to be executed by repmgrd.
2019-02-27 14:41:38 +09:00
Ian Barwick
0cd2bd2e91
repmgrd: add additional logging during a failover operation
2019-02-27 11:45:34 +09:00
Ian Barwick
98b78df16c
Remove unneeded debugging output
2019-02-26 21:17:17 +09:00
Ian Barwick
b946dce2f0
doc: update introductory blurb
2019-02-26 15:19:41 +09:00
Ian Barwick
39234afcbf
standby clone: check upstream connections after data copy operation
...
With long-running copy operations, it's possible the connection(s) to
the primary/source server may go away for some reason, so recheck
their availability before attempting to reuse.
2019-02-26 14:37:51 +09:00
John Naylor
23569a19b1
Doc fix: PostgreSQL 9.4 is no longer considered recent
2019-02-25 13:02:56 +09:00
John Naylor
c650fd3412
Fix typo
2019-02-25 13:02:51 +09:00
Ian Barwick
c30e65b3f2
Add some missing query error logging
2019-02-25 13:02:45 +09:00
Ian Barwick
07097575b1
daemon status: add column "upstream last seen"
...
This displays the interval (in seconds) since the repmgrd instance on
each node last confirmed its upstream node is available.
2019-02-23 13:03:16 +09:00
Ian Barwick
71d151ca87
Don't check status of logical replication slots
...
We only want to check the status of physical replication slots
to determine whether a streaming replication standby has become
detached and there is therefore a risk of uncontrolled WAL buildup
on the local node.
It's not feasible to second-guess the state of logical replication
slots.
2019-02-23 10:09:43 +09:00
Ian Barwick
5abec2bb97
doc: clarify replication slot usage with Barman
...
Barman will usually use one replication slot, but that's generally
preferable to multiple slots.
2019-02-22 13:52:02 +09:00
Ian Barwick
de70fd42dc
node check: simplify output generation in --is-shutdown-cleanly check
2019-02-22 10:49:06 +09:00
Ian Barwick
99550b91bd
standby register: warn if standby is running and connection params provided
...
Addresses GitHub #552.
2019-02-22 10:31:00 +09:00
John Naylor
70190c37c4
Bring list of supported versions on the doc front page in line with the supported version matrix
2019-02-20 11:41:17 +07:00
Ian Barwick
f3fc4e5afb
Minor syntax formatting tweak
...
For consistency.
2019-02-15 19:58:35 +09:00
Ian Barwick
629c552348
primary unregister: ensure correct behaviour when executed on a witness
...
Fixes GitHub #548.
2019-02-15 19:49:17 +09:00
Ian Barwick
85a97c933f
Handle unhandled NodeStatus in switch statement
2019-02-15 19:31:06 +09:00
Ian Barwick
3a5a4388c7
cluster show: differentiate unreachable status
...
Differentiate between unreachable nodes and nodes which are running
but rejecting connections.
2019-02-15 16:01:55 +09:00
Ian Barwick
9338a9e233
Improve logging output
...
Avoid emitting blank detail line
2019-02-15 10:49:56 +09:00
Ian Barwick
7fad2ed2c8
standby switchover: improve error output
...
It wasn't clear why repmgr thinks the demotion candidate is not
the upstream of the promotion candidate.
2019-02-14 17:22:24 +09:00
Ian Barwick
9305953bd2
Fix history file parsing
...
Also add additional debugging output.
2019-02-14 15:52:40 +09:00
Ian Barwick
aeb9639ed9
node rejoin: add more log detail during rejoin success check
...
Stating what is actually being checked where might be useful
when diagnosing potential issues.
2019-02-13 15:29:39 +09:00
Ian Barwick
bc9e725d05
node rejoin: always emit detail about relative LSNs
...
Previously repmgr only emitted that if there was a timeline/LSN
mismatch, but it's useful to have confirmation of how it came
to the conclusion that rejoin will succeed.
2019-02-13 15:16:40 +09:00
Ian Barwick
905e108f8f
doc: fix typos etc. in "standby follow" reference
2019-02-12 17:24:56 +09:00
Ian Barwick
f2362a06fa
doc: update "standby switchover" reference
2019-02-12 16:39:13 +09:00
Ian Barwick
7b85cb9f12
doc: update "standby follow" reference
...
Add note about handling of timeline forks etc.
2019-02-12 16:39:06 +09:00
Ian Barwick
790bec21dd
node rejoin: handle case where node to rejoin was primary
...
In that case the minRecoveryPoint* fields may be empty.
2019-02-12 13:31:25 +09:00
Ian Barwick
a0dc673439
"node rejoin": use minRecoveryPointTLI for comparing timelines
2019-02-12 13:31:21 +09:00
Ian Barwick
25019d1cc5
Refactor is_wal_replay_paused() query
...
Make sure it doesn't emit an error if executed on a node not
in recovery.
The caller should theoretically only execute it on nodes in
recovery, but there are sure to be corner cases where the node
has come out of recovery.
2019-02-12 10:21:05 +09:00
Ian Barwick
d00cb767a6
cluster show: don't try to run WAL replay pause query on unreachable node
2019-02-12 10:15:06 +09:00
Ian Barwick
8e0d28d8dc
Fix "repmgr daemon --help" output
...
Per report from Shaun.
2019-02-12 09:20:29 +09:00
yonj1e
e146fb4fc3
Fix undeclared 'TRUE' error
...
GitHub #547.
2019-02-11 16:55:54 +09:00
Ian Barwick
8773543e10
doc: update "daemon (start|stop)" documentation
...
Clarify various aspects related to configuration.
2019-02-11 10:55:10 +09:00
Ian Barwick
a4cd4ee553
doc: fix quoting in "standby switchover" index entries
2019-02-11 10:34:02 +09:00
Ian Barwick
a61dd8a750
doc: tweak support text
2019-02-08 15:28:12 +09:00
Ian Barwick
2c84716e66
doc: add information about reporting issues etc.
...
Useful to have a linkable document listing the information required
to have a chance of troubleshooting issues.
2019-02-08 11:55:42 +09:00
Ian Barwick
f1667a7e98
repmgrd: don't consider nodes where repmgrd is not running
...
If, for whatever reason, repmgrd is not running on a node, but that
node qualifies as promotion candidate, failover will not take place
as that node will never promote itself.
We therefore discount nodes where repmgrd is not running as promotion
candidates, which will ensure one node is always promoted.
There is a slight risk here that the node(s) where repmgrd is not running
are further ahead, leading to a timeline fork. It might be possible
to mitigate that by having the "election" leader perform the promote
(or follow) operation.
2019-02-07 17:07:13 +09:00
Ian Barwick
b91900f831
doc: clarify "repmgr daemon status" CSV output
2019-02-07 14:55:42 +09:00
Ian Barwick
aa1e64ec11
Warn about redundant use of --compact option
2019-02-07 14:35:30 +09:00
Ian Barwick
5d6037303b
"daemon status": display node priority
...
GitHub #541.
2019-02-07 14:35:24 +09:00
Ian Barwick
8aaf6571a0
"cluster show": display node priority
...
GitHub #541.
2019-02-07 14:35:21 +09:00
Ian Barwick
9433f80364
"cluster show": warn about nodes with paused WAL replay
...
We do this in "repmgr daemon status" already, so do it here too for consistency.
Related to GitHub #540.
2019-02-07 13:48:46 +09:00
Ian Barwick
aee13aee52
doc: note repmgrd behaviour when WAL replay is paused
...
Related to GitHub #540.
2019-02-07 13:28:29 +09:00
Ian Barwick
f0a0be0248
Remove pointless default allocation in _get_node_record()
2019-02-07 11:41:08 +09:00