Commit Graph

370 Commits

Author SHA1 Message Date
Ian Barwick
8773543e10 doc: update "daemon (start|stop)" documentation
Clarify various aspects related to configuration.
2019-02-11 10:55:10 +09:00
Ian Barwick
a4cd4ee553 doc: fix quoting in "standby switchover" index entries 2019-02-11 10:34:02 +09:00
Ian Barwick
a61dd8a750 doc: tweak support text 2019-02-08 15:28:12 +09:00
Ian Barwick
2c84716e66 doc: add information about reporting issues etc.
Useful to have a linkable document listing the information required
to have a chance of troubleshooting issues.
2019-02-08 11:55:42 +09:00
Ian Barwick
f1667a7e98 repmgrd: don't consider nodes where repmgrd is not running
If, for whatever reason, repmgrd is not running on a node, but that
node qualifies as promotion candidate, failover will not take place
as that node will never promote itself.

We therefore discount nodes where repmgrd is running as promotion
candidates, which will ensure one node is always promoted.

There is a slight risk here that the node(s) where repmgrd is not running
are further ahead, leading to a timeline fork. It might be possible
to mitigate that by having the "election" leader perform the promote
(or follow) operation.
2019-02-07 17:07:13 +09:00
Ian Barwick
b91900f831 doc: clarify "repmgr daemon status" CSV output 2019-02-07 14:55:42 +09:00
Ian Barwick
5d6037303b "daemon status": display node priority
GitHub #541.
2019-02-07 14:35:24 +09:00
Ian Barwick
8aaf6571a0 "cluster show": display node priority
GitHUb #541.
2019-02-07 14:35:21 +09:00
Ian Barwick
aee13aee52 doc: note repmgrd behaviour when WAL replay is paused
Related to GitHub #540.
2019-02-07 13:28:29 +09:00
Ian Barwick
cce8b76171 "standby switchover": abort if promotion candidate has WAL replay paused
If replay is paused, we can't be really sure that more WAL will be received
between the check and the promote operation, which would risk the promote
operation not taking place during the switchover (it would happen
as soon as WAL replay is resumed and pending WAL is replayed).

Therefore we simply quit with an informative slew of messages and
leave the user to sort it out.

GitHub #540.
2019-02-05 16:32:39 +09:00
Ian Barwick
2a529e7e8b "standby promote": don't promote if replay paused and in archive recovery
It does not appear feasible to predict if there is still WAL waiting to
be replayed from archive. In this case take no action.

GitHub #540.
2019-02-05 14:39:08 +09:00
Ian Barwick
701944c194 "standby promote": add check for WAL replay status if replay is paused
If WAL replay is paused but WAL is still pending replay, PostgreSQL will ignore
the promote request until WAL replay is unpaused. This may lead to the standby
being promoted at an unpredictable point in time outside of repmgr's
control. Moreover it may not be obvious that this is happening, or why, and
it will appear that an apparently successful promotion attempt has not
actually worked.

To prevent this from happening, repmgr will now refuse to promote the
standy if WAL replay is paused *and* WAL is still pending replay.

GitHub #540.
2019-02-05 13:30:37 +09:00
Ian Barwick
d8048060a2 doc: rephrase exit code preamble
Previously it kind of implied more than one code can be emitted.
2019-02-05 11:06:26 +09:00
Ian Barwick
31f25856a2 doc: update "repmgr node rejoin" reference 2019-02-05 10:57:23 +09:00
Ian Barwick
90909e2e42 doc: update source install instructions
Note packages required to compile if the package "build dep"
option is not viable.
2019-02-04 17:09:11 +09:00
Martín Marqués
b036870c83 doc: fix typo in the release notes for 4.3
GitHub #539

Signed-off-by: Martín Marqués <martin.marques@2ndquadrant.com>
2019-02-04 16:39:58 +09:00
Ian Barwick
321eb844e4 doc: update Debian/Ubuntu repmgrd configuration
Remove reference to setting "repmgrd_pid_file", as this should not
be set in this context.

Per GitHub #517.
2019-02-04 16:11:13 +09:00
Ian Barwick
2c9700586c repmgr: "witness register" - check connection is to primary node
Previously, if the witness server connection details were provided
to "repmgr witness register" rather than those of the primary server,
repmgr a) write the node record to the witness server rather than
the primary, and b) would loop indefinitely trying to copy the
node table to itself.

Addresses GitHub #538.
2019-02-04 14:45:32 +09:00
Ian Barwick
59ed86c01a "cluster show": fix formatting with multiple digit node IDs 2019-02-02 14:07:49 +09:00
Ian Barwick
48381a5b4e Use --compact option for abbreviated display output
--terse is meant for reducing log chatter.
2019-02-02 13:06:59 +09:00
Ian Barwick
a41e7bb726 doc: various minor updates 2019-02-01 17:24:32 +09:00
Ian Barwick
b9ba97a36d "standby switchover": check replication connection to upstream
Ensure repmgr checks the standby (promotion candidate) is currently
attached to the primary (demotion candidate).

Addresses issue reported in GitHub #519.
2019-02-01 15:28:06 +09:00
Ian Barwick
d8aa472c5f doc: fix URL typo 2019-02-01 13:13:11 +09:00
Ian Barwick
9273e7af73 "standby switchover": avoid potential race condition with WAL location check
Immediately after the demotion candidate (primary) has shut down, we can't
be absolutely sure that the walreceiver has flushed all WAL to disk, so
checking pg_last_wal_receive_lsn() at that point might not reflect
the actual last available WAL location.

To handle this, we'll loop for a while (timeout controlled by configuration
parameter "wal_receive_check_timeout") before finally deciding whether
the standby is still behind the shut-down primary.

Addresses issue raised in GitHub #518.
2019-02-01 12:06:22 +09:00
Ian Barwick
c402b08791 doc: update "node rejoin" page
Add some notes about situations where node rejoin cannot work, and
pg_rewind usage.
2019-01-31 12:25:58 +09:00
Ian Barwick
ea54aaa290 Use "rejoin target" instead of "follow target" in "node rejoin" log output 2019-01-31 11:32:38 +09:00
Ian Barwick
19e0b6a1b6 doc: update "node rejoin" documentation
In particular, update examples to reflect changed output in repmgr 4.3.
2019-01-31 10:49:39 +09:00
Ian Barwick
9349171b55 doc: document "node_rejoin_timeout" for switchover operations 2019-01-30 15:43:34 +09:00
Ian Barwick
d7420d7274 daemon (start|stop): verify that repmgrd starts/stops.
Note this may not always be possible for "daemon stop" if we are unable
to determine the repmgrd PID.
2019-01-30 14:41:31 +09:00
Ian Barwick
b6264b77c4 repmgr: mandate explicit configuration for "daemon (start|stop)"
The initial implementation was designed to fall back to "manual"
start/stop of repmgrd if the "repmgrd_service_..._command" parameters
were not set.

However on reflection, this is too much of a potential footgun, so
we will mandate provision of these parameters.
2019-01-30 10:57:06 +09:00
Ian Barwick
9e7cb6d01c doc: make it easier to find info about installation of old packages 2019-01-30 09:45:08 +09:00
Ian Barwick
a5aa47c1dd daemon start/stop: add warning about missing configuration
repmgr will attempt to construct appropriate commands to start
and stop repmgrd, but usually it's preferable for them to be explicitly
defined, particularly if repmgr is installed from packages.
2019-01-29 14:08:00 +09:00
Ian Barwick
7654dd615b Finalize "daemon (start|stop)" commands
Implements GitHub #528.
2019-01-29 13:16:11 +09:00
Ian Barwick
c83e9870fe doc: update "repmgr daemon (start|stop)" documentation 2019-01-29 13:01:36 +09:00
Ian Barwick
ba13172b3a Add initial "repmgr daemon (start|stop)" documentation 2019-01-29 13:01:19 +09:00
Ian Barwick
e5f50e7b99 doc: add additional index entries for repmgrd 2019-01-28 09:52:51 +09:00
Ian Barwick
aeea02b598 doc: update "standby follow" error codes 2019-01-24 10:43:38 +09:00
Ian Barwick
59eca2be30 node rejoin: improve error code handling
- return ERR_REJOIN_FAIL in all cases where the rejoin operation fails
 - ensure ERR_FOLLOW_FAIL is not returned
 - document error codes
2019-01-24 10:31:45 +09:00
Ian Barwick
061932d023 "node rejoin": verify status of rejoin target
This adapts the code previously added to "standby follow" to verify
whether the rejoin target can actually be rejoined.
2019-01-23 17:08:55 +09:00
Ian Barwick
0970789b1d doc: improve package install instructions
Including:
- additional clarification for Pg 9.x RPM package names
- consistent usage of sudo
2019-01-23 12:55:06 +09:00
Ian Barwick
07b79286b5 doc: clarify use-cases for pausing repmgrd 2019-01-23 12:33:57 +09:00
Ian Barwick
c3d284e097 doc: better document relevant PostgreSQL settings
There is a brief section in the Quickstart Guide, but it is hard to find
unless you know it is there.
2019-01-23 12:25:02 +09:00
Ian Barwick
a9e09d436a doc: use "/current/" in URL path
From the next major release, the current documentation will be located in the
"/docs/current/" subdirectory. This makes it easier to provide canonical
links to the latest version of the documentation (similar to how  the
main PostgreSQL documentation is organised).

Once the following major release is available, the documentation will be moved
to a subdirectory with the version number, e.g. "/docs/4.3/".
2019-01-23 10:28:48 +09:00
Ian Barwick
965984a510 doc: update internal documentation links 2019-01-23 10:18:31 +09:00
Abhijit Menon-Sen
99161c38d2 Fix typo 2019-01-21 17:37:01 +05:30
Ian Barwick
57d3ee768c doc: clarify data directory requirement in quickstart guide 2019-01-21 15:12:33 +09:00
Ian Barwick
7dce3ed234 Update copyright notices to 2019 2019-01-21 14:54:35 +09:00
Ian Barwick
58efb0f158 repmgrd: on a cascaded standby, don't fail over if "failover=manual"
Addresses GitHub #531.
2019-01-21 14:16:49 +09:00
Ian Barwick
d261768541 Standardize on --host option 2019-01-17 10:52:41 +09:00
Ian Barwick
aa8547a219 Improve "witness register" documentation, help and logging
Make it clearer that a) the primary server's hostname is required,
and b) how to provide it.

Based on feedback provided in GitHub #529.
2019-01-17 10:42:53 +09:00