116 Commits

Author SHA1 Message Date
Ian Barwick
5b5b456ecb "standby switchover": improve logging
Also no need to disconnect/reconnect from/to local node while it promotes.
2017-09-05 10:26:27 +09:00
Ian Barwick
d82e936556 "standby promote": improve logging
Specifically state which server is being promoted; this is particularly
important when the promotion occurs as part of a series of other operations,
e.g. "standby switchover".

Also no need to disconnect/reconnect while the server is promoted.
2017-09-05 09:43:16 +09:00
Ian Barwick
78e6bdeebe Have repmgrd parse "standby follow --upstream-node-id=%n" 2017-09-04 13:42:50 +09:00
Ian Barwick
47a4b49890 Add "repmgr standby follow --upstream-node-id"
In an automatic failover situation, after a standby has been promoted
there's a risk the original primary may become available again before
"standby follow" is issued on another standby node, in which case "standby
follow" will reconnect to the original primary.

As the standby's repmgrd will have received a notification from the new
primary, it will know the primary's ID and can therefore explicitly
direct "standby follow" to follow that primary.
2017-09-04 09:11:59 +09:00
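
The explicit-upstream behaviour described in this commit can be illustrated with a minimal sketch. It assumes the "%n" placeholder seen in the related commit 78e6bdeebe stands for the new primary's node ID; the function name `expand_follow_command` is hypothetical and is not part of repmgr.

```python
def expand_follow_command(template, upstream_node_id):
    """Substitute the new primary's node ID for the "%n" placeholder.

    Hypothetical helper: it only illustrates how a follow command
    template could be made explicit about which node to follow.
    """
    return template.replace("%n", str(upstream_node_id))


if __name__ == "__main__":
    # Assumed example: node 2 has just been promoted and announced itself.
    print(expand_follow_command(
        "repmgr standby follow --upstream-node-id=%n", 2))
```

Passing the ID explicitly means the standby does not have to rediscover the primary, so a reappearing original primary cannot be picked by mistake.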
Ian Barwick
3aeceab081 "standby follow": add missing sleep() call when --wait specified 2017-09-02 13:11:03 +09:00
Ian Barwick
edb74ccef9 Various fixes to "repmgr node rejoin" 2017-09-01 11:30:31 +09:00
Ian Barwick
c7423ebb44 Various minor fixes 2017-08-31 23:54:52 +09:00
Ian Barwick
91941183bc Use replication user, if set, when checking replication connections 2017-08-31 17:54:49 +09:00
Ian Barwick
0e0b221507 Add configuration file setting "use_primary_conninfo_password"
If, for whatever reason, the upstream server password needs to be set
in "primary_conninfo", enable it to be extracted from $PGPASSWORD.
2017-08-31 14:57:07 +09:00
Ian Barwick
ae634100a3 "standby switchover": sanity-check remote "repmgr" binary before proceeding 2017-08-31 11:00:14 +09:00
Ian Barwick
8e35a415a9 Refactor extraction of value to use for "primary_conninfo"
Also add improved error detection.

Basically in the worst case we want to enable the user to clone a standby
from Barman even if the upstream node is not running/reachable, as long as
the user explicitly provides a string to use for "primary_conninfo".
2017-08-31 09:59:30 +09:00
Ian Barwick
13f2d46e92 "standby clone": exit with 0 after successful --dry-run 2017-08-30 10:00:48 +09:00
Ian Barwick
8b03859dbe "standby clone": add early sanity check for external configuration files
This still requires an SSH connection, so we need to check early before
the cloning starts, and also emit useful information for --dry-run.
2017-08-29 22:11:16 +09:00
Ian Barwick
b900f9996f "repmgr standby clone": add --dry-run option 2017-08-28 15:04:50 +09:00
Ian Barwick
e05bab8284 "standby switchover": explicitly confirm suitability for --pg-rewind
If --force-rewind requested.
2017-08-28 14:50:08 +09:00
Ian Barwick
754084c814 Update "repmgr standby --help" output 2017-08-26 10:27:22 +09:00
Ian Barwick
57215a8bd7 Add --help output for "standby clone" 2017-08-25 23:07:17 +09:00
Ian Barwick
dc172cae20 When performing a follow operation, start/restart server as appropriate
Before this we were always forcing a restart, which is technically not
a problem but produces some potentially confusing log entries along the
lines of:

  pg_ctl: PID file "/path/to/postmaster.pid" does not exist
  Is server running?
  starting server anyway
2017-08-25 16:50:30 +09:00
Ian Barwick
a449e8512e repmgr: improve "repmgr standby switchover" log output
Particularly in --dry-run mode it's useful to get a confirmation that
various prerequisites are met.
2017-08-25 16:01:11 +09:00
Ian Barwick
2092a55b9e Update README
Document "standby switchover" and additional repmgrd information.
2017-08-25 00:39:22 +09:00
Ian Barwick
fcd111ac4c Improve logging output during failover process 2017-08-24 22:44:03 +09:00
Ian Barwick
ef0163bd84 "standby follow": ensure recovery.conf uses "node_name" as "application_name"
In repmgr4 we want to make it easier to establish which node is connected
to which.
2017-08-22 13:21:29 +09:00
Ian Barwick
4943909282 Fix source server version number checks during "standby clone" 2017-08-21 13:36:11 +09:00
Ian Barwick
594e9e5007 Document upgrade process from repmgr3
Also provide unpackaged extension upgrade SQL, and a script to
assist converting repmgr.conf files.
2017-08-17 23:37:31 +09:00
Ian Barwick
da24d883e5 Remove option "--wal-keep-segments"
This is a remnant of the early repmgr days when there were no alternative
mechanisms for ensuring sufficient WAL remains available while cloning a
standby.

The purpose of this setting was to override a check for an (arbitrary)
minimum setting for "wal_keep_segments". As there's no reliable way
of determining a sensible value for this, and improvements in
pg_basebackup mean WALs can be streamed (possibly using a replication
slot) while the backup is in progress, there's no point in keeping
this around.

We will however still emit a warning about setting "wal_keep_segments"
if the configuration doesn't appear to provide any other way of
ensuring WAL is available during/after the cloning process and
"wal_keep_segments" is not set.
2017-08-17 14:45:13 +09:00
Ian Barwick
b1ba476241 Rename "archiver" check etc. to "archive-ready"
Gives a better indication of what's being checked.
2017-08-17 12:23:56 +09:00
Ian Barwick
a0bad5fdc0 General code cleanup 2017-08-16 23:09:02 +09:00
Ian Barwick
0ac16f7630 Add more --help output 2017-08-16 17:49:46 +09:00
Ian Barwick
4efc8fb9ce Add placeholder functions for "repmgr $command --help"
There are now too many options to sensibly fit into general --help
output; we'll add separate output for each repmgr command, e.g.
"repmgr node --help".
2017-08-16 13:24:14 +09:00
Ian Barwick
3b2158edbf Initialise variables, where appropriate 2017-08-14 15:11:42 +09:00
Ian Barwick
eabd56f3be "standby follow": check node system identifiers match 2017-08-14 11:45:08 +09:00
Ian Barwick
1292e8991a Improve "standby switchover" --dry-run output 2017-08-10 22:43:05 +09:00
Ian Barwick
4f2161bd83 Cleanup various #defines 2017-08-10 15:11:53 +09:00
Ian Barwick
cc52227d61 Miscellaneous cleanup 2017-08-10 15:05:01 +09:00
Ian Barwick
7ca68b7cc8 Standardize "primary_conninfo" generation
Previously repmgr would write all the default libpq parameters
into "primary_conninfo" on "standby clone", but not for
"standby follow", which is inconsistent.

For repmgr4 we'll determine that the upstream node's conninfo
must be canonical and contain all required connection parameters,
even if these are available as defaults or environment variables
in the local environment, as those are transient and may not
be available in all environments/situations.

recovery.conf's "primary_conninfo" will be generated using the
upstream's conninfo parameters, except for those specific
to the downstream node. These are:

  - "application_name": this will always be set to the
    "node_name" of the downstream node
  - "passfile" and "servicefile": these must, of course,
    reference files on the downstream node, so will be extracted
    from the downstream node's conninfo, if set
2017-08-10 12:37:50 +09:00
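
The rule described in this commit can be sketched as follows. This is an illustration only, not repmgr's implementation (which is written in C): the function names and the dictionary representation of conninfo parameters are assumptions, and the parameter names are taken from the commit message as written.

```python
# Parameters that must refer to files on the downstream node
# (names as given in the commit message).
DOWNSTREAM_ONLY = ("passfile", "servicefile")


def quote_value(value):
    """Quote a conninfo value using libpq's rules: single-quote empty
    values and values containing spaces, quotes or backslashes, and
    backslash-escape embedded quotes and backslashes."""
    if value == "" or any(c in value for c in " '\\"):
        escaped = value.replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"
    return value


def build_primary_conninfo(upstream, downstream, node_name):
    """Hypothetical helper: derive "primary_conninfo" from the upstream
    node's conninfo, overriding the downstream-specific parameters."""
    # Start from the upstream's parameters, minus the ones that are
    # specific to the downstream node.
    params = {k: v for k, v in upstream.items()
              if k != "application_name" and k not in DOWNSTREAM_ONLY}
    # "application_name" is always the downstream node's "node_name".
    params["application_name"] = node_name
    # File references come from the downstream node's conninfo, if set.
    for key in DOWNSTREAM_ONLY:
        if key in downstream:
            params[key] = downstream[key]
    return " ".join(f"{k}={quote_value(v)}" for k, v in params.items())
```

For example, with an upstream conninfo of host=node1 user=repmgr and a downstream "node_name" of node2, the result would contain host=node1 user=repmgr application_name=node2, plus the downstream's own "passfile" if one is set.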
Ian Barwick
1cb0adfdcb Finalize switchover process 2017-08-10 09:34:48 +09:00
Ian Barwick
5fb86771b1 Use stored node configuration file path when executing remote commands
Makes life much easier.
2017-08-10 09:12:07 +09:00
Ian Barwick
a57fb5b50c After switchover, enable sibling standbys to follow new primary 2017-08-10 00:06:16 +09:00
Ian Barwick
4930c95ef7 Consolidate final output of "standby follow" / "node rejoin" 2017-08-09 19:31:42 +09:00
Ian Barwick
df425a38b7 Refactor "standby follow" functionality
"standby follow" was originally co-opted to start up a demoted node;
this functionality is now delegated to "node rejoin", with the core
functionality of "standby follow" implemented as an internal function.
2017-08-09 13:26:27 +09:00
Ian Barwick
b1e544f962 Enable use of pg_rewind during switchover operations
But only if required, --force-rewind was specified, and pg_rewind
can actually be used.
2017-08-09 12:09:37 +09:00
Ian Barwick
2553839630 Split actual promote functionality of do_standby_promote() into separate function
No need to do all the sanity checks performed by "repmgr standby promote"
when promoting the standby during a switchover operation.
2017-08-08 10:45:56 +09:00
Ian Barwick
f2cf46bba3 Check replication lag before attempting switchover 2017-08-08 10:16:47 +09:00
Ian Barwick
2499b42ef8 switchover: check for pending archive files on the demotion candidate
If the current primary (demotion candidate) still has any files to archive,
it will delay the shutdown until all files are archived. If there is a
substantial number of files, and/or the archive command executes slowly,
this will probably lead to an unwelcome delay in the switchover process.
2017-08-08 00:37:20 +09:00
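
A check of this kind could look roughly like the sketch below. It assumes pending files are identified by the ".ready" marker files PostgreSQL keeps in the WAL directory's "archive_status" subdirectory; the function names and the threshold parameters are hypothetical, and repmgr's actual check may differ.

```python
from pathlib import Path


def count_pending_archive_files(archive_status_dir):
    """Count ".ready" marker files, i.e. WAL files still awaiting archiving."""
    return sum(1 for p in Path(archive_status_dir).glob("*.ready")
               if p.is_file())


def archive_status(pending, warning_threshold, critical_threshold):
    """Classify the backlog. Both thresholds are hypothetical parameters
    supplied by the caller."""
    if pending >= critical_threshold:
        return "CRITICAL"
    if pending >= warning_threshold:
        return "WARNING"
    return "OK"
```

Running such a check before the switchover begins lets the operator decide whether to wait for archiving to catch up, rather than discovering the delay during shutdown of the demotion candidate.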
Ian Barwick
068ecc963d Minor log output fix 2017-08-04 23:58:15 +09:00
Ian Barwick
20eeeef884 Don't try to drop non-existent slot after switchover 2017-08-04 14:20:38 +09:00
Ian Barwick
972f8394ff Fix slot deletion after switchover 2017-08-04 13:16:46 +09:00
Ian Barwick
82639b6903 Refactor slot name handling
Better to work with the slot name in a node record, rather than
creating a global variable.
2017-08-04 11:56:11 +09:00
Ian Barwick
2c682b31c2 Attempt to delete replication slot on old primary after switchover 2017-08-04 11:55:54 +09:00
Ian Barwick
c34f5c1ed1 Initial switchover code 2017-08-04 09:39:30 +09:00