Commit Graph

64 Commits

Author SHA1 Message Date
Ian Barwick
c66c8ebc98 repmgr: add --terse mode to "cluster show"
This suppresses display of the usually lengthy "conninfo" column, mainly
useful for generating a compact table suitable for pasting into emails,
chats etc. without messy line breaks.

Implements GitHub #521.
2019-01-09 10:06:37 +09:00
Ian Barwick
455a0bd93f Use make_remote_repmgr_path() in place of make_repmgr_path()
Also we can now simplify "cluster (matrix|crosscheck)" commands as
beginning with v4.0, we know where the configuration file is, so can
provide that when invoking repmgr remotely.
2018-10-02 09:59:18 +09:00
Ian Barwick
11d25e2aef Add configuration parameter "repmgr_bindir"
This is to facilitate remote invocation of repmgr when the repmgr
binary is located somewhere other than the PostgreSQL binary directory, as it
cannot be assumed all package maintainers will install repmgr there.

This parameter is optional; if not set (the default), repmgr will fall back
to "pg_bindir" (if set).

Addresses GitHub #246.
2018-10-02 09:59:12 +09:00
Ian Barwick
2491b8ae52 Add functionality to "pause" repmgrd
In some circumstances, e.g. while performing a switchover, it is essential
that repmgrd does not take any kind of failover action, as this will put
the cluster into an incorrect state.

Previously it was necessary to stop repmgrd on all nodes (or at least
those nodes which repmgrd would consider as promotion candidates), however
this is a cumbersome and potentially risk-prone operation, particularly if the
replication cluster contains more than a couple of servers.

To prevent this issue from occurring, this patch introduces the ability
to "pause" repmgrd on all nodes wth a single command ("repmgr daemon pause")
which notifies repmgrd not to take any failover action until the node
is "unpaused" ("repmgr daemon unpause").

"repmgr daemon status" provides an overview of each node and whether repmgrd
is running, and if so whether it is paused.

"repmgr standby switchover" has been modified to automatically pause repmgrd
while carrying out the switchover.

See documentation for further details.
2018-09-27 16:42:10 +09:00
Ian Barwick
9681708b1a repmgr: improve slot handling in "node rejoin"
On the rejoined node, if a replication slot for the new upstream exists
(which is typically the case after a failover), delete that slot.

Also emit a warning about any inactive replication slots which may need
to be cleaned up manually.

GitHub #499.
2018-08-30 12:24:13 +09:00
Ian Barwick
56919ea499 repmgr: add -q/--quiet option
This suppresses log output below log level ERROR. This is useful mainly
when repmgr is being executed programmatically, e.g. in a cronjob,
where it's only useful to receive output if something goes wrong.

Note we advise against using this option when executing repmgr
commands which operate on PostgreSQL nodes (standby follow,
standby promote, standby switchover, node rejoin), particularly when
executed by repmgrd, as the log output will provide valuable
troubleshooting information.

Implements suggestion in GitHub #468.
2018-07-13 12:09:41 +09:00
Ian Barwick
37311e15a3 repmgr: fix "standby register --wait-sync" when no timeout provided
The default value for "wait_register_sync_seconds" was zero, which is treated
as disabling --wait-sync altogether. Default value now set to -1, which is taken
to mean no timeout value supplied.
2018-07-04 17:22:04 +09:00
Ian Barwick
080a29c33b node check: add --missing-slots check
This enables an explicit check for slots which should exist (according
to the repmgr metadata) but which aren't present.
2018-06-22 17:21:40 +09:00
Ian Barwick
8320179f34 Add configuration file parameter "config_directory"
This enables explicit provision of an external configuration file
directory, which if set will be passed to "pg_ctl" as the -D
parameter. Otherwise "pg_ctl" will default to using the data directory,
which will cause some operations to fail if the configuration files
are not present there.

Note this is implemented primarily for feature completeness and for
development/testing purposes. Users who have installed "repmgr" from
a package should not rely on "pg_ctl" to stop/start/restart PostgreSQL,
instead they should set the appropriate "service_..._command" for their
operating system. For more details see:

    https://repmgr.org/docs/4.0/configuration-service-commands.html

Note: in a future release, the presence of "config_directory" in repmgr.conf
will be used to implictly set "--copy-external-config-files=samepath" when
cloning a standby; this is a behaviour change so will be implemented in the
next major realease (repmgr 4.1).

Implements GitHub #424.
2018-04-25 11:58:24 +09:00
Ian Barwick
3b00dc912a Catch various corner cases when restarting a PostgreSQL instance 2018-04-03 14:40:53 +09:00
Ian Barwick
a3f371b8c0 "node rejoin": actively check for node to rejoin cluster
Previously repmgr was relying on whatever command was configured to
start PostgreSQL to determine whether the node being rejoined had
started correctly. However it's preferable to actively poll the upstream
to confirm it has restarted and actually attached as a standby before
confirming success of the "node rejoin" action.

This can be overridden with the -W/--no-wait option.

(Note that for consistency with other PostgreSQL utilities, the
short form of the --wait option is now "-w"; this is currently
only used in "repmgr standby follow".)

Also update "repmgr node rejoin" documentation with a list of supported
options, and add some useful index entries for "pg_rewind".

Implements GitHub #415.
2018-04-03 10:34:44 +09:00
Ian Barwick
3ccf1cf182 Enable pg_rewind to be used with PostgreSQL 9.3/9.4
pg_rewind is not part of the core distribution for those, but we
provided support in repmgr 3.3 so should extend it to repmgr 4.

Note that there is no check in place whether the pg_rewind binary
exists, so it's up to the user to ensure it's present.

Addresses GitHub #413.
2018-04-02 20:54:29 +09:00
Ian Barwick
9c5e76401f Fix "repmgr cluster crosscheck" output
Addresses GitHub #398.
2018-03-27 16:44:04 +09:00
Ian Barwick
ee98a3a58e "standby clone": add --recovery-conf-only option
This will generate "recovery.conf" for an existing standby.

Typical use-case is a standby cloned manually from an external data
source (e.g. Barman), where "recovery.conf" needs to be created
(and if required a replication slot).

The --dry-run option will check the pre-requisites but not actually
create "recovery.conf" or a replication slot.

This requires that the upstream node is running, a replication connection
can be made and if required a replication slot can be created.

Implements GitHub #382.
2018-02-22 15:50:51 +09:00
Ian Barwick
927bf038a0 "standby switchover": check demotion candidate can make replication connection
Check it's actually possible for the demotion candidate to attach to
the promotion candidate before executing the switchover.

As with other checks of this nature, there's a faint possibility the
situation could change between the time the check is carried out and
the demotion candidate is restarted to connect to the promotion candidate,
but there's not a lot we can do about that. The main purpose is to
be able to catch existing misconfigurations before anything gets changed.

Implements GitHub #370.
2018-02-09 10:00:54 +09:00
Ian Barwick
b705127a34 "repmgr standby register": add --wait-start option
Implements GitHub #356.
2018-01-04 14:56:08 +09:00
Ian Barwick
26a9e848fd Update copyright notices to 2018 2018-01-02 10:19:46 +09:00
Ian Barwick
8c121da8a1 Add diagnostic option "repmgr node check --has-passfile"
This checks if the active libpq version (9.6 and later) has the
"passfile" option, and returns 0 if present, 1 if not.
`
2017-12-11 20:09:48 +09:00
Ian Barwick
30b11c08e6 Disable any configuration settings not compatible with PostgreSQL 9.3
And emit a warning while we're at it.
2017-09-18 13:12:38 +09:00
Ian Barwick
ea2693bc75 Move create_recovery_file() et al to repmgr-action-standby.c
As they're only ever called from there.
2017-09-18 09:53:08 +09:00
Ian Barwick
b6b31b15b2 Implement "repmgr cluster cleanup" 2017-09-11 13:48:46 +09:00
Ian Barwick
a9f4a027a7 pgindent run 2017-09-11 11:14:13 +09:00
Ian Barwick
e4f7dc8234 Add copyright notices 2017-09-08 13:27:39 +09:00
Ian Barwick
edee80cc37 Rename option "node check --is-shutdown" to "--is-shutdown-cleanly"
As that's what we really want to know. Also return "UNCLEAN_SHUTDOWN"
if that's the case, rather than "RUNNING" which is confusing, even
though it's a command for internal use.
2017-09-07 11:15:27 +09:00
Ian Barwick
47a4b49890 Add "repmgr standby follow --upstream-node-id"
In an automatic failover situation, after a standby has been promoted
there's a risk the original primary may become available again before
"standby follow" is issued on another standby node, in which case "standby
follow" will reconnect to the original primary.

As the standby's repmgrd will have received a notification from the new
primary, it will know the primary's ID and can therefore explicitly
direct "standby follow" to follow that primary.
2017-09-04 09:11:59 +09:00
Ian Barwick
91941183bc Use replication user, if set, when checking replication connections 2017-08-31 17:54:49 +09:00
Ian Barwick
0e0b221507 Add configuration file setting "use_primary_conninfo_password"
If, for whatever reason, the upstream server password needs to be set
in "primary_conninfo", enable it to be extracted from $PGPASSWORD.
2017-08-31 14:57:07 +09:00
Ian Barwick
da24d883e5 Remove option "--wal-keep-segments"
This is a remnant of the early repmgr days when there were no alternative
mechanisms for ensuring sufficient WAL remains available while cloning a
standby.

The purpose of this setting was to override a check for an (arbitrary)
minimum setting for "wal_keep_segments". As there's no reliable way
of determining a sensible value for this, and improvements in
pg_basebackup mean WALs can be streamed (possibly using a replication
slot) while the backup is in progress, there's no point in keeping
this around.

We will however still emit a warning about setting "wal_keep_segments"
if the configuration doesn't appear to provide any other way of
ensuring WAL is available during/after the cloning process and
"wal_keep_segments" is not set.
2017-08-17 14:45:13 +09:00
Ian Barwick
b1ba476241 Rename "archiver" check etc. to "archive-ready"
Gives a better indication of what's being checked.
2017-08-17 12:23:56 +09:00
Ian Barwick
4efc8fb9ce Add placeholder functions for "repmgr $command --help"
There are now too many options to sensibly fit into general --help
output; we'll add separate output for each repmgr command, e.g.
"repmgr node --help".
2017-08-16 13:24:14 +09:00
Ian Barwick
4c0d719cdb Add replication slot check to "repmgr node check" 2017-08-16 11:17:02 +09:00
Ian Barwick
554673e83e Add "repmgr node check --downstream" 2017-08-15 15:50:46 +09:00
Ian Barwick
10ef30096c "node check": add server role check 2017-08-14 22:57:09 +09:00
Ian Barwick
fa7d60cd51 "node check": initial general output 2017-08-14 17:32:44 +09:00
Ian Barwick
8a50a72dc5 Additional "node status" output 2017-08-10 17:18:08 +09:00
Ian Barwick
5fb86771b1 Use stored node configuration file path when executing remote commands
Makes life much easier.
2017-08-10 09:12:07 +09:00
Ian Barwick
a57fb5b50c After switchover, enable sibling standbys to follow new primary 2017-08-10 00:06:16 +09:00
Ian Barwick
b1e544f962 Enable use of pg_rewind during switchover operations
But only if required and --force-rewind required, and pg_rewind
can actually be used.
2017-08-09 12:09:37 +09:00
Ian Barwick
f2cf46bba3 Check replication lag before attempting switchover 2017-08-08 10:16:47 +09:00
Ian Barwick
2499b42ef8 switchover: check for pending archive files on the demotion candidate
If the current primary (demotion candidate) still has any files to archive,
it will delay the shutdown until all files are archived. If there is a
substantial number of files, and/or the archive command executes slowly,
this will probably lead to an unwelcome delay in the switchover process.
2017-08-08 00:37:20 +09:00
Ian Barwick
82639b6903 Refactor slot name handling
Better to work with the slot name in a node record, rather than
creating a global variable.
2017-08-04 11:56:11 +09:00
Ian Barwick
5948cf6cda repmgr standby switchover: add sanity check for pg_rewind useability
pg_rewind will only be executed on a demoted primary if explictly
requested, to prevent transactions on the primary, which
were never replicated, from being automatically overwritten.

If --force-rewind is provided, we'll need to check pg_rewind
is actually useable before we need to use it.
2017-08-04 00:45:55 +09:00
Ian Barwick
112ca6321a Initial switchover implementation
The repmgr3 implementation required the promotion candidate (standby)
to directly work with the demotion candidate's data directory,
directly execute server control commands etc.

Here we delegated a lot more of that work to the repmgr on the
demotion candidate, which reduces the amount of back-and-forth
over SSH and generally makes things cleaner and smoother.

In particular the repmgr on the demotion candidate will carry
out a thorough check that the node is shut down and report
the last checkpoint LSN to the promotion candidate; this
can then be used to determine whether pg_rewind needs to be
executed on the demoted primary before reintegrating it back
into the cluster (todo).

Also implement "--dry-run" for this action, which will sanity-check the
nodes as far as possible without executing the switchover.

Additionally some of the new repmgr node commands (or command options)
introduced for this can be also executed by the user to obtain
additional information about the status of each node.
2017-08-03 16:38:37 +09:00
Ian Barwick
83cda89362 Get data directory for server commands if needed
Also add configuration file option "pgdata" for hard-coding the
node's data directory - if the "repmgr" DB user isn't a superuser
or doesn't have permission to extract the data directory, we'll
need another way of finding out.
2017-08-02 13:16:16 +09:00
Ian Barwick
aa528dfdfb Consolidate generation of various server control commands
This is needed for better switchover control, so we can instruct
the remote repmgr to issue the appropriate server command rather
than trying to work out what it should be from the local node.
2017-08-02 12:01:20 +09:00
Ian Barwick
f023b9c90c Add "repmgr node archive-config" 2017-08-01 17:38:54 +09:00
Ian Barwick
a9b0c16b3c Add "cluster matrix" and "cluster crosscheck" actions 2017-07-26 11:24:33 +09:00
Ian Barwick
49ac9cf9ca Add "repmgr cluster show" 2017-07-19 17:36:21 +09:00
Ian Barwick
a29bc3e0fa Rename config.[ch] to configfile.[ch] 2017-07-16 09:41:26 +09:00
Ian Barwick
4e06355b57 Replace repmgr.conf item "upstream_node_id" with --upstream-node-id
This is only relevant when cloning a standby and the node's upstream
can change after failover/switchover etc., so no point keeping the
original value around in the configuration file.
2017-07-04 23:34:20 +09:00