Commit Graph

126 Commits

Author SHA1 Message Date
Ian Barwick
b1e544f962 Enable use of pg_rewind during switchover operations
But only if a rewind is required, --force-rewind was provided, and
pg_rewind can actually be used.
2017-08-09 12:09:37 +09:00
Ian Barwick
2553839630 Split actual promote functionality of do_standby_promote() into separate function
No need to do all the sanity checks performed by "repmgr standby promote"
when promoting the standby during a switchover operation.
2017-08-08 10:45:56 +09:00
Ian Barwick
f2cf46bba3 Check replication lag before attempting switchover 2017-08-08 10:16:47 +09:00
Ian Barwick
2499b42ef8 switchover: check for pending archive files on the demotion candidate
If the current primary (demotion candidate) still has any files to archive,
it will delay the shutdown until all files are archived. If there is a
substantial number of files, and/or the archive command executes slowly,
this will probably lead to an unwelcome delay in the switchover process.
2017-08-08 00:37:20 +09:00
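As an illustration of the pending-archive check described above, here is a minimal Python sketch, not repmgr's actual C implementation. It relies on the PostgreSQL convention that each WAL segment still waiting to be archived has a `.ready` marker file in the `archive_status` directory. The function names and the warning threshold are hypothetical.

```python
import os


def count_pending_archive_files(archive_status_dir):
    """Count WAL segments still waiting to be archived.

    PostgreSQL marks each such segment with a "<segment>.ready" file in
    the archive_status directory; archived segments get ".done" instead.
    """
    return sum(
        1 for name in os.listdir(archive_status_dir) if name.endswith(".ready")
    )


def archive_backlog_warning(archive_status_dir, threshold=10):
    """Return True if the backlog is likely to delay shutdown.

    The threshold is an arbitrary value chosen for illustration.
    """
    return count_pending_archive_files(archive_status_dir) > threshold
```

The directory would be passed in by the caller, for example the `archive_status` subdirectory under the node's WAL directory.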
Ian Barwick
068ecc963d Minor log output fix 2017-08-04 23:58:15 +09:00
Ian Barwick
20eeeef884 Don't try to drop a non-existent slot after switchover 2017-08-04 14:20:38 +09:00
Ian Barwick
972f8394ff Fix slot deletion after switchover 2017-08-04 13:16:46 +09:00
Ian Barwick
82639b6903 Refactor slot name handling
Better to work with the slot name in a node record, rather than
creating a global variable.
2017-08-04 11:56:11 +09:00
Ian Barwick
2c682b31c2 Attempt to delete replication slot on old primary after switchover 2017-08-04 11:55:54 +09:00
Ian Barwick
c34f5c1ed1 Initial switchover code 2017-08-04 09:39:30 +09:00
Ian Barwick
5948cf6cda repmgr standby switchover: add sanity check for pg_rewind usability
pg_rewind will only be executed on a demoted primary if explicitly
requested, to prevent transactions on the primary, which
were never replicated, from being automatically overwritten.

If --force-rewind is provided, we'll need to check that pg_rewind
is actually usable before we need to use it.
2017-08-04 00:45:55 +09:00
Ian Barwick
7d77fd4072 Log successful switchover event 2017-08-03 17:02:30 +09:00
Ian Barwick
112ca6321a Initial switchover implementation
The repmgr3 implementation required the promotion candidate (standby)
to directly work with the demotion candidate's data directory,
directly execute server control commands etc.

Here we delegate a lot more of that work to the repmgr on the
demotion candidate, which reduces the amount of back-and-forth
over SSH and generally makes things cleaner and smoother.

In particular the repmgr on the demotion candidate will carry
out a thorough check that the node is shut down and report
the last checkpoint LSN to the promotion candidate; this
can then be used to determine whether pg_rewind needs to be
executed on the demoted primary before reintegrating it back
into the cluster (todo).

Also implement "--dry-run" for this action, which will sanity-check the
nodes as far as possible without executing the switchover.

Additionally, some of the new repmgr node commands (or command options)
introduced for this can also be executed by the user to obtain
additional information about the status of each node.
2017-08-03 16:38:37 +09:00
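The LSN comparison mentioned in the commit above can be sketched in Python as follows. This is only an illustration: the commit marks the rewind decision as "todo", so the rule used here (rewind if the demoted primary's last checkpoint is ahead of what the promotion candidate received) is an assumption, and both function names are hypothetical. The only fixed fact is the textual LSN format, two hexadecimal numbers separated by a slash, representing the high and low 32 bits of a 64-bit position.

```python
def parse_lsn(lsn):
    """Convert a textual LSN such as "0/3000060" to a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def rewind_needed(demoted_checkpoint_lsn, promoted_received_lsn):
    """Assumed rule: the demoted primary has diverged if its last
    checkpoint lies beyond the WAL the promotion candidate received."""
    return parse_lsn(demoted_checkpoint_lsn) > parse_lsn(promoted_received_lsn)
```

Comparing the integer values rather than the strings matters, because "0/A000000" sorts after "0/10000000" as text but is the smaller position.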
Ian Barwick
c67aa15581 Make "pgdata" a mandatory configuration file setting
There are some circumstances, e.g. during switchover operations,
where repmgr may need to operate on a data directory while the
server isn't running, in which case there's no way to retrieve
that information.
2017-08-02 23:04:24 +09:00
Ian Barwick
83cda89362 Get data directory for server commands if needed
Also add configuration file option "pgdata" for hard-coding the
node's data directory - if the "repmgr" DB user isn't a superuser
or doesn't have permission to extract the data directory, we'll
need another way of finding out.
2017-08-02 13:16:16 +09:00
Ian Barwick
aa528dfdfb Consolidate generation of various server control commands
This is needed for better switchover control, so we can instruct
the remote repmgr to issue the appropriate server command rather
than trying to work out what it should be from the local node.
2017-08-02 12:01:20 +09:00
Ian Barwick
e5d50bbfd5 Separate configuration file queries into a discrete function
Simplifies main application code and makes it easier to reuse
the queries.
2017-08-02 00:04:20 +09:00
Ian Barwick
f023b9c90c Add "repmgr node archive-config" 2017-08-01 17:38:54 +09:00
Ian Barwick
56b2e9bb84 Rename/add configuration file options
In previous versions of repmgr, some options had ambiguous meanings,
and/or were used for slightly different purposes. This way we end
up with a couple more options (most of which probably won't need
adjusting) but greater clarity and flexibility.

Removed:

  master_response_timeout:
    renamed to "async_query_timeout", as this was its main usage

  retry_promote_interval_secs:
    replaced by "primary_notification_timeout"

Added:
  async_query_timeout:
    timeout (in seconds) when executing asynchronous queries

  primary_notification_timeout:
    number of seconds to wait for notification from the new primary
    after a failover

  primary_follow_timeout:
    number of seconds to wait for the new primary to become available
    when executing "repmgr standby follow"
2017-07-25 11:13:32 +09:00
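For anyone carrying settings over from an older configuration file, the renames listed above could be applied with a small Python helper like the one below. It is a hypothetical sketch, not part of repmgr: it assumes the old option name is spelled `master_response_timeout`, and it assumes the old values can be reused unchanged under the new names, which the commit message does not confirm for the "replaced by" case.

```python
# Old option name -> new option name, taken from the "Removed" list above.
RENAMED_OPTIONS = {
    "master_response_timeout": "async_query_timeout",
    "retry_promote_interval_secs": "primary_notification_timeout",
}


def migrate_options(old_settings):
    """Return a copy of the settings with removed option names replaced.

    Options not mentioned in RENAMED_OPTIONS are passed through as-is.
    """
    return {
        RENAMED_OPTIONS.get(name, name): value
        for name, value in old_settings.items()
    }
```

For example, `migrate_options({"master_response_timeout": "60"})` yields a dictionary keyed by `async_query_timeout`. The newly added `primary_follow_timeout` has no predecessor and would need to be set by hand if its default is unsuitable.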
Ian Barwick
8a2e4db1bc Add "repmgr node status"
Outputs an overview of a node's status, and emits warnings if any
issues are detected.
2017-07-25 00:39:04 +09:00
Ian Barwick
248525ccba Remove unused PQexpBuffer
It was being referenced without being initialised, but the output
which would have been placed there is not used anyway, so discard
completely.
2017-07-18 12:06:26 +09:00
Ian Barwick
ec554e5694 Improve connection handling
Set "connect_timeout" and "fallback_application_name" if not present.
2017-07-17 11:10:37 +09:00
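The "set if not present" behaviour described above can be illustrated with a short Python sketch operating on connection parameters held in a dictionary. `connect_timeout` and `fallback_application_name` are standard libpq connection keywords; the function name and the default values shown are assumptions made for the example and are not taken from the repmgr source.

```python
def apply_conninfo_defaults(
    params, connect_timeout="2", fallback_application_name="repmgr"
):
    """Return a copy of params with defaults added for missing keywords.

    Values already supplied by the user are never overwritten.
    """
    result = dict(params)
    result.setdefault("connect_timeout", connect_timeout)
    result.setdefault("fallback_application_name", fallback_application_name)
    return result
```

Because only missing keys are filled in, a user who sets `connect_timeout=10` in the node's conninfo keeps that value.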
Ian Barwick
951c7dbd07 repmgrd: in BDR mode, have each repmgrd monitor each node
This will cover both the case when an entire node including
repmgrd goes down, and when one PostgreSQL instance goes down
but repmgrd is still up (in which case only one of the repmgrds
will handle the failover).
2017-07-14 15:01:18 +09:00
Ian Barwick
2787994a6e Make repmgrd failover settings configurable 2017-07-07 21:11:22 +09:00
Ian Barwick
b08511ec79 When cloning from Barman, use basebackups_directory
See: https://github.com/2ndQuadrant/repmgr/issues/312
2017-07-07 13:39:10 +09:00
Ian Barwick
0d226867b4 Add "location" column 2017-07-06 01:17:00 +09:00
Ian Barwick
a31d66f826 repmgr standby follow: add event details
Useful to have a confirmation of which node was followed.
2017-07-05 11:36:30 +09:00
Ian Barwick
617dee6bd6 Add function create_event_record()
For logging an event to the event table without generating an external
event notification.

Rename existing create_event_record*() functions to create_event_notification*()
as this describes their function better.
2017-07-05 09:52:22 +09:00
Ian Barwick
4e06355b57 Replace repmgr.conf item "upstream_node_id" with --upstream-node-id
This is only relevant when cloning a standby and the node's upstream
can change after failover/switchover etc., so no point keeping the
original value around in the configuration file.
2017-07-04 23:34:20 +09:00
Ian Barwick
7845a1fb47 Minimum supported version is currently 9.4 2017-06-25 21:46:50 +09:00
Ian Barwick
b64581c582 repmgrd: log startup on primary 2017-06-24 08:44:19 +09:00
Ian Barwick
8117d4dcc4 Various minor fixes 2017-06-23 21:42:28 +09:00
Ian Barwick
46c956e61a Use "primary" instead of "master" 2017-06-23 21:33:54 +09:00
Ian Barwick
1b2652037d Rename enum types for consistency 2017-06-23 16:38:14 +09:00
Ian Barwick
dbaa2e0b44 Add a RecordStatus return type for functions which populate record structures
Unify a bunch of slightly different ways of handling the result.
2017-06-23 16:16:46 +09:00
Ian Barwick
3e3607167c Remove references to --data-dir 2017-06-23 14:13:32 +09:00
Ian Barwick
a5d15c22a8 repmgr standby follow: ensure data directory provided, if required
Required when using host parameters to reactivate a stopped node;
we have no other way of knowing the data directory.
2017-06-23 13:42:07 +09:00
Ian Barwick
8d84732026 repmgr standby follow: suppress master database connection error messages 2017-06-21 14:53:02 +09:00
Ian Barwick
6cdf73b4cb repmgr standby promote: suppress master database connection error message
Otherwise the first line of output is an ERROR, which is confusing,
even though it's expected.
2017-06-21 13:21:44 +09:00
Ian Barwick
0c531e07e7 repmgr standby promote: add detail about an existing master 2017-06-21 10:25:12 +09:00
Ian Barwick
030fdc046b repmgr standby follow: main code 2017-06-16 21:38:53 +09:00
Ian Barwick
7b976ef2df repmgr standby follow: initial code 2017-06-16 00:05:18 +09:00
Ian Barwick
a69f80a9af standby clone: enable overwrite of existing data directory
But only if --force is used, and the instance isn't active.
2017-06-15 22:43:49 +09:00
Ian Barwick
36b3782009 Store the replication user in repmgr.nodes
When creating recovery.conf outside of "repmgr standby clone",
there was no way of knowing if a replication user had been
explicitly provided with --replication-user, meaning the value
of "primary_conninfo" would be set to the "conninfo" field of the
node's upstream node record.

We'll add an extra column to store the replication user for each
node so it can be referenced at any time.
2017-06-14 23:27:26 +09:00
Ian Barwick
6af75a1151 repmgr standby: improve behaviour
- word hint about registering depending on whether record exists or not
- when checking for existing records with same name, check node id
  is different
2017-06-13 09:22:24 +09:00
Ian Barwick
e89c43c5cb Remove unused backup functions
Not needed since removal of rsync functionality
2017-06-13 00:35:01 +09:00
Ian Barwick
13e4913f1f Document events generated by functions 2017-06-12 08:18:10 +09:00
Ian Barwick
aa53514f9f repmgr: various fixes for "master unregister" 2017-06-12 08:18:10 +09:00
Ian Barwick
124398bed5 Replace is_standby() with get_recovery_type()
We want to know what kind of node it is, not whether it's a standby or not.
2017-06-09 11:25:43 +09:00
Ian Barwick
3a56bec4b5 repmgr: remove rsync cloning option 2017-05-31 22:59:35 +09:00