Commit Graph

70 Commits

Author SHA1 Message Date
Ian Barwick
2499b42ef8 switchover: check for pending archive files on the demotion candidate
If the current primary (demotion candidate) still has any files to archive,
it will delay the shutdown until all files are archived. If there is a
substantial number of files, and/or the archive command executes slowly,
this will probably lead to an unwelcome delay in the switchover process.
2017-08-08 00:37:20 +09:00
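
The check described in the commit above can be approximated outside repmgr. The following is a minimal sketch, not repmgr's actual implementation: it relies on PostgreSQL marking each WAL file that awaits archiving with a "<name>.ready" entry in the archive_status subdirectory of the WAL directory. The function name and the wal_dir parameter are hypothetical; the WAL directory is "pg_wal" from PostgreSQL 10 and "pg_xlog" in earlier versions.

```python
import os


def count_pending_archive_files(data_directory, wal_dir="pg_wal"):
    """Return the number of WAL files still waiting to be archived.

    Assumption: a file awaiting archiving has a matching
    "<wal file name>.ready" entry in <wal_dir>/archive_status.
    """
    status_dir = os.path.join(data_directory, wal_dir, "archive_status")
    if not os.path.isdir(status_dir):
        # No status directory found, so nothing can be counted.
        return 0
    return sum(1 for name in os.listdir(status_dir) if name.endswith(".ready"))
```

A caller could compare the returned count against a threshold of its own choosing before starting a switchover.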
Ian Barwick
5948cf6cda repmgr standby switchover: add sanity check for pg_rewind usability
pg_rewind will only be executed on a demoted primary if explicitly
requested, to prevent transactions on the primary, which
were never replicated, from being automatically overwritten.

If --force-rewind is provided, we'll need to check pg_rewind
is actually usable before we need to use it.
2017-08-04 00:45:55 +09:00
Ian Barwick
112ca6321a Initial switchover implementation
The repmgr3 implementation required the promotion candidate (standby)
to directly work with the demotion candidate's data directory,
directly execute server control commands etc.

Here we delegate a lot more of that work to the repmgr on the
demotion candidate, which reduces the amount of back-and-forth
over SSH and generally makes things cleaner and smoother.

In particular the repmgr on the demotion candidate will carry
out a thorough check that the node is shut down and report
the last checkpoint LSN to the promotion candidate; this
can then be used to determine whether pg_rewind needs to be
executed on the demoted primary before reintegrating it back
into the cluster (todo).

Also implement "--dry-run" for this action, which will sanity-check the
nodes as far as possible without executing the switchover.

Additionally some of the new repmgr node commands (or command options)
introduced for this can also be executed by the user to obtain
additional information about the status of each node.
2017-08-03 16:38:37 +09:00
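
The commit above leaves the pg_rewind decision as a todo, so the following is only a sketch of how two LSNs might be compared. PostgreSQL prints an LSN as two hexadecimal numbers separated by a slash (for example "0/3000060"). The function names and the decision rule (rewind if the demoted primary's last checkpoint is ahead of the LSN the promotion candidate has received) are assumptions for illustration, not repmgr's logic.

```python
def parse_lsn(text):
    """Convert an LSN such as "0/3000060" into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def rewind_needed(demoted_checkpoint_lsn, promoted_receive_lsn):
    """Hypothetical rule: the demoted primary has diverged if its last
    checkpoint lies beyond what the promotion candidate received."""
    return parse_lsn(demoted_checkpoint_lsn) > parse_lsn(promoted_receive_lsn)
```

Converting to integers first avoids comparing the textual forms, which would sort incorrectly when the hexadecimal parts differ in length.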
Ian Barwick
aa528dfdfb Consolidate generation of various server control commands
This is needed for better switchover control, so we can instruct
the remote repmgr to issue the appropriate server command rather
than trying to work out what it should be from the local node.
2017-08-02 12:01:20 +09:00
Ian Barwick
e5d50bbfd5 Separate configuration file queries into a discrete function
Simplifies main application code and makes it easier to reuse
the queries.
2017-08-02 00:04:20 +09:00
Ian Barwick
f023b9c90c Add "repmgr node archive-config" 2017-08-01 17:38:54 +09:00
Ian Barwick
8a5665a421 repmgr node status: add information about current LSN locations for streaming standbys 2017-08-01 10:34:12 +09:00
Ian Barwick
dc24d62009 repmgrd: improve BDR recovery handling 2017-07-27 11:53:55 +09:00
Ian Barwick
8a2e4db1bc Add "repmgr node status"
Outputs an overview of a node's status, and emits warnings if any
issues are detected.
2017-07-25 00:39:04 +09:00
Ian Barwick
b99443b0c8 Improvements to repmgr cluster show
Add documentation; show recovery status in --csv mode.
2017-07-20 10:25:13 +09:00
Ian Barwick
49ac9cf9ca Add "repmgr cluster show" 2017-07-19 17:36:21 +09:00
Ian Barwick
23e6440dfd repmgrd: initiate primary monitoring when local node is promoted manually 2017-07-19 11:15:38 +09:00
Ian Barwick
6e270b2faf repmgrd: catch cases where more than one node has initiated voting
The node(s) with higher ID will "yield", leaving the decision making
up to the node with the lower ID.

This happens very rarely, usually when the random delay is close
enough on two or more nodes that vote initiation is simultaneous.
2017-07-18 17:04:24 +09:00
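
The yield rule from the commit above can be stated in a few lines. This is a sketch with hypothetical names, not the repmgrd code: a node that has initiated voting gives way if any other initiating node has a lower ID.

```python
def should_yield(local_node_id, other_initiator_ids):
    """Return True if another vote initiator has a lower node ID,
    in which case the local node leaves the decision to that node."""
    return any(other < local_node_id for other in other_initiator_ids)
```

With initiators 1, 2 and 3, nodes 2 and 3 yield and node 1 carries on with the vote.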
Ian Barwick
a56bb41891 Remove redundant fields from node record struct 2017-07-17 14:11:14 +09:00
Ian Barwick
0dcd479322 Store node status in node record struct 2017-07-17 13:50:17 +09:00
Ian Barwick
ec554e5694 Improve connection handling
Set "connect_timeout" and "fallback_application_name" if not present.
2017-07-17 11:10:37 +09:00
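
A sketch of the "set if not present" behaviour described in the commit above, using a plain dictionary of connection parameters. "connect_timeout" and "fallback_application_name" are real libpq parameters; the function name and the default values used here are assumptions for illustration.

```python
def apply_connection_defaults(params, connect_timeout="2",
                              application_name="repmgr"):
    """Return a copy of params with defaults added only where the
    caller has not already supplied a value."""
    result = dict(params)
    result.setdefault("connect_timeout", connect_timeout)
    result.setdefault("fallback_application_name", application_name)
    return result
```

Values supplied by the caller are left unchanged, so an explicit "connect_timeout" in the node's conninfo would still take precedence.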
Ian Barwick
a29bc3e0fa Rename config.[ch] to configfile.[ch] 2017-07-16 09:41:26 +09:00
Ian Barwick
951c7dbd07 repmgrd: in BDR mode, have each repmgrd monitor each node
This will cover both the case when an entire node including
repmgrd goes down, and when one PostgreSQL instance goes down
but repmgrd is still up (in which case only one of the repmgrds
will handle the failover).
2017-07-14 15:01:18 +09:00
Ian Barwick
e3b3fb65f0 repmgrd: restrict BDR monitoring to two node setup
It's not safe to have more than two nodes with this kind of
"failover", so we don't need to select alternative nodes by
priority.
2017-07-14 12:56:11 +09:00
Ian Barwick
dfcf85a62f repmgrd: further BDR sanity checks 2017-07-14 10:27:28 +09:00
Ian Barwick
7eadbf6b17 Various improvements to "repmgr bdr register/unregister" 2017-07-12 22:38:03 +09:00
Ian Barwick
0a1addfdc0 When registering a BDR node, sync repmgr.nodes from another node
If a BDR node is added via bdr_group_join(), repmgr.nodes will
start off empty, so we'll need to sync it ourselves before adding
it to the repmgr replication set.
2017-07-12 10:11:25 +09:00
Ian Barwick
1cccb1dd5a Add "repmgr bdr unregister" 2017-07-12 10:11:21 +09:00
Ian Barwick
71a0871232 Add "repmgr bdr register" 2017-07-11 15:38:58 +09:00
Ian Barwick
2962ffe605 repmgrd: initial BDR monitoring support 2017-07-10 23:58:59 +09:00
Ian Barwick
dddea9814b Add BDR-related database functions 2017-07-10 21:52:39 +09:00
Ian Barwick
5fbcf3e476 Remove witness server references 2017-07-10 09:31:31 +09:00
Ian Barwick
0d226867b4 Add "location" column 2017-07-06 01:17:00 +09:00
Ian Barwick
9351e532b4 Ensure configuration parameter "replication_user" is smaller than NAMEDATALEN 2017-07-06 00:22:23 +09:00
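
A sketch of the length check named in the commit above. NAMEDATALEN is a PostgreSQL compile-time constant, 64 in default builds, which limits identifiers to 63 bytes. The function name is hypothetical and the value 64 is an assumption about the build in use.

```python
NAMEDATALEN = 64  # assumption: default PostgreSQL build value


def is_valid_replication_user(name):
    """Return True if the name fits in a PostgreSQL identifier,
    i.e. its byte length is smaller than NAMEDATALEN."""
    return len(name.encode("utf-8")) < NAMEDATALEN
```

The length is measured in bytes rather than characters, since the limit applies to the encoded identifier.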
Ian Barwick
617dee6bd6 Add function create_event_record()
For logging an event to the event table without generating an external
event notification.

Rename existing create_event_record*() functions to create_event_notification*()
as this describes their function better.
2017-07-05 09:52:22 +09:00
Ian Barwick
24c6b2c9f1 repmgrd: initial code for cascaded standby failover 2017-07-04 23:14:05 +09:00
Ian Barwick
618a2346e1 repmgrd: various fixes, mainly clearing status after a failover event 2017-07-04 11:55:03 +09:00
Ian Barwick
debe5a18c5 have new primary communicate to standbys 2017-06-30 21:45:25 +09:00
Ian Barwick
fc4f276844 Improve handling
not sure if we need to store the electoral term...
2017-06-30 13:40:19 +09:00
Ian Barwick
3514e20367 poke it around until it works less badly 2017-06-29 09:35:09 +09:00
Ian Barwick
fa86fe4ad8 Basic voting 2017-06-29 01:11:21 +09:00
Ian Barwick
d6b6255144 interim commit 2017-06-28 18:20:03 +09:00
Ian Barwick
ded8d95e5a interim commit 2017-06-28 16:38:41 +09:00
Ian Barwick
35b6178e07 placeholder code for function 2017-06-27 09:50:47 +09:00
Ian Barwick
78a16d746d Initial primary node monitoring 2017-06-27 00:15:29 +09:00
Ian Barwick
46c956e61a Use "primary" instead of "master" 2017-06-23 21:33:54 +09:00
Ian Barwick
1b2652037d Rename enum types for consistency 2017-06-23 16:38:14 +09:00
Ian Barwick
dbaa2e0b44 Add a RecordStatus return type for functions which populate record structures
Unify a bunch of slightly different ways of handling the result.
2017-06-23 16:16:46 +09:00
Ian Barwick
6cdf73b4cb repmgr standby promote: suppress master database connection error message
Otherwise the first line of output is an ERROR, which is confusing,
even though it's expected.
2017-06-21 13:21:44 +09:00
Ian Barwick
94a88326ef repmgrd: further code ported 2017-06-20 09:17:29 +09:00
Ian Barwick
030fdc046b repmgr standby follow: main code 2017-06-16 21:38:53 +09:00
Ian Barwick
36b3782009 Store the replication user in repmgr.nodes
When creating recovery.conf outside of "repmgr standby clone",
there was no way of knowing if a replication user had been
explicitly provided with --replication-user, meaning the value
of "primary_conninfo" would be set to the "conninfo" field of the
node's upstream node record.

We'll add an extra column to store the replication user for each
node so it can be referenced at any time.
2017-06-14 23:27:26 +09:00
Ian Barwick
e89c43c5cb Remove unused backup functions
Not needed since removal of rsync functionality
2017-06-13 00:35:01 +09:00
Ian Barwick
bb7d3e41c3 repmgr master unregister: check for downstream nodes
Foreign key dependencies will make it impossible to remove the node
if it still has downstream nodes pointing to it.
2017-06-12 22:24:50 +09:00
Ian Barwick
aa53514f9f repmgr: various fixes for "master unregister" 2017-06-12 08:18:10 +09:00