Commit Graph

261 Commits

Author SHA1 Message Date
Ian Barwick
068ecc963d Minor log output fix 2017-08-04 23:58:15 +09:00
Ian Barwick
20eeeef884 Don't try to drop a non-existent slot after switchover 2017-08-04 14:20:38 +09:00
Ian Barwick
972f8394ff Fix slot deletion after switchover 2017-08-04 13:16:46 +09:00
Ian Barwick
82639b6903 Refactor slot name handling
Better to work with the slot name in a node record, rather than
creating a global variable.
2017-08-04 11:56:11 +09:00
Ian Barwick
2c682b31c2 Attempt to delete replication slot on old primary after switchover 2017-08-04 11:55:54 +09:00
Ian Barwick
c34f5c1ed1 Initial switchover code 2017-08-04 09:39:30 +09:00
Ian Barwick
5948cf6cda repmgr standby switchover: add sanity check for pg_rewind usability
pg_rewind will only be executed on a demoted primary if explicitly
requested, to prevent transactions on the primary, which
were never replicated, from being automatically overwritten.

If --force-rewind is provided, we'll need to check pg_rewind
is actually usable before we need to use it.
2017-08-04 00:45:55 +09:00
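The usability check described in 5948cf6cda can be illustrated with a minimal Python sketch. It assumes the check mirrors pg_rewind's documented prerequisites: either wal_log_hints is enabled or the cluster was initialized with data checksums, and full_page_writes is on. The function name and parameters are hypothetical and are not taken from the repmgr source.

```python
def pg_rewind_usable(wal_log_hints, data_checksums, full_page_writes=True):
    """Return (usable, reason) for a node whose settings are already known.

    Hypothetical helper: the arguments stand in for values that would be
    read from the demotion candidate before --force-rewind is honoured.
    """
    if not full_page_writes:
        return False, "full_page_writes must be on"
    if not (wal_log_hints or data_checksums):
        return False, "enable wal_log_hints or initialize with data checksums"
    return True, "pg_rewind prerequisites met"


if __name__ == "__main__":
    usable, reason = pg_rewind_usable(wal_log_hints=False, data_checksums=False)
    print(usable, reason)
```

Running the check before the switchover starts means a missing prerequisite is reported while the old primary is still running, rather than after it has been shut down.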
Ian Barwick
0815accdef Formatting fix 2017-08-03 23:58:25 +09:00
Ian Barwick
7d77fd4072 Log successful switchover event 2017-08-03 17:02:30 +09:00
Ian Barwick
112ca6321a Initial switchover implementation
The repmgr3 implementation required the promotion candidate (standby)
to directly work with the demotion candidate's data directory,
directly execute server control commands etc.

Here we delegate a lot more of that work to the repmgr on the
demotion candidate, which reduces the amount of back-and-forth
over SSH and generally makes things cleaner and smoother.

In particular the repmgr on the demotion candidate will carry
out a thorough check that the node is shut down and report
the last checkpoint LSN to the promotion candidate; this
can then be used to determine whether pg_rewind needs to be
executed on the demoted primary before reintegrating it back
into the cluster (todo).

Also implement "--dry-run" for this action, which will sanity-check the
nodes as far as possible without executing the switchover.

Additionally, some of the new repmgr node commands (or command options)
introduced for this can also be executed by the user to obtain
additional information about the status of each node.
2017-08-03 16:38:37 +09:00
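Commit 112ca6321a leaves the pg_rewind decision as a todo. As a rough Python sketch of how the reported checkpoint LSN might be used, the snippet below parses PostgreSQL's textual LSN format (two hexadecimal numbers separated by a slash) and compares two positions. The comparison rule, the function names and the argument names are assumptions for illustration only; they are not repmgr's actual logic.

```python
def parse_lsn(lsn):
    """Convert a textual LSN such as '0/3000060' to a 64-bit integer."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def rewind_needed(demoted_checkpoint_lsn, promoted_receive_lsn):
    """Assumed rule: the demoted primary needs pg_rewind if its last
    checkpoint lies beyond what the promotion candidate received."""
    return parse_lsn(demoted_checkpoint_lsn) > parse_lsn(promoted_receive_lsn)


if __name__ == "__main__":
    print(rewind_needed("0/3000060", "0/3000028"))
    print(rewind_needed("0/3000028", "0/3000028"))
```

LSNs are converted to integers because comparing them as strings gives wrong answers, for example "A/0" sorts after "10/0" as text although it is the smaller position.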
Ian Barwick
c67aa15581 Make "pgdata" a mandatory configuration file setting
There are some circumstances, e.g. during switchover operations,
where repmgr may need to operate on a data directory while the
server isn't running, in which case there's no way to retrieve
that information.
2017-08-02 23:04:24 +09:00
Ian Barwick
83cda89362 Get data directory for server commands if needed
Also add configuration file option "pgdata" for hard-coding the
node's data directory - if the "repmgr" DB user isn't a superuser
or doesn't have permission to extract the data directory, we'll
need another way of finding out.
2017-08-02 13:16:16 +09:00
Ian Barwick
791640e3b4 repmgrd: never execute "service_promote_command" directly 2017-08-02 12:09:25 +09:00
Ian Barwick
aa528dfdfb Consolidate generation of various server control commands
This is needed for better switchover control, so we can instruct
the remote repmgr to issue the appropriate server command rather
than trying to work out what it should be from the local node.
2017-08-02 12:01:20 +09:00
Ian Barwick
5b7b276ada Make log levels case-insensitive 2017-08-02 09:46:53 +09:00
Ian Barwick
e5d50bbfd5 Separate configuration file queries into a discrete function
Simplifies main application code and makes it easier to reuse
the queries.
2017-08-02 00:04:20 +09:00
Ian Barwick
a1ad62d04e Add "repmgr node restore-config" 2017-08-01 22:13:32 +09:00
Ian Barwick
f023b9c90c Add "repmgr node archive-config" 2017-08-01 17:38:54 +09:00
Ian Barwick
3683d096f1 Avoid using PG_VERSION_NUM in frontend code
Debian.
2017-08-01 10:43:42 +09:00
Ian Barwick
8a5665a421 repmgr node status: add information about current LSN locations for streaming standbys 2017-08-01 10:34:12 +09:00
Ian Barwick
d00cb63179 repmgrd: prevent segfault if no configfile provided 2017-07-31 12:54:23 +09:00
Ian Barwick
fbe74cbee4 Rename repmgr{d}4 binaries to repmgr{d}
This was useful during initial development but is no longer required.
2017-07-31 10:37:15 +09:00
Ian Barwick
8d7d83347a repmgrd: add log line to indicate node recovery detected 2017-07-31 09:58:13 +09:00
Ian Barwick
3582a80e48 Rename package from repmgr4 to repmgr 2017-07-28 12:21:55 +09:00
Ian Barwick
dd73039d02 Update BDR documentation 2017-07-27 21:44:10 +09:00
Ian Barwick
7cf3b9b618 repmgrd: improve logging of BDR monitoring
Also always log information about the event_notification command.
2017-07-27 21:12:41 +09:00
Ian Barwick
0037d58dae Update README 2017-07-27 18:12:29 +09:00
Ian Barwick
5606434a97 Initial BDR failover documentation 2017-07-27 18:11:49 +09:00
Ian Barwick
42ecf5de74 Add TODO for repmgr cluster show 2017-07-27 18:11:13 +09:00
Ian Barwick
4c2ba42000 Update sample configuration file 2017-07-27 18:10:56 +09:00
Ian Barwick
4cf66c33db repmgrd: more fixes to BDR recovery handling 2017-07-27 16:33:41 +09:00
Ian Barwick
b4a655d074 Update README 2017-07-27 16:33:23 +09:00
Ian Barwick
fed6fba4ef repmgrd: more fixes for BDR node recovery 2017-07-27 14:13:39 +09:00
Ian Barwick
dc24d62009 repmgrd: improve BDR recovery handling 2017-07-27 11:53:55 +09:00
Ian Barwick
d8a1799215 Update -?/--help output 2017-07-27 10:08:32 +09:00
Ian Barwick
eff26b496c repmgrd: updates for BDR monitoring 2017-07-27 09:49:53 +09:00
Ian Barwick
a9b0c16b3c Add "cluster matrix" and "cluster crosscheck" actions 2017-07-26 11:24:33 +09:00
Ian Barwick
c3083a0ba0 repmgr node status: add "raw" data columns too 2017-07-25 12:06:42 +09:00
Ian Barwick
2a08317984 repmgr node status: optional CSV output 2017-07-25 11:26:09 +09:00
Ian Barwick
56b2e9bb84 Rename/add configuration file options
In previous versions of repmgr, some options had ambiguous meanings,
and/or were used for slightly different purposes. This way we end
up with a couple more options (most of which probably won't need
adjusting) but greater clarity and flexibility.

Removed:

  master_response_timeout:
    renamed to "async_query_timeout", as this was its main usage

  retry_promote_interval_secs:
    replaced by "primary_notification_timeout"

Added:
  async_query_timeout:
    timeout (in seconds) when executing asynchronous queries

  primary_notification_timeout:
    number of seconds to wait for notification from the new primary
    after a failover

  primary_follow_timeout:
    number of seconds to wait for the new primary to become available
    when executing "repmgr standby follow"
2017-07-25 11:13:32 +09:00
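The option changes listed in 56b2e9bb84 can be sketched as a small Python helper that rewrites old option names in an existing configuration file. This is an illustration only: the helper is hypothetical and not part of repmgr, it assumes simple "key=value" lines, and it assumes the old values carry over unchanged, which the commit does not state for "retry_promote_interval_secs". The first old option is written here as "master_response_timeout", its spelling in repmgr 3.x configuration files.

```python
# Old option name -> new option name, per the commit message above.
RENAMED_OPTIONS = {
    "master_response_timeout": "async_query_timeout",
    "retry_promote_interval_secs": "primary_notification_timeout",
}


def migrate_config_lines(lines):
    """Return the lines with renamed options replaced by their new names."""
    migrated = []
    for line in lines:
        stripped = line.strip()
        # Leave blank lines, comments and non-assignments untouched.
        if not stripped or stripped.startswith("#") or "=" not in stripped:
            migrated.append(line)
            continue
        key, value = stripped.split("=", 1)
        key = key.strip()
        new_key = RENAMED_OPTIONS.get(key, key)
        # Assignment lines are normalized to "key=value" without spaces.
        migrated.append(f"{new_key}={value.strip()}")
    return migrated


if __name__ == "__main__":
    old = ["master_response_timeout=60", "# failover settings", "node_id=1"]
    for line in migrate_config_lines(old):
        print(line)
```

The new "primary_follow_timeout" option has no predecessor, so it is not part of the mapping and would need to be added by hand if the default is unsuitable.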
Ian Barwick
cbe19d5868 repmgr node status: collate output into list
To make output in different formats (e.g. CSV) easier.
2017-07-25 09:27:21 +09:00
Ian Barwick
a793e951b6 Remove unused function
PQExpBuffers are now used to generate SQL, so there is no need to worry
about maximum query length, and they are more flexible for generating
dynamic queries.
2017-07-25 08:22:21 +09:00
Ian Barwick
8a2e4db1bc Add "repmgr node status"
Outputs an overview of a node's status, and emits warnings if any
issues are detected.
2017-07-25 00:39:04 +09:00
Ian Barwick
93c35618a2 Use bdr.bdr_is_active_in_db() when checking for BDR presence 2017-07-24 19:09:09 +09:00
Ian Barwick
d3c2a0f505 repmgrd: record bdr_recovery event on the node which was up
Attempting to write on the recovered node may result in an
error if it has not yet finished starting up.
2017-07-24 18:56:18 +09:00
Ian Barwick
8f2dde3bde repmgrd: log BDR node recovery on the running node, not the recovered node
The recovered node might still be starting up.
2017-07-24 12:50:51 +09:00
Ian Barwick
e9cdf1c870 Add note 2017-07-20 23:57:28 +09:00
Ian Barwick
1a45287e76 Misc updates and fixes 2017-07-20 21:15:55 +09:00
Ian Barwick
b99443b0c8 Improvements to repmgr cluster show
Add documentation; show recovery status in --csv mode.
2017-07-20 10:25:13 +09:00
Ian Barwick
a5c5d9fa40 Show BDR status in "repmgr cluster show" output 2017-07-20 09:23:24 +09:00