Creates a Postgres checkpoint after `pg_ctl promote` runs on the former standby and before `pg_rewind` runs on the former master. This fixes the race condition that was reported in https://github.com/2ndQuadrant/repmgr/issues/372
Currently repmgr assumes the SSH hostname will be the same as the
database hostname, and it's easy enough now to extract this
from the node's conninfo string.
We can consider re-adding this in the next release if required.
- The "cluster matrix" command supports CSV mode via the --csv
switch.
- Add the optional ssh_hostname configuration parameter, which is
required by "cluster matrix".
- A corresponding ssh_hostname column has been added to the repl_nodes
table and to the repl_show_nodes view.
This prevents connection error messages being mixed in
with `repmgr cluster show` output. Error message output can
still be enabled with the --verbose flag.
Fixes GitHub #215
This is to ensure that when repmgr executes pg_basebackup it doesn't
add any options which would conflict with user-supplied options.
This is related to GitHub #206, where the -S/--slot option has been
added for 9.6 - it's important to check this doesn't conflict with
-X/--xlog-method.
While we're at it, rename the ErrorList handling code to ItemList
etc. so we can use it for generic non-error-related lists.
Although the witness server will resync the repl_nodes table following
a failover, other operations (e.g. removing or cloning a standby)
were previously not reflected in the witness server's copy of this
table.
As a short-term workaround, automatically resync the table at regular
intervals (defined by the configuration file parameter
"witness_repl_nodes_sync_interval_secs", default 30 seconds).
The difference between this and establish_db_connection() is that
it outputs any connection failure as a [NOTICE] rather than an
[ERROR]; it's intended for use in e.g. polling a server to wait
for it to come up/go down, while preventing [ERROR] log lines
which may cause confusion.
Perform a switchover by:
- stopping current primary node
- promoting this standby node to primary
- forcing previous primary node to follow this node
Caveats:
- repmgrd must not be running, otherwise it may
attempt a failover
(TODO: find some way of notifying repmgrd of planned
activity like this)
- currently only set up for two-node operation; any other
standbys will probably become downstream cascaded standbys
of the old primary once it's restarted
- as we're executing repmgr remotely (on the old primary),
we'll need the location of its configuration file; this
can be provided explicitly with -C/--remote-config-file,
otherwise repmgr will look in default locations on the
remote server
- this does not yet support "rewinding" stopped nodes
which will be unable to catch up with the primary
TODO:
- update help, docs
- make connection test timeouts/intervals configurable
Registering a master creates the schema, but it may be desirable
to forcibly reregister a master without deleting the schema, so
uncouple the dependency.
Also ensure schema creation is atomic by wrapping it in a transaction.
Per GitHub issue #49.
repmgrd correctly updates ID of the upstream node after automatic
failover, but repmgr was not doing that for manual failvers.
This moves the existing function to dbutils and modifies it so that
it does not rely on global variables with configuration (available
just in repmgrd).
This should fix issue #67 (hopefully, haven't done much testing).
Command to be executed each time an event is logged.
Following formatting sequences will be interpolated:
%e - event type
%d - description
%s - success (1 or 0)
%t - timestamp
This makes keeping track of events such as failovers
much easier. Note that this is for convenience and is
not a foolproof auditing log.
Sample output:
repmgr_db=# SELECT * from repmgr_test.repl_events ;
node_id | event | successful | event_timestamp | details
---------+--------------------------+------------+-------------------------------+----------------------------------------------------------
1 | master_register | t | 2015-03-06 14:14:08.196636+09 |
2 | standby_clone | t | 2015-03-06 14:14:17.660768+09 | Backup method: pg_basebackup; --force: N
2 | standby_register | t | 2015-03-06 14:14:18.762222+09 |
4 | witness_create | t | 2015-03-06 14:14:22.072815+09 |
3 | standby_clone | t | 2015-03-06 14:14:23.524673+09 | Backup method: pg_basebackup; --force: N
3 | standby_register | t | 2015-03-06 14:14:24.620161+09 |
2 | repmgrd_start | t | 2015-03-06 14:14:29.639096+09 |
3 | repmgrd_start | t | 2015-03-06 14:14:29.641489+09 |
4 | repmgrd_start | t | 2015-03-06 14:14:29.648002+09 |
2 | standby_promote | t | 2015-03-06 14:15:01.956737+09 | Node 2 was successfully be promoted to master
2 | repmgrd_failover_promote | t | 2015-03-06 14:15:01.964771+09 | Node 2 promoted to master; old master 1 marked as failed
3 | repmgrd_failover_follow | t | 2015-03-06 14:15:07.228493+09 | Node 3 now following new upstream node 2
(12 rows)
This involves mainly abstracting the functions which copy
and create records from repmgr.c to dbutils.c, as they need
to be shared between repmgr and repmgrd.
Per issue noted here:
https://groups.google.com/forum/#!topic/repmgr/v5nu1Xwf6X0
Sometimes it's desirable to re-sync a "stale" data directory
on a standby, rather than start from scratch with pg_basebackup().
This re-adds the rsync code from the 2.x series, with some
modifications.
TODO: tablespace support.
To handle cascaded replication we're going to have to keep track
of each node's upstream node. Also enumerate the node type
("primary", "standby" or "witness") and mark if active.
Use the reported `server_version_num` integer for version number
detection and comparison. This makes it easier to set an arbitrary
minimum supported version (rather than "9.0 or later") as well
as future-proofing for 10.x and later.