Commit Graph

151 Commits

Author SHA1 Message Date
Ian Barwick
64035ef701 "standby register/follow": provide primary node details for event notifications
For events generated by these commands, it may be useful to know details
of the primary node. This makes following additional parameters available
to event notification scripts:

- %p: node ID of the primary
- %a: node name of the primary
- %c: conninfo string for the primary

Implements GitHub #375
2018-02-06 09:36:46 +09:00
Ian Barwick
f96cc3b906 "cluster show": improve handling of database errors
In particular, if running "repmgr cluster show" against a database
without the repmgr metadata, showing the error (rather than just
"no records found" etc.) will provide some clues about the problem.
2018-02-05 10:15:48 +09:00
Ian Barwick
50894b6124 "standby follow": check for replication slot availability on target node 2018-02-02 15:01:23 +09:00
Ian Barwick
3d6437c8f8 repmgr: assume node is actually shutting down if pingable and that's the reported status 2018-01-16 11:17:06 +09:00
Ian Barwick
54b5c8ad94 repmgrd: log execution error in "repmgrd_get_local_node_id()"
That shouldn't happen, but if it does it will make it easier to
identify the issue.
2018-01-16 11:14:04 +09:00
Ian Barwick
ae7963dc64 repmgr: automatically create slot name if missing
It's possible that a node was registered with "use_replication_slots=false"
but that was later changed to "use_replication_slots=true". If the node
was not subsequently re-registered, the node record will contain an empty
slot name, which will cause any slot creation operation during
"standby follow" or "node rejoin" to fail.

To prevent this happening, check for an empty slot name and automatically
set before proceeding.

Addresses GitHub #343.
2018-01-11 11:13:41 +09:00
Ian Barwick
faffb2a6e7 repmgr: catch possible corner case when checking node shutdown status
It's conceivable that PQping is returning "no response" but the
shutdown hasn't quite completed.
2018-01-10 14:56:00 +09:00
Ian Barwick
5d57044118 repmgr: during switchover, correctly detect unclean shutdown status 2018-01-10 12:21:04 +09:00
Ian Barwick
07a88c78a5 repmgr standby switchover: add "%p" event notification parameter
This will contain the node ID of the former primary.
2018-01-10 11:01:00 +09:00
Ian Barwick
aee12dc2c7 "repmgr bdr register": create missing connection replication set if needed
Previously the assumption was that the "repmgr" replication set would be
set up when the nodes are created, however no checks were implemented
and this was not well-documented.

Addresses GitHub #347.
2018-01-04 17:12:52 +09:00
Ian Barwick
c5c86e1ada "repmgr bdr register": improve node name check
We'll use "bdr.bdr_get_local_node_name()" to check the local BDR node
name and the repmgr one match.
2018-01-04 16:07:06 +09:00
Ian Barwick
3d2530d6f9 Fix query in is_active_bdr_node()
Boolean column was not being checked correctly.

Also add detail output in "repmgr node role --check", where the function
is called.
2018-01-04 10:48:31 +09:00
Ian Barwick
b26e400199 "repmgr cluster event": move query to dbutils.c 2018-01-04 10:06:54 +09:00
Ian Barwick
1521657965 Update copyright notices to 2018 2018-01-02 10:20:09 +09:00
Ian Barwick
54a10a0c3f Add diagnostic option "repmgr node check --has-passfile"
This checks if the active libpq version (9.6 and later) has the
"passfile" option, and returns 0 if present, 1 if not.
`
2017-12-05 12:53:04 +09:00
Ian Barwick
270da1294c repmgr: initialise "voting_term" in "repmgr primary register"
This previously happened in the extension SQL code, which could
potentially cause replay problems if installing on a BDR cluster.

As this table is only required for streaming replication failover,
move the initialisation to "repmgr primary register".

Addresses GitHub #344 .
2017-11-28 12:26:33 +09:00
Ian Barwick
f8a0b051c8 repmgr: fix return code output for repmgr node check --action=...
Addresses GitHub #340
2017-11-23 10:35:41 +09:00
Martín Marqués
3e4a5e6ff5 Fix missing FQN for the nodes table.
This bug was not detected before because most users work with the repmgr
user. For that reason, the repmgr schema is already in the search_path
by default.

Add the repmgr schema to the nodes table in the LEFT JOIN used for
cluster show (and in other places)

Signed-off-by: Martín Marqués <martin.marques@2ndquadrant.com>
2017-11-23 10:35:38 +09:00
Ian Barwick
67e27f9ecd Remove unneeded functions 2017-11-20 15:26:32 +09:00
Ian Barwick
3f872cde0c "repmgr node ...": fixes for 9.3
Mainly to account for the lack of replication slots.
2017-11-16 11:26:39 +09:00
Ian Barwick
e331069f53 Escape double-quotes in strings passed to an event notification script
The string in question will be generated internally by repmgr as a simple
one-line string with no control characters etc., so all that needs to be
escaped at the moment are any double quotes.
2017-11-16 10:38:55 +09:00
Ian Barwick
53ebde8f33 repmgrd: don't fail over unless more than 50% of active nodes are visible. 2017-11-15 14:04:41 +09:00
Ian Barwick
e6644305d3 Add "witness unregister" functionality 2017-11-15 14:03:57 +09:00
Ian Barwick
31b856dd9f Add "witness register" functionality 2017-11-15 14:03:54 +09:00
Ian Barwick
d459b92186 Add configuration file "passfile"
This will enable a custom .pgpass to be included in "primary_conninfo"
(provided it's supported by the libpq version on the standby).
2017-11-15 14:03:37 +09:00
Ian Barwick
4d6dc57589 repmgrd: check shared library is loaded
If this isn't the case, "repmgrd" will appear to run but not handle
failover correctly.

Address GitHub #337.
2017-11-15 14:03:18 +09:00
Ian Barwick
9e2fb7ea13 repmgrd: remove unneeded functions 2017-11-09 19:51:18 +09:00
Ian Barwick
03b9475755 repmgrd: fixes to failover handling
get_new_primary() returns NULL if no notification for the new primary has
been received, but the code was expecting it to return UNKNOWN_NODE_ID,
which was causing repmgrd to prematurely drop out of the new primary
detection loop if no notification had been received by the time the loop
started.

Also store the electoral term as a single row, single column table,
to ensure that all repmgrds see the same turn. It is then bumped
by the winning node after it gets promoted.

Various logging improvements.
2017-11-09 19:51:09 +09:00
Ian Barwick
d6c27f8938 Standardize quoting in log messages 2017-10-04 09:34:59 +09:00
Ian Barwick
31c7cb4e9a Fixes for 9.3 support 2017-09-15 17:13:17 +09:00
Ian Barwick
b6b31b15b2 Implement "repmgr cluster cleanup" 2017-09-11 13:48:46 +09:00
Ian Barwick
a9f4a027a7 pgindent run 2017-09-11 11:14:13 +09:00
Ian Barwick
e4f7dc8234 Add copyright notices 2017-09-08 13:27:39 +09:00
Ian Barwick
a28bbd68eb "standby clone": improve replication slots handling
Ensure replication slot is created on the upstream node and deleted from
the source node, if upstream node and source nodes differ.
2017-09-06 12:16:02 +09:00
Ian Barwick
bd07a34472 "standby clone": improve log messages
Make it clearer which nodes are being connected to, and why.
2017-09-06 10:15:52 +09:00
Ian Barwick
1ef00f5a3b repmgrd: parse "follow_command" during cascaded standby failover 2017-09-05 11:19:25 +09:00
Ian Barwick
78e6bdeebe Have repmgrd parse "standby follow --upstream-node-id=%n" 2017-09-04 13:42:50 +09:00
Ian Barwick
9a0f45d7d3 Standardize quotation marks in log messages 2017-09-04 10:56:57 +09:00
Ian Barwick
1445d1002c Clean up log messages
Remove legacy trailing new lines.
2017-09-04 10:52:42 +09:00
Ian Barwick
ed16c32fe7 Check minimum server version for pg_rewind 2017-08-31 13:30:59 +09:00
Ian Barwick
fcd111ac4c Improve logging output during failover process 2017-08-24 22:44:03 +09:00
Ian Barwick
eee8d65259 Update view "replication_status" 2017-08-24 15:05:13 +09:00
Ian Barwick
a659132ea4 repmgrd: write monitoring statistics 2017-08-24 11:49:44 +09:00
Ian Barwick
a0bad5fdc0 General code cleanup 2017-08-16 23:09:02 +09:00
Ian Barwick
8ff545f9ae Add --help output for "repmgr cluster" 2017-08-16 16:33:07 +09:00
Ian Barwick
4c0d719cdb Add replication slot check to "repmgr node check" 2017-08-16 11:17:02 +09:00
Ian Barwick
554673e83e Add "repmgr node check --downstream" 2017-08-15 15:50:46 +09:00
Ian Barwick
10ef30096c "node check": add server role check 2017-08-14 22:57:09 +09:00
Ian Barwick
3b2158edbf Initialise variables, where appropriate 2017-08-14 15:11:42 +09:00
Ian Barwick
eabd56f3be "standby follow": check node system identifiers match 2017-08-14 11:45:08 +09:00