Commit Graph

51 Commits

Author SHA1 Message Date
Ian Barwick
2491b8ae52 Add functionality to "pause" repmgrd
In some circumstances, e.g. while performing a switchover, it is essential
that repmgrd does not take any kind of failover action, as this will put
the cluster into an incorrect state.

Previously it was necessary to stop repmgrd on all nodes (or at least
those nodes which repmgrd would consider as promotion candidates), however
this is a cumbersome and potentially risk-prone operation, particularly if the
replication cluster contains more than a couple of servers.

To prevent this issue from occurring, this patch introduces the ability
to "pause" repmgrd on all nodes with a single command ("repmgr daemon pause")
which notifies repmgrd not to take any failover action until the node
is "unpaused" ("repmgr daemon unpause").

"repmgr daemon status" provides an overview of each node and whether repmgrd
is running, and if so whether it is paused.

"repmgr standby switchover" has been modified to automatically pause repmgrd
while carrying out the switchover.

See documentation for further details.
2018-09-27 16:42:10 +09:00
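
A minimal sketch of how a maintenance script might wrap its own work in the "repmgr daemon pause" / "repmgr daemon unpause" commands described in commit 2491b8ae52. The helper names `run_repmgr` and `repmgrd_paused` are hypothetical, and any configuration-file option the local installation needs is omitted. "repmgr standby switchover" already pauses repmgrd itself, so this would only apply to other operations.

```python
from contextlib import contextmanager
import subprocess


def run_repmgr(args):
    """Run a repmgr subcommand and raise if it exits with a non-zero status."""
    subprocess.run(["repmgr", *args], check=True)


@contextmanager
def repmgrd_paused(run=run_repmgr):
    """Pause repmgrd for the duration of the block, then unpause it.

    `run` is injectable so the wrapper can be exercised without a cluster.
    """
    run(["daemon", "pause"])
    try:
        yield
    finally:
        # Unpause even if the maintenance step raised an exception.
        run(["daemon", "unpause"])
```

Used as `with repmgrd_paused(): ...`, the `finally` clause ensures the cluster is not left paused if the wrapped step fails.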
Ian Barwick
688337dec3 repmgr: add "--node-id" option to "cluster cleanup"
Implements GitHub #493.
2018-09-25 15:56:40 +09:00
Ian Barwick
b0a2ee2259 get_all_node_records(): display any error encountered and return success status
In many cases we'll want to bail out with an error if the node list can't
be retrieved for any reason. This saves some repetitive coding.
2018-09-13 10:14:43 +09:00
Ian Barwick
7b33faa09b repmgr: improve "cluster show" output
Only output full contents of connection error messages in --verbose mode,
otherwise it can spew a lot of text onto the screen.
2018-09-07 16:59:54 +09:00
Ian Barwick
c1586e39b7 Log text of failed queries at log level ERROR
Previously query texts were always logged at log level DEBUG, but
that doesn't help much in a normal production environment when
trying to identify the cause of issues.

Also make various other minor improvements to query logging and
handling of database errors.

Implements GitHub #498.
2018-08-29 10:08:52 +09:00
Ian Barwick
e1e59e85d7 repmgr: add "cluster_cleanup" event
GitHub #492.
2018-08-24 09:20:05 +09:00
Ian Barwick
b3f64987cb repmgr: add --csv output to "cluster event"
Implements GitHub #471.
2018-07-13 11:19:42 +09:00
Ian Barwick
4c7c681a14 repmgr: have "cluster show" exit with a non-zero value if issues detected
If any issues are detected (e.g. node not reachable, unexpected node status
etc.), "repmgr cluster show" returns exit code 25 ("ERR_NODE_STATUS").

Note that exit code 25 was introduced recently as "ERR_CLUSTER_CHECK",
however it makes sense to use this to indicate issues detected by any
command which can detect node issues.

Addresses GitHub #456.
2018-07-05 11:03:48 +09:00
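
As an illustration of how a monitoring script could consume the exit code described in commit 4c7c681a14, the sketch below maps the status of "repmgr cluster show" to a boolean. Only the value 25 ("ERR_NODE_STATUS") comes from the commit message. Treating 0 as "no issues" and every other value as an unexpected failure is an assumption, and the function names are hypothetical.

```python
import subprocess

ERR_NODE_STATUS = 25  # exit code named in the commit message


def cluster_is_healthy(returncode):
    """Interpret the exit status of "repmgr cluster show".

    Assumes 0 means no issues were detected; any code other than
    0 or ERR_NODE_STATUS is treated as an unexpected failure.
    """
    if returncode == 0:
        return True
    if returncode == ERR_NODE_STATUS:
        return False
    raise RuntimeError(f"unexpected repmgr exit code: {returncode}")


def check_cluster():
    """Run "repmgr cluster show" and report whether issues were detected."""
    result = subprocess.run(
        ["repmgr", "cluster", "show"], capture_output=True, text=True
    )
    return cluster_is_healthy(result.returncode)
```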
Greg Clough
190104c7db Added "cluster cleanup" to help 2018-06-29 22:54:59 +01:00
Ian Barwick
3b0cde2846 repmgr: cluster check commands - non-zero exit code if node(s) unavailable
Return ERR_CLUSTER_CHECK if one or more nodes were not reachable.

Implements GitHub #447.
2018-06-12 10:30:11 +09:00
Ian Barwick
cf64f9e95c Always initialise t_conninfo_param_list structures 2018-04-03 14:31:24 +09:00
Ian Barwick
9c5e76401f Fix "repmgr cluster crosscheck" output
Addresses GitHub #398.
2018-03-27 16:44:04 +09:00
Ian Barwick
dd45189fa8 "cluster show": output any connection error messages in list of warnings
This ensures any connection errors are displayed by default in a
comprehensible, easily reportable way, and saves having to request/filter
DEBUG output.

Implements GitHub #369.
2018-02-05 10:36:04 +09:00
Ian Barwick
a79c4fae88 "cluster show": minor code cleanup 2018-02-05 10:36:00 +09:00
Ian Barwick
657ed83921 "cluster show": improve handling of database errors
In particular, if running "repmgr cluster show" against a database
without the repmgr metadata, showing the error (rather than just
"no records found" etc.) will provide some clues about the problem.
2018-02-05 10:35:56 +09:00
Ian Barwick
cad12b1fb7 "repmgr cluster event": move query to dbutils.c 2018-01-04 14:55:46 +09:00
Ian Barwick
625187a61e "repmgr cluster events": optionally omit "Details" column with --terse
Implements GitHub #360.
2018-01-04 14:55:34 +09:00
Ian Barwick
26a9e848fd Update copyright notices to 2018 2018-01-02 10:19:46 +09:00
Ian Barwick
a6cc4d80f0 Add "witness register" functionality 2017-11-15 13:47:45 +09:00
Ian Barwick
7c3abe28b9 Standardize terminology on "primary" (in place of "master") 2017-10-24 13:42:50 +09:00
Ian Barwick
e8b74ea897 "repmgr cluster crosscheck": add --csv output
As advertised.
2017-10-04 09:34:19 +09:00
Ian Barwick
b6cd816923 Tidy up some log output 2017-09-12 11:08:41 +09:00
Ian Barwick
cf59944a35 Fix "repmgr cluster --help" output 2017-09-11 21:25:30 +09:00
Ian Barwick
b6b31b15b2 Implement "repmgr cluster cleanup" 2017-09-11 13:48:46 +09:00
Ian Barwick
a9f4a027a7 pgindent run 2017-09-11 11:14:13 +09:00
Ian Barwick
e4f7dc8234 Add copyright notices 2017-09-08 13:27:39 +09:00
Ian Barwick
1c015c72a0 "cluster show": display "location" field too 2017-09-05 14:25:29 +09:00
Ian Barwick
9f0d44373b "cluster show": syntax fix 2017-09-05 13:35:25 +09:00
Ian Barwick
f9f05158d2 Fix "cluster show --csv" output 2017-09-01 00:14:01 +09:00
Ian Barwick
c7423ebb44 Various minor fixes 2017-08-31 23:54:52 +09:00
Ian Barwick
d85b066f45 repmgr cluster show: add warnings with information about discrepancies 2017-08-28 11:19:32 +09:00
Ian Barwick
7a9064cd1b "repmgr cluster events": show node name in output, if available
Nodes can be removed from repmgr.nodes, so we'll only have the historical
ID available via repmgr.events.
2017-08-17 10:49:54 +09:00
Ian Barwick
bbd59ab9a2 Update "repmgr cluster event" documentation and --help output 2017-08-17 10:40:48 +09:00
Ian Barwick
c93fa73a71 Ensure "repmgr cluster events" can filter on node name 2017-08-17 10:22:18 +09:00
Ian Barwick
0ac16f7630 Add more --help output 2017-08-16 17:49:46 +09:00
Ian Barwick
ae30f41de6 Clean up various unhandled memory allocations 2017-08-16 17:09:13 +09:00
Ian Barwick
8ff545f9ae Add --help output for "repmgr cluster" 2017-08-16 16:33:07 +09:00
Ian Barwick
4efc8fb9ce Add placeholder functions for "repmgr $command --help"
There are now too many options to sensibly fit into general --help
output; we'll add separate output for each repmgr command, e.g.
"repmgr node --help".
2017-08-16 13:24:14 +09:00
Ian Barwick
3b2158edbf Initialise variables, where appropriate 2017-08-14 15:11:42 +09:00
Ian Barwick
2499b42ef8 switchover: check for pending archive files on the demotion candidate
If the current primary (demotion candidate) still has any files to archive,
it will delay the shutdown until all files are archived. If there is a
substantial number of files, and/or the archive command executes slowly,
this will probably lead to an unwelcome delay in the switchover process.
2017-08-08 00:37:20 +09:00
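
Commit 2499b42ef8 does not say how the pending-archive check is implemented. One plausible approach, sketched here under that caveat, relies on the fact that PostgreSQL marks each WAL segment awaiting archival with a ".ready" file in the "archive_status" subdirectory of the WAL directory ("pg_wal" from PostgreSQL 10, "pg_xlog" before that). The function name is hypothetical.

```python
from pathlib import Path


def count_pending_archive_files(archive_status_dir):
    """Count WAL segments flagged as awaiting archival.

    PostgreSQL creates "<segment>.ready" for a segment that still has to
    be archived and renames it to "<segment>.done" once archiving succeeds.
    """
    return sum(
        1
        for entry in Path(archive_status_dir).iterdir()
        if entry.name.endswith(".ready")
    )
```

A caller could compare the count against a site-specific threshold before proceeding with the switchover. The commit does not describe such a threshold, so it would be a local choice.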
Ian Barwick
0815accdef Formatting fix 2017-08-03 23:58:25 +09:00
Ian Barwick
42ecf5de74 Add TODO for repmgr cluster show 2017-07-27 18:11:13 +09:00
Ian Barwick
a9b0c16b3c Add "cluster matrix" and "cluster crosscheck" actions 2017-07-26 11:24:33 +09:00
Ian Barwick
b99443b0c8 Improvements to repmgr cluster show
Add documentation; show recovery status in --csv mode.
2017-07-20 10:25:13 +09:00
Ian Barwick
a5c5d9fa40 Show BDR status in "repmgr cluster show" output 2017-07-20 09:23:24 +09:00
Ian Barwick
8dcfbfc313 Improve "repmgr cluster show" display
Rather than simply emit "FAILED" for an unreachable node,
indicate whether its state matches that expected by repmgr.

E.g. the following output:

   ID | Name  | Role    | Status               | Upstream | Connection string
  ----+-------+---------+----------------------+----------+----------------------------------------------------
   1  | node1 | primary | * running            |          | host=localhost dbname=repmgr user=repmgr port=5501
   2  | node2 | standby | ? unreachable        | node1    | host=localhost dbname=repmgr user=repmgr port=5502
   3  | node3 | standby | ! running as primary | node1    | host=localhost dbname=repmgr user=repmgr port=5503

is for a cluster where "node2" has been manually stopped, and "node3"
manually promoted.
2017-07-19 23:16:16 +09:00
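
The status column shown in commit 8dcfbfc313 can be read as a comparison between the role recorded in the repmgr metadata and the role actually observed. The sketch below reproduces only the three cases in that sample output. The function name and parameters are hypothetical, and the real logic in repmgr may cover more states.

```python
def describe_status(expected_role, reachable, actual_role=None):
    """Return a status label in the style of the sample output.

    expected_role: role recorded in the metadata ("primary" or "standby")
    reachable:     whether a connection to the node succeeded
    actual_role:   role the node reports, if it could be reached
    """
    if not reachable:
        return "? unreachable"
    if actual_role == expected_role:
        return "* running"
    # Node is up but not in the role the metadata expects.
    return f"! running as {actual_role}"
```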
Ian Barwick
b79a514660 Improve "repmgr cluster event" output 2017-07-19 22:06:54 +09:00
Ian Barwick
f7d5621941 Improve "repmgr cluster show" output generation 2017-07-19 21:34:53 +09:00
Ian Barwick
49ac9cf9ca Add "repmgr cluster show" 2017-07-19 17:36:21 +09:00
Ian Barwick
675dc5adb3 repmgr cluster event: order output by event_timestamp
Ordering by the derived "timestamp" column doesn't have sufficient
granularity.
2017-05-01 08:37:41 +09:00
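
To illustrate the granularity problem mentioned in commit 675dc5adb3: if the displayed timestamp is truncated to whole seconds, two events within the same second get identical sort keys and their relative order is no longer determined by when they happened. The sketch below is a standalone illustration with made-up event data. It does not reproduce the actual query.

```python
from datetime import datetime, timedelta


def display_timestamp(ts):
    """Format a timestamp to whole seconds, as a derived display column might."""
    return ts.strftime("%Y-%m-%d %H:%M:%S")


def order_by_display(events):
    """Sort by the truncated display string (ties keep their input order)."""
    return sorted(events, key=lambda e: display_timestamp(e[1]))


def order_by_full(events):
    """Sort by the full-precision timestamp."""
    return sorted(events, key=lambda e: e[1])


base = datetime(2017, 5, 1, 8, 37, 41)
# Illustrative events, deliberately listed out of chronological order.
events = [
    ("second_event", base + timedelta(milliseconds=900)),
    ("first_event", base + timedelta(milliseconds=100)),
]
```

With this data, `order_by_full(events)` puts `first_event` first, whereas `order_by_display(events)` sees two equal keys and leaves the events in their original, incorrect order.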