Commit Graph

77 Commits

Author SHA1 Message Date
Ian Barwick
10f00b8822 repmgr: pass explicitly provided log level when executing repmgr remotely
This makes it possible to return log output when executing repmgr
remotely at a different level to the one defined in the remote
repmgr's repmgr.conf.

This is particularly useful when DEBUG output is required.
2019-09-17 15:38:43 +09:00
Ian Barwick
d4df0055c9 repmgr: use --compact (not --terse) in "cluster events" to hide details column
This is consistent with usage elsewhere.

"--terse" is intended to reduce logging noise.
2019-05-30 14:19:37 +09:00
Ian Barwick
c560dfbbce cluster show: display timeline ID
This helps provide a better picture of the state of the cluster, i.e.
making it more obvious whether there's been a timeline divergence.

This also provides infrastructure for further improvements in cluster
status display and diagnosis.

Note this is only available in PostgreSQL 9.6 and later as it relies
on the SQL functions for interrogating pg_control, which can be executed
remotely. As PostgreSQL 9.5 will shortly be the only community-supported
version without these functions, it's not worth the effort of trying
to duplicate their functionality.
2019-05-27 09:39:19 +09:00
Ian Barwick
fca033fb9d cluster show/daemon status: report upstream node mismatches
When showing node information, check if the node's copy of its
record shows a different upstream to the one expected according
to the node where the command is executed.

This helps visualise situations where the cluster is in an
unexpected state, and provide a better idea of the actual state.

For example, if a cluster has divided somehow and a set of nodes are
following a new primary, when running "cluster show" etc., repmgr
will now show the name of the primary those nodes are actually
following, rather than the now outdated node name recorded
on the other side of the split. A warning will also be issued
about the situation.
2019-05-14 13:11:31 +09:00
Ian Barwick
ae44012383 Minor code fixes to "cluster show"/"daemon status" formatting 2019-05-14 11:36:59 +09:00
Ian Barwick
2082a8d3f3 Consolidate some code 2019-04-25 16:04:40 +09:00
Ian Barwick
c8d52bab6d cluster show: fix thinko introduced in commit 9fe2fa2 2019-04-25 15:46:07 +09:00
Ian Barwick
9fe2fa2daf daemon status: make output more like that of "cluster show"
In particular make any issues with unexpected server state more
obvious.
2019-04-25 14:45:41 +09:00
Ian Barwick
be9c6d5fc6 Use correct sizeof() argument in a couple of strncpy calls
Source and destination buffers are however the same length in both cases.

Per GitHub #561.
2019-04-04 10:58:00 +09:00
Ian Barwick
799ac6d453 Add is_server_available_quiet()
For use in cases where the caller collates node availability information
and doesn't want to prematurely emit log output.
2019-04-01 12:27:30 +09:00
Ian Barwick
d43975eb5f Use correct argument for sizeof() 2019-03-28 11:02:50 +09:00
Ian Barwick
ba1f05ece9 Restrict "node_name" to maximum 63 characters
In "recovery.conf", the configuration parameter "node_name" is used
as the "application_name" value, which will be truncated by PostgreSQL
to 63 characters (NAMEDATALEN - 1).

repmgr sometimes needs to be able to extract the application name from
pg_stat_replication to determine if a node is connected (e.g. when
executing "repmgr standby register"), so the comparison will fail
if "node_name" exceeds 63 characters.
2019-03-28 10:37:57 +09:00
Ian Barwick
b1875a8d91 Split command execution functions into separate library
These may need to be executed by repmgrd.
2019-02-27 14:41:17 +09:00
Ian Barwick
3a5a4388c7 cluster show: differentiate unreachable status
Differentiate between unreachable nodes and nodes which are running
but rejecting connections.
2019-02-15 16:01:55 +09:00
Ian Barwick
d00cb767a6 cluster show: don't try to run WAL replay pause query on unreachable node 2019-02-12 10:15:06 +09:00
Ian Barwick
8aaf6571a0 "cluster show": display node priority
GitHUb #541.
2019-02-07 14:35:21 +09:00
Ian Barwick
9433f80364 "cluster show": warn about nodes with paused WAL replay
We do this in "repmgr daemon status" already, so do it here too for consistency.

Related to GitHub #540.
2019-02-07 13:48:46 +09:00
Ian Barwick
59ed86c01a "cluster show": fix formatting with multiple digit node IDs 2019-02-02 14:07:49 +09:00
Ian Barwick
48381a5b4e Use --compact option for abbreviated display output
--terse is meant for reducing log chatter.
2019-02-02 13:06:59 +09:00
Ian Barwick
7dce3ed234 Update copyright notices to 2019 2019-01-21 14:54:35 +09:00
Ian Barwick
c66c8ebc98 repmgr: add --terse mode to "cluster show"
This suppresses display of the usually lengthy "conninfo" column, mainly
useful for generating a compact table suitable for pasting into emails,
chats etc. without messy line breaks.

Implements GitHub #521.
2019-01-09 10:06:37 +09:00
Ian Barwick
b89b3c0961 Fix "repmgr cluster cleanup" help output
Table name mentioned was incorrect.
2019-01-08 09:49:43 +09:00
Ian Barwick
3e38759c02 use appendPQExpBufferStr/-Char() consistently 2018-10-04 08:42:42 +09:00
Ian Barwick
7ab81e10de Log SSH errors when running "repmgr cluster (matrix|crosscheck)"
Previously repmgr would abort with an unhelpful message about being
unable to parse CSV output.

With this commit, it will continue running, and display a list of
inaccessible nodes as an addendum to the main output (unless --csv
or --terse options are specified).

Addresses GitHub #246.
2018-10-03 10:12:18 +09:00
Ian Barwick
455a0bd93f Use make_remote_repmgr_path() in place of make_repmgr_path()
Also we can now simplify "cluster (matrix|crosscheck)" commands as
beginning with v4.0, we know where the configuration file is, so can
provide that when invoking repmgr remotely.
2018-10-02 09:59:18 +09:00
Ian Barwick
11d25e2aef Add configuration parameter "repmgr_bindir"
This is to facilitate remote invocation of repmgr when the repmgr
binary is located somewhere other than the PostgreSQL binary directory, as it
cannot be assumed all package maintainers will install repmgr there.

This parameter is optional; if not set (the default), repmgr will fall back
to "pg_bindir" (if set).

Addresses GitHub #246.
2018-10-02 09:59:12 +09:00
Ian Barwick
2491b8ae52 Add functionality to "pause" repmgrd
In some circumstances, e.g. while performing a switchover, it is essential
that repmgrd does not take any kind of failover action, as this will put
the cluster into an incorrect state.

Previously it was necessary to stop repmgrd on all nodes (or at least
those nodes which repmgrd would consider as promotion candidates), however
this is a cumbersome and potentially risk-prone operation, particularly if the
replication cluster contains more than a couple of servers.

To prevent this issue from occurring, this patch introduces the ability
to "pause" repmgrd on all nodes wth a single command ("repmgr daemon pause")
which notifies repmgrd not to take any failover action until the node
is "unpaused" ("repmgr daemon unpause").

"repmgr daemon status" provides an overview of each node and whether repmgrd
is running, and if so whether it is paused.

"repmgr standby switchover" has been modified to automatically pause repmgrd
while carrying out the switchover.

See documentation for further details.
2018-09-27 16:42:10 +09:00
Ian Barwick
688337dec3 repmgr: add "--node-id" option to "cluster cleanup"
Implements GitHub #493.
2018-09-25 15:56:40 +09:00
Ian Barwick
b0a2ee2259 get_all_node_records(): display any error encountered and return success status
In many cases we'll want to bail out with an error if the node list can't
be retrieved for any reason. This saves some repetitive coding.
2018-09-13 10:14:43 +09:00
Ian Barwick
7b33faa09b repmgr: improve "cluster show" output
Only output full contents of connection error messages in --verbose mode,
otherwise it can spew a lot of text onto the screen.
2018-09-07 16:59:54 +09:00
Ian Barwick
c1586e39b7 Log text of failed queries at log level ERROR
Previously query texts were always logged at log level DEBUG, but
that doesn't help much in a normal production environment when
trying to identify the cause of issues.

Also make various other minor improvements to query logging and
handling of database errors.

Implements GitHub #498.
2018-08-29 10:08:52 +09:00
Ian Barwick
e1e59e85d7 repmgr: add "cluster_cleanup" event
GitHub #492.
2018-08-24 09:20:05 +09:00
Ian Barwick
b3f64987cb repmgr: add --csv output to "cluster event"
Implements GitHub #471.
2018-07-13 11:19:42 +09:00
Ian Barwick
4c7c681a14 repmgr: have "cluster show" exit with a non-zero value if issues detected
If any issues are detected (e.g. node not reachable, unexpected node status
etc.), "repmgr cluster show" returns exit code 25 ("ERR_NODE_STATUS").

Note that exit code 25 was introduced recently as "ERR_CLUSTER_CHECK",
however it makes sense to use this to indicate issues detected by any
command which can detect node issues.

Addresses GitHub #456.
2018-07-05 11:03:48 +09:00
Greg Clough
190104c7db Added "cluster cleanup" to help 2018-06-29 22:54:59 +01:00
Ian Barwick
3b0cde2846 repmgr: cluster check commands - non-zero exit code if node(s) unavailable
Return ERR_CLUSTER_CHECK if one or nodes was not reachable.

Implements GitHub #447.
2018-06-12 10:30:11 +09:00
Ian Barwick
cf64f9e95c Always initialise t_conninfo_param_list structures 2018-04-03 14:31:24 +09:00
Ian Barwick
9c5e76401f Fix "repmgr cluster crosscheck" output
Addresses GitHub #398.
2018-03-27 16:44:04 +09:00
Ian Barwick
dd45189fa8 "cluster show": output any connection error messagesin list of warnings
This ensures any connection errors are displayed by default in a
comprehensible, easily reportable way, and saves having to request/filter
DEBUG output.

Implements GitHub #369.
2018-02-05 10:36:04 +09:00
Ian Barwick
a79c4fae88 "cluster show": minor code cleanup 2018-02-05 10:36:00 +09:00
Ian Barwick
657ed83921 "cluster show": improve handling of database errors
In particular, if running "repmgr cluster show" against a database
without the repmgr metadata, showing the error (rather than just
"no records found" etc.) will provide some clues about the problem.
2018-02-05 10:35:56 +09:00
Ian Barwick
cad12b1fb7 "repmgr cluster event": move query to dbutils.c 2018-01-04 14:55:46 +09:00
Ian Barwick
625187a61e "repmgr cluster events": optionally omit "Details" column with --terse
Implements GitHub #360.
2018-01-04 14:55:34 +09:00
Ian Barwick
26a9e848fd Update copyright notices to 2018 2018-01-02 10:19:46 +09:00
Ian Barwick
a6cc4d80f0 Add "witness register" functionality 2017-11-15 13:47:45 +09:00
Ian Barwick
7c3abe28b9 Standardize terminology on "primary" (in place of "master") 2017-10-24 13:42:50 +09:00
Ian Barwick
e8b74ea897 "repmgr cluster crosscheck": add --csv output
As advertised.
2017-10-04 09:34:19 +09:00
Ian Barwick
b6cd816923 Tidy up some log output 2017-09-12 11:08:41 +09:00
Ian Barwick
cf59944a35 Fix "repmhr cluster --help" output 2017-09-11 21:25:30 +09:00
Ian Barwick
b6b31b15b2 Implement "repmgr cluster cleanup" 2017-09-11 13:48:46 +09:00