Commit Graph

231 Commits

Author SHA1 Message Date
Ian Barwick
b9f07f6a91 standby promote: use variable name "local_conn" for the local connection handle
This is consistent with usage in other functions, and makes it easier to
differentiate between the local node connection and the primary connection.
2019-05-02 12:04:26 +09:00
Ian Barwick
89a7261483 Always quote node names in log messages 2019-04-30 15:52:56 +09:00
Ian Barwick
5f10e68f31 emit warning if "--siblings-follow" provided out-of-context 2019-04-29 14:12:22 +09:00
Ian Barwick
2082a8d3f3 Consolidate some code 2019-04-25 16:04:40 +09:00
Ian Barwick
9fe2fa2daf daemon status: make output more like that of "cluster show"
In particular make any issues with unexpected server state more
obvious.
2019-04-25 14:45:41 +09:00
Ian Barwick
5a9175c740 Clarify hints about updating the repmgr extension 2019-04-24 11:37:31 +09:00
Ian Barwick
a9b56d9833 Fix hint message
s/UPGRADE/UPDATE
2019-04-10 12:08:26 +09:00
Ian Barwick
5e9f202c9a Add missing break 2019-03-28 12:44:50 +09:00
Ian Barwick
9d5afeebbc Remove logically dead code 2019-03-28 12:35:41 +09:00
Ian Barwick
ba1f05ece9 Restrict "node_name" to maximum 63 characters
In "recovery.conf", the configuration parameter "node_name" is used
as the "application_name" value, which will be truncated by PostgreSQL
to 63 characters (NAMEDATALEN - 1).

repmgr sometimes needs to be able to extract the application name from
pg_stat_replication to determine if a node is connected (e.g. when
executing "repmgr standby register"), so the comparison will fail
if "node_name" exceeds 63 characters.
2019-03-28 10:37:57 +09:00
Ian Barwick
539861cb58 repmgrd: during failover, check if a node was already promoted
Previously, repmgrd assumed that during a failover, there would not
already be another primary node. However it's possible a node was
promoted manually. While this is not a desirable situation, it's
conceivable this could happen in the wild, so we should check for
it and react accordingly.

Also sanity-check that the follow target can actually be followed.

Addresses issue raised in GitHub #420.
2019-03-22 14:06:41 +09:00
Ian Barwick
46efe57cd0 Improve database connection failure logging
Log the output of PQerrorStatus() in a couple of places where it was missing.

Additionally, always log the output of PQerrorStatus() starting with a blank
line, otherwise the first line looks like it was emitted by repmgr, and
it's harder to scan the error message.

Before:

    [2019-03-20 11:24:15] [DETAIL] could not connect to server: Connection refused
            Is the server running on host "localhost" (::1) and accepting
            TCP/IP connections on port 5501?
    could not connect to server: Connection refused
            Is the server running on host "localhost" (127.0.0.1) and accepting
            TCP/IP connections on port 5501?

After:

    [2019-03-20 11:27:21] [DETAIL]
    could not connect to server: Connection refused
            Is the server running on host "localhost" (::1) and accepting
            TCP/IP connections on port 5501?
    could not connect to server: Connection refused
            Is the server running on host "localhost" (127.0.0.1) and accepting
            TCP/IP connections on port 5501?
2019-03-20 11:47:28 +09:00
Ian Barwick
ae8171e461 Improve logging/sanity checking for "node control" options 2019-03-06 15:54:30 +09:00
Ian Barwick
1615353f48 repmgrd: optionally disconnect WAL receivers during failover
This is intended to ensure that all nodes have a constant LSN while
making the failover decision.

This feature is experimental and needs to be explicitly enabled with the
configuration file option "standby_disconnect_on_failover".

Note enabling this option will result in a delay in the failover decision
until the WAL receiver is disconnected on all nodes.
2019-03-06 15:53:57 +09:00
Ian Barwick
b1875a8d91 Split command execution functions into separate library
These may need to be executed by repmgrd.
2019-02-27 14:41:17 +09:00
Ian Barwick
9338a9e233 Improve logging output
Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail line
2019-02-15 10:49:56 +09:00
Ian Barwick
9305953bd2 Fix history file parsing
Also add additional debugging output.
2019-02-14 15:52:40 +09:00
Ian Barwick
bc9e725d05 node rejoin: always emit detail about relative LSNs
Previously repmgr only emitted that if there was a timeline/LSN
mismatch, but it's useful to have confirmation of how it came
to the conclusion that rejoin will succeed.
2019-02-13 15:16:40 +09:00
yonj1e
e146fb4fc3 Fix undeclared 'TRUE' error
GitHub #547.
2019-02-11 16:55:54 +09:00
Ian Barwick
aa1e64ec11 Warn about redundant use of --compact option 2019-02-07 14:35:30 +09:00
Ian Barwick
cd3312496e Rename functions which return an LSN for clarity 2019-02-06 09:32:53 +09:00
Ian Barwick
f24b30327c Add missing "daemon (start|stop)" options to main help output 2019-02-02 13:11:31 +09:00
Ian Barwick
48381a5b4e Use --compact option for abbreviated display output
--terse is meant for reducing log chatter.
2019-02-02 13:06:59 +09:00
Ian Barwick
f04f2af8aa Add missing include files
Per compiler griping on OS X.
2019-01-31 16:10:48 +09:00
Ian Barwick
ea54aaa290 Use "rejoin target" instead of "follow target" in "node rejoin" log output 2019-01-31 11:32:38 +09:00
Ian Barwick
b34c331eba "node rejoin": fail if rejoin target has same timeline and lower LSN
pg_rewind will not resolve this situation.
2019-01-31 11:15:55 +09:00
Ian Barwick
d7420d7274 daemon (start|stop): verify that repmgrd starts/stops.
Note this may not always be possible for "daemon stop" if we are unable
to determine the repmgrd PID.
2019-01-30 14:41:31 +09:00
Ian Barwick
70e4243a1d Clean up calls to repmgr_atoi()
In some places we were still providing "false" from the original implementation,
which was intended to indicate whether a negative value was allowed.

This has not been a problem, as it merely means we have been providing "0",
which is the same thing; however we can finer-tune some of the calls
(e.g. node ID must be or greater).
2019-01-30 11:43:43 +09:00
Ian Barwick
8b13d14294 "daemon stop": initial implementation 2019-01-29 13:01:23 +09:00
Ian Barwick
32b81e7d49 "daemon start": initial implementation 2019-01-29 13:01:14 +09:00
Ian Barwick
cbfef17a1d Fix check of --no-wait option 2019-01-29 12:29:05 +09:00
Ian Barwick
061932d023 "node rejoin": verify status of rejoin target
This adapts the code previously added to "standby follow" to verify
whether the rejoin target can actually be rejoined.
2019-01-23 17:08:55 +09:00
Ian Barwick
3f5762e03a Refactor upstream attachment check code
Move it from the "standby follow" code to an independent function so it can
be used in other contexts, e.g. "node rejoin".
2019-01-23 15:11:42 +09:00
Ian Barwick
7dce3ed234 Update copyright notices to 2019 2019-01-21 14:54:35 +09:00
Ian Barwick
d261768541 Standardize on --host option 2019-01-17 10:52:41 +09:00
Ian Barwick
aa8547a219 Improve "witness register" documentation, help and logging
Make it clearer that a) the primary server's hostname is required,
and b) how to provide it.

Based on feedback provided in GitHub #529.
2019-01-17 10:42:53 +09:00
Ian Barwick
0b3a310802 Add --data-directory-config option to "repmgr node check"
Implements part of GitHub #523.
2019-01-16 16:03:44 +09:00
Ian Barwick
b3c2831bd3 repmgr: add --dry-run option to "standby promote"
Implements GitHub #522.
2019-01-10 12:36:58 +09:00
Ian Barwick
c66c8ebc98 repmgr: add --terse mode to "cluster show"
This suppresses display of the usually lengthy "conninfo" column, mainly
useful for generating a compact table suitable for pasting into emails,
chats etc. without messy line breaks.

Implements GitHub #521.
2019-01-09 10:06:37 +09:00
Ian Barwick
1156f27979 Fix "repmgr --help" output
Add missing references to "witness" and "daemon" actions.
2019-01-08 10:11:31 +09:00
Ian Barwick
b5b9aacc8a Add command line option "repmgr --version-number"
Outputs the raw version number.

Intended for use by scripts etc.
2019-01-08 10:08:23 +09:00
Ian Barwick
9cf5bf3f93 Note primary/standby aliases for "node check" and "node status" actions
Add comment noting the intent behind those code sections, otherwise it
looks like a copy'n'paste error.

This currently isn't documented.
2019-01-08 09:26:37 +09:00
Ian Barwick
9a5bd0d489 Update comment listing valid actions 2019-01-08 09:16:51 +09:00
Ian Barwick
74c44a7178 doc: document "repmgr node service"
This was originally intended for internal use, but it's mentioned
several times in the documentation and is useful for diagnostic
purposes.
2018-11-28 12:58:07 +09:00
Ian Barwick
793d83b22c Refactor server version detection
Most of the time we can simply get the version number directly from
the connection handle. Previously it was held in a global variable,
which was an icky way of doing things.

In a few special cases we also need the actual version string, which
is obtained directly from the database.
2018-11-22 21:30:31 +09:00
Ian Barwick
c3bc5585d9 Add sanity check for extension version
This should cover the cases where the "repmgr" extension was installed
manually but not updated, or an upgrade was not fully completed.
2018-10-31 11:16:36 +09:00
Ian Barwick
455a0bd93f Use make_remote_repmgr_path() in place of make_repmgr_path()
Also we can now simplify "cluster (matrix|crosscheck)" commands as
beginning with v4.0, we know where the configuration file is, so can
provide that when invoking repmgr remotely.
2018-10-02 09:59:18 +09:00
Ian Barwick
11d25e2aef Add configuration parameter "repmgr_bindir"
This is to facilitate remote invocation of repmgr when the repmgr
binary is located somewhere other than the PostgreSQL binary directory, as it
cannot be assumed all package maintainers will install repmgr there.

This parameter is optional; if not set (the default), repmgr will fall back
to "pg_bindir" (if set).

Addresses GitHub #246.
2018-10-02 09:59:12 +09:00
Ian Barwick
b14fbbdc72 Add "repmgr daemon ..." options to main help output 2018-09-27 19:07:59 +09:00
Ian Barwick
2491b8ae52 Add functionality to "pause" repmgrd
In some circumstances, e.g. while performing a switchover, it is essential
that repmgrd does not take any kind of failover action, as this will put
the cluster into an incorrect state.

Previously it was necessary to stop repmgrd on all nodes (or at least
those nodes which repmgrd would consider as promotion candidates), however
this is a cumbersome and potentially risk-prone operation, particularly if the
replication cluster contains more than a couple of servers.

To prevent this issue from occurring, this patch introduces the ability
to "pause" repmgrd on all nodes wth a single command ("repmgr daemon pause")
which notifies repmgrd not to take any failover action until the node
is "unpaused" ("repmgr daemon unpause").

"repmgr daemon status" provides an overview of each node and whether repmgrd
is running, and if so whether it is paused.

"repmgr standby switchover" has been modified to automatically pause repmgrd
while carrying out the switchover.

See documentation for further details.
2018-09-27 16:42:10 +09:00