Commit Graph

248 Commits

Author SHA1 Message Date
Ian Barwick
a502b2cf96 Move function parse_repmgr_version() to a more appropriate location 2019-09-24 13:14:03 +09:00
Ian Barwick
10f00b8822 repmgr: pass explicitly provided log level when executing repmgr remotely
This makes it possible to return log output when executing repmgr
remotely at a different level to the one defined in the remote
repmgr's repmgr.conf.

This is particularly useful when DEBUG output is required.
2019-09-17 15:38:43 +09:00
Ian Barwick
677a94513e repmgr: note that --dry-run is not effective with "repmgr service status" 2019-08-28 15:14:35 +09:00
Ian Barwick
931da14df1 Rename some "repmgr daemon ..." commands to "repmgr service ..."
"repmgr daemon" can be interpreted to mean the commands affect the local
daemon process only. Rename the commands which affect the entire cluster
to "repmgr service ...".

The "repmgr daemon ..." form of the affected commands is retained for backwards
 compatibility.
2019-08-28 14:58:11 +09:00
Ian Barwick
f5044465cb Add function to safely modify postgresql.auto.conf
This is required for PostgreSQL 12 and later.
2019-08-14 16:57:42 +09:00
Ian Barwick
a1775237d4 Update comment
Deprecated command line option --data-dir was removed in commit 5ca0b57,
but a comment still referred to it.
2019-08-14 14:12:09 +09:00
Ian Barwick
94ba635811 Define our own PG_AUTOCONF_FILENAME 2019-08-13 16:48:44 +09:00
Ian Barwick
c0f3990973 Use appendPQExpBufferStr where appropriate 2019-08-13 16:32:40 +09:00
Ian Barwick
5ca0b57d0c Remove command-line options deprecated since repmgr 3.3
The following options have long since been deprecated, and any attempt
to use them results only in a warning that they are no longer valid:

  --data-dir
  --no-conninfo-password
  --recovery-min-apply-delay
2019-08-05 16:26:12 +09:00
Ian Barwick
7d20aea606 Fix typo in comment 2019-08-01 15:20:44 +09:00
Ian Barwick
d4df0055c9 repmgr: use --compact (not --terse) in "cluster events" to hide details column
This is consistent with usage elsewhere.

"--terse" is intended to reduce logging noise.
2019-05-30 14:19:37 +09:00
Ian Barwick
e6195edbca cluster show: warn if unable to connect to witness's upstream
Fix also applies to "daemon status".
2019-05-21 12:35:49 +09:00
Ian Barwick
2326c384c0 cluster show: fix upstream check for witnesses
Fix also applies to "daemon status"
2019-05-21 12:28:32 +09:00
Ian Barwick
f03e012c99 cluster show/daemon status: report if node not attached to advertised upstream 2019-05-14 16:15:03 +09:00
Ian Barwick
8587539adb Fix command line sanity check 2019-05-14 13:27:00 +09:00
Ian Barwick
fca033fb9d cluster show/daemon status: report upstream node mismatches
When showing node information, check if the node's copy of its
record shows a different upstream to the one expected according
to the node where the command is executed.

This helps visualise situations where the cluster is in an
unexpected state, and provide a better idea of the actual state.

For example, if a cluster has divided somehow and a set of nodes are
following a new primary, when running "cluster show" etc., repmgr
will now show the name of the primary those nodes are actually
following, rather than the now outdated node name recorded
on the other side of the split. A warning will also be issued
about the situation.
2019-05-14 13:11:31 +09:00
Ian Barwick
d8e4c54ea4 "standby switchover": add "--repmgrd-force-unpause"
Implements GitHub #559.
2019-05-10 16:04:07 +09:00
Ian Barwick
b9f07f6a91 standby promote: use variable name "local_conn" for the local connection handle
This is consistent with usage in other functions, and makes it easier to
differentiate between the local node connection and the primary connection.
2019-05-02 12:04:26 +09:00
Ian Barwick
89a7261483 Always quote node names in log messages 2019-04-30 15:52:56 +09:00
Ian Barwick
5f10e68f31 emit warning if "--siblings-follow" provided out-of-context 2019-04-29 14:12:22 +09:00
Ian Barwick
2082a8d3f3 Consolidate some code 2019-04-25 16:04:40 +09:00
Ian Barwick
9fe2fa2daf daemon status: make output more like that of "cluster show"
In particular make any issues with unexpected server state more
obvious.
2019-04-25 14:45:41 +09:00
Ian Barwick
5a9175c740 Clarify hints about updating the repmgr extension 2019-04-24 11:37:31 +09:00
Ian Barwick
a9b56d9833 Fix hint message
s/UPGRADE/UPDATE
2019-04-10 12:08:26 +09:00
Ian Barwick
5e9f202c9a Add missing break 2019-03-28 12:44:50 +09:00
Ian Barwick
9d5afeebbc Remove logically dead code 2019-03-28 12:35:41 +09:00
Ian Barwick
ba1f05ece9 Restrict "node_name" to maximum 63 characters
In "recovery.conf", the configuration parameter "node_name" is used
as the "application_name" value, which will be truncated by PostgreSQL
to 63 characters (NAMEDATALEN - 1).

repmgr sometimes needs to be able to extract the application name from
pg_stat_replication to determine if a node is connected (e.g. when
executing "repmgr standby register"), so the comparison will fail
if "node_name" exceeds 63 characters.
2019-03-28 10:37:57 +09:00
Ian Barwick
539861cb58 repmgrd: during failover, check if a node was already promoted
Previously, repmgrd assumed that during a failover, there would not
already be another primary node. However it's possible a node was
promoted manually. While this is not a desirable situation, it's
conceivable this could happen in the wild, so we should check for
it and react accordingly.

Also sanity-check that the follow target can actually be followed.

Addresses issue raised in GitHub #420.
2019-03-22 14:06:41 +09:00
Ian Barwick
46efe57cd0 Improve database connection failure logging
Log the output of PQerrorStatus() in a couple of places where it was missing.

Additionally, always log the output of PQerrorStatus() starting with a blank
line, otherwise the first line looks like it was emitted by repmgr, and
it's harder to scan the error message.

Before:

    [2019-03-20 11:24:15] [DETAIL] could not connect to server: Connection refused
            Is the server running on host "localhost" (::1) and accepting
            TCP/IP connections on port 5501?
    could not connect to server: Connection refused
            Is the server running on host "localhost" (127.0.0.1) and accepting
            TCP/IP connections on port 5501?

After:

    [2019-03-20 11:27:21] [DETAIL]
    could not connect to server: Connection refused
            Is the server running on host "localhost" (::1) and accepting
            TCP/IP connections on port 5501?
    could not connect to server: Connection refused
            Is the server running on host "localhost" (127.0.0.1) and accepting
            TCP/IP connections on port 5501?
2019-03-20 11:47:28 +09:00
Ian Barwick
ae8171e461 Improve logging/sanity checking for "node control" options 2019-03-06 15:54:30 +09:00
Ian Barwick
1615353f48 repmgrd: optionally disconnect WAL receivers during failover
This is intended to ensure that all nodes have a constant LSN while
making the failover decision.

This feature is experimental and needs to be explicitly enabled with the
configuration file option "standby_disconnect_on_failover".

Note enabling this option will result in a delay in the failover decision
until the WAL receiver is disconnected on all nodes.
2019-03-06 15:53:57 +09:00
Ian Barwick
b1875a8d91 Split command execution functions into separate library
These may need to be executed by repmgrd.
2019-02-27 14:41:17 +09:00
Ian Barwick
9338a9e233 Improve logging output
Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail line
2019-02-15 10:49:56 +09:00
Ian Barwick
9305953bd2 Fix history file parsing
Also add additional debugging output.
2019-02-14 15:52:40 +09:00
Ian Barwick
bc9e725d05 node rejoin: always emit detail about relative LSNs
Previously repmgr only emitted that if there was a timeline/LSN
mismatch, but it's useful to have confirmation of how it came
to the conclusion that rejoin will succeed.
2019-02-13 15:16:40 +09:00
yonj1e
e146fb4fc3 Fix undeclared 'TRUE' error
GitHub #547.
2019-02-11 16:55:54 +09:00
Ian Barwick
aa1e64ec11 Warn about redundant use of --compact option 2019-02-07 14:35:30 +09:00
Ian Barwick
cd3312496e Rename functions which return an LSN for clarity 2019-02-06 09:32:53 +09:00
Ian Barwick
f24b30327c Add missing "daemon (start|stop)" options to main help output 2019-02-02 13:11:31 +09:00
Ian Barwick
48381a5b4e Use --compact option for abbreviated display output
--terse is meant for reducing log chatter.
2019-02-02 13:06:59 +09:00
Ian Barwick
f04f2af8aa Add missing include files
Per compiler griping on OS X.
2019-01-31 16:10:48 +09:00
Ian Barwick
ea54aaa290 Use "rejoin target" instead of "follow target" in "node rejoin" log output 2019-01-31 11:32:38 +09:00
Ian Barwick
b34c331eba "node rejoin": fail if rejoin target has same timeline and lower LSN
pg_rewind will not resolve this situation.
2019-01-31 11:15:55 +09:00
Ian Barwick
d7420d7274 daemon (start|stop): verify that repmgrd starts/stops.
Note this may not always be possible for "daemon stop" if we are unable
to determine the repmgrd PID.
2019-01-30 14:41:31 +09:00
Ian Barwick
70e4243a1d Clean up calls to repmgr_atoi()
In some places we were still providing "false" from the original implementation,
which was intended to indicate whether a negative value was allowed.

This has not been a problem, as it merely means we have been providing "0",
which is the same thing; however we can finer-tune some of the calls
(e.g. node ID must be or greater).
2019-01-30 11:43:43 +09:00
Ian Barwick
8b13d14294 "daemon stop": initial implementation 2019-01-29 13:01:23 +09:00
Ian Barwick
32b81e7d49 "daemon start": initial implementation 2019-01-29 13:01:14 +09:00
Ian Barwick
cbfef17a1d Fix check of --no-wait option 2019-01-29 12:29:05 +09:00
Ian Barwick
061932d023 "node rejoin": verify status of rejoin target
This adapts the code previously added to "standby follow" to verify
whether the rejoin target can actually be rejoined.
2019-01-23 17:08:55 +09:00
Ian Barwick
3f5762e03a Refactor upstream attachment check code
Move it from the "standby follow" code to an independent function so it can
be used in other contexts, e.g. "node rejoin".
2019-01-23 15:11:42 +09:00