123 Commits

Author SHA1 Message Date
Ian Barwick
4c05307da1 repmgr: fix error message
It's possible the missing node record might be for a witness server,
though we have no way of knowing that.
2022-05-11 14:58:36 +09:00
Ian Barwick
79d1f005db repmgrd: activate inactive node record at startup
If a PostgreSQL instance was shut down while repmgrd was running, and
repmgrd was subsequently restarted (this chain of events could occur
during e.g. a server reboot), the node record will have been set to
"inactive". Previously, in this case repmgrd would refuse to start up.
However, as we can determine the node is running, it should normally
be no problem to automatically set the node record to "active".

The old behaviour can be restored by setting the new parameter
"repmgrd_exit_on_inactive_node" to "true".

RM19604.
2021-07-12 17:46:09 +09:00
Ian Barwick
2af71c6426 repmgrd: ensure short option "-s" is accepted
The long option --show-pid-file was fine.
2021-06-03 18:41:11 +09:00
Ian Barwick
d266df3143 Change copyright information to "EnterpriseDB Corporation"
RM20485.
2021-03-01 11:03:52 +09:00
Ian Barwick
b37a599fc6 Update copyright notices to 2021 2021-01-04 12:54:54 +09:00
Josh Soref
f619c3a8ff Fix various typos in code comments.
Via GitHub #687.
2020-12-22 13:43:06 +09:00
Ian Barwick
c480d01f9c Improve HINT about upgrading the repmgr extension
Per feedback in GitHub #685.
2020-12-15 08:41:46 +09:00
Ian Barwick
674c06d01c Decouple extension version check from binary version
Until now the extension version has always moved in lock-step
with the binary version, but that doesn't always need to be
the case, so make it possible to have an extension version
which does not match the binary version.
2020-10-30 14:42:58 +09:00
Ian Barwick
e65738c989 Explicitly unset search path when connecting to database 2020-05-22 16:11:55 +09:00
Ian Barwick
aaea24b58b repmgrd: log reconnection of paired connection 2020-05-22 14:21:17 +09:00
Ian Barwick
cf60844c45 repmgrd: ensure primary connection is reset if same as upstream
Addresses GitHub #633.
2020-05-22 11:11:54 +09:00
Ian Barwick
3dde8f1386 "retire" old configuration handling code 2020-05-14 11:57:00 +09:00
Ian Barwick
a863dc7f6c repmgrd: additional check for the upstream connection
It's possible the upstream server was intermittently unavailable in
the interval between checks, invalidating the upstream connection.
With check types "ping" and "connection", the connection would not be
restored, so if the availability check was successful, additionally
verify the upstream connection and restore if necessary.

Addresses GitHub #633.
2020-05-14 10:26:57 +09:00
Ian Barwick
d37513312a Move the main configfile structure into configfile.c
This is required for a later refactoring of the configuration file
handling.
2020-05-05 14:43:55 +09:00
Ian Barwick
4d4ed3bcd6 Remove BDR 2.x support
The BDR 2.x support was conceptual only and was never used in
production. As BDR 2.x will be EOL'd shortly, there is no risk it will
be needed.
2020-01-16 09:52:42 +09:00
Ian Barwick
7fdf2f1778 Update copyright notices to 2020 2020-01-13 14:06:20 +09:00
Ian Barwick
8ead0042ad Miscellaneous comment and logging cleanup1 2019-05-23 09:31:46 +09:00
Ian Barwick
5a9175c740 Clarify hints about updating the repmgr extension 2019-04-24 11:37:31 +09:00
Ian Barwick
7d0caefaee Fix logging related to "connection_check_type"
Also log the selected type at repmgrd startup.
2019-03-20 11:58:18 +09:00
Ian Barwick
46efe57cd0 Improve database connection failure logging
Log the output of PQerrorStatus() in a couple of places where it was missing.

Additionally, always log the output of PQerrorStatus() starting with a blank
line, otherwise the first line looks like it was emitted by repmgr, and
it's harder to scan the error message.

Before:

    [2019-03-20 11:24:15] [DETAIL] could not connect to server: Connection refused
            Is the server running on host "localhost" (::1) and accepting
            TCP/IP connections on port 5501?
    could not connect to server: Connection refused
            Is the server running on host "localhost" (127.0.0.1) and accepting
            TCP/IP connections on port 5501?

After:

    [2019-03-20 11:27:21] [DETAIL]
    could not connect to server: Connection refused
            Is the server running on host "localhost" (::1) and accepting
            TCP/IP connections on port 5501?
    could not connect to server: Connection refused
            Is the server running on host "localhost" (127.0.0.1) and accepting
            TCP/IP connections on port 5501?
2019-03-20 11:47:28 +09:00
Ian Barwick
c2206b007a repmgrd: optionally check upstream availability through connection attempts 2019-03-14 15:44:53 +09:00
Ian Barwick
573d027db6 repmgrd: various minor logging improvements 2019-03-13 11:27:17 +09:00
Ian Barwick
f85b4cd98e Log warning if "standby_disconnect_on_failover" used on pre-9.5
"standby_disconnect_on_failover" requires availability of "wal_retrieve_retry_interval",
which is available from PostgreSQL 9.5.

9.4 will fall out of community support this year, so it doesn't seem
productive at this point to do anything more than put the onus on the user
to read the documentation and heed any warning messages in the logs.
2019-03-06 15:54:15 +09:00
Ian Barwick
dd04ebb809 repmgrd: handle reconnect to restarted server when using "connection" checks 2019-03-06 14:54:05 +09:00
Ian Barwick
63f7ad546e repmgrd: add option "connection_check_type"
This enable selection of the method repmgrd uses to check whether the upstream
node is available. Possible values are:

 - "ping" (default): uses PQping() to check server availability
 - "connection":  executes a query on the connection to check server
   availability (similar to repmgr3.x).
2019-03-06 12:09:54 +09:00
Ian Barwick
a48d408e4e Consistently log strerror output as DETAIL 2019-01-29 12:10:55 +09:00
Ian Barwick
7dce3ed234 Update copyright notices to 2019 2019-01-21 14:54:35 +09:00
Ian Barwick
40408a1734 repmgrd: check binary and extension major versions match
repmgr requires that the same "major version" (e.g. 4.3) is present
on all nodes, otherwise - particularly in the case of repmgrd - it's
highly likely things won't work as expected.

Implements part of GitHub #515.
2019-01-07 15:39:40 +09:00
Ian Barwick
793d83b22c Refactor server version detection
Most of the time we can simply get the version number directly from
the connection handle. Previously it was held in a global variable,
which was an icky way of doing things.

In a few special cases we also need the actual version string, which
is obtained directly from the database.
2018-11-22 21:30:31 +09:00
Ian Barwick
c3bc5585d9 Add sanity check for extension version
This should cover the cases where the "repmgr" extension was installed
manually but not updated, or an upgrade was not fully completed.
2018-10-31 11:16:36 +09:00
Ian Barwick
ab6c3d9b6e Handle NULL strings when parsing boolean arguments 2018-10-17 11:47:32 +09:00
Ian Barwick
ad03885b72 repmgrd: fix parsing of -d/--daemonize option
The getopt API doesn't cope well with optional arguments to short form options,
e.g. "-o foo", so we need to check the next argument value to see whether it looks
like an option or an actual argument value.
2018-10-04 11:48:54 +09:00
Ian Barwick
a40fd60cb5 repmgrd: fix parsing of -d/--daemonize option 2018-10-03 11:36:38 +09:00
Ian Barwick
2491b8ae52 Add functionality to "pause" repmgrd
In some circumstances, e.g. while performing a switchover, it is essential
that repmgrd does not take any kind of failover action, as this will put
the cluster into an incorrect state.

Previously it was necessary to stop repmgrd on all nodes (or at least
those nodes which repmgrd would consider as promotion candidates), however
this is a cumbersome and potentially risk-prone operation, particularly if the
replication cluster contains more than a couple of servers.

To prevent this issue from occurring, this patch introduces the ability
to "pause" repmgrd on all nodes wth a single command ("repmgr daemon pause")
which notifies repmgrd not to take any failover action until the node
is "unpaused" ("repmgr daemon unpause").

"repmgr daemon status" provides an overview of each node and whether repmgrd
is running, and if so whether it is paused.

"repmgr standby switchover" has been modified to automatically pause repmgrd
while carrying out the switchover.

See documentation for further details.
2018-09-27 16:42:10 +09:00
Ian Barwick
1693ec0e90 repmgrd: fix syntax 2018-08-30 16:27:07 +09:00
Ian Barwick
17e75f6b31 repmgrd: improve reconnection handling
Previously, if the server being monitored was not available, repmgrd
would always close the existing connection handle and open a new one.

However, in some cases, e.g. a brief network outage, the existing
connection handle is still good and does not need to be reopened.

This could be particularly problematic if monitoring_history is on,
as this risks leaving orphan sessions on the primary which (given
a sufficiently unstable network) could lead to all available backends
being occupied.

Instead, during an outage we now use a new connection to verify
the server is accessible; if the old connection is still available
(e.g. following a short network interruption) we continue using that;
if  not (e.g. the server was restarted), we use the new one.
2018-08-30 15:46:08 +09:00
Ian Barwick
78b969f208 repmgrd: report version number *after* logger initialisation
This ensures the version number always makes it into the log destination.

Implements GitHub #487.
2018-08-08 15:44:06 +09:00
Ian Barwick
388ac2f392 repmgrd: enable package to supply default PID file path
Also add documentation for packagers about paths which can be patched
as default package values.
2018-07-13 10:26:47 +09:00
Ian Barwick
a194cf56b3 repmgr: exit with an error if an unrecognised command line option is provided.
This matches the behaviour of other PostgreSQL utilities such as psql, though
repmgr will only abort once all command line options are parsed, so as many
errors as possible are found and displayed. If a repmgr "command" (e.g.
"repmgr primary ..." was provided, a hint about the relevant command
help section (e.g. "repmgr primary --help") will be provided alongside
the generic help command (i.e. "repmgr --help").

Addresses GitHub #464, with further improvements.
2018-07-04 11:02:50 +09:00
Ian Barwick
802755fd60 repmgrd: daemonize process by default
It's hard to imagine a use case where this isn't desirable, but
in case, for whatever reason, the user does not wish to daemonize the
process, the command line option "--daemonize=false" can be provided.

Implements GitHub #458.
2018-06-29 22:01:49 +09:00
Ian Barwick
8d636690bd repmgrd: create pid file by default
Traditionally repmgrd will only write a pidfile if explicitly requested with
-p/--pid-file. However it's normally desirable to have a pidfile, and it's
preferable to have one used by default to prevent accidentally starting a second
repmgrd instance.

Following changes made:

 - add configuration file parameter "repmgrd_pid_file" (initially overridden by
   -p/--pid-file for backwards compatibility, though eventually we'll want to
   drop -p/--pid-file altogether)
 - add command line option --no-pid-file
 - if neither "repmgrd_pid_file" nor -p/--pid-file is set, create the pid file
   in a temporary directory

Implements GitHub #457.
2018-06-29 14:36:24 +09:00
Ian Barwick
6f315c1b3c repmgrd: don't explicitly close connections on shutdown 2018-05-01 10:21:10 +09:00
Ian Barwick
73982859f6 repmgrd: improve log output
- emit explicit startup NOTICE
- emit NOTICE when falling back to degraded monitoring on a primary node
- improve log message and event notification details when monitoring
  a former primary which has been reconnected as a standby
2018-04-03 14:37:06 +09:00
Ian Barwick
a403da67bc Consolidate connection closure calls 2018-03-27 16:43:59 +09:00
Ian Barwick
0219f4c91f Always set "connect_timeout" when pinging a PostgreSQL instance
Insert "connect_timeout=2" into the connection parameters, if not
explicitly set by the user. This will prevent excessive wait time
for the host operating system to report a connection timeout.
2018-03-21 11:48:57 +09:00
Ian Barwick
bad034f7ee repmgrd: remove duplicate local record check in BDR mode 2018-03-07 19:21:33 +09:00
Ian Barwick
cdb504d700 Add event "repmgrd_shutdown"
Implements GitHub #393
2018-03-06 11:00:03 +09:00
Ian Barwick
e38a9ec7e1 repmgrd: fix main monitoring loop for witness server
Missing "break" was breaking it when following a new primary.
2018-03-02 11:04:22 +09:00
Ian Barwick
64d85587de repmgrd: check "repmgr" extension is installed before starting
Implements GitHub #361.
2018-02-12 11:38:31 +09:00
Ian Barwick
ec068e38a2 Remove --bdr-only configuration option
This was required for a specific use case during pre-release
development and is no longer needed now the physical streaming
replication handling is implemented.
2018-01-25 10:48:09 +09:00