Commit 79d1f00 modified repmgrd to automatically set an inactive node
to "active" at startup.
However, we need to avoid doing that in cases where the node role has
changed (e.g. a former primary was recloned as a standby) but the node
record was not updated.
If a PostgreSQL instance was shut down while repmgrd was running, and
repmgrd was subsequently restarted (this chain of events could occur
during e.g. a server reboot), the node record will have been set to
"inactive". Previously, in this case repmgrd would refuse to start up.
However, as we can determine that the node is running, it is normally
safe to set the node record to "active" automatically.
The old behaviour can be restored by setting the new parameter
"repmgrd_exit_on_inactive_node" to "true".
RM19604.
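For reference, the old behaviour can be restored with a repmgr.conf
setting along these lines (value shown is the non-default one):

    # repmgr.conf: exit at startup if the node record is "inactive",
    # rather than setting it to "active" automatically
    repmgrd_exit_on_inactive_node=true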
In a corner-case situation where a standby is unable to attach to
the new primary due to a mismatch in the WAL stream, the connection
used to verify the recovery status of the new primary was not being
closed, leading to a risk of connection exhaustion on the new primary.
Addresses GitHub #682.
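A minimal sketch of the fix pattern (function name and surrounding
logic are illustrative, not repmgr's actual code): the connection
used for the recovery status check is now closed on every exit path.

    #include <string.h>
    #include <stdbool.h>
    #include <libpq-fe.h>

    static bool
    new_primary_is_in_recovery(const char *conninfo)
    {
        PGconn   *conn = PQconnectdb(conninfo);
        PGresult *res;
        bool      in_recovery = true;

        if (PQstatus(conn) == CONNECTION_OK)
        {
            res = PQexec(conn, "SELECT pg_catalog.pg_is_in_recovery()");

            if (PQresultStatus(res) == PGRES_TUPLES_OK)
                in_recovery = (strcmp(PQgetvalue(res, 0, 0), "t") == 0);

            PQclear(res);
        }

        /*
         * Close the connection unconditionally before returning;
         * previously the corner-case exit path returned without
         * PQfinish(), leaking one connection per retry on the
         * new primary.
         */
        PQfinish(conn);

        return in_recovery;
    }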
Per proposal in GitHub #662, this patch attempts to synchronise each
repmgrd's primary reconnection attempts to prevent potential race
conditions. This relies on each node's clock being correctly
synchronised.
Currently this change is experimental and is not enabled by default.
It can be enabled by setting the repmgr.conf parameter
"reconnect_loop_sync".
In theory the local connection should not be affected by the node's
promotion. However, we're handing over control to an external command
which is usually just "repmgr standby promote", but could potentially
be a user-defined script with unknowable side effects. So it's
better to be safe than sorry.
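A minimal sketch of the resulting check, using libpq's PQstatus()
and PQreset() (illustrative only):

    #include <libpq-fe.h>

    /*
     * After the promotion command returns, re-check the local
     * connection in case the external command had side effects,
     * and re-establish it with the original parameters if needed.
     */
    static void
    ensure_local_connection(PGconn *local_conn)
    {
        if (PQstatus(local_conn) != CONNECTION_OK)
            PQreset(local_conn);
    }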
In certain corner cases, it's possible repmgrd may end up monitoring
a standby which was a former primary, but the node record has not
yet been updated.
Previously repmgrd would abort the promotion with a cryptic message
about being unable to find a node record for node_id -1 (the
default value for an unknown node ID).
This commit adds a new configuration option "always_promote", which
determines whether repmgrd should promote the node in this case.
The default is "false", to effectively maintain the existing behaviour.
Logging output has also been improved to make it clearer what has
happened when this situation occurs.
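For example, to allow promotion in this situation, the new parameter
would be set in repmgr.conf as follows:

    # repmgr.conf: promote the node even if its node record has
    # not yet been updated (default: false)
    always_promote=true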
New optional parameters are available for use in the
failover_validation_command. These indicate:
- the number of visible nodes sharing the current upstream
- the number of nodes on the current upstream
- the total number of nodes in the entire repmgr cluster.
This allows the failover_validation_command to be used to perform
more thorough validations, including cross-referencing external
cluster management state (e.g. if managed by Kubernetes).
GitHub #651.
If the primary connection went away, and the upstream is not the
primary, attempt to reconnect if the monitoring update fails.
If the upstream is the primary, the reconnection will happen on
the next connection check.
It's possible the upstream server was intermittently unavailable in
the interval between checks, invalidating the upstream connection.
With check types "ping" and "connection", the connection would not be
restored, so if the availability check was successful, additionally
verify the upstream connection and restore it if necessary.
Addresses GitHub #633.
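A minimal sketch of the additional verification, using libpq's
PQping() for the availability check (names illustrative):

    #include <libpq-fe.h>

    /*
     * Even when the availability check succeeds, the cached
     * connection may have been invalidated by an intermittent
     * outage between checks; verify it and re-establish if needed.
     */
    static PGconn *
    verify_upstream_connection(PGconn *upstream_conn, const char *conninfo)
    {
        if (PQping(conninfo) == PQPING_OK &&
            PQstatus(upstream_conn) != CONNECTION_OK)
        {
            PQfinish(upstream_conn);
            upstream_conn = PQconnectdb(conninfo);
        }

        return upstream_conn;
    }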
clear_node_info_list() will clean up any remaining active connections,
but we need to ensure all failed connections are cleaned up at the point
of failure to prevent leaks.
Per report in GitHub #643.
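A minimal sketch of the cleanup pattern (the node list type here is
hypothetical; repmgr's actual node_info_list differs):

    #include <stddef.h>
    #include <libpq-fe.h>

    typedef struct NodeInfo
    {
        const char      *conninfo;
        PGconn          *conn;
        struct NodeInfo *next;
    } NodeInfo;

    static void
    connect_all_nodes(NodeInfo *head)
    {
        NodeInfo *node;

        for (node = head; node != NULL; node = node->next)
        {
            node->conn = PQconnectdb(node->conninfo);

            if (PQstatus(node->conn) != CONNECTION_OK)
            {
                /*
                 * Even a failed PQconnectdb() returns an allocated
                 * PGconn object which must be freed with PQfinish();
                 * clean it up at the point of failure rather than
                 * relying on clear_node_info_list().
                 */
                PQfinish(node->conn);
                node->conn = NULL;
            }
        }
    }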
Rather than parsing the configuration file into a new structure and
copying changed values from that into the main structure, we now copy
the existing structure before parsing the changed configuration
file directly into the main structure, and revert to the copy
if any issues are encountered.
This is necessary as preparation for further reworking of the
configuration file structure handling. It also makes the reload
idempotent.
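A minimal sketch of the copy-and-revert pattern (types and function
names are hypothetical, not repmgr's actual configuration handling):

    #include <stdbool.h>

    typedef struct t_config
    {
        int reconnect_interval;
        /* ... further options ... */
    } t_config;

    extern t_config config_file_options;
    extern bool parse_config_file(t_config *options);

    static bool
    reload_config(void)
    {
        /* take a copy of the current, known-good configuration */
        t_config prev = config_file_options;

        /* parse the changed file directly into the main structure... */
        if (!parse_config_file(&config_file_options))
        {
            /*
             * ...and revert wholesale if any issues were encountered,
             * which makes the reload idempotent.
             */
            config_file_options = prev;
            return false;
        }

        return true;
    }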
While we're at it, make some general improvements to the reload
handling, particularly:
- improve logging to show "before" and "after" values
- collate change notifications and only display them if no errors
  were found
- remove unnecessary double-logging of errors
- various bugfixes
It's possible a repmgrd instance might still be in the primary check
phase while a primary has already been promoted. Therefore it's
necessary to check for new primary notifications here, so we can
follow a new primary as quickly as possible.
In a few places, replication connections are generated from the
parameters used by existing connections. This has resulted in a
number of similar blocks of code which do more or less the same
thing, but not quite identically. In two cases, the code
failed to set "dbname=replication", which can cause problems
in some contexts.
These code blocks have now been consolidated into standardized
functions.
This also resolves the issue addressed by GitHub #619.
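A minimal sketch of such a standardized function, deriving a
replication connection from an existing connection's parameters and
always setting "dbname=replication" (illustrative only; repmgr's
actual implementation differs):

    #include <string.h>
    #include <libpq-fe.h>

    static PGconn *
    establish_replication_connection(PGconn *conn)
    {
        PQconninfoOption *opts = PQconninfo(conn);
        PQconninfoOption *opt;
        const char *keywords[64];   /* assumes < 62 options, for brevity */
        const char *values[64];
        int         i = 0;
        PGconn     *repl_conn;

        /* copy all set options, deferring "dbname" */
        for (opt = opts; opt->keyword != NULL; opt++)
        {
            if (opt->val == NULL || opt->val[0] == '\0' ||
                strcmp(opt->keyword, "dbname") == 0)
                continue;

            keywords[i] = opt->keyword;
            values[i] = opt->val;
            i++;
        }

        /* always set "dbname=replication" */
        keywords[i] = "dbname";
        values[i] = "replication";
        i++;

        keywords[i] = NULL;
        values[i] = NULL;

        repl_conn = PQconnectdbParams(keywords, values, 0);

        PQconninfoFree(opts);

        return repl_conn;
    }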
repmgrd has a check to see if the upstream node has unexpectedly
changed, e.g. if the repmgrd service is paused and the PostgreSQL
instance has been pointed to another node.
However, this check relied on the node record on the local node
being up-to-date, which may not be the case immediately after a
failover, when the node may still be replaying changes made before
its own node record was updated. In this case repmgrd will
mistakenly assume the node is following the original primary
and attempt to restart monitoring, which will fail as the original
primary is no longer available.
To prevent this, we check against the node's record on the upstream
node.
Addresses issues noted in GitHub #587 and #588.
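A minimal sketch of the upstream-side check (query shape based on
the repmgr.nodes table; names illustrative):

    #include <stdio.h>
    #include <stdlib.h>
    #include <libpq-fe.h>

    /*
     * Ask the *upstream* node which upstream it has recorded for us,
     * rather than trusting the possibly stale local node record.
     * Returns -1 (unknown node ID) if the record cannot be retrieved.
     */
    static int
    get_upstream_node_id_via_upstream(PGconn *upstream_conn, int local_node_id)
    {
        const char *params[1];
        char        node_id_str[12];
        int         upstream_node_id = -1;
        PGresult   *res;

        snprintf(node_id_str, sizeof(node_id_str), "%d", local_node_id);
        params[0] = node_id_str;

        res = PQexecParams(upstream_conn,
                           "SELECT upstream_node_id FROM repmgr.nodes"
                           " WHERE node_id = $1",
                           1, NULL, params, NULL, NULL, 0);

        if (PQresultStatus(res) == PGRES_TUPLES_OK &&
            PQntuples(res) == 1 &&
            !PQgetisnull(res, 0, 0))
            upstream_node_id = atoi(PQgetvalue(res, 0, 0));

        PQclear(res);

        return upstream_node_id;
    }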
"repmgr daemon" can be interpreted to mean the commands affect the local
daemon process only. Rename the commands which affect the entire cluster
to "repmgr service ...".
The "repmgr daemon ..." form of the affected commands is retained for backwards
compatibility.
Previously, if a standby's repmgrd was looping in degraded monitoring
mode looking for a new primary to follow, once a new primary was
detected the follow command would be executed without any prior
logging at non-DEBUG log levels.
As the witness server does not, by definition, ever have an entry in
pg_stat_replication, we need to check its "attached" status by
connecting to the witness server itself and querying the reported
upstream node ID (which should be set by the witness server's
repmgrd). If this matches the current primary node ID, we count
the witness as attached.
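A minimal sketch of the witness check; the
repmgr.get_upstream_node_id() call is an assumption made for
illustration, as the exact mechanism by which the witness's repmgrd
reports its upstream may differ:

    #include <stdbool.h>
    #include <stdlib.h>
    #include <libpq-fe.h>

    static bool
    witness_is_attached(PGconn *witness_conn, int primary_node_id)
    {
        bool      attached = false;
        PGresult *res = PQexec(witness_conn,
                               "SELECT repmgr.get_upstream_node_id()");

        /* count the witness as attached if its reported upstream
         * matches the current primary node ID */
        if (PQresultStatus(res) == PGRES_TUPLES_OK &&
            PQntuples(res) == 1 &&
            !PQgetisnull(res, 0, 0))
            attached = (atoi(PQgetvalue(res, 0, 0)) == primary_node_id);

        PQclear(res);

        return attached;
    }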