Previously, "repmgr standby switchover" used the configuration file parameters
"reconnect_interval" and "reconnect_attempts" to define a timeout to determine
whether the current primary (demotion candidate) has shut down.
However, these parameters are intended for primary failure detection and are
generally lower in value, while a controlled shutdown may take longer, resulting
in the switchover being aborted as repmgr was not waiting long enough.
To prevent this happening, parameter "shutdown_check_timeout" has been added.
This complements the existing "standby_reconnect_timeout" parameter used
by "repmgr standby switchover".
Implements GitHub #504.
Previously, if the server being monitored was not available, repmgrd
would always close the existing connection handle and open a new one.
However, in some cases, e.g. a brief network outage, the existing
connection handle is still good and does not need to be reopened.
This could be particularly problematic if monitoring_history is on,
as this risks leaving orphan sessions on the primary which (given
a sufficiently unstable network) could lead to all available backends
being occupied.
Instead, during an outage we now use a new connection to verify
the server is accessible; if the old connection is still available
(e.g. following a short network interruption) we continue using that;
if not (e.g. the server was restarted), we use the new one.
The unqualified wording previously implied that any running server could
be rejoined with "standby follow", which is not the case with a
"split brain" primary.
Previously query texts were always logged at log level DEBUG, but
that doesn't help much in a normal production environment when
trying to identify the cause of issues.
Also make various other minor improvements to query logging and
handling of database errors.
Implements GitHub #498.
Previously repmgr would first check that a replication can be made
from the demotion candidate to the promotion candidate, however it's
preferable to sanity-check the number of available walsenders first,
to provide a more useful error message.
Avoid copying files during a --dry-run as it may introduce unexpected changes
on the target node. During an actual clone operation, any problems with
copying files will be detected early and the operation aborted before
the actual database cloning commences.
GitHub #491.
Basically any setting which can contain a user-defined script
*must* have the full path set, even if it's repmgr being executed.
We could potentially apply some heuristics to detect if the first
item in the setting is "repmgr" (or more precisely repmgrd's program
name), but this will require some careful thought and testing
that it works as intended.
In the sample logrotate configuration file, use "copytruncate" rather than "create",
as repmgrd currently doesn't reopen the log file (unless the configuration changes).
Per suggestion in GitHub #465.
The documentation implied it would override "promote_command", which is
not the case.
"promote_command" is used by repmgrd to execute "repmgr standby promote"
(either directly or via a custom script).
"service_promote_command" can be set to specify a package-level service
command to promote the local PostgreSQL instance from standby to primary,
e.g. Debian's pg_ctlcluster. If set, this will be executed by "repmgr standby promote".
Also update code comments to clarify usage.
Related to GitHub #473.