Previously, if the server being monitored was not available, repmgrd
would always close the existing connection handle and open a new one.
However, in some cases, e.g. a brief network outage, the existing
connection handle is still good and does not need to be reopened.
Unconditionally reopening the connection could be particularly
problematic if monitoring_history is on, as it risks leaving orphan
sessions on the primary which (given a sufficiently unstable network)
could lead to all available backends being occupied.
Instead, during an outage we now use a new connection to verify
the server is accessible; if the old connection is still available
(e.g. following a short network interruption) we continue using that;
if not (e.g. the server was restarted), we use the new one.
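
The selection logic described above can be sketched roughly as follows.
This is an illustrative Python outline only, not repmgrd's actual C
implementation: Conn, choose_connection, open_new and is_usable are
hypothetical names, and leaving the old handle untouched while the
server remains unreachable is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Conn:
    """Hypothetical stand-in for a database connection handle."""
    name: str
    usable: bool = True
    closed: bool = False

    def close(self) -> None:
        self.closed = True


def choose_connection(
    old_conn: Conn,
    open_new: Callable[[], Optional[Conn]],
    is_usable: Callable[[Conn], bool],
) -> Optional[Conn]:
    """Return the handle to keep using, or None if the server is unreachable."""
    # Use a fresh connection attempt to verify the server is accessible.
    probe = open_new()
    if probe is None:
        # Server still not accessible. Assumption: the old handle is
        # left alone so it can be re-checked on the next attempt.
        return None

    if is_usable(old_conn):
        # e.g. a short network interruption: the existing session
        # survived, so discard the probe and keep the old handle.
        probe.close()
        return old_conn

    # e.g. the server was restarted: the old handle is dead, so
    # switch to the newly opened connection.
    old_conn.close()
    return probe
```

Closing the probe when the old handle survives is what keeps a flapping
network from accumulating extra sessions on the primary.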
Previously, query texts were always logged at log level DEBUG, but
that doesn't help much in a normal production environment when
trying to identify the cause of issues.
Also make various other minor improvements to query logging and
handling of database errors.
Implements GitHub #498.
Previously repmgr would first check that a replication connection can
be made from the demotion candidate to the promotion candidate; however,
it's preferable to sanity-check the number of available walsenders first,
to provide a more useful error message.
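
The reordering can be illustrated with a short sketch. This is a
hypothetical Python outline, not repmgr's C code: the function name,
its parameters and the message texts are all invented for illustration.

```python
from typing import Callable, Optional


def check_switchover_preconditions(
    free_walsenders: int,
    required_walsenders: int,
    can_open_replication_conn: Callable[[], bool],
) -> Optional[str]:
    """Return an error message, or None if both checks pass."""
    # Check walsender availability first: if none are free, the
    # replication connection attempt below would fail anyway, but with
    # a less specific error.
    if free_walsenders < required_walsenders:
        return (
            f"insufficient free walsenders: {free_walsenders} available, "
            f"{required_walsenders} required"
        )

    # Only then attempt the replication connection itself.
    if not can_open_replication_conn():
        return "unable to establish a replication connection"

    return None
```

With the cheaper, more specific check first, a lack of free walsenders
is reported as such rather than as a generic connection failure.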
The unqualified wording previously implied that any running server could
be rejoined with "standby follow", which is not the case with a
"split brain" primary.
Avoid copying files during a --dry-run as it may introduce unexpected changes
on the target node. During an actual clone operation, any problems with
copying files will be detected early and the operation aborted before
the actual database cloning commences.
GitHub #491.
Basically any setting which can contain a user-defined script
*must* have the full path set, even if it's repmgr being executed.
We could potentially apply some heuristics to detect if the first
item in the setting is "repmgr" (or more precisely repmgrd's program
name), but this will require some careful thought, and testing to
confirm that it works as intended.
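
As a rough illustration of both the full-path requirement and the kind
of heuristic mentioned above, consider the following Python sketch.
It is not part of repmgr; the helper names are hypothetical, and the
shell-style splitting of the setting value is an assumption.

```python
import posixpath
import shlex


def first_item(setting: str) -> str:
    """Return the first shell-style token of a command setting, or ''."""
    parts = shlex.split(setting)
    return parts[0] if parts else ""


def has_full_path(setting: str) -> bool:
    """True if the command in the setting is given as an absolute path."""
    return posixpath.isabs(first_item(setting))


def looks_like_program(setting: str, progname: str = "repmgr") -> bool:
    """Heuristic: True if the first item's base name equals progname."""
    return posixpath.basename(first_item(setting)) == progname
```

For example, has_full_path("repmgr standby promote") is False, while
looks_like_program("repmgr standby promote") is True, which is the case
such a heuristic would need to handle.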
In the sample logrotate configuration file, use "copytruncate" rather than "create",
as repmgrd currently doesn't reopen the log file (unless the configuration changes).
Per suggestion in GitHub #465.
The documentation implied "service_promote_command" would override
"promote_command", which is not the case.
"promote_command" is used by repmgrd to execute "repmgr standby promote"
(either directly or via a custom script).
"service_promote_command" can be set to specify a package-level service
command to promote the local PostgreSQL instance from standby to primary,
e.g. Debian's pg_ctlcluster. If set, this will be executed by "repmgr standby promote".
Also update code comments to clarify usage.
Related to GitHub #473.
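
The relationship between the two settings can be sketched as follows.
This is a hypothetical Python model, not repmgr code: the function
names, the BUILTIN_PROMOTE placeholder and the example paths and
cluster name are assumptions made for illustration.

```python
from typing import Mapping

# Placeholder for whatever "repmgr standby promote" does when no
# service-level command is configured; deliberately not specified here.
BUILTIN_PROMOTE = "<built-in promotion method>"


def repmgrd_promote_action(config: Mapping[str, str]) -> str:
    """What repmgrd runs: always 'promote_command'."""
    return config["promote_command"]


def standby_promote_action(config: Mapping[str, str]) -> str:
    """What 'repmgr standby promote' runs to promote PostgreSQL."""
    service_cmd = config.get("service_promote_command", "").strip()
    return service_cmd if service_cmd else BUILTIN_PROMOTE


# Example values only; paths, version and cluster name are hypothetical.
example_config = {
    "promote_command": "/usr/bin/repmgr standby promote -f /etc/repmgr.conf",
    "service_promote_command": "/usr/bin/pg_ctlcluster 9.6 main promote",
}
```

Setting "service_promote_command" changes only the second function's
result; what repmgrd itself executes is still "promote_command".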