Previously, if the server being monitored was not available, repmgrd
would always close the existing connection handle and open a new one.
However, in some cases, e.g. a brief network outage, the existing
connection handle is still good and does not need to be reopened.
Always reopening the connection could be particularly problematic if
monitoring_history is on, as this risks leaving orphaned sessions on the
primary which (given a sufficiently unstable network) could end up
occupying all available backends.
Instead, during an outage we now use a new connection to verify
the server is accessible; if the old connection is still available
(e.g. following a short network interruption) we continue using that;
if not (e.g. the server was restarted), we use the new one.
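The decision described above can be sketched as follows. This is a minimal illustration, not repmgr's actual code. The enum and function names are hypothetical. The two booleans stand in for whatever connection checks repmgrd really performs on the verification connection and on the existing handle.

```c
#include <stdbool.h>

/* Hypothetical outcomes; names are illustrative only. */
typedef enum
{
    SERVER_UNREACHABLE,        /* verification connection failed: outage continues */
    KEEP_EXISTING_CONNECTION,  /* old handle still works, e.g. after a brief network blip */
    SWITCH_TO_NEW_CONNECTION   /* old handle is dead, e.g. the server was restarted */
} ConnectionDecision;

/*
 * new_conn_ok: a freshly opened connection reached the server.
 * old_conn_ok: the existing connection handle is still usable.
 */
ConnectionDecision
choose_connection(bool new_conn_ok, bool old_conn_ok)
{
    /* The new connection is first used only to verify accessibility. */
    if (!new_conn_ok)
        return SERVER_UNREACHABLE;

    /*
     * Prefer the existing handle; the caller would then close the
     * verification connection, so no extra session is left behind.
     */
    if (old_conn_ok)
        return KEEP_EXISTING_CONNECTION;

    /* The old handle is unusable, so adopt the new connection. */
    return SWITCH_TO_NEW_CONNECTION;
}
```

Keeping the existing handle whenever possible means a flapping network no longer adds a new session to the primary on each interruption.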
Previously, query texts were always logged at log level DEBUG, but
that is of little help when trying to identify the cause of issues
in a normal production environment.
Also make various other minor improvements to query logging and to
the handling of database errors.
Implements GitHub #498.
Previously repmgr would first check that a replication connection can
be made from the demotion candidate to the promotion candidate; however,
it's preferable to sanity-check the number of available walsenders
first, to provide a more useful error message.
This suppresses log output below log level ERROR. This is mainly useful
when repmgr is executed programmatically, e.g. from a cronjob, where
output is only wanted if something goes wrong.
Note that we advise against using this option when executing repmgr
commands which operate on PostgreSQL nodes (standby follow,
standby promote, standby switchover, node rejoin), particularly when
they are executed by repmgrd, as the log output provides valuable
troubleshooting information.
Implements suggestion in GitHub #468.
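Threshold-based filtering of this kind can be sketched using the standard syslog priorities from <syslog.h>, where a lower number means a more severe message (LOG_EMERG is 0, LOG_DEBUG is 7). This is a hypothetical illustration. The function names and the way the option is applied are assumptions, not repmgr's implementation.

```c
#include <stdbool.h>
#include <syslog.h>   /* LOG_EMERG (0) ... LOG_ERR (3) ... LOG_DEBUG (7) */

/*
 * Hypothetical: combine the configured log level with an
 * "errors only" switch. The switch caps the threshold at LOG_ERR,
 * so nothing less severe than ERROR is let through; a level that is
 * already stricter than ERROR is left unchanged.
 */
int
effective_log_threshold(int configured_level, bool errors_only)
{
    if (errors_only && configured_level > LOG_ERR)
        return LOG_ERR;
    return configured_level;
}

/* A message is emitted if it is at least as severe as the threshold. */
bool
should_log(int message_level, int threshold)
{
    return message_level <= threshold;
}
```

With a configured level of INFO, WARNING messages would normally appear. With the switch enabled, only ERROR and more severe messages remain.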
The default was previously NOTICE (as in repmgr 3.x), but the documentation
implied it was INFO, and many of the documentation examples assume it is.
INFO produces quite informative log output without creating excessive
log file volume. In particular, it gives a better idea of what
repmgrd is actually doing.
Also add documentation section for the log configuration parameters.
GitHub #470, containing the change suggested in GitHub #467.
If any issues are detected (e.g. node not reachable, unexpected node status
etc.), "repmgr cluster show" returns exit code 25 ("ERR_NODE_STATUS").
Note that exit code 25 was recently introduced as "ERR_CLUSTER_CHECK";
however, it makes sense to use it to indicate issues detected by any
command which can detect node issues.
Addresses GitHub #456.
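The exit-code rule can be sketched as a scan over per-node check results. The value 25 and the name ERR_NODE_STATUS come from the text above. The NodeCheck struct and the function name are hypothetical, and EXIT_SUCCESS is the standard C success code.

```c
#include <stdbool.h>
#include <stdlib.h>   /* EXIT_SUCCESS */

#define ERR_NODE_STATUS 25   /* exit code named in the text above */

/* Hypothetical per-node result gathered while building the overview. */
typedef struct
{
    bool reachable;
    bool status_as_expected;
} NodeCheck;

/*
 * Return ERR_NODE_STATUS if any node shows a problem, otherwise
 * success. The full cluster overview would still be printed first;
 * this only decides the exit code.
 */
int
cluster_show_exit_code(const NodeCheck *nodes, int n_nodes)
{
    for (int i = 0; i < n_nodes; i++)
    {
        if (!nodes[i].reachable || !nodes[i].status_as_expected)
            return ERR_NODE_STATUS;
    }
    return EXIT_SUCCESS;
}
```

Because the exit code is not tied to one command, any other command that detects node issues could reuse the same check.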
The default value for "wait_register_sync_seconds" was zero, which is treated
as disabling --wait-sync altogether. The default value is now -1, which is
taken to mean that no timeout value was supplied.
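The sentinel values above can be sketched as a small classification function. It assumes that 0 still disables waiting, that -1 (the new default) means no timeout was supplied, and that a positive value is a timeout in seconds. Treating every negative value like -1 is an additional assumption. The enum and function names are hypothetical.

```c
/* Hypothetical interpretation of "wait_register_sync_seconds". */
typedef enum
{
    WAIT_SYNC_DISABLED,             /* 0: --wait-sync effectively disabled */
    WAIT_SYNC_NO_TIMEOUT_SUPPLIED,  /* -1 (new default): no timeout given */
    WAIT_SYNC_WITH_TIMEOUT          /* > 0: wait at most this many seconds */
} WaitSyncMode;

WaitSyncMode
wait_sync_mode(int wait_register_sync_seconds)
{
    if (wait_register_sync_seconds == 0)
        return WAIT_SYNC_DISABLED;
    /* Assumption: any negative value is handled like the -1 default. */
    if (wait_register_sync_seconds < 0)
        return WAIT_SYNC_NO_TIMEOUT_SUPPLIED;
    return WAIT_SYNC_WITH_TIMEOUT;
}
```

With the old default of 0, an unset parameter fell into the "disabled" branch. A distinct sentinel keeps "not set" separate from "explicitly zero".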
This matches the behaviour of other PostgreSQL utilities such as psql, though
repmgr will only abort once all command line options have been parsed, so that
as many errors as possible are found and displayed. If a repmgr "command"
(e.g. "repmgr primary ...") was provided, a hint about the relevant command
help section (e.g. "repmgr primary --help") is shown alongside the hint for
the generic help command (i.e. "repmgr --help").
Addresses GitHub #464, with further improvements.
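The "collect errors, then abort" pattern can be sketched as follows. Errors are recorded during parsing and reported together once parsing is complete, followed by the help hints. The type and function names are hypothetical. The exact "ERROR:"/"HINT:" wording is illustrative and not taken from repmgr's output.

```c
#include <stdio.h>

#define MAX_OPTION_ERRORS 16

/* Hypothetical collector for problems found while parsing options. */
typedef struct
{
    const char *messages[MAX_OPTION_ERRORS];
    int         count;
} OptionErrors;

/* Record a problem and keep parsing instead of exiting immediately. */
void
record_option_error(OptionErrors *errors, const char *message)
{
    if (errors->count < MAX_OPTION_ERRORS)
        errors->messages[errors->count++] = message;
}

/*
 * Called once all options have been parsed: print every recorded error,
 * then the command-specific hint (if a command was given) and the
 * generic one. Returns the number of errors; nonzero means abort.
 */
int
report_option_errors(FILE *out, const OptionErrors *errors, const char *command)
{
    if (errors->count == 0)
        return 0;

    for (int i = 0; i < errors->count; i++)
        fprintf(out, "ERROR: %s\n", errors->messages[i]);

    if (command != NULL)
        fprintf(out, "HINT: Try \"repmgr %s --help\" or \"repmgr --help\" for more information.\n",
                command);
    else
        fprintf(out, "HINT: Try \"repmgr --help\" for more information.\n");

    return errors->count;
}
```

A caller would run its whole option-parsing loop, call report_option_errors(stderr, &errors, command), and exit with an error status if the result is nonzero.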
It's hard to imagine a use case where this isn't desirable, but if,
for whatever reason, the user does not wish to daemonize the process,
the command line option "--daemonize=false" can be provided.
Implements GitHub #458.
Check that sufficient walsenders will be available on the promotion
candidate and, if replication slots are in use, that enough of those
will be available too.
Note that these checks can't guarantee that the walsenders/slots will
be available at the appropriate points during the switchover process,
but they do ensure that existing configuration problems will be caught.
Implements GitHub #371.
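The capacity check can be sketched as simple arithmetic on a snapshot of the promotion candidate. In practice the limits would come from the max_wal_senders and max_replication_slots settings. Current usage could come from views such as pg_stat_replication and pg_replication_slots. The struct, the function, and the meaning of "required" are assumptions: required_senders might be the number of nodes, including the demotion candidate, expected to replicate from the new primary.

```c
#include <stdbool.h>

/* Hypothetical snapshot of limits and current usage on the promotion candidate. */
typedef struct
{
    int max_wal_senders;
    int active_wal_senders;
    int max_replication_slots;
    int used_replication_slots;
} NodeCapacity;

/*
 * Return true if enough free walsenders (and, when slots are in use,
 * enough free replication slots) appear to be available. This only
 * catches existing configuration problems; usage can still change
 * while the switchover is in progress.
 */
bool
switchover_capacity_ok(const NodeCapacity *c, int required_senders,
                       bool use_slots, int required_slots)
{
    /* Walsenders are checked first; every switchover needs them. */
    if (c->max_wal_senders - c->active_wal_senders < required_senders)
        return false;

    /* Slot availability only matters when replication slots are in use. */
    if (use_slots &&
        c->max_replication_slots - c->used_replication_slots < required_slots)
        return false;

    return true;
}
```

Running this kind of check before attempting any replication connection lets the error message name the actual shortage rather than a generic connection failure.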