If two nodes were in the primary location, and at least one node in
another location, the non-failed node in the primary location was not
recognising itself as a promotion candidate.
Addresses GitHub #407.
Addresses issue described in GitHub #378.
PostgreSQL itself doesn't issue a checkpoint after promotion to ensure
the newly promoted server is available as quickly as possible, so we'll
only execute an explicit CHECKPOINT when it's actually required, i.e.
when pg_rewind will be executed. This is required as pg_rewind uses
the timeline reported in the pg_control file to compare with the
server to be rewound, and the pg_control timeline is only updated after
the first checkpoint, so there is an interval where pg_rewind will
erroneously assume both servers are on the timeline and take no action.
During a switchover operation, once the demoted primary has been restarted
as a standby, repmgr attempts to reconnect to verify its status and drop
any redundant replication slots. However it's possible the standby may still
be in the startup phase, so poll for "standby_reconnect_timeout" seconds
before giving up.
Addresses GitHub #408.
When executing a command on a remote server, repmgr expects the remote binary
to be in the same location as the local binary. It's reasonable to assume
repmgr will be deployed in a unified environment; if not, the onus is on the
user to ensure repmgr can find the remote binary, e.g. by creating appropriate
symlinks.
Addresses query in GitHub #406.
When parsing conninfo strings, ensure the error message pointer is
actually returned to the caller.
Not a criticial issue, just meant the contents of the error message
were not being displayed.
get_superuser_connection() was erroneously using the local node record
to connect to as a superuser, which works when registering the primary
but obviously not when cloning a standby.
Addresses GitHub #380.
This is used for determining a timeout when reconnecting to the standby
after executing the "follow_command". This will normally not need to be
set explicitly, but maybe useful in cases where the standby's startup
phase can last longer than usual.
- emit explicit startup NOTICE
- emit NOTICE when falling back to degraded monitoring on a primary node
- improve log message and event notification details when monitoring
a former primary which has been reconnected as a standby
This introduces following new configuration file parameters, which
were previously hard-coded values:
- promote_check_timeout
- promote_check_interval
Implements GitHub #387.