It's possible the upstream node may be temporarily not accepting connections
but is still running, so we only confirm that connections are not possible once
PQping() reports a negative result.
This feature has been adapted from repmgr4.
Ensure relevant decision making information is visible at the
default log level (INFO), and also that where log messages are
specific to a particular node, that node's ID is noted.
Currently in repmgrd3, if a repmgrd enters failover, but one or more other
repmgrds do not (e.g. partial primary invisibility), the repmgrd in failover
may enter an infinite loop waiting for the repmgrd(s) not in failover to
update shared memory.
Function was created but never actually used, resulting in incorrect
values for "communication_time_lag" in the "repl_status" view.
This appears to have been an oversight in the original commit
( c3b58658ad ).
Addresses GitHub #290
When checking the new standby's record in pg_stat_replication, keep
polling until the expected status is reported, and only give up
after a timeout was exceeded.
Previously repmgr would report an error if status was "startup",
even though this is not a problem.
If failover=automatic, it would be reasonable to expect repmgrd to
consider this node as a promotion candidate, however this will not
happen if it is marked inactive. This often happens when a failed
primary is recloned as a standby but not re-registered, and if
repmgrd would run it would give the incorrect impression that
failover capability is available.
Addresses GitHub #153.
The LSN reported by the shared memory function defaults to "0/0"
(InvalidXLogRecPtr) - this indicates that the repmgrd on that node
hasn't been able to update it yet. However during failover several
places in the code assumed this is an error, which would cause
an endless loop waiting for updates which would never come.
To get around this without changing function definitions, we can
store an explicit message in the shared memory location field so the
caller can tell whether the other node hasn't yet updated the field,
or encountered situation which means it should not be considered
as a promotion candidate (which in most cases will be because
`failover` is set to `manual`.
Resolves GitHub #222.
This is to ensure that when repmgr executes pg_basebackup it doesn't
add any options which would conflict with user-supplied options.
This is related to GitHub #206, where the -S/--slot option has been
added for 9.6 - it's important to check this doesn't conflict with
-X/--xlog-method.
While we're at it, rename the ErrorList handling code to ItemList
etc. so we can use it for generic non-error-related lists.
Otherwise the monitoring table's 'last_wal_standby_location' will stay at
the location of the last streaming WAL received.
This complements the bugfix applied in e814c1120e.