If neither the local node nor the upstream are available, and
"standby_disconnect_on_failover" is set, attempting to fetch
the walreceiver PID will result in repmgrd terminating.
Add a check that the connection is valid before attempting to
fetch the walreceiver PID.
Addresses GitHub #675.
As it seems redirecting stderr to stdin (2>&1) when executing
system commands results in a SIGPIPE (141) return code, making
it impossible to determine the actual return code, redirect
stderr to a temporary file and collate the output from that.
There are possibly better ways of doing this which could
be revisited at a future date.
Output a command, which when excuted on the local node (promotion
candidate) will attempt to remotely connect to the demotion candidate
and display both the connection message encountered and the connection
parameters used.
This is useful for corner-cases where the connection normally succeeds if a
particular environment variable (e.g. PGPORT) is normally set, but is
not set in the environment where SSH is executed.
If the WAL receiver has been temporarily disabled, we don't want to
wait for it to start up as it may not be able to at that point; we do
however need to reset "wal_retrieve_retry_interval".
This is intended to ensure that all nodes have a constant LSN while
making the failover decision.
This feature is experimental and needs to be explicitly enabled with the
configuration file option "standby_disconnect_on_failover".
Note enabling this option will result in a delay in the failover decision
until the WAL receiver is disconnected on all nodes.