Explicitly check whether the "repmgr node rejoin" command on the
demotion candidate succeeded. Due to the way SSH execution is
currently implemented, we can return either the command execution
status or the command output; to ensure any errors are available,
log them to a temporary file on the demotion candidate and note
its location in case of an error.
While we're at it, improve error message handling when the demotion
candidate fails to rejoin.
- state how many nodes are to be operated on
- if errors were encountered with any nodes, emit the total number
of nodes as well as the number of affected nodes
- log nodes where repmgrd was not running anyway as NOTICE, not
WARNING
Unfortunately it hasn't been possible yet to include all available
configuration items in the main documentation, but we should at least
make it easier to find the full list.
If a PostgreSQL instance was shut down while repmgrd was running, and
repmgrd was subsequently restarted (this chain of events could occur
during e.g. a server reboot), the node record will have been set to
"inactive". Previously, in this case repmgrd would refuse to start up.
However, as we can determine the node is running, it should normally
be no problem to automatically set the node record to "active".
The old behaviour can be restored by setting the new parameter
"repmgrd_exit_on_inactive_node" to "true".
RM19604.
The string may not always start with "PostgreSQL" when building
against non-community versions.
Life would be much easier here if there was an option like
"pg_config --version-number" or similar.
If executing "repmgr standby clone --replication-conf-only" on a node
which was set up without replication slots, but the repmgr configuration
was since changed to "use_replication_slots=1", repmgr will attempt to
create the replication slot. This will however fail if "slot_name"
is not set in the node's record, so have repmgr set the slot_name in
this case.
It might be preferable to preemptively create the slot name for each
node when configuring the cluster, however this would be a behavioural
change which would be better off in a major release (for example, it's
conceivable a user runs sanity checks on the node records and expects
to find the slot names empty if replication slots are not in use).
Some of the more generically named functions are at risk of colliding
with functions defined in other libraries. To mitigate that risk,
prefix with "repmgr_", unless the name already has some reference
to repmgr.
This requires an extension version bump.
RM20471.
As it's possible to specify the connection information for any available
node, but currently not possible to rejoin to a node other than the
primary, explicitly mention what the rejoin target will be.
In a corner-case situation where a standby is unable to attach to
the new primary due to a mismatch in the WAL stream, the connection
used to verify the recovery status of the new primary was not being
closed, leading to a risk of connection exhaustion on the new primary.
Addresses GitHub #682.
Previously, repmgr would forcibly change the permissions on a data
directory to 0700. However from PostgreSQL 11, 0750 is also valid,
so that value should not be changed.