Check it's actually possible for the demotion candidate to attach to
the promotion candidate before executing the switchover.
As with other checks of this nature, there's a faint possibility the
situation could change between the time the check is carried out and
the demotion candidate is restarted to connect to the promotion candidate,
but there's not a lot we can do about that. The main purpose is to
be able to catch existing misconfigurations before anything gets changed.
Implements GitHub #370.
Check that sufficient walsenders will be available on the promotion
candidate, and if replication slots are in use check if enough of
those will be available.
Note these checks can't guarantee that the walsenders/slots will
be available at the appropriate points during the switchover process,
but do ensure that existing configuration problems will be caught.
Implements GitHub #371.
By checking the PID file in the same way pg_ctl does, we can be pretty
much certain whether the target data directory contains an active
PostgreSQL instance.
"--upstream-node-id", if provided, was not being passed through to
the SQL query executed via the Barman server.
Also modified the query to select the primary node if "--upstream-node-id"
is not provided.
Note: this is a very niche use case.
Attempting to use the contents of pg_control to tell whether the directory
is in use by PostgreSQL can result in false positives; we should use
a check based on the pidfile.
Also change the HINT to indicate a data directory can be overwritten
if -F/--force is provided.
This ensures any connection errors are displayed by default in a
comprehensible, easily reportable way, and saves having to request/filter
DEBUG output.
Implements GitHub #369.
In particular, if running "repmgr cluster show" against a database
without the repmgr metadata, showing the error (rather than just
"no records found" etc.) will provide some clues about the problem.
Previously, if an issue was encountered with the old primary, but user
provided -F/--force to have repmgr promote the standby anyway, repmgr
would exit with the log message "STANDBY SWITCHOVER is complete"
and exit code 0 (SUCCESS).
To better report this partial completion, repmgr will now emit the message
"STANDBY SWITCHOVER has completed with issues" (and a HINT to check preceding
log messages) and new exit code 22 (ERR_SWITCHOVER_INCOMPLETE).
If logging output not explicitly rediretced with "-l" in the pg_ctl
options, repmgr would hang waiting for pg_ctl output.
Note that we recommend using the OS-level service commands where
available.
This was required for a specific use case during pre-release
development and is no longer needed now the physical streaming
replication handling is implemented.
It's possible that a node was registered with "use_replication_slots=false"
but that was later changed to "use_replication_slots=true". If the node
was not subsequently re-registered, the node record will contain an empty
slot name, which will cause any slot creation operation during
"standby follow" or "node rejoin" to fail.
To prevent this happening, check for an empty slot name and automatically
set before proceeding.
Addresses GitHub #343.