In theory the local connection should not be affected by the node's
promotion. However we're handing over control to an external command
which is usually just "repmgr standby promote", but could potentially
be a user-defined script with unknowable side effects. So it's
better to be safe than sorry.
As of PostgreSQL 13, changes to the fundamental replication
configuration can be applied with a simple SIGHUP, no restart
required.
In case the old behaviour is desired, i.e. a full restart to apply
the configuration changes, the new configuration parameter
"standby_follow_restart" can be set. This parameter has no effect
in PostgreSQL 12 and earlier.
In certain corner cases, it's possible repmgrd may end up monitoring
a standby which was a former primary, but the node record has not
yet been updated.
Previously repmgrd would abort the promotion with a cryptic message
about being unable to find a node record for node_id -1 (the
default value for an unknown node id).
This commit addes a new configuration option "always_promote", which
determines whether repmgrd should promote the node in this case.
The default is "false", to effectively maintain the existing behaviour.
Logging output has also been improved to make it clearer what has
happened when this situation occurs.
If the control file couldn't be parsed for whatever reason, return
the default value for the requested parameter.
It'd be better to have the caller pass in a pointer to the parameter
and have the function return bool so the caller doesn't assume the
control file was read successfully. This is important for handling
DBState, where no "value unknown" default is available.
These indicate:
- the number of visible nodes sharing the current upstream
- the number of nodes on the current upstream
- the total number of nodes in the entire repmgr cluster.
This allows the failover_validation_command to be used to perform
more thorough validations, including cross-referencing external
cluster management state (e.g. if managed by kubernetes).
GitHub #651.
If the provided connection does not have sufficient permission to read
"pg_stat_replication.state", and there is an entry for the node in
"pg_stat_replication", assume it's connected. Finer-grained detection
requires additional user permissions, nothing we can do about that.
Previously the check verifying that a node has connected to its upstream
merely assumed the presence of a record in pg_stat_replication indicates
a successful replication connection. However the record may contain a
state other than "streaming", typically "startup" (which will occur when
a node has diverged from its upstream and will therefore never
transition to "streaming"), which needs to be taken into account when
considering the state of the replication connection to avoid false
positives.
Add notes to the documention mentioning that after postgres or repmgr
upgrades (postgres major upgrades), there are some changes that need
to be taken care of.
Signed-off-by: Martín Marqués <martin.marques@2ndquadrant.com>
Previously these actions were hard-wired to assume node IDs would only
ever have two digits at most.
Refactor to use the same table generation code as other actions, which
properly handles variable column sizes.
The option was never actually set anywhere.
By removing it, readline will now produce a reasoanably helpful message
in the offchance it is provided, e.g.:
option '--node=foo' is ambiguous; possibilities: '--node-id' '--node-name'