This is needed for better switchover control, so we can instruct
the remote repmgr to issue the appropriate server command rather
than trying to work out what it should be from the local node.
In previous versions of repmgr, some options had ambiguous meanings,
and/or were used for slightly different purposes. This way we end
up with a couple more options (most of which probably won't need
adjusting) but greater clarity and flexibility.
Removed:
    master_response_timeout:
      renamed to "async_query_timeout", as this was its main usage
    retry_promote_interval_secs:
      replaced by "primary_notification_timeout"
Added:
    async_query_timeout:
      timeout (in seconds) when executing asynchronous queries
    primary_notification_timeout:
      number of seconds to wait for notification from the new primary
      after a failover
    primary_follow_timeout:
      number of seconds to wait for the new primary to become available
      when executing "repmgr standby follow"
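The renames listed above can be sketched as a small lookup table. This is an illustrative helper only and not part of repmgr: the function name migrate_options and the dict-based configuration are assumptions, the first removed option is written with its repmgr.conf spelling master_response_timeout, and carrying the old value of retry_promote_interval_secs over to primary_notification_timeout unchanged is also an assumption, since that option is described as replaced rather than renamed.

```python
# Hypothetical helper, not part of repmgr: maps the removed option names
# onto their replacements in an already-parsed configuration dict.
RENAMED_OPTIONS = {
    "master_response_timeout": "async_query_timeout",
    # Assumption: the old value is reused as-is for the replacement option.
    "retry_promote_interval_secs": "primary_notification_timeout",
}


def migrate_options(config):
    """Return a copy of `config` with removed option names replaced."""
    migrated = {}
    for name, value in config.items():
        new_name = RENAMED_OPTIONS.get(name, name)
        # An explicitly set new-style option wins over a migrated old one.
        if new_name != name and new_name in config:
            continue
        migrated[new_name] = value
    return migrated
```

Options that were not renamed pass through untouched, so the helper can be applied to a whole configuration without listing every setting.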
Rather than simply emit "FAILED" for an unreachable node,
indicate whether its state matches that expected by repmgr.
E.g. the following output:
 ID | Name  | Role    | Status               | Upstream | Connection string
----+-------+---------+----------------------+----------+----------------------------------------------------
 1  | node1 | primary | * running            |          | host=localhost dbname=repmgr user=repmgr port=5501
 2  | node2 | standby | ? unreachable        | node1    | host=localhost dbname=repmgr user=repmgr port=5502
 3  | node3 | standby | ! running as primary | node1    | host=localhost dbname=repmgr user=repmgr port=5503
is for a cluster where "node2" has been manually stopped, and "node3"
manually promoted.
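The idea behind the Status column can be sketched as a comparison between the role recorded for a node and what is actually observed. This is a minimal illustration covering only the three cases in the example output; the function name format_status and its parameters are hypothetical, and the real implementation may take more state into account.

```python
def format_status(expected_role, reachable, actual_role=None):
    """Build a status string from recorded vs. observed node state.

    Hypothetical sketch: handles only the three cases shown in the
    example output, not every state repmgr can report.
    """
    if not reachable:
        # The node record says the node should be up, but it cannot be reached.
        return "? unreachable"
    if actual_role == expected_role:
        # Observed state matches the node record.
        return "* running"
    # Node is up, but in a role the node record does not expect.
    return "! running as " + actual_role


print(format_status("primary", True, "primary"))
print(format_status("standby", False))
print(format_status("standby", True, "primary"))
```

The three calls correspond to node1, node2 and node3 in the example cluster.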
The node(s) with higher ID will "yield", leaving the decision making
up to the node with the lower ID.
This happens very rarely, usually when the random delay is close
enough on two or more nodes that vote initiation is simultaneous.
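The tie-break described here amounts to "lowest node ID wins". A minimal sketch, assuming each node knows the IDs of the other nodes that initiated a vote at the same time (should_yield and its parameters are hypothetical names, not repmgr functions):

```python
def should_yield(own_node_id, competing_node_ids):
    """Return True if this node should leave the decision to another node.

    Hypothetical sketch of the rule above: a node yields whenever some
    simultaneously voting node has a lower ID than its own.
    """
    return any(other_id < own_node_id for other_id in competing_node_ids)
```

With nodes 2 and 3 initiating a vote at the same time, should_yield(3, [2]) is true and should_yield(2, [3]) is false, so node 2 makes the decision.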
It's possible the "failover" is completed by one repmgrd before the
other has a chance to react, in which case the am_bdr_failover_handler()
check will not apply. Instead, check whether the node record has
already been set to "inactive".