Previously, "repmgr standby switchover" used the configuration file parameters
"reconnect_interval" and "reconnect_attempts" to define a timeout to determine
whether the current primary (demotion candidate) has shut down.
However, these parameters are intended for primary failure detection and are
generally lower in value, while a controlled shutdown may take longer, resulting
in the switchover being aborted as repmgr was not waiting long enough.
To prevent this from happening, the parameter "shutdown_check_timeout" has been added.
This complements the existing "standby_reconnect_timeout" parameter used
by "repmgr standby switchover".
Implements GitHub #504.
Currently the (very generic sounding) "standby_reconnect_timeout" configuration
file parameter is used in several different contexts and it would be useful
to have more granular control over the different timeouts it's used to configure.
This patch introduces "node_rejoin_timeout", used in place of "standby_reconnect_timeout"
(which wasn't documented) when "repmgr node rejoin" is executed, to determine
how long to wait for the node to rejoin the replication cluster.
Additionally, "repmgrd_standby_startup_timeout" is introduced as a timeout for
failover situations, when repmgrd executes "repmgr standby follow" to follow
a new primary, and waits for the standby to restart and become available
for connections.
"standby_reconnect_timeout" is now only relevant for "repmgr standby switchover".
Implements GitHub #454.
After restarting the standby, poll pg_stat_replication on the upstream
until the standby connects, and exit with an error if it doesn't by the
timeout defined in "standby_follow_timeout".
Implements GitHub #444.
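The polling behaviour described above can be sketched roughly as follows.
This is a minimal illustration and not repmgr's actual code: the function
names, the callback type and the one-second poll interval are all
assumptions, and the stub callback stands in for a real query against
pg_stat_replication on the upstream node.

```c
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical callback type: returns true once the standby is attached. */
typedef bool (*attached_check_fn) (void *ctx);

/*
 * Stand-in for a real pg_stat_replication query (assumption): reports
 * "attached" once the counter pointed to by ctx has reached zero.
 */
bool
stub_attached_after_n_polls(void *ctx)
{
	int		   *remaining = ctx;

	if (*remaining <= 0)
		return true;

	(*remaining)--;
	return false;
}

/*
 * Call check() once per second until it returns true or timeout_secs
 * has elapsed, then make one final check at the deadline.
 * Returns true if the standby was seen, false if the timeout expired.
 */
bool
wait_for_standby_attach(attached_check_fn check, void *ctx, int timeout_secs)
{
	int			elapsed;

	for (elapsed = 0; elapsed < timeout_secs; elapsed++)
	{
		if (check(ctx))
			return true;

		sleep(1);
	}

	return check(ctx);
}
```

A caller would pass the value of "standby_follow_timeout" as timeout_secs
and exit with an error if the function returns false.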
This is used for determining a timeout when reconnecting to the standby
after executing the "follow_command". This will normally not need to be
set explicitly, but may be useful in cases where the standby's startup
phase can last longer than usual.
This previously happened in the extension SQL code, which could
potentially cause replay problems if installing on a BDR cluster.
As this table is only required for streaming replication failover,
move the initialisation to "repmgr primary register".
Addresses GitHub #344.
The wait() macros (WEXITSTATUS etc.) live in sys/wait.h as per
POSIX 1003.1, and on some platforms (notably FreeBSD) compilation will
fail if sys/wait.h isn't included explicitly.
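As an illustration of the include requirement, here is a small sketch of
the kind of code that depends on these macros. The function name is
hypothetical and does not come from the repmgr source; only system(),
WIFEXITED and WEXITSTATUS are standard.

```c
#include <stdlib.h>
#include <sys/wait.h>			/* WIFEXITED, WEXITSTATUS */

/*
 * Run a shell command and return its exit code, or -1 if the command
 * could not be run or did not exit normally (e.g. killed by a signal).
 */
int
run_command_exit_code(const char *command)
{
	int			status = system(command);

	if (status == -1)
		return -1;

	if (WIFEXITED(status))
		return WEXITSTATUS(status);

	return -1;
}
```

Without the explicit sys/wait.h include, this may still compile on
platforms where another header happens to pull it in, which is why the
problem can go unnoticed.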
If the current primary (demotion candidate) still has any files to archive,
it will delay the shutdown until all files are archived. If there is a
substantial number of files, and/or the archive command executes slowly,
this will probably lead to an unwelcome delay in the switchover process.
In previous versions of repmgr, some options had ambiguous meanings,
and/or were used for slightly different purposes. This way we end
up with a couple more options (most of which probably won't need
adjusting) but greater clarity and flexibility.
Removed:
  master_response_timeout:
    renamed to "async_query_timeout", as this was its main usage
  retry_promote_interval_secs:
    replaced by "primary_notification_timeout"
Added:
  async_query_timeout:
    timeout (in seconds) when executing asynchronous queries
  primary_notification_timeout:
    number of seconds to wait for notification from the new primary
    after a failover
  primary_follow_timeout:
    number of seconds to wait for the new primary to become available
    when executing "repmgr standby follow"