repmgr

mirror of https://github.com/EnterpriseDB/repmgr.git synced 2026-03-25 08:06:29 +00:00

Author	SHA1	Message	Date
Ian Barwick	84f4c6c979	doc: note that --siblings-follow will become default in a future release	2019-04-02 11:04:36 +09:00
Ian Barwick	67e977592c	standby switchover: list nodes which will remain attached to the old primary If --siblings-follow is not supplied, list all nodes which repmgr considers to be siblings (this will include the witness server, if in use), and which will remain attached to the old primary.	2019-04-02 10:46:59 +09:00
Ian Barwick	f2362a06fa	doc: update "standby switchover" reference	2019-02-12 16:39:13 +09:00
Ian Barwick	a4cd4ee553	doc: fix quoting in "standby switchover" index entries	2019-02-11 10:34:02 +09:00
Ian Barwick	cce8b76171	"standby switchover": abort if promotion candidate has WAL replay paused If replay is paused, we can't be really sure that more WAL will be received between the check and the promote operation, which would risk the promote operation not taking place during the switchover (it would happen as soon as WAL replay is resumed and pending WAL is replayed). Therefore we simply quit with an informative slew of messages and leave the user to sort it out. GitHub #540.	2019-02-05 16:32:39 +09:00
Ian Barwick	d8048060a2	doc: rephrase exit code preamble Previously it kind of implied more than one code can be emitted.	2019-02-05 11:06:26 +09:00
Ian Barwick	9273e7af73	"standby switchover": avoid potential race condition with WAL location check Immediately after the demotion candidate (primary) has shut down, we can't be absolutely sure that the walreceiver has flushed all WAL to disk, so checking pg_last_wal_receive_lsn() at that point might not reflect the actual last available WAL location. To handle this, we'll loop for a while (timeout controlled by configuration parameter "wal_receive_check_timeout") before finally deciding whether the standby is still behind the shut-down primary. Addresses issue raised in GitHub #518.	2019-02-01 12:06:22 +09:00
Ian Barwick	9349171b55	doc: document "node_rejoin_timeout" for switchover operations	2019-01-30 15:43:34 +09:00
Ian Barwick	af0a60b8eb	doc: remove redundant warning No longer relevant for 4.2 and later.	2018-11-12 09:38:11 +09:00
Ian Barwick	2491b8ae52	Add functionality to "pause" repmgrd In some circumstances, e.g. while performing a switchover, it is essential that repmgrd does not take any kind of failover action, as this will put the cluster into an incorrect state. Previously it was necessary to stop repmgrd on all nodes (or at least those nodes which repmgrd would consider as promotion candidates), however this is a cumbersome and potentially risk-prone operation, particularly if the replication cluster contains more than a couple of servers. To prevent this issue from occurring, this patch introduces the ability to "pause" repmgrd on all nodes with a single command ("repmgr daemon pause") which notifies repmgrd not to take any failover action until the node is "unpaused" ("repmgr daemon unpause"). "repmgr daemon status" provides an overview of each node and whether repmgrd is running, and if so whether it is paused. "repmgr standby switchover" has been modified to automatically pause repmgrd while carrying out the switchover. See documentation for further details.	2018-09-27 16:42:10 +09:00
Ian Barwick	9439467958	doc: add troubleshooting section to switchover documentation	2018-09-25 13:47:58 +09:00
Ian Barwick	38e3aae053	repmgr: add parameter "shutdown_check_timeout" Previously, "repmgr standby switchover" used the configuration file parameters "reconnect_interval" and "reconnect_attempts" to define a timeout to determine whether the current primary (demotion candidate) has shut down. However, these parameters are intended for primary failure detection and are generally lower in value, while a controlled shutdown may take longer, resulting in the switchover being aborted as repmgr was not waiting long enough. To prevent this happening, parameter "shutdown_check_timeout" has been added. This complements the existing "standby_reconnect_timeout" parameter used by "repmgr standby switchover". Implements GitHub #504.	2018-09-25 11:34:06 +09:00
Ian Barwick	b5f640d04d	doc: improve event notification documentation - add undocumented events (per report from Daymel Bonne) - split up list into sections for better overview - where feasible, add cross-links	2018-08-30 12:39:58 +09:00
Ian Barwick	7ecfb333b9	doc: add note about switchover and exclusive backups Also rename server_not_in_exclusive_backup_mode() to avoid double negatives. GitHub #476.	2018-07-19 16:02:31 +09:00
Ian Barwick	c6b8d78bad	doc: add extra emphasis about not running repmgrd during switchover One day this will no longer be an issue, until then let's hope the fine documentation is read.	2018-07-11 09:53:29 +09:00
Ian Barwick	b2081dca52	De-overload configuration file parameter "standby_reconnect_timeout" Currently the (very generic sounding) "standby_reconnect_timeout" configuration file parameter is used in several different contexts and it would be useful to have more granular control over the different timeouts it's used to configure. This patch introduces "node_rejoin_timeout", used in place of "standby_reconnect_timeout" (which wasn't documented) when "repmgr node rejoin" is executed, to determine how long to wait for the node to rejoin the replication cluster. Additionally "repmgrd_standby_startup_timeout" is introduced as a timeout for failover situations, when repmgrd executes "repmgr standby follow" to follow a new primary, and waits for the standby to restart and become available for connections. "standby_reconnect_timeout" is now only relevant for "repmgr standby switchover". Implements GitHub #454.	2018-06-28 18:00:55 +09:00
Ian Barwick	eca1943026	doc: emphasize that repmgrd should not be running during a switchover	2018-06-12 10:30:35 +09:00
Ian Barwick	3b0cde2846	repmgr: cluster check commands - non-zero exit code if node(s) unavailable Return ERR_CLUSTER_CHECK if one or more nodes were not reachable. Implements GitHub #447.	2018-06-12 10:30:11 +09:00
Ian Barwick	93f80c413e	doc: link to service command configuration from switchover section	2018-04-20 10:15:22 +09:00
Ian Barwick	3ccf1cf182	Enable pg_rewind to be used with PostgreSQL 9.3/9.4 pg_rewind is not part of the core distribution for those, but we provided support in repmgr 3.3 so should extend it to repmgr 4. Note that there is no check in place whether the pg_rewind binary exists, so it's up to the user to ensure it's present. Addresses GitHub #413.	2018-04-02 20:54:29 +09:00
Ian Barwick	1558497ae4	repmgr: poll demoted primary after restart during switchover During a switchover operation, once the demoted primary has been restarted as a standby, repmgr attempts to reconnect to verify its status and drop any redundant replication slots. However it's possible the standby may still be in the startup phase, so poll for "standby_reconnect_timeout" seconds before giving up. Addresses GitHub #408.	2018-03-27 16:44:10 +09:00
Ian Barwick	ae691688be	doc: fix descriptions of %p event notification script parameter	2018-02-05 15:52:48 +09:00
Ian Barwick	b4dbee517f	doc: note password SSH requirements for "standby switchover"	2018-02-02 17:18:31 +09:00
Ian Barwick	811d2a45bd	"standby switchover": improve log messages and add new exit code Previously, if an issue was encountered with the old primary, but user provided -F/--force to have repmgr promote the standby anyway, repmgr would exit with the log message "STANDBY SWITCHOVER is complete" and exit code 0 (SUCCESS). To better report this partial completion, repmgr will now emit the message "STANDBY SWITCHOVER has completed with issues" (and a HINT to check preceding log messages) and new exit code 22 (ERR_SWITCHOVER_INCOMPLETE).	2018-01-31 11:03:54 +09:00
Ian Barwick	5bd8cf958a	repmgr standby switchover: add "%p" event notification parameter This will contain the node ID of the former primary.	2018-01-10 12:25:12 +09:00
Ian Barwick	5a45997db5	doc: document command line options for "standby switchover"	2018-01-10 12:25:07 +09:00
Ian Barwick	625d032435	doc: link event notification page from relevant command reference pages	2018-01-04 14:56:15 +09:00
Ian Barwick	f78c169c3d	docs: improve event notification documentation	2017-11-29 14:43:28 +09:00
Ian Barwick	2341da7a06	docs: convert command reference sections to <refentry> format Note that most entries still need a bit more tidying up, consistent structuring, provision of more examples etc.	2017-10-31 11:27:13 +09:00
Ian Barwick	2745c92fc8	Documentation: update markup	2017-10-18 11:12:20 +09:00
Ian Barwick	d156de533d	Split up command reference	2017-10-05 11:48:21 +09:00

31 Commits