repmgr

mirror of https://github.com/EnterpriseDB/repmgr.git synced 2026-03-22 22:56:29 +00:00

Author	SHA1	Message	Date
Ian Barwick	73d2088a85	standby follow: don't restart server (PostgreSQL 13 and later) As of PostgreSQL 13, changes to the fundamental replication configuration can be applied with a simple SIGHUP, no restart required. In case the old behaviour is desired, i.e. a full restart to apply the configuration changes, the new configuration parameter "standby_follow_restart" can be set. This parameter has no effect in PostgreSQL 12 and earlier.	2020-09-29 17:53:51 +09:00
Ian Barwick	ce229beff8	repmgrd: add configuration option "always_promote" In certain corner cases, it's possible repmgrd may end up monitoring a standby which was a former primary, but the node record has not yet been updated. Previously repmgrd would abort the promotion with a cryptic message about being unable to find a node record for node_id -1 (the default value for an unknown node id). This commit adds a new configuration option "always_promote", which determines whether repmgrd should promote the node in this case. The default is "false", to effectively maintain the existing behaviour. Logging output has also been improved to make it clearer what has happened when this situation occurs.	2020-09-29 14:18:00 +09:00
Ian Barwick	70061c51aa	Further improve handling of possible pg_control read errors Builds on changes in commit `147f454`, and ensures appropriate action is taken if a value cannot be read from pg_control.	2020-09-28 13:59:34 +09:00
Ian Barwick	3945314e65	Remove PostgreSQL 9.3 support PostgreSQL 9.3 community support ended in November 2018.	2020-09-04 11:37:12 +09:00
Ian Barwick	029164a817	Make configuration default value usage more consistent	2020-05-14 11:57:04 +09:00
Ian Barwick	4a1855fabe	Place configuration settings struct in separate file	2020-05-14 11:56:45 +09:00
Ian Barwick	38b3447bd3	Add repmgr home page to --help output Per PostgreSQL commit 1933ae629e7b706c6c23673a381e778819db307d it seems to be all the rage these days.	2020-04-24 09:41:56 +09:00
Ian Barwick	cd7f36a6fd	Add general check function "check_replication_slots_available()" Make the code previously only used by "standby follow" generally available - we'll want to use this from "node rejoin" as well. While we're at it, when reporting failure due to lack of free replication slots, report the current value of "max_replication_slots".	2020-02-03 16:43:55 +09:00
Ian Barwick	4d4ed3bcd6	Remove BDR 2.x support The BDR 2.x support was conceptual only and was never used in production. As BDR 2.x will be EOL'd shortly, there is no risk it will be needed.	2020-01-16 09:52:42 +09:00
Ian Barwick	7fdf2f1778	Update copyright notices to 2020	2020-01-13 14:06:20 +09:00
Ian Barwick	1ed8b1067a	Prevent use of backend string functions From PostgreSQL 12, port.h forcibly redefines printf() et al to use the versions defined by PostgreSQL (pg_printf() et al). As this causes linking issues in build environments which build pre-Pg12 versions against Pg12's libpq, ensure relevant macros defined in port.h are undefined.	2019-09-26 12:47:51 +09:00
Ian Barwick	2a37e28304	write standby.signal	2019-08-19 10:40:31 +09:00
Ian Barwick	68be86349b	Add function to parse version string returned by "repmgr --version"	2019-08-08 13:47:19 +09:00
Ian Barwick	d893ce227b	repmgrd: optionally exclude/include witness server from child node checks	2019-06-03 16:04:54 +09:00
Ian Barwick	5a90513878	repmgrd: monitor standbys attached to primary This functionality enables repmgrd (when running on the primary) to monitor connected child nodes. It will log connections and disconnections and generate events. Additionally, repmgrd can execute a custom script if the number of connected child nodes falls below a configurable threshold. This script can be used e.g. to "fence" the primary following a failover situation where a new primary has been promoted and all standbys are now child nodes of that primary.	2019-04-22 16:18:52 +09:00
Ian Barwick	314a1e8f4f	use a constant to denote unknown replication lag	2019-03-20 17:26:04 +09:00
Ian Barwick	43f28f4097	Clarify calls to check_primary_status() Use a constant rather than a magic number to indicate non-provision of elapsed degraded monitoring time.	2019-03-18 14:21:34 +09:00
Ian Barwick	fc397f25f6	repmgrd: enable election rerun If "failover_validation_command" is set, and the command returns an error, rerun the election. There is a pause between reruns to avoid "churn"; the length of this pause is controlled by the configuration parameter "election_rerun_interval".	2019-03-12 17:12:19 +09:00
Ian Barwick	a3f90d2bba	Add configuration option "sibling_nodes_disconnect_timeout" This controls the maximum length of time in seconds that repmgrd will wait for other standbys to disconnect their WAL receivers in a failover situation. This setting is only used when "standby_disconnect_on_failover" is set to "true".	2019-03-06 15:56:21 +09:00
Ian Barwick	1615353f48	repmgrd: optionally disconnect WAL receivers during failover This is intended to ensure that all nodes have a constant LSN while making the failover decision. This feature is experimental and needs to be explicitly enabled with the configuration file option "standby_disconnect_on_failover". Note that enabling this option will result in a delay in the failover decision until the WAL receiver is disconnected on all nodes.	2019-03-06 15:53:57 +09:00
Ian Barwick	b1875a8d91	Split command execution functions into separate library These may need to be executed by repmgrd.	2019-02-27 14:41:17 +09:00
Ian Barwick	20b79f998c	Define some previously magic numbers	2019-02-01 19:14:16 +09:00
Ian Barwick	9273e7af73	"standby switchover": avoid potential race condition with WAL location check Immediately after the demotion candidate (primary) has shut down, we can't be absolutely sure that the walreceiver has flushed all WAL to disk, so checking pg_last_wal_receive_lsn() at that point might not reflect the actual last available WAL location. To handle this, we'll loop for a while (timeout controlled by configuration parameter "wal_receive_check_timeout") before finally deciding whether the standby is still behind the shut-down primary. Addresses issue raised in GitHub #518.	2019-02-01 12:06:22 +09:00
Ian Barwick	70e4243a1d	Clean up calls to repmgr_atoi() In some places we were still providing "false" from the original implementation, which was intended to indicate whether a negative value was allowed. This has not been a problem, as it merely means we have been providing "0", which is the same thing; however, we can fine-tune some of the calls (e.g. node ID must be 1 or greater).	2019-01-30 11:43:43 +09:00
Ian Barwick	7dce3ed234	Update copyright notices to 2019	2019-01-21 14:54:35 +09:00
Ian Barwick	2491b8ae52	Add functionality to "pause" repmgrd In some circumstances, e.g. while performing a switchover, it is essential that repmgrd does not take any kind of failover action, as this will put the cluster into an incorrect state. Previously it was necessary to stop repmgrd on all nodes (or at least those nodes which repmgrd would consider as promotion candidates), however this is a cumbersome and potentially risk-prone operation, particularly if the replication cluster contains more than a couple of servers. To prevent this issue from occurring, this patch introduces the ability to "pause" repmgrd on all nodes with a single command ("repmgr daemon pause") which notifies repmgrd not to take any failover action until the node is "unpaused" ("repmgr daemon unpause"). "repmgr daemon status" provides an overview of each node and whether repmgrd is running, and if so whether it is paused. "repmgr standby switchover" has been modified to automatically pause repmgrd while carrying out the switchover. See documentation for further details.	2018-09-27 16:42:10 +09:00
Ian Barwick	38e3aae053	repmgr: add parameter "shutdown_check_timeout" Previously, "repmgr standby switchover" used the configuration file parameters "reconnect_interval" and "reconnect_attempts" to define a timeout to determine whether the current primary (demotion candidate) has shut down. However, these parameters are intended for primary failure detection and are generally lower in value, while a controlled shutdown may take longer, resulting in the switchover being aborted as repmgr was not waiting long enough. To prevent this happening, parameter "shutdown_check_timeout" has been added. This complements the existing "standby_reconnect_timeout" parameter used by "repmgr standby switchover". Implements GitHub #504.	2018-09-25 11:34:06 +09:00
Ian Barwick	b2081dca52	De-overload configuration file parameter "standby_reconnect_timeout" Currently the (very generic sounding) "standby_reconnect_timeout" configuration file parameter is used in several different contexts and it would be useful to have more granular control over the different timeouts it's used to configure. This patch introduces "node_rejoin_timeout", used in place of "standby_reconnect_timeout" (which wasn't documented) when "repmgr node rejoin" is executed, to determine how long to wait for the node to rejoin the replication cluster. Additionally "repmgrd_standby_startup_timeout" is introduced as a timeout for failover situations, when repmgrd executes "repmgr standby follow" to follow a new primary, and waits for the standby to restart and become available for connections. "standby_reconnect_timeout" is now only relevant for "repmgr standby switchover". Implements GitHub #454.	2018-06-28 18:00:55 +09:00
Ian Barwick	bf0d67c60a	Add repmgr.nodes to the BDR replication set	2018-06-15 14:29:08 +09:00
Ian Barwick	4f642f8332	Detect and store BDR major version number when executing "is_bdr_db()" BDR3 metadata structure is very different to BDR1/2, so we'll need to generate queries according to version.	2018-06-15 14:25:55 +09:00
Ian Barwick	efc388065e	standby follow: check node has connected to new primary After restarting the standby, poll pg_stat_replication on the upstream until the standby connects, and exit with an error if it doesn't by the timeout defined in "standby_follow_timeout". Implements GitHub #444.	2018-06-07 15:04:45 +09:00
Ian Barwick	55441f2729	repmgrd: add configuration file parameter "standby_reconnect_timeout" This is used for determining a timeout when reconnecting to the standby after executing the "follow_command". This will normally not need to be set explicitly, but may be useful in cases where the standby's startup phase can last longer than usual.	2018-03-02 11:04:56 +09:00
Ian Barwick	b705127a34	"repmgr standby register": add --wait-start option Implements GitHub #356.	2018-01-04 14:56:08 +09:00
Ian Barwick	26a9e848fd	Update copyright notices to 2018	2018-01-02 10:19:46 +09:00
Ian Barwick	472d703d2e	repmgr: initialise "voting_term" in "repmgr primary register" This previously happened in the extension SQL code, which could potentially cause replay problems if installing on a BDR cluster. As this table is only required for streaming replication failover, move the initialisation to "repmgr primary register". Addresses GitHub #344.	2017-11-28 11:08:12 +09:00
Ian Barwick	a6cc4d80f0	Add "witness register" functionality	2017-11-15 13:47:45 +09:00
Christoph Moench-Tegeder	a89084c6b5	include sys/wait.h for wait() and friends The wait() macros (WEXITSTATUS etc.) live in sys/wait.h as per POSIX (IEEE Std 1003.1), and on some platforms (notably FreeBSD) compilation will fail if sys/wait.h isn't included explicitly.	2017-09-19 17:31:49 +02:00
Ian Barwick	687c8b4e27	Initial changes for 9.3 support	2017-09-15 10:27:37 +09:00
Ian Barwick	a9f4a027a7	pgindent run	2017-09-11 11:14:13 +09:00
Ian Barwick	e4f7dc8234	Add copyright notices	2017-09-08 13:27:39 +09:00
Ian Barwick	ed16c32fe7	Check minimum server version for pg_rewind	2017-08-31 13:30:59 +09:00
Ian Barwick	b1ba476241	Rename "archiver" check etc. to "archive-ready" Gives a better indication of what's being checked.	2017-08-17 12:23:56 +09:00
Ian Barwick	eabd56f3be	"standby follow": check node system identifiers match	2017-08-14 11:45:08 +09:00
Ian Barwick	b95b3e50e3	Return system identification information with appropriate data types	2017-08-14 08:50:54 +09:00
Ian Barwick	50b82f785e	Add function to execute "IDENTIFY_SYSTEM"	2017-08-11 22:01:02 +09:00
Ian Barwick	8a50a72dc5	Additional "node status" output	2017-08-10 17:18:08 +09:00
Ian Barwick	4f2161bd83	Cleanup various #defines	2017-08-10 15:11:53 +09:00
Ian Barwick	970ed5d959	Bump minimum supported version to 9.5 We assume availability of pg_rewind.	2017-08-10 15:07:32 +09:00
Ian Barwick	f2cf46bba3	Check replication lag before attempting switchover	2017-08-08 10:16:47 +09:00
Ian Barwick	2499b42ef8	switchover: check for pending archive files on the demotion candidate If the current primary (demotion candidate) still has any files to archive, it will delay the shutdown until all files are archived. If there is a substantial number of files, and/or the archive command executes slowly, this will probably lead to an unwelcome delay in the switchover process.	2017-08-08 00:37:20 +09:00


74 Commits
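Commit a89084c6b5 concerns the POSIX wait-status macros. The following minimal sketch shows the kind of code that depends on them: it runs a shell command with system() and decodes the result with WIFEXITED() and WEXITSTATUS(), which is why <sys/wait.h> must be included explicitly on platforms such as FreeBSD. The function name run_command_exit_code() is hypothetical and not taken from the repmgr source.

```c
#include <stdlib.h>
#include <sys/wait.h>	/* WIFEXITED(), WEXITSTATUS() per POSIX */

/*
 * Hypothetical helper: run a command through the shell and return its
 * exit code, or -1 if the command could not be started or did not exit
 * normally (e.g. it was terminated by a signal).
 */
int
run_command_exit_code(const char *command)
{
	int			status = system(command);

	/* system() itself failed (e.g. fork() error) */
	if (status == -1)
		return -1;

	/* Only a normal exit carries a meaningful exit code */
	if (WIFEXITED(status))
		return WEXITSTATUS(status);

	return -1;
}
```

For example, run_command_exit_code("exit 3") would be expected to return 3 on a system with a POSIX shell.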
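Commit 68be86349b adds a function to parse the version string returned by "repmgr --version". The sketch below assumes that output has the form "repmgr 5.1.0" (major.minor.patch, with the patch level possibly absent or followed by a suffix such as "dev") and encodes it as a single comparable integer. The function name parse_repmgr_version() and the major * 10000 + minor * 100 + patch encoding are illustrative assumptions, not repmgr's actual implementation.

```c
#include <stdio.h>

/*
 * Hypothetical helper: convert a string such as "repmgr 5.1.0" into
 * major * 10000 + minor * 100 + patch (here 50100). A missing patch
 * level, as in "repmgr 4.4", is treated as 0. Returns -1 on failure.
 */
int
parse_repmgr_version(const char *version_string)
{
	int			major = 0;
	int			minor = 0;
	int			patch = 0;
	int			fields;

	if (version_string == NULL)
		return -1;

	fields = sscanf(version_string, "repmgr %d.%d.%d", &major, &minor, &patch);

	/* At least "major.minor" must be present */
	if (fields < 2)
		return -1;

	/* Reject components which would not fit the two-digit encoding */
	if (major < 0 || minor < 0 || minor > 99 || patch < 0 || patch > 99)
		return -1;

	return major * 10000 + minor * 100 + patch;
}
```

Encoding the version as an integer means callers can use a plain comparison (e.g. version >= 50100) rather than comparing strings.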
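Commit 70e4243a1d replaces a boolean "allow negative" flag in repmgr_atoi() with a minimum value, so that e.g. a node ID can be required to be 1 or greater. The sketch below illustrates that kind of check with strtol(). It is not repmgr_atoi() itself: the name parse_int_with_min() and its signature are hypothetical, and the real function also records errors for reporting.

```c
#include <errno.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse a decimal integer and reject it if it is
 * malformed, out of int range, or below minval. On success the value
 * is stored in *result and true is returned.
 */
bool
parse_int_with_min(const char *value, int minval, int *result)
{
	char	   *endptr = NULL;
	long		parsed;

	if (value == NULL || *value == '\0')
		return false;

	errno = 0;
	parsed = strtol(value, &endptr, 10);

	/* Reject overflow and trailing non-numeric characters */
	if (errno != 0 || *endptr != '\0')
		return false;

	/* long may be wider than int */
	if (parsed < INT_MIN || parsed > INT_MAX)
		return false;

	/* Enforce the caller-supplied lower bound */
	if (parsed < minval)
		return false;

	*result = (int) parsed;
	return true;
}
```

With minval set to 1, "0" and "-5" would be rejected while "3" would be accepted, matching the node ID rule described in the commit message.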