repmgr

mirror of https://github.com/EnterpriseDB/repmgr.git synced 2026-03-23 15:16:29 +00:00

Author	SHA1	Message	Date
Ian Barwick	b3b9281253	Parse pg_basebackup option --waldir/--xlogdir	2020-10-07 15:13:50 +09:00
Ian Barwick	e10d9fd393	EXPERIMENTAL: synchronise try_primary_reconnect()'s reconnection loop Per proposal in GitHub #662, this patch attempts to synchronise each repmgrd's primary reconnection attempts to prevent potential race conditions. This relies on each node's clock being correctly synchronised. Currently this change is experimental and is not enabled by default. It can be enabled by setting the repmgr.conf parameter "reconnect_loop_sync".	2020-10-06 13:35:49 +09:00
Ian Barwick	5b254a1be9	repmgrd: add parameter "failover_delay" This parameter is not documented and intended for use during testing. It should not be used in production.	2020-10-05 16:43:06 +09:00
Ian Barwick	73d2088a85	standby follow: don't restart server (PostgreSQL 13 and later) As of PostgreSQL 13, changes to the fundamental replication configuration can be applied with a simple SIGHUP, no restart required. In case the old behaviour is desired, i.e. a full restart to apply the configuration changes, the new configuration parameter "standby_follow_restart" can be set. This parameter has no effect in PostgreSQL 12 and earlier.	2020-09-29 17:53:51 +09:00
Ian Barwick	ce229beff8	repmgrd: add configuration option "always_promote" In certain corner cases, it's possible repmgrd may end up monitoring a standby which was a former primary, but the node record has not yet been updated. Previously repmgrd would abort the promotion with a cryptic message about being unable to find a node record for node_id -1 (the default value for an unknown node id). This commit adds a new configuration option "always_promote", which determines whether repmgrd should promote the node in this case. The default is "false", to effectively maintain the existing behaviour. Logging output has also been improved to make it clearer what has happened when this situation occurs.	2020-09-29 14:18:00 +09:00
Ian Barwick	1f3e098104	Add option "--dump-config" This is initially intended for verifying the configuration parsing mechanism and is currently undocumented.	2020-09-18 15:12:22 +09:00
Ian Barwick	e65738c989	Explicitly unset search path when connecting to database	2020-05-22 16:11:55 +09:00
Ian Barwick	d59cadd5f6	Remove old configuration handling code This expunges two large and cumbersome sets of if/else statements and the T_CONFIGURATION_OPTIONS_INITIALIZER macro, all of which needed to be kept in sync when adding/modifying configuration file parameters.	2020-05-14 11:57:16 +09:00
Ian Barwick	04aee7b406	Set defaults before loading configuration file	2020-05-14 11:57:07 +09:00
Ian Barwick	3dde8f1386	"retire" old configuration handling code	2020-05-14 11:57:00 +09:00
Ian Barwick	4a1855fabe	Place configuration settings struct in separate file	2020-05-14 11:56:45 +09:00
Ian Barwick	d79d4c50b2	handle tablespace mapping	2020-05-14 11:56:42 +09:00
Ian Barwick	2071fa8c7e	Initial implementation of an iterable configuration item list This implements storing the configuration file parameter definitions in an iterable list. This will replace the existing way of populating the configuration struct, which is a long and cumbersome if/else structure, and will make it possible to later dump the imported configuration.	2020-05-14 11:56:38 +09:00
Ian Barwick	fdc6f61257	Pass base configuration file directory to configuration parser If provided, the parser will use this to process include directives with unqualified filenames.	2020-05-14 11:56:23 +09:00
Ian Barwick	f5018e42f3	Initial refactoring of configuration file parsing Have the configuration file parsing routine itself open the respective configuration file, rather than passing a file pointer from the original caller. This is required for handling include directives, which we'll want to do for sanity-checking the PostgreSQL configuration on a freshly cloned, unstarted standby.	2020-05-14 11:56:19 +09:00
Ian Barwick	bcc284cac9	Refactor configuration file reload handling Rather than parse the configuration file into a new structure and copy changed values from that into the main structure, we'll copy the existing structure before parsing the changed configuration file directly into the main structure, and revert using the copy if any issues are encountered. This is necessary as preparation for further reworking of the configuration file structure handling. It also makes the reload idempotent. While we're at it, make some general improvements to the reload handling, particularly: - improve logging to show "before" and "after" values - collate change notifications and only display if no errors were found - remove unnecessary double-logging of errors - various bugfixes	2020-05-05 15:29:07 +09:00
Ian Barwick	d37513312a	Move the main configfile structure into configfile.c This is required for a later refactoring of the configuration file handling.	2020-05-05 14:43:55 +09:00
Ian Barwick	5ee4540640	Fix typo in comment	2020-05-01 12:13:06 +09:00
Ian Barwick	4d4ed3bcd6	Remove BDR 2.x support The BDR 2.x support was conceptual only and was never used in production. As BDR 2.x will be EOL'd shortly, there is no risk it will be needed.	2020-01-16 09:52:42 +09:00
Ian Barwick	7fdf2f1778	Update copyright notices to 2020	2020-01-13 14:06:20 +09:00
Ian Barwick	9eb6ce52b4	Write replication configuration for Pg12 and later	2019-08-19 10:40:27 +09:00
Ian Barwick	f5044465cb	Add function to safely modify postgresql.auto.conf This is required for PostgreSQL 12 and later.	2019-08-14 16:57:42 +09:00
Ian Barwick	94ba635811	Define our own PG_AUTOCONF_FILENAME	2019-08-13 16:48:44 +09:00
Ian Barwick	8d55cab25e	Convert configuration file parsing to use flex Previously, repmgr was using a very simple ad-hoc string-based parser, which had various limitations and allowed configuration files to be created in a way which could cause confusion and/or unexpected behaviour. For example, it accepted strings enclosed in single quotes, but treated strings enclosed in double quotes literally. A node_name defined thusly: node_name="somenode" would result in the literal value '"somenode"' being used, which could lead to unobvious errors along the lines of: no record found for ""somenode"" The configuration file parser has been adapted from the one used by PostgreSQL itself, so behaves more-or-less identically (though some functions such as file inclusion are not supported in repmgr). This makes configuration parsing more robust and consistent; additionally, error reporting will be more precise. Note this does mean that some repmgr.conf items previously accepted as valid by repmgr will now be rejected; in particular this includes strings containing spaces which are not enclosed in single quotes.	2019-08-01 10:17:20 +09:00
Ian Barwick	5bf9605286	Revert "Convert configuration file parsing to use flex" This reverts commit `c6ca183247`. Backing out this patch for now as the Debian build system doesn't seem to like it, even though it builds just fine on Debian itself.	2019-07-18 10:19:18 +09:00
Ian Barwick	c6ca183247	Convert configuration file parsing to use flex Previously, repmgr was using a very simple ad-hoc string-based parser, which had various limitations and allowed configuration files to be created in a way which could cause confusion and/or unexpected behaviour. For example, it accepted strings enclosed in single quotes, but treated strings enclosed in double quotes literally. A node_name defined thusly: node_name="somenode" would result in the literal value '"somenode"' being used, which could lead to unobvious errors along the lines of: no record found for ""somenode"" The configuration file parser has been adapted from the one used by PostgreSQL itself, so behaves more-or-less identically (though some functions such as file inclusion are not supported in repmgr). This makes configuration parsing more robust and consistent; additionally, error reporting will be more precise. Note this does mean that some repmgr.conf items previously accepted as valid by repmgr will now be rejected; in particular this includes strings containing spaces which are not enclosed in single quotes.	2019-07-03 12:18:01 +09:00
Ian Barwick	d893ce227b	repmgrd: optionally exclude/include witness server from child node checks	2019-06-03 16:04:54 +09:00
Ian Barwick	45e17223b9	Update variable/field names relating to pg_basebackup's -X option Now the "xlog nomenclature" Pg versions are fading into the past, rename things related to handling pg_basebackup's -X option (was: --xlog-method, now: --wal-method) to start with "wal_" rather than "xlog_". This is a cosmetic change for code clarity.	2019-05-30 09:32:06 +09:00
Ian Barwick	5a90513878	repmgrd: monitor standbys attached to primary This functionality enables repmgrd (when running on the primary) to monitor connected child nodes. It will log connections and disconnections and generate events. Additionally, repmgrd can execute a custom script if the number of connected child nodes falls below a configurable threshold. This script can be used e.g. to "fence" the primary following a failover situation where a new primary has been promoted and all standbys are now child nodes of that primary.	2019-04-22 16:18:52 +09:00
Ian Barwick	80f66e87c9	Improve string handling during configuration file reload	2019-04-16 11:20:41 +09:00
Ian Barwick	ba1f05ece9	Restrict "node_name" to maximum 63 characters In "recovery.conf", the configuration parameter "node_name" is used as the "application_name" value, which will be truncated by PostgreSQL to 63 characters (NAMEDATALEN - 1). repmgr sometimes needs to be able to extract the application name from pg_stat_replication to determine if a node is connected (e.g. when executing "repmgr standby register"), so the comparison will fail if "node_name" exceeds 63 characters.	2019-03-28 10:37:57 +09:00
Ian Barwick	7d0caefaee	Fix logging related to "connection_check_type" Also log the selected type at repmgrd startup.	2019-03-20 11:58:18 +09:00
Ian Barwick	c2206b007a	repmgrd: optionally check upstream availability through connection attempts	2019-03-14 15:44:53 +09:00
Ian Barwick	fc397f25f6	repmgrd: enable election rerun If "failover_validation_command" is set, and the command returns an error, rerun the election. There is a pause between reruns to avoid "churn"; the length of this pause is controlled by the configuration parameter "election_rerun_interval".	2019-03-12 17:12:19 +09:00
Ian Barwick	db0d71c6a7	Initial implementation of "failover_validation_command"	2019-03-08 08:49:15 +09:00
Ian Barwick	33fefd9f52	Add configuration option "primary_visibility_consensus" This determines whether repmgrd should continue with a failover if one or more nodes report they can still see the primary.	2019-03-07 10:41:42 +09:00
Ian Barwick	a3f90d2bba	Add configuration option "sibling_nodes_disconnect_timeout" This controls the maximum length of time in seconds that repmgrd will wait for other standbys to disconnect their WAL receivers in a failover situation. This setting is only used when "standby_disconnect_on_failover" is set to "true".	2019-03-06 15:56:21 +09:00
Ian Barwick	1615353f48	repmgrd: optionally disconnect WAL receivers during failover This is intended to ensure that all nodes have a constant LSN while making the failover decision. This feature is experimental and needs to be explicitly enabled with the configuration file option "standby_disconnect_on_failover". Note enabling this option will result in a delay in the failover decision until the WAL receiver is disconnected on all nodes.	2019-03-06 15:53:57 +09:00
Ian Barwick	63f7ad546e	repmgrd: add option "connection_check_type" This enables selection of the method repmgrd uses to check whether the upstream node is available. Possible values are: - "ping" (default): uses PQping() to check server availability - "connection": executes a query on the connection to check server availability (similar to repmgr 3.x).	2019-03-06 12:09:54 +09:00
Ian Barwick	9273e7af73	"standby switchover": avoid potential race condition with WAL location check Immediately after the demotion candidate (primary) has shut down, we can't be absolutely sure that the walreceiver has flushed all WAL to disk, so checking pg_last_wal_receive_lsn() at that point might not reflect the actual last available WAL location. To handle this, we'll loop for a while (timeout controlled by configuration parameter "wal_receive_check_timeout") before finally deciding whether the standby is still behind the shut-down primary. Addresses issue raised in GitHub #518.	2019-02-01 12:06:22 +09:00
Ian Barwick	32b81e7d49	"daemon start": initial implementation	2019-01-29 13:01:14 +09:00
Ian Barwick	7dce3ed234	Update copyright notices to 2019	2019-01-21 14:54:35 +09:00
Ian Barwick	11d25e2aef	Add configuration parameter "repmgr_bindir" This is to facilitate remote invocation of repmgr when the repmgr binary is located somewhere other than the PostgreSQL binary directory, as it cannot be assumed all package maintainers will install repmgr there. This parameter is optional; if not set (the default), repmgr will fall back to "pg_bindir" (if set). Addresses GitHub #246.	2018-10-02 09:59:12 +09:00
Ian Barwick	38e3aae053	repmgr: add parameter "shutdown_check_timeout" Previously, "repmgr standby switchover" used the configuration file parameters "reconnect_interval" and "reconnect_attempts" to define a timeout to determine whether the current primary (demotion candidate) has shut down. However, these parameters are intended for primary failure detection and are generally lower in value, while a controlled shutdown may take longer, resulting in the switchover being aborted as repmgr was not waiting long enough. To prevent this happening, parameter "shutdown_check_timeout" has been added. This complements the existing "standby_reconnect_timeout" parameter used by "repmgr standby switchover". Implements GitHub #504.	2018-09-25 11:34:06 +09:00
Ian Barwick	44a224ad92	repmgrd: fix configuration file reloading Don't allow "promote_command" or "follow_command" to be empty. GitHub #486.	2018-08-02 16:35:26 +09:00
Ian Barwick	a194cf56b3	repmgr: exit with an error if an unrecognised command line option is provided. This matches the behaviour of other PostgreSQL utilities such as psql, though repmgr will only abort once all command line options are parsed, so as many errors as possible are found and displayed. If a repmgr "command" (e.g. "repmgr primary ...") was provided, a hint about the relevant command help section (e.g. "repmgr primary --help") will be provided alongside the generic help command (i.e. "repmgr --help"). Addresses GitHub #464, with further improvements.	2018-07-04 11:02:50 +09:00
Ian Barwick	802755fd60	repmgrd: daemonize process by default It's hard to imagine a use case where this isn't desirable, but in case, for whatever reason, the user does not wish to daemonize the process, the command line option "--daemonize=false" can be provided. Implements GitHub #458.	2018-06-29 22:01:49 +09:00
Ian Barwick	8d636690bd	repmgrd: create pid file by default Traditionally repmgrd will only write a pidfile if explicitly requested with -p/--pid-file. However it's normally desirable to have a pidfile, and it's preferable to have one used by default to prevent accidentally starting a second repmgrd instance. Following changes made: - add configuration file parameter "repmgrd_pid_file" (initially overridden by -p/--pid-file for backwards compatibility, though eventually we'll want to drop -p/--pid-file altogether) - add command line option --no-pid-file - if neither "repmgrd_pid_file" nor -p/--pid-file is set, create the pid file in a temporary directory Implements GitHub #457.	2018-06-29 14:36:24 +09:00
Ian Barwick	b2081dca52	De-overload configuration file parameter "standby_reconnect_timeout" Currently the (very generic sounding) "standby_reconnect_timeout" configuration file parameter is used in several different contexts and it would be useful to have more granular control over the different timeouts it's used to configure. This patch introduces "node_rejoin_timeout", used in place of "standby_reconnect_timeout" (which wasn't documented) when "repmgr node rejoin" is executed, to determine how long to wait for the node to rejoin the replication cluster. Additionally "repmgrd_standby_startup_timeout" is introduced as a timeout for failover situations, when repmgrd executes "repmgr standby follow" to follow a new primary, and waits for the standby to restart and become available for connections. "standby_reconnect_timeout" is now only relevant for "repmgr standby switchover". Implements GitHub #454.	2018-06-28 18:00:55 +09:00
Ian Barwick	efc388065e	standby follow: check node has connected to new primary After restarting the standby, poll pg_stat_replication on the upstream until the standby connects, and exit with an error if it doesn't by the timeout defined in "standby_follow_timeout". Implements GitHub #444.	2018-06-07 15:04:45 +09:00
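
The reload strategy described in commit bcc284cac9 (copy the existing settings, parse the changed file directly into the main structure, and revert to the copy if any issue is found) can be illustrated with a minimal sketch. repmgr itself is written in C; this Python version is not the project's code, and the names `parse_into` and `reload_config`, as well as the simplified `key=value` format, are assumptions made for illustration. The rule that "promote_command" and "follow_command" must not be empty is taken from commit 44a224ad92.

```python
import copy


def parse_into(settings, text):
    """Parse simplified key=value lines directly into `settings`.

    Returns a list of error strings (empty if parsing succeeded).
    Hypothetical helper: the real parser is flex-based C code.
    """
    errors = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and whole-line comments
        key, sep, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'")  # accept single-quoted values
        if not sep or not key:
            errors.append(f"line {lineno}: expected key=value")
        elif key in ("promote_command", "follow_command") and not value:
            errors.append(f'line {lineno}: "{key}" must not be empty')
        else:
            settings[key] = value
    return errors


def reload_config(settings, text):
    """Reload in place; on any error, restore the previous settings.

    Returns (True, changes) or (False, errors), where changes is a list
    of (key, before, after) tuples collated for logging.
    """
    snapshot = copy.deepcopy(settings)  # copy the existing structure first
    errors = parse_into(settings, text)
    if errors:
        # Revert using the copy, so a failed reload changes nothing.
        settings.clear()
        settings.update(snapshot)
        return False, errors
    changes = [
        (key, snapshot.get(key), value)
        for key, value in settings.items()
        if snapshot.get(key) != value
    ]
    return True, changes
```

Because a failed reload restores the copy and a repeated reload of the same file reports no changes, the operation is idempotent in the sense the commit message describes; collecting the "before" and "after" values only on success mirrors the change-notification behaviour listed there.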

82 Commits