repmgr

mirror of https://github.com/EnterpriseDB/repmgr.git synced 2026-03-25 08:06:29 +00:00

Author	SHA1	Message	Date
Ian Barwick	81077d4bc2	standby switchover: fix behaviour if witness node is a sibling. The witness node is not a streaming replication standby, so executing "repmgr standby follow" will fail. Instead, execute "repmgr witness register --force" to update the witness node record on the primary and its local copy of all node records. Addresses GitHub #453.	2018-06-21 17:16:18 +09:00
Ian Barwick	68a9745e7e	standby follow: check node has connected to new primary. After restarting the standby, poll pg_stat_replication on the upstream until the standby connects, and exit with an error if it doesn't connect within the timeout defined in "standby_follow_timeout". Implements GitHub #444.	2018-06-07 14:41:05 +09:00
Ian Barwick	638a119c85	standby follow: add hint about using "node rejoin". If "repmgr standby follow" is executed on a node which isn't running, point out that "repmgr node rejoin" should probably be used instead.	2018-06-07 11:02:32 +09:00
Ian Barwick	4aef4ea11e	standby clone: improve external configuration file copying. If --copy-external-config-files was provided, check that we can copy the files before cloning the standby, and abort if an error is encountered. This will give the user the opportunity to fix any issues before running the entire (and potentially lengthy) clone. Previously, errors were logged but no action was taken, and the final message indicated the clone operation was successful. Addresses GitHub #443.	2018-06-01 13:00:07 +09:00
Ian Barwick	edceb32ccb	standby clone: --recovery-conf-only expects the standby to be registered. Note this in the documentation, and add a HINT about registering it if the standby record is not available. Related to GitHub #438.	2018-05-29 11:54:38 +09:00
Ian Barwick	3dba8336e9	standby clone: don't assume existence of "user" in upstream conninfo. Usually a separate user (typically "repmgr") is set up specifically to manage the repmgr metadata; however, there's no compelling requirement to do this, and the database owner (usually "postgres") may be used instead, in which case the username may be left out of the conninfo string. Addresses GitHub #437.	2018-05-24 15:51:41 +09:00
Ian Barwick	55bb93bd3f	"standby clone": log actual connection string used to connect to upstream. Useful for diagnostic purposes.	2018-05-10 11:58:48 +09:00
Ian Barwick	3364f8bdf0	Add configuration file parameter "config_directory". This enables explicit provision of an external configuration file directory which, if set, will be passed to "pg_ctl" as the -D parameter. Otherwise "pg_ctl" will default to using the data directory, which will cause some operations to fail if the configuration files are not present there. Note this is implemented primarily for feature completeness and for development/testing purposes. Users who have installed "repmgr" from a package should not rely on "pg_ctl" to stop/start/restart PostgreSQL; instead, they should set the appropriate "service_..._command" for their operating system. For more details see: https://repmgr.org/docs/4.0/configuration-service-commands.html Note: in a future release, the presence of "config_directory" in repmgr.conf will be used to implicitly set "--copy-external-config-files=samepath" when cloning a standby; this is a behaviour change, so it will be implemented in the next major release (repmgr 4.1). Implements GitHub #424.	2018-04-25 11:57:27 +09:00
Ian Barwick	478bbcccbf	Add "dbname=replication" to all replication connection strings. Previously repmgr was attempting to make replication connections with "dbname" set to the repmgr database name. While this works if e.g. the repmgr user also has replication permissions, it will fail if a dedicated replication user is specified who only has permission to access the virtual "replication" database. Change this to use "dbname=replication" if the replication connection user is different from the normal repmgr database user. (We could just always set it to "replication", but that might break existing installations, e.g. where a .pgpass file is in use and there's no "replication" entry for the normal repmgr database user.) Addresses GitHub #421.	2018-04-12 16:10:02 +09:00
Ian Barwick	94a7f0c719	Don't issue a CHECKPOINT after promoting a standby. Issuing a CHECKPOINT immediately after promoting a standby may impact performance. Commit `239a548e9d` ensures one is only issued when required, i.e. during a switchover when pg_rewind will be executed. This reverts commit `a2068768ab`.	2018-04-09 14:39:47 +09:00
Ian Barwick	6ac42f1593	"standby register": add sanity check when --upstream-node-id not supplied. If --upstream-node-id was not supplied to "repmgr standby register", repmgr defaults to the primary node as upstream node. If the local node is available, we now double-check that it's attached to the primary, in case the lack of --upstream-node-id was an accidental omission. This check is only made when the local node is available. This behaviour can be overridden with -F/--force (though it's hard to imagine a scenario where that would be useful). Addresses GitHub #395.	2018-04-05 17:40:05 +09:00
Ian Barwick	0d73d3c2b5	Enable provision of "archive_cleanup_command" in recovery.conf. If "archive_cleanup_command" is defined in "repmgr.conf", a corresponding entry will be made in the node's "recovery.conf" file after cloning a standby. Note that we recommend using PgBarman to manage WAL archives, but are providing this facility to help integrate repmgr into existing environments. Implements GitHub #416.	2018-04-03 14:11:24 +09:00
Ian Barwick	7f1f04636d	Refactor pg_control parsing. The "data_checksum_version" field is towards the end of the ControlFileData struct, meaning its position varies between versions. Previously this wasn't a problem as it was only required for operations involving 9.5 and later, and its position within the control file has not changed between the current release and current HEAD. However, in order to support pg_rewind in 9.3 and 9.4, which both have changes in the control file format, we'll need version-specific parsing. This will also make it easier to deal with any future changes to the control file format.	2018-04-02 20:55:10 +09:00
Ian Barwick	6a1797cadd	Enable pg_rewind to be used with PostgreSQL 9.3/9.4. pg_rewind is not part of the core distribution for those versions, but we provided support in repmgr 3.3, so we should extend it to repmgr 4. Note that there is no check in place for whether the pg_rewind binary exists, so it's up to the user to ensure it's present. Addresses GitHub #413.	2018-04-02 20:55:04 +09:00
Ian Barwick	505d72d19c	"standby switchover": force checkpoint if pg_rewind requested. Addresses issue described in GitHub #378. PostgreSQL itself doesn't issue a checkpoint after promotion, to ensure the newly promoted server is available as quickly as possible, so we'll only execute an explicit CHECKPOINT when it's actually required, i.e. when pg_rewind will be executed. This is required as pg_rewind uses the timeline reported in the pg_control file to compare with the server to be rewound, and the pg_control timeline is only updated after the first checkpoint, so there is an interval where pg_rewind will erroneously assume both servers are on the same timeline and take no action.	2018-03-30 09:12:25 +09:00
Ian Barwick	b292ac61f8	"standby switchover": update hint	2018-03-30 09:12:21 +09:00
Ian Barwick	3e1f0ec168	repmgr: move demoted primary check to the final step during switchover. This will give the demoted primary more time to start up as a standby, during which "standby follow" can be executed on sibling nodes, if specified.	2018-03-27 16:41:13 +09:00
Ian Barwick	6f9a1f975e	repmgr: poll demoted primary after restart during switchover. During a switchover operation, once the demoted primary has been restarted as a standby, repmgr attempts to reconnect to verify its status and drop any redundant replication slots. However, the standby may still be in the startup phase, so poll for "standby_reconnect_timeout" seconds before giving up. Addresses GitHub #408.	2018-03-27 15:58:18 +09:00
Ian Barwick	562b6ddfc2	Add error code ERR_FOLLOW_FAIL	2018-03-23 10:34:19 +08:00
Ian Barwick	b2eb9b8525	Correctly handle error message pointer when parsing strings. When parsing conninfo strings, ensure the error message pointer is actually returned to the caller. Not a critical issue; it just meant the contents of the error message were not being displayed.	2018-03-10 14:28:10 +09:00
Ian Barwick	93830cad61	Fix directory creation when cloning from Barman	2018-03-05 19:31:53 +09:00
Ian Barwick	c29d1efc37	"standby clone": improve replication user selection. Use the upstream node's replication user when checking the replication connection.	2018-03-02 16:21:32 +09:00
Ian Barwick	6fbbe2a97a	"standby clone": fix --superuser handling. get_superuser_connection() was erroneously using the local node record to connect as a superuser, which works when registering the primary but obviously not when cloning a standby. Addresses GitHub #380.	2018-03-02 14:49:17 +09:00
Ian Barwick	98384559a6	"standby clone": remove restriction on replication slots in Barman mode. While it's preferable to avoid standby replication slots if Barman is in use, there's no technical reason to prevent this. Implements GitHub #379.	2018-03-01 15:47:28 +09:00
Ian Barwick	4a1477343b	repmgr: escape "restore_command" in generated recovery.conf	2018-03-01 10:39:04 +09:00
Ian Barwick	d2b9d20393	"standby clone": fix primary_conninfo when --upstream-conninfo provided	2018-03-01 09:18:40 +09:00
Ian Barwick	9365bf3474	"standby promote": make timeout values configurable. This introduces the following new configuration file parameters, which were previously hard-coded values: promote_check_timeout and promote_check_interval. Implements GitHub #387.	2018-02-27 10:04:58 +09:00
Ian Barwick	1f021dc9fa	"standby clone --recovery-conf-only": display generated file with --dry-run. Refactor the original code which generates "recovery.conf" to place the output into a buffer, which can either be written out as "recovery.conf" or copied to a buffer specified by the caller.	2018-02-23 10:16:47 +09:00
Ian Barwick	3a764f678a	"standby clone": add --recovery-conf-only option. This will generate "recovery.conf" for an existing standby. Typical use-case is a standby cloned manually from an external data source (e.g. Barman), where "recovery.conf" needs to be created (and if required a replication slot). The --dry-run option will check the prerequisites but not actually create "recovery.conf" or a replication slot. This requires that the upstream node is running, a replication connection can be made and, if required, a replication slot can be created. Implements GitHub #382.	2018-02-22 15:47:19 +09:00
Ian Barwick	a8232337d8	Catch various corner cases when restarting a PostgreSQL instance	2018-02-14 11:28:38 +09:00
Ian Barwick	c9eb1bfcc0	Always initialise t_conninfo_param_list structures	2018-02-13 10:48:18 +09:00
Ian Barwick	9732f78565	repmgrd: check "repmgr" extension is installed before starting. Implements GitHub #361.	2018-02-12 11:31:59 +09:00
Ian Barwick	ed6a167915	Execute a CHECKPOINT immediately after promoting the server. This ensures "pg_control" is updated with the latest timeline, mainly so that if "pg_rewind" is executed as part of a switchover, it sees the latest timeline. Per suggestion from GitHub user "superflav" in GitHub #378. See also: https://www.postgresql.org/message-id/flat/20150428180253.GU30322%40tamriel.snowman.net	2018-02-09 12:09:16 +09:00
Ian Barwick	fbbe7afd61	doc: update HISTORY and release notes	2018-02-09 11:42:16 +09:00
Ian Barwick	ae1fc93e48	Ensure correct server version number used for replication stats query	2018-02-09 11:06:15 +09:00
Ian Barwick	7b4ee80af2	"standby switchover": check demotion candidate can make replication connection. Check it's actually possible for the demotion candidate to attach to the promotion candidate before executing the switchover. As with other checks of this nature, there's a faint possibility the situation could change between the time the check is carried out and the time the demotion candidate is restarted to connect to the promotion candidate, but there's not a lot we can do about that. The main purpose is to be able to catch existing misconfigurations before anything gets changed. Implements GitHub #370.	2018-02-09 10:01:29 +09:00
Ian Barwick	d3e1937808	"standby switchover": additional sanity checks. Check that sufficient walsenders will be available on the promotion candidate, and if replication slots are in use, check that enough of those will be available. Note these checks can't guarantee that the walsenders/slots will be available at the appropriate points during the switchover process, but do ensure that existing configuration problems will be caught. Implements GitHub #371.	2018-02-08 15:23:10 +09:00
Ian Barwick	871d6fdee3	"standby clone": cowardly refuse to clone into an active data directory. By checking the PID file in the same way pg_ctl does, we can be pretty much certain whether the target data directory contains an active PostgreSQL instance.	2018-02-08 11:43:24 +09:00
Ian Barwick	c7dfe9e040	Fix "standby clone" in Barman mode with --no-upstream-connection. "--upstream-node-id", if provided, was not being passed through to the SQL query executed via the Barman server. Also modified the query to select the primary node if "--upstream-node-id" is not provided. Note: this is a very niche use case.	2018-02-07 16:36:44 +09:00
Ian Barwick	5c92a9e057	repmgr: simplify data directory checks when cloning. Attempting to use the contents of pg_control to tell whether the directory is in use by PostgreSQL can result in false positives; we should use a check based on the pidfile. Also change the HINT to indicate a data directory can be overwritten if -F/--force is provided.	2018-02-07 14:37:57 +09:00
Ian Barwick	aa5f025738	"standby clone": ensure "pg_subtrans" directory is created in Barman mode	2018-02-07 10:56:18 +09:00
Ian Barwick	596a19ee37	Move parse_output_to_argv() to configfile.c, so it can be used by parse_pg_basebackup_options(). Addresses GitHub #376.	2018-02-07 09:43:06 +09:00
Ian Barwick	64035ef701	"standby register/follow": provide primary node details for event notifications. For events generated by these commands, it may be useful to know details of the primary node. This makes the following additional parameters available to event notification scripts: %p (node ID of the primary), %a (node name of the primary) and %c (conninfo string for the primary). Implements GitHub #375.	2018-02-06 09:36:46 +09:00
Ian Barwick	9d301b4789	"standby register": add event notification "standby_register_sync". Implements GitHub #374.	2018-02-05 15:21:38 +09:00
Ian Barwick	50894b6124	"standby follow": check for replication slot availability on target node	2018-02-02 15:01:23 +09:00
Ian Barwick	c54045bcd8	"standby follow": initial implementation of --dry-run option. GitHub #363.	2018-02-01 14:18:40 +09:00
Ian Barwick	c0a53471e1	"standby switchover": improve log messages and add new exit code. Previously, if an issue was encountered with the old primary but the user provided -F/--force to have repmgr promote the standby anyway, repmgr would exit with the log message "STANDBY SWITCHOVER is complete" and exit code 0 (SUCCESS). To better report this partial completion, repmgr will now emit the message "STANDBY SWITCHOVER has completed with issues" (and a HINT to check preceding log messages) and exit with the new exit code 22 (ERR_SWITCHOVER_INCOMPLETE).	2018-01-31 10:25:15 +09:00
Ian Barwick	2eec8b5d79	Have do_standby_follow_internal() not abort on error. Pass the error code back to the caller instead, mainly so "repmgr node rejoin" can better report errors.	2018-01-30 16:53:04 +09:00
Ian Barwick	c11e92cf2a	repmgr: improve switchover handling when "pg_ctl" is used. If logging output was not explicitly redirected with "-l" in the pg_ctl options, repmgr would hang waiting for pg_ctl output. Note that we recommend using the OS-level service commands where available.	2018-01-30 13:43:37 +09:00
Ian Barwick	f294d09034	"repmgr standby register": improve error output when standby not running. Add explicit HINT.	2018-01-26 22:13:11 +09:00
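
Commits 68a9745e7e and 6f9a1f975e describe the same poll-until-timeout pattern: repeatedly check a condition (the standby appearing in pg_stat_replication on the new upstream, or the restarted demoted primary accepting connections) and give up once "standby_follow_timeout" or "standby_reconnect_timeout" has elapsed. A minimal Python sketch of that pattern follows. It is not repmgr's C implementation, and the function name `wait_for_condition` and the `check` callable are hypothetical.

```python
import time
from typing import Callable


def wait_for_condition(
    check: Callable[[], bool],
    timeout: float,
    interval: float = 1.0,
) -> bool:
    """Call check() every `interval` seconds until it returns True or
    `timeout` seconds have elapsed. Returns True on success, False on timeout.

    For "standby follow", check() might (hypothetically) run
        SELECT 1 FROM pg_stat_replication WHERE application_name = '<node name>'
    on the new upstream; repmgr's actual query may differ.
    """
    # A monotonic clock keeps the deadline unaffected by wall-clock changes.
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        # Sleep for one interval, but never past the deadline.
        time.sleep(max(0.0, min(interval, deadline - time.monotonic())))
```

A caller would treat `False` as the error case, e.g. `if not wait_for_condition(standby_is_attached, timeout=60): ...`, where `standby_is_attached` is a hypothetical function that queries the upstream.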
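
Commit 478bbcccbf sets "dbname=replication" only when the replication user differs from the normal repmgr user. Commit 3dba8336e9 notes that "user" may be absent from the upstream conninfo altogether. The Python sketch below illustrates the first rule on a parsed set of conninfo parameters and sets "user" explicitly, so a missing entry does not matter. The helper names are hypothetical. The quoting function follows the libpq keyword=value rules: single-quote values that are empty or contain whitespace, and escape single quotes and backslashes with a backslash.

```python
from typing import Dict


def replication_conninfo(
    params: Dict[str, str], repmgr_user: str, replication_user: str
) -> Dict[str, str]:
    """Return a copy of `params` adjusted for a replication connection.

    "dbname" is switched to "replication" only if the replication user
    differs from the normal repmgr user, so existing .pgpass entries for
    the repmgr user keep matching.
    """
    result = dict(params)  # don't modify the caller's dict
    result["user"] = replication_user  # also covers a conninfo without "user"
    if replication_user != repmgr_user:
        result["dbname"] = "replication"
    return result


def quote_conninfo_value(value: str) -> str:
    """Quote a value for a libpq keyword=value connection string."""
    if value and not any(c.isspace() or c in "'\\" for c in value):
        return value
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def to_conninfo_string(params: Dict[str, str]) -> str:
    """Render parameters as a space-separated keyword=value string."""
    return " ".join(f"{k}={quote_conninfo_value(v)}" for k, v in params.items())
```

For example, `to_conninfo_string(replication_conninfo({"host": "node1", "user": "repmgr", "dbname": "repmgr"}, "repmgr", "repl"))` yields `host=node1 user=repl dbname=replication`.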
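
Commits 871d6fdee3 and 5c92a9e057 switch the "is this data directory in use?" test to a PID-file check in the style of pg_ctl. Below is a simplified, POSIX-only Python sketch. It assumes only that the first line of "postmaster.pid" holds the postmaster's PID. The function name is hypothetical, and the sketch skips edge cases that pg_ctl handles, such as single-user-mode backends.

```python
import os


def postmaster_is_running(data_directory: str) -> bool:
    """Return True if "postmaster.pid" in `data_directory` names a live process.

    Returns False if the file is missing, unreadable or malformed, or if the
    recorded process no longer exists (a stale PID file).
    """
    pidfile = os.path.join(data_directory, "postmaster.pid")
    try:
        with open(pidfile) as f:
            first_line = f.readline().strip()
    except OSError:
        return False
    try:
        pid = int(first_line)
    except ValueError:
        return False
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks existence/permissions
    except ProcessLookupError:
        return False  # no such process: stale PID file
    except PermissionError:
        return True  # process exists but belongs to another user
    return True
```

As the commit message says, this makes the result "pretty much certain" rather than guaranteed: if the PID in a stale file has been reused by an unrelated process, the directory is still reported as active.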
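
Commit 64035ef701 adds the %p, %a and %c placeholders for event notification scripts, which are configured via "event_notification_command" in repmgr.conf. The sketch below shows one way such a command template could be expanded before execution. It is an illustration only, not repmgr's parser. The function name, script path and values are hypothetical, and unknown placeholders are deliberately left untouched.

```python
from typing import Dict


def expand_event_placeholders(template: str, values: Dict[str, str]) -> str:
    """Replace %X placeholders in a notification command template.

    `values` maps a single placeholder character (e.g. "p", "a", "c") to its
    replacement text. A "%" not followed by a known character is kept as-is.
    """
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template) and template[i + 1] in values:
            out.append(values[template[i + 1]])
            i += 2  # skip both "%" and the placeholder character
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Because a conninfo string contains spaces, a template such as `/usr/local/bin/notify.sh %p %a '%c'` quotes %c so the script receives it as a single argument.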


206 Commits