repmgr

mirror of https://github.com/EnterpriseDB/repmgr.git synced 2026-03-25 16:16:29 +00:00

Author	SHA1	Message	Date
Ian Barwick	793d83b22c	Refactor server version detection Most of the time we can simply get the version number directly from the connection handle. Previously it was held in a global variable, which was an icky way of doing things. In a few special cases we also need the actual version string, which is obtained directly from the database.	2018-11-22 21:30:31 +09:00
Ian Barwick	b223cb4cee	standby follow: improve handling of --upstream-node-id	2018-11-22 11:16:44 +09:00
Ian Barwick	c3bc5585d9	Add sanity check for extension version This should cover the cases where the "repmgr" extension was installed manually but not updated, or an upgrade was not fully completed.	2018-10-31 11:16:36 +09:00
Ian Barwick	c336e384ab	Support "pg_promote()" function (PostgreSQL 12 and later) This is an experimental feature.	2018-10-26 11:02:45 +09:00
Ian Barwick	dc8ffd30c6	"standby switchover": close all connections used to check repmgrd status The connections used to check repmgrd status on all nodes were not being closed if repmgrd was not running. Normally this wouldn't be a huge problem as they will go away when repmgr terminates or the PostgreSQL server is restarted. However, if shutdown mode is "smart", the open connection on the demotion candidate will cause the shutdown operation to fail until repmgr times out.	2018-10-23 11:05:28 +09:00
Ian Barwick	36bd7cdc9f	Speed up witness "failover" during a switchover	2018-10-18 17:26:29 +09:00
Ian Barwick	15a5d2ee9d	"repmgr standby": use appendPQExpBufferStr/-Char() consistently	2018-10-03 17:31:12 +09:00
Ian Barwick	2491b8ae52	Add functionality to "pause" repmgrd In some circumstances, e.g. while performing a switchover, it is essential that repmgrd does not take any kind of failover action, as this will put the cluster into an incorrect state. Previously it was necessary to stop repmgrd on all nodes (or at least those nodes which repmgrd would consider as promotion candidates), however this is a cumbersome and potentially risk-prone operation, particularly if the replication cluster contains more than a couple of servers. To prevent this issue from occurring, this patch introduces the ability to "pause" repmgrd on all nodes with a single command ("repmgr daemon pause") which notifies repmgrd not to take any failover action until the node is "unpaused" ("repmgr daemon unpause"). "repmgr daemon status" provides an overview of each node and whether repmgrd is running, and if so whether it is paused. "repmgr standby switchover" has been modified to automatically pause repmgrd while carrying out the switchover. See documentation for further details.	2018-09-27 16:42:10 +09:00
Ian Barwick	9439467958	doc: add troubleshooting section to switchover documentation	2018-09-25 13:47:58 +09:00
Ian Barwick	38e3aae053	repmgr: add parameter "shutdown_check_timeout" Previously, "repmgr standby switchover" used the configuration file parameters "reconnect_interval" and "reconnect_attempts" to define a timeout to determine whether the current primary (demotion candidate) has shut down. However, these parameters are intended for primary failure detection and are generally lower in value, while a controlled shutdown may take longer, resulting in the switchover being aborted as repmgr was not waiting long enough. To prevent this happening, parameter "shutdown_check_timeout" has been added. This complements the existing "standby_reconnect_timeout" parameter used by "repmgr standby switchover". Implements GitHub #504.	2018-09-25 11:34:06 +09:00
Ian Barwick	9681708b1a	repmgr: improve slot handling in "node rejoin" On the rejoined node, if a replication slot for the new upstream exists (which is typically the case after a failover), delete that slot. Also emit a warning about any inactive replication slots which may need to be cleaned up manually. GitHub #499.	2018-08-30 12:24:13 +09:00
Ian Barwick	7745844078	"standby switchover": improve replication connection check Previously repmgr would first check that a replication connection can be made from the demotion candidate to the promotion candidate, however it's preferable to sanity-check the number of available walsenders first, to provide a more useful error message.	2018-08-24 16:31:25 +09:00
Cédric Villemain	6fc79470fc	Fix grep to find conninfo The pattern previously used \t*, but [[:space:]] should be better as it matches more kinds of whitespace (the existing pattern was broken in my case on RH7).	2018-08-23 18:33:55 +02:00
Ian Barwick	c3949b2aea	"standby clone" - don't copy external config files in dry run mode Avoid copying files during a --dry-run as it may introduce unexpected changes on the target node. During an actual clone operation, any problems with copying files will be detected early and the operation aborted before the actual database cloning commences. GitHub #491.	2018-08-20 15:23:37 +09:00
Ian Barwick	6ba49de44e	"standby promote": improve log messages Make it clearer what repmgr is waiting for, and what to do if the promotion appears to fail.	2018-08-16 11:52:01 +09:00
Ian Barwick	f2bc898761	repmgr: fix handling of slot creation error when cloning If cloning from a node other than the intended upstream, and replication slots are in use, once the cloning process is complete, repmgr will attempt to connect to the intended upstream to create the replication slot. Previously it would abort with a connection error, but as this issue is not fatal to the cloning process itself, and in some situations may be intentional, it's better to log a warning and continue. We should probably collate this (and any similar items needing attention after the cloning operation) into a list output at the end, otherwise the warning may get overlooked.	2018-08-15 15:12:23 +09:00
Abhijit Menon-Sen	97cafd8c54	Fix upstream node name in warning This log_warning is supposed to reproduce the error in the block above, but used the current node's name instead of the intended upstream node.	2018-08-12 09:15:13 +05:30
Ian Barwick	7ecfb333b9	doc: add note about switchover and exclusive backups Also rename server_not_in_exclusive_backup_mode() to avoid double negatives. GitHub #476.	2018-07-19 16:02:31 +09:00
Martín Marqués	8f13a66aaa	Check that there is no exclusive backup taking place while we perform a switchover. We've found that this can cause some issues with postgres control metadata (could be a postgres bug) so the best thing is not to switch over if there's a backup taking place. It's also a bad idea from an architectural point of view, as a switchover is supposed to be planned, so why perform it when we are taking backups? GitHub #476.	2018-07-19 16:02:21 +09:00
Ian Barwick	673bde2b7f	repmgr: fix "primary_slot_name" when using "standby clone" with --recovery-conf-only Addresses GitHub #474.	2018-07-17 13:42:10 +09:00
Martín Marqués	81de200561	Add information to the --help and docs of standby clone regarding the need to provide a conninfo line for the upstream from which we will be cloning.	2018-07-16 18:56:41 -03:00
Ian Barwick	29de052dd8	repmgr: clarify intent behind --wait-sync timeout processing	2018-07-05 10:09:04 +09:00
Ian Barwick	37311e15a3	repmgr: fix "standby register --wait-sync" when no timeout provided The default value for "wait_register_sync_seconds" was zero, which is treated as disabling --wait-sync altogether. Default value now set to -1, which is taken to mean no timeout value supplied.	2018-07-04 17:22:04 +09:00
Ian Barwick	fcf237fe31	node status: improve output and documentation In the default text output mode, list inactive slots. In CSV output mode, list inactive slots as additional information; add output line with number of missing slots and a list thereof. Also document --csv output mode.	2018-06-22 11:46:50 +09:00
Ian Barwick	c5ba72c2c5	standby switchover: fix behaviour if witness node is a sibling The witness node is not a streaming replication standby, so executing "repmgr standby follow" will fail. Instead, execute "repmgr witness register --force" to update the witness node record on the primary and its local copy of all node records. Addresses GitHub #453.	2018-06-21 16:48:58 +09:00
Ian Barwick	efc388065e	standby follow: check node has connected to new primary After restarting the standby, poll pg_stat_replication on the upstream until the standby connects, and exit with an error if it doesn't by the timeout defined in "standby_follow_timeout". Implements GitHub #444.	2018-06-07 15:04:45 +09:00
Ian Barwick	0108fb2e72	standby follow: add hint about using "node rejoin" If "repmgr standby follow" is executed on a node which isn't running, point out "repmgr node rejoin" should probably be used instead.	2018-06-07 15:04:30 +09:00
Ian Barwick	535fba43d3	standby clone: improve external configuration file copying If --copy-external-config-files was provided, check that we can copy the files before cloning the standby, and abort if an error is encountered. This will give the user the opportunity to fix any issues before running the entire (and potentially lengthy) clone. Previously errors were logged but no action taken, and the final message indicated the clone operation was successful. Addresses GitHub #443.	2018-06-07 15:04:01 +09:00
Ian Barwick	7613b1769c	standby clone: --recovery-conf-only expects the standby to be registered Note this in the documentation, and add a HINT about registering it if the standby record is not available. Related to GitHub #438.	2018-05-31 09:42:53 +09:00
Ian Barwick	276239422b	standby clone: don't assume existence of "user" in upstream conninfo Usually a separate user (typically "repmgr") is set up specifically to manage the repmgr metadata, however there's no compelling requirement to do this, and it's possible the database owner (usually: "postgres") will be used, in which case it's possible the username will be left out of the conninfo string. Addresses GitHub #437.	2018-05-24 15:52:51 +09:00
Ian Barwick	6c518f1403	"standby clone": log actual connection string used to connect to upstream Useful for diagnostic purposes.	2018-05-10 12:03:13 +09:00
Ian Barwick	8320179f34	Add configuration file parameter "config_directory" This enables explicit provision of an external configuration file directory, which if set will be passed to "pg_ctl" as the -D parameter. Otherwise "pg_ctl" will default to using the data directory, which will cause some operations to fail if the configuration files are not present there. Note this is implemented primarily for feature completeness and for development/testing purposes. Users who have installed "repmgr" from a package should not rely on "pg_ctl" to stop/start/restart PostgreSQL, instead they should set the appropriate "service_..._command" for their operating system. For more details see: https://repmgr.org/docs/4.0/configuration-service-commands.html Note: in a future release, the presence of "config_directory" in repmgr.conf will be used to implicitly set "--copy-external-config-files=samepath" when cloning a standby; this is a behaviour change so will be implemented in the next major release (repmgr 4.1). Implements GitHub #424.	2018-04-25 11:58:24 +09:00
Ian Barwick	cda952f1e4	Add "dbname=replication" to all replication connection strings Previously repmgr was attempting to make replication connections with "dbname" set to the repmgr database name. While this works if e.g. the repmgr user also has replication permissions, it will fail if a dedicated replication user is specified, who only has permission to access the virtual "replication" database. Change this to use "dbname=replication" if the replication connection user is different to the normal repmgr database user. (We could just always set it to "replication", but that might break existing installations e.g. where a .pgpass file is in use and there's no "replication" entry for the normal repmgr database user). Addresses GitHub #421.	2018-04-12 16:11:16 +09:00
Ian Barwick	62c29aab32	Don't issue a CHECKPOINT after promoting a standby. Issuing a CHECKPOINT immediately after promoting a standby may impact performance. Commit `239a548e9d` ensures one is only issued when required, i.e. during a switchover when pg_rewind will be executed. This reverts commit `a2068768ab`.	2018-04-09 14:35:54 +09:00
Ian Barwick	e8ba213174	"standby register": add sanity check when --upstream-node-id not supplied If --upstream-node-id was not supplied to "repmgr standby register", repmgr defaults to the primary node as upstream node. If the local node is available, we now double-check that it's attached to the primary, in case the lack of --upstream-node-id was an accidental omission. This check is only made when the local node is available. This behaviour can be overridden with -F/--force (though it's hard to imagine a scenario where that would be useful). Addresses GitHub #395.	2018-04-05 17:38:55 +09:00
Ian Barwick	ec998bf9c5	doc: update HISTORY and release notes	2018-04-03 15:00:49 +09:00
Ian Barwick	e36b180de8	Ensure correct server version number used for replication stats query	2018-04-03 14:45:37 +09:00
Ian Barwick	a2068768ab	Execute a CHECKPOINT immediately after promoting the server This ensures "pg_control" is updated with the latest timeline, mainly to ensure that if "pg_rewind" is executed as part of a switchover that it sees the latest timeline. Per suggestion from GitHub user "superflav" in GitHub #378. See also: https://www.postgresql.org/message-id/flat/20150428180253.GU30322%40tamriel.snowman.net	2018-04-03 14:44:44 +09:00
Ian Barwick	bde9fea48c	Fix directory creation when cloning from Barman	2018-04-03 14:44:03 +09:00
Ian Barwick	3b00dc912a	Catch various corner cases when restarting a PostgreSQL instance	2018-04-03 14:40:53 +09:00
Ian Barwick	1e1b4b1a65	"standby register/follow": provide primary node details for event notifications For events generated by these commands, it may be useful to know details of the primary node. This makes the following additional parameters available to event notification scripts: - %p: node ID of the primary - %a: node name of the primary - %c: conninfo string for the primary Implements GitHub #375	2018-04-03 14:32:19 +09:00
Ian Barwick	cf64f9e95c	Always initialise t_conninfo_param_list structures	2018-04-03 14:31:24 +09:00
Ian Barwick	dfdebd6c08	Enable provision of "archive_cleanup_command" in recovery.conf If "archive_cleanup_command" is defined in "repmgr.conf", a corresponding entry will be made in the node's "recovery.conf" file after cloning a standby. Note that we recommend using PgBarman to manage WAL archives, but are providing this facility to help repmgr to be integrated in existing environments. Implements GitHub #416.	2018-04-03 14:10:21 +09:00
Ian Barwick	63a11f8926	"standby promote": make timeout values configurable This introduces the following new configuration file parameters, which were previously hard-coded values: - promote_check_timeout - promote_check_interval Implements GitHub #387.	2018-04-03 14:10:14 +09:00
Ian Barwick	ad24b04c35	Refactor pg_control parsing The "data_checksum_version" field is located towards the end of the ControlFileData struct, meaning its position varies between versions. Previously this wasn't a problem as it was only required for operations involving 9.5 and later, and its position within the control file has not changed between the current release and current HEAD. However, in order to support pg_rewind in 9.3 and 9.4, which both have changes in the control file format, we'll need version-specific parsing. This will also make it easier to deal with any future changes to the control file format.	2018-04-02 20:54:42 +09:00
Ian Barwick	3ccf1cf182	Enable pg_rewind to be used with PostgreSQL 9.3/9.4 pg_rewind is not part of the core distribution for those, but we provided support in repmgr 3.3 so should extend it to repmgr 4. Note that there is no check in place whether the pg_rewind binary exists, so it's up to the user to ensure it's present. Addresses GitHub #413.	2018-04-02 20:54:29 +09:00
Ian Barwick	239a548e9d	"standby switchover": force checkpoint if pg_rewind requested. Addresses issue described in GitHub #378. PostgreSQL itself doesn't issue a checkpoint after promotion to ensure the newly promoted server is available as quickly as possible, so we'll only execute an explicit CHECKPOINT when it's actually required, i.e. when pg_rewind will be executed. This is required as pg_rewind uses the timeline reported in the pg_control file to compare with the server to be rewound, and the pg_control timeline is only updated after the first checkpoint, so there is an interval where pg_rewind will erroneously assume both servers are on the same timeline and take no action.	2018-03-29 23:55:08 +09:00
Ian Barwick	231ef5563e	"standby switchover": update hint	2018-03-29 23:41:59 +09:00
Ian Barwick	7111483b65	repmgr: move demoted primary check to the final step during switchover This will give the demoted primary more time to start up as a standby, during which "standby follow" can be executed on sibling nodes, if specified.	2018-03-27 16:44:15 +09:00
Ian Barwick	1558497ae4	repmgr: poll demoted primary after restart during switchover During a switchover operation, once the demoted primary has been restarted as a standby, repmgr attempts to reconnect to verify its status and drop any redundant replication slots. However it's possible the standby may still be in the startup phase, so poll for "standby_reconnect_timeout" seconds before giving up. Addresses GitHub #408.	2018-03-27 16:44:10 +09:00
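
Several of the commits listed above (1558497ae4, efc388065e, 38e3aae053) describe the same pattern: poll a node's status at intervals and give up once a configured timeout such as "standby_reconnect_timeout" has elapsed. The following is a minimal Python sketch of that pattern only. It is not repmgr's C implementation; the function name `poll_until`, its parameters, and the `check` callable are hypothetical and introduced purely for illustration.

```python
import time
from typing import Callable


def poll_until(
    check: Callable[[], bool],
    timeout_seconds: float,
    interval_seconds: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() repeatedly until it succeeds or the timeout expires.

    check is a hypothetical stand-in for "can we connect to the node yet?".
    Returns True as soon as check() returns True, False on timeout.
    check() is always attempted at least once, even with a zero timeout.
    """
    deadline = clock() + timeout_seconds
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_seconds)
```

A caller would pass a function that attempts the connection, for example `poll_until(node_is_reachable, timeout_seconds=60)`, where `node_is_reachable` is again a hypothetical helper and 60 is an arbitrary illustrative value.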

230 Commits