repmgr

mirror of https://github.com/EnterpriseDB/repmgr.git synced 2026-03-23 15:16:29 +00:00

Author	SHA1	Message	Date
Ian Barwick	bb42d8cba6	Fix calculation of maximum filename length	2019-03-28 12:40:29 +09:00
Ian Barwick	fe822a9eea	Prevent potential file descriptor resource leak	2019-03-28 12:29:10 +09:00
Ian Barwick	03cd5a6028	Put closedir call in correct location	2019-03-28 12:08:42 +09:00
Ian Barwick	1e1c596446	Add various missing close() calls	2019-03-28 11:32:25 +09:00
Ian Barwick	314a1e8f4f	use a constant to denote unknown replication lag	2019-03-20 17:26:04 +09:00
Ian Barwick	19bf4d7434	Count witness and zero-priority nodes in visibility check	2019-03-14 11:17:51 +09:00
Ian Barwick	9823978f41	repmgrd: don't wait for WAL receiver to reconnect during failover If the WAL receiver has been temporarily disabled, we don't want to wait for it to start up as it may not be able to at that point; we do however need to reset "wal_retrieve_retry_interval".	2019-03-06 15:54:56 +09:00
Ian Barwick	1615353f48	repmgrd: optionally disconnect WAL receivers during failover This is intended to ensure that all nodes have a constant LSN while making the failover decision. This feature is experimental and needs to be explicitly enabled with the configuration file option "standby_disconnect_on_failover". Note enabling this option will result in a delay in the failover decision until the WAL receiver is disconnected on all nodes.	2019-03-06 15:53:57 +09:00
Ian Barwick	71d151ca87	Don't check status of logical replication slots We only want to check the status of physical replication slots to determine whether a streaming replication standby has become detached and there is therefore a risk of uncontrolled WAL buildup on the local node. It's not feasible to second-guess the state of logical replication slots.	2019-02-23 10:09:43 +09:00
Ian Barwick	de70fd42dc	node check: simplify output generation in --is-shutdown-cleanly check	2019-02-22 10:49:06 +09:00
Ian Barwick	aeb9639ed9	node rejoin: add more log detail during rejoin success check Stating what is actually being checked where might be useful when diagnosing potential issues.	2019-02-13 15:29:39 +09:00
Ian Barwick	790bec21dd	node rejoin: handle case where node to rejoin was primary In that case the minRecoveryPoint* fields may be empty.	2019-02-12 13:31:25 +09:00
Ian Barwick	a0dc673439	"node rejoin": use minRecoveryPointTLI for comparing timelines	2019-02-12 13:31:21 +09:00
Ian Barwick	cce8b76171	"standby switchover": abort if promotion candidate has WAL replay paused If replay is paused, we can't be really sure that more WAL will be received between the check and the promote operation, which would risk the promote operation not taking place during the switchover (it would happen as soon as WAL replay is resumed and pending WAL is replayed). Therefore we simply quit with an informative slew of messages and leave the user to sort it out. GitHub #540.	2019-02-05 16:32:39 +09:00
Ian Barwick	f9a1861ded	Refactor ReplInfo struct handling Eventually we'll want to have this contain the optional replication info contained in the t_node_info struct, which should then contain a pointer to a ReplInfo struct.	2019-02-02 18:39:24 +09:00
Ian Barwick	20b79f998c	Define some previously magic numbers	2019-02-01 19:14:16 +09:00
Ian Barwick	64bb034d34	"node rejoin": catch corner case where repmgr metadata is outdated If the rejoin target is not in recovery, but not registered as primary (we detect this by attempting to connect to the registered primary) we abort and suggest fixing the repmgr metadata first.	2019-01-31 11:54:05 +09:00
Ian Barwick	59eca2be30	node rejoin: improve error code handling - return ERR_REJOIN_FAIL in all cases where the rejoin operation fails - ensure ERR_FOLLOW_FAIL is not returned - document error codes	2019-01-24 10:31:45 +09:00
Ian Barwick	dfe57d2406	"node rejoin": log pg_rewind command as DETAIL rather than DEBUG	2019-01-23 17:15:07 +09:00
Ian Barwick	061932d023	"node rejoin": verify status of rejoin target This adapts the code previously added to "standby follow" to verify whether the rejoin target can actually be rejoined.	2019-01-23 17:08:55 +09:00
Ian Barwick	3f5762e03a	Refactor upstream attachment check code Move it from the "standby follow" code to an independent function so it can be used in other contexts, e.g. "node rejoin".	2019-01-23 15:11:42 +09:00
Ian Barwick	42fa9a2a88	Log node rejoin failure as ERROR	2019-01-23 13:55:40 +09:00
Ian Barwick	f23065e041	Fix typo in log message	2019-01-23 13:53:29 +09:00
Ian Barwick	7dce3ed234	Update copyright notices to 2019	2019-01-21 14:54:35 +09:00
Ian Barwick	0b3a310802	Add --data-directory-config option to "repmgr node check" Implements part of GitHub #523.	2019-01-16 16:03:44 +09:00
Ian Barwick	9e90fcd584	"standby follow": verify status of follow target This commit adds infrastructure for repmgr to be able to check whether one standby can attach to another node, regardless of whether it is a standby or a primary. This is intended to prevent a node from attempting to follow a node whose timeline has diverged. The --dry-run option makes it possible to test a follow operation before it is carried out. As a useful side-effect this makes it possible for a standby to follow another standby. This is an initial implementation; documentation and possibly further changes to follow.	2018-11-29 17:14:38 +09:00
Ian Barwick	74c44a7178	doc: document "repmgr node service" This was originally intended for internal use, but it's mentioned several times in the documentation and is useful for diagnostic purposes.	2018-11-28 12:58:07 +09:00
Ian Barwick	793d83b22c	Refactor server version detection Most of the time we can simply get the version number directly from the connection handle. Previously it was held in a global variable, which was an icky way of doing things. In a few special cases we also need the actual version string, which is obtained directly from the database.	2018-11-22 21:30:31 +09:00
Ian Barwick	61c91df332	"repmgr node": use appendPQExpBufferStr/-Char() where appropriate	2018-10-03 14:09:29 +09:00
Ian Barwick	b346914d4d	repmgr: fix "Missing replication slots" label in "node check" Per report in GitHub #507.	2018-10-03 13:53:52 +09:00
Ian Barwick	9681708b1a	repmgr: improve slot handling in "node rejoin" On the rejoined node, if a replication slot for the new upstream exists (which is typically the case after a failover), delete that slot. Also emit a warning about any inactive replication slots which may need to be cleaned up manually. GitHub #499.	2018-08-30 12:24:13 +09:00
Ian Barwick	a5cfc244bc	repmgr: have "node status" check for missing downstream nodes This matches the behaviour of "node check".	2018-07-18 10:27:19 +09:00
Ian Barwick	ae60caacdd	repmgr: make "node check" and "node status" return ERR_NODE_STATUS when appropriate If any issue is detected (and "node check" is not being executed with a specific individual check), "ERR_NODE_STATUS" is returned.	2018-07-05 14:31:06 +09:00
Ian Barwick	92d0e6809b	repmgr: "cluster show" to return non-zero value if an issue encountered	2018-07-05 13:32:50 +09:00
Ian Barwick	b2081dca52	De-overload configuration file parameter "standby_reconnect_timeout" Currently the (very generic sounding) "standby_reconnect_timeout" configuration file parameter is used in several different contexts and it would be useful to have more granular control over the different timeouts it's used to configure. This patch introduces "node_rejoin_timeout", used in place of "standby_reconnect_timeout" (which wasn't documented) when "repmgr node rejoin" is executed, to determine how long to wait for the node to rejoin the replication cluster. Additionally "repmgrd_standby_startup_timeout" is introduced as a timeout for failover situations, when repmgrd executes "repmgr standby follow" to follow a new primary, and waits for the standby to restart and become available for connections. "standby_reconnect_timeout" is now only relevant for "repmgr standby switchover". Implements GitHub #454.	2018-06-28 18:00:55 +09:00
Ian Barwick	080a29c33b	node check: add --missing-slots check This enables an explicit check for slots which should exist (according to the repmgr metadata) but which aren't present.	2018-06-22 17:21:40 +09:00
Ian Barwick	dd7a4068d2	node check: implement CSV output This is advertised in the --help output and placeholder code was in place, but it wasn't actually implemented.	2018-06-22 13:14:57 +09:00
Ian Barwick	fcf237fe31	node status: improve output and documentation In the default text output mode, list inactive slots. In CSV output mode, list inactive slots as additional information; add output line with number of missing slots and a list thereof. Also document --csv output mode.	2018-06-22 11:46:50 +09:00
Ian Barwick	4d70a667fb	node check: clarify status information for witness server Previously the output gave the impression the server was a primary, which is technically the case, but it's not the actual cluster primary. Also output an error if the node is in recovery, which is unlikely but you never know.	2018-06-22 10:15:45 +09:00
Ian Barwick	0f97a98f28	repmgr: don't count witness node as a standby when running "node status" Addresses GitHub #451.	2018-06-21 13:06:18 +09:00
Ian Barwick	269e3242c8	"repmgr node ...": update comments and formatting	2018-06-21 12:12:07 +09:00
Ian Barwick	b0ed87832b	repmgr: don't count witness node as a standby when running "node check" Addresses GitHub #451.	2018-06-21 11:13:46 +09:00
Ian Barwick	bb320a64f5	repmgr: consolidate code in "standby switchover" Commit `41274f5525` left us with two if statements in sequence with exactly the same condition, so consolidate both into a single statement. Clarify code comments while we're at it.	2018-06-12 10:30:24 +09:00
Ian Barwick	7861392450	node rejoin: avoid outputting empty DETAIL message	2018-06-07 15:03:36 +09:00
Ian Barwick	b297e40d77	node rejoin: improve handling of --config-file parameter Fixes bug when parsing --config-file values (GitHub #442). Also improves handling in --dry-run mode, as some checks for the provided files were being skipped if --dry-run supplied, even though they are intended to work with --dry-run.	2018-06-07 15:03:30 +09:00
Ian Barwick	8320179f34	Add configuration file parameter "config_directory" This enables explicit provision of an external configuration file directory, which if set will be passed to "pg_ctl" as the -D parameter. Otherwise "pg_ctl" will default to using the data directory, which will cause some operations to fail if the configuration files are not present there. Note this is implemented primarily for feature completeness and for development/testing purposes. Users who have installed "repmgr" from a package should not rely on "pg_ctl" to stop/start/restart PostgreSQL, instead they should set the appropriate "service_..._command" for their operating system. For more details see: https://repmgr.org/docs/4.0/configuration-service-commands.html Note: in a future release, the presence of "config_directory" in repmgr.conf will be used to implicitly set "--copy-external-config-files=samepath" when cloning a standby; this is a behaviour change so will be implemented in the next major release (repmgr 4.1). Implements GitHub #424.	2018-04-25 11:58:24 +09:00
Ian Barwick	cda952f1e4	Add "dbname=replication" to all replication connection strings Previously repmgr was attempting to make replication connections with "dbname" set to the repmgr database name. While this works if e.g. the repmgr user also has replication permissions, it will fail if a dedicated replication user is specified, who only has permission to access the virtual "replication" database. Change this to use "dbname=replication" if the replication connection user is different to the normal repmgr database user. (We could just always set it to "replication", but that might break existing installations e.g. where a .pgpass file is in use and there's no "replication" entry for the normal repmgr database user). Addresses GitHub #421.	2018-04-12 16:11:16 +09:00
Ian Barwick	4876a9fde3	Add TODO for pg_rewind changes coming in PostgreSQL 11	2018-04-03 21:56:46 +09:00
Ian Barwick	cf64f9e95c	Always initialise t_conninfo_param_list structures	2018-04-03 14:31:24 +09:00
Ian Barwick	a3f371b8c0	"node rejoin": actively check for node to rejoin cluster Previously repmgr was relying on whatever command was configured to start PostgreSQL to determine whether the node being rejoined had started correctly. However it's preferable to actively poll the upstream to confirm it has restarted and actually attached as a standby before confirming success of the "node rejoin" action. This can be overridden with the -W/--no-wait option. (Note that for consistency with other PostgreSQL utilities, the short form of the --wait option is now "-w"; this is currently only used in "repmgr standby follow".) Also update "repmgr node rejoin" documentation with a list of supported options, and add some useful index entries for "pg_rewind". Implements GitHub #415.	2018-04-03 10:34:44 +09:00

119 Commits