repmgr

mirror of https://github.com/EnterpriseDB/repmgr.git synced 2026-03-22 22:56:29 +00:00

Author	SHA1	Message	Date
Ian Barwick	300e11eb76	Rearrange some command line option handling definitions for clarity	2020-06-10 13:02:13 +09:00
Ian Barwick	d37513312a	Move the main configfile structure into configfile.c This is required for a later refactoring of the configuration file handling.	2020-05-05 14:43:55 +09:00
Ian Barwick	45e96f21a5	node check: add option --db-connection This is intended for diagnostic purposes, primarily when diagnosing the connection parameters used when repmgr is being executed on a remote node.	2020-04-15 17:48:23 +09:00
Ian Barwick	32dde4eaaf	standby switchover: improve directory check failure handling It's possible that the remote data directory check will fail if e.g. connection configuration is not consistent across all nodes. This modification ensures a database connection error is reported, rather than a spurious issue with the data directory configuration.	2020-04-15 14:08:29 +09:00
Ian Barwick	d9cb38c7f0	node check: add --upstream option We have a --downstream option to check for attached nodes, but it would be useful to have a corresponding --upstream option too. A following patch will adapt the behaviour of this option when executed on the primary node.	2020-03-30 17:54:52 +09:00
Ian Barwick	8f6058c676	standby switchover: check replication configuration file ownership Within a PostgreSQL data directory, all files should have the same ownership as the data directory itself. PostgreSQL itself expects this, and ownership of files by another user is likely to cause problems. In PostgreSQL 11 or earlier, if "recovery.conf" cannot be moved by PostgreSQL (because e.g. it is owned by root), it will not be possible to promote the standby to primary. In PostgreSQL 12 and later, if "postgresql.auto.conf" on the demotion candidate (current primary) has incorrect ownership (e.g. owned by root), repmgr will very likely not be able to modify this file and write the replication configuration required for the node to rejoin the cluster as a standby. Checks added to catch both cases before a switchover is executed.	2020-03-04 17:21:22 +09:00
Ian Barwick	7ed0a99d70	Make code to check standby join status available globally This makes it possible to check the standby join status from another node, e.g. the promotion candidate during a switchover operation.	2020-02-04 12:52:55 +09:00
Ian Barwick	cd7f36a6fd	Add general check function "check_replication_slots_available()" Make the code previously only used by "standby follow" generally available - we'll want to use this from "node rejoin" as well. While we're at it, when reporting failure due to lack of free replication slots, report the current value of "max_replication_slots".	2020-02-03 16:43:55 +09:00
Ian Barwick	7fdf2f1778	Update copyright notices to 2020	2020-01-13 14:06:20 +09:00
Ian Barwick	220ec7fc96	Minimize user permissions requirements for replication slots Enable operations which create or drop replication slots to be carried out with the minimum necessary user permissions, i.e. a user with the REPLICATION attribute. This can be the repmgr user, or a dedicated replication user. In the latter case, if the dedicated replication user is only permitted to make replication connections, the streaming replication protocol is used to create/drop slots. Implements part of GitHub #536.	2019-10-30 15:51:15 +09:00
Ian Barwick	b74f965f54	standby clone: rename --recovery-conf-only to --replication-conf-only A more generic option name to cover pre- and post-Pg12 replication configuration methods. --recovery-conf-only is retained as an alias for backwards compatibility.	2019-10-18 14:44:57 +09:00
Ian Barwick	a502b2cf96	Move function parse_repmgr_version() to a more appropriate location	2019-09-24 13:14:03 +09:00
Ian Barwick	fca033fb9d	cluster show/daemon status: report upstream node mismatches When showing node information, check if the node's copy of its record shows a different upstream to the one expected according to the node where the command is executed. This helps visualise situations where the cluster is in an unexpected state, and provide a better idea of the actual state. For example, if a cluster has divided somehow and a set of nodes are following a new primary, when running "cluster show" etc., repmgr will now show the name of the primary those nodes are actually following, rather than the now outdated node name recorded on the other side of the split. A warning will also be issued about the situation.	2019-05-14 13:11:31 +09:00
Ian Barwick	d8e4c54ea4	"standby switchover": add "--repmgrd-force-unpause" Implements GitHub #559.	2019-05-10 16:04:07 +09:00
Ian Barwick	9fe2fa2daf	daemon status: make output more like that of "cluster show" In particular make any issues with unexpected server state more obvious.	2019-04-25 14:45:41 +09:00
Ian Barwick	ba1f05ece9	Restrict "node_name" to maximum 63 characters In "recovery.conf", the configuration parameter "node_name" is used as the "application_name" value, which will be truncated by PostgreSQL to 63 characters (NAMEDATALEN - 1). repmgr sometimes needs to be able to extract the application name from pg_stat_replication to determine if a node is connected (e.g. when executing "repmgr standby register"), so the comparison will fail if "node_name" exceeds 63 characters.	2019-03-28 10:37:57 +09:00
Ian Barwick	1615353f48	repmgrd: optionally disconnect WAL receivers during failover This is intended to ensure that all nodes have a constant LSN while making the failover decision. This feature is experimental and needs to be explicitly enabled with the configuration file option "standby_disconnect_on_failover". Note enabling this option will result in a delay in the failover decision until the WAL receiver is disconnected on all nodes.	2019-03-06 15:53:57 +09:00
Ian Barwick	b1875a8d91	Split command execution functions into separate library These may need to be executed by repmgrd.	2019-02-27 14:41:17 +09:00
Ian Barwick	48381a5b4e	Use --compact option for abbreviated display output --terse is meant for reducing log chatter.	2019-02-02 13:06:59 +09:00
Ian Barwick	d7420d7274	daemon (start|stop): verify that repmgrd starts/stops. Note this may not always be possible for "daemon stop" if we are unable to determine the repmgrd PID.	2019-01-30 14:41:31 +09:00
Ian Barwick	32b81e7d49	"daemon start": initial implementation	2019-01-29 13:01:14 +09:00
Ian Barwick	061932d023	"node rejoin": verify status of rejoin target This adapts the code previously added to "standby follow" to verify whether the rejoin target can actually be rejoined.	2019-01-23 17:08:55 +09:00
Ian Barwick	3f5762e03a	Refactor upstream attachment check code Move it from the "standby follow" code to an independent function so it can be used in other contexts, e.g. "node rejoin".	2019-01-23 15:11:42 +09:00
Ian Barwick	7dce3ed234	Update copyright notices to 2019	2019-01-21 14:54:35 +09:00
Ian Barwick	0b3a310802	Add --data-directory-config option to "repmgr node check" Implements part of GitHub #523.	2019-01-16 16:03:44 +09:00
Ian Barwick	10be941298	Fix typo "node join" should be "node rejoin"	2019-01-14 15:39:13 +09:00
Ian Barwick	c66c8ebc98	repmgr: add --terse mode to "cluster show" This suppresses display of the usually lengthy "conninfo" column, mainly useful for generating a compact table suitable for pasting into emails, chats etc. without messy line breaks. Implements GitHub #521.	2019-01-09 10:06:37 +09:00
Ian Barwick	455a0bd93f	Use make_remote_repmgr_path() in place of make_repmgr_path() Also we can now simplify "cluster (matrix|crosscheck)" commands as beginning with v4.0, we know where the configuration file is, so can provide that when invoking repmgr remotely.	2018-10-02 09:59:18 +09:00
Ian Barwick	11d25e2aef	Add configuration parameter "repmgr_bindir" This is to facilitate remote invocation of repmgr when the repmgr binary is located somewhere other than the PostgreSQL binary directory, as it cannot be assumed all package maintainers will install repmgr there. This parameter is optional; if not set (the default), repmgr will fall back to "pg_bindir" (if set). Addresses GitHub #246.	2018-10-02 09:59:12 +09:00
Ian Barwick	2491b8ae52	Add functionality to "pause" repmgrd In some circumstances, e.g. while performing a switchover, it is essential that repmgrd does not take any kind of failover action, as this will put the cluster into an incorrect state. Previously it was necessary to stop repmgrd on all nodes (or at least those nodes which repmgrd would consider as promotion candidates), however this is a cumbersome and potentially risk-prone operation, particularly if the replication cluster contains more than a couple of servers. To prevent this issue from occurring, this patch introduces the ability to "pause" repmgrd on all nodes with a single command ("repmgr daemon pause") which notifies repmgrd not to take any failover action until the node is "unpaused" ("repmgr daemon unpause"). "repmgr daemon status" provides an overview of each node and whether repmgrd is running, and if so whether it is paused. "repmgr standby switchover" has been modified to automatically pause repmgrd while carrying out the switchover. See documentation for further details.	2018-09-27 16:42:10 +09:00
Ian Barwick	9681708b1a	repmgr: improve slot handling in "node rejoin" On the rejoined node, if a replication slot for the new upstream exists (which is typically the case after a failover), delete that slot. Also emit a warning about any inactive replication slots which may need to be cleaned up manually. GitHub #499.	2018-08-30 12:24:13 +09:00
Ian Barwick	56919ea499	repmgr: add -q/--quiet option This suppresses log output below log level ERROR. This is useful mainly when repmgr is being executed programmatically, e.g. in a cronjob, where it's only useful to receive output if something goes wrong. Note we advise against using this option when executing repmgr commands which operate on PostgreSQL nodes (standby follow, standby promote, standby switchover, node rejoin), particularly when executed by repmgrd, as the log output will provide valuable troubleshooting information. Implements suggestion in GitHub #468.	2018-07-13 12:09:41 +09:00
Ian Barwick	37311e15a3	repmgr: fix "standby register --wait-sync" when no timeout provided The default value for "wait_register_sync_seconds" was zero, which is treated as disabling --wait-sync altogether. Default value now set to -1, which is taken to mean no timeout value supplied.	2018-07-04 17:22:04 +09:00
Ian Barwick	080a29c33b	node check: add --missing-slots check This enables an explicit check for slots which should exist (according to the repmgr metadata) but which aren't present.	2018-06-22 17:21:40 +09:00
Ian Barwick	8320179f34	Add configuration file parameter "config_directory" This enables explicit provision of an external configuration file directory, which if set will be passed to "pg_ctl" as the -D parameter. Otherwise "pg_ctl" will default to using the data directory, which will cause some operations to fail if the configuration files are not present there. Note this is implemented primarily for feature completeness and for development/testing purposes. Users who have installed "repmgr" from a package should not rely on "pg_ctl" to stop/start/restart PostgreSQL, instead they should set the appropriate "service_..._command" for their operating system. For more details see: https://repmgr.org/docs/4.0/configuration-service-commands.html Note: in a future release, the presence of "config_directory" in repmgr.conf will be used to implicitly set "--copy-external-config-files=samepath" when cloning a standby; this is a behaviour change so will be implemented in the next major release (repmgr 4.1). Implements GitHub #424.	2018-04-25 11:58:24 +09:00
Ian Barwick	3b00dc912a	Catch various corner cases when restarting a PostgreSQL instance	2018-04-03 14:40:53 +09:00
Ian Barwick	a3f371b8c0	"node rejoin": actively check for node to rejoin cluster Previously repmgr was relying on whatever command was configured to start PostgreSQL to determine whether the node being rejoined had started correctly. However it's preferable to actively poll the upstream to confirm it has restarted and actually attached as a standby before confirming success of the "node rejoin" action. This can be overridden with the -W/--no-wait option. (Note that for consistency with other PostgreSQL utilities, the short form of the --wait option is now "-w"; this is currently only used in "repmgr standby follow".) Also update "repmgr node rejoin" documentation with a list of supported options, and add some useful index entries for "pg_rewind". Implements GitHub #415.	2018-04-03 10:34:44 +09:00
Ian Barwick	3ccf1cf182	Enable pg_rewind to be used with PostgreSQL 9.3/9.4 pg_rewind is not part of the core distribution for those, but we provided support in repmgr 3.3 so should extend it to repmgr 4. Note that there is no check in place whether the pg_rewind binary exists, so it's up to the user to ensure it's present. Addresses GitHub #413.	2018-04-02 20:54:29 +09:00
Ian Barwick	9c5e76401f	Fix "repmgr cluster crosscheck" output Addresses GitHub #398.	2018-03-27 16:44:04 +09:00
Ian Barwick	ee98a3a58e	"standby clone": add --recovery-conf-only option This will generate "recovery.conf" for an existing standby. Typical use-case is a standby cloned manually from an external data source (e.g. Barman), where "recovery.conf" needs to be created (and if required a replication slot). The --dry-run option will check the pre-requisites but not actually create "recovery.conf" or a replication slot. This requires that the upstream node is running, a replication connection can be made and if required a replication slot can be created. Implements GitHub #382.	2018-02-22 15:50:51 +09:00
Ian Barwick	927bf038a0	"standby switchover": check demotion candidate can make replication connection Check it's actually possible for the demotion candidate to attach to the promotion candidate before executing the switchover. As with other checks of this nature, there's a faint possibility the situation could change between the time the check is carried out and the demotion candidate is restarted to connect to the promotion candidate, but there's not a lot we can do about that. The main purpose is to be able to catch existing misconfigurations before anything gets changed. Implements GitHub #370.	2018-02-09 10:00:54 +09:00
Ian Barwick	b705127a34	"repmgr standby register": add --wait-start option Implements GitHub #356.	2018-01-04 14:56:08 +09:00
Ian Barwick	26a9e848fd	Update copyright notices to 2018	2018-01-02 10:19:46 +09:00
Ian Barwick	8c121da8a1	Add diagnostic option "repmgr node check --has-passfile" This checks if the active libpq version (9.6 and later) has the "passfile" option, and returns 0 if present, 1 if not.	2017-12-11 20:09:48 +09:00
Ian Barwick	30b11c08e6	Disable any configuration settings not compatible with PostgreSQL 9.3 And emit a warning while we're at it.	2017-09-18 13:12:38 +09:00
Ian Barwick	ea2693bc75	Move create_recovery_file() et al to repmgr-action-standby.c As they're only ever called from there.	2017-09-18 09:53:08 +09:00
Ian Barwick	b6b31b15b2	Implement "repmgr cluster cleanup"	2017-09-11 13:48:46 +09:00
Ian Barwick	a9f4a027a7	pgindent run	2017-09-11 11:14:13 +09:00
Ian Barwick	e4f7dc8234	Add copyright notices	2017-09-08 13:27:39 +09:00
Ian Barwick	edee80cc37	Rename option "node check --is-shutdown" to "--is-shutdown-cleanly" As that's what we really want to know. Also return "UNCLEAN_SHUTDOWN" if that's the case, rather than "RUNNING" which is confusing, even though it's a command for internal use.	2017-09-07 11:15:27 +09:00


90 Commits