repmgr

mirror of https://github.com/EnterpriseDB/repmgr.git synced 2026-07-16 14:29:05 +00:00

Author	SHA1	Message	Date
Ian Barwick	b624fc7efa	Bump version 4.0.5	2018-05-01 09:21:32 +09:00
Ian Barwick	67ccd4dcb3	repmgrd: don't explicitly close connections on shutdown	2018-04-30 15:13:30 +09:00
Ian Barwick	6de3a5a997	Fix parsing of "archive_ready_critical" configuration file parameter. Per report in GitHub #426.	2018-04-28 06:59:20 +09:00
Ian Barwick	f86e89ba45	repmgrd: notify sibling nodes to follow new primary after pg_ctl timeout If "pg_ctl promote" fails due to a timeout, but the promotion itself succeeds, have repmgrd on the new primary explicitly notify any sibling nodes to follow it. Previously the sibling nodes would wait "primary_notification_timeout" seconds before attempting to discover the new primary. This (and preceding commit `eac80ae`) address GitHub #425.	2018-04-27 11:59:00 +09:00
Ian Barwick	a6d0ba07ed	repmgrd: handle pg_ctl timeout It's possible "pg_ctl promote" will time out, causing "repmgr standby follow" to return with an error; however the promotion itself will usually succeed, so detect this case and handle accordingly.	2018-04-26 19:23:26 +09:00
Ian Barwick	b553a70ad5	repmgrd: always close the connection if the pointer is not NULL	2018-04-25 14:08:17 +09:00
Ian Barwick	3364f8bdf0	Add configuration file parameter "config_directory" This enables explicit provision of an external configuration file directory, which if set will be passed to "pg_ctl" as the -D parameter. Otherwise "pg_ctl" will default to using the data directory, which will cause some operations to fail if the configuration files are not present there. Note this is implemented primarily for feature completeness and for development/testing purposes. Users who have installed "repmgr" from a package should not rely on "pg_ctl" to stop/start/restart PostgreSQL; instead they should set the appropriate "service_..._command" for their operating system. For more details see: https://repmgr.org/docs/4.0/configuration-service-commands.html Note: in a future release, the presence of "config_directory" in repmgr.conf will be used to implicitly set "--copy-external-config-files=samepath" when cloning a standby; this is a behaviour change so will be implemented in the next major release (repmgr 4.1). Implements GitHub #424.	2018-04-25 11:57:27 +09:00
Ian Barwick	242fa287b4	repmgrd: catch corner case in standby connection handle check If repmgrd marks the local node as unavailable, and it was actually restarting but a failover event occurred before the next local node check, failover will continue with the stale connection handle. Add a final local node check just before starting the failover process, so repmgrd can reconnect if it wasn't able to before.	2018-04-24 21:55:36 +09:00
Ian Barwick	fa908432c8	Minor doc and log output tweaks	2018-04-24 21:08:31 +09:00
Ian Barwick	afa942fef6	repmgrd: prevent standby connection handle from going stale If monitoring history is not in use, there's no activity on the standby's connection handle, so if e.g. the standby is restarted, PQstatus() never returns CONNECTION_BAD and repmgrd never notices the connection is stale. Therefore execute a throw-away statement at "monitor_interval_secs".	2018-04-23 23:51:03 +09:00
Ian Barwick	94cfc66b04	doc: minor clarification	2018-04-20 12:23:04 +09:00
Ian Barwick	87eae9a50f	doc: additional details about repmgrd usage in Debian/Ubuntu	2018-04-20 12:04:15 +09:00
Ian Barwick	82a37f4865	doc: add Debian package details	2018-04-20 10:57:19 +09:00
Ian Barwick	a38f727b7d	doc: Improve CentOS package-related documentation	2018-04-20 10:31:42 +09:00
Ian Barwick	e6df936c1b	doc: link to service command configuration from switchover section	2018-04-19 17:09:10 +09:00
Ian Barwick	91ca997d40	doc: improve configuration documentation With special attention to setting service commands, and extra special mention of "pg_ctlcluster" for Debian/Ubuntu users.	2018-04-19 16:49:26 +09:00
Ian Barwick	65c90a2a64	doc: update CentOS package documentation	2018-04-19 14:27:17 +09:00
Ian Barwick	90cba78f52	repmgrd: tweak event notifications on standby failure The event notification was only being created if there was a valid primary connection; it should be created in any case, so an event notification script can be executed.	2018-04-17 10:27:25 +09:00
Ian Barwick	f8908d7e31	Bump version 4.0.5dev	2018-04-13 10:18:04 +09:00
Ian Barwick	478bbcccbf	Add "dbname=replication" to all replication connection strings Previously repmgr was attempting to make replication connections with "dbname" set to the repmgr database name. While this works if e.g. the repmgr user also has replication permissions, it will fail if a dedicated replication user is specified, who only has permission to access the virtual "replication" database. Change this to use "dbname=replication" if the replication connection user is different to the normal repmgr database user. (We could just always set it to "replication", but that might break existing installations e.g. where a .pgpass file is in use and there's no "replication" entry for the normal repmgr database user). Addresses GitHub #421.	2018-04-12 16:10:02 +09:00
Ian Barwick	a03d41de28	doc: mention --recovery-conf-only introduced in repmgr 4.0.4 Per GitHub #419.	2018-04-12 13:13:11 +09:00
Ian Barwick	f1e527adcb	doc: various updates related to "standby clone" operations.	2018-04-12 13:08:05 +09:00
Ian Barwick	09e597dcdd	Fix superuser password handling When establishing a superuser connection, the connection parameters were being copied from the existing (non-superuser) connection, which in some circumstances can lead to that user's password being included in the copied parameter list. The password parameter, if set, will now always be removed, which will cause libpq to retrieve the correct one from the .pgpass file. Addresses GitHub #400.	2018-04-12 12:50:17 +09:00
Ian Barwick	94a7f0c719	Don't issue a CHECKPOINT after promoting a standby. Issuing a CHECKPOINT immediately after promoting a standby may impact performance. Commit `239a548e9d` ensures one is only issued when required, i.e. during a switchover when pg_rewind will be executed. This reverts commit `a2068768ab`.	2018-04-09 14:39:47 +09:00
Ian Barwick	6ac42f1593	"standby register": add sanity check when --upstream-node-id not supplied If --upstream-node-id was not supplied to "repmgr standby register", repmgr defaults to the primary node as upstream node. If the local node is available, we now double-check that it's attached to the primary, in case the lack of --upstream-node-id was an accidental omission. This check is only made when the local node is available. This behaviour can be overridden with -F/--force (though it's hard to imagine a scenario where that would be useful). Addresses GitHub #395.	2018-04-05 17:40:05 +09:00
Ian Barwick	94b72382e5	doc: minor FAQ tweaks	2018-04-05 17:10:52 +09:00
Ian Barwick	18c12f58a4	doc: add a section about repmgrd and service commands etc.	2018-04-05 11:47:35 +09:00
Ian Barwick	cf3fa18085	doc: miscellaneous FAQ updates - clarify pg_rewind item - add note about what's included in recovery.conf	2018-04-04 10:08:04 +09:00
Ian Barwick	a5281d93dc	Add TODO for pg_rewind changes coming in PostgreSQL 11	2018-04-03 21:57:50 +09:00
Ian Barwick	0d73d3c2b5	Enable provision of "archive_cleanup_command" in recovery.conf If "archive_cleanup_command" is defined in "repmgr.conf", a corresponding entry will be made in the node's "recovery.conf" file after cloning a standby. Note that we recommend using PgBarman to manage WAL archives, but are providing this facility to help repmgr to be integrated in existing environments. Implements GitHub #416.	2018-04-03 14:11:24 +09:00
Ian Barwick	23c99304a6	"node rejoin": actively check for node to rejoin cluster Previously repmgr was relying on whatever command was configured to start PostgreSQL to determine whether the node being rejoined had started correctly. However it's preferable to actively poll the upstream to confirm it has restarted and actually attached as a standby before confirming success of the "node rejoin" action. This can be overridden with the -W/--no-wait option. (Note that for consistency with other PostgreSQL utilities, the short form of the --wait option is now "-w"; this is currently only used in "repmgr standby follow".) Also update "repmgr node rejoin" documentation with a list of supported options, and add some useful index entries for "pg_rewind". Implements GitHub #415.	2018-04-03 10:36:13 +09:00
Ian Barwick	1ab16bc6c2	doc: fix option description for "repmgr primary register"	2018-04-03 10:10:05 +09:00
Ian Barwick	7f1f04636d	Refactor pg_control parsing The "data_checksum_version" field is towards the end of the ControlFileData struct, meaning its position varies between versions. Previously this wasn't a problem as it was only required for operations involving 9.5 and later, and its position within the control file has not changed between the current release and current HEAD. However, in order to support pg_rewind in 9.3 and 9.4, which both have changes in the control file format, we'll need version-specific parsing. This will also make it easier to deal with any future changes to the control file format.	2018-04-02 20:55:10 +09:00
Ian Barwick	6a1797cadd	Enable pg_rewind to be used with PostgreSQL 9.3/9.4 pg_rewind is not part of the core distribution for those, but we provided support in repmgr 3.3 so should extend it to repmgr 4. Note that there is no check in place whether the pg_rewind binary exists, so it's up to the user to ensure it's present. Addresses GitHub #413.	2018-04-02 20:55:04 +09:00
Ian Barwick	94d26dbe9f	Always set "connect_timeout" when pinging a PostgreSQL instance Insert "connect_timeout=2" into the connection parameters, if not explicitly set by the user. This will prevent excessive wait time for the host operating system to report a connection timeout.	2018-04-02 09:31:42 +09:00
Ian Barwick	ae655eb4fd	Add TODO list This file will collate various requests and ideas for future development. In particular it will reference requests which come in via the GitHub issue tracker, so we can acknowledge and close off the request and not have an open unresolved issue hanging around.	2018-03-30 14:18:51 +09:00
Ian Barwick	65371489c6	repmgrd: handle failover with two nodes in the primary location If two nodes were in the primary location, and at least one node in another location, the non-failed node in the primary location was not recognising itself as a promotion candidate. Addresses GitHub #407.	2018-03-30 12:17:34 +09:00
Ian Barwick	28c7737dc0	Log pg_control access errors as WARNINGs rather than DEBUG This will make it easier to diagnose issues, possibly with an incorrect "data_directory" setting in "repmgr.conf".	2018-03-30 11:24:44 +09:00
Ian Barwick	505d72d19c	"standby switchover": force checkpoint if pg_rewind requested. Addresses issue described in GitHub #378. PostgreSQL itself doesn't issue a checkpoint after promotion to ensure the newly promoted server is available as quickly as possible, so we'll only execute an explicit CHECKPOINT when it's actually required, i.e. when pg_rewind will be executed. This is required as pg_rewind uses the timeline reported in the pg_control file to compare with the server to be rewound, and the pg_control timeline is only updated after the first checkpoint, so there is an interval where pg_rewind will erroneously assume both servers are on the same timeline and take no action.	2018-03-30 09:12:25 +09:00
Ian Barwick	b292ac61f8	"standby switchover": update hint	2018-03-30 09:12:21 +09:00
Ian Barwick	293d66bf71	Fix minimum accepted value for "degraded_monitoring_timeout" Should be -1, the default. Addresses GitHub #411.	2018-03-30 09:12:17 +09:00
Ian Barwick	3e1f0ec168	repmgr: move demoted primary check to the final step during switchover This will give the demoted primary more time to start up as a standby, during which "standby follow" can be executed on sibling nodes, if specified.	2018-03-27 16:41:13 +09:00
Ian Barwick	6f9a1f975e	repmgr: poll demoted primary after restart during switchover During a switchover operation, once the demoted primary has been restarted as a standby, repmgr attempts to reconnect to verify its status and drop any redundant replication slots. However it's possible the standby may still be in the startup phase, so poll for "standby_reconnect_timeout" seconds before giving up. Addresses GitHub #408.	2018-03-27 15:58:18 +09:00
Ian Barwick	deea4f69f7	Fix "repmgr cluster crosscheck" output Addresses GitHub #398.	2018-03-27 10:28:27 +09:00
Ian Barwick	37e53108a2	Consolidate connection closure calls	2018-03-27 08:52:23 +09:00
Ian Barwick	96cf06204c	doc: add note about remote command execution When executing a command on a remote server, repmgr expects the remote binary to be in the same location as the local binary. It's reasonable to assume repmgr will be deployed in a unified environment; if not, the onus is on the user to ensure repmgr can find the remote binary, e.g. by creating appropriate symlinks. Addresses query in GitHub #406.	2018-03-27 08:47:56 +09:00
Ian Barwick	381e22c2c7	Misc tweaks to witness code	2018-03-26 20:59:38 +09:00
Ian Barwick	7e2af17783	repmgrd: tweak log notices when marking a standby as failed Announce what we're going to do (set the node record inactive) before performing the action. Makes reading the log slightly easier.	2018-03-23 13:27:37 +08:00
Ian Barwick	b4272853e7	Add event "repmgrd_failover_aborted"	2018-03-23 10:44:00 +08:00
Ian Barwick	562b6ddfc2	Add error code ERR_FOLLOW_FAIL	2018-03-23 10:34:19 +08:00
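
Three of the commits listed above (94d26dbe9f, 09e597dcdd, 478bbcccbf) describe rules for adjusting connection parameters before a connection is made. The following is a minimal Python sketch of those rules as the commit messages state them; it is not repmgr's actual C implementation. The function names, the dictionary representation of the parameter list, and the example values are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Optional

# Value quoted in commit 94d26dbe9f ("connect_timeout=2").
DEFAULT_CONNECT_TIMEOUT = "2"


def params_for_ping(params: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical helper: add connect_timeout unless the user set one."""
    out = dict(params)  # copy, so the caller's parameters are untouched
    out.setdefault("connect_timeout", DEFAULT_CONNECT_TIMEOUT)
    return out


def params_for_superuser(params: Dict[str, str], superuser: str) -> Dict[str, str]:
    """Hypothetical helper: copy params for a superuser connection.

    Per commit 09e597dcdd, any copied password is dropped so that the
    correct one can be looked up (e.g. from .pgpass) instead.
    """
    out = dict(params)
    out["user"] = superuser
    out.pop("password", None)
    return out


def params_for_replication(
    params: Dict[str, str],
    repmgr_user: str,
    replication_user: Optional[str] = None,
) -> Dict[str, str]:
    """Hypothetical helper: choose dbname for a replication connection.

    Per commit 478bbcccbf, "dbname=replication" is used only when a
    dedicated replication user differs from the normal repmgr user.
    """
    out = dict(params)
    if replication_user and replication_user != repmgr_user:
        out["user"] = replication_user
        out["dbname"] = "replication"
    return out


if __name__ == "__main__":
    # Example values are made up for this sketch.
    base = {"host": "node1", "user": "repmgr", "dbname": "repmgr"}
    print(params_for_ping(base))
    print(params_for_replication(base, "repmgr", "repl"))
```

Each helper returns a modified copy rather than changing its input, which mirrors the situation in commit 09e597dcdd where parameters are copied from an existing connection.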


815 Commits