The LSN reported by the shared memory function defaults to "0/0"
(InvalidXLogRecPtr) - this indicates that the repmgrd on that node
hasn't been able to update it yet. However, during failover several
places in the code treated this as an error, causing an endless loop
waiting for updates which would never come.
To get around this without changing function definitions, we can
store an explicit message in the shared memory location field so the
caller can tell whether the other node hasn't yet updated the field,
or has encountered a situation which means it should not be considered
as a promotion candidate (in most cases because `failover` is set to
`manual`).
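Since the field is a string, a sentinel value is enough. A minimal
sketch of the idea (identifiers here are illustrative, not the actual
repmgr ones):

    #include <stdio.h>
    #include <string.h>

    /* hypothetical marker written by a node which has opted out */
    #define CANDIDATE_INELIGIBLE "FAILOVER_DISABLED"

    typedef enum
    {
        LSN_PENDING,     /* still "0/0": node hasn't updated it yet */
        LSN_INELIGIBLE,  /* node opted out, e.g. failover = manual  */
        LSN_AVAILABLE    /* field contains a usable LSN             */
    } LsnFieldState;

    static LsnFieldState
    check_lsn_field(const char *location)
    {
        if (strcmp(location, "0/0") == 0)
            return LSN_PENDING;

        if (strcmp(location, CANDIDATE_INELIGIBLE) == 0)
            return LSN_INELIGIBLE;

        return LSN_AVAILABLE;
    }

    int
    main(void)
    {
        /* the caller keeps polling on LSN_PENDING, but skips the
         * node entirely on LSN_INELIGIBLE instead of waiting forever */
        printf("%d\n", check_lsn_field(CANDIDATE_INELIGIBLE));
        return 0;
    }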
Resolves GitHub #222.
This is to ensure that when repmgr executes pg_basebackup it doesn't
add any options which would conflict with user-supplied options.
This is related to GitHub #206, where the -S/--slot option has been
added for 9.6; it's important to check this doesn't conflict with
-X/--xlog-method.
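A sketch of the kind of check involved, assuming a hypothetical
option_supplied() helper (the user options and slot name are examples):

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    static bool
    option_supplied(const char *user_options, const char *short_opt,
                    const char *long_opt)
    {
        return strstr(user_options, short_opt) != NULL ||
               strstr(user_options, long_opt) != NULL;
    }

    int
    main(void)
    {
        const char *user_options = "--xlog-method=fetch";
        char        command[1024];

        snprintf(command, sizeof(command),
                 "pg_basebackup -D /path/to/data %s", user_options);

        /* only append -S if the user hasn't already supplied a
         * potentially conflicting -X/--xlog-method option */
        if (!option_supplied(user_options, "-X", "--xlog-method"))
            strncat(command, " -S repmgr_slot",
                    sizeof(command) - strlen(command) - 1);

        puts(command);
        return 0;
    }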
While we're at it, rename the ErrorList handling code to ItemList
etc. so we can use it for generic non-error-related lists.
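The renamed structure might look roughly like this (a sketch, not the
exact definition):

    #include <stdlib.h>
    #include <string.h>

    typedef struct ItemListCell
    {
        struct ItemListCell *next;
        char                *string;
    } ItemListCell;

    typedef struct ItemList
    {
        ItemListCell *head;
        ItemListCell *tail;
    } ItemList;

    /* append a copy of "string"; nothing about the list implies its
     * contents are error messages any more */
    static void
    item_list_append(ItemList *list, const char *string)
    {
        ItemListCell *cell = malloc(sizeof(ItemListCell));

        cell->string = strdup(string);
        cell->next = NULL;

        if (list->tail != NULL)
            list->tail->next = cell;
        else
            list->head = cell;

        list->tail = cell;
    }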
Otherwise the monitoring table's 'last_wal_standby_location' will
remain stuck at the location of the last WAL received via streaming
replication.
This complements the bugfix applied in e814c1120e.
Although the witness server will resync the repl_nodes table following
a failover, other operations (e.g. removing or cloning a standby)
were previously not reflected in the witness server's copy of this
table.
As a short-term workaround, automatically resync the table at regular
intervals (defined by the configuration file parameter
"witness_repl_nodes_sync_interval_secs", default 30 seconds).
A fix for this was introduced with commit ee9270fe8d
and removed in 4f1c67a1bf.
Refactor the original fix to simply omit attempting to write an
invalid entry to the monitoring table, rather than removing it.
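In outline (using PostgreSQL's InvalidXLogRecPtr convention and a
hypothetical insert helper), the refactored fix amounts to:

    #include <stdint.h>

    typedef uint64_t XLogRecPtr;               /* as in xlogdefs.h */
    #define InvalidXLogRecPtr ((XLogRecPtr) 0)

    /* hypothetical wrapper around the actual monitoring INSERT */
    extern void insert_monitoring_record(XLogRecPtr receive_location,
                                         XLogRecPtr apply_location);

    void
    update_monitoring_table(XLogRecPtr receive_location,
                            XLogRecPtr apply_location)
    {
        /* nothing sensible to record this cycle: skip the write
         * rather than storing an invalid entry */
        if (receive_location == InvalidXLogRecPtr)
            return;

        insert_monitoring_record(receive_location, apply_location);
    }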
Basically, on startup the standby will start receiving again from the
beginning of the WAL, so the received location will be lower than the
applied location.
A proper check is needed to make sure the standby is still following
the correct master (as per the node information used in the cluster).
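A sketch of such a check, with hypothetical helpers standing in for
the actual queries:

    #include <stdbool.h>

    /* hypothetical helpers: the node this standby is actually
     * streaming from, and the upstream recorded in the cluster's
     * node information */
    extern int get_streaming_upstream_node_id(void);
    extern int get_registered_upstream_node_id(void);

    static bool
    standby_follows_correct_master(void)
    {
        /* received vs. applied locations can't be compared reliably
         * after a restart, so compare upstream node identities */
        return get_streaming_upstream_node_id() ==
               get_registered_upstream_node_id();
    }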
The main issue was that if the local repmgrd was not able to connect
locally, it would set the local node as failed (active = false). This
is fine, because we don't actually know whether the node is active (at
that moment it isn't), so it's best to keep it out of the cluster.
The problem is that if the postgres service comes back up and is able
to recover by itself, we should acknowledge that fact and set the node
as active again.
There was another issue related to repmgrd being terminated if the
postgres service was down. This is not the correct thing to do: we
should keep trying to connect to the local standby.
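A sketch of the intended retry-and-reactivate behaviour (helper names
are hypothetical):

    #include <stdbool.h>
    #include <unistd.h>

    extern bool connect_to_local_node(void);        /* hypothetical */
    extern void set_local_node_active(bool active); /* hypothetical */

    void
    monitor_local_node(void)
    {
        bool marked_failed = false;

        for (;;)
        {
            if (connect_to_local_node())
            {
                /* the node recovered by itself: acknowledge that
                 * and set it as active again */
                if (marked_failed)
                {
                    set_local_node_active(true);
                    marked_failed = false;
                }
            }
            else if (!marked_failed)
            {
                /* we can't tell whether the node is usable, so keep
                 * it out of the cluster - but don't terminate, keep
                 * retrying the connection */
                set_local_node_active(false);
                marked_failed = true;
            }

            sleep(5); /* retry interval; configurable in practice */
        }
    }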
Perform a switchover by:
- stopping current primary node
- promoting this standby node to primary
- forcing previous primary node to follow this node
Caveats:
- repmgrd must not be running, otherwise it may
attempt a failover
(TODO: find some way of notifying repmgrd of planned
activity like this)
- currently only set up for two-node operation; any other
standbys will probably become downstream cascaded standbys
of the old primary once it's restarted
- as we're executing repmgr remotely (on the old primary),
we'll need the location of its configuration file; this
can be provided explicitly with -C/--remote-config-file,
otherwise repmgr will look in default locations on the
remote server
- this does not yet support "rewinding" stopped nodes
which will be unable to catch up with the primary
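Usage might look something like this (paths are examples; run on the
standby to be promoted):

    repmgr -f /etc/repmgr.conf standby switchover \
        -C /etc/repmgr.conf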
TODO:
- update help, docs
- make connection test timeouts/intervals configurable