Although the witness server will resync the repl_nodes table following
a failover, other operations (e.g. removing or cloning a standby)
were previously not reflected in the witness server's copy of this
table.
As a short-term workaround, automatically resync the table at regular
intervals (defined by the configuration file parameter
"witness_repl_nodes_sync_interval_secs", default 30 seconds).
The difference between this and establish_db_connection() is that
it reports any connection failure as a [NOTICE] rather than an
[ERROR]; it's intended for use when e.g. polling a server to wait
for it to come up or go down, where [ERROR] log lines might cause
confusion.
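A minimal sketch of such a helper using libpq; the function name and the
exact log output are assumptions, not necessarily what repmgr uses:

    #include <stdio.h>
    #include <libpq-fe.h>

    /* Sketch: like establish_db_connection(), but report failure at
     * NOTICE level so callers can poll a server without emitting
     * spurious [ERROR] lines. Name and message format are hypothetical. */
    static PGconn *
    test_db_connection(const char *conninfo)
    {
        PGconn *conn = PQconnectdb(conninfo);

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "NOTICE: connection to database failed: %s",
                    PQerrorMessage(conn));
            PQfinish(conn);
            return NULL;
        }

        return conn;
    }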
Perform a switchover by:
- stopping current primary node
- promoting this standby node to primary
- forcing previous primary node to follow this node
Caveats:
- repmgrd must not be running, otherwise it may
attempt a failover
(TODO: find some way of notifying repmgrd of planned
activity like this)
- currently only set up for two-node operation; any other
standbys will probably become downstream cascaded standbys
of the old primary once it's restarted
- as we're executing repmgr remotely (on the old primary),
we'll need the location of its configuration file; this
can be provided explicitly with -C/--remote-config-file,
otherwise repmgr will look in default locations on the
remote server
- this does not yet support "rewinding" stopped nodes
which will be unable to catch up with the primary
TODO:
- update help, docs
- make connection test timeouts/intervals configurable
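An invocation might look roughly like this, run on the standby to be
promoted (paths are illustrative; -f is repmgr's usual configuration
file option):

    repmgr -f /etc/repmgr/repmgr.conf standby switchover \
           -C /etc/repmgr/repmgr.conf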
Registering a master creates the schema, but it may be desirable
to forcibly reregister a master without deleting the schema, so
uncouple the dependency.
Also ensure schema creation is atomic by wrapping it in a transaction.
Per GitHub issue #49.
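A rough sketch of the transaction-wrapped creation, assuming libpq (the
statement list is abbreviated and the function name is made up):

    #include <stdio.h>
    #include <stdbool.h>
    #include <libpq-fe.h>

    /* Sketch: run all schema-creation statements inside one transaction
     * so a partial failure leaves no schema behind. */
    static bool
    create_schema_atomically(PGconn *conn, const char **statements, int nstatements)
    {
        PGresult *res;
        bool      ok;
        int       i;

        res = PQexec(conn, "BEGIN");
        if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            PQclear(res);
            return false;
        }
        PQclear(res);

        for (i = 0; i < nstatements; i++)
        {
            res = PQexec(conn, statements[i]);
            if (PQresultStatus(res) != PGRES_COMMAND_OK)
            {
                fprintf(stderr, "NOTICE: %s", PQerrorMessage(conn));
                PQclear(res);
                PQclear(PQexec(conn, "ROLLBACK"));
                return false;
            }
            PQclear(res);
        }

        res = PQexec(conn, "COMMIT");
        ok = (PQresultStatus(res) == PGRES_COMMAND_OK);
        PQclear(res);
        return ok;
    }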
repmgrd correctly updates the ID of the upstream node after an automatic
failover, but repmgr was not doing this for manual failovers.
This moves the existing function to dbutils and modifies it so that
it no longer relies on global configuration variables (which are
available only in repmgrd).
This should fix issue #67 (hopefully, haven't done much testing).
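The reworked function might take everything it needs as parameters
instead of reading repmgrd's globals, along these lines (function, table
and column names here are assumptions, not the actual repmgr schema):

    #include <stdio.h>
    #include <stdbool.h>
    #include <libpq-fe.h>

    /* Sketch: record a node's new upstream after a manual or automatic
     * failover; depends only on its arguments, not on repmgrd globals. */
    static bool
    update_node_upstream(PGconn *conn, int node_id, int upstream_node_id)
    {
        char      query[256];
        PGresult *res;
        bool      ok;

        snprintf(query, sizeof(query),
                 "UPDATE repl_nodes "
                 "   SET upstream_node_id = %d "
                 " WHERE id = %d",
                 upstream_node_id, node_id);

        res = PQexec(conn, query);
        ok = (PQresultStatus(res) == PGRES_COMMAND_OK);
        PQclear(res);
        return ok;
    }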
Command to be executed each time an event is logged.
The following format sequences will be interpolated:
%e - event type
%d - description
%s - success (1 or 0)
%t - timestamp
This makes keeping track of events such as failovers
much easier. Note that this is for convenience and is
not a foolproof auditing log.
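In the configuration file this could look something like the following
(the parameter name is an assumption; only the %-sequences come from the
list above):

    # illustrative only
    event_notification_command='/usr/local/bin/repmgr-event-logger.sh %e %d %s %t'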
Sample output:
repmgr_db=# SELECT * from repmgr_test.repl_events ;
node_id | event | successful | event_timestamp | details
---------+--------------------------+------------+-------------------------------+----------------------------------------------------------
1 | master_register | t | 2015-03-06 14:14:08.196636+09 |
2 | standby_clone | t | 2015-03-06 14:14:17.660768+09 | Backup method: pg_basebackup; --force: N
2 | standby_register | t | 2015-03-06 14:14:18.762222+09 |
4 | witness_create | t | 2015-03-06 14:14:22.072815+09 |
3 | standby_clone | t | 2015-03-06 14:14:23.524673+09 | Backup method: pg_basebackup; --force: N
3 | standby_register | t | 2015-03-06 14:14:24.620161+09 |
2 | repmgrd_start | t | 2015-03-06 14:14:29.639096+09 |
3 | repmgrd_start | t | 2015-03-06 14:14:29.641489+09 |
4 | repmgrd_start | t | 2015-03-06 14:14:29.648002+09 |
2 | standby_promote | t | 2015-03-06 14:15:01.956737+09 | Node 2 was successfully be promoted to master
2 | repmgrd_failover_promote | t | 2015-03-06 14:15:01.964771+09 | Node 2 promoted to master; old master 1 marked as failed
3 | repmgrd_failover_follow | t | 2015-03-06 14:15:07.228493+09 | Node 3 now following new upstream node 2
(12 rows)
This mainly involves abstracting the functions which copy and create
records out of repmgr.c and into dbutils.c, as they need to be shared
between repmgr and repmgrd.
Per issue noted here:
https://groups.google.com/forum/#!topic/repmgr/v5nu1Xwf6X0
Sometimes it's desirable to re-sync a "stale" data directory
on a standby, rather than starting from scratch with pg_basebackup.
This re-adds the rsync code from the 2.x series, with some
modifications.
TODO: tablespace support.
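The re-sync amounts to an rsync over SSH of the upstream data directory
onto the stale one, roughly as follows (options and paths are
illustrative, not necessarily the exact ones repmgr uses; files such as
postmaster.pid need to be excluded):

    rsync --archive --checksum --compress --progress --rsh=ssh \
          --exclude=postmaster.pid \
          upstream_host:/var/lib/postgresql/data/ /var/lib/postgresql/data/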
To handle cascaded replication we need to keep track of each node's
upstream node. Also enumerate the node type ("primary", "standby"
or "witness") and mark whether each node is active.
Use the reported `server_version_num` integer for version number
detection and comparison. This makes it easier to set an arbitrary
minimum supported version (rather than "9.0 or later"), and also
future-proofs the check for 10.x and later.
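A sketch of the comparison using libpq's PQserverVersion(), which
returns the same integer as server_version_num (the minimum shown is
illustrative, not repmgr's actual requirement):

    #include <stdio.h>
    #include <stdbool.h>
    #include <libpq-fe.h>

    /* e.g. 90300 = 9.3.0; integer comparison also works for 10.x (100000) */
    #define MIN_SUPPORTED_SERVER_VERSION_NUM 90300

    static bool
    server_version_is_supported(PGconn *conn)
    {
        int version_num = PQserverVersion(conn);  /* == server_version_num */

        if (version_num < MIN_SUPPORTED_SERVER_VERSION_NUM)
        {
            fprintf(stderr, "NOTICE: server version %d is below minimum %d\n",
                    version_num, MIN_SUPPORTED_SERVER_VERSION_NUM);
            return false;
        }

        return true;
    }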
In its old incarnation, wait_connection_availability() took at least
two seconds per call. Now a call may complete without any sleep at all
when the result is already available at the time of the call.
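A sketch of the non-blocking wait, checking for a ready result before
sleeping on the socket (names and the exact timeout handling are
assumptions):

    #include <sys/select.h>
    #include <libpq-fe.h>

    /* Sketch: wait up to timeout_secs for an async result to become
     * available, returning immediately if one is already there.
     * Returns 1 = ready, 0 = timed out, -1 = connection problem. */
    static int
    wait_for_result(PGconn *conn, int timeout_secs)
    {
        int            sock = PQsocket(conn);
        fd_set         read_mask;
        struct timeval timeout;

        for (;;)
        {
            if (PQconsumeInput(conn) == 0)
                return -1;

            if (PQisBusy(conn) == 0)
                return 1;       /* result ready; no sleep needed */

            FD_ZERO(&read_mask);
            FD_SET(sock, &read_mask);
            timeout.tv_sec = timeout_secs;
            timeout.tv_usec = 0;

            if (select(sock + 1, &read_mask, NULL, NULL, &timeout) <= 0)
                return 0;       /* timed out (or select() error) */
        }
    }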
Because PQcancel() establishes a new synchronous connection to the
database, a failure means something has gone wrong with the master.
So instead of just ignoring the failure, CancelQuery() now reports
a failure condition so we can detect the master's death in that
situation.
This is especially important when only the postmaster crashes but the
other child/backend connections are still there: the existing backend
connections won't fail, so a CancelQuery() failure is our only
indication that something has gone wrong.
Previously we simply ignored the PQcancel() failure, which could leave
us looping forever trying to cancel the async query.
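A sketch of the reworked CancelQuery() behaviour using libpq's cancel
API (error handling abbreviated):

    #include <stdio.h>
    #include <stdbool.h>
    #include <libpq-fe.h>

    /* Sketch: cancel an in-progress async query and tell the caller
     * whether the cancel request could be delivered; a failure here
     * may be the only sign that the master's postmaster has died. */
    static bool
    cancel_query(PGconn *conn)
    {
        char      errbuf[256];
        PGcancel *pgcancel = PQgetCancel(conn);
        bool      success;

        if (pgcancel == NULL)
            return false;

        /* PQcancel() opens a new (synchronous) connection to the server,
         * so failure suggests the server itself is unreachable */
        success = (PQcancel(pgcancel, errbuf, sizeof(errbuf)) != 0);
        if (!success)
            fprintf(stderr, "NOTICE: can't cancel current query: %s\n", errbuf);

        PQfreeCancel(pgcancel);
        return success;
    }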
Reported by: Martin Euser <martin.euser@nl.abnamro.com>
Problem analyzed and bug spotted by: Andres Freund <andres@2ndquadrant.com>
Patch by: Jaime Casanova <jaime@2ndquadrant.com>