repmgr

mirror of https://github.com/EnterpriseDB/repmgr.git synced 2026-03-23 15:16:29 +00:00

Author	SHA1	Message	Date
Christian Kruse	680f23fb1d	copyright push	2014-01-23 10:37:49 +01:00
Christian Kruse	897daddcc7	removed unneeded arguments to avoid compiler warnings	2014-01-22 15:17:28 +01:00
Christian Kruse	de58eff7c1	added a chdir() for proper daemonizing	2014-01-22 14:30:38 +01:00
Christian Kruse	e007a55967	fix: do not use fsync() We do not need fsync(), the fflush() is enough to avoid concurrent logs.	2014-01-22 11:47:50 +01:00
Christian Kruse	4ef6fbb5fe	do not close stderr but reopen it to /dev/null We want stderr to be always a valid file descriptor	2014-01-21 16:25:57 +01:00
Christian Kruse	2e61d7b156	refactoring: daemonizing is now a function	2014-01-21 16:19:49 +01:00
Christian Kruse	4496a0761e	we now use a function and are more sophisticated Refactoring part: we now use a function to generate the PID file. Sophistication: we now check if the PID contained in the file is a valid PID. We ignore the file if it isn't.	2014-01-21 16:18:15 +01:00
Christian Kruse	3978ead184	use a second fork to avoid a terminal After the setsid() we are the session leader. And as a session leader we are able to acquire a controlling terminal, even if we currently don't own one. So we do another fork, without calling setsid() again, so that we are no longer a session leader and avoid that.	2014-01-21 15:51:33 +01:00
Christian Kruse	b36dbf61fe	reopening stdin and stdout to /dev/null now stdin, stdout and stderr should always be valid file handles. Thus we don't close them but reopen them to /dev/null	2014-01-21 15:31:38 +01:00
Christian Kruse	84466ecca5	`log_crit()` is more appropriate	2014-01-21 15:23:20 +01:00
Christian Kruse	649086e5e4	use `unlink()` instead of `remove()` `remove()` will do a rmdir if necessary - we don't want that. So we use `unlink()`	2014-01-21 15:22:31 +01:00
Christian Kruse	7cf2eb440d	renamed config options to a much more descriptive name	2014-01-21 15:19:50 +01:00
Christian Kruse	1db61ce277	fix: fail when repmgr_funcs is not pre-loaded when repmgr_funcs is not pre-loaded `repmgr_update_standby_location()` will return false and `repmgr_get_last_standby_location()` will return an empty string. Thus we may end in an endless loop. To avoid that we fail.	2014-01-21 13:54:10 +01:00
Christian Kruse	41abf9a7ef	fix: flushing and fsync()ing the log file When not flushing and fsync()ing it the output may be garbled due to concurrent writes to the file (system() spawns a child process with stdin/stdout/stderr inherited from its parent)	2014-01-21 13:52:27 +01:00
Christian Kruse	abebc53ddc	fix: sscanf() does not set variables to 0 on error	2014-01-21 13:48:41 +01:00
Christian Kruse	5fc4a0382f	added config options sleep_delay and sleep_monitor. sleep_monitor replaces the old SLEEP_MONITOR define and makes it configurable; this is the interval in which we monitor. sleep_delay replaces the old sleep(300) when waiting for the master to recover.	2014-01-17 14:35:50 +01:00
Christian Kruse	a7d3c9b93a	fix: also close stderr when using syslog logging	2014-01-17 12:14:26 +01:00
Christian Kruse	ee9dc9e247	do not use exit() We avoid using exit() to be able to clean up when we have to terminate. This includes removal of the PID file as well as closing database connections.	2014-01-17 11:28:55 +01:00
Christian Kruse	94cb5b94e7	fix: reopen log file on SIGHUP	2014-01-16 17:16:45 +01:00
Christian Kruse	a08aa50f92	fix: close stdin and stdout only in repmgrd closing stdin and stdout might cause problems when using system(), so we avoid it.	2014-01-16 16:01:58 +01:00
Christian Kruse	9563877fbb	new config option, stdout/stdin closed Now stdin and stdout get closed. Additionally stderr gets closed and reopened to the new config option „logfile“ if specified	2014-01-16 15:22:34 +01:00
Christian Kruse	77aa6aa326	do not exit in pg_version	2014-01-16 14:48:42 +01:00
Christian Kruse	18206b3a64	do not exit() in is_witness	2014-01-16 14:28:56 +01:00
Christian Kruse	91446bcf93	fix: do not try to reconnect infinitely	2014-01-10 17:26:02 +01:00
Christian Kruse	dcdf8788ae	fix: handle connection loss to standby We do basically the same as we do for the master since connections drop from time to time	2014-01-10 17:12:03 +01:00
Christian Kruse	4fabfbbbd0	fix: do not exit in is_standby() Instead we now return an int with 0 meaning „not a standby,“ 1 meaning „is a standby“ and -1 meaning „connection dropped“	2014-01-10 17:11:16 +01:00
Christian Kruse	c41030b40e	Merge branch 'REL2_0_STABLE' Conflicts: HISTORY dbutils.h repmgr.c repmgrd.c version.h	2014-01-10 16:07:33 +01:00
Christian Kruse	4c3d7f80ed	now code compiles with -ansi -pedantic and has fewer warnings	2014-01-09 14:45:07 +01:00
Christian Kruse	0e8ff1730e	added handling of a PID file	2014-01-09 13:04:40 +01:00
Christian Kruse	634fdff303	fix: do not call setup_event_handlers() on WIN32 If we put setup_event_handlers() in #ifdef WIN32, we have to do it for the call and the declaration, too	2014-01-09 12:57:16 +01:00
Christian Kruse	920f925e4b	added a new cli option --daemonize This option forks the process and generates a new session. This effectively detaches it from the shell. Don't forget to redirect stderr or use syslog for logging!	2014-01-08 11:53:15 +01:00
Christian Kruse	9fe2d6886e	white space cleanup	2014-01-07 16:42:06 +01:00
Jaime Casanova	3c8df59eb9	Make repmgr compile in 9.3. Patch provided by Shawn Ellis with some fixes by me.	2013-11-14 00:43:35 -05:00
Jaime Casanova	d99024ba11	Make repmgrd survive the failover. To do this it needs to reconnect to the new master	2013-09-26 11:58:59 -05:00
Jaime Casanova	1afaa3a26f	Rearrange the logic in do_failover() for further improvements. In particular, make this a more coordinated process by making all nodes wait for the others before going to the next step. This is one step further in following Andres Freund's advice, but there is still a lot to do in order to complete that; in particular, it may be necessary to add more fields to repl_nodes and to the shm area.	2013-09-23 18:28:58 -05:00
Jaime Casanova	079a7c9f16	In a failover situation get the nodes in a well defined order. When deciding which node will be the new master, we should get the nodes in a well defined order otherwise two standbys could process nodes with the same priority in different order and end up with a two master situation.	2013-07-26 00:59:50 -05:00
Jaime Casanova	3b66a31ac9	In a failover situation get the nodes in a well defined order. When deciding which node will be the new master, we should get the nodes in a well defined order otherwise two standbys could process nodes with the same priority in different order and end up with a two master situation.	2013-07-26 00:52:31 -05:00
Jaime Casanova	ab1d380843	If PQcancel() fails, consider it as if the master is failing. Because PQcancel() establishes a new synchronous connection to the database, if it fails it means something wrong has happened with the master. So instead of just ignoring the failure, CancelQuery() now reports a failure condition so we can detect the master's death in that situation. This is especially important when only the postmaster crashes but other children/backend connections are still there, because the children's connections won't fail and the CancelQuery() failure is our only indication of something wrong happening. Currently we just ignore the PQcancel() failure, which leads us to a situation in which we just loop forever trying to cancel the async query. Reported by: Martin Euser <martin.euser@nl.abnamro.com> Problem analyzed and bug spotted by: Andres Freund <andres@2ndquadrant.com> Patch by: Jaime Casanova <jaime@2ndquadrant.com>	2013-07-10 10:21:51 -05:00
Jaime Casanova	b0b44a157f	If PQcancel() fails, consider it as if the master is failing. Because PQcancel() establishes a new synchronous connection to the database, if it fails it means something wrong has happened with the master. So instead of just ignoring the failure, CancelQuery() now reports a failure condition so we can detect the master's death in that situation. This is especially important when only the postmaster crashes but other children/backend connections are still there, because the children's connections won't fail and the CancelQuery() failure is our only indication of something wrong happening. Currently we just ignore the PQcancel() failure, which leads us to a situation in which we just loop forever trying to cancel the async query. Reported by: Martin Euser <martin.euser@nl.abnamro.com> Problem analyzed and bug spotted by: Andres Freund <andres@2ndquadrant.com> Patch by: Jaime Casanova <jaime@2ndquadrant.com>	2013-07-10 09:53:45 -05:00
Jaime Casanova	7d94151494	If the node is a witness don't bother asking its position, it always will be 0/0. We just need to check that we can connect to it to determine if we are in the majority.	2013-01-11 03:44:50 -05:00
Jaime Casanova	4191b77e70	If the node is a witness don't bother asking its position, it always will be 0/0. We just need to check that we can connect to it to determine if we are in the majority.	2013-01-11 03:42:08 -05:00
Jaime Casanova	2a5d431481	Fix a problem that caused a standby to promote itself without going through the voting procedure. This is because of a race condition inside CheckPrimaryConnection(). This was independently reported by Alex Railean and Dumitru, and Frank Jördens. Analyzed and fixed by Cédric Villemain. The fix has been verified to work by Frank	2012-12-19 12:01:27 -05:00
Jaime Casanova	81b8a944de	Fix a problem that caused a standby to promote itself without going through the voting procedure. This is because of a race condition inside CheckPrimaryConnection(). This was independently reported by Alex Railean and Dumitru, and Frank Jördens. Analyzed and fixed by Cédric Villemain. The fix has been verified to work by Frank	2012-12-19 11:45:58 -05:00
Jaime Casanova	93a999adc7	Formatting code using astyle	2012-12-11 11:49:07 -05:00
Jaime Casanova	1b69282df9	Formatting code using astyle	2012-12-11 11:47:59 -05:00
Jaime Casanova	06dd252f69	To select the new master it needs to know which standby has received more xlog records from the master, so each standby should use pg_last_xlog_receive_location() to report its position. This solves a possible situation in which a standby that is considered as new master when promoted is no longer the best option.	2012-12-03 09:27:12 -05:00
Jaime Casanova	088ca29fe3	To select the new master it needs to know which standby has received more xlog records from the master, so each standby should use pg_last_xlog_receive_location() to report its position. This solves a possible situation in which a standby that is considered as new master when promoted is no longer the best option.	2012-12-03 09:18:08 -05:00
Jaime Casanova	cd1a84252e	Fix node decision logic when priorities are involved. Currently if two nodes with different priorities are equally good to be promoted the second one (with a lower priority, considering them in descending order) will win. Per report from Brailean Dumitru	2012-09-16 02:47:02 -05:00
Jaime Casanova	5f33d9d715	Fix node decision logic when priorities are involved. Currently if two nodes with different priorities are equally good to be promoted the second one (with a lower priority, considering them in descending order) will win. Per report from Brailean Dumitru	2012-09-16 02:38:28 -05:00
Jaime Casanova	2e19b3688b	Add a comment	2012-09-16 02:26:18 -05:00
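
Several of the commits above concern PID file handling: checking that the PID stored in the file is valid, initializing variables because `sscanf()` does not zero them on error, and removing the file with `unlink()` rather than `remove()`. The following is a minimal sketch of that idea, not the repmgr implementation. The function names `pid_file_is_live()`, `write_pid_file()` and `remove_pid_file()` are hypothetical, and using `kill()` with signal 0 as the validity check is an assumption about what "valid PID" might mean.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 1 if pid_file contains the PID of a
 * process that currently exists, 0 otherwise (missing file, garbage
 * content, or stale PID).
 */
int
pid_file_is_live(const char *pid_file)
{
    FILE *fp = fopen(pid_file, "r");
    long  pid = 0;  /* the scanf family leaves this untouched on failure */
    int   parsed;

    if (fp == NULL)
        return 0;

    parsed = (fscanf(fp, "%ld", &pid) == 1);
    fclose(fp);

    if (!parsed || pid <= 0)
        return 0;

    /* Signal 0 sends nothing; it only checks existence and permission. */
    if (kill((pid_t) pid, 0) == 0 || errno == EPERM)
        return 1;

    return 0;
}

/*
 * Hypothetical helper: writes our own PID to pid_file unless the file
 * already names a live process. Returns 0 on success, -1 otherwise.
 */
int
write_pid_file(const char *pid_file)
{
    FILE *fp;

    if (pid_file_is_live(pid_file))
        return -1;

    fp = fopen(pid_file, "w");
    if (fp == NULL)
        return -1;

    fprintf(fp, "%ld\n", (long) getpid());
    return (fclose(fp) == 0) ? 0 : -1;
}

/* unlink() removes only a file; it never falls back to rmdir(). */
void
remove_pid_file(const char *pid_file)
{
    unlink(pid_file);
}
```

A stale or unparsable file is simply overwritten in this sketch, which mirrors the "ignore the file" behaviour described in commit 4496a0761e; a real daemon would likely also log the situation.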


130 Commits