Fix alignment and syntax

Add sample configuration for systemd support
Allow overriding start, stop and restart commands issued by repmgr
2026-03-23 15:16:29 +00:00 · 2016-09-23 13:45:59 -03:00 · 2016-09-23 13:45:50 -03:00 · 2016-09-23 13:45:32 -03:00 · 2016-08-15 12:31:42 +09:00 · 2016-08-15 12:22:33 +09:00
14 changed files with 877 additions and 307 deletions
--- a/16
+++ b/16
@@ -1,4 +1,18 @@
-3.1.4   2016-07-
+3.1.5   2016-08-15
+        repmgrd: in a failover situation, prevent endless looping when
+          attempting to establish the status of a node with
+          `failover=manual` (Ian)
+        repmgrd: improve handling of failover events on standbys with
+          `failover=manual`, and create a new event notification
+          for this, `standby_disconnect_manual` (Ian)
+        repmgr: add further event notifications (Gianni)
+        repmgr: when executing `standby switchover`, don't collect remote
+          command output unless required (Gianni, Ian)
+        repmgrd: improve standby monitoring query (Ian, based on suggestion
+          from Álvaro)
+        repmgr: various command line handling improvements (Ian)
+
+3.1.4   2016-07-12
        repmgr: new configuration option for setting "restore_command"
          in the recovery.conf file generated by repmgr (Martín)
        repmgr: add --csv option to "repmgr cluster show" (Gianni)
--- a/README.md
+++ b/README.md
@@ -155,9 +155,15 @@ system.

 - RedHat/CentOS: RPM packages for `repmgr` are available via Yum through
  the PostgreSQL Global Development Group RPM repository ( http://yum.postgresql.org/ ).
-  You need to follow the instructions for your distribution (RedHat, CentOS,
+  Follow the instructions for your distribution (RedHat, CentOS,
  Fedora, etc.) and architecture as detailed at yum.postgresql.org.

+  2ndQuadrant also provides its own RPM packages which are made available
+  at the same time as each `repmgr` release, as it can take some days for
+  them to become available via the main PGDG repository. See here for details:
+
+     http://repmgr.org/yum-repository.html
+
 - Debian/Ubuntu: the most recent `repmgr` packages are available from the
  PostgreSQL Community APT repository ( http://apt.postgresql.org/ ).
  Instructions can be found in the APT section of the PostgreSQL Wiki
@@ -215,6 +221,34 @@ command line options:
 - `-b/--pg_bindir`


+### Command line options and environment variables
+
+For some commands, e.g. `repmgr standby clone`, database connection parameters
+need to be provided. Like other PostgreSQL utilities, the following standard
+parameters can be used:
+
+- `-d/--dbname=DBNAME`
+- `-h/--host=HOSTNAME`
+- `-p/--port=PORT`
+- `-U/--username=USERNAME`
+
+If `-d/--dbname` contains an `=` sign or starts with a valid URI prefix (`postgresql://`
+or `postgres://`), it is treated as a conninfo string. See the PostgreSQL
+documentation for further details:
+
+  https://www.postgresql.org/docs/current/static/libpq-connect.html#LIBPQ-CONNSTRING
+
+Note that if a `conninfo` string is provided, values set in this will override any
+provided as individual parameters. For example, with `-d 'host=foo' --host bar`, `foo`
+will be chosen over `bar`.
+
+Like other PostgreSQL utilities, `repmgr` will default to any values set in environment
+variables if explicit command line parameters are not provided. See the PostgreSQL
+documentation for further details:
+
+  https://www.postgresql.org/docs/current/static/libpq-envars.html
+
+
 Setting up a simple replication cluster with repmgr
 ---------------------------------------------------

@@ -383,14 +417,32 @@ Clone the standby with:
    [2016-01-07 17:21:28] [NOTICE] you can now start your PostgreSQL server
    [2016-01-07 17:21:28] [HINT] for example : pg_ctl -D /path/to/node2/data/ start

-This will clone the PostgreSQL data directory files from the master at repmgr_node1
-using PostgreSQL's pg_basebackup utility. A `recovery.conf` file containing the
+This will clone the PostgreSQL data directory files from the master at `repmgr_node1`
+using PostgreSQL's `pg_basebackup` utility. A `recovery.conf` file containing the
 correct parameters to start streaming from this master server will be created
-automatically, and unless otherwise the `postgresql.conf` and `pg_hba.conf`
+automatically, and unless otherwise specified, the `postgresql.conf` and `pg_hba.conf`
 files will be copied from the master.

-Make any adjustments to the PostgreSQL configuration files now, then start the
-standby server.
+Be aware that when initially cloning a standby, you will need to ensure
+that all required WAL files remain available while the cloning is taking
+place. To ensure this happens when using the default `pg_basebackup` method,
+`repmgr` will set `pg_basebackup`'s `--xlog-method` parameter to `stream`,
+which will ensure all WAL files generated during the cloning process are
+streamed in parallel with the main backup. Note that this requires two
+replication connections to be available.
+
+To override this behaviour, in `repmgr.conf` set `pg_basebackup`'s
+`--xlog-method` parameter to `fetch`:
+
+    pg_basebackup_options='--xlog-method=fetch'
+
+and ensure that `wal_keep_segments` is set to an appropriately high value.
+See the `pg_basebackup` documentation for details:
+
+    https://www.postgresql.org/docs/current/static/app-pgbasebackup.html
+
+Make any adjustments to the standby's PostgreSQL configuration files now,
+then start the server.

 * * *

@@ -470,7 +522,11 @@ so should be used with care.
 Further options can be passed to the `pg_basebackup` utility via
 the setting `pg_basebackup_options` in `repmgr.conf`. See the PostgreSQL
 documentation for more details of available options:
+<<<<<<< HEAD
+  http://www.postgresql.org/docs/current/static/app-pgbasebackup.html
+=======
  https://www.postgresql.org/docs/current/static/app-pgbasebackup.html
+>>>>>>> 72f9b0145afab1060dd1202c8f8937653c8b2e39

 ### Using rsync to clone a standby

@@ -488,7 +544,6 @@ and destination server as the contents of files existing on both servers need
 to be compared, meaning this method is not necessarily faster than making a
 fresh clone with `pg_basebackup`.

-
 ### Dealing with PostgreSQL configuration files

 By default, `repmgr` will attempt to copy the standard configuration files
@@ -503,6 +558,21 @@ which enables any valid `rsync` options to be passed to that command, e.g.:

    rsync_options='--exclude=postgresql.local.conf'

+### Controlling `primary_conninfo` in `recovery.conf`
+
+`repmgr` will create the `primary_conninfo` setting in `recovery.conf` based
+on the connection parameters provided to `repmgr standby clone` and PostgreSQL's
+standard connection defaults, including any environment variables set on the
+local node.
+
+To include specific connection parameters other than the standard host, port,
+username and database values (e.g. `sslmode`), include these in a `conninfo`-style
+string passed to `repmgr` with `-d/--dbname` (see above for details), and/or set
+appropriate environment variables.
+
+Note that PostgreSQL will always set explicit defaults for `sslmode` and
+`sslcompression`.
+

 Setting up cascading replication with repmgr
 --------------------------------------------
@@ -576,6 +646,10 @@ To enable `repmgr` to use replication slots, set the boolean parameter
 Note that `repmgr` will fail with an error if this option is specified when
 working with PostgreSQL 9.3.

+Replication slots must be enabled in `postgresql.conf` by setting the parameter
+`max_replication_slots` to at least the number of expected standbys (changes
+to this parameter require a server restart).
+
 When cloning a standby, `repmgr` will automatically generate an appropriate
 slot name, which is stored in the `repl_nodes` table, and create the slot
 on the master:
@@ -598,18 +672,6 @@ Note that a slot name will be created by default for the master but not
 actually used unless the master is converted to a standby using e.g.
 `repmgr standby switchover`.

-Be aware that when initially cloning a standby, you will need to ensure
-that all required WAL files remain available while the cloning is taking
-place. If using the default `pg_basebackup` method, we recommend setting
-`pg_basebackup`'s `--xlog-method` parameter to `stream` like this:
-
-    pg_basebackup_options='--xlog-method=stream'
-
-See the `pg_basebackup` documentation for details:
-    https://www.postgresql.org/docs/current/static/app-pgbasebackup.html
-
-Otherwise it's necessary to set `wal_keep_segments` to an appropriately high
-value.

 Further information on replication slots in the PostgreSQL documentation:
    https://www.postgresql.org/docs/current/interactive/warm-standby.html#STREAMING-REPLICATION-SLOTS
@@ -907,7 +969,7 @@ actions happening, but we strongly recommend executing `repmgr` directly.

 `repmgrd` can be started simply with e.g.:

-    repmgrd -f /etc/repmgr.conf --verbose > $HOME/repmgr/repmgr.log 2>&1
+    repmgrd -f /etc/repmgr.conf --verbose >> $HOME/repmgr/repmgr.log 2>&1

 For permanent operation, we recommend using the options `-d/--daemonize` to
 detach the `repmgrd` process, and `-p/--pid-file` to write the process PID
@@ -929,7 +991,7 @@ table looks like this:


 Start `repmgrd` on each standby and verify that it's running by examining
-the log output, which at default log level will look like this:
+the log output, which at log level INFO will look like this:

    [2016-01-05 13:15:40] [INFO] checking cluster configuration with schema 'repmgr_test'
    [2016-01-05 13:15:40] [INFO] checking node 2 in cluster 'test'
@@ -1029,7 +1091,7 @@ the length of time it takes to determine that the connection is not possible.
 In particular explicitly setting a parameter for `connect_timeout` should
 be considered; the effective minimum value of `2` (seconds) will ensure
 that a connection failure at network level is reported as soon as possible,
-otherwise dependeing on the system settings (e.g. `tcp_syn_retries` in Linux)
+otherwise depending on the system settings (e.g. `tcp_syn_retries` in Linux)
 a delay of a minute or more is possible.

 For further details on `conninfo` network connection parameters, see:
@@ -1070,9 +1132,16 @@ table , it's advisable to regularly purge historical data with
 `repmgr cluster cleanup`; use the `-k/--keep-history` option to specify how
 many days' worth of data should be retained.

+It's possible to use `repmgrd` to provide monitoring only for some or all
+nodes by setting `failover = manual` in the node's `repmgr.conf`. In the
+event of the node's upstream failing, no failover action will be taken
+and the node will require manual intervention to be reattached to replication.
+If this occurs, event notification `standby_disconnect_manual` will be
+created.
+
 Note that when a standby node is not streaming directly from its upstream
-node, i.e. recovering WAL from an archive, `apply_lag` will always
-appear as `0 bytes`.
+node, e.g. recovering WAL from an archive, `apply_lag` will always appear as
+`0 bytes`.


 Using a witness server with repmgrd
@@ -1169,6 +1238,7 @@ The following event types are available:
  * `standby_promote`
  * `standby_follow`
  * `standby_switchover`
+  * `standby_disconnect_manual`
  * `witness_create`
  * `witness_create`
  * `repmgrd_start`
@@ -1330,17 +1400,32 @@ which contains connection details for the local database.
    when analyzing connectivity from a particular node.

    This command requires a valid `repmgr.conf` file to be provided; no
-    additional arguments are required.
+    additional arguments are needed.

    Example:

        $ repmgr -f /etc/repmgr.conf cluster show

        Role      | Name  | Upstream | Connection String
-        ----------+-------|----------|--------------------------------------------
-        * master  | node1 |          | host=repmgr_node1 dbname=repmgr user=repmgr
-          standby | node2 | node1    | host=repmgr_node1 dbname=repmgr user=repmgr
-          standby | node3 | node2    | host=repmgr_node1 dbname=repmgr user=repmgr
+        ----------+-------|----------|----------------------------------------
+        * master  | node1 |          | host=db_node1 dbname=repmgr user=repmgr
+          standby | node2 | node1    | host=db_node2 dbname=repmgr user=repmgr
+          standby | node3 | node2    | host=db_node3 dbname=repmgr user=repmgr
+
+    To show database connection errors when polling nodes, run the command in
+    `--verbose` mode.
+
+    The `cluster show` command now accepts the optional parameter `--csv`, which
+    outputs the replication cluster's status in a simple CSV format, suitable for
+    parsing by scripts:
+
+        $ repmgr -f /etc/repmgr.conf cluster show --csv
+        1,-1
+        2,0
+        3,1
+
+    The first column is the node's ID, and the second column represents the
+    node's status (0 = master, 1 = standby, -1 = failed).

 * `cluster cleanup`

@@ -1359,20 +1444,22 @@ which contains connection details for the local database.
 `repmgr` or `repmgrd` will return one of the following error codes on program
 exit:

-* SUCCESS (0)              Program ran successfully.
-* ERR_BAD_CONFIG (1)       Configuration file could not be parsed or was invalid
-* ERR_BAD_RSYNC (2)        An rsync call made by the program returned an error
-* ERR_NO_RESTART (4)       An attempt to restart a PostgreSQL instance failed
-* ERR_DB_CON (6)           Error when trying to connect to a database
-* ERR_DB_QUERY (7)         Error while executing a database query
-* ERR_PROMOTED (8)         Exiting program because the node has been promoted to master
-* ERR_BAD_PASSWORD (9)     Password used to connect to a database was rejected
-* ERR_STR_OVERFLOW (10)    String overflow error
-* ERR_FAILOVER_FAIL (11)   Error encountered during failover (repmgrd only)
-* ERR_BAD_SSH (12)         Error when connecting to remote host via SSH
-* ERR_SYS_FAILURE (13)     Error when forking (repmgrd only)
-* ERR_BAD_BASEBACKUP (14)  Error when executing pg_basebackup
-* ERR_MONITORING_FAIL (16) Unrecoverable error encountered during monitoring (repmgrd only)
+* SUCCESS (0)               Program ran successfully.
+* ERR_BAD_CONFIG (1)        Configuration file could not be parsed or was invalid
+* ERR_BAD_RSYNC (2)         An rsync call made by the program returned an error (repmgr only)
+* ERR_NO_RESTART (4)        An attempt to restart a PostgreSQL instance failed
+* ERR_DB_CON (6)            Error when trying to connect to a database
+* ERR_DB_QUERY (7)          Error while executing a database query
+* ERR_PROMOTED (8)          Exiting program because the node has been promoted to master
+* ERR_STR_OVERFLOW (10)     String overflow error
+* ERR_FAILOVER_FAIL (11)    Error encountered during failover (repmgrd only)
+* ERR_BAD_SSH (12)          Error when connecting to remote host via SSH (repmgr only)
+* ERR_SYS_FAILURE (13)      Error when forking (repmgrd only)
+* ERR_BAD_BASEBACKUP (14)   Error when executing pg_basebackup (repmgr only)
+* ERR_MONITORING_FAIL (16)  Unrecoverable error encountered during monitoring (repmgrd only)
+* ERR_BAD_BACKUP_LABEL (17) Corrupt or unreadable backup label encountered (repmgr only)
+* ERR_SWITCHOVER_FAIL (18)  Error encountered during switchover (repmgr only)
+

 Support and Assistance
 ----------------------
@@ -1418,5 +1505,6 @@ Thanks from the repmgr core team.
 Further reading
 ---------------

+* http://blog.2ndquadrant.com/improvements-in-repmgr-3-1-4/
 * http://blog.2ndquadrant.com/managing-useful-clusters-repmgr/
 * http://blog.2ndquadrant.com/easier_postgresql_90_clusters/
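
The README hunk above states that a `-d/--dbname` value containing an `=` sign or starting with `postgresql://` or `postgres://` is treated as a conninfo string. A minimal sketch of that rule follows; the function name `dbname_is_conninfo` is hypothetical and is not taken from the repmgr source, which may implement the check differently.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not from repmgr): returns true if the value passed
 * to -d/--dbname should be interpreted as a conninfo string rather than a
 * plain database name, following the rule described in the README.
 */
bool
dbname_is_conninfo(const char *dbname)
{
	static const char *const uri_prefixes[] = {"postgresql://", "postgres://"};
	size_t		i;

	if (dbname == NULL)
		return false;

	/* A recognised URI prefix marks the value as a connection URI */
	for (i = 0; i < sizeof(uri_prefixes) / sizeof(uri_prefixes[0]); i++)
	{
		if (strncmp(dbname, uri_prefixes[i], strlen(uri_prefixes[i])) == 0)
			return true;
	}

	/* Otherwise any '=' indicates keyword=value conninfo syntax */
	return strchr(dbname, '=') != NULL;
}
```

With this rule, `-d 'host=foo'` is classified as a conninfo string while `-d repmgr` is a plain database name.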
--- a/config.c
+++ b/config.c
@@ -219,6 +219,9 @@ parse_config(t_configuration_options *options)
 	memset(options->node_name, 0, sizeof(options->node_name));
 	memset(options->promote_command, 0, sizeof(options->promote_command));
 	memset(options->follow_command, 0, sizeof(options->follow_command));
+	memset(options->stop_command, 0, sizeof(options->stop_command));
+	memset(options->start_command, 0, sizeof(options->start_command));
+	memset(options->restart_command, 0, sizeof(options->restart_command));
 	memset(options->rsync_options, 0, sizeof(options->rsync_options));
 	memset(options->ssh_options, 0, sizeof(options->ssh_options));
 	memset(options->pg_bindir, 0, sizeof(options->pg_bindir));
@@ -341,6 +344,12 @@ parse_config(t_configuration_options *options)
 			strncpy(options->promote_command, value, MAXLEN);
 		else if (strcmp(name, "follow_command") == 0)
 			strncpy(options->follow_command, value, MAXLEN);
+		else if (strcmp(name, "stop_command") == 0)
+			strncpy(options->stop_command, value, MAXLEN);
+		else if (strcmp(name, "start_command") == 0)
+			strncpy(options->start_command, value, MAXLEN);
+		else if (strcmp(name, "restart_command") == 0)
+			strncpy(options->restart_command, value, MAXLEN);
 		else if (strcmp(name, "master_response_timeout") == 0)
 			options->master_response_timeout = repmgr_atoi(value, "master_response_timeout", &config_errors, false);
 		/*
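
The new `stop_command`, `start_command` and `restart_command` branches copy the value with `strncpy(..., MAXLEN)` into a `MAXLEN`-sized buffer. `strncpy` does not write a terminating NUL when the source is `MAXLEN` bytes or longer, so the result is only safe if the parser already guarantees shorter values. The following is a sketch of a stricter copy, assuming nothing about repmgr's parser; `copy_option` is a hypothetical helper, not part of the repmgr source.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not from repmgr): copy a configuration value into a
 * fixed-size buffer, refusing values that do not fit together with their
 * terminating NUL. On failure the destination is left as an empty string.
 */
bool
copy_option(char *dest, size_t dest_size, const char *value)
{
	size_t		len;

	if (dest == NULL || value == NULL || dest_size == 0)
		return false;

	len = strlen(value);

	if (len >= dest_size)
	{
		/* Value too long: report it instead of truncating silently */
		dest[0] = '\0';
		return false;
	}

	/* len + 1 copies the terminating NUL as well */
	memcpy(dest, value, len + 1);
	return true;
}
```

A caller could report a `false` return as a configuration error rather than continuing with a truncated command.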
--- a/config.h
+++ b/config.h
@@ -62,6 +62,9 @@ typedef struct
 	char		node_name[MAXLEN];
 	char		promote_command[MAXLEN];
 	char		follow_command[MAXLEN];
+	char		stop_command[MAXLEN];
+	char		start_command[MAXLEN];
+	char		restart_command[MAXLEN];
 	char		loglevel[MAXLEN];
 	char		logfacility[MAXLEN];
 	char		rsync_options[QUERY_STR_LEN];
@@ -87,7 +90,7 @@ typedef struct
 * The following will initialize the structure with a minimal set of options;
 * actual defaults are set in parse_config() before parsing the configuration file
 */
-#define T_CONFIGURATION_OPTIONS_INITIALIZER { "", -1, NO_UPSTREAM_NODE, "", MANUAL_FAILOVER, -1, "", "", "", "", "", "", "", -1, -1, -1, "", "", "", "", "", 0, 0, 0, 0, "", { NULL, NULL }, { NULL, NULL } }
+#define T_CONFIGURATION_OPTIONS_INITIALIZER { "", -1, NO_UPSTREAM_NODE, "", MANUAL_FAILOVER, -1, "", "", "", "", "", "", "", "", "", "", -1, -1, -1, "", "", "", "", "", 0, 0, 0, 0, "", { NULL, NULL }, {NULL, NULL} }

 typedef struct ErrorListCell
 {
--- a/dbutils.c
+++ b/dbutils.c
@@ -34,7 +34,7 @@ char repmgr_schema_quoted[MAXLEN] = "";
 static int _get_node_record(PGconn *conn, char *cluster, char *sqlquery, t_node_info *node_info);

 PGconn *
-_establish_db_connection(const char *conninfo, const bool exit_on_error, const bool log_notice)
+_establish_db_connection(const char *conninfo, const bool exit_on_error, const bool log_notice, const bool verbose_only)
 {
 	/* Make a connection to the database */
 	PGconn	   *conn = NULL;
@@ -50,15 +50,23 @@ _establish_db_connection(const char *conninfo, const bool exit_on_error, const b
 	/* Check to see that the backend connection was successfully made */
 	if ((PQstatus(conn) != CONNECTION_OK))
 	{
-		if (log_notice)
+		bool emit_log = true;
+
+		if (verbose_only == true && verbose_logging == false)
+			emit_log = false;
+
+		if (emit_log)
 		{
-			log_notice(_("connection to database failed: %s\n"),
-					PQerrorMessage(conn));
-		}
-		else
-		{
-			log_err(_("connection to database failed: %s\n"),
-					PQerrorMessage(conn));
+			if (log_notice)
+			{
+				log_notice(_("connection to database failed: %s\n"),
+						   PQerrorMessage(conn));
+			}
+			else
+			{
+				log_err(_("connection to database failed: %s\n"),
+						PQerrorMessage(conn));
+			}
 		}

 		if (exit_on_error)
@@ -71,16 +79,35 @@ _establish_db_connection(const char *conninfo, const bool exit_on_error, const b
 	return conn;
 }

+
+/*
+ * Establish a database connection, optionally exit on error
+ */
 PGconn *
 establish_db_connection(const char *conninfo, const bool exit_on_error)
 {
-	return _establish_db_connection(conninfo, exit_on_error, false);
+	return _establish_db_connection(conninfo, exit_on_error, false, false);
 }

+/*
+ * Attempt to establish a database connection, never exit on error, only
+ * output error messages if --verbose option used
+ */
 PGconn *
-test_db_connection(const char *conninfo, const bool exit_on_error)
+establish_db_connection_quiet(const char *conninfo)
 {
-	return _establish_db_connection(conninfo, exit_on_error, true);
+	return _establish_db_connection(conninfo, false, false, true);
+}
+
+/*
+ * Attempt to establish a database connection, never exit on error,
+ * output connection error messages as NOTICE (useful when connection
+ * failure is expected)
+ */
+PGconn *
+test_db_connection(const char *conninfo)
+{
+	return _establish_db_connection(conninfo, false, true, false);
 }


--- a/dbutils.h
+++ b/dbutils.h
@@ -81,11 +81,12 @@ typedef struct s_replication_slot

 PGconn *_establish_db_connection(const char *conninfo,
 								 const bool exit_on_error,
-								 const bool log_notice);
+								 const bool log_notice,
+								 const bool verbose_only);
 PGconn *establish_db_connection(const char *conninfo,
 								const bool exit_on_error);
-PGconn *test_db_connection(const char *conninfo,
-						   const bool exit_on_error);
+PGconn *establish_db_connection_quiet(const char *conninfo);
+PGconn *test_db_connection(const char *conninfo);
 PGconn *establish_db_connection_by_params(const char *keywords[],
 								  const char *values[],
 								  const bool exit_on_error);
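
The three wrappers differ only in the flags they pass to `_establish_db_connection()`. The sketch below isolates the logging decision from the dbutils.c hunk so the combinations are easy to compare. The enum and function names are hypothetical, and `verbose_logging` is passed as a parameter here, whereas the source reads it from a global declared in log.h.

```c
#include <stdbool.h>

/* Hypothetical names, used only for this illustration */
typedef enum
{
	CONN_LOG_NONE,				/* failure is not logged at all */
	CONN_LOG_NOTICE,			/* failure is logged with log_notice() */
	CONN_LOG_ERR				/* failure is logged with log_err() */
} conn_log_action;

/*
 * Mirrors the decision made in _establish_db_connection() when the
 * connection attempt fails.
 */
conn_log_action
connection_failure_log_action(bool log_notice, bool verbose_only,
							  bool verbose_logging)
{
	/* verbose_only suppresses output unless --verbose was given */
	if (verbose_only && !verbose_logging)
		return CONN_LOG_NONE;

	return log_notice ? CONN_LOG_NOTICE : CONN_LOG_ERR;
}
```

In these terms, `establish_db_connection()` always yields `CONN_LOG_ERR`, `test_db_connection()` always yields `CONN_LOG_NOTICE`, and `establish_db_connection_quiet()` yields `CONN_LOG_NONE` unless verbose logging is enabled.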
--- a/errcode.h
+++ b/errcode.h
@@ -29,7 +29,6 @@
 #define ERR_DB_CON 6
 #define ERR_DB_QUERY 7
 #define ERR_PROMOTED 8
-#define ERR_BAD_PASSWORD 9
 #define ERR_STR_OVERFLOW 10
 #define ERR_FAILOVER_FAIL 11
 #define ERR_BAD_SSH 12
--- a/log.c
+++ b/log.c
@@ -142,7 +142,7 @@ log_verbose(int level, const char *fmt, ...)


 bool
-logger_init(t_configuration_options * opts, const char *ident)
+logger_init(t_configuration_options *opts, const char *ident)
 {
 	char	   *level = opts->loglevel;
 	char	   *facility = opts->logfacility;
--- a/log.h
+++ b/log.h
@@ -130,5 +130,7 @@ __attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3)));

 extern int	log_type;
 extern int	log_level;
+extern int	verbose_logging;
+extern int	terse_logging;

-#endif
+#endif /* _REPMGR_LOG_H_ */
--- a/repmgr.c
+++ b/repmgr.c
--- a/repmgr.conf.sample
+++ b/repmgr.conf.sample
@@ -101,6 +101,29 @@
 # (if not provided, defaults to system $PATH)
 #pg_bindir=/usr/bin/

+# service control commands
+#
+# repmgr provides options to override the default pg_ctl commands
+# used to stop, start and restart the PostgreSQL cluster
+#
+# NOTE: These commands must be runnable on remote nodes as well for switchover
+# to function correctly.
+#
+# If you use sudo, the user repmgr runs as (usually 'postgres') must have
+# passwordless sudo access to execute the command
+#
+# For example, to use systemd, you may use the following configuration:
+#
+#    # this is required when running sudo over ssh without -t:
+#    Defaults:postgres !requiretty
+#    postgres ALL = NOPASSWD: /usr/bin/systemctl stop postgresql-9.5, \
+#       /usr/bin/systemctl start postgresql-9.5, \
+#       /usr/bin/systemctl restart postgresql-9.5
+#
+# start_command = systemctl start postgresql-9.5
+# stop_command = systemctl stop postgresql-9.5
+# restart_command = systemctl restart postgresql-9.5
+
 # external command options

 #rsync_options=--archive --checksum --compress --progress --rsh="ssh -o \"StrictHostKeyChecking no\""
@@ -144,10 +167,18 @@
 #reconnect_interval=10

 # Autofailover options
-#failover=manual    # one of 'automatic', 'manual'
-                    # (default: manual)
-#priority=100       # a value of zero or less prevents the node being promoted to primary
+#failover=manual    # one of 'automatic', 'manual' (default: manual)
+                    # defines the action to take in the event of upstream failure
+                    #
+                    # 'automatic': repmgrd will automatically attempt to promote the
+                    #    node or follow the new upstream node
+                    # 'manual': repmgrd will take no action and the node will require
+                    #    manual attention to reattach it to replication
+
+#priority=100       # indicate a preferred priority for promoting nodes
+                    # a value of zero or less prevents the node being promoted to primary
                    # (default: 100)
+
 #promote_command='repmgr standby promote -f /path/to/repmgr.conf'
 #follow_command='repmgr standby follow -f /path/to/repmgr.conf -W'

--- a/repmgr.h
+++ b/repmgr.h
@@ -47,6 +47,15 @@
 #define NO_UPSTREAM_NODE	-1
 #define UNKNOWN_NODE_ID     -1

+#define OPT_HELP                         1
+#define OPT_CHECK_UPSTREAM_CONFIG        2
+#define OPT_RECOVERY_MIN_APPLY_DELAY     3
+#define OPT_IGNORE_EXTERNAL_CONFIG_FILES 4
+#define OPT_CONFIG_ARCHIVE_DIR           5
+#define OPT_PG_REWIND                    6
+#define OPT_PWPROMPT                     7
+#define OPT_CSV                          8
+#define OPT_INITDB_NO_PWPROMPT           9


 /* Run time options type */
@@ -92,11 +101,10 @@ typedef struct
 	char		recovery_min_apply_delay[MAXLEN];

 	/* deprecated command line options */
-	char		localport[MAXLEN];
-	bool		initdb_no_pwprompt;
+	char            localport[MAXLEN];
 }	t_runtime_options;

-#define T_RUNTIME_OPTIONS_INITIALIZER { "", "", "", "", "", "", "", DEFAULT_WAL_KEEP_SEGMENTS, false, false, false, false, false, false, false, false, false, false, "", "", "", "", "fast", "", 0, "", "", "", false }
+#define T_RUNTIME_OPTIONS_INITIALIZER { "", "", "", "", "", "", "", DEFAULT_WAL_KEEP_SEGMENTS, false, false, false, false, false, false, false, false, false, false, "", "", "", "", "fast", "", 0, "", "", ""}

 struct BackupLabel
 {
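
Giving `--help` its own value (`OPT_HELP`) matters because `getopt_long()` also returns `'?'` for unrecognised options, so `'?'` alone cannot distinguish a help request from a typo. The sketch below shows the pattern under some assumptions: the option string, the `config-file` long option and the `classify_args` function are illustrative only, and resetting `optind` to 1 between scans is assumed to be sufficient (as with glibc).

```c
#include <getopt.h>
#include <string.h>

#define OPT_HELP 1

/* Hypothetical result type for this illustration */
typedef enum
{
	ARGS_OK,
	ARGS_HELP,
	ARGS_UNKNOWN
} args_result;

args_result
classify_args(int argc, char **argv)
{
	static struct option long_options[] = {
		{"config-file", required_argument, NULL, 'f'},
		{"help", no_argument, NULL, OPT_HELP},
		{NULL, 0, NULL, 0}
	};
	int			c;

	optind = 1;					/* restart scanning (assumes glibc-style getopt) */
	opterr = 0;					/* suppress getopt's own error messages */

	while ((c = getopt_long(argc, argv, "?f:", long_options, NULL)) != -1)
	{
		switch (c)
		{
			case OPT_HELP:
				/* --help was given explicitly */
				return ARGS_HELP;
			case '?':
				/* literal "-?" is a help request, anything else is an error */
				if (strcmp(argv[optind - 1], "-?") == 0)
					return ARGS_HELP;
				return ARGS_UNKNOWN;
			case 'f':
				break;
			default:
				return ARGS_UNKNOWN;
		}
	}

	return ARGS_OK;
}
```

Here `--help` and `-?` both map to `ARGS_HELP`, while an unrecognised option such as `-x` maps to `ARGS_UNKNOWN`.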
--- a/repmgrd.c
+++ b/repmgrd.c
@@ -41,7 +41,10 @@
 #include "access/xlogdefs.h"
 #include "pqexpbuffer.h"

+/* Message strings passed in repmgrSharedState->location */

+#define PASSIVE_NODE "PASSIVE_NODE"
+#define LSN_QUERY_ERROR "LSN_QUERY_ERROR"

 /* Local info */
 t_configuration_options local_options = T_CONFIGURATION_OPTIONS_INITIALIZER;
@@ -59,6 +62,13 @@ t_node_info node_info;

 bool		failover_done = false;

+/*
+ * when `failover=manual`, and the upstream server has gone away,
+ * this flag is set to indicate we should connect to whatever the
+ * current master is to update monitoring information
+ */
+bool		manual_mode_upstream_disconnected = false;
+
 char	   *pid_file = NULL;

 static void help(void);
@@ -124,7 +134,7 @@ main(int argc, char **argv)
 		{"monitoring-history", no_argument, NULL, 'm'},
 		{"daemonize", no_argument, NULL, 'd'},
 		{"pid-file", required_argument, NULL, 'p'},
-		{"help", no_argument, NULL, '?'},
+		{"help", no_argument, NULL, OPT_HELP},
 		{"version", no_argument, NULL, 'V'},
 		{NULL, 0, NULL, 0}
 	};
@@ -158,6 +168,23 @@ main(int argc, char **argv)
 	{
 		switch (c)
 		{
+			case '?':
+				/* Actual help option given */
+				if (strcmp(argv[optind - 1], "-?") == 0)
+				{
+					help();
+					exit(SUCCESS);
+				}
+				/* unknown option reported by getopt */
+				else
+					goto unknown_option;
+				break;
+			case OPT_HELP:
+				help();
+				exit(SUCCESS);
+			case 'V':
+				printf("%s %s (PostgreSQL %s)\n", progname(), REPMGR_VERSION, PG_VERSION);
+				exit(SUCCESS);
 			case 'f':
 				config_file = optarg;
 				break;
@@ -173,13 +200,9 @@ main(int argc, char **argv)
 			case 'p':
 				pid_file = optarg;
 				break;
-			case '?':
-				help();
-				exit(SUCCESS);
-			case 'V':
-				printf("%s %s (PostgreSQL %s)\n", progname(), REPMGR_VERSION, PG_VERSION);
-				exit(SUCCESS);
+
 			default:
+		unknown_option:
 				usage();
 				exit(ERR_BAD_CONFIG);
 		}
@@ -433,6 +456,7 @@ main(int argc, char **argv)
 					my_local_conn = establish_db_connection(local_options.conninfo, true);
 					update_registration();
 				}
+
 				/* Log startup event */
 				if (startup_event_logged == false)
 				{
@@ -639,7 +663,7 @@ witness_monitor(void)
 								 local_options.master_response_timeout) != 1)
 		return;

-	/* Get local xlog info */
+	/* Get timestamp for monitoring update */
 	sqlquery_snprintf(sqlquery, "SELECT CURRENT_TIMESTAMP");

 	res = PQexec(my_local_conn, sqlquery);
@@ -720,6 +744,8 @@ standby_monitor(void)
 	const char *upstream_node_type = NULL;

 	bool		receiving_streamed_wal = true;
+
+
 	/*
 	 * Verify that the local node is still available - if not there's
 	 * no point in doing much else anyway
@@ -741,15 +767,32 @@ standby_monitor(void)
 		goto continue_monitoring_standby;
 	}

-	upstream_conn = get_upstream_connection(my_local_conn,
-											local_options.cluster_name,
-											local_options.node,
-											&upstream_node_id,
-											upstream_conninfo);
+	/*
+	 * Standby has `failover` set to manual and is disconnected from
+	 * replication following a prior upstream node failure - we'll
+	 * find the master to be able to write monitoring information, if
+	 * required
+	 */
+	if (manual_mode_upstream_disconnected == true)
+	{
+		upstream_conn = get_master_connection(my_local_conn,
+												local_options.cluster_name,
+												&upstream_node_id,
+												upstream_conninfo);
+		upstream_node_type = "master";
+	}
+	else
+	{
+		upstream_conn = get_upstream_connection(my_local_conn,
+												local_options.cluster_name,
+												local_options.node,
+												&upstream_node_id,
+												upstream_conninfo);

-	upstream_node_type = (upstream_node_id == master_options.node)
-		? "master"
-		: "upstream";
+		upstream_node_type = (upstream_node_id == master_options.node)
+			? "master"
+			: "upstream";
+	}

 	/*
 	 * Check that the upstream node is still available
@@ -764,29 +807,52 @@ standby_monitor(void)

 	if (PQstatus(upstream_conn) != CONNECTION_OK)
 	{
+		int previous_master_node_id = master_options.node;
+
 		PQfinish(upstream_conn);
 		upstream_conn = NULL;

+		/*
+		 * When `failover=manual`, no actual failover will be performed, instead
+		 * the following happens:
+		 *  - find the new master
+		 *  - create an event notification `standby_disconnect_manual`
+		 *  - set a flag to indicate we're disconnected from replication
+		 */
 		if (local_options.failover == MANUAL_FAILOVER)
 		{
 			log_err(_("Unable to reconnect to %s. Now checking if another node has been promoted.\n"), upstream_node_type);

+			/*
+			 * Set the location string in shared memory to indicate to other
+			 * repmgrd instances that we're *not* a promotion candidate and
+			 * that other repmgrd instances should not expect location updates
+			 * from us
+			 */
+
+			update_shared_memory(PASSIVE_NODE);
+
 			for (connection_retries = 0; connection_retries < local_options.reconnect_attempts; connection_retries++)
 			{
 				master_conn = get_master_connection(my_local_conn,
 					local_options.cluster_name, &master_options.node, NULL);
+
 				if (PQstatus(master_conn) == CONNECTION_OK)
 				{
 					/*
 					 * Connected, we can continue the process so break the
 					 * loop
 					 */
-					log_err(_("connected to node %d, continuing monitoring.\n"),
+					log_notice(_("connected to node %d, continuing monitoring.\n"),
 							master_options.node);
 					break;
 				}
 				else
 				{
+					/*
+					 * XXX this is the only place where `retry_promote_interval_secs`
+					 * is used - this parameter should be renamed or possibly be replaced
+					 */
 					log_err(
 					    _("no new master found, waiting %i seconds before retry...\n"),
 					    local_options.retry_promote_interval_secs
@@ -816,6 +882,36 @@ standby_monitor(void)

 				terminate(ERR_DB_CON);
 			}
+
+			/*
+			 * connected to a master - is it the same as the former upstream?
+			 * if not:
+			 *  - create event `standby_disconnect_manual`
+			 *  - set global `manual_mode_upstream_disconnected`
+			 */
+
+			if (previous_master_node_id != master_options.node)
+			{
+				PQExpBufferData errmsg;
+				initPQExpBuffer(&errmsg);
+
+				appendPQExpBuffer(&errmsg,
+								  _("node %i is in manual failover mode and is now disconnected from replication"),
+								  local_options.node);
+
+				log_verbose(LOG_DEBUG, "old master: %i; current: %i\n", previous_master_node_id, master_options.node);
+
+				manual_mode_upstream_disconnected = true;
+
+				create_event_record(master_conn,
+									&local_options,
+									local_options.node,
+									"standby_disconnect_manual",
+									/* here "true" indicates the action has occurred as expected */
+									true,
+									errmsg.data);
+
+			}
 		}
 		else if (local_options.failover == AUTOMATIC_FAILOVER)
 		{
@@ -916,8 +1012,8 @@ standby_monitor(void)
 		 * the stream. If we set the local standby node as failed and it's now running
 		 * and receiving replication data, we should activate it again.
 		 */
-	        set_local_node_status();
-	        log_info(_("standby connection recovered!\n"));
+		set_local_node_status();
+		log_info(_("standby connection recovered!\n"));
 	}

 	/* Fast path for the case where no history is requested */
@@ -929,6 +1025,7 @@ standby_monitor(void)
 	 * from the upstream node to write monitoring information
 	 */

+	/* XXX not used? */
 	upstream_node = get_node_info(my_local_conn, local_options.cluster_name, upstream_node_id);

 	sprintf(sqlquery,
@@ -983,12 +1080,19 @@ standby_monitor(void)
 		return;

 	/* Get local xlog info */
+
 	sqlquery_snprintf(sqlquery,
-					  "SELECT CURRENT_TIMESTAMP, "
-					  "pg_catalog.pg_last_xlog_receive_location(), "
-					  "pg_catalog.pg_last_xlog_replay_location(), "
-					  "pg_catalog.pg_last_xact_replay_timestamp(), "
-					  "pg_catalog.pg_last_xlog_receive_location() >= pg_catalog.pg_last_xlog_replay_location()");
+					  " SELECT ts, "
+					  "        receive_location, "
+					  "        replay_location, "
+					  "        replay_timestamp, "
+					  "        receive_location >= replay_location "
+					  "   FROM (SELECT CURRENT_TIMESTAMP AS ts, "
+					  "         pg_catalog.pg_last_xlog_receive_location() AS receive_location, "
+					  "         pg_catalog.pg_last_xlog_replay_location()  AS replay_location, "
+					  "         pg_catalog.pg_last_xact_replay_timestamp() AS replay_timestamp "
+					  "        ) q ");
+

 	res = PQexec(my_local_conn, sqlquery);
 	if (PQresultStatus(res) != PGRES_TUPLES_OK)
@@ -1073,10 +1177,12 @@ standby_monitor(void)
 	}
 	else
 	{
-		apply_lag = (long long unsigned int)lsn_last_xlog_receive_location - lsn_last_xlog_replay_location;
 		lsn_last_xlog_receive_location = lsn_to_xlogrecptr(last_xlog_receive_location, NULL);
+
+		apply_lag = (long long unsigned int)lsn_last_xlog_receive_location - lsn_last_xlog_replay_location;
 	}

+
 	/* Calculate replication lag */
 	if (lsn_master_current_xlog_location >= lsn_last_xlog_receive_location)
 	{
@@ -1121,7 +1227,6 @@ standby_monitor(void)
 					  last_xlog_receive_location,
 					  replication_lag,
 					  apply_lag);
-
 	/*
 	 * Execute the query asynchronously, but don't check for a result. We will
 	 * check the result next time we pause for a monitor step.
@@ -1158,8 +1263,6 @@ do_master_failover(void)
 	XLogRecPtr	xlog_recptr;
 	bool		lsn_format_ok;

-	char		last_xlog_replay_location[MAXLEN];
-
 	PGconn	   *node_conn = NULL;

 	/*
@@ -1340,8 +1443,8 @@ do_master_failover(void)
 				  " considered as new master and exit.\n"),
 				PQerrorMessage(my_local_conn));
 		PQclear(res);
-		sprintf(last_xlog_replay_location, "'%X/%X'", 0, 0);
-		update_shared_memory(last_xlog_replay_location);
+
+		update_shared_memory(LSN_QUERY_ERROR);
 		terminate(ERR_DB_QUERY);
 	}
 	/* write last location in shared memory */
@@ -1391,6 +1494,7 @@ do_master_failover(void)

 		while (!nodes[i].is_ready)
 		{
+			char location_value[MAXLEN];

 			sqlquery_snprintf(sqlquery,
 							  "SELECT %s.repmgr_get_last_standby_location()",
@@ -1406,7 +1510,11 @@ do_master_failover(void)
 				terminate(ERR_DB_QUERY);
 			}

-			xlog_recptr = lsn_to_xlogrecptr(PQgetvalue(res, 0, 0), &lsn_format_ok);
+			/* Copy the returned value as we'll need to reference it a few times */
+			strncpy(location_value, PQgetvalue(res, 0, 0), MAXLEN);
+			PQclear(res);
+
+			xlog_recptr = lsn_to_xlogrecptr(location_value, &lsn_format_ok);

 			/* If position reported as "invalid", check for format error or
 			 * empty string; otherwise position is 0/0 and we need to continue
@@ -1414,10 +1522,36 @@ do_master_failover(void)
 			 */
 			if (xlog_recptr == InvalidXLogRecPtr)
 			{
+				bool continue_loop = true;
+
 				if (lsn_format_ok == false)
 				{
+
+					/*
+					 * The node is indicating it is not a promotion candidate -
+					 * in this case we can store its invalid LSN to ensure it
+					 * can't be a promotion candidate when comparing locations
+					 */
+					if (strcmp(location_value, PASSIVE_NODE) == 0)
+					{
+						log_debug("node %i is in passive mode\n", nodes[i].node_id);
+						log_info(_("node %i will not be considered for promotion\n"), nodes[i].node_id);
+						nodes[i].xlog_location = InvalidXLogRecPtr;
+						continue_loop = false;
+					}
+					/*
+					 * This should probably never happen but if it does, rule the
+					 * node out as a promotion candidate
+					 */
+					else if (strcmp(location_value, LSN_QUERY_ERROR) == 0)
+					{
+						log_warning(_("node %i is unable to update its shared memory and will not be considered for promotion\n"), nodes[i].node_id);
+						nodes[i].xlog_location = InvalidXLogRecPtr;
+						continue_loop = false;
+					}
+
 					/* Unable to parse value returned by `repmgr_get_last_standby_location()` */
-					if (*PQgetvalue(res, 0, 0) == '\0')
+					else if (*location_value == '\0')
 					{
 						log_crit(
 							_("unable to obtain LSN from node %i"), nodes[i].node_id
@@ -1426,8 +1560,8 @@ do_master_failover(void)
 							_("please check that 'shared_preload_libraries=repmgr_funcs' is set in postgresql.conf\n")
 							);

-						PQclear(res);
 						PQfinish(node_conn);
+						/* XXX shouldn't we just ignore this node? */
 						exit(ERR_BAD_CONFIG);
 					}

@@ -1435,25 +1569,29 @@ do_master_failover(void)
 					 * Very unlikely to happen; in the absence of any better
 					 * strategy keep checking
 					 */
-					log_warning(_("unable to parse LSN \"%s\"\n"),
-								PQgetvalue(res, 0, 0));
+					else {
+						log_warning(_("unable to parse LSN \"%s\"\n"),
+									location_value);
+					}
 				}
 				else
 				{
 					log_debug(
 						_("invalid LSN returned from node %i: '%s'\n"),
 						nodes[i].node_id,
-						PQgetvalue(res, 0, 0)
-						);
+						location_value);
 				}

-				PQclear(res);
-
-				/* If position is 0/0, keep checking */
-				/* XXX we should add a timeout here to prevent infinite looping
+				/*
+				 * If the node is still reporting an InvalidXLogRecPtr, it means
+				 * its repmgrd hasn't yet had time to update it (either with a valid
+				 * XLogRecPtr or a message) so we continue looping.
+				 *
+				 * XXX we should add a timeout here to prevent infinite looping
 				 * if the other node's repmgrd is not up
 				 */
-				continue;
+				if (continue_loop == true)
+					continue;
 			}

 			if (nodes[i].xlog_location < xlog_recptr)
@@ -1461,8 +1599,7 @@ do_master_failover(void)
 				nodes[i].xlog_location = xlog_recptr;
 			}

-			log_debug(_("LSN of node %i is: %s\n"), nodes[i].node_id, PQgetvalue(res, 0, 0));
-			PQclear(res);
+			log_debug(_("LSN of node %i is: %s\n"), nodes[i].node_id, location_value);

 			ready_nodes++;
 			nodes[i].is_ready = true;
@@ -2138,7 +2275,7 @@ lsn_to_xlogrecptr(char *lsn, bool *format_ok)
 	{
 		if (format_ok != NULL)
 			*format_ok = false;
-		log_err(_("incorrect log location format: %s\n"), lsn);
+		log_warning(_("incorrect log location format: %s\n"), lsn);
 		return 0;
 	}

--- a/version.h
+++ b/version.h
@@ -1,6 +1,6 @@
 #ifndef _VERSION_H_
 #define _VERSION_H_

-#define REPMGR_VERSION "3.2dev"
+#define REPMGR_VERSION "3.1.5"

 #endif
Author	SHA1	Message	Date
Martin	3802b917e0	Fix alignment and syntax	2016-09-23 13:45:59 -03:00
Jarkko Oranen	4f7a2a0614	Add sample configuration for systemd support	2016-09-23 13:45:50 -03:00
Jarkko Oranen	06c7fe04b0	Allow overriding start, stop and restart commands issued by repmgr This commit introduces three new options: - start_command - stop_command - restart_command If these are set, repmgr will issue the specified command instead of the default pg_ctl commands	2016-09-23 13:45:32 -03:00
Ian Barwick	1fe01e9168	Update HISTORY	2016-08-15 12:31:42 +09:00
Ian Barwick	ed1136f443	Reinstate deprecated command line options and add warnings -l/--local-port will be removed in 3.2, not 3.1.x. --initdb-no-pwprompt already has no effect.	2016-08-15 12:22:33 +09:00
Ian Barwick	a7ed60a533	Update README.md Note default usage of `pg_basebackup --xlog-method=stream`.	2016-08-15 10:20:12 +09:00
Renaud Fortier	fc5a18410d	Update README.md I think this will improve the readme.	2016-08-15 10:20:07 +09:00
Ian Barwick	fd52c8ec3c	Update HISTORY	2016-08-12 09:58:04 +09:00
Ian Barwick	47f1c6fa84	Revert "Improved "repmgr-auto" Debian package" This reverts commit `5b91a5e2e5`.	2016-08-12 09:55:47 +09:00
Ian Barwick	fba89ef37c	repmgr: set default user for -R/--remote-user	2016-08-12 09:32:40 +09:00
Ian Barwick	4cc6cbe32f	`repmgr standby clone` historically accepts a hostname as third parameter	2016-08-12 09:20:54 +09:00
Ian Barwick	c715077c29	Clean up command line option handling and help output - properly distinguish between the command line option -? and getopt's unknown option marker '?' - remove deprecated command line options --initdb-no-pwprompt and -l/--local-port - add witness command summary in help output	2016-08-11 17:33:05 +09:00
Ian Barwick	c178d8ed27	Refactor standby monitoring query Addresses GitHub #224	2016-08-11 17:28:59 +09:00
Ian Barwick	d4d06f43f7	When the output of a remote command isn't required, ensure it's consumed anyway This fixes a regression introduced with commit `85f68e9f77` Also clean up some code made redundant by same.	2016-08-11 08:52:27 +09:00
Ian Barwick	0d346a9f54	Update HISTORY Also remove code comment obsoleted by previous commit	2016-08-09 15:41:09 +09:00
Gianni Ciolli	abb16e4366	Now STANDBY SWITCHOVER and STANDBY FOLLOW log an event notification on success and also on some failures, precisely those when it makes sense or it is reasonably possible to do so.	2016-08-09 15:40:59 +09:00
Gianni Ciolli	59b1924d5b	Only collect remote command output if the caller requires it This addresses GitHub #216 and #167.	2016-08-09 15:34:57 +09:00
Ian Barwick	c88ea62643	Update HISTORY	2016-08-09 12:28:51 +09:00
Gianni Ciolli	5b91a5e2e5	Improved "repmgr-auto" Debian package * Version set to 3.2dev * Binaries are placed in PGBINDIR and then linked from /usr/bin, instead of being placed into /usr/bin directly. This is necessary for the switchover command, because it requires pg_rewind, which is placed in PGBINDIR too.	2016-08-09 12:28:22 +09:00
Ian Barwick	c2a1a35282	Bump version 3.1.5	2016-08-09 12:21:06 +09:00
Ian Barwick	2b8b74ae75	Update HISTORY	2016-08-09 12:20:38 +09:00
Ian Barwick	08ef4d4be6	Improve handling of failover events when `failover` is set to `manual` - prevent repmgrd from repeatedly executing the failover code - add event notification 'standby_disconnect_manual' - update documentation This addresses GitHub #221.	2016-08-09 12:20:20 +09:00
Ian Barwick	1a0049f086	repmgrd: prevent endless loops in failover with manual node The LSN reported by the shared memory function defaults to "0/0" (InvalidXLogRecPtr) - this indicates that the repmgrd on that node hasn't been able to update it yet. However during failover several places in the code assumed this is an error, which would cause an endless loop waiting for updates which would never come. To get around this without changing function definitions, we can store an explicit message in the shared memory location field so the caller can tell whether the other node hasn't yet updated the field, or encountered a situation which means it should not be considered as a promotion candidate (which in most cases will be because `failover` is set to `manual`). Resolves GitHub #222.	2016-08-09 12:20:03 +09:00
Ian Barwick	af6f0fc2cf	Fix repmgrd's command line help option parsing As in commit `d0c05e6f46`, properly distinguish between the command line option -? and getopt's unknown option marker '?'	2016-08-08 21:19:13 +09:00
Ian Barwick	893d67473d	Document `repmgr cluster show --csv`	2016-08-01 16:13:03 +09:00
Ian Barwick	a922cd5558	Suppress connection error display in `repmgr cluster show` This prevents connection error messages being mixed in with `repmgr cluster show` output. Error message output can still be enabled with the --verbose flag. Fixes GitHub #215	2016-08-01 15:01:23 +09:00
Ian Barwick	7bbc664230	Miscellaneous code cleanup and typo fixes	2016-07-28 16:39:32 +09:00
Ian Barwick	a6998fe0f9	Update README Default log level is NOTICE, not INFO.	2016-07-28 16:39:21 +09:00
Ian Barwick	dadfdcc51f	Rename RECOVERY_FILE to RECOVERY_COMMAND_FILE This is for consistency with the PostgreSQL source code (see: src/backend/access/transam/xlog.c ), but as it's not exported we need to define it ourselves anyway.	2016-07-26 09:21:38 +09:00
Ian Barwick	b8823d5c1f	Update README Add note about 2ndQuadrant RPM repository.	2016-07-13 09:55:34 +09:00
Ian Barwick	e59b57376d	Update code comments	2016-07-12 10:59:48 +09:00
Ian Barwick	3db87e6a31	Remove unused error code ERR_BAD_PASSWORD	2016-07-12 10:59:42 +09:00
Ian Barwick	94d05619c3	README: update error code list	2016-07-12 10:59:37 +09:00
Ian Barwick	807c7c926c	Update README with details about conninfo parameter handling From 3.1.4 `repmgr` will behave like other PostgreSQL utilities when handling database connection parameters, in particular accepting a conninfo string and honouring libpq connection defaults.	2016-07-12 10:59:30 +09:00
Ian Barwick	df68f1f3f6	Make more consistent use of conninfo parameters Removed the existing keyword array which has a fixed, limited number of parameters and replace it with a dynamic array which can be used to store as many parameters as reported by libpq.	2016-07-12 10:59:26 +09:00
Ian Barwick	d4c75bb6c7	Add missing space when setting "application_name"	2016-07-12 10:59:20 +09:00
Ian Barwick	94d4e1128d	Improve default host/dbname handling repmgr disallows socket connections anyway (the whole point of providing the host is to connect to a remote machine) so don't show that as a fallback default in the -?/--help output.	2016-07-12 10:59:13 +09:00
Ian Barwick	dbd82ba687	Enable a conninfo string to be passed to repmgr in the -d/--dbname parameter This matches the behaviour of other PostgreSQL utilities such as pg_basebackup, psql et al. Note that unlike psql, but like pg_basebackup, repmgr does not accept a "left-over" parameter as a conninfo string; this could be added later. Parameters specified in the conninfo string will override any parameters supplied correctly (e.g. `-d "host=foo"` will override `-h bar`).	2016-07-12 10:59:09 +09:00
Ian Barwick	0888fbc538	Generate "primary_conninfo" using the primary connection's parameters Having successfully connected to the primary, we can use the actual parameters reported by libpq to create "primary_conninfo", rather than the limited subset previously defined by repmgr. Assuming that the user can pass a conninfo string to repmgr (see following commit), this makes it possible to provide other connection parameters, e.g. related to SSL usage.	2016-07-12 10:59:05 +09:00
Ian Barwick	92a84bd950	Remove now-superfluous wildcard in rmtree() call	2016-07-07 09:54:50 +09:00
Ian Barwick	a3318d65d2	Bump version 3.1.4	2016-07-07 08:49:42 +09:00
Ian Barwick	374e9811c9	Merge branch 'master' of github.com:2ndQuadrant/repmgr into REL3_1_STABLE	2016-07-06 16:43:39 +09:00
Ian Barwick	16896510dc	Fix log formatting	2016-05-17 17:24:30 +09:00
Ian Barwick	1c155a1088	Update HISTORY	2016-05-17 11:12:18 +09:00
Ian Barwick	31d57f4122	Update Makefile Add include file dependencies (see caveat in file). Also update comments.	2016-05-16 19:15:58 +09:00
Ian Barwick	7b313b9d71	README.md: improve documentation of `repl_status` view	2016-05-16 13:51:08 +09:00
Ian Barwick	cf126642bd	repmgrd: handle situations where streaming replication is inactive	2016-05-16 12:31:31 +09:00
Ian Barwick	52281fcde8	repmgrd: rename some variables to better match the system functions they're populated from	2016-05-16 12:31:06 +09:00
Ian Barwick	de573edaaa	Remove extraneous PQfinish()	2016-05-16 12:23:39 +09:00
Ian Barwick	4cb7f301ad	Correct check for wal_level in 9.3	2016-05-16 12:23:33 +09:00
Ian Barwick	87d8de4441	Remove unneeded column	2016-05-16 12:23:25 +09:00
Ian Barwick	6db742f81e	repmgrd: better handling of missing upstream_node_id Ensure we default to master node.	2016-05-16 12:23:20 +09:00
Ian Barwick	c79933685c	Add missing newlines in log messages	2016-05-16 12:23:15 +09:00
Ian Barwick	04ba672b9f	repmgrd: avoid additional connection to local instance in do_master_failover()	2016-05-16 12:23:09 +09:00
Ian Barwick	4f4111063a	Suppress gnu_printf format warning	2016-05-16 12:23:03 +09:00
Ian Barwick	3a3a536e6d	repmgrd: rename variable for clarity	2016-05-16 12:22:58 +09:00
Ian Barwick	6f7206a5a1	Don't follow the promotion candidate standby if the primary reappears	2016-05-16 12:22:49 +09:00
Ian Barwick	f9fd1dd227	Don't terminate a standby's repmgrd if self-promotion fails due to master reappearing Per GitHub #173	2016-05-16 12:22:40 +09:00
Martin	8140ba9c27	This commit fixes problems not taken into account while working on the issue with rsync returning non-zero status on vanishing files in commit `83e5f98171`. Alvaro Herrera gave me some tips which pointed me in the correct direction. This was reported by sungjae lee <sj860908@gmail.com>	2016-05-16 12:22:27 +09:00
Ian Barwick	32dba444e1	Enable long option --pgdata as alias for -D/--data-dir pg_ctl provides -D/--pgdata; we want to be as close to the core utilities as possible.	2016-05-16 12:22:17 +09:00
Ian Barwick	8212ff8d8a	Bump version 3.1.3	2016-05-12 07:54:42 +09:00
Martin	1ccd0edad2	We were not checking the return code after rsyncing the tablespaces. This fixes #168	2016-04-17 17:59:50 -03:00
Martin	59b31dd1ca	Ignore rsync error code for vanished files. It's very common to come across vanished files during a backup or rsync of the data directory (dropped index, temp tables, etc.) This fixes #149	2016-04-17 17:59:50 -03:00
Ian Barwick	300b9f0cc2	Fix pre-9.6 wal_level check	2016-04-12 16:18:29 +09:00
Ian Barwick	0efee4cf65	Fix hint message formatting	2016-04-12 16:07:38 +09:00
Ian Barwick	0cb2584886	Bump version 3.1.2	2016-04-12 15:56:39 +09:00
Ian Barwick	b88d27248c	Use "immediately_reserve" parameter in pg_create_physical_replication_slot (9.6)	2016-04-12 15:56:06 +09:00
Ian Barwick	683c54325e	Enable repmgr to be compiled with PostgreSQL 9.6	2016-04-12 15:55:51 +09:00
Ian Barwick	70d398cd47	Update HISTORY	2016-04-12 15:53:40 +09:00
Ian Barwick	7b7d80e5f2	Update HISTORY	2016-04-12 15:53:33 +09:00
Ian Barwick	96b0e26084	Remove duplicate inclusion from header file	2016-04-06 14:16:00 +09:00
Ian Barwick	91c498f6f1	Update HISTORY	2016-04-06 11:57:46 +09:00
Ian Barwick	d48093e732	Preserve failover slots when cloning a standby, if enabled	2016-04-06 11:20:27 +09:00
Ian Barwick	3f0d1754a4	MAXFILENAME -> MAXPGPATH	2016-04-06 11:20:27 +09:00
Craig Ringer	f27979bbe1	WIP support for preserving failover slots	2016-04-06 11:20:27 +09:00
Ian Barwick	e9445a5d5e	Make self-referencing foreign key on repl_nodes table deferrable	2016-04-01 15:20:36 +09:00
Ian Barwick	9a2717b5e3	Improve debugging output for node resyncing We'll need this for testing.	2016-04-01 15:20:32 +09:00
Ian Barwick	dd6ea1cd77	Rename copy_configuration () to witness_copy_node_records() As it's witness-specific. Per suggestion from Martín.	2016-04-01 11:30:08 +09:00
Ian Barwick	de5908c122	Make witness server node update an atomic operation If the connection to the primary is lost, roll back to the previously known state. TRUNCATE is of course not MVCC-friendly, but that shouldn't matter here as only one process should ever be looking at this table.	2016-04-01 11:15:27 +09:00
Ian Barwick	4b5c84921c	Replace MAXFILENAME with MAXPGPATH	2016-03-31 20:11:10 +09:00
Ian Barwick	aaa8d70cef	Comment out configuration items in sample config file The configured values are either the defaults, or examples which may not work in a real environment. If this file is being used as a template, the onus is on the user to uncomment and check all desired parameters.	2016-03-31 15:14:30 +09:00
Gianni Ciolli	ca31b846e7	Rewording comment for clarity.	2016-03-31 15:01:29 +09:00
Ian Barwick	a27cecb559	Update HISTORY	2016-03-31 14:59:03 +09:00
Ian Barwick	cf0cdfa6a1	Bump version 3.1.2rc1	2016-03-31 14:56:49 +09:00
Ian Barwick	31489d92c0	Check directory entity filetype in a more portable way	2016-03-30 20:21:41 +09:00
Ian Barwick	b7fd13aed2	Fix pg_ctl path generation in do_standby_switchover()	2016-03-30 16:46:57 +09:00
Ian Barwick	3c4bf27aa7	Add headers as dependencies in Makefile	2016-03-30 15:06:15 +09:00
Ian Barwick	0ebd9c15d9	Regularly sync witness server repl_nodes table. Although the witness server will resync the repl_nodes table following a failover, other operations (e.g. removing or cloning a standby) were previously not reflected in the witness server's copy of this table. As a short-term workaround, automatically resync the table at regular intervals (defined by the configuration file parameter "witness_repl_nodes_sync_interval_secs", default 30 seconds).	2016-03-30 15:06:12 +09:00
Nikolay Shaplov	f9dba283d4	Better use /24 network mask in this example	2016-03-30 15:05:29 +09:00
Ian Barwick	205f1cebbb	It's unlikely this situation will occur on a witness server Which is why the error message is for master/standby only.	2016-03-30 15:05:26 +09:00
Ian Barwick	4d97c1ebf7	Add hint about registering the server after cloning it. This step is easy to forget.	2016-03-30 15:05:20 +09:00
Ian Barwick	12c395e91f	README: Add note about 'repmgr_funcs'	2016-03-30 15:05:17 +09:00
Ian Barwick	bd1e4f71d6	repmgrd: fix error message	2016-03-30 15:05:10 +09:00
Ian Barwick	cb49071ea4	Fix code comment	2016-03-30 15:05:06 +09:00
Ian Barwick	5ad674edff	Bump version 3.1.1	2016-02-23 15:56:24 +09:00
Ian Barwick	ac09bad89c	Minor fixes to README.md	2016-02-23 14:37:59 +09:00
Ian Barwick	009d92fec8	Ensure witness node is registered before the repl_nodes table is copied This fixes a bug introduced in the previous commit, where the witness node was registered last to prevent a spurious node record being created even if witness server creation failed.	2016-02-23 14:37:54 +09:00
Martin	b3d8a68a1d	Fix a few paragraphs from the README.md.	2016-02-23 14:37:48 +09:00
Ian Barwick	05b47cb2a8	Prevent repmgr/repmgrd running as root	2016-02-23 14:37:44 +09:00
Ian Barwick	dc542a1b7d	Better handling of errors during witness creation Ensure witness is only registered after all steps for creation have been successfully completed. Also write an event record if connection could not be made to the witness server after initial creation. This addresses GitHub issue #146.	2016-02-23 14:37:39 +09:00
Ian Barwick	6ce8058749	witness creation: extract database and user names from the local conninfo string 99.9% of the time they'll be the same as the primary connection, but it's more consistent to use the provided local conninfo string (from which the port is already extracted).	2016-02-23 14:37:31 +09:00
Ian Barwick	2edcac77f0	README.md: update witness server section	2016-02-23 14:37:27 +09:00
Ian Barwick	f740374392	Add '-P/--pwprompt' option for "repmgr create witness" Optionally prompt for superuser and repmgr user when creating a witness. This ensures a password can be provided if the primary's pg_hba.conf mandates it. This deprecates '--initdb-no-pwprompt'; and changes the default behaviour of "repmgr create witness", which previously required a superuser password unless '--initdb-no-pwprompt' was supplied. This behaviour is more consistent with other PostgreSQL utilities such as createuser. Partial fix for GitHub issue #145.	2016-02-23 14:37:23 +09:00
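
The approach described in commit 1a0049f086 (storing an explicit message in the shared memory location field, so the caller can tell "not yet updated" apart from "not a promotion candidate") can be illustrated with a minimal sketch. This is not the repmgrd implementation: the sentinel string values, the `location_state` enum and the `parse_lsn()` and `classify_location()` helpers are all hypothetical names introduced here for illustration, and the real `PASSIVE_NODE` and `LSN_QUERY_ERROR` definitions are not shown in the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Assumed sentinel strings; the real values live in repmgr's headers. */
#define SKETCH_PASSIVE_NODE    "PASSIVE_NODE"
#define SKETCH_LSN_QUERY_ERROR "LSN_QUERY_ERROR"

/* Hypothetical classification of the value read from shared memory. */
typedef enum
{
	LOC_VALID,        /* parsed to a non-zero LSN */
	LOC_NOT_UPDATED,  /* "0/0": remote repmgrd has not written yet, keep polling */
	LOC_EXCLUDED,     /* sentinel: node must not be considered for promotion */
	LOC_EMPTY,        /* empty string: nothing was returned at all */
	LOC_UNPARSEABLE   /* anything else: keep polling */
} location_state;

/* Parse "hi/lo" in hexadecimal; reject trailing characters. */
static bool
parse_lsn(const char *s, uint64_t *out)
{
	unsigned int hi, lo;
	char         trailing;

	if (sscanf(s, "%X/%X%c", &hi, &lo, &trailing) != 2)
		return false;

	*out = ((uint64_t) hi << 32) | lo;
	return true;
}

/* Decide what a value read from the location field means. */
static location_state
classify_location(const char *value)
{
	uint64_t lsn;

	assert(value != NULL);

	if (parse_lsn(value, &lsn))
		return (lsn == 0) ? LOC_NOT_UPDATED : LOC_VALID;

	if (strcmp(value, SKETCH_PASSIVE_NODE) == 0 ||
		strcmp(value, SKETCH_LSN_QUERY_ERROR) == 0)
		return LOC_EXCLUDED;

	if (*value == '\0')
		return LOC_EMPTY;

	return LOC_UNPARSEABLE;
}
```

In this sketch only `LOC_NOT_UPDATED` and `LOC_UNPARSEABLE` would justify another polling iteration, whereas `LOC_EXCLUDED` lets the caller mark the node as ready without treating it as a candidate, which is the distinction the commit message describes.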