From fbdf9617faa99c837ecdb49bd6a0e763f3dde4d8 Mon Sep 17 00:00:00 2001 From: Ian Barwick Date: Fri, 15 Mar 2019 15:43:11 +0900 Subject: [PATCH] doc: update repmgrd example output --- doc/repmgrd-overview.sgml | 103 ++++++++++++++++++++------------------ repmgr.conf.sample | 7 +-- 2 files changed, 58 insertions(+), 52 deletions(-) diff --git a/doc/repmgrd-overview.sgml b/doc/repmgrd-overview.sgml index 5be2805a..613b2479 100644 --- a/doc/repmgrd-overview.sgml +++ b/doc/repmgrd-overview.sgml @@ -22,12 +22,12 @@ and two standbys streaming directly from the primary) so that the cluster looks something like this: - $ repmgr -f /etc/repmgr.conf cluster show - ID | Name | Role | Status | Upstream | Location | Connection string - ----+-------+---------+-----------+----------+----------+-------------------------------------- - 1 | node1 | primary | * running | | default | host=node1 dbname=repmgr user=repmgr - 2 | node2 | standby | running | node1 | default | host=node2 dbname=repmgr user=repmgr - 3 | node3 | standby | running | node1 | default | host=node3 dbname=repmgr user=repmgr + $ repmgr -f /etc/repmgr.conf cluster show --compact + ID | Name | Role | Status | Upstream | Location | Prio. 
+ ----+-------+---------+-----------+----------+----------+------- + 1 | node1 | primary | * running | | default | 100 + 2 | node2 | standby | running | node1 | default | 100 + 3 | node3 | standby | running | node1 | default | 100 @@ -40,10 +40,11 @@ Start repmgrd on each standby and verify that it's running by examining the log output, which at log level INFO will look like this: - [2017-08-24 17:31:00] [NOTICE] using configuration file "/etc/repmgr.conf" - [2017-08-24 17:31:00] [INFO] connecting to database "host=node2 dbname=repmgr user=repmgr" - [2017-08-24 17:31:00] [NOTICE] starting monitoring of node node2 (ID: 2) - [2017-08-24 17:31:00] [INFO] monitoring connection to upstream node "node1" (node ID: 1) + [2019-03-15 06:32:05] [NOTICE] repmgrd (repmgrd 4.3) starting up + [2019-03-15 06:32:05] [INFO] connecting to database "host=node2 dbname=repmgr user=repmgr connect_timeout=2" + INFO: set_repmgrd_pid(): provided pidfile is /var/run/repmgr/repmgrd-11.pid + [2019-03-15 06:32:05] [NOTICE] starting monitoring of node "node2" (ID: 2) + [2019-03-15 06:32:05] [INFO] monitoring connection to upstream node "node1" (node ID: 1) Each repmgrd should also have recorded its successful startup as an event: @@ -51,9 +52,9 @@ $ repmgr -f /etc/repmgr.conf cluster event --event=repmgrd_start Node ID | Name | Event | OK | Timestamp | Details ---------+-------+---------------+----+---------------------+------------------------------------------------------------- - 3 | node3 | repmgrd_start | t | 2017-08-24 17:35:54 | monitoring connection to upstream node "node1" (node ID: 1) - 2 | node2 | repmgrd_start | t | 2017-08-24 17:35:50 | monitoring connection to upstream node "node1" (node ID: 1) - 1 | node1 | repmgrd_start | t | 2017-08-24 17:35:46 | monitoring cluster primary "node1" (node ID: 1) + 3 | node3 | repmgrd_start | t | 2019-03-14 04:17:30 | monitoring connection to upstream node "node1" (node ID: 1) + 2 | node2 | repmgrd_start | t | 2019-03-14 04:11:47 | monitoring 
connection to upstream node "node1" (node ID: 1) + 1 | node1 | repmgrd_start | t | 2019-03-14 04:04:31 | monitoring cluster primary "node1" (node ID: 1) Now stop the current primary server with e.g.: @@ -67,55 +68,59 @@ decision is made. This is an extract from the log of a standby server (node2) which has promoted to new primary after failure of the original primary (node1). - [2017-08-24 23:32:01] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state - [2017-08-24 23:32:08] [WARNING] unable to connect to upstream node "node1" (node ID: 1) - [2017-08-24 23:32:08] [INFO] checking state of node 1, 1 of 5 attempts - [2017-08-24 23:32:08] [INFO] sleeping 1 seconds until next reconnection attempt - [2017-08-24 23:32:09] [INFO] checking state of node 1, 2 of 5 attempts - [2017-08-24 23:32:09] [INFO] sleeping 1 seconds until next reconnection attempt - [2017-08-24 23:32:10] [INFO] checking state of node 1, 3 of 5 attempts - [2017-08-24 23:32:10] [INFO] sleeping 1 seconds until next reconnection attempt - [2017-08-24 23:32:11] [INFO] checking state of node 1, 4 of 5 attempts - [2017-08-24 23:32:11] [INFO] sleeping 1 seconds until next reconnection attempt - [2017-08-24 23:32:12] [INFO] checking state of node 1, 5 of 5 attempts - [2017-08-24 23:32:12] [WARNING] unable to reconnect to node 1 after 5 attempts - INFO: setting voting term to 1 - INFO: node 2 is candidate - INFO: node 3 has received request from node 2 for electoral term 1 (our term: 0) - [2017-08-24 23:32:12] [NOTICE] this node is the winner, will now promote self and inform other nodes - INFO: connecting to standby database - NOTICE: promoting standby - DETAIL: promoting server using 'pg_ctl -l /var/log/postgres/startup.log -w -D '/var/lib/pgsql/data' promote' - INFO: reconnecting to promoted server + [2019-03-15 06:37:50] [WARNING] unable to connect to upstream node "node1" (node ID: 1) + [2019-03-15 06:37:50] [INFO] checking state of node 1, 1 of 3 attempts + 
[2019-03-15 06:37:50] [INFO] sleeping 5 seconds until next reconnection attempt + [2019-03-15 06:37:55] [INFO] checking state of node 1, 2 of 3 attempts + [2019-03-15 06:37:55] [INFO] sleeping 5 seconds until next reconnection attempt + [2019-03-15 06:38:00] [INFO] checking state of node 1, 3 of 3 attempts + [2019-03-15 06:38:00] [WARNING] unable to reconnect to node 1 after 3 attempts + [2019-03-15 06:38:00] [INFO] primary and this node have the same location ("default") + [2019-03-15 06:38:00] [INFO] local node's last receive lsn: 0/900CBF8 + [2019-03-15 06:38:00] [INFO] node 3 last saw primary node 12 second(s) ago + [2019-03-15 06:38:00] [INFO] last receive LSN for sibling node "node3" (ID: 3) is: 0/900CBF8 + [2019-03-15 06:38:00] [INFO] node "node3" (ID: 3) has same LSN as current candidate "node2" (ID: 2) + [2019-03-15 06:38:00] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 4 seconds + [2019-03-15 06:38:00] [NOTICE] promotion candidate is "node2" (ID: 2) + [2019-03-15 06:38:00] [NOTICE] this node is the winner, will now promote itself and inform other nodes + [2019-03-15 06:38:00] [INFO] promote_command is: + "/usr/pgsql-11/bin/repmgr -f /etc/repmgr/11/repmgr.conf standby promote" + NOTICE: promoting standby to primary + DETAIL: promoting server "node2" (ID: 2) using "/usr/pgsql-11/bin/pg_ctl -w -D '/var/lib/pgsql/11/data' promote" + NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete NOTICE: STANDBY PROMOTE successful - DETAIL: node 2 was successfully promoted to primary + DETAIL: server "node2" (ID: 2) was successfully promoted to primary + [2019-03-15 06:38:01] [INFO] 3 followers to notify + [2019-03-15 06:38:01] [NOTICE] notifying node "node3" (node ID: 3) to follow node 2 INFO: node 3 received notification to follow node 2 - [2017-08-24 23:32:13] [INFO] switching to primary monitoring mode + [2019-03-15 06:38:01] [INFO] switching to primary monitoring mode + [2019-03-15 
06:38:01] [NOTICE] monitoring cluster primary "node2" (node ID: 2) The cluster status will now look like this, with the original primary (node1) marked as inactive, and standby node3 now following the new primary (node2): - $ repmgr -f /etc/repmgr.conf cluster show - ID | Name | Role | Status | Upstream | Location | Connection string - ----+-------+---------+-----------+----------+----------+---------------------------------------------------- - 1 | node1 | primary | - failed | | default | host=node1 dbname=repmgr user=repmgr - 2 | node2 | primary | * running | | default | host=node2 dbname=repmgr user=repmgr - 3 | node3 | standby | running | node2 | default | host=node3 dbname=repmgr user=repmgr + $ repmgr -f /etc/repmgr.conf cluster show --compact + ID | Name | Role | Status | Upstream | Location | Prio. + ----+-------+---------+-----------+----------+----------+------- + 1 | node1 | primary | - failed | | default | 100 + 2 | node2 | primary | * running | | default | 100 + 3 | node3 | standby | running | node2 | default | 100 - repmgr cluster event will display a summary of what happened to each server - during the failover: + repmgr cluster event will display a summary of + what happened to each server during the failover: $ repmgr -f /etc/repmgr.conf cluster event - Node ID | Name | Event | OK | Timestamp | Details - ---------+-------+--------------------------+----+---------------------+----------------------------------------------------------------------------------- - 3 | node3 | repmgrd_failover_follow | t | 2017-08-24 23:32:16 | node 3 now following new upstream node 2 - 3 | node3 | standby_follow | t | 2017-08-24 23:32:16 | node 3 is now attached to node 2 - 2 | node2 | repmgrd_failover_promote | t | 2017-08-24 23:32:13 | node 2 promoted to primary; old primary 1 marked as failed - 2 | node2 | standby_promote | t | 2017-08-24 23:32:13 | node 2 was successfully promoted to primary + Node ID | Name | Event | OK | Timestamp | Details + 
---------+-------+----------------------------+----+---------------------+------------------------------------------------------------- + 3 | node3 | repmgrd_failover_follow | t | 2019-03-15 06:38:03 | node 3 now following new upstream node 2 + 3 | node3 | standby_follow | t | 2019-03-15 06:38:02 | standby attached to upstream node "node2" (node ID: 2) + 2 | node2 | repmgrd_reload | t | 2019-03-15 06:38:01 | monitoring cluster primary "node2" (node ID: 2) + 2 | node2 | repmgrd_failover_promote | t | 2019-03-15 06:38:01 | node 2 promoted to primary; old primary 1 marked as failed + 2 | node2 | standby_promote | t | 2019-03-15 06:38:01 | server "node2" (ID: 2) was successfully promoted to primary diff --git a/repmgr.conf.sample b/repmgr.conf.sample index 8bbb3fbf..fac18391 100644 --- a/repmgr.conf.sample +++ b/repmgr.conf.sample @@ -7,7 +7,8 @@ # parameter will be treated as empty or false. # # IMPORTANT: string values can be provided as-is, or enclosed in single quotes -# (but not double-quotes, which will be interpreted as part of the string), e.g.: +# (but not double-quotes, which will be interpreted as part of the string), +# e.g.: # # node_name=foo # node_name = 'foo' @@ -24,9 +25,9 @@ # using the server's hostname or another identifier # unambiguously associated with the server to avoid # confusion. Avoid choosing names which reflect the - # node's current role, e.g. "primary" or "standby1", + # node's current role, e.g. 'primary' or 'standby1', # as roles can change and it will be confusing if - # the current primary is called "standby1". + # the current primary is called 'standby1'. #conninfo='' # Database connection information as a conninfo string. # All servers in the cluster must be able to connect to
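As a side note on the repmgr.conf.sample hunk above: the quoting rule it documents (and which motivates the switch from double to single quotes in the node_name comment) can be sketched as follows. This is an illustrative fragment only; the node name is hypothetical and a real file would contain a single node_name line.

```ini
# String values may be given bare or enclosed in single quotes;
# double quotes are NOT stripped and become part of the value.
node_id=2
node_name=node2
# Equivalent to the above (single quotes are stripped):
node_name='node2'
# NOT equivalent: the stored value would include the literal
# double-quote characters, which is almost certainly unintended:
node_name="node2"
```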