<para>
Start &repmgrd; on each standby and verify that it's running by examining the
log output, which at log level <literal>INFO</literal> will look like this:
<programlisting>
[2019-08-15 07:14:42] [NOTICE] repmgrd (repmgrd 5.0) starting up
[2019-08-15 07:14:42] [INFO] connecting to database "host=node2 dbname=repmgr user=repmgr connect_timeout=2"
INFO: set_repmgrd_pid(): provided pidfile is /var/run/repmgr/repmgrd-12.pid
[2019-08-15 07:14:42] [NOTICE] starting monitoring of node "node2" (ID: 2)
[2019-08-15 07:14:42] [INFO] monitoring connection to upstream node "node1" (ID: 1)</programlisting>
</para>
<para>
Each &repmgrd; should also have recorded its successful startup as an event:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster event --event=repmgrd_start
 Node ID | Name  | Event         | OK | Timestamp           | Details
---------+-------+---------------+----+---------------------+--------------------------------------------------------
 3       | node3 | repmgrd_start | t  | 2019-08-15 07:14:42 | monitoring connection to upstream node "node1" (ID: 1)
 2       | node2 | repmgrd_start | t  | 2019-08-15 07:14:41 | monitoring connection to upstream node "node1" (ID: 1)
 1       | node1 | repmgrd_start | t  | 2019-08-15 07:14:39 | monitoring cluster primary "node1" (ID: 1)</programlisting>
</para>
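<para>
The status of &repmgrd; on all nodes can also be checked in a single step with
<command>repmgr service status</command> (this command was named
<command>repmgr daemon status</command> in repmgr 4.x):
<programlisting>
$ repmgr -f /etc/repmgr.conf service status</programlisting>
</para>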
<para>
Now stop the current primary server with e.g.:
<programlisting>
/usr/pgsql-12/bin/pg_ctl -D /var/lib/pgsql/12/data -m immediate stop</programlisting>
</para>
<para>
The &repmgrd; log on each standby will show how the failure of the primary is
detected and a failover
decision is made. This is an extract from the log of a standby server (<literal>node2</literal>)
which has promoted itself to new primary after failure of the original primary (<literal>node1</literal>).
<programlisting>
[2019-08-15 07:27:50] [WARNING] unable to connect to upstream node "node1" (ID: 1)
[2019-08-15 07:27:50] [INFO] checking state of node 1, 1 of 3 attempts
[2019-08-15 07:27:50] [INFO] sleeping 5 seconds until next reconnection attempt
[2019-08-15 07:27:55] [INFO] checking state of node 1, 2 of 3 attempts
[2019-08-15 07:27:55] [INFO] sleeping 5 seconds until next reconnection attempt
[2019-08-15 07:28:00] [INFO] checking state of node 1, 3 of 3 attempts
[2019-08-15 07:28:00] [WARNING] unable to reconnect to node 1 after 3 attempts
[2019-08-15 07:28:00] [INFO] primary and this node have the same location ("default")
[2019-08-15 07:28:00] [INFO] local node's last receive lsn: 0/900CBF8
[2019-08-15 07:28:00] [INFO] node 3 last saw primary node 12 second(s) ago
[2019-08-15 07:28:00] [INFO] last receive LSN for sibling node "node3" (ID: 3) is: 0/900CBF8
[2019-08-15 07:28:00] [INFO] node "node3" (ID: 3) has same LSN as current candidate "node2" (ID: 2)
[2019-08-15 07:28:00] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 4 seconds
[2019-08-15 07:28:00] [NOTICE] promotion candidate is "node2" (ID: 2)
[2019-08-15 07:28:00] [NOTICE] this node is the winner, will now promote itself and inform other nodes
[2019-08-15 07:28:00] [INFO] promote_command is:
  "/usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf standby promote"
NOTICE: promoting standby to primary
DETAIL: promoting server "node2" (ID: 2) using "/usr/pgsql-12/bin/pg_ctl -w -D '/var/lib/pgsql/12/data' promote"
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node2" (ID: 2) was successfully promoted to primary
[2019-08-15 07:28:01] [INFO] 3 followers to notify
[2019-08-15 07:28:01] [NOTICE] notifying node "node3" (ID: 3) to follow node 2
INFO: node 3 received notification to follow node 2
[2019-08-15 07:28:01] [INFO] switching to primary monitoring mode
[2019-08-15 07:28:01] [NOTICE] monitoring cluster primary "node2" (ID: 2)</programlisting>
</para>
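<para>
The reconnection behaviour shown in the log above (three checks at five second
intervals) is determined by the <literal>reconnect_attempts</literal> and
<literal>reconnect_interval</literal> settings in <filename>repmgr.conf</filename>;
a configuration producing this behaviour might look like the following (illustrative
values, not defaults):
<programlisting>
failover=automatic
reconnect_attempts=3
reconnect_interval=5
promote_command='/usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf standby promote'
follow_command='/usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf standby follow'</programlisting>
</para>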
<para>
The original primary (<literal>node1</literal>) is now marked as failed, and the
failover has been recorded as a series of events, which can be displayed with
<command>repmgr cluster event</command>:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster event
 Node ID | Name  | Event                    | OK | Timestamp           | Details
---------+-------+--------------------------+----+---------------------+-------------------------------------------------------------
 3       | node3 | repmgrd_failover_follow  | t  | 2019-08-15 07:28:03 | node 3 now following new upstream node 2
 3       | node3 | standby_follow           | t  | 2019-08-15 07:28:02 | standby attached to upstream node "node2" (ID: 2)
 2       | node2 | repmgrd_reload           | t  | 2019-08-15 07:28:01 | monitoring cluster primary "node2" (ID: 2)
 2       | node2 | repmgrd_failover_promote | t  | 2019-08-15 07:28:01 | node 2 promoted to primary; old primary 1 marked as failed
 2       | node2 | standby_promote          | t  | 2019-08-15 07:28:01 | server "node2" (ID: 2) was successfully promoted to primary</programlisting>
</para>
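<para>
The same event history is stored in the <literal>repmgr.events</literal> table and
can be queried directly; for example, to list recent events for node 2:
<programlisting>
SELECT event, successful, event_timestamp, details
  FROM repmgr.events
 WHERE node_id = 2
 ORDER BY event_timestamp DESC;</programlisting>
</para>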
</sect1>