repmgrd operation

Pausing the repmgrd service
In normal operation, &repmgrd; monitors the state of the
PostgreSQL node it is running on, and will take appropriate action if problems
are detected, e.g. (if so configured) promoting the node to primary if the existing
primary has been determined to have failed.
However, &repmgrd; is unable to distinguish between
planned outages (such as performing a switchover
or installing PostgreSQL maintenance releases) and an actual server outage. In versions prior to
&repmgr; 4.2 it was necessary to stop &repmgrd; on all nodes (or at least
on all nodes where &repmgrd; is
configured for automatic failover)
to prevent &repmgrd; from making unintentional changes to the
replication cluster.
From &repmgr; 4.2, &repmgrd;
can now be "paused", i.e. instructed not to take any action such as performing a failover.
This can be done from any node in the cluster, removing the need to stop/restart
each &repmgrd; individually.
For major PostgreSQL upgrades, e.g. from PostgreSQL 11 to PostgreSQL 12,
&repmgrd; should be shut down completely and only started up
once the &repmgr; packages for the new PostgreSQL major version have been installed.
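As a sketch, the sequence for a major upgrade might look like the following (the systemd unit name and package name are illustrative and vary by platform and distribution):

```shell
# Stop repmgrd on every node before starting the PostgreSQL major upgrade
# (unit name "repmgrd" is an assumption; check your packaging)
sudo systemctl stop repmgrd

# ... perform the PostgreSQL major upgrade, e.g. from 11 to 12 ...

# Install the repmgr packages built for the new PostgreSQL major version
# (package name is illustrative, here Debian/Ubuntu-style)
sudo apt-get install postgresql-12-repmgr

# Only then start repmgrd again
sudo systemctl start repmgrd
```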
Prerequisites for pausing &repmgrd;
In order to be able to pause/unpause &repmgrd;, the following
prerequisites must be met:

- &repmgr; 4.2 or later must be installed on all nodes.
- The same major &repmgr; version (e.g. 4.2) must be installed on all nodes (and preferably the same minor version).
- PostgreSQL on all nodes must be accessible from the node where the pause/unpause operation is executed, using the conninfo string shown by repmgr cluster show.
These conditions are required for normal &repmgr; operation in any case.
Pausing/unpausing &repmgrd;
To pause &repmgrd;, execute repmgr service pause
(&repmgr; 4.2 - 4.4: repmgr daemon pause),
e.g.:
$ repmgr -f /etc/repmgr.conf service pause
NOTICE: node 1 (node1) paused
NOTICE: node 2 (node2) paused
NOTICE: node 3 (node3) paused
The state of &repmgrd; on each node can be checked with
repmgr service status
(&repmgr; 4.2 - 4.4: repmgr daemon status),
e.g.:
$ repmgr -f /etc/repmgr.conf service status
ID | Name | Role | Status | repmgrd | PID | Paused?
----+-------+---------+---------+---------+------+---------
1 | node1 | primary | running | running | 7851 | yes
2 | node2 | standby | running | running | 7889 | yes
3 | node3 | standby | running | running | 7918 | yes
If executing a switchover with repmgr standby switchover,
&repmgr; will automatically pause/unpause the &repmgrd; service as part of the switchover process.
If the primary (in this example, node1) is stopped, &repmgrd;
running on one of the standbys (here: node2) will react like this:
[2019-08-28 12:22:21] [WARNING] unable to connect to upstream node "node1" (node ID: 1)
[2019-08-28 12:22:21] [INFO] checking state of node 1, 1 of 5 attempts
[2019-08-28 12:22:21] [INFO] sleeping 1 seconds until next reconnection attempt
...
[2019-08-28 12:22:24] [INFO] sleeping 1 seconds until next reconnection attempt
[2019-08-28 12:22:25] [INFO] checking state of node 1, 5 of 5 attempts
[2019-08-28 12:22:25] [WARNING] unable to reconnect to node 1 after 5 attempts
[2019-08-28 12:22:25] [NOTICE] node is paused
[2019-08-28 12:22:33] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (node ID: 1) in degraded state
[2019-08-28 12:22:33] [DETAIL] repmgrd paused by administrator
[2019-08-28 12:22:33] [HINT] execute "repmgr service unpause" to resume normal failover mode
If the primary becomes available again (e.g. following a software upgrade), &repmgrd;
will automatically reconnect, e.g.:
[2019-08-28 12:25:41] [NOTICE] reconnected to upstream node "node1" (ID: 1) after 8 seconds, resuming monitoring
To unpause the &repmgrd; service, execute
repmgr service unpause
(&repmgr; 4.2 - 4.4: repmgr daemon unpause),
e.g.:
$ repmgr -f /etc/repmgr.conf service unpause
NOTICE: node 1 (node1) unpaused
NOTICE: node 2 (node2) unpaused
NOTICE: node 3 (node3) unpaused
If the previous primary is no longer accessible when &repmgrd;
is unpaused, no failover action will be taken. Instead, a new primary must be manually promoted using
repmgr standby promote,
and any standbys attached to the new primary with
repmgr standby follow.
This is to prevent execution of repmgr service unpause
resulting in the automatic promotion of a new primary, which may be a problem particularly
in larger clusters, where &repmgrd; could select a different promotion
candidate to the one intended by the administrator.
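Recovery in this situation might be sketched as follows (the configuration file path is illustrative):

```shell
# On the standby chosen by the administrator as the new primary:
repmgr -f /etc/repmgr.conf standby promote

# On each remaining standby, attach it to the new primary:
repmgr -f /etc/repmgr.conf standby follow

# Finally, resume normal failover mode across the cluster
# (repmgr 4.2 - 4.4: "repmgr daemon unpause"):
repmgr -f /etc/repmgr.conf service unpause
```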
Details on the &repmgrd; pausing mechanism
The pause state of each node is preserved across a PostgreSQL restart.
repmgr service pause and
repmgr service unpause can be
executed even if &repmgrd; is not running; in this case,
&repmgrd; will start up in whichever pause state has been set.
repmgr service pause and
repmgr service unpause do not start/stop &repmgrd;.
The commands repmgr daemon start
and repmgr daemon stop
(if correctly configured) can be used to start/stop
&repmgrd; on individual nodes.
repmgrd and paused WAL replay
If WAL replay has been paused (using pg_wal_replay_pause(),
on PostgreSQL 9.6 and earlier pg_xlog_replay_pause()),
in a failover situation &repmgrd; will
automatically resume WAL replay.
This is because if WAL replay is paused, but WAL is pending replay,
PostgreSQL cannot be promoted until WAL replay is resumed.
repmgr standby promote
will refuse to promote a node in this state, as the PostgreSQL
promote command will not be acted on until
WAL replay is resumed, leaving the cluster in a potentially
unstable state. In this case it is up to the user to
decide whether to resume WAL replay.
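To check and, if appropriate, resume WAL replay manually, the standard PostgreSQL recovery control functions can be used; the psql invocation below is illustrative (on PostgreSQL 9.6 and earlier the functions are named pg_is_xlog_replay_paused() and pg_xlog_replay_resume()):

```shell
# On the standby: check whether WAL replay is currently paused
psql -c "SELECT pg_is_wal_replay_paused()"

# Resume replay so that the node can subsequently be promoted
psql -c "SELECT pg_wal_replay_resume()"
```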
"degraded monitoring" mode
In certain circumstances, &repmgrd; is not able to fulfill its primary mission
of monitoring the node's upstream server. In these cases it enters "degraded monitoring"
mode, where &repmgrd; remains active but is waiting for the situation
to be resolved.
Situations where this happens are:
- a failover situation has occurred, but no nodes in the primary node's location are visible
- a failover situation has occurred, but no promotion candidate is available
- a failover situation has occurred, but the promotion candidate could not be promoted
- a failover situation has occurred, but the node was unable to follow the new primary
- a failover situation has occurred, but no primary has become available
- a failover situation has occurred, but automatic failover is not enabled for the node
- repmgrd is monitoring the primary node, but it is not available (and no other node has been promoted as primary)
Example output in a situation where there is only one standby with failover=manual,
and the primary node is unavailable (but is later restarted):
[2017-08-29 10:59:19] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in normal state (automatic failover disabled)
[2017-08-29 10:59:33] [WARNING] unable to connect to upstream node "node1" (ID: 1)
[2017-08-29 10:59:33] [INFO] checking state of node 1, 1 of 5 attempts
[2017-08-29 10:59:33] [INFO] sleeping 1 seconds until next reconnection attempt
(...)
[2017-08-29 10:59:37] [INFO] checking state of node 1, 5 of 5 attempts
[2017-08-29 10:59:37] [WARNING] unable to reconnect to node 1 after 5 attempts
[2017-08-29 10:59:37] [NOTICE] this node is not configured for automatic failover so will not be considered as promotion candidate
[2017-08-29 10:59:37] [NOTICE] no other nodes are available as promotion candidate
[2017-08-29 10:59:37] [HINT] use "repmgr standby promote" to manually promote this node
[2017-08-29 10:59:37] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in degraded state (automatic failover disabled)
[2017-08-29 10:59:53] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in degraded state (automatic failover disabled)
[2017-08-29 11:00:45] [NOTICE] reconnected to upstream node "node1" (ID: 1) after 68 seconds, resuming monitoring
[2017-08-29 11:00:57] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in normal state (automatic failover disabled)
By default, &repmgrd; will continue in degraded monitoring mode indefinitely.
However, a timeout (in seconds) can be set with degraded_monitoring_timeout,
after which &repmgrd; will terminate.
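For example, to have &repmgrd; exit after one hour in degraded monitoring mode rather than waiting indefinitely, repmgr.conf could contain (the value is illustrative):

```ini
# terminate repmgrd after 3600 seconds of degraded monitoring
degraded_monitoring_timeout=3600
```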
If &repmgrd; is monitoring a primary node which has been stopped
and manually restarted as a standby attached to a new primary, it will automatically detect
the status change and update the node record to reflect the node's new status
as an active standby. It will then resume monitoring the node as a standby.
Storing monitoring data
When &repmgrd; is running with the option monitoring_history=true,
it will constantly write standby node status information to the
monitoring_history table, providing a near-real time
overview of replication status on all nodes
in the cluster.
The view replication_status shows the most recent state
for each node, e.g.:
repmgr=# select * from repmgr.replication_status;
-[ RECORD 1 ]-------------+------------------------------
primary_node_id | 1
standby_node_id | 2
standby_name | node2
node_type | standby
active | t
last_monitor_time | 2017-08-24 16:28:41.260478+09
last_wal_primary_location | 0/6D57A00
last_wal_standby_location | 0/5000000
replication_lag | 29 MB
replication_time_lag | 00:00:11.736163
apply_lag | 15 MB
communication_time_lag | 00:00:01.365643
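The replication_lag value is simply the byte difference between the two LSNs shown above: a PostgreSQL LSN of the form "X/Y" encodes a 64-bit WAL position as two hexadecimal halves (server-side, pg_wal_lsn_diff() performs the same calculation). As an illustrative sketch:

```shell
# Convert an LSN of the form "X/Y" (two hex halves) to an absolute byte position
lsn_to_bytes() {
  local hi=${1%%/*} lo=${1##*/}
  echo $(( 16#$hi * 4294967296 + 16#$lo ))
}

# Using the values from the record above:
primary=$(lsn_to_bytes 0/6D57A00)   # last_wal_primary_location
standby=$(lsn_to_bytes 0/5000000)   # last_wal_standby_location
echo $(( primary - standby ))       # 30767616 bytes, i.e. the ~29 MB shown above
```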
The interval in which monitoring history is written is controlled by the
configuration parameter monitor_interval_secs;
default is 2.
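For example, to enable monitoring and reduce the volume of data written by sampling every 10 seconds instead of the default 2, repmgr.conf might contain (the interval is illustrative):

```ini
monitoring_history=true
# write a monitoring sample every 10 seconds (default: 2)
monitor_interval_secs=10
```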
As this can generate a large amount of monitoring data in the table
repmgr.monitoring_history, it's advisable to regularly
purge historical data using the repmgr cluster cleanup
command; use the -k/--keep-history option to
specify how many days' worth of data should be retained.
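This purging can be done with repmgr cluster cleanup; for example, to keep the most recent 30 days of monitoring history (the retention period and configuration file path are illustrative):

```shell
# purge repmgr.monitoring_history, retaining 30 days of data
repmgr -f /etc/repmgr.conf cluster cleanup --keep-history 30
```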
It's possible to run &repmgrd; in monitoring
mode only (without automatic failover capability) for some or all
nodes by setting failover=manual in the node's
repmgr.conf file. In the event of the node's upstream failing,
no failover action will be taken and the node will require manual intervention to
be reattached to replication. If this occurs, an
event notification
standby_disconnect_manual will be created.
Note that when a standby node is not streaming directly from its upstream
node, e.g. recovering WAL from an archive, apply_lag will always appear as
0 bytes.
If monitoring history is enabled, the contents of the repmgr.monitoring_history
table will be replicated to attached standbys. This means there will be a small but
constant stream of replication activity which may not be desirable. To prevent
this, convert the table to an UNLOGGED one with:
ALTER TABLE repmgr.monitoring_history SET UNLOGGED;
This will however mean that monitoring history will not be available on
another node following a failover, and the view repmgr.replication_status
will not work on standbys.