Performing a switchover with repmgr

A typical use-case for replication is a combination of primary and standby server, with the standby serving as a backup which can easily be activated in case of a problem with the primary. Such an unplanned failover would normally be handled by promoting the standby, after which appropriate action must be taken to restore the old primary.

In some cases, however, it's desirable to promote the standby in a planned way, e.g. so that maintenance can be performed on the primary; this kind of switchover is supported by the repmgr standby switchover command.

repmgr standby switchover differs from other repmgr actions in that it also performs actions on another server (the demotion candidate), which means passwordless SSH access to that server is required from the one where repmgr standby switchover is executed.

repmgr standby switchover performs a relatively complex series of operations on two servers, and should therefore be performed after careful preparation and with adequate attention. In particular, you should be confident that your network environment is stable and reliable.

Additionally, you should be sure that the current primary can be shut down quickly and cleanly. To that end, access from applications should be minimized or preferably blocked completely. Also be aware that if there is a backlog of files waiting to be archived, PostgreSQL will not shut down until archiving completes.

We recommend running repmgr standby switchover at the most verbose logging level (--log-level=DEBUG --verbose) and capturing all output to assist in troubleshooting any problems.

Please also read carefully the sections Preparing for switchover and Caveats below.

Preparing for switchover

As mentioned above, the success of the switchover operation depends on repmgr being able to shut down the current primary server quickly and cleanly.
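Since repmgr standby switchover runs commands on the demotion candidate over SSH, it can be worth confirming non-interactive SSH access before going any further. repmgr's own --dry-run already reports whether its SSH connection succeeded; the bash function below is only a standalone sketch for checking this earlier. The function name check_passwordless_ssh and the example target postgres@node1 are illustrative assumptions, not part of repmgr; BatchMode and ConnectTimeout are standard OpenSSH options.

```shell
#!/usr/bin/env bash
# Pre-flight check (hypothetical helper, not part of repmgr): confirm that
# non-interactive SSH works from this standby to the demotion candidate.
check_passwordless_ssh() {
  local target=$1   # e.g. "postgres@node1"; user and host are assumptions
  # BatchMode=yes makes ssh fail instead of prompting for a password or
  # passphrase; ConnectTimeout avoids hanging on an unreachable host.
  if ssh -o BatchMode=yes -o ConnectTimeout=5 "$target" true; then
    echo "OK: passwordless SSH to $target works"
  else
    echo "ERROR: passwordless SSH to $target failed" >&2
    return 1
  fi
}
```

Run it as the operating system user that will execute repmgr (often postgres), e.g. check_passwordless_ssh postgres@node1. If --siblings-follow will be used, the same check can be repeated for each sibling standby.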
Double-check which commands will be used to stop, start and restart the current primary; on the primary, execute:

    repmgr -f /etc/repmgr.conf node service --list --action=stop
    repmgr -f /etc/repmgr.conf node service --list --action=start
    repmgr -f /etc/repmgr.conf node service --list --action=restart

On systemd systems we strongly recommend using the appropriate systemctl commands (typically run via sudo) to ensure systemd is informed about the status of the PostgreSQL service.

Check that access from applications is minimized or preferably blocked completely, so applications are not unexpectedly interrupted.

Check that there is no significant replication lag on standbys attached to the current primary.

If WAL file archiving is set up, check that there is no backlog of files waiting to be archived, as PostgreSQL will not complete its shutdown until all of these have been archived. If the backlog exceeds archive_ready_warning WAL files, `repmgr` will emit a warning before attempting to perform a switchover; you can also check manually with repmgr node check --archive-ready.

Ensure that repmgrd is *not* running anywhere, to prevent it from unintentionally promoting a node.

Finally, consider executing repmgr standby switchover with the --dry-run option; this will perform any necessary checks, inform you about success or failure, and stop before the first actual command is run (which would be the shutdown of the current primary).
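To follow the recommendation of running at the most verbose logging level and capturing all output, a small wrapper can help. The bash sketch below is hypothetical: the function name run_switchover, the REPMGR_CONF variable and the switchover-<timestamp>.log file name are assumptions for illustration. It passes --log-level=DEBUG --verbose plus any extra arguments through to repmgr standby switchover and uses tee to keep a copy of the output, while set -o pipefail preserves repmgr's exit status.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of repmgr): run "repmgr standby switchover"
# at the most verbose logging level and keep a copy of all output.
REPMGR_CONF=${REPMGR_CONF:-/etc/repmgr.conf}

run_switchover() (
  # Subshell body: pipefail only affects this function, and makes the
  # function's exit status reflect repmgr's status rather than tee's.
  set -o pipefail
  logfile="switchover-$(date +%Y%m%d-%H%M%S).log"
  echo "capturing output in $logfile" >&2
  # Extra arguments (e.g. --dry-run, --siblings-follow) are passed through.
  repmgr -f "$REPMGR_CONF" standby switchover \
    --log-level=DEBUG --verbose "$@" 2>&1 | tee "$logfile"
)
```

For example, run run_switchover --dry-run --siblings-follow first and, only if the checks pass, run_switchover --siblings-follow.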
Example output from a dry run:

    $ repmgr standby switchover -f /etc/repmgr.conf --siblings-follow --dry-run
    NOTICE: checking switchover on node "node2" (ID: 2) in --dry-run mode
    INFO: SSH connection to host "localhost" succeeded
    INFO: archive mode is "off"
    INFO: replication lag on this standby is 0 seconds
    INFO: all sibling nodes are reachable via SSH
    NOTICE: local node "node2" (ID: 2) will be promoted to primary; current primary "node1" (ID: 1) will be demoted to standby
    INFO: following shutdown command would be run on node "node1": "pg_ctl -l /var/log/postgresql/startup.log -D '/var/lib/postgresql/data' -m fast -W stop"

Executing the switchover command

To demonstrate switchover, we will assume a replication cluster with a primary (node1) and one standby (node2); after the switchover, node2 should become the primary, with node1 following it.

The switchover command must be run from the standby which is to be promoted, and in its simplest form looks like this:

    $ repmgr -f /etc/repmgr.conf standby switchover
    NOTICE: executing switchover on node "node2" (ID: 2)
    INFO: searching for primary node
    INFO: checking if node 1 is primary
    INFO: current primary node is 1
    INFO: SSH connection to host "localhost" succeeded
    INFO: archive mode is "off"
    INFO: replication lag on this standby is 0 seconds
    NOTICE: local node "node2" (ID: 2) will be promoted to primary; current primary "node1" (ID: 1) will be demoted to standby
    NOTICE: stopping current primary node "node1" (ID: 1)
    NOTICE: issuing CHECKPOINT
    DETAIL: executing server command "pg_ctl -l /var/log/postgres/startup.log -D '/var/lib/pgsql/data' -m fast -W stop"
    INFO: checking primary status; 1 of 6 attempts
    NOTICE: current primary has been cleanly shut down at location 0/3001460
    NOTICE: promoting standby to primary
    DETAIL: promoting server "node2" (ID: 2) using "pg_ctl -l /var/log/postgres/startup.log -w -D '/var/lib/pgsql/data' promote"
    server promoting
    NOTICE: STANDBY PROMOTE successful
    DETAIL: server "node2" (ID: 2) was successfully promoted to primary
    INFO: setting node 1's primary to node 2
    NOTICE: starting server using "pg_ctl -l /var/log/postgres/startup.log -w -D '/var/lib/pgsql/data' restart"
    NOTICE: NODE REJOIN successful
    DETAIL: node 1 is now attached to node 2
    NOTICE: switchover was successful
    DETAIL: node "node2" is now primary
    NOTICE: STANDBY SWITCHOVER is complete

The old primary is now replicating as a standby from the new primary, and the cluster status will now look like this:

    $ repmgr -f /etc/repmgr.conf cluster show
     ID | Name  | Role    | Status    | Upstream | Location | Connection string
    ----+-------+---------+-----------+----------+----------+--------------------------------------
     1  | node1 | standby |   running | node2    | default  | host=node1 dbname=repmgr user=repmgr
     2  | node2 | primary | * running |          | default  | host=node2 dbname=repmgr user=repmgr

Caveats

If using PostgreSQL 9.3 or 9.4, you should ensure that the shutdown command is configured to use PostgreSQL's fast shutdown mode (the default in 9.5 and later). If relying on pg_ctl to perform database server operations, you should include -m fast in pg_ctl_options in repmgr.conf.

If pg_rewind will be used to reintegrate the demoted primary, note that pg_rewind *requires* that either wal_log_hints is enabled, or that data checksums were enabled when the cluster was initialized. See the pg_rewind documentation for details.

repmgrd should not be running with the setting failover=automatic in repmgr.conf when a switchover is carried out; otherwise the repmgrd daemon may try to promote a standby by itself.

We hope to remove some of these restrictions in future versions of `repmgr`.
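Some of these caveats concern repmgr.conf and can be checked mechanically before a switchover. The bash function below is a hypothetical sketch (check_switchover_conf is not a repmgr command): it greps repmgr.conf for failover=automatic and for a pg_ctl_options value containing -m fast, and returns the number of findings. It only inspects the configuration file; whether repmgrd is actually running still has to be checked on each node, and the -m fast finding can be ignored on PostgreSQL 9.5 and later or when pg_ctl is not used. The regular expressions assume simple key=value lines and do not recognize every equivalent form (for example --mode=fast).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of repmgr): flag repmgr.conf settings that
# the caveats above warn about. The exit status is the number of findings.
check_switchover_conf() {
  local conf=${1:-/etc/repmgr.conf} findings=0
  # repmgrd with failover=automatic could promote a standby on its own
  # while the switchover is in progress.
  if grep -Eq "^[[:space:]]*failover[[:space:]]*=[[:space:]]*'?automatic" "$conf"; then
    echo "WARNING: failover=automatic in $conf; make sure repmgrd is stopped" >&2
    findings=$((findings + 1))
  fi
  # Only relevant on PostgreSQL 9.3/9.4 when pg_ctl is used: fast shutdown
  # is not the default there, so pg_ctl_options should contain "-m fast".
  if ! grep -Eq "^[[:space:]]*pg_ctl_options[[:space:]]*=.*-m[[:space:]]*fast" "$conf"; then
    echo "NOTE: pg_ctl_options does not contain '-m fast' (matters on 9.3/9.4)" >&2
    findings=$((findings + 1))
  fi
  return "$findings"
}
```

For example: check_switchover_conf /etc/repmgr.conf || echo "review repmgr.conf before switching over".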