repmgr command reference

Overview of repmgr commands.

repmgr primary register

repmgr primary register registers a primary node in a streaming replication cluster and configures it for use with repmgr, including installing the repmgr extension. This command needs to be executed before any standby nodes are registered.

Execute with the --dry-run option to check what would happen without actually registering the primary.

repmgr master register can be used as an alias for repmgr primary register.

repmgr primary unregister

repmgr primary unregister unregisters an inactive primary node from the repmgr metadata. This is typically done when the primary has failed and is being removed from the cluster after a new primary has been promoted.

Execute with the --dry-run option to check what would happen without actually unregistering the node.

repmgr master unregister can be used as an alias for repmgr primary unregister.

repmgr standby clone

repmgr standby clone clones a PostgreSQL node from another PostgreSQL node, typically the primary, but optionally from any other node in the cluster or from Barman. It creates the recovery.conf file required to attach the cloned node to the primary node (or another standby, if cascading replication is in use).

repmgr standby clone does not start the standby, and after cloning, repmgr standby register must be executed to notify repmgr of its presence.

Handling configuration files

Note that by default, all configuration files in the source node's data directory will be copied to the cloned node. Typically these will be postgresql.conf, postgresql.auto.conf, pg_hba.conf and pg_ident.conf. These may require modification before the standby is started.

In some cases (e.g. on Debian or Ubuntu Linux installations), PostgreSQL's configuration files are located outside of the data directory and will not be copied by default. repmgr can copy these files, either to the same location on the standby server (provided appropriate directory and file permissions are available), or into the standby's data directory. This requires passwordless SSH access to the primary server.

Add the option --copy-external-config-files to the repmgr standby clone command; by default, files will be copied to the same path as on the upstream server. Note that the user executing repmgr must have write access to those directories. To have the configuration files placed in the standby's data directory, specify --copy-external-config-files=pgdata, but note that any include directives in the copied files may need to be updated.

For reliable configuration file management we recommend using a configuration management tool such as Ansible, Chef, Puppet or Salt.
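As an illustrative sketch (the hostname node1, user and database name repmgr, and configuration file path are assumptions; substitute your own), a clone which also copies external configuration files might be invoked like this, run on the server which is to become the standby:

    $ repmgr -h node1 -U repmgr -d repmgr -f /etc/repmgr.conf standby clone --copy-external-config-files

Here -h/-U/-d identify the source node to clone from, and the data directory for the new standby is taken from the local repmgr.conf.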
Managing WAL during the cloning process

When initially cloning a standby, you will need to ensure that all required WAL files remain available while the cloning is taking place. To ensure this happens when using the default pg_basebackup method, repmgr will set pg_basebackup's --xlog-method parameter to stream, which will ensure all WAL files generated during the cloning process are streamed in parallel with the main backup. Note that this requires two replication connections to be available (repmgr will verify that sufficient connections are available before attempting to clone; this can also be checked before performing the clone using the --dry-run option).

To override this behaviour, in repmgr.conf set pg_basebackup's --xlog-method parameter to fetch:

    pg_basebackup_options='--xlog-method=fetch'

and ensure that wal_keep_segments is set to an appropriately high value. See the pg_basebackup documentation for details.

From PostgreSQL 10, pg_basebackup's --xlog-method parameter has been renamed to --wal-method.

repmgr standby register

repmgr standby register adds a standby's information to the repmgr metadata. This command needs to be executed to enable promote/follow operations and to allow repmgrd to work with the node. An existing standby can be registered using this command.

Execute with the --dry-run option to check what would happen without actually registering the standby.

Waiting for the registration to propagate to the standby

Depending on your environment and workload, it may take some time for the standby's node record to propagate from the primary to the standby. Some actions (such as starting repmgrd) require that the standby's node record is present and up-to-date to function correctly. By providing the option --wait-sync to the repmgr standby register command, repmgr will wait until the record is synchronised before exiting. An optional timeout (in seconds) can be added to this option (e.g. --wait-sync=60).

Registering an inactive node

Under some circumstances you may wish to register a standby which is not yet running; this can be the case when using provisioning tools to create a complex replication cluster. In this case, by using the -F/--force option and providing the connection parameters to the primary server, the standby can be registered.

Similarly, with cascading replication it may be necessary to register a standby whose upstream node has not yet been registered; in this case, using -F/--force will result in the creation of an inactive placeholder record for the upstream node, which will itself need to be registered later, also with the -F/--force option.

When used with repmgr standby register, care should be taken that use of the -F/--force option does not result in an incorrectly configured cluster.

repmgr standby unregister

repmgr standby unregister unregisters a standby with repmgr. This command does not affect the actual replication; it simply removes the standby's entry from the repmgr metadata.

To unregister a running standby, execute:

    repmgr standby unregister -f /etc/repmgr.conf

This will remove the standby record from repmgr's internal metadata table (repmgr.nodes). A standby_unregister event notification will be recorded in the repmgr.events table.

If the standby is not running, the command can be executed on another node by providing the id of the node to be unregistered via the command line parameter --node-id. For example, executing the following command on the primary server will unregister the standby with id 3:

    repmgr standby unregister -f /etc/repmgr.conf --node-id=3

repmgr standby promote

repmgr standby promote promotes a standby to a primary if the current primary has failed. This command requires a valid repmgr.conf file for the standby, either specified explicitly with -f/--config-file or located in a default location; no additional arguments are required.

If the standby promotion succeeds, the server will not need to be restarted. However, any other standbys will need to follow the new server using repmgr standby follow; if repmgrd is active, it will handle this automatically.
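For example (the configuration file path is illustrative), promotion is a single command executed on the standby to be promoted:

    $ repmgr -f /etc/repmgr.conf standby promote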
repmgr standby follow

repmgr standby follow attaches the standby to a new primary. This command requires a valid repmgr.conf file for the standby, either specified explicitly with -f/--config-file or located in a default location; no additional arguments are required.

This command will force a restart of the standby server, which must be running. It can only be used to attach a standby to a new primary node. To re-add an inactive node to the replication cluster, see repmgr node rejoin.

repmgr standby switchover

repmgr standby switchover promotes a standby to primary and demotes the existing primary to a standby. This command must be run on the standby to be promoted, and requires a passwordless SSH connection to the current primary.

If other standbys are connected to the demotion candidate, repmgr can instruct these to follow the new primary if the option --siblings-follow is specified.

Execute with the --dry-run option to test the switchover as far as possible without actually changing the status of either node.

repmgrd should not be active on any nodes while a switchover is being executed. This restriction may be lifted in a later version.

For more details, see the section on performing a switchover.

repmgr node status

repmgr node status displays an overview of a node's basic information and replication status. This command must be run on the local node.

Sample output (execute repmgr node status):

    Node "node1":
        PostgreSQL version: 10beta1
        Total data size: 30 MB
        Conninfo: host=node1 dbname=repmgr user=repmgr connect_timeout=2
        Role: primary
        WAL archiving: off
        Archive command: (none)
        Replication connections: 2 (of maximal 10)
        Replication slots: 0 (of maximal 10)
        Replication lag: n/a

See repmgr node check to diagnose issues.

repmgr node check

repmgr node check performs some health checks on a node from a replication perspective. This command must be run on the local node.

Sample output (execute repmgr node check):

    Node "node1":
        Server role: OK (node is primary)
        Replication lag: OK (N/A - node is primary)
        WAL archiving: OK (0 pending files)
        Downstream servers: OK (2 of 2 downstream nodes attached)
        Replication slots: OK (node has no replication slots)

Additionally, each check can be performed individually by supplying an additional command line parameter, e.g.:

    $ repmgr node check --role
    OK (node is primary)

Parameters for individual checks are as follows:

    --role: checks if the node has the expected role
    --replication-lag: checks if the node is lagging by more than replication_lag_warning or replication_lag_critical
    --archive-ready: checks for WAL files which have not yet been archived
    --downstream: checks that the expected downstream nodes are attached
    --slots: checks there are no inactive replication slots

Individual checks can also be output in a Nagios-compatible format by additionally providing the option --nagios.

repmgr node rejoin

repmgr node rejoin enables a dormant (stopped) node to be rejoined to the replication cluster. This can optionally use pg_rewind to re-integrate a node which has diverged from the rest of the cluster, typically a failed primary.
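A minimal sketch of rejoining a stopped, diverged node as a standby, assuming the new primary is reachable at node1 and that your repmgr version provides the --force-rewind option to invoke pg_rewind (check the repmgr node rejoin documentation for your version):

    $ repmgr node rejoin -f /etc/repmgr.conf -d 'host=node1 dbname=repmgr user=repmgr' --force-rewind

The -d parameter supplies the connection string of a node in the cluster to rejoin, here the current primary.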
repmgr cluster show

repmgr cluster show displays information about each active node in the replication cluster. The command polls each registered server directly and shows its role (primary / standby / bdr) and status. It can be run on any node in the cluster; this is also useful when analyzing connectivity from a particular node.

This command requires either a valid repmgr.conf file or a database connection string to one of the registered nodes; no additional arguments are needed.

Example:

    $ repmgr -f /etc/repmgr.conf cluster show

     ID | Name  | Role    | Status    | Upstream | Location | Connection string
    ----+-------+---------+-----------+----------+----------+-----------------------------------------
     1  | node1 | primary | * running |          | default  | host=db_node1 dbname=repmgr user=repmgr
     2  | node2 | standby |   running | node1    | default  | host=db_node2 dbname=repmgr user=repmgr
     3  | node3 | standby |   running | node1    | default  | host=db_node3 dbname=repmgr user=repmgr

To show database connection errors when polling nodes, run the command in --verbose mode.

The cluster show command accepts an optional parameter --csv, which outputs the replication cluster's status in a simple CSV format, suitable for parsing by scripts:

    $ repmgr -f /etc/repmgr.conf cluster show --csv
    1,-1,-1
    2,0,0
    3,0,1

The columns have the following meanings:

    node ID
    availability (0 = available, -1 = unavailable)
    recovery state (0 = not in recovery, 1 = in recovery, -1 = unknown)

Note that availability is tested by connecting from the node where repmgr cluster show is executed; a node reported as unavailable is therefore not necessarily down. See repmgr cluster matrix and repmgr cluster crosscheck to get a better overview of connections between nodes.

repmgr cluster matrix

repmgr cluster matrix runs repmgr cluster show on each node and arranges the results in a matrix, recording success or failure.

repmgr cluster matrix requires a valid repmgr.conf file on each node. Additionally, passwordless ssh connections are required between all nodes.

Example 1 (all nodes up):

    $ repmgr -f /etc/repmgr.conf cluster matrix

     Name  | Id |  1 |  2 |  3
    -------+----+----+----+----
     node1 |  1 |  * |  * |  *
     node2 |  2 |  * |  * |  *
     node3 |  3 |  * |  * |  *

Example 2 (node1 and node2 up, node3 down):

    $ repmgr -f /etc/repmgr.conf cluster matrix

     Name  | Id |  1 |  2 |  3
    -------+----+----+----+----
     node1 |  1 |  * |  * |  x
     node2 |  2 |  * |  * |  x
     node3 |  3 |  ? |  ? |  ?

Each row corresponds to one server, and indicates the result of testing an outbound connection from that server. Since node3 is down, all the entries in its row are filled with ?, meaning that we cannot test outbound connections from it. The other two nodes are up; the corresponding rows have x in the column corresponding to node3, meaning that inbound connections to that node have failed, and * in the columns corresponding to node1 and node2, meaning that inbound connections to these nodes have succeeded.

Example 3 (all nodes up, firewall dropping packets originating from node1 and directed to port 5432 on node3): running repmgr cluster matrix from node1 gives the following output:

    $ repmgr -f /etc/repmgr.conf cluster matrix

     Name  | Id |  1 |  2 |  3
    -------+----+----+----+----
     node1 |  1 |  * |  * |  x
     node2 |  2 |  * |  * |  *
     node3 |  3 |  ? |  ? |  ?

(Note that this may take some time, depending on the connect_timeout setting in the node conninfo strings; the default is one minute, which means that without modification the above command would take around two minutes to run. A lower value can be set in each node's conninfo string, as shown in the example below.)

The matrix tells us that we cannot connect from node1 to node3, and that (therefore) we don't know the state of any outbound connection from node3. In this case, the repmgr cluster crosscheck command will produce a more useful result.
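As a sketch, reusing the node names from the examples above, a conninfo string with a short connection timeout in repmgr.conf might look like this:

    conninfo='host=node1 dbname=repmgr user=repmgr connect_timeout=2'

With connect_timeout=2, a connection attempt to an unreachable node fails after two seconds rather than the one-minute default mentioned above, so matrix and crosscheck runs against down nodes complete much faster.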
repmgr cluster crosscheck

repmgr cluster crosscheck is similar to repmgr cluster matrix, but cross-checks connections between each combination of nodes. In "Example 3" of repmgr cluster matrix we have no information about the state of node3. However, by running repmgr cluster crosscheck it's possible to get a better overview of the cluster situation:

    $ repmgr -f /etc/repmgr.conf cluster crosscheck

     Name  | Id |  1 |  2 |  3
    -------+----+----+----+----
     node1 |  1 |  * |  * |  x
     node2 |  2 |  * |  * |  *
     node3 |  3 |  * |  * |  *

What happened is that repmgr cluster crosscheck merged its own repmgr cluster matrix with the repmgr cluster matrix output from node2; the latter is able to connect to node3 and can therefore determine the state of outbound connections from that node.

repmgr cluster cleanup

repmgr cluster cleanup purges monitoring history from the repmgr.monitoring_history table to prevent excessive table growth. Use the -k/--keep-history option to specify the number of days of monitoring history to retain. This command can be used manually or as a cronjob.

This command requires a valid repmgr.conf file for the node on which it is executed; no additional arguments are required.

Monitoring history will only be written if repmgrd is active and monitoring_history is set to true in repmgr.conf.
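For example, a daily cron job retaining 30 days of monitoring history might look like the following (the schedule and retention period are illustrative, and this assumes repmgr is on the cron PATH):

    0 1 * * * repmgr -f /etc/repmgr.conf cluster cleanup -k 30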