Split up command reference

This commit is contained in:
Ian Barwick
2017-10-05 11:48:21 +09:00
parent da47eb4bff
commit d156de533d
18 changed files with 621 additions and 612 deletions

View File

@@ -1,610 +0,0 @@
<chapter id="command-reference" xreflabel="command reference">
<title>repmgr command reference</title>
<para>
Overview of repmgr commands.
</para>
<sect1 id="repmgr-primary-register" xreflabel="repmgr primary register">
<indexterm><primary>repmgr primary register</primary></indexterm>
<title>repmgr primary register</title>
<para>
<command>repmgr primary register</command> registers a primary node in a
streaming replication cluster, and configures it for use with repmgr, including
installing the &repmgr; extension. This command needs to be executed before any
standby nodes are registered.
</para>
<para>
Execute with the <literal>--dry-run</literal> option to check what would happen without
actually registering the primary.
</para>
<para>
<command>repmgr master register</command> can be used as an alias for
<command>repmgr primary register</command>/
</para>
</sect1>
<sect1 id="repmgr-primary-unregister" xreflabel="repmgr primary unregister">
<indexterm><primary>repmgr primary unregister</primary></indexterm>
<title>repmgr primary unregister</title>
<para>
<command>repmgr primary register</command> unregisters an inactive primary node
from the `repmgr` metadata. This is typically when the primary has failed and is
being removed from the cluster after a new primary has been promoted.
</para>
<para>
Execute with the <literal>--dry-run</literal> option to check what would happen without
actually unregistering the node.
</para>
<para>
<command>repmgr master unregister</command> can be used as an alias for
<command>repmgr primary unregister</command>/
</para>
</sect1>
<sect1 id="repmgr-standby-clone" xreflabel="repmgr standby clone">
<indexterm>
<primary>repmgr standby clone</primary>
<seealso>cloning</seealso>
</indexterm>
<title>repmgr standby clone</title>
<para>
<command>repmgr standby clone</command> clones a PostgreSQL node from another
PostgreSQL node, typically the primary, but optionally from any other node in
the cluster or from Barman. It creates the <filename>recovery.conf</filename> file required
to attach the cloned node to the primary node (or another standby, if cascading replication
is in use).
</para>
<note>
<simpara>
<command>repmgr standby clone</command> does not start the standby, and after cloning
<command>repmgr standby register</command> must be executed to notify &repmgr; of its presence.
</simpara>
</note>
<sect2 id="repmgr-standby-clone-config-file-copying" xreflabel="Copying configuration files">
<title>Handling configuration files</title>
<para>
Note that by default, all configuration files in the source node's data
directory will be copied to the cloned node. Typically these will be
<filename>postgresql.conf</filename>, <filename>postgresql.auto.conf</filename>,
<filename>pg_hba.conf</filename> and <filename>pg_ident.conf</filename>.
These may require modification before the standby is started.
</para>
<para>
In some cases (e.g. on Debian or Ubuntu Linux installations), PostgreSQL's
configuration files are located outside of the data directory and will
not be copied by default. &repmgr; can copy these files, either to the same
location on the standby server (provided appropriate directory and file permissions
are available), or into the standby's data directory. This requires passwordless
SSH access to the primary server. Add the option <literal>--copy-external-config-files</literal>
to the <command>repmgr standby clone</command> command; by default files will be copied to
the same path as on the upstream server. Note that the user executing <command>repmgr</command>
must have write access to those directories.
</para>
<para>
To have the configuration files placed in the standby's data directory, specify
<literal>--copy-external-config-files=pgdata</literal>, but note that
any include directives in the copied files may need to be updated.
</para>
<tip>
<simpara>
For reliable configuration file management we recommend using a
configuration management tool such as Ansible, Chef, Puppet or Salt.
</simpara>
</tip>
</sect2>
<sect2 id="repmgr-standby-clone-wal-management" xreflabel="Managing WAL during the cloning process">
<title>Managing WAL during the cloning process</title>
<para>
When initially cloning a standby, you will need to ensure
that all required WAL files remain available while the cloning is taking
place. To ensure this happens when using the default `pg_basebackup` method,
&repmgr; will set <command>pg_basebackup</command>'s <literal>--xlog-method</literal>
parameter to <literal>stream</literal>,
which will ensure all WAL files generated during the cloning process are
streamed in parallel with the main backup. Note that this requires two
replication connections to be available (&repmgr; will verify sufficient
connections are available before attempting to clone, and this can be checked
before performing the clone using the <literal>--dry-run</literal> option).
</para>
<para>
To override this behaviour, in <filename>repmgr.conf</filename> set
<command>pg_basebackup</command>'s <literal>--xlog-method</literal>
parameter to <literal>fetch</literal>:
<programlisting>
pg_basebackup_options='--xlog-method=fetch'</programlisting>
and ensure that <literal>wal_keep_segments</literal> is set to an appropriately high value.
See the <ulink url="https://www.postgresql.org/docs/current/static/app-pgbasebackup.html">
pg_basebackup</ulink> documentation for details.
</para>
<note>
<simpara>
From PostgreSQL 10, <command>pg_basebackup</command>'s
<literal>--xlog-method</literal> parameter has been renamed to
<literal>--wal-method</literal>.
</simpara>
</note>
</sect2>
</sect1>
<sect1 id="repmgr-standby-register" xreflabel="repmgr standby register">
<indexterm><primary>repmgr standby register</primary></indexterm>
<title>repmgr standby register</title>
<para>
<command>repmgr standby register</command> adds a standby's information to
the &repmgr; metadata. This command needs to be executed to enable
promote/follow operations and to allow <command>repmgrd</command> to work with the node.
An existing standby can be registered using this command. Execute with the
<literal>--dry-run</literal> option to check what would happen without actually registering the
standby.
</para>
<sect2 id="repmgr-standby-register-wait" xreflabel="repmgr standby register --wait">
<title>Waiting for the registration to propagate to the standby</title>
<para>
Depending on your environment and workload, it may take some time for
the standby's node record to propagate from the primary to the standby. Some
actions (such as starting <command>repmgrd</command>) require that the standby's node record
is present and up-to-date to function correctly.
</para>
<para>
By providing the option <literal>--wait-sync</literal> to the
<command>repmgr standby register</command> command, &repmgr; will wait
until the record is synchronised before exiting. An optional timeout (in
seconds) can be added to this option (e.g. <literal>--wait-sync=60</literal>).
</para>
</sect2>
<sect2 id="rempgr-standby-register-inactive-node" xreflabel="Registering an inactive node">
<title>Registering an inactive node</title>
<para>
Under some circumstances you may wish to register a standby which is not
yet running; this can be the case when using provisioning tools to create
a complex replication cluster. In this case, by using the <literal>-F/--force</literal>
option and providing the connection parameters to the primary server,
the standby can be registered.
</para>
<para>
Similarly, with cascading replication it may be necessary to register
a standby whose upstream node has not yet been registered - in this case,
using <literal>-F/--force</literal> will result in the creation of an inactive placeholder
record for the upstream node, which will however later need to be registered
with the <literal>-F/--force</literal> option too.
</para>
<para>
When used with <command>repmgr standby register</command>, care should be taken that use of the
<literal>-F/--force</literal> option does not result in an incorrectly configured cluster.
</para>
</sect2>
</sect1>
<sect1 id="repmgr-standby-unregister" xreflabel="repmgr standby unregister">
<indexterm><primary>repmgr standby unregister</primary></indexterm>
<title>repmgr standby unregister</title>
<para>
Unregisters a standby with `repmgr`. This command does not affect the actual
replication, just removes the standby's entry from the &repmgr; metadata.
</para>
<para>
To unregister a running standby, execute:
<programlisting>
repmgr standby unregister -f /etc/repmgr.conf</programlisting>
</para>
<para>
This will remove the standby record from &repmgr;'s internal metadata
table (<literal>repmgr.nodes</literal>). A <literal>standby_unregister</literal>
event notification will be recorded in the <literal>repmgr.events</literal> table.
</para>
<para>
If the standby is not running, the command can be executed on another
node by providing the id of the node to be unregistered using
the command line parameter <literal>--node-id</literal>, e.g. executing the following
command on the master server will unregister the standby with
id <literal>3</literal>:
<programlisting>
repmgr standby unregister -f /etc/repmgr.conf --node-id=3
</programlisting>
</para>
</sect1>
<sect1 id="repmgr-standby-promote" xreflabel="repmgr standby promote">
<indexterm>
<primary>repmgr standby promote</primary>
</indexterm>
<title>repmgr standby promote</title>
<para>
Promotes a standby to a primary if the current primary has failed. This
command requires a valid <filename>repmgr.conf</filename> file for the standby, either
specified explicitly with <literal>-f/--config-file</literal> or located in a
default location; no additional arguments are required.
</para>
<para>
If the standby promotion succeeds, the server will not need to be
restarted. However any other standbys will need to follow the new server,
by using <xref linkend="repmgr-standby-follow">; if <command>repmgrd</command> is active, it will
handle this automatically.
</para>
</sect1>
<sect1 id="repmgr-standby-follow" xreflabel="repmgr standby follow">
<indexterm>
<primary>repmgr standby follow</primary>
</indexterm>
<title>repmgr standby follow</title>
<para>
Attaches the standby to a new primary. This command requires a valid
<filename>repmgr.conf</filename> file for the standby, either specified
explicitly with <literal>-f/--config-file</literal> or located in a
default location; no additional arguments are required.
</para>
<para>
This command will force a restart of the standby server, which must be
running. It can only be used to attach a standby to a new primary node.
</para>
<para>
To re-add an inactive node to the replication cluster, see
<xref linkend="repmgr-node-rejoin">
</para>
</sect1>
<sect1 id="repmgr-standby-switchover" xreflabel="repmgr standby switchover">
<indexterm>
<primary>repmgr standby switchover</primary>
</indexterm>
<title>repmgr standby switchover</title>
<para>
Promotes a standby to primary and demotes the existing primary to a standby.
This command must be run on the standby to be promoted, and requires a
passwordless SSH connection to the current primary.
</para>
<para>
If other standbys are connected to the demotion candidate, &repmgr; can instruct
these to follow the new primary if the option <literal>--siblings-follow</literal>
is specified.
</para>
<para>
Execute with the <literal>--dry-run</literal> option to test the switchover as far as
possible without actually changing the status of either node.
</para>
<para>
<command>repmgrd</command> should not be active on any nodes while a switchover is being
executed. This restriction may be lifted in a later version.
</para>
<para>
For more details see the section <xref linkend="performing-switchover">.
</para>
</sect1>
<sect1 id="repmgr-node-status" xreflabel="repmgr node status">
<indexterm>
<primary>repmgr node status</primary>
</indexterm>
<title>repmgr node status</title>
<para>
Displays an overview of a node's basic information and replication
status. This command must be run on the local node.
</para>
<para>
Sample output (execute <command>repmgr node status</command>):
<programlisting>
Node "node1":
PostgreSQL version: 10beta1
Total data size: 30 MB
Conninfo: host=node1 dbname=repmgr user=repmgr connect_timeout=2
Role: primary
WAL archiving: off
Archive command: (none)
Replication connections: 2 (of maximal 10)
Replication slots: 0 (of maximal 10)
Replication lag: n/a
</programlisting>
</para>
<para>
See <xref linkend="repmgr-node-check"> to diagnose issues.
</para>
</sect1>
<sect1 id="repmgr-node-check" xreflabel="repmgr node check">
<indexterm>
<primary>repmgr node check</primary>
</indexterm>
<title>repmgr node check</title>
<para>
Performs some health checks on a node from a replication perspective.
This command must be run on the local node.
</para>
<para>
Sample output (execute <command>repmgr node check</command>):
<programlisting>
Node "node1":
Server role: OK (node is primary)
Replication lag: OK (N/A - node is primary)
WAL archiving: OK (0 pending files)
Downstream servers: OK (2 of 2 downstream nodes attached)
Replication slots: OK (node has no replication slots)
</programlisting>
</para>
<para>
Additionally each check can be performed individually by supplying
an additional command line parameter, e.g.:
<programlisting>
$ repmgr node check --role
OK (node is primary)
</programlisting>
</para>
<para>
Parameters for individual checks are as follows:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<literal>--role</literal>: checks if the node has the expected role
</simpara>
</listitem>
<listitem>
<simpara>
<literal>--replication-lag</literal>: checks if the node is lagging by more than
<varname>replication_lag_warning</varname> or <varname>replication_lag_critical</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<literal>--archive-ready</literal>: checks for WAL files which have not yet been archived
</simpara>
</listitem>
<listitem>
<simpara>
<literal>--downstream</literal>: checks that the expected downstream nodes are attached
</simpara>
</listitem>
<listitem>
<simpara>
<literal>--slots</literal>: checks there are no inactive replication slots
</simpara>
</listitem>
</itemizedlist>
</para>
<para>
Individual checks can also be output in a Nagios-compatible format by additionally
providing the option <literal>--nagios</literal>.
</para>
</sect1>
<sect1 id="repmgr-node-rejoin" xreflabel="repmgr node rejoin">
<indexterm>
<primary>repmgr node rejoin</primary>
</indexterm>
<title>repmgr node rejoin</title>
<para>
Enables a dormant (stopped) node to be rejoined to the replication cluster.
</para>
<para>
This can optionally use <command>pg_rewind</command> to re-integrate a node which has diverged
from the rest of the cluster, typically a failed primary.
</para>
</sect1>
<sect1 id="repmgr-cluster-show" xreflabel="repmgr cluster show">
<indexterm>
<primary>repmgr cluster show</primary>
</indexterm>
<title>repmgr cluster show</title>
<para>
Displays information about each active node in the replication cluster. This
command polls each registered server and shows its role (<literal>primary</literal> /
<literal>standby</literal> / <literal>bdr</literal>) and status. It polls each server
directly and can be run on any node in the cluster; this is also useful when analyzing
connectivity from a particular node.
</para>
<para>
This command requires either a valid <filename>repmgr.conf</filename> file or a database
connection string to one of the registered nodes; no additional arguments are needed.
</para>
<para>
Example:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Connection string
----+-------+---------+-----------+----------+----------+-----------------------------------------
1 | node1 | primary | * running | | default | host=db_node1 dbname=repmgr user=repmgr
2 | node2 | standby | running | node1 | default | host=db_node2 dbname=repmgr user=repmgr
3 | node3 | standby | running | node1 | default | host=db_node3 dbname=repmgr user=repmgr</programlisting>
</para>
<para>
To show database connection errors when polling nodes, run the command in
<literal>--verbose</literal> mode.
</para>
<para>
The `cluster show` command accepts an optional parameter <literal>--csv</literal>, which
outputs the replication cluster's status in a simple CSV format, suitable for
parsing by scripts:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster show --csv
1,-1,-1
2,0,0
3,0,1</programlisting>
</para>
<para>
The columns have following meanings:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
node ID
</simpara>
<simpara>
availability (0 = available, -1 = unavailable)
</simpara>
<simpara>
recovery state (0 = not in recovery, 1 = in recovery, -1 = unknown)
</simpara>
</listitem>
</itemizedlist>
</para>
<para>
Note that the availability is tested by connecting from the node where
<command>repmgr cluster show</command> is executed, and does not necessarily imply the node
is down. See <xref linkend="repmgr-cluster-matrix"> and <xref linkend="repmgr-cluster-crosscheck"> to get
a better overviews of connections between nodes.
</para>
</sect1>
<sect1 id="repmgr-cluster-matrix" xreflabel="repmgr cluster matrix">
<indexterm>
<primary>repmgr cluster matrix</primary>
</indexterm>
<title>repmgr cluster matrix</title>
<para>
<command>repmgr cluster matrix</command> runs <command>repmgr cluster show</command> on each
node and arranges the results in a matrix, recording success or failure.
</para>
<para>
<command>repmgr cluster matrix</command> requires a valid <filename>repmgr.conf</filename>
file on each node. Additionally passwordless `ssh` connections are required between
all nodes.
</para>
<para>
Example 1 (all nodes up):
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster matrix
Name | Id | 1 | 2 | 3
-------+----+----+----+----
node1 | 1 | * | * | *
node2 | 2 | * | * | *
node3 | 3 | * | * | *</programlisting>
</para>
<para>
Example 2 (<literal>node1</literal> and <literal>node2</literal> up, <literal>node3</literal> down):
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster matrix
Name | Id | 1 | 2 | 3
-------+----+----+----+----
node1 | 1 | * | * | x
node2 | 2 | * | * | x
node3 | 3 | ? | ? | ?
</programlisting>
</para>
<para>
Each row corresponds to one server, and indicates the result of
testing an outbound connection from that server.
</para>
<para>
Since <literal>node3</literal> is down, all the entries in its row are filled with
<literal>?</literal>, meaning that there we cannot test outbound connections.
</para>
<para>
The other two nodes are up; the corresponding rows have <literal>x</literal> in the
column corresponding to <literal>node3</literal>, meaning that inbound connections to
that node have failed, and `*` in the columns corresponding to
<literal>node1</literal> and <literal>node2</literal>, meaning that inbound connections
to these nodes have succeeded.
</para>
<para>
Example 3 (all nodes up, firewall dropping packets originating
from <literal>node1</literal> and directed to port 5432 on <literal>node3</literal>) -
running <command>repmgr cluster matrix</command> from <literal>node1</literal> gives the following output:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster matrix
Name | Id | 1 | 2 | 3
-------+----+----+----+----
node1 | 1 | * | * | x
node2 | 2 | * | * | *
node3 | 3 | ? | ? | ?</programlisting>
</para>
<para>
Note this may take some time depending on the <varname>connect_timeout</varname>
setting in the node <varname>conninfo</varname> strings; default is
<literal>1 minute</literal> which means without modification the above
command would take around 2 minutes to run; see comment elsewhere about setting
<varname>connect_timeout</varname>)
</para>
<para>
The matrix tells us that we cannot connect from <literal>node1</literal> to <literal>node3</literal>,
and that (therefore) we don't know the state of any outbound
connection from <literal>node3</literal>.
</para>
<para>
In this case, the <xref linkend="repmgr-cluster-crosscheck"> command will produce a more
useful result.
</para>
</sect1>
<sect1 id="repmgr-cluster-crosscheck" xreflabel="repmgr cluster crosscheck">
<indexterm>
<primary>repmgr cluster crosscheck</primary>
</indexterm>
<title>repmgr cluster crosscheck</title>
<para>
<command>repmgr cluster crosscheck</command> is similar to <xref linkend="repmgr-cluster-matrix">,
but cross-checks connections between each combination of nodes. In "Example 3" in
<xref linkend="repmgr-cluster-matrix"> we have no information about the state of <literal>node3</literal>.
However by running <command>repmgr cluster crosscheck</command> it's possible to get a better
overview of the cluster situation:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster crosscheck
Name | Id | 1 | 2 | 3
-------+----+----+----+----
node1 | 1 | * | * | x
node2 | 2 | * | * | *
node3 | 3 | * | * | *</programlisting>
</para>
<para>
What happened is that <command>repmgr cluster crosscheck</command> merged its own
<command>repmgr cluster matrix</command> with the <command>repmgr cluster matrix</command>
output from <literal>node2</literal>; the latter is able to connect to <literal>node3</literal>
and therefore determine the state of outbound connections from that node.
</para>
</sect1>
<sect1 id="repmgr-cluster-cleanup" xreflabel="repmgr cluster cleanup">
<indexterm>
<primary>repmgr cluster cleanup</primary>
</indexterm>
<title>repmgr cluster cleanup</title>
<para>
Purges monitoring history from the <literal>repmgr.monitoring_history</literal> table to
prevent excessive table growth. Use the <literal>-k/--keep-history</literal> to specify the
number of days of monitoring history to retain. This command can be used
manually or as a cronjob.
</para>
<para>
This command requires a valid <filename>repmgr.conf</filename> file for the node on which it is
executed; no additional arguments are required.
</para>
<note>
<simpara>
Monitoring history will only be written if <command>repmgrd</command> is active, and
<varname>monitoring_history</varname> is set to <literal>true</literal> in <filename>repmgr.conf</filename>.
</simpara>
</note>
</sect1>
</chapter>

View File

@@ -43,7 +43,23 @@
<!ENTITY promoting-standby SYSTEM "promoting-standby.sgml">
<!ENTITY follow-new-primary SYSTEM "follow-new-primary.sgml">
<!ENTITY switchover SYSTEM "switchover.sgml">
<!ENTITY command-reference SYSTEM "command-reference.sgml">
<!ENTITY repmgr-primary-register SYSTEM "repmgr-primary-register.sgml">
<!ENTITY repmgr-primary-unregister SYSTEM "repmgr-primary-unregister.sgml">
<!ENTITY repmgr-standby-clone SYSTEM "repmgr-standby-clone.sgml">
<!ENTITY repmgr-standby-register SYSTEM "repmgr-standby-register.sgml">
<!ENTITY repmgr-standby-unregister SYSTEM "repmgr-standby-unregister.sgml">
<!ENTITY repmgr-standby-promote SYSTEM "repmgr-standby-promote.sgml">
<!ENTITY repmgr-standby-follow SYSTEM "repmgr-standby-follow.sgml">
<!ENTITY repmgr-standby-switchover SYSTEM "repmgr-standby-switchover.sgml">
<!ENTITY repmgr-node-status SYSTEM "repmgr-node-status.sgml">
<!ENTITY repmgr-node-check SYSTEM "repmgr-node-check.sgml">
<!ENTITY repmgr-node-rejoin SYSTEM "repmgr-node-rejoin.sgml">
<!ENTITY repmgr-cluster-show SYSTEM "repmgr-cluster-show.sgml">
<!ENTITY repmgr-cluster-matrix SYSTEM "repmgr-cluster-matrix.sgml">
<!ENTITY repmgr-cluster-crosscheck SYSTEM "repmgr-cluster-crosscheck.sgml">
<!ENTITY repmgr-cluster-cleanup SYSTEM "repmgr-cluster-cleanup.sgml">
<!ENTITY appendix-signatures SYSTEM "appendix-signatures.sgml">
<!ENTITY bookindex SYSTEM "bookindex.sgml">

View File

@@ -0,0 +1,22 @@
<chapter id="repmgr-cluster-cleanup" xreflabel="repmgr cluster cleanup">
<indexterm>
<primary>repmgr cluster cleanup</primary>
</indexterm>
<title>repmgr cluster cleanup</title>
<para>
Purges monitoring history from the <literal>repmgr.monitoring_history</literal> table to
prevent excessive table growth. Use the <literal>-k/--keep-history</literal> to specify the
number of days of monitoring history to retain. This command can be used
manually or as a cronjob.
</para>
<para>
This command requires a valid <filename>repmgr.conf</filename> file for the node on which it is
executed; no additional arguments are required.
</para>
<note>
<simpara>
Monitoring history will only be written if <command>repmgrd</command> is active, and
<varname>monitoring_history</varname> is set to <literal>true</literal> in <filename>repmgr.conf</filename>.
</simpara>
</note>
</chapter>

View File

@@ -0,0 +1,28 @@
<chapter id="repmgr-cluster-crosscheck" xreflabel="repmgr cluster crosscheck">
<indexterm>
<primary>repmgr cluster crosscheck</primary>
</indexterm>
<title>repmgr cluster crosscheck</title>
<para>
<command>repmgr cluster crosscheck</command> is similar to <xref linkend="repmgr-cluster-matrix">,
but cross-checks connections between each combination of nodes. In "Example 3" in
<xref linkend="repmgr-cluster-matrix"> we have no information about the state of <literal>node3</literal>.
However by running <command>repmgr cluster crosscheck</command> it's possible to get a better
overview of the cluster situation:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster crosscheck
Name | Id | 1 | 2 | 3
-------+----+----+----+----
node1 | 1 | * | * | x
node2 | 2 | * | * | *
node3 | 3 | * | * | *</programlisting>
</para>
<para>
What happened is that <command>repmgr cluster crosscheck</command> merged its own
<command>repmgr cluster matrix</command> with the <command>repmgr cluster matrix</command>
output from <literal>node2</literal>; the latter is able to connect to <literal>node3</literal>
and therefore determine the state of outbound connections from that node.
</para>
</chapter>

View File

@@ -0,0 +1,83 @@
<chapter id="repmgr-cluster-matrix" xreflabel="repmgr cluster matrix">
<indexterm>
<primary>repmgr cluster matrix</primary>
</indexterm>
<title>repmgr cluster matrix</title>
<para>
<command>repmgr cluster matrix</command> runs <command>repmgr cluster show</command> on each
node and arranges the results in a matrix, recording success or failure.
</para>
<para>
<command>repmgr cluster matrix</command> requires a valid <filename>repmgr.conf</filename>
file on each node. Additionally passwordless `ssh` connections are required between
all nodes.
</para>
<para>
Example 1 (all nodes up):
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster matrix
Name | Id | 1 | 2 | 3
-------+----+----+----+----
node1 | 1 | * | * | *
node2 | 2 | * | * | *
node3 | 3 | * | * | *</programlisting>
</para>
<para>
Example 2 (<literal>node1</literal> and <literal>node2</literal> up, <literal>node3</literal> down):
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster matrix
Name | Id | 1 | 2 | 3
-------+----+----+----+----
node1 | 1 | * | * | x
node2 | 2 | * | * | x
node3 | 3 | ? | ? | ?
</programlisting>
</para>
<para>
Each row corresponds to one server, and indicates the result of
testing an outbound connection from that server.
</para>
<para>
Since <literal>node3</literal> is down, all the entries in its row are filled with
<literal>?</literal>, meaning that there we cannot test outbound connections.
</para>
<para>
The other two nodes are up; the corresponding rows have <literal>x</literal> in the
column corresponding to <literal>node3</literal>, meaning that inbound connections to
that node have failed, and `*` in the columns corresponding to
<literal>node1</literal> and <literal>node2</literal>, meaning that inbound connections
to these nodes have succeeded.
</para>
<para>
Example 3 (all nodes up, firewall dropping packets originating
from <literal>node1</literal> and directed to port 5432 on <literal>node3</literal>) -
running <command>repmgr cluster matrix</command> from <literal>node1</literal> gives the following output:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster matrix
Name | Id | 1 | 2 | 3
-------+----+----+----+----
node1 | 1 | * | * | x
node2 | 2 | * | * | *
node3 | 3 | ? | ? | ?</programlisting>
</para>
<para>
Note this may take some time depending on the <varname>connect_timeout</varname>
setting in the node <varname>conninfo</varname> strings; default is
<literal>1 minute</literal> which means without modification the above
command would take around 2 minutes to run; see comment elsewhere about setting
<varname>connect_timeout</varname>)
</para>
<para>
The matrix tells us that we cannot connect from <literal>node1</literal> to <literal>node3</literal>,
and that (therefore) we don't know the state of any outbound
connection from <literal>node3</literal>.
</para>
<para>
In this case, the <xref linkend="repmgr-cluster-crosscheck"> command will produce a more
useful result.
</para>
</chapter>

View File

@@ -0,0 +1,67 @@
<chapter id="repmgr-cluster-show" xreflabel="repmgr cluster show">
<indexterm>
<primary>repmgr cluster show</primary>
</indexterm>
<title>repmgr cluster show</title>
<para>
Displays information about each active node in the replication cluster. This
command polls each registered server and shows its role (<literal>primary</literal> /
<literal>standby</literal> / <literal>bdr</literal>) and status. It polls each server
directly and can be run on any node in the cluster; this is also useful when analyzing
connectivity from a particular node.
</para>
<para>
This command requires either a valid <filename>repmgr.conf</filename> file or a database
connection string to one of the registered nodes; no additional arguments are needed.
</para>
<para>
Example:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Connection string
----+-------+---------+-----------+----------+----------+-----------------------------------------
1 | node1 | primary | * running | | default | host=db_node1 dbname=repmgr user=repmgr
2 | node2 | standby | running | node1 | default | host=db_node2 dbname=repmgr user=repmgr
3 | node3 | standby | running | node1 | default | host=db_node3 dbname=repmgr user=repmgr</programlisting>
</para>
<para>
To show database connection errors when polling nodes, run the command in
<literal>--verbose</literal> mode.
</para>
<para>
The `cluster show` command accepts an optional parameter <literal>--csv</literal>, which
outputs the replication cluster's status in a simple CSV format, suitable for
parsing by scripts:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster show --csv
1,-1,-1
2,0,0
3,0,1</programlisting>
</para>
<para>
The columns have following meanings:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
node ID
</simpara>
<simpara>
availability (0 = available, -1 = unavailable)
</simpara>
<simpara>
recovery state (0 = not in recovery, 1 = in recovery, -1 = unknown)
</simpara>
</listitem>
</itemizedlist>
</para>
<para>
Note that the availability is tested by connecting from the node where
<command>repmgr cluster show</command> is executed, and does not necessarily imply the node
is down. See <xref linkend="repmgr-cluster-matrix"> and <xref linkend="repmgr-cluster-crosscheck"> to get
a better overviews of connections between nodes.
</para>
</chapter>

View File

@@ -0,0 +1,70 @@
<chapter id="repmgr-node-check" xreflabel="repmgr node check">
<indexterm>
<primary>repmgr node check</primary>
</indexterm>
<title>repmgr node check</title>
<para>
Performs some health checks on a node from a replication perspective.
This command must be run on the local node.
</para>
<para>
Sample output (execute <command>repmgr node check</command>):
<programlisting>
Node "node1":
Server role: OK (node is primary)
Replication lag: OK (N/A - node is primary)
WAL archiving: OK (0 pending files)
Downstream servers: OK (2 of 2 downstream nodes attached)
Replication slots: OK (node has no replication slots)
</programlisting>
</para>
<para>
Additionally each check can be performed individually by supplying
an additional command line parameter, e.g.:
<programlisting>
$ repmgr node check --role
OK (node is primary)
</programlisting>
</para>
<para>
Parameters for individual checks are as follows:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<literal>--role</literal>: checks if the node has the expected role
</simpara>
</listitem>
<listitem>
<simpara>
<literal>--replication-lag</literal>: checks if the node is lagging by more than
<varname>replication_lag_warning</varname> or <varname>replication_lag_critical</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<literal>--archive-ready</literal>: checks for WAL files which have not yet been archived
</simpara>
</listitem>
<listitem>
<simpara>
<literal>--downstream</literal>: checks that the expected downstream nodes are attached
</simpara>
</listitem>
<listitem>
<simpara>
<literal>--slots</literal>: checks there are no inactive replication slots
</simpara>
</listitem>
</itemizedlist>
</para>
<para>
Individual checks can also be output in a Nagios-compatible format by additionally
providing the option <literal>--nagios</literal>.
</para>
</chapter>

View File

@@ -0,0 +1,13 @@
<chapter id="repmgr-node-rejoin" xreflabel="repmgr node rejoin">
<indexterm>
<primary>repmgr node rejoin</primary>
</indexterm>
<title>repmgr node rejoin</title>
<para>
Enables a dormant (stopped) node to be rejoined to the replication cluster.
</para>
<para>
This can optionally use <command>pg_rewind</command> to re-integrate a node which has diverged
from the rest of the cluster, typically a failed primary.
</para>
</chapter>

View File

@@ -0,0 +1,29 @@
<chapter id="repmgr-node-status" xreflabel="repmgr node status">
<indexterm>
<primary>repmgr node status</primary>
</indexterm>
<title>repmgr node status</title>
<para>
Displays an overview of a node's basic information and replication
status. This command must be run on the local node.
</para>
<para>
Sample output (execute <command>repmgr node status</command>):
<programlisting>
Node "node1":
PostgreSQL version: 10beta1
Total data size: 30 MB
Conninfo: host=node1 dbname=repmgr user=repmgr connect_timeout=2
Role: primary
WAL archiving: off
Archive command: (none)
Replication connections: 2 (of maximal 10)
Replication slots: 0 (of maximal 10)
Replication lag: n/a
</programlisting>
</para>
<para>
See <xref linkend="repmgr-node-check"> to diagnose issues.
</para>
</chapter>

View File

@@ -0,0 +1,18 @@
<chapter id="repmgr-primary-register" xreflabel="repmgr primary register">
<indexterm><primary>repmgr primary register</primary></indexterm>
<title>repmgr primary register</title>
<para>
<command>repmgr primary register</command> registers a primary node in a
streaming replication cluster, and configures it for use with repmgr, including
installing the &repmgr; extension. This command needs to be executed before any
standby nodes are registered.
</para>
<para>
Execute with the <literal>--dry-run</literal> option to check what would happen without
actually registering the primary.
</para>
<para>
<command>repmgr master register</command> can be used as an alias for
<command>repmgr primary register</command>.
</para>
</chapter>

View File

@@ -0,0 +1,18 @@
<chapter id="repmgr-primary-unregister" xreflabel="repmgr primary unregister">
<indexterm><primary>repmgr primary unregister</primary></indexterm>
<title>repmgr primary unregister</title>
<para>
<command>repmgr primary register</command> unregisters an inactive primary node
from the &repmgr; metadata. This is typically when the primary has failed and is
being removed from the cluster after a new primary has been promoted.
</para>
<para>
Execute with the <literal>--dry-run</literal> option to check what would happen without
actually unregistering the node.
</para>
<para>
<command>repmgr master unregister</command> can be used as an alias for
<command>repmgr primary unregister</command>/
</para>
</chapter>

View File

@@ -0,0 +1,91 @@
<chapter id="repmgr-standby-clone" xreflabel="repmgr standby clone">
<indexterm>
<primary>repmgr standby clone</primary>
<seealso>cloning</seealso>
</indexterm>
<title>repmgr standby clone</title>
<para>
<command>repmgr standby clone</command> clones a PostgreSQL node from another
PostgreSQL node, typically the primary, but optionally from any other node in
the cluster or from Barman. It creates the <filename>recovery.conf</filename> file required
to attach the cloned node to the primary node (or another standby, if cascading replication
is in use).
</para>
<note>
<simpara>
<command>repmgr standby clone</command> does not start the standby, and after cloning
<command>repmgr standby register</command> must be executed to notify &repmgr; of its presence.
</simpara>
</note>
<sect1 id="repmgr-standby-clone-config-file-copying" xreflabel="Copying configuration files">
<title>Handling configuration files</title>
<para>
Note that by default, all configuration files in the source node's data
directory will be copied to the cloned node. Typically these will be
<filename>postgresql.conf</filename>, <filename>postgresql.auto.conf</filename>,
<filename>pg_hba.conf</filename> and <filename>pg_ident.conf</filename>.
These may require modification before the standby is started.
</para>
<para>
In some cases (e.g. on Debian or Ubuntu Linux installations), PostgreSQL's
configuration files are located outside of the data directory and will
not be copied by default. &repmgr; can copy these files, either to the same
location on the standby server (provided appropriate directory and file permissions
are available), or into the standby's data directory. This requires passwordless
SSH access to the primary server. Add the option <literal>--copy-external-config-files</literal>
to the <command>repmgr standby clone</command> command; by default files will be copied to
the same path as on the upstream server. Note that the user executing <command>repmgr</command>
must have write access to those directories.
</para>
<para>
To have the configuration files placed in the standby's data directory, specify
<literal>--copy-external-config-files=pgdata</literal>, but note that
any include directives in the copied files may need to be updated.
</para>
<tip>
<simpara>
For reliable configuration file management we recommend using a
configuration management tool such as Ansible, Chef, Puppet or Salt.
</simpara>
</tip>
</sect1>
<sect1 id="repmgr-standby-clone-wal-management" xreflabel="Managing WAL during the cloning process">
<title>Managing WAL during the cloning process</title>
<para>
When initially cloning a standby, you will need to ensure
that all required WAL files remain available while the cloning is taking
place. To ensure this happens when using the default `pg_basebackup` method,
&repmgr; will set <command>pg_basebackup</command>'s <literal>--xlog-method</literal>
parameter to <literal>stream</literal>,
which will ensure all WAL files generated during the cloning process are
streamed in parallel with the main backup. Note that this requires two
replication connections to be available (&repmgr; will verify sufficient
connections are available before attempting to clone, and this can be checked
before performing the clone using the <literal>--dry-run</literal> option).
</para>
<para>
To override this behaviour, in <filename>repmgr.conf</filename> set
<command>pg_basebackup</command>'s <literal>--xlog-method</literal>
parameter to <literal>fetch</literal>:
<programlisting>
pg_basebackup_options='--xlog-method=fetch'</programlisting>
and ensure that <literal>wal_keep_segments</literal> is set to an appropriately high value.
See the <ulink url="https://www.postgresql.org/docs/current/static/app-pgbasebackup.html">
pg_basebackup</ulink> documentation for details.
</para>
<note>
<simpara>
From PostgreSQL 10, <command>pg_basebackup</command>'s
<literal>--xlog-method</literal> parameter has been renamed to
<literal>--wal-method</literal>.
</simpara>
</note>
</sect1>
</chapter>

View File

@@ -0,0 +1,21 @@
<chapter id="repmgr-standby-follow" xreflabel="repmgr standby follow">
<indexterm>
<primary>repmgr standby follow</primary>
</indexterm>
<title>repmgr standby follow</title>
<para>
Attaches the standby to a new primary. This command requires a valid
<filename>repmgr.conf</filename> file for the standby, either specified
explicitly with <literal>-f/--config-file</literal> or located in a
default location; no additional arguments are required.
</para>
<para>
This command will force a restart of the standby server, which must be
running. It can only be used to attach a standby to a new primary node.
</para>
<para>
To re-add an inactive node to the replication cluster, see
<xref linkend="repmgr-node-rejoin">
</para>
</chapter>

View File

@@ -0,0 +1,18 @@
<chapter id="repmgr-standby-promote" xreflabel="repmgr standby promote">
<indexterm>
<primary>repmgr standby promote</primary>
</indexterm>
<title>repmgr standby promote</title>
<para>
Promotes a standby to a primary if the current primary has failed. This
command requires a valid <filename>repmgr.conf</filename> file for the standby, either
specified explicitly with <literal>-f/--config-file</literal> or located in a
default location; no additional arguments are required.
</para>
<para>
If the standby promotion succeeds, the server will not need to be
restarted. However any other standbys will need to follow the new server,
by using <xref linkend="repmgr-standby-follow">; if <command>repmgrd</command>
is active, it will handle this automatically.
</para>
</chapter>

View File

@@ -0,0 +1,50 @@
<chapter id="repmgr-standby-register" xreflabel="repmgr standby register">
<indexterm><primary>repmgr standby register</primary></indexterm>
<title>repmgr standby register</title>
<para>
<command>repmgr standby register</command> adds a standby's information to
the &repmgr; metadata. This command needs to be executed to enable
promote/follow operations and to allow <command>repmgrd</command> to work with the node.
An existing standby can be registered using this command. Execute with the
<literal>--dry-run</literal> option to check what would happen without actually registering the
standby.
</para>
<sect1 id="repmgr-standby-register-wait" xreflabel="repmgr standby register --wait">
<title>Waiting for the registration to propagate to the standby</title>
<para>
Depending on your environment and workload, it may take some time for
the standby's node record to propagate from the primary to the standby. Some
actions (such as starting <command>repmgrd</command>) require that the standby's node record
is present and up-to-date to function correctly.
</para>
<para>
By providing the option <literal>--wait-sync</literal> to the
<command>repmgr standby register</command> command, &repmgr; will wait
until the record is synchronised before exiting. An optional timeout (in
seconds) can be added to this option (e.g. <literal>--wait-sync=60</literal>).
</para>
</sect1>
<sect1 id="rempgr-standby-register-inactive-node" xreflabel="Registering an inactive node">
<title>Registering an inactive node</title>
<para>
Under some circumstances you may wish to register a standby which is not
yet running; this can be the case when using provisioning tools to create
a complex replication cluster. In this case, by using the <literal>-F/--force</literal>
option and providing the connection parameters to the primary server,
the standby can be registered.
</para>
<para>
Similarly, with cascading replication it may be necessary to register
a standby whose upstream node has not yet been registered - in this case,
using <literal>-F/--force</literal> will result in the creation of an inactive placeholder
record for the upstream node, which will however later need to be registered
with the <literal>-F/--force</literal> option too.
</para>
<para>
When used with <command>repmgr standby register</command>, care should be taken that use of the
<literal>-F/--force</literal> option does not result in an incorrectly configured cluster.
</para>
</sect1>
</chapter>

View File

@@ -0,0 +1,27 @@
<chapter id="repmgr-standby-switchover" xreflabel="repmgr standby switchover">
<indexterm>
<primary>repmgr standby switchover</primary>
</indexterm>
<title>repmgr standby switchover</title>
<para>
Promotes a standby to primary and demotes the existing primary to a standby.
This command must be run on the standby to be promoted, and requires a
passwordless SSH connection to the current primary.
</para>
<para>
If other standbys are connected to the demotion candidate, &repmgr; can instruct
these to follow the new primary if the option <literal>--siblings-follow</literal>
is specified.
</para>
<para>
Execute with the <literal>--dry-run</literal> option to test the switchover as far as
possible without actually changing the status of either node.
</para>
<para>
<command>repmgrd</command> should not be active on any nodes while a switchover is being
executed. This restriction may be lifted in a later version.
</para>
<para>
For more details see the section <xref linkend="performing-switchover">.
</para>
</chapter>

View File

@@ -0,0 +1,29 @@
<chapter id="repmgr-standby-unregister" xreflabel="repmgr standby unregister">
<indexterm><primary>repmgr standby unregister</primary></indexterm>
<title>repmgr standby unregister</title>
<para>
Unregisters a standby with `repmgr`. This command does not affect the actual
replication, just removes the standby's entry from the &repmgr; metadata.
</para>
<para>
To unregister a running standby, execute:
<programlisting>
repmgr standby unregister -f /etc/repmgr.conf</programlisting>
</para>
<para>
This will remove the standby record from &repmgr;'s internal metadata
table (<literal>repmgr.nodes</literal>). A <literal>standby_unregister</literal>
event notification will be recorded in the <literal>repmgr.events</literal> table.
</para>
<para>
If the standby is not running, the command can be executed on another
node by providing the id of the node to be unregistered using
the command line parameter <literal>--node-id</literal>, e.g. executing the following
command on the master server will unregister the standby with
id <literal>3</literal>:
<programlisting>
repmgr standby unregister -f /etc/repmgr.conf --node-id=3
</programlisting>
</para>
</chapter>

View File

@@ -52,6 +52,7 @@
<keyword>PostgreSQL</keyword>
<keyword>replication</keyword>
<keyword>asynchronous</keyword>
<keyword>HA</keyword>
<keyword>high-availability</keyword>
</keywordset>
</bookinfo>
@@ -72,9 +73,27 @@
&promoting-standby;
&follow-new-primary;
&switchover;
&command-reference;
</part>
<part id="repmgr-command-reference">
<title>repmgr command reference</title>
&repmgr-primary-register;
&repmgr-primary-unregister;
&repmgr-standby-clone;
&repmgr-standby-register;
&repmgr-standby-unregister;
&repmgr-standby-promote;
&repmgr-standby-follow;
&repmgr-standby-switchover;
&repmgr-node-status;
&repmgr-node-check;
&repmgr-node-rejoin;
&repmgr-cluster-show;
&repmgr-cluster-matrix;
&repmgr-cluster-crosscheck;
&repmgr-cluster-cleanup;
</part>
&appendix-signatures;