repmgr node check
Description
Performs some health checks on a node from a replication perspective.
This command must be run on the local node.
Currently repmgr performs health checks on physical replication
slots only, with the aim of warning about streaming replication standbys which
have become detached and the associated risk of uncontrolled WAL file
growth.
Example
Execution on the primary server:
$ repmgr -f /etc/repmgr.conf node check
Node "node1":
Server role: OK (node is primary)
Replication lag: OK (N/A - node is primary)
WAL archiving: OK (0 pending files)
Upstream connection: OK (N/A - is primary)
Downstream servers: OK (2 of 2 downstream nodes attached)
Replication slots: OK (node has no physical replication slots)
Missing replication slots: OK (node has no missing physical replication slots)
Configured data directory: OK (configured "data_directory" is "/var/lib/postgresql/data")
Execution on a standby server:
$ repmgr -f /etc/repmgr.conf node check
Node "node2":
Server role: OK (node is standby)
Replication lag: OK (0 seconds)
WAL archiving: OK (0 pending archive ready files)
Upstream connection: OK (node "node2" (ID: 2) is attached to expected upstream node "node1" (ID: 1))
Downstream servers: OK (this node has no downstream nodes)
Replication slots: OK (node has no physical replication slots)
Missing physical replication slots: OK (node has no missing physical replication slots)
Configured data directory: OK (configured "data_directory" is "/var/lib/postgresql/data")
Individual checks
Each check can be performed individually by supplying
an additional command line parameter, e.g.:
$ repmgr node check --role
OK (node is primary)
Parameters for individual checks are as follows:
--role: checks if the node has the expected role
--replication-lag: checks if the node is lagging by more than
replication_lag_warning or replication_lag_critical
--archive-ready: checks for WAL files which have not yet been archived,
and returns WARNING or CRITICAL if the number
exceeds archive_ready_warning or archive_ready_critical respectively.
--downstream: checks that the expected downstream nodes are attached
--upstream: checks that the node is attached to its expected upstream
--slots: checks there are no inactive physical replication slots
--missing-slots: checks there are no missing physical replication slots
--data-directory-config: checks the data directory configured in
repmgr.conf matches the actual data directory.
This check is not directly related to replication, but is useful to verify that repmgr
is correctly configured.
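Because each individual check reports its result through its exit code, several checks can be combined in a small wrapper. The following is a minimal sketch and not part of repmgr: the function name check_all and the REPMGR variable are hypothetical, and the configuration file path is assumed to match the examples above.

```shell
# Hypothetical wrapper, not part of repmgr.
# REPMGR holds the command prefix; the default assumes the configuration
# file path used in the examples in this document.
: "${REPMGR:=repmgr -f /etc/repmgr.conf}"

# Run each named individual check and return the highest exit code seen.
# Arguments are option names without the leading "--", e.g. "role".
check_all() {
    worst=0
    for check in "$@"; do
        rc=0
        # $REPMGR is deliberately unquoted so it can carry options.
        $REPMGR node check "--$check" || rc=$?
        if [ "$rc" -gt "$worst" ]; then
            worst=$rc
        fi
    done
    return "$worst"
}

# Usage (requires repmgr and a valid configuration file):
#   check_all role replication-lag slots
#   echo "highest exit code: $?"
```

Returning the highest code means a single failing check is not masked by later checks that succeed.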
repmgrd
A separate check is available to verify whether repmgrd is running.
This is not included in the general output, as it does not
per se constitute a check of the node's replication status.
--repmgrd: checks whether repmgrd is running.
If repmgrd is running but paused, status 1
(WARNING) is returned.
Additional checks
Several checks are provided for diagnostic purposes and are not
included in the general output:
--db-connection: checks if repmgr can connect to the
database on the local node.
This option is particularly useful in combination with SSH, as
it can be used to troubleshoot connection issues encountered when repmgr is
executed remotely (e.g. during a switchover operation).
--replication-config-owner: checks if the file containing replication
configuration (PostgreSQL 12 and later: postgresql.auto.conf;
PostgreSQL 11 and earlier: recovery.conf) is
owned by the same user who owns the data directory.
Incorrect ownership of this file (e.g. if it is owned by root)
will cause operations which need to update the replication configuration
(e.g. repmgr standby follow
or repmgr standby promote)
to fail.
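The combination of the database connection check with SSH can be sketched as follows. This is an illustration under assumptions, not part of repmgr: the function name remote_db_check, the SSH variable and the host name node2 are hypothetical, and the remote node is assumed to use the same configuration file path as the examples above.

```shell
# Hypothetical helper, not part of repmgr.
# SSH holds the ssh command; BatchMode avoids interactive password prompts.
: "${SSH:=ssh -o BatchMode=yes}"

# Run the database connection check on a remote host and report the result.
# ssh passes through the remote command's exit code, or exits with 255
# if the SSH connection itself failed.
remote_db_check() {
    rc=0
    # $SSH is deliberately unquoted so it can carry options.
    $SSH "$1" repmgr -f /etc/repmgr.conf node check --db-connection || rc=$?
    case "$rc" in
        0)   echo "$1: repmgr can connect to the local database" ;;
        255) echo "$1: SSH itself failed" ;;
        *)   echo "$1: check failed with exit code $rc" ;;
    esac
    return "$rc"
}

# Usage (requires SSH access to the host, here the placeholder "node2"):
#   remote_db_check node2
```

Separating exit code 255 from the other values helps tell an SSH problem apart from a database connection problem on the remote node.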
Connection options
-S/--superuser: connect as the
named superuser instead of the repmgr user
Output format
--csv: generate output in CSV format (not available
for individual checks)
--nagios: generate output in a Nagios-compatible format
(for individual checks only)
Exit codes
When executing repmgr node check with one of the individual
checks listed above, repmgr will emit one of the following Nagios-style exit codes
(even if --nagios is not supplied):
0: OK
1: WARNING
2: ERROR
3: UNKNOWN
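A script calling an individual check can translate these exit codes back into their labels. The following is a minimal sketch: the helper name nagios_label is hypothetical and not part of repmgr, and the labels are taken from the list above.

```shell
# Hypothetical helper, not part of repmgr.
# Translate a Nagios-style exit code into the label listed above.
nagios_label() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "ERROR" ;;
        3) echo "UNKNOWN" ;;
        *) echo "UNEXPECTED ($1)" ;;
    esac
}

# Usage (requires repmgr and a valid configuration file):
#   repmgr -f /etc/repmgr.conf node check --role
#   echo "role check: $(nagios_label $?)"
```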
One of the following exit codes will be emitted by repmgr node check
if no individual check was specified.
SUCCESS (0): No issues were detected.
ERR_NODE_STATUS (25): One or more issues were detected.
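When no individual check is specified, a caller only needs to distinguish a zero exit code from a non-zero one. The following sketch shows one way to do that in a monitoring script. The function name report_node_health and the REPMGR variable are hypothetical, and the configuration file path is assumed to match the examples above.

```shell
# Hypothetical wrapper, not part of repmgr.
: "${REPMGR:=repmgr -f /etc/repmgr.conf}"

# Run the full set of checks, print a one-line summary after the
# command's own output, and pass the exit code through to the caller.
report_node_health() {
    rc=0
    # $REPMGR is deliberately unquoted so it can carry options.
    $REPMGR node check || rc=$?
    if [ "$rc" -eq 0 ]; then
        echo "no issues detected"
    else
        echo "one or more issues detected (exit code $rc)"
    fi
    return "$rc"
}

# Usage (requires repmgr and a valid configuration file):
#   report_node_health || echo "node needs attention"
```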