repmgrd setup and configuration
&repmgrd; is a daemon which runs on each PostgreSQL node,
monitoring the local node, and (unless it's the primary node) the upstream server
(the primary server or, with cascading replication, another standby) which it's
connected to.
&repmgrd; can be configured to provide failover
capability in case the primary upstream node becomes unreachable, and/or
provide monitoring data to the &repmgr; metadatabase.
repmgrd configuration
To use &repmgrd;, its associated function library must be
included via postgresql.conf with:
shared_preload_libraries = 'repmgr'
Changing this setting requires a restart of PostgreSQL; for more details see
the PostgreSQL documentation.
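As a quick sanity check before restarting PostgreSQL, the setting can be looked up in the configuration file. The sketch below uses a hypothetical helper name (has_repmgr_preload) and assumes a single, uncommented shared_preload_libraries line; it does not follow include directives or settings applied via ALTER SYSTEM.

```shell
# Hypothetical helper: succeeds if the given postgresql.conf has an
# uncommented shared_preload_libraries line which mentions repmgr.
has_repmgr_preload() {
    conf_file="$1"
    grep -Eq "^[[:space:]]*shared_preload_libraries[[:space:]]*=.*repmgr" "$conf_file"
}

# Example usage (the path is an assumption; adjust to your installation):
# has_repmgr_preload /var/lib/pgsql/11/data/postgresql.conf && echo "repmgr is preloaded"
```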
The following configuration options apply to &repmgrd; in all circumstances:
monitor_interval_secs
The interval (in seconds, default: 2) to check the availability of the upstream node.
connection_check_type
The option is used to select the method
&repmgrd; uses to determine whether the upstream node is available.
Possible values are:
ping (default) - uses PQping() to
determine server availability
connection - determines server availability
by attempting to make a new connection to the upstream node
query - determines server availability
by executing an SQL statement on the node via the existing connection
reconnect_attempts
The number of attempts (default: 6) which will be made to reconnect to an unreachable
upstream node before initiating a failover.
There will be an interval of reconnect_interval seconds between each reconnection
attempt.
reconnect_interval
Interval (in seconds, default: 10) between attempts to reconnect to an unreachable
upstream node.
The number of reconnection attempts is defined by the parameter reconnect_attempts.
degraded_monitoring_timeout
Interval (in seconds) after which &repmgrd; will terminate if
either of the servers (local node or upstream node) being monitored is no longer available
(degraded monitoring mode).
-1 (default) disables this timeout completely.
See also repmgr.conf.sample for an annotated sample configuration file.
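As a rough worked example of how reconnect_attempts and reconnect_interval combine: with the defaults listed above, an unreachable upstream node is retried for about a minute before a failover is initiated. The helper name below is hypothetical, and the estimate ignores the time each connection attempt itself takes.

```shell
# Hypothetical helper: rough number of seconds spent on reconnection
# attempts before failover is initiated (attempts x interval).
failover_delay_estimate() {
    reconnect_attempts="$1"
    reconnect_interval="$2"
    echo $((reconnect_attempts * reconnect_interval))
}

failover_delay_estimate 6 10    # defaults: prints 60
```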
Required configuration for automatic failover
The following &repmgrd; options must be set in
repmgr.conf:
failover
promote_command
follow_command
Example:
failover=automatic
promote_command='/usr/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
follow_command='/usr/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
Details of each option are as follows:
failover
failover can be one of automatic or manual.
If failover is set to manual, &repmgrd;
will not take any action if a failover situation is detected, and the node may need to
be modified manually (e.g. by executing repmgr standby follow).
promote_command
The program or script defined in promote_command will be executed
in a failover situation when &repmgrd; determines that
the current node is to become the new primary node.
Normally promote_command is set as &repmgr;'s
repmgr standby promote command.
When invoking repmgr standby promote (either directly via
promote_command, or in a script called
via promote_command),
must not be included as a
command line option for repmgr standby promote.
It is also possible to provide a shell script to e.g. perform user-defined tasks
before promoting the current node. In this case the script must
at some point execute repmgr standby promote
to promote the node; if this is not done, &repmgr; metadata will not be updated and
&repmgr; will no longer function reliably.
Example:
promote_command='/usr/bin/repmgr standby promote -f /etc/repmgr.conf --log-to-file'
Note that the --log-to-file option will cause
output generated by the &repmgr; command, when executed by &repmgrd;,
to be logged to the same destination configured to receive log output for &repmgrd;.
&repmgr; will not apply when executing
promote_command or follow_command; these can be user-defined scripts so must always be
specified with the full path.
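A wrapper script of the kind described above might be structured as follows. This is only a sketch: the function names and the REPMGR_BIN and REPMGR_CONF variables are placeholders introduced here, and pre_promote_tasks stands in for whatever site-specific work is needed. The important point is that repmgr standby promote is always executed and that its exit status is passed back to the caller.

```shell
# Paths match the example above; adjust as appropriate.
REPMGR_BIN="${REPMGR_BIN:-/usr/bin/repmgr}"
REPMGR_CONF="${REPMGR_CONF:-/etc/repmgr.conf}"

# Placeholder for user-defined tasks to run before promotion.
pre_promote_tasks() {
    echo "running pre-promotion tasks"
}

# Run the user-defined tasks, then promote the node. In this sketch a
# failing pre-promotion task is reported but does not prevent promotion.
promote_with_tasks() {
    pre_promote_tasks || echo "warning: pre-promotion tasks failed" >&2
    "$REPMGR_BIN" standby promote -f "$REPMGR_CONF" --log-to-file
}
```

Saved as an executable script which ends by calling promote_with_tasks, this would be referenced by its full path in promote_command.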
follow_command
The program or script defined in follow_command will be executed
in a failover situation when &repmgrd; determines that
the current node is to follow the new primary node.
Normally follow_command is set as &repmgr;'s
repmgr standby follow command.
The follow_command parameter
should provide the --upstream-node-id=%n
option to repmgr standby follow; the %n will be replaced by
&repmgrd; with the ID of the new primary node. If this is not provided,
repmgr standby follow will attempt to determine the new primary by itself, but if the
original primary comes back online after the new primary is promoted, there is a risk that
repmgr standby follow will result in the node continuing to follow
the original primary.
It is also possible to provide a shell script to e.g. perform user-defined tasks
before following the new primary. In this case the script must
at some point execute repmgr standby follow
so that the node follows the new primary; if this is not done, &repmgr; metadata will not be updated and
&repmgr; will no longer function reliably.
Example:
follow_command='/usr/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'
Note that the --log-to-file option will cause
output generated by the &repmgr; command, when executed by &repmgrd;,
to be logged to the same destination configured to receive log output for &repmgrd;.
&repmgr; will not apply when executing
promote_command or follow_command; these can be user-defined scripts so must always be
specified with the full path.
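To make the %n placeholder concrete, the following sketch mimics the substitution with sed. It only illustrates the effect described above and is not how &repmgrd; implements it; expand_follow_command is a hypothetical name.

```shell
# Hypothetical helper: replace every %n in a command template with
# the node ID of the new primary.
expand_follow_command() {
    template="$1"
    new_primary_id="$2"
    printf '%s\n' "$template" | sed "s/%n/${new_primary_id}/g"
}

expand_follow_command \
    '/usr/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n' 2
# prints: /usr/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=2
```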
Optional configuration for automatic failover
The following configuration options can be used to fine-tune automatic failover:
priority
Indicates a preferred priority (default: 100) for promoting nodes;
a value of zero prevents the node being promoted to primary.
Note that the priority setting is only applied if two or more nodes are
determined as promotion candidates; in that case the node with the
higher priority is selected.
failover_validation_command
User-defined script to execute for an external mechanism to validate the failover
decision made by &repmgrd;.
This option must be identically configured
on all nodes.
One or both of the following parameter placeholders
should be provided, which will be replaced by repmgrd with the appropriate
value:
%n: node ID
%a: node name
See also: Failover validation.
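A validation script might look like the sketch below. It assumes that the script is invoked with the node ID and node name as its two arguments (for example with %n %a appended to failover_validation_command), that a zero exit status approves the promotion, and that any other status is treated as an error. The veto-file mechanism and its path are purely hypothetical.

```shell
# Hypothetical location of a file an operator can create to veto failover.
VETO_FILE="${VETO_FILE:-/etc/repmgr/failover.veto}"

# Approve (exit status 0) unless the veto file exists.
validate_failover() {
    node_id="$1"
    node_name="$2"
    if [ -e "$VETO_FILE" ]; then
        echo "failover to node $node_id ($node_name) vetoed by $VETO_FILE" >&2
        return 1
    fi
    echo "failover to node $node_id ($node_name) approved"
    return 0
}
```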
primary_visibility_consensus
If true, only continue with failover if no standbys have seen
the primary node recently.
This option must be identically configured
on all nodes.
standby_disconnect_on_failover
In a failover situation, disconnect the local node's WAL receiver.
This option is available in PostgreSQL 9.5 and later.
This option must be identically configured
on all nodes.
Additionally the &repmgr; user must be a superuser
for this option.
&repmgrd; will refuse to start if this option is set
but either of these prerequisites is not met.
See also: Standby disconnection on failover.
The following options can be used to further fine-tune failover behaviour.
In practice it's unlikely these will need to be changed from their default
values, but they are available as configuration options should the need arise.
election_rerun_interval
If failover_validation_command is set, and the command returns
an error, pause the specified number of seconds (default: 15) before rerunning the election.
sibling_nodes_disconnect_timeout
If standby_disconnect_on_failover is true, the
maximum length of time (in seconds, default: 30)
to wait for other standbys to confirm they have disconnected their
WAL receivers.
PostgreSQL service configuration
If using automatic failover, currently &repmgrd; will need to execute
repmgr standby follow
to restart PostgreSQL on standbys to have them follow a new primary.
To ensure this happens smoothly, it's essential to provide the service restart
command appropriate to your operating system via service_restart_command
in repmgr.conf. If you don't do this, &repmgrd;
will default to using pg_ctl, which can result in unexpected problems,
particularly on systemd-based systems.
For more details, see .
repmgrd service configuration
If you are intending to use the repmgr daemon start
and repmgr daemon stop commands, the following
parameters must be set in repmgr.conf:
repmgrd_service_start_command
repmgrd_service_stop_command
Example (for &repmgr; with PostgreSQL 11 on CentOS 7):
repmgrd_service_start_command='sudo systemctl start repmgr11'
repmgrd_service_stop_command='sudo systemctl stop repmgr11'
For more details see the reference page for each command.
Monitoring configuration
To enable monitoring, set:
monitoring_history=yes
in repmgr.conf.
Monitoring data is written at the interval defined by
the option monitor_interval_secs (see above).
For more details on monitoring, see .
Applying configuration changes to repmgrd
To apply configuration file changes to a running &repmgrd;
daemon, execute the operating system's &repmgrd; service reload command
(see for examples),
or for instances which were manually started, execute kill -HUP, e.g.
kill -HUP `cat /tmp/repmgrd.pid`.
Check the &repmgrd; log to see what changes were
applied, or if any issues were encountered when reloading the configuration.
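For manually started instances, the kill -HUP step can be wrapped with a couple of safety checks. A minimal sketch, assuming the PID file path used in the example above; reload_repmgrd is a hypothetical helper name.

```shell
# Hypothetical helper: send SIGHUP to the process named in the PID file,
# refusing to do so if the file is unreadable or the process cannot be
# signalled (e.g. it has already exited).
reload_repmgrd() {
    pid_file="${1:-/tmp/repmgrd.pid}"
    if [ ! -r "$pid_file" ]; then
        echo "cannot read PID file: $pid_file" >&2
        return 1
    fi
    pid=$(cat "$pid_file")
    if ! kill -0 "$pid" 2>/dev/null; then
        echo "no signalable process with PID $pid" >&2
        return 1
    fi
    kill -HUP "$pid"
}

# Example usage:
# reload_repmgrd /tmp/repmgrd.pid
```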
Note that only the following subset of configuration file parameters can be changed on a
running &repmgrd; daemon:
async_query_timeout
bdr_local_monitoring_only
bdr_recovery_timeout
child_nodes_check_interval
child_nodes_connected_include_witness
child_nodes_connected_min_count
child_nodes_disconnect_command
child_nodes_disconnect_min_count
child_nodes_disconnect_timeout
connection_check_type
conninfo
degraded_monitoring_timeout
event_notification_command
event_notifications
failover_validation_command
failover
follow_command
log_facility
log_file
log_level
log_status_interval
monitor_interval_secs
monitoring_history
primary_notification_timeout
primary_visibility_consensus
promote_command
reconnect_attempts
reconnect_interval
retry_promote_interval_secs
repmgrd_standby_startup_timeout
sibling_nodes_disconnect_timeout
standby_disconnect_on_failover
The following set of configuration file parameters must be updated via
repmgr standby register --force,
as they require changes to the repmgr.nodes table so they are visible to
all nodes in the replication cluster:
node_id
node_name
data_directory
location
priority
After executing repmgr standby register --force,
&repmgrd; must be restarted for the changes to take effect.
repmgrd daemon
If installed from a package, &repmgrd; can be started
via the operating system's service command, e.g. in systemd
using systemctl.
See appendix for details of service commands
for different distributions.
The commands repmgr daemon start and
repmgr daemon stop can be used
as convenience wrappers to start and stop &repmgrd;.
repmgr daemon start and
repmgr daemon stop require
that the appropriate start/stop commands are configured as
repmgrd_service_start_command and repmgrd_service_stop_command
in repmgr.conf.
&repmgrd; can be started manually like this:
repmgrd -f /etc/repmgr.conf --pid-file /tmp/repmgrd.pid
and stopped with kill `cat /tmp/repmgrd.pid`. Adjust paths as appropriate.
repmgrd's PID file
&repmgrd; will generate a PID file by default.
This is a behaviour change from previous versions (earlier than 4.1), where
the PID file had to be explicitly specified with the command line
parameter --pid-file.
The PID file can be specified in repmgr.conf with the configuration
parameter repmgrd_pid_file.
It can also be specified on the command line (as in previous versions) with
the command line parameter --pid-file. Note this will override
any value set in repmgr.conf with repmgrd_pid_file.
--pid-file may be deprecated in future releases.
If a PID file location was specified by the package maintainer, &repmgrd;
will use that. This only applies if &repmgr; was installed from a package and the package
maintainer has specified the PID file location.
If none of the above apply, &repmgrd; will create a PID file
in the operating system's temporary directory (as determined by the environment variable
TMPDIR, or if that is not set, will use /tmp).
To prevent a PID file being generated at all, provide the command line option
.
To see which PID file &repmgrd; would use, execute &repmgrd;
with the option . &repmgrd;
will not start if this option is provided. Note that the value shown is the
file &repmgrd; would use next time it starts, and is
not necessarily the PID file currently in use.
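The order of precedence described in this section can be summarised as a small shell function. This is an illustration only: resolve_pid_file and its three arguments are hypothetical, and the file name repmgrd.pid in the temporary directory is an assumption based on the examples in this document.

```shell
# Hypothetical helper mirroring the order described above:
#   $1 = value given on the command line (may be empty)
#   $2 = repmgrd_pid_file from repmgr.conf (may be empty)
#   $3 = location set by the package maintainer (may be empty)
resolve_pid_file() {
    if [ -n "$1" ]; then
        echo "$1"
    elif [ -n "$2" ]; then
        echo "$2"
    elif [ -n "$3" ]; then
        echo "$3"
    else
        # Fall back to TMPDIR, or /tmp if TMPDIR is unset or empty.
        echo "${TMPDIR:-/tmp}/repmgrd.pid"
    fi
}

# Example usage: only the repmgr.conf value is set.
# resolve_pid_file "" /var/run/repmgr/repmgrd.pid ""
```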
repmgrd daemon configuration on Debian/Ubuntu
If &repmgr; was installed from Debian/Ubuntu packages, additional configuration
is required before &repmgrd; is started as a daemon.
This is done via the file /etc/default/repmgrd, which by default
looks like this:
# default settings for repmgrd. This file is source by /bin/sh from
# /etc/init.d/repmgrd
# disable repmgrd by default so it won't get started upon installation
# valid values: yes/no
REPMGRD_ENABLED=no
# configuration file (required)
#REPMGRD_CONF="/path/to/repmgr.conf"
# additional options
REPMGRD_OPTS="--daemonize=false"
# user to run repmgrd as
#REPMGRD_USER=postgres
# repmgrd binary
#REPMGRD_BIN=/usr/bin/repmgrd
# pid file
#REPMGRD_PIDFILE=/var/run/repmgrd.pid
Set REPMGRD_ENABLED to yes, and REPMGRD_CONF
to the repmgr.conf file you are using.
See for details of the Debian/Ubuntu packages and
typical file locations (including repmgr.conf).
From &repmgrd; 4.1, ensure REPMGRD_OPTS includes
--daemonize=false, as daemonization is handled by the service command.
If using systemd, you may need to execute systemctl daemon-reload.
Also, if you attempted to start &repmgrd; using systemctl start repmgrd,
you'll need to execute systemctl stop repmgrd. Because that's how systemd
rolls.
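The two edits to /etc/default/repmgrd can be scripted. The sketch below uses a hypothetical helper, enable_repmgrd_defaults, which rewrites the file via a temporary copy. It assumes the file has the default layout shown above and that the configuration path contains no | or & characters (which would confuse sed); ownership and permissions of the rewritten file should be checked afterwards.

```shell
# Hypothetical helper: set REPMGRD_ENABLED=yes and point REPMGRD_CONF
# at the given repmgr.conf in a Debian/Ubuntu defaults file.
enable_repmgrd_defaults() {
    defaults_file="$1"
    conf_path="$2"
    tmp_file="${defaults_file}.new"
    sed -e 's|^REPMGRD_ENABLED=.*|REPMGRD_ENABLED=yes|' \
        -e "s|^#*REPMGRD_CONF=.*|REPMGRD_CONF=\"${conf_path}\"|" \
        "$defaults_file" > "$tmp_file" && mv "$tmp_file" "$defaults_file"
}

# Example usage (typically run as root; the repmgr.conf path is an assumption):
# enable_repmgrd_defaults /etc/default/repmgrd /etc/repmgr.conf
```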
repmgrd connection settings
In addition to the &repmgr; configuration settings, parameters in the
conninfo string influence how &repmgr; makes a network connection to
PostgreSQL. In particular, if another server in the replication cluster
is unreachable at network level, system network settings will influence
the length of time it takes to determine that the connection is not possible.
Explicitly setting connect_timeout
should be considered; the effective minimum value of 2
(seconds) will ensure that a connection failure at network level is reported
as soon as possible. Otherwise, depending on the system settings (e.g.
tcp_syn_retries in Linux), a delay of a minute or more
is possible.
For further details on conninfo network connection
parameters, see the
PostgreSQL documentation.
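A simple way to spot conninfo strings which omit connect_timeout is sketched below. It assumes the keyword/value form of conninfo (not a connection URI) with no whitespace around the equals sign; has_connect_timeout is a hypothetical helper name, and the host, user and database names in the usage lines are placeholders.

```shell
# Hypothetical helper: succeeds if a keyword/value conninfo string
# sets connect_timeout explicitly (written as connect_timeout=N).
has_connect_timeout() {
    case " $1 " in
        *" connect_timeout="*) return 0 ;;
        *) return 1 ;;
    esac
}

conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'
if has_connect_timeout "$conninfo"; then
    echo "connect_timeout is set"
else
    echo "consider adding connect_timeout to conninfo" >&2
fi
```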
repmgrd log rotation
To ensure the current &repmgrd; logfile
(specified in repmgr.conf with the parameter
log_file) does not grow indefinitely, configure your
system's logrotate to regularly rotate it.
Sample configuration to rotate logfiles weekly with retention for
up to 52 weeks and rotation forced if a file grows beyond 100MB:
/var/log/repmgr/repmgrd.log {
    missingok
    compress
    rotate 52
    maxsize 100M
    weekly
    create 0600 postgres postgres
    postrotate
        /usr/bin/killall -HUP repmgrd
    endscript
}