FAQ (Frequently Asked Questions)

General

What's the difference between the repmgr versions?

&repmgr; 4 is a complete rewrite of the existing &repmgr; code base and implements &repmgr; as a PostgreSQL extension. It supports all PostgreSQL versions from 9.3 onwards (although some &repmgr; features are not available for PostgreSQL 9.3 and 9.4).

&repmgr; 3.x builds on the improved replication facilities added in PostgreSQL 9.3 and provides improved automated failover support via repmgrd; it is not compatible with PostgreSQL 9.2 and earlier. We recommend upgrading to &repmgr; 4, as the &repmgr; 3.x series will no longer be actively maintained.

&repmgr; 2.x supports PostgreSQL 9.0 to 9.3 and is no longer maintained. Although it is compatible with PostgreSQL 9.3, we recommend using &repmgr; 4.x instead.

What's the advantage of using replication slots?

Replication slots, introduced in PostgreSQL 9.4, ensure that the primary server will retain WAL files until they have been consumed by all standby servers. This makes WAL file management much easier, and if replication slots are in use, &repmgr; will no longer insist on a fixed minimum number (default: 5000) of WAL files being retained.

However, this does mean that if a standby is no longer connected to the primary, the presence of the replication slot will cause WAL files to be retained indefinitely.

How many replication slots should I define in <varname>max_replication_slots</varname>?

Normally at least as many as the number of standbys which will connect to the node. Note that changes to max_replication_slots require a server restart to take effect, and as there is no particular penalty for unused replication slots, setting a higher figure will make adding new nodes easier.

Does &repmgr; support hash indexes?

Before PostgreSQL 10, hash indexes were not WAL-logged and are therefore not suitable for use in streaming replication in PostgreSQL 9.6 and earlier. See the PostgreSQL documentation for details.
From PostgreSQL 10, this restriction has been lifted and hash indexes can be used in a streaming replication cluster.

Can &repmgr; assist with upgrading a PostgreSQL cluster?

For minor version upgrades, e.g. from 9.6.7 to 9.6.8, a common approach is to upgrade a standby to the latest version, perform a switchover promoting it to primary, then upgrade the former primary.

For major version upgrades (e.g. from PostgreSQL 9.6 to PostgreSQL 10), the traditional approach is to "reseed" a cluster by upgrading a single node with pg_upgrade and recloning the standbys from it.

To minimize downtime during major upgrades, with more recent PostgreSQL versions pglogical can be used to set up a parallel cluster running the newer PostgreSQL version, which can be kept in sync with the existing production cluster until the new cluster is ready to be put into production.

<command>repmgr</command>

Can I register an existing PostgreSQL server with repmgr?

Yes, any existing PostgreSQL server which is part of the same replication cluster can be registered with &repmgr;. There's no requirement for a standby to have been cloned using &repmgr;.

Can I use a standby not cloned by &repmgr; as a &repmgr; node?

For a standby which has been manually cloned or recovered from an external backup manager such as Barman, the command repmgr standby clone --recovery-conf-only can be used to create the correct recovery.conf file for use with &repmgr; (it will also create a replication slot if required). Once this has been done, register the node as usual.

What does &repmgr; write in <filename>recovery.conf</filename>, and what options can be set there?

See section Customising recovery.conf.

How can a failed primary be re-added as a standby?

This is a two-stage process. First, the failed primary's data directory must be re-synced with the current primary; second, the failed primary needs to be re-registered as a standby.
It's possible to use pg_rewind to re-synchronise the existing data directory, which will usually be much faster than re-cloning the server. However, pg_rewind can only be used if PostgreSQL either has wal_log_hints enabled, or data checksums were enabled when the cluster was initialized.

Note that pg_rewind is available as part of the core PostgreSQL distribution from PostgreSQL 9.5, and as a third-party utility for PostgreSQL 9.3 and 9.4.

&repmgr; provides the command repmgr node rejoin, which can optionally execute pg_rewind; see the documentation for that command for details.

If pg_rewind cannot be used, then the data directory will need to be re-cloned from scratch.

Is there an easy way to check my primary server is correctly configured for use with &repmgr;?

Execute repmgr standby clone with the --dry-run option; this will report any configuration problems which need to be rectified.

When cloning a standby, how can I get &repmgr; to copy <filename>postgresql.conf</filename> and <filename>pg_hba.conf</filename> from the PostgreSQL configuration directory in <filename>/etc</filename>?

Use the command line option --copy-external-config-files. See the &repmgr; documentation for more details.

Do I need to include <literal>shared_preload_libraries = 'repmgr'</literal> in <filename>postgresql.conf</filename> if I'm not using <application>repmgrd</application>?

No, the repmgr shared library is only needed when running repmgrd. If you later decide to run repmgrd, you just need to add shared_preload_libraries = 'repmgr' and restart PostgreSQL.

I've provided replication permission for the <literal>repmgr</literal> user in <filename>pg_hba.conf</filename> but <command>repmgr</command>/<application>repmgrd</application> complains it can't connect to the server... Why?

repmgr and repmgrd need to be able to connect to the repmgr database with a normal connection to query metadata.
The replication connection permission is for PostgreSQL's streaming replication (and doesn't necessarily need to be granted to the repmgr user).

When cloning a standby, why do I need to provide the connection parameters for the primary server on the command line, not in the configuration file?

Cloning a standby is a one-time action; the role of the server being cloned from could change, so fixing it in the configuration file would create confusion. If &repmgr; needs to establish a connection to the primary server, it can retrieve the connection details from the repmgr.nodes table on the local node, and if necessary scan the replication cluster until it locates the active primary.

When cloning a standby, how do I ensure the WAL files are placed in a custom directory?

Provide the option --waldir (--xlogdir in PostgreSQL 9.6 and earlier) with the absolute path to the WAL directory in pg_basebackup_options. See the &repmgr; documentation for more details.

Why is there no foreign key on the <literal>node_id</literal> column in the <literal>repmgr.events</literal> table?

Under some circumstances event notifications can be generated for servers which have not yet been registered; it's also useful to retain a record of events which includes servers removed from the replication cluster which no longer have an entry in the repmgr.nodes table.

<application>repmgrd</application>

How can I prevent a node from ever being promoted to primary?

In repmgr.conf, set its priority to a value of 0 or less; apply the changed setting with repmgr standby register --force.

Additionally, if failover is set to manual, the node will never be considered as a promotion candidate.

Does <application>repmgrd</application> support delayed standbys?

repmgrd can monitor delayed standbys (those set up with recovery_min_apply_delay set to a non-zero value in recovery.conf), but as it's not currently possible to directly examine the value applied to the standby, repmgrd may not be able to properly evaluate the node as a promotion candidate.
We recommend that delayed standbys are explicitly excluded from promotion by setting priority to 0 in repmgr.conf.

Note that after registering a delayed standby, repmgrd will only start once the metadata added on the primary node has been replicated to the standby.

How can I get <application>repmgrd</application> to rotate its logfile?

Configure your system's logrotate service to do this; see the &repmgr; documentation for details.

I've recloned a failed primary as a standby, but <application>repmgrd</application> refuses to start?

Check that you registered the standby after recloning. If unregistered, the standby cannot be considered as a promotion candidate even if failover is set to automatic, which is probably not what you want. repmgrd will start if failover is set to manual, so the node's replication status can still be monitored if desired.
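
As a rough illustration of the registration check described above, the following shell sketch inspects the cluster metadata and then registers the recloned node as a standby. It is an assumption-laden example rather than a definitive procedure: the configuration path /etc/repmgr.conf is assumed, and the DRY_RUN variable, the run helper and the reregister_standby function are hypothetical conveniences that are not part of &repmgr;. By default the script only prints the commands it would execute.

```shell
#!/bin/sh
# Sketch: re-register a recloned node so repmgrd can treat it as a standby.
# REPMGR_CONF is an assumed path; DRY_RUN, run() and reregister_standby()
# are illustrative helpers, not part of repmgr.
REPMGR_CONF="${REPMGR_CONF:-/etc/repmgr.conf}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command when DRY_RUN=1 (the default); execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reregister_standby() {
    # Show how each node is currently recorded in the cluster metadata.
    run repmgr -f "$REPMGR_CONF" cluster show
    # Register the recloned node as a standby; --force overwrites
    # an existing node record.
    run repmgr -f "$REPMGR_CONF" standby register --force
}

reregister_standby
```

Run the script as-is to preview the commands, then set DRY_RUN=0 and run it on the recloned node as the user which normally runs &repmgr; to execute them.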