diff --git a/doc/appendix-faq.xml b/doc/appendix-faq.xml index f1a120b6..41768074 100644 --- a/doc/appendix-faq.xml +++ b/doc/appendix-faq.xml @@ -3,443 +3,442 @@ FAQ (Frequently Asked Questions) - FAQ (Frequently Asked Questions) + FAQ (Frequently Asked Questions) - - General + + General - - What's the difference between the repmgr versions? - - &repmgr; 4 is a complete rewrite of the previous &repmgr; code base - and implements &repmgr; as a PostgreSQL extension. It - supports all PostgreSQL versions from 9.3 (although some &repmgr; - features are not available for PostgreSQL 9.3 and 9.4). - - + + What's the difference between the repmgr versions? - &repmgr; 5 is fundamentally the same code base as &repmgr; 4, but provides - support for the revised replication configuration mechanism in PostgreSQL 12. + &repmgr; 4 is a complete rewrite of the previous &repmgr; code base + and implements &repmgr; as a PostgreSQL extension. It + supports all PostgreSQL versions from 9.3 (although some &repmgr; + features are not available for PostgreSQL 9.3 and 9.4). - - - &repmgr; 3.x builds on the improved replication facilities added - in PostgreSQL 9.3, as well as improved automated failover support - via &repmgrd;, and is not compatible with PostgreSQL 9.2 - and earlier. We recommend upgrading to &repmgr; 4, as the &repmgr; 3.x - series is no longer maintained. - - - &repmgr; 2.x supports PostgreSQL 9.0 ~ 9.3. While it is compatible - with PostgreSQL 9.3, we recommend using repmgr 4.x. &repmgr; 2.x is - no longer maintained. - - - See also &repmgr; compatibility matrix - and Should I upgrade &repmgr;?. - - + + + &repmgr; 5 is fundamentally the same code base as &repmgr; 4, but provides + support for the revised replication configuration mechanism in PostgreSQL 12. + + + + &repmgr; 3.x builds on the improved replication facilities added + in PostgreSQL 9.3, as well as improved automated failover support + via &repmgrd;, and is not compatible with PostgreSQL 9.2 + and earlier. 
We recommend upgrading to &repmgr; 4, as the &repmgr; 3.x + series is no longer maintained. + + + &repmgr; 2.x supports PostgreSQL 9.0 ~ 9.3. While it is compatible + with PostgreSQL 9.3, we recommend using repmgr 4.x. &repmgr; 2.x is + no longer maintained. + + + See also &repmgr; compatibility matrix + and Should I upgrade &repmgr;?. + + - - What's the advantage of using replication slots? - - Replication slots, introduced in PostgreSQL 9.4, ensure that the - primary server will retain WAL files until they have been consumed - by all standby servers. This means standby servers should never - fail due to not being able to retrieve required WAL files from the - primary. - - - However this does mean that if a standby is no longer connected to the - primary, the presence of the replication slot will cause WAL files - to be retained indefinitely, and eventually lead to disk space - exhaustion. - + + What's the advantage of using replication slots? + + Replication slots, introduced in PostgreSQL 9.4, ensure that the + primary server will retain WAL files until they have been consumed + by all standby servers. This means standby servers should never + fail due to not being able to retrieve required WAL files from the + primary. + + + However this does mean that if a standby is no longer connected to the + primary, the presence of the replication slot will cause WAL files + to be retained indefinitely, and eventually lead to disk space + exhaustion. + - - - 2ndQuadrant's recommended configuration is to configure - Barman as a fallback - source of WAL files, rather than maintain replication slots for - each standby. See also: Using Barman as a WAL file source. - - - + + + 2ndQuadrant's recommended configuration is to configure + Barman as a fallback + source of WAL files, rather than maintain replication slots for + each standby. See also: Using Barman as a WAL file source. + + + - - How many replication slots should I define in <varname>max_replication_slots</varname>? 
- - Normally at least the same number as the number of standbys which will connect - to the node. Note that changes to max_replication_slots require a server - restart to take effect, and as there is no particular penalty for unused - replication slots, setting a higher figure will make adding new nodes - easier. - - + + How many replication slots should I define in <varname>max_replication_slots</varname>? + + Normally at least the same number as the number of standbys which will connect + to the node. Note that changes to max_replication_slots require a server + restart to take effect, and as there is no particular penalty for unused + replication slots, setting a higher figure will make adding new nodes + easier. + + - - Does &repmgr; support hash indexes? - - Before PostgreSQL 10, hash indexes were not WAL logged and are therefore not suitable - for use in streaming replication in PostgreSQL 9.6 and earlier. See the - PostgreSQL documentation - for details. - - - From PostgreSQL 10, this restriction has been lifted and hash indexes can be used - in a streaming replication cluster. - - + + Does &repmgr; support hash indexes? + + Before PostgreSQL 10, hash indexes were not WAL logged and are therefore not suitable + for use in streaming replication in PostgreSQL 9.6 and earlier. See the + PostgreSQL documentation + for details. + + + From PostgreSQL 10, this restriction has been lifted and hash indexes can be used + in a streaming replication cluster. + + - - Can &repmgr; assist with upgrading a PostgreSQL cluster? - - For minor version upgrades, e.g. from 9.6.7 to 9.6.8, a common - approach is to upgrade a standby to the latest version, perform a - switchover promoting it to a primary, - then upgrade the former primary. - - - For major version upgrades (e.g. from PostgreSQL 9.6 to PostgreSQL 10), - the traditional approach is to "reseed" a cluster by upgrading a single - node with pg_upgrade - and recloning standbys from this. 
- - - To minimize downtime during major upgrades from PostgreSQL 9.4 and later, - pglogical - can be used to set up a parallel cluster using the newer PostgreSQL version, - which can be kept in sync with the existing production cluster until the - new cluster is ready to be put into production. - - + + Can &repmgr; assist with upgrading a PostgreSQL cluster? + + For minor version upgrades, e.g. from 9.6.7 to 9.6.8, a common + approach is to upgrade a standby to the latest version, perform a + switchover promoting it to a primary, + then upgrade the former primary. + + + For major version upgrades (e.g. from PostgreSQL 9.6 to PostgreSQL 10), + the traditional approach is to "reseed" a cluster by upgrading a single + node with pg_upgrade + and recloning standbys from this. + + + To minimize downtime during major upgrades from PostgreSQL 9.4 and later, + pglogical + can be used to set up a parallel cluster using the newer PostgreSQL version, + which can be kept in sync with the existing production cluster until the + new cluster is ready to be put into production. + + - - What does this error mean: <literal>ERROR: could not access file "$libdir/repmgr"</literal>? - - It means the &repmgr; extension code is not installed in the - PostgreSQL application directory. This typically happens when using PostgreSQL - packages provided by a third-party vendor, which often have different - filesystem layouts. - - - Either use PostgreSQL packages provided by the community or 2ndQuadrant; if this - is not possible, contact your vendor for assistance. - - + + What does this error mean: <literal>ERROR: could not access file "$libdir/repmgr"</literal>? + + It means the &repmgr; extension code is not installed in the + PostgreSQL application directory. This typically happens when using PostgreSQL + packages provided by a third-party vendor, which often have different + filesystem layouts. 
+ + + Either use PostgreSQL packages provided by the community or 2ndQuadrant; if this + is not possible, contact your vendor for assistance. + + - - How can I obtain old versions of &repmgr; packages? - - See appendix for details. - - + + How can I obtain old versions of &repmgr; packages? + + See appendix for details. + + - - Is &repmgr; required for streaming replication? - - No. - - - &repmgr; (together with &repmgrd;) assists with - managing replication. It does not actually perform replication, which - is part of the core PostgreSQL functionality. - - + + Is &repmgr; required for streaming replication? + + No. + + + &repmgr; (together with &repmgrd;) assists with + managing replication. It does not actually perform replication, which + is part of the core PostgreSQL functionality. + + - - Will replication stop working if &repmgr; is uninstalled? - - No. See preceding question. - - + + Will replication stop working if &repmgr; is uninstalled? + + No. See preceding question. + + - - Does it matter if different &repmgr; versions are present in the replication cluster? - - Yes. If different "major" &repmgr; versions (e.g. 3.3.x and 4.1.x) are present, - &repmgr; (in particular &repmgrd;) - may not run, or run properly, or in the worst case (if different &repmgrd; - versions are running and there are differences in the failover implementation) break - your replication cluster. - - - If different "minor" &repmgr; versions (e.g. 4.1.1 and 4.1.6) are installed, - &repmgr; will function, but we strongly recommend always running the same version - to ensure there are no unexpected surprises, e.g. a newer version behaving slightly - differently to the older version. - - - See also Should I upgrade &repmgr;?. - - + + Does it matter if different &repmgr; versions are present in the replication cluster? + + Yes. If different "major" &repmgr; versions (e.g. 
3.3.x and 4.1.x) are present, + &repmgr; (in particular &repmgrd;) + may not run, or run properly, or in the worst case (if different &repmgrd; + versions are running and there are differences in the failover implementation) break + your replication cluster. + + + If different "minor" &repmgr; versions (e.g. 4.1.1 and 4.1.6) are installed, + &repmgr; will function, but we strongly recommend always running the same version + to ensure there are no unexpected surprises, e.g. a newer version behaving slightly + differently to the older version. + + + See also Should I upgrade &repmgr;?. + + - - Should I upgrade &repmgr;? - - Yes. - - - We don't release new versions for fun, you know. Upgrading may require a little effort, - but running an older &repmgr; version with bugs which have since been fixed may end up - costing you more effort. The same applies to PostgreSQL itself. - + + Should I upgrade &repmgr;? + + Yes. + + + We don't release new versions for fun, you know. Upgrading may require a little effort, + but running an older &repmgr; version with bugs which have since been fixed may end up + costing you more effort. The same applies to PostgreSQL itself. + + - + + Why do I need to specify the data directory location in repmgr.conf? + + In some circumstances &repmgr; may need to access a PostgreSQL data + directory while the PostgreSQL server is not running, e.g. to confirm + it shut down cleanly during a switchover. + + + Additionally, this provides support when using &repmgr; on PostgreSQL 9.6 and + earlier, where the repmgr user is not a superuser; in that + case the repmgr user will not be able to access the + data_directory configuration setting, access to which is restricted + to superusers. (In PostgreSQL 10 and later, non-superusers can be added to the + group which will enable them to read this setting). + + + - - Why do I need to specify the data directory location in repmgr.conf? 
- - In some circumstances &repmgr; may need to access a PostgreSQL data - directory while the PostgreSQL server is not running, e.g. to confirm - it shut down cleanly during a switchover. - - - Additionally, this provides support when using &repmgr; on PostgreSQL 9.6 and - earlier, where the repmgr user is not a superuser; in that - case the repmgr user will not be able to access the - data_directory configuration setting, access to which is restricted - to superusers. (In PostgreSQL 10 and later, non-superusers can be added to the - group which will enable them to read this setting). - - - + + <command>repmgr</command> - - <command>repmgr</command> + + Can I register an existing PostgreSQL server with repmgr? + + Yes, any existing PostgreSQL server which is part of the same replication + cluster can be registered with &repmgr;. There's no requirement for a + standby to have been cloned using &repmgr;. + + - - Can I register an existing PostgreSQL server with repmgr? - - Yes, any existing PostgreSQL server which is part of the same replication - cluster can be registered with &repmgr;. There's no requirement for a - standby to have been cloned using &repmgr;. - - + + Can I use a standby not cloned by &repmgr; as a &repmgr; node? - - Can I use a standby not cloned by &repmgr; as a &repmgr; node? + + For a standby which has been manually cloned or recovered from an external + backup manager such as Barman, the command + repmgr standby clone --recovery-conf-only + can be used to create the correct recovery.conf file for + use with &repmgr; (and will create a replication slot if required). Once this has been done, + register the node as usual. + + - - For a standby which has been manually cloned or recovered from an external - backup manager such as Barman, the command - repmgr standby clone --recovery-conf-only - can be used to create the correct recovery.conf file for - use with &repmgr; (and will create a replication slot if required). 
Once this has been done, - register the node as usual. - - + + What does &repmgr; write in <filename>recovery.conf</filename>, and what options can be set there? + + See section Customising recovery.conf. + + - - What does &repmgr; write in <filename>recovery.conf</filename>, and what options can be set there? - - See section Customising recovery.conf. - - + + How can a failed primary be re-added as a standby? + + This is a two-stage process. First, the failed primary's data directory + must be re-synced with the current primary; secondly the failed primary + needs to be re-registered as a standby. + + + It's possible to use pg_rewind to re-synchronise the existing data + directory, which will usually be much + faster than re-cloning the server. However pg_rewind can only + be used if PostgreSQL either has wal_log_hints enabled, or + data checksums were enabled when the cluster was initialized. + + + Note that pg_rewind is available as part of the core PostgreSQL + distribution from PostgreSQL 9.5, and as a third-party utility for PostgreSQL 9.3 and 9.4. + + + &repmgr; provides the command repmgr node rejoin which can + optionally execute pg_rewind; see the + documentation for details, in particular the section . + + + If pg_rewind cannot be used, then the data directory will need + to be re-cloned from scratch. + - - How can a failed primary be re-added as a standby? - - This is a two-stage process. First, the failed primary's data directory - must be re-synced with the current primary; secondly the failed primary - needs to be re-registered as a standby. - - - It's possible to use pg_rewind to re-synchronise the existing data - directory, which will usually be much - faster than re-cloning the server. However pg_rewind can only - be used if PostgreSQL either has wal_log_hints enabled, or - data checksums were enabled when the cluster was initialized. 
- - - Note that pg_rewind is available as part of the core PostgreSQL - distribution from PostgreSQL 9.5, and as a third-party utility for PostgreSQL 9.3 and 9.4. - - - &repmgr; provides the command repmgr node rejoin which can - optionally execute pg_rewind; see the - documentation for details, in particular the section . - - - If pg_rewind cannot be used, then the data directory will need - to be re-cloned from scratch. - + - + + Is there an easy way to check my primary server is correctly configured for use with &repmgr;? + + Execute repmgr standby clone + with the --dry-run option; this will report any configuration problems + which need to be rectified. + + - - Is there an easy way to check my primary server is correctly configured for use with &repmgr;? - - Execute repmgr standby clone - with the --dry-run option; this will report any configuration problems - which need to be rectified. - - + + When cloning a standby, how can I get &repmgr; to copy + <filename>postgresql.conf</filename> and <filename>pg_hba.conf</filename> from the PostgreSQL configuration + directory in <filename>/etc</filename>? + + Use the command line option --copy-external-config-files. For more details + see . + + - - When cloning a standby, how can I get &repmgr; to copy - <filename>postgresql.conf</filename> and <filename>pg_hba.conf</filename> from the PostgreSQL configuration - directory in <filename>/etc</filename>? - - Use the command line option --copy-external-config-files. For more details - see . - - + + Do I need to include <literal>shared_preload_libraries = 'repmgr'</literal> + in <filename>postgresql.conf</filename> if I'm not using &repmgrd;? + + No, the repmgr shared library is only needed when running &repmgrd;. + If you later decide to run &repmgrd;, you just need to add + shared_preload_libraries = 'repmgr' and restart PostgreSQL. 
+ + - - Do I need to include <literal>shared_preload_libraries = 'repmgr'</literal> - in <filename>postgresql.conf</filename> if I'm not using &repmgrd;? - - No, the repmgr shared library is only needed when running &repmgrd;. - If you later decide to run &repmgrd;, you just need to add - shared_preload_libraries = 'repmgr' and restart PostgreSQL. - - + + I've provided replication permission for the <literal>repmgr</literal> user in <filename>pg_hba.conf</filename> + but <command>repmgr</command>/&repmgrd; complains it can't connect to the server... Why? + + repmgr and &repmgrd; need to be able to connect to the repmgr database + with a normal connection to query metadata. The replication connection + permission is for PostgreSQL's streaming replication (and doesn't necessarily need to be the repmgr user). + + - - I've provided replication permission for the <literal>repmgr</literal> user in <filename>pg_hba.conf</filename> - but <command>repmgr</command>/&repmgrd; complains it can't connect to the server... Why? - - repmgr and &repmgrd; need to be able to connect to the repmgr database - with a normal connection to query metadata. The replication connection - permission is for PostgreSQL's streaming replication (and doesn't necessarily need to be the repmgr user). - - + + When cloning a standby, why do I need to provide the connection parameters + for the primary server on the command line, not in the configuration file? + + Cloning a standby is a one-time action; the role of the server being cloned + from could change, so fixing it in the configuration file would create + confusion. If &repmgr; needs to establish a connection to the primary + server, it can retrieve this from the repmgr.nodes table on the local + node, and if necessary scan the replication cluster until it locates the active primary. + + - - When cloning a standby, why do I need to provide the connection parameters - for the primary server on the command line, not in the configuration file? 
- - Cloning a standby is a one-time action; the role of the server being cloned - from could change, so fixing it in the configuration file would create - confusion. If &repmgr; needs to establish a connection to the primary - server, it can retrieve this from the repmgr.nodes table on the local - node, and if necessary scan the replication cluster until it locates the active primary. - - + + When cloning a standby, how do I ensure the WAL files are placed in a custom directory? + + Provide the option --waldir (--xlogdir in PostgreSQL 9.6 + and earlier) with the absolute path to the WAL directory in pg_basebackup_options. + For more details see . + + - - When cloning a standby, how do I ensure the WAL files are placed in a custom directory? - - Provide the option --waldir (--xlogdir in PostgreSQL 9.6 - and earlier) with the absolute path to the WAL directory in pg_basebackup_options. - For more details see . - - + + Why is there no foreign key on the <literal>node_id</literal> column in the <literal>repmgr.events</literal> + table? + + Under some circumstances event notifications can be generated for servers + which have not yet been registered; it's also useful to retain a record + of events which includes servers removed from the replication cluster + which no longer have an entry in the repmgr.nodes table. + + - - Why is there no foreign key on the <literal>node_id</literal> column in the <literal>repmgr.events</literal> - table? - - Under some circumstances event notifications can be generated for servers - which have not yet been registered; it's also useful to retain a record - of events which includes servers removed from the replication cluster - which no longer have an entry in the repmgr.nodes table. - - - - - Why are some values in <filename>recovery.conf</filename> surrounded by pairs of single quotes? 
- - This is to ensure that user-supplied values which are written as parameter values in recovery.conf - are escaped correctly and do not cause errors when recovery.conf is parsed. - - - The escaping is performed by an internal PostgreSQL routine, which leaves strings consisting - of digits and alphabetical characters only as-is, but wraps everything else in pairs of single quotes, - even if the string does not contain any characters which need escaping. - - + + Why are some values in <filename>recovery.conf</filename> surrounded by pairs of single quotes? + + This is to ensure that user-supplied values which are written as parameter values in recovery.conf + are escaped correctly and do not cause errors when recovery.conf is parsed. + + + The escaping is performed by an internal PostgreSQL routine, which leaves strings consisting + of digits and alphabetical characters only as-is, but wraps everything else in pairs of single quotes, + even if the string does not contain any characters which need escaping. + + - + - - &repmgrd; + + &repmgrd; - - How can I prevent a node from ever being promoted to primary? - - In repmgr.conf, set its priority to a value of 0; apply the changed setting with - repmgr standby register --force. - - - Additionally, if failover is set to manual, the node will never - be considered as a promotion candidate. - - + + How can I prevent a node from ever being promoted to primary? + + In repmgr.conf, set its priority to a value of 0; apply the changed setting with + repmgr standby register --force. + + + Additionally, if failover is set to manual, the node will never + be considered as a promotion candidate. + + - - Does &repmgrd; support delayed standbys? 
- - &repmgrd; can monitor delayed standbys - those set up with - recovery_min_apply_delay set to a non-zero value - in recovery.conf - but as it's not currently possible - to directly examine the value applied to the standby, &repmgrd; - may not be able to properly evaluate the node as a promotion candidate. - - - We recommend that delayed standbys are explicitly excluded from promotion - by setting priority to 0 in - repmgr.conf. - - - Note that after registering a delayed standby, &repmgrd; will only start - once the metadata added in the primary node has been replicated. - - + + Does &repmgrd; support delayed standbys? + + &repmgrd; can monitor delayed standbys - those set up with + recovery_min_apply_delay set to a non-zero value + in recovery.conf - but as it's not currently possible + to directly examine the value applied to the standby, &repmgrd; + may not be able to properly evaluate the node as a promotion candidate. + + + We recommend that delayed standbys are explicitly excluded from promotion + by setting priority to 0 in + repmgr.conf. + + + Note that after registering a delayed standby, &repmgrd; will only start + once the metadata added in the primary node has been replicated. + + - - How can I get &repmgrd; to rotate its logfile? - - Configure your system's logrotate service to do this; see . - + + How can I get &repmgrd; to rotate its logfile? + + Configure your system's logrotate service to do this; see . + - + - - I've recloned a failed primary as a standby, but &repmgrd; refuses to start? - - Check you registered the standby after recloning. If unregistered, the standby - cannot be considered as a promotion candidate even if failover is set to - automatic, which is probably not what you want. &repmgrd; will start if - failover is set to manual so the node's replication status can still - be monitored, if desired. - - + + I've recloned a failed primary as a standby, but &repmgrd; refuses to start? 
+ + Check you registered the standby after recloning. If unregistered, the standby + cannot be considered as a promotion candidate even if failover is set to + automatic, which is probably not what you want. &repmgrd; will start if + failover is set to manual so the node's replication status can still + be monitored, if desired. + + - - - &repmgrd; ignores pg_bindir when executing <varname>promote_command</varname> or <varname>follow_command</varname> - - - promote_command or follow_command can be user-defined scripts, - so &repmgr; will not apply even if executing &repmgr;. Always provide the full - path; see for more details. - - + + + &repmgrd; ignores pg_bindir when executing <varname>promote_command</varname> or <varname>follow_command</varname> + + + promote_command or follow_command can be user-defined scripts, + so &repmgr; will not apply even if executing &repmgr;. Always provide the full + path; see for more details. + + - - - &repmgrd; aborts startup with the error "<literal>upstream node must be running before repmgrd can start</literal>" - - - &repmgrd; does this to avoid starting up on a replication cluster - which is not in a healthy state. If the upstream is unavailable, &repmgrd; - may initiate a failover immediately after starting up, which could have unintended side-effects, - particularly if &repmgrd; is not running on other nodes. - - - In particular, it's possible that the node's local copy of the repmgr.nodes table - is out-of-date, which may lead to incorrect failover behaviour. - - - The onus is therefore on the administrator to manually set the cluster to a stable, healthy state before - starting &repmgrd;. - - + + + &repmgrd; aborts startup with the error "<literal>upstream node must be running before repmgrd can start</literal>" + + + &repmgrd; does this to avoid starting up on a replication cluster + which is not in a healthy state. 
If the upstream is unavailable, &repmgrd; + may initiate a failover immediately after starting up, which could have unintended side-effects, + particularly if &repmgrd; is not running on other nodes. + + + In particular, it's possible that the node's local copy of the repmgr.nodes table + is out-of-date, which may lead to incorrect failover behaviour. + + + The onus is therefore on the administrator to manually set the cluster to a stable, healthy state before + starting &repmgrd;. + + - +