Compare commits

...

1464 Commits

Author SHA1 Message Date
Mario Gonzalez
c631642409 Update product matrix and copyright date
References: https://github.com/EnterpriseDB/repmgr/issues/879
2025-01-15 13:02:22 -03:00
Mario Gonzalez
c0d9dc6dac Update HISTORY and appendix for v5.5.0
References: HL-38
2024-11-22 11:34:48 -03:00
Mario Gonzalez
836894c965 Bump repmgr.control to 5.5
References: HL-38
2024-11-21 14:42:50 +01:00
Mario Gonzalez
7f3a26f0d9 Add primary_node_info inline function
When a t_node_info struct must be re initilised, it was re-declared
only. Moreover, a macro was user for any var creation of this type
impeding use the same to reset the values later if needed.

This new function allows to re initilise again a t_node_info typed
variable without the need to redeclare it to create other varibles for
the same purpose, and also shadowing is now avoided from pg16.

Macros seems to be replaced by `static inline` functions in upstream
postgres, credits to  Alvaro Herrera <alvherre@alvh.no-ip.org> for this
idea.

References: HL-40
2024-11-18 09:46:14 -03:00
Mario Gonzalez
a469221e28 Fix shadowed declaration
Since b5934bfd6071 in postgresql.git the flag
`-Wshadow=compatible-local` is activated. This commit fixes any
duplicated declaration made in the same function.

References: HL-40
2024-11-18 09:46:14 -03:00
Mario Gonzalez
c6d9f38458 Bump repmgr version 5.5.0 2024-11-15 10:51:12 -03:00
Francisco Miguel Biete Banon
5abdbb0d39 Definition moved to a different header 2024-10-28 09:37:30 -03:00
Martín Marqués
8f27e3bc5d Fix wrong reference to connection from commit f69485c0
Commit f69485c0 introduced a change in how we check if the repmgr
user can run checkpoints on PG15 and newer.

There was a regression in the messages where the conn passed
was not declared. That is because the pointer we were using was
called local_conn instead of conn.

Signed-off-by: Martín Marqués <martin.marques@enterprisedb.com>
Co-authored-by: Mario Gonzalez <mario.gonzalez@enterprisedb.com>
2024-10-22 15:40:56 -03:00
RealGreenDragon
b92d43d136 Fixed repmgr.conf.sample 2024-10-14 14:46:27 +02:00
RealGreenDragon
4a28c57bc7 Check for USAGE (instead of MEMBER) privilege in all pg_has_role occurrences 2024-09-11 20:17:32 +02:00
RealGreenDragon
f69485c0ba Added check for pg_checkpoint role presence (#807)
* Added check for pg_checkpoint role presence

This commit provides the needed infrastructure in `repmgr` so if the `repmgr` database
user is a member of the `pg_checkpoint` role, and inherits its privileges, there is no 
need for such a user to be a superuser.

Co-authored-by: Martín Marqués <martin.marques@enterprisedb.com>
2024-09-11 15:13:44 -03:00
Martín Marqués
b4a0938081 Update Authors and version on README
Signed-off-by: Martín Marqués <martin.marques@enterprisedb.com>
2024-09-09 18:00:42 +02:00
Martín Marqués
569f906003 Add CODEOWNERS to the repmgr repo
Signed-off-by: Martín Marqués <martin.marques@enterprisedb.com>
2024-09-09 16:30:50 +02:00
RealGreenDragon
94b21ae8ac Fixed standby_disconnect_on_failover description in repmgr.conf 2024-09-09 15:29:48 +02:00
RealGreenDragon
4c9cca64d0 Fixed standby_disconnect_on_failover docs 2024-09-09 15:29:48 +02:00
RealGreenDragon
82e2fd66e1 Fixed can_disable_walsender indentation and warning message 2024-09-09 15:29:48 +02:00
RealGreenDragon
90fe1b8135 Fixed indentation 2024-09-09 15:29:48 +02:00
RealGreenDragon
1cd168360e Added ALTER SYSTEM permission in docs 2024-09-09 15:29:48 +02:00
RealGreenDragon
e8aa3aced7 Added check for ALTER SYSTEM permission presence 2024-09-09 15:29:48 +02:00
Martín Marqués
d3b1ff45b0 Merge pull request #782 from EnterpriseDB/Fixed-reported-phrase
Update configuration-file-optional-settings.xml
2023-07-04 09:04:04 -03:00
Martín Marqués
450786ec29 Typo in the documentation
Signed-off-by: Martín Marqués <martin.marques@enterprisedb.com>
2023-07-04 14:02:08 +02:00
Ian Barwick
70b34308cc repmgrd: ensure witness node metadata is updated
If the primary changed while the witness repmgrd was not running,
ensure the witness's upstream node ID is updated when the witness
repmgrd is restarted.
2023-06-08 16:35:56 +09:00
Martín Marqués
19c92a7092 Merge pull request #803 from Nyantechnolog/patch-1
Update README.md
2023-04-17 12:57:51 -03:00
Sidorov Pavel
520ff25ef3 Update README.md
upd upper version according to https://repmgr.org
2023-04-04 11:54:59 +03:00
Martín Marqués
8e81c04b4a Add release notes for repmgr release 5.4.0
This release added support for pg-backup-api. Here we add the
release notes which expose the new feature.

Signed-off-by: Martín Marqués <martin.marques@enterprisedb.com>
2023-03-24 15:05:03 +00:00
Martín Marqués
aad988c292 Merge pull request #800 from EnterpriseDB/update-docs-for-pg-backupapi-mode
Adding new mode to clone standbys.
2023-03-16 07:14:33 -03:00
Martín Marqués
43b8a5f65f Various improvements and fixes to the pg-backup-api restore section on the docs
I added a paragraph at the beginning of the section on where to look for
instructions on how to install the pg-backup-api rest API.

Fixed typos and some gramatical changes. Also reworded the first paragraph
(which is now the second one).

Signed-off-by: Martín Marqués <martin.marques@enterprisedb.com>
2023-03-14 11:39:22 +01:00
Martín Marqués
26cfb56170 New sect2 block has to be inside the sect1. The closing tag has to go at the end.
Signed-off-by: Martín Marqués <martin.marques@enterprisedb.com>
2023-03-14 10:14:16 +01:00
Mario Gonzalez
f0cc225de0 Adding new mode to clone standbys.
repmgr v5.4.0 supports this new mode called `pg_backupapi` which is
enabled by defining `pg_backupapi_host` in repmgr.conf.
2023-03-13 11:03:02 -03:00
Martín Marqués
02d8e0c808 Merge pull request #799 from EnterpriseDB/validate-repmgr-options
Validate repmgr options
2023-03-08 07:13:45 -03:00
Mario Gonzalez
167d166ae8 Validate options when using pg_backupapi mode.
When `pg_backupapi` is enabled, we should validate if the remote ssh
command, node name and backup_id are also defined.
2023-03-07 16:51:25 -03:00
Mario Gonzalez
03c2ae1bd8 Connecting to default pg-backup-api port: 7480/TCP 2023-03-07 16:50:02 -03:00
Martín Marqués
4021037d38 Merge pull request #797 from EnterpriseDB/connect-to-pg-backup-api
Connect to pg backup api
2023-03-07 14:34:54 -03:00
Ian Barwick
7cd7566409 doc: update README
Remove reference to non-existent "scripts/" directory.
2023-03-07 08:36:00 +09:00
Mario Gonzalez
81c3200ef2 Integration of pgbackupapi 2023-02-27 12:23:57 -03:00
Mario Gonzalez
c6366db6f9 Library to connect to a pg-backup-api server.
Module to provide communication with a server in order to send new
tasks and query them.

pg-backup-api is a REST api that at the moment, is integrated with
barman only, and can provide a communication interface in order to
perform remote recoveries through background tasks. There are cases
where instead of writing a new repmgr module of some feature that
exists in backup tools already, like barman, pg-backup-api can
provide that integration.
2023-02-27 12:23:57 -03:00
Martín Marqués
078a47e182 Merge pull request #791 from EnterpriseDB/adding-flex-and-libs-as-deps
Adding flex and libs as deps
2023-01-27 06:32:59 -03:00
Mario Gonzalez
ed2c0aaf0b Checking for required libraries before compiling.
This commit ensures all the different libraries are installed before
starting to compile.
2023-01-25 15:51:55 -03:00
Martín Marqués
6249027a03 Merge pull request #792 from EnterpriseDB/dev/remove-sonarqube
Dev/remove sonarqube
2023-01-24 17:03:11 -03:00
Martín Marqués
b36fca17d9 Remove Sonarqube
Sonarqube has been failing for some time (seems like a user/credential
problem).

There will be discussion on how we'll resolve this, but that will be
sometime in the future. For now we are just removing the action.

Signed-off-by: Martín Marqués <martin.marques@enterprisedb.com>
2023-01-24 13:00:35 -03:00
Mario Gonzalez
a3b919d599 Checking if flex is installed before compiling.
We are checking for sed amd other external tools inside configure script,
now is turn of flex.

In systems without flex you will have this error:

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement
[...]
postgresql/internal  -Wdate-time -D_FORTIFY_SOURCE=2 -D_GNU_SOURCE -I/usr/include/libxml2   -c -o configfile.o configfile.c
flex  -o'configfile-scan.c' configfile-scan.l
make: flex: No such file or directory
make: *** [Makefile.global:40: configfile-scan.c] Error 127
2023-01-20 18:48:51 -03:00
Dee Dee Rothery
4c95d8d75e Update configuration-file-optional-settings.xml 2022-12-08 06:26:52 -05:00
Martín Marqués
c21d452076 Fix typo in documentation
Fixed a mising closing quote and closing parenthesis in the repmgrd overview
section of the documentation.

Signed-off-by: Martín Marqués <martin.marques@enterprisedb.com>
2022-11-16 13:55:36 -03:00
Ian Barwick
3b89731899 doc: update release notes 2022-10-18 10:26:44 +09:00
Ian Barwick
138bee98e9 doc: update compatibility matrix
Additional "Notes" column added.
2022-10-17 10:05:48 +09:00
Ian Barwick
83ffe84ff5 doc: update compatibility matrix for repmgr 5.3 / PostgreSQL 15 2022-10-17 10:05:26 +09:00
Ian Barwick
bb3206a2bf doc: clarify failover behaviour when node priority is zero
Make it clear the node will not be promoted under any circumstances.
2022-09-20 09:35:19 +09:00
Ian Barwick
49dfaea471 repmgrd: ensure notification script is called for "repmgrd_upstream_disconnect"
GitHUb #760.
2022-09-02 17:20:39 +09:00
Ian Barwick
8edc64f64e doc: update note about pg_rewind corner-case bug
Sadly it's likely there are still installations running PostgreSQL
versions released prior to February 2021, so keep the note there
for now.
2022-07-14 10:11:01 +09:00
Ian Barwick
3ce646f960 doc: mention full_page_writes as prerequisite for pg_rewind
Anyone disabling it needs to examine their life choices, but as
we check for it anyway, might as well explicitly mention it for
completeness.
2022-07-14 09:45:33 +09:00
Ian Barwick
dc0e89e234 doc: consolidate PostgreSQL 9.4 pg_rewind references
Anyone still using PostgreSQL 9.4 should re-examine their life choices
at this point, but as it is not yet de-supported by repmgr, consolidate
the references to it whenever pg_rewind is mentioned to a single
section.
2022-07-14 09:33:12 +09:00
Ian Barwick
41b6194580 Replace some appendPQExpBuffer occurrences with appendPQExpBufferStr 2022-07-14 09:14:56 +09:00
Ian Barwick
d501781a5f repmgr: update README
- update current version
- remove reference to upgrading from 3.x
2022-07-06 09:39:31 +09:00
Ian Barwick
de5265f594 Update Makefile
Re-add $(LIBS), previously removed in 9d7eebef.

While not required for internal builds, per GitHub #755 this may be
required in some environments.
2022-06-03 10:30:18 +09:00
Ian Barwick
66ac4183b4 doc: clarifications for upgrade process 2022-05-26 10:18:05 +09:00
Ian Barwick
59e5bc1500 doc: update compatibility matrix for repmgr 5.4 / PostgreSQL 15 2022-05-25 14:05:04 +09:00
Ian Barwick
8164914598 Add 5.3.2 release date 2022-05-25 14:04:19 +09:00
Ian Barwick
eb867516ff doc: update release notes 2022-05-25 14:03:39 +09:00
Ian Barwick
6b961ab6a7 doc: update release notes 2022-05-25 14:03:36 +09:00
Ian Barwick
7e2d14d225 shared library: remove redundant code
This has never actually served any purpose; see core commit ab02d702ef.
2022-05-17 20:00:12 +09:00
Ian Barwick
a90d1cf3dd Ensure replication slots can be dropped by a replication-only user
If the repmgr user is a non-superuser, and a replication-only user exists,
ensure redundant replication slots are dropped correctly.
2022-05-16 16:37:25 +09:00
Ian Barwick
cce5ca2245 doc: update release notes 2022-05-16 12:05:35 +09:00
Ian Barwick
6f87d2c61e repmgrd: improve walsender disable check
Specifically, don't attempt to disable walsenders if "standby_disconnect_on_failover"
is "true", but the repmgr user is not a superuser.

This restriction can be lifted from PostgreSQL 15.
2022-05-16 11:52:10 +09:00
Ian Barwick
c0763c94c8 dbutils: move get_wal_receiver_pid() to a more consistent location 2022-05-16 10:40:42 +09:00
Ian Barwick
ea689d17b5 repmgr: add code comment about pg_stat_tmp in PostgreSQL 15+
See core commit 6f0cf8787.
2022-05-15 15:19:02 +09:00
Ian Barwick
de343f0cbe shared library: update for PostgreSQL 15
Core commit 4f2400cb3f adds shmem_request_hook, meaning the
memory initialization previously carried out in _PG_init() needs
to be handled in that hook.
2022-05-14 18:31:05 +09:00
Ian Barwick
e6f1b8c279 doc: expand non-superuser permissions explanation
Information about using repmgr with a non-superuser was spread
throughout individual documentation systems; this commit creates
an overview of requirements and potential issues.
2022-05-13 14:30:03 +09:00
Ian Barwick
8e5f469fa1 doc: clarify permissions requirements in "standby switchover" 2022-05-13 14:26:34 +09:00
Ian Barwick
73cb3abd44 doc: clarify permissions requirements in "standby clone" 2022-05-13 14:26:28 +09:00
Ian Barwick
a3c77cac41 standby clone: update source comment 2022-05-13 13:39:05 +09:00
Ian Barwick
706d570790 doc: add missing short form option to "standby clone" 2022-05-13 11:53:49 +09:00
Ian Barwick
433bde82e8 dbutils.c: fix typo in comment 2022-05-13 11:51:58 +09:00
Ian Barwick
8f5319ce75 Disable exclusive backup check in PostgreSQL 15 and later
Exclusive backup functionality was removed in core commit 39969e2a
so we can and must avoid checking for exclusive backups.
2022-05-11 15:25:38 +09:00
Ian Barwick
4c05307da1 repmgr: fix error message
It's possible the missing node record might be for a witness server,
though we have no way of knowing that.
2022-05-11 14:58:36 +09:00
zhouhj43183
743e0e5480 repmgrd: ensure witness node marked active
Previously, if the witness node's PostgreSQL was unreachable, it would
be marked as "inactive" on the primary, and under some circumstances
would not be corrected to "active" once the witness node's PostgreSQL
came back.

PR #754; some modifications by Ian Barwick.
2022-05-11 13:35:48 +09:00
Ian Barwick
35c4ba1fbe doc: update release notes
Add notes about when PostgreSQL and/or repmgrd restart required.

GitHub #752.
2022-05-10 16:22:51 +09:00
Ian Barwick
30082eab02 doc: update release notes 2022-05-10 16:09:17 +09:00
Ian Barwick
d9d60fa420 standby clone: don't error out if unable to determine cluster size
The cluster size check is purely informative, and is not in any way
essential for the standby clone operation. As it's possible the query
may fail if the repmgr user does not have sufficient privileges to
query all databases in the cluster, we can simply ignore any failure.

Note that the code comment indicated the query also served to sanity-
check that queries can actually be executed. While this was the case
historically, the preceding server version check now serves the same
purpose and will not have the same risk of failure due to missing
permissions.
2022-05-10 15:08:02 +09:00
Ian Barwick
2756d6fd94 node check: prevent WARNING in --downstream --nagios output 2022-04-20 17:45:00 +09:00
Ian Barwick
bb63fb08d0 node check: fix --downstream --nagios output
Ensure the performance data output (the bit after the pipe) contains
information in a pareseable format.

GitHub #749.
2022-04-20 17:35:44 +09:00
Ian Barwick
9508673e00 node check: tweak --upstream --nagios output
No data follows the pipe, so omit it and add missing line break.
2022-04-20 16:40:53 +09:00
Ian Barwick
3e3803fa32 doc: fix XML error 2022-04-19 11:26:22 +09:00
Felix Dreissig
8119aef378 Doc: Update or remove references to recovery.conf
The recovery.conf file is not present anymore for Postgres >= 12.

PR 748.
2022-04-19 11:06:08 +09:00
Felix Dreissig
35c8a53790 Sample config: Update link to docs 2022-04-19 10:20:18 +09:00
Doug Calobrisi
1c7fec9a1a Add SonarQube (#751) 2022-04-18 10:04:54 -04:00
James Stroud
3b8294e4cb Update install-packages.xml postgres 14 and not 11
Hi, I think you had a typo, you meant to say Postgres 14 and not 11.  If am wrong just ignore, thanks and take care James Stroud
2022-03-31 16:03:02 +09:00
Ian Barwick
f5e42e1851 doc: use "note" rather than "important" 2022-03-16 10:40:28 +09:00
Ian Barwick
f7b9e11d6f doc: update witness server notes
- clarify that a separate PostgreSQL instance is needed for each
  witness server
- remove historical 3.x reference
2022-03-16 10:26:47 +09:00
Ian Barwick
197c87d13b docs: update package installation instructions
- use "dnf" in place of "yum"
- mention Rocky Linux
- remove 9.6 examples
2022-03-11 10:23:51 +09:00
Ian Barwick
67e2e8e613 docs: update one more instance of the company name 2022-02-15 17:25:11 +09:00
Ian Barwick
c6f137d438 docs: update other instances of the company name, where appropriate 2022-02-15 17:25:07 +09:00
Ian Barwick
6cd41aaaf6 docs: update company name on cover pages 2022-02-15 17:25:03 +09:00
Ian Barwick
b4dca347b4 docs: finalize 5.3.1 release notes 2022-02-15 13:39:33 +09:00
Ian Barwick
2e1d1f9a57 Add include for pwd.h
This was previously included via the PostgreSQL source, but that
seems to have gone away in recent HEAD builds.
2022-02-03 14:20:22 +09:00
Ian Barwick
fa851ea38f doc: update version matrix
5.2.1 is the latest release in the 5.2.x series.
2022-02-03 13:31:04 +09:00
Ian Barwick
593bffb18f doc: repmgr 5.2 is no longer supported. 2022-02-03 13:26:26 +09:00
Ian Barwick
8a36cc3b2d doc: update version matrix 2022-02-03 13:26:20 +09:00
Ian Barwick
2733579a76 doc: update release notes 2022-02-03 13:09:42 +09:00
Ian Barwick
947d14979f Fix upgrade paths from 4.1 ~ 4.3 to 5.2 and later
A number of C functions were added in releases 4.2 to 4.4; however
these were renamed in 5.3 to prevent naming clashes with other
extensions.

This does however mean that when upgrading from one of the above
versions, the intermediate upgrade steps will attempt to create
SQL functions referencing C functions which no longer exist in
repmgr.so, and hence cause the upgrade to fail.

We can work around this by providing empty upgrade scripts
from these versions to 4.4, which skip the problematic CREATE
FUNCTION commands. The functions will be correctly created in
the 5.2--5.3 upgrade script.
2022-02-03 12:59:52 +09:00
Ian Barwick
b8cb1feb49 repmgrd: move connection pointer declaration inside relevant block
As it's used only there and nowhere else.
2022-01-04 12:45:28 +09:00
Ian Barwick
5de74d9e18 doc: update release notes 2022-01-04 12:32:43 +09:00
zhouhj43183
2dce8e14f9 repmgrd: ensure potentially open connections are closed
When recovering from degraded state in local node monitoring, in some
cases a new connection was opened to the local node without closing
the old one, which will result in memory leakage.
2022-01-04 12:22:09 +09:00
Ian Barwick
81f9c0ebd0 doc: update repmgr.conf.sample
Minor formatting fix.
2021-12-08 09:50:37 +09:00
Ian Barwick
9f53d45c74 doc: update repmgr.conf.sample
Remove bogus -W option in "repmgr standby follow" example invocation
for the "follow_command" parameter.

The option (which corresponds to "--no-wait") is not used by
"repmgr standby follow".

Per report from Jimmy Angelakos.
2021-12-08 09:50:19 +09:00
Ian Barwick
aef5a4f8cf doc: fix typo 2021-11-05 09:12:06 +09:00
Ian Barwick
580d112d3d Removed temporary include file workaround 2021-10-29 15:41:53 +09:00
Ian Barwick
6ba27a9c6b Add dummy include file
This is a workaround required to facilitate Debian package builds
against PostgreSQL Extended.
2021-10-12 16:51:51 +09:00
Ian Barwick
51e7025747 doc: update release notes 2021-10-12 10:13:42 +09:00
Ian Barwick
ded20be505 repmgrd: improve node activation at startup
Commit 79d1f00 modified repmgrd to automatically set an inactive node
to "active" at startup.

However we need to avoid doing that for cases where the node role has
changed (e.g. a former primary was recloned as a standby) but the node
record was not updated.
2021-10-11 14:39:04 +09:00
Ian Barwick
e7e62f7f35 repmgrd: add %p event notification parameter for "repmgrd_failover_promote"
This enables an event notification script to identify the former primary
node.
2021-09-28 10:25:27 +09:00
Ian Barwick
3870768d80 Add --repmgrd option to "repmgr node check"
This provides a simple way for checking whether the node's repmgrd is
running.

GitHub #719.
2021-09-28 09:46:31 +09:00
Ian Barwick
16349d95e9 Update Makefile for 5.4dev 2021-09-16 14:43:35 +09:00
Ian Barwick
bba282a131 Add extension script files for 5.4dev. 2021-09-16 14:32:55 +09:00
Ian Barwick
02c29dc860 Add extension script for unpackaged upgrades to 5.3 2021-09-16 13:56:38 +09:00
Ian Barwick
7c5efe2baa doc: update 5.3.0 release notes 2021-09-16 13:42:58 +09:00
Ian Barwick
b2756806d9 Bump master branch to 5.4dev 2021-09-16 10:15:20 +09:00
Ian Barwick
bf0478088c standby: add missing include 2021-08-18 10:26:56 +09:00
Ian Barwick
efd5792de4 Remove redundant shared library function prototypes
From PostgreSQL 9.4 (commit e7128e8d), explicit function prototypes
are not required as they will be generated by the PG_FUNCTION_INFO_V1
macro.

(We were supporting PostgreSQL 9.3 until relatively recently, so there
was nothing particular to gain by removing these earlier).
2021-08-18 10:19:13 +09:00
Ian Barwick
17987a2690 standby switchover: detect if demotion candidate is running as a primary
This shouldn't happen, but if it does, log the fact for easier analysis.
2021-07-28 11:50:57 +09:00
Ian Barwick
5f1ba6db3d standby switchover: improve handling of node rejoin failure
Explicitly check whether the "repmgr node rejoin" command on the
demotion candidate succeeded. Due to the way SSH execution is
currently implemented, we can return either the command execution
status or the command output; to ensure any errors are available,
log them to a temporary file on the demotion candidate and note
its location in case of an error.

While we're at it, improve error message handling when the demotion
candidate fails to rejoin.
2021-07-28 11:42:40 +09:00
Ian Barwick
55efbe60ea standby switchover: optionally delay promotion
This is for testing purposes only and should not be used in production.
2021-07-27 17:01:27 +09:00
Ian Barwick
132f5ebc08 standby switchover: improve logging of repmgrd pause actions
- state how many nodes are to be operated on
- if errors were encountered with any nodes, emit the total number
  of nodes as well as the number of affected nodes
- log nodes where repmgrd was not running anyway as NOTICE, not
  WARNING
2021-07-26 17:46:58 +09:00
Ian Barwick
c901f36f81 Standardize formatting of node ID in log messages
Mostly we have "(ID: %i)", so use that rather than "(ID %i)" which has
crept into a few places.
2021-07-26 17:27:36 +09:00
Ian Barwick
edb49b2747 doc: link to main documentation section about RemoveIPC 2021-07-26 11:01:03 +09:00
Ian Barwick
a35d85ed70 doc: link to service commands section from switchover docs 2021-07-26 09:47:14 +09:00
Ian Barwick
b6b91425d9 doc: document pg_bindir setting
Per suggestion in GitHub #705.
2021-07-20 13:36:08 +09:00
Ian Barwick
32329ca55a doc: link to sample configuration file
Unfortunately it hasn't been possible yet to include all available
configuration items in the main documentation, but we should at least
make it easier to find the full list.
2021-07-20 13:18:57 +09:00
Ian Barwick
79d1f005db repmgrd: activate inactive node record at startup
If a PostgreSQL instance was shut down while repmgrd was running, and
repmgrd was subsequently restarted (this chain of events could occur
during e.g. a server reboot), the node record will have been set to
"inactive". Previously, in this case repmgrd would refuse to start up.
However, as we can determine the node is running, it should normally
be no problem to automatically set the node record to "active".

The old behaviour can be restored by setting the new parameter
"repmgrd_exit_on_inactive_node" to "true".

RM19604.
2021-07-12 17:46:09 +09:00
Ian Barwick
f64f498afb Be more flexible when parsing the output from pg_config --version
The string may not always start with "PostgreSQL" when building
against non-community versions.

Life would be much easier here if there was an option like
"pg_config --version-number" or similar.
2021-07-05 15:38:34 +09:00
Ian Barwick
f10c013e89 doc: note PostgreSQL 14 support
repmgr 5.3 will provide support for PostgreSQL 14.
2021-07-01 16:00:21 +09:00
Ian Barwick
2059a55a99 doc: update release notes 2021-07-01 13:44:33 +09:00
Ian Barwick
99ed17b838 doc: update release notes 2021-07-01 13:28:15 +09:00
Ian Barwick
078b4ad863 standby clone: set "slot_name" in node record if required
If executing "repmgr standby clone --replication-conf-only" on a node
which was set up without replication slots, but the repmgr configuration
was since changed to "use_replication_slots=1", repmgr will attempt to
create the replication slot. This will however fail if "slot_name"
is not set in the node's record, so have repmgr set the slot_name in
this case.

It might be preferable to preemptively create the slot name for each
node when configuring the cluster, however this would be a behavioural
change which would be better off in a major release (for example, it's
conceivable a user runs sanity checks on the node records and expects
to find the slot names empty if replication slots are not in use).
2021-07-01 13:04:23 +09:00
Ian Barwick
2af71c6426 repmgrd: ensure short option "-s" is accepted
The long option --show-pid-file was fine.
2021-06-03 18:41:11 +09:00
Ian Barwick
9349520530 Remove reference to PostgreSQL 9.3 in --help output
It is not supported by repmgr 5.2 and later.
2021-04-15 15:31:59 +09:00
Ian Barwick
14851e61de doc: clarify "connection_check_type='query'" 2021-03-03 13:12:58 +09:00
Ian Barwick
888e1d7a3b docs: update repmgr.conf.sample
Fix description for connection_check_type='connection'.
2021-03-02 11:44:14 +09:00
Ian Barwick
1b4c2a60bb docs: update README
Note latest version number.
2021-03-02 11:26:15 +09:00
Ian Barwick
da163e811c doc: update README
Fix and update broken link.
2021-03-02 11:14:16 +09:00
Ian Barwick
80d1beef7e doc: update GitHub links to new location 2021-03-02 08:58:49 +09:00
Ian Barwick
4f009548f6 doc: remove generated .fo files 2021-03-01 11:06:41 +09:00
Ian Barwick
d266df3143 Change copyright information to "EnterpriseDB Corporation"
RM20485.
2021-03-01 11:03:52 +09:00
Ian Barwick
dd8204e013 Rename various shared library functions
Some of the more generically named functions are at risk of colliding
with functions defined in other libraries. To mitigate that risk,
prefix with "repmgr_", unless the name already has some reference
to repmgr.

This requires an extension version bump.

RM20471.
2021-02-23 10:14:28 +09:00
Ian Barwick
d34b4e71a6 Fix incorrect comment 2021-02-22 13:28:27 +09:00
Ian Barwick
12749c3f63 doc: fix XML markup
An incorrect column count was causing PDF builds to fail.
2021-02-19 20:39:41 +09:00
Ian Barwick
ce59d92731 doc: update repmgr.conf.sample 2021-01-14 15:27:24 +09:00
Ian Barwick
cfbeed50d6 node rejoin: emit rejoin target note information as NOTICE
As it's possible to specify the connection information for any available
node, but currently not possible to rejoin to a node other than the
primary, explicitly mention what the rejoin target will be.
2021-01-06 14:11:37 +09:00
Ian Barwick
da3eaee127 doc: "repmgr node rejoin" clarifications
- make it clearer a node can only be joined to the primary
- update patch status
2021-01-06 12:36:11 +09:00
Ian Barwick
b37a599fc6 Update copyright notices to 2021 2021-01-04 12:54:54 +09:00
Ian Barwick
f011e552d0 Add missing PQconninfoFree() call 2020-12-24 18:07:18 +09:00
Ian Barwick
d1cc05faf9 repmgrd: edit code comment for clarity 2020-12-22 13:58:34 +09:00
Ian Barwick
7ceba84e32 doc: minor grammar tweak 2020-12-22 13:57:31 +09:00
Josh Soref
842c67ca18 doc: various spelling fixes
Via GitHub #687.
2020-12-22 13:47:56 +09:00
Josh Soref
f619c3a8ff Fix various typos in code comments.
Via GitHub #687.
2020-12-22 13:43:06 +09:00
Josh Soref
5a88858596 repmgr: various log ouput typo fixes
Via GitHub #687.
2020-12-22 13:18:11 +09:00
Josh Soref
02bc143c75 repmgr: fix typo in "repmgr node --help" output
Via GitHub #687.
2020-12-22 13:07:16 +09:00
Ian Barwick
c480d01f9c Improve HINT about upgrading the repmgr extension
Per feedback in GitHub #685.
2020-12-15 08:41:46 +09:00
Ian Barwick
e762200a12 doc: update README
Link to recent-ish EDB blog article.
2020-12-08 13:31:21 +09:00
Ian Barwick
2133e1097e standby switchover: remove extraneous space in log message 2020-12-08 13:14:45 +09:00
Ian Barwick
77d7a098a1 doc: add 5.2.1 release date 2020-12-08 12:41:46 +09:00
Ian Barwick
4e9cdf0267 doc: update 5.2.1 release notes 2020-12-04 14:49:20 +09:00
Ian Barwick
0d8bf2a935 Minor string formatting optimization 2020-12-04 10:16:21 +09:00
Ian Barwick
debbda6074 standby clone: tweak error message
Probably a remnant from the 9.1 era, where it was not possible to
take a base backup from a standby.
2020-12-04 10:13:45 +09:00
Ian Barwick
d5b94431f2 standby follow: fix standby.signal generation
Oversight from previous a93c6dfc.
2020-12-02 09:11:20 +09:00
Ian Barwick
93187e9743 Add missing connection close
In a corner-case situation where a standby is unable to attach to
the new primary due to a mismatch in the WAL stream, the connection
used to verify the recovery status of the new primary was not being
closed, leading to a risk of connection exhaustion on the new primary.

Addresses GitHub #682.
2020-12-01 21:33:07 +09:00
Ian Barwick
f7e45863ad standby clone: fix data directory permissions handling for Pg11 and later
Previously, repmgr would forcibly change the permissions on a data
directory to 0700. However from PostgreSQL 11, 0750 is also valid,
so that value should not be changed.
2020-12-01 11:48:22 +09:00
Ian Barwick
89556d6488 standby clone: add --recovery-min-apply-delay to help output 2020-11-30 16:44:48 +09:00
Ian Barwick
4ad868d119 doc: update 5.2.1 release notes 2020-11-30 16:44:48 +09:00
Ian Barwick
a93c6dfca7 Ensure standby.signal is set correctly if -D/--data-directory supplied
When cloning a standby, it's possible to do a "raw" clone by providing
-D/--data-directory but no repmgr.conf file. However the code which
creates "standby.signal" was assuming the presence of a valid
repmgr.conf complete with "data_directory" configuration.

This is very much a niche-use case.
2020-11-27 11:24:15 +09:00
Ian Barwick
4d8bc63834 repmgrd: fix issue with incorrect reconnect_interval
Addresses GitHub #673.
2020-11-25 20:40:28 +09:00
Ian Barwick
7bca9df223 Update Makefile
We don't actually need $(LIBS) in there; this was cargo-culted in
from somewhere.
2020-11-24 17:37:48 +09:00
Ian Barwick
1ac62a4352 Avoid compiler warnings for various strncpy() operations
Here the compiler may complain that the source length is being used,
though in all cases the source length was previously used to
define the length of the destination buffer, so it's not actually
a problem.
2020-11-24 15:42:49 +09:00
Ian Barwick
8f7a32a9a2 repmgr: prevent termination in corner-case situation
If neither the local node nor the upstream are available, and
"standby_disconnect_on_failover" is set, attempting to fetch
the walreceiver PID will result in repmgrd terminating.

Add a check that the connection is valid before attempting to
fetch the walreceiver PID.

Addresses GitHub #675.
2020-11-17 16:34:55 +09:00
Ian Barwick
9c04de11fc standby clone: various clarifications for --replication-conf-only option
In particular, the emitted HINT was not really appropriate for Pg13 and
later.
2020-11-17 09:58:51 +09:00
Ian Barwick
040b1ae4e3 Update corner-case error message
Not possible to build repmgr compatible with Pg12+ against Pg11
and earlier due to the addition of FullTransactionId.
2020-11-17 09:39:41 +09:00
Ian Barwick
703aed3fa3 doc: tweak "repmgr standby clone" reference
As recovery.conf starts to fade away, mention that last.
2020-11-10 16:07:22 +09:00
Ian Barwick
7ee0098771 standby clone: add option --recovery-min-apply-delay
This overrides the equivalent setting in repmgr.conf, if present.

Note this option was available in repmgr versions prior to 4.0, but
was assumed to be redundant. However recently a use-case was made
for its reintroduction.
2020-11-10 15:55:04 +09:00
Ian Barwick
430d12b870 Fix typo 2020-11-10 13:40:38 +09:00
Romain Jacquier
c8b2d23361 Fix help witness
Fix the `repmgr witness --help` command where at the "Unregister" section the message shown was
```
"witness register" unregisters a witness node.
```
instead of
```
"witness unregister" unregisters a witness node.
```

GitHub #676.
2020-11-09 13:34:08 +09:00
Ian Barwick
8543c0bcf6 standby clone: emit pg_basebackup command in --dry-run mode 2020-11-04 12:00:42 +09:00
Ian Barwick
674c06d01c Decouple extension version check from binary version
Until now the extension version has always moved in lock-step
with the binary version, but that doesn't always need to be
the case, so make it possible to have an extension version
which does not match the binary version.
2020-10-30 14:42:58 +09:00
Ian Barwick
970d7a136f Fix return value of pg_reload_conf() database utility function
Would always return "false", but as the value wasn't used anywhere,
the issue was inconsequential.

However while we're at it, actually check the return value in the
two places it's called, to help diagnose any issues in the unlikely
event they occur.

Per issue reported via GitHub PR #671 from user duzhgg.
2020-10-30 14:25:11 +09:00
Ian Barwick
7bde686796 standby clone: handle missing "postgresql.auto.conf"
In PostgreSQL 12 and later we need to append replication configuration
to "postgresql.auto.conf" to guarantee it will be read last, and hence
override any preceding replication configuration which may be haunting
the configuration files.

We've been assuming that "postgresql.auto.conf" will always be present,
but at least one corner case has been observed where that was not the
case on the node being cloned from. Moreover it's perfectly acceptable
that this file does not exist (it will be recreated the next time
ALTER SYSTEM is executed), so we should be prepared to handle that case.

In passing, improve handling of more unlikely errors which might be
encountered when processing "postgresql.auto.conf".
2020-10-30 12:25:03 +09:00
Ian Barwick
ab1447aeca Standardize code style 2020-10-30 11:06:15 +09:00
Ian Barwick
293e37688f config: fix parsing of "replication_type"
This is a legacy parameter which can currently only contain one value,
"physical" (the default).

It can be safely omitted.

Addresses GitHub #672.
2020-10-30 10:14:04 +09:00
Ian Barwick
96718151a6 doc: update README
- remove partial sentence
- remove links to very dated blog entries
2020-10-27 13:58:07 +09:00
Ian Barwick
65ffe51bb4 doc: update README
Link to release notes as a simple way of providing the latest release
information.
2020-10-27 13:52:19 +09:00
Ian Barwick
b6d0288a82 Finalize release date 2020-10-22 21:23:50 +09:00
Ian Barwick
f888407ad8 Additional fix to upgrade script
Drop old "repl_events" table.
2020-10-22 21:22:52 +09:00
Ian Barwick
1512c7b761 Fix extension script for unpackaged upgrades to 5.2
Apparently "ALTER TABLE" (which we were using to convert the
"repl_events" table) does not mark the table as being part of the
extension. Instead, we need to create the new table and copy the
data, as is done with the other tables.
2020-10-22 21:22:47 +09:00
Ian Barwick
8877d4d508 doc: add missing "unpackaged" reference 2020-10-22 21:22:43 +09:00
Ian Barwick
f18b2e900d standby clone: improve Barman source server check
Use "remote_command()" to execute the remote psql command, and
provide the -X option to psql to ensure it doesn't read ~/.psqlrc.
2020-10-22 21:22:04 +09:00
Ian Barwick
5c4aa1856c doc: update README 2020-10-22 21:20:17 +09:00
Ian Barwick
091a2df167 doc: update release notes 2020-10-20 14:09:44 +09:00
Ian Barwick
17a1732eb0 Bump master branch to 5.3dev
Also update the minimum version check to PostgreSQL 9.4.
2020-10-20 13:41:49 +09:00
Ian Barwick
397e0ed5be Silence potential compiler complaint
*We* know the target buffer is sufficiently sized to accept the
source string, but the compiler doesn't.
2020-10-20 09:51:49 +09:00
Ian Barwick
8f3994b071 doc: update "repmgr standby clone" reference
Clarify which replication configuration parameters will be written
for which PostgreSQL version.
2020-10-20 09:35:22 +09:00
Ian Barwick
f7c232b393 Only write "recovery_target_timeline" for PostgreSQL 11 and earlier
"recovery_target_timeline" defaults to "latest" from PostgreSQL 12
(see core commit 2dedf4d9) so no need to write it explicitly.
2020-10-20 09:18:29 +09:00
Ian Barwick
ac2feba380 Improve capture of pg_rewind stderr output
As it seems redirecting stderr to stdin (2>&1) when executing
system commands results in a SIGPIPE (141) return code, making
it impossible to determine the actual return code, redirect
stderr to a temporary file and collate the output from that.

There are possibly better ways of doing this which could
be revisited at a future date.
2020-10-15 14:01:18 +09:00
Ian Barwick
725e9f9851 Remove more unused code 2020-10-15 11:03:18 +09:00
Ian Barwick
00bf4d61fa Remove unused code 2020-10-15 10:54:18 +09:00
Ian Barwick
338c4d3f8a node rejoin: rename variable for clarity 2020-10-14 17:57:20 +09:00
Ian Barwick
250c6291df node rejoin: fix Pg13+ "standby.signal" handling with pg_rewind 2020-10-14 17:51:57 +09:00
Ian Barwick
773159a9e8 standby clone: move check for --waldir pg_basebackup option
When cloning from Barman, and --no-upstream-connection was supplied,
the server version number will not be available at this point in the
code. It will however later be extracted from the Barman metadata,
so move the check for the --waldir pg_basebackup option to after
this point.

Also add an explicit check that a server version number has been
obtained (and fall back to extracting it from the cloned data
directory), as subsequent operations depend on knowing this to
be performed correctly.
2020-10-13 14:27:46 +09:00
Ian Barwick
5f986bc981 node rejoin: handle unclean shutdown in Pg13
From PostgreSQL 13, pg_rewind will automatically handle an unclean
shutdown itself, so as long as --force-rewind was provided, so there
is no need to fail with an error.

Note that pg_rewind handles the unclean shutdown by starting PostgreSQL
in single user mode, which it does before performing any checks as
to whether a rewind is actually necessary.

However pg_rewind doesn't take into account the possible presence
of a standby.signal file, so we remove that and recreate it after
pg_rewind was executed.
2020-10-13 10:18:55 +09:00
Ian Barwick
d62743ddf4 Make repmgr metadata tables dumpable
This makes it easier to extract data for troubleshooting.
2020-10-12 10:02:52 +09:00
Ian Barwick
b195547525 Remove hack to create pre-9.4 monitoring history table
The pg_lsn datatype was introduced in PostgreSQL 9.4, and we no
longer need to support earlier versions.
2020-10-12 09:32:37 +09:00
Ian Barwick
758c18985c node rejoin: better document unrecoverable situation
If two diverged nodes are on the same timeline, currently there's
no way of establishing the divergence point and pg_rewind
is ineffective.

Clarify the log messages to make this clearer.
2020-10-08 21:09:51 +09:00
Ian Barwick
7969dc4800 Enable "node rejoin" to join a target with a lower timeline
This has been possible since PostgreSQL 9.6, but the node rejoin/follow
check did not consider this possibility.
2020-10-08 16:51:16 +09:00
Ian Barwick
0fc8c6c79c Add some break statements to silence compiler warnings 2020-10-08 13:10:00 +09:00
Ian Barwick
e7acb6809b standby clone: improve error logging
When executing "repmgr standby clone" in Barman mode, and --waldir
is set in pg_basebackup options, properly report an error if the
target WAL directory could not be created or is not empty.
2020-10-08 10:46:20 +09:00
Ian Barwick
4b524c52b6 standby clone: honour --waldir setting when cloning from Barman
By setting --waldir in "pg_basebackup_options", standbys cloned using
pg_basebackup would have their WAL directory set to the specified
location and symlinked from the data directory.

This commit causes repmgr to honour that setting even when cloning
from Barman.
2020-10-07 15:13:52 +09:00
Ian Barwick
b3b9281253 Parse pg_basebackup option --waldir/--xlogdir 2020-10-07 15:13:50 +09:00
Ian Barwick
679cfe0852 doc: update release notes
Note PostgreSQL 13 support as a general feature.
2020-10-06 14:17:16 +09:00
Ian Barwick
e10d9fd393 EXPERIMENTAL: synchronise try_primary_reconnect()'s reconnection loop
Per proposal in GitHub #662, this patch attempts to synchronise each
repmgrd's primary reconnection attempts to prevent potential race
conditions. This relies on each node's clock being correcly
synchronised.

Currently this change is experimental and is not enabled by default.
It can be enabled by setting the repmgr.conf parameter
"reconnect_loop_sync".
2020-10-06 13:35:49 +09:00
Ian Barwick
467d19bcd4 Use atoll() to parse system_identifier on 32bit systems
Addresses issue in GitHub #665.
2020-10-06 09:44:31 +09:00
Ian Barwick
be244e2155 Fix typo
s/paremeter/parameter/
2020-10-05 17:06:52 +09:00
Ian Barwick
42283bf344 repmgrd: check local connection after promoting local node
In theory the local connection should not be affected by the node's
promotion. However we're handing over control to an external command
which is usually just "repmgr standby promote", but could potentially
be a user-defined script with unknowable side effects. So it's
better to be safe than sorry.
2020-10-05 16:50:41 +09:00
Ian Barwick
5b254a1be9 repmgrd: add parameter "failover_delay"
This parameter is not documented and intended for use during testing.
It should not be used in production.
2020-10-05 16:43:06 +09:00
Ian Barwick
9aaf7d79a2 standby switchover: in Pg13 and later, promotion overrides paused WAL replay
Preventing a switchover in this case no longer makes sense, so we apply
the checks to PostgreSQL 12 and earlier only.
2020-09-30 15:15:11 +09:00
Ian Barwick
e86c035242 standby promote: in Pg13 and later, promotion overrides paused WAL replay
Aborting in this case no longer makes sense, so we apply the checks
to PostgreSQL 12 and earlier only.
2020-09-30 15:15:07 +09:00
Ian Barwick
73d2088a85 standby follow: don't restart server (PostgreSQL 13 and later)
As of PostgreSQL 13, changes to the fundamental replication
configuration can be applied with a simple SIGHUP, no restart
required.

In case the old behaviour is desired, i.e. a full restart to apply
the configuration changes, the new configuration parameter
"standby_follow_restart" can be set. This parameter has no effect
in PostgreSQL 12 and earlier.
2020-09-29 17:53:51 +09:00
Ian Barwick
48f95f9a39 Fix typo in comment 2020-09-29 15:26:28 +09:00
Ian Barwick
ce229beff8 repmgrd: add configuration option "always_promote"
In certain corner cases, it's possible repmgrd may end up monitoring
a standby which was a former primary, but the node record has not
yet been updated.

Previously repmgrd would abort the promotion with a cryptic message
about being unable to find a node record for node_id -1 (the
default value for an unknown node id).

This commit addes a new configuration option "always_promote", which
determines whether repmgrd should promote the node in this case.
The default is "false", to effectively maintain the existing behaviour.

Logging output has also been improved to make it clearer what has
happened when this situation occurs.
2020-09-29 14:18:00 +09:00
Ian Barwick
16eeae700c repmgrd: minor log message tweak 2020-09-29 10:23:31 +09:00
Ian Barwick
70061c51aa Further improve handling of possible pg_control read errors
Builds on changes in commit 147f454, and ensures appropriate
action is taken if a value cannot be read from pg_control.
2020-09-28 13:59:34 +09:00
houzj.fnst
3ffeffbd8b Remove redundant condition
GitHub #655.
2020-09-28 13:08:18 +09:00
Ian Barwick
147f454d32 Minor sanity check for control file extraction functions
If the control file couldn't be parsed for whatever reason, return
the default value for the requested parameter.

It'd be better to have the caller pass in a pointer to the parameter
and have the function return bool so the caller doesn't assume the
control file was read successfully. This is important for handling
DBState, where no "value unknown" default is available.
2020-09-28 10:47:56 +09:00
Ian Barwick
26b5664741 repmgr: enable "primary unregister --force" to unregister an active primary
The primary must have no registered standby nodes.

Also document usage when unregistering a primary node which is actually
running as a standby.
2020-09-23 15:12:19 +09:00
Ian Barwick
cb86180f4f doc: document include directives 2020-09-18 16:27:52 +09:00
Ian Barwick
1f3e098104 Add option "--dump-config"
This is initially intended for verifying the configuration parsing
mechanism and is currently undocumented.
2020-09-18 15:12:22 +09:00
Ian Barwick
4670515285 config: fix parsing of event_notifications list 2020-09-18 14:36:56 +09:00
Ian Barwick
bccc2673b6 doc: update compatibility matrix 2020-09-18 11:40:47 +09:00
Ian Barwick
158008c5c5 doc: update release notes 2020-09-18 11:29:07 +09:00
Ian Barwick
5f3d1cdeb6 doc: note removal of PostgreSQL 9.3 support 2020-09-17 16:05:16 +09:00
Ian Barwick
82515a9733 doc: document new parameters for "failover_validation_command" 2020-09-17 15:48:18 +09:00
Stanislav Paskalev
73e8373337 Add %v, %u and %t parameters to "failover_validation_command"
These indicate:
 - the number of visible nodes sharing the current upstream
 - the number of nodes on the current upstream
 - the total number of nodes in the entire repmgr cluster.

This allows the failover_validation_command to be used to perform
more thorough validations, including cross-referencing external
cluster management state (e.g. if managed by kubernetes).

GitHub #651.
2020-09-17 15:48:12 +09:00
Ian Barwick
f1bdb09512 doc: note existing pg_rewind corner-case bug 2020-09-15 14:21:14 +09:00
Ian Barwick
028e3ab48d doc: rearrange "repmgr node rejoin" reference for clarity
The <important> section looked like an actual subsection, so convert
that and the following example section into <refsect2> sections.
2020-09-15 13:42:18 +09:00
Ian Barwick
b5b7d635ad doc: fix "release-current" tag 2020-09-04 15:12:33 +09:00
Ian Barwick
146654bf0e Clarify code comment 2020-09-04 15:10:54 +09:00
Ian Barwick
4c3aed2573 rsync: exclude log/pg_log directory depending on PostgreSQL version
This is more for completeness as the data source here is Barman, which
shouldn't contain log files anyway.
2020-09-04 15:03:27 +09:00
Ian Barwick
3945314e65 Remove PostgreSQL 9.3 support
PostgreSQL 9.3 community support ended in November 2018.
2020-09-04 11:37:12 +09:00
Ian Barwick
f4938a4a42 standby clone: tweak --help output wording 2020-09-03 10:41:04 +09:00
Ian Barwick
9a836b3c04 Add option --verify-backup
This causes pg_verifybackup to be executed immediately after
pg_basebackup completes.

PostreSQL 13 and later.
2020-09-03 10:40:32 +09:00
Ian Barwick
c8e52e486f Have make_pg_path() output to a PQexpBuffer
Calling functions are all using one anyway, so there's no point keeping
static buffers around.
2020-09-02 15:30:57 +09:00
Ian Barwick
1f7ac843fd Consolidate role availability checking code 2020-09-01 14:37:33 +09:00
Ian Barwick
8d57d7e001 is_downstream_node_attached(): avoid false negative
If the provided connection does not have sufficient permission to read
"pg_stat_replication.state", and there is an entry for the node in
"pg_stat_replication", assume it's connected. Finer-grained detection
requires additional user permissions, nothing we can do about that.
2020-09-01 14:28:40 +09:00
Ian Barwick
13e7c679cd Minor coding style fixes 2020-09-01 13:35:13 +09:00
Ian Barwick
466590af28 Fix comment 2020-09-01 13:23:46 +09:00
Ian Barwick
a88c80248c repmgrd: minor tweaks to witness node synchronisation
Explicitly roll back if any operation fails, and add debugging output
to track elapsed time between synchronisation intervals.
2020-09-01 09:58:14 +09:00
Ian Barwick
1131e3aad2 PostgreSQL 13: support "wal_keep_size"
Renamed from "wal_keep_segments" in core commit f5dff459.
2020-08-31 17:18:41 +09:00
Ian Barwick
0630d9644e Improve replication connection check
Previously the check verifying that a node has connected to its upstream
merely assumed the presence of a record in pg_stat_replication indicates
a successful replication connection. However the record may contain a
state other than "streaming", typically "startup" (which will occur when
a node has diverged from its upstream and will therefore never
transition to "streaming"), which needs to be taken into account when
considering the state of the replication connection to avoid false
positives.
2020-08-27 16:09:25 +09:00
Ian Barwick
c50a2d049c doc: note use of wildcards in .pgpass file 2020-08-19 10:32:34 +09:00
Ian Barwick
20100f5aaa docs: link to PostgreSQL roadmap 2020-08-06 09:50:23 +09:00
Ian Barwick
2d20c110bf doc: update "repmgr witness register" description
Add missing "Options" section.
2020-08-06 09:50:19 +09:00
Ian Barwick
aed0045c3a docs: reformat additonal config file upgrade notes into a new section
It's easier to link to the information that way.
2020-08-06 09:50:16 +09:00
Martín Marqués
cd81046c26 doc: add two notes on section related to configuration files
Add notes to the documention mentioning that after postgres or repmgr
upgrades (postgres major upgrades), there are some changes that need
to be taken care of.

Signed-off-by: Martín Marqués <martin.marques@2ndquadrant.com>
2020-08-06 09:50:11 +09:00
Ian Barwick
fb32284cdc remove superfluous debugging output 2020-08-06 09:05:55 +09:00
Ian Barwick
893044f1e9 Add missing pfree() calls 2020-07-16 11:08:20 +09:00
Ian Barwick
271d407c7c doc: update sample configuration file
Clarify parameters for recovery_min_apply_delay
2020-07-16 11:07:49 +09:00
Ian Barwick
1d0103ca44 cluster matrix/crosscheck: improve text mode output formatting
Previously these actions were hard-wired to assume node IDs would only
ever have two digits at most.

Refactor to use the same table generation code as other actions, which
properly handles variable column sizes.
2020-07-06 13:52:28 +09:00
Ian Barwick
51d56684a7 node status: clarify "archive_mode" message on standbys
"archive_mode = 'always'" available from PostgreSQL 9.5.
2020-07-06 10:21:00 +09:00
Ian Barwick
4d88f177a7 doc: clarify "node rejoin" usage
Emphasize that conninfo must be provided for a running node.
2020-07-06 09:55:06 +09:00
Ian Barwick
de3f0802b4 Update source comments to clarify data directory modifications 2020-06-19 13:51:00 +09:00
Ian Barwick
2985b9d91f Remove vestiges of deprecated "--node" option
The option was never actually set anywhere.

By removing it, readline will now produce a reasoanably helpful message
in the offchance it is provided, e.g.:

   option '--node=foo' is ambiguous; possibilities: '--node-id' '--node-name'
2020-06-10 13:11:22 +09:00
Ian Barwick
300e11eb76 Rearrange some command line option handling definitions for clarity 2020-06-10 13:02:13 +09:00
Ian Barwick
53546b1c88 node rejoin: remove unneeded PQfinish() 2020-06-10 10:36:55 +09:00
Ian Barwick
547bbb06d8 doc: note downstream node (dis)connection monitoring in more places 2020-06-09 16:21:36 +09:00
Ian Barwick
0d0ffc675c standby clone: add a strategic Assert 2020-06-09 14:31:49 +09:00
Ian Barwick
11dc923a20 standby clone: minor code cleanup 2020-06-09 14:31:44 +09:00
Ian Barwick
e97319f01d Fix typo in comment 2020-06-09 14:31:40 +09:00
Ian Barwick
db1cb1433f Rename the TablespaceDataListCell element "f" to "fptr" for clarity
And add a few more comments to make it clearer what's going on.
2020-06-09 14:31:36 +09:00
Ian Barwick
c1428a3ecd standby clone: fixes for Barman tablespace handling.
repmgr creates a file with a list of tablespace files to fetch from
Barman, however the file may not actually have been flushed to disk
at the point the rsync operation was executed, so may be incomplete
or empty.

Also fix handling of tablespace remapping.

Addresses GitHub #650.
2020-06-09 10:52:10 +09:00
Ian Barwick
fc568a9101 run_file_backup(): fix comments
Explicitly document use-case for this function, and fix a comment
which probably got munged by pg_indent.
2020-06-08 12:45:38 +09:00
Ian Barwick
e65738c989 Explicitly unset search path when connecting to database 2020-05-22 16:11:55 +09:00
Ian Barwick
aaea24b58b repmgrd: log reconnection of paired connection 2020-05-22 14:21:17 +09:00
Ian Barwick
d75a35a788 repmgrd: clarify why node is not configured for automatic failover 2020-05-22 11:21:48 +09:00
Ian Barwick
8233560629 repmgrd: ensure cascaded standby reconnects to primary
If the primary connection went away, and the upstream is not the
primary, attempt to reconnect if the monitoring update fails.

If the upstream is the primary, the reconnection will happen on
the next connection check.
2020-05-22 11:11:58 +09:00
Ian Barwick
cf60844c45 repmgrd: ensure primary connection is reset if same as upstream
Addresses GitHub #633.
2020-05-22 11:11:54 +09:00
Ian Barwick
a0d3fae7ab standby register: ensure location field is compared during record check 2020-05-21 14:35:03 +09:00
Ian Barwick
f05978e2e1 doc: update example log output to match current code 2020-05-15 10:33:47 +09:00
Ian Barwick
b7475792e7 configuration: add maximum nesting depth
As with ff4771ab, this is the same behaviour as core PostgreSQL.
2020-05-14 16:27:35 +09:00
Ian Barwick
5c16e94672 configuration: clean up unneeded code and comments 2020-05-14 16:27:22 +09:00
Ian Barwick
ff4771ab02 configuration: reject direct recursion
Ensure a configuration file can't include itself.

This is the same behaviour as core PostgreSQL.
2020-05-14 15:32:04 +09:00
Ian Barwick
d59cadd5f6 Remove old configuration handling code
This expunges two large and cumbersome sets of if/else statements and
the T_CONFIGURATION_OPTIONS_INITIALIZER macro, all of which needed to
be kept in sync when adding/modifying configuration file parameters.
2020-05-14 11:57:16 +09:00
Ian Barwick
04aee7b406 Set defaults before loading configuration file 2020-05-14 11:57:07 +09:00
Ian Barwick
029164a817 Make configuration default value usage more consistent 2020-05-14 11:57:04 +09:00
Ian Barwick
3dde8f1386 "retire" old configuration handling code 2020-05-14 11:57:00 +09:00
Ian Barwick
94fbf76b2e be quiet, griping compiler! 2020-05-14 11:56:56 +09:00
Ian Barwick
4a1855fabe Place configuration settings struct in separate file 2020-05-14 11:56:45 +09:00
Ian Barwick
d79d4c50b2 handle tablespace mapping 2020-05-14 11:56:42 +09:00
Ian Barwick
2071fa8c7e Initial implementation of an iterable configuration item list
This implements storing the configuration file parameter definitions in
an iterable list. This will replace the existing way of populating the
configuration struct, which is a long and cumbersome if/else structure,
and will make it possible to later dump the imported configuration.
2020-05-14 11:56:38 +09:00
Ian Barwick
9945a3a4a8 Handle "include_dir" 2020-05-14 11:56:35 +09:00
Ian Barwick
0df0db1281 Handle "include_if_exists" 2020-05-14 11:56:30 +09:00
Ian Barwick
682ec9184a Handle "include" 2020-05-14 11:56:26 +09:00
Ian Barwick
fdc6f61257 Pass base configuration file directory to configuration parser
If provided, the parser will use this to process include directives
with unqualified filenames.
2020-05-14 11:56:23 +09:00
Ian Barwick
f5018e42f3 Initial refactoring of configuration file parsing
Have the configuration file parsing routine itself open the respective
configuration file, rather than passing a file pointer from the original
caller. This is required for handling include directives, which we'll
want to do for sanity-checking the PostgreSQL configuration on a freshly
cloned, unstarted standby.
2020-05-14 11:56:19 +09:00
Ian Barwick
26689871dc Update filename in comment 2020-05-14 10:50:36 +09:00
Ian Barwick
a863dc7f6c repmgrd: additional check for the upstream connection
It's possible the upstream server was intermittently unavailable in
the interval between checks, invalidating the upstream connection.
With check types "ping" and "connection", the connection would not be
restored, so if the availability check was successful, additionally
verify the upstream connection and restore if necessary.

Addresses GitHub #633.
2020-05-14 10:26:57 +09:00
Ian Barwick
9b6fe6858a doc: update repmgr.conf.sample
Was missing "query" option for "connection_check_type".
2020-05-12 17:05:22 +09:00
Ian Barwick
2f667116d8 repmgrd: include node name in log output
Missed in commit fd52df0.
2020-05-12 15:31:47 +09:00
Ian Barwick
8ee4fac5bb repmgrd: minor refactoring of try_primary_reconnect() 2020-05-12 14:52:14 +09:00
Ian Barwick
bb56387aaa repmgrd: consolidate connection closing code
PQfinish() should only be called on local PGconn pointers which
will not be reused.
2020-05-12 14:48:39 +09:00
Ian Barwick
5d00094936 repmgrd: ensure "close_connection()" always called after connection failure 2020-05-12 14:41:33 +09:00
Ian Barwick
ebdfdc530d repmgrd: ensure PQfinish() always executed on failed connections in NodeInfoLists
clear_node_info_list() will clean up any remaining active connections,
but we need to ensure all failed connections are cleaned up at the point
of failure to prevent leaks.

Per report in GitHub #643.
2020-05-12 14:22:08 +09:00
Ian Barwick
e5d3285d02 repmgrd: remove redundant log message 2020-05-11 16:59:32 +09:00
Ian Barwick
fd52df0fab repmgrd: include node name in log output in more places
Still a few places where only the node ID was reported, but it's always
useful to have the node name as well.
2020-05-11 16:55:31 +09:00
Ian Barwick
1b5ad743b5 standby clone: explicitly set closed connection pointers to NULL
We omitted to do this with the connections used when checking the system
identifier, which means libpq calls by the teardown function using the
pointer risk using unallocated memory.

Addresses issue reported in GitHub #644.
2020-05-11 13:52:10 +09:00
Ian Barwick
389c0ab9c0 Clarify use of function parameter 2020-05-11 13:37:42 +09:00
Ian Barwick
72dfe28e81 doc: clarify usage of "-f /etc/repmgr.conf" in examples 2020-05-08 10:22:52 +09:00
Ian Barwick
bc566f7a42 standby check: ignore upstream/downstream connections if node is witness
Per report in GitHub #641.
2020-05-08 09:37:30 +09:00
Ian Barwick
d1ab6ce28b standby clone: emit warning, not error if server is 9.3 and tablespace_mapping provided 2020-05-07 10:23:07 +09:00
Ian Barwick
bcc284cac9 Refactor configuration file reload handling
Rather than parse the configuration file into a new structure and
copy changed values from that into the main structure, we'll copy
the existing structure before parsing the changed configuration
file directly into the nmain structure, and revert using the copy
if any issues are encountered.

This is necessary as preparation for further reworking of the
configuration file structure handling. It also makes the reload
idempotent.

While we're at it, make some general improvements to the reload
handling, particularly:

 - improve logging to show "before" and "after" values
 - collate change notifications and only display if no errors
   were found
 - remove unnecessary double-logging of errors
 - various bugfixes
2020-05-05 15:29:07 +09:00
Ian Barwick
d37513312a Move the main configfile structure into configfile.c
This is required for a later refactoring of the configuration file
handling.
2020-05-05 14:43:55 +09:00
Ian Barwick
3ca642fee1 repmgrd: log receipt of SIGHUP at log level NOTICE
PostgreSQL itself logs it at log level LOG, which we don't have,
but NOTICE seems reasonable, especially as we log SIGTERM as that.
2020-05-05 13:41:23 +09:00
Ian Barwick
be8e5b45fa Add utility function validate_conninfo_string() 2020-05-05 13:41:18 +09:00
Ian Barwick
5ee4540640 Fix typo in comment 2020-05-01 12:13:06 +09:00
Ian Barwick
6507a374f7 doc: update legacy upgrade procedure
From PostgreSQL 13, the "CREATE EXTENSION ... FROM" syntax is no longer
available. It's unlikely at this point that someone will find themselves
with a PostgreSQL 13 database and the legacy repmgr schema, but we'll
cover all bases just in case.
2020-05-01 10:27:33 +09:00
Ian Barwick
c884f21a58 Tidy up log message 2020-04-28 14:23:39 +09:00
Ian Barwick
11e1b7a2c5 Check function return values in modify_auto_conf() 2020-04-27 14:26:04 +09:00
Ian Barwick
d0c5dffe91 standby clone: explicitly log that replication slots not in use
Helps with diagnosing output.
2020-04-27 13:57:18 +09:00
Ian Barwick
38b3447bd3 Add repmgr home page to --help output
Per PostgreSQL commit 1933ae629e7b706c6c23673a381e778819db307d it seems
to be all the rage these days.
2020-04-24 09:41:56 +09:00
Ian Barwick
3200c8c4e4 doc: clarify usage of the "passfile" parameter. 2020-04-23 15:03:40 +09:00
Ian Barwick
971309c830 Fix parsing of database connection check results in "standby clone" 2020-04-23 13:19:20 +09:00
Ian Barwick
1628bfb846 Update references to "recovery.conf" in _do_create_replication_conf() 2020-04-23 11:42:13 +09:00
Ian Barwick
8adcb1348d repmgrd: improve logging of promote_command failure
- log failure *before* we check if the primary has reappeared
- log the error code
2020-04-21 15:02:15 +09:00
Ian Barwick
025e66ea46 standby switchover: check superuser connection on demotion candidate
Add a sanity check that rempgr, when remotely executed on the demotion
candidate, is able to connect as superuser. If not, emit a diagnostic
command as a hint.
2020-04-21 11:25:01 +09:00
Ian Barwick
4e48301d78 standby switchover: note database name for superuser connections
It's useful to have a confirmation of which database repmgr is trying
to connect to when the -S/--superuser connection is provided.

It will always be the database defined in the repmgr.conf "conninfo"
parameter, but having the name available is useful when e.g.
troubleshooting issues with .pgpass configuration.
2020-04-20 16:49:47 +09:00
Ian Barwick
f8b214f721 doc: have Makefile clean up generated html files 2020-04-20 15:40:13 +09:00
Ian Barwick
96acd3f915 doc: clarify .pgpass usage with -S/--superuser option 2020-04-20 15:34:29 +09:00
Ian Barwick
c1584d587c doc: remove DEBUG output from example 2020-04-20 12:14:47 +09:00
Ian Barwick
2f26a02b5c doc: clarify usage of -F/--force with "standby promote"
Per GitHub #632.
2020-04-20 12:11:49 +09:00
Ian Barwick
cb2fb53556 Fix debug logging
Per GitHub #630.
2020-04-20 11:07:51 +09:00
Ian Barwick
e1953742a1 doc: note additional diagnostic options provided by "node check" 2020-04-17 11:53:56 +09:00
Ian Barwick
97d83bd443 standby switchover: add hint for diagnosing remote DB connection failure
Output a command, which when excuted on the local node (promotion
candidate) will attempt to remotely connect to the demotion candidate
and display both the connection message encountered and the connection
parameters used.

This is useful for corner-cases where the connection normally succeeds if a
particular environment variable (e.g. PGPORT) is normally set, but is
not set in the environment where SSH is executed.
2020-04-17 11:20:02 +09:00
Ian Barwick
45e96f21a5 node check: add option --db-connection
This is intended for diagnostic purposes, primarily when diagnosing
the connection parameters used when repmgr is being executed on a
remote node.
2020-04-15 17:48:23 +09:00
Ian Barwick
9df1731fb8 doc: update release notes 2020-04-15 14:50:51 +09:00
Ian Barwick
cfd35852b7 standby switchover: improve archive check error handling
Explicitly log if a database connection failure caused the check
to fail.

It's unlikely this situation will be encountered, as the data directory
check will already have run and checked for connection failure, however
there's a small chance the connection could fail between checks.
2020-04-15 14:08:33 +09:00
Ian Barwick
32dde4eaaf standby switchover: improve directory check failure handling
It's possible that the remote data directory check will fail if e.g.
connection configuration is not consistent across all nodes. This
modification ensures a database error connection is reported, rather
than a spurios issue with the data directory configuration.
2020-04-15 14:08:29 +09:00
Ian Barwick
78f89a4d47 node check: report connection error if --optformat provided
The --optformat option is intended for use when repmgr is being invoked
remotely by another repmgr instance, typically during a switchover
operation.

Previously no output was returned if the local repmgr was unable to
connect to its local PostgreSQL instance, which made diagnosing
various corner-case problems trickier than it should be.
2020-04-15 14:08:25 +09:00
Ian Barwick
410dd40526 standby switchover: standardize log message 2020-04-15 10:24:44 +09:00
Ian Barwick
3d85d8f5ff doc: update link to Debian package archive
See also https://www.df7cb.de/blog/2020/apt-archive.postgresql.org.html
2020-04-14 12:32:59 +09:00
Ian Barwick
d20711c267 Update Makefile for 5.2dev 2020-04-13 17:49:58 +09:00
Ian Barwick
e4a7da0132 Add upgrade route for repmgr 3.x to repmgr 5.1
The removal of some extensions functions means it's not possible to
follow the conventional incremental upgrade path; instead we'll
create a script for direct upgrades to 5.1.
2020-04-13 17:45:19 +09:00
Ian Barwick
4f0e8a503a doc: fix typo in 5.1.0 release section ID 2020-04-13 17:17:06 +09:00
Ian Barwick
c39289d570 doc: finalize release notes 2020-04-13 17:17:01 +09:00
Ian Barwick
662b603770 doc: update release notes 2020-04-13 17:16:55 +09:00
Ian Barwick
fa4f37ddb9 Bump master branch to 5.2dev 2020-04-08 14:22:51 +09:00
Ian Barwick
599bab590a Create temporary pg.auto.conf file with the same permissions as the original
Commit 0574279 set the file permissions to 0600 rather than the user's
umask, but if initdb was executed with -g/--allow-group-access, the
file is maintained with 0640, so we'll just maintain the existing
permssions.
2020-04-07 13:29:59 +09:00
Ian Barwick
cd80f265ac standby clone: warn about missing pg_rewind prerequisites
These are not essential for cloning a standby, but useful to warn
as early as possible in case the user is intending to use pg_rewind.
2020-04-06 15:37:37 +09:00
Ian Barwick
09f0be8ceb Minor log output fixes 2020-04-06 13:19:58 +09:00
Ian Barwick
59159dede7 Ensure next release is 5.1, not 5.1.0 2020-04-03 14:57:17 +09:00
Ian Barwick
780453e168 repmgrd: clarify log messages
Display the identity of the node question in the meassges fixed
in commit 8a27c89; this makes it easier to diagnose log output.
2020-04-03 13:02:49 +09:00
Tom Janson
8a27c89d18 repmgrd: fix inverted log message
Warning is emitted when the node in question is in recovery.
2020-04-03 12:39:36 +09:00
Ian Barwick
0a2091d5d3 repmgrd: handle new primary notification during failover check
It's possible a repmgrd instance might still be in the primary check
phase while a primary has already been promoted. Therefore it's
necessary to check for new primary notifications here, so we can
follow a new primary as quickly as possible.
2020-04-02 15:45:14 +09:00
Ian Barwick
447054a630 standby promote: in --dry-run mode, display promote command which will be used
For PostgreSQL 12 and later, explicitly note whether repmgr user has
execution permissions on the pg_promote() function.
2020-04-02 12:37:32 +09:00
Ian Barwick
d998cab3d0 doc: add note about granting permissions on pg_promote() 2020-04-02 11:32:02 +09:00
Ian Barwick
7f460c88bf doc: update "node check" examples
Also add example output for a standby.
2020-03-31 11:09:00 +09:00
Ian Barwick
e59da2d74e node check: display upstream info before downstream info
It makes more sense that way.
2020-03-31 11:08:28 +09:00
Ian Barwick
bffb8fa11b node check: handle --upstream option when node is primary 2020-03-31 10:49:30 +09:00
Ian Barwick
d9cb38c7f0 node check: add --upstream option
We have a --downstream option to check for attached nodes, but it
would be useful to have a corresponding --upstream option too.

A following patch will adapt the behaviour of this option when executed
on the primary node.
2020-03-30 17:54:52 +09:00
Ian Barwick
f3258c5002 cluster cleanup: explicitly log vacuum operation 2020-03-26 11:38:51 +09:00
Ian Barwick
5d92c99bb9 standby switchover: warn if no superuser connection available 2020-03-26 11:18:04 +09:00
Ian Barwick
6895916914 node service: explicitly note the node identity where CHECKPOINT issued
This output is logged during "standby switchover", so it's useful to be
aware of which node the activity is being performed on.
2020-03-25 14:00:55 +09:00
Ian Barwick
e64349e4da standby switchover: accept -S/--superuser option 2020-03-25 14:00:51 +09:00
Ian Barwick
2e9bc31c8c Consolidate code for establishing a superuser connection 2020-03-25 11:02:31 +09:00
Ian Barwick
325e3ea541 Update "repmgr node" help output 2020-03-25 09:37:49 +09:00
Ian Barwick
2b06f2d1ae node service: enable provision of the -S/--superuser option
This is required to be able to execute a CHECKPOINT if the normal
repmgr user is not a superuser.
2020-03-24 17:25:34 +09:00
Ian Barwick
304c1391cc node action: don't attempt to issue a CHECKPOINT as non-superuser
Raise a warning instead.
2020-03-24 17:25:27 +09:00
Ian Barwick
b09631b3bc doc: clarify pg_promote() usage 2020-03-24 15:40:26 +09:00
Ian Barwick
e561ddc8d3 node check: accept -S/--superuser option
This is mainly useful for the --data-directory-config option, which
requires permission to read pg_settings to verify that the data
directory configured in "repmgr.conf" matches the data directory
actually in use.

If pg_settings read permission is not available, repmgr will fall
back to a simple check that the data directory configured in
"repmgr.conf" is a valid PostgreSQL directory. This is not entirely
foolproof, as it's possible PostgreSQL could be using a different
data directory.
2020-03-23 17:14:04 +09:00
Ian Barwick
06f0e5e94f Minor error message output tweak 2020-03-23 16:30:00 +09:00
Ian Barwick
12adb5e0d1 Add warning if --superuser option provided when it won't be used
Currently the only place this option is relevant is "standby clone".
2020-03-23 15:28:22 +09:00
Ian Barwick
b691a1bd10 doc: minor fixes to pg_upgrade notes 2020-03-20 17:24:41 +09:00
Ian Barwick
dd35c22033 doc: add note about pg_upgrade and standbys 2020-03-20 17:22:15 +09:00
Ian Barwick
ddde31b14e doc: update section about passwords when cloning standbys
Reference new password management section and remove duplicate
info.
2020-03-18 17:24:39 +09:00
Ian Barwick
09a78111f6 doc: add note about alternative passfile locations 2020-03-18 17:08:54 +09:00
Ian Barwick
76cea52755 doc: add section about password management
This is briefly covered in the section about cloning, but is hard to
find.
2020-03-18 17:08:49 +09:00
Ian Barwick
57ba3ef19a doc: clarify database user permission requirements 2020-03-10 14:26:27 +09:00
Ian Barwick
0ddc9b8bbf doc: move database permission configuration to separate file 2020-03-10 14:26:23 +09:00
Ian Barwick
5722c0a582 doc: fix typo 2020-03-06 17:41:19 +09:00
Ian Barwick
eb346ac6ae Simplify configuration parsing code
No need to check for empty parameters; this is something left over
from the original configuration file parsing. Since converting
to use the PostgreSQL-based configuration file parsing, it is
redundant.
2020-03-06 16:27:37 +09:00
Ian Barwick
0bc0a28378 standby promote: enable "service_promote_command" in PostgreSQL 12
This enables the promote command generated internally by repmgr to
be overridden if desired, in the same way as for PostgreSQL 11 and
earlier.
2020-03-06 13:21:30 +09:00
Ian Barwick
fb5ce720f3 standby promote: fall back to "pg_ctl promote" if necessary
From PostgreSQL 12, the SQL-level function "pg_promote()" can be used
to promote a PostgreSQL instance, however usage is restricted to
superusers and users to whom explicit execution permission for this
function has been granted.

Therefore, if execution permission is not available, fall back to
"pg_ctl promote".
2020-03-06 12:53:37 +09:00
Ian Barwick
7c96afc6fb Move user information database functions into their own section
Code reorganisation for easier location of functions.
2020-03-06 12:53:34 +09:00
Ian Barwick
6559258b53 doc: update release notes 2020-03-05 17:24:24 +09:00
Ian Barwick
9de31428f1 Consolidate replication connection code
In a few places, replication connections are generated from the
parameters used by existing connections. This has resulted in a
number of similar blocks of code which do more-or-less the same
thing almost but not quite identically. In two cases, the code
omitted to set "dbname=replication", which can cause problems
in some contexts.

These code blocks have now been consolidated into standardized
functions.

This also resolves the issue addressed by GitHub #619.
2020-03-05 17:21:37 +09:00
Ian Barwick
10304a1a3b doc: update release notes 2020-03-04 17:21:31 +09:00
Ian Barwick
63aac64938 standby switchover: fetch remote repmgr version number 2020-03-04 17:21:27 +09:00
Ian Barwick
8f6058c676 standby switchover: check replication configuration file ownership
Within a PostgreSQL data directory, all files should have the same
ownership as the data directory itself. PostgreSQL itself expects
this, and ownership of files by another user is likely to cause
problems.

In PostgreSQL 11 or earlier, if "recovery.conf" cannot be moved
by PostgreSQL (because e.g. it is owned by root), it will not be
possible to promote the standby to primary.

In PostgreSQL 12 and later, if "postgresql.auto.conf" on the demotion
candidate (current primary) has incorrect ownership (e.g. owned by
root), repmgr will very likely not be able to modify this file and
write the replication configuration required for the node to rejoin
the cluster as a standby.

Checks added to catch both cases before a switchover is executed.
2020-03-04 17:21:22 +09:00
Ian Barwick
194b6d0948 Minor code simplification 2020-03-03 15:27:45 +09:00
Ian Barwick
c6dfe53f03 cluster show: update code comment listing supported options 2020-03-02 16:03:48 +09:00
Ian Barwick
e218422eca standby clone: fix references to "recovery.conf" for Pg 12 and later
"standby clone --recovery-conf-only" still mentioned "recovery.conf" in a
couple of places; change that to the more generic "replication configuration"
for Pg 12 and later.
2020-02-27 12:20:09 +09:00
Ian Barwick
e02e3dae29 doc: update "repmgr cluster show" output example 2020-02-26 16:02:08 +09:00
Ian Barwick
6ef722956b cluster show: show unreachable node's upstream name as uncertain 2020-02-25 16:50:45 +09:00
Ian Barwick
cebb1249aa Convert printed "WARNING" messages to logging output
In a couple of places ("cluster show" and "service status") we printed
connection status errors as a "WARNING" to stdout, followed by a log
HINT, but the latter was ineffective unless the configured log
level happened to match the level of the most recently emitted log
line (which will most likely be DEBUG).

Convert the printed WARNING lines to an actual log WARNING, to make the
behaviour of this output behave as expected.
2020-02-25 15:25:59 +09:00
Ian Barwick
51a7c31833 doc: update "repmgr cluster show" examples 2020-02-25 12:22:19 +09:00
Ian Barwick
76af2d9e08 cluster show: don't display witness node timeline ID
The witness node is not part of the replication cluster, so its timeline
ID is not of any relevance.
2020-02-25 10:33:54 +09:00
Ian Barwick
3b03edebb6 cluster show: correct timeline column length calculation
Unlikely to have made a difference unless abnormally long priority
or timeline values exist.
2020-02-25 09:55:41 +09:00
Ian Barwick
eaee7145f6 repmgrd: improve logging
Note node name and type when logging primary node visibility.
2020-02-24 15:33:03 +09:00
Ian Barwick
2bb89d252b doc: a witness server is also relevant for primary visibility consensus checks
Clarify documentation and example to make it clear that a witness
server, if present, is also used for primary visibility consensus
checks.
2020-02-24 15:25:15 +09:00
Ian Barwick
e782f2d949 repmgrd: improve logging
For easier log analysis, state which node is the current primary.
2020-02-20 13:06:41 +09:00
Ian Barwick
5d058dc371 doc: link to latest blog article from README 2020-02-19 09:30:57 +09:00
Ian Barwick
089b3ecb8b doc: note PostgreSQL 9.4 EOL 2020-02-14 11:17:06 +09:00
Ian Barwick
b4af80fdec Add optional check for unsupported future PostgreSQL releases
This is for backbranches to prevent them running against newer
PostgreSQL versions with which they are not compatible, for example
4.4.x with PostgreSQL 12 and later.
2020-02-14 10:43:19 +09:00
laixiong
cb7bbda021 standby switchover: check remote's registered repmgr.conf
Check the demotion candidate's registered repmgr.conf file can be found.

If the configuration file has been deleted or moved, previously the
resulting error message would have been a confusing reference to
an incorrectly configured data directory; by explicitly checking for the
expected configuration file, we can make troubleshooting easier.

Original patch by laixiong <yin.zhb@gmail.com> (GitHub #615), modified
by Ian Barwick.
2020-02-05 15:56:57 +09:00
Ian Barwick
9cf4616af1 standby switchover: mark successful if standby attaches later
Handle corner case where standby (demotion candidate) doesn't
attach during the main check cycle, but does at the final check,
where we'll want to mark the operation as successful.
2020-02-05 14:29:30 +09:00
Ian Barwick
6f01c54620 repmgr: improve "standby switchover" completion checks
There were some corner cases where "repmgr standby switchover"
would erroneously report a successful switchover, even if the
demotion candidate had not reattached to the promotion candidate.

Also improve the logging in various places to make it clearer
what is happening on which node.
2020-02-04 15:38:10 +09:00
Ian Barwick
7ed0a99d70 Make code to check standby join status available globally
This makes it possible to check the standby join status from another
node, e.g. the promotion candidate during a switchover operation.
2020-02-04 12:52:55 +09:00
Ian Barwick
e2a362a171 "node rejoin": check for available replication slots on the rejoin target
"standby follow" did this already, but "node rejoin" didn't.
2020-02-03 17:03:20 +09:00
Ian Barwick
cd7f36a6fd Add general check function "check_replication_slots_available()"
Make the code previously only used by "standby follow" generally
available - we'll want to use this from "node rejoin" as well.

While we're at it, when reporting failure due to lack of free
replication slots, report the current value of "max_replication_slots".
2020-02-03 16:43:55 +09:00
Ian Barwick
ab9c84c655 Report error code on follow/rejoin failure due to non-available slot
Previously, if "repmgr standby follow" or "repmgr node rejoin" failed
due to a replication slot not being available, no error code was
returned.
2020-02-03 15:03:31 +09:00
Ian Barwick
0141bc2be7 standby switchover: display "shutdown_check_timeout" value in --dry-run mode
It's useful to be aware of this setting.
2020-01-30 10:30:18 +09:00
Ian Barwick
4d6cff6c42 doc: link to latest blog article 2020-01-30 10:24:12 +09:00
Ian Barwick
84b824d86a Add missing values to action_name() 2020-01-29 15:32:40 +09:00
Ian Barwick
a7689ecd78 standby switchover: fix repmgr execution confirmation in --dry-run mode
Inexplicably, "localhost" was hard-coded, rather than the remote host
name.
2020-01-29 14:04:37 +09:00
Ian Barwick
ef30892250 standby switchover: improve wording of pending archive file messages 2020-01-28 13:40:59 +09:00
Ian Barwick
3ae6691d34 doc: minor clarifications to Debian package info 2020-01-24 10:36:46 +09:00
Ian Barwick
bd8eb82fb9 doc: update repository links to https 2020-01-24 10:29:40 +09:00
Ian Barwick
4d4ed3bcd6 Remove BDR 2.x support
The BDR 2.x support was conceptual only and was never used in
production. As BDR 2.x will be EOL'd shortly, there is no risk it will
be needed.
2020-01-16 09:52:42 +09:00
Ian Barwick
7fdf2f1778 Update copyright notices to 2020 2020-01-13 14:06:20 +09:00
Ian Barwick
46222cc0ae doc: note availability of RHEL 8 packages 2020-01-07 09:44:47 +09:00
Renaud Fortier
afa88f0514 Update repmgr.conf.sample
Add empty single quotes to promote_command and follow_command
2019-12-16 12:27:59 +09:00
Ian Barwick
647c0c879e repmgrd: fix configuration file reload handling
Usually repmgrd requires the parameters "promote_command" and
"follow_command" to be present in the configuration file. These are
not required if "failover=manual", but the configuration sanity check
following receipt of SIGHUP was not checking that.

Addresses issue reported in GitHub #614.
2019-12-16 11:40:05 +09:00
Ian Barwick
3f5d2f6ee9 standby follow: don't attempt to delete slot if new upstream is same as current
An attempt will be made to delete an existing replication slot on the
old upstream node (this is important during e.g. a switchover operation
or when attaching a cascaded standby to a new upstream). However if the
standby is currently attached to the follow target node anyway, the
replication slot should never be deleted.
2019-12-10 15:56:00 +09:00
Ian Barwick
ab6e5ceab3 doc: add reference to "ssh_options"
This is listed in "repmgr.conf.sample" but not the main documentation.
2019-11-25 10:17:31 +09:00
Ian Barwick
eb1d5c4e93 doc: link PgBouncer fencing document from main docs 2019-11-21 09:53:53 +09:00
Ian Barwick
b3c09c48bf doc: document "tablespace_mapping" parameter.
This was previously only mentioned in "repmgr.conf.sample".
2019-11-20 16:52:35 +09:00
Ian Barwick
e2ffeac67d doc: add missing single quotes in repmgr.conf.sample 2019-11-20 15:13:12 +09:00
Ian Barwick
4ed72eb901 Minor formatting fix 2019-11-20 15:13:01 +09:00
Ian Barwick
f158e35c13 Make variable local to code block 2019-11-20 10:13:55 +09:00
Ian Barwick
21475b9c70 doc: update pgbouncer fencing example
Add "--log-to-file" to "repmgr standby promote", per recommended practice.
2019-11-19 21:57:17 +09:00
Ian Barwick
95ee576052 doc: fix minor punctuation typo 2019-11-11 15:52:15 +09:00
Ian Barwick
ac753c2ba1 doc: ensure various repmgr.conf values are quoted appropriately 2019-11-08 11:54:31 +09:00
Ian Barwick
ce85ba6df5 doc: update repmgr.conf sample
Convert recovery.conf references to generic configuration descriptions,
and fix spacing.
2019-11-08 11:54:27 +09:00
Ian Barwick
c3aba173ea doc: clarify Barman configuation
Per confusion noted in GitHub #602.
2019-11-07 14:26:57 +09:00
Ian Barwick
93acdcfda2 doc: add single quotes to "barman_config" example.
Mandatory from repmgr 5.x.
2019-11-07 14:23:04 +09:00
Ian Barwick
25fb24eee4 Minor cleanup in repmgr-client.c 2019-10-30 16:58:30 +09:00
Ian Barwick
220ec7fc96 Minimize user permissions requirements for replication slots
Enable operations which create or drop replication slots to be carried
out with the minimum necessary user permissions, i.e. a user with the
REPLICATION attribute.

This can be the repmgr user, or a dedicated replication user.
In the latter case, if the dedicated replication user is only
permitted to make replication connections, the streaming
replication protocol is used to create/drop slots.

Implements part of GitHub #536.
2019-10-30 15:51:15 +09:00
Ian Barwick
1a9bcddccd standby clone: fix typo in log message 2019-10-28 14:08:48 +09:00
Ian Barwick
f0693271d3 Clean up replication slot creation/deletion functions 2019-10-25 14:31:11 +09:00
Ian Barwick
c23162e787 doc: note superuser requirement for "repmgr primary register" 2019-10-25 12:43:39 +09:00
Ian Barwick
b8f323af5a primary register: improve debug log output 2019-10-25 12:43:39 +09:00
Ian Barwick
63217e436a doc: note permission requirements for "repmgr standby (promote|switchover)
Per issues noted in GitHub #595.
2019-10-25 12:43:39 +09:00
Ian Barwick
45b9002e5b Modify function "is_replication_role()" to reference "pg_roles"
"pg_authid" is restricted to superusers.
2019-10-24 16:57:50 +09:00
Ian Barwick
52f9cd3bae Rename "_do_create_recovery_conf()" to "_do_create_replication_conf()"
As of PostgreSQL 12, the functionality is no longer specific to the
recovery.conf file.
2019-10-24 15:12:39 +09:00
Ian Barwick
cc540a54e5 Add functions for slot creation via the streaming replication protocol 2019-10-24 10:05:32 +09:00
Ian Barwick
9083f26990 Clarify usage of log_db_error() function 2019-10-24 10:04:15 +09:00
Ian Barwick
5405ae7100 doc: note which repmgr versions are supported in the compatibility matrix 2019-10-24 09:46:28 +09:00
Ian Barwick
aa2674e284 Fix typo in placeholder date 2019-10-24 09:45:12 +09:00
Ian Barwick
b007d5ed4b doc: update README
Link to compatibility matrix, support section.
2019-10-24 09:42:23 +09:00
Ian Barwick
63ddc2d39e Let _establish_db_connection() work with replication connections
Previously this was limited to establish_db_connection_by_params().
2019-10-23 14:45:44 +09:00
Ian Barwick
9976e646cd doc: update README 2019-10-23 13:45:34 +09:00
Ian Barwick
dc11330d58 Rename replication slot create/drop functions
Append "_sql" to the respective function names, as we'll later be
creating equivalent functions which use the replication protocol
so need a way to distinguish between them.
2019-10-23 13:43:09 +09:00
Ian Barwick
be494f0d5f standby clone: minimize requirement to check upstream data directory location
repmgr has always insisted on determining the upstream's data directory
location, which requires superuser permissions (or from PostgreSQL 10,
membership of the default role "pg_read_all_settings").

Knowledge of the data directory location was required to implement rsync
cloning (now deprecated), but with pg_basebackup the minimum permission
requirement is now only a normal user with access to the repmgr metadata
and a user with replication permissions. The ability to determine the
data directory location is only required if the user specifies the
--copy-external-config-files option, which needs to be able to determine
the data directory to work out which configuration files are located
outside it.

This patch makes it possible to clone a standby with minimum
permissions, with appropriate checks for available permissions if
--copy-external-config-files is provided.

Implements part of GitHub #536 and addresses issue raised in #586.
2019-10-23 10:46:40 +09:00
Ian Barwick
d7fd55be99 Update HISTORY 2019-10-18 16:47:28 +09:00
Ian Barwick
0574279ccb Ensure postgresql.auto.conf is created with correct permissions 2019-10-18 16:45:46 +09:00
Ian Barwick
b74f965f54 standby clone: rename --recovery-conf-only to --replication-conf-only
A more generic option name to cover pre- and post-Pg12 replication
configuration methods.

--recovery-conf-only is retained as an alias for backwards
compatibility.
2019-10-18 14:44:57 +09:00
Ian Barwick
b9e360d5b8 Tweak "repmgr standby --help" output not to mention recovery.conf
Use the more generic "replication configuration" to cover Pg12
and later.
2019-10-18 14:09:54 +09:00
Ian Barwick
047249e980 doc: expand section about requesting support 2019-10-18 11:58:13 +09:00
Ian Barwick
14f46b076e doc: add link to blog entry about Pg12 replication configuration changes 2019-10-18 11:57:52 +09:00
Ian Barwick
52abe309df Add function is_replication_role() 2019-10-17 17:13:18 +09:00
Ian Barwick
f45b9d7024 Call check_93_config() directly from check_upstream_config() 2019-10-16 16:57:53 +09:00
Ian Barwick
de67fa2441 standby clone: simplify data directory check
Now we no longer care about the upstream's data directory, and
normally expect to find the data directory in repmgr.conf, we
can just exit with an error in the corner case that no repmgr.conf
is provided and no data directory specified with -D/--pgdata.
2019-10-16 16:06:26 +09:00
Ian Barwick
5f6d970fd9 standby clone: update code comment
Follow-up to 0dce03a
2019-10-16 15:45:20 +09:00
Ian Barwick
0dce03a5f8 standby clone: don't query upstream's data directory
In early repmgr versions, this used to be a requirement for cloning
via rsync, and/or as a fallback location if the user didn't supply
a data directory to clone into. However as rsync cloning has been
deprecated, and the data directory must be specified in repmgr.conf,
this is no longer required, and removing it simplifies user privilege
requirements.

Note that it is still possible to explicitly provide a target data
directory with -D/--pgdata, though this is primarily useful for
the niche use case where repmgr is used as a convenience tool to
clone a node which is not intended to become part of a repmgr
cluster.

This is part of the implementation of GitHub #536 for the minimizing
of user privilege requirements.
2019-10-16 13:21:29 +09:00
Ian Barwick
5d81e03d2d Bump master branch to 5.1dev 2019-10-16 11:04:37 +09:00
Ian Barwick
fdb61a1dea Change version number from 5.0 to 5.0.0
Previous initial "major" releases were two-element only (e.g. 4.4);
beginning from repmgr 5 we want to ensure all version numbers have
three elements, for general consistency, including the generation
of package names.
2019-10-16 10:45:54 +09:00
Ian Barwick
8a38188c47 doc: split notes about PostgreSQL 9.3 and 9.4 support into a new subsection 2019-10-15 10:47:12 +09:00
Ian Barwick
5dcca6b053 doc: add note about 4.x development policy 2019-10-15 10:47:09 +09:00
Ian Barwick
a0591afb1e doc: add repmgr 5.0 release date 2019-10-15 10:29:52 +09:00
Ian Barwick
af1c889bc3 Bump version to 5.0 2019-10-14 15:10:40 +09:00
Ian Barwick
2304584679 Fix handling of upstream node change check
repmgrd has a check to see if the upstream node has unexpectedly
changed, e.g. if the repmgrd service is paused and the PostgreSQL
instance has been pointed to another node.

However this check was relying on the node record on the local node
being up-to-date, which may not be the case immediately after a
failover, when the node is still replaying records updated prior
to the node's own record being updated. In this case it will
mistakenly assume the node is following the original primary
and attempt to restart monitoring, which will fail as the original
primary is no longer available.

To prevent this, we check against the node's record on the upstream
node.

Addresses issue noted in GitHub #587 and #588.
2019-10-14 12:28:04 +09:00
Ian Barwick
4aaa24a5f8 Update configuration file conversion script
Ensure output is quoted.
2019-10-08 10:59:20 +09:00
Ian Barwick
ea29af2e68 Remove mistakenly added file 2019-10-08 10:33:27 +09:00
Ian Barwick
7053ed5b51 doc: update links to Barman documentation 2019-10-08 10:30:41 +09:00
Ian Barwick
a845d7126d doc: minor updates to "repmgr standby clone" reference
- remove references to repmgr 4.0.4 (present because feature
  was added in a minor release, but that's a long time ago)
- note configuration is appended to postgresql.auto.conf
2019-10-08 10:21:26 +09:00
Ian Barwick
1c317059cd doc: clarify use of --recovery-conf-only 2019-10-08 10:03:44 +09:00
Ian Barwick
dcf5bfb649 doc: fix minor formatting error 2019-10-08 09:48:04 +09:00
Ian Barwick
dbd3d34c89 doc; update repmgr.conf.sample
Note new PostgreSQL-style parsing and add link to documentation.
2019-10-07 18:28:16 +09:00
Ian Barwick
f3c3320a9c doc: update examples in quickstart guide 2019-10-07 18:27:38 +09:00
Ian Barwick
cfc5bde219 doc: update repmgr.conf samples in Barman section
From repmgr 5.x we really need to quote all the things.
2019-10-07 12:05:26 +09:00
Ian Barwick
ea57269569 doc: improve instructions for cloning from Barman 2019-10-07 11:56:40 +09:00
Ian Barwick
b885337abc Minor code formatting fix 2019-10-07 10:48:35 +09:00
Ian Barwick
e8b5b92893 doc: improve Barman standby clone example 2019-10-03 15:52:32 +09:00
Ian Barwick
02bdcf657f doc: add link to FAQ section about 3rd party package compatibilty 2019-10-03 15:23:57 +09:00
Ian Barwick
405f70f769 doc: clarify barman-cli package usage
As of Barman 2.8, the barman-cli package has been merged with the core
Barman code, so is only requires an explicit mention for Barman 2.0 ~ 2.7.
2019-10-03 14:35:45 +09:00
Ian Barwick
577ca35de5 doc: remove reference to Barman 1.x.
Barman 1.x is very outdated and should no longer be used anyway.
2019-10-03 14:25:17 +09:00
Martín Marqués
a557f2d69e Typo in the documentation of the repmgrd configuration
Signed-off-by: Martín Marqués <martin.marques@2ndquadrant.com>
2019-10-01 09:44:30 -03:00
Ian Barwick
1196821457 doc: update FAQ
Note pg_monitor default role as well, and link to relevant section
in the PostgreSQL documentation.
2019-10-01 10:29:44 +09:00
Ian Barwick
c1d464f3da doc: add FAQ entry about 3rd-party PostgreSQL packages 2019-10-01 10:07:25 +09:00
Ian Barwick
4646bbc289 doc: standardize doc formatting
Indent with two spaces instead of one.
2019-10-01 09:44:04 +09:00
Ian Barwick
bb9a0c2297 doc: update FAQ
Add note about repmgr 5 in the "What's the difference between the repmgr
versions?" item.
2019-10-01 09:41:49 +09:00
Ian Barwick
1ed8b1067a Prevent use of backend string functions
From PostgreSQL 12, port.h forcibly redefines printf() et al to use
the versions defined by PostgreSQL (pg_printf() et al). As this
causes linking issues in build environments which build pre-Pg12
versions against Pg12's libpq, ensure relevant macros defined
in port.h are undefined.
2019-09-26 12:47:51 +09:00
Ian Barwick
a502b2cf96 Move function parse_repmgr_version() to a more appropriate location 2019-09-24 13:14:03 +09:00
Ian Barwick
8e6d111f32 Refactor remote_command() function
Use dynamic rather than fixed buffer to generate the command string.
2019-09-24 13:14:03 +09:00
Ian Barwick
50d4cee877 doc: link to GitHub release page 2019-09-24 11:37:25 +09:00
Ian Barwick
db99e98236 doc: note flex required for source builds
From 5.0 onwards.
2019-09-24 11:34:24 +09:00
Ian Barwick
10f00b8822 repmgr: pass explicitly provided log level when executing repmgr remotely
This makes it possible to return log output when executing repmgr
remotely at a different level to the one defined in the remote
repmgr's repmgr.conf.

This is particularly useful when DEBUG output is required.
2019-09-17 15:38:43 +09:00
Ian Barwick
98e96f4375 doc: clarify configuration file changes 2019-09-17 13:01:03 +09:00
Ian Barwick
8489fd061f doc: update release notes 2019-09-17 11:19:05 +09:00
Ian Barwick
56aae22b6c doc: update release notes 2019-09-17 11:00:04 +09:00
Ian Barwick
bce039f336 doc: add clarifications to "Standby disconnection on failover" 2019-09-05 18:00:56 +09:00
Ian Barwick
19190153a0 doc: clarify prerequisites for executing the "repmgr service" commands
Note node accessibility requirements on each command's reference
page.
2019-08-30 12:07:37 +09:00
Ian Barwick
b1c6418e8d doc: expand "see also" links for "repmgr (service|daemon)" commands 2019-08-30 11:51:32 +09:00
Ian Barwick
12491c6aa1 doc: add note about starting/stopping repmgrd on individual nodes 2019-08-30 11:51:11 +09:00
Ian Barwick
d5b806eeff Fix pg_control handling for PostgreSQL 12
"FullTransactionId" is not present in earlier versions, so we'll
have to #ifdef that out for those.

We should probably add more elegant checks to ensure that repmgr
is being executed against the PostgreSQL version it was built against.
In a normal production environment that will be the case as the
matching package will have been installed, so version mismatches
don't usually occur.

Note that while currently most aspects of the repmgr client are
compatible across PostgreSQL versions, the repmgrd code is very
version specific, so version specificity is a given anyway.
2019-08-29 17:22:01 +09:00
Ian Barwick
37bfe6ee4f Have "make clean" remove "repmgr_version.h" 2019-08-29 17:17:33 +09:00
Ian Barwick
97c21ed907 doc: update package examples
Use recent version numbers where appropriate.
2019-08-29 15:57:49 +09:00
Ian Barwick
3ea47522cd doc: clarify "repmgr daemon (start|stop)" purpose 2019-08-28 16:28:29 +09:00
Ian Barwick
935be3d669 doc: update example PostgreSQL version references to Pg12 2019-08-28 15:52:14 +09:00
Ian Barwick
494444869d doc: updates for repmgr 5.x and PostgreSQL 12 2019-08-28 15:33:01 +09:00
Ian Barwick
677a94513e repmgr: note that --dry-run is not effective with "repmgr service status" 2019-08-28 15:14:35 +09:00
Ian Barwick
931da14df1 Rename some "repmgr daemon ..." commands to "repmgr service ..."
"repmgr daemon" can be interpreted to mean the commands affect the local
daemon process only. Rename the commands which affect the entire cluster
to "repmgr service ...".

The "repmgr daemon ..." form of the affected commands is retained for backwards
 compatibility.
2019-08-28 14:58:11 +09:00
Ian Barwick
3e812f6e91 repmgrd: always emit NOTICE when attempting to follow a new primary
Previously, if a standby's repmgrd was looping in degraded monitoring
mode looking for a new primary to follow, once a new primary was
detected the follow command would be executed without any prior
logging at non-DEBUG log levels.
2019-08-26 16:02:41 +09:00
Ian Barwick
ffc7b7817b doc: update HISTORY
Note PostgreSQL support.
2019-08-22 15:42:29 +09:00
Ian Barwick
fb6352735a The next major release will be 5.0.
4.5 was a placeholder release number in case a major release was required
prior to the release of Pg12.
2019-08-22 15:15:56 +09:00
Ian Barwick
f122c44a77 Fix debugging output 2019-08-21 15:52:08 +09:00
Ian Barwick
b4e6922a26 doc: add further note about configuration file changes 2019-08-20 16:36:58 +09:00
Ian Barwick
6e200d32a4 doc: add note about max_wal_senders in Pg12 and later 2019-08-20 11:17:36 +09:00
Ian Barwick
507b27c05d Mark set_repmgrd_pid() as "RETURNS NULL ON NULL INPUT"
When unsetting the PID, we'll want to set the pidfile to NULL rather
than an empty string.
2019-08-19 20:15:30 +09:00
Ian Barwick
28f4536372 repmgrd: fix pidfile handling at shutdown 2019-08-19 17:55:18 +09:00
Ian Barwick
2ec761a979 doc: update "repmgr standby clone" reference for PostgreSQL 12
Mainly replace "recovery.conf" with the more generic "replication
configuration".
2019-08-19 10:40:42 +09:00
Ian Barwick
b5225a2662 Support --create-recovery-conf in Pg12 2019-08-19 10:40:38 +09:00
Ian Barwick
d5ed38f573 Make "standby follow" work in Pg12 2019-08-19 10:40:35 +09:00
Ian Barwick
2a37e28304 write standby.signal 2019-08-19 10:40:31 +09:00
Ian Barwick
9eb6ce52b4 Write replication configuration for Pg12 and later 2019-08-19 10:40:27 +09:00
Ian Barwick
9e072c6773 Update code comment with list of options for "standby clone" 2019-08-15 18:11:47 +09:00
Ian Barwick
72daa38baa durable_rename() only available externally from Pg10 2019-08-14 20:28:05 +09:00
Ian Barwick
f5044465cb Add function to safely modify postgresql.auto.conf
This is required for PostgreSQL 12 and later.
2019-08-14 16:57:42 +09:00
Ian Barwick
4ebc43fd63 Clean up variable usage in do_node_status()
Variable with the same name existed both at function level and within
local code blocks.
2019-08-14 14:15:41 +09:00
Ian Barwick
a1775237d4 Update comment
Deprecated command line option --data-dir was removed in commit 5ca0b57,
but a comment still referred to it.
2019-08-14 14:12:09 +09:00
Ian Barwick
94ba635811 Define our own PG_AUTOCONF_FILENAME 2019-08-13 16:48:44 +09:00
Ian Barwick
c0f3990973 Use appendPQExpBufferStr where appropriate 2019-08-13 16:32:40 +09:00
Ian Barwick
d0f5ee1851 doc: mention not to use --siblings-follow in the repmgrd promote command
This is noted on the "repmgr standby promote" page but needs repeating
on the repmgrd configuration page.
2019-08-13 11:49:52 +09:00
Ian Barwick
75c0987e79 repmgrd: emit node name when reporting follow target attach error
This is consistent with other error messages.
2019-08-13 11:02:52 +09:00
Ian Barwick
68be86349b Add function to parse version string returned by "repmgr --version" 2019-08-08 13:47:19 +09:00
Ian Barwick
666c6f5140 "standby clone": improve error messages related to extension status
Previously repmgr would emit the "repmgr extension not found on source node"
which depending on context is somewhat misleading, as it may exist
but not be installed, or the user may be attempting to clone from the
wrong database.
2019-08-07 16:41:27 +09:00
Ian Barwick
3df65d0eb3 Simplify pg_has_role() call
Specifying CURRENT_USER is superfluous here.
2019-08-07 14:43:56 +09:00
Ian Barwick
38b373e6df "node check": check role membership when trying to read pg_settings
From PostgreSQL 10, a member of the default roles "pg_monitor" and/or
"pg_read_all_settings" can read pg_settings without requiring superuser
privileges.

Previously, a hint was being emitted about making the repmgr user a
member of one of those groups, but no check for membership was being
made, meaning the check could only be run by a superuser.
2019-08-07 14:26:48 +09:00
Ian Barwick
10870503d1 Add missing field in init_replication_info()
"upstream_node_id" was not being initialised.
2019-08-06 21:24:37 +09:00
Ian Barwick
5ca0b57d0c Remove command-line options deprecated since repmgr 3.3
The following options have long since been deprecated, and any attempt
to use them results only in a warning that they are no longer valid:

  --data-dir
  --no-conninfo-password
  --recovery-min-apply-delay
2019-08-05 16:26:12 +09:00
Ian Barwick
7d20aea606 Fix typo in comment 2019-08-01 15:20:44 +09:00
Ian Barwick
424d92e311 doc: fix typo 2019-08-01 14:14:37 +09:00
Ian Barwick
8d55cab25e Convert configuration file parsing to use flex
Previously, repmgr was using a very simple ad-hoc string-based parser,
which had various limitations and allowed configuration files to be
created in a way which could cause confusion and/or unexpected
behaviour.

For example, it accepted strings enclosed in single quotes, but treated
strings enclosed in double quotes literally. A node_name defined thusly:

    node_name="somenode"

would result in the literal value '"somenode"' being used, which could
lead to unobvious errors along the lines of:

    no record found for ""somenode""

The configuration file parser has been adapted from the one used by
PostgreSQL itself, so behaves more-or-less identically (though some
functions such as file inclusion are not supported in repmgr).

This makes configuration parsing more robust and consistent;
additionally, error reporting will be more precise.

Note this does mean that some repmgr.conf items previously accepted
as valid by repmgr will now be rejected; in particular this includes
strings containing spaces which are not enclosed in single quotes.
2019-08-01 10:17:20 +09:00
Ian Barwick
ab7e527af8 More sed tweakage 2019-07-26 21:25:16 +09:00
Ian Barwick
9274fdc6ba Use more portable sed invocation
Works on FreeBSD too.
2019-07-26 20:13:15 +09:00
Ian Barwick
018394faa2 Define PG_ACTUAL_VERSION_NUM
Due to [insert reason here], in the Debian package build process (and
only there), when building frontend code PG_VERSION_NUM appears to be
from the newest libpq-dev version installed, and does not necessarily
match the version of the server the code is being built against.

To work around this distribution-specific package build issue, we'll
define our own substitution variable which is taken from the value
provided in Makefile.global.
2019-07-26 18:12:07 +09:00
Ian Barwick
532a5207e2 More portable usage of sed in Makefile 2019-07-26 18:08:16 +09:00
Ian Barwick
5bf9605286 Revert "Convert configuration file parsing to use flex"
This reverts commit c6ca183247.

Backing out this patch for now as the Debian build system doesn't
seem to like it, even though it builds just fine on Debian itself.
2019-07-18 10:19:18 +09:00
Ian Barwick
215f4bb9d9 doc: add note about parallel restore from Barman 2019-07-18 09:23:09 +09:00
Ian Barwick
75a381ed27 doc: add reminder to update release date in version header 2019-07-09 11:37:50 +09:00
Ian Barwick
a26b7e29a0 Add release date placeholder 2019-07-09 11:33:09 +09:00
Ian Barwick
d09214b83d doc: define entity &releasedate;
This should be used wherever we need to show the latest release
date.

Don't use this in the release notes however, as it will be easy to
forget to update it when adding notes for a new release.
2019-07-09 11:32:43 +09:00
Ian Barwick
822abbbe5b doc: update compatibility matrix
Use &repmgrversion; entity to generate the current version number and
prevent document bitrot.

Also define a "release-current" ID attribute for ease of linking to
the current release notes.

Per notification from the mailing list.
2019-07-09 11:07:04 +09:00
Ian Barwick
d51704e272 doc: update release notes 2019-07-04 11:21:37 +09:00
Ian Barwick
c6ca183247 Convert configuration file parsing to use flex
Previously, repmgr was using a very simple ad-hoc string-based parser,
which had various limitations and allowed configuration files to be
created in a way which could cause confusion and/or unexpected
behaviour.

For example, it accepted strings enclosed in single quotes, but treated
strings enclosed in double quotes literally. A node_name defined thusly:

    node_name="somenode"

would result in the literal value '"somenode"' being used, which could
lead to unobvious errors along the lines of:

    no record found for ""somenode""

The configuration file parser has been adapted from the one used by
PostgreSQL itself, so behaves more-or-less identically (though some
functions such as file inclusion are not supported in repmgr).

This makes configuration parsing more robust and consistent;
additionally, error reporting will be more precise.

Note this does mean that some repmgr.conf items previously accepted
as valid by repmgr will now be rejected; in particular this includes
strings containing spaces which are not enclosed in single quotes.
2019-07-03 12:18:01 +09:00
Ian Barwick
b125628f7b doc: update release notes
Finalize release date.
2019-06-26 15:57:42 +09:00
Ian Barwick
cd550fcd5c doc: clean up release notes
Remove tabs.
2019-06-14 16:53:45 +09:00
Ian Barwick
b6dc8af6c7 doc: fix typo 2019-06-12 16:28:28 +09:00
Ian Barwick
09979eaa91 note that "standby follow" requires a primary to be available
While it's technically possible to have a standby follow another
standby while the primary is not available, repmgr will not be able
to update its metadata, which will cause Confusion and Chaos.

Update the documentation to make this clear, and provide a more helpful
error message if this situation occurs. The operation previously
failed anyway, but with an unhelpful message about not being able to
find a node record.
2019-06-11 15:14:17 +09:00
Ian Barwick
3469152314 doc: document optional configuration settings 2019-06-10 14:08:34 +09:00
Ian Barwick
3bf308509f doc: add missing space 2019-06-10 09:02:11 +09:00
Ian Barwick
01852f7e3a doc: improve repmgr.conf settings documentation 2019-06-07 12:48:36 +09:00
Ian Barwick
36a09a5c4b doc: improve configuration documentation 2019-06-07 12:16:04 +09:00
Ian Barwick
7180e2bed7 Canonicalize the data directory path when parsing the configuration file
This ensures the provided path matches the path PostgreSQL reports as its
data directory.
2019-06-07 09:48:01 +09:00
Ian Barwick
6aca764d5e Fix extension version number query 2019-06-06 12:46:12 +09:00
Ian Barwick
341421f8e0 standby follow: remove some ineffective code
For some reason we were taking the trouble to extract an appliction_name
from the local node's conninfo, but this was being subsequently overwritten
with the node name (which is what we want anyway).
2019-06-06 12:12:33 +09:00
Ian Barwick
f5d29f6591 doc: update release notes 2019-06-06 11:30:30 +09:00
Ian Barwick
c0ea5ffa04 Ensure parsed value of --upstream-conninfo is written to recovery.conf
Previously it was being parsed (a step which ensures any "application_name"
set by the caller is changed to the node name), but the original string
was being copied to "primary_conninfo" anyway.
2019-06-06 11:30:24 +09:00
Ian Barwick
aa44c8abf1 Add .sql extension files for 4.5 2019-06-04 15:58:22 +09:00
Ian Barwick
456a95f159 Bump master branch to 4.5dev
Note that the next release is intended to be 5.0 to coincide with the
release of PostgreSQL 12; 4.5 is currently a placeholder in case we
need to push out a feature release before then.
2019-06-04 14:26:48 +09:00
Ian Barwick
703a483e81 Remove redundant comment in .sql files 2019-06-04 13:46:10 +09:00
Ian Barwick
d893ce227b repmgrd: optionally exclude/include witness server from child node checks 2019-06-03 16:04:54 +09:00
Ian Barwick
e8731f8159 doc: update child node monitoring documentation 2019-06-03 16:04:51 +09:00
Ian Barwick
20d710e34c doc: update filename referenced in code comment 2019-06-03 15:30:02 +09:00
Ian Barwick
7e8710b1e9 doc: remove redundant entity definitions 2019-06-03 15:29:08 +09:00
Ian Barwick
19e8387d8f doc: remove mistakenly committed .sgml file 2019-05-30 19:58:06 +09:00
Ian Barwick
b5ff2ec120 repmgrd: update log text 2019-05-30 16:08:04 +09:00
Ian Barwick
0a4072b8f7 witness (un)register: add event details
Also create an actual event notification for both actions, rather
than just creating the event record.

This is presumably an oversight from the original conversion to
repmgr4 which no-one has noticed before.
2019-05-30 14:41:10 +09:00
Ian Barwick
d4df0055c9 repmgr: use --compact (not --terse) in "cluster events" to hide details column
This is consistent with usage elsewhere.

"--terse" is intended to reduce logging noise.
2019-05-30 14:19:37 +09:00
Ian Barwick
06a83247c9 repmgrd: note node type when logging child node dis/re-connections 2019-05-30 14:06:54 +09:00
Ian Barwick
a6ea1d0fda repmgrd: fix witness node disconnection monitoring 2019-05-30 11:51:50 +09:00
Ian Barwick
9a0994856a doc: note witness node behaviour in child node monitoring 2019-05-30 11:50:31 +09:00
Ian Barwick
45e17223b9 Update variable/field names relating to pg_basebackup's -X option
Now the "xlog nomenclature" Pg versions are fading into the past,
rename things related to handling pg_basebackup's -X option
(was: --xlog-method, now: --wal-method) to start with "wal_"
rather than "xlog_".

This is a cosmetic change for code clarity.
2019-05-30 09:32:06 +09:00
Ian Barwick
9085ca46a8 doc: update release notes 2019-05-28 15:38:19 +09:00
Ian Barwick
9114299223 Tweak log output if attempted to register witness on primary cluster 2019-05-28 14:58:32 +09:00
John Naylor
519df66197 Disallow witness on primary cluster 2019-05-28 14:40:15 +09:00
Ian Barwick
d1e708454f Fix fwrite() result check 2019-05-28 14:37:36 +09:00
Ian Barwick
d54f0d66fb Free palloc'd StringInfoData data 2019-05-28 13:04:44 +09:00
Ian Barwick
c153e2fc02 standby clone: improve --dry-run output
Log positive check results as an additional confirmation that the
upstream configuration appears to be correct.
2019-05-28 00:54:39 +09:00
Ian Barwick
44a39760a1 standby clone: improve source node replication connection check
Previously, the check was attempting to make replication connections
to the source node, and if these were failing, inferring that
insufficient walsenders were available.

However it's quite likely that the connections are refused due to
insufficient user connection permissions. So before performing
the connection check, query the number of potentially available
walsenders on the source node and compare it with the number
required (either 1 or 2) - if insufficient, exit with error and
hint about increasing "max_wal_senders".

Once we've established sufficient walsenders are available, inability
to connect is most likely related to permissions issues on the source
node.
2019-05-28 00:11:53 +09:00
Ian Barwick
b959f771c1 Improve naming/usage of node record variables in "standby clone"
Make it clearer we're dealing with the upstream node record.

Also avoid "overloading" the upstream record when checking for an
existing record with the same node name; this was not technically
a problem but mildly confusing when reading the code.
2019-05-27 23:39:49 +09:00
Ian Barwick
c560dfbbce cluster show: display timeline ID
This helps provide a better picture of the state of the cluster, i.e.
making it more obvious whether there's been a timeline divergence.

This also provides infrastructure for further improvements in cluster
status display and diagnosis.

Note this is only available in PostgreSQL 9.6 and later as it relies
on the SQL functions for interrogating pg_control, which can be executed
remotely. As PostgreSQL 9.5 will shortly be the only community-supported
version without these functions, it's not worth the effort of trying
to duplicate their functionality.
2019-05-27 09:39:19 +09:00
Ian Barwick
df6d160d2e Reformat REPMGR_NODE_COLUMNS macros for readability 2019-05-24 16:39:02 +09:00
Ian Barwick
14b805d650 Makefile: improve documentation targets
- add documentation targets to main Makefile
- ensure clean/maintainer-clean remove all generated documentation files
2019-05-24 14:15:54 +09:00
Ian Barwick
1d46261c24 doc: update appendix "Installing old package versions"
Move legacy 3.x package info to separate section.
2019-05-24 10:03:38 +09:00
Ian Barwick
8ead0042ad Miscellaneous comment and logging cleanup1 2019-05-23 09:31:46 +09:00
Ian Barwick
2bce1b371c doc: fold putative 4.3.1 release notes into 4.4 2019-05-23 09:03:18 +09:00
Ian Barwick
3c8bab97d8 Fix variable declarations 2019-05-22 17:26:34 +09:00
Ian Barwick
c9e85996f5 repmgr: prevent a standby being cloned from a witness server
Previously repmgr would happily clone from whatever server
it found at the provided source server address. We should
ensure that a standby can only be cloned from a node which
is part of the main replication cluster.

This check fetches a list of nodes from the source server,
connects to the first non-witness server it finds, and
compares the system identifiers of the source node and the
node it has connected to. If there is a mismatch, then the
source server is clearly not part of the main replication
cluster, and is most likely the witness server.
2019-05-22 16:52:25 +09:00
Ian Barwick
fa66e72c2f repmgrd: count witness server as child node for connection monitoring purposes
As the witness server does not, by definition, ever have an entry in pg_stat_replication,
we need to check its "attached" status by connecting to the witness server itself
and querying the reported upstream node ID (which should be set by the witness
server repmgrd). If this matches the current primary node ID, we count it as attached.
2019-05-21 15:19:41 +09:00
Ian Barwick
e6195edbca cluster show: warn if unable to connect to witness's upstream
Fix also applies to "daemon status".
2019-05-21 12:35:49 +09:00
Ian Barwick
2326c384c0 cluster show: fix upstream check for witnesses
Fix also applies to "daemon status"
2019-05-21 12:28:32 +09:00
Ian Barwick
074769a090 doc: remove copypasta error 2019-05-20 15:40:03 +09:00
Ian Barwick
10425d6967 doc: rename file endings from .sgml to .xml
As they are now XML files. In PostgreSQL itself they remain with
the .sgml suffix for backwards compatibility, but that's not
important for us.
2019-05-20 15:38:40 +09:00
Ian Barwick
cbaa890a22 doc: document "primary_visibility_consensus" 2019-05-17 14:55:51 +09:00
Ian Barwick
24e1108dba doc: fix incorrect case 2019-05-17 11:15:24 +09:00
Ian Barwick
f03e012c99 cluster show/daemon status: report if node not attached to advertised upstream 2019-05-14 16:15:03 +09:00
Ian Barwick
dd78a16006 Change return type of is_downstream_node_attached() from bool to NodeAttached
This enables us to better determine whether a node is definitively
attached, definitively not attached, or if it was not possible to
determine the attached state.
2019-05-14 15:57:20 +09:00
Ian Barwick
7599afce8b doc: mention minimum PostgreSQL version for building repmgr docs
As-is, it won't build against PostgreSQL 9.4 or earlier, but as 9.4
will be removed from community support later this year, it's not
so critical.
2019-05-14 14:26:03 +09:00
Ian Barwick
8587539adb Fix command line sanity check 2019-05-14 13:27:00 +09:00
Ian Barwick
fca033fb9d cluster show/daemon status: report upstream node mismatches
When showing node information, check if the node's copy of its
record shows a different upstream to the one expected according
to the node where the command is executed.

This helps visualise situations where the cluster is in an
unexpected state, and provide a better idea of the actual state.

For example, if a cluster has divided somehow and a set of nodes are
following a new primary, when running "cluster show" etc., repmgr
will now show the name of the primary those nodes are actually
following, rather than the now outdated node name recorded
on the other side of the split. A warning will also be issued
about the situation.
2019-05-14 13:11:31 +09:00
Ian Barwick
ae44012383 Minor code fixes to "cluster show"/"daemon status" formatting 2019-05-14 11:36:59 +09:00
Ian Barwick
b938f10206 repmgr client: mark some options as deprecated 2019-05-13 15:45:34 +09:00
Ian Barwick
0af732e88f doc: tweaks for PDF generation 2019-05-13 15:42:51 +09:00
Ian Barwick
1d36e34dfd doc: use "--wal-method" as the standard option
and note it's "--xlog-method" for 9.6 and earlier. This matches
practice elsewhere in the documentation.
2019-05-13 09:31:53 +09:00
Ian Barwick
d8e4c54ea4 "standby switchover": add "--repmgrd-force-unpause"
Implements GitHub #559.
2019-05-10 16:04:07 +09:00
Ian Barwick
d43b40c5c6 doc: enable creation of PDF files 2019-05-10 10:50:49 +09:00
Ian Barwick
ecf4bdb431 doc: fix typos in source install instructions
s/llib/lib/g
2019-05-10 10:28:52 +09:00
Ian Barwick
9d7a3e24af doc: tweak Makefile 2019-05-10 10:25:07 +09:00
Ian Barwick
6684822274 doc: update documentation build instructions
Also add an item in the release notes
2019-05-10 10:10:14 +09:00
Ian Barwick
edf3aa6687 doc: restore original stylesheet for now 2019-05-09 16:24:41 +09:00
Ian Barwick
255623004c doc: update link to PostgreSQL documentation 2019-05-09 16:24:37 +09:00
Ian Barwick
04a6bf86f2 doc: update documentation build instructions 2019-05-09 16:24:30 +09:00
Ian Barwick
3804c95019 doc: (re)add single page HTML generation 2019-05-09 16:24:26 +09:00
Ian Barwick
409eb47e2a doc: convert documentation to DocBook XML
This brings the repmgr documentation build system in line with that
used by the main PostgreSQL project, and removed the restriction
that documentation must be built against PostgreSQL 9.6 or earlier.

Main formatting changes are:

 - convert empty-element tags (mainly <xref/>)
 - put <indexterm> sections in the correct location
 - correct usage of various entities.
2019-05-09 16:24:21 +09:00
Ian Barwick
1a6f7e979d doc: update release notes 2019-05-07 17:23:03 +09:00
Ian Barwick
6f8fa45604 doc: update release notes 2019-05-07 15:49:54 +09:00
Ian Barwick
5e03627e6c doc: update release notes 2019-05-07 15:29:56 +09:00
Ian Barwick
1c13e57c8b doc: update release notes 2019-05-07 15:27:13 +09:00
Ian Barwick
02245a0014 repmgrd: add missing PQfinish() calls 2019-05-02 18:50:21 +09:00
Ian Barwick
4b37562444 Make it clearer that a witness node counts as a "sibling node"
It's not attached to the primary per-se, but needs to know what
the current primary is in order to correctly synchronise its
copy of the metadata.

Per GitHub #560.
2019-05-02 14:22:53 +09:00
Ian Barwick
8da355eb3f doc: update release notes 2019-05-02 14:00:07 +09:00
Ian Barwick
b8fa71257a doc: update "repmgr standby promote" documentation
Document new "--siblings-follow" option.
2019-05-02 12:06:08 +09:00
Ian Barwick
fed09ecaae standby promote: have former siblings follow new primary 2019-05-02 12:04:49 +09:00
Ian Barwick
98d09f83b5 standby (promote|switchover): improve --dry-run functionality
Continue checks as far as possible.
2019-05-02 12:04:43 +09:00
Ian Barwick
7bbe938e19 Separate promotion candidate walsender/slot checks into discrete functions
For use by "standby promote" as well as "standby follow"
2019-05-02 12:04:40 +09:00
Ian Barwick
63c7f758c3 Remove unneeded server version number variables
No need to pass these around.
2019-05-02 12:04:33 +09:00
Ian Barwick
b9f07f6a91 standby promote: use variable name "local_conn" for the local connection handle
This is consistent with usage in other functions, and makes it easier to
differentiate between the local node connection and the primary connection.
2019-05-02 12:04:26 +09:00
Ian Barwick
e4615b4666 Refactor code for executing --siblings-follow
This will enable provision of "--siblings-follow" to "repmgr standby promote"
2019-05-02 12:04:15 +09:00
Ian Barwick
dbeffbf29a doc: define entity for repmgrd 2019-05-01 10:36:54 +09:00
Ian Barwick
4d1e11533e doc: add missing space in example output 2019-05-01 10:14:18 +09:00
Ian Barwick
52905f1eb3 Standardize on "ID: %i" when logging node IDs
Previously there was a mix of "id:", "node id:", "node ID:" and "node_id:".
2019-04-30 17:07:33 +09:00
Ian Barwick
6c3b4c0db8 Remove unused line 2019-04-30 15:53:24 +09:00
Ian Barwick
89a7261483 Always quote node names in log messages 2019-04-30 15:52:56 +09:00
Frantisek Holop
d7de0a64e0 doc: bit too many e.g.'s
PR #565.
2019-04-30 10:47:45 +09:00
Frantisek Holop
531c4d9853 doc: promote -> follow
PR #565
2019-04-30 10:43:15 +09:00
Ian Barwick
356fe2e640 Fix "repmgr daemon status --csv" output 2019-04-29 20:52:27 +09:00
Ian Barwick
e32acda8c0 standby switchover: ignore nodes which are unreachable and marked as inactive
Previously "repmgr standby switchover" would abort if any node was unreachable,
as that means it was unable to check if repmgrd is running.

However if the node has been marked as inactive in the repmgr metadata, it's
reasonable to assume the node is no longer part of the replication cluster
and does not need to be checked.
2019-04-29 14:35:49 +09:00
Ian Barwick
5f10e68f31 emit warning if "--siblings-follow" provided out-of-context 2019-04-29 14:12:22 +09:00
Ian Barwick
87910a5448 repmgrd: improve logging of sibling node's upstream info
If the sibling node has already been promoted (for whatever
reason, e.g. "repmgr standby promote" was executed manually)
and has exited recovery, the upstream node ID will normally
be reported as "-1", which is correct, but looks confusing in
the logs.

We now only report the upstream node ID if the sibling node
is still in recovery, *or* if it has exited recovery but is
still reporting an extant node ID.
2019-04-29 13:51:17 +09:00
Ian Barwick
ec6266e375 doc: list caveats when monitoring child node disconnection 2019-04-25 17:52:14 +09:00
Ian Barwick
2082a8d3f3 Consolidate some code 2019-04-25 16:04:40 +09:00
Ian Barwick
c8d52bab6d cluster show: fix thinko introduced in commit 9fe2fa2 2019-04-25 15:46:07 +09:00
Ian Barwick
dbbf35ded1 Update HISTORY 2019-04-25 14:59:33 +09:00
Ian Barwick
9fe2fa2daf daemon status: make output more like that of "cluster show"
In particular make any issues with unexpected server state more
obvious.
2019-04-25 14:45:41 +09:00
Ian Barwick
da24896fd5 doc: add child node monitoring example 2019-04-24 16:04:47 +09:00
Ian Barwick
c092ce60a7 doc: document "child_node..." configuration parameters 2019-04-24 14:48:38 +09:00
Ian Barwick
090493ebc9 doc: document "child_node" events 2019-04-24 13:19:00 +09:00
Ian Barwick
8d80267ab1 doc: update "repmgr primary register" output 2019-04-24 13:18:31 +09:00
Ian Barwick
3231b5034d Remove temporary debugging log output 2019-04-24 13:17:52 +09:00
Ian Barwick
5a9175c740 Clarify hints about updating the repmgr extension 2019-04-24 11:37:31 +09:00
Ian Barwick
58b33fb411 Clarify a couple of code comments 2019-04-24 10:55:53 +09:00
Ian Barwick
3129da221e "primary register": ensure --force works if another primary is registered but not running 2019-04-23 16:54:07 +09:00
Ian Barwick
6cbf436bf8 Don't execute "child_nodes_disconnect_command" when repmgrd paused 2019-04-23 14:08:13 +09:00
Ian Barwick
5a90513878 repmgrd: monitor standbys attached to primary
This functionality enables repmgrd (when running on the primary) to
monitor connected child nodes. It will log connections and disconnections
and generate events.

Additionally, repmgrd can execute a custom script if the number of connected
child nodes falls below a configurable threshold. This script can be used
e.g. to "fence" the primary following a failover situation where a new primary
has been promoted and all standbys are now child nodes of that primary.
2019-04-22 16:18:52 +09:00
Ian Barwick
64c4cb81d5 Update pg_control processing for PostgreSQL 12 2019-04-18 09:31:33 +09:00
Ian Barwick
3115face28 doc: add note about when a PostgreSQL restart is required
Per query in GitHub #564.
2019-04-17 09:43:35 +09:00
Ian Barwick
80f66e87c9 Improve string handling during configuration file reload 2019-04-16 11:20:41 +09:00
Ian Barwick
ad28cf95bd standby register: add upstream node ID in event details 2019-04-16 11:01:22 +09:00
Ian Barwick
a0c6cb602f repmgrd: remove duplicate function definition 2019-04-16 10:53:05 +09:00
Ian Barwick
27803f93ff repmgrd: always unset upstream node ID when monitoring a primary 2019-04-12 12:26:39 +09:00
Ian Barwick
1a344d488a Use sizeof() consistently 2019-04-11 23:07:58 +09:00
Ian Barwick
46d17d0933 repmgrd: fix log output 2019-04-11 16:29:08 +09:00
Ian Barwick
6b79e08706 repmgrd: add addiitonal log output in do_election() 2019-04-11 15:46:20 +09:00
Ian Barwick
cd6a55c7cb repmgrd: improve primary visibility consensus check
Exclude sibling nodes which report they're following a different
node. This shouldn't happen, but could.
2019-04-11 15:46:14 +09:00
Ian Barwick
008bd00a59 repmgrd: store upstream node ID in shared memory 2019-04-11 15:46:09 +09:00
Ian Barwick
5a8741199f repmgrd: exclude witness server from followability check 2019-04-11 11:19:12 +09:00
Ian Barwick
dd454a8374 Miscellaneous string handling cleanup
This is mainly to prevent effectively spurious truncation warnings
in recent GCC versions.
2019-04-10 16:18:56 +09:00
Ian Barwick
a9b56d9833 Fix hint message
s/UPGRADE/UPDATE
2019-04-10 12:08:26 +09:00
Ian Barwick
ef47589c6b standby clone: always ensure directory is created with correct permissions
In Barman mode, if there is an existing, populated data directory, and
the "--force" option is provided, the entire directory was being deleted,
and later recreated as part of the rsync process, but with the default
permissions.

Fix this by recreating the data directory with the correct permissions
after deleting it.
2019-04-09 10:58:27 +09:00
Ian Barwick
77b9887d61 standby clone: improve --dry-run behaviour in barman mode
- emit additional informational output
- ensure that provision of --force does not result in an existing
  data directory being modified in any way
2019-04-08 15:12:22 +09:00
Ian Barwick
7631c60933 doc: update release notes 2019-04-08 11:27:25 +09:00
Ian Barwick
a8d560860d Ensure BDR-specific code only runs on BDR 2.x
The BDR support in repmgr is for a specific BDR 2.x use case, and
is not suitable for more recent BDR versions.
2019-04-05 14:37:49 +09:00
Ian Barwick
c338bc9c5e doc: add note about BDR replication type in sample config 2019-04-05 14:37:49 +09:00
Ian Barwick
3c8e42ff15 doc: emphasise that BDR2 support is for BDR2 only 2019-04-05 10:53:23 +09:00
Ian Barwick
be9c6d5fc6 Use correct sizeof() argument in a couple of strncpy calls
Source and destination buffers are however the same length in both cases.

Per GitHub #561.
2019-04-04 10:58:00 +09:00
Ian Barwick
55e79bd0b7 doc: update 4.3 release notes 2019-04-03 15:08:35 +09:00
Ian Barwick
8970a72be9 doc: update README
Link to current documentation version
2019-04-03 11:12:48 +09:00
Ian Barwick
7791abb8f7 doc: add a link to the current documentation from the contents page 2019-04-03 10:54:18 +09:00
Ian Barwick
602e06a8f4 doc: finalize 4.3 release notes 2019-04-02 14:42:06 +09:00
Ian Barwick
84f4c6c979 doc: note that --siblings-follow will become default in a future release 2019-04-02 11:04:36 +09:00
Ian Barwick
67e977592c standby switchover: list nodes which will remain attatched to the old primary
If --siblings-follow is not supplied, list all nodes which repmgr considers
to be siblings (this will include the witness server, if in use), and
which will remain attached to the old primary.
2019-04-02 10:46:59 +09:00
Ian Barwick
b1cd7e7edf doc: update quickstart guide
Improve sample PostgreSQL replication configuration, including
links to the PostgreSQL documentation for each configuration item.

Also set "max_replication_slots" to the same value as "max_wal_senders"
to ensure the sample configuration will work regardless of whether
replication slots are in use, though we do still encourage careful
reading of the comments in the sample configuration and the documentation
in general.
2019-04-02 09:27:37 +09:00
Ian Barwick
a564f365c1 Fix default return value in alter_system_int() 2019-04-01 14:50:19 +09:00
Ian Barwick
799ac6d453 Add is_server_available_quiet()
For use in cases where the caller collates node availability information
and doesn't want to prematurely emit log output.
2019-04-01 12:27:30 +09:00
Ian Barwick
57c0ccd477 Improve copying of strings from database results
Where feasible, specify the maximum string length via sizeof(), and
use snprintf() in place of strncpy().
2019-04-01 11:19:58 +09:00
Ian Barwick
aef8e31897 Bump master branch to 4.4dev 2019-03-28 17:24:36 +09:00
Ian Barwick
3d4b81ba2a Handle unhandled error situation in enable_wal_receiver() 2019-03-28 14:52:16 +09:00
Ian Barwick
98d924685b Updae BDR repmgrd to handle node_name as a max 63 char string
Follow-up from commit 1953ec7.
2019-03-28 14:32:52 +09:00
Ian Barwick
79613af8d0 Handle potential NULL return from string_skip_prefix() 2019-03-28 12:45:53 +09:00
Ian Barwick
5e9f202c9a Add missing break 2019-03-28 12:44:50 +09:00
Ian Barwick
e44c048ae2 Update code comment 2019-03-28 12:44:30 +09:00
Ian Barwick
bb42d8cba6 Fix calculation of maximum filename length 2019-03-28 12:40:29 +09:00
Ian Barwick
9d5afeebbc Remove logically dead code 2019-03-28 12:35:41 +09:00
Ian Barwick
fe822a9eea Prevent potential file descriptor resource leak 2019-03-28 12:29:10 +09:00
Ian Barwick
03cd5a6028 Put closedir call in correct location 2019-03-28 12:08:42 +09:00
Ian Barwick
1e1c596446 Add various missing close() calls 2019-03-28 11:32:25 +09:00
Ian Barwick
d43975eb5f Use correct argument for sizeof() 2019-03-28 11:02:50 +09:00
Ian Barwick
ece20f4831 Cast "int" to "long long" 2019-03-28 11:02:25 +09:00
Ian Barwick
e23f5afc5f doc: note valid characters for "node_name"
"node_name" will be used as "application_name", so should only contain
characters valid for that; see:

    https://www.postgresql.org/docs/current/runtime-config-logging.html#GUC-APPLICATION-NAME

Not yet enforced.
2019-03-28 10:53:43 +09:00
Ian Barwick
ba1f05ece9 Restrict "node_name" to maximum 63 characters
In "recovery.conf", the configuration parameter "node_name" is used
as the "application_name" value, which will be truncated by PostgreSQL
to 63 characters (NAMEDATALEN - 1).

repmgr sometimes needs to be able to extract the application name from
pg_stat_replication to determine if a node is connected (e.g. when
executing "repmgr standby register"), so the comparison will fail
if "node_name" exceeds 63 characters.
2019-03-28 10:37:57 +09:00
Ian Barwick
73ad689390 standby register: fail if --upstream-node-id is the local node ID 2019-03-27 14:22:55 +09:00
Ian Barwick
e9ece34aeb log_db_error(): fix formatted message handling 2019-03-27 11:00:31 +09:00
Ian Barwick
9dd2f30c72 Use sizeof(buf) rather than hard-coding value 2019-03-27 10:43:49 +09:00
Ian Barwick
9164d3931b repmgrd: clean up PQExpBuffer handling
Unless the PQExpBuffer is required for the duration of the function,
ensure it's always a variable local to the relevant code block. This
mitigates the risk of accidentally accessing a generically named
PQExpBuffer which hasn't been initialised or was previously terminated.
2019-03-26 13:15:25 +09:00
Ian Barwick
801ed2b0c8 repmgrd: don't terminate uninitialized PQExpBuffer 2019-03-26 11:35:45 +09:00
Ian Barwick
e490f35223 doc: fix syntax 2019-03-22 15:43:55 +09:00
Ian Barwick
ec873b0119 doc: update release notes 2019-03-22 15:43:49 +09:00
Ian Barwick
539861cb58 repmgrd: during failover, check if a node was already promoted
Previously, repmgrd assumed that during a failover, there would not
already be another primary node. However it's possible a node was
promoted manually. While this is not a desirable situation, it's
conceivable this could happen in the wild, so we should check for
it and react accordingly.

Also sanity-check that the follow target can actually be followed.

Addresses issue raised in GitHub #420.
2019-03-22 14:06:41 +09:00
Ian Barwick
6f0f338968 standby follow: set replication user when connecting to local node 2019-03-21 16:43:39 +09:00
Ian Barwick
bd26eb3025 standby switchover: don't attempt to pause repmgrd on unreachable nodes 2019-03-21 13:48:59 +09:00
Ian Barwick
9b089b7401 doc: add note about compiling against Pg11 and later with the --with-llvm option 2019-03-21 10:30:00 +09:00
Ian Barwick
314a1e8f4f use a constant to denote unknown replication lag 2019-03-20 17:26:04 +09:00
Ian Barwick
7204a0faf4 doc: consolidate witness server documentation 2019-03-20 16:31:52 +09:00
Ian Barwick
5e775cef16 doc: various improvements to repmgrd documentation 2019-03-20 16:10:03 +09:00
Ian Barwick
7d0caefaee Fix logging related to "connection_check_type"
Also log the selected type at repmgrd startup.
2019-03-20 11:58:18 +09:00
Ian Barwick
7434cc0b8e repmgrd: improve witness node monitoring
Mainly fix a couple of places where "standby" was hard-coded into a log
message which can apply either to a witness or a standby.
2019-03-20 11:47:36 +09:00
Ian Barwick
b84d98fe81 Explictly log PQping() failures 2019-03-20 11:47:32 +09:00
Ian Barwick
46efe57cd0 Improve database connection failure logging
Log the output of PQerrorStatus() in a couple of places where it was missing.

Additionally, always log the output of PQerrorStatus() starting with a blank
line, otherwise the first line looks like it was emitted by repmgr, and
it's harder to scan the error message.

Before:

    [2019-03-20 11:24:15] [DETAIL] could not connect to server: Connection refused
            Is the server running on host "localhost" (::1) and accepting
            TCP/IP connections on port 5501?
    could not connect to server: Connection refused
            Is the server running on host "localhost" (127.0.0.1) and accepting
            TCP/IP connections on port 5501?

After:

    [2019-03-20 11:27:21] [DETAIL]
    could not connect to server: Connection refused
            Is the server running on host "localhost" (::1) and accepting
            TCP/IP connections on port 5501?
    could not connect to server: Connection refused
            Is the server running on host "localhost" (127.0.0.1) and accepting
            TCP/IP connections on port 5501?
2019-03-20 11:47:28 +09:00
Ian Barwick
426759ca8e check_primary_status(): handle case where recovery type unknown 2019-03-18 16:16:54 +09:00
Ian Barwick
39df55c39c Check node recovery type before attempting to write an event record
In some corner cases (e.g. immediately after a switchover) where
the current primary has not yet been determined, the provided connection
might not be writeable. This prevents error messages such as
"cannot execute INSERT in a read-only transaction" generating unnecessary
noise in the logs.
2019-03-18 15:26:16 +09:00
Ian Barwick
f54ff85cfa Remove outdated comment
This was only relevant for repmgr3 and earlier; in repmgr4 the schema
is hard-coded.
2019-03-18 15:19:11 +09:00
Ian Barwick
8ab51c2ae3 Refactor check_primary_status()
Reduce nested if/else branching, and improve documentation.
2019-03-18 15:01:21 +09:00
Ian Barwick
43f28f4097 Clarify calls to check_primary_status()
Use a constant rather than a magic number to indicate non-provision
of elapsed degraded monitoring time.
2019-03-18 14:21:34 +09:00
Ian Barwick
0940185f49 doc: clarify "cluster show" error codes 2019-03-18 10:49:38 +09:00
John Naylor
4f9fc56871 Fix assorted Makefile bugs
1. The target additional-maintainer-clean was misspelled as
maintainer-additional-clean.

2. Add add missing clean targets, in particular sysutils.o, config.h,
repmgr_version.h, and Makefile.global. While at it, use a wildcard
for obj files.

3. Don't delete configure.

4. Remove generated file doc/version.sgml from the repo.

5. Have maintainer-clean recurse to the doc directory.
2019-03-15 16:29:31 +09:00
Ian Barwick
fbdf9617fa doc: update repmgrd example output 2019-03-15 15:43:11 +09:00
Ian Barwick
dfb92df05f doc: miscellaenous cleanup 2019-03-15 14:39:37 +09:00
Ian Barwick
9dd87dd5ce doc: add explanation of the configuration file format 2019-03-15 14:02:42 +09:00
Ian Barwick
a2df69512a doc: update "connection_check_type" descriptions 2019-03-14 15:44:59 +09:00
Ian Barwick
c2206b007a repmgrd: optionally check upstream availability through connection attempts 2019-03-14 15:44:53 +09:00
John Naylor
e06d3de444 Correct some doc typos 2019-03-14 11:58:31 +08:00
Ian Barwick
9d056b2f72 doc: expand "standby_disconnect_on_failover" documentation 2019-03-14 12:08:13 +09:00
Ian Barwick
19bf4d7434 Count witness and zero-priority nodes in visibility check 2019-03-14 11:17:51 +09:00
Ian Barwick
56d9f5b856 Ensure witness node sets last upstream seen time 2019-03-14 10:53:47 +09:00
Ian Barwick
c1d6753081 doc: fix option name typo 2019-03-14 09:32:06 +09:00
Ian Barwick
2b59b4894a doc: expand "failover_validate_command" documentation 2019-03-13 21:10:03 +09:00
Ian Barwick
c3c58df7b9 repmgrd: improve logging output when executing "failover_validate_command" 2019-03-13 21:07:26 +09:00
Ian Barwick
0e2f3e563a doc: various updates 2019-03-13 16:55:32 +09:00
Ian Barwick
8c4421d110 doc: merge repmgrd witness server description into failover section 2019-03-13 16:12:17 +09:00
Ian Barwick
69cb3f1e82 doc: merge repmgrd split network handling description into failover section 2019-03-13 16:12:14 +09:00
Ian Barwick
960acfeb3c doc: merge repmgrd monitoring description into operating section 2019-03-13 16:12:11 +09:00
Ian Barwick
a8d50a5b98 doc: merge repmgrd degraded monitoring description into operation section 2019-03-13 16:12:06 +09:00
Ian Barwick
11e5993bf5 doc: merge repmgrd notes into operation documentation 2019-03-13 16:12:03 +09:00
Ian Barwick
09861a5604 doc: merge repmgrd pause documentation into overview 2019-03-13 16:11:59 +09:00
Ian Barwick
89bba77d4d doc: initial repmgrd doc refactoring 2019-03-13 16:11:55 +09:00
Ian Barwick
dd6ece326f doc: update repmgrd configuration documentation 2019-03-13 13:34:08 +09:00
Ian Barwick
573d027db6 repmgrd: various minor logging improvements 2019-03-13 11:27:17 +09:00
Ian Barwick
1afb41647b repmgrd: remove global variable
Make the "sibling_nodes" local, and pass by reference where relevant.
2019-03-12 17:12:23 +09:00
Ian Barwick
fc397f25f6 repmgrd: enable election rerun
If "failover_validation_command" is set, and the command returns an error,
rerun the election.

There is a pause between reruns to avoid "churn"; the length of this pause
is controlled by the configuration parameter "election_rerun_interval".
2019-03-12 17:12:19 +09:00
Ian Barwick
99923f5ffc Remove redundant struct allocation 2019-03-11 19:06:07 +09:00
Ian Barwick
b9cdcd55e7 doc: update list of reloadable repmgrd configuration options 2019-03-11 16:18:10 +09:00
Ian Barwick
db87ff46fd doc: document "failover_validation_command" 2019-03-11 15:02:33 +09:00
Ian Barwick
2a8f8d8400 doc: expand repmgrd configuration section 2019-03-11 14:50:33 +09:00
Ian Barwick
4ef706c2ca Execute "failover_validation_command" when only one standby exists 2019-03-08 12:19:37 +09:00
Ian Barwick
663c2e75b4 Make "failover_validation_command" reloadable 2019-03-08 09:27:19 +09:00
Ian Barwick
db0d71c6a7 Initial implementation of "failover_validation_command" 2019-03-08 08:49:15 +09:00
Ian Barwick
6f4f56dd8c Make recently added configuration options reloadable 2019-03-07 10:58:25 +09:00
Ian Barwick
33fefd9f52 Add configuration option "primary_visibility_consensus"
This determines whether repmgrd should continue with a failover if
one or more nodes report they can still see the standby.
2019-03-07 10:41:42 +09:00
Ian Barwick
a3f90d2bba Add configuration option "sibling_nodes_disconnect_timeout"
This controls the maximum length of time in seconds that repmgrd will
wait for other standbys to disconnect their WAL receivers in a failover
situation.

This setting is only used when "standby_disconnect_on_failover" is set to "true".
2019-03-06 15:56:21 +09:00
Ian Barwick
2ed044c358 Reset "wal_retrieve_retry_interval" for all nodes 2019-03-06 15:55:03 +09:00
Ian Barwick
9823978f41 repmgrd: don't wait for WAL receiver to reconnect during failover
If the WAL receiver has been temporarily disabled, we don't want to
wait for it to start up as it may not be able to at that point; we do
however need to reset "wal_retrieve_retry_interval".
2019-03-06 15:54:56 +09:00
Ian Barwick
ae8171e461 Improve logging/sanity checking for "node control" options 2019-03-06 15:54:30 +09:00
Ian Barwick
1f8f64d57c Improve logging when disabling/enabling WAL receiver
Also check action is being run on node which is in recovery.
2019-03-06 15:54:26 +09:00
Ian Barwick
13c650fa83 Check for WAL receiver start up 2019-03-06 15:54:23 +09:00
Ian Barwick
f85b4cd98e Log warning if "standby_disconnect_on_failover" used on pre-9.5
"standby_disconnect_on_failover" requires availability of "wal_retrieve_retry_interval",
which is available from PostgreSQL 9.5.

9.4 will fall out of community support this year, so it doesn't seem
productive at this point to do anything more than put the onus on the user
to read the documentation and heed any warning messages in the logs.
2019-03-06 15:54:15 +09:00
Ian Barwick
1615353f48 repmgrd: optionally disconnect WAL receivers during failover
This is intended to ensure that all nodes have a constant LSN while
making the failover decision.

This feature is experimental and needs to be explicitly enabled with the
configuration file option "standby_disconnect_on_failover".

Note enabling this option will result in a delay in the failover decision
until the WAL receiver is disconnected on all nodes.
2019-03-06 15:53:57 +09:00
Ian Barwick
dd04ebb809 repmgrd: handle reconnect to restarted server when using "connection" checks 2019-03-06 14:54:05 +09:00
Ian Barwick
b4dcda37a1 *_transaction() functions: log error message text as DETAIL
Per behaviour elsewhere.
2019-03-06 12:12:47 +09:00
Ian Barwick
63f7ad546e repmgrd: add option "connection_check_type"
This enable selection of the method repmgrd uses to check whether the upstream
node is available. Possible values are:

 - "ping" (default): uses PQping() to check server availability
 - "connection":  executes a query on the connection to check server
   availability (similar to repmgr3.x).
2019-03-06 12:09:54 +09:00
Ian Barwick
4f83111033 repmgrd: ignore invalid "upstream_last_seen" value 2019-03-05 11:00:29 +09:00
Ian Barwick
92103c5338 Use appendPQExpBufferStr where approrpriate 2019-03-01 16:42:00 +09:00
Ian Barwick
4b89cbd98d Rename "..._primary_last_seen" functions to "..._upstream_last_seen"
As that better reflects what they do.
2019-02-28 15:36:55 +09:00
Ian Barwick
0330fa6e62 daemon status: with csv output, show repmgrd status as unknown where appropriate
Previously, if PostgreSQL was not running on the node, repmgrd and
pause status were shown as "0", implying their status was known.

This brings the csv output in line with the human-readable output,
which displays "n/a" in this case.
2019-02-28 12:27:39 +09:00
Ian Barwick
4006f8af3c doc: upate release notes 2019-02-28 10:01:51 +09:00
Ian Barwick
b1875a8d91 Split command execution functions into separate library
These may need to be executed by repmgrd.
2019-02-27 14:41:17 +09:00
Ian Barwick
5c2264eb8d Update .gitignore
Ignore artefacts from failed patch application.
2019-02-27 13:02:30 +09:00
Ian Barwick
a6c16541c2 doc: tweak wording in event notification documentation 2019-02-27 13:01:19 +09:00
Ian Barwick
790a1cc492 repmgrd: add additional logging during a failover operation 2019-02-27 11:46:05 +09:00
Ian Barwick
067ed82931 Remove unneeded debugging output 2019-02-26 21:16:11 +09:00
Ian Barwick
59f32d74df doc: update introductory blurb 2019-02-26 15:16:46 +09:00
Ian Barwick
0578053875 standby clone: check upstream connections after data copy operation
With long-running copy operations, it's possible the connection(s) to
the primary/source server may go away for some reason, so recheck
their availability before attempting to reuse.
2019-02-26 14:37:05 +09:00
John Naylor
897e3bee14 Doc fix: PostgreSQL 9.4 is no longer considered recent 2019-02-24 12:44:10 +07:00
John Naylor
4e414d2ea0 Fix typo 2019-02-24 10:50:09 +07:00
Ian Barwick
ea36609159 Add some missing query error logging 2019-02-23 16:54:07 +09:00
Ian Barwick
0c68018631 repmgrd: log details of nodes which can see primary
If a failover is cancelled because other nodes can still see the primary,
log the identies of those nodes.
2019-02-23 15:55:06 +09:00
Ian Barwick
b72c894db4 repmgrd: during failover, check if other nodes have seen the primary
In a situation where only some standbys are cut off from the primary,
a failover would result in a split brain/split cluster situation,
as it's likely one of the cut-off standbys will promote itself, and
other cut-off standbys (but not all standbys) will follow it.

To prevent this happening, interrogate the other sibiling nodes to
check whether they've seen the primary within a reasonably short interval;
if this is the case, do not take any failover action.

This feature is experimental.
2019-02-23 13:03:22 +09:00
Ian Barwick
07097575b1 daemon status: add column "upstream last seen"
This displays the interval (in seconds) since the repmgrd instance on
each node last confirmed its upstream node is available.
2019-02-23 13:03:16 +09:00
Ian Barwick
71d151ca87 Don't check status of logical replication slots
We only want to check the status of physical replication slots
to determine whether a streaming replication standby has become
detached and there is therefore a risk of uncontrolled WAL buildup
on the local node.

It's not feasible to second-guess the state of logical replication
slots.
2019-02-23 10:09:43 +09:00
Ian Barwick
5abec2bb97 doc: clarify replication slot usage with Barman
Barman will usually use one replication slot, but that's generally
preferable to multiple slots.
2019-02-22 13:52:02 +09:00
Ian Barwick
de70fd42dc node check: simplify output generation in --is-shutdown-cleanly check 2019-02-22 10:49:06 +09:00
Ian Barwick
99550b91bd standby register: warn if standby is running and connection params provided
Addresses GitHub #552.
2019-02-22 10:31:00 +09:00
John Naylor
70190c37c4 Bring list of supported versions on the doc front page in line with the supported version matrix 2019-02-20 11:41:17 +07:00
Ian Barwick
f3fc4e5afb Minor syntax formatting tweak
For consistency.
2019-02-15 19:58:35 +09:00
Ian Barwick
629c552348 primary unregister: ensure correct behaviour when executed on a witness
Fixes GitHub #548.
2019-02-15 19:49:17 +09:00
Ian Barwick
85a97c933f Handle unhandled NodeStatus in switch statement 2019-02-15 19:31:06 +09:00
Ian Barwick
3a5a4388c7 cluster show: differentiate unreachable status
Differentiate between unreachable nodes and nodes which are running
but rejecting connections.
2019-02-15 16:01:55 +09:00
Ian Barwick
9338a9e233 Improve logging output
Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail lineImprove logging output

Avoid emitting blank detail line
2019-02-15 10:49:56 +09:00
Ian Barwick
7fad2ed2c8 standby switchover: improve error output
It wasn't clear why repmgr thinks the demotion candidate is not
the upstream of the promotion candidate.
2019-02-14 17:22:24 +09:00
Ian Barwick
9305953bd2 Fix history file parsing
Also add additional debugging output.
2019-02-14 15:52:40 +09:00
Ian Barwick
aeb9639ed9 node rejoin: add more log detail during rejoin success check
Stating what is actually being checked where might be useful
when diagnosing potential issues.
2019-02-13 15:29:39 +09:00
Ian Barwick
bc9e725d05 node rejoin: always emit detail about relative LSNs
Previously repmgr only emitted that if there was a timeline/LSN
mismatch, but it's useful to have confirmation of how it came
to the conclusion that rejoin will succeed.
2019-02-13 15:16:40 +09:00
Ian Barwick
905e108f8f doc: fix typos etc. in "standby follow" reference 2019-02-12 17:24:56 +09:00
Ian Barwick
f2362a06fa doc: update "standby switchover" reference 2019-02-12 16:39:13 +09:00
Ian Barwick
7b85cb9f12 doc: update "standby follow" reference
Add note about handling of timeline forks etc.
2019-02-12 16:39:06 +09:00
Ian Barwick
790bec21dd node rejoin: handle case where node to rejoin was primary
In that case the minRecoveryPoint* fields may be empty.
2019-02-12 13:31:25 +09:00
Ian Barwick
a0dc673439 "node rejoin": use minRecoveryPointTLI for comparing timelines 2019-02-12 13:31:21 +09:00
Ian Barwick
25019d1cc5 Refactor is_wal_replay_paused() query
Make sure it doesn't emit an error if executed on a node not
in recovery.

The caller should theoretically only execute it on nodes in
recovery, but there are sure to be corner cases where the node
has come out of recovery.
2019-02-12 10:21:05 +09:00
Ian Barwick
d00cb767a6 cluster show: don't try to run WAL replay pause query on unreachable node 2019-02-12 10:15:06 +09:00
Ian Barwick
8e0d28d8dc Fix "repmgr daemon --help" output
Per report from Shaun.
2019-02-12 09:20:29 +09:00
yonj1e
e146fb4fc3 Fix undeclared 'TRUE' error
GitHub #547.
2019-02-11 16:55:54 +09:00
Ian Barwick
8773543e10 doc: update "daemon (start|stop)" documentation
Clarify various aspects related to configuration.
2019-02-11 10:55:10 +09:00
Ian Barwick
a4cd4ee553 doc: fix quoting in "standby switchover" index entries 2019-02-11 10:34:02 +09:00
Ian Barwick
a61dd8a750 doc: tweak support text 2019-02-08 15:28:12 +09:00
Ian Barwick
2c84716e66 doc: add information about reporting issues etc.
Useful to have a linkable document listing the information required
to have a chance of troubleshooting issues.
2019-02-08 11:55:42 +09:00
Ian Barwick
f1667a7e98 repmgrd: don't consider nodes where repmgrd is not running
If, for whatever reason, repmgrd is not running on a node, but that
node qualifies as promotion candidate, failover will not take place
as that node will never promote itself.

We therefore discount nodes where repmgrd is running as promotion
candidates, which will ensure one node is always promoted.

There is a slight risk here that the node(s) where repmgrd is not running
are further ahead, leading to a timeline fork. It might be possible
to mitigate that by having the "election" leader perform the promote
(or follow) operation.
2019-02-07 17:07:13 +09:00
Ian Barwick
b91900f831 doc: clarify "repmgr daemon status" CSV output 2019-02-07 14:55:42 +09:00
Ian Barwick
aa1e64ec11 Warn about redundant use of --compact option 2019-02-07 14:35:30 +09:00
Ian Barwick
5d6037303b "daemon status": display node priority
GitHub #541.
2019-02-07 14:35:24 +09:00
Ian Barwick
8aaf6571a0 "cluster show": display node priority
GitHUb #541.
2019-02-07 14:35:21 +09:00
Ian Barwick
9433f80364 "cluster show": warn about nodes with paused WAL replay
We do this in "repmgr daemon status" already, so do it here too for consistency.

Related to GitHub #540.
2019-02-07 13:48:46 +09:00
Ian Barwick
aee13aee52 doc: note repmgrd behaviour when WAL replay is paused
Related to GitHub #540.
2019-02-07 13:28:29 +09:00
Ian Barwick
f0a0be0248 Remove pointless default allocation in _get_node_record() 2019-02-07 11:41:08 +09:00
Ian Barwick
c4332d9a52 repmgrd: forcibly resume WAL replay if paused
If WAL replay is paused, and there is WAL pending replay, a promote command
will be queued until replay is resumed.

As it's conceivable that there are corner cases where one standby with
replay paused has actually received the most WAL, we'll forcibly
resume WAL replay so it can be reliably promoted, if needed.

Related to GitHub #540.
2019-02-07 11:39:48 +09:00
Ian Barwick
c7b325e2a4 Add function resume_wal_replay() 2019-02-07 11:33:02 +09:00
Ian Barwick
b89941f218 Store WAL replay pause status in ReplInfo struct 2019-02-07 10:24:42 +09:00
Ian Barwick
2b3b1faa20 refactor query in function get_replication_info()
In particular handle all cases where one of the functions called
in the query can return NULL in the query itself.
2019-02-06 15:40:27 +09:00
Ian Barwick
b9cd321aed repmgrd: skip LSN checks of 0 priority node
The node will never become a candidate so we can save the round trip
to fetch its LSN.
2019-02-06 14:27:01 +09:00
Ian Barwick
984ce7420b "daemon status": emit warning if WAL replay is paused
Specifically, if WAL replay is paused *and* WAL is pending replay,
this node cannot be promoted until WAL replay is unpaused. In this
state it is not a suitable promotion candidate in a failover situation.
2019-02-06 13:32:20 +09:00
Ian Barwick
464ec6bec3 Ensure conninfo param list is initialized for --recovery-conf-only option 2019-02-06 12:58:09 +09:00
Ian Barwick
3bbbf6daa9 "recovery_file_path" is MAXPGPATH 2019-02-06 10:42:09 +09:00
Ian Barwick
cd3312496e Rename functions which return an LSN for clarity 2019-02-06 09:32:53 +09:00
Ian Barwick
cce8b76171 "standby switchover": abort if promotion candidate has WAL replay paused
If replay is paused, we can't be really sure that more WAL will be received
between the check and the promote operation, which would risk the promote
operation not taking place during the switchover (it would happen
as soon as WAL replay is resumed and pending WAL is replayed).

Therefore we simply quit with an informative slew of messages and
leave the user to sort it out.

GitHub #540.
2019-02-05 16:32:39 +09:00
Ian Barwick
2a529e7e8b "standby promote": don't promote if replay paused and in archive recovery
It does not appear feasible to predict if there is still WAL waiting to
be replayed from archive. In this case take no action.

GitHub #540.
2019-02-05 14:39:08 +09:00
Ian Barwick
f62b3b2868 Fix Pg10+ function names 2019-02-05 13:37:35 +09:00
Ian Barwick
701944c194 "standby promote": add check for WAL replay status if replay is paused
If WAL replay is paused but WAL is still pending replay, PostgreSQL will ignore
the promote request until WAL replay is unpaused. This may lead to the standby
being promoted at an unpredictable point in time outside of repmgr's
control. Moreover it may not be obvious that this is happening, or why, and
it will appear that an apparently successful promotion attempt has not
actually worked.

To prevent this from happening, repmgr will now refuse to promote the
standy if WAL replay is paused *and* WAL is still pending replay.

GitHub #540.
2019-02-05 13:30:37 +09:00
Ian Barwick
d8048060a2 doc: rephrase exit code preamble
Previously it kind of implied more than one code can be emitted.
2019-02-05 11:06:26 +09:00
Ian Barwick
31f25856a2 doc: update "repmgr node rejoin" reference 2019-02-05 10:57:23 +09:00
Ian Barwick
92c73b68a0 Clean up dbutils.c
Put functions into the same "section" as noted in the header file.
2019-02-05 09:36:54 +09:00
Ian Barwick
90909e2e42 doc: update source install instructions
Note packages required to compile if the package "build dep"
option is not viable.
2019-02-04 17:09:11 +09:00
Martín Marqués
b036870c83 doc: fix typo in the release notes for 4.3
GitHub #539

Signed-off-by: Martín Marqués <martin.marques@2ndquadrant.com>
2019-02-04 16:39:58 +09:00
Ian Barwick
321eb844e4 doc: update Debian/Ubuntu repmgrd configuration
Remove reference to setting "repmgrd_pid_file", as this should not
be set in this context.

Per GitHub #517.
2019-02-04 16:11:13 +09:00
Ian Barwick
2c9700586c repmgr: "witness register" - check connection is to primary node
Previously, if the witness server connection details were provided
to "repmgr witness register" rather than those of the primary server,
repmgr a) write the node record to the witness server rather than
the primary, and b) would loop indefinitely trying to copy the
node table to itself.

Addresses GitHub #538.
2019-02-04 14:45:32 +09:00
Ian Barwick
f9a1861ded Refactor ReplInfo struct handling
Eventually we'll want to have this contain the optional replication
info contained in the t_node_info struct, which should then contain a
pointer to a ReplInfo struct.
2019-02-02 18:39:24 +09:00
Ian Barwick
59ed86c01a "cluster show": fix formatting with multiple digit node IDs 2019-02-02 14:07:49 +09:00
Ian Barwick
f24b30327c Add missing "daemon (start|stop)" options to main help output 2019-02-02 13:11:31 +09:00
Ian Barwick
48381a5b4e Use --compact option for abbreviated display output
--terse is meant for reducing log chatter.
2019-02-02 13:06:59 +09:00
Ian Barwick
20b79f998c Define some previously magic numbers 2019-02-01 19:14:16 +09:00
Ian Barwick
a41e7bb726 doc: various minor updates 2019-02-01 17:24:32 +09:00
Ian Barwick
b9ba97a36d "standby switchover": check replication connection to upstream
Ensure repmgr checks the standby (promotion candidate) is currently
attached to the primary (demotion candidate).

Addresses issue reported in GitHub #519.
2019-02-01 15:28:06 +09:00
Ian Barwick
d8aa472c5f doc: fix URL typo 2019-02-01 13:13:11 +09:00
Ian Barwick
9273e7af73 "standby switchover": avoid potential race condition with WAL location check
Immediately after the demotion candidate (primary) has shut down, we can't
be absolutely sure that the walreceiver has flushed all WAL to disk, so
checking pg_last_wal_receive_lsn() at that point might not reflect
the actual last available WAL location.

To handle this, we'll loop for a while (timeout controlled by configuration
parameter "wal_receive_check_timeout") before finally deciding whether
the standby is still behind the shut-down primary.

Addresses issue raised in GitHub #518.
2019-02-01 12:06:22 +09:00
Ian Barwick
f04f2af8aa Add missing include files
Per compiler griping on OS X.
2019-01-31 16:10:48 +09:00
Ian Barwick
bdb4f66a9d Add an Assert() to detect attempted array overflow in param_set...() functions
Previously the code would do nothing if an attempt was made to add parameters
if the array is already full.

As the array is designed to contain all valid libpq connection parameters,
there's no reason it should ever "overflow" like this. If there is, then
it means the caller is attempting to add invalid values. Add an Assert()
so we can easily detect this in the unlikely event it ever occurs.

Noted after examining the issue raised in GitHub #533, which is nonsensical
as it implies we'd be OK with writing beyond the end of the array, however
it doesn't hurt to make it a bit clearer what is happening and why.
2019-01-31 14:11:00 +09:00
Ian Barwick
c402b08791 doc: update "node rejoin" page
Add some notes about situations where node rejoin cannot work, and
pg_rewind usage.
2019-01-31 12:25:58 +09:00
Ian Barwick
64bb034d34 "node rejoin": catch corner case where repmgr metadata is outdated
If the rejoin target is not in recovery, but not registered as primary
(we detect this by attempting to connect to the registered primary)
we abort and suggest fixing the repmgr metadata first.
2019-01-31 11:54:05 +09:00
Ian Barwick
ea54aaa290 Use "rejoin target" instead of "follow target" in "node rejoin" log output 2019-01-31 11:32:38 +09:00
Ian Barwick
b34c331eba "node rejoin": fail if rejoin target has same timeline and lower LSN
pg_rewind will not resolve this situation.
2019-01-31 11:15:55 +09:00
Ian Barwick
19e0b6a1b6 doc: update "node rejoin" documentation
In particular, update examples to reflect changed output in repmgr 4.3.
2019-01-31 10:49:39 +09:00
Ian Barwick
9349171b55 doc: document "node_rejoin_timeout" for switchover operations 2019-01-30 15:43:34 +09:00
Ian Barwick
d4ee4cc14c "daemon stop": be careful with hints about "daemon status"
If PostgreSQL is not running, "repmgr daemon status" can't be executed.
2019-01-30 14:49:43 +09:00
Ian Barwick
d7420d7274 daemon (start|stop): verify that repmgrd starts/stops.
Note this may not always be possible for "daemon stop" if we are unable
to determine the repmgrd PID.
2019-01-30 14:41:31 +09:00
Ian Barwick
70e4243a1d Clean up calls to repmgr_atoi()
In some places we were still providing "false" from the original implementation,
which was intended to indicate whether a negative value was allowed.

This has not been a problem, as it merely means we have been providing "0",
which is the same thing; however we can finer-tune some of the calls
(e.g. node ID must be or greater).
2019-01-30 11:43:43 +09:00
Ian Barwick
b6264b77c4 repmgr: mandate explicit configuration for "daemon (start|stop)"
The initial implementation was designed to fall back to "manual"
start/stop of repmgrd if the "repmgrd_service_..._command" parameters
were not set.

However on reflection, this is too much of a potential footgun, so
we will mandate provision of these parameters.
2019-01-30 10:57:06 +09:00
Ian Barwick
9e7cb6d01c doc: make it easier to find info about installation of old packages 2019-01-30 09:45:08 +09:00
Ian Barwick
0435bda115 Fix string comparison 2019-01-29 20:42:33 +09:00
Ian Barwick
a5aa47c1dd daemon start/stop: add warning about missing configuration
repmgr will attempt to construct appropriate commands to start
and stop repmgrd, but usually it's preferable for them to be explicitly
defined, particularly if repmgr is installed from packages.
2019-01-29 14:08:00 +09:00
Ian Barwick
7654dd615b Finalize "daemon (start|stop)" commands
Implements GitHub #528.
2019-01-29 13:16:11 +09:00
Ian Barwick
c83e9870fe doc: update "repmgr daemon (start|stop)" documentation 2019-01-29 13:01:36 +09:00
Ian Barwick
8b13d14294 "daemon stop": initial implementation 2019-01-29 13:01:23 +09:00
Ian Barwick
ba13172b3a Add initial "repmgr daemon (start|stop)" documentation 2019-01-29 13:01:19 +09:00
Ian Barwick
32b81e7d49 "daemon start": initial implementation 2019-01-29 13:01:14 +09:00
Ian Barwick
cbfef17a1d Fix check of --no-wait option 2019-01-29 12:29:05 +09:00
Ian Barwick
a48d408e4e Consistently log strerror output as DETAIL 2019-01-29 12:10:55 +09:00
Ian Barwick
e5f50e7b99 doc: add additional index entries for repmgrd 2019-01-28 09:52:51 +09:00
Ian Barwick
aeea02b598 doc: update "standby follow" error codes 2019-01-24 10:43:38 +09:00
Ian Barwick
59eca2be30 node rejoin: improve error code handling
- return ERR_REJOIN_FAIL in all cases where the rejoin operation fails
 - ensure ERR_FOLLOW_FAIL is not returned
 - document error codes
2019-01-24 10:31:45 +09:00
Ian Barwick
dfe57d2406 "node rejoin": log pg_rewind command as DETAIL rather than DEBUG 2019-01-23 17:15:07 +09:00
Ian Barwick
061932d023 "node rejoin": verify status of rejoin target
This adapts the code previously added to "standby follow" to verify
whether the rejoin target can actually be rejoined.
2019-01-23 17:08:55 +09:00
Ian Barwick
3f5762e03a Refactor upstream attachment check code
Move it from the "standby follow" code to an independent function so it can
be used in other contexts, e.g. "node rejoin".
2019-01-23 15:11:42 +09:00
Ian Barwick
42fa9a2a88 Log node rejoin failure as ERROR 2019-01-23 13:55:40 +09:00
Ian Barwick
f23065e041 Fix typo in log message 2019-01-23 13:53:29 +09:00
Ian Barwick
efe4a9c344 repmgrd: log receipt of SIGINT/SIGTERM 2019-01-23 13:44:59 +09:00
Ian Barwick
0970789b1d doc: improve package install instructions
Including:
- additional clarification for Pg 9.x RPM package names
- consistent usage of sudo
2019-01-23 12:55:06 +09:00
Ian Barwick
07b79286b5 doc: clarify use-cases for pausing repmgrd 2019-01-23 12:33:57 +09:00
Ian Barwick
c3d284e097 doc: better document relevant PostgreSQL settings
There is a brief section in the Quickstart Guide, but it is hard to find
unless you know it is there.
2019-01-23 12:25:02 +09:00
Ian Barwick
a9e09d436a doc: use "/current/" in URL path
From the next major release, the current documentation will be located in the
"/docs/current/" subdirectory. This makes it easier to provide canonical
links to the latest version of the documentation (similar to how  the
main PostgreSQL documentation is organised).

Once the following major release is available, the documentation will be moved
to a subdirectory with the version number, e.g. "/docs/4.3/".
2019-01-23 10:28:48 +09:00
Ian Barwick
965984a510 doc: update internal documentation links 2019-01-23 10:18:31 +09:00
Ian Barwick
1980deb480 repmgrd: check for a change to the upstream node
If the upstream node has changed, for example after "repmgr standby follow"
was manually executed, restart monitoring to ensure repmgrd is monitoring the
correct node.
2019-01-22 13:33:13 +09:00
Ian Barwick
b6fe91ebcd repmgrd: track status of local (standby) node
If the local node is not available, note the degraded monitoring status.
2019-01-22 10:36:22 +09:00
Ian Barwick
44cbb44500 repmgrd: improve logging output for standby monitoring 2019-01-22 10:36:14 +09:00
Abhijit Menon-Sen
99161c38d2 Fix typo 2019-01-21 17:37:01 +05:30
Ian Barwick
57d3ee768c doc: clarify data directory requirement in quickstart guide 2019-01-21 15:12:33 +09:00
Ian Barwick
7dce3ed234 Update copyright notices to 2019 2019-01-21 14:54:35 +09:00
Ian Barwick
58efb0f158 repmgrd: on a cascaded standby, don't fail over if "failover=manual"
Addresses GitHub #531.
2019-01-21 14:16:49 +09:00
Ian Barwick
d261768541 Standardize on --host option 2019-01-17 10:52:41 +09:00
Ian Barwick
aa8547a219 Improve "witness register" documentation, help and logging
Make it clearer that a) the primary server's hostname is required,
and b) how to provide it.

Based on feedback provided in GitHub #529.
2019-01-17 10:42:53 +09:00
Fabio Pardi
9f04a846ec doc: command to unpause should be 'unpause'
GitHub #530.
2019-01-17 10:13:22 +09:00
Ian Barwick
ff0e480fdd Ensure functions in dirutil.c do not directly modify the provided path 2019-01-16 17:24:31 +09:00
Ian Barwick
8881b69c06 "standby switchover": check remote data directory configuration
The switchover will fail if the data_directory parameter in repmgr.conf
on the remote node (demotion candidate) is incorrectly configured.
We use the previously added "repmgr node check --data-directory-config
to verify this, and abort early if an issue is discovered.

Implements GitHub #523.
2019-01-16 16:03:49 +09:00
Ian Barwick
0b3a310802 Add --data-directory-config option to "repmgr node check"
Implements part of GitHub #523.
2019-01-16 16:03:44 +09:00
Ian Barwick
4523137bfc doc: note "pg_read_all_settings" in FAQ
Relevant for PostgreSQL 10 and later where the repmgr user is not a superuser.
2019-01-16 11:27:35 +09:00
Ian Barwick
666f5cf851 doc: add FAQ entry clarifying why "data_directory" is required in repmgr.conf 2019-01-16 09:50:29 +09:00
Fabio Pardi
e89938e132 doc: add missing space after varname
GitHub #526
2019-01-16 09:37:58 +09:00
Ian Barwick
d97905f6fd doc: fix typos and update version example 2019-01-15 12:56:18 +09:00
Ian Barwick
bed66edfd9 doc: clarify Debian source install instructions 2019-01-15 12:52:30 +09:00
Ian Barwick
ba7ef9e643 doc: update PostgreSQL documentation links
"/static/" path element no longer required.
2019-01-15 12:45:33 +09:00
Ian Barwick
10be941298 Fix typo
"node join" should be "node rejoin"
2019-01-14 15:39:13 +09:00
Ian Barwick
75379eab2e doc: update "repmgr standby follow" documentation
Note corner case where repmgr will not be able to check for timeline
divergence.
2019-01-14 13:53:52 +09:00
Ian Barwick
d4e993a240 Improve handling of connection URIs when executing remote commands
Previously, if connection URIs were in use and "repmgr standby switchover"
was executed, repmgr would pass the connection URI as-is to the demotion
candidate to execute "repmgr node rejoin". However the presence of
unescaped ampersands in the connection URI was causing the rejoin command
to be incorrectly executed.

Addresses GitHub #525.
2019-01-14 11:11:51 +09:00
Ian Barwick
695a45f9ed Fix regression test
get_new_primary() output has changed.
2019-01-14 10:04:20 +09:00
Ian Barwick
028c874f81 "standby follow": simplify check when follow target has higher timeline
No need for a CHECKPOINT here, which simplifies things considerably.
2019-01-11 16:34:04 +09:00
Ian Barwick
b3c2831bd3 repmgr: add --dry-run option to "standby promote"
Implements GitHub #522.
2019-01-10 12:36:58 +09:00
Ian Barwick
e191a32eac "standby follow": update documentation 2019-01-09 16:22:45 +09:00
Ian Barwick
c66c8ebc98 repmgr: add --terse mode to "cluster show"
This suppresses display of the usually lengthy "conninfo" column, mainly
useful for generating a compact table suitable for pasting into emails,
chats etc. without messy line breaks.

Implements GitHub #521.
2019-01-09 10:06:37 +09:00
Ian Barwick
3389491151 Misc comment and log output corrections 2019-01-09 09:41:59 +09:00
Ian Barwick
81eb9d99e7 Add missing comma 2019-01-08 11:44:32 +09:00
Ian Barwick
1156f27979 Fix "repmgr --help" output
Add missing references to "witness" and "daemon" actions.
2019-01-08 10:11:31 +09:00
Ian Barwick
b5b9aacc8a Add command line option "repmgr --version-number"
Outputs the raw version number.

Intended for use by scripts etc.
2019-01-08 10:08:23 +09:00
Ian Barwick
b89b3c0961 Fix "repmgr cluster cleanup" help output
Table name mentioned was incorrect.
2019-01-08 09:49:43 +09:00
Ian Barwick
9cf5bf3f93 Note primary/standby aliases for "node check" and "node status" actions
Add comment noting the intent behind those code sections, otherwise it
looks like a copy'n'paste error.

This currently isn't documented.
2019-01-08 09:26:37 +09:00
Ian Barwick
9a5bd0d489 Update comment listing valid actions 2019-01-08 09:16:51 +09:00
Ian Barwick
40408a1734 repmgrd: check binary and extension major versions match
repmgr requires that the same "major version" (e.g. 4.3) is present
on all nodes, otherwise - particularly in the case of repmgrd - it's
highly likely things won't work as expected.

Implements part of GitHub #515.
2019-01-07 15:39:40 +09:00
Ian Barwick
40410e43ab doc: update FAQ
Make it clear 3.x is no longer maintained.
2019-01-07 12:34:43 +09:00
Ian Barwick
3c25d5a03a doc: update FAQ
Add link to repmgr compatibility matrix.
2019-01-07 12:22:52 +09:00
Ian Barwick
7e21ceb158 doc: note importance of installing same repmgr versions 2019-01-07 12:18:17 +09:00
Ian Barwick
313aa3c5d7 Refactor follow verification to reduce need for CHECKPOINT
A CHECKPOINT is not always required; hopefully we can narrow it down
to one corner case where we need to determine the minium recovery
location.

Also get local timeline ID via IDENTIFY_SYSTEM, as fetching it from
pg_control risks returning the prior timeline ID if the timeline
switch has just taken place and no restart point has yet occurred.
2018-12-04 15:27:22 +09:00
Ian Barwick
10d46f7e85 Fix variable name typo 2018-12-04 10:22:23 +09:00
Ian Barwick
9e90fcd584 "standby follow": verify status of follow target
This commit adds infrastruture for repmgr to be able to check
whether one standby can attach to another node, regardless whether
it is a standby or a primary.

This is intended to prevent a node from attempting to follow a
node whose timeline has diverged. The --dry-run option makes
it possible to test a follow operation before it is carried out.

As a useful side-effect this makes it possible for a standby to
follow another standby.

This is an initial implementation; documentation and possibly
further changes to follow.
2018-11-29 17:14:38 +09:00
Ian Barwick
c53782cda3 Fix typo in query 2018-11-29 15:24:49 +09:00
Ian Barwick
66b40ffc68 Simplify function create_replication_slot()
Following the changes in 793d83b, it's no longer necessary to
pass the server version number.
2018-11-29 14:35:01 +09:00
Ian Barwick
a6a2be2239 Teach witness repmgrd to deal with the absence of a primary
Previously it would refuse to start if the primary was not reachable,
the thinking being that it's pointless trying to monitor an incomplete
cluster.

However following an aborted failover situation, repmgrd will restart
monitoring and on the witness server, this will lead to it aborting
itself due to to continuing absence of primary.

To resolve this, witness repmgrd will now start monitoring in degraded
mode if no primary is found in the hope a primary will reappear at
some point.
2018-11-29 12:15:41 +09:00
Ian Barwick
bdcc4d9e83 Check correct result status in ...primary_last_seen() functions 2018-11-29 11:08:28 +09:00
Ian Barwick
9f587efb74 doc: update HISTORY 2018-11-29 10:34:28 +09:00
Ian Barwick
2aacd29e60 "witness register": don't try and read nodes table if it doesn't exist
Previously, "repmgr witness register --dry-run" would attempt to check
for records in the nodes table, but that might not exist yet. Skip
that check if the repmgr extension is not yet installed.

Implements GitHub #513.
2018-11-28 15:06:20 +09:00
Ian Barwick
311f7e561e "standby switchover": use empheral witness server connection
Intended to prevent issue reported in GitHub #514.
2018-11-28 14:29:41 +09:00
Ian Barwick
b498db87aa Remove redundant function declaration 2018-11-28 13:51:14 +09:00
Ian Barwick
74c44a7178 doc: document "repmgr node service"
This was originally intended for internal use, but it's mentioned
several times in the documentation and is useful for diagnostic
purposes.
2018-11-28 12:58:07 +09:00
Ian Barwick
5ff3744895 Create function get_pg_version() to read PG_VERSION
With the recovery configuration changes in PostgreSQL 12, there will
be situations where we'll need to determine the version number from
a dormant data directory in order to determine whether to write
recovery.conf or not.
2018-11-27 09:39:56 +09:00
Ian Barwick
793d83b22c Refactor server version detection
Most of the time we can simply get the version number directly from
the connection handle. Previously it was held in a global variable,
which was an icky way of doing things.

In a few special cases we also need the actual version string, which
is obtained directly from the database.
2018-11-22 21:30:31 +09:00
Ian Barwick
0f4e04e61e Add function get_current_lsn()
This is a somewhat convoluted attempt to retrieve the current LSN
of any node, regardless of whether in recovery or not, and if in
recovery, independent of whether streaming or recovering from
archive.
2018-11-22 19:31:49 +09:00
Ian Barwick
80a280cbf4 Add function get_timeline_history()
This will be required for verifying whether one node is able to
follow another node.
2018-11-22 15:26:50 +09:00
Ian Barwick
b223cb4cee standby follow: improve handling of --upstream-node-id 2018-11-22 11:16:44 +09:00
Ian Barwick
9d1f5c0de3 Update 4.2 - 4.3 extension upgrade script 2018-11-21 12:39:27 +09:00
Ian Barwick
784c9c4793 repmgrd: return predictable default values for get_primary_last_seen()
Return 0 if the node is not in recovery. In which case it's probably
rather pointless calling this function anyway.

Return -1 if the "last_seen" field has never been set (i.e. repmgrd
hasn't started yet).
2018-11-21 11:30:32 +09:00
Ian Barwick
0caec90d81 repmgrd: set primary last seen 2018-11-21 11:30:27 +09:00
Ian Barwick
1458f6e6aa add functions to determine when primary last seen by repmgrd node 2018-11-21 11:30:22 +09:00
Ian Barwick
a2d38c6084 doc: clarify "repmgr standby clone --recovery-conf-only" option
Make it clearer that the standby needs to have been cloned by whatever
method before running the command.
2018-11-20 10:19:53 +09:00
Ian Barwick
5f1bf0fb8f Bump master branch to 4.3dev 2018-11-16 12:50:04 +09:00
Ian Barwick
7d99b96717 Update/correct comments in controldata code 2018-11-14 09:52:52 +09:00
Ian Barwick
3b10750a7f doc: fix missing quotation marks
Patch from Cédric Villemain
2018-11-12 10:22:07 +09:00
Ian Barwick
af0a60b8eb doc: remove redundant warning
No longer relevant for 4.2 and later.
2018-11-12 09:38:11 +09:00
Ian Barwick
b419c5fec7 doc: update FAQ
Emphasize that repmgr does not actually perform replication.
2018-11-05 10:06:43 +09:00
Ian Barwick
2cfcc33a64 doc: add version compatibility matrix 2018-11-05 09:54:13 +09:00
Ian Barwick
273db444b2 doc: clarify replication slot FAQ entry 2018-10-31 16:20:15 +09:00
Ian Barwick
2bf3eeb931 doc: update FAQ
Emphasize that repmgr does not actually perform replication.
2018-10-31 11:56:41 +09:00
Ian Barwick
c3bc5585d9 Add sanity check for extension version
This should cover the cases where the "repmgr" extension was installed
manually but not updated, or an upgrade was not fully completed.
2018-10-31 11:16:36 +09:00
Ian Barwick
b84f217710 doc: note repmgr extension can be installed manually 2018-10-31 10:27:37 +09:00
Ian Barwick
90c49c0c28 doc: consolidate descriptions of SSH connectivity requirements 2018-10-31 10:14:03 +09:00
Ian Barwick
41c1550788 doc: clarify network and software prerequisites 2018-10-31 10:01:18 +09:00
Ian Barwick
c336e384ab Support "pg_promote()" function (PostgreSQL 12 and later)
This is an experimental feature.
2018-10-26 11:02:45 +09:00
Ian Barwick
bc1956dee9 Formatting standardization 2018-10-26 10:42:13 +09:00
Ian Barwick
a459c60145 Avoid defining variable-length arrays
As of PostgreSQL commit d9dd406f, variable length arrays are no longer
permitted. As they're not actually required anyway, just define appropriate
constants.

Also noted in GitHub #510.
2018-10-26 10:09:45 +09:00
Ian Barwick
65721bbbcd doc: update README 2018-10-24 15:24:04 +09:00
Ian Barwick
96895ba8a8 doc: update 4.2 release notes 2018-10-24 15:24:00 +09:00
Ian Barwick
e0d6d906e7 repmgrd: fix upstream role check
Only take action if it's confirmed as a standby.
2018-10-23 12:47:55 +09:00
Ian Barwick
dc8ffd30c6 "standby switchover": close all connections used to check repmgrd status
The connections used to check repmgrd status on all nodes were not being
closed if repmgrd was not running. Normally this wouldn't be a huge
problem as they will go away when repmgr terminates or the PostgreSQL
server restarted. However, if shutdown mode is "smart", the open
connection on the demotion candidate will cause the shutdown operation
to fail until repmgr times out.
2018-10-23 11:05:28 +09:00
Ian Barwick
24392fa11b doc: fix typos 2018-10-23 09:21:00 +09:00
Ian Barwick
06b5239ada doc: fix typo
Per user report on mailing list.
2018-10-23 08:59:30 +09:00
Ian Barwick
56173d94a9 Fix Makefile for VPATH builds under PostgreSQL 11 2018-10-22 16:38:18 +09:00
Ian Barwick
578f11003c repmgrd: improve node role change detection 2018-10-19 11:25:11 +09:00
Ian Barwick
36bd7cdc9f Speed up witness "failover" during a switchover 2018-10-18 17:26:29 +09:00
Ian Barwick
62ac56c3f5 repmgrd: handle case where upstream is no longer primary
If the upstream comes back on line (e.g. after a switchover), and its
status is no longer primary, restart monitoring to ensure the correct
primary (potentially the current node) is being monitored.
2018-10-18 16:50:13 +09:00
Ian Barwick
c79852cce0 Ensure witness repmgrd detects change in upstream's role
This ensures that e.g. after a switchover, repmgrd running on a witness
node will automatically detect the new primary and monitor that.
2018-10-18 16:15:46 +09:00
Ian Barwick
3907a545b0 repmgrd: ensure witness node doesn't try and follow another witness
Theoretically there should never be more than one witness node
visible here, but it's not impossible to rule it out, so add a
check just in case.
2018-10-18 12:17:06 +09:00
Ian Barwick
d1d057a184 doc: improve upgrade instructions
Note requirement to execute "systemctl daemon-reload" for systemd
systems...
2018-10-17 17:07:52 +09:00
Ian Barwick
b70e3b48c8 doc: improve upgrade instructions 2018-10-17 14:32:38 +09:00
Ian Barwick
ab6c3d9b6e Handle NULL strings when parsing boolean arguments 2018-10-17 11:47:32 +09:00
Ian Barwick
6999dbb52a Doc: update HISTORY and 4.2 release notes 2018-10-17 11:47:28 +09:00
Ian Barwick
b2348c9a70 repmgrd: improve promotion script failure handling
While scanning for a new primary following a promotion script failure,
repmgrd was treating a witness server as a potential new primary
and would attempt to "follow" it. Fortunately "repmgr standby follow"
would do the right thing and choose the actual primary, if available,
otherwise do nothing, so the cluster would eventually end up in the
correct state, albeit for the wrong reason.

By skipping the witness server as a potential new primary,
repmgrd will do the right thing if the original primary does come
back online, i.e. resume monitoring as before.
2018-10-16 11:42:54 +09:00
Ian Barwick
7b26180ebb doc: update upgrade instructions 2018-10-16 09:44:49 +09:00
Ian Barwick
d70a5250ab doc: update upgrade instructions 2018-10-11 14:57:49 +09:00
Abhijit Menon-Sen
024accfbba Merge pull request #508 from gilou/docfix
Missing comma in sudoers example
2018-10-10 22:00:43 +05:30
Gilles Pietri
55c967fd14 Missing comma in sudoers example 2018-10-10 17:07:36 +02:00
Ian Barwick
c1edb896df Move repmgrd pid functions to 4.1 → 4.2 upgrade file 2018-10-10 10:12:39 +09:00
Ian Barwick
fd66d93937 Fix LWLockRelease() call in unset_bdr_failover_handler() 2018-10-08 09:36:50 +09:00
Ian Barwick
40e94635b2 doc: fix typo in repmgr.conf.sample 2018-10-08 09:36:28 +09:00
Ian Barwick
9ad41bfb0f doc: expand upgrade section 2018-10-05 17:45:57 +09:00
Ian Barwick
35c156ce7e Update 4.1 → 4.2 upgrade script 2018-10-05 12:15:18 +09:00
Ian Barwick
85f27ff559 doc: note repmgr's default pg_basebackup options 2018-10-04 13:13:28 +09:00
Ian Barwick
ad03885b72 repmgrd: fix parsing of -d/--daemonize option
The getopt API doesn't cope well with optional arguments to short form options,
e.g. "-o foo", so we need to check the next argument value to see whether it looks
like an option or an actual argument value.
2018-10-04 11:48:54 +09:00
Ian Barwick
3e38759c02 use appendPQExpBufferStr/-Char() consistently 2018-10-04 08:42:42 +09:00
Ian Barwick
15a5d2ee9d "repmgr standby": use appendPQExpBufferStr/-Char() consistently 2018-10-03 17:31:12 +09:00
Ian Barwick
61c91df332 "repmgr node": use appendPQExpBufferStr/-Char() where appropriate 2018-10-03 14:09:29 +09:00
Ian Barwick
b346914d4d repmgr: fix "Missing replication slots" label in "node check"
Per report in GitHub #507.
2018-10-03 13:53:52 +09:00
Ian Barwick
ac40ef0e43 doc: add additional index entries for package information 2018-10-03 11:59:42 +09:00
Ian Barwick
eebf07549f doc: update repmgrd configuration for Debian/Ubuntu 2018-10-03 11:59:27 +09:00
Ian Barwick
a40fd60cb5 repmgrd: fix parsing of -d/--daemonize option 2018-10-03 11:36:38 +09:00
Ian Barwick
bd24848ce9 doc: add tip about setting "ConnectTimeout" for SSH 2018-10-03 10:16:47 +09:00
Ian Barwick
7ab81e10de Log SSH errors when running "repmgr cluster (matrix|crosscheck)"
Previously repmgr would abort with an unhelpful message about being
unable to parse CSV output.

With this commit, it will continue running, and display a list of
inaccessible nodes as an addendum to the main output (unless --csv
or --terse options are specified).

Addresses GitHub #246.
2018-10-03 10:12:18 +09:00
Ian Barwick
455a0bd93f Use make_remote_repmgr_path() in place of make_repmgr_path()
Also we can now simplify "cluster (matrix|crosscheck)" commands as
beginning with v4.0, we know where the configuration file is, so can
provide that when invoking repmgr remotely.
2018-10-02 09:59:18 +09:00
Ian Barwick
11d25e2aef Add configuration parameter "repmgr_bindir"
This is to facilitate remote invocation of repmgr when the repmgr
binary is located somewhere other than the PostgreSQL binary directory, as it
cannot be assumed all package maintainers will install repmgr there.

This parameter is optional; if not set (the default), repmgr will fall back
to "pg_bindir" (if set).

Addresses GitHub #246.
2018-10-02 09:59:12 +09:00
Ian Barwick
b14fbbdc72 Add "repmgr daemon ..." options to main help output 2018-09-27 19:07:59 +09:00
Ian Barwick
2491b8ae52 Add functionality to "pause" repmgrd
In some circumstances, e.g. while performing a switchover, it is essential
that repmgrd does not take any kind of failover action, as this will put
the cluster into an incorrect state.

Previously it was necessary to stop repmgrd on all nodes (or at least
those nodes which repmgrd would consider as promotion candidates), however
this is a cumbersome and potentially risk-prone operation, particularly if the
replication cluster contains more than a couple of servers.

To prevent this issue from occurring, this patch introduces the ability
to "pause" repmgrd on all nodes wth a single command ("repmgr daemon pause")
which notifies repmgrd not to take any failover action until the node
is "unpaused" ("repmgr daemon unpause").

"repmgr daemon status" provides an overview of each node and whether repmgrd
is running, and if so whether it is paused.

"repmgr standby switchover" has been modified to automatically pause repmgrd
while carrying out the switchover.

See documentation for further details.
2018-09-27 16:42:10 +09:00
Ian Barwick
fce3c02760 Update control file checks for PostgreSQL 11 2018-09-27 14:08:12 +09:00
Ian Barwick
1f8f6f3a39 repmgrd: add notice about different location preventing standby promotion
Though we note this in the DEBUG output, it's not immediately obvious
from the logs, especially outside of the DEBUG log level, why a node
didn't promote itself if it is in a different location to the primary.
2018-09-27 11:06:18 +09:00
Ian Barwick
401f903456 repmgrd: document parameters which can be reloaded via SIGHUP
Also add a new subsection with details on reloading repmgrd configuration.
2018-09-27 10:44:23 +09:00
Ian Barwick
688337dec3 repmgr: add "--node-id" option to "cluster cleanup"
Implements GitHub #493.
2018-09-25 15:56:40 +09:00
Ian Barwick
b660cb9fe4 doc: fix link in 4.1.1 release notes 2018-09-25 14:30:38 +09:00
Ian Barwick
5d8d9db21d doc: update 4.2 release notes 2018-09-25 14:28:28 +09:00
Ian Barwick
9439467958 doc: add troubleshooting section to switchover documentation 2018-09-25 13:47:58 +09:00
Ian Barwick
38e3aae053 repmgr: add parameter "shutdown_check_timeout"
Previously, "repmgr standby switchover" used the configuration file parameters
"reconnect_interval" and "reconnect_attempts" to define a timeout to determine
whether the current primary (demotion candidate) has shut down.

However, these parameters are intended for primary failure detection and are
generally lower in value, while a controlled shutdown may take longer, resulting
in the switchover being aborted as repmgr was not waiting long enough.

To prevent this happening, parameter "shutdown_check_timeout" has been added.
This complements the existing "standby_reconnect_timeout" parameter used
by "repmgr standby switchover".

Implements GitHub #504.
2018-09-25 11:34:06 +09:00
Ian Barwick
80bef0eb28 doc: minor fixes to "repmgr.conf.sample" 2018-09-25 10:53:24 +09:00
Ian Barwick
bea4b03cc2 doc: update "repmgr node rejoin" documentation
Clarify various points related to --force-rewind and pg_rewind usage.
2018-09-14 14:08:34 +09:00
Ian Barwick
97905b02ae repmgrd: fix comment 2018-09-13 10:15:22 +09:00
Ian Barwick
b0a2ee2259 get_all_node_records(): display any error encountered and return success status
In many cases we'll want to bail out with an error if the node list can't
be retrieved for any reason. This saves some repetitive coding.
2018-09-13 10:14:43 +09:00
Ian Barwick
bb4fdcda98 doc: update link 2018-09-12 14:17:14 +09:00
Ian Barwick
7b33faa09b repmgr: improve "cluster show" output
Only output full contents of connection error messages in --verbose mode,
otherwise it can spew a lot of text onto the screen.
2018-09-07 16:59:54 +09:00
Ian Barwick
5de2b1ee13 repmgrd: update local node id in shared memory after local node restart
Also ensure local node restarts are handled more elegantly, so we're not
surprised by a stale connection handle.

GitHub #502.
2018-09-07 11:59:53 +09:00
Ian Barwick
f184b1e68a doc: update 4.1.1 release notes 2018-09-04 12:35:46 +09:00
Ian Barwick
bd2f6db1e1 doc: update 4.1.1 release notes 2018-09-04 09:47:38 +09:00
Ian Barwick
1693ec0e90 repmgrd: fix syntax 2018-08-30 16:27:07 +09:00
Ian Barwick
17e75f6b31 repmgrd: improve reconnection handling
Previously, if the server being monitored was not available, repmgrd
would always close the existing connection handle and open a new one.

However, in some cases, e.g. a brief network outage, the existing
connection handle is still good and does not need to be reopened.

This could be particularly problematic if monitoring_history is on,
as this risks leaving orphan sessions on the primary which (given
a sufficiently unstable network) could lead to all available backends
being occupied.

Instead, during an outage we now use a new connection to verify
the server is accessible; if the old connection is still available
(e.g. following a short network interruption) we continue using that;
if  not (e.g. the server was restarted), we use the new one.
2018-08-30 15:46:08 +09:00
Ian Barwick
3b8586d82a doc: update release notes 2018-08-30 13:05:17 +09:00
Ian Barwick
6acec3e041 doc: fix internal link 2018-08-30 12:40:08 +09:00
Ian Barwick
1d830bf0e2 doc: update package signing key link 2018-08-30 12:40:05 +09:00
Ian Barwick
3f99ee8ede doc: update source requirement links
Per report from Daymel Bonne.
2018-08-30 12:40:02 +09:00
Ian Barwick
b5f640d04d doc: improve event notification documentation
- add undocumented events (per report from Daymel Bonne)
 - split up list into sections for better overview
 - where feasible, add cross-links
2018-08-30 12:39:58 +09:00
Ian Barwick
92a62a958e doc: clarify statement about BDR HA support 2018-08-30 12:39:54 +09:00
Ian Barwick
a4a956593c doc: clarify when "standby follow" can be used.
The unqualified wording previously implied that any running server could
be rejoined with "standby follow", which is not the case with a
"split brain" primary.
2018-08-30 12:39:51 +09:00
Ian Barwick
ceeb6d7130 repmgrd: improve monitoring statistics logging
Add more granular logging to help diagnose issues, and also keep track
of when the last monitoring statistics update was set and emit that
as DETAIL every time we emit a log status update.
2018-08-30 12:36:59 +09:00
Ian Barwick
9681708b1a repmgr: improve slot handling in "node rejoin"
On the rejoined node, if a replication slot for the new upstream exists
(which is typically the case after a failover), delete that slot.

Also emit a warning about any inactive replication slots which may need
to be cleaned up manually.

GitHub #499.
2018-08-30 12:24:13 +09:00
Ian Barwick
3573950425 Add additional query error logging
It's unlikely we'll get an error in these cases, but you never know.

Also, with queries which return a list of node records, it's necessary
to call _populate_node_records() even if the query fails, so a properly
initalised, albeit empty list is returned to the caller.
2018-08-29 10:25:43 +09:00
Ian Barwick
c1586e39b7 Log text of failed queries at log level ERROR
Previously query texts were always logged at log level DEBUG, but
that doesn't help much in a normal production environment when
trying to identify the cause of issues.

Also make various other minor improvements to query logging and
handling of database errors.

Implements GitHub #498.
2018-08-29 10:08:52 +09:00
Ian Barwick
7745844078 "standby switchover": improve replication connection check
Previously repmgr would first check that a replication can be made
from the demotion candidate to the promotion candidate, however it's
preferable to sanity-check the number of available walsenders first,
to provide a more useful error message.
2018-08-24 16:31:25 +09:00
Ian Barwick
e1e59e85d7 repmgr: add "cluster_cleanup" event
GitHub #492.
2018-08-24 09:20:05 +09:00
Cédric Villemain
6fc79470fc Fix grep to find conninfo
it used to use \t* but [[:space:]] should be better as it does match more kind
of spaces (the current one being broken in my case on RH7)
2018-08-23 18:33:55 +02:00
Ian Barwick
b7d576863d doc: update FAQ
Add note about why repmgrd refuses to start up if the upstream is
not running.
2018-08-20 15:33:55 +09:00
Ian Barwick
c1338df5e3 doc: clarify repmgrd FAQ item
"priority" must be 0 or greater.
2018-08-20 15:30:43 +09:00
Ian Barwick
221fb63e92 repmgrd: fix startup on witness node when local data is stale
Previously, when running on a witness server, repmgrd didn't consider
the local cache of the "repmgr.nodes" table might be outdated, e.g.
as repmgrd wasn't running on the witness server during a failover,
so could potentially end up monitoring a former primary now running
as a standby.

When running on a witness server, at startup repmgrd will now scan
all nodes to determine the current primary, and refresh its local
cache from there. This will also ensure it can start up even if the
node currently registered as primary in the local cache is not available.

Implements GitHub #488 and #489.
2018-08-20 15:29:29 +09:00
Ian Barwick
987823861f doc: document sources of old package versions 2018-08-20 15:25:00 +09:00
Ian Barwick
7a6eb6321b doc: add information about snapshot packages 2018-08-20 15:24:57 +09:00
Ian Barwick
f4df6696ba doc: update release notes 2018-08-20 15:24:43 +09:00
Ian Barwick
bc584d84f6 repmgrd: improve cascaded standby failover handling
In particular, improve handling of the case where the standby follow
command fails due to the primary not being available.

GitHub #480.
2018-08-20 15:23:54 +09:00
Ian Barwick
76f5bcf3cd repmgrd: fix PQExpBuffer handling in upstream failover handler
Was sometimes leading to blank log lines.
2018-08-20 15:23:50 +09:00
Ian Barwick
b1aab930af repmgrd: don't imply primary is in recovery if it's not available 2018-08-20 15:23:46 +09:00
Ian Barwick
58994365ff repmgrd: fix "repmgrd_upstream_reconnect" event notification
Upstream node is not always the primary node.

Per report in GitHub #480.
2018-08-20 15:23:42 +09:00
Ian Barwick
c3949b2aea "standby clone" - don't copy external config files in dry run mode
Avoid copying files during a --dry-run as it may introduce unexpected changes
on the target node. During an actual clone operation, any problems with
copying files will be detected early and the operation aborted before
the actual database cloning commences.

GitHub #491.
2018-08-20 15:23:37 +09:00
Ian Barwick
6ba49de44e "standby promote": improve log messages
Make it clearer what repmgr is waiting for, and what to do if the
promotion appears to fail.
2018-08-16 11:52:01 +09:00
Ian Barwick
b61f853a69 repmgrd: ensure primary connection handle is refreshed after reconnect
In some circumstances, if monitoring history was in use, repmgrd was attempting
to fetch the primary's current LSN on a stale connection handle.
2018-08-15 16:55:03 +09:00
Ian Barwick
f2bc898761 repmgr: fix handling of slot creation error when cloning
If cloning from another node other than the intended upstream, and
replication slots are in use, once the cloning process is complete,
repmgr will attempt to connect to the intended upstream to create
the replication slot.

Previously it would abort with a connection error, but as this issue
is not fatal to the cloning process itself, and in some situations may
be intentional, it's better to log a warning and continue.

We should probably collate this (and any similar items needing
attention after the cloning operation) into a list output at the end,
otherwise the warning may get overlooked.
2018-08-15 15:12:23 +09:00
Ian Barwick
7bcf87b8ed doc: update FAQ
Explain why some values in recovery.conf are surrounded by pairs of single
quotes.
2018-08-15 14:42:56 +09:00
Ian Barwick
6983547325 doc: improve "repmgr cluster cleanup" documentation 2018-08-14 10:09:52 +09:00
Ian Barwick
34c4f4c3f8 repmgr: truncate version string if necessary
Some distributions may add extra information to PG_VERSION after
the actual version number (e.g. "10.4 (Debian 10.4-2.pgdg90+1)"), so
copy the version number string up until the first space is found.

GitHub #490.
2018-08-14 09:55:23 +09:00
Ian Barwick
f8667c1aac doc: better explain where pg_bindir won't be applied
Basically any setting which can contain a user-defined script
*must* have the full path set, even if it's repmgr being executed.

We could potentially apply some heuristics to detect if the first
item in the setting is "repmgr" (or more precisely repmgrd's program
name), but this will require some careful thought and testing
that it works as intended.
2018-08-14 09:54:27 +09:00
Ian Barwick
08ab6290c1 Add dummy 4.2 extension SQL file 2018-08-14 09:54:27 +09:00
Abhijit Menon-Sen
97cafd8c54 Fix upstream node name in warning
This log_warning is supposed to reproduce the error in the block above,
but used the current node's name instead of the intended upstream node.
2018-08-12 09:15:13 +05:30
Ian Barwick
78b969f208 repmgrd: report version number *after* logger initialisation
This ensures the version number always makes it into the log destination.

Implements GitHub #487.
2018-08-08 15:44:06 +09:00
Ian Barwick
3f558416f3 doc: clarify witness server location 2018-08-07 13:10:30 +09:00
Ian Barwick
410fa5e54d Bump master branch to 4.2dev 2018-08-07 13:03:28 +09:00
Ian Barwick
44a224ad92 repmgrd: fix configuration file reloading
Don't allow "promote_command" or "follow_command" to be empty.

GitHub #486.
2018-08-02 16:35:26 +09:00
Ian Barwick
33dedf4e96 repmgrd: always reopen log file after receiving SIGHUP
For whatever reason, since at least repmgr 2.0 the log file was only
ever reopened if a configuration file change took place.

GitHub #485.
2018-08-02 10:54:31 +09:00
Ian Barwick
4f4d20c30b doc: fix typo 2018-08-02 10:54:24 +09:00
Ian Barwick
69cb87322d doc: update repmgrd log rotation configuration
In the sample logrotate configuration file, use "copytruncate" rather than "create",
as repmgrd currently doesn't reopen the log file (unless the configuration changes).

Per suggestion in GitHub #465.
2018-08-02 10:54:20 +09:00
Ian Barwick
4351836520 doc: update 2ndQuadrant repository locations in packaging appendix 2018-08-02 10:54:16 +09:00
Ian Barwick
a87f18682c repmgrd: consolidate SIGHUP handling
Move identical code blocks into single function.
2018-08-02 10:54:12 +09:00
Ian Barwick
1a630d079e doc: add note about new repository structure to 4.1.0 release notes 2018-08-02 10:54:08 +09:00
Ian Barwick
d2929f6426 doc: update 4.1.0 release notes 2018-08-02 10:54:03 +09:00
Ian Barwick
f3f002bea5 doc: add release date for 4.1.0 2018-07-31 11:00:38 +09:00
Ian Barwick
93471b8d68 doc: update Debian installation instructions
2ndQuadrant repository structure has changed.
2018-07-31 11:00:32 +09:00
Ian Barwick
46e0a9a8db doc: update RPM installation instructions
2ndQuadrant repository structure has changed.

Also remove reference to the old, very deprecated original repmgr RPM
repository.
2018-07-31 11:00:28 +09:00
Ian Barwick
3620fa79e8 doc: fix typo 2018-07-31 11:00:10 +09:00
Ian Barwick
c236405251 Update extension metadata for 4.1 release
This release does not make any changes to the extension database
objects.
2018-07-24 09:56:43 +09:00
Ian Barwick
527a5f7fee doc: update release notes and upgrade instructions 2018-07-24 09:54:06 +09:00
Ian Barwick
937cffd54c doc: clarify BDR repmgrd configuration
Link directly to section about configuring the "event_notification_command".
2018-07-23 13:21:11 +09:00
Ian Barwick
2b1e12591a doc: fix markup errors 2018-07-23 13:18:38 +09:00
Ian Barwick
7ecfb333b9 doc: add note about switchover and exclusive backups
Also rename server_not_in_exclusive_backup_mode() to avoid double
negatives.

GitHub #476.
2018-07-19 16:02:31 +09:00
Martín Marqués
8f13a66aaa Check that there is no exclusive backup taking place while we perform
a switchover.

We've found that this can cause some issues with postgres control
metadata (could be a postgres bug) so best thing is *not* no switchover
if there's a backup taking place.

It's also a bad idea from an architectual point of view, as a switchover
is supposed to be planed, so why perform it when we are taking backups.

GitHub #476.
2018-07-19 16:02:21 +09:00
Ian Barwick
ef35d071bf Fix is_active_bdr_node() query for BDR 2.x
Copy/paste error when adapting the query for BDR 3.x.
2018-07-19 09:50:30 +09:00
Ian Barwick
b87f9dabb4 doc: remove duplicate item in list of event notifications 2018-07-18 16:10:55 +09:00
Ian Barwick
7decc7975f Fix BDR version check
repgexp_match() is only available from PostgreSQL 10 and later.
2018-07-18 10:54:16 +09:00
Ian Barwick
a5cfc244bc repmgr: have "node status" check for missing downstream nodes
This matches the behaviour of "node check".
2018-07-18 10:27:19 +09:00
Ian Barwick
673bde2b7f repmgr: fix "primary_slot_name" when using "standby clone" with --recovery-conf-only
Addresses GitHub #474.
2018-07-17 13:42:10 +09:00
Martín Marqués
81de200561 Add information to the --help and docs of standby clone regarding the need
to provide a conninfo line to the upstream from which we will be cloning
from.
2018-07-16 18:56:41 -03:00
Ian Barwick
cb46fb6410 repmgrd: when reloading configuration, log any errors encountered 2018-07-16 16:46:39 +09:00
Ian Barwick
bd58e4128c repmgrd: log "promote_command" at log_level "INFO"
If repmgrd is promoting the local node, it was only logging the contents
of "promote_command" at DEBUG level; it would be useful to see this at
the default log level.

Related to GitHub #473.
2018-07-16 15:33:10 +09:00
Ian Barwick
63242e2277 doc: update documentation of "promote_command" and "service_promote_command"
The documentation implied it would override "promote_command", which is
not the case.

"promote_command" is used by repmgrd to execute "repmgr standby promote"
(either directly or via a custom script).

"service_promote_command" can be set to specify a package-level service
command to promote the local PostgreSQL instance from standby to primary,
e.g. Debian's pg_ctlcluster. If set, this will be executed by "repmgr standby promote".

Also update code comments to clarify usage.

Related to GitHub #473.
2018-07-16 14:43:53 +09:00
Ian Barwick
69782cf703 repmgr: enable "witness unregister" to be run on any node
Provide the ID of the witness node with --node-id=...

Implements GitHub #472.
2018-07-13 17:37:59 +09:00
Ian Barwick
5acb3e6790 doc: update release notes 2018-07-13 15:35:34 +09:00
Ian Barwick
6dfcaa357e doc: update release notes 2018-07-13 15:06:04 +09:00
Ian Barwick
8acc50e752 Bump version number in configure.in 2018-07-13 14:05:29 +09:00
Ian Barwick
56919ea499 repmgr: add -q/--quiet option
This suppresses log output below log level ERROR. This is useful mainly
when repmgr is being executed programmatically, e.g. in a cronjob,
where it's only useful to receive output if something goes wrong.

Note we advise against using this option when executing repmgr
commands which operate on PostgreSQL nodes (standby follow,
standby promote, standby switchover, node rejoin), particularly when
executed by repmgrd, as the log output will provide valuable
troubleshooting information.

Implements suggestion in GitHub #468.
2018-07-13 12:09:41 +09:00
Ian Barwick
b3f64987cb repmgr: add --csv output to "cluster event"
Implements GitHub #471.
2018-07-13 11:19:42 +09:00
Ian Barwick
388ac2f392 repmgrd: enable package to supply default PID file path
Also add documentation for packagers about paths which can be patched
as default package values.
2018-07-13 10:26:47 +09:00
Ian Barwick
8b059bc9b0 Change default for "log_level" to INFO
Default was previously NOTICE (as in repmgr 3.x) but documentation
implied it was INFO, and many of the the documentation examples assume
it is.

This produces some quite informative log output, without creating excessive
log file volume. In particular it's useful to get a better idea of what
repmgrd is actually doing.

Also add documentation section for the log configuration parameters.

GitHub #470, containing change suggested in GitHub #467.
2018-07-12 14:50:48 +09:00
Ian Barwick
cfa7155784 doc: update links to configuration file sections 2018-07-12 11:43:04 +09:00
Ian Barwick
47644b55ed doc: rearrange repmgr.conf documentation 2018-07-12 11:36:28 +09:00
Ian Barwick
17f30ec364 repmgrd: add additional local node connection check
It's possible there are corner-cases where do_election() is called while the
local connection is invalid, so perform an additional check.
2018-07-11 15:11:20 +09:00
Ian Barwick
c6b8d78bad doc: add extra emphasis about not running repmgrd during switchover
One day this will no longer be an issue, until then let's hope the
fine documentation is read.
2018-07-11 09:53:29 +09:00
Ian Barwick
ae60caacdd repmgr: make "node check" and "node status" return ERR_NODE_STATUS when appropriate
If any issue is detected (and "node check" is not being executed with a specific
individual check), "ERR_NODE_STATUS" is returned.
2018-07-05 14:31:06 +09:00
Ian Barwick
92d0e6809b repmgr: "cluster show" to return non-zero value if an issue encountered 2018-07-05 13:32:50 +09:00
Ian Barwick
4c7c681a14 repmgr: have "cluster show" exit with a non-zero value if issues detected
If any issues are detected (e.g. node not reachable, unexpected node status
etc.), "repmgr cluster show" returns exit code 25 ("ERR_NODE_STATUS").

Note that exit code 25 was introduced recently as "ERR_CLUSTER_CHECK",
however it makes sense to use this to indicate issues detected by any
command which can detect node issues.

Addresses GitHub #456.
2018-07-05 11:03:48 +09:00
Ian Barwick
29de052dd8 repmgr: clarify intent behind --wait-sync timeout processing 2018-07-05 10:09:04 +09:00
Ian Barwick
ebf2a3a7cc doc: fix typo in release notes 2018-07-05 08:45:10 +09:00
Ian Barwick
37311e15a3 repmgr: fix "standby register --wait-sync" when no timeout provided
The default value for "wait_register_sync_seconds" was zero, which is treated
as disabling --wait-sync altogether. Default value now set to -1, which is taken
to mean no timeout value supplied.
2018-07-04 17:22:04 +09:00
Ian Barwick
a194cf56b3 repmgr: exit with an error if an unrecognised command line option is provided.
This matches the behaviour of other PostgreSQL utilities such as psql, though
repmgr will only abort once all command line options are parsed, so as many
errors as possible are found and displayed. If a repmgr "command" (e.g.
"repmgr primary ..." was provided, a hint about the relevant command
help section (e.g. "repmgr primary --help") will be provided alongside
the generic help command (i.e. "repmgr --help").

Addresses GitHub #464, with further improvements.
2018-07-04 11:02:50 +09:00
Abhijit Menon-Sen
c4f9205f17 Merge pull request #460 from gclough/repmgr_conf_sample_typo_priority
Fixed typo in repmgr.conf.sample, "priority"
2018-07-03 17:43:57 +05:30
Abhijit Menon-Sen
6d09ebcfb5 Merge pull request #462 from gclough/repmgr_cluster_help_2
Fix "cluster cleanup" help
2018-07-03 17:43:35 +05:30
Abhijit Menon-Sen
319a29583d Merge pull request #461 from gclough/add_cluster_cleanup_help
Added "cluster cleanup" to help
2018-07-03 17:43:20 +05:30
Greg Clough
a5d47fd478 Fix "cluster cleanup" help
Fix "cluster cleanup" help
2018-06-29 22:57:06 +01:00
Greg Clough
190104c7db Added "cluster cleanup" to help 2018-06-29 22:54:59 +01:00
Greg Clough
ff16d3b3bb Fixed typo in repmgr.conf.sample, "priority"
Fixed typo in repmgr.conf.sample, "priority"
2018-06-29 22:00:09 +01:00
Ian Barwick
802755fd60 repmgrd: daemonize process by default
It's hard to imagine a use case where this isn't desirable, but
in case, for whatever reason, the user does not wish to daemonize the
process, the command line option "--daemonize=false" can be provided.

Implements GitHub #458.
2018-06-29 22:01:49 +09:00
Ian Barwick
d00c0c67d0 repmgrd: document PID file options/configuration 2018-06-29 17:00:25 +09:00
Ian Barwick
8d636690bd repmgrd: create pid file by default
Traditionally repmgrd will only write a pidfile if explicitly requested with
-p/--pid-file. However it's normally desirable to have a pidfile, and it's
preferable to have one used by default to prevent accidentally starting a second
repmgrd instance.

Following changes made:

 - add configuration file parameter "repmgrd_pid_file" (initially overridden by
   -p/--pid-file for backwards compatibility, though eventually we'll want to
   drop -p/--pid-file altogether)
 - add command line option --no-pid-file
 - if neither "repmgrd_pid_file" nor -p/--pid-file is set, create the pid file
   in a temporary directory

Implements GitHub #457.
2018-06-29 14:36:24 +09:00
Ian Barwick
b2081dca52 De-overload configuration file parameter "standby_reconnect_timeout"
Currently the (very generic sounding) "standby_reconnect_timeout" configuration
file parameter is used in several different contexts and it would be useful
to have more granular control over the different timeouts it's used to configure.

This patch introduces "node_rejoin_timeout", used in place of "standby_reconnect_timeout"
(which wasn't documented) when "repmgr node rejoin" is executed, to determine
how long to wait for the node to rejoin the replication cluster.

Additionally "repmgrd_standby_startup_timeout" is introduced as a timeout for
failover situations, when repmgrd executes "repmgr standby follow" to follow
a new primary, and waits for the standby to restart and become available
for connections.

"standby_reconnect_timeout" is now only relevant for "repmgr standby switchover".

Implements GitHub #454.
2018-06-28 18:00:55 +09:00
Ian Barwick
080a29c33b node check: add --missing-slots check
This enables an explicit check for slots which should exist (according
to the repmgr metadata) but which aren't present.
2018-06-22 17:21:40 +09:00
Ian Barwick
dd7a4068d2 node check: implement CSV output
This is advertised in the --help output and placeholder code was in
place, but it wasn't actually implemented.
2018-06-22 13:14:57 +09:00
Ian Barwick
fcf237fe31 node status: improve output and documentation
In the default text output mode, list inactive slots.

In CSV output mode, list inactive slots as additional information;
add output line with number of missing slots and a list thereof.

Also document --csv output mode.
2018-06-22 11:46:50 +09:00
Ian Barwick
4d70a667fb node check: clarify status information for witness server
Previously the output gave the impression the server was a primary,
which is technically the case, but it's not the actual cluster primary.

Also output an error if the node is in recovery, which is unlikely but
you never know.
2018-06-22 10:15:45 +09:00
Ian Barwick
c5ba72c2c5 standby switchover: fix behaviour if witness node is a sibling
The witness node is not a streaming replication standby, so executing
"repmgr standby follow" will fail. Instead, execute "repmgr witness
register --force" to update the witness node record on the primary and
its local copy of all node records.

Addresses GitHub #453.
2018-06-21 16:48:58 +09:00
Ian Barwick
0f97a98f28 repmgr: don't count witness node as a standby when running "node status"
Addresses GitHub #451.
2018-06-21 13:06:18 +09:00
Ian Barwick
269e3242c8 "repmgr node ...": update comments and formatting 2018-06-21 12:12:07 +09:00
Ian Barwick
b0ed87832b repmgr: don't count witness node as a standby when running "node check"
Addresses GitHub #451.
2018-06-21 11:13:46 +09:00
Ian Barwick
836d2125fe Improve BDR3 node query
We can get everything we need from bdr.node_summary
2018-06-15 14:30:06 +09:00
Ian Barwick
bf0d67c60a Add repmgr.nodes to the BDR replication set 2018-06-15 14:29:08 +09:00
Ian Barwick
e1d807188d Add extension upgrade files 2018-06-15 14:27:42 +09:00
Ian Barwick
108c3a36fb Enable creation of repmgr extension on BDR3 node 2018-06-15 14:26:47 +09:00
Ian Barwick
8377704596 Convert BDR query functions to handle BDR2/BDR3 2018-06-15 14:26:07 +09:00
Ian Barwick
4f642f8332 Detect and store BDR major version number when executing "is_bdr_db()"
BDR3 metadata structure is very different to BDR1/2, so we'll need to
generate queries according to version.
2018-06-15 14:25:55 +09:00
Ian Barwick
029ba46470 doc: remove info about old RPM package repository 2018-06-15 13:27:19 +09:00
Ian Barwick
098f8eaf2a doc: finalize release notes 2018-06-15 13:27:14 +09:00
Ian Barwick
d60bd232f0 Enable "recovery_min_apply_delay" to be zero.
Addresses GitHub #448.
2018-06-14 11:11:33 +09:00
Ian Barwick
eca1943026 doc: emphasize that repmgrd should not be running during a switchover 2018-06-12 10:30:35 +09:00
Ian Barwick
bcab4bc391 _create_event(): log event and node ID for debugging 2018-06-12 10:30:30 +09:00
Ian Barwick
bb320a64f5 repmgr: consolidate code in "standby switchover"
Commit 41274f5525 left us with two if statements
in sequence with exactly the same condition, so consolidate both into a single
statement. Clarify code comments while we're at it.
2018-06-12 10:30:24 +09:00
Ian Barwick
3b0cde2846 repmgr: cluster check commands - non-zero exit code if node(s) unavailable
Return ERR_CLUSTER_CHECK if one or nodes was not reachable.

Implements GitHub #447.
2018-06-12 10:30:11 +09:00
Ian Barwick
00704913a6 doc: 4.0.6 release notes 2018-06-12 10:29:35 +09:00
Ian Barwick
efc388065e standby follow: check node has connect to new primary
After restarting the standby, poll pg_stat_replication on the upstream
until the standby connects, and exit with an error if it doesn't by the
timeout defined in "standby_follow_timeout".

Implments GitHub #444.
2018-06-07 15:04:45 +09:00
Ian Barwick
e12fbb7b4d doc: update release notes 2018-06-07 15:04:38 +09:00
Ian Barwick
0108fb2e72 standby follow: add hint about using "node rejoin"
If "repmgr standby follow" is executed on a node which isn't running,
point out "repmgr node rejoin" should probably be used instead.
2018-06-07 15:04:30 +09:00
Ian Barwick
e408351697 doc: fix typos 2018-06-07 15:04:25 +09:00
Ian Barwick
f904cd2573 witness_register: check for existing node with same name 2018-06-07 15:04:18 +09:00
Ian Barwick
95fe7ea621 repmgrd: ensure local node is counted as quorum member
Rename "standby_nodes" to "sibling_nodes" to make it clearer in the
code what total is actually provided by the struct.

Addresses GitHub #439.
2018-06-07 15:04:12 +09:00
Ian Barwick
a50ac039da doc: fix typo 2018-06-07 15:04:06 +09:00
Ian Barwick
535fba43d3 standby clone: improve external configuration file copying
If --copy-external-config-files was provided, check that we can copy
the files *before* cloning the standby, and abort if an error is
encountered. This will give the user the opportunity to fix any issues
before running the entire (and potentially lengthy) clone.

Previously errors were logged but no action taken, and the final
message indicated the clone operation was successful.

Addresses GitHub #443.
2018-06-07 15:04:01 +09:00
Ian Barwick
043a6c5bea repmgrd: ensue degraded monitoring timeout works on standby
Parameter "degraded_monitoring_timeout" was not being acted on when
monitoring a streaming replication standby.

Addresses GitHub #439.
2018-06-07 15:03:52 +09:00
Ian Barwick
8da26f1c6c If --dry-run specified, ensure minimum log level is INFO
When executed with --dry-run, repmgr outputs detail about what would
happen using log level INFO. If the log_level is configured to
NOTICE or higher, it's possible some or all of the --dry-run output
might not be displayed.

Addresses GitHub #441.
2018-06-07 15:03:43 +09:00
Ian Barwick
7861392450 node rejoin: avoid outputting empty DETAIL message 2018-06-07 15:03:36 +09:00
Ian Barwick
b297e40d77 node rejoin: improve handling of --config-file parameter
Fixes bug when parsing --config-file values (GitHub #442).

Also improves handling in --dry-run mode, as some checks for the
provided files were being skipped if --dry-run supplied, even though
they are intended to work with --dry-run.
2018-06-07 15:03:30 +09:00
Ian Barwick
7613b1769c standby clone: --recovery-conf-only expects the standby to be registered
Note this in the documentation, and add a HINT about registering it
if the standby record is not available.

Related to GitHub #438.
2018-05-31 09:42:53 +09:00
Ian Barwick
b1b49748a7 "config_file" is MAXPGPATH, not MAXLEN
The two values are the same anyway, so change is more for consistency.
2018-05-24 15:52:57 +09:00
Ian Barwick
276239422b standby clone: don't assume existence of "user" in upstream conninfo
Usually a seperate user (typically "repmgr") is set up specifically to manage
the repmgr metadata, however there's no compelling requirement to do this, and
it's possible the database owner (usually: "postgres") will be used, in which
case it's possible the username will be left out of the conninfo string.

Addresses GitHub #437.
2018-05-24 15:52:51 +09:00
Martín Marqués
49418e096e Fix typo in a code comment 2018-05-19 12:30:03 -03:00
Ian Barwick
6c518f1403 "standby clone": log actual connection string used to connect to upstream
Useful for diagnostic purposes.
2018-05-10 12:03:13 +09:00
Ian Barwick
b365765bc8 Fix check for -d/--dbname parameter
Not a bug per-se, just meant some unnecessary processing was done on
an empty string.

Per note from petere.
2018-05-10 12:03:09 +09:00
Ian Barwick
bd63948937 Include "arpa/inet.h" in dbutils.c
Needed for htonl() on FreeBSD.
2018-05-10 12:03:04 +09:00
Ian Barwick
69c1f147ea doc: update 2ndQuadrant repository information
Canonical link for each repository should not include any directories.
2018-05-10 10:39:31 +09:00
Ian Barwick
ce8d3cf0b0 doc: update repository information 2018-05-10 10:39:27 +09:00
Ian Barwick
14134f8e70 doc: update package installation information
Document the new public 2ndQuadrant apt repository
2018-05-10 10:39:23 +09:00
Ian Barwick
be8448ddcb doc: update package installation information
Document the new, public 2ndQuadrant RPM repository.
2018-05-10 10:39:18 +09:00
Ian Barwick
a2ff1536ad doc: add notes about package compatibility
We need to emphasise that the repmgr packages are only compatible
with packages based on the PGDG filesystem layout; 3rd party vendor
packages often put application and data directories elsewhere.
See e.g. GitHub #427.
2018-05-10 10:38:54 +09:00
Ian Barwick
9c0c1b663e Minor documentation fixes 2018-05-10 10:25:29 +09:00
Ian Barwick
2d43feb34b doc: update HISTORY and add 4.0.5 release notes 2018-05-01 10:21:40 +09:00
Ian Barwick
6f315c1b3c repmgrd: don't explicitly close connections on shutdown 2018-05-01 10:21:10 +09:00
Ian Barwick
635bdccb2c Fix parsing of "archive_ready_critical" configuration file parameter.
Per report in GitHub #426.
2018-04-28 07:00:56 +09:00
Ian Barwick
16048a879e repmgrd: notify sibling nodes to follow new primary after pg_ctl timeout
If "pg_ctl promote" fails due to a timeout, but the promotion itself succeeds,
have repmgrd on the new primary explicitly notify any sibling nodes to
follow it.

Previously the sibling nodes would wait "primary_notification_timeout" seconds
before attempting to discover the new primary.

This (and preceding commit eac80ae) address GitHub #425.
2018-04-27 11:54:21 +09:00
Ian Barwick
eac80ae9c1 repmgrd: handle pg_ctl timeout
It's possible "pg_ctl promote" will timeout, causing "repmgr standby
follow" to return with an error; however the promotion itself will usually
succeed, so detect this case and handle accordingly.
2018-04-26 19:19:42 +09:00
Ian Barwick
887b845aa0 repmgrd: always close the connection if the pointer is not NULL 2018-04-26 10:04:07 +09:00
Ian Barwick
8320179f34 Add configuration file parameter "config_directory"
This enables explicit provision of an external configuration file
directory, which if set will be passed to "pg_ctl" as the -D
parameter. Otherwise "pg_ctl" will default to using the data directory,
which will cause some operations to fail if the configuration files
are not present there.

Note this is implemented primarily for feature completeness and for
development/testing purposes. Users who have installed "repmgr" from
a package should not rely on "pg_ctl" to stop/start/restart PostgreSQL,
instead they should set the appropriate "service_..._command" for their
operating system. For more details see:

    https://repmgr.org/docs/4.0/configuration-service-commands.html

Note: in a future release, the presence of "config_directory" in repmgr.conf
will be used to implictly set "--copy-external-config-files=samepath" when
cloning a standby; this is a behaviour change so will be implemented in the
next major realease (repmgr 4.1).

Implements GitHub #424.
2018-04-25 11:58:24 +09:00
Ian Barwick
7822aa784f repmgrd: catch corner case in standby connection handle check
If repmgrd marks the local node as unavailable, and it was actually
restarting but a failover event occured before the next local node
check, failover will continue with the stale connection handle.

Add a final local node check just before starting the failover
process, so repmgrd can reconnect if it wasn't able to before.
2018-04-24 21:56:57 +09:00
Ian Barwick
4455ded935 repmgrd: prevent standby connection handle from going stale
If monitoring history not in use, there's no activity on the standby's
connection handle, so if e.g. the standby is restarted, PQstatus()
never returns CONNECTION_BAD and repmgrd never notices the connection
is stale. Therefore execute a throw-away statement at "monitor_interval_secs".
2018-04-24 21:56:52 +09:00
Ian Barwick
fd0b850f41 Minor doc and log output tweaks 2018-04-24 21:08:05 +09:00
Ian Barwick
d9ac1d6fd0 doc: minor clarification 2018-04-20 12:58:46 +09:00
Ian Barwick
11e4d9fd05 doc: additional details about repmgrd usage in Debian/Ubuntu 2018-04-20 12:58:41 +09:00
Ian Barwick
4b54106f48 doc: add Debian package details 2018-04-20 12:58:37 +09:00
Ian Barwick
f3941ceab0 doc: Improve CentOS package-related documentation 2018-04-20 12:58:33 +09:00
Ian Barwick
93f80c413e doc: link to service command configuration from switchover section 2018-04-20 10:15:22 +09:00
Ian Barwick
09b8a86605 doc: improve configuration documentation
With special attention to setting service commands, and extra special
mention of "pg_ctlcluster" for Debian/Ubuntu users.
2018-04-20 10:15:18 +09:00
Ian Barwick
6b3d54a5f3 doc: update CentOS package documentation 2018-04-20 10:15:14 +09:00
Ian Barwick
85ab2d94b7 repmgrd: tweak event notifications on standby failure
The event notification was only being created if there was a valid
primary connection; it should be created in any case, so an event
notification script can be executed.
2018-04-20 10:15:08 +09:00
Ian Barwick
cda952f1e4 Add "dbname=replication" to all replication connection strings
Previously repmgr was attempting to make replication connections
with "dbname" set to the repmgr database name. While this works
if e.g. the repmgr user also has replication permissions, it will
fail if a dedicated replication user is specified, who only has
permission to access the virtual "replication" database.

Change this to use "dbname=replication" if the replication connection
user is different to the normal repmgr database user.

(We could just always set it to "replication", but that might break
existing installations e.g. where a .pgpass file is in use and there's
no "replication" entry for the normal repmgr database user).

Addresses GitHub #421.
2018-04-12 16:11:16 +09:00
Ian Barwick
99ad57f88a doc: mention --recovery-conf-only introduced in repmgr 4.0.4
Per GitHub #419.
2018-04-12 16:11:12 +09:00
Ian Barwick
ad0671ead2 doc: various updates related to "standby clone" operations. 2018-04-12 16:11:07 +09:00
Ian Barwick
1bbb2ef213 Fix superuser password handling
When establishing a superuser connection, the connection parameters
were being copied from the existing (non-superuser) connection, which
in some circumstances can lead to that user's password being
included in the copied parameter list. The password parameter, if set, will
now always be removed, which will cause libpq to retrieve the correct
one from the .pgpass file.

Addresses GitHub #400.
2018-04-12 12:49:41 +09:00
Ian Barwick
62c29aab32 Don't issue a CHECKPOINT after promoting a standby.
Issuing a CHECKPOINT immediately after promoting a standby may impact
performance. Commit 239a548e9d ensures
one is only issued when required, i.e. during a switchover when
pg_rewind will be executed.

This reverts commit a2068768ab.
2018-04-09 14:35:54 +09:00
Ian Barwick
b9dc94f28f doc: update FAQ location 2018-04-07 11:46:10 +09:00
Ian Barwick
e8ba213174 "standby register": add sanity check when --upstream-node-id not supplied
If --upstream-node-id was not supplied to "repmgr standby register",
repmgr defaults to the primary node as upstream node. If the local node is
available, we now double-check that it's attached to the primary,
in case the lack of --upstream-node-id was an accidental ommission.

This check is only made when the local node is available.

This behaviour can be overriden with -F/--force (though it's hard to
imagine a scenario where that would be useful).

Addresses GitHub #395.
2018-04-05 17:38:55 +09:00
Ian Barwick
0dcddbb062 doc: minor FAQ tweaks 2018-04-05 17:10:33 +09:00
Ian Barwick
b4dab86c3b doc: add a section about repmgrd and service commands etc. 2018-04-05 11:49:08 +09:00
Ian Barwick
644a56a645 doc: miscelleneous FAQ updates
- clarify pg_rewind item
 - add note about what's included in recovery.conf
2018-04-04 10:07:08 +09:00
Ian Barwick
4876a9fde3 Add TODO for pg_rewind changes coming in PostgreSQL 11 2018-04-03 21:56:46 +09:00
Ian Barwick
ec998bf9c5 doc: update HISTORY and release notes 2018-04-03 15:00:49 +09:00
Ian Barwick
e36b180de8 Ensure correct server version number used for replication stats query 2018-04-03 14:45:37 +09:00
Ian Barwick
a2068768ab Execute a CHECKPOINT immediately after promoting the server
This ensures "pg_control" is updated with the latest timeline, mainly
to ensure that if "pg_rewind" is executed as part of a switchover
that it sees the latest timeline.

Per suggestion from GitHub user "superflav" in GitHub #378.

See also:

  https://www.postgresql.org/message-id/flat/20150428180253.GU30322%40tamriel.snowman.net
2018-04-03 14:44:44 +09:00
Ian Barwick
bde9fea48c Fix directory creation when cloning from Barman 2018-04-03 14:44:03 +09:00
Ian Barwick
cdaf84c329 doc: minor readbility fix 2018-04-03 14:42:48 +09:00
Ian Barwick
c4cd0c46da doc: add note about replication slots and PostgreSQL upgrades 2018-04-03 14:41:58 +09:00
Ian Barwick
3b00dc912a Catch various corner cases when restarting a PostgreSQL instance 2018-04-03 14:40:53 +09:00
Ian Barwick
1a80de1290 doc: document "primary_follow_timeout" configuration file parameter. 2018-04-03 14:39:38 +09:00
Ian Barwick
26b565dff2 Improve repmgrd logging in BDR mode
Also ensure interval status log line is shown as intended
2018-04-03 14:38:32 +09:00
Ian Barwick
96811ccc01 repmgrd: tweak log notices when marking a standby as failed
Announce what we're going to do (set the node record inactive) *before*
performing the action. Makes reading the log slightly easier.
2018-04-03 14:37:43 +09:00
Ian Barwick
73982859f6 repmgrd: improve log output
- emit explicit startup NOTICE
- emit NOTICE when falling back to degraded monitoring on a primary node
- improve log message and event notification details when monitoring
  a former primary which has been reconnected as a standby
2018-04-03 14:37:06 +09:00
Ian Barwick
afb7ca886c doc: note change of shared library name from "repmgr_funcs" to "repmgr" 2018-04-03 14:35:45 +09:00
Ian Barwick
df11ad894f doc: update release notes
Add note about requiring 4.0.3 or later on all nodes when performing
a switchover from a noder running 4.0.3 or later.

Per report in GitHub #388.
2018-04-03 14:35:18 +09:00
Ian Barwick
614b4ae84b doc: update 4.0.4 release date 2018-04-03 14:34:24 +09:00
Ian Barwick
1e1b4b1a65 "standby register/follow": provide primary node details for event notifications
For events generated by these commands, it may be useful to know details
of the primary node. This makes following additional parameters available
to event notification scripts:

- %p: node ID of the primary
- %a: node name of the primary
- %c: conninfo string for the primary

Implements GitHub #375
2018-04-03 14:32:19 +09:00
Ian Barwick
cf64f9e95c Always initialise t_conninfo_param_list structures 2018-04-03 14:31:24 +09:00
Ian Barwick
dfdebd6c08 Enable provision of "archive_cleanup_command" in recovery.conf
If "archive_cleanup_command" is defined in "repmgr.conf", a corresponding
entry will be made in the node's "recovery.conf" file after cloning a
standby.

Note that we recommend using PgBarman to manage WAL archives, but are
providing this facility to help repmgr to be integrated in existing environments.

Implements GitHub #416.
2018-04-03 14:10:21 +09:00
Ian Barwick
63a11f8926 "standby promote": make timeout values configurable
This introduces following new configuration file parameters, which
were previously hard-coded values:

 - promote_check_timeout
 - promote_check_interval

Implements GitHub #387.
2018-04-03 14:10:14 +09:00
Ian Barwick
a3f371b8c0 "node rejoin": actively check for node to rejoin cluster
Previously repmgr was relying on whatever command was configured to
start PostgreSQL to determine whether the node being rejoined had
started correctly. However it's preferable to actively poll the upstream
to confirm it has restarted and actually attached as a standby before
confirming success of the "node rejoin" action.

This can be overridden with the -W/--no-wait option.

(Note that for consistency with other PostgreSQL utilities, the
short form of the --wait option is now "-w"; this is currently
only used in "repmgr standby follow".)

Also update "repmgr node rejoin" documentation with a list of supported
options, and add some useful index entries for "pg_rewind".

Implements GitHub #415.
2018-04-03 10:34:44 +09:00
Ian Barwick
938692c169 doc: fix option description for "repmgr primary register" 2018-04-03 10:09:24 +09:00
Ian Barwick
ad24b04c35 Refactor pg_control parsing
The "data_checksum_version" field towards the end of the ControlFileData struct,
meaning its position varies between versions. Previously this wasn't a problem
as it was only required for operations involving 9.5 and later, and its position
within the control file has not changed between the current release and current
HEAD.

However, in order to support pg_rewind in 9.3 and 9.4, which both have changes in
the control file format, we'll need version-specific parsing. This will also make
it easier to deal with any future changes to the control file format.
2018-04-02 20:54:42 +09:00
Ian Barwick
3ccf1cf182 Enable pg_rewind to be used with PostgreSQL 9.3/9.4
pg_rewind is not part of the core distribution for those, but we
provided support in repmgr 3.3 so should extend it to repmgr 4.

Note that there is no check in place whether the pg_rewind binary
exists, so it's up to the user to ensure it's present.

Addresses GitHub #413.
2018-04-02 20:54:29 +09:00
Ian Barwick
5e4bdb5a1b repmgrd: handle failover with two nodes in the primary location
If two nodes were in the primary location, and at least one node in
another location, the non-failed node in the primary location was not
recognising itself as a promotion candidate.

Addresses GitHub #407.
2018-04-02 20:51:27 +09:00
Ian Barwick
50321bb95d Log pg_control access errors as WARNINGs rather than DEBUG
This will make it easier to diagnose issues, possibly with an incorrect
"data_directory" setting in "repmgr.conf".
2018-04-02 09:28:56 +09:00
Ian Barwick
253c215c12 Add TODO list
This file will collate various requests and ideas for future developement.
In particular it will reference requests which come in via the GitHub issue
tracker, so we can acknowledge and close off the request and not have an
open unresolved issue hanging around.
2018-03-30 14:24:36 +09:00
Ian Barwick
22c40ae62d doc: update HISTORY and release notes 2018-03-30 09:41:48 +09:00
Ian Barwick
239a548e9d "standby switchover": force checkpoint if pg_rewind requested.
Addresses issue described in GitHub #378.

PostgreSQL itself doesn't issue a checkpoint after promotion to ensure
the newly promoted server is available as quickly as possible, so we'll
only execute an explicit CHECKPOINT when it's actually required, i.e.
when pg_rewind will be executed. This is required as pg_rewind uses
the timeline reported in the pg_control file to compare with the
server to be rewound, and the pg_control timeline is only updated after
the first checkpoint, so there is an interval where pg_rewind will
erroneously assume both servers are on the timeline and take no action.
2018-03-29 23:55:08 +09:00
Ian Barwick
231ef5563e "standby switchover": update hint 2018-03-29 23:41:59 +09:00
Ian Barwick
e1413fa8ea Fix minimum accepted value for "degraded_monitoring_timeout"
Should be -1, the default.

Addresses GitHub #411.
2018-03-29 21:15:03 +09:00
Ian Barwick
7111483b65 repmgr: move demoted primary check to the final step during switchover
This will give the demoted primary more time to start up as a standby,
during which "standby follow" can be executed on sibling nodes, if
specified.
2018-03-27 16:44:15 +09:00
Ian Barwick
1558497ae4 repmgr: poll demoted primary after restart during switchover
During a switchover operation, once the demoted primary has been restarted
as a standby, repmgr attempts to reconnect to verify its status and drop
any redundant replication slots. However it's possible the standby may still
be in the startup phase, so poll for "standby_reconnect_timeout" seconds
before giving up.

Addresses GitHub #408.
2018-03-27 16:44:10 +09:00
Ian Barwick
9c5e76401f Fix "repmgr cluster crosscheck" output
Addresses GitHub #398.
2018-03-27 16:44:04 +09:00
Ian Barwick
a403da67bc Consolidate connection closure calls 2018-03-27 16:43:59 +09:00
Ian Barwick
71b13f5307 doc: add note about remote command execution
When executing a command on a remote server, repmgr expects the remote binary
to be in the same location as the local binary. It's reasonable to assume
repmgr will be deployed in a unified environment; if not, the onus is on the
user to ensure repmgr can find the remote binary, e.g. by creating appropriate
symlinks.

Addresses query in GitHub #406.
2018-03-27 16:43:55 +09:00
Ian Barwick
1c5561d114 Misc tweaks to witness code 2018-03-26 20:59:29 +09:00
Ian Barwick
c0b607ef41 doc: update list of event notifications 2018-03-23 10:40:39 +08:00
Ian Barwick
462fdca4b4 Tidy up queries in dbutils.c
- standardize formatting
- prefix various internal function calls with "pg_catalog.", to
  mitigate possible risks from CVE-2018-1058
2018-03-23 10:28:28 +08:00
Ian Barwick
0e55a60660 Add event "repmgrd_failover_aborted" 2018-03-21 13:23:06 +09:00
Ian Barwick
93deab3e96 Add error code ERR_FOLLOW_FAIL 2018-03-21 13:11:30 +09:00
Ian Barwick
81c69e3677 repmgrd: fix typo 2018-03-21 12:36:15 +09:00
Ian Barwick
0219f4c91f Always set "connect_timeout" when pinging a PostgreSQL instance
Insert "connect_timeout=2" into the connection parameters, if not
explicitly set by the user. This will prevent excessive wait time
for the host operating system to report a connection timeout.
2018-03-21 11:48:57 +09:00
Ian Barwick
85a4adc99c Update HISTORY 2018-03-21 06:48:32 +09:00
Martín Marqués
208d7d418e While reviewing 7cb6e5af8d before merging
I noticed that besides the result cleanup added, there was still a missing
spot inside the if condition.

Adding the PQclear that was missing.
2018-03-13 11:43:36 -03:00
Martín Marqués
7cb6e5af8d Merge pull request #403 from AndrzejNowicki/master
Clear node list to avoid memory leak on witness
2018-03-13 11:41:10 -03:00
Andrzej Nowicki
d2a2df13d5 One more memory leak fixed 2018-03-13 11:23:33 +01:00
Andrzej Nowicki
358e001218 Clear node list to avoid memory leak, fixes #402 2018-03-13 11:05:24 +01:00
Ian Barwick
d7702b3444 Correctly handle error message pointer when parsing strings.
When parsing conninfo strings, ensure the error message pointer is
actually returned to the caller.

Not a criticial issue, just meant the contents of the error message
were not being displayed.
2018-03-10 14:29:12 +09:00
Ian Barwick
a8286030c0 doc: update "repmgr primary unregister" description
As noted by GitHub user yonj1e in GitHub #396.
2018-03-08 19:11:41 +09:00
Ian Barwick
ff0ba3e19a doc: update FAQ
Additional clarification for "repmgr standby clone --recovery-conf-only"
2018-03-08 19:11:33 +09:00
Ian Barwick
6f5cce7e6f doc: update FAQ
Add entry about upgrading PostgreSQL
2018-03-08 19:11:21 +09:00
Ian Barwick
509f7a8255 Fix parsing of -k/--keep-history option
GitHub #394.
2018-03-07 19:22:04 +09:00
Ian Barwick
e8cdf72ecd Add 4.0.4 release notes 2018-03-07 19:21:49 +09:00
Ian Barwick
2a99dfa15b repmgrd: fix failover handling in "manual" mode
Regression was introduced in commit c7a585c555
2018-03-07 19:21:40 +09:00
Ian Barwick
bad034f7ee repmgrd: remove duplicate local record check in BDR mode 2018-03-07 19:21:33 +09:00
Ian Barwick
cdb504d700 Add event "repmgrd_shutdown"
Implements GitHub #393
2018-03-06 11:00:03 +09:00
Ian Barwick
0af2077bed repmgrd: add debug log output for "monitor_interval_secs" sleep in all modes 2018-03-06 10:56:21 +09:00
Emre Hasegeli
dea87b7285 Add witness options to the main help
GitHub #392
2018-03-06 10:55:06 +09:00
Martín Marqués
d6b13f3428 Merge pull request #391 from hasegeli/helpmissing
Add missing options to the main help
2018-03-02 15:36:53 -03:00
Emre Hasegeli
5808d8190e Add missing options to the main help 2018-03-02 17:08:50 +01:00
Ian Barwick
d2a5cc23cc "standby clone": improve replication user selection
Use the upstream node's replication user when checking the replication
connection.
2018-03-02 16:43:23 +09:00
Ian Barwick
9981ede1af "standby clone": fix --superuser handling
get_superuser_connection() was erroneously using the local node record
to connect to as a superuser, which works when registering the primary
but obviously not when cloning a standby.

Addresses GitHub #380.
2018-03-02 16:43:19 +09:00
Ian Barwick
40ccae57a3 Update HISTORY 2018-03-02 11:05:30 +09:00
Ian Barwick
3c2b8e5792 "standby clone": remove restriction on replication slots in Barman mode
While it's preferable to avoid standby replication slots if Barman is in
use, there's no technical reason to prevent this.

Implements GitHub #379.
2018-03-02 11:05:25 +09:00
Ian Barwick
354231284e repmgr: escape "restore_command" in generated recovery.conf 2018-03-02 11:05:21 +09:00
Ian Barwick
dbbfcb6a63 "standy clone": fix primary_conninfo when --upstream-conninfo provided 2018-03-02 11:05:15 +09:00
Ian Barwick
bc766a48ed repmgrd: retry standby connection after cascading standby failover 2018-03-02 11:05:07 +09:00
Ian Barwick
55441f2729 repmgrd: add configuration file parameter "standby_reconnect_timeout"
This is used for determining a timeout when reconnecting to the standby
after executing the "follow_command". This will normally not need to be
set explicitly, but maybe useful in cases where the standby's startup
phase can last longer than usual.
2018-03-02 11:04:56 +09:00
Ian Barwick
e38a9ec7e1 repmgrd: fix main monitoring loop for witness server
Missing "break" was breaking it when following a new primary.
2018-03-02 11:04:22 +09:00
Ian Barwick
c1356b9e0d repmgrd: retry standby connection after "follow_command" executed
It's possible that the standby is still starting up after the "follow_command"
completes, so poll for a while until we get a connection.
2018-03-02 11:04:19 +09:00
Ian Barwick
383a17fba1 doc: add <options> section for various commands 2018-02-26 16:54:27 +09:00
Ian Barwick
29cb153643 "node status": improve replication slot warnings
Addresses GitHub #385
2018-02-23 11:19:33 +09:00
Ian Barwick
15625183c1 "standby clone": document --recovery-conf-only option 2018-02-23 11:19:21 +09:00
Ian Barwick
b6a1b75d22 "standby clone --recovery-conf-only": display generated file with --dry-run
Refactor the original code which generates "recovery.conf" to place the
output into a buffer, which can either be output as "recovery.conf"
or copied to a buffer specified by the caller.
2018-02-23 11:18:45 +09:00
Ian Barwick
c644ddde51 Fix typo in function name 2018-02-22 15:50:57 +09:00
Ian Barwick
ee98a3a58e "standby clone": add --recovery-conf-only option
This will generate "recovery.conf" for an existing standby.

Typical use-case is a standby cloned manually from an external data
source (e.g. Barman), where "recovery.conf" needs to be created
(and if required a replication slot).

The --dry-run option will check the pre-requisites but not actually
create "recovery.conf" or a replication slot.

This requires that the upstream node is running, a replication connection
can be made and if required a replication slot can be created.

Implements GitHub #382.
2018-02-22 15:50:51 +09:00
Ian Barwick
22b3a74fa0 repmgrd: improve detection of status change from primary to standby
If repmgrd is running in degraded mode on a primary which has been stopped,
then manually been brought back online as a standby (e.g. by creating
recovery.conf and starting the server), ensure it not only detects the
change but automatically updates the node record so it can resume
monitoring the node as a standby.

Previously, repmgrd was looping waiting for the record to be updated
(as is done transparently when executing "repmgr node rejoin") but
if the record was not updated within the timeout period (e.g. by
"repmgr standby register) it would fail to resume monitoring as a
standby.

It seems reasonable to have repmgrd automatically update the node record,
as this will restore failover capability as quickly as possible. If this
is not desired, then the onus is on the user to shut down repmgrd while
making the desired changes.
2018-02-22 15:50:45 +09:00
Ian Barwick
98af51da03 "node rejoin": ensure --dry-run is honoured
Addresses GitHub #383.
2018-02-20 15:31:03 +09:00
Ian Barwick
e5eff3f6d5 doc: update 4.0.3 release notes 2018-02-16 12:15:44 +09:00
Ian Barwick
728a256a93 doc: update release notes 2018-02-16 12:15:35 +09:00
Ian Barwick
f5f02ae0ee Replace remaining instances of strcpy() with strncpy()
Also use strncmp() to match.
2018-02-15 13:31:55 +09:00
Ian Barwick
64d85587de repmgrd: check "repmgr" extension is installed before starting
Implements GitHub #361.
2018-02-12 11:38:31 +09:00
Ian Barwick
6b7f6089ba "node status": add warning about missing replication slots
Implements GitHub #364.
2018-02-12 11:38:27 +09:00
Ian Barwick
5719a0dfd3 Update repmgr.conf.sample
Add missing parameter "monitor_interval_secs"
2018-02-12 11:38:22 +09:00
Ian Barwick
927bf038a0 "standby switchover": check demotion candidate can make replication connection
Check it's actually possible for the demotion candidate to attach to
the promotion candidate before executing the switchover.

As with other checks of this nature, there's a faint possibility the
situation could change between the time the check is carried out and
the demotion candidate is restarted to connect to the promotion candidate,
but there's not a lot we can do about that. The main purpose is to
be able to catch existing misconfigurations before anything gets changed.

Implements GitHub #370.
2018-02-09 10:00:54 +09:00
Ian Barwick
76a93af15c "witness register": fix primary node check
Addresses GitHub #377, based on report by user yonj1e in #373.
2018-02-08 16:41:04 +09:00
Ian Barwick
ee2df36a76 "standby switchover": additional sanity checks
Check that sufficient walsenders will be available on the promotion
candidate, and if replication slots are in use check if enough of
those will be available.

Note these checks can't guarantee that the walsenders/slots will
be available at the appropriate points during the switchover process,
but do ensure that existing configuration problems will be caught.

Implements GitHub #371.
2018-02-08 15:19:24 +09:00
Ian Barwick
571e6b2783 "standby clone": cowardly refuse to clone into an active data directory
By checking the PID file in the same way pg_ctl does, we can be pretty
much certain whether the target data directory contains an active
PostgreSQL instance.
2018-02-08 10:19:05 +09:00
Ian Barwick
76cc11b786 Fix "standby clone" in Barman mode with --no-upstream-connection
"--upstream-node-id", if provided, was not being passed through to
the SQL query executed via the Barman server.

Also modified the query to select the primary node if "--upstream-node-id"
is not provided.

Note: this is a very niche use case.
2018-02-07 16:34:01 +09:00
Ian Barwick
56710f4819 repmgr: simplify data directory checks when cloning
Attempting to use the contents of pg_control to tell whether the directory
is in use by PostgreSQL can result in false positives; we should use
a check based on the pidfile.

Also change the HINT to indicate a data directory can be overwritten
if -F/--force is provided.
2018-02-07 14:45:37 +09:00
Ian Barwick
f9528efdb8 "standby clone": ensure "pg_subtrans" directory is created in Barman mode 2018-02-07 14:45:04 +09:00
Ian Barwick
658ec20e37 doc: fix GitHub reference in release notes 2018-02-07 14:43:47 +09:00
Ian Barwick
e6aa831782 Update HISTORY and release notes 2018-02-07 14:43:43 +09:00
Ian Barwick
9b56f157dc Move parse_output_to_argv() to configfile.c
So it can be used by parse_pg_basebackup_options().

Addresses GitHub #376.
2018-02-07 09:47:50 +09:00
Ian Barwick
05f872effe Fix typo in HINT 2018-02-07 08:56:29 +09:00
Ian Barwick
ae691688be doc: fix descriptions of %p event notification script parameter 2018-02-05 15:52:48 +09:00
Ian Barwick
57f1e939c5 "standby register": add event notification "standby_register_sync"
Implements GitHub #374.
2018-02-05 15:20:19 +09:00
Ian Barwick
48b5deebf3 doc: minor fixes to BDR docs
Also remove duplicate file.
2018-02-05 14:01:37 +09:00
Ian Barwick
1868453953 doc: improve BDR failover documentation 2018-02-05 13:25:49 +09:00
Ian Barwick
dd45189fa8 "cluster show": output any connection error messagesin list of warnings
This ensures any connection errors are displayed by default in a
comprehensible, easily reportable way, and saves having to request/filter
DEBUG output.

Implements GitHub #369.
2018-02-05 10:36:04 +09:00
Ian Barwick
a79c4fae88 "cluster show": minor code cleanup 2018-02-05 10:36:00 +09:00
Ian Barwick
657ed83921 "cluster show": improve handling of database errors
In particular, if running "repmgr cluster show" against a database
without the repmgr metadata, showing the error (rather than just
"no records found" etc.) will provide some clues about the problem.
2018-02-05 10:35:56 +09:00
Tony Finch
4fb085f52d "repmgr node status": correct upstream node info (#363)
repmgr was printing the name and ID of this node instead of its upstream

Signed-off-by: Tony Finch <dot@dotat.at>
2018-02-05 09:52:58 +09:00
Ian Barwick
d0bb5b1565 Ensure an inactive PostgreSQL data directory can be deleted.
Addresses GitHub #366.
2018-02-02 17:18:51 +09:00
Ian Barwick
ee64f3a745 "standby follow": finalize implementation of --dry-run option 2018-02-02 17:18:47 +09:00
Ian Barwick
6c81e54f76 "standby follow": check for replication slot availability on target node 2018-02-02 17:18:43 +09:00
Ian Barwick
65bf203a89 Improve "repmgr primary unregister" documentation and --help output
Per observations in GitHub #373
2018-02-02 17:18:36 +09:00
Ian Barwick
b4dbee517f doc: note password SSH requirements for "standby switchover" 2018-02-02 17:18:31 +09:00
Ian Barwick
e23d28a22d "standby follow": initial implementation of --dry-run option
GitHub #363.
2018-02-01 14:16:49 +09:00
Ian Barwick
811d2a45bd "standby switchover": improve log messages and add new exit code
Previously, if an issue was encountered with the old primary, but user
provided -F/--force to have repmgr promote the standby anyway, repmgr
would exit with the log message "STANDBY SWITCHOVER is complete"
and exit code 0 (SUCCESS).

To better report this partial completion, repmgr will now emit the message
"STANDBY SWITCHOVER has completed with issues" (and a HINT to check preceding
log messages) and new exit code 22 (ERR_SWITCHOVER_INCOMPLETE).
2018-01-31 11:03:54 +09:00
Ian Barwick
92f4710ee2 Have do_standby_follow_internal() not abort on error
Pass the error code back to the caller instead, mainly so
"repmgr node rejoin" can better report errors.
2018-01-31 11:03:27 +09:00
Ian Barwick
044d8a1098 repmgr: improve switchover handling when "pg_ctl" used
If logging output not explicitly rediretced with "-l" in the pg_ctl
options, repmgr would hang waiting for pg_ctl output.

Note that we recommend using the OS-level service commands where
available.
2018-01-30 16:56:26 +09:00
Ian Barwick
b38f45120c "repmgr standby register": improve error output when standby not running
Add explicit HINT
2018-01-27 07:17:34 +09:00
Ian Barwick
db3a046393 doc: expand upgrade documentation
Include section about using pg_upgrade
2018-01-25 10:48:24 +09:00
Ian Barwick
ec068e38a2 Remove --bdr-only configuration option
This was required for a specific use case during pre-release
development and is no longer needed now the physical streaming
replication handling is implemented.
2018-01-25 10:48:09 +09:00
Ian Barwick
3a382e826e doc: update 4.0.2 release notes
Add details about upgrading.
2018-01-19 09:10:42 +09:00
Ian Barwick
3dcf57a333 doc: add 4.0.2 release notes 2018-01-19 09:10:42 +09:00
Vlad
f658c8d3d8 doc: add missing word in overview
GitHub pull request #362
2018-01-19 09:09:40 +09:00
Ian Barwick
375a96a5c8 repmgrd: log execution error in "repmgrd_get_local_node_id()"
That shouldn't happen, but if it does it will make it easier to
identify the issue.
2018-01-16 11:16:19 +09:00
Ian Barwick
b4d6724405 doc: improve switchover documentation
Emphasize need to set the "service_*_command" options when repmgr is
installed from a package.
2018-01-16 11:16:19 +09:00
Ian Barwick
8fd0c4ad83 repmgr: assume node is actually shutting down if pingable and that's the reported status 2018-01-12 21:53:37 +09:00
Ian Barwick
7ccae6c2b1 repmgr: automatically create slot name if missing
It's possible that a node was registered with "use_replication_slots=false"
but that was later changed to "use_replication_slots=true". If the node
was not subsequently re-registered, the node record will contain an empty
slot name, which will cause any slot creation operation during
"standby follow" or "node rejoin" to fail.

To prevent this happening, check for an empty slot name and automatically
set before proceeding.

Addresses GitHub #343.
2018-01-11 14:47:50 +09:00
Ian Barwick
61d46172b9 repmgr: catch possible corner case when checking node shutdown status
It's conceivable that PQping is returning "no response" but the
shutdown hasn't quite completed.
2018-01-10 15:09:21 +09:00
Ian Barwick
810471b2f2 repmgr: during switchover, correctly detect unclean shutdown status 2018-01-10 12:25:16 +09:00
Ian Barwick
5bd8cf958a repmgr standby switchover: add "%p" event notification parameter
This will contain the node ID of the former primary.
2018-01-10 12:25:12 +09:00
Ian Barwick
5a45997db5 doc: document command line options for "standby switchover" 2018-01-10 12:25:07 +09:00
Ian Barwick
f1f5100007 repmgr standby switchover: add event details 2018-01-10 12:25:00 +09:00
Ian Barwick
1c8ad4d89b Consolidate parsing of output from executing repmgr on a remote server
This should also fix the issue reported in GitHub #349.
2018-01-09 16:24:13 +09:00
Ian Barwick
842a610e84 Fix call to is_active_bdr_node() in BDR repmgrd
Following the fix to "is_active_bdr_node()" in 841f03ae, it turns out
the call in repmgrd-bdr.c was only accidentally working; explicitly
test for a false return value.
2018-01-04 21:03:36 +09:00
Ian Barwick
fcb7e7a29b "repmgr bdr register": create missing connection replication set if needed
Previously the assumption was that the "repmgr" replication set would be
set up when the nodes are created, however no checks were implemented
and this was not well-documented.

Addresses GitHub #347.
2018-01-04 17:46:49 +09:00
Ian Barwick
26e404b1f3 "repmgr bdr register": improve node name check
We'll use "bdr.bdr_get_local_node_name()" to check the local BDR node
name and the repmgr one match.
2018-01-04 17:46:44 +09:00
Ian Barwick
625d032435 doc: link event notification page from relevate command reference pages 2018-01-04 14:56:15 +09:00
Ian Barwick
3d07d65966 doc: update package documentation 2018-01-04 14:56:12 +09:00
Ian Barwick
b705127a34 "repmgr standby register": add --wait-start option
Implements GitHub #356.
2018-01-04 14:56:08 +09:00
Ian Barwick
832b38c5cb doc: fix typos in "repmgr primary unregister" command reference 2018-01-04 14:56:02 +09:00
Ian Barwick
3739a7b84d doc: add link to event notifications page from "repmgr cluster event" 2018-01-04 14:55:56 +09:00
Ian Barwick
841f03aeba Fix query in is_active_bdr_node()
Boolean column was not being checked correctly.

Also add detail output in "repmgr node role --check", where the function
is called.
2018-01-04 14:55:51 +09:00
Ian Barwick
cad12b1fb7 "repmgr cluster event": move query to dbutils.c 2018-01-04 14:55:46 +09:00
Ian Barwick
d31cc80d26 docs: document "repmgr cluster event --terse" 2018-01-04 14:55:40 +09:00
Ian Barwick
625187a61e "repmgr cluster events": optionally omit "Details" column with --terse
Implements GitHub #360.
2018-01-04 14:55:34 +09:00
Ian Barwick
e64d965c6a repmgrd: document standby_[failure|recovery] event notifications
Also clean up the relevant code section.

Addresses GitHub #359.
2018-01-04 09:33:37 +09:00
Ian Barwick
5d8ec136e6 repmgr node rejoin: handle missing node record correctly
If a connection was provided for a database other than the "repmgr"
database, error was logged but execution continued, resulting in
the connection being finished twice.

Addresses GitHub #358.
2018-01-03 15:17:01 +09:00
Ian Barwick
9951a8e106 doc: add appendix with details about packages
work-in-progress
2018-01-02 17:23:24 +09:00
Ian Barwick
26a9e848fd Update copyright notices to 2018 2018-01-02 10:19:46 +09:00
Ian Barwick
ba0b0a497f doc: Fix event notification placeholder typo
Per report from Carlos.
2018-01-01 10:28:19 +09:00
Ian Barwick
09dc43a61c docs: update HISTORY 2017-12-27 10:22:25 +09:00
Ian Barwick
b349f82571 doc: update documentation build instructions
Describe how to build documentation as a single file, and also note
requirement to build against 9.6 or earlier.
2017-12-27 10:05:44 +09:00
Ian Barwick
adbb627850 Merge branch 'doc-nochunks' of https://github.com/fanf2/repmgr
Pull request GitHub #353.
2017-12-27 09:58:09 +09:00
Ian Barwick
c47f976bde repmgr.conf.sample: fix command line argument
"repmgr node check --archive-ready" is correct, however abbreviated
versions will be accepted by getopt_long() if they don't match
or partially match any other options.

Per report by "chaintng" in GitHub #355.
2017-12-27 09:39:14 +09:00
Tony Finch
7c8cd7a482 doc: an optional all-in-one-file manual 2017-12-21 18:31:05 +00:00
Ian Barwick
edce8addbd repmgr: add missing -W option to getopt_long() invocation
Addresses GitHub #350.
2017-12-20 10:24:58 +09:00
Martín Marqués
b0f6202448 Merge pull request #352 from dbonne/master
Fix package name
2017-12-19 15:21:51 -03:00
Daymel Bonne Solís
985b13b6d3 Fix package name 2017-12-19 13:09:55 -05:00
Martín Marqués
69e64a9464 Add more information to the setting up sudo without requiretty in
the documentation

Signed-off-by: Martín Marqués <martin.marques@2ndquadrant.com>
2017-12-14 14:39:22 -03:00
Martín Marqués
f58954b3be Switch spaces for tabs in repmgr.conf sample file.
This makes comments stay aligned in most cases the conf file is
modified, and when indentation changes, it's easy to re-align
(by removing or adding a tab)

Signed-off-by: Martín Marqués <martin.marques@2ndquadrant.com>
2017-12-14 07:00:05 -03:00
Ian Barwick
3761d17752 docs: update 4.0.1 release date 2017-12-13 15:16:26 +09:00
Ian Barwick
8c121da8a1 Add diagnostic option "repmgr node check --has-passfile"
This checks if the active libpq version (9.6 and later) has the
"passfile" option, and returns 0 if present, 1 if not.
`
2017-12-11 20:09:48 +09:00
Abhijit Menon-Sen
6e9e4543e8 Fix typo: upstream_node_id → upstream_node 2017-12-08 09:46:58 +05:30
Ian Barwick
c94f1b7338 Fix unpackaged upgrade SQL for PostgreSQL 9.3 2017-12-04 17:52:36 +09:00
Ian Barwick
f78c169c3d docs: improve event notification documentation 2017-11-29 14:43:28 +09:00
Ian Barwick
f2db9f3ea4 docs: minor fixes to various examples 2017-11-29 11:33:42 +09:00
Ian Barwick
9944324c3a docs: add additional note about setting "wal_log_hints"
Useful to reference this when discussing PostgreSQL configuration in
general.
2017-11-29 11:22:12 +09:00
Ian Barwick
836f32bdbc Update release notes 2017-11-28 13:42:09 +09:00
Ian Barwick
cebbc73c38 Update HISTORY 2017-11-28 13:01:45 +09:00
Ian Barwick
472d703d2e repmgr: initialise "voting_term" in "repmgr primary register"
This previously happened in the extension SQL code, which could
potentially cause replay problems if installing on a BDR cluster.

As this table is only required for streaming replication failover,
move the initialisation to "repmgr primary register".

Addresses GitHub #344 .
2017-11-28 11:08:12 +09:00
Ian Barwick
de34e4e89b docs: add 2ndQ yum repository installation instructions
These replace the HTML document at https://repmgr.org/yum-repository.html
2017-11-24 14:13:33 +09:00
Ian Barwick
3a8ee126f3 Delete any replication slots copied by pg_rewind
If --force-rewind is used in conjunction with "repmgr node rejoin",
any replication slots present on the source node will be copied too;
it's essential to remove these to prevent stale slots being extant
when the node starts up.

We do this at file system level *before* the server starts to minimize
the risk of any problems.

Addresses GitHub #334
2017-11-24 11:13:31 +09:00
Ian Barwick
da93dd1f57 docs: fix configuration file example
Per report from Carlos Chapi.
2017-11-24 09:26:09 +09:00
Ian Barwick
295c18f6ff repmgr: fix configuration file sanity check
The check was being carried out regardless of whether --copy-external-config-files
was specified, which means cloning will fail if no SSH connection is available.

Addresses GitHub #342
2017-11-23 22:48:34 +09:00
Ian Barwick
81beec54aa repmgr: fix return code output for repmgr node check --action=...
Addresses GitHub #340
2017-11-23 10:34:21 +09:00
Martín Marqués
2e42226f68 Fix missing FQN for the nodes table.
This bug was not detected before because most users work with the repmgr
user. For that reason, the repmgr schema is already in the search_path
by default.

Add the repmgr schema to the nodes table in the LEFT JOIN used for
cluster show (and in other places)

Signed-off-by: Martín Marqués <martin.marques@2ndquadrant.com>
2017-11-22 17:13:58 -03:00
Ian Barwick
de10d7984a docs: update 4.0.0 release notes 2017-11-21 16:54:13 +09:00
Ian Barwick
404aab4041 docs: miscellaneous updates 2017-11-20 15:47:59 +09:00
Ian Barwick
8c422d6084 Remove unneeded functions 2017-11-20 15:18:21 +09:00
Ian Barwick
8b78b7292d docs: add note about "service_promote_command" in repmgr.conf.sample
It must never contain "repmgr standby promote", as it is intended
to enable use of package-level promote commands such as Debian's
"pg_ctlcluster promote".

Addresses GitHub #336.
2017-11-20 12:29:47 +09:00
Ian Barwick
4cebba32e2 remove spurios "/base" path element in Barman tablespace cloning code.
Addresses GitHub #339
2017-11-20 10:50:26 +09:00
Ian Barwick
c9f12cfbe0 repmgr: don't add empty "passfile" parameter in recovery.conf 2017-11-20 10:27:45 +09:00
Ian Barwick
5b4c92392c docs: expand witness documentation 2017-11-17 11:00:43 +09:00
Ian Barwick
e2b94adec3 docs: miscellaneous cleanup 2017-11-17 09:39:11 +09:00
Ian Barwick
3164bfa043 docs: add initial witness server documentation 2017-11-17 08:51:21 +09:00
Ian Barwick
08b443dce0 repmgrd: renable monitoring data recording when in archive recovery.
The warning emitted gives the impression that monitoring data shouldn't
be written if there's no streaming replication, but we can and should
do this as long as we have a primary connection.

Explictly document this in the code.

Also remove an unused variable warning.
2017-11-16 17:17:17 +09:00
Ian Barwick
9165d27f9f "repmgr node ...": fixes for 9.3
Mainly to account for the lack of replication slots.
2017-11-16 11:25:16 +09:00
Ian Barwick
b8b991398a Escape double-quotes in strings passed to an event notification script
The string in question will be generated internally by repmgr as a simple
one-line string with no control characters etc., so all that needs to be
escaped at the moment are any double quotes.
2017-11-16 10:36:48 +09:00
Ian Barwick
a9a17f206e docs: improve documentation of pg_basebackup_options 2017-11-15 20:50:13 +09:00
Ian Barwick
9d432546bf repmgrd: don't fail over unless more than 50% of active nodes are visible. 2017-11-15 13:48:28 +09:00
Ian Barwick
3c557ebd8e repmgrd: finalize witness failover handling 2017-11-15 13:48:25 +09:00
Ian Barwick
4efeb52cba repmgrd: synchronise repmgr.nodes table on witness server 2017-11-15 13:48:21 +09:00
Ian Barwick
60422c66f9 repmgrd: handle witness server 2017-11-15 13:48:17 +09:00
Ian Barwick
b63872afbb "witness register": set upstream_node_id to that of the primary 2017-11-15 13:48:14 +09:00
Ian Barwick
a31980b590 repmgrd: basic witness node monitoring 2017-11-15 13:48:11 +09:00
Ian Barwick
e07a3c7976 docs: add witness command reference files to file list 2017-11-15 13:48:06 +09:00
Ian Barwick
9d9a1be062 docs: add command reference for "witness (un)register" 2017-11-15 13:48:03 +09:00
Ian Barwick
8208b3f844 witness (un)register: add --dry-run mode 2017-11-15 13:48:00 +09:00
Ian Barwick
ecb8297b1f witness unregister: enable execution when witness server is down
Also add help output for "repmgr witness --help".
2017-11-15 13:47:54 +09:00
Ian Barwick
1553596f84 repmgr: minor fix to "repmgr standby --help" output 2017-11-15 13:47:52 +09:00
Ian Barwick
022d9c58c2 Add "witness unregister" functionality 2017-11-15 13:47:48 +09:00
Ian Barwick
a6cc4d80f0 Add "witness register" functionality 2017-11-15 13:47:45 +09:00
Ian Barwick
7fffe3ed96 witness: initial code framework 2017-11-15 13:47:41 +09:00
Ian Barwick
9b93a595f5 docs: add some more index entries 2017-11-14 20:55:37 +09:00
Ian Barwick
c34e08b802 docs: document "passfile" configuration file parameter 2017-11-14 20:53:26 +09:00
Ian Barwick
eb14bb58c6 Add configuration file "passfile"
This will enable a custom .pgpass to be included in "primary_conninfo"
(provided it's supported by the libpq version on the standby).
2017-11-14 19:30:25 +09:00
Ian Barwick
aa28069d8b docs: update release notes
Add note about changes to password handling.1
2017-11-14 18:47:39 +09:00
Ian Barwick
a1e272f64c Update extension SQL 2017-11-13 10:02:46 +09:00
Ian Barwick
9908a9c662 repmgrd: detect role change from primary to standby
If repmgrd is monitoring a primary which is taken off-line, then later
restored as a standby, detect this change and resume monitoring
in standby node.

Addresses GitHub #338.
2017-11-10 17:19:30 +09:00
Ian Barwick
aa089820ab repmgrd: check shared library is loaded
If this isn't the case, "repmgrd" will appear to run but not handle
failover correctly.

Address GitHub #337.
2017-11-10 14:35:17 +09:00
Ian Barwick
0230bafae1 repmgrd: updates related to node_id handling 2017-11-10 12:07:31 +09:00
Ian Barwick
de577adc67 repmgrd: catch corner cases where monitoring data is not available 2017-11-09 22:27:09 +09:00
Ian Barwick
fed17d49e3 repmgrd: ensure shmem is reinitialised after a restart 2017-11-09 19:31:21 +09:00
Ian Barwick
d80763f974 repmgrd: misc fixes 2017-11-09 19:31:16 +09:00
Ian Barwick
331e982bdb repmgrd: fix priority/node_id tie-break check 2017-11-09 19:31:12 +09:00
Ian Barwick
4ca7e6a6bf repmgrd: remove unneeded functions 2017-11-09 19:31:08 +09:00
Ian Barwick
6ac6e0733a repmgrd: simplify the candidate selection logic
All disconnected nodes will be in a static, known state, so as long as
each node has the same meta-information (repmgr.nodes) and is able
to retrieve the last receive LSN of the other nodes, it is possible
for each node to independently determine the best promotion candidate,
thereby reaching consensus without an explicit "voting" process.
2017-11-09 19:31:04 +09:00
Ian Barwick
79d21b516b repmgrd: fixes to failover handling
get_new_primary() returns NULL if no notification for the new primary has
been received, but the code was expecting it to return UNKNOWN_NODE_ID,
which was causing repmgrd to prematurely drop out of the new primary
detection loop if no notification had been received by the time the loop
started.

Also store the electoral term as a single row, single column table,
to ensure that all repmgrds see the same turn. It is then bumped
by the winning node after it gets promoted.

Various logging improvements.
2017-11-08 14:28:08 +09:00
Ian Barwick
7232187f4d Ensure shared memory functions handle NULL parameters correctly 2017-11-08 12:19:07 +09:00
Ian Barwick
fe98270b3f Update .gitignore
Ignore output from "make installcheck"
2017-11-08 12:09:33 +09:00
Ian Barwick
5a3e20fc38 README: update links to https versions 2017-11-08 12:07:35 +09:00
Ian Barwick
4ef2b111da Fix lock acquisition in shared memory functions 2017-11-08 11:55:08 +09:00
Ian Barwick
97471626b4 Update repmgr.conf.sample 2017-11-02 17:43:03 +09:00
Ian Barwick
4bd236b64c docs: fix example in BDR section 2017-11-02 11:23:41 +09:00
Ian Barwick
615dd2ecf4 docs: tweak Markdown URL formatting 2017-11-01 10:58:23 +09:00
Ian Barwick
1c1887f9cc docs: update links to repmgr 4.0 documentation 2017-11-01 10:50:22 +09:00
Ian Barwick
d3f11a640d docs: update copyright info 2017-11-01 09:35:57 +09:00
Ian Barwick
2341da7a06 docs: convert command reference sections to <refentry> format
Note that most entries still need a bit more tidying up, consistent structuring,
provision of more examples etc.
2017-10-31 11:27:13 +09:00
Ian Barwick
2c468d64fb "standby follow": get upstream record before server restart, if required
The standby may not always be available for connections right after it's
restarted, so attempting to connect and get the node's upstream record
after the restart may fail. Record is now retrieved before the restart.

Addresses GitHub #333.
2017-10-27 16:30:14 +09:00
Ian Barwick
9d9b74d740 docs: add sample output to "standby follow" and "standby promote" 2017-10-27 15:03:34 +09:00
Ian Barwick
a90d4419a6 docs: add note about building docs 2017-10-27 10:44:16 +09:00
Ian Barwick
68756c79f3 Fix typo 2017-10-27 09:50:48 +09:00
Ian Barwick
8ad081e7b5 docs: finalize conversion of existing BDR repmgr documentation 2017-10-26 18:52:35 +09:00
Ian Barwick
6b76704817 Initial conversion of existing BDR repmgr documentation 2017-10-26 16:29:40 +09:00
Ian Barwick
c03c509e73 docs: update configuration documentation 2017-10-26 16:11:17 +09:00
Ian Barwick
d9db4f6c45 repmgr node rejoin: add --dry-run option 2017-10-25 11:01:58 +09:00
Ian Barwick
c89d59fe96 Improve trim() function
Did not cope well with trailing spaces or entirely blank strings.
2017-10-24 15:34:43 +09:00
Ian Barwick
02b6d3748b Docs: update "repmgr cluster show" 2017-10-24 13:48:38 +09:00
Ian Barwick
7c3abe28b9 Standardize terminology on "primary" (in place of "master") 2017-10-24 13:42:50 +09:00
Ian Barwick
a39b8ccc2d --dry-run available for "node rejoin" 2017-10-23 10:40:21 +09:00
Ian Barwick
5638d4ab89 docs: fix formatting 2017-10-23 09:59:29 +09:00
Ian Barwick
37bdad290c Add --help output for "repmgr node service"
Addresses GitHub #329.
2017-10-20 16:44:44 +09:00
Ian Barwick
8911434da5 Add --help output for "repmgr node rejoin"
Addresses GitHub #329.
2017-10-20 16:31:17 +09:00
Ian Barwick
8a2bbcebfd docs: fix typo 2017-10-20 16:05:05 +09:00
Ian Barwick
61f01f8305 node rewind: add check for pg_rewind and --dry-run mode
Addresses GitHub #330
2017-10-20 14:15:23 +09:00
Ian Barwick
a35d77b7f0 Note Barman configuration file parameter changes 2017-10-20 11:30:36 +09:00
Ian Barwick
40ea1abbb4 Fix error message typo 2017-10-20 11:18:53 +09:00
Ian Barwick
785bfe9837 Prevent relative configuration file path being stored in the repmgr metadata
The configuration file path is stored to make remote execution of repmgr
(e.g. during "repmgr standby switchover") simpler, so relative paths
make no sense.

Addresses GitHub #332
2017-10-20 10:57:43 +09:00
Ian Barwick
31cd54bcff Update README
Main body of documentation moved to DocBook format and hosted at:

    https://repmgr.org/docs/index.html

as the existing README and sundry additional files were becoming
unmanageable. Conversion to DocBook format enables all documentation
to be managed in a single structured system, with cross-references,
indexes, linkable URLS etc.
2017-10-19 16:32:00 +09:00
Ian Barwick
35c8bb4e75 docs: update "repmgr cluster show" page 2017-10-19 16:21:59 +09:00
Ian Barwick
6b9ac22029 docs: expand release notes and redirect "changes-in-repmgr4.md" 2017-10-19 14:09:14 +09:00
Ian Barwick
7bf3c78f57 Add 4.0 release notes 2017-10-19 13:58:41 +09:00
Ian Barwick
34ee16899e doc: add missing entry for "priority" in repmgr.conf.sample
Per report from Shaun Thomas.
2017-10-19 13:14:52 +09:00
Ian Barwick
0938685ae7 docs: add more index references 2017-10-19 12:21:50 +09:00
Ian Barwick
b400436fba docs: note way of forcing recovery then quitting in single user mode 2017-10-18 22:31:06 +09:00
Ian Barwick
2745c92fc8 Documentation: update markup 2017-10-18 11:12:20 +09:00
Ian Barwick
34c0131b2d Update package signature documentation 2017-10-18 10:50:49 +09:00
Ian Barwick
c9abfdcc04 Document "upgrading-from-repmgr3.md" moved to main repmgr documentation 2017-10-18 09:37:16 +09:00
Ian Barwick
a878d7aaea Update "repmgr node rejoin" documentation 2017-10-17 17:40:50 +09:00
Ian Barwick
93aa7cea1a Add placeholder FAQ.md
This replaces the original FAQ maintainted for repmgr 3.x; repmgr 4
documentation is now available in DocBook format.
2017-10-17 16:31:55 +09:00
Ian Barwick
f00e6296e9 Move deprecated command line option
Not required in repmgr4, we're keeping it around for backwards compatibility;
a warning will be issued if used.
2017-10-17 16:07:44 +09:00
Ian Barwick
91354a71cc Add FAQ to documentation 2017-10-17 15:46:36 +09:00
Ian Barwick
c78cb6e1d6 Bump dev version number 2017-10-17 13:09:37 +09:00
Ian Barwick
71430a9f65 Various documentation fixes 2017-10-17 11:00:37 +09:00
Ian Barwick
3e93f847fd Update doc version 2017-10-16 11:25:56 +09:00
199 changed files with 48052 additions and 13651 deletions

7
.github/CODEOWNERS vendored Normal file
View File

@@ -0,0 +1,7 @@
# Each line is a file pattern followed by one or more owners.
# These owners will be the default owners for everything in
# the repo. Unless a later match takes precedence,
# @global-owner1 and @global-owner2 will be requested for
# review when someone opens a pull request.
* @EnterpriseDB/repmgr-dev

10
.gitignore vendored
View File

@@ -39,14 +39,20 @@ lib*.pc
# test output
/results/
/doc/Makefile
/regression.diffs
/regression.out
# other
/.lineno
*.dSYM
*.orig
*.rej
# generated binaries
repmgr
repmgrd
repmgr4
repmgrd4
# generated files
configfile-scan.c

View File

@@ -2,7 +2,7 @@ License and Contributions
=========================
`repmgr` is licensed under the GPL v3. All of its code and documentation is
Copyright 2010-2017, 2ndQuadrant Limited. See the files COPYRIGHT and LICENSE for
Copyright 2010-2021, EnterpriseDB Corporation. See the files COPYRIGHT and LICENSE for
details.
The development of repmgr has primarily been sponsored by 2ndQuadrant customers.
@@ -12,10 +12,10 @@ which has received funding from the European Union's Seventh Framework Programme
(FP7/2007-2013) under grant agreement 258862.
Contributions to `repmgr` are welcome, and will be listed in the file `CREDITS`.
2ndQuadrant Limited requires that any contributions provide a copyright
EnterpriseDB Corporation requires that any contributions provide a copyright
assignment and a disclaimer of any work-for-hire ownership claims from the
employer of the developer. This lets us make sure that all of the repmgr
distribution remains free code. Please contact info@2ndQuadrant.com for a
distribution remains free code. Please contact info@enterprise.com for a
copy of the relevant Copyright Assignment Form.
Code style
@@ -24,7 +24,7 @@ Code style
Code in repmgr should be formatted to the same standards as the main PostgreSQL
project. For more details see:
https://www.postgresql.org/docs/current/static/source-format.html
https://www.postgresql.org/docs/current/source-format.html
Contributors should reformat their code similarly before submitting code to
the project, in order to minimize merge conflicts with other work.

View File

@@ -1,4 +1,4 @@
Copyright (c) 2010-2017, 2ndQuadrant Limited
Copyright (c) 2010-2021, EnterpriseDB Corporation
All rights reserved.
This program is free software: you can redistribute it and/or modify
@@ -12,5 +12,5 @@ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see http://www.gnu.org/licenses/
along with this program. If not, see https://www.gnu.org/licenses/
to obtain one.

10
FAQ.md Normal file
View File

@@ -0,0 +1,10 @@
FAQ - Frequently Asked Questions about repmgr
=============================================
The repmgr 4 FAQ is located here: [repmgr FAQ (Frequently Asked Questions)](https://repmgr.org/docs/current/appendix-faq.html "repmgr FAQ")
The repmgr 3.x FAQ can be found here:
https://github.com/EnterpriseDB/repmgr/blob/REL3_3_STABLE/FAQ.md
Note that repmgr 3.x is no longer supported.

346
HISTORY
View File

@@ -1,6 +1,344 @@
4.0 2017-10-04
Complete rewrite with many changes; see file "doc/upgrading-from-repmgr3.md"
for details.
5.5.0 2024-11-20
Support for PostgreSQL 17 added
Fix warnings detected by the -Wshadow=compatible-local
added in PostgreSQL 16
5.4.1 2023-07-04
repmgrd: ensure witness node metadata is updated (Ian)
5.4.0 2023-03-16
Support cloning replicas using pg-backup-api
5.3.3 2022-10-17
Support for PostgreSQL added
repmgrd: ensure event notification script is called for event
"repmgrd_upstream_disconnect"; GitHub #760 (Ian)
5.3.2 2022-05-25
standby clone: don't error out if unable to determine cluster size (Ian)
node check: fix --downstream --nagios output; GitHub #749 (Ian)
repmgrd: ensure witness node marked active (hslightdb)
repmgrd: improve walsender disable check (Ian)
general: ensure replication slots can be dropped by a
replication-only user (Ian)
5.3.1 2022-02-15
repmgrd: fixes for potential connection leaks (hslightdb)
repmgr: fix upgrade path from repmgr 4.2 and 4.3 to repmgr 5.3 (Ian)
5.3.0 2021-10-12
standby switchover: improve handling of node rejoin failure (Ian)
repmgrd: prefix all shared library functions with "repmgr_" to
minimize the risk of clashes with other shared libraries (Ian)
repmgrd: at startup, if node record is marked as "inactive", attempt
to set it to "active" (Ian)
standby clone: set "slot_name" in node record if required (Ian)
node rejoin: emit rejoin target note information as NOTICE (Ian)
repmgrd: ensure short option "-s" is accepted (Ian)
5.2.1 2020-12-07
config: fix parsing of "replication_type"; GitHub #672 (Ian)
standby clone: handle missing "postgresql.auto.conf" (Ian)
standby clone: add option --recovery-min-apply-delay (Ian)
standby clone: fix data directory permissions handling for
PostgreSQL 11 and later (Ian)
repmgrd: prevent termination when local node not available and
standby_disconnect_on_failover; GitHub #675 (Ian)
repmgrd: ensure reconnect_interval" is correctly handled;
GitHub #673 (Ian)
5.2.0 2020-10-22
general: add support for PostgreSQL 13 (Ian)
general: remove support for PostgreSQL 9.3 (Ian)
config: add support for file inclusion directives (Ian)
repmgr: "primary unregister --force" will unregister an active primary
with no registered standby nodes (Ian)
repmgr: add option --verify-backup to "standby clone" (Ian)
repmgr: "standby clone" honours --waldir option if set in
"pg_basebackup_options" (Ian)
repmgr: add option --db-connection to "node check" (Ian)
repmgr: report database connection error if the --optformat option was
provided to "node check" (Ian)
repmgr: improve "node rejoin" checks (Ian)
repmgr: enable "node rejoin" to join a target with a lower timeline (Ian)
repmgr: support pg_rewind's automatic crash recovery in Pg13 and later (Ian)
repmgr: improve output formatting for cluster matrix/crosscheck (Ian)
repmgr: improve database connection failure error checking on the
demotion candidate during "standby switchover" (Ian)
repmgr: make repmgr metadata tables dumpable (Ian)
repmgr: fix issue with tablespace mapping when cloning from Barman;
GitHub #650 (Ian)
repmgr: improve handling of pg_control read errors (Ian)
repmgrd: add additional optional parameters to "failover_validation command"
(spaskalev; GitHub #651)
repmgrd: ensure primary connection is reset if same as upstream;
GitHub #633 (Ian)
5.1.0 2020-04-13
repmgr: remove BDR 2.x support
repmgr: don't query upstream's data directory (Ian)
repmgr: rename --recovery-conf-only to --replication-conf-only (Ian)
repmgr: ensure postgresql.auto.conf is created with correct permissions (Ian)
repmgr: minimize requirement to check upstream data directory location
during "standby clone" (Ian)
repmgr: warn about missing pg_rewind prerequisites when executing
"standby clone" (Ian)
repmgr: add --upstream option to "node check"
repmgr: report error code on follow/rejoin failure due to non-available
replication slot (Ian)
repmgr: ensure "node rejoin" checks for available replication slots (Ian)
repmgr: improve "standby switchover" completion checks (Ian)
repmgr: add replication configuration file ownership check to
"standby switchover" (Ian)
repmgr: check the demotion candidate's registered repmgr.conf file can
be found (laixiong; GitHub 615)
repmgr: consolidate replication connection code (Ian)
repmgr: check permissions for "pg_promote()" and fall back to pg_ctl
if necessary (Ian)
repmgr: in --dry-run mode, display promote command which will be used (Ian)
repmgr: enable "service_promote_command" in PostgreSQL 12 (Ian)
repmgr: accept option -S/--superuser for "node check"; GitHub #612 (Ian)
5.0 2019-10-15
general: add PostgreSQL 12 support (Ian)
general: parse configuration file using flex (Ian)
repmgr: rename "repmgr daemon ..." commands to "repmgr service ..." (Ian)
repmgr: improve data directory check (Ian)
repmgr: improve extension check during "standby clone" (Ian)
repmgr: pass provided log level when executing repmgr remotely (Ian)
repmgrd: fix handling of upstream node change check (Ian)
4.4 2019-06-27
repmgr: improve "daemon status" output (Ian)
repmgr: add "--siblings-follow" option to "standby promote" (Ian)
repmgr: add "--repmgrd-force-unpause" option to "standby switchover" (Ian)
repmgr: fix data directory permissions issue in barman mode where
an existing directory is being overwritten (Ian)
repmgr: improve "--dry-run" behaviour for "standby promote" and
"standby switchover" (Ian)
repmgr: when running "standby clone" with the "--upstream-conninfo" option
ensure that "application_name" is set correctly in "primary_conninfo" (Ian)
repmgr: ensure "--dry-run" together with --force when running "standby clone"
in barman mode does not modify an existing data directory (Ian)
repmgr: improve "--dry-run" output when running "standby clone" in
basebackup mode (Ian)
repmgr: improve upstream walsender checks when running "standby clone" (Ian)
repmgr: display node timeline ID in "cluster show" output (Ian)
repmgr: in "cluster show" and "daemon status", show upstream node name
as reported by each individual node (Ian)
repmgr: in "cluster show" and "daemon status", check if a node is attached
to its advertised upstream node
repmgr: use --compact rather than --terse option in "cluster event" (Ian)
repmgr: prevent a standby being cloned from a witness server (Ian)
repmgr: prevent a witness server being registered on the cluster primary (John)
repmgr: ensure BDR2-specific functionality cannot be used on
BDR3 and later (Ian)
repmgr: canonicalize the data directory path (Ian)
repmgr: note that "standby follow" requires a primary to be available (Ian)
repmgrd: monitor standbys attached to primary (Ian)
repmgrd: add "primary visibility consensus" functionality (Ian)
repmgrd: fix memory leak which occurs while the monitored PostgreSQL
node is not running (Ian)
general: documentation converted to DocBook XML format (Ian)
4.3 2019-04-02
repmgr: add "daemon (start|stop)" command; GitHub #528 (Ian)
repmgr: add --version-number command line option (Ian)
repmgr: add --compact option to "cluster show"; GitHub #521 (Ian)
repmgr: cluster show - differentiate between unreachable nodes
and nodes which are running but rejecting connections (Ian)
repmgr: add --dry-run option to "standby promote"; GitHub #522 (Ian)
repmgr: add "node check --data-directory-config"; GitHub #523 (Ian)
repmgr: prevent potential race condition in "standby switchover"
when checking received WAL location; GitHub #518 (Ian)
repmgr: ensure "standby switchover" verifies repmgr can read the
data directory on the demotion candidate; GitHub #523 (Ian)
repmgr: ensure "standby switchover" verifies replication connection
exists; GitHub #519 (Ian)
repmgr: add sanity check for correct extension version (Ian)
repmgr: ensure "witness register --dry-run" does not attempt to read node
tables if repmgr extension not installed; GitHub #513 (Ian)
repmgr: ensure "standby register" fails when --upstream-node-id is the
same as the local node ID (Ian)
repmgrd: check binary and extension major versions match; GitHub #515 (Ian)
repmgrd: on a cascaded standby, don't fail over if "failover=manual";
GitHub #531 (Ian)
repmgrd: don't consider nodes where repmgrd is not running as promotion
candidates (Ian)
repmgrd: add option "connection_check_type" (Ian)
repmgrd: improve witness monitoring when primary node not available (Ian)
repmgrd: handle situation where a primary has unexpectedly appeared
during failover; GitHub #420 (Ian)
general: fix Makefile (John)
4.2 2018-10-24
repmgr: add parameter "shutdown_check_timeout" for use by "standby switchover";
GitHub #504 (Ian)
repmgr: add "--node-id" option to "repmgr cluster cleanup"; GitHub #493 (Ian)
repmgr: report unreachable nodes when running "repmgr cluster (matrix|crosscheck);
GitHub #246 (Ian)
repmgr: add configuration file parameter "repmgr_bindir"; GitHub #246 (Ian)
repmgr: fix "Missing replication slots" label in "node check"; GitHub #507 (Ian)
repmgrd: fix parsing of -d/--daemonize option (Ian)
repmgrd: support "pausing" of repmgrd (Ian)
4.1.1 2018-09-05
logging: explicitly log the text of failed queries as ERRORs to
assist logfile analysis; GitHub #498
repmgr: truncate version string, if necessary; GitHub #490 (Ian)
repmgr: improve messages emitted during "standby promote" (Ian)
repmgr: "standby clone" - don't copy external config files in --dry-run
mode; GitHub #491 (Ian)
repmgr: add "cluster_cleanup" event; GitHub #492 (Ian)
repmgr: (standby switchover) improve detection of free walsenders;
GitHub #495 (Ian)
repmgr: (node rejoin) improve replication slot handling; GitHub #499 (Ian)
repmgrd: ensure that sending SIGHUP always results in the log file
being reopened; GitHub #485 (Ian)
repmgrd: report version number *after* logger initialisation; GitHub #487 (Ian)
repmgrd: fix startup on witness node when local data is stale; GitHub #488/#489 (Ian)
repmgrd: improve cascaded standby failover handling; GitHub #480 (Ian)
repmgrd: improve reconnection handling (Ian)
4.1.0 2018-07-31
repmgr: change default log_level to INFO, add documentation; GitHub #470 (Ian)
repmgr: add "--missing-slots" check to "repmgr node check" (Ian)
repmgr: improve command line error handling; GitHub #464 (Ian)
repmgr: fix "standby register --wait-sync" when no timeout provided (Ian)
repmgr: "cluster show" returns non-zero value if an issue encountered;
GitHub #456 (Ian)
repmgr: "node check" and "node status" returns non-zero value if an issue
encountered (Ian)
repmgr: add CSV output mode to "cluster event"; GitHub #471 (Ian)
repmgr: add -q/--quiet option to suppress non-error output; GitHub #468 (Ian)
repmgr: "node status" returns non-zero value if an issue encountered (Ian)
repmgr: enable "recovery_min_apply_delay" to be 0; GitHub #448 (Ian)
repmgr: "cluster cleanup" - add missing help options; GitHub #461/#462 (gclough)
repmgr: ensure witness node follows new primary after switchover;
GitHub #453 (Ian)
repmgr: fix witness node handling in "node check"/"node status";
GitHub #451 (Ian)
repmgr: fix "primary_slot_name" when using "standby clone" with --recovery-conf-only;
GitHub #474 (Ian)
repmgr: don't perform a switchover if an exclusive backup is running;
GitHub #476 (Martín)
repmgr: enable "witness unregister" to be run on any node; GitHub #472 (Ian)
repmgrd: create a PID file by default; GitHub #457 (Ian)
repmgrd: daemonize process by default; GitHub #458 (Ian)
4.0.6 2018-06-14
repmgr: (witness register) prevent registration of a witness server with the
same name as an existing node (Ian)
repmgr: (standby follow) check node has actually connected to new primary
before reporting success; GitHub #444 (Ian)
repmgr: (standby clone) improve handling of external configuration file copying,
including consideration in --dry-run check; GitHub #443 (Ian)
repmgr: (standby clone) don't require presence of "user" parameter in
conninfo string; GitHub #437 (Ian)
repmgr: (standby clone) improve documentation of --recovery-conf-only
mode; GitHub #438 (Ian)
repmgr: (node rejoin) fix bug when parsing --config-files parameter;
GitHub #442 (Ian)
repmgr: when using --dry-run, force log level to INFO to ensure output
will always be displayed; GitHub #441 (Ian)
repmgr: (cluster matrix/crosscheck) return non-zero exit code if node
connection issues detected; GitHub #447 (Ian)
repmgrd: ensure local node is counted as quorum member; GitHub #439 (Ian)
4.0.5 2018-05-02
repmgr: poll demoted primary after restart as a standby during a
switchover operation; GitHub #408 (Ian)
repmgr: add configuration parameter "config_directory"; GitHub #424 (Ian)
repmgr: add "dbname=replication" to all replication connection strings;
GitHub #421 (Ian)
repmgr: add sanity check if --upstream-node-id not supplied when executing
"standby register"; GitHub #395 (Ian)
repmgr: enable provision of "archive_cleanup_command" in recovery.conf;
GitHub #416 (Ian)
repmgr: actively check for node to rejoin cluster; GitHub #415 (Ian)
repmgr: enable pg_rewind to be used with PostgreSQL 9.3/9.4; GitHub #413 (Ian)
repmgr: fix minimum accepted value for "degraded_monitoring_timeout";
GitHub #411 (Ian)
repmgr: fix superuser password handling; GitHub #400 (Ian)
repmgr: fix parsing of "archive_ready_critical" configuration file
parameter; GitHub #426 (Ian)
repmgr: fix display of conninfo parsing error messages (Ian)
repmgr: fix "repmgr cluster crosscheck" output; GitHub #389 (Ian)
repmgrd: prevent standby connection handle from going stale (Ian)
repmgrd: fix memory leaks in witness code; GitHub #402 (AndrzejNowicki, Martín)
repmgrd: handle "pg_ctl promote" timeout; GitHub #425 (Ian)
repmgrd: handle failover situation with only two nodes in the primary
location, and at least one node in another location; GitHub #407 (Ian)
repmgrd: set "connect_timeout=2" when pinging a server (Ian)
4.0.4 2018-03-09
repmgr: add "standby clone --recovery-conf-only" option; GitHub #382 (Ian)
repmgr: make "standby promote" timeout values configurable; GitHub #387 (Ian)
repmgr: improve replication slot warnings generated by "node status";
GitHub #385 (Ian)
repmgr: remove restriction on replication slots when cloning from
a Barman server; GitHub #379 (Ian)
repmgr: ensure "node rejoin" honours "--dry-run" option; GitHub #383 (Ian)
repmgr: fix --superuser handling when cloning a standby; GitHub #380 (Ian)
repmgr: update various help options; GitHub #391, #392 (hasegeli)
repmgrd: add event "repmgrd_shutdown"; GitHub #393 (Ian)
repmgrd: improve detection of status change from primary to standby (Ian)
repmgrd: improve log output in various situations (Ian)
repmgrd: improve reconnection to the local node after a failover (Ian)
repmgrd: ensure witness server connects to new primary after a failover (Ian)
4.0.3 2018-02-15
repmgr: improve switchover handling when "pg_ctl" used to control the
server and logging output is not explicitly redirected (Ian)
repmgr: improve switchover log messages and exit code when old primary could
not be shut down cleanly (Ian)
repmgr: check demotion candidate can make a replication connection to the
promotion candidate before executing a switchover; GitHub #370 (Ian)
repmgr: add check for sufficient walsenders/replication slots before executing
a switchover; GitHub #371 (Ian)
repmgr: add --dry-run mode to "repmgr standby follow"; GitHub #368 (Ian)
repmgr: provide information about the primary node for "standby_register" and
"standby_follow" event notifications; GitHub #375 (Ian)
repmgr: add "standby_register_sync" event notification; GitHub #374 (Ian)
repmgr: output any connection error messages in "cluster show"'s list of
warnings; GitHub #369 (Ian)
repmgr: ensure an inactive data directory can be deleted; GitHub #366 (Ian)
repmgr: fix upstream node display in "repmgr node status"; GitHub #363 (fanf2)
repmgr: improve/clarify documentation and update --help output for
"primary unregister"; GitHub #373 (Ian)
repmgr: allow replication slots when Barman is configured; GitHub #379 (Ian)
repmgr: fix parsing of "pg_basebackup_options"; GitHub #376 (Ian)
repmgr: ensure "pg_subtrans" directory is created when cloning a standby in
Barman mode (Ian)
repmgr: fix primary node check in "witness register"; GitHub #377 (Ian)
4.0.2 2018-01-18
repmgr: add missing -W option to getopt_long() invocation; GitHub #350 (Ian)
repmgr: automatically create slot name if missing; GitHub #343 (Ian)
repmgr: fixes to parsing output of remote repmgr invocations; GitHub #349 (Ian)
repmgr: BDR support - create missing connection replication set
if required; GitHub #347 (Ian)
repmgr: handle missing node record in "repmgr node rejoin"; GitHub #358 (Ian)
repmgr: enable documentation to be build as single HTML file; GitHub #353 (fanf2)
repmgr: recognize "--terse" option for "repmgr cluster event"; GitHub #360 (Ian)
repmgr: add "--wait-start" option for "repmgr standby register"; GitHub #356 (Ian)
repmgr: add "%p" event notification parameter for "repmgr standby switchover"
containing the node ID of the demoted primary (Ian)
docs: various fixes and updates (Ian, Daymel, Martín, ams)
4.0.1 2017-12-13
repmgr: ensure "repmgr node check --action=" returns appropriate return
code; GitHub #340 (Ian)
repmgr: add missing schema qualification in get_all_node_records_with_upstream()
query GitHub #341 (Martín)
repmgr: initialise "voting_term" table in application, not extension SQL;
GitHub #344 (Ian)
repmgr: delete any replication slots copied by pg_rewind; GitHub #334 (Ian)
repmgr: fix configuration file sanity check; GitHub #342 (Ian)
4.0.0 2017-11-21
Complete rewrite with many changes; for details see the repmgr 4.0.0 release
notes at: https://repmgr.org/docs/4.0/release-4.0.0.html
3.3.2 2017-06-01
Add support for PostgreSQL 10 (Ian)
@@ -223,7 +561,7 @@
Add a ssh_options parameter (Jay Taylor)
2.0beta1 2012-07-27
Make CLONE command try to make an exact copy including $PGDATA location (Cedric)
Make CLONE command try to make an exact copy including $PGDATA location (Cedric)
Add detection of master failure (Jaime)
Add the notion of a witness server (Jaime)
Add autofailover capabilities (Jaime)

View File

@@ -2,6 +2,7 @@
# Makefile.global.in
# @configure_input@
# Can only be built using pgxs
USE_PGXS=1
@@ -14,14 +15,26 @@ ifeq ($(vpath_build),yes)
VPATH := $(repmgr_abs_srcdir)/$(repmgr_subdir)
USE_VPATH :=$(VPATH)
endif
SED=@SED@
GIT_WORK_TREE=${repmgr_abs_srcdir}
GIT_DIR=${repmgr_abs_srcdir}/.git
export GIT_DIR
export GIT_WORK_TREE
PG_LDFLAGS=-lcurl -ljson-c
include $(PGXS)
-include ${repmgr_abs_srcdir}/Makefile.custom
REPMGR_VERSION=$(shell awk '/^\#define REPMGR_VERSION / { print $3; }' ${repmgr_abs_srcdir}/repmgr_version.h.in | cut -d '"' -f 2)
REPMGR_RELEASE_DATE=$(shell awk '/^\#define REPMGR_RELEASE_DATE / { print $3; }' ${repmgr_abs_srcdir}/repmgr_version.h.in | cut -d '"' -f 2)
FLEX = flex
##########################################################################
#
# Global targets and rules
%.c: %.l
$(FLEX) $(FLEXFLAGS) -o'$@' $<

View File

@@ -11,7 +11,30 @@ EXTENSION = repmgr
DATA = \
repmgr--unpackaged--4.0.sql \
repmgr--4.0.sql
repmgr--unpackaged--5.1.sql \
repmgr--unpackaged--5.2.sql \
repmgr--unpackaged--5.3.sql \
repmgr--4.0.sql \
repmgr--4.0--4.1.sql \
repmgr--4.1.sql \
repmgr--4.1--4.2.sql \
repmgr--4.2.sql \
repmgr--4.2--4.3.sql \
repmgr--4.3.sql \
repmgr--4.3--4.4.sql \
repmgr--4.4.sql \
repmgr--4.4--5.0.sql \
repmgr--5.0.sql \
repmgr--5.0--5.1.sql \
repmgr--5.1.sql \
repmgr--5.1--5.2.sql \
repmgr--5.2.sql \
repmgr--5.2--5.3.sql \
repmgr--5.3.sql \
repmgr--5.3--5.4.sql \
repmgr--5.4.sql \
repmgr--5.4--5.5.sql \
repmgr--5.5.sql
REGRESS = repmgr_extension
@@ -26,24 +49,36 @@ all: \
PG_CPPFLAGS = -std=gnu89 -I$(includedir_internal) -I$(libpq_srcdir) -Wall -Wmissing-prototypes -Wmissing-declarations $(EXTRA_CFLAGS)
SHLIB_LINK = $(libpq)
HEADERS = $(wildcard *.h)
OBJS = \
repmgr.o
include Makefile.global
ifeq ($(vpath_build),yes)
HEADERS = $(wildcard *.h)
else
HEADERS_built = $(wildcard *.h)
endif
$(info Building against PostgreSQL $(MAJORVERSION))
REPMGR_CLIENT_OBJS = repmgr-client.o \
repmgr-action-primary.o repmgr-action-standby.o repmgr-action-bdr.o repmgr-action-cluster.o repmgr-action-node.o \
configfile.o log.o strutil.o controldata.o dirutil.o compat.o dbutils.o
REPMGRD_OBJS = repmgrd.o repmgrd-physical.o repmgrd-bdr.o configfile.o log.o dbutils.o strutil.o controldata.o
repmgr-action-primary.o repmgr-action-standby.o repmgr-action-witness.o \
repmgr-action-cluster.o repmgr-action-node.o repmgr-action-service.o repmgr-action-daemon.o \
configdata.o configfile.o configfile-scan.o log.o strutil.o controldata.o dirutil.o compat.o \
dbutils.o sysutils.o pgbackupapi.o
REPMGRD_OBJS = repmgrd.o repmgrd-physical.o configdata.o configfile.o configfile-scan.o log.o \
dbutils.o strutil.o controldata.o compat.o sysutils.o
DATE=$(shell date "+%Y-%m-%d")
repmgr_version.h: repmgr_version.h.in
sed '0,/REPMGR_VERSION_DATE/s,\(REPMGR_VERSION_DATE\).*,\1 "$(DATE)",' $< >$@
$(SED) -E 's/REPMGR_VERSION_DATE.*""/REPMGR_VERSION_DATE "$(DATE)"/' $< >$@; \
$(SED) -i -E 's/PG_ACTUAL_VERSION_NUM/PG_ACTUAL_VERSION_NUM $(VERSION_NUM)/' $@
configfile-scan.c: configfile-scan.l
$(REPMGR_CLIENT_OBJS): repmgr-client.h repmgr_version.h
@@ -63,10 +98,19 @@ Makefile: Makefile.in config.status configure
Makefile.global: Makefile.global.in config.status configure
./config.status $@
doc:
$(MAKE) -C doc all
doc: repmgr_version.h
$(MAKE) -C doc html
install-doc:
doc-repmgr.html: repmgr_version.h
$(MAKE) -C doc repmgr.html
doc-repmgr-A4.pdf: repmgr_version.h
$(MAKE) -C doc repmgr-A4.pdf
doc-repmgr-US.pdf: repmgr_version.h
$(MAKE) -C doc repmgr-US.pdf
install-doc: doc
$(MAKE) -C doc install
clean: additional-clean
@@ -74,27 +118,17 @@ clean: additional-clean
maintainer-clean: additional-maintainer-clean
additional-clean:
rm -f repmgr-client.o
rm -f repmgr-action-primary.o
rm -f repmgr-action-standby.o
rm -f repmgr-action-bdr.o
rm -f repmgr-action-node.o
rm -f repmgr-action-cluster.o
rm -f repmgrd.o
rm -f repmgrd-physical.o
rm -f repmgrd-bdr.o
rm -f compat.o
rm -f configfile.o
rm -f controldata.o
rm -f dbutils.o
rm -f dirutil.o
rm -f log.o
rm -f strutil.o
rm -f *.o
rm -f repmgr_version.h
$(MAKE) -C doc clean
maintainer-additional-clean: clean
rm -f configure
additional-maintainer-clean: clean
$(MAKE) -C doc maintainer-clean
rm -f config.status config.log
rm -f config.h
rm -f repmgr_version.h
rm -f Makefile
rm -f Makefile.global
@rm -rf autom4te.cache/
ifeq ($(MAJORVERSION),$(filter $(MAJORVERSION),9.3 9.4))
@@ -109,3 +143,4 @@ installdirs-scripts:
.PHONY: installdirs-scripts
endif
.PHONY: doc doc-repmgr.html doc-repmgr-A4.pdf doc-repmgr-US.pdf install-doc

2114
README.md

File diff suppressed because it is too large Load Diff

20
TODO.md Normal file
View File

@@ -0,0 +1,20 @@
TODO
====
This file contains a list of improvements which are desirable and/or have
been requested, and which we aim to address/implement when time and resources
permit.
It is *not* a roadmap and there's no guarantee of any item being implemented
within any given timeframe.
Enable suspension of repmgrd failover
-------------------------------------
When performing maintenance, e.g. a switchover, it's necessary to stop all
repmgrd nodes to prevent unintended failover; this is obviously inconvenient.
We'll need to implement some way of notifying each repmgrd to suspend automatic
failover until further notice.
Requested in GitHub #410 ( https://github.com/EnterpriseDB/repmgr/issues/410 )

View File

@@ -6,7 +6,7 @@
* supported PostgreSQL versions. They're unlikely to change but
* it would be worth keeping an eye on them for any fixes/improvements.
*
* Copyright (c) 2ndQuadrant, 2010-2017
* Copyright (c) EnterpriseDB Corporation, 2010-2021
*
* Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
@@ -98,9 +98,42 @@ appendShellString(PQExpBuffer buf, const char *str)
if (*p == '\'')
appendPQExpBufferStr(buf, "'\"'\"'");
else if (*p == '&')
appendPQExpBufferStr(buf, "\\&");
else
appendPQExpBufferChar(buf, *p);
}
appendPQExpBufferChar(buf, '\'');
}
/*
* Adapted from: src/fe_utils/string_utils.c
*/
void
appendRemoteShellString(PQExpBuffer buf, const char *str)
{
const char *p;
appendPQExpBufferStr(buf, "\\'");
for (p = str; *p; p++)
{
if (*p == '\n' || *p == '\r')
{
fprintf(stderr,
_("shell command argument contains a newline or carriage return: \"%s\"\n"),
str);
exit(ERR_BAD_CONFIG);
}
if (*p == '\'')
appendPQExpBufferStr(buf, "'\"'\"'");
else if (*p == '&')
appendPQExpBufferStr(buf, "\\&");
else
appendPQExpBufferChar(buf, *p);
}
appendPQExpBufferStr(buf, "\\'");
}

View File

@@ -1,6 +1,6 @@
/*
* compat.h
* Copyright (c) 2ndQuadrant, 2010-2017
* Copyright (c) EnterpriseDB Corporation, 2010-2021
*
* Portions Copyright (c) 1996-2013, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
@@ -27,4 +27,6 @@ extern void appendConnStrVal(PQExpBuffer buf, const char *str);
extern void appendShellString(PQExpBuffer buf, const char *str);
extern void appendRemoteShellString(PQExpBuffer buf, const char *str);
#endif

View File

@@ -1,4 +1,2 @@
/* config.h.in. Generated from configure.in by autoheader. */
/* Only build repmgr for BDR */
#undef BDR_ONLY

986
configdata.c Normal file
View File

@@ -0,0 +1,986 @@
/*
* configdata.c - contains structs with parsed configuration data
*
* Copyright (c) EnterpriseDB Corporation, 2010-2021
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#include "repmgr.h"
#include "configfile.h"
/*
* Parsed configuration settings are stored here
*/
t_configuration_options config_file_options;
/*
* Configuration settings are defined here
*/
struct ConfigFileSetting config_file_settings[] =
{
/* ================
* node information
* ================
*/
/* node_id */
{
"node_id",
CONFIG_INT,
{ .intptr = &config_file_options.node_id },
{ .intdefault = UNKNOWN_NODE_ID },
{ .intminval = MIN_NODE_ID },
{},
{}
},
/* node_name */
{
"node_name",
CONFIG_STRING,
{ .strptr = config_file_options.node_name },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.node_name) },
{}
},
/* conninfo */
{
"conninfo",
CONFIG_STRING,
{ .strptr = config_file_options.conninfo },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.conninfo) },
{}
},
/* replication_user */
{
"replication_user",
CONFIG_STRING,
{ .strptr = config_file_options.replication_user },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.replication_user) },
{}
},
/* data_directory */
{
"data_directory",
CONFIG_STRING,
{ .strptr = config_file_options.data_directory },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.data_directory) },
{ .postprocess_func = &repmgr_canonicalize_path }
},
/* config_directory */
{
"config_directory",
CONFIG_STRING,
{ .strptr = config_file_options.config_directory },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.config_directory) },
{ .postprocess_func = &repmgr_canonicalize_path }
},
/* pg_bindir */
{
"pg_bindir",
CONFIG_STRING,
{ .strptr = config_file_options.pg_bindir },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.pg_bindir) },
{ .postprocess_func = &repmgr_canonicalize_path }
},
/* repmgr_bindir */
{
"repmgr_bindir",
CONFIG_STRING,
{ .strptr = config_file_options.repmgr_bindir },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.repmgr_bindir) },
{ .postprocess_func = &repmgr_canonicalize_path }
},
/* replication_type */
{
"replication_type",
CONFIG_REPLICATION_TYPE,
{ .replicationtypeptr = &config_file_options.replication_type },
{ .replicationtypedefault = DEFAULT_REPLICATION_TYPE },
{},
{},
{}
},
/* ================
* logging settings
* ================
*/
/*
* log_level
* NOTE: the default for "log_level" is set in log.c and does not need
* to be initialised here
*/
{
"log_level",
CONFIG_STRING,
{ .strptr = config_file_options.log_level },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.log_level) },
{}
},
/* log_facility */
{
"log_facility",
CONFIG_STRING,
{ .strptr = config_file_options.log_facility },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.log_facility) },
{}
},
/* log_file */
{
"log_file",
CONFIG_STRING,
{ .strptr = config_file_options.log_file },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.log_file) },
{ .postprocess_func = &repmgr_canonicalize_path }
},
/* log_status_interval */
{
"log_status_interval",
CONFIG_INT,
{ .intptr = &config_file_options.log_status_interval },
{ .intdefault = DEFAULT_LOG_STATUS_INTERVAL, },
{ .intminval = 0 },
{},
{}
},
/* ======================
* standby clone settings
* ======================
*/
/* use_replication_slots */
{
"use_replication_slots",
CONFIG_BOOL,
{ .boolptr = &config_file_options.use_replication_slots },
{ .booldefault = DEFAULT_USE_REPLICATION_SLOTS },
{},
{},
{}
},
/* pg_basebackup_options */
{
"pg_basebackup_options",
CONFIG_STRING,
{ .strptr = config_file_options.pg_basebackup_options },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.pg_basebackup_options) },
{}
},
/* restore_command */
{
"restore_command",
CONFIG_STRING,
{ .strptr = config_file_options.restore_command },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.restore_command) },
{}
},
/* tablespace_mapping */
{
"tablespace_mapping",
CONFIG_TABLESPACE_MAPPING,
{ .tablespacemappingptr = &config_file_options.tablespace_mapping },
{},
{},
{},
{}
},
/* recovery_min_apply_delay */
{
"recovery_min_apply_delay",
CONFIG_STRING,
{ .strptr = config_file_options.recovery_min_apply_delay },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.recovery_min_apply_delay) },
{
.process_func = &parse_time_unit_parameter,
.providedptr = &config_file_options.recovery_min_apply_delay_provided
}
},
/* archive_cleanup_command */
{
"archive_cleanup_command",
CONFIG_STRING,
{ .strptr = config_file_options.archive_cleanup_command },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.archive_cleanup_command) },
{}
},
/* use_primary_conninfo_password */
{
"use_primary_conninfo_password",
CONFIG_BOOL,
{ .boolptr = &config_file_options.use_primary_conninfo_password },
{ .booldefault = DEFAULT_USE_PRIMARY_CONNINFO_PASSWORD },
{},
{},
{}
},
/* passfile */
{
"passfile",
CONFIG_STRING,
{ .strptr = config_file_options.passfile },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.passfile) },
{}
},
/* ======================
* standby clone settings
* ======================
*/
/* promote_check_timeout */
{
"promote_check_timeout",
CONFIG_INT,
{ .intptr = &config_file_options.promote_check_timeout },
{ .intdefault = DEFAULT_PROMOTE_CHECK_TIMEOUT },
{ .intminval = 1 },
{},
{}
},
/* promote_check_interval */
{
"promote_check_interval",
CONFIG_INT,
{ .intptr = &config_file_options.promote_check_interval },
{ .intdefault = DEFAULT_PROMOTE_CHECK_INTERVAL },
{ .intminval = 1 },
{},
{}
},
/* pg_backupapi_backup_id*/
{
"pg_backupapi_backup_id",
CONFIG_STRING,
{ .strptr = config_file_options.pg_backupapi_backup_id },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.pg_backupapi_backup_id) },
{}
},
/* pg_backupapi_host*/
{
"pg_backupapi_host",
CONFIG_STRING,
{ .strptr = config_file_options.pg_backupapi_host },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.pg_backupapi_host) },
{}
},
/* pg_backupapi_node_name */
{
"pg_backupapi_node_name",
CONFIG_STRING,
{ .strptr = config_file_options.pg_backupapi_node_name },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.pg_backupapi_node_name) },
{}
},
/* pg_backupapi_remote_ssh_command */
{
"pg_backupapi_remote_ssh_command",
CONFIG_STRING,
{ .strptr = config_file_options.pg_backupapi_remote_ssh_command },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.pg_backupapi_remote_ssh_command) },
{}
},
/* =======================
* standby follow settings
* =======================
*/
/* primary_follow_timeout */
{
"primary_follow_timeout",
CONFIG_INT,
{ .intptr = &config_file_options.primary_follow_timeout },
{ .intdefault = DEFAULT_PRIMARY_FOLLOW_TIMEOUT, },
{ .intminval = 1 },
{},
{}
},
/* standby_follow_timeout */
{
"standby_follow_timeout",
CONFIG_INT,
{ .intptr = &config_file_options.standby_follow_timeout },
{ .intdefault = DEFAULT_STANDBY_FOLLOW_TIMEOUT },
{ .intminval = 1 },
{},
{}
},
/* standby_follow_restart */
{
"standby_follow_restart",
CONFIG_BOOL,
{ .boolptr = &config_file_options.standby_follow_restart },
{ .booldefault = DEFAULT_STANDBY_FOLLOW_RESTART },
{},
{},
{}
},
/* ===========================
* standby switchover settings
* ===========================
*/
/* shutdown_check_timeout */
{
"shutdown_check_timeout",
CONFIG_INT,
{ .intptr = &config_file_options.shutdown_check_timeout },
{ .intdefault = DEFAULT_SHUTDOWN_CHECK_TIMEOUT },
{ .intminval = 1 },
{},
{}
},
/* standby_reconnect_timeout */
{
"standby_reconnect_timeout",
CONFIG_INT,
{ .intptr = &config_file_options.standby_reconnect_timeout },
{ .intdefault = DEFAULT_STANDBY_RECONNECT_TIMEOUT },
{ .intminval = 1 },
{},
{}
},
/* wal_receive_check_timeout */
{
"wal_receive_check_timeout",
CONFIG_INT,
{ .intptr = &config_file_options.wal_receive_check_timeout },
{ .intdefault = DEFAULT_WAL_RECEIVE_CHECK_TIMEOUT },
{ .intminval = 1 },
{},
{}
},
/* ====================
* node rejoin settings
* ====================
*/
/* node_rejoin_timeout */
{
"node_rejoin_timeout",
CONFIG_INT,
{ .intptr = &config_file_options.node_rejoin_timeout },
{ .intdefault = DEFAULT_NODE_REJOIN_TIMEOUT },
{ .intminval = 1 },
{},
{}
},
/* ===================
* node check settings
* ===================
*/
/* archive_ready_warning */
{
"archive_ready_warning",
CONFIG_INT,
{ .intptr = &config_file_options.archive_ready_warning },
{ .intdefault = DEFAULT_ARCHIVE_READY_WARNING },
{ .intminval = 1 },
{},
{}
},
/* archive_ready_critical */
{
"archive_ready_critical",
CONFIG_INT,
{ .intptr = &config_file_options.archive_ready_critical },
{ .intdefault = DEFAULT_ARCHIVE_READY_CRITICAL },
{ .intminval = 1 },
{},
{}
},
/* replication_lag_warning */
{
"replication_lag_warning",
CONFIG_INT,
{ .intptr = &config_file_options.replication_lag_warning },
{ .intdefault = DEFAULT_REPLICATION_LAG_WARNING },
{ .intminval = 1 },
{},
{}
},
/* replication_lag_critical */
{
"replication_lag_critical",
CONFIG_INT,
{ .intptr = &config_file_options.replication_lag_critical },
{ .intdefault = DEFAULT_REPLICATION_LAG_CRITICAL },
{ .intminval = 1 },
{},
{}
},
/* ================
* witness settings
* ================
*/
/* witness_sync_interval */
{
"witness_sync_interval",
CONFIG_INT,
{ .intptr = &config_file_options.witness_sync_interval },
{ .intdefault = DEFAULT_WITNESS_SYNC_INTERVAL },
{ .intminval = 1 },
{},
{}
},
/* ================
* repmgrd settings
* ================
*/
/* failover */
{
"failover",
CONFIG_FAILOVER_MODE,
{ .failovermodeptr = &config_file_options.failover },
{ .failovermodedefault = FAILOVER_MANUAL },
{},
{},
{}
},
/* location */
{
"location",
CONFIG_STRING,
{ .strptr = config_file_options.location },
{ .strdefault = DEFAULT_LOCATION },
{},
{ .strmaxlen = sizeof(config_file_options.location) },
{}
},
/* priority */
{
"priority",
CONFIG_INT,
{ .intptr = &config_file_options.priority },
{ .intdefault = DEFAULT_PRIORITY, },
{ .intminval = 0 },
{},
{}
},
/* promote_command */
{
"promote_command",
CONFIG_STRING,
{ .strptr = config_file_options.promote_command },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.promote_command) },
{}
},
/* follow_command */
{
"follow_command",
CONFIG_STRING,
{ .strptr = config_file_options.follow_command },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.follow_command) },
{}
},
/* monitor_interval_secs */
{
"monitor_interval_secs",
CONFIG_INT,
{ .intptr = &config_file_options.monitor_interval_secs },
{ .intdefault = DEFAULT_MONITORING_INTERVAL },
{ .intminval = 1 },
{},
{}
},
/* reconnect_attempts */
{
"reconnect_attempts",
CONFIG_INT,
{ .intptr = &config_file_options.reconnect_attempts },
{ .intdefault = DEFAULT_RECONNECTION_ATTEMPTS },
{ .intminval = 0 },
{},
{}
},
/* reconnect_interval */
{
"reconnect_interval",
CONFIG_INT,
{ .intptr = &config_file_options.reconnect_interval },
{ .intdefault = DEFAULT_RECONNECTION_INTERVAL },
{ .intminval = 0 },
{},
{}
},
/* monitoring_history */
{
"monitoring_history",
CONFIG_BOOL,
{ .boolptr = &config_file_options.monitoring_history },
{ .booldefault = DEFAULT_MONITORING_HISTORY },
{},
{},
{}
},
/* degraded_monitoring_timeout */
{
"degraded_monitoring_timeout",
CONFIG_INT,
{ .intptr = &config_file_options.degraded_monitoring_timeout },
{ .intdefault = DEFAULT_DEGRADED_MONITORING_TIMEOUT },
{ .intminval = -1 },
{},
{}
},
/* async_query_timeout */
{
"async_query_timeout",
CONFIG_INT,
{ .intptr = &config_file_options.async_query_timeout },
{ .intdefault = DEFAULT_ASYNC_QUERY_TIMEOUT },
{ .intminval = 0 },
{},
{}
},
/* primary_notification_timeout */
{
"primary_notification_timeout",
CONFIG_INT,
{ .intptr = &config_file_options.primary_notification_timeout },
{ .intdefault = DEFAULT_PRIMARY_NOTIFICATION_TIMEOUT },
{ .intminval = 0 },
{},
{}
},
/* repmgrd_standby_startup_timeout */
{
"repmgrd_standby_startup_timeout",
CONFIG_INT,
{ .intptr = &config_file_options.repmgrd_standby_startup_timeout },
{ .intdefault = DEFAULT_REPMGRD_STANDBY_STARTUP_TIMEOUT },
{ .intminval = 0 },
{},
{}
},
/* repmgrd_pid_file */
{
"repmgrd_pid_file",
CONFIG_STRING,
{ .strptr = config_file_options.repmgrd_pid_file },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.repmgrd_pid_file) },
{ .postprocess_func = &repmgr_canonicalize_path }
},
/* repmgrd_exit_on_inactive_node */
{
"repmgrd_exit_on_inactive_node",
CONFIG_BOOL,
{ .boolptr = &config_file_options.repmgrd_exit_on_inactive_node},
{ .booldefault = DEFAULT_REPMGRD_EXIT_ON_INACTIVE_NODE },
{},
{},
{}
},
/* standby_disconnect_on_failover */
{
"standby_disconnect_on_failover",
CONFIG_BOOL,
{ .boolptr = &config_file_options.standby_disconnect_on_failover },
{ .booldefault = DEFAULT_STANDBY_DISCONNECT_ON_FAILOVER },
{},
{},
{}
},
/* sibling_nodes_disconnect_timeout */
{
"sibling_nodes_disconnect_timeout",
CONFIG_INT,
{ .intptr = &config_file_options.sibling_nodes_disconnect_timeout },
{ .intdefault = DEFAULT_SIBLING_NODES_DISCONNECT_TIMEOUT },
{ .intminval = 0 },
{},
{}
},
/* connection_check_type */
{
"connection_check_type",
CONFIG_CONNECTION_CHECK_TYPE,
{ .checktypeptr = &config_file_options.connection_check_type },
{ .checktypedefault = DEFAULT_CONNECTION_CHECK_TYPE },
{},
{},
{}
},
/* primary_visibility_consensus */
{
"primary_visibility_consensus",
CONFIG_BOOL,
{ .boolptr = &config_file_options.primary_visibility_consensus },
{ .booldefault = DEFAULT_PRIMARY_VISIBILITY_CONSENSUS },
{},
{},
{}
},
/* always_promote */
{
"always_promote",
CONFIG_BOOL,
{ .boolptr = &config_file_options.always_promote },
{ .booldefault = DEFAULT_ALWAYS_PROMOTE },
{},
{},
{}
},
/* failover_validation_command */
{
"failover_validation_command",
CONFIG_STRING,
{ .strptr = config_file_options.failover_validation_command },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.failover_validation_command) },
{}
},
/* election_rerun_interval */
{
"election_rerun_interval",
CONFIG_INT,
{ .intptr = &config_file_options.election_rerun_interval },
{ .intdefault = DEFAULT_ELECTION_RERUN_INTERVAL },
{ .intminval = 1 },
{},
{}
},
/* child_nodes_check_interval */
{
"child_nodes_check_interval",
CONFIG_INT,
{ .intptr = &config_file_options.child_nodes_check_interval },
{ .intdefault = DEFAULT_CHILD_NODES_CHECK_INTERVAL },
{ .intminval = 1 },
{},
{}
},
/* child_nodes_disconnect_min_count */
{
"child_nodes_disconnect_min_count",
CONFIG_INT,
{ .intptr = &config_file_options.child_nodes_disconnect_min_count },
{ .intdefault = DEFAULT_CHILD_NODES_DISCONNECT_MIN_COUNT },
{ .intminval = -1 },
{},
{}
},
/* child_nodes_connected_min_count */
{
"child_nodes_connected_min_count",
CONFIG_INT,
{ .intptr = &config_file_options.child_nodes_connected_min_count },
{ .intdefault = DEFAULT_CHILD_NODES_CONNECTED_MIN_COUNT},
{ .intminval = -1 },
{},
{}
},
/* child_nodes_connected_include_witness */
{
"child_nodes_connected_include_witness",
CONFIG_BOOL,
{ .boolptr = &config_file_options.child_nodes_connected_include_witness },
{ .booldefault = DEFAULT_CHILD_NODES_CONNECTED_INCLUDE_WITNESS },
{},
{},
{}
},
/* child_nodes_disconnect_timeout */
{
"child_nodes_disconnect_timeout",
CONFIG_INT,
{ .intptr = &config_file_options.child_nodes_disconnect_timeout },
{ .intdefault = DEFAULT_CHILD_NODES_DISCONNECT_TIMEOUT },
{ .intminval = 0 },
{},
{}
},
/* child_nodes_disconnect_command */
{
"child_nodes_disconnect_command",
CONFIG_STRING,
{ .strptr = config_file_options.child_nodes_disconnect_command },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.child_nodes_disconnect_command) },
{}
},
/* ================
* service settings
* ================
*/
/* pg_ctl_options */
{
"pg_ctl_options",
CONFIG_STRING,
{ .strptr = config_file_options.pg_ctl_options },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.pg_ctl_options) },
{}
},
/* service_start_command */
{
"service_start_command",
CONFIG_STRING,
{ .strptr = config_file_options.service_start_command },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.service_start_command) },
{}
},
/* service_stop_command */
{
"service_stop_command",
CONFIG_STRING,
{ .strptr = config_file_options.service_stop_command },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.service_stop_command) },
{}
},
/* service_restart_command */
{
"service_restart_command",
CONFIG_STRING,
{ .strptr = config_file_options.service_restart_command },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.service_restart_command) },
{}
},
/* service_reload_command */
{
"service_reload_command",
CONFIG_STRING,
{ .strptr = config_file_options.service_reload_command },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.service_reload_command) },
{}
},
/* service_promote_command */
{
"service_promote_command",
CONFIG_STRING,
{ .strptr = config_file_options.service_promote_command },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.service_promote_command) },
{}
},
/* ========================
* repmgrd service settings
* ========================
*/
/* repmgrd_service_start_command */
{
"repmgrd_service_start_command",
CONFIG_STRING,
{ .strptr = config_file_options.repmgrd_service_start_command },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.repmgrd_service_start_command) },
{}
},
/* repmgrd_service_stop_command */
{
"repmgrd_service_stop_command",
CONFIG_STRING,
{ .strptr = config_file_options.repmgrd_service_stop_command },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.repmgrd_service_stop_command) },
{}
},
/* ===========================
* event notification settings
* ===========================
*/
/* event_notification_command */
{
"event_notification_command",
CONFIG_STRING,
{ .strptr = config_file_options.event_notification_command },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.event_notification_command) },
{}
},
{
"event_notifications",
CONFIG_EVENT_NOTIFICATION_LIST,
{ .notificationlistptr = &config_file_options.event_notifications },
{},
{},
{},
{}
},
/* ===============
* barman settings
* ===============
*/
/* barman_host */
{
"barman_host",
CONFIG_STRING,
{ .strptr = config_file_options.barman_host },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.barman_host) },
{}
},
/* barman_server */
{
"barman_server",
CONFIG_STRING,
{ .strptr = config_file_options.barman_server },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.barman_server) },
{}
},
/* barman_config */
{
"barman_config",
CONFIG_STRING,
{ .strptr = config_file_options.barman_config },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.barman_config) },
{}
},
/* ==================
* rsync/ssh settings
* ==================
*/
/* rsync_options */
{
"rsync_options",
CONFIG_STRING,
{ .strptr = config_file_options.rsync_options },
{ .strdefault = "" },
{},
{ .strmaxlen = sizeof(config_file_options.rsync_options) },
{}
},
/* ssh_options */
{
"ssh_options",
CONFIG_STRING,
{ .strptr = config_file_options.ssh_options },
{ .strdefault = DEFAULT_SSH_OPTIONS },
{},
{ .strmaxlen = sizeof(config_file_options.ssh_options) },
{}
},
/* ==================================
* undocumented experimental settings
* ==================================
*/
/* reconnect_loop_sync */
{
"reconnect_loop_sync",
CONFIG_BOOL,
{ .boolptr = &config_file_options.reconnect_loop_sync },
{ .booldefault = false },
{},
{},
{}
},
/* ==========================
* undocumented test settings
* ==========================
*/
/* promote_delay */
{
"promote_delay",
CONFIG_INT,
{ .intptr = &config_file_options.promote_delay },
{ .intdefault = 0 },
{ .intminval = 1 },
{},
{}
},
/* failover_delay */
{
"failover_delay",
CONFIG_INT,
{ .intptr = &config_file_options.failover_delay },
{ .intdefault = 0 },
{ .intminval = 1 },
{},
{}
},
{
"connection_check_query",
CONFIG_STRING,
{ .strptr = config_file_options.connection_check_query },
{ .strdefault = "SELECT 1" },
{},
{ .strmaxlen = sizeof(config_file_options.connection_check_query) },
{}
},
/* End-of-list marker */
{
NULL, CONFIG_INT, {}, {}, {}, {}, {}
}
};

644
configfile-scan.l Normal file
View File

@@ -0,0 +1,644 @@
/*
* Scanner for the configuration file
*/
%{
#include <setjmp.h>
#include <sys/stat.h>
#include <dirent.h>
#include "repmgr.h"
#include "configfile.h"
/*
* flex emits a yy_fatal_error() function that it calls in response to
* critical errors like malloc failure, file I/O errors, and detection of
* internal inconsistency. That function prints a message and calls exit().
* Mutate it to instead call our handler, which jumps out of the parser.
*/
#undef fprintf
#define fprintf(file, fmt, msg) CONF_flex_fatal(msg)
enum
{
CONF_ID = 1,
CONF_STRING = 2,
CONF_INTEGER = 3,
CONF_REAL = 4,
CONF_EQUALS = 5,
CONF_UNQUOTED_STRING = 6,
CONF_QUALIFIED_ID = 7,
CONF_EOL = 99,
CONF_ERROR = 100
};
static unsigned int ConfigFileLineno;
static const char *CONF_flex_fatal_errmsg;
static sigjmp_buf *CONF_flex_fatal_jmp;
static char *CONF_scanstr(const char *s);
static int CONF_flex_fatal(const char *msg);
static bool ProcessConfigFile(const char *base_dir, const char *config_file, const char *calling_file, bool strict, int depth, KeyValueList *contents, ItemList *error_list, ItemList *warning_list);
static bool ProcessConfigFp(FILE *fp, const char *config_file, const char *calling_file, int depth, const char *base_dir, KeyValueList *contents, ItemList *error_list, ItemList *warning_list);
static bool ProcessConfigDirectory(const char *base_dir, const char *includedir, const char *calling_file, int depth, KeyValueList *contents, ItemList *error_list, ItemList *warning_list);
static char *AbsoluteConfigLocation(const char *base_dir, const char *location, const char *calling_file);
%}
%option 8bit
%option never-interactive
%option nodefault
%option noinput
%option nounput
%option noyywrap
%option warn
%option prefix="CONF_yy"
SIGN ("-"|"+")
DIGIT [0-9]
HEXDIGIT [0-9a-fA-F]
UNIT_LETTER [a-zA-Z]
INTEGER {SIGN}?({DIGIT}+|0x{HEXDIGIT}+){UNIT_LETTER}*
EXPONENT [Ee]{SIGN}?{DIGIT}+
REAL {SIGN}?{DIGIT}*"."{DIGIT}*{EXPONENT}?
LETTER [A-Za-z_\200-\377]
LETTER_OR_DIGIT [A-Za-z_0-9\200-\377]
ID {LETTER}{LETTER_OR_DIGIT}*
QUALIFIED_ID {ID}"."{ID}
UNQUOTED_STRING {LETTER}({LETTER_OR_DIGIT}|[-._:/])*
STRING \'([^'\\\n]|\\.|\'\')*\'
%%
\n ConfigFileLineno++; return CONF_EOL;
[ \t\r]+ /* eat whitespace */
#.* /* eat comment (.* matches anything until newline) */
{ID} return CONF_ID;
{QUALIFIED_ID} return CONF_QUALIFIED_ID;
{STRING} return CONF_STRING;
{UNQUOTED_STRING} return CONF_UNQUOTED_STRING;
{INTEGER} return CONF_INTEGER;
{REAL} return CONF_REAL;
= return CONF_EQUALS;
. return CONF_ERROR;
%%
extern bool
ProcessRepmgrConfigFile(const char *config_file, const char *base_dir, ItemList *error_list, ItemList *warning_list)
{
return ProcessConfigFile(base_dir, config_file, NULL, true, 0, NULL, error_list, warning_list);
}
extern bool
ProcessPostgresConfigFile(const char *config_file, const char *base_dir, bool strict, KeyValueList *contents, ItemList *error_list, ItemList *warning_list)
{
return ProcessConfigFile(base_dir, config_file, NULL, strict, 0, contents, error_list, warning_list);
}
static bool
ProcessConfigFile(const char *base_dir, const char *config_file, const char *calling_file, bool strict, int depth, KeyValueList *contents, ItemList *error_list, ItemList *warning_list)
{
char *abs_path;
bool success = true;
FILE *fp;
/*
* Reject file name that is all-blank (including empty), as that leads to
* confusion --- we'd try to read the containing directory as a file.
*/
if (strspn(config_file, " \t\r\n") == strlen(config_file))
{
return false;
}
/*
* Reject too-deep include nesting depth. This is just a safety check to
* avoid dumping core due to stack overflow if an include file loops back
* to itself. The maximum nesting depth is pretty arbitrary.
*/
if (depth > 10)
{
item_list_append_format(error_list,
_("could not open configuration file \"%s\": maximum nesting depth exceeded"),
config_file);
return false;
}
abs_path = AbsoluteConfigLocation(base_dir, config_file, calling_file);
/* Reject direct recursion */
if (calling_file && strcmp(abs_path, calling_file) == 0)
{
item_list_append_format(error_list,
_("configuration file recursion in \"%s\""),
calling_file);
pfree(abs_path);
return false;
}
fp = fopen(abs_path, "r");
if (!fp)
{
if (strict == false)
{
item_list_append_format(error_list,
"skipping configuration file \"%s\"",
abs_path);
}
else
{
item_list_append_format(error_list,
"could not open configuration file \"%s\": %s",
abs_path,
strerror(errno));
success = false;
}
}
else
{
success = ProcessConfigFp(fp, abs_path, calling_file, depth + 1, base_dir, contents, error_list, warning_list);
}
free(abs_path);
return success;
}
static bool
ProcessConfigFp(FILE *fp, const char *config_file, const char *calling_file, int depth, const char *base_dir, KeyValueList *contents, ItemList *error_list, ItemList *warning_list)
{
volatile bool OK = true;
volatile YY_BUFFER_STATE lex_buffer = NULL;
sigjmp_buf flex_fatal_jmp;
int errorcount;
int token;
if (sigsetjmp(flex_fatal_jmp, 1) == 0)
{
CONF_flex_fatal_jmp = &flex_fatal_jmp;
}
else
{
/*
* Regain control after a fatal, internal flex error. It may have
* corrupted parser state. Consequently, abandon the file, but trust
* that the state remains sane enough for yy_delete_buffer().
*/
item_list_append_format(error_list,
"%s at file \"%s\" line %u",
CONF_flex_fatal_errmsg, config_file, ConfigFileLineno);
OK = false;
goto cleanup;
}
/*
* Parse
*/
ConfigFileLineno = 1;
errorcount = 0;
lex_buffer = yy_create_buffer(fp, YY_BUF_SIZE);
yy_switch_to_buffer(lex_buffer);
/* This loop iterates once per logical line */
while ((token = yylex()))
{
char *opt_name = NULL;
char *opt_value = NULL;
if (token == CONF_EOL) /* empty or comment line */
continue;
/* first token on line is option name */
if (token != CONF_ID && token != CONF_QUALIFIED_ID)
goto parse_error;
opt_name = pstrdup(yytext);
/* next we have an optional equal sign; discard if present */
token = yylex();
if (token == CONF_EQUALS)
token = yylex();
/* now we must have the option value */
if (token != CONF_ID &&
token != CONF_STRING &&
token != CONF_INTEGER &&
token != CONF_REAL &&
token != CONF_UNQUOTED_STRING)
goto parse_error;
if (token == CONF_STRING) /* strip quotes and escapes */
opt_value = CONF_scanstr(yytext);
else
opt_value = pstrdup(yytext);
/* now we'd like an end of line, or possibly EOF */
token = yylex();
if (token != CONF_EOL)
{
if (token != 0)
goto parse_error;
/* treat EOF like \n for line numbering purposes, cf bug 4752 */
ConfigFileLineno++;
}
/* Handle include files */
if (base_dir != NULL && strcasecmp(opt_name, "include_dir") == 0)
{
/*
* An include_dir directive isn't a variable and should be
* processed immediately.
*/
if (!ProcessConfigDirectory(base_dir, opt_value, config_file,
depth + 1, contents,
error_list, warning_list))
OK = false;
yy_switch_to_buffer(lex_buffer);
pfree(opt_name);
pfree(opt_value);
}
else if (base_dir != NULL && strcasecmp(opt_name, "include_if_exists") == 0)
{
if (!ProcessConfigFile(base_dir, opt_value, config_file,
false, depth + 1, contents,
error_list, warning_list))
OK = false;
yy_switch_to_buffer(lex_buffer);
pfree(opt_name);
pfree(opt_value);
}
else if (base_dir != NULL && strcasecmp(opt_name, "include") == 0)
{
if (!ProcessConfigFile(base_dir, opt_value, config_file,
true, depth + 1, contents,
error_list, warning_list))
OK = false;
yy_switch_to_buffer(lex_buffer);
pfree(opt_name);
pfree(opt_value);
}
else
{
/* OK, process the option name and value */
if (contents != NULL)
{
key_value_list_replace_or_set(contents,
opt_name,
opt_value);
}
else
{
parse_configuration_item(error_list,
warning_list,
opt_name,
opt_value);
}
}
/* break out of loop if read EOF, else loop for next line */
if (token == 0)
break;
continue;
parse_error:
/* release storage if we allocated any on this line */
if (opt_name)
pfree(opt_name);
if (opt_value)
pfree(opt_value);
/* report the error */
if (token == CONF_EOL || token == 0)
{
item_list_append_format(error_list,
_("syntax error in file \"%s\" line %u, near end of line"),
config_file, ConfigFileLineno - 1);
}
else
{
item_list_append_format(error_list,
_("syntax error in file \"%s\" line %u, near token \"%s\""),
config_file, ConfigFileLineno, yytext);
}
OK = false;
errorcount++;
/*
* To avoid producing too much noise when fed a totally bogus file,
* give up after 100 syntax errors per file (an arbitrary number).
* Also, if we're only logging the errors at DEBUG level anyway, might
* as well give up immediately. (This prevents postmaster children
* from bloating the logs with duplicate complaints.)
*/
if (errorcount >= 100)
{
fprintf(stderr,
_("too many syntax errors found, abandoning file \"%s\"\n"),
config_file);
break;
}
/* resync to next end-of-line or EOF */
while (token != CONF_EOL && token != 0)
token = yylex();
/* break out of loop on EOF */
if (token == 0)
break;
}
cleanup:
yy_delete_buffer(lex_buffer);
return OK;
}
/*
* Read and parse all config files in a subdirectory in alphabetical order
*
* includedir is the absolute or relative path to the subdirectory to scan.
*
* See ProcessConfigFp for further details.
*/
static bool
ProcessConfigDirectory(const char *base_dir, const char *includedir, const char *calling_file, int depth, KeyValueList *contents, ItemList *error_list, ItemList *warning_list)
{
char *directory;
DIR *d;
struct dirent *de;
char **filenames;
int num_filenames;
int size_filenames;
bool status;
/*
* Reject directory name that is all-blank (including empty), as that
* leads to confusion --- we'd read the containing directory, typically
* resulting in recursive inclusion of the same file(s).
*/
if (strspn(includedir, " \t\r\n") == strlen(includedir))
{
item_list_append_format(error_list,
_("empty configuration directory name: \"%s\""),
includedir);
return false;
}
directory = AbsoluteConfigLocation(base_dir, includedir, calling_file);
d = opendir(directory);
if (d == NULL)
{
item_list_append_format(error_list,
_("could not open configuration directory \"%s\": %s"),
directory,
strerror(errno));
status = false;
goto cleanup;
}
/*
* Read the directory and put the filenames in an array, so we can sort
* them prior to processing the contents.
*/
size_filenames = 32;
filenames = (char **) palloc(size_filenames * sizeof(char *));
num_filenames = 0;
while ((de = readdir(d)) != NULL)
{
struct stat st;
char filename[MAXPGPATH];
/*
* Only parse files with names ending in ".conf". Explicitly reject
* files starting with ".". This excludes things like "." and "..",
* as well as typical hidden files, backup files, and editor debris.
*/
if (strlen(de->d_name) < 6)
continue;
if (de->d_name[0] == '.')
continue;
if (strcmp(de->d_name + strlen(de->d_name) - 5, ".conf") != 0)
continue;
join_path_components(filename, directory, de->d_name);
canonicalize_path(filename);
if (stat(filename, &st) == 0)
{
if (!S_ISDIR(st.st_mode))
{
/* Add file to array, increasing its size in blocks of 32 */
if (num_filenames >= size_filenames)
{
size_filenames += 32;
filenames = (char **) repalloc(filenames,
size_filenames * sizeof(char *));
}
filenames[num_filenames] = pstrdup(filename);
num_filenames++;
}
}
else
{
/*
* stat does not care about permissions, so the most likely reason
* a file can't be accessed now is if it was removed between the
* directory listing and now.
*/
item_list_append_format(error_list,
_("could not stat file \"%s\": %s"),
filename, strerror(errno));
status = false;
goto cleanup;
}
}
if (num_filenames > 0)
{
int i;
qsort(filenames, num_filenames, sizeof(char *), pg_qsort_strcmp);
for (i = 0; i < num_filenames; i++)
{
if (!ProcessConfigFile(base_dir, filenames[i], calling_file,
true, depth, contents,
error_list, warning_list))
{
status = false;
goto cleanup;
}
}
}
status = true;
cleanup:
if (d)
closedir(d);
pfree(directory);
return status;
}
/*
* scanstr
*
* Strip the quotes surrounding the given string, and collapse any embedded
* '' sequences and backslash escapes.
*
* the string returned is palloc'd and should eventually be pfree'd by the
* caller.
*/
static char *
CONF_scanstr(const char *s)
{
char *newStr;
int len,
i,
j;
Assert(s != NULL && s[0] == '\'');
len = strlen(s);
Assert(s != NULL);
Assert(len >= 2);
Assert(s[len - 1] == '\'');
/* Skip the leading quote; we'll handle the trailing quote below */
s++, len--;
/* Since len still includes trailing quote, this is enough space */
newStr = palloc(len);
for (i = 0, j = 0; i < len; i++)
{
if (s[i] == '\\')
{
i++;
switch (s[i])
{
case 'b':
newStr[j] = '\b';
break;
case 'f':
newStr[j] = '\f';
break;
case 'n':
newStr[j] = '\n';
break;
case 'r':
newStr[j] = '\r';
break;
case 't':
newStr[j] = '\t';
break;
case '0':
case '1':
case '2':
case '3':
case '4':
case '5':
case '6':
case '7':
{
int k;
long octVal = 0;
for (k = 0;
s[i + k] >= '0' && s[i + k] <= '7' && k < 3;
k++)
octVal = (octVal << 3) + (s[i + k] - '0');
i += k - 1;
newStr[j] = ((char) octVal);
}
break;
default:
newStr[j] = s[i];
break;
} /* switch */
}
else if (s[i] == '\'' && s[i + 1] == '\'')
{
/* doubled quote becomes just one quote */
newStr[j] = s[++i];
}
else
newStr[j] = s[i];
j++;
}
/* We copied the ending quote to newStr, so replace with \0 */
Assert(j > 0 && j <= len);
newStr[--j] = '\0';
return newStr;
}
/*
* Given a configuration file or directory location that may be a relative
* path, return an absolute one. We consider the location to be relative to
* the directory holding the calling file, or to DataDir if no calling file.
*/
static char *
AbsoluteConfigLocation(const char *base_dir, const char *location, const char *calling_file)
{
char abs_path[MAXPGPATH];
if (is_absolute_path(location))
return strdup(location);
if (calling_file != NULL)
{
strlcpy(abs_path, calling_file, sizeof(abs_path));
get_parent_directory(abs_path);
join_path_components(abs_path, abs_path, location);
canonicalize_path(abs_path);
}
else if (base_dir != NULL)
{
join_path_components(abs_path, base_dir, location);
canonicalize_path(abs_path);
}
else
{
strlcpy(abs_path, location, sizeof(abs_path));
}
return strdup(abs_path);
}
/*
* Flex fatal errors bring us here. Stash the error message and jump back to
* ParseConfigFp(). Assume all msg arguments point to string constants; this
* holds for flex 2.5.31 (earliest we support) and flex 2.5.35 (latest as of
* this writing). Otherwise, we would need to copy the message.
*
* We return "int" since this takes the place of calls to fprintf().
*/
static int
CONF_flex_fatal(const char *msg)
{
CONF_flex_fatal_errmsg = msg;
siglongjmp(*CONF_flex_fatal_jmp, 1);
return 0; /* keep compiler quiet */
}

File diff suppressed because it is too large Load Diff

View File

@@ -1,7 +1,7 @@
/*
* configfile.h
*
* Copyright (c) 2ndQuadrant, 2010-2017
* Copyright (c) EnterpriseDB Corporation, 2010-2021
*
*
* This program is free software: you can redistribute it and/or modify
@@ -28,6 +28,12 @@
/* magic number for use in t_recovery_conf */
#define TARGET_TIMELINE_LATEST 0
/*
* This is defined in src/include/utils.h, however it's not practical
* to include that from a frontend application.
*/
#define PG_AUTOCONF_FILENAME "postgresql.auto.conf"
extern bool config_file_found;
extern char config_file_path[MAXPGPATH];
@@ -37,6 +43,18 @@ typedef enum
FAILOVER_AUTOMATIC
} failover_mode_opt;
typedef enum
{
CHECK_PING,
CHECK_QUERY,
CHECK_CONNECTION
} ConnectionCheckType;
typedef enum
{
REPLICATION_TYPE_PHYSICAL
} ReplicationType;
typedef struct EventNotificationListCell
{
struct EventNotificationListCell *next;
@@ -65,31 +83,108 @@ typedef struct TablespaceList
} TablespaceList;
typedef enum
{
CONFIG_BOOL,
CONFIG_INT,
CONFIG_STRING,
CONFIG_FAILOVER_MODE,
CONFIG_CONNECTION_CHECK_TYPE,
CONFIG_EVENT_NOTIFICATION_LIST,
CONFIG_TABLESPACE_MAPPING,
CONFIG_REPLICATION_TYPE
} ConfigItemType;
typedef struct ConfigFileSetting
{
const char *name;
ConfigItemType type;
union
{
int *intptr;
char *strptr;
bool *boolptr;
failover_mode_opt *failovermodeptr;
ConnectionCheckType *checktypeptr;
EventNotificationList *notificationlistptr;
TablespaceList *tablespacemappingptr;
ReplicationType *replicationtypeptr;
} val;
union {
int intdefault;
const char *strdefault;
bool booldefault;
failover_mode_opt failovermodedefault;
ConnectionCheckType checktypedefault;
ReplicationType replicationtypedefault;
} defval;
union {
int intminval;
} minval;
union {
int strmaxlen;
} maxval;
struct {
void (*process_func)(const char *, const char *, char *, ItemList *errors);
void (*postprocess_func)(const char *, const char *, char *, ItemList *errors);
bool *providedptr;
} process;
} ConfigFileSetting;
/* Declare the main configfile structure for client applications */
extern ConfigFileSetting config_file_settings[];
typedef struct
{
/* node information */
int node_id;
char node_name[MAXLEN];
char node_name[NAMEDATALEN];
char conninfo[MAXLEN];
char replication_user[NAMEDATALEN];
char data_directory[MAXPGPATH];
char config_directory[MAXPGPATH];
char pg_bindir[MAXPGPATH];
int replication_type;
char repmgr_bindir[MAXPGPATH];
ReplicationType replication_type;
/* log settings */
char log_level[MAXLEN];
char log_facility[MAXLEN];
char log_file[MAXLEN];
char log_file[MAXPGPATH];
int log_status_interval;
/* standby action settings */
/* standby clone settings */
bool use_replication_slots;
char pg_basebackup_options[MAXLEN];
char restore_command[MAXLEN];
TablespaceList tablespace_mapping;
char recovery_min_apply_delay[MAXLEN];
bool recovery_min_apply_delay_provided;
char archive_cleanup_command[MAXLEN];
bool use_primary_conninfo_password;
char passfile[MAXPGPATH];
char pg_backupapi_backup_id[NAMEDATALEN];
char pg_backupapi_host[NAMEDATALEN];
char pg_backupapi_node_name[NAMEDATALEN];
char pg_backupapi_remote_ssh_command[MAXLEN];
/* standby promote settings */
int promote_check_timeout;
int promote_check_interval;
/* standby follow settings */
int primary_follow_timeout;
int standby_follow_timeout;
bool standby_follow_restart;
/* standby switchover settings */
int shutdown_check_timeout;
int standby_reconnect_timeout;
int wal_receive_check_timeout;
/* node rejoin settings */
int node_rejoin_timeout;
/* node check settings */
int archive_ready_warning;
@@ -97,6 +192,9 @@ typedef struct
int replication_lag_warning;
int replication_lag_critical;
/* witness settings */
int witness_sync_interval;
/* repmgrd settings */
failover_mode_opt failover;
char location[MAXLEN];
@@ -110,22 +208,37 @@ typedef struct
int degraded_monitoring_timeout;
int async_query_timeout;
int primary_notification_timeout;
int primary_follow_timeout;
/* BDR settings */
bool bdr_local_monitoring_only;
bool bdr_recovery_timeout;
int repmgrd_standby_startup_timeout;
char repmgrd_pid_file[MAXPGPATH];
bool repmgrd_exit_on_inactive_node;
bool standby_disconnect_on_failover;
int sibling_nodes_disconnect_timeout;
ConnectionCheckType connection_check_type;
bool primary_visibility_consensus;
bool always_promote;
char failover_validation_command[MAXPGPATH];
int election_rerun_interval;
int child_nodes_check_interval;
int child_nodes_disconnect_min_count;
int child_nodes_connected_min_count;
bool child_nodes_connected_include_witness;
int child_nodes_disconnect_timeout;
char child_nodes_disconnect_command[MAXPGPATH];
/* service settings */
char pg_ctl_options[MAXLEN];
char service_stop_command[MAXLEN];
char service_start_command[MAXLEN];
char service_restart_command[MAXLEN];
char service_reload_command[MAXLEN];
char service_promote_command[MAXLEN];
char service_start_command[MAXPGPATH];
char service_stop_command[MAXPGPATH];
char service_restart_command[MAXPGPATH];
char service_reload_command[MAXPGPATH];
char service_promote_command[MAXPGPATH];
/* repmgrd service settings */
char repmgrd_service_start_command[MAXPGPATH];
char repmgrd_service_stop_command[MAXPGPATH];
/* event notification settings */
char event_notification_command[MAXLEN];
char event_notification_command[MAXPGPATH];
char event_notifications_orig[MAXLEN];
EventNotificationList event_notifications;
@@ -138,58 +251,35 @@ typedef struct
char rsync_options[MAXLEN];
char ssh_options[MAXLEN];
/* undocumented test settings */
/*
* undocumented settings
*
* These settings are for testing or experimental features
* and may be changed without notice.
*/
/* experimental settings */
bool reconnect_loop_sync;
/* test settings */
int promote_delay;
int failover_delay;
char connection_check_query[MAXLEN];
} t_configuration_options;
/*
* The following will initialize the structure with a minimal set of options;
* actual defaults are set in parse_config() before parsing the configuration file
*/
#define T_CONFIGURATION_OPTIONS_INITIALIZER { \
/* node information */ \
UNKNOWN_NODE_ID, "", "", "", "", "", REPLICATION_TYPE_PHYSICAL, \
/* log settings */ \
"", "", "", DEFAULT_LOG_STATUS_INTERVAL, \
/* standby action settings */ \
false, "", "", { NULL, NULL }, "", false, false, \
/* node check settings */ \
DEFAULT_ARCHIVE_READY_WARNING, DEFAULT_ARCHIVE_READY_CRITICAL, \
DEFAULT_REPLICATION_LAG_WARNING, DEFAULT_REPLICATION_LAG_CRITICAL, \
/* repmgrd settings */ \
FAILOVER_MANUAL, DEFAULT_LOCATION, DEFAULT_PRIORITY, "", "", \
DEFAULT_MONITORING_INTERVAL, \
DEFAULT_RECONNECTION_ATTEMPTS, \
DEFAULT_RECONNECTION_INTERVAL, \
false, -1, \
DEFAULT_ASYNC_QUERY_TIMEOUT, \
DEFAULT_PRIMARY_NOTIFICATION_TIMEOUT, \
DEFAULT_PRIMARY_FOLLOW_TIMEOUT, \
/* BDR settings */ \
false, DEFAULT_BDR_RECOVERY_TIMEOUT, \
/* service settings */ \
"", "", "", "", "", "", \
/* event notification settings */ \
"", "", { NULL, NULL }, \
/* barman settings */ \
"", "", "", \
/* rsync/ssh settings */ \
"", "", \
/* undocumented test settings */ \
0 \
}
/* Declare the main configfile structure for client applications */
extern t_configuration_options config_file_options;
typedef struct
{
char slot[MAXLEN];
char xlog_method[MAXLEN];
char wal_method[MAXLEN];
char waldir[MAXPGPATH];
bool no_slot; /* from PostgreSQL 10 */
} t_basebackup_options;
#define T_BASEBACKUP_OPTIONS_INITIALIZER { "", "", false }
#define T_BASEBACKUP_OPTIONS_INITIALIZER { "", "", "", false }
typedef enum
@@ -241,30 +331,53 @@ typedef struct
"", "", "", "" \
}
#include "dbutils.h"
void set_progname(const char *argv0);
const char *progname(void);
void load_config(const char *config_file, bool verbose, bool terse, t_configuration_options *options, char *argv0);
void parse_config(t_configuration_options *options, bool terse);
bool reload_config(t_configuration_options *orig_options);
void load_config(const char *config_file, bool verbose, bool terse, char *argv0);
bool reload_config(t_server_type server_type);
void dump_config(void);
void parse_configuration_item(ItemList *error_list, ItemList *warning_list, const char *name, const char *value);
bool parse_recovery_conf(const char *data_dir, t_recovery_conf *conf);
bool parse_bool(const char *s,
const char *config_item,
ItemList *error_list);
int repmgr_atoi(const char *s,
const char *config_item,
ItemList *error_list,
int minval);
void parse_time_unit_parameter(const char *name, const char *value, char *dest, ItemList *errors);
void repmgr_canonicalize_path(const char *name, const char *value, char *config_item, ItemList *errors);
bool parse_pg_basebackup_options(const char *pg_basebackup_options,
t_basebackup_options *backup_options,
int server_version_num,
ItemList *error_list);
int parse_output_to_argv(const char *string, char ***argv_array);
void free_parsed_argv(char ***argv_array);
const char *format_failover_mode(failover_mode_opt failover);
/* called by repmgr-client and repmgrd */
void exit_with_cli_errors(ItemList *error_list);
void exit_with_cli_errors(ItemList *error_list, const char *repmgr_command);
void print_item_list(ItemList *item_list);
const char *print_replication_type(ReplicationType type);
const char *print_connection_check_type(ConnectionCheckType type);
char *print_event_notification_list(EventNotificationList *list);
char *print_tablespace_mapping(TablespaceList *tablespacemappingptr);
extern bool modify_auto_conf(const char *data_dir, KeyValueList *items);
extern bool ProcessRepmgrConfigFile(const char *config_file, const char *base_dir, ItemList *error_list, ItemList *warning_list);
extern bool ProcessPostgresConfigFile(const char *config_file, const char *base_dir, bool strict, KeyValueList *contents, ItemList *error_list, ItemList *warning_list);
#endif /* _REPMGR_CONFIGFILE_H_ */

199
configure vendored
View File

@@ -1,8 +1,8 @@
#! /bin/sh
# Guess values for system-dependent variables and create Makefiles.
# Generated by GNU Autoconf 2.69 for repmgr 4.0.
# Generated by GNU Autoconf 2.69 for repmgr 5.4.0.
#
# Report bugs to <pgsql-bugs@postgresql.org>.
# Report bugs to <repmgr@googlegroups.com>.
#
#
# Copyright (C) 1992-1996, 1998-2012 Free Software Foundation, Inc.
@@ -11,7 +11,7 @@
# This configure script is free software; the Free Software Foundation
# gives unlimited permission to copy, distribute and modify it.
#
# Copyright (c) 2010-2017, 2ndQuadrant Ltd.
# Copyright (c) 2010-2021, EnterpriseDB Corporation
## -------------------- ##
## M4sh Initialization. ##
## -------------------- ##
@@ -269,7 +269,7 @@ fi
$as_echo "$0: be upgraded to zsh 4.3.4 or later."
else
$as_echo "$0: Please tell bug-autoconf@gnu.org and
$0: pgsql-bugs@postgresql.org about your system, including
$0: repmgr@googlegroups.com about your system, including
$0: any error possibly output before this message. Then
$0: install a modern shell, or manually run the script
$0: under such a shell if you do have one."
@@ -582,13 +582,16 @@ MAKEFLAGS=
# Identity of this package.
PACKAGE_NAME='repmgr'
PACKAGE_TARNAME='repmgr'
PACKAGE_VERSION='4.0'
PACKAGE_STRING='repmgr 4.0'
PACKAGE_BUGREPORT='pgsql-bugs@postgresql.org'
PACKAGE_URL='https://2ndquadrant.com/en/resources/repmgr/'
PACKAGE_VERSION='5.4.0'
PACKAGE_STRING='repmgr 5.4.0'
PACKAGE_BUGREPORT='repmgr@googlegroups.com'
PACKAGE_URL='https://repmgr.org/'
ac_subst_vars='LTLIBOBJS
LIBOBJS
HAVE_SED
HAVE_GSED
HAVE_GNUSED
vpath_build
SED
PG_CONFIG
@@ -633,7 +636,6 @@ SHELL'
ac_subst_files=''
ac_user_opts='
enable_option_checking
with_bdr_only
'
ac_precious_vars='build_alias
host_alias
@@ -1179,7 +1181,7 @@ if test "$ac_init_help" = "long"; then
# Omit some internal or obsolete options to make the list less imposing.
# This message is too long to be a string in the A/UX 3.1 sh.
cat <<_ACEOF
\`configure' configures repmgr 4.0 to adapt to many kinds of systems.
\`configure' configures repmgr 5.4.0 to adapt to many kinds of systems.
Usage: $0 [OPTION]... [VAR=VALUE]...
@@ -1240,23 +1242,18 @@ fi
if test -n "$ac_init_help"; then
case $ac_init_help in
short | recursive ) echo "Configuration of repmgr 4.0:";;
short | recursive ) echo "Configuration of repmgr 5.4.0:";;
esac
cat <<\_ACEOF
Optional Packages:
--with-PACKAGE[=ARG] use PACKAGE [ARG=yes]
--without-PACKAGE do not use PACKAGE (same as --with-PACKAGE=no)
--with-bdr-only BDR-only build
Some influential environment variables:
PG_CONFIG Location to find pg_config for target PostgreSQL (default PATH)
Use these variables to override the choices made by `configure' or to help
it to find libraries and programs with nonstandard names/locations.
Report bugs to <pgsql-bugs@postgresql.org>.
repmgr home page: <https://2ndquadrant.com/en/resources/repmgr/>.
Report bugs to <repmgr@googlegroups.com>.
repmgr home page: <https://repmgr.org/>.
_ACEOF
ac_status=$?
fi
@@ -1319,14 +1316,14 @@ fi
test -n "$ac_init_help" && exit $ac_status
if $ac_init_version; then
cat <<\_ACEOF
repmgr configure 4.0
repmgr configure 5.4.0
generated by GNU Autoconf 2.69
Copyright (C) 2012 Free Software Foundation, Inc.
This configure script is free software; the Free Software Foundation
gives unlimited permission to copy, distribute and modify it.
Copyright (c) 2010-2017, 2ndQuadrant Ltd.
Copyright (c) 2010-2021, EnterpriseDB Corporation
_ACEOF
exit
fi
@@ -1338,7 +1335,7 @@ cat >config.log <<_ACEOF
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.
It was created by repmgr $as_me 4.0, which was
It was created by repmgr $as_me 5.4.0, which was
generated by GNU Autoconf 2.69. Invocation command line was
$ $0 $@
@@ -1694,20 +1691,6 @@ ac_config_headers="$ac_config_headers config.h"
# Check whether --with-bdr_only was given.
if test "${with_bdr_only+set}" = set; then :
withval=$with_bdr_only;
fi
if test "x$with_bdr_only" != "x"; then :
$as_echo "#define BDR_ONLY \"1\"" >>confdefs.h
fi
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for a sed that does not truncate output" >&5
$as_echo_n "checking for a sed that does not truncate output... " >&6; }
if ${ac_cv_path_SED+:} false; then :
@@ -1828,11 +1811,11 @@ fi
pgac_pg_config_version=$($PG_CONFIG --version 2>/dev/null)
major_version_num=$(echo "$pgac_pg_config_version"|
$SED -e 's/^PostgreSQL \([0-9]\{1,2\}\).*$/\1/')
$SED -e 's/^[^0-9]\+ \([0-9]\{1,2\}\).*$/\1/')
if test "$major_version_num" -lt '10'; then
version_num=$(echo "$pgac_pg_config_version"|
$SED -e 's/^PostgreSQL \([0-9]*\)\.\([0-9]*\)\([a-zA-Z0-9.]*\)$/\1.\2/')
$SED -e 's/^[^0-9]\+ \([0-9]*\)\.\([0-9]*\)\([a-zA-Z0-9.]*\)$/\1.\2/')
if test -z "$version_num"; then
as_fn_error $? "could not detect the PostgreSQL version, wrong or broken pg_config?" "$LINENO" 5
@@ -1841,12 +1824,12 @@ if test "$major_version_num" -lt '10'; then
version_num_int=$(echo "$version_num"|
$SED -e 's/^\([0-9]*\)\.\([0-9]*\)$/\1\2/')
if test "$version_num_int" -lt '93'; then
if test "$version_num_int" -lt '94'; then
as_fn_error $? "repmgr is not compatible with detected PostgreSQL version: $version_num" "$LINENO" 5
fi
else
version_num=$(echo "$pgac_pg_config_version"|
$SED -e 's/^PostgreSQL \(.\+\)$/\1/')
$SED -e 's/^[^0-9]\+ \(.\+\)$/\1/')
if test -z "$version_num"; then
as_fn_error $? "could not detect the PostgreSQL version, wrong or broken pg_config?" "$LINENO" 5
@@ -1867,12 +1850,137 @@ else
fi
# Extract the first word of "gnused", so it can be a program name with args.
set dummy gnused; ac_word=$2
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
$as_echo_n "checking for $ac_word... " >&6; }
if ${ac_cv_prog_HAVE_GNUSED+:} false; then :
$as_echo_n "(cached) " >&6
else
if test -n "$HAVE_GNUSED"; then
ac_cv_prog_HAVE_GNUSED="$HAVE_GNUSED" # Let the user override the test.
else
as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
for as_dir in $PATH
do
IFS=$as_save_IFS
test -z "$as_dir" && as_dir=.
for ac_exec_ext in '' $ac_executable_extensions; do
if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
ac_cv_prog_HAVE_GNUSED="yes"
$as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5
break 2
fi
done
done
IFS=$as_save_IFS
test -z "$ac_cv_prog_HAVE_GNUSED" && ac_cv_prog_HAVE_GNUSED="no"
fi
fi
HAVE_GNUSED=$ac_cv_prog_HAVE_GNUSED
if test -n "$HAVE_GNUSED"; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $HAVE_GNUSED" >&5
$as_echo "$HAVE_GNUSED" >&6; }
else
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
$as_echo "no" >&6; }
fi
# Extract the first word of "gsed", so it can be a program name with args.
set dummy gsed; ac_word=$2
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
$as_echo_n "checking for $ac_word... " >&6; }
if ${ac_cv_prog_HAVE_GSED+:} false; then :
$as_echo_n "(cached) " >&6
else
if test -n "$HAVE_GSED"; then
ac_cv_prog_HAVE_GSED="$HAVE_GSED" # Let the user override the test.
else
as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
for as_dir in $PATH
do
IFS=$as_save_IFS
test -z "$as_dir" && as_dir=.
for ac_exec_ext in '' $ac_executable_extensions; do
if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
ac_cv_prog_HAVE_GSED="yes"
$as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5
break 2
fi
done
done
IFS=$as_save_IFS
test -z "$ac_cv_prog_HAVE_GSED" && ac_cv_prog_HAVE_GSED="no"
fi
fi
HAVE_GSED=$ac_cv_prog_HAVE_GSED
if test -n "$HAVE_GSED"; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $HAVE_GSED" >&5
$as_echo "$HAVE_GSED" >&6; }
else
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
$as_echo "no" >&6; }
fi
# Extract the first word of "sed", so it can be a program name with args.
set dummy sed; ac_word=$2
{ $as_echo "$as_me:${as_lineno-$LINENO}: checking for $ac_word" >&5
$as_echo_n "checking for $ac_word... " >&6; }
if ${ac_cv_prog_HAVE_SED+:} false; then :
$as_echo_n "(cached) " >&6
else
if test -n "$HAVE_SED"; then
ac_cv_prog_HAVE_SED="$HAVE_SED" # Let the user override the test.
else
as_save_IFS=$IFS; IFS=$PATH_SEPARATOR
for as_dir in $PATH
do
IFS=$as_save_IFS
test -z "$as_dir" && as_dir=.
for ac_exec_ext in '' $ac_executable_extensions; do
if as_fn_executable_p "$as_dir/$ac_word$ac_exec_ext"; then
ac_cv_prog_HAVE_SED="yes"
$as_echo "$as_me:${as_lineno-$LINENO}: found $as_dir/$ac_word$ac_exec_ext" >&5
break 2
fi
done
done
IFS=$as_save_IFS
test -z "$ac_cv_prog_HAVE_SED" && ac_cv_prog_HAVE_SED="no"
fi
fi
HAVE_SED=$ac_cv_prog_HAVE_SED
if test -n "$HAVE_SED"; then
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $HAVE_SED" >&5
$as_echo "$HAVE_SED" >&6; }
else
{ $as_echo "$as_me:${as_lineno-$LINENO}: result: no" >&5
$as_echo "no" >&6; }
fi
if test "$HAVE_GNUSED" = yes; then
SED=gnused
else
if test "$HAVE_GSED" = yes; then
SED=gsed
else
SED=sed
fi
fi
ac_config_files="$ac_config_files Makefile"
ac_config_files="$ac_config_files Makefile.global"
ac_config_files="$ac_config_files doc/Makefile"
cat >confcache <<\_ACEOF
# This file is a shell script that caches the results of configure
# tests run on this system so they can be shared between configure
@@ -2379,7 +2487,7 @@ cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
# report actual input values of CONFIG_FILES etc. instead of their
# values after options handling.
ac_log="
This file was extended by repmgr $as_me 4.0, which was
This file was extended by repmgr $as_me 5.4.0, which was
generated by GNU Autoconf 2.69. Invocation command line was
CONFIG_FILES = $CONFIG_FILES
@@ -2435,14 +2543,14 @@ $config_files
Configuration headers:
$config_headers
Report bugs to <pgsql-bugs@postgresql.org>.
repmgr home page: <https://2ndquadrant.com/en/resources/repmgr/>."
Report bugs to <repmgr@googlegroups.com>.
repmgr home page: <https://repmgr.org/>."
_ACEOF
cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
ac_cs_config="`$as_echo "$ac_configure_args" | sed 's/^ //; s/[\\""\`\$]/\\\\&/g'`"
ac_cs_version="\\
repmgr config.status 4.0
repmgr config.status 5.4.0
configured by $0, generated by GNU Autoconf 2.69,
with options \\"\$ac_cs_config\\"
@@ -2566,7 +2674,6 @@ do
"config.h") CONFIG_HEADERS="$CONFIG_HEADERS config.h" ;;
"Makefile") CONFIG_FILES="$CONFIG_FILES Makefile" ;;
"Makefile.global") CONFIG_FILES="$CONFIG_FILES Makefile.global" ;;
"doc/Makefile") CONFIG_FILES="$CONFIG_FILES doc/Makefile" ;;
*) as_fn_error $? "invalid argument: \`$ac_config_target'" "$LINENO" 5;;
esac

View File

@@ -1,17 +1,11 @@
AC_INIT([repmgr], [4.0], [pgsql-bugs@postgresql.org], [repmgr], [https://2ndquadrant.com/en/resources/repmgr/])
AC_INIT([repmgr], [5.5.0], [repmgr@googlegroups.com], [repmgr], [https://repmgr.org/])
AC_COPYRIGHT([Copyright (c) 2010-2017, 2ndQuadrant Ltd.])
AC_COPYRIGHT([Copyright (c) 2010-2024, EnterpriseDB Corporation])
AC_CONFIG_HEADER(config.h)
AC_ARG_VAR([PG_CONFIG], [Location to find pg_config for target PostgreSQL (default PATH)])
AC_ARG_WITH([bdr_only], [AS_HELP_STRING([--with-bdr-only], [BDR-only build])])
AS_IF([test "x$with_bdr_only" != "x"],
[AC_DEFINE([BDR_ONLY], ["1"], [Only build repmgr for BDR])]
)
AC_PROG_SED
if test -z "$PG_CONFIG"; then
@@ -25,11 +19,11 @@ fi
pgac_pg_config_version=$($PG_CONFIG --version 2>/dev/null)
major_version_num=$(echo "$pgac_pg_config_version"|
$SED -e 's/^PostgreSQL \([[0-9]]\{1,2\}\).*$/\1/')
$SED -e 's/^[[^0-9]]\+ \([[0-9]]\{1,2\}\).*$/\1/')
if test "$major_version_num" -lt '10'; then
version_num=$(echo "$pgac_pg_config_version"|
$SED -e 's/^PostgreSQL \([[0-9]]*\)\.\([[0-9]]*\)\([[a-zA-Z0-9.]]*\)$/\1.\2/')
$SED -e 's/^[[^0-9]]\+ \([[0-9]]*\)\.\([[0-9]]*\)\([[a-zA-Z0-9.]]*\)$/\1.\2/')
if test -z "$version_num"; then
AC_MSG_ERROR([could not detect the PostgreSQL version, wrong or broken pg_config?])
@@ -38,12 +32,12 @@ if test "$major_version_num" -lt '10'; then
version_num_int=$(echo "$version_num"|
$SED -e 's/^\([[0-9]]*\)\.\([[0-9]]*\)$/\1\2/')
if test "$version_num_int" -lt '93'; then
if test "$version_num_int" -lt '94'; then
AC_MSG_ERROR([repmgr is not compatible with detected PostgreSQL version: $version_num])
fi
else
version_num=$(echo "$pgac_pg_config_version"|
$SED -e 's/^PostgreSQL \(.\+\)$/\1/')
$SED -e 's/^[[^0-9]]\+ \(.\+\)$/\1/')
if test -z "$version_num"; then
AC_MSG_ERROR([could not detect the PostgreSQL version, wrong or broken pg_config?])
@@ -63,8 +57,43 @@ else
fi
AC_SUBST(vpath_build)
AC_CHECK_PROG(HAVE_GNUSED,gnused,yes,no)
AC_CHECK_PROG(HAVE_GSED,gsed,yes,no)
AC_CHECK_PROG(HAVE_SED,sed,yes,no)
AC_CHECK_PROG(HAVE_FLEX,flex,yes,no)
if test "$HAVE_GNUSED" = yes; then
SED=gnused
else
if test "$HAVE_GSED" = yes; then
SED=gsed
else
SED=sed
fi
fi
AC_SUBST(SED)
AS_IF([test x"$HAVE_FLEX" != x"yes"], AC_MSG_ERROR([flex should be installed first]))
#Checking libraries
GENERIC_LIB_FAILED_MSG="library should be installed"
AC_CHECK_LIB(selinux, is_selinux_enabled, [],
[AC_MSG_ERROR(['selinux' $GENERIC_LIB_FAILED_MSG])])
AC_CHECK_LIB(lz4, LZ4_compress_default, [],
[AC_MSG_ERROR(['Z4' $GENERIC_LIB_FAILED_MSG])])
AC_CHECK_LIB(xslt, xsltCleanupGlobals, [],
[AC_MSG_ERROR(['xslt' $GENERIC_LIB_FAILED_MSG])])
AC_CHECK_LIB(pam, pam_start, [],
[AC_MSG_ERROR(['pam' $GENERIC_LIB_FAILED_MSG])])
AC_CHECK_LIB(gssapi_krb5, gss_init_sec_context, [],
[AC_MSG_ERROR([gssapi_krb5 $GENERIC_LIB_FAILED_MSG])])
AC_CONFIG_FILES([Makefile])
AC_CONFIG_FILES([Makefile.global])
AC_CONFIG_FILES([doc/Makefile])
AC_OUTPUT

View File

@@ -73,7 +73,16 @@ while(<$fh>) {
if ($param eq 'data_directory') {
$data_directory_found = 1;
}
push @outp, $line;
# Don't quote numbers
if ($value =~ /^\d+$/) {
push @outp, sprintf(q|%s=%s|, $param, $value);
}
# Quote everything else
else {
$value =~ s/'/''/g;
push @outp, sprintf(q|%s='%s'|, $param, $value);
}
}
}
@@ -83,6 +92,6 @@ print join("\n", @outp);
print "\n";
if ($data_directory_found == 0) {
print "data_directory=\n";
print "data_directory=''\n";
}

View File

@@ -1,6 +1,12 @@
/*
* controldata.c
* Copyright (c) 2ndQuadrant, 2010-2017
* controldata.c - functions for reading the pg_control file
*
* The functions provided here enable repmgr to read a pg_control file
* in a version-independent way, even if the PostgreSQL instance is not
* running. For that reason we can't use on the pg_control_*() functions
* provided in PostgreSQL 9.6 and later.
*
* Copyright (c) EnterpriseDB Corporation, 2010-2021
*
* Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
@@ -30,6 +36,53 @@
static ControlFileInfo *get_controlfile(const char *DataDir);
int
get_pg_version(const char *data_directory, char *version_string)
{
char PgVersionPath[MAXPGPATH] = "";
FILE *fp = NULL;
char *endptr = NULL;
char file_version_string[MAX_VERSION_STRING] = "";
long file_major, file_minor;
int ret;
snprintf(PgVersionPath, MAXPGPATH, "%s/PG_VERSION", data_directory);
fp = fopen(PgVersionPath, "r");
if (fp == NULL)
{
log_warning(_("could not open file \"%s\" for reading"),
PgVersionPath);
log_detail("%s", strerror(errno));
return UNKNOWN_SERVER_VERSION_NUM;
}
file_version_string[0] = '\0';
ret = fscanf(fp, "%23s", file_version_string);
fclose(fp);
if (ret != 1 || endptr == file_version_string)
{
log_warning(_("unable to determine major version number from PG_VERSION"));
return UNKNOWN_SERVER_VERSION_NUM;
}
file_major = strtol(file_version_string, &endptr, 10);
file_minor = 0;
if (*endptr == '.')
file_minor = strtol(endptr + 1, NULL, 10);
if (version_string != NULL)
strncpy(version_string, file_version_string, MAX_VERSION_STRING);
return ((int) file_major * 10000) + ((int) file_minor * 100);
}
uint64
get_system_identifier(const char *data_directory)
{
@@ -39,38 +92,33 @@ get_system_identifier(const char *data_directory)
control_file_info = get_controlfile(data_directory);
if (control_file_info->control_file_processed == true)
system_identifier = control_file_info->control_file->system_identifier;
else
system_identifier = UNKNOWN_SYSTEM_IDENTIFIER;
system_identifier = control_file_info->system_identifier;
pfree(control_file_info->control_file);
pfree(control_file_info);
return system_identifier;
}
DBState
get_db_state(const char *data_directory)
bool
get_db_state(const char *data_directory, DBState *state)
{
ControlFileInfo *control_file_info = NULL;
DBState state;
bool control_file_processed;
control_file_info = get_controlfile(data_directory);
control_file_processed = control_file_info->control_file_processed;
if (control_file_info->control_file_processed == true)
state = control_file_info->control_file->state;
else
/* if we were unable to parse the control file, assume DB is shut down */
state = DB_SHUTDOWNED;
if (control_file_processed == true)
*state = control_file_info->state;
pfree(control_file_info->control_file);
pfree(control_file_info);
return state;
return control_file_processed;
}
extern XLogRecPtr
XLogRecPtr
get_latest_checkpoint_location(const char *data_directory)
{
ControlFileInfo *control_file_info = NULL;
@@ -78,12 +126,9 @@ get_latest_checkpoint_location(const char *data_directory)
control_file_info = get_controlfile(data_directory);
if (control_file_info->control_file_processed == false)
return InvalidXLogRecPtr;
if (control_file_info->control_file_processed == true)
checkPoint = control_file_info->checkPoint;
checkPoint = control_file_info->control_file->checkPoint;
pfree(control_file_info->control_file);
pfree(control_file_info);
return checkPoint;
@@ -94,20 +139,13 @@ int
get_data_checksum_version(const char *data_directory)
{
ControlFileInfo *control_file_info = NULL;
int data_checksum_version = -1;
int data_checksum_version = UNKNOWN_DATA_CHECKSUM_VERSION;
control_file_info = get_controlfile(data_directory);
if (control_file_info->control_file_processed == false)
{
data_checksum_version = -1;
}
else
{
data_checksum_version = (int) control_file_info->control_file->data_checksum_version;
}
if (control_file_info->control_file_processed == true)
data_checksum_version = (int) control_file_info->data_checksum_version;
pfree(control_file_info->control_file);
pfree(control_file_info);
return data_checksum_version;
@@ -134,38 +172,148 @@ describe_db_state(DBState state)
case DB_IN_PRODUCTION:
return _("in production");
}
return _("unrecognized status code");
}
TimeLineID
get_timeline(const char *data_directory)
{
ControlFileInfo *control_file_info = NULL;
TimeLineID timeline = -1;
control_file_info = get_controlfile(data_directory);
timeline = (int) control_file_info->timeline;
pfree(control_file_info);
return timeline;
}
TimeLineID
get_min_recovery_end_timeline(const char *data_directory)
{
ControlFileInfo *control_file_info = NULL;
TimeLineID timeline = -1;
control_file_info = get_controlfile(data_directory);
timeline = (int) control_file_info->minRecoveryPointTLI;
pfree(control_file_info);
return timeline;
}
XLogRecPtr
get_min_recovery_location(const char *data_directory)
{
ControlFileInfo *control_file_info = NULL;
XLogRecPtr minRecoveryPoint = InvalidXLogRecPtr;
control_file_info = get_controlfile(data_directory);
minRecoveryPoint = control_file_info->minRecoveryPoint;
pfree(control_file_info);
return minRecoveryPoint;
}
/*
* we maintain our own version of get_controlfile() as we need cross-version
* We maintain our own version of get_controlfile() as we need cross-version
* compatibility, and also don't care if the file isn't readable.
*/
static ControlFileInfo *
get_controlfile(const char *DataDir)
{
char file_version_string[MAX_VERSION_STRING] = "";
ControlFileInfo *control_file_info;
int fd;
int fd, version_num;
char ControlFilePath[MAXPGPATH] = "";
void *ControlFileDataPtr = NULL;
int expected_size = 0;
control_file_info = palloc0(sizeof(ControlFileInfo));
/* set default values */
control_file_info->control_file_processed = false;
control_file_info->control_file = palloc0(sizeof(ControlFileData));
control_file_info->system_identifier = UNKNOWN_SYSTEM_IDENTIFIER;
control_file_info->state = DB_SHUTDOWNED;
control_file_info->checkPoint = InvalidXLogRecPtr;
control_file_info->data_checksum_version = -1;
control_file_info->timeline = -1;
control_file_info->minRecoveryPointTLI = -1;
control_file_info->minRecoveryPoint = InvalidXLogRecPtr;
/*
* Read PG_VERSION, as we'll need to determine which struct to read
* the control file contents into
*/
version_num = get_pg_version(DataDir, file_version_string);
if (version_num == UNKNOWN_SERVER_VERSION_NUM)
{
log_warning(_("unable to determine server version number from PG_VERSION"));
return control_file_info;
}
if (version_num < MIN_SUPPORTED_VERSION_NUM)
{
log_warning(_("data directory appears to be initialised for %s"),
file_version_string);
log_detail(_("minimum supported PostgreSQL version is %s"),
MIN_SUPPORTED_VERSION);
return control_file_info;
}
snprintf(ControlFilePath, MAXPGPATH, "%s/global/pg_control", DataDir);
if ((fd = open(ControlFilePath, O_RDONLY | PG_BINARY, 0)) == -1)
{
log_debug("could not open file \"%s\" for reading: %s",
ControlFilePath, strerror(errno));
log_warning(_("could not open file \"%s\" for reading"),
ControlFilePath);
log_detail("%s", strerror(errno));
return control_file_info;
}
if (read(fd, control_file_info->control_file, sizeof(ControlFileData)) != sizeof(ControlFileData))
if (version_num >= 120000)
{
log_debug("could not read file \"%s\": %s",
ControlFilePath, strerror(errno));
#if PG_ACTUAL_VERSION_NUM >= 120000
expected_size = sizeof(ControlFileData12);
ControlFileDataPtr = palloc0(expected_size);
#endif
}
else if (version_num >= 110000)
{
expected_size = sizeof(ControlFileData11);
ControlFileDataPtr = palloc0(expected_size);
}
else if (version_num >= 90500)
{
expected_size = sizeof(ControlFileData95);
ControlFileDataPtr = palloc0(expected_size);
}
else if (version_num >= 90400)
{
expected_size = sizeof(ControlFileData94);
ControlFileDataPtr = palloc0(expected_size);
}
if (read(fd, ControlFileDataPtr, expected_size) != expected_size)
{
log_warning(_("could not read file \"%s\""),
ControlFilePath);
log_detail("%s", strerror(errno));
close(fd);
return control_file_info;
}
@@ -173,12 +321,62 @@ get_controlfile(const char *DataDir)
control_file_info->control_file_processed = true;
if (version_num >= 120000)
{
#if PG_ACTUAL_VERSION_NUM >= 120000
ControlFileData12 *ptr = (struct ControlFileData12 *)ControlFileDataPtr;
control_file_info->system_identifier = ptr->system_identifier;
control_file_info->state = ptr->state;
control_file_info->checkPoint = ptr->checkPoint;
control_file_info->data_checksum_version = ptr->data_checksum_version;
control_file_info->timeline = ptr->checkPointCopy.ThisTimeLineID;
control_file_info->minRecoveryPointTLI = ptr->minRecoveryPointTLI;
control_file_info->minRecoveryPoint = ptr->minRecoveryPoint;
#else
fprintf(stderr, "ERROR: please use a repmgr version built for PostgreSQL 12 or later\n");
exit(ERR_BAD_CONFIG);
#endif
}
else if (version_num >= 110000)
{
ControlFileData11 *ptr = (struct ControlFileData11 *)ControlFileDataPtr;
control_file_info->system_identifier = ptr->system_identifier;
control_file_info->state = ptr->state;
control_file_info->checkPoint = ptr->checkPoint;
control_file_info->data_checksum_version = ptr->data_checksum_version;
control_file_info->timeline = ptr->checkPointCopy.ThisTimeLineID;
control_file_info->minRecoveryPointTLI = ptr->minRecoveryPointTLI;
control_file_info->minRecoveryPoint = ptr->minRecoveryPoint;
}
else if (version_num >= 90500)
{
ControlFileData95 *ptr = (struct ControlFileData95 *)ControlFileDataPtr;
control_file_info->system_identifier = ptr->system_identifier;
control_file_info->state = ptr->state;
control_file_info->checkPoint = ptr->checkPoint;
control_file_info->data_checksum_version = ptr->data_checksum_version;
control_file_info->timeline = ptr->checkPointCopy.ThisTimeLineID;
control_file_info->minRecoveryPointTLI = ptr->minRecoveryPointTLI;
control_file_info->minRecoveryPoint = ptr->minRecoveryPoint;
}
else if (version_num >= 90400)
{
ControlFileData94 *ptr = (struct ControlFileData94 *)ControlFileDataPtr;
control_file_info->system_identifier = ptr->system_identifier;
control_file_info->state = ptr->state;
control_file_info->checkPoint = ptr->checkPoint;
control_file_info->data_checksum_version = ptr->data_checksum_version;
control_file_info->timeline = ptr->checkPointCopy.ThisTimeLineID;
control_file_info->minRecoveryPointTLI = ptr->minRecoveryPointTLI;
control_file_info->minRecoveryPoint = ptr->minRecoveryPoint;
}
pfree(ControlFileDataPtr);
/*
* We don't check the CRC here as we're potentially checking a pg_control
* file from a different PostgreSQL version to the one repmgr was compiled
* against. However we're only interested in the first few fields, which
* should be constant across supported versions
*
* against.
*/
return control_file_info;

View File

@@ -1,6 +1,6 @@
/*
* controldata.h
* Copyright (c) 2ndQuadrant, 2010-2017
* Copyright (c) EnterpriseDB Corporation, 2010-2021
*
* Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
@@ -12,16 +12,378 @@
#include "postgres_fe.h"
#include "catalog/pg_control.h"
#define MAX_VERSION_STRING 24
/*
* A simplified representation of pg_control containing only those fields
* required by repmgr.
*/
typedef struct
{
bool control_file_processed;
ControlFileData *control_file;
uint64 system_identifier;
DBState state;
XLogRecPtr checkPoint;
uint32 data_checksum_version;
TimeLineID timeline;
TimeLineID minRecoveryPointTLI;
XLogRecPtr minRecoveryPoint;
} ControlFileInfo;
extern DBState get_db_state(const char *data_directory);
typedef struct CheckPoint94
{
XLogRecPtr redo; /* next RecPtr available when we began to
* create CheckPoint (i.e. REDO start point) */
TimeLineID ThisTimeLineID; /* current TLI */
TimeLineID PrevTimeLineID; /* previous TLI, if this record begins a new
* timeline (equals ThisTimeLineID otherwise) */
bool fullPageWrites; /* current full_page_writes */
uint32 nextXidEpoch; /* higher-order bits of nextXid */
TransactionId nextXid; /* next free XID */
Oid nextOid; /* next free OID */
MultiXactId nextMulti; /* next free MultiXactId */
MultiXactOffset nextMultiOffset; /* next free MultiXact offset */
TransactionId oldestXid; /* cluster-wide minimum datfrozenxid */
Oid oldestXidDB; /* database with minimum datfrozenxid */
MultiXactId oldestMulti; /* cluster-wide minimum datminmxid */
Oid oldestMultiDB; /* database with minimum datminmxid */
pg_time_t time; /* time stamp of checkpoint */
TransactionId oldestActiveXid;
} CheckPoint94;
/* Same for 9.5, 9.6, 10, 11 */
typedef struct CheckPoint95
{
XLogRecPtr redo; /* next RecPtr available when we began to
* create CheckPoint (i.e. REDO start point) */
TimeLineID ThisTimeLineID; /* current TLI */
TimeLineID PrevTimeLineID; /* previous TLI, if this record begins a new
* timeline (equals ThisTimeLineID otherwise) */
bool fullPageWrites; /* current full_page_writes */
uint32 nextXidEpoch; /* higher-order bits of nextXid */
TransactionId nextXid; /* next free XID */
Oid nextOid; /* next free OID */
MultiXactId nextMulti; /* next free MultiXactId */
MultiXactOffset nextMultiOffset; /* next free MultiXact offset */
TransactionId oldestXid; /* cluster-wide minimum datfrozenxid */
Oid oldestXidDB; /* database with minimum datfrozenxid */
MultiXactId oldestMulti; /* cluster-wide minimum datminmxid */
Oid oldestMultiDB; /* database with minimum datminmxid */
pg_time_t time; /* time stamp of checkpoint */
TransactionId oldestCommitTsXid; /* oldest Xid with valid commit
* timestamp */
TransactionId newestCommitTsXid; /* newest Xid with valid commit
* timestamp */
TransactionId oldestActiveXid;
} CheckPoint95;
#if PG_ACTUAL_VERSION_NUM >= 120000
/*
* Following fields removed in PostgreSQL 12;
*
* uint32 nextXidEpoch;
* TransactionId nextXid;
*
* and replaced by:
*
* FullTransactionId nextFullXid;
*/
typedef struct CheckPoint12
{
XLogRecPtr redo; /* next RecPtr available when we began to
* create CheckPoint (i.e. REDO start point) */
TimeLineID ThisTimeLineID; /* current TLI */
TimeLineID PrevTimeLineID; /* previous TLI, if this record begins a new
* timeline (equals ThisTimeLineID otherwise) */
bool fullPageWrites; /* current full_page_writes */
FullTransactionId nextFullXid; /* next free full transaction ID */
Oid nextOid; /* next free OID */
MultiXactId nextMulti; /* next free MultiXactId */
MultiXactOffset nextMultiOffset; /* next free MultiXact offset */
TransactionId oldestXid; /* cluster-wide minimum datfrozenxid */
Oid oldestXidDB; /* database with minimum datfrozenxid */
MultiXactId oldestMulti; /* cluster-wide minimum datminmxid */
Oid oldestMultiDB; /* database with minimum datminmxid */
pg_time_t time; /* time stamp of checkpoint */
TransactionId oldestCommitTsXid; /* oldest Xid with valid commit
* timestamp */
TransactionId newestCommitTsXid; /* newest Xid with valid commit
* timestamp */
/*
* Oldest XID still running. This is only needed to initialize hot standby
* mode from an online checkpoint, so we only bother calculating this for
* online checkpoints and only when wal_level is replica. Otherwise it's
* set to InvalidTransactionId.
*/
TransactionId oldestActiveXid;
} CheckPoint12;
#endif
typedef struct ControlFileData94
{
uint64 system_identifier;
uint32 pg_control_version; /* PG_CONTROL_VERSION */
uint32 catalog_version_no; /* see catversion.h */
DBState state; /* see enum above */
pg_time_t time; /* time stamp of last pg_control update */
XLogRecPtr checkPoint; /* last check point record ptr */
XLogRecPtr prevCheckPoint; /* previous check point record ptr */
CheckPoint94 checkPointCopy; /* copy of last check point record */
XLogRecPtr unloggedLSN; /* current fake LSN value, for unlogged rels */
XLogRecPtr minRecoveryPoint;
TimeLineID minRecoveryPointTLI;
XLogRecPtr backupStartPoint;
XLogRecPtr backupEndPoint;
bool backupEndRequired;
int wal_level;
bool wal_log_hints;
int MaxConnections;
int max_worker_processes;
int max_prepared_xacts;
int max_locks_per_xact;
uint32 maxAlign; /* alignment requirement for tuples */
double floatFormat; /* constant 1234567.0 */
uint32 blcksz; /* data block size for this DB */
uint32 relseg_size; /* blocks per segment of large relation */
uint32 xlog_blcksz; /* block size within WAL files */
uint32 xlog_seg_size; /* size of each WAL segment */
uint32 nameDataLen; /* catalog name field width */
uint32 indexMaxKeys; /* max number of columns in an index */
uint32 toast_max_chunk_size; /* chunk size in TOAST tables */
uint32 loblksize; /* chunk size in pg_largeobject */
bool enableIntTimes; /* int64 storage enabled? */
bool float4ByVal; /* float4 pass-by-value? */
bool float8ByVal; /* float8, int8, etc pass-by-value? */
/* Are data pages protected by checksums? Zero if no checksum version */
uint32 data_checksum_version;
} ControlFileData94;
/*
* Following field added since 9.4:
*
* bool track_commit_timestamp;
*
* Unchanged in 9.6
*
* In 10, following field appended *after* "data_checksum_version":
*
* char mock_authentication_nonce[MOCK_AUTH_NONCE_LEN];
*
* (but we don't care about that)
*/
typedef struct ControlFileData95
{
uint64 system_identifier;
uint32 pg_control_version; /* PG_CONTROL_VERSION */
uint32 catalog_version_no; /* see catversion.h */
DBState state; /* see enum above */
pg_time_t time; /* time stamp of last pg_control update */
XLogRecPtr checkPoint; /* last check point record ptr */
XLogRecPtr prevCheckPoint; /* previous check point record ptr */
CheckPoint95 checkPointCopy; /* copy of last check point record */
XLogRecPtr unloggedLSN; /* current fake LSN value, for unlogged rels */
XLogRecPtr minRecoveryPoint;
TimeLineID minRecoveryPointTLI;
XLogRecPtr backupStartPoint;
XLogRecPtr backupEndPoint;
bool backupEndRequired;
int wal_level;
bool wal_log_hints;
int MaxConnections;
int max_worker_processes;
int max_prepared_xacts;
int max_locks_per_xact;
bool track_commit_timestamp;
uint32 maxAlign; /* alignment requirement for tuples */
double floatFormat; /* constant 1234567.0 */
uint32 blcksz; /* data block size for this DB */
uint32 relseg_size; /* blocks per segment of large relation */
uint32 xlog_blcksz; /* block size within WAL files */
uint32 xlog_seg_size; /* size of each WAL segment */
uint32 nameDataLen; /* catalog name field width */
uint32 indexMaxKeys; /* max number of columns in an index */
uint32 toast_max_chunk_size; /* chunk size in TOAST tables */
uint32 loblksize; /* chunk size in pg_largeobject */
bool enableIntTimes; /* int64 storage enabled? */
bool float4ByVal; /* float4 pass-by-value? */
bool float8ByVal; /* float8, int8, etc pass-by-value? */
uint32 data_checksum_version;
} ControlFileData95;
/*
* Following field removed in 11:
*
* XLogRecPtr prevCheckPoint;
*
* In 10, following field appended *after* "data_checksum_version":
*
* char mock_authentication_nonce[MOCK_AUTH_NONCE_LEN];
*
* (but we don't care about that)
*/
typedef struct ControlFileData11
{
uint64 system_identifier;
uint32 pg_control_version; /* PG_CONTROL_VERSION */
uint32 catalog_version_no; /* see catversion.h */
DBState state; /* see enum above */
pg_time_t time; /* time stamp of last pg_control update */
XLogRecPtr checkPoint; /* last check point record ptr */
CheckPoint95 checkPointCopy; /* copy of last check point record */
XLogRecPtr unloggedLSN; /* current fake LSN value, for unlogged rels */
XLogRecPtr minRecoveryPoint;
TimeLineID minRecoveryPointTLI;
XLogRecPtr backupStartPoint;
XLogRecPtr backupEndPoint;
bool backupEndRequired;
int wal_level;
bool wal_log_hints;
int MaxConnections;
int max_worker_processes;
int max_prepared_xacts;
int max_locks_per_xact;
bool track_commit_timestamp;
uint32 maxAlign; /* alignment requirement for tuples */
double floatFormat; /* constant 1234567.0 */
uint32 blcksz; /* data block size for this DB */
uint32 relseg_size; /* blocks per segment of large relation */
uint32 xlog_blcksz; /* block size within WAL files */
uint32 xlog_seg_size; /* size of each WAL segment */
uint32 nameDataLen; /* catalog name field width */
uint32 indexMaxKeys; /* max number of columns in an index */
uint32 toast_max_chunk_size; /* chunk size in TOAST tables */
uint32 loblksize; /* chunk size in pg_largeobject */
bool enableIntTimes; /* int64 storage enabled? */
bool float4ByVal; /* float4 pass-by-value? */
bool float8ByVal; /* float8, int8, etc pass-by-value? */
uint32 data_checksum_version;
} ControlFileData11;
#if PG_ACTUAL_VERSION_NUM >= 120000
/*
* Following field added in Pg12:
*
* int max_wal_senders;
*/
typedef struct ControlFileData12
{
uint64 system_identifier;
uint32 pg_control_version; /* PG_CONTROL_VERSION */
uint32 catalog_version_no; /* see catversion.h */
DBState state; /* see enum above */
pg_time_t time; /* time stamp of last pg_control update */
XLogRecPtr checkPoint; /* last check point record ptr */
CheckPoint12 checkPointCopy; /* copy of last check point record */
XLogRecPtr unloggedLSN; /* current fake LSN value, for unlogged rels */
XLogRecPtr minRecoveryPoint;
TimeLineID minRecoveryPointTLI;
XLogRecPtr backupStartPoint;
XLogRecPtr backupEndPoint;
bool backupEndRequired;
int wal_level;
bool wal_log_hints;
int MaxConnections;
int max_worker_processes;
int max_wal_senders;
int max_prepared_xacts;
int max_locks_per_xact;
bool track_commit_timestamp;
uint32 maxAlign; /* alignment requirement for tuples */
double floatFormat; /* constant 1234567.0 */
uint32 blcksz; /* data block size for this DB */
uint32 relseg_size; /* blocks per segment of large relation */
uint32 xlog_blcksz; /* block size within WAL files */
uint32 xlog_seg_size; /* size of each WAL segment */
uint32 nameDataLen; /* catalog name field width */
uint32 indexMaxKeys; /* max number of columns in an index */
uint32 toast_max_chunk_size; /* chunk size in TOAST tables */
uint32 loblksize; /* chunk size in pg_largeobject */
bool float4ByVal; /* float4 pass-by-value? */
bool float8ByVal; /* float8, int8, etc pass-by-value? */
uint32 data_checksum_version;
} ControlFileData12;
#endif
extern int get_pg_version(const char *data_directory, char *version_string);
extern bool get_db_state(const char *data_directory, DBState *state);
extern const char *describe_db_state(DBState state);
extern int get_data_checksum_version(const char *data_directory);
extern uint64 get_system_identifier(const char *data_directory);
extern XLogRecPtr get_latest_checkpoint_location(const char *data_directory);
extern TimeLineID get_timeline(const char *data_directory);
extern TimeLineID get_min_recovery_end_timeline(const char *data_directory);
extern XLogRecPtr get_min_recovery_location(const char *data_directory);
#endif /* _CONTROLDATA_H_ */

4617
dbutils.c

File diff suppressed because it is too large Load Diff

363
dbutils.h
View File

@@ -1,7 +1,7 @@
/*
* dbutils.h
*
* Copyright (c) 2ndQuadrant, 2010-2017
* Copyright (c) EnterpriseDB Corporation, 2010-2021
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -20,6 +20,7 @@
#ifndef _REPMGR_DBUTILS_H_
#define _REPMGR_DBUTILS_H_
#include "access/timeline.h"
#include "access/xlogdefs.h"
#include "pqexpbuffer.h"
#include "portability/instr_time.h"
@@ -28,8 +29,36 @@
#include "strutil.h"
#include "voting.h"
#define REPMGR_NODES_COLUMNS "node_id, type, upstream_node_id, node_name, conninfo, repluser, slot_name, location, priority, active, config_file, '' AS upstream_node_name "
#define BDR_NODES_COLUMNS "node_sysid, node_timeline, node_dboid, node_status, node_name, node_local_dsn, node_init_from_dsn, node_read_only, node_seq_id"
#define REPMGR_NODES_COLUMNS \
"n.node_id, " \
"n.type, " \
"n.upstream_node_id, " \
"n.node_name, " \
"n.conninfo, " \
"n.repluser, " \
"n.slot_name, " \
"n.location, " \
"n.priority, " \
"n.active, " \
"n.config_file, " \
"'' AS upstream_node_name, " \
"NULL AS attached "
#define REPMGR_NODES_COLUMNS_WITH_UPSTREAM \
"n.node_id, " \
"n.type, " \
"n.upstream_node_id, " \
"n.node_name, " \
"n.conninfo, " \
"n.repluser, " \
"n.slot_name, " \
"n.location, " \
"n.priority, " \
"n.active, "\
"n.config_file, " \
"un.node_name AS upstream_node_name, " \
"NULL AS attached "
#define ERRBUFF_SIZE 512
@@ -38,12 +67,13 @@ typedef enum
UNKNOWN = 0,
PRIMARY,
STANDBY,
BDR
WITNESS
} t_server_type;
typedef enum
{
REPMGR_INSTALLED = 0,
REPMGR_OLD_VERSION_INSTALLED,
REPMGR_AVAILABLE,
REPMGR_UNAVAILABLE,
REPMGR_UNKNOWN
@@ -73,27 +103,91 @@ typedef enum
{
NODE_STATUS_UNKNOWN = -1,
NODE_STATUS_UP,
NODE_STATUS_SHUTTING_DOWN,
NODE_STATUS_DOWN,
NODE_STATUS_UNCLEAN_SHUTDOWN
NODE_STATUS_UNCLEAN_SHUTDOWN,
NODE_STATUS_REJECTED
} NodeStatus;
typedef enum
{
VR_VOTE_REFUSED = -1,
VR_POSITIVE_VOTE,
VR_NEGATIVE_VOTE
} VoteRequestResult;
CONN_UNKNOWN = -1,
CONN_OK,
CONN_BAD,
CONN_ERROR
} ConnectionStatus;
typedef enum
{
/* unable to query "pg_stat_replication" or other error */
NODE_ATTACHED_UNKNOWN = -1,
/* node has record in "pg_stat_replication" and state is not "streaming" */
NODE_ATTACHED,
/* node has record in "pg_stat_replication" but state is not "streaming" */
NODE_NOT_ATTACHED,
/* node has no record in "pg_stat_replication" */
NODE_DETACHED
} NodeAttached;
typedef enum
{
SLOT_UNKNOWN = -1,
SLOT_NOT_FOUND,
SLOT_NOT_PHYSICAL,
SLOT_INACTIVE,
SLOT_ACTIVE
} ReplSlotStatus;
typedef enum
{
BACKUP_STATE_UNKNOWN = -1,
BACKUP_STATE_IN_BACKUP,
BACKUP_STATE_NO_BACKUP
} BackupState;
/*
* Struct to store node information
* Struct to store extension version information
*/
typedef struct s_extension_versions {
char default_version[8];
int default_version_num;
char installed_version[8];
int installed_version_num;
} t_extension_versions;
#define T_EXTENSION_VERSIONS_INITIALIZER { \
"", \
UNKNOWN_SERVER_VERSION_NUM, \
"", \
UNKNOWN_SERVER_VERSION_NUM \
}
typedef struct
{
char current_timestamp[MAXLEN];
bool in_recovery;
TimeLineID timeline_id;
char timeline_id_str[MAXLEN];
XLogRecPtr last_wal_receive_lsn;
XLogRecPtr last_wal_replay_lsn;
char last_xact_replay_timestamp[MAXLEN];
int replication_lag_time;
bool receiving_streamed_wal;
bool wal_replay_paused;
int upstream_last_seen;
int upstream_node_id;
} ReplInfo;
/*
* Struct to store node information.
*
* The first section represents the contents of the "repmgr.nodes"
* table; subsequent section contain information collated in
* various contexts.
*/
typedef struct s_node_info
{
@@ -101,8 +195,8 @@ typedef struct s_node_info
int node_id;
int upstream_node_id;
t_server_type type;
char node_name[MAXLEN];
char upstream_node_name[MAXLEN];
char node_name[NAMEDATALEN];
char upstream_node_name[NAMEDATALEN];
char conninfo[MAXLEN];
char repluser[NAMEDATALEN];
char location[MAXLEN];
@@ -119,7 +213,7 @@ typedef struct s_node_info
/* for ad-hoc use e.g. when working with a list of nodes */
char details[MAXLEN];
bool reachable;
bool attached;
NodeAttached attached;
/* various statistics */
int max_wal_senders;
int attached_wal_receivers;
@@ -127,6 +221,8 @@ typedef struct s_node_info
int total_replication_slots;
int active_replication_slots;
int inactive_replication_slots;
/* replication info */
ReplInfo *replication_info;
} t_node_info;
@@ -151,9 +247,10 @@ typedef struct s_node_info
MS_NORMAL, \
NULL, \
/* for ad-hoc use e.g. when working with a list of nodes */ \
"", true, true \
"", true, true, \
/* various statistics */ \
-1, -1, -1, -1, -1, -1 \
-1, -1, -1, -1, -1, -1, \
NULL \
}
@@ -181,11 +278,13 @@ typedef struct s_event_info
{
char *node_name;
char *conninfo_str;
int node_id;
} t_event_info;
#define T_EVENT_INFO_INITIALIZER { \
NULL, \
NULL \
NULL, \
UNKNOWN_NODE_ID \
}
@@ -227,66 +326,6 @@ typedef struct s_connection_user
#define T_CONNECTION_USER_INITIALIZER { "", false }
/* represents an entry in bdr.bdr_nodes */
typedef struct s_bdr_node_info
{
char node_sysid[MAXLEN];
uint32 node_timeline;
uint32 node_dboid;
char node_status;
char node_name[MAXLEN];
char node_local_dsn[MAXLEN];
char node_init_from_dsn[MAXLEN];
bool read_only;
uint32 node_seq_id;
} t_bdr_node_info;
#define T_BDR_NODE_INFO_INITIALIZER { \
"", InvalidOid, InvalidOid, \
'?', "", "", "", \
false, -1 \
}
/* structs to store a list of BDR node records */
typedef struct BdrNodeInfoListCell
{
struct BdrNodeInfoListCell *next;
t_bdr_node_info *node_info;
} BdrNodeInfoListCell;
typedef struct BdrNodeInfoList
{
BdrNodeInfoListCell *head;
BdrNodeInfoListCell *tail;
int node_count;
} BdrNodeInfoList;
#define T_BDR_NODE_INFO_LIST_INITIALIZER { \
NULL, \
NULL, \
0 \
}
typedef struct
{
char current_timestamp[MAXLEN];
uint64 last_wal_receive_lsn;
uint64 last_wal_replay_lsn;
char last_xact_replay_timestamp[MAXLEN];
int replication_lag_time;
bool receiving_streamed_wal;
} ReplInfo;
#define T_REPLINFO_INTIALIZER { \
"", \
InvalidXLogRecPtr, \
InvalidXLogRecPtr, \
"", \
0 \
}
typedef struct
{
char filepath[MAXPGPATH];
@@ -296,6 +335,7 @@ typedef struct
#define T_CONFIGFILE_INFO_INITIALIZER { "", "", false }
typedef struct
{
int size;
@@ -305,6 +345,7 @@ typedef struct
#define T_CONFIGFILE_LIST_INITIALIZER { 0, 0, NULL }
typedef struct
{
uint64 system_identifier;
@@ -317,9 +358,24 @@ typedef struct
UNKNOWN_TIMELINE_ID, \
InvalidXLogRecPtr \
}
/* global variables */
extern int server_version_num;
typedef struct RepmgrdInfo {
int node_id;
int pid;
char pid_text[MAXLEN];
char pid_file[MAXLEN];
bool pg_running;
char pg_running_text[MAXLEN];
RecoveryType recovery_type;
bool running;
char repmgrd_running[MAXLEN];
bool paused;
bool wal_paused_pending_wal;
int upstream_last_seen;
char upstream_last_seen_text[MAXLEN];
} RepmgrdInfo;
/* macros */
@@ -329,33 +385,32 @@ extern int server_version_num;
/* utility functions */
XLogRecPtr parse_lsn(const char *str);
extern void
wrap_ddl_query(PQExpBufferData *query_buf, int replication_type, const char *fmt,...)
__attribute__((format(PG_PRINTF_ATTRIBUTE, 3, 4)));
bool atobool(const char *value);
/* connection functions */
PGconn *establish_db_connection(const char *conninfo,
PGconn *establish_db_connection(const char *conninfo,
const bool exit_on_error);
PGconn *establish_db_connection_quiet(const char *conninfo);
PGconn *establish_db_connection_as_user(const char *conninfo,
const char *user,
const bool exit_on_error);
PGconn *establish_db_connection_by_params(t_conninfo_param_list *param_list,
PGconn *establish_db_connection_by_params(t_conninfo_param_list *param_list,
const bool exit_on_error);
PGconn *establish_primary_db_connection(PGconn *conn,
const bool exit_on_error);
PGconn *establish_db_connection_with_replacement_param(const char *conninfo,
const char *param,
const char *value,
const bool exit_on_error);
PGconn *establish_replication_connection_from_conn(PGconn *conn, const char *repluser);
PGconn *establish_replication_connection_from_conninfo(const char *conninfo, const char *repluser);
PGconn *establish_primary_db_connection(PGconn *conn,
const bool exit_on_error);
PGconn *get_primary_connection(PGconn *standby_conn, int *primary_id, char *primary_conninfo_out);
PGconn *get_primary_connection_quiet(PGconn *standby_conn, int *primary_id, char *primary_conninfo_out);
PGconn *duplicate_connection(PGconn *conn, const char *user, bool replication);
bool is_superuser_connection(PGconn *conn, t_connection_user *userinfo);
void close_connection(PGconn **conn);
/* conninfo manipulation functions */
bool get_conninfo_value(const char *conninfo, const char *keyword, char *output);
bool get_conninfo_default_value(const char *param, char *output, int maxlen);
void initialize_conninfo_params(t_conninfo_param_list *param_list, bool set_defaults);
void free_conninfo_params(t_conninfo_param_list *param_list);
void copy_conninfo_params(t_conninfo_param_list *dest_list, t_conninfo_param_list *source_list);
@@ -363,68 +418,108 @@ void conn_to_param_list(PGconn *conn, t_conninfo_param_list *param_list);
void param_set(t_conninfo_param_list *param_list, const char *param, const char *value);
void param_set_ine(t_conninfo_param_list *param_list, const char *param, const char *value);
char *param_get(t_conninfo_param_list *param_list, const char *param);
bool parse_conninfo_string(const char *conninfo_str, t_conninfo_param_list *param_list, char *errmsg, bool ignore_local_params);
bool validate_conninfo_string(const char *conninfo_str, char **errmsg);
bool parse_conninfo_string(const char *conninfo_str, t_conninfo_param_list *param_list, char **errmsg, bool ignore_local_params);
char *param_list_to_string(t_conninfo_param_list *param_list);
char *normalize_conninfo_string(const char *conninfo_str);
bool has_passfile(void);
/* transaction functions */
bool begin_transaction(PGconn *conn);
bool commit_transaction(PGconn *conn);
bool rollback_transaction(PGconn *conn);
bool check_cluster_schema(PGconn *conn);
/* GUC manipulation functions */
bool set_config(PGconn *conn, const char *config_param, const char *config_value);
bool set_config_bool(PGconn *conn, const char *config_param, bool state);
int guc_set(PGconn *conn, const char *parameter, const char *op,
const char *value);
int guc_set_typed(PGconn *conn, const char *parameter, const char *op,
const char *value, const char *datatype);
int guc_set(PGconn *conn, const char *parameter, const char *op, const char *value);
bool get_pg_setting(PGconn *conn, const char *setting, char *output);
bool get_pg_setting_bool(PGconn *conn, const char *setting, bool *output);
bool get_pg_setting_int(PGconn *conn, const char *setting, int *output);
bool alter_system_int(PGconn *conn, const char *name, int value);
bool pg_reload_conf(PGconn *conn);
/* server information functions */
bool get_cluster_size(PGconn *conn, char *size);
int get_server_version(PGconn *conn, char *server_version);
int get_server_version(PGconn *conn, char *server_version_buf);
RecoveryType get_recovery_type(PGconn *conn);
int get_primary_node_id(PGconn *conn);
bool can_use_pg_rewind(PGconn *conn, const char *data_directory, PQExpBufferData *reason);
int get_ready_archive_files(PGconn *conn, const char *data_directory);
bool identify_system(PGconn *repl_conn, t_system_identification *identification);
uint64 system_identifier(PGconn *conn);
TimeLineHistoryEntry *get_timeline_history(PGconn *repl_conn, TimeLineID tli);
pid_t get_wal_receiver_pid(PGconn *conn);
/* user/role information functions */
bool can_execute_checkpoint(PGconn *conn);
bool can_execute_pg_promote(PGconn *conn);
bool can_disable_walsender(PGconn *conn);
bool connection_has_pg_monitor_role(PGconn *conn, const char *subrole);
bool is_replication_role(PGconn *conn, char *rolname);
bool is_superuser_connection(PGconn *conn, t_connection_user *userinfo);
/* repmgrd shared memory functions */
bool repmgrd_set_local_node_id(PGconn *conn, int local_node_id);
int repmgrd_get_local_node_id(PGconn *conn);
bool repmgrd_check_local_node_id(PGconn *conn);
BackupState server_in_exclusive_backup_mode(PGconn *conn);
void repmgrd_set_pid(PGconn *conn, pid_t repmgrd_pid, const char *pidfile);
pid_t repmgrd_get_pid(PGconn *conn);
bool repmgrd_is_running(PGconn *conn);
bool repmgrd_is_paused(PGconn *conn);
bool repmgrd_pause(PGconn *conn, bool pause);
int repmgrd_get_upstream_node_id(PGconn *conn);
bool repmgrd_set_upstream_node_id(PGconn *conn, int node_id);
/* extension functions */
ExtensionStatus get_repmgr_extension_status(PGconn *conn);
ExtensionStatus get_repmgr_extension_status(PGconn *conn, t_extension_versions *extversions);
/* node management functions */
void checkpoint(PGconn *conn);
bool vacuum_table(PGconn *conn, const char *table);
bool promote_standby(PGconn *conn, bool wait, int wait_seconds);
bool resume_wal_replay(PGconn *conn);
/* node record functions */
t_server_type parse_node_type(const char *type);
const char *get_node_type_string(t_server_type type);
RecordStatus get_node_record(PGconn *conn, int node_id, t_node_info *node_info);
RecordStatus refresh_node_record(PGconn *conn, int node_id, t_node_info *node_info);
RecordStatus get_node_record_with_upstream(PGconn *conn, int node_id, t_node_info *node_info);
RecordStatus get_node_record_by_name(PGconn *conn, const char *node_name, t_node_info *node_info);
t_node_info *get_node_record_pointer(PGconn *conn, int node_id);
bool get_local_node_record(PGconn *conn, int node_id, t_node_info *node_info);
bool get_primary_node_record(PGconn *conn, t_node_info *node_info);
void get_all_node_records(PGconn *conn, NodeInfoList *node_list);
bool get_all_node_records(PGconn *conn, NodeInfoList *node_list);
bool get_all_nodes_count(PGconn *conn, int *count);
void get_downstream_node_records(PGconn *conn, int node_id, NodeInfoList *nodes);
void get_active_sibling_node_records(PGconn *conn, int node_id, int upstream_node_id, NodeInfoList *node_list);
bool get_child_nodes(PGconn *conn, int node_id, NodeInfoList *node_list);
void get_node_records_by_priority(PGconn *conn, NodeInfoList *node_list);
void get_all_node_records_with_upstream(PGconn *conn, NodeInfoList *node_list);
bool get_all_node_records_with_upstream(PGconn *conn, NodeInfoList *node_list);
bool get_downstream_nodes_with_missing_slot(PGconn *conn, int this_node_id, NodeInfoList *noede_list);
bool create_node_record(PGconn *conn, char *repmgr_action, t_node_info *node_info);
bool update_node_record(PGconn *conn, char *repmgr_action, t_node_info *node_info);
bool delete_node_record(PGconn *conn, int node);
bool truncate_node_records(PGconn *conn);
bool update_node_record_set_active(PGconn *conn, int this_node_id, bool active);
bool update_node_record_set_primary(PGconn *conn, int this_node_id);
bool update_node_record_set_active_standby(PGconn *conn, int this_node_id);
bool update_node_record_set_upstream(PGconn *conn, int this_node_id, int new_upstream_node_id);
bool update_node_record_status(PGconn *conn, int this_node_id, char *type, int upstream_node_id, bool active);
bool update_node_record_conn_priority(PGconn *conn, t_configuration_options *options);
bool update_node_record_slot_name(PGconn *primary_conn, int node_id, char *slot_name);
bool witness_copy_node_records(PGconn *primary_conn, PGconn *witness_conn);
void clear_node_info_list(NodeInfoList *nodes);
@@ -438,21 +533,33 @@ void config_file_list_add(t_configfile_list *list, const char *file, const char
bool create_event_record(PGconn *conn, t_configuration_options *options, int node_id, char *event, bool successful, char *details);
bool create_event_notification(PGconn *conn, t_configuration_options *options, int node_id, char *event, bool successful, char *details);
bool create_event_notification_extended(PGconn *conn, t_configuration_options *options, int node_id, char *event, bool successful, char *details, t_event_info *event_info);
PGresult *get_event_records(PGconn *conn, int node_id, const char *node_name, const char *event, bool all, int limit);
/* replication slot functions */
bool create_replication_slot(PGconn *conn, char *slot_name, int server_version_num, PQExpBufferData *error_msg);
bool drop_replication_slot(PGconn *conn, char *slot_name);
void create_slot_name(char *slot_name, int node_id);
bool create_replication_slot_sql(PGconn *conn, char *slot_name, PQExpBufferData *error_msg);
bool create_replication_slot_replprot(PGconn *conn, PGconn *repl_conn, char *slot_name, PQExpBufferData *error_msg);
bool drop_replication_slot_sql(PGconn *conn, char *slot_name);
bool drop_replication_slot_replprot(PGconn *repl_conn, char *slot_name);
RecordStatus get_slot_record(PGconn *conn, char *slot_name, t_replication_slot *record);
int get_free_replication_slot_count(PGconn *conn, int *max_replication_slots);
int get_inactive_replication_slots(PGconn *conn, KeyValueList *list);
/* tablespace functions */
bool get_tablespace_name_by_location(PGconn *conn, const char *location, char *name);
/* asynchronous query functions */
bool cancel_query(PGconn *conn, int timeout);
int wait_connection_availability(PGconn *conn, long long timeout);
int wait_connection_availability(PGconn *conn, int timeout);
/* node availability functions */
bool is_server_available(const char *conninfo);
bool is_server_available_quiet(const char *conninfo);
bool is_server_available_params(t_conninfo_param_list *param_list);
ExecStatusType connection_ping(PGconn *conn);
ExecStatusType connection_ping_reconnect(PGconn *conn);
/* monitoring functions */
void
@@ -468,43 +575,41 @@ add_monitoring_record(PGconn *primary_conn,
long long unsigned int apply_lag_bytes
);
int get_number_of_monitoring_records_to_delete(PGconn *primary_conn, int keep_history);
bool delete_monitoring_records(PGconn *primary_conn, int keep_history);
int get_number_of_monitoring_records_to_delete(PGconn *primary_conn, int keep_history, int node_id);
bool delete_monitoring_records(PGconn *primary_conn, int keep_history, int node_id);
/* node voting functions */
NodeVotingStatus get_voting_status(PGconn *conn);
VoteRequestResult request_vote(PGconn *conn, t_node_info *this_node, t_node_info *other_node, int electoral_term);
int set_voting_status_initiated(PGconn *conn);
void initialize_voting_term(PGconn *conn);
int get_current_term(PGconn *conn);
void increment_current_term(PGconn *conn);
bool announce_candidature(PGconn *conn, t_node_info *this_node, t_node_info *other_node, int electoral_term);
void notify_follow_primary(PGconn *conn, int primary_node_id);
bool get_new_primary(PGconn *conn, int *primary_node_id);
void reset_voting_status(PGconn *conn);
/* replication status functions */
XLogRecPtr get_current_wal_lsn(PGconn *conn);
XLogRecPtr get_primary_current_lsn(PGconn *conn);
XLogRecPtr get_node_current_lsn(PGconn *conn);
XLogRecPtr get_last_wal_receive_location(PGconn *conn);
bool get_replication_info(PGconn *conn, ReplInfo *replication_info);
void init_replication_info(ReplInfo *replication_info);
bool get_replication_info(PGconn *conn, t_server_type node_type, ReplInfo *replication_info);
int get_replication_lag_seconds(PGconn *conn);
TimeLineID get_node_timeline(PGconn *conn, char *timeline_id_str);
void get_node_replication_stats(PGconn *conn, t_node_info *node_info);
bool is_downstream_node_attached(PGconn *conn, char *node_name);
NodeAttached is_downstream_node_attached(PGconn *conn, char *node_name, char **node_state);
NodeAttached is_downstream_node_attached_quiet(PGconn *conn, char *node_name, char **node_state);
void set_upstream_last_seen(PGconn *conn, int upstream_node_id);
int get_upstream_last_seen(PGconn *conn, t_server_type node_type);
/* BDR functions */
void get_all_bdr_node_records(PGconn *conn, BdrNodeInfoList *node_list);
RecordStatus get_bdr_node_record_by_name(PGconn *conn, const char *node_name, t_bdr_node_info *node_info);
bool is_bdr_db(PGconn *conn, PQExpBufferData *output);
bool is_active_bdr_node(PGconn *conn, const char *node_name);
bool is_bdr_repmgr(PGconn *conn);
bool is_table_in_bdr_replication_set(PGconn *conn, const char *tablename, const char *set);
bool add_table_to_bdr_replication_set(PGconn *conn, const char *tablename, const char *set);
void add_extension_tables_to_bdr_replication_set(PGconn *conn);
bool is_wal_replay_paused(PGconn *conn, bool check_pending_wal);
bool bdr_node_exists(PGconn *conn, const char *node_name);
ReplSlotStatus get_bdr_node_replication_slot_status(PGconn *conn, const char *node_name);
void get_bdr_other_node_name(PGconn *conn, int node_id, char *name_buf);
/* repmgrd status functions */
CheckStatus get_repmgrd_status(PGconn *conn);
bool am_bdr_failover_handler(PGconn *conn, int node_id);
void unset_bdr_failover_handler(PGconn *conn);
/* miscellaneous debugging functions */
const char *print_node_status(NodeStatus node_status);
const char *print_pqping_status(PGPing ping_status);
#endif /* _REPMGR_DBUTILS_H_ */

317
dirutil.c
View File

@@ -3,7 +3,7 @@
* dirmod.c
* directory handling functions
*
* Copyright (c) 2ndQuadrant, 2010-2017
* Copyright (c) EnterpriseDB Corporation, 2010-2021
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -21,6 +21,7 @@
#include <unistd.h>
#include <dirent.h>
#include <signal.h>
#include <sys/stat.h>
#include <errno.h>
#include <stdio.h>
@@ -34,34 +35,33 @@
#include "dirutil.h"
#include "strutil.h"
#include "log.h"
#include "controldata.h"
static int unlink_dir_callback(const char *fpath, const struct stat *sb, int typeflag, struct FTW *ftwbuf);
/* PID can be negative if backend is standalone */
typedef long pgpid_t;
/*
* make sure the directory either doesn't exist or is empty
* we use this function to check the new data directory and
* the directories for tablespaces
* Check if a directory exists, and if so whether it is empty.
*
* This is the same check initdb does on the new PGDATA dir
*
* Returns 0 if nonexistent, 1 if exists and empty, 2 if not empty,
* or -1 if trouble accessing directory
* This function is used for checking both the data directory
* and tablespace directories.
*/
int
check_dir(char *path)
DataDirState
check_dir(const char *path)
{
DIR *chkdir;
struct dirent *file;
int result = 1;
DIR *chkdir = NULL;
struct dirent *file = NULL;
int result = DIR_EMPTY;
errno = 0;
chkdir = opendir(path);
if (!chkdir)
return (errno == ENOENT) ? 0 : -1;
return (errno == ENOENT) ? DIR_NOENT : DIR_ERROR;
while ((file = readdir(chkdir)) != NULL)
{
@@ -73,25 +73,15 @@ check_dir(char *path)
}
else
{
result = 2; /* not empty */
result = DIR_NOT_EMPTY;
break;
}
}
#ifdef WIN32
/*
* This fix is in mingw cvs (runtime/mingwex/dirent.c rev 1.4), but not in
* released version
*/
if (GetLastError() == ERROR_NO_MORE_FILES)
errno = 0;
#endif
closedir(chkdir);
if (errno != 0)
return -1; /* some kind of I/O error? */
return DIR_ERROR; /* some kind of I/O error? */
return result;
}
@@ -101,23 +91,75 @@ check_dir(char *path)
* Create directory with error log message when failing
*/
bool
create_dir(char *path)
create_dir(const char *path)
{
if (mkdir_p(path, 0700) == 0)
char create_dir_path[MAXPGPATH];
/* mkdir_p() may modify the supplied path */
strncpy(create_dir_path, path, MAXPGPATH);
if (mkdir_p(create_dir_path, 0700) == 0)
return true;
log_error(_("unable to create directory \"%s\": %s"),
path, strerror(errno));
log_error(_("unable to create directory \"%s\""), create_dir_path);
log_detail("%s", strerror(errno));
return false;
}
bool
set_dir_permissions(char *path)
{
return (chmod(path, 0700) != 0) ? false : true;
}
bool
set_dir_permissions(const char *path, int server_version_num)
{
struct stat stat_buf;
bool no_group_access =
(server_version_num != UNKNOWN_SERVER_VERSION_NUM) &&
(server_version_num < 110000);
/*
* At this point the path should exist, so this check is very
* much just-in-case.
*/
if (stat(path, &stat_buf) != 0)
{
if (errno == ENOENT)
{
log_warning(_("directory \"%s\" does not exist"), path);
}
else
{
log_warning(_("could not read permissions of directory \"%s\""),
path);
log_detail("%s", strerror(errno));
}
return false;
}
/*
* If mode is not 0700 or 0750, attempt to change.
*/
if ((no_group_access == true && (stat_buf.st_mode & (S_IRWXG | S_IRWXO)))
|| (no_group_access == false && (stat_buf.st_mode & (S_IWGRP | S_IRWXO))))
{
/*
* Currently we default to 0700.
* There is no facility to override this directly,
* but the user can manually create the directory with
* the desired permissions.
*/
if (chmod(path, 0700) != 0) {
log_error(_("unable to change permissions of directory \"%s\""), path);
log_detail("%s", strerror(errno));
return false;
}
return true;
}
/* Leave as-is */
return true;
}
/* function from initdb.c */
@@ -146,26 +188,6 @@ mkdir_p(char *path, mode_t omode)
oumask = 0;
retval = 0;
#ifdef WIN32
/* skip network and drive specifiers for win32 */
if (strlen(p) >= 2)
{
if (p[0] == '/' && p[1] == '/')
{
/* network drive */
p = strstr(p + 2, "/");
if (p == NULL)
return 1;
}
else if (p[1] == ':' &&
((p[0] >= 'a' && p[0] <= 'z') ||
(p[0] >= 'A' && p[0] <= 'Z')))
{
/* local drive */
p += 2;
}
}
#endif
if (p[0] == '/') /* Skip leading '/'. */
++p;
@@ -183,7 +205,7 @@ mkdir_p(char *path, mode_t omode)
/*
* POSIX 1003.2: For each dir operand that does not name an
* existing directory, effects equivalent to those caused by the
* following command shall occcur:
* following command shall occur:
*
* mkdir -p -m $(umask -S),u+wx $(dirname dir) && mkdir [-m mode]
* dir
@@ -227,9 +249,9 @@ mkdir_p(char *path, mode_t omode)
bool
is_pg_dir(char *path)
is_pg_dir(const char *path)
{
char dirpath[MAXPGPATH];
char dirpath[MAXPGPATH] = "";
struct stat sb;
/* test pgdata */
@@ -242,17 +264,93 @@ is_pg_dir(char *path)
return false;
}
/*
* Attempt to determine if a PostgreSQL data directory is in use
* by reading the pidfile. This is the same mechanism used by
* "pg_ctl".
*
* This function will abort with appropriate log messages if a file error
* is encountered, as the user will need to address the situation before
* any further useful progress can be made.
*/
PgDirState
is_pg_running(const char *path)
{
long pid;
FILE *pidf;
char pid_file[MAXPGPATH];
/* it's reasonable to assume the pidfile name will not change */
snprintf(pid_file, MAXPGPATH, "%s/postmaster.pid", path);
pidf = fopen(pid_file, "r");
if (pidf == NULL)
{
/*
* No PID file - PostgreSQL shouldn't be running. From 9.3 (the
* earliest version we care about) removal of the PID file will
* cause the postmaster to shut down, so it's highly unlikely
* that PostgreSQL will still be running.
*/
if (errno == ENOENT)
{
return PG_DIR_NOT_RUNNING;
}
else
{
log_error(_("unable to open PostgreSQL PID file \"%s\""), pid_file);
log_detail("%s", strerror(errno));
exit(ERR_BAD_CONFIG);
}
}
/*
* In the unlikely event we're unable to extract a PID from the PID file,
* log a warning but assume we're not dealing with a running instance
* as PostgreSQL should have shut itself down in these cases anyway.
*/
if (fscanf(pidf, "%ld", &pid) != 1)
{
/* Is the file empty? */
if (ftell(pidf) == 0 && feof(pidf))
{
log_warning(_("PostgreSQL PID file \"%s\" is empty"), path);
}
else
{
log_warning(_("invalid data in PostgreSQL PID file \"%s\""), path);
}
fclose(pidf);
return PG_DIR_NOT_RUNNING;
}
fclose(pidf);
if (pid == getpid())
return PG_DIR_NOT_RUNNING;
if (pid == getppid())
return PG_DIR_NOT_RUNNING;
if (kill(pid, 0) == 0)
return PG_DIR_RUNNING;
return PG_DIR_NOT_RUNNING;
}
bool
create_pg_dir(char *path, bool force)
create_pg_dir(const char *path, bool force)
{
bool pg_dir = false;
/* Check this directory could be used as a PGDATA dir */
/* Check this directory can be used as a PGDATA dir */
switch (check_dir(path))
{
case 0:
/* dir not there, must create it */
case DIR_NOENT:
/* Directory does not exist, attempt to create it. */
log_info(_("creating directory \"%s\"..."), path);
if (!create_dir(path))
@@ -262,55 +360,90 @@ create_pg_dir(char *path, bool force)
return false;
}
break;
case 1:
/* Present but empty, fix permissions and use it */
log_info(_("checking and correcting permissions on existing directory %s"),
case DIR_EMPTY:
/*
* Directory exists but empty, fix permissions and use it.
*
* Note that at this point the caller might not know the server
* version number, so in this case "set_dir_permissions()" will
* accept 0750 as a valid setting. As this is invalid in Pg10 and
* earlier, the caller should call "set_dir_permissions()" again
* when it has the number.
*
* We need to do the permissions check here in any case to catch
* fatal permissions early.
*/
log_info(_("checking and correcting permissions on existing directory \"%s\""),
path);
if (!set_dir_permissions(path))
if (!set_dir_permissions(path, UNKNOWN_SERVER_VERSION_NUM))
{
log_error(_("unable to change permissions of directory \"%s\":\n %s"),
path, strerror(errno));
return false;
}
break;
case 2:
/* Present and not empty */
case DIR_NOT_EMPTY:
/* exists but is not empty */
log_warning(_("directory \"%s\" exists but is not empty"),
path);
pg_dir = is_pg_dir(path);
if (pg_dir && force)
if (is_pg_dir(path))
{
/* TODO: check DB state, if not running overwrite */
if (false)
if (force == true)
{
log_notice(_("deleting existing data directory \"%s\""), path);
log_notice(_("-F/--force provided - deleting existing data directory \"%s\""), path);
nftw(path, unlink_dir_callback, 64, FTW_DEPTH | FTW_PHYS);
/* recreate the directory ourselves to ensure permissions are correct */
if (!create_dir(path))
{
log_error(_("unable to create directory \"%s\"..."),
path);
return false;
}
return true;
}
/* Let it continue */
break;
}
else if (pg_dir && !force)
{
log_hint(_("This looks like a PostgreSQL directory.\n"
"If you are sure you want to clone here, "
"please check there is no PostgreSQL server "
"running and use the -F/--force option"));
return false;
}
else
{
if (force == true)
{
log_notice(_("deleting existing directory \"%s\""), path);
nftw(path, unlink_dir_callback, 64, FTW_DEPTH | FTW_PHYS);
return false;
default:
log_error(_("could not access directory \"%s\": %s"),
path, strerror(errno));
/* recreate the directory ourselves to ensure permissions are correct */
if (!create_dir(path))
{
log_error(_("unable to create directory \"%s\"..."),
path);
return false;
}
return true;
}
return false;
}
break;
case DIR_ERROR:
log_error(_("could not access directory \"%s\"")
, path);
log_detail("%s", strerror(errno));
return false;
}
return true;
}
int
rmdir_recursive(const char *path)
{
return nftw(path, unlink_dir_callback, 64, FTW_DEPTH | FTW_PHYS);
}
static int
unlink_dir_callback(const char *fpath, const struct stat *sb, int typeflag, struct FTW *ftwbuf)
{

View File

@@ -1,6 +1,6 @@
/*
* dirutil.h
* Copyright (c) 2ndQuadrant, 2010-2017
* Copyright (c) EnterpriseDB Corporation, 2010-2021
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -19,12 +19,29 @@
#ifndef _DIRUTIL_H_
#define _DIRUTIL_H_
extern int mkdir_p(char *path, mode_t omode);
extern bool set_dir_permissions(char *path);
typedef enum
{
DIR_ERROR = -1,
DIR_NOENT,
DIR_EMPTY,
DIR_NOT_EMPTY
} DataDirState;
extern int check_dir(char *path);
extern bool create_dir(char *path);
extern bool is_pg_dir(char *path);
extern bool create_pg_dir(char *path, bool force);
typedef enum
{
PG_DIR_ERROR = -1,
PG_DIR_NOT_RUNNING,
PG_DIR_RUNNING
} PgDirState;
extern int mkdir_p(char *path, mode_t omode);
extern bool set_dir_permissions(const char *path, int server_version_num);
extern DataDirState check_dir(const char *path);
extern bool create_dir(const char *path);
extern bool is_pg_dir(const char *path);
extern PgDirState is_pg_running(const char *path);
extern bool create_pg_dir(const char *path, bool force);
extern int rmdir_recursive(const char *path);
#endif

8
doc/.gitignore vendored
View File

@@ -1,5 +1,9 @@
HTML.index
bookindex.sgml
bookindex.xml
html-stamp
html/
version.sgml
repmgr.html
version.xml
*.fo
*.pdf
*.sgml

104
doc/Makefile Normal file
View File

@@ -0,0 +1,104 @@
# Make "html" the default target, since that is what most people tend
# to want to use.
html:
all: html
subdir = doc
repmgr_top_builddir = ..
include $(repmgr_top_builddir)/Makefile.global
XMLINCLUDE = --path .
ifndef XMLLINT
XMLLINT = $(missing) xmllint
endif
ifndef XSLTPROC
XSLTPROC = $(missing) xsltproc
endif
ifndef FOP
FOP = $(missing) fop
endif
override XSLTPROCFLAGS += --stringparam repmgr.version '$(REPMGR_VERSION)'
GENERATED_XML = version.xml
ALLXML := $(wildcard $(srcdir)/*.xml) $(GENERATED_XML)
version.xml: $(repmgr_top_builddir)/repmgr_version.h
{ \
echo "<!ENTITY repmgrversion \"$(REPMGR_VERSION)\">"; \
echo "<!ENTITY releasedate \"$(REPMGR_RELEASE_DATE)\">"; \
} > $@
##
## HTML
##
html: html-stamp
html-stamp: stylesheet.xsl repmgr.xml $(ALLXML)
$(XMLLINT) $(XMLINCLUDE) --noout --valid $(word 2,$^)
$(XSLTPROC) $(XMLINCLUDE) $(XSLTPROCFLAGS) $(XSLTPROC_HTML_FLAGS) $(wordlist 1,2,$^)
cp $(srcdir)/stylesheet.css $(srcdir)/website-docs.css html/
touch $@
# single-page HTML
repmgr.html: stylesheet-html-nochunk.xsl repmgr.xml $(ALLXML)
$(XMLLINT) $(XMLINCLUDE) --noout --valid $(word 2,$^)
$(XSLTPROC) $(XMLINCLUDE) $(XSLTPROCFLAGS) $(XSLTPROC_HTML_FLAGS) -o $@ $(wordlist 1,2,$^)
zip: html
cp -r html repmgr-docs-$(REPMGR_VERSION)
zip -r repmgr-docs-$(REPMGR_VERSION).zip repmgr-docs-$(REPMGR_VERSION)
rm -rf repmgr-docs-$(REPMGR_VERSION)
##
## Print
##
repmgr.pdf:
$(error Invalid target; use repmgr-A4.pdf or repmgr-US.pdf as targets)
# Standard paper size
repmgr-A4.fo: stylesheet-fo.xsl repmgr.xml $(ALLXML)
$(XMLLINT) $(XMLINCLUDE) --noout --valid $(word 2,$^)
$(XSLTPROC) $(XMLINCLUDE) $(XSLTPROCFLAGS) --stringparam paper.type A4 -o $@ $(wordlist 1,2,$^)
repmgr-A4.pdf: repmgr-A4.fo
$(FOP) -fo $< -pdf $@
# North American paper size
repmgr-US.fo: stylesheet-fo.xsl repmgr.xml $(ALLXML)
$(XMLLINT) $(XMLINCLUDE) --noout --valid $(word 2,$^)
$(XSLTPROC) $(XMLINCLUDE) $(XSLTPROCFLAGS) --stringparam paper.type USletter -o $@ $(wordlist 1,2,$^)
repmgr-US.pdf: repmgr-US.fo
$(FOP) -fo $< -pdf $@
install: html
@$(MKDIR_P) $(DESTDIR)$(docdir)/$(docmoduledir)/repmgr
@$(INSTALL_DATA) $(wildcard html/*.html) $(wildcard html/*.css) $(DESTDIR)$(docdir)/$(docmoduledir)/repmgr
@echo Installed docs to $(DESTDIR)$(docdir)/$(docmoduledir)/repmgr
clean:
rm -f html-stamp
rm -f HTML.index $(GENERATED_XML)
rm -f repmgr.html
rm -f repmgr-A4.pdf
rm -f repmgr-US.pdf
rm -f *.fo
rm -f html/*
maintainer-clean:
rm -rf html
.PHONY: html

View File

@@ -1,71 +0,0 @@
repmgr_subdir = doc
repmgr_top_builddir = ..
include $(repmgr_top_builddir)/Makefile.global
ifndef JADE
JADE = $(missing) jade
endif
SGMLINCLUDE = -D . -D ${srcdir}
SPFLAGS += -wall -wno-unused-param -wno-empty -wfully-tagged
JADE.html.call = $(JADE) $(JADEFLAGS) $(SPFLAGS) $(SGMLINCLUDE) $(CATALOG) -d stylesheet.dsl -t sgml -i output-html
ALLSGML := $(wildcard $(srcdir)/*.sgml)
# to build bookindex
ALMOSTALLSGML := $(filter-out %bookindex.sgml,$(ALLSGML))
GENERATED_SGML = version.sgml bookindex.sgml
Makefile: Makefile.in
cd $(repmgr_top_builddir) && ./config.status doc/Makefile
all: html
html: html-stamp
html-stamp: repmgr.sgml $(ALLSGML) $(GENERATED_SGML) stylesheet.dsl website-docs.css
$(MKDIR_P) html
$(JADE.html.call) -i include-index $<
cp $(srcdir)/stylesheet.css $(srcdir)/website-docs.css html/
touch $@
version.sgml: ${repmgr_top_builddir}/repmgr_version.h
{ \
echo "<!ENTITY repmgrversion \"$(REPMGR_VERSION)\">"; \
} > $@
HTML.index: repmgr.sgml $(ALMOSTALLSGML) stylesheet.dsl
@$(MKDIR_P) html
$(JADE.html.call) -V html-index $<
website-docs.css:
@$(MKDIR_P) html
curl http://www.postgresql.org/media/css/docs.css > ${srcdir}/website-docs.css
bookindex.sgml: HTML.index
ifdef COLLATEINDEX
LC_ALL=C $(PERL) $(COLLATEINDEX) -f -g -i 'bookindex' -o $@ $<
else
@$(missing) collateindex.pl $< $@
endif
clean:
rm -f html-stamp
rm -f HTML.index $(GENERATED_SGML)
maintainer-clean:
rm -rf html
rm -rf Makefile
zip: html
cp -r html repmgr-docs-$(REPMGR_VERSION)
zip -r repmgr-docs-$(REPMGR_VERSION).zip repmgr-docs-$(REPMGR_VERSION)
rm -rf repmgr-docs-$(REPMGR_VERSION)
install: html
@$(MKDIR_P) $(DESTDIR)$(docdir)/$(docmoduledir)/repmgr
@$(INSTALL_DATA) $(wildcard html/*.html) $(wildcard html/*.css) $(DESTDIR)$(docdir)/$(docmoduledir)/repmgr
@echo Installed docs to $(DESTDIR)$(docdir)/$(docmoduledir)/repmgr
.PHONY: html all

488
doc/appendix-faq.xml Normal file
View File

@@ -0,0 +1,488 @@
<appendix id="appendix-faq" xreflabel="FAQ">
<title>FAQ (Frequently Asked Questions)</title>
<indexterm>
<primary>FAQ (Frequently Asked Questions)</primary>
</indexterm>
<sect1 id="faq-general" xreflabel="General">
<title>General</title>
<sect2 id="faq-xrepmgr-version-diff" xreflabel="Version differences">
<title>What's the difference between the repmgr versions?</title>
<para>
&repmgr; 4 is a complete rewrite of the previous &repmgr; code base
and implements &repmgr; as a PostgreSQL extension. It
supports all PostgreSQL versions from 9.3 (although some &repmgr;
features are not available for PostgreSQL 9.3 and 9.4).
</para>
<note>
<para>
&repmgr; 5 is fundamentally the same code base as &repmgr; 4, but provides
support for the revised replication configuration mechanism in PostgreSQL 12.
</para>
<para>
Support for PostgreSQL 9.3 is no longer available from &repmgr; 5.2.
</para>
</note>
<para>
&repmgr; 3.x builds on the improved replication facilities added
in PostgreSQL 9.3, as well as improved automated failover support
via &repmgrd;, and is not compatible with PostgreSQL 9.2
and earlier. We recommend upgrading to &repmgr; 4, as the &repmgr; 3.x
series is no longer maintained.
</para>
<para>
&repmgr; 2.x supports PostgreSQL 9.0 ~ 9.3. While it is compatible
with PostgreSQL 9.3, we recommend using repmgr 4.x. &repmgr; 2.x is
no longer maintained.
</para>
<para>
See also <link linkend="install-compatibility-matrix">&repmgr; compatibility matrix</link>
and <link linkend="faq-upgrade-repmgr">Should I upgrade &repmgr;?</link>.
</para>
</sect2>
<sect2 id="faq-replication-slots-advantage" xreflabel="Advantages of replication slots">
<title>What's the advantage of using replication slots?</title>
<para>
Replication slots, introduced in PostgreSQL 9.4, ensure that the
primary server will retain WAL files until they have been consumed
by all standby servers. This means standby servers should never
fail due to not being able to retrieve required WAL files from the
primary.
</para>
<para>
However this does mean that if a standby is no longer connected to the
primary, the presence of the replication slot will cause WAL files
to be retained indefinitely, and eventually lead to disk space
exhaustion.
</para>
<tip>
<para>
Our recommended configuration is to configure
<ulink url="https://www.pgbarman.org/">Barman</ulink> as a fallback
source of WAL files, rather than maintain replication slots for
each standby. See also: <link linkend="cloning-from-barman-restore-command">Using Barman as a WAL file source</link>.
</para>
</tip>
</sect2>
<sect2 id="faq-replication-slots-number" xreflabel="Number of replication slots">
<title>How many replication slots should I define in <varname>max_replication_slots</varname>?</title>
<para>
Normally at least same number as the number of standbys which will connect
to the node. Note that changes to <varname>max_replication_slots</varname> require a server
restart to take effect, and as there is no particular penalty for unused
replication slots, setting a higher figure will make adding new nodes
easier.
</para>
</sect2>
<sect2 id="faq-hash-index" xreflabel="Hash indexes">
<title>Does &repmgr; support hash indexes?</title>
<para>
Before PostgreSQL 10, hash indexes were not WAL logged and are therefore not suitable
for use in streaming replication in PostgreSQL 9.6 and earlier. See the
<ulink url="https://www.postgresql.org/docs/9.6/sql-createindex.html#AEN80279">PostgreSQL documentation</ulink>
for details.
</para>
<para>
From PostgreSQL 10, this restriction has been lifted and hash indexes can be used
in a streaming replication cluster.
</para>
</sect2>
<sect2 id="faq-upgrades" xreflabel="Upgrading PostgreSQL with repmgr">
<title>Can &repmgr; assist with upgrading a PostgreSQL cluster?</title>
<para>
For <emphasis>minor</emphasis> version upgrades, e.g. from 9.6.7 to 9.6.8, a common
approach is to upgrade a standby to the latest version, perform a
<link linkend="performing-switchover">switchover</link> promoting it to a primary,
then upgrade the former primary.
</para>
<para>
For <emphasis>major</emphasis> version upgrades (e.g. from PostgreSQL 9.6 to PostgreSQL 10),
the traditional approach is to "reseed" a cluster by upgrading a single
node with <ulink url="https://www.postgresql.org/docs/current/pgupgrade.html">pg_upgrade</ulink>
and recloning standbys from this.
</para>
<para>
To minimize downtime during major upgrades from PostgreSQL 9.4 and later,
<ulink url="https://www.2ndquadrant.com/en/resources/pglogical/">pglogical</ulink>
can be used to set up a parallel cluster using the newer PostgreSQL version,
which can be kept in sync with the existing production cluster until the
new cluster is ready to be put into production.
</para>
</sect2>
<sect2 id="faq-libdir-repmgr-error">
<title>What does this error mean: <literal>ERROR: could not access file "$libdir/repmgr"</literal>?</title>
<para>
It means the &repmgr; extension code is not installed in the
PostgreSQL application directory. This typically happens when using PostgreSQL
packages provided by a third-party vendor, which often have different
filesystem layouts.
</para>
<para>
Either use PostgreSQL packages provided by the community or EnterpriseDB; if this
is not possible, contact your vendor for assistance.
</para>
</sect2>
<sect2 id="faq-old-packages">
<title>How can I obtain old versions of &repmgr; packages?</title>
<para>
See appendix <xref linkend="packages-old-versions"/> for details.
</para>
</sect2>
<sect2 id="faq-repmgr-required-for-replication">
<title>Is &repmgr; required for streaming replication?</title>
<para>
No.
</para>
<para>
&repmgr; (together with &repmgrd;) assists with
<emphasis>managing</emphasis> replication. It does not actually perform replication, which
is part of the core PostgreSQL functionality.
</para>
</sect2>
<sect2 id="faq-what-if-repmgr-uninstalled">
<title>Will replication stop working if &repmgr; is uninstalled?</title>
<para>
No. See preceding question.
</para>
</sect2>
<sect2 id="faq-version-mix">
<title>Does it matter if different &repmgr; versions are present in the replication cluster?</title>
<para>
Yes. If different &quot;major&quot; &repmgr; versions (e.g. 3.3.x and 4.1.x) are present,
&repmgr; (in particular &repmgrd;)
may not run, or run properly, or in the worst case (if different &repmgrd;
versions are running and there are differences in the failover implementation) break
your replication cluster.
</para>
<para>
If different &quot;minor&quot; &repmgr; versions (e.g. 4.1.1 and 4.1.6) are installed,
&repmgr; will function, but we strongly recommend always running the same version
to ensure there are no unexpected surprises, e.g. a newer version behaving slightly
differently to the older version.
</para>
<para>
See also <link linkend="faq-upgrade-repmgr">Should I upgrade &repmgr;?</link>.
</para>
</sect2>
<sect2 id="faq-upgrade-repmgr">
<title>Should I upgrade &repmgr;?</title>
<para>
Yes.
</para>
<para>
We don't release new versions for fun, you know. Upgrading may require a little effort,
but running an older &repmgr; version with bugs which have since been fixed may end up
costing you more effort. The same applies to PostgreSQL itself.
</para>
</sect2>
<sect2 id="faq-repmgr-conf-data-directory">
<title>Why do I need to specify the data directory location in repmgr.conf?</title>
<para>
In some circumstances &repmgr; may need to access a PostgreSQL data
directory while the PostgreSQL server is not running, e.g. to confirm
it shut down cleanly during a <link linkend="performing-switchover">switchover</link>.
</para>
<para>
Additionally, this provides support when using &repmgr; on PostgreSQL 9.6 and
earlier, where the <literal>repmgr</literal> user is not a superuser; in that
case the <literal>repmgr</literal> user will not be able to access the
<literal>data_directory</literal> configuration setting, access to which is restricted
to superusers.
</para>
<para>
In PostgreSQL 10 and later, non-superusers can be added to the
<ulink url="https://www.postgresql.org/docs/current/default-roles.html">default role</ulink>
<option>pg_read_all_settings</option> (or the meta-role <option>pg_monitor</option>)
which will enable them to read this setting.
</para>
</sect2>
<sect2 id="faq-third-party-packages" xreflabel="Compatibility with third party vendor packages">
<title>Are &repmgr; packages compatible with <literal>$third_party_vendor</literal>'s packages?</title>
<para>
&repmgr; packages provided by EnterpriseDB are compatible with the community-provided PostgreSQL
packages and specified software provided by EnterpriseDB.
</para>
<para>
A number of other vendors provide their own versions of PostgreSQL packages, often with different
package naming schemes and/or file locations.
</para>
<para>
We cannot guarantee that &repmgr; packages will be compatible with these packages.
It may be possible to override package dependencies (e.g. <literal>rpm --nodeps</literal>
for CentOS-based systems or <literal>dpkg --force-depends</literal> for Debian-based systems).
</para>
</sect2>
</sect1>
<sect1 id="faq-repmgr" xreflabel="repmgr">
<title><command>repmgr</command></title>
<sect2 id="faq-register-existing-node" xreflabel="registering an existing node">
<title>Can I register an existing PostgreSQL server with repmgr?</title>
<para>
Yes, any existing PostgreSQL server which is part of the same replication
cluster can be registered with &repmgr;. There's no requirement for a
standby to have been cloned using &repmgr;.
</para>
</sect2>
<sect2 id="faq-repmgr-clone-other-source" >
<title>Can I use a standby not cloned by &repmgr; as a &repmgr; node?</title>
<para>
For a standby which has been manually cloned or recovered from an external
backup manager such as Barman, the command
<command><link linkend="repmgr-standby-clone">repmgr standby clone --replication-conf-only</link></command>
can be used to create the correct replication configuration file for
use with &repmgr; (and will create a replication slot if required). Once this has been done,
<link linkend="repmgr-standby-register">register the node</link> as usual.
</para>
</sect2>
<sect2 id="faq-repmgr-recovery-conf" >
<title>What does &repmgr; write in the replication configuration, and what options can be set there?</title>
<para>
See section <link linkend="repmgr-standby-clone-recovery-conf">Customising replication configuration</link>.
</para>
</sect2>
<sect2 id="faq-repmgr-failed-primary-standby" xreflabel="Reintegrate a failed primary as a standby">
<title>How can a failed primary be re-added as a standby?</title>
<para>
This is a two-stage process. First, the failed primary's data directory
must be re-synced with the current primary; secondly the failed primary
needs to be re-registered as a standby.
</para>
<para>
It's possible to use <command>pg_rewind</command> to re-synchronise the existing data
directory, which will usually be much
faster than re-cloning the server. However <command>pg_rewind</command> can only
be used if PostgreSQL either has <varname>wal_log_hints</varname> enabled, or
data checksums were enabled when the cluster was initialized.
</para>
<para>
Note that <command>pg_rewind</command> is available as part of the core PostgreSQL
distribution from PostgreSQL 9.5, and as a third-party utility for PostgreSQL 9.3 and 9.4.
</para>
<para>
&repmgr; provides the command <command>repmgr node rejoin</command> which can
optionally execute <command>pg_rewind</command>; see the <xref linkend="repmgr-node-rejoin"/>
documentation for details, in particular the section <xref linkend="repmgr-node-rejoin-pg-rewind"/>.
</para>
<para>
If <command>pg_rewind</command> cannot be used, then the data directory will need
to be re-cloned from scratch.
</para>
</sect2>
<sect2 id="faq-repmgr-check-configuration" xreflabel="Check PostgreSQL configuration">
<title>Is there an easy way to check my primary server is correctly configured for use with &repmgr;?</title>
<para>
Execute <command><link linkend="repmgr-standby-clone">repmgr standby clone</link></command>
with the <literal>--dry-run</literal> option; this will report any configuration problems
which need to be rectified.
</para>
</sect2>
<sect2 id="faq-repmgr-clone-skip-config-files" xreflabel="">
<title>When cloning a standby, how can I get &repmgr; to copy
<filename>postgresql.conf</filename> and <filename>pg_hba.conf</filename> from the PostgreSQL configuration
directory in <filename>/etc</filename>?</title>
<para>
Use the command line option <literal>--copy-external-config-files</literal>. For more details
see <xref linkend="repmgr-standby-clone-config-file-copying"/>.
</para>
</sect2>
<sect2 id="faq-repmgr-shared-preload-libraries-no-repmgrd" xreflabel="shared_preload_libraries without repmgrd">
<title>Do I need to include <literal>shared_preload_libraries = 'repmgr'</literal>
in <filename>postgresql.conf</filename> if I'm not using &repmgrd;?</title>
<para>
No, the <literal>repmgr</literal> shared library is only needed when running &repmgrd;.
If you later decide to run &repmgrd;, you just need to add
<literal>shared_preload_libraries = 'repmgr'</literal> and restart PostgreSQL.
</para>
</sect2>
<sect2 id="faq-repmgr-permissions" xreflabel="Replication permission problems">
<title>I've provided replication permission for the <literal>repmgr</literal> user in <filename>pg_hba.conf</filename>
but <command>repmgr</command>/&repmgrd; complains it can't connect to the server... Why?</title>
<para>
<command>repmgr</command> and &repmgrd; need to be able to connect to the repmgr database
with a normal connection to query metadata. The <literal>replication</literal> connection
permission is for PostgreSQL's streaming replication (and doesn't necessarily need to be the <literal>repmgr</literal> user).
</para>
</sect2>
<sect2 id="faq-repmgr-clone-provide-primary-conninfo" xreflabel="Providing primary connection parameters">
<title>When cloning a standby, why do I need to provide the connection parameters
for the primary server on the command line, not in the configuration file?</title>
<para>
Cloning a standby is a one-time action; the role of the server being cloned
from could change, so fixing it in the configuration file would create
confusion. If &repmgr; needs to establish a connection to the primary
server, it can retrieve this from the <literal>repmgr.nodes</literal> table on the local
node, and if necessary scan the replication cluster until it locates the active primary.
</para>
</sect2>
<sect2 id="faq-repmgr-clone-waldir-xlogdir" xreflabel="Providing a custom WAL directory">
<title>When cloning a standby, how do I ensure the WAL files are placed in a custom directory?</title>
<para>
Provide the option <literal>--waldir</literal> (<literal>--xlogdir</literal> in PostgreSQL 9.6
and earlier) with the absolute path to the WAL directory in <varname>pg_basebackup_options</varname>.
For more details see <xref linkend="cloning-advanced-pg-basebackup-options"/>.
</para>
<para>
In &repmgr; 5.2 and later, this setting will also be honoured when cloning from Barman.
</para>
</sect2>
<sect2 id="faq-repmgr-events-no-fkey" xreflabel="No foreign key on node_id in repmgr.events">
<title>Why is there no foreign key on the <literal>node_id</literal> column in the <literal>repmgr.events</literal>
table?</title>
<para>
Under some circumstances event notifications can be generated for servers
which have not yet been registered; it's also useful to retain a record
of events which includes servers removed from the replication cluster
which no longer have an entry in the <literal>repmgr.nodes</literal> table.
</para>
</sect2>
<sect2 id="faq-repmgr-recovery-conf-quoted-values" xreflabel="Quoted values in replication.conf">
<title>Why are some values in <filename>recovery.conf</filename> (PostgreSQL 11 and earlier) surrounded by pairs of single quotes?</title>
<para>
This is to ensure that user-supplied values which are written as parameter values in <filename>recovery.conf</filename>
are escaped correctly and do not cause errors when the file is parsed.
</para>
<para>
The escaping is performed by an internal PostgreSQL routine, which leaves strings consisting
of digits and alphabetical characters only as-is, but wraps everything else in pairs of single quotes,
even if the string does not contain any characters which need escaping.
</para>
</sect2>
<sect2 id="faq-repmgr-exclude-metadata-from-dump" xreflabel="Excluding repmgr metadata from pg_dump output">
<title>How can I exclude &repmgr; metadata from <application>pg_dump</application> output?</title>
<para>
Beginning with &repmgr; 5.2, the metadata tables associated with the &repmgr; extension
(<literal>repmgr.nodes</literal>, <literal>repmgr.events</literal> and <literal>repmgr.monitoring_history</literal>)
have been marked as dumpable as they contain configuration and user-generated data.
</para>
<para>
To exclude these from <application>pg_dump</application> output, add the flag <option>--exclude-schema=repmgr</option>.
</para>
<para>
To exclude individual &repmgr; metadata tables from <application>pg_dump</application> output, add the flag
e.g. <option>--exclude-table=repmgr.monitoring_history</option>. This flag can be provided multiple times
to exclude individual tables,
</para>
</sect2>
</sect1>
<sect1 id="faq-repmgrd" xreflabel="repmgrd">
<title>&repmgrd;</title>
<sect2 id="faq-repmgrd-prevent-promotion" xreflabel="Prevent standby from being promoted to primary">
<title>How can I prevent a node from ever being promoted to primary?</title>
<para>
In <filename>repmgr.conf</filename>, set its priority to a value of <literal>0</literal>; apply the changed setting with
<command><link linkend="repmgr-standby-register">repmgr standby register --force</link></command>.
</para>
<para>
Additionally, if <varname>failover</varname> is set to <literal>manual</literal>, the node will never
be considered as a promotion candidate.
</para>
</sect2>
<sect2 id="faq-repmgrd-delayed-standby" xreflabel="Delayed standby support">
<title>Does &repmgrd; support delayed standbys?</title>
<para>
&repmgrd; can monitor delayed standbys - those set up with
<varname>recovery_min_apply_delay</varname> set to a non-zero value
in the replication configuration. However &repmgrd; does not currently
consider this setting, and therefore may not be able to properly evaluate
the node as a promotion candidate.
</para>
<para>
We recommend that delayed standbys are explicitly excluded from promotion
by setting <varname>priority</varname> to <literal>0</literal> in
<filename>repmgr.conf</filename>.
</para>
<para>
Note that after registering a delayed standby, &repmgrd; will only start
once the metadata added in the primary node has been replicated.
</para>
</sect2>
<sect2 id="faq-repmgrd-logfile-rotate" xreflabel="repmgrd logfile rotation">
<title>How can I get &repmgrd; to rotate its logfile?</title>
<para>
Configure your system's <literal>logrotate</literal> service to do this; see <xref linkend="repmgrd-log-rotation"/>.
</para>
</sect2>
<sect2 id="faq-repmgrd-recloned-no-start" xreflabel="repmgrd not restarting after node cloned">
<title>I've recloned a failed primary as a standby, but &repmgrd; refuses to start?</title>
<para>
Check you registered the standby after recloning. If unregistered, the standby
cannot be considered as a promotion candidate even if <varname>failover</varname> is set to
<literal>automatic</literal>, which is probably not what you want. &repmgrd; will start if
<varname>failover</varname> is set to <literal>manual</literal> so the node's replication status can still
be monitored, if desired.
</para>
</sect2>
<sect2 id="faq-repmgrd-pg-bindir" xreflabel="repmgrd does not apply pg_bindir to promote_command or follow_command">
<title>
&repmgrd; ignores pg_bindir when executing <varname>promote_command</varname> or <varname>follow_command</varname>
</title>
<para>
<varname>promote_command</varname> or <varname>follow_command</varname> can be user-defined scripts,
so &repmgr; will not apply <option>pg_bindir</option> even if executing &repmgr;. Always provide the full
path; see <xref linkend="repmgrd-automatic-failover-configuration"/> for more details.
</para>
</sect2>
<sect2 id="faq-repmgrd-startup-no-upstream" xreflabel="repmgrd does not start if upstream node is not running">
<title>
&repmgrd; aborts startup with the error "<literal>upstream node must be running before repmgrd can start</literal>"
</title>
<para>
&repmgrd; does this to avoid starting up on a replication cluster
which is not in a healthy state. If the upstream is unavailable, &repmgrd;
may initiate a failover immediately after starting up, which could have unintended side-effects,
particularly if &repmgrd; is not running on other nodes.
</para>
<para>
In particular, it's possible that the node's local copy of the <literal>repmgr.nodes</literal> copy
is out-of-date, which may lead to incorrect failover behaviour.
</para>
<para>
The onus is therefore on the administrator to manually set the cluster to a stable, healthy state before
starting &repmgrd;.
</para>
</sect2>
</sect1>
</appendix>

542
doc/appendix-packages.xml Normal file
View File

@@ -0,0 +1,542 @@
<appendix id="appendix-packages" xreflabel="Package details">
<title>&repmgr; package details</title>
<indexterm>
<primary>packages</primary>
</indexterm>
<para>
This section provides technical details about various &repmgr; binary
packages, such as location of the installed binaries and
configuration files.
</para>
<sect1 id="packages-centos" xreflabel="CentOS packages">
<title>CentOS Packages</title>
<indexterm>
<primary>packages</primary>
<secondary>CentOS packages</secondary>
</indexterm>
<indexterm>
<primary>CentOS</primary>
<secondary>package information</secondary>
</indexterm>
<para>
Currently, &repmgr; RPM packages are provided for versions 6.x and 7.x of CentOS. These should also
work on matching versions of Red Hat Enterprise Linux, Scientific Linux and Oracle Enterprise Linux;
together with CentOS, these are the same RedHat-based distributions for which the main community project
(PGDG) provides packages (see the <ulink url="https://yum.postgresql.org/">PostgreSQL RPM Building Project</ulink>
page for details).
</para>
<para>
Note these &repmgr; RPM packages are not designed to work with SuSE/OpenSuSE.
</para>
<note>
<para>
&repmgr; packages are designed to be compatible with community-provided PostgreSQL packages.
They may not work with vendor-specific packages such as those provided by RedHat for RHEL
customers, as the filesystem layout may be different to the community RPMs.
Please contact your support vendor for assistance.
</para>
</note>
<sect2 id="packages-centos-repositories">
<title>CentOS repositories</title>
<para>
&repmgr; packages are available from the public EDB repository, and also the
PostgreSQL community repository. The EDB repository is updated immediately
after each &repmgr; release.
</para>
<table id="centos-2ndquadrant-repository">
<title>EDB public repository</title>
<tgroup cols="2">
<tbody>
<row>
<entry>Repository URL:</entry>
<entry><ulink url="https://dl.enterprisedb.com/">https://dl.enterprisedb.com/</ulink></entry>
</row>
<row>
<entry>Repository documentation:</entry>
<entry><ulink url="https://repmgr.org/docs/current/installation-packages.html#INSTALLATION-PACKAGES-REDHAT-2NDQ">https://repmgr.org/docs/current/installation-packages.html#INSTALLATION-PACKAGES-REDHAT-2NDQ</ulink></entry>
</row>
</tbody>
</tgroup>
</table>
<table id="centos-pgdg-repository">
<title>PostgreSQL community repository (PGDG)</title>
<tgroup cols="2">
<tbody>
<row>
<entry>Repository URL:</entry>
<entry><ulink url="https://yum.postgresql.org/repopackages.php">https://yum.postgresql.org/repopackages.php</ulink></entry>
</row>
<row>
<entry>Repository documentation:</entry>
<entry><ulink url="https://yum.postgresql.org/">https://yum.postgresql.org/</ulink></entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
<sect2 id="packages-centos-details">
<title>CentOS package details</title>
<para>
The two tables below list relevant information, paths, commands etc. for the &repmgr; packages on
CentOS 7 (with systemd) and CentOS 6 (no systemd). Substitute the appropriate PostgreSQL major
version number for your installation.
</para>
<note>
<para>
For PostgreSQL 9.6 and lower, the CentOS packages use a mixture of <literal>9.6</literal>
and <literal>96</literal> in various places to designate the major version; e.g. the
package name is <literal>repmgr96</literal>, but the binary directory is
<filename>/var/lib/pgsql/9.6/data</filename>.
</para>
<para>
From PostgreSQL 10, the first part of the version number (e.g. <literal>10</literal>) is
the major version, so there is more consistency in file/path/package naming
(package <literal>repmgr10</literal>, binary directory <filename>/var/lib/pgsql/10/data</filename>).
</para>
</note>
<table id="centos-7-packages">
<title>CentOS 7 packages</title>
<tgroup cols="2">
<tbody>
<row>
<entry>Package name example:</entry>
<entry><filename>repmgr11-4.4.0-1.rhel7.x86_64</filename></entry>
</row>
<row>
<entry>Metapackage:</entry>
<entry>(none)</entry>
</row>
<row>
<entry>Installation command:</entry>
<entry><literal>yum install repmgr11</literal></entry>
</row>
<row>
<entry>Binary location:</entry>
<entry><filename>/usr/pgsql-11/bin</filename></entry>
</row>
<row>
<entry>repmgr in default path:</entry>
<entry>NO</entry>
</row>
<row>
<entry>Configuration file location:</entry>
<entry><filename>/etc/repmgr/11/repmgr.conf</filename></entry>
</row>
<row>
<entry>Data directory:</entry>
<entry><filename>/var/lib/pgsql/11/data</filename></entry>
</row>
<row>
<entry>repmgrd service command:</entry>
<entry><command>systemctl [start|stop|restart|reload] repmgr11</command></entry>
</row>
<row>
<entry>repmgrd service file location:</entry>
<entry><filename>/usr/lib/systemd/system/repmgr11.service</filename></entry>
</row>
<row>
<entry>repmgrd log file location:</entry>
<entry>(not specified by package; set in <filename>repmgr.conf</filename>)</entry>
</row>
</tbody>
</tgroup>
</table>
<table id="centos-6-packages">
<title>CentOS 6 packages</title>
<tgroup cols="2">
<tbody>
<row>
<entry>Package name example:</entry>
<entry><filename>repmgr96-4.0.4-1.rhel6.x86_64</filename></entry>
</row>
<row>
<entry>Metapackage:</entry>
<entry>(none)</entry>
</row>
<row>
<entry>Installation command:</entry>
<entry><literal>yum install repmgr96</literal></entry>
</row>
<row>
<entry>Binary location:</entry>
<entry><filename>/usr/pgsql-9.6/bin</filename></entry>
</row>
<row>
<entry>repmgr in default path:</entry>
<entry>NO</entry>
</row>
<row>
<entry>Configuration file location:</entry>
<entry><filename>/etc/repmgr/9.6/repmgr.conf</filename></entry>
</row>
<row>
<entry>Data directory:</entry>
<entry><filename>/var/lib/pgsql/9.6/data</filename></entry>
</row>
<row>
<entry>repmgrd service command:</entry>
<entry><literal>service [start|stop|restart|reload] repmgr-9.6</literal></entry>
</row>
<row>
<entry>repmgrd service file location:</entry>
<entry><literal>/etc/init.d/repmgr-9.6</literal></entry>
</row>
<row>
<entry>repmgrd log file location:</entry>
<entry><filename>/var/log/repmgr/repmgrd-9.6.log</filename></entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
</sect1>
<sect1 id="packages-debian-ubuntu" xreflabel="Debian/Ubuntu packages">
<title>Debian/Ubuntu Packages</title>
<indexterm>
<primary>packages</primary>
<secondary>Debian/Ubuntu packages</secondary>
</indexterm>
<indexterm>
<primary>Debian/Ubuntu</primary>
<secondary>package information</secondary>
</indexterm>
<para>
&repmgr; <literal>.deb</literal> packages are provided by EDB as well as the
PostgreSQL Community APT repository, and are available for each community-supported
PostgreSQL version, currently supported Debian releases, and currently supported
Ubuntu LTS releases.
</para>
<sect2 id="packages-apt-repository">
<title>APT repositories</title>
<table id="apt-2ndquadrant-repository">
<title>EDB public repository</title>
<tgroup cols="2">
<tbody>
<row>
<entry>Repository URL:</entry>
<entry><ulink url="https://dl.enterprisedb.com/">https://dl.enterprisedb.com/</ulink></entry>
</row>
<row>
<entry>Repository documentation:</entry>
<entry><ulink url="https://repmgr.org/docs/current/installation-packages.html#INSTALLATION-PACKAGES-DEBIAN">https://repmgr.org/docs/current/installation-packages.html#INSTALLATION-PACKAGES-DEBIAN</ulink></entry>
</row>
</tbody>
</tgroup>
</table>
<table id="apt-repository">
<title>PostgreSQL Community APT repository (PGDG)</title>
<tgroup cols="2">
<tbody>
<row>
<entry>Repository URL:</entry>
<entry><ulink url="https://apt.postgresql.org/">https://apt.postgresql.org/</ulink></entry>
</row>
<row>
<entry>Repository documentation:</entry>
<entry><ulink url="https://wiki.postgresql.org/wiki/Apt">https://wiki.postgresql.org/wiki/Apt</ulink></entry>
</row>
</tbody>
</tgroup>
</table>
</sect2>
<sect2 id="packages-debian-details">
<title>Debian/Ubuntu package details</title>
<para>
The table below lists relevant information, paths, commands etc. for the &repmgr; packages on
Debian 9.x ("Stretch"). Substitute the appropriate PostgreSQL major
version number for your installation.
</para>
<para>
See also <xref linkend="repmgrd-configuration-debian-ubuntu"/> for some specifics related
to configuring the &repmgrd; daemon.
</para>
<table id="debian-9-packages">
<title>Debian 9.x packages</title>
<tgroup cols="2">
<tbody>
<row>
<entry>Package name example:</entry>
<entry><filename>postgresql-11-repmgr</filename></entry>
</row>
<row>
<entry>Metapackage:</entry>
<entry><filename>repmgr-common</filename></entry>
</row>
<row>
<entry>Installation command:</entry>
<entry><literal>apt-get install postgresql-11-repmgr</literal></entry>
</row>
<row>
<entry>Binary location:</entry>
<entry><filename>/usr/lib/postgresql/11/bin</filename></entry>
</row>
<row>
<entry>repmgr in default path:</entry>
<entry>Yes (via wrapper script <filename>/usr/bin/repmgr</filename>)</entry>
</row>
<row>
<entry>Configuration file location:</entry>
<entry>(not set by package)</entry>
</row>
<row>
<entry>Data directory:</entry>
<entry><filename>/var/lib/postgresql/11/main</filename></entry>
</row>
<row>
<entry>PostgreSQL service command:</entry>
<entry><command>systemctl [start|stop|restart|reload] postgresql@11-main</command></entry>
</row>
<row>
<entry>repmgrd service command:</entry>
<entry><command>systemctl [start|stop|restart|reload] repmgrd</command></entry>
</row>
<row>
<entry>repmgrd service file location:</entry>
<entry><filename>/etc/init.d/repmgrd</filename> (defaults in: <filename>/etc/defaults/repmgrd</filename>)</entry>
</row>
<row>
<entry>repmgrd log file location:</entry>
<entry>(not specified by package; set in <filename>repmgr.conf</filename>)</entry>
</row>
</tbody>
</tgroup>
</table>
<note>
<para>
When using Debian packages, instead of using the <application>systemd</application> service
command directly, it's recommended to execute <command>pg_ctlcluster</command>
(as <literal>root</literal>, either directly or via <command>sudo</command>), e.g.:
<programlisting>
<command>pg_ctlcluster 11 main [start|stop|restart|reload]</command></programlisting>
</para>
<para>
For pre-<application>systemd</application> systems, <command>pg_ctlcluster</command>
can be executed directly by the <literal>postgres</literal> user.
</para>
</note>
</sect2>
</sect1>
<sect1 id="packages-snapshot" xreflabel="Snapshot packages">
<title>Snapshot packages</title>
<indexterm>
<primary>snapshot packages</primary>
</indexterm>
<indexterm>
<primary>packages</primary>
<secondary>snapshots</secondary>
</indexterm>
<para>
For testing new features and bug fixes, from time to time EDB provides
so-called &quot;snapshot packages&quot; via its public repository. These packages
are built from the &repmgr; source at a particular point in time, and are not formal
releases.
</para>
<note>
<para>
We do not recommend installing these packages in a production environment
unless specifically advised.
</para>
</note>
<para>
To install a snapshot package, it's necessary to install the EDB public snapshot repository,
following the instructions here: <ulink url="https://dl.enterprisedb.com/default/release/site/">https://dl.enterprisedb.com/default/release/site/</ulink> but replace <literal>release</literal> with <literal>snapshot</literal>
in the appropriate URL.
</para>
<para>
For example, to install the snapshot RPM repository for PostgreSQL 9.6, execute (as <literal>root</literal>):
<programlisting>
curl https://dl.enterprisedb.com/default/snapshot/get/9.6/rpm | bash</programlisting>
or as a normal user with root sudo access:
<programlisting>
curl https://dl.enterprisedb.com/default/snapshot/get/9.6/rpm | sudo bash</programlisting>
</para>
<para>
Alternatively you can browse the repository here:
<ulink url="https://dl.enterprisedb.com/default/snapshot/browse/">https://dl.enterprisedb.com/default/snapshot/browse/</ulink>.
</para>
<para>
Once the repository is installed, installing or updating &repmgr; will result in the latest snapshot
package being installed.
</para>
<para>
The package name will be formatted like this:
<programlisting>
repmgr96-4.1.1-0.0git320.g5113ab0.1.el7.x86_64.rpm</programlisting>
containing the snapshot build number (here: <literal>320</literal>) and the hash
of the <application>git</application> commit it was built from (here: <literal>g5113ab0</literal>).
</para>
<para>
Note that the next formal release (in the above example <literal>4.1.1</literal>), once available,
will install in place of any snapshot builds.
</para>
</sect1>
<sect1 id="packages-old-versions" xreflabel="Installing old package versions">
<title>Installing old package versions</title>
<indexterm>
<primary>old packages</primary>
</indexterm>
<indexterm>
<primary>packages</primary>
<secondary>old versions</secondary>
</indexterm>
<indexterm>
<primary>installation</primary>
<secondary>old package versions</secondary>
</indexterm>
<sect2 id="packages-old-versions-debian" xreflabel="old Debian package versions">
<title>Debian/Ubuntu</title>
<para>
An archive of old packages (<literal>3.3.2</literal> and later) for Debian/Ubuntu-based systems is available here:
<ulink url="https://apt-archive.postgresql.org/">https://apt-archive.postgresql.org/</ulink>
</para>
</sect2>
<sect2 id="packages-old-versions-rhel-centos" xreflabel="old RHEL/CentOS package versions">
<title>RHEL/CentOS</title>
<para>
Old versions can be located with e.g.:
<programlisting>
yum --showduplicates list repmgr96</programlisting>
(substitute the appropriate package name; see <xref linkend="packages-centos"/>) and installed with:
<programlisting>
yum install {package_name}-{version}</programlisting>
where <literal>{package_name}</literal> is the base package name (e.g. <literal>repmgr96</literal>)
and <literal>{version}</literal> is the version listed by the
<command> yum --showduplicates list ...</command> command, e.g. <literal>4.0.6-1.rhel6</literal>.
</para>
<para>For example:
<programlisting>
yum install repmgr96-4.0.6-1.rhel6</programlisting>
</para>
</sect2>
</sect1>
<sect1 id="packages-packager-info" xreflabel="Information for packagers">
<title>Information for packagers</title>
<indexterm>
<primary>packages</primary>
<secondary>information for packagers</secondary>
</indexterm>
<para>
We recommend patching the following parameters when
building the package as built-in default values for user convenience.
These values can nevertheless be overridden by the user, if desired.
</para>
<itemizedlist>
<listitem>
<para>
Configuration file location: the default configuration file location
can be hard-coded by patching <varname>package_conf_file</varname>
in <filename>configfile.c</filename>:
<programlisting>
/* packagers: if feasible, patch configuration file path into "package_conf_file" */
char package_conf_file[MAXPGPATH] = "";</programlisting>
</para>
<para>
See also: <xref linkend="configuration-file"/>
</para>
</listitem>
<listitem>
<para>
PID file location: the default &repmgrd; PID file
location can be hard-coded by patching <varname>package_pid_file</varname>
in <filename>repmgrd.c</filename>:
<programlisting>
/* packagers: if feasible, patch PID file path into "package_pid_file" */
char package_pid_file[MAXPGPATH] = "";</programlisting>
</para>
<para>
See also: <xref linkend="repmgrd-pid-file"/>
</para>
</listitem>
</itemizedlist>
</sect1>
</appendix>

File diff suppressed because it is too large Load Diff

View File

@@ -1,5 +0,0 @@
<appendix id="appendix-signatures" xreflabel="Verifying digital signatures">
<title>Verifying digital signatures</title>
<para>WIP</para>
</appendix>

View File

@@ -0,0 +1,37 @@
<appendix id="appendix-signatures" xreflabel="Verifying digital signatures">
<title>Verifying digital signatures</title>
<sect1 id="repmgr-source-key" xreflabel="repmgr source key">
<title>repmgr source code signing key</title>
<para>
The signing key ID used for <application>repmgr</application> source code bundles is:
<ulink url="https://repmgr.org/download/SOURCE-GPG-KEY-repmgr">
<literal>0x297F1DCC</literal></ulink>.
</para>
<para>
To download the <application>repmgr</application> source key to your computer:
<programlisting>
curl -s https://repmgr.org/download/SOURCE-GPG-KEY-repmgr | gpg --import
gpg --fingerprint 0x297F1DCC
</programlisting>
then verify that the fingerprint is the expected value:
<programlisting>
085A BE38 6FD9 72CE 6365 340D 8365 683D 297F 1DCC</programlisting>
</para>
<para>
For checking tarballs, first download and import the <application>repmgr</application>
source signing key as shown above. Then download both source tarball and the detached
key (e.g. <filename>repmgr-4.0beta1.tar.gz</filename> and
<filename>repmgr-4.0beta1.tar.gz.asc</filename>) from
<ulink url="https://repmgr.org/download/">https://repmgr.org/download/</ulink>
and use <application>gpg</application> to verify the key, e.g.:
<programlisting>
gpg --verify repmgr-4.0beta1.tar.gz.asc</programlisting>
</para>
</sect1>
</appendix>

114
doc/appendix-support.xml Normal file
View File

@@ -0,0 +1,114 @@
<appendix id="appendix-support" xreflabel="repmgr support">
<title>&repmgr; support</title>
<indexterm>
<primary>support</primary>
</indexterm>
<para>
<ulink url="https://www.enterprisedb.com/">EDB</ulink> provides 24x7
production support for &repmgr; and other PostgreSQL
products, including configuration assistance, installation
verification and training for running a robust replication cluster.
</para>
<para>
For further details see: <ulink url="https://www.enterprisedb.com/support/postgresql-support-overview-get-the-most-out-of-postgresql">Support Center</ulink>
</para>
<para>
A mailing list/forum is provided via Google groups to discuss contributions or issues: <ulink url="https://groups.google.com/group/repmgr">https://groups.google.com/group/repmgr</ulink>.
</para>
<para>
Please report bugs and other issues to: <ulink url="https://github.com/EnterpriseDB/repmgr">https://github.com/EnterpriseDB/repmgr</ulink>.
</para>
<important>
<para>
Please read the <link linkend="appendix-support-reporting-issues">following section</link> before submitting questions or issue reports.
</para>
</important>
<sect1 id="appendix-support-reporting-issues" xreflabel="Reportins Issues">
<title>Reporting Issues</title>
<indexterm>
<primary>support</primary>
<secondary>reporting issues</secondary>
</indexterm>
<para>
When asking questions or reporting issues, it is extremely helpful if the following information is included:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
PostgreSQL version
</simpara>
</listitem>
<listitem>
<simpara>
&repmgr; version
</simpara>
</listitem>
<listitem>
<simpara>
How was &repmgr; installed? From source? From packages? If
so from which repository?
</simpara>
</listitem>
<listitem>
<simpara>
<filename>repmgr.conf</filename> files (suitably anonymized if necessary)
</simpara>
</listitem>
<listitem>
<simpara>
Contents of the <literal>repmgr.nodes</literal> table (suitably anonymized if necessary)
</simpara>
</listitem>
<listitem>
<simpara>
PostgreSQL 11 and earlier: contents of the <filename>recovery.conf</filename> file
(suitably anonymized if necessary).
</simpara>
</listitem>
<listitem>
<simpara>
PostgreSQL 12 and later: contents of the <filename>postgresql.auto.conf</filename> file
(suitably anonymized if necessary), and whether or not the PostgreSQL data directory
contains the files <filename>standby.signal</filename> and/or <filename>recovery.signal</filename>.
</simpara>
</listitem>
</itemizedlist>
</para>
<para>
If issues are encountered with a &repmgr; client command, please provide
the output of that command executed with the options
<option>-LDEBUG --verbose</option>, which will ensure &repmgr; emits
the maximum level of logging output.
</para>
<para>
If issues are encountered with &repmgrd;,
please provide relevant extracts from the &repmgr; log files
and if possible the PostgreSQL log itself. Please ensure these
logs do not contain any confidential data.
</para>
<para>
In all cases it is <emphasis>extremely</emphasis> useful to receive
as much detail as possible on how to reliably reproduce
an issue.
</para>
</sect1>
</appendix>

View File

@@ -1,288 +0,0 @@
BDR failover with repmgrd
=========================
`repmgr 4` provides support for monitoring BDR nodes and taking action in case
one of the nodes fails.
*NOTE* Due to the nature of BDR, it's only safe to use this solution for
a two-node scenario. Introducing additional nodes will create an inherent
risk of node desynchronisation if a node goes down without being cleanly
removed from the cluster.
In contrast to streaming replication, there's no concept of "promoting" a new
primary node with BDR. Instead, "failover" involves monitoring both nodes
with `repmgrd` and redirecting queries from the failed node to the remaining
active node. This can be done by using the event notification script generated by
`repmgrd` to dynamically reconfigure a proxy server/connection pooler such
as PgBouncer.
Prerequisites
-------------
`repmgr 4` requires PostgreSQL 9.6 with the BDR 2 extension enabled and
configured for a two-node BDR network. `repmgr 4` packages
must be installed on each node before attempting to configure repmgr.
*NOTE* `repmgr 4` will refuse to install if it detects more than two
BDR nodes.
Application database connections *must* be passed through a proxy server/
connection pooler such as PgBouncer, and it must be possible to dynamically
reconfigure that from `repmgrd`. The example demonstrated in this document
will use PgBouncer.
The proxy server / connection poolers must not be installed on the database
servers.
For this example, it's assumed password-less SSH connections are available
from the PostgreSQL servers to the servers where PgBouncer runs, and
that the user on those servers has permission to alter the PgBouncer
configuration files.
PostgreSQL connections must be possible between each node, and each node
must be able to connect to each PgBouncer instance.
Configuration
-------------
Sample configuration for `repmgr.conf`:
node_id=1
node_name='node1'
conninfo='host=node1 dbname=bdrtest user=repmgr connect_timeout=2'
replication_type='bdr'
event_notifications=bdr_failover
event_notification_command='/path/to/bdr-pgbouncer.sh %n %e %s "%c" "%a" >> /tmp/bdr-failover.log 2>&1'
# repmgrd options
monitor_interval_secs=5
reconnect_attempts=6
reconnect_interval=5
Adjust settings as appropriate; copy and adjust for the second node (particularly
the values `node_id`, `node_name` and `conninfo`).
Note that the values provided for the `conninfo` string must be valid for
connections from *both* nodes in the cluster. The database must be the BDR
database.
If defined, `event_notifications` will restrict execution of `event_notification_command`
to the specified events.
`event_notification_command` is the script which does the actual "heavy lifting"
of reconfiguring the proxy server/ connection pooler. It is fully user-definable;
a sample implementation is documented below.
repmgr user permissions
-----------------------
`repmgr` will create an extension in the BDR database containing objects
for administering `repmgr` metadata. The user defined in the `conninfo`
setting must be able to access all objects. Additionally, superuser permissions
are required to install the `repmgr` extension. The easiest way to do this
is create the `repmgr` user as a superuser, however if this is not
desirable, the `repmgr` user can be created as a normal user and a
superuser specified with `--superuser` when registering a BDR node.
repmgr setup
------------
Register both nodes:
$ repmgr -f /etc/repmgr.conf bdr register
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
NOTICE: node record created for node 'node1' (ID: 1)
NOTICE: BDR node 1 registered (conninfo: host=localhost dbname=bdrtest user=repmgr port=5501)
$ repmgr -f /etc/repmgr.conf bdr register
NOTICE: node record created for node 'node2' (ID: 2)
NOTICE: BDR node 2 registered (conninfo: host=localhost dbname=bdrtest user=repmgr port=5502)
The `repmgr` extension will be automatically created when the first
node is registered, and will be propagated to the second node.
*IMPORTANT* ensure the repmgr package is available on both nodes before
attempting to register the first node
At this point the meta data for both nodes has been created; executing
`repmgr cluster show` (on either node) should produce output like this:
$ repmgr -f /etc/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Connection string
----+-------+------+-----------+----------+--------------------------------------------------------
1 | node1 | bdr | * running | | host=node1 dbname=bdrtest user=repmgr connect_timeout=2
2 | node2 | bdr | * running | | host=node2 dbname=bdrtest user=repmgr connect_timeout=2
Additionally it's possible to see a log of significant events; so far
this will only record the two node registrations (in reverse chronological order):
Node ID | Event | OK | Timestamp | Details
---------+--------------+----+---------------------+----------------------------------------------
2 | bdr_register | t | 2017-07-27 17:51:48 | node record created for node 'node2' (ID: 2)
1 | bdr_register | t | 2017-07-27 17:51:00 | node record created for node 'node1' (ID: 1)
Defining the "event_notification_command"
-----------------------------------------
Key to "failover" execution is the `event_notification_command`, which is a
user-definable script which should reconfigure the proxy server/
connection pooler.
Each time `repmgr` (or `repmgrd`) records an event, it can optionally
execute the script defined in `event_notification_command` to
take further action; details of the event will be passed as parameters.
Following placeholders are available to the script:
%n - node ID
%e - event type
%s - success (1 or 0)
%t - timestamp
%d - details
%c - conninfo string of the next available node
%a - name of the next available node
Note that `%c` and `%a` will only be provided during `bdr_failover`
events, which is what is of interest here.
The provided sample script (`scripts/bdr-pgbouncer.sh`) is configured like
this:
event_notification_command='/path/to/bdr-pgbouncer.sh %n %e %s "%c" "%a"'
and parses the configures parameters like this:
NODE_ID=$1
EVENT_TYPE=$2
SUCCESS=$3
NEXT_CONNINFO=$4
NEXT_NODE_NAME=$5
It also contains some hard-coded values about the PgBouncer configuration for
both nodes; these will need to be adjusted for your local environment of course
(ideally the scripts would be maintained as templates and generated by some
kind of provisioning system).
The script performs following steps:
- pauses PgBouncer on all nodes
- recreates the PgBouncer configuration file on each node using the information
provided by `repmgrd` (mainly the `conninfo` string) to configure PgBouncer
to point to the remaining node
- reloads the PgBouncer configuration
- resumes PgBouncer
From that point, any connections to PgBouncer on the failed BDR node will be redirected
to the active node.
repmgrd
-------
Node monitoring and failover
----------------------------
At the intervals specified by `monitor_interval_secs` in `repmgr.conf`, `repmgrd`
will ping each node to check if it's available. If a node isn't available,
`repmgrd` will enter failover mode and check `reconnect_attempts` times
at intervals of `reconnect_interval` to confirm the node is definitely unreachable.
This buffer period is necessary to avoid false positives caused by transient
network outages.
If the node is still unavailable, `repmgrd` will enter failover mode and execute
the script defined in `event_notification_command`; an entry will be logged
in the `repmgr.events` table and `repmgrd` will (unless otherwise configured)
resume monitoring of the node in "degraded" mode until it reappears.
`repmgrd` logfile output during a failover event will look something like this
one one node (usually the node which has failed, here "node2"):
...
[2017-07-27 21:08:39] [INFO] starting continuous BDR node monitoring
[2017-07-27 21:08:39] [INFO] monitoring BDR replication status on node "node2" (ID: 2)
[2017-07-27 21:08:55] [INFO] monitoring BDR replication status on node "node2" (ID: 2)
[2017-07-27 21:09:11] [INFO] monitoring BDR replication status on node "node2" (ID: 2)
[2017-07-27 21:09:23] [WARNING] unable to connect to node node2 (ID 2)
[2017-07-27 21:09:23] [INFO] checking state of node 2, 0 of 5 attempts
[2017-07-27 21:09:23] [INFO] sleeping 1 seconds until next reconnection attempt
[2017-07-27 21:09:24] [INFO] checking state of node 2, 1 of 5 attempts
[2017-07-27 21:09:24] [INFO] sleeping 1 seconds until next reconnection attempt
[2017-07-27 21:09:25] [INFO] checking state of node 2, 2 of 5 attempts
[2017-07-27 21:09:25] [INFO] sleeping 1 seconds until next reconnection attempt
[2017-07-27 21:09:26] [INFO] checking state of node 2, 3 of 5 attempts
[2017-07-27 21:09:26] [INFO] sleeping 1 seconds until next reconnection attempt
[2017-07-27 21:09:27] [INFO] checking state of node 2, 4 of 5 attempts
[2017-07-27 21:09:27] [INFO] sleeping 1 seconds until next reconnection attempt
[2017-07-27 21:09:28] [WARNING] unable to reconnect to node 2 after 5 attempts
[2017-07-27 21:09:28] [NOTICE] setting node record for node 2 to inactive
[2017-07-27 21:09:28] [INFO] executing notification command for event "bdr_failover"
[2017-07-27 21:09:28] [DETAIL] command is:
/path/to/bdr-pgbouncer.sh 2 bdr_failover 1 "host=host=node1 dbname=bdrtest user=repmgr connect_timeout=2" "node1"
[2017-07-27 21:09:28] [INFO] node 'node2' (ID: 2) detected as failed; next available node is 'node1' (ID: 1)
[2017-07-27 21:09:28] [INFO] monitoring BDR replication status on node "node2" (ID: 2)
[2017-07-27 21:09:28] [DETAIL] monitoring node "node2" (ID: 2) in degraded mode
...
Output on the other node ("node1") during the same event will look like this:
[2017-07-27 21:08:35] [INFO] starting continuous BDR node monitoring
[2017-07-27 21:08:35] [INFO] monitoring BDR replication status on node "node1" (ID: 1)
[2017-07-27 21:08:51] [INFO] monitoring BDR replication status on node "node1" (ID: 1)
[2017-07-27 21:09:07] [INFO] monitoring BDR replication status on node "node1" (ID: 1)
[2017-07-27 21:09:23] [WARNING] unable to connect to node node2 (ID 2)
[2017-07-27 21:09:23] [INFO] checking state of node 2, 0 of 5 attempts
[2017-07-27 21:09:23] [INFO] sleeping 1 seconds until next reconnection attempt
[2017-07-27 21:09:24] [INFO] checking state of node 2, 1 of 5 attempts
[2017-07-27 21:09:24] [INFO] sleeping 1 seconds until next reconnection attempt
[2017-07-27 21:09:25] [INFO] checking state of node 2, 2 of 5 attempts
[2017-07-27 21:09:25] [INFO] sleeping 1 seconds until next reconnection attempt
[2017-07-27 21:09:26] [INFO] checking state of node 2, 3 of 5 attempts
[2017-07-27 21:09:26] [INFO] sleeping 1 seconds until next reconnection attempt
[2017-07-27 21:09:27] [INFO] checking state of node 2, 4 of 5 attempts
[2017-07-27 21:09:27] [INFO] sleeping 1 seconds until next reconnection attempt
[2017-07-27 21:09:28] [WARNING] unable to reconnect to node 2 after 5 attempts
[2017-07-27 21:09:28] [NOTICE] other node's repmgrd is handling failover
[2017-07-27 21:09:28] [INFO] monitoring BDR replication status on node "node1" (ID: 1)
[2017-07-27 21:09:28] [DETAIL] monitoring node "node2" (ID: 2) in degraded mode
This assumes only the PostgreSQL instance on "node2" has failed. In this case the
`repmgrd` instance running on "node2" has performed the failover. However if
the entire server becomes unavailable, `repmgrd` on "node1" will perform
the failover.
Node recovery
-------------
Following failure of a BDR node, if the node subsequently becomes available again,
a `bdr_recovery` event will be generated. This could potentially be used to
reconfigure PgBouncer automatically to bring the node back into the available pool,
however it would be prudent to manually verify the node's status before
exposing it to the application.
If the failed node comes back up and connects correctly, output similar to this
will be visible in the `repmgrd` log:
[2017-07-27 21:25:30] [DETAIL] monitoring node "node2" (ID: 2) in degraded mode
[2017-07-27 21:25:46] [INFO] monitoring BDR replication status on node "node2" (ID: 2)
[2017-07-27 21:25:46] [DETAIL] monitoring node "node2" (ID: 2) in degraded mode
[2017-07-27 21:25:55] [INFO] active replication slot for node "node1" found after 1 seconds
[2017-07-27 21:25:55] [NOTICE] node "node2" (ID: 2) has recovered after 986 seconds
Shutdown of both nodes
----------------------
If both PostgreSQL instances are shut down, `repmgrd` will try and handle the
situation as gracefully as possible, though with no failover candidates available
there's not much it can do. Should this case ever occur, we recommend shutting
down `repmgrd` on both nodes and restarting it once the PostgreSQL instances
are running properly.

View File

@@ -1,106 +1,7 @@
Changes in repmgr 4
===================
Standardisation on `primary`
----------------------------
This document has been integrated into the main `repmgr` documentation
and is now located here:
To standardise terminology, `primary` is used to denote the read/write
node in a streaming replication cluster. `master` is still accepted
as a synonym (e.g. `repmgr master register`).
New command line options
------------------------
- `--dry-run`: repmgr will attempt to perform the action as far as possible
without making any changes to the database
- `--upstream-node-id`: use to specify the upstream node the standby will
connect later stream from, when cloning a standby. This replaces the configuration
file parameter `upstream_node`, as the upstream node is set when the standby
is initially cloned, but can change over the lifetime of an installation (due
to failovers, switchovers etc.) so it's pointless/confusing keeping the original
value around in the config file.
Changed command line options
----------------------------
### repmgr
- `--replication-user` has been deprecated; it has been replaced by the
configuration file option `replication_user`. The value (which defaults
to the user in the `conninfo` string) will be stored in the repmgr metadata
for use by standby clone/follow..
- `--recovery-min-apply-delay` is now a configuration file parameter
`recovery_min_apply_delay, to ensure the setting does not get lost when
a standby follows a new upstream.
### repmgrd
- `--monitoring-history` is deprecated and has been replaced by the
configuration file option `monitoring_history`. This enables the
setting to be changed without having to modify system service files.
Changes to repmgr commands
--------------------------
### `repmgr cluster show`
This now displays the role of each node (e.g. `primary`, `standby`)
and its status in separate columns.
The `--csv` option now emits a third column indicating the recovery
status of the node.
Configuration file changes
--------------------------
### Required settings
The following 4 parameters are mandatory in `repmgr.conf`:
- `node_id`
- `node_name`
- `conninfo`
- `data_directory`
### Renamed settings
Some settings have been renamed for clarity and consistency:
- `node`: now `node_id`
- `name`: now `node_name`
- `master_reponse_timeout`: now `async_query_timeout` to better indicate its
purpose
- The following configuration file parameters have been renamed for consistency
with other parameters (and conform to the pattern used by PostgreSQL itself,
which uses the prefix `log_` for logging parameters):
- `loglevel` has been renamed to `log_level`
- `logfile` has been renamed to `log_file`
- `logfacility` has been renamed to `log_facility`
### Removed settings
- `cluster`: has been removed
- `upstream_node`: see note about `--upstream-node-id` above.
- `retry_promote_interval_secs`: this is now redundant due to changes in the
failover/promotion mechanism; the new equivalent is `primary_notification_timeout`
### Logging changes
- default value for `log_level` is `INFO` rather than `NOTICE`.
- new parameter `log_status_interval`, which causes `repmgrd` to emit a status log
line at the specified interval
repmgrd
-------
The `repmgr` shared library has been renamed from `repmgr_funcs` to `repmgr`,
meaning `shared_preload_libraries` needs to be updated to the new name:
shared_preload_libraries = 'repmgr'
> [Release notes](https://repmgr.org/docs/current/release-4.0.html)

View File

@@ -1,17 +1,21 @@
<chapter id="cloning-standbys" xreflabel="cloning standbys">
<title>Cloning standbys</title>
<para>
</para>
<sect1 id="cloning-from-barman" xreflabel="Cloning from Barman">
<indexterm>
<primary>cloning</primary>
<secondary>from Barman</secondary>
</indexterm>
<title>Cloning a standby from Barman</title>
<indexterm>
<primary>cloning</primary>
<secondary>from Barman</secondary>
</indexterm>
<indexterm>
<primary>Barman</primary>
<secondary>cloning a standby</secondary>
</indexterm>
<para>
<xref linkend="repmgr-standby-clone"> can use
<ulink url="https://www.2ndquadrant.com/">2ndQuadrant</ulink>'s
<xref linkend="repmgr-standby-clone"/> can use
<ulink url="https://www.enterprisedb.com/">EDB</ulink>'s
<ulink url="https://www.pgbarman.org/">Barman</ulink> application
to clone a standby (and also as a fallback source for WAL files).
</para>
@@ -42,13 +46,33 @@
<para>
WAL management on the primary becomes much easier as there's no need
to use replication slots, and <varname>wal_keep_segments</varname>
(PostgreSQL 13 and later: <varname>wal_keep_size</varname>)
does not need to be set.
</para>
</listitem>
</itemizedlist>
</para>
<sect2 id="cloning-from-barman-prerequisites" xreflabel="Prerequisites for cloning from Barman">
<note>
<para>
Currently &repmgr;'s support for cloning from Barman is implemented by using
<productname>rsync</productname> to clone from the Barman server.
</para>
<para>
It is therefore not able to make use of Barman's parallel restore facility, which
is executed on the Barman server and clones to the target server.
</para>
<para>
Barman's parallel restore facility can be used by executing it manually on
the Barman server and configuring replication on the resulting cloned
standby using
<command><link linkend="repmgr-standby-clone">repmgr standby clone --replication-conf-only</link></command>.
</para>
</note>
<sect2 id="cloning-from-barman-prerequisites">
<title>Prerequisites for cloning from Barman</title>
<para>
In order to enable Barman support for <command>repmgr standby clone</command>, following
@@ -56,8 +80,7 @@
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<para>
the <varname>barman_server</varname> setting in <filename>repmgr.conf</filename> is the same as the
server configured in Barman;
the Barman catalogue must include at least one valid backup for this server;
</para>
</listitem>
<listitem>
@@ -68,18 +91,90 @@
</listitem>
<listitem>
<para>
the <varname>restore_command</varname> setting in <filename>repmgr.conf</filename> is configured to
use a copy of the <command>barman-wal-restore</command> script shipped with the
<literal>barman-cli</literal> package (see below);
</para>
</listitem>
<listitem>
<para>
the Barman catalogue includes at least one valid backup for this server.
the <varname>barman_server</varname> setting in <filename>repmgr.conf</filename> is the same as the
server configured in Barman.
</para>
</listitem>
</itemizedlist>
</para>
<para>
For example, assuming Barman is located on the host &quot;<literal>barmansrv</literal>&quot;
under the &quot;<literal>barman</literal>&quot; user account,
<filename>repmgr.conf</filename> should contain the following entries:
<programlisting>
barman_host='barman@barmansrv'
barman_server='pg'</programlisting>
</para>
<para>
Here <literal>pg</literal> corresponds to a section in Barman's configuration file for a specific
server backup configuration, which would look something like:
<programlisting>
[pg]
description = "Main cluster"
...
</programlisting>
</para>
<para>
More details on Barman configuration can be found in the
<ulink url="https://docs.pgbarman.org/">Barman documentation</ulink>'s
<ulink url="https://docs.pgbarman.org/#configuration">configuration section</ulink>.
</para>
<note>
<para>
To use a non-default Barman configuration file on the Barman server,
specify this in <filename>repmgr.conf</filename> with <filename>barman_config</filename>:
<programlisting>
barman_config='/path/to/barman.conf'</programlisting>
</para>
</note>
<para>
We also recommend configuring the <varname>restore_command</varname> setting in <filename>repmgr.conf</filename>
to use the <command>barman-wal-restore</command> script
(see section <xref linkend="cloning-from-barman-restore-command"/> below).
</para>
<tip>
<simpara>
If you have a non-default SSH configuration on the Barman
server, e.g. using a port other than 22, then you can set those
parameters in a dedicated Host section in <filename>~/.ssh/config</filename>
corresponding to the value of <varname>barman_host</varname> in
<filename>repmgr.conf</filename>. See the <literal>Host</literal>
section in <command>man 5 ssh_config</command> for more details.
</simpara>
</tip>
<para>
If you wish to place WAL files in a location outside the main
PostgreSQL data directory, set <option>--waldir</option>
(PostgreSQL 9.6 and earlier: <option>--xlogdir</option>) in
<option>pg_basebackup_options</option> to the target directory
(must be an absolute filepath). &repmgr; will create and
symlink to this directory in exactly the same way
<application>pg_basebackup</application> would.
</para>
<para>
It's now possible to clone a standby from Barman, e.g.:
<programlisting>
$ repmgr -f /etc/repmgr.conf -h node1 -U repmgr -d repmgr standby clone
NOTICE: destination directory "/var/lib/postgresql/data" provided
INFO: connecting to Barman server to verify backup for "test_cluster"
INFO: checking and correcting permissions on existing directory "/var/lib/postgresql/data"
INFO: creating directory "/var/lib/postgresql/data/repmgr"...
INFO: connecting to Barman server to fetch server parameters
INFO: connecting to source node
DETAIL: current installation size is 30 MB
NOTICE: retrieving backup from Barman...
(...)
NOTICE: standby clone (from Barman) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /var/lib/postgresql/data start</programlisting>
</para>
<note>
<simpara>
Barman support is automatically enabled if <varname>barman_server</varname>
@@ -89,81 +184,155 @@
command line option.
</simpara>
</note>
<tip>
<simpara>
If you have a non-default SSH configuration on the Barman
server, e.g. using a port other than 22, then you can set those
parameters in a dedicated Host section in <filename>~/.ssh/config</filename>
corresponding to the value of<varname>barman_host</varname> in
<filename>repmgr.conf</filename>. See the <literal>Host</literal>
section in <command>man 5 ssh_config</command> for more details.
</simpara>
</tip>
<para>
It's now possible to clone a standby from Barman, e.g.:
<programlisting>
NOTICE: using configuration file "/etc/repmgr.conf"
NOTICE: destination directory "/var/lib/postgresql/data" provided
INFO: connecting to Barman server to verify backup for test_cluster
INFO: checking and correcting permissions on existing directory "/var/lib/postgresql/data"
INFO: creating directory "/var/lib/postgresql/data/repmgr"...
INFO: connecting to Barman server to fetch server parameters
INFO: connecting to upstream node
INFO: connected to source node, checking its state
INFO: successfully connected to source node
DETAIL: current installation size is 29 MB
NOTICE: retrieving backup from Barman...
receiving file list ...
(...)
NOTICE: standby clone (from Barman) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /var/lib/postgresql/data start</programlisting>
</para>
</sect2>
<sect2 id="cloning-from-barman-restore-command" xreflabel="Using Barman as a WAL file source">
<title>Using Barman as a WAL file source</title>
<indexterm>
<primary>Barman</primary>
<secondary>fetching archived WAL</secondary>
</indexterm>
<para>
As a fallback in case streaming replication is interrupted, PostgreSQL can optionally
retrieve WAL files from an archive, such as that provided by Barman. This is done by
setting <varname>restore_command</varname> in <filename>recovery.conf</filename> to
setting <varname>restore_command</varname> in the replication configuration to
a valid shell command which can retrieve a specified WAL file from the archive.
</para>
<para>
<command>barman-wal-restore</command> is a Python script provided as part of the <literal>barman-cli</literal>
package (Barman 2.0 and later; for Barman 1.x the script is provided separately as
<command>barman-wal-restore.py</command>) which performs this function for Barman.
package (Barman 2.0 ~ 2.7) or as part of the core Barman distribution (Barman 2.8 and later).
</para>
<para>
To use <command>barman-wal-restore</command> with &repmgr;
and assuming Barman is located on the <literal>barmansrv</literal> host
To use <command>barman-wal-restore</command> with &repmgr;,
assuming Barman is located on the host &quot;<literal>barmansrv</literal>&quot;
under the &quot;<literal>barman</literal>&quot; user account,
and that <command>barman-wal-restore</command> is located as an executable at
<filename>/usr/bin/barman-wal-restore</filename>,
<filename>repmgr.conf</filename> should include the following lines:
<programlisting>
barman_host=barmansrv
barman_server=somedb
restore_command=/usr/bin/barman-wal-restore barmansrv somedb %f %p</programlisting>
barman_host='barman@barmansrv'
barman_server='pg'
restore_command='/usr/bin/barman-wal-restore barmansrv pg %f %p'</programlisting>
</para>
<note>
<simpara>
<command>barman-wal-restore</command> supports command line switches to
control parallelism (<literal>--parallel=N</literal>) and compression (
<literal>--bzip2</literal>, <literal>--gzip</literal>).
control parallelism (<literal>--parallel=N</literal>) and compression
(<literal>--bzip2</literal>, <literal>--gzip</literal>).
</simpara>
</note>
<note>
<para>
To use a non-default Barman configuration file on the Barman server,
specify this in <filename>repmgr.conf</filename> with <filename>barman_config</filename>:
<programlisting>
barman_config=/path/to/barman.conf</programlisting>
</para>
</note>
</sect2>
<sect2 id="cloning-from-barman-pg_backupapi-mode" xreflabel="Using Barman through its API (pg-backup-api)">
<title>Using Barman through its API (pg-backup-api)</title>
<indexterm>
<primary>cloning</primary>
<secondary>pg-backup-api</secondary>
</indexterm>
<para>
You can find information on how to install and setup pg-backup-api in
<ulink url="https://www.enterprisedb.com/docs/supported-open-source/barman/pg-backup-api/">the pg-backup-api
documentation</ulink>.
</para>
<para>
This mode (`pg-backupapi`) was introduced in v5.4.0 as a way to further integrate with Barman letting Barman
handle the restore. This also reduces the ssh keys that need to share between the backup and postgres nodes.
As long as you have access to the API service by HTTP calls, you could perform recoveries right away.
You just need to instruct Barman through the API which backup you need and on which node the backup needs to
to be restored on.
</para>
<para>
In order to enable <literal>pg_backupapi mode</literal> support for <command>repmgr standby clone</command>,
you need the following lines in repmgr.conf:
<itemizedlist spacing="compact" mark="bullet">
<listitem><para>pg_backupapi_host: Where pg-backup-api is hosted</para></listitem>
<listitem><para>pg_backupapi_node_name: Name of the server as understood by Barman</para></listitem>
<listitem><para>pg_backupapi_remote_ssh_command: How Barman will be connecting as to the node</para></listitem>
<listitem><para>pg_backupapi_backup_id: ID of the existing backup you need to restore</para></listitem>
</itemizedlist>
This is an example of how repmgr.conf would look like:
<programlisting>
pg_backupapi_host = '192.168.122.154'
pg_backupapi_node_name = 'burrito'
pg_backupapi_remote_ssh_command = 'ssh john_doe@192.168.122.1'
pg_backupapi_backup_id = '20230223T093201'
</programlisting>
</para>
<para>
<literal>pg_backupapi_host</literal> is the variable name that enables this mode, and when you set it,
all the rest of the above variables are required. Also, remember that this service is just an interface
between Barman and repmgr, hence if something fails during a recovery, you should check Barman's logs upon
why the process couldn't finish properly.
</para>
<note>
<simpara>
Despite in Barman you can define shortcuts like "lastest" or "oldest", they are not supported for the
time being in pg-backup-api. These shortcuts will be supported in a future release.
</simpara>
</note>
<para>
This is a real example of repmgr's output cloning with the API. Note that during this operation, we stopped
the service for a little while and repmgr had to retry but that doesn't affect the final outcome. The primary
is listening on localhost's port 6001:
<programlisting>
$ repmgr -f ~/nodes/node_3/repmgr.conf standby clone -U repmgr -p 6001 -h localhost
NOTICE: destination directory "/home/mario/nodes/node_3/data" provided
INFO: Attempting to use `pg_backupapi` new restore mode
INFO: connecting to source node
DETAIL: connection string is: user=repmgr port=6001 host=localhost
DETAIL: current installation size is 8541 MB
DEBUG: 1 node records returned by source node
DEBUG: connecting to: "user=repmgr dbname=repmgr host=localhost port=6001 connect_timeout=2 fallback_application_name=repmgr options=-csearch_path="
DEBUG: upstream_node_id determined as 1
INFO: Attempting to use `pg_backupapi` new restore mode
INFO: replication slot usage not requested; no replication slot will be set up for this standby
NOTICE: starting backup (using pg_backupapi)...
INFO: Success creating the task: operation id '20230309T150647'
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
Incorrect reply received for that operation ID.
INFO: Retrying...
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status IN_PROGRESS
INFO: status DONE
NOTICE: standby clone (from pg_backupapi) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /home/mario/nodes/node_3/data start
HINT: after starting the server, you need to register this standby with "repmgr standby register"
</programlisting>
</para>
</sect2> <!--END cloning-from-barman-pg_backupapi-mode !-->
</sect1>
<sect1 id="cloning-replication-slots" xreflabel="Cloning and replication slots">
<sect1 id="cloning-replication-slots" xreflabel="Cloning and replication slots">
<title>Cloning and replication slots</title>
<indexterm>
<primary>cloning</primary>
<secondary>replication slots</secondary>
@@ -173,13 +342,13 @@
<primary>replication slots</primary>
<secondary>cloning</secondary>
</indexterm>
<title>Cloning and replication slots</title>
<para>
Replication slots were introduced with PostgreSQL 9.4 and are designed to ensure
that any standby connected to the primary using a replication slot will always
be able to retrieve the required WAL files. This removes the need to manually
manage WAL file retention by estimating the number of WAL files that need to
be maintained on the primary using <varname>wal_keep_segments</varname>.
be maintained on the primary using <varname>wal_keep_segments</varname>
(PostgreSQL 13 and later: <varname>wal_keep_size</varname>).
Do however be aware that if a standby is disconnected, WAL will continue to
accumulate on the primary until either the standby reconnects or the replication
slot is dropped.
@@ -188,11 +357,10 @@
To enable &repmgr; to use replication slots, set the boolean parameter
<varname>use_replication_slots</varname> in <filename>repmgr.conf</filename>:
<programlisting>
use_replication_slots=true
</programlisting>
use_replication_slots=true</programlisting>
</para>
<para>
Replication slots must be enabled in <filename>postgresql.conf</filename>` by
Replication slots must be enabled in <filename>postgresql.conf</filename> by
setting the parameter <varname>max_replication_slots</varname> to at least the
number of expected standbys (changes to this parameter require a server restart).
</para>
@@ -234,41 +402,43 @@
build up indefinitely, possibly leading to server failure.
</simpara>
<simpara>
As an alternative we recommend using 2ndQuadrant's <ulink url="https://www.pgbarman.org/">Barman</ulink>,
which offloads WAL management to a separate server, negating the need to use replication
slots to reserve WAL. See section <xref linkend="cloning-from-barman">
As an alternative we recommend using EDB's <ulink url="https://www.pgbarman.org/">Barman</ulink>,
which offloads WAL management to a separate server, removing the requirement to use a replication
slot for each individual standby to reserve WAL. See section <xref linkend="cloning-from-barman"/>
for more details on using &repmgr; together with Barman.
</simpara>
</tip>
</sect1>
<sect1 id="cloning-cascading" xreflabel="Cloning and cascading replication">
<title>Cloning and cascading replication</title>
<indexterm>
<primary>cloning</primary>
<secondary>cascading replication</secondary>
</indexterm>
<title>Cloning and cascading replication</title>
<para>
Cascading replication, introduced with PostgreSQL 9.2, enables a standby server
to replicate from another standby server rather than directly from the primary,
meaning replication changes "cascade" down through a hierarchy of servers. This
can be used to reduce load on the primary and minimize bandwith usage between
can be used to reduce load on the primary and minimize bandwidth usage between
sites. For more details, see the
<ulink url="https://www.postgresql.org/docs/current/static/warm-standby.html#CASCADING-REPLICATION">
<ulink url="https://www.postgresql.org/docs/current/warm-standby.html#CASCADING-REPLICATION">
PostgreSQL cascading replication documentation</ulink>.
</para>
<para>
&repmgr; supports cascading replication. When cloning a standby,
set the command-line parameter <literal>--upstream-node-id</literal> to the
<varname>node_id</varname> of the server the standby should connect to, and
&repmgr; will create <filename>recovery.conf</filename> to point to it. Note
&repmgr; will create a replication configuration file to point to it. Note
that if <literal>--upstream-node-id</literal> is not explicitly provided,
&repmgr; will set the standby's <filename>recovery.conf</filename> to
&repmgr; will set the standby's replication configuration to
point to the primary node.
</para>
<para>
To demonstrate cascading replication, first ensure you have a primary and standby
set up as shown in the <xref linkend="quickstart">.
set up as shown in the <xref linkend="quickstart"/>.
Then create an additional standby server with <filename>repmgr.conf</filename> looking
like this:
<programlisting>
@@ -324,63 +494,113 @@
cluster, you may wish to clone a downstream standby whose upstream node
does not yet exist. In this case you can clone from the primary (or
another upstream node); provide the parameter <literal>--upstream-conninfo</literal>
to explictly set the upstream's <varname>primary_conninfo</varname> string
in <filename>recovery.conf</filename>.
to explicitly set the upstream's <varname>primary_conninfo</varname> string
in the replication configuration.
</simpara>
</tip>
</sect1>
<sect1 id="cloning-advanced" xreflabel="Advanced cloning options">
<title>Advanced cloning options</title>
<indexterm>
<primary>cloning</primary>
<secondary>advanced options</secondary>
</indexterm>
<title>Advanced cloning options</title>
<sect2 id="cloning-advanced-pg-basebackup-options" xreflabel="pg_basebackup options when cloning a standby">
<title>pg_basebackup options when cloning a standby</title>
<para>
As &repmgr; uses <command>pg_basebackup</command> to clone a standby, it's possible to
provide additional parameters for <command>pg_basebackup</command> to customise the
cloning process.
</para>
<para>
By default, <command>pg_basebackup</command> performs a checkpoint before beginning the backup
process. However, a normal checkpoint may take some time to complete;
a fast checkpoint can be forced with the <literal>-c/--fast-checkpoint</literal> option.
However this may impact performance of the server being cloned from
a fast checkpoint can be forced with <command><link linkend="repmgr-standby-clone">repmgr standby clone</link></command>'s
<literal>-c/--fast-checkpoint</literal> option.
Note that this may impact performance of the server being cloned from (typically the primary)
so should be used with care.
</para>
<tip>
<simpara>
If <application>Barman</application> is set up for the cluster, it's possible to
clone the standby directly from Barman, without any impact on the server the standby
is being cloned from. For more details see <xref linkend="cloning-from-barman"/>.
</simpara>
</tip>
<para>
Further options can be passed to the <command>pg_basebackup</command> utility via
the setting <varname>pg_basebackup_options</varname> in <filename>repmgr.conf</filename>.
See the <ulink url="https://www.postgresql.org/docs/current/static/app-pgbasebackup.html">PostgreSQL pg_basebackup documentation</ulink>
Other options can be passed to <command>pg_basebackup</command> by including them
in the <filename>repmgr.conf</filename> setting <varname>pg_basebackup_options</varname>.
</para>
<para>
Not that by default, &repmgr; executes <command>pg_basebackup</command> with <option>-X/--wal-method</option>
(PostgreSQL 9.6 and earlier: <option>-X/--xlog-method</option>) set to <literal>stream</literal>.
From PostgreSQL 9.6, if replication slots are in use, it will also create a replication slot before
running the base backup, and execute <command>pg_basebackup</command> with the
<option>-S/--slot</option> option set to the name of the previously created replication slot.
</para>
<para>
These parameters can set by the user in <varname>pg_basebackup_options</varname>, in which case they
will override the &repmgr; default values. However normally there's no reason to do this.
</para>
<para>
If using a separate directory to store WAL files, provide the option <literal>--waldir</literal>
(<literal>--xlogdir</literal> in PostgreSQL 9.6 and earlier) with the absolute path to the
WAL directory. Any WALs generated during the cloning process will be copied here, and
a symlink will automatically be created from the main data directory.
</para>
<tip>
<para>
The <literal>--waldir</literal> (<literal>--xlogdir</literal>) option,
if present in <varname>pg_basebackup_options</varname>, will be honoured by &repmgr;
when cloning from Barman (&repmgr; 5.2 and later).
</para>
</tip>
<para>
See the <ulink url="https://www.postgresql.org/docs/current/app-pgbasebackup.html">PostgreSQL pg_basebackup documentation</ulink>
for more details of available options.
</para>
</sect2>
<sect2 id="cloning-advanced-managing-passwords" xreflabel="Managing passwords">
<title>Managing passwords</title>
<indexterm>
<primary>cloning</primary>
<secondary>using passwords</secondary>
</indexterm>
<para>
If replication connections to a standby's upstream server are password-protected,
the standby must be able to provide the password so it can begin streaming
replication.
the standby must be able to provide the password so it can begin streaming replication.
</para>
<para>
The recommended way to do this is to store the password in the <literal>postgres</literal> system
user's <filename>~/.pgpass</filename> file. It's also possible to store the password in the
environment variable <varname>PGPASSWORD</varname>, however this is not recommended for
security reasons. For more details see the
<ulink url="https://www.postgresql.org/docs/current/static/libpq-pgpass.html">PostgreSQL password file documentation</ulink>.
user's <filename>~/.pgpass</filename> file. For more information on using the password file, see
the documentation section <xref linkend="configuration-password-file"/>.
</para>
<note>
<para>
If using a <filename>pgpass</filename> file, an entry for the replication user (by default the
user who connects to the <literal>repmgr</literal> database) <emphasis>must</emphasis>
be provided, with database name set to <literal>replication</literal>, e.g.:
<programlisting>
node1:5432:replication:repmgr:12345</programlisting>
</para>
</note>
<para>
If, for whatever reason, you wish to include the password in <filename>recovery.conf</filename>,
If, for whatever reason, you wish to include the password in the replication configuration file,
set <varname>use_primary_conninfo_password</varname> to <literal>true</literal> in
<filename>repmgr.conf</filename>. This will read a password set in <varname>PGPASSWORD</varname>
(but not <filename>~/.pgpass</filename>) and place it into the <literal>primary_conninfo</literal>
string in <filename>recovery.conf</filename>. Note that <varname>PGPASSWORD</varname>
will need to be set during any action which causes <filename>recovery.conf</filename> to be
rewritten, e.g. <xref linkend="repmgr-standby-follow">.
</para>
<para>
It is of course also possible to include the password value in the <varname>conninfo</varname>
string for each node, but this is obviously a security risk and should be
avoided.
(but not <filename>~/.pgpass</filename>) and place it into the <varname>primary_conninfo</varname>
string in the replication configuration. Note that <varname>PGPASSWORD</varname>
will need to be set during any action which causes the replication configuration file to be
rewritten, e.g. <xref linkend="repmgr-standby-follow"/>.
</para>
</sect2>
@@ -391,12 +611,40 @@
user (in addition to the user who manages the &repmgr; metadata). In this case,
the replication user should be set in <filename>repmgr.conf</filename> via the parameter
<varname>replication_user</varname>; &repmgr; will use this value when making
replication connections and generating <filename>recovery.conf</filename>. This
value will also be stored in the <literal>repmgr.nodes</literal>
replication connections and generating the replication configuration. This
value will also be stored in the parameter <literal>repmgr.nodes</literal>
table for each node; it no longer needs to be explicitly specified when
cloning a node or executing <xref linkend="repmgr-standby-follow">.
cloning a node or executing <xref linkend="repmgr-standby-follow"/>.
</para>
</sect2>
<sect2 id="cloning-advanced-tablespace-mapping" xreflabel="Tablespace mapping">
<title>Tablespace mapping</title>
<indexterm>
<primary>tablespace mapping</primary>
</indexterm>
<para>
&repmgr; provides a <option>tablespace_mapping</option> configuration
file option, which will makes it possible to map the tablespace on the source node to
a different location on the local node.
</para>
<para>
To use this, add <option>tablespace_mapping</option> to <filename>repmgr.conf</filename>
like this:
<programlisting>
tablespace_mapping='/var/lib/pgsql/tblspc1=/data/pgsql/tblspc1'
</programlisting>
</para>
<para>
where the left-hand value represents the tablespace on the source node,
and the right-hand value represents the tablespace on the standby to be cloned.
</para>
<para>
This parameter can be provided multiple times.
</para>
</sect2>
</sect1>

View File

@@ -0,0 +1,107 @@
<sect1 id="configuration-file-log-settings" xreflabel="log settings">
<title>Log settings</title>
<indexterm>
<primary>repmgr.conf</primary>
<secondary>log settings</secondary>
</indexterm>
<indexterm>
<primary>log settings</primary>
<secondary>configuration in repmgr.conf</secondary>
</indexterm>
<para>
By default, &repmgr; and &repmgrd; write log output to
<literal>STDERR</literal>. An alternative log destination can be specified
(either a file or <literal>syslog</literal>).
</para>
<note>
<para>
The &repmgr; application itself will continue to write log output to <literal>STDERR</literal>
even if another log destination is configured, as otherwise any output resulting from a command
line operation will "disappear" into the log.
</para>
<para>
This behaviour can be overriden with the command line option <option>--log-to-file</option>,
which will redirect all logging output to the configured log destination. This is recommended
when &repmgr; is executed by another application, particularly &repmgrd;,
to enable log output generated by the &repmgr; application to be stored for later reference.
</para>
</note>
<variablelist>
<varlistentry id="repmgr-conf-log-level" xreflabel="log_level">
<term><varname>log_level</varname> (<type>string</type>)</term>
<listitem>
<indexterm>
<primary><varname>log_level</varname> configuration file parameter</primary>
</indexterm>
<para>
One of <option>DEBUG</option>, <option>INFO</option>, <option>NOTICE</option>,
<option>WARNING</option>, <option>ERROR</option>, <option>ALERT</option>, <option>CRIT</option>
or <option>EMERG</option>.
</para>
<para>
Default is <option>INFO</option>.
</para>
<para>
Note that <option>DEBUG</option> will produce a substantial amount of log output
and should not be enabled in normal use.
</para>
</listitem>
</varlistentry>
<varlistentry id="repmgr-conf-log-facility" xreflabel="log_facility">
<term><varname>log_facility</varname> (<type>string</type>)
<indexterm>
<primary><varname>log_facility</varname> configuration file parameter</primary>
</indexterm>
</term>
<listitem>
<para>
Logging facility: possible values are <option>STDERR</option> (default), or for
syslog integration, one of <option>LOCAL0</option>, <option>LOCAL1</option>, <option>...</option>,
<option>LOCAL7</option>, <option>USER</option>.
</para>
</listitem>
</varlistentry>
<varlistentry id="repmgr-conf-log-file" xreflabel="log_file">
<term><varname>log_file</varname> (<type>string</type>)
<indexterm>
<primary><varname>log_file</varname> configuration file parameter</primary>
</indexterm>
</term>
<listitem>
<para>
If <xref linkend="repmgr-conf-log-facility"/> is set to <option>STDERR</option>, log output
can be redirected to the specified file.
</para>
<para>
See <xref linkend="repmgrd-log-rotation"/> for information on configuring log rotation.
</para>
</listitem>
</varlistentry>
<varlistentry id="repmgr-conf-log-status-interval" xreflabel="log_status_interval">
<term><varname>log_status_interval</varname> (<type>integer</type>)
<indexterm>
<primary><varname>log_status_interval</varname> configuration file parameter</primary>
</indexterm>
</term>
<listitem>
<para>
This setting causes &repmgrd; to emit a status log
line at the specified interval (in seconds, default <literal>300</literal>)
describing &repmgrd;'s current state, e.g.:
</para>
<programlisting>
[2018-07-12 00:47:32] [INFO] monitoring connection to upstream node "node1" (ID: 1)</programlisting>
</listitem>
</varlistentry>
</variablelist>
</sect1>

View File

@@ -0,0 +1,189 @@
<sect1 id="configuration-file-optional-settings" xreflabel="optional configuration file settings">
<title>Optional configuration file settings</title>
<indexterm>
<primary>repmgr.conf</primary>
<secondary>optional settings</secondary>
</indexterm>
<note>
<simpara>
This section documents a subset of optional configuration settings; for a full
and annotated view of all configuration options see the
<ulink url="https://raw.githubusercontent.com/EnterpriseDB/repmgr/master/repmgr.conf.sample">sample repmgr.conf file</ulink>
</simpara>
</note>
<variablelist>
<varlistentry id="repmgr-conf-config-directory" xreflabel="config_directory">
<term><varname>config_directory</varname> (<type>string</type>)
<indexterm>
<primary><varname>config_directory</varname> configuration file parameter</primary>
</indexterm>
</term>
<listitem>
<para>
If PostgreSQL configuration files are located outside the data
directory, specify the directory where the main
<filename>postgresql.conf</filename> file is located.
</para>
<para>
This enables explicit provision of an external configuration file
directory, which if set will be passed to <command>pg_ctl</command> as the
<option>-D</option> parameter. Otherwise <command>pg_ctl</command> will
default to using the data directory, which will cause some operations
to fail if the configuration files are not present there.
</para>
<note>
<para>
This is implemented primarily for feature completeness and for
development/testing purposes. Users who have installed &repmgr; from
a package should <emphasis>not</emphasis> rely on to stop/start/restart PostgreSQL,
instead they should set the appropriate <option>service_..._command</option>
for their operating system. For more details see
<xref linkend="configuration-file-service-commands"/>.
</para>
</note>
</listitem>
</varlistentry>
<varlistentry id="repmgr-conf-replication-user" xreflabel="replication_user">
<term><varname>replication_user</varname> (<type>string</type>)
<indexterm>
<primary><varname>replication_user</varname> configuration file parameter</primary>
</indexterm>
</term>
<listitem>
<para>
PostgreSQL user to make replication connections with.
If not set defaults, to the user defined in <xref linkend="repmgr-conf-conninfo"/>.
</para>
</listitem>
</varlistentry>
<varlistentry id="repmgr-conf-replication-type" xreflabel="replication_type">
<term><varname>replication_type</varname> (<type>string</type>)
<indexterm>
<primary><varname>replication_type</varname> configuration file parameter</primary>
</indexterm>
</term>
<listitem>
<para>
Must be <literal>physical</literal> (the default).
</para>
</listitem>
</varlistentry>
<varlistentry id="repmgr-conf-location" xreflabel="location">
<term><varname>location</varname> (<type>string</type>)
<indexterm>
<primary><varname>location</varname> configuration file parameter</primary>
</indexterm>
</term>
<listitem>
<para>
An arbitrary string defining the location of the node; this
is used during failover to check visibility of the
current primary node.
</para>
<para>
For more details see <xref linkend="repmgrd-network-split"/>.
</para>
</listitem>
</varlistentry>
<varlistentry id="repmgr-conf-use-replication-slots" xreflabel="use_replication_slots">
<term><varname>use_replication_slots</varname> (<type>boolean</type>)
<indexterm>
<primary><varname>use_replication_slots</varname> configuration file parameter</primary>
</indexterm>
</term>
<listitem>
<para>
Whether to use physical replication slots.
</para>
<note>
<para>
When using replication slots,
<varname>max_replication_slots</varname> should be configured for
at least the number of standbys which will connect
to the primary.
</para>
</note>
</listitem>
</varlistentry>
<varlistentry id="repmgr-conf-ssh-options" xreflabel="ssh_options">
<term><varname>ssh_options</varname> (<type>string</type>)
<indexterm>
<primary><varname>ssh_options</varname> configuration file parameter</primary>
</indexterm>
</term>
<listitem>
<para>
Options to append to the <command>ssh</command> command when executed
by &repmgr;.
</para>
<para>
We recommend adding <literal>-q</literal> to suppress any superfluous
SSH chatter such as login banners, and also an explicit
<option>ConnectTimeout</option> value,
e.g.:
<programlisting>
ssh_options='-q -o ConnectTimeout=10'</programlisting>
</para>
</listitem>
</varlistentry>
<varlistentry id="repmgr-conf-pg-bindir" xreflabel="pg_bindir">
<term><varname>pg_bindir</varname> (<type>string</type>)
<indexterm>
<primary><varname>pg_bindir</varname> configuration file parameter</primary>
</indexterm>
</term>
<listitem>
<para>
Path to the PostgreSQL binary directory (location of <application>pg_ctl</application>,
<application>pg_basebackup</application> etc.). Only required
if these are not in the system <varname>PATH</varname>.
</para>
<tip>
<para>
When &repmgr; is executed via <application>SSH</application> (e.g. when running
<command><link linkend="repmgr-standby-switchover">repmgr standby switchover</link></command>,
<command><link linkend="repmgr-cluster-matrix">repmgr cluster matrix</link></command> or
<command><link linkend="repmgr-cluster-crosscheck">repmgr cluster crosscheck</link></command>,
or if it is executed as cronjob), a login shell will not be used and only the
default system <varname>PATH</varname> will be set. Therefore it's recommended to set
<varname>pg_bindir</varname> so &repmgr; can correctly invoke binaries on a remote
system and avoid potential path issues.
</para>
</tip>
<para>
Debian/Ubuntu users: you will probably need to set this to the directory where
<application>pg_ctl</application> is located, e.g. <filename>/usr/lib/postgresql/9.6/bin/</filename>.
</para>
<para>
<emphasis>NOTE</emphasis>: <varname>pg_bindir</varname> is only used when &repmgr; directly
executes PostgreSQL binaries; any user-defined scripts
<emphasis>must</emphasis> be specified with the full path.
</para>
</listitem>
</varlistentry>
</variablelist>
<tip>
<simpara>
See the <ulink url="https://raw.githubusercontent.com/EnterpriseDB/repmgr/master/repmgr.conf.sample">sample repmgr.conf file</ulink>
for a full and annotated view of all configuration options.
</simpara>
</tip>
</sect1>

View File

@@ -1,5 +1,12 @@
<sect1 id="configuration-file-settings" xreflabel="configuration file settings">
<title>Configuration file settings</title>
<sect1 id="configuration-file-settings" xreflabel="required configuration file settings">
<title>Required configuration file settings</title>
<indexterm>
<primary>repmgr.conf</primary>
<secondary>required settings</secondary>
</indexterm>
<para>
Each <filename>repmgr.conf</filename> file must contain the following parameters:
</para>
@@ -34,6 +41,10 @@
called <varname>standby1</varname> (for example), things will be confusing
to say the least.
</para>
<para>
The string's maximum length is 63 characters and it should
contain only printable ASCII characters.
</para>
</listitem>
</varlistentry>
@@ -51,7 +62,7 @@
</para>
<para>
For details on conninfo strings, see section <ulink
url="https://www.postgresql.org/docs/current/static/libpq-connect.html#LIBPQ-CONNSTRING">Connection Strings</>
url="https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING">Connection Strings</ulink>
in the PosgreSQL documentation.
</para>
<para>
@@ -59,19 +70,19 @@
<varname>connect_timeout</varname> in the <varname>conninfo</varname>
string to determine the length of time which elapses before a network
connection attempt is abandoned; for details see <ulink
url="https://www.postgresql.org/docs/current/static/libpq-connect.html#LIBPQ-CONNECT-CONNECT-TIMEOUT">
the PostgreSQL documentation</>.
url="https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNECT-CONNECT-TIMEOUT">
the PostgreSQL documentation</ulink>.
</para>
</listitem>
</varlistentry>
<varlistentry id="repmgr-conf-data-directory" xreflabel="data_directory">
<term><varname>data_directory</varname> (<type>string</type>)
<indexterm>
<primary><varname>data_directory</varname> configuration file parameter</primary>
</indexterm>
</term>
<term><varname>data_directory</varname> (<type>string</type>)</term>
<listitem>
<indexterm>
<primary><varname>data_directory</varname> configuration file parameter</primary>
</indexterm>
<para>
The node's data directory. This is needed by repmgr
when performing operations when the PostgreSQL instance
@@ -85,30 +96,9 @@
</variablelist>
</para>
<para>
For a full list of annotated configuration items, see the file
<ulink url="https://raw.githubusercontent.com/2ndQuadrant/repmgr/master/repmgr.conf.sample">repmgr.conf.sample</>.
</para>
<para>
See <xref linkend="configuration-file-optional-settings"/> for further configuration options.
</para>
<note>
<para>
The following parameters in the configuration file can be overridden with
command line options:
<itemizedlist>
<listitem>
<simpara>
<literal>-L/--log-level</literal> overrides <literal>log_level</literal> in
<filename>repmgr.conf</filename>
</simpara>
</listitem>
<listitem>
<simpara>
<literal>-b/--pg_bindir</literal> overrides <literal>pg_bindir</literal> in
<filename>repmgr.conf</filename>
</simpara>
</listitem>
</itemizedlist>
</para>
</note>
</sect1>

View File

@@ -0,0 +1,133 @@
<sect1 id="configuration-file-service-commands" xreflabel="service command settings">
<title>Service command settings</title>
<indexterm>
<primary>repmgr.conf</primary>
<secondary>service command settings</secondary>
</indexterm>
<indexterm>
<primary>service command settings</primary>
<secondary>configuration in repmgr.conf</secondary>
</indexterm>
<para>
In some circumstances, &repmgr; (and &repmgrd;) need to
be able to stop, start or restart PostgreSQL. &repmgr; commands which need to do this
include <link linkend="repmgr-standby-follow"><command>repmgr standby follow</command></link>,
<link linkend="repmgr-standby-switchover"><command>repmgr standby switchover</command></link> and
<link linkend="repmgr-node-rejoin"><command>repmgr node rejoin</command></link>.
</para>
<para>
By default, &repmgr; will use PostgreSQL's <command>pg_ctl</command> utility to control the PostgreSQL
server. However this can lead to various problems, particularly when PostgreSQL has been
installed from packages, and especially so if <application>systemd</application> is in use.
</para>
<note>
<para>
If using <application>systemd</application>, ensure you have <varname>RemoveIPC</varname> set to <literal>off</literal>.
See the <ulink url="https://www.postgresql.org/docs/current/index.html">PostgreSQL documentation</ulink> section
<ulink url="https://www.postgresql.org/docs/current/kernel-resources.html#SYSTEMD-REMOVEIPC">systemd RemoveIPC</ulink>
and also the <ulink url="https://wiki.postgresql.org/wiki/Systemd">systemd</ulink>
entry in the <ulink url="https://wiki.postgresql.org/wiki/Main_Page">PostgreSQL wiki</ulink> for details.
</para>
</note>
<para>
With this in mind, we recommend to <emphasis>always</emphasis> configure &repmgr; to use the
available system service commands.
</para>
<para>
To do this, specify the appropriate command for each action
in <filename>repmgr.conf</filename> using the following configuration
parameters:
<programlisting>
service_start_command
service_stop_command
service_restart_command
service_reload_command</programlisting>
</para>
<note>
<para>
&repmgr; will not apply <option>pg_bindir</option> when executing any of these commands;
these can be user-defined scripts so must always be specified with the full path.
</para>
</note>
<note>
<para>
It's also possible to specify a <varname>service_promote_command</varname>.
This is intended for systems which provide a package-level promote command,
such as Debian's <application>pg_ctlcluster</application>, to promote the
PostgreSQL from standby to primary.
</para>
<para>
If your packaging system does not provide such a command, it can be left empty,
and &repmgr; will generate the appropriate `pg_ctl ... promote` command.
</para>
<para>
Do not confuse this with <varname>promote_command</varname>, which is used
by &repmgrd; to execute <xref linkend="repmgr-standby-promote"/>.
</para>
</note>
<para>
To confirm which command &repmgr; will execute for each action, use
<command><link linkend="repmgr-node-service">repmgr node service --list-actions --action=...</link></command>, e.g.:
<programlisting>
repmgr -f /etc/repmgr.conf node service --list-actions --action=stop
repmgr -f /etc/repmgr.conf node service --list-actions --action=start
repmgr -f /etc/repmgr.conf node service --list-actions --action=restart
repmgr -f /etc/repmgr.conf node service --list-actions --action=reload</programlisting>
</para>
<para>
These commands will be executed by the system user which &repmgr; runs as (usually <literal>postgres</literal>)
and will probably require passwordless sudo access to be able to execute the command.
</para>
<para>
For example, using <application>systemd</application> on CentOS 7, the service commands can be
set as follows:
<programlisting>
service_start_command = 'sudo systemctl start postgresql-9.6'
service_stop_command = 'sudo systemctl stop postgresql-9.6'
service_restart_command = 'sudo systemctl restart postgresql-9.6'
service_reload_command = 'sudo systemctl reload postgresql-9.6'</programlisting>
and <filename>/etc/sudoers</filename> should be set as follows:
<programlisting>
Defaults:postgres !requiretty
postgres ALL = NOPASSWD: /usr/bin/systemctl stop postgresql-9.6, \
/usr/bin/systemctl start postgresql-9.6, \
/usr/bin/systemctl restart postgresql-9.6, \
/usr/bin/systemctl reload postgresql-9.6</programlisting>
</para>
<important>
<indexterm>
<primary>pg_ctlcluster</primary>
<secondary>service command settings</secondary>
</indexterm>
<para>
Debian/Ubuntu users: instead of calling <command>sudo systemctl</command> directly, use
<command>sudo pg_ctlcluster</command>, e.g.:
<programlisting>
service_start_command = 'sudo pg_ctlcluster 9.6 main start'
service_stop_command = 'sudo pg_ctlcluster 9.6 main stop'
service_restart_command = 'sudo pg_ctlcluster 9.6 main restart'
service_reload_command = 'sudo pg_ctlcluster 9.6 main reload'</programlisting>
and set <filename>/etc/sudoers</filename> accordingly.
</para>
<para>
While <command>pg_ctlcluster</command> will work when executed as user <literal>postgres</literal>,
it's strongly recommended to use <command>sudo pg_ctlcluster</command> on <application>systemd</application>
systems, to ensure <application>systemd</application> has a correct picture of
the PostgreSQL application state.
</para>
</important>
</sect1>

View File

@@ -1,46 +0,0 @@
<sect1 id="configuration-file" xreflabel="configuration file location">
<title>Configuration file location</title>
<para>
<application>repmgr</application> and <application>repmgrd</application>
use a common configuration file, by default called
<filename>repmgr.conf</filename> (although any name can be used if explicitly specified).
<filename>repmgr.conf</filename> must contain a number of required parameters, including
the database connection string for the local node and the location
of its data directory; other values will be inferred from defaults if
not explicitly supplied. See section `configuration file parameters`
for more details.
</para>
<para>
The configuration file will be searched for in the following locations:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<para>a configuration file specified by the `-f/--config-file` command line option</para>
</listitem>
<listitem>
<para>
a location specified by the package maintainer (if <application>repmgr</application>
as installed from a package and the package maintainer has specified the configuration
file location)
</para>
</listitem>
<listitem>
<para><filename>repmgr.conf</filename> in the local directory</para>
</listitem>
<listitem>
<para><filename>/etc/repmgr.conf</filename></para>
</listitem>
<listitem>
<para>the directory reported by <application>pg_config --sysconfdir</application></para>
</listitem>
</itemizedlist>
</para>
<para>
Note that if a file is explicitly specified with <application>-f/--config-file</application>,
an error will be raised if it is not found or not readable and no attempt will be made to
check default locations; this is to prevent <application>repmgr</application> unexpectedly
reading the wrong file.
</para>
</sect1>

315
doc/configuration-file.xml Normal file
View File

@@ -0,0 +1,315 @@
<sect1 id="configuration-file" xreflabel="configuration file">
<title>Configuration file</title>
<indexterm>
<primary>repmgr.conf</primary>
</indexterm>
<indexterm>
<primary>configuration</primary>
<secondary>repmgr.conf</secondary>
</indexterm>
<para>
<application>repmgr</application> and &repmgrd;
use a common configuration file, by default called
<filename>repmgr.conf</filename> (although any name can be used if explicitly specified).
<filename>repmgr.conf</filename> must contain a number of required parameters, including
the database connection string for the local node and the location
of its data directory; other values will be inferred from defaults if
not explicitly supplied. See section <xref linkend="configuration-file-settings"/>
for more details.
</para>
<sect2 id="configuration-file-format" xreflabel="configuration file format">
<title>Configuration file format</title>
<indexterm>
<primary>repmgr.conf</primary>
<secondary>format</secondary>
</indexterm>
<para>
<filename>repmgr.conf</filename> is a plain text file with one parameter/value
combination per line.
</para>
<para>
Whitespace is insignificant (except within a quoted parameter value) and blank lines are ignored.
Hash marks (<literal>#</literal>) designate the remainder of the line as a comment.
Parameter values that are not simple identifiers or numbers should be single-quoted.
</para>
<para>
To embed a single quote in a parameter value, write either two quotes (preferred) or backslash-quote.
</para>
<para>
Example of a valid <filename>repmgr.conf</filename> file:
<programlisting>
# repmgr.conf
node_id=1
node_name= node1
conninfo ='host=node1 dbname=repmgr user=repmgr connect_timeout=2'
data_directory = '/var/lib/pgsql/12/data'</programlisting>
</para>
<note>
<para>
Beginning with <link linkend="release-5.0">repmgr 5.0</link>, configuration
file parsing has been tightened up and now matches the way PostgreSQL
itself parses configuration files.
</para>
<para>
This means <filename>repmgr.conf</filename> files used with earlier &repmgr;
versions may need slight modification before they can be used with &repmgr; 5
and later.
</para>
<para>
The main change is that &repmgr; requires most string values to be
enclosed in single quotes. For example, this was previously valid:
<programlisting>
conninfo=host=node1 user=repmgr dbname=repmgr connect_timeout=2</programlisting>
but must now be changed to:
<programlisting>
conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'</programlisting>
</para>
</note>
<sect3 id="configuration-file-include-directives" xreflabel="configuration file include directives">
<title>Configuration file include directives</title>
<indexterm>
<primary>repmgr.conf</primary>
<secondary>include directives</secondary>
</indexterm>
<para>
From &repmgr; 5.2, the configuration file can contain the following include directives:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<option>include</option>: include the specified file,
either as an absolute path or path relative to the current file
</simpara>
</listitem>
<listitem>
<simpara>
<option>include_if_exists</option>: include the specified file.
The file is specified as an absolute path or path relative to the current file.
However, if it does not exist, an error will not be raised.
</simpara>
</listitem>
<listitem>
<simpara>
<option>include_dir</option>: include files in the specified directory
which have the <filename>.conf</filename> suffix.
The directory is specified either as an absolute path or path
relative to the current file
</simpara>
</listitem>
</itemizedlist>
</para>
<para>
These behave in exactly the same way as the PostgreSQL configuration file processing;
see the <ulink url="https://www.postgresql.org/docs/current/config-setting.html#CONFIG-INCLUDES">PostgreSQL documentation</ulink>
for additional details.
</para>
</sect3>
</sect2>
<sect2 id="configuration-file-items" xreflabel="configuration file items">
<title>Configuration file items</title>
<para>
The following sections document some sections of the configuration file:
<itemizedlist>
<listitem>
<simpara>
<xref linkend="configuration-file-settings"/>
</simpara>
</listitem>
<listitem>
<simpara>
<xref linkend="configuration-file-optional-settings"/>
</simpara>
</listitem>
<listitem>
<simpara>
<xref linkend="configuration-file-log-settings"/>
</simpara>
</listitem>
<listitem>
<simpara>
<xref linkend="configuration-file-service-commands"/>
</simpara>
</listitem>
</itemizedlist>
</para>
<para>
For a full list of annotated configuration items, see the file
<ulink url="https://raw.githubusercontent.com/EnterpriseDB/repmgr/master/repmgr.conf.sample">repmgr.conf.sample</ulink>.
</para>
<para>
For &repmgrd;-specific settings, see <xref linkend="repmgrd-configuration"/>.
</para>
<note>
<para>
The following parameters in the configuration file can be overridden with
command line options:
<itemizedlist>
<listitem>
<simpara>
<literal>-L/--log-level</literal> overrides <literal>log_level</literal> in
<filename>repmgr.conf</filename>
</simpara>
</listitem>
<listitem>
<simpara>
<literal>-b/--pg_bindir</literal> overrides <literal>pg_bindir</literal> in
<filename>repmgr.conf</filename>
</simpara>
</listitem>
</itemizedlist>
</para>
</note>
</sect2>
<sect2 id="configuration-file-location" xreflabel="configuration file location">
<title>Configuration file location</title>
<indexterm>
<primary>repmgr.conf</primary>
<secondary>location</secondary>
</indexterm>
<para>
The configuration file will be searched for in the following locations:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<para>a configuration file specified by the <literal>-f/--config-file</literal> command line option</para>
</listitem>
<listitem>
<para>
a location specified by the package maintainer (if <application>repmgr</application>
as installed from a package and the package maintainer has specified the configuration
file location)
</para>
</listitem>
<listitem>
<para><filename>repmgr.conf</filename> in the local directory</para>
</listitem>
<listitem>
<para><filename>/etc/repmgr.conf</filename></para>
</listitem>
<listitem>
<para>the directory reported by <application>pg_config --sysconfdir</application></para>
</listitem>
</itemizedlist>
</para>
<para>
In examples provided in this documentation, it is assumed the configuration file is located
at <filename>/etc/repmgr.conf</filename>. If &repmgr; is installed from a package, the
configuration file will probably be located at another location specified by the packager;
see appendix <xref linkend="appendix-packages"/> for configuration file locations in
different packaging systems.
</para>
<para>
Note that if a file is explicitly specified with <literal>-f/--config-file</literal>,
an error will be raised if it is not found or not readable, and no attempt will be made to
check default locations; this is to prevent <application>repmgr</application> unexpectedly
reading the wrong configuration file.
</para>
<note>
<para>
If providing the configuration file location with <literal>-f/--config-file</literal>,
avoid using a relative path, particularly when executing <xref linkend="repmgr-primary-register"/>
and <xref linkend="repmgr-standby-register"/>, as &repmgr; stores the configuration file location
in the repmgr metadata for use when &repmgr; is executed remotely (e.g. during
<xref linkend="repmgr-standby-switchover"/>). &repmgr; will attempt to convert the
a relative path into an absolute one, but this may not be the same as the path you
would explicitly provide (e.g. <filename>./repmgr.conf</filename> might be converted
to <filename>/path/to/./repmgr.conf</filename>, whereas you'd normally write
<filename>/path/to/repmgr.conf</filename>).
</para>
</note>
</sect2>
<sect2 id="configuration-file-postgresql-major-upgrades" xreflabel="configuration file and PostgreSQL major version upgrades">
<title>Configuration file and PostgreSQL major version upgrades</title>
<indexterm>
<primary>repmgr.conf</primary>
<secondary>PostgreSQL major version upgrades</secondary>
</indexterm>
<para>
When upgrading the PostgreSQL cluster to a new major version, <filename>repmgr.conf</filename>
will probably needed to be updated.
</para>
<para>
Usually <option>pg_bindir</option> and <option>data_directory</option> will need to be modified,
particularly if the default package locations are used, as these usually change.
</para>
<para>
It's also possible the location of <filename>repmgr.conf</filename> itself will change
(e.g. from <filename>/etc/repmgr/11/repmgr.conf</filename> to <filename>/etc/repmgr/12/repmgr.conf</filename>).
This is stored as part of the &repmgr; metadata and is used by &repmgr; to execute &repmgr; remotely
(e.g. during a <link linkend="performing-switchover">switchover operation</link>).
</para>
<para>
If the content and/or location of <filename>repmgr.conf</filename> has changed, the &repmgr; metadata
needs to be updated to reflect this. The &repmgr; metadata can be updated on each node with:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<link linkend="repmgr-primary-register">
<command>repmgr primary register --force -f /path/to/repmgr.conf</command>
</link>
</simpara>
</listitem>
<listitem>
<simpara>
<link linkend="repmgr-standby-register">
<command>repmgr standby register --force -f /path/to/repmgr.conf</command>
</link>
</simpara>
</listitem>
<listitem>
<simpara>
<link linkend="repmgr-witness-register">
<command>repmgr witness register --force -f /path/to/repmgr.conf -h primary_host</command>
</link>
</simpara>
</listitem>
</itemizedlist>
</para>
</sect2>
</sect1>

View File

@@ -0,0 +1,175 @@
<sect1 id="configuration-password-management" xreflabel="password management">
<title>Password Management</title>
<indexterm>
<primary>passwords</primary>
</indexterm>
<sect2 id="configuration-password-management-options" xreflabel="password management options">
<title>Password Management Options</title>
<indexterm>
<primary>passwords</primary>
<secondary>options for managing</secondary>
</indexterm>
<para>
For security purposes it's desirable to protect database access using a password.
</para>
<para>
PostgreSQL has three ways of providing a password:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
including the password in the <option>conninfo</option> string
(e.g. &quot;<literal>host=node1 dbname=repmgr user=repmgr password=foo</literal>&quot;)
</simpara>
</listitem>
<listitem>
<simpara>
exporting the password as an environment variable (<envar>PGPASSWORD</envar>)
</simpara>
</listitem>
<listitem>
<simpara>
storing the password in a dedicated password file
</simpara>
</listitem>
</itemizedlist>
</para>
<para>
We strongly advise against including the password in the <option>conninfo</option> string, as
this will result in the database password being exposed in various places, including in the
<filename>repmgr.conf</filename> file, the <literal>repmgr.nodes</literal> table, any output
generated by &repmgr; which lists the node <option>conninfo</option> strings (e.g.
<link linkend="repmgr-cluster-show">repmgr cluster show</link>) and in the &repmgr; log file,
particularly at <option>log_level=DEBUG</option>.
</para>
<note>
<para>
Currently &repmgr; does not fully support use of the <option>password</option> option in the
<option>conninfo</option> string.
</para>
</note>
<para>
Exporting the password as an environment variable (<envar>PGPASSWORD</envar>) is considered
less insecure, but the PostgreSQL documentation explicitly recommends against doing this:
<blockquote>
<attribution><ulink url="https://www.postgresql.org/docs/current/libpq-envars.html">Environment Variables</ulink></attribution>
<para>
<envar>PGPASSWORD</envar> behaves the same as the <option>password</option>
connection parameter. Use of this environment variable
is not recommended for security reasons, as some operating systems
allow non-root users to see process environment variables via
<application>ps</application>; instead consider using a password file.
</para>
</blockquote>
</para>
<para>
The most secure option for managing passwords is to use a dedicated password file; see the following
section for more details.
</para>
</sect2>
<sect2 id="configuration-password-file" xreflabel="password file">
<title>Using a password file</title>
<indexterm>
<primary>pgpass</primary>
</indexterm>
<indexterm>
<primary>.pgpass</primary>
</indexterm>
<indexterm>
<primary>passwords</primary>
<secondary>using a password file</secondary>
</indexterm>
<para>
The most secure way of storing passwords is in a password file,
which by default is <filename>~/.pgpass</filename>. This file
can only be read by the system user who owns the file, and
PostgreSQL will refuse to use the file unless read/write
permissions are restricted to the file owner. The password(s)
contained in the file will not be directly accessed by
&repmgr; (or any other libpq-based client software such as <application>psql</application>).
</para>
<para>
For full details see the
<ulink url="https://www.postgresql.org/docs/current/libpq-pgpass.html">PostgreSQL password file documentation</ulink>.
</para>
<para>
For use with &repmgr;, the <filename>~/.pgpass</filename> must two entries for each
node in the replication cluster: one for the &repmgr; user who accesses the &repmgr; metadatabase,
and one for replication connections (regardless of whether a dedicated replication user is used).
The file must be present on each node in the replication cluster.
</para>
<para>
A <filename>~/.pgpass</filename> file for a 3-node cluster where the <literal>repmgr</literal> database user
is used for both for accessing the &repmgr; metadatabase and for replication connections would look like this:
<programlisting>
node1:5432:repmgr:repmgr:foo
node1:5432:replication:repmgr:foo
node2:5432:repmgr:repmgr:foo
node2:5432:replication:repmgr:foo
node3:5432:repmgr:repmgr:foo
node3:5432:replication:repmgr:foo</programlisting>
If a dedicated replication user (here: <literal>repluser</literal>) is in use, the file would look like this:
<programlisting>
node1:5432:repmgr:repmgr:foo
node1:5432:replication:repluser:foo
node2:5432:repmgr:repmgr:foo
node2:5432:replication:repluser:foo
node3:5432:repmgr:repmgr:foo
node3:5432:replication:repluser:foo</programlisting>
If you are planning to use the <option>-S</option>/<option>--superuser</option> option,
there must also be an entry enabling the superuser to connect to the &repmgr; database.
Assuming the superuser is <literal>postgres</literal>, the file would look like this:
<programlisting>
node1:5432:repmgr:repmgr:foo
node1:5432:repmgr:postgres:foo
node1:5432:replication:repluser:foo
node2:5432:repmgr:repmgr:foo
node2:5432:repmgr:postgres:foo
node2:5432:replication:repluser:foo
node3:5432:repmgr:repmgr:foo
node3:5432:repmgr:postgres:foo
node3:5432:replication:repluser:foo</programlisting>
</para>
<para>
The <filename>~/.pgpass</filename> file can be simplified with the use of wildcards if
there is no requirement to restrict provision of passwords to particular hosts, ports
or databases. The preceding file could then be formatted like this:
<programlisting>
*:*:*:repmgr:foo
*:*:*:postgres:foo
</programlisting>
</para>
<note>
<para>
It's possible to specify an alternative location for the <filename>~/.pgpass</filename> file, either via
the environment variable <envar>PGPASSFILE</envar>, or (from PostgreSQL 9.6) using the
<varname>passfile</varname> parameter in connection strings.
</para>
<para>
If using the <varname>passfile</varname> parameter, it's essential to ensure the file is in the same
location on all nodes, as when connecting to a remote node, the file referenced is the one on the
local node.
</para>
<para>
Additionally, you <emphasis>must</emphasis> specify the passfile location in <filename>repmgr.conf</filename>
with the <option>passfile</option> option so &repmgr; can write the correct path when creating the
<option>primary_conninfo</option> parameter for replication configuration on standbys.
</para>
</note>
</sect2>
</sect1>

View File

@@ -0,0 +1,200 @@
<sect1 id="configuration-permissions" xreflabel="Database user permissions">
<title>repmgr database user permissions</title>
<indexterm>
<primary>configuration</primary>
<secondary>database user permissions</secondary>
</indexterm>
<para>
If the &repmgr; database user (the PostgreSQL user defined in the
<varname>conninfo</varname> setting is a superuser, no further user permissions need
to be granted.
</para>
<sect2 id="configuration-permissions-no-superuser" xreflabel="Non-super user permissions">
<title>repmgr user as a non-superuser</title>
<para>
In principle the &repmgr; database user does not need to be a superuser.
In this case the &repmgr; will need to be granted execution permissions on certain
functions, and membership of certain roles. However be aware that &repmgr; does
expect to be able to execute certain commands which are restricted to superusers;
in this case either a superuser must be specified with the <option>-S</option>/<option>--superuser</option>
(where available) option, or the corresponding action should be executed manually as a superuser.
</para>
<para>
The following sections describe the actions needed to use &repmgr; with a non-superuser,
and relevant caveats.
</para>
<sect3 id="configuration-permissions-replication" xreflabel="Replication role">
<title>Replication role</title>
<para>
&repmgr; requires a database user with the <literal>REPLICATION</literal> role
to be able to create a replication connection and (if configured) to administer
replication slots.
</para>
<para>
By default this is the database user defined in the <varname>conninfo</varname>
setting. This user can be:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
a superuser
</simpara>
</listitem>
<listitem>
<simpara>
a non-superuser with the <literal>REPLICATION</literal> role
</simpara>
</listitem>
<listitem>
<simpara>
another user defined in the <filename>repmgr.conf</filename> parameter <varname>replication_user</varname> with the <literal>REPLICATION</literal> role
</simpara>
</listitem>
</itemizedlist>
</para>
</sect3>
<sect3 id="configuration-permissions-roles" xreflabel="Database roles for non-superusers">
<title>Database roles</title>
<para>
A non-superuser &repmgr; database user should be a member of the following
<ulink url="https://www.postgresql.org/docs/current/predefined-roles.html">predefined roles</ulink>
(PostgreSQL 10 and later):
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<varname>pg_read_all_stats</varname>
(to read the <varname>status</varname> column of <literal>pg_stat_replication</literal>
and execute <function>pg_database_size()</function> on all databases)
</simpara>
</listitem>
<listitem>
<simpara>
<varname>pg_read_all_settings</varname> (to access the <varname>data_directory</varname> setting)
</simpara>
</listitem>
</itemizedlist>
Alternatively the meta-role <varname>pg_monitor</varname> can be granted, which includes membership
of the above predefined roles.
</para>
<para>
PostgreSQL 15 introduced the <varname>pg_checkpoint</varname> predefined role which allows a
non-superuser &repmgr; database user to perform a CHECKPOINT command.
</para>
<para>
Membership of these roles can be granted with e.g. <command>GRANT pg_read_all_stats TO repmgr</command>.
</para>
<para>
Users of PostgreSQL 9.6 or earlier should upgrade to a supported PostgreSQL version, or provide
the <option>-S</option>/<option>--superuser</option> where available.
</para>
</sect3>
<sect3 id="configuration-permissions-extension" xreflabel="Extension creation">
<title>Extension creation</title>
<para>
&repmgr; requires that the database defined in the <varname>conninfo</varname>
setting contains the <literal>repmgr</literal> extension. The database user defined in the
<varname>conninfo</varname> setting must be able to access this database and
the database objects contained within the extension.
</para>
<para>
The <literal>repmgr</literal> extension can only be installed by a superuser.
If the &repmgr; user is a superuser, &repmgr; will create the extension automatically.
</para>
<para>
Alternatively, the extension can be created manually by a superuser
(with &quot;<command>CREATE EXTENSION repmgr</command>&quot;) before executing
<link linkend="repmgr-primary-register">repmgr primary register</link>.
</para>
</sect3>
<sect3 id="configuration-permissions-functions" xreflabel="Function permissions for non-superusers">
<title>Function permissions</title>
<para>
If the &repmgr; database user is not a superuser, <literal>EXECUTE</literal> permission should be
granted on the following function:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<function>pg_wal_replay_resume()</function> (required by &repmgrd; during failover operations;
if permission is not granted, the failoved process may not function reliably if a node
has WAL replay paused)
</simpara>
</listitem>
<listitem>
<simpara>
<function>pg_promote()</function> (PostgreSQL 12 and later; if permission is not granted,
&repmgr; will fall back to <command>pg_ctl promote</command>)
</simpara>
</listitem>
</itemizedlist>
</para>
<para>
<literal>EXECUTE</literal> permission on functions can be granted with e.g.:
<command>GRANT EXECUTE ON FUNCTION pg_catalog.pg_wal_replay_resume() TO repmgr</command>.
</para>
</sect3>
<sect3 id="configuration-permissions-superuser-required" xreflabel="repmgr actions requiring a superuser">
<title>repmgr actions requiring a superuser</title>
<para>
In some circumstances, &repmgr; may need to perform an operation which cannot be delegated to a
non-superuser.
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
The <command>CHECKPOINT</command> command is executed by
<link linkend="repmgr-standby-switchover">repmgr standby switchover</link>. This can only
be executed by a superuser; if the &repmgr; user is not a superuser,
the <option>-S</option>/<option>--superuser</option> should be used.
From PostgreSQL 15 the <varname>pg_checkpoint</varname> predefined role removes the need of
superuser permissions to perform <command>CHECKPOINT</command> command.
</simpara>
<simpara>
If &repmgr; is not able to execute <command>CHECKPOINT</command>,
there is a risk that the demotion candidate may not be able to shut down as smoothly as might otherwise
have been the case.
</simpara>
</listitem>
<listitem>
<simpara>
The <command>ALTER SYSTEM</command> is executed by &repmgrd; if
<varname>standby_disconnect_on_failover</varname> is set to <literal>true</literal> in
<filename>repmgr.conf</filename>. Until PostgreSQL 14 <command>ALTER SYSTEM</command> can only be executed by
a superuser; if the &repmgr; user is not a superuser, this functionality will not be available.
From PostgreSQL 15 a specific ALTER SYSTEM privilege can be granted with e.g.
<command>GRANT ALTER SYSTEM ON PARAMETER wal_retrieve_retry_interval TO repmgr</command>.
</simpara>
</listitem>
</itemizedlist>
</para>
</sect3>
<sect3 id="configuration-permissions-superuser-option" xreflabel="repmgr commands with --superuser option">
<title>repmgr commands with --superuser option</title>
<para>
The following repmgr commands provide the <option>-S</option>/<option>--superuser</option> option:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><link linkend="repmgr-standby-clone">repmgr standby clone</link> (to be able to copy configuration files outside of the data directory if <option>--copy-external-config-files</option> provided)</simpara>
</listitem>
<listitem>
<simpara><link linkend="repmgr-standby-switchover">repmgr standby switchover</link> (to execute <command>CHECKPOINT</command>)</simpara>
</listitem>
<listitem>
<simpara><link linkend="repmgr-node-check">repmgr node check</link> (to execute <command>repmgr node check --data-directory-config</command>; note this is also called by <link linkend="repmgr-standby-switchover">repmgr standby switchover</link>)</simpara>
</listitem>
<listitem>
<simpara><link linkend="repmgr-node-service">repmgr node service</link> (to execute <command>CHECKPOINT</command> via the <option>--checkpoint</option>; note this is also called by <link linkend="repmgr-standby-switchover">repmgr standby switchover</link>)</simpara>
</listitem>
</itemizedlist>
</para>
</sect3>
</sect2>
</sect1>

View File

@@ -1,7 +0,0 @@
<chapter id="configuration" xreflabel="Configuration">
<title>repmgr configuration</title>
&configuration-file;
&configuration-file-settings;
</chapter>

335
doc/configuration.xml Normal file
View File

@@ -0,0 +1,335 @@
<chapter id="configuration" xreflabel="Configuration">
<title>repmgr configuration</title>
<sect1 id="configuration-prerequisites" xreflabel="Prerequisites for configuration">
<title>Prerequisites for configuration</title>
<indexterm>
<primary>configuration</primary>
<secondary>prerequisites</secondary>
</indexterm>
<indexterm>
<primary>configuration</primary>
<secondary>ssh</secondary>
</indexterm>
<para>
Following software must be installed on both servers:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><application>PostgreSQL</application></simpara>
</listitem>
<listitem>
<simpara>
<application>repmgr</application>
</simpara>
</listitem>
</itemizedlist>
</para>
<para>
At network level, connections between the PostgreSQL port (default: <literal>5432</literal>)
must be possible between all nodes.
</para>
<para>
Passwordless <command>SSH</command> connectivity between all servers in the replication cluster
is not required, but is necessary in the following cases:
<itemizedlist>
<listitem>
<simpara>if you need &repmgr; to copy configuration files from outside the PostgreSQL
data directory (as is the case with e.g. <link linkend="packages-debian-ubuntu">Debian packages</link>);
in this case <command>rsync</command> must also be installed on all servers.
</simpara>
</listitem>
<listitem>
<simpara>to perform <link linkend="performing-switchover">switchover operations</link></simpara>
</listitem>
<listitem>
<simpara>
when executing <command><link linkend="repmgr-cluster-matrix">repmgr cluster matrix</link></command>
and <command><link linkend="repmgr-cluster-crosscheck">repmgr cluster crosscheck</link></command>
</simpara>
</listitem>
</itemizedlist>
</para>
<tip>
<simpara>
Consider setting <varname>ConnectTimeout</varname> to a low value in your SSH configuration.
This will make it faster to detect any SSH connection errors.
</simpara>
</tip>
<sect2 id="configuration-postgresql" xreflabel="PostgreSQL configuration">
<title>PostgreSQL configuration for &repmgr;</title>
<indexterm>
<primary>configuration</primary>
<secondary>PostgreSQL</secondary>
</indexterm>
<indexterm>
<primary>PostgreSQL configuration</primary>
</indexterm>
<para>
The following PostgreSQL configuration parameters may need to be changed in order
for &repmgr; (and replication itself) to function correctly.
</para>
<variablelist>
<varlistentry>
<term><option>hot_standby</option></term>
<listitem>
<indexterm>
<primary>hot_standby</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<para>
<option>hot_standby</option> must always be set to <literal>on</literal>, as &repmgr; needs
to be able to connect to each server it manages.
</para>
<para>
Note that <option>hot_standby</option> defaults to <literal>on</literal> from PostgreSQL 10
and later; in PostgreSQL 9.6 and earlier, the default was <literal>off</literal>.
</para>
<para>
PostgreSQL documentation: <ulink url="https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-HOT-STANDBY">hot_standby</ulink>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>wal_level</option></term>
<listitem>
<indexterm>
<primary>wal_level</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<para>
<option>wal_level</option> must be one of <option>replica</option> or <option>logical</option>
(PostgreSQL 9.5 and earlier: one of <option>hot_standby</option> or <option>logical</option>).
</para>
<para>
PostgreSQL documentation: <ulink url="https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-WAL-LEVEL">wal_level</ulink>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>max_wal_senders</option></term>
<listitem>
<indexterm>
<primary>max_wal_senders</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<para>
<option>max_wal_senders</option> must be set to a value of <literal>2</literal> or greater.
In general you will need one WAL sender for each standby which will attach to the PostgreSQL
instance; additionally &repmgr; will require two free WAL senders in order to clone further
standbys.
</para>
<para>
<option>max_wal_senders</option> should be set to an appropriate value on all PostgreSQL
instances in the replication cluster which may potentially become a primary server or
(in cascading replication) the upstream server of a standby.
</para>
<para>
PostgreSQL documentation: <ulink url="https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-MAX-WAL-SENDERS">max_wal_senders</ulink>.
</para>
<note>
<para>
From <productname>PostgreSQL 12</productname>, <option>max_wal_senders</option>
<emphasis>must</emphasis> be set to the same or a higher value as the primary node
(at the time the node was cloned), otherwise the standby will refuse
to start (unless <option>hot_standby</option> is set to <literal>off</literal>, which
will prevent the node from accepting queries).
</para>
</note>
</listitem>
</varlistentry>
<varlistentry>
<term><option>max_replication_slots</option></term>
<listitem>
<indexterm>
<primary>max_replication_slots</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<para>
If you are intending to use replication slots, <option>max_replication_slots</option>
must be set to a non-zero value.
</para>
<para>
<option>max_replication_slots</option> should be set to an appropriate value on all PostgreSQL
instances in the replication cluster which may potentially become a primary server or
(in cascading replication) the upstream server of a standby.
</para>
<para>
PostgreSQL documentation: <ulink url="https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-MAX-REPLICATION-SLOTS">max_replication_slots</ulink>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>wal_log_hints</option></term>
<listitem>
<indexterm>
<primary>wal_log_hints</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<para>If you are intending to use <application>pg_rewind</application>,
and the cluster was not initialised using data checksums, you may want to consider enabling
<option>wal_log_hints</option>.
</para>
<para>
For more details see <xref linkend="repmgr-node-rejoin-pg-rewind"/>.
</para>
<para>
PostgreSQL documentation: <ulink url="https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-WAL-LOG-HINTS">wal_log_hints</ulink>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>archive_mode</option></term>
<listitem>
<indexterm>
<primary>archive_mode</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<para>
We suggest setting <option>archive_mode</option> to <literal>on</literal> (and
<option>archive_command</option> to <literal>/bin/true</literal>; see below)
even if you are currently not planning to use WAL file archiving.
</para>
<para>
This will make it simpler to set up WAL file archiving if it is ever required,
as changes to <option>archive_mode</option> require a full PostgreSQL server
restart, while <option>archive_command</option> changes can be applied via a normal
configuration reload.
</para>
<para>
However, &repmgr; itself does not require WAL file archiving.
</para>
<para>
PostgreSQL documentation: <ulink url="https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-ARCHIVE-MODE">archive_mode</ulink>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>archive_command</option></term>
<listitem>
<indexterm>
<primary>archive_command</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<para>
If you have set <option>archive_mode</option> to <literal>on</literal> but are not currently planning
to use WAL file archiving, set <option>archive_command</option> to a command which does nothing but returns
<literal>true</literal>, such as <command>/bin/true</command>. See above for details.
</para>
<para>
PostgreSQL documentation: <ulink url="https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-ARCHIVE-COMMAND">archive_command</ulink>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>wal_keep_segments</option> / <option>wal_keep_size</option></term>
<listitem>
<indexterm>
<primary>wal_keep_segments</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<indexterm>
<primary>wal_keep_size</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<para>
Normally there is no need to set <option>wal_keep_segments</option>
(PostgreSQL 13 and later: <varname>wal_keep_size</varname>; default: <literal>0</literal>),
as it is <emphasis>not</emphasis> a reliable way of ensuring that all required WAL
segments are available to standbys. Replication slots and/or an archiving solution
such as Barman are recommended to ensure standbys have a reliable
source of WAL segments at all times.
</para>
<para>
The only reason ever to set <option>wal_keep_segments</option> / <option>wal_keep_size</option>
is you have you have configured <option>pg_basebackup_options</option>
in <filename>repmgr.conf</filename> to include the setting <literal>--wal-method=fetch</literal>
(PostgreSQL 9.6 and earlier: <literal>--xlog-method=fetch</literal>)
<emphasis>and</emphasis> you have <emphasis>not</emphasis> set <option>restore_command</option>
in <filename>repmgr.conf</filename> to fetch WAL files from a reliable source such as Barman,
in which case you'll need to set <option>wal_keep_segments</option>
to a sufficiently high number to ensure that all WAL files required by the standby
are retained. However we do not recommend WAL retention in this way.
</para>
<para>
PostgreSQL documentation: <ulink url="https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-WAL-KEEP-SEGMENTS">wal_keep_segments</ulink>.
<!--
PostgreSQL documentation: <ulink url="https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-WAL-KEEP-SIZE">wal_keep_size</ulink>.
-->
</para>
</listitem>
</varlistentry>
</variablelist>
<para>
See also the <link linkend="quickstart-postgresql-configuration">PostgreSQL configuration</link> section in the
<link linkend="quickstart">Quick-start guide</link>.
</para>
</sect2>
</sect1>
&configuration-file;
&configuration-file-required-settings;
&configuration-file-optional-settings;
&configuration-file-log-settings;
&configuration-file-service-commands;
&configuration-permissions;
&configuration-password-management;
</chapter>

View File

@@ -1,181 +0,0 @@
<chapter id="event-notifications" xreflabel="event notifications">
<title>Event Notifications</title>
<para>
Each time `repmgr` or `repmgrd` perform a significant event, a record
of that event is written into the `repmgr.events` table together with
a timestamp, an indication of failure or success, and further details
if appropriate. This is useful for gaining an overview of events
affecting the replication cluster. However note that this table has
advisory character and should be used in combination with the `repmgr`
and PostgreSQL logs to obtain details of any events.
</para>
<para>
Example output after a primary was registered and a standby cloned
and registered:
<programlisting>
repmgr=# SELECT * from repmgr.events ;
node_id | event | successful | event_timestamp | details
---------+------------------+------------+-------------------------------+-------------------------------------------------------------------------------------
1 | primary_register | t | 2016-01-08 15:04:39.781733+09 |
2 | standby_clone | t | 2016-01-08 15:04:49.530001+09 | Cloned from host 'repmgr_node1', port 5432; backup method: pg_basebackup; --force: N
2 | standby_register | t | 2016-01-08 15:04:50.621292+09 |
(3 rows)</programlisting>
</para>
<para>
Alternatively, use <xref linkend="repmgr-cluster-event"> to output a
formatted list of events.
</para>
<para>
Additionally, event notifications can be passed to a user-defined program
or script which can take further action, e.g. send email notifications.
This is done by setting the `event_notification_command` parameter in
`repmgr.conf`.
</para>
<para>
This parameter accepts the following format placeholders:
</para>
<variablelist>
<varlistentry>
<term><option>%n</option></term>
<listitem>
<para>
node ID
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>%e</option></term>
<listitem>
<para>
event type
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>%t</option></term>
<listitem>
<para>
success (1 or 0)
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>%t</option></term>
<listitem>
<para>
timestamp
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>%d</option></term>
<listitem>
<para>
details
</para>
</listitem>
</varlistentry>
</variablelist>
<para>
The values provided for <literal>%t</literal> and <literal>%d</literal>
will probably contain spaces, so should be quoted in the provided command
configuration, e.g.:
<programlisting>
event_notification_command='/path/to/some/script %n %e %s "%t" "%d"'
</programlisting>
</para>
<para>
Additionally the following format placeholders are available for the event
type <varname>bdr_failover</varname> and optionally <varname>bdr_recovery</varname>:
</para>
<variablelist>
<varlistentry>
<term><option>%c</option></term>
<listitem>
<para>
conninfo string of the next available node
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>%a</option></term>
<listitem>
<para>
name of the next available node
</para>
</listitem>
</varlistentry>
</variablelist>
<para>
These should always be quoted.
</para>
<para>
By default, all notification types will be passed to the designated script;
the notification types can be filtered to explicitly named ones:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><literal>primary_register</literal></simpara>
</listitem>
<listitem>
<simpara><literal>standby_register</literal></simpara>
</listitem>
<listitem>
<simpara><literal>standby_unregister</literal></simpara>
</listitem>
<listitem>
<simpara><literal>standby_clone</literal></simpara>
</listitem>
<listitem>
<simpara><literal>standby_promote</literal></simpara>
</listitem>
<listitem>
<simpara><literal>standby_follow</literal></simpara>
</listitem>
<listitem>
<simpara><literal>standby_disconnect_manual</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_start</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_shutdown</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_failover_promote</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_failover_follow</literal></simpara>
</listitem>
<listitem>
<simpara><literal>bdr_failover</literal></simpara>
</listitem>
<listitem>
<simpara><literal>bdr_reconnect</literal></simpara>
</listitem>
<listitem>
<simpara><literal>bdr_recovery</literal></simpara>
</listitem>
<listitem>
<simpara><literal>bdr_register</literal></simpara>
</listitem>
<listitem>
<simpara><literal>bdr_unregister</literal></simpara>
</listitem>
</itemizedlist>
</para>
<para>
Note that under some circumstances (e.g. when no replication cluster primary
could be located), it will not be possible to write an entry into the
<literal>repmgr.events</literal>
table, in which case executing a script via <varname>event_notification_command</varname>
can serve as a fallback by generating some form of notification.
</para>
</chapter>

282
doc/event-notifications.xml Normal file
View File

@@ -0,0 +1,282 @@
<chapter id="event-notifications" xreflabel="event notifications">
<title>Event Notifications</title>
<indexterm>
<primary>event notifications</primary>
</indexterm>
<para>
Each time &repmgr; or &repmgrd; perform a significant event, a record
of that event is written into the <literal>repmgr.events</literal> table together with
a timestamp, an indication of failure or success, and further details
if appropriate. This is useful for gaining an overview of events
affecting the replication cluster. However note that this table has
advisory character and should be used in combination with the &repmgr;
and PostgreSQL logs to obtain details of any events.
</para>
<para>
Example output after a primary was registered and a standby cloned
and registered:
<programlisting>
repmgr=# SELECT * from repmgr.events ;
node_id | event | successful | event_timestamp | details
---------+------------------+------------+-------------------------------+-------------------------------------------------------------------------------------
1 | primary_register | t | 2016-01-08 15:04:39.781733+09 |
2 | standby_clone | t | 2016-01-08 15:04:49.530001+09 | Cloned from host 'repmgr_node1', port 5432; backup method: pg_basebackup; --force: N
2 | standby_register | t | 2016-01-08 15:04:50.621292+09 |
(3 rows)</programlisting>
</para>
<para>
Alternatively, use <xref linkend="repmgr-cluster-event"/> to output a
formatted list of events.
</para>
<para>
Additionally, event notifications can be passed to a user-defined program
or script which can take further action, e.g. send email notifications.
This is done by setting the <literal>event_notification_command</literal> parameter in
<filename>repmgr.conf</filename>.
</para>
<para>
The following format placeholders are provided for all event notifications:
</para>
<variablelist>
<varlistentry>
<term><option>%n</option></term>
<listitem>
<para>
node ID
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>%e</option></term>
<listitem>
<para>
event type
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>%s</option></term>
<listitem>
<para>
success (1) or failure (0)
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>%t</option></term>
<listitem>
<para>
timestamp
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>%d</option></term>
<listitem>
<para>
details
</para>
</listitem>
</varlistentry>
</variablelist>
<para>
The values provided for <literal>%t</literal> and <literal>%d</literal>
may contain spaces, so should be quoted in the provided command
configuration, e.g.:
<programlisting>
event_notification_command='/path/to/some/script %n %e %s "%t" "%d"'</programlisting>
</para>
<para>
The following parameters are provided for a subset of event notifications; their meaning may
change according to context:
</para>
<variablelist>
<varlistentry>
<term><option>%p</option></term>
<listitem>
<para>
node ID of the current primary (<xref linkend="repmgr-standby-register"/> and <xref linkend="repmgr-standby-follow"/>)
</para>
<para>
node ID of the demoted primary (<xref linkend="repmgr-standby-switchover"/> only)
</para>
<para>
node ID of the former primary (<literal>repmgrd_failover_promote</literal> only)
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>%c</option></term>
<listitem>
<para>
<literal>conninfo</literal> string of the primary node
(<xref linkend="repmgr-standby-register"/> and <xref linkend="repmgr-standby-follow"/>)
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>%a</option></term>
<listitem>
<para>
name of the current primary node (<xref linkend="repmgr-standby-register"/> and <xref linkend="repmgr-standby-follow"/>)
</para>
</listitem>
</varlistentry>
</variablelist>
<para>
The values provided for <literal>%c</literal> and <literal>%a</literal>
may contain spaces, so should always be quoted.
</para>
<para>
By default, all notification types will be passed to the designated script;
the notification types can be filtered to explicitly named ones using the
<varname>event_notifications</varname> parameter, e.g.:
<programlisting>
event_notifications='primary_register,standby_register,witness_register'</programlisting>
</para>
<para>
Events generated by the &repmgr; command:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><literal><link linkend="repmgr-primary-register-events">cluster_created</link></literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-primary-register-events">primary_register</link></literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-primary-unregister-events">primary_unregister</link></literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-standby-clone-events">standby_clone</link></literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-standby-register-events">standby_register</link></literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-standby-register-events">standby_register_sync</link></literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-standby-unregister-events">standby_unregister</link></literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-standby-promote-events">standby_promote</link></literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-standby-follow-events">standby_follow</link></literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-standby-switchover-events">standby_switchover</link></literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-witness-register-events">witness_register</link></literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-witness-unregister-events">witness_unregister</link></literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-node-rejoin-events">node_rejoin</link></literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-cluster-cleanup-events">cluster_cleanup</link></literal></simpara>
</listitem>
</itemizedlist>
</para>
<para>
Events generated by &repmgrd; (streaming replication mode):
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><literal>repmgrd_start</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_shutdown</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_reload</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_failover_promote</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_failover_follow</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_failover_aborted</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_standby_reconnect</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_promote_error</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_local_disconnect</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_local_reconnect</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_upstream_disconnect</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_upstream_reconnect</literal></simpara>
</listitem>
<listitem>
<simpara><literal>standby_disconnect_manual</literal></simpara>
</listitem>
<listitem>
<simpara><literal>standby_failure</literal></simpara>
</listitem>
<listitem>
<simpara><literal>standby_recovery</literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgrd-primary-child-disconnection-events">child_node_disconnect</link></literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgrd-primary-child-disconnection-events">child_node_reconnect</link></literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgrd-primary-child-disconnection-events">child_node_new_connect</link></literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgrd-primary-child-disconnection-events">child_nodes_disconnect_command</link></literal></simpara>
</listitem>
</itemizedlist>
</para>
<para>
Note that under some circumstances (e.g. when no replication cluster primary
could be located), it will not be possible to write an entry into the
<literal>repmgr.events</literal>
table, in which case executing a script via <varname>event_notification_command</varname>
can serve as a fallback by generating some form of notification.
</para>
</chapter>

View File

@@ -1,77 +0,0 @@
<!-- doc/filelist.sgml -->
<!ENTITY legal SYSTEM "legal.sgml">
<!ENTITY bookindex SYSTEM "bookindex.sgml">
<!--
Some parts of the documentation are also source for some plain-text
files used during installation. To selectively ignore or include
some parts (e.g., external xref's) when generating these files we use
these parameter entities. See also standalone-install.sgml.
-->
<!ENTITY % standalone-ignore "INCLUDE">
<!ENTITY % standalone-include "IGNORE">
<!-- doc/filelist.sgml -->
<!--
By default, no index is included. Use -i include-index on the command line
to include it.
-->
<!ENTITY % include-index "IGNORE">
<!--
Create empty index element for processing by XSLT stylesheet.
-->
<!ENTITY % include-xslt-index "IGNORE">
<!--
Include external documentation sections
-->
<!ENTITY overview SYSTEM "overview.sgml">
<!ENTITY install SYSTEM "install.sgml">
<!ENTITY install-requirements SYSTEM "install-requirements.sgml">
<!ENTITY install-packages SYSTEM "install-packages.sgml">
<!ENTITY install-source SYSTEM "install-source.sgml">
<!ENTITY quickstart SYSTEM "quickstart.sgml">
<!ENTITY configuration SYSTEM "configuration.sgml">
<!ENTITY configuration-file SYSTEM "configuration-file.sgml">
<!ENTITY configuration-file-settings SYSTEM "configuration-file-settings.sgml">
<!ENTITY cloning-standbys SYSTEM "cloning-standbys.sgml">
<!ENTITY promoting-standby SYSTEM "promoting-standby.sgml">
<!ENTITY follow-new-primary SYSTEM "follow-new-primary.sgml">
<!ENTITY switchover SYSTEM "switchover.sgml">
<!ENTITY event-notifications SYSTEM "event-notifications.sgml">
<!ENTITY upgrading-repmgr SYSTEM "upgrading-repmgr.sgml">
<!ENTITY repmgrd-automatic-failover SYSTEM "repmgrd-automatic-failover.sgml">
<!ENTITY repmgrd-configuration SYSTEM "repmgrd-configuration.sgml">
<!ENTITY repmgrd-demonstration SYSTEM "repmgrd-demonstration.sgml">
<!ENTITY repmgrd-monitoring SYSTEM "repmgrd-monitoring.sgml">
<!ENTITY repmgrd-degraded-monitoring SYSTEM "repmgrd-degraded-monitoring.sgml">
<!ENTITY repmgrd-cascading-replication SYSTEM "repmgrd-cascading-replication.sgml">
<!ENTITY repmgrd-network-split SYSTEM "repmgrd-network-split.sgml">
<!ENTITY repmgr-primary-register SYSTEM "repmgr-primary-register.sgml">
<!ENTITY repmgr-primary-unregister SYSTEM "repmgr-primary-unregister.sgml">
<!ENTITY repmgr-standby-clone SYSTEM "repmgr-standby-clone.sgml">
<!ENTITY repmgr-standby-register SYSTEM "repmgr-standby-register.sgml">
<!ENTITY repmgr-standby-unregister SYSTEM "repmgr-standby-unregister.sgml">
<!ENTITY repmgr-standby-promote SYSTEM "repmgr-standby-promote.sgml">
<!ENTITY repmgr-standby-follow SYSTEM "repmgr-standby-follow.sgml">
<!ENTITY repmgr-standby-switchover SYSTEM "repmgr-standby-switchover.sgml">
<!ENTITY repmgr-node-status SYSTEM "repmgr-node-status.sgml">
<!ENTITY repmgr-node-check SYSTEM "repmgr-node-check.sgml">
<!ENTITY repmgr-node-rejoin SYSTEM "repmgr-node-rejoin.sgml">
<!ENTITY repmgr-cluster-show SYSTEM "repmgr-cluster-show.sgml">
<!ENTITY repmgr-cluster-matrix SYSTEM "repmgr-cluster-matrix.sgml">
<!ENTITY repmgr-cluster-crosscheck SYSTEM "repmgr-cluster-crosscheck.sgml">
<!ENTITY repmgr-cluster-event SYSTEM "repmgr-cluster-event.sgml">
<!ENTITY repmgr-cluster-cleanup SYSTEM "repmgr-cluster-cleanup.sgml">
<!ENTITY appendix-signatures SYSTEM "appendix-signatures.sgml">
<!ENTITY bookindex SYSTEM "bookindex.sgml">

71
doc/filelist.xml Normal file
View File

@@ -0,0 +1,71 @@
<!-- doc/filelist.xml -->
<!ENTITY legal SYSTEM "legal.xml">
<!ENTITY bookindex SYSTEM "bookindex.xml">
<!--
Include external documentation sections
-->
<!ENTITY overview SYSTEM "overview.xml">
<!ENTITY install SYSTEM "install.xml">
<!ENTITY install-requirements SYSTEM "install-requirements.xml">
<!ENTITY install-packages SYSTEM "install-packages.xml">
<!ENTITY install-source SYSTEM "install-source.xml">
<!ENTITY quickstart SYSTEM "quickstart.xml">
<!ENTITY configuration SYSTEM "configuration.xml">
<!ENTITY configuration-file SYSTEM "configuration-file.xml">
<!ENTITY configuration-file-required-settings SYSTEM "configuration-file-required-settings.xml">
<!ENTITY configuration-file-optional-settings SYSTEM "configuration-file-optional-settings.xml">
<!ENTITY configuration-file-log-settings SYSTEM "configuration-file-log-settings.xml">
<!ENTITY configuration-file-service-commands SYSTEM "configuration-file-service-commands.xml">
<!ENTITY configuration-permissions SYSTEM "configuration-permissions.xml">
<!ENTITY configuration-password-management SYSTEM "configuration-password-management.xml">
<!ENTITY cloning-standbys SYSTEM "cloning-standbys.xml">
<!ENTITY promoting-standby SYSTEM "promoting-standby.xml">
<!ENTITY follow-new-primary SYSTEM "follow-new-primary.xml">
<!ENTITY switchover SYSTEM "switchover.xml">
<!ENTITY event-notifications SYSTEM "event-notifications.xml">
<!ENTITY upgrading-repmgr SYSTEM "upgrading-repmgr.xml">
<!ENTITY repmgrd-overview SYSTEM "repmgrd-overview.xml">
<!ENTITY repmgrd-automatic-failover SYSTEM "repmgrd-automatic-failover.xml">
<!ENTITY repmgrd-configuration SYSTEM "repmgrd-configuration.xml">
<!ENTITY repmgrd-operation SYSTEM "repmgrd-operation.xml">
<!ENTITY repmgr-primary-register SYSTEM "repmgr-primary-register.xml">
<!ENTITY repmgr-primary-unregister SYSTEM "repmgr-primary-unregister.xml">
<!ENTITY repmgr-standby-clone SYSTEM "repmgr-standby-clone.xml">
<!ENTITY repmgr-standby-register SYSTEM "repmgr-standby-register.xml">
<!ENTITY repmgr-standby-unregister SYSTEM "repmgr-standby-unregister.xml">
<!ENTITY repmgr-standby-promote SYSTEM "repmgr-standby-promote.xml">
<!ENTITY repmgr-standby-follow SYSTEM "repmgr-standby-follow.xml">
<!ENTITY repmgr-standby-switchover SYSTEM "repmgr-standby-switchover.xml">
<!ENTITY repmgr-witness-register SYSTEM "repmgr-witness-register.xml">
<!ENTITY repmgr-witness-unregister SYSTEM "repmgr-witness-unregister.xml">
<!ENTITY repmgr-node-status SYSTEM "repmgr-node-status.xml">
<!ENTITY repmgr-node-check SYSTEM "repmgr-node-check.xml">
<!ENTITY repmgr-node-rejoin SYSTEM "repmgr-node-rejoin.xml">
<!ENTITY repmgr-node-service SYSTEM "repmgr-node-service.xml">
<!ENTITY repmgr-cluster-show SYSTEM "repmgr-cluster-show.xml">
<!ENTITY repmgr-cluster-matrix SYSTEM "repmgr-cluster-matrix.xml">
<!ENTITY repmgr-cluster-crosscheck SYSTEM "repmgr-cluster-crosscheck.xml">
<!ENTITY repmgr-cluster-event SYSTEM "repmgr-cluster-event.xml">
<!ENTITY repmgr-cluster-cleanup SYSTEM "repmgr-cluster-cleanup.xml">
<!ENTITY repmgr-service-status SYSTEM "repmgr-service-status.xml">
<!ENTITY repmgr-service-pause SYSTEM "repmgr-service-pause.xml">
<!ENTITY repmgr-service-unpause SYSTEM "repmgr-service-unpause.xml">
<!ENTITY repmgr-daemon-start SYSTEM "repmgr-daemon-start.xml">
<!ENTITY repmgr-daemon-stop SYSTEM "repmgr-daemon-stop.xml">
<!ENTITY appendix-release-notes SYSTEM "appendix-release-notes.xml">
<!ENTITY appendix-faq SYSTEM "appendix-faq.xml">
<!ENTITY appendix-signatures SYSTEM "appendix-signatures.xml">
<!ENTITY appendix-packages SYSTEM "appendix-packages.xml">
<!ENTITY appendix-support SYSTEM "appendix-support.xml">
<!ENTITY bookindex SYSTEM "bookindex.xml">

View File

@@ -1,16 +1,22 @@
<chapter id="follow-new-primary" xreflabel="Following a new primary">
<chapter id="follow-new-primary">
<title>Following a new primary</title>
<indexterm>
<primary>Following a new primary</primary>
<seealso>repmgr standby follow</seealso>
</indexterm>
<para>
Following the failure or removal of the replication cluster's existing primary
server, <xref linkend="repmgr-standby-follow"> can be used to make 'orphaned' standbys
server, <xref linkend="repmgr-standby-follow"/> can be used to make &quot;orphaned&quot; standbys
follow the new primary and catch up to its current state.
</para>
<para>
To demonstrate this, assuming a replication cluster in the same state as the
end of the preceding section (<xref linkend="promoting-standby">),
end of the preceding section (<xref linkend="promoting-standby"/>),
execute this:
<programlisting>
$ repmgr -f /etc/repmgr.conf repmgr standby follow
$ repmgr -f /etc/repmgr.conf standby follow
INFO: changing node 3's primary to node 2
NOTICE: restarting server using "pg_ctl -l /var/log/postgresql/startup.log -w -D '/var/lib/postgresql/data' restart"
waiting for server to shut down......... done
@@ -22,7 +28,8 @@
</programlisting>
</para>
<para>
The standby is now replicating from the new primary and `repmgr cluster show`
The standby is now replicating from the new primary and
<command><link linkend="repmgr-cluster-show">repmgr cluster show</link></command>
output reflects this:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster show

View File

@@ -1,32 +0,0 @@
<sect1 id="installation-packages" xreflabel="Installing from packages">
<title>Installing &repmgr; from packages</title>
<para>
We recommend installing `repmgr` using the available packages for your
system.
</para>
<sect2 id="installation-packages-redhat" xreflabel="Installing from packages on RHEL, Fedora and CentOS">
<title>RedHat/Fedora/CentOS</title>
<para>
RPM packages for `repmgr` are available via Yum through
the PostgreSQL Global Development Group RPM repository
( <ulink url="https://yum.postgresql.org/">http://yum.postgresql.org/</>).
Follow the instructions for your distribution (RedHat, CentOS,
Fedora, etc.) and architecture as detailed at yum.postgresql.org.
</para>
<para>
2ndQuadrant also provides its own RPM packages which are made available
at the same time as each `repmgr` release, as it can take some days for
them to become available via the main PGDG repository. See here for details:
<ulink url="http://repmgr.org/yum-repository.html">http://repmgr.org/yum-repository.html</>
</para>
</sect2>
<sect2 id="installation-packages-debian" xreflabel="Installing from packages on Debian or Ubuntu">
<title>Debian/Ubuntu</title>
<para>.deb packages for `repmgr` are available from the
PostgreSQL Community APT repository (<ulink url="http://apt.postgresql.org/">http://apt.postgresql.org/</> ).
Instructions can be found in the APT section of the PostgreSQL Wiki
(<ulink url="https://wiki.postgresql.org/wiki/Apt">https://wiki.postgresql.org/wiki/Apt</> ).
</para>
</sect2>
</sect1>

285
doc/install-packages.xml Normal file
View File

@@ -0,0 +1,285 @@
<sect1 id="installation-packages" xreflabel="Installing from packages">
<title>Installing &repmgr; from packages</title>
<indexterm>
<primary>installation</primary>
<secondary>from packages</secondary>
</indexterm>
<para>
We recommend installing &repmgr; using the available packages for your
system.
</para>
<sect2 id="installation-packages-redhat" xreflabel="Installing from packages on RHEL, CentOS and Fedora">
<title>RedHat/CentOS/Fedora</title>
<indexterm>
<primary>installation</primary>
<secondary>on Red Hat/CentOS/Fedora etc.</secondary>
</indexterm>
<para>
&repmgr; RPM packages for RedHat/CentOS variants and Fedora are available from the
<ulink url="https://www.enterprisedb.com">EDB</ulink>
<ulink url="https://dl.enterprisedb.com/">public repository</ulink>; see following
section for details.
</para>
<note>
<para>
Currently the <ulink url="https://www.enterprisedb.com">EDB</ulink>
<ulink url="https://dl.enterprisedb.com/">public repository</ulink> provides
support for RedHat/CentOS versions 6,7 and 8.
</para>
</note>
<para>
RPM packages for &repmgr; are also available via Yum through
the PostgreSQL Global Development Group (PGDG) RPM repository
(<ulink url="https://yum.postgresql.org/">https://yum.postgresql.org/</ulink>).
Follow the instructions for your distribution (RedHat, CentOS,
Fedora, etc.) and architecture as detailed there. Note that it can take some days
for new &repmgr; packages to become available via the this repository.
</para>
<note>
<para>
&repmgr; RPM packages are designed to be compatible with the community-provided PostgreSQL packages
and EDB's PostgreSQL Extended Server (formerly 2ndQPostgres).
They may not work with vendor-specific packages such as those provided by RedHat for RHEL
customers, as the PostgreSQL filesystem layout may be different to the community RPMs.
Please contact your support vendor for assistance.
</para>
<para>
See also <link linkend="appendix-faq">FAQ</link> entry
<xref linkend="faq-third-party-packages"/>.
</para>
</note>
<para>
For more information on the package contents, including details of installation
paths and relevant <link linkend="configuration-file-service-commands">service commands</link>,
see the appendix section <xref linkend="packages-centos"/>.
</para>
<sect3 id="installation-packages-redhat-2ndq">
<title>EDB public RPM yum repository</title>
<para>
<ulink url="https://www.enterprisedb.com/">EDB</ulink> provides a dedicated <literal>yum</literal>
<ulink url="https://dl.enterprisedb.com/">public repository</ulink> for EDB software,
including &repmgr;. We recommend using this for all future &repmgr; releases.
</para>
<para>
General instructions for using this repository can be found on its
<ulink url="https://dl.enterprisedb.com/">homepage</ulink>. Specific instructions
for installing &repmgr; follow below.
</para>
<para>
<emphasis>Installation</emphasis>
<itemizedlist>
<listitem>
<para>
Locate the repository RPM for your PostgreSQL version from the list at:
<ulink url="https://dl.enterprisedb.com/">https://dl.enterprisedb.com/</ulink>
</para>
</listitem>
<listitem>
<para>
Install the repository definition for your distribution and PostgreSQL version
(this enables the EDB repository as a source of &repmgr; packages).
</para>
<para>
For example, for PostgreSQL 14 on Rocky Linux 8, execute:
<programlisting>
curl https://dl.enterprisedb.com/default/release/get/14/rpm | sudo bash</programlisting>
</para>
<para>
Verify that the repository is installed with:
<programlisting>
sudo dnf repolist</programlisting>
The output should contain two entries like this:
<programlisting>
2ndquadrant-dl-default-release-pg14 2ndQuadrant packages (PG14) for 8 - x86_64
2ndquadrant-dl-default-release-pg14-debug 2ndQuadrant packages (PG14) for 8 - x86_64 - Debug</programlisting>
</para>
</listitem>
<listitem>
<para>
Install the &repmgr; version appropriate for your PostgreSQL version (e.g. <literal>repmgr14</literal>):
<programlisting>
sudo dnf install repmgr14</programlisting>
</para>
<tip>
<para>
To determine the names of available packages, execute:
<programlisting>
dnf search repmgr</programlisting>
</para>
<para>
In CentOS 7 and earlier, use <literal>yum</literal> instead of <literal>dnf</literal>.
</para>
</tip>
</listitem>
</itemizedlist>
</para>
<para>
<emphasis>Compatibility with PGDG Repositories</emphasis>
</para>
<para>
The EDB &repmgr; yum repository packages use the same definitions and file system layout as the
main PGDG repository.
</para>
<para>
Normally <application>yum</application> will prioritize the repository with the most recent &repmgr; version.
Once the PGDG repository has been updated, it doesn't matter which repository
the packages are installed from.
</para>
<para>
To ensure the EDB repository is always prioritised, set the <literal>priority</literal> option
in the repository configuration file (e.g. <filename>/etc/yum.repos.d/2ndquadrant-dl-default-release-pg14.repo</filename>
accordingly.
</para>
<note>
<para>
With CentOS 7 and earlier, the package <literal>yum-plugin-priorities</literal> must be installed
to be able to set the repository priority.
</para>
</note>
<para>
<emphasis>Installing a specific package version</emphasis>
</para>
<para>
To install a specific package version, execute <command>dnf --showduplicates list</command>
for the package in question:
<programlisting>
[root@localhost ~]# dnf --showduplicates list repmgr10
Last metadata expiration check: 0:09:15 ago on Fri 11 Mar 2022 01:09:19 AM UTC.
Installed Packages
repmgr10.x86_64 5.3.1-1.el8 @2ndquadrant-dl-default-release-pg10
Available Packages
repmgr10.x86_64 5.0.0-1.rhel8 pgdg10
repmgr10.x86_64 5.1.0-1.el8 2ndquadrant-dl-default-release-pg10
repmgr10.x86_64 5.1.0-1.rhel8 pgdg10
repmgr10.x86_64 5.1.0-2.el8 2ndquadrant-dl-default-release-pg10
repmgr10.x86_64 5.2.0-1.el8 2ndquadrant-dl-default-release-pg10
repmgr10.x86_64 5.2.0-1.rhel8 pgdg10
repmgr10.x86_64 5.2.1-1.el8 2ndquadrant-dl-default-release-pg10
repmgr10.x86_64 5.3.0-1.el8 2ndquadrant-dl-default-release-pg10
repmgr10.x86_64 5.3.1-1.el8 2ndquadrant-dl-default-release-pg10</programlisting>
then append the appropriate version number to the package name with a hyphen, e.g.:
<programlisting>
[root@localhost ~]# dnf install repmgr10-5.3.0-1.el8</programlisting>
</para>
<para>
<emphasis>Installing old packages</emphasis>
</para>
<para>
See appendix <link linkend="packages-old-versions-rhel-centos">Installing old package versions</link>
for details on how to retrieve older package versions.
</para>
</sect3>
</sect2>
<sect2 id="installation-packages-debian" xreflabel="Installing from packages on Debian or Ubuntu">
<title>Debian/Ubuntu</title>
<indexterm>
<primary>installation</primary>
<secondary>on Debian/Ubuntu etc.</secondary>
</indexterm>
<para>.deb packages for &repmgr; are available from the
PostgreSQL Community APT repository (<ulink url="https://apt.postgresql.org/">https://apt.postgresql.org/</ulink>).
Instructions can be found in the APT section of the PostgreSQL Wiki
(<ulink url="https://wiki.postgresql.org/wiki/Apt">https://wiki.postgresql.org/wiki/Apt</ulink>).
</para>
<para>
For more information on the package contents, including details of installation
paths and relevant <link linkend="configuration-file-service-commands">service commands</link>,
see the appendix section <xref linkend="packages-debian-ubuntu"/>.
</para>
<sect3 id="installation-packages-debian-ubuntu-2ndq">
<title>EDB public apt repository for Debian/Ubuntu</title>
<para>
<ulink url="https://www.enterprisedb.com/">EDB</ulink> provides a
<ulink url="https://dl.enterprisedb.com/">public apt repository</ulink> for EDB software,
including &repmgr;.
</para>
<para>
General instructions for using this repository can be found on its
<ulink url="https://dl.enterprisedb.com/">homepage</ulink>. Specific instructions
for installing &repmgr; follow below.
</para>
<para>
<emphasis>Installation</emphasis>
<itemizedlist>
<listitem>
<para>
Install the repository definition for your distribution and PostgreSQL version
(this enables the EDB repository as a source of &repmgr; packages) by executing:
<programlisting>
curl https://dl.enterprisedb.com/default/release/get/deb | sudo bash</programlisting>
</para>
<note>
<para>
This will automatically install the following additional packages, if not already present:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><literal>lsb-release</literal></simpara>
</listitem>
<listitem>
<simpara><literal>apt-transport-https</literal></simpara>
</listitem>
</itemizedlist>
</para>
</note>
</listitem>
<listitem>
<para>
Install the &repmgr; version appropriate for your PostgreSQL version (e.g. <literal>repmgr11</literal>):
<programlisting>
sudo apt-get install postgresql-11-repmgr</programlisting>
</para>
<note>
<para>
For packages for PostgreSQL 9.6 and earlier, the package name includes
a period between major and minor version numbers, e.g.
<literal>postgresql-9.6-repmgr</literal>.
</para>
</note>
</listitem>
</itemizedlist>
</para>
<para>
<emphasis>Installing old packages</emphasis>
</para>
<para>
See appendix <link linkend="packages-old-versions-debian">Installing old package versions</link>
for details on how to retrieve older package versions.
</para>
</sect3>
</sect2>
</sect1>

View File

@@ -1,67 +0,0 @@
<sect1 id="install-requirements" xreflabel="installation requirements">
<title>Requirements for installing repmgr</title>
<para>
repmgr is developed and tested on Linux and OS X, but should work on any
UNIX-like system supported by PostgreSQL itself. There is no support for
Microsoft Windows.
</para>
<para>
From version 4.0, repmgr is compatible with all PostgreSQL versions from 9.4, including PostgreSQL 10.
</para>
<para>
PostgreSQL 9.3 is supported by repmgr 3.3.
</para>
<note>
<simpara>
If upgrading from `repmgr 3`, please see the separate upgrade guide
`doc/upgrading-from-repmgr3.md`.
</simpara>
</note>
<para>
All servers in the replication cluster must be running the same major version of
PostgreSQL, and we recommend that they also run the same minor version.
</para>
<para>
`repmgr` must be installed on each server in the replication cluster.
If installing repmgr from packages, the package version must match the PostgreSQL
version. If installing from source, repmgr must be compiled against the same
major version.
</para>
<para>
A dedicated system user for `repmgr` is *not* required; as many `repmgr` and
`repmgrd` actions require direct access to the PostgreSQL data directory,
these commands should be executed by the `postgres` user.
</para>
<para>
Passwordless `ssh` connectivity between all servers in the replication cluster
is not required, but is necessary in the following cases:
<itemizedlist>
<listitem>
<simpara>if you need `repmgr` to copy configuration files from outside the PostgreSQL
data directory (in which case `rsync` is also required)</simpara>
</listitem>
<listitem>
<simpara>to perform switchover operations</simpara>
</listitem>
<listitem>
<simpara>when executing `repmgr cluster matrix` and `repmgr cluster crosscheck`</simpara>
</listitem>
</itemizedlist>
</para>
<tip>
<simpara>
We recommend using a session multiplexer utility such as `screen` or
`tmux` when performing long-running actions (such as cloning a database)
on a remote server - this will ensure the `repmgr` action won't be prematurely
terminated if your `ssh` session to the server is interrupted or closed.
</simpara>
</tip>
</sect1>

View File

@@ -0,0 +1,346 @@
<sect1 id="install-requirements" xreflabel="installation requirements">
<title>Requirements for installing repmgr</title>
<indexterm>
<primary>installation</primary>
<secondary>requirements</secondary>
</indexterm>
<para>
repmgr is developed and tested on Linux and OS X, but should work on any
UNIX-like system supported by PostgreSQL itself. There is no support for
Microsoft Windows.
</para>
<para>
&repmgr; &repmgrversion; is compatible with all supported PostgreSQL versions from 13.x. See
section <link linkend="install-compatibility-matrix">&repmgr; compatibility matrix</link>
for an overview of version compatibility.
</para>
<note>
<simpara>
If upgrading from &repmgr; 3.x, please see the section <xref linkend="upgrading-from-repmgr-3"/>.
</simpara>
</note>
<para>
All servers in the replication cluster must be running the same major version of
PostgreSQL, and we recommend that they also run the same minor version.
</para>
<para>
&repmgr; must be installed on each server in the replication cluster.
If installing repmgr from packages, the package version must match the PostgreSQL
version. If installing from source, &repmgr; must be compiled against the same
major version.
</para>
<note>
<simpara>
The same &quot;major&quot; &repmgr; version (e.g. <literal>&repmgrversion;.x</literal>) <emphasis>must</emphasis>
be installed on all node in the replication cluster. We strongly recommend keeping all
nodes on the same (preferably latest) &quot;minor&quot; &repmgr; version to minimize the risk
of incompatibilities.
</simpara>
<simpara>
If different &quot;major&quot; &repmgr; versions (e.g. 5.2.x and &repmgrversion;)
are installed on different nodes, in the best case &repmgr; (in particular &repmgrd;)
will not run. In the worst case, you will end up with a broken cluster.
</simpara>
</note>
<para>
A dedicated system user for &repmgr; is <emphasis>not</emphasis> required; as many &repmgr; and
&repmgrd; actions require direct access to the PostgreSQL data directory,
these commands should be executed by the <literal>postgres</literal> user.
</para>
<para>
See also <link linkend="configuration-prerequisites">Prerequisites for configuration</link>
for information on networking requirements.
</para>
<tip>
<simpara>
We recommend using a session multiplexer utility such as <command>screen</command> or
<command>tmux</command> when performing long-running actions (such as cloning a database)
on a remote server - this will ensure the &repmgr; action won't be prematurely
terminated if your <command>ssh</command> session to the server is interrupted or closed.
</simpara>
</tip>
<sect2 id="install-compatibility-matrix">
<title>&repmgr; compatibility matrix</title>
<indexterm>
<primary>repmgr</primary>
<secondary>compatibility matrix</secondary>
</indexterm>
<indexterm>
<primary>compatibility matrix</primary>
</indexterm>
<para>
The following table provides an overview of which &repmgr; version supports
which PostgreSQL version.
</para>
<table id="repmgr-compatibility-matrix">
<title>&repmgr; compatibility matrix</title>
<tgroup cols="4">
<thead>
<row>
<entry>
&repmgr; version
</entry>
<entry>
Supported?
</entry>
<entry>
Latest release
</entry>
<entry>
Supported PostgreSQL versions
</entry>
<entry>
Notes
</entry>
</row>
</thead>
<tbody>
<row>
<entry>
&repmgr; 5.5
</entry>
<entry>
Yes
</entry>
<entry>
<link linkend="release-5.5.0">&repmgrversion;</link> (&releasedate;)
</entry>
<entry>
12, 13, 15, 16, 17
</entry>
<entry>
&nbsp;
</entry>
</row>
<row>
<entry>
&repmgr; 5.4.1
</entry>
<entry>
Yes
</entry>
<entry>
<link linkend="release-5.4.1">5.4.1</link> (2023-04-04)
</entry>
<entry>
10, 11, 12, 13, 15
</entry>
<entry>
&nbsp;
</entry>
</row>
<row>
<entry>
&repmgr; 5.3.1
</entry>
<entry>
Yes
</entry>
<entry>
<link linkend="release-5.3.1">5.3.1</link> (2022-02-15)
</entry>
<entry>
9.4, 9.5, 9.6, 10, 11, 12, 13, 14, 15
</entry>
<entry>
PostgreSQL 15 supported from &repmgr; 5.3.3
</entry>
</row>
<row>
<entry>
&repmgr; 5.2
</entry>
<entry>
No
</entry>
<entry>
<link linkend="release-5.2.1">5.2.1</link> (2020-12-07)
</entry>
<entry>
9.4, 9.5, 9.6, 10, 11, 12, 13
</entry>
<entry>
&nbsp;
</entry>
</row>
<row>
<entry>
&repmgr; 5.1
</entry>
<entry>
No
</entry>
<entry>
<link linkend="release-5.1.0">5.1.0</link> (2020-04-13)
</entry>
<entry>
9.3, 9.4, 9.5, 9.6, 10, 11, 12
</entry>
<entry>
&nbsp;
</entry>
</row>
<row>
<entry>
&repmgr; 5.0
</entry>
<entry>
No
</entry>
<entry>
<link linkend="release-5.0">5.0</link> (2019-10-15)
</entry>
<entry>
9.3, 9.4, 9.5, 9.6, 10, 11, 12
</entry>
<entry>
&nbsp;
</entry>
</row>
<row>
<entry>
&repmgr; 4.x
</entry>
<entry>
No
</entry>
<entry>
<link linkend="release-4.4">4.4</link> (2019-06-27)
</entry>
<entry>
9.3, 9.4, 9.5, 9.6, 10, 11
</entry>
<entry>
&nbsp;
</entry>
</row>
<row>
<entry>
&repmgr; 3.x
</entry>
<entry>
No
</entry>
<entry>
<ulink url="https://repmgr.org/release-notes-3.3.2.html">3.3.2</ulink> (2017-05-30)
</entry>
<entry>
9.3, 9.4, 9.5, 9.6
</entry>
<entry>
&nbsp;
</entry>
</row>
<row>
<entry>
&repmgr; 2.x
</entry>
<entry>
No
</entry>
<entry>
<ulink url="https://repmgr.org/release-notes-2.0.3.html">2.0.3</ulink> (2015-04-16)
</entry>
<entry>
9.0, 9.1, 9.2, 9.3, 9.4
</entry>
<entry>
&nbsp;
</entry>
</row>
</tbody>
</tgroup>
</table>
<important>
<para>
The &repmgr; series older than 5.x are no longer maintained or supported.
We strongly recommend upgrading to the latest &repmgr; version.
</para>
<para>
Following the release of &repmgr; 5.0, there will be no further releases of
the &repmgr; 4.x series or older. Note that &repmgr; 5.x is an incremental development
of the 4.x series and &repmgr; 4.x users should upgrade to this as soon as possible.
</para>
</important>
</sect2>
<sect2 id="install-postgresql-93-94">
<title>PostgreSQL 9.4 support</title>
<indexterm>
<primary>PostgreSQL 9.4</primary>
<secondary>repmgr support</secondary>
</indexterm>
<para>
Note that some &repmgr; functionality is not available in PostgreSQL 9.4:
</para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<para>
In PostgreSQL 9.4, <command>pg_rewind</command> is not part of the core
distribution. <command>pg_rewind</command> will need to be compiled separately to be able
to use any &repmgr; functionality which takes advantage of it.
</para>
</listitem>
</itemizedlist>
<warning>
<para>
PostgreSQL 9.3 has reached the end of its community support period (final release was
<ulink url="https://www.postgresql.org/docs/9.3/release-9-3-25.html">9.3.25</ulink>
in November 2018) and will no longer be updated with security or bugfixes.
</para>
<para>
Beginning with &repmgr; 5.2, &repmgr; no longer supports PostgreSQL 9.3.
</para>
<para>
PostgreSQL 9.4 has reached the end of its community support period (final release was
<ulink url="https://www.postgresql.org/docs/9.4/release-9-4-26.html">9.4.26</ulink>
in February 2020) and will no longer be updated with security or bugfixes.
</para>
<para>
We recommend that users of these versions migrate to a supported PostgreSQL version
as soon as possible.
</para>
<para>
For further details, see the <ulink url="https://www.postgresql.org/support/versioning/">PostgreSQL Versioning Policy</ulink>.
</para>
</warning>
</sect2>
</sect1>

View File

@@ -1,139 +0,0 @@
<sect1 id="installation-source" xreflabel="Installing from source code">
<title>Installing &repmgr; from source</title>
<sect2 id="installation-source-prereqs">
<title>Prerequisites for installing from source</title>
<para>
To install &repmgr; the prerequisites for compiling
&postgres; must be installed. These are described in &postgres;'s
documentation
on <ulink url="https://www.postgresql.org/docs/current/install-requirements.html">build requirements</ulink>
and <ulink url="https://www.postgresql.org/docs/current/docguide-toolsets.html">build requirements for documentation</ulink>.
</para>
<para>
Most mainstream Linux distributions and other UNIX variants provide simple
ways to install the prerequisites from packages.
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<para>
<literal>Debian</literal> and <literal>Ubuntu</literal>: First
add the <ulink
url="http://apt.postgresql.org/">apt.postgresql.org</ulink>
repository to your <filename>sources.list</filename> if you
have not already done so. Then install the pre-requisites for
building PostgreSQL with:
<programlisting>
sudo apt-get update
sudo apt-get build-dep postgresql-9.6
</programlisting>
</para>
</listitem>
<listitem>
<para>
<literal>RHEL or CentOS 6.x or 7.x</literal>: install the appropriate repository RPM
for your system from <ulink url="https://yum.postgresql.org/repopackages.php">
yum.postgresql.org</ulink>. Then install the prerequisites for building
PostgreSQL with:
<programlisting>
sudo yum check-update
sudo yum groupinstall "Development Tools"
sudo yum install yum-utils openjade docbook-dtds docbook-style-dsssl docbook-style-xsl
sudo yum-builddep postgresql96
</programlisting>
</para>
</listitem>
</itemizedlist>
</para>
<note>
<simpara>
Select the appropriate PostgreSQL versions for your target repmgr version.
</simpara>
</note>
</sect2>
<sect2 id="installation-get-source">
<title>Getting &repmgr; source code</title>
<para>
There are two ways to get the &repmgr; source code: with git, or by downloading tarballs of released versions.
</para>
<sect3>
<title>Using <application>git</application> to get the &repmgr; sources</title>
<para>
Use <application><ulink url="https://git-scm.com">git</ulink></application> if you expect
to update often, you want to keep track of development or if you want to contribute
changes to &repmgr;. There is no reason <emphasis>not</emphasis> to use <application>git</application>
if you're familiar with it.
</para>
<para>
The source for &repmgr; is maintained at
<ulink url="https://github.com/2ndQuadrant/repmgr">https://github.com/2ndQuadrant/repmgr</ulink>.
</para>
<para>
There are also tags for each &repmgr; release, e.g. <filename>REL4_0_STABLE</filename>.
</para>
<para>
Clone the source code using <application>git</application>:
<programlisting>
git clone https://github.com/2ndQuadrant/repmgr
</programlisting>
</para>
<para>
For more information on using <application>git</application> see
<ulink url="https://git-scm.com/">git-scm.com</ulink>.
</para>
</sect3>
<sect3>
<title>Downloading release source tarballs</title>
<para>
Official release source code is uploaded as tarballs to the
&repmgr; website along with a tarball checksum and a matching GnuPG
signature. See
<ulink url="http://repmgr.org/">http://repmgr.org/</ulink>
for the download information. See <xref linkend="appendix-signatures">
for information on verifying digital signatures.
</para>
<para>
You will need to download the repmgr source, e.g. <filename>repmgr-4.0.tar.gz</filename>.
You may optionally verify the package checksums from the
<literal>.md5</literal> files and/or verify the GnuPG signatures
per <xref linkend="appendix-signatures">.
</para>
<para>
After you unpack the source code archives using <literal>tar xf</literal>
the installation process is the same as if you were installing from a git
clone.
</para>
</sect3>
</sect2>
<sect2 id="installation-repmgr-source">
<title>Installation of &repmgr; from source</title>
<para>
To installing &repmgr; from source, simply execute:
<programlisting>
./configure && make install
</programlisting>
Ensure `pg_config` for the target PostgreSQL version is in `$PATH`.
</para>
</sect2>
</sect1>

293
doc/install-source.xml Normal file
View File

@@ -0,0 +1,293 @@
<sect1 id="installation-source" xreflabel="Installing from source code">
<title>Installing &repmgr; from source</title>
<indexterm>
<primary>installation</primary>
<secondary>from source</secondary>
</indexterm>
<sect2 id="installation-source-prereqs">
<title>Prerequisites for installing from source</title>
<para>
To install &repmgr; the prerequisites for compiling
&postgres; must be installed. These are described in &postgres;'s
documentation
on <ulink url="https://www.postgresql.org/docs/current/install-requirements.html">build requirements</ulink>
and <ulink url="https://www.postgresql.org/docs/current/docguide-toolsets.html">build requirements for documentation</ulink>.
</para>
<para>
Most mainstream Linux distributions and other UNIX variants provide simple
ways to install the prerequisites from packages.
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<para>
<literal>Debian</literal> and <literal>Ubuntu</literal>: First
add the <ulink url="https://apt.postgresql.org/">apt.postgresql.org</ulink>
repository to your <filename>sources.list</filename> if you
have not already done so, and ensure the source repository is enabled.
</para>
<tip>
<para>
If not configured, the source repository can be added by including
a <literal>deb-src</literal> line as a copy of the existing <literal>deb</literal>
line in the repository file, which is usually
<filename>/etc/apt/sources.list.d/pgdg.list</filename>, e.g.:
<programlisting>
deb https://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main
deb-src https://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main</programlisting>
</para>
</tip>
<para>
Then install the prerequisites for
building PostgreSQL with e.g.:
<programlisting>
sudo apt-get update
sudo apt-get build-dep postgresql-9.6</programlisting>
</para>
<important>
<simpara>
Select the appropriate PostgreSQL version for your target repmgr version.
</simpara>
</important>
<note>
<para>
If using <command>apt-get build-dep</command> is not possible, the
following packages may need to be installed manually:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><literal>flex</literal></simpara>
</listitem>
<listitem>
<simpara><literal>libedit-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>libkrb5-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>libpam0g-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>libreadline-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>libselinux1-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>libssl-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>libxml2-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>libxslt1-dev</literal></simpara>
</listitem>
</itemizedlist>
</para>
</note>
</listitem>
<listitem>
<para>
<literal>RHEL or CentOS 6.x or 7.x</literal>: install the appropriate repository RPM
for your system from <ulink url="https://yum.postgresql.org/repopackages.php">
yum.postgresql.org</ulink>. Then install the prerequisites for building
PostgreSQL with:
<programlisting>
sudo yum check-update
sudo yum groupinstall "Development Tools"
sudo yum install yum-utils openjade docbook-dtds docbook-style-dsssl docbook-style-xsl
sudo yum-builddep postgresql96</programlisting>
</para>
<important>
<simpara>
Select the appropriate PostgreSQL version for your target repmgr version.
</simpara>
</important>
<note>
<para>
If using <command>yum-builddep</command> is not possible, the
following packages may need to be installed manually:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><literal>flex</literal></simpara>
</listitem>
<listitem>
<simpara><literal>libselinux-devel</literal></simpara>
</listitem>
<listitem>
<simpara><literal>libxml2-devel</literal></simpara>
</listitem>
<listitem>
<simpara><literal>libxslt-devel</literal></simpara>
</listitem>
<listitem>
<simpara><literal>openssl-devel</literal></simpara>
</listitem>
<listitem>
<simpara><literal>pam-devel</literal></simpara>
</listitem>
<listitem>
<simpara><literal>readline-devel</literal></simpara>
</listitem>
</itemizedlist>
</para>
</note>
<tip>
<para>
If building against PostgreSQL 11 or later configured with the <option>--with-llvm</option> option
(this is the case with the PGDG-provided packages) you'll also need to install the
<literal>llvm-toolset-7-clang</literal> package. This is available via the
<ulink url="https://wiki.centos.org/AdditionalResources/Repositories/SCL">Software Collections (SCL) Repository</ulink>.
</para>
</tip>
</listitem>
</itemizedlist>
</para>
</sect2>
<sect2 id="installation-get-source">
<title>Getting &repmgr; source code</title>
<para>
There are two ways to get the &repmgr; source code: with git, or by downloading tarballs of released versions.
</para>
<sect3>
<title>Using <application>git</application> to get the &repmgr; sources</title>
<para>
Use <application><ulink url="https://git-scm.com">git</ulink></application> if you expect
to update often, you want to keep track of development or if you want to contribute
changes to &repmgr;. There is no reason <emphasis>not</emphasis> to use <application>git</application>
if you're familiar with it.
</para>
<para>
The source for &repmgr; is maintained at
<ulink url="https://github.com/EnterpriseDB/repmgr">https://github.com/EnterpriseDB/repmgr</ulink>.
</para>
<para>
There are also tags for each <ulink url="https://github.com/EnterpriseDB/repmgr/releases">&repmgr; release</ulink>, e.g.
<literal><ulink url="https://github.com/EnterpriseDB/repmgr/releases/tag/v4.4.0">v4.4.0</ulink></literal>.
</para>
<para>
Clone the source code using <application>git</application>:
<programlisting>
git clone https://github.com/EnterpriseDB/repmgr</programlisting>
</para>
<para>
For more information on using <application>git</application> see
<ulink url="https://git-scm.com/">git-scm.com</ulink>.
</para>
</sect3>
<sect3>
<title>Downloading release source tarballs</title>
<para>
Official release source code is uploaded as tarballs to the
&repmgr; website along with a tarball checksum and a matching GnuPG
signature. See
<ulink url="http://repmgr.org/">http://repmgr.org/</ulink>
for the download information. See <xref linkend="appendix-signatures"/>
for information on verifying digital signatures.
</para>
<para>
You will need to download the repmgr source, e.g. <filename>repmgr-4.0.tar.gz</filename>.
You may optionally verify the package checksums from the
<literal>.md5</literal> files and/or verify the GnuPG signatures
per <xref linkend="appendix-signatures"/>.
</para>
<para>
After you unpack the source code archives using <command>tar xf</command>
the installation process is the same as if you were installing from a git
clone.
</para>
</sect3>
</sect2>
<sect2 id="installation-repmgr-source">
<title>Installation of &repmgr; from source</title>
<para>
To installing &repmgr; from source, simply execute:
<programlisting>
./configure &amp;&amp; make install</programlisting>
Ensure <command>pg_config</command> for the target PostgreSQL version is in
<varname>$PATH</varname>.
</para>
</sect2>
<sect2 id="installation-build-repmgr-docs" xreflabel="Building repmgr documentation">
<title>Building &repmgr; documentation</title>
<para>
The &repmgr; documentation is (like the main PostgreSQL project)
written in DocBook XML format. To build it locally as HTML, you'll need to
install the required packages as described in the
<ulink url="https://www.postgresql.org/docs/current/docguide-toolsets.html">PostgreSQL documentation</ulink>.
</para>
<para>
The minimum PostgreSQL version for building the &repmgr; documentation is
PostgreSQL 9.5.
</para>
<note>
<simpara>
In &repmgr; 4.3 and earlier, the documentation can only be built against
PostgreSQL 9.6 or earlier.
</simpara>
</note>
<para>
To build the documentation as HTML, execute:
<programlisting>
./configure &amp;&amp; make doc</programlisting>
</para>
<para>
The generated HTML files will be placed in the <filename>doc/html</filename>
subdirectory of your source tree.
</para>
<para>
To build the documentation as a single HTML file, after configuring and building
the main &repmgr; source as described above, execute:
<programlisting>
./configure &amp;&amp; make doc-repmgr.html</programlisting>
</para>
<para>
To build the documentation as a PDF file, after configuring and building
the main &repmgr; source as described above, execute:
<programlisting>
./configure &amp;&amp; make doc-repmgr-A4.pdf</programlisting>
</para>
</sect2>
</sect1>

View File

@@ -1,6 +1,11 @@
<chapter id="installation" xreflabel="Installation">
<title>Installation</title>
<indexterm>
<primary>installation</primary>
</indexterm>
<para>
&repmgr; can be installed from binary packages provided by your operating
system's packaging system, or from source.
@@ -14,7 +19,7 @@
only option if there are no packages for your operating system yet.
</para>
<para>
Before installing &repmgr; make sure you satisfy the <xref linkend="install-requirements">.
Before installing &repmgr; make sure you satisfy the <xref linkend="install-requirements"/>.
</para>
&install-requirements;

View File

@@ -1,20 +0,0 @@
<!-- doc/src/sgml/legal.sgml -->
<date>2017</date>
<copyright>
<year>2010-2017</year>
<holder>2ndQuadrant, Ltd.</holder>
</copyright>
<legalnotice id="legalnotice">
<title>Legal Notice</title>
<para>
<productname>repmgr</productname> is Copyright &copy; 2010-2017
by 2ndQuadrant, Ltd.
</para>
<para>add license</para>
</legalnotice>

37
doc/legal.xml Normal file
View File

@@ -0,0 +1,37 @@
<!-- doc/legal.xml -->
<date>2025</date>
<copyright>
<year>2010-2025</year>
<holder>EDB</holder>
</copyright>
<legalnotice id="legalnotice">
<title>Legal Notice</title>
<para>
<productname>repmgr</productname> is Copyright &copy; 2010-2025
by EDB All rights reserved.
</para>
<para>
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
</para>
<para>
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
</para>
<para>
You should have received a copy of the GNU General Public License
along with this program. If not, see
<ulink url="https://www.gnu.org/licenses/">https://www.gnu.org/licenses/</ulink>
to obtain one.
</para>
</legalnotice>

View File

@@ -2,26 +2,32 @@
<title>repmgr overview</title>
<para>
This chapter provides a high-level overview of repmgr's components and functionality.
This chapter provides a high-level overview of &repmgr;'s components and
functionality.
</para>
<sect1 id="repmgr-concepts" xreflabel="Concepts">
<title>Concepts</title>
<indexterm>
<primary>concepts</primary>
</indexterm>
<para>
This guide assumes that you are familiar with PostgreSQL administration and
streaming replication concepts. For further details on streaming
replication, see the PostgreSQL documentation section on <ulink
url="https://www.postgresql.org/docs/current/interactive/warm-standby.html#STREAMING-REPLICATION">
streaming replication</>.
url="https://www.postgresql.org/docs/current/warm-standby.html#STREAMING-REPLICATION">
streaming replication</ulink>.
</para>
<para>
The following terms are used throughout the `repmgr` documentation.
The following terms are used throughout the &repmgr; documentation.
<variablelist>
<varlistentry>
<term>replication cluster</term>
<listitem>
<simpara>
In the `repmgr` documentation, "replication cluster" refers to the network
In the &repmgr; documentation, "replication cluster" refers to the network
of PostgreSQL servers connected by streaming replication.
</simpara>
</listitem>
@@ -31,7 +37,7 @@
<term>node</term>
<listitem>
<simpara>
A node is a server within a replication cluster.
A node is a single PostgreSQL server within a replication cluster.
</simpara>
</listitem>
</varlistentry>
@@ -41,7 +47,7 @@
<listitem>
<simpara>
The node a standby server connects to, in order to receive streaming replication.
This is either the primary server or in the case of cascading replication, another
This is either the primary server, or in the case of cascading replication, another
standby.
</simpara>
</listitem>
@@ -52,7 +58,7 @@
<listitem>
<simpara>
This is the action which occurs if a primary server fails and a suitable standby
is promoted as the new primary. The `repmgrd` daemon supports automatic failover
is promoted as the new primary. The &repmgrd; daemon supports automatic failover
to minimise downtime.
</simpara>
</listitem>
@@ -66,7 +72,7 @@
it's necessary to take a primary server offline; in this case a controlled
switchover is necessary, whereby a suitable standby is promoted and the
existing primary removed from the replication cluster in a controlled manner.
The `repmgr` command line client provides this functionality.
The &repmgr; command line client provides this functionality.
</simpara>
</listitem>
</varlistentry>
@@ -82,13 +88,37 @@
</simpara>
</listitem>
</varlistentry>
<varlistentry id="witness-server">
<term>witness server</term>
<listitem>
<para>
&repmgr; provides functionality to set up a so-called "witness server" to
assist in determining a new primary server in a failover situation with more
than one standby. The witness server itself is not part of the replication
cluster, although it does contain a copy of the repmgr metadata schema.
</para>
<para>
The purpose of a witness server is to provide a "casting vote" where servers
in the replication cluster are split over more than one location. In the event
of a loss of connectivity between locations, the presence or absence of
the witness server will decide whether a server at that location is promoted
to primary; this is to prevent a "split-brain" situation where an isolated
location interprets a network outage as a failure of the (remote) primary and
promotes a (local) standby.
</para>
<para>
A witness server only needs to be created if &repmgrd;
is in use.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>
</sect1>
<sect1 id="repmgr-components" xreflabel="Components">
<title>Components</title>
<para>
`repmgr` is a suite of open-source tools to manage replication and failover
&repmgr; is a suite of open-source tools to manage replication and failover
within a cluster of PostgreSQL servers. It supports and enhances PostgreSQL's
built-in streaming replication, which provides a single read/write primary server
and one or more read-only standbys containing near-real time copies of the primary
@@ -147,11 +177,12 @@
<sect1 id="repmgr-user-metadata" xreflabel="Repmgr user and metadata">
<title>Repmgr user and metadata</title>
<para>
In order to effectively manage a replication cluster, `repmgr` needs to store
In order to effectively manage a replication cluster, &repmgr; needs to store
information about the servers in the cluster in a dedicated database schema.
This schema is automatically by the `repmgr` extension, which is installed
during the first step in initialising a `repmgr`-administered cluster
(`repmgr primary register`) and contains the following objects:
This schema is automatically created by the &repmgr; extension, which is installed
during the first step in initializing a &repmgr;-administered cluster
(<command><link linkend="repmgr-primary-register">repmgr primary register</link></command>)
and contains the following objects:
<variablelist>
<varlistentry>
<term>Tables</term>
@@ -159,14 +190,15 @@
<para>
<itemizedlist>
<listitem>
<simpara>repmgr.events: records events of interest</simpara>
<simpara><literal>repmgr.events</literal>: records events of interest</simpara>
</listitem>
<listitem>
<simpara>repmgr.nodes: connection and status information for each server in the
<simpara><literal>repmgr.nodes</literal>: connection and status information for each server in the
replication cluster</simpara>
</listitem>
<listitem>
<simpara>repmgr.monitoring_history: historical standby monitoring information written by `repmgrd`</simpara>
<simpara><literal>repmgr.monitoring_history</literal>: historical standby monitoring information
written by &repmgrd;</simpara>
</listitem>
</itemizedlist>
</para>
@@ -178,12 +210,12 @@
<para>
<itemizedlist>
<listitem>
<simpara>repmgr.show_nodes: based on the table `repl_nodes`, additionally showing the
name of the server's upstream node</simpara>
<simpara>repmgr.show_nodes: based on the table <literal>repmgr.nodes</literal>, additionally showing the
name of the server's upstream node</simpara>
</listitem>
<listitem>
<simpara>repmgr.replication_status: when `repmgrd`'s monitoring is enabled, shows current monitoring
status for each standby.</simpara>
<simpara>repmgr.replication_status: when &repmgrd;'s monitoring is enabled, shows
current monitoring status for each standby.</simpara>
</listitem>
</itemizedlist>
</para>
@@ -193,16 +225,16 @@
</para>
<para>
The `repmgr` metadata schema can be stored in an existing database or in its own
dedicated database. Note that the `repmgr` metadata schema cannot reside on a database
server which is not part of the replication cluster managed by `repmgr`.
The &repmgr; metadata schema can be stored in an existing database or in its own
dedicated database. Note that the &repmgr; metadata schema cannot reside on a database
server which is not part of the replication cluster managed by &repmgr;.
</para>
<para>
A database user must be available for `repmgr` to access this database and perform
A database user must be available for &repmgr; to access this database and perform
necessary changes. This user does not need to be a superuser, however some operations
such as initial installation of the `repmgr` extension will require a superuser
such as initial installation of the &repmgr; extension will require a superuser
connection (this can be specified where required with the command line option
`--superuser`).
<literal>--superuser</literal>).
</para>
</sect1>

View File

@@ -1,9 +1,13 @@
<chapter id="promoting-standby" xreflabel="Promoting a standby">
<title>Promoting a standby server with repmgr</title>
<indexterm>
<primary>promoting a standby</primary>
<seealso>repmgr standby promote</seealso>
</indexterm>
<para>
If a primary server fails or needs to be removed from the replication cluster,
a new primary server must be designated, to ensure the cluster continues
to function correctly. This can be done with <xref linkend="repmgr-standby-promote">,
to function correctly. This can be done with <xref linkend="repmgr-standby-promote"/>,
which promotes the standby on the current server to primary.
</para>
@@ -27,7 +31,7 @@
At this point the replication cluster will be in a partially disabled state, with
both standbys accepting read-only connections while attempting to connect to the
stopped primary. Note that the &repmgr; metadata table will not yet have been updated;
executing <xref linkend="repmgr-cluster-show"> will note the discrepancy:
executing <xref linkend="repmgr-cluster-show"/> will note the discrepancy:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Connection string
@@ -56,7 +60,7 @@
DETAIL: node 2 was successfully promoted to primary</programlisting>
</para>
<para>
Executing <xref linkend="repmgr-cluster-show"> will show the current state; as there is now an
Executing <xref linkend="repmgr-cluster-show"/> will show the current state; as there is now an
active primary, the previous warning will not be displayed:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster show
@@ -68,8 +72,8 @@
</para>
<para>
However the sole remaining standby (<literal>node3</literal>) is still trying to replicate from the failed
primary; <xref linkend="repmgr-standby-follow"> must now be executed to rectify this situation
(see <xref linkend="follow-new-primary"> for example).
primary; <xref linkend="repmgr-standby-follow"/> must now be executed to rectify this situation
(see <xref linkend="follow-new-primary"/> for example).
</para>
</chapter>

View File

@@ -1,16 +1,25 @@
<chapter id="quickstart" xreflabel="Quick-start guide">
<title>Quick-start guide</title>
<indexterm>
<primary>quickstart</primary>
</indexterm>
<para>
This section gives a quick introduction to &repmgr;, including setting up a
sample &repmgr; installation and a basic replication cluster.
</para>
<para>
These instructions are not suitable for a production install, as they may not
take into account security considerations, proper system administration
procedures etc..
These instructions for demonstration purposes and are not suitable for a production
install, as issues such as account security considerations, and system administration
best practices are omitted.
</para>
<note>
<simpara>
To upgrade an existing &repmgr; 3.x installation, see section
<xref linkend="upgrading-from-repmgr-3"/>.
</simpara>
</note>
<sect1 id="quickstart-prerequisites">
<title>Prerequisites for setting up a basic replication cluster with &repmgr;</title>
@@ -45,7 +54,8 @@
</para>
<para>
If you want <application>repmgr</application> to copy configuration files which are
located outside the PostgreSQL data directory, and/or to test <command>switchover</command>
located outside the PostgreSQL data directory, and/or to test
<command><link linkend="repmgr-standby-switchover">switchover</link></command>
functionality, you will also need passwordless SSH connections between both servers, and
<application>rsync</application> should be installed.
</para>
@@ -58,7 +68,7 @@
</tip>
</sect1>
<sect1 id="quickstart-postgresql-configuration">
<sect1 id="quickstart-postgresql-configuration" xreflabel="PostgreSQL configuration">
<title>PostgreSQL configuration</title>
<para>
On the primary server, a PostgreSQL instance must be initialised and running.
@@ -66,13 +76,26 @@
</para>
<programlisting>
# Enable replication connections; set this figure to at least one more
# Enable replication connections; set this value to at least one more
# than the number of standbys which will connect to this server
# (note that repmgr will execute `pg_basebackup` in WAL streaming mode,
# which requires two free WAL senders)
# (note that repmgr will execute "pg_basebackup" in WAL streaming mode,
# which requires two free WAL senders).
#
# See: https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-MAX-WAL-SENDERS
max_wal_senders = 10
# If using replication slots, set this value to at least one more
# than the number of standbys which will connect to this server.
# Note that repmgr will only make use of replication slots if
# "use_replication_slots" is set to "true" in "repmgr.conf".
# (If you are not intending to use replication slots, this value
# can be set to "0").
#
# See: https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-MAX-REPLICATION-SLOTS
max_replication_slots = 10
# Ensure WAL files contain enough information to enable read-only queries
# on the standby.
#
@@ -80,42 +103,48 @@
# PostgreSQL 9.6 and later: one of 'replica' or 'logical'
# ('hot_standby' will still be accepted as an alias for 'replica')
#
# See: https://www.postgresql.org/docs/current/static/runtime-config-wal.html#GUC-WAL-LEVEL
# See: https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-WAL-LEVEL
wal_level = 'hot_standby'
# Enable read-only queries on a standby
# (Note: this will be ignored on a primary but we recommend including
# it anyway)
# it anyway, in case the primary later becomes a standby)
#
# See: https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-HOT-STANDBY
hot_standby = on
# Enable WAL file archiving
#
# See: https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-ARCHIVE-MODE
archive_mode = on
# Set archive command to a script or application that will safely store
# you WALs in a secure place. /bin/true is an example of a command that
# ignores archiving. Use something more sensible.
archive_command = '/bin/true'
# If you have configured `pg_basebackup_options`
# in `repmgr.conf` to include the setting `--xlog-method=fetch` (from
# PostgreSQL 10 `--wal-method=fetch`), *and* you have not set
# `restore_command` in `repmgr.conf`to fetch WAL files from another
# source such as Barman, you'll need to set `wal_keep_segments` to a
# high enough value to ensure that all WAL files generated while
# the standby is being cloned are retained until the standby starts up.
# Set archive command to a dummy command; this can later be changed without
# needing to restart the PostgreSQL instance.
#
# wal_keep_segments = 5000
# See: https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-ARCHIVE-COMMAND
archive_command = '/bin/true'
</programlisting>
<tip>
<simpara>
Rather than editing these settings in the default <filename>postgresql.conf</filename>
file, create a separate file such as <filename>postgresql.replication.conf</filename> and
file, create a separate file such as <filename>postgresql.replication.conf</filename> and
include it from the end of the main configuration file with:
<command>include 'postgresql.replication.conf</command>.
<command>include 'postgresql.replication.conf'</command>.
</simpara>
</tip>
<para>
Additionally, if you are intending to use <application>pg_rewind</application>,
and the cluster was not initialised using data checksums, you may want to consider enabling
<varname>wal_log_hints</varname>; for more details see <xref linkend="repmgr-node-rejoin-pg-rewind"/>.
</para>
<para>
See also the <link linkend="configuration-postgresql">PostgreSQL configuration</link> section in the
<link linkend="configuration">repmgr configuration guide</link>.
</para>
</sect1>
<sect1 id="quickstart-repmgr-user-database">
@@ -138,7 +167,7 @@
For the sake of simplicity, the <literal>repmgr</literal> user is created
as a superuser. If desired, it's possible to create the <literal>repmgr</literal>
user as a normal user. However for certain operations superuser permissions
are requiredl; in this case the command line option <command>--superuser</command>
are required; in this case the command line option <command>--superuser</command>
can be provided to specify a superuser.
</para>
<para>
@@ -186,11 +215,20 @@
<sect1 id="quickstart-standby-preparation">
<title>Preparing the standby</title>
<para>
On the standby, do not create a PostgreSQL instance, but do ensure the destination
On the standby, do <emphasis>not</emphasis> create a PostgreSQL instance (i.e.
do not execute <application>initdb</application> or any database creation
scripts provided by packages), but do ensure the destination
data directory (and any other directories which you want PostgreSQL to use)
exist and are owned by the <literal>postgres</literal> system user. Permissions
must be set to <literal>0700</literal> (<literal>drwx------</literal>).
</para>
<tip>
<simpara>
&repmgr; will place a copy of the primary's database files in this directory.
It will however refuse to run if a PostgreSQL instance has already been
created there.
</simpara>
</tip>
<para>
Check the primary database is reachable from the standby using <application>psql</application>:
</para>
@@ -200,7 +238,7 @@
<note>
<para>
&repmgr; stores connection information as <ulink
url="https://www.postgresql.org/docs/current/static/libpq-connect.html#LIBPQ-CONNSTRING">libpq
url="https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING">libpq
connection strings</ulink> throughout. This documentation refers to them as <literal>conninfo</literal>
strings; an alternative name is <literal>DSN</literal> (<literal>data source name</literal>).
We'll use these in place of the <command>-h hostname -d databasename -U username</command> syntax.
@@ -216,7 +254,7 @@
</para>
<programlisting>
node_id=1
node_name=node1
node_name='node1'
conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/data'
</programlisting>
@@ -224,20 +262,47 @@
<para>
<filename>repmgr.conf</filename> should not be stored inside the PostgreSQL data directory,
as it could be overwritten when setting up or reinitialising the PostgreSQL
server. See sections on <xref linkend="configuration-file"> and <xref linkend="configuration-file-settings">
server. See sections <xref linkend="configuration"/> and <xref linkend="configuration-file"/>
for further details about <filename>repmgr.conf</filename>.
</para>
<note>
<para>
&repmgr; only uses <option>pg_bindir</option> when it executes
PostgreSQL binaries directly.
</para>
<para>
For user-defined scripts such as <option>promote_command</option> and the
various <option>service_*_command</option>s, you <emphasis>must</emphasis>
always explicitly provide the full path to the binary or script being
executed, even if it is &repmgr; itself.
</para>
<para>
This is because these options can contain user-defined scripts in arbitrary
locations, so prepending <option>pg_bindir</option> may break them.
</para>
</note>
<tip>
<simpara>
For Debian-based distributions we recommend explictly setting
<literal>pg_bindir</literal> to the directory where <command>pg_ctl</command> and other binaries
For Debian-based distributions we recommend explicitly setting
<option>pg_bindir</option> to the directory where <command>pg_ctl</command> and other binaries
not in the standard path are located. For PostgreSQL 9.6 this would be <filename>/usr/lib/postgresql/9.6/bin/</filename>.
</simpara>
</tip>
<tip>
<simpara>
If your distribution places the &repmgr; binaries in a location other than the
PostgreSQL installation directory, specify this with <option>repmgr_bindir</option>
to enable &repmgr; to perform operations (e.g.
<command><link linkend="repmgr-cluster-crosscheck">repmgr cluster crosscheck</link></command>)
on other nodes.
</simpara>
</tip>
<para>
See the file
<ulink url="https://raw.githubusercontent.com/2ndQuadrant/repmgr/master/repmgr.conf.sample">repmgr.conf.sample</>
<ulink url="https://raw.githubusercontent.com/EnterpriseDB/repmgr/master/repmgr.conf.sample">repmgr.conf.sample</ulink>
for details of all available configuration parameters.
</para>
@@ -286,7 +351,7 @@
slot_name |
config_file | /etc/repmgr.conf</programlisting>
<para>
Each server in the replication cluster will have its own record. If <command>repmgrd</command>
Each server in the replication cluster will have its own record. If &repmgrd;
is in use, the fields <literal>upstream_node_id</literal>, <literal>active</literal> and
<literal>type</literal> will be updated when the node's status or role changes.
</para>
@@ -301,11 +366,10 @@
(and possibly <literal>data_directory</literal>) adjusted accordingly, e.g.:
</para>
<programlisting>
node=2
node_name=node2
node_id=2
node_name='node2'
conninfo='host=node2 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/data'
</programlisting>
data_directory='/var/lib/postgresql/data'</programlisting>
<para>
Use the <command>--dry-run</command> option to check the standby can be cloned:
</para>
@@ -341,9 +405,10 @@
</programlisting>
<para>
This has cloned the PostgreSQL data directory files from the primary <literal>node1</literal>
using PostgreSQL's <command>pg_basebackup</command> utility. A <filename>recovery.conf</filename>
file containing the correct parameters to start streaming from this primary server will be created
automatically.
using PostgreSQL's <command>pg_basebackup</command> utility. Replication configuration
containing the correct parameters to start streaming from this primary server will be
automatically appended to <filename>postgresql.auto.conf</filename>. (In PostgreSQL 11
and earlier the file <filename>recovery.conf</filename> will be created).
</para>
<note>
<simpara>
@@ -395,7 +460,7 @@
</para>
<para>
From PostgreSQL 9.6 you can also use the view
<ulink url="https://www.postgresql.org/docs/current/static/monitoring-stats.html#PG-STAT-WAL-RECEIVER-VIEW">
<ulink url="https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-WAL-RECEIVER-VIEW">
<literal>pg_stat_wal_receiver</literal></ulink> to check the replication status from the standby.
<programlisting>
@@ -413,11 +478,14 @@
latest_end_lsn | 0/7000538
latest_end_time | 2017-08-28 15:20:56.418735+09
slot_name |
sender_host | node1
sender_port | 5432
conninfo | user=repmgr dbname=replication host=node1 application_name=node2
</programlisting>
Note that the <varname>conninfo</varname> value is that generated in <filename>recovery.conf</filename>
and will differ slightly from the primary's <varname>conninfo</varname> as set in <filename>repmgr.conf</filename> -
among others it will contain the connecting node's name as <varname>application_name</varname>.
Note that the <varname>conninfo</varname> value is that generated in <filename>postgresql.auto.conf</filename>
(PostgreSQL 11 and earlier: <filename>recovery.conf</filename>) and will differ slightly from the primary's
<varname>conninfo</varname> as set in <filename>repmgr.conf</filename> - among others it will contain the
connecting node's name as <varname>application_name</varname>.
</para>
</sect1>
@@ -432,11 +500,12 @@
<para>
Check the node is registered by executing <command>repmgr cluster show</command> on the standby:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Connection string
----+-------+---------+-----------+----------+----------+--------------------------------------
1 | node1 | primary | * running | | default | host=node1 dbname=repmgr user=repmgr
2 | node2 | standby | running | node1 | default | host=node2 dbname=repmgr user=repmgr</programlisting>
$ repmgr -f /etc/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | host=node1 dbname=repmgr user=repmgr
2 | node2 | standby | running | node1 | default | 100 | 1 | host=node2 dbname=repmgr user=repmgr</programlisting>
</para>
<para>
Both nodes are now registered with &repmgr; and the records have been copied to the standby server.

View File

@@ -1,23 +0,0 @@
<chapter id="repmgr-cluster-cleanup" xreflabel="repmgr cluster cleanup">
<indexterm>
<primary>repmgr cluster cleanup</primary>
</indexterm>
<title>repmgr cluster cleanup</title>
<para>
Purges monitoring history from the <literal>repmgr.monitoring_history</literal> table to
prevent excessive table growth. Use the <literal>-k/--keep-history</literal> to specify the
number of days of monitoring history to retain. This command can be used
manually or as a cronjob.
</para>
<para>
This command requires a valid <filename>repmgr.conf</filename> file for the node on which it is
executed; no additional arguments are required.
</para>
<note>
<simpara>
Monitoring history will only be written if <command>repmgrd</command> is active, and
<varname>monitoring_history</varname> is set to <literal>true</literal> in
<filename>repmgr.conf</filename>.
</simpara>
</note>
</chapter>

View File

@@ -0,0 +1,77 @@
<refentry id="repmgr-cluster-cleanup">
<indexterm>
<primary>repmgr cluster cleanup</primary>
</indexterm>
<refmeta>
<refentrytitle>repmgr cluster cleanup</refentrytitle>
</refmeta>
<refnamediv>
<refname>repmgr cluster cleanup</refname>
<refpurpose>purge monitoring history</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
Purges monitoring history from the <literal>repmgr.monitoring_history</literal> table to
prevent excessive table growth.
</para>
<para>
By default <emphasis>all</emphasis> data will be removed; Use the <option>-k/--keep-history</option>
option to specify the number of days of monitoring history to retain.
</para>
<para>
This command can be executed manually or as a cronjob.
</para>
</refsect1>
<refsect1>
<title>Usage</title>
<para>
This command requires a valid <filename>repmgr.conf</filename> file for the node on which it is
executed; no additional arguments are required.
</para>
</refsect1>
<refsect1>
<title>Notes</title>
<para>
Monitoring history will only be written if &repmgrd; is active, and
<varname>monitoring_history</varname> is set to <literal>true</literal> in
<filename>repmgr.conf</filename>.
</para>
</refsect1>
<refsect1 id="repmgr-cluster-cleanup-events">
<title>Event notifications</title>
<para>
A <literal>cluster_cleanup</literal> <link linkend="event-notifications">event notification</link> will be generated.
</para>
</refsect1>
<refsect1>
<title>Options</title>
<variablelist>
<varlistentry>
<term><option>--node-id</option></term>
<listitem>
<para>
Only delete monitoring records for the specified node.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>See also</title>
<para>
For more details see the sections <xref linkend="repmgrd-monitoring"/> and
<xref linkend="repmgrd-monitoring-configuration"/>.
</para>
</refsect1>
</refentry>

View File

@@ -1,28 +0,0 @@
<chapter id="repmgr-cluster-crosscheck" xreflabel="repmgr cluster crosscheck">
<indexterm>
<primary>repmgr cluster crosscheck</primary>
</indexterm>
<title>repmgr cluster crosscheck</title>
<para>
<command>repmgr cluster crosscheck</command> is similar to <xref linkend="repmgr-cluster-matrix">,
but cross-checks connections between each combination of nodes. In "Example 3" in
<xref linkend="repmgr-cluster-matrix"> we have no information about the state of <literal>node3</literal>.
However by running <command>repmgr cluster crosscheck</command> it's possible to get a better
overview of the cluster situation:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster crosscheck
Name | Id | 1 | 2 | 3
-------+----+----+----+----
node1 | 1 | * | * | x
node2 | 2 | * | * | *
node3 | 3 | * | * | *</programlisting>
</para>
<para>
What happened is that <command>repmgr cluster crosscheck</command> merged its own
<command>repmgr cluster matrix</command> with the <command>repmgr cluster matrix</command>
output from <literal>node2</literal>; the latter is able to connect to <literal>node3</literal>
and therefore determine the state of outbound connections from that node.
</para>
</chapter>

View File

@@ -0,0 +1,96 @@
<refentry id="repmgr-cluster-crosscheck">
<indexterm>
<primary>repmgr cluster crosscheck</primary>
</indexterm>
<refmeta>
<refentrytitle>repmgr cluster crosscheck</refentrytitle>
</refmeta>
<refnamediv>
<refname>repmgr cluster crosscheck</refname>
<refpurpose>cross-checks connections between each combination of nodes</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
<command>repmgr cluster crosscheck</command> is similar to <xref linkend="repmgr-cluster-matrix"/>,
but cross-checks connections between each combination of nodes. In "Example 3" in
<xref linkend="repmgr-cluster-matrix"/> we have no information about the state of <literal>node3</literal>.
However by running <command>repmgr cluster crosscheck</command> it's possible to get a better
overview of the cluster situation:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster crosscheck
Name | Id | 1 | 2 | 3
-------+----+----+----+----
node1 | 1 | * | * | x
node2 | 2 | * | * | *
node3 | 3 | * | * | *</programlisting>
</para>
<para>
What happened is that <command>repmgr cluster crosscheck</command> merged its own
<command><link linkend="repmgr-cluster-matrix">repmgr cluster matrix</link></command> with the
<command>repmgr cluster matrix</command> output from <literal>node2</literal>; the latter is
able to connect to <literal>node3</literal>
and therefore determine the state of outbound connections from that node.
</para>
</refsect1>
<refsect1>
<title>Exit codes</title>
<para>
One of the following exit codes will be emitted by <command>repmgr cluster crosscheck</command>:
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
The check completed successfully and all nodes are reachable.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_BAD_SSH (12)</option></term>
<listitem>
<para>
One or more nodes could not be accessed via SSH.
</para>
<note>
<simpara>
This only applies to nodes unreachable from the node where
this command is executed.
</simpara>
<simpara>
It's also possible that the crosscheck establishes that
connections between PostgreSQL on all nodes are functioning,
even if SSH access between some nodes is not possible.
</simpara>
</note>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_NODE_STATUS (25)</option></term>
<listitem>
<para>
PostgreSQL on one or more nodes could not be reached.
</para>
<note>
<simpara>
This error code overrides <option>ERR_BAD_SSH</option>.
</simpara>
</note>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
</refentry>

View File

@@ -1,37 +0,0 @@
<chapter id="repmgr-cluster-event" xreflabel="repmgr cluster event">
<indexterm>
<primary>repmgr cluster event</primary>
</indexterm>
<title>repmgr cluster event</title>
<para>
This outputs a formatted list of cluster events, as stored in the
<literal>repmgr.events</literal> table. Output is in reverse chronological order, and
can be filtered with the following options:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><literal>--all</literal>: outputs all entries</simpara>
</listitem>
<listitem>
<simpara><literal>--limit</literal>: set the maximum number of entries to output (default: 20)</simpara>
</listitem>
<listitem>
<simpara><literal>--node-id</literal>: restrict entries to node with this ID</simpara>
</listitem>
<listitem>
<simpara><literal>--node-name</literal>: restrict entries to node with this name</simpara>
</listitem>
<listitem>
<simpara><literal>--event</literal>: filter specific event</simpara>
</listitem>
</itemizedlist>
</para>
<para>
Example:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster event --event=standby_register
Node ID | Name | Event | OK | Timestamp | Details
---------+-------+------------------+----+---------------------+--------------------------------
3 | node3 | standby_register | t | 2017-08-17 10:28:55 | standby registration succeeded
2 | node2 | standby_register | t | 2017-08-17 10:28:53 | standby registration succeeded</programlisting>
</para>
</chapter>

View File

@@ -0,0 +1,79 @@
<refentry id="repmgr-cluster-event">
<indexterm>
<primary>repmgr cluster event</primary>
</indexterm>
<refmeta>
<refentrytitle>repmgr cluster event</refentrytitle>
</refmeta>
<refnamediv>
<refname>repmgr cluster event</refname>
<refpurpose>output a formatted list of cluster events</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
Outputs a formatted list of cluster events, as stored in the <literal>repmgr.events</literal> table.
</para>
</refsect1>
<refsect1>
<title>Usage</title>
<para>
Output is in reverse chronological order, and
can be filtered with the following options:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><literal>--all</literal>: outputs all entries</simpara>
</listitem>
<listitem>
<simpara><literal>--limit</literal>: set the maximum number of entries to output (default: 20)</simpara>
</listitem>
<listitem>
<simpara><literal>--node-id</literal>: restrict entries to node with this ID</simpara>
</listitem>
<listitem>
<simpara><literal>--node-name</literal>: restrict entries to node with this name</simpara>
</listitem>
<listitem>
<simpara><literal>--event</literal>: filter specific event (see <xref linkend="event-notifications"/> for a full list)</simpara>
</listitem>
</itemizedlist>
</para>
<para>
The &quot;Details&quot; column can be omitted by providing <literal>--compact</literal>.
</para>
</refsect1>
<refsect1>
<title>Output format</title>
<para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<literal>--csv</literal>: generate output in CSV format. Note that the <literal>Details</literal>
column will currently not be emitted in CSV format.
</simpara>
</listitem>
</itemizedlist>
</para>
</refsect1>
<refsect1>
<title>Example</title>
<para>
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster event --event=standby_register
Node ID | Name | Event | OK | Timestamp | Details
---------+-------+------------------+----+---------------------+-------------------------------------------------------
3 | node3 | standby_register | t | 2019-04-16 10:59:59 | standby registration succeeded; upstream node ID is 1
2 | node2 | standby_register | t | 2019-04-16 10:59:57 | standby registration succeeded; upstream node ID is 1</programlisting>
</para>
</refsect1>
</refentry>

View File

@@ -1,27 +1,44 @@
<chapter id="repmgr-cluster-matrix" xreflabel="repmgr cluster matrix">
<refentry id="repmgr-cluster-matrix">
<indexterm>
<primary>repmgr cluster matrix</primary>
</indexterm>
<title>repmgr cluster matrix</title>
<para>
<command>repmgr cluster matrix</command> runs <command>repmgr cluster show</command> on each
node and arranges the results in a matrix, recording success or failure.
</para>
<para>
<command>repmgr cluster matrix</command> requires a valid <filename>repmgr.conf</filename>
file on each node. Additionally passwordless `ssh` connections are required between
all nodes.
</para>
<para>
<refmeta>
<refentrytitle>repmgr cluster matrix</refentrytitle>
</refmeta>
<refnamediv>
<refname>repmgr cluster matrix</refname>
<refpurpose>
runs repmgr cluster show on each node and summarizes output
</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
<command>repmgr cluster matrix</command> runs <command><link linkend="repmgr-cluster-show">repmgr cluster show</link></command> on each
node and arranges the results in a matrix, recording success or failure.
</para>
<para>
<command>repmgr cluster matrix</command> requires a valid <filename>repmgr.conf</filename>
file on each node. Additionally, passwordless <command>ssh</command> connections are required between
all nodes.
</para>
</refsect1>
<refsect1>
<title>Example</title>
<para>
Example 1 (all nodes up):
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster matrix
Name | Id | 1 | 2 | 3
-------+----+----+----+----
node1 | 1 | * | * | *
node2 | 2 | * | * | *
node3 | 3 | * | * | *</programlisting>
node1 | 1 | * | * | *
node2 | 2 | * | * | *
node3 | 3 | * | * | *</programlisting>
</para>
<para>
Example 2 (<literal>node1</literal> and <literal>node2</literal> up, <literal>node3</literal> down):
@@ -46,7 +63,7 @@
<para>
The other two nodes are up; the corresponding rows have <literal>x</literal> in the
column corresponding to <literal>node3</literal>, meaning that inbound connections to
that node have failed, and `*` in the columns corresponding to
that node have failed, and <literal>*</literal> in the columns corresponding to
<literal>node1</literal> and <literal>node2</literal>, meaning that inbound connections
to these nodes have succeeded.
</para>
@@ -76,8 +93,53 @@
connection from <literal>node3</literal>.
</para>
<para>
In this case, the <xref linkend="repmgr-cluster-crosscheck"> command will produce a more
In this case, the <xref linkend="repmgr-cluster-crosscheck"/> command will produce a more
useful result.
</para>
</chapter>
</refsect1>
<refsect1>
<title>Exit codes</title>
<para>
One of the following exit codes will be emitted by <command>repmgr cluster matrix</command>:
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
The check completed successfully and all nodes are reachable.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_BAD_SSH (12)</option></term>
<listitem>
<para>
One or more nodes could not be accessed via SSH.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_NODE_STATUS (25)</option></term>
<listitem>
<para>
PostgreSQL on one or more nodes could not be reached.
</para>
<note>
<simpara>
This error code overrides <option>ERR_BAD_SSH</option>.
</simpara>
</note>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
</refentry>

View File

@@ -1,67 +0,0 @@
<chapter id="repmgr-cluster-show" xreflabel="repmgr cluster show">
<indexterm>
<primary>repmgr cluster show</primary>
</indexterm>
<title>repmgr cluster show</title>
<para>
Displays information about each active node in the replication cluster. This
command polls each registered server and shows its role (<literal>primary</literal> /
<literal>standby</literal> / <literal>bdr</literal>) and status. It polls each server
directly and can be run on any node in the cluster; this is also useful when analyzing
connectivity from a particular node.
</para>
<para>
This command requires either a valid <filename>repmgr.conf</filename> file or a database
connection string to one of the registered nodes; no additional arguments are needed.
</para>
<para>
Example:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Connection string
----+-------+---------+-----------+----------+----------+-----------------------------------------
1 | node1 | primary | * running | | default | host=db_node1 dbname=repmgr user=repmgr
2 | node2 | standby | running | node1 | default | host=db_node2 dbname=repmgr user=repmgr
3 | node3 | standby | running | node1 | default | host=db_node3 dbname=repmgr user=repmgr</programlisting>
</para>
<para>
To show database connection errors when polling nodes, run the command in
<literal>--verbose</literal> mode.
</para>
<para>
The `cluster show` command accepts an optional parameter <literal>--csv</literal>, which
outputs the replication cluster's status in a simple CSV format, suitable for
parsing by scripts:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster show --csv
1,-1,-1
2,0,0
3,0,1</programlisting>
</para>
<para>
The columns have following meanings:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
node ID
</simpara>
<simpara>
availability (0 = available, -1 = unavailable)
</simpara>
<simpara>
recovery state (0 = not in recovery, 1 = in recovery, -1 = unknown)
</simpara>
</listitem>
</itemizedlist>
</para>
<para>
Note that the availability is tested by connecting from the node where
<command>repmgr cluster show</command> is executed, and does not necessarily imply the node
is down. See <xref linkend="repmgr-cluster-matrix"> and <xref linkend="repmgr-cluster-crosscheck"> to get
a better overviews of connections between nodes.
</para>
</chapter>

245
doc/repmgr-cluster-show.xml Normal file
View File

@@ -0,0 +1,245 @@
<refentry id="repmgr-cluster-show">
<indexterm>
<primary>repmgr cluster show</primary>
</indexterm>
<refmeta>
<refentrytitle>repmgr cluster show</refentrytitle>
</refmeta>
<refnamediv>
<refname>repmgr cluster show</refname>
<refpurpose>display information about each registered node in the replication cluster</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
Displays information about each registered node in the replication cluster. This
command polls each registered server and shows its role (<literal>primary</literal> /
<literal>standby</literal>) and status. It polls each server
directly and can be run on any node in the cluster; this is also useful when analyzing
connectivity from a particular node.
</para>
<para>
For PostgreSQL 9.6 and later, the output will also contain the node's current timeline ID.
</para>
<para>
Node availability is tested by connecting from the node where
<command>repmgr cluster show</command> is executed, and does not necessarily imply the node
is down. See <xref linkend="repmgr-cluster-matrix"/> and <xref linkend="repmgr-cluster-crosscheck"/> to get
better overviews of connections between nodes.
</para>
</refsect1>
<refsect1>
<title>Execution</title>
<para>
This command requires either a valid <filename>repmgr.conf</filename> file or a database
connection string to one of the registered nodes; no additional arguments are needed.
</para>
<para>
To show database connection errors when polling nodes, run the command in
<literal>--verbose</literal> mode.
</para>
</refsect1>
<refsect1>
<title>Example</title>
<para>
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+-----------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | host=db_node1 dbname=repmgr user=repmgr
2 | node2 | standby | running | node1 | default | 100 | 1 | host=db_node2 dbname=repmgr user=repmgr
3 | node3 | standby | running | node1 | default | 100 | 1 | host=db_node3 dbname=repmgr user=repmgr
4 | node4 | standby | running | node1 | default | 100 | 1 | host=db_node4 dbname=repmgr user=repmgr
5 | node5 | witness | * running | node1 | default | 0 | n/a | host=db_node5 dbname=repmgr user=repmgr</programlisting>
</para>
</refsect1>
<refsect1>
<title>Notes</title>
<para>
The column <literal>Role</literal> shows the expected server role according to the
&repmgr; metadata.
</para>
<para>
<literal>Status</literal> shows whether the server is running or unreachable.
If the node has an unexpected role not reflected in the &repmgr; metadata, e.g. a node was manually
promoted to primary, this will be highlighted with an exclamation mark.
If a connection to the node cannot be made, this will be highlighted with a question mark.
Note that the node will only be shown as <literal>? unreachable</literal>
if a connection is not possible at network level; if the PostgreSQL instance on the
node is pingable but not accepting connections, it will be shown as <literal>? running</literal>.
</para>
<para>
In the following example, executed on <literal>node3</literal>, <literal>node1</literal> is not reachable
at network level and assumed to be down; <literal>node2</literal> has been promoted to primary
(but <literal>node3</literal> is not attached to it, and its metadata has not yet been updated);
<literal>node4</literal> is running but rejecting connections (from <literal>node3</literal> at least).
<programlisting>
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+----------------------+----------+----------+----------+----------+----------------------------------------------------
1 | node1 | primary | ? unreachable | | default | 100 | | host=db_node1 dbname=repmgr user=repmgr
2 | node2 | standby | ! running as primary | ? node1 | default | 100 | 2 | host=db_node2 dbname=repmgr user=repmgr
3 | node3 | standby | running | ? node1 | default | 100 | 1 | host=db_node3 dbname=repmgr user=repmgr
4 | node4 | standby | ? running | ? node1 | default | 100 | | host=db_node4 dbname=repmgr user=repmgr
WARNING: following issues were detected
- unable to connect to node "node1" (ID: 1)
- node "node1" (ID: 1) is registered as an active primary but is unreachable
- node "node2" (ID: 2) is registered as standby but running as primary
- unable to connect to node "node2" (ID: 2)'s upstream node "node1" (ID: 1)
- unable to determine if node "node2" (ID: 2) is attached to its upstream node "node1" (ID: 1)
- unable to connect to node "node3" (ID: 3)'s upstream node "node1" (ID: 1)
- unable to determine if node "node3" (ID: 3) is attached to its upstream node "node1" (ID: 1)
- unable to connect to node "node4" (ID: 4)
HINT: execute with --verbose option to see connection error messages</programlisting>
</para>
<para>
To diagnose connection issues, execute <command>repmgr cluster show</command>
with the <option>--verbose</option> option; this will display the error message
for each failed connection attempt.
</para>
<tip>
<para>
Use <xref linkend="repmgr-cluster-matrix"/> and <xref linkend="repmgr-cluster-crosscheck"/>
to diagnose connection issues across the whole replication cluster.
</para>
</tip>
</refsect1>
<refsect1>
<title>Options</title>
<variablelist>
<varlistentry>
<term><option>--csv</option></term>
<listitem>
<para>
<command>repmgr cluster show</command> accepts an optional parameter <literal>--csv</literal>, which
outputs the replication cluster's status in a simple CSV format, suitable for
parsing by scripts, e.g.:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster show --csv
1,-1,-1
2,0,0
3,0,1</programlisting>
</para>
<para>
The columns have following meanings:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
node ID
</simpara>
</listitem>
<listitem>
<simpara>
availability (0 = available, -1 = unavailable)
</simpara>
</listitem>
<listitem>
<simpara>
recovery state (0 = not in recovery, 1 = in recovery, -1 = unknown)
</simpara>
</listitem>
</itemizedlist>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--compact</option></term>
<listitem>
<para>
Suppress display of the <literal>conninfo</literal> column.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--terse</option></term>
<listitem>
<para>
Suppress warnings about connection issues.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--verbose</option></term>
<listitem>
<para>
Display the full text of any database connection error messages
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Exit codes</title>
<para>
One of the following exit codes will be emitted by <command>repmgr cluster show</command>:
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
No issues were detected.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_BAD_CONFIG (1)</option></term>
<listitem>
<para>
An issue was encountered while attempting to retrieve
&repmgr; metadata.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_DB_CONN (6)</option></term>
<listitem>
<para>
&repmgr; was unable to connect to the local PostgreSQL instance.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_NODE_STATUS (25)</option></term>
<listitem>
<para>
One or more issues were detected with the replication configuration,
e.g. a node was not in its expected state.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-node-status"/>, <xref linkend="repmgr-node-check"/>, <xref linkend="repmgr-service-status"/>
</para>
</refsect1>
</refentry>

208
doc/repmgr-daemon-start.xml Normal file
View File

@@ -0,0 +1,208 @@
<refentry id="repmgr-daemon-start">
<indexterm>
<primary>repmgr daemon start</primary>
</indexterm>
<indexterm>
<primary>repmgrd</primary>
<secondary>starting</secondary>
</indexterm>
<refmeta>
<refentrytitle>repmgr daemon start</refentrytitle>
</refmeta>
<refnamediv>
<refname>repmgr daemon start</refname>
<refpurpose>Start the &repmgrd; daemon on the local node</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
This command starts the &repmgrd; service on the
local node.
</para>
<para>
By default, &repmgr; will wait for up to 15 seconds to confirm that &repmgrd;
started. This behaviour can be overridden by specifying a different value using the <option>--wait</option>
option, or disabled altogether with the <option>--no-wait</option> option.
</para>
<important>
<para>
The <filename>repmgr.conf</filename> parameter <varname>repmgrd_service_start_command</varname>
must be set for <command>repmgr daemon start</command> to work; see section
<xref linkend="repmgr-daemon-start-configuration"/> for details.
</para>
</important>
</refsect1>
<refsect1>
<title>Options</title>
<variablelist>
<varlistentry>
<term><option>--dry-run</option></term>
<listitem>
<para>
Check prerequisites but don't actually attempt to start &repmgrd;.
</para>
<para>
This action will output the command which would be executed.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-w</option></term>
<term><option>--wait</option></term>
<listitem>
<para>
Wait for the specified number of seconds to confirm that &repmgrd;
started successfully.
</para>
<para>
Note that providing <option>--wait=0</option> is the equivalent of <option>--no-wait</option>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--no-wait</option></term>
<listitem>
<para>
Don't wait to confirm that &repmgrd;
started successfully.
</para>
<para>
This is equivalent to providing <option>--wait=0</option>.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1 id="repmgr-daemon-start-configuration" xreflabel="repmgr daemon start configuration">
<title>Configuration file settings</title>
<para>
The following parameter in <filename>repmgr.conf</filename> is relevant
to <command>repmgr daemon start</command>:
</para>
<variablelist>
<varlistentry>
<term><option>repmgrd_service_start_command</option></term>
<listitem>
<indexterm>
<primary>repmgrd_service_start_command</primary>
<secondary>with &quot;repmgr daemon start&quot;</secondary>
</indexterm>
<para>
<command>repmgr daemon start</command> will execute the command defined by the
<varname>repmgrd_service_start_command</varname> parameter in <filename>repmgr.conf</filename>.
This must be set to a shell command which will start &repmgrd;;
if &repmgr; was installed from a package, this will be the service command defined by the
package. For more details see <link linkend="appendix-packages">Appendix: &repmgr; package details</link>.
</para>
<important>
<para>
If &repmgr; was installed from a system package, and you do not configure
<varname>repmgrd_service_start_command</varname> to an appropriate service command, this may
result in the system becoming confused about the state of the &repmgrd;
service; this is particularly the case with <literal>systemd</literal>.
</para>
</important>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Exit codes</title>
<para>
One of the following exit codes will be emitted by <command>repmgr daemon start</command>:
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
The &repmgrd; start command (defined in
<varname>repmgrd_service_start_command</varname>) was successfully executed.
</para>
<para>
If the <option>--wait</option> option was provided, &repmgr; will confirm that
&repmgrd; has actually started up.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_BAD_CONFIG (1)</option></term>
<listitem>
<para>
<varname>repmgrd_service_start_command</varname> is not defined in
<filename>repmgr.conf</filename>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_DB_CONN (6)</option></term>
<listitem>
<para>
&repmgr; was unable to connect to the local PostgreSQL node.
</para>
<para>
PostgreSQL must be running before &repmgrd;
can be started. Additionally, unless the <option>--no-wait</option> option was
provided, &repmgr; needs to be able to connect to the local PostgreSQL node
to determine the state of &repmgrd;.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_REPMGRD_SERVICE (27)</option></term>
<listitem>
<para>
The &repmgrd; start command (defined in
<varname>repmgrd_service_start_command</varname>) was not successfully executed.
</para>
<para>
This can also mean that &repmgr; was unable to confirm whether &repmgrd;
successfully started (unless the <option>--no-wait</option> option was provided).
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-daemon-stop"/>,
<xref linkend="repmgrd-daemon"/>,
<xref linkend="repmgr-service-status"/>,
<xref linkend="repmgr-service-pause"/>,
<xref linkend="repmgr-service-unpause"/>
</para>
</refsect1>
</refentry>

205
doc/repmgr-daemon-stop.xml Normal file
View File

@@ -0,0 +1,205 @@
<refentry id="repmgr-daemon-stop">
<indexterm>
<primary>repmgr daemon stop</primary>
</indexterm>
<indexterm>
<primary>repmgrd</primary>
<secondary>stopping</secondary>
</indexterm>
<refmeta>
<refentrytitle>repmgr daemon stop</refentrytitle>
</refmeta>
<refnamediv>
<refname>repmgr daemon stop</refname>
<refpurpose>Stop the &repmgrd; daemon on the local node</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
This command stops the &repmgrd; daemon on the
local node.
</para>
<para>
By default, &repmgr; will wait for up to 15 seconds to confirm that &repmgrd;
stopped. This behaviour can be overridden by specifying a different value using the <option>--wait</option>
option, or disabled altogether with the <option>--no-wait</option> option.
</para>
<note>
<para>
If PostgreSQL is not running on the local node, under some circumstances &repmgr; may not
be able to confirm if &repmgrd; has actually stopped.
</para>
</note>
<important>
<para>
The <filename>repmgr.conf</filename> parameter <varname>repmgrd_service_stop_command</varname>
must be set for <command>repmgr daemon stop</command> to work; see section
<xref linkend="repmgr-daemon-stop-configuration"/> for details.
</para>
</important>
</refsect1>
<refsect1>
<title>Configuration</title>
<para>
<command>repmgr daemon stop</command> will execute the command defined by the
<varname>repmgrd_service_stop_command</varname> parameter in <filename>repmgr.conf</filename>.
This must be set to a shell command which will stop &repmgrd;;
if &repmgr; was installed from a package, this will be the service command defined by the
package. For more details see <link linkend="appendix-packages">Appendix: &repmgr; package details</link>.
</para>
<important>
<para>
If &repmgr; was installed from a system package, and you do not configure
<varname>repmgrd_service_stop_command</varname> to an appropriate service command, this may
result in the system becoming confused about the state of the &repmgrd;
service; this is particularly the case with <literal>systemd</literal>.
</para>
</important>
</refsect1>
<refsect1>
<title>Options</title>
<variablelist>
<varlistentry>
<term><option>--dry-run</option></term>
<listitem>
<para>
Check prerequisites but don't actually attempt to stop &repmgrd;.
</para>
<para>
This action will output the command which would be executed.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-w</option></term>
<term><option>--wait</option></term>
<listitem>
<para>
Wait for the specified number of seconds to confirm that &repmgrd;
stopped successfully.
</para>
<para>
Note that providing <option>--wait=0</option> is the equivalent of <option>--no-wait</option>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--no-wait</option></term>
<listitem>
<para>
Don't wait to confirm that &repmgrd;
stopped successfully.
</para>
<para>
This is equivalent to providing <option>--wait=0</option>.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1 id="repmgr-daemon-stop-configuration" xreflabel="repmgr daemon stop configuration">
<title>Configuration file settings</title>
<para>
The following parameter in <filename>repmgr.conf</filename> is relevant
to <command>repmgr daemon stop</command>:
</para>
<variablelist>
<varlistentry>
<term><option>repmgrd_service_stop_command</option></term>
<listitem>
<indexterm>
<primary>repmgrd_service_stop_command</primary>
<secondary>with &quot;repmgr daemon stop&quot;</secondary>
</indexterm>
<para>
<command>repmgr daemon stop</command> will execute the command defined by the
<varname>repmgrd_service_stop_command</varname> parameter in <filename>repmgr.conf</filename>.
This must be set to a shell command which will stop &repmgrd;;
if &repmgr; was installed from a package, this will be the service command defined by the
package. For more details see <link linkend="appendix-packages">Appendix: &repmgr; package details</link>.
</para>
<important>
<para>
If &repmgr; was installed from a system package, and you do not configure
<varname>repmgrd_service_stop_command</varname> to an appropriate service command, this may
result in the system becoming confused about the state of the &repmgrd;
service; this is particularly the case with <literal>systemd</literal>.
</para>
</important>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Exit codes</title>
<para>
One of the following exit codes will be emitted by <command>repmgr daemon stop</command>:
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
&repmgrd; could be stopped.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_BAD_CONFIG (1)</option></term>
<listitem>
<para>
<varname>repmgrd_service_stop_command</varname> is not defined in
<filename>repmgr.conf</filename>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_REPMGRD_SERVICE (27)</option></term>
<listitem>
<para>
&repmgrd; could not be stopped.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-daemon-start"/>,
<xref linkend="repmgrd-daemon"/>,
<xref linkend="repmgr-service-status"/>,
<xref linkend="repmgr-service-pause"/>,
<xref linkend="repmgr-service-unpause"/>
</para>
</refsect1>
</refentry>

View File

@@ -1,70 +0,0 @@
<chapter id="repmgr-node-check" xreflabel="repmgr node check">
<indexterm>
<primary>repmgr node check</primary>
</indexterm>
<title>repmgr node check</title>
<para>
Performs some health checks on a node from a replication perspective.
This command must be run on the local node.
</para>
<para>
Sample output (execute <command>repmgr node check</command>):
<programlisting>
Node "node1":
Server role: OK (node is primary)
Replication lag: OK (N/A - node is primary)
WAL archiving: OK (0 pending files)
Downstream servers: OK (2 of 2 downstream nodes attached)
Replication slots: OK (node has no replication slots)
</programlisting>
</para>
<para>
Additionally each check can be performed individually by supplying
an additional command line parameter, e.g.:
<programlisting>
$ repmgr node check --role
OK (node is primary)
</programlisting>
</para>
<para>
Parameters for individual checks are as follows:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<literal>--role</literal>: checks if the node has the expected role
</simpara>
</listitem>
<listitem>
<simpara>
<literal>--replication-lag</literal>: checks if the node is lagging by more than
<varname>replication_lag_warning</varname> or <varname>replication_lag_critical</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<literal>--archive-ready</literal>: checks for WAL files which have not yet been archived
</simpara>
</listitem>
<listitem>
<simpara>
<literal>--downstream</literal>: checks that the expected downstream nodes are attached
</simpara>
</listitem>
<listitem>
<simpara>
<literal>--slots</literal>: checks there are no inactive replication slots
</simpara>
</listitem>
</itemizedlist>
</para>
<para>
Individual checks can also be output in a Nagios-compatible format by additionally
providing the option <literal>--nagios</literal>.
</para>
</chapter>

308
doc/repmgr-node-check.xml Normal file
View File

@@ -0,0 +1,308 @@
<refentry id="repmgr-node-check">
<indexterm>
<primary>repmgr node check</primary>
</indexterm>
<refmeta>
<refentrytitle>repmgr node check</refentrytitle>
</refmeta>
<refnamediv>
<refname>repmgr node check</refname>
<refpurpose>performs some health checks on a node from a replication perspective</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
Performs some health checks on a node from a replication perspective.
This command must be run on the local node.
</para>
<note>
<para>
Currently &repmgr; performs health checks on physical replication
slots only, with the aim of warning about streaming replication standbys which
have become detached and the associated risk of uncontrolled WAL file
growth.
</para>
</note>
</refsect1>
<refsect1>
<title>Example</title>
<para>
Execution on the primary server:
<programlisting>
$ repmgr -f /etc/repmgr.conf node check
Node "node1":
Server role: OK (node is primary)
Replication lag: OK (N/A - node is primary)
WAL archiving: OK (0 pending files)
Upstream connection: OK (N/A - is primary)
Downstream servers: OK (2 of 2 downstream nodes attached)
Replication slots: OK (node has no physical replication slots)
Missing replication slots: OK (node has no missing physical replication slots)
Configured data directory: OK (configured "data_directory" is "/var/lib/postgresql/data")</programlisting>
</para>
<para>
Execution on a standby server:
<programlisting>
$ repmgr -f /etc/repmgr.conf node check
Node "node2":
Server role: OK (node is standby)
Replication lag: OK (0 seconds)
WAL archiving: OK (0 pending archive ready files)
Upstream connection: OK (node "node2" (ID: 2) is attached to expected upstream node "node1" (ID: 1))
Downstream servers: OK (this node has no downstream nodes)
Replication slots: OK (node has no physical replication slots)
Missing physical replication slots: OK (node has no missing physical replication slots)
Configured data directory: OK (configured "data_directory" is "/var/lib/postgresql/data")</programlisting>
</para>
</refsect1>
<refsect1>
<title>Individual checks</title>
<para>
Each check can be performed individually by supplying
an additional command line parameter, e.g.:
<programlisting>
$ repmgr node check --role
OK (node is primary)</programlisting>
</para>
<para>
Parameters for individual checks are as follows:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<option>--role</option>: checks if the node has the expected role
</simpara>
</listitem>
<listitem>
<simpara>
<option>--replication-lag</option>: checks if the node is lagging by more than
<varname>replication_lag_warning</varname> or <varname>replication_lag_critical</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<option>--archive-ready</option>: checks for WAL files which have not yet been archived,
and returns <literal>WARNING</literal> or <literal>CRITICAL</literal> if the number
exceeds <varname>archive_ready_warning</varname> or <varname>archive_ready_critical</varname> respectively.
</simpara>
</listitem>
<listitem>
<simpara>
<option>--downstream</option>: checks that the expected downstream nodes are attached
</simpara>
</listitem>
<listitem>
<simpara>
<option>--upstream</option>: checks that the node is attached to its expected upstream
</simpara>
</listitem>
<listitem>
<simpara>
<option>--slots</option>: checks there are no inactive physical replication slots
</simpara>
</listitem>
<listitem>
<simpara>
<option>--missing-slots</option>: checks there are no missing physical replication slots
</simpara>
</listitem>
<listitem>
<simpara>
<option>--data-directory-config</option>: checks the data directory configured in
<filename>repmgr.conf</filename> matches the actual data directory.
This check is not directly related to replication, but is useful to verify &repmgr;
is correctly configured.
</simpara>
</listitem>
</itemizedlist>
</para>
</refsect1>
<refsect1>
<title>repmgrd</title>
<para>
A separate check is available to verify whether &repmgrd; is running,
This is not included in the general output, as this does not
per-se constitute a check of the node's replication status.
</para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<option>--repmgrd</option>: checks whether &repmgrd; is running.
If &repmgrd; is running but paused, status <literal>1</literal>
(<literal>WARNING</literal>) is returned.
</simpara>
</listitem>
</itemizedlist>
</refsect1>
<refsect1>
<title>Additional checks</title>
<para>
Several checks are provided for diagnostic purposes and are not
included in the general output:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<option>--db-connection</option>: checks if &repmgr; can connect to the
database on the local node.
</simpara>
<simpara>
This option is particularly useful in combination with <command>SSH</command>, as
it can be used to troubleshoot connection issues encountered when &repmgr; is
executed remotely (e.g. during a switchover operation).
</simpara>
</listitem>
<listitem>
<simpara>
<option>--replication-config-owner</option>: checks if the file containing replication
configuration (PostgreSQL 12 and later: <filename>postgresql.auto.conf</filename>;
PostgreSQL 11 and earlier: <filename>recovery.conf</filename>) is
owned by the same user who owns the data directory.
</simpara>
<simpara>
Incorrect ownership of these files (e.g. if they are owned by <literal>root</literal>)
will cause operations which need to update the replication configuration
(e.g. <link linkend="repmgr-standby-follow"><command>repmgr standby follow</command></link>
or <link linkend="repmgr-standby-promote"><command>repmgr standby promote</command></link>)
to fail.
</simpara>
</listitem>
</itemizedlist>
</para>
</refsect1>
<refsect1>
<title>Connection options</title>
<para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<option>-S</option>/<option>--superuser</option>: connect as the
named superuser instead of the &repmgr; user
</simpara>
</listitem>
</itemizedlist>
</para>
</refsect1>
<refsect1>
<title>Output format</title>
<para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<option>--csv</option>: generate output in CSV format (not available
for individual checks)
</simpara>
</listitem>
<listitem>
<simpara>
<option>--nagios</option>: generate output in a Nagios-compatible format
(for individual checks only)
</simpara>
</listitem>
</itemizedlist>
</para>
</refsect1>
<refsect1>
<title>Exit codes</title>
<para>
When executing <command>repmgr node check</command> with one of the individual
checks listed above, &repmgr; will emit one of the following Nagios-style exit codes
(even if <option>--nagios</option> is not supplied):
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<literal>0</literal>: OK
</simpara>
</listitem>
<listitem>
<simpara>
<literal>1</literal>: WARNING
</simpara>
</listitem>
<listitem>
<simpara>
<literal>2</literal>: ERROR
</simpara>
</listitem>
<listitem>
<simpara>
<literal>3</literal>: UNKNOWN
</simpara>
</listitem>
</itemizedlist>
</para>
<para>
One of the following exit codes will be emitted by <command>repmgr status check</command>
if no individual check was specified.
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
No issues were detected.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_NODE_STATUS (25)</option></term>
<listitem>
<para>
One or more issues were detected.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-node-status"/>, <xref linkend="repmgr-cluster-show"/>
</para>
</refsect1>
</refentry>

View File

@@ -1,13 +0,0 @@
<chapter id="repmgr-node-rejoin" xreflabel="repmgr node rejoin">
<indexterm>
<primary>repmgr node rejoin</primary>
</indexterm>
<title>repmgr node rejoin</title>
<para>
Enables a dormant (stopped) node to be rejoined to the replication cluster.
</para>
<para>
This can optionally use <command>pg_rewind</command> to re-integrate a node which has diverged
from the rest of the cluster, typically a failed primary.
</para>
</chapter>

505
doc/repmgr-node-rejoin.xml Normal file
View File

@@ -0,0 +1,505 @@
<refentry id="repmgr-node-rejoin">
<indexterm>
<primary>repmgr node rejoin</primary>
</indexterm>
<refmeta>
<refentrytitle>repmgr node rejoin</refentrytitle>
</refmeta>
<refnamediv>
<refname>repmgr node rejoin</refname>
<refpurpose>rejoin a dormant (stopped) node to the replication cluster</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
Enables a dormant (stopped) node to be rejoined to the replication cluster.
</para>
<para>
This can optionally use <application>pg_rewind</application> to re-integrate
a node which has diverged from the rest of the cluster, typically a failed primary.
</para>
<para>
Note that <command>repmgr node rejoin</command> can only be used to attach
a standby to the current primary, not another standby.
</para>
<tip>
<para>
If the node is running and needs to be attached to the current primary, use
<xref linkend="repmgr-standby-follow"/>.
</para>
<para>
Note <xref linkend="repmgr-standby-follow"/> can only be used for standbys which have not diverged
from the rest of the cluster.
</para>
</tip>
</refsect1>
<refsect1>
<title>Usage</title>
<para>
<programlisting>
repmgr node rejoin -d '$conninfo'</programlisting>
where <literal>$conninfo</literal> is the PostgreSQL <literal>conninfo</literal> string of the
<emphasis>current</emphasis> primary node (or that of any reachable node in the cluster, but
<emphasis>not</emphasis> the local node). This is so that &repmgr; can fetch up-to-date information
about the current state of the cluster.
</para>
<para>
<filename>repmgr.conf</filename> for the stopped node *must* be supplied explicitly if not
otherwise available.
</para>
</refsect1>
<refsect1>
<title>Options</title>
<variablelist>
<varlistentry>
<term><option>--dry-run</option></term>
<listitem>
<para>
Check prerequisites but don't actually execute the rejoin.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--force-rewind</option></term>
<listitem>
<para>
Execute <application>pg_rewind</application>.
</para>
<para>
See <xref linkend="repmgr-node-rejoin-pg-rewind"/> for more details on using
<application>pg_rewind</application>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--config-files</option></term>
<listitem>
<para>
comma-separated list of configuration files to retain after
executing <application>pg_rewind</application>.
</para>
<para>
Currently <application>pg_rewind</application> will overwrite
the local node's configuration files with the files from the source node,
so it's advisable to use this option to ensure they are kept.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--config-archive-dir</option></term>
<listitem>
<para>
Directory to temporarily store configuration files specified with
<option>--config-files</option>; default: <filename>/tmp</filename>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-W/--no-wait</option></term>
<listitem>
<para>
Don't wait for the node to rejoin cluster.
</para>
<para>
If this option is supplied, &repmgr; will restart the node but
not wait for it to connect to the primary.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Configuration file settings</title>
<para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<literal>node_rejoin_timeout</literal>:
the maximum length of time (in seconds) to wait for
the node to reconnect to the replication cluster (defaults to
the value set in <literal>standby_reconnect_timeout</literal>,
60 seconds).
</simpara>
<simpara>
Note that <literal>standby_reconnect_timeout</literal> must be
set to a value equal to or greater than
<literal>node_rejoin_timeout</literal>.
</simpara>
</listitem>
</itemizedlist>
</para>
</refsect1>
<refsect1 id="repmgr-node-rejoin-events">
<title>Event notifications</title>
<para>
A <literal>node_rejoin</literal> <link linkend="event-notifications">event notification</link> will be generated.
</para>
</refsect1>
<refsect1>
<title>Exit codes</title>
<para>
One of the following exit codes will be emitted by <command>repmgr node rejoin</command>:
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
The node rejoin succeeded; or if <option>--dry-run</option> was provided,
no issues were detected which would prevent the node rejoin.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_BAD_CONFIG (1)</option></term>
<listitem>
<para>
A configuration issue was detected which prevented &repmgr; from
continuing with the node rejoin.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_NO_RESTART (4)</option></term>
<listitem>
<para>
The node could not be restarted.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_REJOIN_FAIL (24)</option></term>
<listitem>
<para>
The node rejoin operation failed.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Notes</title>
<para>
Currently <command>repmgr node rejoin</command> can only be used to attach
a standby to the current primary, not another standby.
</para>
<para>
The node's PostgreSQL instance must have been shut down cleanly. If this was not the
case, it will need to be started up until it has reached a consistent recovery point,
then shut down cleanly.
</para>
<para>
In PostgreSQL 13 and later, this will be done automatically
if the <option>--force-rewind</option> is provided (even if an actual rewind
is not necessary).
</para>
<para>
With PostgreSQL 12 and earlier, PostgreSQL will need to
be started and shut down manually; see below for the best way to do this.
</para>
<tip>
<para>
If <application>PostgreSQL</application> is started in single-user mode and
input is directed from <filename>/dev/null/</filename>, it will perform recovery
then immediately quit, and will then be in a state suitable for use by
<application>pg_rewind</application>.
<programlisting>
rm -f /var/lib/pgsql/data/recovery.conf
postgres --single -D /var/lib/pgsql/data/ &lt; /dev/null</programlisting>
</para>
<para>
Note that <filename>standby.signal</filename> (PostgreSQL 11 and earlier:
<filename>recovery.conf</filename>) <emphasis>must</emphasis> be removed
from the data directory for PostgreSQL to be able to start in single
user mode.
</para>
</tip>
</refsect1>
<refsect1 id="repmgr-node-rejoin-pg-rewind" xreflabel="Using pg_rewind">
<title>Using <command>pg_rewind</command></title>
<indexterm>
<primary>pg_rewind</primary>
<secondary>using with "repmgr node rejoin"</secondary>
</indexterm>
<para>
<command>repmgr node rejoin</command> can optionally use <command>pg_rewind</command> to re-integrate a
node which has diverged from the rest of the cluster, typically a failed primary.
</para>
<note>
<para>
<command>pg_rewind</command> <emphasis>requires</emphasis> that either
<varname>wal_log_hints</varname> is enabled, or that
data checksums were enabled when the cluster was initialized. See the
<ulink url="https://www.postgresql.org/docs/current/app-pgrewind.html"><command>pg_rewind</command> documentation</ulink> for details.
</para>
<para>
Additionally, <varname>full_page_writes</varname> must be enabled; this is the default and
normally should never be disabled.
</para>
</note>
<para>
We strongly recommend familiarizing yourself with <command>pg_rewind</command> before attempting
to use it with &repmgr;, as while it is an extremely useful tool, it is <emphasis>not</emphasis>
a &quot;magic bullet&quot; which can resolve all problematic replication situations.
</para>
<para>
A typical use-case for <command>pg_rewind</command> is when a scenario like the following
is encountered:
<programlisting>
$ repmgr node rejoin -f /etc/repmgr.conf -d 'host=node3 dbname=repmgr user=repmgr' \
--force-rewind --config-files=postgresql.local.conf,postgresql.conf --verbose --dry-run
NOTICE: rejoin target is node "node3" (node ID: 3)
INFO: replication connection to the rejoin target node was successful
INFO: local and rejoin target system identifiers match
DETAIL: system identifier is 6652184002263212600
ERROR: this node cannot attach to rejoin target node 3
DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/610D710
HINT: use --force-rewind to execute pg_rewind</programlisting>
Here, <literal>node3</literal> was promoted to a primary while the local node was
still attached to the previous primary; this can potentially happen during e.g. a
network split. <command>pg_rewind</command> can re-sync the local node with <literal>node3</literal>,
removing the need for a full reclone.
</para>
<para>
To have <command>repmgr node rejoin</command> use <command>pg_rewind</command>,
pass the command line option <literal>--force-rewind</literal>, which will tell &repmgr;
to execute <command>pg_rewind</command> to ensure the node can be rejoined successfully.
</para>
<refsect2 id="repmgr-node-rejoin-pg-rewind-config-files" xreflabel="pg_rewind and configuration files">
<title><command>pg_rewind</command> and configuration file retention</title>
<indexterm>
<primary>pg_rewind</primary>
<secondary>configuration file retention</secondary>
</indexterm>
<para>
Be aware that if <command>pg_rewind</command> is executed and actually performs a
rewind operation, any configuration files in the PostgreSQL data directory will be
overwritten with those from the source server.
</para>
<para>
To prevent this happening, provide a comma-separated list of files to retain
using the <option>--config-file</option> command line option; the specified files
will be archived in a temporary directory (whose parent directory can be specified with
<option>--config-archive-dir</option>, default: <filename>/tmp</filename>)
and restored once the rewind operation is complete.
</para>
</refsect2>
<refsect2 id="repmgr-node-rejoin-pg-rewind-example" xreflabel="example using repmgr node rejoin and pg_rewind">
<title>Example using <command>repmgr node rejoin</command> and <command>pg_rewind</command></title>
<indexterm>
<primary>pg_rewind</primary>
<secondary>configuration file retention</secondary>
</indexterm>
<para>
Example, first using <option>--dry-run</option>, then actually executing the
<literal>node rejoin command</literal>.
<programlisting>
$ repmgr node rejoin -f /etc/repmgr.conf -d 'host=node3 dbname=repmgr user=repmgr' \
--config-files=postgresql.local.conf,postgresql.conf --verbose --force-rewind --dry-run
NOTICE: rejoin target is node "node3" (node ID: 3)
INFO: replication connection to the rejoin target node was successful
INFO: local and rejoin target system identifiers match
DETAIL: system identifier is 6652460429293670710
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 3
DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/610D710
INFO: prerequisites for using pg_rewind are met
INFO: file "postgresql.local.conf" would be copied to "/tmp/repmgr-config-archive-node2/postgresql.local.conf"
INFO: file "postgresql.replication-setup.conf" would be copied to "/tmp/repmgr-config-archive-node2/postgresql.replication-setup.conf"
INFO: pg_rewind would now be executed
DETAIL: pg_rewind command is:
pg_rewind -D '/var/lib/postgresql/data' --source-server='host=node3 dbname=repmgr user=repmgr'
INFO: prerequisites for executing NODE REJOIN are met</programlisting>
<note>
<para>
If <option>--force-rewind</option> is used with the <option>--dry-run</option> option,
this checks the prerequisites for using <application>pg_rewind</application>, but is
not an absolute guarantee that actually executing <application>pg_rewind</application>
will succeed. See also section <xref linkend="repmgr-node-rejoin-caveats"/> below.
</para>
</note>
<programlisting>
$ repmgr node rejoin -f /etc/repmgr.conf -d 'host=node3 dbname=repmgr user=repmgr' \
--config-files=postgresql.local.conf,postgresql.conf --verbose --force-rewind
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 3
DETAIL: rejoin target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/610D710
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "pg_rewind -D '/var/lib/postgresql/data' --source-server='host=node3 dbname=repmgr user=repmgr'"
NOTICE: 2 files copied to /var/lib/postgresql/data
NOTICE: setting node 2's upstream to node 3
NOTICE: starting server using "pg_ctl -l /var/log/postgres/startup.log -w -D '/var/lib/pgsql/data' start"
NOTICE: NODE REJOIN successful
DETAIL: node 2 is now attached to node 3</programlisting>
</para>
</refsect2>
<refsect2 id="repmgr-node-rejoin-postgresql-94" xreflabel="pg_rewind and PostgreSQL 9.4">
<title><command>pg_rewind</command> and PostgreSQL 9.4</title>
<indexterm>
<primary>pg_rewind</primary>
<secondary>PostgreSQL 9.4</secondary>
</indexterm>
<para>
<application>pg_rewind</application> is available in PostgreSQL 9.5 and later as part of the core distribution.
Users of PostgreSQL 9.4 will need to manually install it; the source code is available here:
<ulink url="https://github.com/vmware/pg_rewind">https://github.com/vmware/pg_rewind</ulink>.
If the <application>pg_rewind</application>
binary is not installed in the PostgreSQL <filename>bin</filename> directory, provide
its full path on the demotion candidate with <option>--force-rewind</option>.
</para>
<para>
Note that building the 9.4 version of <application>pg_rewind</application> requires the PostgreSQL
source code.
</para>
</refsect2>
</refsect1>
<refsect1 id="repmgr-node-rejoin-caveats" xreflabel="Caveats">
<title>Caveats when using <command>repmgr node rejoin</command></title>
<indexterm>
<primary>repmgr node rejoin</primary>
<secondary>caveats</secondary>
</indexterm>
<para>
<command>repmgr node rejoin</command> attempts to determine whether it will succeed by
comparing the timelines and relative WAL positions of the local node (rejoin candidate) and primary
(rejoin target). This is particularly important if planning to use <application>pg_rewind</application>,
which currently (as of PostgreSQL 12) may appear to succeed (or indicate there is no action
needed) but potentially allow an impossible action, such as trying to rejoin a standby to a
primary which is behind the standby. &repmgr; will prevent this situation from occurring.
</para>
<para>
Currently it is <emphasis>not</emphasis> possible to detect a situation where the rejoin target
is a standby which has been &quot;promoted&quot; by removing <filename>recovery.conf</filename>
(PostgreSQL 12 and later: <filename>standby.signal</filename>) and restarting it.
In this case there will be no information about the point the rejoin target diverged
from the current standby; the rejoin operation will fail and
the current standby's PostgreSQL log will contain entries with the text
&quot;<literal>record with incorrect prev-link</literal>&quot;.
</para>
<para>
In PostgreSQL 9.5 and earlier, it is <emphasis>not</emphasis> possible to use
<application>pg_rewind</application> to attach to a target node with a lower
timeline than the local node.
</para>
<para>
We strongly recommend running <command>repmgr node rejoin</command> with the
<option>--dry-run</option> option first. Additionally it might be a good idea
to execute the <application>pg_rewind</application> command displayed by
&repmgr; with the <application>pg_rewind</application> <option>--dry-run</option>
option. Note that <application>pg_rewind</application> does not indicate that it
is running in <option>--dry-run</option> mode.
</para>
<warning>
<para>
In all PostgreSQL released before February 2021, <application>pg_rewind</application>
contains a corner-case bug which affects standbys in a very specific situation.
</para>
<para>
This situation occurs when a standby was shut down <emphasis>before</emphasis> its
primary node, and an attempt is made to attach this standby to another primary
in the same cluster (following a &quot;split brain&quot; situation where the standby
was connected to the wrong primary). In this case, &repmgr; will correctly determine
that <application>pg_rewind</application> should be executed, however
<application>pg_rewind</application> incorrectly decides that no action is necessary.
</para>
<para>
In this situation, &repmgr; will report something like:
<programlisting>
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 1
DETAIL: rejoin target server's timeline 3 forked off current database system timeline 2 before current recovery point 0/7019C10</programlisting>
but when executed, <application>pg_rewind</application> will report:
<programlisting>
pg_rewind: servers diverged at WAL location 0/7015540 on timeline 2
pg_rewind: no rewind required</programlisting>
and if an attempt is made to attach the standby to the new primary, PostgreSQL logs on the standby
will contain errors like:
<programlisting>
[2020-09-07 15:01:41 UTC] LOG: 00000: replication terminated by primary server
[2020-09-07 15:01:41 UTC] DETAIL: End of WAL reached on timeline 2 at 0/7015540.
[2020-09-07 15:01:41 UTC] LOG: 00000: new timeline 3 forked off current database system timeline 2 before current recovery point 0/7019C10</programlisting>
</para>
<para>
Currently it is not possible to resolve this situation using <application>pg_rewind</application>.
A <ulink url="https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=2b4f3130382fe2f8705863e4d38589d4d69cd695">patch</ulink>
was submitted and is included in all PostgreSQL versions released in February 2021 or later.
</para>
<para>
As a workaround, start the primary server the standby was previously attached to,
and ensure the standby can be attached to it. If <application>pg_rewind</application> was actually executed,
it will have copied in the <filename>.history</filename> file from the target primary server; this must
be removed. <command>repmgr node rejoin</command> can then be used to attach the standby to the original
primary. Ensure any changes pending on the primary have propagated to the standby. Then shut down the primary
server <emphasis>first</emphasis>, before shutting down the standby. It should then be possible to
use <command>repmgr node rejoin</command> to attach the standby to the new primary.
</para>
</warning>
</refsect1>
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-standby-follow"/>, <xref linkend="repmgr-standby-switchover"/>
</para>
</refsect1>
</refentry>

166
doc/repmgr-node-service.xml Normal file
View File

@@ -0,0 +1,166 @@
<refentry id="repmgr-node-service">
<indexterm>
<primary>repmgr node service</primary>
</indexterm>
<refmeta>
<refentrytitle>repmgr node service</refentrytitle>
</refmeta>
<refnamediv>
<refname>repmgr node service</refname>
<refpurpose>show or execute the system service command to stop/start/restart/reload/promote a node</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
Shows or executes the system service command to stop/start/restart/reload a node.
</para>
<para>
This command is mainly meant for internal &repmgr; usage, but is useful for
confirming the command configuration.
</para>
</refsect1>
<refsect1>
<title>Options</title>
<variablelist>
<varlistentry>
<term><option>--dry-run</option></term>
<listitem>
<para>
Log the steps which would be taken, including displaying the command which would be executed.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--action</option></term>
<listitem>
<para>
The action to perform. One of <literal>start</literal>, <literal>stop</literal>,
<literal>restart</literal>, <literal>reload</literal> or <literal>promote</literal>.
</para>
<para>
If the parameter <option>--list-actions</option> is provided together with
<option>--action</option>, the command which would be executed will be printed.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--list-actions</option></term>
<listitem>
<para>
List all configured commands.
</para>
<para>
If the parameter <option>--action</option> is provided together with
<option>--list-actions</option>, the command which would be executed for that
particular action will be printed.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--checkpoint</option></term>
<listitem>
<para>
Issue a <command>CHECKPOINT</command> before stopping or restarting the node.
</para>
<para>
Note that a superuser connection is required to be able to execute the
<command>CHECKPOINT</command> command. From PostgreSQL 15 the <varname>pg_checkpoint</varname>
predefined role removes the need for superuser permissions to perform <command>CHECKPOINT</command> command.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-S</option>/<option>--superuser</option></term>
<listitem>
<para>
Connect as the named superuser instead of the normal &repmgr; user.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Exit codes</title>
<para>
One of the following exit codes will be emitted by <command>repmgr node service</command>:
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
No issues were detected.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_LOCAL_COMMAND (5)</option></term>
<listitem>
<para>
Execution of the system service command failed.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Examples</title>
<para>
See what action would be taken for a restart:
<programlisting>
[postgres@node1 ~]$ repmgr -f /etc/repmgr/12/repmgr.conf node service --action=restart --checkpoint --dry-run
INFO: a CHECKPOINT would be issued here
INFO: would execute server command "sudo service postgresql-12 restart"</programlisting>
</para>
<para>
Restart the PostgreSQL instance:
<programlisting>
[postgres@node1 ~]$ repmgr -f /etc/repmgr/12/repmgr.conf node service --action=restart --checkpoint
NOTICE: issuing CHECKPOINT
DETAIL: executing server command "sudo service postgresql-12 restart"
Redirecting to /bin/systemctl restart postgresql-12.service</programlisting>
</para>
<para>
List all commands:
<programlisting>
[postgres@node1 ~]$ repmgr -f /etc/repmgr/12/repmgr.conf node service --list-actions
Following commands would be executed for each action:
start: "sudo service postgresql-12 start"
stop: "sudo service postgresql-12 stop"
restart: "sudo service postgresql-12 restart"
reload: "sudo service postgresql-12 reload"
promote: "/usr/pgsql-12/bin/pg_ctl -w -D '/var/lib/pgsql/12/data' promote"</programlisting>
</para>
<para>
List a single command:
<programlisting>
[postgres@node1 ~]$ repmgr -f /etc/repmgr/12/repmgr.conf node service --list-actions --action=promote
/usr/pgsql-12/bin/pg_ctl -w -D '/var/lib/pgsql/12/data' promote </programlisting>
</para>
</refsect1>
</refentry>

View File

@@ -1,29 +0,0 @@
<chapter id="repmgr-node-status" xreflabel="repmgr node status">
<indexterm>
<primary>repmgr node status</primary>
</indexterm>
<title>repmgr node status</title>
<para>
Displays an overview of a node's basic information and replication
status. This command must be run on the local node.
</para>
<para>
Sample output (execute <command>repmgr node status</command>):
<programlisting>
Node "node1":
PostgreSQL version: 10beta1
Total data size: 30 MB
Conninfo: host=node1 dbname=repmgr user=repmgr connect_timeout=2
Role: primary
WAL archiving: off
Archive command: (none)
Replication connections: 2 (of maximal 10)
Replication slots: 0 (of maximal 10)
Replication lag: n/a
</programlisting>
</para>
<para>
See <xref linkend="repmgr-node-check"> to diagnose issues.
</para>
</chapter>

View File

@@ -0,0 +1,91 @@
<refentry id="repmgr-node-status">
<indexterm>
<primary>repmgr node status</primary>
</indexterm>
<refmeta>
<refentrytitle>repmgr node status</refentrytitle>
</refmeta>
<refnamediv>
<refname>repmgr node status</refname>
<refpurpose>show overview of a node's basic information and replication status</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
Displays an overview of a node's basic information and replication
status. This command must be run on the local node.
</para>
</refsect1>
<refsect1>
<title>Example</title>
<para>
<programlisting>
$ repmgr -f /etc/repmgr.conf node status
Node "node1":
PostgreSQL version: 10beta1
Total data size: 30 MB
Conninfo: host=node1 dbname=repmgr user=repmgr connect_timeout=2
Role: primary
WAL archiving: off
Archive command: (none)
Replication connections: 2 (of maximal 10)
Replication slots: 0 (of maximal 10)
Replication lag: n/a</programlisting>
</para>
</refsect1>
<refsect1>
<title>Output format</title>
<para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<literal>--csv</literal>: generate output in CSV format
</simpara>
</listitem>
</itemizedlist>
</para>
</refsect1>
<refsect1>
<title>Exit codes</title>
<para>
One of the following exit codes will be emitted by <command>repmgr node status</command>:
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
No issues were detected.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_NODE_STATUS (25)</option></term>
<listitem>
<para>
One or more issues were detected.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>See also</title>
<para>
See <xref linkend="repmgr-node-check"/> to diagnose issues and <xref linkend="repmgr-cluster-show"/>
for an overview of all nodes in the cluster.
</para>
</refsect1>
</refentry>

View File

@@ -1,18 +0,0 @@
<chapter id="repmgr-primary-register" xreflabel="repmgr primary register">
<indexterm><primary>repmgr primary register</primary></indexterm>
<title>repmgr primary register</title>
<para>
<command>repmgr primary register</command> registers a primary node in a
streaming replication cluster, and configures it for use with repmgr, including
installing the &repmgr; extension. This command needs to be executed before any
standby nodes are registered.
</para>
<para>
Execute with the <literal>--dry-run</literal> option to check what would happen without
actually registering the primary.
</para>
<para>
<command>repmgr master register</command> can be used as an alias for
<command>repmgr primary register</command>.
</para>
</chapter>

View File

@@ -0,0 +1,124 @@
<refentry id="repmgr-primary-register">
<indexterm>
<primary>repmgr primary register</primary>
</indexterm>
<refmeta>
<refentrytitle>repmgr primary register</refentrytitle>
</refmeta>
<refnamediv>
<refname>repmgr primary register</refname>
<refpurpose>initialise a repmgr installation and register the primary node</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
<command>repmgr primary register</command> registers a primary node in a
streaming replication cluster, and configures it for use with &repmgr;, including
installing the &repmgr; extension. This command needs to be executed before any
standby nodes are registered.
</para>
<note>
<para>
&repmgr; will attempt to install the &repmgr; extension as part of this command,
however this will fail if the <literal>repmgr</literal> user is not a superuser.
</para>
<para>
It's possible to install the &repmgr; extension manually before executing
<command>repmgr primary register</command>; in this case &repmgr; will
detect the presence of the extension and skip that step.
</para>
</note>
</refsect1>
<refsect1>
<title>Execution</title>
<para>
Execute with the <option>--dry-run</option> option to check what would happen without
actually registering the primary.
</para>
<note>
<para>
If providing the configuration file location with <option>-f/--config-file</option>,
avoid using a relative path, as &repmgr; stores the configuration file location
in the repmgr metadata for use when &repmgr; is executed remotely (e.g. during
<xref linkend="repmgr-standby-switchover"/>). &repmgr; will attempt to convert the
a relative path into an absolute one, but this may not be the same as the path you
would explicitly provide (e.g. <filename>./repmgr.conf</filename> might be converted
to <filename>/path/to/./repmgr.conf</filename>, whereas you'd normally write
<filename>/path/to/repmgr.conf</filename>).
</para>
</note>
<para>
<command>repmgr master register</command> can be used as an alias for
<command>repmgr primary register</command>.
</para>
</refsect1>
<refsect1>
<title>User permission requirements</title>
<para>
The <literal>repmgr</literal> user must be a superuser in order for &repmgr;
to be able to install the <literal>repmgr</literal> extension.
</para>
<para>
If this is not the case, the <literal>repmgr</literal> extension can be installed
manually before executing <command>repmgr primary register</command>.
</para>
<para>
A future &repmgr; release will enable the provision of a <option>--superuser</option>
name for the installation of the extension.
</para>
</refsect1>
<refsect1>
<title>Options</title>
<variablelist>
<varlistentry>
<term><option>--dry-run</option></term>
<listitem>
<para>
Check prerequisites but don't actually register the primary.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-F</option>, <option>--force</option></term>
<listitem>
<para>
Overwrite an existing node record
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1 id="repmgr-primary-register-events">
<title>Event notifications</title>
<para>
Following <link linkend="event-notifications">event notifications</link> will be generated:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><literal>cluster_created</literal></simpara>
</listitem>
<listitem>
<simpara><literal>primary_register</literal></simpara>
</listitem>
</itemizedlist>
</para>
</refsect1>
</refentry>

View File

@@ -1,18 +0,0 @@
<chapter id="repmgr-primary-unregister" xreflabel="repmgr primary unregister">
<indexterm><primary>repmgr primary unregister</primary></indexterm>
<title>repmgr primary unregister</title>
<para>
<command>repmgr primary register</command> unregisters an inactive primary node
from the &repmgr; metadata. This is typically when the primary has failed and is
being removed from the cluster after a new primary has been promoted.
</para>
<para>
Execute with the <literal>--dry-run</literal> option to check what would happen without
actually unregistering the node.
</para>
<para>
<command>repmgr master unregister</command> can be used as an alias for
<command>repmgr primary unregister</command>/
</para>
</chapter>

View File

@@ -0,0 +1,85 @@
<refentry id="repmgr-primary-unregister">
<indexterm>
<primary>repmgr primary unregister</primary>
</indexterm>
<refmeta>
<refentrytitle>repmgr primary unregister</refentrytitle>
</refmeta>
<refnamediv>
<refname>repmgr primary unregister</refname>
<refpurpose>unregister an inactive primary node</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
<command>repmgr primary unregister</command> unregisters an inactive primary node
from the &repmgr; metadata. This is typically when the primary has failed and is
being removed from the cluster after a new primary has been promoted.
</para>
</refsect1>
<refsect1>
<title>Execution</title>
<para>
<command>repmgr primary unregister</command> can be run on any active &repmgr; node,
with the ID of the node to unregister passed as <option>--node-id</option>.
</para>
<para>
Execute with the <literal>--dry-run</literal> option to check what would happen without
actually unregistering the node.
</para>
<para>
<command>repmgr master unregister</command> can be used as an alias for
<command>repmgr primary unregister</command>.
</para>
</refsect1>
<refsect1>
<title>Options</title>
<variablelist>
<varlistentry>
<term><option>--dry-run</option></term>
<listitem>
<para>
Check prerequisites but don't actually unregister the primary.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--node-id</option></term>
<listitem>
<para>
ID of the inactive primary to be unregistered.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--force</option></term>
<listitem>
<para>
Forcibly unregister the node if it is registered as an active primary,
as long as it has no registered standbys; or if it is registered as
a primary but running as a standby.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1 id="repmgr-primary-unregister-events">
<title>Event notifications</title>
<para>
A <literal>primary_unregister</literal> <link linkend="event-notifications">event notification</link> will be generated.
</para>
</refsect1>
</refentry>

View File

@@ -0,0 +1,127 @@
<refentry id="repmgr-service-pause">
<indexterm>
<primary>repmgr service pause</primary>
</indexterm>
<indexterm>
<primary>repmgrd</primary>
<secondary>pausing</secondary>
</indexterm>
<refmeta>
<refentrytitle>repmgr service pause</refentrytitle>
</refmeta>
<refnamediv>
<refname>repmgr service pause</refname>
<refpurpose>Instruct all &repmgrd; instances in the replication cluster to pause failover operations</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
This command can be run on any active node in the replication cluster to instruct all
running &repmgrd; instances to &quot;pause&quot; themselves, i.e. take no
action (such as promoting themselves or following a new primary) if a failover event is detected.
</para>
<para>
This functionality is useful for performing maintenance operations, such as switchovers
or upgrades, which might otherwise trigger a failover if &repmgrd;
is running normally.
</para>
<note>
<para>
It's important to wait a few seconds after restarting PostgreSQL on any node before running
<command>repmgr service pause</command>, as the &repmgrd; instance
on the restarted node will take a second or two before it has updated its status.
</para>
</note>
<para>
<xref linkend="repmgr-service-unpause"/> will instruct all previously paused &repmgrd;
instances to resume normal failover operation.
</para>
</refsect1>
<refsect1>
<title>Prerequisites</title>
<para>
PostgreSQL must be accessible on all nodes (using the <varname>conninfo</varname> string shown by
<link linkend="repmgr-cluster-show"><command>repmgr cluster show</command></link>)
from the node where <command>repmgr service pause</command> is executed.
</para>
</refsect1>
<refsect1>
<title>Execution</title>
<para>
<command>repmgr service pause</command> can be executed on any active node in the
replication cluster. A valid <filename>repmgr.conf</filename> file is required.
It will have no effect on previously paused nodes.
</para>
</refsect1>
<refsect1>
<title>Example</title>
<para>
<programlisting>
$ repmgr -f /etc/repmgr.conf service pause
NOTICE: node 1 (node1) paused
NOTICE: node 2 (node2) paused
NOTICE: node 3 (node3) paused</programlisting>
</para>
</refsect1>
<refsect1>
<title>Options</title>
<variablelist>
<varlistentry>
<term><option>--dry-run</option></term>
<listitem>
<para>
Check if nodes are reachable but don't pause &repmgrd;.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Exit codes</title>
<para>
One of the following exit codes will be emitted by <command>repmgr service unpause</command>:
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
&repmgrd; could be paused on all nodes.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_REPMGRD_PAUSE (26)</option></term>
<listitem>
<para>
&repmgrd; could not be paused on one or mode nodes.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-service-unpause"/>,
<xref linkend="repmgr-service-status"/>,
<xref linkend="repmgrd-pausing"/>,
<xref linkend="repmgr-daemon-start"/>,
<xref linkend="repmgr-daemon-stop"/>
</para>
</refsect1>
</refentry>

View File

@@ -0,0 +1,215 @@
<refentry id="repmgr-service-status">
<indexterm>
<primary>repmgr service status</primary>
</indexterm>
<indexterm>
<primary>repmgrd</primary>
<secondary>displaying service status</secondary>
</indexterm>
<refmeta>
<refentrytitle>repmgr service status</refentrytitle>
</refmeta>
<refnamediv>
<refname>repmgr service status</refname>
<refpurpose>display information about the status of &repmgrd; on each node in the cluster</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
This command provides an overview over all active nodes in the cluster and the state
of each node's &repmgrd; instance. It can be used to check
the result of <xref linkend="repmgr-service-pause"/> and <xref linkend="repmgr-service-unpause"/>
operations.
</para>
</refsect1>
<refsect1>
<title>Prerequisites</title>
<para>
PostgreSQL should be accessible on all nodes (using the <varname>conninfo</varname> string shown by
<link linkend="repmgr-cluster-show"><command>repmgr cluster show</command></link>)
from the node where <command>repmgr service status</command> is executed.
</para>
</refsect1>
<refsect1>
<title>Execution</title>
<para>
<command>repmgr service status</command> can be executed on any active node in the
replication cluster. A valid <filename>repmgr.conf</filename> file is required.
</para>
<para>
If a node is not accessible, or PostgreSQL itself is not running on the node,
&repmgr; will not be able to determine the status of that node's &repmgrd; instance,
and &quot;<literal>n/a</literal>&quot; will be displayed in the node's <literal>repmgrd</literal>
column.
</para>
<note>
<para>
After restarting PostgreSQL on any node, the &repmgrd; instance
will take a second or two before it is able to update its status. Until then,
&repmgrd; will be shown as not running.
</para>
</note>
</refsect1>
<refsect1>
<title>Examples</title>
<para>
&repmgrd; running normally on all nodes:
<programlisting>$ repmgr -f /etc/repmgr.conf service status
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
1 | node1 | primary | * running | | running | 96563 | no | n/a
2 | node2 | standby | running | node1 | running | 96572 | no | 1 second(s) ago
3 | node3 | standby | running | node1 | running | 96584 | no | 0 second(s) ago</programlisting>
</para>
<para>
&repmgrd; paused on all nodes (using <xref linkend="repmgr-service-pause"/>):
<programlisting>$ repmgr -f /etc/repmgr.conf service status
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
1 | node1 | primary | * running | | running | 96563 | yes | n/a
2 | node2 | standby | running | node1 | running | 96572 | yes | 1 second(s) ago
3 | node3 | standby | running | node1 | running | 96584 | yes | 0 second(s) ago</programlisting>
</para>
<para>
&repmgrd; not running on one node:
<programlisting>$ repmgr -f /etc/repmgr.conf service status
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+-------+---------+-----------+----------+-------------+-------+---------+--------------------
1 | node1 | primary | * running | | running | 96563 | yes | n/a
2 | node2 | standby | running | node1 | not running | n/a | n/a | n/a
3 | node3 | standby | running | node1 | running | 96584 | yes | 0 second(s) ago</programlisting>
</para>
</refsect1>
<refsect1>
<title>Options</title>
<variablelist>
<varlistentry>
<term><option>--csv</option></term>
<listitem>
<para>
<command>repmgr service status</command> accepts an optional parameter <literal>--csv</literal>, which
outputs the replication cluster's status in a simple CSV format, suitable for
parsing by scripts, e.g.:
<programlisting>
$ repmgr -f /etc/repmgr.conf service status --csv
1,node1,primary,1,1,5722,1,100,-1,default
2,node2,standby,1,0,-1,1,100,1,default
3,node3,standby,1,1,5779,1,100,1,default</programlisting>
</para>
<para>
The columns have following meanings:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
node ID
</simpara>
</listitem>
<listitem>
<simpara>
node name
</simpara>
</listitem>
<listitem>
<simpara>
node type (primary or standby)
</simpara>
</listitem>
<listitem>
<simpara>
PostgreSQL server running (1 = running, 0 = not running)
</simpara>
</listitem>
<listitem>
<simpara>
&repmgrd; running (1 = running, 0 = not running, -1 = unknown)
</simpara>
</listitem>
<listitem>
<simpara>
&repmgrd; PID (-1 if not running or status unknown)
</simpara>
</listitem>
<listitem>
<simpara>
&repmgrd; paused (1 = paused, 0 = not paused, -1 = unknown)
</simpara>
</listitem>
<listitem>
<simpara>
&repmgrd; node priority
</simpara>
</listitem>
<listitem>
<simpara>
interval in seconds since the node's upstream was last seen (this will be -1 if the value could not be retrieved, or the node is primary)
</simpara>
</listitem>
<listitem>
<simpara>
node location
</simpara>
</listitem>
</itemizedlist>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--detail</option></term>
<listitem>
<para>
Display additional information (<literal>location</literal>, <literal>priority</literal>)
about the &repmgr; configuration.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--verbose</option></term>
<listitem>
<para>
Display the full text of any database connection error messages.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-service-pause"/>,
<xref linkend="repmgr-service-unpause"/>,
<xref linkend="repmgr-cluster-show"/>,
<xref linkend="repmgrd-pausing"/>,
<xref linkend="repmgr-daemon-start"/>,
<xref linkend="repmgr-daemon-stop"/>
</para>
</refsect1>
</refentry>

View File

@@ -0,0 +1,121 @@
<refentry id="repmgr-service-unpause">
<indexterm>
<primary>repmgr service unpause</primary>
</indexterm>
<indexterm>
<primary>repmgrd</primary>
<secondary>unpausing</secondary>
</indexterm>
<refmeta>
<refentrytitle>repmgr service unpause</refentrytitle>
</refmeta>
<refnamediv>
<refname>repmgr service unpause</refname>
<refpurpose>Instruct all &repmgrd; instances in the replication cluster to resume failover operations</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
This command can be run on any active node in the replication cluster to instruct all
running &repmgrd; instances to &quot;unpause&quot;
(following a previous execution of <xref linkend="repmgr-service-pause"/>)
and resume normal failover/monitoring operation.
</para>
<note>
<para>
It's important to wait a few seconds after restarting PostgreSQL on any node before running
<command>repmgr service pause</command>, as the &repmgrd; instance
on the restarted node will take a second or two before it has updated its status.
</para>
</note>
</refsect1>
<refsect1>
<title>Prerequisites</title>
<para>
PostgreSQL must be accessible on all nodes (using the <varname>conninfo</varname> string shown by
<link linkend="repmgr-cluster-show"><command>repmgr cluster show</command></link>)
from the node where <command>repmgr service pause</command> is executed.
</para>
</refsect1>
<refsect1>
<title>Execution</title>
<para>
<command>repmgr service unpause</command> can be executed on any active node in the
replication cluster. A valid <filename>repmgr.conf</filename> file is required.
It will have no effect on nodes which are not already paused.
</para>
</refsect1>
<refsect1>
<title>Example</title>
<para>
<programlisting>
$ repmgr -f /etc/repmgr.conf service unpause
NOTICE: node 1 (node1) unpaused
NOTICE: node 2 (node2) unpaused
NOTICE: node 3 (node3) unpaused</programlisting>
</para>
</refsect1>
<refsect1>
<title>Options</title>
<variablelist>
<varlistentry>
<term><option>--dry-run</option></term>
<listitem>
<para>
Check if nodes are reachable but don't unpause &repmgrd;.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Exit codes</title>
<para>
One of the following exit codes will be emitted by <command>repmgr service unpause</command>:
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
&repmgrd; could be unpaused on all nodes.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_REPMGRD_PAUSE (26)</option></term>
<listitem>
<para>
&repmgrd; could not be unpaused on one or mode nodes.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-service-pause"/>,
<xref linkend="repmgr-service-status"/>,
<xref linkend="repmgrd-pausing"/>,
<xref linkend="repmgr-daemon-start"/>,
<xref linkend="repmgr-daemon-stop"/>
</para>
</refsect1>
</refentry>

View File

@@ -1,91 +0,0 @@
<chapter id="repmgr-standby-clone" xreflabel="repmgr standby clone">
<indexterm>
<primary>repmgr standby clone</primary>
<seealso>cloning</seealso>
</indexterm>
<title>repmgr standby clone</title>
<para>
<command>repmgr standby clone</command> clones a PostgreSQL node from another
PostgreSQL node, typically the primary, but optionally from any other node in
the cluster or from Barman. It creates the <filename>recovery.conf</filename> file required
to attach the cloned node to the primary node (or another standby, if cascading replication
is in use).
</para>
<note>
<simpara>
<command>repmgr standby clone</command> does not start the standby, and after cloning
<command>repmgr standby register</command> must be executed to notify &repmgr; of its presence.
</simpara>
</note>
<sect1 id="repmgr-standby-clone-config-file-copying" xreflabel="Copying configuration files">
<title>Handling configuration files</title>
<para>
Note that by default, all configuration files in the source node's data
directory will be copied to the cloned node. Typically these will be
<filename>postgresql.conf</filename>, <filename>postgresql.auto.conf</filename>,
<filename>pg_hba.conf</filename> and <filename>pg_ident.conf</filename>.
These may require modification before the standby is started.
</para>
<para>
In some cases (e.g. on Debian or Ubuntu Linux installations), PostgreSQL's
configuration files are located outside of the data directory and will
not be copied by default. &repmgr; can copy these files, either to the same
location on the standby server (provided appropriate directory and file permissions
are available), or into the standby's data directory. This requires passwordless
SSH access to the primary server. Add the option <literal>--copy-external-config-files</literal>
to the <command>repmgr standby clone</command> command; by default files will be copied to
the same path as on the upstream server. Note that the user executing <command>repmgr</command>
must have write access to those directories.
</para>
<para>
To have the configuration files placed in the standby's data directory, specify
<literal>--copy-external-config-files=pgdata</literal>, but note that
any include directives in the copied files may need to be updated.
</para>
<tip>
<simpara>
For reliable configuration file management we recommend using a
configuration management tool such as Ansible, Chef, Puppet or Salt.
</simpara>
</tip>
</sect1>
<sect1 id="repmgr-standby-clone-wal-management" xreflabel="Managing WAL during the cloning process">
<title>Managing WAL during the cloning process</title>
<para>
When initially cloning a standby, you will need to ensure
that all required WAL files remain available while the cloning is taking
place. To ensure this happens when using the default `pg_basebackup` method,
&repmgr; will set <command>pg_basebackup</command>'s <literal>--xlog-method</literal>
parameter to <literal>stream</literal>,
which will ensure all WAL files generated during the cloning process are
streamed in parallel with the main backup. Note that this requires two
replication connections to be available (&repmgr; will verify sufficient
connections are available before attempting to clone, and this can be checked
before performing the clone using the <literal>--dry-run</literal> option).
</para>
<para>
To override this behaviour, in <filename>repmgr.conf</filename> set
<command>pg_basebackup</command>'s <literal>--xlog-method</literal>
parameter to <literal>fetch</literal>:
<programlisting>
pg_basebackup_options='--xlog-method=fetch'</programlisting>
and ensure that <literal>wal_keep_segments</literal> is set to an appropriately high value.
See the <ulink url="https://www.postgresql.org/docs/current/static/app-pgbasebackup.html">
pg_basebackup</ulink> documentation for details.
</para>
<note>
<simpara>
From PostgreSQL 10, <command>pg_basebackup</command>'s
<literal>--xlog-method</literal> parameter has been renamed to
<literal>--wal-method</literal>.
</simpara>
</note>
</sect1>
</chapter>

View File

@@ -0,0 +1,487 @@
<refentry id="repmgr-standby-clone">
<indexterm>
<primary>repmgr standby clone</primary>
<seealso>cloning</seealso>
</indexterm>
<refmeta>
<refentrytitle>repmgr standby clone</refentrytitle>
</refmeta>
<refnamediv>
<refname>repmgr standby clone</refname>
<refpurpose>clone a PostgreSQL standby node from another PostgreSQL node</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
<command>repmgr standby clone</command> clones a PostgreSQL node from another
PostgreSQL node, typically the primary, but optionally from any other node in
the cluster or from Barman. It creates the replication configuration required
to attach the cloned node to the primary node (or another standby, if cascading replication
is in use).
</para>
<note>
<simpara>
<command>repmgr standby clone</command> does not start the standby, and after cloning
a standby, the command <command>repmgr standby register</command> must be executed to
notify &repmgr; of its existence.
</simpara>
</note>
</refsect1>
<refsect1 id="repmgr-standby-clone-config-file-copying" xreflabel="Copying configuration files">
<title>Handling configuration files</title>
<para>
Note that by default, all configuration files in the source node's data
directory will be copied to the cloned node. Typically these will be
<filename>postgresql.conf</filename>, <filename>postgresql.auto.conf</filename>,
<filename>pg_hba.conf</filename> and <filename>pg_ident.conf</filename>.
These may require modification before the standby is started.
</para>
<para>
In some cases (e.g. on Debian or Ubuntu Linux installations), PostgreSQL's
configuration files are located outside of the data directory and will
not be copied by default. &repmgr; can copy these files, either to the same
location on the standby server (provided appropriate directory and file permissions
are available), or into the standby's data directory. This requires passwordless
SSH access to the primary server. Add the option <option>--copy-external-config-files</option>
to the <command>repmgr standby clone</command> command; by default files will be copied to
the same path as on the upstream server. Note that the user executing <command>repmgr</command>
must have write access to those directories.
</para>
<para>
To have the configuration files placed in the standby's data directory, specify
<literal>--copy-external-config-files=pgdata</literal>, but note that
any include directives in the copied files may need to be updated.
</para>
<note>
<para>
When executing <command>repmgr standby clone</command> with the
<option>--copy-external-config-files</option> aand <option>--dry-run</option>
options, &repmgr; will check the SSH connection to the source node, but
will not verify whether the files can actually be copied.
</para>
<para>
During the actual clone operation, a check will be made before the database itself
is cloned to determine whether the files can actually be copied; if any problems are
encountered, the clone operation will be aborted, enabling the user to fix
any issues before retrying the clone operation.
</para>
</note>
<tip>
<simpara>
For reliable configuration file management we recommend using a
configuration management tool such as Ansible, Chef, Puppet or Salt.
</simpara>
</tip>
</refsect1>
<refsect1 id="repmgr-standby-clone-recovery-conf">
<title>Customising replication configuration</title>
<indexterm>
<primary>recovery.conf</primary>
<secondary>customising with &quot;repmgr standby clone&quot;</secondary>
</indexterm>
<indexterm>
<primary>replication configuration</primary>
<secondary>customising with &quot;repmgr standby clone&quot;</secondary>
</indexterm>
<para>
By default, &repmgr; will create a minimal replication configuration
containing following parameters:
</para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><varname>primary_conninfo</varname></simpara>
</listitem>
<listitem>
<simpara><varname>primary_slot_name</varname> (if replication slots in use)</simpara>
</listitem>
</itemizedlist>
<para>
For PostgreSQL 11 and earlier, these parameters will also be set:
</para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><varname>standby_mode</varname> (always <literal>'on'</literal>)</simpara>
</listitem>
<listitem>
<simpara><varname>recovery_target_timeline</varname> (always <literal>'latest'</literal>)</simpara>
</listitem>
</itemizedlist>
<para>
The following additional parameters can be specified in <filename>repmgr.conf</filename>
for inclusion in the replication configuration:
</para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><varname>restore_command</varname></simpara>
</listitem>
<listitem>
<simpara><varname>archive_cleanup_command</varname></simpara>
</listitem>
<listitem>
<simpara><varname>recovery_min_apply_delay</varname></simpara>
</listitem>
</itemizedlist>
<note>
<para>
We recommend using <ulink url="https://www.pgbarman.org/">Barman</ulink> to manage
WAL file archiving. For more details on combining &repmgr; and <application>Barman</application>,
in particular using <varname>restore_command</varname> to configure Barman as a backup source of
WAL files, see <xref linkend="cloning-from-barman"/>.
</para>
</note>
</refsect1>
<refsect1 id="repmgr-standby-clone-wal-management">
<title>Managing WAL during the cloning process</title>
<para>
When initially cloning a standby, you will need to ensure
that all required WAL files remain available while the cloning is taking
place. To ensure this happens when using the default <command>pg_basebackup</command> method,
&repmgr; will set <command>pg_basebackup</command>'s <literal>--wal-method</literal>
parameter to <literal>stream</literal>,
which will ensure all WAL files generated during the cloning process are
streamed in parallel with the main backup. Note that this requires two
replication connections to be available (&repmgr; will verify sufficient
connections are available before attempting to clone, and this can be checked
before performing the clone using the <literal>--dry-run</literal> option).
</para>
<para>
To override this behaviour, in <filename>repmgr.conf</filename> set
<command>pg_basebackup</command>'s <literal>--wal-method</literal>
parameter to <literal>fetch</literal>:
<programlisting>
pg_basebackup_options='--wal-method=fetch'</programlisting>
and ensure that <literal>wal_keep_segments</literal> (PostgreSQL 13 and later:
<literal>wal_keep_size</literal>) is set to an appropriately high value. Note
however that this is not a particularly reliable way of ensuring sufficient
WAL is retained and is not recommended.
See the <ulink url="https://www.postgresql.org/docs/current/app-pgbasebackup.html">
pg_basebackup</ulink> documentation for details.
</para>
<note>
<simpara>
If using PostgreSQL 9.6 or earlier, replace <literal>--wal-method</literal>
with <literal>--xlog-method</literal>.
</simpara>
</note>
</refsect1>
<refsect1 id="repmgr-standby-clone-wal-directory">
<title>Placing WAL files into a different directory</title>
<para>
To ensure that WAL files are placed in a directory outside of the main data
directory (e.g. to keep them on a separate disk for performance reasons),
specify the location with <option>--waldir</option>
(PostgreSQL 9.6 and earlier: <option>--xlogdir</option>) in
the <filename>repmgr.conf</filename> parameter <option>pg_basebackup_options</option>,
e.g.:
<programlisting>
pg_basebackup_options='--waldir=/path/to/wal-directory'</programlisting>
This setting will also be honored by &repmgr; when cloning from Barman
(&repmgr; 5.2 and later).
</para>
</refsect1>
<!-- don't rename this id as it may be used in external links -->
<refsect1 id="repmgr-standby-create-recovery-conf">
<title>Using a standby cloned by another method</title>
<indexterm>
<primary>replication configuration</primary>
<secondary>generating for a standby cloned by another method</secondary>
</indexterm>
<indexterm>
<primary>recovery.conf</primary>
<secondary>generating for a standby cloned by another method</secondary>
</indexterm>
<para>
&repmgr; supports standbys cloned by another method (e.g. using <application>barman</application>'s
<command><ulink url="https://docs.pgbarman.org/#recover">barman recover</ulink></command> command).
</para>
<para>
To integrate the standby as a &repmgr; node, once the standby has been cloned,
ensure the <filename>repmgr.conf</filename>
file is created for the node, and that it has been registered using
<command><link linkend="repmgr-standby-register">repmgr standby register</link></command>.
</para>
<tip>
<para>
To register a standby which is not running, execute
<link linkend="repmgr-standby-register">repmgr standby register --force</link>
and provide the connection details for the primary.
</para>
<para>
See <xref linkend="repmgr-standby-register-inactive-node"/> for more details.
</para>
</tip>
<para>
Then execute the command <command>repmgr standby clone --replication-conf-only</command>.
This will create the <filename>recovery.conf</filename> file needed to attach
the node to its upstream (in PostgreSQL 12 and later: append replication configuration
to <filename>postgresql.auto.conf</filename>), and will also create a replication slot on the
upstream node if required.
</para>
<para>
The upstream node must be running so the correct replication configuration can be obtained.
</para>
<para>
If the standby is running, the replication configuration will not be written unless the
<option>-F/--force</option> option is provided.
</para>
<tip>
<para>
Execute <command>repmgr standby clone --replication-conf-only --dry-run</command>
to check the prerequisites for creating the recovery configuration,
and display the configuration changes which would be made without actually
making any changes.
</para>
</tip>
<para>
In PostgreSQL 13 and later, the PostgreSQL configuration must be reloaded for replication
configuration changes to take effect.
</para>
<para>
In PostgreSQL 12 and earlier, the PostgreSQL instance must be restarted for replication
configuration changes to take effect.
</para>
</refsect1>
<refsect1>
<title>Options</title>
<variablelist>
<varlistentry>
<term><option>-d, --dbname=CONNINFO</option></term>
<listitem>
<para>
Connection string of the upstream node to use for cloning.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--dry-run</option></term>
<listitem>
<para>
Check prerequisites but don't actually clone the standby.
</para>
<para>
If <option>--replication-conf-only</option> specified, the contents of
the generated recovery configuration will be displayed
but not written.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-c, --fast-checkpoint</option></term>
<listitem>
<para>
Force fast checkpoint (not effective when cloning from Barman).
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--copy-external-config-files[={samepath|pgdata}]</option></term>
<listitem>
<para>
Copy configuration files located outside the data directory on the source
node to the same path on the standby (default) or to the
PostgreSQL data directory.
</para>
<para>
Note that to be able to use this option, the &repmgr; user must be a superuser or
member of the <literal>pg_read_all_settings</literal> predefined role.
If this is not the case, provide a valid superuser with the
<option>-S</option>/<option>--superuser</option> option.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--no-upstream-connection</option></term>
<listitem>
<para>
When using Barman, do not connect to upstream node.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--recovery-min-apply-delay</option></term>
<listitem>
<para>
Set PostgreSQL configuration <option>recovery_min_apply_delay</option> parameter
to the provided value.
</para>
<para>
This overrides any <option>recovery_min_apply_delay</option> provided via
<filename>repmgr.conf</filename>.
</para>
<para>
For more details on this parameter, see:
<ulink url="https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-RECOVERY-MIN-APPLY-DELAY">recovery_min_apply_delay</ulink>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-R, --remote-user=USERNAME</option></term>
<listitem>
<para>
Remote system username for SSH operations (default: current local system username).
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--replication-conf-only</option></term>
<listitem>
<para>
Create recovery configuration for a previously cloned instance.
</para>
<para>
In PostgreSQL 12 and later, the replication configuration will be
written to <filename>postgresql.auto.conf</filename>.
</para>
<para>
In PostgreSQL 11 and earlier, the replication configuration will be
written to <filename>recovery.conf</filename>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--replication-user</option></term>
<listitem>
<para>
User to make replication connections with (optional, not usually required).
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-S</option>/<option>--superuser</option></term>
<listitem>
<para>
The name of a valid PostgreSQL superuser can be provided with this option.
</para>
<para>
This is only required if the <option>--copy-external-config-files</option> was provided
and the &repmgr; user is not a superuser or member of the <literal>pg_read_all_settings</literal>
predefined role.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--upstream-conninfo</option></term>
<listitem>
<para>
<literal>primary_conninfo</literal> value to include in the recovery configuration
when the intended upstream server does not yet exist.
</para>
<para>
Note that &repmgr; may modify the provided value, in particular to set the
correct <literal>application_name</literal>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--upstream-node-id</option></term>
<listitem>
<para>
ID of the upstream node to replicate from (optional, defaults to primary node)
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--verify-backup</option></term>
<listitem>
<para>
<!-- update link after Pg13 release -->
Verify a cloned node using the
<ulink url="https://www.postgresql.org/docs/13/app-pgverifybackup.html">pg_verifybackup</ulink>
utility (PostgreSQL 13 and later).
</para>
<para>
This option can currently only be used when cloning directly from an upstream
node.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--without-barman </option></term>
<listitem>
<para>
Do not use Barman even if configured.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1 id="repmgr-standby-clone-events">
<title>Event notifications</title>
<para>
A <literal>standby_clone</literal> <link linkend="event-notifications">event notification</link> will be generated.
</para>
</refsect1>
<refsect1>
<title>See also</title>
<para>
See <xref linkend="cloning-standbys"/> for details about various aspects of cloning.
</para>
</refsect1>
</refentry>

View File

@@ -1,21 +0,0 @@
<chapter id="repmgr-standby-follow" xreflabel="repmgr standby follow">
<indexterm>
<primary>repmgr standby follow</primary>
</indexterm>
<title>repmgr standby follow</title>
<para>
Attaches the standby to a new primary. This command requires a valid
<filename>repmgr.conf</filename> file for the standby, either specified
explicitly with <literal>-f/--config-file</literal> or located in a
default location; no additional arguments are required.
</para>
<para>
This command will force a restart of the standby server, which must be
running. It can only be used to attach a standby to a new primary node.
</para>
<para>
To re-add an inactive node to the replication cluster, see
<xref linkend="repmgr-node-rejoin">
</para>
</chapter>

View File

@@ -0,0 +1,271 @@
<refentry id="repmgr-standby-follow">
<indexterm>
<primary>repmgr standby follow</primary>
</indexterm>
<refmeta>
<refentrytitle>repmgr standby follow</refentrytitle>
</refmeta>
<refnamediv>
<refname>repmgr standby follow</refname>
<refpurpose>attach a running standby to a new upstream node</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
Attaches the standby (&quot;follow candidate&quot;) to a new upstream node
(&quot;follow target&quot;). Typically this will be the primary, but this
command can also be used to attach the standby to another standby.
</para>
<para>
This command requires a valid <filename>repmgr.conf</filename> file for the standby,
either specified explicitly with <literal>-f/--config-file</literal> or located in a
default location; no additional arguments are required.
</para>
<para>The standby node (&quot;follow candidate&quot;) <emphasis>must</emphasis>
be running. If the new upstream (&quot;follow target&quot;) is not the primary,
the cluster primary <emphasis>must</emphasis> be running and accessible from the
standby node.
</para>
<tip>
<para>
To re-add an inactive node to the replication cluster, use
<xref linkend="repmgr-node-rejoin"/>.
</para>
</tip>
<para>
By default &repmgr; will attempt to attach the standby to the current primary.
If <option>--upstream-node-id</option> is provided, &repmgr; will attempt
to attach the standby to the specified node, which can be another standby.
</para>
<para>
In PostgreSQL 12 and earlier, this command will force a restart of PostgreSQL on the standby node.
</para>
<para>
In PostgreSQL 13 and later, by default this command will signal PostgreSQL to reload its
configuration, which will cause PostgreSQL to follow the new upstream without
a restart. If this behaviour is not desired for whatever reason, the configuration
file parameter <varname>standby_follow_restart</varname> can be set <literal>true</literal>
to always force a restart.
</para>
<para>
<command>repmgr standby follow</command> will wait up to
<varname>standby_follow_timeout</varname> seconds (default: <literal>30</literal>)
to verify the standby has actually connected to the new upstream node.
</para>
<note>
<para>
If <option>recovery_min_apply_delay</option> is set for the standby, it
will not attach to the new upstream node until it has replayed available
WAL.
</para>
<para>
Conversely, if the standby is attached to an upstream standby
which has <option>recovery_min_apply_delay</option> set, the upstream
standby's replay state may actually be behind that of its new downstream node.
</para>
</note>
</refsect1>
<refsect1>
<title>Example</title>
<para>
<programlisting>
$ repmgr -f /etc/repmgr.conf standby follow
INFO: setting node 3's primary to node 2
NOTICE: restarting server using "pg_ctl -l /var/log/postgres/startup.log -w -D '/var/lib/postgres/data' restart"
waiting for server to shut down........ done
server stopped
waiting for server to start.... done
server started
NOTICE: STANDBY FOLLOW successful
DETAIL: node 3 is now attached to node 2</programlisting>
</para>
</refsect1>
<refsect1>
<title>Options</title>
<variablelist>
<varlistentry>
<term><option>--dry-run</option></term>
<listitem>
<para>
Check prerequisites but don't actually follow a new upstream node.
</para>
<para>
This will also verify whether the standby is capable of following the new upstream node.
</para>
<important>
<para>
If a standby was turned into a primary by removing <filename>recovery.conf</filename>
(<application>PostgreSQL 12</application> and later: <filename>standby.signal</filename>),
&repmgr; will <emphasis>not</emphasis> be able to determine whether that primary's timeline
has diverged from the timeline of the standby (&quot;follow candidate&quot;).
</para>
<para>
We recommend always to use <link linkend="repmgr-standby-promote"><command>repmgr standby promote</command></link>
to promote a standby to primary, as this will ensure that the new primary
will perform a timeline switch (making it practical to check for timeline divergence)
and also that &repmgr; metadata is updated correctly.
</para>
</important>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--upstream-node-id</option></term>
<listitem>
<para>
Node ID of the new upstream node (&quot;follow target&quot;).
</para>
<para>
If not provided, &repmgr; will attempt to follow the current primary node.
</para>
<para>
Note that when using &repmgrd;, <option>--upstream-node-id</option>
should always be configured;
see <link linkend="repmgrd-automatic-failover-configuration">Automatic failover configuration</link>
for details.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-w</option></term>
<term><option>--wait</option></term>
<listitem>
<para>
Wait for a primary to appear. &repmgr; will wait for up to
<varname>primary_follow_timeout</varname> seconds
(default: 60 seconds) to verify that the standby is following the new primary.
This value can be defined in <filename>repmgr.conf</filename>.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Execution</title>
<para>
Execute with the <literal>--dry-run</literal> option to test the follow operation as
far as possible, without actually changing the status of the node.
</para>
<para>
Note that &repmgr; will first attempt to determine whether the standby
(&quot;follow candidate&quot;) is capable of following the
new upstream node (&quot;follow target&quot;).
</para>
<para>
If, for example, the new upstream node has diverged from this node's timeline,
for example if the new upstream node was promoted to primary while this node
was still attached to the original primary, it will <emphasis>not</emphasis>
be possible to follow the new upstream node, and &repmgr; will emit an error
message like this:
<programlisting>
ERROR: this node cannot attach to follow target node &quot;node3&quot; (ID 3)
DETAIL: follow target server's timeline 2 forked off current database system timeline 1 before current recovery point 0/6108880</programlisting>
</para>
<para>
In this case, it may be possible to have this node follow the new upstream
using <command><link linkend="repmgr-node-rejoin">repmgr node rejoin</link></command>
with the <option>--force-rewind</option> to execute <command>pg_rewind</command>.
This does mean that transactions which exist on this node, but not the new upstream,
will be lost.
</para>
</refsect1>
<refsect1>
<title>Exit codes</title>
<para>
One of the following exit codes will be emitted by <command>repmgr standby follow</command>:
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
The follow operation succeeded; or if <option>--dry-run</option> was provided,
no issues were detected which would prevent the follow operation.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_BAD_CONFIG (1)</option></term>
<listitem>
<para>
A configuration issue was detected which prevented &repmgr; from
continuing with the follow operation.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_NO_RESTART (4)</option></term>
<listitem>
<para>
The node could not be restarted.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_DB_CONN (6)</option></term>
<listitem>
<para>
&repmgr; was unable to establish a database connection to one of the nodes.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_FOLLOW_FAIL (23)</option></term>
<listitem>
<para>
&repmgr; was unable to complete the follow command.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1 id="repmgr-standby-follow-events">
<title>Event notifications</title>
<para>
A <literal>standby_follow</literal> <link linkend="event-notifications">event notification</link> will be generated.
</para>
<para>
If provided, &repmgr; will substitute the placeholders <literal>%p</literal> with the node ID of the node
being followed, <literal>%c</literal> with its <literal>conninfo</literal> string, and
<literal>%a</literal> with its node name.
</para>
</refsect1>
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-node-rejoin"/>
</para>
</refsect1>
</refentry>

View File

@@ -1,18 +0,0 @@
<chapter id="repmgr-standby-promote" xreflabel="repmgr standby promote">
<indexterm>
<primary>repmgr standby promote</primary>
</indexterm>
<title>repmgr standby promote</title>
<para>
Promotes a standby to a primary if the current primary has failed. This
command requires a valid <filename>repmgr.conf</filename> file for the standby, either
specified explicitly with <literal>-f/--config-file</literal> or located in a
default location; no additional arguments are required.
</para>
<para>
If the standby promotion succeeds, the server will not need to be
restarted. However any other standbys will need to follow the new server,
by using <xref linkend="repmgr-standby-follow">; if <command>repmgrd</command>
is active, it will handle this automatically.
</para>
</chapter>

View File

@@ -0,0 +1,345 @@
<refentry id="repmgr-standby-promote">
<indexterm>
<primary>repmgr standby promote</primary>
</indexterm>
<refmeta>
<refentrytitle>repmgr standby promote</refentrytitle>
</refmeta>
<refnamediv>
<refname>repmgr standby promote</refname>
<refpurpose>promote a standby to a primary</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
Promotes a standby to a primary if the current primary has failed. This
command requires a valid <filename>repmgr.conf</filename> file for the standby, either
specified explicitly with <literal>-f/--config-file</literal> or located in a
default location; no additional arguments are required.
</para>
<important>
<para>
If &repmgrd; is active, you must execute
<command><link linkend="repmgr-service-pause">repmgr service pause</link></command>
(&repmgr; 4.2 - 4.4: <command><link linkend="repmgr-service-pause">repmgr service pause</link></command>)
to temporarily disable &repmgrd; while making any changes
to the replication cluster.
</para>
</important>
<para>
If the standby promotion succeeds, the server will not need to be
restarted. However any other standbys will need to follow the new primary,
and will need to be restarted to do this.
</para>
<para>
Beginning with <link linkend="release-4.4">repmgr 4.4</link>,
the option <option>--siblings-follow</option> can be used to have
all other standbys (and a witness server, if in use)
follow the new primary.
</para>
<note>
<para>
If using &repmgrd;, when invoking
<command>repmgr standby promote</command> (either directly via
the <option>promote_command</option>, or in a script called
via <option>promote_command</option>), <option>--siblings-follow</option>
<emphasis>must not</emphasis> be included as a
command line option for <command>repmgr standby promote</command>.
</para>
</note>
<para>
In <link linkend="release-4.3">repmgr 4.3</link> and earlier,
<command><link linkend="repmgr-standby-follow">repmgr standby follow</link></command>
must be executed on each standby individually.
</para>
<para>
&repmgr; will wait for up to <varname>promote_check_timeout</varname> seconds
(default: <literal>60</literal>) to verify that the standby has been promoted, and will
check the promotion every <varname>promote_check_interval</varname> seconds (default: 1 second).
Both values can be defined in <filename>repmgr.conf</filename>.
</para>
<warning>
<para>
In PostgreSQL 12 and earlier, if WAL replay is paused on the standby, and not all
WAL files on the standby have been replayed, &repmgr; will not attempt to promote it.
</para>
<para>
This is because if WAL replay is paused, PostgreSQL itself will not react to a promote command
until WAL replay is resumed and all pending WAL has been replayed. This means
attempting to promote PostgreSQL in this state will leave PostgreSQL in a condition where the
promotion may occur at a unpredictable point in the future.
</para>
<para>
Note that if the standby is in archive recovery, &repmgr; will not be able to determine
if more WAL is pending replay, and will abort the promotion attempt if WAL replay is paused.
</para>
<para>
This restriction does <emphasis>not</emphasis> apply to PostgreSQL 13 and later.
</para>
</warning>
</refsect1>
<refsect1>
<title>Example</title>
<para>
<programlisting>
$ repmgr -f /etc/repmgr.conf standby promote
NOTICE: promoting standby to primary
DETAIL: promoting server "node2" (ID: 2) using "pg_ctl -l /var/log/postgres/startup.log -w -D '/var/lib/postgres/data' promote"
server promoting
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node2" (ID: 2) was successfully promoted to primary</programlisting>
</para>
</refsect1>
<refsect1>
<title>User permission requirements</title>
<para><emphasis>pg_promote() (PostgreSQL 12 and later)</emphasis></para>
<para>
From PostgreSQL 12, &repmgr; will attempt to use the built-in <function>pg_promote()</function>
function to promote a standby to primary.
</para>
<para>
By default, execution of <function>pg_promote()</function> is restricted to superusers.
If the <literal>repmgr</literal> user does not have permission to execute
<function>pg_promote()</function>, &repmgr; will fall back to using &quot;<command>pg_ctl promote</command>&quot;.
</para>
<tip>
<para>
Execute <command>repmgr standby promote</command> with the <option>--dry-run</option>
to check whether the &repmgr; user has permission to execute <function>pg_promote()</function>.
</para>
<para>
If the <literal>repmgr</literal> user is not a superuser, execution permission for this
function can be granted with e.g.:
<programlisting>
GRANT EXECUTE ON FUNCTION pg_catalog.pg_promote TO repmgr</programlisting>
</para>
<para>
Note that permissions are only effective for the database they are granted in, so
this <emphasis>must</emphasis> be executed in the &repmgr; database to be effective.
</para>
</tip>
<para>
For more details on <function>pg_promote()</function>, see the
<ulink url="https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-RECOVERY-CONTROL-TABLE">PostgreSQL documentation</ulink>.
</para>
</refsect1>
<refsect1>
<title>Options</title>
<variablelist>
<varlistentry>
<term><option>--dry-run</option></term>
<listitem>
<para>
Check if this node can be promoted, but don't carry out the promotion.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--siblings-follow</option></term>
<listitem>
<para>
Have all sibling nodes (nodes formerly attached to the same upstream
node as the promotion candidate) follow this node after it has been promoted.
</para>
<para>
Note that a witness server, if in use, is also
counted as a &quot;sibling node&quot; as it needs to be instructed to
synchronise its metadata with the new primary.
</para>
<important>
<para>
Do <emphasis>not</emphasis> provide this option when configuring
&repmgrd;'s <option>promote_command</option>.
</para>
</important>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-F</option></term>
<term><option>--force</option></term>
<listitem>
<para>
Ignore warnings and continue anyway.
</para>
<para>
This option is relevant in the following situations if <option>--siblings-follow</option> was specified:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
If one or more sibling nodes was not reachable via SSH, the standby will be promoted anyway.
</simpara>
</listitem>
<listitem>
<simpara>
If the promotion candidate has insufficient free walsenders to accommodate the standbys which will
be attached to it, the standby will be promoted anyway.
</simpara>
</listitem>
<listitem>
<simpara>
If replication slots are in use but the promotion candidate has insufficient free replication slots
to accommodate the standbys which will be attached to it, the standby will be promoted anyway.
</simpara>
</listitem>
</itemizedlist>
</para>
<para>
Note that if the <option>-F</option>/<option>--force</option> option is used when any of the above
situations is encountered, the onus is on the user to manually resolve any resulting issues.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Configuration file settings</title>
<para>
The following parameters in <filename>repmgr.conf</filename> are relevant to the
promote operation:
</para>
<para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<indexterm>
<primary>promote_check_interval</primary>
<secondary>with &quot;repmgr standby promote &quot;</secondary>
</indexterm>
<simpara>
<literal>promote_check_interval</literal>:
interval (in seconds, default: 1 second) to wait between each check
to determine whether the standby has been promoted.
</simpara>
</listitem>
<listitem>
<indexterm>
<primary>promote_check_timeout</primary>
<secondary>with &quot;repmgr standby promote &quot;</secondary>
</indexterm>
<simpara>
<literal>promote_check_timeout</literal>:
time (in seconds, default: 60 seconds) to wait to verify that the standby has been promoted
before exiting with <literal>ERR_PROMOTION_FAIL</literal>.
</simpara>
</listitem>
<listitem>
<indexterm>
<primary>service_promote_command</primary>
<secondary>with &quot;repmgr standby promote &quot;</secondary>
</indexterm>
<simpara>
<literal>service_promote_command</literal>:
a command which will be executed instead of <command>pg_ctl promote</command>
or (in PostgreSQL 12 and later) <function>pg_promote()</function>.
</simpara>
<simpara>
This is intended for systems which provide a package-level promote command,
such as Debian's <application>pg_ctlcluster</application>, to promote the
PostgreSQL from standby to primary.
</simpara>
</listitem>
</itemizedlist>
</para>
</refsect1>
<refsect1>
<title>Exit codes</title>
<para>
Following exit codes can be emitted by <command>repmgr standby promote</command>:
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
The standby was successfully promoted to primary.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_DB_CONN (6)</option></term>
<listitem>
<para>
&repmgr; was unable to connect to the local PostgreSQL node.
</para>
<para>
PostgreSQL must be running before the node can be promoted.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_PROMOTION_FAIL (8)</option></term>
<listitem>
<para>
The node could not be promoted to primary for one of the following
reasons:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
there is an existing primary node in the replication cluster
</simpara>
</listitem>
<listitem>
<simpara>
the node is not a standby
</simpara>
</listitem>
<listitem>
<simpara>
WAL replay is paused on the node
</simpara>
</listitem>
<listitem>
<simpara>
execution of the PostgreSQL promote command failed
</simpara>
</listitem>
</itemizedlist>
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1 id="repmgr-standby-promote-events">
<title>Event notifications</title>
<para>
A <literal>standby_promote</literal> <link linkend="event-notifications">event notification</link> will be generated.
</para>
</refsect1>
</refentry>

View File

@@ -1,50 +0,0 @@
<chapter id="repmgr-standby-register" xreflabel="repmgr standby register">
<indexterm><primary>repmgr standby register</primary></indexterm>
<title>repmgr standby register</title>
<para>
<command>repmgr standby register</command> adds a standby's information to
the &repmgr; metadata. This command needs to be executed to enable
promote/follow operations and to allow <command>repmgrd</command> to work with the node.
An existing standby can be registered using this command. Execute with the
<literal>--dry-run</literal> option to check what would happen without actually registering the
standby.
</para>
<sect1 id="repmgr-standby-register-wait" xreflabel="repmgr standby register --wait">
<title>Waiting for the registration to propagate to the standby</title>
<para>
Depending on your environment and workload, it may take some time for
the standby's node record to propagate from the primary to the standby. Some
actions (such as starting <command>repmgrd</command>) require that the standby's node record
is present and up-to-date to function correctly.
</para>
<para>
By providing the option <literal>--wait-sync</literal> to the
<command>repmgr standby register</command> command, &repmgr; will wait
until the record is synchronised before exiting. An optional timeout (in
seconds) can be added to this option (e.g. <literal>--wait-sync=60</literal>).
</para>
</sect1>
<sect1 id="rempgr-standby-register-inactive-node" xreflabel="Registering an inactive node">
<title>Registering an inactive node</title>
<para>
Under some circumstances you may wish to register a standby which is not
yet running; this can be the case when using provisioning tools to create
a complex replication cluster. In this case, by using the <literal>-F/--force</literal>
option and providing the connection parameters to the primary server,
the standby can be registered.
</para>
<para>
Similarly, with cascading replication it may be necessary to register
a standby whose upstream node has not yet been registered - in this case,
using <literal>-F/--force</literal> will result in the creation of an inactive placeholder
record for the upstream node, which will however later need to be registered
with the <literal>-F/--force</literal> option too.
</para>
<para>
When used with <command>repmgr standby register</command>, care should be taken that use of the
<literal>-F/--force</literal> option does not result in an incorrectly configured cluster.
</para>
</sect1>
</chapter>

View File

@@ -0,0 +1,197 @@
<refentry id="repmgr-standby-register" xreflabel="repmgr standby register">
<indexterm>
<primary>repmgr standby register</primary>
</indexterm>
<refmeta>
<refentrytitle>repmgr standby register</refentrytitle>
</refmeta>
<refnamediv>
<refname>repmgr standby register</refname>
<refpurpose>add a standby's information to the &repmgr; metadata</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
<command>repmgr standby register</command> adds a standby's information to
the &repmgr; metadata. This command needs to be executed to enable
promote/follow operations and to allow &repmgrd; to work with the node.
An existing standby can be registered using this command. Execute with the
<literal>--dry-run</literal> option to check what would happen without actually registering the
standby.
</para>
<note>
<para>
If providing the configuration file location with <literal>-f/--config-file</literal>,
avoid using a relative path, as &repmgr; stores the configuration file location
in the repmgr metadata for use when &repmgr; is executed remotely (e.g. during
<xref linkend="repmgr-standby-switchover"/>). &repmgr; will attempt to convert the
a relative path into an absolute one, but this may not be the same as the path you
would explicitly provide (e.g. <filename>./repmgr.conf</filename> might be converted
to <filename>/path/to/./repmgr.conf</filename>, whereas you'd normally write
<filename>/path/to/repmgr.conf</filename>).
</para>
</note>
</refsect1>
<refsect1 id="repmgr-standby-register-wait-start" xreflabel="repmgr standby register --wait-start">
<title>Waiting for the the standby to start</title>
<para>
By default, &repmgr; will wait 30 seconds for the standby to become available before
aborting with a connection error. This is useful when setting up a standby from a script,
as the standby may not have fully started up by the time <command>repmgr standby register</command>
is executed.
</para>
<para>
To change the timeout, pass the desired value with the <literal>--wait-start</literal> option.
A value of <literal>0</literal> will disable the timeout.
</para>
<para>
The timeout will be ignored if <literal>-F/--force</literal> was provided.
</para>
</refsect1>
<refsect1 id="repmgr-standby-register-wait-sync" xreflabel="repmgr standby register --wait-sync">
<title>Waiting for the registration to propagate to the standby</title>
<para>
Depending on your environment and workload, it may take some time for the standby's node record
to propagate from the primary to the standby. Some actions (such as starting
&repmgrd;) require that the standby's node record
is present and up-to-date to function correctly.
</para>
<para>
By providing the option <option>--wait-sync</option> to the
<command>repmgr standby register</command> command, &repmgr; will wait
until the record is synchronised before exiting. An optional timeout (in
seconds) can be added to this option (e.g. <option>--wait-sync=60</option>).
</para>
</refsect1>
<refsect1 id="repmgr-standby-register-inactive-node" xreflabel="Registering an inactive node">
<title>Registering an inactive node</title>
<para>
Under some circumstances you may wish to register a standby which is not
yet running; this can be the case when using provisioning tools to create
a complex replication cluster, or if the node was not cloned by &repmgr;.
</para>
<para>
In this case, by using the <option>-F/--force</option>
option and providing the connection parameters to the primary server,
the standby can be registered even if it has not yet been started.
</para>
<tip>
<para>
Connection parameters can either be provided either as a <literal>conninfo</literal> string
(e.g. <option>-d 'host=node1 user=repmgr'</option> or as individual connection parameters
(<option>-h/--host</option>, <option>-d/--dbname</option>,
<option>-U/--user</option>, <option>-p/--port</option> etc.).
</para>
</tip>
<para>
Similarly, with cascading replication it may be necessary to register
a standby whose upstream node has not yet been registered - in this case,
using <option>-F/--force</option> will result in the creation of an inactive placeholder
record for the upstream node, which will however later need to be registered
with the <option>-F/--force</option> option too.
</para>
<para>
When used with <command>repmgr standby register</command>, care should be taken that use of the
<option>-F/--force</option> option does not result in an incorrectly configured cluster.
</para>
</refsect1>
<refsect1 id="repmgr-standby-register-node-cloned-other-source">
<title>Registering a node not cloned by repmgr</title>
<para>
If you've cloned a standby using another method (e.g. <application>barman</application>'s
<command><ulink url="https://docs.pgbarman.org/#recover">barman recover</ulink></command>
command), register the node as detailed in section
<xref linkend="repmgr-standby-register-inactive-node"/> then execute
<link linkend="repmgr-standby-create-recovery-conf">repmgr standby clone --replication-conf-only</link>
to generate the appropriate replication configuration.
</para>
</refsect1>
<refsect1>
<title>Options</title>
<variablelist>
<varlistentry>
<term><option>--dry-run</option></term>
<listitem>
<para>
Check prerequisites but don't actually register the standby.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-F</option>/<option>--force</option></term>
<listitem>
<para>
Overwrite an existing node record
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--upstream-node-id</option></term>
<listitem>
<para>
ID of the upstream node to replicate from (optional)
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--wait-start</option></term>
<listitem>
<para>
wait for the standby to start (timeout in seconds, default 30 seconds)
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--wait-sync</option></term>
<listitem>
<para>
wait for the node record to synchronise to the standby (optional timeout in seconds)
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1 id="repmgr-standby-register-events">
<title>Event notifications</title>
<para>
A <literal>standby_register</literal> <link linkend="event-notifications">event notification</link>
will be generated immediately after the node record is updated on the primary.
</para>
<para>
If the <option>--wait-sync</option> option is provided, a <literal>standby_register_sync</literal>
event notification will be generated immediately after the node record has synchronised to the
standby.
</para>
<para>
If provided, &repmgr; will substitute the placeholders <literal>%p</literal> with the node ID of the
primary node, <literal>%c</literal> with its <literal>conninfo</literal> string, and
<literal>%a</literal> with its node name.
</para>
</refsect1>
</refentry>

View File

@@ -1,27 +0,0 @@
<chapter id="repmgr-standby-switchover" xreflabel="repmgr standby switchover">
<indexterm>
<primary>repmgr standby switchover</primary>
</indexterm>
<title>repmgr standby switchover</title>
<para>
Promotes a standby to primary and demotes the existing primary to a standby.
This command must be run on the standby to be promoted, and requires a
passwordless SSH connection to the current primary.
</para>
<para>
If other standbys are connected to the demotion candidate, &repmgr; can instruct
these to follow the new primary if the option <literal>--siblings-follow</literal>
is specified.
</para>
<para>
Execute with the <literal>--dry-run</literal> option to test the switchover as far as
possible without actually changing the status of either node.
</para>
<para>
<command>repmgrd</command> should not be active on any nodes while a switchover is being
executed. This restriction may be lifted in a later version.
</para>
<para>
For more details see the section <xref linkend="performing-switchover">.
</para>
</chapter>

View File

@@ -0,0 +1,449 @@
<refentry id="repmgr-standby-switchover">
<indexterm>
<primary>repmgr standby switchover</primary>
</indexterm>
<refmeta>
<refentrytitle>repmgr standby switchover</refentrytitle>
</refmeta>
<refnamediv>
<refname>repmgr standby switchover</refname>
<refpurpose>promote a standby to primary and demote the existing primary to a standby</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
Promotes a standby to primary and demotes the existing primary to a standby.
This command must be run on the standby to be promoted, and requires a
passwordless SSH connection to the current primary.
</para>
<para>
If other nodes are connected to the demotion candidate, &repmgr; can instruct
these to follow the new primary if the option <literal>--siblings-follow</literal>
is specified. This requires a passwordless SSH connection between the promotion
candidate (new primary) and the nodes attached to the demotion candidate
(existing primary). Note that a witness server, if in use, is also
counted as a &quot;sibling node&quot; as it needs to be instructed to
synchronise its metadata with the new primary.
</para>
<note>
<para>
Performing a switchover is a non-trivial operation. In particular it
relies on the current primary being able to shut down cleanly and quickly.
&repmgr; will attempt to check for potential issues but cannot guarantee
a successful switchover.
</para>
<para>
&repmgr; will refuse to perform the switchover if an exclusive backup is running on
the current primary, or if WAL replay is paused on the standby.
</para>
</note>
<para>
For more details on performing a switchover, including preparation and configuration,
see section <xref linkend="performing-switchover"/>.
</para>
<note>
<para>
From <link linkend="release-4.2">repmgr 4.2</link>, &repmgr; will instruct any running
&repmgrd; instances to pause operations while the switchover
is being carried out, to prevent &repmgrd; from
unintentionally promoting a node. For more details, see <xref linkend="repmgrd-pausing"/>.
</para>
<para>
Users of &repmgr; versions prior to 4.2 should ensure that &repmgrd;
is not running on any nodes while a switchover is being executed.
</para>
</note>
</refsect1>
<refsect1>
<title>User permission requirements</title>
<para><emphasis>data_directory</emphasis></para>
<para>
&repmgr; needs to be able to determine the location of the data directory on the
demotion candidate. If the &repmgr; is not a superuser or member of the <varname>pg_read_all_settings</varname>
<ulink url="https://www.postgresql.org/docs/current/predefined-roles.html">predefined roles</ulink>,
the name of a superuser should be provided with the <option>-S</option>/<option>--superuser</option> option.
</para>
<para><emphasis>CHECKPOINT</emphasis></para>
<para>
&repmgr; executes <command>CHECKPOINT</command> on the demotion candidate as part of the shutdown
process to ensure it shuts down as smoothly as possible.
</para>
<para>
Note that <command>CHECKPOINT</command> requires database superuser permissions to execute.
If the <literal>repmgr</literal> user is not a superuser, the name of a superuser should be
provided with the <option>-S</option>/<option>--superuser</option> option. From PostgreSQL 15 the <varname>pg_checkpoint</varname>
predefined role removes the need for superuser permissions to perform <command>CHECKPOINT</command> command.
</para>
<para>
If &repmgr; is unable to execute the <command>CHECKPOINT</command> command, the switchover
can still be carried out, albeit at a greater risk that the demotion candidate may not
be able to shut down as smoothly as might otherwise have been the case.
</para>
<para><emphasis>pg_promote() (PostgreSQL 12 and later)</emphasis></para>
<para>
From PostgreSQL 12, &repmgr; defaults to using the built-in <command>pg_promote()</command> function to
promote a standby to primary.
</para>
<para>
Note that execution of <function>pg_promote()</function> is restricted to superusers or to
any user who has been granted execution permission for this function. If the &repmgr; user
is not permitted to execute <function>pg_promote()</function>, &repmgr; will fall back to using
&quot;<command>pg_ctl promote</command>&quot;. For more details see
<link linkend="repmgr-standby-promote">repmgr standby promote</link>.
</para>
</refsect1>
<refsect1>
<title>Options</title>
<variablelist>
<varlistentry>
<term><option>--always-promote</option></term>
<listitem>
<para>
Promote standby to primary, even if it is behind or has diverged
from the original primary. The original primary will be shut down in any case,
and will need to be manually reintegrated into the replication cluster.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--dry-run</option></term>
<listitem>
<para>
Check prerequisites but don't actually execute a switchover.
</para>
<important>
<para>
Success of <option>--dry-run</option> does not imply the switchover will
complete successfully, only that
the prerequisites for performing the operation are met.
</para>
</important>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-F</option></term>
<term><option>--force</option></term>
<listitem>
<para>
Ignore warnings and continue anyway.
</para>
<para>
Specifically, if a problem is encountered when shutting down the current primary,
using <option>-F/--force</option> will cause &repmgr; to continue by promoting
the standby to be the new primary, and if <option>--siblings-follow</option> is
specified, attach any other standbys to the new primary.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--force-rewind[=/path/to/pg_rewind]</option></term>
<listitem>
<para>
Use <application>pg_rewind</application> to reintegrate the old primary if necessary
(and the prerequisites for using <application>pg_rewind</application> are met).
</para>
<para>
If using PostgreSQL 9.4, and the <application>pg_rewind</application>
binary is not installed in the PostgreSQL <filename>bin</filename> directory,
provide its full path. For more details see also <xref linkend="switchover-pg-rewind"/>
and <xref linkend="repmgr-node-rejoin-pg-rewind"/>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-R</option></term>
<term><option>--remote-user</option></term>
<listitem>
<para>
System username for remote SSH operations (defaults to local system user).
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--repmgrd-no-pause</option></term>
<listitem>
<para>
Don't pause &repmgrd; while executing a switchover.
</para>
<para>
This option should not be used unless you take steps by other means
to ensure &repmgrd; is paused or not
running on all nodes.
</para>
<para>
This option cannot be used together with <option>--repmgrd-force-unpause</option>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--repmgrd-force-unpause</option></term>
<listitem>
<para>
Always unpause all &repmgrd; instances after executing a switchover. This will ensure that
any &repmgrd; instances which were paused before the switchover will be
unpaused.
</para>
<para>
This option cannot be used together with <option>--repmgrd-no-pause</option>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--siblings-follow</option></term>
<listitem>
<para>
Have nodes attached to the old primary follow the new primary.
</para>
<para>
This will also ensure that a witness node, if in use, is updated
with the new primary's data.
</para>
<note>
<para>
In a future &repmgr; release, <option>--siblings-follow</option> will be applied
by default.
</para>
</note>
</listitem>
</varlistentry>
<varlistentry>
<term><option>-S</option>/<option>--superuser</option></term>
<listitem>
<para>
Use the named superuser instead of the normal &repmgr; user to perform
actions requiring superuser permissions.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Configuration file settings</title>
<para>
The following parameters in <filename>repmgr.conf</filename> are relevant to the
switchover operation:
</para>
<variablelist>
<varlistentry>
<term><option>replication_lag_critical</option></term>
<listitem>
<indexterm>
<primary>replication_lag_critical</primary>
<secondary>with &quot;repmgr standby switchover&quot;</secondary>
</indexterm>
<para>
If replication lag (in seconds) on the standby exceeds this value, the
switchover will be aborted (unless the <literal>-F/--force</literal> option
is provided)
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>shutdown_check_timeout</option></term>
<listitem>
<indexterm>
<primary>shutdown_check_timeout</primary>
<secondary>with &quot;repmgr standby switchover&quot;</secondary>
</indexterm>
<para>
The maximum number of seconds to wait for the
demotion candidate (current primary) to shut down, before aborting the switchover.
</para>
<para>
Note that this parameter is set on the node where <command>repmgr standby switchover</command>
is executed (promotion candidate); setting it on the demotion candidate (former primary) will
have no effect.
</para>
<note>
<para>
In versions prior to <link linkend="release-4.2">&repmgr; 4.2</link>, <command>repmgr standby switchover</command> would
use the values defined in <literal>reconnect_attempts</literal> and <literal>reconnect_interval</literal>
to determine the timeout for demotion candidate shutdown.
</para>
</note>
</listitem>
</varlistentry>
<varlistentry>
<term><option>wal_receive_check_timeout</option></term>
<listitem>
<indexterm>
<primary>wal_receive_check_timeout</primary>
<secondary>with &quot;repmgr standby switchover&quot;</secondary>
</indexterm>
<para>
After the primary has shut down, the maximum number of seconds to wait for the
walreceiver on the standby to flush WAL to disk before comparing WAL receive location
with the primary's shut down location.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>standby_reconnect_timeout</option></term>
<listitem>
<indexterm>
<primary>standby_reconnect_timeout</primary>
<secondary>with &quot;repmgr standby switchover&quot;</secondary>
</indexterm>
<para>
The maximum number of seconds to attempt to wait for the demotion candidate (former primary)
to reconnect to the promoted primary (default: 60 seconds)
</para>
<para>
Note that this parameter is set on the node where <command>repmgr standby switchover</command>
is executed (promotion candidate); setting it on the demotion candidate (former primary) will
have no effect.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>node_rejoin_timeout</option></term>
<listitem>
<indexterm>
<primary>node_rejoin_timeout</primary>
<secondary>with &quot;repmgr standby switchover&quot;</secondary>
</indexterm>
<para>
maximum number of seconds to attempt to wait for the demotion candidate (former primary)
to reconnect to the promoted primary (default: 60 seconds)
</para>
<para>
Note that this parameter is set on the the demotion candidate (former primary);
setting it on the node where <command>repmgr standby switchover</command> is
executed will have no effect.
</para>
<para>
However, this value <emphasis>must</emphasis> be less than <option>standby_reconnect_timeout</option> on the
promotion candidate (the node where <command>repmgr standby switchover</command> is executed).
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Execution</title>
<para>
Execute with the <literal>--dry-run</literal> option to test the switchover as far as
possible without actually changing the status of either node.
</para>
<para>
External database connections, e.g. from an application, should not be permitted while
the switchover is taking place. In particular, active transactions on the primary
can potentially disrupt the shutdown process.
</para>
</refsect1>
<refsect1 id="repmgr-standby-switchover-events">
<title>Event notifications</title>
<para>
<literal>standby_switchover</literal> and <literal>standby_promote</literal>
<link linkend="event-notifications">event notifications</link> will be generated for the new primary,
and a <literal>node_rejoin</literal> event notification for the former primary (new standby).
</para>
<para>
If using an event notification script, <literal>standby_switchover</literal>
will populate the placeholder parameter <literal>%p</literal> with the node ID of
the former primary.
</para>
</refsect1>
<refsect1>
<title>Exit codes</title>
<para>
One of the following exit codes will be emitted by <command>repmgr standby switchover</command>:
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
The switchover completed successfully; or if <option>--dry-run</option> was provided,
no issues were detected which would prevent the switchover operation.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_SWITCHOVER_FAIL (18)</option></term>
<listitem>
<para>
The switchover could not be executed.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_SWITCHOVER_INCOMPLETE (22)</option></term>
<listitem>
<para>
The switchover was executed but a problem was encountered.
Typically this means the former primary could not be reattached
as a standby. Check preceding log messages for more information.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-standby-follow"/>, <xref linkend="repmgr-node-rejoin"/>
</para>
<para>
For more details on performing a switchover operation, see the section <xref linkend="performing-switchover"/>.
</para>
</refsect1>
</refentry>

Some files were not shown because too many files have changed in this diff Show More