Compare commits

345 Commits

Author SHA1 Message Date
Ian Barwick
ee1a6f9d0f doc: add a link to the current documentation from the contents page 2019-04-03 10:48:36 +09:00
Ian Barwick
49eb408873 doc: fix typo
Per user report on mailing list.
2018-10-23 09:01:00 +09:00
Ian Barwick
fba3d29514 doc: clarify BDR repmgrd configuration
Link directly to section about configuring the "event_notification_command".
2018-07-23 13:23:28 +09:00
Ian Barwick
77200e5030 doc: remove duplicate item in list of event notifications 2018-07-18 16:11:18 +09:00
Ian Barwick
4589b8d439 doc: update documentation of "promote_command" and "service_promote_command"
See commit 63242e2277
2018-07-16 14:55:07 +09:00
Ian Barwick
048f7c3310 doc: add extra emphasis about not running repmgrd during switchover
One day this will no longer be an issue, until then let's hope the
fine documentation is read.
2018-07-11 09:55:37 +09:00
Ian Barwick
1e5f63792f node check: implement CSV output
This is advertised in the --help output and placeholder code was in
place, but it wasn't actually implemented.
2018-06-22 15:46:50 +09:00
Ian Barwick
d26989bd12 node status: improve output and documentation
In the default text output mode, list inactive slots.

In CSV output mode, list inactive slots as additional information;
add output line with number of missing slots and a list thereof.

Also document --csv output mode.
2018-06-22 15:46:44 +09:00
Ian Barwick
f999c810a7 node check: clarify status information for witness server
Previously the output gave the impression the server was a primary,
which is technically the case, but it's not the actual cluster primary.

Also output an error if the node is in recovery, which is unlikely but
you never know.
2018-06-22 15:46:40 +09:00
Ian Barwick
81077d4bc2 standby switchover: fix behaviour if witness node is a sibling
The witness node is not a streaming replication standby, so executing
"repmgr standby follow" will fail. Instead, execute "repmgr witness
register --force" to update the witness node record on the primary and
its local copy of all node records.

Addresses GitHub #453.
2018-06-21 17:16:18 +09:00
Ian Barwick
a549941d4f repmgr: don't count witness node as a standby when running "node status"
Addresses GitHub #451.
2018-06-21 14:27:47 +09:00
Ian Barwick
2f6c159f9a "repmgr node ...": update comments and formatting 2018-06-21 14:27:42 +09:00
Ian Barwick
2eca1a0311 repmgr: don't count witness node as a standby when running "node check"
Addresses GitHub #451.
2018-06-21 11:31:09 +09:00
Ian Barwick
f6377084ec doc: remove info about old RPM package repository 2018-06-15 11:14:10 +09:00
Ian Barwick
d85c02b92b doc: finalize release notes 2018-06-15 10:52:51 +09:00
Ian Barwick
d9ba41fc35 doc: emphasize that repmgrd should not be running during a switchover 2018-06-11 15:31:22 +09:00
Ian Barwick
afdaf9be66 _create_event(): log event and node ID for debugging 2018-06-11 15:20:01 +09:00
Ian Barwick
8067924c3e repmgr: consolidate code in "standby switchover"
Commit 41274f5525 left us with two if statements
in sequence with exactly the same condition, so consolidate both into a single
statement. Clarify code comments while we're at it.
2018-06-11 15:14:40 +09:00
Ian Barwick
e94a6eefde repmgr: cluster check commands - non-zero exit code if node(s) unavailable
Return ERR_CLUSTER_CHECK if one or more nodes were not reachable.

Implements GitHub #447.
2018-06-11 12:41:19 +09:00
Ian Barwick
69d7b6f7eb doc: 4.0.6 release notes 2018-06-07 17:14:50 +09:00
Ian Barwick
8ec3b2a536 Bump version
4.0.6
2018-06-07 15:08:48 +09:00
Ian Barwick
68a9745e7e standby follow: check node has connected to new primary
After restarting the standby, poll pg_stat_replication on the upstream
until the standby connects, and exit with an error if it doesn't by the
timeout defined in "standby_follow_timeout".

Implements GitHub #444.
2018-06-07 14:41:05 +09:00
Ian Barwick
20ce53e2d2 doc: update release notes 2018-06-07 12:48:54 +09:00
Ian Barwick
638a119c85 standby follow: add hint about using "node rejoin"
If "repmgr standby follow" is executed on a node which isn't running,
point out "repmgr node rejoin" should probably be used instead.
2018-06-07 11:02:32 +09:00
Ian Barwick
053863cdd0 doc: fix typos 2018-06-07 10:40:30 +09:00
Ian Barwick
009cc0480c witness_register: check for existing node with same name 2018-06-07 10:04:26 +09:00
Ian Barwick
63bdc19132 repmgrd: ensure local node is counted as quorum member
Rename "standby_nodes" to "sibling_nodes" to make it clearer in the
code what total is actually provided by the struct.

Addresses GitHub #439.
2018-06-01 17:19:40 +09:00
Ian Barwick
fbd389d0b3 doc: fix typo 2018-06-01 13:07:19 +09:00
Ian Barwick
4aef4ea11e standby clone: improve external configuration file copying
If --copy-external-config-files was provided, check that we can copy
the files *before* cloning the standby, and abort if an error is
encountered. This will give the user the opportunity to fix any issues
before running the entire (and potentially lengthy) clone.

Previously errors were logged but no action taken, and the final
message indicated the clone operation was successful.

Addresses GitHub #443.
2018-06-01 13:00:07 +09:00
Ian Barwick
0ffaff75df repmgrd: ensure degraded monitoring timeout works on standby
Parameter "degraded_monitoring_timeout" was not being acted on when
monitoring a streaming replication standby.

Addresses GitHub #439.
2018-05-31 17:53:31 +09:00
Ian Barwick
c54bb73fb2 If --dry-run specified, ensure minimum log level is INFO
When executed with --dry-run, repmgr outputs detail about what would
happen using log level INFO. If the log_level is configured to
NOTICE or higher, it's possible some or all of the --dry-run output
might not be displayed.

Addresses GitHub #441.
2018-05-31 15:30:26 +09:00
Ian Barwick
28ea2e48de node rejoin: avoid outputting empty DETAIL message 2018-05-31 15:10:51 +09:00
Ian Barwick
41274f5525 node rejoin: improve handling of --config-file parameter
Fixes bug when parsing --config-file values (GitHub #442).

Also improves handling in --dry-run mode, as some checks for the
provided files were being skipped if --dry-run supplied, even though
they are intended to work with --dry-run.
2018-05-31 11:44:31 +09:00
Ian Barwick
edceb32ccb standby clone: --recovery-conf-only expects the standby to be registered
Note this in the documentation, and add a HINT about registering it
if the standby record is not available.

Related to GitHub #438.
2018-05-29 11:54:38 +09:00
Ian Barwick
3dba8336e9 standby clone: don't assume existence of "user" in upstream conninfo
Usually a separate user (typically "repmgr") is set up specifically to manage
the repmgr metadata, however there's no compelling requirement to do this, and
it's possible the database owner (usually: "postgres") will be used, in which
case it's possible the username will be left out of the conninfo string.

Addresses GitHub #437.
2018-05-24 15:51:41 +09:00
Ian Barwick
97d0cee259 "config_file" is MAXPGPATH, not MAXLEN
The two values are the same anyway, so change is more for consistency.
2018-05-22 17:19:55 +09:00
Martín Marqués
2dfe1d18e9 Fix typo in a code comment 2018-05-19 12:29:04 -03:00
Ian Barwick
55bb93bd3f "standby clone": log actual connection string used to connect to upstream
Useful for diagnostic purposes.
2018-05-10 11:58:48 +09:00
Ian Barwick
4c49954cd4 Fix check for -d/--dbname parameter
Not a bug per se, just meant some unnecessary processing was done on
an empty string.

Per note from petere.
2018-05-10 11:57:02 +09:00
Ian Barwick
a880b6ce16 Include "arpa/inet.h" in dbutils.c
Needed for htonl() on FreeBSD.
2018-05-10 11:25:52 +09:00
Ian Barwick
c51a2283dd Minor documentation fixes 2018-05-10 10:27:25 +09:00
Ian Barwick
717828e73e doc: update 2ndQuadrant repository information
Canonical link for each repository should not include any directories.
2018-05-03 17:21:29 +09:00
Ian Barwick
c7477d7a9c doc: update repository information 2018-05-03 15:22:33 +09:00
Ian Barwick
1db8d3904f doc: update package installation information
Document the new public 2ndQuadrant apt repository
2018-05-03 15:07:26 +09:00
Ian Barwick
362f478d55 doc: update package installation information
Document the new, public 2ndQuadrant RPM repository.
2018-05-03 14:12:29 +09:00
Ian Barwick
cb1bf892e6 Finalize 4.0.5 release 2018-05-01 11:26:30 +09:00
Ian Barwick
b1b5fe1193 doc: add notes about package compatibility
We need to emphasise that the repmgr packages are only compatible
with packages based on the PGDG filesystem layout; 3rd party vendor
packages often put application and data directories elsewhere.
See e.g. GitHub #427.
2018-05-01 11:08:59 +09:00
Ian Barwick
af0e141859 doc: update FAQ location 2018-05-01 10:27:59 +09:00
Ian Barwick
580c1a9170 doc: update HISTORY and add 4.0.5 release notes 2018-05-01 10:13:44 +09:00
Ian Barwick
b624fc7efa Bump version
4.0.5
2018-05-01 09:21:32 +09:00
Ian Barwick
67ccd4dcb3 repmgrd: don't explicitly close connections on shutdown 2018-04-30 15:13:30 +09:00
Ian Barwick
6de3a5a997 Fix parsing of "archive_ready_critical" configuration file parameter.
Per report in GitHub #426.
2018-04-28 06:59:20 +09:00
Ian Barwick
f86e89ba45 repmgrd: notify sibling nodes to follow new primary after pg_ctl timeout
If "pg_ctl promote" fails due to a timeout, but the promotion itself succeeds,
have repmgrd on the new primary explicitly notify any sibling nodes to
follow it.

Previously the sibling nodes would wait "primary_notification_timeout" seconds
before attempting to discover the new primary.

This (and preceding commit eac80ae) address GitHub #425.
2018-04-27 11:59:00 +09:00
Ian Barwick
a6d0ba07ed repmgrd: handle pg_ctl timeout
It's possible "pg_ctl promote" will time out, causing "repmgr standby
follow" to return with an error; however the promotion itself will usually
succeed, so detect this case and handle accordingly.
2018-04-26 19:23:26 +09:00
Ian Barwick
b553a70ad5 repmgrd: always close the connection if the pointer is not NULL 2018-04-25 14:08:17 +09:00
Ian Barwick
3364f8bdf0 Add configuration file parameter "config_directory"
This enables explicit provision of an external configuration file
directory, which if set will be passed to "pg_ctl" as the -D
parameter. Otherwise "pg_ctl" will default to using the data directory,
which will cause some operations to fail if the configuration files
are not present there.

Note this is implemented primarily for feature completeness and for
development/testing purposes. Users who have installed "repmgr" from
a package should not rely on "pg_ctl" to stop/start/restart PostgreSQL,
instead they should set the appropriate "service_..._command" for their
operating system. For more details see:

    https://repmgr.org/docs/4.0/configuration-service-commands.html

Note: in a future release, the presence of "config_directory" in repmgr.conf
will be used to implicitly set "--copy-external-config-files=samepath" when
cloning a standby; this is a behaviour change so will be implemented in the
next major release (repmgr 4.1).

Implements GitHub #424.
2018-04-25 11:57:27 +09:00
Ian Barwick
242fa287b4 repmgrd: catch corner case in standby connection handle check
If repmgrd marks the local node as unavailable, and it was actually
restarting but a failover event occurred before the next local node
check, failover will continue with the stale connection handle.

Add a final local node check just before starting the failover
process, so repmgrd can reconnect if it wasn't able to before.
2018-04-24 21:55:36 +09:00
Ian Barwick
fa908432c8 Minor doc and log output tweaks 2018-04-24 21:08:31 +09:00
Ian Barwick
afa942fef6 repmgrd: prevent standby connection handle from going stale
If monitoring history is not in use, there's no activity on the standby's
connection handle, so if e.g. the standby is restarted, PQstatus()
never returns CONNECTION_BAD and repmgrd never notices the connection
is stale. Therefore execute a throw-away statement at "monitor_interval_secs".
2018-04-23 23:51:03 +09:00
Ian Barwick
94cfc66b04 doc: minor clarification 2018-04-20 12:23:04 +09:00
Ian Barwick
87eae9a50f doc: additional details about repmgrd usage in Debian/Ubuntu 2018-04-20 12:04:15 +09:00
Ian Barwick
82a37f4865 doc: add Debian package details 2018-04-20 10:57:19 +09:00
Ian Barwick
a38f727b7d doc: Improve CentOS package-related documentation 2018-04-20 10:31:42 +09:00
Ian Barwick
e6df936c1b doc: link to service command configuration from switchover section 2018-04-19 17:09:10 +09:00
Ian Barwick
91ca997d40 doc: improve configuration documentation
With special attention to setting service commands, and extra special
mention of "pg_ctlcluster" for Debian/Ubuntu users.
2018-04-19 16:49:26 +09:00
Ian Barwick
65c90a2a64 doc: update CentOS package documentation 2018-04-19 14:27:17 +09:00
Ian Barwick
90cba78f52 repmgrd: tweak event notifications on standby failure
The event notification was only being created if there was a valid
primary connection; it should be created in any case, so an event
notification script can be executed.
2018-04-17 10:27:25 +09:00
Ian Barwick
f8908d7e31 Bump version
4.0.5dev
2018-04-13 10:18:04 +09:00
Ian Barwick
478bbcccbf Add "dbname=replication" to all replication connection strings
Previously repmgr was attempting to make replication connections
with "dbname" set to the repmgr database name. While this works
if e.g. the repmgr user also has replication permissions, it will
fail if a dedicated replication user is specified, who only has
permission to access the virtual "replication" database.

Change this to use "dbname=replication" if the replication connection
user is different to the normal repmgr database user.

(We could just always set it to "replication", but that might break
existing installations e.g. where a .pgpass file is in use and there's
no "replication" entry for the normal repmgr database user).

Addresses GitHub #421.
2018-04-12 16:10:02 +09:00
Ian Barwick
a03d41de28 doc: mention --recovery-conf-only introduced in repmgr 4.0.4
Per GitHub #419.
2018-04-12 13:13:11 +09:00
Ian Barwick
f1e527adcb doc: various updates related to "standby clone" operations. 2018-04-12 13:08:05 +09:00
Ian Barwick
09e597dcdd Fix superuser password handling
When establishing a superuser connection, the connection parameters
were being copied from the existing (non-superuser) connection, which
in some circumstances can lead to that user's password being
included in the copied parameter list. The password parameter, if set, will
now always be removed, which will cause libpq to retrieve the correct
one from the .pgpass file.

Addresses GitHub #400.
2018-04-12 12:50:17 +09:00
Ian Barwick
94a7f0c719 Don't issue a CHECKPOINT after promoting a standby.
Issuing a CHECKPOINT immediately after promoting a standby may impact
performance. Commit 239a548e9d ensures
one is only issued when required, i.e. during a switchover when
pg_rewind will be executed.

This reverts commit a2068768ab.
2018-04-09 14:39:47 +09:00
Ian Barwick
6ac42f1593 "standby register": add sanity check when --upstream-node-id not supplied
If --upstream-node-id was not supplied to "repmgr standby register",
repmgr defaults to the primary node as upstream node. If the local node is
available, we now double-check that it's attached to the primary,
in case the lack of --upstream-node-id was an accidental omission.

This check is only made when the local node is available.

This behaviour can be overridden with -F/--force (though it's hard to
imagine a scenario where that would be useful).

Addresses GitHub #395.
2018-04-05 17:40:05 +09:00
Ian Barwick
94b72382e5 doc: minor FAQ tweaks 2018-04-05 17:10:52 +09:00
Ian Barwick
18c12f58a4 doc: add a section about repmgrd and service commands etc. 2018-04-05 11:47:35 +09:00
Ian Barwick
cf3fa18085 doc: miscellaneous FAQ updates
- clarify pg_rewind item
- add note about what's included in recovery.conf
2018-04-04 10:08:04 +09:00
Ian Barwick
a5281d93dc Add TODO for pg_rewind changes coming in PostgreSQL 11 2018-04-03 21:57:50 +09:00
Ian Barwick
0d73d3c2b5 Enable provision of "archive_cleanup_command" in recovery.conf
If "archive_cleanup_command" is defined in "repmgr.conf", a corresponding
entry will be made in the node's "recovery.conf" file after cloning a
standby.

Note that we recommend using PgBarman to manage WAL archives, but are
providing this facility to help repmgr to be integrated in existing environments.

Implements GitHub #416.
2018-04-03 14:11:24 +09:00
Ian Barwick
23c99304a6 "node rejoin": actively check for node to rejoin cluster
Previously repmgr was relying on whatever command was configured to
start PostgreSQL to determine whether the node being rejoined had
started correctly. However it's preferable to actively poll the upstream
to confirm it has restarted and actually attached as a standby before
confirming success of the "node rejoin" action.

This can be overridden with the -W/--no-wait option.

(Note that for consistency with other PostgreSQL utilities, the
short form of the --wait option is now "-w"; this is currently
only used in "repmgr standby follow".)

Also update "repmgr node rejoin" documentation with a list of supported
options, and add some useful index entries for "pg_rewind".

Implements GitHub #415.
2018-04-03 10:36:13 +09:00
Ian Barwick
1ab16bc6c2 doc: fix option description for "repmgr primary register" 2018-04-03 10:10:05 +09:00
Ian Barwick
7f1f04636d Refactor pg_control parsing
The "data_checksum_version" field is towards the end of the ControlFileData struct,
meaning its position varies between versions. Previously this wasn't a problem
as it was only required for operations involving 9.5 and later, and its position
within the control file has not changed between the current release and current
HEAD.

However, in order to support pg_rewind in 9.3 and 9.4, which both have changes in
the control file format, we'll need version-specific parsing. This will also make
it easier to deal with any future changes to the control file format.
2018-04-02 20:55:10 +09:00
Ian Barwick
6a1797cadd Enable pg_rewind to be used with PostgreSQL 9.3/9.4
pg_rewind is not part of the core distribution for those, but we
provided support in repmgr 3.3 so should extend it to repmgr 4.

Note that there is no check in place whether the pg_rewind binary
exists, so it's up to the user to ensure it's present.

Addresses GitHub #413.
2018-04-02 20:55:04 +09:00
Ian Barwick
94d26dbe9f Always set "connect_timeout" when pinging a PostgreSQL instance
Insert "connect_timeout=2" into the connection parameters, if not
explicitly set by the user. This will prevent excessive wait time
for the host operating system to report a connection timeout.
2018-04-02 09:31:42 +09:00
Ian Barwick
ae655eb4fd Add TODO list
This file will collate various requests and ideas for future development.
In particular it will reference requests which come in via the GitHub issue
tracker, so we can acknowledge and close off the request and not have an
open unresolved issue hanging around.
2018-03-30 14:18:51 +09:00
Ian Barwick
65371489c6 repmgrd: handle failover with two nodes in the primary location
If two nodes were in the primary location, and at least one node in
another location, the non-failed node in the primary location was not
recognising itself as a promotion candidate.

Addresses GitHub #407.
2018-03-30 12:17:34 +09:00
Ian Barwick
28c7737dc0 Log pg_control access errors as WARNINGs rather than DEBUG
This will make it easier to diagnose issues, possibly with an incorrect
"data_directory" setting in "repmgr.conf".
2018-03-30 11:24:44 +09:00
Ian Barwick
505d72d19c "standby switchover": force checkpoint if pg_rewind requested.
Addresses issue described in GitHub #378.

PostgreSQL itself doesn't issue a checkpoint after promotion to ensure
the newly promoted server is available as quickly as possible, so we'll
only execute an explicit CHECKPOINT when it's actually required, i.e.
when pg_rewind will be executed. This is required as pg_rewind uses
the timeline reported in the pg_control file to compare with the
server to be rewound, and the pg_control timeline is only updated after
the first checkpoint, so there is an interval where pg_rewind will
erroneously assume both servers are on the same timeline and take no action.
2018-03-30 09:12:25 +09:00
Ian Barwick
b292ac61f8 "standby switchover": update hint 2018-03-30 09:12:21 +09:00
Ian Barwick
293d66bf71 Fix minimum accepted value for "degraded_monitoring_timeout"
Should be -1, the default.

Addresses GitHub #411.
2018-03-30 09:12:17 +09:00
Ian Barwick
3e1f0ec168 repmgr: move demoted primary check to the final step during switchover
This will give the demoted primary more time to start up as a standby,
during which "standby follow" can be executed on sibling nodes, if
specified.
2018-03-27 16:41:13 +09:00
Ian Barwick
6f9a1f975e repmgr: poll demoted primary after restart during switchover
During a switchover operation, once the demoted primary has been restarted
as a standby, repmgr attempts to reconnect to verify its status and drop
any redundant replication slots. However it's possible the standby may still
be in the startup phase, so poll for "standby_reconnect_timeout" seconds
before giving up.

Addresses GitHub #408.
2018-03-27 15:58:18 +09:00
Ian Barwick
deea4f69f7 Fix "repmgr cluster crosscheck" output
Addresses GitHub #398.
2018-03-27 10:28:27 +09:00
Ian Barwick
37e53108a2 Consolidate connection closure calls 2018-03-27 08:52:23 +09:00
Ian Barwick
96cf06204c doc: add note about remote command execution
When executing a command on a remote server, repmgr expects the remote binary
to be in the same location as the local binary. It's reasonable to assume
repmgr will be deployed in a unified environment; if not, the onus is on the
user to ensure repmgr can find the remote binary, e.g. by creating appropriate
symlinks.

Addresses query in GitHub #406.
2018-03-27 08:47:56 +09:00
Ian Barwick
381e22c2c7 Misc tweaks to witness code 2018-03-26 20:59:38 +09:00
Ian Barwick
7e2af17783 repmgrd: tweak log notices when marking a standby as failed
Announce what we're going to do (set the node record inactive) *before*
performing the action. Makes reading the log slightly easier.
2018-03-23 13:27:37 +08:00
Ian Barwick
b4272853e7 Add event "repmgrd_failover_aborted" 2018-03-23 10:44:00 +08:00
Ian Barwick
562b6ddfc2 Add error code ERR_FOLLOW_FAIL 2018-03-23 10:34:19 +08:00
Ian Barwick
a15e5c9d52 Tidy up queries in dbutils.c
- standardize formatting
- prefix various internal function calls with "pg_catalog.", to
  mitigate possible risks from CVE-2018-1058
2018-03-23 10:33:28 +08:00
Ian Barwick
d9cc09cee4 repmgrd: fix typo 2018-03-21 12:36:51 +09:00
Ian Barwick
c4f6abe951 Update HISTORY 2018-03-21 06:51:56 +09:00
Martín Marqués
e454fb77d3 While reviewing 7cb6e5af8d before merging
I noticed that besides the result cleanup added, there was still a missing
spot inside the if condition.

Adding the PQclear that was missing.
2018-03-21 06:51:50 +09:00
Andrzej Nowicki
b76e5852d3 One more memory leak fixed 2018-03-21 06:51:43 +09:00
Andrzej Nowicki
0674364ffd Clear node list to avoid memory leak, fixes #402 2018-03-21 06:51:37 +09:00
Ian Barwick
b2eb9b8525 Correctly handle error message pointer when parsing strings.
When parsing conninfo strings, ensure the error message pointer is
actually returned to the caller.

Not a critical issue, just meant the contents of the error message
were not being displayed.
2018-03-10 14:28:10 +09:00
Ian Barwick
71c5d10a8c doc: update 4.0.4 release date 2018-03-09 20:07:16 +09:00
Ian Barwick
1476b21cd4 doc: update release notes
Add note about requiring 4.0.3 or later on all nodes when performing
a switchover from a node running 4.0.3 or later.

Per report in GitHub #388.
2018-03-09 09:46:58 +09:00
Ian Barwick
b17993abdb doc: update "repmgr primary unregister" description
As noted by GitHub user yonj1e in GitHub #396.
2018-03-08 15:01:25 +09:00
Ian Barwick
8f68344f9a doc: update FAQ
Additional clarification for "repmgr standby clone --recovery-conf-only"
2018-03-08 10:04:30 +09:00
Ian Barwick
125ac6c297 doc: update FAQ
Add entry about upgrading PostgreSQL
2018-03-08 10:04:30 +09:00
Ian Barwick
955860923f Fix parsing of -k/--keep-history option
GitHub #394.
2018-03-07 19:14:18 +09:00
Ian Barwick
50626f90cc Add 4.0.4 release notes 2018-03-07 14:17:04 +09:00
Ian Barwick
9aea5b8aa7 repmgrd: fix failover handling in "manual" mode
Regression was introduced in commit c7a585c555
2018-03-06 22:35:51 +09:00
Ian Barwick
ed1bcb159e repmgrd: remove duplicate local record check in BDR mode 2018-03-06 12:31:07 +09:00
Ian Barwick
9c72c0d66e Add event "repmgrd_shutdown"
Implements GitHub #393
2018-03-06 10:59:54 +09:00
Emre Hasegeli
0ddc226c2a Add witness options to the main help
GitHub #392
2018-03-06 10:57:33 +09:00
Ian Barwick
93830cad61 Fix directory creation when cloning from Barman 2018-03-05 19:31:53 +09:00
Ian Barwick
bca1660d5e Improve repmgrd logging in BDR mode
Also ensure interval status log line is shown as intended
2018-03-05 15:05:40 +09:00
Ian Barwick
5a52917421 repmgrd: add debug log output for "monitor_interval_secs" sleep in all modes 2018-03-05 14:23:58 +09:00
Emre Hasegeli
70752d7d4a Add missing options to the main help 2018-03-05 09:52:04 +09:00
Ian Barwick
c29d1efc37 "standby clone": improve replication user selection
Use the upstream node's replication user when checking the replication
connection.
2018-03-02 16:21:32 +09:00
Ian Barwick
6fbbe2a97a "standby clone": fix --superuser handling
get_superuser_connection() was erroneously using the local node record
to connect to as a superuser, which works when registering the primary
but obviously not when cloning a standby.

Addresses GitHub #380.
2018-03-02 14:49:17 +09:00
Ian Barwick
ce42d6827e Update HISTORY 2018-03-01 15:51:09 +09:00
Ian Barwick
98384559a6 "standby clone": remove restriction on replication slots in Barman mode
While it's preferable to avoid standby replication slots if Barman is in
use, there's no technical reason to prevent this.

Implements GitHub #379.
2018-03-01 15:47:28 +09:00
Ian Barwick
4a1477343b repmgr: escape "restore_command" in generated recovery.conf 2018-03-01 10:39:04 +09:00
Ian Barwick
d2b9d20393 "standby clone": fix primary_conninfo when --upstream-conninfo provided 2018-03-01 09:18:40 +09:00
Ian Barwick
fe594c95ad repmgrd: retry standby connection after cascading standby failover 2018-02-28 21:15:11 +09:00
Ian Barwick
60e63feaca repmgrd: add configuration file parameter "standby_reconnect_timeout"
This is used for determining a timeout when reconnecting to the standby
after executing the "follow_command". This will normally not need to be
set explicitly, but may be useful in cases where the standby's startup
phase can last longer than usual.
2018-02-28 18:56:33 +09:00
Ian Barwick
ae4d0f2622 repmgrd: fix main monitoring loop for witness server
Missing "break" was breaking it when following a new primary.
2018-02-28 16:30:14 +09:00
Ian Barwick
5e8b41e221 repmgrd: retry standby connection after "follow_command" executed
It's possible that the standby is still starting up after the "follow_command"
completes, so poll for a while until we get a connection.
2018-02-28 15:35:47 +09:00
Ian Barwick
c7a585c555 repmgrd: improve log output
- emit explicit startup NOTICE
- emit NOTICE when falling back to degraded monitoring on a primary node
- improve log message and event notification details when monitoring
  a former primary which has been reconnected as a standby
2018-02-28 12:35:13 +09:00
Ian Barwick
a27dd8c49c doc: document "primary_follow_timeout" configuration file parameter. 2018-02-27 10:09:40 +09:00
Ian Barwick
9365bf3474 "standby promote": make timeout values configurable
This introduces the following new configuration file parameters, which
were previously hard-coded values:

 - promote_check_timeout
 - promote_check_interval

Implements GitHub #387.
2018-02-27 10:04:58 +09:00
Ian Barwick
e8ae0831fe doc: add <options> section for various commands 2018-02-26 16:54:54 +09:00
Ian Barwick
518866eba5 "node status": improve replication slot warnings
Addresses GitHub #385
2018-02-23 11:06:47 +09:00
Ian Barwick
ed0330c334 "standby clone": document --recovery-conf-only option 2018-02-23 10:54:42 +09:00
Ian Barwick
1f021dc9fa "standby clone --recovery-conf-only": display generated file with --dry-run
Refactor the original code which generates "recovery.conf" to place the
output into a buffer, which can either be output as "recovery.conf"
or copied to a buffer specified by the caller.
2018-02-23 10:16:47 +09:00
Ian Barwick
425839d764 Fix typo in function name 2018-02-22 15:48:41 +09:00
Ian Barwick
3a764f678a "standby clone": add --recovery-conf-only option
This will generate "recovery.conf" for an existing standby.

Typical use-case is a standby cloned manually from an external data
source (e.g. Barman), where "recovery.conf" needs to be created
(and if required a replication slot).

The --dry-run option will check the pre-requisites but not actually
create "recovery.conf" or a replication slot.

This requires that the upstream node is running, a replication connection
can be made and if required a replication slot can be created.

Implements GitHub #382.
2018-02-22 15:47:19 +09:00
Ian Barwick
829cf5cca4 repmgrd: improve detection of status change from primary to standby
If repmgrd is running in degraded mode on a primary which has been stopped,
then manually brought back online as a standby (e.g. by creating
recovery.conf and starting the server), ensure it not only detects the
change but automatically updates the node record so it can resume
monitoring the node as a standby.

Previously, repmgrd was looping waiting for the record to be updated
(as is done transparently when executing "repmgr node rejoin") but
if the record was not updated within the timeout period (e.g. by
"repmgr standby register") it would fail to resume monitoring as a
standby.

It seems reasonable to have repmgrd automatically update the node record,
as this will restore failover capability as quickly as possible. If this
is not desired, then the onus is on the user to shut down repmgrd while
making the desired changes.
2018-02-22 11:35:47 +09:00
Ian Barwick
14420d83fa "node rejoin": ensure --dry-run is honoured
Addresses GitHub #383.
2018-02-20 15:28:39 +09:00
Ian Barwick
a80e22f0ed Bump version
4.0.4
2018-02-16 12:19:31 +09:00
Ian Barwick
832993bfbc doc: update 4.0.3 release notes 2018-02-16 12:15:10 +09:00
Ian Barwick
f1ea5e62df doc: update release notes 2018-02-15 14:42:29 +09:00
Ian Barwick
b47448d0e5 Replace remaining instances of strcpy() with strncpy()
Also use strncmp() to match.
2018-02-15 13:17:06 +09:00
Ian Barwick
a8232337d8 Catch various corner cases when restarting a PostgreSQL instance 2018-02-14 11:28:38 +09:00
Ian Barwick
c9eb1bfcc0 Always initialise t_conninfo_param_list structures 2018-02-13 10:48:18 +09:00
Ian Barwick
db552dfbc7 Bump version
4.0.3
2018-02-12 15:03:29 +09:00
Ian Barwick
9732f78565 repmgrd: check "repmgr" extension is installed before starting
Implements GitHub #361.
2018-02-12 11:31:59 +09:00
Ian Barwick
eb7dca2919 "node status": add warning about missing replication slots
Implements GitHub #364.
2018-02-12 10:53:31 +09:00
Ian Barwick
c113102926 Update repmgr.conf.sample
Add missing parameter "monitor_interval_secs"
2018-02-12 09:35:57 +09:00
Ian Barwick
ed6a167915 Execute a CHECKPOINT immediately after promoting the server
This ensures "pg_control" is updated with the latest timeline, mainly
to ensure that "pg_rewind", if executed as part of a switchover,
sees the latest timeline.

Per suggestion from GitHub user "superflav" in GitHub #378.

See also:

  https://www.postgresql.org/message-id/flat/20150428180253.GU30322%40tamriel.snowman.net
2018-02-09 12:09:16 +09:00
Ian Barwick
fbbe7afd61 doc: update HISTORY and release notes 2018-02-09 11:42:16 +09:00
Ian Barwick
ae1fc93e48 Ensure correct server version number used for replication stats query 2018-02-09 11:06:15 +09:00
Ian Barwick
7b4ee80af2 "standby switchover": check demotion candidate can make replication connection
Check it's actually possible for the demotion candidate to attach to
the promotion candidate before executing the switchover.

As with other checks of this nature, there's a faint possibility the
situation could change between the time the check is carried out and
the demotion candidate is restarted to connect to the promotion candidate,
but there's not a lot we can do about that. The main purpose is to
be able to catch existing misconfigurations before anything gets changed.

Implements GitHub #370.
2018-02-09 10:01:29 +09:00
Ian Barwick
0b8755e278 "witness register": fix primary node check
Addresses GitHub #377, based on report by user yonj1e in #373.
2018-02-08 16:28:50 +09:00
Ian Barwick
d3e1937808 "standby switchover": additional sanity checks
Check that sufficient walsenders will be available on the promotion
candidate, and if replication slots are in use check if enough of
those will be available.

Note these checks can't guarantee that the walsenders/slots will
be available at the appropriate points during the switchover process,
but do ensure that existing configuration problems will be caught.

Implements GitHub #371.
2018-02-08 15:23:10 +09:00
Ian Barwick
871d6fdee3 "standby clone": cowardly refuse to clone into an active data directory
By checking the PID file in the same way pg_ctl does, we can be pretty
much certain whether the target data directory contains an active
PostgreSQL instance.
2018-02-08 11:43:24 +09:00
Ian Barwick
c7dfe9e040 Fix "standby clone" in Barman mode with --no-upstream-connection
"--upstream-node-id", if provided, was not being passed through to
the SQL query executed via the Barman server.

Also modified the query to select the primary node if "--upstream-node-id"
is not provided.

Note: this is a very niche use case.
2018-02-07 16:36:44 +09:00
Ian Barwick
5c92a9e057 repmgr: simplify data directory checks when cloning
Attempting to use the contents of pg_control to tell whether the directory
is in use by PostgreSQL can result in false positives; we should use
a check based on the pidfile.

Also change the HINT to indicate a data directory can be overwritten
if -F/--force is provided.
2018-02-07 14:37:57 +09:00
Ian Barwick
aa5f025738 "standby clone": ensure "pg_subtrans" directory is created in Barman mode 2018-02-07 10:56:18 +09:00
Ian Barwick
5b91a2d409 Update HISTORY and release notes 2018-02-07 09:55:36 +09:00
Ian Barwick
596a19ee37 Move parse_output_to_argv() to configfile.c
So it can be used by parse_pg_basebackup_options().

Addresses GitHub #376.
2018-02-07 09:43:06 +09:00
Ian Barwick
23ff83b3b4 Fix typo in HINT 2018-02-07 08:55:51 +09:00
Ian Barwick
ba1f6bee0d doc: fix GitHub reference in release notes 2018-02-07 08:53:23 +09:00
Ian Barwick
da9c8f2491 Update HISTORY and release notes 2018-02-06 10:38:13 +09:00
Ian Barwick
64035ef701 "standby register/follow": provide primary node details for event notifications
For events generated by these commands, it may be useful to know details
of the primary node. This makes the following additional parameters available
to event notification scripts:

- %p: node ID of the primary
- %a: node name of the primary
- %c: conninfo string for the primary

Implements GitHub #375
2018-02-06 09:36:46 +09:00
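As an illustration of how placeholders such as %p, %a and %c might be expanded into a command string, here is a minimal sketch. The struct primary_info and the function names are hypothetical, and this is not the repmgr implementation; unknown placeholders are simply copied through unchanged.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical container for the primary node details named in the commit. */
typedef struct
{
    int         node_id;    /* substituted for %p */
    const char *node_name;  /* substituted for %a */
    const char *conninfo;   /* substituted for %c */
} primary_info;

/* Append s to out, silently truncating once the buffer is full. */
static void
append_str(char *out, size_t out_size, size_t *len, const char *s)
{
    while (*s != '\0' && *len + 1 < out_size)
        out[(*len)++] = *s++;
    out[*len] = '\0';
}

/* Expand %p, %a and %c in tmpl; any other text is copied verbatim. */
static void
expand_event_command(const char *tmpl, const primary_info *primary,
                     char *out, size_t out_size)
{
    const char *c;
    size_t      len = 0;
    char        buf[32];

    if (out_size == 0)
        return;
    out[0] = '\0';

    for (c = tmpl; *c != '\0'; c++)
    {
        if (*c == '%' && c[1] != '\0')
        {
            c++;
            switch (*c)
            {
                case 'p':
                    snprintf(buf, sizeof(buf), "%d", primary->node_id);
                    append_str(out, out_size, &len, buf);
                    break;
                case 'a':
                    append_str(out, out_size, &len, primary->node_name);
                    break;
                case 'c':
                    append_str(out, out_size, &len, primary->conninfo);
                    break;
                default:
                    /* unknown placeholder: keep it as written */
                    buf[0] = '%';
                    buf[1] = *c;
                    buf[2] = '\0';
                    append_str(out, out_size, &len, buf);
                    break;
            }
        }
        else
        {
            buf[0] = *c;
            buf[1] = '\0';
            append_str(out, out_size, &len, buf);
        }
    }
}
```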
Ian Barwick
da3a5ab1dc doc: fix descriptions of %p event notification script parameter 2018-02-05 15:54:06 +09:00
Ian Barwick
9d301b4789 "standby register": add event notification "standby_register_sync"
Implements GitHub #374.
2018-02-05 15:21:38 +09:00
Ian Barwick
c070c649f7 doc: minor fixes to BDR docs
Also remove duplicate file.
2018-02-05 15:21:34 +09:00
Ian Barwick
3b823396eb doc: improve BDR failover documentation 2018-02-05 15:21:28 +09:00
Ian Barwick
c19e7f1025 "cluster show": output any connection error messages in list of warnings
This ensures any connection errors are displayed by default in a
comprehensible, easily reportable way, and saves having to request/filter
DEBUG output.

Implements GitHub #369.
2018-02-05 10:32:20 +09:00
Ian Barwick
e4b5a1e19f "cluster show": minor code cleanup 2018-02-05 10:25:05 +09:00
Ian Barwick
f96cc3b906 "cluster show": improve handling of database errors
In particular, if running "repmgr cluster show" against a database
without the repmgr metadata, showing the error (rather than just
"no records found" etc.) will provide some clues about the problem.
2018-02-05 10:15:48 +09:00
Tony Finch
a481ca7ce2 "repmgr node status": correct upstream node info (#363)
repmgr was printing the name and ID of this node instead of its upstream

Signed-off-by: Tony Finch <dot@dotat.at>
2018-02-05 09:54:00 +09:00
Ian Barwick
32dc450a09 doc: add note about replication slots and PostgreSQL upgrades 2018-02-02 18:33:43 +09:00
Ian Barwick
34dbf64f50 Ensure an inactive PostgreSQL data directory can be deleted.
Addresses GitHub #366.
2018-02-02 17:12:25 +09:00
Ian Barwick
ea653a8dbc "standby follow": finalize implementation of --dry-run option 2018-02-02 15:42:08 +09:00
Ian Barwick
50894b6124 "standby follow": check for replication slot availability on target node 2018-02-02 15:01:23 +09:00
Ian Barwick
94e187c476 Improve "repmgr primary unregister" documentation and --help output
Per observations in GitHub #373
2018-02-02 14:12:15 +09:00
Ian Barwick
de6284ae79 doc: note password SSH requirements for "standby switchover" 2018-02-02 14:01:58 +09:00
Ian Barwick
c54045bcd8 "standby follow": initial implementation of --dry-run option
GitHub #363.
2018-02-01 14:18:40 +09:00
Ian Barwick
c0a53471e1 "standby switchover": improve log messages and add new exit code
Previously, if an issue was encountered with the old primary, but user
provided -F/--force to have repmgr promote the standby anyway, repmgr
would exit with the log message "STANDBY SWITCHOVER is complete"
and exit code 0 (SUCCESS).

To better report this partial completion, repmgr will now emit the message
"STANDBY SWITCHOVER has completed with issues" (and a HINT to check preceding
log messages) and new exit code 22 (ERR_SWITCHOVER_INCOMPLETE).
2018-01-31 10:25:15 +09:00
Ian Barwick
2eec8b5d79 Have do_standby_follow_internal() not abort on error
Pass the error code back to the caller instead, mainly so
"repmgr node rejoin" can better report errors.
2018-01-30 16:53:04 +09:00
Ian Barwick
c11e92cf2a repmgr: improve switchover handling when "pg_ctl" used
If logging output was not explicitly redirected with "-l" in the pg_ctl
options, repmgr would hang waiting for pg_ctl output.

Note that we recommend using the OS-level service commands where
available.
2018-01-30 13:43:37 +09:00
Ian Barwick
f294d09034 "repmgr standby register": improve error output when standby not running
Add explicit HINT
2018-01-26 22:13:11 +09:00
Ian Barwick
26c597ef5a doc: expand upgrade documentation
Include section about using pg_upgrade
2018-01-23 10:57:19 +09:00
Vlad
b8efbb7a15 doc: add missing word in overview
GitHub pull request #362
2018-01-19 09:11:54 +09:00
Ian Barwick
3044696c05 doc: update 4.0.2 release notes
Add details about upgrading.
2018-01-19 09:09:59 +09:00
Ian Barwick
6dc1969ad5 Remove --bdr-only configuration option
This was required for a specific use case during pre-release
development and is no longer needed now the physical streaming
replication handling is implemented.
2018-01-18 13:30:47 +09:00
Ian Barwick
cb41ef1733 doc: update list of event notifications 2018-01-18 11:48:10 +09:00
Ian Barwick
d10f1f289e Bump version in configure.in
4.0.2
2018-01-16 13:55:58 +09:00
Ian Barwick
5731ba6043 Update version and release date 2018-01-16 12:58:11 +09:00
Ian Barwick
3d6437c8f8 repmgr: assume node is actually shutting down if pingable and that's the reported status 2018-01-16 11:17:06 +09:00
Ian Barwick
54b5c8ad94 repmgrd: log execution error in "repmgrd_get_local_node_id()"
That shouldn't happen, but if it does it will make it easier to
identify the issue.
2018-01-16 11:14:04 +09:00
Ian Barwick
0eca08ffaf doc: improve switchover documentation
Emphasize need to set the "service_*_command" options when repmgr is
installed from a package.
2018-01-16 11:06:39 +09:00
Ian Barwick
05c1dc2b92 doc: add 4.0.2 release notes 2018-01-11 16:39:58 +09:00
Ian Barwick
2bd300073d doc: minor readability fix 2018-01-11 15:49:56 +09:00
Ian Barwick
01e020df8e doc: note change of shared library name from "repmgr_funcs" to "repmgr" 2018-01-11 15:47:35 +09:00
Ian Barwick
ae7963dc64 repmgr: automatically create slot name if missing
It's possible that a node was registered with "use_replication_slots=false"
but that was later changed to "use_replication_slots=true". If the node
was not subsequently re-registered, the node record will contain an empty
slot name, which will cause any slot creation operation during
"standby follow" or "node rejoin" to fail.

To prevent this happening, check for an empty slot name and automatically
set before proceeding.

Addresses GitHub #343.
2018-01-11 11:13:41 +09:00
Ian Barwick
faffb2a6e7 repmgr: catch possible corner case when checking node shutdown status
It's conceivable that PQping is returning "no response" but the
shutdown hasn't quite completed.
2018-01-10 14:56:00 +09:00
Ian Barwick
5d57044118 repmgr: during switchover, correctly detect unclean shutdown status 2018-01-10 12:21:04 +09:00
Ian Barwick
07a88c78a5 repmgr standby switchover: add "%p" event notification parameter
This will contain the node ID of the former primary.
2018-01-10 11:01:00 +09:00
Ian Barwick
f7df8b9c80 doc: document command line options for "standby switchover" 2018-01-10 10:19:36 +09:00
Ian Barwick
20920b3da1 repmgr standby switchover: add event details 2018-01-10 09:55:24 +09:00
Ian Barwick
683f4de182 Bump version
4.0.2
2018-01-09 13:43:58 +09:00
Ian Barwick
0c62821ffb Consolidate parsing of output from executing repmgr on a remote server
This should also fix the issue reported in GitHub #349.
2018-01-09 13:33:38 +09:00
Ian Barwick
6b70e8bbe6 doc: list repmgr.conf parameters relevant during switchover 2018-01-08 11:13:39 +09:00
Ian Barwick
6b223698c9 Fix call to is_active_bdr_node() in BDR repmgrd
Following the fix to "is_active_bdr_node()" in 841f03ae, it turns out
the call in repmgrd-bdr.c was only accidentally working; explicitly
test for a false return value.
2018-01-04 21:06:45 +09:00
Ian Barwick
aee12dc2c7 "repmgr bdr register": create missing connection replication set if needed
Previously the assumption was that the "repmgr" replication set would be
set up when the nodes are created, however no checks were implemented
and this was not well-documented.

Addresses GitHub #347.
2018-01-04 17:12:52 +09:00
Ian Barwick
c5c86e1ada "repmgr bdr register": improve node name check
We'll use "bdr.bdr_get_local_node_name()" to check the local BDR node
name and the repmgr one match.
2018-01-04 16:07:06 +09:00
Ian Barwick
7476dc84f2 doc: link event notification page from relevant command reference pages 2018-01-04 14:54:14 +09:00
Ian Barwick
f6d63f5216 doc: update package documentation 2018-01-04 13:11:44 +09:00
Ian Barwick
a608b0bc18 "repmgr standby register": add --wait-start option
Implements GitHub #356.
2018-01-04 12:48:12 +09:00
Ian Barwick
469ebba656 doc: fix typos in "repmgr primary unregister" command reference 2018-01-04 12:31:29 +09:00
Ian Barwick
647c21ad0e doc: add link to event notifications page from "repmgr cluster event" 2018-01-04 10:57:54 +09:00
Ian Barwick
3d2530d6f9 Fix query in is_active_bdr_node()
Boolean column was not being checked correctly.

Also add detail output in "repmgr node role --check", where the function
is called.
2018-01-04 10:48:31 +09:00
Ian Barwick
b26e400199 "repmgr cluster event": move query to dbutils.c 2018-01-04 10:06:54 +09:00
Ian Barwick
152e9545a4 docs: document "repmgr cluster event --terse" 2018-01-04 09:53:54 +09:00
Ian Barwick
83b8f05221 "repmgr cluster events": optionally omit "Details" column with --terse
Implements GitHub #360.
2018-01-04 09:48:00 +09:00
Ian Barwick
486f8e5a2c repmgrd: document standby_[failure|recovery] event notifications
Also clean up the relevant code section.

Addresses GitHub #359.
2018-01-04 09:34:49 +09:00
Ian Barwick
e517cc74d1 repmgr node rejoin: handle missing node record correctly
If a connection was provided for a database other than the "repmgr"
database, error was logged but execution continued, resulting in
the connection being finished twice.

Addresses GitHub #358.
2018-01-03 15:20:10 +09:00
Ian Barwick
26285b470f doc: add appendix with details about packages
work-in-progress
2018-01-02 17:24:51 +09:00
Ian Barwick
1521657965 Update copyright notices to 2018 2018-01-02 10:20:09 +09:00
Ian Barwick
041604e303 doc: Fix event notification placeholder typo
Per report from Carlos.
2018-01-01 10:29:34 +09:00
Ian Barwick
0be0100a7c docs: update HISTORY 2017-12-27 10:24:56 +09:00
Ian Barwick
2133834dda doc: update documentation build instructions
Describe how to build documentation as a single file, and also note
requirement to build against 9.6 or earlier.
2017-12-27 10:24:22 +09:00
Ian Barwick
d5fd93c350 repmgr.conf.sample: fix command line argument
"repmgr node check --archive-ready" is correct, however abbreviated
versions will be accepted by getopt_long() if they don't match
or partially match any other options.

Per report by "chaintng" in GitHub #355.
2017-12-27 10:24:17 +09:00
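The abbreviation behaviour of getopt_long() described above can be seen in a small sketch. The option table is an illustrative subset chosen for this example ("archive-ready" and "action" both appear in this log), and parse_long_option() is a hypothetical wrapper, not repmgr code.

```c
#include <getopt.h>
#include <stddef.h>

/*
 * Parse the first long option in argv and return its value:
 * 'A' for --archive-ready (or any unambiguous prefix of it),
 * 'a' for --action, '?' for an unknown or ambiguous option.
 */
static int
parse_long_option(int argc, char **argv)
{
    static const struct option long_options[] = {
        {"archive-ready", no_argument, NULL, 'A'},
        {"action", required_argument, NULL, 'a'},
        {NULL, 0, NULL, 0}
    };

    optind = 1;     /* restart scanning for each call */
    opterr = 0;     /* suppress getopt's own error messages */

    return getopt_long(argc, argv, "", long_options, NULL);
}
```

With this table, "--archive" resolves to the same option as "--archive-ready", whereas "--a" is rejected because it is a prefix of both entries.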
Tony Finch
5804778b58 doc: an optional all-in-one-file manual 2017-12-27 10:24:10 +09:00
Ian Barwick
407a7ea2f4 repmgr: add missing -W option to getopt_long() invocation
Addresses GitHub #350.
2017-12-20 10:28:31 +09:00
Martín Marqués
4d2eca0978 Switch spaces for tabs in repmgr.conf sample file.
This makes comments stay aligned in most cases the conf file is
modified, and when indentation changes, it's easy to re-align
(by removing or adding a tab)

Signed-off-by: Martín Marqués <martin.marques@2ndquadrant.com>
2017-12-20 09:27:06 +09:00
Martín Marqués
9d25544ab5 Add more information to the setting up sudo without requiretty in
the documentation

Signed-off-by: Martín Marqués <martin.marques@2ndquadrant.com>
2017-12-20 09:27:02 +09:00
Daymel Bonne Solís
8506607388 Fix package name 2017-12-20 09:26:57 +09:00
Ian Barwick
e8e059c26d docs: update 4.0.1 release date 2017-12-13 15:15:13 +09:00
Abhijit Menon-Sen
38d293694d Fix typo: upstream_node_id → upstream_node 2017-12-11 09:30:37 +09:00
Ian Barwick
54a10a0c3f Add diagnostic option "repmgr node check --has-passfile"
This checks if the active libpq version (9.6 and later) has the
"passfile" option, and returns 0 if present, 1 if not.
2017-12-05 12:53:04 +09:00
Ian Barwick
a8016f602f Fix unpackaged upgrade SQL for PostgreSQL 9.3 2017-12-04 17:46:52 +09:00
Ian Barwick
de57ecdad1 Finalize 4.0.1 release files 2017-11-29 17:02:47 +09:00
Ian Barwick
1fde81cf3f docs: improve event notification documentation 2017-11-29 14:44:07 +09:00
Ian Barwick
146c412061 docs: minor fixes to various examples 2017-11-29 11:30:38 +09:00
Ian Barwick
e9cb61ae7a docs: add additional note about setting "wal_log_hints"
Useful to reference this when discussing PostgreSQL configuration in
general.
2017-11-29 11:25:14 +09:00
Ian Barwick
50e9460b3e Update release notes 2017-11-28 13:42:28 +09:00
Ian Barwick
47e7cbe147 Update HISTORY 2017-11-28 13:00:31 +09:00
Ian Barwick
bf0be3eb43 Bump version
4.0.1
2017-11-28 12:36:22 +09:00
Ian Barwick
270da1294c repmgr: initialise "voting_term" in "repmgr primary register"
This previously happened in the extension SQL code, which could
potentially cause replay problems if installing on a BDR cluster.

As this table is only required for streaming replication failover,
move the initialisation to "repmgr primary register".

Addresses GitHub #344.
2017-11-28 12:26:33 +09:00
Ian Barwick
d3c47f450f docs: add 2ndQ yum repository installation instructions
These replace the HTML document at https://repmgr.org/yum-repository.html
2017-11-24 14:14:36 +09:00
Ian Barwick
c20475f94a Delete any replication slots copied by pg_rewind
If --force-rewind is used in conjunction with "repmgr node rejoin",
any replication slots present on the source node will be copied too;
it's essential to remove these to prevent stale slots being extant
when the node starts up.

We do this at file system level *before* the server starts to minimize
the risk of any problems.

Addresses GitHub #334
2017-11-24 11:15:14 +09:00
Ian Barwick
e0560c3e70 docs: fix configuration file example
Per report from Carlos Chapi.
2017-11-24 09:27:39 +09:00
Ian Barwick
3fa2bef6f4 repmgr: fix configuration file sanity check
The check was being carried out regardless of whether --copy-external-config-files
was specified, which means cloning will fail if no SSH connection is available.

Addresses GitHub #342
2017-11-23 22:50:28 +09:00
Ian Barwick
f8a0b051c8 repmgr: fix return code output for repmgr node check --action=...
Addresses GitHub #340
2017-11-23 10:35:41 +09:00
Martín Marqués
3e4a5e6ff5 Fix missing FQN for the nodes table.
This bug was not detected before because most users work with the repmgr
user. For that reason, the repmgr schema is already in the search_path
by default.

Add the repmgr schema to the nodes table in the LEFT JOIN used for
cluster show (and in other places)

Signed-off-by: Martín Marqués <martin.marques@2ndquadrant.com>
2017-11-23 10:35:38 +09:00
Ian Barwick
020b5b6982 docs: update 4.0.0 release notes 2017-11-21 16:27:18 +09:00
Ian Barwick
932326e4a0 Bump version in configure.in 2017-11-20 17:55:22 +09:00
Ian Barwick
019cd081e8 Bump version
4.0.0
2017-11-20 15:45:48 +09:00
Ian Barwick
3ace908126 docs: miscellaneous updates 2017-11-20 15:44:31 +09:00
Ian Barwick
2ad174489c docs: improve documentation of pg_basebackup_options 2017-11-20 15:30:31 +09:00
Ian Barwick
9124e0f0a2 docs: expand witness documentation 2017-11-20 15:29:31 +09:00
Ian Barwick
060b746743 docs: miscellaneous cleanup 2017-11-20 15:29:28 +09:00
Ian Barwick
bdb82d3aba docs: add initial witness server documentation 2017-11-20 15:29:24 +09:00
Ian Barwick
f6a6df3600 repmgrd: re-enable monitoring data recording when in archive recovery.
The warning emitted gives the impression that monitoring data shouldn't
be written if there's no streaming replication, but we can and should
do this as long as we have a primary connection.

Explicitly document this in the code.

Also remove an unused variable warning.
2017-11-20 15:29:21 +09:00
Ian Barwick
67e27f9ecd Remove unneeded functions 2017-11-20 15:26:32 +09:00
Ian Barwick
454c0b7bd9 docs: add note about "service_promote_command" in repmgr.conf.sample
It must never contain "repmgr standby promote", as it is intended
to enable use of package-level promote commands such as Debian's
"pg_ctlcluster promote".

Addresses GitHub #336.
2017-11-20 12:31:24 +09:00
Ian Barwick
faf297b07f remove spurious "/base" path element in Barman tablespace cloning code.
Addresses GitHub #339
2017-11-20 11:10:30 +09:00
Ian Barwick
0dae8c9f0b repmgr: don't add empty "passfile" parameter in recovery.conf 2017-11-20 10:28:16 +09:00
Ian Barwick
3f872cde0c "repmgr node ...": fixes for 9.3
Mainly to account for the lack of replication slots.
2017-11-16 11:26:39 +09:00
Ian Barwick
e331069f53 Escape double-quotes in strings passed to an event notification script
The string in question will be generated internally by repmgr as a simple
one-line string with no control characters etc., so all that needs to be
escaped at the moment are any double quotes.
2017-11-16 10:38:55 +09:00
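A minimal sketch of this kind of escaping follows. It assumes backslash-escaping, which suits a string placed inside a double-quoted shell argument; the function name escape_double_quotes() and its error convention are hypothetical, not the repmgr implementation.

```c
#include <stddef.h>

/*
 * Hypothetical helper: copy src to dst, prefixing each double quote with
 * a backslash. Returns 0 on success, or -1 if dst is too small.
 */
static int
escape_double_quotes(const char *src, char *dst, size_t dst_size)
{
    size_t len = 0;

    for (; *src != '\0'; src++)
    {
        size_t needed = (*src == '"') ? 2 : 1;

        /* always leave room for the terminating NUL */
        if (len + needed + 1 > dst_size)
            return -1;

        if (*src == '"')
            dst[len++] = '\\';
        dst[len++] = *src;
    }

    if (len + 1 > dst_size)
        return -1;

    dst[len] = '\0';
    return 0;
}
```

In the worst case every character is a double quote, so a destination of twice the source length plus one byte is always sufficient.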
Ian Barwick
53ebde8f33 repmgrd: don't fail over unless more than 50% of active nodes are visible. 2017-11-15 14:04:41 +09:00
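The "more than 50%" rule is a strict majority test, which can be written without floating point. This is a sketch only: the function name is hypothetical, and whether the local node is included in either count is an assumption left to the caller.

```c
#include <stdbool.h>

/*
 * Hypothetical check: true only if strictly more than half of the
 * active nodes are visible. Exactly half is not a majority.
 */
static bool
has_visible_majority(int visible_nodes, int active_nodes)
{
    if (active_nodes <= 0)
        return false;

    /* integer form of visible_nodes / active_nodes > 0.5 */
    return visible_nodes * 2 > active_nodes;
}
```

Requiring a strict majority means that, in an evenly split cluster, neither half will proceed with a failover.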
Ian Barwick
5e9d50f8ca repmgrd: finalize witness failover handling 2017-11-15 14:04:37 +09:00
Ian Barwick
347e753c27 repmgrd: synchronise repmgr.nodes table on witness server 2017-11-15 14:04:34 +09:00
Ian Barwick
2f978847b1 repmgrd: handle witness server 2017-11-15 14:04:30 +09:00
Ian Barwick
3014f72fda "witness register": set upstream_node_id to that of the primary 2017-11-15 14:04:26 +09:00
Ian Barwick
e02ddd0f37 repmgrd: basic witness node monitoring 2017-11-15 14:04:23 +09:00
Ian Barwick
29fcee2209 docs: add witness command reference files to file list 2017-11-15 14:04:19 +09:00
Ian Barwick
f61f7f82eb docs: add command reference for "witness (un)register" 2017-11-15 14:04:14 +09:00
Ian Barwick
efe28cbbeb witness (un)register: add --dry-run mode 2017-11-15 14:04:09 +09:00
Ian Barwick
6131c1d8ce witness unregister: enable execution when witness server is down
Also add help output for "repmgr witness --help".
2017-11-15 14:04:06 +09:00
Ian Barwick
c907b7b33d repmgr: minor fix to "repmgr standby --help" output 2017-11-15 14:04:01 +09:00
Ian Barwick
e6644305d3 Add "witness unregister" functionality 2017-11-15 14:03:57 +09:00
Ian Barwick
31b856dd9f Add "witness register" functionality 2017-11-15 14:03:54 +09:00
Ian Barwick
dff2bcc5de witness: initial code framework 2017-11-15 14:03:50 +09:00
Ian Barwick
688e609169 docs: add some more index entries 2017-11-15 14:03:44 +09:00
Ian Barwick
3e68c9fcc6 docs: document "passfile" configuration file parameter 2017-11-15 14:03:40 +09:00
Ian Barwick
d459b92186 Add configuration file "passfile"
This will enable a custom .pgpass to be included in "primary_conninfo"
(provided it's supported by the libpq version on the standby).
2017-11-15 14:03:37 +09:00
Ian Barwick
2a898721c0 docs: update release notes
Add note about changes to password handling.
2017-11-15 14:03:34 +09:00
Ian Barwick
35782d83c0 Update extension SQL 2017-11-15 14:03:30 +09:00
Ian Barwick
e16eb42693 repmgrd: detect role change from primary to standby
If repmgrd is monitoring a primary which is taken off-line, then later
restored as a standby, detect this change and resume monitoring
in standby node.

Addresses GitHub #338.
2017-11-15 14:03:26 +09:00
Ian Barwick
4d6dc57589 repmgrd: check shared library is loaded
If this isn't the case, "repmgrd" will appear to run but not handle
failover correctly.

Addresses GitHub #337.
2017-11-15 14:03:18 +09:00
Ian Barwick
cbc97d84ac repmgrd: updates related to node_id handling 2017-11-15 14:03:15 +09:00
Ian Barwick
96fe7dd2d6 repmgrd: catch corner cases where monitoring data is not available 2017-11-15 14:03:12 +09:00
Ian Barwick
13935a88c9 repmgrd: ensure shmem is reinitialised after a restart 2017-11-09 19:51:31 +09:00
Ian Barwick
5275890467 repmgrd: misc fixes 2017-11-09 19:51:26 +09:00
Ian Barwick
7f865fdaf3 repmgrd: fix priority/node_id tie-break check 2017-11-09 19:51:22 +09:00
Ian Barwick
9e2fb7ea13 repmgrd: remove unneeded functions 2017-11-09 19:51:18 +09:00
Ian Barwick
a3428e4d8a repmgrd: simplify the candidate selection logic
All disconnected nodes will be in a static, known state, so as long as
each node has the same meta-information (repmgr.nodes) and is able
to retrieve the last receive LSN of the other nodes, it is possible
for each node to independently determine the best promotion candidate,
thereby reaching consensus without an explicit "voting" process.
2017-11-09 19:51:13 +09:00
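The idea of reaching consensus without voting can be sketched as a deterministic selection function: given identical input, every node computes the same answer regardless of the order in which it learned about the others. The ordering used here (highest last receive LSN, then higher priority, then lower node ID) is an assumption made for illustration, as are the type and function names.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-node record holding only the fields used for selection. */
typedef struct
{
    int      node_id;
    int      priority;          /* assumed: higher is preferred */
    uint64_t last_receive_lsn;
} candidate_node;

/*
 * Return the best promotion candidate, or NULL if the list is empty.
 * Because the comparison is a total order over distinct node IDs, the
 * result does not depend on the order of the input array.
 */
static const candidate_node *
select_candidate(const candidate_node *nodes, size_t count)
{
    const candidate_node *best = NULL;
    size_t i;

    for (i = 0; i < count; i++)
    {
        const candidate_node *n = &nodes[i];

        if (best == NULL
            || n->last_receive_lsn > best->last_receive_lsn
            || (n->last_receive_lsn == best->last_receive_lsn
                && (n->priority > best->priority
                    || (n->priority == best->priority
                        && n->node_id < best->node_id))))
            best = n;
    }

    return best;
}
```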
Ian Barwick
03b9475755 repmgrd: fixes to failover handling
get_new_primary() returns NULL if no notification for the new primary has
been received, but the code was expecting it to return UNKNOWN_NODE_ID,
which was causing repmgrd to prematurely drop out of the new primary
detection loop if no notification had been received by the time the loop
started.

Also store the electoral term as a single row, single column table,
to ensure that all repmgrds see the same term. It is then bumped
by the winning node after it gets promoted.

Various logging improvements.
2017-11-09 19:51:09 +09:00
Ian Barwick
de1eb3c459 Ensure shared memory functions handle NULL parameters correctly 2017-11-09 19:51:02 +09:00
Ian Barwick
a13eccccc5 Update .gitignore
Ignore output from "make installcheck"
2017-11-09 19:50:57 +09:00
Ian Barwick
158f132bc0 README: update links to https versions 2017-11-09 19:50:53 +09:00
Ian Barwick
cdf54d217a Fix lock acquisition in shared memory functions 2017-11-09 19:50:48 +09:00
Ian Barwick
1a8a82f207 Update repmgr.conf.sample 2017-11-09 19:50:42 +09:00
Ian Barwick
60e877ca39 docs: fix example in BDR section 2017-11-02 11:24:10 +09:00
Ian Barwick
91531bffe4 docs: tweak Markdown URL formatting 2017-11-01 10:59:10 +09:00
Ian Barwick
fc5f46ca5a docs: update links to repmgr 4.0 documentation 2017-11-01 10:49:58 +09:00
Ian Barwick
b76952e136 docs: update copyright info 2017-11-01 09:36:16 +09:00
Ian Barwick
c3a1969f55 docs: convert command reference sections to <refentry> format
Note that most entries still need a bit more tidying up, consistent structuring,
provision of more examples etc.
2017-10-31 11:29:49 +09:00
Ian Barwick
11d856a1ec "standby follow": get upstream record before server restart, if required
The standby may not always be available for connections right after it's
restarted, so attempting to connect and get the node's upstream record
after the restart may fail. Record is now retrieved before the restart.

Addresses GitHub #333.
2017-10-27 16:30:25 +09:00
Ian Barwick
fbf357947d docs: add sample output to "standby follow" and "standby promote" 2017-10-27 15:05:46 +09:00
Ian Barwick
47eaa99537 docs: add note about building docs 2017-10-27 10:46:58 +09:00
Ian Barwick
aeee11d1b7 docs: finalize conversion of existing BDR repmgr documentation 2017-10-26 18:57:34 +09:00
Ian Barwick
e4713c5eca docs: update configuration documentation 2017-10-26 18:57:29 +09:00
Ian Barwick
e55e5a0581 Initial conversion of existing BDR repmgr documentation 2017-10-26 18:56:58 +09:00
Ian Barwick
fb0aae183d Docs: update "repmgr cluster show" 2017-10-26 09:42:36 +09:00
Ian Barwick
52655e9cd5 Improve trim() function
Did not cope well with trailing spaces or entirely blank strings.
2017-10-26 09:42:26 +09:00
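For reference, a trim routine that copes with trailing spaces and entirely blank strings might look like the following sketch. It is not the repmgr function; trim_in_place() is a hypothetical name and the in-place interface is an assumption.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical in-place trim: removes leading and trailing whitespace
 * from s and returns s. An all-blank string becomes the empty string.
 */
static char *
trim_in_place(char *s)
{
    char   *start = s;
    size_t  len;

    /* skip leading whitespace */
    while (*start != '\0' && isspace((unsigned char) *start))
        start++;

    len = strlen(start);

    /* drop trailing whitespace; len reaches 0 for an all-blank string */
    while (len > 0 && isspace((unsigned char) start[len - 1]))
        len--;

    /* regions may overlap, so memmove() rather than memcpy() */
    memmove(s, start, len);
    s[len] = '\0';

    return s;
}
```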
Ian Barwick
c5d91ca88c repmgr node rejoin: add --dry-run option 2017-10-26 09:42:12 +09:00
Ian Barwick
9f5edd07ad Fix typo 2017-10-26 09:35:25 +09:00
Ian Barwick
f58b102d51 Standardize terminology on "primary" (in place of "master") 2017-10-24 13:44:03 +09:00
Ian Barwick
90733aecf7 --dry-run available for "node rejoin" 2017-10-23 10:40:43 +09:00
Ian Barwick
e0be228c89 docs: fix formatting 2017-10-23 10:00:00 +09:00
Ian Barwick
a9759cf6ca Add --help output for "repmgr node service"
Addresses GitHub #329.
2017-10-20 16:49:29 +09:00
Ian Barwick
6852ac82c6 Add --help output for "repmgr node rejoin"
Addresses GitHub #329.
2017-10-20 16:49:19 +09:00
Ian Barwick
c27bd2a135 docs: fix typo 2017-10-20 16:06:46 +09:00
Ian Barwick
5045e2eb9d node rewind: add check for pg_rewind and --dry-run mode
Addresses GitHub #330
2017-10-20 14:16:56 +09:00
Ian Barwick
23f7af17a2 Note Barman configuration file parameter changes 2017-10-20 11:31:31 +09:00
Ian Barwick
93936c090d Fix error message typo 2017-10-20 11:19:12 +09:00
Ian Barwick
564c951f0c Prevent relative configuration file path being stored in the repmgr metadata
The configuration file path is stored to make remote execution of repmgr
(e.g. during "repmgr standby switchover") simpler, so relative paths
make no sense.

Addresses GitHub #332
2017-10-20 10:59:54 +09:00
Ian Barwick
3f5e8f6aec Update README
Main body of documentation moved to DocBook format and hosted at:

    https://repmgr.org/docs/index.html

as the existing README and sundry additional files were becoming
unmanageable. Conversion to DocBook format enables all documentation
to be managed in a single structured system, with cross-references,
indexes, linkable URLs etc.
2017-10-19 16:39:33 +09:00
Ian Barwick
a6a97cda86 docs: update "repmgr cluster show" page 2017-10-19 16:39:27 +09:00
Ian Barwick
18c8e4c529 Add placeholder FAQ.md
This replaces the original FAQ maintained for repmgr 3.x; repmgr 4
documentation is now available in DocBook format.
2017-10-19 16:22:28 +09:00
Ian Barwick
6984fe7029 docs: expand release notes and redirect "changes-in-repmgr4.md" 2017-10-19 14:11:17 +09:00
Ian Barwick
5ecc3a0a8f Add 4.0 release notes 2017-10-19 13:59:03 +09:00
Ian Barwick
febde097be doc: add missing entry for "priority" in repmgr.conf.sample
Per report from Shaun Thomas.
2017-10-19 13:16:36 +09:00
Ian Barwick
19ea248226 docs: add more index references 2017-10-19 12:22:58 +09:00
Ian Barwick
acdbd1110a docs: note way of forcing recovery then quitting in single user mode 2017-10-19 12:22:54 +09:00
Ian Barwick
946683182c Documentation: update markup 2017-10-18 11:12:37 +09:00
Ian Barwick
c9fbb7febf Update package signature documentation 2017-10-18 10:51:35 +09:00
Ian Barwick
ff966fe533 Document "upgrading-from-repmgr3.md" moved to main repmgr documentation 2017-10-18 10:51:29 +09:00
Ian Barwick
7001960cc1 Update "repmgr node rejoin" documentation 2017-10-17 17:41:36 +09:00
Ian Barwick
1cfba44799 Add FAQ to documentation 2017-10-17 16:16:40 +09:00
Ian Barwick
d1f9ca4b43 Move deprecated command line option
Not required in repmgr4, we're keeping it around for backwards compatibility;
a warning will be issued if used.
2017-10-17 16:16:06 +09:00
Ian Barwick
f6c253f8a6 Various documentation fixes 2017-10-17 11:02:33 +09:00
Ian Barwick
95ec8d8b21 Bump doc version 2017-10-17 09:46:23 +09:00
Ian Barwick
041f1b7667 Merge commit '0b2a6fe2fb958f10f211f0656fd91cae980fd08d' into REL4_0_STABLE 2017-10-16 11:22:48 +09:00
Ian Barwick
104279016a Update HISTORY 2017-10-04 13:33:37 +09:00
Ian Barwick
901a7603b1 Stamp 4.0beta1 2017-10-04 13:01:49 +09:00
92 changed files with 2299 additions and 9061 deletions

4
FAQ.md

@@ -1,10 +1,8 @@
 FAQ - Frequently Asked Questions about repmgr
 =============================================
-The repmgr 4 FAQ is located here: [repmgr FAQ (Frequently Asked Questions)](https://repmgr.org/docs/current/appendix-faq.html "repmgr FAQ")
+The repmgr 4 FAQ is located here: [repmgr FAQ (Frequently Asked Questions)](https://repmgr.org/docs/4.0/appendix-faq.html "repmgr FAQ")
 The repmgr 3.x FAQ can be found here:
 https://github.com/2ndQuadrant/repmgr/blob/REL3_3_STABLE/FAQ.md
-Note that repmgr 3.x is no longer supported.

55
HISTORY

@@ -1,58 +1,3 @@
4.2 2018-10-24
repmgr: add parameter "shutdown_check_timeout" for use by "standby switchover";
GitHub #504 (Ian)
repmgr: add "--node-id" option to "repmgr cluster cleanup"; GitHub #493 (Ian)
repmgr: report unreachable nodes when running "repmgr cluster (matrix|crosscheck);
GitHub #246 (Ian)
repmgr: add configuration file parameter "repmgr_bindir"; GitHub #246 (Ian)
repmgr: fix "Missing replication slots" label in "node check"; GitHub #507 (Ian)
repmgrd: fix parsing of -d/--daemonize option (Ian)
repmgrd: support "pausing" of repmgrd (Ian)
4.1.1 2018-09-05
logging: explicitly log the text of failed queries as ERRORs to
assist logfile analysis; GitHub #498
repmgr: truncate version string, if necessary; GitHub #490 (Ian)
repmgr: improve messages emitted during "standby promote" (Ian)
repmgr: "standby clone" - don't copy external config files in --dry-run
mode; GitHub #491 (Ian)
repmgr: add "cluster_cleanup" event; GitHub #492 (Ian)
repmgr: (standby switchover) improve detection of free walsenders;
GitHub #495 (Ian)
repmgr: (node rejoin) improve replication slot handling; GitHub #499 (Ian)
repmgrd: ensure that sending SIGHUP always results in the log file
being reopened; GitHub #485 (Ian)
repmgrd: report version number *after* logger initialisation; GitHub #487 (Ian)
repmgrd: fix startup on witness node when local data is stale; GitHub #488/#489 (Ian)
repmgrd: improve cascaded standby failover handling; GitHub #480 (Ian)
repmgrd: improve reconnection handling (Ian)
4.1.0 2018-07-31
repmgr: change default log_level to INFO, add documentation; GitHub #470 (Ian)
repmgr: add "--missing-slots" check to "repmgr node check" (Ian)
repmgr: improve command line error handling; GitHub #464 (Ian)
repmgr: fix "standby register --wait-sync" when no timeout provided (Ian)
repmgr: "cluster show" returns non-zero value if an issue encountered;
GitHub #456 (Ian)
repmgr: "node check" and "node status" returns non-zero value if an issue
encountered (Ian)
repmgr: add CSV output mode to "cluster event"; GitHub #471 (Ian)
repmgr: add -q/--quiet option to suppress non-error output; GitHub #468 (Ian)
repmgr: "node status" returns non-zero value if an issue encountered (Ian)
repmgr: enable "recovery_min_apply_delay" to be 0; GitHub #448 (Ian)
repmgr: "cluster cleanup" - add missing help options; GitHub #461/#462 (gclough)
repmgr: ensure witness node follows new primary after switchover;
GitHub #453 (Ian)
repmgr: fix witness node handling in "node check"/"node status";
GitHub #451 (Ian)
repmgr: fix "primary_slot_name" when using "standby clone" with --recovery-conf-only;
GitHub #474 (Ian)
repmgr: don't perform a switchover if an exclusive backup is running;
GitHub #476 (Martín)
repmgr: enable "witness unregister" to be run on any node; GitHub #472 (Ian)
repmgrd: create a PID file by default; GitHub #457 (Ian)
repmgrd: daemonize process by default; GitHub #458 (Ian)
4.0.6 2018-06-14 4.0.6 2018-06-14
repmgr: (witness register) prevent registration of a witness server with the repmgr: (witness register) prevent registration of a witness server with the
same name as an existing node (Ian) same name as an existing node (Ian)


@@ -11,11 +11,7 @@ EXTENSION = repmgr
DATA = \ DATA = \
repmgr--unpackaged--4.0.sql \ repmgr--unpackaged--4.0.sql \
repmgr--4.0.sql \ repmgr--4.0.sql
repmgr--4.0--4.1.sql \
repmgr--4.1.sql \
repmgr--4.1--4.2.sql \
repmgr--4.2.sql
REGRESS = repmgr_extension REGRESS = repmgr_extension
@@ -30,24 +26,19 @@ all: \
PG_CPPFLAGS = -std=gnu89 -I$(includedir_internal) -I$(libpq_srcdir) -Wall -Wmissing-prototypes -Wmissing-declarations $(EXTRA_CFLAGS) PG_CPPFLAGS = -std=gnu89 -I$(includedir_internal) -I$(libpq_srcdir) -Wall -Wmissing-prototypes -Wmissing-declarations $(EXTRA_CFLAGS)
SHLIB_LINK = $(libpq) SHLIB_LINK = $(libpq)
HEADERS = $(wildcard *.h)
OBJS = \ OBJS = \
repmgr.o repmgr.o
include Makefile.global include Makefile.global
ifeq ($(vpath_build),yes)
HEADERS = $(wildcard *.h)
else
HEADERS_built = $(wildcard *.h)
endif
$(info Building against PostgreSQL $(MAJORVERSION)) $(info Building against PostgreSQL $(MAJORVERSION))
REPMGR_CLIENT_OBJS = repmgr-client.o \ REPMGR_CLIENT_OBJS = repmgr-client.o \
repmgr-action-primary.o repmgr-action-standby.o repmgr-action-witness.o \ repmgr-action-primary.o repmgr-action-standby.o repmgr-action-witness.o \
repmgr-action-bdr.o repmgr-action-cluster.o repmgr-action-node.o repmgr-action-daemon.o \ repmgr-action-bdr.o repmgr-action-cluster.o repmgr-action-node.o \
configfile.o log.o strutil.o controldata.o dirutil.o compat.o dbutils.o configfile.o log.o strutil.o controldata.o dirutil.o compat.o dbutils.o
REPMGRD_OBJS = repmgrd.o repmgrd-physical.o repmgrd-bdr.o configfile.o log.o dbutils.o strutil.o controldata.o compat.o REPMGRD_OBJS = repmgrd.o repmgrd-physical.o repmgrd-bdr.o configfile.o log.o dbutils.o strutil.o controldata.o compat.o
DATE=$(shell date "+%Y-%m-%d") DATE=$(shell date "+%Y-%m-%d")
@@ -91,7 +82,6 @@ additional-clean:
rm -f repmgr-action-bdr.o rm -f repmgr-action-bdr.o
rm -f repmgr-action-node.o rm -f repmgr-action-node.o
rm -f repmgr-action-cluster.o rm -f repmgr-action-cluster.o
rm -f repmgr-action-daemon.o
rm -f repmgrd.o rm -f repmgrd.o
rm -f repmgrd-physical.o rm -f repmgrd-physical.o
rm -f repmgrd-bdr.o rm -f repmgrd-bdr.o


@@ -10,7 +10,7 @@ operations.
`repmgr 4` is a complete rewrite of the existing `repmgr` codebase, allowing `repmgr 4` is a complete rewrite of the existing `repmgr` codebase, allowing
the use of all of the latest features in PostgreSQL replication. the use of all of the latest features in PostgreSQL replication.
PostgreSQL 11, 10, 9.6 and 9.5 are fully supported. PostgreSQL 10, 9.6 and 9.5 are fully supported.
PostgreSQL 9.4 and 9.3 are supported, with some restrictions. PostgreSQL 9.4 and 9.3 are supported, with some restrictions.
`repmgr` is distributed under the GNU GPL 3 and maintained by 2ndQuadrant. `repmgr` is distributed under the GNU GPL 3 and maintained by 2ndQuadrant.
@@ -19,7 +19,7 @@ PostgreSQL 9.4 and 9.3 are supported, with some restrictions.
`repmgr 4` supports monitoring of a two-node BDR 2.0 cluster on PostgreSQL 9.6 `repmgr 4` supports monitoring of a two-node BDR 2.0 cluster on PostgreSQL 9.6
only. Note that BDR 2.0 is not publicly available; please contact 2ndQuadrant only. Note that BDR 2.0 is not publicly available; please contact 2ndQuadrant
for details. for details. `repmgr 4` will support future public BDR releases.
Documentation Documentation
@@ -27,7 +27,7 @@ Documentation
The main `repmgr` documentation is available here: The main `repmgr` documentation is available here:
> [repmgr 4 documentation](https://repmgr.org/docs/4.2/index.html) > [repmgr 4 documentation](https://repmgr.org/docs/4.0/index.html)
The `README` file for `repmgr` 3.x is available here: The `README` file for `repmgr` 3.x is available here:


@@ -28,8 +28,10 @@ char config_file_path[MAXPGPATH] = "";
static bool config_file_provided = false; static bool config_file_provided = false;
bool config_file_found = false; bool config_file_found = false;
static void parse_config(t_configuration_options *options, bool terse);
static void _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *warning_list); static void _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *warning_list);
static bool parse_bool(const char *s,
const char *config_item,
ItemList *error_list);
static void _parse_line(char *buf, char *name, char *value); static void _parse_line(char *buf, char *name, char *value);
static void parse_event_notifications_list(t_configuration_options *options, const char *arg); static void parse_event_notifications_list(t_configuration_options *options, const char *arg);
@@ -88,7 +90,8 @@ load_config(const char *config_file, bool verbose, bool terse, t_configuration_o
if (pwd != NULL) if (pwd != NULL)
{ {
appendPQExpBufferStr(&fullpath, pwd); appendPQExpBuffer(&fullpath,
"%s", pwd);
} }
else else
{ {
@@ -104,7 +107,9 @@ load_config(const char *config_file, bool verbose, bool terse, t_configuration_o
exit(ERR_BAD_CONFIG); exit(ERR_BAD_CONFIG);
} }
appendPQExpBufferStr(&fullpath, cwd); appendPQExpBuffer(&fullpath,
"%s",
cwd);
} }
appendPQExpBuffer(&fullpath, appendPQExpBuffer(&fullpath,
@@ -236,7 +241,7 @@ end_search:
} }
static void void
parse_config(t_configuration_options *options, bool terse) parse_config(t_configuration_options *options, bool terse)
{ {
/* Collate configuration file errors here for friendlier reporting */ /* Collate configuration file errors here for friendlier reporting */
@@ -285,7 +290,6 @@ _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *
memset(options->data_directory, 0, sizeof(options->data_directory)); memset(options->data_directory, 0, sizeof(options->data_directory));
memset(options->config_directory, 0, sizeof(options->data_directory)); memset(options->config_directory, 0, sizeof(options->data_directory));
memset(options->pg_bindir, 0, sizeof(options->pg_bindir)); memset(options->pg_bindir, 0, sizeof(options->pg_bindir));
memset(options->repmgr_bindir, 0, sizeof(options->repmgr_bindir));
options->replication_type = REPLICATION_TYPE_PHYSICAL; options->replication_type = REPLICATION_TYPE_PHYSICAL;
/*------------- /*-------------
@@ -329,13 +333,6 @@ _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *
options->primary_follow_timeout = DEFAULT_PRIMARY_FOLLOW_TIMEOUT; options->primary_follow_timeout = DEFAULT_PRIMARY_FOLLOW_TIMEOUT;
options->standby_follow_timeout = DEFAULT_STANDBY_FOLLOW_TIMEOUT; options->standby_follow_timeout = DEFAULT_STANDBY_FOLLOW_TIMEOUT;
/*------------------------
* standby switchover settings
*------------------------
*/
options->shutdown_check_timeout = DEFAULT_SHUTDOWN_CHECK_TIMEOUT;
options->standby_reconnect_timeout = DEFAULT_STANDBY_RECONNECT_TIMEOUT;
/*----------------- /*-----------------
* repmgrd settings * repmgrd settings
*----------------- *-----------------
@@ -355,8 +352,7 @@ _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *
options->degraded_monitoring_timeout = -1; options->degraded_monitoring_timeout = -1;
options->async_query_timeout = DEFAULT_ASYNC_QUERY_TIMEOUT; options->async_query_timeout = DEFAULT_ASYNC_QUERY_TIMEOUT;
options->primary_notification_timeout = DEFAULT_PRIMARY_NOTIFICATION_TIMEOUT; options->primary_notification_timeout = DEFAULT_PRIMARY_NOTIFICATION_TIMEOUT;
options->repmgrd_standby_startup_timeout = -1; /* defaults to "standby_reconnect_timeout" if not set */ options->standby_reconnect_timeout = DEFAULT_STANDBY_RECONNECT_TIMEOUT;
memset(options->repmgrd_pid_file, 0, sizeof(options->repmgrd_pid_file));
/*------------- /*-------------
* witness settings * witness settings
@@ -488,8 +484,6 @@ _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *
} }
else if (strcmp(name, "pg_bindir") == 0) else if (strcmp(name, "pg_bindir") == 0)
strncpy(options->pg_bindir, value, MAXPGPATH); strncpy(options->pg_bindir, value, MAXPGPATH);
else if (strcmp(name, "repmgr_bindir") == 0)
strncpy(options->repmgr_bindir, value, MAXPGPATH);
else if (strcmp(name, "replication_type") == 0) else if (strcmp(name, "replication_type") == 0)
{ {
@@ -545,16 +539,6 @@ _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *
else if (strcmp(name, "standby_follow_timeout") == 0) else if (strcmp(name, "standby_follow_timeout") == 0)
options->standby_follow_timeout = repmgr_atoi(value, name, error_list, 0); options->standby_follow_timeout = repmgr_atoi(value, name, error_list, 0);
/* standby switchover settings */
else if (strcmp(name, "shutdown_check_timeout") == 0)
options->shutdown_check_timeout = repmgr_atoi(value, name, error_list, 0);
else if (strcmp(name, "standby_reconnect_timeout") == 0)
options->standby_reconnect_timeout = repmgr_atoi(value, name, error_list, 0);
/* node rejoin settings */
else if (strcmp(name, "node_rejoin_timeout") == 0)
options->node_rejoin_timeout = repmgr_atoi(value, name, error_list, 0);
/* node check settings */ /* node check settings */
else if (strcmp(name, "archive_ready_warning") == 0) else if (strcmp(name, "archive_ready_warning") == 0)
options->archive_ready_warning = repmgr_atoi(value, name, error_list, 1); options->archive_ready_warning = repmgr_atoi(value, name, error_list, 1);
@@ -604,10 +588,8 @@ _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *
options->async_query_timeout = repmgr_atoi(value, name, error_list, 0); options->async_query_timeout = repmgr_atoi(value, name, error_list, 0);
else if (strcmp(name, "primary_notification_timeout") == 0) else if (strcmp(name, "primary_notification_timeout") == 0)
options->primary_notification_timeout = repmgr_atoi(value, name, error_list, 0); options->primary_notification_timeout = repmgr_atoi(value, name, error_list, 0);
else if (strcmp(name, "repmgrd_standby_startup_timeout") == 0) else if (strcmp(name, "standby_reconnect_timeout") == 0)
options->repmgrd_standby_startup_timeout = repmgr_atoi(value, name, error_list, 0); options->standby_reconnect_timeout = repmgr_atoi(value, name, error_list, 0);
else if (strcmp(name, "repmgrd_pid_file") == 0)
strncpy(options->repmgrd_pid_file, value, MAXPGPATH);
/* witness settings */ /* witness settings */
else if (strcmp(name, "witness_sync_interval") == 0) else if (strcmp(name, "witness_sync_interval") == 0)
@@ -789,17 +771,6 @@ _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *
PQconninfoFree(conninfo_options); PQconninfoFree(conninfo_options);
} }
/* set values for parameters which default to other parameters */
/*
* From 4.1, "repmgrd_standby_startup_timeout" replaces "standby_reconnect_timeout"
* in repmgrd; fall back to "standby_reconnect_timeout" if no value explicitly provided
*/
if (options->repmgrd_standby_startup_timeout == -1)
{
options->repmgrd_standby_startup_timeout = options->standby_reconnect_timeout;
}
/* add warning about changed "barman_" parameter meanings */ /* add warning about changed "barman_" parameter meanings */
if ((options->barman_host[0] == '\0' && options->barman_server[0] != '\0') || if ((options->barman_host[0] == '\0' && options->barman_server[0] != '\0') ||
(options->barman_host[0] != '\0' && options->barman_server[0] == '\0')) (options->barman_host[0] != '\0' && options->barman_server[0] == '\0'))
@@ -816,19 +787,13 @@ _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *
if (options->archive_ready_warning >= options->archive_ready_critical) if (options->archive_ready_warning >= options->archive_ready_critical)
{ {
item_list_append(error_list, item_list_append(error_list,
_("\"archive_ready_critical\" must be greater than \"archive_ready_warning\"")); _("\archive_ready_critical\" must be greater than \"archive_ready_warning\""));
} }
if (options->replication_lag_warning >= options->replication_lag_critical) if (options->replication_lag_warning >= options->replication_lag_critical)
{ {
item_list_append(error_list, item_list_append(error_list,
_("\"replication_lag_critical\" must be greater than \"replication_lag_warning\"")); _("\replication_lag_critical\" must be greater than \"replication_lag_warning\""));
}
if (options->standby_reconnect_timeout < options->node_rejoin_timeout)
{
item_list_append(error_list,
_("\"standby_reconnect_timeout\" must be equal to or greater than \"node_rejoin_timeout\""));
} }
} }
@@ -994,11 +959,12 @@ parse_time_unit_parameter(const char *name, const char *value, char *dest, ItemL
char *ptr = NULL; char *ptr = NULL;
int targ = strtol(value, &ptr, 10); int targ = strtol(value, &ptr, 10);
if (targ < 0) if (targ < 1)
{ {
if (errors != NULL) if (errors != NULL)
{ {
item_list_append_format(errors, item_list_append_format(
errors,
_("invalid value provided for \"%s\""), _("invalid value provided for \"%s\""),
name); name);
} }
@@ -1052,16 +1018,13 @@ parse_time_unit_parameter(const char *name, const char *value, char *dest, ItemL
* - promote_delay * - promote_delay
* - reconnect_attempts * - reconnect_attempts
* - reconnect_interval * - reconnect_interval
* - repmgrd_standby_startup_timeout
* - retry_promote_interval_secs * - retry_promote_interval_secs
* *
* non-changeable options (repmgrd references these from the "repmgr.nodes" * non-changeable options
* table, not the configuration file)
* *
* - node_id * - node_id
* - node_name * - node_name
* - data_directory * - data_directory
* - location
* - priority * - priority
* - replication_type * - replication_type
* *
@@ -1070,7 +1033,7 @@ parse_time_unit_parameter(const char *name, const char *value, char *dest, ItemL
*/ */
bool bool
reload_config(t_configuration_options *orig_options, t_server_type server_type) reload_config(t_configuration_options *orig_options)
{ {
PGconn *conn; PGconn *conn;
t_configuration_options new_options = T_CONFIGURATION_OPTIONS_INITIALIZER; t_configuration_options new_options = T_CONFIGURATION_OPTIONS_INITIALIZER;
@@ -1080,50 +1043,17 @@ reload_config(t_configuration_options *orig_options, t_server_type server_type)
static ItemList config_errors = {NULL, NULL}; static ItemList config_errors = {NULL, NULL};
static ItemList config_warnings = {NULL, NULL}; static ItemList config_warnings = {NULL, NULL};
PQExpBufferData errors;
log_info(_("reloading configuration file")); log_info(_("reloading configuration file"));
_parse_config(&new_options, &config_errors, &config_warnings); _parse_config(&new_options, &config_errors, &config_warnings);
if (server_type == PRIMARY || server_type == STANDBY)
{
if (new_options.promote_command[0] == '\0')
{
item_list_append(&config_errors, _("\"promote_command\": required parameter was not found"));
}
if (new_options.follow_command[0] == '\0')
{
item_list_append(&config_errors, _("\"follow_command\": required parameter was not found"));
}
}
if (config_errors.head != NULL) if (config_errors.head != NULL)
{ {
ItemListCell *cell = NULL; /* XXX dump errors to log */
log_warning(_("unable to parse new configuration, retaining current configuration")); log_warning(_("unable to parse new configuration, retaining current configuration"));
initPQExpBuffer(&errors);
appendPQExpBufferStr(&errors,
"following errors were detected:\n");
for (cell = config_errors.head; cell; cell = cell->next)
{
appendPQExpBuffer(&errors,
" %s\n", cell->string);
}
log_detail("%s", errors.data);
termPQExpBuffer(&errors);
return false; return false;
} }
/* The following options cannot be changed */ /* The following options cannot be changed */
if (new_options.node_id != orig_options->node_id) if (new_options.node_id != orig_options->node_id)
@@ -1277,7 +1207,7 @@ reload_config(t_configuration_options *orig_options, t_server_type server_type)
config_changed = true; config_changed = true;
} }
/* promote_delay (for testing use only; not documented */ /* promote_delay */
if (orig_options->promote_delay != new_options.promote_delay) if (orig_options->promote_delay != new_options.promote_delay)
{ {
orig_options->promote_delay = new_options.promote_delay; orig_options->promote_delay = new_options.promote_delay;
@@ -1304,15 +1234,6 @@ reload_config(t_configuration_options *orig_options, t_server_type server_type)
config_changed = true; config_changed = true;
} }
/* repmgrd_standby_startup_timeout */
if (orig_options->repmgrd_standby_startup_timeout != new_options.repmgrd_standby_startup_timeout)
{
orig_options->repmgrd_standby_startup_timeout = new_options.repmgrd_standby_startup_timeout;
log_info(_("\"repmgrd_standby_startup_timeout\" is now \"%i\""), new_options.repmgrd_standby_startup_timeout);
config_changed = true;
}
/* /*
* Handle changes to logging configuration * Handle changes to logging configuration
*/ */
@@ -1405,23 +1326,13 @@ exit_with_config_file_errors(ItemList *config_errors, ItemList *config_warnings,
void void
exit_with_cli_errors(ItemList *error_list, const char *repmgr_command) exit_with_cli_errors(ItemList *error_list)
{ {
fprintf(stderr, _("The following command line errors were encountered:\n")); fprintf(stderr, _("The following command line errors were encountered:\n"));
print_item_list(error_list); print_item_list(error_list);
if (repmgr_command != NULL)
{
fprintf(stderr, _("Try \"%s --help\" or \"%s %s --help\" for more information.\n"),
progname(),
progname(),
repmgr_command);
}
else
{
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname()); fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname());
}
exit(ERR_BAD_CONFIG); exit(ERR_BAD_CONFIG);
} }
@@ -1526,14 +1437,11 @@ repmgr_atoi(const char *value, const char *config_item, ItemList *error_list, in
* *
* https://www.postgresql.org/docs/current/static/config-setting.html * https://www.postgresql.org/docs/current/static/config-setting.html
*/ */
bool static bool
parse_bool(const char *s, const char *config_item, ItemList *error_list) parse_bool(const char *s, const char *config_item, ItemList *error_list)
{ {
PQExpBufferData errors; PQExpBufferData errors;
if (s == NULL)
return true;
if (strcasecmp(s, "0") == 0) if (strcasecmp(s, "0") == 0)
return false; return false;
@@ -1815,9 +1723,6 @@ free_parsed_argv(char ***argv_array)
} }
bool bool
parse_pg_basebackup_options(const char *pg_basebackup_options, t_basebackup_options *backup_options, int server_version_num, ItemList *error_list) parse_pg_basebackup_options(const char *pg_basebackup_options, t_basebackup_options *backup_options, int server_version_num, ItemList *error_list)
{ {
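
The hunks above that exist only on the 4.2 side of this comparison initialise "repmgrd_standby_startup_timeout" to -1 and, once parsing is complete, fall back to "standby_reconnect_timeout" if no value was provided. A minimal sketch of that sentinel-and-fallback pattern follows. The `demo_` names, the trimmed struct and the default of 60 are assumptions made for illustration; they are not taken from repmgr.

```c
/*
 * Hypothetical stand-in for the two fields involved; this is not the
 * real t_configuration_options struct.
 */
typedef struct
{
    int standby_reconnect_timeout;
    int repmgrd_standby_startup_timeout;    /* -1 means "not set" */
} demo_options;

/* Assumed value, for illustration only. */
#define DEMO_DEFAULT_STANDBY_RECONNECT_TIMEOUT 60

/* Set defaults before the configuration file is read. */
static void
demo_init(demo_options *options)
{
    options->standby_reconnect_timeout = DEMO_DEFAULT_STANDBY_RECONNECT_TIMEOUT;
    options->repmgrd_standby_startup_timeout = -1;
}

/* Run after parsing: resolve parameters which default to other parameters. */
static void
demo_resolve_defaults(demo_options *options)
{
    if (options->repmgrd_standby_startup_timeout == -1)
    {
        options->repmgrd_standby_startup_timeout = options->standby_reconnect_timeout;
    }
}
```

Resolving the fallback after the whole file has been parsed means the result does not depend on the order in which the two parameters appear in the configuration file.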


@@ -75,7 +75,6 @@ typedef struct
char data_directory[MAXPGPATH]; char data_directory[MAXPGPATH];
char config_directory[MAXPGPATH]; char config_directory[MAXPGPATH];
char pg_bindir[MAXPGPATH]; char pg_bindir[MAXPGPATH];
char repmgr_bindir[MAXPGPATH];
int replication_type; int replication_type;
/* log settings */ /* log settings */
@@ -103,13 +102,6 @@ typedef struct
int primary_follow_timeout; int primary_follow_timeout;
int standby_follow_timeout; int standby_follow_timeout;
/* standby switchover settings */
int shutdown_check_timeout;
int standby_reconnect_timeout;
/* node rejoin settings */
int node_rejoin_timeout;
/* node check settings */ /* node check settings */
int archive_ready_warning; int archive_ready_warning;
int archive_ready_critical; int archive_ready_critical;
@@ -132,8 +124,7 @@ typedef struct
int degraded_monitoring_timeout; int degraded_monitoring_timeout;
int async_query_timeout; int async_query_timeout;
int primary_notification_timeout; int primary_notification_timeout;
int repmgrd_standby_startup_timeout; int standby_reconnect_timeout;
char repmgrd_pid_file[MAXPGPATH];
/* BDR settings */ /* BDR settings */
bool bdr_local_monitoring_only; bool bdr_local_monitoring_only;
@@ -172,7 +163,7 @@ typedef struct
#define T_CONFIGURATION_OPTIONS_INITIALIZER { \ #define T_CONFIGURATION_OPTIONS_INITIALIZER { \
/* node information */ \ /* node information */ \
UNKNOWN_NODE_ID, "", "", "", "", "", "", "", REPLICATION_TYPE_PHYSICAL, \ UNKNOWN_NODE_ID, "", "", "", "", "", "", REPLICATION_TYPE_PHYSICAL, \
/* log settings */ \ /* log settings */ \
"", "", "", DEFAULT_LOG_STATUS_INTERVAL, \ "", "", "", DEFAULT_LOG_STATUS_INTERVAL, \
/* standby clone settings */ \ /* standby clone settings */ \
@@ -182,11 +173,6 @@ typedef struct
/* standby follow settings */ \ /* standby follow settings */ \
DEFAULT_PRIMARY_FOLLOW_TIMEOUT, \ DEFAULT_PRIMARY_FOLLOW_TIMEOUT, \
DEFAULT_STANDBY_FOLLOW_TIMEOUT, \ DEFAULT_STANDBY_FOLLOW_TIMEOUT, \
/* standby switchover settings */ \
DEFAULT_SHUTDOWN_CHECK_TIMEOUT, \
DEFAULT_STANDBY_RECONNECT_TIMEOUT, \
/* node rejoin settings */ \
DEFAULT_NODE_REJOIN_TIMEOUT, \
/* node check settings */ \ /* node check settings */ \
DEFAULT_ARCHIVE_READY_WARNING, DEFAULT_ARCHIVE_READY_CRITICAL, \ DEFAULT_ARCHIVE_READY_WARNING, DEFAULT_ARCHIVE_READY_CRITICAL, \
DEFAULT_REPLICATION_LAG_WARNING, DEFAULT_REPLICATION_LAG_CRITICAL, \ DEFAULT_REPLICATION_LAG_WARNING, DEFAULT_REPLICATION_LAG_CRITICAL, \
@@ -200,7 +186,7 @@ typedef struct
false, -1, \ false, -1, \
DEFAULT_ASYNC_QUERY_TIMEOUT, \ DEFAULT_ASYNC_QUERY_TIMEOUT, \
DEFAULT_PRIMARY_NOTIFICATION_TIMEOUT, \ DEFAULT_PRIMARY_NOTIFICATION_TIMEOUT, \
-1, "", \ DEFAULT_STANDBY_RECONNECT_TIMEOUT, \
/* BDR settings */ \ /* BDR settings */ \
false, DEFAULT_BDR_RECOVERY_TIMEOUT, \ false, DEFAULT_BDR_RECOVERY_TIMEOUT, \
/* service settings */ \ /* service settings */ \
@@ -276,20 +262,16 @@ typedef struct
"", "", "", "" \ "", "", "", "" \
} }
#include "dbutils.h"
void set_progname(const char *argv0); void set_progname(const char *argv0);
const char *progname(void); const char *progname(void);
void load_config(const char *config_file, bool verbose, bool terse, t_configuration_options *options, char *argv0); void load_config(const char *config_file, bool verbose, bool terse, t_configuration_options *options, char *argv0);
bool reload_config(t_configuration_options *orig_options, t_server_type server_type); void parse_config(t_configuration_options *options, bool terse);
bool reload_config(t_configuration_options *orig_options);
bool parse_recovery_conf(const char *data_dir, t_recovery_conf *conf); bool parse_recovery_conf(const char *data_dir, t_recovery_conf *conf);
bool parse_bool(const char *s,
const char *config_item,
ItemList *error_list);
int repmgr_atoi(const char *s, int repmgr_atoi(const char *s,
const char *config_item, const char *config_item,
ItemList *error_list, ItemList *error_list,
@@ -305,7 +287,7 @@ void free_parsed_argv(char ***argv_array);
/* called by repmgr-client and repmgrd */ /* called by repmgr-client and repmgrd */
void exit_with_cli_errors(ItemList *error_list, const char *repmgr_command); void exit_with_cli_errors(ItemList *error_list);
void print_item_list(ItemList *item_list); void print_item_list(ItemList *item_list);
#endif /* _REPMGR_CONFIGFILE_H_ */ #endif /* _REPMGR_CONFIGFILE_H_ */
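
The comment above parse_bool() in configfile.c points at the PostgreSQL "config-setting" documentation, which lists on, off, true, false, yes, no, 1 and 0 (case-insensitive) as boolean values. The sketch below shows one way such a parser could look. It is not repmgr's parse_bool(): `demo_parse_bool` and its `valid` out-parameter are hypothetical, it does not use an ItemList, and it does not handle the unambiguous prefixes PostgreSQL also accepts.

```c
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>    /* strcasecmp (POSIX) */

/*
 * Hypothetical helper: returns the parsed value and sets *valid to false
 * if the string is not a recognised boolean. A NULL string is treated as
 * true, mirroring the NULL check present on the 4.2 side of the diff.
 */
static bool
demo_parse_bool(const char *s, bool *valid)
{
    static const char *const true_values[] = {"1", "on", "true", "yes"};
    static const char *const false_values[] = {"0", "off", "false", "no"};
    size_t i;

    *valid = true;

    if (s == NULL)
        return true;

    for (i = 0; i < sizeof(true_values) / sizeof(true_values[0]); i++)
    {
        if (strcasecmp(s, true_values[i]) == 0)
            return true;
    }

    for (i = 0; i < sizeof(false_values) / sizeof(false_values[0]); i++)
    {
        if (strcasecmp(s, false_values[i]) == 0)
            return false;
    }

    /* Not a recognised boolean: report it rather than guessing. */
    *valid = false;
    return false;
}
```

A caller would check `valid` after each call and append its own error message, much as the real function appends to its error list.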

18
configure vendored

@@ -1,6 +1,6 @@
#! /bin/sh #! /bin/sh
# Guess values for system-dependent variables and create Makefiles. # Guess values for system-dependent variables and create Makefiles.
# Generated by GNU Autoconf 2.69 for repmgr 4.2. # Generated by GNU Autoconf 2.69 for repmgr 4.0.5.
# #
# Report bugs to <pgsql-bugs@postgresql.org>. # Report bugs to <pgsql-bugs@postgresql.org>.
# #
@@ -582,8 +582,8 @@ MAKEFLAGS=
# Identity of this package. # Identity of this package.
PACKAGE_NAME='repmgr' PACKAGE_NAME='repmgr'
PACKAGE_TARNAME='repmgr' PACKAGE_TARNAME='repmgr'
PACKAGE_VERSION='4.2' PACKAGE_VERSION='4.0.5'
PACKAGE_STRING='repmgr 4.2' PACKAGE_STRING='repmgr 4.0.5'
PACKAGE_BUGREPORT='pgsql-bugs@postgresql.org' PACKAGE_BUGREPORT='pgsql-bugs@postgresql.org'
PACKAGE_URL='https://2ndquadrant.com/en/resources/repmgr/' PACKAGE_URL='https://2ndquadrant.com/en/resources/repmgr/'
@@ -1178,7 +1178,7 @@ if test "$ac_init_help" = "long"; then
# Omit some internal or obsolete options to make the list less imposing. # Omit some internal or obsolete options to make the list less imposing.
# This message is too long to be a string in the A/UX 3.1 sh. # This message is too long to be a string in the A/UX 3.1 sh.
cat <<_ACEOF cat <<_ACEOF
\`configure' configures repmgr 4.2 to adapt to many kinds of systems. \`configure' configures repmgr 4.0.5 to adapt to many kinds of systems.
Usage: $0 [OPTION]... [VAR=VALUE]... Usage: $0 [OPTION]... [VAR=VALUE]...
@@ -1239,7 +1239,7 @@ fi
if test -n "$ac_init_help"; then if test -n "$ac_init_help"; then
case $ac_init_help in case $ac_init_help in
short | recursive ) echo "Configuration of repmgr 4.2:";; short | recursive ) echo "Configuration of repmgr 4.0.5:";;
esac esac
cat <<\_ACEOF cat <<\_ACEOF
@@ -1313,7 +1313,7 @@ fi
test -n "$ac_init_help" && exit $ac_status test -n "$ac_init_help" && exit $ac_status
if $ac_init_version; then if $ac_init_version; then
cat <<\_ACEOF cat <<\_ACEOF
repmgr configure 4.2 repmgr configure 4.0.5
generated by GNU Autoconf 2.69 generated by GNU Autoconf 2.69
Copyright (C) 2012 Free Software Foundation, Inc. Copyright (C) 2012 Free Software Foundation, Inc.
@@ -1332,7 +1332,7 @@ cat >config.log <<_ACEOF
This file contains any messages produced by compilers while This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake. running configure, to aid debugging if configure makes a mistake.
It was created by repmgr $as_me 4.2, which was It was created by repmgr $as_me 4.0.5, which was
generated by GNU Autoconf 2.69. Invocation command line was generated by GNU Autoconf 2.69. Invocation command line was
$ $0 $@ $ $0 $@
@@ -2359,7 +2359,7 @@ cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
# report actual input values of CONFIG_FILES etc. instead of their # report actual input values of CONFIG_FILES etc. instead of their
# values after options handling. # values after options handling.
ac_log=" ac_log="
This file was extended by repmgr $as_me 4.2, which was This file was extended by repmgr $as_me 4.0.5, which was
generated by GNU Autoconf 2.69. Invocation command line was generated by GNU Autoconf 2.69. Invocation command line was
CONFIG_FILES = $CONFIG_FILES CONFIG_FILES = $CONFIG_FILES
@@ -2422,7 +2422,7 @@ _ACEOF
cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1 cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
ac_cs_config="`$as_echo "$ac_configure_args" | sed 's/^ //; s/[\\""\`\$]/\\\\&/g'`" ac_cs_config="`$as_echo "$ac_configure_args" | sed 's/^ //; s/[\\""\`\$]/\\\\&/g'`"
ac_cs_version="\\ ac_cs_version="\\
repmgr config.status 4.2 repmgr config.status 4.0.5
configured by $0, generated by GNU Autoconf 2.69, configured by $0, generated by GNU Autoconf 2.69,
with options \\"\$ac_cs_config\\" with options \\"\$ac_cs_config\\"


@@ -1,4 +1,4 @@
AC_INIT([repmgr], [4.2], [pgsql-bugs@postgresql.org], [repmgr], [https://2ndquadrant.com/en/resources/repmgr/]) AC_INIT([repmgr], [4.0.6], [pgsql-bugs@postgresql.org], [repmgr], [https://2ndquadrant.com/en/resources/repmgr/])
AC_COPYRIGHT([Copyright (c) 2010-2018, 2ndQuadrant Ltd.]) AC_COPYRIGHT([Copyright (c) 2010-2018, 2ndQuadrant Ltd.])


@@ -227,15 +227,7 @@ get_controlfile(const char *DataDir)
control_file_info->control_file_processed = true; control_file_info->control_file_processed = true;
if (version_num >= 110000) if (version_num >= 90500)
{
ControlFileData11 *ptr = (struct ControlFileData11 *)ControlFileDataPtr;
control_file_info->system_identifier = ptr->system_identifier;
control_file_info->state = ptr->state;
control_file_info->checkPoint = ptr->checkPoint;
control_file_info->data_checksum_version = ptr->data_checksum_version;
}
else if (version_num >= 90500)
{ {
ControlFileData95 *ptr = (struct ControlFileData95 *)ControlFileDataPtr; ControlFileData95 *ptr = (struct ControlFileData95 *)ControlFileDataPtr;
control_file_info->system_identifier = ptr->system_identifier; control_file_info->system_identifier = ptr->system_identifier;


@@ -265,71 +265,6 @@ typedef struct ControlFileData95
} ControlFileData95; } ControlFileData95;
/*
* Following field removed in 11:
*
* XLogRecPtr prevCheckPoint;
*
* In 10, following field appended *after* "data_checksum_version":
*
* char mock_authentication_nonce[MOCK_AUTH_NONCE_LEN];
*
* (but we don't care about that)
*/
typedef struct ControlFileData11
{
uint64 system_identifier;
uint32 pg_control_version; /* PG_CONTROL_VERSION */
uint32 catalog_version_no; /* see catversion.h */
DBState state; /* see enum above */
pg_time_t time; /* time stamp of last pg_control update */
XLogRecPtr checkPoint; /* last check point record ptr */
CheckPoint95 checkPointCopy; /* copy of last check point record */
XLogRecPtr unloggedLSN; /* current fake LSN value, for unlogged rels */
XLogRecPtr minRecoveryPoint;
TimeLineID minRecoveryPointTLI;
XLogRecPtr backupStartPoint;
XLogRecPtr backupEndPoint;
bool backupEndRequired;
int wal_level;
bool wal_log_hints;
int MaxConnections;
int max_worker_processes;
int max_prepared_xacts;
int max_locks_per_xact;
bool track_commit_timestamp;
uint32 maxAlign; /* alignment requirement for tuples */
double floatFormat; /* constant 1234567.0 */
uint32 blcksz; /* data block size for this DB */
uint32 relseg_size; /* blocks per segment of large relation */
uint32 xlog_blcksz; /* block size within WAL files */
uint32 xlog_seg_size; /* size of each WAL segment */
uint32 nameDataLen; /* catalog name field width */
uint32 indexMaxKeys; /* max number of columns in an index */
uint32 toast_max_chunk_size; /* chunk size in TOAST tables */
uint32 loblksize; /* chunk size in pg_largeobject */
bool enableIntTimes; /* int64 storage enabled? */
bool float4ByVal; /* float4 pass-by-value? */
bool float8ByVal; /* float8, int8, etc pass-by-value? */
uint32 data_checksum_version;
} ControlFileData11;
extern DBState get_db_state(const char *data_directory); extern DBState get_db_state(const char *data_directory);
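
The ControlFileData11 comment above notes that "prevCheckPoint" was removed in PostgreSQL 11, so every field after it sits at a different offset, and get_controlfile() has to choose the struct by server version before reading anything. The sketch below illustrates that idea with two heavily trimmed, hypothetical layouts. The `Demo` names and `demo_get_checksum_version` are assumptions for illustration, and it copies with memcpy rather than casting the buffer pointer as the real code does.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical, trimmed layouts; the real structs have many more fields. */
typedef struct
{
    uint64_t system_identifier;
    uint64_t checkPoint;
    uint64_t prevCheckPoint;            /* present before PostgreSQL 11 */
    uint32_t data_checksum_version;
} DemoControlFile95;

typedef struct
{
    uint64_t system_identifier;
    uint64_t checkPoint;
    uint32_t data_checksum_version;     /* moves up once prevCheckPoint is gone */
} DemoControlFile11;

/* Pick the layout matching the server version before reading any field. */
static uint32_t
demo_get_checksum_version(const void *raw, int version_num)
{
    if (version_num >= 110000)
    {
        DemoControlFile11 cf;

        memcpy(&cf, raw, sizeof(cf));
        return cf.data_checksum_version;
    }
    else
    {
        DemoControlFile95 cf;

        memcpy(&cf, raw, sizeof(cf));
        return cf.data_checksum_version;
    }
}
```

Reading a PostgreSQL 11 control file through the older layout would pick up "data_checksum_version" from the wrong offset, which is why the version check comes first.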

1267
dbutils.c

File diff suppressed because it is too large


@@ -29,9 +29,7 @@
#include "voting.h" #include "voting.h"
#define REPMGR_NODES_COLUMNS "n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name " #define REPMGR_NODES_COLUMNS "n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name "
#define BDR2_NODES_COLUMNS "node_sysid, node_timeline, node_dboid, node_name, node_local_dsn, ''" #define BDR_NODES_COLUMNS "node_sysid, node_timeline, node_dboid, node_status, node_name, node_local_dsn, node_init_from_dsn, node_read_only, node_seq_id"
#define BDR3_NODES_COLUMNS "ns.node_id, 0, 0, ns.node_name, ns.interface_connstr, ns.peer_state_name"
#define ERRBUFF_SIZE 512 #define ERRBUFF_SIZE 512
@@ -47,7 +45,6 @@ typedef enum
typedef enum typedef enum
{ {
REPMGR_INSTALLED = 0, REPMGR_INSTALLED = 0,
REPMGR_OLD_VERSION_INSTALLED,
REPMGR_AVAILABLE, REPMGR_AVAILABLE,
REPMGR_UNAVAILABLE, REPMGR_UNAVAILABLE,
REPMGR_UNKNOWN REPMGR_UNKNOWN
@@ -97,28 +94,6 @@ typedef enum
SLOT_ACTIVE SLOT_ACTIVE
} ReplSlotStatus; } ReplSlotStatus;
typedef enum
{
BACKUP_STATE_UNKNOWN = -1,
BACKUP_STATE_IN_BACKUP,
BACKUP_STATE_NO_BACKUP
} BackupState;
/*
* Struct to store extension version information
*/
typedef struct s_extension_versions {
char default_version[8];
char installed_version[8];
} t_extension_versions;
#define T_EXTENSION_VERSIONS_INITIALIZER { \
"", \
"", \
}
/* /*
* Struct to store node information * Struct to store node information
*/ */
@@ -262,14 +237,18 @@ typedef struct s_bdr_node_info
char node_sysid[MAXLEN]; char node_sysid[MAXLEN];
uint32 node_timeline; uint32 node_timeline;
uint32 node_dboid; uint32 node_dboid;
char node_status;
char node_name[MAXLEN]; char node_name[MAXLEN];
char node_local_dsn[MAXLEN]; char node_local_dsn[MAXLEN];
char peer_state_name[MAXLEN]; char node_init_from_dsn[MAXLEN];
bool read_only;
uint32 node_seq_id;
} t_bdr_node_info; } t_bdr_node_info;
#define T_BDR_NODE_INFO_INITIALIZER { \ #define T_BDR_NODE_INFO_INITIALIZER { \
"", InvalidOid, InvalidOid, \ "", InvalidOid, InvalidOid, \
"", "", "" \ '?', "", "", "", \
false, -1 \
} }
@@ -342,21 +321,6 @@ typedef struct
UNKNOWN_TIMELINE_ID, \ UNKNOWN_TIMELINE_ID, \
InvalidXLogRecPtr \ InvalidXLogRecPtr \
} }
typedef struct RepmgrdInfo {
int node_id;
int pid;
char pid_text[MAXLEN];
char pid_file[MAXLEN];
bool pg_running;
char pg_running_text[MAXLEN];
bool running;
char repmgrd_running[MAXLEN];
bool paused;
} RepmgrdInfo;
/* global variables */ /* global variables */
extern int server_version_num; extern int server_version_num;
@@ -379,10 +343,12 @@ bool atobool(const char *value);
PGconn *establish_db_connection(const char *conninfo, PGconn *establish_db_connection(const char *conninfo,
const bool exit_on_error); const bool exit_on_error);
PGconn *establish_db_connection_quiet(const char *conninfo); PGconn *establish_db_connection_quiet(const char *conninfo);
PGconn *establish_db_connection_by_params(t_conninfo_param_list *param_list, PGconn *establish_db_connection_by_params(t_conninfo_param_list *param_list,
const bool exit_on_error); const bool exit_on_error);
PGconn *establish_primary_db_connection(PGconn *conn, PGconn *establish_primary_db_connection(PGconn *conn,
const bool exit_on_error); const bool exit_on_error);
PGconn *get_primary_connection(PGconn *standby_conn, int *primary_id, char *primary_conninfo_out); PGconn *get_primary_connection(PGconn *standby_conn, int *primary_id, char *primary_conninfo_out);
PGconn *get_primary_connection_quiet(PGconn *standby_conn, int *primary_id, char *primary_conninfo_out); PGconn *get_primary_connection_quiet(PGconn *standby_conn, int *primary_id, char *primary_conninfo_out);
@@ -408,6 +374,7 @@ bool has_passfile(void);
bool begin_transaction(PGconn *conn); bool begin_transaction(PGconn *conn);
bool commit_transaction(PGconn *conn); bool commit_transaction(PGconn *conn);
bool rollback_transaction(PGconn *conn); bool rollback_transaction(PGconn *conn);
bool check_cluster_schema(PGconn *conn);
/* GUC manipulation functions */ /* GUC manipulation functions */
bool set_config(PGconn *conn, const char *config_param, const char *config_value); bool set_config(PGconn *conn, const char *config_param, const char *config_value);
@@ -425,15 +392,9 @@ int get_ready_archive_files(PGconn *conn, const char *data_directory);
bool identify_system(PGconn *repl_conn, t_system_identification *identification); bool identify_system(PGconn *repl_conn, t_system_identification *identification);
bool repmgrd_set_local_node_id(PGconn *conn, int local_node_id); bool repmgrd_set_local_node_id(PGconn *conn, int local_node_id);
int repmgrd_get_local_node_id(PGconn *conn); int repmgrd_get_local_node_id(PGconn *conn);
BackupState server_in_exclusive_backup_mode(PGconn *conn);
void repmgrd_set_pid(PGconn *conn, pid_t repmgrd_pid, const char *pidfile);
pid_t repmgrd_get_pid(PGconn *conn);
bool repmgrd_is_running(PGconn *conn);
bool repmgrd_is_paused(PGconn *conn);
bool repmgrd_pause(PGconn *conn, bool pause);
/* extension functions */ /* extension functions */
ExtensionStatus get_repmgr_extension_status(PGconn *conn, t_extension_versions *extversions); ExtensionStatus get_repmgr_extension_status(PGconn *conn);
/* node management functions */ /* node management functions */
void checkpoint(PGconn *conn); void checkpoint(PGconn *conn);
@@ -453,7 +414,7 @@ t_node_info *get_node_record_pointer(PGconn *conn, int node_id);
bool get_local_node_record(PGconn *conn, int node_id, t_node_info *node_info); bool get_local_node_record(PGconn *conn, int node_id, t_node_info *node_info);
bool get_primary_node_record(PGconn *conn, t_node_info *node_info); bool get_primary_node_record(PGconn *conn, t_node_info *node_info);
bool get_all_node_records(PGconn *conn, NodeInfoList *node_list); void get_all_node_records(PGconn *conn, NodeInfoList *node_list);
void get_downstream_node_records(PGconn *conn, int node_id, NodeInfoList *nodes); void get_downstream_node_records(PGconn *conn, int node_id, NodeInfoList *nodes);
void get_active_sibling_node_records(PGconn *conn, int node_id, int upstream_node_id, NodeInfoList *node_list); void get_active_sibling_node_records(PGconn *conn, int node_id, int upstream_node_id, NodeInfoList *node_list);
void get_node_records_by_priority(PGconn *conn, NodeInfoList *node_list); void get_node_records_by_priority(PGconn *conn, NodeInfoList *node_list);
@@ -507,7 +468,7 @@ int wait_connection_availability(PGconn *conn, long long timeout);
/* node availability functions */ /* node availability functions */
bool is_server_available(const char *conninfo); bool is_server_available(const char *conninfo);
bool is_server_available_params(t_conninfo_param_list *param_list); bool is_server_available_params(t_conninfo_param_list *param_list);
ExecStatusType connection_ping(PGconn *conn); void connection_ping(PGconn *conn);
/* monitoring functions */ /* monitoring functions */
void void
@@ -523,8 +484,8 @@ add_monitoring_record(PGconn *primary_conn,
long long unsigned int apply_lag_bytes long long unsigned int apply_lag_bytes
); );
int get_number_of_monitoring_records_to_delete(PGconn *primary_conn, int keep_history, int node_id); int get_number_of_monitoring_records_to_delete(PGconn *primary_conn, int keep_history);
bool delete_monitoring_records(PGconn *primary_conn, int keep_history, int node_id); bool delete_monitoring_records(PGconn *primary_conn, int keep_history);
@@ -546,14 +507,12 @@ void get_node_replication_stats(PGconn *conn, int server_version_num, t_node_in
bool is_downstream_node_attached(PGconn *conn, char *node_name); bool is_downstream_node_attached(PGconn *conn, char *node_name);
/* BDR functions */ /* BDR functions */
int get_bdr_version_num(void);
void get_all_bdr_node_records(PGconn *conn, BdrNodeInfoList *node_list); void get_all_bdr_node_records(PGconn *conn, BdrNodeInfoList *node_list);
RecordStatus get_bdr_node_record_by_name(PGconn *conn, const char *node_name, t_bdr_node_info *node_info); RecordStatus get_bdr_node_record_by_name(PGconn *conn, const char *node_name, t_bdr_node_info *node_info);
bool is_bdr_db(PGconn *conn, PQExpBufferData *output); bool is_bdr_db(PGconn *conn, PQExpBufferData *output);
bool is_bdr_db_quiet(PGconn *conn); bool is_bdr_db_quiet(PGconn *conn);
bool is_active_bdr_node(PGconn *conn, const char *node_name); bool is_active_bdr_node(PGconn *conn, const char *node_name);
bool is_bdr_repmgr(PGconn *conn); bool is_bdr_repmgr(PGconn *conn);
char *get_default_bdr_replication_set(PGconn *conn);
bool is_table_in_bdr_replication_set(PGconn *conn, const char *tablename, const char *set); bool is_table_in_bdr_replication_set(PGconn *conn, const char *tablename, const char *set);
bool add_table_to_bdr_replication_set(PGconn *conn, const char *tablename, const char *set); bool add_table_to_bdr_replication_set(PGconn *conn, const char *tablename, const char *set);
void add_extension_tables_to_bdr_replication_set(PGconn *conn); void add_extension_tables_to_bdr_replication_set(PGconn *conn);


@@ -21,17 +21,13 @@
in PostgreSQL 9.3, as well as improved automated failover support in PostgreSQL 9.3, as well as improved automated failover support
via <application>repmgrd</application>, and is not compatible with PostgreSQL 9.2 via <application>repmgrd</application>, and is not compatible with PostgreSQL 9.2
and earlier. We recommend upgrading to &repmgr; 4, as the &repmgr; 3.x and earlier. We recommend upgrading to &repmgr; 4, as the &repmgr; 3.x
series is no longer maintained. series will no longer be actively maintained.
</para> </para>
<para> <para>
&repmgr; 2.x supports PostgreSQL 9.0 ~ 9.3. While it is compatible &repmgr; 2.x supports PostgreSQL 9.0 ~ 9.3. While it is compatible
with PostgreSQL 9.3, we recommend using repmgr 4.x. &repmgr; 2.x is with PostgreSQL 9.3, we recommend using repmgr 4.x. &repmgr; 2.x is
no longer maintained. no longer maintained.
</para> </para>
<para>
See also <link linkend="install-compatibility-matrix">&repmgr; compatibility matrix</link>
and <link linkend="faq-upgrade-repmgr">Should I upgrade &repmgr;?</link>.
</para>
</sect2> </sect2>
<sect2 id="faq-replication-slots-advantage" xreflabel="Advantages of replication slots"> <sect2 id="faq-replication-slots-advantage" xreflabel="Advantages of replication slots">
@@ -39,25 +35,15 @@
<para> <para>
Replication slots, introduced in PostgreSQL 9.4, ensure that the Replication slots, introduced in PostgreSQL 9.4, ensure that the
primary server will retain WAL files until they have been consumed primary server will retain WAL files until they have been consumed
by all standby servers. This means standby servers should never by all standby servers. This makes WAL file management much easier,
fail due to not being able to retrieve required WAL files from the and if used &repmgr; will no longer insist on a fixed minimum number
primary. (default: 5000) of WAL files being retained.
</para> </para>
<para> <para>
However this does mean that if a standby is no longer connected to the However this does mean that if a standby is no longer connected to the
primary, the presence of the replication slot will cause WAL files primary, the presence of the replication slot will cause WAL files
to be retained indefinitely, and eventually lead to disk space to be retained indefinitely.
exhaustion.
</para> </para>
<tip>
<para>
2ndQuadrant's recommended configuration is to configure
<ulink url="https://www.pgbarman.org/">Barman</ulink> as a fallback
source of WAL files, rather than maintain replication slots for
each standby. See also: <link linkend="cloning-from-barman-restore-command">Using Barman as a WAL file source</link>.
</para>
</tip>
</sect2> </sect2>
<sect2 id="faq-replication-slots-number" xreflabel="Number of replication slots"> <sect2 id="faq-replication-slots-number" xreflabel="Number of replication slots">
@@ -122,82 +108,6 @@
is not possible, contact your vendor for assistance. is not possible, contact your vendor for assistance.
</para> </para>
</sect2> </sect2>
<sect2 id="faq-old-packages">
<title>How can I obtain old versions of &repmgr; packages?</title>
<para>
See appendix <xref linkend="packages-old-versions"> for details.
</para>
</sect2>
<sect2 id="faq-repmgr-required-for-replication">
<title>Is &repmgr; required for streaming replication?</title>
<para>
No.
</para>
<para>
&repmgr; (together with <application>repmgrd</application>) assists with
<emphasis>managing</emphasis> replication. It does not actually perform replication, which
is part of the core PostgreSQL functionality.
</para>
</sect2>
<sect2 id="faq-what-if-repmgr-uninstalled">
<title>Will replication stop working if &repmgr; is uninstalled?</title>
<para>
No. See preceding question.
</para>
</sect2>
<sect2 id="faq-version-mix">
<title>Does it matter if different &repmgr; versions are present in the replication cluster?</title>
<para>
Yes. If different &quot;major&quot; &repmgr; versions (e.g. 3.3.x and 4.1.x) are present,
&repmgr; (in particular <application>repmgrd</application>)
may not run, or run properly, or in the worst case (if different <application>repmgrd</application>
versions are running and there are differences in the failover implementation) break
your replication cluster.
</para>
<para>
If different &quot;minor&quot; &repmgr; versions (e.g. 4.1.1 and 4.1.6) are installed,
&repmgr; will function, but we strongly recommend always running the same version
to ensure there are no unexpected suprises, e.g. a newer version behaving slightly
differently to the older version.
</para>
<para>
See also <link linkend="faq-upgrade-repmgr">Should I upgrade &repmgr;?</link>.
</para>
</sect2>
<sect2 id="faq-upgrade-repmgr">
<title>Should I upgrade &repmgr;?</title>
<para>
Yes.
</para>
<para>
We don't release new versions for fun, you know. Upgrading may require a little effort,
but running an older &repmgr; version with bugs which have since been fixed may end up
costing you more effort. The same applies to PostgreSQL itself.
</para>
</sect2>
<sect2 id="faq-repmgr-conf-data-directory">
<title>Why do I need to specify the data directory location in repmgr.conf?</title>
<para>
In some circumstances &repmgr; may need to access a PostgreSQL data
directory while the PostgreSQL server is not running, e.g. to confirm
it shut down cleanly during a <link linkend="performing-switchover">switchover</link>.
</para>
<para>
Additionally, this provides support when using &repmgr; on PostgreSQL 9.6 and
earlier, where the <literal>repmgr</literal> user is not a superuser; in that
case the <literal>repmgr</literal> user will not be able to access the
<literal>data_directory</literal> configuration setting, access to which is restricted
to superusers. (In PostgreSQL 10 and later, non-superusers can be added to the
group <option>pg_read_all_settings</option> which will enable them to read this setting).
</para>
</sect2>
</sect1> </sect1>
<sect1 id="faq-repmgr" xreflabel="repmgr"> <sect1 id="faq-repmgr" xreflabel="repmgr">
@@ -329,22 +239,11 @@
Under some circumstances event notifications can be generated for servers Under some circumstances event notifications can be generated for servers
which have not yet been registered; it's also useful to retain a record which have not yet been registered; it's also useful to retain a record
of events which includes servers removed from the replication cluster of events which includes servers removed from the replication cluster
which no longer have an entry in the <literal>repmgr.nodes</literal> table. which no longer have an entry in the <literal>repmrg.nodes</literal> table.
</para> </para>
</sect2> </sect2>
<sect2 id="faq-repmgr-recovery-conf-quoted-values" xreflabel="Quoted values in recovery.conf">
<title>Why are some values in <filename>recovery.conf</filename> surrounded by pairs of single quotes?</title>
<para>
This is to ensure that user-supplied values which are written as parameter values in <filename>recovery.conf</filename>
are escaped correctly and do not cause errors when <filename>recovery.conf</filename> is parsed.
</para>
<para>
The escaping is performed by an internal PostgreSQL routine, which leaves strings consisting
of digits and alphabetical characters only as-is, but wraps everything else in pairs of single quotes,
even if the string does not contain any characters which need escaping.
</para>
</sect2>
</sect1> </sect1>
@@ -356,7 +255,7 @@
<sect2 id="faq-repmgrd-prevent-promotion" xreflabel="Prevent standby from being promoted to primary"> <sect2 id="faq-repmgrd-prevent-promotion" xreflabel="Prevent standby from being promoted to primary">
<title>How can I prevent a node from ever being promoted to primary?</title> <title>How can I prevent a node from ever being promoted to primary?</title>
<para> <para>
In <filename>repmgr.conf</filename>, set its priority to a value of <literal>0</literal>; apply the changed setting with In `repmgr.conf`, set its priority to a value of 0 or less; apply the changed setting with
<command><link linkend="repmgr-standby-register">repmgr standby register --force</link></command>. <command><link linkend="repmgr-standby-register">repmgr standby register --force</link></command>.
</para> </para>
<para> <para>
@@ -404,36 +303,5 @@
</para> </para>
</sect2> </sect2>
<sect2 id="faq-repmgrd-pg-bindir" xreflabel="repmgrd does not apply pg_bindir to promote_command or follow_command">
<title>
<application>repmgrd</application> ignores pg_bindir when executing <varname>promote_command</varname> or <varname>follow_command</varname>
</title>
<para>
<varname>promote_command</varname> or <varname>follow_command</varname> can be user-defined scripts,
so &repmgr; will not apply <option>pg_bindir</option> even if excuting &repmgr;. Always provide the full
path; see <xref linkend="repmgrd-automatic-failover-configuration"> for more details.
</para>
</sect2>
<sect2 id="faq-repmgrd-startup-no-upstream" xreflabel="repmgrd does not start if upstream node is not running">
<title>
<application>repmgrd</application> aborts startup with the error "<literal>upstream node must be running before repmgrd can start</literal>"
</title>
<para>
<application>repmgrd</application> does this to avoid starting up on a replication cluster
which is not in a healthy state. If the upstream is unavailable, <application>repmgrd</application>
may initiate a failover immediately after starting up, which could have unintended side-effects,
particularly if <application>repmgrd</application> is not running on other nodes.
</para>
<para>
In particular, it's possible that the node's local copy of the <literal>repmgr.nodes</literal> copy
is out-of-date, which may lead to incorrect failover behaviour.
</para>
<para>
The onus is therefore on the adminstrator to manually set the cluster to a stable, healthy state before
starting <application>repmgrd</application>.
</para>
</sect2>
</sect1> </sect1>
</appendix> </appendix>


@@ -12,17 +12,10 @@
<sect1 id="packages-centos" xreflabel="CentOS packages"> <sect1 id="packages-centos" xreflabel="CentOS packages">
<title>CentOS Packages</title> <title>CentOS Packages</title>
<indexterm> <indexterm>
<primary>packages</primary> <primary>packages</primary>
<secondary>CentOS packages</secondary> <secondary>CentOS packages</secondary>
</indexterm> </indexterm>
<indexterm>
<primary>CentOS</primary>
<secondary>package information</secondary>
</indexterm>
<para> <para>
Currently, &repmgr; RPM packages are provided for versions 6.x and 7.x of CentOS. These should also Currently, &repmgr; RPM packages are provided for versions 6.x and 7.x of CentOS. These should also
work on matching versions of Red Hat Enterprise Linux, Scientific Linux and Oracle Enterprise Linux; work on matching versions of Red Hat Enterprise Linux, Scientific Linux and Oracle Enterprise Linux;
@@ -60,11 +53,11 @@
<tbody> <tbody>
<row> <row>
<entry>Repository URL:</entry> <entry>Repository URL:</entry>
<entry><ulink url="https://dl.2ndquadrant.com/">https://dl.2ndquadrant.com/</ulink></entry> <entry><ulink url="https://rpm.2ndquadrant.com/">https://rpm.2ndquadrant.com/</ulink></entry>
</row> </row>
<row> <row>
<entry>Repository documentation:</entry> <entry>Repository documentation:</entry>
<entry><ulink url="https://repmgr.org/docs/current/installation-packages.html#INSTALLATION-PACKAGES-REDHAT-2NDQ">https://repmgr.org/docs/current/installation-packages.html#INSTALLATION-PACKAGES-REDHAT-2NDQ</ulink></entry> <entry><ulink url="https://repmgr.org/docs/4.0/installation-packages.html#INSTALLATION-PACKAGES-REDHAT-2NDQ">https://repmgr.org/docs/4.0/installation-packages.html#INSTALLATION-PACKAGES-REDHAT-2NDQ</ulink></entry>
</row> </row>
</tbody> </tbody>
</tgroup> </tgroup>
@@ -244,12 +237,6 @@
<primary>packages</primary> <primary>packages</primary>
<secondary>Debian/Ubuntu packages</secondary> <secondary>Debian/Ubuntu packages</secondary>
</indexterm> </indexterm>
<indexterm>
<primary>Debian/Ubuntu</primary>
<secondary>package information</secondary>
</indexterm>
<para> <para>
&repmgr; <literal>.deb</literal> packages are provided via the &repmgr; <literal>.deb</literal> packages are provided via the
PostgreSQL Community APT repository, and are available for each community-supported PostgreSQL Community APT repository, and are available for each community-supported
@@ -266,23 +253,6 @@
</para> </para>
<table id="apt-2ndquadrant-repository">
<title>2ndQuadrant public repository</title>
<tgroup cols="2">
<tbody>
<row>
<entry>Repository URL:</entry>
<entry><ulink url="https://dl.2ndquadrant.com/">https://dl.2ndquadrant.com/</ulink></entry>
</row>
<row>
<entry>Repository documentation:</entry>
<entry><ulink url="https://repmgr.org/docs/current/installation-packages.html#INSTALLATION-PACKAGES-DEBIAN">https://repmgr.org/docs/current/installation-packages.html#INSTALLATION-PACKAGES-DEBIAN</ulink></entry>
</row>
</tbody>
</tgroup>
</table>
<table id="apt-repository"> <table id="apt-repository">
<title>PostgreSQL Community APT repository (PGDG)</title> <title>PostgreSQL Community APT repository (PGDG)</title>
<tgroup cols="2"> <tgroup cols="2">
@@ -394,169 +364,4 @@
</sect2> </sect2>
</sect1> </sect1>
<sect1 id="packages-snapshot" xreflabel="Snapshot packages">
<title>Snapshot packages</title>
<indexterm>
<primary>snapshot packages</primary>
</indexterm>
<indexterm>
<primary>packages</primary>
<secondary>snaphots</secondary>
</indexterm>
<para>
For testing new features and bug fixes, from time to time 2ndQuadrant provides
so-called &quot;snapshot packages&quot; via its public repository. These packages
are built from the &repmgr; source at a particular point in time, and are not formal
releases.
</para>
<note>
<para>
We do not recommend installing these packages in a production environment
unless specifically advised.
</para>
</note>
<para>
To install a snapshot package, it's necessary to install the 2ndQuadrant public snapshot repository,
following the instructions here: <ulink url="https://dl.2ndquadrant.com/default/release/site/">https://dl.2ndquadrant.com/default/release/site/</ulink> but replace <literal>release</literal> with <literal>snapshot</literal>
in the appropriate URL.
</para>
<para>
For example, to install the snapshot RPM repository for PostgreSQL 9.6, execute (as <literal>root</literal>):
<programlisting>
curl https://dl.2ndquadrant.com/default/snapshot/get/9.6/rpm | bash</programlisting>
or as a normal user with root sudo access:
<programlisting>
curl https://dl.2ndquadrant.com/default/snapshot/get/9.6/rpm | sudo bash</programlisting>
</para>
<para>
Alternatively you can browse the repository here:
<ulink url="https://dl.2ndquadrant.com/default/snapshot/browse/">https://dl.2ndquadrant.com/default/snapshot/browse/</ulink>.
</para>
<para>
Once the repository is installed, installing or updating &repmgr; will result in the latest snapshot
package being installed.
</para>
<para>
The package name will be formatted like this:
<programlisting>
repmgr96-4.1.1-0.0git320.g5113ab0.1.el7.x86_64.rpm</programlisting>
containg the snapshot build number (here: <literal>320</literal>) and the hash
of the <application>git</application> commit it was built from (here: <literal>g5113ab0</literal>).
</para>
<para>
Note that the next formal release (in the above example <literal>4.1.1</literal>), once available,
will install in place of any snapshot builds.
</para>
</sect1>
<sect1 id="packages-old-versions" xreflabel="Installing old package versions">
<title>Installing old package versions</title>
<indexterm>
<primary>old packages</primary>
</indexterm>
<indexterm>
<primary>packages</primary>
<secondary>old versions</secondary>
</indexterm>
<sect2 id="packages-old-versions-debian" xreflabel="old Debian package versions">
<title>Debian/Ubuntu</title>
<para>
An archive of old packages (<literal>3.3.2</literal> and later) for Debian/Ubuntu-based systems is available here:
<ulink url="http://atalia.postgresql.org/morgue/r/repmgr/">http://atalia.postgresql.org/morgue/r/repmgr/</ulink>
</para>
</sect2>
<sect2 id="packages-old-versions-rhel-centos" xreflabel="old RHEL/CentOS package versions">
<title>RHEL/CentOS</title>
<para>
Old RPM packages (<literal>3.2</literal> and later) can be retrieved from the
(deprecated) 2ndQuadrant repository at
<ulink url="http://packages.2ndquadrant.com/">http://packages.2ndquadrant.com/</ulink>
by installing the appropriate repository RPM:
</para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<ulink url="http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-fedora-1.0-1.noarch.rpm">http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-fedora-1.0-1.noarch.rpm</ulink>
</simpara>
</listitem>
<listitem>
<simpara>
<ulink url="http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-rhel-1.0-1.noarch.rpm">http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-rhel-1.0-1.noarch.rpm</ulink>
</simpara>
</listitem>
</itemizedlist>
<para>
Old versions can be located with e.g.:
<programlisting>
yum --showduplicates list repmgr96</programlisting>
(substitute the appropriate package name; see <xref linkend="packages-centos">) and installed with:
<programlisting>
yum install {package_name}-{version}</programlisting>
where <literal>{package_name}</literal> is the base package name (e.g. <literal>repmgr96</literal>)
and <literal>{version}</literal> is the version listed by the
<command> yum --showduplicates list ...</command> command, e.g. <literal>4.0.6-1.rhel6</literal>.
</para>
<para>For example:
<programlisting>
yum install repmgr96-4.0.6-1.rhel6</programlisting>
</para>
</sect2>
</sect1>
<sect1 id="packages-packager-info" xreflabel="Information for packagers">
<title>Information for packagers</title>
<indexterm>
<primary>packages</primary>
<secondary>information for packagers</secondary>
</indexterm>
<para>
We recommend patching the following parameters when
building the package as built-in default values for user convenience.
These values can nevertheless be overridden by the user, if desired.
</para>
<itemizedlist>
<listitem>
<para>
Configuration file location: the default configuration file location
can be hard-coded by patching <varname>package_conf_file</varname>
in <filename>configfile.c</filename>:
<programlisting>
/* packagers: if feasible, patch configuration file path into "package_conf_file" */
char package_conf_file[MAXPGPATH] = "";</programlisting>
</para>
<para>
See also: <xref linkend="configuration-file">
</para>
</listitem>
<listitem>
<para>
PID file location: the default <application>repmgrd</application> PID file
location can be hard-coded by patching <varname>package_pid_file</varname>
in <filename>repmgrd.c</filename>:
<programlisting>
/* packagers: if feasible, patch PID file path into "package_pid_file" */
char package_pid_file[MAXPGPATH] = "";</programlisting>
</para>
<para>
See also: <xref linkend="repmgrd-pid-file">
</para>
</listitem>
</itemizedlist>
</sect1>
</appendix> </appendix>


@@ -15,513 +15,9 @@
See also: <xref linkend="upgrading-repmgr"> See also: <xref linkend="upgrading-repmgr">
</para> </para>
<sect1 id="release-4.2">
<title>Release 4.2</title>
<para><emphasis>Wed October 24, 2018</emphasis></para>
<para>
&repmgr; 4.2 is a major release, with the main new feature being the
ability to <link linkend="repmgrd-pausing">pause repmgrd</link>, e.g. during planned maintenance
operations. Various other usability enhancements and a couple of bug fixes are also included;
see notes below for details.
</para>
<para>
A restart of the PostgreSQL server <emphasis>is</emphasis> required
for this release. For detailed upgrade instructions, see
<link linkend="upgrading-major-version">Upgrading a major version release</link>.
</para>
<sect2>
<title>Configuration file changes</title>
<para>
<itemizedlist>
<listitem>
<para>
New parameter <varname>shutdown_check_timeout</varname> (default: 60 seconds) added;
this provides an explicit timeout for
<command><link linkend="repmgr-standby-switchover">repmgr standby switchover</link></command>
to check that the demotion candidate (current primary) has shut down. Previously, the parameters
<literal>reconnect_attempts</literal> and <literal>reconnect_interval</literal>
were used to calculate a timeout, but these are actually
intended for primary failure detection. (GitHub #504).
</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para>
New parameter <varname>repmgr_bindir</varname> added, to facilitate remote invocation of repmgr
when the repmgr binary is located somewhere other than the PostgreSQL binary directory, as it
cannot be assumed all package maintainers will install &repmgr; there.
</para>
<para>
This parameter is optional; if not set (the default), &repmgr; will fall back
to <option>pg_bindir</option> (if set).
</para>
<para>
(GitHub #246).
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
<sect2>
<title>repmgr enhancements</title>
<para>
<itemizedlist>
<listitem>
<para>
<command><link linkend="repmgr-cluster-cleanup">repmgr cluster cleanup</link></command>
now accepts the <option>--node-id</option> option to delete records for only one
node. (GitHub #493).
</para>
</listitem>
<listitem>
<para>
When running
<command><link linkend="repmgr-cluster-matrix">repmgr cluster matrix</link></command> and
<command><link linkend="repmgr-cluster-crosscheck">repmgr cluster crosscheck</link></command>,
&repmgr; will report nodes unreachable via SSH, and emit return code <literal>ERR_BAD_SSH</literal>.
(GitHub #246).
</para>
<note>
<para>
Users relying on
<command><link linkend="repmgr-cluster-crosscheck">repmgr cluster crosscheck</link></command>
to return a non-zero return code as a way of detecting connectivity errors should be aware
that <literal>ERR_BAD_SSH</literal> will be returned if there is an SSH connection error
from the node where the command is executed, even if the command is able to establish
that PostgreSQL connectivity is fine. Therefore the exact return code should be checked
to determine what kind of connectivity error has been detected.
</para>
</note>
</listitem>
</itemizedlist>
</para>
</sect2>
<sect2>
<title>repmgrd enhancements</title>
<para>
<itemizedlist>
<listitem>
<para>
<application>repmgrd</application> can now be &quot;paused&quot;, i.e. instructed
not to take any action such as a failover, even if the prerequisites for such an
action are detected.
</para>
<para>
This removes the need to stop <application>repmgrd</application> on all nodes when
performing a planned operation such as a switchover.
</para>
<para>
For further details, see <link linkend="repmgrd-pausing">Pausing repmgrd</link>.
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
<sect2>
<title>Bug fixes</title>
<para>
<itemizedlist>
<listitem>
<para>
&repmgr;: fix &quot;Missing replication slots&quot; label in
<command><link linkend="repmgr-node-check">repmgr node check</link></command>. (GitHub #507)
</para>
</listitem>
<listitem>
<para>
<application>repmgrd</application>: fix parsing of <option>-d/--daemonize</option> option.
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
</sect1>
<sect1 id="release-4.1.1">
<title>Release 4.1.1</title>
<para><emphasis>Wed September 5, 2018</emphasis></para>
<para>
repmgr 4.1.1 contains a number of usability enhancements and bug fixes.
</para>
<para>
We recommend upgrading to this version as soon as possible.
This release can be installed as a simple package upgrade from repmgr 4.0 ~ 4.1.0;
<application>repmgrd</application> (if running) should be restarted.
See <xref linkend="upgrading-repmgr"> for more details.
</para>
<sect2>
<title>repmgr enhancements</title>
<para>
<itemizedlist>
<listitem>
<para>
<command><link linkend="repmgr-standby-switchover">repmgr standby switchover --dry-run</link></command>
no longer copies external configuration files to test they can be copied; this avoids making
any changes to the target system. (GitHub #491).
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-cluster-cleanup">repmgr cluster cleanup</link></command>:
add <literal>cluster_cleanup</literal> event. (GitHub #492).
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-standby-switchover">repmgr standby switchover</link></command>:
improve detection of free walsenders. (GitHub #495).
</para>
</listitem>
<listitem>
<para>
Improve messages emitted during
<command><link linkend="repmgr-standby-promote">repmgr standby promote</link></command>.
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
<sect2>
<title>repmgrd enhancements</title>
<para>
<itemizedlist>
<listitem>
<para>
Always reopen the log file after
receiving <literal>SIGHUP</literal>. Previously this only happened if
a configuration file change was detected.
(GitHub #485).
</para>
</listitem>
<listitem>
<para>
Report version number <emphasis>after</emphasis>
logger initialisation. (GitHub #487).
</para>
</listitem>
<listitem>
<para>
Improve cascaded standby failover handling. (GitHub #480).
</para>
</listitem>
<listitem>
<para>
Improve reconnection handling after brief network outages; if
monitoring data was being collected, such outages could lead to orphaned
sessions on the primary. (GitHub #480).
</para>
</listitem>
<listitem>
<para>
Check <varname>promote_command</varname> and <varname>follow_command</varname>
are defined when reloading configuration. These were checked on startup but
not on reload, which made it possible to leave <application>repmgrd</application>
running with invalid values. It's unlikely
anyone would want to do this, but we should make it impossible anyway.
(GitHub #486).
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
<sect2>
<title>Other</title>
<para>
<itemizedlist>
<listitem>
<para>
Text of any failed queries will now be logged as <literal>ERROR</literal> to assist
logfile analysis at log levels higher than <literal>DEBUG</literal>.
(GitHub #498).
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
<sect2>
<title>Bug fixes</title>
<para>
<itemizedlist>
<listitem>
<para>
<command><link linkend="repmgr-node-rejoin">repmgr node rejoin</link></command>:
remove new upstream's replication slot if it still exists on the rejoined
standby. (GitHub #499).
</para>
</listitem>
<listitem>
<para>
<application>repmgrd</application>: fix startup on witness node when local data is stale. (GitHub #488, #489).
</para>
</listitem>
<listitem>
<para>
Truncate version string reported by PostgreSQL if necessary; some
distributions insert additional detail after the actual version.
(GitHub #490).
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
</sect1>
<sect1 id="release-4.1.0">
<title>Release 4.1.0</title>
<para><emphasis>Tue July 31, 2018</emphasis></para>
<para>
&repmgr; 4.1.0 introduces some changes to <application>repmgrd</application>
behaviour and some additional configuration parameters.
</para>
<para>
This release can be installed as a simple package upgrade from repmgr 4.0 ~ 4.0.6.
The following post-upgrade steps must be carried out:
<itemizedlist>
<listitem>
<para>
Execute <command>ALTER EXTENSION repmgr UPDATE</command>
on the primary server in the database where &repmgr; is installed.
</para>
</listitem>
<listitem>
<para>
<application>repmgrd</application> must be restarted on all nodes where it is running.
</para>
</listitem>
</itemizedlist>
A restart of the PostgreSQL server is <emphasis>not</emphasis> required
for this release (unless upgrading from repmgr 3.x).
</para>
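<para>
As a minimal sketch, the two post-upgrade steps above might be scripted as follows.
The database name <literal>repmgr</literal>, the systemd unit name <literal>repmgrd</literal>
and the <literal>DRY_RUN</literal> switch are assumptions made for illustration and are not
part of &repmgr;. By default the script only prints the commands it would run.
</para>

```shell
#!/bin/sh
# Sketch of the repmgr 4.1.0 post-upgrade steps.
# Assumptions (not taken from the release notes): database name "repmgr",
# systemd unit name "repmgrd". Adjust both for your installation.
REPMGR_DB="${REPMGR_DB:-repmgr}"
REPMGRD_UNIT="${REPMGRD_UNIT:-repmgrd}"

# Print each command instead of executing it, unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1: on the primary server only, update the extension in the
# database where repmgr is installed.
update_extension() {
    run psql -d "$REPMGR_DB" -c "ALTER EXTENSION repmgr UPDATE"
}

# Step 2: on every node where repmgrd is running, restart it.
restart_repmgrd() {
    run sudo systemctl restart "$REPMGRD_UNIT"
}

update_extension
restart_repmgrd
```

<para>
Once the printed commands look correct for the node in question, setting
<literal>DRY_RUN=0</literal> executes them. Step 1 applies to the primary only.
</para>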
<para>
See <xref linkend="upgrading-repmgr-extension"> for more details.
</para>
<para>
Configuration changes are backwards-compatible and no changes to
<filename>repmgr.conf</filename> are required. However users should
review the changes listed below.
</para>
<note>
<para>
<emphasis>Repository changes</emphasis>
</para>
<para>
Coinciding with this release, the 2ndQuadrant repository structure has changed.
See section <xref linkend="installation-packages"> for details, particularly
if you are using an RPM-based system.
</para>
</note>
<sect2>
<title>Configuration file changes</title>
<para>
<itemizedlist>
<listitem>
<para>
Default for <xref linkend="repmgr-conf-log-level"> is now <option>INFO</option>.
This produces additional informative log output, without creating excessive additional
log file volume, and matches the setting assumed for examples in the documentation.
(GitHub #470).
</para>
</listitem>
<listitem>
<para>
<varname>recovery_min_apply_delay</varname> now accepts a minimum value
of <literal>zero</literal> (GitHub #448).
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
<sect2>
<title>repmgr enhancements</title>
<para>
<itemizedlist>
<listitem>
<para>
<application>repmgr</application>: always exit with an error if an unrecognised
command line option is provided. This matches the behaviour of other PostgreSQL
utilities such as <application>psql</application>. (GitHub #464).
</para>
</listitem>
<listitem>
<para>
<application>repmgr</application>: add <option>-q/--quiet</option> option to suppress non-error
output. (GitHub #468).
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-cluster-show">repmgr cluster show</link></command>,
<command><link linkend="repmgr-node-check">repmgr node check</link></command> and
<command><link linkend="repmgr-node-status">repmgr node status</link></command>
return a non-zero exit code if node status issues are detected. (GitHub #456).
</para>
</listitem>
<listitem>
<para>
Add <option>--csv</option> output option for
<command><link linkend="repmgr-cluster-event">repmgr cluster event</link></command>.
(GitHub #471).
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-witness-unregister">repmgr witness unregister</link></command>
can be run on any node, by providing the ID of the witness node with <option>--node-id</option>.
(GitHub #472).
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-standby-switchover">repmgr standby switchover</link></command>
will refuse to run if an exclusive backup is taking place on the current primary.
(GitHub #476).
</para>
</listitem>
</itemizedlist>
</para>
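<para>
Because the status commands listed above now return a non-zero exit code when they detect
a problem, they can be used directly in monitoring scripts. The following is a minimal
sketch: the helper name <literal>report_status</literal> and the configuration file path
are illustrative assumptions, and only the zero/non-zero distinction is relied upon.
</para>

```shell
#!/bin/sh
# Run a command quietly and report whether it exited with zero status.
# Only the zero / non-zero distinction comes from the release notes;
# no meaning is assumed for specific non-zero codes.
report_status() {
    if "$@" >/dev/null 2>&1; then
        rc=0
    else
        rc=$?
    fi
    if [ "$rc" -eq 0 ]; then
        echo "OK"
    else
        echo "PROBLEM (exit code $rc)"
    fi
    return "$rc"
}

# Intended use on a node with repmgr configured (path is an assumption):
#   report_status repmgr -f /etc/repmgr.conf node check
```

<para>
The helper passes the original exit code through, so a calling script or scheduler can
still act on it.
</para>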
</sect2>
<sect2>
<title>repmgrd enhancements</title>
<para>
<itemizedlist>
<listitem>
<para>
<application>repmgrd</application>: create a PID file by default
(GitHub #457). For details, see <xref linkend="repmgrd-pid-file">.
</para>
</listitem>
<listitem>
<para>
<application>repmgrd</application>: daemonize process by default.
If, for whatever reason, you do not wish to daemonize the
process, provide <option>--daemonize=false</option>.
(GitHub #458).
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
<sect2>
<title>Bug fixes</title>
<para>
<itemizedlist>
<listitem>
<para>
<command><link linkend="repmgr-standby-register">repmgr standby register --wait-sync</link></command>:
fix behaviour when no timeout provided.
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-cluster-cleanup">repmgr cluster cleanup</link></command>:
add missing help options. (GitHub #461/#462).
</para>
</listitem>
<listitem>
<para>
Ensure witness node follows new primary after switchover. (GitHub #453).
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-node-check">repmgr node check</link></command> and
<command><link linkend="repmgr-node-status">repmgr node status</link></command>:
fix witness node handling. (GitHub #451).
</para>
</listitem>
<listitem>
<para>
When using <command><link linkend="repmgr-standby-clone">repmgr standby clone</link></command>
with <option>--recovery-conf-only</option> and replication slots, ensure
<varname>primary_slot_name</varname> is set correctly. (GitHub #474).
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
</sect1>
<sect1 id="release-4.0.6"> <sect1 id="release-4.0.6">
<title>Release 4.0.6</title> <title>Release 4.0.6</title>
<para><emphasis>Thu June 14, 2018</emphasis></para> <para><emphasis>June 14, 2018</emphasis></para>
<para> <para>
&repmgr; 4.0.6 contains a number of bug fixes and usability enhancements. &repmgr; 4.0.6 contains a number of bug fixes and usability enhancements.
</para> </para>
@@ -644,7 +140,7 @@
<listitem> <listitem>
<para> <para>
Various documentation improvements, with particular emphasis on Various documentation improvements, with particular emphasis on
the importance of setting appropriate <link linkend="configuration-file-service-commands">service commands</link> the importance of setting appropriate <link linkend="configuration-service-commands">service commands</link>
instead of relying on <application>pg_ctl</application>. instead of relying on <application>pg_ctl</application>.
</para> </para>
</listitem> </listitem>


@@ -5,14 +5,14 @@
<title>repmgr source code signing key</title> <title>repmgr source code signing key</title>
<para> <para>
The signing key ID used for <application>repmgr</application> source code bundles is: The signing key ID used for <application>repmgr</application> source code bundles is:
<ulink url="https://repmgr.org/download/SOURCE-GPG-KEY-repmgr"> <ulink url="http://packages.2ndquadrant.com/repmgr/SOURCE-GPG-KEY-repmgr">
<literal>0x297F1DCC</literal></ulink>. <literal>0x297F1DCC</literal></ulink>.
</para> </para>
<para> <para>
To download the <application>repmgr</application> source key to your computer: To download the <application>repmgr</application> source key to your computer:
<programlisting> <programlisting>
curl -s https://repmgr.org/download/SOURCE-GPG-KEY-repmgr | gpg --import curl -s http://packages.2ndquadrant.com/repmgr/SOURCE-GPG-KEY-repmgr | gpg --import
gpg --fingerprint 0x297F1DCC gpg --fingerprint 0x297F1DCC
</programlisting> </programlisting>
then verify that the fingerprint is the expected value: then verify that the fingerprint is the expected value:


@@ -4,5 +4,5 @@ BDR failover with repmgrd
This document has been integrated into the main `repmgr` documentation This document has been integrated into the main `repmgr` documentation
and is now located here: and is now located here:
> [BDR failover with repmgrd](https://repmgr.org/docs/current/repmgrd-bdr.html) > [BDR failover with repmgrd](https://repmgr.org/docs/4.0/repmgrd-bdr.html)


@@ -4,4 +4,4 @@ Changes in repmgr 4
This document has been integrated into the main `repmgr` documentation This document has been integrated into the main `repmgr` documentation
and is now located here: and is now located here:
> [Release notes](https://repmgr.org/docs/current/release-4.0.html) > [Release notes](https://repmgr.org/docs/4.0/release-4.0.html)


@@ -243,8 +243,8 @@
</simpara> </simpara>
<simpara> <simpara>
As an alternative we recommend using 2ndQuadrant's <ulink url="https://www.pgbarman.org/">Barman</ulink>, As an alternative we recommend using 2ndQuadrant's <ulink url="https://www.pgbarman.org/">Barman</ulink>,
which offloads WAL management to a separate server, removing the requirement to use a replication which offloads WAL management to a separate server, negating the need to use replication
slot for each individual standby to reserve WAL. See section <xref linkend="cloning-from-barman"> slots to reserve WAL. See section <xref linkend="cloning-from-barman">
for more details on using &repmgr; together with Barman. for more details on using &repmgr; together with Barman.
</simpara> </simpara>
</tip> </tip>
@@ -352,12 +352,10 @@
provide additional parameters for <command>pg_basebackup</command> to customise the provide additional parameters for <command>pg_basebackup</command> to customise the
cloning process. cloning process.
</para> </para>
<para> <para>
By default, <command>pg_basebackup</command> performs a checkpoint before beginning the backup By default, <command>pg_basebackup</command> performs a checkpoint before beginning the backup
process. However, a normal checkpoint may take some time to complete; process. However, a normal checkpoint may take some time to complete;
a fast checkpoint can be forced with <command><link linkend="repmgr-standby-clone">repmgr standby clone</link></command>'s a fast checkpoint can be forced with the <literal>-c/--fast-checkpoint</literal> option.
<literal>-c/--fast-checkpoint</literal> option.
Note that this may impact performance of the server being cloned from (typically the primary) Note that this may impact performance of the server being cloned from (typically the primary)
so should be used with care. so should be used with care.
</para> </para>
@@ -372,18 +370,6 @@
Other options can be passed to <command>pg_basebackup</command> by including them Other options can be passed to <command>pg_basebackup</command> by including them
in the <filename>repmgr.conf</filename> setting <varname>pg_basebackup_options</varname>. in the <filename>repmgr.conf</filename> setting <varname>pg_basebackup_options</varname>.
</para> </para>
<para>
Note that by default, &repmgr; executes <command>pg_basebackup</command> with <option>-X/--wal-method</option>
(PostgreSQL 9.6 and earlier: <option>-X/--xlog-method</option>) set to <literal>stream</literal>.
From PostgreSQL 9.6, if replication slots are in use, it will also create a replication slot before
running the base backup, and execute <command>pg_basebackup</command> with the
<option>-S/--slot</option> option set to the name of the previously created replication slot.
</para>
<para>
These parameters can be set by the user in <varname>pg_basebackup_options</varname>, in which case they
will override the &repmgr; default values. However normally there's no reason to do this.
</para>
<para> <para>
If using a separate directory to store WAL files, provide the option <literal>--waldir</literal> If using a separate directory to store WAL files, provide the option <literal>--waldir</literal>
(<literal>--xlogdir</literal> in PostgreSQL 9.6 and earlier) with the absolute path to the (<literal>--xlogdir</literal> in PostgreSQL 9.6 and earlier) with the absolute path to the


@@ -1,107 +0,0 @@
<sect1 id="configuration-file-log-settings" xreflabel="log settings">
<indexterm>
<primary>repmgr.conf</primary>
<secondary>log settings</secondary>
</indexterm>
<indexterm>
<primary>log settings</primary>
<secondary>configuration in repmgr.conf</secondary>
</indexterm>
<title>Log settings</title>
<para>
By default, &repmgr; and <application>repmgrd</application> write log output to
<literal>STDERR</literal>. An alternative log destination can be specified
(either a file or <literal>syslog</literal>).
</para>
<note>
<para>
The &repmgr; application itself will continue to write log output to <literal>STDERR</literal>
even if another log destination is configured, as otherwise any output resulting from a command
line operation will "disappear" into the log.
</para>
<para>
This behaviour can be overridden with the command line option <option>--log-to-file</option>,
which will redirect all logging output to the configured log destination. This is recommended
when &repmgr; is executed by another application, particularly <application>repmgrd</application>,
to enable log output generated by the &repmgr; application to be stored for later reference.
</para>
</note>
<variablelist>
<varlistentry id="repmgr-conf-log-level" xreflabel="log_level">
<term><varname>log_level</varname> (<type>string</type>)
<indexterm>
<primary><varname>log_level</varname> configuration file parameter</primary>
</indexterm>
</term>
<listitem>
<para>
One of <option>DEBUG</option>, <option>INFO</option>, <option>NOTICE</option>,
<option>WARNING</option>, <option>ERROR</option>, <option>ALERT</option>, <option>CRIT</option>
or <option>EMERG</option>.
</para>
<para>
Default is <option>INFO</option>.
</para>
<para>
Note that <option>DEBUG</option> will produce a substantial amount of log output
and should not be enabled in normal use.
</para>
</listitem>
</varlistentry>
<varlistentry id="repmgr-conf-log-facility" xreflabel="log_facility">
<term><varname>log_facility</varname> (<type>string</type>)
<indexterm>
<primary><varname>log_facility</varname> configuration file parameter</primary>
</indexterm>
</term>
<listitem>
<para>
Logging facility: possible values are <option>STDERR</option> (default), or for
syslog integration, one of <option>LOCAL0</option>, <option>LOCAL1</option>, <option>...</option>,
<option>LOCAL7</option>, <option>USER</option>.
</para>
</listitem>
</varlistentry>
<varlistentry id="repmgr-conf-log-file" xreflabel="log_file">
<term><varname>log_file</varname> (<type>string</type>)
<indexterm>
<primary><varname>log_file</varname> configuration file parameter</primary>
</indexterm>
</term>
<listitem>
<para>
If <xref linkend="repmgr-conf-log-facility"> is set to <option>STDERR</option>, log output
can be redirected to the specified file.
</para>
<para>
See <xref linkend="repmgrd-log-rotation"> for information on configuring log rotation.
</para>
</listitem>
</varlistentry>
<varlistentry id="repmgr-conf-log-status-interval" xreflabel="log_status_interval">
<term><varname>log_status_interval</varname> (<type>integer</type>)
<indexterm>
<primary><varname>log_status_interval</varname> configuration file parameter</primary>
</indexterm>
</term>
<listitem>
<para>
This setting causes <application>repmgrd</application> to emit a status log
line at the specified interval (in seconds, default <literal>300</literal>)
describing <application>repmgrd</application>'s current state, e.g.:
</para>
<programlisting>
[2018-07-12 00:47:32] [INFO] monitoring connection to upstream node "node1" (node ID: 1)</programlisting>
</listitem>
</varlistentry>
</variablelist>
</sect1>


@@ -1,10 +1,10 @@
<sect1 id="configuration-file-settings" xreflabel="required configuration file settings"> <sect1 id="configuration-file-settings" xreflabel="configuration file settings">
<indexterm> <indexterm>
<primary>repmgr.conf</primary> <primary>repmgr.conf</primary>
<secondary>required settings</secondary> <secondary>basic settings</secondary>
</indexterm> </indexterm>
<title>Required configuration file settings</title> <title>Basic configuration file settings</title>
<para> <para>
Each <filename>repmgr.conf</filename> file must contain the following parameters: Each <filename>repmgr.conf</filename> file must contain the following parameters:
</para> </para>


@@ -1,15 +1,15 @@
<sect1 id="configuration-file" xreflabel="configuration file"> <sect1 id="configuration-file" xreflabel="configuration file location">
<indexterm> <indexterm>
<primary>repmgr.conf</primary> <primary>repmgr.conf</primary>
<secondary>location</secondary>
</indexterm> </indexterm>
<indexterm> <indexterm>
<primary>configuration</primary> <primary>configuration</primary>
<secondary>repmgr.conf</secondary> <secondary>repmgr.conf location</secondary>
</indexterm> </indexterm>
<title>Configuration file</title> <title>Configuration file location</title>
<para> <para>
<application>repmgr</application> and <application>repmgrd</application> <application>repmgr</application> and <application>repmgrd</application>
use a common configuration file, by default called use a common configuration file, by default called
@@ -21,55 +21,6 @@
for more details. for more details.
</para> </para>
<sect2 id="configuration-file-format" xreflabel="configuration file format">
<indexterm>
<primary>repmgr.conf</primary>
<secondary>format</secondary>
</indexterm>
<title>Configuration file format</title>
<para>
<filename>repmgr.conf</filename> is a plain text file with one parameter/value
combination per line.
</para>
<para>
Whitespace is insignificant (except within a quoted parameter value) and blank lines are ignored.
Hash marks (#) designate the remainder of the line as a comment. Parameter values that are not simple
identifiers or numbers should be single-quoted. Note that a single quote cannot be embedded
in a parameter value.
</para>
<important>
<para>
&repmgr; will interpret double-quotes as being part of a string value; only use single quotes
to quote parameter values.
</para>
</important>
<para>
Example of a valid <filename>repmgr.conf</filename> file:
<programlisting>
# repmgr.conf
node_id=1
node_name= node1
conninfo ='host=node1 dbname=repmgr user=repmgr connect_timeout=2'
data_directory = /var/lib/pgsql/11/data</programlisting>
</para>
</sect2>
<sect2 id="configuration-file-location" xreflabel="configuration file location">
<indexterm>
<primary>repmgr.conf</primary>
<secondary>location</secondary>
</indexterm>
<title>Configuration file location</title>
<para> <para>
The configuration file will be searched for in the following locations: The configuration file will be searched for in the following locations:
<itemizedlist spacing="compact" mark="bullet"> <itemizedlist spacing="compact" mark="bullet">
@@ -99,7 +50,7 @@ data_directory = /var/lib/pgsql/11/data</programlisting>
Note that if a file is explicitly specified with <literal>-f/--config-file</literal>, Note that if a file is explicitly specified with <literal>-f/--config-file</literal>,
an error will be raised if it is not found or not readable, and no attempt will be made to an error will be raised if it is not found or not readable, and no attempt will be made to
check default locations; this is to prevent <application>repmgr</application> unexpectedly check default locations; this is to prevent <application>repmgr</application> unexpectedly
reading the wrong configuration file. reading the wrong configuraton file.
</para> </para>
<note> <note>
@@ -115,6 +66,4 @@ data_directory = /var/lib/pgsql/11/data</programlisting>
<filename>/path/to/repmgr.conf</filename>). <filename>/path/to/repmgr.conf</filename>).
</para> </para>
</note> </note>
</sect1>
</sect2>
</sect1>


@@ -1,4 +1,4 @@
<sect1 id="configuration-file-service-commands" xreflabel="service command settings"> <sect1 id="configuration-service-commands" xreflabel="service command settings">
<indexterm> <indexterm>
<primary>repmgr.conf</primary> <primary>repmgr.conf</primary>
<secondary>service command settings</secondary> <secondary>service command settings</secondary>
@@ -17,9 +17,9 @@
<link linkend="repmgr-node-rejoin"><command>repmgr node rejoin</command></link>. <link linkend="repmgr-node-rejoin"><command>repmgr node rejoin</command></link>.
</para> </para>
<para> <para>
By default, &repmgr; will use PostgreSQL's <command>pg_ctl</command> utility to control the PostgreSQL By default, &repmgr; will use PostgreSQL's <command>pg_ctl</command> to control the PostgreSQL
server. However this can lead to various problems, particularly when PostgreSQL has been server. However this can lead to various problems, particularly when PostgreSQL has been
installed from packages, and especially so if <application>systemd</application> is in use. installed from packages, and expecially so if <application>systemd</application> is in use.
</para> </para>
@@ -47,14 +47,6 @@
service_restart_command service_restart_command
service_reload_command</programlisting> service_reload_command</programlisting>
</para> </para>
<note>
<para>
&repmgr; will not apply <option>pg_bindir</option> when executing any of these commands;
these can be user-defined scripts so must always be specified with the full path.
</para>
</note>
<note> <note>
<para> <para>
It's also possible to specify a <varname>service_promote_command</varname>. It's also possible to specify a <varname>service_promote_command</varname>.
@@ -64,7 +56,7 @@
</para> </para>
<para> <para>
If your packaging system does not provide such a command, it can be left empty, If your packaging system does not provide such a command, it can be left empty,
and &repmgr; will generate the appropriate `pg_ctl ... promote` command. and &repmgr; will generate the appropriate <command>pg_ctl ... promote</command> command.
</para> </para>
<para> <para>
Do not confuse this with <varname>promote_command</varname>, which is used Do not confuse this with <varname>promote_command</varname>, which is used
@@ -72,14 +64,15 @@
</para> </para>
</note> </note>
<para> <para>
To confirm which command &repmgr; will execute for each action, use To confirm which command &repmgr; will execute for each action, use
<command><link linkend="repmgr-node-service">repmgr node service --list-actions --action=...</link></command>, e.g.: <command>repmgr node service --list --action=...</command>, e.g.:
<programlisting> <programlisting>
repmgr -f /etc/repmgr.conf node service --list-actions --action=stop repmgr -f /etc/repmgr.conf node service --list --action=stop
repmgr -f /etc/repmgr.conf node service --list-actions --action=start repmgr -f /etc/repmgr.conf node service --list --action=start
repmgr -f /etc/repmgr.conf node service --list-actions --action=restart repmgr -f /etc/repmgr.conf node service --list --action=restart
repmgr -f /etc/repmgr.conf node service --list-actions --action=reload</programlisting> repmgr -f /etc/repmgr.conf node service --list --action=reload</programlisting>
</para> </para>
<para> <para>
@@ -99,7 +92,7 @@
Defaults:postgres !requiretty Defaults:postgres !requiretty
postgres ALL = NOPASSWD: /usr/bin/systemctl stop postgresql-9.6, \ postgres ALL = NOPASSWD: /usr/bin/systemctl stop postgresql-9.6, \
/usr/bin/systemctl start postgresql-9.6, \ /usr/bin/systemctl start postgresql-9.6, \
/usr/bin/systemctl restart postgresql-9.6, \ /usr/bin/systemctl restart postgresql-9.6 \
/usr/bin/systemctl reload postgresql-9.6</programlisting> /usr/bin/systemctl reload postgresql-9.6</programlisting>
</para> </para>


@@ -1,304 +1,17 @@
<chapter id="configuration" xreflabel="Configuration"> <chapter id="configuration" xreflabel="Configuration">
<title>repmgr configuration</title> <title>repmgr configuration</title>
<sect1 id="configuration-prerequisites" xreflabel="Prerequisites for configuration">
<indexterm>
<primary>configuration</primary>
<secondary>prerequisites</secondary>
</indexterm>
<indexterm>
<primary>configuration</primary>
<secondary>ssh</secondary>
</indexterm>
<title>Prerequisites for configuration</title>
<para>
The following software must be installed on both servers:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><application>PostgreSQL</application></simpara>
</listitem>
<listitem>
<simpara>
<application>repmgr</application>
</simpara>
</listitem>
</itemizedlist>
</para>
<para>
At network level, connections to the PostgreSQL port (default: <literal>5432</literal>)
must be possible between all nodes.
</para>
<para>
Passwordless <command>SSH</command> connectivity between all servers in the replication cluster
is not required, but is necessary in the following cases:
<itemizedlist>
<listitem>
<simpara>if you need &repmgr; to copy configuration files from outside the PostgreSQL
data directory (as is the case with e.g. <link linkend="packages-debian-ubuntu">Debian packages</link>);
in this case <command>rsync</command> must also be installed on all servers.
</simpara>
</listitem>
<listitem>
<simpara>to perform <link linkend="performing-switchover">switchover operations</link></simpara>
</listitem>
<listitem>
<simpara>
when executing <command><link linkend="repmgr-cluster-matrix">repmgr cluster matrix</link></command>
and <command><link linkend="repmgr-cluster-crosscheck">repmgr cluster crosscheck</link></command>
</simpara>
</listitem>
</itemizedlist>
</para>
<tip>
<simpara>
Consider setting <varname>ConnectTimeout</varname> to a low value in your SSH configuration.
This will make it faster to detect any SSH connection errors.
</simpara>
</tip>
<sect2 id="configuration-postgresql" xreflabel="PostgreSQL configuration">
<indexterm>
<primary>configuration</primary>
<secondary>PostgreSQL</secondary>
</indexterm>
<indexterm>
<primary>PostgreSQL configuration</primary>
</indexterm>
<title>PostgreSQL configuration for &repmgr;</title>
<para>
The following PostgreSQL configuration parameters may need to be changed in order
for &repmgr; (and replication itself) to function correctly.
</para>
<variablelist>
<varlistentry>
<indexterm>
<primary>hot_standby</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<term><option>hot_standby</option></term>
<listitem>
<para>
<option>hot_standby</option> must always be set to <literal>on</literal>, as &repmgr; needs
to be able to connect to each server it manages.
</para>
<para>
Note that <option>hot_standby</option> defaults to <literal>on</literal> from PostgreSQL 10
and later; in PostgreSQL 9.6 and earlier, the default was <literal>off</literal>.
</para>
<para>
PostgreSQL documentation: <ulink url="https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-HOT-STANDBY">hot_standby</ulink>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<indexterm>
<primary>wal_level</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<term><option>wal_level</option></term>
<listitem>
<para>
<option>wal_level</option> must be one of <option>replica</option> or <option>logical</option>
(PostgreSQL 9.5 and earlier: one of <option>hot_standby</option> or <option>logical</option>).
</para>
<para>
PostgreSQL documentation: <ulink url="https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-WAL-LEVEL">wal_level</ulink>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<indexterm>
<primary>max_wal_senders</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<term><option>max_wal_senders</option></term>
<listitem>
<para>
<option>max_wal_senders</option> must be set to a value of <literal>2</literal> or greater.
In general you will need one WAL sender for each standby which will attach to the PostgreSQL
instance; additionally &repmgr; will require two free WAL senders in order to clone further
standbys.
</para>
<para>
<option>max_wal_senders</option> should be set to an appropriate value on all PostgreSQL
instances in the replication cluster which may potentially become a primary server or
(in cascading replication) the upstream server of a standby.
</para>
<para>
PostgreSQL documentation: <ulink url="https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-MAX-WAL-SENDERS">max_wal_senders</ulink>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<indexterm>
<primary>max_replication_slots</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<term><option>max_replication_slots</option></term>
<listitem>
<para>
If you are intending to use replication slots, <option>max_replication_slots</option>
must be set to a non-zero value.
</para>
<para>
<option>max_replication_slots</option> should be set to an appropriate value on all PostgreSQL
instances in the replication cluster which may potentially become a primary server or
(in cascading replication) the upstream server of a standby.
</para>
<para>
PostgreSQL documentation: <ulink url="https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-MAX-REPLICATION-SLOTS">max_replication_slots</ulink>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<indexterm>
<primary>wal_log_hints</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<term><option>wal_log_hints</option></term>
<listitem>
<para>If you are intending to use <application>pg_rewind</application>,
and the cluster was not initialised using data checksums, you may want to consider enabling
<option>wal_log_hints</option>.
</para>
<para>
For more details see <xref linkend="repmgr-node-rejoin-pg-rewind">.
</para>
<para>
PostgreSQL documentation: <ulink url="https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-WAL-LOG-HINTS">wal_log_hints</ulink>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<indexterm>
<primary>archive_mode</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<term><option>archive_mode</option></term>
<listitem>
<para>
We suggest setting <option>archive_mode</option> to <literal>on</literal> (and
<option>archive_command</option> to <literal>/bin/true</literal>; see below)
even if you are currently not planning to use WAL file archiving.
</para>
<para>
This will make it simpler to set up WAL file archiving if it is ever required,
as changes to <option>archive_mode</option> require a full PostgreSQL server
restart, while <option>archive_command</option> changes can be applied via a normal
configuration reload.
</para>
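<para>
As a sketch, the settings suggested above would appear in <filename>postgresql.conf</filename>
as follows (assuming WAL file archiving is not yet in use):
<programlisting>
# changing archive_mode requires a full server restart
archive_mode = on
# placeholder which archives nothing but reports success; can be changed with a reload
archive_command = '/bin/true'</programlisting>
</para>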
<para>
However, &repmgr; itself does not require WAL file archiving.
</para>
<para>
PostgreSQL documentation: <ulink url="https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-ARCHIVE-MODE">archive_mode</ulink>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<indexterm>
<primary>archive_command</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<term><option>archive_command</option></term>
<listitem>
<para>
If you have set <option>archive_mode</option> to <literal>on</literal> but are not currently planning
to use WAL file archiving, set <option>archive_command</option> to a command which does nothing but returns
<literal>true</literal>, such as <command>/bin/true</command>. See above for details.
</para>
<para>
PostgreSQL documentation: <ulink url="https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-ARCHIVE-COMMAND">archive_command</ulink>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<indexterm>
<primary>wal_keep_segments</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<term><option>wal_keep_segments</option></term>
<listitem>
<para>
Normally there is no need to set <option>wal_keep_segments</option> (default: <literal>0</literal>), as it
is <emphasis>not</emphasis> a reliable way of ensuring that all required WAL segments are available to standbys.
Replication slots and/or an archiving solution such as Barman are recommended to ensure standbys have a reliable
source of WAL segments at all times.
</para>
<para>
The only reason ever to set <option>wal_keep_segments</option> is if you have
configured <option>pg_basebackup_options</option>
in <filename>repmgr.conf</filename> to include the setting <literal>--wal-method=fetch</literal>
(PostgreSQL 9.6 and earlier: <literal>--xlog-method=fetch</literal>)
<emphasis>and</emphasis> you have <emphasis>not</emphasis> set <option>restore_command</option>
in <filename>repmgr.conf</filename> to fetch WAL files from a reliable source such as Barman,
in which case you'll need to set <option>wal_keep_segments</option>
to a sufficiently high number to ensure that all WAL files required by the standby
are retained. However, we do not recommend managing replication in this way.
</para>
<para>
PostgreSQL documentation: <ulink url="https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-WAL-KEEP-SEGMENTS">wal_keep_segments</ulink>.
</para>
</listitem>
</varlistentry>
</variablelist>
<para>
See also the <link linkend="quickstart-postgresql-configuration">PostgreSQL configuration</link> section in the
<link linkend="quickstart">Quick-start guide</link>.
</para>
</sect2>
</sect1>
&configuration-file; &configuration-file;
&configuration-file-required-settings; &configuration-file-settings;
&configuration-file-log-settings; &configuration-service-commands;
&configuration-file-service-commands;
<sect1 id="configuration-permissions" xreflabel="Database user permissions"> <sect1 id="configuration-permissions" xreflabel="User permissions">
<indexterm> <indexterm>
<primary>configuration</primary> <primary>configuration</primary>
<secondary>database user permissions</secondary> <secondary>user permissions</secondary>
</indexterm> </indexterm>
<title>repmgr database user permissions</title> <title>repmgr user permissions</title>
<para> <para>
&repmgr; will create an extension database containing objects &repmgr; will create an extension database containing objects
for administering &repmgr; metadata. The user defined in the <varname>conninfo</varname> for administering &repmgr; metadata. The user defined in the <varname>conninfo</varname>


@@ -16,22 +16,15 @@
<para> <para>
A typical use case for a witness server is a two-node streaming replication A typical use case for a witness server is a two-node streaming replication
setup, where the primary and standby are in different locations (data centres). setup, where the primary and standby are in different locations (data centres).
By creating a witness server in the same location (data centre) as the primary, By creating a witness server in the same location as the primary, if the primary
if the primary becomes unavailable it's possible for the standby to decide whether becomes unavailable it's possible for the standby to decide whether it can
it can promote itself without risking a "split brain" scenario: if it can't see either the promote itself without risking a "split brain" scenario: if it can't see either the
witness or the primary server, it's likely there's a network-level interruption witness or the primary server, it's likely there's a network-level interruption
and it should not promote itself. If it can seen the witness but not the primary, and it should not promote itself. If it can seen the witness but not the primary,
this proves there is no network interruption and the primary itself is unavailable, this proves there is no network interruption and the primary itself is unavailable,
and it can therefore promote itself (and ideally take action to fence the and it can therefore promote itself (and ideally take action to fence the
former primary). former primary).
</para> </para>
<note>
<para>
<emphasis>Never</emphasis> install a witness server on the same physical host
as another node in the replication cluster managed by &repmgr; - it's essential
the witness is not affected in any way by failure of another node.
</para>
</note>
<para> <para>
For more complex replication scenarios,e.g. with multiple datacentres, it may For more complex replication scenarios,e.g. with multiple datacentres, it may
be preferable to use location-based failover, which ensures that only nodes be preferable to use location-based failover, which ensures that only nodes


@@ -147,76 +147,58 @@
<para> <para>
By default, all notification types will be passed to the designated script; By default, all notification types will be passed to the designated script;
the notification types can be filtered to explicitly named ones using the the notification types can be filtered to explicitly named ones using the
<varname>event_notifications</varname> parameter. <varname>event_notifications</varname> parameter:
</para>
<para>
Events generated by the &repmgr; command:
<itemizedlist spacing="compact" mark="bullet"> <itemizedlist spacing="compact" mark="bullet">
<listitem> <listitem>
<simpara><literal><link linkend="repmgr-primary-register-events">cluster_created</link></literal></simpara> <simpara><literal>primary_register</literal></simpara>
</listitem> </listitem>
<listitem> <listitem>
<simpara><literal><link linkend="repmgr-primary-register-events">primary_register</link></literal></simpara> <simpara><literal>primary_unregister</literal></simpara>
</listitem> </listitem>
<listitem> <listitem>
<simpara><literal><link linkend="repmgr-primary-unregister-events">primary_unregister</link></literal></simpara> <simpara><literal>standby_register</literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-standby-clone-events">standby_clone</link></literal></simpara>
</listitem> </listitem>
<listitem> <listitem>
<simpara><literal><link linkend="repmgr-standby-register-events">standby_register</link></literal></simpara> <simpara><literal>standby_register_sync</literal></simpara>
</listitem> </listitem>
<listitem> <listitem>
<simpara><literal><link linkend="repmgr-standby-register-events">standby_register_sync</link></literal></simpara> <simpara><literal>standby_unregister</literal></simpara>
</listitem> </listitem>
<listitem> <listitem>
<simpara><literal><link linkend="repmgr-standby-unregister-events">standby_unregister</link></literal></simpara> <simpara><literal>standby_clone</literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-standby-promote-events">standby_promote</link></literal></simpara>
</listitem> </listitem>
<listitem> <listitem>
<simpara><literal><link linkend="repmgr-standby-follow-events">standby_follow</link></literal></simpara> <simpara><literal>standby_promote</literal></simpara>
</listitem> </listitem>
<listitem> <listitem>
<simpara><literal><link linkend="repmgr-standby-switchover-events">standby_switchover</link></literal></simpara> <simpara><literal>standby_follow</literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-witness-register-events">witness_register</link></literal></simpara>
</listitem> </listitem>
<listitem> <listitem>
<simpara><literal><link linkend="repmgr-witness-unregister-events">witness_unregister</link></literal></simpara> <simpara><literal>standby_disconnect_manual</literal></simpara>
</listitem> </listitem>
<listitem> <listitem>
<simpara><literal><link linkend="repmgr-node-rejoin-events">node_rejoin</link></literal></simpara> <simpara><literal>standby_failure</literal></simpara>
</listitem> </listitem>
<listitem> <listitem>
<simpara><literal><link linkend="repmgr-cluster-cleanup-events">cluster_cleanup</link></literal></simpara> <simpara><literal>standby_recovery</literal></simpara>
</listitem>
<listitem>
<simpara><literal>witness_register</literal></simpara>
</listitem>
<listitem>
<simpara><literal>witness_unregister</literal></simpara>
</listitem>
<listitem>
<simpara><literal>node_rejoin</literal></simpara>
</listitem> </listitem>
</itemizedlist>
</para>
<para>
Events generated by <application>repmgrd</application> (streaming replication mode):
<itemizedlist spacing="compact" mark="bullet">
<listitem> <listitem>
<simpara><literal>repmgrd_start</literal></simpara> <simpara><literal>repmgrd_start</literal></simpara>
</listitem> </listitem>
<listitem> <listitem>
<simpara><literal>repmgrd_shutdown</literal></simpara> <simpara><literal>repmgrd_shutdown</literal></simpara>
</listitem> </listitem>
<listitem>
<simpara><literal>repmgrd_reload</literal></simpara>
</listitem>
<listitem> <listitem>
<simpara><literal>repmgrd_failover_promote</literal></simpara> <simpara><literal>repmgrd_failover_promote</literal></simpara>
</listitem> </listitem>
@@ -226,41 +208,15 @@
<listitem> <listitem>
<simpara><literal>repmgrd_failover_aborted</literal></simpara> <simpara><literal>repmgrd_failover_aborted</literal></simpara>
</listitem> </listitem>
<listitem>
<simpara><literal>repmgrd_standby_reconnect</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_promote_error</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_local_disconnect</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_local_reconnect</literal></simpara>
</listitem>
<listitem> <listitem>
<simpara><literal>repmgrd_upstream_disconnect</literal></simpara> <simpara><literal>repmgrd_upstream_disconnect</literal></simpara>
</listitem> </listitem>
<listitem> <listitem>
<simpara><literal>repmgrd_upstream_reconnect</literal></simpara> <simpara><literal>repmgrd_upstream_reconnect</literal></simpara>
</listitem> </listitem>
<listitem> <listitem>
<simpara><literal>standby_disconnect_manual</literal></simpara> <simpara><literal>repmgrd_promote_error</literal></simpara>
</listitem> </listitem>
<listitem>
<simpara><literal>standby_failure</literal></simpara>
</listitem>
<listitem>
<simpara><literal>standby_recovery</literal></simpara>
</listitem>
</itemizedlist>
</para>
<para>
Events generated by <application>repmgrd</application> (BDR mode):
<itemizedlist spacing="compact" mark="bullet">
<listitem> <listitem>
<simpara><literal>bdr_failover</literal></simpara> <simpara><literal>bdr_failover</literal></simpara>
</listitem> </listitem>


@@ -38,9 +38,8 @@
<!ENTITY quickstart SYSTEM "quickstart.sgml"> <!ENTITY quickstart SYSTEM "quickstart.sgml">
<!ENTITY configuration SYSTEM "configuration.sgml"> <!ENTITY configuration SYSTEM "configuration.sgml">
<!ENTITY configuration-file SYSTEM "configuration-file.sgml"> <!ENTITY configuration-file SYSTEM "configuration-file.sgml">
<!ENTITY configuration-file-required-settings SYSTEM "configuration-file-required-settings.sgml"> <!ENTITY configuration-file-settings SYSTEM "configuration-file-settings.sgml">
<!ENTITY configuration-file-log-settings SYSTEM "configuration-file-log-settings.sgml"> <!ENTITY configuration-service-commands SYSTEM "configuration-service-commands.sgml">
<!ENTITY configuration-file-service-commands SYSTEM "configuration-file-service-commands.sgml">
<!ENTITY cloning-standbys SYSTEM "cloning-standbys.sgml"> <!ENTITY cloning-standbys SYSTEM "cloning-standbys.sgml">
<!ENTITY promoting-standby SYSTEM "promoting-standby.sgml"> <!ENTITY promoting-standby SYSTEM "promoting-standby.sgml">
<!ENTITY follow-new-primary SYSTEM "follow-new-primary.sgml"> <!ENTITY follow-new-primary SYSTEM "follow-new-primary.sgml">
@@ -58,7 +57,6 @@
<!ENTITY repmgrd-cascading-replication SYSTEM "repmgrd-cascading-replication.sgml"> <!ENTITY repmgrd-cascading-replication SYSTEM "repmgrd-cascading-replication.sgml">
<!ENTITY repmgrd-network-split SYSTEM "repmgrd-network-split.sgml"> <!ENTITY repmgrd-network-split SYSTEM "repmgrd-network-split.sgml">
<!ENTITY repmgrd-witness-server SYSTEM "repmgrd-witness-server.sgml"> <!ENTITY repmgrd-witness-server SYSTEM "repmgrd-witness-server.sgml">
<!ENTITY repmgrd-pausing SYSTEM "repmgrd-pausing.sgml">
<!ENTITY repmgrd-bdr SYSTEM "repmgrd-bdr.sgml"> <!ENTITY repmgrd-bdr SYSTEM "repmgrd-bdr.sgml">
<!ENTITY repmgr-primary-register SYSTEM "repmgr-primary-register.sgml"> <!ENTITY repmgr-primary-register SYSTEM "repmgr-primary-register.sgml">
@@ -74,15 +72,11 @@
<!ENTITY repmgr-node-status SYSTEM "repmgr-node-status.sgml"> <!ENTITY repmgr-node-status SYSTEM "repmgr-node-status.sgml">
<!ENTITY repmgr-node-check SYSTEM "repmgr-node-check.sgml"> <!ENTITY repmgr-node-check SYSTEM "repmgr-node-check.sgml">
<!ENTITY repmgr-node-rejoin SYSTEM "repmgr-node-rejoin.sgml"> <!ENTITY repmgr-node-rejoin SYSTEM "repmgr-node-rejoin.sgml">
<!ENTITY repmgr-node-service SYSTEM "repmgr-node-service.sgml">
<!ENTITY repmgr-cluster-show SYSTEM "repmgr-cluster-show.sgml"> <!ENTITY repmgr-cluster-show SYSTEM "repmgr-cluster-show.sgml">
<!ENTITY repmgr-cluster-matrix SYSTEM "repmgr-cluster-matrix.sgml"> <!ENTITY repmgr-cluster-matrix SYSTEM "repmgr-cluster-matrix.sgml">
<!ENTITY repmgr-cluster-crosscheck SYSTEM "repmgr-cluster-crosscheck.sgml"> <!ENTITY repmgr-cluster-crosscheck SYSTEM "repmgr-cluster-crosscheck.sgml">
<!ENTITY repmgr-cluster-event SYSTEM "repmgr-cluster-event.sgml"> <!ENTITY repmgr-cluster-event SYSTEM "repmgr-cluster-event.sgml">
<!ENTITY repmgr-cluster-cleanup SYSTEM "repmgr-cluster-cleanup.sgml"> <!ENTITY repmgr-cluster-cleanup SYSTEM "repmgr-cluster-cleanup.sgml">
<!ENTITY repmgr-daemon-status SYSTEM "repmgr-daemon-status.sgml">
<!ENTITY repmgr-daemon-pause SYSTEM "repmgr-daemon-pause.sgml">
<!ENTITY repmgr-daemon-unpause SYSTEM "repmgr-daemon-unpause.sgml">
<!ENTITY appendix-release-notes SYSTEM "appendix-release-notes.sgml"> <!ENTITY appendix-release-notes SYSTEM "appendix-release-notes.sgml">
<!ENTITY appendix-faq SYSTEM "appendix-faq.sgml"> <!ENTITY appendix-faq SYSTEM "appendix-faq.sgml">


@@ -16,7 +16,7 @@
<para> <para>
&repmgr; RPM packages for RedHat/CentOS variants and Fedora are available from the &repmgr; RPM packages for RedHat/CentOS variants and Fedora are available from the
<ulink url="https://2ndquadrant.com">2ndQuadrant</ulink> <ulink url="https://2ndquadrant.com">2ndQuadrant</ulink>
<ulink url="https://dl.2ndquadrant.com/">public repository</ulink>; see following <ulink url="https://rpm.2ndquadrant.com/">public RPM repository</ulink>; see following
section for details. section for details.
</para> </para>
<para> <para>
@@ -29,17 +29,16 @@
</para> </para>
<note> <note>
<para> <para>
&repmgr; RPM packages are designed to be compatible with the community-provided PostgreSQL packages &repmgr; packages are designed to be compatible with the community-provided PostgreSQL packages.
and 2ndQuadrant's <ulink url="https://www.2ndquadrant.com/en/resources/2ndqpostgres/">2ndQPostgres</ulink>.
They may not work with vendor-specific packages such as those provided by RedHat for RHEL They may not work with vendor-specific packages such as those provided by RedHat for RHEL
customers, as the PostgreSQL filesystem layout may be different to the community RPMs. customers, as the filesystem layout may be different to the community RPMs.
Please contact your support vendor for assistance. Please contact your support vendor for assistance.
</para> </para>
</note> </note>
<para> <para>
For more information on the package contents, including details of installation For more information on the package contents, including details of installation
paths and relevant <link linkend="configuration-file-service-commands">service commands</link>, paths and relevant <link linkend="configuration-service-commands">service commands</link>,
see the appendix section <xref linkend="packages-centos">. see the appendix section <xref linkend="packages-centos">.
</para> </para>
@@ -47,14 +46,26 @@
<sect3 id="installation-packages-redhat-2ndq"> <sect3 id="installation-packages-redhat-2ndq">
<title>2ndQuadrant public RPM yum repository</title> <title>2ndQuadrant public RPM yum repository</title>
<note>
<para> <para>
<ulink url="https://2ndquadrant.com">2ndQuadrant</ulink> previously provided a dedicated
&repmgr; repository at
<ulink url="http://packages.2ndquadrant.com/repmgr/">http://packages.2ndquadrant.com/repmgr/</ulink>.
This repository will be deprecated in a future release as it is now replaced by
the <ulink url="https://rpm.2ndquadrant.com/">public RPM repository</ulink>
documented below.
</para>
</note>
<para>
Beginning with <ulink url="https://repmgr.org/docs/4.0/release-4.0.5.html">repmgr 4.0.5</ulink>,
<ulink url="https://2ndquadrant.com/">2ndQuadrant</ulink> provides a dedicated <literal>yum</literal> <ulink url="https://2ndquadrant.com/">2ndQuadrant</ulink> provides a dedicated <literal>yum</literal>
<ulink url="https://dl.2ndquadrant.com/">public repository</ulink> for 2ndQuadrant software, <ulink url="https://rpm.2ndquadrant.com/">public RPM repository</ulink> for 2ndQuadrant software,
including &repmgr;. We recommend using this for all future &repmgr; releases. including &repmgr;. We recommend using this for all future &repmgr; releases.
</para> </para>
<para> <para>
General instructions for using this repository can be found on its General instructions for using this repository can be found on its
<ulink url="https://dl.2ndquadrant.com/">homepage</ulink>. Specific instructions <ulink url="https://rpm.2ndquadrant.com/">homepage</ulink>. Specific instructions
for installing &repmgr; follow below. for installing &repmgr; follow below.
</para> </para>
<para> <para>
@@ -64,36 +75,29 @@
<listitem> <listitem>
<para> <para>
Locate the repository RPM for your PostgreSQL version from the list at: Locate the repository RPM for your PostgreSQL version from the list at:
<ulink url="https://dl.2ndquadrant.com/">https://dl.2ndquadrant.com/</ulink> <ulink url="https://rpm.2ndquadrant.com/">https://rpm.2ndquadrant.com/</ulink>
</para> </para>
</listitem> </listitem>
<listitem> <listitem>
<para> <para>
Install the repository definition for your distribution and PostgreSQL version Install the repository RPM for your distribution and PostgreSQL version
(this enables the 2ndQuadrant repository as a source of &repmgr; packages). (this enables the 2ndQuadrant repository as a source of &repmgr; packages).
</para> </para>
<para> <para>
For example, for PostgreSQL 10 on CentOS, execute: For example, for PostgreSQL 10 on CentOS, execute:
<programlisting> <programlisting>
curl https://dl.2ndquadrant.com/default/release/get/10/rpm | sudo bash</programlisting> sudo yum install https://rpm.2ndquadrant.com/site/content/2ndquadrant-repo-10-1-1.el7.noarch.rpm
</programlisting>
</para> </para>
<para>
For PostgreSQL 9.6 on CentOS, execute:
<programlisting>
curl https://dl.2ndquadrant.com/default/release/get/9.6/rpm | sudo bash</programlisting>
</para>
<para> <para>
Verify that the repository is installed with: Verify that the repository is installed with:
<programlisting> <programlisting>
sudo yum repolist</programlisting> sudo yum repolist</programlisting>
The output should contain two entries like this: The output should contain two entries like this:
<programlisting> <programlisting>
2ndquadrant-dl-default-release-pg10/7/x86_64 2ndQuadrant packages (PG10) for 7 - x86_64 4 2ndquadrant-repo-10/7/x86_64 2ndQuadrant packages for PG10 for rhel 7 - x86_64 1
2ndquadrant-dl-default-release-pg10-debug/7/x86_64 2ndQuadrant packages (PG10) for 7 - x86_64 - Debug 3</programlisting> 2ndquadrant-repo-10-debug/7/x86_64 2ndQuadrant packages for PG10 for rhel 7 - x86_64 - Debug 1</programlisting>
</para> </para>
</listitem> </listitem>
@@ -101,23 +105,8 @@ sudo yum repolist</programlisting>
<para> <para>
Install the &repmgr version appropriate for your PostgreSQL version (e.g. <literal>repmgr10</literal>): Install the &repmgr version appropriate for your PostgreSQL version (e.g. <literal>repmgr10</literal>):
<programlisting> <programlisting>
sudo yum install repmgr10</programlisting> $ yum install repmgr10</programlisting>
</para> </para>
<note>
<para>
For packages for PostgreSQL 9.6 and earlier, the package name does not contain
a period between major and minor version numbers, e.g.
<literal>repmgr96</literal>.
</para>
</note>
<tip>
<para>
To determine the names of available packages, execute:
<programlisting>
yum search repmgr</programlisting>
</para>
</tip>
</listitem> </listitem>
</itemizedlist> </itemizedlist>
</para> </para>
@@ -178,7 +167,7 @@ yum search repmgr</programlisting>
</para> </para>
<para> <para>
For more information on the package contents, including details of installation For more information on the package contents, including details of installation
paths and relevant <link linkend="configuration-file-service-commands">service commands</link>, paths and relevant <link linkend="configuration-service-commands">service commands</link>,
see the appendix section <xref linkend="packages-debian-ubuntu">. see the appendix section <xref linkend="packages-debian-ubuntu">.
</para> </para>
@@ -186,50 +175,60 @@ yum search repmgr</programlisting>
<title>2ndQuadrant public apt repository for Debian/Ubuntu</title> <title>2ndQuadrant public apt repository for Debian/Ubuntu</title>
<para> <para>
Beginning with <ulink url="https://repmgr.org/docs/4.0/release-4.0.5.html">repmgr 4.0.5</ulink>,
<ulink url="https://2ndquadrant.com/">2ndQuadrant</ulink> provides a <ulink url="https://2ndquadrant.com/">2ndQuadrant</ulink> provides a
<ulink url="https://dl.2ndquadrant.com/">public apt repository</ulink> for 2ndQuadrant software, <ulink url="https://apt.2ndquadrant.com/">public apt repository</ulink> for 2ndQuadrant software,
including &repmgr;. including &repmgr;.
</para> </para>
<para> <para>
General instructions for using this repository can be found on its General instructions for using this repository can be found on its
<ulink url="https://dl.2ndquadrant.com/">homepage</ulink>. Specific instructions <ulink url="https://apt.2ndquadrant.com/">homepage</ulink>. Specific instructions
for installing &repmgr; follow below. for installing &repmgr; follow below.
</para> </para>
<para> <para>
<emphasis>Installation</emphasis> <emphasis>Installation</emphasis>
<itemizedlist> <itemizedlist>
<listitem> <listitem>
<para> <para>
Install the repository definition for your distribution and PostgreSQL version If not already present, install the <application>apt-transport-https</application> package:
(this enables the 2ndQuadrant repository as a source of &repmgr; packages) by executing:
<programlisting> <programlisting>
curl https://dl.2ndquadrant.com/default/release/get/deb | sudo bash</programlisting> sudo apt-get install apt-transport-https</programlisting>
</para> </para>
<note>
<para>
This will automatically install the following additional packages, if not already present:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><literal>lsb-release</literal></simpara>
</listitem>
<listitem>
<simpara><literal>apt-transport-https</literal></simpara>
</listitem>
</itemizedlist>
</para>
</note>
</listitem> </listitem>
<listitem>
<para>
Create <filename>/etc/apt/sources.list.d/2ndquadrant.list</filename> as follows:
<programlisting>
sudo sh -c 'echo "deb https://apt.2ndquadrant.com/ $(lsb_release -cs)-2ndquadrant main" > /etc/apt/sources.list.d/2ndquadrant.list'</programlisting>
</para>
</listitem>
<listitem>
<para>
Install the 2ndQuadrant <ulink url="https://apt.2ndquadrant.com/site/keys/9904CD4BD6BAF0C3.asc">repository key</ulink>:
<programlisting>
sudo apt-get install curl ca-certificates
curl https://apt.2ndquadrant.com/site/keys/9904CD4BD6BAF0C3.asc | sudo apt-key add -</programlisting>
</para>
</listitem>
<listitem>
<para>
Update the package list
<programlisting>
sudo apt-get update</programlisting>
</para>
</listitem>
<listitem> <listitem>
<para> <para>
Install the &repmgr version appropriate for your PostgreSQL version (e.g. <literal>repmgr10</literal>): Install the &repmgr version appropriate for your PostgreSQL version (e.g. <literal>repmgr10</literal>):
<programlisting> <programlisting>
sudo apt-get install postgresql-10-repmgr</programlisting> $ apt-get install postgresql-10-repmgr</programlisting>
</para> </para>
<note> <note>
<para> <para>


@@ -13,9 +13,8 @@
</para> </para>
<para> <para>
&repmgr; 4.x is compatible with all PostgreSQL versions from 9.3. See From version 4.0, repmgr is compatible with all PostgreSQL versions from 9.3, including PostgreSQL 10.
section <link linkend="install-compatibility-matrix">&repmgr; compatibility matrix</link> Note that some &repmgr; functionality is not available in PostgreSQL 9.3 and PostgreSQL 9.4.
for an overview of version compatibility.
</para> </para>
<note> <note>
@@ -32,33 +31,34 @@
<para> <para>
&repmgr; must be installed on each server in the replication cluster. &repmgr; must be installed on each server in the replication cluster.
If installing repmgr from packages, the package version must match the PostgreSQL If installing repmgr from packages, the package version must match the PostgreSQL
version. If installing from source, &repmgr; must be compiled against the same version. If installing from source, repmgr must be compiled against the same
major version. major version.
</para> </para>
<note>
<simpara>
The same &quot;major&quot; &repmgr; version (e.g. <literal>4.2.x</literal>) <emphasis>must</emphasis>
be installed on all node in the replication cluster. We strongly recommend keeping all
nodes on the same (preferably latest) &quot;minor&quot; &repmgr; version to minimize the risk
of incompatibilities.
</simpara>
<simpara>
If different &quot;major&quot; &repmgr; versions (e.g. 3.3.x and 4.1.x)
are installed on different nodes, in the best case &repmgr; (in particular <application>repmgrd</application>)
will not run. In the worst case, you will end up with a broken cluster.
</simpara>
</note>
<para> <para>
A dedicated system user for &repmgr; is <emphasis>not</emphasis> required; as many &repmgr; and A dedicated system user for &repmgr; is *not* required; as many &repmgr; and
<application>repmgrd</application> actions require direct access to the PostgreSQL data directory, <application>repmgrd</application> actions require direct access to the PostgreSQL data directory,
these commands should be executed by the <literal>postgres</literal> user. these commands should be executed by the <literal>postgres</literal> user.
</para> </para>
<para> <para>
See also <link linkend="configuration-prerequisites">Prerequisites for configuration</link> Passwordless <command>ssh</command> connectivity between all servers in the replication cluster
for information on networking requirements. is not required, but is necessary in the following cases:
<itemizedlist>
<listitem>
<simpara>if you need &repmgr; to copy configuration files from outside the PostgreSQL
data directory (in which case <command>rsync</command> is also required)</simpara>
</listitem>
<listitem>
<simpara>to perform <link linkend="performing-switchover">switchover operations</link></simpara>
</listitem>
<listitem>
<simpara>
when executing <command><link linkend="repmgr-cluster-matrix">repmgr cluster matrix</link></command>
and <command><link linkend="repmgr-cluster-crosscheck">repmgr cluster crosscheck</link></command>
</simpara>
</listitem>
</itemizedlist>
</para> </para>
<tip> <tip>
@@ -69,111 +69,4 @@
terminated if your <command>ssh</command> session to the server is interrupted or closed. terminated if your <command>ssh</command> session to the server is interrupted or closed.
</simpara> </simpara>
</tip> </tip>
<sect2 id="install-compatibility-matrix">
<indexterm>
<primary>repmgr</primary>
<secondary>compatibility matrix</secondary>
</indexterm>
<indexterm>
<primary>compatibility matrix</primary>
</indexterm>
<title>&repmgr; compatibility matrix</title>
<para>
The following table provides an overview of which &repmgr; version supports
which PostgreSQL version.
</para>
<table id="repmgr-compatibility-matrix">
<title>&repmgr; compatibility matrix</title>
<tgroup cols="2">
<thead>
<row>
<entry>
&repmgr; version
</entry>
<entry>
Latest release
</entry>
<entry>
Supported PostgreSQL versions
</entry>
</row>
</thead>
<tbody>
<row>
<entry>
&repmgr; 4.x
</entry>
<entry>
<link linkend="release-4.2">4.2</link> (2018-10-24)
</entry>
<entry>
9.3, 9.4, 9.5, 9.6, 10, 11
</entry>
</row>
<row>
<entry>
&repmgr; 3.x
</entry>
<entry>
<ulink url="https://repmgr.org/release-notes-3.3.2.html">3.3.2</ulink> (2017-05-30)
</entry>
<entry>
9.3, 9.4, 9.5, 9.6
</entry>
</row>
<row>
<entry>
&repmgr; 2.x
</entry>
<entry>
<ulink url="https://repmgr.org/release-notes-2.0.3.html">2.0.3</ulink> (2015-04-16)
</entry>
<entry>
9.0, 9.1, 9.2, 9.3, 9.4
</entry>
</row>
</tbody>
</tgroup>
</table>
<important>
<para>
The &repmgr; 2.x and 3.x series are no longer maintained or supported.
We strongly recommend upgrading to the latest &repmgr; version.
</para>
</important>
<para>
Note that some &repmgr; functionality is not available in PostgreSQL 9.3 and PostgreSQL 9.4.
</para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<para>
PostgreSQL 9.3 does not support replication slots, so corresponding &repmgr; functionality
is not available.
</para>
</listitem>
<listitem>
<para>
In PostgreSQL 9.3 and PostgreSQL 9.4, <command>pg_rewind</command> is not part of the core
distribution. <command>pg_rewind</command> will need to be compiled separately to be able
to use any &repmgr; functionality which takes advantage of it.
</para>
</listitem>
</itemizedlist>
</sect2>
</sect1> </sect1>


@@ -12,8 +12,8 @@
To install &repmgr; the prerequisites for compiling To install &repmgr; the prerequisites for compiling
&postgres; must be installed. These are described in &postgres;'s &postgres; must be installed. These are described in &postgres;'s
documentation documentation
on <ulink url="https://www.postgresql.org/docs/current/static/install-requirements.html">build requirements</ulink> on <ulink url="https://www.postgresql.org/docs/current/install-requirements.html">build requirements</ulink>
and <ulink url="https://www.postgresql.org/docs/current/static/docguide-toolsets.html">build requirements for documentation</ulink>. and <ulink url="https://www.postgresql.org/docs/current/docguide-toolsets.html">build requirements for documentation</ulink>.
</para> </para>
<para> <para>
@@ -26,68 +26,12 @@
add the <ulink add the <ulink
url="http://apt.postgresql.org/">apt.postgresql.org</ulink> url="http://apt.postgresql.org/">apt.postgresql.org</ulink>
repository to your <filename>sources.list</filename> if you repository to your <filename>sources.list</filename> if you
have not already done so, and ensure the source repository is enabled. have not already done so. Then install the pre-requisites for
</para> building PostgreSQL with:
<tip>
<para>
If not configured, the source repository can be added by including
a <literal>deb-src</literal> line as a copy of the existing <literal>deb</literal>
line in the repository file, which is usually
<filename>/etc/apt/sources.list.d/pgdg.list</filename>, e.g.:
<programlisting>
deb http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main
deb-src http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main</programlisting>
</para>
</tip>
<para>
Then install the prerequisites for
building PostgreSQL with e.g.:
<programlisting> <programlisting>
sudo apt-get update sudo apt-get update
sudo apt-get build-dep postgresql-9.6</programlisting> sudo apt-get build-dep postgresql-9.6</programlisting>
</para> </para>
<important>
<simpara>
Select the appropriate PostgreSQL version for your target repmgr version.
</simpara>
</important>
<note>
<para>
If using <command>apt-get build-dep</command> is not possible, the
following packages may need to be installed manually:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><literal>libedit-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>libkrb5-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>libpam0g-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>libreadline-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>libselinux1-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>libssl-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>libxml2-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>libxslt1-dev</literal></simpara>
</listitem>
</itemizedlist>
</para>
</note>
</listitem> </listitem>
<listitem> <listitem>
<para> <para>
@@ -101,45 +45,15 @@ deb-src http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main</programlisti
sudo yum install yum-utils openjade docbook-dtds docbook-style-dsssl docbook-style-xsl sudo yum install yum-utils openjade docbook-dtds docbook-style-dsssl docbook-style-xsl
sudo yum-builddep postgresql96</programlisting> sudo yum-builddep postgresql96</programlisting>
</para> </para>
</listitem>
</itemizedlist>
</para>
<important>
<simpara>
Select the appropriate PostgreSQL version for your target repmgr version.
</simpara>
</important>
<note> <note>
<para> <simpara>
If using <command>yum-builddep</command> is not possible, the Select the appropriate PostgreSQL versions for your target repmgr version.
following packages may need to be installed manually: </simpara>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><literal>libselinux-devel</literal></simpara>
</listitem>
<listitem>
<simpara><literal>libxml2-devel</literal></simpara>
</listitem>
<listitem>
<simpara><literal>libxslt-devel</literal></simpara>
</listitem>
<listitem>
<simpara><literal>openssl-devel</literal></simpara>
</listitem>
<listitem>
<simpara><literal>pam-devel</literal></simpara>
</listitem>
<listitem>
<simpara><literal>readline-devel</literal></simpara>
</listitem>
</itemizedlist>
</para>
</note> </note>
</listitem>
</itemizedlist>
</para>
</sect2> </sect2>
@@ -166,7 +80,7 @@ deb-src http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main</programlisti
</para> </para>
<para> <para>
There are also tags for each &repmgr; release, e.g. <literal>v4.2.0</literal>. There are also tags for each &repmgr; release, e.g. <filename>4.0.5</filename>.
</para> </para>
<para> <para>
@@ -251,7 +165,7 @@ deb-src http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main</programlisti
<note> <note>
<simpara> <simpara>
Due to changes in PostgreSQL's documentation build system from PostgreSQL 10, Due to changes in PostgreSQL's documentation build system from PostgreSQL 10,
the documentation can currently only be built against PostgreSQL 9.6 or earlier. the documentation can currently only be built agains PostgreSQL 9.6 or earlier.
This limitation will be fixed when time and resources permit. This limitation will be fixed when time and resources permit.
</simpara> </simpara>
</note> </note>

@@ -1,10 +1,6 @@
<chapter id="quickstart" xreflabel="Quick-start guide"> <chapter id="quickstart" xreflabel="Quick-start guide">
<title>Quick-start guide</title> <title>Quick-start guide</title>
<indexterm>
<primary>quickstart</primary>
</indexterm>
<para> <para>
This section gives a quick introduction to &repmgr;, including setting up a This section gives a quick introduction to &repmgr;, including setting up a
sample &repmgr; installation and a basic replication cluster. sample &repmgr; installation and a basic replication cluster.
@@ -54,8 +50,7 @@
</para> </para>
<para> <para>
If you want <application>repmgr</application> to copy configuration files which are If you want <application>repmgr</application> to copy configuration files which are
located outside the PostgreSQL data directory, and/or to test located outside the PostgreSQL data directory, and/or to test <command>switchover</command>
<command><link linkend="repmgr-standby-switchover">switchover</link></command>
functionality, you will also need passwordless SSH connections between both servers, and functionality, you will also need passwordless SSH connections between both servers, and
<application>rsync</application> should be installed. <application>rsync</application> should be installed.
</para> </para>
@@ -68,7 +63,7 @@
</tip> </tip>
</sect1> </sect1>
<sect1 id="quickstart-postgresql-configuration" xreflabel="PostgreSQL configuration"> <sect1 id="quickstart-postgresql-configuration">
<title>PostgreSQL configuration</title> <title>PostgreSQL configuration</title>
<para> <para>
On the primary server, a PostgreSQL instance must be initialised and running. On the primary server, a PostgreSQL instance must be initialised and running.
@@ -83,13 +78,6 @@
max_wal_senders = 10 max_wal_senders = 10
# Enable replication slots; set this figure to at least one more
# than the number of standbys which will connect to this server.
# Note that repmgr will only make use of replication slots if
# "use_replication_slots" is set to "true" in repmgr.conf
max_replication_slots = 0
# Ensure WAL files contain enough information to enable read-only queries # Ensure WAL files contain enough information to enable read-only queries
# on the standby. # on the standby.
# #
@@ -114,6 +102,16 @@
# your WALs in a secure place. /bin/true is an example of a command that # your WALs in a secure place. /bin/true is an example of a command that
# ignores archiving. Use something more sensible. # ignores archiving. Use something more sensible.
archive_command = '/bin/true' archive_command = '/bin/true'
# If you have configured "pg_basebackup_options"
# in "repmgr.conf" to include the setting "--xlog-method=fetch" (from
# PostgreSQL 10 "--wal-method=fetch"), *and* you have not set
# "restore_command" in "repmgr.conf" to fetch WAL files from another
# source such as Barman, you'll need to set "wal_keep_segments" to a
# high enough value to ensure that all WAL files generated while
# the standby is being cloned are retained until the standby starts up.
#
# wal_keep_segments = 5000
</programlisting> </programlisting>
<tip> <tip>
<simpara> <simpara>
@@ -128,9 +126,6 @@
and the cluster was not initialised using data checksums, you may want to consider enabling and the cluster was not initialised using data checksums, you may want to consider enabling
<varname>wal_log_hints</varname>; for more details see <xref linkend="repmgr-node-rejoin-pg-rewind">. <varname>wal_log_hints</varname>; for more details see <xref linkend="repmgr-node-rejoin-pg-rewind">.
</para> </para>
<para>
See also the <link linkend="configuration-postgresql">PostgreSQL configuration</link> section in the <link linkend="configuration">repmgr configuration guide</link>.
</para>
</sect1> </sect1>
<sect1 id="quickstart-repmgr-user-database"> <sect1 id="quickstart-repmgr-user-database">
@@ -201,20 +196,11 @@
<sect1 id="quickstart-standby-preparation"> <sect1 id="quickstart-standby-preparation">
<title>Preparing the standby</title> <title>Preparing the standby</title>
<para> <para>
On the standby, do <emphasis>not</emphasis> create a PostgreSQL instance (i.e. On the standby, do not create a PostgreSQL instance, but do ensure the destination
do not execute <application>initdb</application> or any database creation
scripts provided by packages), but do ensure the destination
data directory (and any other directories which you want PostgreSQL to use) data directory (and any other directories which you want PostgreSQL to use)
exist and are owned by the <literal>postgres</literal> system user. Permissions exist and are owned by the <literal>postgres</literal> system user. Permissions
must be set to <literal>0700</literal> (<literal>drwx------</literal>). must be set to <literal>0700</literal> (<literal>drwx------</literal>).
</para> </para>
<tip>
<simpara>
&repmgr; will place a copy of the primary's database files in this directory.
It will however refuse to run if a PostgreSQL instance has already been
created there.
</simpara>
</tip>
<para> <para>
Check the primary database is reachable from the standby using <application>psql</application>: Check the primary database is reachable from the standby using <application>psql</application>:
</para> </para>
@@ -248,45 +234,17 @@
<para> <para>
<filename>repmgr.conf</filename> should not be stored inside the PostgreSQL data directory, <filename>repmgr.conf</filename> should not be stored inside the PostgreSQL data directory,
as it could be overwritten when setting up or reinitialising the PostgreSQL as it could be overwritten when setting up or reinitialising the PostgreSQL
server. See sections <xref linkend="configuration"> and <xref linkend="configuration-file"> server. See sections on <xref linkend="configuration-file"> and <xref linkend="configuration-file-settings">
for further details about <filename>repmgr.conf</filename>. for further details about <filename>repmgr.conf</filename>.
</para> </para>
<note>
<para>
&repmgr; only uses <option>pg_bindir</option> when it executes
PostgreSQL binaries directly.
</para>
<para>
For user-defined scripts such as <option>promote_command</option> and the
various <option>service_*_command</option>s, you <emphasis>must</emphasis>
always explicitly provide the full path to the binary or script being
executed, even if it is &repmgr; itself.
</para>
<para>
This is because these options can contain user-defined scripts in arbitrary
locations, so prepending <option>pg_bindir</option> may break them.
</para>
</note>
<tip> <tip>
<simpara> <simpara>
For Debian-based distributions we recommend explicitly setting For Debian-based distributions we recommend explicitly setting
<option>pg_bindir</option> to the directory where <command>pg_ctl</command> and other binaries <literal>pg_bindir</literal> to the directory where <command>pg_ctl</command> and other binaries
not in the standard path are located. For PostgreSQL 9.6 this would be <filename>/usr/lib/postgresql/9.6/bin/</filename>. not in the standard path are located. For PostgreSQL 9.6 this would be <filename>/usr/lib/postgresql/9.6/bin/</filename>.
</simpara> </simpara>
</tip> </tip>
<tip>
<simpara>
If your distribution places the &repmgr; binaries in a location other than the
PostgreSQL installation directory, specify this with <option>repmgr_bindir</option>
to enable &repmgr; to perform operations (e.g.
<command><link linkend="repmgr-cluster-crosscheck">repmgr cluster crosscheck</link></command>)
on other nodes.
</simpara>
</tip>
<para> <para>
See the file See the file
<ulink url="https://raw.githubusercontent.com/2ndQuadrant/repmgr/master/repmgr.conf.sample">repmgr.conf.sample</> <ulink url="https://raw.githubusercontent.com/2ndQuadrant/repmgr/master/repmgr.conf.sample">repmgr.conf.sample</>

@@ -15,14 +15,9 @@
<title>Description</title> <title>Description</title>
<para> <para>
Purges monitoring history from the <literal>repmgr.monitoring_history</literal> table to Purges monitoring history from the <literal>repmgr.monitoring_history</literal> table to
prevent excessive table growth. prevent excessive table growth. Use the <literal>-k/--keep-history</literal> to specify the
</para> number of days of monitoring history to retain. This command can be used
<para> manually or as a cronjob.
By default <emphasis>all</emphasis> data will be removed; use the <option>-k/--keep-history</option>
option to specify the number of days of monitoring history to retain.
</para>
<para>
This command can be executed manually or as a cronjob.
</para> </para>
</refsect1> </refsect1>
@@ -43,35 +38,4 @@
<filename>repmgr.conf</filename>. <filename>repmgr.conf</filename>.
</para> </para>
</refsect1> </refsect1>
<refsect1 id="repmgr-cluster-cleanup-events">
<title>Event notifications</title>
<para>
A <literal>cluster_cleanup</literal> <link linkend="event-notifications">event notification</link> will be generated.
</para>
</refsect1>
<refsect1>
<title>Options</title>
<variablelist>
<varlistentry>
<term><option>--node-id</option></term>
<listitem>
<para>
Only delete monitoring records for the specified node.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>See also</title>
<para>
For more details see the sections <xref linkend="repmgrd-monitoring"> and
<xref linkend="repmgrd-monitoring-configuration">.
</para>
</refsect1>
</refentry> </refentry>

@@ -56,36 +56,11 @@
</varlistentry> </varlistentry>
<varlistentry> <varlistentry>
<term><option>ERR_BAD_SSH (12)</option></term> <term><option>ERR_CLUSTER_CHECK (25)</option></term>
<listitem> <listitem>
<para> <para>
One or more nodes could not be accessed via SSH. One or more nodes could not be reached.
</para> </para>
<note>
<simpara>
This only applies to nodes unreachable from the node where
this command is executed.
</simpara>
<simpara>
It's also possible that the crosscheck establishes that
connections between PostgreSQL on all nodes are functioning,
even if SSH access between some nodes is not possible.
</simpara>
</note>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_NODE_STATUS (25)</option></term>
<listitem>
<para>
PostgreSQL on one or more nodes could not be reached.
</para>
<note>
<simpara>
This error code overrides <option>ERR_BAD_SSH</option>.
</simpara>
</note>
</listitem> </listitem>
</varlistentry> </varlistentry>

@@ -49,22 +49,6 @@
</para> </para>
</refsect1> </refsect1>
<refsect1>
<title>Output format</title>
<para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<literal>--csv</literal>: generate output in CSV format. Note that the <literal>Details</literal>
column will currently not be emitted in CSV format.
</simpara>
</listitem>
</itemizedlist>
</para>
</refsect1>
<refsect1> <refsect1>
<title>Example</title> <title>Example</title>
<para> <para>

@@ -116,28 +116,14 @@
</varlistentry> </varlistentry>
<varlistentry> <varlistentry>
<term><option>ERR_BAD_SSH (12)</option></term> <term><option>ERR_CLUSTER_CHECK (25)</option></term>
<listitem> <listitem>
<para> <para>
One or more nodes could not be accessed via SSH. One or more nodes could not be reached.
</para> </para>
</listitem> </listitem>
</varlistentry> </varlistentry>
<varlistentry>
<term><option>ERR_NODE_STATUS (25)</option></term>
<listitem>
<para>
PostgreSQL on one or more nodes could not be reached.
</para>
<note>
<simpara>
This error code overrides <option>ERR_BAD_SSH</option>.
</simpara>
</note>
</listitem>
</varlistentry>
</variablelist> </variablelist>
</refsect1> </refsect1>

@@ -81,16 +81,10 @@
<refsect1> <refsect1>
<title>Options</title> <title>Options</title>
<variablelist>
<varlistentry>
<term><option>--csv</option></term>
<listitem>
<para> <para>
<command>repmgr cluster show</command> accepts an optional parameter <literal>--csv</literal>, which <command>repmgr cluster show</command> accepts an optional parameter <literal>--csv</literal>, which
outputs the replication cluster's status in a simple CSV format, suitable for outputs the replication cluster's status in a simple CSV format, suitable for
parsing by scripts, e.g.: parsing by scripts:
<programlisting> <programlisting>
$ repmgr -f /etc/repmgr.conf cluster show --csv $ repmgr -f /etc/repmgr.conf cluster show --csv
1,-1,-1 1,-1,-1
@@ -117,76 +111,6 @@
</listitem> </listitem>
</itemizedlist> </itemizedlist>
</para> </para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--verbose</option></term>
<listitem>
<para>
Display the full text of any database connection error messages
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Exit codes</title>
<para>
The following exit codes can be emitted by <command>repmgr cluster show</command>:
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
No issues were detected.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_BAD_CONFIG (1)</option></term>
<listitem>
<para>
An issue was encountered while attempting to retrieve
&repmgr; metadata.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_DB_CONN (6)</option></term>
<listitem>
<para>
&repmgr; was unable to connect to the local PostgreSQL instance.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_NODE_STATUS (25)</option></term>
<listitem>
<para>
One or more issues were detected with the replication configuration,
e.g. a node was not in its expected state.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-node-status">, <xref linkend="repmgr-node-check">, <xref linkend="repmgr-daemon-status">
</para>
</refsect1> </refsect1>
</refentry> </refentry>

@@ -1,109 +0,0 @@
<refentry id="repmgr-daemon-pause">
<indexterm>
<primary>repmgr daemon pause</primary>
</indexterm>
<refmeta>
<refentrytitle>repmgr daemon pause</refentrytitle>
</refmeta>
<refnamediv>
<refname>repmgr daemon pause</refname>
<refpurpose>Instruct all <application>repmgrd</application> instances in the replication cluster to pause failover operations</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
This command can be run on any active node in the replication cluster to instruct all
running <application>repmgrd</application> instances to &quot;pause&quot; themselves, i.e. take no
action (such as promoting themselves or following a new primary) if a failover event is detected.
</para>
<para>
This functionality is useful for performing maintenance operations, such as switchovers
or upgrades, which might otherwise trigger a failover if <application>repmgrd</application>
is running normally.
</para>
<note>
<para>
It's important to wait a few seconds after restarting PostgreSQL on any node before running
<command>repmgr daemon pause</command>, as the <application>repmgrd</application> instance
on the restarted node will take a second or two before it has updated its status.
</para>
</note>
<para>
<xref linkend="repmgr-daemon-unpause"> will instruct all previously paused <application>repmgrd</application>
instances to resume normal failover operation.
</para>
</refsect1>
<refsect1>
<title>Execution</title>
<para>
<command>repmgr daemon pause</command> can be executed on any active node in the
replication cluster. A valid <filename>repmgr.conf</filename> file is required.
It will have no effect on previously paused nodes.
</para>
</refsect1>
<refsect1>
<title>Example</title>
<para>
<programlisting>
$ repmgr -f /etc/repmgr.conf daemon pause
NOTICE: node 1 (node1) paused
NOTICE: node 2 (node2) paused
NOTICE: node 3 (node3) paused</programlisting>
</para>
</refsect1>
<refsect1>
<title>Options</title>
<variablelist>
<varlistentry>
<term><option>--dry-run</option></term>
<listitem>
<para>
Check if nodes are reachable but don't pause <application>repmgrd</application>.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Exit codes</title>
<para>
The following exit codes can be emitted by <command>repmgr daemon pause</command>:
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
<application>repmgrd</application> could be paused on all nodes.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_REPMGRD_PAUSE (26)</option></term>
<listitem>
<para>
<application>repmgrd</application> could not be paused on one or more nodes.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-daemon-unpause">, <xref linkend="repmgr-daemon-status">
</para>
</refsect1>
</refentry>
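The maintenance workflow described in this reference page (pause all repmgrd instances, perform the operation, then unpause) can be sketched as follows. This is a minimal illustration, not part of repmgr: the helper names `run_repmgr` and `with_repmgrd_paused` are hypothetical, the configuration path `/etc/repmgr.conf` is taken from the examples above, and the only exit codes assumed are the documented `SUCCESS (0)` and `ERR_REPMGRD_PAUSE (26)`.

```python
import subprocess
from typing import Callable, Sequence

SUCCESS = 0             # documented: repmgrd could be (un)paused on all nodes
ERR_REPMGRD_PAUSE = 26  # documented: (un)pause failed on one or more nodes


def run_repmgr(args: Sequence[str]) -> int:
    """Run repmgr with the config file used in the examples; return its exit code."""
    return subprocess.run(["repmgr", "-f", "/etc/repmgr.conf", *args]).returncode


def with_repmgrd_paused(
    maintenance: Callable[[], None],
    run: Callable[[Sequence[str]], int] = run_repmgr,
) -> None:
    """Pause repmgrd cluster-wide, run `maintenance`, then always try to unpause."""
    pause_code = run(["daemon", "pause"])
    if pause_code != SUCCESS:
        raise RuntimeError(f"daemon pause failed with exit code {pause_code}")
    try:
        maintenance()
    finally:
        # Attempt to resume failover handling even if maintenance raised.
        unpause_code = run(["daemon", "unpause"])
    if unpause_code != SUCCESS:
        raise RuntimeError(f"daemon unpause failed with exit code {unpause_code}")
```

The `run` parameter exists only so the sequence can be exercised without a live cluster. Per the note above, a caller that has just restarted PostgreSQL would wait a few seconds before invoking this.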

@@ -1,165 +0,0 @@
<refentry id="repmgr-daemon-status">
<indexterm>
<primary>repmgr daemon status</primary>
</indexterm>
<refmeta>
<refentrytitle>repmgr daemon status</refentrytitle>
</refmeta>
<refnamediv>
<refname>repmgr daemon status</refname>
<refpurpose>display information about the status of <application>repmgrd</application> on each node in the cluster</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
This command provides an overview of all active nodes in the cluster and the state
of each node's <application>repmgrd</application> instance. It can be used to check
the result of <xref linkend="repmgr-daemon-pause"> and <xref linkend="repmgr-daemon-unpause">
operations.
</para>
</refsect1>
<refsect1>
<title>Execution</title>
<para>
<command>repmgr daemon status</command> can be executed on any active node in the
replication cluster. A valid <filename>repmgr.conf</filename> file is required.
</para>
<note>
<para>
After restarting PostgreSQL on any node, the <application>repmgrd</application> instance
will take a second or two before it is able to update its status. Until then,
<application>repmgrd</application> will be shown as not running.
</para>
</note>
</refsect1>
<refsect1>
<title>Examples</title>
<para>
<application>repmgrd</application> running normally on all nodes:
<programlisting>$ repmgr -f /etc/repmgr.conf daemon status
ID | Name | Role | Status | repmgrd | PID | Paused?
----+-------+---------+---------+---------+------+---------
1 | node1 | primary | running | running | 7851 | no
2 | node2 | standby | running | running | 7889 | no
3 | node3 | standby | running | running | 7918 | no</programlisting>
</para>
<para>
<application>repmgrd</application> paused on all nodes (using <xref linkend="repmgr-daemon-pause">):
<programlisting>$ repmgr -f /etc/repmgr.conf daemon status
ID | Name | Role | Status | repmgrd | PID | Paused?
----+-------+---------+---------+---------+------+---------
1 | node1 | primary | running | running | 7851 | yes
2 | node2 | standby | running | running | 7889 | yes
3 | node3 | standby | running | running | 7918 | yes</programlisting>
</para>
<para>
<application>repmgrd</application> not running on one node:
<programlisting>$ repmgr -f /etc/repmgr.conf daemon status
ID | Name | Role | Status | repmgrd | PID | Paused?
----+-------+---------+---------+-------------+------+---------
1 | node1 | primary | running | running | 7851 | yes
2 | node2 | standby | running | not running | n/a | n/a
3 | node3 | standby | running | running | 7918 | yes</programlisting>
</para>
</refsect1>
<refsect1>
<title>Options</title>
<variablelist>
<varlistentry>
<term><option>--csv</option></term>
<listitem>
<para>
<command>repmgr daemon status</command> accepts an optional parameter <literal>--csv</literal>, which
outputs the replication cluster's status in a simple CSV format, suitable for
parsing by scripts, e.g.:
<programlisting>
$ repmgr -f /etc/repmgr.conf daemon status --csv
1,node1,primary,1,1,10204,1
2,node2,standby,1,0,-1,1
3,node3,standby,1,1,10225,1</programlisting>
</para>
<para>
The columns have the following meanings:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
node ID
</simpara>
</listitem>
<listitem>
<simpara>
node name
</simpara>
</listitem>
<listitem>
<simpara>
node type (primary or standby)
</simpara>
</listitem>
<listitem>
<simpara>
PostgreSQL server running
</simpara>
</listitem>
<listitem>
<simpara>
<application>repmgrd</application> running (1 = running, 0 = not running)
</simpara>
</listitem>
<listitem>
<simpara>
<application>repmgrd</application> PID (-1 if not running)
</simpara>
</listitem>
<listitem>
<simpara>
<application>repmgrd</application> paused (1 = paused, 0 = not paused)
</simpara>
</listitem>
</itemizedlist>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--verbose</option></term>
<listitem>
<para>
Display the full text of any database connection error messages
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-daemon-pause">, <xref linkend="repmgr-daemon-unpause">, <xref linkend="repmgr-cluster-show">
</para>
</refsect1>
</refentry>
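Since the <literal>--csv</literal> output documented above is intended for scripts, a parsing sketch may be useful. The following assumes exactly the seven columns listed in this page, in that order, and assumes that the "PostgreSQL server running" column uses 1 for running in the same way as the repmgrd columns (the page does not state this explicitly). The names `DaemonStatus` and `parse_daemon_status_csv` are hypothetical.

```python
import csv
from io import StringIO
from typing import List, NamedTuple, Optional


class DaemonStatus(NamedTuple):
    node_id: int
    node_name: str
    node_type: str            # "primary" or "standby"
    postgres_running: bool    # assumption: 1 = running
    repmgrd_running: bool     # 1 = running, 0 = not running
    repmgrd_pid: Optional[int]  # None when the CSV reports -1
    repmgrd_paused: bool      # 1 = paused, 0 = not paused


def parse_daemon_status_csv(text: str) -> List[DaemonStatus]:
    """Parse the output of `repmgr daemon status --csv` into typed records."""
    records = []
    for row in csv.reader(StringIO(text)):
        if not row:
            continue  # skip blank lines
        node_id, name, node_type, pg, repmgrd, pid, paused = row
        pid_value = int(pid)
        records.append(DaemonStatus(
            node_id=int(node_id),
            node_name=name,
            node_type=node_type,
            postgres_running=(pg == "1"),
            repmgrd_running=(repmgrd == "1"),
            repmgrd_pid=None if pid_value == -1 else pid_value,
            repmgrd_paused=(paused == "1"),
        ))
    return records


# Sample taken from the example output above.
SAMPLE = """1,node1,primary,1,1,10204,1
2,node2,standby,1,0,-1,1
3,node3,standby,1,1,10225,1"""

not_running = [r.node_name for r in parse_daemon_status_csv(SAMPLE)
               if not r.repmgrd_running]
```

With the sample above, `not_running` contains only `node2`, matching the row whose repmgrd column is 0 and whose PID is -1.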

@@ -1,103 +0,0 @@
<refentry id="repmgr-daemon-unpause">
<indexterm>
<primary>repmgr daemon unpause</primary>
</indexterm>
<refmeta>
<refentrytitle>repmgr daemon unpause</refentrytitle>
</refmeta>
<refnamediv>
<refname>repmgr daemon unpause</refname>
<refpurpose>Instruct all <application>repmgrd</application> instances in the replication cluster to resume failover operations</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
This command can be run on any active node in the replication cluster to instruct all
running <application>repmgrd</application> instances to &quot;unpause&quot;
(following a previous execution of <xref linkend="repmgr-daemon-pause">)
and resume normal failover/monitoring operation.
</para>
<note>
<para>
It's important to wait a few seconds after restarting PostgreSQL on any node before running
<command>repmgr daemon pause</command>, as the <application>repmgrd</application> instance
on the restarted node will take a second or two before it has updated its status.
</para>
</note>
</refsect1>
<refsect1>
<title>Execution</title>
<para>
<command>repmgr daemon unpause</command> can be executed on any active node in the
replication cluster. A valid <filename>repmgr.conf</filename> file is required.
It will have no effect on nodes which are not already paused.
</para>
</refsect1>
<refsect1>
<title>Example</title>
<para>
<programlisting>
$ repmgr -f /etc/repmgr.conf daemon unpause
NOTICE: node 1 (node1) unpaused
NOTICE: node 2 (node2) unpaused
NOTICE: node 3 (node3) unpaused</programlisting>
</para>
</refsect1>
<refsect1>
<title>Options</title>
<variablelist>
<varlistentry>
<term><option>--dry-run</option></term>
<listitem>
<para>
Check if nodes are reachable but don't unpause <application>repmgrd</application>.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Exit codes</title>
<para>
The following exit codes can be emitted by <command>repmgr daemon unpause</command>:
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
<application>repmgrd</application> could be unpaused on all nodes.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_REPMGRD_PAUSE (26)</option></term>
<listitem>
<para>
<application>repmgrd</application> could not be unpaused on one or more nodes.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-daemon-pause">, <xref linkend="repmgr-daemon-status">
</para>
</refsect1>
</refentry>

@@ -30,8 +30,7 @@
Replication lag: OK (N/A - node is primary) Replication lag: OK (N/A - node is primary)
WAL archiving: OK (0 pending files) WAL archiving: OK (0 pending files)
Downstream servers: OK (2 of 2 downstream nodes attached) Downstream servers: OK (2 of 2 downstream nodes attached)
Replication slots: OK (node has no replication slots) Replication slots: OK (node has no replication slots)</programlisting>
Missing replication slots: OK (node has no missing replication slots)</programlisting>
</para> </para>
</refsect1> </refsect1>
<refsect1> <refsect1>
@@ -62,9 +61,7 @@
<listitem> <listitem>
<simpara> <simpara>
<literal>--archive-ready</literal>: checks for WAL files which have not yet been archived, <literal>--archive-ready</literal>: checks for WAL files which have not yet been archived
and returns <literal>WARNING</literal> or <literal>CRITICAL</literal> if the number
exceeds <varname>archive_ready_warning</varname> or <varname>archive_ready_critical</varname> respectively.
</simpara> </simpara>
</listitem> </listitem>
@@ -80,12 +77,6 @@
</simpara> </simpara>
</listitem> </listitem>
<listitem>
<simpara>
<literal>--missing-slots</literal>: checks there are no missing replication slots
</simpara>
</listitem>
</itemizedlist> </itemizedlist>
</para> </para>
</refsect1> </refsect1>
@@ -110,80 +101,4 @@
</itemizedlist> </itemizedlist>
</para> </para>
</refsect1> </refsect1>
<refsect1>
<title>Exit codes</title>
<para>
When executing <command>repmgr node check</command> with one of the individual
checks listed above, &repmgr; will emit one of the following Nagios-style exit codes
(even if <literal>--nagios</literal> is not supplied):
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<literal>0</literal>: OK
</simpara>
</listitem>
<listitem>
<simpara>
<literal>1</literal>: WARNING
</simpara>
</listitem>
<listitem>
<simpara>
<literal>2</literal>: ERROR
</simpara>
</listitem>
<listitem>
<simpara>
<literal>3</literal>: UNKNOWN
</simpara>
</listitem>
</itemizedlist>
</para>
<para>
The following exit codes can be emitted by <command>repmgr node check</command>
if no individual check was specified.
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
No issues were detected.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_NODE_STATUS (25)</option></term>
<listitem>
<para>
One or more issues were detected.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-node-status">, <xref linkend="repmgr-cluster-show">
</para>
</refsect1>
</refentry> </refentry>
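The Nagios-style exit codes listed above for individual checks can be mapped to labels in a monitoring script. The sketch below uses only the four codes documented in this page. Treating any other value as UNKNOWN is an assumption made for illustration, and the names `CheckStatus` and `interpret_check_exit_code` are hypothetical. It does not apply when no individual check is specified, where the page documents `SUCCESS (0)` and `ERR_NODE_STATUS (25)` instead.

```python
from enum import IntEnum


class CheckStatus(IntEnum):
    """Exit codes documented for individual `repmgr node check` checks."""
    OK = 0
    WARNING = 1
    ERROR = 2
    UNKNOWN = 3


def interpret_check_exit_code(code: int) -> CheckStatus:
    """Map an exit code to a status; undocumented codes become UNKNOWN (assumption)."""
    try:
        return CheckStatus(code)
    except ValueError:
        return CheckStatus.UNKNOWN
```

A caller might obtain the code with `subprocess.run(["repmgr", "-f", "/etc/repmgr.conf", "node", "check", "--archive-ready"]).returncode` and pass it to `interpret_check_exit_code`.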

@@ -28,10 +28,6 @@
If the node is running and needs to be attached to the current primary, use If the node is running and needs to be attached to the current primary, use
<xref linkend="repmgr-standby-follow">. <xref linkend="repmgr-standby-follow">.
</para> </para>
<para>
Note <xref linkend="repmgr-standby-follow"> can only be used for standbys which have not diverged
from the rest of the cluster.
</para>
</tip> </tip>
</refsect1> </refsect1>
@@ -67,10 +63,10 @@
<term><option>--force-rewind[=/path/to/pg_rewind]</option></term> <term><option>--force-rewind[=/path/to/pg_rewind]</option></term>
<listitem> <listitem>
<para> <para>
Execute <application>pg_rewind</application>. Execute <application>pg_rewind</application> if necessary.
</para> </para>
<para> <para>
It is only necessary to provide the <application>pg_rewind</application> path It is only necessary to provide the <application>pg_rewind</application>
if using PostgreSQL 9.3 or 9.4, and <application>pg_rewind</application> if using PostgreSQL 9.3 or 9.4, and <application>pg_rewind</application>
is not installed in the PostgreSQL <filename>bin</filename> directory. is not installed in the PostgreSQL <filename>bin</filename> directory.
</para> </para>
@@ -119,26 +115,8 @@
</variablelist> </variablelist>
</refsect1> </refsect1>
<refsect1> <refsect1>
<title>Configuration file settings</title>
<para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<literal>node_rejoin_timeout</literal>:
the maximum length of time (in seconds) to wait for
the node to reconnect to the replication cluster (defaults to
the value set in <literal>standby_reconnect_timeout</literal>,
60 seconds).
</simpara>
</listitem>
</itemizedlist>
</para>
</refsect1>
<refsect1 id="repmgr-node-rejoin-events">
<title>Event notifications</title> <title>Event notifications</title>
<para> <para>
A <literal>node_rejoin</literal> <link linkend="event-notifications">event notification</link> will be generated. A <literal>node_rejoin</literal> <link linkend="event-notifications">event notification</link> will be generated.
@@ -193,7 +171,7 @@
</note> </note>
<para> <para>
To have <command>repmgr node rejoin</command> use <command>pg_rewind</command>, To have <command>repmgr node rejoin</command> use <command>pg_rewind</command> if required,
pass the command line option <literal>--force-rewind</literal>, which will tell &repmgr; pass the command line option <literal>--force-rewind</literal>, which will tell &repmgr;
to execute <command>pg_rewind</command> to ensure the node can be rejoined successfully. to execute <command>pg_rewind</command> to ensure the node can be rejoined successfully.
</para> </para>
@@ -226,15 +204,6 @@
INFO: pg_rewind would now be executed INFO: pg_rewind would now be executed
DETAIL: pg_rewind command is: DETAIL: pg_rewind command is:
pg_rewind -D '/var/lib/postgresql/data' --source-server='host=node1 dbname=repmgr user=repmgr'</programlisting> pg_rewind -D '/var/lib/postgresql/data' --source-server='host=node1 dbname=repmgr user=repmgr'</programlisting>
<note>
<para>
If <option>--force-rewind</option> is used with the <option>--dry-run</option> option,
this checks the prerequisites for using <application>pg_rewind</application>, but cannot
predict the outcome of actually executing <application>pg_rewind</application>.
</para>
</note>
<programlisting> <programlisting>
$ repmgr node rejoin -f /etc/repmgr.conf -d 'host=node1 dbname=repmgr user=repmgr' \ $ repmgr node rejoin -f /etc/repmgr.conf -d 'host=node1 dbname=repmgr user=repmgr' \
--force-rewind --config-files=postgresql.local.conf,postgresql.conf --verbose --force-rewind --config-files=postgresql.local.conf,postgresql.conf --verbose


@@ -1,151 +0,0 @@
<refentry id="repmgr-node-service">
<indexterm>
<primary>repmgr node service</primary>
</indexterm>
<refmeta>
<refentrytitle>repmgr node service</refentrytitle>
</refmeta>
<refnamediv>
<refname>repmgr node service</refname>
<refpurpose>show or execute the system service command to stop/start/restart/reload/promote a node</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
Shows or executes the system service command to stop/start/restart/reload a node.
</para>
<para>
This command is mainly meant for internal &repmgr; usage, but is useful for
confirming the command configuration.
</para>
</refsect1>
<refsect1>
<title>Options</title>
<variablelist>
<varlistentry>
<term><option>--dry-run</option></term>
<listitem>
<para>
Log the steps which would be taken, including displaying the command which would be executed.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--action</option></term>
<listitem>
<para>
The action to perform. One of <literal>start</literal>, <literal>stop</literal>,
<literal>restart</literal>, <literal>reload</literal> or <literal>promote</literal>.
</para>
<para>
If the parameter <option>--list-actions</option> is provided together with
<option>--action</option>, the command which would be executed will be printed.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--list-actions</option></term>
<listitem>
<para>
List all configured commands.
</para>
<para>
If the parameter <option>--action</option> is provided together with
<option>--list-actions</option>, the command which would be executed for that
particular action will be printed.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--checkpoint</option></term>
<listitem>
<para>
Issue a <command>CHECKPOINT</command> before stopping or restarting the node.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Exit codes</title>
<para>
Following exit codes can be emitted by <command>repmgr node service</command>:
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
No issues were detected.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_LOCAL_COMMAND (5)</option></term>
<listitem>
<para>
Execution of the system service command failed.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Examples</title>
<para>
See what action would be taken for a restart:
<programlisting>
[postgres@node1 ~]$ repmgr -f /etc/repmgr/11/repmgr.conf node service --action=restart --checkpoint --dry-run
INFO: a CHECKPOINT would be issued here
INFO: would execute server command "sudo service postgresql-11 restart"</programlisting>
</para>
<para>
Restart the PostgreSQL instance:
<programlisting>
[postgres@node1 ~]$ repmgr -f /etc/repmgr/11/repmgr.conf node service --action=restart --checkpoint
NOTICE: issuing CHECKPOINT
DETAIL: executing server command "sudo service postgresql-11 restart"
Redirecting to /bin/systemctl restart postgresql-11.service</programlisting>
</para>
<para>
List all commands:
<programlisting>
[postgres@node1 ~]$ repmgr -f /etc/repmgr/11/repmgr.conf node service --list-actions
Following commands would be executed for each action:
start: "sudo service postgresql-11 start"
stop: "sudo service postgresql-11 stop"
restart: "sudo service postgresql-11 restart"
reload: "sudo service postgresql-11 reload"
promote: "/usr/pgsql-11/bin/pg_ctl -w -D '/var/lib/pgsql/11/data' promote"</programlisting>
</para>
<para>
List a single command:
<programlisting>
[postgres@node1 ~]$ repmgr -f /etc/repmgr/11/repmgr.conf node service --list-actions --action=promote
/usr/pgsql-11/bin/pg_ctl -w -D '/var/lib/pgsql/11/data' promote </programlisting>
</para>
</refsect1>
</refentry>
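
The two exit codes documented in the removed `repmgr node service` page (`SUCCESS (0)` and `ERR_LOCAL_COMMAND (5)`) could be checked from a wrapper script roughly as follows. This is a minimal sketch, not part of repmgr: the function name `describe_service_exit` and the `REPMGR_CONF` variable are hypothetical, and the dry run is only attempted when a `repmgr` binary is found on the `PATH`.

```shell
#!/bin/sh
# Sketch only: interpret the exit status of "repmgr node service".
# REPMGR_CONF and describe_service_exit are hypothetical names, not part of repmgr.
REPMGR_CONF="${REPMGR_CONF:-/etc/repmgr.conf}"

# Map the two exit codes listed in the reference page to a short label;
# anything else is reported as unexpected rather than guessed at.
describe_service_exit() {
    case "$1" in
        0) echo "SUCCESS" ;;
        5) echo "ERR_LOCAL_COMMAND" ;;
        *) echo "UNEXPECTED ($1)" ;;
    esac
}

# Only attempt the dry run when a repmgr binary is actually available.
if command -v repmgr >/dev/null 2>&1; then
    repmgr -f "$REPMGR_CONF" node service --action=restart --checkpoint --dry-run
    status=$?
    echo "repmgr node service: $(describe_service_exit "$status")"
fi
```

Using `--dry-run` here keeps the check side-effect free, so the configured service command can be confirmed before it is relied on.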


@@ -52,40 +52,10 @@
</para> </para>
</refsect1> </refsect1>
<refsect1>
<title>Exit codes</title>
<para>
Following exit codes can be emitted by <command>repmgr node status</command>:
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
No issues were detected.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_NODE_STATUS (25)</option></term>
<listitem>
<para>
One or more issues were detected.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1> <refsect1>
<title>See also</title> <title>See also</title>
<para> <para>
See <xref linkend="repmgr-node-check"> to diagnose issues and <xref linkend="repmgr-cluster-show"> See <xref linkend="repmgr-node-check"> to diagnose issues.
for an overview of all nodes in the cluster.
</para> </para>
</refsect1> </refsect1>
</refentry> </refentry>


@@ -17,7 +17,7 @@
<title>Description</title> <title>Description</title>
<para> <para>
<command>repmgr primary register</command> registers a primary node in a <command>repmgr primary register</command> registers a primary node in a
streaming replication cluster, and configures it for use with &repmgr;, including streaming replication cluster, and configures it for use with repmgr, including
installing the &repmgr; extension. This command needs to be executed before any installing the &repmgr; extension. This command needs to be executed before any
standby nodes are registered. standby nodes are registered.
</para> </para>
@@ -75,18 +75,10 @@
</refsect1> </refsect1>
<refsect1 id="repmgr-primary-register-events"> <refsect1>
<title>Event notifications</title> <title>Event notifications</title>
<para> <para>
Following <link linkend="event-notifications">event notifications</link> will be generated: A <literal>primary_register</literal> <link linkend="event-notifications">event notification</link> will be generated.
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><literal>cluster_created</literal></simpara>
</listitem>
<listitem>
<simpara><literal>primary_register</literal></simpara>
</listitem>
</itemizedlist>
</para> </para>
</refsect1> </refsect1>


@@ -64,7 +64,7 @@
</refsect1> </refsect1>
<refsect1 id="repmgr-primary-unregister-events"> <refsect1>
<title>Event notifications</title> <title>Event notifications</title>
<para> <para>
A <literal>primary_unregister</literal> <link linkend="event-notifications">event notification</link> will be generated. A <literal>primary_unregister</literal> <link linkend="event-notifications">event notification</link> will be generated.


@@ -49,7 +49,7 @@
not be copied by default. &repmgr; can copy these files, either to the same not be copied by default. &repmgr; can copy these files, either to the same
location on the standby server (provided appropriate directory and file permissions location on the standby server (provided appropriate directory and file permissions
are available), or into the standby's data directory. This requires passwordless are available), or into the standby's data directory. This requires passwordless
SSH access to the primary server. Add the option <option>--copy-external-config-files</option> SSH access to the primary server. Add the option <literal>--copy-external-config-files</literal>
to the <command>repmgr standby clone</command> command; by default files will be copied to to the <command>repmgr standby clone</command> command; by default files will be copied to
the same path as on the upstream server. Note that the user executing <command>repmgr</command> the same path as on the upstream server. Note that the user executing <command>repmgr</command>
must have write access to those directories. must have write access to those directories.
@@ -59,29 +59,12 @@
<literal>--copy-external-config-files=pgdata</literal>, but note that <literal>--copy-external-config-files=pgdata</literal>, but note that
any include directives in the copied files may need to be updated. any include directives in the copied files may need to be updated.
</para> </para>
<note>
<para>
When executing <command>repmgr standby clone</command> with the
<option>--copy-external-config-files</option> and <option>--dry-run</option>
options, &repmgr; will check the SSH connection to the source node, but
will not verify whether the files can actually be copied.
</para>
<para>
During the actual clone operation, a check will be made before the database itself
is cloned to determine whether the files can actually be copied; if any problems are
encountered, the clone operation will be aborted, enabling the user to fix
any issues before retrying the clone operation.
</para>
</note>
<tip> <tip>
<simpara> <simpara>
For reliable configuration file management we recommend using a For reliable configuration file management we recommend using a
configuration management tool such as Ansible, Chef, Puppet or Salt. configuration management tool such as Ansible, Chef, Puppet or Salt.
</simpara> </simpara>
</tip> </tip>
</refsect1> </refsect1>
<refsect1 id="repmgr-standby-clone-recovery-conf"> <refsect1 id="repmgr-standby-clone-recovery-conf">
@@ -230,15 +213,6 @@
<variablelist> <variablelist>
<varlistentry>
<term><option>-d, --dbname=CONNINFO</option></term>
<listitem>
<para>
Connection string of the upstream node to use for cloning.
</para>
</listitem>
</varlistentry>
<varlistentry> <varlistentry>
<term><option>--dry-run</option></term> <term><option>--dry-run</option></term>
<listitem> <listitem>
@@ -350,7 +324,7 @@
</variablelist> </variablelist>
</refsect1> </refsect1>
<refsect1 id="repmgr-standby-clone-events"> <refsect1>
<title>Event notifications</title> <title>Event notifications</title>
<para> <para>
A <literal>standby_clone</literal> <link linkend="event-notifications">event notification</link> will be generated. A <literal>standby_clone</literal> <link linkend="event-notifications">event notification</link> will be generated.


@@ -94,7 +94,7 @@
</variablelist> </variablelist>
</refsect1> </refsect1>
<refsect1 id="repmgr-standby-follow-events"> <refsect1>
<title>Event notifications</title> <title>Event notifications</title>
<para> <para>
A <literal>standby_follow</literal> <link linkend="event-notifications">event notification</link> will be generated. A <literal>standby_follow</literal> <link linkend="event-notifications">event notification</link> will be generated.


@@ -50,7 +50,7 @@
</refsect1> </refsect1>
<refsect1 id="repmgr-standby-promote-events"> <refsect1>
<title>Event notifications</title> <title>Event notifications</title>
<para> <para>
A <literal>standby_promote</literal> <link linkend="event-notifications">event notification</link> will be generated. A <literal>standby_promote</literal> <link linkend="event-notifications">event notification</link> will be generated.


@@ -159,7 +159,7 @@
</variablelist> </variablelist>
</refsect1> </refsect1>
<refsect1 id="repmgr-standby-register-events"> <refsect1>
<title>Event notifications</title> <title>Event notifications</title>
<para> <para>
A <literal>standby_register</literal> <link linkend="event-notifications">event notification</link> A <literal>standby_register</literal> <link linkend="event-notifications">event notification</link>


@@ -35,10 +35,6 @@
&repmgr; will attempt to check for potential issues but cannot guarantee &repmgr; will attempt to check for potential issues but cannot guarantee
a successful switchover. a successful switchover.
</para> </para>
<para>
&repmgr; will refuse to perform the switchover if an exclusive backup is running on
the current primary.
</para>
</note> </note>
<para> <para>
For more details on performing a switchover, including preparation and configuration, For more details on performing a switchover, including preparation and configuration,
@@ -47,14 +43,8 @@
<note> <note>
<para> <para>
From <link linkend="release-4.2">repmgr 4.2</link>, &repmgr; will instruct any running <application>repmgrd</application> should not be active on any nodes while a switchover is being
<application>repmgrd</application> instances to pause operations while the switchover executed. This restriction may be lifted in a later version.
is being carried out, to prevent <application>repmgrd</application> from
unintentionally promoting a node. For more details, see <xref linkend="repmgrd-pausing">.
</para>
<para>
Users of &repmgr; versions prior to 4.2 should ensure that <application>repmgrd</application>
is not running on any nodes while a switchover is being executed.
</para> </para>
</note> </note>
@@ -68,9 +58,8 @@
<term><option>--always-promote</option></term> <term><option>--always-promote</option></term>
<listitem> <listitem>
<para> <para>
Promote standby to primary, even if it is behind or has diverged Promote standby to primary, even if it is behind original primary
from the original primary. The original primary will be shut down in any case, (original primary will be shut down in any case).
and will need to be manually reintegrated into the replication cluster.
</para> </para>
</listitem> </listitem>
</varlistentry> </varlistentry>
@@ -130,21 +119,6 @@
</listitem> </listitem>
</varlistentry> </varlistentry>
<varlistentry>
<term><option>--repmgrd-no-pause</option></term>
<listitem>
<para>
Don't pause <application>repmgrd</application> while executing a switchover.
</para>
<para>
This option should not be used unless you take steps by other means
to ensure <application>repmgrd</application> is paused or not
running on all nodes.
</para>
</listitem>
</varlistentry>
<varlistentry> <varlistentry>
<term><option>--siblings-follow</option></term> <term><option>--siblings-follow</option></term>
<listitem> <listitem>
@@ -164,7 +138,19 @@
Note that following parameters in <filename>repmgr.conf</filename> are relevant to the Note that following parameters in <filename>repmgr.conf</filename> are relevant to the
switchover operation: switchover operation:
<itemizedlist spacing="compact" mark="bullet"> <itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<literal>reconnect_attempts</literal>: number of times to check the original primary
for a clean shutdown after executing the shutdown command, before aborting
</simpara>
</listitem>
<listitem>
<simpara>
<literal>reconnect_interval</literal>: interval (in seconds) to check the original
primary for a clean shutdown after executing the shutdown command (up to a maximum
of <literal>reconnect_attempts</literal> tries)
</simpara>
</listitem>
<listitem> <listitem>
<simpara> <simpara>
<literal>replication_lag_critical</literal>: <literal>replication_lag_critical</literal>:
@@ -174,30 +160,11 @@
</simpara> </simpara>
</listitem> </listitem>
<listitem>
<simpara>
<literal>shutdown_check_timeout</literal>: maximum number of seconds to wait for the
demotion candidate (current primary) to shut down, before aborting the switchover.
</simpara>
<simpara>
Note that this parameter is set on the node where <command>repmgr standby switchover</command>
is executed (promotion candidate); setting it on the demotion candidate (former primary) will
have no effect.
</simpara>
<note>
<para>
In versions prior to <link linkend="release-4.2">&repmgr; 4.2</link>, <command>repmgr standby switchover</command> would
use the values defined in <literal>reconnect_attempts</literal> and <literal>reconnect_interval</literal>
to determine the timeout for demotion candidate shutdown.
</para>
</note>
</listitem>
<listitem> <listitem>
<simpara> <simpara>
<literal>standby_reconnect_timeout</literal>: <literal>standby_reconnect_timeout</literal>:
maximum number of seconds to attempt to wait for the demotion candidate (former primary) Number of seconds to attempt to reconnect to the demoted primary
to reconnect to the promoted primary (default: 60 seconds) once it has been restarted.
</simpara> </simpara>
</listitem> </listitem>
@@ -213,7 +180,12 @@
Execute with the <literal>--dry-run</literal> option to test the switchover as far as Execute with the <literal>--dry-run</literal> option to test the switchover as far as
possible without actually changing the status of either node. possible without actually changing the status of either node.
</para> </para>
<important>
<para>
<application>repmgrd</application> must be shut down on all nodes while a switchover is being
executed. This restriction will be removed in a future &repmgr; version.
</para>
</important>
<para> <para>
External database connections, e.g. from an application, should not be permitted while External database connections, e.g. from an application, should not be permitted while
the switchover is taking place. In particular, active transactions on the primary the switchover is taking place. In particular, active transactions on the primary
@@ -221,7 +193,7 @@
</para> </para>
</refsect1> </refsect1>
<refsect1 id="repmgr-standby-switchover-events"> <refsect1>
<title>Event notifications</title> <title>Event notifications</title>
<para> <para>
<literal>standby_switchover</literal> and <literal>standby_promote</literal> <literal>standby_switchover</literal> and <literal>standby_promote</literal>


@@ -59,7 +59,7 @@
</variablelist> </variablelist>
</refsect1> </refsect1>
<refsect1 id="repmgr-standby-unregister-events"> <refsect1>
<title>Event notifications</title> <title>Event notifications</title>
<para> <para>
A <literal>standby_unregister</literal> <link linkend="event-notifications">event notification</link> will be generated. A <literal>standby_unregister</literal> <link linkend="event-notifications">event notification</link> will be generated.


@@ -23,19 +23,14 @@
use of the witness server with <application>repmgrd</application>. use of the witness server with <application>repmgrd</application>.
</para> </para>
<para> <para>
When executing <command>repmgr witness register</command>, database connection When executing <command>repmgr witness register</command>, connection information
information for the cluster primary server must also be provided. for the cluster primary server must also be provided. &repmgr; will automatically
use the <varname>user</varname> and <varname>dbname</varname> values defined
in the <varname>conninfo</varname> string defined in the witness node's
<filename>repmgr.conf</filename>, if these are not explicitly provided.
</para> </para>
<para> <para>
In most cases it's only necessary to provide the primary's hostname with Execute with the <literal>--dry-run</literal> option to check what would happen
the <option>-h</option>/<option>--host</option> option; &repmgr; will
automatically use the <varname>user</varname> and <varname>dbname</varname>
values defined in the <varname>conninfo</varname> string defined in the
witness node's <filename>repmgr.conf</filename>, unless these are explicitly
provided as command line options.
</para>
<para>
Execute with the <option>--dry-run</option> option to check what would happen
without actually registering the witness server. without actually registering the witness server.
</para> </para>
</refsect1> </refsect1>
@@ -55,7 +50,7 @@
</refsect1> </refsect1>
<refsect1 id="repmgr-witness-register-events"> <refsect1>
<title>Event notifications</title> <title>Event notifications</title>
<para> <para>
A <literal>witness_register</literal> <link linkend="event-notifications">event notification</link> will be generated. A <literal>witness_register</literal> <link linkend="event-notifications">event notification</link> will be generated.


@@ -20,10 +20,7 @@
</para> </para>
<para> <para>
The node does not have to be running to be unregistered, however if this is the The node does not have to be running to be unregistered, however if this is the
case then either provide connection information for the primary server, or case then connection information for the primary server must be provided.
execute <command>repmgr witness unregister</command> on a running node and
provide the parameter <option>--node-id</option> with the node ID of the
witness server.
</para> </para>
<para> <para>
Execute with the <literal>--dry-run</literal> option to check what would happen Execute with the <literal>--dry-run</literal> option to check what would happen
@@ -39,17 +36,17 @@
INFO: connecting to witness node "node3" (ID: 3) INFO: connecting to witness node "node3" (ID: 3)
INFO: unregistering witness node 3 INFO: unregistering witness node 3
INFO: witness unregistration complete INFO: witness unregistration complete
DETAIL: witness node with UD 3 successfully unregistered</programlisting> DETAIL: witness node with id 3 (conninfo: host=node3 dbname=repmgr user=repmgr port=5499) successfully unregistered</programlisting>
</para> </para>
<para> <para>
Unregistering a non-running witness node: Unregistering a non-running witness node:
<programlisting> <programlisting>
$ repmgr -f /etc/repmgr.conf witness unregister -h node1 -p 5501 -F $ repmgr -f /etc/repmgr.conf witness unregister -h node1 -p 5501 -F
INFO: connecting to node "node3" (ID: 3) INFO: connecting to witness node "node3" (ID: 3)
NOTICE: unable to connect to node "node3" (ID: 3), removing node record on cluster primary only NOTICE: unable to connect to witness node "node3" (ID: 3), removing node record on cluster primary only
INFO: unregistering witness node 3 INFO: unregistering witness node 3
INFO: witness unregistration complete INFO: witness unregistration complete
DETAIL: witness node with id ID 3 successfully unregistered</programlisting> DETAIL: witness node with id 3 (conninfo: host=node3 dbname=repmgr user=repmgr port=5499) successfully unregistered</programlisting>
</para> </para>
</refsect1> </refsect1>
@@ -65,34 +62,8 @@
</para> </para>
</refsect1> </refsect1>
<refsect1> <refsect1>
<title>Options</title>
<variablelist>
<varlistentry>
<term><option>--dry-run</option></term>
<listitem>
<para>
Check prerequisites but don't actually unregister the witness.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--node-id</option></term>
<listitem>
<para>
Unregister witness server with the specified node ID.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1 id="repmgr-witness-unregister-events">
<title>Event notifications</title> <title>Event notifications</title>
<para> <para>
A <literal>witness_unregister</literal> <link linkend="event-notifications">event notification</link> will be generated. A <literal>witness_unregister</literal> <link linkend="event-notifications">event notification</link> will be generated.


@@ -24,7 +24,7 @@
<abstract> <abstract>
<para> <para>
This is the official documentation of &repmgr; &repmgrversion; for This is the official documentation of &repmgr; &repmgrversion; for
use with PostgreSQL 9.3 - PostgreSQL 11. use with PostgreSQL 9.3 - PostgreSQL 10.
</para> </para>
<para> <para>
&repmgr; is being continually developed and we strongly recommend using the &repmgr; is being continually developed and we strongly recommend using the
@@ -92,7 +92,6 @@
&repmgrd-cascading-replication; &repmgrd-cascading-replication;
&repmgrd-network-split; &repmgrd-network-split;
&repmgrd-witness-server; &repmgrd-witness-server;
&repmgrd-pausing;
&repmgrd-degraded-monitoring; &repmgrd-degraded-monitoring;
&repmgrd-monitoring; &repmgrd-monitoring;
&repmgrd-bdr; &repmgrd-bdr;
@@ -114,15 +113,11 @@
&repmgr-node-status; &repmgr-node-status;
&repmgr-node-check; &repmgr-node-check;
&repmgr-node-rejoin; &repmgr-node-rejoin;
&repmgr-node-service;
&repmgr-cluster-show; &repmgr-cluster-show;
&repmgr-cluster-matrix; &repmgr-cluster-matrix;
&repmgr-cluster-crosscheck; &repmgr-cluster-crosscheck;
&repmgr-cluster-event; &repmgr-cluster-event;
&repmgr-cluster-cleanup; &repmgr-cluster-cleanup;
&repmgr-daemon-status;
&repmgr-daemon-pause;
&repmgr-daemon-unpause;
</part> </part>
&appendix-release-notes; &appendix-release-notes;


@@ -10,12 +10,12 @@
<title>BDR failover with repmgrd</title> <title>BDR failover with repmgrd</title>
<para> <para>
&repmgr; 4.x provides support for monitoring a pair of BDR 2.x nodes and taking action in &repmgr; 4.x provides support for monitoring BDR nodes and taking action in
case one of the nodes fails. case one of the nodes fails.
</para> </para>
<note> <note>
<simpara> <simpara>
Due to the nature of BDR 1.x/2.x, it's only safe to use this solution for Due to the nature of BDR, it's only safe to use this solution for
a two-node scenario. Introducing additional nodes will create an inherent a two-node scenario. Introducing additional nodes will create an inherent
risk of node desynchronisation if a node goes down without being cleanly risk of node desynchronisation if a node goes down without being cleanly
removed from the cluster. removed from the cluster.
@@ -31,21 +31,8 @@
reconfigure a proxy server/connection pooler such as <application>PgBouncer</application>. reconfigure a proxy server/connection pooler such as <application>PgBouncer</application>.
</para> </para>
<note>
<simpara>
This &repmgr; functionality is for BDR 2.x only running on PostgreSQL 9.4/9.6.
It is <emphasis>not</emphasis> required for later BDR versions.
</simpara>
</note>
<sect1 id="bdr-prerequisites" xreflabel="BDR prequisites"> <sect1 id="bdr-prerequisites" xreflabel="BDR prequisites">
<title>Prerequisites</title> <title>Prerequisites</title>
<important>
<para>
This &repmgr; functionality is for BDR 2.x only running on PostgreSQL 9.4/9.6.
It is <emphasis>not</emphasis> required for later BDR versions.
</para>
</important>
<para> <para>
&repmgr; 4 requires PostgreSQL 9.4 or 9.6 with the BDR 2 extension &repmgr; 4 requires PostgreSQL 9.4 or 9.6 with the BDR 2 extension
enabled and configured for a two-node BDR network. &repmgr; 4 packages enabled and configured for a two-node BDR network. &repmgr; 4 packages


@@ -24,7 +24,7 @@
<para> <para>
To use <application>repmgrd</application>, its associated function library <emphasis>must</emphasis> be To use <application>repmgrd</application>, its associated function library <emphasis>must</emphasis> be
included via <filename>postgresql.conf</filename> with: included in <filename>postgresql.conf</filename> with:
<programlisting> <programlisting>
shared_preload_libraries = 'repmgr'</programlisting> shared_preload_libraries = 'repmgr'</programlisting>
@@ -34,6 +34,23 @@
the <ulink url="https://www.postgresql.org/docs/current/static/runtime-config-client.html#GUC-SHARED-PRELOAD-LIBRARIES">PostgreSQL documentation</ulink>. the <ulink url="https://www.postgresql.org/docs/current/static/runtime-config-client.html#GUC-SHARED-PRELOAD-LIBRARIES">PostgreSQL documentation</ulink>.
</para> </para>
<para>
To apply configuration file changes to a running <application>repmgrd</application>
daemon, execute the operating system's <application>repmgrd</application> service reload command
(see <xref linkend="appendix-packages"> for examples),
or for instances which were manually started, execute <command>kill -HUP</command>, e.g.
<command>kill -HUP `cat /tmp/repmgrd.pid`</command>.
</para>
<note>
<para>
Check the <application>repmgrd</application> log to see what changes were
applied, or if any issues were encountered when reloading the configuration.
</para>
</note>
<para>
Note that only a subset of configuration file parameters can be changed on a
running <application>repmgrd</application> daemon.
</para>
<sect2 id="repmgrd-automatic-failover-configuration"> <sect2 id="repmgrd-automatic-failover-configuration">
<title>automatic failover configuration</title> <title>automatic failover configuration</title>
@@ -46,17 +63,8 @@
follow_command='/usr/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'</programlisting> follow_command='/usr/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'</programlisting>
</para> </para>
<para> <para>
Adjust file paths as appropriate; always specify the full path to the &repmgr; binary. Adjust file paths as appropriate; we recommend specifying the full path to the &repmgr; binary.
</para> </para>
<note>
<para>
&repmgr; will not apply <option>pg_bindir</option> when executing <option>promote_command</option>
or <option>follow_command</option>; these can be user-defined scripts so must always be
specified with the full path.
</para>
</note>
<para> <para>
Note that the <literal>--log-to-file</literal> option will cause Note that the <literal>--log-to-file</literal> option will cause
output generated by the &repmgr; command, when executed by <application>repmgrd</application>, output generated by the &repmgr; command, when executed by <application>repmgrd</application>,
@@ -122,11 +130,11 @@
particularly on <application>systemd</application>-based systems. particularly on <application>systemd</application>-based systems.
</para> </para>
<para> <para>
For more details, see <xref linkend="configuration-file-service-commands">. For more details, see <xref linkend="configuration-service-commands">.
</para> </para>
</sect2> </sect2>
<sect2 id="repmgrd-monitoring-configuration" xreflabel="repmgrd monitoring configuration"> <sect2 id="repmgrd-monitoring-configuration">
<indexterm> <indexterm>
<primary>repmgrd</primary> <primary>repmgrd</primary>
<secondary>monitoring configuration</secondary> <secondary>monitoring configuration</secondary>
@@ -149,203 +157,6 @@
</para> </para>
</sect2> </sect2>
<sect2 id="repmgrd-reloading-configuration" xreflabel="reloading repmgrd configuration">
<indexterm>
<primary>repmgrd</primary>
<secondary>applying configuration changes</secondary>
</indexterm>
<title>Applying configuration changes to repmgrd</title>
<para>
To apply configuration file changes to a running <application>repmgrd</application>
daemon, execute the operating system's <application>repmgrd</application> service reload command
(see <xref linkend="appendix-packages"> for examples),
or for instances which were manually started, execute <command>kill -HUP</command>, e.g.
<command>kill -HUP `cat /tmp/repmgrd.pid`</command>.
</para>
<tip>
<para>
Check the <application>repmgrd</application> log to see what changes were
applied, or if any issues were encountered when reloading the configuration.
</para>
</tip>
<para>
Note that only the following subset of configuration file parameters can be changed on a
running <application>repmgrd</application> daemon:
</para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<varname>async_query_timeout</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>bdr_local_monitoring_only</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>bdr_recovery_timeout</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>conninfo</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>degraded_monitoring_timeout</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>event_notification_command</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>event_notifications</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>failover</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>follow_command</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>log_facility</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>log_file</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>log_level</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>log_status_interval</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>monitor_interval_secs</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>monitoring_history</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>primary_notification_timeout</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>promote_command</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>reconnect_attempts</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>reconnect_interval</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>repmgrd_standby_startup_timeout</varname>
</simpara>
</listitem>
</itemizedlist>
<para>
The following set of configuration file parameters must be updated via
<command><link linkend="repmgr-standby-register">repmgr standby register --force</link></command>,
as they require changes to the <literal>repmgr.nodes</literal> table so they are visible to
all nodes in the replication cluster:
</para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<varname>node_id</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>node_name</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>data_directory</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>location</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>priority</varname>
</simpara>
</listitem>
</itemizedlist>
<note>
<para>
After executing <command><link linkend="repmgr-standby-register">repmgr standby register --force</link></command>,
<application>repmgrd</application> <emphasis>must</emphasis> be restarted for the changes to take effect.
</para>
</note>
</sect2>
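The two parameter lists above can be turned into a quick lookup. The following is only a sketch: the function name `how_to_apply` and the labels it prints are hypothetical, the lists are copied from this section, and a parameter found in neither list is reported as `not-listed` because the text does not say how such changes are applied.

```shell
#!/bin/sh
# Sketch only: classify a repmgr.conf parameter by how a change to it is applied.
# The two lists are copied from the documentation section above.

reloadable="async_query_timeout bdr_local_monitoring_only bdr_recovery_timeout \
conninfo degraded_monitoring_timeout event_notification_command \
event_notifications failover follow_command log_facility log_file log_level \
log_status_interval monitor_interval_secs monitoring_history \
primary_notification_timeout promote_command reconnect_attempts \
reconnect_interval repmgrd_standby_startup_timeout"

needs_register="node_id node_name data_directory location priority"

# how_to_apply is a hypothetical helper, not part of repmgr.
how_to_apply() {
    param=$1
    # Pad with spaces so only whole parameter names match.
    case " $reloadable " in
        *" $param "*) echo "reload"; return 0 ;;
    esac
    case " $needs_register " in
        *" $param "*) echo "register-and-restart"; return 0 ;;
    esac
    # Not covered by either list in this section.
    echo "not-listed"
}
```

For example, `how_to_apply log_level` prints `reload` (a service reload or `kill -HUP` is enough), while `how_to_apply priority` prints `register-and-restart` (run `repmgr standby register --force`, then restart repmgrd).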
</sect1> </sect1>
<sect1 id="repmgrd-daemon"> <sect1 id="repmgrd-daemon">
@@ -366,63 +177,10 @@
<para> <para>
<application>repmgrd</application> can be started manually like this: <application>repmgrd</application> can be started manually like this:
<programlisting> <programlisting>
repmgrd -f /etc/repmgr.conf --pid-file /tmp/repmgrd.pid</programlisting> repmgrd -f /etc/repmgr.conf --pid-file /tmp/repmgrd.pid --daemonize</programlisting>
and stopped with <command>kill `cat /tmp/repmgrd.pid`</command>. Adjust paths as appropriate. and stopped with <command>kill `cat /tmp/repmgrd.pid`</command>. Adjust paths as appropriate.
</para> </para>
<sect2 id="repmgrd-pid-file" xreflabel="repmgrd's PID file">
<indexterm>
<primary>repmgrd</primary>
<secondary>PID file</secondary>
</indexterm>
<indexterm>
<primary>PID file</primary>
<secondary>repmgrd</secondary>
</indexterm>
<title>repmgrd's PID file</title>
<para>
<application>repmgrd</application> will generate a PID file by default.
</para>
<note>
<simpara>
This is a behaviour change from previous versions (earlier than 4.1), where
the PID file had to be explicitly specified with the command line
parameter <option>--pid-file</option>.
</simpara>
</note>
<para>
The PID file can be specified in <filename>repmgr.conf</filename> with the configuration
parameter <varname>repmgrd_pid_file</varname>.
</para>
<para>
It can also be specified on the command line (as in previous versions) with
the command line parameter <option>--pid-file</option>. Note this will override
any value set in <filename>repmgr.conf</filename> with <varname>repmgrd_pid_file</varname>.
<option>--pid-file</option> may be deprecated in future releases.
</para>
<para>
If a PID file location was specified by the package maintainer, <application>repmgrd</application>
will use that. This only applies if &repmgr; was installed from a package and the package
maintainer has specified the PID file location.
</para>
<para>
If none of the above apply, <application>repmgrd</application> will create a PID file
in the operating system's temporary directory (as determined by the environment variable
<varname>TMPDIR</varname>, or if that is not set, will use <filename>/tmp</filename>).
</para>
<para>
To prevent a PID file being generated at all, provide the command line option
<option>--no-pid-file</option>.
</para>
<para>
To see which PID file <application>repmgrd</application> would use, execute <application>repmgrd</application>
with the option <option>--show-pid-file</option>. <application>repmgrd</application>
will not start if this option is provided. Note that the value shown is the
file <application>repmgrd</application> would use next time it starts, and is
not necessarily the PID file currently in use.
</para>
</sect2>
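The lookup order described above can be summarised as a small shell function. This is a sketch under stated assumptions, not repmgrd's actual implementation: `resolve_pid_file` is a hypothetical helper, the file name `repmgrd.pid` in the temporary directory is assumed (the text does not name it), and the precedence of `repmgr.conf` over the package default is inferred from the order in which the paragraphs list them.

```shell
#!/bin/sh
# Sketch only: mirrors the PID file lookup order described in the text.
# Arguments (an empty string means "not set"):
#   $1  value of --pid-file on the command line
#   $2  value of repmgrd_pid_file in repmgr.conf
#   $3  location specified by the package maintainer
resolve_pid_file() {
    cli_value=$1
    conf_value=$2
    package_value=$3

    if [ -n "$cli_value" ]; then
        # --pid-file overrides any value set in repmgr.conf
        echo "$cli_value"
    elif [ -n "$conf_value" ]; then
        echo "$conf_value"
    elif [ -n "$package_value" ]; then
        echo "$package_value"
    else
        # Fall back to TMPDIR, or /tmp if TMPDIR is not set.
        # The file name "repmgrd.pid" is an assumption.
        echo "${TMPDIR:-/tmp}/repmgrd.pid"
    fi
}
```

To see the file a real installation would use, run repmgrd with `--show-pid-file` as described above rather than relying on this sketch.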
<sect2 id="repmgrd-configuration-debian-ubuntu"> <sect2 id="repmgrd-configuration-debian-ubuntu">
<indexterm> <indexterm>
<primary>repmgrd</primary> <primary>repmgrd</primary>
@@ -454,7 +212,7 @@ REPMGRD_ENABLED=no
#REPMGRD_CONF="/path/to/repmgr.conf" #REPMGRD_CONF="/path/to/repmgr.conf"
# additional options # additional options
REPMGRD_OPTS="--daemonize=false" #REPMGRD_OPTS=""
# user to run repmgrd as # user to run repmgrd as
#REPMGRD_USER=postgres #REPMGRD_USER=postgres
@@ -469,16 +227,6 @@ REPMGRD_OPTS="--daemonize=false"
Set <varname>REPMGRD_ENABLED</varname> to <literal>yes</literal>, and <varname>REPMGRD_CONF</varname> Set <varname>REPMGRD_ENABLED</varname> to <literal>yes</literal>, and <varname>REPMGRD_CONF</varname>
to the <filename>repmgr.conf</filename> file you are using. to the <filename>repmgr.conf</filename> file you are using.
</para> </para>
<tip>
<para>
See <xref linkend="packages-debian-ubuntu"> for details of the Debian/Ubuntu packages and
typical file locations (including <filename>repmgr.conf</filename>).
</para>
</tip>
<para>
From <application>repmgrd</application> 4.1, ensure <varname>REPMGRD_OPTS</varname> includes
<option>--daemonize=false</option>, as daemonization is handled by the service command.
</para>
<para> <para>
If using <application>systemd</application>, you may need to execute <command>systemctl daemon-reload</command>. If using <application>systemd</application>, you may need to execute <command>systemctl daemon-reload</command>.
Also, if you attempted to start <application>repmgrd</application> using <command>systemctl start repmgrd</command>, Also, if you attempted to start <application>repmgrd</application> using <command>systemctl start repmgrd</command>,
@@ -521,34 +269,25 @@ REPMGRD_OPTS="--daemonize=false"
<secondary>repmgrd</secondary> <secondary>repmgrd</secondary>
</indexterm> </indexterm>
<indexterm>
<primary>repmgrd</primary>
<secondary>log rotation</secondary>
</indexterm>
<title>repmgrd log rotation</title> <title>repmgrd log rotation</title>
<para> <para>
To ensure the current <application>repmgrd</application> logfile To ensure the current <application>repmgrd</application> logfile
(specified in <filename>repmgr.conf</filename> with the parameter (specified in <filename>repmgr.conf</filename> with the parameter
<option>log_file</option>) does not grow indefinitely, configure your <option>log_file</option> does not grow indefinitely, configure your
system's <command>logrotate</command> to regularly rotate it. system's <command>logrotate</command> to regularly rotate it.
</para> </para>
<para> <para>
Sample configuration to rotate logfiles weekly with retention for Sample configuration to rotate logfiles weekly with retention for
up to 52 weeks and rotation forced if a file grows beyond 100MB: up to 52 weeks and rotation forced if a file grows beyond 100MB:
<programlisting> <programlisting>
/var/log/repmgr/repmgrd.log { /var/log/postgresql/repmgr-9.6.log {
missingok missingok
compress compress
rotate 52 rotate 52
maxsize 100M maxsize 100M
weekly weekly
create 0600 postgres postgres create 0600 postgres postgres
postrotate
/usr/bin/killall -HUP repmgrd
endscript
}</programlisting> }</programlisting>
</para> </para>
</sect1> </sect1>
</chapter> </chapter>


@@ -1,4 +1,4 @@
<chapter id="repmgrd-degraded-monitoring" xreflabel="repmgrd degraded monitoring"> <chapter id="repmgrd-degraded-monitoring">
<indexterm> <indexterm>
<primary>repmgrd</primary> <primary>repmgrd</primary>
<secondary>degraded monitoring</secondary> <secondary>degraded monitoring</secondary>
@@ -7,8 +7,8 @@
<title>"degraded monitoring" mode</title> <title>"degraded monitoring" mode</title>
<para> <para>
In certain circumstances, <application>repmgrd</application> is not able to fulfill its primary mission In certain circumstances, <application>repmgrd</application> is not able to fulfill its primary mission
of monitoring the node's upstream server. In these cases it enters &quot;degraded monitoring&quot; of monitoring the nodes' upstream server. In these cases it enters "degraded
mode, where <application>repmgrd</application> remains active but is waiting for the situation monitoring" mode, where <application>repmgrd</application> remains active but is waiting for the situation
to be resolved. to be resolved.
</para> </para>
<para> <para>


@@ -1,4 +1,4 @@
<chapter id="repmgrd-monitoring" xreflabel="Monitoring with repmgrd"> <chapter id="repmgrd-monitoring">
<indexterm> <indexterm>
<primary>repmgrd</primary> <primary>repmgrd</primary>
<secondary>monitoring</secondary> <secondary>monitoring</secondary>


@@ -40,8 +40,8 @@
In a failover situation, <application>repmgrd</application> will check if any servers in the In a failover situation, <application>repmgrd</application> will check if any servers in the
same location as the current primary node are visible. If not, <application>repmgrd</application> same location as the current primary node are visible. If not, <application>repmgrd</application>
will assume a network interruption and not promote any node in any will assume a network interruption and not promote any node in any
other location (it will however enter <link linkend="repmgrd-degraded-monitoring">degraded monitoring</link> other location (it will however enter <xref linkend="repmgrd-degraded-monitoring"> mode until
mode until a primary becomes visible). a primary becomes visible).
</para> </para>
</chapter> </chapter>


@@ -1,178 +0,0 @@
<chapter id="repmgrd-pausing" xreflabel="Pausing repmgrd">
<indexterm>
<primary>repmgrd</primary>
<secondary>pausing</secondary>
</indexterm>
<indexterm>
<primary>pausing repmgrd</primary>
</indexterm>
<title>Pausing repmgrd</title>
<para>
In normal operation, <application>repmgrd</application> monitors the state of the
PostgreSQL node it is running on, and will take appropriate action if problems
are detected, e.g. (if so configured) promote the node to primary, if the existing
primary has been determined as failed.
</para>
<para>
However, <application>repmgrd</application> is unable to distinguish between
planned outages (such as performing a <link linkend="performing-switchover">switchover</link>
or installing PostgreSQL maintenance releases), and an actual server outage. In versions prior to
&repmgr; 4.2 it was necessary to stop <application>repmgrd</application> on all nodes (or at least
on all nodes where <application>repmgrd</application> is
<link linkend="repmgrd-automatic-failover">configured for automatic failover</link>)
to prevent <application>repmgrd</application> from making unintentional changes to the
replication cluster.
</para>
<para>
From <link linkend="release-4.2">&repmgr; 4.2</link>, <application>repmgrd</application>
can now be &quot;paused&quot;, i.e. instructed not to take any action such as performing a failover.
This can be done from any node in the cluster, removing the need to stop/restart
each <application>repmgrd</application> individually.
</para>
<note>
<para>
For major PostgreSQL upgrades, e.g. from PostgreSQL 10 to PostgreSQL 11,
<application>repmgrd</application> should be shut down completely and only started up
once the &repmgr; packages for the new PostgreSQL major version have been installed.
</para>
</note>
<sect1 id="repmgrd-pausing-prerequisites">
<title>Prerequisites for pausing <application>repmgrd</application></title>
<para>
In order to be able to pause/unpause <application>repmgrd</application>, the following
prerequisites must be met:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><link linkend="release-4.2">&repmgr; 4.2</link> or later must be installed on all nodes.</simpara>
</listitem>
<listitem>
<simpara>The same major &repmgr; version (e.g. 4.2) must be installed on all nodes (and preferably the same minor version).</simpara>
</listitem>
<listitem>
<simpara>
PostgreSQL on all nodes must be accessible from the node where the
<literal>pause</literal>/<literal>unpause</literal> operation is executed, using the
<varname>conninfo</varname> string shown by <link linkend="repmgr-cluster-show"><command>repmgr cluster show</command></link>.
</simpara>
</listitem>
</itemizedlist>
</para>
<note>
<para>
These conditions are required for normal &repmgr; operation in any case.
</para>
</note>
</sect1>
<sect1 id="repmgrd-pausing-execution">
<title>Pausing/unpausing <application>repmgrd</application></title>
<para>
To pause <application>repmgrd</application>, execute <link linkend="repmgr-daemon-pause"><command>repmgr daemon pause</command></link>, e.g.:
<programlisting>
$ repmgr -f /etc/repmgr.conf daemon pause
NOTICE: node 1 (node1) paused
NOTICE: node 2 (node2) paused
NOTICE: node 3 (node3) paused</programlisting>
</para>
<para>
The state of <application>repmgrd</application> on each node can be checked with
<link linkend="repmgr-daemon-status"><command>repmgr daemon status</command></link>, e.g.:
<programlisting>$ repmgr -f /etc/repmgr.conf daemon status
ID | Name | Role | Status | repmgrd | PID | Paused?
----+-------+---------+---------+---------+------+---------
1 | node1 | primary | running | running | 7851 | yes
2 | node2 | standby | running | running | 7889 | yes
3 | node3 | standby | running | running | 7918 | yes</programlisting>
</para>
<note>
<para>
If executing a switchover with <link linkend="repmgr-standby-switchover"><command>repmgr standby switchover</command></link>,
&repmgr; will automatically pause/unpause <application>repmgrd</application> as part of the switchover process.
</para>
</note>
<para>
If the primary (in this example, <literal>node1</literal>) is stopped, <application>repmgrd</application>
running on one of the standbys (here: <literal>node2</literal>) will react like this:
<programlisting>
[2018-09-20 12:22:21] [WARNING] unable to connect to upstream node "node1" (node ID: 1)
[2018-09-20 12:22:21] [INFO] checking state of node 1, 1 of 5 attempts
[2018-09-20 12:22:21] [INFO] sleeping 1 seconds until next reconnection attempt
...
[2018-09-20 12:22:24] [INFO] sleeping 1 seconds until next reconnection attempt
[2018-09-20 12:22:25] [INFO] checking state of node 1, 5 of 5 attempts
[2018-09-20 12:22:25] [WARNING] unable to reconnect to node 1 after 5 attempts
[2018-09-20 12:22:25] [NOTICE] node is paused
[2018-09-20 12:22:33] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in degraded state
[2018-09-20 12:22:33] [DETAIL] repmgrd paused by administrator
[2018-09-20 12:22:33] [HINT] execute "repmgr daemon unpause" to resume normal failover mode</programlisting>
</para>
<para>
If the primary becomes available again (e.g. following a software upgrade), <application>repmgrd</application>
will automatically reconnect, e.g.:
<programlisting>
[2018-09-20 13:12:41] [NOTICE] reconnected to upstream node 1 after 8 seconds, resuming monitoring</programlisting>
</para>
<para>
To unpause <application>repmgrd</application>, execute <link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link>, e.g.:
<programlisting>
$ repmgr -f /etc/repmgr.conf daemon unpause
NOTICE: node 1 (node1) unpaused
NOTICE: node 2 (node2) unpaused
NOTICE: node 3 (node3) unpaused</programlisting>
</para>
<note>
<para>
If the previous primary is no longer accessible when <application>repmgrd</application>
is unpaused, no failover action will be taken. Instead, a new primary must be manually promoted using
<link linkend="repmgr-standby-promote"><command>repmgr standby promote</command></link>,
and any standbys attached to the new primary with
<link linkend="repmgr-standby-follow"><command>repmgr standby follow</command></link>.
</para>
<para>
This is to prevent <link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link>
resulting in the automatic promotion of a new primary, which may be a problem particularly
in larger clusters, where <application>repmgrd</application> could select a different promotion
candidate to the one intended by the administrator.
</para>
</note>
<sect2 id="repmgrd-pausing-details">
<title>Details on the <application>repmgrd</application> pausing mechanism</title>
<para>
The pause state of each node is preserved across a PostgreSQL restart.
</para>
<para>
<link linkend="repmgr-daemon-pause"><command>repmgr daemon pause</command></link> and
<link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link> can be
executed even if <application>repmgrd</application> is not running; in this case,
<application>repmgrd</application> will start up in whichever pause state has been set.
</para>
<note>
<para>
<link linkend="repmgr-daemon-pause"><command>repmgr daemon pause</command></link> and
<link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link>
<emphasis>do not</emphasis> stop/start <application>repmgrd</application>.
</para>
</note>
</sect2>
</sect1>
</chapter>
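The pause and unpause commands above lend themselves to a small wrapper around planned maintenance. The following is a minimal sketch, assuming the configuration file lives at `/etc/repmgr.conf` as in the examples; the function `with_repmgrd_paused` and the variables `REPMGR_BIN` and `REPMGR_CONF` are hypothetical names introduced for illustration. It uses only the `repmgr daemon pause` and `repmgr daemon unpause` commands shown in this chapter.

```shell
#!/bin/sh
# Sketch only: run a maintenance command while repmgrd is paused cluster-wide.
# REPMGR_BIN and REPMGR_CONF are hypothetical variables; adjust as appropriate.
REPMGR_BIN=${REPMGR_BIN:-repmgr}
REPMGR_CONF=${REPMGR_CONF:-/etc/repmgr.conf}

with_repmgrd_paused() {
    # Pause first; if that fails, do not start the maintenance task.
    "$REPMGR_BIN" -f "$REPMGR_CONF" daemon pause || return 1

    # Run the maintenance command passed as arguments and remember its result.
    "$@"
    status=$?

    # Always attempt to unpause, even if the maintenance command failed.
    "$REPMGR_BIN" -f "$REPMGR_CONF" daemon unpause || return 1

    return "$status"
}
```

A call might look like `with_repmgrd_paused sudo systemctl restart postgresql`, where the service name is itself an assumption. As the note above points out, if the primary is not reachable when repmgrd is unpaused, no automatic failover takes place, so check the cluster state with `repmgr daemon status` afterwards.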


@@ -19,10 +19,9 @@
</para> </para>
<para> <para>
<command>repmgr standby switchover</command> differs from other &repmgr; <command>repmgr standby switchover</command> differs from other &repmgr;
actions in that it also performs actions on other servers (the demotion actions in that it also performs actions on another server (the demotion
candidate, and optionally any other servers which are to follow the new primary), candidate), which means passwordless SSH access is required to that server
which means passwordless SSH access is required to those servers from the one where from the one where <command>repmgr standby switchover</command> is executed.
<command>repmgr standby switchover</command> is executed.
</para> </para>
<note> <note>
<simpara> <simpara>
@@ -61,13 +60,6 @@
&repmgr; being able to shut down the current primary server quickly and cleanly. &repmgr; being able to shut down the current primary server quickly and cleanly.
</para> </para>
<para>
Ensure that the promotion candidate has sufficient free walsenders available
(PostgreSQL configuration item <varname>max_wal_senders</varname>), and if replication
slots are in use, at least one free slot is available for the demotion candidate (
PostgreSQL configuration item <varname>max_replication_slots</varname>).
</para>
<para> <para>
Ensure that a passwordless SSH connection is possible from the promotion candidate Ensure that a passwordless SSH connection is possible from the promotion candidate
(standby) to the demotion candidate (current primary). If <literal>--siblings-follow</literal> (standby) to the demotion candidate (current primary). If <literal>--siblings-follow</literal>
@@ -84,12 +76,11 @@
<para> <para>
Double-check which commands will be used to stop/start/restart the current Double-check which commands will be used to stop/start/restart the current
primary; this can be done by e.g. executing <command><link linkend="repmgr-node-service">repmgr node service</link></command> primary; on the current primary execute:
on the current primary:
<programlisting> <programlisting>
repmgr -f /etc/repmgr.conf node service --list-actions --action=stop repmgr -f /etc/repmgr.conf node service --list --action=stop
repmgr -f /etc/repmgr.conf node service --list-actions --action=start repmgr -f /etc/repmgr.conf node service --list --action=start
repmgr -f /etc/repmgr.conf node service --list-actions --action=restart</programlisting> repmgr -f /etc/repmgr.conf node service --list --action=restart</programlisting>
</para> </para>
@@ -113,7 +104,7 @@
server. server.
</para> </para>
<para> <para>
For more details, see <xref linkend="configuration-file-service-commands">. For more details, see <xref linkend="configuration-service-commands">.
</para> </para>
</important> </important>
@@ -130,18 +121,12 @@
</simpara> </simpara>
</note> </note>
<para> <para>
Check that access from applications is minimized or preferably blocked Check that access from applications is minimized or preferably blocked
completely, so applications are not unexpectedly interrupted. completely, so applications are not unexpectedly interrupted.
</para> </para>
<note>
<para>
If an exclusive backup is running on the current primary, &repmgr; will not perform the
switchover.
</para>
</note>
<para> <para>
Check there is no significant replication lag on standbys attached to the Check there is no significant replication lag on standbys attached to the
current primary. current primary.
@@ -157,18 +142,11 @@
<note> <note>
<para> <para>
From <link linkend="release-4.2">repmgr 4.2</link>, &repmgr; will instruct any running Ensure that <application>repmgrd</application> is *not* running anywhere to prevent it unintentionally
<application>repmgrd</application> instances to pause operations while the switchover promoting a node. This restriction will be removed in a future &repmgr; version.
is being carried out, to prevent <application>repmgrd</application> from
unintentionally promoting a node. For more details, see <xref linkend="repmgrd-pausing">.
</para>
<para>
Users of &repmgr; versions prior to 4.2 should ensure that <application>repmgrd</application>
is not running on any nodes while a switchover is being executed.
</para> </para>
</note> </note>
<para> <para>
Finally, consider executing <command>repmgr standby switchover</command> with the Finally, consider executing <command>repmgr standby switchover</command> with the
<literal>--dry-run</literal> option; this will perform any necessary checks and inform you about <literal>--dry-run</literal> option; this will perform any necessary checks and inform you about
@@ -311,21 +289,7 @@
2 | node2 | primary | * running | | default | host=node2 dbname=repmgr user=repmgr 2 | node2 | primary | * running | | default | host=node2 dbname=repmgr user=repmgr
</programlisting> </programlisting>
</para> </para>
<para>
If <application>repmgrd</application> is in use, it's worth double-checking that
all nodes are unpaused by executing <command><link linkend="repmgr-daemon-status">repmgr daemon status</link></command>.
</para>
<note>
<para>
Users of &repmgr; versions prior to 4.2 will need to manually restart <application>repmgrd</application>
on all nodes after the switchover is completed.
</para>
</note>
</sect1> </sect1>
<sect1 id="switchover-caveats" xreflabel="Caveats"> <sect1 id="switchover-caveats" xreflabel="Caveats">
<indexterm> <indexterm>
<primary>switchover</primary> <primary>switchover</primary>
@@ -351,76 +315,17 @@
for details. for details.
</simpara> </simpara>
</listitem> </listitem>
<listitem>
<simpara>
<application>repmgrd</application> should not be running with setting <varname>failover=automatic</varname>
in <filename>repmgr.conf</filename> when a switchover is carried out, otherwise the
<application>repmgrd</application> daemon may try and promote a standby by itself.
</simpara>
</listitem>
</itemizedlist> </itemizedlist>
</para> </para>
<para>
We hope to remove some of these restrictions in future versions of &repmgr;.
</para>
</sect1> </sect1>
<sect1 id="switchover-troubleshooting" xreflabel="Troubleshooting">
<indexterm>
<primary>switchover</primary>
<secondary>troubleshooting</secondary>
</indexterm>
<title>Troubleshooting switchover issues</title>
<para>
As <link linkend="performing-switchover">emphasised previously</link>, performing a switchover
is a non-trivial operation and there are a number of potential issues which can occur.
While &repmgr; attempts to perform sanity checks, there's no guaranteed way of determining the success of
a switchover without actually carrying it out.
</para>
<sect2 id="switchover-troubleshooting-primary-shutdown">
<title>Demotion candidate (old primary) does not shut down</title>
<para>
&repmgr; may abort a switchover with a message like:
<programlisting>
ERROR: shutdown of the primary server could not be confirmed
HINT: check the primary server status before performing any further actions</programlisting>
</para>
<para>
This means the shutdown of the old primary has taken longer than &repmgr; expected,
and it has given up waiting.
</para>
<para>
In this case, check the PostgreSQL log on the primary server to see what is going
on. It's entirely possible the shutdown process is just taking longer than the
timeout set by the configuration parameter <varname>shutdown_check_timeout</varname>
(default: 60 seconds), in which case you may need to adjust this parameter.
</para>
<note>
<para>
Note that <varname>shutdown_check_timeout</varname> is set on the node where
<command>repmgr standby switchover</command> is executed (promotion candidate); setting it on the
demotion candidate (former primary) will have no effect.
</para>
</note>
<para>
If the primary server has shut down cleanly, and no other node has been promoted,
it is safe to restart it, in which case the replication cluster will be restored
to its original configuration.
</para>
</sect2>
<sect2 id="switchover-troubleshooting-exclusive-backup">
<title>Switchover aborts with an &quot;exclusive backup&quot; error</title>
<para>
&repmgr; may abort a switchover with a message like:
<programlisting>
ERROR: unable to perform a switchover while primary server is in exclusive backup mode
HINT: stop backup before attempting the switchover</programlisting>
</para>
<para>
This means an exclusive backup is running on the current primary; interrupting this
will not only abort the backup, but potentially leave the primary with an ambiguous
backup state.
</para>
<para>
To proceed, either wait until the backup has finished, or cancel it with the command
<command>SELECT pg_stop_backup()</command>. For more details see the PostgreSQL
documentation section
<ulink url="https://www.postgresql.org/docs/current/static/continuous-archiving.html#BACKUP-LOWLEVEL-BASE-BACKUP-EXCLUSIVE">Making an exclusive low level backup</ulink>.
</para>
</sect2>
</sect1>
</chapter> </chapter>


@@ -4,6 +4,6 @@ Upgrading from repmgr 3
This document has been integrated into the main `repmgr` documentation This document has been integrated into the main `repmgr` documentation
and is now located here: and is now located here:
> [Upgrading from repmgr 3.x](https://repmgr.org/docs/current/upgrading-from-repmgr-3.html) > [Upgrading from repmgr 3.x](https://repmgr.org/docs/4.0/upgrading-from-repmgr-3.html)


@@ -7,9 +7,9 @@
<title>Upgrading repmgr</title> <title>Upgrading repmgr</title>
<para> <para>
&repmgr; is updated regularly with minor releases (e.g. 4.0.1 to 4.0.2) &repmgr; is updated regularly with point releases (e.g. 4.0.1 to 4.0.2)
containing bugfixes and other minor improvements. Any substantial new containing bugfixes and other minor improvements. Any substantial new
functionality will be included in a major release (e.g. 4.0 to 4.1). functionality will be included in a feature release (e.g. 4.0.x to 4.1.x).
</para> </para>
<sect1 id="upgrading-repmgr-extension" xreflabel="Upgrading repmgr 4.x and later"> <sect1 id="upgrading-repmgr-extension" xreflabel="Upgrading repmgr 4.x and later">
@@ -19,202 +19,37 @@
</indexterm> </indexterm>
<title>Upgrading repmgr 4.x and later</title> <title>Upgrading repmgr 4.x and later</title>
<para> <para>
From version 4, &repmgr; consists of three elements: &repmgr; 4.x is implemented as a PostgreSQL extension; normally the upgrade consists
<itemizedlist spacing="compact" mark="bullet"> of the two following steps:
<orderedlist>
<listitem> <listitem>
<simpara> <simpara>
the <application>repmgr</application> and <application>repmgrd</application> executables Install the updated package (or compile the updated source)
</simpara> </simpara>
</listitem> </listitem>
<listitem> <listitem>
<simpara> <simpara>
the objects for the &repmgr; PostgreSQL extension (SQL files for creating/updating In the database where the &repmgr; extension is installed, execute
repmgr metadata, and the extension control file) <command>ALTER EXTENSION repmgr UPDATE</command>.
</simpara> </simpara>
</listitem> </listitem>
</orderedlist>
<listitem>
<simpara>
the shared library module used by <application>repmgrd</application> which
is resident in the PostgreSQL backend
</simpara>
</listitem>
</itemizedlist>
</para>
<para>
With <emphasis>minor releases</emphasis>, usually changes are only made to the <application>repmgr</application>
and <application>repmgrd</application> executables. In this case, the upgrade is quite straightforward,
and is simply a case of installing the new version, and restarting <application>repmgrd</application>
(if running).
</para> </para>
<para>
For <emphasis>major releases</emphasis>, the &repmgr; PostgreSQL extension will need to be updated
to the latest version. Additionally, if the shared library module has been updated (this is sometimes,
but not always the case), PostgreSQL itself will need to be restarted on each node.
</para>
<important>
<para> <para>
Always check the <link linkend="appendix-release-notes">release notes</link> for every Always check the <link linkend="appendix-release-notes">release notes</link> for every
release as they may contain upgrade instructions particular to individual versions. release as they may contain upgrade instructions particular to individual versions.
</para> </para>
</important>
<sect2 id="upgrading-minor-version" xreflabel="Upgrading a minor version release">
<indexterm>
<primary>upgrading</primary>
<secondary>minor release</secondary>
</indexterm>
<title>Upgrading a minor version release</title>
<para> <para>
The process for installing minor version upgrades is quite straightforward: If the <application>repmgrd</application> daemon is in use, we recommend stopping it
before upgrading &repmgr;.
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
install the new &repmgr; version
</simpara>
</listitem>
<listitem>
<simpara>
restart <application>repmgrd</application> on all nodes where it is running
</simpara>
</listitem>
</itemizedlist>
</para>
<note>
<para>
Some packaging systems (e.g. <link linkend="packages-debian-ubuntu">Debian/Ubuntu</link>)
may restart <application>repmgrd</application> as part of the package upgrade process.
</para>
</note>
<para>
Minor version upgrades can be performed in any order on the nodes in the replication
cluster.
</para>
<para>
A PostgreSQL restart is <emphasis>not</emphasis> required for minor version upgrades.
</para>
<note>
<para>
The same &repmgr; &quot;major version&quot; (e.g. <literal>4.2</literal>) must be
installed on all nodes in the replication cluster. While it's possible to have differing
&repmgr; &quot;minor versions&quot; (e.g. <literal>4.2.1</literal>) on different nodes,
we strongly recommend updating all nodes to the latest minor version.
</para>
</note>
</sect2>
<sect2 id="upgrading-major-version" xreflabel="Upgrading a major version release">
<indexterm>
<primary>upgrading</primary>
<secondary>major release</secondary>
</indexterm>
<title>Upgrading a major version release</title>
<para>
&quot;major version&quot; upgrades need to be planned more carefully, as they may include
changes to the &repmgr; metadata (which need to be propagated from the primary to all
standbys) and/or changes to the shared object file used by <application>repmgrd</application>
(which require a PostgreSQL restart).
</para> </para>
<para> <para>
With this in mind, Note that it may be necessary to restart the PostgreSQL server if the upgrade contains
changes to the shared object file used by <application>repmgrd</application>; check the
release notes for details.
</para> </para>
<para>
<orderedlist>
<listitem>
<simpara>
Stop <application>repmgrd</application> (if in use) on all nodes where it is running.
</simpara>
</listitem>
<listitem>
<simpara>
Disable the <application>repmgrd</application> service on all nodes where it is in use;
this is to prevent packages from prematurely restarting <application>repmgrd</application>.
</simpara>
</listitem>
<listitem>
<simpara>
Install the updated package (or compile the updated source) on all nodes.
</simpara>
</listitem>
<listitem>
<para>
If running a <literal>systemd</literal>-based Linux distribution, execute (as <literal>root</literal>,
or with appropriate <literal>sudo</literal> permissions):
<programlisting>
systemctl daemon-reload</programlisting>
</para>
</listitem>
<listitem>
<simpara>
If the &repmgr; shared library module has been updated (check the <link linkend="appendix-release-notes">release notes</link>!),
restart PostgreSQL, then <application>repmgrd</application> (if in use) on each node.
The order in which this is applied to individual nodes is not critical,
and it's also fine to restart PostgreSQL on all nodes first before starting <application>repmgrd</application>.
</simpara>
<simpara>
Note that if the upgrade requires a PostgreSQL restart, <application>repmgrd</application>
will only function correctly once all nodes have been restarted.
</simpara>
</listitem>
<listitem>
<para>
On the primary node, execute
<programlisting>
ALTER EXTENSION repmgr UPDATE</programlisting>
in the database where &repmgr; is installed.
</para>
</listitem>
<listitem>
<simpara>
Reenable the <application>repmgrd</application> service on all nodes where it is in use, and
ensure it is running.
</simpara>
</listitem>
</orderedlist>
</para>
<tip>
<para>
If the &repmgr; upgrade requires a PostgreSQL restart, combine the &repmgr; upgrade
with a PostgreSQL minor version upgrade, which will require a restart in any case.
New PostgreSQL minor versions are usually released every couple of months.
</para>
</tip>
</sect2>
<sect2 id="upgrading-check-repmgrd" xreflabel="Checking repmgrd status after an upgrade">
<indexterm>
<primary>upgrading</primary>
<secondary>checking repmgrd status</secondary>
</indexterm>
<title>Checking repmgrd status after an upgrade</title>
<para>
From <link linkend="release-4.2">repmgr 4.2</link>, once the upgrade is complete, execute the <command><link linkend="repmgr-daemon-status">repmgr daemon status</link></command>
command (on any node) to show an overview of the status of <application>repmgrd</application> on all nodes.
</para>
</sect2>
</sect1> </sect1>
<sect1 id="upgrading-and-pg-upgrade" xreflabel="pg_upgrade and repmgr"> <sect1 id="upgrading-and-pg-upgrade" xreflabel="pg_upgrade and repmgr">
@@ -254,13 +89,6 @@ ALTER EXTENSION repmgr UPDATE</programlisting>
be recreated by <application>pg_upgrade</application>. These will need to be recreated by <application>pg_upgrade</application>. These will need to
be recreated manually. be recreated manually.
</para> </para>
<tip>
<para>
Use <command><link linkend="repmgr-node-check">repmgr node check</link></command>
to determine which replication slots need to be recreated.
</para>
</tip>
</sect1> </sect1>


@@ -1 +1 @@
<!ENTITY repmgrversion "4.2"> <!ENTITY repmgrversion "4.0.6">


@@ -46,7 +46,6 @@
#define ERR_SWITCHOVER_INCOMPLETE 22 #define ERR_SWITCHOVER_INCOMPLETE 22
#define ERR_FOLLOW_FAIL 23 #define ERR_FOLLOW_FAIL 23
#define ERR_REJOIN_FAIL 24 #define ERR_REJOIN_FAIL 24
#define ERR_NODE_STATUS 25 #define ERR_CLUSTER_CHECK 25
#define ERR_REPMGRD_PAUSE 26
#endif /* _ERRCODE_H_ */ #endif /* _ERRCODE_H_ */

12
log.c

@@ -42,7 +42,7 @@ _stderr_log_with_level(const char *level_name, int level, const char *fmt, va_li
__attribute__((format(PG_PRINTF_ATTRIBUTE, 3, 0))); __attribute__((format(PG_PRINTF_ATTRIBUTE, 3, 0)));
int log_type = REPMGR_STDERR; int log_type = REPMGR_STDERR;
int log_level = LOG_INFO; int log_level = LOG_NOTICE;
int last_log_level = LOG_INFO; int last_log_level = LOG_INFO;
int verbose_logging = false; int verbose_logging = false;
int terse_logging = false; int terse_logging = false;
@@ -70,7 +70,7 @@ _stderr_log_with_level(const char *level_name, int level, const char *fmt, va_li
/* /*
* Store the requested level so that if there's a subsequent log_hint() or * Store the requested level so that if there's a subsequent log_hint() or
* log_detail(), we can suppress that if --terse was specified, * log_detail(), we can suppress that if appropriate.
*/ */
last_log_level = level; last_log_level = level;
@@ -329,13 +329,6 @@ logger_set_terse(void)
} }
void
logger_set_level(int new_log_level)
{
log_level = new_log_level;
}
void void
logger_set_min_level(int min_log_level) logger_set_min_level(int min_log_level)
{ {
@@ -343,7 +336,6 @@ logger_set_min_level(int min_log_level)
log_level = min_log_level; log_level = min_log_level;
} }
int int
detect_log_level(const char *level) detect_log_level(const char *level)
{ {
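
The first hunk in log.c differs only in the default value of `log_level` (`LOG_INFO` on one side, `LOG_NOTICE` on the other). As a rough illustration of what such a threshold does, here is a minimal sketch. It assumes the logger compares a message's level against `log_level` using the numeric ordering of the syslog priorities in `<syslog.h>`, where a smaller number is more severe; the diff itself does not show that comparison. The helpers `would_emit` and `count_emitted` are hypothetical and not part of repmgr.

```c
#include <assert.h>
#include <syslog.h>   /* LOG_EMERG ... LOG_DEBUG priority constants */

/*
 * Hypothetical helper, not part of repmgr: returns 1 if a message of
 * priority "level" passes a threshold of "min_level", else 0.
 * Assumes syslog ordering, where smaller values are more severe.
 */
static int
would_emit(int level, int min_level)
{
    return level <= min_level;
}

/* Count how many of a fixed set of sample priorities pass the threshold. */
static int
count_emitted(int min_level)
{
    static const int sample_levels[] = {
        LOG_ERR, LOG_WARNING, LOG_NOTICE, LOG_INFO, LOG_DEBUG
    };
    int count = 0;
    int i;

    /* guard against values outside the syslog priority range */
    assert(min_level >= LOG_EMERG && min_level <= LOG_DEBUG);

    for (i = 0; i < (int) (sizeof(sample_levels) / sizeof(sample_levels[0])); i++)
    {
        if (would_emit(sample_levels[i], min_level))
            count++;
    }

    return count;
}
```

Under that assumption, a default of `LOG_NOTICE` would drop `LOG_INFO` messages that a default of `LOG_INFO` lets through, unless the level is changed via configuration or a function such as `logger_set_min_level()`.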

1
log.h

@@ -129,7 +129,6 @@ bool logger_shutdown(void);
void logger_set_verbose(void); void logger_set_verbose(void);
void logger_set_terse(void); void logger_set_terse(void);
void logger_set_min_level(int min_log_level); void logger_set_min_level(int min_log_level);
void logger_set_level(int new_log_level);
void void
log_detail(const char *fmt,...) log_detail(const char *fmt,...)


@@ -1,2 +0,0 @@
-- complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION repmgr" to load this file. \quit


@@ -1,32 +0,0 @@
-- complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION repmgr" to load this file. \quit
CREATE FUNCTION get_repmgrd_pid()
RETURNS INT
AS 'MODULE_PATHNAME', 'get_repmgrd_pid'
LANGUAGE C STRICT;
CREATE FUNCTION get_repmgrd_pidfile()
RETURNS TEXT
AS 'MODULE_PATHNAME', 'get_repmgrd_pidfile'
LANGUAGE C STRICT;
CREATE FUNCTION set_repmgrd_pid(INT, TEXT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'set_repmgrd_pid'
LANGUAGE C STRICT;
CREATE FUNCTION repmgrd_is_running()
RETURNS BOOL
AS 'MODULE_PATHNAME', 'repmgrd_is_running'
LANGUAGE C STRICT;
CREATE FUNCTION repmgrd_pause(BOOL)
RETURNS VOID
AS 'MODULE_PATHNAME', 'repmgrd_pause'
LANGUAGE C STRICT;
CREATE FUNCTION repmgrd_is_paused()
RETURNS BOOL
AS 'MODULE_PATHNAME', 'repmgrd_is_paused'
LANGUAGE C STRICT;


@@ -1,166 +0,0 @@
-- complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION repmgr" to load this file. \quit
CREATE TABLE repmgr.nodes (
node_id INTEGER PRIMARY KEY,
upstream_node_id INTEGER NULL REFERENCES nodes (node_id) DEFERRABLE,
active BOOLEAN NOT NULL DEFAULT TRUE,
node_name TEXT NOT NULL,
type TEXT NOT NULL CHECK (type IN('primary','standby','witness','bdr')),
location TEXT NOT NULL DEFAULT 'default',
priority INT NOT NULL DEFAULT 100,
conninfo TEXT NOT NULL,
repluser VARCHAR(63) NOT NULL,
slot_name TEXT NULL,
config_file TEXT NOT NULL
);
CREATE TABLE repmgr.events (
node_id INTEGER NOT NULL,
event TEXT NOT NULL,
successful BOOLEAN NOT NULL DEFAULT TRUE,
event_timestamp TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
details TEXT NULL
);
DO $repmgr$
DECLARE
DECLARE server_version_num INT;
BEGIN
SELECT setting
FROM pg_catalog.pg_settings
WHERE name = 'server_version_num'
INTO server_version_num;
IF server_version_num >= 90400 THEN
EXECUTE $repmgr_func$
CREATE TABLE repmgr.monitoring_history (
primary_node_id INTEGER NOT NULL,
standby_node_id INTEGER NOT NULL,
last_monitor_time TIMESTAMP WITH TIME ZONE NOT NULL,
last_apply_time TIMESTAMP WITH TIME ZONE,
last_wal_primary_location PG_LSN NOT NULL,
last_wal_standby_location PG_LSN,
replication_lag BIGINT NOT NULL,
apply_lag BIGINT NOT NULL
)
$repmgr_func$;
ELSE
EXECUTE $repmgr_func$
CREATE TABLE repmgr.monitoring_history (
primary_node_id INTEGER NOT NULL,
standby_node_id INTEGER NOT NULL,
last_monitor_time TIMESTAMP WITH TIME ZONE NOT NULL,
last_apply_time TIMESTAMP WITH TIME ZONE,
last_wal_primary_location TEXT NOT NULL,
last_wal_standby_location TEXT,
replication_lag BIGINT NOT NULL,
apply_lag BIGINT NOT NULL
)
$repmgr_func$;
END IF;
END$repmgr$;
CREATE INDEX idx_monitoring_history_time
ON repmgr.monitoring_history (last_monitor_time, standby_node_id);
CREATE VIEW repmgr.show_nodes AS
SELECT n.node_id,
n.node_name,
n.active,
n.upstream_node_id,
un.node_name AS upstream_node_name,
n.type,
n.priority,
n.conninfo
FROM repmgr.nodes n
LEFT JOIN repmgr.nodes un
ON un.node_id = n.upstream_node_id;
/* XXX update upgrade scripts! */
CREATE TABLE repmgr.voting_term (
term INT NOT NULL
);
CREATE UNIQUE INDEX voting_term_restrict
ON repmgr.voting_term ((TRUE));
CREATE RULE voting_term_delete AS
ON DELETE TO repmgr.voting_term
DO INSTEAD NOTHING;
/* ================= */
/* repmgrd functions */
/* ================= */
/* monitoring functions */
CREATE FUNCTION set_local_node_id(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'set_local_node_id'
LANGUAGE C STRICT;
CREATE FUNCTION get_local_node_id()
RETURNS INT
AS 'MODULE_PATHNAME', 'get_local_node_id'
LANGUAGE C STRICT;
CREATE FUNCTION standby_set_last_updated()
RETURNS TIMESTAMP WITH TIME ZONE
AS 'MODULE_PATHNAME', 'standby_set_last_updated'
LANGUAGE C STRICT;
CREATE FUNCTION standby_get_last_updated()
RETURNS TIMESTAMP WITH TIME ZONE
AS 'MODULE_PATHNAME', 'standby_get_last_updated'
LANGUAGE C STRICT;
/* failover functions */
CREATE FUNCTION notify_follow_primary(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'notify_follow_primary'
LANGUAGE C STRICT;
CREATE FUNCTION get_new_primary()
RETURNS INT
AS 'MODULE_PATHNAME', 'get_new_primary'
LANGUAGE C STRICT;
CREATE FUNCTION reset_voting_status()
RETURNS VOID
AS 'MODULE_PATHNAME', 'reset_voting_status'
LANGUAGE C STRICT;
CREATE FUNCTION am_bdr_failover_handler(INT)
RETURNS BOOL
AS 'MODULE_PATHNAME', 'am_bdr_failover_handler'
LANGUAGE C STRICT;
CREATE FUNCTION unset_bdr_failover_handler()
RETURNS VOID
AS 'MODULE_PATHNAME', 'unset_bdr_failover_handler'
LANGUAGE C STRICT;
CREATE VIEW repmgr.replication_status AS
SELECT m.primary_node_id, m.standby_node_id, n.node_name AS standby_name,
n.type AS node_type, n.active, last_monitor_time,
CASE WHEN n.type='standby' THEN m.last_wal_primary_location ELSE NULL END AS last_wal_primary_location,
m.last_wal_standby_location,
CASE WHEN n.type='standby' THEN pg_catalog.pg_size_pretty(m.replication_lag) ELSE NULL END AS replication_lag,
CASE WHEN n.type='standby' THEN
CASE WHEN replication_lag > 0 THEN age(now(), m.last_apply_time) ELSE '0'::INTERVAL END
ELSE NULL
END AS replication_time_lag,
CASE WHEN n.type='standby' THEN pg_catalog.pg_size_pretty(m.apply_lag) ELSE NULL END AS apply_lag,
AGE(NOW(), CASE WHEN pg_catalog.pg_is_in_recovery() THEN repmgr.standby_get_last_updated() ELSE m.last_monitor_time END) AS communication_time_lag
FROM repmgr.monitoring_history m
JOIN repmgr.nodes n ON m.standby_node_id = n.node_id
WHERE (m.standby_node_id, m.last_monitor_time) IN (
SELECT m1.standby_node_id, MAX(m1.last_monitor_time)
FROM repmgr.monitoring_history m1 GROUP BY 1
);


@@ -1,197 +0,0 @@
-- complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION repmgr" to load this file. \quit
CREATE TABLE repmgr.nodes (
node_id INTEGER PRIMARY KEY,
upstream_node_id INTEGER NULL REFERENCES nodes (node_id) DEFERRABLE,
active BOOLEAN NOT NULL DEFAULT TRUE,
node_name TEXT NOT NULL,
type TEXT NOT NULL CHECK (type IN('primary','standby','witness','bdr')),
location TEXT NOT NULL DEFAULT 'default',
priority INT NOT NULL DEFAULT 100,
conninfo TEXT NOT NULL,
repluser VARCHAR(63) NOT NULL,
slot_name TEXT NULL,
config_file TEXT NOT NULL
);
CREATE TABLE repmgr.events (
node_id INTEGER NOT NULL,
event TEXT NOT NULL,
successful BOOLEAN NOT NULL DEFAULT TRUE,
event_timestamp TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
details TEXT NULL
);
DO $repmgr$
DECLARE
DECLARE server_version_num INT;
BEGIN
SELECT setting
FROM pg_catalog.pg_settings
WHERE name = 'server_version_num'
INTO server_version_num;
IF server_version_num >= 90400 THEN
EXECUTE $repmgr_func$
CREATE TABLE repmgr.monitoring_history (
primary_node_id INTEGER NOT NULL,
standby_node_id INTEGER NOT NULL,
last_monitor_time TIMESTAMP WITH TIME ZONE NOT NULL,
last_apply_time TIMESTAMP WITH TIME ZONE,
last_wal_primary_location PG_LSN NOT NULL,
last_wal_standby_location PG_LSN,
replication_lag BIGINT NOT NULL,
apply_lag BIGINT NOT NULL
)
$repmgr_func$;
ELSE
EXECUTE $repmgr_func$
CREATE TABLE repmgr.monitoring_history (
primary_node_id INTEGER NOT NULL,
standby_node_id INTEGER NOT NULL,
last_monitor_time TIMESTAMP WITH TIME ZONE NOT NULL,
last_apply_time TIMESTAMP WITH TIME ZONE,
last_wal_primary_location TEXT NOT NULL,
last_wal_standby_location TEXT,
replication_lag BIGINT NOT NULL,
apply_lag BIGINT NOT NULL
)
$repmgr_func$;
END IF;
END$repmgr$;
CREATE INDEX idx_monitoring_history_time
ON repmgr.monitoring_history (last_monitor_time, standby_node_id);
CREATE VIEW repmgr.show_nodes AS
SELECT n.node_id,
n.node_name,
n.active,
n.upstream_node_id,
un.node_name AS upstream_node_name,
n.type,
n.priority,
n.conninfo
FROM repmgr.nodes n
LEFT JOIN repmgr.nodes un
ON un.node_id = n.upstream_node_id;
/* XXX update upgrade scripts! */
CREATE TABLE repmgr.voting_term (
term INT NOT NULL
);
CREATE UNIQUE INDEX voting_term_restrict
ON repmgr.voting_term ((TRUE));
CREATE RULE voting_term_delete AS
ON DELETE TO repmgr.voting_term
DO INSTEAD NOTHING;
/* ================= */
/* repmgrd functions */
/* ================= */
/* monitoring functions */
CREATE FUNCTION set_local_node_id(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'set_local_node_id'
LANGUAGE C STRICT;
CREATE FUNCTION get_local_node_id()
RETURNS INT
AS 'MODULE_PATHNAME', 'get_local_node_id'
LANGUAGE C STRICT;
CREATE FUNCTION standby_set_last_updated()
RETURNS TIMESTAMP WITH TIME ZONE
AS 'MODULE_PATHNAME', 'standby_set_last_updated'
LANGUAGE C STRICT;
CREATE FUNCTION standby_get_last_updated()
RETURNS TIMESTAMP WITH TIME ZONE
AS 'MODULE_PATHNAME', 'standby_get_last_updated'
LANGUAGE C STRICT;
/* failover functions */
CREATE FUNCTION notify_follow_primary(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'notify_follow_primary'
LANGUAGE C STRICT;
CREATE FUNCTION get_new_primary()
RETURNS INT
AS 'MODULE_PATHNAME', 'get_new_primary'
LANGUAGE C STRICT;
CREATE FUNCTION reset_voting_status()
RETURNS VOID
AS 'MODULE_PATHNAME', 'reset_voting_status'
LANGUAGE C STRICT;
CREATE FUNCTION am_bdr_failover_handler(INT)
RETURNS BOOL
AS 'MODULE_PATHNAME', 'am_bdr_failover_handler'
LANGUAGE C STRICT;
CREATE FUNCTION unset_bdr_failover_handler()
RETURNS VOID
AS 'MODULE_PATHNAME', 'unset_bdr_failover_handler'
LANGUAGE C STRICT;
CREATE FUNCTION get_repmgrd_pid()
RETURNS INT
AS 'MODULE_PATHNAME', 'get_repmgrd_pid'
LANGUAGE C STRICT;
CREATE FUNCTION get_repmgrd_pidfile()
RETURNS TEXT
AS 'MODULE_PATHNAME', 'get_repmgrd_pidfile'
LANGUAGE C STRICT;
CREATE FUNCTION set_repmgrd_pid(INT, TEXT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'set_repmgrd_pid'
LANGUAGE C STRICT;
CREATE FUNCTION repmgrd_is_running()
RETURNS BOOL
AS 'MODULE_PATHNAME', 'repmgrd_is_running'
LANGUAGE C STRICT;
CREATE FUNCTION repmgrd_pause(BOOL)
RETURNS VOID
AS 'MODULE_PATHNAME', 'repmgrd_pause'
LANGUAGE C STRICT;
CREATE FUNCTION repmgrd_is_paused()
RETURNS BOOL
AS 'MODULE_PATHNAME', 'repmgrd_is_paused'
LANGUAGE C STRICT;
CREATE VIEW repmgr.replication_status AS
SELECT m.primary_node_id, m.standby_node_id, n.node_name AS standby_name,
n.type AS node_type, n.active, last_monitor_time,
CASE WHEN n.type='standby' THEN m.last_wal_primary_location ELSE NULL END AS last_wal_primary_location,
m.last_wal_standby_location,
CASE WHEN n.type='standby' THEN pg_catalog.pg_size_pretty(m.replication_lag) ELSE NULL END AS replication_lag,
CASE WHEN n.type='standby' THEN
CASE WHEN replication_lag > 0 THEN age(now(), m.last_apply_time) ELSE '0'::INTERVAL END
ELSE NULL
END AS replication_time_lag,
CASE WHEN n.type='standby' THEN pg_catalog.pg_size_pretty(m.apply_lag) ELSE NULL END AS apply_lag,
AGE(NOW(), CASE WHEN pg_catalog.pg_is_in_recovery() THEN repmgr.standby_get_last_updated() ELSE m.last_monitor_time END) AS communication_time_lag
FROM repmgr.monitoring_history m
JOIN repmgr.nodes n ON m.standby_node_id = n.node_id
WHERE (m.standby_node_id, m.last_monitor_time) IN (
SELECT m1.standby_node_id, MAX(m1.last_monitor_time)
FROM repmgr.monitoring_history m1 GROUP BY 1
);


@@ -83,10 +83,9 @@ do_bdr_register(void)
exit(ERR_BAD_CONFIG); exit(ERR_BAD_CONFIG);
} }
/* BDR 2 implementation is for 2 nodes only */ if (bdr_nodes.node_count > 2)
if (get_bdr_version_num() < 3 && bdr_nodes.node_count > 2)
{ {
log_error(_("repmgr can only support BDR 2.x clusters with 2 nodes")); log_error(_("repmgr can only support BDR clusters with 2 nodes"));
log_detail(_("this BDR cluster has %i nodes"), bdr_nodes.node_count); log_detail(_("this BDR cluster has %i nodes"), bdr_nodes.node_count);
PQfinish(conn); PQfinish(conn);
pfree(dbname); pfree(dbname);
@@ -126,7 +125,7 @@ do_bdr_register(void)
} }
/* check whether repmgr extension exists, and there are no non-BDR nodes registered */ /* check whether repmgr extension exists, and there are no non-BDR nodes registered */
extension_status = get_repmgr_extension_status(conn, NULL); extension_status = get_repmgr_extension_status(conn);
if (extension_status == REPMGR_UNKNOWN) if (extension_status == REPMGR_UNKNOWN)
{ {
@@ -177,7 +176,6 @@ do_bdr_register(void)
if (bdr_node_has_repmgr_set(conn, config_file_options.node_name) == false) if (bdr_node_has_repmgr_set(conn, config_file_options.node_name) == false)
{ {
log_debug("bdr_node_has_repmgr_set() = false");
bdr_node_set_repmgr_set(conn, config_file_options.node_name); bdr_node_set_repmgr_set(conn, config_file_options.node_name);
} }
@@ -191,7 +189,7 @@ do_bdr_register(void)
{ {
NodeInfoList local_node_records = T_NODE_INFO_LIST_INITIALIZER; NodeInfoList local_node_records = T_NODE_INFO_LIST_INITIALIZER;
(void) get_all_node_records(conn, &local_node_records); get_all_node_records(conn, &local_node_records);
if (local_node_records.node_count == 0) if (local_node_records.node_count == 0)
{ {
@@ -203,7 +201,6 @@ do_bdr_register(void)
if (bdr_nodes.node_count == 0) if (bdr_nodes.node_count == 0)
{ {
log_error(_("unable to retrieve any BDR node records")); log_error(_("unable to retrieve any BDR node records"));
log_detail("%s", PQerrorMessage(conn));
PQfinish(conn); PQfinish(conn);
exit(ERR_BAD_CONFIG); exit(ERR_BAD_CONFIG);
} }
@@ -232,14 +229,14 @@ do_bdr_register(void)
} }
/* check repmgr schema exists, skip if not */ /* check repmgr schema exists, skip if not */
other_node_extension_status = get_repmgr_extension_status(bdr_node_conn, NULL); other_node_extension_status = get_repmgr_extension_status(bdr_node_conn);
if (other_node_extension_status != REPMGR_INSTALLED) if (other_node_extension_status != REPMGR_INSTALLED)
{ {
continue; continue;
} }
(void) get_all_node_records(bdr_node_conn, &existing_nodes); get_all_node_records(bdr_node_conn, &existing_nodes);
for (cell = existing_nodes.head; cell; cell = cell->next) for (cell = existing_nodes.head; cell; cell = cell->next)
{ {
@@ -255,35 +252,7 @@ do_bdr_register(void)
} }
/* Add the repmgr extension tables to a replication set */ /* Add the repmgr extension tables to a replication set */
if (get_bdr_version_num() < 3)
{
add_extension_tables_to_bdr_replication_set(conn); add_extension_tables_to_bdr_replication_set(conn);
}
else
{
/* this is the only table we need to replicate */
char *replication_set = get_default_bdr_replication_set(conn);
/*
* this probably won't happen, but we need to be sure we're using
* the replication set metadata correctly...
*/
if (conn == NULL)
{
log_error(_("unable to retrieve default BDR replication set"));
log_hint(_("see preceding messages"));
log_debug("check query in get_default_bdr_replication_set()");
exit(ERR_BAD_CONFIG);
}
if (is_table_in_bdr_replication_set(conn, "nodes", replication_set) == false)
{
add_table_to_bdr_replication_set(conn, "nodes", replication_set);
}
pfree(replication_set);
}
initPQExpBuffer(&event_details); initPQExpBuffer(&event_details);
@@ -442,7 +411,7 @@ do_bdr_unregister(void)
exit(ERR_BAD_CONFIG); exit(ERR_BAD_CONFIG);
} }
extension_status = get_repmgr_extension_status(conn, NULL); extension_status = get_repmgr_extension_status(conn);
if (extension_status != REPMGR_INSTALLED) if (extension_status != REPMGR_INSTALLED)
{ {
log_error(_("repmgr is not installed on database \"%s\""), dbname); log_error(_("repmgr is not installed on database \"%s\""), dbname);


@@ -26,6 +26,7 @@
#define SHOW_HEADER_COUNT 7 #define SHOW_HEADER_COUNT 7
typedef enum typedef enum
{ {
SHOW_ID = 0, SHOW_ID = 0,
@@ -50,13 +51,21 @@ typedef enum
} EventHeader; } EventHeader;
struct ColHeader
{
char title[MAXLEN];
int max_length;
int cur_length;
};
struct ColHeader headers_show[SHOW_HEADER_COUNT]; struct ColHeader headers_show[SHOW_HEADER_COUNT];
struct ColHeader headers_event[EVENT_HEADER_COUNT]; struct ColHeader headers_event[EVENT_HEADER_COUNT];
static int build_cluster_matrix(t_node_matrix_rec ***matrix_rec_dest, int *name_length, ItemList *warnings, int *error_code); static int build_cluster_matrix(t_node_matrix_rec ***matrix_rec_dest, int *name_length);
static int build_cluster_crosscheck(t_node_status_cube ***cube_dest, int *name_length, ItemList *warnings, int *error_code); static int build_cluster_crosscheck(t_node_status_cube ***cube_dest, int *name_length);
static void cube_set_node_status(t_node_status_cube **cube, int n, int node_id, int matrix_node_id, int connection_node_id, int connection_status); static void cube_set_node_status(t_node_status_cube **cube, int n, int node_id, int matrix_node_id, int connection_node_id, int connection_status);
/* /*
@@ -74,8 +83,6 @@ do_cluster_show(void)
int i = 0; int i = 0;
ItemList warnings = {NULL, NULL}; ItemList warnings = {NULL, NULL};
bool success = false; bool success = false;
bool error_found = false;
bool connection_error_found = false;
/* Connect to local database to obtain cluster connection data */ /* Connect to local database to obtain cluster connection data */
log_verbose(LOG_INFO, _("connecting to database")); log_verbose(LOG_INFO, _("connecting to database"));
@@ -132,28 +139,16 @@ do_cluster_show(void)
cell->node_info->recovery_type = get_recovery_type(cell->node_info->conn); cell->node_info->recovery_type = get_recovery_type(cell->node_info->conn);
} }
else else
{
cell->node_info->node_status = NODE_STATUS_DOWN;
cell->node_info->recovery_type = RECTYPE_UNKNOWN;
connection_error_found = true;
if (runtime_options.verbose)
{ {
char error[MAXLEN]; char error[MAXLEN];
strncpy(error, PQerrorMessage(cell->node_info->conn), MAXLEN); strncpy(error, PQerrorMessage(cell->node_info->conn), MAXLEN);
cell->node_info->node_status = NODE_STATUS_DOWN;
cell->node_info->recovery_type = RECTYPE_UNKNOWN;
item_list_append_format(&warnings, item_list_append_format(&warnings,
"when attempting to connect to node \"%s\" (ID: %i), following error encountered :\n\"%s\"", "when attempting to connect to node \"%s\" (ID: %i), following error encountered :\n\"%s\"",
cell->node_info->node_name, cell->node_info->node_id, trim(error)); cell->node_info->node_name, cell->node_info->node_id, trim(error));
} }
else
{
item_list_append_format(&warnings,
"unable to connect to node \"%s\" (ID: %i)",
cell->node_info->node_name, cell->node_info->node_id);
}
}
initPQExpBuffer(&details); initPQExpBuffer(&details);
@@ -174,16 +169,16 @@ do_cluster_show(void)
switch (cell->node_info->recovery_type) switch (cell->node_info->recovery_type)
{ {
case RECTYPE_PRIMARY: case RECTYPE_PRIMARY:
appendPQExpBufferStr(&details, "* running"); appendPQExpBuffer(&details, "* running");
break; break;
case RECTYPE_STANDBY: case RECTYPE_STANDBY:
appendPQExpBufferStr(&details, "! running as standby"); appendPQExpBuffer(&details, "! running as standby");
item_list_append_format(&warnings, item_list_append_format(&warnings,
"node \"%s\" (ID: %i) is registered as primary but running as standby", "node \"%s\" (ID: %i) is registered as primary but running as standby",
cell->node_info->node_name, cell->node_info->node_id); cell->node_info->node_name, cell->node_info->node_id);
break; break;
case RECTYPE_UNKNOWN: case RECTYPE_UNKNOWN:
appendPQExpBufferStr(&details, "! unknown"); appendPQExpBuffer(&details, "! unknown");
item_list_append_format(&warnings, item_list_append_format(&warnings,
"node \"%s\" (ID: %i) has unknown replication status", "node \"%s\" (ID: %i) has unknown replication status",
cell->node_info->node_name, cell->node_info->node_id); cell->node_info->node_name, cell->node_info->node_id);
@@ -194,14 +189,14 @@ do_cluster_show(void)
{ {
if (cell->node_info->recovery_type == RECTYPE_PRIMARY) if (cell->node_info->recovery_type == RECTYPE_PRIMARY)
{ {
appendPQExpBufferStr(&details, "! running"); appendPQExpBuffer(&details, "! running");
item_list_append_format(&warnings, item_list_append_format(&warnings,
"node \"%s\" (ID: %i) is running but the repmgr node record is inactive", "node \"%s\" (ID: %i) is running but the repmgr node record is inactive",
cell->node_info->node_name, cell->node_info->node_id); cell->node_info->node_name, cell->node_info->node_id);
} }
else else
{ {
appendPQExpBufferStr(&details, "! running as standby"); appendPQExpBuffer(&details, "! running as standby");
item_list_append_format(&warnings, item_list_append_format(&warnings,
"node \"%s\" (ID: %i) is registered as an inactive primary but running as standby", "node \"%s\" (ID: %i) is registered as an inactive primary but running as standby",
cell->node_info->node_name, cell->node_info->node_id); cell->node_info->node_name, cell->node_info->node_id);
@@ -214,7 +209,7 @@ do_cluster_show(void)
/* node is unreachable but marked active */ /* node is unreachable but marked active */
if (cell->node_info->active == true) if (cell->node_info->active == true)
{ {
appendPQExpBufferStr(&details, "? unreachable"); appendPQExpBuffer(&details, "? unreachable");
item_list_append_format(&warnings, item_list_append_format(&warnings,
"node \"%s\" (ID: %i) is registered as an active primary but is unreachable", "node \"%s\" (ID: %i) is registered as an active primary but is unreachable",
cell->node_info->node_name, cell->node_info->node_id); cell->node_info->node_name, cell->node_info->node_id);
@@ -222,8 +217,7 @@ do_cluster_show(void)
/* node is unreachable and marked as inactive */ /* node is unreachable and marked as inactive */
else else
{ {
appendPQExpBufferStr(&details, "- failed"); appendPQExpBuffer(&details, "- failed");
error_found = true;
} }
} }
} }
@@ -238,16 +232,16 @@ do_cluster_show(void)
switch (cell->node_info->recovery_type) switch (cell->node_info->recovery_type)
{ {
case RECTYPE_STANDBY: case RECTYPE_STANDBY:
appendPQExpBufferStr(&details, " running"); appendPQExpBuffer(&details, " running");
break; break;
case RECTYPE_PRIMARY: case RECTYPE_PRIMARY:
appendPQExpBufferStr(&details, "! running as primary"); appendPQExpBuffer(&details, "! running as primary");
item_list_append_format(&warnings, item_list_append_format(&warnings,
"node \"%s\" (ID: %i) is registered as standby but running as primary", "node \"%s\" (ID: %i) is registered as standby but running as primary",
cell->node_info->node_name, cell->node_info->node_id); cell->node_info->node_name, cell->node_info->node_id);
break; break;
case RECTYPE_UNKNOWN: case RECTYPE_UNKNOWN:
appendPQExpBufferStr(&details, "! unknown"); appendPQExpBuffer(&details, "! unknown");
item_list_append_format( item_list_append_format(
&warnings, &warnings,
"node \"%s\" (ID: %i) has unknown replication status", "node \"%s\" (ID: %i) has unknown replication status",
@@ -259,14 +253,14 @@ do_cluster_show(void)
{ {
if (cell->node_info->recovery_type == RECTYPE_STANDBY) if (cell->node_info->recovery_type == RECTYPE_STANDBY)
{ {
appendPQExpBufferStr(&details, "! running"); appendPQExpBuffer(&details, "! running");
item_list_append_format(&warnings, item_list_append_format(&warnings,
"node \"%s\" (ID: %i) is running but the repmgr node record is inactive", "node \"%s\" (ID: %i) is running but the repmgr node record is inactive",
cell->node_info->node_name, cell->node_info->node_id); cell->node_info->node_name, cell->node_info->node_id);
} }
else else
{ {
appendPQExpBufferStr(&details, "! running as primary"); appendPQExpBuffer(&details, "! running as primary");
item_list_append_format(&warnings, item_list_append_format(&warnings,
"node \"%s\" (ID: %i) is running as primary but the repmgr node record is inactive", "node \"%s\" (ID: %i) is running as primary but the repmgr node record is inactive",
cell->node_info->node_name, cell->node_info->node_id); cell->node_info->node_name, cell->node_info->node_id);
@@ -279,15 +273,14 @@ do_cluster_show(void)
/* node is unreachable but marked active */ /* node is unreachable but marked active */
if (cell->node_info->active == true) if (cell->node_info->active == true)
{ {
appendPQExpBufferStr(&details, "? unreachable"); appendPQExpBuffer(&details, "? unreachable");
item_list_append_format(&warnings, item_list_append_format(&warnings,
"node \"%s\" (ID: %i) is registered as an active standby but is unreachable", "node \"%s\" (ID: %i) is registered as an active standby but is unreachable",
cell->node_info->node_name, cell->node_info->node_id); cell->node_info->node_name, cell->node_info->node_id);
} }
else else
{ {
appendPQExpBufferStr(&details, "- failed"); appendPQExpBuffer(&details, "- failed");
error_found = true;
} }
} }
} }
@@ -299,35 +292,24 @@ do_cluster_show(void)
if (cell->node_info->node_status == NODE_STATUS_UP) if (cell->node_info->node_status == NODE_STATUS_UP)
{ {
if (cell->node_info->active == true) if (cell->node_info->active == true)
{ appendPQExpBuffer(&details, "* running");
appendPQExpBufferStr(&details, "* running");
}
else else
{ appendPQExpBuffer(&details, "! running");
appendPQExpBufferStr(&details, "! running");
error_found = true;
}
} }
/* node is unreachable */ /* node is unreachable */
else else
{ {
if (cell->node_info->active == true) if (cell->node_info->active == true)
{ appendPQExpBuffer(&details, "? unreachable");
appendPQExpBufferStr(&details, "? unreachable");
}
else else
{ appendPQExpBuffer(&details, "- failed");
appendPQExpBufferStr(&details, "- failed");
error_found = true;
}
} }
} }
break; break;
case UNKNOWN: case UNKNOWN:
{ {
/* this should never happen */ /* this should never happen */
appendPQExpBufferStr(&details, "? unknown node type"); appendPQExpBuffer(&details, "? unknown node type");
error_found = true;
} }
break; break;
} }
@@ -355,10 +337,36 @@ do_cluster_show(void)
} }
/* Print column header row (text mode only) */
if (runtime_options.output_mode == OM_TEXT) if (runtime_options.output_mode == OM_TEXT)
{ {
print_status_header(SHOW_HEADER_COUNT, headers_show); for (i = 0; i < SHOW_HEADER_COUNT; i++)
{
if (i == 0)
printf(" ");
else
printf(" | ");
printf("%-*s",
headers_show[i].max_length,
headers_show[i].title);
}
printf("\n");
printf("-");
for (i = 0; i < SHOW_HEADER_COUNT; i++)
{
int j;
for (j = 0; j < headers_show[i].max_length; j++)
printf("-");
if (i < (SHOW_HEADER_COUNT - 1))
printf("-+-");
else
printf("-");
}
printf("\n");
} }
for (cell = nodes.head; cell; cell = cell->next) for (cell = nodes.head; cell; cell = cell->next)
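
One side of the hunk above calls `print_status_header(SHOW_HEADER_COUNT, headers_show)`, while the other spells out the header loop inline. The following stand-alone sketch shows what the inline loop produces. It is not repmgr's implementation: `format_status_header` and `append_str` are hypothetical names, the value of `MAXLEN` is an assumption, and the output is written to a buffer instead of `printf()` so that it can be inspected.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAXLEN 1024   /* assumed value, for illustration only */

struct ColHeader
{
    char title[MAXLEN];
    int  max_length;
    int  cur_length;
};

/* Append "s" to "buf"; returns -1 if it would not fit, 0 otherwise. */
static int
append_str(char *buf, size_t buflen, size_t *used, const char *s)
{
    size_t len = strlen(s);

    if (*used + len >= buflen)
        return -1;
    memcpy(buf + *used, s, len + 1);
    *used += len;
    return 0;
}

/*
 * Hypothetical buffer-based variant of the inline header loop.
 * Returns the number of characters written, or -1 if "buf" is too small.
 */
static int
format_status_header(const struct ColHeader *headers, int count,
                     char *buf, size_t buflen)
{
    size_t used = 0;
    int i, j;

    assert(headers != NULL && buf != NULL && buflen > 0);
    buf[0] = '\0';

    /* title row: each title is left-aligned and padded to max_length */
    for (i = 0; i < count; i++)
    {
        int n = snprintf(buf + used, buflen - used, "%s%-*s",
                         (i == 0) ? " " : " | ",
                         headers[i].max_length, headers[i].title);

        if (n < 0 || (size_t) n >= buflen - used)
            return -1;
        used += (size_t) n;
    }

    /* end the title row and start the separator row */
    if (append_str(buf, buflen, &used, "\n-") != 0)
        return -1;

    /* separator row: dashes under each column, "-+-" between columns */
    for (i = 0; i < count; i++)
    {
        for (j = 0; j < headers[i].max_length; j++)
        {
            if (append_str(buf, buflen, &used, "-") != 0)
                return -1;
        }
        if (append_str(buf, buflen, &used, (i < count - 1) ? "-+-" : "-") != 0)
            return -1;
    }

    if (append_str(buf, buflen, &used, "\n") != 0)
        return -1;

    return (int) used;
}
```

For two columns titled `ID` (width 2) and `Name` (width 5), the buffer holds a title row followed by a dashed separator row whose `+` lines up with the `|` above it.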
@@ -406,6 +414,7 @@ do_cluster_show(void)
PQfinish(conn); PQfinish(conn);
/* emit any warnings */ /* emit any warnings */
if (warnings.head != NULL && runtime_options.terse == false && runtime_options.output_mode != OM_CSV) if (warnings.head != NULL && runtime_options.terse == false && runtime_options.output_mode != OM_CSV)
{ {
ItemListCell *cell = NULL; ItemListCell *cell = NULL;
@@ -415,25 +424,6 @@ do_cluster_show(void)
{ {
printf(_(" - %s\n"), cell->string); printf(_(" - %s\n"), cell->string);
} }
if (runtime_options.verbose == false && connection_error_found == true)
{
log_hint(_("execute with --verbose option to see connection error messages"));
}
}
/*
* If warnings were noted, even if they're not displayed (e.g. in --csv mode),
* that means something's not right so we need to emit a non-zero exit code.
*/
if (warnings.head != NULL)
{
error_found = true;
}
if (error_found == true)
{
exit(ERR_NODE_STATUS);
} }
} }
@@ -446,7 +436,6 @@ do_cluster_show(void)
* --all * --all
* --node-[id|name] * --node-[id|name]
* --event * --event
* --csv
*/ */
void void
@@ -491,12 +480,8 @@ do_cluster_event(void)
strncpy(headers_event[EV_TIMESTAMP].title, _("Timestamp"), MAXLEN); strncpy(headers_event[EV_TIMESTAMP].title, _("Timestamp"), MAXLEN);
strncpy(headers_event[EV_DETAILS].title, _("Details"), MAXLEN); strncpy(headers_event[EV_DETAILS].title, _("Details"), MAXLEN);
/* /* if --terse provided, simply omit the "Details" column */
* If --terse or --csv provided, simply omit the "Details" column. if (runtime_options.terse == true)
* In --csv mode we'd need to quote/escape the contents "Details" column,
* which is doable but which will remain a TODO for now.
*/
if (runtime_options.terse == true || runtime_options.output_mode == OM_CSV)
column_count --; column_count --;
for (i = 0; i < column_count; i++) for (i = 0; i < column_count; i++)
@@ -519,8 +504,6 @@ do_cluster_event(void)
} }
if (runtime_options.output_mode == OM_TEXT)
{
for (i = 0; i < column_count; i++) for (i = 0; i < column_count; i++)
{ {
if (i == 0) if (i == 0)
@@ -548,25 +531,11 @@ do_cluster_event(void)
} }
printf("\n"); printf("\n");
}
for (i = 0; i < PQntuples(res); i++) for (i = 0; i < PQntuples(res); i++)
{ {
int j; int j;
if (runtime_options.output_mode == OM_CSV)
{
for (j = 0; j < column_count; j++)
{
printf("%s", PQgetvalue(res, i, j));
if ((j + 1) < column_count)
{
printf(",");
}
}
}
else
{
printf(" "); printf(" ");
for (j = 0; j < column_count; j++) for (j = 0; j < column_count; j++)
{ {
@@ -577,7 +546,6 @@ do_cluster_event(void)
if (j < (column_count - 1)) if (j < (column_count - 1))
printf(" | "); printf(" | ");
} }
}
printf("\n"); printf("\n");
} }
@@ -586,7 +554,6 @@ do_cluster_event(void)
PQfinish(conn); PQfinish(conn);
if (runtime_options.output_mode == OM_TEXT)
puts(""); puts("");
} }
@@ -602,12 +569,9 @@ do_cluster_crosscheck(void)
t_node_status_cube **cube; t_node_status_cube **cube;
bool connection_error_found = false; bool error_found = false;
int error_code = SUCCESS;
ItemList warnings = {NULL, NULL};
n = build_cluster_crosscheck(&cube, &name_length, &warnings, &error_code);
n = build_cluster_crosscheck(&cube, &name_length);
if (runtime_options.output_mode == OM_CSV) if (runtime_options.output_mode == OM_CSV)
{ {
for (i = 0; i < n; i++) for (i = 0; i < n; i++)
@@ -629,11 +593,6 @@ do_cluster_crosscheck(void)
cube[i]->node_id, cube[i]->node_id,
cube[j]->node_id, cube[j]->node_id,
max_node_status); max_node_status);
if (max_node_status == -1)
{
connection_error_found = true;
}
} }
} }
@@ -691,16 +650,16 @@ do_cluster_crosscheck(void)
{ {
case -2: case -2:
c = '?'; c = '?';
error_found = true;
break; break;
case -1: case -1:
c = 'x'; c = 'x';
connection_error_found = true; error_found = true;
break; break;
case 0: case 0:
c = '*'; c = '*';
break; break;
default: default:
log_error("unexpected node status value %i", max_node_status);
exit(ERR_INTERNAL); exit(ERR_INTERNAL);
} }
@@ -709,13 +668,6 @@ do_cluster_crosscheck(void)
printf("\n"); printf("\n");
} }
if (warnings.head != NULL && runtime_options.terse == false)
{
log_warning(_("following problems detected:"));
print_item_list(&warnings);
}
} }
/* clean up allocated cube array */ /* clean up allocated cube array */
@@ -742,23 +694,13 @@ do_cluster_crosscheck(void)
free(cube); free(cube);
} }
/* errors detected by build_cluster_crosscheck() have priority */ if (error_found == true)
if (connection_error_found == true)
{ {
error_code = ERR_NODE_STATUS; exit(ERR_CLUSTER_CHECK);
} }
exit(error_code);
} }
/*
* CLUSTER MATRIX
*
* Parameters:
* --csv
*/
void void
do_cluster_matrix() do_cluster_matrix()
{ {
@@ -771,30 +713,18 @@ do_cluster_matrix()
t_node_matrix_rec **matrix_rec_list; t_node_matrix_rec **matrix_rec_list;
bool connection_error_found = false; bool error_found = false;
int error_code = SUCCESS;
ItemList warnings = {NULL, NULL};
n = build_cluster_matrix(&matrix_rec_list, &name_length, &warnings, &error_code); n = build_cluster_matrix(&matrix_rec_list, &name_length);
if (runtime_options.output_mode == OM_CSV) if (runtime_options.output_mode == OM_CSV)
{ {
for (i = 0; i < n; i++) for (i = 0; i < n; i++)
{
for (j = 0; j < n; j++) for (j = 0; j < n; j++)
{
printf("%d,%d,%d\n", printf("%d,%d,%d\n",
matrix_rec_list[i]->node_id, matrix_rec_list[i]->node_id,
matrix_rec_list[i]->node_status_list[j]->node_id, matrix_rec_list[i]->node_status_list[j]->node_id,
matrix_rec_list[i]->node_status_list[j]->node_status); matrix_rec_list[i]->node_status_list[j]->node_status);
if (matrix_rec_list[i]->node_status_list[j]->node_status == -2
|| matrix_rec_list[i]->node_status_list[j]->node_status == -1)
{
connection_error_found = true;
}
}
}
} }
else else
{ {
@@ -823,16 +753,16 @@ do_cluster_matrix()
{ {
case -2: case -2:
c = '?'; c = '?';
error_found = true;
break; break;
case -1: case -1:
c = 'x'; c = 'x';
connection_error_found = true; error_found = true;
break; break;
case 0: case 0:
c = '*'; c = '*';
break; break;
default: default:
log_error("unexpected node status value %i", matrix_rec_list[i]->node_status_list[j]->node_status);
exit(ERR_INTERNAL); exit(ERR_INTERNAL);
} }
@@ -840,13 +770,6 @@ do_cluster_matrix()
} }
printf("\n"); printf("\n");
} }
if (warnings.head != NULL && runtime_options.terse == false)
{
log_warning(_("following problems detected:"));
print_item_list(&warnings);
}
} }
for (i = 0; i < n; i++) for (i = 0; i < n; i++)
@@ -861,13 +784,10 @@ do_cluster_matrix()
free(matrix_rec_list); free(matrix_rec_list);
/* actual database connection errors have priority */ if (error_found == true)
if (connection_error_found == true)
{ {
error_code = ERR_NODE_STATUS; exit(ERR_CLUSTER_CHECK);
} }
exit(error_code);
} }
@@ -896,7 +816,7 @@ matrix_set_node_status(t_node_matrix_rec **matrix_rec_list, int n, int node_id,
static int static int
build_cluster_matrix(t_node_matrix_rec ***matrix_rec_dest, int *name_length, ItemList *warnings, int *error_code) build_cluster_matrix(t_node_matrix_rec ***matrix_rec_dest, int *name_length)
{ {
PGconn *conn = NULL; PGconn *conn = NULL;
int i = 0, int i = 0,
@@ -925,12 +845,7 @@ build_cluster_matrix(t_node_matrix_rec ***matrix_rec_dest, int *name_length, Ite
local_node_id = runtime_options.node_id; local_node_id = runtime_options.node_id;
} }
if (get_all_node_records(conn, &nodes) == false) get_all_node_records(conn, &nodes);
{
/* get_all_node_records() will display the error */
PQfinish(conn);
exit(ERR_BAD_CONFIG);
}
PQfinish(conn); PQfinish(conn);
conn = NULL; conn = NULL;
@@ -944,7 +859,7 @@ build_cluster_matrix(t_node_matrix_rec ***matrix_rec_dest, int *name_length, Ite
/* /*
* Allocate an empty matrix record list * Allocate an empty matrix record list
* *
* -2 == NULL ? -1 == Error x 0 == OK * -2 == NULL ? -1 == Error x 0 == OK *
*/ */
matrix_rec_list = (t_node_matrix_rec **) pg_malloc0(sizeof(t_node_matrix_rec) * nodes.node_count); matrix_rec_list = (t_node_matrix_rec **) pg_malloc0(sizeof(t_node_matrix_rec) * nodes.node_count);
@@ -1007,7 +922,7 @@ build_cluster_matrix(t_node_matrix_rec ***matrix_rec_dest, int *name_length, Ite
host = param_get(&remote_conninfo, "host"); host = param_get(&remote_conninfo, "host");
node_conn = establish_db_connection_quiet(cell->node_info->conninfo); node_conn = establish_db_connection(cell->node_info->conninfo, false);
connection_status = connection_status =
(PQstatus(node_conn) == CONNECTION_OK) ? 0 : -1; (PQstatus(node_conn) == CONNECTION_OK) ? 0 : -1;
@@ -1044,12 +959,24 @@ build_cluster_matrix(t_node_matrix_rec ***matrix_rec_dest, int *name_length, Ite
* remote repmgr - those are the only values it needs to work, and * remote repmgr - those are the only values it needs to work, and
* saves us making assumptions about the location of repmgr.conf * saves us making assumptions about the location of repmgr.conf
*/ */
appendPQExpBufferChar(&command, '"'); appendPQExpBuffer(&command,
"\"%s -d '%s' ",
make_pg_path(progname()),
cell->node_info->conninfo);
make_remote_repmgr_path(&command, cell->node_info);
appendPQExpBufferStr(&command, if (strlen(pg_bindir))
" cluster show --csv -L NOTICE --terse\""); {
appendPQExpBuffer(&command,
"--pg_bindir=");
appendShellString(&command,
pg_bindir);
appendPQExpBuffer(&command,
" ");
}
appendPQExpBuffer(&command,
" cluster show --csv\"");
log_verbose(LOG_DEBUG, "build_cluster_matrix(): executing:\n %s", command.data); log_verbose(LOG_DEBUG, "build_cluster_matrix(): executing:\n %s", command.data);
@@ -1064,50 +991,32 @@ build_cluster_matrix(t_node_matrix_rec ***matrix_rec_dest, int *name_length, Ite
termPQExpBuffer(&command); termPQExpBuffer(&command);
/* no output returned - probably SSH error */
if (p[0] == '\0' || p[0] == '\n')
{
item_list_append_format(warnings,
"node %i inaccessible via SSH",
connection_node_id);
*error_code = ERR_BAD_SSH;
}
else
{
for (j = 0; j < nodes.node_count; j++) for (j = 0; j < nodes.node_count; j++)
{ {
if (sscanf(p, "%d,%d", &x, &y) != 2) if (sscanf(p, "%d,%d", &x, &y) != 2)
{ {
matrix_set_node_status(matrix_rec_list, fprintf(stderr, _("cannot parse --csv output: %s\n"), p);
nodes.node_count, PQfinish(node_conn);
connection_node_id, exit(ERR_INTERNAL);
x,
-2);
item_list_append_format(warnings,
"unable to parse --csv output for node %i; output returned was:\n\"%s\"",
connection_node_id, p);
*error_code = ERR_INTERNAL;
} }
else
{
matrix_set_node_status(matrix_rec_list, matrix_set_node_status(matrix_rec_list,
nodes.node_count, nodes.node_count,
connection_node_id, connection_node_id,
x, x,
(y == -1) ? -1 : 0); (y == -1) ? -1 : 0);
}
while (*p && (*p != '\n')) while (*p && (*p != '\n'))
p++; p++;
if (*p == '\n') if (*p == '\n')
p++; p++;
} }
}
termPQExpBuffer(&command_output); termPQExpBuffer(&command_output);
PQfinish(node_conn); PQfinish(node_conn);
free_conninfo_params(&remote_conninfo); free_conninfo_params(&remote_conninfo);
node_conn = NULL;
} }
*matrix_rec_dest = matrix_rec_list; *matrix_rec_dest = matrix_rec_list;
@@ -1120,7 +1029,7 @@ build_cluster_matrix(t_node_matrix_rec ***matrix_rec_dest, int *name_length, Ite
static int static int
build_cluster_crosscheck(t_node_status_cube ***dest_cube, int *name_length, ItemList *warnings, int *error_code) build_cluster_crosscheck(t_node_status_cube ***dest_cube, int *name_length)
{ {
PGconn *conn = NULL; PGconn *conn = NULL;
int h, int h,
@@ -1141,12 +1050,7 @@ build_cluster_crosscheck(t_node_status_cube ***dest_cube, int *name_length, Item
else else
conn = establish_db_connection_by_params(&source_conninfo, true); conn = establish_db_connection_by_params(&source_conninfo, true);
if (get_all_node_records(conn, &nodes) == false) get_all_node_records(conn, &nodes);
{
/* get_all_node_records() will display the error */
PQfinish(conn);
exit(ERR_BAD_CONFIG);
}
PQfinish(conn); PQfinish(conn);
conn = NULL; conn = NULL;
@@ -1233,13 +1137,28 @@ build_cluster_crosscheck(t_node_status_cube ***dest_cube, int *name_length, Item
initPQExpBuffer(&command); initPQExpBuffer(&command);
make_remote_repmgr_path(&command, cell->node_info); appendPQExpBuffer(&command,
"%s -d '%s' --node-id=%i ",
make_pg_path(progname()),
cell->node_info->conninfo,
remote_node_id);
appendPQExpBufferStr(&command, if (strlen(pg_bindir))
" cluster matrix --csv -L NOTICE --terse"); {
appendPQExpBuffer(&command,
"--pg_bindir=");
appendShellString(&command,
pg_bindir);
appendPQExpBuffer(&command,
" ");
}
appendPQExpBuffer(&command,
"cluster matrix --csv 2>/dev/null");
initPQExpBuffer(&command_output); initPQExpBuffer(&command_output);
/* fix to work with --node-id */
if (cube[i]->node_id == config_file_options.node_id) if (cube[i]->node_id == config_file_options.node_id)
{ {
(void) local_command_simple(command.data, (void) local_command_simple(command.data,
@@ -1280,13 +1199,9 @@ build_cluster_crosscheck(t_node_status_cube ***dest_cube, int *name_length, Item
p = command_output.data; p = command_output.data;
if (p[0] == '\0' || p[0] == '\n') if (!strlen(command_output.data))
{ {
item_list_append_format(warnings,
"node %i inaccessible via SSH",
remote_node_id);
termPQExpBuffer(&command_output); termPQExpBuffer(&command_output);
*error_code = ERR_BAD_SSH;
continue; continue;
} }
@@ -1298,23 +1213,16 @@ build_cluster_crosscheck(t_node_status_cube ***dest_cube, int *name_length, Item
if (sscanf(p, "%d,%d,%d", &matrix_rec_node_id, &node_status_node_id, &node_status) != 3) if (sscanf(p, "%d,%d,%d", &matrix_rec_node_id, &node_status_node_id, &node_status) != 3)
{ {
cube_set_node_status(cube, fprintf(stderr, _("cannot parse --csv output: %s\n"), p);
nodes.node_count, exit(ERR_INTERNAL);
remote_node_id,
matrix_rec_node_id,
node_status_node_id,
-2);
*error_code = ERR_INTERNAL;
} }
else
{
cube_set_node_status(cube, cube_set_node_status(cube,
nodes.node_count, nodes.node_count,
remote_node_id, remote_node_id,
matrix_rec_node_id, matrix_rec_node_id,
node_status_node_id, node_status_node_id,
node_status); node_status);
}
while (*p && (*p != '\n')) while (*p && (*p != '\n'))
p++; p++;
@@ -1374,7 +1282,6 @@ do_cluster_cleanup(void)
PGconn *conn = NULL; PGconn *conn = NULL;
PGconn *primary_conn = NULL; PGconn *primary_conn = NULL;
int entries_to_delete = 0; int entries_to_delete = 0;
PQExpBufferData event_details;
conn = establish_db_connection(config_file_options.conninfo, true); conn = establish_db_connection(config_file_options.conninfo, true);
@@ -1386,17 +1293,9 @@ do_cluster_cleanup(void)
log_debug(_("number of days of monitoring history to retain: %i"), runtime_options.keep_history); log_debug(_("number of days of monitoring history to retain: %i"), runtime_options.keep_history);
entries_to_delete = get_number_of_monitoring_records_to_delete(primary_conn, entries_to_delete = get_number_of_monitoring_records_to_delete(primary_conn, runtime_options.keep_history);
runtime_options.keep_history,
runtime_options.node_id);
if (entries_to_delete < 0) if (entries_to_delete == 0)
{
log_error(_("unable to query number of monitoring records to clean up"));
PQfinish(primary_conn);
exit(ERR_DB_QUERY);
}
else if (entries_to_delete == 0)
{ {
log_info(_("no monitoring records to delete")); log_info(_("no monitoring records to delete"));
PQfinish(primary_conn); PQfinish(primary_conn);
@@ -1406,23 +1305,10 @@ do_cluster_cleanup(void)
log_debug("at least %i monitoring records for deletion", log_debug("at least %i monitoring records for deletion",
entries_to_delete); entries_to_delete);
initPQExpBuffer(&event_details); if (delete_monitoring_records(primary_conn, runtime_options.keep_history) == false)
if (delete_monitoring_records(primary_conn, runtime_options.keep_history, runtime_options.node_id) == false)
{ {
appendPQExpBufferStr(&event_details, log_error(_("unable to delete monitoring records"));
_("unable to delete monitoring records"));
log_error("%s", event_details.data);
log_detail("%s", PQerrorMessage(primary_conn)); log_detail("%s", PQerrorMessage(primary_conn));
create_event_notification(primary_conn,
&config_file_options,
config_file_options.node_id,
"cluster_cleanup",
false,
event_details.data);
PQfinish(primary_conn); PQfinish(primary_conn);
exit(ERR_DB_QUERY); exit(ERR_DB_QUERY);
} }
@@ -1434,40 +1320,19 @@ do_cluster_cleanup(void)
log_detail("%s", PQerrorMessage(primary_conn)); log_detail("%s", PQerrorMessage(primary_conn));
} }
if (runtime_options.keep_history == 0)
PQfinish(primary_conn);
if (runtime_options.keep_history > 0)
{ {
appendPQExpBufferStr(&event_details, log_notice(_("monitoring records older than %i day(s) deleted"),
_("all monitoring records deleted")); runtime_options.keep_history);
} }
else else
{ {
appendPQExpBufferStr(&event_details, log_info(_("all monitoring records deleted"));
_("monitoring records deleted"));
} }
if (runtime_options.node_id != UNKNOWN_NODE_ID)
appendPQExpBuffer(&event_details,
_(" for node %i"),
runtime_options.node_id);
if (runtime_options.keep_history > 0)
appendPQExpBuffer(&event_details,
_("; records newer than %i day(s) retained"),
runtime_options.keep_history);
create_event_notification(primary_conn,
&config_file_options,
config_file_options.node_id,
"cluster_cleanup",
true,
event_details.data);
log_notice("%s", event_details.data);
termPQExpBuffer(&event_details);
PQfinish(primary_conn);
return; return;
} }
@@ -1482,7 +1347,6 @@ do_cluster_help(void)
printf(_(" %s [OPTIONS] cluster matrix\n"), progname()); printf(_(" %s [OPTIONS] cluster matrix\n"), progname());
printf(_(" %s [OPTIONS] cluster crosscheck\n"), progname()); printf(_(" %s [OPTIONS] cluster crosscheck\n"), progname());
printf(_(" %s [OPTIONS] cluster event\n"), progname()); printf(_(" %s [OPTIONS] cluster event\n"), progname());
printf(_(" %s [OPTIONS] cluster cleanup\n"), progname());
puts(""); puts("");
printf(_("CLUSTER SHOW\n")); printf(_("CLUSTER SHOW\n"));
@@ -1522,12 +1386,11 @@ do_cluster_help(void)
printf(_(" --event filter specific event\n")); printf(_(" --event filter specific event\n"));
printf(_(" --node-id restrict entries to node with this ID\n")); printf(_(" --node-id restrict entries to node with this ID\n"));
printf(_(" --node-name restrict entries to node with this name\n")); printf(_(" --node-name restrict entries to node with this name\n"));
printf(_(" --csv emit output as CSV\n"));
puts(""); puts("");
printf(_("CLUSTER CLEANUP\n")); printf(_("CLUSTER CLEANUP\n"));
puts(""); puts("");
printf(_(" \"cluster cleanup\" purges records from the \"repmgr.monitoring_history\" table.\n")); printf(_(" \"cluster cleanup\" purges records from the \"repmgr.monitor\" table.\n"));
puts(""); puts("");
printf(_(" -k, --keep-history=VALUE retain indicated number of days of history (default: 0)\n")); printf(_(" -k, --keep-history=VALUE retain indicated number of days of history (default: 0)\n"));
puts(""); puts("");
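
The parsing loop in build_cluster_matrix() above reads one "node_id,status" pair per line from the remote "cluster show --csv" output, then skips ahead to the next newline. Below is a minimal standalone sketch of that pattern. The helper name parse_status_lines and its fixed-size output arrays are hypothetical and not part of repmgr; only the sscanf() format, the newline-skipping loop and the "-1 stays -1, anything else is OK" mapping are taken from the hunk.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of repmgr): parse up to max_rows lines of
 * "node_id,status" CSV text, following the loop in build_cluster_matrix().
 * Returns the number of rows parsed, or -1 if a line cannot be parsed.
 */
int
parse_status_lines(const char *p, int *node_ids, int *statuses, int max_rows)
{
    int rows = 0;

    while (*p != '\0' && rows < max_rows)
    {
        int x;
        int y;

        if (sscanf(p, "%d,%d", &x, &y) != 2)
            return -1;

        node_ids[rows] = x;
        /* same normalisation as the diff: -1 stays -1, anything else is OK (0) */
        statuses[rows] = (y == -1) ? -1 : 0;
        rows++;

        /* advance to the start of the next line */
        while (*p && (*p != '\n'))
            p++;
        if (*p == '\n')
            p++;
    }

    return rows;
}
```

A caller would pass the captured command output as the first argument. Returning -1 lets the caller choose between recording a warning and exiting, which is the difference between the two sides of the hunk.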


@@ -1,420 +0,0 @@
/*
* repmgr-action-daemon.c
*
* Implements repmgrd actions for the repmgr command line utility
* Copyright (c) 2ndQuadrant, 2010-2018
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#include "repmgr.h"
#include "repmgr-client-global.h"
#include "repmgr-action-daemon.h"
/*
* Possibly also show:
* - repmgrd start time?
* - repmgrd mode
* - priority
* - whether promotion candidate (due to zero priority/different location)
*/
typedef enum
{
STATUS_ID = 0,
STATUS_NAME,
STATUS_ROLE,
STATUS_PG,
STATUS_RUNNING,
STATUS_PID,
STATUS_PAUSED
} StatusHeader;
#define STATUS_HEADER_COUNT 7
struct ColHeader headers_status[STATUS_HEADER_COUNT];
static void fetch_node_records(PGconn *conn, NodeInfoList *node_list);
static void _do_repmgr_pause(bool pause);
void
do_daemon_status(void)
{
PGconn *conn = NULL;
NodeInfoList nodes = T_NODE_INFO_LIST_INITIALIZER;
NodeInfoListCell *cell = NULL;
int i;
RepmgrdInfo **repmgrd_info;
ItemList warnings = {NULL, NULL};
/* Connect to local database to obtain cluster connection data */
log_verbose(LOG_INFO, _("connecting to database"));
if (strlen(config_file_options.conninfo))
conn = establish_db_connection(config_file_options.conninfo, true);
else
conn = establish_db_connection_by_params(&source_conninfo, true);
fetch_node_records(conn, &nodes);
repmgrd_info = (RepmgrdInfo **) pg_malloc0(sizeof(RepmgrdInfo *) * nodes.node_count);
if (repmgrd_info == NULL)
{
log_error(_("unable to allocate memory"));
exit(ERR_OUT_OF_MEMORY);
}
strncpy(headers_status[STATUS_ID].title, _("ID"), MAXLEN);
strncpy(headers_status[STATUS_NAME].title, _("Name"), MAXLEN);
strncpy(headers_status[STATUS_ROLE].title, _("Role"), MAXLEN);
strncpy(headers_status[STATUS_PG].title, _("Status"), MAXLEN);
strncpy(headers_status[STATUS_RUNNING].title, _("repmgrd"), MAXLEN);
strncpy(headers_status[STATUS_PID].title, _("PID"), MAXLEN);
strncpy(headers_status[STATUS_PAUSED].title, _("Paused?"), MAXLEN);
for (i = 0; i < STATUS_HEADER_COUNT; i++)
{
headers_status[i].max_length = strlen(headers_status[i].title);
}
i = 0;
for (cell = nodes.head; cell; cell = cell->next)
{
int j;
repmgrd_info[i] = pg_malloc0(sizeof(RepmgrdInfo));
repmgrd_info[i]->node_id = cell->node_info->node_id;
repmgrd_info[i]->pid = UNKNOWN_PID;
repmgrd_info[i]->paused = false;
repmgrd_info[i]->running = false;
repmgrd_info[i]->pg_running = true;
cell->node_info->conn = establish_db_connection_quiet(cell->node_info->conninfo);
if (PQstatus(cell->node_info->conn) != CONNECTION_OK)
{
if (runtime_options.verbose)
{
char error[MAXLEN];
strncpy(error, PQerrorMessage(cell->node_info->conn), MAXLEN);
item_list_append_format(&warnings,
"when attempting to connect to node \"%s\" (ID: %i), following error encountered :\n\"%s\"",
cell->node_info->node_name, cell->node_info->node_id, trim(error));
}
else
{
item_list_append_format(&warnings,
"unable to connect to node \"%s\" (ID: %i)",
cell->node_info->node_name, cell->node_info->node_id);
}
repmgrd_info[i]->pg_running = false;
maxlen_snprintf(repmgrd_info[i]->pg_running_text, "%s", _("not running"));
maxlen_snprintf(repmgrd_info[i]->repmgrd_running, "%s", _("n/a"));
maxlen_snprintf(repmgrd_info[i]->pid_text, "%s", _("n/a"));
}
else
{
maxlen_snprintf(repmgrd_info[i]->pg_running_text, "%s", _("running"));
repmgrd_info[i]->pid = repmgrd_get_pid(cell->node_info->conn);
repmgrd_info[i]->running = repmgrd_is_running(cell->node_info->conn);
if (repmgrd_info[i]->running == true)
{
maxlen_snprintf(repmgrd_info[i]->repmgrd_running, "%s", _("running"));
}
else
{
maxlen_snprintf(repmgrd_info[i]->repmgrd_running, "%s", _("not running"));
}
if (repmgrd_info[i]->pid == UNKNOWN_PID)
{
maxlen_snprintf(repmgrd_info[i]->pid_text, "%s", _("n/a"));
}
else
{
maxlen_snprintf(repmgrd_info[i]->pid_text, "%i", repmgrd_info[i]->pid);
}
repmgrd_info[i]->paused = repmgrd_is_paused(cell->node_info->conn);
PQfinish(cell->node_info->conn);
}
headers_status[STATUS_NAME].cur_length = strlen(cell->node_info->node_name);
headers_status[STATUS_ROLE].cur_length = strlen(get_node_type_string(cell->node_info->type));
headers_status[STATUS_PID].cur_length = strlen(repmgrd_info[i]->pid_text);
headers_status[STATUS_RUNNING].cur_length = strlen(repmgrd_info[i]->repmgrd_running);
headers_status[STATUS_PG].cur_length = strlen(repmgrd_info[i]->pg_running_text);
for (j = 0; j < STATUS_HEADER_COUNT; j++)
{
if (headers_status[j].cur_length > headers_status[j].max_length)
{
headers_status[j].max_length = headers_status[j].cur_length;
}
}
i++;
}
/* Print column header row (text mode only) */
if (runtime_options.output_mode == OM_TEXT)
{
print_status_header(STATUS_HEADER_COUNT, headers_status);
}
i = 0;
for (cell = nodes.head; cell; cell = cell->next)
{
if (runtime_options.output_mode == OM_CSV)
{
printf("%i,%s,%s,%i,%i,%i,%i\n",
cell->node_info->node_id,
cell->node_info->node_name,
get_node_type_string(cell->node_info->type),
repmgrd_info[i]->pg_running ? 1 : 0,
repmgrd_info[i]->running ? 1 : 0,
repmgrd_info[i]->pid,
repmgrd_info[i]->paused ? 1 : 0);
}
else
{
printf(" %-*i ", headers_status[STATUS_ID].max_length, cell->node_info->node_id);
printf("| %-*s ", headers_status[STATUS_NAME].max_length, cell->node_info->node_name);
printf("| %-*s ", headers_status[STATUS_ROLE].max_length, get_node_type_string(cell->node_info->type));
printf("| %-*s ", headers_status[STATUS_PG].max_length, repmgrd_info[i]->pg_running_text);
printf("| %-*s ", headers_status[STATUS_RUNNING].max_length, repmgrd_info[i]->repmgrd_running);
printf("| %-*s ", headers_status[STATUS_PID].max_length, repmgrd_info[i]->pid_text);
if (repmgrd_info[i]->pid == UNKNOWN_PID)
printf("| %-*s ", headers_status[STATUS_PAUSED].max_length, "n/a");
else
printf("| %-*s ", headers_status[STATUS_PAUSED].max_length, repmgrd_info[i]->paused ? "yes" : "no");
printf("\n");
}
free(repmgrd_info[i]);
i++;
}
free(repmgrd_info);
/* emit any warnings */
if (warnings.head != NULL && runtime_options.terse == false && runtime_options.output_mode != OM_CSV)
{
ItemListCell *cell = NULL;
printf(_("\nWARNING: following issues were detected\n"));
for (cell = warnings.head; cell; cell = cell->next)
{
printf(_(" - %s\n"), cell->string);
}
if (runtime_options.verbose == false)
{
log_hint(_("execute with --verbose option to see connection error messages"));
}
}
}
void
do_daemon_pause(void)
{
_do_repmgr_pause(true);
}
void
do_daemon_unpause(void)
{
_do_repmgr_pause(false);
}
static void
_do_repmgr_pause(bool pause)
{
PGconn *conn = NULL;
NodeInfoList nodes = T_NODE_INFO_LIST_INITIALIZER;
NodeInfoListCell *cell = NULL;
RepmgrdInfo **repmgrd_info;
int i;
int error_nodes = 0;
/* Connect to local database to obtain cluster connection data */
log_verbose(LOG_INFO, _("connecting to database"));
if (strlen(config_file_options.conninfo))
conn = establish_db_connection(config_file_options.conninfo, true);
else
conn = establish_db_connection_by_params(&source_conninfo, true);
fetch_node_records(conn, &nodes);
/* allocate only after the node list is populated; before that node_count is 0 */
repmgrd_info = (RepmgrdInfo **) pg_malloc0(sizeof(RepmgrdInfo *) * nodes.node_count);
if (repmgrd_info == NULL)
{
log_error(_("unable to allocate memory"));
exit(ERR_OUT_OF_MEMORY);
}
i = 0;
for (cell = nodes.head; cell; cell = cell->next)
{
repmgrd_info[i] = pg_malloc0(sizeof(RepmgrdInfo));
repmgrd_info[i]->node_id = cell->node_info->node_id;
log_verbose(LOG_DEBUG, "pausing node %i (%s)",
cell->node_info->node_id,
cell->node_info->node_name);
cell->node_info->conn = establish_db_connection_quiet(cell->node_info->conninfo);
if (PQstatus(cell->node_info->conn) != CONNECTION_OK)
{
log_warning(_("unable to connect to node %i"),
cell->node_info->node_id);
error_nodes++;
}
else
{
if (runtime_options.dry_run == true)
{
if (pause == true)
{
log_info(_("would pause node %i (%s) "),
cell->node_info->node_id,
cell->node_info->node_name);
}
else
{
log_info(_("would unpause node %i (%s) "),
cell->node_info->node_id,
cell->node_info->node_name);
}
}
else
{
bool success = repmgrd_pause(cell->node_info->conn, pause);
if (success == false)
error_nodes++;
log_notice(_("node %i (%s) %s"),
cell->node_info->node_id,
cell->node_info->node_name,
success == true
? pause == true ? "paused" : "unpaused"
: pause == true ? "not paused" : "not unpaused");
}
PQfinish(cell->node_info->conn);
}
i++;
}
if (error_nodes > 0)
{
if (pause == true)
{
log_error(_("unable to pause %i node(s)"), error_nodes);
}
else
{
log_error(_("unable to unpause %i node(s)"), error_nodes);
}
log_hint(_("execute \"repmgr daemon status\" to view current status"));
exit(ERR_REPMGRD_PAUSE);
}
exit(SUCCESS);
}
void
fetch_node_records(PGconn *conn, NodeInfoList *node_list)
{
bool success = get_all_node_records(conn, node_list);
if (success == false)
{
/* get_all_node_records() will display any error message */
PQfinish(conn);
exit(ERR_BAD_CONFIG);
}
if (node_list->node_count == 0)
{
log_error(_("no node records were found"));
log_hint(_("ensure at least one node is registered"));
PQfinish(conn);
exit(ERR_BAD_CONFIG);
}
}
void do_daemon_help(void)
{
print_help_header();
printf(_("Usage:\n"));
printf(_(" %s [OPTIONS] daemon status\n"), progname());
printf(_(" %s [OPTIONS] daemon pause\n"), progname());
printf(_(" %s [OPTIONS] daemon unpause\n"), progname());
puts("");
printf(_("DAEMON STATUS\n"));
puts("");
printf(_(" \"daemon status\" shows the status of repmgrd on each node in the cluster\n"));
puts("");
printf(_(" --csv emit output as CSV\n"));
printf(_(" --verbose show text of database connection error messages\n"));
puts("");
printf(_("DAEMON PAUSE\n"));
puts("");
printf(_(" \"daemon pause\" instructs repmgrd on each node to pause failover detection\n"));
puts("");
printf(_(" --dry-run check if nodes are reachable but don't pause repmgrd\n"));
puts("");
printf(_("DAEMON UNPAUSE\n"));
puts("");
printf(_(" \"daemon unpause\" instructs repmgrd on each node to resume failover detection\n"));
puts("");
printf(_(" --dry-run check if nodes are reachable but don't unpause repmgrd\n"));
puts("");
puts("");
}
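
do_daemon_status() first widens each column to its longest value and then prints the header through print_status_header(), whose body is not part of this diff. The sketch below reproduces the layout of the inline header loops shown in the do_cluster_show() hunk. That print_status_header() produces the same layout is an assumption. The struct is a simplified stand-in for repmgr's ColHeader, and format_status_header and buf_appendf are hypothetical names; the sketch writes into a buffer rather than stdout so the result can be inspected.

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

#define TITLE_LEN 64

/* Simplified stand-in for repmgr's struct ColHeader (field names from the diff) */
struct ColHeader
{
    char title[TITLE_LEN];
    int  max_length;
};

/* Append formatted text to buf, never writing past buflen bytes */
static void
buf_appendf(char *buf, size_t buflen, size_t *used, const char *fmt, ...)
{
    va_list ap;
    int     written;

    if (*used >= buflen)
        return;

    va_start(ap, fmt);
    written = vsnprintf(buf + *used, buflen - *used, fmt, ap);
    va_end(ap);

    if (written > 0)
    {
        *used += (size_t) written;
        if (*used >= buflen)
            *used = buflen - 1;     /* output was truncated */
    }
}

/*
 * Hypothetical helper: render the title row and the separator row into buf.
 * Returns the number of characters stored (excluding the terminating NUL).
 */
size_t
format_status_header(char *buf, size_t buflen, int cols, const struct ColHeader *h)
{
    size_t used = 0;
    int    i;
    int    j;

    if (buflen > 0)
        buf[0] = '\0';

    /* title row: each title left-aligned and padded to the column width */
    for (i = 0; i < cols; i++)
        buf_appendf(buf, buflen, &used, "%s%-*s",
                    (i == 0) ? " " : " | ",
                    h[i].max_length, h[i].title);
    buf_appendf(buf, buflen, &used, "\n");

    /* separator row: dashes per column, "-+-" between columns */
    buf_appendf(buf, buflen, &used, "-");
    for (i = 0; i < cols; i++)
    {
        for (j = 0; j < h[i].max_length; j++)
            buf_appendf(buf, buflen, &used, "-");
        buf_appendf(buf, buflen, &used, "%s", (i < cols - 1) ? "-+-" : "-");
    }
    buf_appendf(buf, buflen, &used, "\n");

    return used;
}
```

With two columns of widths 2 and 5, the "+" in the separator row lands directly under the "|" of the title row. This is why each max_length must be final before the header is printed.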


@@ -1,28 +0,0 @@
/*
* repmgr-action-daemon.h
* Copyright (c) 2ndQuadrant, 2010-2018
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#ifndef _REPMGR_ACTION_DAEMON_H_
#define _REPMGR_ACTION_DAEMON_H_
extern void do_daemon_status(void);
extern void do_daemon_pause(void);
extern void do_daemon_unpause(void);
extern void do_daemon_help(void);
#endif


@@ -47,7 +47,6 @@ static CheckStatus do_node_check_downstream(PGconn *conn, OutputMode mode, Check
static CheckStatus do_node_check_replication_lag(PGconn *conn, OutputMode mode, t_node_info *node_info, CheckStatusList *list_output); static CheckStatus do_node_check_replication_lag(PGconn *conn, OutputMode mode, t_node_info *node_info, CheckStatusList *list_output);
static CheckStatus do_node_check_role(PGconn *conn, OutputMode mode, t_node_info *node_info, CheckStatusList *list_output); static CheckStatus do_node_check_role(PGconn *conn, OutputMode mode, t_node_info *node_info, CheckStatusList *list_output);
static CheckStatus do_node_check_slots(PGconn *conn, OutputMode mode, t_node_info *node_info, CheckStatusList *list_output); static CheckStatus do_node_check_slots(PGconn *conn, OutputMode mode, t_node_info *node_info, CheckStatusList *list_output);
static CheckStatus do_node_check_missing_slots(PGconn *conn, OutputMode mode, t_node_info *node_info, CheckStatusList *list_output);
/* /*
* NODE STATUS * NODE STATUS
@@ -170,17 +169,11 @@ do_node_status(void)
} }
else else
{ {
/* "archive_mode" is not "off", i.e. one of "on", "always" */
bool enabled = true; bool enabled = true;
PQExpBufferData archiving_status; PQExpBufferData archiving_status;
char archive_command[MAXLEN] = ""; char archive_command[MAXLEN] = "";
initPQExpBuffer(&archiving_status); initPQExpBuffer(&archiving_status);
/*
* if the node is a standby, and "archive_mode" is "on", archiving will
* actually be disabled.
*/
if (recovery_type == RECTYPE_STANDBY) if (recovery_type == RECTYPE_STANDBY)
{ {
if (guc_set(conn, "archive_mode", "=", "on")) if (guc_set(conn, "archive_mode", "=", "on"))
@@ -189,16 +182,16 @@ do_node_status(void)
if (enabled == true) if (enabled == true)
{ {
appendPQExpBufferStr(&archiving_status, "enabled"); appendPQExpBuffer(&archiving_status, "enabled");
} }
else else
{ {
appendPQExpBufferStr(&archiving_status, "disabled"); appendPQExpBuffer(&archiving_status, "disabled");
} }
if (enabled == false && recovery_type == RECTYPE_STANDBY) if (enabled == false && recovery_type == RECTYPE_STANDBY)
{ {
appendPQExpBufferStr(&archiving_status, " (on standbys \"archive_mode\" must be set to \"always\" to be effective)"); appendPQExpBuffer(&archiving_status, " (on standbys \"archive_mode\" must be set to \"always\" to be effective)");
} }
key_value_list_set(&node_status, key_value_list_set(&node_status,
@@ -258,55 +251,6 @@ do_node_status(void)
"disabled"); "disabled");
} }
/* check for attached nodes */
{
NodeInfoList downstream_nodes = T_NODE_INFO_LIST_INITIALIZER;
NodeInfoListCell *node_cell = NULL;
ItemList missing_nodes = {NULL, NULL};
int missing_nodes_count = 0;
int expected_nodes_count = 0;
get_downstream_node_records(conn, config_file_options.node_id, &downstream_nodes);
/* if a witness node is present, we'll need to remove this from the total */
expected_nodes_count = downstream_nodes.node_count;
for (node_cell = downstream_nodes.head; node_cell; node_cell = node_cell->next)
{
/* skip witness server */
if (node_cell->node_info->type == WITNESS)
{
expected_nodes_count --;
continue;
}
if (is_downstream_node_attached(conn, node_cell->node_info->node_name) == false)
{
missing_nodes_count++;
item_list_append_format(&missing_nodes,
"%s (ID: %i)",
node_cell->node_info->node_name,
node_cell->node_info->node_id);
}
}
if (missing_nodes_count)
{
ItemListCell *missing_cell = NULL;
item_list_append_format(&warnings,
_("- %i of %i downstream nodes not attached:"),
missing_nodes_count,
expected_nodes_count);
for (missing_cell = missing_nodes.head; missing_cell; missing_cell = missing_cell->next)
{
item_list_append_format(&warnings,
" - %s\n", missing_cell->string);
}
}
}
if (server_version_num < 90400) if (server_version_num < 90400)
{ {
key_value_list_set(&node_status, key_value_list_set(&node_status,
@@ -506,7 +450,7 @@ do_node_status(void)
/* output missing slot information */ /* output missing slot information */
appendPQExpBufferChar(&output, '\n'); appendPQExpBuffer(&output, "\n");
appendPQExpBuffer(&output, appendPQExpBuffer(&output,
"\"missing_replication_slots\",%i", "\"missing_replication_slots\",%i",
missing_slots.node_count); missing_slots.node_count);
@@ -542,31 +486,18 @@ do_node_status(void)
termPQExpBuffer(&output); termPQExpBuffer(&output);
if (warnings.head != NULL && runtime_options.terse == false && runtime_options.output_mode == OM_TEXT) if (runtime_options.output_mode == OM_TEXT && warnings.head != NULL && runtime_options.terse == false)
{ {
log_warning(_("following issue(s) were detected:")); log_warning(_("following issue(s) were detected:"));
print_item_list(&warnings); print_item_list(&warnings);
log_hint(_("execute \"repmgr node check\" for more details")); log_hint(_("execute \"repmgr node check\" for more details"));
} }
clear_node_info_list(&missing_slots);
key_value_list_free(&node_status); key_value_list_free(&node_status);
item_list_free(&warnings); item_list_free(&warnings);
PQfinish(conn); PQfinish(conn);
/*
* If warnings were noted, even if they're not displayed (e.g. in --csv node),
* that means something's not right so we need to emit a non-zero exit code.
*/
if (warnings.head != NULL)
{
exit(ERR_NODE_STATUS);
}
return;
} }
/* /*
* Returns information about the running state of the node. * Returns information about the running state of the node.
* For internal use during "standby switchover". * For internal use during "standby switchover".
@@ -590,13 +521,13 @@ _do_node_status_is_shutdown_cleanly(void)
initPQExpBuffer(&output); initPQExpBuffer(&output);
appendPQExpBufferStr(&output, appendPQExpBuffer(&output,
"--state="); "--state=");
/* sanity-check we're dealing with a PostgreSQL directory */ /* sanity-check we're dealing with a PostgreSQL directory */
if (is_pg_dir(config_file_options.data_directory) == false) if (is_pg_dir(config_file_options.data_directory) == false)
{ {
appendPQExpBufferStr(&output, "UNKNOWN"); appendPQExpBuffer(&output, "UNKNOWN");
printf("%s\n", output.data); printf("%s\n", output.data);
termPQExpBuffer(&output); termPQExpBuffer(&output);
return; return;
@@ -659,10 +590,10 @@ _do_node_status_is_shutdown_cleanly(void)
switch (node_status) switch (node_status)
{ {
case NODE_STATUS_UP: case NODE_STATUS_UP:
appendPQExpBufferStr(&output, "RUNNING"); appendPQExpBuffer(&output, "RUNNING");
break; break;
case NODE_STATUS_SHUTTING_DOWN: case NODE_STATUS_SHUTTING_DOWN:
appendPQExpBufferStr(&output, "SHUTTING_DOWN"); appendPQExpBuffer(&output, "SHUTTING_DOWN");
break; break;
case NODE_STATUS_DOWN: case NODE_STATUS_DOWN:
appendPQExpBuffer(&output, appendPQExpBuffer(&output,
@@ -670,10 +601,10 @@ _do_node_status_is_shutdown_cleanly(void)
format_lsn(checkPoint)); format_lsn(checkPoint));
break; break;
case NODE_STATUS_UNCLEAN_SHUTDOWN: case NODE_STATUS_UNCLEAN_SHUTDOWN:
appendPQExpBufferStr(&output, "UNCLEAN_SHUTDOWN"); appendPQExpBuffer(&output, "UNCLEAN_SHUTDOWN");
break; break;
case NODE_STATUS_UNKNOWN: case NODE_STATUS_UNKNOWN:
appendPQExpBufferStr(&output, "UNKNOWN"); appendPQExpBuffer(&output, "UNKNOWN");
break; break;
} }
@@ -697,7 +628,6 @@ do_node_check(void)
CheckStatusList status_list = {NULL, NULL}; CheckStatusList status_list = {NULL, NULL};
CheckStatusListCell *cell = NULL; CheckStatusListCell *cell = NULL;
bool issue_detected = false;
/* for internal use */ /* for internal use */
if (runtime_options.has_passfile == true) if (runtime_options.has_passfile == true)
@@ -782,17 +712,6 @@ do_node_check(void)
exit(return_code); exit(return_code);
} }
if (runtime_options.missing_slots == true)
{
return_code = do_node_check_missing_slots(conn,
runtime_options.output_mode,
&node_info,
NULL);
PQfinish(conn);
exit(return_code);
}
if (runtime_options.output_mode == OM_NAGIOS) if (runtime_options.output_mode == OM_NAGIOS)
{ {
log_error(_("--nagios can only be used with a specific check")); log_error(_("--nagios can only be used with a specific check"));
@@ -806,23 +725,11 @@ do_node_check(void)
initPQExpBuffer(&output); initPQExpBuffer(&output);
/* order functions are called is also output order */ /* order functions are called is also output order */
if (do_node_check_role(conn, runtime_options.output_mode, &node_info, &status_list) != CHECK_STATUS_OK) (void) do_node_check_role(conn, runtime_options.output_mode, &node_info, &status_list);
issue_detected = true; (void) do_node_check_replication_lag(conn, runtime_options.output_mode, &node_info, &status_list);
(void) do_node_check_archive_ready(conn, runtime_options.output_mode, &status_list);
if (do_node_check_replication_lag(conn, runtime_options.output_mode, &node_info, &status_list) != CHECK_STATUS_OK) (void) do_node_check_downstream(conn, runtime_options.output_mode, &status_list);
issue_detected = true; (void) do_node_check_slots(conn, runtime_options.output_mode, &node_info, &status_list);
if (do_node_check_archive_ready(conn, runtime_options.output_mode, &status_list) != CHECK_STATUS_OK)
issue_detected = true;
if (do_node_check_downstream(conn, runtime_options.output_mode, &status_list) != CHECK_STATUS_OK)
issue_detected = true;
if (do_node_check_slots(conn, runtime_options.output_mode, &node_info, &status_list) != CHECK_STATUS_OK)
issue_detected = true;
if (do_node_check_missing_slots(conn, runtime_options.output_mode, &node_info, &status_list) != CHECK_STATUS_OK)
issue_detected = true;
if (runtime_options.output_mode == OM_CSV) if (runtime_options.output_mode == OM_CSV)
{ {
@@ -847,7 +754,7 @@ do_node_check(void)
",\"%s\"", ",\"%s\"",
cell->details); cell->details);
} }
appendPQExpBufferChar(&output, '\n'); appendPQExpBuffer(&output, "\n");
} }
} }
else else
@@ -869,7 +776,7 @@ do_node_check(void)
" (%s)", " (%s)",
cell->details); cell->details);
} }
appendPQExpBufferChar(&output, '\n'); appendPQExpBuffer(&output, "\n");
} }
} }
@@ -879,11 +786,6 @@ do_node_check(void)
check_status_list_free(&status_list); check_status_list_free(&status_list);
PQfinish(conn); PQfinish(conn);
if (issue_detected == true)
{
exit(ERR_NODE_STATUS);
}
} }
@@ -899,12 +801,12 @@ do_node_check_replication_connection(void)
initPQExpBuffer(&output); initPQExpBuffer(&output);
appendPQExpBufferStr(&output, appendPQExpBuffer(&output,
"--connection="); "--connection=");
if (runtime_options.remote_node_id == UNKNOWN_NODE_ID) if (runtime_options.remote_node_id == UNKNOWN_NODE_ID)
{ {
appendPQExpBufferStr(&output, "UNKNOWN"); appendPQExpBuffer(&output, "UNKNOWN");
printf("%s\n", output.data); printf("%s\n", output.data);
termPQExpBuffer(&output); termPQExpBuffer(&output);
return; return;
@@ -918,7 +820,7 @@ do_node_check_replication_connection(void)
if (record_status != RECORD_FOUND) if (record_status != RECORD_FOUND)
{ {
appendPQExpBufferStr(&output, "UNKNOWN"); appendPQExpBuffer(&output, "UNKNOWN");
printf("%s\n", output.data); printf("%s\n", output.data);
termPQExpBuffer(&output); termPQExpBuffer(&output);
return; return;
@@ -938,7 +840,7 @@ do_node_check_replication_connection(void)
if (PQstatus(repl_conn) != CONNECTION_OK) if (PQstatus(repl_conn) != CONNECTION_OK)
{ {
appendPQExpBufferStr(&output, "BAD"); appendPQExpBuffer(&output, "BAD");
printf("%s\n", output.data); printf("%s\n", output.data);
termPQExpBuffer(&output); termPQExpBuffer(&output);
return; return;
@@ -946,7 +848,7 @@ do_node_check_replication_connection(void)
PQfinish(repl_conn); PQfinish(repl_conn);
appendPQExpBufferStr(&output, "OK"); appendPQExpBuffer(&output, "OK");
printf("%s\n", output.data); printf("%s\n", output.data);
termPQExpBuffer(&output); termPQExpBuffer(&output);
@@ -1042,7 +944,8 @@ do_node_check_archive_ready(PGconn *conn, OutputMode mode, CheckStatusList *list
break; break;
case OM_NAGIOS: case OM_NAGIOS:
case OM_TEXT: case OM_TEXT:
appendPQExpBufferStr(&details, appendPQExpBuffer(
&details,
"unable to check archive_status directory"); "unable to check archive_status directory");
break; break;
@@ -1144,7 +1047,6 @@ do_node_check_downstream(PGconn *conn, OutputMode mode, CheckStatusList *list_ou
for (cell = downstream_nodes.head; cell; cell = cell->next) for (cell = downstream_nodes.head; cell; cell = cell->next)
{ {
/* skip witness server */
if (cell->node_info->type == WITNESS) if (cell->node_info->type == WITNESS)
{ {
expected_nodes_count --; expected_nodes_count --;
@@ -1171,7 +1073,7 @@ do_node_check_downstream(PGconn *conn, OutputMode mode, CheckStatusList *list_ou
if (missing_nodes_count == 0) if (missing_nodes_count == 0)
{ {
if (expected_nodes_count == 0) if (expected_nodes_count == 0)
appendPQExpBufferStr(&details, appendPQExpBuffer(&details,
"this node has no downstream nodes"); "this node has no downstream nodes");
else else
appendPQExpBuffer(&details, appendPQExpBuffer(&details,
@@ -1193,18 +1095,20 @@ do_node_check_downstream(PGconn *conn, OutputMode mode, CheckStatusList *list_ou
if (mode != OM_NAGIOS) if (mode != OM_NAGIOS)
{ {
appendPQExpBufferStr(&details, "; missing: "); appendPQExpBuffer(&details, "; missing: ");
for (missing_cell = missing_nodes.head; missing_cell; missing_cell = missing_cell->next) for (missing_cell = missing_nodes.head; missing_cell; missing_cell = missing_cell->next)
{ {
if (first == false) if (first == false)
appendPQExpBufferStr(&details, appendPQExpBuffer(&details,
", "); ", ");
else else
first = false; first = false;
if (first == false) if (first == false)
appendPQExpBufferStr(&details, missing_cell->string); appendPQExpBuffer(
&details,
"%s", missing_cell->string);
} }
} }
} }
@@ -1304,7 +1208,7 @@ do_node_check_replication_lag(PGconn *conn, OutputMode mode, t_node_info *node_i
switch (mode) switch (mode)
{ {
case OM_OPTFORMAT: case OM_OPTFORMAT:
appendPQExpBufferStr(&details, appendPQExpBuffer(&details,
"--lag=0"); "--lag=0");
break; break;
case OM_NAGIOS: case OM_NAGIOS:
@@ -1316,12 +1220,12 @@ do_node_check_replication_lag(PGconn *conn, OutputMode mode, t_node_info *node_i
case OM_TEXT: case OM_TEXT:
if (node_info->type == WITNESS) if (node_info->type == WITNESS)
{ {
appendPQExpBufferStr(&details, appendPQExpBuffer(&details,
"N/A - node is witness"); "N/A - node is witness");
} }
else else
{ {
appendPQExpBufferStr(&details, appendPQExpBuffer(&details,
"N/A - node is primary"); "N/A - node is primary");
} }
break; break;
@@ -1403,7 +1307,8 @@ do_node_check_replication_lag(PGconn *conn, OutputMode mode, t_node_info *node_i
break; break;
case OM_NAGIOS: case OM_NAGIOS:
case OM_TEXT: case OM_TEXT:
appendPQExpBufferStr(&details, appendPQExpBuffer(
&details,
"unable to query replication lag"); "unable to query replication lag");
break; break;
@@ -1504,12 +1409,12 @@ do_node_check_role(PGconn *conn, OutputMode mode, t_node_info *node_info, CheckS
if (recovery_type == RECTYPE_STANDBY) if (recovery_type == RECTYPE_STANDBY)
{ {
status = CHECK_STATUS_CRITICAL; status = CHECK_STATUS_CRITICAL;
appendPQExpBufferStr(&details, appendPQExpBuffer(&details,
_("node is registered as primary but running as standby")); _("node is registered as primary but running as standby"));
} }
else else
{ {
appendPQExpBufferStr(&details, appendPQExpBuffer(&details,
_("node is primary")); _("node is primary"));
} }
break; break;
@@ -1517,12 +1422,12 @@ do_node_check_role(PGconn *conn, OutputMode mode, t_node_info *node_info, CheckS
if (recovery_type == RECTYPE_PRIMARY) if (recovery_type == RECTYPE_PRIMARY)
{ {
status = CHECK_STATUS_CRITICAL; status = CHECK_STATUS_CRITICAL;
appendPQExpBufferStr(&details, appendPQExpBuffer(&details,
_("node is registered as standby but running as primary")); _("node is registered as standby but running as primary"));
} }
else else
{ {
appendPQExpBufferStr(&details, appendPQExpBuffer(&details,
_("node is standby")); _("node is standby"));
} }
break; break;
@@ -1530,12 +1435,12 @@ do_node_check_role(PGconn *conn, OutputMode mode, t_node_info *node_info, CheckS
if (recovery_type == RECTYPE_STANDBY) if (recovery_type == RECTYPE_STANDBY)
{ {
status = CHECK_STATUS_CRITICAL; status = CHECK_STATUS_CRITICAL;
appendPQExpBufferStr(&details, appendPQExpBuffer(&details,
_("node is registered as witness but running as standby")); _("node is registered as witness but running as standby"));
} }
else else
{ {
appendPQExpBufferStr(&details, appendPQExpBuffer(&details,
_("node is witness")); _("node is witness"));
} }
break; break;
@@ -1547,8 +1452,8 @@ do_node_check_role(PGconn *conn, OutputMode mode, t_node_info *node_info, CheckS
if (is_bdr_db(conn, &output) == false) if (is_bdr_db(conn, &output) == false)
{ {
status = CHECK_STATUS_CRITICAL; status = CHECK_STATUS_CRITICAL;
appendPQExpBufferStr(&details, appendPQExpBuffer(&details,
output.data); "%s", output.data);
} }
termPQExpBuffer(&output); termPQExpBuffer(&output);
@@ -1557,12 +1462,12 @@ do_node_check_role(PGconn *conn, OutputMode mode, t_node_info *node_info, CheckS
if (is_active_bdr_node(conn, node_info->node_name) == false) if (is_active_bdr_node(conn, node_info->node_name) == false)
{ {
status = CHECK_STATUS_CRITICAL; status = CHECK_STATUS_CRITICAL;
appendPQExpBufferStr(&details, appendPQExpBuffer(&details,
_("node is not an active BDR node")); _("node is not an active BDR node"));
} }
else else
{ {
appendPQExpBufferStr(&details, appendPQExpBuffer(&details,
_("node is an active BDR node")); _("node is an active BDR node"));
} }
} }
@@ -1620,12 +1525,12 @@ do_node_check_slots(PGconn *conn, OutputMode mode, t_node_info *node_info, Check
if (server_version_num < 90400) if (server_version_num < 90400)
{ {
appendPQExpBufferStr(&details, appendPQExpBuffer(&details,
_("replication slots not available for this PostgreSQL version")); _("replication slots not available for this PostgreSQL version"));
} }
else if (node_info->total_replication_slots == 0) else if (node_info->total_replication_slots == 0)
{ {
appendPQExpBufferStr(&details, appendPQExpBuffer(&details,
_("node has no replication slots")); _("node has no replication slots"));
} }
else if (node_info->inactive_replication_slots == 0) else if (node_info->inactive_replication_slots == 0)
@@ -1678,129 +1583,6 @@ do_node_check_slots(PGconn *conn, OutputMode mode, t_node_info *node_info, Check
} }
static CheckStatus
do_node_check_missing_slots(PGconn *conn, OutputMode mode, t_node_info *node_info, CheckStatusList *list_output)
{
CheckStatus status = CHECK_STATUS_OK;
PQExpBufferData details;
NodeInfoList missing_slots = T_NODE_INFO_LIST_INITIALIZER;
if (mode == OM_CSV && list_output == NULL)
{
log_error(_("--csv output not provided with --missing-slots option"));
PQfinish(conn);
exit(ERR_BAD_CONFIG);
}
initPQExpBuffer(&details);
if (server_version_num < 90400)
{
appendPQExpBufferStr(&details,
_("replication slots not available for this PostgreSQL version"));
}
else
{
get_downstream_nodes_with_missing_slot(conn,
config_file_options.node_id,
&missing_slots);
if (missing_slots.node_count == 0)
{
appendPQExpBufferStr(&details,
_("node has no missing replication slots"));
}
else
{
NodeInfoListCell *missing_slot_cell = NULL;
bool first_element = true;
status = CHECK_STATUS_CRITICAL;
appendPQExpBuffer(&details,
_("%i replication slots are missing"),
missing_slots.node_count);
if (missing_slots.node_count)
{
appendPQExpBufferStr(&details, ": ");
for (missing_slot_cell = missing_slots.head; missing_slot_cell; missing_slot_cell = missing_slot_cell->next)
{
if (first_element == true)
{
first_element = false;
}
else
{
appendPQExpBufferStr(&details, ", ");
}
appendPQExpBufferStr(&details, missing_slot_cell->node_info->slot_name);
}
}
}
}
switch (mode)
{
case OM_NAGIOS:
{
printf("REPMGR_MISSING_SLOTS %s: %s | missing_slots=%i",
output_check_status(status),
details.data,
missing_slots.node_count);
if (missing_slots.node_count)
{
NodeInfoListCell *missing_slot_cell = NULL;
bool first_element = true;
printf(";");
for (missing_slot_cell = missing_slots.head; missing_slot_cell; missing_slot_cell = missing_slot_cell->next)
{
if (first_element == true)
{
first_element = false;
}
else
{
printf(",");
}
printf("%s", missing_slot_cell->node_info->slot_name);
}
}
printf("\n");
break;
}
case OM_CSV:
case OM_TEXT:
if (list_output != NULL)
{
check_status_list_set(list_output,
"Missing replication slots",
status,
details.data);
}
else
{
printf("%s (%s)\n",
output_check_status(status),
details.data);
}
default:
break;
}
clear_node_info_list(&missing_slots);
termPQExpBuffer(&details);
return status;
}
void void
do_node_service(void) do_node_service(void)
{ {
@@ -2132,7 +1914,7 @@ do_node_rejoin(void)
exit(ERR_BAD_CONFIG); exit(ERR_BAD_CONFIG);
} }
appendPQExpBufferStr(&msg, appendPQExpBuffer(&msg,
_("prerequisites for using pg_rewind are met")); _("prerequisites for using pg_rewind are met"));
if (runtime_options.dry_run == true) if (runtime_options.dry_run == true)
@@ -2354,19 +2136,19 @@ do_node_rejoin(void)
{ {
log_verbose(LOG_INFO, _("waiting for node %i to respond to pings; %i of max %i attempts"), log_verbose(LOG_INFO, _("waiting for node %i to respond to pings; %i of max %i attempts"),
config_file_options.node_id, config_file_options.node_id,
i + 1, config_file_options.node_rejoin_timeout); i + 1, config_file_options.standby_reconnect_timeout);
} }
else else
{ {
log_debug("sleeping 1 second waiting for node %i to respond to pings; %i of max %i attempts", log_debug("sleeping 1 second waiting for node %i to respond to pings; %i of max %i attempts",
config_file_options.node_id, config_file_options.node_id,
i + 1, config_file_options.node_rejoin_timeout); i + 1, config_file_options.standby_reconnect_timeout);
} }
sleep(1); sleep(1);
} }
for (; i < config_file_options.node_rejoin_timeout; i++) for (; i < config_file_options.standby_reconnect_timeout; i++)
{ {
success = is_downstream_node_attached(upstream_conn, config_file_options.node_name); success = is_downstream_node_attached(upstream_conn, config_file_options.node_name);
@@ -2381,13 +2163,13 @@ do_node_rejoin(void)
{ {
log_info(_("waiting for node %i to connect to new primary; %i of max %i attempts"), log_info(_("waiting for node %i to connect to new primary; %i of max %i attempts"),
config_file_options.node_id, config_file_options.node_id,
i + 1, config_file_options.node_rejoin_timeout); i + 1, config_file_options.standby_reconnect_timeout);
} }
else else
{ {
log_debug("sleeping 1 second waiting for node %i to connect to new primary; %i of max %i attempts", log_debug("sleeping 1 second waiting for node %i to connect to new primary; %i of max %i attempts",
config_file_options.node_id, config_file_options.node_id,
i + 1, config_file_options.node_rejoin_timeout); i + 1, config_file_options.standby_reconnect_timeout);
} }
sleep(1); sleep(1);
@@ -2412,54 +2194,6 @@ do_node_rejoin(void)
success = is_downstream_node_attached(upstream_conn, config_file_options.node_name); success = is_downstream_node_attached(upstream_conn, config_file_options.node_name);
} }
/*
* Handle replication slots:
* - if a slot for the new upstream exists, delete that
* - warn about any other inactive replication slots
*/
if (runtime_options.force_rewind_used == false && config_file_options.use_replication_slots)
{
PGconn *local_conn = NULL;
local_conn = establish_db_connection(config_file_options.conninfo, false);
if (PQstatus(local_conn) != CONNECTION_OK)
{
log_warning(_("unable to connect to local node to check replication slot status"));
log_hint(_("execute \"repmgr node check\" to check inactive slots and drop manually if necessary"));
}
else
{
KeyValueList inactive_replication_slots = {NULL, NULL};
KeyValueListCell *cell = NULL;
int inactive_count = 0;
PQExpBufferData slotinfo;
drop_replication_slot_if_exists(local_conn,
config_file_options.node_id,
primary_node_record.slot_name);
(void) get_inactive_replication_slots(local_conn, &inactive_replication_slots);
initPQExpBuffer(&slotinfo);
for (cell = inactive_replication_slots.head; cell; cell = cell->next)
{
appendPQExpBuffer(&slotinfo,
" - %s (%s)", cell->key, cell->value);
inactive_count++;
}
if (inactive_count > 0)
{
log_warning(_("%i inactive replication slots detected"), inactive_count);
log_detail(_("inactive replication slots:\n%s"), slotinfo.data);
log_hint(_("these replication slots may need to be removed manually"));
}
termPQExpBuffer(&slotinfo);
PQfinish(local_conn);
}
}
if (success == true) if (success == true)
{ {
@@ -2469,8 +2203,7 @@ do_node_rejoin(void)
else else
{ {
/* /*
* if we reach here, no record found in upstream node's pg_stat_replication * if we reach here, no record found in upstream node's pg_stat_replication */
*/
log_notice(_("NODE REJOIN has completed but node is not yet reattached to upstream")); log_notice(_("NODE REJOIN has completed but node is not yet reattached to upstream"));
log_hint(_("you will need to manually check the node's replication status")); log_hint(_("you will need to manually check the node's replication status"));
} }
@@ -2931,7 +2664,6 @@ do_node_help(void)
printf(_(" --replication-lag replication lag in seconds (standbys only)\n")); printf(_(" --replication-lag replication lag in seconds (standbys only)\n"));
printf(_(" --role check node has expected role\n")); printf(_(" --role check node has expected role\n"));
printf(_(" --slots check for inactive replication slots\n")); printf(_(" --slots check for inactive replication slots\n"));
printf(_(" --missing-slots check for missing replication slots\n"));
puts(""); puts("");
@@ -2963,7 +2695,6 @@ do_node_help(void)
printf(_(" --dry-run show what action would be performed, but don't execute it\n")); printf(_(" --dry-run show what action would be performed, but don't execute it\n"));
printf(_(" --action action to perform (one of \"start\", \"stop\", \"restart\" or \"reload\")\n")); printf(_(" --action action to perform (one of \"start\", \"stop\", \"restart\" or \"reload\")\n"));
printf(_(" --list-actions show what command would be performed for each action\n")); printf(_(" --list-actions show what command would be performed for each action\n"));
printf(_(" --checkpoint issue a CHECKPOINT before stopping or restarting the node\n"));
puts(""); puts("");
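
Review note: the removed do_node_check_missing_slots() builds its comma-separated slot list with a first_element flag, so the separator is written before every element except the first. The following is a minimal standalone sketch of that pattern only. It uses snprintf() on a fixed buffer instead of PQExpBuffer, and join_names() is a hypothetical helper that does not exist in repmgr.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of repmgr): join "count" strings into
 * "dest", separated by ", ", using a first-element flag in the same way
 * as the missing-slots loop. Output is truncated if "dest" is too small.
 */
void
join_names(char *dest, size_t dest_size, const char *const *names, int count)
{
	bool		first_element = true;
	size_t		used = 0;
	int			i;

	if (dest_size == 0)
		return;

	dest[0] = '\0';

	for (i = 0; i < count; i++)
	{
		/* write the separator before every element except the first */
		int			written = snprintf(dest + used, dest_size - used, "%s%s",
									   first_element ? "" : ", ", names[i]);

		first_element = false;

		/* stop once the buffer is full; snprintf() has already terminated it */
		if (written < 0 || (size_t) written >= dest_size - used)
			break;

		used += (size_t) written;
	}
}
```

Setting the flag after the first write keeps the loop body free of index arithmetic, which is why the same shape works for a linked list such as missing_slots as well as for an array.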

View File

@@ -64,11 +64,13 @@ do_primary_register(void)
PQfinish(conn); PQfinish(conn);
exit(ERR_BAD_CONFIG); exit(ERR_BAD_CONFIG);
} }
else
log_error(_("unable to determine server's recovery type")); {
log_error(_("connection to node lost"));
PQfinish(conn); PQfinish(conn);
exit(ERR_DB_CONN); exit(ERR_DB_CONN);
} }
}
log_verbose(LOG_INFO, _("server is not in recovery")); log_verbose(LOG_INFO, _("server is not in recovery"));
@@ -170,7 +172,7 @@ do_primary_register(void)
&node_info); &node_info);
if (record_created == true) if (record_created == true)
{ {
appendPQExpBufferStr(&event_description, appendPQExpBuffer(&event_description,
"existing primary record updated"); "existing primary record updated");
} }
else else
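
Review note: many hunks in this comparison differ only in whether a string is appended with appendPQExpBufferStr() or with the printf-style appendPQExpBuffer(). Where the string is not a literal (for example output.data or missing_cell->string), the printf-style column passes it behind an explicit "%s" so that it is treated as data rather than as a format string. The sketch below illustrates that point with snprintf() as a stand-in; append_as_data() is a hypothetical name and is not a libpq or repmgr function.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for a printf-style append: "text" is passed as an
 * argument to "%s", so any '%' characters it contains are copied verbatim
 * instead of being interpreted as conversion specifications.
 */
int
append_as_data(char *dest, size_t dest_size, const char *text)
{
	return snprintf(dest, dest_size, "%s", text);
}
```

Passing the same text directly as the format argument would be undefined behaviour as soon as it contained a stray conversion such as "% o", which is the reason for the explicit "%s" in the printf-style calls.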

File diff suppressed because it is too large

View File

@@ -37,7 +37,6 @@ do_witness_register(void)
PGconn *witness_conn = NULL; PGconn *witness_conn = NULL;
PGconn *primary_conn = NULL; PGconn *primary_conn = NULL;
RecoveryType recovery_type = RECTYPE_UNKNOWN; RecoveryType recovery_type = RECTYPE_UNKNOWN;
ExtensionStatus extension_status = REPMGR_UNKNOWN;
NodeInfoList nodes = T_NODE_INFO_LIST_INITIALIZER; NodeInfoList nodes = T_NODE_INFO_LIST_INITIALIZER;
t_node_info node_record = T_NODE_INFO_INITIALIZER; t_node_info node_record = T_NODE_INFO_INITIALIZER;
RecordStatus record_status = RECORD_NOT_FOUND; RecordStatus record_status = RECORD_NOT_FOUND;
@@ -215,28 +214,11 @@ do_witness_register(void)
} }
} }
extension_status = get_repmgr_extension_status(witness_conn, NULL);
/* /*
* Check if the witness database already contains node records; * if repmgr.nodes contains entries, delete if -F/--force provided,
* only do this if the extension is actually installed. * otherwise exit with error
*/ */
if (extension_status == REPMGR_INSTALLED get_all_node_records(witness_conn, &nodes);
|| extension_status == REPMGR_OLD_VERSION_INSTALLED)
{
/*
* if repmgr.nodes contains entries, exit with error unless
* -F/--force provided (which will cause the existing records
* to be overwritten)
*/
if (get_all_node_records(witness_conn, &nodes) == false)
{
/* get_all_node_records() will display the error */
PQfinish(witness_conn);
PQfinish(primary_conn);
exit(ERR_BAD_CONFIG);
}
log_verbose(LOG_DEBUG, "%i node records found", nodes.node_count); log_verbose(LOG_DEBUG, "%i node records found", nodes.node_count);
@@ -253,7 +235,6 @@ do_witness_register(void)
} }
clear_node_info_list(&nodes); clear_node_info_list(&nodes);
}
if (runtime_options.dry_run == true) if (runtime_options.dry_run == true)
{ {
@@ -329,59 +310,55 @@ do_witness_register(void)
void void
do_witness_unregister(void) do_witness_unregister(void)
{ {
PGconn *local_conn = NULL; PGconn *witness_conn = NULL;
PGconn *primary_conn = NULL; PGconn *primary_conn = NULL;
t_node_info node_record = T_NODE_INFO_INITIALIZER; t_node_info node_record = T_NODE_INFO_INITIALIZER;
RecordStatus record_status = RECORD_NOT_FOUND; RecordStatus record_status = RECORD_NOT_FOUND;
bool node_record_deleted = false; bool node_record_deleted = false;
bool local_node_available = true; bool witness_available = true;
int witness_node_id = UNKNOWN_NODE_ID;
if (runtime_options.node_id != UNKNOWN_NODE_ID) log_info(_("connecting to witness node \"%s\" (ID: %i)"),
{
/* user has specified the witness node id */
witness_node_id = runtime_options.node_id;
}
else
{
/* assume witness node is local node */
witness_node_id = config_file_options.node_id;
}
log_info(_("connecting to node \"%s\" (ID: %i)"),
config_file_options.node_name, config_file_options.node_name,
config_file_options.node_id); config_file_options.node_id);
local_conn = establish_db_connection_quiet(config_file_options.conninfo); witness_conn = establish_db_connection_quiet(config_file_options.conninfo);
if (PQstatus(local_conn) != CONNECTION_OK) if (PQstatus(witness_conn) != CONNECTION_OK)
{ {
if (!runtime_options.force) if (!runtime_options.force)
{ {
log_error(_("unable to connect to node \"%s\" (ID: %i)"), log_error(_("unable to connect to witness node \"%s\" (ID: %i)"),
config_file_options.node_name, config_file_options.node_name,
config_file_options.node_id); config_file_options.node_id);
log_detail("%s", PQerrorMessage(local_conn)); log_detail("%s", PQerrorMessage(witness_conn));
log_hint(_("provide -F/--force to remove the witness record if the server is not running"));
exit(ERR_BAD_CONFIG); exit(ERR_BAD_CONFIG);
} }
log_notice(_("unable to connect to witness node \"%s\" (ID: %i), removing node record on cluster primary only"), log_notice(_("unable to connect to witness node \"%s\" (ID: %i), removing node record on cluster primary only"),
config_file_options.node_name, config_file_options.node_name,
config_file_options.node_id); config_file_options.node_id);
local_node_available = false; witness_available = false;
} }
if (local_node_available == true) if (witness_available == true)
{ {
primary_conn = get_primary_connection_quiet(local_conn, NULL, NULL); primary_conn = get_primary_connection_quiet(witness_conn, NULL, NULL);
} }
else else
{ {
/* /*
* Assume user has provided connection details for the primary server * Extract the repmgr user and database names from the conninfo string
* provided in repmgr.conf
*/ */
get_conninfo_value(config_file_options.conninfo, "user", repmgr_user);
get_conninfo_value(config_file_options.conninfo, "dbname", repmgr_db);
param_set_ine(&source_conninfo, "user", repmgr_user);
param_set_ine(&source_conninfo, "dbname", repmgr_db);
primary_conn = establish_db_connection_by_params(&source_conninfo, false); primary_conn = establish_db_connection_by_params(&source_conninfo, false);
} }
if (PQstatus(primary_conn) != CONNECTION_OK) if (PQstatus(primary_conn) != CONNECTION_OK)
@@ -389,26 +366,26 @@ do_witness_unregister(void)
log_error(_("unable to connect to primary")); log_error(_("unable to connect to primary"));
log_detail("%s", PQerrorMessage(primary_conn)); log_detail("%s", PQerrorMessage(primary_conn));
if (local_node_available == true) if (witness_available == true)
{ {
PQfinish(local_conn); PQfinish(witness_conn);
} }
else if (runtime_options.connection_param_provided == false) else
{ {
log_hint(_("provide connection details for the primary server")); log_hint(_("provide connection details to primary server"));
} }
exit(ERR_BAD_CONFIG); exit(ERR_BAD_CONFIG);
} }
/* Check node exists and is really a witness */ /* Check node exists and is really a witness */
record_status = get_node_record(primary_conn, witness_node_id, &node_record); record_status = get_node_record(primary_conn, config_file_options.node_id, &node_record);
if (record_status != RECORD_FOUND) if (record_status != RECORD_FOUND)
{ {
log_error(_("no record found for node %i"), witness_node_id); log_error(_("no record found for node %i"), config_file_options.node_id);
if (local_node_available == true) if (witness_available == true)
PQfinish(local_conn); PQfinish(witness_conn);
PQfinish(primary_conn); PQfinish(primary_conn);
exit(ERR_BAD_CONFIG); exit(ERR_BAD_CONFIG);
@@ -416,17 +393,11 @@ do_witness_unregister(void)
if (node_record.type != WITNESS) if (node_record.type != WITNESS)
{ {
/*
* The node (either explicitly provided with --node-id, or the local node)
* is not a witness.
*
* TODO: scan node list and print hint about identity of known witness servers.
*/
log_error(_("node %i is not a witness node"), config_file_options.node_id); log_error(_("node %i is not a witness node"), config_file_options.node_id);
log_detail(_("node %i is a %s node"), config_file_options.node_id, get_node_type_string(node_record.type)); log_detail(_("node %i is a %s node"), config_file_options.node_id, get_node_type_string(node_record.type));
if (local_node_available == true) if (witness_available == true)
PQfinish(local_conn); PQfinish(witness_conn);
PQfinish(primary_conn); PQfinish(primary_conn);
exit(ERR_BAD_CONFIG); exit(ERR_BAD_CONFIG);
@@ -435,43 +406,49 @@ do_witness_unregister(void)
if (runtime_options.dry_run == true) if (runtime_options.dry_run == true)
{ {
log_info(_("prerequisites for unregistering the witness node are met")); log_info(_("prerequisites for unregistering the witness node are met"));
if (local_node_available == true) if (witness_available == true)
PQfinish(local_conn); PQfinish(witness_conn);
PQfinish(primary_conn); PQfinish(primary_conn);
exit(SUCCESS); exit(SUCCESS);
} }
log_info(_("unregistering witness node %i"), witness_node_id); log_info(_("unregistering witness node %i"), config_file_options.node_id);
node_record_deleted = delete_node_record(primary_conn, node_record_deleted = delete_node_record(primary_conn,
witness_node_id); config_file_options.node_id);
if (node_record_deleted == false) if (node_record_deleted == false)
{ {
PQfinish(primary_conn); PQfinish(primary_conn);
PQfinish(witness_conn);
exit(ERR_BAD_CONFIG);
}
if (local_node_available == true) /* sync records from primary */
PQfinish(local_conn); if (witness_available == true && witness_copy_node_records(primary_conn, witness_conn) == false)
PQfinish(local_conn); {
log_error(_("unable to copy repmgr node records from primary"));
PQfinish(primary_conn);
PQfinish(witness_conn);
exit(ERR_BAD_CONFIG); exit(ERR_BAD_CONFIG);
} }
/* Log the event */ /* Log the event */
create_event_record(primary_conn, create_event_record(primary_conn,
&config_file_options, &config_file_options,
witness_node_id, config_file_options.node_id,
"witness_unregister", "witness_unregister",
true, true,
NULL); NULL);
PQfinish(primary_conn); PQfinish(primary_conn);
if (local_node_available == true) if (witness_available == true)
PQfinish(local_conn); PQfinish(witness_conn);
log_info(_("witness unregistration complete")); log_info(_("witness unregistration complete"));
log_detail(_("witness node with ID %i successfully unregistered"), log_detail(_("witness node with id %i (conninfo: %s) successfully unregistered"),
witness_node_id); config_file_options.node_id, config_file_options.conninfo);
return; return;
} }
@@ -484,15 +461,13 @@ void do_witness_help(void)
printf(_("Usage:\n")); printf(_("Usage:\n"));
printf(_(" %s [OPTIONS] witness register\n"), progname()); printf(_(" %s [OPTIONS] witness register\n"), progname());
printf(_(" %s [OPTIONS] witness unregister\n"), progname()); printf(_(" %s [OPTIONS] witness unregister\n"), progname());
puts("");
printf(_("WITNESS REGISTER\n")); printf(_("WITNESS REGISTER\n"));
puts(""); puts("");
printf(_(" \"witness register\" registers a witness node.\n")); printf(_(" \"witness register\" registers a witness node.\n"));
puts(""); puts("");
printf(_(" Requires provision of connection information for the primary node,\n")); printf(_(" Requires provision of connection information for the primary\n"));
printf(_(" typically usually just the host name.\n"));
puts(""); puts("");
printf(_(" -h/--host host name of the primary node\n"));
printf(_(" --dry-run check prerequisites but don't make any changes\n")); printf(_(" --dry-run check prerequisites but don't make any changes\n"));
printf(_(" -F, --force overwrite an existing node record\n")); printf(_(" -F, --force overwrite an existing node record\n"));
puts(""); puts("");
@@ -503,9 +478,6 @@ void do_witness_help(void)
puts(""); puts("");
printf(_(" --dry-run check prerequisites but don't make any changes\n")); printf(_(" --dry-run check prerequisites but don't make any changes\n"));
printf(_(" -F, --force unregister when witness node not running\n")); printf(_(" -F, --force unregister when witness node not running\n"));
printf(_(" --node-id node ID of the witness node (provide if executing on\n"));
printf(_(" another node)\n"));
puts(""); puts("");
return; return;


@@ -47,7 +47,6 @@ typedef struct
/* logging options */ /* logging options */
char log_level[MAXLEN]; /* overrides setting in repmgr.conf */ char log_level[MAXLEN]; /* overrides setting in repmgr.conf */
bool log_to_file; bool log_to_file;
bool quiet;
bool terse; bool terse;
bool verbose; bool verbose;
@@ -97,7 +96,6 @@ typedef struct
bool force_rewind_used; bool force_rewind_used;
char force_rewind_path[MAXPGPATH]; char force_rewind_path[MAXPGPATH];
bool siblings_follow; bool siblings_follow;
bool repmgrd_no_pause;
/* "node status" options */ /* "node status" options */
bool is_shutdown_cleanly; bool is_shutdown_cleanly;
@@ -108,7 +106,6 @@ typedef struct
bool replication_lag; bool replication_lag;
bool role; bool role;
bool slots; bool slots;
bool missing_slots;
bool has_passfile; bool has_passfile;
bool replication_connection; bool replication_connection;
@@ -140,7 +137,7 @@ typedef struct
/* general configuration options */ \ /* general configuration options */ \
"", false, false, "", false, false, \ "", false, false, "", false, false, \
/* logging options */ \ /* logging options */ \
"", false, false, false, false, \ "", false, false, false, \
/* output options */ \ /* output options */ \
false, false, false, \ false, false, false, \
/* database connection options */ \ /* database connection options */ \
@@ -155,13 +152,13 @@ typedef struct
/* "standby clone"/"standby follow" options */ \ /* "standby clone"/"standby follow" options */ \
NO_UPSTREAM_NODE, \ NO_UPSTREAM_NODE, \
/* "standby register" options */ \ /* "standby register" options */ \
false, -1, DEFAULT_WAIT_START, \ false, 0, DEFAULT_WAIT_START, \
/* "standby switchover" options */ \ /* "standby switchover" options */ \
false, false, "", false, false, \ false, false, "", false, \
/* "node status" options */ \ /* "node status" options */ \
false, \ false, \
/* "node check" options */ \ /* "node check" options */ \
false, false, false, false, false, false, false, false, \ false, false, false, false, false, false, false, \
/* "node join" options */ \ /* "node join" options */ \
"", \ "", \
/* "node service" options */ \ /* "node service" options */ \
@@ -194,14 +191,6 @@ typedef enum
} t_server_action; } t_server_action;
typedef struct ColHeader
{
char title[MAXLEN];
int max_length;
int cur_length;
} ColHeader;
/* global configuration structures */ /* global configuration structures */
extern t_runtime_options runtime_options; extern t_runtime_options runtime_options;
@@ -237,10 +226,7 @@ extern void get_superuser_connection(PGconn **conn, PGconn **superuser_conn, PGc
extern bool remote_command(const char *host, const char *user, const char *command, PQExpBufferData *outputbuf); extern bool remote_command(const char *host, const char *user, const char *command, PQExpBufferData *outputbuf);
extern void make_remote_repmgr_path(PQExpBufferData *outputbuf, t_node_info *remote_node_record); extern void make_remote_repmgr_path(PQExpBufferData *outputbuf, t_node_info *remote_node_record);
/* display functions */
extern void print_help_header(void); extern void print_help_header(void);
extern void print_status_header(int cols, ColHeader *headers);
/* server control functions */ /* server control functions */
extern void get_server_action(t_server_action action, char *script, char *data_dir); extern void get_server_action(t_server_action action, char *script, char *data_dir);
@@ -249,6 +235,5 @@ extern void get_node_config_directory(char *config_dir_buf);
extern void get_node_data_directory(char *data_dir_buf); extern void get_node_data_directory(char *data_dir_buf);
extern void init_node_record(t_node_info *node_record); extern void init_node_record(t_node_info *node_record);
extern bool can_use_pg_rewind(PGconn *conn, const char *data_directory, PQExpBufferData *reason); extern bool can_use_pg_rewind(PGconn *conn, const char *data_directory, PQExpBufferData *reason);
extern void drop_replication_slot_if_exists(PGconn *conn, int node_id, char *slot_name);
#endif /* _REPMGR_CLIENT_GLOBAL_H_ */ #endif /* _REPMGR_CLIENT_GLOBAL_H_ */
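
Note on the ColHeader / print_status_header() declarations above, which exist on only one side of this header: together they describe a fixed-width table header (a padded title row followed by a rule of dashes). The following is a minimal standalone sketch of that layout, not the project's code. It assumes a hypothetical format_status_header() that writes into a caller-supplied buffer instead of calling printf(), a hypothetical append_str() helper, and TITLE_LEN as a stand-in for MAXLEN.

```c
#include <stdio.h>
#include <string.h>

#define TITLE_LEN 64			/* assumption: stand-in for MAXLEN */

typedef struct ColHeader
{
	char		title[TITLE_LEN];
	int			max_length;		/* column width used for padding */
	int			cur_length;		/* unused in this sketch */
} ColHeader;

/* hypothetical helper: append s to buf, returning -1 if it would not fit */
static int
append_str(char *buf, size_t size, size_t *used, const char *s)
{
	size_t		len = strlen(s);

	if (*used + len >= size)
		return -1;
	memcpy(buf + *used, s, len + 1);
	*used += len;
	return 0;
}

/* hypothetical buffer-based variant; returns 0 on success, -1 if truncated */
static int
format_status_header(char *buf, size_t size, int cols, const ColHeader *headers)
{
	size_t		used = 0;
	int			i, j;

	if (size == 0)
		return -1;
	buf[0] = '\0';

	/* title row: left-justified titles separated by " | " */
	for (i = 0; i < cols; i++)
	{
		int			n = snprintf(buf + used, size - used, "%s%-*s",
								 (i == 0) ? " " : " | ",
								 headers[i].max_length, headers[i].title);

		if (n < 0 || (size_t) n >= size - used)
			return -1;
		used += (size_t) n;
	}
	if (append_str(buf, size, &used, "\n-") != 0)
		return -1;

	/* rule row: dashes under each column, "-+-" between columns */
	for (i = 0; i < cols; i++)
	{
		for (j = 0; j < headers[i].max_length; j++)
			if (append_str(buf, size, &used, "-") != 0)
				return -1;
		if (append_str(buf, size, &used, (i < cols - 1) ? "-+-" : "-") != 0)
			return -1;
	}
	return append_str(buf, size, &used, "\n");
}
```

Writing into a buffer is a choice made here only so the layout can be checked without capturing stdout; the function in the diff prints directly.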


@@ -29,14 +29,11 @@
* *
* NODE STATUS * NODE STATUS
* NODE CHECK * NODE CHECK
*
* For internal use:
* NODE REJOIN * NODE REJOIN
* NODE SERVICE * NODE SERVICE
* *
* DAEMON STATUS
* DAEMON PAUSE
* DAEMON UNPAUSE
*
*
* This program is free software: you can redistribute it and/or modify * This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by * it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or * the Free Software Foundation, either version 3 of the License, or
@@ -65,7 +62,6 @@
#include "repmgr-action-bdr.h" #include "repmgr-action-bdr.h"
#include "repmgr-action-node.h" #include "repmgr-action-node.h"
#include "repmgr-action-cluster.h" #include "repmgr-action-cluster.h"
#include "repmgr-action-daemon.h"
#include <storage/fd.h> /* for PG_TEMP_FILE_PREFIX */ #include <storage/fd.h> /* for PG_TEMP_FILE_PREFIX */
@@ -102,7 +98,7 @@ main(int argc, char **argv)
{ {
t_conninfo_param_list default_conninfo = T_CONNINFO_PARAM_LIST_INITIALIZER; t_conninfo_param_list default_conninfo = T_CONNINFO_PARAM_LIST_INITIALIZER;
int optindex = 0; int optindex;
int c; int c;
char *repmgr_command = NULL; char *repmgr_command = NULL;
@@ -112,7 +108,6 @@ main(int argc, char **argv)
char *dummy_action = ""; char *dummy_action = "";
bool help_option = false; bool help_option = false;
bool option_error_found = false;
set_progname(argv[0]); set_progname(argv[0]);
@@ -183,10 +178,7 @@ main(int argc, char **argv)
strncpy(runtime_options.username, pw->pw_name, MAXLEN); strncpy(runtime_options.username, pw->pw_name, MAXLEN);
} }
/* Make getopt emitting errors */ while ((c = getopt_long(argc, argv, "?Vb:f:FwWd:h:p:U:R:S:D:ck:L:tvC:", long_options,
opterr = 1;
while ((c = getopt_long(argc, argv, "?Vb:f:FwWd:h:p:U:R:S:D:ck:L:qtvC:", long_options,
&optindex)) != -1) &optindex)) != -1)
{ {
/* /*
@@ -204,7 +196,13 @@ main(int argc, char **argv)
case OPT_HELP: /* --help */ case OPT_HELP: /* --help */
help_option = true; help_option = true;
break; break;
case '?':
/* Actual help option given */
if (strcmp(argv[optind - 1], "-?") == 0)
{
help_option = true;
}
break;
case 'V': case 'V':
/* /*
@@ -442,10 +440,6 @@ main(int argc, char **argv)
runtime_options.siblings_follow = true; runtime_options.siblings_follow = true;
break; break;
case OPT_REPMGRD_NO_PAUSE:
runtime_options.repmgrd_no_pause = true;
break;
/*---------------------- /*----------------------
* "node status" options * "node status" options
*---------------------- *----------------------
@@ -479,10 +473,6 @@ main(int argc, char **argv)
runtime_options.slots = true; runtime_options.slots = true;
break; break;
case OPT_MISSING_SLOTS:
runtime_options.missing_slots = true;
break;
case OPT_HAS_PASSFILE: case OPT_HAS_PASSFILE:
runtime_options.has_passfile = true; runtime_options.has_passfile = true;
break; break;
@@ -582,12 +572,6 @@ main(int argc, char **argv)
logger_output_mode = OM_DAEMON; logger_output_mode = OM_DAEMON;
break; break;
/* --quiet */
case 'q':
runtime_options.quiet = true;
break;
/* --terse */ /* --terse */
case 't': case 't':
runtime_options.terse = true; runtime_options.terse = true;
@@ -643,24 +627,9 @@ main(int argc, char **argv)
_("--recovery-min-apply-delay is now a configuration file parameter, \"recovery_min_apply_delay\"")); _("--recovery-min-apply-delay is now a configuration file parameter, \"recovery_min_apply_delay\""));
break; break;
case ':': /* missing option argument */
option_error_found = true;
break;
case '?':
/* Actual help option given? */
if (strcmp(argv[optind - 1], "-?") == 0)
{
help_option = true;
break;
}
/* otherwise fall through to default */
default: /* invalid option */
option_error_found = true;
break;
} }
} }
/* /*
* If -d/--dbname appears to be a conninfo string, validate by attempting * If -d/--dbname appears to be a conninfo string, validate by attempting
* to parse it (and if successful, store the parsed parameters) * to parse it (and if successful, store the parsed parameters)
@@ -761,10 +730,9 @@ main(int argc, char **argv)
if (cli_errors.head != NULL) if (cli_errors.head != NULL)
{ {
free_conninfo_params(&source_conninfo); free_conninfo_params(&source_conninfo);
exit_with_cli_errors(&cli_errors, NULL); exit_with_cli_errors(&cli_errors);
} }
/*---------- /*----------
* Determine the node type and action; following are valid: * Determine the node type and action; following are valid:
* *
@@ -774,7 +742,6 @@ main(int argc, char **argv)
* BDR { REGISTER | UNREGISTER } | * BDR { REGISTER | UNREGISTER } |
* NODE { STATUS | CHECK | REJOIN | SERVICE } | * NODE { STATUS | CHECK | REJOIN | SERVICE } |
* CLUSTER { CROSSCHECK | MATRIX | SHOW | EVENT | CLEANUP } * CLUSTER { CROSSCHECK | MATRIX | SHOW | EVENT | CLEANUP }
* DAEMON { STATUS | PAUSE | UNPAUSE }
* *
* [node] is an optional hostname, provided instead of the -h/--host * [node] is an optional hostname, provided instead of the -h/--host
* option * option
@@ -808,7 +775,6 @@ main(int argc, char **argv)
action = PRIMARY_REGISTER; action = PRIMARY_REGISTER;
else if (strcasecmp(repmgr_action, "UNREGISTER") == 0) else if (strcasecmp(repmgr_action, "UNREGISTER") == 0)
action = PRIMARY_UNREGISTER; action = PRIMARY_UNREGISTER;
/* allow "primary check"/"primary status" as aliases for "node check"/"node status" */
else if (strcasecmp(repmgr_action, "CHECK") == 0) else if (strcasecmp(repmgr_action, "CHECK") == 0)
action = NODE_CHECK; action = NODE_CHECK;
else if (strcasecmp(repmgr_action, "STATUS") == 0) else if (strcasecmp(repmgr_action, "STATUS") == 0)
@@ -835,7 +801,6 @@ main(int argc, char **argv)
action = STANDBY_FOLLOW; action = STANDBY_FOLLOW;
else if (strcasecmp(repmgr_action, "SWITCHOVER") == 0) else if (strcasecmp(repmgr_action, "SWITCHOVER") == 0)
action = STANDBY_SWITCHOVER; action = STANDBY_SWITCHOVER;
/* allow "standby check"/"standby status" as aliases for "node check"/"node status" */
else if (strcasecmp(repmgr_action, "CHECK") == 0) else if (strcasecmp(repmgr_action, "CHECK") == 0)
action = NODE_CHECK; action = NODE_CHECK;
else if (strcasecmp(repmgr_action, "STATUS") == 0) else if (strcasecmp(repmgr_action, "STATUS") == 0)
@@ -911,21 +876,6 @@ main(int argc, char **argv)
else if (strcasecmp(repmgr_action, "CLEANUP") == 0) else if (strcasecmp(repmgr_action, "CLEANUP") == 0)
action = CLUSTER_CLEANUP; action = CLUSTER_CLEANUP;
} }
else if (strcasecmp(repmgr_command, "DAEMON") == 0)
{
if (help_option == true)
{
do_daemon_help();
exit(SUCCESS);
}
if (strcasecmp(repmgr_action, "STATUS") == 0)
action = DAEMON_STATUS;
else if (strcasecmp(repmgr_action, "PAUSE") == 0)
action = DAEMON_PAUSE;
else if (strcasecmp(repmgr_action, "UNPAUSE") == 0)
action = DAEMON_UNPAUSE;
}
else else
{ {
valid_repmgr_command_found = false; valid_repmgr_command_found = false;
@@ -1029,30 +979,9 @@ main(int argc, char **argv)
if (cli_errors.head != NULL) if (cli_errors.head != NULL)
{ {
free_conninfo_params(&source_conninfo); free_conninfo_params(&source_conninfo);
exit_with_cli_errors(&cli_errors);
exit_with_cli_errors(&cli_errors, valid_repmgr_command_found == true ? repmgr_command : NULL);
} }
/* no errors detected by repmgr, but getopt might have */
if (option_error_found == true)
{
if (valid_repmgr_command_found == true)
{
printf(_("Try \"%s --help\" or \"%s %s --help\" for more information.\n"),
progname(),
progname(),
repmgr_command);
}
else
{
printf(_("Try \"repmgr --help\" for more information.\n"));
}
free_conninfo_params(&source_conninfo);
exit(ERR_BAD_CONFIG);
}
/* /*
* Print any warnings about inappropriate command line options, unless * Print any warnings about inappropriate command line options, unless
* -t/--terse set * -t/--terse set
@@ -1148,17 +1077,6 @@ main(int argc, char **argv)
logger_set_min_level(LOG_INFO); logger_set_min_level(LOG_INFO);
} }
/*
* If -q/--quiet supplied, suppress any non-ERROR log output.
* This overrides everything else; we'll leave it up to the user to deal with the
* consequences of e.g. running --dry-run together with -q/--quiet.
*/
if (runtime_options.quiet == true)
{
logger_set_level(LOG_ERROR);
}
/* /*
* Node configuration information is not needed for all actions, with * Node configuration information is not needed for all actions, with
@@ -1324,17 +1242,6 @@ main(int argc, char **argv)
do_cluster_cleanup(); do_cluster_cleanup();
break; break;
/* DAEMON */
case DAEMON_STATUS:
do_daemon_status();
break;
case DAEMON_PAUSE:
do_daemon_pause();
break;
case DAEMON_UNPAUSE:
do_daemon_unpause();
break;
default: default:
/* An action will have been determined by this point */ /* An action will have been determined by this point */
break; break;
@@ -1399,7 +1306,7 @@ check_cli_parameters(const int action)
if (!runtime_options.host_param_provided) if (!runtime_options.host_param_provided)
{ {
item_list_append_format(&cli_errors, item_list_append_format(&cli_errors,
_("host name for the source node must be provided with -h/--host when executing %s"), _("host name for the source node must be provided when executing %s"),
action_name(action)); action_name(action));
} }
@@ -1456,7 +1363,7 @@ check_cli_parameters(const int action)
if (!runtime_options.host_param_provided) if (!runtime_options.host_param_provided)
{ {
item_list_append_format(&cli_errors, item_list_append_format(&cli_errors,
_("host name for the source node must be provided with -h/--host when executing %s"), _("host name for the source node must be provided when executing %s"),
action_name(action)); action_name(action));
} }
} }
@@ -1556,8 +1463,6 @@ check_cli_parameters(const int action)
{ {
case PRIMARY_UNREGISTER: case PRIMARY_UNREGISTER:
case STANDBY_UNREGISTER: case STANDBY_UNREGISTER:
case WITNESS_UNREGISTER:
case CLUSTER_CLEANUP:
case CLUSTER_EVENT: case CLUSTER_EVENT:
case CLUSTER_MATRIX: case CLUSTER_MATRIX:
case CLUSTER_CROSSCHECK: case CLUSTER_CROSSCHECK:
@@ -1598,7 +1503,6 @@ check_cli_parameters(const int action)
case STANDBY_CLONE: case STANDBY_CLONE:
case STANDBY_REGISTER: case STANDBY_REGISTER:
case STANDBY_FOLLOW: case STANDBY_FOLLOW:
case BDR_REGISTER:
break; break;
default: default:
item_list_append_format(&cli_warnings, item_list_append_format(&cli_warnings,
@@ -1781,18 +1685,6 @@ check_cli_parameters(const int action)
} }
} }
if (runtime_options.repmgrd_no_pause == true)
{
switch (action)
{
case STANDBY_SWITCHOVER:
break;
default:
item_list_append_format(&cli_warnings,
_("--repmgrd-no-pause will be ignored when executing %s"),
action_name(action));
}
}
if (runtime_options.config_files[0] != '\0') if (runtime_options.config_files[0] != '\0')
{ {
@@ -1821,8 +1713,6 @@ check_cli_parameters(const int action)
case WITNESS_UNREGISTER: case WITNESS_UNREGISTER:
case NODE_REJOIN: case NODE_REJOIN:
case NODE_SERVICE: case NODE_SERVICE:
case DAEMON_PAUSE:
case DAEMON_UNPAUSE:
break; break;
default: default:
item_list_append_format(&cli_warnings, item_list_append_format(&cli_warnings,
@@ -1902,14 +1792,6 @@ action_name(const int action)
return "CLUSTER MATRIX"; return "CLUSTER MATRIX";
case CLUSTER_CROSSCHECK: case CLUSTER_CROSSCHECK:
return "CLUSTER CROSSCHECK"; return "CLUSTER CROSSCHECK";
case DAEMON_STATUS:
return "DAEMON STATUS";
case DAEMON_PAUSE:
return "DAEMON PAUSE";
case DAEMON_UNPAUSE:
return "DAEMON UNPAUSE";
} }
return "UNKNOWN ACTION"; return "UNKNOWN ACTION";
@@ -1937,42 +1819,6 @@ print_error_list(ItemList *error_list, int log_level)
} }
void
print_status_header(int cols, ColHeader *headers)
{
int i;
for (i = 0; i < cols; i++)
{
if (i == 0)
printf(" ");
else
printf(" | ");
printf("%-*s",
headers[i].max_length,
headers[i].title);
}
printf("\n");
printf("-");
for (i = 0; i < cols; i++)
{
int j;
for (j = 0; j < headers[i].max_length; j++)
printf("-");
if (i < (cols - 1))
printf("-+-");
else
printf("-");
}
printf("\n");
}
void void
print_help_header(void) print_help_header(void)
{ {
@@ -1999,13 +1845,12 @@ do_help(void)
printf(_(" %s [OPTIONS] standby {register|unregister|clone|promote|follow|switchover}\n"), progname()); printf(_(" %s [OPTIONS] standby {register|unregister|clone|promote|follow|switchover}\n"), progname());
printf(_(" %s [OPTIONS] bdr {register|unregister}\n"), progname()); printf(_(" %s [OPTIONS] bdr {register|unregister}\n"), progname());
printf(_(" %s [OPTIONS] node {status|check|rejoin|service}\n"), progname()); printf(_(" %s [OPTIONS] node {status|check|rejoin|service}\n"), progname());
printf(_(" %s [OPTIONS] cluster {show|event|matrix|crosscheck|cleanup}\n"), progname()); printf(_(" %s [OPTIONS] cluster {show|event|matrix|crosscheck}\n"), progname());
printf(_(" %s [OPTIONS] witness {register|unregister}\n"), progname()); printf(_(" %s [OPTIONS] witness {register|unregister}\n"), progname());
printf(_(" %s [OPTIONS] daemon {status|pause|unpause}\n"), progname());
puts(""); puts("");
printf(_(" Execute \"%s {primary|standby|bdr|node|cluster|witness|daemon} --help\" to see command-specific options\n"), progname()); printf(_(" Execute \"%s {primary|standby|bdr|node|cluster} --help\" to see command-specific options\n"), progname());
puts(""); puts("");
@@ -2049,10 +1894,11 @@ do_help(void)
printf(_(" --dry-run show what would happen for action, but don't execute it\n")); printf(_(" --dry-run show what would happen for action, but don't execute it\n"));
printf(_(" -L, --log-level set log level (overrides configuration file; default: NOTICE)\n")); printf(_(" -L, --log-level set log level (overrides configuration file; default: NOTICE)\n"));
printf(_(" --log-to-file log to file (or logging facility) defined in repmgr.conf\n")); printf(_(" --log-to-file log to file (or logging facility) defined in repmgr.conf\n"));
printf(_(" -q, --quiet suppress all log output apart from errors\n"));
printf(_(" -t, --terse don't display detail, hints and other non-critical output\n")); printf(_(" -t, --terse don't display detail, hints and other non-critical output\n"));
printf(_(" -v, --verbose display additional log output (useful for debugging)\n")); printf(_(" -v, --verbose display additional log output (useful for debugging)\n"));
puts(""); puts("");
} }
@@ -2079,9 +1925,8 @@ create_repmgr_extension(PGconn *conn)
bool is_superuser = false; bool is_superuser = false;
PGconn *superuser_conn = NULL; PGconn *superuser_conn = NULL;
PGconn *schema_create_conn = NULL; PGconn *schema_create_conn = NULL;
t_extension_versions extversions = T_EXTENSION_VERSIONS_INITIALIZER;
extension_status = get_repmgr_extension_status(conn, &extversions); extension_status = get_repmgr_extension_status(conn);
switch (extension_status) switch (extension_status)
{ {
@@ -2093,15 +1938,8 @@ create_repmgr_extension(PGconn *conn)
log_error(_("\"repmgr\" extension is not available")); log_error(_("\"repmgr\" extension is not available"));
return false; return false;
case REPMGR_OLD_VERSION_INSTALLED:
log_error(_("an older version of the \"repmgr\" extension is installed"));
log_detail(_("version %s is installed but newer version %s is available"),
extversions.installed_version,
extversions.default_version);
log_hint(_("execute \"ALTER EXTENSION repmgr UPGRADE\""));
return false;
case REPMGR_INSTALLED: case REPMGR_INSTALLED:
/* TODO: check version */
log_info(_("\"repmgr\" extension is already installed")); log_info(_("\"repmgr\" extension is already installed"));
return true; return true;
@@ -2679,29 +2517,11 @@ remote_command(const char *host, const char *user, const char *command, PQExpBuf
void void
make_remote_repmgr_path(PQExpBufferData *output_buf, t_node_info *remote_node_record) make_remote_repmgr_path(PQExpBufferData *output_buf, t_node_info *remote_node_record)
{ {
if (config_file_options.repmgr_bindir[0] != '\0')
{
int len = strlen(config_file_options.repmgr_bindir);
appendPQExpBufferStr(output_buf,
config_file_options.repmgr_bindir);
/* Add trailing slash */
if (config_file_options.repmgr_bindir[len - 1] != '/')
{
appendPQExpBufferChar(output_buf, '/');
}
}
else if (pg_bindir[0] != '\0')
{
appendPQExpBufferStr(output_buf,
pg_bindir);
}
appendPQExpBuffer(output_buf, appendPQExpBuffer(output_buf,
"%s -f %s ", "%s -f %s ",
progname(), make_pg_path(progname()),
remote_node_record->config_file); remote_node_record->config_file);
} }
@@ -3099,45 +2919,3 @@ can_use_pg_rewind(PGconn *conn, const char *data_directory, PQExpBufferData *rea
return can_use; return can_use;
} }
void
drop_replication_slot_if_exists(PGconn *conn, int node_id, char *slot_name)
{
t_replication_slot slot_info = T_REPLICATION_SLOT_INITIALIZER;
RecordStatus record_status = get_slot_record(conn, slot_name, &slot_info);
log_verbose(LOG_DEBUG, "attempting to delete slot \"%s\" on node %i",
slot_name, node_id);
if (record_status != RECORD_FOUND)
{
/* this is a good thing */
log_verbose(LOG_INFO,
_("slot \"%s\" does not exist on node %i, nothing to remove"),
slot_name, node_id);
}
else
{
if (slot_info.active == false)
{
if (drop_replication_slot(conn, slot_name) == true)
{
log_notice(_("replication slot \"%s\" deleted on node %i"), slot_name, node_id);
}
else
{
log_error(_("unable to delete replication slot \"%s\" on node %i"), slot_name, node_id);
}
}
/*
* if active replication slot exists, call Houston as we have a
* problem
*/
else
{
log_warning(_("replication slot \"%s\" is still active on node %i"), slot_name, node_id);
}
}
}
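
Note on the make_remote_repmgr_path() hunk above: one side prefixes the program name with repmgr_bindir and adds a trailing slash when it is missing; the other side calls make_pg_path(progname()). The sketch below illustrates only the trailing-slash handling, under the assumption of a hypothetical build_remote_prefix() that uses snprintf() on a plain buffer rather than PQExpBuffer. The pg_bindir fallback seen in the diff is deliberately left out.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only: writes
 * "<bindir>/<prog> -f <config_file> " into out, inserting the "/" only
 * when bindir is non-empty and does not already end with one.
 * Returns 0 on success, -1 if the result would not fit.
 */
static int
build_remote_prefix(char *out, size_t size,
					const char *bindir, const char *prog,
					const char *config_file)
{
	const char *sep = "";
	size_t		len = strlen(bindir);
	int			n;

	/* add trailing slash only if one is needed */
	if (len > 0 && bindir[len - 1] != '/')
		sep = "/";

	n = snprintf(out, size, "%s%s%s -f %s ", bindir, sep, prog, config_file);

	return (n < 0 || (size_t) n >= size) ? -1 : 0;
}
```

An empty bindir yields just the bare program name, which then relies on the remote PATH.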


@@ -45,9 +45,6 @@
#define CLUSTER_MATRIX 19 #define CLUSTER_MATRIX 19
#define CLUSTER_CROSSCHECK 20 #define CLUSTER_CROSSCHECK 20
#define CLUSTER_EVENT 21 #define CLUSTER_EVENT 21
#define DAEMON_STATUS 22
#define DAEMON_PAUSE 23
#define DAEMON_UNPAUSE 24
/* command line options without short versions */ /* command line options without short versions */
#define OPT_HELP 1001 #define OPT_HELP 1001
@@ -90,8 +87,6 @@
#define OPT_REMOTE_NODE_ID 1038 #define OPT_REMOTE_NODE_ID 1038
#define OPT_RECOVERY_CONF_ONLY 1039 #define OPT_RECOVERY_CONF_ONLY 1039
#define OPT_NO_WAIT 1040 #define OPT_NO_WAIT 1040
#define OPT_MISSING_SLOTS 1041
#define OPT_REPMGRD_NO_PAUSE 1042
/* deprecated since 3.3 */ /* deprecated since 3.3 */
#define OPT_DATA_DIR 999 #define OPT_DATA_DIR 999
@@ -130,7 +125,6 @@ static struct option long_options[] =
/* logging options */ /* logging options */
{"log-level", required_argument, NULL, 'L'}, {"log-level", required_argument, NULL, 'L'},
{"log-to-file", no_argument, NULL, OPT_LOG_TO_FILE}, {"log-to-file", no_argument, NULL, OPT_LOG_TO_FILE},
{"quiet", no_argument, NULL, 'q'},
{"terse", no_argument, NULL, 't'}, {"terse", no_argument, NULL, 't'},
{"verbose", no_argument, NULL, 'v'}, {"verbose", no_argument, NULL, 'v'},
@@ -160,7 +154,6 @@ static struct option long_options[] =
*/ */
{"always-promote", no_argument, NULL, OPT_ALWAYS_PROMOTE}, {"always-promote", no_argument, NULL, OPT_ALWAYS_PROMOTE},
{"siblings-follow", no_argument, NULL, OPT_SIBLINGS_FOLLOW}, {"siblings-follow", no_argument, NULL, OPT_SIBLINGS_FOLLOW},
{"repmgrd-no-pause", no_argument, NULL, OPT_REPMGRD_NO_PAUSE},
/* "node status" options */ /* "node status" options */
{"is-shutdown-cleanly", no_argument, NULL, OPT_IS_SHUTDOWN_CLEANLY}, {"is-shutdown-cleanly", no_argument, NULL, OPT_IS_SHUTDOWN_CLEANLY},
@@ -171,7 +164,6 @@ static struct option long_options[] =
{"replication-lag", no_argument, NULL, OPT_REPLICATION_LAG}, {"replication-lag", no_argument, NULL, OPT_REPLICATION_LAG},
{"role", no_argument, NULL, OPT_ROLE}, {"role", no_argument, NULL, OPT_ROLE},
{"slots", no_argument, NULL, OPT_SLOTS}, {"slots", no_argument, NULL, OPT_SLOTS},
{"missing-slots", no_argument, NULL, OPT_MISSING_SLOTS},
{"has-passfile", no_argument, NULL, OPT_HAS_PASSFILE}, {"has-passfile", no_argument, NULL, OPT_HAS_PASSFILE},
{"replication-connection", no_argument, NULL, OPT_REPL_CONN}, {"replication-connection", no_argument, NULL, OPT_REPL_CONN},
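
Note on the option table above and the '?' handling in main(): getopt_long() returns '?' both for a literal "-?" (when '?' is listed in the option string) and for an unrecognised option, so the two cases have to be told apart by inspecting argv[optind - 1]. The sketch below shows that check in isolation. ParsedOptions and parse_options() are hypothetical names, the option table is cut down to a single entry, and resetting optind to 1 before each parse is an assumption that holds for POSIX-style getopt implementations (BSD-derived libcs may also need optreset).

```c
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* hypothetical result type for this sketch */
typedef struct
{
	bool		help;			/* "-?" was given explicitly */
	bool		terse;			/* -t/--terse was given */
	bool		option_error;	/* getopt reported an invalid option */
} ParsedOptions;

static ParsedOptions
parse_options(int argc, char **argv)
{
	static struct option long_options[] = {
		{"terse", no_argument, NULL, 't'},
		{NULL, 0, NULL, 0}
	};
	ParsedOptions result = {false, false, false};
	int			optindex = 0;
	int			c;

	optind = 1;					/* restart scanning (assumption, see above) */
	opterr = 1;					/* let getopt print its own diagnostics */

	while ((c = getopt_long(argc, argv, "?t", long_options, &optindex)) != -1)
	{
		switch (c)
		{
			case 't':
				result.terse = true;
				break;
			case '?':
				/* actual help option given, or an invalid option? */
				if (strcmp(argv[optind - 1], "-?") == 0)
					result.help = true;
				else
					result.option_error = true;
				break;
			default:
				result.option_error = true;
				break;
		}
	}

	return result;
}
```

One limitation of this check: "-?" combined with other short options in one argument (for example "-t?") does not compare equal to "-?" and is reported as an error.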

262
repmgr.c

@@ -26,7 +26,6 @@
#include "access/xlog.h" #include "access/xlog.h"
#include "miscadmin.h" #include "miscadmin.h"
#include "replication/walreceiver.h" #include "replication/walreceiver.h"
#include "storage/fd.h"
#include "storage/ipc.h" #include "storage/ipc.h"
#include "storage/lwlock.h" #include "storage/lwlock.h"
#include "storage/procarray.h" #include "storage/procarray.h"
@@ -44,21 +43,14 @@
#include "lib/stringinfo.h" #include "lib/stringinfo.h"
#include "access/xact.h" #include "access/xact.h"
#include "utils/snapmgr.h" #include "utils/snapmgr.h"
#if (PG_VERSION_NUM >= 90400)
#include "pgstat.h" #include "pgstat.h"
#else
#define PGSTAT_STAT_PERMANENT_DIRECTORY "pg_stat"
#endif
#include "voting.h" #include "voting.h"
#define UNKNOWN_NODE_ID -1 #define UNKNOWN_NODE_ID -1
#define UNKNOWN_PID -1
#define TRANCHE_NAME "repmgrd" #define TRANCHE_NAME "repmgrd"
#define REPMGRD_STATE_FILE PGSTAT_STAT_PERMANENT_DIRECTORY "/repmgrd_state.txt"
#define REPMGRD_STATE_FILE_BUF_SIZE 128
PG_MODULE_MAGIC; PG_MODULE_MAGIC;
@@ -74,9 +66,6 @@ typedef struct repmgrdSharedState
LWLockId lock; /* protects search/modification */ LWLockId lock; /* protects search/modification */
TimestampTz last_updated; TimestampTz last_updated;
int local_node_id; int local_node_id;
int repmgrd_pid;
char repmgrd_pidfile[MAXPGPATH];
bool repmgrd_paused;
/* streaming failover */ /* streaming failover */
NodeVotingStatus voting_status; NodeVotingStatus voting_status;
int current_electoral_term; int current_electoral_term;
@@ -123,25 +112,6 @@ PG_FUNCTION_INFO_V1(am_bdr_failover_handler);
Datum unset_bdr_failover_handler(PG_FUNCTION_ARGS); Datum unset_bdr_failover_handler(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(unset_bdr_failover_handler); PG_FUNCTION_INFO_V1(unset_bdr_failover_handler);
Datum set_repmgrd_pid(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(set_repmgrd_pid);
Datum get_repmgrd_pid(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(get_repmgrd_pid);
Datum get_repmgrd_pidfile(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(get_repmgrd_pidfile);
Datum repmgrd_is_running(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(repmgrd_is_running);
Datum repmgrd_pause(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(repmgrd_pause);
Datum repmgrd_is_paused(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(repmgrd_is_paused);
/* /*
* Module load callback * Module load callback
@@ -215,9 +185,6 @@ repmgr_shmem_startup(void)
#endif #endif
shared_state->local_node_id = UNKNOWN_NODE_ID; shared_state->local_node_id = UNKNOWN_NODE_ID;
shared_state->repmgrd_pid = UNKNOWN_PID;
memset(shared_state->repmgrd_pidfile, 0, MAXPGPATH);
shared_state->repmgrd_paused = false;
shared_state->current_electoral_term = 0; shared_state->current_electoral_term = 0;
shared_state->voting_status = VS_NO_VOTE; shared_state->voting_status = VS_NO_VOTE;
shared_state->candidate_node_id = UNKNOWN_NODE_ID; shared_state->candidate_node_id = UNKNOWN_NODE_ID;
@@ -237,8 +204,6 @@ Datum
set_local_node_id(PG_FUNCTION_ARGS) set_local_node_id(PG_FUNCTION_ARGS)
{ {
int local_node_id = UNKNOWN_NODE_ID; int local_node_id = UNKNOWN_NODE_ID;
int stored_node_id = UNKNOWN_NODE_ID;
int paused = -1;
if (!shared_state) if (!shared_state)
PG_RETURN_NULL(); PG_RETURN_NULL();
@@ -248,34 +213,6 @@ set_local_node_id(PG_FUNCTION_ARGS)
local_node_id = PG_GETARG_INT32(0); local_node_id = PG_GETARG_INT32(0);
/* read state file and if exists/valid, update "repmgrd_paused" */
{
FILE *file = NULL;
file = AllocateFile(REPMGRD_STATE_FILE, PG_BINARY_R);
if (file != NULL)
{
int buffer_size = REPMGRD_STATE_FILE_BUF_SIZE;
char buffer[REPMGRD_STATE_FILE_BUF_SIZE];
if (fgets(buffer, buffer_size, file) != NULL)
{
if (sscanf(buffer, "%i:%i", &stored_node_id, &paused) != 2)
{
elog(WARNING, "unable to parse repmgrd state file");
}
else
{
elog(DEBUG1, "node_id: %i; paused: %i", stored_node_id, paused);
}
}
FreeFile(file);
}
}
LWLockAcquire(shared_state->lock, LW_EXCLUSIVE); LWLockAcquire(shared_state->lock, LW_EXCLUSIVE);
/* only set local_node_id once, as it should never change */ /* only set local_node_id once, as it should never change */
@@ -284,19 +221,6 @@ set_local_node_id(PG_FUNCTION_ARGS)
shared_state->local_node_id = local_node_id; shared_state->local_node_id = local_node_id;
} }
/* only update if state file valid */
if (stored_node_id == shared_state->local_node_id)
{
if (paused == 0)
{
shared_state->repmgrd_paused = false;
}
else if (paused == 1)
{
shared_state->repmgrd_paused = true;
}
}
LWLockRelease(shared_state->lock); LWLockRelease(shared_state->lock);
PG_RETURN_VOID(); PG_RETURN_VOID();
@@ -492,191 +416,9 @@ unset_bdr_failover_handler(PG_FUNCTION_ARGS)
LWLockAcquire(shared_state->lock, LW_EXCLUSIVE); LWLockAcquire(shared_state->lock, LW_EXCLUSIVE);
shared_state->bdr_failover_handler = UNKNOWN_NODE_ID; shared_state->bdr_failover_handler = UNKNOWN_NODE_ID;
}
LWLockRelease(shared_state->lock); LWLockRelease(shared_state->lock);
}
PG_RETURN_VOID(); PG_RETURN_VOID();
} }
/*
* Returns the repmgrd pid; or NULL if none set; or -1 if set but repmgrd
* process not running (TODO!)
*/
Datum
get_repmgrd_pid(PG_FUNCTION_ARGS)
{
int repmgrd_pid = UNKNOWN_PID;
if (!shared_state)
PG_RETURN_NULL();
LWLockAcquire(shared_state->lock, LW_SHARED);
repmgrd_pid = shared_state->repmgrd_pid;
LWLockRelease(shared_state->lock);
PG_RETURN_INT32(repmgrd_pid);
}
/*
* Returns the repmgrd pidfile
*/
Datum
get_repmgrd_pidfile(PG_FUNCTION_ARGS)
{
char repmgrd_pidfile[MAXPGPATH];
if (!shared_state)
PG_RETURN_NULL();
memset(repmgrd_pidfile, 0, MAXPGPATH);
LWLockAcquire(shared_state->lock, LW_SHARED);
strncpy(repmgrd_pidfile, shared_state->repmgrd_pidfile, MAXPGPATH);
LWLockRelease(shared_state->lock);
if (repmgrd_pidfile[0] == '\0')
PG_RETURN_NULL();
PG_RETURN_TEXT_P(cstring_to_text(repmgrd_pidfile));
}
Datum
set_repmgrd_pid(PG_FUNCTION_ARGS)
{
int repmgrd_pid = UNKNOWN_PID;
char *repmgrd_pidfile = NULL;
if (!shared_state)
PG_RETURN_VOID();
if (PG_ARGISNULL(0))
{
repmgrd_pid = UNKNOWN_PID;
}
else
{
repmgrd_pid = PG_GETARG_INT32(0);
}
elog(DEBUG3, "set_repmgrd_pid(): provided pid is %i", repmgrd_pid);
if (repmgrd_pid != UNKNOWN_PID && !PG_ARGISNULL(1))
{
repmgrd_pidfile = text_to_cstring(PG_GETARG_TEXT_PP(1));
elog(INFO, "set_repmgrd_pid(): provided pidfile is %s", repmgrd_pidfile);
}
LWLockAcquire(shared_state->lock, LW_EXCLUSIVE);
shared_state->repmgrd_pid = repmgrd_pid;
memset(shared_state->repmgrd_pidfile, 0, MAXPGPATH);
if(repmgrd_pidfile != NULL)
{
strncpy(shared_state->repmgrd_pidfile, repmgrd_pidfile, MAXPGPATH);
}
LWLockRelease(shared_state->lock);
PG_RETURN_VOID();
}
Datum
repmgrd_is_running(PG_FUNCTION_ARGS)
{
int repmgrd_pid = UNKNOWN_PID;
int kill_ret;
if (!shared_state)
PG_RETURN_NULL();
LWLockAcquire(shared_state->lock, LW_SHARED);
repmgrd_pid = shared_state->repmgrd_pid;
LWLockRelease(shared_state->lock);
/* No PID registered - assume not running */
if (repmgrd_pid == UNKNOWN_PID)
{
PG_RETURN_BOOL(false);
}
kill_ret = kill(repmgrd_pid, 0);
if (kill_ret == 0)
{
PG_RETURN_BOOL(true);
}
PG_RETURN_BOOL(false);
}
Datum
repmgrd_pause(PG_FUNCTION_ARGS)
{
bool pause;
FILE *file = NULL;
StringInfoData buf;
if (!shared_state)
PG_RETURN_NULL();
if (PG_ARGISNULL(0))
PG_RETURN_NULL();
pause = PG_GETARG_BOOL(0);
LWLockAcquire(shared_state->lock, LW_EXCLUSIVE);
shared_state->repmgrd_paused = pause;
LWLockRelease(shared_state->lock);
/* write state to file */
file = AllocateFile(REPMGRD_STATE_FILE, PG_BINARY_W);
if (file == NULL)
{
elog(DEBUG1, "unable to allocate %s", REPMGRD_STATE_FILE);
// XXX anything else we can do? log?
PG_RETURN_VOID();
}
elog(DEBUG1, "allocated");
initStringInfo(&buf);
LWLockAcquire(shared_state->lock, LW_SHARED);
appendStringInfo(&buf, "%i:%i",
shared_state->local_node_id,
pause ? 1 : 0);
LWLockRelease(shared_state->lock);
// XXX check success
fwrite(buf.data, strlen(buf.data) + 1, 1, file);
resetStringInfo(&buf);
FreeFile(file);
PG_RETURN_VOID();
}
Datum
repmgrd_is_paused(PG_FUNCTION_ARGS)
{
bool is_paused;
if (!shared_state)
PG_RETURN_NULL();
LWLockAcquire(shared_state->lock, LW_SHARED);
is_paused = shared_state->repmgrd_paused;
LWLockRelease(shared_state->lock);
PG_RETURN_BOOL(is_paused);
}
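
The `set_local_node_id()` and `repmgrd_pause()` hunks above persist the pause flag as a single `node_id:paused` line and only apply it when the stored node ID matches the local one. A minimal sketch of that round trip in plain C, for illustration only: `format_state` and `parse_state` are hypothetical names, not repmgr functions, and the sketch uses `%d` rather than the `%i` seen in the diff so that a leading `0` is not read as octal.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "<node_id>:<paused>" into buf.
 * Returns false if the buffer is too small for the whole line.
 */
bool
format_state(char *buf, size_t buf_size, int node_id, bool paused)
{
	int			written = snprintf(buf, buf_size, "%d:%d", node_id, paused ? 1 : 0);

	return written > 0 && (size_t) written < buf_size;
}

/*
 * Hypothetical helper: parse a "<node_id>:<paused>" line.
 * Accepts only 0 or 1 for the paused flag; leaves the outputs
 * untouched when the line is not valid.
 */
bool
parse_state(const char *line, int *node_id, bool *paused)
{
	int			id = 0;
	int			flag = -1;

	if (sscanf(line, "%d:%d", &id, &flag) != 2)
		return false;

	if (flag != 0 && flag != 1)
		return false;

	*node_id = id;
	*paused = (flag == 1);
	return true;
}
```

A caller would compare the parsed node ID with its own before trusting the flag, as the diff does with `stored_node_id == shared_state->local_node_id`.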


@@ -5,13 +5,7 @@
# Some configuration items will be set with a default value; this # Some configuration items will be set with a default value; this
# is noted for each item. Where no default value is shown, the # is noted for each item. Where no default value is shown, the
# parameter will be treated as empty or false. # parameter will be treated as empty or false.
#
# IMPORTANT: string values can be provided as-is, or enclosed in single quotes
# (but not double-quotes, which will be interpreted as part of the string), e.g.:
#
# node_name=foo
# node_name = 'foo'
#
# ============================================================================= # =============================================================================
# Required configuration items # Required configuration items
# ============================================================================= # =============================================================================
@@ -104,7 +98,7 @@
#log_facility=STDERR # Logging facility: possible values are STDERR, or for #log_facility=STDERR # Logging facility: possible values are STDERR, or for
# syslog integration, one of LOCAL0, LOCAL1, ..., LOCAL7, USER # syslog integration, one of LOCAL0, LOCAL1, ..., LOCAL7, USER
#log_file='' # STDERR can be redirected to an arbitrary file #log_file='' # stderr can be redirected to an arbitrary file
#log_status_interval=300 # interval (in seconds) for repmgrd to log a status message #log_status_interval=300 # interval (in seconds) for repmgrd to log a status message
@@ -149,15 +143,6 @@
# Debian/Ubuntu users: you will probably need to # Debian/Ubuntu users: you will probably need to
# set this to the directory where `pg_ctl` is located, # set this to the directory where `pg_ctl` is located,
# e.g. /usr/lib/postgresql/9.6/bin/ # e.g. /usr/lib/postgresql/9.6/bin/
#
# *NOTE* "pg_bindir" is only used when repmgr directly
# executes PostgreSQL binaries; any user-defined scripts
# *must* be specified with the full path
#repmgr_bindir='' # Path to repmgr binary directory (location of the repmgr
# binary. Only needed if the repmgr executable is not in
# the system $PATH or the path defined in "pg_bindir".
#use_primary_conninfo_password=false # explicitly set "password" in recovery.conf's #use_primary_conninfo_password=false # explicitly set "password" in recovery.conf's
# "primary_conninfo" parameter using the value contained # "primary_conninfo" parameter using the value contained
# in the environment variable PGPASSWORD # in the environment variable PGPASSWORD
@@ -171,7 +156,7 @@
# Examples: # Examples:
# #
# pg_ctl_options='-s' # pg_ctl_options='-s'
# pg_basebackup_options='--label=repmgr_backup' # pg_basebackup_options='--label=repmgr_backup
# rsync_options=--archive --checksum --compress --progress --rsh="ssh -o \"StrictHostKeyChecking no\"" # rsync_options=--archive --checksum --compress --progress --rsh="ssh -o \"StrictHostKeyChecking no\""
# ssh_options=-o "StrictHostKeyChecking no" # ssh_options=-o "StrictHostKeyChecking no"
@@ -222,7 +207,7 @@ ssh_options='-q -o ConnectTimeout=10' # Options to append to "ssh"
#------------------------------------------------------------------------------ #------------------------------------------------------------------------------
# "standby follow" settings # Standby follow settings
#------------------------------------------------------------------------------ #------------------------------------------------------------------------------
# These settings apply when instructing a standby to follow the new primary # These settings apply when instructing a standby to follow the new primary
@@ -234,30 +219,6 @@ ssh_options='-q -o ConnectTimeout=10' # Options to append to "ssh"
# for the standby to connect to the primary # for the standby to connect to the primary
#------------------------------------------------------------------------------
# "standby switchover" settings
#------------------------------------------------------------------------------
# These settings apply when switching roles between a primary and a standby
# ("repmgr standby switchover").
#shutdown_check_timeout=60 # The max length of time (in seconds) to wait for the demotion
# candidate (current primary) to shut down
#standby_reconnect_timeout=60 # The max length of time (in seconds) to wait
# for the demoted standby to reconnect to the promoted
# primary (note: this value should be equal to or greater
# than that set for "node_rejoin_timeout")
#------------------------------------------------------------------------------
# "node rejoin" settings
#------------------------------------------------------------------------------
# These settings apply when reintegrating a node into a replication cluster
# with "repmgrd_node_rejoin"
#node_rejoin_timeout=60 # The maximum length of time (in seconds) to wait for
# the node to reconnect to the replication cluster
#------------------------------------------------------------------------------ #------------------------------------------------------------------------------
# Barman options # Barman options
#------------------------------------------------------------------------------ #------------------------------------------------------------------------------
@@ -275,11 +236,6 @@ ssh_options='-q -o ConnectTimeout=10' # Options to append to "ssh"
# These settings are only applied when repmgrd is running. Values shown # These settings are only applied when repmgrd is running. Values shown
# are defaults. # are defaults.
#repmgrd_pid_file= # Path of PID file to use for repmgrd; if not set, a PID file will
# be generated in a temporary directory specified by the environment
# variable $TMPDIR, or if not set, in "/tmp". This value can be overridden
# by the command line option "-p/--pid-file"; the command line option
# "--no-pid-file" will force PID file creation to be skipped.
#failover=manual # one of 'automatic', 'manual'. #failover=manual # one of 'automatic', 'manual'.
# determines what action to take in the event of upstream failure # determines what action to take in the event of upstream failure
# #
@@ -289,11 +245,11 @@ ssh_options='-q -o ConnectTimeout=10' # Options to append to "ssh"
# manual attention to reattach it to replication # manual attention to reattach it to replication
# (does not apply to BDR mode) # (does not apply to BDR mode)
#priority=100 # indicate a preferred priority for promoting nodes; #priority=100 # indicate a preferred priorty for promoting nodes;
# a value of zero prevents the node being promoted to primary # a value of zero prevents the node being promoted to primary
# (default: 100) # (default: 100)
#reconnect_attempts=6 # Number of attempts which will be made to reconnect to an unreachable #reconnect_attempts=6 # Number attempts which will be made to reconnect to an unreachable
# primary (or other upstream node) # primary (or other upstream node)
#reconnect_interval=10 # Interval between attempts to reconnect to an unreachable #reconnect_interval=10 # Interval between attempts to reconnect to an unreachable
# primary (or other upstream node) # primary (or other upstream node)
@@ -309,9 +265,8 @@ ssh_options='-q -o ConnectTimeout=10' # Options to append to "ssh"
#primary_notification_timeout=60 # Interval (in seconds) which repmgrd on a standby #primary_notification_timeout=60 # Interval (in seconds) which repmgrd on a standby
# will wait for a notification from the new primary, # will wait for a notification from the new primary,
# before falling back to degraded monitoring # before falling back to degraded monitoring
#repmgrd_standby_startup_timeout=60 # Interval (in seconds) which repmgrd on a standby will wait #standby_reconnect_timeout=60 # Interval (in seconds) which repmgrd on a standby will wait
# for the the local node to restart and become ready to accept connections after # to reconnect to the local node after executing "follow_command"
# executing "follow_command" (defaults to the value set in "standby_reconnect_timeout")
#monitoring_history=no # Whether to write monitoring data to the "montoring_history" table #monitoring_history=no # Whether to write monitoring data to the "montoring_history" table
#monitor_interval_secs=2 # Interval (in seconds) at which to write monitoring data #monitor_interval_secs=2 # Interval (in seconds) at which to write monitoring data
@@ -349,7 +304,7 @@ ssh_options='-q -o ConnectTimeout=10' # Options to append to "ssh"
# #
# Debian/Ubuntu users: use "sudo pg_ctlcluster" to execute service control commands. # Debian/Ubuntu users: use "sudo pg_ctlcluster" to execute service control commands.
# #
# For more details, see: https://repmgr.org/docs/4.1/configuration-service-commands.html # For more details, see: https://repmgr.org/docs/4.0/configuration-service-commands.html
#service_start_command = '' #service_start_command = ''
#service_stop_command = '' #service_stop_command = ''
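
The quoting note near the top of this sample file says a string value may be given as-is or in single quotes, while double quotes are treated as part of the string. A minimal sketch of that rule, assuming a hypothetical helper `unquote_value` (this is not repmgr's actual parser, only an illustration of the stated behaviour):

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy "value" into "out", removing one pair of
 * enclosing single quotes if present. Double quotes are left in place,
 * so they end up as part of the resulting string.
 *
 * Returns 0 on success, -1 if "out" is too small.
 */
int
unquote_value(const char *value, char *out, size_t out_size)
{
	size_t		len = strlen(value);
	const char *start = value;

	if (len >= 2 && value[0] == '\'' && value[len - 1] == '\'')
	{
		start = value + 1;
		len -= 2;
	}

	if (len + 1 > out_size)
		return -1;

	memcpy(out, start, len);
	out[len] = '\0';
	return 0;
}
```

Under this sketch, `node_name=foo` and `node_name = 'foo'` both yield `foo`, whereas `node_name="foo"` keeps its double quotes.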


@@ -1,6 +1,6 @@
# repmgr extension # repmgr extension
comment = 'Replication manager for PostgreSQL' comment = 'Replication manager for PostgreSQL'
default_version = '4.2' default_version = '4.0'
module_pathname = '$libdir/repmgr' module_pathname = '$libdir/repmgr'
relocatable = false relocatable = false
schema = repmgr schema = repmgr


@@ -49,11 +49,8 @@
#define REPLICATION_TYPE_BDR 2 #define REPLICATION_TYPE_BDR 2
#define UNKNOWN_SERVER_VERSION_NUM -1 #define UNKNOWN_SERVER_VERSION_NUM -1
#define UNKNOWN_BDR_VERSION_NUM -1
#define UNKNOWN_TIMELINE_ID -1 #define UNKNOWN_TIMELINE_ID -1
#define UNKNOWN_SYSTEM_IDENTIFIER 0 #define UNKNOWN_SYSTEM_IDENTIFIER 0
#define UNKNOWN_PID -1
#define NODE_NOT_FOUND -1 #define NODE_NOT_FOUND -1
#define NO_UPSTREAM_NODE -1 #define NO_UPSTREAM_NODE -1
@@ -61,8 +58,6 @@
#define VOTING_TERM_NOT_SET -1 #define VOTING_TERM_NOT_SET -1
#define BDR2_REPLICATION_SET_NAME "repmgr"
/* /*
* various default values - ensure repmgr.conf.sample is update * various default values - ensure repmgr.conf.sample is update
* if any of these are changed * if any of these are changed
@@ -85,9 +80,7 @@
#define DEFAULT_WAIT_START 30 /* seconds */ #define DEFAULT_WAIT_START 30 /* seconds */
#define DEFAULT_PROMOTE_CHECK_TIMEOUT 60 /* seconds */ #define DEFAULT_PROMOTE_CHECK_TIMEOUT 60 /* seconds */
#define DEFAULT_PROMOTE_CHECK_INTERVAL 1 /* seconds */ #define DEFAULT_PROMOTE_CHECK_INTERVAL 1 /* seconds */
#define DEFAULT_SHUTDOWN_CHECK_TIMEOUT 60 /* seconds */
#define DEFAULT_STANDBY_RECONNECT_TIMEOUT 60 /* seconds */ #define DEFAULT_STANDBY_RECONNECT_TIMEOUT 60 /* seconds */
#define DEFAULT_NODE_REJOIN_TIMEOUT 60 /* seconds */
#ifndef RECOVERY_COMMAND_FILE #ifndef RECOVERY_COMMAND_FILE
#define RECOVERY_COMMAND_FILE "recovery.conf" #define RECOVERY_COMMAND_FILE "recovery.conf"


@@ -1,2 +1,3 @@
#define REPMGR_VERSION_DATE "" #define REPMGR_VERSION_DATE ""
#define REPMGR_VERSION "4.2" #define REPMGR_VERSION "4.0.6"


@@ -150,13 +150,7 @@ monitor_bdr(void)
* retrieve list of all nodes - we'll need these if the DB connection goes * retrieve list of all nodes - we'll need these if the DB connection goes
* away * away
*/ */
if (get_all_node_records(local_conn, &nodes) == false) get_all_node_records(local_conn, &nodes);
{
/* get_all_node_records() will display the error */
PQfinish(local_conn);
exit(ERR_BAD_CONFIG);
}
/* we're expecting all (both) nodes to be up */ /* we're expecting all (both) nodes to be up */
for (cell = nodes.head; cell; cell = cell->next) for (cell = nodes.head; cell; cell = cell->next)
@@ -220,8 +214,7 @@ monitor_bdr(void)
log_warning(_("unable to connect to node %s (ID %i)"), log_warning(_("unable to connect to node %s (ID %i)"),
cell->node_info->node_name, cell->node_info->node_id); cell->node_info->node_name, cell->node_info->node_id);
//cell->node_info->conn = try_reconnect(cell->node_info); cell->node_info->conn = try_reconnect(cell->node_info);
try_reconnect(&cell->node_info->conn, cell->node_info);
/* node has recovered - log and continue */ /* node has recovered - log and continue */
if (cell->node_info->node_status == NODE_STATUS_UP) if (cell->node_info->node_status == NODE_STATUS_UP)
@@ -300,7 +293,7 @@ loop:
/* /*
* if we can reload, then could need to change local_conn * if we can reload, then could need to change local_conn
*/ */
if (reload_config(&config_file_options, BDR)) if (reload_config(&config_file_options))
{ {
PQfinish(local_conn); PQfinish(local_conn);
local_conn = establish_db_connection(config_file_options.conninfo, true); local_conn = establish_db_connection(config_file_options.conninfo, true);
@@ -310,12 +303,11 @@ loop:
got_SIGHUP = false; got_SIGHUP = false;
} }
/* XXX this looks like it will never be called */
if (got_SIGHUP) if (got_SIGHUP)
{ {
log_debug("SIGHUP received"); log_debug("SIGHUP received");
if (reload_config(&config_file_options, BDR)) if (reload_config(&config_file_options))
{ {
PQfinish(local_conn); PQfinish(local_conn);
local_conn = establish_db_connection(config_file_options.conninfo, true); local_conn = establish_db_connection(config_file_options.conninfo, true);

File diff suppressed because it is too large

176
repmgrd.c

@@ -35,10 +35,8 @@
static char *config_file = NULL; static char *config_file = NULL;
static bool verbose = false; static bool verbose = false;
char pid_file[MAXPGPATH]; static char *pid_file = NULL;
static bool daemonize = true; static bool daemonize = false;
static bool show_pid_file = false;
static bool no_pid_file = false;
t_configuration_options config_file_options = T_CONFIGURATION_OPTIONS_INITIALIZER; t_configuration_options config_file_options = T_CONFIGURATION_OPTIONS_INITIALIZER;
@@ -88,7 +86,6 @@ main(int argc, char **argv)
RecordStatus record_status; RecordStatus record_status;
ExtensionStatus extension_status = REPMGR_UNKNOWN; ExtensionStatus extension_status = REPMGR_UNKNOWN;
t_extension_versions extversions = T_EXTENSION_VERSIONS_INITIALIZER;
FILE *fd; FILE *fd;
@@ -102,11 +99,8 @@ main(int argc, char **argv)
{"config-file", required_argument, NULL, 'f'}, {"config-file", required_argument, NULL, 'f'},
/* daemon options */ /* daemon options */
{"daemonize-short", optional_argument, NULL, 'd'}, {"daemonize", no_argument, NULL, 'd'},
{"daemonize", optional_argument, NULL, OPT_DAEMONIZE},
{"pid-file", required_argument, NULL, 'p'}, {"pid-file", required_argument, NULL, 'p'},
{"show-pid-file", no_argument, NULL, 's'},
{"no-pid-file", no_argument, NULL, OPT_NO_PID_FILE},
/* logging options */ /* logging options */
{"log-level", required_argument, NULL, 'L'}, {"log-level", required_argument, NULL, 'L'},
@@ -119,6 +113,8 @@ main(int argc, char **argv)
set_progname(argv[0]); set_progname(argv[0]);
srand(time(NULL));
/* Disallow running as root */ /* Disallow running as root */
if (geteuid() == 0) if (geteuid() == 0)
{ {
@@ -132,10 +128,6 @@ main(int argc, char **argv)
exit(1); exit(1);
} }
srand(time(NULL));
memset(pid_file, 0, MAXPGPATH);
while ((c = getopt_long(argc, argv, "?Vf:L:vdp:m", long_options, &optindex)) != -1) while ((c = getopt_long(argc, argv, "?Vf:L:vdp:m", long_options, &optindex)) != -1)
{ {
switch (c) switch (c)
@@ -180,20 +172,8 @@ main(int argc, char **argv)
daemonize = true; daemonize = true;
break; break;
case OPT_DAEMONIZE:
daemonize = parse_bool(optarg, "-d/--daemonize", &cli_errors);
break;
case 'p': case 'p':
strncpy(pid_file, optarg, MAXPGPATH); pid_file = optarg;
break;
case 's':
show_pid_file = true;
break;
case OPT_NO_PID_FILE:
no_pid_file = true;
break; break;
/* logging options */ /* logging options */
@@ -240,7 +220,7 @@ main(int argc, char **argv)
/* Exit here already if errors in command line options found */ /* Exit here already if errors in command line options found */
if (cli_errors.head != NULL) if (cli_errors.head != NULL)
{ {
exit_with_cli_errors(&cli_errors, NULL); exit_with_cli_errors(&cli_errors);
} }
startup_event_logged = false; startup_event_logged = false;
@@ -259,58 +239,6 @@ main(int argc, char **argv)
*/ */
load_config(config_file, verbose, false, &config_file_options, argv[0]); load_config(config_file, verbose, false, &config_file_options, argv[0]);
/* Determine pid file location, unless --no-pid-file supplied */
if (no_pid_file == false)
{
if (config_file_options.repmgrd_pid_file[0] != '\0')
{
if (pid_file[0] != '\0')
{
log_warning(_("\"repmgrd_pid_file\" will be overridden by --pid-file"));
}
else
{
strncpy(pid_file, config_file_options.repmgrd_pid_file, MAXPGPATH);
}
}
/* no pid file provided - determine location */
if (pid_file[0] == '\0')
{
/* packagers: if feasible, patch PID file path into "package_pid_file" */
char package_pid_file[MAXPGPATH] = "";
if (package_pid_file[0] != '\0')
{
maxpath_snprintf(pid_file, "%s", package_pid_file);
}
else
{
const char *tmpdir = getenv("TMPDIR");
if (!tmpdir)
tmpdir = "/tmp";
maxpath_snprintf(pid_file, "%s/repmgrd.pid", tmpdir);
}
}
}
else
{
/* --no-pid-file supplied - overwrite any value provided with --pid-file ... */
memset(pid_file, 0, MAXPGPATH);
}
/* If --show-pid-file supplied, output the location (if set) and exit */
if (show_pid_file == true)
{
printf("%s\n", pid_file);
exit(SUCCESS);
}
/* Some configuration file items can be overriden by command line options */ /* Some configuration file items can be overriden by command line options */
@@ -323,6 +251,8 @@ main(int argc, char **argv)
strncpy(config_file_options.log_level, cli_log_level, MAXLEN); strncpy(config_file_options.log_level, cli_log_level, MAXLEN);
} }
log_notice(_("repmgrd (repmgr %s) starting up"), REPMGR_VERSION);
/* /*
* -m/--monitoring-history, if provided, will override repmgr.conf's * -m/--monitoring-history, if provided, will override repmgr.conf's
* monitoring_history; this is for backwards compatibility as it's * monitoring_history; this is for backwards compatibility as it's
@@ -350,8 +280,6 @@ main(int argc, char **argv)
logger_init(&config_file_options, progname()); logger_init(&config_file_options, progname());
log_notice(_("repmgrd (%s %s) starting up"), progname(), REPMGR_VERSION);
if (verbose) if (verbose)
logger_set_verbose(); logger_set_verbose();
@@ -390,7 +318,7 @@ main(int argc, char **argv)
*/ */
/* Check "repmgr" the extension is installed */ /* Check "repmgr" the extension is installed */
extension_status = get_repmgr_extension_status(local_conn, &extversions); extension_status = get_repmgr_extension_status(local_conn);
if (extension_status != REPMGR_INSTALLED) if (extension_status != REPMGR_INSTALLED)
{ {
@@ -403,17 +331,6 @@ main(int argc, char **argv)
exit(ERR_DB_QUERY); exit(ERR_DB_QUERY);
} }
if (extension_status == REPMGR_OLD_VERSION_INSTALLED)
{
log_error(_("an older version of the \"repmgr\" extension is installed"));
log_detail(_("version %s is installed but newer version %s is available"),
extversions.installed_version,
extversions.default_version);
log_hint(_("verify the repmgr installation is updated properly before continuing"));
}
else
{
log_error(_("repmgr extension not found on this node")); log_error(_("repmgr extension not found on this node"));
if (extension_status == REPMGR_AVAILABLE) if (extension_status == REPMGR_AVAILABLE)
@@ -427,8 +344,6 @@ main(int argc, char **argv)
} }
log_hint(_("check that this node is part of a repmgr cluster")); log_hint(_("check that this node is part of a repmgr cluster"));
}
close_connection(&local_conn); close_connection(&local_conn);
exit(ERR_BAD_CONFIG); exit(ERR_BAD_CONFIG);
} }
@@ -499,14 +414,11 @@ main(int argc, char **argv)
daemonize_process(); daemonize_process();
} }
if (pid_file[0] != '\0') if (pid_file != NULL)
{ {
check_and_create_pid_file(pid_file); check_and_create_pid_file(pid_file);
} }
repmgrd_set_pid(local_conn, getpid(), pid_file);
#ifndef WIN32 #ifndef WIN32
setup_event_handlers(); setup_event_handlers();
#endif #endif
@@ -757,8 +669,6 @@ show_help(void)
{ {
printf(_("%s: replication management daemon for PostgreSQL\n"), progname()); printf(_("%s: replication management daemon for PostgreSQL\n"), progname());
puts(""); puts("");
printf(_("%s monitors a cluster of servers and optionally performs failover.\n"), progname());
puts("");
printf(_("Usage:\n")); printf(_("Usage:\n"));
printf(_(" %s [OPTIONS]\n"), progname()); printf(_(" %s [OPTIONS]\n"), progname());
@@ -778,22 +688,19 @@ show_help(void)
puts(""); puts("");
printf(_("Daemon configuration options:\n")); printf(_("General configuration options:\n"));
printf(_(" -d\n")); printf(_(" -d, --daemonize detach process from foreground\n"));
printf(_(" --daemonize[=true/false]\n")); printf(_(" -p, --pid-file=PATH write a PID file\n"));
printf(_(" detach process from foreground (default: true)\n"));
printf(_(" -p, --pid-file=PATH use the specified PID file\n"));
printf(_(" -s, --show-pid-file show PID file which would be used by the current configuration\n"));
printf(_(" --no-pid-file don't write a PID file\n"));
puts(""); puts("");
printf(_("%s monitors a cluster of servers and optionally performs failover.\n"), progname());
} }
void PGconn *
try_reconnect(PGconn **conn, t_node_info *node_info) try_reconnect(t_node_info *node_info)
{ {
PGconn *our_conn; PGconn *conn;
t_conninfo_param_list conninfo_params = T_CONNINFO_PARAM_LIST_INITIALIZER; t_conninfo_param_list conninfo_params = T_CONNINFO_PARAM_LIST_INITIALIZER;
int i; int i;
@@ -802,6 +709,7 @@ try_reconnect(PGconn **conn, t_node_info *node_info)
initialize_conninfo_params(&conninfo_params, false); initialize_conninfo_params(&conninfo_params, false);
/* we assume by now the conninfo string is parseable */ /* we assume by now the conninfo string is parseable */
(void) parse_conninfo_string(node_info->conninfo, &conninfo_params, NULL, false); (void) parse_conninfo_string(node_info->conninfo, &conninfo_params, NULL, false);
@@ -824,47 +732,18 @@ try_reconnect(PGconn **conn, t_node_info *node_info)
* degraded monitoring? - make that configurable * degraded monitoring? - make that configurable
*/ */
our_conn = establish_db_connection_by_params(&conninfo_params, false); conn = establish_db_connection_by_params(&conninfo_params, false);
if (PQstatus(our_conn) == CONNECTION_OK) if (PQstatus(conn) == CONNECTION_OK)
{ {
free_conninfo_params(&conninfo_params); free_conninfo_params(&conninfo_params);
log_info(_("connection to node %i succeeded"), node_info->node_id);
if (PQstatus(*conn) == CONNECTION_BAD)
{
log_verbose(LOG_INFO, "original connection handle returned CONNECTION_BAD, using new connection");
close_connection(conn);
*conn = our_conn;
}
else
{
ExecStatusType ping_result;
ping_result = connection_ping(*conn);
if (ping_result != PGRES_TUPLES_OK)
{
log_info("original connnection no longer available, using new connection");
close_connection(conn);
*conn = our_conn;
}
else
{
log_info(_("original connection is still available"));
PQfinish(our_conn);
}
}
node_info->node_status = NODE_STATUS_UP; node_info->node_status = NODE_STATUS_UP;
return conn;
return;
} }
close_connection(&our_conn); close_connection(&conn);
log_notice(_("unable to reconnect to node %i"), node_info->node_id); log_notice(_("unable to reconnect to node"));
} }
if (i + 1 < max_attempts) if (i + 1 < max_attempts)
@@ -883,7 +762,7 @@ try_reconnect(PGconn **conn, t_node_info *node_info)
free_conninfo_params(&conninfo_params); free_conninfo_params(&conninfo_params);
return; return NULL;
} }
@@ -921,12 +800,9 @@ print_monitoring_state(MonitoringState monitoring_state)
void void
terminate(int retval) terminate(int retval)
{ {
if (PQstatus(local_conn) == CONNECTION_OK)
repmgrd_set_pid(local_conn, UNKNOWN_PID, NULL);
logger_shutdown(); logger_shutdown();
if (pid_file[0] != '\0') if (pid_file)
{ {
unlink(pid_file); unlink(pid_file);
} }
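
Both variants of `try_reconnect()` above loop over a bounded number of attempts and only pause between attempts (`if (i + 1 < max_attempts)`), which corresponds to the `reconnect_attempts` and `reconnect_interval` settings. A minimal sketch of that loop shape, with the connection attempt abstracted into a callback; `attempt_fn`, `retry_attempts` and `succeed_after` are hypothetical names used for illustration and do not appear in repmgr:

```c
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical callback type: returns true once an attempt succeeds. */
typedef bool (*attempt_fn) (void *ctx);

/*
 * Call "attempt" up to max_attempts times, sleeping interval_secs
 * between failed attempts (but not after the final one).
 *
 * Returns the 1-based number of the successful attempt, or 0 if
 * every attempt failed.
 */
int
retry_attempts(attempt_fn attempt, void *ctx, int max_attempts,
			   unsigned int interval_secs)
{
	int			i;

	for (i = 0; i < max_attempts; i++)
	{
		if (attempt(ctx))
			return i + 1;

		if (i + 1 < max_attempts && interval_secs > 0)
			sleep(interval_secs);
	}

	return 0;
}

/* Demonstration callback: fails until the counter reaches zero. */
bool
succeed_after(void *ctx)
{
	int		   *remaining_failures = ctx;

	if (*remaining_failures == 0)
		return true;

	(*remaining_failures)--;
	return false;
}
```

With the sample defaults (`reconnect_attempts=6`, `reconnect_interval=10`) a loop of this shape would give up after roughly 50 seconds of waiting, since no sleep follows the last attempt.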


@@ -10,9 +10,6 @@
#include <time.h> #include <time.h>
#include "portability/instr_time.h" #include "portability/instr_time.h"
#define OPT_NO_PID_FILE 1000
#define OPT_DAEMONIZE 1001
extern volatile sig_atomic_t got_SIGHUP; extern volatile sig_atomic_t got_SIGHUP;
extern MonitoringState monitoring_state; extern MonitoringState monitoring_state;
extern instr_time degraded_monitoring_start; extern instr_time degraded_monitoring_start;
@@ -21,15 +18,12 @@ extern t_configuration_options config_file_options;
extern t_node_info local_node_info; extern t_node_info local_node_info;
extern PGconn *local_conn; extern PGconn *local_conn;
extern bool startup_event_logged; extern bool startup_event_logged;
extern char pid_file[MAXPGPATH];
void try_reconnect(PGconn **conn, t_node_info *node_info); PGconn *try_reconnect(t_node_info *node_info);
int calculate_elapsed(instr_time start_time); int calculate_elapsed(instr_time start_time);
const char *print_monitoring_state(MonitoringState monitoring_state); const char *print_monitoring_state(MonitoringState monitoring_state);
void update_registration(PGconn *conn); void update_registration(PGconn *conn);
void terminate(int retval); void terminate(int retval);
#endif /* _REPMGRD_H_ */ #endif /* _REPMGRD_H_ */


@@ -87,17 +87,17 @@ append_where_clause(PQExpBufferData *where_clause, const char *format,...)
if (where_clause->data[0] == '\0') if (where_clause->data[0] == '\0')
{ {
appendPQExpBufferStr(where_clause, appendPQExpBuffer(where_clause,
" WHERE "); " WHERE ");
} }
else else
{ {
appendPQExpBufferStr(where_clause, appendPQExpBuffer(where_clause,
" AND "); " AND ");
} }
appendPQExpBufferStr(where_clause, appendPQExpBuffer(where_clause,
stringbuf); "%s", stringbuf);
} }
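
The `append_where_clause()` hunk above switches between `appendPQExpBufferStr(buf, str)` and `appendPQExpBuffer(buf, "%s", str)`. Both append the string verbatim; the point of either form is that a non-literal string is never used as the format itself. A minimal sketch of the distinction, using a hypothetical fixed-size `small_buf` as a stand-in for `PQExpBufferData` so the example compiles without libpq:

```c
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-size buffer standing in for PQExpBufferData. */
typedef struct
{
	char		data[128];
	size_t		len;
} small_buf;

void
small_buf_init(small_buf *buf)
{
	buf->data[0] = '\0';
	buf->len = 0;
}

/* Append verbatim: '%' has no special meaning. Truncates when full. */
void
small_buf_append_str(small_buf *buf, const char *str)
{
	size_t		space = sizeof(buf->data) - buf->len - 1;
	size_t		n = strlen(str);

	if (n > space)
		n = space;

	memcpy(buf->data + buf->len, str, n);
	buf->len += n;
	buf->data[buf->len] = '\0';
}

/* Append printf-style: "fmt" should always be a literal format. */
void
small_buf_append_fmt(small_buf *buf, const char *fmt,...)
{
	size_t		space = sizeof(buf->data) - buf->len;
	va_list		args;
	int			written;

	va_start(args, fmt);
	written = vsnprintf(buf->data + buf->len, space, fmt, args);
	va_end(args);

	if (written < 0)
	{
		/* formatting failed: keep the previous contents */
		buf->data[buf->len] = '\0';
		return;
	}

	/* on truncation vsnprintf stored space - 1 characters */
	buf->len += ((size_t) written < space) ? (size_t) written : space - 1;
}
```

Passing a clause such as `node_name LIKE 'n%'` through `small_buf_append_fmt(&buf, "%s", clause)` or `small_buf_append_str(&buf, clause)` keeps the `%` intact; passing it as the format argument would not be safe.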