Compare commits

..

345 Commits

Author SHA1 Message Date
Ian Barwick
ee1a6f9d0f doc: add a link to the current documentation from the contents page 2019-04-03 10:48:36 +09:00
Ian Barwick
49eb408873 doc: fix typo
Per user report on mailing list.
2018-10-23 09:01:00 +09:00
Ian Barwick
fba3d29514 doc: clarify BDR repmgrd configuration
Link directly to section about configuring the "event_notification_command".
2018-07-23 13:23:28 +09:00
Ian Barwick
77200e5030 doc: remove duplicate item in list of event notifications 2018-07-18 16:11:18 +09:00
Ian Barwick
4589b8d439 doc: update documentation of "promote_command" and "service_promote_command"
See commit 63242e2277
2018-07-16 14:55:07 +09:00
Ian Barwick
048f7c3310 doc: add extra emphasis about not running repmgrd during switchover
One day this will no longer be an issue, until then let's hope the
fine documentation is read.
2018-07-11 09:55:37 +09:00
Ian Barwick
1e5f63792f node check: implement CSV output
This is advertised in the --help output and placeholder code was in
place, but it wasn't actually implemented.
2018-06-22 15:46:50 +09:00
Ian Barwick
d26989bd12 node status: improve output and documentation
In the default text output mode, list inactive slots.

In CSV output mode, list inactive slots as additional information;
add output line with number of missing slots and a list thereof.

Also document --csv output mode.
2018-06-22 15:46:44 +09:00
Ian Barwick
f999c810a7 node check: clarify status information for witness server
Previously the output gave the impression the server was a primary,
which is technically the case, but it's not the actual cluster primary.

Also output an error if the node is in recovery, which is unlikely but
you never know.
2018-06-22 15:46:40 +09:00
Ian Barwick
81077d4bc2 standby switchover: fix behaviour if witness node is a sibling
The witness node is not a streaming replication standby, so executing
"repmgr standby follow" will fail. Instead, execute "repmgr witness
register --force" to update the witness node record on the primary and
its local copy of all node records.

Addresses GitHub #453.
2018-06-21 17:16:18 +09:00
Ian Barwick
a549941d4f repmgr: don't count witness node as a standby when running "node status"
Addresses GitHub #451.
2018-06-21 14:27:47 +09:00
Ian Barwick
2f6c159f9a "repmgr node ...": update comments and formatting 2018-06-21 14:27:42 +09:00
Ian Barwick
2eca1a0311 repmgr: don't count witness node as a standby when running "node check"
Addresses GitHub #451.
2018-06-21 11:31:09 +09:00
Ian Barwick
f6377084ec doc: remove info about old RPM package repository 2018-06-15 11:14:10 +09:00
Ian Barwick
d85c02b92b doc: finalize release notes 2018-06-15 10:52:51 +09:00
Ian Barwick
d9ba41fc35 doc: emphasize that repmgrd should not be running during a switchover 2018-06-11 15:31:22 +09:00
Ian Barwick
afdaf9be66 _create_event(): log event and node ID for debugging 2018-06-11 15:20:01 +09:00
Ian Barwick
8067924c3e repmgr: consolidate code in "standby switchover"
Commit 41274f5525 left us with two if statements
in sequence with exactly the same condition, so consolidate both into a single
statement. Clarify code comments while we're at it.
2018-06-11 15:14:40 +09:00
Ian Barwick
e94a6eefde repmgr: cluster check commands - non-zero exit code if node(s) unavailable
Return ERR_CLUSTER_CHECK if one or nodes was not reachable.

Implements GitHub #447.
2018-06-11 12:41:19 +09:00
Ian Barwick
69d7b6f7eb doc: 4.0.6 release notes 2018-06-07 17:14:50 +09:00
Ian Barwick
8ec3b2a536 Bump version
4.0.6
2018-06-07 15:08:48 +09:00
Ian Barwick
68a9745e7e standby follow: check node has connect to new primary
After restarting the standby, poll pg_stat_replication on the upstream
until the standby connects, and exit with an error if it doesn't by the
timeout defined in "standby_follow_timeout".

Implments GitHub #444.
2018-06-07 14:41:05 +09:00
Ian Barwick
20ce53e2d2 doc: update release notes 2018-06-07 12:48:54 +09:00
Ian Barwick
638a119c85 standby follow: add hint about using "node rejoin"
If "repmgr standby follow" is executed on a node which isn't running,
point out "repmgr node rejoin" should probably be used instead.
2018-06-07 11:02:32 +09:00
Ian Barwick
053863cdd0 doc: fix typos 2018-06-07 10:40:30 +09:00
Ian Barwick
009cc0480c witness_register: check for existing node with same name 2018-06-07 10:04:26 +09:00
Ian Barwick
63bdc19132 repmgrd: ensure local node is counted as quorum member
Rename "standby_nodes" to "sibling_nodes" to make it clearer in the
code what total is actually provided by the struct.

Addresses GitHub #439.
2018-06-01 17:19:40 +09:00
Ian Barwick
fbd389d0b3 doc: fix typo 2018-06-01 13:07:19 +09:00
Ian Barwick
4aef4ea11e standby clone: improve external configuration file copying
If --copy-external-config-files was provided, check that we can copy
the files *before* cloning the standby, and abort if an error is
encountered. This will give the user the opportunity to fix any issues
before running the entire (and potentially lengthy) clone.

Previously errors were logged but no action taken, and the final
message indicated the clone operation was successful.

Addresses GitHub #443.
2018-06-01 13:00:07 +09:00
Ian Barwick
0ffaff75df repmgrd: ensue degraded monitoring timeout works on standby
Parameter "degraded_monitoring_timeout" was not being acted on when
monitoring a streaming replication standby.

Addresses GitHub #439.
2018-05-31 17:53:31 +09:00
Ian Barwick
c54bb73fb2 If --dry-run specified, ensure minimum log level is INFO
When executed with --dry-run, repmgr outputs detail about what would
happen using log level INFO. If the log_level is configured to
NOTICE or higher, it's possible some or all of the --dry-run output
might not be displayed.

Addresses GitHub #441.
2018-05-31 15:30:26 +09:00
Ian Barwick
28ea2e48de node rejoin: avoid outputting empty DETAIL message 2018-05-31 15:10:51 +09:00
Ian Barwick
41274f5525 node rejoin: improve handling of --config-file parameter
Fixes bug when parsing --config-file values (GitHub #442).

Also improves handling in --dry-run mode, as some checks for the
provided files were being skipped if --dry-run supplied, even though
they are intended to work with --dry-run.
2018-05-31 11:44:31 +09:00
Ian Barwick
edceb32ccb standby clone: --recovery-conf-only expects the standby to be registered
Note this in the documentation, and add a HINT about registering it
if the standby record is not available.

Related to GitHub #438.
2018-05-29 11:54:38 +09:00
Ian Barwick
3dba8336e9 standby clone: don't assume existence of "user" in upstream conninfo
Usually a seperate user (typically "repmgr") is set up specifically to manage
the repmgr metadata, however there's no compelling requirement to do this, and
it's possible the database owner (usually: "postgres") will be used, in which
case it's possible the username will be left out of the conninfo string.

Addresses GitHub #437.
2018-05-24 15:51:41 +09:00
Ian Barwick
97d0cee259 "config_file" is MAXPGPATH, not MAXLEN
The two values are the same anyway, so change is more for consistency.
2018-05-22 17:19:55 +09:00
Martín Marqués
2dfe1d18e9 Fix typo in a code comment 2018-05-19 12:29:04 -03:00
Ian Barwick
55bb93bd3f "standby clone": log actual connection string used to connect to upstream
Useful for diagnostic purposes.
2018-05-10 11:58:48 +09:00
Ian Barwick
4c49954cd4 Fix check for -d/--dbname parameter
Not a bug per-se, just meant some unnecessary processing was done on
an empty string.

Per note from petere.
2018-05-10 11:57:02 +09:00
Ian Barwick
a880b6ce16 Include "arpa/inet.h" in dbutils.c
Needed for htonl() on FreeBSD.
2018-05-10 11:25:52 +09:00
Ian Barwick
c51a2283dd Minor documentation fixes 2018-05-10 10:27:25 +09:00
Ian Barwick
717828e73e doc: update 2ndQuadrant repository information
Canonical link for each repository should not include any directories.
2018-05-03 17:21:29 +09:00
Ian Barwick
c7477d7a9c doc: update repository information 2018-05-03 15:22:33 +09:00
Ian Barwick
1db8d3904f doc: update package installation information
Document the new public 2ndQuadrant apt repository
2018-05-03 15:07:26 +09:00
Ian Barwick
362f478d55 doc: update package installation information
Document the new, public 2ndQuadrant RPM repository.
2018-05-03 14:12:29 +09:00
Ian Barwick
cb1bf892e6 Finalize 4.0.5 release 2018-05-01 11:26:30 +09:00
Ian Barwick
b1b5fe1193 doc: add notes about package compatibility
We need to emphasise that the repmgr packages are only compatible
with packages based on the PGDG filesystem layout; 3rd party vendor
packages often put application and data directories elsewhere.
See e.g. GitHub #427.
2018-05-01 11:08:59 +09:00
Ian Barwick
af0e141859 doc: update FAQ location 2018-05-01 10:27:59 +09:00
Ian Barwick
580c1a9170 doc: update HISTORY and add 4.0.5 release notes 2018-05-01 10:13:44 +09:00
Ian Barwick
b624fc7efa Bump version
4.0.5
2018-05-01 09:21:32 +09:00
Ian Barwick
67ccd4dcb3 repmgrd: don't explicitly close connections on shutdown 2018-04-30 15:13:30 +09:00
Ian Barwick
6de3a5a997 Fix parsing of "archive_ready_critical" configuration file parameter.
Per report in GitHub #426.
2018-04-28 06:59:20 +09:00
Ian Barwick
f86e89ba45 repmgrd: notify sibling nodes to follow new primary after pg_ctl timeout
If "pg_ctl promote" fails due to a timeout, but the promotion itself succeeds,
have repmgrd on the new primary explicitly notify any sibling nodes to
follow it.

Previously the sibling nodes would wait "primary_notification_timeout" seconds
before attempting to discover the new primary.

This (and preceding commit eac80ae) address GitHub #425.
2018-04-27 11:59:00 +09:00
Ian Barwick
a6d0ba07ed repmgrd: handle pg_ctl timeout
It's possible "pg_ctl promote" will timeout, causing "repmgr standby
follow" to return with an error; however the promotion itself will usually
succeed, so detect this case and handle accordingly.
2018-04-26 19:23:26 +09:00
Ian Barwick
b553a70ad5 repmgrd: always close the connection if the pointer is not NULL 2018-04-25 14:08:17 +09:00
Ian Barwick
3364f8bdf0 Add configuration file parameter "config_directory"
This enables explicit provision of an external configuration file
directory, which if set will be passed to "pg_ctl" as the -D
parameter. Otherwise "pg_ctl" will default to using the data directory,
which will cause some operations to fail if the configuration files
are not present there.

Note this is implemented primarily for feature completeness and for
development/testing purposes. Users who have installed "repmgr" from
a package should not rely on "pg_ctl" to stop/start/restart PostgreSQL,
instead they should set the appropriate "service_..._command" for their
operating system. For more details see:

    https://repmgr.org/docs/4.0/configuration-service-commands.html

Note: in a future release, the presence of "config_directory" in repmgr.conf
will be used to implictly set "--copy-external-config-files=samepath" when
cloning a standby; this is a behaviour change so will be implemented in the
next major realease (repmgr 4.1).

Implements GitHub #424.
2018-04-25 11:57:27 +09:00
Ian Barwick
242fa287b4 repmgrd: catch corner case in standby connection handle check
If repmgrd marks the local node as unavailable, and it was actually
restarting but a failover event occured before the next local node
check, failover will continue with the stale connection handle.

Add a final local node check just before starting the failover
process, so repmgrd can reconnect if it wasn't able to before.
2018-04-24 21:55:36 +09:00
Ian Barwick
fa908432c8 Minor doc and log output tweaks 2018-04-24 21:08:31 +09:00
Ian Barwick
afa942fef6 repmgrd: prevent standby connection handle from going stale
If monitoring history not in use, there's no activity on the standby's
connection handle, so if e.g. the standby is restarted, PQstatus()
never returns CONNECTION_BAD and repmgrd never notices the connection
is stale. Therefore execute a throw-away statement at "monitor_interval_secs".
2018-04-23 23:51:03 +09:00
Ian Barwick
94cfc66b04 doc: minor clarification 2018-04-20 12:23:04 +09:00
Ian Barwick
87eae9a50f doc: additional details about repmgrd usage in Debian/Ubuntu 2018-04-20 12:04:15 +09:00
Ian Barwick
82a37f4865 doc: add Debian package details 2018-04-20 10:57:19 +09:00
Ian Barwick
a38f727b7d doc: Improve CentOS package-related documentation 2018-04-20 10:31:42 +09:00
Ian Barwick
e6df936c1b doc: link to service command configuration from switchover section 2018-04-19 17:09:10 +09:00
Ian Barwick
91ca997d40 doc: improve configuration documentation
With special attention to setting service commands, and extra special
mention of "pg_ctlcluster" for Debian/Ubuntu users.
2018-04-19 16:49:26 +09:00
Ian Barwick
65c90a2a64 doc: update CentOS package documentation 2018-04-19 14:27:17 +09:00
Ian Barwick
90cba78f52 repmgrd: tweak event notifications on standby failure
The event notification was only being created if there was a valid
primary connection; it should be created in any case, so an event
notification script can be executed.
2018-04-17 10:27:25 +09:00
Ian Barwick
f8908d7e31 Bump version
4.0.5dev
2018-04-13 10:18:04 +09:00
Ian Barwick
478bbcccbf Add "dbname=replication" to all replication connection strings
Previously repmgr was attempting to make replication connections
with "dbname" set to the repmgr database name. While this works
if e.g. the repmgr user also has replication permissions, it will
fail if a dedicated replication user is specified, who only has
permission to access the virtual "replication" database.

Change this to use "dbname=replication" if the replication connection
user is different to the normal repmgr database user.

(We could just always set it to "replication", but that might break
existing installations e.g. where a .pgpass file is in use and there's
no "replication" entry for the normal repmgr database user).

Addresses GitHub #421.
2018-04-12 16:10:02 +09:00
Ian Barwick
a03d41de28 doc: mention --recovery-conf-only introduced in repmgr 4.0.4
Per GitHub #419.
2018-04-12 13:13:11 +09:00
Ian Barwick
f1e527adcb doc: various updates related to "standby clone" operations. 2018-04-12 13:08:05 +09:00
Ian Barwick
09e597dcdd Fix superuser password handling
When establishing a superuser connection, the connection parameters
were being copied from the existing (non-superuser) connection, which
in some circumstances can lead to that user's password being
included in the copied parameter list. The password parameter, if set, will
now always be removed, which will cause libpq to retrieve the correct
one from the .pgpass file.

Addresses GitHub #400.
2018-04-12 12:50:17 +09:00
Ian Barwick
94a7f0c719 Don't issue a CHECKPOINT after promoting a standby.
Issuing a CHECKPOINT immediately after promoting a standby may impact
performance. Commit 239a548e9d ensures
one is only issued when required, i.e. during a switchover when
pg_rewind will be executed.

This reverts commit a2068768ab.
2018-04-09 14:39:47 +09:00
Ian Barwick
6ac42f1593 "standby register": add sanity check when --upstream-node-id not supplied
If --upstream-node-id was not supplied to "repmgr standby register",
repmgr defaults to the primary node as upstream node. If the local node is
available, we now double-check that it's attached to the primary,
in case the lack of --upstream-node-id was an accidental ommission.

This check is only made when the local node is available.

This behaviour can be overriden with -F/--force (though it's hard to
imagine a scenario where that would be useful).

Addresses GitHub #395.
2018-04-05 17:40:05 +09:00
Ian Barwick
94b72382e5 doc: minor FAQ tweaks 2018-04-05 17:10:52 +09:00
Ian Barwick
18c12f58a4 doc: add a section about repmgrd and service commands etc. 2018-04-05 11:47:35 +09:00
Ian Barwick
cf3fa18085 doc: miscelleneous FAQ updates
- clarify pg_rewind item
 - add note about what's included in recovery.conf
2018-04-04 10:08:04 +09:00
Ian Barwick
a5281d93dc Add TODO for pg_rewind changes coming in PostgreSQL 11 2018-04-03 21:57:50 +09:00
Ian Barwick
0d73d3c2b5 Enable provision of "archive_cleanup_command" in recovery.conf
If "archive_cleanup_command" is defined in "repmgr.conf", a corresponding
entry will be made in the node's "recovery.conf" file after cloning a
standby.

Note that we recommend using PgBarman to manage WAL archives, but are
providing this facility to help repmgr to be integrated in existing environments.

Implements GitHub #416.
2018-04-03 14:11:24 +09:00
Ian Barwick
23c99304a6 "node rejoin": actively check for node to rejoin cluster
Previously repmgr was relying on whatever command was configured to
start PostgreSQL to determine whether the node being rejoined had
started correctly. However it's preferable to actively poll the upstream
to confirm it has restarted and actually attached as a standby before
confirming success of the "node rejoin" action.

This can be overridden with the -W/--no-wait option.

(Note that for consistency with other PostgreSQL utilities, the
short form of the --wait option is now "-w"; this is currently
only used in "repmgr standby follow".)

Also update "repmgr node rejoin" documentation with a list of supported
options, and add some useful index entries for "pg_rewind".

Implements GitHub #415.
2018-04-03 10:36:13 +09:00
Ian Barwick
1ab16bc6c2 doc: fix option description for "repmgr primary register" 2018-04-03 10:10:05 +09:00
Ian Barwick
7f1f04636d Refactor pg_control parsing
The "data_checksum_version" field towards the end of the ControlFileData struct,
meaning its position varies between versions. Previously this wasn't a problem
as it was only required for operations involving 9.5 and later, and its position
within the control file has not changed between the current release and current
HEAD.

However, in order to support pg_rewind in 9.3 and 9.4, which both have changes in
the control file format, we'll need version-specific parsing. This will also make
it easier to deal with any future changes to the control file format.
2018-04-02 20:55:10 +09:00
Ian Barwick
6a1797cadd Enable pg_rewind to be used with PostgreSQL 9.3/9.4
pg_rewind is not part of the core distribution for those, but we
provided support in repmgr 3.3 so should extend it to repmgr 4.

Note that there is no check in place whether the pg_rewind binary
exists, so it's up to the user to ensure it's present.

Addresses GitHub #413.
2018-04-02 20:55:04 +09:00
Ian Barwick
94d26dbe9f Always set "connect_timeout" when pinging a PostgreSQL instance
Insert "connect_timeout=2" into the connection parameters, if not
explicitly set by the user. This will prevent excessive wait time
for the host operating system to report a connection timeout.
2018-04-02 09:31:42 +09:00
Ian Barwick
ae655eb4fd Add TODO list
This file will collate various requests and ideas for future developement.
In particular it will reference requests which come in via the GitHub issue
tracker, so we can acknowledge and close off the request and not have an
open unresolved issue hanging around.
2018-03-30 14:18:51 +09:00
Ian Barwick
65371489c6 repmgrd: handle failover with two nodes in the primary location
If two nodes were in the primary location, and at least one node in
another location, the non-failed node in the primary location was not
recognising itself as a promotion candidate.

Addresses GitHub #407.
2018-03-30 12:17:34 +09:00
Ian Barwick
28c7737dc0 Log pg_control access errors as WARNINGs rather than DEBUG
This will make it easier to diagnose issues, possibly with an incorrect
"data_directory" setting in "repmgr.conf".
2018-03-30 11:24:44 +09:00
Ian Barwick
505d72d19c "standby switchover": force checkpoint if pg_rewind requested.
Addresses issue described in GitHub #378.

PostgreSQL itself doesn't issue a checkpoint after promotion to ensure
the newly promoted server is available as quickly as possible, so we'll
only execute an explicit CHECKPOINT when it's actually required, i.e.
when pg_rewind will be executed. This is required as pg_rewind uses
the timeline reported in the pg_control file to compare with the
server to be rewound, and the pg_control timeline is only updated after
the first checkpoint, so there is an interval where pg_rewind will
erroneously assume both servers are on the timeline and take no action.
2018-03-30 09:12:25 +09:00
Ian Barwick
b292ac61f8 "standby switchover": update hint 2018-03-30 09:12:21 +09:00
Ian Barwick
293d66bf71 Fix minimum accepted value for "degraded_monitoring_timeout"
Should be -1, the default.

Addresses GitHub #411.
2018-03-30 09:12:17 +09:00
Ian Barwick
3e1f0ec168 repmgr: move demoted primary check to the final step during switchover
This will give the demoted primary more time to start up as a standby,
during which "standby follow" can be executed on sibling nodes, if
specified.
2018-03-27 16:41:13 +09:00
Ian Barwick
6f9a1f975e repmgr: poll demoted primary after restart during switchover
During a switchover operation, once the demoted primary has been restarted
as a standby, repmgr attempts to reconnect to verify its status and drop
any redundant replication slots. However it's possible the standby may still
be in the startup phase, so poll for "standby_reconnect_timeout" seconds
before giving up.

Addresses GitHub #408.
2018-03-27 15:58:18 +09:00
Ian Barwick
deea4f69f7 Fix "repmgr cluster crosscheck" output
Addresses GitHub #398.
2018-03-27 10:28:27 +09:00
Ian Barwick
37e53108a2 Consolidate connection closure calls 2018-03-27 08:52:23 +09:00
Ian Barwick
96cf06204c doc: add note about remote command execution
When executing a command on a remote server, repmgr expects the remote binary
to be in the same location as the local binary. It's reasonable to assume
repmgr will be deployed in a unified environment; if not, the onus is on the
user to ensure repmgr can find the remote binary, e.g. by creating appropriate
symlinks.

Addresses query in GitHub #406.
2018-03-27 08:47:56 +09:00
Ian Barwick
381e22c2c7 Misc tweaks to witness code 2018-03-26 20:59:38 +09:00
Ian Barwick
7e2af17783 repmgrd: tweak log notices when marking a standby as failed
Announce what we're going to do (set the node record inactive) *before*
performing the action. Makes reading the log slightly easier.
2018-03-23 13:27:37 +08:00
Ian Barwick
b4272853e7 Add event "repmgrd_failover_aborted" 2018-03-23 10:44:00 +08:00
Ian Barwick
562b6ddfc2 Add error code ERR_FOLLOW_FAIL 2018-03-23 10:34:19 +08:00
Ian Barwick
a15e5c9d52 Tidy up queries in dbutils.c
- standardize formatting
- prefix various internal function calls with "pg_catalog.", to
  mitigate possible risks from CVE-2018-1058
2018-03-23 10:33:28 +08:00
Ian Barwick
d9cc09cee4 repmgrd: fix typo 2018-03-21 12:36:51 +09:00
Ian Barwick
c4f6abe951 Update HISTORY 2018-03-21 06:51:56 +09:00
Martín Marqués
e454fb77d3 While reviewing 7cb6e5af8d before merging
I noticed that besides the result cleanup added, there was still a missing
spot inside the if condition.

Adding the PQclear that was missing.
2018-03-21 06:51:50 +09:00
Andrzej Nowicki
b76e5852d3 One more memory leak fixed 2018-03-21 06:51:43 +09:00
Andrzej Nowicki
0674364ffd Clear node list to avoid memory leak, fixes #402 2018-03-21 06:51:37 +09:00
Ian Barwick
b2eb9b8525 Correctly handle error message pointer when parsing strings.
When parsing conninfo strings, ensure the error message pointer is
actually returned to the caller.

Not a criticial issue, just meant the contents of the error message
were not being displayed.
2018-03-10 14:28:10 +09:00
Ian Barwick
71c5d10a8c doc: update 4.0.4 release date 2018-03-09 20:07:16 +09:00
Ian Barwick
1476b21cd4 doc: update release notes
Add note about requiring 4.0.3 or later on all nodes when performing
a switchover from a noder running 4.0.3 or later.

Per report in GitHub #388.
2018-03-09 09:46:58 +09:00
Ian Barwick
b17993abdb doc: update "repmgr primary unregister" description
As noted by GitHub user yonj1e in GitHub #396.
2018-03-08 15:01:25 +09:00
Ian Barwick
8f68344f9a doc: update FAQ
Additional clarification for "repmgr standby clone --recovery-conf-only"
2018-03-08 10:04:30 +09:00
Ian Barwick
125ac6c297 doc: update FAQ
Add entry about upgrading PostgreSQL
2018-03-08 10:04:30 +09:00
Ian Barwick
955860923f Fix parsing of -k/--keep-history option
GitHub #394.
2018-03-07 19:14:18 +09:00
Ian Barwick
50626f90cc Add 4.0.4 release notes 2018-03-07 14:17:04 +09:00
Ian Barwick
9aea5b8aa7 repmgrd: fix failover handling in "manual" mode
Regression was introduced in commit c7a585c555
2018-03-06 22:35:51 +09:00
Ian Barwick
ed1bcb159e repmgrd: remove duplicate local record check in BDR mode 2018-03-06 12:31:07 +09:00
Ian Barwick
9c72c0d66e Add event "repmgrd_shutdown"
Implements GitHub #393
2018-03-06 10:59:54 +09:00
Emre Hasegeli
0ddc226c2a Add witness options to the main help
GitHub #392
2018-03-06 10:57:33 +09:00
Ian Barwick
93830cad61 Fix directory creation when cloning from Barman 2018-03-05 19:31:53 +09:00
Ian Barwick
bca1660d5e Improve repmgrd logging in BDR mode
Also ensure interval status log line is shown as intended
2018-03-05 15:05:40 +09:00
Ian Barwick
5a52917421 repmgrd: add debug log output for "monitor_interval_secs" sleep in all modes 2018-03-05 14:23:58 +09:00
Emre Hasegeli
70752d7d4a Add missing options to the main help 2018-03-05 09:52:04 +09:00
Ian Barwick
c29d1efc37 "standby clone": improve replication user selection
Use the upstream node's replication user when checking the replication
connection.
2018-03-02 16:21:32 +09:00
Ian Barwick
6fbbe2a97a "standby clone": fix --superuser handling
get_superuser_connection() was erroneously using the local node record
to connect to as a superuser, which works when registering the primary
but obviously not when cloning a standby.

Addresses GitHub #380.
2018-03-02 14:49:17 +09:00
Ian Barwick
ce42d6827e Update HISTORY 2018-03-01 15:51:09 +09:00
Ian Barwick
98384559a6 "standby clone": remove restriction on replication slots in Barman mode
While it's preferable to avoid standby replication slots if Barman is in
use, there's no technical reason to prevent this.

Implements GitHub #379.
2018-03-01 15:47:28 +09:00
Ian Barwick
4a1477343b repmgr: escape "restore_command" in generated recovery.conf 2018-03-01 10:39:04 +09:00
Ian Barwick
d2b9d20393 "standy clone": fix primary_conninfo when --upstream-conninfo provided 2018-03-01 09:18:40 +09:00
Ian Barwick
fe594c95ad repmgrd: retry standby connection after cascading standby failover 2018-02-28 21:15:11 +09:00
Ian Barwick
60e63feaca repmgrd: add configuration file parameter "standby_reconnect_timeout"
This is used for determining a timeout when reconnecting to the standby
after executing the "follow_command". This will normally not need to be
set explicitly, but maybe useful in cases where the standby's startup
phase can last longer than usual.
2018-02-28 18:56:33 +09:00
Ian Barwick
ae4d0f2622 repmgrd: fix main monitoring loop for witness server
Missing "break" was breaking it when following a new primary.
2018-02-28 16:30:14 +09:00
Ian Barwick
5e8b41e221 repmgrd: retry standby connection after "follow_command" executed
It's possible that the standby is still starting up after the "follow_command"
completes, so poll for a while until we get a connection.
2018-02-28 15:35:47 +09:00
Ian Barwick
c7a585c555 repmgrd: improve log output
- emit explicit startup NOTICE
- emit NOTICE when falling back to degraded monitoring on a primary node
- improve log message and event notification details when monitoring
  a former primary which has been reconnected as a standby
2018-02-28 12:35:13 +09:00
Ian Barwick
a27dd8c49c doc: document "primary_follow_timeout" configuration file parameter. 2018-02-27 10:09:40 +09:00
Ian Barwick
9365bf3474 "standby promote": make timeout values configurable
This introduces following new configuration file parameters, which
were previously hard-coded values:

 - promote_check_timeout
 - promote_check_interval

Implements GitHub #387.
2018-02-27 10:04:58 +09:00
Ian Barwick
e8ae0831fe doc: add <options> section for various commands 2018-02-26 16:54:54 +09:00
Ian Barwick
518866eba5 "node status": improve replication slot warnings
Addresses GitHub #385
2018-02-23 11:06:47 +09:00
Ian Barwick
ed0330c334 "standby clone": document --recovery-conf-only option 2018-02-23 10:54:42 +09:00
Ian Barwick
1f021dc9fa "standby clone --recovery-conf-only": display generated file with --dry-run
Refactor the original code which generates "recovery.conf" to place the
output into a buffer, which can either be output as "recovery.conf"
or copied to a buffer specified by the caller.
2018-02-23 10:16:47 +09:00
Ian Barwick
425839d764 Fix typo in function name 2018-02-22 15:48:41 +09:00
Ian Barwick
3a764f678a "standby clone": add --recovery-conf-only option
This will generate "recovery.conf" for an existing standby.

Typical use-case is a standby cloned manually from an external data
source (e.g. Barman), where "recovery.conf" needs to be created
(and if required a replication slot).

The --dry-run option will check the pre-requisites but not actually
create "recovery.conf" or a replication slot.

This requires that the upstream node is running, a replication connection
can be made and if required a replication slot can be created.

Implements GitHub #382.
2018-02-22 15:47:19 +09:00
Ian Barwick
829cf5cca4 repmgrd: improve detection of status change from primary to standby
If repmgrd is running in degraded mode on a primary which has been stopped,
then manually been brought back online as a standby (e.g. by creating
recovery.conf and starting the server), ensure it not only detects the
change but automatically updates the node record so it can resume
monitoring the node as a standby.

Previously, repmgrd was looping waiting for the record to be updated
(as is done transparently when executing "repmgr node rejoin") but
if the record was not updated within the timeout period (e.g. by
"repmgr standby register) it would fail to resume monitoring as a
standby.

It seems reasonable to have repmgrd automatically update the node record,
as this will restore failover capability as quickly as possible. If this
is not desired, then the onus is on the user to shut down repmgrd while
making the desired changes.
2018-02-22 11:35:47 +09:00
Ian Barwick
14420d83fa "node rejoin": ensure --dry-run is honoured
Addresses GitHub #383.
2018-02-20 15:28:39 +09:00
Ian Barwick
a80e22f0ed Bump version
4.0.4
2018-02-16 12:19:31 +09:00
Ian Barwick
832993bfbc doc: update 4.0.3 release notes 2018-02-16 12:15:10 +09:00
Ian Barwick
f1ea5e62df doc: update release notes 2018-02-15 14:42:29 +09:00
Ian Barwick
b47448d0e5 Replace remaining instances of strcpy() with strncpy()
Also use strncmp() to match.
2018-02-15 13:17:06 +09:00
Ian Barwick
a8232337d8 Catch various corner cases when restarting a PostgreSQL instance 2018-02-14 11:28:38 +09:00
Ian Barwick
c9eb1bfcc0 Always initialise t_conninfo_param_list structures 2018-02-13 10:48:18 +09:00
Ian Barwick
db552dfbc7 Bump version
4.0.3
2018-02-12 15:03:29 +09:00
Ian Barwick
9732f78565 repmgrd: check "repmgr" extension is installed before starting
Implements GitHub #361.
2018-02-12 11:31:59 +09:00
Ian Barwick
eb7dca2919 "node status": add warning about missing replication slots
Implements GitHub #364.
2018-02-12 10:53:31 +09:00
Ian Barwick
c113102926 Update repmgr.conf.sample
Add missing parameter "monitor_interval_secs"
2018-02-12 09:35:57 +09:00
Ian Barwick
ed6a167915 Execute a CHECKPOINT immediately after promoting the server
This ensures "pg_control" is updated with the latest timeline, mainly
to ensure that if "pg_rewind" is executed as part of a switchover
that it sees the latest timeline.

Per suggestion from GitHub user "superflav" in GitHub #378.

See also:

  https://www.postgresql.org/message-id/flat/20150428180253.GU30322%40tamriel.snowman.net
2018-02-09 12:09:16 +09:00
Ian Barwick
fbbe7afd61 doc: update HISTORY and release notes 2018-02-09 11:42:16 +09:00
Ian Barwick
ae1fc93e48 Ensure correct server version number used for replication stats query 2018-02-09 11:06:15 +09:00
Ian Barwick
7b4ee80af2 "standby switchover": check demotion candidate can make replication connection
Check it's actually possible for the demotion candidate to attach to
the promotion candidate before executing the switchover.

As with other checks of this nature, there's a faint possibility the
situation could change between the time the check is carried out and
the demotion candidate is restarted to connect to the promotion candidate,
but there's not a lot we can do about that. The main purpose is to
be able to catch existing misconfigurations before anything gets changed.

Implements GitHub #370.
2018-02-09 10:01:29 +09:00
Ian Barwick
0b8755e278 "witness register": fix primary node check
Addresses GitHub #377, based on report by user yonj1e in #373.
2018-02-08 16:28:50 +09:00
Ian Barwick
d3e1937808 "standby switchover": additional sanity checks
Check that sufficient walsenders will be available on the promotion
candidate, and if replication slots are in use check if enough of
those will be available.

Note these checks can't guarantee that the walsenders/slots will
be available at the appropriate points during the switchover process,
but do ensure that existing configuration problems will be caught.

Implements GitHub #371.
2018-02-08 15:23:10 +09:00
Ian Barwick
871d6fdee3 "standby clone": cowardly refuse to clone into an active data directory
By checking the PID file in the same way pg_ctl does, we can be pretty
much certain whether the target data directory contains an active
PostgreSQL instance.
2018-02-08 11:43:24 +09:00
Ian Barwick
c7dfe9e040 Fix "standby clone" in Barman mode with --no-upstream-connection
"--upstream-node-id", if provided, was not being passed through to
the SQL query executed via the Barman server.

Also modified the query to select the primary node if "--upstream-node-id"
is not provided.

Note: this is a very niche use case.
2018-02-07 16:36:44 +09:00
Ian Barwick
5c92a9e057 repmgr: simplify data directory checks when cloning
Attempting to use the contents of pg_control to tell whether the directory
is in use by PostgreSQL can result in false positives; we should use
a check based on the pidfile.

Also change the HINT to indicate a data directory can be overwritten
if -F/--force is provided.
2018-02-07 14:37:57 +09:00
Ian Barwick
aa5f025738 "standby clone": ensure "pg_subtrans" directory is created in Barman mode 2018-02-07 10:56:18 +09:00
Ian Barwick
5b91a2d409 Update HISTORY and release notes 2018-02-07 09:55:36 +09:00
Ian Barwick
596a19ee37 Move parse_output_to_argv() to configfile.c
So it can be used by parse_pg_basebackup_options().

Addresses GitHub #376.
2018-02-07 09:43:06 +09:00
Ian Barwick
23ff83b3b4 Fix typo in HINT 2018-02-07 08:55:51 +09:00
Ian Barwick
ba1f6bee0d doc: fix GitHub reference in release notes 2018-02-07 08:53:23 +09:00
Ian Barwick
da9c8f2491 Update HISTORY and release notes 2018-02-06 10:38:13 +09:00
Ian Barwick
64035ef701 "standby register/follow": provide primary node details for event notifications
For events generated by these commands, it may be useful to know details
of the primary node. This makes following additional parameters available
to event notification scripts:

- %p: node ID of the primary
- %a: node name of the primary
- %c: conninfo string for the primary

Implements GitHub #375
2018-02-06 09:36:46 +09:00
Ian Barwick
da3a5ab1dc doc: fix descriptions of %p event notification script parameter 2018-02-05 15:54:06 +09:00
Ian Barwick
9d301b4789 "standby register": add event notification "standby_register_sync"
Implements GitHub #374.
2018-02-05 15:21:38 +09:00
Ian Barwick
c070c649f7 doc: minor fixes to BDR docs
Also remove duplicate file.
2018-02-05 15:21:34 +09:00
Ian Barwick
3b823396eb doc: improve BDR failover documentation 2018-02-05 15:21:28 +09:00
Ian Barwick
c19e7f1025 "cluster show": output any connection error messagesin list of warnings
This ensures any connection errors are displayed by default in a
comprehensible, easily reportable way, and saves having to request/filter
DEBUG output.

Implements GitHub #369.
2018-02-05 10:32:20 +09:00
Ian Barwick
e4b5a1e19f "cluster show": minor code cleanup 2018-02-05 10:25:05 +09:00
Ian Barwick
f96cc3b906 "cluster show": improve handling of database errors
In particular, if running "repmgr cluster show" against a database
without the repmgr metadata, showing the error (rather than just
"no records found" etc.) will provide some clues about the problem.
2018-02-05 10:15:48 +09:00
Tony Finch
a481ca7ce2 "repmgr node status": correct upstream node info (#363)
repmgr was printing the name and ID of this node instead of its upstream

Signed-off-by: Tony Finch <dot@dotat.at>
2018-02-05 09:54:00 +09:00
Ian Barwick
32dc450a09 doc: add note about replication slots and PostgreSQL upgrades 2018-02-02 18:33:43 +09:00
Ian Barwick
34dbf64f50 Ensure an inactive PostgreSQL data directory can be deleted.
Addresses GitHub #366.
2018-02-02 17:12:25 +09:00
Ian Barwick
ea653a8dbc "standby follow": finalize implementation of --dry-run option 2018-02-02 15:42:08 +09:00
Ian Barwick
50894b6124 "standby follow": check for replication slot availability on target node 2018-02-02 15:01:23 +09:00
Ian Barwick
94e187c476 Improve "repmgr primary unregister" documentation and --help output
Per observations in GitHub #373
2018-02-02 14:12:15 +09:00
Ian Barwick
de6284ae79 doc: note password SSH requirements for "standby switchover" 2018-02-02 14:01:58 +09:00
Ian Barwick
c54045bcd8 "standby follow": initial implementation of --dry-run option
GitHub #363.
2018-02-01 14:18:40 +09:00
Ian Barwick
c0a53471e1 "standby switchover": improve log messages and add new exit code
Previously, if an issue was encountered with the old primary, but user
provided -F/--force to have repmgr promote the standby anyway, repmgr
would exit with the log message "STANDBY SWITCHOVER is complete"
and exit code 0 (SUCCESS).

To better report this partial completion, repmgr will now emit the message
"STANDBY SWITCHOVER has completed with issues" (and a HINT to check preceding
log messages) and new exit code 22 (ERR_SWITCHOVER_INCOMPLETE).
2018-01-31 10:25:15 +09:00
Ian Barwick
2eec8b5d79 Have do_standby_follow_internal() not abort on error
Pass the error code back to the caller instead, mainly so
"repmgr node rejoin" can better report errors.
2018-01-30 16:53:04 +09:00
Ian Barwick
c11e92cf2a repmgr: improve switchover handling when "pg_ctl" used
If logging output not explicitly rediretced with "-l" in the pg_ctl
options, repmgr would hang waiting for pg_ctl output.

Note that we recommend using the OS-level service commands where
available.
2018-01-30 13:43:37 +09:00
Ian Barwick
f294d09034 "repmgr standby register": improve error output when standby not running
Add explicit HINT
2018-01-26 22:13:11 +09:00
Ian Barwick
26c597ef5a doc: expand upgrade documentation
Include section about using pg_upgrade
2018-01-23 10:57:19 +09:00
Vlad
b8efbb7a15 doc: add missing word in overview
GitHub pull request #362
2018-01-19 09:11:54 +09:00
Ian Barwick
3044696c05 doc: update 4.0.2 release notes
Add details about upgrading.
2018-01-19 09:09:59 +09:00
Ian Barwick
6dc1969ad5 Remove --bdr-only configuration option
This was required for a specific use case during pre-release
development and is no longer needed now the physical streaming
replication handling is implemented.
2018-01-18 13:30:47 +09:00
Ian Barwick
cb41ef1733 doc: update list of event notifications 2018-01-18 11:48:10 +09:00
Ian Barwick
d10f1f289e Bump version in configure.in
4.0.2
2018-01-16 13:55:58 +09:00
Ian Barwick
5731ba6043 Update version and release date 2018-01-16 12:58:11 +09:00
Ian Barwick
3d6437c8f8 repmgr: assume node is actually shutting down if pingable and that's the reported status 2018-01-16 11:17:06 +09:00
Ian Barwick
54b5c8ad94 repmgrd: log execution error in "repmgrd_get_local_node_id()"
That shouldn't happen, but if it does it will make it easier to
identify the issue.
2018-01-16 11:14:04 +09:00
Ian Barwick
0eca08ffaf doc: improve switchover documentation
Emphasize need to set the "service_*_command" options when repmgr is
installed from a package.
2018-01-16 11:06:39 +09:00
Ian Barwick
05c1dc2b92 doc: add 4.0.2 release notes 2018-01-11 16:39:58 +09:00
Ian Barwick
2bd300073d doc: minor readbility fix 2018-01-11 15:49:56 +09:00
Ian Barwick
01e020df8e doc: note change of shared library name from "repmgr_funcs" to "repmgr" 2018-01-11 15:47:35 +09:00
Ian Barwick
ae7963dc64 repmgr: automatically create slot name if missing
It's possible that a node was registered with "use_replication_slots=false"
but that was later changed to "use_replication_slots=true". If the node
was not subsequently re-registered, the node record will contain an empty
slot name, which will cause any slot creation operation during
"standby follow" or "node rejoin" to fail.

To prevent this happening, check for an empty slot name and automatically
set before proceeding.

Addresses GitHub #343.
2018-01-11 11:13:41 +09:00
Ian Barwick
faffb2a6e7 repmgr: catch possible corner case when checking node shutdown status
It's conceivable that PQping is returning "no response" but the
shutdown hasn't quite completed.
2018-01-10 14:56:00 +09:00
Ian Barwick
5d57044118 repmgr: during switchover, correctly detect unclean shutdown status 2018-01-10 12:21:04 +09:00
Ian Barwick
07a88c78a5 repmgr standby switchover: add "%p" event notification parameter
This will contain the node ID of the former primary.
2018-01-10 11:01:00 +09:00
Ian Barwick
f7df8b9c80 doc: document command line options for "standby switchover" 2018-01-10 10:19:36 +09:00
Ian Barwick
20920b3da1 repmgr standby switchover: add event details 2018-01-10 09:55:24 +09:00
Ian Barwick
683f4de182 Bump version
4.0.2
2018-01-09 13:43:58 +09:00
Ian Barwick
0c62821ffb Consolidate parsing of output from executing repmgr on a remote server
This should also fix the issue reported in GitHub #349.
2018-01-09 13:33:38 +09:00
Ian Barwick
6b70e8bbe6 doc: list repmgr.conf parameters relevant during switchover 2018-01-08 11:13:39 +09:00
Ian Barwick
6b223698c9 Fix call to is_active_bdr_node() in BDR repmgrd
Following the fix to "is_active_bdr_node()" in 841f03ae, it turns out
the call in repmgrd-bdr.c was only accidentally working; explicitly
test for a false return value.
2018-01-04 21:06:45 +09:00
Ian Barwick
aee12dc2c7 "repmgr bdr register": create missing connection replication set if needed
Previously the assumption was that the "repmgr" replication set would be
set up when the nodes are created, however no checks were implemented
and this was not well-documented.

Addresses GitHub #347.
2018-01-04 17:12:52 +09:00
Ian Barwick
c5c86e1ada "repmgr bdr register": improve node name check
We'll use "bdr.bdr_get_local_node_name()" to check the local BDR node
name and the repmgr one match.
2018-01-04 16:07:06 +09:00
Ian Barwick
7476dc84f2 doc: link event notification page from relevate command reference pages 2018-01-04 14:54:14 +09:00
Ian Barwick
f6d63f5216 doc: update package documentation 2018-01-04 13:11:44 +09:00
Ian Barwick
a608b0bc18 "repmgr standby register": add --wait-start option
Implements GitHub #356.
2018-01-04 12:48:12 +09:00
Ian Barwick
469ebba656 doc: fix typos in "repmgr primary unregister" command reference 2018-01-04 12:31:29 +09:00
Ian Barwick
647c21ad0e doc: add link to event notifications page from "repmgr cluster event" 2018-01-04 10:57:54 +09:00
Ian Barwick
3d2530d6f9 Fix query in is_active_bdr_node()
Boolean column was not being checked correctly.

Also add detail output in "repmgr node role --check", where the function
is called.
2018-01-04 10:48:31 +09:00
Ian Barwick
b26e400199 "repmgr cluster event": move query to dbutils.c 2018-01-04 10:06:54 +09:00
Ian Barwick
152e9545a4 docs: document "repmgr cluster event --terse" 2018-01-04 09:53:54 +09:00
Ian Barwick
83b8f05221 "repmgr cluster events": optionally omit "Details" column with --terse
Implements GitHub #360.
2018-01-04 09:48:00 +09:00
Ian Barwick
486f8e5a2c repmgrd: document standby_[failure|recovery] event notifications
Also clean up the relevant code section.

Addresses GitHub #359.
2018-01-04 09:34:49 +09:00
Ian Barwick
e517cc74d1 repmgr node rejoin: handle missing node record correctly
If a connection was provided for a database other than the "repmgr"
database, error was logged but execution continued, resulting in
the connection being finished twice.

Addresses GitHub #358.
2018-01-03 15:20:10 +09:00
Ian Barwick
26285b470f doc: add appendix with details about packages
work-in-progress
2018-01-02 17:24:51 +09:00
Ian Barwick
1521657965 Update copyright notices to 2018 2018-01-02 10:20:09 +09:00
Ian Barwick
041604e303 doc: Fix event notification placeholder typo
Per report from Carlos.
2018-01-01 10:29:34 +09:00
Ian Barwick
0be0100a7c docs: update HISTORY 2017-12-27 10:24:56 +09:00
Ian Barwick
2133834dda doc: update documentation build instructions
Describe how to build documentation as a single file, and also note
requirement to build against 9.6 or earlier.
2017-12-27 10:24:22 +09:00
Ian Barwick
d5fd93c350 repmgr.conf.sample: fix command line argument
"repmgr node check --archive-ready" is correct, however abbreviated
versions will be accepted by getopt_long() if they don't match
or partially match any other options.

Per report by "chaintng" in GitHub #355.
2017-12-27 10:24:17 +09:00
Tony Finch
5804778b58 doc: an optional all-in-one-file manual 2017-12-27 10:24:10 +09:00
Ian Barwick
407a7ea2f4 repmgr: add missing -W option to getopt_long() invocation
Addresses GitHub #350.
2017-12-20 10:28:31 +09:00
Martín Marqués
4d2eca0978 Switch spaces for tabs in repmgr.conf sample file.
This makes comments stay aligned in most cases the conf file is
modified, and when indentation changes, it's easy to re-align
(by removing or adding a tab)

Signed-off-by: Martín Marqués <martin.marques@2ndquadrant.com>
2017-12-20 09:27:06 +09:00
Martín Marqués
9d25544ab5 Add more information to the setting up sudo without requiretty in
the documentation

Signed-off-by: Martín Marqués <martin.marques@2ndquadrant.com>
2017-12-20 09:27:02 +09:00
Daymel Bonne Solís
8506607388 Fix package name 2017-12-20 09:26:57 +09:00
Ian Barwick
e8e059c26d docs: update 4.0.1 release date 2017-12-13 15:15:13 +09:00
Abhijit Menon-Sen
38d293694d Fix typo: upstream_node_id → upstream_node 2017-12-11 09:30:37 +09:00
Ian Barwick
54a10a0c3f Add diagnostic option "repmgr node check --has-passfile"
This checks if the active libpq version (9.6 and later) has the
"passfile" option, and returns 0 if present, 1 if not.
`
2017-12-05 12:53:04 +09:00
Ian Barwick
a8016f602f Fix unpackaged upgrade SQL for PostgreSQL 9.3 2017-12-04 17:46:52 +09:00
Ian Barwick
de57ecdad1 Finalize 4.0.1 release files 2017-11-29 17:02:47 +09:00
Ian Barwick
1fde81cf3f docs: improve event notification documentation 2017-11-29 14:44:07 +09:00
Ian Barwick
146c412061 docs: minor fixes to various examples 2017-11-29 11:30:38 +09:00
Ian Barwick
e9cb61ae7a docs: add additional note about setting "wal_log_hints"
Useful to reference this when discussing PostgreSQL configuration in
general.
2017-11-29 11:25:14 +09:00
Ian Barwick
50e9460b3e Update release notes 2017-11-28 13:42:28 +09:00
Ian Barwick
47e7cbe147 Update HISTORY 2017-11-28 13:00:31 +09:00
Ian Barwick
bf0be3eb43 Bump version
4.0.1
2017-11-28 12:36:22 +09:00
Ian Barwick
270da1294c repmgr: initialise "voting_term" in "repmgr primary register"
This previously happened in the extension SQL code, which could
potentially cause replay problems if installing on a BDR cluster.

As this table is only required for streaming replication failover,
move the initialisation to "repmgr primary register".

Addresses GitHub #344 .
2017-11-28 12:26:33 +09:00
Ian Barwick
d3c47f450f docs: add 2ndQ yum repository installation instructions
These replace the HTML document at https://repmgr.org/yum-repository.html
2017-11-24 14:14:36 +09:00
Ian Barwick
c20475f94a Delete any replication slots copied by pg_rewind
If --force-rewind is used in conjunction with "repmgr node rejoin",
any replication slots present on the source node will be copied too;
it's essential to remove these to prevent stale slots being extant
when the node starts up.

We do this at file system level *before* the server starts to minimize
the risk of any problems.

Addresses GitHub #334
2017-11-24 11:15:14 +09:00
Ian Barwick
e0560c3e70 docs: fix configuration file example
Per report from Carlos Chapi.
2017-11-24 09:27:39 +09:00
Ian Barwick
3fa2bef6f4 repmgr: fix configuration file sanity check
The check was being carried out regardless of whether --copy-external-config-files
was specified, which means cloning will fail if no SSH connection is available.

Addresses GitHub #342
2017-11-23 22:50:28 +09:00
Ian Barwick
f8a0b051c8 repmgr: fix return code output for repmgr node check --action=...
Addresses GitHub #340
2017-11-23 10:35:41 +09:00
Martín Marqués
3e4a5e6ff5 Fix missing FQN for the nodes table.
This bug was not detected before because most users work with the repmgr
user. For that reason, the repmgr schema is already in the search_path
by default.

Add the repmgr schema to the nodes table in the LEFT JOIN used for
cluster show (and in other places)

Signed-off-by: Martín Marqués <martin.marques@2ndquadrant.com>
2017-11-23 10:35:38 +09:00
Ian Barwick
020b5b6982 docs: update 4.0.0 release notes 2017-11-21 16:27:18 +09:00
Ian Barwick
932326e4a0 Bump version in configure.in 2017-11-20 17:55:22 +09:00
Ian Barwick
019cd081e8 Bump version
4.0.0
2017-11-20 15:45:48 +09:00
Ian Barwick
3ace908126 docs: miscellaneous updates 2017-11-20 15:44:31 +09:00
Ian Barwick
2ad174489c docs: improve documentation of pg_basebackup_options 2017-11-20 15:30:31 +09:00
Ian Barwick
9124e0f0a2 docs: expand witness documentation 2017-11-20 15:29:31 +09:00
Ian Barwick
060b746743 docs: miscellaneous cleanup 2017-11-20 15:29:28 +09:00
Ian Barwick
bdb82d3aba docs: add initial witness server documentation 2017-11-20 15:29:24 +09:00
Ian Barwick
f6a6df3600 repmgrd: renable monitoring data recording when in archive recovery.
The warning emitted gives the impression that monitoring data shouldn't
be written if there's no streaming replication, but we can and should
do this as long as we have a primary connection.

Explictly document this in the code.

Also remove an unused variable warning.
2017-11-20 15:29:21 +09:00
Ian Barwick
67e27f9ecd Remove unneeded functions 2017-11-20 15:26:32 +09:00
Ian Barwick
454c0b7bd9 docs: add note about "service_promote_command" in repmgr.conf.sample
It must never contain "repmgr standby promote", as it is intended
to enable use of package-level promote commands such as Debian's
"pg_ctlcluster promote".

Addresses GitHub #336.
2017-11-20 12:31:24 +09:00
Ian Barwick
faf297b07f remove spurios "/base" path element in Barman tablespace cloning code.
Addresses GitHub #339
2017-11-20 11:10:30 +09:00
Ian Barwick
0dae8c9f0b repmgr: don't add empty "passfile" parameter in recovery.conf 2017-11-20 10:28:16 +09:00
Ian Barwick
3f872cde0c "repmgr node ...": fixes for 9.3
Mainly to account for the lack of replication slots.
2017-11-16 11:26:39 +09:00
Ian Barwick
e331069f53 Escape double-quotes in strings passed to an event notification script
The string in question will be generated internally by repmgr as a simple
one-line string with no control characters etc., so all that needs to be
escaped at the moment are any double quotes.
2017-11-16 10:38:55 +09:00
Ian Barwick
53ebde8f33 repmgrd: don't fail over unless more than 50% of active nodes are visible. 2017-11-15 14:04:41 +09:00
Ian Barwick
5e9d50f8ca repmgrd: finalize witness failover handling 2017-11-15 14:04:37 +09:00
Ian Barwick
347e753c27 repmgrd: synchronise repmgr.nodes table on witness server 2017-11-15 14:04:34 +09:00
Ian Barwick
2f978847b1 repmgrd: handle witness server 2017-11-15 14:04:30 +09:00
Ian Barwick
3014f72fda "witness register": set upstream_node_id to that of the primary 2017-11-15 14:04:26 +09:00
Ian Barwick
e02ddd0f37 repmgrd: basic witness node monitoring 2017-11-15 14:04:23 +09:00
Ian Barwick
29fcee2209 docs: add witness command reference files to file list 2017-11-15 14:04:19 +09:00
Ian Barwick
f61f7f82eb docs: add command reference for "witness (un)register" 2017-11-15 14:04:14 +09:00
Ian Barwick
efe28cbbeb witness (un)register: add --dry-run mode 2017-11-15 14:04:09 +09:00
Ian Barwick
6131c1d8ce witness unregister: enable execution when witness server is down
Also add help output for "repmgr witness --help".
2017-11-15 14:04:06 +09:00
Ian Barwick
c907b7b33d repmgr: minor fix to "repmgr standby --help" output 2017-11-15 14:04:01 +09:00
Ian Barwick
e6644305d3 Add "witness unregister" functionality 2017-11-15 14:03:57 +09:00
Ian Barwick
31b856dd9f Add "witness register" functionality 2017-11-15 14:03:54 +09:00
Ian Barwick
dff2bcc5de witness: initial code framework 2017-11-15 14:03:50 +09:00
Ian Barwick
688e609169 docs: add some more index entries 2017-11-15 14:03:44 +09:00
Ian Barwick
3e68c9fcc6 docs: document "passfile" configuration file parameter 2017-11-15 14:03:40 +09:00
Ian Barwick
d459b92186 Add configuration file "passfile"
This will enable a custom .pgpass to be included in "primary_conninfo"
(provided it's supported by the libpq version on the standby).
2017-11-15 14:03:37 +09:00
Ian Barwick
2a898721c0 docs: update release notes
Add note about changes to password handling.1
2017-11-15 14:03:34 +09:00
Ian Barwick
35782d83c0 Update extension SQL 2017-11-15 14:03:30 +09:00
Ian Barwick
e16eb42693 repmgrd: detect role change from primary to standby
If repmgrd is monitoring a primary which is taken off-line, then later
restored as a standby, detect this change and resume monitoring
in standby node.

Addresses GitHub #338.
2017-11-15 14:03:26 +09:00
Ian Barwick
4d6dc57589 repmgrd: check shared library is loaded
If this isn't the case, "repmgrd" will appear to run but not handle
failover correctly.

Address GitHub #337.
2017-11-15 14:03:18 +09:00
Ian Barwick
cbc97d84ac repmgrd: updates related to node_id handling 2017-11-15 14:03:15 +09:00
Ian Barwick
96fe7dd2d6 repmgrd: catch corner cases where monitoring data is not available 2017-11-15 14:03:12 +09:00
Ian Barwick
13935a88c9 repmgrd: ensure shmem is reinitialised after a restart 2017-11-09 19:51:31 +09:00
Ian Barwick
5275890467 repmgrd: misc fixes 2017-11-09 19:51:26 +09:00
Ian Barwick
7f865fdaf3 repmgrd: fix priority/node_id tie-break check 2017-11-09 19:51:22 +09:00
Ian Barwick
9e2fb7ea13 repmgrd: remove unneeded functions 2017-11-09 19:51:18 +09:00
Ian Barwick
a3428e4d8a repmgrd: simplify the candidate selection logic
All disconnected nodes will be in a static, known state, so as long as
each node has the same meta-information (repmgr.nodes) and is able
to retrieve the last receive LSN of the other nodes, it is possible
for each node to independently determine the best promotion candidate,
thereby reaching consensus without an explicit "voting" process.
2017-11-09 19:51:13 +09:00
Ian Barwick
03b9475755 repmgrd: fixes to failover handling
get_new_primary() returns NULL if no notification for the new primary has
been received, but the code was expecting it to return UNKNOWN_NODE_ID,
which was causing repmgrd to prematurely drop out of the new primary
detection loop if no notification had been received by the time the loop
started.

Also store the electoral term as a single row, single column table,
to ensure that all repmgrds see the same turn. It is then bumped
by the winning node after it gets promoted.

Various logging improvements.
2017-11-09 19:51:09 +09:00
Ian Barwick
de1eb3c459 Ensure shared memory functions handle NULL parameters correctly 2017-11-09 19:51:02 +09:00
Ian Barwick
a13eccccc5 Update .gitignore
Ignore output from "make installcheck"
2017-11-09 19:50:57 +09:00
Ian Barwick
158f132bc0 README: update links to https versions 2017-11-09 19:50:53 +09:00
Ian Barwick
cdf54d217a Fix lock acquisition in shared memory functions 2017-11-09 19:50:48 +09:00
Ian Barwick
1a8a82f207 Update repmgr.conf.sample 2017-11-09 19:50:42 +09:00
Ian Barwick
60e877ca39 docs: fix example in BDR section 2017-11-02 11:24:10 +09:00
Ian Barwick
91531bffe4 docs: tweak Markdown URL formatting 2017-11-01 10:59:10 +09:00
Ian Barwick
fc5f46ca5a docs: update links to repmgr 4.0 documentation 2017-11-01 10:49:58 +09:00
Ian Barwick
b76952e136 docs: update copyright info 2017-11-01 09:36:16 +09:00
Ian Barwick
c3a1969f55 docs: convert command reference sections to <refentry> format
Note that most entries still need a bit more tidying up, consistent structuring,
provision of more examples etc.
2017-10-31 11:29:49 +09:00
Ian Barwick
11d856a1ec "standby follow": get upstream record before server restart, if required
The standby may not always be available for connections right after it's
restarted, so attempting to connect and get the node's upstream record
after the restart may fail. Record is now retrieved before the restart.

Addresses GitHub #333.
2017-10-27 16:30:25 +09:00
Ian Barwick
fbf357947d docs: add sample output to "standby follow" and "standby promote" 2017-10-27 15:05:46 +09:00
Ian Barwick
47eaa99537 docs: add note about building docs 2017-10-27 10:46:58 +09:00
Ian Barwick
aeee11d1b7 docs: finalize conversion of existing BDR repmgr documentation 2017-10-26 18:57:34 +09:00
Ian Barwick
e4713c5eca docs: update configuration documentation 2017-10-26 18:57:29 +09:00
Ian Barwick
e55e5a0581 Initial conversion of existing BDR repmgr documentation 2017-10-26 18:56:58 +09:00
Ian Barwick
fb0aae183d Docs: update "repmgr cluster show" 2017-10-26 09:42:36 +09:00
Ian Barwick
52655e9cd5 Improve trim() function
Did not cope well with trailing spaces or entirely blank strings.
2017-10-26 09:42:26 +09:00
Ian Barwick
c5d91ca88c repmgr node rejoin: add --dry-run option 2017-10-26 09:42:12 +09:00
Ian Barwick
9f5edd07ad Fix typo 2017-10-26 09:35:25 +09:00
Ian Barwick
f58b102d51 Standardize terminology on "primary" (in place of "master") 2017-10-24 13:44:03 +09:00
Ian Barwick
90733aecf7 --dry-run available for "node rejoin" 2017-10-23 10:40:43 +09:00
Ian Barwick
e0be228c89 docs: fix formatting 2017-10-23 10:00:00 +09:00
Ian Barwick
a9759cf6ca Add --help output for "repmgr node service"
Addresses GitHub #329.
2017-10-20 16:49:29 +09:00
Ian Barwick
6852ac82c6 Add --help output for "repmgr node rejoin"
Addresses GitHub #329.
2017-10-20 16:49:19 +09:00
Ian Barwick
c27bd2a135 docs: fix typo 2017-10-20 16:06:46 +09:00
Ian Barwick
5045e2eb9d node rewind: add check for pg_rewind and --dry-run mode
Addresses GitHub #330
2017-10-20 14:16:56 +09:00
Ian Barwick
23f7af17a2 Note Barman configuration file parameter changes 2017-10-20 11:31:31 +09:00
Ian Barwick
93936c090d Fix error message typo 2017-10-20 11:19:12 +09:00
Ian Barwick
564c951f0c Prevent relative configuration file path being stored in the repmgr metadata
The configuration file path is stored to make remote execution of repmgr
(e.g. during "repmgr standby switchover") simpler, so relative paths
make no sense.

Addresses GitHub #332
2017-10-20 10:59:54 +09:00
Ian Barwick
3f5e8f6aec Update README
Main body of documentation moved to DocBook format and hosted at:

    https://repmgr.org/docs/index.html

as the existing README and sundry additional files were becoming
unmanageable. Conversion to DocBook format enables all documentation
to be managed in a single structured system, with cross-references,
indexes, linkable URLS etc.
2017-10-19 16:39:33 +09:00
Ian Barwick
a6a97cda86 docs: update "repmgr cluster show" page 2017-10-19 16:39:27 +09:00
Ian Barwick
18c8e4c529 Add placeholder FAQ.md
This replaces the original FAQ maintainted for repmgr 3.x; repmgr 4
documentation is now available in DocBook format.
2017-10-19 16:22:28 +09:00
Ian Barwick
6984fe7029 docs: expand release notes and redirect "changes-in-repmgr4.md" 2017-10-19 14:11:17 +09:00
Ian Barwick
5ecc3a0a8f Add 4.0 release notes 2017-10-19 13:59:03 +09:00
Ian Barwick
febde097be doc: add missing entry for "priority" in repmgr.conf.sample
Per report from Shaun Thomas.
2017-10-19 13:16:36 +09:00
Ian Barwick
19ea248226 docs: add more index references 2017-10-19 12:22:58 +09:00
Ian Barwick
acdbd1110a docs: note way of forcing recovery then quitting in single user mode 2017-10-19 12:22:54 +09:00
Ian Barwick
946683182c Documentation: update markup 2017-10-18 11:12:37 +09:00
Ian Barwick
c9fbb7febf Update package signature documentation 2017-10-18 10:51:35 +09:00
Ian Barwick
ff966fe533 Document "upgrading-from-repmgr3.md" moved to main repmgr documentation 2017-10-18 10:51:29 +09:00
Ian Barwick
7001960cc1 Update "repmgr node rejoin" documentation 2017-10-17 17:41:36 +09:00
Ian Barwick
1cfba44799 Add FAQ to documentation 2017-10-17 16:16:40 +09:00
Ian Barwick
d1f9ca4b43 Move deprecated command line option
Not required in repmgr4, we're keeping it around for backwards compatibility;
a warning will be issued if used.
2017-10-17 16:16:06 +09:00
Ian Barwick
f6c253f8a6 Various documentation fixes 2017-10-17 11:02:33 +09:00
Ian Barwick
95ec8d8b21 Bump doc version 2017-10-17 09:46:23 +09:00
Ian Barwick
041f1b7667 Merge commit '0b2a6fe2fb958f10f211f0656fd91cae980fd08d' into REL4_0_STABLE 2017-10-16 11:22:48 +09:00
Ian Barwick
104279016a Update HISTORY 2017-10-04 13:33:37 +09:00
Ian Barwick
901a7603b1 Stamp 4.0beta1 2017-10-04 13:01:49 +09:00
71 changed files with 1240 additions and 3733 deletions

72
HISTORY
View File

@@ -1,65 +1,21 @@
4.1.1 2018-09-05
logging: explicitly log the text of failed queries as ERRORs to
assist logfile analysis; GitHub #498
repmgr: truncate version string, if necessary; GitHub #490 (Ian)
repmgr: improve messages emitted during "standby promote" (Ian)
repmgr: "standby clone" - don't copy external config files in --dry-run
mode; GitHub #491 (Ian)
repmgr: add "cluster_cleanup" event; GitHub #492 (Ian)
repmgr: (standby switchover) improve detection of free walsenders;
GitHub #495 (Ian)
repmgr: (node rejoin) improve replication slot handling; GitHub #499 (Ian)
repmgrd: ensure that sending SIGHUP always results in the log file
being reopened; GitHub #485 (Ian)
repmgrd: report version number *after* logger initialisation; GitHub #487 (Ian)
repmgrd: fix startup on witness node when local data is stale; GitHub #488/#489 (Ian)
repmgrd: improve cascaded standby failover handling; GitHub #480 (Ian)
repmgrd: improve reconnection handling (Ian)
4.1.0 2018-07-31
repmgr: change default log_level to INFO, add documentation; GitHub #470 (Ian)
repmgr: add "--missing-slots" check to "repmgr node check" (Ian)
repmgr: improve command line error handling; GitHub #464 (Ian)
repmgr: fix "standby register --wait-sync" when no timeout provided (Ian)
repmgr: "cluster show" returns non-zero value if an issue encountered;
GitHub #456 (Ian)
repmgr: "node check" and "node status" returns non-zero value if an issue
encountered (Ian)
repmgr: add CSV output mode to "cluster event"; GitHub #471 (Ian)
repmgr: add -q/--quiet option to suppress non-error output; GitHub #468 (Ian)
repmgr: "node status" returns non-zero value if an issue encountered (Ian)
repmgr: enable "recovery_min_apply_delay" to be 0; GitHub #448 (Ian)
repmgr: "cluster cleanup" - add missing help options; GitHub #461/#462 (gclough)
repmgr: ensure witness node follows new primary after switchover;
GitHub #453 (Ian)
repmgr: fix witness node handling in "node check"/"node status";
GitHub #451 (Ian)
repmgr: fix "primary_slot_name" when using "standby clone" with --recovery-conf-only;
GitHub #474 (Ian)
repmgr: don't perform a switchover if an exclusive backup is running;
GitHub #476 (Martín)
repmgr: enable "witness unregister" to be run on any node; GitHub #472 (Ian)
repmgrd: create a PID file by default; GitHub #457 (Ian)
repmgrd: daemonize process by default; GitHub #458 (Ian)
4.0.6 2018-06-14
repmgr: (witness register) prevent registration of a witness server with the
same name as an existing node (Ian)
repmgr: (standby follow) check node has actually connected to new primary
before reporting success; GitHub #444 (Ian)
repmgr: (standby clone) improve handling of external configuration file copying,
including consideration in --dry-run check; GitHub #443 (Ian)
repmgr: (standby clone) don't require presence of "user" parameter in
conninfo string; GitHub #437 (Ian)
repmgr: (standby clone) improve documentation of --recovery-conf-only
mode; GitHub #438 (Ian)
repmgr: (node rejoin) fix bug when parsing --config-files parameter;
GitHub #442 (Ian)
repmgr: when using --dry-run, force log level to INFO to ensure output
will always be displayed; GitHub #441 (Ian)
same name as an existing node (Ian)
repmgr: (standby follow) check node has actually connected to new primary
before reporting success; GitHub #444 (Ian)
repmgr: (standby clone) improve handling of external configuration file copying,
including consideration in --dry-run check; GitHub #443 (Ian)
repmgr: (standby clone) don't require presence of "user" parameter in
conninfo string; GitHub #437 (Ian)
repmgr: (standby clone) improve documentation of --recovery-conf-only
mode; GitHub #438 (Ian)
repmgr: (node rejoin) fix bug when parsing --config-files parameter;
GitHub #442 (Ian)
repmgr: when using --dry-run, force log level to INFO to ensure output
will always be displayed; GitHub #441 (Ian)
repmgr: (cluster matrix/crosscheck) return non-zero exit code if node
connection issues detected; GitHub #447 (Ian)
repmgrd: ensure local node is counted as quorum member; GitHub #439 (Ian)
repmgrd: ensure local node is counted as quorum member; GitHub #439 (Ian)
4.0.5 2018-05-02
repmgr: poll demoted primary after restart as a standby during a

View File

@@ -11,10 +11,7 @@ EXTENSION = repmgr
DATA = \
repmgr--unpackaged--4.0.sql \
repmgr--4.0.sql \
repmgr--4.0--4.1.sql \
repmgr--4.1.sql
repmgr--4.0.sql
REGRESS = repmgr_extension

View File

@@ -28,8 +28,10 @@ char config_file_path[MAXPGPATH] = "";
static bool config_file_provided = false;
bool config_file_found = false;
static void parse_config(t_configuration_options *options, bool terse);
static void _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *warning_list);
static bool parse_bool(const char *s,
const char *config_item,
ItemList *error_list);
static void _parse_line(char *buf, char *name, char *value);
static void parse_event_notifications_list(t_configuration_options *options, const char *arg);
@@ -239,7 +241,7 @@ end_search:
}
static void
void
parse_config(t_configuration_options *options, bool terse)
{
/* Collate configuration file errors here for friendlier reporting */
@@ -331,12 +333,6 @@ _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *
options->primary_follow_timeout = DEFAULT_PRIMARY_FOLLOW_TIMEOUT;
options->standby_follow_timeout = DEFAULT_STANDBY_FOLLOW_TIMEOUT;
/*------------------------
* standby switchover settings
*------------------------
*/
options->standby_reconnect_timeout = DEFAULT_STANDBY_RECONNECT_TIMEOUT;
/*-----------------
* repmgrd settings
*-----------------
@@ -356,8 +352,7 @@ _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *
options->degraded_monitoring_timeout = -1;
options->async_query_timeout = DEFAULT_ASYNC_QUERY_TIMEOUT;
options->primary_notification_timeout = DEFAULT_PRIMARY_NOTIFICATION_TIMEOUT;
options->repmgrd_standby_startup_timeout = -1; /* defaults to "standby_reconnect_timeout" if not set */
memset(options->repmgrd_pid_file, 0, sizeof(options->repmgrd_pid_file));
options->standby_reconnect_timeout = DEFAULT_STANDBY_RECONNECT_TIMEOUT;
/*-------------
* witness settings
@@ -544,14 +539,6 @@ _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *
else if (strcmp(name, "standby_follow_timeout") == 0)
options->standby_follow_timeout = repmgr_atoi(value, name, error_list, 0);
/* standby switchover settings */
else if (strcmp(name, "standby_reconnect_timeout") == 0)
options->standby_reconnect_timeout = repmgr_atoi(value, name, error_list, 0);
/* node rejoin settings */
else if (strcmp(name, "node_rejoin_timeout") == 0)
options->node_rejoin_timeout = repmgr_atoi(value, name, error_list, 0);
/* node check settings */
else if (strcmp(name, "archive_ready_warning") == 0)
options->archive_ready_warning = repmgr_atoi(value, name, error_list, 1);
@@ -601,10 +588,8 @@ _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *
options->async_query_timeout = repmgr_atoi(value, name, error_list, 0);
else if (strcmp(name, "primary_notification_timeout") == 0)
options->primary_notification_timeout = repmgr_atoi(value, name, error_list, 0);
else if (strcmp(name, "repmgrd_standby_startup_timeout") == 0)
options->repmgrd_standby_startup_timeout = repmgr_atoi(value, name, error_list, 0);
else if (strcmp(name, "repmgrd_pid_file") == 0)
strncpy(options->repmgrd_pid_file, value, MAXPGPATH);
else if (strcmp(name, "standby_reconnect_timeout") == 0)
options->standby_reconnect_timeout = repmgr_atoi(value, name, error_list, 0);
/* witness settings */
else if (strcmp(name, "witness_sync_interval") == 0)
@@ -786,17 +771,6 @@ _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *
PQconninfoFree(conninfo_options);
}
/* set values for parameters which default to other parameters */
/*
* From 4.1, "repmgrd_standby_startup_timeout" replaces "standby_reconnect_timeout"
* in repmgrd; fall back to "standby_reconnect_timeout" if no value explicitly provided
*/
if (options->repmgrd_standby_startup_timeout == -1)
{
options->repmgrd_standby_startup_timeout = options->standby_reconnect_timeout;
}
/* add warning about changed "barman_" parameter meanings */
if ((options->barman_host[0] == '\0' && options->barman_server[0] != '\0') ||
(options->barman_host[0] != '\0' && options->barman_server[0] == '\0'))
@@ -821,12 +795,6 @@ _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *
item_list_append(error_list,
_("\replication_lag_critical\" must be greater than \"replication_lag_warning\""));
}
if (options->standby_reconnect_timeout < options->node_rejoin_timeout)
{
item_list_append(error_list,
_("\"standby_reconnect_timeout\" must be equal to or greater than \"node_rejoin_timeout\""));
}
}
@@ -991,11 +959,12 @@ parse_time_unit_parameter(const char *name, const char *value, char *dest, ItemL
char *ptr = NULL;
int targ = strtol(value, &ptr, 10);
if (targ < 0)
if (targ < 1)
{
if (errors != NULL)
{
item_list_append_format(errors,
item_list_append_format(
errors,
_("invalid value provided for \"%s\""),
name);
}
@@ -1049,7 +1018,6 @@ parse_time_unit_parameter(const char *name, const char *value, char *dest, ItemL
* - promote_delay
* - reconnect_attempts
* - reconnect_interval
* - repmgrd_standby_startup_timeout
* - retry_promote_interval_secs
*
* non-changeable options
@@ -1065,7 +1033,7 @@ parse_time_unit_parameter(const char *name, const char *value, char *dest, ItemL
*/
bool
reload_config(t_configuration_options *orig_options, t_server_type server_type)
reload_config(t_configuration_options *orig_options)
{
PGconn *conn;
t_configuration_options new_options = T_CONFIGURATION_OPTIONS_INITIALIZER;
@@ -1075,50 +1043,17 @@ reload_config(t_configuration_options *orig_options, t_server_type server_type)
static ItemList config_errors = {NULL, NULL};
static ItemList config_warnings = {NULL, NULL};
PQExpBufferData errors;
log_info(_("reloading configuration file"));
_parse_config(&new_options, &config_errors, &config_warnings);
if (server_type == PRIMARY || server_type == STANDBY)
{
if (new_options.promote_command[0] == '\0')
{
item_list_append(&config_errors, _("\"promote_command\": required parameter was not found"));
}
if (new_options.follow_command[0] == '\0')
{
item_list_append(&config_errors, _("\"follow_command\": required parameter was not found"));
}
}
if (config_errors.head != NULL)
{
ItemListCell *cell = NULL;
/* XXX dump errors to log */
log_warning(_("unable to parse new configuration, retaining current configuration"));
initPQExpBuffer(&errors);
appendPQExpBuffer(&errors,
"following errors were detected:\n");
for (cell = config_errors.head; cell; cell = cell->next)
{
appendPQExpBuffer(&errors,
" %s\n", cell->string);
}
log_detail("%s", errors.data);
termPQExpBuffer(&errors);
return false;
}
/* The following options cannot be changed */
if (new_options.node_id != orig_options->node_id)
@@ -1299,15 +1234,6 @@ reload_config(t_configuration_options *orig_options, t_server_type server_type)
config_changed = true;
}
/* repmgrd_standby_startup_timeout */
if (orig_options->repmgrd_standby_startup_timeout != new_options.repmgrd_standby_startup_timeout)
{
orig_options->repmgrd_standby_startup_timeout = new_options.repmgrd_standby_startup_timeout;
log_info(_("\"repmgrd_standby_startup_timeout\" is now \"%i\""), new_options.repmgrd_standby_startup_timeout);
config_changed = true;
}
/*
* Handle changes to logging configuration
*/
@@ -1400,23 +1326,13 @@ exit_with_config_file_errors(ItemList *config_errors, ItemList *config_warnings,
void
exit_with_cli_errors(ItemList *error_list, const char *repmgr_command)
exit_with_cli_errors(ItemList *error_list)
{
fprintf(stderr, _("The following command line errors were encountered:\n"));
print_item_list(error_list);
if (repmgr_command != NULL)
{
fprintf(stderr, _("Try \"%s --help\" or \"%s %s --help\" for more information.\n"),
progname(),
progname(),
repmgr_command);
}
else
{
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname());
}
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname());
exit(ERR_BAD_CONFIG);
}
@@ -1521,7 +1437,7 @@ repmgr_atoi(const char *value, const char *config_item, ItemList *error_list, in
*
* https://www.postgresql.org/docs/current/static/config-setting.html
*/
bool
static bool
parse_bool(const char *s, const char *config_item, ItemList *error_list)
{
PQExpBufferData errors;
@@ -1807,9 +1723,6 @@ free_parsed_argv(char ***argv_array)
}
bool
parse_pg_basebackup_options(const char *pg_basebackup_options, t_basebackup_options *backup_options, int server_version_num, ItemList *error_list)
{

View File

@@ -102,12 +102,6 @@ typedef struct
int primary_follow_timeout;
int standby_follow_timeout;
/* standby switchover settings */
int standby_reconnect_timeout;
/* node rejoin settings */
int node_rejoin_timeout;
/* node check settings */
int archive_ready_warning;
int archive_ready_critical;
@@ -130,8 +124,7 @@ typedef struct
int degraded_monitoring_timeout;
int async_query_timeout;
int primary_notification_timeout;
int repmgrd_standby_startup_timeout;
char repmgrd_pid_file[MAXPGPATH];
int standby_reconnect_timeout;
/* BDR settings */
bool bdr_local_monitoring_only;
@@ -180,10 +173,6 @@ typedef struct
/* standby follow settings */ \
DEFAULT_PRIMARY_FOLLOW_TIMEOUT, \
DEFAULT_STANDBY_FOLLOW_TIMEOUT, \
/* standby switchover settings */ \
DEFAULT_STANDBY_RECONNECT_TIMEOUT, \
/* node rejoin settings */ \
DEFAULT_NODE_REJOIN_TIMEOUT, \
/* node check settings */ \
DEFAULT_ARCHIVE_READY_WARNING, DEFAULT_ARCHIVE_READY_CRITICAL, \
DEFAULT_REPLICATION_LAG_WARNING, DEFAULT_REPLICATION_LAG_CRITICAL, \
@@ -197,7 +186,7 @@ typedef struct
false, -1, \
DEFAULT_ASYNC_QUERY_TIMEOUT, \
DEFAULT_PRIMARY_NOTIFICATION_TIMEOUT, \
-1, "", \
DEFAULT_STANDBY_RECONNECT_TIMEOUT, \
/* BDR settings */ \
false, DEFAULT_BDR_RECOVERY_TIMEOUT, \
/* service settings */ \
@@ -273,20 +262,16 @@ typedef struct
"", "", "", "" \
}
#include "dbutils.h"
void set_progname(const char *argv0);
const char *progname(void);
void load_config(const char *config_file, bool verbose, bool terse, t_configuration_options *options, char *argv0);
bool reload_config(t_configuration_options *orig_options, t_server_type server_type);
void parse_config(t_configuration_options *options, bool terse);
bool reload_config(t_configuration_options *orig_options);
bool parse_recovery_conf(const char *data_dir, t_recovery_conf *conf);
bool parse_bool(const char *s,
const char *config_item,
ItemList *error_list);
int repmgr_atoi(const char *s,
const char *config_item,
ItemList *error_list,
@@ -302,7 +287,7 @@ void free_parsed_argv(char ***argv_array);
/* called by repmgr-client and repmgrd */
void exit_with_cli_errors(ItemList *error_list, const char *repmgr_command);
void exit_with_cli_errors(ItemList *error_list);
void print_item_list(ItemList *item_list);
#endif /* _REPMGR_CONFIGFILE_H_ */

18
configure vendored
View File

@@ -1,6 +1,6 @@
#! /bin/sh
# Guess values for system-dependent variables and create Makefiles.
# Generated by GNU Autoconf 2.69 for repmgr 4.1.1.
# Generated by GNU Autoconf 2.69 for repmgr 4.0.5.
#
# Report bugs to <pgsql-bugs@postgresql.org>.
#
@@ -582,8 +582,8 @@ MAKEFLAGS=
# Identity of this package.
PACKAGE_NAME='repmgr'
PACKAGE_TARNAME='repmgr'
PACKAGE_VERSION='4.1.1'
PACKAGE_STRING='repmgr 4.1.1'
PACKAGE_VERSION='4.0.5'
PACKAGE_STRING='repmgr 4.0.5'
PACKAGE_BUGREPORT='pgsql-bugs@postgresql.org'
PACKAGE_URL='https://2ndquadrant.com/en/resources/repmgr/'
@@ -1178,7 +1178,7 @@ if test "$ac_init_help" = "long"; then
# Omit some internal or obsolete options to make the list less imposing.
# This message is too long to be a string in the A/UX 3.1 sh.
cat <<_ACEOF
\`configure' configures repmgr 4.1.1 to adapt to many kinds of systems.
\`configure' configures repmgr 4.0.5 to adapt to many kinds of systems.
Usage: $0 [OPTION]... [VAR=VALUE]...
@@ -1239,7 +1239,7 @@ fi
if test -n "$ac_init_help"; then
case $ac_init_help in
short | recursive ) echo "Configuration of repmgr 4.1.1:";;
short | recursive ) echo "Configuration of repmgr 4.0.5:";;
esac
cat <<\_ACEOF
@@ -1313,7 +1313,7 @@ fi
test -n "$ac_init_help" && exit $ac_status
if $ac_init_version; then
cat <<\_ACEOF
repmgr configure 4.1.1
repmgr configure 4.0.5
generated by GNU Autoconf 2.69
Copyright (C) 2012 Free Software Foundation, Inc.
@@ -1332,7 +1332,7 @@ cat >config.log <<_ACEOF
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.
It was created by repmgr $as_me 4.1.1, which was
It was created by repmgr $as_me 4.0.5, which was
generated by GNU Autoconf 2.69. Invocation command line was
$ $0 $@
@@ -2359,7 +2359,7 @@ cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
# report actual input values of CONFIG_FILES etc. instead of their
# values after options handling.
ac_log="
This file was extended by repmgr $as_me 4.1.1, which was
This file was extended by repmgr $as_me 4.0.5, which was
generated by GNU Autoconf 2.69. Invocation command line was
CONFIG_FILES = $CONFIG_FILES
@@ -2422,7 +2422,7 @@ _ACEOF
cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
ac_cs_config="`$as_echo "$ac_configure_args" | sed 's/^ //; s/[\\""\`\$]/\\\\&/g'`"
ac_cs_version="\\
repmgr config.status 4.1.1
repmgr config.status 4.0.5
configured by $0, generated by GNU Autoconf 2.69,
with options \\"\$ac_cs_config\\"

View File

@@ -1,4 +1,4 @@
AC_INIT([repmgr], [4.1.1], [pgsql-bugs@postgresql.org], [repmgr], [https://2ndquadrant.com/en/resources/repmgr/])
AC_INIT([repmgr], [4.0.6], [pgsql-bugs@postgresql.org], [repmgr], [https://2ndquadrant.com/en/resources/repmgr/])
AC_COPYRIGHT([Copyright (c) 2010-2018, 2ndQuadrant Ltd.])

1121
dbutils.c

File diff suppressed because it is too large Load Diff

View File

@@ -29,9 +29,7 @@
#include "voting.h"
#define REPMGR_NODES_COLUMNS "n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name "
#define BDR2_NODES_COLUMNS "node_sysid, node_timeline, node_dboid, node_name, node_local_dsn, ''"
#define BDR3_NODES_COLUMNS "ns.node_id, 0, 0, ns.node_name, ns.interface_connstr, ns.peer_state_name"
#define BDR_NODES_COLUMNS "node_sysid, node_timeline, node_dboid, node_status, node_name, node_local_dsn, node_init_from_dsn, node_read_only, node_seq_id"
#define ERRBUFF_SIZE 512
@@ -96,14 +94,6 @@ typedef enum
SLOT_ACTIVE
} ReplSlotStatus;
typedef enum
{
BACKUP_STATE_UNKNOWN = -1,
BACKUP_STATE_IN_BACKUP,
BACKUP_STATE_NO_BACKUP
} BackupState;
/*
* Struct to store node information
*/
@@ -247,14 +237,18 @@ typedef struct s_bdr_node_info
char node_sysid[MAXLEN];
uint32 node_timeline;
uint32 node_dboid;
char node_status;
char node_name[MAXLEN];
char node_local_dsn[MAXLEN];
char peer_state_name[MAXLEN];
char node_init_from_dsn[MAXLEN];
bool read_only;
uint32 node_seq_id;
} t_bdr_node_info;
#define T_BDR_NODE_INFO_INITIALIZER { \
"", InvalidOid, InvalidOid, \
"", "", "" \
'?', "", "", "", \
false, -1 \
}
@@ -398,7 +392,6 @@ int get_ready_archive_files(PGconn *conn, const char *data_directory);
bool identify_system(PGconn *repl_conn, t_system_identification *identification);
bool repmgrd_set_local_node_id(PGconn *conn, int local_node_id);
int repmgrd_get_local_node_id(PGconn *conn);
BackupState server_in_exclusive_backup_mode(PGconn *conn);
/* extension functions */
ExtensionStatus get_repmgr_extension_status(PGconn *conn);
@@ -475,7 +468,7 @@ int wait_connection_availability(PGconn *conn, long long timeout);
/* node availability functions */
bool is_server_available(const char *conninfo);
bool is_server_available_params(t_conninfo_param_list *param_list);
ExecStatusType connection_ping(PGconn *conn);
void connection_ping(PGconn *conn);
/* monitoring functions */
void
@@ -514,14 +507,12 @@ void get_node_replication_stats(PGconn *conn, int server_version_num, t_node_in
bool is_downstream_node_attached(PGconn *conn, char *node_name);
/* BDR functions */
int get_bdr_version_num(void);
void get_all_bdr_node_records(PGconn *conn, BdrNodeInfoList *node_list);
RecordStatus get_bdr_node_record_by_name(PGconn *conn, const char *node_name, t_bdr_node_info *node_info);
bool is_bdr_db(PGconn *conn, PQExpBufferData *output);
bool is_bdr_db_quiet(PGconn *conn);
bool is_active_bdr_node(PGconn *conn, const char *node_name);
bool is_bdr_repmgr(PGconn *conn);
char *get_default_bdr_replication_set(PGconn *conn);
bool is_table_in_bdr_replication_set(PGconn *conn, const char *tablename, const char *set);
bool add_table_to_bdr_replication_set(PGconn *conn, const char *tablename, const char *set);
void add_extension_tables_to_bdr_replication_set(PGconn *conn);

View File

@@ -108,14 +108,6 @@
is not possible, contact your vendor for assistance.
</para>
</sect2>
<sect2 id="faq-old-packages">
<title>How can I obtain old versions of &repmgr; packages?</title>
<para>
See appendix <xref linkend="packages-old-versions"> for details.
</para>
</sect2>
</sect1>
<sect1 id="faq-repmgr" xreflabel="repmgr">
@@ -247,22 +239,11 @@
Under some circumstances event notifications can be generated for servers
which have not yet been registered; it's also useful to retain a record
of events which includes servers removed from the replication cluster
which no longer have an entry in the <literal>repmgr.nodes</literal> table.
which no longer have an entry in the <literal>repmrg.nodes</literal> table.
</para>
</sect2>
<sect2 id="faq-repmgr-recovery-conf-quoted-values" xreflabel="Quoted values in recovery.conf">
<title>Why are some values in <filename>recovery.conf</filename> surrounded by pairs of single quotes?</title>
<para>
This is to ensure that user-supplied values which are written as parameter values in <filename>recovery.conf</filename>
are escaped correctly and do not cause errors when <filename>recovery.conf</filename> is parsed.
</para>
<para>
The escaping is performed by an internal PostgreSQL routine, which leaves strings consisting
of digits and alphabetical characters only as-is, but wraps everything else in pairs of single quotes,
even if the string does not contain any characters which need escaping.
</para>
</sect2>
</sect1>
@@ -274,7 +255,7 @@
<sect2 id="faq-repmgrd-prevent-promotion" xreflabel="Prevent standby from being promoted to primary">
<title>How can I prevent a node from ever being promoted to primary?</title>
<para>
In <filename>repmgr.conf</filename>, set its priority to a value of <literal>0</literal>; apply the changed setting with
In `repmgr.conf`, set its priority to a value of 0 or less; apply the changed setting with
<command><link linkend="repmgr-standby-register">repmgr standby register --force</link></command>.
</para>
<para>
@@ -322,36 +303,5 @@
</para>
</sect2>
<sect2 id="faq-repmgrd-pg-bindir" xreflabel="repmgrd does not apply pg_bindir to promote_command or follow_command">
<title>
<application>repmgrd</application> ignores pg_bindir when executing <varname>promote_command</varname> or <varname>follow_command</varname>
</title>
<para>
<varname>promote_command</varname> or <varname>follow_command</varname> can be user-defined scripts,
so &repmgr; will not apply <option>pg_bindir</option> even if excuting &repmgr;. Always provide the full
path; see <xref linkend="repmgrd-automatic-failover-configuration"> for more details.
</para>
</sect2>
<sect2 id="faq-repmgrd-startup-no-upstream" xreflabel="repmgrd does not start if upstream node is not running">
<title>
<application>repmgrd</application> aborts startup with the error "<literal>upstream node must be running before repmgrd can start</literal>"
</title>
<para>
<application>repmgrd</application> does this to avoid starting up on a replication cluster
which is not in a healthy state. If the upstream is unavailable, <application>repmgrd</application>
may initiate a failover immediately after starting up, which could have unintended side-effects,
particularly if <application>repmgrd</application> is not running on other nodes.
</para>
<para>
In particular, it's possible that the node's local copy of the <literal>repmgr.nodes</literal> copy
is out-of-date, which may lead to incorrect failover behaviour.
</para>
<para>
The onus is therefore on the adminstrator to manually set the cluster to a stable, healthy state before
starting <application>repmgrd</application>.
</para>
</sect2>
</sect1>
</appendix>

View File

@@ -53,11 +53,11 @@
<tbody>
<row>
<entry>Repository URL:</entry>
<entry><ulink url="https://dl.2ndquadrant.com/">https://dl.2ndquadrant.com/</ulink></entry>
<entry><ulink url="https://rpm.2ndquadrant.com/">https://rpm.2ndquadrant.com/</ulink></entry>
</row>
<row>
<entry>Repository documentation:</entry>
<entry><ulink url="https://repmgr.org/docs/4.1/installation-packages.html#INSTALLATION-PACKAGES-REDHAT-2NDQ">https://repmgr.org/docs/4.1/installation-packages.html#INSTALLATION-PACKAGES-REDHAT-2NDQ</ulink></entry>
<entry><ulink url="https://repmgr.org/docs/4.0/installation-packages.html#INSTALLATION-PACKAGES-REDHAT-2NDQ">https://repmgr.org/docs/4.0/installation-packages.html#INSTALLATION-PACKAGES-REDHAT-2NDQ</ulink></entry>
</row>
</tbody>
</tgroup>
@@ -253,23 +253,6 @@
</para>
<table id="apt-2ndquadrant-repository">
<title>2ndQuadrant public repository</title>
<tgroup cols="2">
<tbody>
<row>
<entry>Repository URL:</entry>
<entry><ulink url="https://dl.2ndquadrant.com/">https://dl.2ndquadrant.com/</ulink></entry>
</row>
<row>
<entry>Repository documentation:</entry>
<entry><ulink url="https://repmgr.org/docs/4.1/installation-packages.html#INSTALLATION-PACKAGES-DEBIAN">https://repmgr.org/docs/4.1/installation-packages.html#INSTALLATION-PACKAGES-DEBIAN</ulink></entry>
</row>
</tbody>
</tgroup>
</table>
<table id="apt-repository">
<title>PostgreSQL Community APT repository (PGDG)</title>
<tgroup cols="2">
@@ -381,169 +364,4 @@
</sect2>
</sect1>
<sect1 id="packages-snapshot" xreflabel="Snapshot packages">
<title>Snapshot packages</title>
<indexterm>
<primary>snapshot packages</primary>
</indexterm>
<indexterm>
<primary>packages</primary>
<secondary>snaphots</secondary>
</indexterm>
<para>
For testing new features and bug fixes, from time to time 2ndQuadrant provides
so-called &quot;snapshot packages&quot; via its public repository. These packages
are built from the &repmgr; source at a particular point in time, and are not formal
releases.
</para>
<note>
<para>
We do not recommend installing these packages in a production environment
unless specifically advised.
</para>
</note>
<para>
To install a snapshot package, it's necessary to install the 2ndQuadrant public snapshot repository,
following the instructions here: <ulink url="https://dl.2ndquadrant.com/default/release/site/">https://dl.2ndquadrant.com/default/release/site/</ulink> but replace <literal>release</literal> with <literal>snapshot</literal>
in the appropriate URL.
</para>
<para>
For example, to install the snapshot RPM repository for PostgreSQL 9.6, execute (as <literal>root</literal>):
<programlisting>
curl https://dl.2ndquadrant.com/default/snapshot/get/9.6/rpm | bash</programlisting>
or as a normal user with root sudo access:
<programlisting>
curl https://dl.2ndquadrant.com/default/snapshot/get/9.6/rpm | sudo bash</programlisting>
</para>
<para>
Alternatively you can browse the repository here:
<ulink url="https://dl.2ndquadrant.com/default/snapshot/browse/">https://dl.2ndquadrant.com/default/snapshot/browse/</ulink>.
</para>
<para>
Once the repository is installed, installing or updating &repmgr; will result in the latest snapshot
package being installed.
</para>
<para>
The package name will be formatted like this:
<programlisting>
repmgr96-4.1.1-0.0git320.g5113ab0.1.el7.x86_64.rpm</programlisting>
containg the snapshot build number (here: <literal>320</literal>) and the hash
of the <application>git</application> commit it was built from (here: <literal>g5113ab0</literal>).
</para>
<para>
Note that the next formal release (in the above example <literal>4.1.1</literal>), once available,
will install in place of any snapshot builds.
</para>
</sect1>
<sect1 id="packages-old-versions" xreflabel="Installing old package versions">
<title>Installing old package versions</title>
<indexterm>
<primary>old packages</primary>
</indexterm>
<indexterm>
<primary>packages</primary>
<secondary>old versions</secondary>
</indexterm>
<sect2 id="packages-old-versions-debian" xreflabel="old Debian package versions">
<title>Debian/Ubuntu</title>
<para>
An archive of old packages (<literal>3.3.2</literal> and later) for Debian/Ubuntu-based systems is available here:
<ulink url="http://atalia.postgresql.org/morgue/r/repmgr/">http://atalia.postgresql.org/morgue/r/repmgr/</ulink>
</para>
</sect2>
<sect2 id="packages-old-versions-rhel-centos" xreflabel="old RHEL/CentOS package versions">
<title>RHEL/CentOS</title>
<para>
Old RPM packages (<literal>3.2</literal> and later) can be retrieved from the
(deprecated) 2ndQuadrant repository at
<ulink url="http://packages.2ndquadrant.com/">http://packages.2ndquadrant.com/</ulink>
by installing the appropriate repository RPM:
</para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<ulink url="http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-fedora-1.0-1.noarch.rpm">http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-fedora-1.0-1.noarch.rpm</ulink>
</simpara>
</listitem>
<listitem>
<simpara>
<ulink url="http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-rhel-1.0-1.noarch.rpm">http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-rhel-1.0-1.noarch.rpm</ulink>
</simpara>
</listitem>
</itemizedlist>
<para>
Old versions can be located with e.g.:
<programlisting>
yum --showduplicates list repmgr96</programlisting>
(substitute the appropriate package name; see <xref linkend="packages-centos">) and installed with:
<programlisting>
yum install {package_name}-{version}</programlisting>
where <literal>{package_name}</literal> is the base package name (e.g. <literal>repmgr96</literal>)
and <literal>{version}</literal> is the version listed by the
<command> yum --showduplicates list ...</command> command, e.g. <literal>4.0.6-1.rhel6</literal>.
</para>
<para>For example:
<programlisting>
yum install repmgr96-4.0.6-1.rhel6</programlisting>
</para>
</sect2>
</sect1>
<sect1 id="packages-packager-info" xreflabel="Information for packagers">
<title>Information for packagers</title>
<indexterm>
<primary>packages</primary>
<secondary>information for packagers</secondary>
</indexterm>
<para>
We recommend patching the following parameters when
building the package as built-in default values for user convenience.
These values can nevertheless be overridden by the user, if desired.
</para>
<itemizedlist>
<listitem>
<para>
Configuration file location: the default configuration file location
can be hard-coded by patching <varname>package_conf_file</varname>
in <filename>configfile.c</filename>:
<programlisting>
/* packagers: if feasible, patch configuration file path into "package_conf_file" */
char package_conf_file[MAXPGPATH] = "";</programlisting>
</para>
<para>
See also: <xref linkend="configuration-file">
</para>
</listitem>
<listitem>
<para>
PID file location: the default <application>repmgrd</application> PID file
location can be hard-coded by patching <varname>package_pid_file</varname>
in <filename>repmgrd.c</filename>:
<programlisting>
/* packagers: if feasible, patch PID file path into "package_pid_file" */
char package_pid_file[MAXPGPATH] = "";</programlisting>
</para>
<para>
See also: <xref linkend="repmgrd-pid-file">
</para>
</listitem>
</itemizedlist>
</sect1>
</appendix>

View File

@@ -15,374 +15,9 @@
See also: <xref linkend="upgrading-repmgr">
</para>
<sect1 id="release-4.1.1">
<title>Release 4.1.1</title>
<para><emphasis>Wed September 5, 2018</emphasis></para>
<para>
repmgr 4.1.1 contains a number of usability enhancements and bug fixes.
</para>
<para>
We recommend upgrading to this version as soon as possible.
This release can be installed as a simple package upgrade from repmgr 4.0 ~ 4.1.0;
<application>repmgrd</application> (if running) should be restarted.
See <xref linkend="upgrading-repmgr"> for more details.
</para>
<sect2>
<title>repmgr enhancements</title>
<para>
<itemizedlist>
<listitem>
<para>
<command><link linkend="repmgr-standby-clone">repmgr standby switchover --dry-run</link></command>
no longer copies external configuration files to test they can be copied; this avoids making
any changes to the target system. (GitHub #491).
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-cluster-cleanup">repmgr cluster cleanup</link></command>:
add <literal>cluster_cleanup</literal> event. (GitHub #492)
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-standby-switchover">repmgr standby switchover</link></command>:
improve detection of free walsenders. (GitHub #495).
</para>
</listitem>
<listitem>
<para>
Improve messages emitted during
<command><link linkend="repmgr-standby-promote">repmgr standby promote</link></command>.
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
<sect2>
<title>repmgrd enhancements</title>
<para>
<itemizedlist>
<listitem>
<para>
Always reopen the log file after
receiving <literal>SIGHUP</literal>. Previously this only happened if
a configuration file change was detected.
(GitHub #485).
</para>
</listitem>
<listitem>
<para>
Report version number <emphasis>after</emphasis>
logger initialisation. (GitHub #487).
</para>
</listitem>
<listitem>
<para>
Improve cascaded standby failover handling. (GitHub #480).
</para>
</listitem>
<listitem>
<para>
Improve reconnection handling after brief network outages; if
monitoring data being collected, this could lead to orphaned
sessions on the primary. (GitHub #480).
</para>
</listitem>
<listitem>
<para>
Check <varname>promote_command</varname> and <varname>follow_command</varname>
are defined when reloading configuration. These were checked on startup but
not reload by <application>repmgrd</application>, which made it possible to
make <application>repmgrd</application> with invalid values. It's unlikely
anyone would want to do this, but we should make it impossible anyway.
(GitHub #486).
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
<sect2>
<title>Other</title>
<para>
<itemizedlist>
<listitem>
<para>
Text of any failed queries will now be logged as <literal>ERROR</literal> to assist
logfile analysis at log levels higher than <literal>DEBUG</literal>.
(GitHub #498).
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
<sect2>
<title>Bug fixes</title>
<para>
<itemizedlist>
<listitem>
<para>
<command><link linkend="repmgr-node-rejoin">repmgr node rejoin</link></command>:
remove new upstream's replication slot if it still exists on the rejoined
standby. (GitHub #499).
</para>
</listitem>
<listitem>
<para>
<application>repmgrd</application>: fix startup on witness node when local data is stale. (GitHub #488, #489).
</para>
</listitem>
<listitem>
<para>
Truncate version string reported by PostgreSQL if necessary; some
distributions insert additional detail after the actual version.
(GitHub #490).
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
</sect1>
<sect1 id="release-4.1.0">
<title>Release 4.1.0</title>
<para><emphasis>Tue July 31, 2018</emphasis></para>
<para>
&repmgr; 4.1.0 introduces some changes to <application>repmgrd</application>
behaviour and some additional configuration parameters.
</para>
<para>
This release can be installed as a simple package upgrade from repmgr 4.0 ~ 4.0.6.
The following post-upgrade steps must be carried out:
<itemizedlist>
<listitem>
<para>
Execute <command>ALTER EXTENSION repmgr UPDATE</command>
on the primary server in the database where &repmgr; is installed.
</para>
</listitem>
<listitem>
<para>
<application>repmgrd</application> must be restarted on all nodes where it is running.
</para>
</listitem>
</itemizedlist>
A restart of the PostgreSQL server is <emphasis>not</emphasis> required
for this release (unless upgrading from repmgr 3.x).
</para>
<para>
See <xref linkend="upgrading-repmgr-extension"> for more details.
</para>
<para>
Configuration changes are backwards-compatible and no changes to
<filename>repmgr.conf</filename> are required. However users should
review the changes listed below.
</para>
<note>
<para>
<emphasis>Repository changes</emphasis>
</para>
<para>
Coinciding with this release, the 2ndQuadrant repository structure has changed.
See section <xref linkend="installation-packages"> for details, particularly
if you are using a RPM-based system.
</para>
</note>
<sect2>
<title>Configuration file changes</title>
<para>
<itemizedlist>
<listitem>
<para>
Default for <xref linkend="repmgr-conf-log-level"> is now <option>INFO</option>.
This produces additional informative log output, without creating excessive additional
log file volume, and matches the setting assumed for examples in the documentation.
(GitHub #470).
</para>
</listitem>
<listitem>
<para>
<varname>recovery_min_apply_delay</varname> now accepts a minimum value
of <literal>zero</literal> (GitHub #448).
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
<sect2>
<title>repmgr enhancements</title>
<para>
<itemizedlist>
<listitem>
<para>
<application>repmgr</application>: always exit with an error if an unrecognised
command line option is provided. This matches the behaviour of other PostgreSQL
utilities such as <application>psql</application>. (GitHub #464).
</para>
</listitem>
<listitem>
<para>
<application>repmgr</application>: add <option>-q/--quiet</option> option to suppress non-error
output. (GitHub #468).
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-cluster-show">repmgr cluster show</link></command>,
<command><link linkend="repmgr-node-check">repmgr node check</link></command> and
<command><link linkend="repmgr-node-status">repmgr node status</link></command>
return non-zero exit code if node status issues detected. (GitHub #456).
</para>
</listitem>
<listitem>
<para>
Add <option>--csv</option> output option for
<command><link linkend="repmgr-cluster-event">repmgr cluster event</link></command>.
(GitHub #471).
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-witness-unregister">repmgr witness unregister</link></command>
can be run on any node, by providing the ID of the witness node with <option>--node-id</option>.
(GitHub #472).
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-standby-switchover">repmgr standby switchover</link></command>
will refuse to run if an exclusive backup is taking place on the current primary.
(GitHub #476).
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
<sect2>
<title>repmgrd enhancements</title>
<para>
<itemizedlist>
<listitem>
<para>
<application>repmgrd</application>: create a PID file by default
(GitHub #457). For details, see <xref linkend="repmgrd-pid-file">.
</para>
</listitem>
<listitem>
<para>
<application>repmgrd</application>: daemonize process by default.
In case, for whatever reason, the user does not wish to daemonize the
process, provide <option>--daemonize=false</option>.
(GitHub #458).
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
<sect2>
<title>Bug fixes</title>
<para>
<itemizedlist>
<listitem>
<para>
<command><link linkend="repmgr-standby-register">repmgr standby register --wait-sync</link></command>:
fix behaviour when no timeout provided.
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-cluster-cleanup">repmgr cluster cleanup</link></command>:
add missing help options. (GitHub #461/#462).
</para>
</listitem>
<listitem>
<para>
Ensure witness node follows new primary after switchover. (GitHub #453).
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-node-check">repmgr node check</link></command> and
<command><link linkend="repmgr-node-status">repmgr node status</link></command>:
fix witness node handling. (GitHub #451).
</para>
</listitem>
<listitem>
<para>
When using <command><link linkend="repmgr-standby-clone">repmgr standby clone</link></command>
with <option>--recovery-conf-only</option> and replication slots, ensure
<varname>primary_slot_name</varname> is set correctly. (GitHub #474).
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
</sect1>
<sect1 id="release-4.0.6">
<title>Release 4.0.6</title>
<para><emphasis>Thu June 14, 2018</emphasis></para>
<para><emphasis>June 14, 2018</emphasis></para>
<para>
&repmgr; 4.0.6 contains a number of bug fixes and usability enhancements.
</para>
@@ -505,7 +140,7 @@
<listitem>
<para>
Various documentation improvements, with particular emphasis on
the importance of setting appropriate <link linkend="configuration-file-service-commands">service commands</link>
the importance of setting appropriate <link linkend="configuration-service-commands">service commands</link>
instead of relying on <application>pg_ctl</application>.
</para>
</listitem>

View File

@@ -5,14 +5,14 @@
<title>repmgr source code signing key</title>
<para>
The signing key ID used for <application>repmgr</application> source code bundles is:
<ulink url="https://repmgr.org/download/SOURCE-GPG-KEY-repmgr">
<ulink url="http://packages.2ndquadrant.com/repmgr/SOURCE-GPG-KEY-repmgr">
<literal>0x297F1DCC</literal></ulink>.
</para>
<para>
To download the <application>repmgr</application> source key to your computer:
<programlisting>
curl -s https://repmgr.org/download/SOURCE-GPG-KEY-repmgr | gpg --import
curl -s http://packages.2ndquadrant.com/repmgr/SOURCE-GPG-KEY-repmgr | gpg --import
gpg --fingerprint 0x297F1DCC
</programlisting>
then verify that the fingerprint is the expected value:

View File

@@ -1,107 +0,0 @@
<sect1 id="configuration-file-log-settings" xreflabel="log settings">
<indexterm>
<primary>repmgr.conf</primary>
<secondary>log settings</secondary>
</indexterm>
<indexterm>
<primary>log settings</primary>
<secondary>configuration in repmgr.conf</secondary>
</indexterm>
<title>Log settings</title>
<para>
By default, &repmgr; and <application>repmgrd</application> write log output to
<literal>STDERR</literal>. An alternative log destination can be specified
(either a file or <literal>syslog</literal>).
</para>
<note>
<para>
The &repmgr; application itself will continue to write log output to <literal>STDERR</literal>
even if another log destination is configured, as otherwise any output resulting from a command
line operation will "disappear" into the log.
</para>
<para>
This behaviour can be overriden with the command line option <option>--log-to-file</option>,
which will redirect all logging output to the configured log destination. This is recommended
when &repmgr; is executed by another application, particularly <application>repmgrd</application>,
to enable log output generated by the &repmgr; application to be stored for later reference.
</para>
</note>
<variablelist>
<varlistentry id="repmgr-conf-log-level" xreflabel="log_level">
<term><varname>log_level</varname> (<type>string</type>)
<indexterm>
<primary><varname>log_level</varname> configuration file parameter</primary>
</indexterm>
</term>
<listitem>
<para>
One of <option>DEBUG</option>, <option>INFO</option>, <option>NOTICE</option>,
<option>WARNING</option>, <option>ERROR</option>, <option>ALERT</option>, <option>CRIT</option>
or <option>EMERG</option>.
</para>
<para>
Default is <option>INFO</option>.
</para>
<para>
Note that <option>DEBUG</option> will produce a substantial amount of log output
and should not be enabled in normal use.
</para>
</listitem>
</varlistentry>
<varlistentry id="repmgr-conf-log-facility" xreflabel="log_facility">
<term><varname>log_facility</varname> (<type>string</type>)
<indexterm>
<primary><varname>log_facility</varname> configuration file parameter</primary>
</indexterm>
</term>
<listitem>
<para>
Logging facility: possible values are <option>STDERR</option> (default), or for
syslog integration, one of <option>LOCAL0</option>, <option>LOCAL1</option>, <option>...</option>,
<option>LOCAL7</option>, <option>USER</option>.
</para>
</listitem>
</varlistentry>
<varlistentry id="repmgr-conf-log-file" xreflabel="log_file">
<term><varname>log_file</varname> (<type>string</type>)
<indexterm>
<primary><varname>log_file</varname> configuration file parameter</primary>
</indexterm>
</term>
<listitem>
<para>
If <xref linkend="repmgr-conf-log-facility"> is set to <option>STDERR</option>, log output
can be redirected to the specified file.
</para>
<para>
See <xref linkend="repmgrd-log-rotation"> for information on configuring log rotation.
</para>
</listitem>
</varlistentry>
<varlistentry id="repmgr-conf-log-status-interval" xreflabel="log_status_interval">
<term><varname>log_status_interval</varname> (<type>integer</type>)
<indexterm>
<primary><varname>log_status_interval</varname> configuration file parameter</primary>
</indexterm>
</term>
<listitem>
<para>
This setting causes <application>repmgrd</application> to emit a status log
line at the specified interval (in seconds, default <literal>300</literal>)
describing <application>repmgrd</application>'s current state, e.g.:
</para>
<programlisting>
[2018-07-12 00:47:32] [INFO] monitoring connection to upstream node "node1" (node ID: 1)</programlisting>
</listitem>
</varlistentry>
</variablelist>
</sect1>

View File

@@ -1,10 +1,10 @@
<sect1 id="configuration-file-settings" xreflabel="required configuration file settings">
<sect1 id="configuration-file-settings" xreflabel="configuration file settings">
<indexterm>
<primary>repmgr.conf</primary>
<secondary>required settings</secondary>
<secondary>basic settings</secondary>
</indexterm>
<title>Required configuration file settings</title>
<title>Basic configuration file settings</title>
<para>
Each <filename>repmgr.conf</filename> file must contain the following parameters:
</para>

View File

@@ -1,4 +1,4 @@
<sect1 id="configuration-file-service-commands" xreflabel="service command settings">
<sect1 id="configuration-service-commands" xreflabel="service command settings">
<indexterm>
<primary>repmgr.conf</primary>
<secondary>service command settings</secondary>
@@ -25,7 +25,7 @@
<note>
<para>
If using <application>systemd</application>, ensure you have <varname>RemoteIPC</varname> set to <literal>off</literal>.
If using <application>systemd</application>, ensure you have <varname>RemoveIPC</varname> set to <literal>off</literal>.
See the <ulink url="https://wiki.postgresql.org/wiki/Systemd">systemd</ulink>
entry in the <ulink url="https://wiki.postgresql.org/wiki/Main_Page">PostgreSQL wiki</ulink> for details.
</para>
@@ -47,14 +47,6 @@
service_restart_command
service_reload_command</programlisting>
</para>
<note>
<para>
&repmgr; will not apply <option>pg_bindir</option> when executing any of these commands;
these can be user-defined scripts so must always be specified with the full path.
</para>
</note>
<note>
<para>
It's also possible to specify a <varname>service_promote_command</varname>.
@@ -64,7 +56,7 @@
</para>
<para>
If your packaging system does not provide such a command, it can be left empty,
and &repmgr; will generate the appropriate `pg_ctl ... promote` command.
and &repmgr; will generate the appropriate <command>pg_ctl ... promote</command> command.
</para>
<para>
Do not confuse this with <varname>promote_command</varname>, which is used
@@ -72,6 +64,7 @@
</para>
</note>
<para>
To confirm which command &repmgr; will execute for each action, use
<command>repmgr node service --list --action=...</command>, e.g.:

View File

@@ -2,17 +2,16 @@
<title>repmgr configuration</title>
&configuration-file;
&configuration-file-required-settings;
&configuration-file-log-settings;
&configuration-file-service-commands;
&configuration-file-settings;
&configuration-service-commands;
<sect1 id="configuration-permissions" xreflabel="Database user permissions">
<sect1 id="configuration-permissions" xreflabel="User permissions">
<indexterm>
<primary>configuration</primary>
<secondary>database user permissions</secondary>
<secondary>user permissions</secondary>
</indexterm>
<title>repmgr database user permissions</title>
<title>repmgr user permissions</title>
<para>
&repmgr; will create an extension database containing objects
for administering &repmgr; metadata. The user defined in the <varname>conninfo</varname>

View File

@@ -16,22 +16,15 @@
<para>
A typical use case for a witness server is a two-node streaming replication
setup, where the primary and standby are in different locations (data centres).
By creating a witness server in the same location (data centre) as the primary,
if the primary becomes unavailable it's possible for the standby to decide whether
it can promote itself without risking a "split brain" scenario: if it can't see either the
By creating a witness server in the same location as the primary, if the primary
becomes unavailable it's possible for the standby to decide whether it can
promote itself without risking a "split brain" scenario: if it can't see either the
witness or the primary server, it's likely there's a network-level interruption
and it should not promote itself. If it can seen the witness but not the primary,
this proves there is no network interruption and the primary itself is unavailable,
and it can therefore promote itself (and ideally take action to fence the
former primary).
</para>
<note>
<para>
<emphasis>Never</emphasis> install a witness server on the same physical host
as another node in the replication cluster managed by &repmgr; - it's essential
the witness is not affected in any way by failure of another node.
</para>
</note>
<para>
For more complex replication scenarios,e.g. with multiple datacentres, it may
be preferable to use location-based failover, which ensures that only nodes

View File

@@ -147,104 +147,34 @@
<para>
By default, all notification types will be passed to the designated script;
the notification types can be filtered to explicitly named ones using the
<varname>event_notifications</varname> parameter.
</para>
<para>
Events generated by the &repmgr; command:
<varname>event_notifications</varname> parameter:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><literal><link linkend="repmgr-primary-register-events">cluster_created</link></literal></simpara>
<simpara><literal>primary_register</literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-primary-register-events">primary_register</link></literal></simpara>
<simpara><literal>primary_unregister</literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-primary-unregister-events">primary_unregister</link></literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-standby-clone-events">standby_clone</link></literal></simpara>
<simpara><literal>standby_register</literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-standby-register-events">standby_register</link></literal></simpara>
<simpara><literal>standby_register_sync</literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-standby-register-events">standby_register_sync</link></literal></simpara>
<simpara><literal>standby_unregister</literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-standby-unregister-events">standby_unregister</link></literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-standby-promote-events">standby_promote</link></literal></simpara>
<simpara><literal>standby_clone</literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-standby-follow-events">standby_follow</link></literal></simpara>
<simpara><literal>standby_promote</literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-standby-switchover-events">standby_switchover</link></literal></simpara>
<simpara><literal>standby_follow</literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-witness-register-events">witness_register</link></literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-witness-unregister-events">witness_unregister</link></literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-node-rejoin-events">node_rejoin</link></literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgr-cluster-cleanup-events">cluster_cleanup</link></literal></simpara>
</listitem>
</itemizedlist>
</para>
<para>
Events generated by <application>repmgrd</application> (streaming replication mode):
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><literal>repmgrd_start</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_shutdown</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_reload</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_failover_promote</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_failover_follow</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_failover_aborted</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_standby_reconnect</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_promote_error</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_local_disconnect</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_local_reconnect</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_upstream_disconnect</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_upstream_reconnect</literal></simpara>
</listitem>
<listitem>
<simpara><literal>standby_disconnect_manual</literal></simpara>
</listitem>
@@ -254,13 +184,39 @@
<listitem>
<simpara><literal>standby_recovery</literal></simpara>
</listitem>
</itemizedlist>
</para>
<para>
Events generated by <application>repmgrd</application> (BDR mode):
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><literal>witness_register</literal></simpara>
</listitem>
<listitem>
<simpara><literal>witness_unregister</literal></simpara>
</listitem>
<listitem>
<simpara><literal>node_rejoin</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_start</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_shutdown</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_failover_promote</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_failover_follow</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_failover_aborted</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_upstream_disconnect</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_upstream_reconnect</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_promote_error</literal></simpara>
</listitem>
<listitem>
<simpara><literal>bdr_failover</literal></simpara>
</listitem>

View File

@@ -38,9 +38,8 @@
<!ENTITY quickstart SYSTEM "quickstart.sgml">
<!ENTITY configuration SYSTEM "configuration.sgml">
<!ENTITY configuration-file SYSTEM "configuration-file.sgml">
<!ENTITY configuration-file-required-settings SYSTEM "configuration-file-required-settings.sgml">
<!ENTITY configuration-file-log-settings SYSTEM "configuration-file-log-settings.sgml">
<!ENTITY configuration-file-service-commands SYSTEM "configuration-file-service-commands.sgml">
<!ENTITY configuration-file-settings SYSTEM "configuration-file-settings.sgml">
<!ENTITY configuration-service-commands SYSTEM "configuration-service-commands.sgml">
<!ENTITY cloning-standbys SYSTEM "cloning-standbys.sgml">
<!ENTITY promoting-standby SYSTEM "promoting-standby.sgml">
<!ENTITY follow-new-primary SYSTEM "follow-new-primary.sgml">

View File

@@ -16,7 +16,7 @@
<para>
&repmgr; RPM packages for RedHat/CentOS variants and Fedora are available from the
<ulink url="https://2ndquadrant.com">2ndQuadrant</ulink>
<ulink url="https://dl.2ndquadrant.com/">public repository</ulink>; see following
<ulink url="https://rpm.2ndquadrant.com/">public RPM repository</ulink>; see following
section for details.
</para>
<para>
@@ -38,7 +38,7 @@
<para>
For more information on the package contents, including details of installation
paths and relevant <link linkend="configuration-file-service-commands">service commands</link>,
paths and relevant <link linkend="configuration-service-commands">service commands</link>,
see the appendix section <xref linkend="packages-centos">.
</para>
@@ -46,15 +46,26 @@
<sect3 id="installation-packages-redhat-2ndq">
<title>2ndQuadrant public RPM yum repository</title>
<note>
<para>
<ulink url="https://2ndquadrant.com">2ndQuadrant</ulink> previously provided a dedicated
&repmgr; repository at
<ulink url="http://packages.2ndquadrant.com/repmgr/">http://packages.2ndquadrant.com/repmgr/</ulink>.
This repository will be deprecated in a future release as it is now replaced by
the <ulink url="https://rpm.2ndquadrant.com/">public RPM repository</ulink>
documented below.
</para>
</note>
<para>
Beginning with <ulink url="https://repmgr.org/docs/4.0/release-4.0.5.html">repmgr 4.0.5</ulink>,
<ulink url="https://2ndquadrant.com/">2ndQuadrant</ulink> provides a dedicated <literal>yum</literal>
<ulink url="https://dl.2ndquadrant.com/">public repository</ulink> for 2ndQuadrant software,
<ulink url="https://rpm.2ndquadrant.com/">public RPM repository</ulink> for 2ndQuadrant software,
including &repmgr;. We recommend using this for all future &repmgr; releases.
</para>
<para>
General instructions for using this repository can be found on its
<ulink url="https://dl.2ndquadrant.com/">homepage</ulink>. Specific instructions
<ulink url="https://rpm.2ndquadrant.com/">homepage</ulink>. Specific instructions
for installing &repmgr; follow below.
</para>
<para>
@@ -64,19 +75,20 @@
<listitem>
<para>
Locate the repository RPM for your PostgreSQL version from the list at:
<ulink url="https://dl.2ndquadrant.com/">https://dl.2ndquadrant.com/</ulink>
<ulink url="https://rpm.2ndquadrant.com/">https://rpm.2ndquadrant.com/</ulink>
</para>
</listitem>
<listitem>
<para>
Install the repository definition for your distribution and PostgreSQL version
Install the repository RPM for your distribution and PostgreSQL version
(this enables the 2ndQuadrant repository as a source of &repmgr; packages).
</para>
<para>
For example, for PostgreSQL 10 on CentOS, execute:
<programlisting>
curl https://dl.2ndquadrant.com/default/release/get/10/rpm | sudo bash</programlisting>
sudo yum install https://rpm.2ndquadrant.com/site/content/2ndquadrant-repo-10-1-1.el7.noarch.rpm
</programlisting>
</para>
<para>
Verify that the repository is installed with:
@@ -84,8 +96,8 @@ curl https://dl.2ndquadrant.com/default/release/get/10/rpm | sudo bash</programl
sudo yum repolist</programlisting>
The output should contain two entries like this:
<programlisting>
2ndquadrant-dl-default-release-pg10/7/x86_64 2ndQuadrant packages (PG10) for 7 - x86_64 4
2ndquadrant-dl-default-release-pg10-debug/7/x86_64 2ndQuadrant packages (PG10) for 7 - x86_64 - Debug 3</programlisting>
2ndquadrant-repo-10/7/x86_64 2ndQuadrant packages for PG10 for rhel 7 - x86_64 1
2ndquadrant-repo-10-debug/7/x86_64 2ndQuadrant packages for PG10 for rhel 7 - x86_64 - Debug 1</programlisting>
</para>
</listitem>
@@ -155,7 +167,7 @@ $ yum install repmgr10</programlisting>
</para>
<para>
For more information on the package contents, including details of installation
paths and relevant <link linkend="configuration-file-service-commands">service commands</link>,
paths and relevant <link linkend="configuration-service-commands">service commands</link>,
see the appendix section <xref linkend="packages-debian-ubuntu">.
</para>
@@ -165,43 +177,52 @@ $ yum install repmgr10</programlisting>
<para>
Beginning with <ulink url="https://repmgr.org/docs/4.0/release-4.0.5.html">repmgr 4.0.5</ulink>,
<ulink url="https://2ndquadrant.com/">2ndQuadrant</ulink> provides a
<ulink url="https://dl.2ndquadrant.com/">public apt repository</ulink> for 2ndQuadrant software,
<ulink url="https://apt.2ndquadrant.com/">public apt repository</ulink> for 2ndQuadrant software,
including &repmgr;.
</para>
<para>
General instructions for using this repository can be found on its
<ulink url="https://dl.2ndquadrant.com/">homepage</ulink>. Specific instructions
<ulink url="https://apt.2ndquadrant.com/">homepage</ulink>. Specific instructions
for installing &repmgr; follow below.
</para>
<para>
<emphasis>Installation</emphasis>
<itemizedlist>
<listitem>
<para>
Install the repository definition for your distribution and PostgreSQL version
(this enables the 2ndQuadrant repository as a source of &repmgr; packages) by executing:
<programlisting>
curl https://dl.2ndquadrant.com/default/release/get/deb | sudo bash</programlisting>
If not already present, install the <application>apt-transport-https</application> package:
<programlisting>
sudo apt-get install apt-transport-https</programlisting>
</para>
<note>
<para>
This will automatically install the following additional packages, if not already present:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><literal>lsb-release</literal></simpara>
</listitem>
<listitem>
<simpara><literal>apt-transport-https</literal></simpara>
</listitem>
</itemizedlist>
</para>
</note>
</listitem>
</listitem>
<listitem>
<para>
Create <filename>/etc/apt/sources.list.d/2ndquadrant.list</filename> as follows:
<programlisting>
sudo sh -c 'echo "deb https://apt.2ndquadrant.com/ $(lsb_release -cs)-2ndquadrant main" > /etc/apt/sources.list.d/2ndquadrant.list'</programlisting>
</para>
</listitem>
<listitem>
<para>
Install the 2ndQuadrant <ulink url="https://apt.2ndquadrant.com/site/keys/9904CD4BD6BAF0C3.asc">repository key</ulink>:
<programlisting>
sudo apt-get install curl ca-certificates
curl https://apt.2ndquadrant.com/site/keys/9904CD4BD6BAF0C3.asc | sudo apt-key add -</programlisting>
</para>
</listitem>
<listitem>
<para>
Update the package list
<programlisting>
sudo apt-get update</programlisting>
</para>
</listitem>
<listitem>
<para>

View File

@@ -12,8 +12,8 @@
To install &repmgr; the prerequisites for compiling
&postgres; must be installed. These are described in &postgres;'s
documentation
on <ulink url="https://www.postgresql.org/docs/current/static/install-requirements.html">build requirements</ulink>
and <ulink url="https://www.postgresql.org/docs/current/static/docguide-toolsets.html">build requirements for documentation</ulink>.
on <ulink url="https://www.postgresql.org/docs/current/install-requirements.html">build requirements</ulink>
and <ulink url="https://www.postgresql.org/docs/current/docguide-toolsets.html">build requirements for documentation</ulink>.
</para>
<para>

View File

@@ -234,34 +234,17 @@
<para>
<filename>repmgr.conf</filename> should not be stored inside the PostgreSQL data directory,
as it could be overwritten when setting up or reinitialising the PostgreSQL
server. See sections <xref linkend="configuration"> and <xref linkend="configuration-file">
server. See sections on <xref linkend="configuration-file"> and <xref linkend="configuration-file-settings">
for further details about <filename>repmgr.conf</filename>.
</para>
<tip>
<simpara>
For Debian-based distributions we recommend explictly setting
<option>pg_bindir</option> to the directory where <command>pg_ctl</command> and other binaries
<literal>pg_bindir</literal> to the directory where <command>pg_ctl</command> and other binaries
not in the standard path are located. For PostgreSQL 9.6 this would be <filename>/usr/lib/postgresql/9.6/bin/</filename>.
</simpara>
</tip>
<note>
<para>
&repmgr; only uses <option>pg_bindir</option> when it executes
PostgreSQL binaries directly.
</para>
<para>
For user-defined scripts such as <option>promote_command</option> and the
various <option>service_*_command</option>s, you <emphasis>must</emphasis>
always explicitly provide the full path to the binary or script being
executed, even if it is &repmgr; itself.
</para>
<para>
This is because these options can contain user-defined scripts in arbitrary
locations, so prepending <option>pg_bindir</option> may break them.
</para>
</note>
<para>
See the file
<ulink url="https://raw.githubusercontent.com/2ndQuadrant/repmgr/master/repmgr.conf.sample">repmgr.conf.sample</>

View File

@@ -15,14 +15,9 @@
<title>Description</title>
<para>
Purges monitoring history from the <literal>repmgr.monitoring_history</literal> table to
prevent excessive table growth.
</para>
<para>
By default <emphasis>all</emphasis> data will be removed; Use the <option>-k/--keep-history</option>
option to specify the number of days of monitoring history to retain.
</para>
<para>
This command can be executed manually or as a cronjob.
prevent excessive table growth. Use the <literal>-k/--keep-history</literal> to specify the
number of days of monitoring history to retain. This command can be used
manually or as a cronjob.
</para>
</refsect1>
@@ -43,21 +38,4 @@
<filename>repmgr.conf</filename>.
</para>
</refsect1>
<refsect1 id="repmgr-cluster-cleanup-events">
<title>Event notifications</title>
<para>
A <literal>cluster_cleanup</literal> <link linkend="event-notifications">event notification</link> will be generated.
</para>
</refsect1>
<refsect1>
<title>See also</title>
<para>
For more details see the sections <xref linkend="repmgrd-monitoring"> and
<xref linkend="repmgrd-monitoring-configuration">.
</para>
</refsect1>
</refentry>

View File

@@ -56,7 +56,7 @@
</varlistentry>
<varlistentry>
<term><option>ERR_NODE_STATUS (25)</option></term>
<term><option>ERR_CLUSTER_CHECK (25)</option></term>
<listitem>
<para>
One or more nodes could not be reached.

View File

@@ -49,22 +49,6 @@
</para>
</refsect1>
<refsect1>
<title>Output format</title>
<para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<literal>--csv</literal>: generate output in CSV format. Note that the <literal>Details</literal>
column will currently not be emitted in CSV format.
</simpara>
</listitem>
</itemizedlist>
</para>
</refsect1>
<refsect1>
<title>Example</title>
<para>

View File

@@ -116,7 +116,7 @@
</varlistentry>
<varlistentry>
<term><option>ERR_NODE_STATUS (25)</option></term>
<term><option>ERR_CLUSTER_CHECK (25)</option></term>
<listitem>
<para>
One or more nodes could not be reached.

View File

@@ -113,40 +113,4 @@
</para>
</refsect1>
<refsect1>
<title>Exit codes</title>
<para>
Following exit codes can be emitted by <command>repmgr cluster show</command>:
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
No issues were detected.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_NODE_STATUS (25)</option></term>
<listitem>
<para>
One or more issues were detected.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-node-status">, <xref linkend="repmgr-node-check">
</para>
</refsect1>
</refentry>

View File

@@ -61,9 +61,7 @@
<listitem>
<simpara>
<literal>--archive-ready</literal>: checks for WAL files which have not yet been archived,
and returns <literal>WARNING</literal> or <literal>CRITICAL</literal> if the number
exceeds <varname>archive_ready_warning</varname> or <varname>archive_ready_critical</varname> respectively.
<literal>--archive-ready</literal>: checks for WAL files which have not yet been archived
</simpara>
</listitem>
@@ -79,12 +77,6 @@
</simpara>
</listitem>
<listitem>
<simpara>
<literal>--missing-slots</literal>: checks there are no missing replication slots
</simpara>
</listitem>
</itemizedlist>
</para>
</refsect1>
@@ -109,80 +101,4 @@
</itemizedlist>
</para>
</refsect1>
<refsect1>
<title>Exit codes</title>
<para>
When executing <command>repmgr node check</command> with one of the individual
checks listed above, &repmgr; will emit one of the following Nagios-style exit codes
(even if <literal>--nagios</literal> is not supplied):
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<literal>0</literal>: OK
</simpara>
</listitem>
<listitem>
<simpara>
<literal>1</literal>: WARNING
</simpara>
</listitem>
<listitem>
<simpara>
<literal>2</literal>: ERROR
</simpara>
</listitem>
<listitem>
<simpara>
<literal>3</literal>: UNKNOWN
</simpara>
</listitem>
</itemizedlist>
</para>
<para>
Following exit codes can be emitted by <command>repmgr status check</command>
if no individual check was specified.
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
No issues were detected.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_NODE_STATUS (25)</option></term>
<listitem>
<para>
One or more issues were detected.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-node-status">, <xref linkend="repmgr-cluster-show">
</para>
</refsect1>
</refentry>

View File

@@ -28,10 +28,6 @@
If the node is running and needs to be attached to the current primary, use
<xref linkend="repmgr-standby-follow">.
</para>
<para>
Note <xref linkend="repmgr-standby-follow"> can only be used for standbys which have not diverged
from the rest of the cluster.
</para>
</tip>
</refsect1>
@@ -119,26 +115,8 @@
</variablelist>
</refsect1>
<refsect1>
<title>Configuration file settings</title>
<para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<literal>node_rejoin_timeout</literal>:
the maximum length of time (in seconds) to wait for
the node to reconnect to the replication cluster (defaults to
the value set in <literal>standby_reconnect_timeout</literal>,
60 seconds).
</simpara>
</listitem>
</itemizedlist>
</para>
</refsect1>
<refsect1 id="repmgr-node-rejoin-events">
<title>Event notifications</title>
<para>
A <literal>node_rejoin</literal> <link linkend="event-notifications">event notification</link> will be generated.

View File

@@ -52,40 +52,10 @@
</para>
</refsect1>
<refsect1>
<title>Exit codes</title>
<para>
Following exit codes can be emitted by <command>repmgr node status</command>:
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
No issues were detected.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_NODE_STATUS (25)</option></term>
<listitem>
<para>
One or more issues were detected.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>See also</title>
<para>
See <xref linkend="repmgr-node-check"> to diagnose issues and <xref linkend="repmgr-cluster-show">
for an overview of all nodes in the cluster.
See <xref linkend="repmgr-node-check"> to diagnose issues.
</para>
</refsect1>
</refentry>

View File

@@ -17,7 +17,7 @@
<title>Description</title>
<para>
<command>repmgr primary register</command> registers a primary node in a
streaming replication cluster, and configures it for use with &repmgr;, including
streaming replication cluster, and configures it for use with repmgr, including
installing the &repmgr; extension. This command needs to be executed before any
standby nodes are registered.
</para>
@@ -75,18 +75,10 @@
</refsect1>
<refsect1 id="repmgr-primary-register-events">
<refsect1>
<title>Event notifications</title>
<para>
Following <link linkend="event-notifications">event notifications</link> will be generated:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><literal>cluster_created</literal></simpara>
</listitem>
<listitem>
<simpara><literal>primary_register</literal></simpara>
</listitem>
</itemizedlist>
A <literal>primary_register</literal> <link linkend="event-notifications">event notification</link> will be generated.
</para>
</refsect1>

View File

@@ -64,7 +64,7 @@
</refsect1>
<refsect1 id="repmgr-primary-unregister-events">
<refsect1>
<title>Event notifications</title>
<para>
A <literal>primary_unregister</literal> <link linkend="event-notifications">event notification</link> will be generated.

View File

@@ -49,7 +49,7 @@
not be copied by default. &repmgr; can copy these files, either to the same
location on the standby server (provided appropriate directory and file permissions
are available), or into the standby's data directory. This requires passwordless
SSH access to the primary server. Add the option <option>--copy-external-config-files</option>
SSH access to the primary server. Add the option <literal>--copy-external-config-files</literal>
to the <command>repmgr standby clone</command> command; by default files will be copied to
the same path as on the upstream server. Note that the user executing <command>repmgr</command>
must have write access to those directories.
@@ -59,29 +59,12 @@
<literal>--copy-external-config-files=pgdata</literal>, but note that
any include directives in the copied files may need to be updated.
</para>
<note>
<para>
When executing <command>repmgr standby clone</command> with the
<option>--copy-external-config-files</option> aand <option>--dry-run</option>
options, &repmgr; will check the SSH connection to the source node, but
will not verify whether the files can actually be copied.
</para>
<para>
During the actual clone operation, a check will be made before the database itself
is cloned to determine whether the files can actually be copied; if any problems are
encountered, the clone operation will be aborted, enabling the user to fix
any issues before retrying the clone operation.
</para>
</note>
<tip>
<simpara>
For reliable configuration file management we recommend using a
configuration management tool such as Ansible, Chef, Puppet or Salt.
</simpara>
</tip>
</refsect1>
<refsect1 id="repmgr-standby-clone-recovery-conf">
@@ -230,15 +213,6 @@
<variablelist>
<varlistentry>
<term><option>-d, --dbname=CONNINFO</option></term>
<listitem>
<para>
Connection string of the upstream node to use for cloning.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--dry-run</option></term>
<listitem>
@@ -350,7 +324,7 @@
</variablelist>
</refsect1>
<refsect1 id="repmgr-standby-clone-events">
<refsect1>
<title>Event notifications</title>
<para>
A <literal>standby_clone</literal> <link linkend="event-notifications">event notification</link> will be generated.

View File

@@ -94,7 +94,7 @@
</variablelist>
</refsect1>
<refsect1 id="repmgr-standby-follow-events">
<refsect1>
<title>Event notifications</title>
<para>
A <literal>standby_follow</literal> <link linkend="event-notifications">event notification</link> will be generated.

View File

@@ -50,7 +50,7 @@
</refsect1>
<refsect1 id="repmgr-standby-promote-events">
<refsect1>
<title>Event notifications</title>
<para>
A <literal>standby_promote</literal> <link linkend="event-notifications">event notification</link> will be generated.

View File

@@ -159,7 +159,7 @@
</variablelist>
</refsect1>
<refsect1 id="repmgr-standby-register-events">
<refsect1>
<title>Event notifications</title>
<para>
A <literal>standby_register</literal> <link linkend="event-notifications">event notification</link>

View File

@@ -46,9 +46,6 @@
<application>repmgrd</application> should not be active on any nodes while a switchover is being
executed. This restriction may be lifted in a later version.
</para>
<para>
&repmgr; will not perform the switchover if an exclusive backup is running on the current primary.
</para>
</note>
</refsect1>
@@ -166,8 +163,8 @@
<listitem>
<simpara>
<literal>standby_reconnect_timeout</literal>:
number of seconds to attempt to wait for the demoted primary
to reconnect to the promoted primary (default: 60 seconds)
Number of seconds to attempt to reconnect to the demoted primary
once it has been restarted.
</simpara>
</listitem>
@@ -196,7 +193,7 @@
</para>
</refsect1>
<refsect1 id="repmgr-standby-switchover-events">
<refsect1>
<title>Event notifications</title>
<para>
<literal>standby_switchover</literal> and <literal>standby_promote</literal>

View File

@@ -59,7 +59,7 @@
</variablelist>
</refsect1>
<refsect1 id="repmgr-standby-unregister-events">
<refsect1>
<title>Event notifications</title>
<para>
A <literal>standby_unregister</literal> <link linkend="event-notifications">event notification</link> will be generated.

View File

@@ -50,7 +50,7 @@
</refsect1>
<refsect1 id="repmgr-witness-register-events">
<refsect1>
<title>Event notifications</title>
<para>
A <literal>witness_register</literal> <link linkend="event-notifications">event notification</link> will be generated.

View File

@@ -20,10 +20,7 @@
</para>
<para>
The node does not have to be running to be unregistered, however if this is the
case then either provide connection information for the primary server, or
execute <command>repmgr witness unregister</command> on a running node and
provide the parameter <option>--node-id</option> with the node ID of the
witness server.
case then connection information for the primary server must be provided.
</para>
<para>
Execute with the <literal>--dry-run</literal> option to check what would happen
@@ -39,17 +36,17 @@
INFO: connecting to witness node "node3" (ID: 3)
INFO: unregistering witness node 3
INFO: witness unregistration complete
DETAIL: witness node with UD 3 successfully unregistered</programlisting>
DETAIL: witness node with id 3 (conninfo: host=node3 dbname=repmgr user=repmgr port=5499) successfully unregistered</programlisting>
</para>
<para>
Unregistering a non-running witness node:
<programlisting>
$ repmgr -f /etc/repmgr.conf witness unregister -h node1 -p 5501 -F
INFO: connecting to node "node3" (ID: 3)
NOTICE: unable to connect to node "node3" (ID: 3), removing node record on cluster primary only
INFO: connecting to witness node "node3" (ID: 3)
NOTICE: unable to connect to witness node "node3" (ID: 3), removing node record on cluster primary only
INFO: unregistering witness node 3
INFO: witness unregistration complete
DETAIL: witness node with id ID 3 successfully unregistered</programlisting>
DETAIL: witness node with id 3 (conninfo: host=node3 dbname=repmgr user=repmgr port=5499) successfully unregistered</programlisting>
</para>
</refsect1>
@@ -65,34 +62,8 @@
</para>
</refsect1>
<refsect1>
<title>Options</title>
<variablelist>
<varlistentry>
<term><option>--dry-run</option></term>
<listitem>
<para>
Check prerequisites but don't actually unregister the witness.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--node-id</option></term>
<listitem>
<para>
Unregister witness server with the specified node ID.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1 id="repmgr-witness-unregister-events">
<title>Event notifications</title>
<para>
A <literal>witness_unregister</literal> <link linkend="event-notifications">event notification</link> will be generated.

View File

@@ -25,7 +25,13 @@
<para>
This is the official documentation of &repmgr; &repmgrversion; for
use with PostgreSQL 9.3 - PostgreSQL 10.
It describes the functionality supported by the current version of &repmgr;.
</para>
<para>
&repmgr; is being continually developed and we strongly recommend using the
latest version. Please check the
<ulink url="https://repmgr.org/">repmgr website</ulink> for details
about the current &repmgr; version as well as the
<ulink url="https://repmgr.org/docs/current/index.html">current documentation</ulink>.
</para>
<para>

View File

@@ -15,7 +15,7 @@
</para>
<note>
<simpara>
Due to the nature of BDR 1.x/2.x, it's only safe to use this solution for
Due to the nature of BDR, it's only safe to use this solution for
a two-node scenario. Introducing additional nodes will create an inherent
risk of node desynchronisation if a node goes down without being cleanly
removed from the cluster.

View File

@@ -24,7 +24,7 @@
<para>
To use <application>repmgrd</application>, its associated function library <emphasis>must</emphasis> be
included via <filename>postgresql.conf</filename> with:
included in <filename>postgresql.conf</filename> with:
<programlisting>
shared_preload_libraries = 'repmgr'</programlisting>
@@ -52,7 +52,6 @@
running <application>repmgrd</application> daemon.
</para>
<sect2 id="repmgrd-automatic-failover-configuration">
<title>automatic failover configuration</title>
<para>
@@ -64,17 +63,8 @@
follow_command='/usr/bin/repmgr standby follow -f /etc/repmgr.conf --log-to-file --upstream-node-id=%n'</programlisting>
</para>
<para>
Adjust file paths as appropriate; alway specify the full path to the &repmgr; binary.
Adjust file paths as appropriate; we recomment specifying the full path to the &repmgr; binary.
</para>
<note>
<para>
&repmgr; will not apply <option>pg_bindir</option> when executing <option>promote_command</option>
or <option>follow_command</option>; these can be user-defined scripts so must always be
specified with the full path.
</para>
</note>
<para>
Note that the <literal>--log-to-file</literal> option will cause
output generated by the &repmgr; command, when executed by <application>repmgrd</application>,
@@ -140,11 +130,11 @@
particularly on <application>systemd</application>-based systems.
</para>
<para>
For more details, see <xref linkend="configuration-file-service-commands">.
For more details, see <xref linkend="configuration-service-commands">.
</para>
</sect2>
<sect2 id="repmgrd-monitoring-configuration" xreflabel="repmgrd monitoring configuration">
<sect2 id="repmgrd-monitoring-configuration">
<indexterm>
<primary>repmgrd</primary>
<secondary>monitoring configuration</secondary>
@@ -187,63 +177,10 @@
<para>
<application>repmgrd</application> can be started manually like this:
<programlisting>
repmgrd -f /etc/repmgr.conf --pid-file /tmp/repmgrd.pid</programlisting>
repmgrd -f /etc/repmgr.conf --pid-file /tmp/repmgrd.pid --daemonize</programlisting>
and stopped with <command>kill `cat /tmp/repmgrd.pid`</command>. Adjust paths as appropriate.
</para>
<sect2 id="repmgrd-pid-file" xreflabel="repmgrd's PID file">
<indexterm>
<primary>repmgrd</primary>
<secondary>PID file</secondary>
</indexterm>
<indexterm>
<primary>PID file</primary>
<secondary>repmgrd</secondary>
</indexterm>
<title>repmgrd's PID file</title>
<para>
<application>repmgrd</application> will generate a PID file by default.
</para>
<note>
<simpara>
This is a behaviour change from previous versions (earlier than 4.1), where
the PID file had to be explicitly specified with the command line
parameter <option> --pid-file</option>.
</simpara>
</note>
<para>
The PID file can be specified in <filename>repmgr.conf</filename> with the configuration
parameter <varname>repmgrd_pid_file</varname>.
</para>
<para>
It can also be specified on the command line (as in previous versions) with
the command line parameter <option>--pid-file</option>. Note this will override
any value set in <filename>repmgr.conf</filename> with <varname>repmgrd_pid_file</varname>.
<option>--pid-file</option> may be deprecated in future releases.
</para>
<para>
If a PID file location was specified by the package maintainer, <application>repmgrd</application>
will use that. This only applies if &repmgr; was installed from a package and the package
maintainer has specified the PID file location.
</para>
<para>
If none of the above apply, <application>repmgrd</application> will create a PID file
in the operating system's temporary directory (das etermined by the environment variable
<varname>TMPDIR</varname>, or if that is not set, will use <filename>/tmp</filename>).
</para>
<para>
To prevent a PID file being generated at all, provide the command line option
<option>--no-pid-file</option>.
</para>
<para>
To see which PID file <application>repmgrd</application> would use, execute <application>repmgrd</application>
with the option <option>--show-pid-file</option>. <application>repmgrd</application>
will not start if this option is provided. Note that the value shown is the
file <application>repmgrd</application> would use next time it starts, and is
not necessarily the PID file currently in use.
</para>
</sect2>
<sect2 id="repmgrd-configuration-debian-ubuntu">
<indexterm>
<primary>repmgrd</primary>
@@ -332,34 +269,25 @@ REPMGRD_ENABLED=no
<secondary>repmgrd</secondary>
</indexterm>
<indexterm>
<primary>repmgrd</primary>
<secondary>log rotation</secondary>
</indexterm>
<title>repmgrd log rotation</title>
<para>
To ensure the current <application>repmgrd</application> logfile
(specified in <filename>repmgr.conf</filename> with the parameter
<option>log_file</option>) does not grow indefinitely, configure your
<option>log_file</option> does not grow indefinitely, configure your
system's <command>logrotate</command> to regularly rotate it.
</para>
<para>
Sample configuration to rotate logfiles weekly with retention for
up to 52 weeks and rotation forced if a file grows beyond 100Mb:
<programlisting>
/var/log/repmgr/repmgrd.log {
/var/log/postgresql/repmgr-9.6.log {
missingok
compress
rotate 52
maxsize 100M
weekly
create 0600 postgres postgres
postrotate
/usr/bin/killall -HUP repmgrd
endscript
}</programlisting>
</para>
</sect1>
</chapter>

View File

@@ -1,4 +1,4 @@
<chapter id="repmgrd-degraded-monitoring" xreflabel="repmgrd degraded monitoring">
<chapter id="repmgrd-degraded-monitoring">
<indexterm>
<primary>repmgrd</primary>
<secondary>degraded monitoring</secondary>
@@ -7,8 +7,8 @@
<title>"degraded monitoring" mode</title>
<para>
In certain circumstances, <application>repmgrd</application> is not able to fulfill its primary mission
of monitoring the node's upstream server. In these cases it enters &quot;degraded monitoring&quot;
mode, where <application>repmgrd</application> remains active but is waiting for the situation
of monitoring the nodes' upstream server. In these cases it enters "degraded
monitoring" mode, where <application>repmgrd</application> remains active but is waiting for the situation
to be resolved.
</para>
<para>

View File

@@ -1,4 +1,4 @@
<chapter id="repmgrd-monitoring" xreflabel="Monitoring with repmgrd">
<chapter id="repmgrd-monitoring">
<indexterm>
<primary>repmgrd</primary>
<secondary>monitoring</secondary>

View File

@@ -40,8 +40,8 @@
In a failover situation, <application>repmgrd</application> will check if any servers in the
same location as the current primary node are visible. If not, <application>repmgrd</application>
will assume a network interruption and not promote any node in any
other location (it will however enter <link linkend="repmgrd-degraded-monitoring">degraded monitoring</link>
mode until a primary becomes visible).
other location (it will however enter <xref linkend="repmgrd-degraded-monitoring"> mode until
a primary becomes visible).
</para>
</chapter>

View File

@@ -57,14 +57,7 @@
<para>
As mentioned in the previous section, success of the switchover operation depends on
&repmgr; being able to shut down the current primary server quickly and cleanly.
</para>
<para>
Ensure that the promotion candidate has sufficient free walsenders available
(PostgreSQL configuration item <varname>max_wal_senders</varname>), and if replication
slots are in use, at least one free slot is available for the demotion candidate (
PostgreSQL configuration item <varname>max_replication_slots</varname>).
&repmgr; being able to shut down the current primary server quickly and cleanly.
</para>
<para>
@@ -111,7 +104,7 @@
server.
</para>
<para>
For more details, see <xref linkend="configuration-file-service-commands">.
For more details, see <xref linkend="configuration-service-commands">.
</para>
</important>
@@ -128,21 +121,15 @@
</simpara>
</note>
<para>
Check that access from applications is minimalized or preferably blocked
completely, so applications are not unexpectedly interrupted.
Check that access from applications is minimalized or preferably blocked
completely, so applications are not unexpectedly interrupted.
</para>
<note>
<para>
If an exclusive backup is running on the current primary, &repmgr; will not perform the
switchover.
</para>
</note>
<para>
Check there is no significant replication lag on standbys attached to the
current primary.
Check there is no significant replication lag on standbys attached to the
current primary.
</para>
<para>
@@ -160,7 +147,6 @@
</para>
</note>
<para>
Finally, consider executing <command>repmgr standby switchover</command> with the
<literal>--dry-run</literal> option; this will perform any necessary checks and inform you about

View File

@@ -29,18 +29,8 @@
</listitem>
<listitem>
<simpara>
<application>repmgrd</application> (if running) must be restarted.
</simpara>
</listitem>
<listitem>
<simpara>
For major releases, e.g. from <literal>4.0.x</literal> to <literal>4.1</literal>,
execute <command>ALTER EXTENSION repmgr UPDATE</command>
on the primary node in the database where the &repmgr; extension is installed.
</simpara>
<simpara>
This will update the extension metadata and, if necessary, apply
changes to the &repmgr; extension objects.
In the database where the &repmgr; extension is installed, execute
<command>ALTER EXTENSION repmgr UPDATE</command>.
</simpara>
</listitem>
</orderedlist>
@@ -51,6 +41,10 @@
release as they may contain upgrade instructions particular to individual versions.
</para>
<para>
If the <application>repmgrd</application> daemon is in use, we recommend stopping it
before upgrading &repmgr;.
</para>
<para>
Note that it may be necessary to restart the PostgreSQL server if the upgrade contains
changes to the shared object file used by <application>repmgrd</application>; check the

View File

@@ -1 +1 @@
<!ENTITY repmgrversion "4.1.1">
<!ENTITY repmgrversion "4.0.6">

View File

@@ -46,6 +46,6 @@
#define ERR_SWITCHOVER_INCOMPLETE 22
#define ERR_FOLLOW_FAIL 23
#define ERR_REJOIN_FAIL 24
#define ERR_NODE_STATUS 25
#define ERR_CLUSTER_CHECK 25
#endif /* _ERRCODE_H_ */

12
log.c
View File

@@ -42,7 +42,7 @@ _stderr_log_with_level(const char *level_name, int level, const char *fmt, va_li
__attribute__((format(PG_PRINTF_ATTRIBUTE, 3, 0)));
int log_type = REPMGR_STDERR;
int log_level = LOG_INFO;
int log_level = LOG_NOTICE;
int last_log_level = LOG_INFO;
int verbose_logging = false;
int terse_logging = false;
@@ -70,7 +70,7 @@ _stderr_log_with_level(const char *level_name, int level, const char *fmt, va_li
/*
* Store the requested level so that if there's a subsequent log_hint() or
* log_detail(), we can suppress that if --terse was specified,
* log_detail(), we can suppress that if appropriate.
*/
last_log_level = level;
@@ -329,13 +329,6 @@ logger_set_terse(void)
}
void
logger_set_level(int new_log_level)
{
log_level = new_log_level;
}
void
logger_set_min_level(int min_log_level)
{
@@ -343,7 +336,6 @@ logger_set_min_level(int min_log_level)
log_level = min_log_level;
}
int
detect_log_level(const char *level)
{

1
log.h
View File

@@ -129,7 +129,6 @@ bool logger_shutdown(void);
void logger_set_verbose(void);
void logger_set_terse(void);
void logger_set_min_level(int min_log_level);
void logger_set_level(int new_log_level);
void
log_detail(const char *fmt,...)

View File

@@ -1,2 +0,0 @@
-- complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION repmgr" to load this file. \quit

View File

@@ -1,167 +0,0 @@
-- complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION repmgr" to load this file. \quit
CREATE TABLE repmgr.nodes (
node_id INTEGER PRIMARY KEY,
upstream_node_id INTEGER NULL REFERENCES nodes (node_id) DEFERRABLE,
active BOOLEAN NOT NULL DEFAULT TRUE,
node_name TEXT NOT NULL,
type TEXT NOT NULL CHECK (type IN('primary','standby','witness','bdr')),
location TEXT NOT NULL DEFAULT 'default',
priority INT NOT NULL DEFAULT 100,
conninfo TEXT NOT NULL,
repluser VARCHAR(63) NOT NULL,
slot_name TEXT NULL,
config_file TEXT NOT NULL
);
CREATE TABLE repmgr.events (
node_id INTEGER NOT NULL,
event TEXT NOT NULL,
successful BOOLEAN NOT NULL DEFAULT TRUE,
event_timestamp TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
details TEXT NULL
);
DO $repmgr$
DECLARE
DECLARE server_version_num INT;
BEGIN
SELECT setting
FROM pg_catalog.pg_settings
WHERE name = 'server_version_num'
INTO server_version_num;
IF server_version_num >= 90400 THEN
EXECUTE $repmgr_func$
CREATE TABLE repmgr.monitoring_history (
primary_node_id INTEGER NOT NULL,
standby_node_id INTEGER NOT NULL,
last_monitor_time TIMESTAMP WITH TIME ZONE NOT NULL,
last_apply_time TIMESTAMP WITH TIME ZONE,
last_wal_primary_location PG_LSN NOT NULL,
last_wal_standby_location PG_LSN,
replication_lag BIGINT NOT NULL,
apply_lag BIGINT NOT NULL
)
$repmgr_func$;
ELSE
EXECUTE $repmgr_func$
CREATE TABLE repmgr.monitoring_history (
primary_node_id INTEGER NOT NULL,
standby_node_id INTEGER NOT NULL,
last_monitor_time TIMESTAMP WITH TIME ZONE NOT NULL,
last_apply_time TIMESTAMP WITH TIME ZONE,
last_wal_primary_location TEXT NOT NULL,
last_wal_standby_location TEXT,
replication_lag BIGINT NOT NULL,
apply_lag BIGINT NOT NULL
)
$repmgr_func$;
END IF;
END$repmgr$;
CREATE INDEX idx_monitoring_history_time
ON repmgr.monitoring_history (last_monitor_time, standby_node_id);
CREATE VIEW repmgr.show_nodes AS
SELECT n.node_id,
n.node_name,
n.active,
n.upstream_node_id,
un.node_name AS upstream_node_name,
n.type,
n.priority,
n.conninfo
FROM repmgr.nodes n
LEFT JOIN repmgr.nodes un
ON un.node_id = n.upstream_node_id;
/* XXX update upgrade scripts! */
CREATE TABLE repmgr.voting_term (
term INT NOT NULL
);
CREATE UNIQUE INDEX voting_term_restrict
ON repmgr.voting_term ((TRUE));
CREATE RULE voting_term_delete AS
ON DELETE TO repmgr.voting_term
DO INSTEAD NOTHING;
/* ================= */
/* repmgrd functions */
/* ================= */
/* monitoring functions */
CREATE FUNCTION set_local_node_id(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'set_local_node_id'
LANGUAGE C STRICT;
CREATE FUNCTION get_local_node_id()
RETURNS INT
AS 'MODULE_PATHNAME', 'get_local_node_id'
LANGUAGE C STRICT;
CREATE FUNCTION standby_set_last_updated()
RETURNS TIMESTAMP WITH TIME ZONE
AS 'MODULE_PATHNAME', 'standby_set_last_updated'
LANGUAGE C STRICT;
CREATE FUNCTION standby_get_last_updated()
RETURNS TIMESTAMP WITH TIME ZONE
AS 'MODULE_PATHNAME', 'standby_get_last_updated'
LANGUAGE C STRICT;
/* failover functions */
CREATE FUNCTION notify_follow_primary(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'notify_follow_primary'
LANGUAGE C STRICT;
CREATE FUNCTION get_new_primary()
RETURNS INT
AS 'MODULE_PATHNAME', 'get_new_primary'
LANGUAGE C STRICT;
CREATE FUNCTION reset_voting_status()
RETURNS VOID
AS 'MODULE_PATHNAME', 'reset_voting_status'
LANGUAGE C STRICT;
CREATE FUNCTION am_bdr_failover_handler(INT)
RETURNS BOOL
AS 'MODULE_PATHNAME', 'am_bdr_failover_handler'
LANGUAGE C STRICT;
CREATE FUNCTION unset_bdr_failover_handler()
RETURNS VOID
AS 'MODULE_PATHNAME', 'unset_bdr_failover_handler'
LANGUAGE C STRICT;
CREATE VIEW repmgr.replication_status AS
SELECT m.primary_node_id, m.standby_node_id, n.node_name AS standby_name,
n.type AS node_type, n.active, last_monitor_time,
CASE WHEN n.type='standby' THEN m.last_wal_primary_location ELSE NULL END AS last_wal_primary_location,
m.last_wal_standby_location,
CASE WHEN n.type='standby' THEN pg_catalog.pg_size_pretty(m.replication_lag) ELSE NULL END AS replication_lag,
CASE WHEN n.type='standby' THEN
CASE WHEN replication_lag > 0 THEN age(now(), m.last_apply_time) ELSE '0'::INTERVAL END
ELSE NULL
END AS replication_time_lag,
CASE WHEN n.type='standby' THEN pg_catalog.pg_size_pretty(m.apply_lag) ELSE NULL END AS apply_lag,
AGE(NOW(), CASE WHEN pg_catalog.pg_is_in_recovery() THEN repmgr.standby_get_last_updated() ELSE m.last_monitor_time END) AS communication_time_lag
FROM repmgr.monitoring_history m
JOIN repmgr.nodes n ON m.standby_node_id = n.node_id
WHERE (m.standby_node_id, m.last_monitor_time) IN (
SELECT m1.standby_node_id, MAX(m1.last_monitor_time)
FROM repmgr.monitoring_history m1 GROUP BY 1
);

View File

@@ -83,10 +83,9 @@ do_bdr_register(void)
exit(ERR_BAD_CONFIG);
}
/* BDR 2 implementation is for 2 nodes only */
if (get_bdr_version_num() < 3 && bdr_nodes.node_count > 2)
if (bdr_nodes.node_count > 2)
{
log_error(_("repmgr can only support BDR 2.x clusters with 2 nodes"));
log_error(_("repmgr can only support BDR clusters with 2 nodes"));
log_detail(_("this BDR cluster has %i nodes"), bdr_nodes.node_count);
PQfinish(conn);
pfree(dbname);
@@ -177,7 +176,6 @@ do_bdr_register(void)
if (bdr_node_has_repmgr_set(conn, config_file_options.node_name) == false)
{
log_debug("bdr_node_has_repmgr_set() = false");
bdr_node_set_repmgr_set(conn, config_file_options.node_name);
}
@@ -203,7 +201,6 @@ do_bdr_register(void)
if (bdr_nodes.node_count == 0)
{
log_error(_("unable to retrieve any BDR node records"));
log_detail("%s", PQerrorMessage(conn));
PQfinish(conn);
exit(ERR_BAD_CONFIG);
}
@@ -255,35 +252,7 @@ do_bdr_register(void)
}
/* Add the repmgr extension tables to a replication set */
if (get_bdr_version_num() < 3)
{
add_extension_tables_to_bdr_replication_set(conn);
}
else
{
/* this is the only table we need to replicate */
char *replication_set = get_default_bdr_replication_set(conn);
/*
* this probably won't happen, but we need to be sure we're using
* the replication set metadata correctly...
*/
if (conn == NULL)
{
log_error(_("unable to retrieve default BDR replication set"));
log_hint(_("see preceding messages"));
log_debug("check query in get_default_bdr_replication_set()");
exit(ERR_BAD_CONFIG);
}
if (is_table_in_bdr_replication_set(conn, "nodes", replication_set) == false)
{
add_table_to_bdr_replication_set(conn, "nodes", replication_set);
}
pfree(replication_set);
}
add_extension_tables_to_bdr_replication_set(conn);
initPQExpBuffer(&event_details);

View File

@@ -83,7 +83,6 @@ do_cluster_show(void)
int i = 0;
ItemList warnings = {NULL, NULL};
bool success = false;
bool error_found = false;
/* Connect to local database to obtain cluster connection data */
log_verbose(LOG_INFO, _("connecting to database"));
@@ -219,7 +218,6 @@ do_cluster_show(void)
else
{
appendPQExpBuffer(&details, "- failed");
error_found = true;
}
}
}
@@ -283,7 +281,6 @@ do_cluster_show(void)
else
{
appendPQExpBuffer(&details, "- failed");
error_found = true;
}
}
}
@@ -295,27 +292,17 @@ do_cluster_show(void)
if (cell->node_info->node_status == NODE_STATUS_UP)
{
if (cell->node_info->active == true)
{
appendPQExpBuffer(&details, "* running");
}
else
{
appendPQExpBuffer(&details, "! running");
error_found = true;
}
}
/* node is unreachable */
else
{
if (cell->node_info->active == true)
{
appendPQExpBuffer(&details, "? unreachable");
}
else
{
appendPQExpBuffer(&details, "- failed");
error_found = true;
}
}
}
break;
@@ -323,7 +310,6 @@ do_cluster_show(void)
{
/* this should never happen */
appendPQExpBuffer(&details, "? unknown node type");
error_found = true;
}
break;
}
@@ -428,6 +414,7 @@ do_cluster_show(void)
PQfinish(conn);
/* emit any warnings */
if (warnings.head != NULL && runtime_options.terse == false && runtime_options.output_mode != OM_CSV)
{
ItemListCell *cell = NULL;
@@ -438,20 +425,6 @@ do_cluster_show(void)
printf(_(" - %s\n"), cell->string);
}
}
/*
* If warnings were noted, even if they're not displayed (e.g. in --csv node),
* that means something's not right so we need to emit a non-zero exit code.
*/
if (warnings.head != NULL)
{
error_found = true;
}
if (error_found == true)
{
exit(ERR_NODE_STATUS);
}
}
@@ -463,7 +436,6 @@ do_cluster_show(void)
* --all
* --node-[id|name]
* --event
* --csv
*/
void
@@ -508,12 +480,8 @@ do_cluster_event(void)
strncpy(headers_event[EV_TIMESTAMP].title, _("Timestamp"), MAXLEN);
strncpy(headers_event[EV_DETAILS].title, _("Details"), MAXLEN);
/*
* If --terse or --csv provided, simply omit the "Details" column.
* In --csv mode we'd need to quote/escape the contents "Details" column,
* which is doable but which will remain a TODO for now.
*/
if (runtime_options.terse == true || runtime_options.output_mode == OM_CSV)
/* if --terse provided, simply omit the "Details" column */
if (runtime_options.terse == true)
column_count --;
for (i = 0; i < column_count; i++)
@@ -536,64 +504,47 @@ do_cluster_event(void)
}
if (runtime_options.output_mode == OM_TEXT)
for (i = 0; i < column_count; i++)
{
for (i = 0; i < column_count; i++)
{
if (i == 0)
printf(" ");
else
printf(" | ");
if (i == 0)
printf(" ");
else
printf(" | ");
printf("%-*s",
headers_event[i].max_length,
headers_event[i].title);
}
printf("\n");
printf("-");
for (i = 0; i < column_count; i++)
{
int j;
for (j = 0; j < headers_event[i].max_length; j++)
printf("-");
if (i < (column_count - 1))
printf("-+-");
else
printf("-");
}
printf("\n");
printf("%-*s",
headers_event[i].max_length,
headers_event[i].title);
}
printf("\n");
printf("-");
for (i = 0; i < column_count; i++)
{
int j;
for (j = 0; j < headers_event[i].max_length; j++)
printf("-");
if (i < (column_count - 1))
printf("-+-");
else
printf("-");
}
printf("\n");
for (i = 0; i < PQntuples(res); i++)
{
int j;
if (runtime_options.output_mode == OM_CSV)
printf(" ");
for (j = 0; j < column_count; j++)
{
for (j = 0; j < column_count; j++)
{
printf("%s", PQgetvalue(res, i, j));
if ((j + 1) < column_count)
{
printf(",");
}
}
}
else
{
printf(" ");
for (j = 0; j < column_count; j++)
{
printf("%-*s",
headers_event[j].max_length,
PQgetvalue(res, i, j));
printf("%-*s",
headers_event[j].max_length,
PQgetvalue(res, i, j));
if (j < (column_count - 1))
printf(" | ");
}
if (j < (column_count - 1))
printf(" | ");
}
printf("\n");
@@ -603,8 +554,7 @@ do_cluster_event(void)
PQfinish(conn);
if (runtime_options.output_mode == OM_TEXT)
puts("");
puts("");
}
@@ -746,7 +696,7 @@ do_cluster_crosscheck(void)
if (error_found == true)
{
exit(ERR_NODE_STATUS);
exit(ERR_CLUSTER_CHECK);
}
}
@@ -836,7 +786,7 @@ do_cluster_matrix()
if (error_found == true)
{
exit(ERR_NODE_STATUS);
exit(ERR_CLUSTER_CHECK);
}
}
@@ -1332,7 +1282,6 @@ do_cluster_cleanup(void)
PGconn *conn = NULL;
PGconn *primary_conn = NULL;
int entries_to_delete = 0;
PQExpBufferData event_details;
conn = establish_db_connection(config_file_options.conninfo, true);
@@ -1346,13 +1295,7 @@ do_cluster_cleanup(void)
entries_to_delete = get_number_of_monitoring_records_to_delete(primary_conn, runtime_options.keep_history);
if (entries_to_delete < 0)
{
log_error(_("unable to query number of monitoring records to clean up"));
PQfinish(primary_conn);
exit(ERR_DB_QUERY);
}
else if (entries_to_delete == 0)
if (entries_to_delete == 0)
{
log_info(_("no monitoring records to delete"));
PQfinish(primary_conn);
@@ -1362,23 +1305,10 @@ do_cluster_cleanup(void)
log_debug("at least %i monitoring records for deletion",
entries_to_delete);
initPQExpBuffer(&event_details);
if (delete_monitoring_records(primary_conn, runtime_options.keep_history) == false)
{
appendPQExpBuffer(&event_details,
_("unable to delete monitoring records"));
log_error("%s", event_details.data);
log_error(_("unable to delete monitoring records"));
log_detail("%s", PQerrorMessage(primary_conn));
create_event_notification(primary_conn,
&config_file_options,
config_file_options.node_id,
"cluster_cleanup",
false,
event_details.data);
PQfinish(primary_conn);
exit(ERR_DB_QUERY);
}
@@ -1390,22 +1320,7 @@ do_cluster_cleanup(void)
log_detail("%s", PQerrorMessage(primary_conn));
}
appendPQExpBuffer(&event_details,
_("monitoring records deleted"));
if (runtime_options.keep_history > 0)
appendPQExpBuffer(&event_details,
_("; records newer than %i day(s) retained"),
runtime_options.keep_history);
create_event_notification(primary_conn,
&config_file_options,
config_file_options.node_id,
"cluster_cleanup",
true,
event_details.data);
termPQExpBuffer(&event_details);
PQfinish(primary_conn);
if (runtime_options.keep_history > 0)
@@ -1432,7 +1347,6 @@ do_cluster_help(void)
printf(_(" %s [OPTIONS] cluster matrix\n"), progname());
printf(_(" %s [OPTIONS] cluster crosscheck\n"), progname());
printf(_(" %s [OPTIONS] cluster event\n"), progname());
printf(_(" %s [OPTIONS] cluster cleanup\n"), progname());
puts("");
printf(_("CLUSTER SHOW\n"));
@@ -1472,7 +1386,6 @@ do_cluster_help(void)
printf(_(" --event filter specific event\n"));
printf(_(" --node-id restrict entries to node with this ID\n"));
printf(_(" --node-name restrict entries to node with this name\n"));
printf(_(" --csv emit output as CSV\n"));
puts("");
printf(_("CLUSTER CLEANUP\n"));

View File

@@ -47,7 +47,6 @@ static CheckStatus do_node_check_downstream(PGconn *conn, OutputMode mode, Check
static CheckStatus do_node_check_replication_lag(PGconn *conn, OutputMode mode, t_node_info *node_info, CheckStatusList *list_output);
static CheckStatus do_node_check_role(PGconn *conn, OutputMode mode, t_node_info *node_info, CheckStatusList *list_output);
static CheckStatus do_node_check_slots(PGconn *conn, OutputMode mode, t_node_info *node_info, CheckStatusList *list_output);
static CheckStatus do_node_check_missing_slots(PGconn *conn, OutputMode mode, t_node_info *node_info, CheckStatusList *list_output);
/*
* NODE STATUS
@@ -170,17 +169,11 @@ do_node_status(void)
}
else
{
/* "archive_mode" is not "off", i.e. one of "on", "always" */
bool enabled = true;
PQExpBufferData archiving_status;
char archive_command[MAXLEN] = "";
initPQExpBuffer(&archiving_status);
/*
* if the node is a standby, and "archive_mode" is "on", archiving will
* actually be disabled.
*/
if (recovery_type == RECTYPE_STANDBY)
{
if (guc_set(conn, "archive_mode", "=", "on"))
@@ -258,55 +251,6 @@ do_node_status(void)
"disabled");
}
/* check for attached nodes */
{
NodeInfoList downstream_nodes = T_NODE_INFO_LIST_INITIALIZER;
NodeInfoListCell *node_cell = NULL;
ItemList missing_nodes = {NULL, NULL};
int missing_nodes_count = 0;
int expected_nodes_count = 0;
get_downstream_node_records(conn, config_file_options.node_id, &downstream_nodes);
/* if a witness node is present, we'll need to remove this from the total */
expected_nodes_count = downstream_nodes.node_count;
for (node_cell = downstream_nodes.head; node_cell; node_cell = node_cell->next)
{
/* skip witness server */
if (node_cell->node_info->type == WITNESS)
{
expected_nodes_count --;
continue;
}
if (is_downstream_node_attached(conn, node_cell->node_info->node_name) == false)
{
missing_nodes_count++;
item_list_append_format(&missing_nodes,
"%s (ID: %i)",
node_cell->node_info->node_name,
node_cell->node_info->node_id);
}
}
if (missing_nodes_count)
{
ItemListCell *missing_cell = NULL;
item_list_append_format(&warnings,
_("- %i of %i downstream nodes not attached:"),
missing_nodes_count,
expected_nodes_count);
for (missing_cell = missing_nodes.head; missing_cell; missing_cell = missing_cell->next)
{
item_list_append_format(&warnings,
" - %s\n", missing_cell->string);
}
}
}
if (server_version_num < 90400)
{
key_value_list_set(&node_status,
@@ -542,31 +486,18 @@ do_node_status(void)
termPQExpBuffer(&output);
if (warnings.head != NULL && runtime_options.terse == false && runtime_options.output_mode == OM_TEXT)
if (runtime_options.output_mode == OM_TEXT && warnings.head != NULL && runtime_options.terse == false)
{
log_warning(_("following issue(s) were detected:"));
print_item_list(&warnings);
log_hint(_("execute \"repmgr node check\" for more details"));
}
clear_node_info_list(&missing_slots);
key_value_list_free(&node_status);
item_list_free(&warnings);
PQfinish(conn);
/*
* If warnings were noted, even if they're not displayed (e.g. in --csv node),
* that means something's not right so we need to emit a non-zero exit code.
*/
if (warnings.head != NULL)
{
exit(ERR_NODE_STATUS);
}
return;
}
/*
* Returns information about the running state of the node.
* For internal use during "standby switchover".
@@ -697,7 +628,6 @@ do_node_check(void)
CheckStatusList status_list = {NULL, NULL};
CheckStatusListCell *cell = NULL;
bool issue_detected = false;
/* for internal use */
if (runtime_options.has_passfile == true)
@@ -782,17 +712,6 @@ do_node_check(void)
exit(return_code);
}
if (runtime_options.missing_slots == true)
{
return_code = do_node_check_missing_slots(conn,
runtime_options.output_mode,
&node_info,
NULL);
PQfinish(conn);
exit(return_code);
}
if (runtime_options.output_mode == OM_NAGIOS)
{
log_error(_("--nagios can only be used with a specific check"));
@@ -806,23 +725,11 @@ do_node_check(void)
initPQExpBuffer(&output);
/* order functions are called is also output order */
if (do_node_check_role(conn, runtime_options.output_mode, &node_info, &status_list) != CHECK_STATUS_OK)
issue_detected = true;
if (do_node_check_replication_lag(conn, runtime_options.output_mode, &node_info, &status_list) != CHECK_STATUS_OK)
issue_detected = true;
if (do_node_check_archive_ready(conn, runtime_options.output_mode, &status_list) != CHECK_STATUS_OK)
issue_detected = true;
if (do_node_check_downstream(conn, runtime_options.output_mode, &status_list) != CHECK_STATUS_OK)
issue_detected = true;
if (do_node_check_slots(conn, runtime_options.output_mode, &node_info, &status_list) != CHECK_STATUS_OK)
issue_detected = true;
if (do_node_check_missing_slots(conn, runtime_options.output_mode, &node_info, &status_list) != CHECK_STATUS_OK)
issue_detected = true;
(void) do_node_check_role(conn, runtime_options.output_mode, &node_info, &status_list);
(void) do_node_check_replication_lag(conn, runtime_options.output_mode, &node_info, &status_list);
(void) do_node_check_archive_ready(conn, runtime_options.output_mode, &status_list);
(void) do_node_check_downstream(conn, runtime_options.output_mode, &status_list);
(void) do_node_check_slots(conn, runtime_options.output_mode, &node_info, &status_list);
if (runtime_options.output_mode == OM_CSV)
{
@@ -879,11 +786,6 @@ do_node_check(void)
check_status_list_free(&status_list);
PQfinish(conn);
if (issue_detected == true)
{
exit(ERR_NODE_STATUS);
}
}
@@ -1145,7 +1047,6 @@ do_node_check_downstream(PGconn *conn, OutputMode mode, CheckStatusList *list_ou
for (cell = downstream_nodes.head; cell; cell = cell->next)
{
/* skip witness server */
if (cell->node_info->type == WITNESS)
{
expected_nodes_count --;
@@ -1682,130 +1583,6 @@ do_node_check_slots(PGconn *conn, OutputMode mode, t_node_info *node_info, Check
}
static CheckStatus
do_node_check_missing_slots(PGconn *conn, OutputMode mode, t_node_info *node_info, CheckStatusList *list_output)
{
CheckStatus status = CHECK_STATUS_OK;
PQExpBufferData details;
NodeInfoList missing_slots = T_NODE_INFO_LIST_INITIALIZER;
if (mode == OM_CSV && list_output == NULL)
{
log_error(_("--csv output not provided with --missing-slots option"));
PQfinish(conn);
exit(ERR_BAD_CONFIG);
}
initPQExpBuffer(&details);
if (server_version_num < 90400)
{
appendPQExpBuffer(&details,
_("replication slots not available for this PostgreSQL version"));
}
else
{
get_downstream_nodes_with_missing_slot(conn,
config_file_options.node_id,
&missing_slots);
if (missing_slots.node_count == 0)
{
appendPQExpBuffer(&details,
_("node has no missing replication slots"));
}
else
{
NodeInfoListCell *missing_slot_cell = NULL;
bool first_element = true;
status = CHECK_STATUS_CRITICAL;
appendPQExpBuffer(&details,
_("%i replication slots are missing"),
missing_slots.node_count);
if (missing_slots.node_count)
{
appendPQExpBuffer(&details, ": ");
for (missing_slot_cell = missing_slots.head; missing_slot_cell; missing_slot_cell = missing_slot_cell->next)
{
if (first_element == true)
{
first_element = false;
}
else
{
appendPQExpBuffer(&details, ", ");
}
appendPQExpBuffer(&details, "%s", missing_slot_cell->node_info->slot_name);
}
}
}
}
switch (mode)
{
case OM_NAGIOS:
{
printf("REPMGR_MISSING_SLOTS %s: %s | missing_slots=%i",
output_check_status(status),
details.data,
missing_slots.node_count);
if (missing_slots.node_count)
{
NodeInfoListCell *missing_slot_cell = NULL;
bool first_element = true;
printf(";");
for (missing_slot_cell = missing_slots.head; missing_slot_cell; missing_slot_cell = missing_slot_cell->next)
{
if (first_element == true)
{
first_element = false;
}
else
{
printf(",");
}
printf("%s", missing_slot_cell->node_info->slot_name);
}
}
printf("\n");
break;
}
case OM_CSV:
case OM_TEXT:
if (list_output != NULL)
{
check_status_list_set(list_output,
"Replication slots",
status,
details.data);
}
else
{
printf("%s (%s)\n",
output_check_status(status),
details.data);
}
default:
break;
}
clear_node_info_list(&missing_slots);
termPQExpBuffer(&details);
return status;
}
void
do_node_service(void)
{
@@ -2359,19 +2136,19 @@ do_node_rejoin(void)
{
log_verbose(LOG_INFO, _("waiting for node %i to respond to pings; %i of max %i attempts"),
config_file_options.node_id,
i + 1, config_file_options.node_rejoin_timeout);
i + 1, config_file_options.standby_reconnect_timeout);
}
else
{
log_debug("sleeping 1 second waiting for node %i to respond to pings; %i of max %i attempts",
config_file_options.node_id,
i + 1, config_file_options.node_rejoin_timeout);
i + 1, config_file_options.standby_reconnect_timeout);
}
sleep(1);
}
for (; i < config_file_options.node_rejoin_timeout; i++)
for (; i < config_file_options.standby_reconnect_timeout; i++)
{
success = is_downstream_node_attached(upstream_conn, config_file_options.node_name);
@@ -2386,13 +2163,13 @@ do_node_rejoin(void)
{
log_info(_("waiting for node %i to connect to new primary; %i of max %i attempts"),
config_file_options.node_id,
i + 1, config_file_options.node_rejoin_timeout);
i + 1, config_file_options.standby_reconnect_timeout);
}
else
{
log_debug("sleeping 1 second waiting for node %i to connect to new primary; %i of max %i attempts",
config_file_options.node_id,
i + 1, config_file_options.node_rejoin_timeout);
i + 1, config_file_options.standby_reconnect_timeout);
}
sleep(1);
@@ -2417,54 +2194,6 @@ do_node_rejoin(void)
success = is_downstream_node_attached(upstream_conn, config_file_options.node_name);
}
/*
* Handle replication slots:
* - if a slot for the new upstream exists, delete that
* - warn about any other inactive replication slots
*/
if (runtime_options.force_rewind_used == false && config_file_options.use_replication_slots)
{
PGconn *local_conn = NULL;
local_conn = establish_db_connection(config_file_options.conninfo, false);
if (PQstatus(local_conn) != CONNECTION_OK)
{
log_warning(_("unable to connect to local node to check replication slot status"));
log_hint(_("execute \"repmgr node check\" to check inactive slots and drop manually if necessary"));
}
else
{
KeyValueList inactive_replication_slots = {NULL, NULL};
KeyValueListCell *cell = NULL;
int inactive_count = 0;
PQExpBufferData slotinfo;
drop_replication_slot_if_exists(local_conn,
config_file_options.node_id,
primary_node_record.slot_name);
(void) get_inactive_replication_slots(local_conn, &inactive_replication_slots);
initPQExpBuffer(&slotinfo);
for (cell = inactive_replication_slots.head; cell; cell = cell->next)
{
appendPQExpBuffer(&slotinfo,
" - %s (%s)", cell->key, cell->value);
inactive_count++;
}
if (inactive_count > 0)
{
log_warning(_("%i inactive replication slots detected"), inactive_count);
log_detail(_("inactive replication slots:\n%s"), slotinfo.data);
log_hint(_("these replication slots may need to be removed manually"));
}
termPQExpBuffer(&slotinfo);
PQfinish(local_conn);
}
}
if (success == true)
{
@@ -2474,8 +2203,7 @@ do_node_rejoin(void)
else
{
/*
* if we reach here, no record found in upstream node's pg_stat_replication
*/
* if we reach here, no record found in upstream node's pg_stat_replication */
log_notice(_("NODE REJOIN has completed but node is not yet reattached to upstream"));
log_hint(_("you will need to manually check the node's replication status"));
}
@@ -2936,7 +2664,6 @@ do_node_help(void)
printf(_(" --replication-lag replication lag in seconds (standbys only)\n"));
printf(_(" --role check node has expected role\n"));
printf(_(" --slots check for inactive replication slots\n"));
printf(_(" --missing-slots check for missing replication slots\n"));
puts("");

View File

@@ -64,10 +64,12 @@ do_primary_register(void)
PQfinish(conn);
exit(ERR_BAD_CONFIG);
}
log_error(_("unable to determine server's recovery type"));
PQfinish(conn);
exit(ERR_DB_CONN);
else
{
log_error(_("connection to node lost"));
PQfinish(conn);
exit(ERR_DB_CONN);
}
}
log_verbose(LOG_INFO, _("server is not in recovery"));

View File

@@ -89,6 +89,8 @@ static int run_file_backup(t_node_info *node_record);
static void copy_configuration_files(bool delete_after_copy);
static void drop_replication_slot_if_exists(PGconn *conn, int node_id, char *slot_name);
static void tablespace_data_append(TablespaceDataList *list, const char *name, const char *oid, const char *location);
static void get_barman_property(char *dst, char *name, char *local_repmgr_directory);
@@ -469,7 +471,6 @@ do_standby_clone(void)
termPQExpBuffer(&msg);
r = test_ssh_connection(runtime_options.host, runtime_options.remote_user);
if (r != 0)
{
log_error(_("remote host \"%s\" is not reachable via SSH - unable to copy external configuration files"),
@@ -497,41 +498,32 @@ do_standby_clone(void)
termPQExpBuffer(&msg);
/*
* Here we'll attempt an initial test copy of the detected external
* files, to detect any issues before we run the base backup.
*
* Note this will exit with an error, unless -F/--force supplied.
*
* We don't do this during a --dry-run as it may introduce unexpected changes
* on the local node; during an actual clone operation, any problems with
* copying files will be detected early and the operation aborted before
* the actual database cloning commences.
*
* TODO: put the files in a temporary directory and move to their final
* destination once the database has been cloned.
*/
if (runtime_options.dry_run == false)
if (runtime_options.copy_external_config_files_destination == CONFIG_FILE_SAMEPATH)
{
if (runtime_options.copy_external_config_files_destination == CONFIG_FILE_SAMEPATH)
{
/*
* Files will be placed in the same path as on the source server;
* don't delete after copying.
*/
copy_configuration_files(false);
/*
* Files will be placed in the same path as on the source server;
* don't delete after copying.
*/
copy_configuration_files(false);
}
else
{
/*
* Files will be placed in the data directory - delete after copying.
* They'll be copied again later; see TODO above.
*/
copy_configuration_files(true);
}
}
else
{
/*
* Files will be placed in the data directory - delete after copying.
* They'll be copied again later; see TODO above.
*/
copy_configuration_files(true);
}
}
@@ -1062,7 +1054,6 @@ _do_create_recovery_conf(void)
local_node_record.slot_name,
upstream_node_record.node_name,
upstream_node_id);
if (runtime_options.force == false && runtime_options.dry_run == false)
{
log_error("%s", msg.data);
@@ -1094,7 +1085,7 @@ _do_create_recovery_conf(void)
initPQExpBuffer(&msg);
appendPQExpBuffer(&msg,
_("insufficient free replication slots on upstream node \"%s\" (ID: %i)"),
_("insufficient free replicaiton slots on upstream node \"%s\" (ID: %i)"),
upstream_node_record.node_name,
upstream_node_id);
@@ -1150,14 +1141,14 @@ _do_create_recovery_conf(void)
if (runtime_options.dry_run == true)
{
char recovery_conf_contents[MAXLEN] = "";
create_recovery_file(&local_node_record, &recovery_conninfo, recovery_conf_contents, false);
create_recovery_file(&upstream_node_record, &recovery_conninfo, recovery_conf_contents, false);
log_info(_("would create \"recovery.conf\" file in \"%s\""), local_data_directory);
log_detail(_("\n%s"), recovery_conf_contents);
}
else
{
if (!create_recovery_file(&local_node_record, &recovery_conninfo, local_data_directory, true))
if (!create_recovery_file(&upstream_node_record, &recovery_conninfo, local_data_directory, true))
{
log_error(_("unable to create \"recovery.conf\""));
}
@@ -1566,8 +1557,8 @@ do_standby_register(void)
exit(ERR_BAD_CONFIG);
}
log_warning(_("this node does not appear to be attached to upstream node \"%s\" (ID: %i)"),
upstream_node_record.node_name,
upstream_node_record.node_id);
config_file_options.node_name,
config_file_options.node_id);
}
PQfinish(upstream_conn);
}
@@ -1717,16 +1708,11 @@ do_standby_register(void)
termPQExpBuffer(&details);
/*
* If --wait-sync option set, wait for the records to synchronise
* (unless 0 seconds provided, which disables it, which is the same as
* not providing the option). The default value is -1, which means
* no timeout.
*/
/* if --wait-sync option set, wait for the records to synchronise */
if (PQstatus(conn) == CONNECTION_OK &&
runtime_options.wait_register_sync == true &&
runtime_options.wait_register_sync_seconds != 0)
runtime_options.wait_register_sync_seconds > 0)
{
bool sync_ok = false;
int timer = 0;
@@ -1750,11 +1736,7 @@ do_standby_register(void)
{
bool records_match = true;
/*
* If timeout set to a positive value, check if we've reached it and
* exit the loop
*/
if (runtime_options.wait_register_sync_seconds > 0 && runtime_options.wait_register_sync_seconds == timer)
if (runtime_options.wait_register_sync_seconds && runtime_options.wait_register_sync_seconds == timer)
break;
node_record_status = get_node_record(conn,
@@ -2058,8 +2040,6 @@ _do_standby_promote_internal(PGconn *conn)
local_node_record.node_name,
local_node_record.node_id,
script);
log_detail(_("waiting up to %i seconds (parameter \"promote_check_timeout\") for promotion to complete"),
config_file_options.promote_check_timeout);
r = system(script);
if (r != 0)
@@ -2085,8 +2065,6 @@ _do_standby_promote_internal(PGconn *conn)
if (recovery_type == RECTYPE_STANDBY)
{
log_error(_("STANDBY PROMOTE failed, node is still a standby"));
log_detail(_("node still in recovery after %i seconds"), config_file_options.promote_check_timeout);
log_hint(_("the node may need more time to promote itself, check the PostgreSQL log for details"));
PQfinish(conn);
exit(ERR_PROMOTION_FAIL);
}
@@ -2732,10 +2710,6 @@ do_standby_follow_internal(PGconn *primary_conn, t_node_info *primary_node_recor
* If replication slots are in use, and an inactive one for this node
* exists on the former upstream, drop it.
*
* Note that if this function is called by do_standby_switchover(), the
* "repmgr node rejoin" command executed on the demotion candidate may already
* have removed the slot, so there may be nothing to do.
*
* XXX check if former upstream is current primary?
*/
@@ -2843,12 +2817,6 @@ do_standby_switchover(void)
int reachable_sibling_nodes_with_slot_count = 0;
int unreachable_sibling_node_count = 0;
/* number of free walsenders required on promotion candidate */
int min_required_wal_senders = 1;
/* this will be calculated as max_wal_senders - COUNT(*) FROM pg_stat_replication */
int available_wal_senders = 0;
/* number of free replication slots required on promotion candidate */
int min_required_free_slots = 0;
@@ -2933,25 +2901,6 @@ do_standby_switchover(void)
exit(ERR_DB_QUERY);
}
/*
* Check that there's no exclusive backups running on the primary.
* We don't want to end up damaging the backup and also leaving the server in an
* state where there's control data saying it's in backup mode but there's no
* backup_label in PGDATA.
* If the DBA wants to do the switchover anyway, he should first stop the
* backup that's running.
*/
if (server_in_exclusive_backup_mode(remote_conn) != BACKUP_STATE_NO_BACKUP)
{
log_error(_("unable to perform a switchover while primary server is in exclusive backup mode"));
log_hint(_("stop backup before attempting the switchover"));
PQfinish(local_conn);
PQfinish(remote_conn);
exit(ERR_SWITCHOVER_FAIL);
}
/*
* Check this standby is attached to the demotion candidate
* TODO:
@@ -3118,176 +3067,6 @@ do_standby_switchover(void)
}
termPQExpBuffer(&command_output);
/*
* populate local node record with current state of various replication-related
* values, so we can check for sufficient walsenders and replication slots
*/
get_node_replication_stats(local_conn, server_version_num, &local_node_record);
available_wal_senders = local_node_record.max_wal_senders -
local_node_record.attached_wal_receivers;
/*
* If --siblings-follow specified, get list and check they're reachable
* (if not just issue a warning)
*/
get_active_sibling_node_records(local_conn,
local_node_record.node_id,
local_node_record.upstream_node_id,
&sibling_nodes);
if (runtime_options.siblings_follow == false)
{
if (sibling_nodes.node_count > 0)
{
log_warning(_("%i sibling nodes found, but option \"--siblings-follow\" not specified"),
sibling_nodes.node_count);
log_detail(_("these nodes will remain attached to the current primary"));
}
}
else
{
char host[MAXLEN] = "";
NodeInfoListCell *cell;
log_verbose(LOG_INFO, _("%i active sibling nodes found"),
sibling_nodes.node_count);
if (sibling_nodes.node_count == 0)
{
log_warning(_("option \"--sibling-nodes\" specified, but no sibling nodes exist"));
}
else
{
/* include walsender for promotion candidate in total */
for (cell = sibling_nodes.head; cell; cell = cell->next)
{
/* get host from node record */
get_conninfo_value(cell->node_info->conninfo, "host", host);
r = test_ssh_connection(host, runtime_options.remote_user);
if (r != 0)
{
cell->node_info->reachable = false;
unreachable_sibling_node_count++;
}
else
{
cell->node_info->reachable = true;
reachable_sibling_node_count++;
min_required_wal_senders++;
if (cell->node_info->slot_name[0] != '\0')
{
reachable_sibling_nodes_with_slot_count++;
min_required_free_slots++;
}
}
}
if (unreachable_sibling_node_count > 0)
{
if (runtime_options.force == false)
{
log_error(_("%i of %i sibling nodes unreachable via SSH:"),
unreachable_sibling_node_count,
sibling_nodes.node_count);
}
else
{
log_warning(_("%i of %i sibling nodes unreachable via SSH:"),
unreachable_sibling_node_count,
sibling_nodes.node_count);
}
/* display list of unreachable sibling nodes */
for (cell = sibling_nodes.head; cell; cell = cell->next)
{
if (cell->node_info->reachable == true)
continue;
log_detail(" %s (ID: %i)",
cell->node_info->node_name,
cell->node_info->node_id);
}
if (runtime_options.force == false)
{
log_hint(_("use -F/--force to proceed in any case"));
PQfinish(local_conn);
exit(ERR_BAD_CONFIG);
}
if (runtime_options.dry_run == true)
{
log_detail(_("F/--force specified, would proceed anyway"));
}
else
{
log_detail(_("F/--force specified, proceeding anyway"));
}
}
else
{
char *msg = _("all sibling nodes are reachable via SSH");
if (runtime_options.dry_run == true)
{
log_info("%s", msg);
}
else
{
log_verbose(LOG_INFO, "%s", msg);
}
}
}
}
/*
* check there are sufficient free walsenders - obviously there's potential
* for a later race condition if some walsenders come into use before the
* switchover operation gets around to attaching the sibling nodes, but
* this should catch any actual existing configuration issue (and if anyone's
* performing a switchover in such an unstable environment, they only have
* themselves to blame).
*/
if (available_wal_senders < min_required_wal_senders)
{
if (runtime_options.force == false || runtime_options.dry_run == true)
{
log_error(_("insufficient free walsenders on promotion candidate"));
log_detail(_("at least %i walsenders required but only %i free walsenders on promotion candidate"),
min_required_wal_senders,
available_wal_senders);
log_hint(_("increase parameter \"max_wal_senders\" or use -F/--force to proceed in any case"));
if (runtime_options.dry_run == false)
{
PQfinish(local_conn);
exit(ERR_BAD_CONFIG);
}
}
else
{
log_warning(_("insufficient free walsenders on promotion candidate"));
log_detail(_("at least %i walsenders required but only %i free walsender(s) on promotion candidate"),
min_required_wal_senders,
available_wal_senders);
}
}
else
{
if (runtime_options.dry_run == true)
{
log_info(_("%i walsenders required, %i available"),
min_required_wal_senders,
available_wal_senders);
}
}
/* check demotion candidate can make replication connection to promotion candidate */
{
initPQExpBuffer(&remote_command_str);
@@ -3531,6 +3310,171 @@ do_standby_switchover(void)
PQfinish(remote_conn);
/*
* populate local node record with current state of various replication-related
* values, so we can check for sufficient walsenders and replication slots
*/
get_node_replication_stats(local_conn, server_version_num, &local_node_record);
/*
* If --siblings-follow specified, get list and check they're reachable
* (if not just issue a warning)
*/
get_active_sibling_node_records(local_conn,
local_node_record.node_id,
local_node_record.upstream_node_id,
&sibling_nodes);
if (runtime_options.siblings_follow == false)
{
if (sibling_nodes.node_count > 0)
{
log_warning(_("%i sibling nodes found, but option \"--siblings-follow\" not specified"),
sibling_nodes.node_count);
log_detail(_("these nodes will remain attached to the current primary"));
}
}
else
{
char host[MAXLEN] = "";
NodeInfoListCell *cell;
log_verbose(LOG_INFO, _("%i active sibling nodes found"),
sibling_nodes.node_count);
if (sibling_nodes.node_count == 0)
{
log_warning(_("option \"--sibling-nodes\" specified, but no sibling nodes exist"));
}
else
{
/* include walsender for promotion candidate in total */
int min_required_wal_senders = 1;
int available_wal_senders = local_node_record.max_wal_senders -
local_node_record.attached_wal_receivers;
for (cell = sibling_nodes.head; cell; cell = cell->next)
{
/* get host from node record */
get_conninfo_value(cell->node_info->conninfo, "host", host);
r = test_ssh_connection(host, runtime_options.remote_user);
if (r != 0)
{
cell->node_info->reachable = false;
unreachable_sibling_node_count++;
}
else
{
cell->node_info->reachable = true;
reachable_sibling_node_count++;
min_required_wal_senders++;
if (cell->node_info->slot_name[0] != '\0')
{
reachable_sibling_nodes_with_slot_count++;
min_required_free_slots++;
}
}
}
if (unreachable_sibling_node_count > 0)
{
if (runtime_options.force == false)
{
log_error(_("%i of %i sibling nodes unreachable via SSH:"),
unreachable_sibling_node_count,
sibling_nodes.node_count);
}
else
{
log_warning(_("%i of %i sibling nodes unreachable via SSH:"),
unreachable_sibling_node_count,
sibling_nodes.node_count);
}
/* display list of unreachable sibling nodes */
for (cell = sibling_nodes.head; cell; cell = cell->next)
{
if (cell->node_info->reachable == true)
continue;
log_detail(" %s (ID: %i)",
cell->node_info->node_name,
cell->node_info->node_id);
}
if (runtime_options.force == false)
{
log_hint(_("use -F/--force to proceed in any case"));
PQfinish(local_conn);
exit(ERR_BAD_CONFIG);
}
if (runtime_options.dry_run == true)
{
log_detail(_("F/--force specified, would proceed anyway"));
}
else
{
log_detail(_("F/--force specified, proceeding anyway"));
}
}
else
{
char *msg = _("all sibling nodes are reachable via SSH");
if (runtime_options.dry_run == true)
{
log_info("%s", msg);
}
else
{
log_verbose(LOG_INFO, "%s", msg);
}
}
/*
* check there are sufficient free walsenders - obviously there's potential
* for a later race condition if some walsenders come into use before the
* switchover operation gets around to attaching the sibling nodes, but
* this should catch any actual existing configuration issue.
*/
if (available_wal_senders < min_required_wal_senders)
{
if (runtime_options.force == false || runtime_options.dry_run == true)
{
log_error(_("insufficient free walsenders to attach all sibling nodes"));
log_detail(_("at least %i walsenders required but only %i free walsenders on promotion candidate"),
min_required_wal_senders,
available_wal_senders);
log_hint(_("increase parameter \"max_wal_senders\" or use -F/--force to proceed in any case"));
if (runtime_options.dry_run == false)
{
PQfinish(local_conn);
exit(ERR_BAD_CONFIG);
}
}
else
{
log_warning(_("insufficient free walsenders to attach all sibling nodes"));
log_detail(_("at least %i walsenders required but only %i free walsender(s) on promotion candidate"),
min_required_wal_senders,
available_wal_senders);
}
}
else
{
if (runtime_options.dry_run == true)
{
log_info(_("%i walsenders required, %i available"),
min_required_wal_senders,
available_wal_senders);
}
}
}
}
/*
* if replication slots are required by demotion candidate and/or siblings,
@@ -5138,73 +5082,57 @@ run_basebackup(t_node_info *node_record)
{
PGconn *upstream_conn = NULL;
upstream_conn = establish_db_connection(upstream_node_record.conninfo, false);
upstream_conn = establish_db_connection(upstream_node_record.conninfo, true);
/*
* It's possible the upstream node is not yet running, in which case we'll
* have to rely on the user taking action to create the slot
*/
if (PQstatus(upstream_conn) != CONNECTION_OK)
record_status = get_slot_record(upstream_conn, node_record->slot_name, &slot_info);
if (record_status == RECORD_FOUND)
{
log_warning(_("unable to connect to upstream node to create replication slot"));
/*
* TODO: if slot creation also handled by "standby register", update warning
*/
log_hint(_("you may need to create the replication slot manually"));
log_verbose(LOG_INFO,
_("replication slot \"%s\" aleady exists on upstream node %i"),
node_record->slot_name,
upstream_node_id);
slot_exists_on_upstream = true;
}
else
{
record_status = get_slot_record(upstream_conn, node_record->slot_name, &slot_info);
PQExpBufferData event_details;
if (record_status == RECORD_FOUND)
log_notice(_("creating replication slot \"%s\" on upstream node %i"),
node_record->slot_name,
upstream_node_id);
get_superuser_connection(&upstream_conn, &superuser_conn, &privileged_conn);
initPQExpBuffer(&event_details);
if (create_replication_slot(privileged_conn, node_record->slot_name, source_server_version_num, &event_details) == false)
{
log_verbose(LOG_INFO,
_("replication slot \"%s\" aleady exists on upstream node %i"),
node_record->slot_name,
upstream_node_id);
slot_exists_on_upstream = true;
}
else
{
PQExpBufferData event_details;
log_error("%s", event_details.data);
log_notice(_("creating replication slot \"%s\" on upstream node %i"),
node_record->slot_name,
upstream_node_id);
create_event_notification(
primary_conn,
&config_file_options,
config_file_options.node_id,
"standby_clone",
false,
event_details.data);
get_superuser_connection(&upstream_conn, &superuser_conn, &privileged_conn);
initPQExpBuffer(&event_details);
if (create_replication_slot(privileged_conn, node_record->slot_name, source_server_version_num, &event_details) == false)
{
log_error("%s", event_details.data);
create_event_notification(primary_conn,
&config_file_options,
config_file_options.node_id,
"standby_clone",
false,
event_details.data);
PQfinish(source_conn);
if (superuser_conn != NULL)
PQfinish(superuser_conn);
exit(ERR_DB_QUERY);
}
PQfinish(source_conn);
if (superuser_conn != NULL)
PQfinish(superuser_conn);
termPQExpBuffer(&event_details);
exit(ERR_DB_QUERY);
}
PQfinish(upstream_conn);
}
}
if (superuser_conn != NULL)
PQfinish(superuser_conn);
/* delete slot on source server */
termPQExpBuffer(&event_details);
}
PQfinish(upstream_conn);
}
get_superuser_connection(&source_conn, &superuser_conn, &privileged_conn);
@@ -5212,7 +5140,7 @@ run_basebackup(t_node_info *node_record)
{
if (slot_exists_on_upstream == false)
{
if (drop_replication_slot(privileged_conn, node_record->slot_name) == true)
if (drop_replication_slot(source_conn, node_record->slot_name) == true)
{
log_notice(_("replication slot \"%s\" deleted on source node"), node_record->slot_name);
}
@@ -5870,7 +5798,7 @@ get_barman_property(char *dst, char *name, char *local_repmgr_directory)
initPQExpBuffer(&command_output);
maxlen_snprintf(command,
"grep \"^[[:space:]]%s:\" %s/show-server.txt",
"grep \"^\t%s:\" %s/show-server.txt",
name, local_repmgr_tmp_directory);
(void) local_command(command, &command_output);
@@ -6067,6 +5995,45 @@ check_recovery_type(PGconn *conn)
}
static void
drop_replication_slot_if_exists(PGconn *conn, int node_id, char *slot_name)
{
t_replication_slot slot_info = T_REPLICATION_SLOT_INITIALIZER;
RecordStatus record_status = get_slot_record(conn, slot_name, &slot_info);
log_verbose(LOG_DEBUG, "attempting to delete slot \"%s\" on node %i",
slot_name, node_id);
if (record_status != RECORD_FOUND)
{
log_info(_("no slot record found for slot \"%s\" on node %i"),
slot_name, node_id);
}
else
{
if (slot_info.active == false)
{
if (drop_replication_slot(conn, slot_name) == true)
{
log_notice(_("replication slot \"%s\" deleted on node %i"), slot_name, node_id);
}
else
{
log_error(_("unable to delete replication slot \"%s\" on node %i"), slot_name, node_id);
}
}
/*
* if active replication slot exists, call Houston as we have a
* problem
*/
else
{
log_warning(_("replication slot \"%s\" is still active on node %i"), slot_name, node_id);
}
}
}
/*
* Creates a recovery.conf file for a standby
@@ -6537,7 +6504,6 @@ do_standby_help(void)
puts("");
printf(_(" \"standby clone\" clones a standby from the primary or an upstream node.\n"));
puts("");
printf(_(" -d, --dbname=conninfo conninfo of the upstream node to use for cloning.\n"));
printf(_(" -c, --fast-checkpoint force fast checkpoint\n"));
printf(_(" --copy-external-config-files[={samepath|pgdata}]\n" \
" copy configuration files located outside the \n" \

View File

@@ -310,59 +310,55 @@ do_witness_register(void)
void
do_witness_unregister(void)
{
PGconn *local_conn = NULL;
PGconn *witness_conn = NULL;
PGconn *primary_conn = NULL;
t_node_info node_record = T_NODE_INFO_INITIALIZER;
RecordStatus record_status = RECORD_NOT_FOUND;
bool node_record_deleted = false;
bool local_node_available = true;
int witness_node_id = UNKNOWN_NODE_ID;
bool witness_available = true;
if (runtime_options.node_id != UNKNOWN_NODE_ID)
{
/* user has specified the witness node id */
witness_node_id = runtime_options.node_id;
}
else
{
/* assume witness node is local node */
witness_node_id = config_file_options.node_id;
}
log_info(_("connecting to node \"%s\" (ID: %i)"),
log_info(_("connecting to witness node \"%s\" (ID: %i)"),
config_file_options.node_name,
config_file_options.node_id);
local_conn = establish_db_connection_quiet(config_file_options.conninfo);
witness_conn = establish_db_connection_quiet(config_file_options.conninfo);
if (PQstatus(local_conn) != CONNECTION_OK)
if (PQstatus(witness_conn) != CONNECTION_OK)
{
if (!runtime_options.force)
{
log_error(_("unable to connect to node \"%s\" (ID: %i)"),
log_error(_("unable to connect to witness node \"%s\" (ID: %i)"),
config_file_options.node_name,
config_file_options.node_id);
log_detail("%s", PQerrorMessage(local_conn));
log_detail("%s", PQerrorMessage(witness_conn));
log_hint(_("provide -F/--force to remove the witness record if the server is not running"));
exit(ERR_BAD_CONFIG);
}
log_notice(_("unable to connect to witness node \"%s\" (ID: %i), removing node record on cluster primary only"),
config_file_options.node_name,
config_file_options.node_id);
local_node_available = false;
witness_available = false;
}
if (local_node_available == true)
if (witness_available == true)
{
primary_conn = get_primary_connection_quiet(local_conn, NULL, NULL);
primary_conn = get_primary_connection_quiet(witness_conn, NULL, NULL);
}
else
{
/*
* Assume user has provided connection details for the primary server
* Extract the repmgr user and database names from the conninfo string
* provided in repmgr.conf
*/
get_conninfo_value(config_file_options.conninfo, "user", repmgr_user);
get_conninfo_value(config_file_options.conninfo, "dbname", repmgr_db);
param_set_ine(&source_conninfo, "user", repmgr_user);
param_set_ine(&source_conninfo, "dbname", repmgr_db);
primary_conn = establish_db_connection_by_params(&source_conninfo, false);
}
if (PQstatus(primary_conn) != CONNECTION_OK)
@@ -370,26 +366,26 @@ do_witness_unregister(void)
log_error(_("unable to connect to primary"));
log_detail("%s", PQerrorMessage(primary_conn));
if (local_node_available == true)
if (witness_available == true)
{
PQfinish(local_conn);
PQfinish(witness_conn);
}
else if (runtime_options.connection_param_provided == false)
else
{
log_hint(_("provide connection details for the primary server"));
log_hint(_("provide connection details to primary server"));
}
exit(ERR_BAD_CONFIG);
}
/* Check node exists and is really a witness */
record_status = get_node_record(primary_conn, witness_node_id, &node_record);
record_status = get_node_record(primary_conn, config_file_options.node_id, &node_record);
if (record_status != RECORD_FOUND)
{
log_error(_("no record found for node %i"), witness_node_id);
log_error(_("no record found for node %i"), config_file_options.node_id);
if (local_node_available == true)
PQfinish(local_conn);
if (witness_available == true)
PQfinish(witness_conn);
PQfinish(primary_conn);
exit(ERR_BAD_CONFIG);
@@ -397,17 +393,11 @@ do_witness_unregister(void)
if (node_record.type != WITNESS)
{
/*
* The node (either explicitly provided with --node-id, or the local node)
* is not a witness.
*
* TODO: scan node list and print hint about identity of known witness servers.
*/
log_error(_("node %i is not a witness node"), config_file_options.node_id);
log_detail(_("node %i is a %s node"), config_file_options.node_id, get_node_type_string(node_record.type));
if (local_node_available == true)
PQfinish(local_conn);
if (witness_available == true)
PQfinish(witness_conn);
PQfinish(primary_conn);
exit(ERR_BAD_CONFIG);
@@ -416,43 +406,49 @@ do_witness_unregister(void)
if (runtime_options.dry_run == true)
{
log_info(_("prerequisites for unregistering the witness node are met"));
if (local_node_available == true)
PQfinish(local_conn);
if (witness_available == true)
PQfinish(witness_conn);
PQfinish(primary_conn);
exit(SUCCESS);
}
log_info(_("unregistering witness node %i"), witness_node_id);
log_info(_("unregistering witness node %i"), config_file_options.node_id);
node_record_deleted = delete_node_record(primary_conn,
witness_node_id);
config_file_options.node_id);
if (node_record_deleted == false)
{
PQfinish(primary_conn);
PQfinish(witness_conn);
exit(ERR_BAD_CONFIG);
}
if (local_node_available == true)
PQfinish(local_conn);
PQfinish(local_conn);
/* sync records from primary */
if (witness_available == true && witness_copy_node_records(primary_conn, witness_conn) == false)
{
log_error(_("unable to copy repmgr node records from primary"));
PQfinish(primary_conn);
PQfinish(witness_conn);
exit(ERR_BAD_CONFIG);
}
/* Log the event */
create_event_record(primary_conn,
&config_file_options,
witness_node_id,
config_file_options.node_id,
"witness_unregister",
true,
NULL);
PQfinish(primary_conn);
if (local_node_available == true)
PQfinish(local_conn);
if (witness_available == true)
PQfinish(witness_conn);
log_info(_("witness unregistration complete"));
log_detail(_("witness node with ID %i successfully unregistered"),
witness_node_id);
log_detail(_("witness node with id %i (conninfo: %s) successfully unregistered"),
config_file_options.node_id, config_file_options.conninfo);
return;
}
@@ -472,19 +468,16 @@ void do_witness_help(void)
puts("");
printf(_(" Requires provision of connection information for the primary\n"));
puts("");
printf(_(" --dry-run check prerequisites but don't make any changes\n"));
printf(_(" -F, --force overwrite an existing node record\n"));
printf(_(" --dry-run check prerequisites but don't make any changes\n"));
printf(_(" -F, --force overwrite an existing node record\n"));
puts("");
printf(_("WITNESS UNREGISTER\n"));
puts("");
printf(_(" \"witness register\" unregisters a witness node.\n"));
puts("");
printf(_(" --dry-run check prerequisites but don't make any changes\n"));
printf(_(" -F, --force unregister when witness node not running\n"));
printf(_(" --node-id node ID of the witness node (provide if executing on\n"));
printf(_(" another node)\n"));
printf(_(" --dry-run check prerequisites but don't make any changes\n"));
printf(_(" -F, --force unregister when witness node not running\n"));
puts("");
return;

View File

@@ -47,7 +47,6 @@ typedef struct
/* logging options */
char log_level[MAXLEN]; /* overrides setting in repmgr.conf */
bool log_to_file;
bool quiet;
bool terse;
bool verbose;
@@ -107,7 +106,6 @@ typedef struct
bool replication_lag;
bool role;
bool slots;
bool missing_slots;
bool has_passfile;
bool replication_connection;
@@ -139,7 +137,7 @@ typedef struct
/* general configuration options */ \
"", false, false, "", false, false, \
/* logging options */ \
"", false, false, false, false, \
"", false, false, false, \
/* output options */ \
false, false, false, \
/* database connection options */ \
@@ -154,13 +152,13 @@ typedef struct
/* "standby clone"/"standby follow" options */ \
NO_UPSTREAM_NODE, \
/* "standby register" options */ \
false, -1, DEFAULT_WAIT_START, \
false, 0, DEFAULT_WAIT_START, \
/* "standby switchover" options */ \
false, false, "", false, \
/* "node status" options */ \
false, \
/* "node check" options */ \
false, false, false, false, false, false, false, false, \
false, false, false, false, false, false, false, \
/* "node join" options */ \
"", \
/* "node service" options */ \
@@ -237,6 +235,5 @@ extern void get_node_config_directory(char *config_dir_buf);
extern void get_node_data_directory(char *data_dir_buf);
extern void init_node_record(t_node_info *node_record);
extern bool can_use_pg_rewind(PGconn *conn, const char *data_directory, PQExpBufferData *reason);
extern void drop_replication_slot_if_exists(PGconn *conn, int node_id, char *slot_name);
#endif /* _REPMGR_CLIENT_GLOBAL_H_ */

View File

@@ -98,7 +98,7 @@ main(int argc, char **argv)
{
t_conninfo_param_list default_conninfo = T_CONNINFO_PARAM_LIST_INITIALIZER;
int optindex = 0;
int optindex;
int c;
char *repmgr_command = NULL;
@@ -108,7 +108,6 @@ main(int argc, char **argv)
char *dummy_action = "";
bool help_option = false;
bool option_error_found = false;
set_progname(argv[0]);
@@ -179,10 +178,7 @@ main(int argc, char **argv)
strncpy(runtime_options.username, pw->pw_name, MAXLEN);
}
/* Make getopt emitting errors */
opterr = 1;
while ((c = getopt_long(argc, argv, "?Vb:f:FwWd:h:p:U:R:S:D:ck:L:qtvC:", long_options,
while ((c = getopt_long(argc, argv, "?Vb:f:FwWd:h:p:U:R:S:D:ck:L:tvC:", long_options,
&optindex)) != -1)
{
/*
@@ -200,7 +196,13 @@ main(int argc, char **argv)
case OPT_HELP: /* --help */
help_option = true;
break;
case '?':
/* Actual help option given */
if (strcmp(argv[optind - 1], "-?") == 0)
{
help_option = true;
}
break;
case 'V':
/*
@@ -471,10 +473,6 @@ main(int argc, char **argv)
runtime_options.slots = true;
break;
case OPT_MISSING_SLOTS:
runtime_options.missing_slots = true;
break;
case OPT_HAS_PASSFILE:
runtime_options.has_passfile = true;
break;
@@ -574,12 +572,6 @@ main(int argc, char **argv)
logger_output_mode = OM_DAEMON;
break;
/* --quiet */
case 'q':
runtime_options.quiet = true;
break;
/* --terse */
case 't':
runtime_options.terse = true;
@@ -635,24 +627,9 @@ main(int argc, char **argv)
_("--recovery-min-apply-delay is now a configuration file parameter, \"recovery_min_apply_delay\""));
break;
case ':': /* missing option argument */
option_error_found = true;
break;
case '?':
/* Actual help option given? */
if (strcmp(argv[optind - 1], "-?") == 0)
{
help_option = true;
break;
}
/* otherwise fall through to default */
default: /* invalid option */
option_error_found = true;
break;
}
}
/*
* If -d/--dbname appears to be a conninfo string, validate by attempting
* to parse it (and if successful, store the parsed parameters)
@@ -753,10 +730,9 @@ main(int argc, char **argv)
if (cli_errors.head != NULL)
{
free_conninfo_params(&source_conninfo);
exit_with_cli_errors(&cli_errors, NULL);
exit_with_cli_errors(&cli_errors);
}
/*----------
* Determine the node type and action; following are valid:
*
@@ -1003,30 +979,9 @@ main(int argc, char **argv)
if (cli_errors.head != NULL)
{
free_conninfo_params(&source_conninfo);
exit_with_cli_errors(&cli_errors, valid_repmgr_command_found == true ? repmgr_command : NULL);
exit_with_cli_errors(&cli_errors);
}
/* no errors detected by repmgr, but getopt might have */
if (option_error_found == true)
{
if (valid_repmgr_command_found == true)
{
printf(_("Try \"%s --help\" or \"%s %s --help\" for more information.\n"),
progname(),
progname(),
repmgr_command);
}
else
{
printf(_("Try \"repmgr --help\" for more information.\n"));
}
free_conninfo_params(&source_conninfo);
exit(ERR_BAD_CONFIG);
}
/*
* Print any warnings about inappropriate command line options, unless
* -t/--terse set
@@ -1122,17 +1077,6 @@ main(int argc, char **argv)
logger_set_min_level(LOG_INFO);
}
/*
* If -q/--quiet supplied, suppress any non-ERROR log output.
* This overrides everything else; we'll leave it up to the user to deal with the
* consequences of e.g. running --dry-run together with -q/--quiet.
*/
if (runtime_options.quiet == true)
{
logger_set_level(LOG_ERROR);
}
/*
* Node configuration information is not needed for all actions, with
@@ -1519,7 +1463,6 @@ check_cli_parameters(const int action)
{
case PRIMARY_UNREGISTER:
case STANDBY_UNREGISTER:
case WITNESS_UNREGISTER:
case CLUSTER_EVENT:
case CLUSTER_MATRIX:
case CLUSTER_CROSSCHECK:
@@ -1560,7 +1503,6 @@ check_cli_parameters(const int action)
case STANDBY_CLONE:
case STANDBY_REGISTER:
case STANDBY_FOLLOW:
case BDR_REGISTER:
break;
default:
item_list_append_format(&cli_warnings,
@@ -1903,7 +1845,7 @@ do_help(void)
printf(_(" %s [OPTIONS] standby {register|unregister|clone|promote|follow|switchover}\n"), progname());
printf(_(" %s [OPTIONS] bdr {register|unregister}\n"), progname());
printf(_(" %s [OPTIONS] node {status|check|rejoin|service}\n"), progname());
printf(_(" %s [OPTIONS] cluster {show|event|matrix|crosscheck|cleanup}\n"), progname());
printf(_(" %s [OPTIONS] cluster {show|event|matrix|crosscheck}\n"), progname());
printf(_(" %s [OPTIONS] witness {register|unregister}\n"), progname());
puts("");
@@ -1952,7 +1894,6 @@ do_help(void)
printf(_(" --dry-run show what would happen for action, but don't execute it\n"));
printf(_(" -L, --log-level set log level (overrides configuration file; default: NOTICE)\n"));
printf(_(" --log-to-file log to file (or logging facility) defined in repmgr.conf\n"));
printf(_(" -q, --quiet suppress all log output apart from errors\n"));
printf(_(" -t, --terse don't display detail, hints and other non-critical output\n"));
printf(_(" -v, --verbose display additional log output (useful for debugging)\n"));
@@ -2978,46 +2919,3 @@ can_use_pg_rewind(PGconn *conn, const char *data_directory, PQExpBufferData *rea
return can_use;
}
void
drop_replication_slot_if_exists(PGconn *conn, int node_id, char *slot_name)
{
t_replication_slot slot_info = T_REPLICATION_SLOT_INITIALIZER;
RecordStatus record_status = get_slot_record(conn, slot_name, &slot_info);
log_verbose(LOG_DEBUG, "attempting to delete slot \"%s\" on node %i",
slot_name, node_id);
if (record_status != RECORD_FOUND)
{
/* this is a good thing */
log_verbose(LOG_INFO,
_("slot \"%s\" does not exist on node %i, nothing to remove"),
slot_name, node_id);
}
else
{
if (slot_info.active == false)
{
if (drop_replication_slot(conn, slot_name) == true)
{
log_notice(_("replication slot \"%s\" deleted on node %i"), slot_name, node_id);
}
else
{
log_error(_("unable to delete replication slot \"%s\" on node %i"), slot_name, node_id);
}
}
/*
* if active replication slot exists, call Houston as we have a
* problem
*/
else
{
log_warning(_("replication slot \"%s\" is still active on node %i"), slot_name, node_id);
}
}
}

View File

@@ -87,7 +87,6 @@
#define OPT_REMOTE_NODE_ID 1038
#define OPT_RECOVERY_CONF_ONLY 1039
#define OPT_NO_WAIT 1040
#define OPT_MISSING_SLOTS 1041
/* deprecated since 3.3 */
#define OPT_DATA_DIR 999
@@ -126,7 +125,6 @@ static struct option long_options[] =
/* logging options */
{"log-level", required_argument, NULL, 'L'},
{"log-to-file", no_argument, NULL, OPT_LOG_TO_FILE},
{"quiet", no_argument, NULL, 'q'},
{"terse", no_argument, NULL, 't'},
{"verbose", no_argument, NULL, 'v'},
@@ -166,7 +164,6 @@ static struct option long_options[] =
{"replication-lag", no_argument, NULL, OPT_REPLICATION_LAG},
{"role", no_argument, NULL, OPT_ROLE},
{"slots", no_argument, NULL, OPT_SLOTS},
{"missing-slots", no_argument, NULL, OPT_MISSING_SLOTS},
{"has-passfile", no_argument, NULL, OPT_HAS_PASSFILE},
{"replication-connection", no_argument, NULL, OPT_REPL_CONN},

View File

@@ -98,7 +98,7 @@
#log_facility=STDERR # Logging facility: possible values are STDERR, or for
# syslog integration, one of LOCAL0, LOCAL1, ..., LOCAL7, USER
#log_file='' # STDERR can be redirected to an arbitrary file
#log_file='' # stderr can be redirected to an arbitrary file
#log_status_interval=300 # interval (in seconds) for repmgrd to log a status message
@@ -143,11 +143,6 @@
# Debian/Ubuntu users: you will probably need to
# set this to the directory where `pg_ctl` is located,
# e.g. /usr/lib/postgresql/9.6/bin/
#
# *NOTE* "pg_bindir" is only used when repmgr directly
# executes PostgreSQL binaries; any user-defined scripts
# *must* be specified with the full path
#
#use_primary_conninfo_password=false # explicitly set "password" in recovery.conf's
# "primary_conninfo" parameter using the value contained
# in the environment variable PGPASSWORD
@@ -188,11 +183,11 @@ ssh_options='-q -o ConnectTimeout=10' # Options to append to "ssh"
# parameter can be provided multiple times.
#restore_command='' # This will be placed in the recovery.conf file generated
# by repmgr.
# by repmgr.
#archive_cleanup_command='' # This will be placed in the recovery.conf file generated
# by repmgr. Note we recommend using Barman for managing
# WAL archives (see: https://www.pgbarman.org )
# by repmgr. Note we recommend using Barman for managing
# WAL archives (see: https://www.pgbarman.org )
#recovery_min_apply_delay= # If provided, "recovery_min_apply_delay" in recovery.conf
# will be set to this value (PostgreSQL 9.4 and later).
@@ -212,7 +207,7 @@ ssh_options='-q -o ConnectTimeout=10' # Options to append to "ssh"
#------------------------------------------------------------------------------
# "standby follow" settings
# Standby follow settings
#------------------------------------------------------------------------------
# These settings apply when instructing a standby to follow the new primary
@@ -224,28 +219,6 @@ ssh_options='-q -o ConnectTimeout=10' # Options to append to "ssh"
# for the standby to connect to the primary
#------------------------------------------------------------------------------
# "standby switchover" settings
#------------------------------------------------------------------------------
# These settings apply when switching roles between a primary and a standby
# ("repmgr standby switchover").
#standby_reconnect_timeout=60 # The max length of time (in seconds) to wait
# for the demoted standby to reconnect to the promoted
# primary (note: this value should be equal to or greater
# than that set for "node_rejoin_timeout")
#------------------------------------------------------------------------------
# "node rejoin" settings
#------------------------------------------------------------------------------
# These settings apply when reintegrating a node into a replication cluster
# with "repmgrd_node_rejoin"
#node_rejoin_timeout=60 # The maximum length of time (in seconds) to wait for
# the node to reconnect to the replication cluster
#------------------------------------------------------------------------------
# Barman options
#------------------------------------------------------------------------------
@@ -263,11 +236,6 @@ ssh_options='-q -o ConnectTimeout=10' # Options to append to "ssh"
# These settings are only applied when repmgrd is running. Values shown
# are defaults.
#repmgrd_pid_file= # Path of PID file to use for repmgrd; if not set, a PID file will
# be generated in a temporary directory specified by the environment
# variable $TMPDIR, or if not set, in "/tmp". This value can be overridden
# by the command line option "-p/--pid-file"; the command line option
# "--no-pid-file" will force PID file creation to be skipped.
#failover=manual # one of 'automatic', 'manual'.
# determines what action to take in the event of upstream failure
#
@@ -277,7 +245,7 @@ ssh_options='-q -o ConnectTimeout=10' # Options to append to "ssh"
# manual attention to reattach it to replication
# (does not apply to BDR mode)
#priority=100 # indicate a preferred priority for promoting nodes;
#priority=100 # indicate a preferred priorty for promoting nodes;
# a value of zero prevents the node being promoted to primary
# (default: 100)
@@ -297,9 +265,8 @@ ssh_options='-q -o ConnectTimeout=10' # Options to append to "ssh"
#primary_notification_timeout=60 # Interval (in seconds) which repmgrd on a standby
# will wait for a notification from the new primary,
# before falling back to degraded monitoring
#repmgrd_standby_startup_timeout=60 # Interval (in seconds) which repmgrd on a standby will wait
# for the the local node to restart and become ready to accept connections after
# executing "follow_command" (defaults to the value set in "standby_reconnect_timeout")
#standby_reconnect_timeout=60 # Interval (in seconds) which repmgrd on a standby will wait
# to reconnect to the local node after executing "follow_command"
#monitoring_history=no # Whether to write monitoring data to the "montoring_history" table
#monitor_interval_secs=2 # Interval (in seconds) at which to write monitoring data
@@ -337,7 +304,7 @@ ssh_options='-q -o ConnectTimeout=10' # Options to append to "ssh"
#
# Debian/Ubuntu users: use "sudo pg_ctlcluster" to execute service control commands.
#
# For more details, see: https://repmgr.org/docs/4.1/configuration-service-commands.html
# For more details, see: https://repmgr.org/docs/4.0/configuration-service-commands.html
#service_start_command = ''
#service_stop_command = ''
@@ -381,7 +348,7 @@ ssh_options='-q -o ConnectTimeout=10' # Options to append to "ssh"
#------------------------------------------------------------------------------
#bdr_local_monitoring_only=false # Only monitor the local node; no checks will be
# performed on the other node
# performed on the other node
#bdr_recovery_timeout # If a BDR node was offline and has become available
# maximum length of time in seconds to wait for the
# node to reconnect to the cluster
# maximum length of time in seconds to wait for the
# node to reconnect to the cluster

View File

@@ -1,6 +1,6 @@
# repmgr extension
comment = 'Replication manager for PostgreSQL'
default_version = '4.1'
default_version = '4.0'
module_pathname = '$libdir/repmgr'
relocatable = false
schema = repmgr

View File

@@ -49,8 +49,6 @@
#define REPLICATION_TYPE_BDR 2
#define UNKNOWN_SERVER_VERSION_NUM -1
#define UNKNOWN_BDR_VERSION_NUM -1
#define UNKNOWN_TIMELINE_ID -1
#define UNKNOWN_SYSTEM_IDENTIFIER 0
@@ -60,8 +58,6 @@
#define VOTING_TERM_NOT_SET -1
#define BDR2_REPLICATION_SET_NAME "repmgr"
/*
* various default values - ensure repmgr.conf.sample is update
* if any of these are changed
@@ -85,7 +81,6 @@
#define DEFAULT_PROMOTE_CHECK_TIMEOUT 60 /* seconds */
#define DEFAULT_PROMOTE_CHECK_INTERVAL 1 /* seconds */
#define DEFAULT_STANDBY_RECONNECT_TIMEOUT 60 /* seconds */
#define DEFAULT_NODE_REJOIN_TIMEOUT 60 /* seconds */
#ifndef RECOVERY_COMMAND_FILE
#define RECOVERY_COMMAND_FILE "recovery.conf"

View File

@@ -1,2 +1,3 @@
#define REPMGR_VERSION_DATE ""
#define REPMGR_VERSION "4.1.1"
#define REPMGR_VERSION "4.0.6"

View File

@@ -214,8 +214,7 @@ monitor_bdr(void)
log_warning(_("unable to connect to node %s (ID %i)"),
cell->node_info->node_name, cell->node_info->node_id);
//cell->node_info->conn = try_reconnect(cell->node_info);
try_reconnect(&cell->node_info->conn, cell->node_info);
cell->node_info->conn = try_reconnect(cell->node_info);
/* node has recovered - log and continue */
if (cell->node_info->node_status == NODE_STATUS_UP)
@@ -294,7 +293,7 @@ loop:
/*
* if we can reload, then could need to change local_conn
*/
if (reload_config(&config_file_options, BDR))
if (reload_config(&config_file_options))
{
PQfinish(local_conn);
local_conn = establish_db_connection(config_file_options.conninfo, true);
@@ -304,12 +303,11 @@ loop:
got_SIGHUP = false;
}
/* XXX this looks like it will never be called */
if (got_SIGHUP)
{
log_debug("SIGHUP received");
if (reload_config(&config_file_options, BDR))
if (reload_config(&config_file_options))
{
PQfinish(local_conn);
local_conn = establish_db_connection(config_file_options.conninfo, true);

View File

@@ -60,8 +60,6 @@ static int primary_node_id = UNKNOWN_NODE_ID;
static t_node_info upstream_node_info = T_NODE_INFO_INITIALIZER;
static NodeInfoList sibling_nodes = T_NODE_INFO_LIST_INITIALIZER;
static instr_time last_monitoring_update;
static ElectionResult do_election(void);
static const char *_print_election_result(ElectionResult result);
@@ -83,8 +81,6 @@ static bool do_witness_failover(void);
static void update_monitoring_history(void);
static void handle_sighup(PGconn **conn, t_server_type server_type);
static const char * format_failover_state(FailoverState failover_state);
@@ -166,8 +162,8 @@ do_physical_node_check(void)
if (config_file_options.failover == FAILOVER_AUTOMATIC)
{
/*
* Check that "promote_command" and "follow_command" are defined, otherwise repmgrd
* won't be able to perform any useful action in a failover situation.
* check that promote/follow commands are defined, otherwise repmgrd
* won't be able to perform any useful action
*/
bool required_param_missing = false;
@@ -179,24 +175,14 @@ do_physical_node_check(void)
if (config_file_options.service_promote_command[0] != '\0')
{
/*
* "service_promote_command" is *not* a substitute for "promote_command";
* it is intended for use in those systems (e.g. Debian) where there's a service
* level promote command (e.g. pg_ctlcluster).
*
* "promote_command" should either execute "repmgr standby promote" directly, or
* a script which executes "repmgr standby promote". This is essential, as the
* repmgr metadata is updated by "repmgr standby promote".
*
* "service_promote_command", if set, will be executed by "repmgr standby promote",
* but never by repmgrd.
*
* if repmgrd executes "service_promote_command" directly,
* repmgr metadata won't get updated
*/
log_hint(_("\"service_promote_command\" is set, but can only be executed by \"repmgr standby promote\""));
}
required_param_missing = true;
}
if (config_file_options.follow_command[0] == '\0')
{
log_error(_("\"follow_command\" must be defined in the configuration file"));
@@ -288,6 +274,8 @@ monitor_streaming_primary(void)
local_node_info.node_status = NODE_STATUS_UNKNOWN;
close_connection(&local_conn);
/*
* as we're monitoring the primary, no point in trying to
* write the event to the database
@@ -303,7 +291,7 @@ monitor_streaming_primary(void)
termPQExpBuffer(&event_details);
try_reconnect(&local_conn, &local_node_info);
local_conn = try_reconnect(&local_node_info);
if (local_node_info.node_status == NODE_STATUS_UP)
{
@@ -547,7 +535,26 @@ loop:
if (got_SIGHUP)
{
handle_sighup(&local_conn, PRIMARY);
log_debug("SIGHUP received");
if (reload_config(&config_file_options))
{
close_connection(&local_conn);
local_conn = establish_db_connection(config_file_options.conninfo, true);
if (*config_file_options.log_file)
{
FILE *fd;
fd = freopen(config_file_options.log_file, "a", stderr);
if (fd == NULL)
{
fprintf(stderr, "error reopening stderr to \"%s\": %s",
config_file_options.log_file, strerror(errno));
}
}
}
got_SIGHUP = false;
}
log_verbose(LOG_DEBUG, "sleeping %i seconds (parameter \"monitor_interval_secs\")",
@@ -565,11 +572,9 @@ monitor_streaming_standby(void)
instr_time log_status_interval_start;
PQExpBufferData event_details;
log_debug("monitor_streaming_standby()");
reset_node_voting_status();
INSTR_TIME_SET_ZERO(last_monitoring_update);
log_debug("monitor_streaming_standby()");
/*
* If no upstream node id is specified in the metadata, we'll try and
@@ -718,9 +723,10 @@ monitor_streaming_standby(void)
_("unable to connect to upstream node \"%s\" (node ID: %i)"),
upstream_node_info.node_name, upstream_node_info.node_id);
/* XXX possible pre-action event */
/* */
if (upstream_node_info.type == STANDBY)
{
/* XXX possible pre-action event */
create_event_record(primary_conn,
&config_file_options,
config_file_options.node_id,
@@ -742,6 +748,8 @@ monitor_streaming_standby(void)
log_warning("%s", event_details.data);
termPQExpBuffer(&event_details);
close_connection(&upstream_conn);
/*
* if local node is unreachable, make a last-minute attempt to reconnect
* before continuing with the failover process
@@ -752,18 +760,13 @@ monitor_streaming_standby(void)
check_connection(&local_node_info, &local_conn);
}
try_reconnect(&upstream_conn, &upstream_node_info);
upstream_conn = try_reconnect(&upstream_node_info);
/* Node has recovered - log and continue */
if (upstream_node_info.node_status == NODE_STATUS_UP)
{
int upstream_node_unreachable_elapsed = calculate_elapsed(upstream_node_unreachable_start);
if (upstream_node_info.type == PRIMARY)
{
primary_conn = upstream_conn;
}
initPQExpBuffer(&event_details);
appendPQExpBuffer(&event_details,
@@ -771,7 +774,7 @@ monitor_streaming_standby(void)
upstream_node_unreachable_elapsed);
log_notice("%s", event_details.data);
create_event_notification(primary_conn,
create_event_notification(upstream_conn,
&config_file_options,
config_file_options.node_id,
"repmgrd_upstream_reconnect",
@@ -1041,7 +1044,8 @@ loop:
if (config_file_options.failover == FAILOVER_MANUAL)
{
appendPQExpBuffer(&monitoring_summary,
appendPQExpBuffer(
&monitoring_summary,
_(" (automatic failover disabled)"));
}
@@ -1051,18 +1055,6 @@ loop:
{
log_detail(_("waiting for upstream or another primary to reappear"));
}
else if (config_file_options.monitoring_history == true)
{
if (INSTR_TIME_IS_ZERO(last_monitoring_update))
{
log_detail(_("no monitoring statistics have been written yet"));
}
else
{
log_detail(_("last monitoring statistics update was %i seconds ago"),
calculate_elapsed(last_monitoring_update));
}
}
INSTR_TIME_SET_CURRENT(log_status_interval_start);
}
@@ -1074,16 +1066,7 @@ loop:
}
else
{
if (config_file_options.monitoring_history == true)
{
log_verbose(LOG_WARNING, _("monitoring_history requested but primary connection not available"));
}
/*
* if monitoring not in use, we'll need to ensure the local connection
* handle isn't stale
*/
(void) connection_ping(local_conn);
connection_ping(local_conn);
}
/*
@@ -1172,7 +1155,26 @@ loop:
if (got_SIGHUP)
{
handle_sighup(&local_conn, STANDBY);
log_debug("SIGHUP received");
if (reload_config(&config_file_options))
{
close_connection(&local_conn);
local_conn = establish_db_connection(config_file_options.conninfo, true);
if (*config_file_options.log_file)
{
FILE *fd;
fd = freopen(config_file_options.log_file, "a", stderr);
if (fd == NULL)
{
fprintf(stderr, "error reopening stderr to \"%s\": %s",
config_file_options.log_file, strerror(errno));
}
}
}
got_SIGHUP = false;
}
log_verbose(LOG_DEBUG, "sleeping %i seconds (parameter \"monitor_interval_secs\")",
@@ -1192,18 +1194,36 @@ monitor_streaming_witness(void)
PQExpBufferData event_details;
RecordStatus record_status;
int primary_node_id = UNKNOWN_NODE_ID;
reset_node_voting_status();
log_debug("monitor_streaming_witness()");
/*
* At this point we can't trust the local copy of "repmgr.nodes", as
* it may not have been updated. We'll scan the cluster for the current
* primary and refresh the copy from that before proceeding further.
*/
primary_conn = get_primary_connection_quiet(local_conn, &primary_node_id, NULL);
if (get_primary_node_record(local_conn, &upstream_node_info) == false)
{
PQExpBufferData event_details;
initPQExpBuffer(&event_details);
appendPQExpBuffer(&event_details,
_("unable to retrieve record for primary node"));
log_error("%s", event_details.data);
log_hint(_("execute \"repmgr witness register --force\" to update the witness node "));
close_connection(&local_conn);
create_event_notification(NULL,
&config_file_options,
config_file_options.node_id,
"repmgrd_shutdown",
false,
event_details.data);
termPQExpBuffer(&event_details);
terminate(ERR_BAD_CONFIG);
}
primary_conn = establish_db_connection(upstream_node_info.conninfo, false);
/*
* Primary node must be running at repmgrd startup.
@@ -1228,7 +1248,7 @@ monitor_streaming_witness(void)
* refresh upstream node record from primary, so it's as up-to-date
* as possible
*/
record_status = get_node_record(primary_conn, primary_node_id, &upstream_node_info);
record_status = get_node_record(primary_conn, upstream_node_info.node_id, &upstream_node_info);
/*
* This is unlikely to happen; if it does emit a warning for diagnostic
@@ -1300,7 +1320,8 @@ monitor_streaming_witness(void)
true,
event_details.data);
try_reconnect(&primary_conn, &upstream_node_info);
close_connection(&primary_conn);
primary_conn = try_reconnect(&upstream_node_info);
/* Node has recovered - log and continue */
if (upstream_node_info.node_status == NODE_STATUS_UP)
@@ -1314,7 +1335,7 @@ monitor_streaming_witness(void)
upstream_node_unreachable_elapsed);
log_notice("%s", event_details.data);
create_event_notification(primary_conn,
create_event_notification(upstream_conn,
&config_file_options,
config_file_options.node_id,
"repmgrd_upstream_reconnect",
@@ -1482,7 +1503,26 @@ loop:
if (got_SIGHUP)
{
handle_sighup(&local_conn, WITNESS);
log_debug("SIGHUP received");
if (reload_config(&config_file_options))
{
close_connection(&local_conn);
local_conn = establish_db_connection(config_file_options.conninfo, true);
if (*config_file_options.log_file)
{
FILE *fd;
fd = freopen(config_file_options.log_file, "a", stderr);
if (fd == NULL)
{
fprintf(stderr, "error reopening stderr to \"%s\": %s",
config_file_options.log_file, strerror(errno));
}
}
}
got_SIGHUP = false;
}
log_verbose(LOG_DEBUG, "sleeping %i seconds (parameter \"monitor_interval_secs\")",
@@ -1499,15 +1539,8 @@ loop:
static bool
do_primary_failover(void)
{
ElectionResult election_result;
/*
* Double-check status of the local connection
*/
check_connection(&local_node_info, &local_conn);
/* attempt to initiate voting process */
election_result = do_election();
ElectionResult election_result = do_election();
/* TODO add pre-event notification here */
failover_state = FAILOVER_STATE_UNKNOWN;
@@ -1728,21 +1761,12 @@ update_monitoring_history(void)
long long unsigned int replication_lag_bytes = 0;
/* both local and primary connections must be available */
if (PQstatus(primary_conn) != CONNECTION_OK)
{
log_warning(_("primary connection is not available, unable to update monitoring history"));
if (PQstatus(primary_conn) != CONNECTION_OK || PQstatus(local_conn) != CONNECTION_OK)
return;
}
if (PQstatus(local_conn) != CONNECTION_OK)
{
log_warning(_("local connection is not available, unable to update monitoring history"));
return;
}
if (get_replication_info(local_conn, &replication_info) == false)
{
log_warning(_("unable to retrieve replication status information, unable to update monitoring history"));
log_warning(_("unable to retrieve replication status information"));
return;
}
@@ -1794,7 +1818,8 @@ update_monitoring_history(void)
replication_lag_bytes = 0;
}
add_monitoring_record(primary_conn,
add_monitoring_record(
primary_conn,
local_conn,
primary_node_id,
local_node_info.node_id,
@@ -1804,8 +1829,6 @@ update_monitoring_history(void)
replication_info.last_xact_replay_timestamp,
replication_lag_bytes,
apply_lag_bytes);
INSTR_TIME_SET_CURRENT(last_monitoring_update);
}
@@ -1830,7 +1853,7 @@ do_upstream_standby_failover(void)
t_node_info primary_node_info = T_NODE_INFO_INITIALIZER;
RecordStatus record_status = RECORD_NOT_FOUND;
RecoveryType primary_type = RECTYPE_UNKNOWN;
int i, standby_follow_result;
int i, r;
char parsed_follow_command[MAXPGPATH] = "";
close_connection(&upstream_conn);
@@ -1864,18 +1887,9 @@ do_upstream_standby_failover(void)
if (primary_type != RECTYPE_PRIMARY)
{
if (primary_type == RECTYPE_STANDBY)
{
log_error(_("last known primary \"%s\" (ID: %i) is in recovery, not following"),
primary_node_info.node_name,
primary_node_info.node_id);
}
else
{
log_error(_("unable to determine status of last known primary \"%s\" (ID: %i), not following"),
primary_node_info.node_name,
primary_node_info.node_id);
}
log_error(_("last known primary\"%s\" (ID: %i) is in recovery, not following"),
primary_node_info.node_name,
primary_node_info.node_id);
close_connection(&primary_conn);
monitoring_state = MS_DEGRADED;
@@ -1886,6 +1900,8 @@ do_upstream_standby_failover(void)
/* Close the connection to this server */
close_connection(&local_conn);
initPQExpBuffer(&event_details);
log_debug(_("standby follow command is:\n \"%s\""),
config_file_options.follow_command);
@@ -1895,12 +1911,10 @@ do_upstream_standby_failover(void)
*/
parse_follow_command(parsed_follow_command, config_file_options.follow_command, primary_node_info.node_id);
standby_follow_result = system(parsed_follow_command);
r = system(parsed_follow_command);
if (standby_follow_result != 0)
if (r != 0)
{
initPQExpBuffer(&event_details);
appendPQExpBuffer(&event_details,
_("unable to execute follow command:\n %s"),
config_file_options.follow_command);
@@ -1911,7 +1925,8 @@ do_upstream_standby_failover(void)
* It may not possible to write to the event notification table but we
* should be able to generate an external notification if required.
*/
create_event_notification(primary_conn,
create_event_notification(
primary_conn,
&config_file_options,
local_node_info.node_id,
"repmgrd_failover_follow",
@@ -1924,22 +1939,18 @@ do_upstream_standby_failover(void)
/*
* It's possible that the standby is still starting up after the "follow_command"
* completes, so poll for a while until we get a connection.
*
* NOTE: we've previously closed the local connection, so even if the follow command
* failed for whatever reason and the local node remained up, we can re-open
* the local connection.
*/
for (i = 0; i < config_file_options.repmgrd_standby_startup_timeout; i++)
for (i = 0; i < config_file_options.standby_reconnect_timeout; i++)
{
local_conn = establish_db_connection(local_node_info.conninfo, false);
if (PQstatus(local_conn) == CONNECTION_OK)
break;
log_debug("sleeping 1 second; %i of %i (\"repmgrd_standby_startup_timeout\") attempts to reconnect to local node",
log_debug("sleeping 1 second; %i of %i attempts to reconnect to local node",
i + 1,
config_file_options.repmgrd_standby_startup_timeout);
config_file_options.standby_reconnect_timeout);
sleep(1);
}
@@ -1953,47 +1964,28 @@ do_upstream_standby_failover(void)
/* refresh shared memory settings which will have been zapped by the restart */
repmgrd_set_local_node_id(local_conn, config_file_options.node_id);
/*
*
*/
if (standby_follow_result != 0)
if (update_node_record_set_upstream(primary_conn,
local_node_info.node_id,
primary_node_info.node_id) == false)
{
monitoring_state = MS_DEGRADED;
INSTR_TIME_SET_CURRENT(degraded_monitoring_start);
appendPQExpBuffer(&event_details,
_("unable to set node %i's new upstream ID to %i"),
local_node_info.node_id,
primary_node_info.node_id);
return FAILOVER_STATE_FOLLOW_FAIL;
}
log_error("%s", event_details.data);
/*
* update upstream_node_id to primary node (but only if follow command
* was successful)
*/
create_event_notification(
NULL,
&config_file_options,
local_node_info.node_id,
"repmgrd_failover_follow",
false,
event_details.data);
{
if (update_node_record_set_upstream(primary_conn,
local_node_info.node_id,
primary_node_info.node_id) == false)
{
initPQExpBuffer(&event_details);
appendPQExpBuffer(&event_details,
_("unable to set node %i's new upstream ID to %i"),
local_node_info.node_id,
primary_node_info.node_id);
termPQExpBuffer(&event_details);
log_error("%s", event_details.data);
create_event_notification(NULL,
&config_file_options,
local_node_info.node_id,
"repmgrd_failover_follow",
false,
event_details.data);
termPQExpBuffer(&event_details);
terminate(ERR_BAD_CONFIG);
}
terminate(ERR_BAD_CONFIG);
}
/* refresh own internal node record */
@@ -2009,8 +2001,6 @@ do_upstream_standby_failover(void)
local_node_info.upstream_node_id = primary_node_info.node_id;
}
initPQExpBuffer(&event_details);
appendPQExpBuffer(&event_details,
_("node %i is now following primary node %i"),
local_node_info.node_id,
@@ -2018,7 +2008,8 @@ do_upstream_standby_failover(void)
log_notice("%s", event_details.data);
create_event_notification(primary_conn,
create_event_notification(
primary_conn,
&config_file_options,
local_node_info.node_id,
"repmgrd_failover_follow",
@@ -2065,10 +2056,10 @@ promote_self(void)
return FAILOVER_STATE_PROMOTION_FAILED;
}
/* the presence of this command has been established already */
/* the presence of either of this command has been established already */
promote_command = config_file_options.promote_command;
log_info(_("promote_command is:\n \"%s\""),
log_debug("promote command is:\n \"%s\"",
promote_command);
if (log_type == REPMGR_STDERR && *config_file_options.log_file)
@@ -2400,7 +2391,7 @@ follow_new_primary(int new_primary_id)
* completes, so poll for a while until we get a connection.
*/
for (i = 0; i < config_file_options.repmgrd_standby_startup_timeout; i++)
for (i = 0; i < config_file_options.standby_reconnect_timeout; i++)
{
local_conn = establish_db_connection(local_node_info.conninfo, false);
@@ -2409,7 +2400,7 @@ follow_new_primary(int new_primary_id)
log_debug("sleeping 1 second; %i of %i attempts to reconnect to local node",
i + 1,
config_file_options.repmgrd_standby_startup_timeout);
config_file_options.standby_reconnect_timeout);
sleep(1);
}
@@ -2475,26 +2466,20 @@ witness_follow_new_primary(int new_primary_id)
{
RecoveryType primary_recovery_type = get_recovery_type(upstream_conn);
switch (primary_recovery_type)
if (primary_recovery_type == RECTYPE_PRIMARY)
{
case RECTYPE_PRIMARY:
new_primary_ok = true;
break;
case RECTYPE_STANDBY:
new_primary_ok = false;
log_warning(_("new primary is not in recovery"));
break;
case RECTYPE_UNKNOWN:
new_primary_ok = false;
log_warning(_("unable to determine status of new primary"));
break;
new_primary_ok = true;
}
else
{
new_primary_ok = false;
log_warning(_("new primary is not in recovery"));
close_connection(&upstream_conn);
}
}
if (new_primary_ok == false)
{
close_connection(&upstream_conn);
return FAILOVER_STATE_FOLLOW_FAIL;
}
@@ -2980,30 +2965,3 @@ format_failover_state(FailoverState failover_state)
}
static void
handle_sighup(PGconn **conn, t_server_type server_type)
{
log_debug("SIGHUP received");
if (reload_config(&config_file_options, server_type))
{
PQfinish(*conn);
*conn = establish_db_connection(config_file_options.conninfo, true);
}
if (*config_file_options.log_file)
{
FILE *fd;
log_debug("reopening %s", config_file_options.log_file);
fd = freopen(config_file_options.log_file, "a", stderr);
if (fd == NULL)
{
fprintf(stderr, "error reopening stderr to \"%s\": %s",
config_file_options.log_file, strerror(errno));
}
}
got_SIGHUP = false;
}

153
repmgrd.c
View File

@@ -35,10 +35,8 @@
static char *config_file = NULL;
static bool verbose = false;
static char pid_file[MAXPGPATH];
static bool daemonize = true;
static bool show_pid_file = false;
static bool no_pid_file = false;
static char *pid_file = NULL;
static bool daemonize = false;
t_configuration_options config_file_options = T_CONFIGURATION_OPTIONS_INITIALIZER;
@@ -101,10 +99,8 @@ main(int argc, char **argv)
{"config-file", required_argument, NULL, 'f'},
/* daemon options */
{"daemonize", optional_argument, NULL, 'd'},
{"daemonize", no_argument, NULL, 'd'},
{"pid-file", required_argument, NULL, 'p'},
{"show-pid-file", no_argument, NULL, 's'},
{"no-pid-file", no_argument, NULL, OPT_NO_PID_FILE},
/* logging options */
{"log-level", required_argument, NULL, 'L'},
@@ -117,6 +113,8 @@ main(int argc, char **argv)
set_progname(argv[0]);
srand(time(NULL));
/* Disallow running as root */
if (geteuid() == 0)
{
@@ -130,10 +128,6 @@ main(int argc, char **argv)
exit(1);
}
srand(time(NULL));
memset(pid_file, 0, MAXPGPATH);
while ((c = getopt_long(argc, argv, "?Vf:L:vdp:m", long_options, &optindex)) != -1)
{
switch (c)
@@ -175,22 +169,11 @@ main(int argc, char **argv)
/* daemon options */
case 'd':
if (optarg != NULL)
{
daemonize = parse_bool(optarg, "-d/--daemonize", &cli_errors);
}
daemonize = true;
break;
case 'p':
strncpy(pid_file, optarg, MAXPGPATH);
break;
case 's':
show_pid_file = true;
break;
case OPT_NO_PID_FILE:
no_pid_file = true;
pid_file = optarg;
break;
/* logging options */
@@ -237,7 +220,7 @@ main(int argc, char **argv)
/* Exit here already if errors in command line options found */
if (cli_errors.head != NULL)
{
exit_with_cli_errors(&cli_errors, NULL);
exit_with_cli_errors(&cli_errors);
}
startup_event_logged = false;
@@ -256,58 +239,6 @@ main(int argc, char **argv)
*/
load_config(config_file, verbose, false, &config_file_options, argv[0]);
/* Determine pid file location, unless --no-pid-file supplied */
if (no_pid_file == false)
{
if (config_file_options.repmgrd_pid_file[0] != '\0')
{
if (pid_file[0] != '\0')
{
log_warning(_("\"repmgrd_pid_file\" will be overridden by --pid-file"));
}
else
{
strncpy(pid_file, config_file_options.repmgrd_pid_file, MAXPGPATH);
}
}
/* no pid file provided - determine location */
if (pid_file[0] == '\0')
{
/* packagers: if feasible, patch PID file path into "package_pid_file" */
char package_pid_file[MAXPGPATH] = "";
if (package_pid_file[0] != '\0')
{
maxpath_snprintf(pid_file, "%s", package_pid_file);
}
else
{
const char *tmpdir = getenv("TMPDIR");
if (!tmpdir)
tmpdir = "/tmp";
maxpath_snprintf(pid_file, "%s/repmgrd.pid", tmpdir);
}
}
}
else
{
/* --no-pid-file supplied - overwrite any value provided with --pid-file ... */
memset(pid_file, 0, MAXPGPATH);
}
/* If --show-pid-file supplied, output the location (if set) and exit */
if (show_pid_file == true)
{
printf("%s\n", pid_file);
exit(SUCCESS);
}
/* Some configuration file items can be overriden by command line options */
@@ -320,6 +251,8 @@ main(int argc, char **argv)
strncpy(config_file_options.log_level, cli_log_level, MAXLEN);
}
log_notice(_("repmgrd (repmgr %s) starting up"), REPMGR_VERSION);
/*
* -m/--monitoring-history, if provided, will override repmgr.conf's
* monitoring_history; this is for backwards compatibility as it's
@@ -347,8 +280,6 @@ main(int argc, char **argv)
logger_init(&config_file_options, progname());
log_notice(_("repmgrd (%s %s) starting up"), progname(), REPMGR_VERSION);
if (verbose)
logger_set_verbose();
@@ -483,7 +414,7 @@ main(int argc, char **argv)
daemonize_process();
}
if (pid_file[0] != '\0')
if (pid_file != NULL)
{
check_and_create_pid_file(pid_file);
}
@@ -738,8 +669,6 @@ show_help(void)
{
printf(_("%s: replication management daemon for PostgreSQL\n"), progname());
puts("");
printf(_("%s monitors a cluster of servers and optionally performs failover.\n"), progname());
puts("");
printf(_("Usage:\n"));
printf(_(" %s [OPTIONS]\n"), progname());
@@ -759,21 +688,19 @@ show_help(void)
puts("");
printf(_("Daemon configuration options:\n"));
printf(_(" -d, --daemonize[=true/false]\n"));
printf(_(" detach process from foreground (default: true)\n"));
printf(_(" -p, --pid-file=PATH use the specified PID file\n"));
printf(_(" -s, --show-pid-file show PID file which would be used by the current configuration\n"));
printf(_(" --no-pid-file don't write a PID file\n"));
printf(_("General configuration options:\n"));
printf(_(" -d, --daemonize detach process from foreground\n"));
printf(_(" -p, --pid-file=PATH write a PID file\n"));
puts("");
printf(_("%s monitors a cluster of servers and optionally performs failover.\n"), progname());
}
void
try_reconnect(PGconn **conn, t_node_info *node_info)
PGconn *
try_reconnect(t_node_info *node_info)
{
PGconn *our_conn;
PGconn *conn;
t_conninfo_param_list conninfo_params = T_CONNINFO_PARAM_LIST_INITIALIZER;
int i;
@@ -782,6 +709,7 @@ try_reconnect(PGconn **conn, t_node_info *node_info)
initialize_conninfo_params(&conninfo_params, false);
/* we assume by now the conninfo string is parseable */
(void) parse_conninfo_string(node_info->conninfo, &conninfo_params, NULL, false);
@@ -804,47 +732,18 @@ try_reconnect(PGconn **conn, t_node_info *node_info)
* degraded monitoring? - make that configurable
*/
our_conn = establish_db_connection_by_params(&conninfo_params, false);
conn = establish_db_connection_by_params(&conninfo_params, false);
if (PQstatus(our_conn) == CONNECTION_OK)
if (PQstatus(conn) == CONNECTION_OK)
{
free_conninfo_params(&conninfo_params);
log_info(_("connection to node %i succeeded"), node_info->node_id);
if (PQstatus(*conn) == CONNECTION_BAD)
{
log_verbose(LOG_INFO, "original connection handle returned CONNECTION_BAD, using new connection");
close_connection(conn);
*conn = our_conn;
}
else
{
ExecStatusType ping_result;
ping_result = connection_ping(*conn);
if (ping_result != PGRES_TUPLES_OK)
{
log_info("original connnection no longer available, using new connection");
close_connection(conn);
*conn = our_conn;
}
else
{
log_info(_("original connection is still available"));
PQfinish(our_conn);
}
}
node_info->node_status = NODE_STATUS_UP;
return;
return conn;
}
close_connection(&our_conn);
log_notice(_("unable to reconnect to node %i"), node_info->node_id);
close_connection(&conn);
log_notice(_("unable to reconnect to node"));
}
if (i + 1 < max_attempts)
@@ -863,7 +762,7 @@ try_reconnect(PGconn **conn, t_node_info *node_info)
free_conninfo_params(&conninfo_params);
return;
return NULL;
}
@@ -903,7 +802,7 @@ terminate(int retval)
{
logger_shutdown();
if (pid_file[0] != '\0')
if (pid_file)
{
unlink(pid_file);
}

View File

@@ -10,8 +10,6 @@
#include <time.h>
#include "portability/instr_time.h"
#define OPT_NO_PID_FILE 1000
extern volatile sig_atomic_t got_SIGHUP;
extern MonitoringState monitoring_state;
extern instr_time degraded_monitoring_start;
@@ -21,13 +19,11 @@ extern t_node_info local_node_info;
extern PGconn *local_conn;
extern bool startup_event_logged;
void try_reconnect(PGconn **conn, t_node_info *node_info);
PGconn *try_reconnect(t_node_info *node_info);
int calculate_elapsed(instr_time start_time);
const char *print_monitoring_state(MonitoringState monitoring_state);
void update_registration(PGconn *conn);
void terminate(int retval);
#endif /* _REPMGRD_H_ */