
762 Commits

Author SHA1 Message Date
Ian Barwick
dfdebd6c08 Enable provision of "archive_cleanup_command" in recovery.conf
If "archive_cleanup_command" is defined in "repmgr.conf", a corresponding
entry will be made in the node's "recovery.conf" file after cloning a
standby.

Note that we recommend using PgBarman to manage WAL archives, but are
providing this facility to help integrate repmgr into existing environments.

Implements GitHub #416.
2018-04-03 14:10:21 +09:00
Ian Barwick
63a11f8926 "standby promote": make timeout values configurable
This introduces the following new configuration file parameters, which
were previously hard-coded values:

 - promote_check_timeout
 - promote_check_interval

Implements GitHub #387.
2018-04-03 14:10:14 +09:00
Ian Barwick
a3f371b8c0 "node rejoin": actively check for node to rejoin cluster
Previously repmgr was relying on whatever command was configured to
start PostgreSQL to determine whether the node being rejoined had
started correctly. However, it's preferable to actively poll the upstream
to confirm the rejoined node has restarted and actually attached as a standby
before confirming success of the "node rejoin" action.

This can be overridden with the -W/--no-wait option.

(Note that for consistency with other PostgreSQL utilities, the
short form of the --wait option is now "-w"; this is currently
only used in "repmgr standby follow".)

Also update "repmgr node rejoin" documentation with a list of supported
options, and add some useful index entries for "pg_rewind".

Implements GitHub #415.
2018-04-03 10:34:44 +09:00
Ian Barwick
938692c169 doc: fix option description for "repmgr primary register" 2018-04-03 10:09:24 +09:00
Ian Barwick
ad24b04c35 Refactor pg_control parsing
The "data_checksum_version" field is located towards the end of the ControlFileData
struct, meaning its position varies between versions. Previously this wasn't a problem,
as it was only required for operations involving 9.5 and later, and its position
within the control file has not changed between the current release and current
HEAD.

However, to support pg_rewind in 9.3 and 9.4, both of which use a different
control file format, we'll need version-specific parsing. This will also make
it easier to deal with any future changes to the control file format.
2018-04-02 20:54:42 +09:00
Ian Barwick
3ccf1cf182 Enable pg_rewind to be used with PostgreSQL 9.3/9.4
pg_rewind is not part of the core distribution for those versions, but as
repmgr 3.3 supported it, that support should be extended to repmgr 4.

Note that no check is made as to whether the pg_rewind binary exists,
so it's up to the user to ensure it's present.

Addresses GitHub #413.
2018-04-02 20:54:29 +09:00
Ian Barwick
5e4bdb5a1b repmgrd: handle failover with two nodes in the primary location
If two nodes were in the primary location, and at least one node in
another location, the non-failed node in the primary location was not
recognising itself as a promotion candidate.

Addresses GitHub #407.
2018-04-02 20:51:27 +09:00
Ian Barwick
50321bb95d Log pg_control access errors as WARNINGs rather than DEBUG
This will make it easier to diagnose issues, such as those caused by an
incorrect "data_directory" setting in "repmgr.conf".
2018-04-02 09:28:56 +09:00
Ian Barwick
253c215c12 Add TODO list
This file will collate various requests and ideas for future development.
In particular, it will reference requests which come in via the GitHub issue
tracker, so we can acknowledge and close the request rather than leaving an
unresolved issue open.
2018-03-30 14:24:36 +09:00
Ian Barwick
22c40ae62d doc: update HISTORY and release notes 2018-03-30 09:41:48 +09:00
Ian Barwick
239a548e9d "standby switchover": force checkpoint if pg_rewind requested.
Addresses issue described in GitHub #378.

PostgreSQL itself doesn't issue a checkpoint after promotion, so that the
newly promoted server is available as quickly as possible; we therefore
only execute an explicit CHECKPOINT when it's actually required, i.e.
when pg_rewind will be executed. This is necessary because pg_rewind compares
the timeline reported in the pg_control file with that of the server to be
rewound, and the pg_control timeline is only updated after the first
checkpoint, so there is an interval where pg_rewind will erroneously
assume both servers are on the same timeline and take no action.
2018-03-29 23:55:08 +09:00
Ian Barwick
231ef5563e "standby switchover": update hint 2018-03-29 23:41:59 +09:00
Ian Barwick
e1413fa8ea Fix minimum accepted value for "degraded_monitoring_timeout"
Should be -1, the default.

Addresses GitHub #411.
2018-03-29 21:15:03 +09:00
Ian Barwick
7111483b65 repmgr: move demoted primary check to the final step during switchover
This will give the demoted primary more time to start up as a standby,
during which "standby follow" can be executed on sibling nodes, if
specified.
2018-03-27 16:44:15 +09:00
Ian Barwick
1558497ae4 repmgr: poll demoted primary after restart during switchover
During a switchover operation, once the demoted primary has been restarted
as a standby, repmgr attempts to reconnect to verify its status and drop
any redundant replication slots. However, the standby may still be in the
startup phase, so poll for up to "standby_reconnect_timeout" seconds
before giving up.

Addresses GitHub #408.
2018-03-27 16:44:10 +09:00
Ian Barwick
9c5e76401f Fix "repmgr cluster crosscheck" output
Addresses GitHub #398.
2018-03-27 16:44:04 +09:00
Ian Barwick
a403da67bc Consolidate connection closure calls 2018-03-27 16:43:59 +09:00
Ian Barwick
71b13f5307 doc: add note about remote command execution
When executing a command on a remote server, repmgr expects the remote binary
to be in the same location as the local binary. It's reasonable to assume
repmgr will be deployed in a unified environment; if not, the onus is on the
user to ensure repmgr can find the remote binary, e.g. by creating appropriate
symlinks.

Addresses query in GitHub #406.
2018-03-27 16:43:55 +09:00
Ian Barwick
1c5561d114 Misc tweaks to witness code 2018-03-26 20:59:29 +09:00
Ian Barwick
c0b607ef41 doc: update list of event notifications 2018-03-23 10:40:39 +08:00
Ian Barwick
462fdca4b4 Tidy up queries in dbutils.c
- standardize formatting
- prefix various internal function calls with "pg_catalog.", to
  mitigate possible risks from CVE-2018-1058
2018-03-23 10:28:28 +08:00
Ian Barwick
0e55a60660 Add event "repmgrd_failover_aborted" 2018-03-21 13:23:06 +09:00
Ian Barwick
93deab3e96 Add error code ERR_FOLLOW_FAIL 2018-03-21 13:11:30 +09:00
Ian Barwick
81c69e3677 repmgrd: fix typo 2018-03-21 12:36:15 +09:00
Ian Barwick
0219f4c91f Always set "connect_timeout" when pinging a PostgreSQL instance
Insert "connect_timeout=2" into the connection parameters, if not
explicitly set by the user. This prevents excessively long waits
for the host operating system to report a connection timeout.
2018-03-21 11:48:57 +09:00
Ian Barwick
85a4adc99c Update HISTORY 2018-03-21 06:48:32 +09:00
Martín Marqués
208d7d418e While reviewing 7cb6e5af8d before merging
I noticed that, in addition to the result cleanup already added, one spot
inside the if condition was still missing a PQclear() call.

Add the missing PQclear.
2018-03-13 11:43:36 -03:00
Martín Marqués
7cb6e5af8d Merge pull request #403 from AndrzejNowicki/master
Clear node list to avoid memory leak on witness
2018-03-13 11:41:10 -03:00
Andrzej Nowicki
d2a2df13d5 One more memory leak fixed 2018-03-13 11:23:33 +01:00
Andrzej Nowicki
358e001218 Clear node list to avoid memory leak, fixes #402 2018-03-13 11:05:24 +01:00
Ian Barwick
d7702b3444 Correctly handle error message pointer when parsing strings.
When parsing conninfo strings, ensure the error message pointer is
actually returned to the caller.

Not a critical issue; it just meant the contents of the error message
were not being displayed.
2018-03-10 14:29:12 +09:00
Ian Barwick
a8286030c0 doc: update "repmgr primary unregister" description
As noted by GitHub user yonj1e in GitHub #396.
2018-03-08 19:11:41 +09:00
Ian Barwick
ff0ba3e19a doc: update FAQ
Additional clarification for "repmgr standby clone --recovery-conf-only"
2018-03-08 19:11:33 +09:00
Ian Barwick
6f5cce7e6f doc: update FAQ
Add entry about upgrading PostgreSQL
2018-03-08 19:11:21 +09:00
Ian Barwick
509f7a8255 Fix parsing of -k/--keep-history option
GitHub #394.
2018-03-07 19:22:04 +09:00
Ian Barwick
e8cdf72ecd Add 4.0.4 release notes 2018-03-07 19:21:49 +09:00
Ian Barwick
2a99dfa15b repmgrd: fix failover handling in "manual" mode
The regression was introduced in commit c7a585c555.
2018-03-07 19:21:40 +09:00
Ian Barwick
bad034f7ee repmgrd: remove duplicate local record check in BDR mode 2018-03-07 19:21:33 +09:00
Ian Barwick
cdb504d700 Add event "repmgrd_shutdown"
Implements GitHub #393
2018-03-06 11:00:03 +09:00
Ian Barwick
0af2077bed repmgrd: add debug log output for "monitor_interval_secs" sleep in all modes 2018-03-06 10:56:21 +09:00
Emre Hasegeli
dea87b7285 Add witness options to the main help
GitHub #392
2018-03-06 10:55:06 +09:00
Martín Marqués
d6b13f3428 Merge pull request #391 from hasegeli/helpmissing
Add missing options to the main help
2018-03-02 15:36:53 -03:00
Emre Hasegeli
5808d8190e Add missing options to the main help 2018-03-02 17:08:50 +01:00
Ian Barwick
d2a5cc23cc "standby clone": improve replication user selection
Use the upstream node's replication user when checking the replication
connection.
2018-03-02 16:43:23 +09:00
Ian Barwick
9981ede1af "standby clone": fix --superuser handling
get_superuser_connection() was erroneously using the local node record
to connect as a superuser, which works when registering the primary
but obviously not when cloning a standby.

Addresses GitHub #380.
2018-03-02 16:43:19 +09:00
Ian Barwick
40ccae57a3 Update HISTORY 2018-03-02 11:05:30 +09:00
Ian Barwick
3c2b8e5792 "standby clone": remove restriction on replication slots in Barman mode
While it's preferable to avoid standby replication slots if Barman is in
use, there's no technical reason to prevent this.

Implements GitHub #379.
2018-03-02 11:05:25 +09:00
Ian Barwick
354231284e repmgr: escape "restore_command" in generated recovery.conf 2018-03-02 11:05:21 +09:00
Ian Barwick
dbbfcb6a63 "standby clone": fix primary_conninfo when --upstream-conninfo provided 2018-03-02 11:05:15 +09:00
Ian Barwick
bc766a48ed repmgrd: retry standby connection after cascading standby failover 2018-03-02 11:05:07 +09:00