1959 Commits

Author SHA1 Message Date
Ian Barwick 938692c169 doc: fix option description for "repmgr primary register" 2018-04-03 10:09:24 +09:00
Ian Barwick ad24b04c35 Refactor pg_control parsing
The "data_checksum_version" field is located towards the end of the ControlFileData struct,
meaning its position varies between versions. Previously this wasn't a problem
as it was only required for operations involving 9.5 and later, and its position
within the control file has not changed between the current release and current
HEAD.

However, in order to support pg_rewind in 9.3 and 9.4, which both have changes in
the control file format, we'll need version-specific parsing. This will also make
it easier to deal with any future changes to the control file format.
2018-04-02 20:54:42 +09:00
Ian Barwick 3ccf1cf182 Enable pg_rewind to be used with PostgreSQL 9.3/9.4
pg_rewind is not part of the core distribution for those, but we
provided support in repmgr 3.3 so should extend it to repmgr 4.

Note that there is no check in place whether the pg_rewind binary
exists, so it's up to the user to ensure it's present.

Addresses GitHub #413.
2018-04-02 20:54:29 +09:00
Ian Barwick 5e4bdb5a1b repmgrd: handle failover with two nodes in the primary location
If two nodes were in the primary location, and at least one node in
another location, the non-failed node in the primary location was not
recognising itself as a promotion candidate.

Addresses GitHub #407.
2018-04-02 20:51:27 +09:00
Ian Barwick 50321bb95d Log pg_control access errors as WARNINGs rather than DEBUG
This will make it easier to diagnose issues, possibly with an incorrect
"data_directory" setting in "repmgr.conf".
2018-04-02 09:28:56 +09:00
Ian Barwick 253c215c12 Add TODO list
This file will collate various requests and ideas for future development.
In particular it will reference requests which come in via the GitHub issue
tracker, so we can acknowledge and close off the request and not have an
open unresolved issue hanging around.
2018-03-30 14:24:36 +09:00
Ian Barwick 22c40ae62d doc: update HISTORY and release notes 2018-03-30 09:41:48 +09:00
Ian Barwick 239a548e9d "standby switchover": force checkpoint if pg_rewind requested.
Addresses issue described in GitHub #378.

PostgreSQL itself doesn't issue a checkpoint after promotion to ensure
the newly promoted server is available as quickly as possible, so we'll
only execute an explicit CHECKPOINT when it's actually required, i.e.
when pg_rewind will be executed. This is required as pg_rewind uses
the timeline reported in the pg_control file to compare with the
server to be rewound, and the pg_control timeline is only updated after
the first checkpoint, so there is an interval where pg_rewind will
erroneously assume both servers are on the same timeline and take no action.
2018-03-29 23:55:08 +09:00
Ian Barwick 231ef5563e "standby switchover": update hint 2018-03-29 23:41:59 +09:00
Ian Barwick e1413fa8ea Fix minimum accepted value for "degraded_monitoring_timeout"
Should be -1, the default.

Addresses GitHub #411.
2018-03-29 21:15:03 +09:00
Ian Barwick 7111483b65 repmgr: move demoted primary check to the final step during switchover
This will give the demoted primary more time to start up as a standby,
during which "standby follow" can be executed on sibling nodes, if
specified.
2018-03-27 16:44:15 +09:00
Ian Barwick 1558497ae4 repmgr: poll demoted primary after restart during switchover
During a switchover operation, once the demoted primary has been restarted
as a standby, repmgr attempts to reconnect to verify its status and drop
any redundant replication slots. However it's possible the standby may still
be in the startup phase, so poll for "standby_reconnect_timeout" seconds
before giving up.

Addresses GitHub #408.
2018-03-27 16:44:10 +09:00
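
The polling behaviour described in the commit above can be sketched roughly as follows. This is a minimal illustration, not repmgr's actual code: the names poll_until_available, probe_fn and probe_countdown are hypothetical, and a real probe would attempt a database connection rather than count down. The timeout argument stands in for "standby_reconnect_timeout".

```c
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical callback type: returns true once the node accepts connections. */
typedef bool (*probe_fn)(void *arg);

/*
 * Probe once per second until the probe succeeds or timeout_secs have
 * elapsed. The probe is always tried at least once, so a timeout of 0
 * means "check once, do not wait".
 */
bool
poll_until_available(probe_fn probe, void *arg, int timeout_secs)
{
    for (int elapsed = 0;; elapsed++)
    {
        if (probe(arg))
            return true;

        if (elapsed >= timeout_secs)
            return false;

        sleep(1);
    }
}

/*
 * Demonstration probe (assumption for this sketch only): fails until the
 * counter passed in arg reaches zero, imitating a standby that is still
 * in its startup phase for the first few attempts.
 */
bool
probe_countdown(void *arg)
{
    int *remaining = arg;

    if (*remaining <= 0)
        return true;

    (*remaining)--;
    return false;
}
```

Passing the probe as a callback keeps the retry loop independent of how availability is actually checked, so the same loop could serve both the switchover case here and the "follow_command" case.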
Ian Barwick 9c5e76401f Fix "repmgr cluster crosscheck" output
Addresses GitHub #398.
2018-03-27 16:44:04 +09:00
Ian Barwick a403da67bc Consolidate connection closure calls 2018-03-27 16:43:59 +09:00
Ian Barwick 71b13f5307 doc: add note about remote command execution
When executing a command on a remote server, repmgr expects the remote binary
to be in the same location as the local binary. It's reasonable to assume
repmgr will be deployed in a unified environment; if not, the onus is on the
user to ensure repmgr can find the remote binary, e.g. by creating appropriate
symlinks.

Addresses query in GitHub #406.
2018-03-27 16:43:55 +09:00
Ian Barwick 1c5561d114 Misc tweaks to witness code 2018-03-26 20:59:29 +09:00
Ian Barwick c0b607ef41 doc: update list of event notifications 2018-03-23 10:40:39 +08:00
Ian Barwick 462fdca4b4 Tidy up queries in dbutils.c
- standardize formatting
- prefix various internal function calls with "pg_catalog.", to
  mitigate possible risks from CVE-2018-1058
2018-03-23 10:28:28 +08:00
Ian Barwick 0e55a60660 Add event "repmgrd_failover_aborted" 2018-03-21 13:23:06 +09:00
Ian Barwick 93deab3e96 Add error code ERR_FOLLOW_FAIL 2018-03-21 13:11:30 +09:00
Ian Barwick 81c69e3677 repmgrd: fix typo 2018-03-21 12:36:15 +09:00
Ian Barwick 0219f4c91f Always set "connect_timeout" when pinging a PostgreSQL instance
Insert "connect_timeout=2" into the connection parameters, if not
explicitly set by the user. This will prevent excessive wait time
for the host operating system to report a connection timeout.
2018-03-21 11:48:57 +09:00
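
The idea in the commit above, adding "connect_timeout=2" only when the user has not set a value, might look something like the sketch below. The function names are hypothetical and the string scan is deliberately naive (it does not handle quoted values or connection URIs); production code would more likely work on parsed parameters, for example via libpq's PQconninfoParse().

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: true if the keyword/value conninfo string already
 * sets connect_timeout. Spaces before '=' are tolerated, as libpq permits.
 */
bool
has_connect_timeout(const char *conninfo)
{
    const char *key = "connect_timeout";
    size_t      keylen = strlen(key);
    const char *p = conninfo;

    while ((p = strstr(p, key)) != NULL)
    {
        /* Only accept a match that begins a keyword, not the tail of another word */
        bool        at_start = (p == conninfo) || (p[-1] == ' ');
        const char *q = p + keylen;

        while (*q == ' ')
            q++;

        if (at_start && *q == '=')
            return true;

        p += keylen;
    }

    return false;
}

/*
 * Copy conninfo into out, appending the default timeout when none is set.
 * Returns false if the result does not fit in out.
 */
bool
add_default_connect_timeout(const char *conninfo, char *out, size_t outlen)
{
    int n;

    if (has_connect_timeout(conninfo))
        n = snprintf(out, outlen, "%s", conninfo);
    else if (conninfo[0] == '\0')
        n = snprintf(out, outlen, "connect_timeout=2");
    else
        n = snprintf(out, outlen, "%s connect_timeout=2", conninfo);

    return n >= 0 && (size_t) n < outlen;
}
```

For example, "host=node1 dbname=repmgr" becomes "host=node1 dbname=repmgr connect_timeout=2", while "host=node1 connect_timeout=10" is left unchanged.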
Ian Barwick 85a4adc99c Update HISTORY 2018-03-21 06:48:32 +09:00
Martín Marqués 208d7d418e While reviewing 7cb6e5af8d before merging
I noticed that besides the result cleanup added, there was still a missing
spot inside the if condition.

Adding the PQclear that was missing.
2018-03-13 11:43:36 -03:00
Martín Marqués 7cb6e5af8d Merge pull request #403 from AndrzejNowicki/master
Clear node list to avoid memory leak on witness
2018-03-13 11:41:10 -03:00
Andrzej Nowicki d2a2df13d5 One more memory leak fixed 2018-03-13 11:23:33 +01:00
Andrzej Nowicki 358e001218 Clear node list to avoid memory leak, fixes #402 2018-03-13 11:05:24 +01:00
Ian Barwick d7702b3444 Correctly handle error message pointer when parsing strings.
When parsing conninfo strings, ensure the error message pointer is
actually returned to the caller.

Not a critical issue, just meant the contents of the error message
were not being displayed.
2018-03-10 14:29:12 +09:00
Ian Barwick a8286030c0 doc: update "repmgr primary unregister" description
As noted by GitHub user yonj1e in GitHub #396.
2018-03-08 19:11:41 +09:00
Ian Barwick ff0ba3e19a doc: update FAQ
Additional clarification for "repmgr standby clone --recovery-conf-only"
2018-03-08 19:11:33 +09:00
Ian Barwick 6f5cce7e6f doc: update FAQ
Add entry about upgrading PostgreSQL
2018-03-08 19:11:21 +09:00
Ian Barwick 509f7a8255 Fix parsing of -k/--keep-history option
GitHub #394.
2018-03-07 19:22:04 +09:00
Ian Barwick e8cdf72ecd Add 4.0.4 release notes 2018-03-07 19:21:49 +09:00
Ian Barwick 2a99dfa15b repmgrd: fix failover handling in "manual" mode
Regression was introduced in commit c7a585c555
2018-03-07 19:21:40 +09:00
Ian Barwick bad034f7ee repmgrd: remove duplicate local record check in BDR mode 2018-03-07 19:21:33 +09:00
Ian Barwick cdb504d700 Add event "repmgrd_shutdown"
Implements GitHub #393
2018-03-06 11:00:03 +09:00
Ian Barwick 0af2077bed repmgrd: add debug log output for "monitor_interval_secs" sleep in all modes 2018-03-06 10:56:21 +09:00
Emre Hasegeli dea87b7285 Add witness options to the main help
GitHub #392
2018-03-06 10:55:06 +09:00
Martín Marqués d6b13f3428 Merge pull request #391 from hasegeli/helpmissing
Add missing options to the main help
2018-03-02 15:36:53 -03:00
Emre Hasegeli 5808d8190e Add missing options to the main help 2018-03-02 17:08:50 +01:00
Ian Barwick d2a5cc23cc "standby clone": improve replication user selection
Use the upstream node's replication user when checking the replication
connection.
2018-03-02 16:43:23 +09:00
Ian Barwick 9981ede1af "standby clone": fix --superuser handling
get_superuser_connection() was erroneously using the local node record
to connect to as a superuser, which works when registering the primary
but obviously not when cloning a standby.

Addresses GitHub #380.
2018-03-02 16:43:19 +09:00
Ian Barwick 40ccae57a3 Update HISTORY 2018-03-02 11:05:30 +09:00
Ian Barwick 3c2b8e5792 "standby clone": remove restriction on replication slots in Barman mode
While it's preferable to avoid standby replication slots if Barman is in
use, there's no technical reason to prevent this.

Implements GitHub #379.
2018-03-02 11:05:25 +09:00
Ian Barwick 354231284e repmgr: escape "restore_command" in generated recovery.conf 2018-03-02 11:05:21 +09:00
Ian Barwick dbbfcb6a63 "standby clone": fix primary_conninfo when --upstream-conninfo provided 2018-03-02 11:05:15 +09:00
Ian Barwick bc766a48ed repmgrd: retry standby connection after cascading standby failover 2018-03-02 11:05:07 +09:00
Ian Barwick 55441f2729 repmgrd: add configuration file parameter "standby_reconnect_timeout"
This is used for determining a timeout when reconnecting to the standby
after executing the "follow_command". This will normally not need to be
set explicitly, but may be useful in cases where the standby's startup
phase can last longer than usual.
2018-03-02 11:04:56 +09:00
Ian Barwick e38a9ec7e1 repmgrd: fix main monitoring loop for witness server
Missing "break" was breaking it when following a new primary.
2018-03-02 11:04:22 +09:00
Ian Barwick c1356b9e0d repmgrd: retry standby connection after "follow_command" executed
It's possible that the standby is still starting up after the "follow_command"
completes, so poll for a while until we get a connection.
2018-03-02 11:04:19 +09:00