Commit Graph

782 Commits

Author SHA1 Message Date
Ian Barwick 6a1797cadd Enable pg_rewind to be used with PostgreSQL 9.3/9.4
pg_rewind is not part of the core distribution for those, but we
provided support in repmgr 3.3 so should extend it to repmgr 4.

Note that there is no check in place whether the pg_rewind binary
exists, so it's up to the user to ensure it's present.

Addresses GitHub #413.
2018-04-02 20:55:04 +09:00
Ian Barwick 94d26dbe9f Always set "connect_timeout" when pinging a PostgreSQL instance
Insert "connect_timeout=2" into the connection parameters, if not
explicitly set by the user. This will prevent excessive wait time
for the host operating system to report a connection timeout.
2018-04-02 09:31:42 +09:00
Ian Barwick ae655eb4fd Add TODO list
This file will collate various requests and ideas for future developement.
In particular it will reference requests which come in via the GitHub issue
tracker, so we can acknowledge and close off the request and not have an
open unresolved issue hanging around.
2018-03-30 14:18:51 +09:00
Ian Barwick 65371489c6 repmgrd: handle failover with two nodes in the primary location
If two nodes were in the primary location, and at least one node in
another location, the non-failed node in the primary location was not
recognising itself as a promotion candidate.

Addresses GitHub #407.
2018-03-30 12:17:34 +09:00
Ian Barwick 28c7737dc0 Log pg_control access errors as WARNINGs rather than DEBUG
This will make it easier to diagnose issues, possibly with an incorrect
"data_directory" setting in "repmgr.conf".
2018-03-30 11:24:44 +09:00
Ian Barwick 505d72d19c "standby switchover": force checkpoint if pg_rewind requested.
Addresses issue described in GitHub #378.

PostgreSQL itself doesn't issue a checkpoint after promotion to ensure
the newly promoted server is available as quickly as possible, so we'll
only execute an explicit CHECKPOINT when it's actually required, i.e.
when pg_rewind will be executed. This is required as pg_rewind uses
the timeline reported in the pg_control file to compare with the
server to be rewound, and the pg_control timeline is only updated after
the first checkpoint, so there is an interval where pg_rewind will
erroneously assume both servers are on the timeline and take no action.
2018-03-30 09:12:25 +09:00
Ian Barwick b292ac61f8 "standby switchover": update hint 2018-03-30 09:12:21 +09:00
Ian Barwick 293d66bf71 Fix minimum accepted value for "degraded_monitoring_timeout"
Should be -1, the default.

Addresses GitHub #411.
2018-03-30 09:12:17 +09:00
Ian Barwick 3e1f0ec168 repmgr: move demoted primary check to the final step during switchover
This will give the demoted primary more time to start up as a standby,
during which "standby follow" can be executed on sibling nodes, if
specified.
2018-03-27 16:41:13 +09:00
Ian Barwick 6f9a1f975e repmgr: poll demoted primary after restart during switchover
During a switchover operation, once the demoted primary has been restarted
as a standby, repmgr attempts to reconnect to verify its status and drop
any redundant replication slots. However it's possible the standby may still
be in the startup phase, so poll for "standby_reconnect_timeout" seconds
before giving up.

Addresses GitHub #408.
2018-03-27 15:58:18 +09:00
Ian Barwick deea4f69f7 Fix "repmgr cluster crosscheck" output
Addresses GitHub #398.
2018-03-27 10:28:27 +09:00
Ian Barwick 37e53108a2 Consolidate connection closure calls 2018-03-27 08:52:23 +09:00
Ian Barwick 96cf06204c doc: add note about remote command execution
When executing a command on a remote server, repmgr expects the remote binary
to be in the same location as the local binary. It's reasonable to assume
repmgr will be deployed in a unified environment; if not, the onus is on the
user to ensure repmgr can find the remote binary, e.g. by creating appropriate
symlinks.

Addresses query in GitHub #406.
2018-03-27 08:47:56 +09:00
Ian Barwick 381e22c2c7 Misc tweaks to witness code 2018-03-26 20:59:38 +09:00
Ian Barwick 7e2af17783 repmgrd: tweak log notices when marking a standby as failed
Announce what we're going to do (set the node record inactive) *before*
performing the action. Makes reading the log slightly easier.
2018-03-23 13:27:37 +08:00
Ian Barwick b4272853e7 Add event "repmgrd_failover_aborted" 2018-03-23 10:44:00 +08:00
Ian Barwick 562b6ddfc2 Add error code ERR_FOLLOW_FAIL 2018-03-23 10:34:19 +08:00
Ian Barwick a15e5c9d52 Tidy up queries in dbutils.c
- standardize formatting
- prefix various internal function calls with "pg_catalog.", to
  mitigate possible risks from CVE-2018-1058
2018-03-23 10:33:28 +08:00
Ian Barwick d9cc09cee4 repmgrd: fix typo 2018-03-21 12:36:51 +09:00
Ian Barwick c4f6abe951 Update HISTORY 2018-03-21 06:51:56 +09:00
Martín Marqués e454fb77d3 While reviewing 7cb6e5af8d before merging
I noticed that besides the result cleanup added, there was still a missing
spot inside the if condition.

Adding the PQclear that was missing.
2018-03-21 06:51:50 +09:00
Andrzej Nowicki b76e5852d3 One more memory leak fixed 2018-03-21 06:51:43 +09:00
Andrzej Nowicki 0674364ffd Clear node list to avoid memory leak, fixes #402 2018-03-21 06:51:37 +09:00
Ian Barwick b2eb9b8525 Correctly handle error message pointer when parsing strings.
When parsing conninfo strings, ensure the error message pointer is
actually returned to the caller.

Not a criticial issue, just meant the contents of the error message
were not being displayed.
2018-03-10 14:28:10 +09:00
Ian Barwick 71c5d10a8c doc: update 4.0.4 release date 2018-03-09 20:07:16 +09:00
Ian Barwick 1476b21cd4 doc: update release notes
Add note about requiring 4.0.3 or later on all nodes when performing
a switchover from a noder running 4.0.3 or later.

Per report in GitHub #388.
2018-03-09 09:46:58 +09:00
Ian Barwick b17993abdb doc: update "repmgr primary unregister" description
As noted by GitHub user yonj1e in GitHub #396.
v4.0.4
2018-03-08 15:01:25 +09:00
Ian Barwick 8f68344f9a doc: update FAQ
Additional clarification for "repmgr standby clone --recovery-conf-only"
2018-03-08 10:04:30 +09:00
Ian Barwick 125ac6c297 doc: update FAQ
Add entry about upgrading PostgreSQL
2018-03-08 10:04:30 +09:00
Ian Barwick 955860923f Fix parsing of -k/--keep-history option
GitHub #394.
2018-03-07 19:14:18 +09:00
Ian Barwick 50626f90cc Add 4.0.4 release notes 2018-03-07 14:17:04 +09:00
Ian Barwick 9aea5b8aa7 repmgrd: fix failover handling in "manual" mode
Regression was introduced in commit c7a585c555
2018-03-06 22:35:51 +09:00
Ian Barwick ed1bcb159e repmgrd: remove duplicate local record check in BDR mode 2018-03-06 12:31:07 +09:00
Ian Barwick 9c72c0d66e Add event "repmgrd_shutdown"
Implements GitHub #393
2018-03-06 10:59:54 +09:00
Emre Hasegeli 0ddc226c2a Add witness options to the main help
GitHub #392
2018-03-06 10:57:33 +09:00
Ian Barwick 93830cad61 Fix directory creation when cloning from Barman 2018-03-05 19:31:53 +09:00
Ian Barwick bca1660d5e Improve repmgrd logging in BDR mode
Also ensure interval status log line is shown as intended
2018-03-05 15:05:40 +09:00
Ian Barwick 5a52917421 repmgrd: add debug log output for "monitor_interval_secs" sleep in all modes 2018-03-05 14:23:58 +09:00
Emre Hasegeli 70752d7d4a Add missing options to the main help 2018-03-05 09:52:04 +09:00
Ian Barwick c29d1efc37 "standby clone": improve replication user selection
Use the upstream node's replication user when checking the replication
connection.
2018-03-02 16:21:32 +09:00
Ian Barwick 6fbbe2a97a "standby clone": fix --superuser handling
get_superuser_connection() was erroneously using the local node record
to connect to as a superuser, which works when registering the primary
but obviously not when cloning a standby.

Addresses GitHub #380.
2018-03-02 14:49:17 +09:00
Ian Barwick ce42d6827e Update HISTORY 2018-03-01 15:51:09 +09:00
Ian Barwick 98384559a6 "standby clone": remove restriction on replication slots in Barman mode
While it's preferable to avoid standby replication slots if Barman is in
use, there's no technical reason to prevent this.

Implements GitHub #379.
2018-03-01 15:47:28 +09:00
Ian Barwick 4a1477343b repmgr: escape "restore_command" in generated recovery.conf 2018-03-01 10:39:04 +09:00
Ian Barwick d2b9d20393 "standy clone": fix primary_conninfo when --upstream-conninfo provided 2018-03-01 09:18:40 +09:00
Ian Barwick fe594c95ad repmgrd: retry standby connection after cascading standby failover 2018-02-28 21:15:11 +09:00
Ian Barwick 60e63feaca repmgrd: add configuration file parameter "standby_reconnect_timeout"
This is used for determining a timeout when reconnecting to the standby
after executing the "follow_command". This will normally not need to be
set explicitly, but maybe useful in cases where the standby's startup
phase can last longer than usual.
2018-02-28 18:56:33 +09:00
Ian Barwick ae4d0f2622 repmgrd: fix main monitoring loop for witness server
Missing "break" was breaking it when following a new primary.
2018-02-28 16:30:14 +09:00
Ian Barwick 5e8b41e221 repmgrd: retry standby connection after "follow_command" executed
It's possible that the standby is still starting up after the "follow_command"
completes, so poll for a while until we get a connection.
2018-02-28 15:35:47 +09:00
Ian Barwick c7a585c555 repmgrd: improve log output
- emit explicit startup NOTICE
- emit NOTICE when falling back to degraded monitoring on a primary node
- improve log message and event notification details when monitoring
  a former primary which has been reconnected as a standby
2018-02-28 12:35:13 +09:00