Commit Graph

736 Commits

Author SHA1 Message Date
Ian Barwick
60e63feaca repmgrd: add configuration file parameter "standby_reconnect_timeout"
This is used for determining a timeout when reconnecting to the standby
after executing the "follow_command". This will normally not need to be
set explicitly, but maybe useful in cases where the standby's startup
phase can last longer than usual.
2018-02-28 18:56:33 +09:00
Ian Barwick
ae4d0f2622 repmgrd: fix main monitoring loop for witness server
Missing "break" was breaking it when following a new primary.
2018-02-28 16:30:14 +09:00
Ian Barwick
5e8b41e221 repmgrd: retry standby connection after "follow_command" executed
It's possible that the standby is still starting up after the "follow_command"
completes, so poll for a while until we get a connection.
2018-02-28 15:35:47 +09:00
Ian Barwick
c7a585c555 repmgrd: improve log output
- emit explicit startup NOTICE
- emit NOTICE when falling back to degraded monitoring on a primary node
- improve log message and event notification details when monitoring
  a former primary which has been reconnected as a standby
2018-02-28 12:35:13 +09:00
Ian Barwick
a27dd8c49c doc: document "primary_follow_timeout" configuration file parameter. 2018-02-27 10:09:40 +09:00
Ian Barwick
9365bf3474 "standby promote": make timeout values configurable
This introduces following new configuration file parameters, which
were previously hard-coded values:

 - promote_check_timeout
 - promote_check_interval

Implements GitHub #387.
2018-02-27 10:04:58 +09:00
Ian Barwick
e8ae0831fe doc: add <options> section for various commands 2018-02-26 16:54:54 +09:00
Ian Barwick
518866eba5 "node status": improve replication slot warnings
Addresses GitHub #385
2018-02-23 11:06:47 +09:00
Ian Barwick
ed0330c334 "standby clone": document --recovery-conf-only option 2018-02-23 10:54:42 +09:00
Ian Barwick
1f021dc9fa "standby clone --recovery-conf-only": display generated file with --dry-run
Refactor the original code which generates "recovery.conf" to place the
output into a buffer, which can either be output as "recovery.conf"
or copied to a buffer specified by the caller.
2018-02-23 10:16:47 +09:00
Ian Barwick
425839d764 Fix typo in function name 2018-02-22 15:48:41 +09:00
Ian Barwick
3a764f678a "standby clone": add --recovery-conf-only option
This will generate "recovery.conf" for an existing standby.

Typical use-case is a standby cloned manually from an external data
source (e.g. Barman), where "recovery.conf" needs to be created
(and if required a replication slot).

The --dry-run option will check the pre-requisites but not actually
create "recovery.conf" or a replication slot.

This requires that the upstream node is running, a replication connection
can be made and if required a replication slot can be created.

Implements GitHub #382.
2018-02-22 15:47:19 +09:00
Ian Barwick
829cf5cca4 repmgrd: improve detection of status change from primary to standby
If repmgrd is running in degraded mode on a primary which has been stopped,
then manually been brought back online as a standby (e.g. by creating
recovery.conf and starting the server), ensure it not only detects the
change but automatically updates the node record so it can resume
monitoring the node as a standby.

Previously, repmgrd was looping waiting for the record to be updated
(as is done transparently when executing "repmgr node rejoin") but
if the record was not updated within the timeout period (e.g. by
"repmgr standby register) it would fail to resume monitoring as a
standby.

It seems reasonable to have repmgrd automatically update the node record,
as this will restore failover capability as quickly as possible. If this
is not desired, then the onus is on the user to shut down repmgrd while
making the desired changes.
2018-02-22 11:35:47 +09:00
Ian Barwick
14420d83fa "node rejoin": ensure --dry-run is honoured
Addresses GitHub #383.
2018-02-20 15:28:39 +09:00
Ian Barwick
a80e22f0ed Bump version
4.0.4
2018-02-16 12:19:31 +09:00
Ian Barwick
832993bfbc doc: update 4.0.3 release notes 2018-02-16 12:15:10 +09:00
Ian Barwick
f1ea5e62df doc: update release notes 2018-02-15 14:42:29 +09:00
Ian Barwick
b47448d0e5 Replace remaining instances of strcpy() with strncpy()
Also use strncmp() to match.
2018-02-15 13:17:06 +09:00
Ian Barwick
a8232337d8 Catch various corner cases when restarting a PostgreSQL instance v4.0.3 2018-02-14 11:28:38 +09:00
Ian Barwick
c9eb1bfcc0 Always initialise t_conninfo_param_list structures 2018-02-13 10:48:18 +09:00
Ian Barwick
db552dfbc7 Bump version
4.0.3
2018-02-12 15:03:29 +09:00
Ian Barwick
9732f78565 repmgrd: check "repmgr" extension is installed before starting
Implements GitHub #361.
2018-02-12 11:31:59 +09:00
Ian Barwick
eb7dca2919 "node status": add warning about missing replication slots
Implements GitHub #364.
2018-02-12 10:53:31 +09:00
Ian Barwick
c113102926 Update repmgr.conf.sample
Add missing parameter "monitor_interval_secs"
2018-02-12 09:35:57 +09:00
Ian Barwick
ed6a167915 Execute a CHECKPOINT immediately after promoting the server
This ensures "pg_control" is updated with the latest timeline, mainly
to ensure that if "pg_rewind" is executed as part of a switchover
that it sees the latest timeline.

Per suggestion from GitHub user "superflav" in GitHub #378.

See also:

  https://www.postgresql.org/message-id/flat/20150428180253.GU30322%40tamriel.snowman.net
2018-02-09 12:09:16 +09:00
Ian Barwick
fbbe7afd61 doc: update HISTORY and release notes 2018-02-09 11:42:16 +09:00
Ian Barwick
ae1fc93e48 Ensure correct server version number used for replication stats query 2018-02-09 11:06:15 +09:00
Ian Barwick
7b4ee80af2 "standby switchover": check demotion candidate can make replication connection
Check it's actually possible for the demotion candidate to attach to
the promotion candidate before executing the switchover.

As with other checks of this nature, there's a faint possibility the
situation could change between the time the check is carried out and
the demotion candidate is restarted to connect to the promotion candidate,
but there's not a lot we can do about that. The main purpose is to
be able to catch existing misconfigurations before anything gets changed.

Implements GitHub #370.
2018-02-09 10:01:29 +09:00
Ian Barwick
0b8755e278 "witness register": fix primary node check
Addresses GitHub #377, based on report by user yonj1e in #373.
2018-02-08 16:28:50 +09:00
Ian Barwick
d3e1937808 "standby switchover": additional sanity checks
Check that sufficient walsenders will be available on the promotion
candidate, and if replication slots are in use check if enough of
those will be available.

Note these checks can't guarantee that the walsenders/slots will
be available at the appropriate points during the switchover process,
but do ensure that existing configuration problems will be caught.

Implements GitHub #371.
2018-02-08 15:23:10 +09:00
Ian Barwick
871d6fdee3 "standby clone": cowardly refuse to clone into an active data directory
By checking the PID file in the same way pg_ctl does, we can be pretty
much certain whether the target data directory contains an active
PostgreSQL instance.
2018-02-08 11:43:24 +09:00
Ian Barwick
c7dfe9e040 Fix "standby clone" in Barman mode with --no-upstream-connection
"--upstream-node-id", if provided, was not being passed through to
the SQL query executed via the Barman server.

Also modified the query to select the primary node if "--upstream-node-id"
is not provided.

Note: this is a very niche use case.
2018-02-07 16:36:44 +09:00
Ian Barwick
5c92a9e057 repmgr: simplify data directory checks when cloning
Attempting to use the contents of pg_control to tell whether the directory
is in use by PostgreSQL can result in false positives; we should use
a check based on the pidfile.

Also change the HINT to indicate a data directory can be overwritten
if -F/--force is provided.
2018-02-07 14:37:57 +09:00
Ian Barwick
aa5f025738 "standby clone": ensure "pg_subtrans" directory is created in Barman mode 2018-02-07 10:56:18 +09:00
Ian Barwick
5b91a2d409 Update HISTORY and release notes 2018-02-07 09:55:36 +09:00
Ian Barwick
596a19ee37 Move parse_output_to_argv() to configfile.c
So it can be used by parse_pg_basebackup_options().

Addresses GitHub #376.
2018-02-07 09:43:06 +09:00
Ian Barwick
23ff83b3b4 Fix typo in HINT 2018-02-07 08:55:51 +09:00
Ian Barwick
ba1f6bee0d doc: fix GitHub reference in release notes 2018-02-07 08:53:23 +09:00
Ian Barwick
da9c8f2491 Update HISTORY and release notes 2018-02-06 10:38:13 +09:00
Ian Barwick
64035ef701 "standby register/follow": provide primary node details for event notifications
For events generated by these commands, it may be useful to know details
of the primary node. This makes following additional parameters available
to event notification scripts:

- %p: node ID of the primary
- %a: node name of the primary
- %c: conninfo string for the primary

Implements GitHub #375
2018-02-06 09:36:46 +09:00
Ian Barwick
da3a5ab1dc doc: fix descriptions of %p event notification script parameter 2018-02-05 15:54:06 +09:00
Ian Barwick
9d301b4789 "standby register": add event notification "standby_register_sync"
Implements GitHub #374.
2018-02-05 15:21:38 +09:00
Ian Barwick
c070c649f7 doc: minor fixes to BDR docs
Also remove duplicate file.
2018-02-05 15:21:34 +09:00
Ian Barwick
3b823396eb doc: improve BDR failover documentation 2018-02-05 15:21:28 +09:00
Ian Barwick
c19e7f1025 "cluster show": output any connection error messagesin list of warnings
This ensures any connection errors are displayed by default in a
comprehensible, easily reportable way, and saves having to request/filter
DEBUG output.

Implements GitHub #369.
2018-02-05 10:32:20 +09:00
Ian Barwick
e4b5a1e19f "cluster show": minor code cleanup 2018-02-05 10:25:05 +09:00
Ian Barwick
f96cc3b906 "cluster show": improve handling of database errors
In particular, if running "repmgr cluster show" against a database
without the repmgr metadata, showing the error (rather than just
"no records found" etc.) will provide some clues about the problem.
2018-02-05 10:15:48 +09:00
Tony Finch
a481ca7ce2 "repmgr node status": correct upstream node info (#363)
repmgr was printing the name and ID of this node instead of its upstream

Signed-off-by: Tony Finch <dot@dotat.at>
2018-02-05 09:54:00 +09:00
Ian Barwick
32dc450a09 doc: add note about replication slots and PostgreSQL upgrades 2018-02-02 18:33:43 +09:00
Ian Barwick
34dbf64f50 Ensure an inactive PostgreSQL data directory can be deleted.
Addresses GitHub #366.
2018-02-02 17:12:25 +09:00