Commit Graph

695 Commits

Author SHA1 Message Date
Ian Barwick
927bf038a0 "standby switchover": check demotion candidate can make replication connection
Check it's actually possible for the demotion candidate to attach to
the promotion candidate before executing the switchover.

As with other checks of this nature, there's a faint possibility the
situation could change between the time the check is carried out and
the demotion candidate is restarted to connect to the promotion candidate,
but there's not a lot we can do about that. The main purpose is to
be able to catch existing misconfigurations before anything gets changed.

Implements GitHub #370.
2018-02-09 10:00:54 +09:00
Ian Barwick
76a93af15c "witness register": fix primary node check
Addresses GitHub #377, based on report by user yonj1e in #373.
2018-02-08 16:41:04 +09:00
Ian Barwick
ee2df36a76 "standby switchover": additional sanity checks
Check that sufficient walsenders will be available on the promotion
candidate, and if replication slots are in use check if enough of
those will be available.

Note these checks can't guarantee that the walsenders/slots will
be available at the appropriate points during the switchover process,
but do ensure that existing configuration problems will be caught.

Implements GitHub #371.
2018-02-08 15:19:24 +09:00
Ian Barwick
571e6b2783 "standby clone": cowardly refuse to clone into an active data directory
By checking the PID file in the same way pg_ctl does, we can be pretty
much certain whether the target data directory contains an active
PostgreSQL instance.
2018-02-08 10:19:05 +09:00
Ian Barwick
76cc11b786 Fix "standby clone" in Barman mode with --no-upstream-connection
"--upstream-node-id", if provided, was not being passed through to
the SQL query executed via the Barman server.

Also modified the query to select the primary node if "--upstream-node-id"
is not provided.

Note: this is a very niche use case.
2018-02-07 16:34:01 +09:00
Ian Barwick
56710f4819 repmgr: simplify data directory checks when cloning
Attempting to use the contents of pg_control to tell whether the directory
is in use by PostgreSQL can result in false positives; we should use
a check based on the pidfile.

Also change the HINT to indicate a data directory can be overwritten
if -F/--force is provided.
2018-02-07 14:45:37 +09:00
Ian Barwick
f9528efdb8 "standby clone": ensure "pg_subtrans" directory is created in Barman mode 2018-02-07 14:45:04 +09:00
Ian Barwick
658ec20e37 doc: fix GitHub reference in release notes 2018-02-07 14:43:47 +09:00
Ian Barwick
e6aa831782 Update HISTORY and release notes 2018-02-07 14:43:43 +09:00
Ian Barwick
9b56f157dc Move parse_output_to_argv() to configfile.c
So it can be used by parse_pg_basebackup_options().

Addresses GitHub #376.
2018-02-07 09:47:50 +09:00
Ian Barwick
05f872effe Fix typo in HINT 2018-02-07 08:56:29 +09:00
Ian Barwick
ae691688be doc: fix descriptions of %p event notification script parameter 2018-02-05 15:52:48 +09:00
Ian Barwick
57f1e939c5 "standby register": add event notification "standby_register_sync"
Implements GitHub #374.
2018-02-05 15:20:19 +09:00
Ian Barwick
48b5deebf3 doc: minor fixes to BDR docs
Also remove duplicate file.
2018-02-05 14:01:37 +09:00
Ian Barwick
1868453953 doc: improve BDR failover documentation 2018-02-05 13:25:49 +09:00
Ian Barwick
dd45189fa8 "cluster show": output any connection error messagesin list of warnings
This ensures any connection errors are displayed by default in a
comprehensible, easily reportable way, and saves having to request/filter
DEBUG output.

Implements GitHub #369.
2018-02-05 10:36:04 +09:00
Ian Barwick
a79c4fae88 "cluster show": minor code cleanup 2018-02-05 10:36:00 +09:00
Ian Barwick
657ed83921 "cluster show": improve handling of database errors
In particular, if running "repmgr cluster show" against a database
without the repmgr metadata, showing the error (rather than just
"no records found" etc.) will provide some clues about the problem.
2018-02-05 10:35:56 +09:00
Tony Finch
4fb085f52d "repmgr node status": correct upstream node info (#363)
repmgr was printing the name and ID of this node instead of its upstream

Signed-off-by: Tony Finch <dot@dotat.at>
2018-02-05 09:52:58 +09:00
Ian Barwick
d0bb5b1565 Ensure an inactive PostgreSQL data directory can be deleted.
Addresses GitHub #366.
2018-02-02 17:18:51 +09:00
Ian Barwick
ee64f3a745 "standby follow": finalize implementation of --dry-run option 2018-02-02 17:18:47 +09:00
Ian Barwick
6c81e54f76 "standby follow": check for replication slot availability on target node 2018-02-02 17:18:43 +09:00
Ian Barwick
65bf203a89 Improve "repmgr primary unregister" documentation and --help output
Per observations in GitHub #373
2018-02-02 17:18:36 +09:00
Ian Barwick
b4dbee517f doc: note password SSH requirements for "standby switchover" 2018-02-02 17:18:31 +09:00
Ian Barwick
e23d28a22d "standby follow": initial implementation of --dry-run option
GitHub #363.
2018-02-01 14:16:49 +09:00
Ian Barwick
811d2a45bd "standby switchover": improve log messages and add new exit code
Previously, if an issue was encountered with the old primary, but user
provided -F/--force to have repmgr promote the standby anyway, repmgr
would exit with the log message "STANDBY SWITCHOVER is complete"
and exit code 0 (SUCCESS).

To better report this partial completion, repmgr will now emit the message
"STANDBY SWITCHOVER has completed with issues" (and a HINT to check preceding
log messages) and new exit code 22 (ERR_SWITCHOVER_INCOMPLETE).
2018-01-31 11:03:54 +09:00
Ian Barwick
92f4710ee2 Have do_standby_follow_internal() not abort on error
Pass the error code back to the caller instead, mainly so
"repmgr node rejoin" can better report errors.
2018-01-31 11:03:27 +09:00
Ian Barwick
044d8a1098 repmgr: improve switchover handling when "pg_ctl" used
If logging output not explicitly rediretced with "-l" in the pg_ctl
options, repmgr would hang waiting for pg_ctl output.

Note that we recommend using the OS-level service commands where
available.
2018-01-30 16:56:26 +09:00
Ian Barwick
b38f45120c "repmgr standby register": improve error output when standby not running
Add explicit HINT
2018-01-27 07:17:34 +09:00
Ian Barwick
db3a046393 doc: expand upgrade documentation
Include section about using pg_upgrade
2018-01-25 10:48:24 +09:00
Ian Barwick
ec068e38a2 Remove --bdr-only configuration option
This was required for a specific use case during pre-release
development and is no longer needed now the physical streaming
replication handling is implemented.
2018-01-25 10:48:09 +09:00
Ian Barwick
3a382e826e doc: update 4.0.2 release notes
Add details about upgrading.
2018-01-19 09:10:42 +09:00
Ian Barwick
3dcf57a333 doc: add 4.0.2 release notes 2018-01-19 09:10:42 +09:00
Vlad
f658c8d3d8 doc: add missing word in overview
GitHub pull request #362
2018-01-19 09:09:40 +09:00
Ian Barwick
375a96a5c8 repmgrd: log execution error in "repmgrd_get_local_node_id()"
That shouldn't happen, but if it does it will make it easier to
identify the issue.
2018-01-16 11:16:19 +09:00
Ian Barwick
b4d6724405 doc: improve switchover documentation
Emphasize need to set the "service_*_command" options when repmgr is
installed from a package.
2018-01-16 11:16:19 +09:00
Ian Barwick
8fd0c4ad83 repmgr: assume node is actually shutting down if pingable and that's the reported status 2018-01-12 21:53:37 +09:00
Ian Barwick
7ccae6c2b1 repmgr: automatically create slot name if missing
It's possible that a node was registered with "use_replication_slots=false"
but that was later changed to "use_replication_slots=true". If the node
was not subsequently re-registered, the node record will contain an empty
slot name, which will cause any slot creation operation during
"standby follow" or "node rejoin" to fail.

To prevent this happening, check for an empty slot name and automatically
set before proceeding.

Addresses GitHub #343.
2018-01-11 14:47:50 +09:00
Ian Barwick
61d46172b9 repmgr: catch possible corner case when checking node shutdown status
It's conceivable that PQping is returning "no response" but the
shutdown hasn't quite completed.
2018-01-10 15:09:21 +09:00
Ian Barwick
810471b2f2 repmgr: during switchover, correctly detect unclean shutdown status 2018-01-10 12:25:16 +09:00
Ian Barwick
5bd8cf958a repmgr standby switchover: add "%p" event notification parameter
This will contain the node ID of the former primary.
2018-01-10 12:25:12 +09:00
Ian Barwick
5a45997db5 doc: document command line options for "standby switchover" 2018-01-10 12:25:07 +09:00
Ian Barwick
f1f5100007 repmgr standby switchover: add event details 2018-01-10 12:25:00 +09:00
Ian Barwick
1c8ad4d89b Consolidate parsing of output from executing repmgr on a remote server
This should also fix the issue reported in GitHub #349.
2018-01-09 16:24:13 +09:00
Ian Barwick
842a610e84 Fix call to is_active_bdr_node() in BDR repmgrd
Following the fix to "is_active_bdr_node()" in 841f03ae, it turns out
the call in repmgrd-bdr.c was only accidentally working; explicitly
test for a false return value.
2018-01-04 21:03:36 +09:00
Ian Barwick
fcb7e7a29b "repmgr bdr register": create missing connection replication set if needed
Previously the assumption was that the "repmgr" replication set would be
set up when the nodes are created, however no checks were implemented
and this was not well-documented.

Addresses GitHub #347.
2018-01-04 17:46:49 +09:00
Ian Barwick
26e404b1f3 "repmgr bdr register": improve node name check
We'll use "bdr.bdr_get_local_node_name()" to check the local BDR node
name and the repmgr one match.
2018-01-04 17:46:44 +09:00
Ian Barwick
625d032435 doc: link event notification page from relevate command reference pages 2018-01-04 14:56:15 +09:00
Ian Barwick
3d07d65966 doc: update package documentation 2018-01-04 14:56:12 +09:00
Ian Barwick
b705127a34 "repmgr standby register": add --wait-start option
Implements GitHub #356.
2018-01-04 14:56:08 +09:00