Commit Graph

1303 Commits

Author SHA1 Message Date
Ian Barwick 5cbaff8d0a Improve database connection failure logging
Log the output of PQerrorStatus() in a couple of places where it was missing.

Additionally, always log the output of PQerrorStatus() starting with a blank
line, otherwise the first line looks like it was emitted by repmgr, and
it's harder to scan the error message.

Before:

    [2019-03-20 11:24:15] [DETAIL] could not connect to server: Connection refused
            Is the server running on host "localhost" (::1) and accepting
            TCP/IP connections on port 5501?
    could not connect to server: Connection refused
            Is the server running on host "localhost" (127.0.0.1) and accepting
            TCP/IP connections on port 5501?

After:

    [2019-03-20 11:27:21] [DETAIL]
    could not connect to server: Connection refused
            Is the server running on host "localhost" (::1) and accepting
            TCP/IP connections on port 5501?
    could not connect to server: Connection refused
            Is the server running on host "localhost" (127.0.0.1) and accepting
            TCP/IP connections on port 5501?
2019-03-20 12:13:40 +09:00
Ian Barwick a38e229e61 check_primary_status(): handle case where recovery type unknown 2019-03-20 12:13:34 +09:00
Ian Barwick 272abdd483 Refactor check_primary_status()
Reduce nested if/else branching, and improve documentation.
2019-03-20 12:13:08 +09:00
Ian Barwick b4f6043abc Update .gitignore
Ignore artefacts from failed patch application.
2019-03-20 12:11:57 +09:00
Ian Barwick a7f3f899ff doc: update repmgrd example output 2019-03-20 12:10:31 +09:00
Ian Barwick 3ec43eda36 doc: remove references to "primary_visibility_consensus"
Feature remains experimental.
v4.3.0rc2
2019-03-18 17:43:16 +09:00
Ian Barwick ce8e1cccc4 Remove outdated comment
This was only relevant for repmgr3 and earlier; in repmgr4 the schema
is hard-coded.
2019-03-18 15:19:25 +09:00
Ian Barwick 70bfa4c8e1 Clarify calls to check_primary_status()
Use a constant rather than a magic number to indicate non-provision
of elapsed degraded monitoring time.
2019-03-18 14:21:41 +09:00
Ian Barwick f0d5ad503d doc: clarify "cluster show" error codes 2019-03-18 10:50:05 +09:00
John Naylor b9ee57ee0f Fix assorted Makefile bugs
1. The target additional-maintainer-clean was misspelled as
maintainer-additional-clean.

2. Add add missing clean targets, in particular sysutils.o, config.h,
repmgr_version.h, and Makefile.global. While at it, use a wildcard
for obj files.

3. Don't delete configure.

4. Remove generated file doc/version.sgml from the repo.

5. Have maintainer-clean recurse to the doc directory.
v4.3.0rc1
2019-03-15 16:30:27 +09:00
Ian Barwick d5d6ed4be7 Bump version
4.3rc1
2019-03-15 14:41:41 +09:00
Ian Barwick f4655074ae doc: miscellaenous cleanup 2019-03-15 14:39:55 +09:00
Ian Barwick 67d26ab7e2 doc: tweak wording in event notification documentation 2019-03-15 14:08:18 +09:00
Ian Barwick 70a7b45a03 doc: add explanation of the configuration file format 2019-03-15 14:07:19 +09:00
Ian Barwick 4251590833 doc: update "connection_check_type" descriptions 2019-03-15 14:07:13 +09:00
Ian Barwick 9347d34ce0 repmgrd: optionally check upstream availability through connection attempts 2019-03-15 14:07:08 +09:00
John Naylor feb90ee50c Correct some doc typos 2019-03-15 14:07:05 +09:00
Ian Barwick 0a6486bb7f doc: expand "standby_disconnect_on_failover" documentation 2019-03-15 14:07:01 +09:00
Ian Barwick 39443bbcee Count witness and zero-priority nodes in visibility check 2019-03-15 14:06:58 +09:00
Ian Barwick fc636b1bd2 Ensure witness node sets last upstream seen time 2019-03-15 14:06:55 +09:00
Ian Barwick 048bad1c88 doc: fix option name typo 2019-03-15 14:06:51 +09:00
Ian Barwick 4528eb1796 doc: expand "failover_validate_command" documentation 2019-03-15 14:06:37 +09:00
Ian Barwick 169c9ccd32 repmgrd: improve logging output when executing "failover_validate_command" 2019-03-15 14:06:34 +09:00
Ian Barwick 5f92fbddf2 doc: various updates 2019-03-15 14:06:30 +09:00
Ian Barwick 617e466f72 doc: merge repmgrd witness server description into failover section 2019-03-13 16:19:41 +09:00
Ian Barwick 435fac297b doc: merge repmgrd split network handling description into failover section 2019-03-13 16:19:37 +09:00
Ian Barwick 4bc12b4c94 doc: merge repmgrd monitoring description into operating section 2019-03-13 16:19:33 +09:00
Ian Barwick 91234994e2 doc: merge repmgrd degraded monitoring description into operation section 2019-03-13 16:19:30 +09:00
Ian Barwick ee9da30f20 doc: merge repmgrd notes into operation documentation 2019-03-13 16:19:27 +09:00
Ian Barwick 2e67bc1341 doc: merge repmgrd pause documentation into overview 2019-03-13 16:19:24 +09:00
Ian Barwick 18ab5cab4e doc: initial repmgrd doc refactoring 2019-03-13 16:19:20 +09:00
Ian Barwick 60bb4e9fc8 doc: update repmgrd configuration documentation 2019-03-13 16:19:17 +09:00
Ian Barwick 52bee6b98d repmgrd: various minor logging improvements 2019-03-13 16:19:13 +09:00
Ian Barwick ecb1f379f5 repmgrd: remove global variable
Make the "sibling_nodes" local, and pass by reference where relevant.
2019-03-13 16:19:10 +09:00
Ian Barwick e1cd2c22d4 repmgrd: enable election rerun
If "failover_validation_command" is set, and the command returns an error,
rerun the election.

There is a pause between reruns to avoid "churn"; the length of this pause
is controlled by the configuration parameter "election_rerun_interval".
2019-03-13 16:19:03 +09:00
Ian Barwick 1dea6b76d9 Remove redundant struct allocation 2019-03-13 16:19:00 +09:00
Ian Barwick 702f90fc9d doc: update list of reloadable repmgrd configuration options 2019-03-13 16:18:56 +09:00
Ian Barwick c4d1eec6f3 doc: document "failover_validation_command" 2019-03-13 16:18:53 +09:00
Ian Barwick b241c606c0 doc: expand repmgrd configuration section 2019-03-13 16:18:50 +09:00
Ian Barwick 45c896d716 Execute "failover_validation_command" when only one standby exists 2019-03-08 15:29:17 +09:00
Ian Barwick 514595ea10 Make "failover_validation_command" reloadable 2019-03-08 15:29:12 +09:00
Ian Barwick 531194fa27 Initial implementation of "failover_validation_command" 2019-03-08 15:29:06 +09:00
Ian Barwick 2aa67c992c Make recently added configuration options reloadable 2019-03-08 15:28:59 +09:00
Ian Barwick 37892afcfc Add configuration option "primary_visibility_consensus"
This determines whether repmgrd should continue with a failover if
one or more nodes report they can still see the standby.
2019-03-08 15:28:53 +09:00
Ian Barwick e4e5e35552 Add configuration option "sibling_nodes_disconnect_timeout"
This controls the maximum length of time in seconds that repmgrd will
wait for other standbys to disconnect their WAL receivers in a failover
situation.

This setting is only used when "standby_disconnect_on_failover" is set to "true".
2019-03-08 15:28:48 +09:00
Ian Barwick b320c1f0ae Reset "wal_retrieve_retry_interval" for all nodes 2019-03-08 15:28:42 +09:00
Ian Barwick 280654bed6 repmgrd: don't wait for WAL receiver to reconnect during failover
If the WAL receiver has been temporarily disabled, we don't want to
wait for it to start up as it may not be able to at that point; we do
however need to reset "wal_retrieve_retry_interval".
2019-03-08 15:28:27 +09:00
Ian Barwick ae675059c0 Improve logging/sanity checking for "node control" options 2019-03-08 15:28:22 +09:00
Ian Barwick 454ebabe89 Improve logging when disabling/enabling WAL receiver
Also check action is being run on node which is in recovery.
2019-03-08 15:28:17 +09:00
Ian Barwick d1d6ef8d12 Check for WAL receiver start up 2019-03-08 15:28:11 +09:00