1058 Commits

Author SHA1 Message Date
Ian Barwick a459c60145 Avoid defining variable-length arrays
As of PostgreSQL commit d9dd406f, variable length arrays are no longer
permitted. As they're not actually required anyway, just define appropriate
constants.

Also noted in GitHub #510.
2018-10-26 10:09:45 +09:00
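A minimal sketch of the kind of change described in the commit above, not the actual repmgr code: an array whose size comes from a variable is a variable-length array, and replacing the variable with a preprocessor constant gives it a fixed length. The names MAX_LABEL_LEN and format_node_label are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical size constant; name and value are illustrative only. */
#define MAX_LABEL_LEN 64

/*
 * Format "node <id>" into the caller's buffer.
 *
 * Before the change, code of this shape might have read:
 *     const int len = 64;
 *     char      buf[len];      (a variable-length array in C)
 * Using a #define makes the size a compile-time constant instead.
 *
 * Returns the number of characters written, or -1 if it does not fit.
 */
static int
format_node_label(int node_id, char *out, size_t out_size)
{
	char		buf[MAX_LABEL_LEN];	/* fixed-length array */
	int			written;

	assert(out != NULL && out_size > 0);

	written = snprintf(buf, sizeof(buf), "node %d", node_id);

	/* Reject truncated output rather than returning a partial label. */
	if (written < 0 || (size_t) written >= sizeof(buf))
		return -1;
	if ((size_t) written >= out_size)
		return -1;

	memcpy(out, buf, (size_t) written + 1);	/* include the trailing NUL */
	return written;
}
```

Compilers such as GCC can flag remaining variable-length arrays with the -Wvla warning option, which makes this kind of cleanup easy to verify.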
Ian Barwick 65721bbbcd doc: update README 2018-10-24 15:24:04 +09:00
Ian Barwick 96895ba8a8 doc: update 4.2 release notes 2018-10-24 15:24:00 +09:00
Ian Barwick e0d6d906e7 repmgrd: fix upstream role check
Only take action if it's confirmed as a standby.
2018-10-23 12:47:55 +09:00
Ian Barwick dc8ffd30c6 "standby switchover": close all connections used to check repmgrd status
The connections used to check repmgrd status on all nodes were not being
closed if repmgrd was not running. Normally this wouldn't be a huge
problem as they will go away when repmgr terminates or the PostgreSQL
server is restarted. However, if shutdown mode is "smart", the open
connection on the demotion candidate will cause the shutdown operation
to fail until repmgr times out.
2018-10-23 11:05:28 +09:00
Ian Barwick 24392fa11b doc: fix typos 2018-10-23 09:21:00 +09:00
Ian Barwick 06b5239ada doc: fix typo
Per user report on mailing list.
2018-10-23 08:59:30 +09:00
Ian Barwick 56173d94a9 Fix Makefile for VPATH builds under PostgreSQL 11 2018-10-22 16:38:18 +09:00
Ian Barwick 578f11003c repmgrd: improve node role change detection 2018-10-19 11:25:11 +09:00
Ian Barwick 36bd7cdc9f Speed up witness "failover" during a switchover 2018-10-18 17:26:29 +09:00
Ian Barwick 62ac56c3f5 repmgrd: handle case where upstream is no longer primary
If the upstream comes back on line (e.g. after a switchover), and its
status is no longer primary, restart monitoring to ensure the correct
primary (potentially the current node) is being monitored.
2018-10-18 16:50:13 +09:00
Ian Barwick c79852cce0 Ensure witness repmgrd detects change in upstream's role
This ensures that e.g. after a switchover, repmgrd running on a witness
node will automatically detect the new primary and monitor that.
2018-10-18 16:15:46 +09:00
Ian Barwick 3907a545b0 repmgrd: ensure witness node doesn't try and follow another witness
Theoretically there should never be more than one witness node
visible here, but that can't be ruled out entirely, so add a
check just in case.
2018-10-18 12:17:06 +09:00
Ian Barwick d1d057a184 doc: improve upgrade instructions
Note requirement to execute "systemctl daemon-reload" for systemd
systems...
2018-10-17 17:07:52 +09:00
Ian Barwick b70e3b48c8 doc: improve upgrade instructions 2018-10-17 14:32:38 +09:00
Ian Barwick ab6c3d9b6e Handle NULL strings when parsing boolean arguments 2018-10-17 11:47:32 +09:00
Ian Barwick 6999dbb52a Doc: update HISTORY and 4.2 release notes 2018-10-17 11:47:28 +09:00
Ian Barwick b2348c9a70 repmgrd: improve promotion script failure handling
While scanning for a new primary following a promotion script failure,
repmgrd was treating a witness server as a potential new primary
and would attempt to "follow" it. Fortunately "repmgr standby follow"
would do the right thing and choose the actual primary, if available,
otherwise do nothing, so the cluster would eventually end up in the
correct state, albeit for the wrong reason.

By skipping the witness server as a potential new primary,
repmgrd will do the right thing if the original primary does come
back online, i.e. resume monitoring as before.
2018-10-16 11:42:54 +09:00
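The candidate scan described in the commit above could be sketched roughly as follows. This is an illustration under stated assumptions, not repmgrd's actual logic: the node_type enum, the node_info struct and find_new_primary() are all hypothetical. The sketch assumes a witness server looks like a primary to a simple recovery check because it is not in recovery, which is why it has to be excluded explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node description; not the repmgr data structures. */
typedef enum
{
	NODE_PRIMARY,
	NODE_STANDBY,
	NODE_WITNESS
} node_type;

typedef struct
{
	int			node_id;
	node_type	type;			/* role recorded in the metadata */
	bool		not_in_recovery;	/* what the server itself reports */
} node_info;

/*
 * Return the ID of the first node that can act as the new primary,
 * or -1 if none is found.  Witness nodes are skipped even though
 * they report "not in recovery".
 */
static int
find_new_primary(const node_info *nodes, size_t node_count, int local_node_id)
{
	size_t		i;

	for (i = 0; i < node_count; i++)
	{
		if (nodes[i].node_id == local_node_id)
			continue;			/* never "follow" ourselves */

		if (nodes[i].type == NODE_WITNESS)
			continue;			/* a witness can never be a primary */

		if (nodes[i].not_in_recovery)
			return nodes[i].node_id;
	}

	return -1;
}
```

With this filter in place, a scan that finds no usable primary returns -1, and the caller can keep waiting for the original primary to return.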
Ian Barwick 7b26180ebb doc: update upgrade instructions 2018-10-16 09:44:49 +09:00
Ian Barwick d70a5250ab doc: update upgrade instructions 2018-10-11 14:57:49 +09:00
Abhijit Menon-Sen 024accfbba Merge pull request #508 from gilou/docfix
Missing comma in sudoers example
2018-10-10 22:00:43 +05:30
Gilles Pietri 55c967fd14 Missing comma in sudoers example 2018-10-10 17:07:36 +02:00
Ian Barwick c1edb896df Move repmgrd pid functions to 4.1 → 4.2 upgrade file 2018-10-10 10:12:39 +09:00
Ian Barwick fd66d93937 Fix LWLockRelease() call in unset_bdr_failover_handler() 2018-10-08 09:36:50 +09:00
Ian Barwick 40e94635b2 doc: fix typo in repmgr.conf.sample 2018-10-08 09:36:28 +09:00
Ian Barwick 9ad41bfb0f doc: expand upgrade section 2018-10-05 17:45:57 +09:00
Ian Barwick 35c156ce7e Update 4.1 → 4.2 upgrade script 2018-10-05 12:15:18 +09:00
Ian Barwick 85f27ff559 doc: note repmgr's default pg_basebackup options 2018-10-04 13:13:28 +09:00
Ian Barwick ad03885b72 repmgrd: fix parsing of -d/--daemonize option
The getopt API doesn't cope well with optional arguments to short form options,
e.g. "-o foo", so we need to check the next argument value to see whether it looks
like an option or an actual argument value.
2018-10-04 11:48:54 +09:00
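The workaround described in the commit above can be sketched as follows. This is a minimal illustration, not the repmgrd source: parse_bool_word() and parse_daemonize() are hypothetical helpers, the accepted boolean words are an assumption, and the sketch assumes a getopt_long() implementation that supports optional arguments (as glibc does). With the option string "d::", getopt only fills optarg for the attached form ("-dfalse" or "--daemonize=false"), so for "-d false" the code peeks at the next argument itself.

```c
#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: treat a few common words as "false". */
static bool
parse_bool_word(const char *value)
{
	if (value == NULL)
		return true;			/* bare option means "enabled" */

	return !(strcmp(value, "false") == 0 || strcmp(value, "off") == 0 ||
			 strcmp(value, "no") == 0 || strcmp(value, "0") == 0);
}

/*
 * Return the effective daemonize setting for the given command line,
 * or default_value if -d/--daemonize was not supplied.
 */
static bool
parse_daemonize(int argc, char **argv, bool default_value)
{
	static const struct option long_opts[] = {
		{"daemonize", optional_argument, NULL, 'd'},
		{NULL, 0, NULL, 0}
	};
	bool		daemonize = default_value;
	int			c;

	optind = 1;					/* restart scanning at the first argument */
	opterr = 0;					/* stay quiet about unrelated options */

	while ((c = getopt_long(argc, argv, "d::", long_opts, NULL)) != -1)
	{
		const char *value;

		if (c != 'd')
			continue;			/* ignore options this sketch doesn't know */

		value = optarg;

		/*
		 * "-d false": optarg is NULL here.  If the next argument does
		 * not look like an option, consume it as the value.
		 */
		if (value == NULL && optind < argc && argv[optind][0] != '-')
			value = argv[optind++];

		daemonize = parse_bool_word(value);
	}

	return daemonize;
}
```

The check on the leading '-' is a heuristic: it means a value beginning with a dash cannot be passed in the detached form, which is acceptable for a boolean option.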
Ian Barwick 3e38759c02 use appendPQExpBufferStr/-Char() consistently 2018-10-04 08:42:42 +09:00
Ian Barwick 15a5d2ee9d "repmgr standby": use appendPQExpBufferStr/-Char() consistently 2018-10-03 17:31:12 +09:00
Ian Barwick 61c91df332 "repmgr node": use appendPQExpBufferStr/-Char() where appropriate 2018-10-03 14:09:29 +09:00
Ian Barwick b346914d4d repmgr: fix "Missing replication slots" label in "node check"
Per report in GitHub #507.
2018-10-03 13:53:52 +09:00
Ian Barwick ac40ef0e43 doc: add additional index entries for package information 2018-10-03 11:59:42 +09:00
Ian Barwick eebf07549f doc: update repmgrd configuration for Debian/Ubuntu 2018-10-03 11:59:27 +09:00
Ian Barwick a40fd60cb5 repmgrd: fix parsing of -d/--daemonize option 2018-10-03 11:36:38 +09:00
Ian Barwick bd24848ce9 doc: add tip about setting "ConnectTimeout" for SSH 2018-10-03 10:16:47 +09:00
Ian Barwick 7ab81e10de Log SSH errors when running "repmgr cluster (matrix|crosscheck)"
Previously repmgr would abort with an unhelpful message about being
unable to parse CSV output.

With this commit, it will continue running, and display a list of
inaccessible nodes as an addendum to the main output (unless --csv
or --terse options are specified).

Addresses GitHub #246.
2018-10-03 10:12:18 +09:00
Ian Barwick 455a0bd93f Use make_remote_repmgr_path() in place of make_repmgr_path()
Also we can now simplify "cluster (matrix|crosscheck)" commands as
beginning with v4.0, we know where the configuration file is, so can
provide that when invoking repmgr remotely.
2018-10-02 09:59:18 +09:00
Ian Barwick 11d25e2aef Add configuration parameter "repmgr_bindir"
This is to facilitate remote invocation of repmgr when the repmgr
binary is located somewhere other than the PostgreSQL binary directory, as it
cannot be assumed all package maintainers will install repmgr there.

This parameter is optional; if not set (the default), repmgr will fall back
to "pg_bindir" (if set).

Addresses GitHub #246.
2018-10-02 09:59:12 +09:00
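The fallback order described in the commit above might look roughly like this. It is a sketch only: build_remote_repmgr_path() is a hypothetical function, not make_remote_repmgr_path() from the repmgr source, and the behaviour when neither parameter is set (emitting a bare "repmgr" and relying on the remote PATH) is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the path of the repmgr binary to invoke on a remote node.
 * Preference order: repmgr_bindir, then pg_bindir, then a bare
 * "repmgr" (assumed here to be resolved via PATH).
 *
 * Returns 0 on success, -1 if the result does not fit in "out".
 */
static int
build_remote_repmgr_path(const char *repmgr_bindir, const char *pg_bindir,
						 char *out, size_t out_size)
{
	const char *dir = NULL;
	int			written;

	assert(out != NULL && out_size > 0);

	/* An unset parameter may be NULL or an empty string. */
	if (repmgr_bindir != NULL && repmgr_bindir[0] != '\0')
		dir = repmgr_bindir;
	else if (pg_bindir != NULL && pg_bindir[0] != '\0')
		dir = pg_bindir;

	if (dir != NULL)
		written = snprintf(out, out_size, "%s/repmgr", dir);
	else
		written = snprintf(out, out_size, "repmgr");

	return (written < 0 || (size_t) written >= out_size) ? -1 : 0;
}
```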
Ian Barwick b14fbbdc72 Add "repmgr daemon ..." options to main help output 2018-09-27 19:07:59 +09:00
Ian Barwick 2491b8ae52 Add functionality to "pause" repmgrd
In some circumstances, e.g. while performing a switchover, it is essential
that repmgrd does not take any kind of failover action, as this will put
the cluster into an incorrect state.

Previously it was necessary to stop repmgrd on all nodes (or at least
those nodes which repmgrd would consider as promotion candidates), however
this is a cumbersome and potentially risk-prone operation, particularly if the
replication cluster contains more than a couple of servers.

To prevent this issue from occurring, this patch introduces the ability
to "pause" repmgrd on all nodes with a single command ("repmgr daemon pause")
which notifies repmgrd not to take any failover action until the node
is "unpaused" ("repmgr daemon unpause").

"repmgr daemon status" provides an overview of each node and whether repmgrd
is running, and if so whether it is paused.

"repmgr standby switchover" has been modified to automatically pause repmgrd
while carrying out the switchover.

See documentation for further details.
2018-09-27 16:42:10 +09:00
Ian Barwick fce3c02760 Update control file checks for PostgreSQL 11 2018-09-27 14:08:12 +09:00
Ian Barwick 1f8f6f3a39 repmgrd: add notice about different location preventing standby promotion
Though we note this in the DEBUG output, it's not immediately obvious
from the logs, especially outside of the DEBUG log level, why a node
didn't promote itself if it is in a different location to the primary.
2018-09-27 11:06:18 +09:00
Ian Barwick 401f903456 repmgrd: document parameters which can be reloaded via SIGHUP
Also add a new subsection with details on reloading repmgrd configuration.
2018-09-27 10:44:23 +09:00
Ian Barwick 688337dec3 repmgr: add "--node-id" option to "cluster cleanup"
Implements GitHub #493.
2018-09-25 15:56:40 +09:00
Ian Barwick b660cb9fe4 doc: fix link in 4.1.1 release notes 2018-09-25 14:30:38 +09:00
Ian Barwick 5d8d9db21d doc: update 4.2 release notes 2018-09-25 14:28:28 +09:00
Ian Barwick 9439467958 doc: add troubleshooting section to switchover documentation 2018-09-25 13:47:58 +09:00
Ian Barwick 38e3aae053 repmgr: add parameter "shutdown_check_timeout"
Previously, "repmgr standby switchover" used the configuration file parameters
"reconnect_interval" and "reconnect_attempts" to define a timeout to determine
whether the current primary (demotion candidate) has shut down.

However, these parameters are intended for primary failure detection and are
generally lower in value, while a controlled shutdown may take longer, resulting
in the switchover being aborted as repmgr was not waiting long enough.

To prevent this happening, parameter "shutdown_check_timeout" has been added.
This complements the existing "standby_reconnect_timeout" parameter used
by "repmgr standby switchover".

Implements GitHub #504.
2018-09-25 11:34:06 +09:00
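A dedicated shutdown timeout as described in the commit above could drive a polling loop along these lines. This is a sketch under assumptions, not the switchover code itself: wait_for_shutdown(), the probe and sleep callbacks, and the one-check-per-second cadence are all hypothetical. Only the parameter name "shutdown_check_timeout" comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callbacks so the loop can be exercised without a server. */
typedef bool (*shutdown_probe) (void *ctx);
typedef void (*sleep_fn) (int seconds);

/*
 * Poll until the probe reports that the demotion candidate has shut
 * down, checking at most once per second for shutdown_check_timeout
 * seconds.  Returns true if shutdown was observed, false on timeout.
 */
static bool
wait_for_shutdown(int shutdown_check_timeout, shutdown_probe probe,
				  void *ctx, sleep_fn do_sleep)
{
	int			elapsed;

	assert(probe != NULL);

	for (elapsed = 0; elapsed < shutdown_check_timeout; elapsed++)
	{
		if (probe(ctx))
			return true;

		if (do_sleep != NULL)
			do_sleep(1);		/* wait before the next check */
	}

	return false;				/* caller would abort the switchover */
}

/* Example probe: reports shutdown once the counter in ctx reaches zero. */
static bool
countdown_probe(void *ctx)
{
	int		   *remaining = ctx;

	if (*remaining > 0)
	{
		(*remaining)--;
		return false;
	}

	return true;
}
```

Keeping this timeout separate from the failure-detection settings lets a slow but healthy controlled shutdown finish without loosening primary failure detection.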