971 Commits

Author SHA1 Message Date
Ian Barwick
222f7e6080 doc: add a link to the current documentation from the contents page 2019-04-03 10:47:19 +09:00
Ian Barwick
446695e328 doc: fix typos 2018-10-23 09:22:11 +09:00
Ian Barwick
ec3da13e22 doc: fix typo
Per user report on mailing list.
2018-10-23 09:00:46 +09:00
Ian Barwick
1488c014ff Changes for a 4.1.2 snapshot release 2018-10-16 13:24:48 +09:00
Ian Barwick
f471316504 repmgrd: improve promotion script failure handling
While scanning for a new primary following a promotion script failure,
repmgrd was treating a witness server as a potential new primary
and would attempt to "follow" it. Fortunately "repmgr standby follow"
would do the right thing and choose the actual primary, if available,
otherwise do nothing, so the cluster would eventually end up in the
correct state, albeit for the wrong reason.

By skipping the witness server as a potential new primary,
repmgrd will do the right thing if the original primary does come
back online, i.e. resume monitoring as before.
2018-10-16 11:39:54 +09:00
Gilles Pietri
726299f7ef Missing comma in sudoers example 2018-10-11 09:59:15 +09:00
Ian Barwick
7fda2a1bcf doc: fix typo in repmgr.conf.sample 2018-10-08 09:37:41 +09:00
Ian Barwick
d26141b8ab Fix LWLockRelease() call in unset_bdr_failover_handler() 2018-10-08 09:37:31 +09:00
Ian Barwick
4a6b5fe913 Update control file checks for PostgreSQL 11 2018-09-27 14:08:39 +09:00
Ian Barwick
a71e644255 repmgrd: document parameters which can be reloaded via SIGHUP
Also add a new subsection with details on reloading repmgrd configuration.
2018-09-27 10:44:34 +09:00
Ian Barwick
8646fd6004 doc: fix link in 4.1.1 release notes 2018-09-25 14:30:57 +09:00
Ian Barwick
3e1bb1a523 doc: minor fixes to "repmgr.conf.sample" 2018-09-25 10:54:54 +09:00
Ian Barwick
f5e58fc062 doc: update "repmgr node rejoin" documentation
Clarify various points related to --force-rewind and pg_rewind usage.
2018-09-14 14:09:33 +09:00
Ian Barwick
6b95a96f3a repmgr: improve "cluster show" output
Only output full contents of connection error messages in --verbose mode,
otherwise it can spew a lot of text onto the screen.
2018-09-12 14:17:39 +09:00
Ian Barwick
bd146ae9ac repmgrd: update local node id in shared memory after local node restart
Also ensure local node restarts are handled more elegantly, so we're not
surprised by a stale connection handle.

GitHub #502.
2018-09-12 14:17:35 +09:00
Ian Barwick
c7f8e48d12 Bump version
4.1.2
2018-09-07 13:08:55 +09:00
Ian Barwick
322190516c doc: update link 2018-09-05 15:41:32 +09:00
Ian Barwick
31a49ff781 doc: update version v4.1.1 2018-09-04 12:33:44 +09:00
Ian Barwick
a6f99b58dd doc: update 4.1.1 release notes 2018-09-04 12:33:10 +09:00
Ian Barwick
09b041433e doc: update 4.1.1 release notes 2018-09-04 09:46:59 +09:00
Ian Barwick
058c8168e1 repmgrd: fix syntax 2018-08-30 15:54:31 +09:00
Ian Barwick
0468e47ef3 repmgrd: improve reconnection handling
Previously, if the server being monitored was not available, repmgrd
would always close the existing connection handle and open a new one.

However, in some cases, e.g. a brief network outage, the existing
connection handle is still good and does not need to be reopened.

This could be particularly problematic if monitoring_history is on,
as this risks leaving orphan sessions on the primary which (given
a sufficiently unstable network) could lead to all available backends
being occupied.

Instead, during an outage we now use a new connection to verify
the server is accessible; if the old connection is still available
(e.g. following a short network interruption) we continue using that;
if not (e.g. the server was restarted), we use the new one.
2018-08-30 15:47:49 +09:00
Ian Barwick
216326f316 doc: update release notes 2018-08-30 13:09:41 +09:00
Ian Barwick
3fb20ce774 repmgr: improve slot handling in "node rejoin"
On the rejoined node, if a replication slot for the new upstream exists
(which is typically the case after a failover), delete that slot.

Also emit a warning about any inactive replication slots which may need
to be cleaned up manually.

GitHub #499.
2018-08-30 11:57:44 +09:00
Ian Barwick
e468ca859e repmgrd: improve monitoring statistics logging
Add more granular logging to help diagnose issues, and also keep track
of when the last monitoring statistics update was set and emit that
as DETAIL every time we emit a log status update.
2018-08-29 14:48:30 +09:00
Ian Barwick
623c84c022 Add additional query error logging
It's unlikely we'll get an error in these cases, but you never know.

Also, with queries which return a list of node records, it's necessary
to call _populate_node_records() even if the query fails, so a properly
initialised, albeit empty list is returned to the caller.
2018-08-29 10:27:42 +09:00
Ian Barwick
c2dded1d7b Log text of failed queries at log level ERROR
Previously query texts were always logged at log level DEBUG, but
that doesn't help much in a normal production environment when
trying to identify the cause of issues.

Also make various other minor improvements to query logging and
handling of database errors.

Implements GitHub #498.
2018-08-29 10:09:51 +09:00
Ian Barwick
457dbbd267 "standby switchover": improve replication connection check
Previously repmgr would first check that a replication connection can be made
from the demotion candidate to the promotion candidate, however it's
preferable to sanity-check the number of available walsenders first,
to provide a more useful error message.
2018-08-24 16:31:46 +09:00
Ian Barwick
5485c06bc1 doc: fix internal link 2018-08-24 09:43:18 +09:00
Cédric Villemain
00ae42eb07 Fix grep to find conninfo
It used to use \t*, but [[:space:]] should be better as it matches more kinds
of whitespace (the current pattern being broken in my case on RH7).
2018-08-24 09:20:51 +09:00
Ian Barwick
33525491ae doc: update package signing key link 2018-08-23 12:33:48 +09:00
Ian Barwick
8c84f7a214 doc: update source requirement links
Per report from Daymel Bonne.
2018-08-23 10:56:49 +09:00
Ian Barwick
efe4bed88e doc: improve event notification documentation
- add undocumented events (per report from Daymel Bonne)
- split up list into sections for better overview
- where feasible, add cross-links
2018-08-23 10:22:05 +09:00
Ian Barwick
9ba8dcbac3 doc: clarify statement about BDR HA support 2018-08-23 09:36:58 +09:00
Ian Barwick
a8996a5bfa doc: clarify when "standby follow" can be used.
The unqualified wording previously implied that any running server could
be rejoined with "standby follow", which is not the case with a
"split brain" primary.
2018-08-21 13:53:21 +09:00
Ian Barwick
4cbba98193 repmgr: add "cluster_cleanup" event
GitHub #492.
2018-08-20 16:48:08 +09:00
Ian Barwick
23e6b85de3 doc: document sources of old package versions 2018-08-20 14:16:48 +09:00
Ian Barwick
d5ecb09f22 doc: add information about snapshot packages 2018-08-20 13:03:04 +09:00
Ian Barwick
719dd93676 doc: update release notes 2018-08-20 12:33:11 +09:00
Ian Barwick
5747f1d446 repmgrd: improve cascaded standby failover handling
In particular, improve handling of the case where the standby follow
command fails due to the primary not being available.

GitHub #480.
2018-08-16 17:14:05 +09:00
Ian Barwick
9313b43cb1 repmgrd: fix PQExpBuffer handling in upstream failover handler
This was sometimes leading to blank log lines.
2018-08-16 16:14:14 +09:00
Ian Barwick
5aeb1b0589 repmgrd: don't imply primary is in recovery if it's not available 2018-08-16 15:31:25 +09:00
Ian Barwick
6c93388848 repmgrd: fix "repmgrd_upstream_reconnect" event notification
Upstream node is not always the primary node.

Per report in GitHub #480.
2018-08-16 14:57:11 +09:00
Ian Barwick
d4ad8ce20c "standby clone" - don't copy external config files in dry run mode
Avoid copying files during a --dry-run as it may introduce unexpected changes
on the target node. During an actual clone operation, any problems with
copying files will be detected early and the operation aborted before
the actual database cloning commences.

GitHub #491.
2018-08-16 14:03:39 +09:00
Ian Barwick
bacab8d31c "standby promote": improve log messages
Make it clearer what repmgr is waiting for, and what to do if the
promotion appears to fail.
2018-08-16 11:52:18 +09:00
Ian Barwick
14856e3a4d repmgrd: ensure primary connection handle is refreshed after reconnect
In some circumstances, if monitoring history was in use, repmgrd was attempting
to fetch the primary's current LSN on a stale connection handle.
2018-08-15 16:57:21 +09:00
Ian Barwick
ca9242badb repmgr: fix handling of slot creation error when cloning
If cloning from a node other than the intended upstream, and
replication slots are in use, once the cloning process is complete,
repmgr will attempt to connect to the intended upstream to create
the replication slot.

Previously it would abort with a connection error, but as this issue
is not fatal to the cloning process itself, and in some situations may
be intentional, it's better to log a warning and continue.

We should probably collate this (and any similar items needing
attention after the cloning operation) into a list output at the end,
otherwise the warning may get overlooked.
2018-08-15 15:11:13 +09:00
Ian Barwick
ff0929e882 doc: update FAQ
Explain why some values in recovery.conf are surrounded by pairs of single
quotes.
2018-08-15 14:48:23 +09:00
Abhijit Menon-Sen
8cd1811edb Fix upstream node name in warning
This log_warning is supposed to reproduce the error in the block above,
but used the current node's name instead of the intended upstream node.
2018-08-14 10:10:50 +09:00
Ian Barwick
bf15c0d40f doc: improve "repmgr cluster cleanup" documentation 2018-08-14 10:09:18 +09:00