Commit Graph

64 Commits

Author SHA1 Message Date
Ian Barwick
c2dded1d7b Log text of failed queries at log level ERROR
Previously query texts were always logged at log level DEBUG, but
that doesn't help much in a normal production environment when
trying to identify the cause of issues.

Also make various other minor improvements to query logging and
handling of database errors.

Implements GitHub #498.
2018-08-29 10:09:51 +09:00
Ian Barwick
457dbbd267 "standby switchover": improve replication connection check
Previously repmgr would first check that a replication can be made
from the demotion candidate to the promotion candidate, however it's
preferable to sanity-check the number of available walsenders first,
to provide a more useful error message.
2018-08-24 16:31:46 +09:00
Ian Barwick
4cbba98193 repmgr: add "cluster_cleanup" event
GitHub #492.
2018-08-20 16:48:08 +09:00
Ian Barwick
719dd93676 doc: update release notes 2018-08-20 12:33:11 +09:00
Ian Barwick
d4ad8ce20c "standby clone" - don't copy external config files in dry run mode
Avoid copying files during a --dry-run as it may introduce unexpected changes
on the target node. During an actual clone operation, any problems with
copying files will be detected early and the operation aborted before
the actual database cloning commences.

GitHub #491.
2018-08-16 14:03:39 +09:00
Ian Barwick
bacab8d31c "standby promote": improve log messages
Make it clearer what repmgr is waiting for, and what to do if the
promotion appears to fail.
2018-08-16 11:52:18 +09:00
Ian Barwick
9ae9d31165 repmgr: truncate version string if necessary
Some distributions may add extra information to PG_VERSION after
the actual version number (e.g. "10.4 (Debian 10.4-2.pgdg90+1)"), so
copy the version number string up until the first space is found.

GitHub #490.
2018-08-14 09:56:54 +09:00
Ian Barwick
4c44c01380 doc: update release notes 2018-08-10 09:52:39 +09:00
Ian Barwick
5113ab0274 repmgrd: fix startup on witness node when local data is stale
Previously, when running on a witness server, repmgrd didn't consider
the local cache of the "repmgr.nodes" table might be outdated, e.g.
as repmgrd wasn't running on the witness server during a failover,
so could potentially end up monitoring a former primary now running
as a standby.

When running on a witness server, at startup repmgrd will now scan
all nodes to determine the current primary, and refresh its local
cache from there. This will also ensure it can start up even if the
node currently registered as primary in the local cache is not available.

Implements GitHub #488 and #489.
2018-08-09 16:42:20 +09:00
Ian Barwick
25f68bb283 repmgrd: report version number *after* logger initialisation
This ensures the version number always makes it into the log destination.

Implements GitHub #487.
2018-08-08 15:45:48 +09:00
Ian Barwick
3a789d53e0 repmgrd: always reopen log file after receiving SIGHUP
For whatever reason, since at least repmgr 2.0 the log file was only
ever reopened if a configuration file change took place.

GitHub #485.
2018-08-02 10:51:18 +09:00
Ian Barwick
cb4f6f6e3f doc: add release date for 4.1.0 2018-07-31 10:58:06 +09:00
Ian Barwick
7ecfb333b9 doc: add note about switchover and exclusive backups
Also rename server_not_in_exclusive_backup_mode() to avoid double
negatives.

GitHub #476.
2018-07-19 16:02:31 +09:00
Ian Barwick
673bde2b7f repmgr: fix "primary_slot_name" when using "standby clone" with --recovery-conf-only
Addresses GitHub #474.
2018-07-17 13:42:10 +09:00
Ian Barwick
69782cf703 repmgr: enable "witness unregister" to be run on any node
Provide the ID of the witness node with --node-id=...

Implements GitHub #472.
2018-07-13 17:37:59 +09:00
Ian Barwick
5acb3e6790 doc: update release notes 2018-07-13 15:35:34 +09:00
Ian Barwick
6dfcaa357e doc: update release notes 2018-07-13 15:06:04 +09:00
Ian Barwick
56919ea499 repmgr: add -q/--quiet option
This suppresses log output below log level ERROR. This is useful mainly
when repmgr is being executed programmatically, e.g. in a cronjob,
where it's only useful to receive output if something goes wrong.

Note we advise against using this option when executing repmgr
commands which operate on PostgreSQL nodes (standby follow,
standby promote, standby switchover, node rejoin), particularly when
executed by repmgrd, as the log output will provide valuable
troubleshooting information.

Implements suggestion in GitHub #468.
2018-07-13 12:09:41 +09:00
Ian Barwick
b3f64987cb repmgr: add --csv output to "cluster event"
Implements GitHub #471.
2018-07-13 11:19:42 +09:00
Ian Barwick
8b059bc9b0 Change default for "log_level" to INFO
Default was previously NOTICE (as in repmgr 3.x) but documentation
implied it was INFO, and many of the the documentation examples assume
it is.

This produces some quite informative log output, without creating excessive
log file volume. In particular it's useful to get a better idea of what
repmgrd is actually doing.

Also add documentation section for the log configuration parameters.

GitHub #470, containing change suggested in GitHub #467.
2018-07-12 14:50:48 +09:00
Ian Barwick
ae60caacdd repmgr: make "node check" and "node status" return ERR_NODE_STATUS when appropriate
If any issue is detected (and "node check" is not being executed with a specific
individual check), "ERR_NODE_STATUS" is returned.
2018-07-05 14:31:06 +09:00
Ian Barwick
92d0e6809b repmgr: "cluster show" to return non-zero value if an issue encountered 2018-07-05 13:32:50 +09:00
Ian Barwick
4c7c681a14 repmgr: have "cluster show" exit with a non-zero value if issues detected
If any issues are detected (e.g. node not reachable, unexpected node status
etc.), "repmgr cluster show" returns exit code 25 ("ERR_NODE_STATUS").

Note that exit code 25 was introduced recently as "ERR_CLUSTER_CHECK",
however it makes sense to use this to indicate issues detected by any
command which can detect node issues.

Addresses GitHub #456.
2018-07-05 11:03:48 +09:00
Ian Barwick
37311e15a3 repmgr: fix "standby register --wait-sync" when no timeout provided
The default value for "wait_register_sync_seconds" was zero, which is treated
as disabling --wait-sync altogether. Default value now set to -1, which is taken
to mean no timeout value supplied.
2018-07-04 17:22:04 +09:00
Ian Barwick
a194cf56b3 repmgr: exit with an error if an unrecognised command line option is provided.
This matches the behaviour of other PostgreSQL utilities such as psql, though
repmgr will only abort once all command line options are parsed, so as many
errors as possible are found and displayed. If a repmgr "command" (e.g.
"repmgr primary ..." was provided, a hint about the relevant command
help section (e.g. "repmgr primary --help") will be provided alongside
the generic help command (i.e. "repmgr --help").

Addresses GitHub #464, with further improvements.
2018-07-04 11:02:50 +09:00
Ian Barwick
802755fd60 repmgrd: daemonize process by default
It's hard to imagine a use case where this isn't desirable, but
in case, for whatever reason, the user does not wish to daemonize the
process, the command line option "--daemonize=false" can be provided.

Implements GitHub #458.
2018-06-29 22:01:49 +09:00
Ian Barwick
d00c0c67d0 repmgrd: document PID file options/configuration 2018-06-29 17:00:25 +09:00
Ian Barwick
080a29c33b node check: add --missing-slots check
This enables an explicit check for slots which should exist (according
to the repmgr metadata) but which aren't present.
2018-06-22 17:21:40 +09:00
Ian Barwick
3b0cde2846 repmgr: cluster check commands - non-zero exit code if node(s) unavailable
Return ERR_CLUSTER_CHECK if one or nodes was not reachable.

Implements GitHub #447.
2018-06-12 10:30:11 +09:00
Ian Barwick
00704913a6 doc: 4.0.6 release notes 2018-06-12 10:29:35 +09:00
Ian Barwick
e12fbb7b4d doc: update release notes 2018-06-07 15:04:38 +09:00
Ian Barwick
2d43feb34b doc: update HISTORY and add 4.0.5 release notes 2018-05-01 10:21:40 +09:00
Ian Barwick
5e4bdb5a1b repmgrd: handle failover with two nodes in the primary location
If two nodes were in the primary location, and at least one node in
another location, the non-failed node in the primary location was not
recognising itself as a promotion candidate.

Addresses GitHub #407.
2018-04-02 20:51:27 +09:00
Ian Barwick
22c40ae62d doc: update HISTORY and release notes 2018-03-30 09:41:48 +09:00
Ian Barwick
1558497ae4 repmgr: poll demoted primary after restart during switchover
During a switchover operation, once the demoted primary has been restarted
as a standby, repmgr attempts to reconnect to verify its status and drop
any redundant replication slots. However it's possible the standby may still
be in the startup phase, so poll for "standby_reconnect_timeout" seconds
before giving up.

Addresses GitHub #408.
2018-03-27 16:44:10 +09:00
Ian Barwick
9c5e76401f Fix "repmgr cluster crosscheck" output
Addresses GitHub #398.
2018-03-27 16:44:04 +09:00
Ian Barwick
0219f4c91f Always set "connect_timeout" when pinging a PostgreSQL instance
Insert "connect_timeout=2" into the connection parameters, if not
explicitly set by the user. This will prevent excessive wait time
for the host operating system to report a connection timeout.
2018-03-21 11:48:57 +09:00
Ian Barwick
85a4adc99c Update HISTORY 2018-03-21 06:48:32 +09:00
Ian Barwick
d7702b3444 Correctly handle error message pointer when parsing strings.
When parsing conninfo strings, ensure the error message pointer is
actually returned to the caller.

Not a criticial issue, just meant the contents of the error message
were not being displayed.
2018-03-10 14:29:12 +09:00
Ian Barwick
e8cdf72ecd Add 4.0.4 release notes 2018-03-07 19:21:49 +09:00
Ian Barwick
9981ede1af "standby clone": fix --superuser handling
get_superuser_connection() was erroneously using the local node record
to connect to as a superuser, which works when registering the primary
but obviously not when cloning a standby.

Addresses GitHub #380.
2018-03-02 16:43:19 +09:00
Ian Barwick
40ccae57a3 Update HISTORY 2018-03-02 11:05:30 +09:00
Ian Barwick
15625183c1 "standby clone": document --recovery-conf-only option 2018-02-23 11:19:21 +09:00
Ian Barwick
22b3a74fa0 repmgrd: improve detection of status change from primary to standby
If repmgrd is running in degraded mode on a primary which has been stopped,
then manually been brought back online as a standby (e.g. by creating
recovery.conf and starting the server), ensure it not only detects the
change but automatically updates the node record so it can resume
monitoring the node as a standby.

Previously, repmgrd was looping waiting for the record to be updated
(as is done transparently when executing "repmgr node rejoin") but
if the record was not updated within the timeout period (e.g. by
"repmgr standby register) it would fail to resume monitoring as a
standby.

It seems reasonable to have repmgrd automatically update the node record,
as this will restore failover capability as quickly as possible. If this
is not desired, then the onus is on the user to shut down repmgrd while
making the desired changes.
2018-02-22 15:50:45 +09:00
Ian Barwick
98af51da03 "node rejoin": ensure --dry-run is honoured
Addresses GitHub #383.
2018-02-20 15:31:03 +09:00
Ian Barwick
728a256a93 doc: update release notes 2018-02-16 12:15:35 +09:00
Ian Barwick
76a93af15c "witness register": fix primary node check
Addresses GitHub #377, based on report by user yonj1e in #373.
2018-02-08 16:41:04 +09:00
Ian Barwick
ee2df36a76 "standby switchover": additional sanity checks
Check that sufficient walsenders will be available on the promotion
candidate, and if replication slots are in use check if enough of
those will be available.

Note these checks can't guarantee that the walsenders/slots will
be available at the appropriate points during the switchover process,
but do ensure that existing configuration problems will be caught.

Implements GitHub #371.
2018-02-08 15:19:24 +09:00
Ian Barwick
f9528efdb8 "standby clone": ensure "pg_subtrans" directory is created in Barman mode 2018-02-07 14:45:04 +09:00
Ian Barwick
e6aa831782 Update HISTORY and release notes 2018-02-07 14:43:43 +09:00