Commit Graph

896 Commits

Author SHA1 Message Date
Ian Barwick
937cffd54c doc: clarify BDR repmgrd configuration
Link directly to section about configuring the "event_notification_command".
2018-07-23 13:21:11 +09:00
Ian Barwick
2b1e12591a doc: fix markup errors 2018-07-23 13:18:38 +09:00
Ian Barwick
7ecfb333b9 doc: add note about switchover and exclusive backups
Also rename server_not_in_exclusive_backup_mode() to avoid double
negatives.

GitHub #476.
2018-07-19 16:02:31 +09:00
Martín Marqués
8f13a66aaa Check that there is no exclusive backup taking place while we perform
a switchover.

We've found that this can cause some issues with postgres control
metadata (could be a postgres bug), so the best thing is *not* to switch over
if there's a backup taking place.

It's also a bad idea from an architectural point of view, as a switchover
is supposed to be planned, so why perform it when we are taking backups?

GitHub #476.
2018-07-19 16:02:21 +09:00
Ian Barwick
ef35d071bf Fix is_active_bdr_node() query for BDR 2.x
Copy/paste error when adapting the query for BDR 3.x.
2018-07-19 09:50:30 +09:00
Ian Barwick
b87f9dabb4 doc: remove duplicate item in list of event notifications 2018-07-18 16:10:55 +09:00
Ian Barwick
7decc7975f Fix BDR version check
regexp_match() is only available from PostgreSQL 10 and later.
2018-07-18 10:54:16 +09:00
Ian Barwick
a5cfc244bc repmgr: have "node status" check for missing downstream nodes
This matches the behaviour of "node check".
2018-07-18 10:27:19 +09:00
Ian Barwick
673bde2b7f repmgr: fix "primary_slot_name" when using "standby clone" with --recovery-conf-only
Addresses GitHub #474.
2018-07-17 13:42:10 +09:00
Martín Marqués
81de200561 Add information to the --help and docs of standby clone regarding the need
to provide a conninfo line to the upstream from which we will be
cloning.
2018-07-16 18:56:41 -03:00
Ian Barwick
cb46fb6410 repmgrd: when reloading configuration, log any errors encountered 2018-07-16 16:46:39 +09:00
Ian Barwick
bd58e4128c repmgrd: log "promote_command" at log_level "INFO"
If repmgrd is promoting the local node, it was only logging the contents
of "promote_command" at DEBUG level; it would be useful to see this at
the default log level.

Related to GitHub #473.
2018-07-16 15:33:10 +09:00
Ian Barwick
63242e2277 doc: update documentation of "promote_command" and "service_promote_command"
The documentation implied it would override "promote_command", which is
not the case.

"promote_command" is used by repmgrd to execute "repmgr standby promote"
(either directly or via a custom script).

"service_promote_command" can be set to specify a package-level service
command to promote the local PostgreSQL instance from standby to primary,
e.g. Debian's pg_ctlcluster. If set, this will be executed by "repmgr standby promote".

Also update code comments to clarify usage.

Related to GitHub #473.
2018-07-16 14:43:53 +09:00
Ian Barwick
69782cf703 repmgr: enable "witness unregister" to be run on any node
Provide the ID of the witness node with --node-id=...

Implements GitHub #472.
2018-07-13 17:37:59 +09:00
Ian Barwick
5acb3e6790 doc: update release notes 2018-07-13 15:35:34 +09:00
Ian Barwick
6dfcaa357e doc: update release notes 2018-07-13 15:06:04 +09:00
Ian Barwick
8acc50e752 Bump version number in configure.in 2018-07-13 14:05:29 +09:00
Ian Barwick
56919ea499 repmgr: add -q/--quiet option
This suppresses log output below log level ERROR. This is useful mainly
when repmgr is being executed programmatically, e.g. in a cronjob,
where it's only useful to receive output if something goes wrong.

Note we advise against using this option when executing repmgr
commands which operate on PostgreSQL nodes (standby follow,
standby promote, standby switchover, node rejoin), particularly when
executed by repmgrd, as the log output will provide valuable
troubleshooting information.

Implements suggestion in GitHub #468.
2018-07-13 12:09:41 +09:00
Ian Barwick
b3f64987cb repmgr: add --csv output to "cluster event"
Implements GitHub #471.
2018-07-13 11:19:42 +09:00
Ian Barwick
388ac2f392 repmgrd: enable package to supply default PID file path
Also add documentation for packagers about paths which can be patched
as default package values.
2018-07-13 10:26:47 +09:00
Ian Barwick
8b059bc9b0 Change default for "log_level" to INFO
Default was previously NOTICE (as in repmgr 3.x) but documentation
implied it was INFO, and many of the documentation examples assume
it is.

This produces some quite informative log output, without creating excessive
log file volume. In particular it's useful to get a better idea of what
repmgrd is actually doing.

Also add documentation section for the log configuration parameters.

GitHub #470, containing change suggested in GitHub #467.
2018-07-12 14:50:48 +09:00
Ian Barwick
cfa7155784 doc: update links to configuration file sections 2018-07-12 11:43:04 +09:00
Ian Barwick
47644b55ed doc: rearrange repmgr.conf documentation 2018-07-12 11:36:28 +09:00
Ian Barwick
17f30ec364 repmgrd: add additional local node connection check
It's possible there are corner-cases where do_election() is called while the
local connection is invalid, so perform an additional check.
2018-07-11 15:11:20 +09:00
Ian Barwick
c6b8d78bad doc: add extra emphasis about not running repmgrd during switchover
One day this will no longer be an issue, until then let's hope the
fine documentation is read.
2018-07-11 09:53:29 +09:00
Ian Barwick
ae60caacdd repmgr: make "node check" and "node status" return ERR_NODE_STATUS when appropriate
If any issue is detected (and "node check" is not being executed with a specific
individual check), "ERR_NODE_STATUS" is returned.
2018-07-05 14:31:06 +09:00
Ian Barwick
92d0e6809b repmgr: "cluster show" to return non-zero value if an issue encountered 2018-07-05 13:32:50 +09:00
Ian Barwick
4c7c681a14 repmgr: have "cluster show" exit with a non-zero value if issues detected
If any issues are detected (e.g. node not reachable, unexpected node status
etc.), "repmgr cluster show" returns exit code 25 ("ERR_NODE_STATUS").

Note that exit code 25 was introduced recently as "ERR_CLUSTER_CHECK",
however it makes sense to use this to indicate issues detected by any
command which can detect node issues.

Addresses GitHub #456.
2018-07-05 11:03:48 +09:00
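
The exit code behaviour described in the commit above lends itself to scripted health checks. The following is a minimal sketch, assuming only what the commit message states (exit code 25 signals "ERR_NODE_STATUS"); the function names and the status strings are hypothetical and not part of repmgr.

```python
import subprocess

# Exit code named in the commit message above. The Python constant is
# illustrative; only the numeric value 25 comes from the source.
ERR_NODE_STATUS = 25


def classify_exit_code(code):
    """Map a "repmgr cluster show" exit code to a coarse status string."""
    if code == 0:
        return "ok"
    if code == ERR_NODE_STATUS:
        return "node-status-issue"
    # Any other non-zero code is treated as an unrelated failure.
    return "other-error"


def check_cluster(command=("repmgr", "cluster", "show")):
    """Run the command and classify its exit code (hypothetical helper)."""
    result = subprocess.run(list(command), capture_output=True, text=True)
    return classify_exit_code(result.returncode), result.stdout
```

A monitoring script could call check_cluster() and raise an alert only when the returned status is not "ok".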
Ian Barwick
29de052dd8 repmgr: clarify intent behind --wait-sync timeout processing 2018-07-05 10:09:04 +09:00
Ian Barwick
ebf2a3a7cc doc: fix typo in release notes 2018-07-05 08:45:10 +09:00
Ian Barwick
37311e15a3 repmgr: fix "standby register --wait-sync" when no timeout provided
The default value for "wait_register_sync_seconds" was zero, which is treated
as disabling --wait-sync altogether. Default value now set to -1, which is taken
to mean no timeout value supplied.
2018-07-04 17:22:04 +09:00
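
The sentinel change in the commit above can be illustrated with a short sketch. It assumes only the semantics given in the commit message (0 disables --wait-sync, -1 means no timeout value was supplied); the function name and the returned strings are hypothetical and do not reflect repmgr's actual C implementation.

```python
def describe_wait_sync(wait_register_sync_seconds=-1):
    """Describe how a wait_register_sync_seconds value would be interpreted."""
    if wait_register_sync_seconds == 0:
        # Zero is treated as disabling --wait-sync altogether.
        return "disabled"
    if wait_register_sync_seconds < 0:
        # -1 is the new default: the option is active, no timeout supplied.
        return "no timeout supplied"
    return "timeout of %d seconds" % wait_register_sync_seconds
```

Using a negative default keeps "option not given a value" distinguishable from an explicit zero, which is the ambiguity the commit removes.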
Ian Barwick
a194cf56b3 repmgr: exit with an error if an unrecognised command line option is provided.
This matches the behaviour of other PostgreSQL utilities such as psql, though
repmgr will only abort once all command line options are parsed, so as many
errors as possible are found and displayed. If a repmgr "command" (e.g.
"repmgr primary ...") was provided, a hint about the relevant command
help section (e.g. "repmgr primary --help") will be provided alongside
the generic help command (i.e. "repmgr --help").

Addresses GitHub #464, with further improvements.
2018-07-04 11:02:50 +09:00
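
The "parse everything, then abort" approach described in the commit above can be sketched as follows. This is an illustration under stated assumptions, not repmgr's code: the function names are hypothetical and the message wording is invented for the example rather than copied from repmgr's output.

```python
def find_unknown_options(argv, known_options):
    """Collect every unrecognised option instead of stopping at the first."""
    unknown = []
    for arg in argv:
        if arg.startswith("-"):
            # Strip any "=value" suffix before comparing option names.
            name = arg.split("=", 1)[0]
            if name not in known_options:
                unknown.append(name)
    return unknown


def build_error_report(command, unknown):
    """Build error lines plus help hints (illustrative wording only)."""
    if not unknown:
        return []
    lines = ['unknown option "%s"' % opt for opt in unknown]
    if command:
        # Command-specific hint shown alongside the generic one.
        lines.append('try "repmgr %s --help" or "repmgr --help"' % command)
    else:
        lines.append('try "repmgr --help"')
    return lines
```

Collecting all problems before exiting means a user with several mistyped options sees them in a single run.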
Abhijit Menon-Sen
c4f9205f17 Merge pull request #460 from gclough/repmgr_conf_sample_typo_priority
Fixed typo in repmgr.conf.sample, "priority"
2018-07-03 17:43:57 +05:30
Abhijit Menon-Sen
6d09ebcfb5 Merge pull request #462 from gclough/repmgr_cluster_help_2
Fix "cluster cleanup" help
2018-07-03 17:43:35 +05:30
Abhijit Menon-Sen
319a29583d Merge pull request #461 from gclough/add_cluster_cleanup_help
Added "cluster cleanup" to help
2018-07-03 17:43:20 +05:30
Greg Clough
a5d47fd478 Fix "cluster cleanup" help
2018-06-29 22:57:06 +01:00
Greg Clough
190104c7db Added "cluster cleanup" to help 2018-06-29 22:54:59 +01:00
Greg Clough
ff16d3b3bb Fixed typo in repmgr.conf.sample, "priority"
2018-06-29 22:00:09 +01:00
Ian Barwick
802755fd60 repmgrd: daemonize process by default
It's hard to imagine a use case where this isn't desirable, but
in case, for whatever reason, the user does not wish to daemonize the
process, the command line option "--daemonize=false" can be provided.

Implements GitHub #458.
2018-06-29 22:01:49 +09:00
Ian Barwick
d00c0c67d0 repmgrd: document PID file options/configuration 2018-06-29 17:00:25 +09:00
Ian Barwick
8d636690bd repmgrd: create pid file by default
Traditionally repmgrd will only write a pidfile if explicitly requested with
-p/--pid-file. However it's normally desirable to have a pidfile, and it's
preferable to have one used by default to prevent accidentally starting a second
repmgrd instance.

Following changes made:

 - add configuration file parameter "repmgrd_pid_file" (initially overridden by
   -p/--pid-file for backwards compatibility, though eventually we'll want to
   drop -p/--pid-file altogether)
 - add command line option --no-pid-file
 - if neither "repmgrd_pid_file" nor -p/--pid-file is set, create the pid file
   in a temporary directory

Implements GitHub #457.
2018-06-29 14:36:24 +09:00
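
The precedence rules listed in the commit above can be summarised in a small sketch. It is a hedged illustration only: the function name is hypothetical, the file name "repmgrd.pid" and the use of the system temporary directory are assumptions, and letting --no-pid-file win over the other settings is also an assumption, since the commit message does not say how conflicting options are handled.

```python
import os
import tempfile


def resolve_pid_file(cli_pid_file=None, conf_pid_file=None, no_pid_file=False):
    """Return the PID file path to use, or None if no PID file is wanted."""
    if no_pid_file:
        # Assumption: --no-pid-file takes priority over any configured path.
        return None
    if cli_pid_file:
        # -p/--pid-file overrides "repmgrd_pid_file" for backwards compatibility.
        return cli_pid_file
    if conf_pid_file:
        return conf_pid_file
    # Neither is set: fall back to a file in a temporary directory.
    # The file name here is an assumption for the sake of the example.
    return os.path.join(tempfile.gettempdir(), "repmgrd.pid")
```

With this ordering, a PID file is always written unless it is explicitly disabled, which matches the stated goal of preventing a second repmgrd instance from starting by accident.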
Ian Barwick
b2081dca52 De-overload configuration file parameter "standby_reconnect_timeout"
Currently the (very generic sounding) "standby_reconnect_timeout" configuration
file parameter is used in several different contexts and it would be useful
to have more granular control over the different timeouts it's used to configure.

This patch introduces "node_rejoin_timeout", used in place of "standby_reconnect_timeout"
(which wasn't documented) when "repmgr node rejoin" is executed, to determine
how long to wait for the node to rejoin the replication cluster.

Additionally "repmgrd_standby_startup_timeout" is introduced as a timeout for
failover situations, when repmgrd executes "repmgr standby follow" to follow
a new primary, and waits for the standby to restart and become available
for connections.

"standby_reconnect_timeout" is now only relevant for "repmgr standby switchover".

Implements GitHub #454.
2018-06-28 18:00:55 +09:00
Ian Barwick
080a29c33b node check: add --missing-slots check
This enables an explicit check for slots which should exist (according
to the repmgr metadata) but which aren't present.
2018-06-22 17:21:40 +09:00
Ian Barwick
dd7a4068d2 node check: implement CSV output
This is advertised in the --help output and placeholder code was in
place, but it wasn't actually implemented.
2018-06-22 13:14:57 +09:00
Ian Barwick
fcf237fe31 node status: improve output and documentation
In the default text output mode, list inactive slots.

In CSV output mode, list inactive slots as additional information;
add output line with number of missing slots and a list thereof.

Also document --csv output mode.
2018-06-22 11:46:50 +09:00
Ian Barwick
4d70a667fb node check: clarify status information for witness server
Previously the output gave the impression the server was a primary,
which is technically the case, but it's not the actual cluster primary.

Also output an error if the node is in recovery, which is unlikely but
you never know.
2018-06-22 10:15:45 +09:00
Ian Barwick
c5ba72c2c5 standby switchover: fix behaviour if witness node is a sibling
The witness node is not a streaming replication standby, so executing
"repmgr standby follow" will fail. Instead, execute "repmgr witness
register --force" to update the witness node record on the primary and
its local copy of all node records.

Addresses GitHub #453.
2018-06-21 16:48:58 +09:00
Ian Barwick
0f97a98f28 repmgr: don't count witness node as a standby when running "node status"
Addresses GitHub #451.
2018-06-21 13:06:18 +09:00
Ian Barwick
269e3242c8 "repmgr node ...": update comments and formatting 2018-06-21 12:12:07 +09:00
Ian Barwick
b0ed87832b repmgr: don't count witness node as a standby when running "node check"
Addresses GitHub #451.
2018-06-21 11:13:46 +09:00