Compare commits

382 Commits

Author SHA1 Message Date
Ian Barwick
db4199e08f doc: update document build version for 4.1 branch 2018-07-24 14:02:38 +09:00
Ian Barwick
0d9ed02729 doc: fix typo 2018-07-24 14:02:08 +09:00
Ian Barwick
8e9f0b802b Create 4.1 branch 2018-07-24 10:22:31 +09:00
Ian Barwick
c236405251 Update extension metadata for 4.1 release
This release does not make any changes to the extension database
objects.
2018-07-24 09:56:43 +09:00
Ian Barwick
527a5f7fee doc: update release notes and upgrade instructions 2018-07-24 09:54:06 +09:00
Ian Barwick
937cffd54c doc: clarify BDR repmgrd configuration
Link directly to section about configuring the "event_notification_command".
2018-07-23 13:21:11 +09:00
Ian Barwick
2b1e12591a doc: fix markup errors 2018-07-23 13:18:38 +09:00
Ian Barwick
7ecfb333b9 doc: add note about switchover and exclusive backups
Also rename server_not_in_exclusive_backup_mode() to avoid double
negatives.

GitHub #476.
2018-07-19 16:02:31 +09:00
Martín Marqués
8f13a66aaa Check that there is no exclusive backup taking place while we perform
a switchover.

We've found that this can cause some issues with postgres control
metadata (could be a postgres bug) so best thing is *not* to switchover
if there's a backup taking place.

It's also a bad idea from an architectural point of view, as a switchover
is supposed to be planned, so why perform it when we are taking backups?

GitHub #476.
2018-07-19 16:02:21 +09:00
Ian Barwick
ef35d071bf Fix is_active_bdr_node() query for BDR 2.x
Copy/paste error when adapting the query for BDR 3.x.
2018-07-19 09:50:30 +09:00
Ian Barwick
b87f9dabb4 doc: remove duplicate item in list of event notifications 2018-07-18 16:10:55 +09:00
Ian Barwick
7decc7975f Fix BDR version check
regexp_match() is only available from PostgreSQL 10 and later.
2018-07-18 10:54:16 +09:00
Ian Barwick
a5cfc244bc repmgr: have "node status" check for missing downstream nodes
This matches the behaviour of "node check".
2018-07-18 10:27:19 +09:00
Ian Barwick
673bde2b7f repmgr: fix "primary_slot_name" when using "standby clone" with --recovery-conf-only
Addresses GitHub #474.
2018-07-17 13:42:10 +09:00
Martín Marqués
81de200561 Add information to the --help and docs of standby clone regarding the need
to provide a conninfo line to the upstream from which we will be
cloning.
2018-07-16 18:56:41 -03:00
Ian Barwick
cb46fb6410 repmgrd: when reloading configuration, log any errors encountered 2018-07-16 16:46:39 +09:00
Ian Barwick
bd58e4128c repmgrd: log "promote_command" at log_level "INFO"
If repmgrd is promoting the local node, it was only logging the contents
of "promote_command" at DEBUG level; it would be useful to see this at
the default log level.

Related to GitHub #473.
2018-07-16 15:33:10 +09:00
Ian Barwick
63242e2277 doc: update documentation of "promote_command" and "service_promote_command"
The documentation implied it would override "promote_command", which is
not the case.

"promote_command" is used by repmgrd to execute "repmgr standby promote"
(either directly or via a custom script).

"service_promote_command" can be set to specify a package-level service
command to promote the local PostgreSQL instance from standby to primary,
e.g. Debian's pg_ctlcluster. If set, this will be executed by "repmgr standby promote".

Also update code comments to clarify usage.

Related to GitHub #473.
2018-07-16 14:43:53 +09:00
Ian Barwick
69782cf703 repmgr: enable "witness unregister" to be run on any node
Provide the ID of the witness node with --node-id=...

Implements GitHub #472.
2018-07-13 17:37:59 +09:00
Ian Barwick
5acb3e6790 doc: update release notes 2018-07-13 15:35:34 +09:00
Ian Barwick
6dfcaa357e doc: update release notes 2018-07-13 15:06:04 +09:00
Ian Barwick
8acc50e752 Bump version number in configure.in 2018-07-13 14:05:29 +09:00
Ian Barwick
56919ea499 repmgr: add -q/--quiet option
This suppresses log output below log level ERROR. This is useful mainly
when repmgr is being executed programmatically, e.g. in a cronjob,
where it's only useful to receive output if something goes wrong.

Note we advise against using this option when executing repmgr
commands which operate on PostgreSQL nodes (standby follow,
standby promote, standby switchover, node rejoin), particularly when
executed by repmgrd, as the log output will provide valuable
troubleshooting information.

Implements suggestion in GitHub #468.
2018-07-13 12:09:41 +09:00
Ian Barwick
b3f64987cb repmgr: add --csv output to "cluster event"
Implements GitHub #471.
2018-07-13 11:19:42 +09:00
Ian Barwick
388ac2f392 repmgrd: enable package to supply default PID file path
Also add documentation for packagers about paths which can be patched
as default package values.
2018-07-13 10:26:47 +09:00
Ian Barwick
8b059bc9b0 Change default for "log_level" to INFO
Default was previously NOTICE (as in repmgr 3.x) but documentation
implied it was INFO, and many of the documentation examples assume
it is.

This produces some quite informative log output, without creating excessive
log file volume. In particular it's useful to get a better idea of what
repmgrd is actually doing.

Also add documentation section for the log configuration parameters.

GitHub #470, containing change suggested in GitHub #467.
2018-07-12 14:50:48 +09:00
Ian Barwick
cfa7155784 doc: update links to configuration file sections 2018-07-12 11:43:04 +09:00
Ian Barwick
47644b55ed doc: rearrange repmgr.conf documentation 2018-07-12 11:36:28 +09:00
Ian Barwick
17f30ec364 repmgrd: add additional local node connection check
It's possible there are corner-cases where do_election() is called while the
local connection is invalid, so perform an additional check.
2018-07-11 15:11:20 +09:00
Ian Barwick
c6b8d78bad doc: add extra emphasis about not running repmgrd during switchover
One day this will no longer be an issue, until then let's hope the
fine documentation is read.
2018-07-11 09:53:29 +09:00
Ian Barwick
ae60caacdd repmgr: make "node check" and "node status" return ERR_NODE_STATUS when appropriate
If any issue is detected (and "node check" is not being executed with a specific
individual check), "ERR_NODE_STATUS" is returned.
2018-07-05 14:31:06 +09:00
Ian Barwick
92d0e6809b repmgr: "cluster show" to return non-zero value if an issue encountered 2018-07-05 13:32:50 +09:00
Ian Barwick
4c7c681a14 repmgr: have "cluster show" exit with a non-zero value if issues detected
If any issues are detected (e.g. node not reachable, unexpected node status
etc.), "repmgr cluster show" returns exit code 25 ("ERR_NODE_STATUS").

Note that exit code 25 was introduced recently as "ERR_CLUSTER_CHECK",
however it makes sense to use this to indicate issues detected by any
command which can detect node issues.

Addresses GitHub #456.
2018-07-05 11:03:48 +09:00
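The exit code described in commit 4c7c681a14 above lends itself to scripted monitoring. The following is a minimal sketch only: the value 25 comes from the commit message, while the assumption that 0 means "no issues detected", the lumping together of all other codes, and the function name itself are illustrative and not taken from repmgr.

```python
# Exit code named in the commit message above ("ERR_NODE_STATUS").
ERR_NODE_STATUS = 25


def classify_cluster_show_exit(returncode: int) -> str:
    """Map a "repmgr cluster show" exit code to a coarse status string.

    Assumption: 0 means no issues were detected. Codes other than 0 and 25
    are lumped together because the commit message does not describe them.
    """
    if returncode == 0:
        return "ok"
    if returncode == ERR_NODE_STATUS:
        return "node-issue"
    return "other-error"
```

A caller would typically pass in the return code of the finished process, for example the `returncode` attribute of the object returned by `subprocess.run`.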
Ian Barwick
29de052dd8 repmgr: clarify intent behind --wait-sync timeout processing 2018-07-05 10:09:04 +09:00
Ian Barwick
ebf2a3a7cc doc: fix typo in release notes 2018-07-05 08:45:10 +09:00
Ian Barwick
37311e15a3 repmgr: fix "standby register --wait-sync" when no timeout provided
The default value for "wait_register_sync_seconds" was zero, which is treated
as disabling --wait-sync altogether. Default value now set to -1, which is taken
to mean no timeout value supplied.
2018-07-04 17:22:04 +09:00
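The sentinel handling described in commit 37311e15a3 above can be illustrated as follows. This is a sketch of one reading of the commit message, not repmgr's actual code: 0 disables waiting, -1 means waiting is enabled with no timeout supplied, and positive values are timeouts in seconds. The function name and the rejection of values below -1 are assumptions.

```python
from typing import Optional, Tuple

# New default described in the commit message: "no timeout value supplied".
NO_TIMEOUT_SUPPLIED = -1


def interpret_wait_sync(seconds: int) -> Tuple[bool, Optional[int]]:
    """Return (wait_enabled, timeout_seconds) for a wait-sync setting.

    0  -> waiting disabled altogether
    -1 -> waiting enabled, no timeout supplied (represented as None)
    >0 -> waiting enabled with that timeout in seconds
    Values below -1 are rejected; that choice is an assumption of this sketch.
    """
    if seconds < NO_TIMEOUT_SUPPLIED:
        raise ValueError("timeout must be -1, 0 or a positive number of seconds")
    if seconds == 0:
        return (False, None)
    if seconds == NO_TIMEOUT_SUPPLIED:
        return (True, None)
    return (True, seconds)
```

Using a dedicated sentinel keeps "option not given a value" distinct from "feature switched off", which is the confusion the old default of zero caused.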
Ian Barwick
a194cf56b3 repmgr: exit with an error if an unrecognised command line option is provided.
This matches the behaviour of other PostgreSQL utilities such as psql, though
repmgr will only abort once all command line options are parsed, so as many
errors as possible are found and displayed. If a repmgr "command" (e.g.
"repmgr primary ...") was provided, a hint about the relevant command
help section (e.g. "repmgr primary --help") will be provided alongside
the generic help command (i.e. "repmgr --help").

Addresses GitHub #464, with further improvements.
2018-07-04 11:02:50 +09:00
Abhijit Menon-Sen
c4f9205f17 Merge pull request #460 from gclough/repmgr_conf_sample_typo_priority
Fixed typo in repmgr.conf.sample, "priority"
2018-07-03 17:43:57 +05:30
Abhijit Menon-Sen
6d09ebcfb5 Merge pull request #462 from gclough/repmgr_cluster_help_2
Fix "cluster cleanup" help
2018-07-03 17:43:35 +05:30
Abhijit Menon-Sen
319a29583d Merge pull request #461 from gclough/add_cluster_cleanup_help
Added "cluster cleanup" to help
2018-07-03 17:43:20 +05:30
Greg Clough
a5d47fd478 Fix "cluster cleanup" help
2018-06-29 22:57:06 +01:00
Greg Clough
190104c7db Added "cluster cleanup" to help 2018-06-29 22:54:59 +01:00
Greg Clough
ff16d3b3bb Fixed typo in repmgr.conf.sample, "priority"
2018-06-29 22:00:09 +01:00
Ian Barwick
802755fd60 repmgrd: daemonize process by default
It's hard to imagine a use case where this isn't desirable, but
in case, for whatever reason, the user does not wish to daemonize the
process, the command line option "--daemonize=false" can be provided.

Implements GitHub #458.
2018-06-29 22:01:49 +09:00
Ian Barwick
d00c0c67d0 repmgrd: document PID file options/configuration 2018-06-29 17:00:25 +09:00
Ian Barwick
8d636690bd repmgrd: create pid file by default
Traditionally repmgrd will only write a pidfile if explicitly requested with
-p/--pid-file. However it's normally desirable to have a pidfile, and it's
preferable to have one used by default to prevent accidentally starting a second
repmgrd instance.

Following changes made:

 - add configuration file parameter "repmgrd_pid_file" (initially overridden by
   -p/--pid-file for backwards compatibility, though eventually we'll want to
   drop -p/--pid-file altogether)
 - add command line option --no-pid-file
 - if neither "repmgrd_pid_file" nor -p/--pid-file is set, create the pid file
   in a temporary directory

Implements GitHub #457.
2018-06-29 14:36:24 +09:00
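The PID file precedence listed in commit 8d636690bd above can be sketched as a small resolution function. The order (command line option over "repmgrd_pid_file", temporary directory as fallback) follows the commit message; that --no-pid-file wins over both settings, the fallback file name "repmgrd.pid", and the function name are assumptions made for illustration.

```python
import os
import tempfile
from typing import Optional


def resolve_pid_file(cli_pid_file: Optional[str],
                     conf_pid_file: Optional[str],
                     no_pid_file: bool) -> Optional[str]:
    """Return the PID file path to use, or None if no PID file is wanted."""
    # Assumption: --no-pid-file takes priority over every other setting.
    if no_pid_file:
        return None
    # -p/--pid-file overrides the configuration file parameter.
    if cli_pid_file:
        return cli_pid_file
    # "repmgrd_pid_file" from the configuration file.
    if conf_pid_file:
        return conf_pid_file
    # Neither set: fall back to a temporary directory.
    # The file name used here is hypothetical.
    return os.path.join(tempfile.gettempdir(), "repmgrd.pid")
```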
Ian Barwick
b2081dca52 De-overload configuration file parameter "standby_reconnect_timeout"
Currently the (very generic sounding) "standby_reconnect_timeout" configuration
file parameter is used in several different contexts and it would be useful
to have more granular control over the different timeouts it's used to configure.

This patch introduces "node_rejoin_timeout", used in place of "standby_reconnect_timeout"
(which wasn't documented) when "repmgr node rejoin" is executed, to determine
how long to wait for the node to rejoin the replication cluster.

Additionally "repmgrd_standby_startup_timeout" is introduced as a timeout for
failover situations, when repmgrd executes "repmgr standby follow" to follow
a new primary, and waits for the standby to restart and become available
for connections.

"standby_reconnect_timeout" is now only relevant for "repmgr standby switchover".

Implements GitHub #454.
2018-06-28 18:00:55 +09:00
Ian Barwick
080a29c33b node check: add --missing-slots check
This enables an explicit check for slots which should exist (according
to the repmgr metadata) but which aren't present.
2018-06-22 17:21:40 +09:00
Ian Barwick
dd7a4068d2 node check: implement CSV output
This is advertised in the --help output and placeholder code was in
place, but it wasn't actually implemented.
2018-06-22 13:14:57 +09:00
Ian Barwick
fcf237fe31 node status: improve output and documentation
In the default text output mode, list inactive slots.

In CSV output mode, list inactive slots as additional information;
add output line with number of missing slots and a list thereof.

Also document --csv output mode.
2018-06-22 11:46:50 +09:00
Ian Barwick
4d70a667fb node check: clarify status information for witness server
Previously the output gave the impression the server was a primary,
which is technically the case, but it's not the actual cluster primary.

Also output an error if the node is in recovery, which is unlikely but
you never know.
2018-06-22 10:15:45 +09:00
Ian Barwick
c5ba72c2c5 standby switchover: fix behaviour if witness node is a sibling
The witness node is not a streaming replication standby, so executing
"repmgr standby follow" will fail. Instead, execute "repmgr witness
register --force" to update the witness node record on the primary and
its local copy of all node records.

Addresses GitHub #453.
2018-06-21 16:48:58 +09:00
Ian Barwick
0f97a98f28 repmgr: don't count witness node as a standby when running "node status"
Addresses GitHub #451.
2018-06-21 13:06:18 +09:00
Ian Barwick
269e3242c8 "repmgr node ...": update comments and formatting 2018-06-21 12:12:07 +09:00
Ian Barwick
b0ed87832b repmgr: don't count witness node as a standby when running "node check"
Addresses GitHub #451.
2018-06-21 11:13:46 +09:00
Ian Barwick
836d2125fe Improve BDR3 node query
We can get everything we need from bdr.node_summary
2018-06-15 14:30:06 +09:00
Ian Barwick
bf0d67c60a Add repmgr.nodes to the BDR replication set 2018-06-15 14:29:08 +09:00
Ian Barwick
e1d807188d Add extension upgrade files 2018-06-15 14:27:42 +09:00
Ian Barwick
108c3a36fb Enable creation of repmgr extension on BDR3 node 2018-06-15 14:26:47 +09:00
Ian Barwick
8377704596 Convert BDR query functions to handle BDR2/BDR3 2018-06-15 14:26:07 +09:00
Ian Barwick
4f642f8332 Detect and store BDR major version number when executing "is_bdr_db()"
BDR3 metadata structure is very different to BDR1/2, so we'll need to
generate queries according to version.
2018-06-15 14:25:55 +09:00
Ian Barwick
029ba46470 doc: remove info about old RPM package repository 2018-06-15 13:27:19 +09:00
Ian Barwick
098f8eaf2a doc: finalize release notes 2018-06-15 13:27:14 +09:00
Ian Barwick
d60bd232f0 Enable "recovery_min_apply_delay" to be zero.
Addresses GitHub #448.
2018-06-14 11:11:33 +09:00
Ian Barwick
eca1943026 doc: emphasize that repmgrd should not be running during a switchover 2018-06-12 10:30:35 +09:00
Ian Barwick
bcab4bc391 _create_event(): log event and node ID for debugging 2018-06-12 10:30:30 +09:00
Ian Barwick
bb320a64f5 repmgr: consolidate code in "standby switchover"
Commit 41274f5525 left us with two if statements
in sequence with exactly the same condition, so consolidate both into a single
statement. Clarify code comments while we're at it.
2018-06-12 10:30:24 +09:00
Ian Barwick
3b0cde2846 repmgr: cluster check commands - non-zero exit code if node(s) unavailable
Return ERR_CLUSTER_CHECK if one or more nodes were not reachable.

Implements GitHub #447.
2018-06-12 10:30:11 +09:00
Ian Barwick
00704913a6 doc: 4.0.6 release notes 2018-06-12 10:29:35 +09:00
Ian Barwick
efc388065e standby follow: check node has connected to new primary
After restarting the standby, poll pg_stat_replication on the upstream
until the standby connects, and exit with an error if it doesn't by the
timeout defined in "standby_follow_timeout".

Implements GitHub #444.
2018-06-07 15:04:45 +09:00
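The poll-until-timeout pattern described in commit efc388065e above can be sketched generically. This is not repmgr's implementation: the check against pg_stat_replication is represented by a caller-supplied function, and every name below is hypothetical. The clock and sleep functions are injectable so the loop can be exercised without real delays.

```python
import time
from typing import Callable


def wait_for_attach(is_attached: Callable[[], bool],
                    timeout_secs: float,
                    interval_secs: float = 1.0,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll is_attached() until it returns True or timeout_secs elapses.

    Returns True if the standby was seen as attached, False on timeout.
    The check always runs at least once, even with a zero timeout.
    """
    deadline = clock() + timeout_secs
    while True:
        if is_attached():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_secs)
```

In a real caller, `is_attached` would query the upstream node and `timeout_secs` would come from "standby_follow_timeout"; a False result corresponds to exiting with an error.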
Ian Barwick
e12fbb7b4d doc: update release notes 2018-06-07 15:04:38 +09:00
Ian Barwick
0108fb2e72 standby follow: add hint about using "node rejoin"
If "repmgr standby follow" is executed on a node which isn't running,
point out "repmgr node rejoin" should probably be used instead.
2018-06-07 15:04:30 +09:00
Ian Barwick
e408351697 doc: fix typos 2018-06-07 15:04:25 +09:00
Ian Barwick
f904cd2573 witness_register: check for existing node with same name 2018-06-07 15:04:18 +09:00
Ian Barwick
95fe7ea621 repmgrd: ensure local node is counted as quorum member
Rename "standby_nodes" to "sibling_nodes" to make it clearer in the
code what total is actually provided by the struct.

Addresses GitHub #439.
2018-06-07 15:04:12 +09:00
Ian Barwick
a50ac039da doc: fix typo 2018-06-07 15:04:06 +09:00
Ian Barwick
535fba43d3 standby clone: improve external configuration file copying
If --copy-external-config-files was provided, check that we can copy
the files *before* cloning the standby, and abort if an error is
encountered. This will give the user the opportunity to fix any issues
before running the entire (and potentially lengthy) clone.

Previously errors were logged but no action taken, and the final
message indicated the clone operation was successful.

Addresses GitHub #443.
2018-06-07 15:04:01 +09:00
Ian Barwick
043a6c5bea repmgrd: ensure degraded monitoring timeout works on standby
Parameter "degraded_monitoring_timeout" was not being acted on when
monitoring a streaming replication standby.

Addresses GitHub #439.
2018-06-07 15:03:52 +09:00
Ian Barwick
8da26f1c6c If --dry-run specified, ensure minimum log level is INFO
When executed with --dry-run, repmgr outputs detail about what would
happen using log level INFO. If the log_level is configured to
NOTICE or higher, it's possible some or all of the --dry-run output
might not be displayed.

Addresses GitHub #441.
2018-06-07 15:03:43 +09:00
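The log level adjustment described in commit 8da26f1c6c above amounts to capping the configured level at INFO when --dry-run is in effect. A minimal sketch, assuming the severity ordering DEBUG < INFO < NOTICE < WARNING < ERROR (only the INFO/NOTICE relationship is stated in the commit message; the list is deliberately partial and the function name is hypothetical):

```python
# Assumed ordering, least to most severe. Only a subset of levels is listed.
LOG_LEVELS = ["DEBUG", "INFO", "NOTICE", "WARNING", "ERROR"]


def effective_log_level(configured: str, dry_run: bool) -> str:
    """Return the log level to use, lowering it to INFO for --dry-run.

    A configured level of DEBUG is left alone, since it already shows
    INFO output.
    """
    if dry_run and LOG_LEVELS.index(configured) > LOG_LEVELS.index("INFO"):
        return "INFO"
    return configured
```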
Ian Barwick
7861392450 node rejoin: avoid outputting empty DETAIL message 2018-06-07 15:03:36 +09:00
Ian Barwick
b297e40d77 node rejoin: improve handling of --config-file parameter
Fixes bug when parsing --config-file values (GitHub #442).

Also improves handling in --dry-run mode, as some checks for the
provided files were being skipped if --dry-run supplied, even though
they are intended to work with --dry-run.
2018-06-07 15:03:30 +09:00
Ian Barwick
7613b1769c standby clone: --recovery-conf-only expects the standby to be registered
Note this in the documentation, and add a HINT about registering it
if the standby record is not available.

Related to GitHub #438.
2018-05-31 09:42:53 +09:00
Ian Barwick
b1b49748a7 "config_file" is MAXPGPATH, not MAXLEN
The two values are the same anyway, so change is more for consistency.
2018-05-24 15:52:57 +09:00
Ian Barwick
276239422b standby clone: don't assume existence of "user" in upstream conninfo
Usually a separate user (typically "repmgr") is set up specifically to manage
the repmgr metadata, however there's no compelling requirement to do this, and
it's possible the database owner (usually: "postgres") will be used, in which
case it's possible the username will be left out of the conninfo string.

Addresses GitHub #437.
2018-05-24 15:52:51 +09:00
Martín Marqués
49418e096e Fix typo in a code comment 2018-05-19 12:30:03 -03:00
Ian Barwick
6c518f1403 "standby clone": log actual connection string used to connect to upstream
Useful for diagnostic purposes.
2018-05-10 12:03:13 +09:00
Ian Barwick
b365765bc8 Fix check for -d/--dbname parameter
Not a bug per se, just meant some unnecessary processing was done on
an empty string.

Per note from petere.
2018-05-10 12:03:09 +09:00
Ian Barwick
bd63948937 Include "arpa/inet.h" in dbutils.c
Needed for htonl() on FreeBSD.
2018-05-10 12:03:04 +09:00
Ian Barwick
69c1f147ea doc: update 2ndQuadrant repository information
Canonical link for each repository should not include any directories.
2018-05-10 10:39:31 +09:00
Ian Barwick
ce8d3cf0b0 doc: update repository information 2018-05-10 10:39:27 +09:00
Ian Barwick
14134f8e70 doc: update package installation information
Document the new public 2ndQuadrant apt repository
2018-05-10 10:39:23 +09:00
Ian Barwick
be8448ddcb doc: update package installation information
Document the new, public 2ndQuadrant RPM repository.
2018-05-10 10:39:18 +09:00
Ian Barwick
a2ff1536ad doc: add notes about package compatibility
We need to emphasise that the repmgr packages are only compatible
with packages based on the PGDG filesystem layout; 3rd party vendor
packages often put application and data directories elsewhere.
See e.g. GitHub #427.
2018-05-10 10:38:54 +09:00
Ian Barwick
9c0c1b663e Minor documentation fixes 2018-05-10 10:25:29 +09:00
Ian Barwick
2d43feb34b doc: update HISTORY and add 4.0.5 release notes 2018-05-01 10:21:40 +09:00
Ian Barwick
6f315c1b3c repmgrd: don't explicitly close connections on shutdown 2018-05-01 10:21:10 +09:00
Ian Barwick
635bdccb2c Fix parsing of "archive_ready_critical" configuration file parameter.
Per report in GitHub #426.
2018-04-28 07:00:56 +09:00
Ian Barwick
16048a879e repmgrd: notify sibling nodes to follow new primary after pg_ctl timeout
If "pg_ctl promote" fails due to a timeout, but the promotion itself succeeds,
have repmgrd on the new primary explicitly notify any sibling nodes to
follow it.

Previously the sibling nodes would wait "primary_notification_timeout" seconds
before attempting to discover the new primary.

This (and preceding commit eac80ae) address GitHub #425.
2018-04-27 11:54:21 +09:00
Ian Barwick
eac80ae9c1 repmgrd: handle pg_ctl timeout
It's possible "pg_ctl promote" will time out, causing "repmgr standby
follow" to return with an error; however the promotion itself will usually
succeed, so detect this case and handle accordingly.
2018-04-26 19:19:42 +09:00
Ian Barwick
887b845aa0 repmgrd: always close the connection if the pointer is not NULL 2018-04-26 10:04:07 +09:00
Ian Barwick
8320179f34 Add configuration file parameter "config_directory"
This enables explicit provision of an external configuration file
directory, which if set will be passed to "pg_ctl" as the -D
parameter. Otherwise "pg_ctl" will default to using the data directory,
which will cause some operations to fail if the configuration files
are not present there.

Note this is implemented primarily for feature completeness and for
development/testing purposes. Users who have installed "repmgr" from
a package should not rely on "pg_ctl" to stop/start/restart PostgreSQL,
instead they should set the appropriate "service_..._command" for their
operating system. For more details see:

    https://repmgr.org/docs/4.0/configuration-service-commands.html

Note: in a future release, the presence of "config_directory" in repmgr.conf
will be used to implicitly set "--copy-external-config-files=samepath" when
cloning a standby; this is a behaviour change so will be implemented in the
next major release (repmgr 4.1).

Implements GitHub #424.
2018-04-25 11:58:24 +09:00
Ian Barwick
7822aa784f repmgrd: catch corner case in standby connection handle check
If repmgrd marks the local node as unavailable, and it was actually
restarting but a failover event occurred before the next local node
check, failover will continue with the stale connection handle.

Add a final local node check just before starting the failover
process, so repmgrd can reconnect if it wasn't able to before.
2018-04-24 21:56:57 +09:00
Ian Barwick
4455ded935 repmgrd: prevent standby connection handle from going stale
If monitoring history is not in use, there's no activity on the standby's
connection handle, so if e.g. the standby is restarted, PQstatus()
never returns CONNECTION_BAD and repmgrd never notices the connection
is stale. Therefore execute a throw-away statement at "monitor_interval_secs".
2018-04-24 21:56:52 +09:00
Ian Barwick
fd0b850f41 Minor doc and log output tweaks 2018-04-24 21:08:05 +09:00
Ian Barwick
d9ac1d6fd0 doc: minor clarification 2018-04-20 12:58:46 +09:00
Ian Barwick
11e4d9fd05 doc: additional details about repmgrd usage in Debian/Ubuntu 2018-04-20 12:58:41 +09:00
Ian Barwick
4b54106f48 doc: add Debian package details 2018-04-20 12:58:37 +09:00
Ian Barwick
f3941ceab0 doc: Improve CentOS package-related documentation 2018-04-20 12:58:33 +09:00
Ian Barwick
93f80c413e doc: link to service command configuration from switchover section 2018-04-20 10:15:22 +09:00
Ian Barwick
09b8a86605 doc: improve configuration documentation
With special attention to setting service commands, and extra special
mention of "pg_ctlcluster" for Debian/Ubuntu users.
2018-04-20 10:15:18 +09:00
Ian Barwick
6b3d54a5f3 doc: update CentOS package documentation 2018-04-20 10:15:14 +09:00
Ian Barwick
85ab2d94b7 repmgrd: tweak event notifications on standby failure
The event notification was only being created if there was a valid
primary connection; it should be created in any case, so an event
notification script can be executed.
2018-04-20 10:15:08 +09:00
Ian Barwick
cda952f1e4 Add "dbname=replication" to all replication connection strings
Previously repmgr was attempting to make replication connections
with "dbname" set to the repmgr database name. While this works
if e.g. the repmgr user also has replication permissions, it will
fail if a dedicated replication user is specified, who only has
permission to access the virtual "replication" database.

Change this to use "dbname=replication" if the replication connection
user is different to the normal repmgr database user.

(We could just always set it to "replication", but that might break
existing installations e.g. where a .pgpass file is in use and there's
no "replication" entry for the normal repmgr database user).

Addresses GitHub #421.
2018-04-12 16:11:16 +09:00
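The rule described in commit cda952f1e4 above can be expressed as a single comparison. A minimal sketch, with a hypothetical function name and parameters; it only captures the decision stated in the commit message, not how repmgr assembles the full connection string:

```python
def replication_dbname(repmgr_user: str,
                       replication_user: str,
                       repmgr_dbname: str) -> str:
    """Pick the "dbname" value for a replication connection.

    Use the virtual "replication" database only when a dedicated
    replication user is in play; otherwise keep the repmgr database name
    so existing setups (e.g. .pgpass entries) continue to work.
    """
    if replication_user != repmgr_user:
        return "replication"
    return repmgr_dbname
```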
Ian Barwick
99ad57f88a doc: mention --recovery-conf-only introduced in repmgr 4.0.4
Per GitHub #419.
2018-04-12 16:11:12 +09:00
Ian Barwick
ad0671ead2 doc: various updates related to "standby clone" operations. 2018-04-12 16:11:07 +09:00
Ian Barwick
1bbb2ef213 Fix superuser password handling
When establishing a superuser connection, the connection parameters
were being copied from the existing (non-superuser) connection, which
in some circumstances can lead to that user's password being
included in the copied parameter list. The password parameter, if set, will
now always be removed, which will cause libpq to retrieve the correct
one from the .pgpass file.

Addresses GitHub #400.
2018-04-12 12:49:41 +09:00
Ian Barwick
62c29aab32 Don't issue a CHECKPOINT after promoting a standby.
Issuing a CHECKPOINT immediately after promoting a standby may impact
performance. Commit 239a548e9d ensures
one is only issued when required, i.e. during a switchover when
pg_rewind will be executed.

This reverts commit a2068768ab.
2018-04-09 14:35:54 +09:00
Ian Barwick
b9dc94f28f doc: update FAQ location 2018-04-07 11:46:10 +09:00
Ian Barwick
e8ba213174 "standby register": add sanity check when --upstream-node-id not supplied
If --upstream-node-id was not supplied to "repmgr standby register",
repmgr defaults to the primary node as upstream node. If the local node is
available, we now double-check that it's attached to the primary,
in case the lack of --upstream-node-id was an accidental omission.

This check is only made when the local node is available.

This behaviour can be overridden with -F/--force (though it's hard to
imagine a scenario where that would be useful).

Addresses GitHub #395.
2018-04-05 17:38:55 +09:00
Ian Barwick
0dcddbb062 doc: minor FAQ tweaks 2018-04-05 17:10:33 +09:00
Ian Barwick
b4dab86c3b doc: add a section about repmgrd and service commands etc. 2018-04-05 11:49:08 +09:00
Ian Barwick
644a56a645 doc: miscellaneous FAQ updates
 - clarify pg_rewind item
 - add note about what's included in recovery.conf
2018-04-04 10:07:08 +09:00
Ian Barwick
4876a9fde3 Add TODO for pg_rewind changes coming in PostgreSQL 11 2018-04-03 21:56:46 +09:00
Ian Barwick
ec998bf9c5 doc: update HISTORY and release notes 2018-04-03 15:00:49 +09:00
Ian Barwick
e36b180de8 Ensure correct server version number used for replication stats query 2018-04-03 14:45:37 +09:00
Ian Barwick
a2068768ab Execute a CHECKPOINT immediately after promoting the server
This ensures "pg_control" is updated with the latest timeline, mainly
to ensure that if "pg_rewind" is executed as part of a switchover
that it sees the latest timeline.

Per suggestion from GitHub user "superflav" in GitHub #378.

See also:

  https://www.postgresql.org/message-id/flat/20150428180253.GU30322%40tamriel.snowman.net
2018-04-03 14:44:44 +09:00
Ian Barwick
bde9fea48c Fix directory creation when cloning from Barman 2018-04-03 14:44:03 +09:00
Ian Barwick
cdaf84c329 doc: minor readability fix 2018-04-03 14:42:48 +09:00
Ian Barwick
c4cd0c46da doc: add note about replication slots and PostgreSQL upgrades 2018-04-03 14:41:58 +09:00
Ian Barwick
3b00dc912a Catch various corner cases when restarting a PostgreSQL instance 2018-04-03 14:40:53 +09:00
Ian Barwick
1a80de1290 doc: document "primary_follow_timeout" configuration file parameter. 2018-04-03 14:39:38 +09:00
Ian Barwick
26b565dff2 Improve repmgrd logging in BDR mode
Also ensure interval status log line is shown as intended
2018-04-03 14:38:32 +09:00
Ian Barwick
96811ccc01 repmgrd: tweak log notices when marking a standby as failed
Announce what we're going to do (set the node record inactive) *before*
performing the action. Makes reading the log slightly easier.
2018-04-03 14:37:43 +09:00
Ian Barwick
73982859f6 repmgrd: improve log output
- emit explicit startup NOTICE
- emit NOTICE when falling back to degraded monitoring on a primary node
- improve log message and event notification details when monitoring
  a former primary which has been reconnected as a standby
2018-04-03 14:37:06 +09:00
Ian Barwick
afb7ca886c doc: note change of shared library name from "repmgr_funcs" to "repmgr" 2018-04-03 14:35:45 +09:00
Ian Barwick
df11ad894f doc: update release notes
Add note about requiring 4.0.3 or later on all nodes when performing
a switchover from a node running 4.0.3 or later.

Per report in GitHub #388.
2018-04-03 14:35:18 +09:00
Ian Barwick
614b4ae84b doc: update 4.0.4 release date 2018-04-03 14:34:24 +09:00
Ian Barwick
1e1b4b1a65 "standby register/follow": provide primary node details for event notifications
For events generated by these commands, it may be useful to know details
of the primary node. This makes the following additional parameters available
to event notification scripts:

- %p: node ID of the primary
- %a: node name of the primary
- %c: conninfo string for the primary

Implements GitHub #375
2018-04-03 14:32:19 +09:00
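The placeholders introduced in commit 1e1b4b1a65 above (%p, %a, %c) are substituted into the event notification command. The sketch below shows the idea only: the placeholder meanings come from the commit message, while the function name and the substitution strategy are assumptions and do not reflect repmgr's own C implementation. No shell quoting is applied.

```python
import re


def expand_event_command(template: str,
                         primary_id: int,
                         primary_name: str,
                         primary_conninfo: str) -> str:
    """Expand %p, %a and %c in an event notification command template."""
    values = {
        "%p": str(primary_id),    # node ID of the primary
        "%a": primary_name,       # node name of the primary
        "%c": primary_conninfo,   # conninfo string for the primary
    }
    # Single pass, so substituted values are never expanded a second time.
    return re.sub(r"%[pac]", lambda m: values[m.group(0)], template)
```

For example, a template such as `notify.sh %p %a` (script name hypothetical) would expand to `notify.sh 1 node1` for a primary with ID 1 and name "node1".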
Ian Barwick
cf64f9e95c Always initialise t_conninfo_param_list structures 2018-04-03 14:31:24 +09:00
Ian Barwick
dfdebd6c08 Enable provision of "archive_cleanup_command" in recovery.conf
If "archive_cleanup_command" is defined in "repmgr.conf", a corresponding
entry will be made in the node's "recovery.conf" file after cloning a
standby.

Note that we recommend using PgBarman to manage WAL archives, but are
providing this facility to help repmgr to be integrated in existing environments.

Implements GitHub #416.
2018-04-03 14:10:21 +09:00
Ian Barwick
63a11f8926 "standby promote": make timeout values configurable
This introduces the following new configuration file parameters, which
were previously hard-coded values:

 - promote_check_timeout
 - promote_check_interval

Implements GitHub #387.
2018-04-03 14:10:14 +09:00
Ian Barwick
a3f371b8c0 "node rejoin": actively check for node to rejoin cluster
Previously repmgr was relying on whatever command was configured to
start PostgreSQL to determine whether the node being rejoined had
started correctly. However it's preferable to actively poll the upstream
to confirm it has restarted and actually attached as a standby before
confirming success of the "node rejoin" action.

This can be overridden with the -W/--no-wait option.

(Note that for consistency with other PostgreSQL utilities, the
short form of the --wait option is now "-w"; this is currently
only used in "repmgr standby follow".)

Also update "repmgr node rejoin" documentation with a list of supported
options, and add some useful index entries for "pg_rewind".

Implements GitHub #415.
2018-04-03 10:34:44 +09:00
Ian Barwick
938692c169 doc: fix option description for "repmgr primary register" 2018-04-03 10:09:24 +09:00
Ian Barwick
ad24b04c35 Refactor pg_control parsing
The "data_checksum_version" field is towards the end of the ControlFileData struct,
meaning its position varies between versions. Previously this wasn't a problem
as it was only required for operations involving 9.5 and later, and its position
within the control file has not changed between the current release and current
HEAD.

However, in order to support pg_rewind in 9.3 and 9.4, which both have changes in
the control file format, we'll need version-specific parsing. This will also make
it easier to deal with any future changes to the control file format.
2018-04-02 20:54:42 +09:00
Ian Barwick
3ccf1cf182 Enable pg_rewind to be used with PostgreSQL 9.3/9.4
pg_rewind is not part of the core distribution for those, but we
provided support in repmgr 3.3 so should extend it to repmgr 4.

Note that there is no check in place whether the pg_rewind binary
exists, so it's up to the user to ensure it's present.

Addresses GitHub #413.
2018-04-02 20:54:29 +09:00
Ian Barwick
5e4bdb5a1b repmgrd: handle failover with two nodes in the primary location
If two nodes were in the primary location, and at least one node in
another location, the non-failed node in the primary location was not
recognising itself as a promotion candidate.

Addresses GitHub #407.
2018-04-02 20:51:27 +09:00
Ian Barwick
50321bb95d Log pg_control access errors as WARNINGs rather than DEBUG
This will make it easier to diagnose issues, possibly with an incorrect
"data_directory" setting in "repmgr.conf".
2018-04-02 09:28:56 +09:00
Ian Barwick
253c215c12 Add TODO list
This file will collate various requests and ideas for future development.
In particular it will reference requests which come in via the GitHub issue
tracker, so we can acknowledge and close off the request and not have an
open unresolved issue hanging around.
2018-03-30 14:24:36 +09:00
Ian Barwick
22c40ae62d doc: update HISTORY and release notes 2018-03-30 09:41:48 +09:00
Ian Barwick
239a548e9d "standby switchover": force checkpoint if pg_rewind requested.
Addresses issue described in GitHub #378.

PostgreSQL itself doesn't issue a checkpoint after promotion to ensure
the newly promoted server is available as quickly as possible, so we'll
only execute an explicit CHECKPOINT when it's actually required, i.e.
when pg_rewind will be executed. This is required as pg_rewind uses
the timeline reported in the pg_control file to compare with the
server to be rewound, and the pg_control timeline is only updated after
the first checkpoint, so there is an interval where pg_rewind will
erroneously assume both servers are on the same timeline and take no action.
2018-03-29 23:55:08 +09:00
Ian Barwick
231ef5563e "standby switchover": update hint 2018-03-29 23:41:59 +09:00
Ian Barwick
e1413fa8ea Fix minimum accepted value for "degraded_monitoring_timeout"
Should be -1, the default.

Addresses GitHub #411.
2018-03-29 21:15:03 +09:00
Ian Barwick
7111483b65 repmgr: move demoted primary check to the final step during switchover
This will give the demoted primary more time to start up as a standby,
during which "standby follow" can be executed on sibling nodes, if
specified.
2018-03-27 16:44:15 +09:00
Ian Barwick
1558497ae4 repmgr: poll demoted primary after restart during switchover
During a switchover operation, once the demoted primary has been restarted
as a standby, repmgr attempts to reconnect to verify its status and drop
any redundant replication slots. However it's possible the standby may still
be in the startup phase, so poll for "standby_reconnect_timeout" seconds
before giving up.

Addresses GitHub #408.
2018-03-27 16:44:10 +09:00
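
The polling behaviour described in 1558497ae4 (retry until "standby_reconnect_timeout" expires) can be illustrated with a minimal sketch. This is not repmgr's C implementation; `poll_until`, its parameters and the injected `sleep`/`clock` hooks are hypothetical names chosen for the example.

```python
import time
from typing import Callable


def poll_until(check: Callable[[], bool],
               timeout_secs: float,
               interval_secs: float = 1.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Call check() repeatedly until it succeeds or timeout_secs has elapsed.

    Hypothetical helper: 'check' stands in for "can we connect to the
    restarted standby yet?".
    """
    deadline = clock() + timeout_secs
    while True:
        if check():
            return True          # node is reachable, stop polling
        if clock() >= deadline:
            return False         # give up once the timeout has expired
        sleep(interval_secs)     # wait before the next attempt
```

The check is always attempted at least once, so a timeout of 0 degrades to a single connection attempt rather than skipping the check.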
Ian Barwick
9c5e76401f Fix "repmgr cluster crosscheck" output
Addresses GitHub #398.
2018-03-27 16:44:04 +09:00
Ian Barwick
a403da67bc Consolidate connection closure calls 2018-03-27 16:43:59 +09:00
Ian Barwick
71b13f5307 doc: add note about remote command execution
When executing a command on a remote server, repmgr expects the remote binary
to be in the same location as the local binary. It's reasonable to assume
repmgr will be deployed in a unified environment; if not, the onus is on the
user to ensure repmgr can find the remote binary, e.g. by creating appropriate
symlinks.

Addresses query in GitHub #406.
2018-03-27 16:43:55 +09:00
Ian Barwick
1c5561d114 Misc tweaks to witness code 2018-03-26 20:59:29 +09:00
Ian Barwick
c0b607ef41 doc: update list of event notifications 2018-03-23 10:40:39 +08:00
Ian Barwick
462fdca4b4 Tidy up queries in dbutils.c
- standardize formatting
- prefix various internal function calls with "pg_catalog.", to
  mitigate possible risks from CVE-2018-1058
2018-03-23 10:28:28 +08:00
Ian Barwick
0e55a60660 Add event "repmgrd_failover_aborted" 2018-03-21 13:23:06 +09:00
Ian Barwick
93deab3e96 Add error code ERR_FOLLOW_FAIL 2018-03-21 13:11:30 +09:00
Ian Barwick
81c69e3677 repmgrd: fix typo 2018-03-21 12:36:15 +09:00
Ian Barwick
0219f4c91f Always set "connect_timeout" when pinging a PostgreSQL instance
Insert "connect_timeout=2" into the connection parameters, if not
explicitly set by the user. This will prevent excessive wait time
for the host operating system to report a connection timeout.
2018-03-21 11:48:57 +09:00
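
As a rough illustration of 0219f4c91f, the default can be applied only when the user has not supplied a value. The sketch below models the connection parameters as a plain dictionary; `with_default_connect_timeout` is a hypothetical name, and repmgr itself does this in C on its own parameter list.

```python
def with_default_connect_timeout(params: dict, default: int = 2) -> dict:
    """Return a copy of params with connect_timeout set, unless already present."""
    result = dict(params)                             # leave the caller's dict untouched
    result.setdefault("connect_timeout", str(default))
    return result


# A user-supplied value is preserved; otherwise the default of 2 is inserted.
no_timeout = with_default_connect_timeout({"host": "node1", "dbname": "repmgr"})
own_timeout = with_default_connect_timeout({"host": "node1", "connect_timeout": "10"})
```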
Ian Barwick
85a4adc99c Update HISTORY 2018-03-21 06:48:32 +09:00
Martín Marqués
208d7d418e While reviewing 7cb6e5af8d before merging
I noticed that besides the result cleanup added, there was still a missing
spot inside the if condition.

Adding the PQclear that was missing.
2018-03-13 11:43:36 -03:00
Martín Marqués
7cb6e5af8d Merge pull request #403 from AndrzejNowicki/master
Clear node list to avoid memory leak on witness
2018-03-13 11:41:10 -03:00
Andrzej Nowicki
d2a2df13d5 One more memory leak fixed 2018-03-13 11:23:33 +01:00
Andrzej Nowicki
358e001218 Clear node list to avoid memory leak, fixes #402 2018-03-13 11:05:24 +01:00
Ian Barwick
d7702b3444 Correctly handle error message pointer when parsing strings.
When parsing conninfo strings, ensure the error message pointer is
actually returned to the caller.

Not a critical issue, just meant the contents of the error message
were not being displayed.
2018-03-10 14:29:12 +09:00
Ian Barwick
a8286030c0 doc: update "repmgr primary unregister" description
As noted by GitHub user yonj1e in GitHub #396.
2018-03-08 19:11:41 +09:00
Ian Barwick
ff0ba3e19a doc: update FAQ
Additional clarification for "repmgr standby clone --recovery-conf-only"
2018-03-08 19:11:33 +09:00
Ian Barwick
6f5cce7e6f doc: update FAQ
Add entry about upgrading PostgreSQL
2018-03-08 19:11:21 +09:00
Ian Barwick
509f7a8255 Fix parsing of -k/--keep-history option
GitHub #394.
2018-03-07 19:22:04 +09:00
Ian Barwick
e8cdf72ecd Add 4.0.4 release notes 2018-03-07 19:21:49 +09:00
Ian Barwick
2a99dfa15b repmgrd: fix failover handling in "manual" mode
Regression was introduced in commit c7a585c555
2018-03-07 19:21:40 +09:00
Ian Barwick
bad034f7ee repmgrd: remove duplicate local record check in BDR mode 2018-03-07 19:21:33 +09:00
Ian Barwick
cdb504d700 Add event "repmgrd_shutdown"
Implements GitHub #393
2018-03-06 11:00:03 +09:00
Ian Barwick
0af2077bed repmgrd: add debug log output for "monitor_interval_secs" sleep in all modes 2018-03-06 10:56:21 +09:00
Emre Hasegeli
dea87b7285 Add witness options to the main help
GitHub #392
2018-03-06 10:55:06 +09:00
Martín Marqués
d6b13f3428 Merge pull request #391 from hasegeli/helpmissing
Add missing options to the main help
2018-03-02 15:36:53 -03:00
Emre Hasegeli
5808d8190e Add missing options to the main help 2018-03-02 17:08:50 +01:00
Ian Barwick
d2a5cc23cc "standby clone": improve replication user selection
Use the upstream node's replication user when checking the replication
connection.
2018-03-02 16:43:23 +09:00
Ian Barwick
9981ede1af "standby clone": fix --superuser handling
get_superuser_connection() was erroneously using the local node record
to connect to as a superuser, which works when registering the primary
but obviously not when cloning a standby.

Addresses GitHub #380.
2018-03-02 16:43:19 +09:00
Ian Barwick
40ccae57a3 Update HISTORY 2018-03-02 11:05:30 +09:00
Ian Barwick
3c2b8e5792 "standby clone": remove restriction on replication slots in Barman mode
While it's preferable to avoid standby replication slots if Barman is in
use, there's no technical reason to prevent this.

Implements GitHub #379.
2018-03-02 11:05:25 +09:00
Ian Barwick
354231284e repmgr: escape "restore_command" in generated recovery.conf 2018-03-02 11:05:21 +09:00
Ian Barwick
dbbfcb6a63 "standby clone": fix primary_conninfo when --upstream-conninfo provided 2018-03-02 11:05:15 +09:00
Ian Barwick
bc766a48ed repmgrd: retry standby connection after cascading standby failover 2018-03-02 11:05:07 +09:00
Ian Barwick
55441f2729 repmgrd: add configuration file parameter "standby_reconnect_timeout"
This is used for determining a timeout when reconnecting to the standby
after executing the "follow_command". This will normally not need to be
set explicitly, but may be useful in cases where the standby's startup
phase can last longer than usual.
2018-03-02 11:04:56 +09:00
Ian Barwick
e38a9ec7e1 repmgrd: fix main monitoring loop for witness server
Missing "break" was breaking it when following a new primary.
2018-03-02 11:04:22 +09:00
Ian Barwick
c1356b9e0d repmgrd: retry standby connection after "follow_command" executed
It's possible that the standby is still starting up after the "follow_command"
completes, so poll for a while until we get a connection.
2018-03-02 11:04:19 +09:00
Ian Barwick
383a17fba1 doc: add <options> section for various commands 2018-02-26 16:54:27 +09:00
Ian Barwick
29cb153643 "node status": improve replication slot warnings
Addresses GitHub #385
2018-02-23 11:19:33 +09:00
Ian Barwick
15625183c1 "standby clone": document --recovery-conf-only option 2018-02-23 11:19:21 +09:00
Ian Barwick
b6a1b75d22 "standby clone --recovery-conf-only": display generated file with --dry-run
Refactor the original code which generates "recovery.conf" to place the
output into a buffer, which can either be output as "recovery.conf"
or copied to a buffer specified by the caller.
2018-02-23 11:18:45 +09:00
Ian Barwick
c644ddde51 Fix typo in function name 2018-02-22 15:50:57 +09:00
Ian Barwick
ee98a3a58e "standby clone": add --recovery-conf-only option
This will generate "recovery.conf" for an existing standby.

Typical use-case is a standby cloned manually from an external data
source (e.g. Barman), where "recovery.conf" needs to be created
(and if required a replication slot).

The --dry-run option will check the pre-requisites but not actually
create "recovery.conf" or a replication slot.

This requires that the upstream node is running, a replication connection
can be made and if required a replication slot can be created.

Implements GitHub #382.
2018-02-22 15:50:51 +09:00
Ian Barwick
22b3a74fa0 repmgrd: improve detection of status change from primary to standby
If repmgrd is running in degraded mode on a primary which has been stopped,
then manually brought back online as a standby (e.g. by creating
recovery.conf and starting the server), ensure it not only detects the
change but automatically updates the node record so it can resume
monitoring the node as a standby.

Previously, repmgrd was looping waiting for the record to be updated
(as is done transparently when executing "repmgr node rejoin") but
if the record was not updated within the timeout period (e.g. by
"repmgr standby register") it would fail to resume monitoring as a
standby.

It seems reasonable to have repmgrd automatically update the node record,
as this will restore failover capability as quickly as possible. If this
is not desired, then the onus is on the user to shut down repmgrd while
making the desired changes.
2018-02-22 15:50:45 +09:00
Ian Barwick
98af51da03 "node rejoin": ensure --dry-run is honoured
Addresses GitHub #383.
2018-02-20 15:31:03 +09:00
Ian Barwick
e5eff3f6d5 doc: update 4.0.3 release notes 2018-02-16 12:15:44 +09:00
Ian Barwick
728a256a93 doc: update release notes 2018-02-16 12:15:35 +09:00
Ian Barwick
f5f02ae0ee Replace remaining instances of strcpy() with strncpy()
Also use strncmp() to match.
2018-02-15 13:31:55 +09:00
Ian Barwick
64d85587de repmgrd: check "repmgr" extension is installed before starting
Implements GitHub #361.
2018-02-12 11:38:31 +09:00
Ian Barwick
6b7f6089ba "node status": add warning about missing replication slots
Implements GitHub #364.
2018-02-12 11:38:27 +09:00
Ian Barwick
5719a0dfd3 Update repmgr.conf.sample
Add missing parameter "monitor_interval_secs"
2018-02-12 11:38:22 +09:00
Ian Barwick
927bf038a0 "standby switchover": check demotion candidate can make replication connection
Check it's actually possible for the demotion candidate to attach to
the promotion candidate before executing the switchover.

As with other checks of this nature, there's a faint possibility the
situation could change between the time the check is carried out and
the demotion candidate is restarted to connect to the promotion candidate,
but there's not a lot we can do about that. The main purpose is to
be able to catch existing misconfigurations before anything gets changed.

Implements GitHub #370.
2018-02-09 10:00:54 +09:00
Ian Barwick
76a93af15c "witness register": fix primary node check
Addresses GitHub #377, based on report by user yonj1e in #373.
2018-02-08 16:41:04 +09:00
Ian Barwick
ee2df36a76 "standby switchover": additional sanity checks
Check that sufficient walsenders will be available on the promotion
candidate, and if replication slots are in use check if enough of
those will be available.

Note these checks can't guarantee that the walsenders/slots will
be available at the appropriate points during the switchover process,
but do ensure that existing configuration problems will be caught.

Implements GitHub #371.
2018-02-08 15:19:24 +09:00
Ian Barwick
571e6b2783 "standby clone": cowardly refuse to clone into an active data directory
By checking the PID file in the same way pg_ctl does, we can be pretty
much certain whether the target data directory contains an active
PostgreSQL instance.
2018-02-08 10:19:05 +09:00
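
A PID-file check of the kind mentioned in 571e6b2783 might look roughly like the sketch below. It assumes a POSIX system and that the first line of "postmaster.pid" holds the postmaster's PID; `data_directory_is_active` is a hypothetical name, and the real check in repmgr is written in C.

```python
import os
from pathlib import Path


def data_directory_is_active(data_dir: str) -> bool:
    """Best-effort check for a live PostgreSQL instance using data_dir."""
    pid_file = Path(data_dir) / "postmaster.pid"
    try:
        pid = int(pid_file.read_text().splitlines()[0].strip())
    except (OSError, IndexError, ValueError):
        return False              # missing, empty or unparseable PID file
    if pid <= 0:
        return False              # not a usable PID for this simple sketch
    try:
        os.kill(pid, 0)           # signal 0: existence check only, nothing is sent
    except ProcessLookupError:
        return False              # stale PID file, no such process
    except PermissionError:
        return True               # process exists but belongs to another user
    return True
```

The sketch only shows that a process with the recorded PID exists; it does not confirm that the process is PostgreSQL.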
Ian Barwick
76cc11b786 Fix "standby clone" in Barman mode with --no-upstream-connection
"--upstream-node-id", if provided, was not being passed through to
the SQL query executed via the Barman server.

Also modified the query to select the primary node if "--upstream-node-id"
is not provided.

Note: this is a very niche use case.
2018-02-07 16:34:01 +09:00
Ian Barwick
56710f4819 repmgr: simplify data directory checks when cloning
Attempting to use the contents of pg_control to tell whether the directory
is in use by PostgreSQL can result in false positives; we should use
a check based on the pidfile.

Also change the HINT to indicate a data directory can be overwritten
if -F/--force is provided.
2018-02-07 14:45:37 +09:00
Ian Barwick
f9528efdb8 "standby clone": ensure "pg_subtrans" directory is created in Barman mode 2018-02-07 14:45:04 +09:00
Ian Barwick
658ec20e37 doc: fix GitHub reference in release notes 2018-02-07 14:43:47 +09:00
Ian Barwick
e6aa831782 Update HISTORY and release notes 2018-02-07 14:43:43 +09:00
Ian Barwick
9b56f157dc Move parse_output_to_argv() to configfile.c
So it can be used by parse_pg_basebackup_options().

Addresses GitHub #376.
2018-02-07 09:47:50 +09:00
Ian Barwick
05f872effe Fix typo in HINT 2018-02-07 08:56:29 +09:00
Ian Barwick
ae691688be doc: fix descriptions of %p event notification script parameter 2018-02-05 15:52:48 +09:00
Ian Barwick
57f1e939c5 "standby register": add event notification "standby_register_sync"
Implements GitHub #374.
2018-02-05 15:20:19 +09:00
Ian Barwick
48b5deebf3 doc: minor fixes to BDR docs
Also remove duplicate file.
2018-02-05 14:01:37 +09:00
Ian Barwick
1868453953 doc: improve BDR failover documentation 2018-02-05 13:25:49 +09:00
Ian Barwick
dd45189fa8 "cluster show": output any connection error messages in list of warnings
This ensures any connection errors are displayed by default in a
comprehensible, easily reportable way, and saves having to request/filter
DEBUG output.

Implements GitHub #369.
2018-02-05 10:36:04 +09:00
Ian Barwick
a79c4fae88 "cluster show": minor code cleanup 2018-02-05 10:36:00 +09:00
Ian Barwick
657ed83921 "cluster show": improve handling of database errors
In particular, if running "repmgr cluster show" against a database
without the repmgr metadata, showing the error (rather than just
"no records found" etc.) will provide some clues about the problem.
2018-02-05 10:35:56 +09:00
Tony Finch
4fb085f52d "repmgr node status": correct upstream node info (#363)
repmgr was printing the name and ID of this node instead of its upstream

Signed-off-by: Tony Finch <dot@dotat.at>
2018-02-05 09:52:58 +09:00
Ian Barwick
d0bb5b1565 Ensure an inactive PostgreSQL data directory can be deleted.
Addresses GitHub #366.
2018-02-02 17:18:51 +09:00
Ian Barwick
ee64f3a745 "standby follow": finalize implementation of --dry-run option 2018-02-02 17:18:47 +09:00
Ian Barwick
6c81e54f76 "standby follow": check for replication slot availability on target node 2018-02-02 17:18:43 +09:00
Ian Barwick
65bf203a89 Improve "repmgr primary unregister" documentation and --help output
Per observations in GitHub #373
2018-02-02 17:18:36 +09:00
Ian Barwick
b4dbee517f doc: note password SSH requirements for "standby switchover" 2018-02-02 17:18:31 +09:00
Ian Barwick
e23d28a22d "standby follow": initial implementation of --dry-run option
GitHub #363.
2018-02-01 14:16:49 +09:00
Ian Barwick
811d2a45bd "standby switchover": improve log messages and add new exit code
Previously, if an issue was encountered with the old primary, but user
provided -F/--force to have repmgr promote the standby anyway, repmgr
would exit with the log message "STANDBY SWITCHOVER is complete"
and exit code 0 (SUCCESS).

To better report this partial completion, repmgr will now emit the message
"STANDBY SWITCHOVER has completed with issues" (and a HINT to check preceding
log messages) and new exit code 22 (ERR_SWITCHOVER_INCOMPLETE).
2018-01-31 11:03:54 +09:00
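
A wrapper script calling "repmgr standby switchover" could distinguish the new exit code introduced in 811d2a45bd along these lines. The numeric values 0 and 22 come from the commit message; the function name and the returned strings are illustrative only.

```python
SUCCESS = 0
ERR_SWITCHOVER_INCOMPLETE = 22   # value stated in the commit message


def describe_switchover_exit(code: int) -> str:
    """Map a switchover exit code to a short description (hypothetical helper)."""
    if code == SUCCESS:
        return "switchover complete"
    if code == ERR_SWITCHOVER_INCOMPLETE:
        return "switchover completed with issues; check preceding log messages"
    return f"switchover failed (exit code {code})"
```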
Ian Barwick
92f4710ee2 Have do_standby_follow_internal() not abort on error
Pass the error code back to the caller instead, mainly so
"repmgr node rejoin" can better report errors.
2018-01-31 11:03:27 +09:00
Ian Barwick
044d8a1098 repmgr: improve switchover handling when "pg_ctl" used
If logging output was not explicitly redirected with "-l" in the pg_ctl
options, repmgr would hang waiting for pg_ctl output.

Note that we recommend using the OS-level service commands where
available.
2018-01-30 16:56:26 +09:00
Ian Barwick
b38f45120c "repmgr standby register": improve error output when standby not running
Add explicit HINT
2018-01-27 07:17:34 +09:00
Ian Barwick
db3a046393 doc: expand upgrade documentation
Include section about using pg_upgrade
2018-01-25 10:48:24 +09:00
Ian Barwick
ec068e38a2 Remove --bdr-only configuration option
This was required for a specific use case during pre-release
development and is no longer needed now the physical streaming
replication handling is implemented.
2018-01-25 10:48:09 +09:00
Ian Barwick
3a382e826e doc: update 4.0.2 release notes
Add details about upgrading.
2018-01-19 09:10:42 +09:00
Ian Barwick
3dcf57a333 doc: add 4.0.2 release notes 2018-01-19 09:10:42 +09:00
Vlad
f658c8d3d8 doc: add missing word in overview
GitHub pull request #362
2018-01-19 09:09:40 +09:00
Ian Barwick
375a96a5c8 repmgrd: log execution error in "repmgrd_get_local_node_id()"
That shouldn't happen, but if it does it will make it easier to
identify the issue.
2018-01-16 11:16:19 +09:00
Ian Barwick
b4d6724405 doc: improve switchover documentation
Emphasize need to set the "service_*_command" options when repmgr is
installed from a package.
2018-01-16 11:16:19 +09:00
Ian Barwick
8fd0c4ad83 repmgr: assume node is actually shutting down if pingable and that's the reported status 2018-01-12 21:53:37 +09:00
Ian Barwick
7ccae6c2b1 repmgr: automatically create slot name if missing
It's possible that a node was registered with "use_replication_slots=false"
but that was later changed to "use_replication_slots=true". If the node
was not subsequently re-registered, the node record will contain an empty
slot name, which will cause any slot creation operation during
"standby follow" or "node rejoin" to fail.

To prevent this happening, check for an empty slot name and automatically
set before proceeding.

Addresses GitHub #343.
2018-01-11 14:47:50 +09:00
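
The fallback described in 7ccae6c2b1 amounts to filling in an empty slot name before any slot is created. The sketch below models the node record as a dictionary; the "repmgr_slot_<node_id>" naming pattern and the function name are assumptions for illustration, not taken from the commit.

```python
def ensure_slot_name(node_record: dict) -> dict:
    """Return a copy of node_record with a slot name set if it was empty."""
    record = dict(node_record)
    if not record.get("slot_name"):
        # Assumed naming pattern; repmgr's actual format may differ.
        record["slot_name"] = f"repmgr_slot_{record['node_id']}"
    return record
```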
Ian Barwick
61d46172b9 repmgr: catch possible corner case when checking node shutdown status
It's conceivable that PQping is returning "no response" but the
shutdown hasn't quite completed.
2018-01-10 15:09:21 +09:00
Ian Barwick
810471b2f2 repmgr: during switchover, correctly detect unclean shutdown status 2018-01-10 12:25:16 +09:00
Ian Barwick
5bd8cf958a repmgr standby switchover: add "%p" event notification parameter
This will contain the node ID of the former primary.
2018-01-10 12:25:12 +09:00
Ian Barwick
5a45997db5 doc: document command line options for "standby switchover" 2018-01-10 12:25:07 +09:00
Ian Barwick
f1f5100007 repmgr standby switchover: add event details 2018-01-10 12:25:00 +09:00
Ian Barwick
1c8ad4d89b Consolidate parsing of output from executing repmgr on a remote server
This should also fix the issue reported in GitHub #349.
2018-01-09 16:24:13 +09:00
Ian Barwick
842a610e84 Fix call to is_active_bdr_node() in BDR repmgrd
Following the fix to "is_active_bdr_node()" in 841f03ae, it turns out
the call in repmgrd-bdr.c was only accidentally working; explicitly
test for a false return value.
2018-01-04 21:03:36 +09:00
Ian Barwick
fcb7e7a29b "repmgr bdr register": create missing connection replication set if needed
Previously the assumption was that the "repmgr" replication set would be
set up when the nodes are created, however no checks were implemented
and this was not well-documented.

Addresses GitHub #347.
2018-01-04 17:46:49 +09:00
Ian Barwick
26e404b1f3 "repmgr bdr register": improve node name check
We'll use "bdr.bdr_get_local_node_name()" to check the local BDR node
name and the repmgr one match.
2018-01-04 17:46:44 +09:00
Ian Barwick
625d032435 doc: link event notification page from relevant command reference pages 2018-01-04 14:56:15 +09:00
Ian Barwick
3d07d65966 doc: update package documentation 2018-01-04 14:56:12 +09:00
Ian Barwick
b705127a34 "repmgr standby register": add --wait-start option
Implements GitHub #356.
2018-01-04 14:56:08 +09:00
Ian Barwick
832b38c5cb doc: fix typos in "repmgr primary unregister" command reference 2018-01-04 14:56:02 +09:00
Ian Barwick
3739a7b84d doc: add link to event notifications page from "repmgr cluster event" 2018-01-04 14:55:56 +09:00
Ian Barwick
841f03aeba Fix query in is_active_bdr_node()
Boolean column was not being checked correctly.

Also add detail output in "repmgr node role --check", where the function
is called.
2018-01-04 14:55:51 +09:00
Ian Barwick
cad12b1fb7 "repmgr cluster event": move query to dbutils.c 2018-01-04 14:55:46 +09:00
Ian Barwick
d31cc80d26 docs: document "repmgr cluster event --terse" 2018-01-04 14:55:40 +09:00
Ian Barwick
625187a61e "repmgr cluster events": optionally omit "Details" column with --terse
Implements GitHub #360.
2018-01-04 14:55:34 +09:00
Ian Barwick
e64d965c6a repmgrd: document standby_[failure|recovery] event notifications
Also clean up the relevant code section.

Addresses GitHub #359.
2018-01-04 09:33:37 +09:00
Ian Barwick
5d8ec136e6 repmgr node rejoin: handle missing node record correctly
If a connection was provided for a database other than the "repmgr"
database, error was logged but execution continued, resulting in
the connection being finished twice.

Addresses GitHub #358.
2018-01-03 15:17:01 +09:00
Ian Barwick
9951a8e106 doc: add appendix with details about packages
work-in-progress
2018-01-02 17:23:24 +09:00
Ian Barwick
26a9e848fd Update copyright notices to 2018 2018-01-02 10:19:46 +09:00
Ian Barwick
ba0b0a497f doc: Fix event notification placeholder typo
Per report from Carlos.
2018-01-01 10:28:19 +09:00
Ian Barwick
09dc43a61c docs: update HISTORY 2017-12-27 10:22:25 +09:00
Ian Barwick
b349f82571 doc: update documentation build instructions
Describe how to build documentation as a single file, and also note
requirement to build against 9.6 or earlier.
2017-12-27 10:05:44 +09:00
Ian Barwick
adbb627850 Merge branch 'doc-nochunks' of https://github.com/fanf2/repmgr
Pull request GitHub #353.
2017-12-27 09:58:09 +09:00
Ian Barwick
c47f976bde repmgr.conf.sample: fix command line argument
"repmgr node check --archive-ready" is correct, however abbreviated
versions will be accepted by getopt_long() if they don't match
or partially match any other options.

Per report by "chaintng" in GitHub #355.
2017-12-27 09:39:14 +09:00
Tony Finch
7c8cd7a482 doc: an optional all-in-one-file manual 2017-12-21 18:31:05 +00:00
Ian Barwick
edce8addbd repmgr: add missing -W option to getopt_long() invocation
Addresses GitHub #350.
2017-12-20 10:24:58 +09:00
Martín Marqués
b0f6202448 Merge pull request #352 from dbonne/master
Fix package name
2017-12-19 15:21:51 -03:00
Daymel Bonne Solís
985b13b6d3 Fix package name 2017-12-19 13:09:55 -05:00
Martín Marqués
69e64a9464 Add more information about setting up sudo without requiretty to
the documentation

Signed-off-by: Martín Marqués <martin.marques@2ndquadrant.com>
2017-12-14 14:39:22 -03:00
Martín Marqués
f58954b3be Switch spaces for tabs in repmgr.conf sample file.
This makes comments stay aligned in most cases the conf file is
modified, and when indentation changes, it's easy to re-align
(by removing or adding a tab)

Signed-off-by: Martín Marqués <martin.marques@2ndquadrant.com>
2017-12-14 07:00:05 -03:00
Ian Barwick
3761d17752 docs: update 4.0.1 release date 2017-12-13 15:16:26 +09:00
Ian Barwick
8c121da8a1 Add diagnostic option "repmgr node check --has-passfile"
This checks if the active libpq version (9.6 and later) has the
"passfile" option, and returns 0 if present, 1 if not.
2017-12-11 20:09:48 +09:00
Abhijit Menon-Sen
6e9e4543e8 Fix typo: upstream_node_id → upstream_node 2017-12-08 09:46:58 +05:30
Ian Barwick
c94f1b7338 Fix unpackaged upgrade SQL for PostgreSQL 9.3 2017-12-04 17:52:36 +09:00
Ian Barwick
f78c169c3d docs: improve event notification documentation 2017-11-29 14:43:28 +09:00
Ian Barwick
f2db9f3ea4 docs: minor fixes to various examples 2017-11-29 11:33:42 +09:00
Ian Barwick
9944324c3a docs: add additional note about setting "wal_log_hints"
Useful to reference this when discussing PostgreSQL configuration in
general.
2017-11-29 11:22:12 +09:00
Ian Barwick
836f32bdbc Update release notes 2017-11-28 13:42:09 +09:00
Ian Barwick
cebbc73c38 Update HISTORY 2017-11-28 13:01:45 +09:00
Ian Barwick
472d703d2e repmgr: initialise "voting_term" in "repmgr primary register"
This previously happened in the extension SQL code, which could
potentially cause replay problems if installing on a BDR cluster.

As this table is only required for streaming replication failover,
move the initialisation to "repmgr primary register".

Addresses GitHub #344.
2017-11-28 11:08:12 +09:00
Ian Barwick
de34e4e89b docs: add 2ndQ yum repository installation instructions
These replace the HTML document at https://repmgr.org/yum-repository.html
2017-11-24 14:13:33 +09:00
Ian Barwick
3a8ee126f3 Delete any replication slots copied by pg_rewind
If --force-rewind is used in conjunction with "repmgr node rejoin",
any replication slots present on the source node will be copied too;
it's essential to remove these to prevent stale slots being extant
when the node starts up.

We do this at file system level *before* the server starts to minimize
the risk of any problems.

Addresses GitHub #334
2017-11-24 11:13:31 +09:00
Ian Barwick
da93dd1f57 docs: fix configuration file example
Per report from Carlos Chapi.
2017-11-24 09:26:09 +09:00
Ian Barwick
295c18f6ff repmgr: fix configuration file sanity check
The check was being carried out regardless of whether --copy-external-config-files
was specified, which means cloning will fail if no SSH connection is available.

Addresses GitHub #342
2017-11-23 22:48:34 +09:00
Ian Barwick
81beec54aa repmgr: fix return code output for repmgr node check --action=...
Addresses GitHub #340
2017-11-23 10:34:21 +09:00
Martín Marqués
2e42226f68 Fix missing FQN for the nodes table.
This bug was not detected before because most users work with the repmgr
user. For that reason, the repmgr schema is already in the search_path
by default.

Add the repmgr schema to the nodes table in the LEFT JOIN used for
cluster show (and in other places)

Signed-off-by: Martín Marqués <martin.marques@2ndquadrant.com>
2017-11-22 17:13:58 -03:00
Ian Barwick
de10d7984a docs: update 4.0.0 release notes 2017-11-21 16:54:13 +09:00
Ian Barwick
404aab4041 docs: miscellaneous updates 2017-11-20 15:47:59 +09:00
Ian Barwick
8c422d6084 Remove unneeded functions 2017-11-20 15:18:21 +09:00
Ian Barwick
8b78b7292d docs: add note about "service_promote_command" in repmgr.conf.sample
It must never contain "repmgr standby promote", as it is intended
to enable use of package-level promote commands such as Debian's
"pg_ctlcluster promote".

Addresses GitHub #336.
2017-11-20 12:29:47 +09:00
Ian Barwick
4cebba32e2 remove spurious "/base" path element in Barman tablespace cloning code.
Addresses GitHub #339
2017-11-20 10:50:26 +09:00
Ian Barwick
c9f12cfbe0 repmgr: don't add empty "passfile" parameter in recovery.conf 2017-11-20 10:27:45 +09:00
Ian Barwick
5b4c92392c docs: expand witness documentation 2017-11-17 11:00:43 +09:00
Ian Barwick
e2b94adec3 docs: miscellaneous cleanup 2017-11-17 09:39:11 +09:00
Ian Barwick
3164bfa043 docs: add initial witness server documentation 2017-11-17 08:51:21 +09:00
Ian Barwick
08b443dce0 repmgrd: re-enable monitoring data recording when in archive recovery.
The warning emitted gives the impression that monitoring data shouldn't
be written if there's no streaming replication, but we can and should
do this as long as we have a primary connection.

Explicitly document this in the code.

Also remove an unused variable warning.
2017-11-16 17:17:17 +09:00
Ian Barwick
9165d27f9f "repmgr node ...": fixes for 9.3
Mainly to account for the lack of replication slots.
2017-11-16 11:25:16 +09:00
Ian Barwick
b8b991398a Escape double-quotes in strings passed to an event notification script
The string in question will be generated internally by repmgr as a simple
one-line string with no control characters etc., so all that needs to be
escaped at the moment are any double quotes.
2017-11-16 10:36:48 +09:00
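
The escaping described in b8b991398a only has to deal with double quotes, since the string is a single line generated by repmgr. A minimal sketch follows, assuming backslash escaping (the commit message does not state the escape character) and a hypothetical function name.

```python
def escape_double_quotes(value: str) -> str:
    """Escape double quotes so the string can be embedded in "..." on a command line."""
    return value.replace('"', '\\"')


# Example: a details string that itself contains double quotes.
details = 'node "node2" promoted to primary'
quoted_argument = f'"{escape_double_quotes(details)}"'
```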
Ian Barwick
a9a17f206e docs: improve documentation of pg_basebackup_options 2017-11-15 20:50:13 +09:00
Ian Barwick
9d432546bf repmgrd: don't fail over unless more than 50% of active nodes are visible. 2017-11-15 13:48:28 +09:00
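
The "more than 50%" rule in 9d432546bf is a strict majority test. The sketch below uses integer arithmetic to avoid rounding; the function name and the way nodes are counted (for instance, whether the local node is included) are assumptions.

```python
def has_visible_majority(visible_nodes: int, total_active_nodes: int) -> bool:
    """True only if strictly more than half of the active nodes are visible."""
    return visible_nodes * 2 > total_active_nodes
```

With this definition exactly half is not enough: 2 of 4 visible nodes does not permit failover, while 3 of 4 does.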
Ian Barwick
3c557ebd8e repmgrd: finalize witness failover handling 2017-11-15 13:48:25 +09:00
Ian Barwick
4efeb52cba repmgrd: synchronise repmgr.nodes table on witness server 2017-11-15 13:48:21 +09:00
Ian Barwick
60422c66f9 repmgrd: handle witness server 2017-11-15 13:48:17 +09:00
Ian Barwick
b63872afbb "witness register": set upstream_node_id to that of the primary 2017-11-15 13:48:14 +09:00
Ian Barwick
a31980b590 repmgrd: basic witness node monitoring 2017-11-15 13:48:11 +09:00
Ian Barwick
e07a3c7976 docs: add witness command reference files to file list 2017-11-15 13:48:06 +09:00
Ian Barwick
9d9a1be062 docs: add command reference for "witness (un)register" 2017-11-15 13:48:03 +09:00
Ian Barwick
8208b3f844 witness (un)register: add --dry-run mode 2017-11-15 13:48:00 +09:00
Ian Barwick
ecb8297b1f witness unregister: enable execution when witness server is down
Also add help output for "repmgr witness --help".
2017-11-15 13:47:54 +09:00
Ian Barwick
1553596f84 repmgr: minor fix to "repmgr standby --help" output 2017-11-15 13:47:52 +09:00
Ian Barwick
022d9c58c2 Add "witness unregister" functionality 2017-11-15 13:47:48 +09:00
Ian Barwick
a6cc4d80f0 Add "witness register" functionality 2017-11-15 13:47:45 +09:00
Ian Barwick
7fffe3ed96 witness: initial code framework 2017-11-15 13:47:41 +09:00
Ian Barwick
9b93a595f5 docs: add some more index entries 2017-11-14 20:55:37 +09:00
Ian Barwick
c34e08b802 docs: document "passfile" configuration file parameter 2017-11-14 20:53:26 +09:00
Ian Barwick
eb14bb58c6 Add configuration file "passfile"
This will enable a custom .pgpass to be included in "primary_conninfo"
(provided it's supported by the libpq version on the standby).
2017-11-14 19:30:25 +09:00
Ian Barwick
aa28069d8b docs: update release notes
Add note about changes to password handling.
2017-11-14 18:47:39 +09:00
Ian Barwick
a1e272f64c Update extension SQL 2017-11-13 10:02:46 +09:00
Ian Barwick
9908a9c662 repmgrd: detect role change from primary to standby
If repmgrd is monitoring a primary which is taken off-line, then later
restored as a standby, detect this change and resume monitoring
as a standby node.

Addresses GitHub #338.
2017-11-10 17:19:30 +09:00
Ian Barwick
aa089820ab repmgrd: check shared library is loaded
If this isn't the case, "repmgrd" will appear to run but not handle
failover correctly.

Addresses GitHub #337.
2017-11-10 14:35:17 +09:00
Ian Barwick
0230bafae1 repmgrd: updates related to node_id handling 2017-11-10 12:07:31 +09:00
Ian Barwick
de577adc67 repmgrd: catch corner cases where monitoring data is not available 2017-11-09 22:27:09 +09:00
Ian Barwick
fed17d49e3 repmgrd: ensure shmem is reinitialised after a restart 2017-11-09 19:31:21 +09:00
Ian Barwick
d80763f974 repmgrd: misc fixes 2017-11-09 19:31:16 +09:00
Ian Barwick
331e982bdb repmgrd: fix priority/node_id tie-break check 2017-11-09 19:31:12 +09:00
Ian Barwick
4ca7e6a6bf repmgrd: remove unneeded functions 2017-11-09 19:31:08 +09:00
Ian Barwick
6ac6e0733a repmgrd: simplify the candidate selection logic
All disconnected nodes will be in a static, known state, so as long as
each node has the same meta-information (repmgr.nodes) and is able
to retrieve the last receive LSN of the other nodes, it is possible
for each node to independently determine the best promotion candidate,
thereby reaching consensus without an explicit "voting" process.
2017-11-09 19:31:04 +09:00
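The idea in this commit message (every node ranks the same candidates with the same rules, so no vote is needed) can be sketched as below. This is a minimal illustration, not repmgrd's code: the `candidate_node` struct, both function names, and the exact ordering (highest receive LSN, then highest priority, then lowest node ID, with priority 0 excluded) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified node description; not repmgr's t_node_info. */
typedef struct
{
    int      node_id;
    int      priority;          /* assumed: 0 or less means "never promote" */
    uint64_t last_receive_lsn;  /* last WAL location the node received */
} candidate_node;

/* Returns non-zero if "a" is a better promotion candidate than "b". */
static int
is_better_candidate(const candidate_node *a, const candidate_node *b)
{
    if (a->last_receive_lsn != b->last_receive_lsn)
        return a->last_receive_lsn > b->last_receive_lsn;
    if (a->priority != b->priority)
        return a->priority > b->priority;
    /* final tie-break: lowest node ID wins */
    return a->node_id < b->node_id;
}

/*
 * Deterministically pick the best candidate. Given identical input, every
 * node running this function reaches the same answer independently.
 * Returns NULL if no node is eligible.
 */
const candidate_node *
select_promotion_candidate(const candidate_node *nodes, size_t count)
{
    const candidate_node *best = NULL;
    size_t      i;

    assert(nodes != NULL || count == 0);

    for (i = 0; i < count; i++)
    {
        if (nodes[i].priority <= 0)
            continue;
        if (best == NULL || is_better_candidate(&nodes[i], best))
            best = &nodes[i];
    }

    return best;
}
```

Because the comparison is a total order over the shared metadata, the result does not depend on which node evaluates it or on the order of the input array.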
Ian Barwick
79d21b516b repmgrd: fixes to failover handling
get_new_primary() returns NULL if no notification for the new primary has
been received, but the code was expecting it to return UNKNOWN_NODE_ID,
which was causing repmgrd to prematurely drop out of the new primary
detection loop if no notification had been received by the time the loop
started.

Also store the electoral term as a single-row, single-column table,
to ensure that all repmgrds see the same term. It is then bumped
by the winning node after it gets promoted.

Various logging improvements.
2017-11-08 14:28:08 +09:00
Ian Barwick
7232187f4d Ensure shared memory functions handle NULL parameters correctly 2017-11-08 12:19:07 +09:00
Ian Barwick
fe98270b3f Update .gitignore
Ignore output from "make installcheck"
2017-11-08 12:09:33 +09:00
Ian Barwick
5a3e20fc38 README: update links to https versions 2017-11-08 12:07:35 +09:00
Ian Barwick
4ef2b111da Fix lock acquisition in shared memory functions 2017-11-08 11:55:08 +09:00
Ian Barwick
97471626b4 Update repmgr.conf.sample 2017-11-02 17:43:03 +09:00
Ian Barwick
4bd236b64c docs: fix example in BDR section 2017-11-02 11:23:41 +09:00
Ian Barwick
615dd2ecf4 docs: tweak Markdown URL formatting 2017-11-01 10:58:23 +09:00
Ian Barwick
1c1887f9cc docs: update links to repmgr 4.0 documentation 2017-11-01 10:50:22 +09:00
Ian Barwick
d3f11a640d docs: update copyright info 2017-11-01 09:35:57 +09:00
Ian Barwick
2341da7a06 docs: convert command reference sections to <refentry> format
Note that most entries still need a bit more tidying up, consistent structuring,
provision of more examples etc.
2017-10-31 11:27:13 +09:00
Ian Barwick
2c468d64fb "standby follow": get upstream record before server restart, if required
The standby may not always be available for connections right after it's
restarted, so attempting to connect and get the node's upstream record
after the restart may fail. Record is now retrieved before the restart.

Addresses GitHub #333.
2017-10-27 16:30:14 +09:00
Ian Barwick
9d9b74d740 docs: add sample output to "standby follow" and "standby promote" 2017-10-27 15:03:34 +09:00
Ian Barwick
a90d4419a6 docs: add note about building docs 2017-10-27 10:44:16 +09:00
Ian Barwick
68756c79f3 Fix typo 2017-10-27 09:50:48 +09:00
Ian Barwick
8ad081e7b5 docs: finalize conversion of existing BDR repmgr documentation 2017-10-26 18:52:35 +09:00
Ian Barwick
6b76704817 Initial conversion of existing BDR repmgr documentation 2017-10-26 16:29:40 +09:00
Ian Barwick
c03c509e73 docs: update configuration documentation 2017-10-26 16:11:17 +09:00
Ian Barwick
d9db4f6c45 repmgr node rejoin: add --dry-run option 2017-10-25 11:01:58 +09:00
Ian Barwick
c89d59fe96 Improve trim() function
Did not cope well with trailing spaces or entirely blank strings.
2017-10-24 15:34:43 +09:00
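The two cases named in this commit message (trailing spaces and entirely blank strings) are the usual edge cases for an in-place trim. The sketch below is not the repmgr implementation; `trim_in_place` is a hypothetical name and the approach (scan from both ends, then `memmove`) is one common way to handle both cases.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/*
 * Remove leading and trailing whitespace from "s" in place and return "s".
 * An entirely blank string becomes the empty string.
 */
char *
trim_in_place(char *s)
{
    char       *start;
    size_t      len;

    assert(s != NULL);

    /* skip leading whitespace */
    start = s;
    while (*start != '\0' && isspace((unsigned char) *start))
        start++;

    /* drop trailing whitespace; len reaches 0 for a blank string */
    len = strlen(start);
    while (len > 0 && isspace((unsigned char) start[len - 1]))
        len--;

    /* regions may overlap, so memmove rather than memcpy */
    memmove(s, start, len);
    s[len] = '\0';

    return s;
}
```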
Ian Barwick
02b6d3748b Docs: update "repmgr cluster show" 2017-10-24 13:48:38 +09:00
Ian Barwick
7c3abe28b9 Standardize terminology on "primary" (in place of "master") 2017-10-24 13:42:50 +09:00
Ian Barwick
a39b8ccc2d --dry-run available for "node rejoin" 2017-10-23 10:40:21 +09:00
Ian Barwick
5638d4ab89 docs: fix formatting 2017-10-23 09:59:29 +09:00
Ian Barwick
37bdad290c Add --help output for "repmgr node service"
Addresses GitHub #329.
2017-10-20 16:44:44 +09:00
Ian Barwick
8911434da5 Add --help output for "repmgr node rejoin"
Addresses GitHub #329.
2017-10-20 16:31:17 +09:00
Ian Barwick
8a2bbcebfd docs: fix typo 2017-10-20 16:05:05 +09:00
Ian Barwick
61f01f8305 node rewind: add check for pg_rewind and --dry-run mode
Addresses GitHub #330
2017-10-20 14:15:23 +09:00
Ian Barwick
a35d77b7f0 Note Barman configuration file parameter changes 2017-10-20 11:30:36 +09:00
Ian Barwick
40ea1abbb4 Fix error message typo 2017-10-20 11:18:53 +09:00
Ian Barwick
785bfe9837 Prevent relative configuration file path being stored in the repmgr metadata
The configuration file path is stored to make remote execution of repmgr
(e.g. during "repmgr standby switchover") simpler, so relative paths
make no sense.

Addresses GitHub #332
2017-10-20 10:57:43 +09:00
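One possible way to avoid storing a relative path is to resolve it against a known base directory before saving it. The sketch below only illustrates that idea; it is not taken from the commit, `make_absolute_path` is a hypothetical helper, and it assumes POSIX-style paths where an absolute path starts with `/`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Write an absolute version of "path" to "output". Absolute paths are
 * copied unchanged; relative paths are appended to "base_dir".
 * Returns false if the result does not fit into "maxlen" bytes.
 */
bool
make_absolute_path(const char *base_dir, const char *path,
                   char *output, size_t maxlen)
{
    int         written;

    assert(base_dir != NULL && path != NULL && output != NULL);

    if (path[0] == '/')
        written = snprintf(output, maxlen, "%s", path);
    else
        written = snprintf(output, maxlen, "%s/%s", base_dir, path);

    /* snprintf reports the length it needed; reject truncated results */
    return written >= 0 && (size_t) written < maxlen;
}
```

A caller would typically pass the current working directory as `base_dir`, so that a remote invocation later receives a path that is valid regardless of where it is run from.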
Ian Barwick
31cd54bcff Update README
Main body of documentation moved to DocBook format and hosted at:

    https://repmgr.org/docs/index.html

as the existing README and sundry additional files were becoming
unmanageable. Conversion to DocBook format enables all documentation
to be managed in a single structured system, with cross-references,
indexes, linkable URLs etc.
2017-10-19 16:32:00 +09:00
Ian Barwick
35c8bb4e75 docs: update "repmgr cluster show" page 2017-10-19 16:21:59 +09:00
Ian Barwick
6b9ac22029 docs: expand release notes and redirect "changes-in-repmgr4.md" 2017-10-19 14:09:14 +09:00
Ian Barwick
7bf3c78f57 Add 4.0 release notes 2017-10-19 13:58:41 +09:00
Ian Barwick
34ee16899e doc: add missing entry for "priority" in repmgr.conf.sample
Per report from Shaun Thomas.
2017-10-19 13:14:52 +09:00
Ian Barwick
0938685ae7 docs: add more index references 2017-10-19 12:21:50 +09:00
Ian Barwick
b400436fba docs: note way of forcing recovery then quitting in single user mode 2017-10-18 22:31:06 +09:00
Ian Barwick
2745c92fc8 Documentation: update markup 2017-10-18 11:12:20 +09:00
Ian Barwick
34c0131b2d Update package signature documentation 2017-10-18 10:50:49 +09:00
Ian Barwick
c9abfdcc04 Document "upgrading-from-repmgr3.md" moved to main repmgr documentation 2017-10-18 09:37:16 +09:00
Ian Barwick
a878d7aaea Update "repmgr node rejoin" documentation 2017-10-17 17:40:50 +09:00
Ian Barwick
93aa7cea1a Add placeholder FAQ.md
This replaces the original FAQ maintained for repmgr 3.x; repmgr 4
documentation is now available in DocBook format.
2017-10-17 16:31:55 +09:00
Ian Barwick
f00e6296e9 Move deprecated command line option
Not required in repmgr4, but kept for backwards compatibility;
a warning will be issued if used.
2017-10-17 16:07:44 +09:00
Ian Barwick
91354a71cc Add FAQ to documentation 2017-10-17 15:46:36 +09:00
Ian Barwick
c78cb6e1d6 Bump dev version number 2017-10-17 13:09:37 +09:00
Ian Barwick
71430a9f65 Various documentation fixes 2017-10-17 11:00:37 +09:00
Ian Barwick
3e93f847fd Update doc version 2017-10-16 11:25:56 +09:00
59 changed files with 3518 additions and 1070 deletions

45
HISTORY

@@ -1,3 +1,48 @@
4.1.0 2018-??-??
repmgr: change default log_level to INFO, add documentation; GitHub #470 (Ian)
repmgr: add "--missing-slots" check to "repmgr node check" (Ian)
repmgr: improve command line error handling; GitHub #464 (Ian)
repmgr: fix "standby register --wait-sync" when no timeout provided (Ian)
repmgr: "cluster show" returns non-zero value if an issue encountered;
GitHub #456 (Ian)
repmgr: "node check" and "node status" return non-zero value if an issue
encountered (Ian)
repmgr: add CSV output mode to "cluster event"; GitHub #471 (Ian)
repmgr: add -q/--quiet option to suppress non-error output; GitHub #468 (Ian)
repmgr: "node status" returns non-zero value if an issue encountered (Ian)
repmgr: enable "recovery_min_apply_delay" to be 0; GitHub #448 (Ian)
repmgr: "cluster cleanup" - add missing help options; GitHub #461/#462 (gclough)
repmgr: ensure witness node follows new primary after switchover;
GitHub #453 (Ian)
repmgr: fix witness node handling in "node check"/"node status";
GitHub #451 (Ian)
repmgr: fix "primary_slot_name" when using "standby clone" with --recovery-conf-only;
GitHub #474 (Ian)
repmgr: don't perform a switchover if an exclusive backup is running;
GitHub #476 (Martín)
repmgr: enable "witness unregister" to be run on any node; GitHub #472 (Ian)
repmgrd: create a PID file by default; GitHub #457 (Ian)
repmgrd: daemonize process by default; GitHub #458 (Ian)
4.0.6 2018-06-14
repmgr: (witness register) prevent registration of a witness server with the
same name as an existing node (Ian)
repmgr: (standby follow) check node has actually connected to new primary
before reporting success; GitHub #444 (Ian)
repmgr: (standby clone) improve handling of external configuration file copying,
including consideration in --dry-run check; GitHub #443 (Ian)
repmgr: (standby clone) don't require presence of "user" parameter in
conninfo string; GitHub #437 (Ian)
repmgr: (standby clone) improve documentation of --recovery-conf-only
mode; GitHub #438 (Ian)
repmgr: (node rejoin) fix bug when parsing --config-files parameter;
GitHub #442 (Ian)
repmgr: when using --dry-run, force log level to INFO to ensure output
will always be displayed; GitHub #441 (Ian)
repmgr: (cluster matrix/crosscheck) return non-zero exit code if node
connection issues detected; GitHub #447 (Ian)
repmgrd: ensure local node is counted as quorum member; GitHub #439 (Ian)
4.0.5 2018-05-02
repmgr: poll demoted primary after restart as a standby during a
switchover operation; GitHub #408 (Ian)

View File

@@ -11,7 +11,10 @@ EXTENSION = repmgr
DATA = \
repmgr--unpackaged--4.0.sql \
repmgr--4.0.sql
repmgr--4.0.sql \
repmgr--4.0--4.1.sql \
repmgr--4.1.sql
REGRESS = repmgr_extension

View File

@@ -29,9 +29,6 @@ static bool config_file_provided = false;
bool config_file_found = false;
static void _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *warning_list);
static bool parse_bool(const char *s,
const char *config_item,
ItemList *error_list);
static void _parse_line(char *buf, char *name, char *value);
static void parse_event_notifications_list(t_configuration_options *options, const char *arg);
@@ -319,13 +316,26 @@ _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *
options->use_primary_conninfo_password = false;
memset(options->passfile, 0, sizeof(options->passfile));
/*-----------------------
/*-------------------------
* standby promote settings
*------------------------
*-------------------------
*/
options->promote_check_timeout = DEFAULT_PROMOTE_CHECK_TIMEOUT;
options->promote_check_interval = DEFAULT_PROMOTE_CHECK_INTERVAL;
/*------------------------
* standby follow settings
*------------------------
*/
options->primary_follow_timeout = DEFAULT_PRIMARY_FOLLOW_TIMEOUT;
options->standby_follow_timeout = DEFAULT_STANDBY_FOLLOW_TIMEOUT;
/*------------------------
* standby switchover settings
*------------------------
*/
options->standby_reconnect_timeout = DEFAULT_STANDBY_RECONNECT_TIMEOUT;
/*-----------------
* repmgrd settings
*-----------------
@@ -345,8 +355,8 @@ _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *
options->degraded_monitoring_timeout = -1;
options->async_query_timeout = DEFAULT_ASYNC_QUERY_TIMEOUT;
options->primary_notification_timeout = DEFAULT_PRIMARY_NOTIFICATION_TIMEOUT;
options->primary_follow_timeout = DEFAULT_PRIMARY_FOLLOW_TIMEOUT;
options->standby_reconnect_timeout = DEFAULT_STANDBY_RECONNECT_TIMEOUT;
options->repmgrd_standby_startup_timeout = -1; /* defaults to "standby_reconnect_timeout" if not set */
memset(options->repmgrd_pid_file, 0, sizeof(options->repmgrd_pid_file));
/*-------------
* witness settings
@@ -527,6 +537,20 @@ _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *
else if (strcmp(name, "promote_check_interval") == 0)
options->promote_check_interval = repmgr_atoi(value, name, error_list, 1);
/* standby follow settings */
else if (strcmp(name, "primary_follow_timeout") == 0)
options->primary_follow_timeout = repmgr_atoi(value, name, error_list, 0);
else if (strcmp(name, "standby_follow_timeout") == 0)
options->standby_follow_timeout = repmgr_atoi(value, name, error_list, 0);
/* standby switchover settings */
else if (strcmp(name, "standby_reconnect_timeout") == 0)
options->standby_reconnect_timeout = repmgr_atoi(value, name, error_list, 0);
/* node rejoin settings */
else if (strcmp(name, "node_rejoin_timeout") == 0)
options->node_rejoin_timeout = repmgr_atoi(value, name, error_list, 0);
/* node check settings */
else if (strcmp(name, "archive_ready_warning") == 0)
options->archive_ready_warning = repmgr_atoi(value, name, error_list, 1);
@@ -576,10 +600,10 @@ _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *
options->async_query_timeout = repmgr_atoi(value, name, error_list, 0);
else if (strcmp(name, "primary_notification_timeout") == 0)
options->primary_notification_timeout = repmgr_atoi(value, name, error_list, 0);
else if (strcmp(name, "primary_follow_timeout") == 0)
options->primary_follow_timeout = repmgr_atoi(value, name, error_list, 0);
else if (strcmp(name, "standby_reconnect_timeout") == 0)
options->standby_reconnect_timeout = repmgr_atoi(value, name, error_list, 0);
else if (strcmp(name, "repmgrd_standby_startup_timeout") == 0)
options->repmgrd_standby_startup_timeout = repmgr_atoi(value, name, error_list, 0);
else if (strcmp(name, "repmgrd_pid_file") == 0)
strncpy(options->repmgrd_pid_file, value, MAXPGPATH);
/* witness settings */
else if (strcmp(name, "witness_sync_interval") == 0)
@@ -761,6 +785,18 @@ _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *
PQconninfoFree(conninfo_options);
}
/* set values for parameters which default to other parameters */
/*
* From 4.1, "repmgrd_standby_startup_timeout" replaces "standby_reconnect_timeout"
* in repmgrd; fall back to "standby_reconnect_timeout" if no value explicitly provided
*/
if (options->repmgrd_standby_startup_timeout == -1)
{
options->repmgrd_standby_startup_timeout = options->standby_reconnect_timeout;
}
/* add warning about changed "barman_" parameter meanings */
if ((options->barman_host[0] == '\0' && options->barman_server[0] != '\0') ||
(options->barman_host[0] != '\0' && options->barman_server[0] == '\0'))
@@ -785,6 +821,12 @@ _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *
item_list_append(error_list,
_("\"replication_lag_critical\" must be greater than \"replication_lag_warning\""));
}
if (options->standby_reconnect_timeout < options->node_rejoin_timeout)
{
item_list_append(error_list,
_("\"standby_reconnect_timeout\" must be equal to or greater than \"node_rejoin_timeout\""));
}
}
@@ -949,12 +991,11 @@ parse_time_unit_parameter(const char *name, const char *value, char *dest, ItemL
char *ptr = NULL;
int targ = strtol(value, &ptr, 10);
if (targ < 1)
if (targ < 0)
{
if (errors != NULL)
{
item_list_append_format(
errors,
item_list_append_format(errors,
_("invalid value provided for \"%s\""),
name);
}
@@ -1008,6 +1049,7 @@ parse_time_unit_parameter(const char *name, const char *value, char *dest, ItemL
* - promote_delay
* - reconnect_attempts
* - reconnect_interval
* - repmgrd_standby_startup_timeout
* - retry_promote_interval_secs
*
* non-changeable options
@@ -1033,17 +1075,36 @@ reload_config(t_configuration_options *orig_options)
static ItemList config_errors = {NULL, NULL};
static ItemList config_warnings = {NULL, NULL};
PQExpBufferData errors;
log_info(_("reloading configuration file"));
_parse_config(&new_options, &config_errors, &config_warnings);
if (config_errors.head != NULL)
{
/* XXX dump errors to log */
ItemListCell *cell = NULL;
log_warning(_("unable to parse new configuration, retaining current configuration"));
initPQExpBuffer(&errors);
appendPQExpBuffer(&errors,
"following errors were detected:\n");
for (cell = config_errors.head; cell; cell = cell->next)
{
appendPQExpBuffer(&errors,
" %s\n", cell->string);
}
log_detail("%s", errors.data);
termPQExpBuffer(&errors);
return false;
}
/* The following options cannot be changed */
if (new_options.node_id != orig_options->node_id)
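The error-dumping loop added to reload_config() walks a linked list and appends one indented line per entry. The same pattern is sketched below with a fixed-size buffer instead of libpq's PQExpBuffer, so that it compiles without PostgreSQL headers. The `error_cell` struct and `format_error_list` function are hypothetical; only the header text and the per-line format come from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical minimal list cell; stands in for ItemListCell. */
typedef struct error_cell
{
    struct error_cell *next;
    const char *string;
} error_cell;

/*
 * Format all errors into "output", one per line.
 * Returns false if the buffer is too small for the complete message.
 */
bool
format_error_list(const error_cell *head, char *output, size_t maxlen)
{
    const error_cell *cell;
    size_t      used;
    int         written;

    assert(output != NULL && maxlen > 0);

    written = snprintf(output, maxlen, "following errors were detected:\n");
    if (written < 0 || (size_t) written >= maxlen)
        return false;
    used = (size_t) written;

    for (cell = head; cell != NULL; cell = cell->next)
    {
        written = snprintf(output + used, maxlen - used, "  %s\n", cell->string);
        if (written < 0 || (size_t) written >= maxlen - used)
            return false;
        used += (size_t) written;
    }

    return true;
}
```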
@@ -1224,6 +1285,15 @@ reload_config(t_configuration_options *orig_options)
config_changed = true;
}
/* repmgrd_standby_startup_timeout */
if (orig_options->repmgrd_standby_startup_timeout != new_options.repmgrd_standby_startup_timeout)
{
orig_options->repmgrd_standby_startup_timeout = new_options.repmgrd_standby_startup_timeout;
log_info(_("\"repmgrd_standby_startup_timeout\" is now \"%i\""), new_options.repmgrd_standby_startup_timeout);
config_changed = true;
}
/*
* Handle changes to logging configuration
*/
@@ -1316,13 +1386,23 @@ exit_with_config_file_errors(ItemList *config_errors, ItemList *config_warnings,
void
exit_with_cli_errors(ItemList *error_list)
exit_with_cli_errors(ItemList *error_list, const char *repmgr_command)
{
fprintf(stderr, _("The following command line errors were encountered:\n"));
print_item_list(error_list);
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname());
if (repmgr_command != NULL)
{
fprintf(stderr, _("Try \"%s --help\" or \"%s %s --help\" for more information.\n"),
progname(),
progname(),
repmgr_command);
}
else
{
fprintf(stderr, _("Try \"%s --help\" for more information.\n"), progname());
}
exit(ERR_BAD_CONFIG);
}
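The new second argument lets the hint point at the help for the specific command as well as the general help. A testable sketch of just the message selection follows; `format_help_hint` is a hypothetical helper that writes to a buffer rather than to stderr, and the program name is passed in instead of being obtained from progname().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the "Try ... --help" hint. If "command" is NULL, only the general
 * help is suggested. Returns the value reported by snprintf().
 */
int
format_help_hint(char *output, size_t maxlen,
                 const char *program, const char *command)
{
    assert(output != NULL && program != NULL);

    if (command != NULL)
    {
        return snprintf(output, maxlen,
                        "Try \"%s --help\" or \"%s %s --help\" for more information.",
                        program, program, command);
    }

    return snprintf(output, maxlen,
                    "Try \"%s --help\" for more information.",
                    program);
}
```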
@@ -1427,7 +1507,7 @@ repmgr_atoi(const char *value, const char *config_item, ItemList *error_list, in
*
* https://www.postgresql.org/docs/current/static/config-setting.html
*/
static bool
bool
parse_bool(const char *s, const char *config_item, ItemList *error_list)
{
PQExpBufferData errors;
@@ -1713,6 +1793,9 @@ free_parsed_argv(char ***argv_array)
}
bool
parse_pg_basebackup_options(const char *pg_basebackup_options, t_basebackup_options *backup_options, int server_version_num, ItemList *error_list)
{

View File

@@ -98,6 +98,16 @@ typedef struct
int promote_check_timeout;
int promote_check_interval;
/* standby follow settings */
int primary_follow_timeout;
int standby_follow_timeout;
/* standby switchover settings */
int standby_reconnect_timeout;
/* node rejoin settings */
int node_rejoin_timeout;
/* node check settings */
int archive_ready_warning;
int archive_ready_critical;
@@ -120,8 +130,8 @@ typedef struct
int degraded_monitoring_timeout;
int async_query_timeout;
int primary_notification_timeout;
int primary_follow_timeout;
int standby_reconnect_timeout;
int repmgrd_standby_startup_timeout;
char repmgrd_pid_file[MAXPGPATH];
/* BDR settings */
bool bdr_local_monitoring_only;
@@ -167,6 +177,13 @@ typedef struct
false, "", "", { NULL, NULL }, "", false, "", false, "", \
/* standby promote settings */ \
DEFAULT_PROMOTE_CHECK_TIMEOUT, DEFAULT_PROMOTE_CHECK_INTERVAL, \
/* standby follow settings */ \
DEFAULT_PRIMARY_FOLLOW_TIMEOUT, \
DEFAULT_STANDBY_FOLLOW_TIMEOUT, \
/* standby switchover settings */ \
DEFAULT_STANDBY_RECONNECT_TIMEOUT, \
/* node rejoin settings */ \
DEFAULT_NODE_REJOIN_TIMEOUT, \
/* node check settings */ \
DEFAULT_ARCHIVE_READY_WARNING, DEFAULT_ARCHIVE_READY_CRITICAL, \
DEFAULT_REPLICATION_LAG_WARNING, DEFAULT_REPLICATION_LAG_CRITICAL, \
@@ -180,8 +197,7 @@ typedef struct
false, -1, \
DEFAULT_ASYNC_QUERY_TIMEOUT, \
DEFAULT_PRIMARY_NOTIFICATION_TIMEOUT, \
DEFAULT_PRIMARY_FOLLOW_TIMEOUT, \
DEFAULT_STANDBY_RECONNECT_TIMEOUT, \
-1, "", \
/* BDR settings */ \
false, DEFAULT_BDR_RECOVERY_TIMEOUT, \
/* service settings */ \
@@ -267,6 +283,10 @@ bool reload_config(t_configuration_options *orig_options);
bool parse_recovery_conf(const char *data_dir, t_recovery_conf *conf);
bool parse_bool(const char *s,
const char *config_item,
ItemList *error_list);
int repmgr_atoi(const char *s,
const char *config_item,
ItemList *error_list,
@@ -282,7 +302,7 @@ void free_parsed_argv(char ***argv_array);
/* called by repmgr-client and repmgrd */
void exit_with_cli_errors(ItemList *error_list);
void exit_with_cli_errors(ItemList *error_list, const char *repmgr_command);
void print_item_list(ItemList *item_list);
#endif /* _REPMGR_CONFIGFILE_H_ */

18
configure vendored

@@ -1,6 +1,6 @@
#! /bin/sh
# Guess values for system-dependent variables and create Makefiles.
# Generated by GNU Autoconf 2.69 for repmgr 4.0.5.
# Generated by GNU Autoconf 2.69 for repmgr 4.1.
#
# Report bugs to <pgsql-bugs@postgresql.org>.
#
@@ -582,8 +582,8 @@ MAKEFLAGS=
# Identity of this package.
PACKAGE_NAME='repmgr'
PACKAGE_TARNAME='repmgr'
PACKAGE_VERSION='4.0.5'
PACKAGE_STRING='repmgr 4.0.5'
PACKAGE_VERSION='4.1'
PACKAGE_STRING='repmgr 4.1'
PACKAGE_BUGREPORT='pgsql-bugs@postgresql.org'
PACKAGE_URL='https://2ndquadrant.com/en/resources/repmgr/'
@@ -1178,7 +1178,7 @@ if test "$ac_init_help" = "long"; then
# Omit some internal or obsolete options to make the list less imposing.
# This message is too long to be a string in the A/UX 3.1 sh.
cat <<_ACEOF
\`configure' configures repmgr 4.0.5 to adapt to many kinds of systems.
\`configure' configures repmgr 4.1 to adapt to many kinds of systems.
Usage: $0 [OPTION]... [VAR=VALUE]...
@@ -1239,7 +1239,7 @@ fi
if test -n "$ac_init_help"; then
case $ac_init_help in
short | recursive ) echo "Configuration of repmgr 4.0.5:";;
short | recursive ) echo "Configuration of repmgr 4.1:";;
esac
cat <<\_ACEOF
@@ -1313,7 +1313,7 @@ fi
test -n "$ac_init_help" && exit $ac_status
if $ac_init_version; then
cat <<\_ACEOF
repmgr configure 4.0.5
repmgr configure 4.1
generated by GNU Autoconf 2.69
Copyright (C) 2012 Free Software Foundation, Inc.
@@ -1332,7 +1332,7 @@ cat >config.log <<_ACEOF
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.
It was created by repmgr $as_me 4.0.5, which was
It was created by repmgr $as_me 4.1, which was
generated by GNU Autoconf 2.69. Invocation command line was
$ $0 $@
@@ -2359,7 +2359,7 @@ cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
# report actual input values of CONFIG_FILES etc. instead of their
# values after options handling.
ac_log="
This file was extended by repmgr $as_me 4.0.5, which was
This file was extended by repmgr $as_me 4.1, which was
generated by GNU Autoconf 2.69. Invocation command line was
CONFIG_FILES = $CONFIG_FILES
@@ -2422,7 +2422,7 @@ _ACEOF
cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
ac_cs_config="`$as_echo "$ac_configure_args" | sed 's/^ //; s/[\\""\`\$]/\\\\&/g'`"
ac_cs_version="\\
repmgr config.status 4.0.5
repmgr config.status 4.1
configured by $0, generated by GNU Autoconf 2.69,
with options \\"\$ac_cs_config\\"

View File

@@ -1,4 +1,4 @@
AC_INIT([repmgr], [4.0.5], [pgsql-bugs@postgresql.org], [repmgr], [https://2ndquadrant.com/en/resources/repmgr/])
AC_INIT([repmgr], [4.1], [pgsql-bugs@postgresql.org], [repmgr], [https://2ndquadrant.com/en/resources/repmgr/])
AC_COPYRIGHT([Copyright (c) 2010-2018, 2ndQuadrant Ltd.])

520
dbutils.c

@@ -23,6 +23,7 @@
#include <sys/time.h>
#include <sys/stat.h>
#include <dirent.h>
#include <arpa/inet.h>
#include "repmgr.h"
#include "dbutils.h"
@@ -32,6 +33,12 @@
/* mainly for use by repmgrd */
int server_version_num = UNKNOWN_SERVER_VERSION_NUM;
/*
* This is set by is_bdr_db(), which is called by every BDR-related
* action anyway; this is required to be able to generate appropriate
* queries for versions 2 and 3.
*/
int bdr_version_num = UNKNOWN_BDR_VERSION_NUM;
static PGconn *_establish_db_connection(const char *conninfo,
const bool exit_on_error,
@@ -83,7 +90,10 @@ wrap_ddl_query(PQExpBufferData *query_buf, int replication_type, const char *fmt
if (replication_type == REPLICATION_TYPE_BDR)
{
appendPQExpBuffer(query_buf, "SELECT bdr.bdr_replicate_ddl_command($repmgr$");
if (bdr_version_num < 3)
appendPQExpBuffer(query_buf, "SELECT bdr.bdr_replicate_ddl_command($repmgr$");
else
appendPQExpBuffer(query_buf, "SELECT bdr.replicate_ddl_command($repmgr$");
}
va_start(arglist, fmt);
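The version switch in wrap_ddl_query() amounts to choosing a function name from the BDR major version. A standalone sketch of that choice is shown below; the two SQL function names are taken from the diff, while `bdr_replicate_ddl_function` and `format_ddl_prefix` are hypothetical helpers introduced for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Return the DDL replication function name for the given BDR major version. */
const char *
bdr_replicate_ddl_function(int bdr_major_version)
{
    if (bdr_major_version < 3)
        return "bdr.bdr_replicate_ddl_command";

    return "bdr.replicate_ddl_command";
}

/*
 * Write the opening part of the wrapped query, up to and including the
 * dollar-quote tag. Returns the value reported by snprintf().
 */
int
format_ddl_prefix(char *output, size_t maxlen, int bdr_major_version)
{
    assert(output != NULL);

    return snprintf(output, maxlen, "SELECT %s($repmgr$",
                    bdr_replicate_ddl_function(bdr_major_version));
}
```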
@@ -370,6 +380,37 @@ get_conninfo_value(const char *conninfo, const char *keyword, char *output)
}
/*
* Get a default conninfo value for the provided parameter, and copy
* it to the 'output' buffer.
*
* Returns true on success, or false on failure (provided keyword not found).
*
*/
bool
get_conninfo_default_value(const char *param, char *output, int maxlen)
{
PQconninfoOption *defs = NULL;
PQconninfoOption *def = NULL;
bool found = false;
defs = PQconndefaults();
for (def = defs; def->keyword; def++)
{
if (strncmp(def->keyword, param, maxlen) == 0)
{
strncpy(output, def->val, maxlen);
found = true;
}
}
PQconninfoFree(defs);
return found;
}
void
initialize_conninfo_params(t_conninfo_param_list *param_list, bool set_defaults)
{
@@ -1560,6 +1601,39 @@ repmgrd_get_local_node_id(PGconn *conn)
}
/*
* Function that checks if the primary is in exclusive backup mode.
* We'll use this when executing an action that can conflict with an
* exclusive backup.
*/
BackupState
server_in_exclusive_backup_mode(PGconn *conn)
{
BackupState backup_state = BACKUP_STATE_UNKNOWN;
PGresult *res = PQexec(conn, "SELECT pg_catalog.pg_is_in_backup()");
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
log_error(_("unable to retrieve information regarding backup mode of node"));
log_detail("%s", PQerrorMessage(conn));
PQclear(res);
return BACKUP_STATE_UNKNOWN;
}
if (atobool(PQgetvalue(res, 0, 0)) == true)
{
backup_state = BACKUP_STATE_IN_BACKUP;
}
else
{
backup_state = BACKUP_STATE_NO_BACKUP;
}
PQclear(res);
return backup_state;
}
/* ================ */
/* result functions */
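A caller of server_in_exclusive_backup_mode() has three states to handle, not two. The sketch below shows one conservative way a switchover check might interpret them. The enumerator names appear in the diff, but their numeric values, the `switchover_permitted` function, and the choice to treat an unknown state as blocking are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Enumerator names as used in the diff; the values are assumed. */
typedef enum
{
    BACKUP_STATE_UNKNOWN = -1,
    BACKUP_STATE_IN_BACKUP,
    BACKUP_STATE_NO_BACKUP
} BackupState;

/*
 * Decide whether a switchover may proceed. Only a definite "no backup"
 * answer permits it; a failed query (unknown state) is treated like a
 * running backup, on the assumption that refusing is the safer default.
 */
bool
switchover_permitted(BackupState state)
{
    switch (state)
    {
        case BACKUP_STATE_NO_BACKUP:
            return true;
        case BACKUP_STATE_IN_BACKUP:
        case BACKUP_STATE_UNKNOWN:
            return false;
    }

    /* not reached for valid enumerators */
    assert(false);
    return false;
}
```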
@@ -1733,7 +1807,7 @@ _populate_node_record(PGresult *res, t_node_info *node_info, int row)
strncpy(node_info->location, PQgetvalue(res, row, 7), MAXLEN);
node_info->priority = atoi(PQgetvalue(res, row, 8));
node_info->active = atobool(PQgetvalue(res, row, 9));
strncpy(node_info->config_file, PQgetvalue(res, row, 10), MAXLEN);
strncpy(node_info->config_file, PQgetvalue(res, row, 10), MAXPGPATH);
/* This won't normally be set */
strncpy(node_info->upstream_node_name, PQgetvalue(res, row, 11), MAXLEN);
@@ -2146,8 +2220,9 @@ get_downstream_nodes_with_missing_slot(PGconn *conn, int this_node_id, NodeInfoL
"LEFT JOIN pg_catalog.pg_replication_slots rs "
" ON rs.slot_name = n.slot_name "
" WHERE n.slot_name IS NOT NULL"
" AND rs.slot_name IS NULL "
" AND n.upstream_node_id = %i ",
" AND rs.slot_name IS NULL "
" AND n.upstream_node_id = %i "
" AND n.type = 'standby'",
this_node_id);
log_verbose(LOG_DEBUG, "get_downstream_nodes_with_missing_slot():\n%s", query.data);
@@ -2207,6 +2282,7 @@ _create_update_node_record(PGconn *conn, char *action, t_node_info *node_info)
const char *param_values[param_count];
PGresult *res;
bool success = true;
maxlen_snprintf(node_id, "%i", node_info->node_id);
maxlen_snprintf(priority, "%i", node_info->priority);
@@ -2293,13 +2369,13 @@ _create_update_node_record(PGconn *conn, char *action, t_node_info *node_info)
node_info->node_name,
node_info->node_id);
log_detail("%s", PQerrorMessage(conn));
PQclear(res);
return false;
success = false;
}
PQclear(res);
return true;
return success;
}
@@ -2308,6 +2384,7 @@ update_node_record_set_active(PGconn *conn, int this_node_id, bool active)
{
PQExpBufferData query;
PGresult *res = NULL;
bool success = true;
initPQExpBuffer(&query);
@@ -2326,13 +2403,13 @@ update_node_record_set_active(PGconn *conn, int this_node_id, bool active)
{
log_error(_("unable to update node record:\n %s"),
PQerrorMessage(conn));
PQclear(res);
return false;
success = false;
}
PQclear(res);
return true;
return success;
}
@@ -2341,6 +2418,7 @@ update_node_record_set_active_standby(PGconn *conn, int this_node_id)
{
PQExpBufferData query;
PGresult *res = NULL;
bool success = true;
initPQExpBuffer(&query);
@@ -2360,13 +2438,13 @@ update_node_record_set_active_standby(PGconn *conn, int this_node_id)
{
log_error(_("unable to update node record:\n %s"),
PQerrorMessage(conn));
PQclear(res);
return false;
success = false;
}
PQclear(res);
return true;
return success;
}
@@ -2435,11 +2513,13 @@ update_node_record_set_primary(PGconn *conn, int this_node_id)
return commit_transaction(conn);
}
bool
update_node_record_set_upstream(PGconn *conn, int this_node_id, int new_upstream_node_id)
{
PQExpBufferData query;
PGresult *res = NULL;
bool success = true;
log_debug(_("update_node_record_set_upstream(): Updating node %i's upstream node to %i"),
this_node_id, new_upstream_node_id);
@@ -2461,14 +2541,13 @@ update_node_record_set_upstream(PGconn *conn, int this_node_id, int new_upstream
{
log_error(_("unable to set new upstream node id:\n %s"),
PQerrorMessage(conn));
PQclear(res);
return false;
success = false;
}
PQclear(res);
return true;
return success;
}
@@ -2481,6 +2560,7 @@ update_node_record_status(PGconn *conn, int this_node_id, char *type, int upstre
{
PQExpBufferData query;
PGresult *res = NULL;
bool success = true;
initPQExpBuffer(&query);
@@ -2504,14 +2584,13 @@ update_node_record_status(PGconn *conn, int this_node_id, char *type, int upstre
{
log_error(_("unable to update node record:\n %s"),
PQerrorMessage(conn));
PQclear(res);
return false;
success = false;
}
PQclear(res);
return true;
return success;
}
@@ -2524,6 +2603,7 @@ update_node_record_conn_priority(PGconn *conn, t_configuration_options *options)
{
PQExpBufferData query;
PGresult *res = NULL;
bool success = true;
initPQExpBuffer(&query);
@@ -2541,13 +2621,12 @@ update_node_record_conn_priority(PGconn *conn, t_configuration_options *options)
if (PQresultStatus(res) != PGRES_COMMAND_OK)
{
PQclear(res);
return false;
success = false;
}
PQclear(res);
return true;
return success;
}
@@ -2611,6 +2690,7 @@ delete_node_record(PGconn *conn, int node)
{
PQExpBufferData query;
PGresult *res = NULL;
bool success = true;
initPQExpBuffer(&query);
@@ -2628,19 +2708,20 @@ delete_node_record(PGconn *conn, int node)
{
log_error(_("unable to delete node record:\n %s"),
PQerrorMessage(conn));
PQclear(res);
return false;
success = false;
}
PQclear(res);
return true;
return success;
}
bool
truncate_node_records(PGconn *conn)
{
PGresult *res = NULL;
bool success = true;
res = PQexec(conn, "TRUNCATE TABLE repmgr.nodes");
@@ -2648,12 +2729,13 @@ truncate_node_records(PGconn *conn)
{
log_error(_("unable to truncate node record table:\n %s"),
PQerrorMessage(conn));
PQclear(res);
return false;
success = false;
}
PQclear(res);
return true;
return success;
}
@@ -2884,8 +2966,7 @@ get_datadir_configuration_files(PGconn *conn, KeyValueList *list)
for (i = 0; i < PQntuples(res); i++)
{
key_value_list_set(
list,
key_value_list_set(list,
PQgetvalue(res, i, 1),
PQgetvalue(res, i, 0));
}
@@ -3110,6 +3191,8 @@ _create_event(PGconn *conn, t_configuration_options *options, int node_id, char
char event_timestamp[MAXLEN] = "";
bool success = true;
log_verbose(LOG_DEBUG, "_create_event(): event is \"%s\" for node %i", event, node_id);
/*
* Only attempt to write a record if a connection handle was provided.
* Also check that the repmgr schema has been properly initialised - if
@@ -3620,7 +3703,7 @@ get_slot_record(PGconn *conn, char *slot_name, t_replication_slot *record)
int
get_free_replication_slots(PGconn *conn)
get_free_replication_slot_count(PGconn *conn)
{
PQExpBufferData query;
PGresult *res = NULL;
@@ -3657,6 +3740,47 @@ get_free_replication_slots(PGconn *conn)
}
int
get_inactive_replication_slots(PGconn *conn, KeyValueList *list)
{
PQExpBufferData query;
PGresult *res = NULL;
int i, inactive_slots = 0;
initPQExpBuffer(&query);
appendPQExpBuffer(&query,
" SELECT slot_name, slot_type "
" FROM pg_catalog.pg_replication_slots "
" WHERE active IS FALSE "
" ORDER BY slot_name ");
res = PQexec(conn, query.data);
termPQExpBuffer(&query);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
log_error(_("unable to execute replication slot query"));
log_detail("%s", PQerrorMessage(conn));
PQclear(res);
return -1;
}
inactive_slots = PQntuples(res);
for (i = 0; i < inactive_slots; i++)
{
key_value_list_set(list,
PQgetvalue(res, i, 0),
PQgetvalue(res, i, 1));
}
PQclear(res);
return inactive_slots;
}
/* ==================== */
/* tablespace functions */
/* ==================== */
@@ -4277,6 +4401,7 @@ get_last_wal_receive_location(PGconn *conn)
/* BDR functions */
/* ============= */
static bool
_is_bdr_db(PGconn *conn, PQExpBufferData *output, bool quiet)
{
@@ -4287,7 +4412,9 @@ _is_bdr_db(PGconn *conn, PQExpBufferData *output, bool quiet)
initPQExpBuffer(&query);
appendPQExpBuffer(&query,
"SELECT pg_catalog.count(*) FROM pg_catalog.pg_extension WHERE extname='bdr'");
" SELECT (pg_catalog.regexp_matches(extversion, '^\\d+'))[1] AS major_version "
" FROM pg_catalog.pg_extension "
" WHERE extname = 'bdr' ");
res = PQexec(conn, query.data);
termPQExpBuffer(&query);
@@ -4295,14 +4422,18 @@ _is_bdr_db(PGconn *conn, PQExpBufferData *output, bool quiet)
if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) == 0)
{
is_bdr_db = false;
bdr_version_num = UNKNOWN_BDR_VERSION_NUM;
}
else
{
is_bdr_db = atoi(PQgetvalue(res, 0, 0)) == 1 ? true : false;
is_bdr_db = true;
bdr_version_num = atoi(PQgetvalue(res, 0, 0));
}
PQclear(res);
log_verbose(LOG_DEBUG, "BDR ext version number is %i", bdr_version_num);
if (is_bdr_db == false)
{
const char *warning = _("BDR extension is not available for this database");
@@ -4315,36 +4446,42 @@ _is_bdr_db(PGconn *conn, PQExpBufferData *output, bool quiet)
return is_bdr_db;
}
initPQExpBuffer(&query);
appendPQExpBuffer(&query,
"SELECT bdr.bdr_is_active_in_db()");
res = PQexec(conn, query.data);
termPQExpBuffer(&query);
is_bdr_db = atobool(PQgetvalue(res, 0, 0));
if (is_bdr_db == false)
if (bdr_version_num < 3)
{
const char *warning = _("BDR extension available for this database, but the database is not configured for BDR");
initPQExpBuffer(&query);
if (output != NULL)
appendPQExpBuffer(output, "%s", warning);
else if (quiet == false)
log_warning("%s", warning);
appendPQExpBuffer(&query,
"SELECT bdr.bdr_is_active_in_db()");
res = PQexec(conn, query.data);
termPQExpBuffer(&query);
is_bdr_db = atobool(PQgetvalue(res, 0, 0));
if (is_bdr_db == false)
{
const char *warning = _("BDR extension available for this database, but the database is not configured for BDR");
if (output != NULL)
appendPQExpBuffer(output, "%s", warning);
else if (quiet == false)
log_warning("%s", warning);
}
PQclear(res);
}
PQclear(res);
return is_bdr_db;
}
bool
is_bdr_db(PGconn *conn, PQExpBufferData *output)
{
return _is_bdr_db(conn, output, false);
}
bool
is_bdr_db_quiet(PGconn *conn)
{
@@ -4352,6 +4489,11 @@ is_bdr_db_quiet(PGconn *conn)
}
int
get_bdr_version_num(void)
{
return bdr_version_num;
}
bool
is_active_bdr_node(PGconn *conn, const char *node_name)
@@ -4361,13 +4503,29 @@ is_active_bdr_node(PGconn *conn, const char *node_name)
bool is_active_bdr_node = false;
initPQExpBuffer(&query);
appendPQExpBuffer(&query,
" SELECT COALESCE(s.active, TRUE) AS active"
" FROM bdr.bdr_nodes n "
" LEFT JOIN pg_catalog.pg_replication_slots s "
" ON s.slot_name=bdr.bdr_format_slot_name(n.node_sysid, n.node_timeline, n.node_dboid, (SELECT oid FROM pg_catalog.pg_database WHERE datname = pg_catalog.current_database())) "
" WHERE n.node_name='%s' ",
node_name);
if (bdr_version_num < 3)
{
appendPQExpBuffer(&query,
" SELECT COALESCE(s.active, TRUE) AS active"
" FROM bdr.bdr_nodes n "
" LEFT JOIN pg_catalog.pg_replication_slots s "
" ON s.slot_name=bdr.bdr_format_slot_name(n.node_sysid, n.node_timeline, n.node_dboid, (SELECT oid FROM pg_catalog.pg_database WHERE datname = pg_catalog.current_database())) "
" WHERE n.node_name='%s' ",
node_name);
}
else
{
appendPQExpBuffer(&query,
" SELECT COALESCE(s.active, FALSE) AS active"
" FROM bdr.node bn "
" INNER JOIN pglogical.node pn "
" ON (pn.node_id = bn.pglogical_node_id) "
" LEFT JOIN pg_catalog.pg_replication_slots s "
" ON s.slot_name=bn.local_slot_name "
" WHERE pn.node_name='%s' ",
node_name);
}
log_verbose(LOG_DEBUG, "is_active_bdr_node():\n %s", query.data);
@@ -4421,6 +4579,64 @@ is_bdr_repmgr(PGconn *conn)
}
/*
* Get name of default BDR replication set.
*
* Caller must free provided value.
*/
char *
get_default_bdr_replication_set(PGconn *conn)
{
PQExpBufferData query;
PGresult *res = NULL;
char *default_replication_set = NULL;
int namelen;
if (bdr_version_num < 3)
{
/* For BDR2, we use a custom replication set */
namelen = strlen(BDR2_REPLICATION_SET_NAME);
default_replication_set = pg_malloc0(namelen + 1);
strncpy(default_replication_set, BDR2_REPLICATION_SET_NAME, namelen);
return default_replication_set;
}
initPQExpBuffer(&query);
appendPQExpBuffer(&query,
" SELECT rs.set_name "
" FROM pglogical.replication_set rs "
" INNER JOIN bdr.node_group ng "
" ON ng.node_group_default_repset = rs.set_id ");
res = PQexec(conn, query.data);
termPQExpBuffer(&query);
if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) == 0)
{
log_warning(_("unable to retrieve default BDR replication set name"));
if (PQresultStatus(res) != PGRES_TUPLES_OK)
log_detail("%s", PQerrorMessage(conn));
PQclear(res);
return NULL;
}
namelen = strlen(PQgetvalue(res, 0, 0));
default_replication_set = pg_malloc0(namelen + 1);
strncpy(default_replication_set, PQgetvalue(res, 0, 0), namelen);
PQclear(res);
return default_replication_set;
}
bool
is_table_in_bdr_replication_set(PGconn *conn, const char *tablename, const char *set)
{
@@ -4430,12 +4646,28 @@ is_table_in_bdr_replication_set(PGconn *conn, const char *tablename, const char
initPQExpBuffer(&query);
appendPQExpBuffer(&query,
"SELECT pg_catalog.count(*) "
" FROM pg_catalog.unnest(bdr.table_get_replication_sets('repmgr.%s')) AS repset "
" WHERE repset='%s' ",
tablename,
set);
if (bdr_version_num < 3)
{
appendPQExpBuffer(&query,
"SELECT pg_catalog.count(*) "
" FROM pg_catalog.unnest(bdr.table_get_replication_sets('repmgr.%s')) AS repset "
" WHERE repset='%s' ",
tablename,
set);
}
else
{
appendPQExpBuffer(&query,
" SELECT pg_catalog.count(*) "
" FROM pglogical.replication_set s "
" INNER JOIN pglogical.replication_set_table st "
" ON s.set_id = st.set_id "
" WHERE s.set_name = '%s' "
" AND st.set_reloid = 'repmgr.%s'::REGCLASS ",
set,
tablename);
}
res = PQexec(conn, query.data);
termPQExpBuffer(&query);
@@ -4461,32 +4693,44 @@ add_table_to_bdr_replication_set(PGconn *conn, const char *tablename, const char
{
PQExpBufferData query;
PGresult *res = NULL;
bool success = true;
initPQExpBuffer(&query);
appendPQExpBuffer(&query,
"SELECT bdr.table_set_replication_sets('repmgr.%s', '{%s}')",
tablename,
set);
if (bdr_version_num < 3)
{
appendPQExpBuffer(&query,
"SELECT bdr.table_set_replication_sets('repmgr.%s', '{%s}')",
tablename,
set);
}
else
{
appendPQExpBuffer(&query,
" SELECT bdr.replication_set_add_table( "
" relation := 'repmgr.%s', "
" set_name := '%s' "
" ) ",
tablename,
set);
}
res = PQexec(conn, query.data);
termPQExpBuffer(&query);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
log_error(_("unable to add table \"repmgr.%s\" to replication set \"%s\":\n %s"),
log_error(_("unable to add table \"repmgr.%s\" to replication set \"%s\""),
tablename,
set,
PQerrorMessage(conn));
set);
log_detail("%s", PQerrorMessage(conn));
if (res != NULL)
PQclear(res);
return false;
success = false;
}
PQclear(res);
return true;
return success;
}
@@ -4499,8 +4743,16 @@ bdr_node_name_matches(PGconn *conn, const char *node_name, PQExpBufferData *bdr_
initPQExpBuffer(&query);
appendPQExpBuffer(&query,
"SELECT bdr.bdr_get_local_node_name() AS node_name");
if (bdr_version_num < 3)
{
appendPQExpBuffer(&query,
"SELECT bdr.bdr_get_local_node_name() AS node_name");
}
else
{
appendPQExpBuffer(&query,
"SELECT node_name FROM bdr.local_node_info()");
}
res = PQexec(conn, query.data);
termPQExpBuffer(&query);
@@ -4531,21 +4783,36 @@ get_bdr_node_replication_slot_status(PGconn *conn, const char *node_name)
initPQExpBuffer(&query);
appendPQExpBuffer(&query,
" SELECT s.active "
" FROM pg_catalog.pg_replication_slots s "
" WHERE slot_name = "
" (SELECT bdr.bdr_format_slot_name(node_sysid, node_timeline, node_dboid, datoid) "
" FROM bdr.bdr_nodes "
" WHERE node_name = '%s') ",
node_name);
if (bdr_version_num < 3)
{
appendPQExpBuffer(&query,
" SELECT s.active "
" FROM pg_catalog.pg_replication_slots s "
" WHERE slot_name = "
" (SELECT bdr.bdr_format_slot_name(node_sysid, node_timeline, node_dboid, datoid) "
" FROM bdr.bdr_nodes "
" WHERE node_name = '%s') ",
node_name);
}
else
{
appendPQExpBuffer(&query,
" SELECT COALESCE(s.active, FALSE) AS active"
" FROM bdr.node bn "
" INNER JOIN pglogical.node pn "
" ON (pn.node_id = bn.pglogical_node_id) "
" INNER JOIN pg_catalog.pg_replication_slots s "
" ON s.slot_name=bn.local_slot_name "
" WHERE pn.node_name='%s' ",
node_name);
}
log_verbose(LOG_DEBUG, "get_bdr_node_replication_slot_status():\n %s", query.data);
res = PQexec(conn, query.data);
termPQExpBuffer(&query);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
if (PQresultStatus(res) != PGRES_TUPLES_OK || PQntuples(res) == 0)
{
status = SLOT_UNKNOWN;
}
@@ -4596,6 +4863,9 @@ get_bdr_other_node_name(PGconn *conn, int node_id, char *node_name)
}
/*
* For BDR 2.x only
*/
void
add_extension_tables_to_bdr_replication_set(PGconn *conn)
{
@@ -4617,7 +4887,7 @@ add_extension_tables_to_bdr_replication_set(PGconn *conn)
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
/* */
/* XXX log error */
}
else
{
@@ -4625,8 +4895,7 @@ add_extension_tables_to_bdr_replication_set(PGconn *conn)
for (i = 0; i < PQntuples(res); i++)
{
add_table_to_bdr_replication_set(
conn,
add_table_to_bdr_replication_set(conn,
PQgetvalue(res, i, 0),
"repmgr");
}
@@ -4645,10 +4914,20 @@ get_all_bdr_node_records(PGconn *conn, BdrNodeInfoList *node_list)
initPQExpBuffer(&query);
appendPQExpBuffer(&query,
" SELECT " BDR_NODES_COLUMNS
" FROM bdr.bdr_nodes "
"ORDER BY node_seq_id ");
if (bdr_version_num < 3)
{
appendPQExpBuffer(&query,
" SELECT " BDR2_NODES_COLUMNS
" FROM bdr.bdr_nodes "
"ORDER BY node_seq_id ");
}
else
{
appendPQExpBuffer(&query,
" SELECT " BDR3_NODES_COLUMNS
" FROM bdr.node_summary ns "
" ORDER BY node_name");
}
log_verbose(LOG_DEBUG, "get_all_bdr_node_records():\n%s", query.data);
@@ -4669,11 +4948,22 @@ get_bdr_node_record_by_name(PGconn *conn, const char *node_name, t_bdr_node_info
initPQExpBuffer(&query);
appendPQExpBuffer(&query,
" SELECT " BDR_NODES_COLUMNS
" FROM bdr.bdr_nodes "
" WHERE node_name = '%s'",
node_name);
if (bdr_version_num < 3)
{
appendPQExpBuffer(&query,
" SELECT " BDR2_NODES_COLUMNS
" FROM bdr.bdr_nodes "
" WHERE node_name = '%s'",
node_name);
}
else
{
appendPQExpBuffer(&query,
" SELECT " BDR3_NODES_COLUMNS
" FROM bdr.node_summary ns "
" WHERE ns.node_name = '%s'",
node_name);
}
log_verbose(LOG_DEBUG, "get_bdr_node_record_by_name():\n%s", query.data);
@@ -4743,16 +5033,12 @@ _populate_bdr_node_records(PGresult *res, BdrNodeInfoList *node_list)
static void
_populate_bdr_node_record(PGresult *res, t_bdr_node_info *node_info, int row)
{
char buf[MAXLEN] = "";
strncpy(node_info->node_sysid, PQgetvalue(res, row, 0), MAXLEN);
node_info->node_timeline = atoi(PQgetvalue(res, row, 1));
node_info->node_dboid = atoi(PQgetvalue(res, row, 2));
strncpy(buf, PQgetvalue(res, row, 3), MAXLEN);
node_info->node_status = buf[0];
strncpy(node_info->node_name, PQgetvalue(res, row, 4), MAXLEN);
strncpy(node_info->node_local_dsn, PQgetvalue(res, row, 5), MAXLEN);
strncpy(node_info->node_init_from_dsn, PQgetvalue(res, row, 6), MAXLEN);
strncpy(node_info->node_name, PQgetvalue(res, row, 3), MAXLEN);
strncpy(node_info->node_local_dsn, PQgetvalue(res, row, 4), MAXLEN);
strncpy(node_info->peer_state_name, PQgetvalue(res, row, 5), MAXLEN);
}
@@ -4807,13 +5093,17 @@ bdr_node_has_repmgr_set(PGconn *conn, const char *node_name)
PGresult *res = NULL;
bool has_repmgr_set = false;
if (bdr_version_num >= 3)
return true;
initPQExpBuffer(&query);
appendPQExpBuffer(&query,
" SELECT pg_catalog.count(*) "
" FROM pg_catalog.unnest(bdr.connection_get_replication_sets('%s')) AS repset "
" WHERE repset = 'repmgr'",
node_name);
" WHERE repset = '%s'",
node_name,
BDR2_REPLICATION_SET_NAME);
res = PQexec(conn, query.data);
termPQExpBuffer(&query);
@@ -4840,26 +5130,40 @@ bdr_node_set_repmgr_set(PGconn *conn, const char *node_name)
PGresult *res = NULL;
bool success = true;
if (bdr_version_num >= 3)
return true;
initPQExpBuffer(&query);
/*
* Here we extract a list of existing replication sets, add 'repmgr', and
* set the replication sets to the new list.
*/
appendPQExpBuffer(&query,
" SELECT bdr.connection_set_replication_sets( "
" ARRAY( "
" SELECT repset::TEXT "
" FROM pg_catalog.unnest(bdr.connection_get_replication_sets('%s')) AS repset "
" UNION "
" SELECT 'repmgr'::TEXT "
" SELECT '%s'::TEXT "
" ), "
" '%s' "
" ) ",
node_name,
BDR2_REPLICATION_SET_NAME,
node_name);
log_debug("bdr_node_set_repmgr_set():\n%s", query.data);
res = PQexec(conn, query.data);
termPQExpBuffer(&query);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
log_debug("result status: %s", PQresStatus(PQresultStatus(res)));
log_error(_("unable to create replication set \"repmgr\""));
log_detail("%s", PQerrorMessage(conn));
success = false;
}
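
The delete_node_record(), truncate_node_records() and add_table_to_bdr_replication_set() hunks above replace an early `return false` with a `success` flag, so the result is released in exactly one place. A minimal self-contained sketch of that pattern follows. It assumes a hypothetical `fake_result` type and `fake_exec()`/`fake_clear()` helpers standing in for libpq's result handling; none of these names are repmgr or libpq API.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a libpq PGresult; not a real libpq type. */
typedef struct
{
	bool		ok;
} fake_result;

/* Counts how many results were released, so the pattern can be checked. */
static int	results_cleared = 0;

/* Hypothetical stand-in for executing a query. */
static fake_result *
fake_exec(bool should_succeed)
{
	fake_result *res = malloc(sizeof(fake_result));

	if (res != NULL)
		res->ok = should_succeed;

	return res;
}

/* Hypothetical stand-in for releasing a result; safe to call with NULL. */
static void
fake_clear(fake_result *res)
{
	if (res != NULL)
	{
		free(res);
		results_cleared++;
	}
}

/*
 * Single-exit version: the failure branch only records the outcome, and
 * the result is released exactly once on every path.
 */
static bool
run_command(bool should_succeed)
{
	fake_result *res = fake_exec(should_succeed);
	bool		success = true;

	if (res == NULL || res->ok == false)
	{
		success = false;
	}

	fake_clear(res);

	return success;
}
```

With a single release point, adding a further failure check later does not require remembering to free the result in each new branch.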


@@ -29,7 +29,9 @@
#include "voting.h"
#define REPMGR_NODES_COLUMNS "n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name "
#define BDR_NODES_COLUMNS "node_sysid, node_timeline, node_dboid, node_status, node_name, node_local_dsn, node_init_from_dsn, node_read_only, node_seq_id"
#define BDR2_NODES_COLUMNS "node_sysid, node_timeline, node_dboid, node_name, node_local_dsn, ''"
#define BDR3_NODES_COLUMNS "ns.node_id, 0, 0, ns.node_name, ns.interface_connstr, ns.peer_state_name"
#define ERRBUFF_SIZE 512
@@ -94,6 +96,14 @@ typedef enum
SLOT_ACTIVE
} ReplSlotStatus;
typedef enum
{
BACKUP_STATE_UNKNOWN = -1,
BACKUP_STATE_IN_BACKUP,
BACKUP_STATE_NO_BACKUP
} BackupState;
/*
* Struct to store node information
*/
@@ -237,18 +247,14 @@ typedef struct s_bdr_node_info
char node_sysid[MAXLEN];
uint32 node_timeline;
uint32 node_dboid;
char node_status;
char node_name[MAXLEN];
char node_local_dsn[MAXLEN];
char node_init_from_dsn[MAXLEN];
bool read_only;
uint32 node_seq_id;
char peer_state_name[MAXLEN];
} t_bdr_node_info;
#define T_BDR_NODE_INFO_INITIALIZER { \
"", InvalidOid, InvalidOid, \
'?', "", "", "", \
false, -1 \
"", "", "" \
}
@@ -357,7 +363,7 @@ void close_connection(PGconn **conn);
/* conninfo manipulation functions */
bool get_conninfo_value(const char *conninfo, const char *keyword, char *output);
bool get_conninfo_default_value(const char *param, char *output, int maxlen);
void initialize_conninfo_params(t_conninfo_param_list *param_list, bool set_defaults);
void free_conninfo_params(t_conninfo_param_list *param_list);
void copy_conninfo_params(t_conninfo_param_list *dest_list, t_conninfo_param_list *source_list);
@@ -369,6 +375,7 @@ bool parse_conninfo_string(const char *conninfo_str, t_conninfo_param_list *par
char *param_list_to_string(t_conninfo_param_list *param_list);
bool has_passfile(void);
/* transaction functions */
bool begin_transaction(PGconn *conn);
bool commit_transaction(PGconn *conn);
@@ -391,6 +398,7 @@ int get_ready_archive_files(PGconn *conn, const char *data_directory);
bool identify_system(PGconn *repl_conn, t_system_identification *identification);
bool repmgrd_set_local_node_id(PGconn *conn, int local_node_id);
int repmgrd_get_local_node_id(PGconn *conn);
BackupState server_in_exclusive_backup_mode(PGconn *conn);
/* extension functions */
ExtensionStatus get_repmgr_extension_status(PGconn *conn);
@@ -454,7 +462,8 @@ void create_slot_name(char *slot_name, int node_id);
bool create_replication_slot(PGconn *conn, char *slot_name, int server_version_num, PQExpBufferData *error_msg);
bool drop_replication_slot(PGconn *conn, char *slot_name);
RecordStatus get_slot_record(PGconn *conn, char *slot_name, t_replication_slot *record);
int get_free_replication_slots(PGconn *conn);
int get_free_replication_slot_count(PGconn *conn);
int get_inactive_replication_slots(PGconn *conn, KeyValueList *list);
/* tablespace functions */
bool get_tablespace_name_by_location(PGconn *conn, const char *location, char *name);
@@ -505,12 +514,14 @@ void get_node_replication_stats(PGconn *conn, int server_version_num, t_node_in
bool is_downstream_node_attached(PGconn *conn, char *node_name);
/* BDR functions */
int get_bdr_version_num(void);
void get_all_bdr_node_records(PGconn *conn, BdrNodeInfoList *node_list);
RecordStatus get_bdr_node_record_by_name(PGconn *conn, const char *node_name, t_bdr_node_info *node_info);
bool is_bdr_db(PGconn *conn, PQExpBufferData *output);
bool is_bdr_db_quiet(PGconn *conn);
bool is_active_bdr_node(PGconn *conn, const char *node_name);
bool is_bdr_repmgr(PGconn *conn);
char *get_default_bdr_replication_set(PGconn *conn);
bool is_table_in_bdr_replication_set(PGconn *conn, const char *tablename, const char *set);
bool add_table_to_bdr_replication_set(PGconn *conn, const char *tablename, const char *set);
void add_extension_tables_to_bdr_replication_set(PGconn *conn);
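
The `get_bdr_version_num()` accessor declared above returns the value that `_is_bdr_db()` derives by taking the leading digits of `extversion` in SQL and passing them to `atoi()`. As an illustration only, the same extraction can be sketched in plain C. The function name, the sentinel value and the version strings in the usage note are assumptions, not taken from repmgr.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical sentinel for "no version found"; the value of the real
 * UNKNOWN_BDR_VERSION_NUM constant is not shown in this diff.
 */
#define EXAMPLE_UNKNOWN_VERSION (-1)

/*
 * Return the leading run of digits in a version string as an integer,
 * mirroring what the '^\d+' regular expression selects in the query.
 */
static int
parse_major_version(const char *extversion)
{
	if (extversion == NULL || !isdigit((unsigned char) extversion[0]))
		return EXAMPLE_UNKNOWN_VERSION;

	/* strtol stops at the first non-digit, e.g. the '.' in "3.0.1". */
	return (int) strtol(extversion, NULL, 10);
}
```

For an illustrative input such as "3.0.1" this yields 3, which is the value the `bdr_version_num < 3` branches in the preceding file compare against.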


@@ -41,18 +41,19 @@
<title>CentOS repositories</title>
<para>
&repmgr; packages are available from the 2ndQuadrant repository, and also the PostgreSQL
community repository. The 2ndQuadrant repository is updated immediately after each
&repmgr; packages are available from the public 2ndQuadrant repository, and also the
PostgreSQL community repository. The 2ndQuadrant repository is updated immediately
after each
&repmgr; release.
</para>
<table id="centos-2ndquadrant-repository">
<title>2ndQuadrant repository</title>
<title>2ndQuadrant public repository</title>
<tgroup cols="2">
<tbody>
<row>
<entry>Repository URL:</entry>
<entry><ulink url="http://packages.2ndquadrant.com/repmgr/">http://packages.2ndquadrant.com/repmgr/</ulink></entry>
<entry><ulink url="https://rpm.2ndquadrant.com/">https://rpm.2ndquadrant.com/</ulink></entry>
</row>
<row>
<entry>Repository documentation:</entry>
@@ -363,4 +364,48 @@
</sect2>
</sect1>
<sect1 id="packages-packager-info" xreflabel="Information for packagers">
<title>Information for packagers</title>
<indexterm>
<primary>packages</primary>
<secondary>information for packagers</secondary>
</indexterm>
<para>
We recommend patching the following parameters when building the package,
so that they provide built-in default values for user convenience.
These values can nevertheless be overridden by the user, if desired.
</para>
<itemizedlist>
<listitem>
<para>
Configuration file location: the default configuration file location
can be hard-coded by patching <varname>package_conf_file</varname>
in <filename>configfile.c</filename>:
<programlisting>
/* packagers: if feasible, patch configuration file path into "package_conf_file" */
char package_conf_file[MAXPGPATH] = "";</programlisting>
</para>
<para>
See also: <xref linkend="configuration-file">
</para>
</listitem>
<listitem>
<para>
PID file location: the default <application>repmgrd</application> PID file
location can be hard-coded by patching <varname>package_pid_file</varname>
in <filename>repmgrd.c</filename>:
<programlisting>
/* packagers: if feasible, patch PID file path into "package_pid_file" */
char package_pid_file[MAXPGPATH] = "";</programlisting>
</para>
<para>
See also: <xref linkend="repmgrd-pid-file">
</para>
</listitem>
</itemizedlist>
</sect1>
</appendix>
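
The packager section above describes built-in defaults that the user can still override. A minimal sketch of that precedence, assuming a hypothetical helper `choose_conf_file()` and an illustrative patched path (neither appears in the repmgr source shown here):

```c
#include <stddef.h>

#define EXAMPLE_MAXPATH 1024

/*
 * Hypothetical packager-patched default. The path is illustrative only;
 * an empty string would mean "not patched".
 */
static const char example_package_conf_file[EXAMPLE_MAXPATH] = "/etc/repmgr/repmgr.conf";

/*
 * Choose the configuration file path: an explicit user-supplied path wins,
 * then the packager's built-in default. Returns NULL if neither is set,
 * leaving the caller to apply whatever fallback search it has.
 */
static const char *
choose_conf_file(const char *user_path, const char *package_default)
{
	if (user_path != NULL && user_path[0] != '\0')
		return user_path;

	if (package_default != NULL && package_default[0] != '\0')
		return package_default;

	return NULL;
}
```

Treating the empty string as "unset" matches the way both `package_conf_file` and `package_pid_file` are initialised to `""` in the listings above.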


@@ -15,6 +15,310 @@
See also: <xref linkend="upgrading-repmgr">
</para>
<sect1 id="release-4.1.0">
<title>Release 4.1.0</title>
<para><emphasis>???? ??, 2018</emphasis></para>
<para>
&repmgr; 4.1.0 introduces some changes to <application>repmgrd</application>
behaviour and some additional configuration parameters.
</para>
<para>
This release can be installed as a simple package upgrade from repmgr 4.0 ~ 4.0.6.
The following post-upgrade steps must be carried out:
<itemizedlist>
<listitem>
<para>
<application>repmgrd</application> (if running) must be restarted.
</para>
</listitem>
<listitem>
<para>
Execute <command>ALTER EXTENSION repmgr UPDATE</command>
on the primary server in the database where &repmgr; is installed.
</para>
</listitem>
</itemizedlist>
A restart of the PostgreSQL server is <emphasis>not</emphasis> required
for this release.
</para>
<para>
See <xref linkend="upgrading-repmgr-extension"> for more details.
</para>
<para>
Configuration changes are backwards-compatible and no changes to
<filename>repmgr.conf</filename> are required. However, users should
review the changes listed below.
</para>
<sect2>
<title>Configuration file changes</title>
<para>
<itemizedlist>
<listitem>
<para>
Default for <xref linkend="repmgr-conf-log-level"> is now <option>INFO</option>.
This produces additional informative log output, without creating excessive additional
log file volume, and matches the setting assumed for examples in the documentation.
(GitHub #470).
</para>
</listitem>
<listitem>
<para>
<varname>recovery_min_apply_delay</varname> now accepts a minimum value
of <literal>zero</literal> (GitHub #448).
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
<sect2>
<title>repmgr enhancements</title>
<para>
<itemizedlist>
<listitem>
<para>
<application>repmgr</application>: always exit with an error if an unrecognised
command line option is provided. This matches the behaviour of other PostgreSQL
utilities such as <application>psql</application>. (GitHub #464).
</para>
</listitem>
<listitem>
<para>
<application>repmgr</application>: add <option>-q/--quiet</option> option to suppress non-error
output. (GitHub #468).
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-cluster-show">repmgr cluster show</link></command>,
<command><link linkend="repmgr-node-check">repmgr node check</link></command> and
<command><link linkend="repmgr-node-status">repmgr node status</link></command>
return non-zero exit code if node status issues detected. (GitHub #456).
</para>
</listitem>
<listitem>
<para>
Add <option>--csv</option> output option for
<command><link linkend="repmgr-cluster-event">repmgr cluster event</link></command>.
(GitHub #471).
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-witness-unregister">repmgr witness unregister</link></command>
can be run on any node, by providing the ID of the witness node with <option>--node-id</option>.
(GitHub #472).
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-standby-switchover">repmgr standby switchover</link></command>
will refuse to run if an exclusive backup is taking place on the current primary.
(GitHub #476).
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
<sect2>
<title>repmgrd enhancements</title>
<para>
<itemizedlist>
<listitem>
<para>
<application>repmgrd</application>: create a PID file by default
(GitHub #457). For details, see <xref linkend="repmgrd-pid-file">.
</para>
</listitem>
<listitem>
<para>
<application>repmgrd</application>: daemonize process by default.
If, for whatever reason, the process should not be daemonized,
provide <option>--daemonize=false</option>.
(GitHub #458).
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
<sect2>
<title>Bug fixes</title>
<para>
<itemizedlist>
<listitem>
<para>
<command><link linkend="repmgr-standby-register">repmgr standby register --wait-sync</link></command>:
fix behaviour when no timeout provided.
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-cluster-cleanup">repmgr cluster cleanup</link></command>:
add missing help options. (GitHub #461/#462).
</para>
</listitem>
<listitem>
<para>
Ensure witness node follows new primary after switchover. (GitHub #453).
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-node-check">repmgr node check</link></command> and
<command><link linkend="repmgr-node-status">repmgr node status</link></command>:
fix witness node handling. (GitHub #451).
</para>
</listitem>
<listitem>
<para>
When using <command><link linkend="repmgr-standby-clone">repmgr standby clone</link></command>
with <option>--recovery-conf-only</option> and replication slots, ensure
<varname>primary_slot_name</varname> is set correctly. (GitHub #474).
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
</sect1>
<sect1 id="release-4.0.6">
<title>Release 4.0.6</title>
<para><emphasis>June 14, 2018</emphasis></para>
<para>
&repmgr; 4.0.6 contains a number of bug fixes and usability enhancements.
</para>
<para>
We recommend upgrading to this version as soon as possible.
This release can be installed as a simple package upgrade from repmgr 4.0 ~ 4.0.5;
<application>repmgrd</application> (if running) should be restarted. See <xref linkend="upgrading-repmgr">
for more details.
</para>
<sect2>
<title>Usability enhancements</title>
<para>
<itemizedlist>
<listitem>
<para>
<command><link linkend="repmgr-cluster-crosscheck">repmgr cluster crosscheck</link></command> and
<command><link linkend="repmgr-cluster-matrix">repmgr cluster matrix</link></command>:
return non-zero exit code if node connection issues detected (GitHub #447)
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-standby-clone">repmgr standby clone</link></command>:
Improve handling of external configuration file copying, including consideration in
<option>--dry-run</option> check
(GitHub #443)
</para>
</listitem>
<listitem>
<para>
When using <option>--dry-run</option>, force log level to <literal>INFO</literal>
to ensure output will always be displayed
(GitHub #441)
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-standby-clone">repmgr standby clone</link></command>:
Improve documentation of <option>--recovery-conf-only</option> mode
(GitHub #438)
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-standby-clone">repmgr standby clone</link></command>:
Don't require presence of <varname>user</varname> parameter in conninfo string
(GitHub #437)
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
<sect2>
<title>Bug fixes</title>
<para>
<itemizedlist>
<listitem>
<para>
<command><link linkend="repmgr-witness-register">repmgr witness register</link></command>:
prevent registration of a witness server with the same name as an existing node
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-standby-follow">repmgr standby follow</link></command>:
check node has actually connected to new primary before reporting success
(GitHub #444)
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-node-rejoin">repmgr node rejoin</link></command>:
Fix bug when parsing <option>--config-files</option> parameter
(GitHub #442)
</para>
</listitem>
<listitem>
<para>
<application>repmgrd</application>: ensure local node is counted as quorum member
(GitHub #439)
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
</sect1>
<sect1 id="release-4.0.5">
<title>Release 4.0.5</title>
<para><emphasis>Wed May 2, 2018</emphasis></para>
@@ -24,6 +328,7 @@
generation and (in <application>repmgrd</application>) handling of various
corner-case situations, as well as a number of bug fixes.
</para>
<sect2>
<title>Usability enhancements</title>
@@ -32,7 +337,7 @@
<listitem>
<para>
Various documentation improvements, with particular emphasis on
the importance of setting appropriate <link linkend="configuration-service-commands">service commands</link>
the importance of setting appropriate <link linkend="configuration-file-service-commands">service commands</link>
instead of relying on <application>pg_ctl</application>.
</para>
</listitem>


@@ -33,34 +33,5 @@
</sect1>
<sect1 id="repmgr-rpm-key" xreflabel="repmgr rpm key">
<title>repmgr RPM signing key</title>
<para>
The signing key ID used for <application>repmgr</application> source code bundles is:
<ulink url="http://packages.2ndquadrant.com/repmgr/RPM-GPG-KEY-repmgr">
<literal>0x702D883A</literal></ulink>.
</para>
<para>
To download the <application>repmgr</application> source key to your computer:
<programlisting>
curl -s http://packages.2ndquadrant.com/repmgr/RPM-GPG-KEY-repmgr | gpg --import
gpg --fingerprint 0x702D883A
</programlisting>
then verify that the fingerprint is the expected value:
<programlisting>
AE4E 390E A58E 0037 6148 3F29 888D 018B 702D 883A</programlisting>
</para>
<para>
To check a repository RPM, use <application>rpmkeys</application> to load the
packaging signing key into the RPM database then use <literal>rpm -K</literal>, e.g.:
<programlisting>
sudo rpmkeys --import http://packages.2ndquadrant.com/repmgr/RPM-GPG-KEY-repmgr
rpm -K postgresql-bdr94-2ndquadrant-redhat-1.0-2.noarch.rpm
</programlisting>
</para>
</sect1>
</appendix>


@@ -0,0 +1,107 @@
<sect1 id="configuration-file-log-settings" xreflabel="log settings">
<indexterm>
<primary>repmgr.conf</primary>
<secondary>log settings</secondary>
</indexterm>
<indexterm>
<primary>log settings</primary>
<secondary>configuration in repmgr.conf</secondary>
</indexterm>
<title>Log settings</title>
<para>
By default, &repmgr; and <application>repmgrd</application> write log output to
<literal>STDERR</literal>. An alternative log destination can be specified
(either a file or <literal>syslog</literal>).
</para>
<note>
<para>
The &repmgr; application itself will continue to write log output to <literal>STDERR</literal>
even if another log destination is configured, as otherwise any output resulting from a command
line operation will "disappear" into the log.
</para>
<para>
This behaviour can be overridden with the command line option <option>--log-to-file</option>,
which will redirect all logging output to the configured log destination. This is recommended
when &repmgr; is executed by another application, particularly <application>repmgrd</application>,
to enable log output generated by the &repmgr; application to be stored for later reference.
</para>
</note>
<variablelist>
<varlistentry id="repmgr-conf-log-level" xreflabel="log_level">
<term><varname>log_level</varname> (<type>string</type>)
<indexterm>
<primary><varname>log_level</varname> configuration file parameter</primary>
</indexterm>
</term>
<listitem>
<para>
One of <option>DEBUG</option>, <option>INFO</option>, <option>NOTICE</option>,
<option>WARNING</option>, <option>ERROR</option>, <option>ALERT</option>, <option>CRIT</option>
or <option>EMERG</option>.
</para>
<para>
Default is <option>INFO</option>.
</para>
<para>
Note that <option>DEBUG</option> will produce a substantial amount of log output
and should not be enabled in normal use.
</para>
</listitem>
</varlistentry>
<varlistentry id="repmgr-conf-log-facility" xreflabel="log_facility">
<term><varname>log_facility</varname> (<type>string</type>)
<indexterm>
<primary><varname>log_facility</varname> configuration file parameter</primary>
</indexterm>
</term>
<listitem>
<para>
Logging facility: possible values are <option>STDERR</option> (default), or for
syslog integration, one of <option>LOCAL0</option>, <option>LOCAL1</option>, <option>...</option>,
<option>LOCAL7</option>, <option>USER</option>.
</para>
</listitem>
</varlistentry>
<varlistentry id="repmgr-conf-log-file" xreflabel="log_file">
<term><varname>log_file</varname> (<type>string</type>)
<indexterm>
<primary><varname>log_file</varname> configuration file parameter</primary>
</indexterm>
</term>
<listitem>
<para>
If <xref linkend="repmgr-conf-log-facility"> is set to <option>STDERR</option>, log output
can be redirected to the specified file.
</para>
<para>
See <xref linkend="repmgrd-log-rotation"> for information on configuring log rotation.
</para>
</listitem>
</varlistentry>
<varlistentry id="repmgr-conf-log-status-interval" xreflabel="log_status_interval">
<term><varname>log_status_interval</varname> (<type>integer</type>)
<indexterm>
<primary><varname>log_status_interval</varname> configuration file parameter</primary>
</indexterm>
</term>
<listitem>
<para>
This setting causes <application>repmgrd</application> to emit a status log
line at the specified interval (in seconds, default <literal>300</literal>)
describing <application>repmgrd</application>'s current state, e.g.:
</para>
<programlisting>
[2018-07-12 00:47:32] [INFO] monitoring connection to upstream node "node1" (node ID: 1)</programlisting>
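<para>
As a minimal sketch, the logging parameters in this list might be combined
in <filename>repmgr.conf</filename> as follows. The log file path is an assumption
chosen purely for illustration; the other values are the defaults stated above.
</para>
<programlisting>
# illustrative logging settings; the log file path is hypothetical
log_level=INFO
log_facility=STDERR
log_file='/var/log/repmgr/repmgrd.log'
log_status_interval=300</programlisting>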
</listitem>
</varlistentry>
</variablelist>
</sect1>

View File

@@ -1,10 +1,10 @@
<sect1 id="configuration-file-settings" xreflabel="configuration file settings">
<sect1 id="configuration-file-settings" xreflabel="required configuration file settings">
<indexterm>
<primary>repmgr.conf</primary>
<secondary>basic settings</secondary>
<secondary>required settings</secondary>
</indexterm>
<title>Basic configuration file settings</title>
<title>Required configuration file settings</title>
<para>
Each <filename>repmgr.conf</filename> file must contain the following parameters:
</para>

View File

@@ -1,4 +1,4 @@
<sect1 id="configuration-service-commands" xreflabel="service command settings">
<sect1 id="configuration-file-service-commands" xreflabel="service command settings">
<indexterm>
<primary>repmgr.conf</primary>
<secondary>service command settings</secondary>
@@ -50,10 +50,18 @@
<note>
<para>
It's also possible to specify a <varname>service_promote_command</varname>;
this overrides any value contained in the setting <varname>promote_command</varname>.
It's also possible to specify a <varname>service_promote_command</varname>.
This is intended for systems which provide a package-level promote command,
such as Debian's <application>pg_ctlcluster</application>.
such as Debian's <application>pg_ctlcluster</application>, to promote the
PostgreSQL instance from standby to primary.
</para>
<para>
If your packaging system does not provide such a command, it can be left empty,
and &repmgr; will generate the appropriate <command>pg_ctl ... promote</command> command.
</para>
<para>
Do not confuse this with <varname>promote_command</varname>, which is used
by <application>repmgrd</application> to execute <xref linkend="repmgr-standby-promote">.
</para>
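<para>
For example, on a Debian-style system this parameter might be set as sketched below.
The cluster version (<literal>10</literal>) and cluster name (<literal>main</literal>)
are assumptions, as is the use of <application>sudo</application>; adjust them to match
the local installation.
</para>
<programlisting>
# hypothetical cluster version and name; adjust as needed
service_promote_command='sudo pg_ctlcluster 10 main promote'</programlisting>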
</note>

View File

@@ -2,16 +2,17 @@
<title>repmgr configuration</title>
&configuration-file;
&configuration-file-settings;
&configuration-service-commands;
&configuration-file-required-settings;
&configuration-file-log-settings;
&configuration-file-service-commands;
<sect1 id="configuration-permissions" xreflabel="User permissions">
<sect1 id="configuration-permissions" xreflabel="Database user permissions">
<indexterm>
<primary>configuration</primary>
<secondary>user permissions</secondary>
<secondary>database user permissions</secondary>
</indexterm>
<title>repmgr user permissions</title>
<title>repmgr database user permissions</title>
<para>
&repmgr; will create an extension database containing objects
for administering &repmgr; metadata. The user defined in the <varname>conninfo</varname>

View File

@@ -206,7 +206,7 @@
<simpara><literal>repmgrd_failover_follow</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_failover_aborted</literal></simpara>
<simpara><literal>repmgrd_failover_aborted</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_upstream_disconnect</literal></simpara>
@@ -217,9 +217,6 @@
<listitem>
<simpara><literal>repmgrd_promote_error</literal></simpara>
</listitem>
<listitem>
<simpara><literal>repmgrd_failover_promote</literal></simpara>
</listitem>
<listitem>
<simpara><literal>bdr_failover</literal></simpara>
</listitem>

View File

@@ -38,8 +38,9 @@
<!ENTITY quickstart SYSTEM "quickstart.sgml">
<!ENTITY configuration SYSTEM "configuration.sgml">
<!ENTITY configuration-file SYSTEM "configuration-file.sgml">
<!ENTITY configuration-file-settings SYSTEM "configuration-file-settings.sgml">
<!ENTITY configuration-service-commands SYSTEM "configuration-service-commands.sgml">
<!ENTITY configuration-file-required-settings SYSTEM "configuration-file-required-settings.sgml">
<!ENTITY configuration-file-log-settings SYSTEM "configuration-file-log-settings.sgml">
<!ENTITY configuration-file-service-commands SYSTEM "configuration-file-service-commands.sgml">
<!ENTITY cloning-standbys SYSTEM "cloning-standbys.sgml">
<!ENTITY promoting-standby SYSTEM "promoting-standby.sgml">
<!ENTITY follow-new-primary SYSTEM "follow-new-primary.sgml">

View File

@@ -5,26 +5,27 @@
system.
</para>
<sect2 id="installation-packages-redhat" xreflabel="Installing from packages on RHEL, Fedora and CentOS">
<sect2 id="installation-packages-redhat" xreflabel="Installing from packages on RHEL, CentOS and Fedora">
<indexterm>
<primary>installation</primary>
<secondary>on Red Hat/CentOS/Fedora etc.</secondary>
</indexterm>
<title>RedHat/Fedora/CentOS</title>
<title>RedHat/CentOS/Fedora</title>
<para>
RPM packages for &repmgr; are available via Yum through
&repmgr; RPM packages for RedHat/CentOS variants and Fedora are available from the
<ulink url="https://2ndquadrant.com">2ndQuadrant</ulink>
<ulink url="https://rpm.2ndquadrant.com/">public RPM repository</ulink>; see following
section for details.
</para>
<para>
RPM packages for &repmgr; are also available via Yum through
the PostgreSQL Global Development Group RPM repository
(<ulink url="https://yum.postgresql.org/">https://yum.postgresql.org/</ulink>).
Follow the instructions for your distribution (RedHat, CentOS,
Fedora, etc.) and architecture as detailed there.
</para>
<para>
<ulink url="https://2ndquadrant.com">2ndQuadrant</ulink> also provides its
own RPM packages which are made available
at the same time as each &repmgr; release, as it can take some days for
them to become available via the main PGDG repository. See following section for details:
Fedora, etc.) and architecture as detailed there. Note that it can take some days
for new &repmgr; packages to become available via this repository.
</para>
<note>
<para>
@@ -37,65 +38,74 @@
<para>
For more information on the package contents, including details of installation
paths and relevant <link linkend="configuration-service-commands">service commands</link>,
paths and relevant <link linkend="configuration-file-service-commands">service commands</link>,
see the appendix section <xref linkend="packages-centos">.
</para>
<sect3 id="installation-packages-redhat-2ndq">
<title>2ndQuadrant repmgr yum repository</title>
<title>2ndQuadrant public RPM yum repository</title>
<note>
<para>
<ulink url="https://2ndquadrant.com">2ndQuadrant</ulink> previously provided a dedicated
&repmgr; repository at
<ulink url="http://packages.2ndquadrant.com/repmgr/">http://packages.2ndquadrant.com/repmgr/</ulink>.
This repository will be deprecated in a future release as it is now replaced by
the <ulink url="https://rpm.2ndquadrant.com/">public RPM repository</ulink>
documented below.
</para>
</note>
<para>
Beginning with <ulink url="http://repmgr.org/release-notes-3.1.3.html">repmgr 3.1.3</ulink>,
Beginning with <ulink url="https://repmgr.org/docs/4.0/release-4.0.5.html">repmgr 4.0.5</ulink>,
<ulink url="https://2ndquadrant.com/">2ndQuadrant</ulink> provides a dedicated <literal>yum</literal>
repository for &repmgr; releases. This repository complements the main
<ulink url="https://yum.postgresql.org/repopackages.php">PGDG community repository</ulink>,
but enables repmgr users to access the latest &repmgr; packages before they are
available via the PGDG repository, which can take several days to be updated following
a fresh &repmgr; release.
</para>
<ulink url="https://rpm.2ndquadrant.com/">public RPM repository</ulink> for 2ndQuadrant software,
including &repmgr;. We recommend using this for all future &repmgr; releases.
</para>
<para>
General instructions for using this repository can be found on its
<ulink url="https://rpm.2ndquadrant.com/">homepage</ulink>. Specific instructions
for installing &repmgr; follow below.
</para>
<para>
<emphasis>Installation</emphasis>
<itemizedlist>
<listitem>
<para>
Import the repository public key (optional but recommended):
<programlisting>
rpm --import http://packages.2ndquadrant.com/repmgr/RPM-GPG-KEY-repmgr</programlisting>
</para>
</listitem>
<listitem>
<para>
Locate the repository RPM for your PostgreSQL version from the list at:
<ulink url="https://rpm.2ndquadrant.com/">https://rpm.2ndquadrant.com/</ulink>
</para>
</listitem>
<listitem>
<para>
Install the repository RPM for your distribution (this enables the 2ndQuadrant
repository as a source of repmgr packages):
<itemizedlist>
<listitem>
<simpara>
<emphasis>Fedora:</emphasis>
<ulink url="http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-fedora-1.0-1.noarch.rpm">http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-fedora-1.0-1.noarch.rpm</ulink>
</simpara>
</listitem>
<listitem>
<simpara>
<emphasis>RHEL, CentOS etc:</emphasis>
<ulink url="http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-rhel-1.0-1.noarch.rpm">http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-rhel-1.0-1.noarch.rpm</ulink>
</simpara>
</listitem>
</itemizedlist>
</para>
<para>
e.g.:
<programlisting>
$ yum install http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-rhel-1.0-1.noarch.rpm</programlisting>
</para>
</listitem>
Install the repository RPM for your distribution and PostgreSQL version
(this enables the 2ndQuadrant repository as a source of &repmgr; packages).
</para>
<para>
For example, for PostgreSQL 10 on CentOS, execute:
<programlisting>
sudo yum install https://rpm.2ndquadrant.com/site/content/2ndquadrant-repo-10-1-1.el7.noarch.rpm
</programlisting>
</para>
<para>
Verify that the repository is installed with:
<programlisting>
sudo yum repolist</programlisting>
The output should contain two entries like this:
<programlisting>
2ndquadrant-repo-10/7/x86_64 2ndQuadrant packages for PG10 for rhel 7 - x86_64 1
2ndquadrant-repo-10-debug/7/x86_64 2ndQuadrant packages for PG10 for rhel 7 - x86_64 - Debug 1</programlisting>
</para>
</listitem>
<listitem>
<para>
Install the repmgr version appropriate for your PostgreSQL version (e.g. <literal>repmgr96</literal>), e.g.:
Install the &repmgr; version appropriate for your PostgreSQL version (e.g. <literal>repmgr10</literal>):
<programlisting>
$ yum install repmgr96</programlisting>
$ yum install repmgr10</programlisting>
</para>
</listitem>
</itemizedlist>
@@ -105,13 +115,13 @@
<emphasis>Compatibility with PGDG Repositories</emphasis>
</para>
<para>
The 2ndQuadrant &repmgr; yum repository uses exactly the same package definitions as the
main PGDG repository and is effectively a selective mirror for &repmgr; packages only.
The 2ndQuadrant &repmgr; yum repository packages use the same definitions and file system layout as the
main PGDG repository.
</para>
<para>
Normally yum should prioritize the repository with the most recent &repmgr; version.
Once the PGDG repository has been updated, it doesn't matter which repository
the packages are installed from.
Normally <application>yum</application> will prioritize the repository with the most recent &repmgr; version.
Once the PGDG repository has been updated, it doesn't matter which repository
the packages are installed from.
</para>
<para>
To ensure the 2ndQuadrant repository is always prioritised, install <literal>yum-plugin-priorities</literal>
@@ -125,30 +135,23 @@
To install a specific package version, execute <command>yum --showduplicates list</command>
for the package in question:
<programlisting>
[root@localhost ~]# yum --showduplicates list repmgr96
[root@localhost ~]# yum --showduplicates list repmgr10
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: ftp.iij.ad.jp
* extras: ftp.iij.ad.jp
* updates: ftp.iij.ad.jp
Available Packages
repmgr96.x86_64 3.2-1.el6 2ndquadrant-repmgr
repmgr96.x86_64 3.2.1-1.el6 2ndquadrant-repmgr
repmgr96.x86_64 3.3-1.el6 2ndquadrant-repmgr
repmgr96.x86_64 3.3.1-1.el6 2ndquadrant-repmgr
repmgr96.x86_64 3.3.2-1.el6 2ndquadrant-repmgr
repmgr96.x86_64 3.3.2-1.rhel6 pgdg96
repmgr96.x86_64 4.0.0-1.el6 2ndquadrant-repmgr
repmgr96.x86_64 4.0.0-1.rhel6 pgdg96</programlisting>
repmgr10.x86_64 4.0.3-1.rhel7 pgdg10
repmgr10.x86_64 4.0.4-1.rhel7 pgdg10
repmgr10.x86_64 4.0.5-1.el7 2ndquadrant-repo-10</programlisting>
then append the appropriate version number to the package name with a hyphen, e.g.:
<programlisting>
[root@localhost ~]# yum install repmgr96-3.3.2-1.el6</programlisting>
[root@localhost ~]# yum install repmgr10-4.0.3-1.rhel7</programlisting>
</para>
</sect3>
</sect2>
<sect2 id="installation-packages-debian" xreflabel="Installing from packages on Debian or Ubuntu">
<indexterm>
@@ -164,10 +167,83 @@
</para>
<para>
For more information on the package contents, including details of installation
paths and relevant <link linkend="configuration-service-commands">service commands</link>,
paths and relevant <link linkend="configuration-file-service-commands">service commands</link>,
see the appendix section <xref linkend="packages-debian-ubuntu">.
</para>
<sect3 id="installation-packages-debian-ubuntu-2ndq">
<title>2ndQuadrant public apt repository for Debian/Ubuntu</title>
<para>
Beginning with <ulink url="https://repmgr.org/docs/4.0/release-4.0.5.html">repmgr 4.0.5</ulink>,
<ulink url="https://2ndquadrant.com/">2ndQuadrant</ulink> provides a
<ulink url="https://apt.2ndquadrant.com/">public apt repository</ulink> for 2ndQuadrant software,
including &repmgr;.
</para>
<para>
General instructions for using this repository can be found on its
<ulink url="https://apt.2ndquadrant.com/">homepage</ulink>. Specific instructions
for installing &repmgr; follow below.
</para>
<para>
<emphasis>Installation</emphasis>
<itemizedlist>
<listitem>
<para>
If not already present, install the <application>apt-transport-https</application> package:
<programlisting>
sudo apt-get install apt-transport-https</programlisting>
</para>
</listitem>
<listitem>
<para>
Create <filename>/etc/apt/sources.list.d/2ndquadrant.list</filename> as follows:
<programlisting>
sudo sh -c 'echo "deb https://apt.2ndquadrant.com/ $(lsb_release -cs)-2ndquadrant main" > /etc/apt/sources.list.d/2ndquadrant.list'</programlisting>
</para>
</listitem>
<listitem>
<para>
Install the 2ndQuadrant <ulink url="https://apt.2ndquadrant.com/site/keys/9904CD4BD6BAF0C3.asc">repository key</ulink>:
<programlisting>
sudo apt-get install curl ca-certificates
curl https://apt.2ndquadrant.com/site/keys/9904CD4BD6BAF0C3.asc | sudo apt-key add -</programlisting>
</para>
</listitem>
<listitem>
<para>
Update the package list:
<programlisting>
sudo apt-get update</programlisting>
</para>
</listitem>
<listitem>
<para>
Install the &repmgr; package appropriate for your PostgreSQL version (e.g. <literal>postgresql-10-repmgr</literal>):
<programlisting>
$ apt-get install postgresql-10-repmgr</programlisting>
</para>
<note>
<para>
For packages for PostgreSQL 9.6 and earlier, the package name includes
a period between major and minor version numbers, e.g.
<literal>postgresql-9.6-repmgr</literal>.
</para>
</note>
</listitem>
</itemizedlist>
</para>
</sect3>
</sect2>
</sect1>

View File

@@ -80,7 +80,7 @@
</para>
<para>
There are also tags for each &repmgr; release, e.g. <filename>REL4_0_STABLE</filename>.
There are also tags for each &repmgr; release, e.g. <filename>4.0.5</filename>.
</para>
<para>

View File

@@ -234,7 +234,7 @@
<para>
<filename>repmgr.conf</filename> should not be stored inside the PostgreSQL data directory,
as it could be overwritten when setting up or reinitialising the PostgreSQL
server. See sections on <xref linkend="configuration-file"> and <xref linkend="configuration-file-settings">
server. See sections <xref linkend="configuration"> and <xref linkend="configuration-file">
for further details about <filename>repmgr.conf</filename>.
</para>
<tip>

View File

@@ -38,5 +38,34 @@
and therefore determine the state of outbound connections from that node.
</para>
</refsect1>
<refsect1>
<title>Exit codes</title>
<para>
The following exit codes can be emitted by <command>repmgr cluster crosscheck</command>:
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
The check completed successfully and all nodes are reachable.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_NODE_STATUS (25)</option></term>
<listitem>
<para>
One or more nodes could not be reached.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
</refentry>

View File

@@ -49,6 +49,22 @@
</para>
</refsect1>
<refsect1>
<title>Output format</title>
<para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<literal>--csv</literal>: generate output in CSV format. Note that the <literal>Details</literal>
column will currently not be emitted in CSV format.
</simpara>
</listitem>
</itemizedlist>
</para>
</refsect1>
<refsect1>
<title>Example</title>
<para>

View File

@@ -97,5 +97,35 @@
useful result.
</para>
</refsect1>
<refsect1>
<title>Exit codes</title>
<para>
The following exit codes can be emitted by <command>repmgr cluster matrix</command>:
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
The check completed successfully and all nodes are reachable.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_NODE_STATUS (25)</option></term>
<listitem>
<para>
One or more nodes could not be reached.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
</refentry>

View File

@@ -113,4 +113,40 @@
</para>
</refsect1>
<refsect1>
<title>Exit codes</title>
<para>
The following exit codes can be emitted by <command>repmgr cluster show</command>:
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
No issues were detected.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_NODE_STATUS (25)</option></term>
<listitem>
<para>
One or more issues were detected.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-node-status">, <xref linkend="repmgr-node-check">
</para>
</refsect1>
</refentry>

View File

@@ -61,7 +61,9 @@
<listitem>
<simpara>
<literal>--archive-ready</literal>: checks for WAL files which have not yet been archived
<literal>--archive-ready</literal>: checks for WAL files which have not yet been archived,
and returns <literal>WARNING</literal> or <literal>CRITICAL</literal> if the number
exceeds <varname>archive_ready_warning</varname> or <varname>archive_ready_critical</varname> respectively.
</simpara>
</listitem>
@@ -77,11 +79,110 @@
</simpara>
</listitem>
<listitem>
<simpara>
<literal>--missing-slots</literal>: checks there are no missing replication slots
</simpara>
</listitem>
</itemizedlist>
</para>
<para>
Individual checks can also be output in a Nagios-compatible format by additionally
providing the option <literal>--nagios</literal>.
</para>
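<para>
As an illustration, the thresholds used by <literal>--archive-ready</literal> could be
set in <filename>repmgr.conf</filename> as sketched below. The numeric values are examples
only and should be tuned to the WAL generation rate of the system in question. The check
can then be run with <command>repmgr node check --archive-ready --nagios</command>.
</para>
<programlisting>
# example thresholds (number of WAL files not yet archived)
archive_ready_warning=16
archive_ready_critical=128</programlisting>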
</refsect1>
<refsect1>
<title>Output format</title>
<para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<literal>--csv</literal>: generate output in CSV format (not available
for individual checks)
</simpara>
</listitem>
<listitem>
<simpara>
<literal>--nagios</literal>: generate output in a Nagios-compatible format
</simpara>
</listitem>
</itemizedlist>
</para>
</refsect1>
<refsect1>
<title>Exit codes</title>
<para>
When executing <command>repmgr node check</command> with one of the individual
checks listed above, &repmgr; will emit one of the following Nagios-style exit codes
(even if <literal>--nagios</literal> is not supplied):
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<literal>0</literal>: OK
</simpara>
</listitem>
<listitem>
<simpara>
<literal>1</literal>: WARNING
</simpara>
</listitem>
<listitem>
<simpara>
<literal>2</literal>: CRITICAL
</simpara>
</listitem>
<listitem>
<simpara>
<literal>3</literal>: UNKNOWN
</simpara>
</listitem>
</itemizedlist>
</para>
<para>
The following exit codes can be emitted by <command>repmgr node check</command>
if no individual check was specified:
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
No issues were detected.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_NODE_STATUS (25)</option></term>
<listitem>
<para>
One or more issues were detected.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-node-status">, <xref linkend="repmgr-cluster-show">
</para>
</refsect1>
</refentry>

View File

@@ -115,7 +115,24 @@
</variablelist>
</refsect1>
<refsect1>
<title>Configuration file settings</title>
<para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<literal>node_rejoin_timeout</literal>:
the maximum length of time (in seconds) to wait for
the node to reconnect to the replication cluster (defaults to
the value set in <literal>standby_reconnect_timeout</literal>,
60 seconds).
</simpara>
</listitem>
</itemizedlist>
</para>
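<para>
For instance, to allow a slow-starting node more time to reconnect, the timeout could be
raised in <filename>repmgr.conf</filename>. The value shown is an arbitrary example,
not a recommendation.
</para>
<programlisting>
# example value only: wait up to 90 seconds for the node to rejoin
node_rejoin_timeout=90</programlisting>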
</refsect1>
<refsect1>
<title>Event notifications</title>
<para>

View File

@@ -24,7 +24,7 @@
<title>Example</title>
<para>
<programlisting>
$ repmgr -f /etc/repmgr.comf node status
$ repmgr -f /etc/repmgr.conf node status
Node "node1":
PostgreSQL version: 10beta1
Total data size: 30 MB
@@ -38,10 +38,54 @@
</para>
</refsect1>
<refsect1>
<title>Output format</title>
<para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<literal>--csv</literal>: generate output in CSV format
</simpara>
</listitem>
</itemizedlist>
</para>
</refsect1>
<refsect1>
<title>Exit codes</title>
<para>
The following exit codes can be emitted by <command>repmgr node status</command>:
</para>
<variablelist>
<varlistentry>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
No issues were detected.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_NODE_STATUS (25)</option></term>
<listitem>
<para>
One or more issues were detected.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>See also</title>
<para>
See <xref linkend="repmgr-node-check"> to diagnose issues.
See <xref linkend="repmgr-node-check"> to diagnose issues and <xref linkend="repmgr-cluster-show">
for an overview of all nodes in the cluster.
</para>
</refsect1>
</refentry>

View File

@@ -17,7 +17,7 @@
<title>Description</title>
<para>
<command>repmgr primary register</command> registers a primary node in a
streaming replication cluster, and configures it for use with repmgr, including
streaming replication cluster, and configures it for use with &repmgr;, including
installing the &repmgr; extension. This command needs to be executed before any
standby nodes are registered.
</para>

View File

@@ -124,7 +124,7 @@
<para>
We recommend using <ulink url="https://www.pgbarman.org/">Barman</ulink> to manage
WAL file archiving. For more details on combining &repmgr; and <application>Barman</application>,
in particular using <varname>restore_command</varname> to configure Barman as a backu source of
in particular using <varname>restore_command</varname> to configure Barman as a backup source of
WAL files, see <xref linkend="cloning-from-barman">.
</para>
</note>
@@ -177,12 +177,13 @@
<title>Using a standby cloned by another method</title>
<para>
&repmgr; supports standbys cloned by another method (e.g. using <application>barman</application>'s
<command>barman recover</command> command).
<command><ulink url="http://docs.pgbarman.org/release/2.4/#recover">barman recover</ulink></command> command).
</para>
<para>
To integrate the standby as a &repmgr; node, ensure the <filename>repmgr.conf</filename>
file is created for the node, then execute the command
<command>repmgr standby clone --recovery-conf-only</command>.
file is created for the node, and that it has been registered using
<command><link linkend="repmgr-standby-register">repmgr standby register</link></command>.
Then execute the command <command>repmgr standby clone --recovery-conf-only</command>.
This will create the <filename>recovery.conf</filename> file needed to attach
the node to its upstream, and will also create a replication slot on the
upstream node if required.
@@ -212,6 +213,15 @@
<variablelist>
<varlistentry>
<term><option>-d, --dbname=CONNINFO</option></term>
<listitem>
<para>
Connection string of the upstream node to use for cloning.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--dry-run</option></term>
<listitem>

View File

@@ -26,10 +26,18 @@
running. It can only be used to attach an active standby to the current primary node
(and not to another standby).
</para>
<para>
To re-add an inactive node to the replication cluster, see
<xref linkend="repmgr-node-rejoin">
</para>
<tip>
<para>
To re-add an inactive node to the replication cluster, use
<xref linkend="repmgr-node-rejoin">.
</para>
</tip>
<para>
<command>repmgr standby follow</command> will wait up to
<varname>standby_follow_timeout</varname> seconds (default: <literal>30</literal>)
to verify the standby has actually connected to the new primary.
</para>
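<para>
If the standby typically needs longer than the default to connect to the new primary,
the timeout can be adjusted in <filename>repmgr.conf</filename>. The value below is an
arbitrary example.
</para>
<programlisting>
# example value only: wait up to 60 seconds for the standby to connect
standby_follow_timeout=60</programlisting>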
</refsect1>
@@ -92,7 +100,7 @@
A <literal>standby_follow</literal> <link linkend="event-notifications">event notification</link> will be generated.
</para>
<para>
If provided, &repmgr; will subsitute the placeholders <literal>%p</literal> with the node ID of the primary
If provided, &repmgr; will substitute the placeholders <literal>%p</literal> with the node ID of the primary
being followed, <literal>%c</literal> with its <literal>conninfo</literal> string, and
<literal>%a</literal> with its node name.
</para>

View File

@@ -32,6 +32,7 @@
check the promotion every <varname>promote_check_interval</varname> seconds (default: 1 second).
Both values can be defined in <filename>repmgr.conf</filename>.
</para>
</refsect1>
<refsect1>

View File

@@ -173,7 +173,7 @@
</para>
<para>
If provided, &repmgr; will subsitute the placeholders <literal>%p</literal> with the node ID of the
If provided, &repmgr; will substitute the placeholders <literal>%p</literal> with the node ID of the
primary node, <literal>%c</literal> with its <literal>conninfo</literal> string, and
<literal>%a</literal> with its node name.
</para>

View File

@@ -12,6 +12,7 @@
<refpurpose>promote a standby to primary and demote the existing primary to a standby</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
@@ -39,6 +40,17 @@
For more details on performing a switchover, including preparation and configuration,
see section <xref linkend="performing-switchover">.
</para>
<note>
<para>
<application>repmgrd</application> should not be active on any nodes while a switchover is being
executed. This restriction may be lifted in a later version.
</para>
<para>
&repmgr; will not perform the switchover if an exclusive backup is running on the current primary.
</para>
</note>
</refsect1>
<refsect1>
@@ -154,8 +166,8 @@
<listitem>
<simpara>
<literal>standby_reconnect_timeout</literal>:
Number of seconds to attempt to reconnect to the demoted primary
once it has been restarted.
number of seconds to wait for the demoted primary
to reconnect to the promoted primary (default: 60 seconds)
</simpara>
</listitem>
@@ -171,10 +183,12 @@
Execute with the <literal>--dry-run</literal> option to test the switchover as far as
possible without actually changing the status of either node.
</para>
<para>
<application>repmgrd</application> should not be active on any nodes while a switchover is being
executed. This restriction may be lifted in a later version.
</para>
<important>
<para>
<application>repmgrd</application> must be shut down on all nodes while a switchover is being
executed. This restriction will be removed in a future &repmgr; version.
</para>
</important>
<para>
External database connections, e.g. from an application, should not be permitted while
the switchover is taking place. In particular, active transactions on the primary
@@ -199,7 +213,7 @@
<refsect1>
<title>Exit codes</title>
<para>
Following exit codes can be emitted by <literal>repmgr standby switchover</literal>:
The following exit codes can be emitted by <command>repmgr standby switchover</command>:
</para>
<variablelist>
@@ -227,7 +241,7 @@
<para>
The switchover was executed but a problem was encountered.
Typically this means the former primary could not be reattached
as a standby.
as a standby. Check preceding log messages for more information.
</para>
</listitem>
</varlistentry>

View File

@@ -20,7 +20,10 @@
</para>
<para>
The node does not have to be running to be unregistered, however if this is the
case then connection information for the primary server must be provided.
case then either provide connection information for the primary server, or
execute <command>repmgr witness unregister</command> on a running node and
provide the parameter <option>--node-id</option> with the node ID of the
witness server.
</para>
<para>
Execute with the <literal>--dry-run</literal> option to check what would happen
@@ -36,17 +39,17 @@
INFO: connecting to witness node "node3" (ID: 3)
INFO: unregistering witness node 3
INFO: witness unregistration complete
DETAIL: witness node with id 3 (conninfo: host=node3 dbname=repmgr user=repmgr port=5499) successfully unregistered</programlisting>
DETAIL: witness node with ID 3 successfully unregistered</programlisting>
</para>
<para>
Unregistering a non-running witness node:
<programlisting>
$ repmgr -f /etc/repmgr.conf witness unregister -h node1 -p 5501 -F
INFO: connecting to witness node "node3" (ID: 3)
NOTICE: unable to connect to witness node "node3" (ID: 3), removing node record on cluster primary only
INFO: connecting to node "node3" (ID: 3)
NOTICE: unable to connect to node "node3" (ID: 3), removing node record on cluster primary only
INFO: unregistering witness node 3
INFO: witness unregistration complete
DETAIL: witness node with id 3 (conninfo: host=node3 dbname=repmgr user=repmgr port=5499) successfully unregistered</programlisting>
DETAIL: witness node with ID 3 successfully unregistered</programlisting>
</para>
</refsect1>
@@ -62,6 +65,32 @@
</para>
</refsect1>
<refsect1>
<title>Options</title>
<variablelist>
<varlistentry>
<term><option>--dry-run</option></term>
<listitem>
<para>
Check prerequisites but don't actually unregister the witness.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--node-id</option></term>
<listitem>
<para>
Unregister witness server with the specified node ID.
</para>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
<refsect1>
<title>Event notifications</title>

View File

@@ -99,15 +99,16 @@
replication cluster. The database must be the BDR-enabled database.
</para>
<para>
If defined, the evenr <application>event_notifications</application> parameter
will restrict execution of <varname>event_notification_command</varname>
If defined, the <varname>event_notifications</varname> parameter will restrict
execution of the script defined in <varname>event_notification_command</varname>
to the specified event(s).
</para>
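<para>
A minimal sketch of the two settings together is shown below. The script path is
hypothetical, and the placeholders passed to it (<literal>%n</literal> for the node ID,
<literal>%e</literal> for the event type and <literal>%s</literal> for the success flag)
are chosen here for illustration; the script itself must be supplied by the user.
</para>
<programlisting>
# restrict notifications to the BDR failover event
event_notifications='bdr_failover'
# hypothetical script path; replace with the actual reconfiguration script
event_notification_command='/usr/local/bin/bdr-failover.sh %n %e %s'</programlisting>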
<note>
<simpara>
<varname>event_notification_command</varname> is the script which does the actual "heavy lifting"
of reconfiguring the proxy server/ connection pooler. It is fully
user-definable; a reference implementation is documented below.
user-definable; see section <xref linkend="bdr-event-notification-command"> for a reference
implementation.
</simpara>
</note>
@@ -169,8 +170,8 @@
</para>
</sect1>
<sect1 id="bdr-event-notification-command" xreflabel="BDR failover event notification command">
<title>Defining the "event_notification_command"</title>
<sect1 id="bdr-event-notification-command" xreflabel="Defining the BDR failover &quot;event_notification_command&quot;">
<title>Defining the BDR failover "event_notification_command"</title>
<para>
Key to "failover" execution is the <literal>event_notification_command</literal>,
which is a user-definable script specified in <filename>repmgr.conf</filename>

View File

@@ -24,7 +24,7 @@
<para>
To use <application>repmgrd</application>, its associated function library <emphasis>must</emphasis> be
included in <filename>postgresql.conf</filename> with:
included via <filename>postgresql.conf</filename> with:
<programlisting>
shared_preload_libraries = 'repmgr'</programlisting>
@@ -34,6 +34,25 @@
the <ulink url="https://www.postgresql.org/docs/current/static/runtime-config-client.html#GUC-SHARED-PRELOAD-LIBRARIES">PostgreSQL documentation</ulink>.
</para>
<para>
To apply configuration file changes to a running <application>repmgrd</application>
daemon, execute the operating system's <application>repmgrd</application> service reload command
(see <xref linkend="appendix-packages"> for examples),
or for instances which were manually started, execute <command>kill -HUP</command>, e.g.
<command>kill -HUP `cat /tmp/repmgrd.pid`</command>.
</para>
<note>
<para>
Check the <application>repmgrd</application> log to see what changes were
applied, or if any issues were encountered when reloading the configuration.
</para>
</note>
<para>
Note that only a subset of configuration file parameters can be changed on a
running <application>repmgrd</application> daemon.
</para>
<sect2 id="repmgrd-automatic-failover-configuration">
<title>automatic failover configuration</title>
<para>
@@ -112,7 +131,7 @@
particularly on <application>systemd</application>-based systems.
</para>
<para>
For more details, see <xref linkend="configuration-service-commands">.
For more details, see <xref linkend="configuration-file-service-commands">.
</para>
</sect2>
@@ -159,16 +178,62 @@
<para>
<application>repmgrd</application> can be started manually like this:
<programlisting>
repmgrd -f /etc/repmgr.conf --pid-file /tmp/repmgrd.pid --daemonize</programlisting>
repmgrd -f /etc/repmgr.conf --pid-file /tmp/repmgrd.pid</programlisting>
and stopped with <command>kill `cat /tmp/repmgrd.pid`</command>. Adjust paths as appropriate.
</para>
<para>
To apply configuration file changes to a running <application>repmgrd</application>
daemon, execute the operating system's service reload command (for manually started
instances, execute <command>kill -HUP `cat /tmp/repmgrd.pid`</command>).
Note that only a subset of configuration file parameters can be changed on a
running <application>repmgrd</application> daemon.
</para>
<sect2 id="repmgrd-pid-file" xreflabel="repmgrd's PID file">
<indexterm>
<primary>repmgrd</primary>
<secondary>PID file</secondary>
</indexterm>
<indexterm>
<primary>PID file</primary>
<secondary>repmgrd</secondary>
</indexterm>
<title>repmgrd's PID file</title>
<para>
<application>repmgrd</application> will generate a PID file by default.
</para>
<note>
<simpara>
This is a behaviour change from previous versions (earlier than 4.1), where
the PID file had to be explicitly specified with the command line
parameter <option>--pid-file</option>.
</simpara>
</note>
<para>
The PID file can be specified in <filename>repmgr.conf</filename> with the configuration
parameter <varname>repmgrd_pid_file</varname>.
</para>
<para>
It can also be specified on the command line (as in previous versions) with
the command line parameter <option>--pid-file</option>. Note this will override
any value set in <filename>repmgr.conf</filename> with <varname>repmgrd_pid_file</varname>.
<option>--pid-file</option> may be deprecated in future releases.
</para>
<para>
If a PID file location was specified by the package maintainer, <application>repmgrd</application>
will use that. This only applies if &repmgr; was installed from a package and the package
maintainer has specified the PID file location.
</para>
<para>
If none of the above apply, <application>repmgrd</application> will create a PID file
in the operating system's temporary directory (as determined by the environment variable
<varname>TMPDIR</varname> or, if that is not set, <filename>/tmp</filename>).
</para>
<para>
To prevent a PID file being generated at all, provide the command line option
<option>--no-pid-file</option>.
</para>
<para>
To see which PID file <application>repmgrd</application> would use, execute <application>repmgrd</application>
with the option <option>--show-pid-file</option>. <application>repmgrd</application>
will not start if this option is provided. Note that the value shown is the
file <application>repmgrd</application> would use next time it starts, and is
not necessarily the PID file currently in use.
</para>
</sect2>
<sect2 id="repmgrd-configuration-debian-ubuntu">
<indexterm>
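The PID file lookup order described in the documentation hunk above can be sketched in C as follows. This is only an illustration, not repmgrd's actual code: the function `resolve_pid_file()`, its parameters and the default file name `repmgrd.pid` are assumptions, and the relative order of the `repmgr.conf` setting and the package default simply follows the order of the paragraphs.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not part of repmgr): pick the PID file path.
 * Assumed precedence: --pid-file, then repmgrd_pid_file from repmgr.conf,
 * then a packager-supplied default, then $TMPDIR (or /tmp if unset).
 * Returns false if --no-pid-file was given, i.e. no PID file is written.
 */
bool
resolve_pid_file(const char *cli_pid_file,     /* --pid-file value, or NULL */
                 const char *conf_pid_file,    /* repmgrd_pid_file, or NULL */
                 const char *package_pid_file, /* packager default, or NULL */
                 bool no_pid_file,             /* --no-pid-file provided */
                 char *dest, size_t dest_len)
{
    const char *tmpdir;

    if (no_pid_file)
        return false;

    if (cli_pid_file != NULL && cli_pid_file[0] != '\0')
        snprintf(dest, dest_len, "%s", cli_pid_file);
    else if (conf_pid_file != NULL && conf_pid_file[0] != '\0')
        snprintf(dest, dest_len, "%s", conf_pid_file);
    else if (package_pid_file != NULL && package_pid_file[0] != '\0')
        snprintf(dest, dest_len, "%s", package_pid_file);
    else
    {
        /* fall back to the temporary directory */
        tmpdir = getenv("TMPDIR");
        if (tmpdir == NULL || tmpdir[0] == '\0')
            tmpdir = "/tmp";
        /* "repmgrd.pid" is an assumed file name */
        snprintf(dest, dest_len, "%s/repmgrd.pid", tmpdir);
    }

    return true;
}
```

On a real installation, the path repmgrd would use can be checked with `repmgrd --show-pid-file` rather than by reasoning about the precedence.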


@@ -57,7 +57,7 @@
<para>
As mentioned in the previous section, success of the switchover operation depends on
&repmgr; being able to shut down the current primary server quickly and cleanly.
&repmgr; being able to shut down the current primary server quickly and cleanly.
</para>
<para>
@@ -104,7 +104,7 @@
server.
</para>
<para>
For more details, see <xref linkend="configuration-service-commands">.
For more details, see <xref linkend="configuration-file-service-commands">.
</para>
</important>
@@ -121,15 +121,21 @@
</simpara>
</note>
<para>
Check that access from applications is minimalized or preferably blocked
completely, so applications are not unexpectedly interrupted.
Check that access from applications is minimalized or preferably blocked
completely, so applications are not unexpectedly interrupted.
</para>
<note>
<para>
If an exclusive backup is running on the current primary, &repmgr; will not perform the
switchover.
</para>
</note>
<para>
Check there is no significant replication lag on standbys attached to the
current primary.
Check there is no significant replication lag on standbys attached to the
current primary.
</para>
<para>
@@ -140,10 +146,13 @@
manually with <command>repmgr node check --archive-ready</command>.
</para>
<para>
Ensure that <application>repmgrd</application> is *not* running anywhere to prevent it unintentionally
promoting a node.
</para>
<note>
<para>
Ensure that <application>repmgrd</application> is *not* running anywhere to prevent it unintentionally
promoting a node. This restriction will be removed in a future &repmgr; version.
</para>
</note>
<para>
Finally, consider executing <command>repmgr standby switchover</command> with the


@@ -29,8 +29,18 @@
</listitem>
<listitem>
<simpara>
In the database where the &repmgr; extension is installed, execute
<command>ALTER EXTENSION repmgr UPDATE</command>.
<application>repmgrd</application> (if running) must be restarted.
</simpara>
</listitem>
<listitem>
<simpara>
For major releases, e.g. from <literal>4.0.x</literal> to <literal>4.1</literal>,
execute <command>ALTER EXTENSION repmgr UPDATE</command>
on the primary node in the database where the &repmgr; extension is installed.
</simpara>
<simpara>
This will update the extension metadata and, if necessary, apply
changes to the &repmgr; extension objects.
</simpara>
</listitem>
</orderedlist>
@@ -41,10 +51,6 @@
release as they may contain upgrade instructions particular to individual versions.
</para>
<para>
If the <application>repmgrd</application> daemon is in use, we recommend stopping it
before upgrading &repmgr;.
</para>
<para>
Note that it may be necessary to restart the PostgreSQL server if the upgrade contains
changes to the shared object file used by <application>repmgrd</application>; check the


@@ -1 +1 @@
<!ENTITY repmgrversion "4.0.5">
<!ENTITY repmgrversion "4.1.0">


@@ -46,5 +46,6 @@
#define ERR_SWITCHOVER_INCOMPLETE 22
#define ERR_FOLLOW_FAIL 23
#define ERR_REJOIN_FAIL 24
#define ERR_NODE_STATUS 25
#endif /* _ERRCODE_H_ */

19
log.c

@@ -42,7 +42,7 @@ _stderr_log_with_level(const char *level_name, int level, const char *fmt, va_li
__attribute__((format(PG_PRINTF_ATTRIBUTE, 3, 0)));
int log_type = REPMGR_STDERR;
int log_level = LOG_NOTICE;
int log_level = LOG_INFO;
int last_log_level = LOG_INFO;
int verbose_logging = false;
int terse_logging = false;
@@ -70,7 +70,7 @@ _stderr_log_with_level(const char *level_name, int level, const char *fmt, va_li
/*
* Store the requested level so that if there's a subsequent log_hint() or
* log_detail(), we can suppress that if appropriate.
* log_detail(), we can suppress that if --terse was specified.
*/
last_log_level = level;
@@ -329,6 +329,21 @@ logger_set_terse(void)
}
void
logger_set_level(int new_log_level)
{
log_level = new_log_level;
}
void
logger_set_min_level(int min_log_level)
{
if (min_log_level > log_level)
log_level = min_log_level;
}
int
detect_log_level(const char *level)
{
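The difference between the two new setters is easier to see in isolation. The sketch below assumes the log levels follow syslog numbering, where a larger value means more verbose output (`LOG_NOTICE` is 5, `LOG_INFO` is 6, `LOG_DEBUG` is 7); `logger_get_level()` is a hypothetical accessor added only for the illustration.

```c
#include <syslog.h>     /* LOG_NOTICE, LOG_INFO, LOG_DEBUG */

static int log_level = LOG_INFO;

/* Unconditionally replace the current log level. */
void
logger_set_level(int new_log_level)
{
    log_level = new_log_level;
}

/*
 * Only ever raise verbosity: the level changes if the requested minimum
 * is numerically larger (more verbose) than the current level.
 */
void
logger_set_min_level(int min_log_level)
{
    if (min_log_level > log_level)
        log_level = min_log_level;
}

/* Hypothetical accessor, present only so the effect can be observed. */
int
logger_get_level(void)
{
    return log_level;
}
```

Under that assumption, `logger_set_min_level(LOG_INFO)` turns a `LOG_NOTICE` configuration into `LOG_INFO` but leaves a `LOG_DEBUG` configuration untouched, whereas `logger_set_level()` would overwrite either.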

2
log.h

@@ -128,6 +128,8 @@ bool logger_shutdown(void);
void logger_set_verbose(void);
void logger_set_terse(void);
void logger_set_min_level(int min_log_level);
void logger_set_level(int new_log_level);
void
log_detail(const char *fmt,...)

2
repmgr--4.0--4.1.sql Normal file

@@ -0,0 +1,2 @@
-- complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION repmgr" to load this file. \quit

167
repmgr--4.1.sql Normal file

@@ -0,0 +1,167 @@
-- complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION repmgr" to load this file. \quit
CREATE TABLE repmgr.nodes (
node_id INTEGER PRIMARY KEY,
upstream_node_id INTEGER NULL REFERENCES nodes (node_id) DEFERRABLE,
active BOOLEAN NOT NULL DEFAULT TRUE,
node_name TEXT NOT NULL,
type TEXT NOT NULL CHECK (type IN('primary','standby','witness','bdr')),
location TEXT NOT NULL DEFAULT 'default',
priority INT NOT NULL DEFAULT 100,
conninfo TEXT NOT NULL,
repluser VARCHAR(63) NOT NULL,
slot_name TEXT NULL,
config_file TEXT NOT NULL
);
CREATE TABLE repmgr.events (
node_id INTEGER NOT NULL,
event TEXT NOT NULL,
successful BOOLEAN NOT NULL DEFAULT TRUE,
event_timestamp TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
details TEXT NULL
);
DO $repmgr$
DECLARE
DECLARE server_version_num INT;
BEGIN
SELECT setting
FROM pg_catalog.pg_settings
WHERE name = 'server_version_num'
INTO server_version_num;
IF server_version_num >= 90400 THEN
EXECUTE $repmgr_func$
CREATE TABLE repmgr.monitoring_history (
primary_node_id INTEGER NOT NULL,
standby_node_id INTEGER NOT NULL,
last_monitor_time TIMESTAMP WITH TIME ZONE NOT NULL,
last_apply_time TIMESTAMP WITH TIME ZONE,
last_wal_primary_location PG_LSN NOT NULL,
last_wal_standby_location PG_LSN,
replication_lag BIGINT NOT NULL,
apply_lag BIGINT NOT NULL
)
$repmgr_func$;
ELSE
EXECUTE $repmgr_func$
CREATE TABLE repmgr.monitoring_history (
primary_node_id INTEGER NOT NULL,
standby_node_id INTEGER NOT NULL,
last_monitor_time TIMESTAMP WITH TIME ZONE NOT NULL,
last_apply_time TIMESTAMP WITH TIME ZONE,
last_wal_primary_location TEXT NOT NULL,
last_wal_standby_location TEXT,
replication_lag BIGINT NOT NULL,
apply_lag BIGINT NOT NULL
)
$repmgr_func$;
END IF;
END$repmgr$;
CREATE INDEX idx_monitoring_history_time
ON repmgr.monitoring_history (last_monitor_time, standby_node_id);
CREATE VIEW repmgr.show_nodes AS
SELECT n.node_id,
n.node_name,
n.active,
n.upstream_node_id,
un.node_name AS upstream_node_name,
n.type,
n.priority,
n.conninfo
FROM repmgr.nodes n
LEFT JOIN repmgr.nodes un
ON un.node_id = n.upstream_node_id;
/* XXX update upgrade scripts! */
CREATE TABLE repmgr.voting_term (
term INT NOT NULL
);
CREATE UNIQUE INDEX voting_term_restrict
ON repmgr.voting_term ((TRUE));
CREATE RULE voting_term_delete AS
ON DELETE TO repmgr.voting_term
DO INSTEAD NOTHING;
/* ================= */
/* repmgrd functions */
/* ================= */
/* monitoring functions */
CREATE FUNCTION set_local_node_id(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'set_local_node_id'
LANGUAGE C STRICT;
CREATE FUNCTION get_local_node_id()
RETURNS INT
AS 'MODULE_PATHNAME', 'get_local_node_id'
LANGUAGE C STRICT;
CREATE FUNCTION standby_set_last_updated()
RETURNS TIMESTAMP WITH TIME ZONE
AS 'MODULE_PATHNAME', 'standby_set_last_updated'
LANGUAGE C STRICT;
CREATE FUNCTION standby_get_last_updated()
RETURNS TIMESTAMP WITH TIME ZONE
AS 'MODULE_PATHNAME', 'standby_get_last_updated'
LANGUAGE C STRICT;
/* failover functions */
CREATE FUNCTION notify_follow_primary(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'notify_follow_primary'
LANGUAGE C STRICT;
CREATE FUNCTION get_new_primary()
RETURNS INT
AS 'MODULE_PATHNAME', 'get_new_primary'
LANGUAGE C STRICT;
CREATE FUNCTION reset_voting_status()
RETURNS VOID
AS 'MODULE_PATHNAME', 'reset_voting_status'
LANGUAGE C STRICT;
CREATE FUNCTION am_bdr_failover_handler(INT)
RETURNS BOOL
AS 'MODULE_PATHNAME', 'am_bdr_failover_handler'
LANGUAGE C STRICT;
CREATE FUNCTION unset_bdr_failover_handler()
RETURNS VOID
AS 'MODULE_PATHNAME', 'unset_bdr_failover_handler'
LANGUAGE C STRICT;
CREATE VIEW repmgr.replication_status AS
SELECT m.primary_node_id, m.standby_node_id, n.node_name AS standby_name,
n.type AS node_type, n.active, last_monitor_time,
CASE WHEN n.type='standby' THEN m.last_wal_primary_location ELSE NULL END AS last_wal_primary_location,
m.last_wal_standby_location,
CASE WHEN n.type='standby' THEN pg_catalog.pg_size_pretty(m.replication_lag) ELSE NULL END AS replication_lag,
CASE WHEN n.type='standby' THEN
CASE WHEN replication_lag > 0 THEN age(now(), m.last_apply_time) ELSE '0'::INTERVAL END
ELSE NULL
END AS replication_time_lag,
CASE WHEN n.type='standby' THEN pg_catalog.pg_size_pretty(m.apply_lag) ELSE NULL END AS apply_lag,
AGE(NOW(), CASE WHEN pg_catalog.pg_is_in_recovery() THEN repmgr.standby_get_last_updated() ELSE m.last_monitor_time END) AS communication_time_lag
FROM repmgr.monitoring_history m
JOIN repmgr.nodes n ON m.standby_node_id = n.node_id
WHERE (m.standby_node_id, m.last_monitor_time) IN (
SELECT m1.standby_node_id, MAX(m1.last_monitor_time)
FROM repmgr.monitoring_history m1 GROUP BY 1
);


@@ -83,9 +83,10 @@ do_bdr_register(void)
exit(ERR_BAD_CONFIG);
}
if (bdr_nodes.node_count > 2)
/* BDR 2 implementation is for 2 nodes only */
if (get_bdr_version_num() < 3 && bdr_nodes.node_count > 2)
{
log_error(_("repmgr can only support BDR clusters with 2 nodes"));
log_error(_("repmgr can only support BDR 2.x clusters with 2 nodes"));
log_detail(_("this BDR cluster has %i nodes"), bdr_nodes.node_count);
PQfinish(conn);
pfree(dbname);
@@ -176,6 +177,7 @@ do_bdr_register(void)
if (bdr_node_has_repmgr_set(conn, config_file_options.node_name) == false)
{
log_debug("bdr_node_has_repmgr_set() = false");
bdr_node_set_repmgr_set(conn, config_file_options.node_name);
}
@@ -201,6 +203,7 @@ do_bdr_register(void)
if (bdr_nodes.node_count == 0)
{
log_error(_("unable to retrieve any BDR node records"));
log_detail("%s", PQerrorMessage(conn));
PQfinish(conn);
exit(ERR_BAD_CONFIG);
}
@@ -252,7 +255,35 @@ do_bdr_register(void)
}
/* Add the repmgr extension tables to a replication set */
add_extension_tables_to_bdr_replication_set(conn);
if (get_bdr_version_num() < 3)
{
add_extension_tables_to_bdr_replication_set(conn);
}
else
{
/* this is the only table we need to replicate */
char *replication_set = get_default_bdr_replication_set(conn);
/*
* this probably won't happen, but we need to be sure we're using
* the replication set metadata correctly...
*/
if (replication_set == NULL)
{
log_error(_("unable to retrieve default BDR replication set"));
log_hint(_("see preceding messages"));
log_debug("check query in get_default_bdr_replication_set()");
exit(ERR_BAD_CONFIG);
}
if (is_table_in_bdr_replication_set(conn, "nodes", replication_set) == false)
{
add_table_to_bdr_replication_set(conn, "nodes", replication_set);
}
pfree(replication_set);
}
initPQExpBuffer(&event_details);


@@ -83,6 +83,7 @@ do_cluster_show(void)
int i = 0;
ItemList warnings = {NULL, NULL};
bool success = false;
bool error_found = false;
/* Connect to local database to obtain cluster connection data */
log_verbose(LOG_INFO, _("connecting to database"));
@@ -218,6 +219,7 @@ do_cluster_show(void)
else
{
appendPQExpBuffer(&details, "- failed");
error_found = true;
}
}
}
@@ -281,6 +283,7 @@ do_cluster_show(void)
else
{
appendPQExpBuffer(&details, "- failed");
error_found = true;
}
}
}
@@ -292,17 +295,27 @@ do_cluster_show(void)
if (cell->node_info->node_status == NODE_STATUS_UP)
{
if (cell->node_info->active == true)
{
appendPQExpBuffer(&details, "* running");
}
else
{
appendPQExpBuffer(&details, "! running");
error_found = true;
}
}
/* node is unreachable */
else
{
if (cell->node_info->active == true)
{
appendPQExpBuffer(&details, "? unreachable");
}
else
{
appendPQExpBuffer(&details, "- failed");
error_found = true;
}
}
}
break;
@@ -310,6 +323,7 @@ do_cluster_show(void)
{
/* this should never happen */
appendPQExpBuffer(&details, "? unknown node type");
error_found = true;
}
break;
}
@@ -414,7 +428,6 @@ do_cluster_show(void)
PQfinish(conn);
/* emit any warnings */
if (warnings.head != NULL && runtime_options.terse == false && runtime_options.output_mode != OM_CSV)
{
ItemListCell *cell = NULL;
@@ -425,6 +438,20 @@ do_cluster_show(void)
printf(_(" - %s\n"), cell->string);
}
}
/*
* If warnings were noted, even if they're not displayed (e.g. in --csv mode),
* that means something's not right so we need to emit a non-zero exit code.
*/
if (warnings.head != NULL)
{
error_found = true;
}
if (error_found == true)
{
exit(ERR_NODE_STATUS);
}
}
@@ -436,6 +463,7 @@ do_cluster_show(void)
* --all
* --node-[id|name]
* --event
* --csv
*/
void
@@ -480,8 +508,12 @@ do_cluster_event(void)
strncpy(headers_event[EV_TIMESTAMP].title, _("Timestamp"), MAXLEN);
strncpy(headers_event[EV_DETAILS].title, _("Details"), MAXLEN);
/* if --terse provided, simply omit the "Details" column */
if (runtime_options.terse == true)
/*
* If --terse or --csv provided, simply omit the "Details" column.
* In --csv mode we'd need to quote/escape the contents of the "Details" column,
* which is doable but which will remain a TODO for now.
*/
if (runtime_options.terse == true || runtime_options.output_mode == OM_CSV)
column_count --;
for (i = 0; i < column_count; i++)
@@ -504,47 +536,64 @@ do_cluster_event(void)
}
for (i = 0; i < column_count; i++)
if (runtime_options.output_mode == OM_TEXT)
{
if (i == 0)
printf(" ");
else
printf(" | ");
for (i = 0; i < column_count; i++)
{
if (i == 0)
printf(" ");
else
printf(" | ");
printf("%-*s",
headers_event[i].max_length,
headers_event[i].title);
printf("%-*s",
headers_event[i].max_length,
headers_event[i].title);
}
printf("\n");
printf("-");
for (i = 0; i < column_count; i++)
{
int j;
for (j = 0; j < headers_event[i].max_length; j++)
printf("-");
if (i < (column_count - 1))
printf("-+-");
else
printf("-");
}
printf("\n");
}
printf("\n");
printf("-");
for (i = 0; i < column_count; i++)
{
int j;
for (j = 0; j < headers_event[i].max_length; j++)
printf("-");
if (i < (column_count - 1))
printf("-+-");
else
printf("-");
}
printf("\n");
for (i = 0; i < PQntuples(res); i++)
{
int j;
printf(" ");
for (j = 0; j < column_count; j++)
if (runtime_options.output_mode == OM_CSV)
{
printf("%-*s",
headers_event[j].max_length,
PQgetvalue(res, i, j));
for (j = 0; j < column_count; j++)
{
printf("%s", PQgetvalue(res, i, j));
if ((j + 1) < column_count)
{
printf(",");
}
}
}
else
{
printf(" ");
for (j = 0; j < column_count; j++)
{
printf("%-*s",
headers_event[j].max_length,
PQgetvalue(res, i, j));
if (j < (column_count - 1))
printf(" | ");
if (j < (column_count - 1))
printf(" | ");
}
}
printf("\n");
@@ -554,7 +603,8 @@ do_cluster_event(void)
PQfinish(conn);
puts("");
if (runtime_options.output_mode == OM_TEXT)
puts("");
}
@@ -569,6 +619,8 @@ do_cluster_crosscheck(void)
t_node_status_cube **cube;
bool error_found = false;
n = build_cluster_crosscheck(&cube, &name_length);
if (runtime_options.output_mode == OM_CSV)
{
@@ -648,9 +700,11 @@ do_cluster_crosscheck(void)
{
case -2:
c = '?';
error_found = true;
break;
case -1:
c = 'x';
error_found = true;
break;
case 0:
c = '*';
@@ -689,6 +743,11 @@ do_cluster_crosscheck(void)
free(cube);
}
if (error_found == true)
{
exit(ERR_NODE_STATUS);
}
}
@@ -704,6 +763,8 @@ do_cluster_matrix()
t_node_matrix_rec **matrix_rec_list;
bool error_found = false;
n = build_cluster_matrix(&matrix_rec_list, &name_length);
if (runtime_options.output_mode == OM_CSV)
@@ -742,9 +803,11 @@ do_cluster_matrix()
{
case -2:
c = '?';
error_found = true;
break;
case -1:
c = 'x';
error_found = true;
break;
case 0:
c = '*';
@@ -770,6 +833,11 @@ do_cluster_matrix()
}
free(matrix_rec_list);
if (error_found == true)
{
exit(ERR_NODE_STATUS);
}
}
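The exit-code change in `cluster matrix` and `cluster crosscheck` comes down to this: any cell that is not `0` marks the run as failed, and the command then exits with `ERR_NODE_STATUS` (25). A minimal sketch of that logic follows. The helper names are hypothetical, and the `default` branch is an assumption, since the diff does not show how other status values are handled.

```c
#include <stdbool.h>

#define ERR_NODE_STATUS 25

/*
 * Hypothetical helper: map a matrix cell status to its display character
 * and record whether it represents a problem.
 */
char
node_status_char(int status, bool *error_found)
{
    switch (status)
    {
        case -2:
            *error_found = true;
            return '?';
        case -1:
            *error_found = true;
            return 'x';
        case 0:
            return '*';
        default:
            /* assumption: treat any other value as an error */
            *error_found = true;
            return '?';
    }
}

/*
 * Hypothetical helper: exit code for a flat list of cell statuses;
 * 0 if every cell is OK, ERR_NODE_STATUS otherwise.
 */
int
matrix_exit_code(const int *statuses, int count)
{
    bool error_found = false;
    int  i;

    for (i = 0; i < count; i++)
        (void) node_status_char(statuses[i], &error_found);

    return error_found ? ERR_NODE_STATUS : 0;
}
```

A monitoring script can therefore test the exit status of the command directly instead of parsing the matrix output.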
@@ -1329,6 +1397,7 @@ do_cluster_help(void)
printf(_(" %s [OPTIONS] cluster matrix\n"), progname());
printf(_(" %s [OPTIONS] cluster crosscheck\n"), progname());
printf(_(" %s [OPTIONS] cluster event\n"), progname());
printf(_(" %s [OPTIONS] cluster cleanup\n"), progname());
puts("");
printf(_("CLUSTER SHOW\n"));
@@ -1368,6 +1437,7 @@ do_cluster_help(void)
printf(_(" --event filter specific event\n"));
printf(_(" --node-id restrict entries to node with this ID\n"));
printf(_(" --node-name restrict entries to node with this name\n"));
printf(_(" --csv emit output as CSV\n"));
puts("");
printf(_("CLUSTER CLEANUP\n"));

File diff suppressed because it is too large


@@ -87,7 +87,7 @@ static void initialise_direct_clone(t_node_info *node_record);
static int run_basebackup(t_node_info *node_record);
static int run_file_backup(t_node_info *node_record);
static void copy_configuration_files(void);
static void copy_configuration_files(bool delete_after_copy);
static void drop_replication_slot_if_exists(PGconn *conn, int node_id, char *slot_name);
@@ -498,7 +498,33 @@ do_standby_clone(void)
termPQExpBuffer(&msg);
/* TODO: check all files are readable */
/*
* Here we'll attempt an initial test copy of the detected external
* files, to detect any issues before we run the base backup.
*
* Note this will exit with an error, unless -F/--force supplied.
*
* TODO: put the files in a temporary directory and move to their final
* destination once the database has been cloned.
*/
if (runtime_options.copy_external_config_files_destination == CONFIG_FILE_SAMEPATH)
{
/*
* Files will be placed in the same path as on the source server;
* don't delete after copying.
*/
copy_configuration_files(false);
}
else
{
/*
* Files will be placed in the data directory - delete after copying.
* They'll be copied again later; see TODO above.
*/
copy_configuration_files(true);
}
}
@@ -597,7 +623,12 @@ do_standby_clone(void)
*/
if (runtime_options.copy_external_config_files == true && config_files.entries > 0)
{
copy_configuration_files();
/*
* If "--copy-external-config-files=samepath" was used, the files will already
* have been copied.
*/
if (runtime_options.copy_external_config_files_destination == CONFIG_FILE_PGDATA)
copy_configuration_files(false);
}
/* Write the recovery.conf file */
@@ -938,7 +969,6 @@ _do_create_recovery_conf(void)
log_detail("%s", PQerrorMessage(source_conn));
}
exit(ERR_BAD_CONFIG);
}
@@ -955,7 +985,10 @@ _do_create_recovery_conf(void)
{
log_detail("%s", PQerrorMessage(source_conn));
}
else
{
log_hint(_("standby must be registered before a new recovery.conf file can be created"));
}
exit(ERR_BAD_CONFIG);
}
@@ -1021,6 +1054,7 @@ _do_create_recovery_conf(void)
local_node_record.slot_name,
upstream_node_record.node_name,
upstream_node_id);
if (runtime_options.force == false && runtime_options.dry_run == false)
{
log_error("%s", msg.data);
@@ -1052,7 +1086,7 @@ _do_create_recovery_conf(void)
initPQExpBuffer(&msg);
appendPQExpBuffer(&msg,
_("insufficient free replicaiton slots on upstream node \"%s\" (ID: %i)"),
_("insufficient free replication slots on upstream node \"%s\" (ID: %i)"),
upstream_node_record.node_name,
upstream_node_id);
@@ -1108,14 +1142,14 @@ _do_create_recovery_conf(void)
if (runtime_options.dry_run == true)
{
char recovery_conf_contents[MAXLEN] = "";
create_recovery_file(&upstream_node_record, &recovery_conninfo, recovery_conf_contents, false);
create_recovery_file(&local_node_record, &recovery_conninfo, recovery_conf_contents, false);
log_info(_("would create \"recovery.conf\" file in \"%s\""), local_data_directory);
log_detail(_("\n%s"), recovery_conf_contents);
}
else
{
if (!create_recovery_file(&upstream_node_record, &recovery_conninfo, local_data_directory, true))
if (!create_recovery_file(&local_node_record, &recovery_conninfo, local_data_directory, true))
{
log_error(_("unable to create \"recovery.conf\""));
}
@@ -1675,11 +1709,16 @@ do_standby_register(void)
termPQExpBuffer(&details);
/* if --wait-sync option set, wait for the records to synchronise */
/*
* If --wait-sync option set, wait for the records to synchronise
* (unless 0 seconds provided, which disables it, which is the same as
* not providing the option). The default value is -1, which means
* no timeout.
*/
if (PQstatus(conn) == CONNECTION_OK &&
runtime_options.wait_register_sync == true &&
runtime_options.wait_register_sync_seconds > 0)
runtime_options.wait_register_sync_seconds != 0)
{
bool sync_ok = false;
int timer = 0;
@@ -1703,7 +1742,11 @@ do_standby_register(void)
{
bool records_match = true;
if (runtime_options.wait_register_sync_seconds && runtime_options.wait_register_sync_seconds == timer)
/*
* If timeout set to a positive value, check if we've reached it and
* exit the loop
*/
if (runtime_options.wait_register_sync_seconds > 0 && runtime_options.wait_register_sync_seconds == timer)
break;
node_record_status = get_node_record(conn,
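The `--wait-sync` timeout now has three cases: `0` disables waiting, a negative value (the default `-1`) waits indefinitely, and a positive value is a timeout in seconds. The sketch below models only that control flow. `wait_for_sync()`, the `SyncResult` enum and the stand-in check `match_after_n_polls()` are hypothetical names for this illustration; the real code compares node records and sleeps between polls.

```c
#include <stdbool.h>

typedef enum
{
    SYNC_SKIPPED,   /* waiting disabled (0 seconds) */
    SYNC_OK,        /* records matched */
    SYNC_TIMEOUT    /* positive timeout reached without a match */
} SyncResult;

/* Stand-in check: reports a match once the counter behind ctx reaches zero. */
bool
match_after_n_polls(void *ctx)
{
    int *remaining = ctx;

    if (*remaining <= 0)
        return true;

    (*remaining)--;
    return false;
}

/*
 * Poll records_match() once per (notional) second:
 *   wait_seconds == 0  -> do not wait at all
 *   wait_seconds  < 0  -> no timeout, poll until the records match
 *   wait_seconds  > 0  -> give up after that many polls
 */
SyncResult
wait_for_sync(int wait_seconds, bool (*records_match)(void *), void *ctx)
{
    int timer = 0;

    if (wait_seconds == 0)
        return SYNC_SKIPPED;

    for (;;)
    {
        /* only a positive value is treated as a timeout */
        if (wait_seconds > 0 && wait_seconds == timer)
            return SYNC_TIMEOUT;

        if (records_match(ctx))
            return SYNC_OK;

        /* a real implementation would sleep(1) here */
        timer++;
    }
}
```

With the previous `> 0` condition, the default of `-1` would have skipped the wait entirely, which is what the change to `!= 0` addresses.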
@@ -2126,7 +2169,13 @@ do_standby_follow(void)
log_verbose(LOG_DEBUG, "do_standby_follow()");
local_conn = establish_db_connection(config_file_options.conninfo, true);
local_conn = establish_db_connection(config_file_options.conninfo, false);
if (PQstatus(local_conn) != CONNECTION_OK)
{
log_hint(_("use \"repmgr node rejoin\" to re-add an inactive node to the replication cluster"));
exit(ERR_DB_CONN);
}
log_verbose(LOG_INFO, _("connected to local node"));
@@ -2218,7 +2267,7 @@ do_standby_follow(void)
if (config_file_options.use_replication_slots)
{
int free_slots = get_free_replication_slots(primary_conn);
int free_slots = get_free_replication_slot_count(primary_conn);
if (free_slots < 0)
{
log_error(_("unable to determine number of free replication slots on the primary"));
@@ -2313,6 +2362,74 @@ do_standby_follow(void)
&follow_output,
&follow_error_code);
/* unable to restart the standby */
if (success == false)
{
create_event_notification_extended(
primary_conn,
&config_file_options,
config_file_options.node_id,
"standby_follow",
success,
follow_output.data,
&event_info);
PQfinish(primary_conn);
log_notice(_("STANDBY FOLLOW failed"));
if (strlen( follow_output.data ))
log_detail("%s", follow_output.data);
termPQExpBuffer(&follow_output);
exit(follow_error_code);
}
termPQExpBuffer(&follow_output);
initPQExpBuffer(&follow_output);
/*
* Wait up to "standby_follow_timeout" seconds for standby to connect to
* upstream.
* For 9.6 and later, we could check pg_stat_wal_receiver on the local node.
*/
/* assume success, necessary if standby_follow_timeout is zero */
success = true;
for (timer = 0; timer < config_file_options.standby_follow_timeout; timer++)
{
success = is_downstream_node_attached(primary_conn, config_file_options.node_name);
if (success == true)
break;
log_verbose(LOG_DEBUG, "sleeping %i of max %i seconds waiting for standby to attach to primary",
timer + 1,
config_file_options.standby_follow_timeout);
sleep(1);
}
if (success == true)
{
log_notice(_("STANDBY FOLLOW successful"));
appendPQExpBuffer(&follow_output,
"standby attached to upstream node \"%s\" (node ID: %i)",
primary_node_record.node_name,
primary_node_id);
}
else
{
log_error(_("STANDBY FOLLOW failed"));
appendPQExpBuffer(&follow_output,
"standby did not attach to upstream node \"%s\" (node ID: %i) after %i seconds",
primary_node_record.node_name,
primary_node_id,
config_file_options.standby_follow_timeout);
}
log_detail("%s", follow_output.data);
create_event_notification_extended(
primary_conn,
&config_file_options,
@@ -2324,20 +2441,11 @@ do_standby_follow(void)
PQfinish(primary_conn);
if (success == false)
{
log_notice(_("STANDBY FOLLOW failed"));
log_detail("%s", follow_output.data);
termPQExpBuffer(&follow_output);
exit(follow_error_code);
}
log_notice(_("STANDBY FOLLOW successful"));
log_detail("%s", follow_output.data);
termPQExpBuffer(&follow_output);
if (success == false)
exit(ERR_FOLLOW_FAIL);
return;
}
@@ -2803,6 +2911,25 @@ do_standby_switchover(void)
exit(ERR_DB_QUERY);
}
/*
* Check that there are no exclusive backups running on the primary.
* We don't want to end up damaging the backup and also leaving the server in a
* state where there's control data saying it's in backup mode but there's no
* backup_label in PGDATA.
* If the DBA wants to do the switchover anyway, they should first stop the
* backup that's running.
*/
if (server_in_exclusive_backup_mode(remote_conn) != BACKUP_STATE_NO_BACKUP)
{
log_error(_("unable to perform a switchover while primary server is in exclusive backup mode"));
log_hint(_("stop backup before attempting the switchover"));
PQfinish(local_conn);
PQfinish(remote_conn);
exit(ERR_SWITCHOVER_FAIL);
}
/*
* Check this standby is attached to the demotion candidate
* TODO:
@@ -3335,8 +3462,6 @@ do_standby_switchover(void)
}
}
/*
* check there are sufficient free walsenders - obviously there's potential
* for a later race condition if some walsenders come into use before the
@@ -3760,7 +3885,6 @@ do_standby_switchover(void)
* If --siblings-follow specified, attempt to make them follow the new
* primary
*/
if (runtime_options.siblings_follow == true && sibling_nodes.node_count > 0)
{
int failed_follow_count = 0;
@@ -3787,8 +3911,17 @@ do_standby_switchover(void)
initPQExpBuffer(&remote_command_str);
make_remote_repmgr_path(&remote_command_str, &sibling_node_record);
appendPQExpBuffer(&remote_command_str,
"standby follow 2>/dev/null && echo \"1\" || echo \"0\"");
if (sibling_node_record.type == WITNESS)
{
appendPQExpBuffer(&remote_command_str,
"witness register -d \\'%s\\' --force 2>/dev/null && echo \"1\" || echo \"0\"",
local_node_record.conninfo);
}
else
{
appendPQExpBuffer(&remote_command_str,
"standby follow 2>/dev/null && echo \"1\" || echo \"0\"");
}
get_conninfo_value(cell->node_info->conninfo, "host", host);
log_debug("executing:\n %s", remote_command_str.data);
@@ -3803,8 +3936,16 @@ do_standby_switchover(void)
if (success == false || command_output.data[0] == '0')
{
log_warning(_("STANDBY FOLLOW failed on node \"%s\""),
cell->node_info->node_name);
if (sibling_node_record.type == WITNESS)
{
log_warning(_("WITNESS REGISTER failed on node \"%s\""),
cell->node_info->node_name);
}
else
{
log_warning(_("STANDBY FOLLOW failed on node \"%s\""),
cell->node_info->node_name);
}
failed_follow_count++;
}
@@ -3909,6 +4050,8 @@ check_source_server()
PGconn *privileged_conn = NULL;
char cluster_size[MAXLEN];
char *connstr = NULL;
t_node_info node_record = T_NODE_INFO_INITIALIZER;
RecordStatus record_status = RECORD_NOT_FOUND;
ExtensionStatus extension_status = REPMGR_UNKNOWN;
@@ -3917,8 +4060,11 @@ check_source_server()
log_verbose(LOG_DEBUG, "check_source_server()");
log_info(_("connecting to source node"));
source_conn = establish_db_connection_by_params(&source_conninfo, false);
connstr = param_list_to_string(&source_conninfo);
log_detail(_("connection string is: %s"), connstr);
pfree(connstr);
source_conn = establish_db_connection_by_params(&source_conninfo, false);
/*
* Unless in barman mode, exit with an error;
* establish_db_connection_by_params() will have already logged an error
@@ -4073,13 +4219,25 @@ check_source_server()
if (record_status == RECORD_FOUND)
{
t_conninfo_param_list upstream_conninfo = T_CONNINFO_PARAM_LIST_INITIALIZER;
char *upstream_conninfo_user;
initialize_conninfo_params(&upstream_conninfo, false);
parse_conninfo_string(node_record.conninfo, &upstream_conninfo, NULL, false);
strncpy(recovery_conninfo_str, node_record.conninfo, MAXLEN);
strncpy(upstream_repluser, node_record.repluser, NAMEDATALEN);
strncpy(upstream_user, param_get(&upstream_conninfo, "user"), NAMEDATALEN);
upstream_conninfo_user = param_get(&upstream_conninfo, "user");
if (upstream_conninfo_user != NULL)
{
strncpy(upstream_user, upstream_conninfo_user, NAMEDATALEN);
}
else
{
get_conninfo_default_value("user", upstream_user, NAMEDATALEN);
}
log_verbose(LOG_DEBUG, "upstream_user is \"%s\"", upstream_user);
upstream_conninfo_found = true;
}
@@ -4632,7 +4790,7 @@ initialise_direct_clone(t_node_info *node_record)
}
else
{
TablespaceListCell *cell = false;
TablespaceListCell *cell;
KeyValueList not_found = {NULL, NULL};
int total = 0,
matched = 0;
@@ -5690,7 +5848,7 @@ get_barman_property(char *dst, char *name, char *local_repmgr_directory)
static void
copy_configuration_files(void)
copy_configuration_files(bool delete_after_copy)
{
int i,
r;
@@ -5735,13 +5893,35 @@ copy_configuration_files(void)
r = copy_remote_files(runtime_options.host, runtime_options.remote_user,
file->filepath, dest_path.data, false, source_server_version_num);
termPQExpBuffer(&dest_path);
/*
* TODO: collate errors into list
*/
if (WEXITSTATUS(r))
{
log_error(_("standby clone: unable to copy config file \"%s\""),
file->filename);
log_hint(_("see preceding messages for details"));
if (runtime_options.force == false)
exit(ERR_BAD_RSYNC);
}
/*
* This is to check we can actually copy the files before running the
* main clone operation
*/
if (delete_after_copy == true)
{
/* this is very unlikely to happen, but log in case it does */
if (unlink(dest_path.data) < 0 && errno != ENOENT)
{
log_warning(_("unable to delete %s"), dest_path.data);
log_detail("%s", strerror(errno));
}
}
termPQExpBuffer(&dest_path);
}
return;
@@ -6353,6 +6533,7 @@ do_standby_help(void)
puts("");
printf(_(" \"standby clone\" clones a standby from the primary or an upstream node.\n"));
puts("");
printf(_(" -d, --dbname=conninfo conninfo of the upstream node to use for cloning.\n"));
printf(_(" -c, --fast-checkpoint force fast checkpoint\n"));
printf(_(" --copy-external-config-files[={samepath|pgdata}]\n" \
" copy configuration files located outside the \n" \


@@ -137,7 +137,7 @@ do_witness_register(void)
}
/*
* TODO:sanity check witness node is not part of main cluster; we could
* TODO: sanity check witness node is not part of main cluster; we could
* add a random application_name to the respective connections,
* and do a simple check of pg_stat_activity
*/
@@ -193,8 +193,26 @@ do_witness_register(void)
}
}
/*
* Check that an active node with the same node_name doesn't exist already
*/
// XXX check other node with same name does not exist
record_status = get_node_record_by_name(primary_conn,
config_file_options.node_name,
&node_record);
if (record_status == RECORD_FOUND)
{
if (node_record.active == true && node_record.node_id != config_file_options.node_id)
{
log_error(_("node %i exists already with node_name \"%s\""),
node_record.node_id,
config_file_options.node_name);
PQfinish(primary_conn);
exit(ERR_BAD_CONFIG);
}
}
/*
* if repmgr.nodes contains entries, delete if -F/--force provided,
@@ -225,6 +243,7 @@ do_witness_register(void)
PQfinish(witness_conn);
exit(SUCCESS);
}
/* create record on primary */
/*
@@ -291,55 +310,59 @@ do_witness_register(void)
void
do_witness_unregister(void)
{
PGconn *witness_conn = NULL;
PGconn *local_conn = NULL;
PGconn *primary_conn = NULL;
t_node_info node_record = T_NODE_INFO_INITIALIZER;
RecordStatus record_status = RECORD_NOT_FOUND;
bool node_record_deleted = false;
bool witness_available = true;
bool local_node_available = true;
int witness_node_id = UNKNOWN_NODE_ID;
log_info(_("connecting to witness node \"%s\" (ID: %i)"),
if (runtime_options.node_id != UNKNOWN_NODE_ID)
{
/* user has specified the witness node id */
witness_node_id = runtime_options.node_id;
}
else
{
/* assume witness node is local node */
witness_node_id = config_file_options.node_id;
}
log_info(_("connecting to node \"%s\" (ID: %i)"),
config_file_options.node_name,
config_file_options.node_id);
witness_conn = establish_db_connection_quiet(config_file_options.conninfo);
local_conn = establish_db_connection_quiet(config_file_options.conninfo);
if (PQstatus(witness_conn) != CONNECTION_OK)
if (PQstatus(local_conn) != CONNECTION_OK)
{
if (!runtime_options.force)
{
log_error(_("unable to connect to witness node \"%s\" (ID: %i)"),
log_error(_("unable to connect to node \"%s\" (ID: %i)"),
config_file_options.node_name,
config_file_options.node_id);
log_detail("%s", PQerrorMessage(witness_conn));
log_hint(_("provide -F/--force to remove the witness record if the server is not running"));
log_detail("%s", PQerrorMessage(local_conn));
exit(ERR_BAD_CONFIG);
}
log_notice(_("unable to connect to witness node \"%s\" (ID: %i), removing node record on cluster primary only"),
config_file_options.node_name,
config_file_options.node_id);
witness_available = false;
local_node_available = false;
}
if (witness_available == true)
if (local_node_available == true)
{
primary_conn = get_primary_connection_quiet(witness_conn, NULL, NULL);
primary_conn = get_primary_connection_quiet(local_conn, NULL, NULL);
}
else
{
/*
* Extract the repmgr user and database names from the conninfo string
* provided in repmgr.conf
* Assume user has provided connection details for the primary server
*/
get_conninfo_value(config_file_options.conninfo, "user", repmgr_user);
get_conninfo_value(config_file_options.conninfo, "dbname", repmgr_db);
param_set_ine(&source_conninfo, "user", repmgr_user);
param_set_ine(&source_conninfo, "dbname", repmgr_db);
primary_conn = establish_db_connection_by_params(&source_conninfo, false);
}
if (PQstatus(primary_conn) != CONNECTION_OK)
@@ -347,26 +370,26 @@ do_witness_unregister(void)
log_error(_("unable to connect to primary"));
log_detail("%s", PQerrorMessage(primary_conn));
if (witness_available == true)
if (local_node_available == true)
{
PQfinish(witness_conn);
PQfinish(local_conn);
}
else
else if (runtime_options.connection_param_provided == false)
{
log_hint(_("provide connection details to primary server"));
log_hint(_("provide connection details for the primary server"));
}
exit(ERR_BAD_CONFIG);
}
/* Check node exists and is really a witness */
record_status = get_node_record(primary_conn, config_file_options.node_id, &node_record);
record_status = get_node_record(primary_conn, witness_node_id, &node_record);
if (record_status != RECORD_FOUND)
{
log_error(_("no record found for node %i"), config_file_options.node_id);
log_error(_("no record found for node %i"), witness_node_id);
if (witness_available == true)
PQfinish(witness_conn);
if (local_node_available == true)
PQfinish(local_conn);
PQfinish(primary_conn);
exit(ERR_BAD_CONFIG);
@@ -374,11 +397,17 @@ do_witness_unregister(void)
if (node_record.type != WITNESS)
{
/*
* The node (either explicitly provided with --node-id, or the local node)
* is not a witness.
*
* TODO: scan node list and print hint about identity of known witness servers.
*/
log_error(_("node %i is not a witness node"), config_file_options.node_id);
log_detail(_("node %i is a %s node"), config_file_options.node_id, get_node_type_string(node_record.type));
if (witness_available == true)
PQfinish(witness_conn);
if (local_node_available == true)
PQfinish(local_conn);
PQfinish(primary_conn);
exit(ERR_BAD_CONFIG);
@@ -387,49 +416,43 @@ do_witness_unregister(void)
if (runtime_options.dry_run == true)
{
log_info(_("prerequisites for unregistering the witness node are met"));
if (witness_available == true)
PQfinish(witness_conn);
if (local_node_available == true)
PQfinish(local_conn);
PQfinish(primary_conn);
exit(SUCCESS);
}
log_info(_("unregistering witness node %i"), config_file_options.node_id);
log_info(_("unregistering witness node %i"), witness_node_id);
node_record_deleted = delete_node_record(primary_conn,
config_file_options.node_id);
witness_node_id);
if (node_record_deleted == false)
{
PQfinish(primary_conn);
PQfinish(witness_conn);
exit(ERR_BAD_CONFIG);
}
/* sync records from primary */
if (witness_available == true && witness_copy_node_records(primary_conn, witness_conn) == false)
{
log_error(_("unable to copy repmgr node records from primary"));
PQfinish(primary_conn);
PQfinish(witness_conn);
if (local_node_available == true)
PQfinish(local_conn);
PQfinish(local_conn);
exit(ERR_BAD_CONFIG);
}
/* Log the event */
create_event_record(primary_conn,
&config_file_options,
config_file_options.node_id,
witness_node_id,
"witness_unregister",
true,
NULL);
PQfinish(primary_conn);
if (witness_available == true)
PQfinish(witness_conn);
if (local_node_available == true)
PQfinish(local_conn);
log_info(_("witness unregistration complete"));
log_detail(_("witness node with id %i (conninfo: %s) successfully unregistered"),
config_file_options.node_id, config_file_options.conninfo);
log_detail(_("witness node with ID %i successfully unregistered"),
witness_node_id);
return;
}
@@ -449,16 +472,19 @@ void do_witness_help(void)
puts("");
printf(_(" Requires provision of connection information for the primary\n"));
puts("");
printf(_(" --dry-run check prerequisites but don't make any changes\n"));
printf(_(" -F, --force overwrite an existing node record\n"));
printf(_(" --dry-run check prerequisites but don't make any changes\n"));
printf(_(" -F, --force overwrite an existing node record\n"));
puts("");
printf(_("WITNESS UNREGISTER\n"));
puts("");
printf(_(" \"witness unregister\" unregisters a witness node.\n"));
puts("");
printf(_(" --dry-run check prerequisites but don't make any changes\n"));
printf(_(" -F, --force unregister when witness node not running\n"));
printf(_(" --dry-run check prerequisites but don't make any changes\n"));
printf(_(" -F, --force unregister when witness node not running\n"));
printf(_(" --node-id node ID of the witness node (provide if executing on\n"));
printf(_(" another node)\n"));
puts("");
return;
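The node ID selection introduced in do_witness_unregister() above reduces to a single rule: an ID given with --node-id wins, otherwise the local node is assumed to be the witness. A minimal sketch of that rule follows. resolve_witness_node_id() is a hypothetical helper, and the value -1 for the "unknown" sentinel is an assumption made for this example.

```c
#include <assert.h>

/* Assumed sentinel for "no node ID supplied"; hypothetical for this sketch. */
#define SKETCH_UNKNOWN_NODE_ID (-1)

/*
 * Hypothetical helper: choose which node record to unregister.
 * cli_node_id is the value from --node-id (or the sentinel if absent),
 * local_node_id is the node_id from the local configuration file.
 */
static int
resolve_witness_node_id(int cli_node_id, int local_node_id)
{
	/* the local configuration must always name a node */
	assert(local_node_id != SKETCH_UNKNOWN_NODE_ID);

	if (cli_node_id != SKETCH_UNKNOWN_NODE_ID)
	{
		/* user has specified the witness node ID explicitly */
		return cli_node_id;
	}

	/* otherwise assume the witness node is the local node */
	return local_node_id;
}
```

Whichever ID is chosen still has to be verified against the node record on the primary, as the hunk does with the `node_record.type != WITNESS` check.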


@@ -47,6 +47,7 @@ typedef struct
/* logging options */
char log_level[MAXLEN]; /* overrides setting in repmgr.conf */
bool log_to_file;
bool quiet;
bool terse;
bool verbose;
@@ -106,6 +107,7 @@ typedef struct
bool replication_lag;
bool role;
bool slots;
bool missing_slots;
bool has_passfile;
bool replication_connection;
@@ -137,7 +139,7 @@ typedef struct
/* general configuration options */ \
"", false, false, "", false, false, \
/* logging options */ \
"", false, false, false, \
"", false, false, false, false, \
/* output options */ \
false, false, false, \
/* database connection options */ \
@@ -152,13 +154,13 @@ typedef struct
/* "standby clone"/"standby follow" options */ \
NO_UPSTREAM_NODE, \
/* "standby register" options */ \
false, 0, DEFAULT_WAIT_START, \
false, -1, DEFAULT_WAIT_START, \
/* "standby switchover" options */ \
false, false, "", false, \
/* "node status" options */ \
false, \
/* "node check" options */ \
false, false, false, false, false, false, false, \
false, false, false, false, false, false, false, false, \
/* "node join" options */ \
"", \
/* "node service" options */ \


@@ -98,7 +98,7 @@ main(int argc, char **argv)
{
t_conninfo_param_list default_conninfo = T_CONNINFO_PARAM_LIST_INITIALIZER;
int optindex;
int optindex = 0;
int c;
char *repmgr_command = NULL;
@@ -108,6 +108,7 @@ main(int argc, char **argv)
char *dummy_action = "";
bool help_option = false;
bool option_error_found = false;
set_progname(argv[0]);
@@ -178,7 +179,10 @@ main(int argc, char **argv)
strncpy(runtime_options.username, pw->pw_name, MAXLEN);
}
while ((c = getopt_long(argc, argv, "?Vb:f:FwWd:h:p:U:R:S:D:ck:L:tvC:", long_options,
/* Make getopt emit errors */
opterr = 1;
while ((c = getopt_long(argc, argv, "?Vb:f:FwWd:h:p:U:R:S:D:ck:L:qtvC:", long_options,
&optindex)) != -1)
{
/*
@@ -196,13 +200,7 @@ main(int argc, char **argv)
case OPT_HELP: /* --help */
help_option = true;
break;
case '?':
/* Actual help option given */
if (strcmp(argv[optind - 1], "-?") == 0)
{
help_option = true;
}
break;
case 'V':
/*
@@ -473,6 +471,10 @@ main(int argc, char **argv)
runtime_options.slots = true;
break;
case OPT_MISSING_SLOTS:
runtime_options.missing_slots = true;
break;
case OPT_HAS_PASSFILE:
runtime_options.has_passfile = true;
break;
@@ -572,6 +574,12 @@ main(int argc, char **argv)
logger_output_mode = OM_DAEMON;
break;
/* --quiet */
case 'q':
runtime_options.quiet = true;
break;
/* --terse */
case 't':
runtime_options.terse = true;
@@ -627,14 +635,29 @@ main(int argc, char **argv)
_("--recovery-min-apply-delay is now a configuration file parameter, \"recovery_min_apply_delay\""));
break;
case ':': /* missing option argument */
option_error_found = true;
break;
case '?':
/* Actual help option given? */
if (strcmp(argv[optind - 1], "-?") == 0)
{
help_option = true;
break;
}
/* otherwise fall through to default */
default: /* invalid option */
option_error_found = true;
break;
}
}
/*
* If -d/--dbname appears to be a conninfo string, validate by attempting
* to parse it (and if successful, store the parsed parameters)
*/
if (runtime_options.dbname)
if (runtime_options.dbname[0])
{
if (strncmp(runtime_options.dbname, "postgresql://", 13) == 0 ||
strncmp(runtime_options.dbname, "postgres://", 11) == 0 ||
@@ -730,9 +753,10 @@ main(int argc, char **argv)
if (cli_errors.head != NULL)
{
free_conninfo_params(&source_conninfo);
exit_with_cli_errors(&cli_errors);
exit_with_cli_errors(&cli_errors, NULL);
}
/*----------
* Determine the node type and action; following are valid:
*
@@ -979,9 +1003,30 @@ main(int argc, char **argv)
if (cli_errors.head != NULL)
{
free_conninfo_params(&source_conninfo);
exit_with_cli_errors(&cli_errors);
exit_with_cli_errors(&cli_errors, valid_repmgr_command_found == true ? repmgr_command : NULL);
}
/* no errors detected by repmgr, but getopt might have */
if (option_error_found == true)
{
if (valid_repmgr_command_found == true)
{
printf(_("Try \"%s --help\" or \"%s %s --help\" for more information.\n"),
progname(),
progname(),
repmgr_command);
}
else
{
printf(_("Try \"repmgr --help\" for more information.\n"));
}
free_conninfo_params(&source_conninfo);
exit(ERR_BAD_CONFIG);
}
/*
* Print any warnings about inappropriate command line options, unless
* -t/--terse set
@@ -1010,7 +1055,6 @@ main(int argc, char **argv)
runtime_options.output_mode = OM_OPTFORMAT;
}
/*
* Check for configuration file items which can be overridden by runtime
* options
@@ -1068,6 +1112,28 @@ main(int argc, char **argv)
if (runtime_options.terse)
logger_set_terse();
/*
* If --dry-run specified, ensure log_level is at least LOG_INFO, regardless
* of what's in the configuration file or -L/--log-level parameter, otherwise
* some output might not be displayed.
*/
if (runtime_options.dry_run == true)
{
logger_set_min_level(LOG_INFO);
}
/*
* If -q/--quiet supplied, suppress any non-ERROR log output.
* This overrides everything else; we'll leave it up to the user to deal with the
* consequences of e.g. running --dry-run together with -q/--quiet.
*/
if (runtime_options.quiet == true)
{
logger_set_level(LOG_ERROR);
}
/*
* Node configuration information is not needed for all actions, with
* STANDBY CLONE being the main exception.
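The ordering of the two log-level adjustments in the hunk above matters: --dry-run only raises the level to a floor, while -q/--quiet is applied last and overrides everything. A minimal sketch of that precedence follows. effective_log_level() and the SK_LOG_* constants are hypothetical, and the numeric values are assumed to follow the syslog convention in which a larger number is more verbose.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed syslog-style levels for this sketch: larger means more verbose. */
enum
{
	SK_LOG_ERROR = 3,
	SK_LOG_WARNING = 4,
	SK_LOG_NOTICE = 5,
	SK_LOG_INFO = 6,
	SK_LOG_DEBUG = 7
};

/*
 * Hypothetical helper: derive the level actually used from the configured
 * level and the --dry-run and -q/--quiet flags.
 */
static int
effective_log_level(int configured, bool dry_run, bool quiet)
{
	int			level = configured;

	assert(configured >= SK_LOG_ERROR && configured <= SK_LOG_DEBUG);

	/* --dry-run: make sure INFO output is visible, but never reduce verbosity */
	if (dry_run && level < SK_LOG_INFO)
		level = SK_LOG_INFO;

	/* -q/--quiet: applied last, so it wins even in combination with --dry-run */
	if (quiet)
		level = SK_LOG_ERROR;

	return level;
}
```

With these assumptions, a NOTICE configuration becomes INFO under --dry-run, and ERROR as soon as --quiet is also given.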
@@ -1453,6 +1519,7 @@ check_cli_parameters(const int action)
{
case PRIMARY_UNREGISTER:
case STANDBY_UNREGISTER:
case WITNESS_UNREGISTER:
case CLUSTER_EVENT:
case CLUSTER_MATRIX:
case CLUSTER_CROSSCHECK:
@@ -1493,6 +1560,7 @@ check_cli_parameters(const int action)
case STANDBY_CLONE:
case STANDBY_REGISTER:
case STANDBY_FOLLOW:
case BDR_REGISTER:
break;
default:
item_list_append_format(&cli_warnings,
@@ -1835,7 +1903,7 @@ do_help(void)
printf(_(" %s [OPTIONS] standby {register|unregister|clone|promote|follow|switchover}\n"), progname());
printf(_(" %s [OPTIONS] bdr {register|unregister}\n"), progname());
printf(_(" %s [OPTIONS] node {status|check|rejoin|service}\n"), progname());
printf(_(" %s [OPTIONS] cluster {show|event|matrix|crosscheck}\n"), progname());
printf(_(" %s [OPTIONS] cluster {show|event|matrix|crosscheck|cleanup}\n"), progname());
printf(_(" %s [OPTIONS] witness {register|unregister}\n"), progname());
puts("");
@@ -1884,6 +1952,7 @@ do_help(void)
printf(_(" --dry-run show what would happen for action, but don't execute it\n"));
printf(_(" -L, --log-level set log level (overrides configuration file; default: NOTICE)\n"));
printf(_(" --log-to-file log to file (or logging facility) defined in repmgr.conf\n"));
printf(_(" -q, --quiet suppress all log output apart from errors\n"));
printf(_(" -t, --terse don't display detail, hints and other non-critical output\n"));
printf(_(" -v, --verbose display additional log output (useful for debugging)\n"));
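The option-parsing change in main() above relies on the fact that getopt_long() returns '?' both for a literal "-?" (when '?' is listed in the option string) and for any unrecognised option, so the two cases have to be told apart by inspecting argv. A minimal, self-contained sketch of that technique follows. ParseResult and parse_options() are hypothetical names, and the option set is reduced to -q/--quiet for brevity.

```c
#include <assert.h>
#include <getopt.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical result type for this sketch; not part of repmgr. */
typedef struct
{
	bool		help_requested;
	bool		option_error_found;
	bool		quiet;
} ParseResult;

static ParseResult
parse_options(int argc, char **argv)
{
	static struct option long_options[] = {
		{"quiet", no_argument, NULL, 'q'},
		{NULL, 0, NULL, 0}
	};
	ParseResult result = {false, false, false};
	int			optindex = 0;
	int			c;

	assert(argv != NULL);

	/* restart scanning so the function can be called more than once */
	optind = 1;
	/* let getopt print its own diagnostics for invalid options */
	opterr = 1;

	while ((c = getopt_long(argc, argv, "?q", long_options, &optindex)) != -1)
	{
		switch (c)
		{
			case 'q':
				result.quiet = true;
				break;
			case '?':
				/* actual help option given? */
				if (strcmp(argv[optind - 1], "-?") == 0)
				{
					result.help_requested = true;
					break;
				}
				/* otherwise fall through to default */
			default:
				/* invalid option, or missing option argument */
				result.option_error_found = true;
				break;
		}
	}

	return result;
}
```

Recording the error in a flag rather than exiting immediately lets the caller print a command-specific "Try ... --help" hint once the command itself has been identified, which is what the later hunk does with option_error_found.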


@@ -87,6 +87,7 @@
#define OPT_REMOTE_NODE_ID 1038
#define OPT_RECOVERY_CONF_ONLY 1039
#define OPT_NO_WAIT 1040
#define OPT_MISSING_SLOTS 1041
/* deprecated since 3.3 */
#define OPT_DATA_DIR 999
@@ -125,6 +126,7 @@ static struct option long_options[] =
/* logging options */
{"log-level", required_argument, NULL, 'L'},
{"log-to-file", no_argument, NULL, OPT_LOG_TO_FILE},
{"quiet", no_argument, NULL, 'q'},
{"terse", no_argument, NULL, 't'},
{"verbose", no_argument, NULL, 'v'},
@@ -164,6 +166,7 @@ static struct option long_options[] =
{"replication-lag", no_argument, NULL, OPT_REPLICATION_LAG},
{"role", no_argument, NULL, OPT_ROLE},
{"slots", no_argument, NULL, OPT_SLOTS},
{"missing-slots", no_argument, NULL, OPT_MISSING_SLOTS},
{"has-passfile", no_argument, NULL, OPT_HAS_PASSFILE},
{"replication-connection", no_argument, NULL, OPT_REPL_CONN},


@@ -98,7 +98,7 @@
#log_facility=STDERR # Logging facility: possible values are STDERR, or for
# syslog integration, one of LOCAL0, LOCAL1, ..., LOCAL7, USER
#log_file='' # stderr can be redirected to an arbitrary file:
#log_file='' # STDERR can be redirected to an arbitrary file
#log_status_interval=300 # interval (in seconds) for repmgrd to log a status message
@@ -207,16 +207,40 @@ ssh_options='-q -o ConnectTimeout=10' # Options to append to "ssh"
#------------------------------------------------------------------------------
# Standby follow settings
# "standby follow" settings
#------------------------------------------------------------------------------
# These settings apply when instructing a standby to follow the new primary
# ("repmgr standby follow").
#primary_follow_timeout=60 # The length of time (in seconds) to wait
#primary_follow_timeout=60 # The max length of time (in seconds) to wait
# for the new primary to become available
#standby_follow_timeout=15 # The max length of time (in seconds) to wait
# for the standby to connect to the primary
#------------------------------------------------------------------------------
# "standby switchover" settings
#------------------------------------------------------------------------------
# These settings apply when switching roles between a primary and a standby
# ("repmgr standby switchover").
#standby_reconnect_timeout=60 # The max length of time (in seconds) to wait
# for the demoted standby to reconnect to the promoted
# primary (note: this value should be equal to or greater
# than that set for "node_rejoin_timeout")
#------------------------------------------------------------------------------
# "node rejoin" settings
#------------------------------------------------------------------------------
# These settings apply when reintegrating a node into a replication cluster
# with "repmgr node rejoin"
#node_rejoin_timeout=60 # The maximum length of time (in seconds) to wait for
# the node to reconnect to the replication cluster
#------------------------------------------------------------------------------
# Barman options
#------------------------------------------------------------------------------
@@ -234,6 +258,11 @@ ssh_options='-q -o ConnectTimeout=10' # Options to append to "ssh"
# These settings are only applied when repmgrd is running. Values shown
# are defaults.
#repmgrd_pid_file= # Path of PID file to use for repmgrd; if not set, a PID file will
# be generated in a temporary directory specified by the environment
# variable $TMPDIR, or if not set, in "/tmp". This value can be overridden
# by the command line option "-p/--pid-file"; the command line option
# "--no-pid-file" will force PID file creation to be skipped.
#failover=manual # one of 'automatic', 'manual'.
# determines what action to take in the event of upstream failure
#
@@ -243,7 +272,7 @@ ssh_options='-q -o ConnectTimeout=10' # Options to append to "ssh"
# manual attention to reattach it to replication
# (does not apply to BDR mode)
#priority=100 # indicate a preferred priorty for promoting nodes;
#priority=100 # indicate a preferred priority for promoting nodes;
# a value of zero prevents the node being promoted to primary
# (default: 100)
@@ -251,11 +280,11 @@ ssh_options='-q -o ConnectTimeout=10' # Options to append to "ssh"
# primary (or other upstream node)
#reconnect_interval=10 # Interval between attempts to reconnect to an unreachable
# primary (or other upstream node)
#promote_command= # command to execute when promoting a new primary; use something like:
#promote_command= # command repmgrd executes when promoting a new primary; use something like:
#
# repmgr standby promote -f /etc/repmgr.conf
#
#follow_command= # command to execute when instructing a standby to follow a new primary;
#follow_command= # command repmgrd executes when instructing a standby to follow a new primary;
# use something like:
#
# repmgr standby follow -f /etc/repmgr.conf -W --upstream-node-id=%n
@@ -263,8 +292,9 @@ ssh_options='-q -o ConnectTimeout=10' # Options to append to "ssh"
#primary_notification_timeout=60 # Interval (in seconds) which repmgrd on a standby
# will wait for a notification from the new primary,
# before falling back to degraded monitoring
#standby_reconnect_timeout=60 # Interval (in seconds) which repmgrd on a standby will wait
# to reconnect to the local node after executing "follow_command"
#repmgrd_standby_startup_timeout=60 # Interval (in seconds) which repmgrd on a standby will wait
# for the local node to restart and become ready to accept connections after
# executing "follow_command" (defaults to the value set in "standby_reconnect_timeout")
#monitoring_history=no # Whether to write monitoring data to the "monitoring_history" table
#monitor_interval_secs=2 # Interval (in seconds) at which to write monitoring data
@@ -308,11 +338,11 @@ ssh_options='-q -o ConnectTimeout=10' # Options to append to "ssh"
#service_stop_command = ''
#service_restart_command = ''
#service_reload_command = ''
#service_promote_command = '' # Note: this overrides any value contained in the setting
# "promote_command". This is intended for systems which
# provide a package-level promote command, such as Debian's
# "pg_ctlcluster"
#service_promote_command = '' # This parameter is intended for systems which provide a
# package-level promote command, such as Debian's
# "pg_ctlcluster". *IMPORTANT*: it is *not* a substitute
# for "promote_command"; do not use "repmgr standby promote"
# (or a script which executes "repmgr standby promote") here.
#------------------------------------------------------------------------------
# Status check thresholds


@@ -1,6 +1,6 @@
# repmgr extension
comment = 'Replication manager for PostgreSQL'
default_version = '4.0'
default_version = '4.1'
module_pathname = '$libdir/repmgr'
relocatable = false
schema = repmgr


@@ -49,6 +49,8 @@
#define REPLICATION_TYPE_BDR 2
#define UNKNOWN_SERVER_VERSION_NUM -1
#define UNKNOWN_BDR_VERSION_NUM -1
#define UNKNOWN_TIMELINE_ID -1
#define UNKNOWN_SYSTEM_IDENTIFIER 0
@@ -58,6 +60,8 @@
#define VOTING_TERM_NOT_SET -1
#define BDR2_REPLICATION_SET_NAME "repmgr"
/*
* various default values - ensure repmgr.conf.sample is updated
* if any of these are changed
@@ -70,6 +74,7 @@
#define DEFAULT_ASYNC_QUERY_TIMEOUT 60 /* seconds */
#define DEFAULT_PRIMARY_NOTIFICATION_TIMEOUT 60 /* seconds */
#define DEFAULT_PRIMARY_FOLLOW_TIMEOUT 60 /* seconds */
#define DEFAULT_STANDBY_FOLLOW_TIMEOUT 30 /* seconds */
#define DEFAULT_BDR_RECOVERY_TIMEOUT 30 /* seconds */
#define DEFAULT_ARCHIVE_READY_WARNING 16 /* WAL files */
#define DEFAULT_ARCHIVE_READY_CRITICAL 128 /* WAL files */
@@ -80,6 +85,7 @@
#define DEFAULT_PROMOTE_CHECK_TIMEOUT 60 /* seconds */
#define DEFAULT_PROMOTE_CHECK_INTERVAL 1 /* seconds */
#define DEFAULT_STANDBY_RECONNECT_TIMEOUT 60 /* seconds */
#define DEFAULT_NODE_REJOIN_TIMEOUT 60 /* seconds */
#ifndef RECOVERY_COMMAND_FILE
#define RECOVERY_COMMAND_FILE "recovery.conf"


@@ -1,3 +1,2 @@
#define REPMGR_VERSION_DATE ""
#define REPMGR_VERSION "4.0.5"
#define REPMGR_VERSION "4.1.0"


@@ -58,7 +58,7 @@ static FailoverState failover_state = FAILOVER_STATE_UNKNOWN;
static int primary_node_id = UNKNOWN_NODE_ID;
static t_node_info upstream_node_info = T_NODE_INFO_INITIALIZER;
static NodeInfoList standby_nodes = T_NODE_INFO_LIST_INITIALIZER;
static NodeInfoList sibling_nodes = T_NODE_INFO_LIST_INITIALIZER;
static ElectionResult do_election(void);
@@ -162,8 +162,8 @@ do_physical_node_check(void)
if (config_file_options.failover == FAILOVER_AUTOMATIC)
{
/*
* check that promote/follow commands are defined, otherwise repmgrd
* won't be able to perform any useful action
* Check that "promote_command" and "follow_command" are defined, otherwise repmgrd
* won't be able to perform any useful action in a failover situation.
*/
bool required_param_missing = false;
@@ -175,14 +175,24 @@ do_physical_node_check(void)
if (config_file_options.service_promote_command[0] != '\0')
{
/*
* if repmgrd executes "service_promote_command" directly,
* repmgr metadata won't get updated
* "service_promote_command" is *not* a substitute for "promote_command";
* it is intended for use in those systems (e.g. Debian) where there's a service
* level promote command (e.g. pg_ctlcluster).
*
* "promote_command" should either execute "repmgr standby promote" directly, or
* a script which executes "repmgr standby promote". This is essential, as the
* repmgr metadata is updated by "repmgr standby promote".
*
* "service_promote_command", if set, will be executed by "repmgr standby promote",
* but never by repmgrd.
*
*/
log_hint(_("\"service_promote_command\" is set, but can only be executed by \"repmgr standby promote\""));
}
required_param_missing = true;
}
if (config_file_options.follow_command[0] == '\0')
{
log_error(_("\"follow_command\" must be defined in the configuration file"));
@@ -816,6 +826,29 @@ monitor_streaming_standby(void)
{
int degraded_monitoring_elapsed = calculate_elapsed(degraded_monitoring_start);
if (config_file_options.degraded_monitoring_timeout > 0
&& degraded_monitoring_elapsed > config_file_options.degraded_monitoring_timeout)
{
initPQExpBuffer(&event_details);
appendPQExpBuffer(&event_details,
_("degraded monitoring timeout (%i seconds) exceeded, terminating"),
degraded_monitoring_elapsed);
log_notice("%s", event_details.data);
create_event_notification(NULL,
&config_file_options,
config_file_options.node_id,
"repmgrd_shutdown",
true,
event_details.data);
termPQExpBuffer(&event_details);
terminate(ERR_MONITORING_TIMEOUT);
}
log_debug("monitoring node %i in degraded state for %i seconds",
upstream_node_info.node_id,
degraded_monitoring_elapsed);
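The termination condition added to monitor_streaming_standby() above has two parts: the timeout must be enabled (greater than zero), and the elapsed time must strictly exceed it. A minimal sketch of that predicate follows; degraded_timeout_exceeded() is a hypothetical helper introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: decide whether degraded monitoring has run for too
 * long. A timeout of zero or less never triggers, matching the "> 0" guard
 * in the hunk above.
 */
static bool
degraded_timeout_exceeded(int timeout_secs, int elapsed_secs)
{
	assert(elapsed_secs >= 0);

	return timeout_secs > 0 && elapsed_secs > timeout_secs;
}
```

Because the comparison is strict, a node that has been degraded for exactly the configured number of seconds keeps monitoring for one more iteration.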
@@ -918,8 +951,8 @@ monitor_streaming_standby(void)
get_active_sibling_node_records(local_conn,
local_node_info.node_id,
former_upstream_node_id,
&standby_nodes);
notify_followers(&standby_nodes, local_node_info.node_id);
&sibling_nodes);
notify_followers(&sibling_nodes, local_node_info.node_id);
/* this will restart monitoring in primary mode */
monitoring_state = MS_NORMAL;
@@ -958,12 +991,12 @@ monitor_streaming_standby(void)
get_active_sibling_node_records(local_conn,
local_node_info.node_id,
local_node_info.upstream_node_id,
&standby_nodes);
&sibling_nodes);
if (standby_nodes.node_count > 0)
if (sibling_nodes.node_count > 0)
{
log_debug("scanning %i node records to detect new primary...", standby_nodes.node_count);
for (cell = standby_nodes.head; cell; cell = cell->next)
log_debug("scanning %i node records to detect new primary...", sibling_nodes.node_count);
for (cell = sibling_nodes.head; cell; cell = cell->next)
{
/* skip local node check, we did that above */
if (cell->node_info->node_id == local_node_info.node_id)
@@ -993,7 +1026,7 @@ monitor_streaming_standby(void)
follow_new_primary(follow_node_id);
}
}
clear_node_info_list(&standby_nodes);
clear_node_info_list(&sibling_nodes);
}
}
}
@@ -1395,12 +1428,12 @@ monitor_streaming_witness(void)
get_active_sibling_node_records(local_conn,
local_node_info.node_id,
local_node_info.upstream_node_id,
&standby_nodes);
&sibling_nodes);
if (standby_nodes.node_count > 0)
if (sibling_nodes.node_count > 0)
{
log_debug("scanning %i node records to detect new primary...", standby_nodes.node_count);
for (cell = standby_nodes.head; cell; cell = cell->next)
log_debug("scanning %i node records to detect new primary...", sibling_nodes.node_count);
for (cell = sibling_nodes.head; cell; cell = cell->next)
{
/* skip local node check, we did that above */
if (cell->node_info->node_id == local_node_info.node_id)
@@ -1430,7 +1463,7 @@ monitor_streaming_witness(void)
witness_follow_new_primary(follow_node_id);
}
}
clear_node_info_list(&standby_nodes);
clear_node_info_list(&sibling_nodes);
}
}
loop:
@@ -1516,8 +1549,15 @@ loop:
static bool
do_primary_failover(void)
{
ElectionResult election_result;
/*
* Double-check status of the local connection
*/
check_connection(&local_node_info, &local_conn);
/* attempt to initiate voting process */
ElectionResult election_result = do_election();
election_result = do_election();
/* TODO add pre-event notification here */
failover_state = FAILOVER_STATE_UNKNOWN;
@@ -1531,7 +1571,7 @@ do_primary_failover(void)
}
else if (election_result == ELECTION_WON)
{
if (standby_nodes.node_count > 0)
if (sibling_nodes.node_count > 0)
{
log_notice("this node is the winner, will now promote itself and inform other nodes");
}
@@ -1576,7 +1616,7 @@ do_primary_failover(void)
get_active_sibling_node_records(local_conn,
local_node_info.node_id,
upstream_node_info.node_id,
&standby_nodes);
&sibling_nodes);
}
else if (config_file_options.failover == FAILOVER_MANUAL)
@@ -1638,10 +1678,10 @@ do_primary_failover(void)
{
case FAILOVER_STATE_PROMOTED:
/* notify former siblings that they should now follow this node */
notify_followers(&standby_nodes, local_node_info.node_id);
notify_followers(&sibling_nodes, local_node_info.node_id);
/* we no longer care about our former siblings */
clear_node_info_list(&standby_nodes);
clear_node_info_list(&sibling_nodes);
/* pass control back down to start_monitoring() */
log_info(_("switching to primary monitoring mode"));
@@ -1655,10 +1695,10 @@ do_primary_failover(void)
* notify siblings that they should resume following the original
* primary
*/
notify_followers(&standby_nodes, upstream_node_info.node_id);
notify_followers(&sibling_nodes, upstream_node_info.node_id);
/* we no longer care about our former siblings */
clear_node_info_list(&standby_nodes);
clear_node_info_list(&sibling_nodes);
/* pass control back down to start_monitoring() */
log_info(_("resuming standby monitoring mode"));
@@ -1918,7 +1958,7 @@ do_upstream_standby_failover(void)
* completes, so poll for a while until we get a connection.
*/
for (i = 0; i < config_file_options.standby_reconnect_timeout; i++)
for (i = 0; i < config_file_options.repmgrd_standby_startup_timeout; i++)
{
local_conn = establish_db_connection(local_node_info.conninfo, false);
@@ -1927,7 +1967,7 @@ do_upstream_standby_failover(void)
log_debug("sleeping 1 second; %i of %i attempts to reconnect to local node",
i + 1,
config_file_options.standby_reconnect_timeout);
config_file_options.repmgrd_standby_startup_timeout);
sleep(1);
}
@@ -2033,10 +2073,10 @@ promote_self(void)
return FAILOVER_STATE_PROMOTION_FAILED;
}
/* the presence of either of this command has been established already */
/* the presence of this command has been established already */
promote_command = config_file_options.promote_command;
log_debug("promote command is:\n \"%s\"",
log_info(_("promote_command is:\n \"%s\""),
promote_command);
if (log_type == REPMGR_STDERR && *config_file_options.log_file)
@@ -2368,7 +2408,7 @@ follow_new_primary(int new_primary_id)
* completes, so poll for a while until we get a connection.
*/
for (i = 0; i < config_file_options.standby_reconnect_timeout; i++)
for (i = 0; i < config_file_options.repmgrd_standby_startup_timeout; i++)
{
local_conn = establish_db_connection(local_node_info.conninfo, false);
@@ -2377,7 +2417,7 @@ follow_new_primary(int new_primary_id)
log_debug("sleeping 1 second; %i of %i attempts to reconnect to local node",
i + 1,
config_file_options.standby_reconnect_timeout);
config_file_options.repmgrd_standby_startup_timeout);
sleep(1);
}
@@ -2543,6 +2583,7 @@ do_election(void)
/* we're visible */
int visible_nodes = 1;
int total_nodes = 0;
NodeInfoListCell *cell = NULL;
@@ -2593,14 +2634,16 @@ do_election(void)
get_active_sibling_node_records(local_conn,
local_node_info.node_id,
upstream_node_info.node_id,
&standby_nodes);
&sibling_nodes);
total_nodes = sibling_nodes.node_count + 1;
log_debug("do_election(): primary location is %s", upstream_node_info.location);
local_node_info.last_wal_receive_lsn = InvalidXLogRecPtr;
/* fast path if no other standbys (or witness) exists - normally win by default */
if (standby_nodes.node_count == 0)
if (sibling_nodes.node_count == 0)
{
if (strncmp(upstream_node_info.location, local_node_info.location, MAXLEN) == 0)
{
@@ -2628,7 +2671,7 @@ do_election(void)
}
else
{
/* standby nodes found - check if we're in the primary location befor checking theirs */
/* standby nodes found - check if we're in the primary location before checking theirs */
if (strncmp(upstream_node_info.location, local_node_info.location, MAXLEN) == 0)
{
primary_location_seen = true;
@@ -2643,7 +2686,7 @@ do_election(void)
/* pointer to "winning" node, initially self */
candidate_node = &local_node_info;
for (cell = standby_nodes.head; cell; cell = cell->next)
for (cell = sibling_nodes.head; cell; cell = cell->next)
{
/* assume the worst case */
cell->node_info->node_status = NODE_STATUS_UNKNOWN;
@@ -2698,7 +2741,7 @@ do_election(void)
candidate_node = cell->node_info;
}
/* LSN is same - tiebreak on priority, then node_id */
else if(cell->node_info->last_wal_receive_lsn == candidate_node->last_wal_receive_lsn)
else if (cell->node_info->last_wal_receive_lsn == candidate_node->last_wal_receive_lsn)
{
log_verbose(LOG_DEBUG, "node %i has same LSN as current candidate %i",
cell->node_info->node_id,
@@ -2750,9 +2793,9 @@ do_election(void)
log_debug("visible nodes: %i; total nodes: %i",
visible_nodes,
standby_nodes.node_count);
total_nodes);
if (visible_nodes <= (standby_nodes.node_count / 2.0))
if (visible_nodes <= (total_nodes / 2.0))
{
log_notice(_("unable to reach a qualified majority of nodes"));
log_detail(_("node will enter degraded monitoring state waiting for reconnect"));
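A minimal sketch of the majority test this hunk changes, assuming the local node is counted once in the total (the `sibling_nodes.node_count + 1` introduced above). `has_majority` is a hypothetical helper written for illustration; repmgrd performs this comparison inline in do_election().

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not part of repmgr.
 *
 * visible_nodes: nodes this standby can reach, including itself
 * sibling_count: number of sibling node records, excluding the local node
 */
static bool
has_majority(int visible_nodes, int sibling_count)
{
	/* count the local node as well, mirroring "total_nodes" in the diff */
	int			total_nodes = sibling_count + 1;

	/* the diff logs a notice when visible_nodes <= total_nodes / 2.0 */
	return !(visible_nodes <= (total_nodes / 2.0));
}
```

With one sibling and only the local node visible, `has_majority(1, 1)` is false, because 1 of 2 nodes is not a strict majority. Under the earlier comparison against the sibling count alone (1 <= 0.5), the same situation would have passed the check.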

repmgrd.c

@@ -35,8 +35,10 @@
static char *config_file = NULL;
static bool verbose = false;
static char *pid_file = NULL;
static bool daemonize = false;
static char pid_file[MAXPGPATH];
static bool daemonize = true;
static bool show_pid_file = false;
static bool no_pid_file = false;
t_configuration_options config_file_options = T_CONFIGURATION_OPTIONS_INITIALIZER;
@@ -99,8 +101,10 @@ main(int argc, char **argv)
{"config-file", required_argument, NULL, 'f'},
/* daemon options */
{"daemonize", no_argument, NULL, 'd'},
{"daemonize", optional_argument, NULL, 'd'},
{"pid-file", required_argument, NULL, 'p'},
{"show-pid-file", no_argument, NULL, 's'},
{"no-pid-file", no_argument, NULL, OPT_NO_PID_FILE},
/* logging options */
{"log-level", required_argument, NULL, 'L'},
@@ -113,8 +117,6 @@ main(int argc, char **argv)
set_progname(argv[0]);
srand(time(NULL));
/* Disallow running as root */
if (geteuid() == 0)
{
@@ -128,6 +130,10 @@ main(int argc, char **argv)
exit(1);
}
srand(time(NULL));
memset(pid_file, 0, MAXPGPATH);
while ((c = getopt_long(argc, argv, "?Vf:L:vdp:m", long_options, &optindex)) != -1)
{
switch (c)
@@ -169,11 +175,22 @@ main(int argc, char **argv)
/* daemon options */
case 'd':
daemonize = true;
if (optarg != NULL)
{
daemonize = parse_bool(optarg, "-d/--daemonize", &cli_errors);
}
break;
case 'p':
pid_file = optarg;
strncpy(pid_file, optarg, MAXPGPATH);
break;
case 's':
show_pid_file = true;
break;
case OPT_NO_PID_FILE:
no_pid_file = true;
break;
/* logging options */
@@ -220,7 +237,7 @@ main(int argc, char **argv)
/* Exit here already if errors in command line options found */
if (cli_errors.head != NULL)
{
exit_with_cli_errors(&cli_errors);
exit_with_cli_errors(&cli_errors, NULL);
}
startup_event_logged = false;
@@ -239,6 +256,58 @@ main(int argc, char **argv)
*/
load_config(config_file, verbose, false, &config_file_options, argv[0]);
/* Determine pid file location, unless --no-pid-file supplied */
if (no_pid_file == false)
{
if (config_file_options.repmgrd_pid_file[0] != '\0')
{
if (pid_file[0] != '\0')
{
log_warning(_("\"repmgrd_pid_file\" will be overridden by --pid-file"));
}
else
{
strncpy(pid_file, config_file_options.repmgrd_pid_file, MAXPGPATH);
}
}
/* no pid file provided - determine location */
if (pid_file[0] == '\0')
{
/* packagers: if feasible, patch PID file path into "package_pid_file" */
char package_pid_file[MAXPGPATH] = "";
if (package_pid_file[0] != '\0')
{
maxpath_snprintf(pid_file, "%s", package_pid_file);
}
else
{
const char *tmpdir = getenv("TMPDIR");
if (!tmpdir)
tmpdir = "/tmp";
maxpath_snprintf(pid_file, "%s/repmgrd.pid", tmpdir);
}
}
}
else
{
/* --no-pid-file supplied - overwrite any value provided with --pid-file ... */
memset(pid_file, 0, MAXPGPATH);
}
/* If --show-pid-file supplied, output the location (if set) and exit */
if (show_pid_file == true)
{
printf("%s\n", pid_file);
exit(SUCCESS);
}
/* Some configuration file items can be overriden by command line options */
@@ -414,7 +483,7 @@ main(int argc, char **argv)
daemonize_process();
}
if (pid_file != NULL)
if (pid_file[0] != '\0')
{
check_and_create_pid_file(pid_file);
}
@@ -669,6 +738,8 @@ show_help(void)
{
printf(_("%s: replication management daemon for PostgreSQL\n"), progname());
puts("");
printf(_("%s monitors a cluster of servers and optionally performs failover.\n"), progname());
puts("");
printf(_("Usage:\n"));
printf(_(" %s [OPTIONS]\n"), progname());
@@ -688,12 +759,14 @@ show_help(void)
puts("");
printf(_("General configuration options:\n"));
printf(_(" -d, --daemonize detach process from foreground\n"));
printf(_(" -p, --pid-file=PATH write a PID file\n"));
printf(_("Daemon configuration options:\n"));
printf(_(" -d, --daemonize[=true/false]\n"));
printf(_(" detach process from foreground (default: true)\n"));
printf(_(" -p, --pid-file=PATH use the specified PID file\n"));
printf(_(" -s, --show-pid-file show PID file which would be used by the current configuration\n"));
printf(_(" --no-pid-file don't write a PID file\n"));
puts("");
printf(_("%s monitors a cluster of servers and optionally performs failover.\n"), progname());
}
@@ -802,7 +875,7 @@ terminate(int retval)
{
logger_shutdown();
if (pid_file)
if (pid_file[0] != '\0')
{
unlink(pid_file);
}
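The PID file precedence added to main() in this diff can be sketched as a standalone function. This is an illustration under stated assumptions, not the code from the commit: `resolve_pid_file` and its parameters are hypothetical, the "package_pid_file" packaging hook is omitted, and the temporary directory is passed in by the caller rather than read from TMPDIR.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of repmgr.
 *
 * Precedence, highest first:
 *   1. --no-pid-file: no PID file at all (empty string)
 *   2. --pid-file value from the command line
 *   3. "repmgrd_pid_file" from the configuration file
 *   4. "<tmpdir>/repmgrd.pid", with "/tmp" used when tmpdir is NULL
 */
static void
resolve_pid_file(char *dest, size_t destlen,
				 const char *cli_pid_file,
				 const char *config_pid_file,
				 bool no_pid_file,
				 const char *tmpdir)
{
	if (destlen == 0)
		return;

	if (no_pid_file)
	{
		/* an empty string means "do not write a PID file" */
		dest[0] = '\0';
		return;
	}

	if (cli_pid_file != NULL && cli_pid_file[0] != '\0')
		snprintf(dest, destlen, "%s", cli_pid_file);
	else if (config_pid_file != NULL && config_pid_file[0] != '\0')
		snprintf(dest, destlen, "%s", config_pid_file);
	else
		snprintf(dest, destlen, "%s/repmgrd.pid",
				 tmpdir != NULL ? tmpdir : "/tmp");
}
```

Using snprintf() here guarantees a NUL-terminated result even when the source path is longer than the buffer, which a bare strncpy() does not.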


@@ -10,6 +10,8 @@
#include <time.h>
#include "portability/instr_time.h"
#define OPT_NO_PID_FILE 1000
extern volatile sig_atomic_t got_SIGHUP;
extern MonitoringState monitoring_state;
extern instr_time degraded_monitoring_start;
@@ -26,4 +28,6 @@ const char *print_monitoring_state(MonitoringState monitoring_state);
void update_registration(PGconn *conn);
void terminate(int retval);
#endif /* _REPMGRD_H_ */