Compare commits

237 Commits

Author SHA1 Message Date
Ian Barwick
ae141b9d32 4.4beta2 2019-06-10 15:18:41 +09:00
Ian Barwick
d035550723 doc: add missing space 2019-06-10 09:02:51 +09:00
Ian Barwick
c7692b5d84 doc: improve repmgr.conf settings documentation 2019-06-07 12:50:05 +09:00
Ian Barwick
08b7f1294b doc: improve configuration documentation 2019-06-07 12:17:49 +09:00
Ian Barwick
81d01bf0e8 Canonicalize the data directory path when parsing the configuration file
This ensures the provided path matches the path PostgreSQL reports as its
data directory.
2019-06-07 09:53:44 +09:00
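As a rough illustration of the canonicalization commit above (81d01bf0e8), the following Python sketch shows why a naive string comparison can fail when the configured path has a trailing slash or redundant separators. It is not repmgr's C implementation; `canonicalize` and `same_data_directory` are hypothetical helpers and the paths are made-up examples.

```python
import posixpath


def canonicalize(path: str) -> str:
    """Lexically tidy a POSIX path: collapse repeated separators, drop a
    trailing slash and resolve "." and ".." components."""
    return posixpath.normpath(path)


def same_data_directory(configured: str, reported: str) -> bool:
    """Compare the configured path with the path the server reports."""
    return canonicalize(configured) == canonicalize(reported)


# Hypothetical values: the configured path has a trailing slash,
# the path reported by the server does not.
configured_path = "/var/lib/pgsql/11/data/"
reported_path = "/var/lib/pgsql/11/data"
paths_match = same_data_directory(configured_path, reported_path)
```

Normalizing once, at parse time, means later comparisons need no special-casing.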
Ian Barwick
089c778e49 Fix extension version number query 2019-06-06 12:46:30 +09:00
Ian Barwick
b4b5681762 standby follow: remove some ineffective code
For some reason we were taking the trouble to extract an application_name
from the local node's conninfo, but this was being subsequently overwritten
with the node name (which is what we want anyway).
2019-06-06 12:15:23 +09:00
Ian Barwick
e5ef549aa7 doc: update release notes 2019-06-06 11:30:43 +09:00
Ian Barwick
cfc41392c3 Ensure parsed value of --upstream-conninfo is written to recovery.conf
Previously it was being parsed (a step which ensures any "application_name"
set by the caller is changed to the node name), but the original string
was being copied to "primary_conninfo" anyway.
2019-06-06 11:30:40 +09:00
Ian Barwick
55dc4f7a5f Remove redundant comment in .sql files 2019-06-04 13:46:30 +09:00
Ian Barwick
6616712346 4.4beta1 2019-06-04 13:22:56 +09:00
Ian Barwick
d893ce227b repmgrd: optionally exclude/include witness server from child node checks 2019-06-03 16:04:54 +09:00
Ian Barwick
e8731f8159 doc: update child node monitoring documentation 2019-06-03 16:04:51 +09:00
Ian Barwick
20d710e34c doc: update filename referenced in code comment 2019-06-03 15:30:02 +09:00
Ian Barwick
7e8710b1e9 doc: remove redundant entity definitions 2019-06-03 15:29:08 +09:00
Ian Barwick
19e8387d8f doc: remove mistakenly committed .sgml file 2019-05-30 19:58:06 +09:00
Ian Barwick
b5ff2ec120 repmgrd: update log text 2019-05-30 16:08:04 +09:00
Ian Barwick
0a4072b8f7 witness (un)register: add event details
Also create an actual event notification for both actions, rather
than just creating the event record.

This is presumably an oversight from the original conversion to
repmgr4 which no-one has noticed before.
2019-05-30 14:41:10 +09:00
Ian Barwick
d4df0055c9 repmgr: use --compact (not --terse) in "cluster events" to hide details column
This is consistent with usage elsewhere.

"--terse" is intended to reduce logging noise.
2019-05-30 14:19:37 +09:00
Ian Barwick
06a83247c9 repmgrd: note node type when logging child node dis/re-connections 2019-05-30 14:06:54 +09:00
Ian Barwick
a6ea1d0fda repmgrd: fix witness node disconnection monitoring 2019-05-30 11:51:50 +09:00
Ian Barwick
9a0994856a doc: note witness node behaviour in child node monitoring 2019-05-30 11:50:31 +09:00
Ian Barwick
45e17223b9 Update variable/field names relating to pg_basebackup's -X option
Now the "xlog nomenclature" Pg versions are fading into the past,
rename things related to handling pg_basebackup's -X option
(was: --xlog-method, now: --wal-method) to start with "wal_"
rather than "xlog_".

This is a cosmetic change for code clarity.
2019-05-30 09:32:06 +09:00
Ian Barwick
9085ca46a8 doc: update release notes 2019-05-28 15:38:19 +09:00
Ian Barwick
9114299223 Tweak log output when attempting to register witness on primary cluster 2019-05-28 14:58:32 +09:00
John Naylor
519df66197 Disallow witness on primary cluster 2019-05-28 14:40:15 +09:00
Ian Barwick
d1e708454f Fix fwrite() result check 2019-05-28 14:37:36 +09:00
Ian Barwick
d54f0d66fb Free palloc'd StringInfoData data 2019-05-28 13:04:44 +09:00
Ian Barwick
c153e2fc02 standby clone: improve --dry-run output
Log positive check results as an additional confirmation that the
upstream configuration appears to be correct.
2019-05-28 00:54:39 +09:00
Ian Barwick
44a39760a1 standby clone: improve source node replication connection check
Previously, the check was attempting to make replication connections
to the source node, and if these were failing, inferring that
insufficient walsenders were available.

However it's quite likely that the connections are refused due to
insufficient user connection permissions. So before performing
the connection check, query the number of potentially available
walsenders on the source node and compare it with the number
required (either 1 or 2) - if insufficient, exit with error and
hint about increasing "max_wal_senders".

Once we've established sufficient walsenders are available, inability
to connect is most likely related to permissions issues on the source
node.
2019-05-28 00:11:53 +09:00
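The ordering described in commit 44a39760a1 above (count available walsenders first, attempt connections second) can be sketched roughly as follows. This is an illustrative Python model, not repmgr's code; `check_walsenders` and its parameters are hypothetical names, and the caller is assumed to have already queried the source node for the configured and in-use walsender counts.

```python
from typing import Tuple


def check_walsenders(max_wal_senders: int, in_use: int, required: int) -> Tuple[bool, str]:
    """Return (ok, hint). `required` is 1 or 2 per the commit message."""
    available = max_wal_senders - in_use
    if available < required:
        hint = (
            f"{available} walsender(s) available but {required} required; "
            'consider increasing "max_wal_senders"'
        )
        return False, hint
    # Enough walsenders exist, so a later connection failure most likely
    # points at permissions on the source node rather than capacity.
    return True, ""
```

Separating the capacity check from the connection attempt lets the error message distinguish the two failure causes.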
Ian Barwick
b959f771c1 Improve naming/usage of node record variables in "standby clone"
Make it clearer we're dealing with the upstream node record.

Also avoid "overloading" the upstream record when checking for an
existing record with the same node name; this was not technically
a problem but mildly confusing when reading the code.
2019-05-27 23:39:49 +09:00
Ian Barwick
c560dfbbce cluster show: display timeline ID
This helps provide a better picture of the state of the cluster, i.e.
making it more obvious whether there's been a timeline divergence.

This also provides infrastructure for further improvements in cluster
status display and diagnosis.

Note this is only available in PostgreSQL 9.6 and later as it relies
on the SQL functions for interrogating pg_control, which can be executed
remotely. As PostgreSQL 9.5 will shortly be the only community-supported
version without these functions, it's not worth the effort of trying
to duplicate their functionality.
2019-05-27 09:39:19 +09:00
Ian Barwick
df6d160d2e Reformat REPMGR_NODE_COLUMNS macros for readability 2019-05-24 16:39:02 +09:00
Ian Barwick
14b805d650 Makefile: improve documentation targets
- add documentation targets to main Makefile
- ensure clean/maintainer-clean remove all generated documentation files
2019-05-24 14:15:54 +09:00
Ian Barwick
1d46261c24 doc: update appendix "Installing old package versions"
Move legacy 3.x package info to separate section.
2019-05-24 10:03:38 +09:00
Ian Barwick
8ead0042ad Miscellaneous comment and logging cleanup 2019-05-23 09:31:46 +09:00
Ian Barwick
2bce1b371c doc: fold putative 4.3.1 release notes into 4.4 2019-05-23 09:03:18 +09:00
Ian Barwick
3c8bab97d8 Fix variable declarations 2019-05-22 17:26:34 +09:00
Ian Barwick
c9e85996f5 repmgr: prevent a standby being cloned from a witness server
Previously repmgr would happily clone from whatever server
it found at the provided source server address. We should
ensure that a standby can only be cloned from a node which
is part of the main replication cluster.

This check fetches a list of nodes from the source server,
connects to the first non-witness server it finds, and
compares the system identifiers of the source node and the
node it has connected to. If there is a mismatch, then the
source server is clearly not part of the main replication
cluster, and is most likely the witness server.
2019-05-22 16:52:25 +09:00
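The system identifier comparison described in commit c9e85996f5 above might look roughly like this in Python. It is a sketch under assumptions: the node records are plain dictionaries, `fetch_sysid` is a hypothetical callable standing in for "connect to the node and read its system identifier", and returning `None` when no non-witness node is listed is a choice made here, not something the commit message specifies.

```python
from typing import Callable, Iterable, Optional


def source_in_main_cluster(
    source_sysid: int,
    nodes: Iterable[dict],
    fetch_sysid: Callable[[dict], int],
) -> Optional[bool]:
    """Compare the source node's system identifier with that of the first
    non-witness node in the list fetched from the source server."""
    for node in nodes:
        if node["type"] != "witness":
            # A mismatch means the source is not part of the main
            # replication cluster (most likely it is the witness).
            return fetch_sysid(node) == source_sysid
    return None  # assumption: no non-witness node, so no verdict
```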
Ian Barwick
fa66e72c2f repmgrd: count witness server as child node for connection monitoring purposes
As the witness server does not, by definition, ever have an entry in pg_stat_replication,
we need to check its "attached" status by connecting to the witness server itself
and querying the reported upstream node ID (which should be set by the witness
server repmgrd). If this matches the current primary node ID, we count it as attached.
2019-05-21 15:19:41 +09:00
Ian Barwick
e6195edbca cluster show: warn if unable to connect to witness's upstream
Fix also applies to "daemon status".
2019-05-21 12:35:49 +09:00
Ian Barwick
2326c384c0 cluster show: fix upstream check for witnesses
Fix also applies to "daemon status"
2019-05-21 12:28:32 +09:00
Ian Barwick
074769a090 doc: remove copypasta error 2019-05-20 15:40:03 +09:00
Ian Barwick
10425d6967 doc: rename file endings from .sgml to .xml
As they are now XML files. In PostgreSQL itself they remain with
the .sgml suffix for backwards compatibility, but that's not
important for us.
2019-05-20 15:38:40 +09:00
Ian Barwick
cbaa890a22 doc: document "primary_visibility_consensus" 2019-05-17 14:55:51 +09:00
Ian Barwick
24e1108dba doc: fix incorrect case 2019-05-17 11:15:24 +09:00
Ian Barwick
f03e012c99 cluster show/daemon status: report if node not attached to advertised upstream 2019-05-14 16:15:03 +09:00
Ian Barwick
dd78a16006 Change return type of is_downstream_node_attached() from bool to NodeAttached
This enables us to better determine whether a node is definitively
attached, definitively not attached, or if it was not possible to
determine the attached state.
2019-05-14 15:57:20 +09:00
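The motivation for commit dd78a16006 above, replacing a boolean with a three-state result, can be illustrated with a small Python sketch. The enum member names and the `classify_attachment` helper are hypothetical and do not reproduce the C definitions; the helper assumes the caller passes the list of application names seen in pg_stat_replication, or `None` if that query could not be run.

```python
from enum import Enum
from typing import List, Optional


class NodeAttached(Enum):
    UNKNOWN = "unknown"      # state could not be determined
    DETACHED = "detached"    # definitively not attached
    ATTACHED = "attached"    # definitively attached


def classify_attachment(
    application_names: Optional[List[str]], node_name: str
) -> NodeAttached:
    """Map a query result (or its absence) onto the three states."""
    if application_names is None:
        return NodeAttached.UNKNOWN
    if node_name in application_names:
        return NodeAttached.ATTACHED
    return NodeAttached.DETACHED
```

With a plain bool, the "query failed" case would have to be folded into either true or false, which hides the uncertainty from the caller.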
Ian Barwick
7599afce8b doc: mention minimum PostgreSQL version for building repmgr docs
As-is, it won't build against PostgreSQL 9.4 or earlier, but as 9.4
will be removed from community support later this year, it's not
so critical.
2019-05-14 14:26:03 +09:00
Ian Barwick
8587539adb Fix command line sanity check 2019-05-14 13:27:00 +09:00
Ian Barwick
fca033fb9d cluster show/daemon status: report upstream node mismatches
When showing node information, check if the node's copy of its
record shows a different upstream to the one expected according
to the node where the command is executed.

This helps visualise situations where the cluster is in an
unexpected state, and provide a better idea of the actual state.

For example, if a cluster has divided somehow and a set of nodes are
following a new primary, when running "cluster show" etc., repmgr
will now show the name of the primary those nodes are actually
following, rather than the now outdated node name recorded
on the other side of the split. A warning will also be issued
about the situation.
2019-05-14 13:11:31 +09:00
Ian Barwick
ae44012383 Minor code fixes to "cluster show"/"daemon status" formatting 2019-05-14 11:36:59 +09:00
Ian Barwick
b938f10206 repmgr client: mark some options as deprecated 2019-05-13 15:45:34 +09:00
Ian Barwick
0af732e88f doc: tweaks for PDF generation 2019-05-13 15:42:51 +09:00
Ian Barwick
1d36e34dfd doc: use "--wal-method" as the standard option
and note it's "--xlog-method" for 9.6 and earlier. This matches
practice elsewhere in the documentation.
2019-05-13 09:31:53 +09:00
Ian Barwick
d8e4c54ea4 "standby switchover": add "--repmgrd-force-unpause"
Implements GitHub #559.
2019-05-10 16:04:07 +09:00
Ian Barwick
d43b40c5c6 doc: enable creation of PDF files 2019-05-10 10:50:49 +09:00
Ian Barwick
ecf4bdb431 doc: fix typos in source install instructions
s/llib/lib/g
2019-05-10 10:28:52 +09:00
Ian Barwick
9d7a3e24af doc: tweak Makefile 2019-05-10 10:25:07 +09:00
Ian Barwick
6684822274 doc: update documentation build instructions
Also add an item in the release notes
2019-05-10 10:10:14 +09:00
Ian Barwick
edf3aa6687 doc: restore original stylesheet for now 2019-05-09 16:24:41 +09:00
Ian Barwick
255623004c doc: update link to PostgreSQL documentation 2019-05-09 16:24:37 +09:00
Ian Barwick
04a6bf86f2 doc: update documentation build instructions 2019-05-09 16:24:30 +09:00
Ian Barwick
3804c95019 doc: (re)add single page HTML generation 2019-05-09 16:24:26 +09:00
Ian Barwick
409eb47e2a doc: convert documentation to DocBook XML
This brings the repmgr documentation build system in line with that
used by the main PostgreSQL project, and removes the restriction
that documentation must be built against PostgreSQL 9.6 or earlier.

Main formatting changes are:

 - convert empty-element tags (mainly <xref/>)
 - put <indexterm> sections in the correct location
 - correct usage of various entities.
2019-05-09 16:24:21 +09:00
Ian Barwick
1a6f7e979d doc: update release notes 2019-05-07 17:23:03 +09:00
Ian Barwick
6f8fa45604 doc: update release notes 2019-05-07 15:49:54 +09:00
Ian Barwick
5e03627e6c doc: update release notes 2019-05-07 15:29:56 +09:00
Ian Barwick
1c13e57c8b doc: update release notes 2019-05-07 15:27:13 +09:00
Ian Barwick
02245a0014 repmgrd: add missing PQfinish() calls 2019-05-02 18:50:21 +09:00
Ian Barwick
4b37562444 Make it clearer that a witness node counts as a "sibling node"
It's not attached to the primary per se, but needs to know what
the current primary is in order to correctly synchronise its
copy of the metadata.

Per GitHub #560.
2019-05-02 14:22:53 +09:00
Ian Barwick
8da355eb3f doc: update release notes 2019-05-02 14:00:07 +09:00
Ian Barwick
b8fa71257a doc: update "repmgr standby promote" documentation
Document new "--siblings-follow" option.
2019-05-02 12:06:08 +09:00
Ian Barwick
fed09ecaae standby promote: have former siblings follow new primary 2019-05-02 12:04:49 +09:00
Ian Barwick
98d09f83b5 standby (promote|switchover): improve --dry-run functionality
Continue checks as far as possible.
2019-05-02 12:04:43 +09:00
Ian Barwick
7bbe938e19 Separate promotion candidate walsender/slot checks into discrete functions
For use by "standby promote" as well as "standby follow"
2019-05-02 12:04:40 +09:00
Ian Barwick
63c7f758c3 Remove unneeded server version number variables
No need to pass these around.
2019-05-02 12:04:33 +09:00
Ian Barwick
b9f07f6a91 standby promote: use variable name "local_conn" for the local connection handle
This is consistent with usage in other functions, and makes it easier to
differentiate between the local node connection and the primary connection.
2019-05-02 12:04:26 +09:00
Ian Barwick
e4615b4666 Refactor code for executing --siblings-follow
This will enable provision of "--siblings-follow" to "repmgr standby promote"
2019-05-02 12:04:15 +09:00
Ian Barwick
dbeffbf29a doc: define entity for repmgrd 2019-05-01 10:36:54 +09:00
Ian Barwick
4d1e11533e doc: add missing space in example output 2019-05-01 10:14:18 +09:00
Ian Barwick
52905f1eb3 Standardize on "ID: %i" when logging node IDs
Previously there was a mix of "id:", "node id:", "node ID:" and "node_id:".
2019-04-30 17:07:33 +09:00
Ian Barwick
6c3b4c0db8 Remove unused line 2019-04-30 15:53:24 +09:00
Ian Barwick
89a7261483 Always quote node names in log messages 2019-04-30 15:52:56 +09:00
Frantisek Holop
d7de0a64e0 doc: bit too many e.g.'s
PR #565.
2019-04-30 10:47:45 +09:00
Frantisek Holop
531c4d9853 doc: promote -> follow
PR #565
2019-04-30 10:43:15 +09:00
Ian Barwick
356fe2e640 Fix "repmgr daemon status --csv" output 2019-04-29 20:52:27 +09:00
Ian Barwick
e32acda8c0 standby switchover: ignore nodes which are unreachable and marked as inactive
Previously "repmgr standby switchover" would abort if any node was unreachable,
as that means it was unable to check if repmgrd is running.

However if the node has been marked as inactive in the repmgr metadata, it's
reasonable to assume the node is no longer part of the replication cluster
and does not need to be checked.
2019-04-29 14:35:49 +09:00
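The rule introduced by commit e32acda8c0 above can be modelled as a simple filter. This Python sketch is illustrative only; the dictionary keys and the function name are hypothetical, and "active" stands for the flag recorded in the repmgr metadata.

```python
from typing import Iterable, List


def nodes_blocking_switchover(nodes: Iterable[dict]) -> List[str]:
    """Return the names of nodes that should abort a switchover: those
    which are unreachable but still marked as active. Unreachable nodes
    marked inactive are assumed to have left the cluster and are skipped."""
    return [
        node["name"]
        for node in nodes
        if not node["reachable"] and node["active"]
    ]
```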
Ian Barwick
5f10e68f31 emit warning if "--siblings-follow" provided out-of-context 2019-04-29 14:12:22 +09:00
Ian Barwick
87910a5448 repmgrd: improve logging of sibling node's upstream info
If the sibling node has already been promoted (for whatever
reason, e.g. "repmgr standby promote" was executed manually)
and has exited recovery, the upstream node ID will normally
be reported as "-1", which is correct, but looks confusing in
the logs.

We now only report the upstream node ID if the sibling node
is still in recovery, *or* if it has exited recovery but is
still reporting an extant node ID.
2019-04-29 13:51:17 +09:00
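The logging condition described in commit 87910a5448 above reduces to a small predicate. The Python sketch below assumes that "-1" is the value reported when no upstream is known, as the commit message states; the constant and function names are hypothetical.

```python
NO_UPSTREAM_NODE_ID = -1  # value reported once a node has no upstream


def should_log_upstream(in_recovery: bool, upstream_node_id: int) -> bool:
    """Report the upstream node ID only if the sibling is still in
    recovery, or has exited recovery but still reports a real node ID."""
    return in_recovery or upstream_node_id != NO_UPSTREAM_NODE_ID
```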
Ian Barwick
ec6266e375 doc: list caveats when monitoring child node disconnection 2019-04-25 17:52:14 +09:00
Ian Barwick
2082a8d3f3 Consolidate some code 2019-04-25 16:04:40 +09:00
Ian Barwick
c8d52bab6d cluster show: fix thinko introduced in commit 9fe2fa2 2019-04-25 15:46:07 +09:00
Ian Barwick
dbbf35ded1 Update HISTORY 2019-04-25 14:59:33 +09:00
Ian Barwick
9fe2fa2daf daemon status: make output more like that of "cluster show"
In particular make any issues with unexpected server state more
obvious.
2019-04-25 14:45:41 +09:00
Ian Barwick
da24896fd5 doc: add child node monitoring example 2019-04-24 16:04:47 +09:00
Ian Barwick
c092ce60a7 doc: document "child_node..." configuration parameters 2019-04-24 14:48:38 +09:00
Ian Barwick
090493ebc9 doc: document "child_node" events 2019-04-24 13:19:00 +09:00
Ian Barwick
8d80267ab1 doc: update "repmgr primary register" output 2019-04-24 13:18:31 +09:00
Ian Barwick
3231b5034d Remove temporary debugging log output 2019-04-24 13:17:52 +09:00
Ian Barwick
5a9175c740 Clarify hints about updating the repmgr extension 2019-04-24 11:37:31 +09:00
Ian Barwick
58b33fb411 Clarify a couple of code comments 2019-04-24 10:55:53 +09:00
Ian Barwick
3129da221e "primary register": ensure --force works if another primary is registered but not running 2019-04-23 16:54:07 +09:00
Ian Barwick
6cbf436bf8 Don't execute "child_nodes_disconnect_command" when repmgrd paused 2019-04-23 14:08:13 +09:00
Ian Barwick
5a90513878 repmgrd: monitor standbys attached to primary
This functionality enables repmgrd (when running on the primary) to
monitor connected child nodes. It will log connections and disconnections
and generate events.

Additionally, repmgrd can execute a custom script if the number of connected
child nodes falls below a configurable threshold. This script can be used
e.g. to "fence" the primary following a failover situation where a new primary
has been promoted and all standbys are now child nodes of that primary.
2019-04-22 16:18:52 +09:00
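A minimal Python model of the child node monitoring described in commit 5a90513878 above, combined with the pause rule from commit 6cbf436bf8: compare the set of connected child nodes between two checks to derive connection and disconnection events, then decide whether the custom script should run. All names here (`diff_child_nodes`, `should_run_disconnect_command`, `min_connected`) are hypothetical and chosen for illustration; they are not repmgrd's internals.

```python
from typing import List, Set, Tuple


def diff_child_nodes(previous: Set[str], current: Set[str]) -> Tuple[List[str], List[str]]:
    """Return (newly_connected, newly_disconnected) node names."""
    return sorted(current - previous), sorted(previous - current)


def should_run_disconnect_command(
    connected_count: int, min_connected: int, paused: bool = False
) -> bool:
    """Run the custom script only when the number of connected child nodes
    falls below the threshold and the daemon is not paused."""
    return not paused and connected_count < min_connected
```

In a fencing scenario, the script would typically be triggered when every standby has moved to a newly promoted primary, leaving the old primary with no children.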
Ian Barwick
64c4cb81d5 Update pg_control processing for PostgreSQL 12 2019-04-18 09:31:33 +09:00
Ian Barwick
3115face28 doc: add note about when a PostgreSQL restart is required
Per query in GitHub #564.
2019-04-17 09:43:35 +09:00
Ian Barwick
80f66e87c9 Improve string handling during configuration file reload 2019-04-16 11:20:41 +09:00
Ian Barwick
ad28cf95bd standby register: add upstream node ID in event details 2019-04-16 11:01:22 +09:00
Ian Barwick
a0c6cb602f repmgrd: remove duplicate function definition 2019-04-16 10:53:05 +09:00
Ian Barwick
27803f93ff repmgrd: always unset upstream node ID when monitoring a primary 2019-04-12 12:26:39 +09:00
Ian Barwick
1a344d488a Use sizeof() consistently 2019-04-11 23:07:58 +09:00
Ian Barwick
46d17d0933 repmgrd: fix log output 2019-04-11 16:29:08 +09:00
Ian Barwick
6b79e08706 repmgrd: add additional log output in do_election() 2019-04-11 15:46:20 +09:00
Ian Barwick
cd6a55c7cb repmgrd: improve primary visibility consensus check
Exclude sibling nodes which report they're following a different
node. This shouldn't happen, but could.
2019-04-11 15:46:14 +09:00
Ian Barwick
008bd00a59 repmgrd: store upstream node ID in shared memory 2019-04-11 15:46:09 +09:00
Ian Barwick
5a8741199f repmgrd: exclude witness server from followability check 2019-04-11 11:19:12 +09:00
Ian Barwick
dd454a8374 Miscellaneous string handling cleanup
This is mainly to prevent effectively spurious truncation warnings
in recent GCC versions.
2019-04-10 16:18:56 +09:00
Ian Barwick
a9b56d9833 Fix hint message
s/UPGRADE/UPDATE
2019-04-10 12:08:26 +09:00
Ian Barwick
ef47589c6b standby clone: always ensure directory is created with correct permissions
In Barman mode, if there is an existing, populated data directory, and
the "--force" option is provided, the entire directory was being deleted,
and later recreated as part of the rsync process, but with the default
permissions.

Fix this by recreating the data directory with the correct permissions
after deleting it.
2019-04-09 10:58:27 +09:00
Ian Barwick
77b9887d61 standby clone: improve --dry-run behaviour in barman mode
- emit additional informational output
- ensure that provision of --force does not result in an existing
  data directory being modified in any way
2019-04-08 15:12:22 +09:00
Ian Barwick
7631c60933 doc: update release notes 2019-04-08 11:27:25 +09:00
Ian Barwick
a8d560860d Ensure BDR-specific code only runs on BDR 2.x
The BDR support in repmgr is for a specific BDR 2.x use case, and
is not suitable for more recent BDR versions.
2019-04-05 14:37:49 +09:00
Ian Barwick
c338bc9c5e doc: add note about BDR replication type in sample config 2019-04-05 14:37:49 +09:00
Ian Barwick
3c8e42ff15 doc: emphasise that BDR2 support is for BDR2 only 2019-04-05 10:53:23 +09:00
Ian Barwick
be9c6d5fc6 Use correct sizeof() argument in a couple of strncpy calls
Source and destination buffers are however the same length in both cases.

Per GitHub #561.
2019-04-04 10:58:00 +09:00
Ian Barwick
55e79bd0b7 doc: update 4.3 release notes 2019-04-03 15:08:35 +09:00
Ian Barwick
8970a72be9 doc: update README
Link to current documentation version
2019-04-03 11:12:48 +09:00
Ian Barwick
7791abb8f7 doc: add a link to the current documentation from the contents page 2019-04-03 10:54:18 +09:00
Ian Barwick
602e06a8f4 doc: finalize 4.3 release notes 2019-04-02 14:42:06 +09:00
Ian Barwick
84f4c6c979 doc: note that --siblings-follow will become default in a future release 2019-04-02 11:04:36 +09:00
Ian Barwick
67e977592c standby switchover: list nodes which will remain attached to the old primary
If --siblings-follow is not supplied, list all nodes which repmgr considers
to be siblings (this will include the witness server, if in use), and
which will remain attached to the old primary.
2019-04-02 10:46:59 +09:00
Ian Barwick
b1cd7e7edf doc: update quickstart guide
Improve sample PostgreSQL replication configuration, including
links to the PostgreSQL documentation for each configuration item.

Also set "max_replication_slots" to the same value as "max_wal_senders"
to ensure the sample configuration will work regardless of whether
replication slots are in use, though we do still encourage careful
reading of the comments in the sample configuration and the documentation
in general.
2019-04-02 09:27:37 +09:00
Ian Barwick
a564f365c1 Fix default return value in alter_system_int() 2019-04-01 14:50:19 +09:00
Ian Barwick
799ac6d453 Add is_server_available_quiet()
For use in cases where the caller collates node availability information
and doesn't want to prematurely emit log output.
2019-04-01 12:27:30 +09:00
Ian Barwick
57c0ccd477 Improve copying of strings from database results
Where feasible, specify the maximum string length via sizeof(), and
use snprintf() in place of strncpy().
2019-04-01 11:19:58 +09:00
Ian Barwick
aef8e31897 Bump master branch to 4.4dev 2019-03-28 17:24:36 +09:00
Ian Barwick
3d4b81ba2a Handle unhandled error situation in enable_wal_receiver() 2019-03-28 14:52:16 +09:00
Ian Barwick
98d924685b Update BDR repmgrd to handle node_name as a max 63 char string
Follow-up from commit 1953ec7.
2019-03-28 14:32:52 +09:00
Ian Barwick
79613af8d0 Handle potential NULL return from string_skip_prefix() 2019-03-28 12:45:53 +09:00
Ian Barwick
5e9f202c9a Add missing break 2019-03-28 12:44:50 +09:00
Ian Barwick
e44c048ae2 Update code comment 2019-03-28 12:44:30 +09:00
Ian Barwick
bb42d8cba6 Fix calculation of maximum filename length 2019-03-28 12:40:29 +09:00
Ian Barwick
9d5afeebbc Remove logically dead code 2019-03-28 12:35:41 +09:00
Ian Barwick
fe822a9eea Prevent potential file descriptor resource leak 2019-03-28 12:29:10 +09:00
Ian Barwick
03cd5a6028 Put closedir call in correct location 2019-03-28 12:08:42 +09:00
Ian Barwick
1e1c596446 Add various missing close() calls 2019-03-28 11:32:25 +09:00
Ian Barwick
d43975eb5f Use correct argument for sizeof() 2019-03-28 11:02:50 +09:00
Ian Barwick
ece20f4831 Cast "int" to "long long" 2019-03-28 11:02:25 +09:00
Ian Barwick
e23f5afc5f doc: note valid characters for "node_name"
"node_name" will be used as "application_name", so should only contain
characters valid for that; see:

    https://www.postgresql.org/docs/current/runtime-config-logging.html#GUC-APPLICATION-NAME

Not yet enforced.
2019-03-28 10:53:43 +09:00
Ian Barwick
ba1f05ece9 Restrict "node_name" to maximum 63 characters
In "recovery.conf", the configuration parameter "node_name" is used
as the "application_name" value, which will be truncated by PostgreSQL
to 63 characters (NAMEDATALEN - 1).

repmgr sometimes needs to be able to extract the application name from
pg_stat_replication to determine if a node is connected (e.g. when
executing "repmgr standby register"), so the comparison will fail
if "node_name" exceeds 63 characters.
2019-03-28 10:37:57 +09:00
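The failure mode behind commit ba1f05ece9 above can be reproduced in a few lines of Python. This sketch assumes the default NAMEDATALEN of 64 and ASCII-only names (so characters and bytes coincide); the helper names are hypothetical.

```python
NAMEDATALEN = 64                      # PostgreSQL default build setting
MAX_NODE_NAME_LEN = NAMEDATALEN - 1   # 63, as stated in the commit message


def truncated_application_name(node_name: str) -> str:
    """Model the truncation applied to "application_name"."""
    return node_name[:MAX_NODE_NAME_LEN]


def node_name_is_matchable(node_name: str) -> bool:
    """True if the name survives truncation unchanged, so that a lookup
    by application name in pg_stat_replication can still find it."""
    return truncated_application_name(node_name) == node_name
```

A 64-character name would be stored as its first 63 characters, so an exact comparison against the configured "node_name" would never match.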
Ian Barwick
73ad689390 standby register: fail if --upstream-node-id is the local node ID 2019-03-27 14:22:55 +09:00
Ian Barwick
e9ece34aeb log_db_error(): fix formatted message handling 2019-03-27 11:00:31 +09:00
Ian Barwick
9dd2f30c72 Use sizeof(buf) rather than hard-coding value 2019-03-27 10:43:49 +09:00
Ian Barwick
9164d3931b repmgrd: clean up PQExpBuffer handling
Unless the PQExpBuffer is required for the duration of the function,
ensure it's always a variable local to the relevant code block. This
mitigates the risk of accidentally accessing a generically named
PQExpBuffer which hasn't been initialised or was previously terminated.
2019-03-26 13:15:25 +09:00
Ian Barwick
801ed2b0c8 repmgrd: don't terminate uninitialized PQExpBuffer 2019-03-26 11:35:45 +09:00
Ian Barwick
e490f35223 doc: fix syntax 2019-03-22 15:43:55 +09:00
Ian Barwick
ec873b0119 doc: update release notes 2019-03-22 15:43:49 +09:00
Ian Barwick
539861cb58 repmgrd: during failover, check if a node was already promoted
Previously, repmgrd assumed that during a failover, there would not
already be another primary node. However it's possible a node was
promoted manually. While this is not a desirable situation, it's
conceivable this could happen in the wild, so we should check for
it and react accordingly.

Also sanity-check that the follow target can actually be followed.

Addresses issue raised in GitHub #420.
2019-03-22 14:06:41 +09:00
Ian Barwick
6f0f338968 standby follow: set replication user when connecting to local node 2019-03-21 16:43:39 +09:00
Ian Barwick
bd26eb3025 standby switchover: don't attempt to pause repmgrd on unreachable nodes 2019-03-21 13:48:59 +09:00
Ian Barwick
9b089b7401 doc: add note about compiling against Pg11 and later with the --with-llvm option 2019-03-21 10:30:00 +09:00
Ian Barwick
314a1e8f4f use a constant to denote unknown replication lag 2019-03-20 17:26:04 +09:00
Ian Barwick
7204a0faf4 doc: consolidate witness server documentation 2019-03-20 16:31:52 +09:00
Ian Barwick
5e775cef16 doc: various improvements to repmgrd documentation 2019-03-20 16:10:03 +09:00
Ian Barwick
7d0caefaee Fix logging related to "connection_check_type"
Also log the selected type at repmgrd startup.
2019-03-20 11:58:18 +09:00
Ian Barwick
7434cc0b8e repmgrd: improve witness node monitoring
Mainly fix a couple of places where "standby" was hard-coded into a log
message which can apply either to a witness or a standby.
2019-03-20 11:47:36 +09:00
Ian Barwick
b84d98fe81 Explicitly log PQping() failures 2019-03-20 11:47:32 +09:00
Ian Barwick
46efe57cd0 Improve database connection failure logging
Log the output of PQerrorMessage() in a couple of places where it was missing.

Additionally, always log the output of PQerrorMessage() starting with a blank
line, otherwise the first line looks like it was emitted by repmgr, and
it's harder to scan the error message.

Before:

    [2019-03-20 11:24:15] [DETAIL] could not connect to server: Connection refused
            Is the server running on host "localhost" (::1) and accepting
            TCP/IP connections on port 5501?
    could not connect to server: Connection refused
            Is the server running on host "localhost" (127.0.0.1) and accepting
            TCP/IP connections on port 5501?

After:

    [2019-03-20 11:27:21] [DETAIL]
    could not connect to server: Connection refused
            Is the server running on host "localhost" (::1) and accepting
            TCP/IP connections on port 5501?
    could not connect to server: Connection refused
            Is the server running on host "localhost" (127.0.0.1) and accepting
            TCP/IP connections on port 5501?
2019-03-20 11:47:28 +09:00
Ian Barwick
426759ca8e check_primary_status(): handle case where recovery type unknown 2019-03-18 16:16:54 +09:00
Ian Barwick
39df55c39c Check node recovery type before attempting to write an event record
In some corner cases (e.g. immediately after a switchover) where
the current primary has not yet been determined, the provided connection
might not be writeable. This prevents error messages such as
"cannot execute INSERT in a read-only transaction" generating unnecessary
noise in the logs.
2019-03-18 15:26:16 +09:00
Ian Barwick
f54ff85cfa Remove outdated comment
This was only relevant for repmgr3 and earlier; in repmgr4 the schema
is hard-coded.
2019-03-18 15:19:11 +09:00
Ian Barwick
8ab51c2ae3 Refactor check_primary_status()
Reduce nested if/else branching, and improve documentation.
2019-03-18 15:01:21 +09:00
Ian Barwick
43f28f4097 Clarify calls to check_primary_status()
Use a constant rather than a magic number to indicate non-provision
of elapsed degraded monitoring time.
2019-03-18 14:21:34 +09:00
Ian Barwick
0940185f49 doc: clarify "cluster show" error codes 2019-03-18 10:49:38 +09:00
John Naylor
4f9fc56871 Fix assorted Makefile bugs
1. The target additional-maintainer-clean was misspelled as
maintainer-additional-clean.

2. Add missing clean targets, in particular sysutils.o, config.h,
repmgr_version.h, and Makefile.global. While at it, use a wildcard
for obj files.

3. Don't delete configure.

4. Remove generated file doc/version.sgml from the repo.

5. Have maintainer-clean recurse to the doc directory.
2019-03-15 16:29:31 +09:00
Ian Barwick
fbdf9617fa doc: update repmgrd example output 2019-03-15 15:43:11 +09:00
Ian Barwick
dfb92df05f doc: miscellaneous cleanup 2019-03-15 14:39:37 +09:00
Ian Barwick
9dd87dd5ce doc: add explanation of the configuration file format 2019-03-15 14:02:42 +09:00
Ian Barwick
a2df69512a doc: update "connection_check_type" descriptions 2019-03-14 15:44:59 +09:00
Ian Barwick
c2206b007a repmgrd: optionally check upstream availability through connection attempts 2019-03-14 15:44:53 +09:00
John Naylor
e06d3de444 Correct some doc typos 2019-03-14 11:58:31 +08:00
Ian Barwick
9d056b2f72 doc: expand "standby_disconnect_on_failover" documentation 2019-03-14 12:08:13 +09:00
Ian Barwick
19bf4d7434 Count witness and zero-priority nodes in visibility check 2019-03-14 11:17:51 +09:00
Ian Barwick
56d9f5b856 Ensure witness node sets last upstream seen time 2019-03-14 10:53:47 +09:00
Ian Barwick
c1d6753081 doc: fix option name typo 2019-03-14 09:32:06 +09:00
Ian Barwick
2b59b4894a doc: expand "failover_validation_command" documentation 2019-03-13 21:10:03 +09:00
Ian Barwick
c3c58df7b9 repmgrd: improve logging output when executing "failover_validation_command" 2019-03-13 21:07:26 +09:00
Ian Barwick
0e2f3e563a doc: various updates 2019-03-13 16:55:32 +09:00
Ian Barwick
8c4421d110 doc: merge repmgrd witness server description into failover section 2019-03-13 16:12:17 +09:00
Ian Barwick
69cb3f1e82 doc: merge repmgrd split network handling description into failover section 2019-03-13 16:12:14 +09:00
Ian Barwick
960acfeb3c doc: merge repmgrd monitoring description into operating section 2019-03-13 16:12:11 +09:00
Ian Barwick
a8d50a5b98 doc: merge repmgrd degraded monitoring description into operation section 2019-03-13 16:12:06 +09:00
Ian Barwick
11e5993bf5 doc: merge repmgrd notes into operation documentation 2019-03-13 16:12:03 +09:00
Ian Barwick
09861a5604 doc: merge repmgrd pause documentation into overview 2019-03-13 16:11:59 +09:00
Ian Barwick
89bba77d4d doc: initial repmgrd doc refactoring 2019-03-13 16:11:55 +09:00
Ian Barwick
dd6ece326f doc: update repmgrd configuration documentation 2019-03-13 13:34:08 +09:00
Ian Barwick
573d027db6 repmgrd: various minor logging improvements 2019-03-13 11:27:17 +09:00
Ian Barwick
1afb41647b repmgrd: remove global variable
Make the "sibling_nodes" local, and pass by reference where relevant.
2019-03-12 17:12:23 +09:00
Ian Barwick
fc397f25f6 repmgrd: enable election rerun
If "failover_validation_command" is set, and the command returns an error,
rerun the election.

There is a pause between reruns to avoid "churn"; the length of this pause
is controlled by the configuration parameter "election_rerun_interval".
2019-03-12 17:12:19 +09:00
Ian Barwick
99923f5ffc Remove redundant struct allocation 2019-03-11 19:06:07 +09:00
Ian Barwick
b9cdcd55e7 doc: update list of reloadable repmgrd configuration options 2019-03-11 16:18:10 +09:00
Ian Barwick
db87ff46fd doc: document "failover_validation_command" 2019-03-11 15:02:33 +09:00
Ian Barwick
2a8f8d8400 doc: expand repmgrd configuration section 2019-03-11 14:50:33 +09:00
Ian Barwick
4ef706c2ca Execute "failover_validation_command" when only one standby exists 2019-03-08 12:19:37 +09:00
Ian Barwick
663c2e75b4 Make "failover_validation_command" reloadable 2019-03-08 09:27:19 +09:00
Ian Barwick
db0d71c6a7 Initial implementation of "failover_validation_command" 2019-03-08 08:49:15 +09:00
Ian Barwick
6f4f56dd8c Make recently added configuration options reloadable 2019-03-07 10:58:25 +09:00
Ian Barwick
33fefd9f52 Add configuration option "primary_visibility_consensus"
This determines whether repmgrd should continue with a failover if
one or more nodes report they can still see the primary.
2019-03-07 10:41:42 +09:00
Ian Barwick
a3f90d2bba Add configuration option "sibling_nodes_disconnect_timeout"
This controls the maximum length of time in seconds that repmgrd will
wait for other standbys to disconnect their WAL receivers in a failover
situation.

This setting is only used when "standby_disconnect_on_failover" is set to "true".
2019-03-06 15:56:21 +09:00
Ian Barwick
2ed044c358 Reset "wal_retrieve_retry_interval" for all nodes 2019-03-06 15:55:03 +09:00
Ian Barwick
9823978f41 repmgrd: don't wait for WAL receiver to reconnect during failover
If the WAL receiver has been temporarily disabled, we don't want to
wait for it to start up as it may not be able to at that point; we do
however need to reset "wal_retrieve_retry_interval".
2019-03-06 15:54:56 +09:00
Ian Barwick
ae8171e461 Improve logging/sanity checking for "node control" options 2019-03-06 15:54:30 +09:00
Ian Barwick
1f8f64d57c Improve logging when disabling/enabling WAL receiver
Also check action is being run on node which is in recovery.
2019-03-06 15:54:26 +09:00
Ian Barwick
13c650fa83 Check for WAL receiver start up 2019-03-06 15:54:23 +09:00
Ian Barwick
f85b4cd98e Log warning if "standby_disconnect_on_failover" used on pre-9.5
"standby_disconnect_on_failover" requires availability of "wal_retrieve_retry_interval",
which is available from PostgreSQL 9.5.

9.4 will fall out of community support this year, so it doesn't seem
productive at this point to do anything more than put the onus on the user
to read the documentation and heed any warning messages in the logs.
2019-03-06 15:54:15 +09:00
Ian Barwick
1615353f48 repmgrd: optionally disconnect WAL receivers during failover
This is intended to ensure that all nodes have a constant LSN while
making the failover decision.

This feature is experimental and needs to be explicitly enabled with the
configuration file option "standby_disconnect_on_failover".

Note enabling this option will result in a delay in the failover decision
until the WAL receiver is disconnected on all nodes.
2019-03-06 15:53:57 +09:00
Ian Barwick
dd04ebb809 repmgrd: handle reconnect to restarted server when using "connection" checks 2019-03-06 14:54:05 +09:00
Ian Barwick
b4dcda37a1 *_transaction() functions: log error message text as DETAIL
Per behaviour elsewhere.
2019-03-06 12:12:47 +09:00
Ian Barwick
63f7ad546e repmgrd: add option "connection_check_type"
This enables selection of the method repmgrd uses to check whether the upstream
node is available. Possible values are:

 - "ping" (default): uses PQping() to check server availability
 - "connection": executes a query on the connection to check server
   availability (similar to repmgr 3.x).
2019-03-06 12:09:54 +09:00
Ian Barwick
4f83111033 repmgrd: ignore invalid "upstream_last_seen" value 2019-03-05 11:00:29 +09:00
Ian Barwick
92103c5338 Use appendPQExpBufferStr where appropriate 2019-03-01 16:42:00 +09:00
Ian Barwick
4b89cbd98d Rename "..._primary_last_seen" functions to "..._upstream_last_seen"
As that better reflects what they do.
2019-02-28 15:36:55 +09:00
Ian Barwick
0330fa6e62 daemon status: with csv output, show repmgrd status as unknown where appropriate
Previously, if PostgreSQL was not running on the node, repmgrd and
pause status were shown as "0", implying their status was known.

This brings the csv output in line with the human-readable output,
which displays "n/a" in this case.
2019-02-28 12:27:39 +09:00
Ian Barwick
4006f8af3c doc: update release notes 2019-02-28 10:01:51 +09:00
Ian Barwick
b1875a8d91 Split command execution functions into separate library
These may need to be executed by repmgrd.
2019-02-27 14:41:17 +09:00
Ian Barwick
5c2264eb8d Update .gitignore
Ignore artefacts from failed patch application.
2019-02-27 13:02:30 +09:00
Ian Barwick
a6c16541c2 doc: tweak wording in event notification documentation 2019-02-27 13:01:19 +09:00
Ian Barwick
790a1cc492 repmgrd: add additional logging during a failover operation 2019-02-27 11:46:05 +09:00
Ian Barwick
067ed82931 Remove unneeded debugging output 2019-02-26 21:16:11 +09:00
Ian Barwick
59f32d74df doc: update introductory blurb 2019-02-26 15:16:46 +09:00
Ian Barwick
0578053875 standby clone: check upstream connections after data copy operation
With long-running copy operations, it's possible the connection(s) to
the primary/source server may go away for some reason, so recheck
their availability before attempting to reuse.
2019-02-26 14:37:05 +09:00
John Naylor
897e3bee14 Doc fix: PostgreSQL 9.4 is no longer considered recent 2019-02-24 12:44:10 +07:00
John Naylor
4e414d2ea0 Fix typo 2019-02-24 10:50:09 +07:00
Ian Barwick
ea36609159 Add some missing query error logging 2019-02-23 16:54:07 +09:00
Ian Barwick
0c68018631 repmgrd: log details of nodes which can see primary
If a failover is cancelled because other nodes can still see the primary,
log the identities of those nodes.
2019-02-23 15:55:06 +09:00
Ian Barwick
b72c894db4 repmgrd: during failover, check if other nodes have seen the primary
In a situation where only some standbys are cut off from the primary,
a failover would result in a split brain/split cluster situation,
as it's likely one of the cut-off standbys will promote itself, and
other cut-off standbys (but not all standbys) will follow it.

To prevent this happening, interrogate the other sibling nodes to
check whether they've seen the primary within a reasonably short interval;
if this is the case, do not take any failover action.

This feature is experimental.
2019-02-23 13:03:22 +09:00
111 changed files with 9060 additions and 3232 deletions
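
The last commit in the list above (b72c894db4) describes asking sibling nodes whether they have seen the primary recently, and cancelling the failover if any have. The following is a minimal sketch of that decision only, not repmgrd's actual code: the struct, the function names and the threshold parameter are all hypothetical, and it assumes each sibling reports the number of seconds since it last saw the primary, with a negative value meaning "unknown".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record of what one sibling node reports. */
typedef struct
{
    int node_id;
    int upstream_last_seen_secs;    /* seconds since the primary was last seen; < 0 if unknown */
} demo_sibling_report;

/*
 * Count the siblings which saw the primary within "threshold_secs".
 * Reports with an unknown (negative) value are ignored.
 */
size_t
demo_count_nodes_seeing_primary(const demo_sibling_report *reports,
                                size_t n_reports,
                                int threshold_secs)
{
    size_t seen = 0;

    for (size_t i = 0; i < n_reports; i++)
    {
        if (reports[i].upstream_last_seen_secs < 0)
            continue;

        if (reports[i].upstream_last_seen_secs <= threshold_secs)
            seen++;
    }

    return seen;
}

/* Failover may proceed only if no sibling has recently seen the primary. */
bool
demo_failover_permitted(const demo_sibling_report *reports,
                        size_t n_reports,
                        int threshold_secs)
{
    return demo_count_nodes_seeing_primary(reports, n_reports, threshold_secs) == 0;
}
```

Keeping the count separate from the yes/no decision makes it easy to log which nodes still see the primary, as commit 0c68018631 describes.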

.gitignore

@@ -42,11 +42,12 @@ lib*.pc
/regression.diffs
/regression.out
/doc/Makefile
# other
/.lineno
*.dSYM
*.orig
*.rej
# generated binaries
repmgr
repmgrd

HISTORY

@@ -1,4 +1,36 @@
4.3 2019-??
4.4 2019-??-??
repmgr: improve "daemon status" output (Ian)
repmgr: add "--siblings-follow" option to "standby promote" (Ian)
repmgr: add "--repmgrd-force-unpause" option to "standby switchover" (Ian)
repmgr: fix data directory permissions issue in barman mode where
an existing directory is being overwritten (Ian)
repmgr: improve "--dry-run" behaviour for "standby promote" and
"standby switchover" (Ian)
repmgr: when running "standby clone" with the "--upstream-conninfo" option
ensure that "application_name" is set correctly in "primary_conninfo" (Ian)
repmgr: ensure "--dry-run" together with --force when running "standby clone"
in barman mode does not modify an existing data directory (Ian)
repmgr: improve "--dry-run" output when running "standby clone" in
basebackup mode (Ian)
repmgr: improve upstream walsender checks when running "standby clone" (Ian)
repmgr: display node timeline ID in "cluster show" output (Ian)
repmgr: in "cluster show" and "daemon status", show upstream node name
as reported by each individual node (Ian)
repmgr: in "cluster show" and "daemon status", check if a node is attached
to its advertised upstream node
repmgr: use --compact rather than --terse option in "cluster event" (Ian)
repmgr: prevent a standby being cloned from a witness server (Ian)
repmgr: prevent a witness server being registered on the cluster primary (John)
repmgr: ensure BDR2-specific functionality cannot be used on
BDR3 and later (Ian)
repmgr: canonicalize the data directory path (Ian)
repmgrd: monitor standbys attached to primary (Ian)
repmgrd: add "primary visibility consensus" functionality (Ian)
repmgrd: fix memory leak which occurs while the monitored PostgreSQL
node is not running (Ian)
general: documentation converted to DocBook XML format (Ian)
4.3 2019-04-02
repmgr: add "daemon (start|stop)" command; GitHub #528 (Ian)
repmgr: add --version-number command line option (Ian)
repmgr: add --compact option to "cluster show"; GitHub #521 (Ian)
@@ -15,6 +47,8 @@
repmgr: add sanity check for correct extension version (Ian)
repmgr: ensure "witness register --dry-run" does not attempt to read node
tables if repmgr extension not installed; GitHub #513 (Ian)
repmgr: ensure "standby register" fails when --upstream-node-id is the
same as the local node ID (Ian)
repmgrd: check binary and extension major versions match; GitHub #515 (Ian)
repmgrd: on a cascaded standby, don't fail over if "failover=manual";
GitHub #531 (Ian)
@@ -22,6 +56,9 @@
candidates (Ian)
repmgrd: add option "connection_check_type" (Ian)
repmgrd: improve witness monitoring when primary node not available (Ian)
repmgrd: handle situation where a primary has unexpectedly appeared
during failover; GitHub #420 (Ian)
general: fix Makefile (John)
4.2 2018-10-24
repmgr: add parameter "shutdown_check_timeout" for use by "standby switchover";

View File

@@ -17,7 +17,9 @@ DATA = \
repmgr--4.1--4.2.sql \
repmgr--4.2.sql \
repmgr--4.2--4.3.sql \
repmgr--4.3.sql
repmgr--4.3.sql \
repmgr--4.3--4.4.sql \
repmgr--4.4.sql
REGRESS = repmgr_extension
@@ -75,10 +77,19 @@ Makefile: Makefile.in config.status configure
Makefile.global: Makefile.global.in config.status configure
./config.status $@
doc:
$(MAKE) -C doc all
doc: repmgr_version.h
$(MAKE) -C doc html
install-doc:
doc-repmgr.html: repmgr_version.h
$(MAKE) -C doc repmgr.html
doc-repmgr-A4.pdf: repmgr_version.h
$(MAKE) -C doc repmgr-A4.pdf
doc-repmgr-US.pdf: repmgr_version.h
$(MAKE) -C doc repmgr-US.pdf
install-doc: doc
$(MAKE) -C doc install
clean: additional-clean
@@ -87,6 +98,7 @@ maintainer-clean: additional-maintainer-clean
additional-clean:
rm -f *.o
$(MAKE) -C doc clean
additional-maintainer-clean: clean
$(MAKE) -C doc maintainer-clean
@@ -109,3 +121,4 @@ installdirs-scripts:
.PHONY: installdirs-scripts
endif
.PHONY: doc doc-repmgr.html doc-repmgr-A4.pdf doc-repmgr-US.pdf install-doc

View File

@@ -27,7 +27,7 @@ Documentation
The main `repmgr` documentation is available here:
> [repmgr 4 documentation](https://repmgr.org/docs/4.2/index.html)
> [repmgr documentation](https://repmgr.org/docs/current/index.html)
The `README` file for `repmgr` 3.x is available here:
@@ -72,7 +72,7 @@ Please report bugs and other issues to:
* https://github.com/2ndQuadrant/repmgr
Further information is available at https://www.repmgr.org/
Further information is available at https://repmgr.org/
We'd love to hear from you about how you use repmgr. Case studies and
news are always welcome. Send us an email at info@2ndQuadrant.com, or
@@ -97,6 +97,7 @@ Thanks from the repmgr core team.
Further reading
---------------
* [repmgr documentation](https://repmgr.org/docs/current/index.html)
* https://blog.2ndquadrant.com/repmgr-3-2-is-here-barman-support-brand-new-high-availability-features/
* https://blog.2ndquadrant.com/improvements-in-repmgr-3-1-4/
* https://blog.2ndquadrant.com/managing-useful-clusters-repmgr/

View File

@@ -344,7 +344,7 @@ _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *
options->failover = FAILOVER_MANUAL;
options->priority = DEFAULT_PRIORITY;
memset(options->location, 0, sizeof(options->location));
strncpy(options->location, DEFAULT_LOCATION, MAXLEN);
strncpy(options->location, DEFAULT_LOCATION, sizeof(options->location));
memset(options->promote_command, 0, sizeof(options->promote_command));
memset(options->follow_command, 0, sizeof(options->follow_command));
options->monitor_interval_secs = DEFAULT_MONITORING_INTERVAL;
@@ -365,6 +365,13 @@ _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *
memset(options->failover_validation_command, 0, sizeof(options->failover_validation_command));
options->election_rerun_interval = DEFAULT_ELECTION_RERUN_INTERVAL;
options->child_nodes_check_interval = DEFAULT_CHILD_NODES_CHECK_INTERVAL;
options->child_nodes_disconnect_min_count = DEFAULT_CHILD_NODES_DISCONNECT_MIN_COUNT;
options->child_nodes_connected_min_count = DEFAULT_CHILD_NODES_CONNECTED_MIN_COUNT;
options->child_nodes_connected_include_witness = DEFAULT_CHILD_NODES_CONNECTED_INCLUDE_WITNESS;
options->child_nodes_disconnect_timeout = DEFAULT_CHILD_NODES_DISCONNECT_TIMEOUT;
memset(options->child_nodes_disconnect_command, 0, sizeof(options->child_nodes_disconnect_command));
/*-------------
* witness settings
*-------------
@@ -484,21 +491,34 @@ _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *
node_id_found = true;
}
else if (strcmp(name, "node_name") == 0)
strncpy(options->node_name, value, MAXLEN);
{
if (strlen(value) < sizeof(options->node_name))
strncpy(options->node_name, value, sizeof(options->node_name));
else
item_list_append_format(error_list,
_("value for \"node_name\" must contain fewer than %lu characters"),
sizeof(options->node_name));
}
else if (strcmp(name, "conninfo") == 0)
strncpy(options->conninfo, value, MAXLEN);
else if (strcmp(name, "data_directory") == 0)
{
strncpy(options->data_directory, value, MAXPGPATH);
canonicalize_path(options->data_directory);
}
else if (strcmp(name, "config_directory") == 0)
{
strncpy(options->config_directory, value, MAXPGPATH);
canonicalize_path(options->config_directory);
}
else if (strcmp(name, "replication_user") == 0)
{
if (strlen(value) < NAMEDATALEN)
strncpy(options->replication_user, value, NAMEDATALEN);
if (strlen(value) < sizeof(options->replication_user))
strncpy(options->replication_user, value, sizeof(options->replication_user));
else
item_list_append(error_list,
_("value for \"replication_user\" must contain fewer than " STR(NAMEDATALEN) " characters"));
item_list_append_format(error_list,
_("value for \"replication_user\" must contain fewer than %lu characters"),
sizeof(options->replication_user));
}
else if (strcmp(name, "pg_bindir") == 0)
strncpy(options->pg_bindir, value, MAXPGPATH);
@@ -645,7 +665,7 @@ _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *
else
{
item_list_append(error_list,
_("value for \"connection_check_type\" must be \"ping\" or \"connection\"\n"));
_("value for \"connection_check_type\" must be \"ping\", \"connection\" or \"query\"\n"));
}
}
else if (strcmp(name, "primary_visibility_consensus") == 0)
@@ -654,6 +674,18 @@ _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *
strncpy(options->failover_validation_command, value, sizeof(options->failover_validation_command));
else if (strcmp(name, "election_rerun_interval") == 0)
options->election_rerun_interval = repmgr_atoi(value, name, error_list, 0);
else if (strcmp(name, "child_nodes_check_interval") == 0)
options->child_nodes_check_interval = repmgr_atoi(value, name, error_list, 1);
else if (strcmp(name, "child_nodes_disconnect_command") == 0)
snprintf(options->child_nodes_disconnect_command, sizeof(options->child_nodes_disconnect_command), "%s", value);
else if (strcmp(name, "child_nodes_disconnect_min_count") == 0)
options->child_nodes_disconnect_min_count = repmgr_atoi(value, name, error_list, -1);
else if (strcmp(name, "child_nodes_connected_min_count") == 0)
options->child_nodes_connected_min_count = repmgr_atoi(value, name, error_list, -1);
else if (strcmp(name, "child_nodes_connected_include_witness") == 0)
options->child_nodes_connected_include_witness = parse_bool(value, name, error_list);
else if (strcmp(name, "child_nodes_disconnect_timeout") == 0)
options->child_nodes_disconnect_timeout = repmgr_atoi(value, name, error_list, 0);
/* witness settings */
else if (strcmp(name, "witness_sync_interval") == 0)
@@ -828,15 +860,16 @@ _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *
conninfo_options = PQconninfoParse(options->conninfo, &conninfo_errmsg);
if (conninfo_options == NULL)
{
char error_message_buf[MAXLEN] = "";
PQExpBufferData error_message_buf;
initPQExpBuffer(&error_message_buf);
snprintf(error_message_buf,
MAXLEN,
_("\"conninfo\": %s (provided: \"%s\")"),
conninfo_errmsg,
options->conninfo);
appendPQExpBuffer(&error_message_buf,
_("\"conninfo\": %s (provided: \"%s\")"),
conninfo_errmsg,
options->conninfo);
item_list_append(error_list, error_message_buf);
item_list_append(error_list, error_message_buf.data);
termPQExpBuffer(&error_message_buf);
}
PQconninfoFree(conninfo_options);
@@ -1085,12 +1118,18 @@ parse_time_unit_parameter(const char *name, const char *value, char *dest, ItemL
* loop is started up; it therefore only needs to reload options required
* by repmgrd, which are as follows:
*
* changeable options (keep the list in "doc/repmgrd-configuration.sgml" in sync
* changeable options (keep the list in "doc/repmgrd-configuration.xml" in sync
* with these):
*
* - async_query_timeout
* - bdr_local_monitoring_only
* - bdr_recovery_timeout
* - child_nodes_check_interval
* - child_nodes_connected_min_count
* - child_nodes_connected_include_witness
* - child_nodes_disconnect_command
* - child_nodes_disconnect_min_count
* - child_nodes_disconnect_timeout
* - connection_check_type
* - conninfo
* - degraded_monitoring_timeout
@@ -1196,7 +1235,7 @@ reload_config(t_configuration_options *orig_options, t_server_type server_type)
return false;
}
if (strncmp(new_options.node_name, orig_options->node_name, MAXLEN) != 0)
if (strncmp(new_options.node_name, orig_options->node_name, sizeof(orig_options->node_name)) != 0)
{
log_warning(_("\"node_name\" cannot be changed, keeping current configuration"));
return false;
@@ -1238,8 +1277,95 @@ reload_config(t_configuration_options *orig_options, t_server_type server_type)
config_changed = true;
}
/* child_nodes_check_interval */
if (orig_options->child_nodes_check_interval != new_options.child_nodes_check_interval)
{
if (new_options.child_nodes_check_interval < 0)
{
log_error(_("\"child_nodes_check_interval\" must be \"0\" or greater; provided: \"%i\""),
new_options.child_nodes_check_interval);
}
else
{
orig_options->child_nodes_check_interval = new_options.child_nodes_check_interval;
log_info(_("\"child_nodes_check_interval\" is now \"%i\""), new_options.child_nodes_check_interval);
config_changed = true;
}
}
/* child_nodes_disconnect_command */
if (strncmp(orig_options->child_nodes_disconnect_command, new_options.child_nodes_disconnect_command, sizeof(orig_options->child_nodes_disconnect_command)) != 0)
{
snprintf(orig_options->child_nodes_disconnect_command, sizeof(orig_options->child_nodes_disconnect_command),
"%s", new_options.child_nodes_disconnect_command);
log_info(_("\"child_nodes_disconnect_command\" is now \"%s\""), new_options.child_nodes_disconnect_command);
config_changed = true;
}
/* child_nodes_disconnect_min_count */
if (orig_options->child_nodes_disconnect_min_count != new_options.child_nodes_disconnect_min_count)
{
if (new_options.child_nodes_disconnect_min_count < 0)
{
log_error(_("\"child_nodes_disconnect_min_count\" must be \"0\" or greater; provided: \"%i\""),
new_options.child_nodes_disconnect_min_count);
}
else
{
orig_options->child_nodes_disconnect_min_count = new_options.child_nodes_disconnect_min_count;
log_info(_("\"child_nodes_disconnect_min_count\" is now \"%i\""), new_options.child_nodes_disconnect_min_count);
config_changed = true;
}
}
/* child_nodes_connected_min_count */
if (orig_options->child_nodes_connected_min_count != new_options.child_nodes_connected_min_count)
{
if (new_options.child_nodes_connected_min_count < 0)
{
log_error(_("\"child_nodes_connected_min_count\" must be \"0\" or greater; provided: \"%i\""),
new_options.child_nodes_connected_min_count);
}
else
{
orig_options->child_nodes_connected_min_count = new_options.child_nodes_connected_min_count;
log_info(_("\"child_nodes_connected_min_count\" is now \"%i\""), new_options.child_nodes_connected_min_count);
config_changed = true;
}
}
/* child_nodes_connected_include_witness */
if (orig_options->child_nodes_connected_include_witness != new_options.child_nodes_connected_include_witness)
{
orig_options->child_nodes_connected_include_witness = new_options.child_nodes_connected_include_witness;
log_info(_("\"child_nodes_connected_include_witness\" is now \"%i\""), new_options.child_nodes_connected_include_witness);
config_changed = true;
}
/* child_nodes_disconnect_timeout */
if (orig_options->child_nodes_disconnect_timeout != new_options.child_nodes_disconnect_timeout)
{
if (new_options.child_nodes_disconnect_timeout < 0)
{
log_error(_("\"child_nodes_disconnect_timeout\" must be \"0\" or greater; provided: \"%i\""),
new_options.child_nodes_disconnect_timeout);
}
else
{
orig_options->child_nodes_disconnect_timeout = new_options.child_nodes_disconnect_timeout;
log_info(_("\"child_nodes_disconnect_timeout\" is now \"%i\""), new_options.child_nodes_disconnect_timeout);
config_changed = true;
}
}
/* conninfo */
if (strncmp(orig_options->conninfo, new_options.conninfo, MAXLEN) != 0)
if (strncmp(orig_options->conninfo, new_options.conninfo, sizeof(orig_options->conninfo)) != 0)
{
/* Test conninfo string works */
conn = establish_db_connection(new_options.conninfo, false);
@@ -1249,11 +1375,14 @@ reload_config(t_configuration_options *orig_options, t_server_type server_type)
}
else
{
strncpy(orig_options->conninfo, new_options.conninfo, MAXLEN);
snprintf(orig_options->conninfo, sizeof(orig_options->conninfo),
"%s", new_options.conninfo);
log_info(_("\"conninfo\" is now \"%s\""), new_options.conninfo);
}
PQfinish(conn);
config_changed = true;
}
/* degraded_monitoring_timeout */
@@ -1266,18 +1395,20 @@ reload_config(t_configuration_options *orig_options, t_server_type server_type)
}
/* event_notification_command */
if (strncmp(orig_options->event_notification_command, new_options.event_notification_command, MAXLEN) != 0)
if (strncmp(orig_options->event_notification_command, new_options.event_notification_command, sizeof(orig_options->event_notification_command)) != 0)
{
strncpy(orig_options->event_notification_command, new_options.event_notification_command, MAXLEN);
snprintf(orig_options->event_notification_command, sizeof(orig_options->event_notification_command),
"%s", new_options.event_notification_command);
log_info(_("\"event_notification_command\" is now \"%s\""), new_options.event_notification_command);
config_changed = true;
}
/* event_notifications */
if (strncmp(orig_options->event_notifications_orig, new_options.event_notifications_orig, MAXLEN) != 0)
if (strncmp(orig_options->event_notifications_orig, new_options.event_notifications_orig, sizeof(orig_options->event_notifications_orig)) != 0)
{
strncpy(orig_options->event_notifications_orig, new_options.event_notifications_orig, MAXLEN);
snprintf(orig_options->event_notifications_orig, sizeof(orig_options->event_notifications_orig),
"%s", new_options.event_notifications_orig);
log_info(_("\"event_notifications\" is now \"%s\""), new_options.event_notifications_orig);
clear_event_notification_list(orig_options);
@@ -1295,9 +1426,10 @@ reload_config(t_configuration_options *orig_options, t_server_type server_type)
}
/* follow_command */
if (strncmp(orig_options->follow_command, new_options.follow_command, MAXLEN) != 0)
if (strncmp(orig_options->follow_command, new_options.follow_command, sizeof(orig_options->follow_command)) != 0)
{
strncpy(orig_options->follow_command, new_options.follow_command, MAXLEN);
snprintf(orig_options->follow_command, sizeof(orig_options->follow_command),
"%s", new_options.follow_command);
log_info(_("\"follow_command\" is now \"%s\""), new_options.follow_command);
config_changed = true;
@@ -1331,9 +1463,10 @@ reload_config(t_configuration_options *orig_options, t_server_type server_type)
}
/* promote_command */
if (strncmp(orig_options->promote_command, new_options.promote_command, MAXLEN) != 0)
if (strncmp(orig_options->promote_command, new_options.promote_command, sizeof(orig_options->promote_command)) != 0)
{
strncpy(orig_options->promote_command, new_options.promote_command, MAXLEN);
snprintf(orig_options->promote_command, sizeof(orig_options->promote_command),
"%s", new_options.promote_command);
log_info(_("\"promote_command\" is now \"%s\""), new_options.promote_command);
config_changed = true;
@@ -1398,7 +1531,7 @@ reload_config(t_configuration_options *orig_options, t_server_type server_type)
{
orig_options->connection_check_type = new_options.connection_check_type;
log_info(_("\"connection_check_type\" is now \"%s\""),
new_options.connection_check_type == CHECK_PING ? "ping" : "connection");
print_connection_check_type(new_options.connection_check_type));
config_changed = true;
}
@@ -1412,9 +1545,10 @@ reload_config(t_configuration_options *orig_options, t_server_type server_type)
}
/* failover_validation_command */
if (strncmp(orig_options->failover_validation_command, new_options.failover_validation_command, MAXPGPATH) != 0)
if (strncmp(orig_options->failover_validation_command, new_options.failover_validation_command, sizeof(orig_options->failover_validation_command)) != 0)
{
strncpy(orig_options->failover_validation_command, new_options.failover_validation_command, MAXPGPATH);
snprintf(orig_options->failover_validation_command, sizeof(orig_options->failover_validation_command),
"%s", new_options.failover_validation_command);
log_info(_("\"failover_validation_command\" is now \"%s\""), new_options.failover_validation_command);
config_changed = true;
@@ -1425,18 +1559,20 @@ reload_config(t_configuration_options *orig_options, t_server_type server_type)
*/
/* log_facility */
if (strncmp(orig_options->log_facility, new_options.log_facility, MAXLEN) != 0)
if (strncmp(orig_options->log_facility, new_options.log_facility, sizeof(orig_options->log_facility)) != 0)
{
strncpy(orig_options->log_facility, new_options.log_facility, MAXLEN);
snprintf(orig_options->log_facility, sizeof(orig_options->log_facility),
"%s", new_options.log_facility);
log_info(_("\"log_facility\" is now \"%s\""), new_options.log_facility);
log_config_changed = true;
}
/* log_file */
if (strncmp(orig_options->log_file, new_options.log_file, MAXLEN) != 0)
if (strncmp(orig_options->log_file, new_options.log_file, sizeof(orig_options->log_file)) != 0)
{
strncpy(orig_options->log_file, new_options.log_file, MAXLEN);
snprintf(orig_options->log_file, sizeof(orig_options->log_file),
"%s", new_options.log_file);
log_info(_("\"log_file\" is now \"%s\""), new_options.log_file);
log_config_changed = true;
@@ -1444,9 +1580,10 @@ reload_config(t_configuration_options *orig_options, t_server_type server_type)
/* log_level */
if (strncmp(orig_options->log_level, new_options.log_level, MAXLEN) != 0)
if (strncmp(orig_options->log_level, new_options.log_level, sizeof(orig_options->log_level)) != 0)
{
strncpy(orig_options->log_level, new_options.log_level, MAXLEN);
snprintf(orig_options->log_level, sizeof(orig_options->log_level),
"%s", new_options.log_level);
log_info(_("\"log_level\" is now \"%s\""), new_options.log_level);
log_config_changed = true;
@@ -1940,18 +2077,10 @@ parse_pg_basebackup_options(const char *pg_basebackup_options, t_basebackup_opti
struct option *long_options = NULL;
/* We're only interested in these options */
static struct option long_options_9[] =
{
{"slot", required_argument, NULL, 'S'},
{"xlog-method", required_argument, NULL, 'X'},
{NULL, 0, NULL, 0}
};
/*
* From PostgreSQL 10, --xlog-method is renamed --wal-method and there's
* also --no-slot, which we'll want to consider.
* We're only interested in these options.
*/
static struct option long_options_10[] =
{
{"slot", required_argument, NULL, 'S'},
@@ -1960,6 +2089,17 @@ parse_pg_basebackup_options(const char *pg_basebackup_options, t_basebackup_opti
{NULL, 0, NULL, 0}
};
/*
* Pre-PostgreSQL 10 options
*/
static struct option long_options_legacy[] =
{
{"slot", required_argument, NULL, 'S'},
{"xlog-method", required_argument, NULL, 'X'},
{NULL, 0, NULL, 0}
};
/* Don't attempt to tokenise an empty string */
if (!strlen(pg_basebackup_options))
return backup_options_ok;
@@ -1967,7 +2107,7 @@ parse_pg_basebackup_options(const char *pg_basebackup_options, t_basebackup_opti
if (server_version_num >= 100000)
long_options = long_options_10;
else
long_options = long_options_9;
long_options = long_options_legacy;
argc_item = parse_output_to_argv(pg_basebackup_options, &argv_array);
@@ -1986,7 +2126,7 @@ parse_pg_basebackup_options(const char *pg_basebackup_options, t_basebackup_opti
strncpy(backup_options->slot, optarg, MAXLEN);
break;
case 'X':
strncpy(backup_options->xlog_method, optarg, MAXLEN);
strncpy(backup_options->wal_method, optarg, MAXLEN);
break;
case 1:
backup_options->no_slot = true;
@@ -2017,3 +2157,21 @@ parse_pg_basebackup_options(const char *pg_basebackup_options, t_basebackup_opti
return backup_options_ok;
}
const char *
print_connection_check_type(ConnectionCheckType type)
{
switch (type)
{
case CHECK_PING:
return "ping";
case CHECK_QUERY:
return "query";
case CHECK_CONNECTION:
return "connection";
}
/* should never reach here */
return "UNKNOWN";
}
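
Several hunks above replace `strncpy(dest, src, MAXLEN)` with `snprintf(dest, sizeof(dest), "%s", src)`. The sketch below illustrates why that pattern is generally safer, assuming the destination is a fixed-size `char` array. `snprintf` always NUL-terminates when the size is non-zero, and its return value reveals truncation, whereas `strncpy` leaves the buffer unterminated when the source is too long. The helper name `demo_copy_setting` is hypothetical and does not appear in the diff.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Copy "src" into a fixed-size buffer, always NUL-terminating the result.
 * Returns true if the whole string fitted, false if it was truncated
 * (or if snprintf reported an error).
 */
bool
demo_copy_setting(char *dest, size_t dest_size, const char *src)
{
    int written = snprintf(dest, dest_size, "%s", src);

    return written >= 0 && (size_t) written < dest_size;
}
```

A caller would pass `sizeof(buffer)` for a real array, as the diff does, so that the bound follows the field's declared size rather than an unrelated constant.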

View File

@@ -76,7 +76,7 @@ typedef struct
{
/* node information */
int node_id;
char node_name[MAXLEN];
char node_name[NAMEDATALEN];
char conninfo[MAXLEN];
char replication_user[NAMEDATALEN];
char data_directory[MAXPGPATH];
@@ -88,7 +88,7 @@ typedef struct
/* log settings */
char log_level[MAXLEN];
char log_facility[MAXLEN];
char log_file[MAXLEN];
char log_file[MAXPGPATH];
int log_status_interval;
/* standby clone settings */
@@ -148,6 +148,12 @@ typedef struct
bool primary_visibility_consensus;
char failover_validation_command[MAXPGPATH];
int election_rerun_interval;
int child_nodes_check_interval;
int child_nodes_disconnect_min_count;
int child_nodes_connected_min_count;
bool child_nodes_connected_include_witness;
int child_nodes_disconnect_timeout;
char child_nodes_disconnect_command[MAXPGPATH];
/* BDR settings */
bool bdr_local_monitoring_only;
@@ -221,6 +227,11 @@ typedef struct
DEFAULT_PRIMARY_NOTIFICATION_TIMEOUT, \
-1, "", false, DEFAULT_SIBLING_NODES_DISCONNECT_TIMEOUT, \
CHECK_PING, true, "", DEFAULT_ELECTION_RERUN_INTERVAL, \
DEFAULT_CHILD_NODES_CHECK_INTERVAL, \
DEFAULT_CHILD_NODES_DISCONNECT_MIN_COUNT, \
DEFAULT_CHILD_NODES_CONNECTED_MIN_COUNT, \
DEFAULT_CHILD_NODES_CONNECTED_INCLUDE_WITNESS, \
DEFAULT_CHILD_NODES_DISCONNECT_TIMEOUT, "", \
/* BDR settings */ \
false, DEFAULT_BDR_RECOVERY_TIMEOUT, \
/* service settings */ \
@@ -242,7 +253,7 @@ typedef struct
typedef struct
{
char slot[MAXLEN];
char xlog_method[MAXLEN];
char wal_method[MAXLEN];
bool no_slot; /* from PostgreSQL 10 */
} t_basebackup_options;
@@ -329,5 +340,6 @@ void free_parsed_argv(char ***argv_array);
/* called by repmgr-client and repmgrd */
void exit_with_cli_errors(ItemList *error_list, const char *repmgr_command);
void print_item_list(ItemList *item_list);
const char *print_connection_check_type(ConnectionCheckType type);
#endif /* _REPMGR_CONFIGFILE_H_ */
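
The diff above adds `print_connection_check_type()` and extends the accepted `connection_check_type` values to "ping", "connection" and "query". As a rough illustration of how parsing and printing such a setting can be kept symmetrical, here is a self-contained sketch. The `Demo`/`demo_` names are hypothetical stand-ins and are not repmgr's own types or functions.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the connection check type enum. */
typedef enum
{
    DEMO_CHECK_PING,
    DEMO_CHECK_QUERY,
    DEMO_CHECK_CONNECTION
} DemoConnectionCheckType;

/*
 * Map a configuration value to the enum. Returns false, leaving "*type"
 * untouched, if the value is not one of the accepted strings.
 */
bool
demo_parse_connection_check_type(const char *value, DemoConnectionCheckType *type)
{
    if (strcmp(value, "ping") == 0)
        *type = DEMO_CHECK_PING;
    else if (strcmp(value, "query") == 0)
        *type = DEMO_CHECK_QUERY;
    else if (strcmp(value, "connection") == 0)
        *type = DEMO_CHECK_CONNECTION;
    else
        return false;

    return true;
}

/* Map the enum back to the string used in the configuration file. */
const char *
demo_print_connection_check_type(DemoConnectionCheckType type)
{
    switch (type)
    {
        case DEMO_CHECK_PING:
            return "ping";
        case DEMO_CHECK_QUERY:
            return "query";
        case DEMO_CHECK_CONNECTION:
            return "connection";
    }

    /* should never reach here */
    return "UNKNOWN";
}
```

Using a `switch` without a `default` branch lets many compilers warn when a new enum value is added but not handled, which is one reason to prefer a helper function over an inline ternary.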

configure

@@ -1,6 +1,6 @@
#! /bin/sh
# Guess values for system-dependent variables and create Makefiles.
# Generated by GNU Autoconf 2.69 for repmgr 4.3.
# Generated by GNU Autoconf 2.69 for repmgr 4.4.
#
# Report bugs to <repmgr@googlegroups.com>.
#
@@ -582,8 +582,8 @@ MAKEFLAGS=
# Identity of this package.
PACKAGE_NAME='repmgr'
PACKAGE_TARNAME='repmgr'
PACKAGE_VERSION='4.3'
PACKAGE_STRING='repmgr 4.3'
PACKAGE_VERSION='4.4'
PACKAGE_STRING='repmgr 4.4'
PACKAGE_BUGREPORT='repmgr@googlegroups.com'
PACKAGE_URL='https://repmgr.org/'
@@ -1178,7 +1178,7 @@ if test "$ac_init_help" = "long"; then
# Omit some internal or obsolete options to make the list less imposing.
# This message is too long to be a string in the A/UX 3.1 sh.
cat <<_ACEOF
\`configure' configures repmgr 4.3 to adapt to many kinds of systems.
\`configure' configures repmgr 4.4 to adapt to many kinds of systems.
Usage: $0 [OPTION]... [VAR=VALUE]...
@@ -1239,7 +1239,7 @@ fi
if test -n "$ac_init_help"; then
case $ac_init_help in
short | recursive ) echo "Configuration of repmgr 4.3:";;
short | recursive ) echo "Configuration of repmgr 4.4:";;
esac
cat <<\_ACEOF
@@ -1313,7 +1313,7 @@ fi
test -n "$ac_init_help" && exit $ac_status
if $ac_init_version; then
cat <<\_ACEOF
repmgr configure 4.3
repmgr configure 4.4
generated by GNU Autoconf 2.69
Copyright (C) 2012 Free Software Foundation, Inc.
@@ -1332,7 +1332,7 @@ cat >config.log <<_ACEOF
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.
It was created by repmgr $as_me 4.3, which was
It was created by repmgr $as_me 4.4, which was
generated by GNU Autoconf 2.69. Invocation command line was
$ $0 $@
@@ -1851,8 +1851,6 @@ ac_config_files="$ac_config_files Makefile"
ac_config_files="$ac_config_files Makefile.global"
ac_config_files="$ac_config_files doc/Makefile"
cat >confcache <<\_ACEOF
# This file is a shell script that caches the results of configure
# tests run on this system so they can be shared between configure
@@ -2359,7 +2357,7 @@ cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
# report actual input values of CONFIG_FILES etc. instead of their
# values after options handling.
ac_log="
This file was extended by repmgr $as_me 4.3, which was
This file was extended by repmgr $as_me 4.4, which was
generated by GNU Autoconf 2.69. Invocation command line was
CONFIG_FILES = $CONFIG_FILES
@@ -2422,7 +2420,7 @@ _ACEOF
cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
ac_cs_config="`$as_echo "$ac_configure_args" | sed 's/^ //; s/[\\""\`\$]/\\\\&/g'`"
ac_cs_version="\\
repmgr config.status 4.3
repmgr config.status 4.4
configured by $0, generated by GNU Autoconf 2.69,
with options \\"\$ac_cs_config\\"
@@ -2546,7 +2544,6 @@ do
"config.h") CONFIG_HEADERS="$CONFIG_HEADERS config.h" ;;
"Makefile") CONFIG_FILES="$CONFIG_FILES Makefile" ;;
"Makefile.global") CONFIG_FILES="$CONFIG_FILES Makefile.global" ;;
"doc/Makefile") CONFIG_FILES="$CONFIG_FILES doc/Makefile" ;;
*) as_fn_error $? "invalid argument: \`$ac_config_target'" "$LINENO" 5;;
esac


@@ -1,4 +1,4 @@
AC_INIT([repmgr], [4.3], [repmgr@googlegroups.com], [repmgr], [https://repmgr.org/])
AC_INIT([repmgr], [4.4], [repmgr@googlegroups.com], [repmgr], [https://repmgr.org/])
AC_COPYRIGHT([Copyright (c) 2010-2019, 2ndQuadrant Ltd.])
@@ -59,6 +59,5 @@ AC_SUBST(vpath_build)
AC_CONFIG_FILES([Makefile])
AC_CONFIG_FILES([Makefile.global])
AC_CONFIG_FILES([doc/Makefile])
AC_OUTPUT


@@ -301,6 +301,8 @@ get_controlfile(const char *DataDir)
ControlFilePath);
log_detail("%s", strerror(errno));
close(fd);
return control_file_info;
}
@@ -308,7 +310,18 @@ get_controlfile(const char *DataDir)
control_file_info->control_file_processed = true;
if (version_num >= 110000)
if (version_num >= 120000)
{
ControlFileData12 *ptr = (struct ControlFileData12 *)ControlFileDataPtr;
control_file_info->system_identifier = ptr->system_identifier;
control_file_info->state = ptr->state;
control_file_info->checkPoint = ptr->checkPoint;
control_file_info->data_checksum_version = ptr->data_checksum_version;
control_file_info->timeline = ptr->checkPointCopy.ThisTimeLineID;
control_file_info->minRecoveryPointTLI = ptr->minRecoveryPointTLI;
control_file_info->minRecoveryPoint = ptr->minRecoveryPoint;
}
else if (version_num >= 110000)
{
ControlFileData11 *ptr = (struct ControlFileData11 *)ControlFileDataPtr;
control_file_info->system_identifier = ptr->system_identifier;


@@ -333,6 +333,72 @@ typedef struct ControlFileData11
} ControlFileData11;
/*
* Following field added in Pg12:
*
* int max_wal_senders;
*
* Following fields removed:
*
* uint32 nextXidEpoch;
* TransactionId nextXid;
*
* and replaced by:
*
* FullTransactionId nextFullXid;
*/
typedef struct ControlFileData12
{
uint64 system_identifier;
uint32 pg_control_version; /* PG_CONTROL_VERSION */
uint32 catalog_version_no; /* see catversion.h */
DBState state; /* see enum above */
pg_time_t time; /* time stamp of last pg_control update */
XLogRecPtr checkPoint; /* last check point record ptr */
CheckPoint checkPointCopy; /* copy of last check point record */
XLogRecPtr unloggedLSN; /* current fake LSN value, for unlogged rels */
XLogRecPtr minRecoveryPoint;
TimeLineID minRecoveryPointTLI;
XLogRecPtr backupStartPoint;
XLogRecPtr backupEndPoint;
bool backupEndRequired;
int wal_level;
bool wal_log_hints;
int MaxConnections;
int max_worker_processes;
int max_wal_senders;
int max_prepared_xacts;
int max_locks_per_xact;
bool track_commit_timestamp;
uint32 maxAlign; /* alignment requirement for tuples */
double floatFormat; /* constant 1234567.0 */
uint32 blcksz; /* data block size for this DB */
uint32 relseg_size; /* blocks per segment of large relation */
uint32 xlog_blcksz; /* block size within WAL files */
uint32 xlog_seg_size; /* size of each WAL segment */
uint32 nameDataLen; /* catalog name field width */
uint32 indexMaxKeys; /* max number of columns in an index */
uint32 toast_max_chunk_size; /* chunk size in TOAST tables */
uint32 loblksize; /* chunk size in pg_largeobject */
bool float4ByVal; /* float4 pass-by-value? */
bool float8ByVal; /* float8, int8, etc pass-by-value? */
uint32 data_checksum_version;
} ControlFileData12;
extern int get_pg_version(const char *data_directory, char *version_string);
extern DBState get_db_state(const char *data_directory);
extern const char *describe_db_state(DBState state);

488
dbutils.c

@@ -43,6 +43,8 @@ int bdr_version_num = UNKNOWN_BDR_VERSION_NUM;
static void log_db_error(PGconn *conn, const char *query_text, const char *fmt,...)
__attribute__((format(PG_PRINTF_ATTRIBUTE, 3, 4)));
static bool _is_server_available(const char *conninfo, bool quiet);
static PGconn *_establish_db_connection(const char *conninfo,
const bool exit_on_error,
const bool log_notice,
@@ -51,6 +53,8 @@ static PGconn *_establish_db_connection(const char *conninfo,
static PGconn *_get_primary_connection(PGconn *standby_conn, int *primary_id, char *primary_conninfo_out, bool quiet);
static bool _set_config(PGconn *conn, const char *config_param, const char *sqlquery);
static bool _get_pg_setting(PGconn *conn, const char *setting, char *str_output, int *int_output);
static RecordStatus _get_node_record(PGconn *conn, char *sqlquery, t_node_info *node_info, bool init_defaults);
static void _populate_node_record(PGresult *res, t_node_info *node_info, int row, bool init_defaults);
@@ -67,16 +71,19 @@ void
log_db_error(PGconn *conn, const char *query_text, const char *fmt,...)
{
va_list ap;
char buf[MAXLEN];
int retval;
va_start(ap, fmt);
log_error(fmt, ap);
retval = vsnprintf(buf, MAXLEN, fmt, ap);
va_end(ap);
if (conn != NULL && PQstatus(conn) == CONNECTION_OK)
if (retval < MAXLEN)
log_error("%s", buf);
if (conn != NULL)
{
log_detail("%s", PQerrorMessage(conn));
log_detail("\n%s", PQerrorMessage(conn));
}
if (query_text != NULL)
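
Note on the `log_db_error()` hunk above: a `va_list` cannot be passed to a function that itself takes `...`, so the new code first expands the format into a local buffer with `vsnprintf()` and only then logs it via `"%s"`. The following is a minimal sketch of that pattern, not repmgr code; the helper name `format_message()` and its return convention are assumptions made for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of repmgr): expand a printf-style format
 * into "out". Returns 1 if the whole message fitted, 0 if it was
 * truncated or a formatting error occurred.
 */
int
format_message(char *out, size_t out_len, const char *fmt, ...)
{
    va_list ap;
    int     retval;

    assert(out != NULL && out_len > 0);

    va_start(ap, fmt);
    /* vsnprintf() accepts a va_list; a "..." function does not */
    retval = vsnprintf(out, out_len, fmt, ap);
    va_end(ap);

    /* vsnprintf() returns the length the full message would have had */
    return (retval >= 0 && (size_t) retval < out_len) ? 1 : 0;
}
```

A caller would then emit the buffer with a fixed `"%s"` format, as the hunk does, so that any `%` characters in the expanded message are not interpreted a second time.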
@@ -190,13 +197,13 @@ _establish_db_connection(const char *conninfo, const bool exit_on_error, const b
{
if (log_notice)
{
log_notice(_("connection to database failed:\n %s"),
PQerrorMessage(conn));
log_notice(_("connection to database failed"));
log_detail("\n%s", PQerrorMessage(conn));
}
else
{
log_error(_("connection to database failed:\n %s"),
PQerrorMessage(conn));
log_error(_("connection to database failed"));
log_detail("\n%s", PQerrorMessage(conn));
}
log_detail(_("attempted to connect using:\n %s"),
connection_string);
@@ -287,8 +294,9 @@ establish_db_connection_by_params(t_conninfo_param_list *param_list,
/* Check to see that the backend connection was successfully made */
if ((PQstatus(conn) != CONNECTION_OK))
{
log_error(_("connection to database failed:\n %s"),
PQerrorMessage(conn));
log_error(_("connection to database failed"));
log_detail("\n%s", PQerrorMessage(conn));
if (exit_on_error)
{
PQfinish(conn);
@@ -338,7 +346,9 @@ is_superuser_connection(PGconn *conn, t_connection_user *userinfo)
if (userinfo != NULL)
{
strncpy(userinfo->username, current_user, MAXLEN);
snprintf(userinfo->username,
sizeof(userinfo->username),
"%s", current_user);
userinfo->is_superuser = is_superuser;
}
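
Note on the `strncpy()` to `snprintf()` replacements in this file: `strncpy(dst, src, n)` leaves `dst` without a terminating NUL when `src` holds `n` or more characters, whereas `snprintf(dst, n, "%s", src)` always terminates the buffer for `n > 0`. A minimal sketch of the bounded copy, assuming a hypothetical helper `copy_bounded()` that is not part of repmgr:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of repmgr): copy "src" into "dst",
 * always NUL-terminating. Returns 1 if "src" fitted completely,
 * 0 if it had to be truncated.
 */
int
copy_bounded(char *dst, size_t dst_len, const char *src)
{
    int written;

    assert(dst != NULL && dst_len > 0 && src != NULL);

    /* writes at most dst_len - 1 characters plus the terminating NUL */
    written = snprintf(dst, dst_len, "%s", src);

    return (written >= 0 && (size_t) written < dst_len) ? 1 : 0;
}
```

Passing `sizeof(buffer)` rather than a separate constant, as several hunks here do, keeps the bound correct if the field size changes later.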
@@ -987,52 +997,37 @@ guc_set(PGconn *conn, const char *parameter, const char *op,
return retval;
}
/**
* Just like guc_set except with an extra parameter containing the name of
* the pg datatype so that the comparison can be done properly.
*/
int
guc_set_typed(PGconn *conn, const char *parameter, const char *op,
const char *value, const char *datatype)
bool
get_pg_setting(PGconn *conn, const char *setting, char *output)
{
PQExpBufferData query;
PGresult *res = NULL;
int retval = 1;
bool success = _get_pg_setting(conn, setting, output, NULL);
char *escaped_parameter = escape_string(conn, parameter);
char *escaped_value = escape_string(conn, value);
initPQExpBuffer(&query);
appendPQExpBuffer(&query,
"SELECT true FROM pg_catalog.pg_settings "
" WHERE name = '%s' AND setting::%s %s '%s'::%s",
parameter, datatype, op, value, datatype);
log_verbose(LOG_DEBUG, "guc_set_typed():\n%s", query.data);
res = PQexec(conn, query.data);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
if (success == true)
{
log_db_error(conn, query.data, _("guc_set_typed(): unable to execute query"));
retval = -1;
}
else if (PQntuples(res) == 0)
{
retval = 0;
log_verbose(LOG_DEBUG, _("get_pg_setting(): returned value is \"%s\""), output);
}
pfree(escaped_parameter);
pfree(escaped_value);
termPQExpBuffer(&query);
PQclear(res);
return retval;
return success;
}
bool
get_pg_setting(PGconn *conn, const char *setting, char *output)
get_pg_setting_int(PGconn *conn, const char *setting, int *output)
{
bool success = _get_pg_setting(conn, setting, NULL, output);
if (success == true)
{
log_verbose(LOG_DEBUG, _("get_pg_setting_int(): returned value is \"%i\""), *output);
}
return success;
}
bool
_get_pg_setting(PGconn *conn, const char *setting, char *str_output, int *int_output)
{
PQExpBufferData query;
PGresult *res = NULL;
@@ -1073,7 +1068,11 @@ get_pg_setting(PGconn *conn, const char *setting, char *output)
{
if (strcmp(PQgetvalue(res, i, 0), setting) == 0)
{
strncpy(output, PQgetvalue(res, i, 1), MAXLEN);
if (str_output != NULL)
snprintf(str_output, MAXLEN, "%s", PQgetvalue(res, i, 1));
else if (int_output != NULL)
*int_output = atoi(PQgetvalue(res, i, 1));
success = true;
break;
}
@@ -1084,10 +1083,6 @@ get_pg_setting(PGconn *conn, const char *setting, char *output)
}
}
if (success == true)
{
log_verbose(LOG_DEBUG, _("get_pg_setting(): returned value is \"%s\""), output);
}
termPQExpBuffer(&query);
PQclear(res);
@@ -1096,12 +1091,13 @@ get_pg_setting(PGconn *conn, const char *setting, char *output)
}
bool
alter_system_int(PGconn *conn, const char *name, int value)
{
PQExpBufferData query;
PGresult *res = NULL;
bool success = false;
bool success = true;
initPQExpBuffer(&query);
appendPQExpBuffer(&query,
@@ -1117,7 +1113,6 @@ alter_system_int(PGconn *conn, const char *name, int value)
success = false;
}
termPQExpBuffer(&query);
PQclear(res);
@@ -1174,7 +1169,7 @@ get_cluster_size(PGconn *conn, char *size)
}
else
{
strncpy(size, PQgetvalue(res, 0, 0), MAXLEN);
snprintf(size, MAXLEN, "%s", PQgetvalue(res, 0, 0));
}
termPQExpBuffer(&query);
@@ -1222,7 +1217,7 @@ get_server_version(PGconn *conn, char *server_version_buf)
* first space.
*/
strncpy(_server_version_buf, PQgetvalue(res, 0, 1), MAXVERSIONSTR);
snprintf(_server_version_buf, MAXVERSIONSTR, "%s", PQgetvalue(res, 0, 1));
for (i = 0; i < MAXVERSIONSTR; i++)
{
@@ -1349,7 +1344,8 @@ _get_primary_connection(PGconn *conn,
/* initialize with the values of the current node being processed */
node_id = atoi(PQgetvalue(res, i, 0));
strncpy(remote_conninfo, PQgetvalue(res, i, 1), MAXCONNINFO);
snprintf(remote_conninfo, MAXCONNINFO, "%s", PQgetvalue(res, i, 1));
log_verbose(LOG_INFO,
_("checking if node %i is primary"),
node_id);
@@ -1513,10 +1509,10 @@ get_ready_archive_files(PGconn *conn, const char *data_directory)
while ((arcdir_ent = readdir(arcdir)) != NULL)
{
struct stat statbuf;
char file_path[MAXPGPATH] = "";
char file_path[MAXPGPATH + sizeof(arcdir_ent->d_name)];
int basenamelen = 0;
snprintf(file_path, MAXPGPATH,
snprintf(file_path, sizeof(file_path),
"%s/%s",
archive_status_dir,
arcdir_ent->d_name);
@@ -1543,12 +1539,12 @@ get_ready_archive_files(PGconn *conn, const char *data_directory)
}
bool
identify_system(PGconn *repl_conn, t_system_identification *identification)
{
PGresult *res = NULL;
/* semicolon required here */
res = PQexec(repl_conn, "IDENTIFY_SYSTEM;");
if (PQresultStatus(res) != PGRES_TUPLES_OK || !PQntuples(res))
@@ -1567,6 +1563,44 @@ identify_system(PGconn *repl_conn, t_system_identification *identification)
return true;
}
/*
* Return the system identifier by querying pg_control_system().
*
* Note there is a similar function in controldata.c ("get_system_identifier()")
* which reads the control file.
*/
uint64
system_identifier(PGconn *conn)
{
uint64 system_identifier = UNKNOWN_SYSTEM_IDENTIFIER;
PGresult *res = NULL;
/*
* pg_control_system() was introduced in PostgreSQL 9.6
*/
if (PQserverVersion(conn) < 90600)
{
return UNKNOWN_SYSTEM_IDENTIFIER;
}
res = PQexec(conn, "SELECT system_identifier FROM pg_catalog.pg_control_system()");
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
log_db_error(conn, NULL, _("system_identifier(): unable to query pg_control_system()"));
}
else
{
system_identifier = atol(PQgetvalue(res, 0, 0));
}
PQclear(res);
return system_identifier;
}
TimeLineHistoryEntry *
get_timeline_history(PGconn *repl_conn, TimeLineID tli)
{
@@ -1664,6 +1698,46 @@ get_timeline_history(PGconn *repl_conn, TimeLineID tli)
}
bool
get_child_nodes(PGconn *conn, int node_id, NodeInfoList *node_list)
{
PQExpBufferData query;
PGresult *res = NULL;
bool success = true;
initPQExpBuffer(&query);
appendPQExpBuffer(&query,
" SELECT n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo, n.repluser, "
" n.slot_name, n.location, n.priority, n.active, n.config_file, "
" '' AS upstream_node_name, "
" CASE WHEN sr.application_name IS NULL THEN FALSE ELSE TRUE END AS attached "
" FROM repmgr.nodes n "
" LEFT JOIN pg_catalog.pg_stat_replication sr "
" ON sr.application_name = n.node_name "
" WHERE n.upstream_node_id = %i ",
node_id);
log_verbose(LOG_DEBUG, "get_child_nodes():\n%s", query.data);
res = PQexec(conn, query.data);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
log_db_error(conn, query.data, _("get_child_nodes(): unable to execute query"));
success = false;
}
termPQExpBuffer(&query);
/* this will return an empty list if there was an error executing the query */
_populate_node_records(res, node_list);
PQclear(res);
return success;
}
/* =============================== */
/* repmgrd shared memory functions */
/* =============================== */
@@ -1932,6 +2006,61 @@ get_wal_receiver_pid(PGconn *conn)
return wal_receiver_pid;
}
int
repmgrd_get_upstream_node_id(PGconn *conn)
{
PGresult *res = NULL;
int upstream_node_id = UNKNOWN_NODE_ID;
const char *sqlquery = "SELECT repmgr.get_upstream_node_id()";
res = PQexec(conn, sqlquery);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
log_db_error(conn, sqlquery, _("repmgrd_get_upstream_node_id(): unable to execute query"));
}
else if (!PQgetisnull(res, 0, 0))
{
upstream_node_id = atoi(PQgetvalue(res, 0, 0));
}
PQclear(res);
return upstream_node_id;
}
bool
repmgrd_set_upstream_node_id(PGconn *conn, int node_id)
{
PQExpBufferData query;
PGresult *res = NULL;
bool success = true;
initPQExpBuffer(&query);
appendPQExpBuffer(&query,
" SELECT repmgr.set_upstream_node_id(%i) ",
node_id);
log_verbose(LOG_DEBUG, "repmgrd_set_upstream_node_id():\n %s", query.data);
res = PQexec(conn, query.data);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
log_db_error(conn, query.data,
_("repmgrd_set_upstream_node_id(): unable to set upstream node ID (provided value: %i)"), node_id);
success = false;
}
termPQExpBuffer(&query);
PQclear(res);
return success;
}
/* ================ */
/* result functions */
/* ================ */
@@ -1963,9 +2092,9 @@ get_repmgr_extension_status(PGconn *conn, t_extension_versions *extversions)
appendPQExpBufferStr(&query,
" SELECT ae.name, e.extname, "
" ae.default_version, "
" (((ae.default_version::NUMERIC::INT) * 10000) + (ae.default_version::NUMERIC - ae.default_version::NUMERIC::INT) * 1000)::INT AS available, "
" (((FLOOR(ae.default_version::NUMERIC)::INT) * 10000) + (ae.default_version::NUMERIC - FLOOR(ae.default_version::NUMERIC)::INT) * 1000)::INT AS available, "
" ae.installed_version, "
" (((ae.installed_version::NUMERIC::INT) * 10000) + (ae.installed_version::NUMERIC - ae.installed_version::NUMERIC::INT) * 1000)::INT AS installed "
" (((FLOOR(ae.installed_version::NUMERIC)::INT) * 10000) + (ae.installed_version::NUMERIC - FLOOR(ae.installed_version::NUMERIC)::INT) * 1000)::INT AS installed "
" FROM pg_catalog.pg_available_extensions ae "
"LEFT JOIN pg_catalog.pg_extension e "
" ON e.extname=ae.name "
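
Note on the `FLOOR()` change above: in PostgreSQL a cast from `NUMERIC` to `INT` rounds to the nearest integer, so for a version whose minor part is .5 or higher the old expression would take the wrong major number. The sketch below reproduces both calculations in C under stated assumptions: the function names are hypothetical, the version is modelled as a non-negative `double`, and 4.6 is a made-up version used only to show the difference.

```c
#include <assert.h>

/* Round a non-negative value to the nearest integer, as NUMERIC::INT does */
static long
round_nonneg(double value)
{
    return (long) (value + 0.5);
}

/* Old expression: the major part comes from a rounding cast */
long
version_num_rounded(double version)
{
    long major;

    assert(version >= 0.0);
    major = round_nonneg(version);
    return round_nonneg((double) major * 10000
                        + (version - (double) major) * 1000);
}

/* New expression: the major part comes from FLOOR() */
long
version_num_floored(double version)
{
    long major;

    assert(version >= 0.0);
    major = (long) version;     /* truncation equals FLOOR() for values >= 0 */
    return round_nonneg((double) major * 10000
                        + (version - (double) major) * 1000);
}
```

For 4.4 both variants give 40400; for the hypothetical 4.6 the rounded variant gives 49600 while the floored variant gives 40600.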
@@ -1994,9 +2123,13 @@ get_repmgr_extension_status(PGconn *conn, t_extension_versions *extversions)
/* caller wants to know which versions are installed/available */
if (extversions != NULL)
{
strncpy(extversions->default_version, PQgetvalue(res, 0, 2), 7);
snprintf(extversions->default_version,
sizeof(extversions->default_version),
"%s", PQgetvalue(res, 0, 2));
extversions->default_version_num = available_version;
strncpy(extversions->installed_version, PQgetvalue(res, 0, 4), 7);
snprintf(extversions->installed_version,
sizeof(extversions->installed_version),
"%s", PQgetvalue(res, 0, 4));
extversions->installed_version_num = installed_version;
}
@@ -2197,17 +2330,26 @@ _populate_node_record(PGresult *res, t_node_info *node_info, int row, bool init_
node_info->upstream_node_id = atoi(PQgetvalue(res, row, 2));
}
strncpy(node_info->node_name, PQgetvalue(res, row, 3), MAXLEN);
strncpy(node_info->conninfo, PQgetvalue(res, row, 4), MAXLEN);
strncpy(node_info->repluser, PQgetvalue(res, row, 5), NAMEDATALEN);
strncpy(node_info->slot_name, PQgetvalue(res, row, 6), MAXLEN);
strncpy(node_info->location, PQgetvalue(res, row, 7), MAXLEN);
snprintf(node_info->node_name, sizeof(node_info->node_name), "%s", PQgetvalue(res, row, 3));
snprintf(node_info->conninfo, sizeof(node_info->conninfo), "%s", PQgetvalue(res, row, 4));
snprintf(node_info->repluser, sizeof(node_info->repluser), "%s", PQgetvalue(res, row, 5));
snprintf(node_info->slot_name, sizeof(node_info->slot_name), "%s", PQgetvalue(res, row, 6));
snprintf(node_info->location, sizeof(node_info->location), "%s", PQgetvalue(res, row, 7));
node_info->priority = atoi(PQgetvalue(res, row, 8));
node_info->active = atobool(PQgetvalue(res, row, 9));
strncpy(node_info->config_file, PQgetvalue(res, row, 10), MAXPGPATH);
snprintf(node_info->config_file, sizeof(node_info->config_file), "%s", PQgetvalue(res, row, 10));
/* This won't normally be set */
strncpy(node_info->upstream_node_name, PQgetvalue(res, row, 11), MAXLEN);
/* These are only set by certain queries */
snprintf(node_info->upstream_node_name, sizeof(node_info->upstream_node_name), "%s", PQgetvalue(res, row, 11));
if (PQgetisnull(res, row, 12))
{
node_info->attached = NODE_ATTACHED_UNKNOWN;
}
else
{
node_info->attached = atobool(PQgetvalue(res, row, 12)) ? NODE_ATTACHED : NODE_DETACHED;
}
/* Set remaining struct fields with default values */
@@ -2330,8 +2472,7 @@ get_node_record_with_upstream(PGconn *conn, int node_id, t_node_info *node_info)
initPQExpBuffer(&query);
appendPQExpBuffer(&query,
" SELECT n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo, n.repluser, "
" n.slot_name, n.location, n.priority, n.active, n.config_file, un.node_name AS upstream_node_name "
" SELECT " REPMGR_NODES_COLUMNS_WITH_UPSTREAM
" FROM repmgr.nodes n "
" LEFT JOIN repmgr.nodes un "
" ON un.node_id = n.upstream_node_id"
@@ -2374,7 +2515,7 @@ get_node_record_by_name(PGconn *conn, const char *node_name, t_node_info *node_i
if (record_status == RECORD_NOT_FOUND)
{
log_verbose(LOG_DEBUG, "get_node_record_by_name(): no record found for node %s",
log_verbose(LOG_DEBUG, "get_node_record_by_name(): no record found for node \"%s\"",
node_name);
}
@@ -2630,8 +2771,7 @@ get_all_node_records_with_upstream(PGconn *conn, NodeInfoList *node_list)
initPQExpBuffer(&query);
appendPQExpBufferStr(&query,
" SELECT n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo, n.repluser, "
" n.slot_name, n.location, n.priority, n.active, n.config_file, un.node_name AS upstream_node_name "
" SELECT " REPMGR_NODES_COLUMNS_WITH_UPSTREAM
" FROM repmgr.nodes n "
" LEFT JOIN repmgr.nodes un "
" ON un.node_id = n.upstream_node_id"
@@ -3255,6 +3395,10 @@ clear_node_info_list(NodeInfoList *nodes)
while (cell != NULL)
{
next_cell = cell->next;
if (cell->node_info->replication_info != NULL)
pfree(cell->node_info->replication_info);
pfree(cell->node_info);
pfree(cell);
cell = next_cell;
@@ -3461,11 +3605,15 @@ config_file_list_add(t_configfile_list *list, const char *file, const char *file
}
strncpy(list->files[list->entries]->filepath, file, MAXPGPATH);
snprintf(list->files[list->entries]->filepath,
sizeof(list->files[list->entries]->filepath),
"%s", file);
canonicalize_path(list->files[list->entries]->filepath);
snprintf(list->files[list->entries]->filename,
sizeof(list->files[list->entries]->filename),
"%s", filename);
strncpy(list->files[list->entries]->filename, filename, MAXPGPATH);
list->files[list->entries]->in_data_directory = in_data_dir;
list->entries++;
@@ -3545,13 +3693,10 @@ _create_event(PGconn *conn, t_configuration_options *options, int node_id, char
log_verbose(LOG_DEBUG, "_create_event(): event is \"%s\" for node %i", event, node_id);
/*
* Only attempt to write a record if a connection handle was provided.
* Also check that the repmgr schema has been properly initialised - if
* not it means no configuration file was provided, which can happen with
* e.g. `repmgr standby clone`, and we won't know which schema to write
* to.
* Only attempt to write a record if a connection handle was provided,
* and the connection handle points to a node which is not in recovery.
*/
if (conn != NULL && PQstatus(conn) == CONNECTION_OK)
if (conn != NULL && PQstatus(conn) == CONNECTION_OK && get_recovery_type(conn) == RECTYPE_PRIMARY)
{
int n_node_id = htonl(node_id);
char *t_successful = successful ? "TRUE" : "FALSE";
@@ -3605,7 +3750,7 @@ _create_event(PGconn *conn, t_configuration_options *options, int node_id, char
else
{
/* Store timestamp to send to the notification command */
strncpy(event_timestamp, PQgetvalue(res, 0, 0), MAXLEN);
snprintf(event_timestamp, MAXLEN, "%s", PQgetvalue(res, 0, 0));
}
termPQExpBuffer(&query);
@@ -4040,8 +4185,12 @@ get_slot_record(PGconn *conn, char *slot_name, t_replication_slot *record)
}
else
{
strncpy(record->slot_name, PQgetvalue(res, 0, 0), MAXLEN);
strncpy(record->slot_type, PQgetvalue(res, 0, 1), MAXLEN);
snprintf(record->slot_name,
sizeof(record->slot_name),
"%s", PQgetvalue(res, 0, 0));
snprintf(record->slot_type,
sizeof(record->slot_type),
"%s", PQgetvalue(res, 0, 1));
record->active = atobool(PQgetvalue(res, 0, 2));
}
@@ -4172,7 +4321,8 @@ get_tablespace_name_by_location(PGconn *conn, const char *location, char *name)
}
else
{
strncpy(name, PQgetvalue(res, 0, 0), MAXLEN);
snprintf(name, MAXLEN,
"%s", PQgetvalue(res, 0, 0));
}
termPQExpBuffer(&query);
@@ -4206,7 +4356,7 @@ cancel_query(PGconn *conn, int timeout)
if (PQcancel(pgcancel, errbuf, ERRBUFF_SIZE) == 0)
{
log_warning(_("unable to cancel current query"));
log_detail("%s", errbuf);
log_detail("\n%s", errbuf);
PQfreeCancel(pgcancel);
return false;
}
@@ -4236,7 +4386,7 @@ wait_connection_availability(PGconn *conn, int timeout)
long long timeout_ms;
/* calculate timeout in microseconds */
timeout_ms = timeout * 1000000;
timeout_ms = (long long) timeout * 1000000;
while (timeout_ms > 0)
{
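
Note on the `(long long)` cast above: without it, `timeout * 1000000` is evaluated in `int` and only then widened, so with a 32-bit `int` any timeout above 2147 seconds overflows (undefined behaviour for signed integers) before the assignment happens. A minimal sketch of the widened calculation, assuming a hypothetical helper name `timeout_to_usec()`:

```c
#include <assert.h>

/*
 * Hypothetical helper (not part of repmgr): convert a timeout in
 * seconds to microseconds without intermediate int overflow.
 */
long long
timeout_to_usec(int timeout_secs)
{
    assert(timeout_secs >= 0);

    /* cast one operand first so the multiplication is done in long long */
    return (long long) timeout_secs * 1000000;
}
```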
@@ -4295,13 +4445,33 @@ wait_connection_availability(PGconn *conn, int timeout)
bool
is_server_available(const char *conninfo)
{
return _is_server_available(conninfo, false);
}
bool
is_server_available_quiet(const char *conninfo)
{
return _is_server_available(conninfo, true);
}
static bool
_is_server_available(const char *conninfo, bool quiet)
{
PGPing status = PQping(conninfo);
log_verbose(LOG_DEBUG, "is_server_available(): ping status for %s is %i", conninfo, (int)status);
log_verbose(LOG_DEBUG, "is_server_available(): ping status for \"%s\" is %s", conninfo, print_pqping_status(status));
if (status == PQPING_OK)
return true;
if (quiet == false)
{
log_warning(_("unable to ping \"%s\""), conninfo);
log_detail(_("PQping() returned \"%s\""), print_pqping_status(status));
}
return false;
}
@@ -4314,10 +4484,17 @@ is_server_available_params(t_conninfo_param_list *param_list)
false);
/* deparsing the param_list adds overhead, so only do it if needed */
if (log_level == LOG_DEBUG)
if (log_level == LOG_DEBUG || status != PQPING_OK)
{
char *conninfo_str = param_list_to_string(param_list);
log_verbose(LOG_DEBUG, "is_server_available_params(): ping status for %s is %i", conninfo_str, (int)status);
log_verbose(LOG_DEBUG, "is_server_available_params(): ping status for \"%s\" is %s", conninfo_str, print_pqping_status(status));
if (status != PQPING_OK)
{
log_warning(_("unable to ping \"%s\""), conninfo_str);
log_detail(_("PQping() returned \"%s\""), print_pqping_status(status));
}
pfree(conninfo_str);
}
@@ -4355,7 +4532,7 @@ connection_ping_reconnect(PGconn *conn)
if (PQstatus(conn) != CONNECTION_OK)
{
log_warning(_("connection error, attempting to reset"));
log_detail("%s", PQerrorMessage(conn));
log_detail("\n%s", PQerrorMessage(conn));
PQreset(conn);
ping_result = connection_ping(conn);
}
@@ -4887,6 +5064,8 @@ void
init_replication_info(ReplInfo *replication_info)
{
memset(replication_info->current_timestamp, 0, sizeof(replication_info->current_timestamp));
replication_info->in_recovery = false;
replication_info->timeline_id = UNKNOWN_TIMELINE_ID;
replication_info->last_wal_receive_lsn = InvalidXLogRecPtr;
replication_info->last_wal_replay_lsn = InvalidXLogRecPtr;
memset(replication_info->last_xact_replay_timestamp, 0, sizeof(replication_info->last_xact_replay_timestamp));
@@ -4907,6 +5086,7 @@ get_replication_info(PGconn *conn, t_server_type node_type, ReplInfo *replicatio
initPQExpBuffer(&query);
appendPQExpBufferStr(&query,
" SELECT ts, "
" in_recovery, "
" last_wal_receive_lsn, "
" last_wal_replay_lsn, "
" last_xact_replay_timestamp, "
@@ -4921,9 +5101,11 @@ get_replication_info(PGconn *conn, t_server_type node_type, ReplInfo *replicatio
" END AS replication_lag_time, "
" last_wal_receive_lsn >= last_wal_replay_lsn AS receiving_streamed_wal, "
" wal_replay_paused, "
" upstream_last_seen "
" upstream_last_seen, "
" upstream_node_id "
" FROM ( "
" SELECT CURRENT_TIMESTAMP AS ts, "
" pg_catalog.pg_is_in_recovery() AS in_recovery, "
" pg_catalog.pg_last_xact_replay_timestamp() AS last_xact_replay_timestamp, ");
@@ -4960,10 +5142,12 @@ get_replication_info(PGconn *conn, t_server_type node_type, ReplInfo *replicatio
" END AS wal_replay_paused, ");
}
/* Add information about upstream node from shared memory */
if (node_type == WITNESS)
{
appendPQExpBufferStr(&query,
" repmgr.get_upstream_last_seen() AS upstream_last_seen");
" repmgr.get_upstream_last_seen() AS upstream_last_seen, "
" repmgr.get_upstream_node_id() AS upstream_node_id ");
}
else
{
@@ -4971,7 +5155,12 @@ get_replication_info(PGconn *conn, t_server_type node_type, ReplInfo *replicatio
" CASE WHEN pg_catalog.pg_is_in_recovery() IS FALSE "
" THEN -1 "
" ELSE repmgr.get_upstream_last_seen() "
" END AS upstream_last_seen ");
" END AS upstream_last_seen, ");
appendPQExpBufferStr(&query,
" CASE WHEN pg_catalog.pg_is_in_recovery() IS FALSE "
" THEN -1 "
" ELSE repmgr.get_upstream_node_id() "
" END AS upstream_node_id ");
}
appendPQExpBufferStr(&query,
@@ -4989,14 +5178,20 @@ get_replication_info(PGconn *conn, t_server_type node_type, ReplInfo *replicatio
}
else
{
strncpy(replication_info->current_timestamp, PQgetvalue(res, 0, 0), MAXLEN);
replication_info->last_wal_receive_lsn = parse_lsn(PQgetvalue(res, 0, 1));
replication_info->last_wal_replay_lsn = parse_lsn(PQgetvalue(res, 0, 2));
strncpy(replication_info->last_xact_replay_timestamp, PQgetvalue(res, 0, 3), MAXLEN);
replication_info->replication_lag_time = atoi(PQgetvalue(res, 0, 4));
replication_info->receiving_streamed_wal = atobool(PQgetvalue(res, 0, 5));
replication_info->wal_replay_paused = atobool(PQgetvalue(res, 0, 6));
replication_info->upstream_last_seen = atoi(PQgetvalue(res, 0, 7));
snprintf(replication_info->current_timestamp,
sizeof(replication_info->current_timestamp),
"%s", PQgetvalue(res, 0, 0));
replication_info->in_recovery = atobool(PQgetvalue(res, 0, 1));
replication_info->last_wal_receive_lsn = parse_lsn(PQgetvalue(res, 0, 2));
replication_info->last_wal_replay_lsn = parse_lsn(PQgetvalue(res, 0, 3));
snprintf(replication_info->last_xact_replay_timestamp,
sizeof(replication_info->last_xact_replay_timestamp),
"%s", PQgetvalue(res, 0, 4));
replication_info->replication_lag_time = atoi(PQgetvalue(res, 0, 5));
replication_info->receiving_streamed_wal = atobool(PQgetvalue(res, 0, 6));
replication_info->wal_replay_paused = atobool(PQgetvalue(res, 0, 7));
replication_info->upstream_last_seen = atoi(PQgetvalue(res, 0, 8));
replication_info->upstream_node_id = atoi(PQgetvalue(res, 0, 9));
}
termPQExpBuffer(&query);
@@ -5042,13 +5237,12 @@ get_replication_lag_seconds(PGconn *conn)
log_warning("%s", PQerrorMessage(conn));
PQclear(res);
/* XXX magic number */
return -1;
return UNKNOWN_REPLICATION_LAG;
}
if (!PQntuples(res))
{
return -1;
return UNKNOWN_REPLICATION_LAG;
}
lag_seconds = atoi(PQgetvalue(res, 0, 0));
@@ -5058,6 +5252,38 @@ get_replication_lag_seconds(PGconn *conn)
}
TimeLineID
get_node_timeline(PGconn *conn)
{
TimeLineID timeline_id = UNKNOWN_TIMELINE_ID;
PGresult *res = NULL;
/*
* pg_control_checkpoint() was introduced in PostgreSQL 9.6
*/
if (PQserverVersion(conn) < 90600)
{
return UNKNOWN_TIMELINE_ID;
}
res = PQexec(conn, "SELECT timeline_id FROM pg_catalog.pg_control_checkpoint()");
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
log_db_error(conn, NULL, _("get_node_timeline(): unable to query pg_control_checkpoint()"));
}
else
{
timeline_id = atoi(PQgetvalue(res, 0, 0));
}
PQclear(res);
return timeline_id;
}
void
get_node_replication_stats(PGconn *conn, t_node_info *node_info)
{
@@ -5123,7 +5349,7 @@ get_node_replication_stats(PGconn *conn, t_node_info *node_info)
}
bool
NodeAttached
is_downstream_node_attached(PGconn *conn, char *node_name)
{
PQExpBufferData query;
@@ -5133,7 +5359,8 @@ is_downstream_node_attached(PGconn *conn, char *node_name)
initPQExpBuffer(&query);
appendPQExpBuffer(&query,
" SELECT pg_catalog.count(*) FROM pg_catalog.pg_stat_replication "
" SELECT pg_catalog.count(*) "
" FROM pg_catalog.pg_stat_replication "
" WHERE application_name = '%s'",
node_name);
@@ -5148,7 +5375,7 @@ is_downstream_node_attached(PGconn *conn, char *node_name)
termPQExpBuffer(&query);
PQclear(res);
return false;
return NODE_ATTACHED_UNKNOWN;
}
if (PQntuples(res) != 1)
@@ -5158,7 +5385,7 @@ is_downstream_node_attached(PGconn *conn, char *node_name)
termPQExpBuffer(&query);
PQclear(res);
return false;
return NODE_ATTACHED_UNKNOWN;
}
c = atoi(PQgetvalue(res, 0, 0));
@@ -5170,27 +5397,28 @@ is_downstream_node_attached(PGconn *conn, char *node_name)
{
log_verbose(LOG_WARNING, _("node \"%s\" not found in \"pg_stat_replication\""), node_name);
return false;
return NODE_DETACHED;
}
if (c > 1)
log_verbose(LOG_WARNING, _("multiple entries with \"application_name\" set to \"%s\" found in \"pg_stat_replication\""),
node_name);
return true;
return NODE_ATTACHED;
}
void
set_upstream_last_seen(PGconn *conn)
set_upstream_last_seen(PGconn *conn, int upstream_node_id)
{
PQExpBufferData query;
PGresult *res = NULL;
initPQExpBuffer(&query);
appendPQExpBufferStr(&query,
"SELECT repmgr.set_upstream_last_seen()");
appendPQExpBuffer(&query,
"SELECT repmgr.set_upstream_last_seen(%i)",
upstream_node_id);
res = PQexec(conn, query.data);
@@ -5506,7 +5734,9 @@ get_default_bdr_replication_set(PGconn *conn)
/* For BDR2, we use a custom replication set */
namelen = strlen(BDR2_REPLICATION_SET_NAME);
default_replication_set = pg_malloc0(namelen + 1);
strncpy(default_replication_set, BDR2_REPLICATION_SET_NAME, namelen);
snprintf(default_replication_set,
namelen + 1,
"%s", BDR2_REPLICATION_SET_NAME);
return default_replication_set;
}
@@ -5536,7 +5766,9 @@ get_default_bdr_replication_set(PGconn *conn)
namelen = strlen(PQgetvalue(res, 0, 0));
default_replication_set = pg_malloc0(namelen + 1);
strncpy(default_replication_set, PQgetvalue(res, 0, 0), namelen);
snprintf(default_replication_set,
namelen + 1,
"%s", PQgetvalue(res, 0, 0));
PQclear(res);
@@ -5757,7 +5989,9 @@ get_bdr_other_node_name(PGconn *conn, int node_id, char *node_name)
if (PQresultStatus(res) == PGRES_TUPLES_OK)
{
strncpy(node_name, PQgetvalue(res, 0, 0), MAXLEN);
snprintf(node_name,
NAMEDATALEN,
"%s", PQgetvalue(res, 0, 0));
}
else
{
@@ -5940,12 +6174,12 @@ _populate_bdr_node_records(PGresult *res, BdrNodeInfoList *node_list)
static void
_populate_bdr_node_record(PGresult *res, t_bdr_node_info *node_info, int row)
{
strncpy(node_info->node_sysid, PQgetvalue(res, row, 0), MAXLEN);
snprintf(node_info->node_sysid, sizeof(node_info->node_sysid), "%s", PQgetvalue(res, row, 0));
node_info->node_timeline = atoi(PQgetvalue(res, row, 1));
node_info->node_dboid = atoi(PQgetvalue(res, row, 2));
strncpy(node_info->node_name, PQgetvalue(res, row, 3), MAXLEN);
strncpy(node_info->node_local_dsn, PQgetvalue(res, row, 4), MAXLEN);
strncpy(node_info->peer_state_name, PQgetvalue(res, row, 5), MAXLEN);
snprintf(node_info->node_name, sizeof(node_info->node_name), "%s", PQgetvalue(res, row, 3));
snprintf(node_info->node_local_dsn, sizeof(node_info->node_local_dsn), "%s", PQgetvalue(res, row, 4));
snprintf(node_info->peer_state_name, sizeof(node_info->peer_state_name), "%s", PQgetvalue(res, row, 5));
}
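
Note on the strncpy() to snprintf() conversions above: strncpy() does not write a terminating NUL when the source is at least as long as the size given, whereas snprintf() always terminates the destination (for a non-zero size) and reports how many characters it wanted to write. The following is a minimal sketch of that pattern, assuming a hypothetical helper named copy_string() which is not part of repmgr:

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of repmgr): copy "src" into a fixed-size
 * buffer using snprintf(), so the result is always NUL-terminated when
 * dst_size is non-zero.
 *
 * Returns 1 if the whole string fitted, 0 if it had to be truncated.
 */
static int
copy_string(char *dst, size_t dst_size, const char *src)
{
	/* snprintf() returns the length the full string would have had */
	int			written = snprintf(dst, dst_size, "%s", src);

	return (written >= 0 && (size_t) written < dst_size);
}
```

The size argument counts the terminating NUL, so a buffer allocated with strlen(src) + 1 bytes needs that same value passed as the size; passing strlen(src) alone drops the final character.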


@@ -29,7 +29,38 @@
#include "strutil.h"
#include "voting.h"
#define REPMGR_NODES_COLUMNS "n.node_id, n.type, n.upstream_node_id, n.node_name, n.conninfo, n.repluser, n.slot_name, n.location, n.priority, n.active, n.config_file, '' AS upstream_node_name "
#define REPMGR_NODES_COLUMNS \
"n.node_id, " \
"n.type, " \
"n.upstream_node_id, " \
"n.node_name, " \
"n.conninfo, " \
"n.repluser, " \
"n.slot_name, " \
"n.location, " \
"n.priority, " \
"n.active, " \
"n.config_file, " \
"'' AS upstream_node_name, " \
"NULL AS attached "
#define REPMGR_NODES_COLUMNS_WITH_UPSTREAM \
"n.node_id, " \
"n.type, " \
"n.upstream_node_id, " \
"n.node_name, " \
"n.conninfo, " \
"n.repluser, " \
"n.slot_name, " \
"n.location, " \
"n.priority, " \
"n.active, "\
"n.config_file, " \
"un.node_name AS upstream_node_name, " \
"NULL AS attached "
#define BDR2_NODES_COLUMNS "node_sysid, node_timeline, node_dboid, node_name, node_local_dsn, ''"
#define BDR3_NODES_COLUMNS "ns.node_id, 0, 0, ns.node_name, ns.interface_connstr, ns.peer_state_name"
@@ -92,6 +123,13 @@ typedef enum
CONN_ERROR
} ConnectionStatus;
typedef enum
{
NODE_ATTACHED_UNKNOWN = -1,
NODE_DETACHED,
NODE_ATTACHED
} NodeAttached;
typedef enum
{
SLOT_UNKNOWN = -1,
@@ -107,6 +145,7 @@ typedef enum
} BackupState;
/*
* Struct to store extension version information
*/
@@ -125,8 +164,28 @@ typedef struct s_extension_versions {
UNKNOWN_SERVER_VERSION_NUM \
}
typedef struct
{
char current_timestamp[MAXLEN];
bool in_recovery;
TimeLineID timeline_id;
XLogRecPtr last_wal_receive_lsn;
XLogRecPtr last_wal_replay_lsn;
char last_xact_replay_timestamp[MAXLEN];
int replication_lag_time;
bool receiving_streamed_wal;
bool wal_replay_paused;
int upstream_last_seen;
int upstream_node_id;
} ReplInfo;
/*
* Struct to store node information
* Struct to store node information.
*
* The first section represents the contents of the "repmgr.nodes"
* table; subsequent sections contain information collated in
* various contexts.
*/
typedef struct s_node_info
{
@@ -134,8 +193,8 @@ typedef struct s_node_info
int node_id;
int upstream_node_id;
t_server_type type;
char node_name[MAXLEN];
char upstream_node_name[MAXLEN];
char node_name[NAMEDATALEN];
char upstream_node_name[NAMEDATALEN];
char conninfo[MAXLEN];
char repluser[NAMEDATALEN];
char location[MAXLEN];
@@ -152,7 +211,7 @@ typedef struct s_node_info
/* for ad-hoc use e.g. when working with a list of nodes */
char details[MAXLEN];
bool reachable;
bool attached;
NodeAttached attached;
/* various statistics */
int max_wal_senders;
int attached_wal_receivers;
@@ -160,6 +219,8 @@ typedef struct s_node_info
int total_replication_slots;
int active_replication_slots;
int inactive_replication_slots;
/* replication info */
ReplInfo *replication_info;
} t_node_info;
@@ -186,7 +247,8 @@ typedef struct s_node_info
/* for ad-hoc use e.g. when working with a list of nodes */ \
"", true, true, \
/* various statistics */ \
-1, -1, -1, -1, -1, -1 \
-1, -1, -1, -1, -1, -1, \
NULL \
}
@@ -299,17 +361,7 @@ typedef struct BdrNodeInfoList
0 \
}
typedef struct
{
char current_timestamp[MAXLEN];
XLogRecPtr last_wal_receive_lsn;
XLogRecPtr last_wal_replay_lsn;
char last_xact_replay_timestamp[MAXLEN];
int replication_lag_time;
bool receiving_streamed_wal;
bool wal_replay_paused;
int upstream_last_seen;
} ReplInfo;
typedef struct
{
@@ -413,8 +465,8 @@ bool rollback_transaction(PGconn *conn);
bool set_config(PGconn *conn, const char *config_param, const char *config_value);
bool set_config_bool(PGconn *conn, const char *config_param, bool state);
int guc_set(PGconn *conn, const char *parameter, const char *op, const char *value);
int guc_set_typed(PGconn *conn, const char *parameter, const char *op, const char *value, const char *datatype);
bool get_pg_setting(PGconn *conn, const char *setting, char *output);
bool get_pg_setting_int(PGconn *conn, const char *setting, int *output);
bool alter_system_int(PGconn *conn, const char *name, int value);
bool pg_reload_conf(PGconn *conn);
@@ -426,6 +478,7 @@ RecoveryType get_recovery_type(PGconn *conn);
int get_primary_node_id(PGconn *conn);
int get_ready_archive_files(PGconn *conn, const char *data_directory);
bool identify_system(PGconn *repl_conn, t_system_identification *identification);
uint64 system_identifier(PGconn *conn);
TimeLineHistoryEntry *get_timeline_history(PGconn *repl_conn, TimeLineID tli);
/* repmgrd shared memory functions */
@@ -439,6 +492,8 @@ bool repmgrd_is_running(PGconn *conn);
bool repmgrd_is_paused(PGconn *conn);
bool repmgrd_pause(PGconn *conn, bool pause);
pid_t get_wal_receiver_pid(PGconn *conn);
int repmgrd_get_upstream_node_id(PGconn *conn);
bool repmgrd_set_upstream_node_id(PGconn *conn, int node_id);
/* extension functions */
ExtensionStatus get_repmgr_extension_status(PGconn *conn, t_extension_versions *extversions);
@@ -467,6 +522,7 @@ bool get_primary_node_record(PGconn *conn, t_node_info *node_info);
bool get_all_node_records(PGconn *conn, NodeInfoList *node_list);
void get_downstream_node_records(PGconn *conn, int node_id, NodeInfoList *nodes);
void get_active_sibling_node_records(PGconn *conn, int node_id, int upstream_node_id, NodeInfoList *node_list);
bool get_child_nodes(PGconn *conn, int node_id, NodeInfoList *node_list);
void get_node_records_by_priority(PGconn *conn, NodeInfoList *node_list);
bool get_all_node_records_with_upstream(PGconn *conn, NodeInfoList *node_list);
bool get_downstream_nodes_with_missing_slot(PGconn *conn, int this_node_id, NodeInfoList *node_list);
@@ -517,6 +573,7 @@ int wait_connection_availability(PGconn *conn, int timeout);
/* node availability functions */
bool is_server_available(const char *conninfo);
bool is_server_available_quiet(const char *conninfo);
bool is_server_available_params(t_conninfo_param_list *param_list);
ExecStatusType connection_ping(PGconn *conn);
ExecStatusType connection_ping_reconnect(PGconn *conn);
@@ -556,10 +613,12 @@ XLogRecPtr get_last_wal_receive_location(PGconn *conn);
void init_replication_info(ReplInfo *replication_info);
bool get_replication_info(PGconn *conn, t_server_type node_type, ReplInfo *replication_info);
int get_replication_lag_seconds(PGconn *conn);
TimeLineID get_node_timeline(PGconn *conn);
void get_node_replication_stats(PGconn *conn, t_node_info *node_info);
bool is_downstream_node_attached(PGconn *conn, char *node_name);
void set_upstream_last_seen(PGconn *conn);
NodeAttached is_downstream_node_attached(PGconn *conn, char *node_name);
void set_upstream_last_seen(PGconn *conn, int upstream_node_id);
int get_upstream_last_seen(PGconn *conn, t_server_type node_type);
bool is_wal_replay_paused(PGconn *conn, bool check_pending_wal);
/* BDR functions */
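
Note on the NodeAttached return type introduced above: changing is_downstream_node_attached() from bool to a three-valued enum lets callers distinguish "the query failed" from "the node is definitely not attached". The following is a minimal sketch of how a caller might handle the three states. The enum mirrors the definition in the header; attachment_from_count() and attachment_label() are hypothetical helpers for illustration only, and "query_ok" stands in for a successful PQexec() call.

```c
#include <stdbool.h>

/* Mirrors the enum added to the header in this diff */
typedef enum
{
	NODE_ATTACHED_UNKNOWN = -1,
	NODE_DETACHED,
	NODE_ATTACHED
} NodeAttached;

/*
 * Hypothetical helper: derive the attachment state from the number of
 * matching rows in "pg_stat_replication". A failed query yields
 * NODE_ATTACHED_UNKNOWN rather than being reported as "detached".
 */
static NodeAttached
attachment_from_count(bool query_ok, int matching_rows)
{
	if (!query_ok)
		return NODE_ATTACHED_UNKNOWN;

	return (matching_rows > 0) ? NODE_ATTACHED : NODE_DETACHED;
}

/* Hypothetical helper: map each state to a label for display */
static const char *
attachment_label(NodeAttached state)
{
	switch (state)
	{
		case NODE_ATTACHED:
			return "attached";
		case NODE_DETACHED:
			return "detached";
		case NODE_ATTACHED_UNKNOWN:
			break;
	}

	return "unknown";
}
```

Because NODE_ATTACHED_UNKNOWN is -1 and therefore non-zero, code which still tests the return value as a boolean would treat "unknown" as "attached"; an explicit comparison or a switch, as above, avoids that.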


@@ -276,6 +276,8 @@ is_pg_running(const char *path)
log_warning(_("invalid data in PostgreSQL PID file \"%s\""), path);
}
fclose(pidf);
return PG_DIR_NOT_RUNNING;
}
@@ -334,6 +336,15 @@ create_pg_dir(const char *path, bool force)
{
log_notice(_("-F/--force provided - deleting existing data directory \"%s\""), path);
nftw(path, unlink_dir_callback, 64, FTW_DEPTH | FTW_PHYS);
/* recreate the directory ourselves to ensure permissions are correct */
if (!create_dir(path))
{
log_error(_("unable to create directory \"%s\"..."),
path);
return false;
}
return true;
}
@@ -345,6 +356,15 @@ create_pg_dir(const char *path, bool force)
{
log_notice(_("deleting existing directory \"%s\""), path);
nftw(path, unlink_dir_callback, 64, FTW_DEPTH | FTW_PHYS);
/* recreate the directory ourselves to ensure permissions are correct */
if (!create_dir(path))
{
log_error(_("unable to create directory \"%s\"..."),
path);
return false;
}
return true;
}
return false;

8
doc/.gitignore vendored

@@ -1,7 +1,9 @@
HTML.index
bookindex.sgml
bookindex.xml
html-stamp
html/
nochunks.dsl
repmgr.html
version.sgml
version.xml
*.fo
*.pdf
*.sgml

101
doc/Makefile Normal file

@@ -0,0 +1,101 @@
# Make "html" the default target, since that is what most people tend
# to want to use.
html:
all: html
subdir = doc
repmgr_top_builddir = ..
include $(repmgr_top_builddir)/Makefile.global
XMLINCLUDE = --path .
ifndef XMLLINT
XMLLINT = $(missing) xmllint
endif
ifndef XSLTPROC
XSLTPROC = $(missing) xsltproc
endif
ifndef FOP
FOP = $(missing) fop
endif
override XSLTPROCFLAGS += --stringparam repmgr.version '$(REPMGR_VERSION)'
GENERATED_XML = version.xml
ALLXML := $(wildcard $(srcdir)/*.xml) $(GENERATED_XML)
version.xml: $(repmgr_top_builddir)/repmgr_version.h
{ \
echo "<!ENTITY repmgrversion \"$(REPMGR_VERSION)\">"; \
} > $@
##
## HTML
##
html: html-stamp
html-stamp: stylesheet.xsl repmgr.xml $(ALLXML)
$(XMLLINT) $(XMLINCLUDE) --noout --valid $(word 2,$^)
$(XSLTPROC) $(XMLINCLUDE) $(XSLTPROCFLAGS) $(XSLTPROC_HTML_FLAGS) $(wordlist 1,2,$^)
cp $(srcdir)/stylesheet.css $(srcdir)/website-docs.css html/
touch $@
# single-page HTML
repmgr.html: stylesheet-html-nochunk.xsl repmgr.xml $(ALLXML)
$(XMLLINT) $(XMLINCLUDE) --noout --valid $(word 2,$^)
$(XSLTPROC) $(XMLINCLUDE) $(XSLTPROCFLAGS) $(XSLTPROC_HTML_FLAGS) -o $@ $(wordlist 1,2,$^)
zip: html
cp -r html repmgr-docs-$(REPMGR_VERSION)
zip -r repmgr-docs-$(REPMGR_VERSION).zip repmgr-docs-$(REPMGR_VERSION)
rm -rf repmgr-docs-$(REPMGR_VERSION)
##
## Print
##
repmgr.pdf:
$(error Invalid target; use repmgr-A4.pdf or repmgr-US.pdf as targets)
# Standard paper size
repmgr-A4.fo: stylesheet-fo.xsl repmgr.xml $(ALLXML)
$(XMLLINT) $(XMLINCLUDE) --noout --valid $(word 2,$^)
$(XSLTPROC) $(XMLINCLUDE) $(XSLTPROCFLAGS) --stringparam paper.type A4 -o $@ $(wordlist 1,2,$^)
repmgr-A4.pdf: repmgr-A4.fo
$(FOP) -fo $< -pdf $@
# North American paper size
repmgr-US.fo: stylesheet-fo.xsl repmgr.xml $(ALLXML)
$(XMLLINT) $(XMLINCLUDE) --noout --valid $(word 2,$^)
$(XSLTPROC) $(XMLINCLUDE) $(XSLTPROCFLAGS) --stringparam paper.type USletter -o $@ $(wordlist 1,2,$^)
repmgr-US.pdf: repmgr-US.fo
$(FOP) -fo $< -pdf $@
install: html
@$(MKDIR_P) $(DESTDIR)$(docdir)/$(docmoduledir)/repmgr
@$(INSTALL_DATA) $(wildcard html/*.html) $(wildcard html/*.css) $(DESTDIR)$(docdir)/$(docmoduledir)/repmgr
@echo Installed docs to $(DESTDIR)$(docdir)/$(docmoduledir)/repmgr
clean:
rm -f html-stamp
rm -f HTML.index $(GENERATED_XML)
rm -f repmgr.html
rm -f repmgr-A4.pdf
rm -f repmgr-US.pdf
maintainer-clean:
rm -rf html
.PHONY: html


@@ -1,76 +0,0 @@
repmgr_subdir = doc
repmgr_top_builddir = ..
include $(repmgr_top_builddir)/Makefile.global
ifndef JADE
JADE = $(missing) jade
endif
SGMLINCLUDE = -D . -D ${srcdir}
SPFLAGS += -wall -wno-unused-param -wno-empty -wfully-tagged
JADE.html.call = $(JADE) $(JADEFLAGS) $(SPFLAGS) $(SGMLINCLUDE) $(CATALOG) -t sgml -i output-html
ALLSGML := $(wildcard $(srcdir)/*.sgml)
# to build bookindex
ALMOSTALLSGML := $(filter-out %bookindex.sgml,$(ALLSGML))
GENERATED_SGML = version.sgml bookindex.sgml
Makefile: Makefile.in
cd $(repmgr_top_builddir) && ./config.status doc/Makefile
all: html
html: html-stamp
html-stamp: repmgr.sgml $(ALLSGML) $(GENERATED_SGML) stylesheet.dsl website-docs.css
$(MKDIR_P) html
$(JADE.html.call) -d stylesheet.dsl -i include-index $<
cp $(srcdir)/stylesheet.css $(srcdir)/website-docs.css html/
touch $@
repmgr.html: repmgr.sgml $(ALLSGML) $(GENERATED_SGML) stylesheet.dsl website-docs.css
sed '/html-index-filename/a\
(define nochunks #t)' <stylesheet.dsl >nochunks.dsl
$(JADE.html.call) -d nochunks.dsl -i include-index $< >repmgr.html
version.sgml: ${repmgr_top_builddir}/repmgr_version.h
{ \
echo "<!ENTITY repmgrversion \"$(REPMGR_VERSION)\">"; \
} > $@
HTML.index: repmgr.sgml $(ALMOSTALLSGML) stylesheet.dsl
@$(MKDIR_P) html
$(JADE.html.call) -d stylesheet.dsl -V html-index $<
website-docs.css:
@$(MKDIR_P) html
curl http://www.postgresql.org/media/css/docs.css > ${srcdir}/website-docs.css
bookindex.sgml: HTML.index
ifdef COLLATEINDEX
LC_ALL=C $(PERL) $(COLLATEINDEX) -f -g -i 'bookindex' -o $@ $<
else
@$(missing) collateindex.pl $< $@
endif
clean:
rm -f html-stamp
rm -f HTML.index $(GENERATED_SGML)
maintainer-clean:
rm -rf html
rm -f Makefile
zip: html
cp -r html repmgr-docs-$(REPMGR_VERSION)
zip -r repmgr-docs-$(REPMGR_VERSION).zip repmgr-docs-$(REPMGR_VERSION)
rm -rf repmgr-docs-$(REPMGR_VERSION)
install: html
@$(MKDIR_P) $(DESTDIR)$(docdir)/$(docmoduledir)/repmgr
@$(INSTALL_DATA) $(wildcard html/*.html) $(wildcard html/*.css) $(DESTDIR)$(docdir)/$(docmoduledir)/repmgr
@echo Installed docs to $(DESTDIR)$(docdir)/$(docmoduledir)/repmgr
.PHONY: html all


@@ -1,9 +1,10 @@
<appendix id="appendix-faq" xreflabel="FAQ">
<indexterm>
<primary>FAQ (Frequently Asked Questions)</primary>
</indexterm>
<title>FAQ (Frequently Asked Questions)</title>
<title>FAQ (Frequently Asked Questions)</title>
<indexterm>
<primary>FAQ (Frequently Asked Questions)</primary>
</indexterm>
<sect1 id="faq-general" xreflabel="General">
<title>General</title>
@@ -19,7 +20,7 @@
<para>
&repmgr; 3.x builds on the improved replication facilities added
in PostgreSQL 9.3, as well as improved automated failover support
via <application>repmgrd</application>, and is not compatible with PostgreSQL 9.2
via &repmgrd;, and is not compatible with PostgreSQL 9.2
and earlier. We recommend upgrading to &repmgr; 4, as the &repmgr; 3.x
series is no longer maintained.
</para>
@@ -125,7 +126,7 @@
<sect2 id="faq-old-packages">
<title>How can I obtain old versions of &repmgr; packages?</title>
<para>
See appendix <xref linkend="packages-old-versions"> for details.
See appendix <xref linkend="packages-old-versions"/> for details.
</para>
</sect2>
@@ -135,7 +136,7 @@
No.
</para>
<para>
&repmgr; (together with <application>repmgrd</application>) assists with
&repmgr; (together with &repmgrd;) assists with
<emphasis>managing</emphasis> replication. It does not actually perform replication, which
is part of the core PostgreSQL functionality.
</para>
@@ -152,8 +153,8 @@
<title>Does it matter if different &repmgr; versions are present in the replication cluster?</title>
<para>
Yes. If different &quot;major&quot; &repmgr; versions (e.g. 3.3.x and 4.1.x) are present,
&repmgr; (in particular <application>repmgrd</application>)
may not run, or run properly, or in the worst case (if different <application>repmgrd</application>
&repmgr; (in particular &repmgrd;)
may not run, or run properly, or in the worst case (if different &repmgrd;
versions are running and there are differences in the failover implementation) break
your replication cluster.
</para>
@@ -251,8 +252,8 @@
</para>
<para>
&repmgr; provides the command <command>repmgr node rejoin</command> which can
optionally execute <command>pg_rewind</command>; see the <xref linkend="repmgr-node-rejoin">
documentation for details, in particular the section <xref linkend="repmgr-node-rejoin-pg-rewind">.
optionally execute <command>pg_rewind</command>; see the <xref linkend="repmgr-node-rejoin"/>
documentation for details, in particular the section <xref linkend="repmgr-node-rejoin-pg-rewind"/>.
</para>
<para>
If <command>pg_rewind</command> cannot be used, then the data directory will need
@@ -276,25 +277,25 @@
directory in <filename>/etc</filename>?</title>
<para>
Use the command line option <literal>--copy-external-config-files</literal>. For more details
see <xref linkend="repmgr-standby-clone-config-file-copying">.
see <xref linkend="repmgr-standby-clone-config-file-copying"/>.
</para>
</sect2>
<sect2 id="faq-repmgr-shared-preload-libaries-no-repmgrd" xreflabel="shared_preload_libraries without repmgrd">
<title>Do I need to include <literal>shared_preload_libraries = 'repmgr'</literal>
in <filename>postgresql.conf</filename> if I'm not using <application>repmgrd</application>?</title>
in <filename>postgresql.conf</filename> if I'm not using &repmgrd;?</title>
<para>
No, the <literal>repmgr</literal> shared library is only needed when running <application>repmgrd</application>.
If you later decide to run <application>repmgrd</application>, you just need to add
No, the <literal>repmgr</literal> shared library is only needed when running &repmgrd;.
If you later decide to run &repmgrd;, you just need to add
<literal>shared_preload_libraries = 'repmgr'</literal> and restart PostgreSQL.
</para>
</sect2>
<sect2 id="faq-repmgr-permissions" xreflabel="Replication permission problems">
<title>I've provided replication permission for the <literal>repmgr</literal> user in <filename>pg_hba.conf</filename>
but <command>repmgr</command>/<application>repmgrd</application> complains it can't connect to the server... Why?</title>
but <command>repmgr</command>/&repmgrd; complains it can't connect to the server... Why?</title>
<para>
<command>repmgr</command> and <application>repmgrd</application> need to be able to connect to the repmgr database
<command>repmgr</command> and &repmgrd; need to be able to connect to the repmgr database
with a normal connection to query metadata. The <literal>replication</literal> connection
permission is for PostgreSQL's streaming replication (and doesn't necessarily need to be the <literal>repmgr</literal> user).
</para>
@@ -317,7 +318,7 @@
<para>
Provide the option <literal>--waldir</literal> (<literal>--xlogdir</literal> in PostgreSQL 9.6
and earlier) with the absolute path to the WAL directory in <varname>pg_basebackup_options</varname>.
For more details see <xref linkend="cloning-advanced-pg-basebackup-options">.
For more details see <xref linkend="cloning-advanced-pg-basebackup-options"/>.
</para>
</sect2>
@@ -349,7 +350,7 @@
</sect1>
<sect1 id="faq-repmgrd" xreflabel="repmgrd">
<title><application>repmgrd</application></title>
<title>&repmgrd;</title>
<sect2 id="faq-repmgrd-prevent-promotion" xreflabel="Prevent standby from being promoted to primary">
@@ -365,12 +366,12 @@
</sect2>
<sect2 id="faq-repmgrd-delayed-standby" xreflabel="Delayed standby support">
<title>Does <application>repmgrd</application> support delayed standbys?</title>
<title>Does &repmgrd; support delayed standbys?</title>
<para>
<application>repmgrd</application> can monitor delayed standbys - those set up with
&repmgrd; can monitor delayed standbys - those set up with
<varname>recovery_min_apply_delay</varname> set to a non-zero value
in <filename>recovery.conf</filename> - but as it's not currently possible
to directly examine the value applied to the standby, <application>repmgrd</application>
to directly examine the value applied to the standby, &repmgrd;
may not be able to properly evaluate the node as a promotion candidate.
</para>
<para>
@@ -379,25 +380,25 @@
<filename>repmgr.conf</filename>.
</para>
<para>
Note that after registering a delayed standby, <application>repmgrd</application> will only start
Note that after registering a delayed standby, &repmgrd; will only start
once the metadata added in the primary node has been replicated.
</para>
</sect2>
<sect2 id="faq-repmgrd-logfile-rotate" xreflabel="repmgrd logfile rotation">
<title>How can I get <application>repmgrd</application> to rotate its logfile?</title>
<title>How can I get &repmgrd; to rotate its logfile?</title>
<para>
Configure your system's <literal>logrotate</literal> service to do this; see <xref linkend="repmgrd-log-rotation">.
Configure your system's <literal>logrotate</literal> service to do this; see <xref linkend="repmgrd-log-rotation"/>.
</para>
</sect2>
<sect2 id="faq-repmgrd-recloned-no-start" xreflabel="repmgrd not restarting after node cloned">
<title>I've recloned a failed primary as a standby, but <application>repmgrd</application> refuses to start?</title>
<title>I've recloned a failed primary as a standby, but &repmgrd; refuses to start?</title>
<para>
Check you registered the standby after recloning. If unregistered, the standby
cannot be considered as a promotion candidate even if <varname>failover</varname> is set to
<literal>automatic</literal>, which is probably not what you want. <application>repmgrd</application> will start if
<literal>automatic</literal>, which is probably not what you want. &repmgrd; will start if
<varname>failover</varname> is set to <literal>manual</literal> so the node's replication status can still
be monitored, if desired.
</para>
@@ -405,24 +406,24 @@
<sect2 id="faq-repmgrd-pg-bindir" xreflabel="repmgrd does not apply pg_bindir to promote_command or follow_command">
<title>
<application>repmgrd</application> ignores pg_bindir when executing <varname>promote_command</varname> or <varname>follow_command</varname>
&repmgrd; ignores pg_bindir when executing <varname>promote_command</varname> or <varname>follow_command</varname>
</title>
<para>
<varname>promote_command</varname> or <varname>follow_command</varname> can be user-defined scripts,
so &repmgr; will not apply <option>pg_bindir</option> even if executing &repmgr;. Always provide the full
path; see <xref linkend="repmgrd-automatic-failover-configuration"> for more details.
path; see <xref linkend="repmgrd-automatic-failover-configuration"/> for more details.
</para>
</sect2>
<sect2 id="faq-repmgrd-startup-no-upstream" xreflabel="repmgrd does not start if upstream node is not running">
<title>
<application>repmgrd</application> aborts startup with the error "<literal>upstream node must be running before repmgrd can start</literal>"
&repmgrd; aborts startup with the error "<literal>upstream node must be running before repmgrd can start</literal>"
</title>
<para>
<application>repmgrd</application> does this to avoid starting up on a replication cluster
which is not in a healthy state. If the upstream is unavailable, <application>repmgrd</application>
&repmgrd; does this to avoid starting up on a replication cluster
which is not in a healthy state. If the upstream is unavailable, &repmgrd;
may initiate a failover immediately after starting up, which could have unintended side-effects,
particularly if <application>repmgrd</application> is not running on other nodes.
particularly if &repmgrd; is not running on other nodes.
</para>
<para>
In particular, it's possible that the node's local copy of the <literal>repmgr.nodes</literal> table
@@ -430,7 +431,7 @@
</para>
<para>
The onus is therefore on the administrator to manually set the cluster to a stable, healthy state before
starting <application>repmgrd</application>.
starting &repmgrd;.
</para>
</sect2>


@@ -1,9 +1,11 @@
<appendix id="appendix-packages" xreflabel="Package details">
<title>&repmgr; package details</title>
<indexterm>
<primary>packages</primary>
</indexterm>
<title>&repmgr; package details</title>
<para>
This section provides technical details about various &repmgr; binary
packages, such as location of the installed binaries and
@@ -309,8 +311,8 @@
version number for your installation.
</para>
<para>
See also <xref linkend="repmgrd-configuration-debian-ubuntu"> for some specifics related
to configuring the <application>repmgrd</application> daemon.
See also <xref linkend="repmgrd-configuration-debian-ubuntu"/> for some specifics related
to configuring the &repmgrd; daemon.
</para>
<table id="debian-9-packages">
@@ -481,34 +483,12 @@ repmgr96-4.1.1-0.0git320.g5113ab0.1.el7.x86_64.rpm</programlisting>
<sect2 id="packages-old-versions-rhel-centos" xreflabel="old RHEL/CentOS package versions">
<title>RHEL/CentOS</title>
<para>
Old RPM packages (<literal>3.2</literal> and later) can be retrieved from the
(deprecated) 2ndQuadrant repository at
<ulink url="http://packages.2ndquadrant.com/">http://packages.2ndquadrant.com/</ulink>
by installing the appropriate repository RPM:
</para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<ulink url="http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-fedora-1.0-1.noarch.rpm">http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-fedora-1.0-1.noarch.rpm</ulink>
</simpara>
</listitem>
<listitem>
<simpara>
<ulink url="http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-rhel-1.0-1.noarch.rpm">http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-rhel-1.0-1.noarch.rpm</ulink>
</simpara>
</listitem>
</itemizedlist>
<para>
Old versions can be located with e.g.:
<programlisting>
yum --showduplicates list repmgr96</programlisting>
(substitute the appropriate package name; see <xref linkend="packages-centos">) and installed with:
(substitute the appropriate package name; see <xref linkend="packages-centos"/>) and installed with:
<programlisting>
yum install {package_name}-{version}</programlisting>
where <literal>{package_name}</literal> is the base package name (e.g. <literal>repmgr96</literal>)
@@ -520,6 +500,32 @@ repmgr96-4.1.1-0.0git320.g5113ab0.1.el7.x86_64.rpm</programlisting>
yum install repmgr96-4.0.6-1.rhel6</programlisting>
</para>
<sect3 id="packages-old-versions-rhel-centos-repmgr3">
<title>repmgr 3 packages</title>
<para>
Old &repmgr; 3 RPM packages (<literal>3.2</literal> and later) can be retrieved from the
(deprecated) 2ndQuadrant repository at
<ulink url="http://packages.2ndquadrant.com/repmgr/yum/">http://packages.2ndquadrant.com/repmgr/yum/</ulink>
by installing the appropriate repository RPM:
</para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<ulink url="http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-fedora-1.0-1.noarch.rpm">http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-fedora-1.0-1.noarch.rpm</ulink>
</simpara>
</listitem>
<listitem>
<simpara>
<ulink url="http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-rhel-1.0-1.noarch.rpm">http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-rhel-1.0-1.noarch.rpm</ulink>
</simpara>
</listitem>
</itemizedlist>
</sect3>
</sect2>
</sect1>
@@ -546,13 +552,13 @@ repmgr96-4.1.1-0.0git320.g5113ab0.1.el7.x86_64.rpm</programlisting>
char package_conf_file[MAXPGPATH] = "";</programlisting>
</para>
<para>
See also: <xref linkend="configuration-file">
See also: <xref linkend="configuration-file"/>
</para>
</listitem>
<listitem>
<para>
PID file location: the default <application>repmgrd</application> PID file
PID file location: the default &repmgrd; PID file
location can be hard-coded by patching <varname>package_pid_file</varname>
in <filename>repmgrd.c</filename>:
<programlisting>
@@ -560,7 +566,7 @@ repmgr96-4.1.1-0.0git320.g5113ab0.1.el7.x86_64.rpm</programlisting>
char package_pid_file[MAXPGPATH] = "";</programlisting>
</para>
<para>
See also: <xref linkend="repmgrd-pid-file">
See also: <xref linkend="repmgrd-pid-file"/>
</para>
</listitem>


@@ -12,19 +12,296 @@
</para>
<para>
See also: <xref linkend="upgrading-repmgr">
See also: <xref linkend="upgrading-repmgr"/>
</para>
<sect1 id="release-4.4">
<title>Release 4.4</title>
<para><emphasis>?? June, 2019</emphasis></para>
<sect2>
<title>repmgr client enhancements</title>
<para>
<itemizedlist>
<listitem>
<para>
<link linkend="repmgr-standby-clone"><command>repmgr standby clone</command></link>:
prevent a standby from being cloned from a witness server (PostgreSQL 9.6 and later only).
</para>
</listitem>
<listitem>
<para>
<link linkend="repmgr-witness-register"><command>repmgr witness register</command></link>:
prevent a witness server from being registered on the replication cluster primary server
(PostgreSQL 9.6 and later only).
</para>
<para>
Registering a witness on the primary node would defeat the purpose of having a witness server,
which is intended to remain running even if the cluster's primary goes down.
</para>
</listitem>
<listitem>
<para>
&repmgr;: when executing <link linkend="repmgr-standby-switchover"><command>repmgr standby switchover</command></link>,
if <option>--siblings-follow</option> is not supplied, list all nodes which repmgr considers
to be siblings (this will include the witness server, if in use), and
which will remain attached to the old primary.
</para>
</listitem>
<listitem>
<para>
&repmgr;: when executing <link linkend="repmgr-standby-switchover"><command>repmgr standby switchover</command></link>,
ignore nodes which are unreachable and marked as inactive.
Previously it would abort if any node was unreachable,
as that means it was unable to check if repmgrd is running.
</para>
<para>
However if the node has been marked as inactive in the repmgr metadata, it's
reasonable to assume the node is no longer part of the replication cluster
and does not need to be checked.
</para>
</listitem>
<listitem>
<para>
<link linkend="repmgr-standby-switchover"><command>repmgr standby switchover</command></link>
and <link linkend="repmgr-standby-promote"><command>repmgr standby promote</command></link>:
when executing with the <option>--dry-run</option> option, continue checks as far as possible
even if errors are encountered.
</para>
</listitem>
<listitem>
<para>
<link linkend="repmgr-standby-promote"><command>repmgr standby promote</command></link>:
add <option>--siblings-follow</option> (similar to
<link linkend="repmgr-standby-switchover"><command>repmgr standby switchover</command></link>).
</para>
<note>
<para>
If using &repmgrd;, when invoking
<command>repmgr standby promote</command> (either directly via
the <option>promote_command</option>, or in a script called
via <option>promote_command</option>), <option>--siblings-follow</option>
<emphasis>must not</emphasis> be included as a
command line option for <link linkend="repmgr-standby-promote"><command>repmgr standby promote</command></link>.
</para>
</note>
</listitem>
<listitem>
<para>
<link linkend="repmgr-standby-switchover"><command>repmgr standby switchover</command></link>:
add <option>--repmgrd-force-unpause</option> to unpause all &repmgrd; instances after executing a switchover.
This will ensure that any &repmgrd; instances which were paused before the switchover will be
unpaused.
</para>
</listitem>
<listitem>
<para>
<link linkend="repmgr-daemon-status"><command>repmgr daemon status</command></link>:
make output similar to that of
<link linkend="repmgr-cluster-show"><command>repmgr cluster show</command></link>
for consistency and to make it easier to identify nodes not in the expected
state.
</para>
</listitem>
<listitem>
<para>
<link linkend="repmgr-cluster-show"><command>repmgr cluster show</command></link>:
display each node's timeline ID (PostgreSQL 9.6 and later only).
</para>
</listitem>
<listitem>
<para>
<link linkend="repmgr-cluster-show"><command>repmgr cluster show</command></link>
and <link linkend="repmgr-daemon-status"><command>repmgr daemon status</command></link>:
show the upstream node name as reported by each individual node. This helps visualise
situations where the cluster is in an unexpected state, and provides a better idea of the
actual cluster state.
</para>
<para>
For example, if a cluster has divided somehow and a set of nodes are
following a new primary, when running either of these commands, &repmgr;
will now show the name of the primary those nodes are actually
following, rather than the now outdated node name recorded
on the other side of the &quot;split&quot;. A warning will also be issued
about the unexpected situation.
</para>
</listitem>
<listitem>
<para>
<link linkend="repmgr-cluster-show"><command>repmgr cluster show</command></link>
and <link linkend="repmgr-daemon-status"><command>repmgr daemon status</command></link>:
check if a node is attached to its advertised upstream node, and issue a
warning if the node is not attached.
</para>
</listitem>
</itemizedlist>
</para>
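<para>
As an illustrative sketch of the client options described above (the configuration
file path <filename>/etc/repmgr.conf</filename> is an assumption and should be adjusted
to the local installation), a promotion or switchover can be rehearsed with
<option>--dry-run</option> before being executed with the new options:
<programlisting>
$ repmgr -f /etc/repmgr.conf standby promote --dry-run
$ repmgr -f /etc/repmgr.conf standby promote --siblings-follow
$ repmgr -f /etc/repmgr.conf standby switchover --dry-run
$ repmgr -f /etc/repmgr.conf standby switchover --repmgrd-force-unpause</programlisting>
These invocations are intended for manual use; as noted above,
<option>--siblings-follow</option> must not be passed to
<command>repmgr standby promote</command> when it is run via <option>promote_command</option>.
</para>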
</sect2>
<sect2>
<title>repmgrd enhancements</title>
<para>
<itemizedlist>
<listitem>
<para>
On the primary node, &repmgrd; is now able to monitor standby connections and,
if the number of nodes connected falls below a certain (configurable) value,
execute a custom script.
</para>
<para>
This provides an additional method for fencing an isolated primary node, and/or taking
other action if one or more standbys become disconnected.
</para>
<para>
See section <link linkend="repmgrd-primary-child-disconnection">Monitoring standby disconnections on the primary node</link>
for more details.
</para>
</listitem>
<listitem>
<para>
In a failover situation, &repmgrd; nodes on the standbys of the failed primary
are now able to confirm among themselves that none can still see the primary
before continuing with the failover.
</para>
<para>
The <filename>repmgr.conf</filename> option <option>primary_visibility_consensus</option> must
be set to <literal>true</literal> to enable this functionality.
</para>
<para>
See section <xref linkend="repmgrd-primary-visibility-consensus"/>
for more details.
</para>
</listitem>
</itemizedlist>
</para>
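<para>
A minimal <filename>repmgr.conf</filename> sketch of the two features above.
The <option>primary_visibility_consensus</option> setting is as described in this section;
the <varname>child_nodes_*</varname> parameter names are assumed to match those documented in
<link linkend="repmgrd-primary-child-disconnection">Monitoring standby disconnections on the primary node</link>
and should be verified there, while the threshold value and script path are hypothetical:
<programlisting>
# standbys confirm among themselves that none can still see the primary
# before continuing with a failover
primary_visibility_consensus=true

# on the primary: execute a custom script if fewer than two child nodes
# remain connected (assumed parameter names; hypothetical script path)
child_nodes_connected_min_count=2
child_nodes_disconnect_command='/usr/local/bin/fence-primary.sh'</programlisting>
</para>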
</sect2>
<sect2>
<title>Bug fixes</title>
<para>
<itemizedlist>
<listitem>
<para>
Ensure BDR2-specific functionality cannot be used on BDR3 and later.
</para>
<para>
The BDR support present in &repmgr; is for specific BDR2 use cases.
</para>
</listitem>
<listitem>
<para>
&repmgr;: when executing <link linkend="repmgr-standby-clone"><command>repmgr standby clone</command></link>
in <option>--dry-run</option> mode, ensure provision of the <option>--force</option> option
does not result in an existing data directory being modified in any way.
</para>
</listitem>
<listitem>
<para>
&repmgr;: when executing <link linkend="repmgr-primary-register"><command>repmgr primary register</command></link>
with the <option>--force</option> option, if another primary record exists but the associated node is
unreachable (or running as a standby), set that node's record to inactive to enable the current node
to be registered as a primary.
</para>
</listitem>
<listitem>
<para>
&repmgr;: when executing <link linkend="repmgr-standby-clone"><command>repmgr standby clone</command></link>
with the <option>--upstream-conninfo</option> option, ensure that <varname>application_name</varname>
is set correctly in <varname>primary_conninfo</varname>.
</para>
</listitem>
<listitem>
<para>
&repmgr;: when executing <link linkend="repmgr-standby-switchover"><command>repmgr standby switchover</command></link>,
don't abort if one or more nodes are not reachable <emphasis>and</emphasis>
they are marked as inactive.
</para>
</listitem>
<listitem>
<para>
&repmgr;: canonicalize the data directory path when parsing the configuration file, so
the provided path matches the path PostgreSQL reports as its data directory.
Otherwise, if e.g. the data directory is configured with a trailing slash,
<link linkend="repmgr-node-check"><command>repmgr node check --data-directory-config</command></link>
will return a spurious error.
</para>
</listitem>
<listitem>
<para>
&repmgrd;: fix memory leak which occurs while the monitored PostgreSQL node is <emphasis>not</emphasis>
running.
</para>
</listitem>
</itemizedlist>
</para>
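<para>
To illustrate the data directory fix (the paths shown are hypothetical), a
<filename>repmgr.conf</filename> entry with a trailing slash such as:
<programlisting>
data_directory='/var/lib/postgresql/data/'</programlisting>
is now canonicalized to the form PostgreSQL itself reports, so a check like the
following should no longer fail solely because of the trailing slash:
<programlisting>
$ repmgr -f /etc/repmgr.conf node check --data-directory-config</programlisting>
</para>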
</sect2>
<sect2>
<title>Other</title>
<para>
<itemizedlist>
<listitem>
<para>
The &repmgr; documentation has been converted to DocBook XML format,
as currently used by the main PostgreSQL project.
This means it can now be built against any PostgreSQL version from 9.5
(previously it was not possible to build the documentation against
PostgreSQL 10 or later), and makes it easier to provide the documentation
in other formats such as PDF.
</para>
<para>
For further details see: <xref linkend="installation-build-repmgr-docs"/>
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
</sect1>
<sect1 id="release-4.3">
<title>Release 4.3</title>
<para><emphasis>Mar ???, 2019</emphasis></para>
<para><emphasis>Tue April 2, 2019</emphasis></para>
<para>
&repmgr; 4.3 is a major release.
</para>
<para>
For details on how to upgrade an existing &repmgr; installation, see
documentation section <link linkend="upgrading-major-version">Upgrading a major version release</link>.
</para>
<para>
If &repmgrd; is in use, a PostgreSQL restart <emphasis>is</emphasis> required;
in that case we suggest combining this &repmgr; upgrade with the next PostgreSQL
minor release, which will require a PostgreSQL restart in any case.
</para>
<important>
<para>
On Debian-based systems, including Ubuntu, if using <application>repmgrd</application>
On Debian-based systems, including Ubuntu, if using &repmgrd;
please ensure that in the file <filename>/etc/init.d/repmgrd</filename>, the parameter
<varname>REPMGRD_OPTS</varname> contains &quot;<literal>--daemonize=false</literal>&quot;, e.g.:
<programlisting>
@@ -37,7 +314,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
</important>
<sect2>
<title>repmgr enhancements</title>
<title>repmgr client enhancements</title>
<para>
<itemizedlist>
@@ -67,7 +344,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<para>
New commands <link linkend="repmgr-daemon-start"><command>repmgr daemon start</command></link> and
<link linkend="repmgr-daemon-stop"><command>repmgr daemon stop</command></link>:
these provide a standardized way of starting and stopping <application>repmgrd</application>.
these provide a standardized way of starting and stopping &repmgrd;.
GitHub #528.
</para>
<note>
@@ -83,7 +360,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<para>
<link linkend="repmgr-daemon-status"><command>repmgr daemon status</command></link>
additionally displays the node priority and the interval (in seconds) since the
<application>repmgrd</application> instance last verified its upstream node was available.
&repmgrd; instance last verified its upstream node was available.
</para>
</listitem>
@@ -151,20 +428,20 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<listitem>
<para>
<application>repmgrd</application> will no longer consider nodes where <application>repmgrd</application>
is not running as promotion candidates.
&repmgrd; will no longer consider nodes where &repmgrd;
is not running as promotion candidates.
</para>
<para>
Previously, if &repmgrd; was not running on a node, but
that node qualified as the promotion candidate, it would never be promoted due to
the absence of a running &repmgrd;.
</para>
<para>
Previously, if <application>repmgrd</application> was not running on a node, but
that node qualified as the promotion candidate, it would never be promoted due to
the absence of a running <application>repmgrd</application>.
</para>
</listitem>
<listitem>
<para>
Add option <option>connection_check_type</option> to enable selection of the method
<application>repmgrd</application> uses to determine whether the upstream node is available.
&repmgrd; uses to determine whether the upstream node is available.
</para>
<para>
Possible values are <literal>ping</literal> (default; uses <command>PQping()</command> to
@@ -177,7 +454,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<listitem>
<para>
New configuration option <link linkend="repmgrd-failover-validation"><option>failover_validation_command</option></link>
to allow an external mechanism to validate the failover decision made by <application>repmgrd</application>.
to allow an external mechanism to validate the failover decision made by &repmgrd;.
</para>
</listitem>
@@ -188,6 +465,14 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
</para>
</listitem>
<listitem>
<para>
In a failover situation, &repmgrd; will not attempt to promote a
node if another primary has already appeared (e.g. by being promoted manually).
GitHub #420.
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
@@ -197,6 +482,35 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<para>
<itemizedlist>
<listitem>
<para>
<command><link linkend="repmgr-cluster-show">repmgr cluster show</link></command>:
fix display of node IDs with multiple digits.
</para>
</listitem>
<listitem>
<para>
ensure <command><link linkend="repmgr-primary-unregister">repmgr primary unregister</link></command>
behaves correctly when executed on a witness server. GitHub #548.
</para>
</listitem>
<listitem>
<para>
ensure <command><link linkend="repmgr-standby-register">repmgr standby register</link></command>
fails when <option>--upstream-node-id</option> is the same as the local node ID.
</para>
</listitem>
<listitem>
<para>
&repmgr;: when executing <link linkend="repmgr-standby-clone"><command>repmgr standby clone</command></link>,
recheck primary/upstream connection(s) after the data copy operation is complete, as these may
have gone away.
</para>
</listitem>
<listitem>
<para>
&repmgr;: when executing <command><link linkend="repmgr-standby-switchover">repmgr standby switchover</link></command>,
@@ -207,16 +521,8 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<listitem>
<para>
&repmgr;: when executing <command><link linkend="repmgr-witness-register">repmgr witness register</link></command>,
chech the node to connected is actually the primary (i.e. not the witness server). GitHub #528.
</para>
</listitem>
<listitem>
<para>
&repmgr;: when executing <link linkend="repmgr-standby-clone"><command>repmgr standby clone</command></link>,
recheck primary/upstream connection(s) after the data copy operation is complete, as these may
have gone away.
&repmgr;: when executing <link linkend="repmgr-standby-switchover"><command>repmgr standby switchover</command></link>,
verify the standby (promotion candidate) is currently attached to the primary (demotion candidate). GitHub #519.
</para>
</listitem>
@@ -224,47 +530,32 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<para>
&repmgr;: when executing <link linkend="repmgr-standby-switchover"><command>repmgr standby switchover</command></link>,
avoid a potential race condition when comparing received WAL on the standby to the primary's shutdown location,
as the standby's walreceiver may not have yet flushed all received WAL to disk. GitHub #518.
</para>
</listitem>
<listitem>
<para>
&repmgr;: when executing <link linkend="repmgr-standby-switchover"><command>repmgr standby switchover</command></link>,
verify the standby (promotion candidate) is currently attached to the primary (demotion candidate). GitHub #519.
as the standby's walreceiver may not have yet flushed all received WAL to disk. GitHub #518.
</para>
</listitem>
<listitem>
<para>
<application>repmgrd</application>: on a cascaded standby, don't fail over if
<literal>failover=manual</literal>. GitHub #531.
&repmgr;: when executing <command><link linkend="repmgr-witness-register">repmgr witness register</link></command>,
check that the node being connected to is actually the primary (i.e. not the witness server). GitHub #528.
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-cluster-show">repmgr cluster show</link></command>:
fix display of node IDs with multiple digits.
</para>
</listitem>
<listitem>
<para>
ensure <command><link linkend="repmgr-primary-unregister">repmgr primary unregister</link></command>
behaves correctly when executed on a witness server. GitHub #548.
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-node-check">repmgr node check</link></command>
will only consider physical replication slots, as the purpose
of slot checks is to warn about potential issues with
streaming replication standbys which are no longer attached.
</para>
</listitem>
will only consider physical replication slots, as the purpose
of slot checks is to warn about potential issues with
streaming replication standbys which are no longer attached.
</para>
</listitem>
<listitem>
<para>
&repmgrd;: on a cascaded standby, don't fail over if
<literal>failover=manual</literal>. GitHub #531.
</para>
</listitem>
</itemizedlist>
</para>
@@ -290,7 +581,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<important>
<para>
On Debian-based systems, including Ubuntu, if using <application>repmgrd</application>
On Debian-based systems, including Ubuntu, if using &repmgrd;
please ensure that in the file <filename>/etc/init.d/repmgrd</filename>, the parameter
<varname>REPMGRD_OPTS</varname> contains &quot;<literal>--daemonize=false</literal>&quot;, e.g.:
<programlisting>
@@ -385,12 +676,12 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<listitem>
<para>
<application>repmgrd</application> can now be &quot;paused&quot;, i.e. instructed
&repmgrd; can now be &quot;paused&quot;, i.e. instructed
not to take any action such as a failover, even if the prerequisites for such an
action are detected.
</para>
<para>
This removes the need to stop <application>repmgrd</application> on all nodes when
This removes the need to stop &repmgrd; on all nodes when
performing a planned operation such as a switchover.
</para>
<para>
@@ -416,7 +707,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<listitem>
<para>
<application>repmgrd</application>: fix parsing of <option>-d/--daemonize</option> option.
&repmgrd;: fix parsing of <option>-d/--daemonize</option> option.
</para>
</listitem>
@@ -434,8 +725,8 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<para>
We recommend upgrading to this version as soon as possible.
This release can be installed as a simple package upgrade from repmgr 4.0 ~ 4.1.0;
<application>repmgrd</application> (if running) should be restarted.
See <xref linkend="upgrading-repmgr"> for more details.
&repmgrd; (if running) should be restarted.
See <xref linkend="upgrading-repmgr"/> for more details.
</para>
<sect2>
@@ -516,8 +807,8 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<para>
Check <varname>promote_command</varname> and <varname>follow_command</varname>
are defined when reloading configuration. These were checked on startup but
not reload by <application>repmgrd</application>, which made it possible to
make <application>repmgrd</application> with invalid values. It's unlikely
not on reload by &repmgrd;, which made it possible to
run &repmgrd; with invalid values. It's unlikely
anyone would want to do this, but we should make it impossible anyway.
(GitHub #486).
</para>
@@ -558,7 +849,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<listitem>
<para>
<application>repmgrd</application>: fix startup on witness node when local data is stale. (GitHub #488, #489).
&repmgrd;: fix startup on witness node when local data is stale. (GitHub #488, #489).
</para>
</listitem>
@@ -584,7 +875,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<title>Release 4.1.0</title>
<para><emphasis>Tue July 31, 2018</emphasis></para>
<para>
&repmgr; 4.1.0 introduces some changes to <application>repmgrd</application>
&repmgr; 4.1.0 introduces some changes to &repmgrd;
behaviour and some additional configuration parameters.
</para>
<para>
@@ -600,7 +891,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
</listitem>
<listitem>
<para>
<application>repmgrd</application> must be restarted on all nodes where it is running.
&repmgrd; must be restarted on all nodes where it is running.
</para>
</listitem>
@@ -610,7 +901,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
for this release (unless upgrading from repmgr 3.x).
</para>
<para>
See <xref linkend="upgrading-repmgr-extension"> for more details.
See <xref linkend="upgrading-repmgr-extension"/> for more details.
</para>
<para>
@@ -625,7 +916,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
</para>
<para>
Coinciding with this release, the 2ndQuadrant repository structure has changed.
See section <xref linkend="installation-packages"> for details, particularly
See section <xref linkend="installation-packages"/> for details, particularly
if you are using a RPM-based system.
</para>
</note>
@@ -638,7 +929,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<listitem>
<para>
Default for <xref linkend="repmgr-conf-log-level"> is now <option>INFO</option>.
Default for <xref linkend="repmgr-conf-log-level"/> is now <option>INFO</option>.
This produces additional informative log output, without creating excessive additional
log file volume, and matches the setting assumed for examples in the documentation.
(GitHub #470).
@@ -722,14 +1013,14 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<listitem>
<para>
<application>repmgrd</application>: create a PID file by default
(GitHub #457). For details, see <xref linkend="repmgrd-pid-file">.
&repmgrd;: create a PID file by default
(GitHub #457). For details, see <xref linkend="repmgrd-pid-file"/>.
</para>
</listitem>
<listitem>
<para>
<application>repmgrd</application>: daemonize process by default.
&repmgrd;: daemonize process by default.
In case, for whatever reason, the user does not wish to daemonize the
process, provide <option>--daemonize=false</option>.
(GitHub #458).
@@ -798,7 +1089,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<para>
We recommend upgrading to this version as soon as possible.
This release can be installed as a simple package upgrade from repmgr 4.0 ~ 4.0.5;
<application>repmgrd</application> (if running) should be restarted. See <xref linkend="upgrading-repmgr">
&repmgrd; (if running) should be restarted. See <xref linkend="upgrading-repmgr"/>
for more details.
</para>
@@ -885,7 +1176,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<listitem>
<para>
<application>repmgrd</application>: ensure local node is counted as quorum member
&repmgrd;: ensure local node is counted as quorum member
(GitHub #439)
</para>
</listitem>
@@ -902,7 +1193,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<para>
&repmgr; 4.0.5 contains a number of usability enhancements related to
<application>pg_rewind</application> usage, <filename>recovery.conf</filename>
generation and (in <application>repmgrd</application>) handling of various
generation and (in &repmgrd;) handling of various
corner-case situations, as well as a number of bug fixes.
</para>
@@ -934,7 +1225,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<listitem>
<para>
Add sanity check if <option>--upstream-node-id</option> not supplied when executing
<xref linkend="repmgr-standby-register"> (GitHub #395).
<xref linkend="repmgr-standby-register"/> (GitHub #395).
</para>
</listitem>
@@ -967,7 +1258,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<listitem>
<para>
<application>repmgrd</application>: set <literal>connect_timeout=2</literal> (if not explicitly set)
&repmgrd;: set <literal>connect_timeout=2</literal> (if not explicitly set)
when pinging a server.
</para>
</listitem>
@@ -1023,20 +1314,20 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<listitem>
<para>
<application>repmgrd</application>: handle <command>pg_ctl promote</command> timeout (GitHub #425).
&repmgrd;: handle <command>pg_ctl promote</command> timeout (GitHub #425).
</para>
</listitem>
<listitem>
<para>
<application>repmgrd</application>: handle failover situation with only two nodes in the primary
&repmgrd;: handle failover situation with only two nodes in the primary
location, and at least one node in another location (GitHub #407).
</para>
</listitem>
<listitem>
<para>
<application>repmgrd</application>: prevent standby connection handle from going stale.
&repmgrd;: prevent standby connection handle from going stale.
</para>
</listitem>
@@ -1060,7 +1351,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
</para>
<para>
This release can be installed as a simple package upgrade from repmgr 4.0 ~ 4.0.3;
<application>repmgrd</application> (if running) should be restarted. See <xref linkend="upgrading-repmgr">
&repmgrd; (if running) should be restarted. See <xref linkend="upgrading-repmgr"/>
for more details.
</para>
@@ -1139,14 +1430,14 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<listitem>
<para>
<application>repmgrd</application>: improve detection of status change from primary to
&repmgrd;: improve detection of status change from primary to
standby
</para>
</listitem>
<listitem>
<para>
<application>repmgrd</application>: improve reconnection to the local node after a
&repmgrd;: improve reconnection to the local node after a
failover (previously a connection error due to the node starting up was being
interpreted as the node being unavailable)
</para>
@@ -1154,14 +1445,14 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<listitem>
<para>
<application>repmgrd</application>: when running on a witness server, correctly connect
&repmgrd;: when running on a witness server, correctly connect
to new primary after a failover
</para>
</listitem>
<listitem>
<para>
<application>repmgrd</application>: add <link linkend="event-notifications">event notification</link>
&repmgrd;: add <link linkend="event-notifications">event notification</link>
<literal>repmgrd_shutdown</literal> (GitHub #393)
</para>
</listitem>
@@ -1330,7 +1621,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
</para>
<para>
This release can be installed as a simple package upgrade from &repmgr; 4.0.1 or 4.0;
<application>repmgrd</application> (if running) should be restarted.
&repmgrd; (if running) should be restarted.
</para>
<sect2>
@@ -1447,14 +1738,14 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<listitem>
<para>
Fix <xref linkend="repmgr-cluster-show"> when <literal>repmgr</literal> schema not set in search path
Fix <xref linkend="repmgr-cluster-show"/> when <literal>repmgr</literal> schema not set in search path
(GitHub #341)
</para>
</listitem>
<listitem>
<para>
When using <literal>--force-rewind</literal> with <xref linkend="repmgr-node-rejoin">
When using <literal>--force-rewind</literal> with <xref linkend="repmgr-node-rejoin"/>
delete any replication slots copied by <application>pg_rewind</application>
(GitHub #334)
</para>
@@ -1510,7 +1801,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
</note>
<para>
For detailed instructions on upgrading from repmgr 3.x, see <xref linkend="upgrading-from-repmgr-3">.
For detailed instructions on upgrading from repmgr 3.x, see <xref linkend="upgrading-from-repmgr-3"/>.
</para>
<sect2>
@@ -1524,7 +1815,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
the <command>switchover</command> process has been improved and streamlined,
making it faster; it can also instruct other standbys
to follow the new primary once the switchover has completed. See
<xref linkend="performing-switchover"> for more details.
<xref linkend="performing-switchover"/> for more details.
</para>
</listitem>
@@ -1550,10 +1841,10 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<listitem>
<para>
<emphasis>improved logging output</emphasis>:
&repmgr; (and <application>repmgrd</application>) now provide more explicit
&repmgr; (and &repmgrd;) now provide more explicit
logging output giving a better picture of what is going on. Where appropriate,
<literal>DETAIL</literal> and <literal>HINT</literal> log lines provide additional
detail and suggestions for resolving problems. Additionally, <application>repmgrd</application>
detail and suggestions for resolving problems. Additionally, &repmgrd;
now emits informational log lines at regular, configurable intervals
to confirm that it's running correctly and which node(s) it's monitoring.
</para>
@@ -1575,8 +1866,8 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<listitem>
<para>
<emphasis>monitoring and status checks</emphasis>:
New commands <xref linkend="repmgr-node-check"> and
<xref linkend="repmgr-node-status"> providing information
New commands <xref linkend="repmgr-node-check"/> and
<xref linkend="repmgr-node-status"/> providing information
about a node's status and replication-related monitoring
output.
</para>
@@ -1586,7 +1877,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<listitem>
<para>
<emphasis>node rejoin</emphasis>:
New commands <xref linkend="repmgr-node-rejoin"> enables a failed
New command <xref linkend="repmgr-node-rejoin"/> enables a failed
primary to be rejoined to a replication cluster, optionally using
<application>pg_rewind</application> to synchronise its data,
(note that <application>pg_rewind</application> may not be useable
@@ -1600,11 +1891,11 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<emphasis>automatic failover</emphasis>:
improved detection of node status; promotion decision based on a consensual
model, with the promoted primary explicitly informing other standbys to
follow it. The <application>repmgrd</application> daemon will continue
follow it. The &repmgrd; daemon will continue
functioning even if the monitored PostgreSQL instance is down, and resume
monitoring if it reappears. Additionally, if the instance's role has changed
(typically from a primary to a standby, e.g. following reintegration of a
failed primary using <xref linkend="repmgr-node-rejoin">) <application>repmgrd</application>
failed primary using <xref linkend="repmgr-node-rejoin"/>) &repmgrd;
will automatically resume monitoring it as a standby.
</para>
</listitem>
@@ -1668,7 +1959,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
by the configuration file option <varname>replication_user</varname>.
The value (which defaults to the user provided in the <varname>conninfo</varname>
string) will be stored in the &repmgr; metadata for use by
<xref linkend="repmgr-standby-clone"> and <xref linkend="repmgr-standby-follow">.
<xref linkend="repmgr-standby-clone"/> and <xref linkend="repmgr-standby-follow"/>.
</para></listitem>
<listitem><para>
@@ -1683,14 +1974,14 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
to <varname>primary_conninfo</varname> by default; to force the inclusion
of a password (not recommended), use the new configuration file parameter
<varname>use_primary_conninfo_password</varname>. For details, see section
<xref linkend="cloning-advanced-managing-passwords">.
<xref linkend="cloning-advanced-managing-passwords"/>.
</para></listitem>
</itemizedlist>
</para>
<para>
<application>repmgrd</application>
&repmgrd;
<itemizedlist>
<listitem><para>
@@ -1812,7 +2103,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<listitem>
<simpara>
new parameter <varname>log_status_interval</varname>, which causes
<application>repmgrd</application> to emit a status log
&repmgrd; to emit a status log
line at the specified interval
</simpara>
</listitem>


@@ -1,9 +1,11 @@
<appendix id="appendix-support" xreflabel="repmgr support">
<title>&repmgr; support</title>
<indexterm>
<primary>support</primary>
</indexterm>
<title>&repmgr; support</title>
<para>
<ulink url="https://2ndquadrant.com/">2ndQuadrant</ulink> provides 24x7
production support for &repmgr; and other PostgreSQL
@@ -28,12 +30,13 @@
</important>
<sect1 id="appendix-support-reporting-issues" xreflabel="Reporting Issues">
<title>Reporting Issues</title>
<indexterm>
<primary>support</primary>
<secondary>reporting issues</secondary>
</indexterm>
<title>Reporting Issues</title>
<para>
When asking questions or reporting issues, it is extremely helpful if the following information is included:
@@ -48,7 +51,7 @@
<listitem>
<simpara>
How was &repmgr installed? From source? From packages? If
How was &repmgr; installed? From source? From packages? If
so from which repository?
</simpara>
</listitem>
@@ -80,7 +83,7 @@
the maximum level of logging output.
</para>
<para>
If issues are encountered with <application>repmgrd</application>,
If issues are encountered with &repmgrd;,
please provide relevant extracts from the &repmgr; log files
and if possible the PostgreSQL log itself. Please ensure these
logs do not contain any confidential data.


@@ -2,6 +2,8 @@
<title>Cloning standbys</title>
<sect1 id="cloning-from-barman" xreflabel="Cloning from Barman">
<title>Cloning a standby from Barman</title>
<indexterm>
<primary>cloning</primary>
<secondary>from Barman</secondary>
@@ -11,9 +13,8 @@
<secondary>cloning a standby</secondary>
</indexterm>
<title>Cloning a standby from Barman</title>
<para>
<xref linkend="repmgr-standby-clone"> can use
<xref linkend="repmgr-standby-clone"/> can use
<ulink url="https://www.2ndquadrant.com/">2ndQuadrant</ulink>'s
<ulink url="https://www.pgbarman.org/">Barman</ulink> application
to clone a standby (and also as a fallback source for WAL files).
@@ -73,7 +74,7 @@
<para>
the <varname>restore_command</varname> setting in <filename>repmgr.conf</filename> is configured to
use a copy of the <command>barman-wal-restore</command> script shipped with the
<literal>barman-cli</literal> package (see section <xref linkend="cloning-from-barman-restore-command">
<literal>barman-cli</literal> package (see section <xref linkend="cloning-from-barman-restore-command"/>
below).
</para>
</listitem>
@@ -126,12 +127,13 @@
</para>
</sect2>
<sect2 id="cloning-from-barman-restore-command" xreflabel="Using Barman as a WAL file source">
<indexterm>
<title>Using Barman as a WAL file source</title>
<indexterm>
<primary>Barman</primary>
<secondary>fetching archived WAL</secondary>
</indexterm>
<title>Using Barman as a WAL file source</title>
<para>
As a fallback in case streaming replication is interrupted, PostgreSQL can optionally
retrieve WAL files from an archive, such as that provided by Barman. This is done by
@@ -172,7 +174,9 @@
</sect2>
</sect1>
<sect1 id="cloning-replication-slots" xreflabel="Cloning and replication slots">
<sect1 id="cloning-replication-slots" xreflabel="Cloning and replication slots">
<title>Cloning and replication slots</title>
<indexterm>
<primary>cloning</primary>
<secondary>replication slots</secondary>
@@ -182,7 +186,6 @@
<primary>replication slots</primary>
<secondary>cloning</secondary>
</indexterm>
<title>Cloning and replication slots</title>
<para>
Replication slots were introduced with PostgreSQL 9.4 and are designed to ensure
that any standby connected to the primary using a replication slot will always
@@ -244,18 +247,20 @@
<simpara>
As an alternative we recommend using 2ndQuadrant's <ulink url="https://www.pgbarman.org/">Barman</ulink>,
which offloads WAL management to a separate server, removing the requirement to use a replication
slot for each individual standby to reserve WAL. See section <xref linkend="cloning-from-barman">
slot for each individual standby to reserve WAL. See section <xref linkend="cloning-from-barman"/>
for more details on using &repmgr; together with Barman.
</simpara>
</tip>
</sect1>
<sect1 id="cloning-cascading" xreflabel="Cloning and cascading replication">
<title>Cloning and cascading replication</title>
<indexterm>
<primary>cloning</primary>
<secondary>cascading replication</secondary>
</indexterm>
<title>Cloning and cascading replication</title>
<para>
Cascading replication, introduced with PostgreSQL 9.2, enables a standby server
to replicate from another standby server rather than directly from the primary,
@@ -276,7 +281,7 @@
</para>
<para>
To demonstrate cascading replication, first ensure you have a primary and standby
set up as shown in the <xref linkend="quickstart">.
set up as shown in the <xref linkend="quickstart"/>.
Then create an additional standby server with <filename>repmgr.conf</filename> looking
like this:
<programlisting>
@@ -339,11 +344,11 @@
</sect1>
<sect1 id="cloning-advanced" xreflabel="Advanced cloning options">
<title>Advanced cloning options</title>
<indexterm>
<primary>cloning</primary>
<secondary>advanced options</secondary>
</indexterm>
<title>Advanced cloning options</title>
<sect2 id="cloning-advanced-pg-basebackup-options" xreflabel="pg_basebackup options when cloning a standby">
<title>pg_basebackup options when cloning a standby</title>
@@ -365,7 +370,7 @@
<simpara>
If <application>Barman</application> is set up for the cluster, it's possible to
clone the standby directly from Barman, without any impact on the server the standby
is being cloned from. For more details see <xref linkend="cloning-from-barman">.
is being cloned from. For more details see <xref linkend="cloning-from-barman"/>.
</simpara>
</tip>
<para>
@@ -433,7 +438,7 @@
(but not <filename>~/.pgpass</filename>) and place it into the <varname>primary_conninfo</varname>
string in <filename>recovery.conf</filename>. Note that <varname>PGPASSWORD</varname>
will need to be set during any action which causes <filename>recovery.conf</filename> to be
rewritten, e.g. <xref linkend="repmgr-standby-follow">.
rewritten, e.g. <xref linkend="repmgr-standby-follow"/>.
</para>
<para>
It is of course also possible to include the password value in the <varname>conninfo</varname>
@@ -460,7 +465,7 @@
replication connections and generating <filename>recovery.conf</filename>. This
value will also be stored in the <literal>repmgr.nodes</literal>
table for each node; it no longer needs to be explicitly specified when
cloning a node or executing <xref linkend="repmgr-standby-follow">.
cloning a node or executing <xref linkend="repmgr-standby-follow"/>.
</para>
</sect2>
</sect1>


@@ -1,4 +1,6 @@
<sect1 id="configuration-file-log-settings" xreflabel="log settings">
<title>Log settings</title>
<indexterm>
<primary>repmgr.conf</primary>
<secondary>log settings</secondary>
@@ -7,10 +9,9 @@
<primary>log settings</primary>
<secondary>configuration in repmgr.conf</secondary>
</indexterm>
<title>Log settings</title>
<para>
By default, &repmgr; and <application>repmgrd</application> write log output to
By default, &repmgr; and &repmgrd; write log output to
<literal>STDERR</literal>. An alternative log destination can be specified
(either a file or <literal>syslog</literal>).
</para>
@@ -24,7 +25,7 @@
<para>
This behaviour can be overridden with the command line option <option>--log-to-file</option>,
which will redirect all logging output to the configured log destination. This is recommended
when &repmgr; is executed by another application, particularly <application>repmgrd</application>,
when &repmgr; is executed by another application, particularly &repmgrd;,
to enable log output generated by the &repmgr; application to be stored for later reference.
</para>
</note>
@@ -32,12 +33,11 @@
<variablelist>
<varlistentry id="repmgr-conf-log-level" xreflabel="log_level">
<term><varname>log_level</varname> (<type>string</type>)
<term><varname>log_level</varname> (<type>string</type>)</term>
<listitem>
<indexterm>
<primary><varname>log_level</varname> configuration file parameter</primary>
</indexterm>
</term>
<listitem>
<para>
One of <option>DEBUG</option>, <option>INFO</option>, <option>NOTICE</option>,
<option>WARNING</option>, <option>ERROR</option>, <option>ALERT</option>, <option>CRIT</option>
@@ -76,11 +76,11 @@
</term>
<listitem>
<para>
If <xref linkend="repmgr-conf-log-facility"> is set to <option>STDERR</option>, log output
If <xref linkend="repmgr-conf-log-facility"/> is set to <option>STDERR</option>, log output
can be redirected to the specified file.
</para>
<para>
See <xref linkend="repmgrd-log-rotation"> for information on configuring log rotation.
See <xref linkend="repmgrd-log-rotation"/> for information on configuring log rotation.
</para>
</listitem>
</varlistentry>
@@ -93,12 +93,12 @@
</term>
<listitem>
<para>
This setting causes <application>repmgrd</application> to emit a status log
This setting causes &repmgrd; to emit a status log
line at the specified interval (in seconds, default <literal>300</literal>)
describing <application>repmgrd</application>'s current state, e.g.:
describing &repmgrd;'s current state, e.g.:
</para>
<programlisting>
[2018-07-12 00:47:32] [INFO] monitoring connection to upstream node "node1" (node ID: 1)</programlisting>
[2018-07-12 00:47:32] [INFO] monitoring connection to upstream node "node1" (ID: 1)</programlisting>
</listitem>
</varlistentry>
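The status line shown in the log settings above has a regular shape: a bracketed timestamp, a bracketed log level, then the message. The following is a minimal sketch of splitting such a line for later inspection. The function name `parse_log_line` and the regular expression are illustrative assumptions inferred from that single sample line; they are not part of repmgr, and other log lines may not match.

```python
import re
from datetime import datetime

# Assumed shape, inferred from the sample line only: "[timestamp] [LEVEL] message".
LOG_LINE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] (?P<message>.*)$")


def parse_log_line(line):
    """Return (timestamp, level, message) for a matching line, else None."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    # Timestamp format taken from the sample: YYYY-MM-DD HH:MM:SS.
    timestamp = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
    return timestamp, match.group("level"), match.group("message")


if __name__ == "__main__":
    sample = ('[2018-07-12 00:47:32] [INFO] monitoring connection to '
              'upstream node "node1" (ID: 1)')
    print(parse_log_line(sample))
```

Lines that do not fit the assumed shape yield `None` rather than raising, so the sketch can be run over a whole log file without stopping at the first unexpected line.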


@@ -1,10 +1,12 @@
<sect1 id="configuration-file-settings" xreflabel="required configuration file settings">
<title>Required configuration file settings</title>
<indexterm>
<primary>repmgr.conf</primary>
<secondary>required settings</secondary>
</indexterm>
<title>Required configuration file settings</title>
<para>
Each <filename>repmgr.conf</filename> file must contain the following parameters:
</para>
@@ -39,6 +41,10 @@
called <varname>standby1</varname> (for example), things will be confusing
to say the least.
</para>
<para>
The string's maximum length is 63 characters and it should
contain only printable ASCII characters.
</para>
</listitem>
</varlistentry>
@@ -56,7 +62,7 @@
</para>
<para>
For details on conninfo strings, see section <ulink
url="https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING">Connection Strings</>
url="https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNSTRING">Connection Strings</ulink>
in the PostgreSQL documentation.
</para>
<para>
@@ -65,18 +71,18 @@
string to determine the length of time which elapses before a network
connection attempt is abandoned; for details see <ulink
url="https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNECT-CONNECT-TIMEOUT">
the PostgreSQL documentation</>.
the PostgreSQL documentation</ulink>.
</para>
</listitem>
</varlistentry>
<varlistentry id="repmgr-conf-data-directory" xreflabel="data_directory">
<term><varname>data_directory</varname> (<type>string</type>)
<indexterm>
<primary><varname>data_directory</varname> configuration file parameter</primary>
</indexterm>
</term>
<term><varname>data_directory</varname> (<type>string</type>)</term>
<listitem>
<indexterm>
<primary><varname>data_directory</varname> configuration file parameter</primary>
</indexterm>
<para>
The node's data directory. This is needed by repmgr
when performing operations when the PostgreSQL instance
@@ -90,33 +96,6 @@
</variablelist>
</para>
<para>
For a full list of annotated configuration items, see the file
<ulink url="https://raw.githubusercontent.com/2ndQuadrant/repmgr/master/repmgr.conf.sample">repmgr.conf.sample</ulink>.
</para>
<para>
For <application>repmgrd</application>-specific settings, see <xref linkend="repmgrd-configuration">.
</para>
<note>
<para>
The following parameters in the configuration file can be overridden with
command line options:
<itemizedlist>
<listitem>
<simpara>
<literal>-L/--log-level</literal> overrides <literal>log_level</literal> in
<filename>repmgr.conf</filename>
</simpara>
</listitem>
<listitem>
<simpara>
<literal>-b/--pg_bindir</literal> overrides <literal>pg_bindir</literal> in
<filename>repmgr.conf</filename>
</simpara>
</listitem>
</itemizedlist>
</para>
</note>
</sect1>
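The <varname>node_name</varname> constraint added above (a maximum of 63 characters, printable ASCII only) can be checked before a value is written to <filename>repmgr.conf</filename>. The following is a minimal sketch under that reading of the text. The function `check_node_name` is hypothetical and not part of repmgr, and it treats the space character as printable, which is an assumption.

```python
import string

# Limit stated in the documentation text above.
MAX_NODE_NAME_LENGTH = 63

# Printable ASCII: letters, digits, punctuation and the space character
# (including the space here is an assumption of this sketch).
PRINTABLE_ASCII = set(
    string.ascii_letters + string.digits + string.punctuation + " "
)


def check_node_name(name):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    if len(name) > MAX_NODE_NAME_LENGTH:
        problems.append(
            "node_name is %d characters long (maximum %d)"
            % (len(name), MAX_NODE_NAME_LENGTH)
        )
    unexpected = sorted({ch for ch in name if ch not in PRINTABLE_ASCII})
    if unexpected:
        problems.append(
            "node_name contains characters outside printable ASCII: %r"
            % "".join(unexpected)
        )
    return problems


if __name__ == "__main__":
    for candidate in ("node1", "n" * 64, "n\u0153ud"):
        print(repr(candidate), check_node_name(candidate))
```

Returning a list of problems rather than a boolean lets a caller report both a length and a character-set problem in one pass.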


@@ -1,4 +1,6 @@
<sect1 id="configuration-file-service-commands" xreflabel="service command settings">
<title>Service command settings</title>
<indexterm>
<primary>repmgr.conf</primary>
<secondary>service command settings</secondary>
@@ -7,10 +9,9 @@
<primary>service command settings</primary>
<secondary>configuration in repmgr.conf</secondary>
</indexterm>
<title>Service command settings</title>
<para>
In some circumstances, &repmgr; (and <application>repmgrd</application>) need to
In some circumstances, &repmgr; (and &repmgrd;) need to
be able to stop, start or restart PostgreSQL. &repmgr; commands which need to do this
include <link linkend="repmgr-standby-follow"><command>repmgr standby follow</command></link>,
<link linkend="repmgr-standby-switchover"><command>repmgr standby switchover</command></link> and
@@ -68,7 +69,7 @@
</para>
<para>
Do not confuse this with <varname>promote_command</varname>, which is used
by <application>repmgrd</application> to execute <xref linkend="repmgr-standby-promote">.
by &repmgrd; to execute <xref linkend="repmgr-standby-promote"/>.
</para>
</note>


@@ -1,4 +1,7 @@
<sect1 id="configuration-file" xreflabel="configuration file">
<title>Configuration file</title>
<indexterm>
<primary>repmgr.conf</primary>
</indexterm>
@@ -8,28 +11,26 @@
<secondary>repmgr.conf</secondary>
</indexterm>
<title>Configuration file</title>
<para>
<application>repmgr</application> and <application>repmgrd</application>
<application>repmgr</application> and &repmgrd;
use a common configuration file, by default called
<filename>repmgr.conf</filename> (although any name can be used if explicitly specified).
<filename>repmgr.conf</filename> must contain a number of required parameters, including
the database connection string for the local node and the location
of its data directory; other values will be inferred from defaults if
not explicitly supplied. See section <xref linkend="configuration-file-settings">
not explicitly supplied. See section <xref linkend="configuration-file-settings"/>
for more details.
</para>
<sect2 id="configuration-file-format" xreflabel="configuration file format">
<title>Configuration file format</title>
<indexterm>
<primary>repmgr.conf</primary>
<secondary>format</secondary>
</indexterm>
<title>Configuration file format</title>
<para>
<filename>repmgr.conf</filename> is a plain text file with one parameter/value
combination per line.
@@ -61,14 +62,79 @@ data_directory = /var/lib/pgsql/11/data</programlisting>
</sect2>
<sect2 id="configuration-file-items" xreflabel="configuration file items">
<title>Configuration file items</title>
<para>
The following sections document parts of the configuration file:
<itemizedlist>
<listitem>
<simpara>
<xref linkend="configuration-file-settings"/>
</simpara>
</listitem>
<listitem>
<simpara>
<xref linkend="configuration-file-optional-settings"/>
</simpara>
</listitem>
<listitem>
<simpara>
<xref linkend="configuration-file-log-settings"/>
</simpara>
</listitem>
<listitem>
<simpara>
<xref linkend="configuration-file-service-commands"/>
</simpara>
</listitem>
</itemizedlist>
</para>
<para>
For a full list of annotated configuration items, see the file
<ulink url="https://raw.githubusercontent.com/2ndQuadrant/repmgr/master/repmgr.conf.sample">repmgr.conf.sample</ulink>.
</para>
<para>
For &repmgrd;-specific settings, see <xref linkend="repmgrd-configuration"/>.
</para>
<note>
<para>
The following parameters in the configuration file can be overridden with
command line options:
<itemizedlist>
<listitem>
<simpara>
<literal>-L/--log-level</literal> overrides <literal>log_level</literal> in
<filename>repmgr.conf</filename>
</simpara>
</listitem>
<listitem>
<simpara>
<literal>-b/--pg_bindir</literal> overrides <literal>pg_bindir</literal> in
<filename>repmgr.conf</filename>
</simpara>
</listitem>
</itemizedlist>
</para>
</note>
</sect2>
<sect2 id="configuration-file-location" xreflabel="configuration file location">
<title>Configuration file location</title>
<indexterm>
<primary>repmgr.conf</primary>
<secondary>location</secondary>
</indexterm>
<title>Configuration file location</title>
<para>
The configuration file will be searched for in the following locations:
@@ -105,10 +171,10 @@ data_directory = /var/lib/pgsql/11/data</programlisting>
<note>
<para>
If providing the configuration file location with <literal>-f/--config-file</literal>,
avoid using a relative path, particularly when executing <xref linkend="repmgr-primary-register">
and <xref linkend="repmgr-standby-register">, as &repmgr; stores the configuration file location
avoid using a relative path, particularly when executing <xref linkend="repmgr-primary-register"/>
and <xref linkend="repmgr-standby-register"/>, as &repmgr; stores the configuration file location
in the repmgr metadata for use when &repmgr; is executed remotely (e.g. during
<xref linkend="repmgr-standby-switchover">). &repmgr; will attempt to convert the
<xref linkend="repmgr-standby-switchover"/>). &repmgr; will attempt to convert the
relative path into an absolute one, but this may not be the same as the path you
would explicitly provide (e.g. <filename>./repmgr.conf</filename> might be converted
to <filename>/path/to/./repmgr.conf</filename>, whereas you'd normally write


@@ -2,6 +2,8 @@
<title>repmgr configuration</title>
<sect1 id="configuration-prerequisites" xreflabel="Prerequisites for configuration">
<title>Prerequisites for configuration</title>
<indexterm>
<primary>configuration</primary>
<secondary>prerequisites</secondary>
@@ -12,7 +14,6 @@
<secondary>ssh</secondary>
</indexterm>
<title>Prerequisites for configuration</title>
<para>
The following software must be installed on both servers:
<itemizedlist spacing="compact" mark="bullet">
@@ -62,6 +63,8 @@
</tip>
<sect2 id="configuration-postgresql" xreflabel="PostgreSQL configuration">
<title>PostgreSQL configuration for &repmgr;</title>
<indexterm>
<primary>configuration</primary>
<secondary>PostgreSQL</secondary>
@@ -71,7 +74,6 @@
<primary>PostgreSQL configuration</primary>
</indexterm>
<title>PostgreSQL configuration for &repmgr;</title>
<para>
The following PostgreSQL configuration parameters may need to be changed in order
for &repmgr; (and replication itself) to function correctly.
@@ -81,13 +83,14 @@
<varlistentry>
<indexterm>
<primary>hot_standby</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<term><option>hot_standby</option></term>
<listitem>
<indexterm>
<primary>hot_standby</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<para>
<option>hot_standby</option> must always be set to <literal>on</literal>, as &repmgr; needs
to be able to connect to each server it manages.
@@ -104,13 +107,15 @@
<varlistentry>
<indexterm>
<primary>wal_level</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<term><option>wal_level</option></term>
<listitem>
<indexterm>
<primary>wal_level</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<para>
<option>wal_level</option> must be one of <option>replica</option> or <option>logical</option>
(PostgreSQL 9.5 and earlier: one of <option>hot_standby</option> or <option>logical</option>).
@@ -123,13 +128,15 @@
<varlistentry>
<indexterm>
<primary>max_wal_senders</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<term><option>max_wal_senders</option></term>
<listitem>
<indexterm>
<primary>max_wal_senders</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<para>
<option>max_wal_senders</option> must be set to a value of <literal>2</literal> or greater.
In general you will need one WAL sender for each standby which will attach to the PostgreSQL
@@ -149,13 +156,15 @@
<varlistentry>
<indexterm>
<primary>max_replication_slots</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<term><option>max_replication_slots</option></term>
<listitem>
<indexterm>
<primary>max_replication_slots</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<para>
If you are intending to use replication slots, <option>max_replication_slots</option>
must be set to a non-zero value.
@@ -174,19 +183,20 @@
<varlistentry>
<indexterm>
<primary>wal_log_hints</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<term><option>wal_log_hints</option></term>
<listitem>
<indexterm>
<primary>wal_log_hints</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<para>If you are intending to use <application>pg_rewind</application>,
and the cluster was not initialised using data checksums, you may want to consider enabling
<option>wal_log_hints</option>.
</para>
<para>
For more details see <xref linkend="repmgr-node-rejoin-pg-rewind">.
For more details see <xref linkend="repmgr-node-rejoin-pg-rewind"/>.
</para>
<para>
PostgreSQL documentation: <ulink url="https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-WAL-LOG-HINTS">wal_log_hints</ulink>.
@@ -196,13 +206,15 @@
<varlistentry>
<indexterm>
<primary>archive_mode</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<term><option>archive_mode</option></term>
<listitem>
<indexterm>
<primary>archive_mode</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<para>
We suggest setting <option>archive_mode</option> to <literal>on</literal> (and
<option>archive_command</option> to <literal>/bin/true</literal>; see below)
@@ -225,13 +237,15 @@
<varlistentry>
<indexterm>
<primary>archive_command</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<term><option>archive_command</option></term>
<listitem>
<indexterm>
<primary>archive_command</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<para>
If you have set <option>archive_mode</option> to <literal>on</literal> but are not currently planning
to use WAL file archiving, set <option>archive_command</option> to a command which does nothing but returns
@@ -246,13 +260,15 @@
<varlistentry>
<indexterm>
<primary>wal_keep_segments</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<term><option>wal_keep_segments</option></term>
<listitem>
<indexterm>
<primary>wal_keep_segments</primary>
<secondary>PostgreSQL configuration</secondary>
</indexterm>
<para>
Normally there is no need to set <option>wal_keep_segments</option> (default: <literal>0</literal>), as it
is <emphasis>not</emphasis> a reliable way of ensuring that all required WAL segments are available to standbys.
@@ -289,16 +305,18 @@
&configuration-file;
&configuration-file-required-settings;
&configuration-file-optional-settings;
&configuration-file-log-settings;
&configuration-file-service-commands;
<sect1 id="configuration-permissions" xreflabel="Database user permissions">
<title>repmgr database user permissions</title>
<indexterm>
<primary>configuration</primary>
<secondary>database user permissions</secondary>
</indexterm>
<title>repmgr database user permissions</title>
<para>
&repmgr; will create an extension database containing objects
for administering &repmgr; metadata. The user defined in the <varname>conninfo</varname>


@@ -1,93 +0,0 @@
<chapter id="using-witness-server">
<indexterm>
<primary>witness server</primary>
</indexterm>
<title>Using a witness server</title>
<para>
A <xref linkend="witness-server"> is a normal PostgreSQL instance which
is not part of the streaming replication cluster; its purpose is, if a
failover situation occurs, to provide proof that it is the primary server
itself which is unavailable, rather than e.g. a network split between
different physical locations.
</para>
<para>
A typical use case for a witness server is a two-node streaming replication
setup, where the primary and standby are in different locations (data centres).
By creating a witness server in the same location (data centre) as the primary,
if the primary becomes unavailable it's possible for the standby to decide whether
it can promote itself without risking a "split brain" scenario: if it can't see either the
witness or the primary server, it's likely there's a network-level interruption
and it should not promote itself. If it can see the witness but not the primary,
this proves there is no network interruption and the primary itself is unavailable,
and it can therefore promote itself (and ideally take action to fence the
former primary).
</para>
<note>
<para>
<emphasis>Never</emphasis> install a witness server on the same physical host
as another node in the replication cluster managed by &repmgr; - it's essential
the witness is not affected in any way by failure of another node.
</para>
</note>
<para>
For more complex replication scenarios, e.g. with multiple datacentres, it may
be preferable to use location-based failover, which ensures that only nodes
in the same location as the primary will ever be promotion candidates;
see <xref linkend="repmgrd-network-split"> for more details.
</para>
<note>
<simpara>
A witness server will only be useful if <application>repmgrd</application>
is in use.
</simpara>
</note>
<sect1 id="creating-witness-server">
<title>Creating a witness server</title>
<para>
To create a witness server, set up a normal PostgreSQL instance on a server
in the same physical location as the cluster's primary server.
</para>
<para>
This instance should <emphasis>not</emphasis> be on the same physical host as the primary server,
as otherwise if the primary server fails due to hardware issues, the witness
server will be lost too.
</para>
<note>
<simpara>
&repmgr; 3.3 and earlier provided a <command>repmgr create witness</command>
command, which would automatically create a PostgreSQL instance. However
this often resulted in an unsatisfactory, hard-to-customise instance.
</simpara>
</note>
<para>
The witness server should be configured in the same way as a normal
&repmgr; node; see section <xref linkend="configuration">.
</para>
<para>
Register the witness server with <xref linkend="repmgr-witness-register">.
This will create the &repmgr; extension on the witness server, and make
a copy of the &repmgr; metadata.
</para>
<note>
<simpara>
As the witness server is not part of the replication cluster, further
changes to the &repmgr; metadata will be synchronised by
<application>repmgrd</application>.
</simpara>
</note>
<para>
Once the witness server has been configured, <application>repmgrd</application>
should be started; for more details see <xref linkend="repmgrd-witness-server">.
</para>
<para>
To unregister a witness server, use <xref linkend="repmgr-witness-unregister">.
</para>
</sect1>
</chapter>
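The two-node reasoning in the witness server chapter above can be reduced to a small truth table: the standby should consider promoting itself only when it cannot see the primary but can see the witness. The following is a minimal sketch of that rule alone. The function `standby_may_promote` and its two boolean inputs are hypothetical simplifications and do not represent the actual decision logic in repmgrd.

```python
def standby_may_promote(primary_visible, witness_visible):
    """Apply the two-node witness rule described in the text.

    primary_visible: the standby can currently reach the primary.
    witness_visible: the standby can currently reach the witness server.
    """
    if primary_visible:
        # The primary is reachable, so there is no failover situation.
        return False
    # Primary unreachable: promote only if the witness is still visible.
    # If neither is visible, a network-level interruption is likely and
    # promotion would risk a "split brain" scenario.
    return witness_visible


if __name__ == "__main__":
    for primary in (True, False):
        for witness in (True, False):
            print(primary, witness, standby_may_promote(primary, witness))
```

Only one of the four input combinations permits promotion, which matches the text: seeing the witness but not the primary is taken as evidence that the primary itself is unavailable.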


@@ -1,12 +1,12 @@
<chapter id="event-notifications" xreflabel="event notifications">
<title>Event Notifications</title>
<indexterm>
<primary>event notifications</primary>
</indexterm>
<title>Event Notifications</title>
<para>
Each time &repmgr; or <application>repmgrd</application> perform a significant event, a record
Each time &repmgr; or &repmgrd; perform a significant event, a record
of that event is written into the <literal>repmgr.events</literal> table together with
a timestamp, an indication of failure or success, and further details
if appropriate. This is useful for gaining an overview of events
@@ -27,7 +27,7 @@
(3 rows)</programlisting>
</para>
<para>
Alternatively, use <xref linkend="repmgr-cluster-event"> to output a
Alternatively, use <xref linkend="repmgr-cluster-event"/> to output a
formatted list of events.
</para>
<para>
@@ -91,8 +91,7 @@
may contain spaces, so should be quoted in the provided command
configuration, e.g.:
<programlisting>
event_notification_command='/path/to/some/script %n %e %s "%t" "%d"'
</programlisting>
event_notification_command='/path/to/some/script %n %e %s "%t" "%d"'</programlisting>
</para>
<para>
@@ -104,10 +103,10 @@
<term><option>%p</option></term>
<listitem>
<para>
node ID of the current primary (<xref linkend="repmgr-standby-register"> and <xref linkend="repmgr-standby-follow">)
node ID of the current primary (<xref linkend="repmgr-standby-register"/> and <xref linkend="repmgr-standby-follow"/>)
</para>
<para>
node ID of the demoted primary (<xref linkend="repmgr-standby-switchover"> only)
node ID of the demoted primary (<xref linkend="repmgr-standby-switchover"/> only)
</para>
</listitem>
</varlistentry>
@@ -116,7 +115,7 @@
<listitem>
<para>
<literal>conninfo</literal> string of the primary node
(<xref linkend="repmgr-standby-register"> and <xref linkend="repmgr-standby-follow">)
(<xref linkend="repmgr-standby-register"/> and <xref linkend="repmgr-standby-follow"/>)
</para>
<para>
<literal>conninfo</literal> string of the next available node
@@ -129,7 +128,7 @@
<term><option>%a</option></term>
<listitem>
<para>
name of the current primary node (<xref linkend="repmgr-standby-register"> and <xref linkend="repmgr-standby-follow">)
name of the current primary node (<xref linkend="repmgr-standby-register"/> and <xref linkend="repmgr-standby-follow"/>)
</para>
<para>
name of the next available node (<varname>bdr_failover</varname> and <varname>bdr_recovery</varname>)
@@ -147,7 +146,10 @@
<para>
By default, all notification types will be passed to the designated script;
the notification types can be filtered to explicitly named ones using the
<varname>event_notifications</varname> parameter.
<varname>event_notifications</varname> parameter, e.g.:
<programlisting>
event_notifications=primary_register,standby_register,witness_register</programlisting>
</para>
<para>
@@ -205,7 +207,7 @@
</para>
<para>
Events generated by <application>repmgrd</application> (streaming replication mode):
Events generated by &repmgrd; (streaming replication mode):
<itemizedlist spacing="compact" mark="bullet">
<listitem>
@@ -255,11 +257,24 @@
<simpara><literal>standby_recovery</literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgrd-primary-child-disconnection-events">child_node_disconnect</link></literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgrd-primary-child-disconnection-events">child_node_reconnect</link></literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgrd-primary-child-disconnection-events">child_node_new_connect</link></literal></simpara>
</listitem>
<listitem>
<simpara><literal><link linkend="repmgrd-primary-child-disconnection-events">child_nodes_disconnect_command</link></literal></simpara>
</listitem>
</itemizedlist>
</para>
<para>
Events generated by <application>repmgrd</application> (BDR mode):
Events generated by &repmgrd; (BDR mode):
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><literal>bdr_failover</literal></simpara>


@@ -1,91 +0,0 @@
<!-- doc/filelist.sgml -->
<!ENTITY legal SYSTEM "legal.sgml">
<!ENTITY bookindex SYSTEM "bookindex.sgml">
<!--
Some parts of the documentation are also source for some plain-text
files used during installation. To selectively ignore or include
some parts (e.g., external xref's) when generating these files we use
these parameter entities. See also standalone-install.sgml.
-->
<!ENTITY % standalone-ignore "INCLUDE">
<!ENTITY % standalone-include "IGNORE">
<!-- doc/filelist.sgml -->
<!--
By default, no index is included. Use -i include-index on the command line
to include it.
-->
<!ENTITY % include-index "IGNORE">
<!--
Create empty index element for processing by XSLT stylesheet.
-->
<!ENTITY % include-xslt-index "IGNORE">
<!--
Include external documentation sections
-->
<!ENTITY overview SYSTEM "overview.sgml">
<!ENTITY install SYSTEM "install.sgml">
<!ENTITY install-requirements SYSTEM "install-requirements.sgml">
<!ENTITY install-packages SYSTEM "install-packages.sgml">
<!ENTITY install-source SYSTEM "install-source.sgml">
<!ENTITY quickstart SYSTEM "quickstart.sgml">
<!ENTITY configuration SYSTEM "configuration.sgml">
<!ENTITY configuration-file SYSTEM "configuration-file.sgml">
<!ENTITY configuration-file-required-settings SYSTEM "configuration-file-required-settings.sgml">
<!ENTITY configuration-file-log-settings SYSTEM "configuration-file-log-settings.sgml">
<!ENTITY configuration-file-service-commands SYSTEM "configuration-file-service-commands.sgml">
<!ENTITY cloning-standbys SYSTEM "cloning-standbys.sgml">
<!ENTITY promoting-standby SYSTEM "promoting-standby.sgml">
<!ENTITY follow-new-primary SYSTEM "follow-new-primary.sgml">
<!ENTITY switchover SYSTEM "switchover.sgml">
<!ENTITY configuring-witness-server SYSTEM "configuring-witness-server.sgml">
<!ENTITY event-notifications SYSTEM "event-notifications.sgml">
<!ENTITY upgrading-repmgr SYSTEM "upgrading-repmgr.sgml">
<!ENTITY repmgrd-overview SYSTEM "repmgrd-overview.sgml">
<!ENTITY repmgrd-automatic-failover SYSTEM "repmgrd-automatic-failover.sgml">
<!ENTITY repmgrd-configuration SYSTEM "repmgrd-configuration.sgml">
<!ENTITY repmgrd-operation SYSTEM "repmgrd-operation.sgml">
<!ENTITY repmgrd-bdr SYSTEM "repmgrd-bdr.sgml">
<!ENTITY repmgr-primary-register SYSTEM "repmgr-primary-register.sgml">
<!ENTITY repmgr-primary-unregister SYSTEM "repmgr-primary-unregister.sgml">
<!ENTITY repmgr-standby-clone SYSTEM "repmgr-standby-clone.sgml">
<!ENTITY repmgr-standby-register SYSTEM "repmgr-standby-register.sgml">
<!ENTITY repmgr-standby-unregister SYSTEM "repmgr-standby-unregister.sgml">
<!ENTITY repmgr-standby-promote SYSTEM "repmgr-standby-promote.sgml">
<!ENTITY repmgr-standby-follow SYSTEM "repmgr-standby-follow.sgml">
<!ENTITY repmgr-standby-switchover SYSTEM "repmgr-standby-switchover.sgml">
<!ENTITY repmgr-witness-register SYSTEM "repmgr-witness-register.sgml">
<!ENTITY repmgr-witness-unregister SYSTEM "repmgr-witness-unregister.sgml">
<!ENTITY repmgr-node-status SYSTEM "repmgr-node-status.sgml">
<!ENTITY repmgr-node-check SYSTEM "repmgr-node-check.sgml">
<!ENTITY repmgr-node-rejoin SYSTEM "repmgr-node-rejoin.sgml">
<!ENTITY repmgr-node-service SYSTEM "repmgr-node-service.sgml">
<!ENTITY repmgr-cluster-show SYSTEM "repmgr-cluster-show.sgml">
<!ENTITY repmgr-cluster-matrix SYSTEM "repmgr-cluster-matrix.sgml">
<!ENTITY repmgr-cluster-crosscheck SYSTEM "repmgr-cluster-crosscheck.sgml">
<!ENTITY repmgr-cluster-event SYSTEM "repmgr-cluster-event.sgml">
<!ENTITY repmgr-cluster-cleanup SYSTEM "repmgr-cluster-cleanup.sgml">
<!ENTITY repmgr-daemon-status SYSTEM "repmgr-daemon-status.sgml">
<!ENTITY repmgr-daemon-start SYSTEM "repmgr-daemon-start.sgml">
<!ENTITY repmgr-daemon-stop SYSTEM "repmgr-daemon-stop.sgml">
<!ENTITY repmgr-daemon-pause SYSTEM "repmgr-daemon-pause.sgml">
<!ENTITY repmgr-daemon-unpause SYSTEM "repmgr-daemon-unpause.sgml">
<!ENTITY appendix-release-notes SYSTEM "appendix-release-notes.sgml">
<!ENTITY appendix-faq SYSTEM "appendix-faq.sgml">
<!ENTITY appendix-signatures SYSTEM "appendix-signatures.sgml">
<!ENTITY appendix-packages SYSTEM "appendix-packages.sgml">
<!ENTITY appendix-support SYSTEM "appendix-support.sgml">
<!ENTITY bookindex SYSTEM "bookindex.sgml">

70
doc/filelist.xml Normal file

@@ -0,0 +1,70 @@
<!-- doc/filelist.xml -->
<!ENTITY legal SYSTEM "legal.xml">
<!ENTITY bookindex SYSTEM "bookindex.xml">
<!--
Include external documentation sections
-->
<!ENTITY overview SYSTEM "overview.xml">
<!ENTITY install SYSTEM "install.xml">
<!ENTITY install-requirements SYSTEM "install-requirements.xml">
<!ENTITY install-packages SYSTEM "install-packages.xml">
<!ENTITY install-source SYSTEM "install-source.xml">
<!ENTITY quickstart SYSTEM "quickstart.xml">
<!ENTITY configuration SYSTEM "configuration.xml">
<!ENTITY configuration-file SYSTEM "configuration-file.xml">
<!ENTITY configuration-file-required-settings SYSTEM "configuration-file-required-settings.xml">
<!ENTITY configuration-file-optional-settings SYSTEM "configuration-file-optional-settings.xml">
<!ENTITY configuration-file-log-settings SYSTEM "configuration-file-log-settings.xml">
<!ENTITY configuration-file-service-commands SYSTEM "configuration-file-service-commands.xml">
<!ENTITY cloning-standbys SYSTEM "cloning-standbys.xml">
<!ENTITY promoting-standby SYSTEM "promoting-standby.xml">
<!ENTITY follow-new-primary SYSTEM "follow-new-primary.xml">
<!ENTITY switchover SYSTEM "switchover.xml">
<!ENTITY event-notifications SYSTEM "event-notifications.xml">
<!ENTITY upgrading-repmgr SYSTEM "upgrading-repmgr.xml">
<!ENTITY repmgrd-overview SYSTEM "repmgrd-overview.xml">
<!ENTITY repmgrd-automatic-failover SYSTEM "repmgrd-automatic-failover.xml">
<!ENTITY repmgrd-configuration SYSTEM "repmgrd-configuration.xml">
<!ENTITY repmgrd-operation SYSTEM "repmgrd-operation.xml">
<!ENTITY repmgrd-bdr SYSTEM "repmgrd-bdr.xml">
<!ENTITY repmgr-primary-register SYSTEM "repmgr-primary-register.xml">
<!ENTITY repmgr-primary-unregister SYSTEM "repmgr-primary-unregister.xml">
<!ENTITY repmgr-standby-clone SYSTEM "repmgr-standby-clone.xml">
<!ENTITY repmgr-standby-register SYSTEM "repmgr-standby-register.xml">
<!ENTITY repmgr-standby-unregister SYSTEM "repmgr-standby-unregister.xml">
<!ENTITY repmgr-standby-promote SYSTEM "repmgr-standby-promote.xml">
<!ENTITY repmgr-standby-follow SYSTEM "repmgr-standby-follow.xml">
<!ENTITY repmgr-standby-switchover SYSTEM "repmgr-standby-switchover.xml">
<!ENTITY repmgr-witness-register SYSTEM "repmgr-witness-register.xml">
<!ENTITY repmgr-witness-unregister SYSTEM "repmgr-witness-unregister.xml">
<!ENTITY repmgr-node-status SYSTEM "repmgr-node-status.xml">
<!ENTITY repmgr-node-check SYSTEM "repmgr-node-check.xml">
<!ENTITY repmgr-node-rejoin SYSTEM "repmgr-node-rejoin.xml">
<!ENTITY repmgr-node-service SYSTEM "repmgr-node-service.xml">
<!ENTITY repmgr-cluster-show SYSTEM "repmgr-cluster-show.xml">
<!ENTITY repmgr-cluster-matrix SYSTEM "repmgr-cluster-matrix.xml">
<!ENTITY repmgr-cluster-crosscheck SYSTEM "repmgr-cluster-crosscheck.xml">
<!ENTITY repmgr-cluster-event SYSTEM "repmgr-cluster-event.xml">
<!ENTITY repmgr-cluster-cleanup SYSTEM "repmgr-cluster-cleanup.xml">
<!ENTITY repmgr-daemon-status SYSTEM "repmgr-daemon-status.xml">
<!ENTITY repmgr-daemon-start SYSTEM "repmgr-daemon-start.xml">
<!ENTITY repmgr-daemon-stop SYSTEM "repmgr-daemon-stop.xml">
<!ENTITY repmgr-daemon-pause SYSTEM "repmgr-daemon-pause.xml">
<!ENTITY repmgr-daemon-unpause SYSTEM "repmgr-daemon-unpause.xml">
<!ENTITY appendix-release-notes SYSTEM "appendix-release-notes.xml">
<!ENTITY appendix-faq SYSTEM "appendix-faq.xml">
<!ENTITY appendix-signatures SYSTEM "appendix-signatures.xml">
<!ENTITY appendix-packages SYSTEM "appendix-packages.xml">
<!ENTITY appendix-support SYSTEM "appendix-support.xml">
<!ENTITY bookindex SYSTEM "bookindex.xml">

@@ -1,18 +1,19 @@
<chapter id="follow-new-primary">
<title>Following a new primary</title>
<indexterm>
<primary>Following a new primary</primary>
<seealso>repmgr standby follow</seealso>
</indexterm>
<title>Following a new primary</title>
<para>
Following the failure or removal of the replication cluster's existing primary
server, <xref linkend="repmgr-standby-follow"> can be used to make 'orphaned' standbys
server, <xref linkend="repmgr-standby-follow"/> can be used to make &quot;orphaned&quot; standbys
follow the new primary and catch up to its current state.
</para>
<para>
To demonstrate this, assuming a replication cluster in the same state as the
end of the preceding section (<xref linkend="promoting-standby">),
end of the preceding section (<xref linkend="promoting-standby"/>),
execute this:
<programlisting>
$ repmgr -f /etc/repmgr.conf standby follow

@@ -13,12 +13,13 @@
<sect2 id="installation-packages-redhat" xreflabel="Installing from packages on RHEL, CentOS and Fedora">
<title>RedHat/CentOS/Fedora</title>
<indexterm>
<primary>installation</primary>
<secondary>on Red Hat/CentOS/Fedora etc.</secondary>
</indexterm>
<title>RedHat/CentOS/Fedora</title>
<para>
&repmgr; RPM packages for RedHat/CentOS variants and Fedora are available from the
<ulink url="https://2ndquadrant.com">2ndQuadrant</ulink>
@@ -46,7 +47,7 @@
<para>
For more information on the package contents, including details of installation
paths and relevant <link linkend="configuration-file-service-commands">service commands</link>,
see the appendix section <xref linkend="packages-centos">.
see the appendix section <xref linkend="packages-centos"/>.
</para>
@@ -105,7 +106,7 @@ sudo yum repolist</programlisting>
<listitem>
<para>
Install the &repmgr version appropriate for your PostgreSQL version (e.g. <literal>repmgr10</literal>):
Install the &repmgr; version appropriate for your PostgreSQL version (e.g. <literal>repmgr10</literal>):
<programlisting>
sudo yum install repmgr10</programlisting>
</para>
@@ -181,12 +182,13 @@ yum search repmgr</programlisting>
<sect2 id="installation-packages-debian" xreflabel="Installing from packages on Debian or Ubuntu">
<title>Debian/Ubuntu</title>
<indexterm>
<primary>installation</primary>
<secondary>on Debian/Ubuntu etc.</secondary>
</indexterm>
<title>Debian/Ubuntu</title>
<para>.deb packages for &repmgr; are available from the
PostgreSQL Community APT repository (<ulink url="http://apt.postgresql.org/">http://apt.postgresql.org/</ulink>).
Instructions can be found in the APT section of the PostgreSQL Wiki
@@ -195,7 +197,7 @@ yum search repmgr</programlisting>
<para>
For more information on the package contents, including details of installation
paths and relevant <link linkend="configuration-file-service-commands">service commands</link>,
see the appendix section <xref linkend="packages-debian-ubuntu">.
see the appendix section <xref linkend="packages-debian-ubuntu"/>.
</para>
<sect3 id="installation-packages-debian-ubuntu-2ndq">
@@ -242,7 +244,7 @@ curl https://dl.2ndquadrant.com/default/release/get/deb | sudo bash</programlist
<listitem>
<para>
Install the &repmgr version appropriate for your PostgreSQL version (e.g. <literal>repmgr10</literal>):
Install the &repmgr; version appropriate for your PostgreSQL version (e.g. <literal>repmgr10</literal>):
<programlisting>
sudo apt-get install postgresql-10-repmgr</programlisting>
</para>

@@ -1,11 +1,12 @@
<sect1 id="install-requirements" xreflabel="installation requirements">
<title>Requirements for installing repmgr</title>
<indexterm>
<primary>installation</primary>
<secondary>requirements</secondary>
</indexterm>
<title>Requirements for installing repmgr</title>
<para>
repmgr is developed and tested on Linux and OS X, but should work on any
UNIX-like system supported by PostgreSQL itself. There is no support for
@@ -20,7 +21,7 @@
<note>
<simpara>
If upgrading from &repmgr; 3.x, please see the section <xref linkend="upgrading-from-repmgr-3">.
If upgrading from &repmgr; 3.x, please see the section <xref linkend="upgrading-from-repmgr-3"/>.
</simpara>
</note>
@@ -45,14 +46,14 @@
</simpara>
<simpara>
If different &quot;major&quot; &repmgr; versions (e.g. 3.3.x and 4.1.x)
are installed on different nodes, in the best case &repmgr; (in particular <application>repmgrd</application>)
are installed on different nodes, in the best case &repmgr; (in particular &repmgrd;)
will not run. In the worst case, you will end up with a broken cluster.
</simpara>
</note>
<para>
A dedicated system user for &repmgr; is <emphasis>not</emphasis> required; as many &repmgr; and
<application>repmgrd</application> actions require direct access to the PostgreSQL data directory,
&repmgrd; actions require direct access to the PostgreSQL data directory,
these commands should be executed by the <literal>postgres</literal> user.
</para>
@@ -72,6 +73,8 @@
<sect2 id="install-compatibility-matrix">
<title>&repmgr; compatibility matrix</title>
<indexterm>
<primary>repmgr</primary>
<secondary>compatibility matrix</secondary>
@@ -81,7 +84,6 @@
<primary>compatibility matrix</primary>
</indexterm>
<title>&repmgr; compatibility matrix</title>
<para>
The following table provides an overview of which &repmgr; version supports
which PostgreSQL version.
@@ -91,7 +93,7 @@
<table id="repmgr-compatibility-matrix">
<title>&repmgr; compatibility matrix</title>
<tgroup cols="2">
<tgroup cols="3">
<thead>
<row>
<entry>

@@ -1,11 +1,12 @@
<sect1 id="installation-source" xreflabel="Installing from source code">
<indexterm>
<primary>installation</primary>
<secondary>from source</secondary>
</indexterm>
<title>Installing &repmgr; from source</title>
<indexterm>
<primary>installation</primary>
<secondary>from source</secondary>
</indexterm>
<sect2 id="installation-source-prereqs">
<title>Prerequisites for installing from source</title>
<para>
@@ -61,28 +62,28 @@ deb-src http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main</programlisti
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><literal>llibedit-dev</literal></simpara>
<simpara><literal>libedit-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>llibkrb5-dev</literal></simpara>
<simpara><literal>libkrb5-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>llibpam0g-dev</literal></simpara>
<simpara><literal>libpam0g-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>llibreadline-dev</literal></simpara>
<simpara><literal>libreadline-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>llibselinux1-dev</literal></simpara>
<simpara><literal>libselinux1-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>llibssl-dev</literal></simpara>
<simpara><literal>libssl-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>llibxml2-dev</literal></simpara>
<simpara><literal>libxml2-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>llibxslt1-dev</literal></simpara>
<simpara><literal>libxslt1-dev</literal></simpara>
</listitem>
</itemizedlist>
</para>
@@ -136,6 +137,16 @@ deb-src http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main</programlisti
</itemizedlist>
</para>
</note>
<tip>
<para>
If building against PostgreSQL 11 or later configured with the <option>--with-llvm</option> option
(this is the case with the PGDG-provided packages) you'll also need to install the
<literal>llvm-toolset-7-clang</literal> package. This is available via the
<ulink url="https://wiki.centos.org/AdditionalResources/Repositories/SCL">Software Collections (SCL) Repository</ulink>.
</para>
</tip>
</listitem>
</itemizedlist>
</para>
@@ -190,7 +201,7 @@ deb-src http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main</programlisti
&repmgr; website along with a tarball checksum and a matching GnuPG
signature. See
<ulink url="http://repmgr.org/">http://repmgr.org/</ulink>
for the download information. See <xref linkend="appendix-signatures">
for the download information. See <xref linkend="appendix-signatures"/>
for information on verifying digital signatures.
</para>
@@ -198,11 +209,11 @@ deb-src http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main</programlisti
You will need to download the repmgr source, e.g. <filename>repmgr-4.0.tar.gz</filename>.
You may optionally verify the package checksums from the
<literal>.md5</literal> files and/or verify the GnuPG signatures
per <xref linkend="appendix-signatures">.
per <xref linkend="appendix-signatures"/>.
</para>
<para>
After you unpack the source code archives using <literal>tar xf</literal>
After you unpack the source code archives using <command>tar xf</command>
the installation process is the same as if you were installing from a git
clone.
</para>
@@ -217,7 +228,7 @@ deb-src http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main</programlisti
To install &repmgr; from source, simply execute:
<programlisting>
./configure && make install</programlisting>
./configure &amp;&amp; make install</programlisting>
Ensure <command>pg_config</command> for the target PostgreSQL version is in
<varname>$PATH</varname>.
@@ -226,16 +237,30 @@ deb-src http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main</programlisti
<sect2 id="installation-build-repmgr-docs">
<sect2 id="installation-build-repmgr-docs" xreflabel="Building repmgr documentation">
<title>Building &repmgr; documentation</title>
<para>
The &repmgr; documentation is (like the main PostgreSQL project)
written in DocBook format. To build it locally as HTML, you'll need to
written in DocBook XML format. To build it locally as HTML, you'll need to
install the required packages as described in the
<ulink url="https://www.postgresql.org/docs/9.6/docguide-toolsets.html">
PostgreSQL documentation</ulink> then execute:
<ulink url="https://www.postgresql.org/docs/current/docguide-toolsets.html">PostgreSQL documentation</ulink>.
</para>
<para>
The minimum PostgreSQL version for building the &repmgr; documentation is
PostgreSQL 9.5.
</para>
<note>
<simpara>
In &repmgr; 4.3 and earlier, the documentation can only be built against
PostgreSQL 9.6 or earlier.
</simpara>
</note>
<para>
To build the documentation as HTML, execute:
<programlisting>
./configure && make install-doc</programlisting>
./configure &amp;&amp; make doc</programlisting>
</para>
<para>
The generated HTML files will be placed in the <filename>doc/html</filename>
@@ -243,19 +268,20 @@ deb-src http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main</programlisti
</para>
<para>
To build the documentation as a single HTML file, execute:
To build the documentation as a single HTML file, after configuring and building
the main &repmgr; source as described above, execute:
<programlisting>
cd doc/ && make repmgr.html</programlisting>
./configure &amp;&amp; make doc-repmgr.html</programlisting>
</para>
<note>
<simpara>
Due to changes in PostgreSQL's documentation build system from PostgreSQL 10,
the documentation can currently only be built against PostgreSQL 9.6 or earlier.
This limitation will be fixed when time and resources permit.
</simpara>
</note>
<para>
To build the documentation as a PDF file, after configuring and building
the main &repmgr; source as described above, execute:
<programlisting>
./configure &amp;&amp; make doc-repmgr-A4.pdf</programlisting>
</para>
</sect2>
</sect1>

@@ -1,10 +1,11 @@
<chapter id="installation" xreflabel="Installation">
<title>Installation</title>
<indexterm>
<primary>installation</primary>
</indexterm>
<title>Installation</title>
<para>
&repmgr; can be installed from binary packages provided by your operating
system's packaging system, or from source.
@@ -18,7 +19,7 @@
only option if there are no packages for your operating system yet.
</para>
<para>
Before installing &repmgr; make sure you satisfy the <xref linkend="install-requirements">.
Before installing &repmgr; make sure you satisfy the <xref linkend="install-requirements"/>.
</para>
&install-requirements;

@@ -1,4 +1,4 @@
<!-- doc/legal.sgml -->
<!-- doc/legal.xml -->
<date>2017</date>

@@ -7,18 +7,18 @@
</para>
<sect1 id="repmgr-concepts" xreflabel="Concepts">
<title>Concepts</title>
<indexterm>
<primary>concepts</primary>
</indexterm>
<title>Concepts</title>
<para>
This guide assumes that you are familiar with PostgreSQL administration and
streaming replication concepts. For further details on streaming
replication, see the PostgreSQL documentation section on <ulink
url="https://www.postgresql.org/docs/current/interactive/warm-standby.html#STREAMING-REPLICATION">
streaming replication</>.
url="https://www.postgresql.org/docs/current/warm-standby.html#STREAMING-REPLICATION">
streaming replication</ulink>.
</para>
<para>
The following terms are used throughout the &repmgr; documentation.
@@ -58,7 +58,7 @@
<listitem>
<simpara>
This is the action which occurs if a primary server fails and a suitable standby
is promoted as the new primary. The <application>repmgrd</application> daemon supports automatic failover
is promoted as the new primary. The &repmgrd; daemon supports automatic failover
to minimise downtime.
</simpara>
</listitem>
@@ -107,7 +107,7 @@
promotes a (local) standby.
</para>
<para>
A witness server only needs to be created if <application>repmgrd</application>
A witness server only needs to be created if &repmgrd;
is in use.
</para>
</listitem>
@@ -198,7 +198,7 @@
</listitem>
<listitem>
<simpara><literal>repmgr.monitoring_history</literal>: historical standby monitoring information
written by <application>repmgrd</application></simpara>
written by &repmgrd;</simpara>
</listitem>
</itemizedlist>
</para>
@@ -214,7 +214,7 @@
name of the server's upstream node</simpara>
</listitem>
<listitem>
<simpara>repmgr.replication_status: when <application>repmgrd</application>'s monitoring is enabled, shows
<simpara>repmgr.replication_status: when &repmgrd;'s monitoring is enabled, shows
current monitoring status for each standby.</simpara>
</listitem>
</itemizedlist>

@@ -1,13 +1,13 @@
<chapter id="promoting-standby" xreflabel="Promoting a standby">
<title>Promoting a standby server with repmgr</title>
<indexterm>
<primary>promoting a standby</primary>
<seealso>repmgr standby promote</seealso>
</indexterm>
<title>Promoting a standby server with repmgr</title>
<para>
If a primary server fails or needs to be removed from the replication cluster,
a new primary server must be designated, to ensure the cluster continues
to function correctly. This can be done with <xref linkend="repmgr-standby-promote">,
to function correctly. This can be done with <xref linkend="repmgr-standby-promote"/>,
which promotes the standby on the current server to primary.
</para>
@@ -31,7 +31,7 @@
At this point the replication cluster will be in a partially disabled state, with
both standbys accepting read-only connections while attempting to connect to the
stopped primary. Note that the &repmgr; metadata table will not yet have been updated;
executing <xref linkend="repmgr-cluster-show"> will note the discrepancy:
executing <xref linkend="repmgr-cluster-show"/> will note the discrepancy:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Connection string
@@ -60,7 +60,7 @@
DETAIL: node 2 was successfully promoted to primary</programlisting>
</para>
<para>
Executing <xref linkend="repmgr-cluster-show"> will show the current state; as there is now an
Executing <xref linkend="repmgr-cluster-show"/> will show the current state; as there is now an
active primary, the previous warning will not be displayed:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster show
@@ -72,8 +72,8 @@
</para>
<para>
However the sole remaining standby (<literal>node3</literal>) is still trying to replicate from the failed
primary; <xref linkend="repmgr-standby-follow"> must now be executed to rectify this situation
(see <xref linkend="follow-new-primary"> for example).
primary; <xref linkend="repmgr-standby-follow"/> must now be executed to rectify this situation
(see <xref linkend="follow-new-primary"/> for example).
</para>
</chapter>

@@ -17,7 +17,7 @@
<note>
<simpara>
To upgrade an existing &repmgr; 3.x installation, see section
<xref linkend="upgrading-from-repmgr-3">.
<xref linkend="upgrading-from-repmgr-3"/>.
</simpara>
</note>
@@ -76,19 +76,25 @@
</para>
<programlisting>
# Enable replication connections; set this figure to at least one more
# Enable replication connections; set this value to at least one more
# than the number of standbys which will connect to this server
# (note that repmgr will execute `pg_basebackup` in WAL streaming mode,
# which requires two free WAL senders)
# (note that repmgr will execute "pg_basebackup" in WAL streaming mode,
# which requires two free WAL senders).
#
# See: https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-MAX-WAL-SENDERS
max_wal_senders = 10
# Enable replication slots; set this figure to at least one more
# If using replication slots, set this value to at least one more
# than the number of standbys which will connect to this server.
# Note that repmgr will only make use of replication slots if
# "use_replication_slots" is set to "true" in repmgr.conf
# "use_replication_slots" is set to "true" in "repmgr.conf".
# (If you are not intending to use replication slots, this value
# can be set to "0").
#
# See: https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-MAX-REPLICATION-SLOTS
max_replication_slots = 0
max_replication_slots = 10
# Ensure WAL files contain enough information to enable read-only queries
# on the standby.
@@ -103,33 +109,41 @@
# Enable read-only queries on a standby
# (Note: this will be ignored on a primary but we recommend including
# it anyway)
# it anyway, in case the primary later becomes a standby)
#
# See: https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-HOT-STANDBY
hot_standby = on
# Enable WAL file archiving
#
# See: https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-ARCHIVE-MODE
archive_mode = on
# Set archive command to a script or application that will safely store
# you WALs in a secure place. /bin/true is an example of a command that
# ignores archiving. Use something more sensible.
# Set archive command to a dummy command; this can later be changed without
# needing to restart the PostgreSQL instance.
#
# See: https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-ARCHIVE-COMMAND
archive_command = '/bin/true'
</programlisting>
<tip>
<simpara>
Rather than editing these settings in the default <filename>postgresql.conf</filename>
file, create a separate file such as <filename>postgresql.replication.conf</filename> and
file, create a separate file such as <filename>postgresql.replication.conf</filename> and
include it from the end of the main configuration file with:
<command>include 'postgresql.replication.conf</command>.
<command>include 'postgresql.replication.conf'</command>.
</simpara>
</tip>
<para>
Additionally, if you are intending to use <application>pg_rewind</application>,
and the cluster was not initialised using data checksums, you may want to consider enabling
<varname>wal_log_hints</varname>; for more details see <xref linkend="repmgr-node-rejoin-pg-rewind">.
<varname>wal_log_hints</varname>; for more details see <xref linkend="repmgr-node-rejoin-pg-rewind"/>.
</para>
<para>
See also the <link linkend="configuration-postgresql">PostgreSQL configuration</link> section in the <link linkend="configuration">repmgr configuaration guide</link>.
See also the <link linkend="configuration-postgresql">PostgreSQL configuration</link> section in the
<link linkend="configuration">repmgr configuration guide</link>.
</para>
</sect1>
@@ -248,7 +262,7 @@
<para>
<filename>repmgr.conf</filename> should not be stored inside the PostgreSQL data directory,
as it could be overwritten when setting up or reinitialising the PostgreSQL
server. See sections <xref linkend="configuration"> and <xref linkend="configuration-file">
server. See sections <xref linkend="configuration"/> and <xref linkend="configuration-file"/>
for further details about <filename>repmgr.conf</filename>.
</para>
@@ -289,7 +303,7 @@
<para>
See the file
<ulink url="https://raw.githubusercontent.com/2ndQuadrant/repmgr/master/repmgr.conf.sample">repmgr.conf.sample</>
<ulink url="https://raw.githubusercontent.com/2ndQuadrant/repmgr/master/repmgr.conf.sample">repmgr.conf.sample</ulink>
for details of all available configuration parameters.
</para>
@@ -338,7 +352,7 @@
slot_name |
config_file | /etc/repmgr.conf</programlisting>
<para>
Each server in the replication cluster will have its own record. If <application>repmgrd</application>
Each server in the replication cluster will have its own record. If &repmgrd;
is in use, the fields <literal>upstream_node_id</literal>, <literal>active</literal> and
<literal>type</literal> will be updated when the node's status or role changes.
</para>

@@ -38,7 +38,7 @@
<title>Notes</title>
<para>
Monitoring history will only be written if <application>repmgrd</application> is active, and
Monitoring history will only be written if &repmgrd; is active, and
<varname>monitoring_history</varname> is set to <literal>true</literal> in
<filename>repmgr.conf</filename>.
</para>
@@ -69,8 +69,8 @@
<refsect1>
<title>See also</title>
<para>
For more details see the sections <xref linkend="repmgrd-monitoring"> and
<xref linkend="repmgrd-monitoring-configuration">.
For more details see the sections <xref linkend="repmgrd-monitoring"/> and
<xref linkend="repmgrd-monitoring-configuration"/>.
</para>
</refsect1>

@@ -16,9 +16,9 @@
<refsect1>
<title>Description</title>
<para>
<command>repmgr cluster crosscheck</command> is similar to <xref linkend="repmgr-cluster-matrix">,
<command>repmgr cluster crosscheck</command> is similar to <xref linkend="repmgr-cluster-matrix"/>,
but cross-checks connections between each combination of nodes. In "Example 3" in
<xref linkend="repmgr-cluster-matrix"> we have no information about the state of <literal>node3</literal>.
<xref linkend="repmgr-cluster-matrix"/> we have no information about the state of <literal>node3</literal>.
However by running <command>repmgr cluster crosscheck</command> it's possible to get a better
overview of the cluster situation:
<programlisting>

@@ -40,12 +40,12 @@
<simpara><literal>--node-name</literal>: restrict entries to node with this name</simpara>
</listitem>
<listitem>
<simpara><literal>--event</literal>: filter specific event (see <xref linkend="event-notifications"> for a full list)</simpara>
<simpara><literal>--event</literal>: filter specific event (see <xref linkend="event-notifications"/> for a full list)</simpara>
</listitem>
</itemizedlist>
</para>
<para>
The "Details" column can be omitted by providing <literal>--terse</literal>.
The &quot;Details&quot; column can be omitted by providing <literal>--compact</literal>.
</para>
</refsect1>
@@ -71,9 +71,9 @@
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster event --event=standby_register
Node ID | Name | Event | OK | Timestamp | Details
---------+-------+------------------+----+---------------------+--------------------------------
3 | node3 | standby_register | t | 2017-08-17 10:28:55 | standby registration succeeded
2 | node2 | standby_register | t | 2017-08-17 10:28:53 | standby registration succeeded</programlisting>
---------+-------+------------------+----+---------------------+-------------------------------------------------------
3 | node3 | standby_register | t | 2019-04-16 10:59:59 | standby registration succeeded; upstream node ID is 1
2 | node2 | standby_register | t | 2019-04-16 10:59:57 | standby registration succeeded; upstream node ID is 1</programlisting>
</para>
</refsect1>
</refentry>

@@ -93,7 +93,7 @@
connection from <literal>node3</literal>.
</para>
<para>
In this case, the <xref linkend="repmgr-cluster-crosscheck"> command will produce a more
In this case, the <xref linkend="repmgr-cluster-crosscheck"/> command will produce a more
useful result.
</para>
</refsect1>

@@ -22,11 +22,13 @@
directly and can be run on any node in the cluster; this is also useful when analyzing
connectivity from a particular node.
</para>
<para>
For PostgreSQL 9.6 and later, the output will also contain the node's current timeline ID.
</para>
<para>
Node availability is tested by connecting from the node where
<command>repmgr cluster show</command> is executed, and does not necessarily imply the node
is down. See <xref linkend="repmgr-cluster-matrix"> and <xref linkend="repmgr-cluster-crosscheck"> to get
is down. See <xref linkend="repmgr-cluster-matrix"/> and <xref linkend="repmgr-cluster-crosscheck"/> to get
better overviews of connections between nodes.
</para>
@@ -52,11 +54,11 @@
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Connection string
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+-----------------------------------------
1 | node1 | primary | * running | | default | 100 | host=db_node1 dbname=repmgr user=repmgr
2 | node2 | standby | running | node1 | default | 100 | host=db_node2 dbname=repmgr user=repmgr
3 | node3 | standby | running | node1 | default | 100 | host=db_node3 dbname=repmgr user=repmgr</programlisting>
1 | node1 | primary | * running | | default | 100 | 1 | host=db_node1 dbname=repmgr user=repmgr
2 | node2 | standby | running | node1 | default | 100 | 1 | host=db_node2 dbname=repmgr user=repmgr
3 | node3 | standby | running | node1 | default | 100 | 1 | host=db_node3 dbname=repmgr user=repmgr</programlisting>
</para>
</refsect1>
<refsect1>
@@ -101,7 +103,7 @@
</para>
<tip>
<para>
Use <xref linkend="repmgr-cluster-matrix"> and <xref linkend="repmgr-cluster-crosscheck">
Use <xref linkend="repmgr-cluster-matrix"/> and <xref linkend="repmgr-cluster-crosscheck"/>
to diagnose connection issues across the whole replication cluster.
</para>
</tip>
@@ -196,11 +198,31 @@
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_BAD_CONFIG (1)</option></term>
<listitem>
<para>
An issue was encountered while attempting to retrieve
&repmgr; metadata.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_DB_CONN (6)</option></term>
<listitem>
<para>
&repmgr; was unable to connect to the local PostgreSQL instance.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>ERR_NODE_STATUS (25)</option></term>
<listitem>
<para>
One or more issues were detected.
One or more issues were detected with the replication configuration,
e.g. a node was not in its expected state.
</para>
</listitem>
</varlistentry>
@@ -211,7 +233,7 @@
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-node-status">, <xref linkend="repmgr-node-check">, <xref linkend="repmgr-daemon-status">
<xref linkend="repmgr-node-status"/>, <xref linkend="repmgr-node-check"/>, <xref linkend="repmgr-daemon-status"/>
</para>
</refsect1>

@@ -14,30 +14,30 @@
<refnamediv>
<refname>repmgr daemon pause</refname>
<refpurpose>Instruct all <application>repmgrd</application> instances in the replication cluster to pause failover operations</refpurpose>
<refpurpose>Instruct all &repmgrd; instances in the replication cluster to pause failover operations</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
This command can be run on any active node in the replication cluster to instruct all
running <application>repmgrd</application> instances to &quot;pause&quot; themselves, i.e. take no
running &repmgrd; instances to &quot;pause&quot; themselves, i.e. take no
action (such as promoting themselves or following a new primary) if a failover event is detected.
</para>
<para>
This functionality is useful for performing maintenance operations, such as switchovers
or upgrades, which might otherwise trigger a failover if <application>repmgrd</application>
or upgrades, which might otherwise trigger a failover if &repmgrd;
is running normally.
</para>
<note>
<para>
It's important to wait a few seconds after restarting PostgreSQL on any node before running
<command>repmgr daemon pause</command>, as the <application>repmgrd</application> instance
<command>repmgr daemon pause</command>, as the &repmgrd; instance
on the restarted node will take a second or two before it has updated its status.
</para>
</note>
<para>
<xref linkend="repmgr-daemon-unpause"> will instruct all previously paused <application>repmgrd</application>
<xref linkend="repmgr-daemon-unpause"/> will instruct all previously paused &repmgrd;
instances to resume normal failover operation.
</para>
</refsect1>
@@ -69,7 +69,7 @@ NOTICE: node 3 (node3) paused</programlisting>
<term><option>--dry-run</option></term>
<listitem>
<para>
Check if nodes are reachable but don't pause <application>repmgrd</application>.
Check if nodes are reachable but don't pause &repmgrd;.
</para>
</listitem>
</varlistentry>
@@ -87,7 +87,7 @@ NOTICE: node 3 (node3) paused</programlisting>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
<application>repmgrd</application> could be paused on all nodes.
&repmgrd; could be paused on all nodes.
</para>
</listitem>
</varlistentry>
@@ -96,7 +96,7 @@ NOTICE: node 3 (node3) paused</programlisting>
<term><option>ERR_REPMGRD_PAUSE (26)</option></term>
<listitem>
<para>
<application>repmgrd</application> could not be paused on one or more nodes.
&repmgrd; could not be paused on one or more nodes.
</para>
</listitem>
</varlistentry>
@@ -107,7 +107,7 @@ NOTICE: node 3 (node3) paused</programlisting>
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-daemon-unpause">, <xref linkend="repmgr-daemon-status">
<xref linkend="repmgr-daemon-unpause"/>, <xref linkend="repmgr-daemon-status"/>
</para>
</refsect1>
</refentry>


@@ -14,17 +14,17 @@
<refnamediv>
<refname>repmgr daemon start</refname>
<refpurpose>Start the <application>repmgrd</application> daemon</refpurpose>
<refpurpose>Start the &repmgrd; daemon</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
This command starts the <application>repmgrd</application> daemon on the
This command starts the &repmgrd; daemon on the
local node.
</para>
<para>
By default, &repmgr; will wait for up to 15 seconds to confirm that <application>repmgrd</application>
By default, &repmgr; will wait for up to 15 seconds to confirm that &repmgrd;
started. This behaviour can be overridden by specifying a different value using the <option>--wait</option>
option, or disabled altogether with the <option>--no-wait</option> option.
</para>
@@ -33,7 +33,7 @@
<para>
The <filename>repmgr.conf</filename> parameter <varname>repmgrd_service_start_command</varname>
must be set for <command>repmgr daemon start</command> to work; see section
<xref linkend="repmgr-daemon-start-configuration"> for details.
<xref linkend="repmgr-daemon-start-configuration"/> for details.
</para>
</important>
</refsect1>
@@ -50,7 +50,7 @@
<term><option>--dry-run</option></term>
<listitem>
<para>
Check prerequisites but don't actually attempt to start <application>repmgrd</application>.
Check prerequisites but don't actually attempt to start &repmgrd;.
</para>
<para>
This action will output the command which would be executed.
@@ -63,7 +63,7 @@
<term><option>--wait</option></term>
<listitem>
<para>
Wait for the specified number of seconds to confirm that <application>repmgrd</application>
Wait for the specified number of seconds to confirm that &repmgrd;
started successfully.
</para>
<para>
@@ -77,7 +77,7 @@
<term><option>--no-wait</option></term>
<listitem>
<para>
Don't wait to confirm that <application>repmgrd</application>
Don't wait to confirm that &repmgrd;
started successfully.
</para>
<para>
@@ -99,17 +99,18 @@
<variablelist>
<varlistentry>
<indexterm>
<primary>repmgrd_service_start_command</primary>
<secondary>with &quot;repmgr daemon start&quot;</secondary>
</indexterm>
<term><option>repmgrd_service_start_command</option></term>
<listitem>
<indexterm>
<primary>repmgrd_service_start_command</primary>
<secondary>with &quot;repmgr daemon start&quot;</secondary>
</indexterm>
<para>
<command>repmgr daemon start</command> will execute the command defined by the
<varname>repmgrd_service_start_command</varname> parameter in <filename>repmgr.conf</filename>.
This must be set to a shell command which will start <application>repmgrd</application>;
This must be set to a shell command which will start &repmgrd;;
if &repmgr; was installed from a package, this will be the service command defined by the
package. For more details see <link linkend="appendix-packages">Appendix: &repmgr; package details</link>.
</para>
@@ -117,7 +118,7 @@
<para>
If &repmgr; was installed from a system package, and you do not configure
<varname>repmgrd_service_start_command</varname> to an appropriate service command, this may
result in the system becoming confused about the state of the <application>repmgrd</application>
result in the system becoming confused about the state of the &repmgrd;
service; this is particularly the case with <literal>systemd</literal>.
</para>
</important>
@@ -139,12 +140,12 @@
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
The <application>repmgrd</application> start command (defined in
The &repmgrd; start command (defined in
<varname>repmgrd_service_start_command</varname>) was successfully executed.
</para>
<para>
If the <option>--wait</option> option was provided, &repmgr; will confirm that
<application>repmgrd</application> has actually started up.
&repmgrd; has actually started up.
</para>
</listitem>
</varlistentry>
@@ -167,10 +168,10 @@
&repmgr; was unable to connect to the local PostgreSQL node.
</para>
<para>
PostgreSQL must be running before <application>repmgrd</application>
PostgreSQL must be running before &repmgrd;
can be started. Additionally, unless the <option>--no-wait</option> option was
provided, &repmgr; needs to be able to connect to the local PostgreSQL node
to determine the state of <application>repmgrd</application>.
to determine the state of &repmgrd;.
</para>
</listitem>
</varlistentry>
@@ -180,11 +181,11 @@
<term><option>ERR_REPMGRD_SERVICE (27)</option></term>
<listitem>
<para>
The <application>repmgrd</application> start command (defined in
The &repmgrd; start command (defined in
<varname>repmgrd_service_start_command</varname>) was not successfully executed.
</para>
<para>
This can also mean that &repmgr; was unable to confirm whether <application>repmgrd</application>
This can also mean that &repmgr; was unable to confirm whether &repmgrd;
successfully started (unless the <option>--no-wait</option> option was provided).
</para>
</listitem>
@@ -196,7 +197,7 @@
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-daemon-stop">, <xref linkend="repmgr-daemon-status">, <xref linkend="repmgrd-daemon">
<xref linkend="repmgr-daemon-stop"/>, <xref linkend="repmgr-daemon-status"/>, <xref linkend="repmgrd-daemon"/>
</para>
</refsect1>


@@ -14,15 +14,15 @@
<refnamediv>
<refname>repmgr daemon status</refname>
<refpurpose>display information about the status of <application>repmgrd</application> on each node in the cluster</refpurpose>
<refpurpose>display information about the status of &repmgrd; on each node in the cluster</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
This command provides an overview of all active nodes in the cluster and the state
of each node's <application>repmgrd</application> instance. It can be used to check
the result of <xref linkend="repmgr-daemon-pause"> and <xref linkend="repmgr-daemon-unpause">
of each node's &repmgrd; instance. It can be used to check
the result of <xref linkend="repmgr-daemon-pause"/> and <xref linkend="repmgr-daemon-unpause"/>
operations.
</para>
</refsect1>
@@ -35,13 +35,13 @@
</para>
<para>
If PostgreSQL is not running on a node, &repmgr; will not be able to determine the
status of that node's <application>repmgrd</application> instance.
status of that node's &repmgrd; instance.
</para>
<note>
<para>
After restarting PostgreSQL on any node, the <application>repmgrd</application> instance
After restarting PostgreSQL on any node, the &repmgrd; instance
will take a second or two before it is able to update its status. Until then,
<application>repmgrd</application> will be shown as not running.
&repmgrd; will be shown as not running.
</para>
</note>
@@ -50,35 +50,33 @@
<refsect1>
<title>Examples</title>
<para>
<application>repmgrd</application> running normally on all nodes:
&repmgrd; running normally on all nodes:
<programlisting>$ repmgr -f /etc/repmgr.conf daemon status
ID | Name | Role | Priority | Status | repmgrd | PID | Paused? | Upstream last seen
----+-------+---------+----------+---------+---------+-------+---------+--------------------
1 | node1 | primary | 100 | running | running | 71987 | no | n/a
2 | node2 | standby | 100 | running | running | 71996 | no | 1 second(s) ago
3 | node3 | standby | 100 | running | running | 72042 | no | 1 second(s) ago
</programlisting>
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
1 | node1 | primary | * running | | running | 96563 | no | n/a
2 | node2 | standby | running | node1 | running | 96572 | no | 1 second(s) ago
3 | node3 | standby | running | node1 | running | 96584 | no | 0 second(s) ago</programlisting>
</para>
<para>
<application>repmgrd</application> paused on all nodes (using <xref linkend="repmgr-daemon-pause">):
&repmgrd; paused on all nodes (using <xref linkend="repmgr-daemon-pause"/>):
<programlisting>$ repmgr -f /etc/repmgr.conf daemon status
ID | Name | Role | Priority | Status | repmgrd | PID | Paused? | Upstream last seen
----+-------+---------+----------+---------+---------+-------+---------+--------------------
1 | node1 | primary | 100 | running | running | 71987 | yes | n/a
2 | node2 | standby | 100 | running | running | 71996 | yes | 0 second(s) ago
3 | node3 | standby | 100 | running | running | 72042 | yes | 0 second(s) ago
</programlisting>
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
1 | node1 | primary | * running | | running | 96563 | yes | n/a
2 | node2 | standby | running | node1 | running | 96572 | yes | 1 second(s) ago
3 | node3 | standby | running | node1 | running | 96584 | yes | 0 second(s) ago</programlisting>
</para>
<para>
<application>repmgrd</application> not running on one node:
&repmgrd; not running on one node:
<programlisting>$ repmgr -f /etc/repmgr.conf daemon status
ID | Name | Role | Priority | Status | repmgrd | PID | Paused? | Upstream last seen
----+-------+---------+----------+---------+-------------+-------+---------+--------------------
1 | node1 | primary | 100 | running | running | 71987 | yes | n/a
2 | node2 | standby | 100 | running | not running | n/a | n/a | n/a
3 | node3 | standby | 100 | running | running | 72042 | yes | 0 second(s) ago</programlisting>
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+-------+---------+-----------+----------+-------------+-------+---------+--------------------
1 | node1 | primary | * running | | running | 96563 | yes | n/a
2 | node2 | standby | running | node1 | not running | n/a | n/a | n/a
3 | node3 | standby | running | node1 | running | 96584 | yes | 0 second(s) ago</programlisting>
</para>
</refsect1>
@@ -96,9 +94,9 @@
parsing by scripts, e.g.:
<programlisting>
$ repmgr -f /etc/repmgr.conf daemon status --csv
1,node1,primary,1,1,5722,1,100,-1
2,node2,standby,1,0,-1,1,100,1
3,node3,standby,1,1,5779,1,100,1</programlisting>
1,node1,primary,1,1,5722,1,100,-1,default
2,node2,standby,1,0,-1,1,100,1,default
3,node3,standby,1,1,5779,1,100,1,default</programlisting>
</para>
<para>
The columns have the following meanings:
@@ -129,25 +127,25 @@
<listitem>
<simpara>
<application>repmgrd</application> running (1 = running, 0 = not running, -1 = unknown)
&repmgrd; running (1 = running, 0 = not running, -1 = unknown)
</simpara>
</listitem>
<listitem>
<simpara>
<application>repmgrd</application> PID (-1 if not running or status unknown)
&repmgrd; PID (-1 if not running or status unknown)
</simpara>
</listitem>
<listitem>
<simpara>
<application>repmgrd</application> paused (1 = paused, 0 = not paused, -1 = unknown)
&repmgrd; paused (1 = paused, 0 = not paused, -1 = unknown)
</simpara>
</listitem>
<listitem>
<simpara>
<application>repmgrd</application> node priority
&repmgrd; node priority
</simpara>
</listitem>
@@ -157,9 +155,25 @@
</simpara>
</listitem>
<listitem>
<simpara>
node location
</simpara>
</listitem>
</itemizedlist>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--detail</option></term>
<listitem>
<para>
Display additional information (<literal>location</literal>, <literal>priority</literal>)
about the &repmgr; configuration.
</para>
</listitem>
</varlistentry>
<varlistentry>
@@ -175,12 +189,10 @@
</refsect1>
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-daemon-pause">, <xref linkend="repmgr-daemon-unpause">, <xref linkend="repmgr-cluster-show">
<xref linkend="repmgr-daemon-pause"/>, <xref linkend="repmgr-daemon-unpause"/>, <xref linkend="repmgr-cluster-show"/>
</para>
</refsect1>
</refentry>
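
The <option>--csv</option> output shown above is intended for parsing by scripts. As a minimal sketch, assuming the ten-column layout of the newer sample output, it could be read in Python as follows. The field names are hypothetical labels chosen for this illustration; the meanings of the fourth and ninth columns are not visible in this hunk and are assumptions.

```python
import csv
import io
from typing import List, NamedTuple


class NodeStatus(NamedTuple):
    """One row of `repmgr daemon status --csv` output (field names are illustrative)."""
    node_id: int
    name: str
    role: str
    postgres_running: int    # ASSUMPTION: column not described in this hunk
    repmgrd_running: int     # 1 = running, 0 = not running, -1 = unknown
    repmgrd_pid: int         # -1 if not running or status unknown
    repmgrd_paused: int      # 1 = paused, 0 = not paused, -1 = unknown
    priority: int
    upstream_last_seen: int  # ASSUMPTION: column not described in this hunk
    location: str


def parse_daemon_status_csv(text: str) -> List[NodeStatus]:
    """Parse the CSV text into a list of NodeStatus rows, skipping blank lines."""
    rows = []
    for fields in csv.reader(io.StringIO(text)):
        if not fields:
            continue
        (node_id, name, role, pg, running, pid,
         paused, priority, last_seen, location) = fields
        rows.append(NodeStatus(int(node_id), name, role, int(pg), int(running),
                               int(pid), int(paused), int(priority),
                               int(last_seen), location))
    return rows


# Sample taken from the documentation example above.
SAMPLE = """\
1,node1,primary,1,1,5722,1,100,-1,default
2,node2,standby,1,0,-1,1,100,1,default
3,node3,standby,1,1,5779,1,100,1,default
"""

if __name__ == "__main__":
    for node in parse_daemon_status_csv(SAMPLE):
        if node.repmgrd_running != 1:
            print(f"repmgrd is not running on {node.name}")
```

A script would typically obtain the text by capturing the command's standard output rather than using a hard-coded sample.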


@@ -14,25 +14,25 @@
<refnamediv>
<refname>repmgr daemon stop</refname>
<refpurpose>Stop the <application>repmgrd</application> daemon</refpurpose>
<refpurpose>Stop the &repmgrd; daemon</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
This command stops the <application>repmgrd</application> daemon on the
This command stops the &repmgrd; daemon on the
local node.
</para>
<para>
By default, &repmgr; will wait for up to 15 seconds to confirm that <application>repmgrd</application>
By default, &repmgr; will wait for up to 15 seconds to confirm that &repmgrd;
stopped. This behaviour can be overridden by specifying a different value using the <option>--wait</option>
option, or disabled altogether with the <option>--no-wait</option> option.
</para>
<note>
<para>
If PostgreSQL is not running on the local node, under some circumstances &repmgr; may not
be able to confirm if <application>repmgrd</application> has actually stopped.
be able to confirm if &repmgrd; has actually stopped.
</para>
</note>
@@ -40,7 +40,7 @@
<para>
The <filename>repmgr.conf</filename> parameter <varname>repmgrd_service_stop_command</varname>
must be set for <command>repmgr daemon stop</command> to work; see section
<xref linkend="repmgr-daemon-stop-configuration"> for details.
<xref linkend="repmgr-daemon-stop-configuration"/> for details.
</para>
</important>
</refsect1>
@@ -50,7 +50,7 @@
<para>
<command>repmgr daemon stop</command> will execute the command defined by the
<varname>repmgrd_service_stop_command</varname> parameter in <filename>repmgr.conf</filename>.
This must be set to a shell command which will stop <application>repmgrd</application>;
This must be set to a shell command which will stop &repmgrd;;
if &repmgr; was installed from a package, this will be the service command defined by the
package. For more details see <link linkend="appendix-packages">Appendix: &repmgr; package details</link>.
</para>
@@ -59,7 +59,7 @@
<para>
If &repmgr; was installed from a system package, and you do not configure
<varname>repmgrd_service_stop_command</varname> to an appropriate service command, this may
result in the system becoming confused about the state of the <application>repmgrd</application>
result in the system becoming confused about the state of the &repmgrd;
service; this is particularly the case with <literal>systemd</literal>.
</para>
</important>
@@ -76,7 +76,7 @@
<term><option>--dry-run</option></term>
<listitem>
<para>
Check prerequisites but don't actually attempt to stop <application>repmgrd</application>.
Check prerequisites but don't actually attempt to stop &repmgrd;.
</para>
<para>
This action will output the command which would be executed.
@@ -89,7 +89,7 @@
<term><option>--wait</option></term>
<listitem>
<para>
Wait for the specified number of seconds to confirm that <application>repmgrd</application>
Wait for the specified number of seconds to confirm that &repmgrd;
stopped successfully.
</para>
<para>
@@ -103,7 +103,7 @@
<term><option>--no-wait</option></term>
<listitem>
<para>
Don't wait to confirm that <application>repmgrd</application>
Don't wait to confirm that &repmgrd;
stopped successfully.
</para>
<para>
@@ -124,17 +124,18 @@
<variablelist>
<varlistentry>
<indexterm>
<primary>repmgrd_service_stop_command</primary>
<secondary>with &quot;repmgr daemon stop&quot;</secondary>
</indexterm>
<term><option>repmgrd_service_stop_command</option></term>
<listitem>
<indexterm>
<primary>repmgrd_service_stop_command</primary>
<secondary>with &quot;repmgr daemon stop&quot;</secondary>
</indexterm>
<para>
<command>repmgr daemon stop</command> will execute the command defined by the
<varname>repmgrd_service_stop_command</varname> parameter in <filename>repmgr.conf</filename>.
This must be set to a shell command which will stop <application>repmgrd</application>;
This must be set to a shell command which will stop &repmgrd;;
if &repmgr; was installed from a package, this will be the service command defined by the
package. For more details see <link linkend="appendix-packages">Appendix: &repmgr; package details</link>.
</para>
@@ -142,7 +143,7 @@
<para>
If &repmgr; was installed from a system package, and you do not configure
<varname>repmgrd_service_stop_command</varname> to an appropriate service command, this may
result in the system becoming confused about the state of the <application>repmgrd</application>
result in the system becoming confused about the state of the &repmgrd;
service; this is particularly the case with <literal>systemd</literal>.
</para>
</important>
@@ -163,7 +164,7 @@
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
<application>repmgrd</application> could be stopped.
&repmgrd; could be stopped.
</para>
</listitem>
</varlistentry>
@@ -182,7 +183,7 @@
<term><option>ERR_REPMGRD_SERVICE (27)</option></term>
<listitem>
<para>
<application>repmgrd</application> could not be stopped.
&repmgrd; could not be stopped.
</para>
</listitem>
</varlistentry>
@@ -193,7 +194,7 @@
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-daemon-start">, <xref linkend="repmgr-daemon-status">, <xref linkend="repmgrd-daemon">
<xref linkend="repmgr-daemon-start"/>, <xref linkend="repmgr-daemon-status"/>, <xref linkend="repmgrd-daemon"/>
</para>
</refsect1>


@@ -15,22 +15,22 @@
<refnamediv>
<refname>repmgr daemon unpause</refname>
<refpurpose>Instruct all <application>repmgrd</application> instances in the replication cluster to resume failover operations</refpurpose>
<refpurpose>Instruct all &repmgrd; instances in the replication cluster to resume failover operations</refpurpose>
</refnamediv>
<refsect1>
<title>Description</title>
<para>
This command can be run on any active node in the replication cluster to instruct all
running <application>repmgrd</application> instances to &quot;unpause&quot;
(following a previous execution of <xref linkend="repmgr-daemon-pause">)
running &repmgrd; instances to &quot;unpause&quot;
(following a previous execution of <xref linkend="repmgr-daemon-pause"/>)
and resume normal failover/monitoring operation.
</para>
<note>
<para>
It's important to wait a few seconds after restarting PostgreSQL on any node before running
<command>repmgr daemon unpause</command>, as the <application>repmgrd</application> instance
<command>repmgr daemon unpause</command>, as the &repmgrd; instance
on the restarted node will take a second or two before it has updated its status.
</para>
</note>
@@ -64,7 +64,7 @@ NOTICE: node 3 (node3) unpaused</programlisting>
<term><option>--dry-run</option></term>
<listitem>
<para>
Check if nodes are reachable but don't unpause <application>repmgrd</application>.
Check if nodes are reachable but don't unpause &repmgrd;.
</para>
</listitem>
</varlistentry>
@@ -82,7 +82,7 @@ NOTICE: node 3 (node3) unpaused</programlisting>
<term><option>SUCCESS (0)</option></term>
<listitem>
<para>
<application>repmgrd</application> could be unpaused on all nodes.
&repmgrd; could be unpaused on all nodes.
</para>
</listitem>
</varlistentry>
@@ -91,7 +91,7 @@ NOTICE: node 3 (node3) unpaused</programlisting>
<term><option>ERR_REPMGRD_PAUSE (26)</option></term>
<listitem>
<para>
<application>repmgrd</application> could not be unpaused on one or more nodes.
&repmgrd; could not be unpaused on one or more nodes.
</para>
</listitem>
</varlistentry>
@@ -102,7 +102,7 @@ NOTICE: node 3 (node3) unpaused</programlisting>
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-daemon-pause">, <xref linkend="repmgr-daemon-status">
<xref linkend="repmgr-daemon-pause"/>, <xref linkend="repmgr-daemon-status"/>
</para>
</refsect1>
</refentry>


@@ -203,7 +203,7 @@
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-node-status">, <xref linkend="repmgr-cluster-show">
<xref linkend="repmgr-node-status"/>, <xref linkend="repmgr-cluster-show"/>
</para>
</refsect1>


@@ -26,10 +26,10 @@
<tip>
<para>
If the node is running and needs to be attached to the current primary, use
<xref linkend="repmgr-standby-follow">.
<xref linkend="repmgr-standby-follow"/>.
</para>
<para>
Note <xref linkend="repmgr-standby-follow"> can only be used for standbys which have not diverged
Note <xref linkend="repmgr-standby-follow"/> can only be used for standbys which have not diverged
from the rest of the cluster.
</para>
</tip>
@@ -230,12 +230,13 @@
<refsect1 id="repmgr-node-rejoin-pg-rewind" xreflabel="Using pg_rewind">
<title>Using <command>pg_rewind</command></title>
<indexterm>
<primary>pg_rewind</primary>
<secondary>using with "repmgr node rejoin"</secondary>
</indexterm>
<title>Using <command>pg_rewind</command></title>
<para>
<command>repmgr node rejoin</command> can optionally use <command>pg_rewind</command> to re-integrate a
node which has diverged from the rest of the cluster, typically a failed primary.
@@ -321,7 +322,7 @@
If <option>--force-rewind</option> is used with the <option>--dry-run</option> option,
this checks the prerequisites for using <application>pg_rewind</application>, but is
not an absolute guarantee that actually executing <application>pg_rewind</application>
will succeed. See also section <xref linkend="repmgr-node-rejoin-caveats"> below.
will succeed. See also section <xref linkend="repmgr-node-rejoin-caveats"/> below.
</para>
</note>
@@ -344,12 +345,13 @@
<refsect1 id="repmgr-node-rejoin-caveats" xreflabel="Caveats">
<indexterm>
<primary>repmgr node rejoin</primary>
<secondary>caveats</secondary>
</indexterm>
<title>Caveats when using <command>repmgr node rejoin</command></title>
<indexterm>
<primary>repmgr node rejoin</primary>
<secondary>caveats</secondary>
</indexterm>
<para>
<command>repmgr node rejoin</command> attempts to determine whether it will succeed by
comparing the timelines and relative WAL positions of the local node (rejoin candidate) and primary
@@ -381,7 +383,7 @@
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-standby-follow">
<xref linkend="repmgr-standby-follow"/>
</para>
</refsect1>
</refentry>


@@ -84,7 +84,7 @@
<refsect1>
<title>See also</title>
<para>
See <xref linkend="repmgr-node-check"> to diagnose issues and <xref linkend="repmgr-cluster-show">
See <xref linkend="repmgr-node-check"/> to diagnose issues and <xref linkend="repmgr-cluster-show"/>
for an overview of all nodes in the cluster.
</para>
</refsect1>


@@ -38,23 +38,25 @@
Execute with the <option>--dry-run</option> option to check what would happen without
actually registering the primary.
</para>
<para>
<command>repmgr master register</command> can be used as an alias for
<command>repmgr primary register</command>.
</para>
<note>
<para>
If providing the configuration file location with <option>-f/--config-file</option>,
avoid using a relative path, as &repmgr; stores the configuration file location
in the repmgr metadata for use when &repmgr; is executed remotely (e.g. during
<xref linkend="repmgr-standby-switchover">). &repmgr; will attempt to convert
<xref linkend="repmgr-standby-switchover"/>). &repmgr; will attempt to convert
a relative path into an absolute one, but this may not be the same as the path you
would explicitly provide (e.g. <filename>./repmgr.conf</filename> might be converted
to <filename>/path/to/./repmgr.conf</filename>, whereas you'd normally write
<filename>/path/to/repmgr.conf</filename>).
</para>
</note>
<para>
<command>repmgr master register</command> can be used as an alias for
<command>repmgr primary register</command>.
</para>
</refsect1>
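
The note above about relative configuration file paths can be illustrated with a small Python sketch. It only demonstrates why a joined path and a normalised path can differ as strings; it is not a description of how &repmgr; itself performs the conversion, and the directory <filename>/path/to</filename> is the placeholder used in the text.

```python
import posixpath

# Placeholder working directory, as used in the note above.
working_dir = "/path/to"
relative = "./repmgr.conf"

# Simply prefixing the working directory keeps the "./" component.
joined = posixpath.join(working_dir, relative)

# Normalising the result removes the redundant "." component.
normalised = posixpath.normpath(joined)

if __name__ == "__main__":
    print(joined)      # /path/to/./repmgr.conf
    print(normalised)  # /path/to/repmgr.conf
```

Both strings refer to the same file, but they do not compare equal as text, which is why providing an absolute path explicitly is the safer choice.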
<refsect1>


@@ -85,12 +85,13 @@
</refsect1>
<refsect1 id="repmgr-standby-clone-recovery-conf">
<indexterm>
<title>Customising recovery.conf</title>
<indexterm>
<primary>recovery.conf</primary>
<secondary>customising with &quot;repmgr standby clone&quot;</secondary>
</indexterm>
</indexterm>
<title>Customising recovery.conf</title>
<para>
By default, &repmgr; will create a minimal <filename>recovery.conf</filename>
containing the following parameters:
@@ -142,7 +143,7 @@
We recommend using <ulink url="https://www.pgbarman.org/">Barman</ulink> to manage
WAL file archiving. For more details on combining &repmgr; and <application>Barman</application>,
in particular using <varname>restore_command</varname> to configure Barman as a backup source of
WAL files, see <xref linkend="cloning-from-barman">.
WAL files, see <xref linkend="cloning-from-barman"/>.
</para>
</note>
@@ -154,7 +155,7 @@
When initially cloning a standby, you will need to ensure
that all required WAL files remain available while the cloning is taking
place. To ensure this happens when using the default <command>pg_basebackup</command> method,
&repmgr; will set <command>pg_basebackup</command>'s <literal>--xlog-method</literal>
&repmgr; will set <command>pg_basebackup</command>'s <literal>--wal-method</literal>
parameter to <literal>stream</literal>,
which will ensure all WAL files generated during the cloning process are
streamed in parallel with the main backup. Note that this requires two
@@ -164,10 +165,10 @@
</para>
<para>
To override this behaviour, in <filename>repmgr.conf</filename> set
<command>pg_basebackup</command>'s <literal>--xlog-method</literal>
<command>pg_basebackup</command>'s <literal>--wal-method</literal>
parameter to <literal>fetch</literal>:
<programlisting>
pg_basebackup_options='--xlog-method=fetch'</programlisting>
pg_basebackup_options='--wal-method=fetch'</programlisting>
and ensure that <literal>wal_keep_segments</literal> is set to an appropriately high value.
See the <ulink url="https://www.postgresql.org/docs/current/app-pgbasebackup.html">
@@ -176,9 +177,8 @@
<note>
<simpara>
From PostgreSQL 10, <command>pg_basebackup</command>'s
<literal>--xlog-method</literal> parameter has been renamed to
<literal>--wal-method</literal>.
If using PostgreSQL 9.6 or earlier, replace <literal>--wal-method</literal>
with <literal>--xlog-method</literal>.
</simpara>
</note>
</refsect1>
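
Because the option was renamed in PostgreSQL 10, a script that generates <varname>pg_basebackup_options</varname> for mixed-version environments has to pick the right name. The following is a minimal sketch of that choice; the function name is hypothetical, and it assumes the caller passes the integer server version number in the form PostgreSQL reports it (e.g. 90624 for 9.6.24, 110005 for 11.5).

```python
def wal_method_option(server_version_num: int, method: str = "stream") -> str:
    """Return the pg_basebackup WAL method option for the given server version.

    PostgreSQL 10 and later use --wal-method; 9.6 and earlier use --xlog-method.
    """
    name = "--wal-method" if server_version_num >= 100000 else "--xlog-method"
    return f"{name}={method}"


if __name__ == "__main__":
    # Value suitable for pg_basebackup_options in repmgr.conf
    print(wal_method_option(110005, "fetch"))
    print(wal_method_option(90624, "fetch"))
```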
@@ -186,12 +186,13 @@
<refsect1 id="repmgr-standby-create-recovery-conf">
<title>Using a standby cloned by another method</title>
<indexterm>
<primary>recovery.conf</primary>
<secondary>generating for a standby cloned by another method</secondary>
</indexterm>
<title>Using a standby cloned by another method</title>
<para>
&repmgr; supports standbys cloned by another method (e.g. using <application>barman</application>'s
<command><ulink url="http://docs.pgbarman.org/release/2.5/#recover">barman recover</ulink></command> command).
@@ -296,7 +297,7 @@
<term><option> --recovery-conf-only</option></term>
<listitem>
<para>
Create <filename>recovery.conf</filename> file for a previously cloned instance. &repmgr 4.0.4 and later.
Create <filename>recovery.conf</filename> file for a previously cloned instance. &repmgr; 4.0.4 and later.
</para>
</listitem>
</varlistentry>
@@ -325,9 +326,13 @@
<term><option>--upstream-conninfo</option></term>
<listitem>
<para>
<literal>primary_conninfo</literal> value to write in recovery.conf
<literal>primary_conninfo</literal> value to write in <filename>recovery.conf</filename>
when the intended upstream server does not yet exist.
</para>
<para>
Note that &repmgr; may modify the provided value, in particular to set the
correct <literal>application_name</literal>.
</para>
</listitem>
</varlistentry>
@@ -361,7 +366,7 @@
<refsect1>
<title>See also</title>
<para>
See <xref linkend="cloning-standbys"> for details about various aspects of cloning.
See <xref linkend="cloning-standbys"/> for details about various aspects of cloning.
</para>
</refsect1>
</refentry>


@@ -41,7 +41,7 @@
<tip>
<para>
To re-add an inactive node to the replication cluster, use
<xref linkend="repmgr-node-rejoin">.
<xref linkend="repmgr-node-rejoin"/>.
</para>
</tip>
@@ -122,7 +122,7 @@
If not provided, &repmgr; will attempt to follow the current primary node.
</para>
<para>
Note that when using <application>repmgrd</application>, <option>--upstream-node-id</option>
Note that when using &repmgrd;, <option>--upstream-node-id</option>
should always be configured;
see <link linkend="repmgrd-automatic-failover-configuration">Automatic failover configuration</link>
for details.
@@ -252,7 +252,7 @@ DETAIL: follow target server's timeline 2 forked off current database system tim
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-node-rejoin">
<xref linkend="repmgr-node-rejoin"/>
</para>
</refsect1>
</refentry>


@@ -17,18 +17,50 @@
<para>
Promotes a standby to a primary if the current primary has failed. This
command requires a valid <filename>repmgr.conf</filename> file for the standby, either
specified explicitly with <literal>-f/--config-file</literal> or located in a
specified explicitly with <literal>-f/--config-file</literal> or located in a
default location; no additional arguments are required.
</para>
<important>
<para>
If &repmgrd; is active, you must execute
<command><link linkend="repmgr-daemon-pause">repmgr daemon pause</link></command>
to temporarily disable &repmgrd; while making any changes
to the replication cluster.
</para>
</important>
<para>
If the standby promotion succeeds, the server will not need to be
restarted. However any other standbys will need to follow the new server,
by using <xref linkend="repmgr-standby-follow">; if <application>repmgrd</application>
is active, it will handle this automatically.
restarted. However any other standbys will need to follow the new primary,
and will need to be restarted to do this.
</para>
<para>
Note that &repmgr; will wait for up to <varname>promote_check_timeout</varname> seconds
(default: 60 seconds) to verify that the standby has been promoted, and will
Beginning with <link linkend="release-4.4">repmgr 4.4</link>,
the option <option>--siblings-follow</option> can be used to have
all other standbys (and a witness server, if in use)
follow the new primary.
</para>
<note>
<para>
If using &repmgrd;, when invoking
<command>repmgr standby promote</command> (either directly via
the <option>promote_command</option>, or in a script called
via <option>promote_command</option>), <option>--siblings-follow</option>
<emphasis>must not</emphasis> be included as a
command line option for <command>repmgr standby promote</command>.
</para>
</note>
<para>
In <link linkend="release-4.3">repmgr 4.3</link> and earlier,
<command><link linkend="repmgr-standby-follow">repmgr standby follow</link></command>
must be executed on each standby individually.
</para>
<para>
&repmgr; will wait for up to <varname>promote_check_timeout</varname> seconds
(default: <literal>60</literal>) to verify that the standby has been promoted, and will
check the promotion every <varname>promote_check_interval</varname> seconds (default: 1 second).
Both values can be defined in <filename>repmgr.conf</filename>.
</para>
@@ -72,13 +104,36 @@
<title>Options</title>
<variablelist>
<varlistentry>
<term><option>--dry-run</option></term>
<listitem>
<para>
Check if this node can be promoted, but don't carry out the promotion
Check if this node can be promoted, but don't carry out the promotion.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--siblings-follow</option></term>
<listitem>
<para>
Have all sibling nodes (nodes formerly attached to the same upstream
node as the promotion candidate) follow this node after it has been promoted.
</para>
<para>
Note that a witness server, if in use, is also
counted as a &quot;sibling node&quot; as it needs to be instructed to
synchronise its metadata with the new primary.
</para>
<important>
<para>
Do <emphasis>not</emphasis> provide this option when configuring
&repmgrd;'s <option>promote_command</option>.
</para>
</important>
</listitem>
</varlistentry>
</variablelist>
</refsect1>
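
The interaction between <varname>promote_check_timeout</varname> and <varname>promote_check_interval</varname> described above amounts to a poll-until-deadline loop. The sketch below illustrates that general pattern only; it is not &repmgr;'s implementation, and the function and parameter names are hypothetical. The defaults mirror the documented values of 60 seconds and 1 second.

```python
import time
from typing import Callable


def wait_for_promotion(is_promoted: Callable[[], bool],
                       timeout: float = 60,
                       interval: float = 1,
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll is_promoted() every `interval` seconds for up to `timeout` seconds.

    Returns True as soon as the check succeeds, or False once the deadline
    has passed without a successful check.
    """
    deadline = clock() + timeout
    while True:
        if is_promoted():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


if __name__ == "__main__":
    # Stand-in check which succeeds on the third attempt.
    attempts = []

    def fake_check() -> bool:
        attempts.append(1)
        return len(attempts) >= 3

    print(wait_for_promotion(fake_check, timeout=5, interval=0.1))
```

The check is always attempted at least once, so a timeout of zero still performs a single verification before giving up.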


@@ -17,7 +17,7 @@
<para>
<command>repmgr standby register</command> adds a standby's information to
the &repmgr; metadata. This command needs to be executed to enable
promote/follow operations and to allow <application>repmgrd</application> to work with the node.
promote/follow operations and to allow &repmgrd; to work with the node.
An existing standby can be registered using this command. Execute with the
<literal>--dry-run</literal> option to check what would happen without actually registering the
standby.
@@ -28,7 +28,7 @@
If providing the configuration file location with <literal>-f/--config-file</literal>,
avoid using a relative path, as &repmgr; stores the configuration file location
in the repmgr metadata for use when &repmgr; is executed remotely (e.g. during
<xref linkend="repmgr-standby-switchover">). &repmgr; will attempt to convert the
<xref linkend="repmgr-standby-switchover"/>). &repmgr; will attempt to convert the
relative path into an absolute one, but this may not be the same as the path you
would explicitly provide (e.g. <filename>./repmgr.conf</filename> might be converted
to <filename>/path/to/./repmgr.conf</filename>, whereas you'd normally write
@@ -59,7 +59,7 @@
<para>
Depending on your environment and workload, it may take some time for the standby's node record
to propagate from the primary to the standby. Some actions (such as starting
<application>repmgrd</application>) require that the standby's node record
&repmgrd;) require that the standby's node record
is present and up-to-date to function correctly.
</para>
<para>

@@ -22,11 +22,13 @@
passwordless SSH connection to the current primary.
</para>
<para>
If other standbys are connected to the demotion candidate, &repmgr; can instruct
If other nodes are connected to the demotion candidate, &repmgr; can instruct
these to follow the new primary if the option <literal>--siblings-follow</literal>
is specified. This requires a passwordless SSH connection between the promotion
candidate (new primary) and the standbys attached to the demotion candidate
(existing primary).
candidate (new primary) and the nodes attached to the demotion candidate
(existing primary). Note that a witness server, if in use, is also
counted as a &quot;sibling node&quot; as it needs to be instructed to
synchronise its metadata with the new primary.
</para>
<note>
<para>
@@ -42,18 +44,18 @@
</note>
<para>
For more details on performing a switchover, including preparation and configuration,
see section <xref linkend="performing-switchover">.
see section <xref linkend="performing-switchover"/>.
</para>
<note>
<para>
From <link linkend="release-4.2">repmgr 4.2</link>, &repmgr; will instruct any running
<application>repmgrd</application> instances to pause operations while the switchover
is being carried out, to prevent <application>repmgrd</application> from
unintentionally promoting a node. For more details, see <xref linkend="repmgrd-pausing">.
&repmgrd; instances to pause operations while the switchover
is being carried out, to prevent &repmgrd; from
unintentionally promoting a node. For more details, see <xref linkend="repmgrd-pausing"/>.
</para>
<para>
Users of &repmgr; versions prior to 4.2 should ensure that <application>repmgrd</application>
Users of &repmgr; versions prior to 4.2 should ensure that &repmgrd;
is not running on any nodes while a switchover is being executed.
</para>
</note>
@@ -115,7 +117,7 @@
(and the prerequisites for using <application>pg_rewind</application> are met).
If using PostgreSQL 9.3 or 9.4, and the <application>pg_rewind</application>
binary is not installed in the PostgreSQL <filename>bin</filename> directory,
provide its full path. For more details see also <xref linkend="switchover-pg-rewind">.
provide its full path. For more details see also <xref linkend="switchover-pg-rewind"/>.
</para>
</listitem>
</varlistentry>
@@ -134,12 +136,30 @@
<term><option>--repmgrd-no-pause</option></term>
<listitem>
<para>
Don't pause <application>repmgrd</application> while executing a switchover.
Don't pause &repmgrd; while executing a switchover.
</para>
<para>
This option should not be used unless you take steps by other means
to ensure <application>repmgrd</application> is paused or not
to ensure &repmgrd; is paused or not
running on all nodes.
</para>
<para>
This option cannot be used together with <option>--repmgrd-force-unpause</option>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>--repmgrd-force-unpause</option></term>
<listitem>
<para>
Always unpause all &repmgrd; instances after executing a switchover. This will ensure that
any &repmgrd; instances which were paused before the switchover will be
unpaused.
</para>
<para>
This option cannot be used together with <option>--repmgrd-no-pause</option>.
</para>
</listitem>
</varlistentry>
@@ -150,8 +170,18 @@
<term><option>--siblings-follow</option></term>
<listitem>
<para>
Have standbys attached to the old primary follow the new primary.
Have nodes attached to the old primary follow the new primary.
</para>
<para>
This will also ensure that a witness node, if in use, is updated
with the new primary's data.
</para>
<note>
<para>
In a future &repmgr; release, <option>--siblings-follow</option> will be applied
by default.
</para>
</note>
</listitem>
</varlistentry>
</variablelist>
@@ -169,13 +199,14 @@
<variablelist>
<varlistentry>
<indexterm>
<primary>replication_lag_critical</primary>
<secondary>with &quot;repmgr standby switchover&quot;</secondary>
</indexterm>
<term><option>replication_lag_critical</option></term>
<listitem>
<indexterm>
<primary>replication_lag_critical</primary>
<secondary>with &quot;repmgr standby switchover&quot;</secondary>
</indexterm>
<para>
If replication lag (in seconds) on the standby exceeds this value, the
switchover will be aborted (unless the <literal>-F/--force</literal> option
@@ -185,13 +216,14 @@
</varlistentry>
<varlistentry>
<indexterm>
<primary>shutdown_check_timeout</primary>
<secondary>with &quot;repmgr standby switchover&quot;</secondary>
</indexterm>
<term><option>shutdown_check_timeout</option></term>
<listitem>
<indexterm>
<primary>shutdown_check_timeout</primary>
<secondary>with &quot;repmgr standby switchover&quot;</secondary>
</indexterm>
<para>
The maximum number of seconds to wait for the
demotion candidate (current primary) to shut down, before aborting the switchover.
@@ -213,13 +245,13 @@
<varlistentry>
<indexterm>
<primary>wal_receive_check_timeout</primary>
<secondary>with &quot;repmgr standby switchover&quot;</secondary>
</indexterm>
<term><option>wal_receive_check_timeout</option></term>
<listitem>
<indexterm>
<primary>wal_receive_check_timeout</primary>
<secondary>with &quot;repmgr standby switchover&quot;</secondary>
</indexterm>
<para>
After the primary has shut down, the maximum number of seconds to wait for the
walreceiver on the standby to flush WAL to disk before comparing WAL receive location
@@ -230,13 +262,14 @@
<varlistentry>
<indexterm>
<primary>standby_reconnect_timeout</primary>
<secondary>with &quot;repmgr standby switchover&quot;</secondary>
</indexterm>
<term><option>standby_reconnect_timeout</option></term>
<listitem>
<indexterm>
<primary>standby_reconnect_timeout</primary>
<secondary>with &quot;repmgr standby switchover&quot;</secondary>
</indexterm>
<para>
The maximum number of seconds to attempt to wait for the demotion candidate (former primary)
to reconnect to the promoted primary (default: 60 seconds)
@@ -249,14 +282,16 @@
</listitem>
</varlistentry>
<varlistentry>
<indexterm>
<primary>node_rejoin_timeout</primary>
<secondary>with &quot;repmgr standby switchover&quot;</secondary>
</indexterm>
<varlistentry>
<term><option>node_rejoin_timeout</option></term>
<listitem>
<indexterm>
<primary>node_rejoin_timeout</primary>
<secondary>with &quot;repmgr standby switchover&quot;</secondary>
</indexterm>
<para>
The maximum number of seconds to attempt to wait for the demotion candidate (former primary)
to reconnect to the promoted primary (default: 60 seconds)
@@ -350,10 +385,10 @@
<refsect1>
<title>See also</title>
<para>
<xref linkend="repmgr-standby-follow">, <xref linkend="repmgr-node-rejoin">
<xref linkend="repmgr-standby-follow"/>, <xref linkend="repmgr-node-rejoin"/>
</para>
<para>
For more details on performing a switchover operation, see the section <xref linkend="performing-switchover">.
For more details on performing a switchover operation, see the section <xref linkend="performing-switchover"/>.
</para>
</refsect1>

@@ -20,7 +20,7 @@
record to the &repmgr; metadata, and if necessary initialises the witness
node by installing the &repmgr; extension and copying the &repmgr; metadata
to the witness server. This command needs to be executed to enable
use of the witness server with <application>repmgrd</application>.
use of the witness server with &repmgrd;.
</para>
<para>
When executing <command>repmgr witness register</command>, database connection

@@ -1,14 +1,16 @@
<!-- doc/src/sgml/postgres.sgml -->
<!-- doc/repmgr.xml -->
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V4.2//EN" [
<!ENTITY % version SYSTEM "version.sgml">
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd"
[
<!ENTITY % version SYSTEM "version.xml">
%version;
<!ENTITY % filelist SYSTEM "filelist.sgml">
<!ENTITY % filelist SYSTEM "filelist.xml">
%filelist;
<!ENTITY repmgr "<productname>repmgr</productname>">
<!ENTITY repmgrd "<productname>repmgrd</productname>">
<!ENTITY postgres "<productname>PostgreSQL</productname>">
]>
@@ -25,13 +27,19 @@
<para>
This is the official documentation of &repmgr; &repmgrversion; for
use with PostgreSQL 9.3 - PostgreSQL 11.
It describes the functionality supported by the current version of &repmgr;.
</para>
<para>
&repmgr; is being continually developed and we strongly recommend using the
latest version. Please check the
<ulink url="https://repmgr.org/">repmgr website</ulink> for details
about the current &repmgr; version as well as the
<ulink url="https://repmgr.org/docs/current/index.html">current repmgr documentation</ulink>.
</para>
<para>
&repmgr; is developed by
<ulink url="https://2ndquadrant.com">2ndQuadrant</ulink>
along with contributions from other individuals and companies.
along with contributions from other individuals and organisations.
Contributions from the community are appreciated and welcome - get
in touch via <ulink url="https://github.com/2ndQuadrant/repmgr">github</ulink>
or <ulink url="https://groups.google.com/group/repmgr">the mailing list/forum</ulink>.
@@ -43,7 +51,7 @@
&repmgr; is fully supported by 2ndQuadrant's
<ulink url="https://www.2ndquadrant.com/en/support/support-postgresql/">24/7 Production Support</ulink>.
2ndQuadrant, a Major Sponsor of the PostgreSQL project, continues to develop and maintain &repmgr;.
Other companies as well as individual developers are welcome to participate in the efforts.
Other organisations as well as individual developers are welcome to participate in the efforts.
</para>
</abstract>
@@ -73,7 +81,6 @@
&promoting-standby;
&follow-new-primary;
&switchover;
&configuring-witness-server;
&event-notifications;
&upgrading-repmgr;
</part>
@@ -122,7 +129,6 @@
&appendix-packages;
&appendix-support;
<![%include-index;[&bookindex;]]>
<![%include-xslt-index;[<index id="bookindex"></index>]]>
<index id="bookindex"></index>
</book>

@@ -1,246 +0,0 @@
<chapter id="repmgrd-automatic-failover" xreflabel="Automatic failover with repmgrd">
<indexterm>
<primary>repmgrd</primary>
<secondary>automatic failover</secondary>
</indexterm>
<title>Automatic failover with repmgrd</title>
<para>
<application>repmgrd</application> is a management and monitoring daemon which runs
on each node in a replication cluster. It can automate actions such as
failover and updating standbys to follow the new primary, as well as
providing monitoring information about the state of each standby.
</para>
<sect1 id="repmgrd-witness-server" xreflabel="Using a witness server with repmgrd">
<indexterm>
<primary>repmgrd</primary>
<secondary>witness server</secondary>
</indexterm>
<indexterm>
<primary>witness server</primary>
<secondary>repmgrd</secondary>
</indexterm>
<title>Using a witness server with repmgrd</title>
<para>
In a situation caused e.g. by a network interruption between two
data centres, it's important to avoid a &quot;split-brain&quot; situation where
both sides of the network assume they are the active segment and the
side without an active primary unilaterally promotes one of its standbys.
</para>
<para>
To prevent this situation happening, it's essential to ensure that one
network segment has a &quot;voting majority&quot;, so other segments will know
they're in the minority and not attempt to promote a new primary. Where
an odd number of servers exists, this is not an issue. However, if each
network has an even number of nodes, it's necessary to provide some way
of ensuring a majority, which is where the witness server becomes useful.
</para>
<para>
This is not a fully-fledged standby node and is not integrated into
replication, but it effectively represents the &quot;casting vote&quot; when
deciding which network segment has a majority. A witness server can
be set up using <link linkend="repmgr-witness-register"><command>repmgr witness register</command></link>;
see also section <link linkend="using-witness-server">Using a witness server</link>.
</para>
<note>
<para>
It only
makes sense to create a witness server in conjunction with running
<application>repmgrd</application>; the witness server will require its own
<application>repmgrd</application> instance.
</para>
</note>
</sect1>
<sect1 id="repmgrd-network-split" xreflabel="Handling network splits with repmgrd">
<indexterm>
<primary>repmgrd</primary>
<secondary>network splits</secondary>
</indexterm>
<indexterm>
<primary>network splits</primary>
</indexterm>
<title>Handling network splits with repmgrd</title>
<para>
A common pattern for replication cluster setups is to spread servers over
more than one datacentre. This can provide benefits such as geographically
distributed read replicas and disaster recovery (DR) capability. However
this also means there is a risk of disconnection at network level between
datacentre locations, which would result in a split-brain scenario if
servers in a secondary data centre were no longer able to see the primary
in the main data centre and promoted a standby among themselves.
</para>
<para>
&repmgr; enables provision of &quot;<xref linkend="witness-server">&quot; to
artificially create a quorum of servers in a particular location, ensuring
that nodes in another location will not elect a new primary if they
are unable to see the majority of nodes. However this approach does not
scale well, particularly with more complex replication setups, e.g.
where the majority of nodes are located outside of the primary datacentre.
It also means the <literal>witness</literal> node needs to be managed as an
extra PostgreSQL instance outside of the main replication cluster, which
adds administrative and programming complexity.
</para>
<para>
<literal>repmgr4</literal> introduces the concept of <literal>location</literal>:
each node is associated with an arbitrary location string (default is
<literal>default</literal>); this is set in <filename>repmgr.conf</filename>, e.g.:
<programlisting>
node_id=1
node_name=node1
conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/data'
location='dc1'</programlisting>
</para>
<para>
In a failover situation, <application>repmgrd</application> will check if any servers in the
same location as the current primary node are visible. If not, <application>repmgrd</application>
will assume a network interruption and not promote any node in any
other location (it will however enter <link linkend="repmgrd-degraded-monitoring">degraded monitoring</link>
mode until a primary becomes visible).
</para>
</sect1>
<sect1 id="repmgrd-standby-disconnection-on-failover" xreflabel="Standby disconnection on failover">
<indexterm>
<primary>repmgrd</primary>
<secondary>standby disconnection on failover</secondary>
</indexterm>
<indexterm>
<primary>standby disconnection on failover</primary>
</indexterm>
<title>Standby disconnection on failover</title>
<para>
If <option>standby_disconnect_on_failover</option> is set to <literal>true</literal> in
<filename>repmgr.conf</filename>, in a failover situation <application>repmgrd</application> will forcibly disconnect
the local node's WAL receiver before making a failover decision.
</para>
<note>
<para>
<option>standby_disconnect_on_failover</option> is available with PostgreSQL 9.5 and later.
Additionally this requires that the <literal>repmgr</literal> database user is a superuser.
</para>
</note>
<para>
By doing this, it's possible to ensure that, at the point the failover decision is made, no nodes
are receiving data from the primary and their LSN location will be static.
</para>
<important>
<para>
<option>standby_disconnect_on_failover</option> <emphasis>must</emphasis> be set to the same value on
all nodes.
</para>
</important>
<para>
Note that when using <option>standby_disconnect_on_failover</option> there will be a delay of 5 seconds
plus however many seconds it takes to confirm the WAL receiver is disconnected before
<application>repmgrd</application> proceeds with the failover decision.
</para>
<para>
Following the failover operation, no matter what the outcome, each node will reconnect its WAL receiver.
</para>
<para>
If using <option>standby_disconnect_on_failover</option>, we recommend that the
<option>primary_visibility_consensus</option> option is also used.
</para>
</sect1>
<sect1 id="repmgrd-failover-validation" xreflabel="Failover validation">
<indexterm>
<primary>repmgrd</primary>
<secondary>failover validation</secondary>
</indexterm>
<indexterm>
<primary>failover validation</primary>
</indexterm>
<title>Failover validation</title>
<para>
From <link linkend="release-4.3">repmgr 4.3</link>, &repmgr; makes it possible to provide a script
to <application>repmgrd</application> which, in a failover situation,
will be executed by the promotion candidate (the node which has been selected
to be the new primary) to confirm whether the node should actually be promoted.
</para>
<para>
To use this, set <option>failover_validation_command</option> in <filename>repmgr.conf</filename>
to a script executable by the <literal>postgres</literal> system user, e.g.:
<programlisting>
failover_validation_command=/path/to/script.sh %n %a</programlisting>
</para>
<para>
The <literal>%n</literal> parameter will be replaced with the node ID, and the
<literal>%a</literal> parameter will be replaced by the node name when the script is executed.
</para>
<para>
This script must return an exit code of <literal>0</literal> to indicate the node should promote itself.
Any other value will result in the promotion being aborted and the election rerun.
There is a pause of <option>election_rerun_interval</option> seconds before the election is rerun.
</para>
<para>
Sample <application>repmgrd</application> log file output during which the failover validation
script rejects the proposed promotion candidate:
<programlisting>
[2019-03-13 21:01:30] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 4 seconds
[2019-03-13 21:01:30] [NOTICE] promotion candidate is "node2" (ID: 2)
[2019-03-13 21:01:30] [NOTICE] executing "failover_validation_command"
[2019-03-13 21:01:30] [DETAIL] /usr/local/bin/failover-validation.sh 2
[2019-03-13 21:01:30] [INFO] output returned by failover validation command:
Node ID: 2
[2019-03-13 21:01:30] [NOTICE] failover validation command returned a non-zero value: "1"
[2019-03-13 21:01:30] [NOTICE] promotion candidate election will be rerun
[2019-03-13 21:01:30] [INFO] 1 followers to notify
[2019-03-13 21:01:30] [NOTICE] notifying node "node3" (node ID: 3) to rerun promotion candidate selection
INFO: node 3 received notification to rerun promotion candidate election
[2019-03-13 21:01:30] [NOTICE] rerunning election after 15 seconds ("election_rerun_interval")</programlisting>
</para>
</sect1>
<sect1 id="cascading-replication" xreflabel="Cascading replication">
<indexterm>
<primary>repmgrd</primary>
<secondary>cascading replication</secondary>
</indexterm>
<indexterm>
<primary>cascading replication</primary>
<secondary>repmgrd</secondary>
</indexterm>
<title>repmgrd and cascading replication</title>
<para>
Cascading replication - where a standby can connect to an upstream node and not
the primary server itself - was introduced in PostgreSQL 9.2. &repmgr; and
<application>repmgrd</application> support cascading replication by keeping track of the relationship
between standby servers - each node record is stored with the node id of its
upstream ("parent") server (except of course the primary server).
</para>
<para>
In a failover situation where the primary node fails and a top-level standby
is promoted, a standby connected to another standby will not be affected
and will continue working as normal (even if the upstream standby it's connected
to becomes the primary node). If however the node's direct upstream fails,
the &quot;cascaded standby&quot; will attempt to reconnect to that node's parent
(unless <varname>failover</varname> is set to <literal>manual</literal> in
<filename>repmgr.conf</filename>).
</para>
</sect1>
</chapter>

@@ -0,0 +1,925 @@
<chapter id="repmgrd-automatic-failover" xreflabel="Automatic failover with repmgrd">
<title>Automatic failover with repmgrd</title>
<indexterm>
<primary>repmgrd</primary>
<secondary>automatic failover</secondary>
</indexterm>
<para>
&repmgrd; is a management and monitoring daemon which runs
on each node in a replication cluster. It can automate actions such as
failover and updating standbys to follow the new primary, as well as
providing monitoring information about the state of each standby.
</para>
<sect1 id="repmgrd-witness-server" xreflabel="Using a witness server with repmgrd">
<title>Using a witness server</title>
<indexterm>
<primary>repmgrd</primary>
<secondary>witness server</secondary>
</indexterm>
<indexterm>
<primary>witness server</primary>
<secondary>repmgrd</secondary>
</indexterm>
<para>
A <xref linkend="witness-server"/> is a normal PostgreSQL instance which
is not part of the streaming replication cluster; its purpose is, if a
failover situation occurs, to provide proof that it is the primary server
itself which is unavailable, rather than e.g. a network split between
different physical locations.
</para>
<para>
A typical use case for a witness server is a two-node streaming replication
setup, where the primary and standby are in different locations (data centres).
By creating a witness server in the same location (data centre) as the primary,
if the primary becomes unavailable it's possible for the standby to decide whether
it can promote itself without risking a "split brain" scenario: if it can't see either the
witness or the primary server, it's likely there's a network-level interruption
and it should not promote itself. If it can see the witness but not the primary,
this proves there is no network interruption and the primary itself is unavailable,
and it can therefore promote itself (and ideally take action to fence the
former primary).
</para>
<note>
<para>
<emphasis>Never</emphasis> install a witness server on the same physical host
as another node in the replication cluster managed by &repmgr; - it's essential
the witness is not affected in any way by failure of another node.
</para>
</note>
<para>
For more complex replication scenarios, e.g. with multiple datacentres, it may
be preferable to use location-based failover, which ensures that only nodes
in the same location as the primary will ever be promotion candidates;
see <xref linkend="repmgrd-network-split"/> for more details.
</para>
<note>
<simpara>
A witness server will only be useful if &repmgrd;
is in use.
</simpara>
</note>
<sect2 id="creating-witness-server">
<title>Creating a witness server</title>
<para>
To create a witness server, set up a normal PostgreSQL instance on a server
in the same physical location as the cluster's primary server.
</para>
<para>
This instance should <emphasis>not</emphasis> be on the same physical host as the primary server,
as otherwise if the primary server fails due to hardware issues, the witness
server will be lost too.
</para>
<note>
<simpara>
&repmgr; 3.3 and earlier provided a <command>repmgr create witness</command>
command, which would automatically create a PostgreSQL instance. However
this often resulted in an unsatisfactory, hard-to-customise instance.
</simpara>
</note>
<para>
The witness server should be configured in the same way as a normal
&repmgr; node; see section <xref linkend="configuration"/>.
</para>
<para>
Register the witness server with <xref linkend="repmgr-witness-register"/>.
This will create the &repmgr; extension on the witness server, and make
a copy of the &repmgr; metadata.
</para>
<note>
<simpara>
As the witness server is not part of the replication cluster, further
changes to the &repmgr; metadata will be synchronised by
&repmgrd;.
</simpara>
</note>
<para>
Once the witness server has been configured, &repmgrd;
should be started.
</para>
<para>
To unregister a witness server, use <xref linkend="repmgr-witness-unregister"/>.
</para>
</sect2>
</sect1>
<sect1 id="repmgrd-network-split" xreflabel="Handling network splits with repmgrd">
<title>Handling network splits with repmgrd</title>
<indexterm>
<primary>repmgrd</primary>
<secondary>network splits</secondary>
</indexterm>
<indexterm>
<primary>network splits</primary>
</indexterm>
<para>
A common pattern for replication cluster setups is to spread servers over
more than one datacentre. This can provide benefits such as geographically
distributed read replicas and disaster recovery (DR) capability. However
this also means there is a risk of disconnection at network level between
datacentre locations, which would result in a split-brain scenario if
servers in a secondary data centre were no longer able to see the primary
in the main data centre and promoted a standby among themselves.
</para>
<para>
&repmgr; enables provision of &quot;<xref linkend="witness-server"/>&quot; to
artificially create a quorum of servers in a particular location, ensuring
that nodes in another location will not elect a new primary if they
are unable to see the majority of nodes. However this approach does not
scale well, particularly with more complex replication setups, e.g.
where the majority of nodes are located outside of the primary datacentre.
It also means the <literal>witness</literal> node needs to be managed as an
extra PostgreSQL instance outside of the main replication cluster, which
adds administrative and programming complexity.
</para>
<para>
<literal>repmgr4</literal> introduces the concept of <literal>location</literal>:
each node is associated with an arbitrary location string (default is
<literal>default</literal>); this is set in <filename>repmgr.conf</filename>, e.g.:
<programlisting>
node_id=1
node_name=node1
conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/data'
location='dc1'</programlisting>
</para>
<para>
In a failover situation, &repmgrd; will check if any servers in the
same location as the current primary node are visible. If not, &repmgrd;
will assume a network interruption and not promote any node in any
other location (it will however enter <link linkend="repmgrd-degraded-monitoring">degraded monitoring</link>
mode until a primary becomes visible).
</para>
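<para>
To illustrate, a node in a second data centre would carry a different location
string. The following sketch assumes a hypothetical second node,
<literal>node2</literal>, in a location labelled <literal>dc2</literal>; the
host name and data directory are likewise assumptions:
<programlisting>
# hypothetical standby in a second data centre
node_id=2
node_name=node2
conninfo='host=node2 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/postgresql/data'
location='dc2'</programlisting>
With a configuration like this, if the primary is in <literal>dc1</literal> and
<literal>node2</literal> cannot see any node in <literal>dc1</literal>, &repmgrd;
on <literal>node2</literal> will assume a network interruption and will not
promote it.
</para>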
</sect1>
<sect1 id="repmgrd-primary-visibility-consensus" xreflabel="Primary visibility consensus">
<title>Primary visibility consensus</title>
<indexterm>
<primary>repmgrd</primary>
<secondary>primary visibility consensus</secondary>
</indexterm>
<indexterm>
<primary>primary_visibility_consensus</primary>
</indexterm>
<para>
In more complex replication setups, particularly where replication occurs between
multiple datacentres, it's possible that some but not all standbys get cut off from the
primary (but not from the other standbys).
</para>
<para>
In this situation, normally it's not desirable for any of the standbys which have been
cut off to initiate a failover, as the primary is still functioning and standbys are
connected. Beginning with <link linkend="release-4.4">&repmgr; 4.4</link>,
it is possible for the affected standbys to build a consensus about whether
the primary is still available to some standbys (&quot;primary visibility consensus&quot;).
This is done by polling each standby for the time it last saw the primary;
if any have seen the primary very recently, it's reasonable
to infer that the primary is still available and a failover should not be started.
</para>
<para>
The time the primary was last seen by each node can be checked by executing
<link linkend="repmgr-daemon-status"><command>repmgr daemon status</command></link>,
which includes this in its output, e.g.:
<programlisting>$ repmgr -f /etc/repmgr.conf daemon status
ID | Name | Role | Status | Upstream | repmgrd | PID | Paused? | Upstream last seen
----+-------+---------+-----------+----------+---------+-------+---------+--------------------
1 | node1 | primary | * running | | running | 96563 | no | n/a
2 | node2 | standby | running | node1 | running | 96572 | no | 1 second(s) ago
3 | node3 | standby | running | node1 | running | 96584 | no | 0 second(s) ago</programlisting>
</para>
<para>
To enable this functionality, in <filename>repmgr.conf</filename> set:
<programlisting>
primary_visibility_consensus=true</programlisting>
</para>
<note>
<para>
<option>primary_visibility_consensus</option> <emphasis>must</emphasis> be set to
<literal>true</literal> on all nodes for it to be effective.
</para>
</note>
<para>
The following sample &repmgrd; log output demonstrates the behaviour in a situation
where one of three standbys is no longer able to connect to the primary, but <emphasis>can</emphasis>
connect to the two other standbys (&quot;sibling nodes&quot;):
<programlisting>
[2019-05-17 05:36:12] [WARNING] unable to reconnect to node 1 after 3 attempts
[2019-05-17 05:36:12] [INFO] 2 active sibling nodes registered
[2019-05-17 05:36:12] [INFO] local node's last receive lsn: 0/7006E58
[2019-05-17 05:36:12] [INFO] checking state of sibling node "node3" (ID: 3)
[2019-05-17 05:36:12] [INFO] node "node3" (ID: 3) reports its upstream is node 1, last seen 1 second(s) ago
[2019-05-17 05:36:12] [NOTICE] node 3 last saw primary node 1 second(s) ago, considering primary still visible
[2019-05-17 05:36:12] [INFO] last receive LSN for sibling node "node3" (ID: 3) is: 0/7006E58
[2019-05-17 05:36:12] [INFO] node "node3" (ID: 3) has same LSN as current candidate "node2" (ID: 2)
[2019-05-17 05:36:12] [INFO] checking state of sibling node "node4" (ID: 4)
[2019-05-17 05:36:12] [INFO] node "node4" (ID: 4) reports its upstream is node 1, last seen 0 second(s) ago
[2019-05-17 05:36:12] [NOTICE] node 4 last saw primary node 0 second(s) ago, considering primary still visible
[2019-05-17 05:36:12] [INFO] last receive LSN for sibling node "node4" (ID: 4) is: 0/7006E58
[2019-05-17 05:36:12] [INFO] node "node4" (ID: 4) has same LSN as current candidate "node2" (ID: 2)
[2019-05-17 05:36:12] [INFO] 2 nodes can see the primary
[2019-05-17 05:36:12] [DETAIL] following nodes can see the primary:
- node "node3" (ID: 3): 1 second(s) ago
- node "node4" (ID: 4): 0 second(s) ago
[2019-05-17 05:36:12] [NOTICE] cancelling failover as some nodes can still see the primary
[2019-05-17 05:36:12] [NOTICE] election cancelled
[2019-05-17 05:36:14] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in degraded state</programlisting>
In this situation the affected standby cancels the failover and enters degraded monitoring mode,
waiting for the primary to reappear.
</para>
</sect1>
<sect1 id="repmgrd-standby-disconnection-on-failover" xreflabel="Standby disconnection on failover">
<title>Standby disconnection on failover</title>
<indexterm>
<primary>repmgrd</primary>
<secondary>standby disconnection on failover</secondary>
</indexterm>
<indexterm>
<primary>standby disconnection on failover</primary>
</indexterm>
<para>
If <option>standby_disconnect_on_failover</option> is set to <literal>true</literal> in
<filename>repmgr.conf</filename>, in a failover situation &repmgrd; will forcibly disconnect
the local node's WAL receiver before making a failover decision.
</para>
<note>
<para>
<option>standby_disconnect_on_failover</option> is available with PostgreSQL 9.5 and later.
Additionally, this requires that the <literal>repmgr</literal> database user is a superuser.
</para>
</note>
<para>
By doing this, it's possible to ensure that, at the point the failover decision is made, no nodes
are receiving data from the primary and their LSN location will be static.
</para>
<important>
<para>
<option>standby_disconnect_on_failover</option> <emphasis>must</emphasis> be set to the same value on
all nodes.
</para>
</important>
<para>
Note that when using <option>standby_disconnect_on_failover</option> there will be a delay of 5 seconds
plus however many seconds it takes to confirm the WAL receiver is disconnected before
&repmgrd; proceeds with the failover decision.
</para>
<para>
Following the failover operation, no matter what the outcome, each node will reconnect its WAL receiver.
</para>
<para>
If using <option>standby_disconnect_on_failover</option>, we recommend that the
<option>primary_visibility_consensus</option> option is also used.
</para>
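<para>
As a minimal sketch (the values shown are illustrative assumptions, not recommendations),
the two options could be enabled together in <filename>repmgr.conf</filename>:
<programlisting>
# must be set to the same value on all nodes
standby_disconnect_on_failover=true
# illustrative: also take into account whether other nodes can still see the primary
primary_visibility_consensus=true</programlisting>
</para>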
</sect1>
<sect1 id="repmgrd-failover-validation" xreflabel="Failover validation">
<title>Failover validation</title>
<indexterm>
<primary>repmgrd</primary>
<secondary>failover validation</secondary>
</indexterm>
<indexterm>
<primary>failover validation</primary>
</indexterm>
<para>
From <link linkend="release-4.3">repmgr 4.3</link>, &repmgr; makes it possible to provide a script
to &repmgrd; which, in a failover situation,
will be executed by the promotion candidate (the node which has been selected
to be the new primary) to confirm whether the node should actually be promoted.
</para>
<para>
To use this, set <option>failover_validation_command</option> in <filename>repmgr.conf</filename>
to a script executable by the <literal>postgres</literal> system user, e.g.:
<programlisting>
failover_validation_command=/path/to/script.sh %n %a</programlisting>
</para>
<para>
The <literal>%n</literal> parameter will be replaced with the node ID, and the
<literal>%a</literal> parameter will be replaced by the node name when the script is executed.
</para>
<para>
This script must return an exit code of <literal>0</literal> to indicate the node should promote itself.
Any other value will result in the promotion being aborted and the election rerun.
There is a pause of <option>election_rerun_interval</option> seconds before the election is rerun.
</para>
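<para>
As an illustrative sketch, the relevant <filename>repmgr.conf</filename> entries might look like
the following. The script path is hypothetical, and the interval of 15 seconds is simply the value
which appears in the sample log output in this section:
<programlisting>
# hypothetical script path; %n = node ID, %a = node name
failover_validation_command=/usr/local/bin/failover-validation.sh %n %a
# seconds to wait before rerunning the election after a rejected promotion
election_rerun_interval=15</programlisting>
</para>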
<para>
Sample &repmgrd; log file output during which the failover validation
script rejects the proposed promotion candidate:
<programlisting>
[2019-03-13 21:01:30] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 4 seconds
[2019-03-13 21:01:30] [NOTICE] promotion candidate is "node2" (ID: 2)
[2019-03-13 21:01:30] [NOTICE] executing "failover_validation_command"
[2019-03-13 21:01:30] [DETAIL] /usr/local/bin/failover-validation.sh 2
[2019-03-13 21:01:30] [INFO] output returned by failover validation command:
Node ID: 2
[2019-03-13 21:01:30] [NOTICE] failover validation command returned a non-zero value: "1"
[2019-03-13 21:01:30] [NOTICE] promotion candidate election will be rerun
[2019-03-13 21:01:30] [INFO] 1 followers to notify
[2019-03-13 21:01:30] [NOTICE] notifying node "node3" (ID: 3) to rerun promotion candidate selection
INFO: node 3 received notification to rerun promotion candidate election
[2019-03-13 21:01:30] [NOTICE] rerunning election after 15 seconds ("election_rerun_interval")</programlisting>
</para>
</sect1>
<sect1 id="cascading-replication" xreflabel="Cascading replication">
<title>repmgrd and cascading replication</title>
<indexterm>
<primary>repmgrd</primary>
<secondary>cascading replication</secondary>
</indexterm>
<indexterm>
<primary>cascading replication</primary>
<secondary>repmgrd</secondary>
</indexterm>
<para>
Cascading replication - where a standby can connect to an upstream node and not
the primary server itself - was introduced in PostgreSQL 9.2. &repmgr; and
&repmgrd; support cascading replication by keeping track of the relationship
between standby servers - each node record is stored with the node id of its
upstream ("parent") server (except of course the primary server).
</para>
<para>
In a failover situation where the primary node fails and a top-level standby
is promoted, a standby connected to another standby will not be affected
and will continue working as normal (even if the upstream standby it's connected
to becomes the primary node). If however the node's direct upstream fails,
the &quot;cascaded standby&quot; will attempt to reconnect to that node's parent
(unless <varname>failover</varname> is set to <literal>manual</literal> in
<filename>repmgr.conf</filename>).
</para>
</sect1>
<sect1 id="repmgrd-primary-child-disconnection" xreflabel="Monitoring standby disconnections on the primary">
<title>Monitoring standby disconnections on the primary node</title>
<indexterm>
<primary>repmgrd</primary>
<secondary>standby disconnection</secondary>
</indexterm>
<indexterm>
<primary>repmgrd</primary>
<secondary>child node disconnection</secondary>
</indexterm>
<note>
<para>
This functionality is available in <link linkend="release-4.4">&repmgr; 4.4</link> and later.
</para>
</note>
<para>
When running on the primary node, &repmgrd; can
monitor connections and in particular disconnections by its attached
child nodes (standbys, and if in use, the witness server), and optionally
execute a custom command if certain criteria are met (such as the number of
attached nodes falling to zero following a failover to a new primary); this
command can be used for example to &quot;fence&quot; the node and ensure it
is isolated from any applications attempting to access the replication cluster.
</para>
<note>
<para>
Currently &repmgrd; can only detect disconnections
of streaming replication standbys and cannot determine whether a standby
has disconnected and fallen back to archive recovery.
</para>
<para>
See section <link linkend="repmgrd-primary-child-disconnection-caveats">caveats</link> below.
</para>
</note>
<sect2 id="repmgrd-primary-child-disconnection-monitoring-process">
<title>Standby disconnections monitoring process and criteria</title>
<para>
&repmgrd; monitors attached child nodes and decides
whether to invoke the user-defined command based on the following process
and criteria:
<itemizedlist>
<listitem>
<para>
Every few seconds (defined by the configuration parameter <varname>child_nodes_check_interval</varname>;
default: <literal>5</literal> seconds, a value of <literal>0</literal> disables this altogether), &repmgrd; queries
the <literal>pg_stat_replication</literal> system view and compares
the nodes present there against the list of nodes registered with &repmgr; which
should be attached to the primary.
</para>
<para>
If a witness server is in use, &repmgrd; connects to it and checks which upstream node
it is following.
</para>
</listitem>
<listitem>
<para>
If a child node (standby) is no longer present in <literal>pg_stat_replication</literal>,
&repmgrd; notes the time it detected the node's absence, and additionally generates a
<literal>child_node_disconnect</literal> event.
</para>
<para>
If a witness server is in use, and it is no longer following the primary, or not
reachable at all, &repmgrd; notes the time it detected the node's absence, and additionally generates a
<literal>child_node_disconnect</literal> event.
</para>
</listitem>
<listitem>
<para>
If a child node (standby) which was absent from <literal>pg_stat_replication</literal> reappears,
&repmgrd; clears the time it detected the node's absence, and additionally generates a
<literal>child_node_reconnect</literal> event.
</para>
<para>
If a witness server is in use and was previously not reachable or not following the
primary node, but has become reachable and is following the primary node again, &repmgrd; clears the
time it detected the node's absence, and additionally generates a
<literal>child_node_reconnect</literal> event.
</para>
</listitem>
<listitem>
<para>
If an entirely new child node (standby or witness) is detected, &repmgrd; adds it to its internal list
and additionally generates a <literal>child_node_new_connect</literal> event.
</para>
</listitem>
<listitem>
<para>
If the <varname>child_nodes_disconnect_command</varname> parameter is set in
<filename>repmgr.conf</filename>, &repmgrd; will then loop through all child nodes.
If it determines that insufficient child nodes are connected, and a
minimum of <varname>child_nodes_disconnect_timeout</varname> seconds (default: <literal>30</literal>)
has elapsed since the last node became disconnected, &repmgrd; will then execute the
<varname>child_nodes_disconnect_command</varname> script.
</para>
<para>
By default, the <varname>child_nodes_disconnect_command</varname> will only be executed
if all child nodes are disconnected. If <varname>child_nodes_connected_min_count</varname>
is set, the <varname>child_nodes_disconnect_command</varname> script will be triggered
if the number of connected child nodes falls below the specified value (e.g.
if set to <literal>2</literal>, the script will be triggered if only one child node
is connected). Alternatively, if <varname>child_nodes_disconnect_min_count</varname> is set
and more than that number of child nodes disconnect, the script will be triggered.
</para>
<note>
<para>
By default, a witness node, if in use, will <emphasis>not</emphasis> be counted as a
child node for the purposes of determining whether to execute
<varname>child_nodes_disconnect_command</varname>.
</para>
<para>
To enable the witness node to be counted as a child node, set
<varname>child_nodes_connected_include_witness</varname> in <filename>repmgr.conf</filename>
to <literal>true</literal>
(and <link linkend="repmgrd-reloading-configuration">reload the configuration</link> if &repmgrd;
is running).
</para>
</note>
</listitem>
<listitem>
<para>
Note that child nodes which are not attached when &repmgrd;
starts will <emphasis>not</emphasis> be considered as missing, as &repmgrd;
cannot know why they are not attached.
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
<sect2 id="repmgrd-primary-child-disconnection-example">
<title>Standby disconnections monitoring process example</title>
<para>
This example shows typical &repmgrd; log output from a three-node cluster
(primary and two child nodes), with <varname>child_nodes_connected_min_count</varname>
set to <literal>2</literal>.
</para>
<para>
&repmgrd; on the primary has started up, while two child
nodes are being provisioned:
<programlisting>
[2019-04-24 15:25:33] [INFO] monitoring primary node "node1" (ID: 1) in normal state
[2019-04-24 15:25:35] [NOTICE] new node "node2" (ID: 2) has connected
[2019-04-24 15:25:35] [NOTICE] 1 (of 1) child nodes are connected, but at least 2 child nodes required
[2019-04-24 15:25:35] [INFO] no child nodes have detached since repmgrd startup
(...)
[2019-04-24 15:25:44] [NOTICE] new node "node3" (ID: 3) has connected
[2019-04-24 15:25:46] [INFO] monitoring primary node "node1" (ID: 1) in normal state
(...)</programlisting>
</para>
<para>
One of the child nodes has disconnected; &repmgrd;
is now waiting <varname>child_nodes_disconnect_timeout</varname> seconds
before executing <varname>child_nodes_disconnect_command</varname>:
<programlisting>
[2019-04-24 15:28:11] [INFO] monitoring primary node "node1" (ID: 1) in normal state
[2019-04-24 15:28:17] [INFO] monitoring primary node "node1" (ID: 1) in normal state
[2019-04-24 15:28:19] [NOTICE] node "node3" (ID: 3) has disconnected
[2019-04-24 15:28:19] [NOTICE] 1 (of 2) child nodes are connected, but at least 2 child nodes required
[2019-04-24 15:28:19] [INFO] most recently detached child node was 3 (ca. 0 seconds ago), not triggering "child_nodes_disconnect_command"
[2019-04-24 15:28:19] [DETAIL] "child_nodes_disconnect_timeout" set To 30 seconds
(...)</programlisting>
</para>
<para>
<varname>child_nodes_disconnect_command</varname> is executed once:
<programlisting>
[2019-04-24 15:28:49] [INFO] most recently detached child node was 3 (ca. 30 seconds ago), triggering "child_nodes_disconnect_command"
[2019-04-24 15:28:49] [INFO] "child_nodes_disconnect_command" is:
"/usr/bin/fence-all-the-things.sh"
[2019-04-24 15:28:51] [NOTICE] 1 (of 2) child nodes are connected, but at least 2 child nodes required
[2019-04-24 15:28:51] [INFO] "child_nodes_disconnect_command" was previously executed, taking no action</programlisting>
</para>
</sect2>
<sect2 id="repmgrd-primary-child-disconnection-caveats">
<title>Standby disconnections monitoring caveats</title>
<para>
The following caveats should be considered if you are intending to use this functionality.
</para>
<para>
<itemizedlist mark="bullet">
<listitem>
<para>
If a child node is configured to use archive recovery, it's possible that
the child node will disconnect from the primary node and fall back to
archive recovery. In this case &repmgrd;
will nevertheless register a node disconnection.
</para>
</listitem>
<listitem>
<para>
&repmgr; relies on <varname>application_name</varname> in the child node's
<varname>primary_conninfo</varname> string being the same as the node name
defined in the node's <filename>repmgr.conf</filename> file. Furthermore,
this <varname>application_name</varname> must be unique across the replication
cluster.
</para>
<para>
If a custom <varname>application_name</varname> is used, or the
<varname>application_name</varname> is not unique across the replication
cluster, &repmgr; will not be able to reliably monitor child node connections.
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
<sect2 id="repmgrd-primary-child-disconnection-configuration">
<title>Standby disconnections monitoring process configuration</title>
<para>
The following parameters, set in <filename>repmgr.conf</filename>,
control how child node disconnection monitoring operates.
</para>
<variablelist>
<varlistentry>
<term><varname>child_nodes_check_interval</varname></term>
<listitem>
<indexterm>
<primary>child_nodes_check_interval</primary>
<secondary>child node disconnection monitoring</secondary>
</indexterm>
<para>
Interval (in seconds) after which &repmgrd; queries the
<literal>pg_stat_replication</literal> system view and compares the nodes present
there against the list of nodes registered with &repmgr; which should be attached to the primary.
</para>
<para>
Default is <literal>5</literal> seconds; a value of <literal>0</literal> disables this check
altogether.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><varname>child_nodes_disconnect_command</varname></term>
<listitem>
<indexterm>
<primary>child_nodes_disconnect_command</primary>
<secondary>child node disconnection monitoring</secondary>
</indexterm>
<para>
User-definable script to be executed when &repmgrd;
determines that an insufficient number of child nodes are connected. By default
the script is executed when no child nodes are connected, but the execution
threshold can be modified by setting one of <varname>child_nodes_connected_min_count</varname>
or <varname>child_nodes_disconnect_min_count</varname> (see below).
</para>
<para>
The <varname>child_nodes_disconnect_command</varname> script can be
any user-defined script or program. It <emphasis>must</emphasis> be able
to be executed by the system user under which the PostgreSQL server itself
runs (usually <literal>postgres</literal>).
</para>
<note>
<para>
If <varname>child_nodes_disconnect_command</varname> is not set, no action
will be taken.
</para>
</note>
<para>
If specified, the following format placeholder will be substituted when
executing <varname>child_nodes_disconnect_command</varname>:
</para>
<variablelist>
<varlistentry>
<term><option>%p</option></term>
<listitem>
<para>
ID of the node executing the <varname>child_nodes_disconnect_command</varname> script.
</para>
</listitem>
</varlistentry>
</variablelist>
<para>
The <varname>child_nodes_disconnect_command</varname> script will only be executed once
while the criteria for its execution are met. If the criteria cease to be
met (i.e. some child nodes have reconnected) and are subsequently met again,
the script will be executed again.
</para>
<para>
The <varname>child_nodes_disconnect_command</varname> script will not be executed if
&repmgrd; is <link linkend="repmgrd-pausing">paused</link>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><varname>child_nodes_disconnect_timeout</varname></term>
<listitem>
<indexterm>
<primary>child_nodes_disconnect_timeout</primary>
<secondary>child node disconnection monitoring</secondary>
</indexterm>
<para>
If &repmgrd; determines that an insufficient number of
child nodes are connected, it will wait for the specified number of seconds
before executing the <varname>child_nodes_disconnect_command</varname>.
</para>
<para>
Default: <literal>30</literal> seconds.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><varname>child_nodes_connected_min_count</varname></term>
<listitem>
<indexterm>
<primary>child_nodes_connected_min_count</primary>
<secondary>child node disconnection monitoring</secondary>
</indexterm>
<para>
If the number of child nodes connected falls below the number specified in
this parameter, the <varname>child_nodes_disconnect_command</varname> script
will be executed.
</para>
<para>
For example, if <varname>child_nodes_connected_min_count</varname> is set
to <literal>2</literal>, the <varname>child_nodes_disconnect_command</varname>
script will be executed if one or no child nodes are connected.
</para>
<para>
Note that <varname>child_nodes_connected_min_count</varname> overrides any value
set in <varname>child_nodes_disconnect_min_count</varname>.
</para>
<para>
If neither <varname>child_nodes_connected_min_count</varname> nor
<varname>child_nodes_disconnect_min_count</varname> is set,
the <varname>child_nodes_disconnect_command</varname> script
will be executed when no child nodes are connected.
</para>
<para>
A witness node, if in use, will not be counted as a child node unless
<varname>child_nodes_connected_include_witness</varname> is set to <literal>true</literal>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><varname>child_nodes_disconnect_min_count</varname></term>
<listitem>
<indexterm>
<primary>child_nodes_disconnect_min_count</primary>
<secondary>child node disconnection monitoring</secondary>
</indexterm>
<para>
If the number of disconnected child nodes exceeds the number specified in
this parameter, the <varname>child_nodes_disconnect_command</varname> script
will be executed.
</para>
<para>
For example, if <varname>child_nodes_disconnect_min_count</varname> is set
to <literal>2</literal>, the <varname>child_nodes_disconnect_command</varname>
script will be executed if more than two child nodes are disconnected.
</para>
<para>
Note that any value set in <varname>child_nodes_disconnect_min_count</varname>
will be overridden by <varname>child_nodes_connected_min_count</varname>.
</para>
<para>
If neither <varname>child_nodes_connected_min_count</varname> nor
<varname>child_nodes_disconnect_min_count</varname> is set,
the <varname>child_nodes_disconnect_command</varname> script
will be executed when no child nodes are connected.
</para>
<para>
A witness node, if in use, will not be counted as a child node unless
<varname>child_nodes_connected_include_witness</varname> is set to <literal>true</literal>.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><varname>child_nodes_connected_include_witness</varname></term>
<listitem>
<indexterm>
<primary>child_nodes_connected_include_witness</primary>
<secondary>child node disconnection monitoring</secondary>
</indexterm>
<para>
Whether to count the witness node (if in use) as a child node when
determining whether to execute <varname>child_nodes_disconnect_command</varname>.
</para>
<para>
Defaults to <literal>false</literal>.
</para>
</listitem>
</varlistentry>
</variablelist>
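<para>
Taken together, the parameters above could be combined along the following lines. This is a
sketch only: the script path is hypothetical, and the values should be adapted to the
cluster in question.
<programlisting>
# query pg_stat_replication every 5 seconds (the default)
child_nodes_check_interval=5
# wait 30 seconds (the default) after the most recent disconnection
child_nodes_disconnect_timeout=30
# trigger the command if fewer than 2 child nodes remain connected
child_nodes_connected_min_count=2
# count the witness server, if in use, as a child node (default: false)
child_nodes_connected_include_witness=true
# hypothetical fencing script; %p = ID of the node executing the script
child_nodes_disconnect_command='/path/to/fence-node.sh %p'</programlisting>
</para>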
</sect2>
<sect2 id="repmgrd-primary-child-disconnection-events">
<title>Standby disconnections monitoring process event notifications</title>
<para>
The following <link linkend="event-notifications">event notifications</link> may be generated:
</para>
<variablelist>
<varlistentry>
<term><varname>child_node_disconnect</varname></term>
<listitem>
<indexterm>
<primary>child_node_disconnect</primary>
<secondary>event notification</secondary>
</indexterm>
<para>
This event is generated after &repmgrd;
detects that a child node is no longer streaming from the primary node.
</para>
<para>
Example:
<programlisting>
$ repmgr cluster event --event=child_node_disconnect
Node ID | Name | Event | OK | Timestamp | Details
---------+-------+-----------------------+----+---------------------+--------------------------------------------
1 | node1 | child_node_disconnect | t | 2019-04-24 12:41:36 | node "node3" (ID: 3) has disconnected</programlisting>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><varname>child_node_reconnect</varname></term>
<listitem>
<indexterm>
<primary>child_node_reconnect</primary>
<secondary>event notification</secondary>
</indexterm>
<para>
This event is generated after &repmgrd;
detects that a child node has resumed streaming from the primary node.
</para>
<para>
Example:
<programlisting>
$ repmgr cluster event --event=child_node_reconnect
Node ID | Name | Event | OK | Timestamp | Details
---------+-------+----------------------+----+---------------------+------------------------------------------------------------
1 | node1 | child_node_reconnect | t | 2019-04-24 12:42:19 | node "node3" (ID: 3) has reconnected after 42 seconds</programlisting>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><varname>child_node_new_connect</varname></term>
<listitem>
<indexterm>
<primary>child_node_new_connect</primary>
<secondary>event notification</secondary>
</indexterm>
<para>
This event is generated after &repmgrd;
detects that a new child node has been registered with &repmgr; and has
connected to the primary.
</para>
<para>
Example:
<programlisting>
$ repmgr cluster event --event=child_node_new_connect
Node ID | Name | Event | OK | Timestamp | Details
---------+-------+------------------------+----+---------------------+---------------------------------------------
1 | node1 | child_node_new_connect | t | 2019-04-24 12:41:30 | new node "node3" (ID: 3) has connected</programlisting>
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><varname>child_nodes_disconnect_command</varname></term>
<listitem>
<indexterm>
<primary>child_nodes_disconnect_command</primary>
<secondary>event notification</secondary>
</indexterm>
<para>
This event is generated after &repmgrd; detects
that sufficient child nodes have been disconnected for a sufficient amount
of time to trigger execution of the <varname>child_nodes_disconnect_command</varname>.
</para>
<para>
Example:
<programlisting>
$ repmgr cluster event --event=child_nodes_disconnect_command
Node ID | Name | Event | OK | Timestamp | Details
---------+-------+--------------------------------+----+---------------------+--------------------------------------------------------
1 | node1 | child_nodes_disconnect_command | t | 2019-04-24 13:08:17 | "child_nodes_disconnect_command" successfully executed</programlisting>
</para>
</listitem>
</varlistentry>
</variablelist>
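<para>
These events can be passed to a custom script through the event notification settings in
<filename>repmgr.conf</filename>. The following is a sketch only; the script path is
hypothetical:
<programlisting>
# restrict notifications to the child node monitoring events
event_notifications=child_node_disconnect,child_node_reconnect,child_node_new_connect,child_nodes_disconnect_command
# hypothetical script; %n = node ID, %e = event type, %s = success (1 or 0)
event_notification_command='/path/to/child-node-event.sh %n %e %s'</programlisting>
</para>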
</sect2>
</sect1>
</chapter>


@@ -1,4 +1,6 @@
<chapter id="repmgrd-bdr">
<title>BDR failover with repmgrd</title>
<indexterm>
<primary>repmgrd</primary>
<secondary>BDR</secondary>
@@ -8,9 +10,8 @@
<primary>BDR</primary>
</indexterm>
<title>BDR failover with repmgrd</title>
<para>
&repmgr; 4.x provides support for monitoring BDR nodes and taking action in
&repmgr; 4.x provides support for monitoring a pair of BDR 2.x nodes and taking action in
case one of the nodes fails.
</para>
<note>
@@ -24,15 +25,28 @@
<para>
In contrast to streaming replication, there's no concept of "promoting" a new
primary node with BDR. Instead, "failover" involves monitoring both nodes
with <application>repmgrd</application> and redirecting queries from the failed node to the remaining
with &repmgrd; and redirecting queries from the failed node to the remaining
active node. This can be done by using an
<link linkend="event-notifications">event notification</link> script
which is called by <application>repmgrd</application> to dynamically
which is called by &repmgrd; to dynamically
reconfigure a proxy server/connection pooler such as <application>PgBouncer</application>.
</para>
<note>
<simpara>
This &repmgr; functionality is for BDR 2.x only running on PostgreSQL 9.4/9.6.
It is <emphasis>not</emphasis> required for later BDR versions.
</simpara>
</note>
<sect1 id="bdr-prerequisites" xreflabel="BDR prequisites">
<title>Prerequisites</title>
<important>
<para>
This &repmgr; functionality is for BDR 2.x only running on PostgreSQL 9.4/9.6.
It is <emphasis>not</emphasis> required for later BDR versions.
</para>
</important>
<para>
&repmgr; 4 requires PostgreSQL 9.4 or 9.6 with the BDR 2 extension
enabled and configured for a two-node BDR network. &repmgr; 4 packages
@@ -47,7 +61,7 @@
<para>
Application database connections *must* be passed through a proxy server/
connection pooler such as <application>PgBouncer</application>, and it must be possible to dynamically
reconfigure that from <application>repmgrd</application>. The example demonstrated in this document
reconfigure that from &repmgrd;. The example demonstrated in this document
will use <application>PgBouncer</application>.
</para>
<para>
@@ -81,7 +95,7 @@
# Event notification configuration
event_notifications=bdr_failover
event_notification_command='/path/to/bdr-pgbouncer.sh %n %e %s "%c" "%a" >> /tmp/bdr-failover.log 2>&1'
event_notification_command='/path/to/bdr-pgbouncer.sh %n %e %s "%c" "%a" >> /tmp/bdr-failover.log 2>&amp;1'
# repmgrd options
monitor_interval_secs=5
@@ -107,7 +121,7 @@
<simpara>
<varname>event_notification_command</varname> is the script which does the actual "heavy lifting"
of reconfiguring the proxy server/ connection pooler. It is fully
user-definable; see section <xref linkend="bdr-event-notification-command"> for a reference
user-definable; see section <xref linkend="bdr-event-notification-command"/> for a reference
implementation.
</simpara>
</note>
@@ -145,7 +159,7 @@
</important>
<para>
At this point the meta data for both nodes has been created; executing
<xref linkend="repmgr-cluster-show"> (on either node) should produce output like this:
<xref linkend="repmgr-cluster-show"/> (on either node) should produce output like this:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Connection string
@@ -155,7 +169,7 @@
</para>
<para>
Additionally, it's possible to display a log of significant events; executing
<xref linkend="repmgr-cluster-event"> (on either node) should produce output like this:
<xref linkend="repmgr-cluster-event"/> (on either node) should produce output like this:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster event
Node ID | Event | OK | Timestamp | Details
@@ -283,7 +297,7 @@
</listitem>
<listitem>
<simpara>recreates the <application>PgBouncer</application> configuration file on each
node using the information provided by <application>repmgrd</application>
node using the information provided by &repmgrd;
(primarily the <varname>conninfo</varname> string) to configure
<application>PgBouncer</application></simpara>
</listitem>
@@ -305,21 +319,21 @@
<title>Node monitoring and failover</title>
<para>
At the intervals specified by <varname>monitor_interval_secs</varname>
in <filename>repmgr.conf</filename>, <application>repmgrd</application>
in <filename>repmgr.conf</filename>, &repmgrd;
will ping each node to check if it's available. If a node isn't available,
<application>repmgrd</application> will enter failover mode and check <varname>reconnect_attempts</varname>
&repmgrd; will enter failover mode and check <varname>reconnect_attempts</varname>
times at intervals of <varname>reconnect_interval</varname> to confirm the node is definitely unreachable.
This buffer period is necessary to avoid false positives caused by transient
network outages.
</para>
<para>
If the node is still unavailable, <application>repmgrd</application> will enter failover mode and execute
If the node is still unavailable, &repmgrd; will enter failover mode and execute
the script defined in <varname>event_notification_command</varname>; an entry will be logged
in the <literal>repmgr.events</literal> table and <application>repmgrd</application> will
in the <literal>repmgr.events</literal> table and &repmgrd; will
(unless otherwise configured) resume monitoring of the node in "degraded" mode until it reappears.
</para>
<para>
<application>repmgrd</application> logfile output during a failover event will look something like this
&repmgrd; logfile output during a failover event will look something like this
on one node (usually the node which has failed, here <literal>node2</literal>):
<programlisting>
...
@@ -375,8 +389,8 @@
</para>
<para>
This assumes only the PostgreSQL instance on <literal>node2</literal> has failed. In this case the
<application>repmgrd</application> instance running on <literal>node2</literal> has performed the failover. However if
the entire server becomes unavailable, <application>repmgrd</application> on <literal>node1</literal> will perform
&repmgrd; instance running on <literal>node2</literal> has performed the failover. However if
the entire server becomes unavailable, &repmgrd; on <literal>node1</literal> will perform
the failover.
</para>
</sect1>
@@ -391,7 +405,7 @@
</para>
<para>
If the failed node comes back up and connects correctly, output similar to this
will be visible in the <application>repmgrd</application> log:
will be visible in the &repmgrd; log:
<programlisting>
[2017-07-27 21:25:30] [DETAIL] monitoring node "node2" (ID: 2) in degraded mode
[2017-07-27 21:25:46] [INFO] monitoring BDR replication status on node "node2" (ID: 2)
@@ -404,10 +418,10 @@
<sect1 id="bdr-complete-shutdown" xreflabel="Shutdown of both nodes">
<title>Shutdown of both nodes</title>
<para>
If both PostgreSQL instances are shut down, <application>repmgrd</application> will try and handle the
If both PostgreSQL instances are shut down, &repmgrd; will try and handle the
situation as gracefully as possible, though with no failover candidates available
there's not much it can do. Should this case ever occur, we recommend shutting
down <application>repmgrd</application> on both nodes and restarting it once the PostgreSQL instances
down &repmgrd; on both nodes and restarting it once the PostgreSQL instances
are running properly.
</para>
</sect1>


@@ -1,20 +1,20 @@
<chapter id="repmgrd-configuration">
<title>repmgrd setup and configuration</title>
<indexterm>
<primary>repmgrd</primary>
<secondary>configuration</secondary>
</indexterm>
<title>repmgrd setup and configuration</title>
<para>
<application>repmgrd</application> is a daemon which runs on each PostgreSQL node,
&repmgrd; is a daemon which runs on each PostgreSQL node,
monitoring the local node, and (unless it's the primary node) the upstream server
(the primary server or with cascading replication, another standby) which it's
connected to.
</para>
<para>
<application>repmgrd</application> can be configured to provide failover
&repmgrd; can be configured to provide failover
capability in case the primary upstream node becomes unreachable, and/or
provide monitoring data to the &repmgr; metadatabase.
</para>
@@ -23,7 +23,7 @@
<title>repmgrd configuration</title>
<para>
To use <application>repmgrd</application>, its associated function library <emphasis>must</emphasis> be
To use &repmgrd;, its associated function library <emphasis>must</emphasis> be
included via <filename>postgresql.conf</filename> with:
<programlisting>
@@ -35,112 +35,115 @@
</para>
<para>
The following configuration options apply to <application>repmgrd</application> in all circumstances:
The following configuration options apply to &repmgrd; in all circumstances:
</para>
<variablelist>
<varlistentry>
<indexterm>
<varlistentry>
<term><option>monitor_interval_secs</option></term>
<listitem>
<indexterm>
<primary>monitor_interval_secs</primary>
</indexterm>
<term><option>monitor_interval_secs</option></term>
<listitem>
<para>
The interval (in seconds, default: <literal>2</literal>) to check the availability of the upstream node.
</para>
</listitem>
</varlistentry>
<para>
The interval (in seconds, default: <literal>2</literal>) to check the availability of the upstream node.
</para>
</listitem>
<varlistentry>
</varlistentry>
<varlistentry id="connection-check-type">
<term><option>connection_check_type</option></term>
<listitem>
<indexterm>
<primary>connection_check_type</primary>
</indexterm>
<term><option>connection_check_type</option></term>
<listitem>
<para>
The option <option>connection_check_type</option> is used to select the method
<application>repmgrd</application> uses to determine whether the upstream node is available.
</para>
<para>
Possible values are:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<para>
The option <option>connection_check_type</option> is used to select the method
&repmgrd; uses to determine whether the upstream node is available.
</para>
<para>
Possible values are:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<literal>ping</literal> (default) - uses <command>PQping()</command> to
determine server availability
</simpara>
</listitem>
<listitem>
<simpara>
<literal>connection</literal> - determines server availability
by attempting to make a new connection to the upstream node
</simpara>
</listitem>
<listitem>
<simpara>
<literal>query</literal> - determines server availability
by executing an SQL statement on the node via the existing connection
</simpara>
</listitem>
</listitem>
<listitem>
<simpara>
<literal>connection</literal> - determines server availability
by attempting to make a new connection to the upstream node
</simpara>
</listitem>
<listitem>
<simpara>
<literal>query</literal> - determines server availability
by executing an SQL statement on the node via the existing connection
</simpara>
</listitem>
</itemizedlist>
</para>
</listitem>
</varlistentry>
</itemizedlist>
</para>
</listitem>
</varlistentry>
<varlistentry>
<indexterm>
<varlistentry>
<term><option>reconnect_attempts</option></term>
<listitem>
<indexterm>
<primary>reconnect_attempts</primary>
</indexterm>
<term><option>reconnect_attempts</option></term>
<listitem>
<para>
The number of attempts (default: <literal>6</literal>) which will be made to reconnect to an unreachable
upstream node before initiating a failover.
</para>
<para>
There will be an interval of <option>reconnect_interval</option> seconds between each reconnection
attempt.
</para>
</listitem>
</varlistentry>
<varlistentry>
<indexterm>
<para>
The number of attempts (default: <literal>6</literal>) which will be made to reconnect to an unreachable
upstream node before initiating a failover.
</para>
<para>
There will be an interval of <option>reconnect_interval</option> seconds between each reconnection
attempt.
</para>
</listitem>
</varlistentry>
<varlistentry>
<term><option>reconnect_interval</option></term>
<listitem>
<indexterm>
<primary>reconnect_interval</primary>
</indexterm>
<term><option>reconnect_interval</option></term>
<listitem>
<para>
Interval (in seconds, default: <literal>10</literal>) between attempts to reconnect to an unreachable
upstream node.
</para>
<para>
<para>
Interval (in seconds, default: <literal>10</literal>) between attempts to reconnect to an unreachable
upstream node.
</para>
<para>
The number of reconnection attempts is defined by the parameter <option>reconnect_attempts</option>.
</para>
</listitem>
</varlistentry>
</para>
</listitem>
</varlistentry>
<varlistentry>
<varlistentry>
<term><option>degraded_monitoring_timeout</option></term>
<listitem>
<indexterm>
<primary>degraded_monitoring_timeout</primary>
</indexterm>
<term><option>degraded_monitoring_timeout</option></term>
<listitem>
<para>
Interval (in seconds) after which <application>repmgrd</application> will terminate if
either of the servers (local node and/or upstream node) being monitored is no longer available
(<link linkend="repmgrd-degraded-monitoring">degraded monitoring mode</link>).
</para>
<para>
<literal>-1</literal> (default) disables this timeout completely.
</para>
</listitem>
</varlistentry>
<para>
Interval (in seconds) after which &repmgrd; will terminate if
either of the servers (local node and/or upstream node) being monitored is no longer available
(<link linkend="repmgrd-degraded-monitoring">degraded monitoring mode</link>).
</para>
<para>
<literal>-1</literal> (default) disables this timeout completely.
</para>
</listitem>
</varlistentry>
</variablelist>
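As a rough illustration of how <option>reconnect_attempts</option> and <option>reconnect_interval</option> interact, the following Python sketch estimates how long &repmgrd; spends waiting between reconnection attempts before it gives up on the upstream node. It assumes one pause of <option>reconnect_interval</option> seconds between consecutive attempts (so one fewer pause than attempts) and ignores the time each connection attempt itself takes. The function name is hypothetical and not part of &repmgr;.

```python
def min_retry_window_secs(reconnect_attempts: int, reconnect_interval: int) -> int:
    """Estimate the seconds spent sleeping between reconnection attempts.

    Assumption: there is one sleep of reconnect_interval seconds between
    consecutive attempts, i.e. (reconnect_attempts - 1) sleeps in total.
    Time spent inside the connection attempts themselves is not included.
    """
    if reconnect_attempts < 1 or reconnect_interval < 0:
        raise ValueError("attempts must be >= 1 and interval must be >= 0")
    return (reconnect_attempts - 1) * reconnect_interval


# Documented defaults: reconnect_attempts=6, reconnect_interval=10
print(min_retry_window_secs(6, 10))  # 50
```

Under these assumptions the documented defaults give a lower bound of about 50 seconds between the first failed check and the start of a failover; the real figure will be higher by however long the individual connection attempts take.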
@@ -152,7 +155,7 @@
<title>Required configuration for automatic failover</title>
<para>
The following <application>repmgrd</application> options <emphasis>must</emphasis> be set in
The following &repmgrd; options <emphasis>must</emphasis> be set in
<filename>repmgr.conf</filename>:
<itemizedlist spacing="compact" mark="bullet">
@@ -182,17 +185,18 @@
<variablelist>
<varlistentry>
<indexterm>
<primary>failover</primary>
</indexterm>
<term><option>failover</option></term>
<listitem>
<indexterm>
<primary>failover</primary>
</indexterm>
<para>
<option>failover</option> can be one of <literal>automatic</literal> or <literal>manual</literal>.
</para>
<note>
<para>
If <option>failover</option> is set to <literal>manual</literal>, <application>repmgrd</application>
If <option>failover</option> is set to <literal>manual</literal>, &repmgrd;
will not take any action if a failover situation is detected, and the node may need to
be modified manually (e.g. by executing <command><link linkend="repmgr-standby-follow">repmgr standby follow</link></command>).
</para>
@@ -202,14 +206,16 @@
</varlistentry>
<varlistentry>
<indexterm>
<primary>promote_command</primary>
</indexterm>
<term><option>promote_command</option></term>
<listitem>
<indexterm>
<primary>promote_command</primary>
</indexterm>
<para>
The program or script defined in <option>promote_command</option> will be executed
in a failover situation when <application>repmgrd</application> determines that
in a failover situation when &repmgrd; determines that
the current node is to become the new primary node.
</para>
<para>
@@ -217,7 +223,7 @@
<command><link linkend="repmgr-standby-promote">repmgr standby promote</link></command> command.
</para>
<para>
It is also possible to provide e.g. a shell script to e.g. perform user-defined tasks
It is also possible to provide a shell script to e.g. perform user-defined tasks
before promoting the current node. In this case the script <emphasis>must</emphasis>
at some point execute <command><link linkend="repmgr-standby-promote">repmgr standby promote</link></command>
to promote the node; if this is not done, &repmgr; metadata will not be updated and
@@ -231,8 +237,8 @@
<para>
Note that the <literal>--log-to-file</literal> option will cause
output generated by the &repmgr; command, when executed by <application>repmgrd</application>,
to be logged to the same destination configured to receive log output for <application>repmgrd</application>.
output generated by the &repmgr; command, when executed by &repmgrd;,
to be logged to the same destination configured to receive log output for &repmgrd;.
</para>
<note>
<para>
@@ -245,32 +251,33 @@
</varlistentry>
<varlistentry>
<indexterm>
<primary>follow_command</primary>
</indexterm>
<term><option>follow_command</option></term>
<listitem>
<indexterm>
<primary>follow_command</primary>
</indexterm>
<para>
The program or script defined in <option>follow_command</option> will be executed
in a failover situation when <application>repmgrd</application> determines that
in a failover situation when &repmgrd; determines that
the current node is to follow the new primary node.
</para>
<para>
Normally <option>follow_command</option> is set as &repmgr;'s
<command><link linkend="repmgr-standby-follow">repmgr standby promote</link></command> command.
<command><link linkend="repmgr-standby-follow">repmgr standby follow</link></command> command.
</para>
<para>
The <option>follow_command</option> parameter
should provide the <literal>--upstream-node-id=%n</literal>
option to <command>repmgr standby follow</command>; the <literal>%n</literal> will be replaced by
<application>repmgrd</application> with the ID of the new primary node. If this is not provided,
&repmgrd; with the ID of the new primary node. If this is not provided,
<command>repmgr standby follow</command> will attempt to determine the new primary by itself, but if the
original primary comes back online after the new primary is promoted, there is a risk that
<command>repmgr standby follow</command> will result in the node continuing to follow
the original primary.
</para>
<para>
It is also possible to provide e.g. a shell script to e.g. perform user-defined tasks
It is also possible to provide a shell script to e.g. perform user-defined tasks
before following the new primary. In this case the script <emphasis>must</emphasis>
at some point execute <command><link linkend="repmgr-standby-follow">repmgr standby follow</link></command>
to have the node follow the new primary; if this is not done, &repmgr; metadata will not be updated and
@@ -284,8 +291,8 @@
<para>
Note that the <literal>--log-to-file</literal> option will cause
output generated by the &repmgr; command, when executed by <application>repmgrd</application>,
to be logged to the same destination configured to receive log output for <application>repmgrd</application>.
output generated by the &repmgr; command, when executed by &repmgrd;,
to be logged to the same destination configured to receive log output for &repmgrd;.
</para>
<note>
<para>
@@ -312,11 +319,12 @@
<variablelist>
<varlistentry>
<indexterm>
<primary>priority</primary>
</indexterm>
<term><option>priority</option></term>
<listitem>
<indexterm>
<primary>priority</primary>
</indexterm>
<para>
Indicates a preferred priority (default: <literal>100</literal>) for promoting nodes;
a value of zero prevents the node being promoted to primary.
@@ -330,14 +338,15 @@
</varlistentry>
<varlistentry>
<indexterm>
<primary>failover_validation_command</primary>
</indexterm>
<term><option>failover_validation_command</option></term>
<listitem>
<indexterm>
<primary>failover_validation_command</primary>
</indexterm>
<para>
User-defined script to execute for an external mechanism to validate the failover
decision made by <application>repmgrd</application>.
decision made by &repmgrd;.
</para>
<note>
<para>
@@ -366,12 +375,13 @@
<varlistentry>
<indexterm>
<primary>primary_visibility_consensus</primary>
</indexterm>
<term><option>primary_visibility_consensus</option></term>
<listitem>
<indexterm>
<primary>primary_visibility_consensus</primary>
</indexterm>
<para>
If <literal>true</literal>, only continue with failover if no standbys have seen
the primary node recently.
@@ -387,11 +397,12 @@
<varlistentry>
<indexterm>
<primary>standby_disconnect_on_failover</primary>
</indexterm>
<term><option>standby_disconnect_on_failover</option></term>
<listitem>
<indexterm>
<primary>standby_disconnect_on_failover</primary>
</indexterm>
<para>
In a failover situation, disconnect the local node's WAL receiver.
</para>
@@ -408,7 +419,7 @@
for this option.
</para>
<para>
<application>repmgrd</application> will refuse to start if this option is set
&repmgrd; will refuse to start if this option is set
but either of these prerequisites is not met.
</para>
</note>
@@ -429,11 +440,12 @@
<variablelist>
<varlistentry>
<indexterm>
<primary>election_rerun_interval</primary>
</indexterm>
<term><option>election_rerun_interval</option></term>
<listitem>
<indexterm>
<primary>election_rerun_interval</primary>
</indexterm>
<para>
If <option>failover_validation_command</option> is set, and the command returns
an error, pause for the specified number of seconds (default: 15) before rerunning the election.
@@ -443,11 +455,12 @@
<varlistentry>
<indexterm>
<primary>sibling_nodes_disconnect_timeout</primary>
</indexterm>
<term><option>sibling_nodes_disconnect_timeout</option></term>
<listitem>
<indexterm>
<primary>sibling_nodes_disconnect_timeout</primary>
</indexterm>
<para>
If <option>standby_disconnect_on_failover</option> is <literal>true</literal>, the
maximum length of time (in seconds, default: <literal>30</literal>)
@@ -463,34 +476,36 @@
</sect2>
<sect2 id="postgresql-service-configuration">
<title>PostgreSQL service configuration</title>
<indexterm>
<primary>repmgrd</primary>
<secondary>PostgreSQL service configuration</secondary>
</indexterm>
<title>PostgreSQL service configuration</title>
<para>
If using automatic failover, currently <application>repmgrd</application> will need to execute
If using automatic failover, currently &repmgrd; will need to execute
<link linkend="repmgr-standby-follow"><command>repmgr standby follow</command></link>
to restart PostgreSQL on standbys to have them follow a new primary.
</para>
<para>
To ensure this happens smoothly, it's essential to provide the system/service restart
command appropriate to your operating system via <varname>service_restart_command</varname>
in <filename>repmgr.conf</filename>. If you don't do this, <application>repmgrd</application>
in <filename>repmgr.conf</filename>. If you don't do this, &repmgrd;
will default to using <command>pg_ctl</command>, which can result in unexpected problems,
particularly on <application>systemd</application>-based systems.
</para>
<para>
For more details, see <xref linkend="configuration-file-service-commands">.
For more details, see <xref linkend="configuration-file-service-commands"/>.
</para>
</sect2>
<sect2 id="repmgrd-service-configuration">
<title>repmgrd service configuration</title>
<indexterm>
<primary>repmgrd</primary>
<secondary>repmgrd service configuration</secondary>
</indexterm>
<title>repmgrd service configuration</title>
<para>
If you are intending to use the <link linkend="repmgr-daemon-start"><command>repmgr daemon start</command></link>
and <link linkend="repmgr-daemon-stop"><command>repmgr daemon stop</command></link> commands, the following
@@ -522,11 +537,12 @@ repmgrd_service_stop_command='sudo systemctl repmgr11 stop'
<sect2 id="repmgrd-monitoring-configuration" xreflabel="repmgrd monitoring configuration">
<title>Monitoring configuration</title>
<indexterm>
<primary>repmgrd</primary>
<secondary>monitoring configuration</secondary>
</indexterm>
<title>Monitoring configuration</title>
<para>
To enable monitoring, set:
<programlisting>
@@ -538,32 +554,33 @@ repmgrd_service_stop_command='sudo systemctl repmgr11 stop'
the option <option>monitor_interval_secs</option> (see above).
</para>
<para>
For more details on monitoring, see <xref linkend="repmgrd-monitoring">.
For more details on monitoring, see <xref linkend="repmgrd-monitoring"/>.
</para>
</sect2>
<sect2 id="repmgrd-reloading-configuration"xreflabel="reloading repmgrd configuration">
<sect2 id="repmgrd-reloading-configuration" xreflabel="reloading repmgrd configuration">
<title>Applying configuration changes to repmgrd</title>
<indexterm>
<primary>repmgrd</primary>
<secondary>applying configuration changes</secondary>
</indexterm>
<title>Applying configuration changes to repmgrd</title>
<para>
To apply configuration file changes to a running <application>repmgrd</application>
daemon, execute the operating system's <application>repmgrd</application> service reload command
(see <xref linkend="appendix-packages"> for examples),
To apply configuration file changes to a running &repmgrd;
daemon, execute the operating system's &repmgrd; service reload command
(see <xref linkend="appendix-packages"/> for examples),
or for instances which were manually started, execute <command>kill -HUP</command>, e.g.
<command>kill -HUP `cat /tmp/repmgrd.pid`</command>.
</para>
<tip>
<para>
Check the <application>repmgrd</application> log to see what changes were
Check the &repmgrd; log to see what changes were
applied, or if any issues were encountered when reloading the configuration.
</para>
</tip>
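For a manually started instance, the <command>kill -HUP</command> step can also be scripted. The following Python sketch reads the PID from a PID file and sends <literal>SIGHUP</literal> to that process. The helper names are hypothetical, and the default path simply mirrors the <filename>/tmp/repmgrd.pid</filename> example above; adjust it to wherever the PID file was actually written.

```python
import os
import signal


def read_pid(pid_file: str) -> int:
    """Return the PID stored in pid_file, which must contain a single integer."""
    with open(pid_file, encoding="ascii") as fh:
        pid = int(fh.read().strip())  # raises ValueError on empty or garbled content
    if pid <= 0:
        raise ValueError(f"invalid PID {pid} in {pid_file}")
    return pid


def reload_repmgrd(pid_file: str = "/tmp/repmgrd.pid") -> int:
    """Send SIGHUP to the process named in pid_file and return its PID."""
    pid = read_pid(pid_file)
    os.kill(pid, signal.SIGHUP)  # equivalent to: kill -HUP <pid>
    return pid
```

The script must run as a user permitted to signal the daemon (typically the user it runs as); <function>os.kill</function> raises <literal>ProcessLookupError</literal> if the PID file is stale.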
<para>
Note that only the following subset of configuration file parameters can be changed on a
running <application>repmgrd</application> daemon:
running &repmgrd; daemon:
</para>
<itemizedlist spacing="compact" mark="bullet">
@@ -585,6 +602,41 @@ repmgrd_service_stop_command='sudo systemctl repmgr11 stop'
</simpara>
</listitem>
<listitem>
<simpara>
<varname>child_nodes_check_interval</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>child_nodes_connected_include_witness</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>child_nodes_connected_min_count</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>child_nodes_disconnect_command</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>child_nodes_disconnect_min_count</varname>
</simpara>
</listitem>
<listitem>
<simpara>
<varname>child_nodes_disconnect_timeout</varname>
</simpara>
</listitem>
<listitem>
<simpara>
@@ -770,7 +822,7 @@ repmgrd_service_stop_command='sudo systemctl repmgr11 stop'
<note>
<para>
After executing <command><link linkend="repmgr-standby-register">repmgr standby register --force</link></command>,
<application>repmgrd</application> <emphasis>must</emphasis> be restarted for the changes to take effect.
&repmgrd; <emphasis>must</emphasis> be restarted for the changes to take effect.
</para>
</note>
@@ -779,24 +831,25 @@ repmgrd_service_stop_command='sudo systemctl repmgr11 stop'
</sect1>
<sect1 id="repmgrd-daemon" xreflabel="repmgrd daemon">
<title>repmgrd daemon</title>
<indexterm>
<primary>repmgrd</primary>
<secondary>starting and stopping</secondary>
</indexterm>
<title>repmgrd daemon</title>
<para>
If installed from a package, <application>repmgrd</application> can be started
If installed from a package, &repmgrd; can be started
via the operating system's service command, e.g. in <application>systemd</application>
using <command>systemctl</command>.
</para>
<para>
See appendix <xref linkend="appendix-packages"> for details of service commands
See appendix <xref linkend="appendix-packages"/> for details of service commands
for different distributions.
</para>
<para>
The commands <link linkend="repmgr-daemon-start"><command>repmgr daemon start</command></link> and
<link linkend="repmgr-daemon-stop"><command>repmgr daemon stop</command></link> can be used
as convenience wrappers to start and stop <application>repmgrd</application>.
as convenience wrappers to start and stop &repmgrd;.
</para>
<important>
<para>
@@ -808,13 +861,15 @@ repmgrd_service_stop_command='sudo systemctl repmgr11 stop'
</para>
</important>
<para>
<application>repmgrd</application> can be started manually like this:
&repmgrd; can be started manually like this:
<programlisting>
repmgrd -f /etc/repmgr.conf --pid-file /tmp/repmgrd.pid</programlisting>
and stopped with <command>kill `cat /tmp/repmgrd.pid`</command>. Adjust paths as appropriate.
</para>
<sect2 id="repmgrd-pid-file" xreflabel="repmgrd's PID file">
<title>repmgrd's PID file</title>
<indexterm>
<primary>repmgrd</primary>
<secondary>PID file</secondary>
@@ -823,9 +878,8 @@ repmgrd_service_stop_command='sudo systemctl repmgr11 stop'
<primary>PID file</primary>
<secondary>repmgrd</secondary>
</indexterm>
<title>repmgrd's PID file</title>
<para>
<application>repmgrd</application> will generate a PID file by default.
&repmgrd; will generate a PID file by default.
</para>
<note>
<simpara>
@@ -845,12 +899,12 @@ repmgrd_service_stop_command='sudo systemctl repmgr11 stop'
<option>--pid-file</option> may be deprecated in future releases.
</para>
<para>
If a PID file location was specified by the package maintainer, <application>repmgrd</application>
If a PID file location was specified by the package maintainer, &repmgrd;
will use that. This only applies if &repmgr; was installed from a package and the package
maintainer has specified the PID file location.
</para>
<para>
If none of the above apply, <application>repmgrd</application> will create a PID file
If none of the above apply, &repmgrd; will create a PID file
in the operating system's temporary directory (as determined by the environment variable
<varname>TMPDIR</varname>, or if that is not set, will use <filename>/tmp</filename>).
</para>
@@ -859,15 +913,17 @@ repmgrd_service_stop_command='sudo systemctl repmgr11 stop'
<option>--no-pid-file</option>.
</para>
<para>
To see which PID file <application>repmgrd</application> would use, execute <application>repmgrd</application>
with the option <option>--show-pid-file</option>. <application>repmgrd</application>
To see which PID file &repmgrd; would use, execute &repmgrd;
with the option <option>--show-pid-file</option>. &repmgrd;
will not start if this option is provided. Note that the value shown is the
file <application>repmgrd</application> would use next time it starts, and is
file &repmgrd; would use next time it starts, and is
not necessarily the PID file currently in use.
</para>
</sect2>
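The last-resort PID file location described above (<varname>TMPDIR</varname>, then <filename>/tmp</filename>) can be sketched as follows. This is illustrative Python, not &repmgrd;'s actual implementation: the function name and the file name <filename>repmgrd.pid</filename> are assumptions, and an empty <varname>TMPDIR</varname> is treated as unset. Use <option>--show-pid-file</option> to see the location &repmgrd; would really use.

```python
import os


def fallback_pid_file(environ=None, filename="repmgrd.pid"):
    """Return the PID file path used when no other location is configured.

    Assumptions: the file is called "repmgrd.pid" and an empty TMPDIR
    counts as unset. Pass a dict as `environ` to test without touching
    the real environment.
    """
    env = os.environ if environ is None else environ
    tmp_dir = env.get("TMPDIR") or "/tmp"
    return os.path.join(tmp_dir, filename)


print(fallback_pid_file({"TMPDIR": "/var/tmp"}))
print(fallback_pid_file({}))
```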
<sect2 id="repmgrd-configuration-debian-ubuntu">
<title>repmgrd daemon configuration on Debian/Ubuntu</title>
<indexterm>
<primary>repmgrd</primary>
<secondary>Debian/Ubuntu and daemon configuration</secondary>
@@ -877,11 +933,9 @@ repmgrd_service_stop_command='sudo systemctl repmgr11 stop'
<secondary>repmgrd daemon configuration</secondary>
</indexterm>
<title>repmgrd daemon configuration on Debian/Ubuntu</title>
<para>
If &repmgr; was installed from Debian/Ubuntu packages, additional configuration
is required before <application>repmgrd</application> is started as a daemon.
is required before &repmgrd; is started as a daemon.
</para>
<para>
This is done via the file <filename>/etc/default/repmgrd</filename>, which by default
@@ -915,17 +969,17 @@ REPMGRD_OPTS="--daemonize=false"
</para>
<tip>
<para>
See <xref linkend="packages-debian-ubuntu"> for details of the Debian/Ubuntu packages and
See <xref linkend="packages-debian-ubuntu"/> for details of the Debian/Ubuntu packages and
typical file locations (including <filename>repmgr.conf</filename>).
</para>
</tip>
<para>
From <application>repmgrd</application> 4.1, ensure <varname>REPMGRD_OPTS</varname> includes
From &repmgrd; 4.1, ensure <varname>REPMGRD_OPTS</varname> includes
<option>--daemonize=false</option>, as daemonization is handled by the service command.
</para>
<para>
If using <application>systemd</application>, you may need to execute <command>systemctl daemon-reload</command>.
Also, if you attempted to start <application>repmgrd</application> using <command>systemctl start repmgrd</command>,
Also, if you attempted to start &repmgrd; using <command>systemctl start repmgrd</command>,
you'll need to execute <command>systemctl stop repmgrd</command>. Because that's how <application>systemd</application>
rolls.
</para>
@@ -959,7 +1013,9 @@ REPMGRD_OPTS="--daemonize=false"
<sect1 id="repmgrd-log-rotation">
<sect1 id="repmgrd-log-rotation">
<title>repmgrd log rotation</title>
<indexterm>
<primary>log rotation</primary>
<secondary>repmgrd</secondary>
@@ -970,9 +1026,8 @@ REPMGRD_OPTS="--daemonize=false"
<secondary>log rotation</secondary>
</indexterm>
<title>repmgrd log rotation</title>
<para>
To ensure the current <application>repmgrd</application> logfile
To ensure the current &repmgrd; logfile
(specified in <filename>repmgr.conf</filename> with the parameter
<option>log_file</option>) does not grow indefinitely, configure your
system's <command>logrotate</command> to regularly rotate it.


@@ -1,14 +1,15 @@
<chapter id="repmgrd-operation" xreflabel="repmgrd operation">
<title>repmgrd operation</title>
<indexterm>
<primary>repmgrd</primary>
<secondary>operation</secondary>
</indexterm>
<title>repmgrd operation</title>
<sect1 id="repmgrd-pausing">
<title>Pausing repmgrd</title>
<indexterm>
<primary>repmgrd</primary>
<secondary>pausing</secondary>
@@ -18,45 +19,44 @@
<primary>pausing repmgrd</primary>
</indexterm>
<title>Pausing repmgrd</title>
<para>
In normal operation, <application>repmgrd</application> monitors the state of the
In normal operation, &repmgrd; monitors the state of the
PostgreSQL node it is running on, and will take appropriate action if problems
are detected, e.g. (if so configured) promote the node to primary, if the existing
primary has been determined as failed.
</para>
<para>
However, <application>repmgrd</application> is unable to distinguish between
However, &repmgrd; is unable to distinguish between
planned outages (such as performing a <link linkend="performing-switchover">switchover</link>
or installing PostgreSQL maintenance releases), and an actual server outage. In versions prior to
&repmgr; 4.2 it was necessary to stop <application>repmgrd</application> on all nodes (or at least
on all nodes where <application>repmgrd</application> is
&repmgr; 4.2 it was necessary to stop &repmgrd; on all nodes (or at least
on all nodes where &repmgrd; is
<link linkend="repmgrd-automatic-failover">configured for automatic failover</link>)
to prevent <application>repmgrd</application> from making unintentional changes to the
to prevent &repmgrd; from making unintentional changes to the
replication cluster.
</para>
<para>
From <link linkend="release-4.2">&repmgr; 4.2</link>, <application>repmgrd</application>
From <link linkend="release-4.2">&repmgr; 4.2</link>, &repmgrd;
can now be &quot;paused&quot;, i.e. instructed not to take any action such as performing a failover.
This can be done from any node in the cluster, removing the need to stop/restart
each <application>repmgrd</application> individually.
each &repmgrd; individually.
</para>
<note>
<para>
For major PostgreSQL upgrades, e.g. from PostgreSQL 10 to PostgreSQL 11,
<application>repmgrd</application> should be shut down completely and only started up
&repmgrd; should be shut down completely and only started up
once the &repmgr; packages for the new PostgreSQL major version have been installed.
</para>
</note>
<sect2 id="repmgrd-pausing-prerequisites">
<title>Prerequisites for pausing <application>repmgrd</application></title>
<title>Prerequisites for pausing &repmgrd;</title>
<para>
In order to be able to pause/unpause <application>repmgrd</application>, the following
In order to be able to pause/unpause &repmgrd;, the following
prerequisites must be met:
<itemizedlist spacing="compact" mark="bullet">
@@ -86,9 +86,9 @@
</sect2>
<sect2 id="repmgrd-pausing-execution">
<title>Pausing/unpausing <application>repmgrd</application></title>
<title>Pausing/unpausing &repmgrd;</title>
<para>
To pause <application>repmgrd</application>, execute <link linkend="repmgr-daemon-pause"><command>repmgr daemon pause</command></link>, e.g.:
To pause &repmgrd;, execute <link linkend="repmgr-daemon-pause"><command>repmgr daemon pause</command></link>, e.g.:
<programlisting>
$ repmgr -f /etc/repmgr.conf daemon pause
NOTICE: node 1 (node1) paused
@@ -96,7 +96,7 @@ NOTICE: node 2 (node2) paused
NOTICE: node 3 (node3) paused</programlisting>
</para>
<para>
The state of <application>repmgrd</application> on each node can be checked with
The state of &repmgrd; on each node can be checked with
<link linkend="repmgr-daemon-status"><command>repmgr daemon status</command></link>, e.g.:
<programlisting>$ repmgr -f /etc/repmgr.conf daemon status
ID | Name | Role | Status | repmgrd | PID | Paused?
@@ -109,15 +109,15 @@ NOTICE: node 3 (node3) paused</programlisting>
<note>
<para>
If executing a switchover with <link linkend="repmgr-standby-switchover"><command>repmgr standby switchover</command></link>,
&repmgr; will automatically pause/unpause <application>repmgrd</application> as part of the switchover process.
&repmgr; will automatically pause/unpause &repmgrd; as part of the switchover process.
</para>
</note>
<para>
If the primary (in this example, <literal>node1</literal>) is stopped, <application>repmgrd</application>
If the primary (in this example, <literal>node1</literal>) is stopped, &repmgrd;
running on one of the standbys (here: <literal>node2</literal>) will react like this:
<programlisting>
[2018-09-20 12:22:21] [WARNING] unable to connect to upstream node "node1" (node ID: 1)
[2018-09-20 12:22:21] [WARNING] unable to connect to upstream node "node1" (ID: 1)
[2018-09-20 12:22:21] [INFO] checking state of node 1, 1 of 5 attempts
[2018-09-20 12:22:21] [INFO] sleeping 1 seconds until next reconnection attempt
...
@@ -125,19 +125,19 @@ NOTICE: node 3 (node3) paused</programlisting>
[2018-09-20 12:22:25] [INFO] checking state of node 1, 5 of 5 attempts
[2018-09-20 12:22:25] [WARNING] unable to reconnect to node 1 after 5 attempts
[2018-09-20 12:22:25] [NOTICE] node is paused
[2018-09-20 12:22:33] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in degraded state
[2018-09-20 12:22:33] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in degraded state
[2018-09-20 12:22:33] [DETAIL] repmgrd paused by administrator
[2018-09-20 12:22:33] [HINT] execute "repmgr daemon unpause" to resume normal failover mode</programlisting>
</para>
<para>
If the primary becomes available again (e.g. following a software upgrade), <application>repmgrd</application>
If the primary becomes available again (e.g. following a software upgrade), &repmgrd;
will automatically reconnect, e.g.:
<programlisting>
[2018-09-20 13:12:41] [NOTICE] reconnected to upstream node 1 after 8 seconds, resuming monitoring</programlisting>
</para>
<para>
To unpause <application>repmgrd</application>, execute <link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link>, e.g.:
To unpause &repmgrd;, execute <link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link>, e.g.:
<programlisting>
$ repmgr -f /etc/repmgr.conf daemon unpause
NOTICE: node 1 (node1) unpaused
@@ -147,7 +147,7 @@ NOTICE: node 3 (node3) unpaused</programlisting>
<note>
<para>
If the previous primary is no longer accessible when <application>repmgrd</application>
If the previous primary is no longer accessible when &repmgrd;
is unpaused, no failover action will be taken. Instead, a new primary must be manually promoted using
<link linkend="repmgr-standby-promote"><command>repmgr standby promote</command></link>,
and any standbys attached to the new primary with
@@ -156,13 +156,13 @@ NOTICE: node 3 (node3) unpaused</programlisting>
<para>
This is to prevent <link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link>
resulting in the automatic promotion of a new primary, which may be a problem particularly
in larger clusters, where <application>repmgrd</application> could select a different promotion
in larger clusters, where &repmgrd; could select a different promotion
candidate to the one intended by the administrator.
</para>
</note>
</sect2>
<sect2 id="repmgrd-pausing-details">
<title>Details on the <application>repmgrd</application> pausing mechanism</title>
<title>Details on the &repmgrd; pausing mechanism</title>
<para>
The pause state of each node will be preserved across a PostgreSQL restart.
@@ -171,30 +171,31 @@ NOTICE: node 3 (node3) unpaused</programlisting>
<para>
<link linkend="repmgr-daemon-pause"><command>repmgr daemon pause</command></link> and
<link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link> can be
executed even if <application>repmgrd</application> is not running; in this case,
<application>repmgrd</application> will start up in whichever pause state has been set.
executed even if &repmgrd; is not running; in this case,
&repmgrd; will start up in whichever pause state has been set.
</para>
<note>
<para>
<link linkend="repmgr-daemon-pause"><command>repmgr daemon pause</command></link> and
<link linkend="repmgr-daemon-unpause"><command>repmgr daemon unpause</command></link>
<emphasis>do not</emphasis> stop/start <application>repmgrd</application>.
<emphasis>do not</emphasis> stop/start &repmgrd;.
</para>
</note>
</sect2>
</sect1>
<sect1 id="repmgrd-wal-replay-pause">
<title>repmgrd and paused WAL replay</title>
<indexterm>
<primary>repmgrd</primary>
<secondary>paused WAL replay</secondary>
</indexterm>
<title>repmgrd and paused WAL replay</title>
<para>
If WAL replay has been paused (using <command>pg_wal_replay_pause()</command> or,
on PostgreSQL 9.6 and earlier, <command>pg_xlog_replay_pause()</command>),
in a failover situation <application>repmgrd</application> will
in a failover situation &repmgrd; will
automatically resume WAL replay.
</para>
<para>
@@ -214,6 +215,8 @@ NOTICE: node 3 (node3) unpaused</programlisting>
</sect1>
<sect1 id="repmgrd-degraded-monitoring" xreflabel="repmgrd degraded monitoring">
<title>"degraded monitoring" mode</title>
<indexterm>
<primary>repmgrd</primary>
<secondary>degraded monitoring</secondary>
@@ -223,11 +226,10 @@ NOTICE: node 3 (node3) unpaused</programlisting>
<primary>degraded monitoring</primary>
</indexterm>
<title>"degraded monitoring" mode</title>
<para>
In certain circumstances, <application>repmgrd</application> is not able to fulfill its primary mission
In certain circumstances, &repmgrd; is not able to fulfill its primary mission
of monitoring the node's upstream server. In these cases it enters &quot;degraded monitoring&quot;
mode, where <application>repmgrd</application> remains active but is waiting for the situation
mode, where &repmgrd; remains active but is waiting for the situation
to be resolved.
</para>
<para>
@@ -268,8 +270,8 @@ NOTICE: node 3 (node3) unpaused</programlisting>
Example output in a situation where there is only one standby with <literal>failover=manual</literal>,
and the primary node is unavailable (but is later restarted):
<programlisting>
[2017-08-29 10:59:19] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state (automatic failover disabled)
[2017-08-29 10:59:33] [WARNING] unable to connect to upstream node "node1" (node ID: 1)
[2017-08-29 10:59:19] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in normal state (automatic failover disabled)
[2017-08-29 10:59:33] [WARNING] unable to connect to upstream node "node1" (ID: 1)
[2017-08-29 10:59:33] [INFO] checking state of node 1, 1 of 5 attempts
[2017-08-29 10:59:33] [INFO] sleeping 1 seconds until next reconnection attempt
(...)
@@ -278,21 +280,21 @@ NOTICE: node 3 (node3) unpaused</programlisting>
[2017-08-29 10:59:37] [NOTICE] this node is not configured for automatic failover so will not be considered as promotion candidate
[2017-08-29 10:59:37] [NOTICE] no other nodes are available as promotion candidate
[2017-08-29 10:59:37] [HINT] use "repmgr standby promote" to manually promote this node
[2017-08-29 10:59:37] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in degraded state (automatic failover disabled)
[2017-08-29 10:59:53] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in degraded state (automatic failover disabled)
[2017-08-29 10:59:37] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in degraded state (automatic failover disabled)
[2017-08-29 10:59:53] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in degraded state (automatic failover disabled)
[2017-08-29 11:00:45] [NOTICE] reconnected to upstream node 1 after 68 seconds, resuming monitoring
[2017-08-29 11:00:57] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state (automatic failover disabled)</programlisting>
[2017-08-29 11:00:57] [INFO] node "node2" (ID: 2) monitoring upstream node "node1" (ID: 1) in normal state (automatic failover disabled)</programlisting>
</para>
<para>
By default, <literal>repmgrd</literal> will continue in degraded monitoring mode indefinitely.
However, a timeout (in seconds) can be set with <varname>degraded_monitoring_timeout</varname>,
after which <application>repmgrd</application> will terminate.
after which &repmgrd; will terminate.
</para>
<note>
<para>
If <application>repmgrd</application> is monitoring a primary node which has been stopped
If &repmgrd; is monitoring a primary node which has been stopped
and manually restarted as a standby attached to a new primary, it will automatically detect
the status change and update the node record to reflect the node's new status
as an active standby. It will then resume monitoring the node as a standby.
@@ -302,6 +304,7 @@ NOTICE: node 3 (node3) unpaused</programlisting>
<sect1 id="repmgrd-monitoring" xreflabel="Storing monitoring data">
<title>Storing monitoring data</title>
<indexterm>
<primary>repmgrd</primary>
<secondary>monitoring</secondary>
@@ -311,9 +314,8 @@ NOTICE: node 3 (node3) unpaused</programlisting>
<secondary>with repmgrd</secondary>
</indexterm>
<title>Storing monitoring data</title>
<para>
When <application>repmgrd</application> is running with the option <literal>monitoring_history=true</literal>,
When &repmgrd; is running with the option <literal>monitoring_history=true</literal>,
it will constantly write standby node status information to the
<varname>monitoring_history</varname> table, providing a near-real time
overview of replication status on all nodes
@@ -346,12 +348,12 @@ NOTICE: node 3 (node3) unpaused</programlisting>
<para>
As this can generate a large amount of monitoring data in the table
<literal>repmgr.monitoring_history</literal>, it's advisable to regularly
purge historical data using the <xref linkend="repmgr-cluster-cleanup">
purge historical data using the <xref linkend="repmgr-cluster-cleanup"/>
command; use the <literal>-k/--keep-history</literal> option to
specify how many days' worth of data should be retained.
</para>
<para>
It's possible to run <application>repmgrd</application> in monitoring
It's possible to run &repmgrd; in monitoring
mode only (without automatic failover capability) for some or all
nodes by setting <literal>failover=manual</literal> in the node's
<filename>repmgr.conf</filename> file. In the event of the node's upstream failing,


@@ -1,122 +0,0 @@
<chapter id="repmgrd-overview" xreflabel="repmgrd overview">
<indexterm>
<primary>repmgrd</primary>
<secondary>overview</secondary>
</indexterm>
<title>repmgrd overview</title>
<para>
<application>repmgrd</application> (&quot;<literal>replication manager daemon</literal>&quot;)
is a management and monitoring daemon which runs
on each node in a replication cluster. It can automate actions such as
failover and updating standbys to follow the new primary, as well as
providing monitoring information about the state of each standby.
</para>
<sect1 id="repmgrd-demonstration">
<title>repmgrd demonstration</title>
<para>
To demonstrate automatic failover, set up a 3-node replication cluster (one primary
and two standbys streaming directly from the primary) so that the cluster looks
something like this:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Connection string
----+-------+---------+-----------+----------+----------+--------------------------------------
1 | node1 | primary | * running | | default | host=node1 dbname=repmgr user=repmgr
2 | node2 | standby | running | node1 | default | host=node2 dbname=repmgr user=repmgr
3 | node3 | standby | running | node1 | default | host=node3 dbname=repmgr user=repmgr</programlisting>
</para>
<tip>
<para>
See section <link linkend="repmgrd-automatic-failover-configuration">Required configuration for automatic failover</link>
for an example of minimal <filename>repmgr.conf</filename> file settings suitable for use with <application>repmgrd</application>.
</para>
</tip>
<para>
Start <application>repmgrd</application> on each standby and verify that it's running by examining the
log output, which at log level <literal>INFO</literal> will look like this:
<programlisting>
[2017-08-24 17:31:00] [NOTICE] using configuration file "/etc/repmgr.conf"
[2017-08-24 17:31:00] [INFO] connecting to database "host=node2 dbname=repmgr user=repmgr"
[2017-08-24 17:31:00] [NOTICE] starting monitoring of node <literal>node2</literal> (ID: 2)
[2017-08-24 17:31:00] [INFO] monitoring connection to upstream node "node1" (node ID: 1)</programlisting>
</para>
<para>
Each <application>repmgrd</application> should also have recorded its successful startup as an event:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster event --event=repmgrd_start
Node ID | Name | Event | OK | Timestamp | Details
---------+-------+---------------+----+---------------------+-------------------------------------------------------------
3 | node3 | repmgrd_start | t | 2017-08-24 17:35:54 | monitoring connection to upstream node "node1" (node ID: 1)
2 | node2 | repmgrd_start | t | 2017-08-24 17:35:50 | monitoring connection to upstream node "node1" (node ID: 1)
1 | node1 | repmgrd_start | t | 2017-08-24 17:35:46 | monitoring cluster primary "node1" (node ID: 1) </programlisting>
</para>
<para>
Now stop the current primary server with e.g.:
<programlisting>
pg_ctl -D /var/lib/postgresql/data -m immediate stop</programlisting>
</para>
<para>
This will force the primary to shut down straight away, aborting all processes
and transactions. This will cause a flurry of activity in the <application>repmgrd</application> log
files as each <application>repmgrd</application> detects the failure of the primary and a failover
decision is made. This is an extract from the log of a standby server (<literal>node2</literal>)
which has been promoted to new primary after failure of the original primary (<literal>node1</literal>).
<programlisting>
[2017-08-24 23:32:01] [INFO] node "node2" (node ID: 2) monitoring upstream node "node1" (node ID: 1) in normal state
[2017-08-24 23:32:08] [WARNING] unable to connect to upstream node "node1" (node ID: 1)
[2017-08-24 23:32:08] [INFO] checking state of node 1, 1 of 5 attempts
[2017-08-24 23:32:08] [INFO] sleeping 1 seconds until next reconnection attempt
[2017-08-24 23:32:09] [INFO] checking state of node 1, 2 of 5 attempts
[2017-08-24 23:32:09] [INFO] sleeping 1 seconds until next reconnection attempt
[2017-08-24 23:32:10] [INFO] checking state of node 1, 3 of 5 attempts
[2017-08-24 23:32:10] [INFO] sleeping 1 seconds until next reconnection attempt
[2017-08-24 23:32:11] [INFO] checking state of node 1, 4 of 5 attempts
[2017-08-24 23:32:11] [INFO] sleeping 1 seconds until next reconnection attempt
[2017-08-24 23:32:12] [INFO] checking state of node 1, 5 of 5 attempts
[2017-08-24 23:32:12] [WARNING] unable to reconnect to node 1 after 5 attempts
INFO: setting voting term to 1
INFO: node 2 is candidate
INFO: node 3 has received request from node 2 for electoral term 1 (our term: 0)
[2017-08-24 23:32:12] [NOTICE] this node is the winner, will now promote self and inform other nodes
INFO: connecting to standby database
NOTICE: promoting standby
DETAIL: promoting server using 'pg_ctl -l /var/log/postgres/startup.log -w -D '/var/lib/pgsql/data' promote'
INFO: reconnecting to promoted server
NOTICE: STANDBY PROMOTE successful
DETAIL: node 2 was successfully promoted to primary
INFO: node 3 received notification to follow node 2
[2017-08-24 23:32:13] [INFO] switching to primary monitoring mode</programlisting>
</para>
<para>
The cluster status will now look like this, with the original primary (<literal>node1</literal>)
marked as inactive, and standby <literal>node3</literal> now following the new primary
(<literal>node2</literal>):
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Connection string
----+-------+---------+-----------+----------+----------+----------------------------------------------------
1 | node1 | primary | - failed | | default | host=node1 dbname=repmgr user=repmgr
2 | node2 | primary | * running | | default | host=node2 dbname=repmgr user=repmgr
3 | node3 | standby | running | node2 | default | host=node3 dbname=repmgr user=repmgr</programlisting>
</para>
<para>
<command>repmgr cluster event</command> will display a summary of what happened to each server
during the failover:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster event
Node ID | Name | Event | OK | Timestamp | Details
---------+-------+--------------------------+----+---------------------+-----------------------------------------------------------------------------------
3 | node3 | repmgrd_failover_follow | t | 2017-08-24 23:32:16 | node 3 now following new upstream node 2
3 | node3 | standby_follow | t | 2017-08-24 23:32:16 | node 3 is now attached to node 2
2 | node2 | repmgrd_failover_promote | t | 2017-08-24 23:32:13 | node 2 promoted to primary; old primary 1 marked as failed
2 | node2 | standby_promote | t | 2017-08-24 23:32:13 | node 2 was successfully promoted to primary</programlisting>
</para>
</sect1>
</chapter>

187
doc/repmgrd-overview.xml Normal file

@@ -0,0 +1,187 @@
<chapter id="repmgrd-overview" xreflabel="repmgrd overview">
<title>repmgrd overview</title>
<indexterm>
<primary>repmgrd</primary>
<secondary>overview</secondary>
</indexterm>
<para>
&repmgrd; (&quot;<literal>replication manager daemon</literal>&quot;)
is a management and monitoring daemon which runs
on each node in a replication cluster. It can automate actions such as
failover and updating standbys to follow the new primary, as well as
providing monitoring information about the state of each standby.
</para>
<para>
&repmgrd; is designed to be straightforward to set up
and does not require additional external infrastructure.
</para>
<para>
Functionality provided by &repmgrd; includes:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
wide range of <link linkend="repmgrd-basic-configuration">configuration options</link>
</simpara>
</listitem>
<listitem>
<simpara>
option to execute custom scripts (&quot;<link linkend="event-notifications">event notifications</link>&quot;)
at different points in the failover sequence
</simpara>
</listitem>
<listitem>
<simpara>
ability to <link linkend="repmgrd-pausing">pause repmgrd</link>
operation on all nodes with a
<link linkend="repmgr-daemon-pause"><command>single command</command></link>
</simpara>
</listitem>
<listitem>
<simpara>
optional <link linkend="repmgrd-witness-server">witness server</link>
</simpara>
</listitem>
<listitem>
<simpara>
&quot;location&quot; configuration option to restrict
potential promotion candidates to a single location
(e.g. when nodes are spread over multiple data centres)
</simpara>
</listitem>
<listitem>
<simpara>
<link linkend="connection-check-type">choice of method</link> to determine node availability
(PostgreSQL ping, query execution or new connection)
</simpara>
</listitem>
<listitem>
<simpara>
retention of monitoring statistics (optional)
</simpara>
</listitem>
</itemizedlist>
</para>
<sect1 id="repmgrd-demonstration">
<title>repmgrd demonstration</title>
<para>
To demonstrate automatic failover, set up a 3-node replication cluster (one primary
and two standbys streaming directly from the primary) so that the cluster looks
something like this:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster show --compact
ID | Name | Role | Status | Upstream | Location | Prio.
----+-------+---------+-----------+----------+----------+-------
1 | node1 | primary | * running | | default | 100
2 | node2 | standby | running | node1 | default | 100
3 | node3 | standby | running | node1 | default | 100</programlisting>
</para>
<tip>
<para>
See section <link linkend="repmgrd-automatic-failover-configuration">Required configuration for automatic failover</link>
for an example of minimal <filename>repmgr.conf</filename> file settings suitable for use with &repmgrd;.
</para>
</tip>
<para>
Start &repmgrd; on each standby and verify that it's running by examining the
log output, which at log level <literal>INFO</literal> will look like this:
<programlisting>
[2019-03-15 06:32:05] [NOTICE] repmgrd (repmgrd 4.3) starting up
[2019-03-15 06:32:05] [INFO] connecting to database "host=node2 dbname=repmgr user=repmgr connect_timeout=2"
INFO: set_repmgrd_pid(): provided pidfile is /var/run/repmgr/repmgrd-11.pid
[2019-03-15 06:32:05] [NOTICE] starting monitoring of node "node2" (ID: 2)
[2019-03-15 06:32:05] [INFO] monitoring connection to upstream node "node1" (ID: 1)</programlisting>
</para>
<para>
Each &repmgrd; should also have recorded its successful startup as an event:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster event --event=repmgrd_start
Node ID | Name | Event | OK | Timestamp | Details
---------+-------+---------------+----+---------------------+--------------------------------------------------------
3 | node3 | repmgrd_start | t | 2019-03-14 04:17:30 | monitoring connection to upstream node "node1" (ID: 1)
2 | node2 | repmgrd_start | t | 2019-03-14 04:11:47 | monitoring connection to upstream node "node1" (ID: 1)
1 | node1 | repmgrd_start | t | 2019-03-14 04:04:31 | monitoring cluster primary "node1" (ID: 1)</programlisting>
</para>
<para>
Now stop the current primary server with e.g.:
<programlisting>
pg_ctl -D /var/lib/postgresql/data -m immediate stop</programlisting>
</para>
<para>
This will force the primary to shut down straight away, aborting all processes
and transactions. This will cause a flurry of activity in the &repmgrd; log
files as each &repmgrd; detects the failure of the primary and a failover
decision is made. This is an extract from the log of a standby server (<literal>node2</literal>)
which has been promoted to new primary after failure of the original primary (<literal>node1</literal>).
<programlisting>
[2019-03-15 06:37:50] [WARNING] unable to connect to upstream node "node1" (ID: 1)
[2019-03-15 06:37:50] [INFO] checking state of node 1, 1 of 3 attempts
[2019-03-15 06:37:50] [INFO] sleeping 5 seconds until next reconnection attempt
[2019-03-15 06:37:55] [INFO] checking state of node 1, 2 of 3 attempts
[2019-03-15 06:37:55] [INFO] sleeping 5 seconds until next reconnection attempt
[2019-03-15 06:38:00] [INFO] checking state of node 1, 3 of 3 attempts
[2019-03-15 06:38:00] [WARNING] unable to reconnect to node 1 after 3 attempts
[2019-03-15 06:38:00] [INFO] primary and this node have the same location ("default")
[2019-03-15 06:38:00] [INFO] local node's last receive lsn: 0/900CBF8
[2019-03-15 06:38:00] [INFO] node 3 last saw primary node 12 second(s) ago
[2019-03-15 06:38:00] [INFO] last receive LSN for sibling node "node3" (ID: 3) is: 0/900CBF8
[2019-03-15 06:38:00] [INFO] node "node3" (ID: 3) has same LSN as current candidate "node2" (ID: 2)
[2019-03-15 06:38:00] [INFO] visible nodes: 2; total nodes: 2; no nodes have seen the primary within the last 4 seconds
[2019-03-15 06:38:00] [NOTICE] promotion candidate is "node2" (ID: 2)
[2019-03-15 06:38:00] [NOTICE] this node is the winner, will now promote itself and inform other nodes
[2019-03-15 06:38:00] [INFO] promote_command is:
"/usr/pgsql-11/bin/repmgr -f /etc/repmgr/11/repmgr.conf standby promote"
NOTICE: promoting standby to primary
DETAIL: promoting server "node2" (ID: 2) using "/usr/pgsql-11/bin/pg_ctl -w -D '/var/lib/pgsql/11/data' promote"
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node2" (ID: 2) was successfully promoted to primary
[2019-03-15 06:38:01] [INFO] 3 followers to notify
[2019-03-15 06:38:01] [NOTICE] notifying node "node3" (ID: 3) to follow node 2
INFO: node 3 received notification to follow node 2
[2019-03-15 06:38:01] [INFO] switching to primary monitoring mode
[2019-03-15 06:38:01] [NOTICE] monitoring cluster primary "node2" (ID: 2)</programlisting>
</para>
<para>
The cluster status will now look like this, with the original primary (<literal>node1</literal>)
marked as inactive, and standby <literal>node3</literal> now following the new primary
(<literal>node2</literal>):
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster show --compact
ID | Name | Role | Status | Upstream | Location | Prio.
----+-------+---------+-----------+----------+----------+-------
1 | node1 | primary | - failed | | default | 100
2 | node2 | primary | * running | | default | 100
3 | node3 | standby | running | node2 | default | 100</programlisting>
</para>
<para>
<link linkend="repmgr-cluster-event"><command>repmgr cluster event</command></link> will display a summary of
what happened to each server during the failover:
<programlisting>
$ repmgr -f /etc/repmgr.conf cluster event
Node ID | Name | Event | OK | Timestamp | Details
---------+-------+----------------------------+----+---------------------+-------------------------------------------------------------
3 | node3 | repmgrd_failover_follow | t | 2019-03-15 06:38:03 | node 3 now following new upstream node 2
3 | node3 | standby_follow | t | 2019-03-15 06:38:02 | standby attached to upstream node "node2" (ID: 2)
2 | node2 | repmgrd_reload | t | 2019-03-15 06:38:01 | monitoring cluster primary "node2" (ID: 2)
2 | node2 | repmgrd_failover_promote | t | 2019-03-15 06:38:01 | node 2 promoted to primary; old primary 1 marked as failed
2 | node2 | standby_promote | t | 2019-03-15 06:38:01 | server "node2" (ID: 2) was successfully promoted to primary</programlisting>
</para>
</sect1>
</chapter>
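
Reviewer note: the demonstration above ends by reading the post-failover state off the `cluster show --compact` table by eye. As a rough illustration, that check could be scripted along the following lines. This is a minimal sketch, not part of repmgr: the helper names (`parse_cluster_show`, `running_primaries`, `followers_of`) are hypothetical, and the parser assumes the pipe-separated column layout shown in the sample listing, which may differ between repmgr versions. The sample text is copied from the listing in the chapter.

```python
# Sample copied from the post-failover "cluster show --compact" listing in the
# chapter above. The column layout is an assumption based on that listing.
SAMPLE = """\
 ID | Name  | Role    | Status    | Upstream | Location | Prio.
----+-------+---------+-----------+----------+----------+-------
 1  | node1 | primary | - failed  |          | default  | 100
 2  | node2 | primary | * running |          | default  | 100
 3  | node3 | standby |   running | node2    | default  | 100
"""


def parse_cluster_show(text):
    """Parse a pipe-separated table into a list of dicts keyed by column name.

    Hypothetical helper: it relies only on the "|" separators and on the
    first non-empty line being the header row.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = [cell.strip() for cell in lines[0].split("|")]
    rows = []
    for line in lines[1:]:
        # Skip the "----+----" separator line under the header.
        if set(line.strip()) <= set("-+"):
            continue
        cells = [cell.strip() for cell in line.split("|")]
        # Ignore anything that does not match the header width (e.g. notices).
        if len(cells) != len(header):
            continue
        rows.append(dict(zip(header, cells)))
    return rows


def running_primaries(rows):
    """Return names of nodes registered as primary whose status ends in 'running'."""
    return [
        row["Name"]
        for row in rows
        if row["Role"] == "primary" and row["Status"].endswith("running")
    ]


def followers_of(rows, upstream):
    """Return names of standbys whose Upstream column matches the given node name."""
    return [
        row["Name"]
        for row in rows
        if row["Role"] == "standby" and row["Upstream"] == upstream
    ]


if __name__ == "__main__":
    rows = parse_cluster_show(SAMPLE)
    print("running primaries:", running_primaries(rows))
    print("standbys following node2:", followers_of(rows, "node2"))
```

For the sample above, a healthy post-failover cluster would show exactly one running primary (`node2`) with `node3` attached to it; the failed former primary (`node1`) is filtered out by its status. Parsing human-oriented table output is fragile, so if the repmgr version in use offers a machine-readable output mode, that would be the better input for a check like this.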

89
doc/stylesheet-common.xsl Normal file

@@ -0,0 +1,89 @@
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<!--
This file contains XSLT stylesheet customizations that are common to
all output formats (HTML, HTML Help, XSL-FO, etc.).
-->
<xsl:include href="stylesheet-speedup-common.xsl" />
<!-- Parameters -->
<!--
<xsl:param name="draft.mode">
<xsl:choose>
<xsl:when test="contains($repmgr.version, 'devel')">yes</xsl:when>
<xsl:otherwise>no</xsl:otherwise>
</xsl:choose>
</xsl:param>
-->
<xsl:param name="show.comments">
<xsl:choose>
<xsl:when test="contains($repmgr.version, 'devel')">1</xsl:when>
<xsl:otherwise>0</xsl:otherwise>
</xsl:choose>
</xsl:param>
<xsl:param name="callout.graphics" select="'0'"></xsl:param>
<xsl:param name="toc.section.depth">2</xsl:param>
<xsl:param name="linenumbering.extension" select="'0'"></xsl:param>
<xsl:param name="section.autolabel" select="1"></xsl:param>
<xsl:param name="section.label.includes.component.label" select="1"></xsl:param>
<xsl:param name="refentry.generate.name" select="0"></xsl:param>
<xsl:param name="refentry.generate.title" select="1"></xsl:param>
<xsl:param name="refentry.xref.manvolnum" select="0"/>
<xsl:param name="formal.procedures" select="0"></xsl:param>
<xsl:param name="generate.consistent.ids" select="1"/>
<xsl:param name="punct.honorific" select="''"></xsl:param>
<xsl:param name="variablelist.term.break.after">1</xsl:param>
<xsl:param name="variablelist.term.separator"></xsl:param>
<xsl:param name="xref.with.number.and.title" select="0"></xsl:param>
<!-- Change display of some elements -->
<xsl:template match="productname">
<xsl:call-template name="inline.charseq"/>
</xsl:template>
<xsl:template match="structfield">
<xsl:call-template name="inline.monoseq"/>
</xsl:template>
<xsl:template match="structname">
<xsl:call-template name="inline.monoseq"/>
</xsl:template>
<xsl:template match="symbol">
<xsl:call-template name="inline.monoseq"/>
</xsl:template>
<xsl:template match="systemitem">
<xsl:call-template name="inline.charseq"/>
</xsl:template>
<xsl:template match="token">
<xsl:call-template name="inline.monoseq"/>
</xsl:template>
<xsl:template match="type">
<xsl:call-template name="inline.monoseq"/>
</xsl:template>
<xsl:template match="programlisting/emphasis">
<xsl:call-template name="inline.boldseq"/>
</xsl:template>
<!-- Special support for Tcl synopses -->
<xsl:template match="optional[@role='tcl']">
<xsl:text>?</xsl:text>
<xsl:call-template name="inline.charseq"/>
<xsl:text>?</xsl:text>
</xsl:template>
</xsl:stylesheet>

97
doc/stylesheet-fo.xsl Normal file

@@ -0,0 +1,97 @@
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0"
xmlns:fo="http://www.w3.org/1999/XSL/Format">
<xsl:import href="http://docbook.sourceforge.net/release/xsl/current/fo/docbook.xsl"/>
<xsl:include href="stylesheet-common.xsl" />
<xsl:param name="fop1.extensions" select="1"></xsl:param>
<xsl:param name="tablecolumns.extension" select="0"></xsl:param>
<xsl:param name="toc.max.depth">3</xsl:param>
<xsl:param name="ulink.footnotes" select="1"></xsl:param>
<xsl:param name="use.extensions" select="1"></xsl:param>
<xsl:param name="variablelist.as.blocks" select="1"></xsl:param>
<xsl:attribute-set name="monospace.verbatim.properties"
use-attribute-sets="verbatim.properties monospace.properties">
<xsl:attribute name="wrap-option">wrap</xsl:attribute>
</xsl:attribute-set>
<xsl:attribute-set name="nongraphical.admonition.properties">
<xsl:attribute name="border-style">solid</xsl:attribute>
<xsl:attribute name="border-width">1pt</xsl:attribute>
<xsl:attribute name="border-color">black</xsl:attribute>
<xsl:attribute name="padding-start">12pt</xsl:attribute>
<xsl:attribute name="padding-end">12pt</xsl:attribute>
<xsl:attribute name="padding-top">6pt</xsl:attribute>
<xsl:attribute name="padding-bottom">6pt</xsl:attribute>
</xsl:attribute-set>
<xsl:attribute-set name="admonition.title.properties">
<xsl:attribute name="text-align">center</xsl:attribute>
</xsl:attribute-set>
<!-- fix missing space after vertical simplelist
https://github.com/docbook/xslt10-stylesheets/issues/31 -->
<xsl:attribute-set name="normal.para.spacing">
<xsl:attribute name="space-after.optimum">1em</xsl:attribute>
<xsl:attribute name="space-after.minimum">0.8em</xsl:attribute>
<xsl:attribute name="space-after.maximum">1.2em</xsl:attribute>
</xsl:attribute-set>
<!-- Change display of some elements -->
<xsl:template match="command">
<xsl:call-template name="inline.monoseq"/>
</xsl:template>
<xsl:template match="confgroup" mode="bibliography.mode">
<fo:inline>
<xsl:apply-templates select="conftitle/text()" mode="bibliography.mode"/>
<xsl:text>, </xsl:text>
<xsl:apply-templates select="confdates/text()" mode="bibliography.mode"/>
<xsl:value-of select="$biblioentry.item.separator"/>
</fo:inline>
</xsl:template>
<xsl:template match="isbn" mode="bibliography.mode">
<fo:inline>
<xsl:text>ISBN </xsl:text>
<xsl:apply-templates mode="bibliography.mode"/>
<xsl:value-of select="$biblioentry.item.separator"/>
</fo:inline>
</xsl:template>
<!-- bug fix from <https://sourceforge.net/p/docbook/bugs/1360/#831b> -->
<xsl:template match="varlistentry/term" mode="xref-to">
<xsl:param name="verbose" select="1"/>
<xsl:apply-templates mode="no.anchor.mode"/>
</xsl:template>
<!-- include refsects in PDF bookmarks
(https://github.com/docbook/xslt10-stylesheets/issues/46) -->
<xsl:template match="refsect1|refsect2|refsect3"
mode="bookmark">
<xsl:variable name="id">
<xsl:call-template name="object.id"/>
</xsl:variable>
<xsl:variable name="bookmark-label">
<xsl:apply-templates select="." mode="object.title.markup"/>
</xsl:variable>
<fo:bookmark internal-destination="{$id}">
<xsl:attribute name="starting-state">
<xsl:value-of select="$bookmarks.state"/>
</xsl:attribute>
<fo:bookmark-title>
<xsl:value-of select="normalize-space($bookmark-label)"/>
</fo:bookmark-title>
<xsl:apply-templates select="*" mode="bookmark"/>
</fo:bookmark>
</xsl:template>
</xsl:stylesheet>


@@ -0,0 +1,292 @@
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE xsl:stylesheet [
<!ENTITY % common.entities SYSTEM "http://docbook.sourceforge.net/release/xsl/current/common/entities.ent">
%common.entities;
]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<!--
This file contains XSLT stylesheet customizations that are common to
all HTML output variants (chunked and single-page).
-->
<!-- Parameters -->
<xsl:param name="make.valid.html" select="1"></xsl:param>
<xsl:param name="generate.id.attributes" select="1"></xsl:param>
<xsl:param name="make.graphic.viewport" select="0"/>
<xsl:param name="link.mailto.url">pgsql-docs@lists.postgresql.org</xsl:param>
<xsl:param name="toc.max.depth">2</xsl:param>
<!-- Change display of some elements -->
<xsl:template match="command">
<xsl:call-template name="inline.monoseq"/>
</xsl:template>
<xsl:template match="confgroup" mode="bibliography.mode">
<span>
<xsl:call-template name="common.html.attributes"/>
<xsl:call-template name="id.attribute"/>
<xsl:apply-templates select="conftitle/text()" mode="bibliography.mode"/>
<xsl:text>, </xsl:text>
<xsl:apply-templates select="confdates/text()" mode="bibliography.mode"/>
<xsl:copy-of select="$biblioentry.item.separator"/>
</span>
</xsl:template>
<xsl:template match="isbn" mode="bibliography.mode">
<span>
<xsl:call-template name="common.html.attributes"/>
<xsl:call-template name="id.attribute"/>
<xsl:text>ISBN </xsl:text>
<xsl:apply-templates mode="bibliography.mode"/>
<xsl:copy-of select="$biblioentry.item.separator"/>
</span>
</xsl:template>
<!-- table of contents configuration -->
<xsl:param name="generate.toc">
appendix toc,title
article/appendix nop
article toc,title
book toc,title
chapter toc,title
part toc,title
preface toc,title
qandadiv toc
qandaset toc
reference toc,title
sect1 toc
sect2 toc
sect3 toc
sect4 toc
sect5 toc
section toc
set toc,title
</xsl:param>
<xsl:param name="generate.section.toc.level" select="1"></xsl:param>
<!-- include refentry under sect1 in tocs -->
<xsl:template match="sect1" mode="toc">
<xsl:param name="toc-context" select="."/>
<xsl:call-template name="subtoc">
<xsl:with-param name="toc-context" select="$toc-context"/>
<xsl:with-param name="nodes" select="sect2|refentry
|bridgehead[$bridgehead.in.toc != 0]"/>
</xsl:call-template>
</xsl:template>
<!-- Put index "quicklinks" (A | B | C | ...) at the top of the bookindex page. -->
<!-- from html/autoidx.xsl -->
<xsl:template name="generate-basic-index">
<xsl:param name="scope" select="NOTANODE"/>
<xsl:variable name="role">
<xsl:if test="$index.on.role != 0">
<xsl:value-of select="@role"/>
</xsl:if>
</xsl:variable>
<xsl:variable name="type">
<xsl:if test="$index.on.type != 0">
<xsl:value-of select="@type"/>
</xsl:if>
</xsl:variable>
<xsl:variable name="terms"
select="//indexterm
[count(.|key('letter',
translate(substring(&primary;, 1, 1),
&lowercase;,
&uppercase;))
[&scope;][1]) = 1
and not(@class = 'endofrange')]"/>
<xsl:variable name="alphabetical"
select="$terms[contains(concat(&lowercase;, &uppercase;),
substring(&primary;, 1, 1))]"/>
<xsl:variable name="others" select="$terms[not(contains(concat(&lowercase;,
&uppercase;),
substring(&primary;, 1, 1)))]"/>
<div class="index">
<!-- pgsql-docs: begin added stuff -->
<p class="indexdiv-quicklinks">
<a href="#indexdiv-Symbols">
<xsl:call-template name="gentext">
<xsl:with-param name="key" select="'index symbols'"/>
</xsl:call-template>
</a>
<xsl:apply-templates select="$alphabetical[count(.|key('letter',
translate(substring(&primary;, 1, 1),
&lowercase;,&uppercase;))[&scope;][1]) = 1]"
mode="index-div-quicklinks">
<xsl:with-param name="position" select="position()"/>
<xsl:with-param name="scope" select="$scope"/>
<xsl:with-param name="role" select="$role"/>
<xsl:with-param name="type" select="$type"/>
<xsl:sort select="translate(&primary;, &lowercase;, &uppercase;)"/>
</xsl:apply-templates>
</p>
<!-- pgsql-docs: end added stuff -->
<xsl:if test="$others">
<xsl:choose>
<xsl:when test="normalize-space($type) != '' and
$others[@type = $type][count(.|key('primary', &primary;)[&scope;][1]) = 1]">
<!-- pgsql-docs: added id attribute here for linking to it -->
<div class="indexdiv" id="indexdiv-Symbols">
<h3>
<xsl:call-template name="gentext">
<xsl:with-param name="key" select="'index symbols'"/>
</xsl:call-template>
</h3>
<dl>
<xsl:apply-templates select="$others[count(.|key('primary', &primary;)[&scope;][1]) = 1]"
mode="index-symbol-div">
<xsl:with-param name="position" select="position()"/>
<xsl:with-param name="scope" select="$scope"/>
<xsl:with-param name="role" select="$role"/>
<xsl:with-param name="type" select="$type"/>
<xsl:sort select="translate(&primary;, &lowercase;, &uppercase;)"/>
</xsl:apply-templates>
</dl>
</div>
</xsl:when>
<xsl:when test="normalize-space($type) != ''">
<!-- Output nothing, as there isn't a match for $others using this $type -->
</xsl:when>
<xsl:otherwise>
<!-- pgsql-docs: added id attribute here for linking to it -->
<div class="indexdiv" id="indexdiv-Symbols">
<h3>
<xsl:call-template name="gentext">
<xsl:with-param name="key" select="'index symbols'"/>
</xsl:call-template>
</h3>
<dl>
<xsl:apply-templates select="$others[count(.|key('primary',
&primary;)[&scope;][1]) = 1]"
mode="index-symbol-div">
<xsl:with-param name="position" select="position()"/>
<xsl:with-param name="scope" select="$scope"/>
<xsl:with-param name="role" select="$role"/>
<xsl:with-param name="type" select="$type"/>
<xsl:sort select="translate(&primary;, &lowercase;, &uppercase;)"/>
</xsl:apply-templates>
</dl>
</div>
</xsl:otherwise>
</xsl:choose>
</xsl:if>
<xsl:apply-templates select="$alphabetical[count(.|key('letter',
translate(substring(&primary;, 1, 1),
&lowercase;,&uppercase;))[&scope;][1]) = 1]"
mode="index-div-basic">
<xsl:with-param name="position" select="position()"/>
<xsl:with-param name="scope" select="$scope"/>
<xsl:with-param name="role" select="$role"/>
<xsl:with-param name="type" select="$type"/>
<xsl:sort select="translate(&primary;, &lowercase;, &uppercase;)"/>
</xsl:apply-templates>
</div>
</xsl:template>
<xsl:template match="indexterm" mode="index-div-basic">
<xsl:param name="scope" select="."/>
<xsl:param name="role" select="''"/>
<xsl:param name="type" select="''"/>
<xsl:variable name="key"
select="translate(substring(&primary;, 1, 1),
&lowercase;,&uppercase;)"/>
<xsl:if test="key('letter', $key)[&scope;]
[count(.|key('primary', &primary;)[&scope;][1]) = 1]">
<div class="indexdiv">
<!-- pgsql-docs: added id attribute here for linking to it -->
<xsl:attribute name="id">
<xsl:value-of select="concat('indexdiv-', $key)"/>
</xsl:attribute>
<xsl:if test="contains(concat(&lowercase;, &uppercase;), $key)">
<h3>
<xsl:value-of select="translate($key, &lowercase;, &uppercase;)"/>
</h3>
</xsl:if>
<dl>
<xsl:apply-templates select="key('letter', $key)[&scope;]
[count(.|key('primary', &primary;)
[&scope;][1])=1]"
mode="index-primary">
<xsl:with-param name="position" select="position()"/>
<xsl:with-param name="scope" select="$scope"/>
<xsl:with-param name="role" select="$role"/>
<xsl:with-param name="type" select="$type"/>
<xsl:sort select="translate(&primary;, &lowercase;, &uppercase;)"/>
</xsl:apply-templates>
</dl>
</div>
</xsl:if>
</xsl:template>
<!-- pgsql-docs -->
<xsl:template match="indexterm" mode="index-div-quicklinks">
<xsl:param name="scope" select="."/>
<xsl:param name="role" select="''"/>
<xsl:param name="type" select="''"/>
<xsl:variable name="key"
select="translate(substring(&primary;, 1, 1),
&lowercase;,&uppercase;)"/>
<xsl:if test="key('letter', $key)[&scope;]
[count(.|key('primary', &primary;)[&scope;][1]) = 1]">
<xsl:if test="contains(concat(&lowercase;, &uppercase;), $key)">
|
<a>
<xsl:attribute name="href">
<xsl:value-of select="concat('#indexdiv-', $key)"/>
</xsl:attribute>
<xsl:value-of select="translate($key, &lowercase;, &uppercase;)"/>
</a>
</xsl:if>
</xsl:if>
</xsl:template>
<!-- upper case HTML anchors for backward compatibility -->
<xsl:template name="object.id">
<xsl:param name="object" select="."/>
<xsl:choose>
<xsl:when test="$object/@id">
<xsl:value-of select="translate($object/@id, &lowercase;, &uppercase;)"/>
</xsl:when>
<xsl:when test="$object/@xml:id">
<xsl:value-of select="$object/@xml:id"/>
</xsl:when>
<xsl:when test="$generate.consistent.ids != 0">
<!-- Make $object the current node -->
<xsl:for-each select="$object">
<xsl:text>id-</xsl:text>
<xsl:number level="multiple" count="*"/>
</xsl:for-each>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="generate-id($object)"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
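The `object.id` override above picks an anchor by trying four cases in order. The following is only a rough Python model of that cascade, not the stylesheet's implementation: the function name `object_id`, its parameters, and the `fallback` value are hypothetical, and `str.upper()` stands in for the `translate(..., &lowercase;, &uppercase;)` call, whose exact character tables may differ.

```python
def object_id(attrs, position_path=(), consistent_ids=False, fallback="generated-id"):
    """Hypothetical model of the xsl:choose cascade in the object.id template."""
    # Case 1: a plain id attribute is upper-cased (backward-compatible anchors).
    if "id" in attrs:
        return attrs["id"].upper()
    # Case 2: xml:id is emitted unchanged.
    if "xml:id" in attrs:
        return attrs["xml:id"]
    # Case 3: with generate.consistent.ids set, build "id-" plus a
    # multi-level number such as 1.2.3 (stand-in for xsl:number level="multiple").
    if consistent_ids:
        return "id-" + ".".join(str(n) for n in position_path)
    # Case 4: otherwise fall back to a processor-generated id
    # (stand-in for generate-id(), whose value is implementation-defined).
    return fallback


print(object_id({"id": "repmgr-conf"}))
print(object_id({"xml:id": "repmgr-conf"}))
print(object_id({}, position_path=(1, 2, 3), consistent_ids=True))
```

Only the first case changes letter case, so an element carrying `xml:id` keeps its original spelling.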


@@ -0,0 +1,23 @@
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version='1.0'
xmlns="http://www.w3.org/TR/xhtml1/transitional"
exclude-result-prefixes="#default">
<xsl:import href="http://docbook.sourceforge.net/release/xsl/current/xhtml/docbook.xsl"/>
<xsl:include href="stylesheet-common.xsl" />
<xsl:include href="stylesheet-html-common.xsl" />
<xsl:include href="stylesheet-speedup-xhtml.xsl" />
<!-- embed SVG images into output file -->
<xsl:template match="imagedata[@format='SVG']">
<xsl:variable name="filename">
<xsl:call-template name="mediaobject.filename">
<xsl:with-param name="object" select=".."/>
</xsl:call-template>
</xsl:variable>
<xsl:copy-of select="document($filename)"/>
</xsl:template>
</xsl:stylesheet>


@@ -0,0 +1,100 @@
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version='1.0'>
<!-- Performance-optimized versions of some upstream templates from common/
directory -->
<!-- from common/labels.xsl -->
<xsl:template match="chapter" mode="label.markup">
<xsl:choose>
<xsl:when test="@label">
<xsl:value-of select="@label"/>
</xsl:when>
<xsl:when test="string($chapter.autolabel) != 0">
<xsl:if test="$component.label.includes.part.label != 0 and
ancestor::part">
<xsl:variable name="part.label">
<xsl:apply-templates select="ancestor::part"
mode="label.markup"/>
</xsl:variable>
<xsl:if test="$part.label != ''">
<xsl:value-of select="$part.label"/>
<xsl:apply-templates select="ancestor::part"
mode="intralabel.punctuation">
<xsl:with-param name="object" select="."/>
</xsl:apply-templates>
</xsl:if>
</xsl:if>
<xsl:variable name="format">
<xsl:call-template name="autolabel.format">
<xsl:with-param name="format" select="$chapter.autolabel"/>
</xsl:call-template>
</xsl:variable>
<xsl:choose>
<xsl:when test="$label.from.part != 0 and ancestor::part">
<xsl:number from="part" count="chapter" format="{$format}" level="any"/>
</xsl:when>
<xsl:otherwise>
<!-- Optimization for pgsql-docs: When counting to get label for
this chapter, preceding chapters can only be our siblings or
children of a preceding part, so only count those instead of
scanning the entire node tree. -->
<!-- <xsl:number from="book" count="chapter" format="{$format}" level="any"/> -->
<xsl:number value="count(../preceding-sibling::part/chapter) + count(preceding-sibling::chapter) + 1" format="{$format}"/>
</xsl:otherwise>
</xsl:choose>
</xsl:when>
</xsl:choose>
</xsl:template>
<xsl:template match="appendix" mode="label.markup">
<xsl:choose>
<xsl:when test="@label">
<xsl:value-of select="@label"/>
</xsl:when>
<xsl:when test="string($appendix.autolabel) != 0">
<xsl:if test="$component.label.includes.part.label != 0 and
ancestor::part">
<xsl:variable name="part.label">
<xsl:apply-templates select="ancestor::part"
mode="label.markup"/>
</xsl:variable>
<xsl:if test="$part.label != ''">
<xsl:value-of select="$part.label"/>
<xsl:apply-templates select="ancestor::part"
mode="intralabel.punctuation">
<xsl:with-param name="object" select="."/>
</xsl:apply-templates>
</xsl:if>
</xsl:if>
<xsl:variable name="format">
<xsl:call-template name="autolabel.format">
<xsl:with-param name="format" select="$appendix.autolabel"/>
</xsl:call-template>
</xsl:variable>
<xsl:choose>
<xsl:when test="$label.from.part != 0 and ancestor::part">
<xsl:number from="part" count="appendix" format="{$format}" level="any"/>
</xsl:when>
<xsl:otherwise>
<!-- Optimization for pgsql-docs: When counting to get label for
this appendix, preceding appendixes can only be our siblings or
children of a preceding part, so only count those instead of
scanning the entire node tree. -->
<!-- <xsl:number from="book|article" count="appendix" format="{$format}" level="any"/> -->
<xsl:number value="count(../preceding-sibling::part/appendix) + count(preceding-sibling::appendix) + 1" format="{$format}"/>
</xsl:otherwise>
</xsl:choose>
</xsl:when>
</xsl:choose>
</xsl:template>
<!-- from common/l10n.xsl -->
<!-- Just hardcode the language for the whole document, to make it faster. -->
<xsl:template name="l10n.language">en</xsl:template>
</xsl:stylesheet>
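The chapter and appendix label optimization above replaces a whole-tree `xsl:number` scan with a count over siblings and the children of preceding parts. Below is a minimal Python sketch of that idea, assuming a toy document in which every chapter sits inside a part. The sample `BOOK` and the function names are hypothetical and only illustrate that the two counts agree for that layout.

```python
import xml.etree.ElementTree as ET

# Toy document (an assumption): all chapters are children of parts.
BOOK = """<book>
  <part><chapter id="a"/><chapter id="b"/></part>
  <part><chapter id="c"/><chapter id="d"/></part>
</book>"""


def label_by_siblings(parent_of, chapter):
    """count(../preceding-sibling::part/chapter) + count(preceding-sibling::chapter) + 1"""
    parent = parent_of[chapter]
    siblings = list(parent)
    # Chapters that precede this one under the same parent.
    here = sum(1 for el in siblings[:siblings.index(chapter)] if el.tag == "chapter")
    # Chapters that are children of parts preceding our parent.
    before = 0
    grandparent = parent_of.get(parent)
    if grandparent is not None:
        uncles = list(grandparent)
        for el in uncles[:uncles.index(parent)]:
            if el.tag == "part":
                before += sum(1 for c in el if c.tag == "chapter")
    return before + here + 1


def label_by_full_scan(root, chapter):
    """Position among all chapters in document order (the expensive variant)."""
    return list(root.iter("chapter")).index(chapter) + 1


root = ET.fromstring(BOOK)
parent_of = {child: parent for parent in root.iter() for child in parent}
for ch in root.iter("chapter"):
    print(ch.get("id"), label_by_siblings(parent_of, ch), label_by_full_scan(root, ch))
```

The sibling-based count only matches the full scan when the document has the shape the stylesheet comment describes; a chapter placed directly under the book after a part would be numbered differently.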


@@ -0,0 +1,345 @@
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns="http://www.w3.org/1999/xhtml"
version='1.0'>
<!-- Performance-optimized versions of some upstream templates from xhtml/
directory -->
<!-- from xhtml/autoidx.xsl -->
<xsl:template match="indexterm" mode="reference">
<xsl:param name="scope" select="."/>
<xsl:param name="role" select="''"/>
<xsl:param name="type" select="''"/>
<xsl:param name="position"/>
<xsl:param name="separator" select="''"/>
<xsl:variable name="term.separator">
<xsl:call-template name="index.separator">
<xsl:with-param name="key" select="'index.term.separator'"/>
</xsl:call-template>
</xsl:variable>
<xsl:variable name="number.separator">
<xsl:call-template name="index.separator">
<xsl:with-param name="key" select="'index.number.separator'"/>
</xsl:call-template>
</xsl:variable>
<xsl:variable name="range.separator">
<xsl:call-template name="index.separator">
<xsl:with-param name="key" select="'index.range.separator'"/>
</xsl:call-template>
</xsl:variable>
<xsl:choose>
<xsl:when test="$separator != ''">
<xsl:value-of select="$separator"/>
</xsl:when>
<xsl:when test="$position = 1">
<xsl:value-of select="$term.separator"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="$number.separator"/>
</xsl:otherwise>
</xsl:choose>
<xsl:choose>
<xsl:when test="@zone and string(@zone)">
<xsl:call-template name="reference">
<xsl:with-param name="zones" select="normalize-space(@zone)"/>
<xsl:with-param name="position" select="position()"/>
<xsl:with-param name="scope" select="$scope"/>
<xsl:with-param name="role" select="$role"/>
<xsl:with-param name="type" select="$type"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<a>
<xsl:apply-templates select="." mode="class.attribute"/>
<xsl:variable name="title">
<xsl:choose>
<xsl:when test="$index.prefer.titleabbrev != 0">
<xsl:apply-templates select="(ancestor-or-self::set|ancestor-or-self::book|ancestor-or-self::part|ancestor-or-self::reference|ancestor-or-self::partintro|ancestor-or-self::chapter|ancestor-or-self::appendix|ancestor-or-self::preface|ancestor-or-self::article|ancestor-or-self::section|ancestor-or-self::sect1|ancestor-or-self::sect2|ancestor-or-self::sect3|ancestor-or-self::sect4|ancestor-or-self::sect5|ancestor-or-self::refentry|ancestor-or-self::refsect1|ancestor-or-self::refsect2|ancestor-or-self::refsect3|ancestor-or-self::simplesect|ancestor-or-self::bibliography|ancestor-or-self::glossary|ancestor-or-self::index|ancestor-or-self::webpage|ancestor-or-self::topic)[last()]" mode="titleabbrev.markup"/>
</xsl:when>
<xsl:otherwise>
<xsl:apply-templates select="(ancestor-or-self::set|ancestor-or-self::book|ancestor-or-self::part|ancestor-or-self::reference|ancestor-or-self::partintro|ancestor-or-self::chapter|ancestor-or-self::appendix|ancestor-or-self::preface|ancestor-or-self::article|ancestor-or-self::section|ancestor-or-self::sect1|ancestor-or-self::sect2|ancestor-or-self::sect3|ancestor-or-self::sect4|ancestor-or-self::sect5|ancestor-or-self::refentry|ancestor-or-self::refsect1|ancestor-or-self::refsect2|ancestor-or-self::refsect3|ancestor-or-self::simplesect|ancestor-or-self::bibliography|ancestor-or-self::glossary|ancestor-or-self::index|ancestor-or-self::webpage|ancestor-or-self::topic)[last()]" mode="title.markup"/>
</xsl:otherwise>
</xsl:choose>
</xsl:variable>
<xsl:attribute name="href">
<xsl:choose>
<xsl:when test="$index.links.to.section = 1">
<xsl:call-template name="href.target">
<xsl:with-param name="object" select="(ancestor-or-self::set|ancestor-or-self::book|ancestor-or-self::part|ancestor-or-self::reference|ancestor-or-self::partintro|ancestor-or-self::chapter|ancestor-or-self::appendix|ancestor-or-self::preface|ancestor-or-self::article|ancestor-or-self::section|ancestor-or-self::sect1|ancestor-or-self::sect2|ancestor-or-self::sect3|ancestor-or-self::sect4|ancestor-or-self::sect5|ancestor-or-self::refentry|ancestor-or-self::refsect1|ancestor-or-self::refsect2|ancestor-or-self::refsect3|ancestor-or-self::simplesect|ancestor-or-self::bibliography|ancestor-or-self::glossary|ancestor-or-self::index|ancestor-or-self::webpage|ancestor-or-self::topic)[last()]"/>
<!-- Optimization for pgsql-docs: We only have an index as a
child of book, so look that up directly instead of
scanning the entire node tree. Also, don't look for
setindex. -->
<!-- <xsl:with-param name="context" select="(//index[count(ancestor::node()|$scope) = count(ancestor::node()) and ($role = @role or $type = @type or (string-length($role) = 0 and string-length($type) = 0))] | //setindex[count(ancestor::node()|$scope) = count(ancestor::node()) and ($role = @role or $type = @type or (string-length($role) = 0 and string-length($type) = 0))])[1]"/> -->
<xsl:with-param name="context" select="(/book/index[count(ancestor::node()|$scope) = count(ancestor::node()) and ($role = @role or $type = @type or (string-length($role) = 0 and string-length($type) = 0))])[1]"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:call-template name="href.target">
<xsl:with-param name="object" select="."/>
<xsl:with-param name="context" select="(//index[count(ancestor::node()|$scope) = count(ancestor::node()) and ($role = @role or $type = @type or (string-length($role) = 0 and string-length($type) = 0))] | //setindex[count(ancestor::node()|$scope) = count(ancestor::node()) and ($role = @role or $type = @type or (string-length($role) = 0 and string-length($type) = 0))])[1]"/>
</xsl:call-template>
</xsl:otherwise>
</xsl:choose>
</xsl:attribute>
<xsl:value-of select="$title"/> <!-- text only -->
</a>
<xsl:variable name="id" select="(@id|@xml:id)[1]"/>
<xsl:if test="key('endofrange', $id)[count(ancestor::node()|$scope) = count(ancestor::node()) and ($role = @role or $type = @type or (string-length($role) = 0 and string-length($type) = 0))]">
<xsl:apply-templates select="key('endofrange', $id)[count(ancestor::node()|$scope) = count(ancestor::node()) and ($role = @role or $type = @type or (string-length($role) = 0 and string-length($type) = 0))][last()]" mode="reference">
<xsl:with-param name="position" select="position()"/>
<xsl:with-param name="scope" select="$scope"/>
<xsl:with-param name="role" select="$role"/>
<xsl:with-param name="type" select="$type"/>
<xsl:with-param name="separator" select="$range.separator"/>
</xsl:apply-templates>
</xsl:if>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template name="reference">
<xsl:param name="scope" select="."/>
<xsl:param name="role" select="''"/>
<xsl:param name="type" select="''"/>
<xsl:param name="zones"/>
<xsl:choose>
<xsl:when test="contains($zones, ' ')">
<xsl:variable name="zone" select="substring-before($zones, ' ')"/>
<xsl:variable name="target" select="key('sections', $zone)"/>
<a>
<xsl:apply-templates select="." mode="class.attribute"/>
<!-- Optimization for pgsql-docs: this call adds nothing but fails with docbook-xsl 1.76 -->
<!-- <xsl:call-template name="id.attribute"/> -->
<xsl:attribute name="href">
<xsl:call-template name="href.target">
<xsl:with-param name="object" select="$target[1]"/>
<xsl:with-param name="context" select="//index[count(ancestor::node()|$scope) = count(ancestor::node()) and ($role = @role or $type = @type or (string-length($role) = 0 and string-length($type) = 0))][1]"/>
</xsl:call-template>
</xsl:attribute>
<xsl:apply-templates select="$target[1]" mode="index-title-content"/>
</a>
<xsl:text>, </xsl:text>
<xsl:call-template name="reference">
<xsl:with-param name="zones" select="substring-after($zones, ' ')"/>
<xsl:with-param name="position" select="position()"/>
<xsl:with-param name="scope" select="$scope"/>
<xsl:with-param name="role" select="$role"/>
<xsl:with-param name="type" select="$type"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:variable name="zone" select="$zones"/>
<xsl:variable name="target" select="key('sections', $zone)"/>
<a>
<xsl:apply-templates select="." mode="class.attribute"/>
<!-- Optimization for pgsql-docs: this call adds nothing but fails with docbook-xsl 1.76 -->
<!-- <xsl:call-template name="id.attribute"/> -->
<xsl:attribute name="href">
<xsl:call-template name="href.target">
<xsl:with-param name="object" select="$target[1]"/>
<!-- Optimization for pgsql-docs: Only look for index under book
instead of searching the whole node tree. -->
<!-- <xsl:with-param name="context" select="//index[count(ancestor::node()|$scope) = count(ancestor::node()) and ($role = @role or $type = @type or (string-length($role) = 0 and string-length($type) = 0))][1]"/> -->
<xsl:with-param name="context" select="/book/index[count(ancestor::node()|$scope) = count(ancestor::node()) and ($role = @role or $type = @type or (string-length($role) = 0 and string-length($type) = 0))][1]"/>
</xsl:call-template>
</xsl:attribute>
<xsl:apply-templates select="$target[1]" mode="index-title-content"/>
</a>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<!-- from xhtml/chunk-common.xsl -->
<xsl:template name="chunk-all-sections">
<xsl:param name="content">
<xsl:apply-imports/>
</xsl:param>
<!-- Optimization for pgsql-docs: Since we set a fixed $chunk.section.depth,
we can do away with a bunch of complicated XPath searches for the
previous and next sections at various levels. -->
<xsl:if test="$chunk.section.depth != 1">
<xsl:message terminate="yes">
<xsl:text>Error: If you change $chunk.section.depth, then you must update the performance-optimized chunk-all-sections-template.</xsl:text>
</xsl:message>
</xsl:if>
<xsl:variable name="prev"
select="(preceding::book[1]
|preceding::preface[1]
|preceding::chapter[1]
|preceding::appendix[1]
|preceding::part[1]
|preceding::reference[1]
|preceding::refentry[1]
|preceding::colophon[1]
|preceding::article[1]
|preceding::topic[1]
|preceding::bibliography[parent::article or parent::book or parent::part][1]
|preceding::glossary[parent::article or parent::book or parent::part][1]
|preceding::index[$generate.index != 0]
[parent::article or parent::book or parent::part][1]
|preceding::setindex[$generate.index != 0][1]
|ancestor::set
|ancestor::book[1]
|ancestor::preface[1]
|ancestor::chapter[1]
|ancestor::appendix[1]
|ancestor::part[1]
|ancestor::reference[1]
|ancestor::article[1]
|ancestor::topic[1]
|preceding::sect1[1]
|ancestor::sect1[1])[last()]"/>
<xsl:variable name="next"
select="(following::book[1]
|following::preface[1]
|following::chapter[1]
|following::appendix[1]
|following::part[1]
|following::reference[1]
|following::refentry[1]
|following::colophon[1]
|following::bibliography[parent::article or parent::book or parent::part][1]
|following::glossary[parent::article or parent::book or parent::part][1]
|following::index[$generate.index != 0]
[parent::article or parent::book][1]
|following::article[1]
|following::topic[1]
|following::setindex[$generate.index != 0][1]
|descendant::book[1]
|descendant::preface[1]
|descendant::chapter[1]
|descendant::appendix[1]
|descendant::article[1]
|descendant::topic[1]
|descendant::bibliography[parent::article or parent::book][1]
|descendant::glossary[parent::article or parent::book or parent::part][1]
|descendant::index[$generate.index != 0]
[parent::article or parent::book][1]
|descendant::colophon[1]
|descendant::setindex[$generate.index != 0][1]
|descendant::part[1]
|descendant::reference[1]
|descendant::refentry[1]
|following::sect1[1]
|descendant::sect1[1])[1]"/>
<xsl:call-template name="process-chunk">
<xsl:with-param name="prev" select="$prev"/>
<xsl:with-param name="next" select="$next"/>
<xsl:with-param name="content" select="$content"/>
</xsl:call-template>
</xsl:template>
<xsl:template name="href.target">
<xsl:param name="context" select="."/>
<xsl:param name="object" select="."/>
<xsl:param name="toc-context" select="."/>
<!-- Optimization for pgsql-docs: Remove support for dbhtml processing
instruction here -->
<xsl:variable name="href.to.uri">
<xsl:call-template name="href.target.uri">
<xsl:with-param name="object" select="$object"/>
</xsl:call-template>
</xsl:variable>
<xsl:variable name="href.from.uri">
<xsl:choose>
<xsl:when test="not($toc-context = .)">
<xsl:call-template name="href.target.uri">
<xsl:with-param name="object" select="$toc-context"/>
</xsl:call-template>
</xsl:when>
<xsl:otherwise>
<xsl:call-template name="href.target.uri">
<xsl:with-param name="object" select="$context"/>
</xsl:call-template>
</xsl:otherwise>
</xsl:choose>
</xsl:variable>
<xsl:variable name="href.to">
<xsl:value-of select="$href.to.uri"/>
</xsl:variable>
<xsl:variable name="href.from">
<xsl:call-template name="trim.common.uri.paths">
<xsl:with-param name="uriA" select="$href.to.uri"/>
<xsl:with-param name="uriB" select="$href.from.uri"/>
<xsl:with-param name="return" select="'B'"/>
</xsl:call-template>
</xsl:variable>
<xsl:variable name="depth">
<xsl:call-template name="count.uri.path.depth">
<xsl:with-param name="filename" select="$href.from"/>
</xsl:call-template>
</xsl:variable>
<xsl:variable name="href">
<xsl:call-template name="copy-string">
<xsl:with-param name="string" select="'../'"/>
<xsl:with-param name="count" select="$depth"/>
</xsl:call-template>
<xsl:value-of select="$href.to"/>
</xsl:variable>
<xsl:value-of select="$href"/>
</xsl:template>
<xsl:template name="html.head">
<xsl:param name="prev" select="/foo"/>
<xsl:param name="next" select="/foo"/>
<!-- Optimization for pgsql-docs: Cut out a bunch of things we don't need
here, including an expensive //legalnotice search. -->
<head>
<xsl:call-template name="system.head.content"/>
<xsl:call-template name="head.content"/>
<xsl:if test="$prev">
<link rel="prev">
<xsl:attribute name="href">
<xsl:call-template name="href.target">
<xsl:with-param name="object" select="$prev"/>
</xsl:call-template>
</xsl:attribute>
<xsl:attribute name="title">
<xsl:apply-templates select="$prev" mode="object.title.markup.textonly"/>
</xsl:attribute>
</link>
</xsl:if>
<xsl:if test="$next">
<link rel="next">
<xsl:attribute name="href">
<xsl:call-template name="href.target">
<xsl:with-param name="object" select="$next"/>
</xsl:call-template>
</xsl:attribute>
<xsl:attribute name="title">
<xsl:apply-templates select="$next" mode="object.title.markup.textonly"/>
</xsl:attribute>
</link>
</xsl:if>
<xsl:call-template name="user.head.content"/>
</head>
</xsl:template>
</xsl:stylesheet>
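The simplified `href.target` template above builds a link by trimming the directories shared by the source and target URIs, counting how deep the remaining source path is, and prefixing that many `../` segments to the target. The Python sketch below follows the same variable flow. The helpers `trim_common_dirs` and `relative_href` are hypothetical approximations of the upstream `trim.common.uri.paths` and `count.uri.path.depth` templates, not their actual code.

```python
def trim_common_dirs(uri_a, uri_b):
    """Drop leading directory components shared by both URIs (assumed behaviour)."""
    a, b = uri_a.split("/"), uri_b.split("/")
    # Stop before the final component so file names are never stripped.
    while len(a) > 1 and len(b) > 1 and a[0] == b[0]:
        a, b = a[1:], b[1:]
    return "/".join(a), "/".join(b)


def relative_href(href_to, href_from):
    """Mirror the template: '../' repeated by depth, then the untrimmed target."""
    _, from_rest = trim_common_dirs(href_to, href_from)
    depth = from_rest.count("/")  # remaining directory levels on the source side
    return "../" * depth + href_to


print(relative_href("index.html", "configuration.html"))
print(relative_href("index.html", "sub/page.html"))
```

When every chunk is written to a single directory, as `base.dir` suggests here, the depth is zero and the target URI is returned unchanged.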


@@ -1,96 +1,471 @@
/* doc/src/sgml/stylesheet.css */
/* PostgreSQL.org Documentation Style */
/* color scheme similar to www.postgresql.org */
@import 'website-docs.css';
BODY {
color: #000000;
background: #FFFFFF;
font-family: verdana, sans-serif;
/* requires global.css, table.css and text.css to be loaded before this file! */
body {
font-family: verdana, sans-serif;
font-size: 76%;
background: url("/resources/background.png") repeat-x scroll left top transparent;
padding: 15px 4%;
margin: 0;
}
A:link { color:#0066A2; }
A:visited { color:#004E66; }
A:active { color:#0066A2; }
A:hover { color:#000000; }
H1 {
font-size: 1.4em;
font-weight: bold;
margin-top: 0em;
margin-bottom: 0em;
color: #EC5800;
/* monospace font size fix */
pre, code, kbd, samp, tt {
font-family: monospace,monospace;
font-size: 1em;
}
H2 {
font-size: 1.2em;
margin: 1.2em 0em 1.2em 0em;
font-weight: bold;
color: #666;
div.NAVHEADER table {
margin-left: 0;
}
H3 {
font-size: 1.1em;
margin: 1.2em 0em 1.2em 0em;
font-weight: bold;
color: #666;
/* Container Definitions */
#docContainerWrap {
text-align: center; /* Win IE5 */
}
H4 {
font-size: 0.95em;
margin: 1.2em 0em 1.2em 0em;
font-weight: normal;
color: #666;
#docContainer {
margin: 0 auto;
width: 90%;
padding-bottom: 2em;
display: block;
text-align: left; /* Win IE5 */
}
H5 {
font-size: 0.9em;
margin: 1.2em 0em 1.2em 0em;
font-weight: normal;
#docHeader {
background-image: url("/media/img/docs/bg_hdr.png");
height: 83px;
margin: 0px;
padding: 0px;
display: block;
}
H6 {
font-size: 0.85em;
margin: 1.2em 0em 1.2em 0em;
font-weight: normal;
#docHeaderLogo {
position: relative;
width: 206px;
height: 83px;
border: 0px;
padding: 0px;
margin: 0 0 0 20px;
}
/* center some titles */
.BOOK .TITLE, .BOOK .CORPAUTHOR, .BOOK .COPYRIGHT {
text-align: center;
#docHeaderLogo img {
border: 0px;
}
/* decoration for formal examples */
DIV.EXAMPLE {
padding-left: 15px;
border-style: solid;
border-width: 0px;
border-left-width: 2px;
border-color: black;
margin: 0.5ex;
#docNavSearchContainer {
padding-bottom: 2px;
}
/* less dense spacing of TOC */
.BOOK .TOC DL DT {
padding-top: 1.5ex;
padding-bottom: 1.5ex;
#docNav, #docVersions {
position: relative;
text-align: left;
margin-left: 10px;
margin-top: 5px;
color: #666;
font-size: 0.95em;
}
.BOOK .TOC DL DL DT {
padding-top: 0ex;
padding-bottom: 0ex;
#docSearch {
position: relative;
text-align: right;
padding: 0;
margin: 0;
color: #666;
}
/* miscellaneous */
PRE.LITERALLAYOUT, .SCREEN, .SYNOPSIS, .PROGRAMLISTING {
margin-left: 4ex;
#docTextSize {
text-align: right;
white-space: nowrap;
margin-top: 7px;
font-size: 0.95em;
}
.COMMENT { color: red; }
#docSearch form {
position: relative;
top: 5px;
right: 0;
margin: 0; /* needed for IE 5.5 OSX */
text-align: right; /* needed for IE 5.5 OSX */
white-space: nowrap; /* for Opera */
}
VAR { font-family: monospace; font-style: italic; }
/* Konqueror's standard style for ACRONYM is italic. */
ACRONYM { font-style: inherit; }
#docSearch form label {
color: #666;
font-size: 0.95em;
}
#docSearch form input {
font-size: 0.95em;
}
#docSearch form #submit {
font-size: 0.95em;
background: #7A7A7A;
color: #fff;
border: 1px solid #7A7A7A;
padding: 1px 4px;
}
#docSearch form #q {
width: 170px;
font-size: 0.95em;
border: 1px solid #7A7A7A;
background: #E1E1E1;
color: #000000;
padding: 2px;
}
.frmDocSearch {
padding: 0;
margin: 0;
display: inline;
}
.inpDocSearch {
padding: 0;
margin: 0;
color: #000;
}
#docContent {
position: relative;
margin-left: 10px;
margin-right: 10px;
margin-top: 40px;
}
#docFooter {
position: relative;
font-size: 0.9em;
color: #666;
line-height: 1.3em;
margin-left: 10px;
margin-right: 10px;
}
#docComments {
margin-top: 10px;
}
#docClear {
clear: both;
margin: 0;
padding: 0;
}
/* Heading Definitions */
h1, h2, h3 {
font-weight: bold;
margin-top: 2ex;
color: #444;
}
h1 {
font-size: 1.4em;
}
h2 {
font-size: 1.2em !important;
}
h3 {
font-size: 1.1em;
}
h1 a:hover,
h2 a:hover,
h3 a:hover,
h4 a:hover {
color: #444;
text-decoration: none;
}
/* Text Styles */
div.SECT2 {
margin-top: 4ex;
}
div.SECT3 {
margin-top: 3ex;
margin-left: 3ex;
}
.txtCurrentLocation {
font-weight: bold;
}
p, ol, ul, li {
line-height: 1.5em;
}
.txtCommentsWrap {
border: 2px solid #F5F5F5;
width: 100%;
}
.txtCommentsContent {
background: #F5F5F5;
padding: 3px;
}
.txtCommentsPoster {
float: left;
}
.txtCommentsDate {
float: right;
}
.txtCommentsComment {
padding: 3px;
}
#docContainer pre code,
#docContainer pre tt,
#docContainer pre pre,
#docContainer tt tt,
#docContainer tt code,
#docContainer tt pre {
font-size: 1em;
}
pre.LITERALLAYOUT,
.SCREEN,
.SYNOPSIS,
.PROGRAMLISTING,
.REFSYNOPSISDIV p,
table.CAUTION,
table.WARNING,
blockquote.NOTE,
blockquote.TIP,
table.CALSTABLE {
-moz-box-shadow: 3px 3px 5px #DFDFDF;
-webkit-box-shadow: 3px 3px 5px #DFDFDF;
-khtml-box-shadow: 3px 3px 5px #DFDFDF;
-o-box-shadow: 3px 3px 5px #DFDFDF;
box-shadow: 3px 3px 5px #DFDFDF;
}
pre.LITERALLAYOUT,
.SCREEN,
.SYNOPSIS,
.PROGRAMLISTING,
.REFSYNOPSISDIV p,
table.CAUTION,
table.WARNING,
blockquote.NOTE,
blockquote.TIP {
color: black;
border-width: 1px;
border-style: solid;
padding: 2ex;
margin: 2ex 0 2ex 2ex;
overflow: auto;
-moz-border-radius: 8px;
-webkit-border-radius: 8px;
-khtml-border-radius: 8px;
border-radius: 8px;
}
pre.LITERALLAYOUT,
pre.SYNOPSIS,
pre.PROGRAMLISTING,
.REFSYNOPSISDIV p,
.SCREEN {
border-color: #CFCFCF;
background-color: #F7F7F7;
}
blockquote.NOTE,
blockquote.TIP {
border-color: #DBDBCC;
background-color: #EEEEDD;
padding: 14px;
width: 572px;
}
blockquote.NOTE,
blockquote.TIP,
table.CAUTION,
table.WARNING {
margin: 4ex auto;
}
blockquote.NOTE p,
blockquote.TIP p {
margin: 0;
}
blockquote.NOTE pre,
blockquote.NOTE code,
blockquote.TIP pre,
blockquote.TIP code {
margin-left: 0;
margin-right: 0;
-moz-box-shadow: none;
-webkit-box-shadow: none;
-khtml-box-shadow: none;
-o-box-shadow: none;
box-shadow: none;
}
.emphasis,
.c2 {
font-weight: bold;
}
.REPLACEABLE {
font-style: italic;
}
/* Table Styles */
table {
margin-left: 2ex;
}
table.CALSTABLE td,
table.CALSTABLE th,
table.CAUTION td,
table.CAUTION th,
table.WARNING td,
table.WARNING th {
border-style: solid;
}
table.CALSTABLE,
table.CAUTION,
table.WARNING {
border-spacing: 0;
border-collapse: collapse;
}
table.CALSTABLE
{
margin: 2ex 0 2ex 2ex;
background-color: #E0ECEF;
border: 2px solid #A7C6DF;
}
table.CALSTABLE tr:hover td
{
background-color: #EFEFEF;
}
table.CALSTABLE td {
background-color: #FFF;
}
table.CALSTABLE td,
table.CALSTABLE th {
border: 1px solid #A7C6DF;
padding: 0.5ex 0.5ex;
}
table.CAUTION,
table.WARNING {
border-collapse: separate;
display: block;
padding: 0;
max-width: 600px;
}
table.CAUTION {
background-color: #F5F5DC;
border-color: #DEDFA7;
}
table.WARNING {
background-color: #FFD7D7;
border-color: #DF421E;
}
table.CAUTION td,
table.CAUTION th,
table.WARNING td,
table.WARNING th {
border-width: 0;
padding-left: 2ex;
padding-right: 2ex;
}
table.CAUTION td,
table.CAUTION th {
border-color: #F3E4D5;
}
table.WARNING td,
table.WARNING th {
border-color: #FFD7D7;
}
td.c1,
td.c2,
td.c3,
td.c4,
td.c5,
td.c6 {
font-size: 1.1em;
font-weight: bold;
border-bottom: 0px solid #FFEFEF;
padding: 1ex 2ex 0;
}
/* Link Styles */
#docNav a {
font-weight: bold;
}
a:link,
a:visited,
a:active,
a:hover {
text-decoration: underline;
}
a:link,
a:active {
color:#0066A2;
}
a:visited {
color:#004E66;
}
a:hover {
color:#000000;
}
#docFooter a:link,
#docFooter a:visited,
#docFooter a:active {
color:#666;
}
#docContainer code.FUNCTION tt {
font-size: 1em;
}
div.header {
color: #444;
margin-top: 5px;
}
div.footer {
text-align: center;
background-image: url("/resources/footerl.png"), url("/resources/footerr.png"), url("/resources/footerc.png");
background-position: left top, right top, center top;
background-repeat: no-repeat, no-repeat, repeat-x;
padding-top: 45px;
}
img {
border-style: none;
}

174
doc/stylesheet.xsl Normal file

@@ -0,0 +1,174 @@
<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version='1.0'
xmlns="http://www.w3.org/TR/xhtml1/transitional"
exclude-result-prefixes="#default">
<xsl:import href="http://docbook.sourceforge.net/release/xsl/current/xhtml/chunk.xsl"/>
<xsl:include href="stylesheet-common.xsl" />
<xsl:include href="stylesheet-html-common.xsl" />
<xsl:include href="stylesheet-speedup-xhtml.xsl" />
<!-- Parameters -->
<xsl:param name="base.dir" select="'html/'"></xsl:param>
<xsl:param name="use.id.as.filename" select="'1'"></xsl:param>
<xsl:param name="generate.legalnotice.link" select="1"></xsl:param>
<xsl:param name="chunk.first.sections" select="1"/>
<xsl:param name="chunk.quietly" select="1"></xsl:param>
<xsl:param name="admon.style"></xsl:param> <!-- handled by CSS stylesheet -->
<xsl:param name="website.stylesheet" select="0"/>
<xsl:param name="html.stylesheet">
<xsl:choose>
<xsl:when test="$website.stylesheet = 0">stylesheet.css</xsl:when>
<xsl:otherwise>https://www.postgresql.org/media/css/docs.css</xsl:otherwise>
</xsl:choose>
</xsl:param>
<!-- strip directory name from image filerefs -->
<xsl:template match="imagedata/@fileref">
<xsl:value-of select="substring-after(., '/')"/>
</xsl:template>
<!--
Customization of header
- add Up and Home links
- add tool tips to links
(overrides html/chunk-common.xsl)
-->
<xsl:template name="header.navigation">
<xsl:param name="prev" select="/foo"/>
<xsl:param name="next" select="/foo"/>
<xsl:param name="nav.context"/>
<xsl:variable name="home" select="/*[1]"/>
<xsl:variable name="up" select="parent::*"/>
<xsl:variable name="row1" select="$navig.showtitles != 0"/>
<xsl:variable name="row2" select="count($prev) &gt; 0
or (count($up) &gt; 0
and generate-id($up) != generate-id($home)
and $navig.showtitles != 0)
or count($next) &gt; 0"/>
<xsl:if test="$suppress.navigation = '0' and $suppress.header.navigation = '0'">
<div class="navheader">
<xsl:if test="$row1 or $row2">
<table width="100%" summary="Navigation header">
<xsl:if test="$row1">
<tr>
<th colspan="5" align="center">
<xsl:apply-templates select="." mode="object.title.markup"/>
</th>
</tr>
</xsl:if>
<xsl:if test="$row2">
<tr>
<td width="10%" align="{$direction.align.start}">
<xsl:if test="count($prev)>0">
<a accesskey="p">
<xsl:attribute name="href">
<xsl:call-template name="href.target">
<xsl:with-param name="object" select="$prev"/>
</xsl:call-template>
</xsl:attribute>
<xsl:attribute name="title">
<xsl:apply-templates select="$prev" mode="object.title.markup"/>
</xsl:attribute>
<xsl:call-template name="navig.content">
<xsl:with-param name="direction" select="'prev'"/>
</xsl:call-template>
</a>
</xsl:if>
<xsl:text>&#160;</xsl:text>
</td>
<td width="10%" align="{$direction.align.start}">
<xsl:choose>
<xsl:when test="count($up)&gt;0
and generate-id($up) != generate-id($home)">
<a accesskey="u">
<xsl:attribute name="href">
<xsl:call-template name="href.target">
<xsl:with-param name="object" select="$up"/>
</xsl:call-template>
</xsl:attribute>
<xsl:attribute name="title">
<xsl:apply-templates select="$up" mode="object.title.markup"/>
</xsl:attribute>
<xsl:call-template name="navig.content">
<xsl:with-param name="direction" select="'up'"/>
</xsl:call-template>
</a>
</xsl:when>
<xsl:otherwise>&#160;</xsl:otherwise>
</xsl:choose>
</td>
<th width="60%" align="center">
<xsl:choose>
<xsl:when test="count($up) > 0
and generate-id($up) != generate-id($home)
and $navig.showtitles != 0">
<xsl:apply-templates select="$up" mode="object.title.markup"/>
</xsl:when>
<xsl:otherwise>&#160;</xsl:otherwise>
</xsl:choose>
</th>
<td width="10%" align="{$direction.align.end}">
<xsl:choose>
<xsl:when test="$home != . or $nav.context = 'toc'">
<a accesskey="h">
<xsl:attribute name="href">
<xsl:call-template name="href.target">
<xsl:with-param name="object" select="$home"/>
</xsl:call-template>
</xsl:attribute>
<xsl:attribute name="title">
<xsl:apply-templates select="$home" mode="object.title.markup"/>
</xsl:attribute>
<xsl:call-template name="navig.content">
<xsl:with-param name="direction" select="'home'"/>
</xsl:call-template>
</a>
<xsl:if test="$chunk.tocs.and.lots != 0 and $nav.context != 'toc'">
<xsl:text>&#160;|&#160;</xsl:text>
</xsl:if>
</xsl:when>
<xsl:otherwise>&#160;</xsl:otherwise>
</xsl:choose>
</td>
<td width="10%" align="{$direction.align.end}">
<xsl:text>&#160;</xsl:text>
<xsl:if test="count($next)>0">
<a accesskey="n">
<xsl:attribute name="href">
<xsl:call-template name="href.target">
<xsl:with-param name="object" select="$next"/>
</xsl:call-template>
</xsl:attribute>
<xsl:attribute name="title">
<xsl:apply-templates select="$next" mode="object.title.markup"/>
</xsl:attribute>
<xsl:call-template name="navig.content">
<xsl:with-param name="direction" select="'next'"/>
</xsl:call-template>
</a>
</xsl:if>
</td>
</tr>
</xsl:if>
</table>
</xsl:if>
<xsl:if test="$header.rule != 0">
<hr/>
</xsl:if>
</div>
</xsl:if>
</xsl:template>
</xsl:stylesheet>

View File

@@ -1,10 +1,10 @@
<chapter id="performing-switchover" xreflabel="Performing a switchover with repmgr">
<title>Performing a switchover with repmgr</title>
<indexterm>
<primary>switchover</primary>
</indexterm>
<title>Performing a switchover with repmgr</title>
<para>
A typical use-case for replication is a combination of primary and standby
server, with the standby serving as a backup which can easily be activated
@@ -15,7 +15,7 @@
<para>
In some cases however it's desirable to promote the standby in a planned
way, e.g. so maintenance can be performed on the primary; this kind of switchover
is supported by the <xref linkend="repmgr-standby-switchover"> command.
is supported by the <xref linkend="repmgr-standby-switchover"/> command.
</para>
<para>
<command>repmgr standby switchover</command> differs from other &repmgr;
@@ -44,17 +44,18 @@
and capturing all output to assist troubleshooting any problems.
</simpara>
<simpara>
Please also read carefully the sections <xref linkend="preparing-for-switchover"> and
<xref linkend="switchover-caveats"> below.
Please also read carefully the sections <xref linkend="preparing-for-switchover"/> and
<xref linkend="switchover-caveats"/> below.
</simpara>
</note>
<sect1 id="preparing-for-switchover" xreflabel="Preparing for switchover">
<title>Preparing for switchover</title>
<indexterm>
<primary>switchover</primary>
<secondary>preparation</secondary>
</indexterm>
<title>Preparing for switchover</title>
<para>
As mentioned in the previous section, success of the switchover operation depends on
@@ -72,7 +73,8 @@
Ensure that a passwordless SSH connection is possible from the promotion candidate
(standby) to the demotion candidate (current primary). If <literal>--siblings-follow</literal>
will be used, ensure that passwordless SSH connections are possible from the
promotion candidate to all standbys attached to the demotion candidate.
promotion candidate to all nodes attached to the demotion candidate
(including the witness server, if in use).
</para>
<note>
@@ -113,7 +115,7 @@
server.
</para>
<para>
For more details, see <xref linkend="configuration-file-service-commands">.
For more details, see <xref linkend="configuration-file-service-commands"/>.
</para>
</important>
@@ -158,12 +160,12 @@
<note>
<para>
From <link linkend="release-4.2">repmgr 4.2</link>, &repmgr; will instruct any running
<application>repmgrd</application> instances to pause operations while the switchover
is being carried out, to prevent <application>repmgrd</application> from
unintentionally promoting a node. For more details, see <xref linkend="repmgrd-pausing">.
&repmgrd; instances to pause operations while the switchover
is being carried out, to prevent &repmgrd; from
unintentionally promoting a node. For more details, see <xref linkend="repmgrd-pausing"/>.
</para>
<para>
Users of &repmgr; versions prior to 4.2 should ensure that <application>repmgrd</application>
Users of &repmgr; versions prior to 4.2 should ensure that &repmgrd;
is not running on any nodes while a switchover is being executed.
</para>
</note>
@@ -203,18 +205,19 @@
<note>
<simpara>
See <xref linkend="repmgr-standby-switchover"> for a full list of available
See <xref linkend="repmgr-standby-switchover"/> for a full list of available
command line options and <filename>repmgr.conf</filename> settings relevant
to performing a switchover.
</simpara>
</note>
<sect2 id="switchover-pg-rewind" xreflabel="Switchover and pg_rewind">
<sect2 id="switchover-pg-rewind" xreflabel="Switchover and pg_rewind">
<title>Switchover and pg_rewind</title>
<indexterm>
<primary>pg_rewind</primary>
<secondary>using with "repmgr standby switchover"</secondary>
</indexterm>
<title>Switchover and pg_rewind</title>
<para>
If the demotion candidate does not shut down smoothly or cleanly, there's a risk it
will have a slightly divergent timeline and will not be able to attach to the new
@@ -257,11 +260,12 @@
</sect1>
<sect1 id="switchover-execution" xreflabel="Executing the switchover command">
<title>Executing the switchover command</title>
<indexterm>
<primary>switchover</primary>
<secondary>execution</secondary>
</indexterm>
<title>Executing the switchover command</title>
<para>
To demonstrate switchover, we will assume a replication cluster with a
primary (<literal>node1</literal>) and one standby (<literal>node2</literal>);
@@ -312,13 +316,13 @@
</programlisting>
</para>
<para>
If <application>repmgrd</application> is in use, it's worth double-checking that
If &repmgrd; is in use, it's worth double-checking that
all nodes are unpaused by executing <command><link linkend="repmgr-daemon-status">repmgr-daemon-status</link></command>.
</para>
<note>
<para>
Users of &repmgr; versions prior to 4.2 will need to manually restart <application>repmgrd</application>
Users of &repmgr; versions prior to 4.2 will need to manually restart &repmgrd;
on all nodes after the switchover is completed.
</para>
</note>
@@ -327,11 +331,11 @@
<sect1 id="switchover-caveats" xreflabel="Caveats">
<title>Caveats</title>
<indexterm>
<primary>switchover</primary>
<secondary>caveats</secondary>
</indexterm>
<title>Caveats</title>
<para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
@@ -356,11 +360,12 @@
</sect1>
<sect1 id="switchover-troubleshooting" xreflabel="Troubleshooting">
<title>Troubleshooting switchover issues</title>
<indexterm>
<primary>switchover</primary>
<secondary>troubleshooting</secondary>
</indexterm>
<title>Troubleshooting switchover issues</title>
<para>
As <link linkend="performing-switchover">emphasised previously</link>, performing a switchover

View File

@@ -1,10 +1,10 @@
<chapter id="upgrading-repmgr" xreflabel="Upgrading repmgr">
<title>Upgrading repmgr</title>
<indexterm>
<primary>upgrading</primary>
</indexterm>
<title>Upgrading repmgr</title>
<para>
&repmgr; is updated regularly with minor releases (e.g. 4.0.1 to 4.0.2)
@@ -13,18 +13,19 @@
</para>
<sect1 id="upgrading-repmgr-extension" xreflabel="Upgrading repmgr 4.x and later">
<title>Upgrading repmgr 4.x and later</title>
<indexterm>
<primary>upgrading</primary>
<secondary>repmgr 4.x and later</secondary>
</indexterm>
<title>Upgrading repmgr 4.x and later</title>
<para>
From version 4, &repmgr; consists of three elements:
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
the <application>repmgr</application> and <application>repmgrd</application> executables
the <application>repmgr</application> and &repmgrd; executables
</simpara>
</listitem>
@@ -37,7 +38,7 @@
<listitem>
<simpara>
the shared library module used by <application>repmgrd</application> which
the shared library module used by &repmgrd; which
is resident in the PostgreSQL backend
</simpara>
</listitem>
@@ -45,8 +46,8 @@
</para>
<para>
With <emphasis>minor releases</emphasis>, usually changes are only made to the <application>repmgr</application>
and <application>repmgrd</application> executables. In this case, the upgrade is quite straightforward,
and is simply a case of installing the new version, and restarting <application>repmgrd</application>
and &repmgrd; executables. In this case, the upgrade is quite straightforward,
and is simply a case of installing the new version, and restarting &repmgrd;
(if running).
</para>
@@ -63,11 +64,12 @@
</important>
<sect2 id="upgrading-minor-version" xreflabel="Upgrading a minor version release">
<title>Upgrading a minor version release</title>
<indexterm>
<primary>upgrading</primary>
<secondary>minor release</secondary>
</indexterm>
<title>Upgrading a minor version release</title>
<para>
The process for installing minor version upgrades is quite straightforward:
@@ -82,7 +84,7 @@
<listitem>
<simpara>
restart <application>repmgrd</application> on all nodes where it is running
restart &repmgrd; on all nodes where it is running
</simpara>
</listitem>
@@ -93,7 +95,7 @@
<note>
<para>
Some packaging systems (e.g. <link linkend="packages-debian-ubuntu">Debian/Ubuntu</link>
may restart <application>repmgrd</application> as part of the package upgrade process.
may restart &repmgrd; as part of the package upgrade process.
</para>
</note>
@@ -118,15 +120,17 @@
</sect2>
<sect2 id="upgrading-major-version" xreflabel="Upgrading a major version release">
<title>Upgrading a major version release</title>
<indexterm>
<primary>upgrading</primary>
<secondary>major release</secondary>
</indexterm>
<title>Upgrading a major version release</title>
<para>
&quot;major version&quot; upgrades need to be planned more carefully, as they may include
changes to the &repmgr; metadata (which need to be propagated from the primary to all
standbys) and/or changes to the shared object file used by <application>repmgrd</application>
standbys) and/or changes to the shared object file used by &repmgrd;
(which require a PostgreSQL restart).
</para>
<para>
@@ -138,14 +142,14 @@
<listitem>
<simpara>
Stop <application>repmgrd</application> (if in use) on all nodes where it is running.
Stop &repmgrd; (if in use) on all nodes where it is running.
</simpara>
</listitem>
<listitem>
<simpara>
Disable the <application>repmgrd</application> service on all nodes where it is in use;
this is to prevent packages from prematurely restarting <application>repmgrd</application>.
Disable the &repmgrd; service on all nodes where it is in use;
this is to prevent packages from prematurely restarting &repmgrd;.
</simpara>
</listitem>
@@ -167,12 +171,12 @@ systemctl daemon-reload</programlisting>
<listitem>
<simpara>
If the &repmgr; shared library module has been updated (check the <link linkend="appendix-release-notes">release notes</link>!),
restart PostgreSQL, then <application>repmgrd</application> (if in use) on each node,
restart PostgreSQL, then &repmgrd; (if in use) on each node,
The order in which this is applied to individual nodes is not critical,
and it's also fine to restart PostgreSQL on all nodes first before starting <application>repmgrd</application>.
and it's also fine to restart PostgreSQL on all nodes first before starting &repmgrd;.
</simpara>
<simpara>
Note that if the upgrade requires a PostgreSQL restart, <application>repmgrd</application>
Note that if the upgrade requires a PostgreSQL restart, &repmgrd;
will only function correctly once all nodes have been restarted.
</simpara>
</listitem>
@@ -188,7 +192,7 @@ ALTER EXTENSION repmgr UPDATE</programlisting>
<listitem>
<simpara>
Reenable the <application>repmgrd</application> service on all nodes where it is in use, and
Reenable the &repmgrd; service on all nodes where it is in use, and
ensure it is running.
</simpara>
</listitem>
@@ -205,19 +209,22 @@ ALTER EXTENSION repmgr UPDATE</programlisting>
</sect2>
<sect2 id="upgrading-check-repmgrd" xreflabel="Checking repmgrd status after an upgrade">
<title>Checking repmgrd status after an upgrade</title>
<indexterm>
<primary>upgrading</primary>
<secondary>checking repmgrd status</secondary>
</indexterm>
<title>Checking repmgrd status after an upgrade</title>
<para>
From <link linkend="release-4.2">repmgr 4.2</link>, once the upgrade is complete, execute the <command><link linkend="repmgr-daemon-status">repmgr daemon status</link></command>
command (on any node) to show an overview of the status of <application>repmgrd</application> on all nodes.
command (on any node) to show an overview of the status of &repmgrd; on all nodes.
</para>
</sect2>
</sect1>
<sect1 id="upgrading-and-pg-upgrade" xreflabel="pg_upgrade and repmgr">
<title>pg_upgrade and repmgr</title>
<indexterm>
<primary>upgrading</primary>
<secondary>pg_upgrade</secondary>
@@ -225,7 +232,6 @@ ALTER EXTENSION repmgr UPDATE</programlisting>
<indexterm>
<primary>pg_upgrade</primary>
</indexterm>
<title>pg_upgrade and repmgr</title>
<para>
<application>pg_upgrade</application> requires that if any functions are
@@ -265,12 +271,13 @@ ALTER EXTENSION repmgr UPDATE</programlisting>
<sect1 id="upgrading-from-repmgr-3" xreflabel="Upgrading from repmgr 3.x">
<title>Upgrading from repmgr 3.x</title>
<indexterm>
<primary>upgrading</primary>
<secondary>from repmgr 3.x</secondary>
</indexterm>
<title>Upgrading from repmgr 3.x</title>
<para>
The upgrade process consists of two steps:
<orderedlist>
@@ -332,7 +339,7 @@ ALTER EXTENSION repmgr UPDATE</programlisting>
</listitem>
<listitem>
<simpara><varname>monitoring_history</varname>: this replaces the
<application>repmgrd</application> command line option
&repmgrd; command line option
<literal>--monitoring-history</literal></simpara>
</listitem>
</itemizedlist>
@@ -383,7 +390,7 @@ ALTER EXTENSION repmgr UPDATE</programlisting>
to the server configured in Barman (in &repmgr; 3, the deprecated
<literal>cluster</literal> parameter was used for this);
the physical Barman hostname is configured with
<literal>barman_host</literal> (see <xref linkend="cloning-from-barman-prerequisites">
<literal>barman_host</literal> (see <xref linkend="cloning-from-barman-prerequisites"/>
for details).
</para>
</note>
@@ -433,7 +440,7 @@ ALTER EXTENSION repmgr UPDATE</programlisting>
<sect2>
<title>Upgrading the repmgr schema</title>
<para>
Ensure <application>repmgrd</application> is not running, or any cron jobs which execute the
Ensure &repmgrd; is not running, or any cron jobs which execute the
<command>repmgr</command> binary.
</para>
<para>
@@ -499,7 +506,7 @@ ALTER EXTENSION repmgr UPDATE</programlisting>
</para>
<para>
Check the data is updated as expected by examining the <structname>repmgr.nodes</structname>
table; restart <application>repmgrd</application> if required.
table; restart &repmgrd; if required.
</para>
<para>
The original <literal>repmgr_$cluster</literal> schema can be dropped at any time.

View File

@@ -1,21 +1,20 @@
/* PostgreSQL.org Documentation Style */
/*
* Documentation generated by XSL stylesheets has lower-case class
* names, older documentation generated by DSSSL stylesheets has
* upper-case class names, so we need to support both for a while. In
* some cases, the elements and classes differ further between the two
* stylesheets.
*/
/* requires global.css, table.css and text.css to be loaded before this file! */
body {
font-family: verdana, sans-serif;
font-size: 76%;
background: url("/resources/background.png") repeat-x scroll left top transparent;
padding: 15px 4%;
margin: 0;
}
/* monospace font size fix */
pre, code, kbd, samp, tt {
font-family: monospace,monospace;
font-size: 1em;
}
div.NAVHEADER table {
.navheader table,
.NAVHEADER table {
margin-left: 0;
}
@@ -99,7 +98,7 @@ div.NAVHEADER table {
#docSearch form input {
font-size: 0.95em;
}
#docSearch form #submit {
font-size: 0.95em;
background: #7A7A7A;
@@ -107,7 +106,7 @@ div.NAVHEADER table {
border: 1px solid #7A7A7A;
padding: 1px 4px;
}
#docSearch form #q {
width: 170px;
font-size: 0.95em;
@@ -138,9 +137,9 @@ div.NAVHEADER table {
#docFooter {
position: relative;
font-size: 0.9em;
color: #666;
line-height: 1.3em;
font-size: 0.9em;
color: #666;
line-height: 1.3em;
margin-left: 10px;
margin-right: 10px;
}
@@ -160,7 +159,6 @@ div.NAVHEADER table {
h1, h2, h3 {
font-weight: bold;
margin-top: 2ex;
color: #444;
}
h1 {
@@ -175,20 +173,35 @@ h3 {
font-size: 1.1em;
}
h1 a:hover,
h1 a:hover {
color: #EC5800;
text-decoration: none;
}
h2 a:hover,
h3 a:hover,
h4 a:hover {
color: #444;
color: #666666;
text-decoration: none;
}
/*
* Change color of h2 chunk titles in XSL build. (In DSSSL build,
* these will be h1, which is already handled elsewhere.)
*/
.titlepage h2.title,
.refnamediv h2 {
color: #EC5800;
}
/* Text Styles */
div.sect2,
div.SECT2 {
margin-top: 4ex;
}
div.sect3,
div.SECT3 {
margin-top: 3ex;
margin-left: 3ex;
@@ -203,7 +216,7 @@ p, ol, ul, li {
}
.txtCommentsWrap {
border: 2px solid #F5F5F5;
border: 2px solid #F5F5F5;
width: 100%;
}
@@ -233,6 +246,17 @@ p, ol, ul, li {
font-size: 1em;
}
pre.literallayout,
.screen,
.synopsis,
.programlisting,
.refsynopsisdiv p,
.caution,
.warning,
.note,
.tip,
.table table,
.informaltable table,
pre.LITERALLAYOUT,
.SCREEN,
.SYNOPSIS,
@@ -250,6 +274,15 @@ table.CALSTABLE {
box-shadow: 3px 3px 5px #DFDFDF;
}
pre.literallayout,
.screen,
.synopsis,
.programlisting,
.refsynopsisdiv p,
.caution,
.warning,
.note,
.tip,
pre.LITERALLAYOUT,
.SCREEN,
.SYNOPSIS,
@@ -271,6 +304,11 @@ blockquote.TIP {
border-radius: 8px;
}
pre.literallayout,
pre.synopsis,
pre.programlisting,
.refsynopsisdiv p,
.screen,
pre.LITERALLAYOUT,
pre.SYNOPSIS,
pre.PROGRAMLISTING,
@@ -280,6 +318,8 @@ pre.PROGRAMLISTING,
background-color: #F7F7F7;
}
.note,
.tip,
blockquote.NOTE,
blockquote.TIP {
border-color: #DBDBCC;
@@ -288,6 +328,10 @@ blockquote.TIP {
width: 572px;
}
.note,
.tip,
.caution,
.warning,
blockquote.NOTE,
blockquote.TIP,
table.CAUTION,
@@ -295,11 +339,17 @@ table.WARNING {
margin: 4ex auto;
}
.note p,
.tip p,
blockquote.NOTE p,
blockquote.TIP p {
margin: 0;
}
.note pre,
.note code,
.tip pre,
.tip code,
blockquote.NOTE pre,
blockquote.NOTE code,
blockquote.TIP pre,
@@ -313,11 +363,24 @@ blockquote.TIP code {
box-shadow: none;
}
.caution,
.warning {
max-width: 600px;
}
.tip h3,
.note h3,
.caution h3,
.warning h3 {
text-align: center;
}
.emphasis,
.c2 {
font-weight: bold;
}
.replaceable,
.REPLACEABLE {
font-style: italic;
}
@@ -328,6 +391,10 @@ table {
margin-left: 2ex;
}
.table table td,
.table table th,
.informaltable table td,
.informaltable table th,
table.CALSTABLE td,
table.CALSTABLE th,
table.CAUTION td,
@@ -337,6 +404,8 @@ table.WARNING th {
border-style: solid;
}
.table table,
.informaltable table,
table.CALSTABLE,
table.CAUTION,
table.WARNING {
@@ -344,6 +413,8 @@ table.WARNING {
border-collapse: collapse;
}
.table table,
.informaltable table,
table.CALSTABLE
{
margin: 2ex 0 2ex 2ex;
@@ -351,15 +422,23 @@ table.CALSTABLE
border: 2px solid #A7C6DF;
}
.table table tr:hover td,
.informaltable table tr:hover td,
table.CALSTABLE tr:hover td
{
background-color: #EFEFEF;
}
.table table td,
.informaltable table td,
table.CALSTABLE td {
background-color: #FFF;
}
.table table td,
.table table th,
.informaltable table td,
.informaltable table th,
table.CALSTABLE td,
table.CALSTABLE th {
border: 1px solid #A7C6DF;
@@ -374,11 +453,13 @@ table.WARNING {
max-width: 600px;
}
.caution,
table.CAUTION {
background-color: #F5F5DC;
border-color: #DEDFA7;
}
.warning,
table.WARNING {
background-color: #FFD7D7;
border-color: #DF421E;
@@ -447,23 +528,7 @@ a:hover {
color:#666;
}
#docContainer code.function tt,
#docContainer code.FUNCTION tt {
font-size: 1em;
}
div.header {
color: #444;
margin-top: 5px;
}
div.footer {
text-align: center;
background-image: url("/resources/footerl.png"), url("/resources/footerr.png"), url("/resources/footerc.png");
background-position: left top, right top, center top;
background-repeat: no-repeat, no-repeat, repeat-x;
padding-top: 45px;
}
img {
border-style: none;
}

2
log.c

@@ -85,7 +85,7 @@ _stderr_log_with_level(const char *level_name, int level, const char *fmt, va_li
time(&t);
tm = localtime(&t);
strftime(buf, 100, "[%Y-%m-%d %H:%M:%S]", tm);
strftime(buf, sizeof(buf), "[%Y-%m-%d %H:%M:%S]", tm);
fprintf(stderr, "%s [%s] ", buf, level_name);
}
else
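
Review note (illustrative, not part of the commit): the hunk above replaces the literal 100 with sizeof(buf) in the strftime() call, so the size argument follows the buffer's declaration instead of repeating it. The sketch below shows the same idea in isolation. It assumes a hypothetical helper, format_log_timestamp(), which does not exist in repmgr, and it relies on buf being a true array in scope (sizeof on a pointer parameter would give the pointer size instead).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper (not repmgr code): write a "[YYYY-MM-DD HH:MM:SS]"
 * timestamp for "t" into "out". Returns the number of characters written,
 * or 0 if the timestamp could not be produced or does not fit in "out".
 */
size_t
format_log_timestamp(time_t t, char *out, size_t out_size)
{
	char		buf[100];
	struct tm  *tm;
	size_t		len;

	assert(out != NULL && out_size > 0);
	out[0] = '\0';

	tm = localtime(&t);
	if (tm == NULL)
		return 0;

	/* buf is an array in this scope, so sizeof(buf) is its full size */
	len = strftime(buf, sizeof(buf), "[%Y-%m-%d %H:%M:%S]", tm);

	/* strftime returns 0 when the result does not fit in the buffer */
	if (len == 0 || len >= out_size)
		return 0;

	memcpy(out, buf, len + 1);	/* copy the terminating NUL as well */
	return len;
}
```

A caller would pass its own buffer and size, for example format_log_timestamp(time(NULL), line, sizeof(line)), and treat a return value of 0 as "no timestamp available".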

View File

@@ -78,8 +78,6 @@ CREATE VIEW repmgr.show_nodes AS
LEFT JOIN repmgr.nodes un
ON un.node_id = n.upstream_node_id;
/* XXX update upgrade scripts! */
CREATE TABLE repmgr.voting_term (
term INT NOT NULL
);

View File

@@ -78,8 +78,6 @@ CREATE VIEW repmgr.show_nodes AS
LEFT JOIN repmgr.nodes un
ON un.node_id = n.upstream_node_id;
/* XXX update upgrade scripts! */
CREATE TABLE repmgr.voting_term (
term INT NOT NULL
);

View File

@@ -78,8 +78,6 @@ CREATE VIEW repmgr.show_nodes AS
LEFT JOIN repmgr.nodes un
ON un.node_id = n.upstream_node_id;
/* XXX update upgrade scripts! */
CREATE TABLE repmgr.voting_term (
term INT NOT NULL
);

19
repmgr--4.3--4.4.sql Normal file

@@ -0,0 +1,19 @@
-- complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION repmgr" to load this file. \quit
DROP FUNCTION set_upstream_last_seen();
CREATE FUNCTION set_upstream_last_seen(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'set_upstream_last_seen'
LANGUAGE C STRICT;
CREATE FUNCTION get_upstream_node_id()
RETURNS INT
AS 'MODULE_PATHNAME', 'get_upstream_node_id'
LANGUAGE C STRICT;
CREATE FUNCTION set_upstream_node_id(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'set_upstream_node_id'
LANGUAGE C STRICT;

View File

@@ -78,8 +78,6 @@ CREATE VIEW repmgr.show_nodes AS
LEFT JOIN repmgr.nodes un
ON un.node_id = n.upstream_node_id;
/* XXX update upgrade scripts! */
CREATE TABLE repmgr.voting_term (
term INT NOT NULL
);

224
repmgr--4.4.sql Normal file

@@ -0,0 +1,224 @@
-- complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION repmgr" to load this file. \quit
CREATE TABLE repmgr.nodes (
node_id INTEGER PRIMARY KEY,
upstream_node_id INTEGER NULL REFERENCES nodes (node_id) DEFERRABLE,
active BOOLEAN NOT NULL DEFAULT TRUE,
node_name TEXT NOT NULL,
type TEXT NOT NULL CHECK (type IN('primary','standby','witness','bdr')),
location TEXT NOT NULL DEFAULT 'default',
priority INT NOT NULL DEFAULT 100,
conninfo TEXT NOT NULL,
repluser VARCHAR(63) NOT NULL,
slot_name TEXT NULL,
config_file TEXT NOT NULL
);
CREATE TABLE repmgr.events (
node_id INTEGER NOT NULL,
event TEXT NOT NULL,
successful BOOLEAN NOT NULL DEFAULT TRUE,
event_timestamp TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
details TEXT NULL
);
DO $repmgr$
DECLARE
DECLARE server_version_num INT;
BEGIN
SELECT setting
FROM pg_catalog.pg_settings
WHERE name = 'server_version_num'
INTO server_version_num;
IF server_version_num >= 90400 THEN
EXECUTE $repmgr_func$
CREATE TABLE repmgr.monitoring_history (
primary_node_id INTEGER NOT NULL,
standby_node_id INTEGER NOT NULL,
last_monitor_time TIMESTAMP WITH TIME ZONE NOT NULL,
last_apply_time TIMESTAMP WITH TIME ZONE,
last_wal_primary_location PG_LSN NOT NULL,
last_wal_standby_location PG_LSN,
replication_lag BIGINT NOT NULL,
apply_lag BIGINT NOT NULL
)
$repmgr_func$;
ELSE
EXECUTE $repmgr_func$
CREATE TABLE repmgr.monitoring_history (
primary_node_id INTEGER NOT NULL,
standby_node_id INTEGER NOT NULL,
last_monitor_time TIMESTAMP WITH TIME ZONE NOT NULL,
last_apply_time TIMESTAMP WITH TIME ZONE,
last_wal_primary_location TEXT NOT NULL,
last_wal_standby_location TEXT,
replication_lag BIGINT NOT NULL,
apply_lag BIGINT NOT NULL
)
$repmgr_func$;
END IF;
END$repmgr$;
CREATE INDEX idx_monitoring_history_time
ON repmgr.monitoring_history (last_monitor_time, standby_node_id);
CREATE VIEW repmgr.show_nodes AS
SELECT n.node_id,
n.node_name,
n.active,
n.upstream_node_id,
un.node_name AS upstream_node_name,
n.type,
n.priority,
n.conninfo
FROM repmgr.nodes n
LEFT JOIN repmgr.nodes un
ON un.node_id = n.upstream_node_id;
CREATE TABLE repmgr.voting_term (
term INT NOT NULL
);
CREATE UNIQUE INDEX voting_term_restrict
ON repmgr.voting_term ((TRUE));
CREATE RULE voting_term_delete AS
ON DELETE TO repmgr.voting_term
DO INSTEAD NOTHING;
/* ================= */
/* repmgrd functions */
/* ================= */
/* monitoring functions */
CREATE FUNCTION set_local_node_id(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'set_local_node_id'
LANGUAGE C STRICT;
CREATE FUNCTION get_local_node_id()
RETURNS INT
AS 'MODULE_PATHNAME', 'get_local_node_id'
LANGUAGE C STRICT;
CREATE FUNCTION standby_set_last_updated()
RETURNS TIMESTAMP WITH TIME ZONE
AS 'MODULE_PATHNAME', 'standby_set_last_updated'
LANGUAGE C STRICT;
CREATE FUNCTION standby_get_last_updated()
RETURNS TIMESTAMP WITH TIME ZONE
AS 'MODULE_PATHNAME', 'standby_get_last_updated'
LANGUAGE C STRICT;
CREATE FUNCTION set_upstream_last_seen(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'set_upstream_last_seen'
LANGUAGE C STRICT;
CREATE FUNCTION get_upstream_last_seen()
RETURNS INT
AS 'MODULE_PATHNAME', 'get_upstream_last_seen'
LANGUAGE C STRICT;
CREATE FUNCTION get_upstream_node_id()
RETURNS INT
AS 'MODULE_PATHNAME', 'get_upstream_node_id'
LANGUAGE C STRICT;
CREATE FUNCTION set_upstream_node_id(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'set_upstream_node_id'
LANGUAGE C STRICT;
/* failover functions */
CREATE FUNCTION notify_follow_primary(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'notify_follow_primary'
LANGUAGE C STRICT;
CREATE FUNCTION get_new_primary()
RETURNS INT
AS 'MODULE_PATHNAME', 'get_new_primary'
LANGUAGE C STRICT;
CREATE FUNCTION reset_voting_status()
RETURNS VOID
AS 'MODULE_PATHNAME', 'reset_voting_status'
LANGUAGE C STRICT;
CREATE FUNCTION am_bdr_failover_handler(INT)
RETURNS BOOL
AS 'MODULE_PATHNAME', 'am_bdr_failover_handler'
LANGUAGE C STRICT;
CREATE FUNCTION unset_bdr_failover_handler()
RETURNS VOID
AS 'MODULE_PATHNAME', 'unset_bdr_failover_handler'
LANGUAGE C STRICT;
CREATE FUNCTION get_repmgrd_pid()
RETURNS INT
AS 'MODULE_PATHNAME', 'get_repmgrd_pid'
LANGUAGE C STRICT;
CREATE FUNCTION get_repmgrd_pidfile()
RETURNS TEXT
AS 'MODULE_PATHNAME', 'get_repmgrd_pidfile'
LANGUAGE C STRICT;
CREATE FUNCTION set_repmgrd_pid(INT, TEXT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'set_repmgrd_pid'
LANGUAGE C STRICT;
CREATE FUNCTION repmgrd_is_running()
RETURNS BOOL
AS 'MODULE_PATHNAME', 'repmgrd_is_running'
LANGUAGE C STRICT;
CREATE FUNCTION repmgrd_pause(BOOL)
RETURNS VOID
AS 'MODULE_PATHNAME', 'repmgrd_pause'
LANGUAGE C STRICT;
CREATE FUNCTION repmgrd_is_paused()
RETURNS BOOL
AS 'MODULE_PATHNAME', 'repmgrd_is_paused'
LANGUAGE C STRICT;
CREATE FUNCTION get_wal_receiver_pid()
RETURNS INT
AS 'MODULE_PATHNAME', 'get_wal_receiver_pid'
LANGUAGE C STRICT;
/* views */
CREATE VIEW repmgr.replication_status AS
SELECT m.primary_node_id, m.standby_node_id, n.node_name AS standby_name,
n.type AS node_type, n.active, last_monitor_time,
CASE WHEN n.type='standby' THEN m.last_wal_primary_location ELSE NULL END AS last_wal_primary_location,
m.last_wal_standby_location,
CASE WHEN n.type='standby' THEN pg_catalog.pg_size_pretty(m.replication_lag) ELSE NULL END AS replication_lag,
CASE WHEN n.type='standby' THEN
CASE WHEN replication_lag > 0 THEN age(now(), m.last_apply_time) ELSE '0'::INTERVAL END
ELSE NULL
END AS replication_time_lag,
CASE WHEN n.type='standby' THEN pg_catalog.pg_size_pretty(m.apply_lag) ELSE NULL END AS apply_lag,
AGE(NOW(), CASE WHEN pg_catalog.pg_is_in_recovery() THEN repmgr.standby_get_last_updated() ELSE m.last_monitor_time END) AS communication_time_lag
FROM repmgr.monitoring_history m
JOIN repmgr.nodes n ON m.standby_node_id = n.node_id
WHERE (m.standby_node_id, m.last_monitor_time) IN (
SELECT m1.standby_node_id, MAX(m1.last_monitor_time)
FROM repmgr.monitoring_history m1 GROUP BY 1
);

View File

@@ -93,6 +93,15 @@ do_bdr_register(void)
exit(ERR_BAD_CONFIG);
}
if (get_bdr_version_num() > 2)
{
log_error(_("\"repmgr bdr register\" is for BDR 2.x only"));
PQfinish(conn);
pfree(dbname);
exit(ERR_BAD_CONFIG);
}
/* check for a matching BDR node */
{
PQExpBufferData bdr_local_node_name;
@@ -216,7 +225,7 @@ do_bdr_register(void)
ExtensionStatus other_node_extension_status = REPMGR_UNKNOWN;
/* skip the local node */
if (strncmp(node_info.node_name, bdr_cell->node_info->node_name, MAXLEN) == 0)
if (strncmp(node_info.node_name, bdr_cell->node_info->node_name, sizeof(node_info.node_name)) == 0)
{
continue;
}
@@ -304,9 +313,9 @@ do_bdr_register(void)
node_info.active = true;
node_info.priority = config_file_options.priority;
strncpy(node_info.node_name, config_file_options.node_name, MAXLEN);
strncpy(node_info.location, config_file_options.location, MAXLEN);
strncpy(node_info.conninfo, config_file_options.conninfo, MAXLEN);
strncpy(node_info.node_name, config_file_options.node_name, sizeof(node_info.node_name));
strncpy(node_info.location, config_file_options.location, sizeof(node_info.location));
strncpy(node_info.conninfo, config_file_options.conninfo, sizeof(node_info.conninfo));
if (record_status == RECORD_FOUND)
{
@@ -330,7 +339,7 @@ do_bdr_register(void)
* name set when the node was registered.
*/
if (strncmp(node_info.node_name, config_file_options.node_name, MAXLEN) != 0)
if (strncmp(node_info.node_name, config_file_options.node_name, sizeof(node_info.node_name)) != 0)
{
log_error(_("a record for node %i is already registered with node_name \"%s\""),
config_file_options.node_id, node_info.node_name);
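
Review note (illustrative, not part of the commit): the hunks above replace MAXLEN with sizeof() of the destination field in the strncpy() and strncmp() calls, which ties each bound to the field actually being written or compared. One general property of strncpy() is worth keeping in mind with this pattern: when the source is at least as long as the bound, the destination is not NUL-terminated. The sketch below shows a bounded copy that always terminates. The struct, the field size and the function name are hypothetical stand-ins and are not taken from repmgr.

```c
#include <assert.h>
#include <string.h>

#define DEMO_NAME_LEN 8			/* hypothetical field size, for illustration only */

typedef struct
{
	char		node_name[DEMO_NAME_LEN];
} demo_node_info;

/*
 * Hypothetical helper (not repmgr code): copy "name" into the fixed-size
 * field, truncating if necessary and always leaving a terminated string.
 */
void
demo_set_node_name(demo_node_info *node, const char *name)
{
	assert(node != NULL && name != NULL);

	/* bound the copy by the destination field, as the hunk does */
	strncpy(node->node_name, name, sizeof(node->node_name));

	/* strncpy leaves no terminator if "name" fills the whole field */
	node->node_name[sizeof(node->node_name) - 1] = '\0';
}
```

Whether the explicit terminator is needed in a given caller depends on how the destination struct was initialised; a zero-initialised struct copied with a bound of sizeof(field) - 1 would achieve the same result.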

View File

@@ -24,7 +24,7 @@
#include "repmgr-client-global.h"
#include "repmgr-action-cluster.h"
#define SHOW_HEADER_COUNT 8
#define SHOW_HEADER_COUNT 9
typedef enum
{
@@ -35,6 +35,7 @@ typedef enum
SHOW_UPSTREAM_NAME,
SHOW_LOCATION,
SHOW_PRIORITY,
SHOW_TIMELINE_ID,
SHOW_CONNINFO
} ShowHeader;
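
Review note (illustrative, not part of the commit): adding SHOW_TIMELINE_ID to the enum requires bumping SHOW_HEADER_COUNT from 8 to 9 by hand in the hunk before it. A common alternative, sketched below under the assumption of a hypothetical, shortened column list (the DEMO_ names are placeholders, not repmgr's enum), is to let a trailing sentinel member carry the count and to check at compile time that per-column data stays in step.

```c
#include <assert.h>

/* Hypothetical column list; the names are placeholders, not repmgr's enum */
typedef enum
{
	DEMO_COL_NAME = 0,
	DEMO_COL_PRIORITY,
	DEMO_COL_TIMELINE_ID,
	DEMO_COL_CONNINFO,
	DEMO_COL_COUNT				/* sentinel: equals the number of columns */
} DemoColumn;

static const char *const demo_titles[] = {
	"Name",
	"Priority",
	"Timeline",
	"Connection string"
};

/* C11: fail the build if a column is added without a matching title */
_Static_assert(sizeof(demo_titles) / sizeof(demo_titles[0]) == DEMO_COL_COUNT,
			   "demo_titles must have one entry per DemoColumn");

/* Return the display title for a column; "col" must be a real column */
const char *
demo_column_title(DemoColumn col)
{
	assert((int) col >= 0 && col < DEMO_COL_COUNT);
	return demo_titles[col];
}
```

With this layout, inserting a new member before the sentinel updates the count automatically, and a forgotten title entry becomes a compile error rather than a runtime surprise. It requires a C11 compiler for _Static_assert.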
@@ -64,6 +65,7 @@ static void cube_set_node_status(t_node_status_cube **cube, int n, int node_id,
* CLUSTER SHOW
*
* Parameters:
* --compact
* --csv
*/
void
@@ -112,9 +114,15 @@ do_cluster_show(void)
strncpy(headers_show[SHOW_LOCATION].title, _("Location"), MAXLEN);
if (runtime_options.compact == true)
{
strncpy(headers_show[SHOW_PRIORITY].title, _("Prio."), MAXLEN);
strncpy(headers_show[SHOW_TIMELINE_ID].title, _("TLI"), MAXLEN);
}
else
{
strncpy(headers_show[SHOW_PRIORITY].title, _("Priority"), MAXLEN);
strncpy(headers_show[SHOW_TIMELINE_ID].title, _("Timeline"), MAXLEN);
}
strncpy(headers_show[SHOW_CONNINFO].title, _("Connection string"), MAXLEN);
@@ -127,6 +135,16 @@ do_cluster_show(void)
{
headers_show[i].display = true;
/* Don't display timeline on pre-9.6 clusters */
if (i == SHOW_TIMELINE_ID)
{
if (PQserverVersion(conn) < 90600)
{
headers_show[i].display = false;
}
}
/* if --compact provided, don't display conninfo */
if (runtime_options.compact == true)
{
if (i == SHOW_CONNINFO)
@@ -135,34 +153,38 @@ do_cluster_show(void)
}
}
if (headers_show[i].display == true)
{
headers_show[i].max_length = strlen(headers_show[i].title);
}
}
/*
* TODO: count nodes marked as "? unreachable" and add a hint about
* the other cluster commands for better determining whether
* unreachable.
*/
for (cell = nodes.head; cell; cell = cell->next)
{
PQExpBufferData details;
PQExpBufferData node_status;
PQExpBufferData upstream;
PQExpBufferData buf;
cell->node_info->replication_info = palloc0(sizeof(ReplInfo));
if (cell->node_info->replication_info == NULL)
{
log_error(_("unable to allocate memory"));
exit(ERR_INTERNAL);
}
init_replication_info(cell->node_info->replication_info);
cell->node_info->conn = establish_db_connection_quiet(cell->node_info->conninfo);
if (PQstatus(cell->node_info->conn) == CONNECTION_OK)
if (PQstatus(cell->node_info->conn) != CONNECTION_OK)
{
cell->node_info->node_status = NODE_STATUS_UP;
cell->node_info->recovery_type = get_recovery_type(cell->node_info->conn);
}
else
{
/* check if node is reachable, but just not letting us in */
if (is_server_available(cell->node_info->conninfo))
cell->node_info->node_status = NODE_STATUS_REJECTED;
else
cell->node_info->node_status = NODE_STATUS_DOWN;
cell->node_info->recovery_type = RECTYPE_UNKNOWN;
connection_error_found = true;
if (runtime_options.verbose)
@@ -181,235 +203,25 @@ do_cluster_show(void)
cell->node_info->node_name, cell->node_info->node_id);
}
}
initPQExpBuffer(&details);
/*
* TODO: count nodes marked as "? unreachable" and add a hint about
* the other cluster commands for better determining whether
* unreachable.
*/
switch (cell->node_info->type)
else
{
case PRIMARY:
{
/* node is reachable */
if (cell->node_info->node_status == NODE_STATUS_UP)
{
if (cell->node_info->active == true)
{
switch (cell->node_info->recovery_type)
{
case RECTYPE_PRIMARY:
appendPQExpBufferStr(&details, "* running");
break;
case RECTYPE_STANDBY:
appendPQExpBufferStr(&details, "! running as standby");
item_list_append_format(&warnings,
"node \"%s\" (ID: %i) is registered as primary but running as standby",
cell->node_info->node_name, cell->node_info->node_id);
break;
case RECTYPE_UNKNOWN:
appendPQExpBufferStr(&details, "! unknown");
item_list_append_format(&warnings,
"node \"%s\" (ID: %i) has unknown replication status",
cell->node_info->node_name, cell->node_info->node_id);
break;
}
}
else
{
if (cell->node_info->recovery_type == RECTYPE_PRIMARY)
{
appendPQExpBufferStr(&details, "! running");
item_list_append_format(&warnings,
"node \"%s\" (ID: %i) is running but the repmgr node record is inactive",
cell->node_info->node_name, cell->node_info->node_id);
}
else
{
appendPQExpBufferStr(&details, "! running as standby");
item_list_append_format(&warnings,
"node \"%s\" (ID: %i) is registered as an inactive primary but running as standby",
cell->node_info->node_name, cell->node_info->node_id);
}
}
}
/* node is up but cannot connect */
else if (cell->node_info->node_status == NODE_STATUS_REJECTED)
{
if (cell->node_info->active == true)
{
appendPQExpBufferStr(&details, "? running");
}
else
{
appendPQExpBufferStr(&details, "! running");
error_found = true;
}
}
/* node is unreachable */
else
{
/* node is unreachable but marked active */
if (cell->node_info->active == true)
{
appendPQExpBufferStr(&details, "? unreachable");
item_list_append_format(&warnings,
"node \"%s\" (ID: %i) is registered as an active primary but is unreachable",
cell->node_info->node_name, cell->node_info->node_id);
}
/* node is unreachable and marked as inactive */
else
{
appendPQExpBufferStr(&details, "- failed");
error_found = true;
}
}
}
break;
case STANDBY:
{
/* node is reachable */
if (cell->node_info->node_status == NODE_STATUS_UP)
{
if (cell->node_info->active == true)
{
switch (cell->node_info->recovery_type)
{
case RECTYPE_STANDBY:
appendPQExpBufferStr(&details, " running");
break;
case RECTYPE_PRIMARY:
appendPQExpBufferStr(&details, "! running as primary");
item_list_append_format(&warnings,
"node \"%s\" (ID: %i) is registered as standby but running as primary",
cell->node_info->node_name, cell->node_info->node_id);
break;
case RECTYPE_UNKNOWN:
appendPQExpBufferStr(&details, "! unknown");
item_list_append_format(
&warnings,
"node \"%s\" (ID: %i) has unknown replication status",
cell->node_info->node_name, cell->node_info->node_id);
break;
}
}
else
{
if (cell->node_info->recovery_type == RECTYPE_STANDBY)
{
appendPQExpBufferStr(&details, "! running");
item_list_append_format(&warnings,
"node \"%s\" (ID: %i) is running but the repmgr node record is inactive",
cell->node_info->node_name, cell->node_info->node_id);
}
else
{
appendPQExpBufferStr(&details, "! running as primary");
item_list_append_format(&warnings,
"node \"%s\" (ID: %i) is running as primary but the repmgr node record is inactive",
cell->node_info->node_name, cell->node_info->node_id);
}
}
/* warn about issue with paused WAL replay */
if (is_wal_replay_paused(cell->node_info->conn, true))
{
item_list_append_format(&warnings,
_("WAL replay is paused on node \"%s\" (ID: %i) with WAL replay pending; this node cannot be manually promoted until WAL replay is resumed"),
cell->node_info->node_name, cell->node_info->node_id);
}
}
/* node is up but cannot connect */
else if (cell->node_info->node_status == NODE_STATUS_REJECTED)
{
if (cell->node_info->active == true)
{
appendPQExpBufferStr(&details, "? running");
}
else
{
appendPQExpBufferStr(&details, "! running");
error_found = true;
}
}
/* node is unreachable */
else
{
/* node is unreachable but marked active */
if (cell->node_info->active == true)
{
appendPQExpBufferStr(&details, "? unreachable");
item_list_append_format(&warnings,
"node \"%s\" (ID: %i) is registered as an active standby but is unreachable",
cell->node_info->node_name, cell->node_info->node_id);
}
else
{
appendPQExpBufferStr(&details, "- failed");
error_found = true;
}
}
}
break;
case WITNESS:
case BDR:
{
/* node is reachable */
if (cell->node_info->node_status == NODE_STATUS_UP)
{
if (cell->node_info->active == true)
{
appendPQExpBufferStr(&details, "* running");
}
else
{
appendPQExpBufferStr(&details, "! running");
error_found = true;
}
}
/* node is up but cannot connect */
else if (cell->node_info->node_status == NODE_STATUS_REJECTED)
{
if (cell->node_info->active == true)
{
appendPQExpBufferStr(&details, "? rejected");
}
else
{
appendPQExpBufferStr(&details, "! failed");
error_found = true;
}
}
/* node is unreachable */
else
{
if (cell->node_info->active == true)
{
appendPQExpBufferStr(&details, "? unreachable");
}
else
{
appendPQExpBufferStr(&details, "- failed");
error_found = true;
}
}
}
break;
case UNKNOWN:
{
/* this should never happen */
appendPQExpBufferStr(&details, "? unknown node type");
error_found = true;
}
break;
/* NOP on pre-9.6 servers */
cell->node_info->replication_info->timeline_id = get_node_timeline(cell->node_info->conn);
}
strncpy(cell->node_info->details, details.data, MAXLEN);
termPQExpBuffer(&details);
initPQExpBuffer(&node_status);
initPQExpBuffer(&upstream);
if (format_node_status(cell->node_info, &node_status, &upstream, &warnings) == true)
error_found = true;
snprintf(cell->node_info->details, sizeof(cell->node_info->details),
"%s", node_status.data);
snprintf(cell->node_info->upstream_node_name, sizeof(cell->node_info->upstream_node_name),
"%s", upstream.data);
termPQExpBuffer(&node_status);
termPQExpBuffer(&upstream);
PQfinish(cell->node_info->conn);
cell->node_info->conn = NULL;
@@ -422,6 +234,7 @@ do_cluster_show(void)
headers_show[SHOW_ROLE].cur_length = strlen(get_node_type_string(cell->node_info->type));
headers_show[SHOW_NAME].cur_length = strlen(cell->node_info->node_name);
headers_show[SHOW_STATUS].cur_length = strlen(cell->node_info->details);
headers_show[SHOW_UPSTREAM_NAME].cur_length = strlen(cell->node_info->upstream_node_name);
initPQExpBuffer(&buf);
@@ -431,7 +244,18 @@ do_cluster_show(void)
headers_show[SHOW_LOCATION].cur_length = strlen(cell->node_info->location);
if (cell->node_info->replication_info->timeline_id == UNKNOWN_TIMELINE_ID)
{
/* display "?" */
headers_show[SHOW_TIMELINE_ID].cur_length = 1;
}
else
{
initPQExpBuffer(&buf);
appendPQExpBuffer(&buf, "%i", cell->node_info->replication_info->timeline_id);
headers_show[SHOW_TIMELINE_ID].cur_length = strlen(buf.data);
termPQExpBuffer(&buf);
}
headers_show[SHOW_CONNINFO].cur_length = strlen(cell->node_info->conninfo);
@@ -496,6 +320,14 @@ do_cluster_show(void)
printf("| %-*s ", headers_show[SHOW_LOCATION].max_length, cell->node_info->location);
printf("| %-*i ", headers_show[SHOW_PRIORITY].max_length, cell->node_info->priority);
if (headers_show[SHOW_TIMELINE_ID].display == true)
{
if (cell->node_info->replication_info->timeline_id == UNKNOWN_TIMELINE_ID)
printf("| %-*c ", headers_show[SHOW_TIMELINE_ID].max_length, '?');
else
printf("| %-*i ", headers_show[SHOW_TIMELINE_ID].max_length, (int)cell->node_info->replication_info->timeline_id);
}
if (headers_show[SHOW_CONNINFO].display == true)
{
printf("| %-*s", headers_show[SHOW_CONNINFO].max_length, cell->node_info->conninfo);
@@ -550,6 +382,7 @@ do_cluster_show(void)
* --node-[id|name]
* --event
* --csv
* --compact
*/
void
@@ -595,11 +428,11 @@ do_cluster_event(void)
strncpy(headers_event[EV_DETAILS].title, _("Details"), MAXLEN);
/*
* If --terse or --csv provided, simply omit the "Details" column.
* If --compact or --csv provided, simply omit the "Details" column.
* In --csv mode we'd need to quote/escape the contents of the "Details" column,
* which is doable but which will remain a TODO for now.
*/
if (runtime_options.terse == true || runtime_options.output_mode == OM_CSV)
if (runtime_options.compact == true || runtime_options.output_mode == OM_CSV)
column_count --;
for (i = 0; i < column_count; i++)
@@ -1063,7 +896,9 @@ build_cluster_matrix(t_node_matrix_rec ***matrix_rec_dest, int *name_length, Ite
matrix_rec_list[i] = (t_node_matrix_rec *) pg_malloc0(sizeof(t_node_matrix_rec));
matrix_rec_list[i]->node_id = cell->node_info->node_id;
strncpy(matrix_rec_list[i]->node_name, cell->node_info->node_name, MAXLEN);
strncpy(matrix_rec_list[i]->node_name,
cell->node_info->node_name,
sizeof(matrix_rec_list[i]->node_name));
/*
* Find the maximum length of a node name
@@ -1278,7 +1113,7 @@ build_cluster_crosscheck(t_node_status_cube ***dest_cube, int *name_length, Item
cube[h] = (t_node_status_cube *) pg_malloc(sizeof(t_node_status_cube));
cube[h]->node_id = cell->node_info->node_id;
strncpy(cube[h]->node_name, cell->node_info->node_name, MAXLEN);
strncpy(cube[h]->node_name, cell->node_info->node_name, sizeof(cube[h]->node_name));
/*
* Find the maximum length of a node name
@@ -1300,7 +1135,7 @@ build_cluster_crosscheck(t_node_status_cube ***dest_cube, int *name_length, Item
/* we don't need the name here */
cube[h]->matrix_list_rec[i]->node_name[0] = '\0';
cube[h]->matrix_list_rec[i]->node_status_list = (t_node_status_rec **) pg_malloc0(sizeof(t_node_status_rec) * nodes.node_count);
cube[h]->matrix_list_rec[i]->node_status_list = (t_node_status_rec **) pg_malloc0(sizeof(t_node_status_rec *) * nodes.node_count);
j = 0;
@@ -1628,6 +1463,7 @@ do_cluster_help(void)
printf(_(" --event filter specific event\n"));
printf(_(" --node-id restrict entries to node with this ID\n"));
printf(_(" --node-name restrict entries to node with this name\n"));
printf(_(" --compact omit \"Details\" column\n"));
printf(_(" --csv emit output as CSV\n"));
puts("");
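
The `build_cluster_crosscheck()` hunk above changes the element size passed to `pg_malloc0()` from `sizeof(t_node_status_rec)` to `sizeof(t_node_status_rec *)`, because `node_status_list` is an array of pointers to records rather than an array of records. A minimal sketch of the same pattern with plain `calloc()`, assuming a hypothetical `demo_status_rec` type in place of repmgr's own; writing `sizeof(*list)` lets the compiler derive the element size from the pointer type:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical record type, standing in for t_node_status_rec. */
typedef struct
{
	int node_id;
	int node_status;
} demo_status_rec;

/* Free the first "count" records and then the pointer array itself. */
static void
free_status_list(demo_status_rec **list, size_t count)
{
	size_t i;

	if (list == NULL)
		return;

	for (i = 0; i < count; i++)
		free(list[i]);

	free(list);
}

/*
 * Allocate a zeroed array of "count" record pointers, then one zeroed
 * record per slot. Returns NULL if any allocation fails.
 */
static demo_status_rec **
alloc_status_list(size_t count)
{
	demo_status_rec **list;
	size_t i;

	assert(count > 0);

	/* sizeof(*list) is the size of one pointer, not of one record */
	list = calloc(count, sizeof(*list));
	if (list == NULL)
		return NULL;

	for (i = 0; i < count; i++)
	{
		/* sizeof(*list[i]) is the size of one record */
		list[i] = calloc(1, sizeof(*list[i]));
		if (list[i] == NULL)
		{
			free_status_list(list, i);
			return NULL;
		}
		list[i]->node_id = (int) i + 1;
	}

	return list;
}
```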


@@ -30,14 +30,14 @@ typedef struct
typedef struct
{
int node_id;
char node_name[MAXLEN];
char node_name[NAMEDATALEN];
t_node_status_rec **node_status_list;
} t_node_matrix_rec;
typedef struct
{
int node_id;
char node_name[MAXLEN];
char node_name[NAMEDATALEN];
t_node_matrix_rec **matrix_list_rec;
} t_node_status_cube;


@@ -43,15 +43,17 @@ typedef enum
STATUS_ID = 0,
STATUS_NAME,
STATUS_ROLE,
STATUS_PRIORITY,
STATUS_PG,
STATUS_RUNNING,
STATUS_UPSTREAM_NAME,
STATUS_LOCATION,
STATUS_PRIORITY,
STATUS_REPMGRD,
STATUS_PID,
STATUS_PAUSED,
STATUS_UPSTREAM_LAST_SEEN
} StatusHeader;
#define STATUS_HEADER_COUNT 9
#define STATUS_HEADER_COUNT 11
struct ColHeader headers_status[STATUS_HEADER_COUNT];
@@ -91,14 +93,17 @@ do_daemon_status(void)
strncpy(headers_status[STATUS_ID].title, _("ID"), MAXLEN);
strncpy(headers_status[STATUS_NAME].title, _("Name"), MAXLEN);
strncpy(headers_status[STATUS_ROLE].title, _("Role"), MAXLEN);
strncpy(headers_status[STATUS_PG].title, _("Status"), MAXLEN);
strncpy(headers_status[STATUS_UPSTREAM_NAME].title, _("Upstream"), MAXLEN);
/* following only displayed with the --detail option */
strncpy(headers_status[STATUS_LOCATION].title, _("Location"), MAXLEN);
if (runtime_options.compact == true)
strncpy(headers_status[STATUS_PRIORITY].title, _("Prio."), MAXLEN);
else
strncpy(headers_status[STATUS_PRIORITY].title, _("Priority"), MAXLEN);
strncpy(headers_status[STATUS_PG].title, _("Status"), MAXLEN);
strncpy(headers_status[STATUS_RUNNING].title, _("repmgrd"), MAXLEN);
strncpy(headers_status[STATUS_REPMGRD].title, _("repmgrd"), MAXLEN);
strncpy(headers_status[STATUS_PID].title, _("PID"), MAXLEN);
strncpy(headers_status[STATUS_PAUSED].title, _("Paused?"), MAXLEN);
@@ -107,19 +112,25 @@ do_daemon_status(void)
else
strncpy(headers_status[STATUS_UPSTREAM_LAST_SEEN].title, _("Upstream last seen"), MAXLEN);
for (i = 0; i < STATUS_HEADER_COUNT; i++)
{
headers_status[i].max_length = strlen(headers_status[i].title);
headers_status[i].display = true;
}
if (runtime_options.detail == false)
{
headers_status[STATUS_LOCATION].display = false;
headers_status[STATUS_PRIORITY].display = false;
}
i = 0;
for (cell = nodes.head; cell; cell = cell->next)
{
int j;
PQExpBufferData buf;
PQExpBufferData node_status;
PQExpBufferData upstream;
repmgrd_info[i] = pg_malloc0(sizeof(RepmgrdInfo));
repmgrd_info[i]->node_id = cell->node_info->node_id;
@@ -135,6 +146,7 @@ do_daemon_status(void)
if (PQstatus(cell->node_info->conn) != CONNECTION_OK)
{
connection_error_found = true;
if (runtime_options.verbose)
@@ -155,13 +167,13 @@ do_daemon_status(void)
}
repmgrd_info[i]->pg_running = false;
maxlen_snprintf(repmgrd_info[i]->pg_running_text, "%s", _("not running"));
maxlen_snprintf(repmgrd_info[i]->repmgrd_running, "%s", _("n/a"));
maxlen_snprintf(repmgrd_info[i]->pid_text, "%s", _("n/a"));
}
else
{
maxlen_snprintf(repmgrd_info[i]->pg_running_text, "%s", _("running"));
cell->node_info->node_status = NODE_STATUS_UP;
cell->node_info->recovery_type = get_recovery_type(cell->node_info->conn);
repmgrd_info[i]->pid = repmgrd_get_pid(cell->node_info->conn);
@@ -217,22 +229,42 @@ do_daemon_status(void)
maxlen_snprintf(repmgrd_info[i]->upstream_last_seen_text, _("%i second(s) ago"), repmgrd_info[i]->upstream_last_seen);
}
}
PQfinish(cell->node_info->conn);
}
initPQExpBuffer(&node_status);
initPQExpBuffer(&upstream);
(void)format_node_status(cell->node_info, &node_status, &upstream, &warnings);
snprintf(repmgrd_info[i]->pg_running_text, sizeof(cell->node_info->details),
"%s", node_status.data);
snprintf(cell->node_info->upstream_node_name, sizeof(cell->node_info->upstream_node_name),
"%s", upstream.data);
termPQExpBuffer(&node_status);
termPQExpBuffer(&upstream);
PQfinish(cell->node_info->conn);
headers_status[STATUS_NAME].cur_length = strlen(cell->node_info->node_name);
headers_status[STATUS_ROLE].cur_length = strlen(get_node_type_string(cell->node_info->type));
headers_status[STATUS_PG].cur_length = strlen(repmgrd_info[i]->pg_running_text);
headers_status[STATUS_UPSTREAM_NAME].cur_length = strlen(cell->node_info->upstream_node_name);
initPQExpBuffer(&buf);
appendPQExpBuffer(&buf, "%i", cell->node_info->priority);
headers_status[STATUS_PRIORITY].cur_length = strlen(buf.data);
termPQExpBuffer(&buf);
if (runtime_options.detail == true)
{
PQExpBufferData buf;
headers_status[STATUS_LOCATION].cur_length = strlen(cell->node_info->location);
initPQExpBuffer(&buf);
appendPQExpBuffer(&buf, "%i", cell->node_info->priority);
headers_status[STATUS_PRIORITY].cur_length = strlen(buf.data);
termPQExpBuffer(&buf);
}
headers_status[STATUS_PID].cur_length = strlen(repmgrd_info[i]->pid_text);
headers_status[STATUS_RUNNING].cur_length = strlen(repmgrd_info[i]->repmgrd_running);
headers_status[STATUS_PG].cur_length = strlen(repmgrd_info[i]->pg_running_text);
headers_status[STATUS_REPMGRD].cur_length = strlen(repmgrd_info[i]->repmgrd_running);
headers_status[STATUS_UPSTREAM_LAST_SEEN].cur_length = strlen(repmgrd_info[i]->upstream_last_seen_text);
@@ -269,7 +301,7 @@ do_daemon_status(void)
paused = -1;
}
printf("%i,%s,%s,%i,%i,%i,%i,%i,%i\n",
printf("%i,%s,%s,%i,%i,%i,%i,%i,%i,%s\n",
cell->node_info->node_id,
cell->node_info->node_name,
get_node_type_string(cell->node_info->type),
@@ -280,17 +312,24 @@ do_daemon_status(void)
cell->node_info->priority,
repmgrd_info[i]->pid == UNKNOWN_PID
? -1
: repmgrd_info[i]->upstream_last_seen);
: repmgrd_info[i]->upstream_last_seen,
cell->node_info->location);
}
else
{
printf(" %-*i ", headers_status[STATUS_ID].max_length, cell->node_info->node_id);
printf("| %-*s ", headers_status[STATUS_NAME].max_length, cell->node_info->node_name);
printf("| %-*s ", headers_status[STATUS_ROLE].max_length, get_node_type_string(cell->node_info->type));
printf("| %-*i ", headers_status[STATUS_PRIORITY].max_length, cell->node_info->priority);
printf("| %-*s ", headers_status[STATUS_PG].max_length, repmgrd_info[i]->pg_running_text);
printf("| %-*s ", headers_status[STATUS_RUNNING].max_length, repmgrd_info[i]->repmgrd_running);
printf("| %-*s ", headers_status[STATUS_UPSTREAM_NAME].max_length, cell->node_info->upstream_node_name);
if (runtime_options.detail == true)
{
printf("| %-*s ", headers_status[STATUS_LOCATION].max_length, cell->node_info->location);
printf("| %-*i ", headers_status[STATUS_PRIORITY].max_length, cell->node_info->priority);
}
printf("| %-*s ", headers_status[STATUS_REPMGRD].max_length, repmgrd_info[i]->repmgrd_running);
printf("| %-*s ", headers_status[STATUS_PID].max_length, repmgrd_info[i]->pid_text);
if (repmgrd_info[i]->pid == UNKNOWN_PID)
@@ -441,7 +480,7 @@ _do_repmgr_pause(bool pause)
void
fetch_node_records(PGconn *conn, NodeInfoList *node_list)
{
bool success = get_all_node_records(conn, node_list);
bool success = get_all_node_records_with_upstream(conn, node_list);
if (success == false)
{
@@ -756,6 +795,7 @@ void do_daemon_help(void)
printf(_(" \"daemon status\" shows the status of repmgrd on each node in the cluster\n"));
puts("");
printf(_(" --csv emit output as CSV\n"));
printf(_(" --detail show additional detail\n"));
printf(_(" --verbose show text of database connection error messages\n"));
puts("");


@@ -294,7 +294,7 @@ do_node_status(void)
continue;
}
if (is_downstream_node_attached(conn, node_cell->node_info->node_name) == false)
if (is_downstream_node_attached(conn, node_cell->node_info->node_name) != NODE_ATTACHED)
{
missing_nodes_count++;
item_list_append_format(&missing_nodes,
@@ -1166,7 +1166,7 @@ do_node_check_downstream(PGconn *conn, OutputMode mode, CheckStatusList *list_ou
continue;
}
if (is_downstream_node_attached(conn, cell->node_info->node_name) == false)
if (is_downstream_node_attached(conn, cell->node_info->node_name) != NODE_ATTACHED)
{
missing_nodes_count++;
item_list_append_format(&missing_nodes,
@@ -1408,7 +1408,7 @@ do_node_check_replication_lag(PGconn *conn, OutputMode mode, t_node_info *node_i
break;
}
}
else if (lag_seconds < 0)
else if (lag_seconds == UNKNOWN_REPLICATION_LAG)
{
status = CHECK_STATUS_UNKNOWN;
@@ -2222,7 +2222,7 @@ do_node_rejoin(void)
{
RecoveryType upstream_recovery_type = get_recovery_type(upstream_conn);
log_error(_("unable to connect to current registered primary \"%s\" (node ID: %i)"),
log_error(_("unable to connect to current registered primary \"%s\" (ID: %i)"),
primary_node_record.node_name,
primary_node_record.node_id);
log_detail(_("registered primary node conninfo is: \"%s\""),
@@ -2476,6 +2476,8 @@ do_node_rejoin(void)
termPQExpBuffer(&slotdir_ent_path);
}
closedir(slotdir);
}
termPQExpBuffer(&slotdir_path);
}
@@ -2560,9 +2562,10 @@ do_node_rejoin(void)
for (; i < config_file_options.node_rejoin_timeout; i++)
{
success = is_downstream_node_attached(primary_conn, config_file_options.node_name);
NodeAttached node_attached = is_downstream_node_attached(primary_conn,
config_file_options.node_name);
if (success == true)
if (node_attached == NODE_ATTACHED)
{
log_verbose(LOG_INFO, _("node %i has attached to its upstream node"),
config_file_options.node_id);
@@ -2610,7 +2613,9 @@ do_node_rejoin(void)
else
{
/* -W/--no-wait provided - check once */
success = is_downstream_node_attached(primary_conn, config_file_options.node_name);
NodeAttached node_attached = is_downstream_node_attached(primary_conn, config_file_options.node_name);
if (node_attached == NODE_ATTACHED)
success = true;
}
/*
@@ -2784,6 +2789,7 @@ _do_node_archive_config(void)
arcdir = opendir(archive_dir.data);
/* always attempt to open the directory */
if (arcdir == NULL)
{
log_error(_("unable to open archive directory \"%s\""),
@@ -2829,10 +2835,11 @@ _do_node_archive_config(void)
termPQExpBuffer(&arcdir_ent_path);
}
closedir(arcdir);
}
closedir(arcdir);
/*
* extract list of config files from --config-files
*/
@@ -2850,7 +2857,7 @@ _do_node_archive_config(void)
{
int filename_len = j - i;
if (filename_len > MAXPGPATH)
if (filename_len >= MAXPGPATH)
filename_len = MAXPGPATH - 1;
strncpy(filenamebuf, runtime_options.config_files + i, filename_len);
@@ -3104,11 +3111,12 @@ copy_file(const char *src_file, const char *dest_file)
int a = 0;
ptr_old = fopen(src_file, "r");
ptr_new = fopen(dest_file, "w");
if (ptr_old == NULL)
return false;
ptr_new = fopen(dest_file, "w");
if (ptr_new == NULL)
{
fclose(ptr_old);
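
The `copy_file()` hunk above moves the `fopen()` of the destination to after the check on the source handle. With the old order, a missing source file still created or truncated the destination, and the early `return false` left the destination handle open. A minimal sketch of the reordered logic, assuming a hypothetical `copy_file_sketch()` that copies byte by byte (the body of repmgr's `copy_file()` is not fully shown in the diff and may differ):

```c
#include <assert.h>
#include <stdio.h>

/*
 * Copy src_file to dest_file. Returns 1 on success, 0 on failure.
 * The destination is opened only once the source is known to be
 * readable, so a missing source leaves the destination untouched.
 */
static int
copy_file_sketch(const char *src_file, const char *dest_file)
{
	FILE *in;
	FILE *out;
	int c;
	int ok;

	assert(src_file != NULL && dest_file != NULL);

	in = fopen(src_file, "r");
	if (in == NULL)
		return 0;

	out = fopen(dest_file, "w");
	if (out == NULL)
	{
		/* close the source so the handle is not leaked */
		fclose(in);
		return 0;
	}

	while ((c = fgetc(in)) != EOF)
	{
		if (fputc(c, out) == EOF)
			break;
	}

	ok = !ferror(in) && !ferror(out);

	fclose(in);

	/* a failed close can mean buffered data was not written */
	if (fclose(out) != 0)
		ok = 0;

	return ok;
}
```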


@@ -96,28 +96,6 @@ do_primary_register(void)
initialize_voting_term(conn);
/* Ensure there isn't another registered node which is primary */
primary_conn = get_primary_connection(conn, &current_primary_id, NULL);
if (primary_conn != NULL)
{
if (current_primary_id != config_file_options.node_id)
{
/*
* it's impossible to add a second primary to a streaming
* replication cluster
*/
log_error(_("there is already an active registered primary (node ID: %i) in this cluster"), current_primary_id);
PQfinish(primary_conn);
PQfinish(conn);
exit(ERR_BAD_CONFIG);
}
/* we've probably connected to ourselves */
PQfinish(primary_conn);
}
begin_transaction(conn);
/*
@@ -128,12 +106,32 @@ do_primary_register(void)
current_primary_id = get_primary_node_id(conn);
if (current_primary_id != NODE_NOT_FOUND && current_primary_id != config_file_options.node_id)
{
log_error(_("another node with id %i is already registered as primary"), current_primary_id);
log_detail(_("a streaming replication cluster can have only one primary node"));
log_debug("XXX %i", current_primary_id);
primary_conn = establish_primary_db_connection(conn, false);
rollback_transaction(conn);
PQfinish(conn);
exit(ERR_BAD_CONFIG);
if (PQstatus(primary_conn) == CONNECTION_OK)
{
if (get_recovery_type(primary_conn) == RECTYPE_PRIMARY)
{
log_error(_("there is already an active registered primary (ID: %i) in this cluster"),
current_primary_id);
log_detail(_("a streaming replication cluster can have only one primary node"));
log_hint(_("ensure this node is shut down before registering a new primary"));
PQfinish(primary_conn);
rollback_transaction(conn);
PQfinish(conn);
exit(ERR_BAD_CONFIG);
}
log_warning(_("node %i is registered as primary but running as a standby"),
current_primary_id);
PQfinish(primary_conn);
}
log_notice(_("setting node %i's node record to inactive"),
current_primary_id);
update_node_record_set_active(conn, current_primary_id, false);
}
/*
@@ -225,12 +223,12 @@ do_primary_register(void)
if (record_status == RECORD_FOUND)
{
log_notice(_("primary node record (id: %i) updated"),
log_notice(_("primary node record (ID: %i) updated"),
config_file_options.node_id);
}
else
{
log_notice(_("primary node record (id: %i) registered"),
log_notice(_("primary node record (ID: %i) registered"),
config_file_options.node_id);
}
@@ -276,7 +274,7 @@ do_primary_unregister(void)
if (get_primary_node_record(local_conn, &primary_node_info) == true)
{
log_detail(_("current primary registered as node %s (id: %i, conninfo: \"%s\")"),
log_detail(_("current primary registered as node \"%s\" (ID: %i, conninfo: \"%s\")"),
primary_node_info.node_name,
primary_node_info.node_id,
primary_node_info.conninfo);
@@ -318,7 +316,7 @@ do_primary_unregister(void)
if (target_node_info_ptr->type == WITNESS)
{
log_error(_("node %s (id: %i) is a witness server, unable to unregister"),
log_error(_("node \"%s\" (ID: %i) is a witness server, unable to unregister"),
target_node_info_ptr->node_name,
target_node_info_ptr->node_id);
if (target_node_info_ptr->type == STANDBY)
@@ -359,7 +357,7 @@ do_primary_unregister(void)
for (cell = downstream_nodes.head; cell; cell = cell->next)
{
appendPQExpBuffer(&detail,
" %s (id: %i)\n",
" %s (ID: %i)\n",
cell->node_info->node_name,
cell->node_info->node_id);
}
@@ -379,7 +377,7 @@ do_primary_unregister(void)
{
if (target_node_info_ptr->type != PRIMARY)
{
log_error(_("node %s (id: %i) is not a primary, unable to unregister"),
log_error(_("node \"%s\" (ID: %i) is not a primary, unable to unregister"),
target_node_info_ptr->node_name,
target_node_info_ptr->node_id);
if (target_node_info_ptr->type == STANDBY)
@@ -406,7 +404,7 @@ do_primary_unregister(void)
*/
if (target_node_info_ptr->type != PRIMARY)
{
log_error(_("node %s (ID: %i) is a %s, unable to unregister"),
log_error(_("node \"%s\" (ID: %i) is a %s, unable to unregister"),
target_node_info_ptr->node_name,
target_node_info_ptr->node_id,
get_node_type_string(target_node_info_ptr->type));
@@ -420,7 +418,7 @@ do_primary_unregister(void)
*/
else if (!runtime_options.force)
{
log_error(_("node %s (ID: %i) is running as a standby, unable to unregister"),
log_error(_("node \"%s\" (ID: %i) is running as a standby, unable to unregister"),
target_node_info_ptr->node_name,
target_node_info_ptr->node_id);
log_hint(_("the node can be registered as a standby with \"repmgr standby register --force\""));
@@ -445,7 +443,7 @@ do_primary_unregister(void)
if (primary_record_found == false)
{
log_error(_("node %s (ID: %i) is a primary node, but no primary node record found"),
log_error(_("node \"%s\" (ID: %i) is a primary node, but no primary node record found"),
target_node_info_ptr->node_name,
target_node_info_ptr->node_id);
log_hint(_("register this node as primary with \"repmgr primary register --force\""));
@@ -460,7 +458,7 @@ do_primary_unregister(void)
*/
if (primary_node_info.node_id == target_node_info_ptr->node_id)
{
log_error(_("node %s (ID: %i) is the current primary node, unable to unregister"),
log_error(_("node \"%s\" (ID: %i) is the current primary node, unable to unregister"),
target_node_info_ptr->node_name,
target_node_info_ptr->node_id);
@@ -482,7 +480,7 @@ do_primary_unregister(void)
{
if (!runtime_options.force)
{
log_error(_("node %s (ID: %i) is marked as active, unable to unregister"),
log_error(_("node \"%s\" (ID: %i) is marked as active, unable to unregister"),
target_node_info_ptr->node_name,
target_node_info_ptr->node_id);
log_hint(_("run \"repmgr primary unregister --force\" to unregister this node"));
@@ -493,7 +491,7 @@ do_primary_unregister(void)
if (runtime_options.dry_run == true)
{
log_notice(_("node %s (ID: %i) would now be unregistered"),
log_notice(_("node \"%s\" (ID: %i) would now be unregistered"),
target_node_info_ptr->node_name,
target_node_info_ptr->node_id);
log_hint(_("run the same command without the --dry-run option to unregister this node"));
@@ -506,7 +504,7 @@ do_primary_unregister(void)
if (delete_success == false)
{
log_error(_("unable to unregister node %s (ID: %i)"),
log_error(_("unable to unregister node \"%s\" (ID: %i)"),
target_node_info_ptr->node_name,
target_node_info_ptr->node_id);
PQfinish(primary_conn);
@@ -515,14 +513,14 @@ do_primary_unregister(void)
initPQExpBuffer(&event_details);
appendPQExpBuffer(&event_details,
_("node %s (ID: %i) unregistered"),
_("node \"%s\" (ID: %i) unregistered"),
target_node_info_ptr->node_name,
target_node_info_ptr->node_id);
if (target_node_info_ptr->node_id != config_file_options.node_id)
{
appendPQExpBuffer(&event_details,
_(" from node %s (ID: %i)"),
_(" from node \"%s\" (ID: %i)"),
config_file_options.node_name,
config_file_options.node_id);
}
@@ -535,7 +533,7 @@ do_primary_unregister(void)
event_details.data);
termPQExpBuffer(&event_details);
log_info(_("node %s (ID: %i) was successfully unregistered"),
log_info(_("node \"%s\" (ID: %i) was successfully unregistered"),
target_node_info_ptr->node_name,
target_node_info_ptr->node_id);
}

File diff suppressed because it is too large


@@ -56,8 +56,7 @@ do_witness_register(void)
log_error(_("unable to connect to witness node \"%s\" (ID: %i)"),
config_file_options.node_name,
config_file_options.node_id);
log_detail("%s",
PQerrorMessage(witness_conn));
log_detail("\n%s", PQerrorMessage(witness_conn));
log_hint(_("the witness node must be running before it can be registered"));
exit(ERR_BAD_CONFIG);
}
@@ -174,11 +173,26 @@ do_witness_register(void)
exit(ERR_BAD_CONFIG);
}
/*
* TODO: sanity check witness node is not part of main cluster; we could
* add a random application_name to the respective connections,
* and do a simple check of pg_stat_activity
*/
/* Sanity check witness node is not part of main cluster. */
if (PQserverVersion(primary_conn) >= 90600 &&
PQserverVersion(witness_conn) >= 90600)
{
uint64 primary_system_identifier = system_identifier(primary_conn);
uint64 witness_system_identifier = system_identifier(witness_conn);
if (primary_system_identifier == witness_system_identifier &&
primary_system_identifier != UNKNOWN_SYSTEM_IDENTIFIER)
{
log_error(_("witness node cannot be in the same cluster as the primary node"));
log_detail(_("database system identifiers on primary node and provided witness node match (%lu)"),
primary_system_identifier);
log_hint(_("the witness node must be created on a separate read/write node"));
PQfinish(witness_conn);
PQfinish(primary_conn);
exit(ERR_BAD_CONFIG);
}
}
/* check that primary node is not a BDR node */
if (is_bdr_db_quiet(primary_conn) == true)
@@ -357,13 +371,24 @@ do_witness_register(void)
exit(ERR_BAD_CONFIG);
}
/* create event */
create_event_record(primary_conn,
&config_file_options,
config_file_options.node_id,
"witness_register",
true,
NULL);
{
PQExpBufferData event_details;
initPQExpBuffer(&event_details);
appendPQExpBuffer(&event_details,
_("witness registration succeeded; upstream node ID is %i"),
node_record.upstream_node_id);
/* create event */
create_event_notification(primary_conn,
&config_file_options,
config_file_options.node_id,
"witness_register",
true,
event_details.data);
termPQExpBuffer(&event_details);
}
PQfinish(primary_conn);
PQfinish(witness_conn);
@@ -411,7 +436,7 @@ do_witness_unregister(void)
log_error(_("unable to connect to node \"%s\" (ID: %i)"),
config_file_options.node_name,
config_file_options.node_id);
log_detail("%s", PQerrorMessage(local_conn));
log_detail("\n%s", PQerrorMessage(local_conn));
exit(ERR_BAD_CONFIG);
}
@@ -437,7 +462,7 @@ do_witness_unregister(void)
if (PQstatus(primary_conn) != CONNECTION_OK)
{
log_error(_("unable to connect to primary"));
log_detail("%s", PQerrorMessage(primary_conn));
log_detail("\n%s", PQerrorMessage(primary_conn));
if (local_node_available == true)
{
@@ -506,13 +531,24 @@ do_witness_unregister(void)
exit(ERR_BAD_CONFIG);
}
/* Log the event */
create_event_record(primary_conn,
&config_file_options,
witness_node_id,
"witness_unregister",
true,
NULL);
{
PQExpBufferData event_details;
initPQExpBuffer(&event_details);
appendPQExpBufferStr(&event_details,
_("witness unregistration succeeded"));
/* create event */
create_event_notification(primary_conn,
&config_file_options,
witness_node_id,
"witness_unregister",
true,
event_details.data);
termPQExpBuffer(&event_details);
}
PQfinish(primary_conn);

View File

@@ -45,6 +45,7 @@ typedef struct
int wait;
bool no_wait;
bool compact;
bool detail;
/* logging options */
char log_level[MAXLEN]; /* overrides setting in repmgr.conf */
@@ -70,7 +71,7 @@ typedef struct
/* general node options */
int node_id;
char node_name[MAXLEN];
char node_name[NAMEDATALEN];
char data_dir[MAXPGPATH];
int remote_node_id;
@@ -100,6 +101,7 @@ typedef struct
char force_rewind_path[MAXPGPATH];
bool siblings_follow;
bool repmgrd_no_pause;
bool repmgrd_force_unpause;
/* "node status" options */
bool is_shutdown_cleanly;
@@ -143,7 +145,7 @@ typedef struct
/* configuration metadata */ \
false, false, false, false, false, \
/* general configuration options */ \
"", false, false, "", -1, false, false, \
"", false, false, "", -1, false, false, false, \
/* logging options */ \
"", false, false, false, false, \
/* output options */ \
@@ -162,7 +164,7 @@ typedef struct
/* "standby register" options */ \
false, -1, DEFAULT_WAIT_START, \
/* "standby switchover" options */ \
false, false, "", false, false, \
false, false, "", false, false, false, \
/* "node status" options */ \
false, \
/* "node check" options */ \
@@ -241,8 +243,8 @@ extern void get_superuser_connection(PGconn **conn, PGconn **superuser_conn, PGc
extern void make_remote_repmgr_path(PQExpBufferData *outputbuf, t_node_info *remote_node_record);
extern void make_repmgrd_path(PQExpBufferData *output_buf);
/* display functions */
extern bool format_node_status(t_node_info *node_info, PQExpBufferData *node_status, PQExpBufferData *upstream, ItemList *warnings);
extern void print_help_header(void);
extern void print_status_header(int cols, ColHeader *headers);
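
The struct hunk above shrinks `node_name` from `MAXLEN` to `NAMEDATALEN`, so any copy into it has to be bounded by the destination's own size, not by `MAXLEN`. Below is a minimal sketch of that length check, not the repmgr implementation. `DemoOptions`, `DEMO_NAMEDATALEN` and `set_node_name` are hypothetical names, and the value 64 assumes PostgreSQL's default `NAMEDATALEN`.

```c
#include <stdbool.h>
#include <string.h>

/* Assumption: mirrors PostgreSQL's default NAMEDATALEN of 64. */
#define DEMO_NAMEDATALEN 64

/* Hypothetical stand-in for the runtime options struct. */
typedef struct
{
    char node_name[DEMO_NAMEDATALEN];
} DemoOptions;

/*
 * Copy "value" into opts->node_name only if it fits, including the
 * terminating NUL. Returns false and leaves the destination untouched
 * if the value is too long.
 */
bool
set_node_name(DemoOptions *opts, const char *value)
{
    size_t len = strlen(value);

    if (len >= sizeof(opts->node_name))
        return false;

    /* The length is already known, so memcpy the string plus its NUL. */
    memcpy(opts->node_name, value, len + 1);
    return true;
}
```

A caller would report an error when `set_node_name()` returns false instead of silently truncating the name.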

View File

@@ -271,6 +271,10 @@ main(int argc, char **argv)
runtime_options.compact = true;
break;
/* --detail */
case OPT_DETAIL:
runtime_options.detail = true;
break;
/*----------------------------
* database connection options
@@ -356,9 +360,15 @@ main(int argc, char **argv)
/* --node-name */
case OPT_NODE_NAME:
strncpy(runtime_options.node_name, optarg, MAXLEN);
{
if (strlen(optarg) < sizeof(runtime_options.node_name))
strncpy(runtime_options.node_name, optarg, sizeof(runtime_options.node_name));
else
item_list_append_format(&cli_errors,
_("value for \"--node-name\" must contain fewer than %lu characters"),
sizeof(runtime_options.node_name));
break;
}
/* --remote-node-id */
case OPT_REMOTE_NODE_ID:
runtime_options.remote_node_id = repmgr_atoi(optarg, "--remote-node-id", &cli_errors, MIN_NODE_ID);
@@ -468,6 +478,10 @@ main(int argc, char **argv)
runtime_options.repmgrd_no_pause = true;
break;
case OPT_REPMGRD_FORCE_UNPAUSE:
runtime_options.repmgrd_force_unpause = true;
break;
/*----------------------
* "node status" options
*----------------------
@@ -1280,7 +1294,7 @@ main(int argc, char **argv)
pfree(escaped);
if (record_status != RECORD_FOUND)
{
log_error(_("node %s (specified with --node-name) not found"),
log_error(_("node \"%s\" (specified with --node-name) not found"),
runtime_options.node_name);
PQfinish(conn);
free_conninfo_params(&source_conninfo);
@@ -1674,6 +1688,8 @@ check_cli_parameters(const int action)
item_list_append_format(&cli_warnings,
_("--replication-user ignored when executing %s"),
action_name(action));
break;
default:
item_list_append_format(&cli_warnings,
_("--replication-user not required when executing %s"),
@@ -1846,6 +1862,22 @@ check_cli_parameters(const int action)
}
}
if (runtime_options.repmgrd_force_unpause == true)
{
switch (action)
{
case STANDBY_SWITCHOVER:
if (runtime_options.repmgrd_no_pause == true)
item_list_append(&cli_errors,
_("--repmgrd-force-unpause and --repmgrd-no-pause cannot be used together"));
break;
default:
item_list_append_format(&cli_warnings,
_("--repmgrd-force-unpause will be ignored when executing %s"),
action_name(action));
}
}
if (runtime_options.config_files[0] != '\0')
{
switch (action)
@@ -1908,12 +1940,12 @@ check_cli_parameters(const int action)
}
/* --compact */
if (runtime_options.compact == true)
{
switch (action)
{
case CLUSTER_SHOW:
case CLUSTER_EVENT:
case DAEMON_STATUS:
break;
default:
@@ -1923,6 +1955,35 @@ check_cli_parameters(const int action)
}
}
/* --detail */
if (runtime_options.detail == true)
{
switch (action)
{
case DAEMON_STATUS:
break;
default:
item_list_append_format(&cli_warnings,
_("--detail is not effective when executing %s"),
action_name(action));
}
}
/* --siblings-follow */
if (runtime_options.siblings_follow == true)
{
switch (action)
{
case STANDBY_PROMOTE:
case STANDBY_SWITCHOVER:
break;
default:
item_list_append_format(&cli_warnings,
_("--siblings-follow is not effective when executing %s"),
action_name(action));
}
}
/* --disable-wal-receiver / --enable-wal-receiver */
if (runtime_options.disable_wal_receiver == true || runtime_options.enable_wal_receiver == true)
{
@@ -1947,6 +2008,425 @@ check_cli_parameters(const int action)
}
/*
* Generate formatted node status output for display by "cluster show" and
* "daemon status".
*/
bool
format_node_status(t_node_info *node_info, PQExpBufferData *node_status, PQExpBufferData *upstream, ItemList *warnings)
{
bool error_found = false;
t_node_info remote_node_rec = T_NODE_INFO_INITIALIZER;
RecordStatus remote_node_rec_found = RECORD_NOT_FOUND;
if (PQstatus(node_info->conn) == CONNECTION_OK)
{
node_info->node_status = NODE_STATUS_UP;
node_info->recovery_type = get_recovery_type(node_info->conn);
/* get node's copy of its record so we can see what it thinks its status is */
remote_node_rec_found = get_node_record_with_upstream(node_info->conn, node_info->node_id, &remote_node_rec);
}
else
{
/* check if node is reachable, but just not letting us in */
if (is_server_available_quiet(node_info->conninfo))
node_info->node_status = NODE_STATUS_REJECTED;
else
node_info->node_status = NODE_STATUS_DOWN;
node_info->recovery_type = RECTYPE_UNKNOWN;
}
/* format node status info */
switch (node_info->type)
{
case PRIMARY:
{
/* node is reachable */
if (node_info->node_status == NODE_STATUS_UP)
{
if (node_info->active == true)
{
switch (node_info->recovery_type)
{
case RECTYPE_PRIMARY:
appendPQExpBufferStr(node_status, "* running");
break;
case RECTYPE_STANDBY:
appendPQExpBufferStr(node_status, "! running as standby");
item_list_append_format(warnings,
"node \"%s\" (ID: %i) is registered as primary but running as standby",
node_info->node_name, node_info->node_id);
break;
case RECTYPE_UNKNOWN:
appendPQExpBufferStr(node_status, "! unknown");
item_list_append_format(warnings,
"node \"%s\" (ID: %i) has unknown replication status",
node_info->node_name, node_info->node_id);
break;
}
}
else
{
if (node_info->recovery_type == RECTYPE_PRIMARY)
{
appendPQExpBufferStr(node_status, "! running");
item_list_append_format(warnings,
"node \"%s\" (ID: %i) is running but the repmgr node record is inactive",
node_info->node_name, node_info->node_id);
}
else
{
appendPQExpBufferStr(node_status, "! running as standby");
item_list_append_format(warnings,
"node \"%s\" (ID: %i) is registered as an inactive primary but running as standby",
node_info->node_name, node_info->node_id);
}
}
}
/* node is up but cannot connect */
else if (node_info->node_status == NODE_STATUS_REJECTED)
{
if (node_info->active == true)
{
appendPQExpBufferStr(node_status, "? running");
}
else
{
appendPQExpBufferStr(node_status, "! running");
error_found = true;
}
}
/* node is unreachable */
else
{
/* node is unreachable but marked active */
if (node_info->active == true)
{
appendPQExpBufferStr(node_status, "? unreachable");
item_list_append_format(warnings,
"node \"%s\" (ID: %i) is registered as an active primary but is unreachable",
node_info->node_name, node_info->node_id);
}
/* node is unreachable and marked as inactive */
else
{
appendPQExpBufferStr(node_status, "- failed");
error_found = true;
}
}
}
break;
case STANDBY:
{
/* node is reachable */
if (node_info->node_status == NODE_STATUS_UP)
{
if (node_info->active == true)
{
switch (node_info->recovery_type)
{
case RECTYPE_STANDBY:
appendPQExpBufferStr(node_status, " running");
break;
case RECTYPE_PRIMARY:
appendPQExpBufferStr(node_status, "! running as primary");
item_list_append_format(warnings,
"node \"%s\" (ID: %i) is registered as standby but running as primary",
node_info->node_name, node_info->node_id);
break;
case RECTYPE_UNKNOWN:
appendPQExpBufferStr(node_status, "! unknown");
item_list_append_format(
warnings,
"node \"%s\" (ID: %i) has unknown replication status",
node_info->node_name, node_info->node_id);
break;
}
}
else
{
if (node_info->recovery_type == RECTYPE_STANDBY)
{
appendPQExpBufferStr(node_status, "! running");
item_list_append_format(warnings,
"node \"%s\" (ID: %i) is running but the repmgr node record is inactive",
node_info->node_name, node_info->node_id);
}
else
{
appendPQExpBufferStr(node_status, "! running as primary");
item_list_append_format(warnings,
"node \"%s\" (ID: %i) is running as primary but the repmgr node record is inactive",
node_info->node_name, node_info->node_id);
}
}
/* warn about issue with paused WAL replay */
if (is_wal_replay_paused(node_info->conn, true))
{
item_list_append_format(warnings,
_("WAL replay is paused on node \"%s\" (ID: %i) with WAL replay pending; this node cannot be manually promoted until WAL replay is resumed"),
node_info->node_name, node_info->node_id);
}
}
/* node is up but cannot connect */
else if (node_info->node_status == NODE_STATUS_REJECTED)
{
if (node_info->active == true)
{
appendPQExpBufferStr(node_status, "? running");
}
else
{
appendPQExpBufferStr(node_status, "! running");
error_found = true;
}
}
/* node is unreachable */
else
{
/* node is unreachable but marked active */
if (node_info->active == true)
{
appendPQExpBufferStr(node_status, "? unreachable");
item_list_append_format(warnings,
"node \"%s\" (ID: %i) is registered as an active standby but is unreachable",
node_info->node_name, node_info->node_id);
}
else
{
appendPQExpBufferStr(node_status, "- failed");
error_found = true;
}
}
}
break;
case WITNESS:
case BDR:
{
/* node is reachable */
if (node_info->node_status == NODE_STATUS_UP)
{
if (node_info->active == true)
{
appendPQExpBufferStr(node_status, "* running");
}
else
{
appendPQExpBufferStr(node_status, "! running");
error_found = true;
}
}
/* node is up but cannot connect */
else if (node_info->node_status == NODE_STATUS_REJECTED)
{
if (node_info->active == true)
{
appendPQExpBufferStr(node_status, "? rejected");
}
else
{
appendPQExpBufferStr(node_status, "! failed");
error_found = true;
}
}
/* node is unreachable */
else
{
if (node_info->active == true)
{
appendPQExpBufferStr(node_status, "? unreachable");
}
else
{
appendPQExpBufferStr(node_status, "- failed");
error_found = true;
}
}
}
break;
case UNKNOWN:
{
/* this should never happen */
appendPQExpBufferStr(node_status, "? unknown node type");
error_found = true;
}
break;
}
/* format node upstream info */
if (remote_node_rec_found == RECORD_NOT_FOUND)
{
/*
* Unable to retrieve the node's copy of its own record - copy the
* name from our own copy of the record
*/
appendPQExpBufferStr(upstream,
node_info->upstream_node_name);
}
else if (remote_node_rec.type == WITNESS)
{
/* no upstream - unlikely to happen */
if (remote_node_rec.upstream_node_id == NO_UPSTREAM_NODE)
{
appendPQExpBufferStr(upstream, "! ");
item_list_append_format(warnings,
"node \"%s\" (ID: %i) is a witness but reports it has no upstream node",
node_info->node_name,
node_info->node_id);
}
/* mismatch between reported upstream and upstream in local node's metadata */
else if(node_info->upstream_node_id != remote_node_rec.upstream_node_id)
{
appendPQExpBufferStr(upstream, "! ");
if (node_info->upstream_node_id != remote_node_rec.upstream_node_id)
{
item_list_append_format(warnings,
"node \"%s\" (ID: %i) reports a different upstream (reported: \"%s\", expected \"%s\")",
node_info->node_name,
node_info->node_id,
remote_node_rec.upstream_node_name,
node_info->upstream_node_name);
}
}
else
{
t_node_info upstream_node_rec = T_NODE_INFO_INITIALIZER;
RecordStatus upstream_node_rec_found = get_node_record(node_info->conn,
node_info->upstream_node_id,
&upstream_node_rec);
if (upstream_node_rec_found != RECORD_FOUND)
{
appendPQExpBufferStr(upstream, "? ");
item_list_append_format(warnings,
"unable to find record for upstream node ID %i",
node_info->upstream_node_id);
}
else
{
PGconn *upstream_conn = establish_db_connection_quiet(upstream_node_rec.conninfo);
if (PQstatus(upstream_conn) != CONNECTION_OK)
{
appendPQExpBufferStr(upstream, "? ");
item_list_append_format(warnings,
"unable to connect to node \"%s\" (ID: %i)'s upstream node \"%s\" (ID: %i)",
node_info->node_name,
node_info->node_id,
upstream_node_rec.node_name,
upstream_node_rec.node_id);
}
PQfinish(upstream_conn);
}
}
appendPQExpBufferStr(upstream,
remote_node_rec.upstream_node_name);
}
else if (remote_node_rec.type == STANDBY)
{
if (node_info->upstream_node_id != NO_UPSTREAM_NODE && node_info->upstream_node_id == remote_node_rec.upstream_node_id)
{
/*
* expected and reported upstreams match - check if node is actually
* connected to the upstream
*/
NodeAttached attached_to_upstream = NODE_ATTACHED_UNKNOWN;
t_node_info upstream_node_rec = T_NODE_INFO_INITIALIZER;
RecordStatus upstream_node_rec_found = get_node_record(node_info->conn,
node_info->upstream_node_id,
&upstream_node_rec);
if (upstream_node_rec_found != RECORD_FOUND)
{
item_list_append_format(warnings,
"unable to find record for upstream node ID %i",
node_info->upstream_node_id);
}
else
{
PGconn *upstream_conn = establish_db_connection_quiet(upstream_node_rec.conninfo);
if (PQstatus(upstream_conn) != CONNECTION_OK)
{
item_list_append_format(warnings,
"unable to connect to node \"%s\" (ID: %i)'s upstream node \"%s\" (ID: %i)",
node_info->node_name,
node_info->node_id,
upstream_node_rec.node_name,
upstream_node_rec.node_id);
}
else
{
attached_to_upstream = is_downstream_node_attached(upstream_conn, node_info->node_name);
}
PQfinish(upstream_conn);
}
if (attached_to_upstream == NODE_ATTACHED_UNKNOWN)
{
appendPQExpBufferStr(upstream, "? ");
item_list_append_format(warnings,
"unable to determine if node \"%s\" (ID: %i) is attached to its upstream node \"%s\" (ID: %i)",
node_info->node_name,
node_info->node_id,
upstream_node_rec.node_name,
upstream_node_rec.node_id);
}
else if (attached_to_upstream == NODE_DETACHED)
{
appendPQExpBufferStr(upstream, "! ");
item_list_append_format(warnings,
"node \"%s\" (ID: %i) is not attached to its upstream node \"%s\" (ID: %i)",
node_info->node_name,
node_info->node_id,
upstream_node_rec.node_name,
upstream_node_rec.node_id);
}
appendPQExpBufferStr(upstream,
node_info->upstream_node_name);
}
else
{
if (node_info->upstream_node_id != NO_UPSTREAM_NODE && remote_node_rec.upstream_node_id == NO_UPSTREAM_NODE)
{
appendPQExpBufferChar(upstream, '!');
item_list_append_format(warnings,
"node \"%s\" (ID: %i) reports it has no upstream (expected: \"%s\")",
node_info->node_name,
node_info->node_id,
node_info->upstream_node_name);
}
else if (node_info->upstream_node_id != NO_UPSTREAM_NODE && remote_node_rec.upstream_node_id != NO_UPSTREAM_NODE)
{
appendPQExpBuffer(upstream,
"! %s", remote_node_rec.upstream_node_name);
item_list_append_format(warnings,
"node \"%s\" (ID: %i) reports a different upstream (reported: \"%s\", expected \"%s\")",
node_info->node_name,
node_info->node_id,
remote_node_rec.upstream_node_name,
node_info->upstream_node_name);
}
}
}
return error_found;
}
static const char *
action_name(const int action)
{
@@ -2036,9 +2516,10 @@ print_error_list(ItemList *error_list, int log_level)
void
print_status_header(int cols, ColHeader *headers)
{
int i;
int i, di;
int max_cols = 0;
/* count how many columns we actually need to display */
for (i = 0; i < cols; i++)
{
@@ -2065,7 +2546,8 @@ print_status_header(int cols, ColHeader *headers)
printf("\n");
printf("-");
for (i = 0; i < max_cols; i++)
di = 0;
for (i = 0; i < cols; i++)
{
int j;
@@ -2075,10 +2557,11 @@ print_status_header(int cols, ColHeader *headers)
for (j = 0; j < headers[i].max_length; j++)
printf("-");
if (i < (max_cols - 1))
if (di < (max_cols - 1))
printf("-+-");
else
printf("-");
di++;
}
printf("\n");
@@ -2211,7 +2694,7 @@ create_repmgr_extension(PGconn *conn)
log_detail(_("version %s is installed but newer version %s is available"),
extversions.installed_version,
extversions.default_version);
log_hint(_("execute \"ALTER EXTENSION repmgr UPGRADE\""));
log_hint(_("update the installed extension version by executing \"ALTER EXTENSION repmgr UPDATE\""));
return false;
case REPMGR_INSTALLED:
@@ -2457,6 +2940,7 @@ get_superuser_connection(PGconn **conn, PGconn **superuser_conn, PGconn **privil
if (PQstatus(*conn) != CONNECTION_OK)
{
log_error(_("no database connection available"));
log_detail("\n%s", PQerrorMessage(*conn));
exit(ERR_INTERNAL);
}
@@ -3000,7 +3484,7 @@ init_node_record(t_node_info *node_record)
strncpy(node_record->location, "default", MAXLEN);
strncpy(node_record->node_name, config_file_options.node_name, MAXLEN);
strncpy(node_record->node_name, config_file_options.node_name, sizeof(node_record->node_name));
strncpy(node_record->conninfo, config_file_options.conninfo, MAXLEN);
strncpy(node_record->config_file, config_file_path, MAXPGPATH);
@@ -3054,9 +3538,6 @@ can_use_pg_rewind(PGconn *conn, const char *data_directory, PQExpBufferData *rea
/* "full_page_writes" must be on in any case */
if (guc_set(conn, "full_page_writes", "=", "off"))
{
if (can_use == false)
appendPQExpBuffer(reason, "; ");
appendPQExpBuffer(reason,
_("\"full_page_writes\" must be set to \"on\""));
@@ -3143,6 +3624,8 @@ drop_replication_slot_if_exists(PGconn *conn, int node_id, char *slot_name)
/*
* Here we'll perform some timeline sanity checks to ensure the follow target
* can actually be followed.
*
* See also comment for check_node_can_follow() in repmgrd-physical.c.
*/
bool
check_node_can_attach(TimeLineID local_tli, XLogRecPtr local_xlogpos, PGconn *follow_target_conn, t_node_info *follow_target_node_record, bool is_rejoin)
@@ -3233,6 +3716,7 @@ check_node_can_attach(TimeLineID local_tli, XLogRecPtr local_xlogpos, PGconn *fo
return false;
}
/* timelines are the same - check relative positions */
if (follow_target_identification.timeline == local_tli)
{
XLogRecPtr follow_target_xlogpos = get_node_current_lsn(follow_target_conn);
@@ -3244,7 +3728,6 @@ check_node_can_attach(TimeLineID local_tli, XLogRecPtr local_xlogpos, PGconn *fo
return false;
}
/* timeline is the same - check relative positions */
if (local_xlogpos <= follow_target_xlogpos)
{
log_info(_("timelines are same, this server is not ahead"));

Some files were not shown because too many files have changed in this diff