From fc5f46ca5ab24bae9ab3b0cd9f04ea5cd97739b6 Mon Sep 17 00:00:00 2001
From: Ian Barwick
Date: Wed, 1 Nov 2017 10:49:58 +0900
Subject: [PATCH] docs: update links to repmgr 4.0 documentation

---
 README.md                     |   4 +-
 doc/bdr-failover.md           | 286 +---------------------------------
 doc/changes-in-repmgr4.md     |   4 +-
 doc/upgrading-from-repmgr3.md |   5 +-
 4 files changed, 10 insertions(+), 289 deletions(-)

diff --git a/README.md b/README.md
index 19930382..340fc355 100644
--- a/README.md
+++ b/README.md
@@ -25,9 +25,9 @@ for details. `repmgr 4` will support future public BDR releases.
 
 Documentation
 -------------
 
-The main `repmgr` documentation is available at:
+The main `repmgr` documentation is available here:
 
-  https://repmgr.org/docs/index.html
+  [repmgr 4 documentation](https://repmgr.org/docs/4.0/index.html)
 
 The `README` file for `repmgr` 3.x is available here:
 
diff --git a/doc/bdr-failover.md b/doc/bdr-failover.md
index f8c11bc5..087d1b5a 100644
--- a/doc/bdr-failover.md
+++ b/doc/bdr-failover.md
@@ -1,288 +1,8 @@
 BDR failover with repmgrd
 =========================
 
-`repmgr 4` provides support for monitoring BDR nodes and taking action in case
-one of the nodes fails.
+This document has been integrated into the main `repmgr` documentation
+and is now located here:
 
- *NOTE* Due to the nature of BDR, it's only safe to use this solution for
- a two-node scenario. Introducing additional nodes will create an inherent
- risk of node desynchronisation if a node goes down without being cleanly
- removed from the cluster.
+  [BDR failover with repmgrd](https://repmgr.org/docs/4.0/repmgrd-bdr.html)
 
-In contrast to streaming replication, there's no concept of "promoting" a new
-primary node with BDR. Instead, "failover" involves monitoring both nodes
-with `repmgrd` and redirecting queries from the failed node to the remaining
-active node.
-This can be done by using the event notification script generated by
-`repmgrd` to dynamically reconfigure a proxy server/connection pooler such
-as PgBouncer.
-
-
-Prerequisites
--------------
-
-`repmgr 4` requires PostgreSQL 9.6 with the BDR 2 extension enabled and
-configured for a two-node BDR network. `repmgr 4` packages
-must be installed on each node before attempting to configure repmgr.
-
- *NOTE* `repmgr 4` will refuse to install if it detects more than two
- BDR nodes.
-
-Application database connections *must* be passed through a proxy server/
-connection pooler such as PgBouncer, and it must be possible to dynamically
-reconfigure it from `repmgrd`. The example demonstrated in this document
-will use PgBouncer.
-
-The proxy server/connection pooler must not be installed on the database
-servers.
-
-For this example, it's assumed password-less SSH connections are available
-from the PostgreSQL servers to the servers where PgBouncer runs, and
-that the user on those servers has permission to alter the PgBouncer
-configuration files.
-
-PostgreSQL connections must be possible between each node, and each node
-must be able to connect to each PgBouncer instance.
-
-
-Configuration
--------------
-
-Sample configuration for `repmgr.conf`:
-
-    node_id=1
-    node_name='node1'
-    conninfo='host=node1 dbname=bdrtest user=repmgr connect_timeout=2'
-    replication_type='bdr'
-
-    event_notifications=bdr_failover
-    event_notification_command='/path/to/bdr-pgbouncer.sh %n %e %s "%c" "%a" >> /tmp/bdr-failover.log 2>&1'
-
-    # repmgrd options
-    monitor_interval_secs=5
-    reconnect_attempts=6
-    reconnect_interval=5
-
-Adjust settings as appropriate; copy and adjust for the second node (particularly
-the values `node_id`, `node_name` and `conninfo`).
-
-Note that the values provided for the `conninfo` string must be valid for
-connections from *both* nodes in the cluster. The database must be the BDR
-database.
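For illustration, a hypothetical `repmgr.conf` for the second node would differ only in the node-specific values; everything else mirrors the node1 sample above (the `host=node2` conninfo matches the cluster output shown later in this document):

```ini
# repmgr.conf on node2 -- illustrative sketch; only the node-specific
# values change, all other settings match the node1 configuration above
node_id=2
node_name='node2'
conninfo='host=node2 dbname=bdrtest user=repmgr connect_timeout=2'
replication_type='bdr'

event_notifications=bdr_failover
event_notification_command='/path/to/bdr-pgbouncer.sh %n %e %s "%c" "%a" >> /tmp/bdr-failover.log 2>&1'

# repmgrd options
monitor_interval_secs=5
reconnect_attempts=6
reconnect_interval=5
```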
-
-If defined, `event_notifications` will restrict execution of `event_notification_command`
-to the specified events.
-
-`event_notification_command` is the script which does the actual "heavy lifting"
-of reconfiguring the proxy server/connection pooler. It is fully user-definable;
-a sample implementation is documented below.
-
-
-repmgr user permissions
------------------------
-
-`repmgr` will create an extension in the BDR database containing objects
-for administering `repmgr` metadata. The user defined in the `conninfo`
-setting must be able to access all objects. Additionally, superuser permissions
-are required to install the `repmgr` extension. The easiest way to do this
-is to create the `repmgr` user as a superuser; however, if this is not
-desirable, the `repmgr` user can be created as a normal user and a
-superuser specified with `--superuser` when registering a BDR node.
-
-repmgr setup
-------------
-
-Register both nodes:
-
-    $ repmgr -f /etc/repmgr.conf bdr register
-    NOTICE: attempting to install extension "repmgr"
-    NOTICE: "repmgr" extension successfully installed
-    NOTICE: node record created for node 'node1' (ID: 1)
-    NOTICE: BDR node 1 registered (conninfo: host=localhost dbname=bdrtest user=repmgr port=5501)
-
-    $ repmgr -f /etc/repmgr.conf bdr register
-    NOTICE: node record created for node 'node2' (ID: 2)
-    NOTICE: BDR node 2 registered (conninfo: host=localhost dbname=bdrtest user=repmgr port=5502)
-
-The `repmgr` extension will be automatically created when the first
-node is registered, and will be propagated to the second node.
-
- *IMPORTANT* ensure the repmgr package is available on both nodes before
- attempting to register the first node.
-
-
-At this point the metadata for both nodes has been created; executing
-`repmgr cluster show` (on either node) should produce output like this:
-
-    $ repmgr -f /etc/repmgr.conf cluster show
-     ID | Name  | Role | Status    | Upstream | Connection string
-    ----+-------+------+-----------+----------+---------------------------------------------------------
-     1  | node1 | bdr  | * running |          | host=node1 dbname=bdrtest user=repmgr connect_timeout=2
-     2  | node2 | bdr  | * running |          | host=node2 dbname=bdrtest user=repmgr connect_timeout=2
-
-Additionally, it's possible to see a log of significant events; so far
-this will only record the two node registrations (in reverse chronological order):
-
-     Node ID | Event        | OK | Timestamp           | Details
-    ---------+--------------+----+---------------------+----------------------------------------------
-     2       | bdr_register | t  | 2017-07-27 17:51:48 | node record created for node 'node2' (ID: 2)
-     1       | bdr_register | t  | 2017-07-27 17:51:00 | node record created for node 'node1' (ID: 1)
-
-
-Defining the "event_notification_command"
------------------------------------------
-
-Key to "failover" execution is the `event_notification_command`, which is a
-user-definable script which should reconfigure the proxy server/
-connection pooler.
-
-Each time `repmgr` (or `repmgrd`) records an event, it can optionally
-execute the script defined in `event_notification_command` to
-take further action; details of the event will be passed as parameters.
-The following placeholders are available to the script:
-
-    %n - node ID
-    %e - event type
-    %s - success (1 or 0)
-    %t - timestamp
-    %d - details
-    %c - conninfo string of the next available node
-    %a - name of the next available node
-
-Note that `%c` and `%a` will only be provided during `bdr_failover`
-events, which is what is of interest here.
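As a minimal sketch of such a handler (assuming the placeholder order `%n %e %s "%c" "%a"` shown above; the function name, the `bdrtest` database name and the idea of emitting a PgBouncer `[databases]` stanza are illustrative, not part of repmgr):

```shell
#!/bin/sh
# Illustrative sketch only: a minimal event notification handler invoked
# with the arguments produced by "%n %e %s \"%c\" \"%a\"".

handle_bdr_event() {
    node_id=$1
    event_type=$2
    success=$3
    next_conninfo=$4
    next_node_name=$5

    # %c and %a are only provided for bdr_failover events; ignore the rest
    if [ "$event_type" != "bdr_failover" ]; then
        return 0
    fi

    # Emit a PgBouncer [databases] stanza pointing at the surviving node.
    # A real handler would write this into pgbouncer.ini on each proxy
    # host, then reload and resume PgBouncer.
    printf '[databases]\nbdrtest = %s\n' "$next_conninfo"
}
```

For example, `handle_bdr_event 2 bdr_failover 1 "host=node1 dbname=bdrtest user=repmgr" "node1"` emits a stanza routing `bdrtest` to node1; the sample script described below performs the full sequence of pausing, rewriting and reloading PgBouncer.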
-
-The provided sample script (`scripts/bdr-pgbouncer.sh`) is configured like
-this:
-
-    event_notification_command='/path/to/bdr-pgbouncer.sh %n %e %s "%c" "%a"'
-
-and parses the configured parameters like this:
-
-    NODE_ID=$1
-    EVENT_TYPE=$2
-    SUCCESS=$3
-    NEXT_CONNINFO=$4
-    NEXT_NODE_NAME=$5
-
-It also contains some hard-coded values about the PgBouncer configuration for
-both nodes; these will of course need to be adjusted for your local environment
-(ideally the scripts would be maintained as templates and generated by some
-kind of provisioning system).
-
-The script performs the following steps:
-
- - pauses PgBouncer on all nodes
- - recreates the PgBouncer configuration file on each node using the information
-   provided by `repmgrd` (mainly the `conninfo` string) to configure PgBouncer
-   to point to the remaining node
- - reloads the PgBouncer configuration
- - resumes PgBouncer
-
-From that point, any connections to PgBouncer on the failed BDR node will be
-redirected to the active node.
-
-
-repmgrd
--------
-
-
-
-Node monitoring and failover
-----------------------------
-
-At the intervals specified by `monitor_interval_secs` in `repmgr.conf`, `repmgrd`
-will ping each node to check if it's available. If a node isn't available,
-`repmgrd` will check the node `reconnect_attempts` times at intervals of
-`reconnect_interval` seconds to confirm the node is definitely unreachable.
-This buffer period is necessary to avoid false positives caused by transient
-network outages.
-
-If the node is still unavailable, `repmgrd` will enter failover mode and execute
-the script defined in `event_notification_command`; an entry will be logged
-in the `repmgr.events` table and `repmgrd` will (unless otherwise configured)
-resume monitoring of the node in "degraded" mode until it reappears.
-
-`repmgrd` logfile output during a failover event will look something like this
-on one node (usually the node which has failed, here "node2"):
-
-    ...
-    [2017-07-27 21:08:39] [INFO] starting continuous BDR node monitoring
-    [2017-07-27 21:08:39] [INFO] monitoring BDR replication status on node "node2" (ID: 2)
-    [2017-07-27 21:08:55] [INFO] monitoring BDR replication status on node "node2" (ID: 2)
-    [2017-07-27 21:09:11] [INFO] monitoring BDR replication status on node "node2" (ID: 2)
-    [2017-07-27 21:09:23] [WARNING] unable to connect to node node2 (ID 2)
-    [2017-07-27 21:09:23] [INFO] checking state of node 2, 0 of 5 attempts
-    [2017-07-27 21:09:23] [INFO] sleeping 1 seconds until next reconnection attempt
-    [2017-07-27 21:09:24] [INFO] checking state of node 2, 1 of 5 attempts
-    [2017-07-27 21:09:24] [INFO] sleeping 1 seconds until next reconnection attempt
-    [2017-07-27 21:09:25] [INFO] checking state of node 2, 2 of 5 attempts
-    [2017-07-27 21:09:25] [INFO] sleeping 1 seconds until next reconnection attempt
-    [2017-07-27 21:09:26] [INFO] checking state of node 2, 3 of 5 attempts
-    [2017-07-27 21:09:26] [INFO] sleeping 1 seconds until next reconnection attempt
-    [2017-07-27 21:09:27] [INFO] checking state of node 2, 4 of 5 attempts
-    [2017-07-27 21:09:27] [INFO] sleeping 1 seconds until next reconnection attempt
-    [2017-07-27 21:09:28] [WARNING] unable to reconnect to node 2 after 5 attempts
-    [2017-07-27 21:09:28] [NOTICE] setting node record for node 2 to inactive
-    [2017-07-27 21:09:28] [INFO] executing notification command for event "bdr_failover"
-    [2017-07-27 21:09:28] [DETAIL] command is:
-      /path/to/bdr-pgbouncer.sh 2 bdr_failover 1 "host=node1 dbname=bdrtest user=repmgr connect_timeout=2" "node1"
-    [2017-07-27 21:09:28] [INFO] node 'node2' (ID: 2) detected as failed; next available node is 'node1' (ID: 1)
-    [2017-07-27 21:09:28] [INFO] monitoring BDR replication status on node "node2" (ID: 2)
-    [2017-07-27 21:09:28] [DETAIL] monitoring node "node2" (ID: 2) in degraded mode
-    ...
-
-Output on the other node ("node1") during the same event will look like this:
-
-    [2017-07-27 21:08:35] [INFO] starting continuous BDR node monitoring
-    [2017-07-27 21:08:35] [INFO] monitoring BDR replication status on node "node1" (ID: 1)
-    [2017-07-27 21:08:51] [INFO] monitoring BDR replication status on node "node1" (ID: 1)
-    [2017-07-27 21:09:07] [INFO] monitoring BDR replication status on node "node1" (ID: 1)
-    [2017-07-27 21:09:23] [WARNING] unable to connect to node node2 (ID 2)
-    [2017-07-27 21:09:23] [INFO] checking state of node 2, 0 of 5 attempts
-    [2017-07-27 21:09:23] [INFO] sleeping 1 seconds until next reconnection attempt
-    [2017-07-27 21:09:24] [INFO] checking state of node 2, 1 of 5 attempts
-    [2017-07-27 21:09:24] [INFO] sleeping 1 seconds until next reconnection attempt
-    [2017-07-27 21:09:25] [INFO] checking state of node 2, 2 of 5 attempts
-    [2017-07-27 21:09:25] [INFO] sleeping 1 seconds until next reconnection attempt
-    [2017-07-27 21:09:26] [INFO] checking state of node 2, 3 of 5 attempts
-    [2017-07-27 21:09:26] [INFO] sleeping 1 seconds until next reconnection attempt
-    [2017-07-27 21:09:27] [INFO] checking state of node 2, 4 of 5 attempts
-    [2017-07-27 21:09:27] [INFO] sleeping 1 seconds until next reconnection attempt
-    [2017-07-27 21:09:28] [WARNING] unable to reconnect to node 2 after 5 attempts
-    [2017-07-27 21:09:28] [NOTICE] other node's repmgrd is handling failover
-    [2017-07-27 21:09:28] [INFO] monitoring BDR replication status on node "node1" (ID: 1)
-    [2017-07-27 21:09:28] [DETAIL] monitoring node "node2" (ID: 2) in degraded mode
-
-This assumes only the PostgreSQL instance on "node2" has failed. In this case the
-`repmgrd` instance running on "node2" has performed the failover. However, if
-the entire server becomes unavailable, `repmgrd` on "node1" will perform
-the failover.
-
-
-Node recovery
--------------
-
-Following failure of a BDR node, if the node subsequently becomes available again,
-a `bdr_recovery` event will be generated. This could potentially be used to
-reconfigure PgBouncer automatically to bring the node back into the available pool;
-however, it would be prudent to manually verify the node's status before
-exposing it to the application.
-
-If the failed node comes back up and connects correctly, output similar to this
-will be visible in the `repmgrd` log:
-
-    [2017-07-27 21:25:30] [DETAIL] monitoring node "node2" (ID: 2) in degraded mode
-    [2017-07-27 21:25:46] [INFO] monitoring BDR replication status on node "node2" (ID: 2)
-    [2017-07-27 21:25:46] [DETAIL] monitoring node "node2" (ID: 2) in degraded mode
-    [2017-07-27 21:25:55] [INFO] active replication slot for node "node1" found after 1 seconds
-    [2017-07-27 21:25:55] [NOTICE] node "node2" (ID: 2) has recovered after 986 seconds
-
-
-Shutdown of both nodes
-----------------------
-
-If both PostgreSQL instances are shut down, `repmgrd` will try to handle the
-situation as gracefully as possible, though with no failover candidates available
-there's not much it can do. Should this case ever occur, we recommend shutting
-down `repmgrd` on both nodes and restarting it once the PostgreSQL instances
-are running properly.
diff --git a/doc/changes-in-repmgr4.md b/doc/changes-in-repmgr4.md
index cbf663e5..780a7fea 100644
--- a/doc/changes-in-repmgr4.md
+++ b/doc/changes-in-repmgr4.md
@@ -1,7 +1,7 @@
 Changes in repmgr 4
 ===================
 
-This document has been integrated into the main repmgr documentation
+This document has been integrated into the main `repmgr` documentation
 and is now located here:
 
-  https://repmgr.org/docs/release-4.0.html
+  [Release notes](https://repmgr.org/docs/4.0/release-4.0.html)
diff --git a/doc/upgrading-from-repmgr3.md b/doc/upgrading-from-repmgr3.md
index 84a18363..030ef803 100644
--- a/doc/upgrading-from-repmgr3.md
+++ b/doc/upgrading-from-repmgr3.md
@@ -1,8 +1,9 @@
 Upgrading from repmgr 3
 =======================
 
-This document has been integrated into the main repmgr documentation
+This document has been integrated into the main `repmgr` documentation
 and is now located here:
 
-  https://repmgr.org/docs/upgrading-from-repmgr-3.html
+  [Upgrading from repmgr 3.x](https://repmgr.org/docs/4.0/upgrading-from-repmgr-3.html)
 
+