From 6b767048176f4c140f455d4aa934ed0adb9ce02b Mon Sep 17 00:00:00 2001 From: Ian Barwick Date: Thu, 26 Oct 2017 16:29:40 +0900 Subject: [PATCH] Initial conversion of existing BDR repmgr documentation --- doc/filelist.sgml | 1 + doc/repmgr-bdr.sgml | 37 ++++++++ doc/repmgr.sgml | 1 + doc/repmgrd-bdr.sgml | 171 ++++++++++++++++++++++++++++++++++++ doc/repmgrd-monitoring.sgml | 2 +- 5 files changed, 211 insertions(+), 1 deletion(-) create mode 100644 doc/repmgr-bdr.sgml create mode 100644 doc/repmgrd-bdr.sgml diff --git a/doc/filelist.sgml b/doc/filelist.sgml index 3f1ec405..85d09e7b 100644 --- a/doc/filelist.sgml +++ b/doc/filelist.sgml @@ -53,6 +53,7 @@ + diff --git a/doc/repmgr-bdr.sgml b/doc/repmgr-bdr.sgml new file mode 100644 index 00000000..58685c6e --- /dev/null +++ b/doc/repmgr-bdr.sgml @@ -0,0 +1,37 @@ + + + repmgrd + BDR + + + + BDR + + + BDR failover with repmgrd + + &repmgr; 4.x provides support for monitoring BDR nodes and taking action in + case one of the nodes fails. + + + + Due to the nature of BDR, it's only safe to use this solution for + a two-node scenario. Introducing additional nodes will create an inherent + risk of node desynchronisation if a node goes down without being cleanly + removed from the cluster. + + + + In contrast to streaming replication, there's no concept of "promoting" a new + primary node with BDR. Instead, "failover" involves monitoring both nodes + with repmgrd and redirecting queries from the failed node to the remaining + active node. This can be done by using an + event notification script + which is called by repmgrd to dynamically + reconfigure a proxy server / connection pooler such as PgBouncer.
+ + + + + + diff --git a/doc/repmgr.sgml b/doc/repmgr.sgml index 93e4fe74..bbd8044f 100644 --- a/doc/repmgr.sgml +++ b/doc/repmgr.sgml @@ -86,6 +86,7 @@ &repmgrd-network-split; &repmgrd-degraded-monitoring; &repmgrd-monitoring; + &repmgrd-bdr; diff --git a/doc/repmgrd-bdr.sgml b/doc/repmgrd-bdr.sgml new file mode 100644 index 00000000..c5e32a27 --- /dev/null +++ b/doc/repmgrd-bdr.sgml @@ -0,0 +1,171 @@ + + + repmgrd + BDR + + + + BDR + + + BDR failover with repmgrd + + &repmgr; 4.x provides support for monitoring BDR nodes and taking action in + case one of the nodes fails. + + + + Due to the nature of BDR, it's only safe to use this solution for + a two-node scenario. Introducing additional nodes will create an inherent + risk of node desynchronisation if a node goes down without being cleanly + removed from the cluster. + + + + In contrast to streaming replication, there's no concept of "promoting" a new + primary node with BDR. Instead, "failover" involves monitoring both nodes + with repmgrd and redirecting queries from the failed node to the remaining + active node. This can be done by using an + event notification script + which is called by repmgrd to dynamically + reconfigure a proxy server / connection pooler such as PgBouncer. + + + + Prerequisites + + &repmgr; 4 requires PostgreSQL 9.4 or 9.6 with the BDR 2 extension + enabled and configured for a two-node BDR network. &repmgr; 4 packages + must be installed on each node before attempting to configure + repmgr. + + + + &repmgr; 4 will refuse to install if it detects more than two BDR nodes. + + + + Application database connections must be passed through a proxy server / + connection pooler such as PgBouncer, and it must be possible to dynamically + reconfigure that from repmgrd. The example demonstrated in this document + will use PgBouncer. + + + The proxy server / connection pooler must not + be installed on the database servers.
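To illustrate the proxy layer described above, a minimal PgBouncer [databases] entry pointing application connections at the currently active node might look like the following. This is an illustrative fragment only (hostname and database name follow the examples used in this document; the rest of the PgBouncer configuration is omitted):

```ini
; Illustrative fragment -- client connections target the currently
; active BDR node; on failover, repmgrd's event notification script
; rewrites this stanza to point at the surviving node.
[databases]
bdrtest = host=node1 port=5432 dbname=bdrtest
```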
+ + + For this example, it's assumed password-less SSH connections are available + from the PostgreSQL servers to the servers where PgBouncer + runs, and that the user on those servers has permission to alter the + PgBouncer configuration files. + + + PostgreSQL connections must be possible between each node, and each node + must be able to connect to each PgBouncer instance. + + + + + Configuration + + A sample configuration for repmgr.conf on each + BDR node would look like this: + + # Node information + node_id=1 + node_name='node1' + conninfo='host=node1 dbname=bdrtest user=repmgr connect_timeout=2' + data_directory='/var/lib/postgresql/data' + replication_type='bdr' + + # Event notification configuration + event_notifications=bdr_failover + event_notification_command='/path/to/bdr-pgbouncer.sh %n %e %s "%c" "%a" >> /tmp/bdr-failover.log 2>&1' + + # repmgrd options + monitor_interval_secs=5 + reconnect_attempts=6 + reconnect_interval=5 + + + Adjust settings as appropriate; copy and adjust for the second node (particularly + the values node_id, node_name + and conninfo). + + + Note that the values provided for the conninfo string + must be valid for connections from both nodes in the + replication cluster. The database must be the BDR-enabled database. + + + If defined, the event_notifications parameter + will restrict execution of event_notification_command + to the specified event(s). + + + + event_notification_command is the script which does the actual "heavy lifting" + of reconfiguring the proxy server / connection pooler. It is fully + user-definable; a reference implementation is documented below.
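The event notification script can be sketched as follows. This is a hypothetical, simplified stand-in for the reference implementation: it assumes the %c placeholder carries the conninfo string of the node which should now receive traffic, and that PgBouncer reads its [databases] section from a separate include file (all paths are illustrative):

```shell
#!/bin/sh
# bdr-pgbouncer.sh -- hypothetical sketch of an event notification script.
# Arguments follow the placeholders used in event_notification_command:
#   $1 = node id (%n), $2 = event type (%e), $3 = success flag (%s),
#   $4 = conninfo (%c), $5 = details (%a)
NODE_ID="$1"; EVENT="$2"; SUCCESS="$3"; CONNINFO="$4"; DETAILS="$5"

# Build a [databases] stanza pointing PgBouncer at the surviving node;
# host and dbname are extracted from the supplied conninfo string.
build_pgbouncer_stanza() {
    conninfo="$1"
    host=$(echo "$conninfo" | tr ' ' '\n' | sed -n 's/^host=//p')
    dbname=$(echo "$conninfo" | tr ' ' '\n' | sed -n 's/^dbname=//p')
    printf '[databases]\n%s = host=%s dbname=%s\n' "$dbname" "$host" "$dbname"
}

if [ "$EVENT" = "bdr_failover" ] && [ "$SUCCESS" = "1" ]; then
    # Rewrite the stanza on each PgBouncer host (password-less SSH is
    # assumed above); the include-file path is illustrative:
    build_pgbouncer_stanza "$CONNINFO" > /etc/pgbouncer/pgbouncer.database.ini
    # ...then have PgBouncer reload its configuration, e.g. via
    # psql -p 6432 -U pgbouncer pgbouncer -c "RELOAD"
fi
```

The key design point is that the script itself is stateless: repmgrd tells it which node is active, and it simply regenerates the proxy configuration accordingly.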
+ + + + + + + repmgr setup + + Register both nodes; example on node1: + + $ repmgr -f /etc/repmgr.conf bdr register + NOTICE: attempting to install extension "repmgr" + NOTICE: "repmgr" extension successfully installed + NOTICE: node record created for node 'node1' (ID: 1) + NOTICE: BDR node 1 registered (conninfo: host=node1 dbname=bdrtest user=repmgr) + + + and on node2: + + $ repmgr -f /etc/repmgr.conf bdr register + NOTICE: node record created for node 'node2' (ID: 2) + NOTICE: BDR node 2 registered (conninfo: host=node2 dbname=bdrtest user=repmgr) + + + The repmgr extension will be automatically created + when the first node is registered, and will be propagated to the second + node. + + + + Ensure the &repmgr; package is available on both nodes before + attempting to register the first node. + + + + At this point the metadata for both nodes has been created; executing + repmgr cluster show (on either node) should produce output like this: + + $ repmgr -f /etc/repmgr.conf cluster show + ID | Name | Role | Status | Upstream | Location | Connection string + ----+-------+------+-----------+----------+----------+--------------------------------------------------------- + 1 | node1 | bdr | * running | | default | host=node1 dbname=bdrtest user=repmgr connect_timeout=2 + 2 | node2 | bdr | * running | | default | host=node2 dbname=bdrtest user=repmgr connect_timeout=2 + + + Additionally it's possible to display a log of significant events; executing + repmgr cluster event (on either node) should produce output like this: + + $ repmgr -f /etc/repmgr.conf cluster event + Node ID | Event | OK | Timestamp | Details + ---------+--------------+----+---------------------+---------------------------------------------- + 2 | bdr_register | t | 2017-07-27 17:51:48 | node record created for node 'node2' (ID: 2) + 1 | bdr_register | t | 2017-07-27 17:51:00 | node record created for node 'node1' (ID: 1) + + + + At this point there will only be records for the two node registrations (displayed in reverse + chronological order).
+ + + + diff --git a/doc/repmgrd-monitoring.sgml b/doc/repmgrd-monitoring.sgml index f2ad9d57..e20d3f07 100644 --- a/doc/repmgrd-monitoring.sgml +++ b/doc/repmgrd-monitoring.sgml @@ -6,7 +6,7 @@ Monitoring with repmgrd - When `repmgrd` is running with the option monitoring_history=true, + When repmgrd is running with the option monitoring_history=true, it will constantly write standby node status information to the monitoring_history table, providing a near-real time overview of replication status on all nodes