Fencing a failed master node with repmgrd and pgbouncer
=======================================================

With automatic failover, it's essential to ensure that a failed master
remains inaccessible to your application, even if it comes back online
again, to avoid a split-brain situation.

By using `pgbouncer` together with `repmgrd`, it's possible to combine
automatic failover with a process to isolate the failed master from
your application and ensure that all connections which should go to
the master are directed there smoothly without having to reconfigure
your application. (Note that as a connection pooler, `pgbouncer` can
benefit your application in other ways, but those are beyond the scope
of this document).

* * *

> *WARNING*: automatic failover is tricky to get right. This document
> demonstrates one possible implementation method, however you should
> carefully configure and test any setup to suit the needs of your own
> replication cluster/application.

* * *

In a failover situation, `repmgrd` promotes a standby to master by
executing the command defined in `promote_command`. Normally this
would be something like:

    repmgr standby promote -f /etc/repmgr.conf

By wrapping this in a custom script which adjusts the `pgbouncer`
configuration on all nodes, it's possible to fence the failed master
and redirect write connections to the new master.

The script consists of three sections:

* commands to pause `pgbouncer` on all nodes
* the promotion command itself
* commands to reconfigure and restart `pgbouncer` on all nodes

Note that it requires password-less SSH access between all nodes to be
able to update the `pgbouncer` configuration files.

For the purposes of this demonstration, we'll assume there are 3 nodes
(master and two standbys), with `pgbouncer` listening on port 6432
handling connections to a database called `appdb`. The `postgres`
system user must have write access to the `pgbouncer` configuration
file on all nodes, assumed to be at `/etc/pgbouncer.ini`.

The script also requires a template file containing global `pgbouncer`
configuration, which should looks something like this (adjust
settings appropriately for your environment):

`/var/lib/postgres/repmgr/pgbouncer.ini.template`

    [pgbouncer]

    logfile = /var/log/pgbouncer/pgbouncer.log
    pidfile = /var/run/pgbouncer/pgbouncer.pid

    listen_addr = *
    listen_port = 6532
    unix_socket_dir = /tmp

    auth_type = trust
    auth_file = /etc/pgbouncer.auth

    admin_users = postgres
    stats_users = postgres

    pool_mode = transaction

    max_client_conn = 100
    default_pool_size = 20
    min_pool_size = 5
    reserve_pool_size = 5
    reserve_pool_timeout = 3

    log_connections = 1
    log_disconnections = 1
    log_pooler_errors = 1

The actual script is as follows; adjust the configurable items as appropriate:

`/var/lib/postgres/repmgr/promote.sh`


    #!/usr/bin/env bash
    set -u
    set -e

    # Configurable items
    PGBOUNCER_HOSTS="node1 node2 node3"
    REPMGR_DB="repmgr"
    REPMGR_USER="repmgr"
    REPMGR_SCHEMA="repmgr_test"
    PGBOUNCER_CONFIG="/etc/pgbouncer.ini"
    PGBOUNCER_INI_TEMPLATE="/var/lib/postgres/repmgr/pgbouncer.ini.template"
    PGBOUNCER_DATABASE="appdb"

    # 1. Pause running pgbouncer instances
    for HOST in $PGBOUNCER_HOSTS
    do
        psql -t -c "pause" -h $HOST -p $PORT -U postgres pgbouncer
    done


    # 2. Promote this node from standby to master

    repmgr standby promote -f /etc/repmgr.conf


    # 3. Reconfigure pgbouncer instances

    PGBOUNCER_INI_NEW="/tmp/pgbouncer.ini.new"

    for HOST in $PGBOUNCER_HOSTS
    do
        # Recreate the pgbouncer config file
        echo -e "[databases]\n" > $PGBOUNCER_INI_NEW

        psql -d $REPMGR_DB -U $REPMGR_USER -t -A \
          -c "SELECT '$PGBOUNCER_DATABASE= ' || conninfo || ' application_name=pgbouncer_$HOST' \
              FROM $REPMGR_SCHEMA.repl_nodes \
              WHERE active = TRUE AND type='master'" >> $PGBOUNCER_INI_NEW

        cat $PGBOUNCER_INI_TEMPLATE >> $PGBOUNCER_INI_NEW

        rsync $PGBOUNCER_INI_NEW $HOST:$PGBOUNCER_CONFIG

        psql -tc "reload" -h $HOST -U postgres pgbouncer
        psql -tc "resume" -h $HOST -U postgres pgbouncer

    done

    # Clean up generated file
    rm $PGBOUNCER_INI_NEW

    echo "Reconfiguration of pgbouncer complete"

Script and template file should be installed on each node where
`repmgrd` is running.

Finally, set `promote_command` in `repmgr.conf` on each node to
point to the custom promote script:

    promote_command=/var/lib/postgres/repmgr/promote.sh

and reload/restart any running `repmgrd` instances for the changes to take
effect.