Mirror of https://github.com/EnterpriseDB/repmgr.git
Synced 2026-03-23 15:16:29 +00:00

Compare commits: 1 commit (372f4f7d3d)
@@ -1,29 +0,0 @@
License and Contributions
=========================

`repmgr` is licensed under the GPL v3. All of its code and documentation is
Copyright 2010-2015, 2ndQuadrant Limited. See the files COPYRIGHT and LICENSE for
details.

The development of repmgr has primarily been sponsored by 2ndQuadrant customers.

Additional work has been sponsored by the 4CaaST project for cloud computing,
which has received funding from the European Union's Seventh Framework Programme
(FP7/2007-2013) under grant agreement 258862.

Contributions to `repmgr` are welcome, and will be listed in the file `CREDITS`.
2ndQuadrant Limited requires that any contributions provide a copyright
assignment and a disclaimer of any work-for-hire ownership claims from the
employer of the developer. This lets us make sure that all of the repmgr
distribution remains free code. Please contact info@2ndQuadrant.com for a
copy of the relevant Copyright Assignment Form.

Code style
----------

Code in repmgr is formatted to a consistent style using the following command:

    astyle --style=ansi --indent=tab --suffix=none *.c *.h

Contributors should reformat their code similarly before submitting code to
the project, in order to minimize merge conflicts with other work.
@@ -203,12 +203,6 @@ repmgr will also ask for the superuser password on the witness database so
it can reconnect when needed (the command line option --initdb-no-pwprompt
will set up a password-less superuser).

By default the witness server will listen on port 5499; this value can be
overridden by explicitly providing the port number in the conninfo string
in repmgr.conf. (Note that it is also possible to specify the port number
with the -l/--local-port option, however this option is now deprecated and
will be overridden by a port setting in the conninfo string).

Start the repmgrd daemons
-------------------------
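
The witness-port paragraph in the hunk above says the port is taken from the conninfo string, with 5499 as the default. The following is only a rough sketch of that lookup, not repmgr's own parsing: `conninfo_port` is a hypothetical helper, and it assumes a simple space-separated `key=value` conninfo string without quoting or spaces around `=`.

```shell
# Hypothetical helper (not part of repmgr): print the port found in a
# keyword/value conninfo string, or a fallback when no port is given.
conninfo_port() {
    conninfo=$1
    fallback=${2:-5499}   # 5499 is the witness default mentioned in the text
    # Split on spaces, keep the value of the last "port=" keyword, if any.
    port=$(printf '%s\n' "$conninfo" | tr ' ' '\n' | sed -n 's/^port=//p' | tail -n 1)
    printf '%s\n' "${port:-$fallback}"
}

conninfo_port "host=witness1 dbname=repmgr_db user=repmgr_usr port=5500"
conninfo_port "host=witness1 dbname=repmgr_db user=repmgr_usr"
```

The first call prints the explicit port and the second falls back to the default. The host name `witness1` is an illustrative value only.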
19
FAQ.md
@@ -90,23 +90,6 @@ General

  This option is only available when using the `--rsync-only` option.

- How can I make the witness server use a particular port?

  By default the witness server is configured to use port 5499; this
  is intended to support running the witness server as a separate
  instance on a normal node server, rather than on its own dedicated server.

  To specify a port for the witness server, supply the port number to
  repmgr with the `-l/--local-port` command line option.

- Do I need to include `shared_preload_libraries = 'repmgr_funcs'`
  in `postgresql.conf` if I'm not using `repmgrd`?

  No, the `repmgr_funcs` library is only needed when running `repmgrd`.
  If you later decide to run `repmgrd`, you just need to add
  `shared_preload_libraries = 'repmgr_funcs'` and restart PostgreSQL.


`repmgrd`
---------
@@ -119,7 +102,7 @@ General

- How can I prevent a node from ever being promoted to master?

  In `repmgr.conf`, set its priority to a value of 0 or less.
  In `rempgr.conf`, set its priority to a value of 0 or less.

- Does `repmgrd` support delayed standbys?
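
The FAQ entry in the hunk above states that a priority of 0 or less keeps a node from being promoted. A minimal sketch of checking that rule against a configuration file follows. `node_is_promotable` is a hypothetical helper, the `priority=` key name is an assumption about the `repmgr.conf` format, and treating a missing setting as "eligible" is also an assumption.

```shell
# Hypothetical check (not part of repmgr): succeed if the given repmgr.conf
# leaves the node eligible for promotion, i.e. its priority is above 0.
node_is_promotable() {
    conf=$1
    # Assumption: the setting is written as "priority=<integer>".
    prio=$(sed -n 's/^[[:space:]]*priority[[:space:]]*=[[:space:]]*\(-\{0,1\}[0-9][0-9]*\).*/\1/p' "$conf" | tail -n 1)
    # Assumption: with no explicit priority the node stays eligible.
    [ -z "$prio" ] && return 0
    [ "$prio" -gt 0 ]
}

tmp=$(mktemp)
printf 'cluster=test\nnode=2\npriority=0\n' > "$tmp"
if node_is_promotable "$tmp"; then
    echo "node may be promoted"
else
    echo "node will never be promoted"
fi
rm -f "$tmp"
```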
30
HISTORY
@@ -1,27 +1,4 @@
3.0.2 2015-09-
    Improve handling of --help/--version options; and improve help output (Ian)
    Improve handling of situation where logfile can't be opened (Ian)
    Always pass -D/--pgdata option to pg_basebackup (Ian)
    Bugfix: standby clone --force does not empty pg_xlog (Gianni)
    Bugfix: autofailover with reconnect_attempts > 1 (Gianni)
    Bugfix: ignore comments after values (soxwellfb)
    Bugfix: handle string values in 'node' parameter correctly (Gregory Duchatelet)
    Allow repmgr to be compiled with a newer libpq (Marco)
    Bugfix: call update_node_record_set_upstream() for STANDBY FOLLOW (Tomas)
    Update `repmgr --help` output (per Github report from renard)
    Update tablespace remapping in --rsync-only mode for 9.5 and later (Ian)
    Deprecate `-l/--local-port` option - the port can be extracted
      from the conninfo string in repmgr.conf (Ian)
    Add STANDBY UNREGISTE (Vik Fearing)

3.0.1 2015-04-16
    Prevent repmgrd from looping infinitely if node was not registered (Ian)
    When promoting a standby, have repmgr (not repmgrd) handle metadata updates (Ian)
    Re-use replication slot if it already exists (Ian)
    Prevent a test SSH connection being made when not needed (Ian)
    Correct monitoring table column names (Ian)

3.0 2015-03-27
3.0
    Require PostgreSQL 9.3 or later (Ian)
    Use `pg_basebackup` by default (instead of `rsync`) to clone standby servers (Ian)
    Use `pg_ctl promote` to promote a standby to primary
@@ -34,11 +11,6 @@
    General usability and logging message improvements (Ian)
    Code consolidation and cleanup (Ian)

2.0.3 2015-04-16
    Add -S/--superuser option for witness database creation Ian)
    Add -c/--fast-checkpoint option for cloning (Christoph)
    Add option "--initdb-no-pwprompt" (Ian)

2.0.2 2015-02-17
    Add "--checksum" in rsync when using "--force" (Jaime)
    Use createdb/createuser instead of psql (Jaime)
96
PACKAGES.md
@@ -4,10 +4,10 @@ Packaging
Notes on RedHat Linux, Fedora, and CentOS Builds
------------------------------------------------

The RPM packages of PostgreSQL put `pg_config` into the `postgresql-devel`
The RPM packages of PostgreSQL put ``pg_config`` into the ``postgresql-devel``
package, not the main server one. And if you have a RPM install of PostgreSQL
9.0, the entire PostgreSQL binary directory will not be in your PATH by default
either. Individual utilities are made available via the `alternatives`
either. Individual utilities are made available via the ``alternatives``
mechanism, but not all commands will be wrapped that way. The files installed
by repmgr will certainly not be in the default PATH for the postgres user
on such a system. They will instead be in /usr/pgsql-9.0/bin/ on this
@@ -15,61 +15,57 @@ type of system.

When building repmgr against a RPM packaged build, you may discover that some
development packages are needed as well. The following build errors can
occur:
occur::

    /usr/bin/ld: cannot find -lxslt
    /usr/bin/ld: cannot find -lpam
    /usr/bin/ld: cannot find -lxslt
    /usr/bin/ld: cannot find -lpam

Install the following packages to correct those:
Install the following packages to correct those::


    yum install libxslt-devel
    yum install pam-devel
    yum install libxslt-devel
    yum install pam-devel

If building repmgr as a regular user, then doing the install into the system
directories using sudo, the syntax is hard. `pg_config` won't be in root's
path either. The following recipe should work:

    sudo PATH="/usr/pgsql-9.0/bin:$PATH" make USE_PGXS=1 install
directories using sudo, the syntax is hard. ``pg_config`` won't be in root's
path either. The following recipe should work::

    sudo PATH="/usr/pgsql-9.0/bin:$PATH" make USE_PGXS=1 install

Issues with 32 and 64 bit RPMs
------------------------------

If when building, you receive a series of errors of this form:
If when building, you receive a series of errors of this form::

    /usr/bin/ld: skipping incompatible /usr/pgsql-9.0/lib/libpq.so when searching for -lpq

This is likely because you have both the 32 and 64 bit versions of the
`postgresql90-devel` package installed. You can check that like this:
``postgresql90-devel`` package installed. You can check that like this::

    rpm -qa --queryformat '%{NAME}\t%{ARCH}\n' | grep postgresql90-devel
    rpm -qa --queryformat '%{NAME}\t%{ARCH}\n' | grep postgresql90-devel

And if two packages appear, one for i386 and one for x86_64, that's not supposed
to be allowed.

This can happen when using the PGDG repo to install that package;
here is an example sessions demonstrating the problem case appearing:
here is an example sessions demonstrating the problem case appearing::

    # yum install postgresql-devel
    ..
    Setting up Install Process
    Resolving Dependencies
    --> Running transaction check
    ---> Package postgresql90-devel.i386 0:9.0.2-2PGDG.rhel5 set to be updated
    ---> Package postgresql90-devel.x86_64 0:9.0.2-2PGDG.rhel5 set to be updated
    --> Finished Dependency Resolution

    # yum install postgresql-devel
    ..
    Setting up Install Process
    Resolving Dependencies
    --> Running transaction check
    ---> Package postgresql90-devel.i386 0:9.0.2-2PGDG.rhel5 set to be updated
    ---> Package postgresql90-devel.x86_64 0:9.0.2-2PGDG.rhel5 set to be updated
    --> Finished Dependency Resolution

    Dependencies Resolved

    =========================================================================
     Package                 Arch       Version              Repository  Size
    =========================================================================
    Installing:
     postgresql90-devel      i386       9.0.2-2PGDG.rhel5    pgdg90     1.5 M
     postgresql90-devel      x86_64     9.0.2-2PGDG.rhel5    pgdg90     1.6 M
    Dependencies Resolved

    =========================================================================
     Package                 Arch       Version              Repository  Size
    =========================================================================
    Installing:
     postgresql90-devel      i386       9.0.2-2PGDG.rhel5    pgdg90     1.5 M
     postgresql90-devel      x86_64     9.0.2-2PGDG.rhel5    pgdg90     1.6 M

Note how both the i386 and x86_64 platform architectures are selected for
installation. Your main PostgreSQL package will only be compatible with one of
@@ -77,14 +73,14 @@ those, and if the repmgr build finds the wrong postgresql90-devel these
"skipping incompatible" messages appear.

In this case, you can temporarily remove both packages, then just install the
correct one for your architecture. Example:
correct one for your architecture. Example::

    rpm -e postgresql90-devel --allmatches
    yum install postgresql90-devel-9.0.2-2PGDG.rhel5.x86_64
    rpm -e postgresql90-devel --allmatches
    yum install postgresql90-devel-9.0.2-2PGDG.rhel5.x86_64

Instead just deleting the package from the wrong platform might not leave behind
the correct files, due to the way in which these accidentally happen to interact.
If you already tried to build repmgr before doing this, you'll need to do:
If you already tried to build repmgr before doing this, you'll need to do::

    make USE_PGXS=1 clean
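
The 32/64-bit check described in the hunks above amounts to counting how many architectures are installed for one package name. The sketch below shows one way to script that test. `package_arches` is a hypothetical helper, and the sample input stands in for real `rpm -qa --queryformat '%{NAME}\t%{ARCH}\n'` output so the example runs without `rpm`.

```shell
# Hypothetical helper: read "name<TAB>arch" lines on stdin and print the
# distinct architectures found for the package named in $1.
package_arches() {
    awk -F '\t' -v pkg="$1" '$1 == pkg { print $2 }' | sort -u
}

# Sample input mirroring the problem case; on a real system, pipe in
#   rpm -qa --queryformat '%{NAME}\t%{ARCH}\n'
arches=$(printf 'postgresql90-devel\ti386\npostgresql90-devel\tx86_64\npostgresql90\tx86_64\n' \
    | package_arches postgresql90-devel)
count=$(printf '%s\n' "$arches" | grep -c .)

if [ "$count" -gt 1 ]; then
    # $arches is left unquoted on purpose so the names print on one line.
    echo "postgresql90-devel is installed for multiple architectures:" $arches
fi
```

Matching on the exact package name, rather than using `grep`, avoids counting other packages whose names merely contain the same string.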
@@ -93,19 +89,19 @@ to get rid of leftover files from the wrong architecture.
Notes on Ubuntu, Debian or other Debian-based Builds
----------------------------------------------------

The Debian packages of PostgreSQL put `pg_config` into the development package
called `postgresql-server-dev-$version`.
The Debian packages of PostgreSQL put ``pg_config`` into the development package
called ``postgresql-server-dev-$version``.

When building repmgr against a Debian packages build, you may discover that some
development packages are needed as well. You will need the following development
packages installed:
packages installed::

    sudo apt-get install libxslt-dev libxml2-dev libpam-dev libedit-dev
    sudo apt-get install libxslt-dev libxml2-dev libpam-dev libedit-dev

If you're using Debian packages for PostgreSQL and are building repmgr with the
USE_PGXS option you also need to install the corresponding development package:
If your using Debian packages for PostgreSQL and are building repmgr with the
USE_PGXS option you also need to install the corresponding development package::

    sudo apt-get install postgresql-server-dev-9.0
    sudo apt-get install postgresql-server-dev-9.0

If you build and install repmgr manually it will not be on the system path. The
binaries will be installed in /usr/lib/postgresql/$version/bin/ which is not on
@@ -114,14 +110,14 @@ multiple installed versions of PostgreSQL on the same system through a wrapper
called pg_wrapper and repmgr is not (yet) known to this wrapper.

You can solve this in many different ways, the most Debian like is to make an
alternate for repmgr and repmgrd:
alternate for repmgr and repmgrd::

    sudo update-alternatives --install /usr/bin/repmgr repmgr /usr/lib/postgresql/9.0/bin/repmgr 10
    sudo update-alternatives --install /usr/bin/repmgrd repmgrd /usr/lib/postgresql/9.0/bin/repmgrd 10
    sudo update-alternatives --install /usr/bin/repmgr repmgr /usr/lib/postgresql/9.0/bin/repmgr 10
    sudo update-alternatives --install /usr/bin/repmgrd repmgrd /usr/lib/postgresql/9.0/bin/repmgrd 10

You can also make a deb package of repmgr using:
You can also make a deb package of repmgr using::

    make USE_PGXS=1 deb
    make USE_PGXS=1 deb

This will build a Debian package one level up from where you build, normally the
same directory that you have your repmgr/ directory in.
@@ -21,8 +21,7 @@ Master setup
CREATE DATABASE repmgr_db OWNER repmgr_usr;
```

- configure `postgresql.conf` for replication (see README.md for sample
  settings)
- configure `postgresql.conf` for replication (see above)

- update `pg_hba.conf`, e.g.:
@@ -72,10 +71,7 @@ Standby setup
    [2015-03-03 18:18:23] [NOTICE] HINT: You can now start your postgresql server
    [2015-03-03 18:18:23] [NOTICE] for example : pg_ctl -D /path/to/standby/data start

Note that the `repmgr.conf` file is not required when cloning a standby.
However we recommend providing a valid `repmgr.conf` if you wish to use
replication slots, or want `repmgr` to log the clone event to the
`repl_events` table.
Note that at this point it does not matter if the `repmgr.conf` file is not found.

This will clone the PostgreSQL database files from the master, including its
`postgresql.conf` and `pg_hba.conf` files, and additionally automatically create
@@ -111,8 +107,8 @@ This concludes the basic `repmgr` setup of master and standby. The records
created in the `repl_nodes` table should look something like this:

    repmgr_db=# SELECT * from repmgr_test.repl_nodes;
     id |  type   | upstream_node_id | cluster | name  |                      conninfo                      | slot_name | priority | active
    ----+---------+------------------+---------+-------+----------------------------------------------------+-----------+----------+--------
      1 | primary |                  | test    | node1 | host=repmgr_node1 user=repmgr_usr dbname=repmgr_db |           |        0 | t
      2 | standby |                1 | test    | node2 | host=repmgr_node2 user=repmgr_usr dbname=repmgr_db |           |        0 | t
     id |  type   | upstream_node_id | cluster | name  |                    conninfo                     | slot_name | priority | active
    ----+---------+------------------+---------+-------+-------------------------------------------------+-----------+----------+--------
      1 | primary |                  | test    | node1 | host=localhost user=repmgr_usr dbname=repmgr_db |           |        0 | t
      2 | standby |                1 | test    | node2 | host=localhost user=repmgr_usr dbname=repmgr_db |           |        0 | t
    (2 rows)
45
README.md
@@ -7,7 +7,7 @@ hot-standby capabilities with tools to set up standby servers, monitor
replication, and perform administrative tasks such as failover or manual
switchover operations.

This document covers `repmgr 3`, which supports PostgreSQL 9.3 and later.
This document covers `repmgr 3`, which supports PostgreSQL 9.4 and 9.3.
This version can use `pg_basebackup` to clone standby servers, supports
replication slots and cascading replication, doesn't require a restart
after promotion, and has many usability improvements.
@@ -53,7 +53,7 @@ on any UNIX-like system which PostgreSQL itself supports.

All nodes must be running the same major version of PostgreSQL, and we
recommend that they also run the same minor version. This version of
`repmgr` (v3) supports PostgreSQL 9.3 and later.
`repmgr` (v3) supports PostgreSQL 9.3 and 9.4.

Earlier versions of `repmgr` needed password-less SSH access between
nodes in order to clone standby servers using `rsync`. `repmgr 3` can
@@ -98,8 +98,8 @@ for details.

### PostgreSQL configuration

The primary server needs to be configured for replication with settings
like the following in `postgresql.conf`:
The primary server needs to be configured for replication with the
following settings in `postgresql.conf`:

    # Allow read-only queries on standby servers. The number of WAL
    # senders should be larger than the number of standby servers.
@@ -121,18 +121,13 @@ like the following in `postgresql.conf`:
    archive_mode = on
    archive_command = 'cd .'

    # If you plan to use repmgrd, ensure that shared_preload_libraries
    # is configured to load 'repmgr_funcs'

    shared_preload_libraries = 'repmgr_funcs'
    # You can also set additional replication parameters here, such as
    # hot_standby_feedback or synchronous_standby_names.

PostgreSQL 9.4 makes it possible to use replication slots, which means
the value of `wal_keep_segments` need no longer be set. See section
"Replication slots" below for more details.

With PostgreSQL 9.3, `repmgr` expects `wal_keep_segments` to be set to
at least 5000 (= 80GB of WAL) by default, though this can be overriden
with the `-w N` argument.
the value of wal_keep_segments need no longer be set. With 9.3, `repmgr`
expects it to be set to at least 5000 (= 80GB of WAL) by default, though
this can be overriden with the `-w N` argument.

A dedicated PostgreSQL superuser account and a database in which to
store monitoring and replication data are required. Create them by
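
The "5000 (= 80GB of WAL)" figure in the hunk above follows from the WAL segment size. As a quick worked check, assuming PostgreSQL's default 16 MB segment size (`wal_retained_mb` is a hypothetical helper used only for this illustration):

```shell
# Hypothetical helper: disk space, in MB, retained by a given
# wal_keep_segments value. Assumes the default 16 MB WAL segment size
# unless a different size is passed as the second argument.
wal_retained_mb() {
    segments=$1
    segment_mb=${2:-16}
    echo $(( segments * segment_mb ))
}

# 5000 segments * 16 MB = 80000 MB, the "80GB" quoted in the README text.
wal_retained_mb 5000
```

The same calculation can help when choosing a smaller value to pass with `-w N` on hosts with limited disk space.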
@@ -228,7 +223,7 @@ The node can then be restarted.
The node will then need to be re-registered with `repmgr`; again
the `--force` option is required to update the existing record:

    repmgr -f /etc/repmgr/repmgr.conf \
    repmgr -f /etc/repmgr/repmgr.conf
      --force \
      standby register
@@ -350,7 +345,6 @@ Following event types currently exist:

    master_register
    standby_register
    standby_unregister
    standby_clone
    standby_promote
    witness_create
@@ -404,18 +398,6 @@ stored in the `repl_nodes` table.
Note that `repmgr` will fail with an error if this option is specified when
working with PostgreSQL 9.3.

Be aware that when initially cloning a standby, you will need to ensure
that all required WAL files remain available while the cloning is taking
place. If using the default `pg_basebackup` method, we recommend setting
`pg_basebackup`'s `--xlog-method` parameter to `stream` like this:

    pg_basebackup_options='--xlog-method=stream'

See the `pg_basebackup` documentation [*] for details. Otherwise you'll need
to set `wal_keep_segments` to an appropriately high value.

[*] http://www.postgresql.org/docs/current/static/app-pgbasebackup.html

Further reading:
* http://www.postgresql.org/docs/current/interactive/warm-standby.html#STREAMING-REPLICATION-SLOTS
* http://blog.2ndquadrant.com/postgresql-9-4-slots/
@@ -453,19 +435,12 @@ its port if is different from the default one.
  Registers a master in a cluster. This command needs to be executed before any
  standby nodes are registered.

  `primary register` can be used as an alias for `master register`.

* `standby register`

  Registers a standby with `repmgr`. This command needs to be executed to enable
  promote/follow operations and to allow `repmgrd` to work with the node.
  An existing standby can be registered using this command.

* `standby unregister`

  Unregisters a standby with `repmgr`. This command does not affect the actual
  replication.

* `standby clone [node to be cloned]`

  Clones a new standby node from the data directory of the master (or
@@ -1,114 +1,89 @@
#!/bin/sh
#!/bin/bash
#
# chkconfig: - 75 16
# description: Enable repmgrd replication management and monitoring daemon for PostgreSQL
# processname: repmgrd
# pidfile="/var/run/${NAME}.pid"
# repmgrd Start up the repmgrd daemon
# repmrgd (replication manager daemon)
#
# chkconfig: - 75 16
# description: repmgrd is the repliation manager daemon \
# The repmgrd replication management and monitoring daemon for PostgreSQL.

### BEGIN INIT INFO
# Provides: repmgrd
# Required-Start: $local_fs $remote_fs $network $syslog postgresql
# Required-Stop: $local_fs $remote_fs $network $syslog postgresql
# Should-Start: $syslog postgresql-9.3
# Should-Stop: $syslog postgresql-9.3
# Short-Description: start and stop repmrgd
# Description: Enable repmgrd replication management and monitoring daemon for PostgreSQL
# this is used to monitor a postgresql cluster.
### END INIT INFO

# Source function library.
INITD=/etc/rc.d/init.d
. $INITD/functions
. /etc/init.d/functions

# Get function listing for cross-distribution logic.
TYPESET=`typeset -f|grep "declare"`

# Get network config.
# Source networking configuration.
. /etc/sysconfig/network

DESC="PostgreSQL replication management and monitoring daemon"
NAME=repmgrd

REPMGRD_ENABLED=no
prog=repmgrd
REPMGRD_ENABLED=yes
REPMGRD_OPTS=
REPMGRD_USER=postgres
REPMGRD_BIN=/usr/pgsql-9.3/bin/repmgrd
REPMGRD_PIDFILE=/var/run/repmgrd.pid
REPMGRD_LOCK=/var/lock/subsys/${NAME}
REPMGRD_LOG=/var/lib/pgsql/9.3/data/pg_log/repmgrd.log
DAEMONIZE="-d"

# Read configuration variable file if it is present
[ -r /etc/sysconfig/$NAME ] && . /etc/sysconfig/$NAME
# pull in sysconfig settings
[ -f /etc/sysconfig/repmgrd ] && . /etc/sysconfig/repmgrd

# For SELinux we need to use 'runuser' not 'su'
if [ -x /sbin/runuser ]
then
SU=runuser
else
SU=su
fi

test -x $REPMGRD_BIN || exit 0
LOCKFILE=/var/lock/subsys/$prog
RETVAL=0

case "$REPMGRD_ENABLED" in
[Yy]*)
break
#nothing to do here
;;
*)
exit 0
exit 2
;;
esac


if [ -z "${REPMGRD_OPTS}" ]
if [ -z "$REPMGRD_OPTS" ]
then
echo "Not starting ${NAME}, REPMGRD_OPTS not set in /etc/sysconfig/${NAME}"
exit 0
echo "Not starting $prog, REPMGRD_OPTS not set in /etc/sysconfig/$prog"
exit 2
fi

start()
{
REPMGRD_START=$"Starting ${NAME} service: "
start() {
[ "$EUID" != "0" ] && exit 4
[ "$NETWORKING" = "no" ] && exit 1

# Make sure startup-time log file is valid
if [ ! -e "${REPMGRD_LOG}" -a ! -h "${REPMGRD_LOG}" ]
then
touch "${REPMGRD_LOG}" || exit 1
chown ${REPMGRD_USER}:postgres "${REPMGRD_LOG}"
chmod go-rwx "${REPMGRD_LOG}"
[ -x /sbin/restorecon ] && /sbin/restorecon "${REPMGRD_LOG}"
fi

echo -n "${REPMGRD_START}"
$SU -l $REPMGRD_USER -c "${REPMGRD_BIN} ${REPMGRD_OPTS} -p ${REPMGRD_PIDFILE} &" >> "${REPMGRD_LOG}" 2>&1 < /dev/null
sleep 2
pid=`head -n 1 "${REPMGRD_PIDFILE}" 2>/dev/null`
if [ "x${pid}" != "x" ]
then
success "${REPMGRD_START}"
touch "${REPMGRD_LOCK}"
echo $pid > "${REPMGRD_PIDFILE}"
# Start daemons.
echo -n $"Starting $prog: "
daemon --user $REPMGRD_USER $prog $DAEMONIZE $REPMGRD_OPTS
RETVAL=$?
echo
else
failure "${REPMGRD_START}"
echo
script_result=1
fi
[ $RETVAL -eq 0 ] && touch $LOCKFILE
return $RETVAL
}

stop()
{
echo -n $"Stopping ${NAME} service: "
if [ -e "${REPMGRD_LOCK}" ]
then
killproc ${NAME}
ret=$?
if [ $ret -eq 0 ]
then
echo_success
rm -f "${REPMGRD_PIDFILE}"
rm -f "${REPMGRD_LOCK}"
stop() {
[ "$EUID" != "0" ] && exit 4
echo -n $"Shutting down $prog: "
killproc $prog
RETVAL=$?
echo
[ $RETVAL -eq 0 ] && rm -f $LOCKFILE
return $RETVAL
}
status() {
if [ -f "$LOCKFILE" ]; then
echo "$prog is running"
else
echo_failure
script_result=1
RETVAL=3
echo "$prog is stopped"
fi
else
# not running; per LSB standards this is "ok"
echo_success
fi
echo
return $RETVAL
}


# See how we were called.
case "$1" in
start)
@@ -118,16 +93,22 @@ case "$1" in
stop
;;
status)
status -p $REPMGRD_PIDFILE $NAME
script_result=$?
status $prog
;;
restart)
restart|force-reload)
stop
start
start
;;
try-restart|condrestart)
if status $prog > /dev/null; then
stop
start
fi
;;
reload)
exit 3
;;
*)
echo $"Usage: $0 {start|stop|status|restart}"
echo $"Usage: $0 {start|stop|status|restart|try-restart|force-reload}"
exit 2
esac

exit $script_result
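
One of the two `status` variants in the script above reports "running" whenever the lock file exists, which can be stale after a crash; the other checks the PID file via the distribution's `status -p`. As a rough, distribution-independent sketch of the PID-file approach (`pidfile_status` is a hypothetical helper, not taken from either version of the script), the check could look like this:

```shell
# Hypothetical helper: return 0 if the PID stored in the given file belongs
# to a live process, and 3 (the LSB "not running" status) otherwise.
pidfile_status() {
    pidfile=$1
    [ -r "$pidfile" ] || return 3
    pid=$(head -n 1 "$pidfile" 2>/dev/null)
    # Reject empty or non-numeric contents before calling kill.
    case "$pid" in
        ''|*[!0-9]*) return 3 ;;
    esac
    # kill -0 sends no signal; it only tests whether the process can be
    # signalled, so run this as root or as the daemon's own user.
    kill -0 "$pid" 2>/dev/null || return 3
    return 0
}

if pidfile_status /var/run/repmgrd.pid; then
    echo "repmgrd is running"
else
    echo "repmgrd is stopped"
fi
```

The path `/var/run/repmgrd.pid` matches the `REPMGRD_PIDFILE` value in one side of the diff; adjust it if the sysconfig file overrides that setting.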
@@ -1,21 +1,4 @@
# default settings for repmgrd. This file is source by /bin/sh from
# /etc/init.d/repmgrd
#default sysconfig file for repmrgd
#custom overrides can be placed here

# disable repmgrd by default so it won't get started upon installation
# valid values: yes/no
REPMGRD_ENABLED=no

# Options for repmgrd (required)
#REPMGRD_OPTS="--verbose -d -f /var/lib/pgsql/repmgr/repmgr.conf"

# User to run repmgrd as
#REPMGRD_USER=postgres

# repmgrd binary
#REPMGRD_BIN=/usr/bin/repmgrd

# pid file
#REPMGRD_PIDFILE=/var/lib/pgsql/repmgr/repmgrd.pid

# log file
#REPMGRD_LOG=/var/lib/pgsql/repmgr/repmgrd.log
REPMGRD_OPTS="-f /etc/repmgr/repmgr.conf"
49
SSH-RSYNC.md
@@ -1,36 +1,35 @@
Set up trusted copy between postgres accounts
---------------------------------------------

If you need to use `rsync` to clone standby servers, the `postgres` account
on your primary and standby servers must be each able to access the other
If you need to use rsync to clone standby servers, the postgres account
on your master and standby servers must be each able to access the other
using SSH without a password.

First generate an ssh key, using an empty passphrase, and copy the resulting
keys and a matching authorization file to a privileged user account on the other
system:
First generate a ssh key, using an empty passphrase, and copy the resulting
keys and a maching authorization file to a privledged user on the other system::

    [postgres@node1]$ ssh-keygen -t rsa
    Generating public/private rsa key pair.
    Enter file in which to save the key (/var/lib/pgsql/.ssh/id_rsa):
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in /var/lib/pgsql/.ssh/id_rsa.
    Your public key has been saved in /var/lib/pgsql/.ssh/id_rsa.pub.
    The key fingerprint is:
    aa:bb:cc:dd:ee:ff:aa:11:22:33:44:55:66:77:88:99 postgres@db1.domain.com
    [postgres@node1]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    [postgres@node1]$ chmod go-rwx ~/.ssh/*
    [postgres@node1]$ cd ~/.ssh
    [postgres@node1]$ scp id_rsa.pub id_rsa authorized_keys user@node2:
    [postgres@node1]$ ssh-keygen -t rsa
    Generating public/private rsa key pair.
    Enter file in which to save the key (/var/lib/pgsql/.ssh/id_rsa):
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in /var/lib/pgsql/.ssh/id_rsa.
    Your public key has been saved in /var/lib/pgsql/.ssh/id_rsa.pub.
    The key fingerprint is:
    aa:bb:cc:dd:ee:ff:aa:11:22:33:44:55:66:77:88:99 postgres@db1.domain.com
    [postgres@node1]$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    [postgres@node1]$ chmod go-rwx ~/.ssh/*
    [postgres@node1]$ cd ~/.ssh
    [postgres@node1]$ scp id_rsa.pub id_rsa authorized_keys user@node2:

Login as a user on the other system, and install the files into the `postgres`
user's account:
Login as a user on the other system, and install the files into the postgres
user's account::

    [user@node2 ~]$ sudo chown postgres.postgres authorized_keys id_rsa.pub id_rsa
    [user@node2 ~]$ sudo mkdir -p ~postgres/.ssh
    [user@node2 ~]$ sudo chown postgres.postgres ~postgres/.ssh
    [user@node2 ~]$ sudo mv authorized_keys id_rsa.pub id_rsa ~postgres/.ssh
    [user@node2 ~]$ sudo chmod -R go-rwx ~postgres/.ssh
    [user@node2 ~]$ sudo chown postgres.postgres authorized_keys id_rsa.pub id_rsa
    [user@node2 ~]$ sudo mkdir -p ~postgres/.ssh
    [user@node2 ~]$ sudo chown postgres.postgres ~postgres/.ssh
    [user@node2 ~]$ sudo mv authorized_keys id_rsa.pub id_rsa ~postgres/.ssh
    [user@node2 ~]$ sudo chmod -R go-rwx ~postgres/.ssh

Now test that ssh in both directions works. You may have to accept some new
known hosts in the process.
27
TODO
@@ -5,15 +5,9 @@ Known issues in repmgr
  the database server using the ``pg_ctl`` command may accidentally
  terminate after their associated ssh session ends.

* PGPASSFILE may not be passed to pg_basebackup

Planned feature improvements
============================

* Use 'primary' instead of 'master' in documentation and log output
  for consistency with PostgreSQL documentation. See also commit
  870b0a53b627eeb9aca1fc14cbafe25b5beafe12.

* A better check which standby did receive most of the data

* Make the fact that a standby may be delayed a factor in the voting
@@ -24,21 +18,8 @@ Planned feature improvements
|
||||
* Create the repmgr user/database on "master register".
|
||||
|
||||
* Use pg_basebackup for the data directory, and ALSO rsync for the
|
||||
configuration files.
|
||||
configuration files.
|
||||
|
||||
* If no configuration file supplied, search in sensible default locations
|
||||
(currently: current directory and `pg_config --sysconfdir`); if
|
||||
possible this should include the location provided by the package,
|
||||
if installed.
|
||||
|
||||
* repmgrd: if connection to the upstream node fails on startup, optionally
|
||||
retry for a certain period before giving up; this will cover cases when
|
||||
e.g. primary and standby are both starting up, and the standby comes up
|
||||
before the primary. See github issue #80.
|
||||
|
||||
* make old master node ID available for event notification commands
|
||||
(See github issue #80).
|
||||
|
||||
* Have pg_basebackup use replication slots, if and when support for
|
||||
this is added; see:
|
||||
http://www.postgresql.org/message-id/555DD2B2.7020000@gmx.net
|
||||
* Use pg_basebackup -X s
|
||||
NOTE: this can be used by including `-X s` in the configuration parameter
|
||||
`pg_basebackup_options`
|
||||
53
check_dir.c
@@ -23,19 +23,14 @@
|
||||
#include <errno.h>
|
||||
#include <stdio.h>
|
||||
#include <string.h>
|
||||
#include <ftw.h>
|
||||
|
||||
/* NB: postgres_fe must be included BEFORE check_dir */
|
||||
#include <libpq-fe.h>
|
||||
#include <postgres_fe.h>
|
||||
|
||||
#include "postgres_fe.h"
|
||||
#include "check_dir.h"
|
||||
|
||||
#include "strutil.h"
|
||||
#include "log.h"
|
||||
|
||||
static bool _create_pg_dir(char *dir, bool force, bool for_witness);
|
||||
static int unlink_dir_callback(const char *fpath, const struct stat *sb, int typeflag, struct FTW *ftwbuf);
|
||||
|
||||
/*
|
||||
* make sure the directory either doesn't exist or is empty
|
||||
* we use this function to check the new data directory and
|
||||
@@ -248,19 +243,6 @@ is_pg_dir(char *dir)
|
||||
|
||||
bool
|
||||
create_pg_dir(char *dir, bool force)
|
||||
{
|
||||
return _create_pg_dir(dir, force, false);
|
||||
}
|
||||
|
||||
bool
|
||||
create_witness_pg_dir(char *dir, bool force)
|
||||
{
|
||||
return _create_pg_dir(dir, force, true);
|
||||
}
|
||||
|
||||
|
||||
static bool
|
||||
_create_pg_dir(char *dir, bool force, bool for_witness)
|
||||
{
|
||||
bool pg_dir = false;
|
||||
|
||||
@@ -297,24 +279,12 @@ _create_pg_dir(char *dir, bool force, bool for_witness)
|
||||
|
||||
pg_dir = is_pg_dir(dir);
|
||||
|
||||
|
||||
/*
|
||||
* we use force to reduce the time needed to restore a node which
|
||||
* turns async after a failover or for any other reason
|
||||
*/
|
||||
if (pg_dir && force)
|
||||
{
|
||||
|
||||
/*
|
||||
* The witness server does not store any data other than a copy of the
|
||||
* repmgr metadata, so in --force mode we can simply overwrite the
|
||||
* directory.
|
||||
*
|
||||
* For non-witness servers, we'll leave the data in place, both to reduce
|
||||
* the risk of unintentional data loss and to make it possible for the
|
||||
* data directory to be brought up-to-date with rsync.
|
||||
*/
|
||||
if (for_witness)
|
||||
{
|
||||
log_notice(_("deleting existing data directory \"%s\"\n"), dir);
|
||||
nftw(dir, unlink_dir_callback, 64, FTW_DEPTH | FTW_PHYS);
|
||||
}
|
||||
/* Let it continue */
|
||||
break;
|
||||
}
|
||||
@@ -336,14 +306,3 @@ _create_pg_dir(char *dir, bool force, bool for_witness)
|
||||
}
|
||||
return true;
|
||||
}
|
||||
|
||||
static int
|
||||
unlink_dir_callback(const char *fpath, const struct stat *sb, int typeflag, struct FTW *ftwbuf)
|
||||
{
|
||||
int rv = remove(fpath);
|
||||
|
||||
if (rv)
|
||||
perror(fpath);
|
||||
|
||||
return rv;
|
||||
}
|
||||
|
||||
@@ -26,6 +26,5 @@ bool create_dir(char *dir);
|
||||
bool set_dir_permissions(char *dir);
|
||||
bool is_pg_dir(char *dir);
|
||||
bool create_pg_dir(char *dir, bool force);
|
||||
bool create_witness_pg_dir(char *dir, bool force);
|
||||
|
||||
#endif
|
||||
|
||||
201
config.c
@@ -27,11 +27,9 @@
|
||||
static void parse_event_notifications_list(t_configuration_options *options, const char *arg);
|
||||
static void tablespace_list_append(t_configuration_options *options, const char *arg);
|
||||
|
||||
static char config_file_path[MAXPGPATH];
|
||||
static bool config_file_provided = false;
|
||||
|
||||
/*
|
||||
* load_config()
|
||||
* parse_config()
|
||||
*
|
||||
* Set default options and overwrite with values from provided configuration
|
||||
* file.
|
||||
@@ -42,21 +40,30 @@ static bool config_file_provided = false;
|
||||
* reload_config()
|
||||
*/
|
||||
bool
|
||||
load_config(const char *config_file, t_configuration_options *options, char *argv0)
|
||||
parse_config(const char *config_file, t_configuration_options *options)
|
||||
{
|
||||
struct stat config;
|
||||
char *s,
|
||||
buff[MAXLINELENGTH];
|
||||
char config_file_buf[MAXLEN];
|
||||
char name[MAXLEN];
|
||||
char value[MAXLEN];
|
||||
bool config_file_provided = false;
|
||||
FILE *fp;
|
||||
|
||||
/* Sanity checks */
|
||||
|
||||
/*
|
||||
* If a configuration file was provided, check it exists, otherwise
|
||||
* emit an error and terminate
|
||||
* emit an error
|
||||
*/
|
||||
if (config_file[0])
|
||||
{
|
||||
strncpy(config_file_path, config_file, MAXPGPATH);
|
||||
canonicalize_path(config_file_path);
|
||||
struct stat config;
|
||||
|
||||
if (stat(config_file_path, &config) != 0)
|
||||
strncpy(config_file_buf, config_file, MAXLEN);
|
||||
canonicalize_path(config_file_buf);
|
||||
|
||||
if(stat(config_file_buf, &config) != 0)
|
||||
{
|
||||
log_err(_("provided configuration file '%s' not found: %s\n"),
|
||||
config_file,
|
||||
@@ -69,53 +76,16 @@ load_config(const char *config_file, t_configuration_options *options, char *arg
|
||||
}
|
||||
|
||||
/*
|
||||
* If no configuration file was provided, attempt to find a default file
|
||||
* If no configuration file was provided, set to a default file
|
||||
* which `parse_config()` will attempt to read if it exists
|
||||
*/
|
||||
if (config_file_provided == false)
|
||||
else
|
||||
{
|
||||
char my_exec_path[MAXPGPATH];
|
||||
char etc_path[MAXPGPATH];
|
||||
|
||||
/* First check if one is in the default sysconfdir */
|
||||
if (find_my_exec(argv0, my_exec_path) < 0)
|
||||
{
|
||||
fprintf(stderr, _("%s: could not find own program executable\n"), argv0);
|
||||
exit(EXIT_FAILURE);
|
||||
}
|
||||
|
||||
get_etc_path(my_exec_path, etc_path);
|
||||
|
||||
snprintf(config_file_path, MAXPGPATH, "%s/repmgr.conf", etc_path);
|
||||
|
||||
log_debug(_("Looking for configuration file in %s\n"), etc_path);
|
||||
|
||||
if (stat(config_file_path, &config) != 0)
|
||||
{
|
||||
/* Not found - default to ./repmgr.conf */
|
||||
strncpy(config_file_path, DEFAULT_CONFIG_FILE, MAXPGPATH);
|
||||
canonicalize_path(config_file_path);
|
||||
log_debug(_("Looking for configuration file in %s\n"), config_file_path);
|
||||
}
|
||||
strncpy(config_file_buf, DEFAULT_CONFIG_FILE, MAXLEN);
|
||||
}
|
||||
|
||||
return parse_config(options);
|
||||
}
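
When no file is given on the command line, the removed `load_config()` tries `repmgr.conf` under the sysconfdir and then falls back to `./repmgr.conf`. That amounts to "use the first candidate that `stat()` can see". The sketch below shows only that idea; `first_existing_path` and the candidate paths mentioned afterwards are hypothetical names for illustration, not part of repmgr.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Return the first entry of `candidates` that stat() can see, or NULL if
 * none of them exists. Hypothetical helper, not a repmgr function.
 */
const char *
first_existing_path(const char *const candidates[], size_t n)
{
	struct stat sb;
	size_t		i;

	assert(candidates != NULL || n == 0);

	for (i = 0; i < n; i++)
	{
		/* stat() returns 0 when the path exists and is accessible */
		if (candidates[i] != NULL && stat(candidates[i], &sb) == 0)
			return candidates[i];
	}

	return NULL;
}
```

A caller would pass the candidates in priority order, for example a sysconfdir path followed by `./repmgr.conf`, and treat a NULL result as "no configuration file available".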
|
||||
|
||||
|
||||
bool
|
||||
parse_config(t_configuration_options *options)
|
||||
{
|
||||
FILE *fp;
|
||||
char *s,
|
||||
buff[MAXLINELENGTH];
|
||||
char name[MAXLEN];
|
||||
char value[MAXLEN];
|
||||
|
||||
/* For sanity-checking provided conninfo string */
|
||||
PQconninfoOption *conninfo_options;
|
||||
char *conninfo_errmsg = NULL;
|
||||
|
||||
fp = fopen(config_file_path, "r");
|
||||
fp = fopen(config_file_buf, "r");
|
||||
|
||||
/*
|
||||
* Since some commands don't require a config file at all, not having one
|
||||
@@ -129,9 +99,9 @@ parse_config(t_configuration_options *options)
|
||||
*/
|
||||
if (fp == NULL)
|
||||
{
|
||||
if (config_file_provided)
|
||||
if(config_file_provided)
|
||||
{
|
||||
log_err(_("unable to open provided configuration file '%s'; terminating\n"), config_file_path);
|
||||
log_err(_("unable to open provided configuration file '%s'; terminating\n"), config_file_buf);
|
||||
exit(ERR_BAD_CONFIG);
|
||||
}
|
||||
|
||||
@@ -179,17 +149,13 @@ parse_config(t_configuration_options *options)
|
||||
{
|
||||
bool known_parameter = true;
|
||||
|
||||
/* Skip blank lines and comments */
|
||||
if (buff[0] == '\n' || buff[0] == '#')
|
||||
continue;
|
||||
|
||||
/* Parse name/value pair from line */
|
||||
parse_line(buff, name, value);
|
||||
|
||||
/* Skip blank lines */
|
||||
if (!strlen(name))
|
||||
continue;
|
||||
|
||||
/* Skip comments */
|
||||
if (name[0] == '#')
|
||||
continue;
|
||||
|
||||
/* Copy into correct entry in parameters struct */
|
||||
if (strcmp(name, "cluster") == 0)
|
||||
strncpy(options->cluster_name, value, MAXLEN);
|
||||
@@ -273,7 +239,7 @@ parse_config(t_configuration_options *options)
|
||||
* we want to accept those, we'd need to add stricter default checking,
|
||||
* as currently e.g. an empty `node` value will be converted to '0'.
|
||||
*/
|
||||
if (known_parameter == true && !strlen(value)) {
|
||||
if(known_parameter == true && !strlen(value)) {
|
||||
log_err(_("no value provided for parameter '%s'\n"), name);
|
||||
exit(ERR_BAD_CONFIG);
|
||||
}
|
||||
@@ -296,12 +262,6 @@ parse_config(t_configuration_options *options)
|
||||
exit(ERR_BAD_CONFIG);
|
||||
}
|
||||
|
||||
if (options->node == 0)
|
||||
{
|
||||
log_err(_("'node' must be an integer greater than zero\n"));
|
||||
exit(ERR_BAD_CONFIG);
|
||||
}
|
||||
|
||||
if (*options->node_name == '\0')
|
||||
{
|
||||
log_err(_("required parameter 'node_name' was not found\n"));
|
||||
@@ -314,19 +274,6 @@ parse_config(t_configuration_options *options)
|
||||
exit(ERR_BAD_CONFIG);
|
||||
}
|
||||
|
||||
/* Sanity check the provided conninfo string
|
||||
*
|
||||
* NOTE: this verifies the string format and checks for valid options
|
||||
* but does not sanity check values
|
||||
*/
|
||||
conninfo_options = PQconninfoParse(options->conninfo, &conninfo_errmsg);
|
||||
if (conninfo_options == NULL)
|
||||
{
|
||||
log_err(_("Parameter 'conninfo' is invalid: %s"), conninfo_errmsg);
|
||||
exit(ERR_BAD_CONFIG);
|
||||
}
|
||||
PQconninfoFree(conninfo_options);
|
||||
|
||||
/* The following checks are for valid parameter values */
|
||||
if (options->master_response_timeout <= 0)
|
||||
{
|
||||
@@ -358,7 +305,7 @@ trim(char *s)
|
||||
*s2 = &s[strlen(s) - 1];
|
||||
|
||||
/* If string is empty, no action needed */
|
||||
if (s2 < s1)
|
||||
if(s2 < s1)
|
||||
return s;
|
||||
|
||||
/* Trim and delimit right side */
|
||||
@@ -384,50 +331,24 @@ parse_line(char *buff, char *name, char *value)
|
||||
int j = 0;
|
||||
|
||||
/*
|
||||
* Extract parameter name, if present
|
||||
* first we find the name of the parameter
|
||||
*/
|
||||
for (; i < MAXLEN; ++i)
|
||||
{
|
||||
|
||||
if (buff[i] == '=')
|
||||
if (buff[i] != '=')
|
||||
name[j++] = buff[i];
|
||||
else
|
||||
break;
|
||||
|
||||
switch(buff[i])
|
||||
{
|
||||
/* Ignore whitespace */
|
||||
case ' ':
|
||||
case '\n':
|
||||
case '\r':
|
||||
case '\t':
|
||||
continue;
|
||||
default:
|
||||
name[j++] = buff[i];
|
||||
}
|
||||
}
|
||||
name[j] = '\0';
|
||||
|
||||
/*
|
||||
* Ignore any whitespace following the '=' sign
|
||||
*/
|
||||
for (; i < MAXLEN; ++i)
|
||||
{
|
||||
if (buff[i+1] == ' ')
|
||||
continue;
|
||||
if (buff[i+1] == '\t')
|
||||
continue;
|
||||
|
||||
break;
|
||||
}
|
||||
|
||||
/*
|
||||
* Extract parameter value
|
||||
* Now the value
|
||||
*/
|
||||
j = 0;
|
||||
for (++i; i < MAXLEN; ++i)
|
||||
if (buff[i] == '\'')
|
||||
continue;
|
||||
else if (buff[i] == '#')
|
||||
break;
|
||||
else if (buff[i] != '\n')
|
||||
value[j++] = buff[i];
|
||||
else
|
||||
@@ -437,7 +358,7 @@ parse_line(char *buff, char *name, char *value)
|
||||
}
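
The parsing rules visible in this hunk are: drop whitespace from the name, skip blanks after `=`, strip single quotes, and stop the value at `#`. They can be condensed into one bounded function. This is a minimal sketch under those assumptions; `parse_kv_line` is a hypothetical name, and the trailing-blank trimming at the end is an addition here, standing in for a separate trim step.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Split "name = value" into two caller-supplied buffers of `len` bytes each.
 * Hypothetical helper illustrating the rules above; not repmgr's parse_line().
 */
void
parse_kv_line(const char *buff, char *name, char *value, size_t len)
{
	size_t		i = 0;
	size_t		j = 0;

	assert(buff != NULL && name != NULL && value != NULL && len > 0);

	/* Name: everything before '=', with whitespace dropped */
	for (; buff[i] != '\0' && buff[i] != '='; i++)
	{
		if (strchr(" \t\r\n", buff[i]) == NULL && j < len - 1)
			name[j++] = buff[i];
	}
	name[j] = '\0';

	/* Step over '=' and any blanks that follow it */
	if (buff[i] == '=')
		i++;
	while (buff[i] == ' ' || buff[i] == '\t')
		i++;

	/* Value: stop at newline or '#', and drop single quotes */
	for (j = 0; buff[i] != '\0' && buff[i] != '\n' && buff[i] != '#'; i++)
	{
		if (buff[i] != '\'' && j < len - 1)
			value[j++] = buff[i];
	}

	/* Trim trailing blanks left in front of a comment or line end */
	while (j > 0 && strchr(" \t\r", value[j - 1]) != NULL)
		j--;
	value[j] = '\0';
}
```

With these rules a blank line yields an empty name, and a comment line yields a name starting with `#`, which is what the "skip blank lines" and "skip comments" checks in `parse_config()` rely on.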
|
||||
|
||||
bool
|
||||
reload_config(t_configuration_options *orig_options)
|
||||
reload_config(char *config_file, t_configuration_options * orig_options)
|
||||
{
|
||||
PGconn *conn;
|
||||
t_configuration_options new_options;
|
||||
@@ -448,7 +369,7 @@ reload_config(t_configuration_options *orig_options)
|
||||
*/
|
||||
log_info(_("reloading configuration file and updating repmgr tables\n"));
|
||||
|
||||
parse_config(&new_options);
|
||||
parse_config(config_file, &new_options);
|
||||
if (new_options.node == -1)
|
||||
{
|
||||
log_warning(_("unable to parse new configuration, retaining current configuration\n"));
|
||||
@@ -497,7 +418,7 @@ reload_config(t_configuration_options *orig_options)
|
||||
return false;
|
||||
}
|
||||
|
||||
if (strcmp(orig_options->conninfo, new_options.conninfo) != 0)
|
||||
if(strcmp(orig_options->conninfo, new_options.conninfo) != 0)
|
||||
{
|
||||
/* Test conninfo string */
|
||||
conn = establish_db_connection(new_options.conninfo, false);
|
||||
@@ -517,56 +438,56 @@ reload_config(t_configuration_options *orig_options)
|
||||
*/
|
||||
|
||||
/* cluster_name */
|
||||
if (strcmp(orig_options->cluster_name, new_options.cluster_name) != 0)
|
||||
if(strcmp(orig_options->cluster_name, new_options.cluster_name) != 0)
|
||||
{
|
||||
strcpy(orig_options->cluster_name, new_options.cluster_name);
|
||||
config_changed = true;
|
||||
}
|
||||
|
||||
/* conninfo */
|
||||
if (strcmp(orig_options->conninfo, new_options.conninfo) != 0)
|
||||
if(strcmp(orig_options->conninfo, new_options.conninfo) != 0)
|
||||
{
|
||||
strcpy(orig_options->conninfo, new_options.conninfo);
|
||||
config_changed = true;
|
||||
}
|
||||
|
||||
/* node */
|
||||
if (orig_options->node != new_options.node)
|
||||
if(orig_options->node != new_options.node)
|
||||
{
|
||||
orig_options->node = new_options.node;
|
||||
config_changed = true;
|
||||
}
|
||||
|
||||
/* failover */
|
||||
if (orig_options->failover != new_options.failover)
|
||||
if(orig_options->failover != new_options.failover)
|
||||
{
|
||||
orig_options->failover = new_options.failover;
|
||||
config_changed = true;
|
||||
}
|
||||
|
||||
/* priority */
|
||||
if (orig_options->priority != new_options.priority)
|
||||
if(orig_options->priority != new_options.priority)
|
||||
{
|
||||
orig_options->priority = new_options.priority;
|
||||
config_changed = true;
|
||||
}
|
||||
|
||||
/* node_name */
|
||||
if (strcmp(orig_options->node_name, new_options.node_name) != 0)
|
||||
if(strcmp(orig_options->node_name, new_options.node_name) != 0)
|
||||
{
|
||||
strcpy(orig_options->node_name, new_options.node_name);
|
||||
config_changed = true;
|
||||
}
|
||||
|
||||
/* promote_command */
|
||||
if (strcmp(orig_options->promote_command, new_options.promote_command) != 0)
|
||||
if(strcmp(orig_options->promote_command, new_options.promote_command) != 0)
|
||||
{
|
||||
strcpy(orig_options->promote_command, new_options.promote_command);
|
||||
config_changed = true;
|
||||
}
|
||||
|
||||
/* follow_command */
|
||||
if (strcmp(orig_options->follow_command, new_options.follow_command) != 0)
|
||||
if(strcmp(orig_options->follow_command, new_options.follow_command) != 0)
|
||||
{
|
||||
strcpy(orig_options->follow_command, new_options.follow_command);
|
||||
config_changed = true;
|
||||
@@ -583,76 +504,76 @@ reload_config(t_configuration_options *orig_options)
|
||||
*/
|
||||
|
||||
/* rsync_options */
|
||||
if (strcmp(orig_options->rsync_options, new_options.rsync_options) != 0)
|
||||
if(strcmp(orig_options->rsync_options, new_options.rsync_options) != 0)
|
||||
{
|
||||
strcpy(orig_options->rsync_options, new_options.rsync_options);
|
||||
config_changed = true;
|
||||
}
|
||||
|
||||
/* ssh_options */
|
||||
if (strcmp(orig_options->ssh_options, new_options.ssh_options) != 0)
|
||||
if(strcmp(orig_options->ssh_options, new_options.ssh_options) != 0)
|
||||
{
|
||||
strcpy(orig_options->ssh_options, new_options.ssh_options);
|
||||
config_changed = true;
|
||||
}
|
||||
|
||||
/* master_response_timeout */
|
||||
if (orig_options->master_response_timeout != new_options.master_response_timeout)
|
||||
if(orig_options->master_response_timeout != new_options.master_response_timeout)
|
||||
{
|
||||
orig_options->master_response_timeout = new_options.master_response_timeout;
|
||||
config_changed = true;
|
||||
}
|
||||
|
||||
/* reconnect_attempts */
|
||||
if (orig_options->reconnect_attempts != new_options.reconnect_attempts)
|
||||
if(orig_options->reconnect_attempts != new_options.reconnect_attempts)
|
||||
{
|
||||
orig_options->reconnect_attempts = new_options.reconnect_attempts;
|
||||
config_changed = true;
|
||||
}
|
||||
|
||||
/* reconnect_intvl */
|
||||
if (orig_options->reconnect_intvl != new_options.reconnect_intvl)
|
||||
if(orig_options->reconnect_intvl != new_options.reconnect_intvl)
|
||||
{
|
||||
orig_options->reconnect_intvl = new_options.reconnect_intvl;
|
||||
config_changed = true;
|
||||
}
|
||||
|
||||
/* pg_ctl_options */
|
||||
if (strcmp(orig_options->pg_ctl_options, new_options.pg_ctl_options) != 0)
|
||||
if(strcmp(orig_options->pg_ctl_options, new_options.pg_ctl_options) != 0)
|
||||
{
|
||||
strcpy(orig_options->pg_ctl_options, new_options.pg_ctl_options);
|
||||
config_changed = true;
|
||||
}
|
||||
|
||||
/* pg_basebackup_options */
|
||||
if (strcmp(orig_options->pg_basebackup_options, new_options.pg_basebackup_options) != 0)
|
||||
if(strcmp(orig_options->pg_basebackup_options, new_options.pg_basebackup_options) != 0)
|
||||
{
|
||||
strcpy(orig_options->pg_basebackup_options, new_options.pg_basebackup_options);
|
||||
config_changed = true;
|
||||
}
|
||||
|
||||
/* monitor_interval_secs */
|
||||
if (orig_options->monitor_interval_secs != new_options.monitor_interval_secs)
|
||||
if(orig_options->monitor_interval_secs != new_options.monitor_interval_secs)
|
||||
{
|
||||
orig_options->monitor_interval_secs = new_options.monitor_interval_secs;
|
||||
config_changed = true;
|
||||
}
|
||||
|
||||
/* retry_promote_interval_secs */
|
||||
if (orig_options->retry_promote_interval_secs != new_options.retry_promote_interval_secs)
|
||||
if(orig_options->retry_promote_interval_secs != new_options.retry_promote_interval_secs)
|
||||
{
|
||||
orig_options->retry_promote_interval_secs = new_options.retry_promote_interval_secs;
|
||||
config_changed = true;
|
||||
}
|
||||
|
||||
/* use_replication_slots */
|
||||
if (orig_options->use_replication_slots != new_options.use_replication_slots)
|
||||
if(orig_options->use_replication_slots != new_options.use_replication_slots)
|
||||
{
|
||||
orig_options->use_replication_slots = new_options.use_replication_slots;
|
||||
config_changed = true;
|
||||
}
|
||||
|
||||
if (config_changed == true)
|
||||
if(config_changed == true)
|
||||
{
|
||||
log_debug(_("reload_config(): configuration has changed\n"));
|
||||
}
|
||||
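
Each block in `reload_config()` follows the same compare-then-copy pattern: a field is overwritten only if it differs, and a single flag records whether anything changed. A reduced sketch of that pattern is shown below; `demo_options` is a hypothetical two-field stand-in for `t_configuration_options`, and `apply_changes` is not a repmgr function.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define OPT_LEN 64

/* Hypothetical stand-in for t_configuration_options */
typedef struct
{
	char		cluster_name[OPT_LEN];
	int			priority;
} demo_options;

/* Copy changed fields from new_opts into orig; report whether any differed */
bool
apply_changes(demo_options *orig, const demo_options *new_opts)
{
	bool		changed = false;

	assert(orig != NULL && new_opts != NULL);

	/* String fields: compare with strcmp(); both buffers have the same size */
	if (strcmp(orig->cluster_name, new_opts->cluster_name) != 0)
	{
		strcpy(orig->cluster_name, new_opts->cluster_name);
		changed = true;
	}

	/* Scalar fields: plain comparison */
	if (orig->priority != new_opts->priority)
	{
		orig->priority = new_opts->priority;
		changed = true;
	}

	return changed;
}
```

Calling it a second time with the same arguments returns false, which is the property the `config_changed` flag is used for.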
@@ -681,7 +602,7 @@ tablespace_list_append(t_configuration_options *options, const char *arg)
|
||||
const char *arg_ptr;
|
||||
|
||||
cell = (TablespaceListCell *) pg_malloc0(sizeof(TablespaceListCell));
|
||||
if (cell == NULL)
|
||||
if(cell == NULL)
|
||||
{
|
||||
log_err(_("unable to allocate memory; terminating\n"));
|
||||
exit(ERR_BAD_CONFIG);
|
||||
@@ -749,7 +670,7 @@ parse_event_notifications_list(t_configuration_options *options, const char *arg
|
||||
for (arg_ptr = arg; arg_ptr <= (arg + strlen(arg)); arg_ptr++)
|
||||
{
|
||||
/* ignore whitespace */
|
||||
if (*arg_ptr == ' ' || *arg_ptr == '\t')
|
||||
if(*arg_ptr == ' ' || *arg_ptr == '\t')
|
||||
{
|
||||
continue;
|
||||
}
|
||||
@@ -758,13 +679,13 @@ parse_event_notifications_list(t_configuration_options *options, const char *arg
|
||||
* comma (or end-of-string) should mark the end of an event type -
|
||||
* just as long as there was something preceding it
|
||||
*/
|
||||
if ((*arg_ptr == ',' || *arg_ptr == '\0') && event_type_buf[0] != '\0')
|
||||
if((*arg_ptr == ',' || *arg_ptr == '\0') && event_type_buf[0] != '\0')
|
||||
{
|
||||
EventNotificationListCell *cell;
|
||||
|
||||
cell = (EventNotificationListCell *) pg_malloc0(sizeof(EventNotificationListCell));
|
||||
|
||||
if (cell == NULL)
|
||||
if(cell == NULL)
|
||||
{
|
||||
log_err(_("unable to allocate memory; terminating\n"));
|
||||
exit(ERR_BAD_CONFIG);
|
||||
@@ -787,7 +708,7 @@ parse_event_notifications_list(t_configuration_options *options, const char *arg
|
||||
dst_ptr = event_type_buf;
|
||||
}
|
||||
/* ignore duplicated commas */
|
||||
else if (*arg_ptr == ',')
|
||||
else if(*arg_ptr == ',')
|
||||
{
|
||||
continue;
|
||||
}
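
The loop in `parse_event_notifications_list()` walks the string up to and including its terminating `'\0'`, so the last item is flushed the same way as one followed by a comma. The sketch below reproduces that scan with a fixed-size output array instead of a linked list; `split_event_list` and `EVENT_LEN` are hypothetical names, and the event names in the usage note are only sample input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define EVENT_LEN 64

/*
 * Split a comma-separated list into `out`, ignoring blanks and empty items.
 * Returns the number of entries stored (at most `max`).
 */
size_t
split_event_list(const char *arg, char out[][EVENT_LEN], size_t max)
{
	char		buf[EVENT_LEN] = "";
	size_t		len = 0;
	size_t		n = 0;
	const char *p;

	assert(arg != NULL);

	/* "<=" so that the terminating '\0' is visited and ends the last item */
	for (p = arg; p <= arg + strlen(arg); p++)
	{
		/* ignore whitespace */
		if (*p == ' ' || *p == '\t')
			continue;

		if (*p == ',' || *p == '\0')
		{
			/* a delimiter only counts if something preceded it */
			if (len > 0 && n < max)
			{
				buf[len] = '\0';
				strcpy(out[n++], buf);
			}
			len = 0;
		}
		else if (len < EVENT_LEN - 1)
			buf[len++] = *p;
	}

	return n;
}
```

Given the input `"master_register, standby_register,,standby_promote"`, the function stores three entries; the doubled comma produces no empty item.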
|
||||
|
||||
5
config.h
@@ -83,10 +83,9 @@ typedef struct
|
||||
#define T_CONFIGURATION_OPTIONS_INITIALIZER { "", -1, NO_UPSTREAM_NODE, "", MANUAL_FAILOVER, -1, "", "", "", "", "", "", "", -1, -1, -1, "", "", "", "", 0, 0, 0, "", { NULL, NULL }, {NULL, NULL} }
|
||||
|
||||
|
||||
bool load_config(const char *config_file, t_configuration_options *options, char *argv0);
|
||||
bool reload_config(t_configuration_options *orig_options);
|
||||
bool parse_config(t_configuration_options *options);
|
||||
bool parse_config(const char *config_file, t_configuration_options *options);
|
||||
void parse_line(char *buff, char *name, char *value);
|
||||
char *trim(char *s);
|
||||
bool reload_config(char *config_file, t_configuration_options *orig_options);
|
||||
|
||||
#endif
|
||||
|
||||
217
dbutils.c
@@ -82,72 +82,6 @@ establish_db_connection_by_params(const char *keywords[], const char *values[],
|
||||
}
|
||||
|
||||
|
||||
bool
|
||||
begin_transaction(PGconn *conn)
|
||||
{
|
||||
PGresult *res;
|
||||
|
||||
res = PQexec(conn, "BEGIN");
|
||||
|
||||
if (PQresultStatus(res) != PGRES_COMMAND_OK)
|
||||
{
|
||||
log_err(_("Unable to begin transaction: %s\n"),
|
||||
PQerrorMessage(conn));
|
||||
|
||||
PQclear(res);
|
||||
return false;
|
||||
}
|
||||
|
||||
PQclear(res);
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
|
||||
bool
|
||||
commit_transaction(PGconn *conn)
|
||||
{
|
||||
PGresult *res;
|
||||
|
||||
res = PQexec(conn, "COMMIT");
|
||||
|
||||
if (PQresultStatus(res) != PGRES_COMMAND_OK)
|
||||
{
|
||||
log_err(_("Unable to commit transaction: %s\n"),
|
||||
PQerrorMessage(conn));
|
||||
PQclear(res);
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
PQclear(res);
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
|
||||
bool
|
||||
rollback_transaction(PGconn *conn)
|
||||
{
|
||||
PGresult *res;
|
||||
|
||||
res = PQexec(conn, "ROLLBACK");
|
||||
|
||||
if (PQresultStatus(res) != PGRES_COMMAND_OK)
|
||||
{
|
||||
log_err(_("Unable to rollback transaction: %s\n"),
|
||||
PQerrorMessage(conn));
|
||||
PQclear(res);
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
PQclear(res);
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
|
||||
bool
|
||||
check_cluster_schema(PGconn *conn)
|
||||
{
|
||||
@@ -263,7 +197,7 @@ is_pgup(PGconn *conn, int timeout)
|
||||
|
||||
|
||||
/*
|
||||
* Return the id of the active master node, or NODE_NOT_FOUND if no
|
||||
* Return the id of the active master node, or -1 if no
|
||||
* record available.
|
||||
*
|
||||
* This reports the value stored in the database only and
|
||||
@@ -290,12 +224,12 @@ get_master_node_id(PGconn *conn, char *cluster)
|
||||
{
|
||||
log_err(_("get_master_node_id(): query failed\n%s\n"),
|
||||
PQerrorMessage(conn));
|
||||
retval = NODE_NOT_FOUND;
|
||||
retval = -1;
|
||||
}
|
||||
else if (PQntuples(res) == 0)
|
||||
{
|
||||
log_warning(_("get_master_node_id(): no active primary found\n"));
|
||||
retval = NODE_NOT_FOUND;
|
||||
retval = -1;
|
||||
}
|
||||
else
|
||||
{
|
||||
@@ -326,7 +260,7 @@ get_server_version(PGconn *conn, char *server_version)
|
||||
return -1;
|
||||
}
|
||||
|
||||
if (server_version != NULL)
|
||||
if(server_version != NULL)
|
||||
strcpy(server_version, PQgetvalue(res, 0, 0));
|
||||
|
||||
return atoi(PQgetvalue(res, 0, 0));
|
||||
@@ -465,7 +399,7 @@ get_pg_setting(PGconn *conn, const char *setting, char *output)
|
||||
}
|
||||
}
|
||||
|
||||
if (success == true)
|
||||
if(success == true)
|
||||
{
|
||||
log_debug(_("get_pg_setting(): returned value is '%s'\n"), output);
|
||||
}
|
||||
@@ -524,7 +458,7 @@ get_upstream_connection(PGconn *standby_conn, char *cluster, int node_id,
|
||||
return NULL;
|
||||
}
|
||||
|
||||
if (!PQntuples(res))
|
||||
if(!PQntuples(res))
|
||||
{
|
||||
log_notice(_("no record found for upstream server"));
|
||||
PQclear(res);
|
||||
@@ -533,7 +467,7 @@ get_upstream_connection(PGconn *standby_conn, char *cluster, int node_id,
|
||||
|
||||
strncpy(upstream_conninfo, PQgetvalue(res, 0, 0), MAXCONNINFO);
|
||||
|
||||
if (upstream_node_id_ptr != NULL)
|
||||
if(upstream_node_id_ptr != NULL)
|
||||
*upstream_node_id_ptr = atoi(PQgetvalue(res, 0, 1));
|
||||
|
||||
PQclear(res);
|
||||
@@ -575,9 +509,9 @@ get_master_connection(PGconn *standby_conn, char *cluster,
|
||||
int i,
|
||||
node_id;
|
||||
|
||||
if (master_id != NULL)
|
||||
if(master_id != NULL)
|
||||
{
|
||||
*master_id = NODE_NOT_FOUND;
|
||||
*master_id = -1;
|
||||
}
|
||||
|
||||
/* find all nodes belonging to this cluster */
|
||||
@@ -636,7 +570,7 @@ get_master_connection(PGconn *standby_conn, char *cluster,
|
||||
PQclear(res1);
|
||||
log_debug(_("get_master_connection(): current master node is %i\n"), node_id);
|
||||
|
||||
if (master_id != NULL)
|
||||
if(master_id != NULL)
|
||||
{
|
||||
*master_id = node_id;
|
||||
}
|
||||
@@ -775,7 +709,7 @@ get_repmgr_schema(void)
|
||||
char *
|
||||
get_repmgr_schema_quoted(PGconn *conn)
|
||||
{
|
||||
if (strcmp(repmgr_schema_quoted, "") == 0)
|
||||
if(strcmp(repmgr_schema_quoted, "") == 0)
|
||||
{
|
||||
char *identifier = PQescapeIdentifier(conn, repmgr_schema,
|
||||
strlen(repmgr_schema));
|
||||
@@ -794,49 +728,6 @@ create_replication_slot(PGconn *conn, char *slot_name)
|
||||
char sqlquery[QUERY_STR_LEN];
|
||||
PGresult *res;
|
||||
|
||||
/*
|
||||
* Check whether slot exists already; if it exists and is active, that
|
||||
* means another active standby is using it, which creates an error situation;
|
||||
* if not we can reuse it as-is
|
||||
*/
|
||||
|
||||
sqlquery_snprintf(sqlquery,
|
||||
"SELECT active, slot_type "
|
||||
" FROM pg_replication_slots "
|
||||
" WHERE slot_name = '%s' ",
|
||||
slot_name);
|
||||
|
||||
res = PQexec(conn, sqlquery);
|
||||
if (!res || PQresultStatus(res) != PGRES_TUPLES_OK)
|
||||
{
|
||||
log_err(_("unable to query pg_replication_slots: %s\n"),
|
||||
PQerrorMessage(conn));
|
||||
PQclear(res);
|
||||
return false;
|
||||
}
|
||||
|
||||
if (PQntuples(res))
|
||||
{
|
||||
if (strcmp(PQgetvalue(res, 0, 1), "physical") != 0)
|
||||
{
|
||||
log_err(_("Slot '%s' exists and is not a physical slot\n"),
|
||||
slot_name);
|
||||
PQclear(res);
return false;
|
||||
}
|
||||
if (strcmp(PQgetvalue(res, 0, 0), "f") == 0)
|
||||
{
|
||||
PQclear(res);
|
||||
log_debug(_("Replication slot '%s' exists but is inactive; reusing\n"),
|
||||
slot_name);
|
||||
|
||||
return true;
|
||||
}
|
||||
PQclear(res);
|
||||
log_err(_("Slot '%s' already exists as an active slot\n"),
|
||||
slot_name);
|
||||
return false;
|
||||
}
|
||||
|
||||
sqlquery_snprintf(sqlquery,
|
||||
"SELECT * FROM pg_create_physical_replication_slot('%s')",
|
||||
slot_name);
|
||||
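
The slot check in this hunk reads as a three-way decision on the row returned from `pg_replication_slots`: create the slot, reuse it, or fail. A sketch of that decision as a pure function, separated from the libpq calls, might look like the following. `slot_action` and `decide_slot_action` are hypothetical names, and treating a non-physical slot as an error is an assumption based on the log message.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef enum
{
	SLOT_CREATE,				/* no such slot: create it */
	SLOT_REUSE,					/* inactive physical slot: reuse as-is */
	SLOT_ERROR					/* wrong type, or in use by another standby */
} slot_action;

/* Hypothetical helper: decide what to do with a slot of the given state */
slot_action
decide_slot_action(bool exists, const char *slot_type, bool active)
{
	if (!exists)
		return SLOT_CREATE;

	assert(slot_type != NULL);

	/* Assumption: a logical slot with this name cannot be used */
	if (strcmp(slot_type, "physical") != 0)
		return SLOT_ERROR;

	return active ? SLOT_ERROR : SLOT_REUSE;
}
```

Keeping the decision separate from the query makes each outcome easy to check without a running server.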
@@ -1039,13 +930,13 @@ create_node_record(PGconn *conn, char *action, int node, char *type, int upstrea
|
||||
char slot_name_buf[MAXLEN];
|
||||
PGresult *res;
|
||||
|
||||
if (upstream_node == NO_UPSTREAM_NODE)
|
||||
if(upstream_node == NO_UPSTREAM_NODE)
|
||||
{
|
||||
/*
|
||||
* No explicit upstream node id provided for standby - attempt to
|
||||
* get primary node id
|
||||
*/
|
||||
if (strcmp(type, "standby") == 0)
|
||||
if(strcmp(type, "standby") == 0)
|
||||
{
|
||||
int primary_node_id = get_master_node_id(conn, cluster_name);
|
||||
maxlen_snprintf(upstream_node_id, "%i", primary_node_id);
|
||||
@@ -1060,7 +951,7 @@ create_node_record(PGconn *conn, char *action, int node, char *type, int upstrea
|
||||
maxlen_snprintf(upstream_node_id, "%i", upstream_node);
|
||||
}
|
||||
|
||||
if (slot_name != NULL && slot_name[0])
|
||||
if(slot_name != NULL && slot_name[0])
|
||||
{
|
||||
maxlen_snprintf(slot_name_buf, "'%s'", slot_name);
|
||||
}
|
||||
@@ -1084,7 +975,7 @@ create_node_record(PGconn *conn, char *action, int node, char *type, int upstrea
|
||||
slot_name_buf,
|
||||
priority);
|
||||
|
||||
if (action != NULL)
|
||||
if(action != NULL)
|
||||
{
|
||||
log_debug(_("%s: %s\n"), action, sqlquery);
|
||||
}
|
||||
@@ -1115,7 +1006,7 @@ delete_node_record(PGconn *conn, int node, char *action)
|
||||
" WHERE id = %d",
|
||||
get_repmgr_schema_quoted(conn),
|
||||
node);
|
||||
if (action != NULL)
|
||||
if(action != NULL)
|
||||
{
|
||||
log_debug(_("%s: %s\n"), action, sqlquery);
|
||||
}
|
||||
@@ -1146,8 +1037,8 @@ delete_node_record(PGconn *conn, int node, char *action)
|
||||
*
|
||||
* Note this function may be called with `conn` set to NULL in cases where
|
||||
* the master node is not available and it's therefore not possible to write
|
||||
* an event record. In this case, if `event_notification_command` is set, a
* user-defined notification will be generated; if not, this function will have
|
||||
* no effect.
|
||||
*/
|
||||
|
||||
@@ -1160,12 +1051,7 @@ create_event_record(PGconn *conn, t_configuration_options *options, int node_id,
|
||||
bool success = true;
|
||||
struct tm ts;
|
||||
|
||||
/* Only attempt to write a record if a connection handle was provided.
|
||||
Also check that the repmgr schema has been properly initialised - if
|
||||
not it means no configuration file was provided, which can happen with
|
||||
e.g. `repmgr standby clone`, and we won't know which schema to write to.
|
||||
*/
|
||||
if (conn != NULL && strcmp(repmgr_schema, DEFAULT_REPMGR_SCHEMA_PREFIX) != 0)
|
||||
if(conn != NULL)
|
||||
{
|
||||
int n_node_id = htonl(node_id);
|
||||
char *t_successful = successful ? "TRUE" : "FALSE";
|
||||
@@ -1228,7 +1114,7 @@ create_event_record(PGconn *conn, t_configuration_options *options, int node_id,
|
||||
* current timestamp ourselves. This isn't quite the same
|
||||
* format as PostgreSQL, but is close enough for diagnostic use.
|
||||
*/
|
||||
if (!strlen(event_timestamp))
|
||||
if(!strlen(event_timestamp))
|
||||
{
|
||||
time_t now;
|
||||
|
||||
@@ -1238,7 +1124,7 @@ create_event_record(PGconn *conn, t_configuration_options *options, int node_id,
|
||||
}
|
||||
|
||||
/* an event notification command was provided - parse and execute it */
|
||||
if (strlen(options->event_notification_command))
|
||||
if(strlen(options->event_notification_command))
|
||||
{
|
||||
char parsed_command[MAXPGPATH];
|
||||
const char *src_ptr;
|
||||
@@ -1254,14 +1140,14 @@ create_event_record(PGconn *conn, t_configuration_options *options, int node_id,
|
||||
* (If 'event_notifications' was not provided, we assume the script
|
||||
* should be executed for all events).
|
||||
*/
|
||||
if (options->event_notifications.head != NULL)
|
||||
if(options->event_notifications.head != NULL)
|
||||
{
|
||||
EventNotificationListCell *cell;
|
||||
bool notify_ok = false;
|
||||
|
||||
for (cell = options->event_notifications.head; cell; cell = cell->next)
|
||||
{
|
||||
if (strcmp(event, cell->event_type) == 0)
|
||||
if(strcmp(event, cell->event_type) == 0)
|
||||
{
|
||||
notify_ok = true;
|
||||
break;
|
||||
@@ -1271,7 +1157,7 @@ create_event_record(PGconn *conn, t_configuration_options *options, int node_id,
|
||||
/*
|
||||
* Event type not found in the 'event_notifications' list - return early
|
||||
*/
|
||||
if (notify_ok == false)
|
||||
if(notify_ok == false)
|
||||
{
|
||||
log_debug(_("Not executing notification script for event type '%s'\n"), event);
|
||||
return success;
|
||||
@@ -1303,7 +1189,7 @@ create_event_record(PGconn *conn, t_configuration_options *options, int node_id,
|
||||
case 'd':
|
||||
/* %d: details */
|
||||
src_ptr++;
|
||||
if (details != NULL)
|
||||
if(details != NULL)
|
||||
{
|
||||
strlcpy(dst_ptr, details, end_ptr - dst_ptr);
|
||||
dst_ptr += strlen(dst_ptr);
|
||||
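The `%d: details` case above copies the details text into a bounded output buffer. A minimal sketch of that kind of bounded placeholder expansion follows. The helper name `expand_details` is hypothetical, only the `%d` placeholder is handled, and `memcpy` with an explicit length stands in for the `strlcpy` call in the hunk, so this is not the project's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy `tmpl` into `out`, replacing each "%d" with
 * `details` (or with nothing when `details` is NULL). The output is always
 * NUL-terminated and never grows beyond `out_size` bytes.
 */
static void
expand_details(const char *tmpl, const char *details, char *out, size_t out_size)
{
	char	   *dst = out;
	char	   *end;
	const char *src;

	assert(tmpl != NULL && out != NULL && out_size > 0);

	end = out + out_size - 1;	/* reserve one byte for the terminator */

	for (src = tmpl; *src != '\0' && dst < end; src++)
	{
		if (src[0] == '%' && src[1] == 'd')
		{
			src++;				/* skip the 'd' */
			if (details != NULL)
			{
				size_t		room = (size_t) (end - dst);
				size_t		len = strlen(details);

				if (len > room)
					len = room;	/* truncate rather than overflow */
				memcpy(dst, details, len);
				dst += len;
			}
		}
		else
		{
			*dst++ = *src;		/* ordinary character: copy through */
		}
	}

	*dst = '\0';
}
```

Tracking the remaining room on every copy is what keeps a long details string from writing past the end of the command buffer.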
@@ -1349,56 +1235,3 @@ create_event_record(PGconn *conn, t_configuration_options *options, int node_id,
|
||||
|
||||
return success;
|
||||
}
|
||||
|
||||
bool
|
||||
update_node_record_set_upstream(PGconn *conn, char *cluster_name, int this_node_id, int new_upstream_node_id)
|
||||
{
|
||||
PGresult *res;
|
||||
char sqlquery[QUERY_STR_LEN];
|
||||
|
||||
log_debug(_("update_node_record_set_upstream(): Updating node %i's upstream node to %i\n"), this_node_id, new_upstream_node_id);
|
||||
|
||||
sqlquery_snprintf(sqlquery,
|
||||
" UPDATE %s.repl_nodes "
|
||||
" SET upstream_node_id = %i "
|
||||
" WHERE cluster = '%s' "
|
||||
" AND id = %i ",
|
||||
get_repmgr_schema_quoted(conn),
|
||||
new_upstream_node_id,
|
||||
cluster_name,
|
||||
this_node_id);
|
||||
res = PQexec(conn, sqlquery);
|
||||
|
||||
if (PQresultStatus(res) != PGRES_COMMAND_OK)
|
||||
{
|
||||
log_err(_("Unable to set new upstream node id: %s\n"),
|
||||
PQerrorMessage(conn));
|
||||
PQclear(res);
|
||||
|
||||
return false;
|
||||
}
|
||||
|
||||
PQclear(res);
|
||||
|
||||
return true;
|
||||
}
|
||||
|
||||
|
||||
PGresult *
|
||||
get_node_record(PGconn *conn, char *cluster, int node_id)
|
||||
{
|
||||
char sqlquery[QUERY_STR_LEN];
|
||||
|
||||
sprintf(sqlquery,
|
||||
"SELECT id, upstream_node_id, conninfo, type, slot_name, active "
|
||||
" FROM %s.repl_nodes "
|
||||
" WHERE cluster = '%s' "
|
||||
" AND id = %i",
|
||||
get_repmgr_schema_quoted(conn),
|
||||
cluster,
|
||||
node_id);
|
||||
|
||||
log_debug("get_node_record(): %s\n", sqlquery);
|
||||
|
||||
return PQexec(conn, sqlquery);
|
||||
}
|
||||
|
||||
@@ -30,9 +30,6 @@ PGconn *establish_db_connection(const char *conninfo,
|
||||
PGconn *establish_db_connection_by_params(const char *keywords[],
|
||||
const char *values[],
|
||||
const bool exit_on_error);
|
||||
bool begin_transaction(PGconn *conn);
|
||||
bool commit_transaction(PGconn *conn);
|
||||
bool rollback_transaction(PGconn *conn);
|
||||
bool check_cluster_schema(PGconn *conn);
|
||||
int is_standby(PGconn *conn);
|
||||
bool is_pgup(PGconn *conn, int timeout);
|
||||
@@ -66,7 +63,6 @@ bool copy_configuration(PGconn *masterconn, PGconn *witnessconn, char *cluster_
|
||||
bool create_node_record(PGconn *conn, char *action, int node, char *type, int upstream_node, char *cluster_name, char *node_name, char *conninfo, int priority, char *slot_name);
|
||||
bool delete_node_record(PGconn *conn, int node, char *action);
|
||||
bool create_event_record(PGconn *conn, t_configuration_options *options, int node_id, char *event, bool successful, char *details);
|
||||
bool update_node_record_set_upstream(PGconn *conn, char *cluster_name, int this_node_id, int new_upstream_node_id);
|
||||
PGresult * get_node_record(PGconn *conn, char *cluster, int node_id);
|
||||
|
||||
|
||||
#endif
|
||||
|
||||
4
debian/repmgr.repmgrd.default
vendored
@@ -12,7 +12,7 @@ REPMGRD_ENABLED=no
 #REPMGRD_USER=postgres

 # repmgrd binary
-#REPMGRD_BIN=/usr/bin/repmgrd
+#REPMGR_BIN=/usr/bin/repmgr

 # pid file
-#REPMGRD_PIDFILE=/var/run/repmgrd.pid
+#REPMGR_PIDFILE=/var/run/repmgrd.pid
@@ -35,6 +35,5 @@
 #define ERR_BAD_SSH 12
 #define ERR_SYS_FAILURE 13
 #define ERR_BAD_BASEBACKUP 14
-#define ERR_INTERNAL 15

 #endif /* _ERRCODE_H_ */
24
log.c
@@ -144,32 +144,12 @@ logger_init(t_configuration_options * opts, const char *ident, const char *level
 {
 FILE *fd;

-/* Check if we can write to the specified file before redirecting
-* stderr - if freopen() fails, stderr output will vanish into
-* the ether and the user won't know what's going on.
-*/
-
-fd = fopen(opts->logfile, "a");
-if (fd == NULL)
-{
-stderr_log_err(_("Unable to open specified logfile '%s' for writing: %s\n"), opts->logfile, strerror(errno));
-stderr_log_err(_("Terminating\n"));
-exit(ERR_BAD_CONFIG);
-}
-fclose(fd);
-
-stderr_log_notice(_("Redirecting logging output to '%s'\n"), opts->logfile);
 fd = freopen(opts->logfile, "a", stderr);

-/* It's possible freopen() may still fail due to e.g. a race condition;
-as it's not feasible to restore stderr after a failed freopen(),
-we'll write to stdout as a last resort.
-*/
 if (fd == NULL)
 {
-printf(_("Unable to open specified logfile %s for writing: %s\n"), opts->logfile, strerror(errno));
-printf(_("Terminating\n"));
-exit(ERR_BAD_CONFIG);
+fprintf(stderr, "error reopening stderr to '%s': %s",
+opts->logfile, strerror(errno));
 }
 }

|
||||
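The comment in the log.c hunk above describes probing the log file with `fopen()` before handing it to `freopen()`, because a failed `freopen()` leaves `stderr` unusable. The probe step can be sketched on its own as below. The helper name `can_append_to` is hypothetical and not part of repmgr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: report whether `path` can be opened for appending.
 * Note that mode "a" creates the file if it does not exist yet.
 */
static bool
can_append_to(const char *path)
{
	FILE	   *probe;

	assert(path != NULL);

	probe = fopen(path, "a");
	if (probe == NULL)
		return false;			/* errno still describes the failure */

	fclose(probe);
	return true;
}
```

A caller would run this check while `stderr` still works, report any failure there, and only then call `freopen()`. The probe narrows the window for a silent failure but cannot close it, which is why the hunk's second comment still treats a `freopen()` failure as possible.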
@@ -7,11 +7,8 @@
 #
 # repmgr and repmgrd require these items to be configured:

-# Cluster name - this will be used by repmgr to generate its internal
-# schema (pattern: "repmgr_{cluster}"); while this name will be quoted
-# to preserve case, we recommend using lower case and avoiding whitespace
-# to facilitate easier querying of the repmgr views and tables.
-cluster=example_cluster
+# Cluster name
+cluster=test

 # Node ID and name
 # (Note: we recommend to avoid naming nodes after their initial
9
repmgr.h
@@ -20,9 +20,11 @@
 #ifndef _REPMGR_H_
 #define _REPMGR_H_

-#include <libpq-fe.h>
-#include <postgres_fe.h>
-#include <getopt_long.h>
+#include "postgres_fe.h"
+#include "libpq-fe.h"
+
+
+#include "getopt_long.h"

 #include "strutil.h"
 #include "dbutils.h"
@@ -47,7 +49,6 @@

 #define MANUAL_FAILOVER 0
 #define AUTOMATIC_FAILOVER 1
-#define NODE_NOT_FOUND -1
 #define NO_UPSTREAM_NODE -1


357
repmgrd.c
@@ -88,9 +88,12 @@ static void check_node_configuration(void);
|
||||
|
||||
static void standby_monitor(void);
|
||||
static void witness_monitor(void);
|
||||
static bool check_connection(PGconn **conn, const char *type, const char *conninfo);
|
||||
static bool check_connection(PGconn *conn, const char *type);
|
||||
static bool set_local_node_failed(void);
|
||||
|
||||
static bool update_node_record_set_master(PGconn *conn, int this_node_id, int old_master_node_id);
|
||||
static bool update_node_record_set_upstream(PGconn *conn, int this_node_id, int new_upstream_node_id);
|
||||
|
||||
static void update_shared_memory(char *last_wal_standby_applied);
|
||||
static void update_registration(void);
|
||||
static void do_master_failover(void);
|
||||
@@ -145,8 +148,6 @@ main(int argc, char **argv)
|
||||
{"monitoring-history", no_argument, NULL, 'm'},
|
||||
{"daemonize", no_argument, NULL, 'd'},
|
||||
{"pid-file", required_argument, NULL, 'p'},
|
||||
{"help", no_argument, NULL, '?'},
|
||||
{"version", no_argument, NULL, 'V'},
|
||||
{NULL, 0, NULL, 0}
|
||||
};
|
||||
|
||||
@@ -160,7 +161,21 @@ main(int argc, char **argv)
|
||||
int server_version_num = 0;
|
||||
progname = get_progname(argv[0]);
|
||||
|
||||
while ((c = getopt_long(argc, argv, "?Vf:v:mdp:", long_options, &optindex)) != -1)
|
||||
if (argc > 1)
|
||||
{
|
||||
if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
|
||||
{
|
||||
help(progname);
|
||||
exit(SUCCESS);
|
||||
}
|
||||
if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
|
||||
{
|
||||
printf("%s %s (PostgreSQL %s)\n", progname, REPMGR_VERSION, PG_VERSION);
|
||||
exit(SUCCESS);
|
||||
}
|
||||
}
|
||||
|
||||
while ((c = getopt_long(argc, argv, "f:v:mdp:", long_options, &optindex)) != -1)
|
||||
{
|
||||
switch (c)
|
||||
{
|
||||
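One side of the hunk above checks `argv[1]` for `--help` and `--version` before `getopt_long()` runs. A minimal sketch of that pre-scan as a standalone function follows. The enum and the name `check_early_option` are hypothetical, introduced only for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result type for the pre-scan */
typedef enum
{
	EARLY_NONE,
	EARLY_HELP,
	EARLY_VERSION
} early_option;

/*
 * Inspect only the first argument, as the hunk does, and report whether
 * it asks for help or version output.
 */
static early_option
check_early_option(int argc, char **argv)
{
	assert(argv != NULL);

	if (argc > 1)
	{
		if (strcmp(argv[1], "--help") == 0 || strcmp(argv[1], "-?") == 0)
			return EARLY_HELP;
		if (strcmp(argv[1], "--version") == 0 || strcmp(argv[1], "-V") == 0)
			return EARLY_VERSION;
	}

	return EARLY_NONE;
}
```

Because only `argv[1]` is examined, a `--help` that follows another option is not caught by the pre-scan. The other side of the hunk avoids this by listing `help` and `version` in `long_options` and handling `'?'` and `'V'` inside the `getopt_long()` loop.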
@@ -179,12 +194,6 @@ main(int argc, char **argv)
|
||||
case 'p':
|
||||
pid_file = optarg;
|
||||
break;
|
||||
case '?':
|
||||
help(progname);
|
||||
exit(SUCCESS);
|
||||
case 'V':
|
||||
printf("%s %s (PostgreSQL %s)\n", progname, REPMGR_VERSION, PG_VERSION);
|
||||
exit(SUCCESS);
|
||||
default:
|
||||
usage();
|
||||
exit(ERR_BAD_CONFIG);
|
||||
@@ -200,7 +209,7 @@ main(int argc, char **argv)
|
||||
* which case we'll need to refactor parse_config() not to abort,
|
||||
* and return the error message.
|
||||
*/
|
||||
load_config(config_file, &local_options, argv[0]);
|
||||
parse_config(config_file, &local_options);
|
||||
|
||||
if (daemonize)
|
||||
{
|
||||
@@ -259,7 +268,7 @@ main(int argc, char **argv)
|
||||
|
||||
server_version_num = get_server_version(my_local_conn, NULL);
|
||||
|
||||
if (server_version_num < MIN_SUPPORTED_VERSION_NUM)
|
||||
if(server_version_num < MIN_SUPPORTED_VERSION_NUM)
|
||||
{
|
||||
if (server_version_num > 0)
|
||||
{
|
||||
@@ -275,17 +284,9 @@ main(int argc, char **argv)
|
||||
terminate(ERR_BAD_CONFIG);
|
||||
}
|
||||
|
||||
/* Retrieve record for this node from the local database */
|
||||
/* Retrieve record for this node from the database */
|
||||
node_info = get_node_info(my_local_conn, local_options.cluster_name, local_options.node);
|
||||
|
||||
/* No node record found - exit gracefully */
|
||||
if (node_info.node_id == NODE_NOT_FOUND)
|
||||
{
|
||||
log_err(_("No metadata record found for this node - terminating\n"));
|
||||
log_notice(_("HINT: was this node registered with 'repmgr (master|standby) register'?\n"));
|
||||
terminate(ERR_BAD_CONFIG);
|
||||
}
|
||||
|
||||
log_debug("node id is %i, upstream is %i\n", node_info.node_id, node_info.upstream_node_id);
|
||||
|
||||
/*
|
||||
@@ -311,7 +312,7 @@ main(int argc, char **argv)
|
||||
check_cluster_configuration(my_local_conn);
|
||||
check_node_configuration();
|
||||
|
||||
if (reload_config(&local_options))
|
||||
if (reload_config(config_file, &local_options))
|
||||
{
|
||||
PQfinish(my_local_conn);
|
||||
my_local_conn = establish_db_connection(local_options.conninfo, true);
|
||||
@@ -320,7 +321,7 @@ main(int argc, char **argv)
|
||||
}
|
||||
|
||||
/* Log startup event */
|
||||
if (startup_event_logged == false)
|
||||
if(startup_event_logged == false)
|
||||
{
|
||||
create_event_record(master_conn,
|
||||
&local_options,
|
||||
@@ -334,9 +335,9 @@ main(int argc, char **argv)
|
||||
log_info(_("starting continuous master connection check\n"));
|
||||
|
||||
/*
|
||||
* Check that master is still alive.
|
||||
* XXX We should also check that the
|
||||
* standby servers are sending info
|
||||
* Check that master is still alive.
|
||||
* XXX We should also check that the
|
||||
* standby servers are sending info
|
||||
*/
|
||||
|
||||
/*
|
||||
@@ -345,7 +346,7 @@ main(int argc, char **argv)
|
||||
*/
|
||||
do
|
||||
{
|
||||
if (check_connection(&master_conn, "master", NULL))
|
||||
if (check_connection(master_conn, "master"))
|
||||
{
|
||||
sleep(local_options.monitor_interval_secs);
|
||||
}
|
||||
@@ -360,10 +361,10 @@ main(int argc, char **argv)
|
||||
if (got_SIGHUP)
|
||||
{
|
||||
/*
|
||||
* if we can reload the configuration file, then could need to change
|
||||
* if we can reload, then could need to change
|
||||
* my_local_conn
|
||||
*/
|
||||
if (reload_config(&local_options))
|
||||
if (reload_config(config_file, &local_options))
|
||||
{
|
||||
PQfinish(my_local_conn);
|
||||
my_local_conn = establish_db_connection(local_options.conninfo, true);
|
||||
@@ -424,14 +425,14 @@ main(int argc, char **argv)
|
||||
check_cluster_configuration(my_local_conn);
|
||||
check_node_configuration();
|
||||
|
||||
if (reload_config(&local_options))
|
||||
if (reload_config(config_file, &local_options))
|
||||
{
|
||||
PQfinish(my_local_conn);
|
||||
my_local_conn = establish_db_connection(local_options.conninfo, true);
|
||||
update_registration();
|
||||
}
|
||||
/* Log startup event */
|
||||
if (startup_event_logged == false)
|
||||
if(startup_event_logged == false)
|
||||
{
|
||||
create_event_record(master_conn,
|
||||
&local_options,
|
||||
@@ -475,7 +476,7 @@ main(int argc, char **argv)
|
||||
* if we can reload, then could need to change
|
||||
* my_local_conn
|
||||
*/
|
||||
if (reload_config(&local_options))
|
||||
if (reload_config(config_file, &local_options))
|
||||
{
|
||||
PQfinish(my_local_conn);
|
||||
my_local_conn = establish_db_connection(local_options.conninfo, true);
|
||||
@@ -483,7 +484,7 @@ main(int argc, char **argv)
|
||||
}
|
||||
got_SIGHUP = false;
|
||||
}
|
||||
if (failover_done)
|
||||
if(failover_done)
|
||||
{
|
||||
log_debug(_("standby check loop will terminate\n"));
|
||||
}
|
||||
@@ -528,9 +529,9 @@ witness_monitor(void)
|
||||
* of a missing master and promotion of a standby by that standby's
|
||||
* repmgrd, so we'll loop for a while before giving up.
|
||||
*/
|
||||
connection_ok = check_connection(&master_conn, "master", NULL);
|
||||
connection_ok = check_connection(master_conn, "master");
|
||||
|
||||
if (connection_ok == false)
|
||||
if(connection_ok == false)
|
||||
{
|
||||
int connection_retries;
|
||||
log_debug(_("old master node ID: %i\n"), master_options.node);
|
||||
@@ -581,7 +582,7 @@ witness_monitor(void)
|
||||
}
|
||||
}
|
||||
|
||||
if (connection_ok == false)
|
||||
if(connection_ok == false)
|
||||
{
|
||||
PQExpBufferData errmsg;
|
||||
initPQExpBuffer(&errmsg);
|
||||
@@ -636,9 +637,9 @@ witness_monitor(void)
|
||||
*/
|
||||
sqlquery_snprintf(sqlquery,
|
||||
"INSERT INTO %s.repl_monitor "
|
||||
" (primary_node, standby_node, "
|
||||
" (master_node, standby_node, "
|
||||
" last_monitor_time, last_apply_time, "
|
||||
" last_wal_primary_location, last_wal_standby_location, "
|
||||
" last_wal_master_location, last_wal_standby_location, "
|
||||
" replication_lag, apply_lag )"
|
||||
" VALUES(%d, %d, "
|
||||
" '%s'::TIMESTAMP WITH TIME ZONE, NULL, "
|
||||
@@ -685,7 +686,6 @@ standby_monitor(void)
|
||||
bool did_retry = false;
|
||||
|
||||
PGconn *upstream_conn;
|
||||
char upstream_conninfo[MAXCONNINFO];
|
||||
int upstream_node_id;
|
||||
t_node_info upstream_node;
|
||||
|
||||
@@ -697,7 +697,7 @@ standby_monitor(void)
|
||||
* no point in doing much else anyway
|
||||
*/
|
||||
|
||||
if (!check_connection(&my_local_conn, "standby", NULL))
|
||||
if (!check_connection(my_local_conn, "standby"))
|
||||
{
|
||||
PQExpBufferData errmsg;
|
||||
|
||||
@@ -723,7 +723,7 @@ standby_monitor(void)
|
||||
upstream_conn = get_upstream_connection(my_local_conn,
|
||||
local_options.cluster_name,
|
||||
local_options.node,
|
||||
&upstream_node_id, upstream_conninfo);
|
||||
&upstream_node_id, NULL);
|
||||
|
||||
type = upstream_node_id == master_options.node
|
||||
? "master"
|
||||
@@ -735,11 +735,11 @@ standby_monitor(void)
|
||||
* we cannot reconnect, try to get a new upstream node.
|
||||
*/
|
||||
|
||||
check_connection(&upstream_conn, type, upstream_conninfo);
|
||||
/*
|
||||
* This takes up to local_options.reconnect_attempts *
|
||||
* local_options.reconnect_intvl seconds
|
||||
*/
|
||||
check_connection(upstream_conn, type); /* this takes up to
|
||||
* local_options.reconnect_attempts
|
||||
* local_options.reconnect_intvl seconds
|
||||
*/
|
||||
|
||||
|
||||
if (PQstatus(upstream_conn) != CONNECTION_OK)
|
||||
{
|
||||
@@ -809,7 +809,7 @@ standby_monitor(void)
|
||||
*/
|
||||
upstream_node = get_node_info(my_local_conn, local_options.cluster_name, node_info.upstream_node_id);
|
||||
|
||||
if (upstream_node.type == MASTER)
|
||||
if(upstream_node.type == MASTER)
|
||||
{
|
||||
log_debug(_("failure detected on master node (%i); attempting to promote a standby\n"),
|
||||
node_info.upstream_node_id);
|
||||
@@ -820,7 +820,7 @@ standby_monitor(void)
|
||||
log_debug(_("failure detected on upstream node %i; attempting to reconnect to new upstream node\n"),
|
||||
node_info.upstream_node_id);
|
||||
|
||||
if (!do_upstream_standby_failover(upstream_node))
|
||||
if(!do_upstream_standby_failover(upstream_node))
|
||||
{
|
||||
PQExpBufferData errmsg;
|
||||
initPQExpBuffer(&errmsg);
|
||||
@@ -872,7 +872,7 @@ standby_monitor(void)
|
||||
log_err(_("standby node has disappeared, trying to reconnect...\n"));
|
||||
did_retry = true;
|
||||
|
||||
if (!check_connection(&my_local_conn, "standby", NULL))
|
||||
if (!check_connection(my_local_conn, "standby"))
|
||||
{
|
||||
set_local_node_failed();
|
||||
terminate(0);
|
||||
@@ -917,7 +917,7 @@ standby_monitor(void)
|
||||
return;
|
||||
}
|
||||
|
||||
if (PQntuples(res) == 0)
|
||||
if(PQntuples(res) == 0)
|
||||
{
|
||||
log_err(_("standby_monitor(): no active master found\n"));
|
||||
PQclear(res);
|
||||
@@ -927,19 +927,18 @@ standby_monitor(void)
|
||||
active_master_id = atoi(PQgetvalue(res, 0, 0));
|
||||
PQclear(res);
|
||||
|
||||
if (active_master_id != master_options.node)
|
||||
if(active_master_id != master_options.node)
|
||||
{
|
||||
log_notice(_("connecting to active master (node %i)...\n"), active_master_id); \
|
||||
if (master_conn != NULL)
|
||||
if(master_conn != NULL)
|
||||
{
|
||||
PQfinish(master_conn);
|
||||
}
|
||||
master_conn = get_master_connection(my_local_conn,
|
||||
local_options.cluster_name,
|
||||
&master_options.node, NULL);
|
||||
|
||||
}
|
||||
if (PQstatus(master_conn) != CONNECTION_OK)
|
||||
PQreset(master_conn);
|
||||
|
||||
/*
|
||||
* Cancel any query that is still being executed, so i can insert the
|
||||
@@ -994,9 +993,9 @@ standby_monitor(void)
|
||||
*/
|
||||
sqlquery_snprintf(sqlquery,
|
||||
"INSERT INTO %s.repl_monitor "
|
||||
" (primary_node, standby_node, "
|
||||
" (master_node, standby_node, "
|
||||
" last_monitor_time, last_apply_time, "
|
||||
" last_wal_primary_location, last_wal_standby_location, "
|
||||
" last_wal_master_location, last_wal_standby_location, "
|
||||
" replication_lag, apply_lag ) "
|
||||
" VALUES(%d, %d, "
|
||||
" '%s'::TIMESTAMP WITH TIME ZONE, '%s'::TIMESTAMP WITH TIME ZONE, "
|
||||
@@ -1004,7 +1003,7 @@ standby_monitor(void)
|
||||
" %llu, %llu) ",
|
||||
get_repmgr_schema_quoted(master_conn),
|
||||
master_options.node, local_options.node,
|
||||
monitor_standby_timestamp, last_wal_standby_applied_timestamp,
|
||||
monitor_standby_timestamp, last_wal_standby_applied_timestamp,
|
||||
last_wal_master_location, last_wal_standby_received,
|
||||
(long long unsigned int)(lsn_master - lsn_standby_received),
|
||||
(long long unsigned int)(lsn_standby_received - lsn_standby_applied));
|
||||
@@ -1102,7 +1101,7 @@ do_master_failover(void)
|
||||
|
||||
/* Copy details of the failed node */
|
||||
/* XXX only node_id is actually used later */
|
||||
if (nodes[i].type == MASTER)
|
||||
if(nodes[i].type == MASTER)
|
||||
{
|
||||
failed_master.node_id = nodes[i].node_id;
|
||||
failed_master.xlog_location = nodes[i].xlog_location;
|
||||
@@ -1146,8 +1145,8 @@ do_master_failover(void)
|
||||
total_nodes, visible_nodes);
|
||||
|
||||
/*
|
||||
* Am I on the group that should keep alive? If I see less than half of
|
||||
* total_nodes then I should do nothing
|
||||
* am i on the group that should keep alive? if i see less than half of
|
||||
* total_nodes then i should do nothing
|
||||
*/
|
||||
if (visible_nodes < (total_nodes / 2.0))
|
||||
{
|
||||
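The comment above states the rule that a node seeing fewer than half of `total_nodes` should do nothing. The comparison can be sketched in isolation as below. The function name `has_enough_visible_nodes` is hypothetical; the expression itself mirrors the one in the hunk.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical wrapper around the hunk's comparison: a node abstains only
 * when it sees strictly fewer than half of all nodes.
 */
static bool
has_enough_visible_nodes(int visible_nodes, int total_nodes)
{
	assert(total_nodes > 0 && visible_nodes >= 0);

	/* Dividing by 2.0 keeps the comparison exact for odd totals. */
	return !(visible_nodes < (total_nodes / 2.0));
}
```

With this comparison an exact half passes the check, so in a four-node cluster split two against two, both sides would satisfy it. With three nodes, seeing one is not enough and seeing two is.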
@@ -1208,7 +1207,7 @@ do_master_failover(void)
|
||||
|
||||
/* If position is 0/0, error */
|
||||
/* XXX do we need to terminate ourselves if the queried node has a problem? */
|
||||
if (xlog_recptr == InvalidXLogRecPtr)
|
||||
if(xlog_recptr == InvalidXLogRecPtr)
|
||||
{
|
||||
log_err(_("InvalidXLogRecPtr detected on standby node %i\n"), nodes[i].node_id);
|
||||
terminate(ERR_FAILOVER_FAIL);
|
||||
@@ -1298,12 +1297,12 @@ do_master_failover(void)
|
||||
* empty string; otherwise position is 0/0 and we need to continue
|
||||
* looping until a valid LSN is reported
|
||||
*/
|
||||
if (xlog_recptr == InvalidXLogRecPtr)
|
||||
if(xlog_recptr == InvalidXLogRecPtr)
|
||||
{
|
||||
if (lsn_format_ok == false)
|
||||
if(lsn_format_ok == false)
|
||||
{
|
||||
/* Unable to parse value returned by `repmgr_get_last_standby_location()` */
|
||||
if (*PQgetvalue(res, 0, 0) == '\0')
|
||||
if(*PQgetvalue(res, 0, 0) == '\0')
|
||||
{
|
||||
log_crit(
|
||||
_("unable to obtain LSN from node %i"), nodes[i].node_id
|
||||
@@ -1435,6 +1434,25 @@ do_master_failover(void)
|
||||
/* and reconnect to the local database */
|
||||
my_local_conn = establish_db_connection(local_options.conninfo, true);
|
||||
|
||||
/* update node information to reflect new status */
|
||||
if(update_node_record_set_master(my_local_conn, node_info.node_id, failed_master.node_id) == false)
|
||||
{
|
||||
appendPQExpBuffer(&event_details,
|
||||
_("unable to update node record for node %i (promoted to master following failure of node %i)"),
|
||||
node_info.node_id,
|
||||
failed_master.node_id);
|
||||
|
||||
log_err("%s\n", event_details.data);
|
||||
|
||||
create_event_record(NULL,
|
||||
&local_options,
|
||||
node_info.node_id,
|
||||
"repmgrd_failover_promote",
|
||||
false,
|
||||
event_details.data);
|
||||
|
||||
terminate(ERR_DB_QUERY);
|
||||
}
|
||||
|
||||
/* update internal record for this node */
|
||||
node_info = get_node_info(my_local_conn, local_options.cluster_name, local_options.node);
|
||||
@@ -1483,9 +1501,9 @@ do_master_failover(void)
|
||||
*/
|
||||
new_master_conn = establish_db_connection(best_candidate.conninfo_str, true);
|
||||
|
||||
if (local_options.use_replication_slots)
|
||||
if(local_options.use_replication_slots)
|
||||
{
|
||||
if (create_replication_slot(new_master_conn, node_info.slot_name) == false)
|
||||
if(create_replication_slot(new_master_conn, node_info.slot_name) == false)
|
||||
{
|
||||
|
||||
appendPQExpBuffer(&event_details,
|
||||
@@ -1518,7 +1536,7 @@ do_master_failover(void)
|
||||
my_local_conn = establish_db_connection(local_options.conninfo, true);
|
||||
|
||||
/* update node information to reflect new status */
|
||||
if (update_node_record_set_upstream(new_master_conn, local_options.cluster_name, node_info.node_id, best_candidate.node_id) == false)
|
||||
if(update_node_record_set_upstream(new_master_conn, node_info.node_id, best_candidate.node_id) == false)
|
||||
{
|
||||
appendPQExpBuffer(&event_details,
|
||||
_("Unable to update node record for node %i (following new upstream node %i)"),
|
||||
@@ -1586,7 +1604,7 @@ do_upstream_standby_failover(t_node_info upstream_node)
|
||||
* Verify that we can still talk to the cluster master even though
|
||||
* node upstream is not available
|
||||
*/
|
||||
if (!check_connection(&master_conn, "master", NULL))
|
||||
if (!check_connection(master_conn, "master"))
|
||||
{
|
||||
log_err(_("do_upstream_standby_failover(): Unable to connect to last known master node\n"));
|
||||
return false;
|
||||
@@ -1610,7 +1628,7 @@ do_upstream_standby_failover(t_node_info upstream_node)
|
||||
return false;
|
||||
}
|
||||
|
||||
if (PQntuples(res) == 0)
|
||||
if(PQntuples(res) == 0)
|
||||
{
|
||||
log_err(_("no node with id %i found"), upstream_node_id);
|
||||
PQclear(res);
|
||||
@@ -1618,7 +1636,7 @@ do_upstream_standby_failover(t_node_info upstream_node)
|
||||
}
|
||||
|
||||
/* upstream node is inactive */
|
||||
if (strcmp(PQgetvalue(res, 0, 1), "f") == 0)
|
||||
if(strcmp(PQgetvalue(res, 0, 1), "f") == 0)
|
||||
{
|
||||
/*
|
||||
* Upstream node is an inactive master, meaning no there are no direct
|
||||
@@ -1628,7 +1646,7 @@ do_upstream_standby_failover(t_node_info upstream_node)
|
||||
* provide an option to either try and find the current master and/or
|
||||
* a strategy to connect to a different upstream node
|
||||
*/
|
||||
if (strcmp(PQgetvalue(res, 0, 4), "master") == 0)
|
||||
if(strcmp(PQgetvalue(res, 0, 4), "master") == 0)
|
||||
{
|
||||
log_err(_("unable to find active master node\n"));
|
||||
PQclear(res);
|
||||
@@ -1662,7 +1680,7 @@ do_upstream_standby_failover(t_node_info upstream_node)
|
||||
terminate(ERR_BAD_CONFIG);
|
||||
}
|
||||
|
||||
if (update_node_record_set_upstream(master_conn, local_options.cluster_name, node_info.node_id, upstream_node_id) == false)
|
||||
if(update_node_record_set_upstream(master_conn, node_info.node_id, upstream_node_id) == false)
|
||||
{
|
||||
terminate(ERR_BAD_CONFIG);
|
||||
}
|
||||
@@ -1675,7 +1693,7 @@ do_upstream_standby_failover(t_node_info upstream_node)
|
||||
|
||||
|
||||
static bool
|
||||
check_connection(PGconn **conn, const char *type, const char *conninfo)
|
||||
check_connection(PGconn *conn, const char *type)
|
||||
{
|
||||
int connection_retries;
|
||||
|
||||
@@ -1686,16 +1704,7 @@ check_connection(PGconn **conn, const char *type, const char *conninfo)
|
||||
*/
|
||||
for (connection_retries = 0; connection_retries < local_options.reconnect_attempts; connection_retries++)
|
||||
{
|
||||
if (*conn == NULL)
|
||||
{
|
||||
if (conninfo == NULL)
|
||||
{
|
||||
log_err("INTERNAL ERROR: *conn == NULL && conninfo == NULL");
|
||||
terminate(ERR_INTERNAL);
|
||||
}
|
||||
*conn = establish_db_connection(conninfo, false);
|
||||
}
|
||||
if (!is_pgup(*conn, local_options.master_response_timeout))
|
||||
if (!is_pgup(conn, local_options.master_response_timeout))
|
||||
{
|
||||
log_warning(_("connection to %s has been lost, trying to recover... %i seconds before failover decision\n"),
|
||||
type,
|
||||
@@ -1713,9 +1722,9 @@ check_connection(PGconn **conn, const char *type, const char *conninfo)
|
||||
}
|
||||
}
|
||||
|
||||
if (!is_pgup(*conn, local_options.master_response_timeout))
|
||||
if (!is_pgup(conn, local_options.master_response_timeout))
|
||||
{
|
||||
log_err(_("unable to reconnect to %s (timeout %i seconds)...\n"),
|
||||
log_err(_("unable to reconnect to %s after %i seconds...\n"),
|
||||
type,
|
||||
local_options.master_response_timeout
|
||||
);
|
||||
@@ -1740,10 +1749,10 @@ set_local_node_failed(void)
|
||||
{
|
||||
PGresult *res;
|
||||
char sqlquery[QUERY_STR_LEN];
|
||||
int active_master_node_id = NODE_NOT_FOUND;
|
||||
int active_master_node_id = -1;
|
||||
char master_conninfo[MAXLEN];
|
||||
|
||||
if (!check_connection(&master_conn, "master", NULL))
|
||||
if (!check_connection(master_conn, "master"))
|
||||
{
|
||||
log_err(_("set_local_node_failed(): Unable to connect to last known master node\n"));
|
||||
return false;
|
||||
@@ -1771,7 +1780,7 @@ set_local_node_failed(void)
|
||||
return false;
|
||||
}
|
||||
|
||||
if (!PQntuples(res))
|
||||
if(!PQntuples(res))
|
||||
{
|
||||
log_err(_("no active master record found\n"));
|
||||
return false;
|
||||
@@ -1781,14 +1790,14 @@ set_local_node_failed(void)
|
||||
strncpy(master_conninfo, PQgetvalue(res, 0, 1), MAXLEN);
|
||||
PQclear(res);
|
||||
|
||||
if (active_master_node_id != master_options.node)
|
||||
if(active_master_node_id != master_options.node)
|
||||
{
|
||||
log_notice(_("current active master is %i; attempting to connect\n"),
|
||||
active_master_node_id);
|
||||
PQfinish(master_conn);
|
||||
master_conn = establish_db_connection(master_conninfo, false);
|
||||
|
||||
if (PQstatus(master_conn) != CONNECTION_OK)
|
||||
if(PQstatus(master_conn) != CONNECTION_OK)
|
||||
{
|
||||
log_err(_("unable to connect to active master\n"));
|
||||
return false;
|
||||
@@ -1946,13 +1955,13 @@ lsn_to_xlogrecptr(char *lsn, bool *format_ok)
|
||||
|
||||
if (sscanf(lsn, "%X/%X", &xlogid, &xrecoff) != 2)
|
||||
{
|
||||
if (format_ok != NULL)
|
||||
if(format_ok != NULL)
|
||||
*format_ok = false;
|
||||
log_err(_("incorrect log location format: %s\n"), lsn);
|
||||
return 0;
|
||||
}
|
||||
|
||||
if (format_ok != NULL)
|
||||
if(format_ok != NULL)
|
||||
*format_ok = true;
|
||||
|
||||
return (((XLogRecPtr) xlogid * 16 * 1024 * 1024 * 255) + xrecoff);
|
||||
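`lsn_to_xlogrecptr()` above parses a textual log location of the form `X/X` with `sscanf()` and reports format errors through `format_ok`. A minimal standalone sketch of the parsing step follows. The name `parse_lsn` is hypothetical. The sketch combines the two halves using the 64-bit layout of PostgreSQL 9.3 and later (high 32 bits before the slash, low 32 bits after), which is an assumption and differs from the multiplication-based expression in the hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: parse "hi/lo" (both hexadecimal) into a 64-bit
 * position. Returns false, leaving *out untouched, on a malformed string.
 */
static bool
parse_lsn(const char *lsn, uint64_t *out)
{
	unsigned int hi = 0;
	unsigned int lo = 0;

	assert(lsn != NULL && out != NULL);

	if (sscanf(lsn, "%X/%X", &hi, &lo) != 2)
		return false;

	*out = ((uint64_t) hi << 32) | lo;
	return true;
}
```

Returning the success flag separately from the value lets a caller tell a malformed string apart from a valid `0/0` location, which is the distinction the `format_ok` parameter makes in the hunk.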
@@ -1969,21 +1978,17 @@ usage(void)
 void
 help(const char *progname)
 {
-printf(_("%s: replication management daemon for PostgreSQL\n"), progname);
-printf(_("\n"));
-printf(_("Usage:\n"));
-printf(_(" %s [OPTIONS]\n"), progname);
-printf(_("\n"));
-printf(_("Options:\n"));
-printf(_(" -?, --help show this help, then exit\n"));
-printf(_(" -V, --version output version information, then exit\n"));
+printf(_("Usage: %s [OPTIONS]\n"), progname);
+printf(_("Replicator manager daemon for PostgreSQL.\n"));
+printf(_("\nOptions:\n"));
+printf(_(" --help show this help, then exit\n"));
+printf(_(" --version output version information, then exit\n"));
 printf(_(" -v, --verbose output verbose activity information\n"));
 printf(_(" -m, --monitoring-history track advance or lag of the replication in every standby in repl_monitor\n"));
 printf(_(" -f, --config-file=PATH path to the configuration file\n"));
 printf(_(" -d, --daemonize detach process from foreground\n"));
 printf(_(" -p, --pid-file=PATH write a PID file\n"));
-printf(_("\n"));
-printf(_("%s monitors a cluster of servers and optionally performs failover.\n"), progname);
+printf(_("\n%s monitors a cluster of servers.\n"), progname);
 }


@@ -2084,7 +2089,7 @@ update_registration(void)
|
||||
|
||||
log_err("%s\n", errmsg.data);
|
||||
|
||||
create_event_record(master_conn,
|
||||
create_event_record(my_local_conn,
|
||||
&local_options,
|
||||
local_options.node,
|
||||
"repmgrd_shutdown",
|
||||
@@ -2226,12 +2231,23 @@ check_and_create_pid_file(const char *pid_file)
|
||||
t_node_info
|
||||
get_node_info(PGconn *conn, char *cluster, int node_id)
|
||||
{
|
||||
char sqlquery[QUERY_STR_LEN];
|
||||
PGresult *res;
|
||||
|
||||
t_node_info node_info = { NODE_NOT_FOUND, NO_UPSTREAM_NODE, "", InvalidXLogRecPtr, UNKNOWN, false, false};
|
||||
t_node_info node_info = {-1, NO_UPSTREAM_NODE, "", InvalidXLogRecPtr, UNKNOWN, false, false};
|
||||
|
||||
res = get_node_record(conn, cluster, node_id);
|
||||
sprintf(sqlquery,
|
||||
"SELECT id, upstream_node_id, conninfo, type, slot_name, active "
|
||||
" FROM %s.repl_nodes "
|
||||
" WHERE cluster = '%s' "
|
||||
" AND id = %i",
|
||||
get_repmgr_schema_quoted(conn),
|
||||
local_options.cluster_name,
|
||||
node_id);
|
||||
|
||||
log_debug("get_node_info(): %s\n", sqlquery);
|
||||
|
||||
res = PQexec(my_local_conn, sqlquery);
|
||||
if (PQresultStatus(res) != PGRES_TUPLES_OK)
|
||||
{
|
||||
PQExpBufferData errmsg;
|
||||
@@ -2244,7 +2260,7 @@ get_node_info(PGconn *conn, char *cluster, int node_id)
|
||||
|
||||
log_err("%s\n", errmsg.data);
|
||||
|
||||
create_event_record(NULL,
|
||||
create_event_record(my_local_conn,
|
||||
&local_options,
|
||||
local_options.node,
|
||||
"repmgrd_shutdown",
|
||||
@@ -2258,7 +2274,7 @@ get_node_info(PGconn *conn, char *cluster, int node_id)
|
||||
if (!PQntuples(res)) {
|
||||
log_warning(_("No record found record for node %i\n"), node_id);
|
||||
PQclear(res);
|
||||
node_info.node_id = NODE_NOT_FOUND;
|
||||
node_info.node_id = -1;
|
||||
return node_info;
|
||||
}
|
||||
|
||||
@@ -2280,18 +2296,139 @@ get_node_info(PGconn *conn, char *cluster, int node_id)
|
||||
static t_server_type
|
||||
parse_node_type(const char *type)
|
||||
{
|
||||
if (strcmp(type, "master") == 0)
|
||||
if(strcmp(type, "master") == 0)
|
||||
{
|
||||
return MASTER;
|
||||
}
|
||||
else if (strcmp(type, "standby") == 0)
|
||||
else if(strcmp(type, "standby") == 0)
|
||||
{
|
||||
return STANDBY;
|
||||
}
|
||||
else if (strcmp(type, "witness") == 0)
|
||||
else if(strcmp(type, "witness") == 0)
|
||||
{
|
||||
return WITNESS;
|
||||
}
|
||||
|
||||
return UNKNOWN;
|
||||
}
+
+
+static bool
+update_node_record_set_master(PGconn *conn, int this_node_id, int old_master_node_id)
+{
+	PGresult *res;
+	char sqlquery[QUERY_STR_LEN];
+
+	log_debug(_("Setting failed node %i inactive; marking node %i as master\n"), old_master_node_id, this_node_id);
+
+	res = PQexec(conn, "BEGIN");
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		log_err(_("Unable to begin transaction: %s\n"),
+				PQerrorMessage(conn));
+
+		PQclear(res);
+		return false;
+	}
+
+	PQclear(res);
+
+	sqlquery_snprintf(sqlquery,
+					  " UPDATE %s.repl_nodes "
+					  " SET active = FALSE "
+					  " WHERE cluster = '%s' "
+					  " AND id = %i ",
+					  get_repmgr_schema_quoted(conn),
+					  local_options.cluster_name,
+					  old_master_node_id);
+
+	res = PQexec(conn, sqlquery);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		log_err(_("Unable to set old master node %i as inactive: %s\n"),
+				old_master_node_id,
+				PQerrorMessage(conn));
+		PQclear(res);
+
+		PQexec(conn, "ROLLBACK");
+		return false;
+	}
+
+	PQclear(res);
+
+	sqlquery_snprintf(sqlquery,
+					  " UPDATE %s.repl_nodes "
+					  " SET type = 'master', "
+					  " upstream_node_id = NULL "
+					  " WHERE cluster = '%s' "
+					  " AND id = %i ",
+					  get_repmgr_schema_quoted(conn),
+					  local_options.cluster_name,
+					  this_node_id);
+
+	res = PQexec(conn, sqlquery);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		log_err(_("Unable to set current node %i as active master: %s\n"),
+				this_node_id,
+				PQerrorMessage(conn));
+		PQclear(res);
+
+		PQexec(conn, "ROLLBACK");
+		return false;
+	}
+
+	PQclear(res);
+
+	res = PQexec(conn, "COMMIT");
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		log_err(_("Unable to set commit transaction: %s\n"),
+				PQerrorMessage(conn));
+		PQclear(res);
+
+		return false;
+	}
+
+	PQclear(res);
+
+	return true;
+}
+
+
+static bool
+update_node_record_set_upstream(PGconn *conn, int this_node_id, int new_upstream_node_id)
+{
+	PGresult *res;
+	char sqlquery[QUERY_STR_LEN];
+
+	log_debug(_("update_node_record_set_upstream(): Updating node %i's upstream node to %i\n"), this_node_id, new_upstream_node_id);
+
+	sqlquery_snprintf(sqlquery,
+					  " UPDATE %s.repl_nodes "
+					  " SET upstream_node_id = %i "
+					  " WHERE cluster = '%s' "
+					  " AND id = %i ",
+					  get_repmgr_schema_quoted(conn),
+					  new_upstream_node_id,
+					  local_options.cluster_name,
+					  this_node_id);
+	res = PQexec(conn, sqlquery);
+
+	if (PQresultStatus(res) != PGRES_COMMAND_OK)
+	{
+		log_err(_("Unable to set new upstream node id: %s\n"),
+				PQerrorMessage(conn));
+		PQclear(res);
+
+		return false;
+	}
+
+	PQclear(res);
+
+	return true;
+}