Compare commits

200 Commits

Author SHA1 Message Date
Ian Barwick
a492745db1 Update version number for minor release 2.0.3 2015-04-16 11:14:04 +09:00
Ian Barwick
9ac066db51 Add 2.0.3 HISTORY file items 2015-04-16 11:01:31 +09:00
Ian Barwick
20db2f52b1 Clean up indentation, trailing whitespace 2015-04-16 08:47:05 +09:00
Ian Barwick
341831ad69 Change '-X/--fast-checkpoint' option to '-c/--fast-checkpoint'
Make consistent with same option added in 3.0. '-c' is similar to
pg_basebackup's '-c, --checkpoint=fast|spread' option; we may also
want to use -X for something more pg_xlog-related in future.

Also add missing 'break' statement
2015-04-16 08:24:29 +09:00
Ian Barwick
e8bc5521a5 Add option "--initdb-no-pwprompt"
Previously repmgr passed the -W flag to initdb, which forced
manual input of a password; this option removes the -W flag
to make repetitive testing easier.

Conflicts:
	repmgr.c
	repmgr.h
2015-04-16 07:58:41 +09:00
Ian Barwick
1d9aacfed9 Add -S/--superuser option for witness database creation
Previously the witness database creation code was hard-coding the
username 'postgres' when accessing the previously initialised database.
However initdb was not passed any explicit username, meaning the
default database superuser name was the same as the user running
repmgr.

With this patch, a superuser user name (default: postgres) will
be passed to initdb.

Per report by eggyknap [1]

[1] https://github.com/2ndQuadrant/repmgr/issues/38

Conflicts:
	repmgr.c
	repmgr.h
2015-04-15 08:10:58 +09:00
Ian Barwick
e39ec70ef0 Prevent compiler warnings with 9.0
PG_PRINTF_ATTRIBUTE was introduced with 9.1.
2015-04-14 20:21:54 +09:00
Christoph Monech-Tegeder
196585c78a optional fast checkpointing
commandline -X/--fast-checkpoint
2015-04-08 23:09:12 +02:00
Martín Marqués
79728ba6dd Updated RHEL files. 2015-03-30 11:07:11 -03:00
Martín Marqués
c4a47c467f Add check for wal_level = logical so we don't fail on 9.4
On 9.4 we have logical decoding, which introduced a new wal_level called
logical. This level includes all the previous ones, so you can run a
hot_standby if wal_level = logical, because the relevant information for
hot_standby will be there, plus other information needed for logical
decoding.

We fix this by adding a second check when wal_level is not hot_standby.
2015-03-30 10:47:17 -03:00
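The two-step check described in c4a47c467f can be sketched in C as follows. This is a minimal illustration, not repmgr's actual code: the function name `wal_level_supports_hot_standby` is hypothetical, and it only recognises the two values named in the commit message.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from repmgr): returns true when the
 * given wal_level setting carries enough information for a hot standby.
 * Per the commit message, 'logical' (new in 9.4) includes everything
 * that 'hot_standby' provides.
 */
bool
wal_level_supports_hot_standby(const char *wal_level)
{
    if (wal_level == NULL)
        return false;

    /* First check: the value that was required before 9.4 */
    if (strcmp(wal_level, "hot_standby") == 0)
        return true;

    /* Second check: 'logical' is a superset of 'hot_standby' */
    return strcmp(wal_level, "logical") == 0;
}
```

Any other value, such as `minimal` or `archive`, is rejected by this sketch.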
Ian Barwick
36e5944b2c Add items for future 2.0.3 release 2015-03-27 11:12:11 +09:00
John Galt
097bbdebfd Fixed typo 2015-03-24 10:47:48 +09:00
Germ van Ek
4fa75afa26 Added postgresql-9.4 to debian control file 2015-03-24 10:03:38 +09:00
Ian Barwick
0aae96008f Parse config file before daemonizing
Daemonizing changes the current working directory to '/',
which breaks configuration file parsing if the file is in
the previous working directory and provided without an
explicit path.

Also it makes general sense to parse the configuration file
before daemonizing.
2015-03-09 08:26:59 +09:00
Ian Barwick
2349e182d2 Prevent trim() from segfaulting on an empty string 2015-03-07 23:48:14 +09:00
Ian Barwick
03a8f2eaba Clarify repmgr.conf usage 2015-02-27 10:02:30 +09:00
Ian Barwick
b6a263a40e Update QUICKSTART.md 2015-02-24 08:52:36 +09:00
Magnus Hagander
81050899e8 Fix markup
This was broken in commit  8faf41dd94,
most likely because of a runaway search/replace.
2015-02-24 08:49:55 +09:00
Ian Barwick
51aa63c8f9 Update version and history for minor release 2.0.2 2015-02-17 16:06:00 +09:00
Ian Barwick
e53162deb8 Fix master port check
Check introduced in dc0dfe9b56
was comparing the provided database name instead of the port.
2015-02-12 14:43:53 +09:00
Jaime Casanova
6a8336b880 Add "--checksum" in rsync when using "--force"
If the user doesn't put that option in rsync_options, using "--force"
could be unsafe.
While the probability of failures because of this is low, it isn't
zero.
2015-02-10 20:28:17 -05:00
Marco Nenciarini
4a445e7f8a Fix syntax errors in repmgr.c 2014-11-10 12:38:18 -05:00
Jaime Casanova
3c1d72a5ea Code review: Do not use psql on do_witness_create,
use createdb and createuser binaries instead
2014-11-10 12:37:10 -05:00
Martín Marqués
d4b9a32a86 errcode.h is a local header. 2014-11-10 12:36:17 -05:00
Martín Marqués
07a216ca25 If the user doesn't pass the port on which the primary server is listening
we have to assume it's the DEFAULT_MASTER_PORT.

This was not done, so we added a check to see if it has a value that is
usable, else we use DEFAULT_MASTER_PORT.
2014-11-10 12:34:30 -05:00
Ian Barwick
d3c067f1bd Clarify repmgr database role
Conflicts:
	QUICKSTART.md
2014-11-10 12:33:26 -05:00
Ian Barwick
e6caf11bf2 Fix pg_hba.conf example
Conflicts:
	QUICKSTART.md
2014-11-10 12:30:33 -05:00
Ian Barwick
9909881d81 Update HISTORY for minor release 2.0.1 2014-11-10 12:27:07 -05:00
Ian Barwick
8073a294f0 Formatting fixes
Conflicts:
	QUICKSTART.md
2014-11-10 12:26:28 -05:00
Ian Barwick
bf5e0b9b48 Correct year in specfile changelog 2014-11-10 12:21:49 -05:00
Ian Barwick
2e9f4aa30f Convert QUICKSTART file to markdown format
Less effort for more consistent formatting (at least the way
github renders it).
2014-11-10 12:12:25 -05:00
Ian Barwick
0dcacc3a70 Formatting fixes 2014-11-10 12:11:42 -05:00
Ian Barwick
65120c47cf Fix formatting 2014-11-10 12:10:47 -05:00
Ian Barwick
f9397c0f06 Add a "quickstart" guide
Provides a succinct overview of the steps needed to get repmgr
up and running as.
2014-11-10 12:04:02 -05:00
Ian Barwick
af3c865b05 Fix log messages in do_standby_promote()
Initial connection is to current standby, before attempting to
connect to old master.
2014-11-10 10:44:55 -05:00
Ian Barwick
112a11a311 Typo fixes 2014-11-10 10:42:39 -05:00
Ian Barwick
7b87b5eddd Change successful standby promotion message to log level 'NOTICE'
Was previously 'ERROR'.
2014-11-10 10:40:31 -05:00
Ian Barwick
1aa36ca1c1 Properly specify rsync --exclude directories
Using '--exclude=dirname/*' to explicitly specify directories whose contents
should not be copied. This will result in empty directories being created
on the destination if they exist on the source, but that's not a problem as
they are needed anyway.

Previously the generated rsync command contained '--exclude=pg_log*', which
will break replication on 9.5 as the wildcard expansion prevents the
'pg_logical' directory from being copied.
2014-11-09 18:16:08 -05:00
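The wildcard problem described in 1aa36ca1c1 can be reproduced with a small C sketch. It assumes POSIX `fnmatch()` as a stand-in for rsync's own pattern matcher, which differs in details; the helper name `is_excluded` is hypothetical.

```c
#include <fnmatch.h>
#include <stdbool.h>

/*
 * Illustration only: POSIX fnmatch() stands in for rsync's own pattern
 * matcher. Returns true when 'path' would be caught by the exclude
 * 'pattern'.
 */
bool
is_excluded(const char *pattern, const char *path)
{
    /* FNM_PATHNAME: '*' does not match across a '/' */
    return fnmatch(pattern, path, FNM_PATHNAME) == 0;
}
```

Under this approximation, `is_excluded("pg_log*", "pg_logical")` is true, which is the breakage the commit describes, while `is_excluded("pg_log/*", "pg_logical")` is false. The pattern `pg_log/*` matches `pg_log/postgresql.log` but not the `pg_log` directory itself, consistent with the empty directory being created on the destination.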
Ian Barwick
a7eff1f39e Typo fixes and minor wording tweaks for clarity 2014-11-09 17:25:47 -05:00
Riegie Godwin Jeyaranchen
e64e230559 Update README.rst
Fixing a grammar mistake.
2014-11-09 17:15:42 -05:00
Nathan Van Overloop
bba167db9e init script: make status call return proper return code 2014-11-09 11:13:47 -05:00
Nathan Van Overloop
2676adcaed re-add comment full debug of log.c 2014-11-09 11:04:39 -05:00
Nathan Van Overloop
5a27d5e57b on init of witness server create db and user to avoid using postgres 2014-11-09 10:55:09 -05:00
Nathan Van Overloop
4071589ba5 adapt makefile for RHEL + RHEL specific files 2014-11-09 10:51:40 -05:00
brynhood
6cb2376974 Makefile: create bindir before install + force dir
in order to facilitate building of an rpm I've added an / to the end of the dirs.
2014-11-07 15:25:45 -05:00
PriceChild
235c98a0b5 Typo in example command. 2014-11-07 15:13:23 -05:00
Warren Moore
16da2f48c2 keep naming consistent 2014-11-07 15:12:49 -05:00
Warren Moore
c23e5858f2 fix: witness creation and monitoring
While reading node entries from master use a separate PGresult when inserting into witness.
Witness monitoring supplies a null value for 'last_apply_time'.
2014-11-07 15:09:05 -05:00
József Kószó
30ccee43d9 debian init script and config file documentation fixes 2014-11-07 15:02:27 -05:00
József Kószó
9357f89d12 debian init script and config file documentation fixes 2014-11-07 15:02:03 -05:00
József Kószó
48da11acfd debian init script and config file documentation fixes 2014-11-07 14:40:53 -05:00
Christian Kruse
07c54c296c removed old comment 2014-11-07 13:49:13 -05:00
Christian Kruse
8f0b9592e8 no longer use global variable for SQL query buffer 2014-11-07 13:47:58 -05:00
Christian Kruse
b35bf3f91d removed no-longer used variable 2014-11-07 13:47:18 -05:00
Christian Kruse
04c101c5f0 rather big refactoring: use a naming scheme
In the past naming of functions, variables and such didn't really have a
naming scheme. Now they should have.

This is backpatched from master (2.1dev) just because it will be easier
to backpatch other fixes.
2014-11-07 13:46:04 -05:00
Christian Kruse
65989840d2 avoid usage of snprintf()
We have a nice little abstraction for snprintf that covers the case
that a string is too big for the target buffer – let's use that!
2014-11-07 13:44:23 -05:00
Christian Kruse
24bd4e7a3f completely avoid usage of strnlen() 2014-11-07 13:40:20 -05:00
Christian Kruse
1c67e105ff pg_indent'ing all files…
Conflicts:
	version.h
2014-11-07 13:32:29 -05:00
Christian Kruse
069f9ff2ed version push 2014-03-17 14:26:56 +01:00
Christian Kruse
b8ade8e908 fixing some documentation errors 2014-03-10 15:51:55 +01:00
Christian Kruse
c0abb3be31 Merge branch 'master' into REL2_0_STABLE 2014-03-06 15:23:52 +01:00
Christian Kruse
fed5c77653 various improvements and bugfixes in the init script 2014-03-06 15:23:22 +01:00
Christian Kruse
8429b43edf Merge pull request #14 from wamonite/fix_follow_user
fix: store the master connection user name on standby follow
2014-03-06 15:20:02 +01:00
Warren Moore
7e55ce737d fix: store the master connection user name on standby follow 2014-03-05 16:49:56 +00:00
Christian Kruse
98c7635fb5 fixing more compiler warnings 2014-03-04 17:58:36 +01:00
Christian Kruse
90ecb2b107 fix: check return values of freopen()
Some compilers complain about not checking the return value of freopen(),
so we check it
2014-03-04 15:32:48 +01:00
Christian Kruse
50b9022a41 fix: don't use Windows newlines 2014-03-04 12:59:23 +01:00
Christian Kruse
150ccc0662 add option to avoid repmgrd started upon installation
Now repmgr.repmgrd.default has another option: REPMGRD_ENABLED. Valid
values are either yes or no.
2014-03-04 12:46:05 +01:00
Christian Kruse
0a71123920 Merge branch 'master' into REL2_0_STABLE 2014-03-03 09:25:08 +01:00
Christian Kruse
0ff14a2aa1 avoid compiler warnings 2014-02-21 13:47:29 +01:00
Christian Kruse
5215265694 fix: now CloseConnections() is much more safe 2014-02-18 17:06:36 +01:00
Christian Kruse
e45ac25348 fix: progname is const, do not free it
The leak is irrelevant
2014-02-18 16:45:35 +01:00
Christian Kruse
a1ce01f033 fix: fixed some leaks 2014-02-18 16:35:29 +01:00
Christian Kruse
516cde621a fix: strcpy() on overlapping memory regions is invalid 2014-02-18 15:42:20 +01:00
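As a sketch of the issue named in 516cde621a: the C standard leaves `strcpy()` undefined when source and destination overlap, whereas `memmove()` is specified to handle overlap. The helper below is hypothetical and is not the code that the commit changed.

```c
#include <ctype.h>
#include <string.h>

/*
 * Hypothetical helper: strip leading whitespace from 's' in place.
 * Source and destination overlap, so strcpy(s, s + skip) would be
 * undefined behaviour; memmove() is specified to handle overlap.
 */
char *
strip_leading_space(char *s)
{
    size_t skip = 0;

    while (s[skip] != '\0' && isspace((unsigned char) s[skip]))
        skip++;

    if (skip > 0)
        memmove(s, s + skip, strlen(s + skip) + 1);  /* +1 copies the '\0' */

    return s;
}
```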
Christian Kruse
f0807923a3 fix: gettimeofday() expects two arguments 2014-02-18 15:33:56 +01:00
Christian Kruse
10ca8037f8 added some more log messages
Now we should be able to distinguish different events more easily
2014-02-18 14:10:12 +01:00
Christian Kruse
0dc46f0dc8 fix: set connection to NULL when finishing it
This prevents CloseConnections() from trying to close an already closed connection.
2014-02-18 13:42:49 +01:00
Christian Kruse
c3b58658ad fixing repmgr repl_status columns
repmgr repl_status had the column time_lag which was documented to be
the time a standby is behind master. In fact it only works like this
when viewed on the standby and not on the master: there it only was the
time of the last status update. We dropped that column and replaced it
by a new column „communication_time_lag“ which is the content of the
repl_status column on the master. On the standby we keep the time of
the last update in shared memory, so the correct time is always reported
regardless of where repl_status is queried. We also added a new column,
„replication_time_lag“, which refers to the apply delay.
2014-02-15 01:35:27 +01:00
Christian Kruse
18f1fed77f fixing wait_connection_availability()
wait_connection_availability() did take at least 2 seconds per call in
the old incarnation. Now we may finish a call without any sleep at all
when the result is already ready at the time called
2014-02-15 01:31:12 +01:00
Christian Kruse
d58fd080ca flush stderr after a log message appears
We had the problem that the log file appeared empty for a long time due
to file buffers. Thus we call fflush() after every log message so the
log file gets written out to disk quickly
2014-02-15 01:29:12 +01:00
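The flush-after-every-message approach from d58fd080ca might look like the following C sketch. The function name `log_line` and its return convention are assumptions made for illustration, not repmgr's logging API.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical logging helper: write one formatted line to 'out' and
 * flush immediately so it does not sit in the stdio buffer.
 * Returns 0 on success, -1 if writing or flushing failed.
 */
int
log_line(FILE *out, const char *fmt, ...)
{
    va_list ap;
    int     rc;

    va_start(ap, fmt);
    rc = vfprintf(out, fmt, ap);
    va_end(ap);

    if (rc < 0 || fputc('\n', out) == EOF)
        return -1;

    /* Push the buffered bytes to the underlying file descriptor now */
    return fflush(out) == 0 ? 0 : -1;
}
```

Flushing on every call trades some throughput for log lines that become visible as soon as they are written.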
Christian Kruse
c4ac2d3343 fixing PQexec() calls
fixing several calls where we did not check the result status but only
the return value; the query may fail nonetheless
2014-02-15 01:27:53 +01:00
Christian Kruse
a72c2296e9 Merge branch 'master' into REL2_0_STABLE 2014-02-11 09:28:40 +01:00
Christian Kruse
5ff1beeea7 do not enable autofailover by default
Autofailover is an experimental feature which should not be enabled by
default. The user has to be aware of what he is doing when enabling it.
2014-02-11 09:27:31 +01:00
Christian Kruse
9c3d79147b now version.h contains the right version 2014-02-07 21:47:39 +01:00
Christian Kruse
ca470647cb cleanup of usage text
Now it properly aligns and breaks at 78 characters.
2014-01-30 14:26:17 +01:00
Christian Kruse
62ee287e3f updated TODO 2014-01-30 14:10:14 +01:00
Christian Kruse
729a1b848a release notes for 2.0 stable 2014-01-30 13:59:17 +01:00
Christian Kruse
701cf043fd fix: seems as if I misread -hackers 2014-01-23 16:46:49 +01:00
Christian Kruse
bbb67c55f6 simple past of set is set 2014-01-23 10:50:37 +01:00
Christian Kruse
c2c48a9fe6 removed already finished TODO tasks 2014-01-23 10:48:04 +01:00
Christian Kruse
9d6ac2ebf9 fixed documentation and line endings 2014-01-23 10:39:21 +01:00
Christian Kruse
680f23fb1d copyright push 2014-01-23 10:37:49 +01:00
Christian Kruse
1159113c58 ignore the dynamic shared memory directory, too 2014-01-23 10:02:32 +01:00
Christian Kruse
f25a709454 added an explicit type cast to avoid compiler warnings 2014-01-22 15:17:47 +01:00
Christian Kruse
897daddcc7 removed not needed arguments to avoid compiler warnings 2014-01-22 15:17:28 +01:00
Christian Kruse
0fdcce0477 use if instead of switch and avoid a warning 2014-01-22 15:12:29 +01:00
Christian Kruse
de58eff7c1 added a chdir() for proper daemonizing 2014-01-22 14:30:38 +01:00
Christian Kruse
f2a0b31a20 more log format fixes 2014-01-22 14:30:24 +01:00
Christian Kruse
e007a55967 fix: do not use fsync()
We do not need fsync(), the fflush() is enough to avoid concurrent
logs.
2014-01-22 11:47:50 +01:00
Christian Kruse
d235c696af fix: do not newline at the start of a log line
This breaks the log file format since it will have a line break directly
after the timestamp
2014-01-22 11:47:02 +01:00
Christian Kruse
4ef6fbb5fe do not close stderr but reopen it to /dev/null
We want stderr to be always a valid file descriptor
2014-01-21 16:25:57 +01:00
Christian Kruse
2e61d7b156 refactoring: daemonizing is now a function 2014-01-21 16:19:49 +01:00
Christian Kruse
4496a0761e we now use a function and are more sophisticated
Refactoring part: we now use a function to generate the PID
file. Sophistication: we now check if the PID contained in the file is a
valid PID. We ignore the file if it isn't.
2014-01-21 16:18:15 +01:00
Christian Kruse
3978ead184 use a second fork to avoid a terminal
After the setsid() we are the session leader. And as a session leader we
are able to acquire a new controlling terminal, even if we currently don't
own one. So we fork again without calling setsid(); the resulting process
is not a session leader, which avoids that.
2014-01-21 15:51:33 +01:00
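The fork / setsid() / fork sequence from 3978ead184 can be sketched in C as below. To keep the effect observable, this hypothetical function does not carry on as a daemon; the final process only reports through a pipe whether it is a session leader. It is an illustration of the technique, not the daemonizing code from repmgrd.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Sketch of the fork / setsid / fork sequence. The final process
 * reports through a pipe whether it is a session leader.
 * Returns 1 if it is a session leader, 0 if not, -1 on error.
 */
int
double_fork_is_session_leader(void)
{
    int     fds[2];
    char    answer = '?';
    pid_t   pid;

    if (pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0)
    {
        /* First child: become leader of a new session */
        close(fds[0]);
        if (setsid() < 0)
            _exit(1);

        /* Second fork: the session leader exits, its child lives on */
        pid = fork();
        if (pid != 0)
            _exit(pid < 0 ? 1 : 0);

        /* Grandchild: same session, but not its leader */
        answer = (getsid(0) == getpid()) ? 'L' : 'N';
        if (write(fds[1], &answer, 1) != 1)
            _exit(1);
        _exit(0);
    }

    /* Original process: reap the first child, then read the report */
    close(fds[1]);
    waitpid(pid, NULL, 0);
    if (read(fds[0], &answer, 1) != 1)
        answer = '?';
    close(fds[0]);

    if (answer == 'L')
        return 1;
    if (answer == 'N')
        return 0;
    return -1;
}
```

A real daemon would continue running in the grandchild instead of exiting after the report.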
Christian Kruse
b36dbf61fe reopening stdin and stdout to /dev/null now
stdin, stdout and stderr should always be valid file handles. Thus we
don't close them but reopen them to /dev/null
2014-01-21 15:31:38 +01:00
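Reopening a standard stream to /dev/null, as b36dbf61fe describes, can be sketched with `freopen()`. The helper name `redirect_to_devnull` is hypothetical; the return value of `freopen()` is checked because the call can fail.

```c
#include <stdio.h>

/*
 * Hypothetical helper: point an already-open stdio stream at /dev/null
 * instead of closing it, so the stream stays usable.
 * 'mode' is a normal fopen() mode such as "r" or "w".
 * Returns 0 on success, -1 if freopen() failed.
 */
int
redirect_to_devnull(FILE *stream, const char *mode)
{
    return freopen("/dev/null", mode, stream) != NULL ? 0 : -1;
}
```

A caller would typically use `redirect_to_devnull(stdin, "r")` and `redirect_to_devnull(stdout, "w")`; reads then hit end-of-file immediately and writes are discarded.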
Christian Kruse
84466ecca5 log_crit() is more appropriate 2014-01-21 15:23:20 +01:00
Christian Kruse
649086e5e4 use unlink() instead of remove()
`remove()` will do a rmdir if necessary - we don't want that. So we use `unlink()`
2014-01-21 15:22:31 +01:00
Christian Kruse
7cf2eb440d renamed config options to a much more descriptive name 2014-01-21 15:19:50 +01:00
Christian Kruse
388bbfb773 split install target into install_prog and install_ext
Patch by Marco Nenciarini <mnencia@debian.org>
2014-01-21 14:23:33 +01:00
Christian Kruse
a89aa02c68 fix: make pg_config be settable from outside the makefile
Patch by Marco Nenciarini <mnencia@debian.org>
2014-01-21 14:22:59 +01:00
Christian Kruse
c81793b63f fix: added forgotten options.priority value
Patch by Marco Nenciarini <mnencia@debian.org>
2014-01-21 14:18:12 +01:00
Christian Kruse
b4e83cf188 Add format attribute checking for printf() like functions
Patch by Marco Nenciarini <mnencia@debian.org>
2014-01-21 14:14:36 +01:00
Christian Kruse
1db61ce277 fix: fail when repmgr_funcs is not pre-loaded
when repmgr_funcs is not pre-loaded `repmgr_update_standby_location()`
will return false and `repmgr_get_last_standby_location()` will return
an empty string. Thus we may end in an endless loop. To avoid that we fail.
2014-01-21 13:54:10 +01:00
Christian Kruse
41abf9a7ef fix: flushing and fsync()ing the log file
When not flushing and fsync()ing it the output may be garbled due to
concurrent writes to the file (system() spawns a child process with
stdin/stdout/stderr inherited from its parent)
2014-01-21 13:52:27 +01:00
Christian Kruse
abebc53ddc fix: sscanf() does not set variables to 0 on error 2014-01-21 13:48:41 +01:00
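The pitfall named in abebc53ddc can be sketched as follows: when no conversion takes place, `sscanf()` leaves the target variable untouched, so success has to be judged by its return value. The helper `parse_int` and its fallback parameter are hypothetical.

```c
#include <stdio.h>

/*
 * Hypothetical helper: parse a decimal integer from 'text'.
 * sscanf() leaves '*value' untouched when no conversion happens, so the
 * result is judged by sscanf()'s return value, not by the variable.
 * Returns 1 on success; on failure returns 0 and stores 'fallback'.
 */
int
parse_int(const char *text, int fallback, int *value)
{
    if (sscanf(text, "%d", value) != 1)
    {
        *value = fallback;
        return 0;
    }
    return 1;
}
```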
Christian Kruse
5fc4a0382f added config options sleep_delay and sleep_monitor
sleep_monitor replaces the old SLEEP_MONITOR define and makes it
configurable; this is the interval in which we monitor

sleep_delay replaces the old sleep(300) when waiting for the master to
recover.
2014-01-17 14:35:50 +01:00
Christian Kruse
a7d3c9b93a fix: also close stderr when using syslog logging 2014-01-17 12:14:26 +01:00
Christian Kruse
ee9dc9e247 do not use exit()
We avoid using exit() to be able to clean up when we have to
terminate. This includes removal of the PID file as well as closing
database connections.
2014-01-17 11:28:55 +01:00
Christian Kruse
94cb5b94e7 fix: reopen log file on SIGHUP 2014-01-16 17:16:45 +01:00
Christian Kruse
a08aa50f92 fix: close stdin and stdout only in repmgrd
closing stdin and stdout might cause problems when using system(), so we
avoid it.
2014-01-16 16:01:58 +01:00
Christian Kruse
9563877fbb new config option, stdout/stdin closed
Now stdin and stdout get closed. Additionally stderr gets closed and
reopened to the new config option „logfile“ if specified
2014-01-16 15:22:34 +01:00
Christian Kruse
4f3bd6612c do not exit in getMasterConnection() 2014-01-16 15:07:15 +01:00
Christian Kruse
192ee3cdb0 do not exit in get_cluster_size 2014-01-16 15:07:06 +01:00
Christian Kruse
6f149ead8f do not exit in guc_setted and guc_setted_typed 2014-01-16 14:48:46 +01:00
Christian Kruse
77aa6aa326 do not exit in pg_version 2014-01-16 14:48:42 +01:00
Christian Kruse
18206b3a64 do not exit() in is_witness 2014-01-16 14:28:56 +01:00
Christian Kruse
91446bcf93 fix: do not try to reconnect infinitely 2014-01-10 17:26:02 +01:00
Christian Kruse
dcdf8788ae fix: handle connection loss to standby
We do basically the same as we do for the master since connections drop
from time to time
2014-01-10 17:12:03 +01:00
Christian Kruse
4fabfbbbd0 fix: do not exit in is_standby()
Instead we now return an int with 0 meaning „not a standby,“ 1 meaning
„is a standby“ and -1 meaning „connection dropped“
2014-01-10 17:11:16 +01:00
Christian Kruse
c41030b40e Merge branch 'REL2_0_STABLE'
Conflicts:
	HISTORY
	dbutils.h
	repmgr.c
	repmgrd.c
	version.h
2014-01-10 16:07:33 +01:00
Christian Kruse
a0fdadd5d2 this way it is much cleaner 2014-01-09 15:35:44 +01:00
Christian Kruse
4c3d7f80ed now code compiles with -ansi -pedantic and has less warnings 2014-01-09 14:45:07 +01:00
Christian Kruse
6e3fe059d8 added config options pg_bindir and pg_ctl_options 2014-01-09 14:44:34 +01:00
Christian Kruse
9f26254ac3 fix: added some missing initializers to avoid compiler warning 2014-01-09 13:33:22 +01:00
Christian Kruse
0e8ff1730e added handling of a PID file 2014-01-09 13:04:40 +01:00
Christian Kruse
634fdff303 fix: do not call setup_event_handlers() on WIN32
If we put setup_event_handlers() in #ifdef WIN32, we have to do it for
the call and the declaration, too
2014-01-09 12:57:16 +01:00
Christian Kruse
cbce29f009 fixed typos 2014-01-08 11:55:03 +01:00
Christian Kruse
920f925e4b added a new cli option --daemonize
This option forks the process and generates a new session. This
effectively detaches it from the shell. Don't forget to redirect stderr
or use syslog for logging!
2014-01-08 11:53:15 +01:00
Christian Kruse
9fe2d6886e white space cleanup 2014-01-07 16:42:06 +01:00
Christian Kruse
0068dd573a fix: do not compare pointers but the strings 2014-01-07 15:52:29 +01:00
Christian Kruse
d0f3cb59c7 fix: create data directory after sanity check 2014-01-07 14:42:55 +01:00
Christian Kruse
7428e92e10 fix: correctly check the return value of PQexec()
not only check if return value is not NULL but also check that the
returned result is a PGRES_COMMAND_OK (e.g. the INSERT was successful)
2014-01-07 14:27:31 +01:00
Christian Kruse
a97065113d fix: remove own node earlier if force is set
We have to remove our own node before we check for a new master if force
is set; else master register would fail on the second time since there
already is a master (ourselves), even if we specify -F
2014-01-07 14:16:58 +01:00
Christian Kruse
9e2f276fcf fix: do not exit after pg_start_backup() w/o pg_stop_backup() 2014-01-07 14:02:29 +01:00
Christian Kruse
b0cd2b5e43 fix: do not exit() in create_pgdir()
This could leave the database in a locked state (pg_start_backup()).
And since all calls to create_pgdir() handle the return value correctly
we simply replace the exit() by a return false
2014-01-07 14:01:46 +01:00
Jaime Casanova
9209248420 Fix oversight in the header of guc_setted_typed() 2013-12-19 11:09:08 -05:00
Jaime Casanova
6693b99288 Files to create the debian package
Patch by: Christian Kruse
2013-12-19 01:43:12 -05:00
Jaime Casanova
8e7b487838 Update debian control file 2013-12-19 01:41:24 -05:00
Jaime Casanova
7f796e2d15 Update history and credit files 2013-12-19 01:40:00 -05:00
Jaime Casanova
5e04ab6eae Add a ssh_options parameter to allow ssh checking
to consider non-default values (ie: a different port)

Patch by Jay Taylor
2013-12-19 01:22:55 -05:00
Jaime Casanova
a1f4285e2b Add guc_setted_typed() function to allow
wal_keep_segments to be checked as an integer instead
of text

Patch by Jay Taylor
2013-12-19 01:22:42 -05:00
Jaime Casanova
493133986d Add timestamps to log line in stderr
Patch by Christian Kruse
2013-12-19 01:15:28 -05:00
Jaime Casanova
8b370dc581 Fix some typos
Patch by Krzysztof Gajdemski
2013-12-07 13:25:46 -05:00
Jaime Casanova
43af00aa12 Ignore pg_log when cloning, just like we ignore pg_xlog 2013-12-04 01:23:48 -05:00
Jaime Casanova
3c8df59eb9 Make repmgr compile in 9.3.
Patch provided by Shawn Ellis with some fixes by me.
2013-11-14 00:43:35 -05:00
Jaime Casanova
b410772627 Rework algorithm to coordinate voting
Do this by waiting for all nodes to finish a step, before starting
a new one. So everyone starts promoting or following in a coordinated
fashion.
Also make a few fixes.
2013-09-26 13:24:31 -05:00
Jaime Casanova
d99024ba11 Make repmgrd survive the failover
To do this it needs to reconnect to the new master
2013-09-26 11:58:59 -05:00
Jaime Casanova
1afaa3a26f Rearrange the logic in do_failover() for further improvements.
In particular, make this a more coordinated process by making all
nodes wait for the others before going to the next step.

This is one step further in following Andres Freund's advice,
but there is still a lot to do in order to complete that;
in particular, it may be necessary to add more fields to repl_nodes
and to the shm area.
2013-09-23 18:28:58 -05:00
Jaime Casanova
079a7c9f16 In a failover situation get the nodes in a well defined order.
When deciding which node will be the new master, we should get the
nodes in a well defined order otherwise two standbys could process
nodes with the same priority in different order and end up with
a two master situation.
2013-07-26 00:59:50 -05:00
Jaime Casanova
3b66a31ac9 In a failover situation get the nodes in a well defined order.
When deciding which node will be the new master, we should get the
nodes in a well defined order otherwise two standbys could process
nodes with the same priority in different order and end up with
a two master situation.
2013-07-26 00:52:31 -05:00
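The idea behind 079a7c9f16 / 3b66a31ac9 is that every standby must walk the candidate nodes in the same sequence. The commit message does not say how the order is defined; the C sketch below is one way to get a total order, using a hypothetical `node_info` record with priority first and node id as tie-breaker. The ascending direction is an arbitrary choice for the example.

```c
#include <stdlib.h>

/* Hypothetical node record; field names are illustrative only */
typedef struct
{
    int node_id;
    int priority;
} node_info;

/*
 * Total order: priority first, node_id as tie-breaker. Because no two
 * distinct nodes compare equal, every standby that sorts the same set
 * ends up with the same sequence.
 */
static int
compare_nodes(const void *a, const void *b)
{
    const node_info *x = a;
    const node_info *y = b;

    if (x->priority != y->priority)
        return (x->priority > y->priority) - (x->priority < y->priority);
    return (x->node_id > y->node_id) - (x->node_id < y->node_id);
}

void
sort_nodes(node_info *nodes, size_t count)
{
    qsort(nodes, count, sizeof(node_info), compare_nodes);
}
```

Without the tie-breaker, two nodes with equal priority could be visited in a different order on different standbys, which is the situation the commit message warns about.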
Jaime Casanova
bdf957ca52 Add a missing ')'. This is a typo introduced in commit
2bc8044fda

Per complaint from Carlos Chapi when compiling for a customer.
2013-07-13 12:39:13 -05:00
Jaime Casanova
ad3630e7a9 Add a missing ')'. This is a typo introduced in commit
2bc8044fda

Per complaint from Carlos Chapi when compiling for a customer.
2013-07-13 12:37:15 -05:00
Jaime Casanova
67b451aa45 If PQgetCancel() returns NULL we should also return false.
Noted by Andres Freund.
2013-07-12 08:03:36 -05:00
Jaime Casanova
0a70d907ae Improve messages in wait_connection_availability, so we know which
error causes the failover procedure to start

By gripe from Andres Freund
2013-07-12 08:03:25 -05:00
Jaime Casanova
2e7acf03c4 If PQgetCancel() returns NULL we should also return false.
Noted by Andres Freund.
2013-07-12 08:01:01 -05:00
Jaime Casanova
2bc8044fda Improve messages in wait_connection_availability, so we know which
error causes the failover procedure to start

By gripe from Andres Freund
2013-07-10 19:25:58 -05:00
Jaime Casanova
ab1d380843 If PQcancel() fails, consider it as if the master is failing.
Because PQcancel() establishes a new synchronous connection to the
database, if it fails it means something wrong has happened with
master. So instead of just ignoring the failure, CancelQuery() now
reports a failure condition so we can detect master's death in
that situation.

This is very important, especially when only postmaster crashes but
other children/backend connections are still there. In that case the
child connections won't fail, and a CancelQuery() failure is our
only indication that something is wrong.
Currently we just ignore the PQcancel() failure which leads us to
a situation in which we just loop forever
trying to cancel the async query.

Reported by: Martin Euser <martin.euser@nl.abnamro.com>
Problem analyzed and bug spotted by: Andres Freund <andres@2ndquadrant.com>
Patch by: Jaime Casanova <jaime@2ndquadrant.com>
2013-07-10 10:21:51 -05:00
Jaime Casanova
b0b44a157f If PQcancel() fails, consider it as if the master is failing.
Because PQcancel() establishes a new synchronous connection to the
database, if it fails it means something wrong has happened with
master. So instead of just ignoring the failure, CancelQuery() now
reports a failure condition so we can detect master's death in
that situation.

This is very important, especially when only postmaster crashes but
other children/backend connections are still there. In that case the
child connections won't fail, and a CancelQuery() failure is our
only indication that something is wrong.
Currently we just ignore the PQcancel() failure which leads us to
a situation in which we just loop forever
trying to cancel the async query.

Reported by: Martin Euser <martin.euser@nl.abnamro.com>
Problem analyzed and bug spotted by: Andres Freund <andres@2ndquadrant.com>
Patch by: Jaime Casanova <jaime@2ndquadrant.com>
2013-07-10 09:53:45 -05:00
Jaime Casanova
49a2531930 Options -F -W -I -v don't accept arguments, which means that in
getopt_long's option string they shouldn't be marked with the colon (:) character.

This has been wrong since day one, so backpatching all the way until
1.1
2013-01-13 16:37:39 -05:00
Jaime Casanova
672b237c4e Options -F -W -I -v don't accept arguments, which means that in
getopt_long's option string they shouldn't be marked with the colon (:) character.

This has been wrong since day one, so backpatching all the way until
1.1
2013-01-13 16:32:56 -05:00
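The effect of a stray colon in the option string can be shown with a short C sketch. It takes the option string as a parameter so both variants can be compared. The `cli_flags` struct, the function name and the long option names are assumptions for illustration; only -F and -v are tracked.

```c
#include <getopt.h>
#include <string.h>

/* Hypothetical result struct; only two of the flags are tracked here */
typedef struct
{
    int force;    /* -F seen */
    int verbose;  /* -v seen */
} cli_flags;

/*
 * Parse argv with the given short-option string. In that string a
 * trailing ':' means "this option takes an argument". The long option
 * names below are illustrative assumptions.
 */
void
parse_flags(const char *optstring, int argc, char **argv, cli_flags *out)
{
    static const struct option long_opts[] = {
        {"force", no_argument, NULL, 'F'},
        {"verbose", no_argument, NULL, 'v'},
        {NULL, 0, NULL, 0}
    };
    int c;

    memset(out, 0, sizeof(*out));
    optind = 1;   /* restart scanning (allows repeated calls) */
    opterr = 0;   /* no diagnostics on stderr */

    while ((c = getopt_long(argc, argv, optstring, long_opts, NULL)) != -1)
    {
        if (c == 'F')
            out->force = 1;
        else if (c == 'v')
            out->verbose = 1;
    }
}
```

With the command line `-F -v`, the option string `"FWIv"` reports both flags. With `"F:W:I:v:"`, `-F` consumes `-v` as its argument, so the verbose flag is never seen.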
Jaime Casanova
7d94151494 If the node is a witness don't bother asking its position, it always
will be 0/0. We just need to check that we can connect to it to determine
if we are in the majority.
2013-01-11 03:44:50 -05:00
Jaime Casanova
4191b77e70 If the node is a witness don't bother asking its position, it always
will be 0/0. We just need to check that we can connect to it to determine
if we are in the majority.
2013-01-11 03:42:08 -05:00
Jaime Casanova
2a5d431481 Fix a problem that caused a standby to promote itself without going to
voting procedure.

This is because of a race condition inside CheckPrimaryConnection().

This was independently reported by Alex Railean and Dumitru, and Frank Jördens.
Analyzed and fixed by Cédric Villemain.

The fix has been verified to work by Frank
2012-12-19 12:01:27 -05:00
Jaime Casanova
81b8a944de Fix a problem that caused a standby to promote itself without going to
voting procedure.

This is because of a race condition inside CheckPrimaryConnection().

This was independently reported by Alex Railean and Dumitru, and Frank Jördens.
Analyzed and fixed by Cédric Villemain.

The fix has been verified to work by Frank
2012-12-19 11:45:58 -05:00
Jaime Casanova
93a999adc7 Formatting code using astyle 2012-12-11 11:49:07 -05:00
Jaime Casanova
1b69282df9 Formatting code using astyle 2012-12-11 11:47:59 -05:00
Jaime Casanova
06dd252f69 To select new master it needs to know which standby has received more
xlog records from master, so each standby should use pg_last_xlog_receive_location()
to report its position. This solves a possible situation in which
a standby that is considered as new master when promoted is no longer
the best option.
2012-12-03 09:27:12 -05:00
Jaime Casanova
088ca29fe3 To select new master it needs to know which standby has received more
xlog records from master, so each standby should use pg_last_xlog_receive_location()
to report its position. This solves a possible situation in which
a standby that is considered as new master when promoted is no longer
the best option.
2012-12-03 09:18:08 -05:00
Jaime Casanova
30e9d06172 Add an option for STANDBY FOLLOW to wait for a master to appear.
This is important for autofailover to do the right thing when
standbys detected master death at different times.

While this is a new option, it seems important for the autofailover
to work properly so I will consider the lack of it a bug and
will backpatch to 2.0 where autofailover was introduced.

Per gripe from Alex Railean, about a standby not finding the new
master because the new master hasn't finished promoting.
2012-11-14 15:09:26 -05:00
Jaime Casanova
d6bd5aa381 Add an option for STANDBY FOLLOW to wait for a master to appear.
This is important for autofailover to do the right thing when
standbys detected master death at different times.

While this is a new option, it seems important for the autofailover
to work properly so I will consider the lack of it a bug and
will backpatch to 2.0 where autofailover was introduced.

Per gripe from Alex Railean, about a standby not finding the new
master because the new master hasn't finished promoting.
2012-11-14 15:07:59 -05:00
Gabriele Bartolini
bbdcffa813 Fixed typos notified by lintian 2012-11-09 18:09:43 +01:00
Jaime Casanova
cd1a84252e Fix node decision logic when priorities are involved. Currently if
two nodes with different priorities are equally good to be promoted
the second one (with a lower priority, considering them
in descending order) will win.

Per report from Brailean Dumitru
2012-09-16 02:47:02 -05:00
Jaime Casanova
5f33d9d715 Fix node decision logic when priorities are involved. Currently if
two nodes with different priorities are equally good to be promoted
the second one (with a lower priority, considering them
in descending order) will win.

Per report from Brailean Dumitru
2012-09-16 02:38:28 -05:00
Jaime Casanova
2e19b3688b Add a comment 2012-09-16 02:26:18 -05:00
Jaime Casanova
877f4cf82e Add a comment 2012-09-16 02:23:16 -05:00
Jaime Casanova
de883a4c84 Keep compiler quiet. Noted when compiling on FreeBSD, where I
get a warning for an uninitialized variable.

Also, define InvalidXLogRecPtr. We don't really need it but using
it makes the initialization future proof (considering that in 9.3
XLogRecPtr will change its structure).
2012-09-16 02:21:18 -05:00
Jaime Casanova
949f5ee498 Keep compiler quiet. Noted when compiling in FreeBSD in which i
get a warning for an uninitialized variable.

Also, define InvalidXLogRecPtr. We don't really need it but using
it make the initialization future proof (considering that in 9.3
XLogRecPtr will change its structure).
2012-09-16 02:10:02 -05:00
Jaime Casanova
eb2f7efb4a When we have more command-line arguments than we should have we
need to show that last value and we should use only optind for that
instead of optind+1
2012-09-15 17:39:10 -05:00
Jaime Casanova
85ff3ec286 Fix documentation to always use -h sintax to refer to the node we
want to clone or connect to, instead of relying on the fact that
for some time putting that argument at last worked.
2012-09-15 17:38:42 -05:00
Jaime Casanova
499a501afd Make repmgr compatible with FreeBSD.
We need to add an #include and make it use a different path for the
"true" binary.

Maybe we need to make this changes for all BSD systems but having no
evidence of that i prefer to make this only for systems with __FreeBSD__
2012-09-15 17:37:59 -05:00
Jaime Casanova
0a9107d76d Improve sample of commands for promote and follow 2012-09-15 17:37:43 -05:00
Jaime Casanova
2803bb92a8 Make repmgr compatible with FreeBSD.
We need to add an #include and make it use a different path for the
"true" binary.

Maybe we need to make this changes for all BSD systems but having no
evidence of that i prefer to make this only for systems with __FreeBSD__
2012-09-15 17:32:38 -05:00
Jaime Casanova
16fe41eecf Improve sample of commands for promote and follow 2012-09-11 15:53:57 -05:00
Jaime Casanova
95ec0450da When we have more command-line arguments than we should have we
need to show that last value and we should use only optind for that
instead of optind+1
2012-08-30 02:11:48 -05:00
Jaime Casanova
57aa95f674 Fix documentation to always use -h sintax to refer to the node we
want to clone or connect to, instead of relying on the fact that
for some time putting that argument at last worked.
2012-08-30 02:10:10 -05:00
Jaime Casanova
d365a309fc Fix HISTORY to show from newest to oldest 2012-07-27 11:29:07 -05:00
Jaime Casanova
d5a41bb587 Fix tabs in HISTORY 2012-07-27 11:22:04 -05:00
Jaime Casanova
474d3217b4 Fix typos in RELEASE NOTES 2012-07-27 11:21:49 -05:00
Jaime Casanova
7a00d5a9a4 Now that we can have no monitoring we need to check all nodes at failover
not only those in repl_monitor
2012-07-21 17:53:15 -05:00
Jaime Casanova
5683b905dd New development branch is 2.1dev 2012-07-21 12:22:04 -05:00
36 changed files with 3790 additions and 1868 deletions


@@ -1,4 +1,4 @@
Copyright (c) 2010-2012, 2ndQuadrant Limited
Copyright (c) 2010-2014, 2ndQuadrant Limited
All rights reserved.
This program is free software: you can redistribute it and/or modify


@@ -10,3 +10,7 @@ Hannu Krosing <hannu@2ndQuadrant.com>
Cédric Villemain <cedric@2ndquadrant.com>
Charles Duffy <charles@dyfis.net>
Daniel Farina <daniel@heroku.com>
Shawn Ellis <shawn.ellis17@gmail.com>
Jay Taylor <jay@jaytaylor.com>
Christian Kruse <christian@2ndQuadrant.com>
Krzysztof Gajdemski <songo@debian.org.pl>

44
HISTORY

@@ -1,5 +1,43 @@
2.0beta 2012-07-27
Make CLONE command try to make an exact copy including $PGDATA location (Cedric)
2.0.3 2015-04-16
Add -S/--superuser option for witness database creation (Ian)
Add -c/--fast-checkpoint option for cloning (Christoph)
Add option "--initdb-no-pwprompt" (Ian)
2.0.2 2015-02-17
Add "--checksum" in rsync when using "--force" (Jaime)
Use createdb/createuser instead of psql (Jaime)
Fixes to witness creation and monitoring (wamonite)
Use default master port if none supplied (Martín)
Documentation fixes and improvements (Ian)
2.0.1 2014-07-16
Documentation fixes and new QUICKSTART file (Ian)
Explicitly specify directories to ignore when cloning (Ian)
Fix log level for some log messages (Ian)
RHEL/CentOS specfile, init script and Makefile fixes (Nathan Van Overloop)
Debian init script and config file documentation fixes (József Kószó)
Typo fixes (Riegie Godwin Jeyaranchen, PriceChild)
2.0stable 2014-01-30
Documentation fixes (Christian)
General refactoring, code quality improvements and stabilization work (Christian)
Added proper daemonizing (-d/--daemonize) (Christian)
Added PID file handling (-p/--pid-file) (Christian)
New config option: monitor_interval_secs (Christian)
New config option: retry_promote_interval (Christian)
New config option: logfile (Christian)
New config option: pg_bindir (Christian)
New config option: pgctl_options (Christian)
2.0beta2 2013-12-19
Improve autofailover logic and algorithms (Jaime, Andres)
Ignore pg_log when cloning (Jaime)
Add timestamps to log line in stderr (Christian)
Correctly check wal_keep_segments (Jay Taylor)
Add a ssh_options parameter (Jay Taylor)
2.0beta1 2012-07-27
Make CLONE command try to make an exact copy including $PGDATA location (Cedric)
Add detection of master failure (Jaime)
Add the notion of a witness server (Jaime)
Add autofailover capabilities (Jaime)
@@ -26,7 +64,7 @@
1.1.0 2011-03-09
Make options -U, -R and -p not mandatory (Jaime)
1.1.0b1 2011-02-24
1.1.0b1 2011-02-24
Fix missing "--force" option in help (Greg Smith)
Correct warning message for wal_keep_segments (Bas van Oostveen)
Add Debian build/usage docs (Bas, Hannu Krosing, Cedric Villemain)


@@ -1,6 +1,6 @@
#
# Makefile
# Copyright (c) 2ndQuadrant, 2010-2012
# Copyright (c) 2ndQuadrant, 2010-2014
repmgrd_OBJS = dbutils.o config.o repmgrd.o log.o strutil.o
repmgr_OBJS = dbutils.o check_dir.o config.o repmgr.o log.o strutil.o
@@ -21,7 +21,8 @@ repmgr: $(repmgr_OBJS)
$(CC) $(CFLAGS) $(repmgr_OBJS) $(PG_LIBS) $(LDFLAGS) $(LDFLAGS_EX) $(LIBS) -o repmgr
ifdef USE_PGXS
PGXS := $(shell pg_config --pgxs)
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
else
subdir = contrib/repmgr
@@ -32,11 +33,27 @@ endif
# XXX: Try to use PROGRAM construct (see pgxs.mk) someday. Right now
# is overriding pgxs install.
install:
$(INSTALL_PROGRAM) repmgrd$(X) '$(DESTDIR)$(bindir)'
$(INSTALL_PROGRAM) repmgr$(X) '$(DESTDIR)$(bindir)'
install: install_prog install_ext
install_prog:
mkdir -p '$(DESTDIR)$(bindir)'
$(INSTALL_PROGRAM) repmgrd$(X) '$(DESTDIR)$(bindir)/'
$(INSTALL_PROGRAM) repmgr$(X) '$(DESTDIR)$(bindir)/'
install_ext:
$(MAKE) -C sql install
install_rhel:
mkdir -p '$(DESTDIR)/etc/init.d/'
$(INSTALL_PROGRAM) RHEL/repmgrd.init '$(DESTDIR)/etc/init.d/repmgrd'
mkdir -p '$(DESTDIR)/etc/sysconfig/'
$(INSTALL_PROGRAM) RHEL/repmgrd.sysconfig '$(DESTDIR)/etc/sysconfig/repmgrd'
mkdir -p '$(DESTDIR)/etc/repmgr/'
$(INSTALL_PROGRAM) repmgr.conf.sample '$(DESTDIR)/etc/repmgr/'
mkdir -p '$(DESTDIR)/usr/bin/'
$(INSTALL_PROGRAM) repmgrd$(X) '$(DESTDIR)/usr/bin/'
$(INSTALL_PROGRAM) repmgr$(X) '$(DESTDIR)/usr/bin/'
ifneq (,$(DATA)$(DATA_built))
@for file in $(addprefix $(srcdir)/, $(DATA)) $(DATA_built); do \
echo "$(INSTALL_DATA) $$file '$(DESTDIR)$(datadir)/$(datamoduledir)'"; \
@@ -62,3 +79,4 @@ deb: repmgrd repmgr
mv debian.deb ../postgresql-repmgr-9.0_1.0.0.deb
rm -rf ./debian/usr

304
QUICKSTART.md Normal file

@@ -0,0 +1,304 @@
repmgr: Quickstart guide
========================
`repmgr` is an open-source tool suite for managing replication and failover
among multiple PostgreSQL server nodes. It enhances PostgreSQL's built-in
hot-standby capabilities with a set of administration tools for monitoring
replication, setting up standby servers and performing failover/switchover
operations.
This quickstart guide assumes you are familiar with PostgreSQL replication
setup and Linux/UNIX system administration. For a more detailed tutorial
covering setup on a variety of different systems, see the README.rst file.
Conceptual Overview
-------------------
`repmgr` provides two binaries:
- `repmgr`: a command-line client to manage replication and `repmgr` configuration
- `repmgrd`: an optional daemon process which runs on standby nodes to monitor
replication and node status
Each PostgreSQL node requires a `repmgr.conf` configuration file; additionally
it must be "registered" using the `repmgr` command-line client. `repmgr` stores
information about managed nodes in a custom schema on the node's current master
database.
Requirements
------------
`repmgr` works with PostgreSQL 9.0 and later. All server nodes must be running the
same PostgreSQL major version, and preferably should be running the same minor
version.
`repmgr` will work on any Linux or UNIX-like environment capable of running
PostgreSQL. `rsync` must also be installed.
Installation
------------
`repmgr` must be installed on each PostgreSQL server node.
* Packages
- RPM packages for RedHat-based distributions are available from PGDG
- Debian/Ubuntu provide .deb packages.
It is also possible to build .deb packages directly from the `repmgr` source;
see README.rst for further details.
* Source installation
- `repmgr` source code is hosted at github (https://github.com/2ndQuadrant/repmgr);
tar.gz files can be downloaded from https://github.com/2ndQuadrant/repmgr/releases .
`repmgr` can be built easily using PGXS:
sudo make USE_PGXS=1 install
Configuration
-------------
### Server configuration
Password-less SSH logins must be enabled for the database system user (typically `postgres`)
between all server nodes to enable `repmgr` to copy required files.
### PostgreSQL configuration
The master PostgreSQL node needs to be configured for replication with the
following settings:
wal_level = 'hot_standby' # minimal, archive, hot_standby, or logical
archive_mode = on # allows archiving to be done
archive_command = 'cd .' # command to use to archive a logfile segment
max_wal_senders = 10 # max number of walsender processes
wal_keep_segments = 5000 # in logfile segments, 16MB each; 0 disables
hot_standby = on # "on" allows queries during recovery
Note that `repmgr` expects a default of 5000 wal_keep_segments, although this
value can be overridden when executing the `repmgr` client.
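As a rough sizing aid, the disk space that `wal_keep_segments` can retain on the master follows directly from the segment count and the 16MB segment size quoted above. The helper below is a minimal sketch; `wal_keep_space_mb` is a hypothetical name used only for illustration and is not part of `repmgr`.

```shell
#!/bin/sh
# Estimate the worst-case disk space (in MB) retained by wal_keep_segments.
# Assumes the 16 MB WAL segment size quoted in the settings above;
# pass a second argument to use a different segment size.
wal_keep_space_mb() {
    segments=$1
    segment_mb=${2:-16}
    echo $((segments * segment_mb))
}

wal_keep_space_mb 5000    # prints 80000 (MB, roughly 78 GiB)
```

If that much space is not available for WAL on the master, consider lowering both the setting and the value `repmgr` expects.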
Additionally, `repmgr` requires a dedicated PostgreSQL superuser account
and a database in which to store monitoring and replication data. The `repmgr`
user account will also be used for replication connections from the standby,
so a separate replication user with the `REPLICATION` privilege is not required.
The database can in principle be any database, including the default `postgres`
one; however, it is advisable to create a dedicated database for `repmgr`
usage.
### repmgr configuration
Each PostgreSQL node requires a `repmgr.conf` configuration file containing
identification and database connection information:
cluster=test
node=1
node_name=node1
conninfo='host=repmgr_node1 user=repmgr_usr dbname=repmgr_db'
pg_bindir=/path/to/postgres/bin
* `cluster`: common name for the replication cluster; this must be the same on all nodes
* `node`: a unique, arbitrary integer identifier
* `node_name`: a unique, human-readable name
* `conninfo`: a standard conninfo string enabling repmgr to connect to the
control database; `user` and `dbname` must be the same on all nodes, while other
parameters such as `port` may differ. The `host` parameter *must* be a hostname
resolvable by all nodes in the cluster.
* `pg_bindir`: (optional) location of PostgreSQL binaries, if not in the default $PATH
Note that the configuration file should *not* be stored inside the PostgreSQL
data directory. The configuration file can be specified with the
`-f, --config-file=PATH` option and can have any arbitrary name. If no
configuration file is specified, `repmgr` will search for `repmgr.conf`
in the current working directory.
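Before registering a node, it can be useful to confirm that the configuration file contains the required settings listed above. The following is a minimal sketch; `check_repmgr_conf` is a hypothetical helper, not something `repmgr` provides, and it only checks that each key is present, not that its value is valid.

```shell
#!/bin/sh
# Report any required setting that is missing from a repmgr.conf file.
# check_repmgr_conf is a hypothetical helper, not part of repmgr.
# Returns 0 if all required keys are present, 1 otherwise.
check_repmgr_conf() {
    conf=$1
    missing=0
    for key in cluster node node_name conninfo; do
        # Match "key=" at the start of a line, allowing surrounding blanks.
        if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$conf"; then
            echo "missing setting: ${key}" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

It could be called as `check_repmgr_conf "$HOME/repmgr/repmgr.conf" && echo "configuration looks complete"`.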
Each node configuration needs to be registered with `repmgr`, either using the
`repmgr` command line tool or the `repmgrd` daemon. Details about each node are
inserted into the `repmgr` database (see below).
Replication setup and monitoring
--------------------------------
For the purposes of this guide, we'll assume the database user will be
`repmgr_usr` and the database will be `repmgr_db`, and that the following
environment variables are set on each node:
- $HOME: the PostgreSQL system user's home directory
- $PGDATA: the PostgreSQL data directory
Master setup
------------
1. Configure PostgreSQL
- create user and database:
```
CREATE ROLE repmgr_usr LOGIN SUPERUSER;
CREATE DATABASE repmgr_db OWNER repmgr_usr;
```
- configure `postgresql.conf` for replication (see above)
- update `pg_hba.conf`, e.g.:
```
host repmgr_db repmgr_usr 192.168.1.0/24 trust
host replication repmgr_usr 192.168.1.0/24 trust
```
Restart the PostgreSQL server after making these changes.
2. Create the `repmgr` configuration file:
$ cat $HOME/repmgr/repmgr.conf
cluster=test
node=1
node_name=node1
conninfo='host=repmgr_node1 user=repmgr_usr dbname=repmgr_db'
pg_bindir=/path/to/postgres/bin
(For an annotated `repmgr.conf` file, see `repmgr.conf.sample` in the
repository's root directory).
3. Register the master node with `repmgr`:
$ repmgr -f $HOME/repmgr/repmgr.conf --verbose master register
[2014-07-04 10:43:42] [INFO] repmgr mgr connecting to master database
[2014-07-04 10:43:42] [INFO] repmgr connected to master, checking its state
[2014-07-04 10:43:42] [INFO] master register: creating database objects inside the repmgr_test schema
[2014-07-04 10:43:43] [NOTICE] Master node correctly registered for cluster test with id 1 (conninfo: host=localhost user=repmgr_usr dbname=repmgr_db)
Slave/standby setup
-------------------
1. Use `repmgr` to clone the master:
$ repmgr -D $PGDATA -d repmgr_db -U repmgr_usr -R postgres --verbose standby clone 192.168.1.2
Opening configuration file: ./repmgr.conf
[2014-07-04 10:49:00] [ERROR] Did not find the configuration file './repmgr.conf', continuing
[2014-07-04 10:49:00] [INFO] repmgr connecting to master database
[2014-07-04 10:49:00] [INFO] repmgr connected to master, checking its state
[2014-07-04 10:49:00] [INFO] Successfully connected to primary. Current installation size is 1807 MB
[2014-07-04 10:49:00] [NOTICE] Starting backup...
[2014-07-04 10:49:00] [INFO] creating directory "/path/to/data/"...
(...)
[2014-07-04 10:53:19] [NOTICE] Finishing backup...
NOTICE: pg_stop_backup complete, all required WAL segments have been archived
[2014-07-04 10:53:21] [INFO] repmgr requires primary to keep WAL files 0000000100000000000000AD until at least 0000000100000000000000AD
[2014-07-04 10:53:21] [NOTICE] repmgr standby clone complete
[2014-07-04 10:53:21] [NOTICE] HINT: You can now start your postgresql server
[2014-07-04 10:53:21] [NOTICE] for example : /etc/init.d/postgresql start
-R is the database system user on the master node. At this point it does not matter
if the `repmgr.conf` file is not found.
This will clone the PostgreSQL database files from the master, including its
`postgresql.conf` and `pg_hba.conf` files, and additionally automatically create
the `recovery.conf` file containing the correct parameters to start streaming
from the primary node.
2. Start the PostgreSQL server
3. Create the `repmgr` configuration file:
$ cat $HOME/repmgr/repmgr.conf
cluster=test
node=2
node_name=node2
conninfo='host=repmgr_node2 user=repmgr_usr dbname=repmgr_db'
pg_bindir=/path/to/postgres/bin
4. Register the standby node with `repmgr`:
$ repmgr -f $HOME/repmgr/repmgr.conf --verbose standby register
Opening configuration file: /path/to/repmgr/repmgr.conf
[2014-07-04 11:48:13] [INFO] repmgr connecting to standby database
[2014-07-04 11:48:13] [INFO] repmgr connected to standby, checking its state
[2014-07-04 11:48:13] [INFO] repmgr connecting to master database
[2014-07-04 11:48:13] [INFO] finding node list for cluster 'test'
[2014-07-04 11:48:13] [INFO] checking role of cluster node 'host=repmgr_node1 user=repmgr_usr dbname=repmgr_db'
[2014-07-04 11:48:13] [INFO] repmgr connected to master, checking its state
[2014-07-04 11:48:13] [INFO] repmgr registering the standby
[2014-07-04 11:48:13] [INFO] repmgr registering the standby complete
[2014-07-04 11:48:13] [NOTICE] Standby node correctly registered for cluster test with id 2 (conninfo: host=localhost user=repmgr_usr dbname=repmgr_db)
Monitoring
----------
`repmgrd` is a management and monitoring daemon which runs on standby nodes
and which can automate remote actions. It can be started simply with e.g.:
repmgrd -f $HOME/repmgr/repmgr.conf --verbose > $HOME/repmgr/repmgr.log 2>&1
or alternatively:
repmgrd -f $HOME/repmgr/repmgr.conf --verbose --monitoring-history > $HOME/repmgr/repmgrd.log 2>&1
which will track advance or lag of the replication in every standby in the
`repl_monitor` table.
Example log output:
[2014-07-04 11:55:17] [INFO] repmgrd Connecting to database 'host=localhost user=repmgr_usr dbname=repmgr_db'
[2014-07-04 11:55:17] [INFO] repmgrd Connected to database, checking its state
[2014-07-04 11:55:17] [INFO] repmgrd Connecting to primary for cluster 'test'
[2014-07-04 11:55:17] [INFO] finding node list for cluster 'test'
[2014-07-04 11:55:17] [INFO] checking role of cluster node 'host=repmgr_node1 user=repmgr_usr dbname=repmgr_db'
[2014-07-04 11:55:17] [INFO] repmgrd Checking cluster configuration with schema 'repmgr_test'
[2014-07-04 11:55:17] [INFO] repmgrd Checking node 2 in cluster 'test'
[2014-07-04 11:55:17] [INFO] Reloading configuration file and updating repmgr tables
[2014-07-04 11:55:17] [INFO] repmgrd Starting continuous standby node monitoring
Failover
--------
To promote a standby to master, on the standby execute e.g.:
repmgr -f $HOME/repmgr/repmgr.conf --verbose standby promote
`repmgr` will attempt to connect to the current master to verify that it
is not available (if it is, `repmgr` will not promote the standby).
Other standby servers need to be told to follow the new master with:
repmgr -f $HOME/repmgr/repmgr.conf --verbose standby follow
See file `autofailover_quick_setup.rst` for details on setting up
automated failover.
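After a promote or follow operation, one way to confirm which role a node currently has is to ask PostgreSQL whether it is in recovery. The sketch below assumes the `repmgr_usr` user and `repmgr_db` database used throughout this guide; the function names are hypothetical and are not `repmgr` commands.

```shell
#!/bin/sh
# Translate the result of "SELECT pg_is_in_recovery()" into a role name.
# With psql -At the query prints "t" on a standby (still in recovery)
# and "f" on a master.
role_from_recovery_flag() {
    case "$1" in
        t) echo standby ;;
        f) echo master ;;
        *) echo unknown ;;
    esac
}

# Query the local node; the user and database names follow this guide's
# examples and are assumptions about your setup.
node_role() {
    flag=$(psql -At -U repmgr_usr -d repmgr_db \
        -c 'SELECT pg_is_in_recovery()' 2>/dev/null)
    role_from_recovery_flag "$flag"
}
```

Running `node_role` on the promoted node should report `master`, and on each node that was told to follow it should report `standby`.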
repmgr database schema
----------------------
`repmgr` creates a small schema for its own use in the database specified in
each node's conninfo configuration parameter. This database can in principle
be any database. The schema name is the global `cluster` name prefixed
with `repmgr_`, so for the example setup above the schema name is
`repmgr_test`.
The schema contains two tables:
* `repl_nodes`
stores information about all registered servers in the cluster
* `repl_monitor`
stores monitoring information about each node
and one view, `repl_status`, which summarizes the latest monitoring information
for each node.
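The naming rule above makes it straightforward to build queries against the schema from the `cluster` value in `repmgr.conf`. This is a small sketch with hypothetical helper names; it only constructs the query text and makes no assumption about the columns in the view.

```shell
#!/bin/sh
# Derive the repmgr schema name from the cluster name in repmgr.conf.
repmgr_schema() {
    printf 'repmgr_%s\n' "$1"
}

# Build a query against the repl_status view for a given cluster.
repl_status_query() {
    printf 'SELECT * FROM %s.repl_status;\n' "$(repmgr_schema "$1")"
}

repl_status_query test    # prints: SELECT * FROM repmgr_test.repl_status;
```

The resulting query can be passed to `psql`, for example `psql -U repmgr_usr -d repmgr_db -c "$(repl_status_query test)"`.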
Further reading
---------------
* http://blog.2ndquadrant.com/announcing-repmgr-2-0/
* http://blog.2ndquadrant.com/managing-useful-clusters-repmgr/
* http://blog.2ndquadrant.com/easier_postgresql_90_clusters/


@@ -23,7 +23,7 @@ databases as a single cluster. repmgr includes two components:
Supported Releases
------------------
repmgr works with PostgreSQL versions 9.0 and superior.
repmgr works with PostgreSQL versions 9.0 and later.
There are currently no incompatibilities when upgrading repmgr from 9.0 to 9.1,
so your 9.0 configuration will work with 9.1
@@ -77,7 +77,7 @@ and run::
And if a previously failed node becomes available again, such as
the lost node1 above, you can get it to resynchronize by only copying
over changes made while it was down using. That hapens with what's
over changes made while it was down. That happens with what's
called a forced clone, which overwrites existing data rather than
assuming it starts with an empty database directory tree::
@@ -131,19 +131,19 @@ If you need to remove the source code temporary files from this directory,
that can be done like this::
make USE_PGXS=1 clean
See below for building notes specific to RedHat Linux variants.
Using a full source code tree
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In this method, the repmgr distribution is copied into the PostgreSQL source
code tree, assumed to be at the ${postgresql_sources} for this example.
code tree, assumed to be under ${postgresql_sources} for this example.
The resulting subdirectory must be named ``contrib/repmgr``, without any
version number::
cp repmgr.tar.gz ${postgresql_sources}/contrib
cd ${postgresql_sources}/contrib
cd ${postgresql_sources}/contrib
tar xvzf repmgr-1.0.tar.gz
cd repmgr
make
@@ -237,7 +237,7 @@ If you already tried to build repmgr before doing this, you'll need to do::
make USE_PGXS=1 clean
To get rid of leftover files from the wrong architecture.
to get rid of leftover files from the wrong architecture.
Notes on Ubuntu, Debian or other Debian-based Builds
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -279,8 +279,8 @@ Confirm software was built correctly
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You should now find the repmgr programs available in the subdirectory where
the rest of your PostgreSQL installation is at. You can confirm the software
is available by checking its version::
the rest of your PostgreSQL binary files are located. You can confirm the
software is available by checking its version::
repmgr --version
repmgrd --version
@@ -374,10 +374,10 @@ Usage walkthrough
This assumes you've already followed the steps in "Installation Outline" to
install repmgr and repmgrd on the system.
A normal production installation of ``repmgr`` will normally involve two
different systems running on the same port, typically the default of 5432,
with both using files owned by the ``postgres`` user account. This
walkthrough assumes the following setup:
A typical production installation of ``repmgr`` might involve two PostgreSQL
instances on separate servers, both running under the ``postgres`` user account
and both using the default port (5432). This walkthrough assumes the following
setup:
* A primary (master) server called "node1," running as the "postgres" user
who is also the owner of the files. This server is operating on port 5432. This
@@ -427,7 +427,7 @@ system you already have superuser access to.
Clearing the PostgreSQL installation on the Standby
---------------------------------------------------
To setup a new streaming replica, startin by removing any PostgreSQL
To set up a new streaming replica, start by removing any PostgreSQL
installation on the existing standby nodes.
* Stop any server on "node2" and "node3". You can confirm the database
@@ -625,18 +625,18 @@ Now restore to the original configuration by stopping
primary server, then bringing up "node2" as a standby with a valid
``recovery.conf`` file.
Stop the "node2" server::
Stop the "node2" server and type the following on "node1" server::
repmgr -f /var/lib/pgsql/repmgr/repmgr.conf standby promote
Now the original primary, "node1" is acting again as primary.
Now the original primary, "node1", is acting again as primary.
Start the "node2" server and type this on "node1"::
Start the "node2" server and type this on "node2"::
repmgr standby clone --force -h node2 -p 5432 -U postgres -R postgres --verbose
Verify the roles have reversed by attempting to insert a record on "node"
and on "node1".
Verify the roles have reversed by attempting to insert a record on "node1"
and on "node2".
The servers are now again acting as primary on "node1" and standby on "node2".
@@ -660,7 +660,7 @@ You can usually leave out changes to the port number in this case too.
* A database exists on "prime" called "testdb."
* The Postgress installation in each of the above is defined as $PGDATA,
* The Postgres installation in each of the above is defined as $PGDATA,
which is represented here with ``/data/prime`` as the "prime" server and
``/data/standby`` as the "standby" server.
@@ -890,7 +890,7 @@ The output from this program looks like this::
Configuration options:
-D, --data-dir=DIR local directory where the files will be copied to
-f, --config_file=PATH path to the configuration file
-f, --config-file=PATH path to the configuration file
-R, --remote-user=USERNAME database server username for rsync
-w, --wal-keep-segments=VALUE minimum value for the GUC wal_keep_segments (default: 5000)
-I, --ignore-rsync-warning ignore rsync partial transfer warning
@@ -1014,7 +1014,7 @@ The output from this program looks like this::
--version output version information, then exit
--verbose output verbose activity information
--monitoring-history track advance or lag of the replication in every standby in repl_monitor
-f, --config_file=PATH database to connect to
-f, --config-file=PATH path to the configuration file
repmgrd monitors a cluster of servers.
@@ -1085,7 +1085,7 @@ License and Contributions
=========================
repmgr is licensed under the GPL v3. All of its code and documentation is
Copyright 2010-2012, 2ndQuadrant Limited. See the files COPYRIGHT and LICENSE for
Copyright 2010-2014, 2ndQuadrant Limited. See the files COPYRIGHT and LICENSE for
details.
Main sponsorship of repmgr has been from 2ndQuadrant customers.

57
RHEL/repmgr.spec Normal file

@@ -0,0 +1,57 @@
Summary: repmgr
Name: repmgr
Version: 2.0
Release: 2
License: GPLv3
Group: System Environment/Daemons
URL: http://repmgr.org
Packager: Nathan Van Overloop <nathan.van.overloop@nexperteam.be>
Vendor: 2ndQuadrant Limited
Distribution: centos
Source0: %{name}-%{version}.tar.gz
BuildRoot: %{_tmppath}/%{name}-%{version}-%{release}-root
%description
repmgr for centos6
%prep
%setup
%build
export PATH=$PATH:/usr/pgsql-9.3/bin/
%{__make} USE_PGXS=1
%install
[ "%{buildroot}" != "/" ] && %{__rm} -rf %{buildroot}
export PATH=$PATH:/usr/pgsql-9.3/bin/
%{__make} USE_PGXS=1 install DESTDIR=%{buildroot} INSTALL="install -p"
%{__make} USE_PGXS=1 install_prog DESTDIR=%{buildroot} INSTALL="install -p"
%{__make} USE_PGXS=1 install_rhel DESTDIR=%{buildroot} INSTALL="install -p"
%clean
[ "%{buildroot}" != "/" ] && %{__rm} -rf %{buildroot}
%files
%defattr(-,root,root)
/usr/bin/repmgr
/usr/bin/repmgrd
/usr/pgsql-9.3/bin/repmgr
/usr/pgsql-9.3/bin/repmgrd
/usr/pgsql-9.3/lib/repmgr_funcs.so
/usr/pgsql-9.3/share/contrib/repmgr.sql
/usr/pgsql-9.3/share/contrib/repmgr_funcs.sql
/usr/pgsql-9.3/share/contrib/uninstall_repmgr.sql
/usr/pgsql-9.3/share/contrib/uninstall_repmgr_funcs.sql
%attr(0755,root,root)/etc/init.d/repmgrd
%attr(0644,root,root)/etc/sysconfig/repmgrd
%attr(0644,root,root)/etc/repmgr/repmgr.conf.sample
%changelog
* Thu Jun 05 2014 Nathan Van Overloop <nathan.van.overloop@nexperteam.be> 2.0.2
- fix witness creation to create db and user if needed
* Fri Apr 04 2014 Nathan Van Overloop <nathan.van.overloop@nexperteam.be> 2.0.1
- initial build for RHEL6

133
RHEL/repmgrd.init Executable file

@@ -0,0 +1,133 @@
#!/bin/sh
#
# chkconfig: - 75 16
# description: Enable repmgrd replication management and monitoring daemon for PostgreSQL
# processname: repmgrd
# pidfile="/var/run/${NAME}.pid"
# Source function library.
INITD=/etc/rc.d/init.d
. $INITD/functions
# Get function listing for cross-distribution logic.
TYPESET=`typeset -f|grep "declare"`
# Get network config.
. /etc/sysconfig/network
DESC="PostgreSQL replication management and monitoring daemon"
NAME=repmgrd
REPMGRD_ENABLED=no
REPMGRD_OPTS=
REPMGRD_USER=postgres
REPMGRD_BIN=/usr/pgsql-9.3/bin/repmgrd
REPMGRD_PIDFILE=/var/run/repmgrd.pid
REPMGRD_LOCK=/var/lock/subsys/${NAME}
REPMGRD_LOG=/var/lib/pgsql/9.3/data/pg_log/repmgrd.log
# Read configuration variable file if it is present
[ -r /etc/sysconfig/$NAME ] && . /etc/sysconfig/$NAME
# For SELinux we need to use 'runuser' not 'su'
if [ -x /sbin/runuser ]
then
SU=runuser
else
SU=su
fi
test -x $REPMGRD_BIN || exit 0
case "$REPMGRD_ENABLED" in
[Yy]*)
: # enabled; "break" is only valid inside a loop, so use a no-op here
;;
*)
exit 0
;;
esac
if [ -z "${REPMGRD_OPTS}" ]
then
echo "Not starting ${NAME}, REPMGRD_OPTS not set in /etc/sysconfig/${NAME}"
exit 0
fi
start()
{
REPMGRD_START=$"Starting ${NAME} service: "
# Make sure startup-time log file is valid
if [ ! -e "${REPMGRD_LOG}" -a ! -h "${REPMGRD_LOG}" ]
then
touch "${REPMGRD_LOG}" || exit 1
chown ${REPMGRD_USER}:postgres "${REPMGRD_LOG}"
chmod go-rwx "${REPMGRD_LOG}"
[ -x /sbin/restorecon ] && /sbin/restorecon "${REPMGRD_LOG}"
fi
echo -n "${REPMGRD_START}"
$SU -l $REPMGRD_USER -c "${REPMGRD_BIN} ${REPMGRD_OPTS} -p ${REPMGRD_PIDFILE} &" >> "${REPMGRD_LOG}" 2>&1 < /dev/null
sleep 2
pid=`head -n 1 "${REPMGRD_PIDFILE}" 2>/dev/null`
if [ "x${pid}" != "x" ]
then
success "${REPMGRD_START}"
touch "${REPMGRD_LOCK}"
echo $pid > "${REPMGRD_PIDFILE}"
echo
else
failure "${REPMGRD_START}"
echo
script_result=1
fi
}
stop()
{
echo -n $"Stopping ${NAME} service: "
if [ -e "${REPMGRD_LOCK}" ]
then
killproc ${NAME}
ret=$?
if [ $ret -eq 0 ]
then
echo_success
rm -f "${REPMGRD_PIDFILE}"
rm -f "${REPMGRD_LOCK}"
else
echo_failure
script_result=1
fi
else
# not running; per LSB standards this is "ok"
echo_success
fi
echo
}
# See how we were called.
case "$1" in
start)
start
;;
stop)
stop
;;
status)
status -p $REPMGRD_PIDFILE $NAME
script_result=$?
;;
restart)
stop
start
;;
*)
echo $"Usage: $0 {start|stop|status|restart}"
exit 2
esac
exit $script_result

21
RHEL/repmgrd.sysconfig Normal file

@@ -0,0 +1,21 @@
# default settings for repmgrd. This file is sourced by /bin/sh from
# /etc/init.d/repmgrd
# disable repmgrd by default so it won't get started upon installation
# valid values: yes/no
REPMGRD_ENABLED=no
# Options for repmgrd (required)
#REPMGRD_OPTS="--verbose -d -f /var/lib/pgsql/repmgr/repmgr.conf"
# User to run repmgrd as
#REPMGRD_USER=postgres
# repmgrd binary
#REPMGRD_BIN=/usr/bin/repmgrd
# pid file
#REPMGRD_PIDFILE=/var/lib/pgsql/repmgr/repmgrd.pid
# log file
#REPMGRD_LOG=/var/lib/pgsql/repmgr/repmgrd.log

19
TODO

@@ -1,21 +1,18 @@
Known issues in repmgr
======================
* The check for whether ``wal_keep_segments`` is considered large enough
does a string comparison rather than an integer one. It can give both
false positive (setting is large enough but flagged as too small) and
false negative (setting is too small but not noted as such) errors.
* When running repmgr against a remote machine, operations that start
the database server using the ``pg_ctl`` command may accidentally
terminate after their associated ssh session ends.
* After running repmgrd as a regular foreground application, hitting
control-C causes the program to crash.
Planned feature improvements
============================
* Before running ``pg_start_backup()``, a sanity check that there is
a working ssh connection to the destination would help find
configuration errors before disturbing the database.
* Timeline increases when promoting a standby
* A better check which standby did receive most of the data
* Make the fact that a standby may be delayed a factor in the voting
algorithm
* include support for delayed standbys


@@ -1,213 +1,225 @@
=====================================================
PostgreSQL Automatic Fail-Over - User Documentation
=====================================================
Automatic Failover
==================
repmgr allows setups for automatic failover when it detects the failure of the master node.
Following is a quick setup for this.
Installation
============
For convenience, we define:
* node1 is the fully qualified hostname of the Master server, IP 192.168.1.10
* node2 is the fully qualified hostname of the Standby server, IP 192.168.1.11
* witness is the fully qualified hostname of the server used for witness, IP 192.168.1.12
:Note: It is not recommended to use names that describe a server's role, such as
«masterserver»; such names cause confusion once a failover takes place and the
Master is now running on the «standbyserver».
Summary
-------
Two PostgreSQL servers are involved in the replication. Automatic fail-over needs
a vote to decide which server to promote, so an odd number of voters is required;
a witness repmgrd is therefore installed on a third server, where it uses a
PostgreSQL cluster to communicate with the other repmgrd daemons.
1. Install PostgreSQL in all the servers involved (including the server used for
witness)
2. Install repmgr in all the servers involved (including the server used for witness)
3. Configure the Master PostreSQL
4. Clone the Master to the Standby using "repmgr standby clone" command
5. Configure repmgr in all the servers involved (including the server used for witness)
6. Register Master and Standby nodes
7. Initiate witness server
8. Start the repmgrd daemons in all nodes
:Note: A complete Hight-Availability design need at least 3 servers to still have
a backup node after a first failure.
Install PostgreSQL
------------------
You can install PostgreSQL using any of the recommended methods. You should ensure
it's 9.0 or superior.
Install repmgr
--------------
Install repmgr following the steps in the README.
Configure PostreSQL
-------------------
Log in node1.
Edit the file postgresql.conf and modify the parameters::
listen_addresses='*'
wal_level = 'hot_standby'
archive_mode = on
archive_command = 'cd .' # we can also use exit 0, anything that
# just does nothing
max_wal_senders = 10
wal_keep_segments = 5000 # 80 GB required on pg_xlog
hot_standby = on
shared_preload_libraries = 'repmgr_funcs'
Edit the file pg_hba.conf and add lines for the replication::
host repmgr repmgr 127.0.0.1/32 trust
host repmgr repmgr 192.168.1.10/30 trust
host replication all 192.168.1.10/30 trust
:Note: It is also possible to use a password authentication (md5), .pgpass file
should be edited to allow connection between each node.
Create the user and database to manage replication::
su - postgres
createuser -s repmgr
createdb -O repmgr repmgr
psql -f /usr/share/postgresql/9.0/contrib/repmgr_funcs.sql repmgr
Restart the PostgreSQL server::
pg_ctl -D $PGDATA restart
And check everything is fine in the server log.
Create the ssh-key for the postgres user and copy it to other servers::
su - postgres
ssh-keygen # /!\ do not use a passphrase /!\
cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
exit
rsync -avz ~postgres/.ssh/authorized_keys node2:~postgres/.ssh/
rsync -avz ~postgres/.ssh/authorized_keys witness:~postgres/.ssh/
rsync -avz ~postgres/.ssh/id_rsa* node2:~postgres/.ssh/
rsync -avz ~postgres/.ssh/id_rsa* witness:~postgres/.ssh/
Clone Master
------------
Log in node2.
Clone the node1 (the current Master)::
su - postgres
repmgr -d repmgr -U repmgr standby clone node1
Start the PostgreSQL server::
pg_ctl -D $PGDATA start
And check everything is fine in the server log.
Configure repmgr
----------------
Log in each server and configure repmgr by editing the file
/etc/repmgr/repmgr.conf::
cluster=my_cluster
node=1
node_name=earth
conninfo='host=192.168.1.10 dbname=repmgr user=repmgr'
master_response_timeout=60
reconnect_attempts=6
reconnect_interval=10
failover=automatic
promote_command='promote_command.sh'
follow_command='repmgr standby follow -f /etc/repmgr/repmgr.conf'
* *cluster* is the name of the current replication.
* *node* is the number of the current node (1, 2 or 3 in the current example).
* *node_name* is an identifier for every node.
* *conninfo* is used to connect to the local PostgreSQL server (where the configuration file is) from any node. In the witness server configuration it is needed to add a 'port=5499' to the conninfo.
* *master_response_timeout* is the maximum amount of time we are going to wait before deciding the master has died and start failover procedure.
* *reconnect_attempts* is the number of times we will try to reconnect to master after a failure has been detected and before start failover procedure.
* *reconnect_interval* is the amount of time between retries to reconnect to master after a failure has been detected and before start failover procedure.
* *failover* configure behavior : *manual* or *automatic*.
* *promote_command* the command executed to do the failover (including the PostgreSQL failover itself). The command must return 0 on success.
* *follow_command* the command executed to address the current standby to another Master. The command must return 0 on success.
Register Master and Standby
---------------------------
Log in node1.
Register the node as Master::
su - postgres
repmgr -f /etc/repmgr/repmgr.conf master register
Log in node2.
Register the node as Standby::
su - postgres
repmgr -f /etc/repmgr/repmgr.conf standby register
Initialize witness server
-------------------------
Log in witness.
Initialize the witness server::
su - postgres
repmgr -d repmgr -U repmgr -h 192.168.1.10 -D $WITNESS_PGDATA -f /etc/repmgr/repmgr.conf witness create node1
It needs information to connect to the master to copy the configuration of the cluster, also it needs to know where it should initialize it's own $PGDATA.
As part of the procees it also ask for the superuser password so it can connect when needed.
Start the repmgrd daemons
-------------------------
Log in node2 and witness.
su - postgres
repmgrd -f /etc/repmgr/repmgr.conf > /var/log/postgresql/repmgr.log 2>&1
:Note: The Master does not need a repmgrd daemon.
Suspend Automatic behavior
==========================
Edit the repmgr.conf of the node to remove from automatic processing and change::
failover=manual
Then, signal repmgrd daemon::
su - postgres
kill -HUP `pidoff repmgrd`
TODO : -HUP configuration update is not implemented and it should check its
configuration file against its configuration in DB, updating
accordingly the SQL conf (especialy the failover manual or auto)
this allow witness-standby and standby-not-promotable features
and simpler usage of the tool ;)
Usage
=====
The repmgr documentation is in the README file (how to build, options, etc.)
=====================================================
PostgreSQL Automatic Fail-Over - User Documentation
=====================================================
Automatic Failover
==================
repmgr can be set up to fail over automatically when it detects the failure of the master node.
The following is a quick setup guide.
Installation
============
For convenience, we define:
**node1**
is the fully qualified hostname of the Master server, IP 192.168.1.10
**node2**
is the fully qualified hostname of the Standby server, IP 192.168.1.11
**witness**
is the fully qualified hostname of the witness server, IP 192.168.1.12
**Note:** It is not recommended to use names that describe a server's role, such as «masterserver»;
such names become confusing once a failover takes place and the Master is
now running on «standbyserver».
Summary
-------
Two PostgreSQL servers are involved in the replication. Automatic failover requires
a vote to decide which server to promote, so an odd number of voters is required;
a witness repmgrd is therefore installed on a third server, where it uses a PostgreSQL
cluster to communicate with the other repmgrd daemons.
1. Install PostgreSQL on all the servers involved (including the server used for
witness)
2. Install repmgr on all the servers involved (including the server used for witness)
3. Configure the Master PostgreSQL
4. Clone the Master to the Standby using the "repmgr standby clone" command
5. Configure repmgr on all the servers involved (including the server used for witness)
6. Register the Master and Standby nodes
7. Initialize the witness server
8. Start the repmgrd daemons on all nodes
**Note:** A complete High-Availability design needs at least 3 servers, so that
a backup node is still available after a first failure.
Install PostgreSQL
------------------
You can install PostgreSQL using any of the recommended methods. You should ensure
it's 9.0 or later.
Install repmgr
--------------
Install repmgr following the steps in the README file.
Configure PostgreSQL
--------------------
Log in to node1.
Edit the file postgresql.conf and modify the parameters::
listen_addresses='*'
wal_level = 'hot_standby'
archive_mode = on
archive_command = 'cd .' # we can also use exit 0, anything that
# just does nothing
max_wal_senders = 10
wal_keep_segments = 5000 # 80 GB required on pg_xlog
hot_standby = on
shared_preload_libraries = 'repmgr_funcs'
Edit the file pg_hba.conf and add lines for the replication::
host repmgr repmgr 127.0.0.1/32 trust
host repmgr repmgr 192.168.1.10/30 trust
host replication all 192.168.1.10/30 trust
**Note:** Password authentication (md5) can also be used; in that case, edit the
.pgpass file on each node to allow connections between the nodes.
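One thing worth checking in the pg_hba.conf lines above: a ``/30`` prefix keeps only the top 30 bits, so ``192.168.1.10/30`` spans 192.168.1.8 to 192.168.1.11 and does not appear to include the witness at 192.168.1.12. The following is a minimal sketch of that arithmetic, assuming the usual CIDR rule that the prefix mask is applied to both addresses; ``ipv4`` and ``cidr_contains`` are hypothetical helper names, not part of repmgr or PostgreSQL.

```c
#include <stdint.h>

/* Pack four dotted-quad octets into one 32-bit value (hypothetical helper). */
static uint32_t
ipv4(unsigned a, unsigned b, unsigned c, unsigned d)
{
    return ((uint32_t) a << 24) | ((uint32_t) b << 16) |
           ((uint32_t) c << 8) | (uint32_t) d;
}

/*
 * Return 1 if addr falls inside network/prefix_len, 0 otherwise.
 * Assumes 0 <= prefix_len <= 32 and that host bits in "network" are ignored.
 */
static int
cidr_contains(uint32_t network, unsigned prefix_len, uint32_t addr)
{
    /* A zero-length prefix matches everything; avoid shifting by 32. */
    uint32_t mask = (prefix_len == 0) ? 0 : (0xFFFFFFFFu << (32 - prefix_len));

    return (network & mask) == (addr & mask);
}
```

With these helpers, ``cidr_contains(ipv4(192,168,1,10), 30, ipv4(192,168,1,12))`` evaluates to 0, while the same check with a prefix length of 24 evaluates to 1, so a wider prefix or an extra line for the witness may be needed.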
Create the user and database to manage replication::
su - postgres
createuser -s repmgr
createdb -O repmgr repmgr
psql -f /usr/share/postgresql/9.0/contrib/repmgr_funcs.sql repmgr
Restart the PostgreSQL server::
pg_ctl -D $PGDATA restart
And check everything is fine in the server log.
Create the ssh-key for the postgres user and copy it to other servers::
su - postgres
ssh-keygen # /!\ do not use a passphrase /!\
cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
exit
rsync -avz ~postgres/.ssh/authorized_keys node2:~postgres/.ssh/
rsync -avz ~postgres/.ssh/authorized_keys witness:~postgres/.ssh/
rsync -avz ~postgres/.ssh/id_rsa* node2:~postgres/.ssh/
rsync -avz ~postgres/.ssh/id_rsa* witness:~postgres/.ssh/
Clone Master
------------
Log in to node2.
Clone node1 (the current Master)::
su - postgres
repmgr -d repmgr -U repmgr -h node1 standby clone
Start the PostgreSQL server::
pg_ctl -D $PGDATA start
And check everything is fine in the server log.
Configure repmgr
----------------
Log in each server and configure repmgr by editing the file
/etc/repmgr/repmgr.conf::
cluster=my_cluster
node=1
node_name=earth
conninfo='host=192.168.1.10 dbname=repmgr user=repmgr'
master_response_timeout=60
reconnect_attempts=6
reconnect_interval=10
failover=automatic
promote_command='promote_command.sh'
follow_command='repmgr standby follow -f /etc/repmgr/repmgr.conf'
**cluster**
is the name of the current replication cluster.
**node**
is the number of the current node (1, 2 or 3 in the current example).
**node_name**
is an identifier for each node.
**conninfo**
is used to connect to the local PostgreSQL server (where the configuration file is) from any node. In the witness server configuration, 'port=5499' must be added to the conninfo.
**master_response_timeout**
is the maximum amount of time to wait before deciding that the master has died and starting the failover procedure.
**reconnect_attempts**
is the number of times to try to reconnect to the master after a failure has been detected and before starting the failover procedure.
**reconnect_interval**
is the amount of time between attempts to reconnect to the master after a failure has been detected and before starting the failover procedure.
**failover**
configures the failover behavior: *manual* or *automatic*.
**promote_command**
is the command executed to perform the failover (including the PostgreSQL failover itself). The command must return 0 on success.
**follow_command**
is the command executed to point the current standby at another Master. The command must return 0 on success.
Register Master and Standby
---------------------------
Log in to node1.
Register the node as Master::
su - postgres
repmgr -f /etc/repmgr/repmgr.conf master register
Log in to node2. Register it as a standby::
su - postgres
repmgr -f /etc/repmgr/repmgr.conf standby register
Initialize witness server
-------------------------
Log in to witness.
Initialize the witness server::
su - postgres
repmgr -d repmgr -U repmgr -h 192.168.1.10 -D $WITNESS_PGDATA -f /etc/repmgr/repmgr.conf witness create
This command needs the information required to connect to the master in order to copy the cluster configuration; it also needs to know where it should initialize its own $PGDATA.
As part of the process it also asks for the superuser password so it can connect when needed.
Start the repmgrd daemons
-------------------------
Log in to node2 and witness, and start repmgrd on each::
su - postgres
repmgrd -f /etc/repmgr/repmgr.conf > /var/log/postgresql/repmgr.log 2>&1
**Note:** The Master does not need a repmgrd daemon.
Suspend Automatic behavior
==========================
Edit the repmgr.conf of the node that should be removed from automatic processing and change::
failover=manual
Then, signal repmgrd daemon::
su - postgres
kill -HUP `pidof repmgrd`
Usage
=====
The repmgr documentation is in the README file (how to build, options, etc.)

check_dir.c

@@ -1,6 +1,6 @@
/*
* check_dir.c - Directories management functions
* Copyright (C) 2ndQuadrant, 2010-2012
* Copyright (C) 2ndQuadrant, 2010-2014
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -44,9 +44,9 @@
int
check_dir(char *dir)
{
DIR *chkdir;
struct dirent *file;
int result = 1;
DIR *chkdir;
struct dirent *file;
int result = 1;
errno = 0;
@@ -58,7 +58,7 @@ check_dir(char *dir)
while ((file = readdir(chkdir)) != NULL)
{
if (strcmp(".", file->d_name) == 0 ||
strcmp("..", file->d_name) == 0)
strcmp("..", file->d_name) == 0)
{
/* skip this and parent directory */
continue;
@@ -71,6 +71,7 @@ check_dir(char *dir)
}
#ifdef WIN32
/*
* This fix is in mingw cvs (runtime/mingwex/dirent.c rev 1.4), but not in
* released version
@@ -82,29 +83,29 @@ check_dir(char *dir)
closedir(chkdir);
if (errno != 0)
return -1; /* some kind of I/O error? */
return -1; /* some kind of I/O error? */
return result;
}
/*
* Create directory
* Create directory with error log message when failing
*/
bool
create_directory(char *dir)
create_dir(char *dir)
{
if (mkdir_p(dir, 0700) == 0)
return true;
log_err(_("Could not create directory \"%s\": %s\n"),
dir, strerror(errno));
dir, strerror(errno));
return false;
}
bool
set_directory_permissions(char *dir)
set_dir_permissions(char *dir)
{
return (chmod(dir, 0700) != 0) ? false : true;
}
@@ -127,10 +128,10 @@ mkdir_p(char *path, mode_t omode)
{
struct stat sb;
mode_t numask,
oumask;
oumask;
int first,
last,
retval;
last,
retval;
char *p;
p = path;
@@ -149,8 +150,8 @@ mkdir_p(char *path, mode_t omode)
return 1;
}
else if (p[1] == ':' &&
((p[0] >= 'a' && p[0] <= 'z') ||
(p[0] >= 'A' && p[0] <= 'Z')))
((p[0] >= 'a' && p[0] <= 'z') ||
(p[0] >= 'A' && p[0] <= 'Z')))
{
/* local drive */
p += 2;
@@ -221,16 +222,16 @@ bool
is_pg_dir(char *dir)
{
const size_t buf_sz = 8192;
char path[buf_sz];
struct stat sb;
int r;
char path[buf_sz];
struct stat sb;
int r;
// test pgdata
/* test pgdata */
xsnprintf(path, buf_sz, "%s/PG_VERSION", dir);
if (stat(path, &sb) == 0)
return true;
// test tablespace dir
/* test tablespace dir */
sprintf(path, "ls %s/PG_*/ -I*", dir);
r = system(path);
if (r == 0)
@@ -241,67 +242,67 @@ is_pg_dir(char *dir)
bool
create_pgdir(char *dir, bool force)
create_pg_dir(char *dir, bool force)
{
bool pg_dir = false;
bool pg_dir = false;
/* Check this directory could be used as a PGDATA dir */
switch (check_dir(dir))
{
case 0:
/* dir not there, must create it */
log_info(_("creating directory \"%s\"...\n"), dir);
case 0:
/* dir not there, must create it */
log_info(_("creating directory \"%s\"...\n"), dir);
if (!create_directory(dir))
{
log_err(_("couldn't create directory \"%s\"...\n"),
dir);
exit(ERR_BAD_CONFIG);
}
break;
case 1:
/* Present but empty, fix permissions and use it */
log_info(_("checking and correcting permissions on existing directory %s ...\n"),
dir);
if (!set_directory_permissions(dir))
{
log_err(_("could not change permissions of directory \"%s\": %s\n"),
dir, strerror(errno));
exit(ERR_BAD_CONFIG);
}
break;
case 2:
/* Present and not empty */
log_warning(_("directory \"%s\" exists but is not empty\n"),
dir);
pg_dir = is_pg_dir(dir);
/*
* we use force to reduce the time needed to restore a node which
* turn async after a failover or anything else
*/
if (pg_dir && force)
{
/* Let it continue */
if (!create_dir(dir))
{
log_err(_("couldn't create directory \"%s\"...\n"),
dir);
return false;
}
break;
}
else if (pg_dir && !force)
{
log_warning(_("\nThis looks like a PostgreSQL directory.\n"
"If you are sure you want to clone here, "
"please check there is no PostgreSQL server "
"running and use the --force option\n"));
exit(ERR_BAD_CONFIG);
}
case 1:
/* Present but empty, fix permissions and use it */
log_info(_("checking and correcting permissions on existing directory %s ...\n"),
dir);
return false;
default:
/* Trouble accessing directory */
log_err(_("could not access directory \"%s\": %s\n"),
dir, strerror(errno));
exit(ERR_BAD_CONFIG);
if (!set_dir_permissions(dir))
{
log_err(_("could not change permissions of directory \"%s\": %s\n"),
dir, strerror(errno));
return false;
}
break;
case 2:
/* Present and not empty */
log_warning(_("directory \"%s\" exists but is not empty\n"),
dir);
pg_dir = is_pg_dir(dir);
/*
* we use force to reduce the time needed to restore a node which
* turns async after a failover or anything else
*/
if (pg_dir && force)
{
/* Let it continue */
break;
}
else if (pg_dir && !force)
{
log_warning(_("\nThis looks like a PostgreSQL directory.\n"
"If you are sure you want to clone here, "
"please check there is no PostgreSQL server "
"running and use the --force option\n"));
return false;
}
return false;
default:
/* Trouble accessing directory */
log_err(_("could not access directory \"%s\": %s\n"),
dir, strerror(errno));
return false;
}
return true;
}
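The switch in `create_pg_dir()` depends on `check_dir()` returning one of four codes: 0 (directory missing), 1 (present but empty), 2 (present and not empty), or a negative value on an access error. The following is a minimal sketch of such a classifier, not the repmgr implementation: `classify_dir` and the `DIRSTATE_*` names are hypothetical, and treating `ENOENT` as "missing" is an assumption, since that part of `check_dir()` is not shown in these hunks.

```c
#include <dirent.h>
#include <errno.h>
#include <string.h>

/* Hypothetical names for the codes handled by the switch in create_pg_dir(). */
enum dir_state
{
    DIRSTATE_ERROR = -1,
    DIRSTATE_MISSING = 0,
    DIRSTATE_EMPTY = 1,
    DIRSTATE_NOT_EMPTY = 2
};

static int
classify_dir(const char *dir)
{
    DIR        *handle;
    struct dirent *entry;
    int         result = DIRSTATE_EMPTY;
    int         read_errno = 0;

    handle = opendir(dir);
    if (handle == NULL)
        return (errno == ENOENT) ? DIRSTATE_MISSING : DIRSTATE_ERROR;

    for (;;)
    {
        /* readdir() returns NULL at end-of-directory and on error; errno tells them apart */
        errno = 0;
        entry = readdir(handle);
        if (entry == NULL)
        {
            read_errno = errno;
            break;
        }

        /* skip this and parent directory */
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
            continue;

        /* any other entry means the directory is not empty */
        result = DIRSTATE_NOT_EMPTY;
        break;
    }

    closedir(handle);
    return (read_errno != 0) ? DIRSTATE_ERROR : result;
}
```

Saving `errno` before `closedir()` keeps a read error from being overwritten by the cleanup call, which is why the sketch uses a separate `read_errno` variable.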

check_dir.h

@@ -1,6 +1,6 @@
/*
* check_dir.h
* Copyright (c) 2ndQuadrant, 2010-2012
* Copyright (c) 2ndQuadrant, 2010-2014
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -20,11 +20,11 @@
#ifndef _REPMGR_CHECK_DIR_H_
#define _REPMGR_CHECK_DIR_H_
int mkdir_p(char *path, mode_t omode);
int check_dir(char *dir);
bool create_directory(char *dir);
bool set_directory_permissions(char *dir);
bool is_pg_dir(char *dir);
bool create_pgdir(char *dir, bool force);
int mkdir_p(char *path, mode_t omode);
int check_dir(char *dir);
bool create_dir(char *dir);
bool set_dir_permissions(char *dir);
bool is_pg_dir(char *dir);
bool create_pg_dir(char *dir, bool force);
#endif

133
config.c

@@ -1,6 +1,6 @@
/*
* config.c - Functions to parse the config file
* Copyright (C) 2ndQuadrant, 2010-2012
* Copyright (C) 2ndQuadrant, 2010-2014
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -23,13 +23,14 @@
#include "repmgr.h"
void
parse_config(const char *config_file, t_configuration_options *options)
parse_config(const char *config_file, t_configuration_options * options)
{
char *s, buff[MAXLINELENGTH];
char name[MAXLEN];
char value[MAXLEN];
char *s,
buff[MAXLINELENGTH];
char name[MAXLEN];
char value[MAXLEN];
FILE *fp = fopen (config_file, "r");
FILE *fp = fopen(config_file, "r");
/* Initialize */
memset(options->cluster_name, 0, sizeof(options->cluster_name));
@@ -41,6 +42,9 @@ parse_config(const char *config_file, t_configuration_options *options)
memset(options->promote_command, 0, sizeof(options->promote_command));
memset(options->follow_command, 0, sizeof(options->follow_command));
memset(options->rsync_options, 0, sizeof(options->rsync_options));
memset(options->ssh_options, 0, sizeof(options->ssh_options));
memset(options->pg_bindir, 0, sizeof(options->pg_bindir));
memset(options->pgctl_options, 0, sizeof(options->pgctl_options));
/* if nothing has been provided defaults to 60 */
options->master_response_timeout = 60;
@@ -49,18 +53,22 @@ parse_config(const char *config_file, t_configuration_options *options)
options->reconnect_attempts = 6;
options->reconnect_intvl = 10;
options->monitor_interval_secs = 2;
options->retry_promote_interval_secs = 300;
/*
* Since some commands don't require a config file at all, not
* having one isn't necessarily a problem.
* Since some commands don't require a config file at all, not having one
* isn't necessarily a problem.
*/
if (fp == NULL)
{
log_err(_("Did not find the configuration file '%s', continuing\n"), config_file);
log_err(_("Did not find the configuration file '%s', continuing\n"),
config_file);
return;
}
/* Read next line */
while ((s = fgets (buff, sizeof buff, fp)) != NULL)
while ((s = fgets(buff, sizeof buff, fp)) != NULL)
{
/* Skip blank lines and comments */
if (buff[0] == '\n' || buff[0] == '#')
@@ -71,20 +79,23 @@ parse_config(const char *config_file, t_configuration_options *options)
/* Copy into correct entry in parameters struct */
if (strcmp(name, "cluster") == 0)
strncpy (options->cluster_name, value, MAXLEN);
strncpy(options->cluster_name, value, MAXLEN);
else if (strcmp(name, "node") == 0)
options->node = atoi(value);
else if (strcmp(name, "conninfo") == 0)
strncpy (options->conninfo, value, MAXLEN);
strncpy(options->conninfo, value, MAXLEN);
else if (strcmp(name, "rsync_options") == 0)
strncpy (options->rsync_options, value, QUERY_STR_LEN);
strncpy(options->rsync_options, value, QUERY_STR_LEN);
else if (strcmp(name, "ssh_options") == 0)
strncpy(options->ssh_options, value, QUERY_STR_LEN);
else if (strcmp(name, "loglevel") == 0)
strncpy (options->loglevel, value, MAXLEN);
strncpy(options->loglevel, value, MAXLEN);
else if (strcmp(name, "logfacility") == 0)
strncpy (options->logfacility, value, MAXLEN);
strncpy(options->logfacility, value, MAXLEN);
else if (strcmp(name, "failover") == 0)
{
char failoverstr[MAXLEN];
char failoverstr[MAXLEN];
strncpy(failoverstr, value, MAXLEN);
if (strcmp(failoverstr, "manual") == 0)
@@ -111,15 +122,25 @@ parse_config(const char *config_file, t_configuration_options *options)
options->reconnect_attempts = atoi(value);
else if (strcmp(name, "reconnect_interval") == 0)
options->reconnect_intvl = atoi(value);
else if (strcmp(name, "pg_bindir") == 0)
strncpy(options->pg_bindir, value, MAXLEN);
else if (strcmp(name, "pg_ctl_options") == 0)
strncpy(options->pgctl_options, value, MAXLEN);
else if (strcmp(name, "logfile") == 0)
strncpy(options->logfile, value, MAXLEN);
else if (strcmp(name, "monitor_interval_secs") == 0)
options->monitor_interval_secs = atoi(value);
else if (strcmp(name, "retry_promote_interval_secs") == 0)
options->retry_promote_interval_secs = atoi(value);
else
log_warning(_("%s/%s: Unknown name/value pair!\n"), name, value);
}
/* Close file */
fclose (fp);
fclose(fp);
/* Check config settings */
if (strnlen(options->cluster_name, MAXLEN)==0)
if (*options->cluster_name == '\0')
{
log_err(_("Cluster name is missing. Check the configuration file.\n"));
exit(ERR_BAD_CONFIG);
@@ -148,39 +169,52 @@ parse_config(const char *config_file, t_configuration_options *options)
log_err(_("Reconnect intervals must be zero or greater. Check the configuration file.\n"));
exit(ERR_BAD_CONFIG);
}
if (*options->pg_bindir == '\0')
{
log_err(_("pg_bindir config value not found. Check the configuration file.\n"));
exit(ERR_BAD_CONFIG);
}
}
char *
trim (char *s)
trim(char *s)
{
/* Initialize start, end pointers */
char *s1 = s, *s2 = &s[strlen (s) - 1];
char *s1 = s,
*s2 = &s[strlen(s) - 1];
/* If string is empty, no action needed */
if(s2 < s1)
return s;
/* Trim and delimit right side */
while ( (isspace (*s2)) && (s2 >= s1) )
while ((isspace(*s2)) && (s2 >= s1))
--s2;
*(s2+1) = '\0';
*(s2 + 1) = '\0';
/* Trim left side */
while ( (isspace (*s1)) && (s1 < s2) )
while ((isspace(*s1)) && (s1 < s2))
++s1;
/* Copy finished string */
strcpy (s, s1);
memmove(s, s1, s2 - s1);
s[s2 - s1 + 1] = '\0';
return s;
}
void
parse_line(char *buff, char *name, char *value)
{
int i = 0;
int j = 0;
int i = 0;
int j = 0;
/*
* first we find the name of the parameter
*/
for ( ; i < MAXLEN; ++i)
for (; i < MAXLEN; ++i)
{
if (buff[i] != '=')
name[j++] = buff[i];
@@ -193,7 +227,7 @@ parse_line(char *buff, char *name, char *value)
* Now the value
*/
j = 0;
for ( ++i ; i < MAXLEN; ++i)
for (++i; i < MAXLEN; ++i)
if (buff[i] == '\'')
continue;
else if (buff[i] != '\n')
@@ -205,9 +239,9 @@ parse_line(char *buff, char *name, char *value)
}
bool
reload_configuration(char *config_file, t_configuration_options *orig_options)
reload_config(char *config_file, t_configuration_options * orig_options)
{
PGconn *conn;
PGconn *conn;
t_configuration_options new_options;
@@ -218,57 +252,57 @@ reload_configuration(char *config_file, t_configuration_options *orig_options)
parse_config(config_file, &new_options);
if (new_options.node == -1)
{
log_warning(_("\nCannot load new configuration, will keep current one.\n"));
log_warning(_("Cannot load new configuration, will keep current one.\n"));
return false;
}
if (strcmp(new_options.cluster_name, orig_options->cluster_name) != 0)
{
log_warning(_("\nCannot change cluster name, will keep current configuration.\n"));
log_warning(_("Cannot change cluster name, will keep current configuration.\n"));
return false;
}
if (new_options.node != orig_options->node)
{
log_warning(_("\nCannot change node number, will keep current configuration.\n"));
log_warning(_("Cannot change node number, will keep current configuration.\n"));
return false;
}
if (new_options.node_name != orig_options->node_name)
if (strcmp(new_options.node_name, orig_options->node_name) != 0)
{
log_warning(_("\nCannot change standby name, will keep current configuration.\n"));
log_warning(_("Cannot change standby name, will keep current configuration.\n"));
return false;
}
if (new_options.failover != MANUAL_FAILOVER && new_options.failover != AUTOMATIC_FAILOVER)
{
log_warning(_("\nNew value for failover is not valid. Should be MANUAL or AUTOMATIC.\n"));
log_warning(_("New value for failover is not valid. Should be MANUAL or AUTOMATIC.\n"));
return false;
}
if (new_options.master_response_timeout <= 0)
{
log_warning(_("\nNew value for master_response_timeout is not valid. Should be greater than zero.\n"));
log_warning(_("New value for master_response_timeout is not valid. Should be greater than zero.\n"));
return false;
}
if (new_options.reconnect_attempts < 0)
{
log_warning(_("\nNew value for reconnect_attempts is not valid. Should be greater or equal than zero.\n"));
log_warning(_("New value for reconnect_attempts is not valid. Should be greater or equal than zero.\n"));
return false;
}
if (new_options.reconnect_intvl < 0)
{
log_warning(_("\nNew value for reconnect_interval is not valid. Should be greater or equal than zero.\n"));
log_warning(_("New value for reconnect_interval is not valid. Should be greater or equal than zero.\n"));
return false;
}
/* Test conninfo string */
conn = establishDBConnection(new_options.conninfo, false);
conn = establish_db_connection(new_options.conninfo, false);
if (!conn || (PQstatus(conn) != CONNECTION_OK))
{
log_warning(_("\nconninfo string is not valid, will keep current configuration.\n"));
log_warning(_("conninfo string is not valid, will keep current configuration.\n"));
return false;
}
PQfinish(conn);
@@ -283,19 +317,20 @@ reload_configuration(char *config_file, t_configuration_options *orig_options)
strcpy(orig_options->promote_command, new_options.promote_command);
strcpy(orig_options->follow_command, new_options.follow_command);
strcpy(orig_options->rsync_options, new_options.rsync_options);
strcpy(orig_options->ssh_options, new_options.ssh_options);
orig_options->master_response_timeout = new_options.master_response_timeout;
orig_options->reconnect_attempts = new_options.reconnect_attempts;
orig_options->reconnect_intvl = new_options.reconnect_intvl;
/*
* XXX These ones can change with a simple SIGHUP?
strcpy (orig_options->loglevel, new_options.loglevel);
strcpy (orig_options->logfacility, new_options.logfacility);
logger_shutdown();
XXX do we have progname here ?
logger_init(progname, orig_options.loglevel, orig_options.logfacility);
*/
*
* strcpy (orig_options->loglevel, new_options.loglevel); strcpy
* (orig_options->logfacility, new_options.logfacility);
*
* logger_shutdown(); XXX do we have progname here ? logger_init(progname,
* orig_options.loglevel, orig_options.logfacility);
*/
return true;
}
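A note on the `trim()` hunk earlier in this file: `memmove(s, s1, s2 - s1)` appears to copy one byte fewer than the trimmed text (`s1` through `s2` inclusive) whenever leading whitespace was skipped, and `&s[strlen(s) - 1]` points before the buffer when the string is empty. The following is a minimal sketch of an in-place trim that avoids both cases; `trim_in_place` is a hypothetical name and this is not the code in the commit.

```c
#include <ctype.h>
#include <string.h>

/* Remove leading and trailing whitespace from s in place and return s. */
static char *
trim_in_place(char *s)
{
    char       *start = s;
    char       *end;
    size_t      len;

    /* Skip leading whitespace (stops at the terminator for an empty string) */
    while (*start != '\0' && isspace((unsigned char) *start))
        start++;

    /* Step back from the terminator over trailing whitespace */
    end = start + strlen(start);
    while (end > start && isspace((unsigned char) end[-1]))
        end--;

    /* Copy exactly the trimmed span; memmove tolerates the overlap */
    len = (size_t) (end - start);
    memmove(s, start, len);
    s[len] = '\0';

    return s;
}
```

Using a one-past-the-end pointer for `end` means the span length is simply `end - start`, so there is no separate `+ 1` to keep consistent between the copy and the terminator.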

config.h

@@ -1,6 +1,6 @@
/*
* config.h
* Copyright (c) 2ndQuadrant, 2010-2012
* Copyright (c) 2ndQuadrant, 2010-2014
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -25,25 +25,33 @@
typedef struct
{
char cluster_name[MAXLEN];
int node;
char conninfo[MAXLEN];
int failover;
int priority;
char node_name[MAXLEN];
char promote_command[MAXLEN];
char follow_command[MAXLEN];
char loglevel[MAXLEN];
char logfacility[MAXLEN];
char rsync_options[QUERY_STR_LEN];
int master_response_timeout;
int reconnect_attempts;
int reconnect_intvl;
} t_configuration_options;
char cluster_name[MAXLEN];
int node;
char conninfo[MAXLEN];
int failover;
int priority;
char node_name[MAXLEN];
char promote_command[MAXLEN];
char follow_command[MAXLEN];
char loglevel[MAXLEN];
char logfacility[MAXLEN];
char rsync_options[QUERY_STR_LEN];
char ssh_options[QUERY_STR_LEN];
int master_response_timeout;
int reconnect_attempts;
int reconnect_intvl;
char pg_bindir[MAXLEN];
char pgctl_options[MAXLEN];
char logfile[MAXLEN];
int monitor_interval_secs;
int retry_promote_interval_secs;
} t_configuration_options;
void parse_config(const char *config_file, t_configuration_options *options);
void parse_line(char *buff, char *name, char *value);
char *trim(char *s);
bool reload_configuration(char *config_file, t_configuration_options *orig_options);
#define T_CONFIGURATION_OPTIONS_INITIALIZER { "", -1, "", MANUAL_FAILOVER, -1, "", "", "", "", "", "", "", -1, -1, -1, "", "", "", 0, 0 }
void parse_config(const char *config_file, t_configuration_options * options);
void parse_line(char *buff, char *name, char *value);
char *trim(char *s);
bool reload_config(char *config_file, t_configuration_options * orig_options);
#endif
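`parse_line()` splits a `name=value` configuration line at the first `=` and drops single quotes from the value. The following is a minimal, bounds-checked sketch of that idea; `split_config_line` and its `size` parameter are hypothetical, and unlike the function in the commit it reports a line without `=` as a failure instead of relying on fixed `MAXLEN` buffers.

```c
#include <stddef.h>
#include <string.h>

/*
 * Split "name=value\n" into name and value, each a buffer of "size" bytes.
 * Single quotes in the value are dropped, as in the conninfo example.
 * Returns 1 on success, 0 if the line has no '=' or size is 0.
 */
static int
split_config_line(const char *line, char *name, char *value, size_t size)
{
    const char *eq = strchr(line, '=');
    const char *p;
    size_t      n = 0;

    if (eq == NULL || size == 0)
        return 0;

    /* Everything before the first '=' is the parameter name */
    for (p = line; p < eq && n + 1 < size; p++)
        name[n++] = *p;
    name[n] = '\0';

    /* Everything after it, up to the newline, is the value */
    n = 0;
    for (p = eq + 1; *p != '\0' && *p != '\n' && n + 1 < size; p++)
    {
        if (*p == '\'')
            continue;
        value[n++] = *p;
    }
    value[n] = '\0';

    return 1;
}
```

Because only the first `=` is treated as the separator, a value such as `'host=192.168.1.10 dbname=repmgr'` keeps its embedded `=` characters.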

331
dbutils.c

@@ -1,6 +1,6 @@
/*
* dbutils.c - Database connection/management functions
* Copyright (C) 2ndQuadrant, 2010-2012
* Copyright (C) 2ndQuadrant, 2010-2014
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -18,17 +18,19 @@
*/
#include <unistd.h>
#include <time.h>
#include <sys/time.h>
#include "repmgr.h"
#include "strutil.h"
#include "log.h"
PGconn *
establishDBConnection(const char *conninfo, const bool exit_on_error)
establish_db_connection(const char *conninfo, const bool exit_on_error)
{
/* Make a connection to the database */
PGconn *conn = NULL;
char connection_string[MAXLEN];
PGconn *conn = NULL;
char connection_string[MAXLEN];
strcpy(connection_string, conninfo);
strcat(connection_string, " fallback_application_name='repmgr'");
@@ -38,7 +40,7 @@ establishDBConnection(const char *conninfo, const bool exit_on_error)
if ((PQstatus(conn) != CONNECTION_OK))
{
log_err(_("Connection to database failed: %s\n"),
PQerrorMessage(conn));
PQerrorMessage(conn));
if (exit_on_error)
{
@@ -51,16 +53,17 @@ establishDBConnection(const char *conninfo, const bool exit_on_error)
}
PGconn *
establishDBConnectionByParams(const char *keywords[], const char *values[],const bool exit_on_error)
establish_db_connection_by_params(const char *keywords[], const char *values[],
const bool exit_on_error)
{
/* Make a connection to the database */
PGconn *conn = PQconnectdbParams(keywords, values, true);
PGconn *conn = PQconnectdbParams(keywords, values, true);
/* Check to see that the backend connection was successfully made */
if ((PQstatus(conn) != CONNECTION_OK))
{
log_err(_("Connection to database failed: %s\n"),
PQerrorMessage(conn));
PQerrorMessage(conn));
if (exit_on_error)
{
PQfinish(conn);
@@ -71,25 +74,22 @@ establishDBConnectionByParams(const char *keywords[], const char *values[],const
return conn;
}
bool
int
is_standby(PGconn *conn)
{
PGresult *res;
bool result = false;
int result = 0;
res = PQexec(conn, "SELECT pg_is_in_recovery()");
if (PQresultStatus(res) != PGRES_TUPLES_OK)
if (res == NULL || PQresultStatus(res) != PGRES_TUPLES_OK)
{
log_err(_("Can't query server mode: %s"),
PQerrorMessage(conn));
PQclear(res);
PQfinish(conn);
exit(ERR_DB_QUERY);
PQerrorMessage(conn));
result = -1;
}
if (PQntuples(res) == 1 && strcmp(PQgetvalue(res, 0, 0), "t") == 0)
result = true;
else if (PQntuples(res) == 1 && strcmp(PQgetvalue(res, 0, 0), "t") == 0)
result = 1;
PQclear(res);
return result;
@@ -97,26 +97,23 @@ is_standby(PGconn *conn)
bool
int
is_witness(PGconn *conn, char *schema, char *cluster, int node_id)
{
PGresult *res;
bool result = false;
int result = 0;
char sqlquery[QUERY_STR_LEN];
sqlquery_snprintf(sqlquery, "SELECT witness from %s.repl_nodes where cluster = '%s' and id = %d",
schema, cluster, node_id);
schema, cluster, node_id);
res = PQexec(conn, sqlquery);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
log_err(_("Can't query server mode: %s"), PQerrorMessage(conn));
PQclear(res);
PQfinish(conn);
exit(ERR_DB_QUERY);
result = -1;
}
if (PQntuples(res) == 1 && strcmp(PQgetvalue(res, 0, 0), "t") == 0)
result = true;
else if (PQntuples(res) == 1 && strcmp(PQgetvalue(res, 0, 0), "t") == 0)
result = 1;
PQclear(res);
return result;
@@ -128,6 +125,7 @@ bool
is_pgup(PGconn *conn, int timeout)
{
char sqlquery[QUERY_STR_LEN];
/* Check the connection status twice in case it changes after reset */
bool twice = false;
@@ -138,23 +136,24 @@ is_pgup(PGconn *conn, int timeout)
{
if (twice)
return false;
PQreset(conn); // reconnect
PQreset(conn); /* reconnect */
twice = true;
}
else
{
/*
* Send a SELECT 1 just to check if the connection is OK
*/
CancelQuery(conn, timeout);
* Send a SELECT 1 just to check if the connection is OK
*/
if (!cancel_query(conn, timeout))
goto failed;
if (wait_connection_availability(conn, timeout) != 1)
goto failed;
sqlquery_snprintf(sqlquery, "SELECT 1");
if (PQsendQuery(conn, sqlquery) == 0)
{
log_warning(_("PQsendQuery: Query could not be sent to primary. %s\n"),
PQerrorMessage(conn));
log_warning(_("PQsendQuery: Query could not be sent to primary. %s\n"),
PQerrorMessage(conn));
goto failed;
}
if (wait_connection_availability(conn, timeout) != 1)
@@ -162,11 +161,15 @@ is_pgup(PGconn *conn, int timeout)
break;
failed:
// we need to retry, because we might just have loose the connection once
failed:
/*
* we need to retry, because we might just have lost the
* connection once
*/
if (twice)
return false;
PQreset(conn); // reconnect
PQreset(conn); /* reconnect */
twice = true;
}
}
@@ -179,26 +182,25 @@ failed:
* if 8 or inferior returns an empty string
*/
char *
pg_version(PGconn *conn, char* major_version)
pg_version(PGconn *conn, char *major_version)
{
PGresult *res;
PGresult *res;
int major_version1;
char *major_version2;
int major_version1;
char *major_version2;
res = PQexec(conn,
"WITH pg_version(ver) AS "
"(SELECT split_part(version(), ' ', 2)) "
"SELECT split_part(ver, '.', 1), split_part(ver, '.', 2) "
"FROM pg_version");
"WITH pg_version(ver) AS "
"(SELECT split_part(version(), ' ', 2)) "
"SELECT split_part(ver, '.', 1), split_part(ver, '.', 2) "
"FROM pg_version");
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
log_err(_("Version check PQexec failed: %s"),
PQerrorMessage(conn));
PQerrorMessage(conn));
PQclear(res);
PQfinish(conn);
exit(ERR_DB_QUERY);
return NULL;
}
major_version1 = atoi(PQgetvalue(res, 0, 0));
@@ -208,7 +210,7 @@ pg_version(PGconn *conn, char* major_version)
{
/* form a major version string */
xsnprintf(major_version, MAXVERSIONSTR, "%d.%s", major_version1,
major_version2);
major_version2);
}
else
strcpy(major_version, "");
@@ -219,59 +221,92 @@ pg_version(PGconn *conn, char* major_version)
}
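
Note on pg_version(): the query takes the second space-separated word of version() and splits it on '.'. The same parsing can be sketched client-side as below. This is an illustration only, not code from the patch: major_version_from_banner is a hypothetical helper, the banner layout "PostgreSQL X.Y.Z on ..." is an assumption, and unlike the "%d.%s" format in the diff it reads the minor part as an integer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper, not part of repmgr: derive "major.minor" from a
 * version() banner such as "PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu".
 * Returns out on success (left empty for releases before 9, mirroring the
 * comment above pg_version()), or NULL if the banner cannot be parsed.
 */
char *
major_version_from_banner(const char *banner, char *out, size_t out_size)
{
    const char *second_word;
    int         major = 0;
    int         minor = 0;

    assert(banner != NULL && out != NULL && out_size > 0);
    out[0] = '\0';

    /* equivalent of split_part(version(), ' ', 2): skip the first word */
    second_word = strchr(banner, ' ');
    if (second_word == NULL)
        return NULL;

    /* equivalent of split_part(ver, '.', 1) and split_part(ver, '.', 2) */
    if (sscanf(second_word + 1, "%d.%d", &major, &minor) != 2)
        return NULL;

    if (major >= 9)
        snprintf(out, out_size, "%d.%d", major, minor);
    return out;
}
```

Callers that receive NULL should treat the version as unknown rather than as "too old", which matches the switch from exit() to "return NULL" in this hunk.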
bool
guc_setted(PGconn *conn, const char *parameter, const char *op,
const char *value)
int
guc_set(PGconn *conn, const char *parameter, const char *op,
const char *value)
{
PGresult *res;
PGresult *res;
char sqlquery[QUERY_STR_LEN];
int retval = 1;
sqlquery_snprintf(sqlquery, "SELECT true FROM pg_settings "
" WHERE name = '%s' AND setting %s '%s'",
parameter, op, value);
" WHERE name = '%s' AND setting %s '%s'",
parameter, op, value);
res = PQexec(conn, sqlquery);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
log_err(_("GUC setting check PQexec failed: %s"),
PQerrorMessage(conn));
PQclear(res);
PQfinish(conn);
exit(ERR_DB_QUERY);
PQerrorMessage(conn));
retval = -1;
}
if (PQntuples(res) == 0)
else if (PQntuples(res) == 0)
{
PQclear(res);
return false;
retval = 0;
}
PQclear(res);
return true;
return retval;
}
/**
* Just like guc_set except with an extra parameter containing the name of
* the pg datatype so that the comparison can be done properly.
*/
int
guc_set_typed(PGconn *conn, const char *parameter, const char *op,
const char *value, const char *datatype)
{
PGresult *res;
char sqlquery[QUERY_STR_LEN];
int retval = 1;
sqlquery_snprintf(sqlquery, "SELECT true FROM pg_settings "
" WHERE name = '%s' AND setting::%s %s '%s'::%s",
parameter, datatype, op, value, datatype);
res = PQexec(conn, sqlquery);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
log_err(_("GUC setting check PQexec failed: %s"),
PQerrorMessage(conn));
retval = -1;
}
else if (PQntuples(res) == 0)
{
retval = 0;
}
PQclear(res);
return retval;
}
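
Note on guc_set_typed(): pg_settings.setting is a text column, so an untyped ">=" compares strings, not numbers. The sketch below shows why the casts matter and what query text the format string produces. text_ge, int_ge and build_typed_guc_query are hypothetical names used only for this illustration; strcmp() stands in for a server-side text comparison, which is an approximation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Text-style ">=": compares character by character, like strcmp(). */
int
text_ge(const char *setting, const char *value)
{
    return strcmp(setting, value) >= 0;
}

/* Integer-style ">=": roughly what the ::integer casts ask the server for. */
int
int_ge(const char *setting, const char *value)
{
    return strtol(setting, NULL, 10) >= strtol(value, NULL, 10);
}

/*
 * Builds the query text with the same format string as guc_set_typed().
 * Arguments are interpolated as-is, so this is only suitable for trusted,
 * hard-coded parameter names, operators and values.
 */
int
build_typed_guc_query(char *buf, size_t size, const char *parameter,
                      const char *op, const char *value, const char *datatype)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size,
                    "SELECT true FROM pg_settings "
                    " WHERE name = '%s' AND setting::%s %s '%s'::%s",
                    parameter, datatype, op, value, datatype);
}
```

With a setting of "5000" and a required value of "10000", the text comparison reports "greater or equal" because '5' sorts after '1', while the integer comparison correctly reports that the setting is too low.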
const char *
get_cluster_size(PGconn *conn)
{
PGresult *res;
const char *size;
char sqlquery[QUERY_STR_LEN];
PGresult *res;
const char *size = NULL;
char sqlquery[QUERY_STR_LEN];
sqlquery_snprintf(
sqlquery,
"SELECT pg_size_pretty(SUM(pg_database_size(oid))::bigint) "
" FROM pg_database ");
sqlquery,
"SELECT pg_size_pretty(SUM(pg_database_size(oid))::bigint) "
" FROM pg_database ");
res = PQexec(conn, sqlquery);
if (PQresultStatus(res) != PGRES_TUPLES_OK)
{
log_err(_("Get cluster size PQexec failed: %s"),
PQerrorMessage(conn));
PQclear(res);
PQfinish(conn);
exit(ERR_DB_QUERY);
PQerrorMessage(conn));
}
size = PQgetvalue(res, 0, 0);
else
{
size = PQgetvalue(res, 0, 0);
}
PQclear(res);
return size;
}
@@ -280,23 +315,23 @@ get_cluster_size(PGconn *conn)
* get a connection to master by reading repl_nodes, creating a connection
* to each node (one at a time) and finding if it is a master or a standby
*
* NB: If master_conninfo_out may be NULL. If it is non-null, it is assumed to
* NB: master_conninfo_out may be NULL. If it is non-null, it is assumed to
* point to allocated memory of MAXCONNINFO in length, and the master server
* connection string is placed there.
*/
PGconn *
getMasterConnection(PGconn *standby_conn, char *schema, char *cluster,
int *master_id, char *master_conninfo_out)
get_master_connection(PGconn *standby_conn, char *schema, char *cluster,
int *master_id, char *master_conninfo_out)
{
PGconn *master_conn = NULL;
PGresult *res1;
PGresult *res2;
char sqlquery[QUERY_STR_LEN];
char master_conninfo_stack[MAXCONNINFO];
char *master_conninfo = &*master_conninfo_stack;
char schema_quoted[MAXLEN];
PGconn *master_conn = NULL;
PGresult *res1;
PGresult *res2;
char sqlquery[QUERY_STR_LEN];
char master_conninfo_stack[MAXCONNINFO];
char *master_conninfo = &*master_conninfo_stack;
char schema_quoted[MAXLEN];
int i;
int i;
/*
* If the caller wanted to get a copy of the connection info string, sub
@@ -311,8 +346,8 @@ getMasterConnection(PGconn *standby_conn, char *schema, char *cluster,
* Assemble the quoted schema name
*/
{
char *identifier = PQescapeIdentifier(standby_conn, schema,
strlen(schema));
char *identifier = PQescapeIdentifier(standby_conn, schema,
strlen(schema));
maxlen_snprintf(schema_quoted, "%s", identifier);
PQfreemem(identifier);
@@ -320,20 +355,19 @@ getMasterConnection(PGconn *standby_conn, char *schema, char *cluster,
/* find all nodes belonging to this cluster */
log_info(_("finding node list for cluster '%s'\n"),
cluster);
cluster);
sqlquery_snprintf(sqlquery, "SELECT id, conninfo FROM %s.repl_nodes "
" WHERE cluster = '%s' and not witness",
schema_quoted, cluster);
" WHERE cluster = '%s' and not witness",
schema_quoted, cluster);
res1 = PQexec(standby_conn, sqlquery);
if (PQresultStatus(res1) != PGRES_TUPLES_OK)
{
log_err(_("Can't get nodes info: %s\n"),
PQerrorMessage(standby_conn));
PQerrorMessage(standby_conn));
PQclear(res1);
PQfinish(standby_conn);
exit(ERR_DB_QUERY);
return NULL;
}
for (i = 0; i < PQntuples(res1); i++)
@@ -342,23 +376,23 @@ getMasterConnection(PGconn *standby_conn, char *schema, char *cluster,
*master_id = atoi(PQgetvalue(res1, i, 0));
strncpy(master_conninfo, PQgetvalue(res1, i, 1), MAXCONNINFO);
log_info(_("checking role of cluster node '%s'\n"),
master_conninfo);
master_conn = establishDBConnection(master_conninfo, false);
master_conninfo);
master_conn = establish_db_connection(master_conninfo, false);
if (PQstatus(master_conn) != CONNECTION_OK)
continue;
/*
* Can't use the is_standby() function here because on error that
* function closes the connection passed and exits. This still
* needs to close master_conn first.
* function closes the connection passed and exits. This still needs
* to close master_conn first.
*/
res2 = PQexec(master_conn, "SELECT pg_is_in_recovery()");
if (PQresultStatus(res2) != PGRES_TUPLES_OK)
{
log_err(_("Can't get recovery state from this node: %s\n"),
PQerrorMessage(master_conn));
PQerrorMessage(master_conn));
PQclear(res2);
PQfinish(master_conn);
continue;
@@ -380,14 +414,13 @@ getMasterConnection(PGconn *standby_conn, char *schema, char *cluster,
}
}
/* If we finish this loop without finding a master then
* we doesn't have the info or the master has failed (or we
* reached max_connections or superuser_reserved_connections,
* anything else I'm missing?).
/*
* If we finish this loop without finding a master then we don't have
* the info or the master has failed (or we reached max_connections or
* superuser_reserved_connections, anything else I'm missing?).
*
* Probably we will need to check the error to know if we need
* to start failover procedure or just fix some situation on the
* standby.
* Probably we will need to check the error to know if we need to start
* failover procedure or just fix some situation on the standby.
*/
PQclear(res1);
return NULL;
@@ -395,52 +428,102 @@ getMasterConnection(PGconn *standby_conn, char *schema, char *cluster,
/*
* wait until current query finishes ignoring any results, this could be an async command
* or a cancelation of a query
* wait until current query finishes ignoring any results, this could be an
* async command or a cancelation of a query
* return 1 if Ok; 0 if any error occurred; -1 if timeout reached
*/
int
wait_connection_availability(PGconn *conn, int timeout)
wait_connection_availability(PGconn *conn, long long timeout)
{
PGresult *res;
fd_set read_set;
int sock = PQsocket(conn);
struct timeval tmout,
before,
after;
struct timezone tz;
while(timeout-- >= 0)
/* recalc to microseconds */
timeout *= 1000000;
while (timeout > 0)
{
if (PQconsumeInput(conn) == 0)
{
log_warning(_("PQconsumeInput: Query could not be sent to primary. %s\n"),
PQerrorMessage(conn));
log_warning(_("wait_connection_availability: could not receive data from connection. %s\n"),
PQerrorMessage(conn));
return 0;
}
if (PQisBusy(conn) == 0)
{
res = PQgetResult(conn);
if (res == NULL)
break;
PQclear(res);
do
{
res = PQgetResult(conn);
PQclear(res);
} while (res != NULL);
break;
}
sleep(1);
tmout.tv_sec = 0;
tmout.tv_usec = 250000;
FD_ZERO(&read_set);
FD_SET(sock, &read_set);
gettimeofday(&before, &tz);
if (select(sock + 1, &read_set, NULL, NULL, &tmout) == -1)
{
log_warning(
_("wait_connection_availability: select() returned with error: %s"),
strerror(errno));
return -1;
}
gettimeofday(&after, &tz);
timeout -= (after.tv_sec * 1000000 + after.tv_usec) -
(before.tv_sec * 1000000 + before.tv_usec);
}
if (timeout >= 0)
{
return 1;
else
return -1;
}
log_warning(_("wait_connection_availability: timeout reached"));
return -1;
}
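
Note on the new timeout accounting: the budget is converted to microseconds, each select() call waits at most 250 ms, and the time actually spent is subtracted from the budget. A minimal standalone sketch of that pattern follows, working on a plain file descriptor instead of a libpq connection. wait_readable, elapsed_usec and wait_readable_selfcheck are hypothetical names, and the return convention (1 readable, 0 timeout, -1 error) belongs to this sketch, not to wait_connection_availability().

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Microseconds elapsed between two gettimeofday() samples. */
long long
elapsed_usec(const struct timeval *before, const struct timeval *after)
{
    return ((long long) after->tv_sec * 1000000LL + after->tv_usec) -
           ((long long) before->tv_sec * 1000000LL + before->tv_usec);
}

/* Returns 1 if fd became readable, 0 on timeout, -1 if select() failed. */
int
wait_readable(int fd, long long timeout_secs)
{
    long long   remaining = timeout_secs * 1000000LL;

    assert(fd >= 0 && fd < FD_SETSIZE);
    while (remaining > 0)
    {
        fd_set          read_set;
        struct timeval  slice, before, after;
        int             rc;

        slice.tv_sec = 0;
        slice.tv_usec = 250000;     /* 250 ms slices, as in the diff */
        FD_ZERO(&read_set);
        FD_SET(fd, &read_set);

        gettimeofday(&before, NULL);
        /* first argument is the highest descriptor in any set, plus one */
        rc = select(fd + 1, &read_set, NULL, NULL, &slice);
        gettimeofday(&after, NULL);

        if (rc == -1)
            return -1;
        if (rc > 0)
            return 1;
        remaining -= elapsed_usec(&before, &after);
    }
    return 0;
}

/* Self-check on a pipe: timeout while empty, readiness after a write. */
int
wait_readable_selfcheck(void)
{
    int     fds[2];
    int     ok;

    if (pipe(fds) != 0)
        return 0;
    ok = (wait_readable(fds[0], 1) == 0);
    ok = ok && (write(fds[1], "x", 1) == 1);
    ok = ok && (wait_readable(fds[0], 1) == 1);
    close(fds[0]);
    close(fds[1]);
    return ok;
}
```

Measuring elapsed time around select() keeps the total wait close to the requested budget even when select() returns early, which the previous sleep(1) loop could not do.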
void
CancelQuery(PGconn *conn, int timeout)
bool
cancel_query(PGconn *conn, int timeout)
{
char errbuf[ERRBUFF_SIZE];
PGcancel *pgcancel;
char errbuf[ERRBUFF_SIZE];
PGcancel *pgcancel;
wait_connection_availability(conn, timeout);
if (wait_connection_availability(conn, timeout) != 1)
return false;
pgcancel = PQgetCancel(conn);
if (!pgcancel || PQcancel(pgcancel, errbuf, ERRBUFF_SIZE) == 0)
if (pgcancel == NULL)
return false;
/*
* PQcancel can only return 0 if socket()/connect()/send() fails, in any
* of those cases we can assume something bad happened to the connection
*/
if (PQcancel(pgcancel, errbuf, ERRBUFF_SIZE) == 0)
{
log_warning(_("Can't stop current query: %s\n"), errbuf);
PQfreeCancel(pgcancel);
return false;
}
PQfreeCancel(pgcancel);
return true;
}

View File

@@ -1,6 +1,6 @@
/*
* dbutils.h
* Copyright (c) 2ndQuadrant, 2010-2012
* Copyright (c) 2ndQuadrant, 2010-2014
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -22,20 +22,25 @@
#include "strutil.h"
PGconn *establishDBConnection(const char *conninfo, const bool exit_on_error);
PGconn *establishDBConnectionByParams(const char *keywords[],
const char *values[],
const bool exit_on_error);
bool is_standby(PGconn *conn);
bool is_witness(PGconn *conn, char *schema, char *cluster, int node_id);
bool is_pgup(PGconn *conn, int timeout);
char *pg_version(PGconn *conn, char* major_version);
bool guc_setted(PGconn *conn, const char *parameter, const char *op,
const char *value);
const char *get_cluster_size(PGconn *conn);
PGconn *getMasterConnection(PGconn *standby_conn, char *schema, char *cluster,
int *master_id, char *master_conninfo_out);
PGconn *establish_db_connection(const char *conninfo,
const bool exit_on_error);
PGconn *establish_db_connection_by_params(const char *keywords[],
const char *values[],
const bool exit_on_error);
int is_standby(PGconn *conn);
int is_witness(PGconn *conn, char *schema, char *cluster, int node_id);
bool is_pgup(PGconn *conn, int timeout);
char *pg_version(PGconn *conn, char *major_version);
int guc_set(PGconn *conn, const char *parameter, const char *op,
const char *value);
int guc_set_typed(PGconn *conn, const char *parameter, const char *op,
const char *value, const char *datatype);
const char *get_cluster_size(PGconn *conn);
PGconn *get_master_connection(PGconn *standby_conn, char *schema, char *cluster,
int *master_id, char *master_conninfo_out);
int wait_connection_availability(PGconn *conn, long long timeout);
bool cancel_query(PGconn *conn, int timeout);
int wait_connection_availability(PGconn *conn, int timeout);
void CancelQuery(PGconn *conn, int timeout);
#endif
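
Note on the bool-to-int change: is_standby(), is_witness() and guc_set() now report three outcomes (1, 0, -1) instead of exiting on a failed query, so callers written in the old boolean style would treat -1 as "true". The sketch below illustrates the pitfall with hypothetical helper names; it does not call into repmgr.

```c
#include <assert.h>
#include <string.h>

/* Human-readable form of the three outcomes returned by is_standby(). */
const char *
describe_standby_check(int rc)
{
    switch (rc)
    {
        case 1:
            return "standby";
        case 0:
            return "not in recovery";
        case -1:
            return "query failed";
        default:
            return "unexpected value";
    }
}

/* Old boolean idiom: any non-zero value, including -1, counts as "yes". */
int
naive_is_standby(int rc)
{
    return rc ? 1 : 0;
}

/* Explicit comparison: only a definite 1 counts as "yes". */
int
strict_is_standby(int rc)
{
    return rc == 1;
}
```

Call sites updated for this change should compare against 1, 0 and -1 explicitly and decide locally whether a failed query is fatal.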

View File

@@ -1,9 +1,9 @@
Package: repmgr-auto
Version: 1.0-1
Version: 2.0beta2
Section: database
Priority: optional
Architecture: all
Depends: rsync, postgresql-9.0
Maintainer: Greg Smith <greg@2ndQuadrant.com>
Depends: rsync, postgresql-9.0 | postgresql-9.1 | postgresql-9.2 | postgresql-9.3 | postgresql-9.4
Maintainer: Jaime Casanova <jaime@2ndQuadrant.com>
Description: PostgreSQL replication setup, management and monitoring
has two main executables

18
debian/repmgr.repmgrd.default vendored Normal file
View File

@@ -0,0 +1,18 @@
# default settings for repmgrd. This file is sourced by /bin/sh from
# /etc/init.d/repmgrd
# disable repmgrd by default so it won't get started upon installation
# valid values: yes/no
REPMGRD_ENABLED=no
# Options for repmgrd (required)
#REPMGRD_OPTS="--config-file /path/to/repmgr.conf"
# User to run repmgrd as
#REPMGRD_USER=postgres
# repmgrd binary
#REPMGRD_BIN=/usr/bin/repmgrd
# pid file
#REPMGRD_PIDFILE=/var/run/repmgrd.pid

101
debian/repmgr.repmgrd.init vendored Normal file
View File

@@ -0,0 +1,101 @@
#!/bin/sh
### BEGIN INIT INFO
# Provides: repmgrd
# Required-Start: $local_fs $remote_fs $network $syslog postgresql
# Required-Stop: $local_fs $remote_fs $network $syslog postgresql
# Should-Start: $syslog postgresql
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Start/stop repmgrd
# Description: Enable repmgrd replication management and monitoring daemon for PostgreSQL
### END INIT INFO
set -e
DESC="PostgreSQL replication management and monitoring daemon"
NAME=repmgrd
REPMGRD_ENABLED=no
REPMGRD_OPTS=
REPMGRD_USER=postgres
REPMGRD_BIN=/usr/bin/repmgrd
REPMGRD_PIDFILE=/var/run/repmgrd.pid
# Read configuration variable file if it is present
[ -r /etc/default/$NAME ] && . /etc/default/$NAME
test -x $REPMGRD_BIN || exit 0
case "$REPMGRD_ENABLED" in
[Yy]*)
:
;;
*)
exit 0
;;
esac
# Define LSB log_* functions.
. /lib/lsb/init-functions
if [ -z "$REPMGRD_OPTS" ]
then
log_warning_msg "Not starting $NAME, REPMGRD_OPTS not set in /etc/default/$NAME"
exit 0
fi
do_start()
{
# Return
# 0 if daemon has been started
# 1 if daemon was already running
# other if daemon could not be started or a failure occurred
start-stop-daemon --start --quiet --background --chuid $REPMGRD_USER --make-pidfile --pidfile $REPMGRD_PIDFILE --exec $REPMGRD_BIN -- $REPMGRD_OPTS
}
do_stop()
{
# Return
# 0 if daemon has been stopped
# 1 if daemon was already stopped
# other if daemon could not be stopped or a failure occurred
start-stop-daemon --stop --quiet --retry=TERM/30/KILL/5 --pidfile $REPMGRD_PIDFILE --exec $REPMGRD_BIN
}
case "$1" in
start)
log_daemon_msg "Starting $DESC" "$NAME"
do_start
case "$?" in
0) log_end_msg 0 ;;
1) log_progress_msg "already started"
log_end_msg 0 ;;
*) log_end_msg 1 ;;
esac
;;
stop)
log_daemon_msg "Stopping $DESC" "$NAME"
do_stop
case "$?" in
0) log_end_msg 0 ;;
1) log_progress_msg "already stopped"
log_end_msg 0 ;;
*) log_end_msg 1 ;;
esac
;;
restart|force-reload)
$0 stop
$0 start
;;
status)
status_of_proc -p $REPMGRD_PIDFILE $REPMGRD_BIN $NAME && exit 0 || exit $?
;;
*)
echo "Usage: $0 {start|stop|restart|force-reload|status}" >&2
exit 3
;;
esac
exit 0

View File

@@ -1,6 +1,6 @@
/*
* errcode.h
* Copyright (C) 2ndQuadrant, 2010-2012
* Copyright (C) 2ndQuadrant, 2010-2014
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -34,6 +34,7 @@
#define ERR_BAD_PASSWORD 9
#define ERR_STR_OVERFLOW 10
#define ERR_FAILOVER_FAIL 11
#define ERR_BAD_SSH 12
#define ERR_BAD_SSH 12
#define ERR_SYS_FAILURE 13
#endif /* _ERRCODE_H_ */
#endif /* _ERRCODE_H_ */

134
log.c
View File

@@ -1,6 +1,6 @@
/*
* log.c - Logging methods
* Copyright (C) 2ndQuadrant, 2010-2012
* Copyright (C) 2ndQuadrant, 2010-2014
*
* This module is a set of methods for logging (currently only syslog)
*
@@ -25,9 +25,11 @@
#ifdef HAVE_SYSLOG
#include <syslog.h>
#include <stdarg.h>
#endif
#include <stdarg.h>
#include <time.h>
#include "log.h"
#define DEFAULT_IDENT "repmgr"
@@ -37,20 +39,44 @@
/* #define REPMGR_DEBUG */
static int detect_log_level(const char* level);
static int detect_log_facility(const char* facility);
int log_type = REPMGR_STDERR;
int log_level = LOG_NOTICE;
bool logger_init(const char* ident, const char* level, const char* facility)
void
stderr_log_with_level(const char *level_name, int level, const char *fmt, ...)
{
time_t t;
struct tm *tm;
char buff[100];
va_list ap;
int l;
int f;
if (log_level >= level)
{
time(&t);
tm = localtime(&t);
strftime(buff, 100, "[%Y-%m-%d %H:%M:%S]", tm);
fprintf(stderr, "%s [%s] ", buff, level_name);
va_start(ap, fmt);
vfprintf(stderr, fmt, ap);
va_end(ap);
fflush(stderr);
}
}
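
Note on stderr_log_with_level(): a message is emitted when the configured log_level is numerically greater than or equal to the message level (syslog levels grow with verbosity), and each line is prefixed with a timestamp and the level name. The sketch below reproduces both pieces in isolation. should_log and format_log_prefix are hypothetical names, and gmtime() is used instead of localtime() only so that the output does not depend on the local time zone.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Same filter as in the diff: higher numbers mean more verbose levels. */
int
should_log(int configured_level, int message_level)
{
    return configured_level >= message_level;
}

/*
 * Writes "[YYYY-MM-DD HH:MM:SS] [LEVEL] " into out.
 * Returns the number of characters written, or 0 on failure.
 */
size_t
format_log_prefix(char *out, size_t size, time_t when, const char *level_name)
{
    char        stamp[32];
    struct tm  *tm = gmtime(&when);     /* the diff uses localtime() */
    int         n;

    assert(out != NULL && size > 0 && level_name != NULL);
    out[0] = '\0';
    if (tm == NULL ||
        strftime(stamp, sizeof stamp, "[%Y-%m-%d %H:%M:%S]", tm) == 0)
        return 0;

    n = snprintf(out, size, "%s [%s] ", stamp, level_name);
    return (n < 0 || (size_t) n >= size) ? 0 : (size_t) n;
}
```

With the default of LOG_NOTICE (5 in the fallback definitions in log.h), an error at level 3 passes the filter and a debug message at level 7 does not.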
static int detect_log_level(const char *level);
static int detect_log_facility(const char *facility);
int log_type = REPMGR_STDERR;
int log_level = LOG_NOTICE;
bool
logger_init(t_configuration_options * opts, const char *ident, const char *level, const char *facility)
{
int l;
int f;
#ifdef HAVE_SYSLOG
int syslog_facility = DEFAULT_SYSLOG_FACILITY;
int syslog_facility = DEFAULT_SYSLOG_FACILITY;
#endif
#ifdef REPMGR_DEBUG
@@ -107,21 +133,33 @@ bool logger_init(const char* ident, const char* level, const char* facility)
if (log_type == REPMGR_SYSLOG)
{
setlogmask (LOG_UPTO (log_level));
openlog (ident, LOG_CONS | LOG_PID | LOG_NDELAY, syslog_facility);
setlogmask(LOG_UPTO(log_level));
openlog(ident, LOG_CONS | LOG_PID | LOG_NDELAY, syslog_facility);
stderr_log_notice(_("Setup syslog (level: %s, facility: %s)\n"), level, facility);
}
#endif
if (*opts->logfile)
{
FILE *fd;
fd = freopen(opts->logfile, "a", stderr);
if (fd == NULL)
{
fprintf(stderr, "error reopening stderr to '%s': %s",
opts->logfile, strerror(errno));
}
}
return true;
}
bool logger_shutdown(void)
bool
logger_shutdown(void)
{
#ifdef HAVE_SYSLOG
if (log_type == REPMGR_SYSLOG)
closelog();
@@ -135,13 +173,15 @@ bool logger_shutdown(void)
* options, which might increase requested logging over what's specified
* in the regular configuration file.
*/
void logger_min_verbose(int minimum)
void
logger_min_verbose(int minimum)
{
if (log_level < minimum)
log_level = minimum;
}
int detect_log_level(const char* level)
int
detect_log_level(const char *level)
{
if (!strcmp(level, "DEBUG"))
return LOG_DEBUG;
@@ -163,40 +203,42 @@ int detect_log_level(const char* level)
return 0;
}
int detect_log_facility(const char* facility)
int
detect_log_facility(const char *facility)
{
int local = 0;
int local = 0;
if (!strncmp(facility, "LOCAL", 5) && strlen(facility) == 6)
{
local = atoi (&facility[5]);
local = atoi(&facility[5]);
switch (local)
{
case 0:
return LOG_LOCAL0;
break;
case 1:
return LOG_LOCAL1;
break;
case 2:
return LOG_LOCAL2;
break;
case 3:
return LOG_LOCAL3;
break;
case 4:
return LOG_LOCAL4;
break;
case 5:
return LOG_LOCAL5;
break;
case 6:
return LOG_LOCAL6;
break;
case 7:
return LOG_LOCAL7;
break;
case 0:
return LOG_LOCAL0;
break;
case 1:
return LOG_LOCAL1;
break;
case 2:
return LOG_LOCAL2;
break;
case 3:
return LOG_LOCAL3;
break;
case 4:
return LOG_LOCAL4;
break;
case 5:
return LOG_LOCAL5;
break;
case 6:
return LOG_LOCAL6;
break;
case 7:
return LOG_LOCAL7;
break;
}
}
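
Note on detect_log_facility(): the eight-way switch maps the digit after "LOCAL" to the matching syslog constant. An equivalent lookup table is sketched below as a possible simplification. parse_local_facility is a hypothetical name, and returning -1 for anything that is not LOCAL0 to LOCAL7 is an assumption of this sketch, since the fallthrough behaviour of the original function is not visible in this hunk.

```c
#include <assert.h>
#include <string.h>
#include <syslog.h>

/* Returns LOG_LOCAL0..LOG_LOCAL7 for "LOCAL0".."LOCAL7", otherwise -1. */
int
parse_local_facility(const char *facility)
{
    static const int local_facilities[] = {
        LOG_LOCAL0, LOG_LOCAL1, LOG_LOCAL2, LOG_LOCAL3,
        LOG_LOCAL4, LOG_LOCAL5, LOG_LOCAL6, LOG_LOCAL7
    };

    assert(facility != NULL);
    if (strncmp(facility, "LOCAL", 5) != 0 || strlen(facility) != 6)
        return -1;

    /* checking the digit directly avoids atoi() mapping "LOCALx" to 0 */
    if (facility[5] < '0' || facility[5] > '7')
        return -1;

    return local_facilities[facility[5] - '0'];
}
```

The explicit digit check also rejects inputs such as "LOCALX", which atoi() would silently read as 0.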

59
log.h
View File

@@ -1,6 +1,6 @@
/*
* log.h
* Copyright (c) 2ndQuadrant, 2010-2012
* Copyright (c) 2ndQuadrant, 2010-2014
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -25,15 +25,25 @@
#define REPMGR_SYSLOG 1
#define REPMGR_STDERR 2
#if (PG_VERSION_NUM >= 90100)
void
stderr_log_with_level(const char *level_name, int level, const char *fmt,...)
__attribute__((format(PG_PRINTF_ATTRIBUTE, 3, 4)));
#else
void
stderr_log_with_level(const char *level_name, int level, const char *fmt,...)
__attribute__((format(printf, 3, 4)));
#endif
/* Standard error logging */
#define stderr_log_debug(...) if (log_level >= LOG_DEBUG) fprintf(stderr, __VA_ARGS__)
#define stderr_log_info(...) if (log_level >= LOG_INFO) fprintf(stderr, __VA_ARGS__)
#define stderr_log_notice(...) if (log_level >= LOG_NOTICE) fprintf(stderr, __VA_ARGS__)
#define stderr_log_warning(...) if (log_level >= LOG_WARNING) fprintf(stderr, __VA_ARGS__)
#define stderr_log_err(...) if (log_level >= LOG_ERR) fprintf(stderr, __VA_ARGS__)
#define stderr_log_crit(...) if (log_level >= LOG_CRIT) fprintf(stderr, __VA_ARGS__)
#define stderr_log_alert(...) if (log_level >= LOG_ALERT) fprintf(stderr, __VA_ARGS__)
#define stderr_log_emerg(...) if (log_level >= LOG_EMERG) fprintf(stderr, __VA_ARGS__)
#define stderr_log_debug(...) stderr_log_with_level("DEBUG", LOG_DEBUG, __VA_ARGS__)
#define stderr_log_info(...) stderr_log_with_level("INFO", LOG_INFO, __VA_ARGS__)
#define stderr_log_notice(...) stderr_log_with_level("NOTICE", LOG_NOTICE, __VA_ARGS__)
#define stderr_log_warning(...) stderr_log_with_level("WARNING", LOG_WARNING, __VA_ARGS__)
#define stderr_log_err(...) stderr_log_with_level("ERROR", LOG_ERR, __VA_ARGS__)
#define stderr_log_crit(...) stderr_log_with_level("CRITICAL", LOG_CRIT, __VA_ARGS__)
#define stderr_log_alert(...) stderr_log_with_level("ALERT", LOG_ALERT, __VA_ARGS__)
#define stderr_log_emerg(...) stderr_log_with_level("EMERGENCY", LOG_EMERG, __VA_ARGS__)
#ifdef HAVE_SYSLOG
@@ -86,17 +96,16 @@
if (log_type == REPMGR_SYSLOG) syslog(LOG_ALERT, __VA_ARGS__); \
else stderr_log_alert(__VA_ARGS__); \
}
#else
#define LOG_EMERG 0 /* system is unusable */
#define LOG_ALERT 1 /* action must be taken immediately */
#define LOG_CRIT 2 /* critical conditions */
#define LOG_ERR 3 /* error conditions */
#define LOG_WARNING 4 /* warning conditions */
#define LOG_NOTICE 5 /* normal but significant condition */
#define LOG_INFO 6 /* informational */
#define LOG_DEBUG 7 /* debug-level messages */
#define LOG_EMERG 0 /* system is unusable */
#define LOG_ALERT 1 /* action must be taken immediately */
#define LOG_CRIT 2 /* critical conditions */
#define LOG_ERR 3 /* error conditions */
#define LOG_WARNING 4 /* warning conditions */
#define LOG_NOTICE 5 /* normal but significant condition */
#define LOG_INFO 6 /* informational */
#define LOG_DEBUG 7 /* debug-level messages */
#define log_debug(...) stderr_log_debug(__VA_ARGS__)
#define log_info(...) stderr_log_info(__VA_ARGS__)
@@ -106,16 +115,18 @@
#define log_crit(...) stderr_log_crit(__VA_ARGS__)
#define log_alert(...) stderr_log_alert(__VA_ARGS__)
#define log_emerg(...) stderr_log_emerg(__VA_ARGS__)
#endif
/* Logger initialisation and shutdown */
bool logger_shutdown(void);
bool logger_init(const char* ident, const char* level, const char* facility);
void logger_min_verbose(int minimum);
bool logger_shutdown(void);
extern int log_type;
extern int log_level;
bool logger_init(t_configuration_options * opts, const char *ident,
const char *level, const char *facility);
void logger_min_verbose(int minimum);
extern int log_type;
extern int log_level;
#endif

1779
repmgr.c

File diff suppressed because it is too large

View File

@@ -11,7 +11,8 @@ node_name=standby2
# Connection information
conninfo='host=192.168.204.104'
rsync_options=--archive --checksum --compress --progress --rsh=ssh
rsync_options=--archive --checksum --compress --progress --rsh="ssh -o \"StrictHostKeyChecking no\""
ssh_options=-o "StrictHostKeyChecking no"
# How many seconds we wait for master response before declaring master failure
master_response_timeout=60
@@ -21,10 +22,10 @@ reconnect_attempts=6
reconnect_interval=10
# Autofailover options
failover=automatic
failover=manual
priority=-1
promote_command='repmgr promote'
follow_command='repmgr follow'
promote_command='repmgr standby promote -f /path/to/repmgr.conf'
follow_command='repmgr standby follow -f /path/to/repmgr.conf -W'
# Log level: possible values are DEBUG, INFO, NOTICE, WARNING, ERR, ALERT, CRIT or EMERG
# Default: NOTICE
@@ -33,3 +34,29 @@ loglevel=NOTICE
# Logging facility: possible values are STDERR or - for Syslog integration - one of LOCAL0, LOCAL1, ..., LOCAL7, USER
# Default: STDERR
logfacility=STDERR
# path to pg_ctl executable
pg_bindir=/usr/bin/
#
# you may add command line arguments for pg_ctl
#
# pg_ctl_options='-s'
#
# redirect stderr to a logfile
#
# logfile='/var/log/repmgr.log'
#
# change monitoring interval; default is 2s
#
# monitor_interval_secs=2
#
# change wait time for master; before we bail out and exit when the
# master disappears, we wait 6 * retry_promote_interval_secs seconds;
# by default this would be half an hour (since sleep_delay default
# value is 300)
#
# retry_promote_interval_secs=300
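
Note on retry_promote_interval_secs: the comment above gives the total wait as 6 times the interval. The arithmetic is sketched below; PROMOTE_RETRIES and total_master_wait_secs are hypothetical names, and the multiplier 6 is taken from the comment, not from repmgrd.c (whose diff is suppressed on this page).

```c
#include <assert.h>

#define PROMOTE_RETRIES 6   /* multiplier quoted in the config comment */

/* Total seconds waited for the master before giving up. */
int
total_master_wait_secs(int retry_promote_interval_secs)
{
    assert(retry_promote_interval_secs >= 0);
    return PROMOTE_RETRIES * retry_promote_interval_secs;
}
```

With the default of 300 seconds this gives 1800 seconds, the half hour mentioned in the comment.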

View File

@@ -1,6 +1,6 @@
/*
* repmgr.h
* Copyright (c) 2ndQuadrant, 2010-2012
* Copyright (c) 2ndQuadrant, 2010-2014
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -50,24 +50,28 @@
typedef struct
{
char dbname[MAXLEN];
char host[MAXLEN];
char username[MAXLEN];
char dest_dir[MAXFILENAME];
char config_file[MAXFILENAME];
char remote_user[MAXLEN];
char wal_keep_segments[MAXLEN];
bool verbose;
bool force;
bool ignore_rsync_warn;
char dbname[MAXLEN];
char host[MAXLEN];
char username[MAXLEN];
char dest_dir[MAXFILENAME];
char config_file[MAXFILENAME];
char remote_user[MAXLEN];
char superuser[MAXLEN];
char wal_keep_segments[MAXLEN];
bool verbose;
bool force;
bool wait_for_master;
bool ignore_rsync_warn;
bool initdb_no_pwprompt;
bool fast_checkpoint;
char masterport[MAXLEN];
char localport[MAXLEN];
char masterport[MAXLEN];
char localport[MAXLEN];
/* parameter used by CLUSTER CLEANUP */
int keep_history;
} t_runtime_options;
int keep_history;
} t_runtime_options;
#define SLEEP_MONITOR 2
#define T_RUNTIME_OPTIONS_INITIALIZER { "", "", "", "", "", "", "", DEFAULT_WAL_KEEP_SEGMENTS, false, false, false, false, false, false, "", "", 0}
#endif

View File

@@ -1,7 +1,7 @@
/*
* repmgr.sql
*
* Copyright (C) 2ndQuadrant, 2010-2012
* Copyright (C) 2ndQuadrant, 2010-2014
*
*/

1480
repmgrd.c

File diff suppressed because it is too large

View File

@@ -9,7 +9,8 @@ DATA=uninstall_repmgr_funcs.sql
OBJS=repmgr_funcs.o
ifdef USE_PGXS
PGXS := $(shell pg_config --pgxs)
PG_CONFIG = pg_config
PGXS := $(shell $(PG_CONFIG) --pgxs)
include $(PGXS)
else
subdir = contrib/repmgr/sql

View File

@@ -15,9 +15,10 @@
#include "storage/shmem.h"
#include "storage/spin.h"
#include "utils/builtins.h"
#include "utils/timestamp.h"
/* same definition as the one in xlog_internal.h */
#define MAXFNAMELEN 64
#define MAXFNAMELEN 64
PG_MODULE_MAGIC;
@@ -26,29 +27,36 @@ PG_MODULE_MAGIC;
*/
typedef struct repmgrSharedState
{
LWLockId lock; /* protects search/modification */
char location[MAXFNAMELEN]; /* last known xlog location */
} repmgrSharedState;
LWLockId lock; /* protects search/modification */
char location[MAXFNAMELEN]; /* last known xlog location */
TimestampTz last_updated;
} repmgrSharedState;
/* Links to shared memory state */
static repmgrSharedState *shared_state = NULL;
static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
void _PG_init(void);
void _PG_fini(void);
void _PG_init(void);
void _PG_fini(void);
static void repmgr_shmem_startup(void);
static Size repmgr_memsize(void);
static bool repmgr_set_standby_location(char *locationstr);
Datum repmgr_update_standby_location(PG_FUNCTION_ARGS);
Datum repmgr_get_last_standby_location(PG_FUNCTION_ARGS);
Datum repmgr_update_standby_location(PG_FUNCTION_ARGS);
Datum repmgr_get_last_standby_location(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(repmgr_update_standby_location);
PG_FUNCTION_INFO_V1(repmgr_get_last_standby_location);
Datum repmgr_update_last_updated(PG_FUNCTION_ARGS);
Datum repmgr_get_last_updated(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(repmgr_update_last_updated);
PG_FUNCTION_INFO_V1(repmgr_get_last_updated);
/*
* Module load callback
@@ -60,9 +68,9 @@ _PG_init(void)
* In order to create our shared memory area, we have to be loaded via
* shared_preload_libraries. If not, fall out without hooking into any of
* the main system. (We don't throw error here because it seems useful to
* allow the repmgr functions to be created even when the
* module isn't active. The functions must protect themselves against
* being called then, however.)
* allow the repmgr functions to be created even when the module isn't
* active. The functions must protect themselves against being called
* then, however.)
*/
if (!process_shared_preload_libraries_in_progress)
return;
@@ -112,15 +120,15 @@ repmgr_shmem_startup(void)
LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
shared_state = ShmemInitStruct("repmgr shared state",
sizeof(repmgrSharedState),
&found);
sizeof(repmgrSharedState),
&found);
if (!found)
{
/* First time through ... */
shared_state->lock = LWLockAssign();
snprintf(shared_state->location,
sizeof(shared_state->location), "%X/%X", 0, 0);
sizeof(shared_state->location), "%X/%X", 0, 0);
}
LWLockRelease(AddinShmemInitLock);
@@ -133,20 +141,20 @@ repmgr_shmem_startup(void)
static Size
repmgr_memsize(void)
{
return MAXALIGN(sizeof(repmgrSharedState));
return MAXALIGN(sizeof(repmgrSharedState));
}
static bool
repmgr_set_standby_location(char *locationstr)
{
/* Safety check... */
if (!shared_state)
return false;
/* Safety check... */
if (!shared_state)
return false;
LWLockAcquire(shared_state->lock, LW_EXCLUSIVE);
LWLockAcquire(shared_state->lock, LW_EXCLUSIVE);
strncpy(shared_state->location, locationstr, MAXFNAMELEN);
LWLockRelease(shared_state->lock);
LWLockRelease(shared_state->lock);
return true;
}
@@ -158,7 +166,7 @@ repmgr_set_standby_location(char *locationstr)
Datum
repmgr_get_last_standby_location(PG_FUNCTION_ARGS)
{
char location[MAXFNAMELEN];
char location[MAXFNAMELEN];
/* Safety check... */
if (!shared_state)
@@ -176,14 +184,49 @@ repmgr_get_last_standby_location(PG_FUNCTION_ARGS)
Datum
repmgr_update_standby_location(PG_FUNCTION_ARGS)
{
text *location = PG_GETARG_TEXT_P(0);
char *locationstr;
text *location = PG_GETARG_TEXT_P(0);
char *locationstr;
/* Safety check... */
if (!shared_state)
PG_RETURN_BOOL(false);
/* Safety check... */
if (!shared_state)
PG_RETURN_BOOL(false);
locationstr = text_to_cstring(location);
locationstr = text_to_cstring(location);
PG_RETURN_BOOL(repmgr_set_standby_location(locationstr));
}
/* update and return last updated with current timestamp */
Datum
repmgr_update_last_updated(PG_FUNCTION_ARGS)
{
TimestampTz last_updated = GetCurrentTimestamp();
/* Safety check... */
if (!shared_state)
PG_RETURN_NULL();
LWLockAcquire(shared_state->lock, LW_SHARED);
shared_state->last_updated = last_updated;
LWLockRelease(shared_state->lock);
PG_RETURN_TIMESTAMPTZ(last_updated);
}
/* get last updated timestamp */
Datum
repmgr_get_last_updated(PG_FUNCTION_ARGS)
{
TimestampTz last_updated;
/* Safety check... */
if (!shared_state)
PG_RETURN_NULL();
LWLockAcquire(shared_state->lock, LW_EXCLUSIVE);
last_updated = shared_state->last_updated;
LWLockRelease(shared_state->lock);
PG_RETURN_TIMESTAMPTZ(last_updated);
}


@@ -1,6 +1,6 @@
/*
* repmgr_function.sql
* Copyright (c) 2ndQuadrant, 2010
* Copyright (c) 2ndQuadrant, 2010-2014
*
*/
@@ -13,3 +13,11 @@ LANGUAGE C STRICT;
CREATE FUNCTION repmgr_get_last_standby_location() RETURNS text
AS 'MODULE_PATHNAME', 'repmgr_get_last_standby_location'
LANGUAGE C STRICT;
CREATE FUNCTION repmgr_update_last_updated() RETURNS TIMESTAMP WITH TIME ZONE
AS 'MODULE_PATHNAME', 'repmgr_update_last_updated'
LANGUAGE C STRICT;
CREATE FUNCTION repmgr_get_last_updated() RETURNS TIMESTAMP WITH TIME ZONE
AS 'MODULE_PATHNAME', 'repmgr_get_last_updated'
LANGUAGE C STRICT;


@@ -1,2 +1,11 @@
/*
* uninstall_repmgr_funcs.sql
* Copyright (c) 2ndQuadrant, 2010-2014
*
*/
DROP FUNCTION repmgr_update_standby_location(text);
DROP FUNCTION repmgr_get_last_standby_location();
DROP FUNCTION repmgr_update_last_updated();
DROP FUNCTION repmgr_get_last_updated();


@@ -1,7 +1,7 @@
/*
* strutil.c
*
* Copyright (C) 2ndQuadrant, 2010-2012
* Copyright (C) 2ndQuadrant, 2010-2014
*
* This program is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
@@ -25,29 +25,27 @@
#include "log.h"
#include "strutil.h"
static int xvsnprintf(char *str, size_t size, const char *format, va_list ap);
/* Add strnlen on platforms that don't have it, like OS X */
#ifndef strnlen
size_t
strnlen(const char *s, size_t n)
{
const char *end = (const char *) memchr(s, '\0', n);
return(end ? end - s : n);
}
#if (PG_VERSION_NUM >= 90100)
static int
xvsnprintf(char *str, size_t size, const char *format, va_list ap)
__attribute__((format(PG_PRINTF_ATTRIBUTE, 3, 0)));
#else
static int
xvsnprintf(char *str, size_t size, const char *format, va_list ap)
__attribute__((format(printf, 3, 0)));
#endif
static int
xvsnprintf(char *str, size_t size, const char *format, va_list ap)
{
int retval;
int retval;
retval = vsnprintf(str, size, format, ap);
if (retval >= size)
if (retval >= (int) size)
{
log_err(_("Buffer of size not large enough to format entire string '%s'\n"),
str);
str);
exit(ERR_STR_OVERFLOW);
}
@@ -56,10 +54,10 @@ xvsnprintf(char *str, size_t size, const char *format, va_list ap)
int
xsnprintf(char *str, size_t size, const char *format, ...)
xsnprintf(char *str, size_t size, const char *format,...)
{
va_list arglist;
int retval;
va_list arglist;
int retval;
va_start(arglist, format);
retval = xvsnprintf(str, size, format, arglist);
@@ -70,7 +68,7 @@ xsnprintf(char *str, size_t size, const char *format, ...)
int
sqlquery_snprintf(char *str, const char *format, ...)
sqlquery_snprintf(char *str, const char *format,...)
{
va_list arglist;
int retval;
@@ -83,7 +81,8 @@ sqlquery_snprintf(char *str, const char *format, ...)
}
int maxlen_snprintf(char *str, const char *format, ...)
int
maxlen_snprintf(char *str, const char *format,...)
{
va_list arglist;
int retval;
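Review note on the truncation check in `xvsnprintf()`: the change from `retval >= size` to `retval >= (int) size` removes a signed/unsigned comparison. `vsnprintf` returns the length the full output would have had, excluding the terminating null, or a negative value on error. An alternative way to write the check, shown here only as a sketch, is to reject negative results first and then compare as `size_t`, which avoids narrowing `size`. `demo_checked_snprintf` is a hypothetical helper; unlike the repmgr code it returns -1 instead of logging and exiting.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format into str (capacity: size bytes). Returns the number of characters
 * written, or -1 if formatting failed or the output was truncated.
 */
static int
demo_checked_snprintf(char *str, size_t size, const char *format, ...)
{
	va_list		ap;
	int			retval;

	assert(str != NULL && format != NULL);

	va_start(ap, format);
	retval = vsnprintf(str, size, format, ap);
	va_end(ap);

	/* Negative: output error. retval >= size: output did not fit. */
	if (retval < 0 || (size_t) retval >= size)
		return -1;

	return retval;
}
```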


@@ -1,6 +1,6 @@
/*
* strutil.h
* Copyright (C) 2ndQuadrant, 2010-2012
* Copyright (C) 2ndQuadrant, 2010-2014
*
*
* This program is free software: you can redistribute it and/or modify
@@ -22,7 +22,7 @@
#define _STRUTIL_H_
#include <stdlib.h>
#include <errcode.h>
#include "errcode.h"
#define QUERY_STR_LEN 8192
#define MAXLEN 1024
@@ -31,13 +31,30 @@
#define MAXCONNINFO 1024
extern int xsnprintf(char *str, size_t size, const char *format, ...);
extern int sqlquery_snprintf(char *str, const char *format, ...);
extern int maxlen_snprintf(char *str, const char *format, ...);
#if (PG_VERSION_NUM >= 90100)
extern int
xsnprintf(char *str, size_t size, const char *format,...)
__attribute__((format(PG_PRINTF_ATTRIBUTE, 3, 4)));
/* Add strnlen on platforms that don't have it, like OS X */
#ifndef strnlen
extern size_t strnlen(const char *s, size_t n);
extern int
sqlquery_snprintf(char *str, const char *format,...)
__attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3)));
extern int
maxlen_snprintf(char *str, const char *format,...)
__attribute__((format(PG_PRINTF_ATTRIBUTE, 2, 3)));
#else
extern int
xsnprintf(char *str, size_t size, const char *format,...)
__attribute__((format(printf, 3, 4)));
extern int
sqlquery_snprintf(char *str, const char *format,...)
__attribute__((format(printf, 2, 3)));
extern int
maxlen_snprintf(char *str, const char *format,...)
__attribute__((format(printf, 2, 3)));
#endif
#endif /* _STRUTIL_H_ */
#endif /* _STRUTIL_H_ */
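Review note on the new declarations in `strutil.h`: the `format` attribute lets GCC-compatible compilers check each call's arguments against the format string. The two numbers are the 1-based position of the format parameter and of the first variadic argument, which is why `xsnprintf` uses `(3, 4)` and the two-parameter wrappers use `(2, 3)`. A minimal sketch of the pattern follows; `DEMO_PRINTF_LIKE`, `DEMO_MAXLEN` and `demo_maxlen_snprintf` are hypothetical names, and bounding the output to a fixed maximum is an assumption based on the name `maxlen_snprintf`, whose body is not shown in this diff.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Expand to the attribute only on compilers that understand it. */
#if defined(__GNUC__)
#define DEMO_PRINTF_LIKE(fmt_idx, first_arg) \
	__attribute__((format(printf, fmt_idx, first_arg)))
#else
#define DEMO_PRINTF_LIKE(fmt_idx, first_arg)
#endif

#define DEMO_MAXLEN 1024		/* same value as MAXLEN in the header */

/* Parameter 2 is the format string; variadic arguments start at 3. */
static int
demo_maxlen_snprintf(char *str, const char *format, ...)
DEMO_PRINTF_LIKE(2, 3);

static int
demo_maxlen_snprintf(char *str, const char *format, ...)
{
	va_list		ap;
	int			retval;

	assert(str != NULL && format != NULL);

	va_start(ap, format);
	retval = vsnprintf(str, DEMO_MAXLEN, format, ap);
	va_end(ap);

	return retval;
}
```

With the attribute in place, a call such as `demo_maxlen_snprintf(buf, "%s", 42)` should draw a `-Wformat` warning from GCC or Clang instead of misbehaving at run time.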


@@ -1,7 +1,7 @@
/*
* uninstall_repmgr.sql
*
* Copyright (C) 2ndQuadrant, 2010-2012
* Copyright (C) 2ndQuadrant, 2010-2014
*
*/


@@ -1,4 +1,6 @@
#ifndef _VERSION_H_
#define _VERSION_H_
#define REPMGR_VERSION "2.0beta1"
#define REPMGR_VERSION "2.0.3"
#endif