Compare commits

...

33 Commits

Author SHA1 Message Date
Ian Barwick
ef382dfede doc: update release notes 2019-12-10 16:35:59 +09:00
Ian Barwick
bc93d2996c standby follow: don't attempt to delete slot if current upstream is primary
An attempt will be made to delete an existing replication slot on the
old upstream node (this is important during e.g. a switchover operation
or when attaching a cascaded standby to a new upstream). However if the
standby is currently attached to the follow target node anyway, the
replication slot should never be deleted.
2019-12-10 15:48:49 +09:00
Ian Barwick
0946073406 doc: remove references to putative 4.3.1 release
The changes listed will be part of the upcoming 4.4 release.
2019-05-24 10:08:27 +09:00
Ian Barwick
129c8782a4 doc: update appendix "Installing old package versions"
Move legacy 3.x package info to separate section.
2019-05-24 10:05:10 +09:00
Ian Barwick
5493055a1d doc: fix typos in source install instructions
s/llib/lib/g
2019-05-10 10:30:29 +09:00
Ian Barwick
a50f0e7cc0 doc: update release notes 2019-05-07 15:49:34 +09:00
Ian Barwick
adfde1b681 doc: add &repmgrd; as entity 2019-05-07 15:26:26 +09:00
Ian Barwick
d4b17635fe doc: update release notes 2019-05-07 15:25:54 +09:00
Ian Barwick
e4c573a7f6 repmgrd: add missing PQfinish() calls 2019-05-02 18:52:37 +09:00
Frantisek Holop
492665e34c doc: promote -> follow
PR #565
2019-04-30 10:53:14 +09:00
Frantisek Holop
2d7c38e2ef doc: bit too many e.g.'s
PR #565.
2019-04-30 10:52:45 +09:00
Ian Barwick
9ee2448583 standby switchover: ignore nodes which are unreachable and marked as inactive
Previously "repmgr standby switchover" would abort if any node was unreachable,
as that means it was unable to check if repmgrd is running.

However if the node has been marked as inactive in the repmgr metadata, it's
reasonable to assume the node is no longer part of the replication cluster
and does not need to be checked.
2019-04-29 14:35:58 +09:00
Ian Barwick
cf9458161f Update HISTORY 2019-04-25 14:58:05 +09:00
Ian Barwick
67dc42d2ad Clarify hints about updating the repmgr extension 2019-04-24 11:39:06 +09:00
Ian Barwick
3b96b2afce "primary register": ensure --force works if another primary is registered but not running 2019-04-23 22:06:28 +09:00
Ian Barwick
216f274c15 doc: add note about when a PostgreSQL restart is required
Per query in GitHub #564.
2019-04-17 09:44:29 +09:00
Ian Barwick
8cb101be1d repmgrd: exclude witness server from followability check 2019-04-11 11:20:45 +09:00
Ian Barwick
03b29908e2 Miscellaneous string handling cleanup
This is mainly to prevent effectively spurious truncation warnings
in recent GCC versions.
2019-04-10 16:21:09 +09:00
Ian Barwick
99be03f000 Fix hint message
s/UPGRADE/UPDATE
2019-04-10 12:15:53 +09:00
Ian Barwick
7aaac343f8 standby clone: always ensure directory is created with correct permissions
In Barman mode, if there is an existing, populated data directory, and
the "--force" option is provided, the entire directory was being deleted,
and later recreated as part of the rsync process, but with the default
permissions.

Fix this by recreating the data directory with the correct permissions
after deleting it.
2019-04-09 11:02:06 +09:00
Ian Barwick
68470a9167 standby clone: improve --dry-run behaviour in barman mode
- emit additional informational output
- ensure that provision of --force does not result in an existing
  data directory being modified in any way
2019-04-08 15:13:13 +09:00
Ian Barwick
35320c27bd doc: update release notes 2019-04-08 11:28:13 +09:00
Ian Barwick
b7b9db7e9c Ensure BDR-specific code only runs on BDR 2.x
The BDR support in repmgr is for a specific BDR 2.x use case, and
is not suitable for more recent BDR versions.
2019-04-08 11:28:10 +09:00
Ian Barwick
01e11950a5 doc: add note about BDR replication type in sample config 2019-04-08 11:28:05 +09:00
Ian Barwick
fcaee6e6e8 doc: emphasise that BDR2 support is for BDR2 only 2019-04-05 10:59:20 +09:00
Ian Barwick
538d5f9df0 Use correct sizeof() argument in a couple of strncpy calls
Source and destination buffers are however the same length in both cases.

Per GitHub #561.
2019-04-04 11:03:01 +09:00
Ian Barwick
4e8b94c105 doc: update 4.3 release notes 2019-04-03 15:09:09 +09:00
Ian Barwick
9ee51bb0cb doc: update README
Link to current documentation version
2019-04-03 10:58:19 +09:00
Ian Barwick
bab07cdda1 doc: add a link to the current documentation from the contents page 2019-04-03 10:52:19 +09:00
Ian Barwick
b03f07ca8f doc: finalize 4.3 release notes 2019-04-02 14:42:25 +09:00
Ian Barwick
39fbe02c48 doc: note that --siblings-follow will become default in a future release 2019-04-02 11:04:59 +09:00
Ian Barwick
2249b79811 standby switchover: list nodes which will remain attatched to the old primary
If --siblings-follow is not supplied, list all nodes which repmgr considers
to be siblings (this will include the witness server, if in use), and
which will remain attached to the old primary.
2019-04-02 10:47:05 +09:00
Ian Barwick
bb0fd944ae doc: update quickstart guide
Improve sample PostgreSQL replication configuration, including
links to the PostgreSQL documentation for each configuration item.

Also set "max_replication_slots" to the same value as "max_wal_senders"
to ensure the sample configuration will work regardless of whether
replication slots are in use, though we do still encourage careful
reading of the comments in the sample configuration and the documentation
in general.
2019-04-02 09:10:05 +09:00
24 changed files with 421 additions and 201 deletions

View File

@@ -1,4 +1,8 @@
4.3 2019-??
4.3.1 2019-12-??
repmgr: ensure an existing replication slot is not deleted if the
follow target is the node's current upstream (Ian)
4.3 2019-04-02
repmgr: add "daemon (start|stop)" command; GitHub #528 (Ian)
repmgr: add --version-number command line option (Ian)
repmgr: add --compact option to "cluster show"; GitHub #521 (Ian)
@@ -26,6 +30,7 @@
repmgrd: improve witness monitoring when primary node not available (Ian)
repmgrd: handle situation where a primary has unexpectedly appeared
during failover; GitHub #420 (Ian)
general: fix Makefile (John)
4.2 2018-10-24
repmgr: add parameter "shutdown_check_timeout" for use by "standby switchover";

View File

@@ -27,7 +27,7 @@ Documentation
The main `repmgr` documentation is available here:
> [repmgr 4 documentation](https://repmgr.org/docs/4.2/index.html)
> [repmgr documentation](https://repmgr.org/docs/current/index.html)
The `README` file for `repmgr` 3.x is available here:
@@ -72,7 +72,7 @@ Please report bugs and other issues to:
* https://github.com/2ndQuadrant/repmgr
Further information is available at https://www.repmgr.org/
Further information is available at https://repmgr.org/
We'd love to hear from you about how you use repmgr. Case studies and
news are always welcome. Send us an email at info@2ndQuadrant.com, or
@@ -97,6 +97,7 @@ Thanks from the repmgr core team.
Further reading
---------------
* [repmgr documentation](https://repmgr.org/docs/current/index.html)
* https://blog.2ndquadrant.com/repmgr-3-2-is-here-barman-support-brand-new-high-availability-features/
* https://blog.2ndquadrant.com/improvements-in-repmgr-3-1-4/
* https://blog.2ndquadrant.com/managing-useful-clusters-repmgr/

View File

@@ -836,15 +836,16 @@ _parse_config(t_configuration_options *options, ItemList *error_list, ItemList *
conninfo_options = PQconninfoParse(options->conninfo, &conninfo_errmsg);
if (conninfo_options == NULL)
{
char error_message_buf[MAXLEN] = "";
PQExpBufferData error_message_buf;
initPQExpBuffer(&error_message_buf);
snprintf(error_message_buf,
MAXLEN,
_("\"conninfo\": %s (provided: \"%s\")"),
conninfo_errmsg,
options->conninfo);
appendPQExpBuffer(&error_message_buf,
_("\"conninfo\": %s (provided: \"%s\")"),
conninfo_errmsg,
options->conninfo);
item_list_append(error_list, error_message_buf);
item_list_append(error_list, error_message_buf.data);
termPQExpBuffer(&error_message_buf);
}
PQconninfoFree(conninfo_options);

View File

@@ -1521,10 +1521,10 @@ get_ready_archive_files(PGconn *conn, const char *data_directory)
while ((arcdir_ent = readdir(arcdir)) != NULL)
{
struct stat statbuf;
char file_path[MAXPGPATH] = "";
char file_path[MAXPGPATH + sizeof(arcdir_ent->d_name)];
int basenamelen = 0;
snprintf(file_path, MAXPGPATH,
snprintf(file_path, sizeof(file_path),
"%s/%s",
archive_status_dir,
arcdir_ent->d_name);

View File

@@ -336,6 +336,15 @@ create_pg_dir(const char *path, bool force)
{
log_notice(_("-F/--force provided - deleting existing data directory \"%s\""), path);
nftw(path, unlink_dir_callback, 64, FTW_DEPTH | FTW_PHYS);
/* recreate the directory ourselves to ensure permissions are correct */
if (!create_dir(path))
{
log_error(_("unable to create directory \"%s\"..."),
path);
return false;
}
return true;
}
@@ -347,6 +356,15 @@ create_pg_dir(const char *path, bool force)
{
log_notice(_("deleting existing directory \"%s\""), path);
nftw(path, unlink_dir_callback, 64, FTW_DEPTH | FTW_PHYS);
/* recreate the directory ourselves to ensure permissions are correct */
if (!create_dir(path))
{
log_error(_("unable to create directory \"%s\"..."),
path);
return false;
}
return true;
}
return false;

View File

@@ -481,28 +481,6 @@ repmgr96-4.1.1-0.0git320.g5113ab0.1.el7.x86_64.rpm</programlisting>
<sect2 id="packages-old-versions-rhel-centos" xreflabel="old RHEL/CentOS package versions">
<title>RHEL/CentOS</title>
<para>
Old RPM packages (<literal>3.2</literal> and later) can be retrieved from the
(deprecated) 2ndQuadrant repository at
<ulink url="http://packages.2ndquadrant.com/">http://packages.2ndquadrant.com/</ulink>
by installing the appropriate repository RPM:
</para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<ulink url="http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-fedora-1.0-1.noarch.rpm">http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-fedora-1.0-1.noarch.rpm</ulink>
</simpara>
</listitem>
<listitem>
<simpara>
<ulink url="http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-rhel-1.0-1.noarch.rpm">http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-rhel-1.0-1.noarch.rpm</ulink>
</simpara>
</listitem>
</itemizedlist>
<para>
Old versions can be located with e.g.:
@@ -520,6 +498,32 @@ repmgr96-4.1.1-0.0git320.g5113ab0.1.el7.x86_64.rpm</programlisting>
yum install repmgr96-4.0.6-1.rhel6</programlisting>
</para>
<sect3 id="packages-old-versions-rhel-centos-repmgr3">
<title>repmgr 3 packages</title>
<para>
Old &repmgr; 3 RPM packages (<literal>3.2</literal> and later) can be retrieved from the
(deprecated) 2ndQuadrant repository at
<ulink url="http://packages.2ndquadrant.com/repmgr/yum/">http://packages.2ndquadrant.com/repmgr/yum/</ulink>
by installing the appropriate repository RPM:
</para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<ulink url="http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-fedora-1.0-1.noarch.rpm">http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-fedora-1.0-1.noarch.rpm</ulink>
</simpara>
</listitem>
<listitem>
<simpara>
<ulink url="http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-rhel-1.0-1.noarch.rpm">http://packages.2ndquadrant.com/repmgr/yum-repo-rpms/repmgr-rhel-1.0-1.noarch.rpm</ulink>
</simpara>
</listitem>
</itemizedlist>
</sect3>
</sect2>
</sect1>

View File

@@ -14,13 +14,47 @@
<para>
See also: <xref linkend="upgrading-repmgr">
</para>
<sect1 id="release-4.3.1">
<title>Release 4.3.1</title>
<para><emphasis>??? December ??, 2019</emphasis></para>
<para>
&repmgr; 4.3.1 is a minor release.
</para>
<sect2>
<title>Bug fixes</title>
<para>
<itemizedlist>
<listitem>
<para>
<command><link linkend="repmgr-standby-follow">repmgr standby follow</link></command>:
ensure an existing replication slot is not deleted if the
follow target is the node's current upstream.
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
</sect1>
<sect1 id="release-4.3">
<title>Release 4.3</title>
<para><emphasis>Mar ???, 2019</emphasis></para>
<para><emphasis>Tue April 2, 2019</emphasis></para>
<para>
&repmgr; 4.3 is a major release.
</para>
<para>
For details on how to upgrade an existing &repmgr; instrallation, see
documentation section <link linkend="upgrading-major-version">Upgrading a major version release</link>.
</para>
<para>
If <application>repmgrd</application> is in use, a PostgreSQL restart <emphasis>is</emphasis> required;
in that case we suggest combining this &repmgr; upgrade with the next PostgreSQL
minor release, which will require a PostgreSQL restart in any case.
</para>
<important>
<para>
@@ -37,7 +71,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
</important>
<sect2>
<title>repmgr enhancements</title>
<title>repmgr client enhancements</title>
<para>
<itemizedlist>
@@ -191,7 +225,7 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<listitem>
<para>
In a failover situation, <application>repmgrd</application> will not attempt to promote a
node if another standby has already appeared (e.g. by being promoted manually).
node if another primary has already appeared (e.g. by being promoted manually).
GitHub #420.
</para>
</listitem>
@@ -205,52 +239,6 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
<para>
<itemizedlist>
<listitem>
<para>
&repmgr;: when executing <command><link linkend="repmgr-standby-switchover">repmgr standby switchover</link></command>,
prevent escaping issues with connection URIs when executing <command><link linkend="repmgr-node-rejoin">repmgr node rejoin</link></command>
on the demotion candidate. GitHub #525.
</para>
</listitem>
<listitem>
<para>
&repmgr;: when executing <command><link linkend="repmgr-witness-register">repmgr witness register</link></command>,
check the node to connected is actually the primary (i.e. not the witness server). GitHub #528.
</para>
</listitem>
<listitem>
<para>
&repmgr;: when executing <link linkend="repmgr-standby-clone"><command>repmgr standby clone</command></link>,
recheck primary/upstream connection(s) after the data copy operation is complete, as these may
have gone away.
</para>
</listitem>
<listitem>
<para>
&repmgr;: when executing <link linkend="repmgr-standby-switchover"><command>repmgr standby switchover</command></link>,
avoid a potential race condition when comparing received WAL on the standby to the primary's shutdown location,
as the standby's walreceiver may not have yet flushed all received WAL to disk. GitHub #518.
</para>
</listitem>
<listitem>
<para>
&repmgr;: when executing <link linkend="repmgr-standby-switchover"><command>repmgr standby switchover</command></link>,
verify the standby (promotion candidate) is currently attached to the primary (demotion candidate). GitHub #519.
</para>
</listitem>
<listitem>
<para>
<application>repmgrd</application>: on a cascaded standby, don't fail over if
<literal>failover=manual</literal>. GitHub #531.
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-cluster-show">repmgr cluster show</link></command>:
@@ -272,6 +260,44 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
</para>
</listitem>
<listitem>
<para>
&repmgr;: when executing <link linkend="repmgr-standby-clone"><command>repmgr standby clone</command></link>,
recheck primary/upstream connection(s) after the data copy operation is complete, as these may
have gone away.
</para>
</listitem>
<listitem>
<para>
&repmgr;: when executing <command><link linkend="repmgr-standby-switchover">repmgr standby switchover</link></command>,
prevent escaping issues with connection URIs when executing <command><link linkend="repmgr-node-rejoin">repmgr node rejoin</link></command>
on the demotion candidate. GitHub #525.
</para>
</listitem>
<listitem>
<para>
&repmgr;: when executing <link linkend="repmgr-standby-switchover"><command>repmgr standby switchover</command></link>,
verify the standby (promotion candidate) is currently attached to the primary (demotion candidate). GitHub #519.
</para>
</listitem>
<listitem>
<para>
&repmgr;: when executing <link linkend="repmgr-standby-switchover"><command>repmgr standby switchover</command></link>,
avoid a potential race condition when comparing received WAL on the standby to the primary's shutdown location,
as the standby's walreceiver may not have yet flushed all received WAL to disk. GitHub #518.
</para>
</listitem>
<listitem>
<para>
&repmgr;: when executing <command><link linkend="repmgr-witness-register">repmgr witness register</link></command>,
check the node to connected is actually the primary (i.e. not the witness server). GitHub #528.
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-node-check">repmgr node check</link></command>
@@ -281,6 +307,13 @@ REPMGRD_OPTS="--daemonize=false"</programlisting>
</para>
</listitem>
<listitem>
<para>
<application>repmgrd</application>: on a cascaded standby, don't fail over if
<literal>failover=manual</literal>. GitHub #531.
</para>
</listitem>
</itemizedlist>
</para>
</sect2>

View File

@@ -61,28 +61,28 @@ deb-src http://apt.postgresql.org/pub/repos/apt/ stretch-pgdg main</programlisti
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara><literal>llibedit-dev</literal></simpara>
<simpara><literal>libedit-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>llibkrb5-dev</literal></simpara>
<simpara><literal>libkrb5-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>llibpam0g-dev</literal></simpara>
<simpara><literal>libpam0g-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>llibreadline-dev</literal></simpara>
<simpara><literal>libreadline-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>llibselinux1-dev</literal></simpara>
<simpara><literal>libselinux1-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>llibssl-dev</literal></simpara>
<simpara><literal>libssl-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>llibxml2-dev</literal></simpara>
<simpara><literal>libxml2-dev</literal></simpara>
</listitem>
<listitem>
<simpara><literal>llibxslt1-dev</literal></simpara>
<simpara><literal>libxslt1-dev</literal></simpara>
</listitem>
</itemizedlist>
</para>

View File

@@ -76,19 +76,25 @@
</para>
<programlisting>
# Enable replication connections; set this figure to at least one more
# Enable replication connections; set this value to at least one more
# than the number of standbys which will connect to this server
# (note that repmgr will execute `pg_basebackup` in WAL streaming mode,
# which requires two free WAL senders)
# (note that repmgr will execute "pg_basebackup" in WAL streaming mode,
# which requires two free WAL senders).
#
# See: https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-MAX-WAL-SENDERS
max_wal_senders = 10
# Enable replication slots; set this figure to at least one more
# If using replication slots, set this value to at least one more
# than the number of standbys which will connect to this server.
# Note that repmgr will only make use of replication slots if
# "use_replication_slots" is set to "true" in repmgr.conf
# "use_replication_slots" is set to "true" in "repmgr.conf".
# (If you are not intending to use replication slots, this value
# can be set to "0").
#
# See: https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-MAX-REPLICATION-SLOTS
max_replication_slots = 0
max_replication_slots = 10
# Ensure WAL files contain enough information to enable read-only queries
# on the standby.
@@ -103,24 +109,31 @@
# Enable read-only queries on a standby
# (Note: this will be ignored on a primary but we recommend including
# it anyway)
# it anyway, in case the primary later becomes a standby)
#
# See: https://www.postgresql.org/docs/current/runtime-config-replication.html#GUC-HOT-STANDBY
hot_standby = on
# Enable WAL file archiving
#
# See: https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-ARCHIVE-MODE
archive_mode = on
# Set archive command to a script or application that will safely store
# you WALs in a secure place. /bin/true is an example of a command that
# ignores archiving. Use something more sensible.
# Set archive command to a dummy command; this can later be changed without
# needing to restart the PostgreSQL instance.
#
# See: https://www.postgresql.org/docs/current/runtime-config-wal.html#GUC-ARCHIVE-COMMAND
archive_command = '/bin/true'
</programlisting>
<tip>
<simpara>
Rather than editing these settings in the default <filename>postgresql.conf</filename>
file, create a separate file such as <filename>postgresql.replication.conf</filename> and
file, create a separate file such as <filename>postgresql.replication.conf</filename> and
include it from the end of the main configuration file with:
<command>include 'postgresql.replication.conf</command>.
<command>include 'postgresql.replication.conf'</command>.
</simpara>
</tip>
<para>
@@ -129,7 +142,8 @@
<varname>wal_log_hints</varname>; for more details see <xref linkend="repmgr-node-rejoin-pg-rewind">.
</para>
<para>
See also the <link linkend="configuration-postgresql">PostgreSQL configuration</link> section in the <link linkend="configuration">repmgr configuaration guide</link>.
See also the <link linkend="configuration-postgresql">PostgreSQL configuration</link> section in the
<link linkend="configuration">repmgr configuration guide</link>.
</para>
</sect1>

View File

@@ -22,10 +22,10 @@
passwordless SSH connection to the current primary.
</para>
<para>
If other standbys are connected to the demotion candidate, &repmgr; can instruct
If other nodes are connected to the demotion candidate, &repmgr; can instruct
these to follow the new primary if the option <literal>--siblings-follow</literal>
is specified. This requires a passwordless SSH connection between the promotion
candidate (new primary) and the standbys attached to the demotion candidate
candidate (new primary) and the nodes attached to the demotion candidate
(existing primary).
</para>
<note>
@@ -150,8 +150,18 @@
<term><option>--siblings-follow</option></term>
<listitem>
<para>
Have standbys attached to the old primary follow the new primary.
Have nodes attached to the old primary follow the new primary.
</para>
<para>
This will also ensure that a witness node, if in use, is updated
with the new primary's data.
</para>
<note>
<para>
In a future &repmgr; release, <option>--siblings-follow</option> will be applied
by default.
</para>
</note>
</listitem>
</varlistentry>
</variablelist>

View File

@@ -9,6 +9,7 @@
%filelist;
<!ENTITY repmgr "<productname>repmgr</productname>">
<!ENTITY repmgrd "<productname>repmgrd</productname>">
<!ENTITY postgres "<productname>PostgreSQL</productname>">
]>
@@ -25,7 +26,13 @@
<para>
This is the official documentation of &repmgr; &repmgrversion; for
use with PostgreSQL 9.3 - PostgreSQL 11.
It describes the functionality supported by the current version of &repmgr;.
</para>
<para>
&repmgr; is being continually developed and we strongly recommend using the
latest version. Please check the
<ulink url="https://repmgr.org/">repmgr website</ulink> for details
about the current &repmgr; version as well as the
<ulink url="https://repmgr.org/docs/current/index.html">current repmgr documentation</ulink>.
</para>
<para>

View File

@@ -10,7 +10,7 @@
<title>BDR failover with repmgrd</title>
<para>
&repmgr; 4.x provides support for monitoring BDR nodes and taking action in
&repmgr; 4.x provides support for monitoring a pair of BDR 2.x nodes and taking action in
case one of the nodes fails.
</para>
<note>
@@ -31,8 +31,21 @@
reconfigure a proxy server/connection pooler such as <application>PgBouncer</application>.
</para>
<note>
<simpara>
This &repmgr; functionality is for BDR 2.x only running on PostgreSQL 9.4/9.6.
It is <emphasis>not</emphasis> required for later BDR versions.
</simpara>
</note>
<sect1 id="bdr-prerequisites" xreflabel="BDR prequisites">
<title>Prerequisites</title>
<important>
<para>
This &repmgr; functionality is for BDR 2.x only running on PostgreSQL 9.4/9.6.
It is <emphasis>not</emphasis> required for later BDR versions.
</para>
</important>
<para>
&repmgr; 4 requires PostgreSQL 9.4 or 9.6 with the BDR 2 extension
enabled and configured for a two-node BDR network. &repmgr; 4 packages

View File

@@ -217,7 +217,7 @@
<command><link linkend="repmgr-standby-promote">repmgr standby promote</link></command> command.
</para>
<para>
It is also possible to provide e.g. a shell script to e.g. perform user-defined tasks
It is also possible to provide a shell script to e.g. perform user-defined tasks
before promoting the current node. In this case the script <emphasis>must</emphasis>
at some point execute <command><link linkend="repmgr-standby-promote">repmgr standby promote</link></command>
to promote the node; if this is not done, &repmgr; metadata will not be updated and
@@ -257,7 +257,7 @@
</para>
<para>
Normally <option>follow_command</option> is set as &repmgr;'s
<command><link linkend="repmgr-standby-follow">repmgr standby promote</link></command> command.
<command><link linkend="repmgr-standby-follow">repmgr standby follow</link></command> command.
</para>
<para>
The <option>follow_command</option> parameter
@@ -270,7 +270,7 @@
the original primary.
</para>
<para>
It is also possible to provide e.g. a shell script to e.g. perform user-defined tasks
It is also possible to provide a shell script to e.g. perform user-defined tasks
before promoting the current node. In this case the script <emphasis>must</emphasis>
at some point execute <command><link linkend="repmgr-standby-follow">repmgr standby follow</link></command>
to promote the node; if this is not done, &repmgr; metadata will not be updated and

View File

@@ -72,7 +72,8 @@
Ensure that a passwordless SSH connection is possible from the promotion candidate
(standby) to the demotion candidate (current primary). If <literal>--siblings-follow</literal>
will be used, ensure that passwordless SSH connections are possible from the
promotion candidate to all standbys attached to the demotion candidate.
promotion candidate to all nodes attached to the demotion candidate
(including the witness server, if in use).
</para>
<note>

View File

@@ -93,6 +93,15 @@ do_bdr_register(void)
exit(ERR_BAD_CONFIG);
}
if (get_bdr_version_num() > 2)
{
log_error(_("\"repmgr bdr register\" is for BDR 2.x only"));
PQfinish(conn);
pfree(dbname);
exit(ERR_BAD_CONFIG);
}
/* check for a matching BDR node */
{
PQExpBufferData bdr_local_node_name;

View File

@@ -1065,7 +1065,7 @@ build_cluster_matrix(t_node_matrix_rec ***matrix_rec_dest, int *name_length, Ite
matrix_rec_list[i]->node_id = cell->node_info->node_id;
strncpy(matrix_rec_list[i]->node_name,
cell->node_info->node_name,
sizeof(cell->node_info->node_name));
sizeof(matrix_rec_list[i]->node_name));
/*
* Find the maximum length of a node name
@@ -1280,7 +1280,7 @@ build_cluster_crosscheck(t_node_status_cube ***dest_cube, int *name_length, Item
cube[h] = (t_node_status_cube *) pg_malloc(sizeof(t_node_status_cube));
cube[h]->node_id = cell->node_info->node_id;
strncpy(cube[h]->node_name, cell->node_info->node_name, sizeof(cell->node_info->node_name));
strncpy(cube[h]->node_name, cell->node_info->node_name, sizeof(cube[h]->node_name));
/*
* Find the maximum length of a node name

View File

@@ -96,28 +96,6 @@ do_primary_register(void)
initialize_voting_term(conn);
/* Ensure there isn't another registered node which is primary */
primary_conn = get_primary_connection(conn, &current_primary_id, NULL);
if (primary_conn != NULL)
{
if (current_primary_id != config_file_options.node_id)
{
/*
* it's impossible to add a second primary to a streaming
* replication cluster
*/
log_error(_("there is already an active registered primary (node ID: %i) in this cluster"), current_primary_id);
PQfinish(primary_conn);
PQfinish(conn);
exit(ERR_BAD_CONFIG);
}
/* we've probably connected to ourselves */
PQfinish(primary_conn);
}
begin_transaction(conn);
/*
@@ -128,12 +106,32 @@ do_primary_register(void)
current_primary_id = get_primary_node_id(conn);
if (current_primary_id != NODE_NOT_FOUND && current_primary_id != config_file_options.node_id)
{
log_error(_("another node with id %i is already registered as primary"), current_primary_id);
log_detail(_("a streaming replication cluster can have only one primary node"));
log_debug("XXX %i", current_primary_id);
primary_conn = establish_primary_db_connection(conn, false);
rollback_transaction(conn);
PQfinish(conn);
exit(ERR_BAD_CONFIG);
if (PQstatus(primary_conn) == CONNECTION_OK)
{
if (get_recovery_type(primary_conn) == RECTYPE_PRIMARY)
{
log_error(_("there is already an active registered primary (node ID: %i) in this cluster"),
current_primary_id);
log_detail(_("a streaming replication cluster can have only one primary node"));
log_hint(_("ensure this node is shut down before registering a new primary"));
PQfinish(primary_conn);
rollback_transaction(conn);
PQfinish(conn);
exit(ERR_BAD_CONFIG);
}
log_warning(_("node %is is registered as primary but running as a standby"),
current_primary_id);
PQfinish(primary_conn);
}
log_notice(_("setting node %i's node record to inactive"),
current_primary_id);
update_node_record_set_active(conn, current_primary_id, false);
}
/*

View File

@@ -770,56 +770,73 @@ do_standby_clone(void)
void
check_barman_config(void)
{
char command[MAXLEN];
PQExpBufferData command;
bool command_ok = false;
/*
* Check that there is at least one valid backup
*/
log_info(_("connecting to Barman server to verify backup for %s"), config_file_options.barman_server);
log_info(_("connecting to Barman server to verify backup for \"%s\""), config_file_options.barman_server);
maxlen_snprintf(command, "%s show-backup %s latest > /dev/null",
make_barman_ssh_command(barman_command_buf),
config_file_options.barman_server);
initPQExpBuffer(&command);
command_ok = local_command(command, NULL);
appendPQExpBuffer(&command, "%s show-backup %s latest > /dev/null",
make_barman_ssh_command(barman_command_buf),
config_file_options.barman_server);
command_ok = local_command(command.data, NULL);
if (command_ok == false)
{
log_error(_("no valid backup for server %s was found in the Barman catalogue"),
log_error(_("no valid backup for server \"%s\" was found in the Barman catalogue"),
config_file_options.barman_server);
log_detail(_("command executed was:\n %s"), command.data),
log_hint(_("refer to the Barman documentation for more information"));
termPQExpBuffer(&command);
exit(ERR_BARMAN);
}
if (!create_pg_dir(local_data_directory, runtime_options.force))
else if (runtime_options.dry_run == true)
{
log_error(_("unable to use directory %s"),
local_data_directory);
log_hint(_("use -F/--force option to force this directory to be overwritten"));
exit(ERR_BAD_CONFIG);
log_info(_("valid backup for server \"%s\" found in the Barman catalogue"),
config_file_options.barman_server);
}
termPQExpBuffer(&command);
/*
* Create the local repmgr subdirectory
* Attempt to create data directory (unless --dry-run specified,
* in which case do nothing; warnings will be emitted elsewhere about
* any issues with the data directory)
*/
maxlen_snprintf(local_repmgr_tmp_directory,
"%s/repmgr", local_data_directory);
maxlen_snprintf(datadir_list_filename,
"%s/data.txt", local_repmgr_tmp_directory);
if (!create_pg_dir(local_repmgr_tmp_directory, runtime_options.force))
if (runtime_options.dry_run == false)
{
log_error(_("unable to create directory \"%s\""),
local_repmgr_tmp_directory);
if (!create_pg_dir(local_data_directory, runtime_options.force))
{
log_error(_("unable to use directory %s"),
local_data_directory);
log_hint(_("use -F/--force option to force this directory to be overwritten"));
exit(ERR_BAD_CONFIG);
}
exit(ERR_BAD_CONFIG);
/*
* Create the local repmgr subdirectory
*/
maxlen_snprintf(local_repmgr_tmp_directory,
"%s/repmgr", local_data_directory);
maxlen_snprintf(datadir_list_filename,
"%s/data.txt", local_repmgr_tmp_directory);
if (!create_pg_dir(local_repmgr_tmp_directory, runtime_options.force))
{
log_error(_("unable to create directory \"%s\""),
local_repmgr_tmp_directory);
exit(ERR_BAD_CONFIG);
}
}
/*
@@ -827,20 +844,37 @@ check_barman_config(void)
*/
log_info(_("connecting to Barman server to fetch server parameters"));
maxlen_snprintf(command, "%s show-server %s > %s/show-server.txt",
make_barman_ssh_command(barman_command_buf),
config_file_options.barman_server,
local_repmgr_tmp_directory);
initPQExpBuffer(&command);
command_ok = local_command(command, NULL);
if (runtime_options.dry_run == true)
{
appendPQExpBuffer(&command, "%s show-server %s > /dev/null",
make_barman_ssh_command(barman_command_buf),
config_file_options.barman_server);
}
else
{
appendPQExpBuffer(&command, "%s show-server %s > %s/show-server.txt",
make_barman_ssh_command(barman_command_buf),
config_file_options.barman_server,
local_repmgr_tmp_directory);
}
command_ok = local_command(command.data, NULL);
if (command_ok == false)
{
log_error(_("unable to fetch server parameters from Barman server"));
log_detail(_("command executed was:\n %s"), command.data),
termPQExpBuffer(&command);
exit(ERR_BARMAN);
}
else if (runtime_options.dry_run == true)
{
log_info(_("server parameters were successfully fetched from Barman server"));
}
termPQExpBuffer(&command);
}
@@ -872,7 +906,7 @@ _do_create_recovery_conf(void)
t_node_info upstream_node_record = T_NODE_INFO_INITIALIZER;
RecordStatus record_status = RECORD_NOT_FOUND;
char recovery_file_path[MAXPGPATH] = "";
char recovery_file_path[MAXPGPATH + sizeof(RECOVERY_COMMAND_FILE)] = "";
struct stat st;
bool node_is_running = false;
bool slot_creation_required = false;
@@ -1117,7 +1151,10 @@ _do_create_recovery_conf(void)
/* check if recovery.conf exists */
snprintf(recovery_file_path, MAXPGPATH, "%s/%s", local_data_directory, RECOVERY_COMMAND_FILE);
snprintf(recovery_file_path, sizeof(recovery_file_path),
"%s/%s",
local_data_directory,
RECOVERY_COMMAND_FILE);
if (stat(recovery_file_path, &st) == -1)
{
@@ -2864,8 +2901,8 @@ do_standby_follow_internal(PGconn *primary_conn, PGconn *follow_target_conn, t_n
free_conninfo_params(&local_node_conninfo);
/*
* store the original upstream node id so we can delete the
* replication slot, if exists
* Store the original upstream node id so we can delete the
* replication slot, if it exists.
*/
if (local_node_record.upstream_node_id != UNKNOWN_NODE_ID)
{
@@ -2877,9 +2914,17 @@ do_standby_follow_internal(PGconn *primary_conn, PGconn *follow_target_conn, t_n
}
if (config_file_options.use_replication_slots && runtime_options.host_param_provided == false && original_upstream_node_id != UNKNOWN_NODE_ID)
if (config_file_options.use_replication_slots && runtime_options.host_param_provided == false)
{
remove_old_replication_slot = true;
/*
* Only attempt to delete the old replication slot if the old upstream
* node is known and is different to the follow target node.
*/
if (original_upstream_node_id != UNKNOWN_NODE_ID
&& original_upstream_node_id != follow_target_node_record->node_id)
{
remove_old_replication_slot = true;
}
}
}
@@ -3026,8 +3071,6 @@ do_standby_follow_internal(PGconn *primary_conn, PGconn *follow_target_conn, t_n
* Note that if this function is called by do_standby_switchover(), the
* "repmgr node rejoin" command executed on the demotion candidate may already
* have removed the slot, so there may be nothing to do.
*
* XXX check if former upstream is current primary?
*/
if (remove_old_replication_slot == true)
@@ -3561,9 +3604,26 @@ do_standby_switchover(void)
{
if (sibling_nodes.node_count > 0)
{
PQExpBufferData nodes;
NodeInfoListCell *cell;
initPQExpBuffer(&nodes);
for (cell = sibling_nodes.head; cell; cell = cell->next)
{
appendPQExpBuffer(&nodes,
" %s (node ID: %i)",
cell->node_info->node_name,
cell->node_info->node_id);
if (cell->next)
appendPQExpBufferStr(&nodes, "\n");
}
log_warning(_("%i sibling nodes found, but option \"--siblings-follow\" not specified"),
sibling_nodes.node_count);
log_detail(_("these nodes will remain attached to the current primary"));
log_detail(_("these nodes will remain attached to the current primary:\n%s"), nodes.data);
termPQExpBuffer(&nodes);
}
}
else
@@ -4030,16 +4090,24 @@ do_standby_switchover(void)
repmgrd_info[i]->pg_running = false;
item_list_append_format(&repmgrd_connection_errors,
_("unable to connect to node \"%s\" (ID %i):\n%s"),
cell->node_info->node_name,
cell->node_info->node_id,
PQerrorMessage(cell->node_info->conn));
/*
* Only worry about unreachable nodes if they're marked as active
* in the repmgr metadata.
*/
if (cell->node_info->active == true)
{
unreachable_node_count++;
item_list_append_format(&repmgrd_connection_errors,
_("unable to connect to node \"%s\" (ID %i):\n%s"),
cell->node_info->node_name,
cell->node_info->node_id,
PQerrorMessage(cell->node_info->conn));
}
PQfinish(cell->node_info->conn);
cell->node_info->conn = NULL;
unreachable_node_count++;
i++;
continue;
}
@@ -4982,19 +5050,41 @@ check_source_server()
}
/*
* In the default pg_basebackup mode, we'll cowardly refuse to overwrite
* an existing data directory
* Check the local directory to see if it appears to be a PostgreSQL
* data directory.
*
* Note: a previous call to check_dir() will have checked whether it contains
* a running PostgreSQL instance.
*/
if (mode == pg_basebackup)
if (is_pg_dir(local_data_directory))
{
if (is_pg_dir(local_data_directory) && runtime_options.force != true)
const char *msg = _("target data directory appears to be a PostgreSQL data directory");
const char *hint = _("use -F/--force to overwrite the existing data directory");
if (runtime_options.force == false && runtime_options.dry_run == false)
{
log_error(_("target data directory appears to be a PostgreSQL data directory"));
log_error("%s", msg);
log_detail(_("target data directory is \"%s\""), local_data_directory);
log_hint(_("use -F/--force to overwrite the existing data directory"));
log_hint("%s", hint);
PQfinish(source_conn);
exit(ERR_BAD_CONFIG);
}
if (runtime_options.dry_run == true)
{
if (runtime_options.force == true)
{
log_warning("%s and will be overwritten", msg);
log_detail(_("target data directory is \"%s\""), local_data_directory);
}
else
{
log_warning("%s", msg);
log_detail(_("target data directory is \"%s\""), local_data_directory);
log_hint("%s", hint);
}
}
}
/*

View File

@@ -2219,7 +2219,7 @@ create_repmgr_extension(PGconn *conn)
log_detail(_("version %s is installed but newer version %s is available"),
extversions.installed_version,
extversions.default_version);
log_hint(_("execute \"ALTER EXTENSION repmgr UPGRADE\""));
log_hint(_("update the installed extension version by executing \"ALTER EXTENSION repmgr UPDATE\""));
return false;
case REPMGR_INSTALLED:

View File

@@ -72,6 +72,7 @@
# to the user defined in "conninfo".
#replication_type=physical # Must be one of 'physical' or 'bdr'.
# NOTE: "bdr" can only be used with BDR 2.x
#location=default # arbitrary string defining the location of the node; this
# is used during failover to check visibilty of the

View File

@@ -1,3 +1,3 @@
#define REPMGR_VERSION_DATE ""
#define REPMGR_VERSION "4.3"
#define REPMGR_VERSION_NUM 40300
#define REPMGR_VERSION "4.3.1"
#define REPMGR_VERSION_NUM 40301

View File

@@ -96,9 +96,21 @@ monitor_bdr(void)
if (!is_bdr_db(local_conn, NULL))
{
log_error(_("database is not BDR-enabled"));
PQfinish(local_conn);
exit(ERR_BAD_CONFIG);
}
/*
* Check this is a supported BDR version (basically BDR 2.x)
*/
if (get_bdr_version_num() > 2)
{
log_error(_("\"bdr\" mode is for BDR 2.x only"));
log_hint(_("for BDR 3 and later, use \"replication_type=physical\""));
log_error(_("database is not BDR-enabled"));
exit(ERR_DB_CONN);
}
if (is_table_in_bdr_replication_set(local_conn, "nodes", "repmgr") == false)
{
log_error(_("repmgr metadata table 'repmgr.%s' is not in the 'repmgr' replication set"),

View File

@@ -3580,7 +3580,7 @@ do_election(NodeInfoList *sibling_nodes, int *new_primary_id)
* to follow it.
*/
if (sibling_replication_info.in_recovery == false)
if (sibling_replication_info.in_recovery == false && cell->node_info->type != WITNESS)
{
bool can_follow;
@@ -3931,10 +3931,13 @@ check_connection(t_node_info *node_info, PGconn **conn)
log_info(_("attempting to reconnect to node \"%s\" (ID: %i)"),
node_info->node_name,
node_info->node_id);
PQfinish(*conn);
*conn = establish_db_connection(node_info->conninfo, false);
if (PQstatus(*conn) != CONNECTION_OK)
{
PQfinish(*conn);
*conn = NULL;
log_warning(_("reconnection to node \"%s\" (ID: %i) failed"),
node_info->node_name,

View File

@@ -409,8 +409,8 @@ main(int argc, char **argv)
log_detail(_("\"repmgr\" version %s is installed but extension is version %s"),
REPMGR_VERSION,
extversions.installed_version);
log_hint(_("update the repmgr binaries to match the installed extension version"));
log_hint(_("verify the repmgr installation on this server is updated properly before continuing"));
close_connection(&local_conn);
exit(ERR_BAD_CONFIG);
}
@@ -421,8 +421,8 @@ main(int argc, char **argv)
log_detail(_("\"repmgr\" version %s is installed but extension is version %s"),
REPMGR_VERSION,
extversions.installed_version);
log_hint(_("update the installed extension version by executing \"ALTER EXTENSION repmgr UPDATE\""));
log_hint(_("verify the repmgr extension is updated properly before continuing"));
close_connection(&local_conn);
exit(ERR_BAD_CONFIG);
}