Compare commits

106 Commits

Author SHA1 Message Date
Ian Barwick
78cc278639 docs: finalize release notes 2022-02-15 13:38:24 +09:00
Ian Barwick
ceb65027c6 Bump version number 2022-02-04 09:08:04 +09:00
Ian Barwick
e6caa14ea2 Bump version number
5.3.1
2022-02-03 14:48:47 +09:00
Ian Barwick
88a11f36ca Add include for pwd.h
This was previously included via the PostgreSQL source, but that
seems to have gone away in recent HEAD builds.
2022-02-03 14:32:18 +09:00
Ian Barwick
7f371b11a5 doc: update version matrix
5.2.1 is the latest release in the 5.2.x series.
2022-02-03 13:30:29 +09:00
Ian Barwick
349eacd4b7 doc: update release notes 2022-02-03 13:29:32 +09:00
Ian Barwick
9f2afe9643 Fix upgrade paths from 4.1 ~ 4.3 to 5.2 and later
A number of C functions were added in releases 4.2 to 4.4; however
these were renamed in 5.3 to prevent naming clashes with other
extensions.

This does however mean that when upgrading from one of the above
versions, the intermediate upgrade steps will attempt to create
SQL functions referencing C functions which no longer exist in
repmgr.so, and hence cause the upgrade to fail.

We can work around this by providing empty upgrade scripts
from these versions to 4.4, which skip the problematic CREATE
FUNCTION commands. The functions will be correctly created in
the 5.2--5.3 upgrade script.
2022-02-03 13:29:25 +09:00
Ian Barwick
356f65531f repmgrd: move connection pointer declaration inside relevant block
As it's used only there and nowhere else.
2022-01-04 12:46:50 +09:00
Ian Barwick
2a7579c770 doc: update release notes 2022-01-04 12:33:30 +09:00
zhouhj43183
820d972d41 repmgrd: ensure potentially open connections are closed
When recovering from degraded state in local node monitoring, in some
cases a new connection was opened to the local node without closing
the old one, which will result in memory leakage.
2022-01-04 12:24:42 +09:00
Ian Barwick
d0add49c84 doc: update repmgr.conf.sample
Minor formatting fix.
2021-12-08 09:52:27 +09:00
Ian Barwick
9a84fa84f9 doc: update repmgr.conf.sample
Remove bogus -W option in "repmgr standby follow" example invocation
for the "follow_command" parameter.

The option (which corresponds to "--no-wait") is not used by
"repmgr standby follow".

Per report from Jimmy Angelakos.
2021-12-08 09:52:22 +09:00
Ian Barwick
ff2c56f5cb doc: fix typo 2021-11-05 09:14:19 +09:00
Ian Barwick
3b860bad80 Removed temporary include file workaround 2021-10-29 15:44:16 +09:00
Ian Barwick
70c79aeaec Add dummy include file
This is a workaround required to facilitate Debian package builds
against PostgreSQL Extended.
2021-10-12 16:50:33 +09:00
Ian Barwick
afd876377c doc: repmgr 5.2 is no longer supported. 2021-10-12 14:05:41 +09:00
Ian Barwick
277910cb31 Fix version number 2021-10-12 13:50:22 +09:00
Ian Barwick
9554154677 doc: update version matrix 2021-10-12 13:49:28 +09:00
Ian Barwick
2bbfb5daa0 doc: update release notes 2021-10-12 10:13:26 +09:00
Ian Barwick
82b0e85a66 Bump version number 2021-10-12 10:11:27 +09:00
Ian Barwick
3320eb0983 repmgrd: improve node activation at startup
Commit 79d1f00 modified repmgrd to automatically set an inactive node
to "active" at startup.

However we need to avoid doing that for cases where the node role has
changed (e.g. a former primary was recloned as a standby) but the node
record was not updated.
2021-10-11 14:39:14 +09:00
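The startup behaviour described in this commit and in 79d1f00 can be sketched as a small decision function. This is an illustrative sketch only, not repmgrd's actual code: the names `StartupAction` and `decide_startup_action` are hypothetical, the order of the checks is an assumption, and the commit message does not say what repmgrd does after declining to activate a node whose role has changed.

```c
#include <stdbool.h>

/* Hypothetical outcome names, invented for this sketch */
typedef enum
{
	STARTUP_CONTINUE,		/* node record is already active */
	STARTUP_SET_ACTIVE,		/* mark the node record as "active" */
	STARTUP_EXIT,			/* old behaviour: refuse to start */
	STARTUP_LEAVE_INACTIVE	/* role changed: do not activate the record */
} StartupAction;

/*
 * Decide what to do with the local node record at startup.
 *
 * exit_on_inactive_node mirrors the "repmgrd_exit_on_inactive_node"
 * setting; role_matches_record is false when e.g. a former primary
 * was recloned as a standby but its node record was not updated.
 */
StartupAction
decide_startup_action(bool record_active,
					  bool exit_on_inactive_node,
					  bool role_matches_record)
{
	if (record_active)
		return STARTUP_CONTINUE;

	if (exit_on_inactive_node)
		return STARTUP_EXIT;

	/* assumed ordering: only activate when the recorded role still holds */
	if (!role_matches_record)
		return STARTUP_LEAVE_INACTIVE;

	return STARTUP_SET_ACTIVE;
}
```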
Ian Barwick
7862941300 repmgrd: add %p event notification parameter for "repmgrd_failover_promote"
This enables an event notification script to identify the former primary
node.
2021-09-28 10:25:45 +09:00
Ian Barwick
f152fc3016 Add --repmgrd option to "repmgr node check"
This provides a simple way for checking whether the node's repmgrd is
running.

GitHub #719.
2021-09-28 09:46:54 +09:00
Ian Barwick
9c260e605d Update Makefile 2021-09-16 17:17:20 +09:00
Ian Barwick
7ad06530e4 doc: update 5.3.0 release notes 2021-09-16 17:15:58 +09:00
Ian Barwick
c787273f91 Add extension script for unpackaged upgrades to 5.3 2021-09-16 13:55:56 +09:00
Ian Barwick
d40055c8dd doc: update 5.3.0 release notes 2021-09-16 13:42:34 +09:00
Ian Barwick
bf0478088c standby: add missing include 2021-08-18 10:26:56 +09:00
Ian Barwick
efd5792de4 Remove redundant shared library function prototypes
From PostgreSQL 9.4 (commit e7128e8d), explicit function prototypes
are not required as they will be generated by the PG_FUNCTION_INFO_V1
macro.

(We were supporting PostgreSQL 9.3 until relatively recently, so there
was nothing particular to gain by removing these earlier).
2021-08-18 10:19:13 +09:00
Ian Barwick
17987a2690 standby switchover: detect if demotion candidate is running as a primary
This shouldn't happen, but if it does, log the fact for easier analysis.
2021-07-28 11:50:57 +09:00
Ian Barwick
5f1ba6db3d standby switchover: improve handling of node rejoin failure
Explicitly check whether the "repmgr node rejoin" command on the
demotion candidate succeeded. Due to the way SSH execution is
currently implemented, we can return either the command execution
status or the command output; to ensure any errors are available,
log them to a temporary file on the demotion candidate and note
its location in case of an error.

While we're at it, improve error message handling when the demotion
candidate fails to rejoin.
2021-07-28 11:42:40 +09:00
Ian Barwick
55efbe60ea standby switchover: optionally delay promotion
This is for testing purposes only and should not be used in production.
2021-07-27 17:01:27 +09:00
Ian Barwick
132f5ebc08 standby switchover: improve logging of repmgrd pause actions
- state how many nodes are to be operated on
- if errors were encountered with any nodes, emit the total number
  of nodes as well as the number of affected nodes
- log nodes where repmgrd was not running anyway as NOTICE, not
  WARNING
2021-07-26 17:46:58 +09:00
Ian Barwick
c901f36f81 Standardize formatting of node ID in log messages
Mostly we have "(ID: %i)", so use that rather than "(ID %i)" which has
crept into a few places.
2021-07-26 17:27:36 +09:00
Ian Barwick
edb49b2747 doc: link to main documentation section about RemoveIPC 2021-07-26 11:01:03 +09:00
Ian Barwick
a35d85ed70 doc: link to service commands section from switchover docs 2021-07-26 09:47:14 +09:00
Ian Barwick
b6b91425d9 doc: document pg_bindir setting
Per suggestion in GitHub #705.
2021-07-20 13:36:08 +09:00
Ian Barwick
32329ca55a doc: link to sample configuration file
Unfortunately it hasn't been possible yet to include all available
configuration items in the main documentation, but we should at least
make it easier to find the full list.
2021-07-20 13:18:57 +09:00
Ian Barwick
79d1f005db repmgrd: activate inactive node record at startup
If a PostgreSQL instance was shut down while repmgrd was running, and
repmgrd was subsequently restarted (this chain of events could occur
during e.g. a server reboot), the node record will have been set to
"inactive". Previously, in this case repmgrd would refuse to start up.
However, as we can determine the node is running, it should normally
be no problem to automatically set the node record to "active".

The old behaviour can be restored by setting the new parameter
"repmgrd_exit_on_inactive_node" to "true".

RM19604.
2021-07-12 17:46:09 +09:00
Ian Barwick
f64f498afb Be more flexible when parsing the output from pg_config --version
The string may not always start with "PostgreSQL" when building
against non-community versions.

Life would be much easier here if there was an option like
"pg_config --version-number" or similar.
2021-07-05 15:38:34 +09:00
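The idea behind this change is to stop requiring the literal prefix "PostgreSQL" and instead skip whatever product name precedes the version number. The commit itself does this with sed expressions in the configure script; the C function below is only a sketch of the same idea, and `parse_major_version` is a hypothetical name that does not exist in repmgr.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of repmgr): extract the major version
 * from "pg_config --version" style output without assuming the string
 * starts with "PostgreSQL".
 *
 * Returns the first run of digits as an integer, or -1 if the string
 * contains no digits.
 */
int
parse_major_version(const char *version_string)
{
	const char *p = version_string;
	int			major = 0;

	assert(version_string != NULL);

	/* skip the product name, whatever it happens to be */
	while (*p != '\0' && !isdigit((unsigned char) *p))
		p++;

	if (*p == '\0')
		return -1;

	/* accumulate the first run of digits */
	while (isdigit((unsigned char) *p))
	{
		major = major * 10 + (*p - '0');
		p++;
	}

	return major;
}
```

Like the sed expressions, this sketch would be confused by a product name that itself contains digits.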
Ian Barwick
f10c013e89 doc: note PostgreSQL 14 support
repmgr 5.3 will provide support for PostgreSQL 14.
2021-07-01 16:00:21 +09:00
Ian Barwick
2059a55a99 doc: update release notes 2021-07-01 13:44:33 +09:00
Ian Barwick
99ed17b838 doc: update release notes 2021-07-01 13:28:15 +09:00
Ian Barwick
078b4ad863 standby clone: set "slot_name" in node record if required
If executing "repmgr standby clone --replication-conf-only" on a node
which was set up without replication slots, but the repmgr configuration
was since changed to "use_replication_slots=1", repmgr will attempt to
create the replication slot. This will however fail if "slot_name"
is not set in the node's record, so have repmgr set the slot_name in
this case.

It might be preferable to preemptively create the slot name for each
node when configuring the cluster, however this would be a behavioural
change which would be better off in a major release (for example, it's
conceivable a user runs sanity checks on the node records and expects
to find the slot names empty if replication slots are not in use).
2021-07-01 13:04:23 +09:00
Ian Barwick
2af71c6426 repmgrd: ensure short option "-s" is accepted
The long option --show-pid-file was fine.
2021-06-03 18:41:11 +09:00
Ian Barwick
9349520530 Remove reference to PostgreSQL 9.3 in --help output
It is not supported by repmgr 5.2 and later.
2021-04-15 15:31:59 +09:00
Ian Barwick
14851e61de doc: clarify "connection_check_type='query'" 2021-03-03 13:12:58 +09:00
Ian Barwick
888e1d7a3b docs: update repmgr.conf.sample
Fix description for connection_check_type='connection'.
2021-03-02 11:44:14 +09:00
Ian Barwick
1b4c2a60bb docs: update README
Note latest version number.
2021-03-02 11:26:15 +09:00
Ian Barwick
da163e811c doc: update README
Fix and update broken link.
2021-03-02 11:14:16 +09:00
Ian Barwick
80d1beef7e doc: update GitHub links to new location 2021-03-02 08:58:49 +09:00
Ian Barwick
4f009548f6 doc: remove generated .fo files 2021-03-01 11:06:41 +09:00
Ian Barwick
d266df3143 Change copyright information to "EnterpriseDB Corporation"
RM20485.
2021-03-01 11:03:52 +09:00
Ian Barwick
dd8204e013 Rename various shared library functions
Some of the more generically named functions are at risk of colliding
with functions defined in other libraries. To mitigate that risk,
prefix with "repmgr_", unless the name already has some reference
to repmgr.

This requires an extension version bump.

RM20471.
2021-02-23 10:14:28 +09:00
Ian Barwick
d34b4e71a6 Fix incorrect comment 2021-02-22 13:28:27 +09:00
Ian Barwick
12749c3f63 doc: fix XML markup
An incorrect column count was causing PDF builds to fail.
2021-02-19 20:39:41 +09:00
Ian Barwick
ce59d92731 doc: update repmgr.conf.sample 2021-01-14 15:27:24 +09:00
Ian Barwick
cfbeed50d6 node rejoin: emit rejoin target node information as NOTICE
As it's possible to specify the connection information for any available
node, but currently not possible to rejoin to a node other than the
primary, explicitly mention what the rejoin target will be.
2021-01-06 14:11:37 +09:00
Ian Barwick
da3eaee127 doc: "repmgr node rejoin" clarifications
- make it clearer a node can only be joined to the primary
- update patch status
2021-01-06 12:36:11 +09:00
Ian Barwick
b37a599fc6 Update copyright notices to 2021 2021-01-04 12:54:54 +09:00
Ian Barwick
f011e552d0 Add missing PQconninfoFree() call 2020-12-24 18:07:18 +09:00
Ian Barwick
d1cc05faf9 repmgrd: edit code comment for clarity 2020-12-22 13:58:34 +09:00
Ian Barwick
7ceba84e32 doc: minor grammar tweak 2020-12-22 13:57:31 +09:00
Josh Soref
842c67ca18 doc: various spelling fixes
Via GitHub #687.
2020-12-22 13:47:56 +09:00
Josh Soref
f619c3a8ff Fix various typos in code comments.
Via GitHub #687.
2020-12-22 13:43:06 +09:00
Josh Soref
5a88858596 repmgr: various log output typo fixes
Via GitHub #687.
2020-12-22 13:18:11 +09:00
Josh Soref
02bc143c75 repmgr: fix typo in "repmgr node --help" output
Via GitHub #687.
2020-12-22 13:07:16 +09:00
Ian Barwick
c480d01f9c Improve HINT about upgrading the repmgr extension
Per feedback in GitHub #685.
2020-12-15 08:41:46 +09:00
Ian Barwick
e762200a12 doc: update README
Link to recent-ish EDB blog article.
2020-12-08 13:31:21 +09:00
Ian Barwick
2133e1097e standby switchover: remove extraneous space in log message 2020-12-08 13:14:45 +09:00
Ian Barwick
77d7a098a1 doc: add 5.2.1 release date 2020-12-08 12:41:46 +09:00
Ian Barwick
4e9cdf0267 doc: update 5.2.1 release notes 2020-12-04 14:49:20 +09:00
Ian Barwick
0d8bf2a935 Minor string formatting optimization 2020-12-04 10:16:21 +09:00
Ian Barwick
debbda6074 standby clone: tweak error message
Probably a remnant from the 9.1 era, where it was not possible to
take a base backup from a standby.
2020-12-04 10:13:45 +09:00
Ian Barwick
d5b94431f2 standby follow: fix standby.signal generation
Oversight from previous a93c6dfc.
2020-12-02 09:11:20 +09:00
Ian Barwick
93187e9743 Add missing connection close
In a corner-case situation where a standby is unable to attach to
the new primary due to a mismatch in the WAL stream, the connection
used to verify the recovery status of the new primary was not being
closed, leading to a risk of connection exhaustion on the new primary.

Addresses GitHub #682.
2020-12-01 21:33:07 +09:00
Ian Barwick
f7e45863ad standby clone: fix data directory permissions handling for Pg11 and later
Previously, repmgr would forcibly change the permissions on a data
directory to 0700. However from PostgreSQL 11, 0750 is also valid,
so that value should not be changed.
2020-12-01 11:48:22 +09:00
Ian Barwick
89556d6488 standby clone: add --recovery-min-apply-delay to help output 2020-11-30 16:44:48 +09:00
Ian Barwick
4ad868d119 doc: update 5.2.1 release notes 2020-11-30 16:44:48 +09:00
Ian Barwick
a93c6dfca7 Ensure standby.signal is set correctly if -D/--data-directory supplied
When cloning a standby, it's possible to do a "raw" clone by providing
-D/--data-directory but no repmgr.conf file. However the code which
creates "standby.signal" was assuming the presence of a valid
repmgr.conf complete with "data_directory" configuration.

This is very much a niche use case.
2020-11-27 11:24:15 +09:00
Ian Barwick
4d8bc63834 repmgrd: fix issue with incorrect reconnect_interval
Addresses GitHub #673.
2020-11-25 20:40:28 +09:00
Ian Barwick
7bca9df223 Update Makefile
We don't actually need $(LIBS) in there; this was cargo-culted in
from somewhere.
2020-11-24 17:37:48 +09:00
Ian Barwick
1ac62a4352 Avoid compiler warnings for various strncpy() operations
Here the compiler may complain that the source length is being used,
though in all cases the source length was previously used to
define the length of the destination buffer, so it's not actually
a problem.
2020-11-24 15:42:49 +09:00
Ian Barwick
8f7a32a9a2 repmgr: prevent termination in corner-case situation
If neither the local node nor the upstream are available, and
"standby_disconnect_on_failover" is set, attempting to fetch
the walreceiver PID will result in repmgrd terminating.

Add a check that the connection is valid before attempting to
fetch the walreceiver PID.

Addresses GitHub #675.
2020-11-17 16:34:55 +09:00
Ian Barwick
9c04de11fc standby clone: various clarifications for --replication-conf-only option
In particular, the emitted HINT was not really appropriate for Pg13 and
later.
2020-11-17 09:58:51 +09:00
Ian Barwick
040b1ae4e3 Update corner-case error message
Not possible to build repmgr compatible with Pg12+ against Pg11
and earlier due to the addition of FullTransactionId.
2020-11-17 09:39:41 +09:00
Ian Barwick
703aed3fa3 doc: tweak "repmgr standby clone" reference
As recovery.conf starts to fade away, mention that last.
2020-11-10 16:07:22 +09:00
Ian Barwick
7ee0098771 standby clone: add option --recovery-min-apply-delay
This overrides the equivalent setting in repmgr.conf, if present.

Note this option was available in repmgr versions prior to 4.0, but
was assumed to be redundant. However recently a use-case was made
for its reintroduction.
2020-11-10 15:55:04 +09:00
Ian Barwick
430d12b870 Fix typo 2020-11-10 13:40:38 +09:00
Romain Jacquier
c8b2d23361 Fix help witness
Fix the `repmgr witness --help` command where at the "Unregister" section the message shown was
```
"witness register" unregisters a witness node.
```
instead of
```
"witness unregister" unregisters a witness node.
```

GitHub #676.
2020-11-09 13:34:08 +09:00
Ian Barwick
8543c0bcf6 standby clone: emit pg_basebackup command in --dry-run mode 2020-11-04 12:00:42 +09:00
Ian Barwick
674c06d01c Decouple extension version check from binary version
Until now the extension version has always moved in lock-step
with the binary version, but that doesn't always need to be
the case, so make it possible to have an extension version
which does not match the binary version.
2020-10-30 14:42:58 +09:00
Ian Barwick
970d7a136f Fix return value of pg_reload_conf() database utility function
Would always return "false", but as the value wasn't used anywhere,
the issue was inconsequential.

However while we're at it, actually check the return value in the
two places it's called, to help diagnose any issues in the unlikely
event they occur.

Per issue reported via GitHub PR #671 from user duzhgg.
2020-10-30 14:25:11 +09:00
Ian Barwick
7bde686796 standby clone: handle missing "postgresql.auto.conf"
In PostgreSQL 12 and later we need to append replication configuration
to "postgresql.auto.conf" to guarantee it will be read last, and hence
override any preceding replication configuration which may be haunting
the configuration files.

We've been assuming that "postgresql.auto.conf" will always be present,
but at least one corner case has been observed where that was not the
case on the node being cloned from. Moreover it's perfectly acceptable
that this file does not exist (it will be recreated the next time
ALTER SYSTEM is executed), so we should be prepared to handle that case.

In passing, improve handling of more unlikely errors which might be
encountered when processing "postgresql.auto.conf".
2020-10-30 12:25:03 +09:00
Ian Barwick
ab1447aeca Standardize code style 2020-10-30 11:06:15 +09:00
Ian Barwick
293e37688f config: fix parsing of "replication_type"
This is a legacy parameter which can currently only contain one value,
"physical" (the default).

It can be safely omitted.

Addresses GitHub #672.
2020-10-30 10:14:04 +09:00
Ian Barwick
96718151a6 doc: update README
- remove partial sentence
- remove links to very dated blog entries
2020-10-27 13:58:07 +09:00
Ian Barwick
65ffe51bb4 doc: update README
Link to release notes as a simple way of providing the latest release
information.
2020-10-27 13:52:19 +09:00
Ian Barwick
b6d0288a82 Finalize release date 2020-10-22 21:23:50 +09:00
Ian Barwick
f888407ad8 Additional fix to upgrade script
Drop old "repl_events" table.
2020-10-22 21:22:52 +09:00
Ian Barwick
1512c7b761 Fix extension script for unpackaged upgrades to 5.2
Apparently "ALTER TABLE" (which we were using to convert the
"repl_events" table) does not mark the table as being part of the
extension. Instead, we need to create the new table and copy the
data, as is done with the other tables.
2020-10-22 21:22:47 +09:00
Ian Barwick
8877d4d508 doc: add missing "unpackaged" reference 2020-10-22 21:22:43 +09:00
Ian Barwick
f18b2e900d standby clone: improve Barman source server check
Use "remote_command()" to execute the remote psql command, and
provide the -X option to psql to ensure it doesn't read ~/.psqlrc.
2020-10-22 21:22:04 +09:00
Ian Barwick
5c4aa1856c doc: update README 2020-10-22 21:20:17 +09:00
Ian Barwick
091a2df167 doc: update release notes 2020-10-20 14:09:44 +09:00
Ian Barwick
17a1732eb0 Bump master branch to 5.3dev
Also update the minimum version check to PostgreSQL 9.4.
2020-10-20 13:41:49 +09:00
35 changed files with 1301 additions and 287 deletions

HISTORY (10 lines changed)

@@ -1,4 +1,12 @@
-5.2.2. 2021-??-??
+5.3.1 2022-??-??
+repmgrd: fixes for potential connection leaks (hslightdb)
+5.3.0 2021-10-12
+standby switchover: improve handling of node rejoin failure (Ian)
+repmgrd: prefix all shared library functions with "repmgr_" to
+minimize the risk of clashes with other shared libraries (Ian)
+repmgrd: at startup, if node record is marked as "inactive", attempt
+to set it to "active" (Ian)
standby clone: set "slot_name" in node record if required (Ian)
node rejoin: emit rejoin target note information as NOTICE (Ian)
repmgrd: ensure short option "-s" is accepted (Ian)

@@ -13,6 +13,7 @@ DATA = \
repmgr--unpackaged--4.0.sql \
repmgr--unpackaged--5.1.sql \
repmgr--unpackaged--5.2.sql \
+repmgr--unpackaged--5.3.sql \
repmgr--4.0.sql \
repmgr--4.0--4.1.sql \
repmgr--4.1.sql \
@@ -27,7 +28,9 @@ DATA = \
repmgr--5.0--5.1.sql \
repmgr--5.1.sql \
repmgr--5.1--5.2.sql \
-repmgr--5.2.sql
+repmgr--5.2.sql \
+repmgr--5.2--5.3.sql \
+repmgr--5.3.sql
REGRESS = repmgr_extension

@@ -90,3 +90,4 @@ Further reading
* [repmgr documentation](https://repmgr.org/docs/current/index.html)
* [How to Automate PostgreSQL 12 Replication and Failover with repmgr - Part 1](https://www.2ndquadrant.com/en/blog/how-to-automate-postgresql-12-replication-and-failover-with-repmgr-part-1/)
* [How to Automate PostgreSQL 12 Replication and Failover with repmgr - Part 2](https://www.2ndquadrant.com/en/blog/how-to-automate-postgresql-12-replication-and-failover-with-repmgr-part-2/)
+* [How to implement repmgr for PostgreSQL automatic failover](https://www.enterprisedb.com/postgres-tutorials/how-implement-repmgr-postgresql-automatic-failover)

@@ -581,6 +581,16 @@ struct ConfigFileSetting config_file_settings[] =
{ .strmaxlen = sizeof(config_file_options.repmgrd_pid_file) },
{ .postprocess_func = &repmgr_canonicalize_path }
},
+/* repmgrd_exit_on_inactive_node */
+{
+"repmgrd_exit_on_inactive_node",
+CONFIG_BOOL,
+{ .boolptr = &config_file_options.repmgrd_exit_on_inactive_node},
+{ .booldefault = DEFAULT_REPMGRD_EXIT_ON_INACTIVE_NODE },
+{},
+{},
+{}
+},
/* standby_disconnect_on_failover */
{
"standby_disconnect_on_failover",

@@ -206,6 +206,7 @@ typedef struct
int primary_notification_timeout;
int repmgrd_standby_startup_timeout;
char repmgrd_pid_file[MAXPGPATH];
+bool repmgrd_exit_on_inactive_node;
bool standby_disconnect_on_failover;
int sibling_nodes_disconnect_timeout;
ConnectionCheckType connection_check_type;

configure (vendored, 24 lines changed)

@@ -1,6 +1,6 @@
#! /bin/sh
# Guess values for system-dependent variables and create Makefiles.
-# Generated by GNU Autoconf 2.69 for repmgr 5.2.1.
+# Generated by GNU Autoconf 2.69 for repmgr 5.3.0.
#
# Report bugs to <repmgr@googlegroups.com>.
#
@@ -582,8 +582,8 @@ MAKEFLAGS=
# Identity of this package.
PACKAGE_NAME='repmgr'
PACKAGE_TARNAME='repmgr'
-PACKAGE_VERSION='5.2.1'
-PACKAGE_STRING='repmgr 5.2.1'
+PACKAGE_VERSION='5.3.0'
+PACKAGE_STRING='repmgr 5.3.0'
PACKAGE_BUGREPORT='repmgr@googlegroups.com'
PACKAGE_URL='https://repmgr.org/'
@@ -1181,7 +1181,7 @@ if test "$ac_init_help" = "long"; then
# Omit some internal or obsolete options to make the list less imposing.
# This message is too long to be a string in the A/UX 3.1 sh.
cat <<_ACEOF
-\`configure' configures repmgr 5.2.1 to adapt to many kinds of systems.
+\`configure' configures repmgr 5.3.0 to adapt to many kinds of systems.
Usage: $0 [OPTION]... [VAR=VALUE]...
@@ -1242,7 +1242,7 @@ fi
if test -n "$ac_init_help"; then
case $ac_init_help in
-short | recursive ) echo "Configuration of repmgr 5.2.1:";;
+short | recursive ) echo "Configuration of repmgr 5.3.0:";;
esac
cat <<\_ACEOF
@@ -1316,7 +1316,7 @@ fi
test -n "$ac_init_help" && exit $ac_status
if $ac_init_version; then
cat <<\_ACEOF
-repmgr configure 5.2.1
+repmgr configure 5.3.0
generated by GNU Autoconf 2.69
Copyright (C) 2012 Free Software Foundation, Inc.
@@ -1335,7 +1335,7 @@ cat >config.log <<_ACEOF
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.
-It was created by repmgr $as_me 5.2.1, which was
+It was created by repmgr $as_me 5.3.0, which was
generated by GNU Autoconf 2.69. Invocation command line was
$ $0 $@
@@ -1811,11 +1811,11 @@ fi
pgac_pg_config_version=$($PG_CONFIG --version 2>/dev/null)
major_version_num=$(echo "$pgac_pg_config_version"|
-$SED -e 's/^PostgreSQL \([0-9]\{1,2\}\).*$/\1/')
+$SED -e 's/^[^0-9]\+ \([0-9]\{1,2\}\).*$/\1/')
if test "$major_version_num" -lt '10'; then
version_num=$(echo "$pgac_pg_config_version"|
-$SED -e 's/^PostgreSQL \([0-9]*\)\.\([0-9]*\)\([a-zA-Z0-9.]*\)$/\1.\2/')
+$SED -e 's/^[^0-9]\+ \([0-9]*\)\.\([0-9]*\)\([a-zA-Z0-9.]*\)$/\1.\2/')
if test -z "$version_num"; then
as_fn_error $? "could not detect the PostgreSQL version, wrong or broken pg_config?" "$LINENO" 5
@@ -1829,7 +1829,7 @@ if test "$major_version_num" -lt '10'; then
fi
else
version_num=$(echo "$pgac_pg_config_version"|
-$SED -e 's/^PostgreSQL \(.\+\)$/\1/')
+$SED -e 's/^[^0-9]\+ \(.\+\)$/\1/')
if test -z "$version_num"; then
as_fn_error $? "could not detect the PostgreSQL version, wrong or broken pg_config?" "$LINENO" 5
@@ -2487,7 +2487,7 @@ cat >>$CONFIG_STATUS <<\_ACEOF || ac_write_fail=1
# report actual input values of CONFIG_FILES etc. instead of their
# values after options handling.
ac_log="
-This file was extended by repmgr $as_me 5.2.1, which was
+This file was extended by repmgr $as_me 5.3.0, which was
generated by GNU Autoconf 2.69. Invocation command line was
CONFIG_FILES = $CONFIG_FILES
@@ -2550,7 +2550,7 @@ _ACEOF
cat >>$CONFIG_STATUS <<_ACEOF || ac_write_fail=1
ac_cs_config="`$as_echo "$ac_configure_args" | sed 's/^ //; s/[\\""\`\$]/\\\\&/g'`"
ac_cs_version="\\
-repmgr config.status 5.2.1
+repmgr config.status 5.3.0
configured by $0, generated by GNU Autoconf 2.69,
with options \\"\$ac_cs_config\\"

@@ -1,4 +1,4 @@
-AC_INIT([repmgr], [5.2.1], [repmgr@googlegroups.com], [repmgr], [https://repmgr.org/])
+AC_INIT([repmgr], [5.3.1], [repmgr@googlegroups.com], [repmgr], [https://repmgr.org/])
AC_COPYRIGHT([Copyright (c) 2010-2021, EnterpriseDB Corporation])
@@ -19,11 +19,11 @@ fi
pgac_pg_config_version=$($PG_CONFIG --version 2>/dev/null)
major_version_num=$(echo "$pgac_pg_config_version"|
-$SED -e 's/^PostgreSQL \([[0-9]]\{1,2\}\).*$/\1/')
+$SED -e 's/^[[^0-9]]\+ \([[0-9]]\{1,2\}\).*$/\1/')
if test "$major_version_num" -lt '10'; then
version_num=$(echo "$pgac_pg_config_version"|
-$SED -e 's/^PostgreSQL \([[0-9]]*\)\.\([[0-9]]*\)\([[a-zA-Z0-9.]]*\)$/\1.\2/')
+$SED -e 's/^[[^0-9]]\+ \([[0-9]]*\)\.\([[0-9]]*\)\([[a-zA-Z0-9.]]*\)$/\1.\2/')
if test -z "$version_num"; then
AC_MSG_ERROR([could not detect the PostgreSQL version, wrong or broken pg_config?])
@@ -37,7 +37,7 @@ if test "$major_version_num" -lt '10'; then
fi
else
version_num=$(echo "$pgac_pg_config_version"|
-$SED -e 's/^PostgreSQL \(.\+\)$/\1/')
+$SED -e 's/^[[^0-9]]\+ \(.\+\)$/\1/')
if test -z "$version_num"; then
AC_MSG_ERROR([could not detect the PostgreSQL version, wrong or broken pg_config?])

@@ -4242,7 +4242,7 @@ _create_event(PGconn *conn, t_configuration_options *options, int node_id, char
}
break;
case 'p':
-/* %p: primary id ("standby_switchover": former primary id) */
+/* %p: primary id ("standby_switchover"/"repmgrd_failover_promote": former primary id) */
src_ptr++;
if (event_info->node_id != UNKNOWN_NODE_ID)
{
@@ -6008,6 +6008,43 @@ is_wal_replay_paused(PGconn *conn, bool check_pending_wal)
return is_paused;
}
+/* repmgrd status functions */
+CheckStatus
+get_repmgrd_status(PGconn *conn)
+{
+PQExpBufferData query;
+PGresult *res = NULL;
+CheckStatus repmgrd_status = CHECK_STATUS_CRITICAL;
+initPQExpBuffer(&query);
+appendPQExpBufferStr(&query,
+" SELECT "
+" CASE "
+" WHEN repmgr.repmgrd_is_running() "
+" THEN "
+" CASE "
+" WHEN repmgr.repmgrd_is_paused() THEN 1 ELSE 0 "
+" END "
+" ELSE 2 "
+" END AS repmgrd_status");
+res = PQexec(conn, query.data);
+if (PQresultStatus(res) != PGRES_TUPLES_OK)
+{
+log_db_error(conn, query.data, _("unable to execute repmgrd status query"));
+}
+else
+{
+repmgrd_status = atoi(PQgetvalue(res, 0, 0));
+}
+termPQExpBuffer(&query);
+PQclear(res);
+return repmgrd_status;
+}
/* miscellaneous debugging functions */
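
The query added to get_repmgrd_status() folds two booleans into a single integer, which the function then stores in a CheckStatus variable. The same decision table can be sketched in plain C. The enum below is an illustrative stand-in: its names are hypothetical, and the assumption that 0, 1 and 2 correspond to OK, WARNING and CRITICAL is not confirmed by this diff, which only shows CHECK_STATUS_CRITICAL as the initial value.

```c
#include <stdbool.h>

/* Illustrative stand-in for repmgr's CheckStatus; the values are assumed */
typedef enum
{
	SKETCH_STATUS_OK = 0,
	SKETCH_STATUS_WARNING = 1,
	SKETCH_STATUS_CRITICAL = 2
} SketchStatus;

/* Same decision table as the nested SQL CASE expression */
SketchStatus
sketch_repmgrd_status(bool is_running, bool is_paused)
{
	/* outer CASE: "ELSE 2" when repmgrd is not running */
	if (!is_running)
		return SKETCH_STATUS_CRITICAL;

	/* inner CASE: 1 when paused, otherwise 0 */
	return is_paused ? SKETCH_STATUS_WARNING : SKETCH_STATUS_OK;
}
```

Note that the paused flag is only consulted when repmgrd is running, exactly as in the SQL.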

@@ -602,6 +602,9 @@ int get_upstream_last_seen(PGconn *conn, t_server_type node_type);
bool is_wal_replay_paused(PGconn *conn, bool check_pending_wal);
+/* repmgrd status functions */
+CheckStatus get_repmgrd_status(PGconn *conn);
/* miscellaneous debugging functions */
const char *print_node_status(NodeStatus node_status);
const char *print_pqping_status(PGPing ping_status);

@@ -16,17 +16,114 @@
</para>
<!-- remember to update the release date in ../repmgr_version.h.in -->
<sect1 id="release-5.2.2">
<title id="release-current">Release 5.2.2</title>
<para><emphasis>??? ? ???, 2021</emphasis></para>
<sect1 id="release-5.3.1">
<title id="release-current">Release 5.3.1</title>
<para><emphasis>Tue 15 February, 2022</emphasis></para>
<para>
&repmgr; 5.2.2 is a minor release.
&repmgr; 5.3.1 is a minor release.
</para>
<sect2>
<title>Bug fixes</title>
<para>
<itemizedlist>
<listitem>
<para>
Fix upgrade path from &repmgr; 4.2 and 4.3 to &repmgr; 5.3.
</para>
</listitem>
<listitem>
<para>
&repmgrd;: ensure potentially open connections are closed.
</para>
<para>
In some cases, when recovering from degraded state in local node monitoring,
new connection was opened to the local node without closing
the old one, which will result in memory leakage.
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
</sect1>
<itemizedlist>
<sect1 id="release-5.3.0">
<title>Release 5.3.0</title>
<para><emphasis>Tue 12 October, 2021</emphasis></para>
<para>
&repmgr; 5.3.0 is a major release.
</para>
<para>
This release provides support for <ulink url="https://www.postgresql.org/docs/14/release-14.html">PostgreSQL 14</ulink>,
released in September 2021.
</para>
<sect2>
<title>Improvements</title>
<para>
<itemizedlist>
<listitem>
<para>
<link linkend="repmgr-standby-switchover"><command>repmgr standby switchover</command></link>:
Improve handling of node rejoin failure on the demotion candidate.
</para>
<para>
Previously &repmgr; did not check whether <command>repmgr node rejoin</command> actually
succeeded on the demotion candidate, and would always wait up to <varname>node_rejoin_timeout</varname>
seconds for it to attach to the promotion candidate, even if this would never happen.
</para>
<para>
This makes it easier to identify unexpected events during a switchover operation, such as
the demotion candidate being unexpectedly restarted by an external process.
</para>
<para>
Note that the output of the <link linkend="repmgr-node-rejoin"><command>repmgr node rejoin</command></link>
operation on the demotion candidate will now be logged to a temporary file on that node;
the location of the file will be reported in the error message, if one is emitted.
</para>
</listitem>
<listitem>
<para>
&repmgrd;: at startup, if node record is marked as "inactive", attempt
to set it to "active".
</para>
<para>
This behaviour can be overridden by setting the configuration parameter
<varname>repmgrd_exit_on_inactive_node</varname> to <literal>true</literal>.
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-node-rejoin">repmgr node rejoin</link></command>:
emit rejoin target note information as <literal>NOTICE</literal>.
</para>
<para>
This makes it clearer what &repmgr; is trying to do.
</para>
</listitem>
<listitem>
<para>
<link linkend="repmgr-node-check">repmgr node check</link>:
option <option>--repmgrd</option> added to check &repmgrd;
status.
</para>
</listitem>
<listitem>
<para>
Add <literal>%p</literal> <link linkend="event-notifications">event notification parameter</link>
providing the node ID of the former primary for the <literal>repmgrd_failover_promote</literal> event.
</para>
</listitem>
</itemizedlist>
</para>
</sect2>
<sect2>
<title>Bug fixes</title>
<para>
<itemizedlist>
<listitem>
<para>
<command><link linkend="repmgr-standby-clone">repmgr standby clone</link></command>:
@@ -38,22 +135,24 @@
</para>
</listitem>
<listitem>
<para>
<command><link linkend="repmgr-node-rejoin">repmgr node rejoin</link></command>:
emit rejoin target note information as <literal>NOTICE</literal>.
</para>
<para>
This makes it clearer what &repmgr; is trying to do.
</para>
</listitem>
<para>
&repmgrd;: rename internal shared library functions to minimize the
risk of clashes with other shared libraries.
</para>
<para>
This does not affect user-facing SQL functions. However an upgrade
of the installed extension version is required.
</para>
</listitem>
<listitem>
<para>
&repmgrd;: ensure short option <option>-s</option> is accepted.
</para>
</listitem>
</itemizedlist>
</itemizedlist>
</para>
</sect2>
</sect1>


@@ -262,7 +262,7 @@ conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'</programlistin
<indexterm>
<primary>repmgr.conf</primary>
<secondary>ostgreSQL major version upgrades</secondary>
<secondary>PostgreSQL major version upgrades</secondary>
</indexterm>
<para>


@@ -95,7 +95,8 @@
</para>
<para>
The following parameters are provided for a subset of event notifications:
The following parameters are provided for a subset of event notifications; their meaning may
change according to context:
</para>
<variablelist>
@@ -108,6 +109,9 @@
<para>
node ID of the demoted primary (<xref linkend="repmgr-standby-switchover"/> only)
</para>
<para>
node ID of the former primary (<literal>repmgrd_failover_promote</literal> only)
</para>
</listitem>
</varlistentry>
<varlistentry>
@@ -133,7 +137,7 @@
<para>
The values provided for <literal>%c</literal> and <literal>%a</literal>
will probably contain spaces, so should always be quoted.
may contain spaces, so should always be quoted.
</para>
<para>


@@ -112,10 +112,9 @@
</thead>
<tbody>
<row>
<entry>
&repmgr; 5.2
&repmgr; 5.3
</entry>
<entry>
YES
@@ -123,6 +122,21 @@
<entry>
<link linkend="release-current">&repmgrversion;</link> (&releasedate;)
</entry>
<entry>
9.4, 9.5, 9.6, 10, 11, 12, 13, 14
</entry>
</row>
<row>
<entry>
&repmgr; 5.2
</entry>
<entry>
NO
</entry>
<entry>
<link linkend="release-5.2.1">5.2.1</link> (2020-12-07)
</entry>
<entry>
9.4, 9.5, 9.6, 10, 11, 12, 13
</entry>


@@ -125,12 +125,29 @@
is correctly configured.
</simpara>
</listitem>
</itemizedlist>
</para>
</refsect1>
<refsect1>
<title>repmgrd</title>
<para>
A separate check is available to verify whether &repmgrd; is running.
This is not included in the general output, as it does not
per se constitute a check of the node's replication status.
</para>
<itemizedlist spacing="compact" mark="bullet">
<listitem>
<simpara>
<option>--repmgrd</option>: checks whether &repmgrd; is running.
If &repmgrd; is running but paused, status <literal>1</literal>
(<literal>WARNING</literal>) is returned.
</simpara>
</listitem>
</itemizedlist>
</refsect1>
<refsect1>
<title>Additional checks</title>
<para>


@@ -26,7 +26,7 @@
<abstract>
<para>
This is the official documentation of &repmgr; &repmgrversion; for
use with PostgreSQL 9.4 - PostgreSQL 13.
use with PostgreSQL 9.4 - PostgreSQL 14.
</para>
<para>
&repmgr; is being continually developed and we strongly recommend using the


@@ -485,6 +485,32 @@
</listitem>
</varlistentry>
<varlistentry>
<term><option>repmgrd_exit_on_inactive_node</option></term>
<listitem>
<indexterm>
<primary>repmgrd_exit_on_inactive_node</primary>
</indexterm>
<para>
This parameter is available in &repmgr; 5.3 and later.
</para>
<para>
If a node was marked as inactive but is running, and this option is set to
<literal>true</literal>, &repmgrd; will abort on startup.
</para>
<para>
By default, <option>repmgrd_exit_on_inactive_node</option> is set
to <literal>false</literal>, in which case &repmgrd; will set the
node record to active on startup.
</para>
<para>
Setting this parameter to <literal>true</literal> causes &repmgrd;
to behave in the same way it did in &repmgr; 5.2 and earlier.
</para>
</listitem>
</varlistentry>
</variablelist>
<para>
@@ -1053,6 +1079,29 @@ REPMGRD_OPTS="--daemonize=false"
</para>
</sect2>
<sect2 id="repmgrd-daemon-monitoring">
<title>repmgrd daemon monitoring</title>
<indexterm>
<primary>repmgrd</primary>
<secondary>monitoring</secondary>
</indexterm>
<indexterm>
<primary>monitoring</primary>
<secondary>repmgrd</secondary>
</indexterm>
<para>
The command <command><link linkend="repmgr-service-status">repmgr service status</link></command>
provides an overview of the &repmgrd; daemon status (including pause status)
on all nodes in the cluster.
</para>
<para>
From &repmgr; 5.3, <command><link linkend="repmgr-node-check">repmgr node check --repmgrd</link></command>
can be used to check the status of &repmgrd; (including pause status)
on the local node.
</para>
</sect2>
</sect1>
<sect1 id="repmgrd-connection-settings">

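The startup behaviour documented for repmgrd_exit_on_inactive_node above can be read as a small decision table. The following is a minimal sketch of that table only, not the repmgrd implementation: the enum, its members and the function name are hypothetical and do not appear in the repmgr source.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not taken from the repmgr source. */
typedef enum
{
	STARTUP_CONTINUE,		/* node record is already active */
	STARTUP_MARK_ACTIVE,	/* set the node record to active, then continue */
	STARTUP_EXIT			/* abort startup (pre-5.3 behaviour) */
} StartupAction;

/*
 * Decide what to do at startup, given whether the local node record is
 * marked active and the value of repmgrd_exit_on_inactive_node.
 */
StartupAction
decide_startup_action(bool node_record_active, bool exit_on_inactive_node)
{
	/* An active record needs no special handling. */
	if (node_record_active)
		return STARTUP_CONTINUE;

	/* Inactive record: the parameter selects between the two behaviours. */
	return exit_on_inactive_node ? STARTUP_EXIT : STARTUP_MARK_ACTIVE;
}
```

With the default setting (false), an inactive record leads to STARTUP_MARK_ACTIVE; setting the parameter to true yields STARTUP_EXIT, which corresponds to the behaviour described for 5.2 and earlier.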

@@ -1,17 +1,10 @@
-- complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION repmgr" to load this file. \quit
\echo Use "ALTER EXTENSION repmgr UPDATE" to load this file. \quit
CREATE FUNCTION set_upstream_last_seen()
RETURNS VOID
AS 'MODULE_PATHNAME', 'set_upstream_last_seen'
LANGUAGE C STRICT;
-- This script is intentionally empty and exists to skip the CREATE FUNCTION
-- commands contained in the 4.2--4.3 and 4.3--4.4 extension upgrade scripts,
-- which reference C functions which no longer exist in 5.3 and later.
--
-- These functions will be explicitly created in the 5.2--5.3 extension
-- upgrade step with the correct C function references.
CREATE FUNCTION get_upstream_last_seen()
RETURNS INT
AS 'MODULE_PATHNAME', 'get_upstream_last_seen'
LANGUAGE C STRICT;
CREATE FUNCTION get_wal_receiver_pid()
RETURNS INT
AS 'MODULE_PATHNAME', 'get_wal_receiver_pid'
LANGUAGE C STRICT;


@@ -1,19 +1,9 @@
-- complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION repmgr" to load this file. \quit
\echo Use "ALTER EXTENSION repmgr UPDATE" to load this file. \quit
DROP FUNCTION set_upstream_last_seen();
CREATE FUNCTION set_upstream_last_seen(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'set_upstream_last_seen'
LANGUAGE C STRICT;
CREATE FUNCTION get_upstream_node_id()
RETURNS INT
AS 'MODULE_PATHNAME', 'get_upstream_node_id'
LANGUAGE C STRICT;
CREATE FUNCTION set_upstream_node_id(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'set_upstream_node_id'
LANGUAGE C STRICT;
-- This script is intentionally empty and exists to skip the CREATE FUNCTION
-- commands contained in the 4.3--4.4 extension upgrade script, which reference
-- C functions which no longer exist in 5.3 and later.
--
-- These functions will be explicitly created in the 5.2--5.3 extension
-- upgrade step with the correct C function references.

repmgr--5.2--5.3.sql (new file, 64 lines)

@@ -0,0 +1,64 @@
-- complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION repmgr" to load this file. \quit
CREATE OR REPLACE FUNCTION set_local_node_id(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'repmgr_set_local_node_id'
LANGUAGE C STRICT;
CREATE OR REPLACE FUNCTION repmgr.get_local_node_id()
RETURNS INT
AS 'MODULE_PATHNAME', 'repmgr_get_local_node_id'
LANGUAGE C STRICT;
CREATE OR REPLACE FUNCTION standby_set_last_updated()
RETURNS TIMESTAMP WITH TIME ZONE
AS 'MODULE_PATHNAME', 'repmgr_standby_set_last_updated'
LANGUAGE C STRICT;
CREATE OR REPLACE FUNCTION standby_get_last_updated()
RETURNS TIMESTAMP WITH TIME ZONE
AS 'MODULE_PATHNAME', 'repmgr_standby_get_last_updated'
LANGUAGE C STRICT;
CREATE OR REPLACE FUNCTION set_upstream_last_seen(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'repmgr_set_upstream_last_seen'
LANGUAGE C STRICT;
CREATE OR REPLACE FUNCTION get_upstream_last_seen()
RETURNS INT
AS 'MODULE_PATHNAME', 'repmgr_get_upstream_last_seen'
LANGUAGE C STRICT;
CREATE OR REPLACE FUNCTION get_upstream_node_id()
RETURNS INT
AS 'MODULE_PATHNAME', 'repmgr_get_upstream_node_id'
LANGUAGE C STRICT;
CREATE OR REPLACE FUNCTION set_upstream_node_id(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'repmgr_set_upstream_node_id'
LANGUAGE C STRICT;
/* failover functions */
CREATE OR REPLACE FUNCTION notify_follow_primary(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'repmgr_notify_follow_primary'
LANGUAGE C STRICT;
CREATE OR REPLACE FUNCTION get_new_primary()
RETURNS INT
AS 'MODULE_PATHNAME', 'repmgr_get_new_primary'
LANGUAGE C STRICT;
CREATE OR REPLACE FUNCTION reset_voting_status()
RETURNS VOID
AS 'MODULE_PATHNAME', 'repmgr_reset_voting_status'
LANGUAGE C STRICT;
CREATE OR REPLACE FUNCTION get_wal_receiver_pid()
RETURNS INT
AS 'MODULE_PATHNAME', 'repmgr_get_wal_receiver_pid'
LANGUAGE C STRICT;

repmgr--5.3.sql (new file, 192 lines)

@@ -0,0 +1,192 @@
-- complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION repmgr" to load this file. \quit
CREATE TABLE repmgr.nodes (
node_id INTEGER PRIMARY KEY,
upstream_node_id INTEGER NULL REFERENCES nodes (node_id) DEFERRABLE,
active BOOLEAN NOT NULL DEFAULT TRUE,
node_name TEXT NOT NULL,
type TEXT NOT NULL CHECK (type IN('primary','standby','witness','bdr')),
location TEXT NOT NULL DEFAULT 'default',
priority INT NOT NULL DEFAULT 100,
conninfo TEXT NOT NULL,
repluser VARCHAR(63) NOT NULL,
slot_name TEXT NULL,
config_file TEXT NOT NULL
);
SELECT pg_catalog.pg_extension_config_dump('repmgr.nodes', '');
CREATE TABLE repmgr.events (
node_id INTEGER NOT NULL,
event TEXT NOT NULL,
successful BOOLEAN NOT NULL DEFAULT TRUE,
event_timestamp TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
details TEXT NULL
);
SELECT pg_catalog.pg_extension_config_dump('repmgr.events', '');
CREATE TABLE repmgr.monitoring_history (
primary_node_id INTEGER NOT NULL,
standby_node_id INTEGER NOT NULL,
last_monitor_time TIMESTAMP WITH TIME ZONE NOT NULL,
last_apply_time TIMESTAMP WITH TIME ZONE,
last_wal_primary_location PG_LSN NOT NULL,
last_wal_standby_location PG_LSN,
replication_lag BIGINT NOT NULL,
apply_lag BIGINT NOT NULL
);
CREATE INDEX idx_monitoring_history_time
ON repmgr.monitoring_history (last_monitor_time, standby_node_id);
SELECT pg_catalog.pg_extension_config_dump('repmgr.monitoring_history', '');
CREATE VIEW repmgr.show_nodes AS
SELECT n.node_id,
n.node_name,
n.active,
n.upstream_node_id,
un.node_name AS upstream_node_name,
n.type,
n.priority,
n.conninfo
FROM repmgr.nodes n
LEFT JOIN repmgr.nodes un
ON un.node_id = n.upstream_node_id;
CREATE TABLE repmgr.voting_term (
term INT NOT NULL
);
CREATE UNIQUE INDEX voting_term_restrict
ON repmgr.voting_term ((TRUE));
CREATE RULE voting_term_delete AS
ON DELETE TO repmgr.voting_term
DO INSTEAD NOTHING;
/* ================= */
/* repmgrd functions */
/* ================= */
/* monitoring functions */
CREATE FUNCTION set_local_node_id(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'repmgr_set_local_node_id'
LANGUAGE C STRICT;
CREATE FUNCTION get_local_node_id()
RETURNS INT
AS 'MODULE_PATHNAME', 'repmgr_get_local_node_id'
LANGUAGE C STRICT;
CREATE FUNCTION standby_set_last_updated()
RETURNS TIMESTAMP WITH TIME ZONE
AS 'MODULE_PATHNAME', 'repmgr_standby_set_last_updated'
LANGUAGE C STRICT;
CREATE FUNCTION standby_get_last_updated()
RETURNS TIMESTAMP WITH TIME ZONE
AS 'MODULE_PATHNAME', 'repmgr_standby_get_last_updated'
LANGUAGE C STRICT;
CREATE FUNCTION set_upstream_last_seen(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'repmgr_set_upstream_last_seen'
LANGUAGE C STRICT;
CREATE FUNCTION get_upstream_last_seen()
RETURNS INT
AS 'MODULE_PATHNAME', 'repmgr_get_upstream_last_seen'
LANGUAGE C STRICT;
CREATE FUNCTION get_upstream_node_id()
RETURNS INT
AS 'MODULE_PATHNAME', 'repmgr_get_upstream_node_id'
LANGUAGE C STRICT;
CREATE FUNCTION set_upstream_node_id(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'repmgr_set_upstream_node_id'
LANGUAGE C STRICT;
/* failover functions */
CREATE FUNCTION notify_follow_primary(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'repmgr_notify_follow_primary'
LANGUAGE C STRICT;
CREATE FUNCTION get_new_primary()
RETURNS INT
AS 'MODULE_PATHNAME', 'repmgr_get_new_primary'
LANGUAGE C STRICT;
CREATE FUNCTION reset_voting_status()
RETURNS VOID
AS 'MODULE_PATHNAME', 'repmgr_reset_voting_status'
LANGUAGE C STRICT;
CREATE FUNCTION get_repmgrd_pid()
RETURNS INT
AS 'MODULE_PATHNAME', 'get_repmgrd_pid'
LANGUAGE C STRICT;
CREATE FUNCTION get_repmgrd_pidfile()
RETURNS TEXT
AS 'MODULE_PATHNAME', 'get_repmgrd_pidfile'
LANGUAGE C STRICT;
CREATE FUNCTION set_repmgrd_pid(INT, TEXT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'set_repmgrd_pid'
LANGUAGE C CALLED ON NULL INPUT;
CREATE FUNCTION repmgrd_is_running()
RETURNS BOOL
AS 'MODULE_PATHNAME', 'repmgrd_is_running'
LANGUAGE C STRICT;
CREATE FUNCTION repmgrd_pause(BOOL)
RETURNS VOID
AS 'MODULE_PATHNAME', 'repmgrd_pause'
LANGUAGE C STRICT;
CREATE FUNCTION repmgrd_is_paused()
RETURNS BOOL
AS 'MODULE_PATHNAME', 'repmgrd_is_paused'
LANGUAGE C STRICT;
CREATE FUNCTION get_wal_receiver_pid()
RETURNS INT
AS 'MODULE_PATHNAME', 'repmgr_get_wal_receiver_pid'
LANGUAGE C STRICT;
/* views */
CREATE VIEW repmgr.replication_status AS
SELECT m.primary_node_id, m.standby_node_id, n.node_name AS standby_name,
n.type AS node_type, n.active, last_monitor_time,
CASE WHEN n.type='standby' THEN m.last_wal_primary_location ELSE NULL END AS last_wal_primary_location,
m.last_wal_standby_location,
CASE WHEN n.type='standby' THEN pg_catalog.pg_size_pretty(m.replication_lag) ELSE NULL END AS replication_lag,
CASE WHEN n.type='standby' THEN
CASE WHEN replication_lag > 0 THEN age(now(), m.last_apply_time) ELSE '0'::INTERVAL END
ELSE NULL
END AS replication_time_lag,
CASE WHEN n.type='standby' THEN pg_catalog.pg_size_pretty(m.apply_lag) ELSE NULL END AS apply_lag,
AGE(NOW(), CASE WHEN pg_catalog.pg_is_in_recovery() THEN repmgr.standby_get_last_updated() ELSE m.last_monitor_time END) AS communication_time_lag
FROM repmgr.monitoring_history m
JOIN repmgr.nodes n ON m.standby_node_id = n.node_id
WHERE (m.standby_node_id, m.last_monitor_time) IN (
SELECT m1.standby_node_id, MAX(m1.last_monitor_time)
FROM repmgr.monitoring_history m1 GROUP BY 1
);

repmgr--unpackaged--5.3.sql (new file, 245 lines)

@@ -0,0 +1,245 @@
-- complain if script is sourced in psql, rather than via CREATE EXTENSION
\echo Use "CREATE EXTENSION repmgr" to load this file. \quit
-- extract the current schema name
-- NOTE: this assumes there will be only one schema matching 'repmgr_%';
-- user is responsible for ensuring this is the case
CREATE TEMPORARY TABLE repmgr_old_schema (schema_name TEXT);
INSERT INTO repmgr_old_schema (schema_name)
SELECT nspname AS schema_name
FROM pg_catalog.pg_namespace
WHERE nspname LIKE 'repmgr_%'
LIMIT 1;
-- move old objects into new schema
DO $repmgr$
DECLARE
old_schema TEXT;
BEGIN
SELECT schema_name FROM repmgr_old_schema
INTO old_schema;
EXECUTE format('ALTER TABLE %I.repl_nodes SET SCHEMA repmgr', old_schema);
EXECUTE format('ALTER TABLE %I.repl_events SET SCHEMA repmgr', old_schema);
EXECUTE format('ALTER TABLE %I.repl_monitor SET SCHEMA repmgr', old_schema);
EXECUTE format('DROP VIEW IF EXISTS %I.repl_show_nodes', old_schema);
EXECUTE format('DROP VIEW IF EXISTS %I.repl_status', old_schema);
END$repmgr$;
-- convert "repmgr_$cluster.repl_nodes" to "repmgr.nodes"
CREATE TABLE repmgr.nodes (
node_id INTEGER PRIMARY KEY,
upstream_node_id INTEGER NULL REFERENCES repmgr.nodes (node_id) DEFERRABLE,
active BOOLEAN NOT NULL DEFAULT TRUE,
node_name TEXT NOT NULL,
type TEXT NOT NULL CHECK (type IN('primary','standby','witness','bdr')),
location TEXT NOT NULL DEFAULT 'default',
priority INT NOT NULL DEFAULT 100,
conninfo TEXT NOT NULL,
repluser VARCHAR(63) NOT NULL,
slot_name TEXT NULL,
config_file TEXT NOT NULL
);
INSERT INTO repmgr.nodes
(node_id, upstream_node_id, active, node_name, type, location, priority, conninfo, repluser, slot_name, config_file)
SELECT id, upstream_node_id, active, name,
CASE WHEN type = 'master' THEN 'primary' ELSE type END,
'default', priority, conninfo, 'unknown', slot_name, 'unknown'
FROM repmgr.repl_nodes
ORDER BY id;
-- convert "repmgr_$cluster.repl_events" to "repmgr.events"
CREATE TABLE repmgr.events (
node_id INTEGER NOT NULL,
event TEXT NOT NULL,
successful BOOLEAN NOT NULL DEFAULT TRUE,
event_timestamp TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP,
details TEXT NULL
);
INSERT INTO repmgr.events
(node_id, event, successful, event_timestamp, details)
SELECT node_id, event, successful, event_timestamp, details
FROM repmgr.repl_events;
-- create new table "repmgr.voting_term"
CREATE TABLE repmgr.voting_term (
term INT NOT NULL
);
CREATE UNIQUE INDEX voting_term_restrict
ON repmgr.voting_term ((TRUE));
CREATE RULE voting_term_delete AS
ON DELETE TO repmgr.voting_term
DO INSTEAD NOTHING;
INSERT INTO repmgr.voting_term (term) VALUES (1);
-- convert "repmgr_$cluster.repl_monitor" to "monitoring_history"
CREATE TABLE repmgr.monitoring_history (
primary_node_id INTEGER NOT NULL,
standby_node_id INTEGER NOT NULL,
last_monitor_time TIMESTAMP WITH TIME ZONE NOT NULL,
last_apply_time TIMESTAMP WITH TIME ZONE,
last_wal_primary_location PG_LSN NOT NULL,
last_wal_standby_location PG_LSN,
replication_lag BIGINT NOT NULL,
apply_lag BIGINT NOT NULL
);
INSERT INTO repmgr.monitoring_history
(primary_node_id, standby_node_id, last_monitor_time, last_apply_time, last_wal_primary_location, last_wal_standby_location, replication_lag, apply_lag)
SELECT primary_node, standby_node, last_monitor_time, last_apply_time, last_wal_primary_location::pg_lsn, last_wal_standby_location::pg_lsn, replication_lag, apply_lag
FROM repmgr.repl_monitor;
CREATE INDEX idx_monitoring_history_time
ON repmgr.monitoring_history (last_monitor_time, standby_node_id);
CREATE VIEW repmgr.show_nodes AS
SELECT n.node_id,
n.node_name,
n.active,
n.upstream_node_id,
un.node_name AS upstream_node_name,
n.type,
n.priority,
n.conninfo
FROM repmgr.nodes n
LEFT JOIN repmgr.nodes un
ON un.node_id = n.upstream_node_id;
/* ================= */
/* repmgrd functions */
/* ================= */
/* monitoring functions */
CREATE FUNCTION set_local_node_id(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'repmgr_set_local_node_id'
LANGUAGE C STRICT;
CREATE FUNCTION get_local_node_id()
RETURNS INT
AS 'MODULE_PATHNAME', 'repmgr_get_local_node_id'
LANGUAGE C STRICT;
CREATE FUNCTION standby_set_last_updated()
RETURNS TIMESTAMP WITH TIME ZONE
AS 'MODULE_PATHNAME', 'repmgr_standby_set_last_updated'
LANGUAGE C STRICT;
CREATE FUNCTION standby_get_last_updated()
RETURNS TIMESTAMP WITH TIME ZONE
AS 'MODULE_PATHNAME', 'repmgr_standby_get_last_updated'
LANGUAGE C STRICT;
CREATE FUNCTION set_upstream_last_seen(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'repmgr_set_upstream_last_seen'
LANGUAGE C STRICT;
CREATE FUNCTION get_upstream_last_seen()
RETURNS INT
AS 'MODULE_PATHNAME', 'repmgr_get_upstream_last_seen'
LANGUAGE C STRICT;
CREATE FUNCTION get_upstream_node_id()
RETURNS INT
AS 'MODULE_PATHNAME', 'repmgr_get_upstream_node_id'
LANGUAGE C STRICT;
CREATE FUNCTION set_upstream_node_id(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'repmgr_set_upstream_node_id'
LANGUAGE C STRICT;
/* failover functions */
CREATE FUNCTION notify_follow_primary(INT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'repmgr_notify_follow_primary'
LANGUAGE C STRICT;
CREATE FUNCTION get_new_primary()
RETURNS INT
AS 'MODULE_PATHNAME', 'repmgr_get_new_primary'
LANGUAGE C STRICT;
CREATE FUNCTION reset_voting_status()
RETURNS VOID
AS 'MODULE_PATHNAME', 'repmgr_reset_voting_status'
LANGUAGE C STRICT;
CREATE FUNCTION get_repmgrd_pid()
RETURNS INT
AS 'MODULE_PATHNAME', 'get_repmgrd_pid'
LANGUAGE C STRICT;
CREATE FUNCTION get_repmgrd_pidfile()
RETURNS TEXT
AS 'MODULE_PATHNAME', 'get_repmgrd_pidfile'
LANGUAGE C STRICT;
CREATE FUNCTION set_repmgrd_pid(INT, TEXT)
RETURNS VOID
AS 'MODULE_PATHNAME', 'set_repmgrd_pid'
LANGUAGE C CALLED ON NULL INPUT;
CREATE FUNCTION repmgrd_is_running()
RETURNS BOOL
AS 'MODULE_PATHNAME', 'repmgrd_is_running'
LANGUAGE C STRICT;
CREATE FUNCTION repmgrd_pause(BOOL)
RETURNS VOID
AS 'MODULE_PATHNAME', 'repmgrd_pause'
LANGUAGE C STRICT;
CREATE FUNCTION repmgrd_is_paused()
RETURNS BOOL
AS 'MODULE_PATHNAME', 'repmgrd_is_paused'
LANGUAGE C STRICT;
CREATE FUNCTION get_wal_receiver_pid()
RETURNS INT
AS 'MODULE_PATHNAME', 'repmgr_get_wal_receiver_pid'
LANGUAGE C STRICT;
/* views */
CREATE VIEW repmgr.replication_status AS
SELECT m.primary_node_id, m.standby_node_id, n.node_name AS standby_name,
n.type AS node_type, n.active, last_monitor_time,
CASE WHEN n.type='standby' THEN m.last_wal_primary_location ELSE NULL END AS last_wal_primary_location,
m.last_wal_standby_location,
CASE WHEN n.type='standby' THEN pg_catalog.pg_size_pretty(m.replication_lag) ELSE NULL END AS replication_lag,
CASE WHEN n.type='standby' THEN
CASE WHEN replication_lag > 0 THEN age(now(), m.last_apply_time) ELSE '0'::INTERVAL END
ELSE NULL
END AS replication_time_lag,
CASE WHEN n.type='standby' THEN pg_catalog.pg_size_pretty(m.apply_lag) ELSE NULL END AS apply_lag,
AGE(NOW(), CASE WHEN pg_catalog.pg_is_in_recovery() THEN repmgr.standby_get_last_updated() ELSE m.last_monitor_time END) AS communication_time_lag
FROM repmgr.monitoring_history m
JOIN repmgr.nodes n ON m.standby_node_id = n.node_id
WHERE (m.standby_node_id, m.last_monitor_time) IN (
SELECT m1.standby_node_id, MAX(m1.last_monitor_time)
FROM repmgr.monitoring_history m1 GROUP BY 1
);
/* drop old tables */
DROP TABLE repmgr.repl_nodes;
DROP TABLE repmgr.repl_monitor;
DROP TABLE repmgr.repl_events;
-- remove temporary table
DROP TABLE repmgr_old_schema;


@@ -35,6 +35,7 @@
static bool copy_file(const char *src_file, const char *dest_file);
static void format_archive_dir(PQExpBufferData *archive_dir);
static t_server_action parse_server_action(const char *action);
static const char *output_repmgrd_status(CheckStatus status);
static void exit_optformat_error(const char *error, int errcode);
@@ -52,9 +53,11 @@ static CheckStatus do_node_check_role(PGconn *conn, OutputMode mode, t_node_info
static CheckStatus do_node_check_slots(PGconn *conn, OutputMode mode, t_node_info *node_info, CheckStatusList *list_output);
static CheckStatus do_node_check_missing_slots(PGconn *conn, OutputMode mode, t_node_info *node_info, CheckStatusList *list_output);
static CheckStatus do_node_check_data_directory(PGconn *conn, OutputMode mode, t_node_info *node_info, CheckStatusList *list_output);
static CheckStatus do_node_check_repmgrd(PGconn *conn, OutputMode mode, t_node_info *node_info, CheckStatusList *list_output);
static CheckStatus do_node_check_replication_config_owner(PGconn *conn, OutputMode mode, t_node_info *node_info, CheckStatusList *list_output);
static CheckStatus do_node_check_db_connection(PGconn *conn, OutputMode mode);
/*
* NODE STATUS
*
@@ -941,6 +944,16 @@ do_node_check(void)
exit(return_code);
}
if (runtime_options.repmgrd == true)
{
return_code = do_node_check_repmgrd(conn,
runtime_options.output_mode,
&node_info,
NULL);
PQfinish(conn);
exit(return_code);
}
if (runtime_options.replication_config_owner == true)
{
return_code = do_node_check_replication_config_owner(conn,
@@ -2024,7 +2037,6 @@ do_node_check_missing_slots(PGconn *conn, OutputMode mode, t_node_info *node_inf
return status;
}
CheckStatus
do_node_check_data_directory(PGconn *conn, OutputMode mode, t_node_info *node_info, CheckStatusList *list_output)
{
@@ -2159,6 +2171,53 @@ do_node_check_data_directory(PGconn *conn, OutputMode mode, t_node_info *node_in
return status;
}
CheckStatus
do_node_check_repmgrd(PGconn *conn, OutputMode mode, t_node_info *node_info, CheckStatusList *list_output)
{
CheckStatus status = CHECK_STATUS_OK;
if (mode == OM_CSV && list_output == NULL)
{
log_error(_("--csv output not provided with --repmgrd option"));
PQfinish(conn);
exit(ERR_BAD_CONFIG);
}
status = get_repmgrd_status(conn);
switch (mode)
{
case OM_OPTFORMAT:
printf("--repmgrd=%s\n",
output_check_status(status));
break;
case OM_NAGIOS:
printf("REPMGRD %s: %s\n",
output_check_status(status),
output_repmgrd_status(status));
break;
case OM_CSV:
case OM_TEXT:
if (list_output != NULL)
{
check_status_list_set(list_output,
"repmgrd",
status,
output_repmgrd_status(status));
}
else
{
printf("%s (%s)\n",
output_check_status(status),
output_repmgrd_status(status));
}
default:
break;
}
return status;
}
/*
* This is not included in the general list output
*/
@@ -2820,7 +2879,8 @@ do_node_rejoin(void)
log_notice(_("temporarily removing \"standby.signal\""));
log_detail(_("this is required so pg_rewind can fix the unclean shutdown"));
make_standby_signal_path(standby_signal_file_path);
make_standby_signal_path(config_file_options.data_directory,
standby_signal_file_path);
if (unlink(standby_signal_file_path) < 0 && errno != ENOENT)
{
@@ -2845,7 +2905,7 @@ do_node_rejoin(void)
* of whether the pg_rewind operation failed.
*/
log_notice(_("recreating \"standby.signal\""));
write_standby_signal();
write_standby_signal(config_file_options.data_directory);
}
if (ret == false)
@@ -3569,6 +3629,25 @@ copy_file(const char *src_file, const char *dest_file)
}
static const char *
output_repmgrd_status(CheckStatus status)
{
switch (status)
{
case CHECK_STATUS_OK:
return "repmgrd running";
case CHECK_STATUS_WARNING:
return "repmgrd running but paused";
case CHECK_STATUS_CRITICAL:
return "repmgrd not running";
case CHECK_STATUS_UNKNOWN:
return "repmgrd status unknown";
}
return "UNKNOWN";
}
void
do_node_help(void)
{
@@ -3611,6 +3690,7 @@ do_node_help(void)
printf(_(" --role check node has expected role\n"));
printf(_(" --slots check for inactive replication slots\n"));
printf(_(" --missing-slots check for missing replication slots\n"));
printf(_(" --repmgrd check if repmgrd is running\n"));
printf(_(" --data-directory-config check repmgr's data directory configuration\n"));
puts("");

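The new --repmgrd check maps the daemon state onto one of four check statuses, each with a fixed description. The diff does not show how get_repmgrd_status() derives the status, so the following is only a sketch of a plausible mapping. The numeric values assume the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), which agrees with the documented status 1 for a paused daemon. The type and function names are hypothetical; only the description strings are taken from output_repmgrd_status() above.

```c
#include <stdbool.h>

/* Assumed numbering, following the Nagios plugin exit-code convention. */
typedef enum
{
	SKETCH_STATUS_OK = 0,
	SKETCH_STATUS_WARNING = 1,
	SKETCH_STATUS_CRITICAL = 2,
	SKETCH_STATUS_UNKNOWN = 3
} SketchStatus;

/* Hypothetical classifier: derive a status from the observed daemon state. */
SketchStatus
classify_repmgrd(bool state_known, bool running, bool paused)
{
	if (!state_known)
		return SKETCH_STATUS_UNKNOWN;

	if (!running)
		return SKETCH_STATUS_CRITICAL;

	/* Running but paused is reported as a warning, not a failure. */
	return paused ? SKETCH_STATUS_WARNING : SKETCH_STATUS_OK;
}

/* Description strings as they appear in output_repmgrd_status(). */
const char *
describe_repmgrd(SketchStatus status)
{
	switch (status)
	{
		case SKETCH_STATUS_OK:
			return "repmgrd running";
		case SKETCH_STATUS_WARNING:
			return "repmgrd running but paused";
		case SKETCH_STATUS_CRITICAL:
			return "repmgrd not running";
		case SKETCH_STATUS_UNKNOWN:
			return "repmgrd status unknown";
	}

	return "UNKNOWN";
}
```

Because the status doubles as the process exit code in this sketch, a monitoring system can act on the return value alone and use the description only for display.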

@@ -20,6 +20,7 @@
*/
#include <sys/stat.h>
#include <time.h>
#include "repmgr.h"
#include "dirutil.h"
@@ -173,21 +174,6 @@ do_standby_clone(void)
initialize_conninfo_params(&recovery_conninfo, false);
/*
* --replication-conf-only provided - we'll handle that separately
*/
if (runtime_options.replication_conf_only == true)
{
return _do_create_replication_conf();
}
/*
* conninfo params for the actual upstream node (which might be different
* to the node we're cloning from) to write to recovery.conf
*/
mode = get_standby_clone_mode();
/*
* Copy the provided data directory; if a configuration file was provided,
* use the (mandatory) value from that; if -D/--pgdata was provided, use
@@ -215,6 +201,19 @@ do_standby_clone(void)
exit(ERR_BAD_CONFIG);
}
/*
* --replication-conf-only provided - we'll handle that separately
*/
if (runtime_options.replication_conf_only == true)
{
return _do_create_replication_conf();
}
/*
* conninfo params for the actual upstream node (which might be different
* to the node we're cloning from) to write to recovery.conf
*/
mode = get_standby_clone_mode();
if (mode == barman)
{
@@ -670,6 +669,15 @@ do_standby_clone(void)
log_hint(_("consider using the -c/--fast-checkpoint option"));
}
if (mode == pg_basebackup)
{
/*
* In --dry-run mode, this will just output the pg_basebackup command which
* would be executed.
*/
run_basebackup(&local_node_record);
}
PQfinish(source_conn);
log_info(_("all prerequisites for \"standby clone\" are met"));
@@ -1538,7 +1546,7 @@ _do_create_replication_conf(void)
}
else
{
if (write_standby_signal() == false)
if (write_standby_signal(local_data_directory) == false)
{
log_error(_("unable to write \"standby.signal\" file"));
}
@@ -1999,7 +2007,7 @@ do_standby_register(void)
/* only do this if record does not exist */
if (record_status != RECORD_FOUND)
{
log_warning(_("--upstream-node-id not supplied, assuming upstream node is primary (node ID %i)"),
log_warning(_("--upstream-node-id not supplied, assuming upstream node is primary (node ID: %i)"),
primary_node_id);
/* check our standby is connected */
@@ -3644,8 +3652,9 @@ do_standby_switchover(void)
PQExpBufferData remote_command_str;
PQExpBufferData command_output;
PQExpBufferData node_rejoin_options;
PQExpBufferData errmsg;
PQExpBufferData logmsg;
PQExpBufferData detailmsg;
PQExpBufferData event_details;
int r,
i;
@@ -3662,6 +3671,9 @@ do_standby_switchover(void)
/* store list of configuration files on the demotion candidate */
KeyValueList remote_config_files = {NULL, NULL};
/* temporary log file for "repmgr node rejoin" on the demotion candidate */
char node_rejoin_log[MAXPGPATH] = "";
NodeInfoList sibling_nodes = T_NODE_INFO_LIST_INITIALIZER;
SiblingNodeStats sibling_nodes_stats = T_SIBLING_NODES_STATS_INITIALIZER;
@@ -3804,24 +3816,24 @@ do_standby_switchover(void)
* the demotion candidate as the rejoin will fail if we are unable to write to that.
*/
initPQExpBuffer(&errmsg);
initPQExpBuffer(&logmsg);
initPQExpBuffer(&detailmsg);
if (check_replication_config_owner(PQserverVersion(local_conn),
config_file_options.data_directory,
&errmsg, &detailmsg) == false)
&logmsg, &detailmsg) == false)
{
log_error("%s", errmsg.data);
log_error("%s", logmsg.data);
log_detail("%s", detailmsg.data);
termPQExpBuffer(&errmsg);
termPQExpBuffer(&logmsg);
termPQExpBuffer(&detailmsg);
PQfinish(local_conn);
exit(ERR_BAD_CONFIG);
}
termPQExpBuffer(&errmsg);
termPQExpBuffer(&logmsg);
termPQExpBuffer(&detailmsg);
/* check remote server connection and retrieve its record */
@@ -4769,6 +4781,7 @@ do_standby_switchover(void)
repmgrd_info = (RepmgrdInfo **) pg_malloc0(sizeof(RepmgrdInfo *) * all_nodes.node_count);
log_notice(_("attempting to pause repmgrd on %i nodes"), all_nodes.node_count);
for (cell = all_nodes.head; cell; cell = cell->next)
{
repmgrd_info[i] = pg_malloc0(sizeof(RepmgrdInfo));
@@ -4797,7 +4810,7 @@ do_standby_switchover(void)
unreachable_node_count++;
item_list_append_format(&repmgrd_connection_errors,
_("unable to connect to node \"%s\" (ID %i):\n%s"),
_("unable to connect to node \"%s\" (ID: %i):\n%s"),
cell->node_info->node_name,
cell->node_info->node_id,
PQerrorMessage(cell->node_info->conn));
@@ -4828,8 +4841,9 @@ do_standby_switchover(void)
initPQExpBuffer(&msg);
appendPQExpBuffer(&msg,
_("unable to connect to %i node(s), unable to pause all repmgrd instances"),
unreachable_node_count);
_("unable to connect to %i of %i node(s), unable to pause all repmgrd instances"),
unreachable_node_count,
all_nodes.node_count);
initPQExpBuffer(&detail);
@@ -4880,7 +4894,7 @@ do_standby_switchover(void)
*/
if (repmgrd_info[i]->pg_running == false)
{
log_warning(_("node \"%s\" (ID %i) unreachable, unable to pause repmgrd"),
log_warning(_("node \"%s\" (ID: %i) unreachable, unable to pause repmgrd"),
cell->node_info->node_name,
cell->node_info->node_id);
i++;
@@ -4893,7 +4907,7 @@ do_standby_switchover(void)
*/
if (repmgrd_info[i]->running == false)
{
log_warning(_("repmgrd not running on node \"%s\" (ID %i)"),
log_notice(_("repmgrd not running on node \"%s\" (ID: %i), not pausing"),
cell->node_info->node_name,
cell->node_info->node_id);
i++;
@@ -4914,14 +4928,14 @@ do_standby_switchover(void)
if (runtime_options.dry_run == true)
{
log_info(_("would pause repmgrd on node \"%s\" (ID %i)"),
log_info(_("would pause repmgrd on node \"%s\" (ID: %i)"),
cell->node_info->node_name,
cell->node_info->node_id);
}
else
{
/* XXX check result */
log_debug("pausing repmgrd on node \"%s\" (ID %i)",
log_debug("pausing repmgrd on node \"%s\" (ID: %i)",
cell->node_info->node_name,
cell->node_info->node_id);
@@ -5221,6 +5235,18 @@ do_standby_switchover(void)
format_lsn(replication_info.last_wal_receive_lsn),
format_lsn(remote_last_checkpoint_lsn));
/*
* optionally add a delay before promoting the standby; this is mainly
* useful for testing (e.g. for reappearance of the original primary) and
* is not documented.
*/
if (config_file_options.promote_delay > 0)
{
log_debug("sleeping %i seconds before promoting standby",
config_file_options.promote_delay);
sleep(config_file_options.promote_delay);
}
/*
* Promote standby (local node).
*
@@ -5346,6 +5372,21 @@ do_standby_switchover(void)
pfree(conninfo_normalized);
}
/* log the output of "repmgr node rejoin" to a temporary file on the demotion candidate */
snprintf(node_rejoin_log, MAXPGPATH,
#if defined(__i386__) || defined(__i386)
"/tmp/node-rejoin.%u.log",
(unsigned)time(NULL)
#else
"/tmp/node-rejoin.%lu.log",
(unsigned long)time(NULL)
#endif
);
appendPQExpBuffer(&remote_command_str,
" > %s 2>&1 && echo \"1\" || echo \"0\"",
node_rejoin_log);
termPQExpBuffer(&node_rejoin_options);
log_debug("executing:\n %s", remote_command_str.data);
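The hunk above builds the temporary log file name from the current time and selects the printf format with a preprocessor conditional. As an illustration only, the same file name can be produced without the conditional by casting the time_t value to long long, which C99 guarantees is at least 64 bits wide. The function name below is hypothetical and is not part of repmgr.

```c
#include <stdio.h>
#include <time.h>

/*
 * Write "/tmp/node-rejoin.<epoch seconds>.log" into buf.
 * Returns the snprintf() result: the number of characters that the full
 * path requires, excluding the terminating NUL.
 */
int
format_rejoin_log_path(char *buf, size_t buflen, time_t now)
{
	/* Casting to long long avoids a platform-specific format specifier. */
	return snprintf(buf, buflen, "/tmp/node-rejoin.%lld.log", (long long) now);
}
```

A caller would pass time(NULL) as the last argument and compare the return value against the buffer size to detect truncation.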
@@ -5359,78 +5400,161 @@ do_standby_switchover(void)
termPQExpBuffer(&remote_command_str);
/* TODO: verify this node's record was updated correctly */
initPQExpBuffer(&logmsg);
initPQExpBuffer(&detailmsg);
/* This is failure to execute the ssh command */
if (command_success == false)
{
log_error(_("rejoin failed with error code %i"), r);
switchover_success = false;
appendPQExpBuffer(&logmsg,
_("unable to execute \"repmgr node rejoin\" on demotion candidate \"%s\" (ID: %i)"),
remote_node_record.node_name,
remote_node_record.node_id);
appendPQExpBufferStr(&detailmsg,
command_output.data);
create_event_notification_extended(local_conn,
&config_file_options,
config_file_options.node_id,
"standby_switchover",
false,
command_output.data,
&event_info);
}
else
{
PQExpBufferData event_details;
standy_join_status join_success = check_standby_join(local_conn,
&local_node_record,
&remote_node_record);
standy_join_status join_success = JOIN_UNKNOWN;
initPQExpBuffer(&event_details);
switch (join_success) {
case JOIN_FAIL_NO_PING:
appendPQExpBuffer(&event_details,
_("node \"%s\" (ID: %i) promoted to primary, but demote node \"%s\" (ID: %i) did not beome available"),
config_file_options.node_name,
config_file_options.node_id,
remote_node_record.node_name,
remote_node_record.node_id);
switchover_success = false;
break;
case JOIN_FAIL_NO_REPLICATION:
appendPQExpBuffer(&event_details,
_("node \"%s\" (ID: %i) promoted to primary, but demote node \"%s\" (ID: %i) did not connect to the new primary"),
config_file_options.node_name,
config_file_options.node_id,
remote_node_record.node_name,
remote_node_record.node_id);
switchover_success = false;
break;
case JOIN_SUCCESS:
appendPQExpBuffer(&event_details,
_("node \"%s\" (ID: %i) promoted to primary, node \"%s\" (ID: %i) demoted to standby"),
config_file_options.node_name,
config_file_options.node_id,
remote_node_record.node_name,
remote_node_record.node_id);
}
create_event_notification_extended(local_conn,
&config_file_options,
config_file_options.node_id,
"standby_switchover",
switchover_success,
event_details.data,
&event_info);
if (switchover_success == true)
/* "repmgr node rejoin" failed on the demotion candidate */
if (command_output.data[0] == '0')
{
log_notice("%s", event_details.data);
appendPQExpBuffer(&logmsg,
_("execution of \"repmgr node rejoin\" on demotion candidate \"%s\" (ID: %i) failed"),
remote_node_record.node_name,
remote_node_record.node_id);
/*
* Speculatively check if the demotion candidate has been restarted, e.g. by
* an external watchdog process which isn't aware a switchover is happening.
* This falls into the category "thing outside of our control which shouldn't
* happen, but if it does, make it easier to find out what happened".
*/
remote_conn = establish_db_connection(remote_node_record.conninfo, false);
if (PQstatus(remote_conn) == CONNECTION_OK)
{
if (get_recovery_type(remote_conn) == RECTYPE_PRIMARY)
{
appendPQExpBuffer(&detailmsg,
_("PostgreSQL instance on demotion candidate \"%s\" (ID: %i) is running as a primary\n"),
remote_node_record.node_name,
remote_node_record.node_id);
log_warning("%s", detailmsg.data);
}
}
PQfinish(remote_conn);
appendPQExpBuffer(&detailmsg,
"check log file \"%s\" on \"%s\" for details",
node_rejoin_log,
remote_node_record.node_name);
switchover_success = false;
join_success = JOIN_COMMAND_FAIL;
}
else
{
log_error("%s", event_details.data);
join_success = check_standby_join(local_conn,
&local_node_record,
&remote_node_record);
switch (join_success) {
case JOIN_FAIL_NO_PING:
appendPQExpBuffer(&logmsg,
_("node \"%s\" (ID: %i) promoted to primary, but demotion candidate \"%s\" (ID: %i) did not become available"),
config_file_options.node_name,
config_file_options.node_id,
remote_node_record.node_name,
remote_node_record.node_id);
switchover_success = false;
break;
case JOIN_FAIL_NO_REPLICATION:
appendPQExpBuffer(&logmsg,
_("node \"%s\" (ID: %i) promoted to primary, but demotion candidate \"%s\" (ID: %i) did not connect to the new primary"),
config_file_options.node_name,
config_file_options.node_id,
remote_node_record.node_name,
remote_node_record.node_id);
switchover_success = false;
break;
case JOIN_SUCCESS:
appendPQExpBuffer(&logmsg,
_("node \"%s\" (ID: %i) promoted to primary, node \"%s\" (ID: %i) demoted to standby"),
config_file_options.node_name,
config_file_options.node_id,
remote_node_record.node_name,
remote_node_record.node_id);
break;
/* check_standby_join() does not return this */
case JOIN_COMMAND_FAIL:
break;
/* should never happen */
case JOIN_UNKNOWN:
appendPQExpBuffer(&logmsg,
"unable to determine success of node rejoin action for demotion candidate \"%s\" (ID: %i)",
remote_node_record.node_name,
remote_node_record.node_id);
switchover_success = false;
break;
}
if (switchover_success == false)
{
appendPQExpBuffer(&detailmsg,
"check the PostgreSQL log file on demotion candidate \"%s\" (ID: %i)",
remote_node_record.node_name,
remote_node_record.node_id);
}
}
termPQExpBuffer(&event_details);
}
if (switchover_success == true)
{
/* TODO: verify demotion candidate's node record was updated correctly */
log_notice("%s", logmsg.data);
}
else
{
log_error("%s", logmsg.data);
}
initPQExpBuffer(&event_details);
appendPQExpBufferStr(&event_details, logmsg.data);
if (detailmsg.data[0] != '\0')
{
log_detail("%s", detailmsg.data);
appendPQExpBuffer(&event_details, "\n%s",
detailmsg.data);
}
create_event_notification_extended(local_conn,
&config_file_options,
config_file_options.node_id,
"standby_switchover",
switchover_success,
event_details.data,
&event_info);
termPQExpBuffer(&event_details);
termPQExpBuffer(&logmsg);
termPQExpBuffer(&detailmsg);
termPQExpBuffer(&command_output);
/*
* If --siblings-follow specified, attempt to make them follow the new
* primary
@@ -5543,7 +5667,7 @@ do_standby_switchover(void)
if (repmgrd_info[i]->paused == true && runtime_options.repmgrd_force_unpause == false)
{
log_debug("repmgrd on node \"%s\" (ID %i) paused before switchover, --repmgrd-force-unpause not provided, not unpausing",
log_debug("repmgrd on node \"%s\" (ID: %i) paused before switchover, --repmgrd-force-unpause not provided, not unpausing",
cell->node_info->node_name,
cell->node_info->node_id);
@@ -5551,7 +5675,7 @@ do_standby_switchover(void)
continue;
}
log_debug("unpausing repmgrd on node \"%s\" (ID %i)",
log_debug("unpausing repmgrd on node \"%s\" (ID: %i)",
cell->node_info->node_name,
cell->node_info->node_id);
@@ -5562,7 +5686,7 @@ do_standby_switchover(void)
if (repmgrd_pause(cell->node_info->conn, false) == false)
{
item_list_append_format(&repmgrd_unpause_errors,
_("unable to unpause node \"%s\" (ID %i)"),
_("unable to unpause node \"%s\" (ID: %i)"),
cell->node_info->node_name,
cell->node_info->node_id);
error_node_count++;
@@ -5571,7 +5695,7 @@ do_standby_switchover(void)
else
{
item_list_append_format(&repmgrd_unpause_errors,
_("unable to connect to node \"%s\" (ID %i):\n%s"),
_("unable to connect to node \"%s\" (ID: %i):\n%s"),
cell->node_info->node_name,
cell->node_info->node_id,
PQerrorMessage(cell->node_info->conn));
@@ -6855,6 +6979,13 @@ run_basebackup(t_node_info *node_record)
termPQExpBuffer(&params);
if (runtime_options.dry_run == true)
{
log_info(_("would execute:\n %s"), script.data);
termPQExpBuffer(&script);
return SUCCESS;
}
log_info(_("executing:\n %s"), script.data);
/*
@@ -8018,9 +8149,9 @@ create_recovery_file(t_node_info *node_record, t_conninfo_param_list *primary_co
free(escaped);
}
/*
* Caller requests the generated file to be written into a buffer
*/
if (as_file == false)
{
/* create file in buffer */
@@ -8040,20 +8171,17 @@ create_recovery_file(t_node_info *node_record, t_conninfo_param_list *primary_co
return true;
}
/*
* PostgreSQL 12 and later: modify postgresql.auto.conf
*
*/
if (server_version_num >= 120000)
{
if (modify_auto_conf(dest, &recovery_config) == false)
{
return false;
}
if (write_standby_signal() == false)
if (write_standby_signal(dest) == false)
{
return false;
}
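
The node rejoin log path built in do_standby_switchover() above uses an architecture-specific `#if` to pick a `printf` format for `time(NULL)`. A minimal sketch of an alternative, assuming a C99 compiler: cast the timestamp to `long long` so one format string works everywhere. The helper name `make_rejoin_log_path` is hypothetical and not part of repmgr.

```c
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper (not in repmgr): write "/tmp/node-rejoin.<epoch>.log"
 * into buf. Returns the value of snprintf(), i.e. the length the full
 * string needs, so callers can detect truncation.
 */
int
make_rejoin_log_path(char *buf, size_t buflen, time_t now)
{
	/* "long long" is at least 64 bits, so no per-platform format is needed */
	return snprintf(buf, buflen, "/tmp/node-rejoin.%lld.log", (long long) now);
}
```

A caller would pass `time(NULL)` as the last argument and a buffer of `MAXPGPATH` bytes, as the code in the diff does.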


@@ -120,6 +120,7 @@ typedef struct
bool missing_slots;
bool has_passfile;
bool replication_connection;
bool repmgrd;
bool data_directory_config;
bool replication_config_owner;
bool db_connection;
@@ -175,7 +176,7 @@ typedef struct
/* "node status" options */ \
false, \
/* "node check" options */ \
false, false, false, false, false, false, false, false, false, false, false, false, \
false, false, false, false, false, false, false, false, false, false, false, false, false, \
/* "node rejoin" options */ \
"", \
/* "node service" options */ \
@@ -219,7 +220,9 @@ typedef enum
typedef enum
{
JOIN_UNKNOWN = -1,
JOIN_SUCCESS,
JOIN_COMMAND_FAIL,
JOIN_FAIL_NO_PING,
JOIN_FAIL_NO_REPLICATION
} standy_join_status;
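
The enum above gains `JOIN_UNKNOWN` and `JOIN_COMMAND_FAIL`. As a rough sketch of how do_standby_switchover() appears to treat each value, the two helpers below map a status to a success flag and to the log a user is pointed at. Both helper names are hypothetical and the hint strings are paraphrased, not the messages repmgr emits.

```c
#include <stdbool.h>

/* Copy of the enum from the diff; the type name's spelling is kept as-is */
typedef enum
{
	JOIN_UNKNOWN = -1,
	JOIN_SUCCESS,
	JOIN_COMMAND_FAIL,
	JOIN_FAIL_NO_PING,
	JOIN_FAIL_NO_REPLICATION
} standy_join_status;

/* Hypothetical helper: only JOIN_SUCCESS leaves the switchover successful */
bool
join_status_is_success(standy_join_status status)
{
	return status == JOIN_SUCCESS;
}

/* Hypothetical helper: which log to consult for a given outcome */
const char *
join_status_log_hint(standy_join_status status)
{
	switch (status)
	{
		case JOIN_COMMAND_FAIL:
			/* "repmgr node rejoin" itself failed; its output was redirected */
			return "node rejoin log file on the demotion candidate";
		case JOIN_FAIL_NO_PING:
		case JOIN_FAIL_NO_REPLICATION:
		case JOIN_UNKNOWN:
			return "PostgreSQL log file on the demotion candidate";
		case JOIN_SUCCESS:
			break;
	}
	return "";
}
```

Listing every enum value in the `switch` without a `default` lets the compiler warn if a further status is added later.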
@@ -282,8 +285,8 @@ extern void get_node_config_directory(char *config_dir_buf);
extern void get_node_data_directory(char *data_dir_buf);
extern void init_node_record(t_node_info *node_record);
extern bool can_use_pg_rewind(PGconn *conn, const char *data_directory, PQExpBufferData *reason);
extern void make_standby_signal_path(char *buf);
extern bool write_standby_signal(void);
extern void make_standby_signal_path(const char *data_dir, char *buf);
extern bool write_standby_signal(const char *data_dir);
extern bool create_replication_slot(PGconn *conn, char *slot_name, t_node_info *upstream_node_record, PQExpBufferData *error_msg);
extern bool drop_replication_slot_if_exists(PGconn *conn, int node_id, char *slot_name);


@@ -51,6 +51,7 @@
* along with this program. If not, see <http://www.gnu.org/licenses/>.
*/
#include <pwd.h>
#include <unistd.h>
#include <sys/stat.h>
#include <signal.h>
@@ -549,6 +550,10 @@ main(int argc, char **argv)
runtime_options.data_directory_config = true;
break;
case OPT_REPMGRD:
runtime_options.repmgrd = true;
break;
case OPT_REPLICATION_CONFIG_OWNER:
runtime_options.replication_config_owner = true;
break;
@@ -3661,11 +3666,11 @@ can_use_pg_rewind(PGconn *conn, const char *data_directory, PQExpBufferData *rea
void
make_standby_signal_path(char *buf)
make_standby_signal_path(const char *data_dir, char *buf)
{
snprintf(buf, MAXPGPATH,
"%s/%s",
config_file_options.data_directory,
data_dir,
STANDBY_SIGNAL_FILE);
}
@@ -3673,13 +3678,15 @@ make_standby_signal_path(char *buf)
* create standby.signal (PostgreSQL 12 and later)
*/
bool
write_standby_signal(void)
write_standby_signal(const char *data_dir)
{
char standby_signal_file_path[MAXPGPATH] = "";
FILE *file;
mode_t um;
make_standby_signal_path(standby_signal_file_path);
Assert(data_dir != NULL);
make_standby_signal_path(data_dir, standby_signal_file_path);
/* Set umask so the file is created with mode 0600 (owner read/write only) */
um = umask((~(S_IRUSR | S_IWUSR)) & (S_IRWXG | S_IRWXO));
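
The mask expression above is compact, so here is a small sketch of what it evaluates to, assuming the usual POSIX permission bit values. The function names are hypothetical; only the expression inside `owner_only_umask()` is taken from the diff.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * The mask passed to umask() in write_standby_signal(): every group and
 * "other" permission bit is set (i.e. will be cleared on file creation),
 * while the owner's bits are left alone.
 */
mode_t
owner_only_umask(void)
{
	return (mode_t) ((~(S_IRUSR | S_IWUSR)) & (S_IRWXG | S_IRWXO));
}

/* Permissions a file receives when created with "requested" under "mask" */
mode_t
mode_after_umask(mode_t requested, mode_t mask)
{
	return requested & ~mask;
}
```

With the common octal values the mask is 0077, so a file requested with mode 0666 (the default used by `fopen()`) ends up as 0600.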


@@ -100,6 +100,7 @@
#define OPT_DB_CONNECTION 1047
#define OPT_VERIFY_BACKUP 1048
#define OPT_RECOVERY_MIN_APPLY_DELAY 1049
#define OPT_REPMGRD 1050
/* These options are for internal use only */
#define OPT_CONFIG_ARCHIVE_DIR 2001
@@ -193,6 +194,7 @@ static struct option long_options[] =
{"role", no_argument, NULL, OPT_ROLE},
{"slots", no_argument, NULL, OPT_SLOTS},
{"missing-slots", no_argument, NULL, OPT_MISSING_SLOTS},
{"repmgrd", no_argument, NULL, OPT_REPMGRD},
{"has-passfile", no_argument, NULL, OPT_HAS_PASSFILE},
{"replication-connection", no_argument, NULL, OPT_REPL_CONN},
{"data-directory-config", no_argument, NULL, OPT_DATA_DIRECTORY_CONFIG},


@@ -88,59 +88,24 @@ void _PG_fini(void);
static void repmgr_shmem_startup(void);
Datum set_local_node_id(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(set_local_node_id);
Datum get_local_node_id(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(get_local_node_id);
Datum standby_set_last_updated(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(standby_set_last_updated);
Datum standby_get_last_updated(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(standby_get_last_updated);
Datum set_upstream_last_seen(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(set_upstream_last_seen);
Datum get_upstream_last_seen(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(get_upstream_last_seen);
Datum get_upstream_node_id(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(get_upstream_node_id);
Datum set_upstream_node_id(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(set_upstream_node_id);
Datum notify_follow_primary(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(notify_follow_primary);
Datum get_new_primary(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(get_new_primary);
Datum reset_voting_status(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(reset_voting_status);
Datum set_repmgrd_pid(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(repmgr_set_local_node_id);
PG_FUNCTION_INFO_V1(repmgr_get_local_node_id);
PG_FUNCTION_INFO_V1(repmgr_standby_set_last_updated);
PG_FUNCTION_INFO_V1(repmgr_standby_get_last_updated);
PG_FUNCTION_INFO_V1(repmgr_set_upstream_last_seen);
PG_FUNCTION_INFO_V1(repmgr_get_upstream_last_seen);
PG_FUNCTION_INFO_V1(repmgr_get_upstream_node_id);
PG_FUNCTION_INFO_V1(repmgr_set_upstream_node_id);
PG_FUNCTION_INFO_V1(repmgr_notify_follow_primary);
PG_FUNCTION_INFO_V1(repmgr_get_new_primary);
PG_FUNCTION_INFO_V1(repmgr_reset_voting_status);
PG_FUNCTION_INFO_V1(set_repmgrd_pid);
Datum get_repmgrd_pid(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(get_repmgrd_pid);
Datum get_repmgrd_pidfile(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(get_repmgrd_pidfile);
Datum repmgrd_is_running(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(repmgrd_is_running);
Datum repmgrd_pause(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(repmgrd_pause);
Datum repmgrd_is_paused(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(repmgrd_is_paused);
Datum get_wal_receiver_pid(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1(get_wal_receiver_pid);
PG_FUNCTION_INFO_V1(repmgr_get_wal_receiver_pid);
/*
@@ -233,7 +198,7 @@ repmgr_shmem_startup(void)
/* ==================== */
Datum
set_local_node_id(PG_FUNCTION_ARGS)
repmgr_set_local_node_id(PG_FUNCTION_ARGS)
{
int local_node_id = UNKNOWN_NODE_ID;
int stored_node_id = UNKNOWN_NODE_ID;
@@ -303,7 +268,7 @@ set_local_node_id(PG_FUNCTION_ARGS)
Datum
get_local_node_id(PG_FUNCTION_ARGS)
repmgr_get_local_node_id(PG_FUNCTION_ARGS)
{
int local_node_id = UNKNOWN_NODE_ID;
@@ -320,7 +285,7 @@ get_local_node_id(PG_FUNCTION_ARGS)
/* update and return last updated with current timestamp */
Datum
standby_set_last_updated(PG_FUNCTION_ARGS)
repmgr_standby_set_last_updated(PG_FUNCTION_ARGS)
{
TimestampTz last_updated = GetCurrentTimestamp();
@@ -337,7 +302,7 @@ standby_set_last_updated(PG_FUNCTION_ARGS)
/* get last updated timestamp */
Datum
standby_get_last_updated(PG_FUNCTION_ARGS)
repmgr_standby_get_last_updated(PG_FUNCTION_ARGS)
{
TimestampTz last_updated;
@@ -354,7 +319,7 @@ standby_get_last_updated(PG_FUNCTION_ARGS)
Datum
set_upstream_last_seen(PG_FUNCTION_ARGS)
repmgr_set_upstream_last_seen(PG_FUNCTION_ARGS)
{
int upstream_node_id = UNKNOWN_NODE_ID;
@@ -377,7 +342,7 @@ set_upstream_last_seen(PG_FUNCTION_ARGS)
Datum
get_upstream_last_seen(PG_FUNCTION_ARGS)
repmgr_get_upstream_last_seen(PG_FUNCTION_ARGS)
{
long secs;
int microsecs;
@@ -411,7 +376,7 @@ get_upstream_last_seen(PG_FUNCTION_ARGS)
Datum
get_upstream_node_id(PG_FUNCTION_ARGS)
repmgr_get_upstream_node_id(PG_FUNCTION_ARGS)
{
int upstream_node_id = UNKNOWN_NODE_ID;
@@ -426,7 +391,7 @@ get_upstream_node_id(PG_FUNCTION_ARGS)
}
Datum
set_upstream_node_id(PG_FUNCTION_ARGS)
repmgr_set_upstream_node_id(PG_FUNCTION_ARGS)
{
int upstream_node_id = UNKNOWN_NODE_ID;
int local_node_id = UNKNOWN_NODE_ID;
@@ -462,7 +427,7 @@ set_upstream_node_id(PG_FUNCTION_ARGS)
Datum
notify_follow_primary(PG_FUNCTION_ARGS)
repmgr_notify_follow_primary(PG_FUNCTION_ARGS)
{
int primary_node_id = UNKNOWN_NODE_ID;
@@ -505,7 +470,7 @@ notify_follow_primary(PG_FUNCTION_ARGS)
Datum
get_new_primary(PG_FUNCTION_ARGS)
repmgr_get_new_primary(PG_FUNCTION_ARGS)
{
int new_primary_node_id = UNKNOWN_NODE_ID;
@@ -527,7 +492,7 @@ get_new_primary(PG_FUNCTION_ARGS)
Datum
reset_voting_status(PG_FUNCTION_ARGS)
repmgr_reset_voting_status(PG_FUNCTION_ARGS)
{
if (!shared_state)
PG_RETURN_NULL();
@@ -735,7 +700,7 @@ repmgrd_is_paused(PG_FUNCTION_ARGS)
Datum
get_wal_receiver_pid(PG_FUNCTION_ARGS)
repmgr_get_wal_receiver_pid(PG_FUNCTION_ARGS)
{
int wal_receiver_pid;


@@ -69,7 +69,7 @@
#------------------------------------------------------------------------------
#replication_user='repmgr' # User to make replication connections with, if not set
# defaults to the user defined in "conninfo".
# defaults to the user defined in "conninfo".
#replication_type='physical' # Must be "physical" (the default).
@@ -314,7 +314,7 @@ ssh_options='-q -o ConnectTimeout=10' # Options to append to "ssh"
#follow_command='' # command repmgrd executes when instructing a standby to follow a new primary;
# use something like:
#
# repmgr standby follow -f /etc/repmgr.conf -W --upstream-node-id=%n
# repmgr standby follow -f /etc/repmgr.conf --upstream-node-id=%n
#
#primary_notification_timeout=60 # Interval (in seconds) which repmgrd on a standby
# will wait for a notification from the new primary,
@@ -337,6 +337,7 @@ ssh_options='-q -o ConnectTimeout=10' # Options to append to "ssh"
# "--no-pid-file" will force PID file creation to be skipped.
# Note: there is normally no need to set this, particularly if
# repmgr was installed from packages.
#repmgrd_exit_on_inactive_node=false # If "true", and the node record is marked as "inactive", abort repmgrd startup
#standby_disconnect_on_failover=false # If "true", in a failover situation wait for all standbys to
# disconnect their WAL receivers before electing a new primary
# (PostgreSQL 9.5 and later only; repmgr user must be a superuser for this)


@@ -1,8 +1,7 @@
# repmgr extension
comment = 'Replication manager for PostgreSQL'
default_version = '5.2'
default_version = '5.3'
module_pathname = '$libdir/repmgr'
relocatable = false
schema = repmgr


@@ -135,6 +135,7 @@
#define DEFAULT_ASYNC_QUERY_TIMEOUT 60 /* seconds */
#define DEFAULT_PRIMARY_NOTIFICATION_TIMEOUT 60 /* seconds */
#define DEFAULT_REPMGRD_STANDBY_STARTUP_TIMEOUT -1 /*seconds */
#define DEFAULT_REPMGRD_EXIT_ON_INACTIVE_NODE false
#define DEFAULT_STANDBY_DISCONNECT_ON_FAILOVER false
#define DEFAULT_SIBLING_NODES_DISCONNECT_TIMEOUT 30 /* seconds */
#define DEFAULT_CONNECTION_CHECK_TYPE CHECK_PING


@@ -1,5 +1,7 @@
#define REPMGR_VERSION_DATE ""
#define REPMGR_VERSION "5.2.1"
#define REPMGR_VERSION_NUM 50201
#define REPMGR_RELEASE_DATE "2020-12-07"
#define REPMGR_VERSION "5.3.1"
#define REPMGR_VERSION_NUM 50301
#define REPMGR_EXTENSION_VERSION "5.3"
#define REPMGR_EXTENSION_NUM 50300
#define REPMGR_RELEASE_DATE "2022-02-15"
#define PG_ACTUAL_VERSION_NUM
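
`REPMGR_EXTENSION_NUM` is introduced so that repmgrd's startup check compares the extension version the binary expects, rather than the binary's own version, against the installed extension. A minimal sketch of that comparison, assuming the `MMmmpp` numeric encoding shown above (50301 for 5.3.1); the function name is hypothetical.

```c
/*
 * Hypothetical helper: compare two version numbers at major.minor level,
 * ignoring the patch component (the last two digits).
 *
 * Returns -1 if the binary expects an older extension than is installed,
 *          1 if it expects a newer one,
 *          0 if major and minor match.
 */
int
compare_extension_version(int expected_num, int installed_num)
{
	int expected_major_minor = expected_num / 100;
	int installed_major_minor = installed_num / 100;

	if (expected_major_minor < installed_major_minor)
		return -1;
	if (expected_major_minor > installed_major_minor)
		return 1;
	return 0;
}
```

For example, a 5.3.1 binary expecting extension 50300 against an installed 5.2 extension (50200) yields 1, the case where the diff's hint suggests running `ALTER EXTENSION repmgr UPDATE`.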


@@ -169,45 +169,126 @@ handle_sigint_physical(SIGNAL_ARGS)
/* perform some sanity checks on the node's configuration */
void
do_physical_node_check(void)
do_physical_node_check(PGconn *conn)
{
/*
* Check if node record is active - if not, and `failover=automatic`, the
* node won't be considered as a promotion candidate; this often happens
* when a failed primary is recloned and the node was not re-registered,
* giving the impression failover capability is there when it's not. In
* this case abort with an error and a hint about registering.
* Check whether the node record is "inactive"; if so, attempt to set it to "active".
*
* If `failover=manual`, repmgrd can continue to passively monitor the
* node, but we should nevertheless issue a warning and the same hint.
* Usually it will have become inactive due to e.g. a standby being shut down
* while repmgrd was running in an unpaused state. In this case it's
* perfectly reasonable to automatically mark the node as "active".
*/
if (local_node_info.active == false)
{
char *hint = "Check that \"repmgr (primary|standby) register\" was executed for this node";
RecoveryType recovery_type = get_recovery_type(conn);
switch (config_file_options.failover)
/*
* If the local node's recovery status is incompatible with its registered
* status, e.g. registered as primary but running as a standby, refuse to start.
*
* This typically happens when a failed primary is recloned but the node was not
* re-registered, leaving the cluster in a potentially ambiguous state. In
* this case it would not be possible or desirable to attempt to set the
* node to active; the user should ensure the cluster is in the correct state.
*/
if (recovery_type != RECTYPE_UNKNOWN && local_node_info.type != UNKNOWN)
{
/* "failover" is an enum, all values should be covered here */
bool require_reregister = false;
PQExpBufferData event_details;
initPQExpBuffer(&event_details);
case FAILOVER_AUTOMATIC:
log_error(_("this node is marked as inactive and cannot be used as a failover target"));
if (recovery_type == RECTYPE_STANDBY && local_node_info.type != STANDBY)
{
appendPQExpBuffer(&event_details,
_("node is registered as a %s but running as a standby"),
get_node_type_string(local_node_info.type));
require_reregister = true;
}
else if (recovery_type == RECTYPE_PRIMARY && local_node_info.type == STANDBY)
{
appendPQExpBufferStr(&event_details, _("node is registered as a standby but running as a primary"));
require_reregister = true;
}
if (require_reregister == true)
{
log_error("%s", event_details.data);
log_hint(_("%s"), hint);
create_event_notification(NULL,
&config_file_options,
config_file_options.node_id,
"repmgrd_shutdown",
"repmgrd_start",
false,
"node is inactive and cannot be used as a failover target");
event_details.data);
termPQExpBuffer(&event_details);
terminate(ERR_BAD_CONFIG);
break;
}
case FAILOVER_MANUAL:
log_warning(_("this node is marked as inactive and will be passively monitored only"));
log_hint(_("%s"), hint);
break;
termPQExpBuffer(&event_details);
}
/*
* Attempt to set node record active (unless explicitly configured not to)
*/
log_notice(_("setting node record for node \"%s\" (ID: %i) to \"active\""),
local_node_info.node_name,
local_node_info.node_id);
if (config_file_options.repmgrd_exit_on_inactive_node == false)
{
PGconn *primary_conn = get_primary_connection(conn, NULL, NULL);
bool success = true;
if (PQstatus(primary_conn) != CONNECTION_OK)
{
log_error(_("unable to connect to the primary node to activate the node record"));
success = false;
}
else
{
success = update_node_record_set_active(primary_conn, local_node_info.node_id, true);
PQfinish(primary_conn);
}
if (success == true)
{
local_node_info.active = true;
}
}
/*
* Corner-case where it was not possible to set the node to "active"
*/
if (local_node_info.active == false)
{
switch (config_file_options.failover)
{
/* "failover" is an enum, all values should be covered here */
case FAILOVER_AUTOMATIC:
log_error(_("this node is marked as inactive and cannot be used as a failover target"));
log_hint(_("%s"), hint);
create_event_notification(NULL,
&config_file_options,
config_file_options.node_id,
"repmgrd_start",
false,
"node is inactive and cannot be used as a failover target");
terminate(ERR_BAD_CONFIG);
break;
case FAILOVER_MANUAL:
log_warning(_("this node is marked as inactive and will be passively monitored only"));
log_hint(_("%s"), hint);
break;
}
}
}
@@ -504,6 +585,7 @@ monitor_streaming_primary(void)
if (is_server_available(local_node_info.conninfo) == true)
{
close_connection(&local_conn);
local_conn = establish_db_connection(local_node_info.conninfo, false);
if (PQstatus(local_conn) != CONNECTION_OK)
@@ -1732,7 +1814,10 @@ monitor_streaming_standby(void)
if (upstream_check_result == true)
{
if (config_file_options.connection_check_type != CHECK_QUERY)
{
close_connection(&upstream_conn);
upstream_conn = establish_db_connection(upstream_node_info.conninfo, false);
}
if (PQstatus(upstream_conn) == CONNECTION_OK)
{
@@ -1813,6 +1898,7 @@ monitor_streaming_standby(void)
int former_upstream_node_id = local_node_info.upstream_node_id;
NodeInfoList sibling_nodes = T_NODE_INFO_LIST_INITIALIZER;
PQExpBufferData event_details;
t_event_info event_info = T_EVENT_INFO_INITIALIZER;
update_node_record_set_primary(local_conn, local_node_info.node_id);
record_status = get_node_record(local_conn, local_node_info.node_id, &local_node_info);
@@ -1825,12 +1911,16 @@ monitor_streaming_standby(void)
initPQExpBuffer(&event_details);
appendPQExpBufferStr(&event_details,
_("promotion command failed but promotion completed successfully"));
create_event_notification(local_conn,
&config_file_options,
local_node_info.node_id,
"repmgrd_failover_promote",
true,
event_details.data);
event_info.node_id = former_upstream_node_id;
create_event_notification_extended(local_conn,
&config_file_options,
local_node_info.node_id,
"repmgrd_failover_promote",
true,
event_details.data,
&event_info);
termPQExpBuffer(&event_details);
@@ -2460,8 +2550,10 @@ monitor_streaming_witness(void)
if (check_upstream_connection(&primary_conn, upstream_node_info.conninfo, NULL) == true)
{
if (config_file_options.connection_check_type != CHECK_QUERY)
{
close_connection(&primary_conn);
primary_conn = establish_db_connection(upstream_node_info.conninfo, false);
}
if (PQstatus(primary_conn) == CONNECTION_OK)
{
PQExpBufferData event_details;
@@ -2975,7 +3067,6 @@ do_primary_failover(void)
t_node_info new_primary = T_NODE_INFO_INITIALIZER;
RecordStatus record_status = RECORD_NOT_FOUND;
PGconn *new_primary_conn;
record_status = get_node_record(local_conn, new_primary_id, &new_primary);
@@ -2987,6 +3078,7 @@ do_primary_failover(void)
else
{
PQExpBufferData event_details;
PGconn *new_primary_conn;
initPQExpBuffer(&event_details);
appendPQExpBuffer(&event_details,
@@ -3007,7 +3099,6 @@ do_primary_failover(void)
event_details.data);
close_connection(&new_primary_conn);
termPQExpBuffer(&event_details);
}
failover_state = FAILOVER_STATE_REQUIRES_MANUAL_FAILOVER;
}
@@ -3674,6 +3765,7 @@ promote_self(void)
{
PQExpBufferData event_details;
t_event_info event_info = T_EVENT_INFO_INITIALIZER;
/* update own internal node record */
record_status = get_node_record(local_conn, local_node_info.node_id, &local_node_info);
@@ -3690,13 +3782,16 @@ promote_self(void)
failed_primary.node_name,
failed_primary.node_id);
event_info.node_id = failed_primary.node_id;
/* local_conn is now the primary connection */
create_event_notification(local_conn,
&config_file_options,
local_node_info.node_id,
"repmgrd_failover_promote",
true,
event_details.data);
create_event_notification_extended(local_conn,
&config_file_options,
local_node_info.node_id,
"repmgrd_failover_promote",
true,
event_details.data,
&event_info);
termPQExpBuffer(&event_details);
}


@@ -19,7 +19,7 @@
#ifndef _REPMGRD_PHYSICAL_H_
#define _REPMGRD_PHYSICAL_H_
void do_physical_node_check(void);
void do_physical_node_check(PGconn *conn);
void monitor_streaming_primary(void);
void monitor_streaming_standby(void);


@@ -396,13 +396,14 @@ main(int argc, char **argv)
* extension is the latest available according to "pg_available_extensions" -
* - does our (major) version match that?
*/
log_verbose(LOG_DEBUG, "binary version: %i; extension version: %i",
REPMGR_VERSION_NUM, extversions.installed_version_num);
if ((REPMGR_VERSION_NUM/100) < (extversions.installed_version_num / 100))
log_verbose(LOG_DEBUG, "expected extension version: %i; extension version: %i",
REPMGR_EXTENSION_NUM, extversions.installed_version_num);
if ((REPMGR_EXTENSION_NUM/100) < (extversions.installed_version_num / 100))
{
log_error(_("this \"repmgr\" version is older than the installed \"repmgr\" extension version"));
log_detail(_("\"repmgr\" version %s is installed but extension is version %s"),
log_detail(_("\"repmgr\" version %s providing extension version %s is installed but extension is version %s"),
REPMGR_VERSION,
REPMGR_EXTENSION_VERSION,
extversions.installed_version);
log_hint(_("update the repmgr binaries to match the installed extension version"));
@@ -410,11 +411,12 @@ main(int argc, char **argv)
exit(ERR_BAD_CONFIG);
}
if ((REPMGR_VERSION_NUM/100) > (extversions.installed_version_num / 100))
if ((REPMGR_EXTENSION_NUM/100) > (extversions.installed_version_num / 100))
{
log_error(_("this \"repmgr\" version is newer than the installed \"repmgr\" extension version"));
log_detail(_("\"repmgr\" version %s is installed but extension is version %s"),
log_detail(_("\"repmgr\" version %s providing extension version %s is installed but extension is version %s"),
REPMGR_VERSION,
REPMGR_EXTENSION_VERSION,
extversions.installed_version);
log_hint(_("update the installed extension version by executing \"ALTER EXTENSION repmgr UPDATE\" in the repmgr database"));
@@ -510,7 +512,7 @@ main(int argc, char **argv)
log_debug("node id is %i, upstream node id is %i",
local_node_info.node_id,
local_node_info.upstream_node_id);
do_physical_node_check();
do_physical_node_check(local_conn);
}
if (daemonize == true)


@@ -369,7 +369,6 @@ check_status_list_free(CheckStatusList *list)
}
const char *
output_check_status(CheckStatus status)
{