412 Commits

Author SHA1 Message Date
Ian Barwick
de3f0802b4 Update source comments to clarify data directory modifications 2020-06-19 13:51:00 +09:00
Ian Barwick
0d0ffc675c standby clone: add a strategic Assert 2020-06-09 14:31:49 +09:00
Ian Barwick
11dc923a20 standby clone: minor code cleanup 2020-06-09 14:31:44 +09:00
Ian Barwick
e97319f01d Fix typo in comment 2020-06-09 14:31:40 +09:00
Ian Barwick
db1cb1433f Rename the TablespaceDataListCell element "f" to "fptr" for clarity
And add a few more comments to make it clearer what's going on.
2020-06-09 14:31:36 +09:00
Ian Barwick
c1428a3ecd standby clone: fixes for Barman tablespace handling.
repmgr creates a file with a list of tablespace files to fetch from
Barman, however the file may not actually have been flushed to disk
at the point the rsync operation was executed, so may be incomplete
or empty.

Also fix handling of tablespace remapping.

Addresses GitHub #650.
2020-06-09 10:52:10 +09:00
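
The flush problem described in the commit above can be illustrated with a minimal sketch. This is not repmgr's actual code: the function name write_file_list() and its parameters are hypothetical, and the sketch assumes a POSIX system. It writes a list of paths and makes sure the data has left both the stdio buffer and the kernel cache before the caller hands the file to an external command such as rsync.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of repmgr): write each path to list_path,
 * one per line, and only return once the data has been flushed to disk.
 * Returns 0 on success, -1 on any failure.
 */
int
write_file_list(const char *list_path, const char *const *paths, size_t n_paths)
{
	FILE	   *fp;
	size_t		i;
	int			ok = 1;

	assert(list_path != NULL);

	fp = fopen(list_path, "w");
	if (fp == NULL)
		return -1;

	for (i = 0; i < n_paths; i++)
	{
		if (fprintf(fp, "%s\n", paths[i]) < 0)
		{
			ok = 0;
			break;
		}
	}

	/* Push stdio's user-space buffer to the kernel ... */
	if (ok && fflush(fp) != 0)
		ok = 0;

	/* ... then ask the kernel to write the file to storage. */
	if (ok && fsync(fileno(fp)) != 0)
		ok = 0;

	/* Always close the stream, even if an earlier step failed. */
	if (fclose(fp) != 0)
		ok = 0;

	return ok ? 0 : -1;
}
```

Only after a call like this returns 0 would it be safe to start a separate process that reads the list file.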
Ian Barwick
fc568a9101 run_file_backup(): fix comments
Explicitly document use-case for this function, and fix a comment
which probably got munged by pg_indent.
2020-06-08 12:45:38 +09:00
Ian Barwick
a0d3fae7ab standby register: ensure location field is compared during record check 2020-05-21 14:35:03 +09:00
Ian Barwick
1b5ad743b5 standby clone: explicitly set closed connection pointers to NULL
We omitted to do this with the connections used when checking the system
identifier, which means libpq calls by the teardown function using the
pointer risk using unallocated memory.

Addresses issue reported in GitHub #644.
2020-05-11 13:52:10 +09:00
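
The pattern behind the fix above, clearing a pointer as soon as the resource it refers to has been released, can be sketched as follows. To keep the example compilable without libpq, DemoConn, demo_connect() and demo_close() are hypothetical stand-ins; with libpq the equivalent would be calling PQfinish() on the connection and then assigning NULL to the pointer.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a libpq connection handle. */
typedef struct DemoConn
{
	int			socket_fd;
} DemoConn;

/* Allocate a dummy connection; returns NULL if allocation fails. */
DemoConn *
demo_connect(void)
{
	DemoConn   *conn = malloc(sizeof(*conn));

	if (conn != NULL)
		conn->socket_fd = 3;

	return conn;
}

/*
 * Release the connection and reset the caller's pointer, so that a later
 * teardown routine can test for NULL instead of touching freed memory.
 * Safe to call more than once on the same pointer.
 */
void
demo_close(DemoConn **conn_ptr)
{
	assert(conn_ptr != NULL);

	if (*conn_ptr != NULL)
	{
		free(*conn_ptr);
		*conn_ptr = NULL;
	}
}
```

Taking a pointer to the pointer means the reset cannot be forgotten at individual call sites, which is one way to avoid the omission the commit describes.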
Ian Barwick
d1ab6ce28b standby clone: emit warning, not error if server is 9.3 and tablespace_mapping provided 2020-05-07 10:23:07 +09:00
Ian Barwick
d0c5dffe91 standby clone: explicitly log that replication slots not in use
Helps with diagnosing output.
2020-04-27 13:57:18 +09:00
Ian Barwick
38b3447bd3 Add repmgr home page to --help output
Per PostgreSQL commit 1933ae629e7b706c6c23673a381e778819db307d it seems
to be all the rage these days.
2020-04-24 09:41:56 +09:00
Ian Barwick
971309c830 Fix parsing of database connection check results in "standby clone" 2020-04-23 13:19:20 +09:00
Ian Barwick
1628bfb846 Update references to "recovery.conf" in _do_create_replication_conf() 2020-04-23 11:42:13 +09:00
Ian Barwick
025e66ea46 standby switchover: check superuser connection on demotion candidate
Add a sanity check that repmgr, when remotely executed on the demotion
candidate, is able to connect as superuser. If not, emit a diagnostic
command as a hint.
2020-04-21 11:25:01 +09:00
Ian Barwick
4e48301d78 standby switchover: note database name for superuser connections
It's useful to have a confirmation of which database repmgr is trying
to connect to when the -S/--superuser connection is provided.

It will always be the database defined in the repmgr.conf "conninfo"
parameter, but having the name available is useful when e.g.
troubleshooting issues with .pgpass configuration.
2020-04-20 16:49:47 +09:00
Ian Barwick
2f26a02b5c doc: clarify usage of -F/--force with "standby promote"
Per GitHub #632.
2020-04-20 12:11:49 +09:00
Ian Barwick
97d83bd443 standby switchover: add hint for diagnosing remote DB connection failure
Output a command which, when executed on the local node (promotion
candidate) will attempt to remotely connect to the demotion candidate
and display both the connection message encountered and the connection
parameters used.

This is useful for corner cases where the connection succeeds only because
a particular environment variable (e.g. PGPORT) is normally set, but that
variable is not set in the environment where SSH is executed.
2020-04-17 11:20:02 +09:00
Ian Barwick
cfd35852b7 standby switchover: improve archive check error handling
Explicitly log if a database connection failure caused the check
to fail.

It's unlikely this situation will be encountered, as the data directory
check will already have run and checked for connection failure, however
there's a small chance the connection could fail between checks.
2020-04-15 14:08:33 +09:00
Ian Barwick
32dde4eaaf standby switchover: improve directory check failure handling
It's possible that the remote data directory check will fail if e.g.
connection configuration is not consistent across all nodes. This
modification ensures a database connection error is reported, rather
than a spurious issue with the data directory configuration.
2020-04-15 14:08:29 +09:00
Ian Barwick
410dd40526 standby switchover: standardize log message 2020-04-15 10:24:44 +09:00
Ian Barwick
599bab590a Create temporary pg.auto.conf file with the same permissions as the original
Commit 0574279 set the file permissions to 0600 rather than the user's
umask, but if initdb was executed with -g/--allow-group-access, the
file is maintained with 0640, so we'll just maintain the existing
permissions.
2020-04-07 13:29:59 +09:00
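
A minimal sketch of the approach described in the commit above, assuming a POSIX system: read the permission bits of the existing file with stat() and apply them to the newly created file with fchmod(), so that neither a hard-coded mode nor the process umask decides the result. The function name create_with_same_mode() is hypothetical and this is not repmgr's implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of repmgr): create tmp_path with the same
 * permission bits as orig_path. Returns an open file descriptor on
 * success, or -1 on failure.
 */
int
create_with_same_mode(const char *orig_path, const char *tmp_path)
{
	struct stat st;
	int			fd;

	assert(orig_path != NULL && tmp_path != NULL);

	if (stat(orig_path, &st) != 0)
		return -1;

	/* Create the file restrictively first; the mode is corrected below. */
	fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, S_IRUSR | S_IWUSR);
	if (fd < 0)
		return -1;

	/* fchmod() is not affected by the umask, unlike the mode given to open(). */
	if (fchmod(fd, st.st_mode & (S_IRWXU | S_IRWXG | S_IRWXO)) != 0)
	{
		close(fd);
		unlink(tmp_path);
		return -1;
	}

	return fd;
}
```

If the original file has mode 0640, as it would after initdb was run with -g/--allow-group-access, the new file ends up with 0640 as well.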
Ian Barwick
cd80f265ac standby clone: warn about missing pg_rewind prerequisites
These are not essential for cloning a standby, but useful to warn
as early as possible in case the user is intending to use pg_rewind.
2020-04-06 15:37:37 +09:00
Ian Barwick
09f0be8ceb Minor log output fixes 2020-04-06 13:19:58 +09:00
Ian Barwick
447054a630 standby promote: in --dry-run mode, display promote command which will be used
For PostgreSQL 12 and later, explicitly note whether repmgr user has
execution permissions on the pg_promote() function.
2020-04-02 12:37:32 +09:00
Ian Barwick
5d92c99bb9 standby switchover: warn if no superuser connection available 2020-03-26 11:18:04 +09:00
Ian Barwick
e64349e4da standby switchover: accept -S/--superuser option 2020-03-25 14:00:51 +09:00
Ian Barwick
06f0e5e94f Minor error message output tweak 2020-03-23 16:30:00 +09:00
Ian Barwick
12adb5e0d1 Add warning if --superuser option provided when it won't be used
Currently the only place this option is relevant is "standby clone".
2020-03-23 15:28:22 +09:00
Ian Barwick
0bc0a28378 standby promote: enable "service_promote_command" in PostgreSQL 12
This enables the promote command generated internally by repmgr to
be overridden if desired, in the same way as for PostgreSQL 11 and
earlier.
2020-03-06 13:21:30 +09:00
Ian Barwick
fb5ce720f3 standby promote: fall back to "pg_ctl promote" if necessary
From PostgreSQL 12, the SQL-level function "pg_promote()" can be used
to promote a PostgreSQL instance, however usage is restricted to
superusers and users to whom explicit execution permission for this
function has been granted.

Therefore, if execution permission is not available, fall back to
"pg_ctl promote".
2020-03-06 12:53:37 +09:00
Ian Barwick
9de31428f1 Consolidate replication connection code
In a few places, replication connections are generated from the
parameters used by existing connections. This has resulted in a
number of similar blocks of code which do more-or-less the same
thing almost but not quite identically. In two cases, the code
omitted to set "dbname=replication", which can cause problems
in some contexts.

These code blocks have now been consolidated into standardized
functions.

This also resolves the issue addressed by GitHub #619.
2020-03-05 17:21:37 +09:00
Ian Barwick
63aac64938 standby switchover: fetch remote repmgr version number 2020-03-04 17:21:27 +09:00
Ian Barwick
8f6058c676 standby switchover: check replication configuration file ownership
Within a PostgreSQL data directory, all files should have the same
ownership as the data directory itself. PostgreSQL itself expects
this, and ownership of files by another user is likely to cause
problems.

In PostgreSQL 11 or earlier, if "recovery.conf" cannot be moved
by PostgreSQL (because e.g. it is owned by root), it will not be
possible to promote the standby to primary.

In PostgreSQL 12 and later, if "postgresql.auto.conf" on the demotion
candidate (current primary) has incorrect ownership (e.g. owned by
root), repmgr will very likely not be able to modify this file and
write the replication configuration required for the node to rejoin
the cluster as a standby.

Checks added to catch both cases before a switchover is executed.
2020-03-04 17:21:22 +09:00
Ian Barwick
e218422eca standby clone: fix references to "recovery.conf" for Pg 12 and later
"standby clone --recovery-conf-only" still mentioned "recovery.conf" in a
couple of places; change that to the more generic "replication configuration"
for Pg 12 and later.
2020-02-27 12:20:09 +09:00
Ian Barwick
b4af80fdec Add optional check for unsupported future PostgreSQL releases
This is for backbranches to prevent them running against newer
PostgreSQL versions with which they are not compatible, for example
4.4.x with PostgreSQL 12 and later.
2020-02-14 10:43:19 +09:00
laixiong
cb7bbda021 standby switchover: check remote's registered repmgr.conf
Check the demotion candidate's registered repmgr.conf file can be found.

If the configuration file has been deleted or moved, previously the
resulting error message would have been a confusing reference to
an incorrectly configured data directory; by explicitly checking for the
expected configuration file, we can make troubleshooting easier.

Original patch by laixiong <yin.zhb@gmail.com> (GitHub #615), modified
by Ian Barwick.
2020-02-05 15:56:57 +09:00
Ian Barwick
9cf4616af1 standby switchover: mark successful if standby attaches later
Handle corner case where standby (demotion candidate) doesn't
attach during the main check cycle, but does at the final check,
where we'll want to mark the operation as successful.
2020-02-05 14:29:30 +09:00
Ian Barwick
6f01c54620 repmgr: improve "standby switchover" completion checks
There were some corner cases where "repmgr standby switchover"
would erroneously report a successful switchover, even if the
demotion candidate had not reattached to the promotion candidate.

Also improve the logging in various places to make it clearer
what is happening on which node.
2020-02-04 15:38:10 +09:00
Ian Barwick
cd7f36a6fd Add general check function "check_replication_slots_available()"
Make the code previously only used by "standby follow" generally
available - we'll want to use this from "node rejoin" as well.

While we're at it, when reporting failure due to lack of free
replication slots, report the current value of "max_replication_slots".
2020-02-03 16:43:55 +09:00
Ian Barwick
ab9c84c655 Report error code on follow/rejoin failure due to non-available slot
Previously, if "repmgr standby follow" or "repmgr node rejoin" failed
due to a replication slot not being available, no error code was
returned.
2020-02-03 15:03:31 +09:00
Ian Barwick
0141bc2be7 standby switchover: display "shutdown_check_timeout" value in --dry-run mode
It's useful to be aware of this setting.
2020-01-30 10:30:18 +09:00
Ian Barwick
a7689ecd78 standby switchover: fix repmgr execution confirmation in --dry-run mode
Inexplicably, "localhost" was hard-coded, rather than the remote host
name.
2020-01-29 14:04:37 +09:00
Ian Barwick
ef30892250 standby switchover: improve wording of pending archive file messages 2020-01-28 13:40:59 +09:00
Ian Barwick
7fdf2f1778 Update copyright notices to 2020 2020-01-13 14:06:20 +09:00
Ian Barwick
3f5d2f6ee9 standby follow: don't attempt to delete slot if new upstream is same as current
An attempt will be made to delete an existing replication slot on the
old upstream node (this is important during e.g. a switchover operation
or when attaching a cascaded standby to a new upstream). However if the
standby is currently attached to the follow target node anyway, the
replication slot should never be deleted.
2019-12-10 15:56:00 +09:00
Ian Barwick
4ed72eb901 Minor formatting fix 2019-11-20 15:13:01 +09:00
Ian Barwick
220ec7fc96 Minimize user permissions requirements for replication slots
Enable operations which create or drop replication slots to be carried
out with the minimum necessary user permissions, i.e. a user with the
REPLICATION attribute.

This can be the repmgr user, or a dedicated replication user.
In the latter case, if the dedicated replication user is only
permitted to make replication connections, the streaming
replication protocol is used to create/drop slots.

Implements part of GitHub #536.
2019-10-30 15:51:15 +09:00
Ian Barwick
1a9bcddccd standby clone: fix typo in log message 2019-10-28 14:08:48 +09:00
Ian Barwick
52f9cd3bae Rename "_do_create_recovery_conf()" to "_do_create_replication_conf()"
As of PostgreSQL 12, the functionality is no longer specific to the
recovery.conf file.
2019-10-24 15:12:39 +09:00