Would always return "false", but as the value wasn't used anywhere,
the issue was inconsequential.
However while we're at it, actually check the return value in the
two places it's called, to help diagnose any issues in the unlikely
event they occur.
Per issue reported via GitHub PR #671 from user duzhgg.
In a corner-case situation where a standby is unable to attach to
the new primary due to a mismatch in the WAL stream, the connection
used to verify the recovery status of the new primary was not being
closed, leading to a risk of connection exhaustion on the new primary.
Addresses GitHub #682.
Previously, repmgr would forcibly change the permissions on a data
directory to 0700. However from PostgreSQL 11, 0750 is also valid,
so that value should not be changed.
Here the compiler may complain that the source length is being used,
though in all cases the source length was previously used to
define the length of the destination buffer, so it's not actually
a problem.
If neither the local node nor the upstream are available, and
"standby_disconnect_on_failover" is set, attempting to fetch
the walreceiver PID will result in repmgrd terminating.
Add a check that the connection is valid before attempting to
fetch the walreceiver PID.
Addresses GitHub #675.
This overrides the equivalent setting in repmgr.conf, if present.
Note this option was available in repmgr versions prior to 4.0, but
was assumed to be redundant. However recently a use-case was made
for its reintroduction.
Fix the `repmgr witness --help` command where at the "Unregister" section the message shown was
```
"witness register" unregisters a witness node.
```
instead of
```
"witness unregister" unregisters a witness node.
```
GitHub #676.
In PostgreSQL 12 and later we need to append replication configuration
to "postgresql.auto.conf" to guarantee it will be read last, and hence
override any preceding replication configuration which may be haunting
the configuration files.
We've been assuming that "postgresql.auto.conf" will always be present,
but at least one corner case has been observed where that was not the
case on the node being cloned from. Moreover it's perfectly acceptable
that this file does not exist (it will be recreated the next time
ALTER SYSTEM is executed), so we should be prepared to handle that case.
In passing, improve handling of more unlikely errors which might be
encountered when processing "postgresql.auto.conf".
Apparently "ALTER TABLE" (which we were using to convert the
"repl_events" table) does not mark the table as being part of the
extension. Instead, we need to create the new table and copy the
data, as is done with the other tables.
As it seems redirecting stderr to stdin (2>&1) when executing
system commands results in a SIGPIPE (141) return code, making
it impossible to determine the actual return code, redirect
stderr to a temporary file and collate the output from that.
There are possibly better ways of doing this which could
be revisited at a future date.
When cloning from Barman, and --no-upstream-connection was supplied,
the server version number will not be available at this point in the
code. It will however later be extracted from the Barman metadata,
so move the check for the --waldir pg_basebackup option to after
this point.
Also add an explicit check that a server version number has been
obtained (and fall back to extracting it from the cloned data
directory), as subsequent operations depend on knowing this to
be performed correctly.
From PostgreSQL 13, pg_rewind will automatically handle an unclean
shutdown itself, so as long as --force-rewind was provided, so there
is no need to fail with an error.
Note that pg_rewind handles the unclean shutdown by starting PostgreSQL
in single user mode, which it does before performing any checks as
to whether a rewind is actually necessary.
However pg_rewind doesn't take into account the possible presence
of a standby.signal file, so we remove that and recreate it after
pg_rewind was executed.
If two diverged nodes are on the same timeline, currently there's
no way of establishing the divergence point and pg_rewind
is ineffective.
Clarify the log messages to make this clearer.
When executing "repmgr standby clone" in Barman mode, and --waldir
is set in pg_basebackup options, properly report an error if the
target WAL directory could not be created or is not empty.
By setting --waldir in "pg_basebackup_options", standbys cloned using
pg_basebackup would have their WAL directory set to the specified
location and symlinked from the data directory.
This commit causes repmgr to honour that setting even when cloning
from Barman.