Enable operations which create or drop replication slots to be carried
out with the minimum necessary user permissions, i.e. a user with the
REPLICATION attribute.
This can be the repmgr user, or a dedicated replication user.
In the latter case, if the dedicated replication user is only
permitted to make replication connections, the streaming
replication protocol is used to create/drop slots.
Implements part of GitHub #536.
Append "_sql" to the respective function names, as we'll later be
creating equivalent functions which use the replication protocol
so need a way to distinguish between them.
A more generic option name to cover pre- and post-Pg12 replication
configuration methods.
--recovery-conf-only is retained as an alias for backwards
compatibility.
This makes it possible to return log output when executing repmgr
remotely at a different level to the one defined in the remote
repmgr's repmgr.conf.
This is particularly useful when DEBUG output is required.
"repmgr daemon" can be interpreted to mean the commands affect the local
daemon process only. Rename the commands which affect the entire cluster
to "repmgr service ...".
The "repmgr daemon ..." form of the affected commands is retained for backwards
compatibility.
The following options have long since been deprecated, and any attempt
to use them results only in a warning that they are no longer valid:
--data-dir
--no-conninfo-password
--recovery-min-apply-delay
When showing node information, check if the node's copy of its
record shows a different upstream to the one expected according
to the node where the command is executed.
This helps visualise situations where the cluster is in an
unexpected state, and provide a better idea of the actual state.
For example, if a cluster has divided somehow and a set of nodes are
following a new primary, when running "cluster show" etc., repmgr
will now show the name of the primary those nodes are actually
following, rather than the now outdated node name recorded
on the other side of the split. A warning will also be issued
about the situation.
In "recovery.conf", the configuration parameter "node_name" is used
as the "application_name" value, which will be truncated by PostgreSQL
to 63 characters (NAMEDATALEN - 1).
repmgr sometimes needs to be able to extract the application name from
pg_stat_replication to determine if a node is connected (e.g. when
executing "repmgr standby register"), so the comparison will fail
if "node_name" exceeds 63 characters.
Previously, repmgrd assumed that during a failover, there would not
already be another primary node. However it's possible a node was
promoted manually. While this is not a desirable situation, it's
conceivable this could happen in the wild, so we should check for
it and react accordingly.
Also sanity-check that the follow target can actually be followed.
Addresses issue raised in GitHub #420.
Log the output of PQerrorStatus() in a couple of places where it was missing.
Additionally, always log the output of PQerrorStatus() starting with a blank
line, otherwise the first line looks like it was emitted by repmgr, and
it's harder to scan the error message.
Before:
[2019-03-20 11:24:15] [DETAIL] could not connect to server: Connection refused
Is the server running on host "localhost" (::1) and accepting
TCP/IP connections on port 5501?
could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 5501?
After:
[2019-03-20 11:27:21] [DETAIL]
could not connect to server: Connection refused
Is the server running on host "localhost" (::1) and accepting
TCP/IP connections on port 5501?
could not connect to server: Connection refused
Is the server running on host "localhost" (127.0.0.1) and accepting
TCP/IP connections on port 5501?
This is intended to ensure that all nodes have a constant LSN while
making the failover decision.
This feature is experimental and needs to be explicitly enabled with the
configuration file option "standby_disconnect_on_failover".
Note enabling this option will result in a delay in the failover decision
until the WAL receiver is disconnected on all nodes.
Previously repmgr only emitted that if there was a timeline/LSN
mismatch, but it's useful to have confirmation of how it came
to the conclusion that rejoin will succeed.