In the default text output mode, list inactive slots.
In CSV output mode, list inactive slots as additional information;
add output line with number of missing slots and a list thereof.
Also document --csv output mode.
Usually a seperate user (typically "repmgr") is set up specifically to manage
the repmgr metadata, however there's no compelling requirement to do this, and
it's possible the database owner (usually: "postgres") will be used, in which
case it's possible the username will be left out of the conninfo string.
Addresses GitHub #437.
If repmgrd marks the local node as unavailable, and it was actually
restarting but a failover event occured before the next local node
check, failover will continue with the stale connection handle.
Add a final local node check just before starting the failover
process, so repmgrd can reconnect if it wasn't able to before.
If monitoring history not in use, there's no activity on the standby's
connection handle, so if e.g. the standby is restarted, PQstatus()
never returns CONNECTION_BAD and repmgrd never notices the connection
is stale. Therefore execute a throw-away statement at "monitor_interval_secs".
When establishing a superuser connection, the connection parameters
were being copied from the existing (non-superuser) connection, which
in some circumstances can lead to that user's password being
included in the copied parameter list. The password parameter, if set, will
now always be removed, which will cause libpq to retrieve the correct
one from the .pgpass file.
Addresses GitHub #400.
pg_rewind is not part of the core distribution for those, but we
provided support in repmgr 3.3 so should extend it to repmgr 4.
Note that there is no check in place whether the pg_rewind binary
exists, so it's up to the user to ensure it's present.
Addresses GitHub #413.
Insert "connect_timeout=2" into the connection parameters, if not
explicitly set by the user. This will prevent excessive wait time
for the host operating system to report a connection timeout.
When parsing conninfo strings, ensure the error message pointer is
actually returned to the caller.
Not a criticial issue, just meant the contents of the error message
were not being displayed.
get_superuser_connection() was erroneously using the local node record
to connect to as a superuser, which works when registering the primary
but obviously not when cloning a standby.
Addresses GitHub #380.
This will generate "recovery.conf" for an existing standby.
Typical use-case is a standby cloned manually from an external data
source (e.g. Barman), where "recovery.conf" needs to be created
(and if required a replication slot).
The --dry-run option will check the pre-requisites but not actually
create "recovery.conf" or a replication slot.
This requires that the upstream node is running, a replication connection
can be made and if required a replication slot can be created.
Implements GitHub #382.
If repmgrd is running in degraded mode on a primary which has been stopped,
then manually been brought back online as a standby (e.g. by creating
recovery.conf and starting the server), ensure it not only detects the
change but automatically updates the node record so it can resume
monitoring the node as a standby.
Previously, repmgrd was looping waiting for the record to be updated
(as is done transparently when executing "repmgr node rejoin") but
if the record was not updated within the timeout period (e.g. by
"repmgr standby register) it would fail to resume monitoring as a
standby.
It seems reasonable to have repmgrd automatically update the node record,
as this will restore failover capability as quickly as possible. If this
is not desired, then the onus is on the user to shut down repmgrd while
making the desired changes.
Check that sufficient walsenders will be available on the promotion
candidate, and if replication slots are in use check if enough of
those will be available.
Note these checks can't guarantee that the walsenders/slots will
be available at the appropriate points during the switchover process,
but do ensure that existing configuration problems will be caught.
Implements GitHub #371.
For events generated by these commands, it may be useful to know details
of the primary node. This makes following additional parameters available
to event notification scripts:
- %p: node ID of the primary
- %a: node name of the primary
- %c: conninfo string for the primary
Implements GitHub #375
In particular, if running "repmgr cluster show" against a database
without the repmgr metadata, showing the error (rather than just
"no records found" etc.) will provide some clues about the problem.
It's possible that a node was registered with "use_replication_slots=false"
but that was later changed to "use_replication_slots=true". If the node
was not subsequently re-registered, the node record will contain an empty
slot name, which will cause any slot creation operation during
"standby follow" or "node rejoin" to fail.
To prevent this happening, check for an empty slot name and automatically
set before proceeding.
Addresses GitHub #343.
Previously the assumption was that the "repmgr" replication set would be
set up when the nodes are created, however no checks were implemented
and this was not well-documented.
Addresses GitHub #347.
This previously happened in the extension SQL code, which could
potentially cause replay problems if installing on a BDR cluster.
As this table is only required for streaming replication failover,
move the initialisation to "repmgr primary register".
Addresses GitHub #344 .
This bug was not detected before because most users work with the repmgr
user. For that reason, the repmgr schema is already in the search_path
by default.
Add the repmgr schema to the nodes table in the LEFT JOIN used for
cluster show (and in other places)
Signed-off-by: Martín Marqués <martin.marques@2ndquadrant.com>