Commit Graph

164 Commits

Author SHA1 Message Date
Ian Barwick
0219f4c91f Always set "connect_timeout" when pinging a PostgreSQL instance
Insert "connect_timeout=2" into the connection parameters, if not
explicitly set by the user. This will prevent excessive wait time
for the host operating system to report a connection timeout.
2018-03-21 11:48:57 +09:00
Ian Barwick
85a4adc99c Update HISTORY 2018-03-21 06:48:32 +09:00
Martín Marqués
208d7d418e While reviewing 7cb6e5af8d before merging
I noticed that besides the result cleanup added, there was still a missing
spot inside the if condition.

Adding the PQclear that was missing.
2018-03-13 11:43:36 -03:00
Andrzej Nowicki
d2a2df13d5 One more memory leak fixed 2018-03-13 11:23:33 +01:00
Andrzej Nowicki
358e001218 Clear node list to avoid memory leak, fixes #402 2018-03-13 11:05:24 +01:00
Ian Barwick
d7702b3444 Correctly handle error message pointer when parsing strings.
When parsing conninfo strings, ensure the error message pointer is
actually returned to the caller.

Not a criticial issue, just meant the contents of the error message
were not being displayed.
2018-03-10 14:29:12 +09:00
Ian Barwick
9981ede1af "standby clone": fix --superuser handling
get_superuser_connection() was erroneously using the local node record
to connect to as a superuser, which works when registering the primary
but obviously not when cloning a standby.

Addresses GitHub #380.
2018-03-02 16:43:19 +09:00
Ian Barwick
29cb153643 "node status": improve replication slot warnings
Addresses GitHub #385
2018-02-23 11:19:33 +09:00
Ian Barwick
c644ddde51 Fix typo in function name 2018-02-22 15:50:57 +09:00
Ian Barwick
ee98a3a58e "standby clone": add --recovery-conf-only option
This will generate "recovery.conf" for an existing standby.

Typical use-case is a standby cloned manually from an external data
source (e.g. Barman), where "recovery.conf" needs to be created
(and if required a replication slot).

The --dry-run option will check the pre-requisites but not actually
create "recovery.conf" or a replication slot.

This requires that the upstream node is running, a replication connection
can be made and if required a replication slot can be created.

Implements GitHub #382.
2018-02-22 15:50:51 +09:00
Ian Barwick
22b3a74fa0 repmgrd: improve detection of status change from primary to standby
If repmgrd is running in degraded mode on a primary which has been stopped,
then manually been brought back online as a standby (e.g. by creating
recovery.conf and starting the server), ensure it not only detects the
change but automatically updates the node record so it can resume
monitoring the node as a standby.

Previously, repmgrd was looping waiting for the record to be updated
(as is done transparently when executing "repmgr node rejoin") but
if the record was not updated within the timeout period (e.g. by
"repmgr standby register) it would fail to resume monitoring as a
standby.

It seems reasonable to have repmgrd automatically update the node record,
as this will restore failover capability as quickly as possible. If this
is not desired, then the onus is on the user to shut down repmgrd while
making the desired changes.
2018-02-22 15:50:45 +09:00
Ian Barwick
f5f02ae0ee Replace remaining instances of strcpy() with strncpy()
Also use strncmp() to match.
2018-02-15 13:31:55 +09:00
Ian Barwick
6b7f6089ba "node status": add warning about missing replication slots
Implements GitHub #364.
2018-02-12 11:38:27 +09:00
Ian Barwick
ee2df36a76 "standby switchover": additional sanity checks
Check that sufficient walsenders will be available on the promotion
candidate, and if replication slots are in use check if enough of
those will be available.

Note these checks can't guarantee that the walsenders/slots will
be available at the appropriate points during the switchover process,
but do ensure that existing configuration problems will be caught.

Implements GitHub #371.
2018-02-08 15:19:24 +09:00
Ian Barwick
657ed83921 "cluster show": improve handling of database errors
In particular, if running "repmgr cluster show" against a database
without the repmgr metadata, showing the error (rather than just
"no records found" etc.) will provide some clues about the problem.
2018-02-05 10:35:56 +09:00
Ian Barwick
6c81e54f76 "standby follow": check for replication slot availability on target node 2018-02-02 17:18:43 +09:00
Ian Barwick
375a96a5c8 repmgrd: log execution error in "repmgrd_get_local_node_id()"
That shouldn't happen, but if it does it will make it easier to
identify the issue.
2018-01-16 11:16:19 +09:00
Ian Barwick
8fd0c4ad83 repmgr: assume node is actually shutting down if pingable and that's the reported status 2018-01-12 21:53:37 +09:00
Ian Barwick
7ccae6c2b1 repmgr: automatically create slot name if missing
It's possible that a node was registered with "use_replication_slots=false"
but that was later changed to "use_replication_slots=true". If the node
was not subsequently re-registered, the node record will contain an empty
slot name, which will cause any slot creation operation during
"standby follow" or "node rejoin" to fail.

To prevent this happening, check for an empty slot name and automatically
set before proceeding.

Addresses GitHub #343.
2018-01-11 14:47:50 +09:00
Ian Barwick
61d46172b9 repmgr: catch possible corner case when checking node shutdown status
It's conceivable that PQping is returning "no response" but the
shutdown hasn't quite completed.
2018-01-10 15:09:21 +09:00
Ian Barwick
810471b2f2 repmgr: during switchover, correctly detect unclean shutdown status 2018-01-10 12:25:16 +09:00
Ian Barwick
5bd8cf958a repmgr standby switchover: add "%p" event notification parameter
This will contain the node ID of the former primary.
2018-01-10 12:25:12 +09:00
Ian Barwick
fcb7e7a29b "repmgr bdr register": create missing connection replication set if needed
Previously the assumption was that the "repmgr" replication set would be
set up when the nodes are created, however no checks were implemented
and this was not well-documented.

Addresses GitHub #347.
2018-01-04 17:46:49 +09:00
Ian Barwick
26e404b1f3 "repmgr bdr register": improve node name check
We'll use "bdr.bdr_get_local_node_name()" to check the local BDR node
name and the repmgr one match.
2018-01-04 17:46:44 +09:00
Ian Barwick
841f03aeba Fix query in is_active_bdr_node()
Boolean column was not being checked correctly.

Also add detail output in "repmgr node role --check", where the function
is called.
2018-01-04 14:55:51 +09:00
Ian Barwick
cad12b1fb7 "repmgr cluster event": move query to dbutils.c 2018-01-04 14:55:46 +09:00
Ian Barwick
26a9e848fd Update copyright notices to 2018 2018-01-02 10:19:46 +09:00
Ian Barwick
8c121da8a1 Add diagnostic option "repmgr node check --has-passfile"
This checks if the active libpq version (9.6 and later) has the
"passfile" option, and returns 0 if present, 1 if not.
`
2017-12-11 20:09:48 +09:00
Ian Barwick
472d703d2e repmgr: initialise "voting_term" in "repmgr primary register"
This previously happened in the extension SQL code, which could
potentially cause replay problems if installing on a BDR cluster.

As this table is only required for streaming replication failover,
move the initialisation to "repmgr primary register".

Addresses GitHub #344 .
2017-11-28 11:08:12 +09:00
Ian Barwick
81beec54aa repmgr: fix return code output for repmgr node check --action=...
Addresses GitHub #340
2017-11-23 10:34:21 +09:00
Martín Marqués
2e42226f68 Fix missing FQN for the nodes table.
This bug was not detected before because most users work with the repmgr
user. For that reason, the repmgr schema is already in the search_path
by default.

Add the repmgr schema to the nodes table in the LEFT JOIN used for
cluster show (and in other places)

Signed-off-by: Martín Marqués <martin.marques@2ndquadrant.com>
2017-11-22 17:13:58 -03:00
Ian Barwick
8c422d6084 Remove unneeded functions 2017-11-20 15:18:21 +09:00
Ian Barwick
9165d27f9f "repmgr node ...": fixes for 9.3
Mainly to account for the lack of replication slots.
2017-11-16 11:25:16 +09:00
Ian Barwick
b8b991398a Escape double-quotes in strings passed to an event notification script
The string in question will be generated internally by repmgr as a simple
one-line string with no control characters etc., so all that needs to be
escaped at the moment are any double quotes.
2017-11-16 10:36:48 +09:00
Ian Barwick
9d432546bf repmgrd: don't fail over unless more than 50% of active nodes are visible. 2017-11-15 13:48:28 +09:00
Ian Barwick
022d9c58c2 Add "witness unregister" functionality 2017-11-15 13:47:48 +09:00
Ian Barwick
a6cc4d80f0 Add "witness register" functionality 2017-11-15 13:47:45 +09:00
Ian Barwick
eb14bb58c6 Add configuration file "passfile"
This will enable a custom .pgpass to be included in "primary_conninfo"
(provided it's supported by the libpq version on the standby).
2017-11-14 19:30:25 +09:00
Ian Barwick
aa089820ab repmgrd: check shared library is loaded
If this isn't the case, "repmgrd" will appear to run but not handle
failover correctly.

Address GitHub #337.
2017-11-10 14:35:17 +09:00
Ian Barwick
4ca7e6a6bf repmgrd: remove unneeded functions 2017-11-09 19:31:08 +09:00
Ian Barwick
79d21b516b repmgrd: fixes to failover handling
get_new_primary() returns NULL if no notification for the new primary has
been received, but the code was expecting it to return UNKNOWN_NODE_ID,
which was causing repmgrd to prematurely drop out of the new primary
detection loop if no notification had been received by the time the loop
started.

Also store the electoral term as a single row, single column table,
to ensure that all repmgrds see the same turn. It is then bumped
by the winning node after it gets promoted.

Various logging improvements.
2017-11-08 14:28:08 +09:00
Ian Barwick
d6c27f8938 Standardize quoting in log messages 2017-10-04 09:34:59 +09:00
Ian Barwick
31c7cb4e9a Fixes for 9.3 support 2017-09-15 17:13:17 +09:00
Ian Barwick
b6b31b15b2 Implement "repmgr cluster cleanup" 2017-09-11 13:48:46 +09:00
Ian Barwick
a9f4a027a7 pgindent run 2017-09-11 11:14:13 +09:00
Ian Barwick
e4f7dc8234 Add copyright notices 2017-09-08 13:27:39 +09:00
Ian Barwick
a28bbd68eb "standby clone": improve replication slots handling
Ensure replication slot is created on the upstream node and deleted from
the source node, if upstream node and source nodes differ.
2017-09-06 12:16:02 +09:00
Ian Barwick
bd07a34472 "standby clone": improve log messages
Make it clearer which nodes are being connected to, and why.
2017-09-06 10:15:52 +09:00
Ian Barwick
1ef00f5a3b repmgrd: parse "follow_command" during cascaded standby failover 2017-09-05 11:19:25 +09:00
Ian Barwick
78e6bdeebe Have repmgrd parse "standby follow --upstream-node-id=%n" 2017-09-04 13:42:50 +09:00