Commit Graph

817 Commits

Author SHA1 Message Date
Martín Marqués
49418e096e Fix typo in a code comment 2018-05-19 12:30:03 -03:00
Ian Barwick
6c518f1403 "standby clone": log actual connection string used to connect to upstream
Useful for diagnostic purposes.
2018-05-10 12:03:13 +09:00
Ian Barwick
b365765bc8 Fix check for -d/--dbname parameter
Not a bug per se; it just meant some unnecessary processing was done on
an empty string.

Per note from petere.
2018-05-10 12:03:09 +09:00
Ian Barwick
bd63948937 Include "arpa/inet.h" in dbutils.c
Needed for htonl() on FreeBSD.
2018-05-10 12:03:04 +09:00
Ian Barwick
69c1f147ea doc: update 2ndQuadrant repository information
Canonical link for each repository should not include any directories.
2018-05-10 10:39:31 +09:00
Ian Barwick
ce8d3cf0b0 doc: update repository information 2018-05-10 10:39:27 +09:00
Ian Barwick
14134f8e70 doc: update package installation information
Document the new, public 2ndQuadrant apt repository.
2018-05-10 10:39:23 +09:00
Ian Barwick
be8448ddcb doc: update package installation information
Document the new, public 2ndQuadrant RPM repository.
2018-05-10 10:39:18 +09:00
Ian Barwick
a2ff1536ad doc: add notes about package compatibility
We need to emphasise that the repmgr packages are only compatible
with packages based on the PGDG filesystem layout; 3rd party vendor
packages often put application and data directories elsewhere.
See e.g. GitHub #427.
2018-05-10 10:38:54 +09:00
Ian Barwick
9c0c1b663e Minor documentation fixes 2018-05-10 10:25:29 +09:00
Ian Barwick
2d43feb34b doc: update HISTORY and add 4.0.5 release notes 2018-05-01 10:21:40 +09:00
Ian Barwick
6f315c1b3c repmgrd: don't explicitly close connections on shutdown 2018-05-01 10:21:10 +09:00
Ian Barwick
635bdccb2c Fix parsing of "archive_ready_critical" configuration file parameter.
Per report in GitHub #426.
2018-04-28 07:00:56 +09:00
Ian Barwick
16048a879e repmgrd: notify sibling nodes to follow new primary after pg_ctl timeout
If "pg_ctl promote" fails due to a timeout, but the promotion itself succeeds,
have repmgrd on the new primary explicitly notify any sibling nodes to
follow it.

Previously the sibling nodes would wait "primary_notification_timeout" seconds
before attempting to discover the new primary.

This (and preceding commit eac80ae) address GitHub #425.
2018-04-27 11:54:21 +09:00
Ian Barwick
eac80ae9c1 repmgrd: handle pg_ctl timeout
It's possible "pg_ctl promote" will time out, causing "repmgr standby
follow" to return with an error; however, the promotion itself will usually
succeed, so detect this case and handle it accordingly.
2018-04-26 19:19:42 +09:00
Ian Barwick
887b845aa0 repmgrd: always close the connection if the pointer is not NULL 2018-04-26 10:04:07 +09:00
Ian Barwick
8320179f34 Add configuration file parameter "config_directory"
This enables explicit provision of an external configuration file
directory, which if set will be passed to "pg_ctl" as the -D
parameter. Otherwise "pg_ctl" will default to using the data directory,
which will cause some operations to fail if the configuration files
are not present there.

Note this is implemented primarily for feature completeness and for
development/testing purposes. Users who have installed "repmgr" from
a package should not rely on "pg_ctl" to stop/start/restart PostgreSQL;
instead they should set the appropriate "service_..._command" for their
operating system. For more details see:

    https://repmgr.org/docs/4.0/configuration-service-commands.html

Note: in a future release, the presence of "config_directory" in repmgr.conf
will be used to implicitly set "--copy-external-config-files=samepath" when
cloning a standby; this is a behaviour change, so it will be implemented in the
next major release (repmgr 4.1).

Implements GitHub #424.
2018-04-25 11:58:24 +09:00
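The directory selection described in 8320179f34 can be sketched as follows. This is a minimal illustration under stated assumptions, not repmgr's actual code: the helper name `pg_ctl_directory` and its signature are hypothetical, and only the precedence rule (use "config_directory" if set, otherwise the data directory) comes from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Pick the directory to pass to "pg_ctl" as the -D parameter.
 * Hypothetical helper: the name and signature are illustrative only.
 */
const char *
pg_ctl_directory(const char *config_directory, const char *data_directory)
{
    assert(data_directory != NULL);

    /* An explicitly configured "config_directory" takes precedence. */
    if (config_directory != NULL && config_directory[0] != '\0')
        return config_directory;

    /* Otherwise fall back to the data directory. */
    return data_directory;
}
```

The function returns one of its arguments unchanged, so the caller keeps ownership of both strings.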
Ian Barwick
7822aa784f repmgrd: catch corner case in standby connection handle check
If repmgrd marked the local node as unavailable when it was actually
restarting, and a failover event occurred before the next local node
check, failover would continue with the stale connection handle.

Add a final local node check just before starting the failover
process, so repmgrd can reconnect if it wasn't able to before.
2018-04-24 21:56:57 +09:00
Ian Barwick
4455ded935 repmgrd: prevent standby connection handle from going stale
If monitoring history is not in use, there's no activity on the standby's
connection handle, so if e.g. the standby is restarted, PQstatus()
never returns CONNECTION_BAD and repmgrd never notices the connection
is stale. Therefore execute a throw-away statement at "monitor_interval_secs" intervals.
2018-04-24 21:56:52 +09:00
Ian Barwick
fd0b850f41 Minor doc and log output tweaks 2018-04-24 21:08:05 +09:00
Ian Barwick
d9ac1d6fd0 doc: minor clarification 2018-04-20 12:58:46 +09:00
Ian Barwick
11e4d9fd05 doc: additional details about repmgrd usage in Debian/Ubuntu 2018-04-20 12:58:41 +09:00
Ian Barwick
4b54106f48 doc: add Debian package details 2018-04-20 12:58:37 +09:00
Ian Barwick
f3941ceab0 doc: Improve CentOS package-related documentation 2018-04-20 12:58:33 +09:00
Ian Barwick
93f80c413e doc: link to service command configuration from switchover section 2018-04-20 10:15:22 +09:00
Ian Barwick
09b8a86605 doc: improve configuration documentation
With special attention to setting service commands, and extra special
mention of "pg_ctlcluster" for Debian/Ubuntu users.
2018-04-20 10:15:18 +09:00
Ian Barwick
6b3d54a5f3 doc: update CentOS package documentation 2018-04-20 10:15:14 +09:00
Ian Barwick
85ab2d94b7 repmgrd: tweak event notifications on standby failure
The event notification was only being created if there was a valid
primary connection; it should be created in any case, so an event
notification script can be executed.
2018-04-20 10:15:08 +09:00
Ian Barwick
cda952f1e4 Add "dbname=replication" to all replication connection strings
Previously repmgr was attempting to make replication connections
with "dbname" set to the repmgr database name. While this works
if e.g. the repmgr user also has replication permissions, it will
fail if a dedicated replication user is specified, who only has
permission to access the virtual "replication" database.

Change this to use "dbname=replication" if the replication connection
user is different to the normal repmgr database user.

(We could just always set it to "replication", but that might break
existing installations e.g. where a .pgpass file is in use and there's
no "replication" entry for the normal repmgr database user).

Addresses GitHub #421.
2018-04-12 16:11:16 +09:00
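The rule described in cda952f1e4 can be sketched as a small decision function. This is an illustrative sketch only: the function name `replication_dbname` and its parameters are assumptions and do not come from the repmgr source. It follows the commit body, which uses "replication" only when a distinct replication user is configured.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Choose the "dbname" value for a replication connection string.
 * Hypothetical helper; names are not taken from repmgr.
 */
const char *
replication_dbname(const char *repmgr_user,
                   const char *replication_user,
                   const char *repmgr_dbname)
{
    assert(repmgr_user != NULL && repmgr_dbname != NULL);

    /* No dedicated replication user configured: keep existing behaviour. */
    if (replication_user == NULL || replication_user[0] == '\0')
        return repmgr_dbname;

    /* Same user for both roles: keep existing behaviour (.pgpass safe). */
    if (strcmp(replication_user, repmgr_user) == 0)
        return repmgr_dbname;

    /* Dedicated replication user: use the virtual "replication" database. */
    return "replication";
}
```

Keeping the old database name when the users match is what avoids breaking installations whose .pgpass file has no "replication" entry.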
Ian Barwick
99ad57f88a doc: mention --recovery-conf-only introduced in repmgr 4.0.4
Per GitHub #419.
2018-04-12 16:11:12 +09:00
Ian Barwick
ad0671ead2 doc: various updates related to "standby clone" operations. 2018-04-12 16:11:07 +09:00
Ian Barwick
1bbb2ef213 Fix superuser password handling
When establishing a superuser connection, the connection parameters
were being copied from the existing (non-superuser) connection, which
in some circumstances can lead to that user's password being
included in the copied parameter list. The password parameter, if set, will
now always be removed, which will cause libpq to retrieve the correct
one from the .pgpass file.

Addresses GitHub #400.
2018-04-12 12:49:41 +09:00
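The fix described in 1bbb2ef213 amounts to filtering the copied parameter list. The sketch below assumes the parameters are held as NULL-terminated keyword/value arrays (the layout accepted by libpq's PQconnectdbParams()). The function name `copy_params_for_superuser` and its signature are hypothetical and not taken from repmgr.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy NULL-terminated keyword/value arrays, dropping any "password"
 * entry and replacing the value of "user" with the superuser name.
 * Returns the number of entries written (excluding the NULL terminator).
 * Entries that do not fit in out_capacity are silently dropped.
 * Hypothetical helper; not part of repmgr or libpq.
 */
size_t
copy_params_for_superuser(const char *const *keywords,
                          const char *const *values,
                          const char *superuser,
                          const char **out_keywords,
                          const char **out_values,
                          size_t out_capacity)
{
    size_t n = 0;

    assert(keywords && values && superuser && out_keywords && out_values);
    assert(out_capacity > 0);

    for (size_t i = 0; keywords[i] != NULL; i++)
    {
        /* Never carry over the non-superuser's password. */
        if (strcmp(keywords[i], "password") == 0)
            continue;

        /* Always leave one slot free for the NULL terminator. */
        if (n + 1 >= out_capacity)
            break;

        out_keywords[n] = keywords[i];
        out_values[n] = (strcmp(keywords[i], "user") == 0) ? superuser : values[i];
        n++;
    }

    out_keywords[n] = NULL;
    out_values[n] = NULL;
    return n;
}
```

With no "password" keyword in the resulting list, libpq looks the password up itself, for example in the .pgpass file, as the commit message notes.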
Ian Barwick
62c29aab32 Don't issue a CHECKPOINT after promoting a standby.
Issuing a CHECKPOINT immediately after promoting a standby may impact
performance. Commit 239a548e9d ensures
one is only issued when required, i.e. during a switchover when
pg_rewind will be executed.

This reverts commit a2068768ab.
2018-04-09 14:35:54 +09:00
Ian Barwick
b9dc94f28f doc: update FAQ location 2018-04-07 11:46:10 +09:00
Ian Barwick
e8ba213174 "standby register": add sanity check when --upstream-node-id not supplied
If --upstream-node-id was not supplied to "repmgr standby register",
repmgr defaults to the primary node as upstream node. If the local node is
available, we now double-check that it's attached to the primary,
in case the lack of --upstream-node-id was an accidental omission.

This check is only made when the local node is available.

This behaviour can be overridden with -F/--force (though it's hard to
imagine a scenario where that would be useful).

Addresses GitHub #395.
2018-04-05 17:38:55 +09:00
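The sanity check described in e8ba213174 reduces to a short chain of conditions. The following is a sketch under stated assumptions: the struct `RegisterContext`, its fields, and the function `standby_register_allowed` are all hypothetical names chosen for illustration, and only the decision rules come from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical inputs to the check; field names are illustrative only. */
typedef struct {
    bool upstream_node_id_supplied; /* --upstream-node-id was given */
    bool local_node_available;      /* local node can be queried */
    bool attached_to_primary;       /* local node streams from the primary */
    bool force;                     /* -F/--force was given */
} RegisterContext;

/* Returns true if "standby register" may proceed. */
bool
standby_register_allowed(const RegisterContext *ctx)
{
    assert(ctx != NULL);

    /* An explicit upstream was supplied, so this check does not apply. */
    if (ctx->upstream_node_id_supplied)
        return true;

    /* The check can only be made when the local node is available. */
    if (!ctx->local_node_available)
        return true;

    /* Defaulting to the primary is fine if the node is attached to it. */
    if (ctx->attached_to_primary)
        return true;

    /* Otherwise refuse, unless the user overrides the check. */
    return ctx->force;
}
```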
Ian Barwick
0dcddbb062 doc: minor FAQ tweaks 2018-04-05 17:10:33 +09:00
Ian Barwick
b4dab86c3b doc: add a section about repmgrd and service commands etc. 2018-04-05 11:49:08 +09:00
Ian Barwick
644a56a645 doc: miscellaneous FAQ updates
- clarify pg_rewind item
- add note about what's included in recovery.conf
2018-04-04 10:07:08 +09:00
Ian Barwick
4876a9fde3 Add TODO for pg_rewind changes coming in PostgreSQL 11 2018-04-03 21:56:46 +09:00
Ian Barwick
ec998bf9c5 doc: update HISTORY and release notes 2018-04-03 15:00:49 +09:00
Ian Barwick
e36b180de8 Ensure correct server version number used for replication stats query 2018-04-03 14:45:37 +09:00
Ian Barwick
a2068768ab Execute a CHECKPOINT immediately after promoting the server
This ensures "pg_control" is updated with the latest timeline, mainly
to ensure that if "pg_rewind" is executed as part of a switchover
that it sees the latest timeline.

Per suggestion from GitHub user "superflav" in GitHub #378.

See also:

  https://www.postgresql.org/message-id/flat/20150428180253.GU30322%40tamriel.snowman.net
2018-04-03 14:44:44 +09:00
Ian Barwick
bde9fea48c Fix directory creation when cloning from Barman 2018-04-03 14:44:03 +09:00
Ian Barwick
cdaf84c329 doc: minor readability fix 2018-04-03 14:42:48 +09:00
Ian Barwick
c4cd0c46da doc: add note about replication slots and PostgreSQL upgrades 2018-04-03 14:41:58 +09:00
Ian Barwick
3b00dc912a Catch various corner cases when restarting a PostgreSQL instance 2018-04-03 14:40:53 +09:00
Ian Barwick
1a80de1290 doc: document "primary_follow_timeout" configuration file parameter. 2018-04-03 14:39:38 +09:00
Ian Barwick
26b565dff2 Improve repmgrd logging in BDR mode
Also ensure interval status log line is shown as intended
2018-04-03 14:38:32 +09:00
Ian Barwick
96811ccc01 repmgrd: tweak log notices when marking a standby as failed
Announce what we're going to do (set the node record inactive) *before*
performing the action. Makes reading the log slightly easier.
2018-04-03 14:37:43 +09:00
Ian Barwick
73982859f6 repmgrd: improve log output
- emit explicit startup NOTICE
- emit NOTICE when falling back to degraded monitoring on a primary node
- improve log message and event notification details when monitoring
  a former primary which has been reconnected as a standby
2018-04-03 14:37:06 +09:00