Commit Graph

1286 Commits

Author SHA1 Message Date
Ian Barwick
799ac6d453 Add is_server_available_quiet()
For use in cases where the caller collates node availability information
and doesn't want to prematurely emit log output.
2019-04-01 12:27:30 +09:00
Ian Barwick
57c0ccd477 Improve copying of strings from database results
Where feasible, specify the maximum string length via sizeof(), and
use snprintf() in place of strncpy().
2019-04-01 11:19:58 +09:00
Ian Barwick
aef8e31897 Bump master branch to 4.4dev 2019-03-28 17:24:36 +09:00
Ian Barwick
3d4b81ba2a Handle unhandled error situation in enable_wal_receiver() 2019-03-28 14:52:16 +09:00
Ian Barwick
98d924685b Updae BDR repmgrd to handle node_name as a max 63 char string
Follow-up from commit 1953ec7.
2019-03-28 14:32:52 +09:00
Ian Barwick
79613af8d0 Handle potential NULL return from string_skip_prefix() 2019-03-28 12:45:53 +09:00
Ian Barwick
5e9f202c9a Add missing break 2019-03-28 12:44:50 +09:00
Ian Barwick
e44c048ae2 Update code comment 2019-03-28 12:44:30 +09:00
Ian Barwick
bb42d8cba6 Fix calculation of maximum filename length 2019-03-28 12:40:29 +09:00
Ian Barwick
9d5afeebbc Remove logically dead code 2019-03-28 12:35:41 +09:00
Ian Barwick
fe822a9eea Prevent potential file descriptor resource leak 2019-03-28 12:29:10 +09:00
Ian Barwick
03cd5a6028 Put closedir call in correct location 2019-03-28 12:08:42 +09:00
Ian Barwick
1e1c596446 Add various missing close() calls 2019-03-28 11:32:25 +09:00
Ian Barwick
d43975eb5f Use correct argument for sizeof() 2019-03-28 11:02:50 +09:00
Ian Barwick
ece20f4831 Cast "int" to "long long" 2019-03-28 11:02:25 +09:00
Ian Barwick
e23f5afc5f doc: note valid characters for "node_name"
"node_name" will be used as "application_name", so should only contain
characters valid for that; see:

    https://www.postgresql.org/docs/current/runtime-config-logging.html#GUC-APPLICATION-NAME

Not yet enforced.
2019-03-28 10:53:43 +09:00
Ian Barwick
ba1f05ece9 Restrict "node_name" to maximum 63 characters
In "recovery.conf", the configuration parameter "node_name" is used
as the "application_name" value, which will be truncated by PostgreSQL
to 63 characters (NAMEDATALEN - 1).

repmgr sometimes needs to be able to extract the application name from
pg_stat_replication to determine if a node is connected (e.g. when
executing "repmgr standby register"), so the comparison will fail
if "node_name" exceeds 63 characters.
2019-03-28 10:37:57 +09:00
Ian Barwick
73ad689390 standby register: fail if --upstream-node-id is the local node ID 2019-03-27 14:22:55 +09:00
Ian Barwick
e9ece34aeb log_db_error(): fix formatted message handling 2019-03-27 11:00:31 +09:00
Ian Barwick
9dd2f30c72 Use sizeof(buf) rather than hard-coding value 2019-03-27 10:43:49 +09:00
Ian Barwick
9164d3931b repmgrd: clean up PQExpBuffer handling
Unless the PQExpBuffer is required for the duration of the function,
ensure it's always a variable local to the relevant code block. This
mitigates the risk of accidentally accessing a generically named
PQExpBuffer which hasn't been initialised or was previously terminated.
2019-03-26 13:15:25 +09:00
Ian Barwick
801ed2b0c8 repmgrd: don't terminate uninitialized PQExpBuffer 2019-03-26 11:35:45 +09:00
Ian Barwick
e490f35223 doc: fix syntax 2019-03-22 15:43:55 +09:00
Ian Barwick
ec873b0119 doc: update release notes 2019-03-22 15:43:49 +09:00
Ian Barwick
539861cb58 repmgrd: during failover, check if a node was already promoted
Previously, repmgrd assumed that during a failover, there would not
already be another primary node. However it's possible a node was
promoted manually. While this is not a desirable situation, it's
conceivable this could happen in the wild, so we should check for
it and react accordingly.

Also sanity-check that the follow target can actually be followed.

Addresses issue raised in GitHub #420.
2019-03-22 14:06:41 +09:00
Ian Barwick
6f0f338968 standby follow: set replication user when connecting to local node 2019-03-21 16:43:39 +09:00
Ian Barwick
bd26eb3025 standby switchover: don't attempt to pause repmgrd on unreachable nodes 2019-03-21 13:48:59 +09:00
Ian Barwick
9b089b7401 doc: add note about compiling against Pg11 and later with the --with-llvm option 2019-03-21 10:30:00 +09:00
Ian Barwick
314a1e8f4f use a constant to denote unknown replication lag 2019-03-20 17:26:04 +09:00
Ian Barwick
7204a0faf4 doc: consolidate witness server documentation 2019-03-20 16:31:52 +09:00
Ian Barwick
5e775cef16 doc: various improvements to repmgrd documentation 2019-03-20 16:10:03 +09:00
Ian Barwick
7d0caefaee Fix logging related to "connection_check_type"
Also log the selected type at repmgrd startup.
2019-03-20 11:58:18 +09:00
Ian Barwick
7434cc0b8e repmgrd: improve witness node monitoring
Mainly fix a couple of places where "standby" was hard-coded into a log
message which can apply either to a witness or a standby.
2019-03-20 11:47:36 +09:00
Ian Barwick
b84d98fe81 Explictly log PQping() failures 2019-03-20 11:47:32 +09:00
Ian Barwick
46efe57cd0 Improve database connection failure logging
Log the output of PQerrorStatus() in a couple of places where it was missing.

Additionally, always log the output of PQerrorStatus() starting with a blank
line, otherwise the first line looks like it was emitted by repmgr, and
it's harder to scan the error message.

Before:

    [2019-03-20 11:24:15] [DETAIL] could not connect to server: Connection refused
            Is the server running on host "localhost" (::1) and accepting
            TCP/IP connections on port 5501?
    could not connect to server: Connection refused
            Is the server running on host "localhost" (127.0.0.1) and accepting
            TCP/IP connections on port 5501?

After:

    [2019-03-20 11:27:21] [DETAIL]
    could not connect to server: Connection refused
            Is the server running on host "localhost" (::1) and accepting
            TCP/IP connections on port 5501?
    could not connect to server: Connection refused
            Is the server running on host "localhost" (127.0.0.1) and accepting
            TCP/IP connections on port 5501?
2019-03-20 11:47:28 +09:00
Ian Barwick
426759ca8e check_primary_status(): handle case where recovery type unknown 2019-03-18 16:16:54 +09:00
Ian Barwick
39df55c39c Check node recovery type before attempting to write an event record
In some corner cases (e.g. immediately after a switchover) where
the current primary has not yet been determined, the provided connection
might not be writeable. This prevents error messages such as
"cannot execute INSERT in a read-only transaction" generating unnecessary
noise in the logs.
2019-03-18 15:26:16 +09:00
Ian Barwick
f54ff85cfa Remove outdated comment
This was only relevant for repmgr3 and earlier; in repmgr4 the schema
is hard-coded.
2019-03-18 15:19:11 +09:00
Ian Barwick
8ab51c2ae3 Refactor check_primary_status()
Reduce nested if/else branching, and improve documentation.
2019-03-18 15:01:21 +09:00
Ian Barwick
43f28f4097 Clarify calls to check_primary_status()
Use a constant rather than a magic number to indicate non-provision
of elapsed degraded monitoring time.
2019-03-18 14:21:34 +09:00
Ian Barwick
0940185f49 doc: clarify "cluster show" error codes 2019-03-18 10:49:38 +09:00
John Naylor
4f9fc56871 Fix assorted Makefile bugs
1. The target additional-maintainer-clean was misspelled as
maintainer-additional-clean.

2. Add add missing clean targets, in particular sysutils.o, config.h,
repmgr_version.h, and Makefile.global. While at it, use a wildcard
for obj files.

3. Don't delete configure.

4. Remove generated file doc/version.sgml from the repo.

5. Have maintainer-clean recurse to the doc directory.
2019-03-15 16:29:31 +09:00
Ian Barwick
fbdf9617fa doc: update repmgrd example output 2019-03-15 15:43:11 +09:00
Ian Barwick
dfb92df05f doc: miscellaenous cleanup 2019-03-15 14:39:37 +09:00
Ian Barwick
9dd87dd5ce doc: add explanation of the configuration file format 2019-03-15 14:02:42 +09:00
Ian Barwick
a2df69512a doc: update "connection_check_type" descriptions 2019-03-14 15:44:59 +09:00
Ian Barwick
c2206b007a repmgrd: optionally check upstream availability through connection attempts 2019-03-14 15:44:53 +09:00
John Naylor
e06d3de444 Correct some doc typos 2019-03-14 11:58:31 +08:00
Ian Barwick
9d056b2f72 doc: expand "standby_disconnect_on_failover" documentation 2019-03-14 12:08:13 +09:00
Ian Barwick
19bf4d7434 Count witness and zero-priority nodes in visibility check 2019-03-14 11:17:51 +09:00