Previously, repmgr was using a very simple ad-hoc string-based parser,
which had various limitations and allowed configuration files to be
created in a way which could cause confusion and/or unexpected
behaviour.
For example, it accepted strings enclosed in single quotes, but treated
strings enclosed in double quotes literally. A node_name defined thusly:
node_name="somenode"
would result in the literal value '"somenode"' being used, which could
lead to unobvious errors along the lines of:
no record found for ""somenode""
The configuration file parser has been adapted from the one used by
PostgreSQL itself, so behaves more-or-less identically (though some
functions such as file inclusion are not supported in repmgr).
This makes configuration parsing more robust and consistent;
additionally, error reporting will be more precise.
Note this does mean that some repmgr.conf items previously accepted
as valid by repmgr will now be rejected; in particular this includes
strings containing spaces which are not enclosed in single quotes.
This reverts commit c6ca183247.
Backing out this patch for now as the Debian build system doesn't
seem to like it, even though it builds just fine on Debian itself.
Previously, repmgr was using a very simple ad-hoc string-based parser,
which had various limitations and allowed configuration files to be
created in a way which could cause confusion and/or unexpected
behaviour.
For example, it accepted strings enclosed in single quotes, but treated
strings enclosed in double quotes literally. A node_name defined thusly:
node_name="somenode"
would result in the literal value '"somenode"' being used, which could
lead to unobvious errors along the lines of:
no record found for ""somenode""
The configuration file parser has been adapted from the one used by
PostgreSQL itself, so behaves more-or-less identically (though some
functions such as file inclusion are not supported in repmgr).
This makes configuration parsing more robust and consistent;
additionally, error reporting will be more precise.
Note this does mean that some repmgr.conf items previously accepted
as valid by repmgr will now be rejected; in particular this includes
strings containing spaces which are not enclosed in single quotes.
Now the "xlog nomenclature" Pg versions are fading into the past,
rename things related to handling pg_basebackup's -X option
(was: --xlog-method, now: --wal-method) to start with "wal_"
rather than "xlog_".
This is a cosmetic change for code clarity.
This functionality enables repmgrd (when running on the primary) to
monitor connected child nodes. It will log connections and disconnections
and generate events.
Additionally, repmgrd can execute a custom script if the number of connected
child nodes falls below a configurable threshold. This script can be used
e.g. to "fence" the primary following a failover situation where a new primary
has been promoted and all standbys are now child nodes of that primary.
In "recovery.conf", the configuration parameter "node_name" is used
as the "application_name" value, which will be truncated by PostgreSQL
to 63 characters (NAMEDATALEN - 1).
repmgr sometimes needs to be able to extract the application name from
pg_stat_replication to determine if a node is connected (e.g. when
executing "repmgr standby register"), so the comparison will fail
if "node_name" exceeds 63 characters.
If "failover_validation_command" is set, and the command returns an error,
rerun the election.
There is a pause between reruns to avoid "churn"; the length of this pause
is controlled by the configuration parameter "election_rerun_interval".
This controls the maximum length of time in seconds that repmgrd will
wait for other standbys to disconnect their WAL receivers in a failover
situation.
This setting is only used when "standby_disconnect_on_failover" is set to "true".
This is intended to ensure that all nodes have a constant LSN while
making the failover decision.
This feature is experimental and needs to be explicitly enabled with the
configuration file option "standby_disconnect_on_failover".
Note enabling this option will result in a delay in the failover decision
until the WAL receiver is disconnected on all nodes.
This enable selection of the method repmgrd uses to check whether the upstream
node is available. Possible values are:
- "ping" (default): uses PQping() to check server availability
- "connection": executes a query on the connection to check server
availability (similar to repmgr3.x).
Immediately after the demotion candidate (primary) has shut down, we can't
be absolutely sure that the walreceiver has flushed all WAL to disk, so
checking pg_last_wal_receive_lsn() at that point might not reflect
the actual last available WAL location.
To handle this, we'll loop for a while (timeout controlled by configuration
parameter "wal_receive_check_timeout") before finally deciding whether
the standby is still behind the shut-down primary.
Addresses issue raised in GitHub #518.
This is to facilitate remote invocation of repmgr when the repmgr
binary is located somewhere other than the PostgreSQL binary directory, as it
cannot be assumed all package maintainers will install repmgr there.
This parameter is optional; if not set (the default), repmgr will fall back
to "pg_bindir" (if set).
Addresses GitHub #246.
Previously, "repmgr standby switchover" used the configuration file parameters
"reconnect_interval" and "reconnect_attempts" to define a timeout to determine
whether the current primary (demotion candidate) has shut down.
However, these parameters are intended for primary failure detection and are
generally lower in value, while a controlled shutdown may take longer, resulting
in the switchover being aborted as repmgr was not waiting long enough.
To prevent this happening, parameter "shutdown_check_timeout" has been added.
This complements the existing "standby_reconnect_timeout" parameter used
by "repmgr standby switchover".
Implements GitHub #504.
This matches the behaviour of other PostgreSQL utilities such as psql, though
repmgr will only abort once all command line options are parsed, so as many
errors as possible are found and displayed. If a repmgr "command" (e.g.
"repmgr primary ..." was provided, a hint about the relevant command
help section (e.g. "repmgr primary --help") will be provided alongside
the generic help command (i.e. "repmgr --help").
Addresses GitHub #464, with further improvements.
It's hard to imagine a use case where this isn't desirable, but
in case, for whatever reason, the user does not wish to daemonize the
process, the command line option "--daemonize=false" can be provided.
Implements GitHub #458.
Traditionally repmgrd will only write a pidfile if explicitly requested with
-p/--pid-file. However it's normally desirable to have a pidfile, and it's
preferable to have one used by default to prevent accidentally starting a second
repmgrd instance.
Following changes made:
- add configuration file parameter "repmgrd_pid_file" (initially overridden by
-p/--pid-file for backwards compatibility, though eventually we'll want to
drop -p/--pid-file altogether)
- add command line option --no-pid-file
- if neither "repmgrd_pid_file" nor -p/--pid-file is set, create the pid file
in a temporary directory
Implements GitHub #457.
Currently the (very generic sounding) "standby_reconnect_timeout" configuration
file parameter is used in several different contexts and it would be useful
to have more granular control over the different timeouts it's used to configure.
This patch introduces "node_rejoin_timeout", used in place of "standby_reconnect_timeout"
(which wasn't documented) when "repmgr node rejoin" is executed, to determine
how long to wait for the node to rejoin the replication cluster.
Additionally "repmgrd_standby_startup_timeout" is introduced as a timeout for
failover situations, when repmgrd executes "repmgr standby follow" to follow
a new primary, and waits for the standby to restart and become available
for connections.
"standby_reconnect_timeout" is now only relevant for "repmgr standby switchover".
Implements GitHub #454.
After restarting the standby, poll pg_stat_replication on the upstream
until the standby connects, and exit with an error if it doesn't by the
timeout defined in "standby_follow_timeout".
Implments GitHub #444.
This enables explicit provision of an external configuration file
directory, which if set will be passed to "pg_ctl" as the -D
parameter. Otherwise "pg_ctl" will default to using the data directory,
which will cause some operations to fail if the configuration files
are not present there.
Note this is implemented primarily for feature completeness and for
development/testing purposes. Users who have installed "repmgr" from
a package should not rely on "pg_ctl" to stop/start/restart PostgreSQL,
instead they should set the appropriate "service_..._command" for their
operating system. For more details see:
https://repmgr.org/docs/4.0/configuration-service-commands.html
Note: in a future release, the presence of "config_directory" in repmgr.conf
will be used to implictly set "--copy-external-config-files=samepath" when
cloning a standby; this is a behaviour change so will be implemented in the
next major realease (repmgr 4.1).
Implements GitHub #424.
If "archive_cleanup_command" is defined in "repmgr.conf", a corresponding
entry will be made in the node's "recovery.conf" file after cloning a
standby.
Note that we recommend using PgBarman to manage WAL archives, but are
providing this facility to help repmgr to be integrated in existing environments.
Implements GitHub #416.
This introduces following new configuration file parameters, which
were previously hard-coded values:
- promote_check_timeout
- promote_check_interval
Implements GitHub #387.
This is used for determining a timeout when reconnecting to the standby
after executing the "follow_command". This will normally not need to be
set explicitly, but maybe useful in cases where the standby's startup
phase can last longer than usual.
When executing repmgr on remote nodes, we otherwise end up jumping
through hoops as we can't make assumptions about where the configuration
file is located, but really need to be able to provide it.
From a support point of view it will also make life easier as it will
be easy to specify exactly which file to provide.