A fix for this was introduced with commit ee9270fe8d
and removed in 4f1c67a1bf.
Refactor the original fix to simply omit attempting to write an invalid entry
to the monitoring table.
removing it.
Basically, on startup the standby will start receiving again from the
begining of the WAL and so received will be lower then applied.
A proper code is needed to make sure the standby is still following the
correct master (as per node information)
used in the cluster.
Main issue was that if the local repmgrd was not able to connect locally,
it would set the local node as failed (active = false). This is fine, because
we actually don't know if the node is active (actually, it's not active ATM)
so it's best to keep it out of the cluster.
The problem is that if the postgres service comes back up, and is able to
recover by it self, then we should ack that fact and set it as active.
There was another issue related with repmgrd being terminated if the postgres
service was downs. This is not the correct thing to do: we should keep
trying to connect to the local standby.
used in the cluster.
Main issue was that if the local repmgrd was not able to connect locally,
it would set the local node as failed (active = false). This is fine, because
we actually don't know if the node is active (actually, it's not active ATM)
so it's best to keep it out of the cluster.
The problem is that if the postgres service comes back up, and is able to
recover by it self, then we should ack that fact and set it as active.
There was another issue related with repmgrd being terminated if the postgres
service was downs. This is not the correct thing to do: we should keep
trying to connect to the local standby.
Also refactor configuration file handling while we're at it.
Previously a configuration file would be ignored if it couldn't
be opened, however that is now treated as an error.
repmgr and particularly repmgrd currently produce substantial
amounts of log output. Much of this is only useful when troubleshooting
or debugging.
Previously the -v/--verbose option just forced the log level to
INFO. With repmgrd this is pretty pointless - just set the log
level in the configuration file. With repmgr the configuration
file can be overriden by the new -L/--log-level option.
-v/--verbose now provides an additional, chattier/pedantic level
of logging ("Opening *this* logfile", "Executing *this* query",
"running in *this* loop") which is helpful for understanding
repmgr/repmgrd's behaviour, particularly for troubleshooting.
What additional verbose logging is generated will of course a
also depends on the log level set, so e.g. someone trying to
work out which configuration file is actually being opened
can use '--log-level=INFO --verbose' without being bothered
by an avalanche of extra verbose debugging output.
-t/--terse option will silence certain non-essential output, at
the moment any HINTs.
Note that -v/--verbose and -t/--terse are not mutually exclusive
(suggestions for better names welcome).
There are a few places where additional hints are written as log
output, usually LOG_NOTICE. Create an explicit function to provide
hints in a standardized manner; by storing the log level of the
previous logger call, we can ensure the hint is only displayed when
the log message itself would be.
Part of an ongoing effort to better control repmgr's logging output.
Related to Github #127.
- use the previously introduced repmgr_atoi() function to parse
integers better
- collate all detected errors and output as a list, rather than
failing on the first error.
repmgrd correctly updates ID of the upstream node after automatic
failover, but repmgr was not doing that for manual failvers.
This moves the existing function to dbutils and modifies it so that
it does not rely on global variables with configuration (available
just in repmgrd).
This should fix issue #67 (hopefully, haven't done much testing).
If no configuration file provided, also check default Postgres
sysconfig dir.
It would also be useful to check the configuration directory
provided by the RPM/DEB packages, not sure if that's programmatically
feasible.
Currently repmgr/repmgrd will only accept these as valid when
provided as the first command line option, however it's possible
a user will want to get the output of those options by adding
them to the end of a previously inputted command.
Note that after the first of these options is encountered, the
program will terminate and not process any other options. This
is consistent with psql's behaviour
Per GitHub issue #107 from Sébastien Gross.
The main change is that now check_connection requires a conninfo
parameter, and the connection object has type (PGconn **) so it can be
replaced by check_connection if needed.
The bug was caused by the fact that the first failure resulted in
*conn == NULL, so that subsequent checks of the upstream connection
were failing irrespectively of the actual state of the upstream node.
Now, when *conn == NULL, check_connection will use conninfo to
establish a new connection and place it into *conn. We introduce a new
INTERNAL_ERROR code for the case when they are both NULL.
In passing, we also reworded a confusing error message, distinguishing
a timeout from the actual elapsed time.
Provide the master connection if available, and if not enable
create_event_record() to skip trying to write to the database,
but execute the notification program if defined.