A fix for this was introduced with commit ee9270fe8d
and removed in 4f1c67a1bf.
Refactor the original fix to simply omit attempting to write an invalid entry
to the monitoring table.
removing it.
Basically, on startup the standby will start receiving again from the
begining of the WAL and so received will be lower then applied.
A proper code is needed to make sure the standby is still following the
correct master (as per node information)
used in the cluster.
Main issue was that if the local repmgrd was not able to connect locally,
it would set the local node as failed (active = false). This is fine, because
we actually don't know if the node is active (actually, it's not active ATM)
so it's best to keep it out of the cluster.
The problem is that if the postgres service comes back up, and is able to
recover by it self, then we should ack that fact and set it as active.
There was another issue related with repmgrd being terminated if the postgres
service was downs. This is not the correct thing to do: we should keep
trying to connect to the local standby.
used in the cluster.
Main issue was that if the local repmgrd was not able to connect locally,
it would set the local node as failed (active = false). This is fine, because
we actually don't know if the node is active (actually, it's not active ATM)
so it's best to keep it out of the cluster.
The problem is that if the postgres service comes back up, and is able to
recover by it self, then we should ack that fact and set it as active.
There was another issue related with repmgrd being terminated if the postgres
service was downs. This is not the correct thing to do: we should keep
trying to connect to the local standby.
Anyone needing them, particularly in a replication context, should
know what they're doing anyway.
See also: http://www.postgresql.org/docs/current/interactive/sql-createindex.html#AEN74175
"Also, changes to hash indexes are not replicated over streaming or file-based
replication after the initial base backup, so they give wrong answers to
queries that subsequently use them. For these reasons, hash index use is presently
discouraged."
Also refactor configuration file handling while we're at it.
Previously a configuration file would be ignored if it couldn't
be opened, however that is now treated as an error.
When parsing command line arguments in check_parameters_for_action(),
create warnings for paramters supplied but not required (e.g. -D/--data-dir
for MASTER REGISTER), rather than fail with error(s), as the
presence of the parameters won't cause any problems.
Errors will still be raised for required-but-missing parameters, of course.
* repmgr: add explicit --log-level flag, repurpose --verbose flag to
show extra detailed/repetitive output only (see item below too)
-> e0cbdd5b31
* debug output: show some repetitive output only if --verbose flag set to prevent
excessive log growth
-> 8ab1901a93
This should make it more likely that the actual primary is first
in the retrieved list, reducing the number of connections to
other nodes in the cluster which need to be made.