Because PQcancel() establish a new synchronous connection to the
database, if it fails it means something wrong has happenned with
master. So instead of just ignore the failure, CancelQuery() now
reports a failure condition so we can detect master's death in
that situation.
This is very important specially when only postmaster crashes but
other children/backend connections are still there. Because the
children connection won't fail and CancelQuery() failure is our
only indication of something wrong happenning.
Currently we just ignore the PQcancel() failure which leads us to
a situation in which we just loop forever
trying to cancel the async query.
Reported by: Martin Euser <martin.euser@nl.abnamro.com>
Problem analyzed and bug spotted by: Andres Freund <andres@2ndquadrant.com>
Patch by: Jaime Casanova <jaime@2ndquadrant.com>
voting procedure.
This is because of a race condition inside CheckPrimaryConnection().
This has independently reported by Alex Railean and Dumitru, and Frank Jördens.
Analyzed and fixed by Cédric Villemain.
The fix have been verified to work by Frank
xlog records from master, so it standby should use pg_last_xlog_receive_location()
to report their positions. This solves a possible situation in which
a standby that is considered as new master when promoted is no longer
the best option.
This is important for autofailover to do the right thing when
standbys detected master death at different times.
While this is a new option, seems important for the autofailover
to work properly so i will consider the lack of it a bug and
will backpatch to 2.0 where autofailover was introduced.
For gripe from Alex Railean, about a standby not finding the new
master because the new master hasn't finish promoting.
two nodes with different prorities are equally good to be promoted
the second one (with a lower priority, considering them
in descending order) will win.
Per report from Brailean Dumitru
get a warning for an uninitialized variable.
Also, define InvalidXLogRecPtr. We don't really need it but using
it make the initialization future proof (considering that in 9.3
XLogRecPtr will change its structure).
We need to add an #include and make it use a different path for the
"true" binary.
Maybe we need to make this changes for all BSD systems but having no
evidence of that i prefer to make this only for systems with __FreeBSD__
to zero, which was causing to a false positive in the failure detection
logic in wait_connection_availability(). So, change that to defaults to 60s
and add a check to avoid it being set to zero or negative.
Problem reported and analyzed by Andrew Newman
to find the master.
This was a micro optimization based on the fact that all commands that
needed to detect the master were executed from the standby but now that
we have CLUSTER level commands that is not true anymore
of time we spent a reponse from master before declaring the failure.
Also, change is_pgup() so it use PQsendQuery() instead of PQexec to
execute the check of master
Or at least, we try. By default, after 60 seconds pg_ctl just return.
This make useless to wait ourselves after pg_ctl start of witness so
remove the sleep