Related to Github #127.
- use the previously introduced repmgr_atoi() function to parse
integers better
- collate all detected errors and output as a list, rather than
failing on the first error.
repmgrd correctly updates ID of the upstream node after automatic
failover, but repmgr was not doing that for manual failvers.
This moves the existing function to dbutils and modifies it so that
it does not rely on global variables with configuration (available
just in repmgrd).
This should fix issue #67 (hopefully, haven't done much testing).
If no configuration file provided, also check default Postgres
sysconfig dir.
It would also be useful to check the configuration directory
provided by the RPM/DEB packages, not sure if that's programmatically
feasible.
Currently repmgr/repmgrd will only accept these as valid when
provided as the first command line option, however it's possible
a user will want to get the output of those options by adding
them to the end of a previously inputted command.
Note that after the first of these options is encountered, the
program will terminate and not process any other options. This
is consistent with psql's behaviour
Per GitHub issue #107 from Sébastien Gross.
The main change is that now check_connection requires a conninfo
parameter, and the connection object has type (PGconn **) so it can be
replaced by check_connection if needed.
The bug was caused by the fact that the first failure resulted in
*conn == NULL, so that subsequent checks of the upstream connection
were failing irrespectively of the actual state of the upstream node.
Now, when *conn == NULL, check_connection will use conninfo to
establish a new connection and place it into *conn. We introduce a new
INTERNAL_ERROR code for the case when they are both NULL.
In passing, we also reworded a confusing error message, distinguishing
a timeout from the actual elapsed time.
Provide the master connection if available, and if not enable
create_event_record() to skip trying to write to the database,
but execute the notification program if defined.
Command to be executed each time an event is logged.
Following formatting sequences will be interpolated:
%e - event type
%d - description
%s - success (1 or 0)
%t - timestamp
No need to prefix each line with the program name; this was pretty
inconsistent anyway. The only place where log output needs to identify
the outputting program is when syslog is being used, which is done
anyway.
Daemonizing changes the current working directory to '/',
which breaks configuration file parsing if the file is in
the previous working directory and provided without an
explicit path.
Also it makes general sense to parse the configuration file
before daemonizing.
This makes keeping track of events such as failovers
much easier. Note that this is for convenience and is
not a foolproof auditing log.
Sample output:
repmgr_db=# SELECT * from repmgr_test.repl_events ;
node_id | event | successful | event_timestamp | details
---------+--------------------------+------------+-------------------------------+----------------------------------------------------------
1 | master_register | t | 2015-03-06 14:14:08.196636+09 |
2 | standby_clone | t | 2015-03-06 14:14:17.660768+09 | Backup method: pg_basebackup; --force: N
2 | standby_register | t | 2015-03-06 14:14:18.762222+09 |
4 | witness_create | t | 2015-03-06 14:14:22.072815+09 |
3 | standby_clone | t | 2015-03-06 14:14:23.524673+09 | Backup method: pg_basebackup; --force: N
3 | standby_register | t | 2015-03-06 14:14:24.620161+09 |
2 | repmgrd_start | t | 2015-03-06 14:14:29.639096+09 |
3 | repmgrd_start | t | 2015-03-06 14:14:29.641489+09 |
4 | repmgrd_start | t | 2015-03-06 14:14:29.648002+09 |
2 | standby_promote | t | 2015-03-06 14:15:01.956737+09 | Node 2 was successfully be promoted to master
2 | repmgrd_failover_promote | t | 2015-03-06 14:15:01.964771+09 | Node 2 promoted to master; old master 1 marked as failed
3 | repmgrd_failover_follow | t | 2015-03-06 14:15:07.228493+09 | Node 3 now following new upstream node 2
(12 rows)