There is no need to prefix each log line with the program name; this was
done inconsistently anyway. The only place where log output needs to
identify the emitting program is when syslog is in use, and syslog
already does that itself.
Daemonizing changes the current working directory to '/',
which breaks configuration file parsing if the file lives in
the previous working directory and was provided without an
explicit path.
It also makes general sense to parse the configuration file
before daemonizing.
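A minimal sketch of the ordering, assuming glibc's daemon(3); the file
name and the commented-out parse step are illustrative rather than
repmgr's actual code:

#define _DEFAULT_SOURCE
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char **argv)
{
    char config_path[PATH_MAX];

    /* resolve a relative path while the original working directory
     * is still valid; daemon() will chdir("/") below */
    if (realpath(argc > 1 ? argv[1] : "repmgr.conf", config_path) == NULL)
    {
        perror("realpath");
        exit(EXIT_FAILURE);
    }

    /* parse_config(config_path);  -- parse before daemonizing */

    if (daemon(0, 0) != 0)    /* the current directory becomes "/" here */
    {
        perror("daemon");
        exit(EXIT_FAILURE);
    }

    /* config_path is absolute, so it stays usable after the cwd change */
    return 0;
}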
This makes keeping track of events such as failovers
much easier. Note that this is provided for convenience
and is not a foolproof audit log.
Sample output:
repmgr_db=# SELECT * from repmgr_test.repl_events ;
 node_id |          event           | successful |        event_timestamp        |                         details
---------+--------------------------+------------+-------------------------------+----------------------------------------------------------
       1 | master_register          | t          | 2015-03-06 14:14:08.196636+09 |
       2 | standby_clone            | t          | 2015-03-06 14:14:17.660768+09 | Backup method: pg_basebackup; --force: N
       2 | standby_register         | t          | 2015-03-06 14:14:18.762222+09 |
       4 | witness_create           | t          | 2015-03-06 14:14:22.072815+09 |
       3 | standby_clone            | t          | 2015-03-06 14:14:23.524673+09 | Backup method: pg_basebackup; --force: N
       3 | standby_register         | t          | 2015-03-06 14:14:24.620161+09 |
       2 | repmgrd_start            | t          | 2015-03-06 14:14:29.639096+09 |
       3 | repmgrd_start            | t          | 2015-03-06 14:14:29.641489+09 |
       4 | repmgrd_start            | t          | 2015-03-06 14:14:29.648002+09 |
       2 | standby_promote          | t          | 2015-03-06 14:15:01.956737+09 | Node 2 was successfully promoted to master
       2 | repmgrd_failover_promote | t          | 2015-03-06 14:15:01.964771+09 | Node 2 promoted to master; old master 1 marked as failed
       3 | repmgrd_failover_follow  | t          | 2015-03-06 14:15:07.228493+09 | Node 3 now following new upstream node 2
(12 rows)
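A minimal sketch of how such an event record might be written with
libpq; the helper name is hypothetical, while the table and column
names are taken from the sample output above:

#include <stdio.h>
#include <libpq-fe.h>

/* Hypothetical helper: insert a row into the events table shown
 * above. Values are passed via PQexecParams() so the details string
 * needs no manual escaping. */
static int
record_event(PGconn *conn, int node_id, const char *event,
             int successful, const char *details)
{
    const char *params[4];
    char        node_id_str[12];
    PGresult   *res;

    snprintf(node_id_str, sizeof(node_id_str), "%d", node_id);
    params[0] = node_id_str;
    params[1] = event;
    params[2] = successful ? "t" : "f";
    params[3] = details;            /* NULL inserts SQL NULL */

    res = PQexecParams(conn,
                       "INSERT INTO repmgr_test.repl_events "
                       "  (node_id, event, successful, event_timestamp, details) "
                       "VALUES ($1, $2, $3, now(), $4)",
                       4, NULL, params, NULL, NULL, 0);

    if (PQresultStatus(res) != PGRES_COMMAND_OK)
    {
        fprintf(stderr, "event insert failed: %s", PQerrorMessage(conn));
        PQclear(res);
        return 0;
    }

    PQclear(res);
    return 1;
}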
This mainly involves abstracting the functions which copy
and create records out of repmgr.c into dbutils.c, as they
need to be shared between repmgr and repmgrd.
Per issue noted here:
https://groups.google.com/forum/#!topic/repmgr/v5nu1Xwf6X0
If the node is a cascaded standby and the primary fails, `primary_conn`
will not be updated automatically; when writing monitoring info,
ensure we connect to the current primary.
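A rough sketch of the reconnection check, assuming a hypothetical
connect_to_current_primary() helper that locates the primary among the
known nodes:

#include <stddef.h>
#include <libpq-fe.h>

extern PGconn *connect_to_current_primary(void);    /* hypothetical */

static PGconn *primary_conn;

/* Call before writing a monitoring record: a cascaded standby's
 * upstream is not the primary, and primary_conn may point at a server
 * which has since failed, so re-locate the current primary whenever
 * the cached connection is gone. */
static PGconn *
ensure_primary_conn(void)
{
    if (primary_conn == NULL || PQstatus(primary_conn) != CONNECTION_OK)
    {
        if (primary_conn != NULL)
            PQfinish(primary_conn);

        primary_conn = connect_to_current_primary();
    }

    return primary_conn;
}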
Attempt to attach to the next available upstream node, otherwise
quit monitoring. We'll need to add further options for failover
scenarios, including attempting to attach to another node,
shutting down the server completely, and so on.
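A minimal sketch of the attach-or-quit behaviour, with the
candidate-list and shutdown helpers left as hypothetical externs:

#include <stddef.h>
#include <libpq-fe.h>

#define MAX_NODES 16

typedef struct
{
    int  node_id;
    char conninfo[256];
} NodeInfo;

/* hypothetical: fetch candidate upstream nodes in order of
 * preference, and stop the monitoring loop */
extern int  get_upstream_candidates(NodeInfo *nodes, int max_nodes);
extern void terminate_monitoring(void);

static PGconn *
attach_to_next_upstream(void)
{
    NodeInfo nodes[MAX_NODES];
    int      n = get_upstream_candidates(nodes, MAX_NODES);
    int      i;

    for (i = 0; i < n; i++)
    {
        PGconn *conn = PQconnectdb(nodes[i].conninfo);

        if (conn != NULL && PQstatus(conn) == CONNECTION_OK)
            return conn;        /* follow this node */

        if (conn != NULL)
            PQfinish(conn);
    }

    /* no upstream node is reachable: give up and quit monitoring */
    terminate_monitoring();
    return NULL;
}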
To handle cascaded replication we're going to have to keep track
of each node's upstream node. We'll also enumerate the node type
("primary", "standby" or "witness") and record whether the node
is active.