Commit Graph

143 Commits

Author SHA1 Message Date
Ian Barwick
5e9db47d12 Fix query in get_node_record_by_name() 2016-07-05 21:06:31 +09:00
Ian Barwick
3fac975de6 Prevent multiple nodes being registered with the same name.
Fixes GitHub #192.
2016-06-24 09:25:41 +09:00
Ian Barwick
005640be51 Fix PQconninfoParse() return type check 2016-06-05 10:20:42 +09:00
Ian Barwick
9a05999abb Fix log formatting 2016-05-17 17:24:02 +09:00
Ian Barwick
2eb00a3e6f Remove unneeded column 2016-05-12 09:56:29 +09:00
Ian Barwick
21b2ff1a1f repmgrd: better handling of missing upstream_node_id
Ensure we default to master node.
2016-05-12 09:20:33 +09:00
Ian Barwick
57f9432692 Add missing newlines in log messages 2016-05-11 21:47:40 +09:00
Ian Barwick
e3e1c5de4e Use "immediately_reserve" parameter in pg_create_physical_replication_slot (9.6) 2016-04-04 12:56:00 +09:00
Ian Barwick
5bc809466c Make self-referencing foreign key on repl_nodes table deferrable 2016-04-01 15:19:22 +09:00
Ian Barwick
2a8d6f72c6 Make witness server node update an atomic operation
If the connection to the primary is lost, roll back to the previously
known state.

TRUNCATE is of course not MVCC-friendly, but that shouldn't matter here
as only one process should ever be looking at this table.
2016-04-01 11:07:17 +09:00
Ian Barwick
190cc7dcb4 Rename copy_configuration () to witness_copy_node_records()
As it's witness-specific. Per suggestion from Martín.
2016-04-01 08:44:23 +09:00
Ian Barwick
c48c248c15 Regularly sync witness server repl_nodes table.
Although the witness server will resync the repl_nodes table following
a failover, other operations (e.g. removing or cloning a standby)
were previously not reflected in the witness server's copy of this
table.

As a short-term workaround, automatically resync the table at regular
intervals (defined by the configuration file parameter
"witness_repl_nodes_sync_interval_secs", default 30 seconds).
2016-03-29 16:49:28 +09:00
Ian Barwick
ca6cbcf965 Add sanity checks to be sure pg_rewind can be used before executing a switchover 2016-01-28 09:25:00 +09:00
Ian Barwick
f982708b35 Add function test_db_connection()
The difference between this and establish_db_connection() is that
it outputs any connection failure as a [NOTICE] rather than an
[ERROR]; it's intended for use in e.g. polling a server to wait
for it to come up/go down, while preventing [ERROR] log lines
which may cause confusion.
2016-01-20 07:56:03 +09:00
Ian Barwick
7e6bac1be6 Display a couple of repetitive log messages in verbose mode only 2016-01-08 11:10:34 +09:00
Ian Barwick
b72058dba8 Update copyright notice to 2016 2016-01-05 15:57:46 +09:00
Ian Barwick
120688013e Add "standby switchover" mode
Perform a switchover by:
 - stopping current primary node
 - promoting this standby node to primary
 - forcing previous primary node to follow this node

Caveats:
 - repmgrd must not be running, otherwise it may
   attempt a failover
   (TODO: find some way of notifying repmgrd of planned
    activity like this)
 - currently only set up for two-node operation; any other
   standbys will probably become downstream cascaded standbys
   of the old primary once it's restarted
 - as we're executing repmgr remotely (on the old primary),
   we'll need the location of its configuration file; this
   can be provided explicitly with -C/--remote-config-file,
   otherwise repmgr will look in default locations on the
   remote server
 - this does not yet support "rewinding" stopped nodes
   which will be unable to catch up with the primary

TODO:
 - update help, docs
 - make connection test timeouts/intervals configurable
2015-11-30 12:20:24 +09:00
Ian Barwick
67a81d1d47 Minor log message fixes 2015-11-18 09:10:22 +09:00
Ian Barwick
fc6225a511 Refactor get_master_connection() and update description
Use 'remote_conn' instead of 'master_conn', as the connection
handle can potentially be used for any node.
2015-11-17 13:59:28 +09:00
Ian Barwick
e3111d37ba get_master_connection(): order node list by node type and priority
This should make it more likely that the actual primary is first
in the retrieved list, reducing the number of connections to
other nodes in the cluster which need to be made.
2015-11-17 13:59:28 +09:00
Ian Barwick
3ab91730c3 get_master_connection(): possible to use is_standby() now 2015-11-16 17:49:02 +09:00
Ian Barwick
dd7f9b79ae Tidy up logging output in dbutils.c
Log all executed SQL if verbose mode is enabled.
2015-11-16 17:39:42 +09:00
Ian Barwick
487aadc4b9 Add TODO items 2015-11-13 14:51:27 +09:00
Ian Barwick
142517fcca Always use catalog path when calling system functions
Removes any risk of issues due to search path mangling etc.
2015-11-11 11:17:47 +09:00
Ian Barwick
56cec22f22 Use pg_malloc0() instead of malloc()
See also d08bd352c1
2015-10-26 15:34:33 +09:00
Ian Barwick
06b9e0a8ec Clarify purpose of get_repmgr_schema() 2015-10-07 10:54:47 +09:00
Ian Barwick
45eb0ea5d3 Miscelleanous comment fixes 2015-09-25 11:17:26 +09:00
Ian Barwick
c3bd02b83d Standardize if-statement formatting
"if(" -> "if ("
2015-09-24 17:45:08 +09:00
Ian Barwick
8e7d110a22 Check for existing master record before deleting it
Otherwise repmgr implies it's deleting a record which isn't actually
there.
2015-09-24 17:39:39 +09:00
Ian Barwick
c429b0b186 Don't fail with error when registering master if schema already defined
Registering a master creates the schema, but it may be desirable
to forcibly reregister a master without deleting the schema, so
uncouple the dependency.

Also ensure schema creation is atomic by wrapping it in a transaction.

Per GitHub issue #49.
2015-09-24 16:55:43 +09:00
Tomas Vondra
ef6b24551a call update_node_record_set_upstream() for STANDBY FOLLOW
repmgrd correctly updates ID of the upstream node after automatic
failover, but repmgr was not doing that for manual failvers.

This moves the existing function to dbutils and modifies it so that
it does not rely on global variables with configuration (available
just in repmgrd).

This should fix issue #67 (hopefully, haven't done much testing).
2015-09-23 12:32:47 +09:00
Ian Barwick
e8025c7c9f Re-use replication slot if it already exists
Per issue #65 in GitHub
2015-04-13 13:17:38 +09:00
Ian Barwick
4dfeffe087 Add constant NODE_NOT_FOUND
Which is what the magic number means in those contexts.
2015-03-31 14:35:16 +09:00
Ian Barwick
7d33c1e411 Only attempt to log an event if the rempgr schema has been set
In some circumstances (primarily when executing `repmgr standby
clone`) the `repmgr.conf` file is not mandated. However this means
the repmgr schema is not known, and any attempt to create an
event record will result in a log warning, which may cause
confusion as to the success of the operation.

It might be better to mandate providing `repmgr.conf` in all
circumstances.

Per report in https://github.com/2ndQuadrant/repmgr/issues/53 .
2015-03-31 10:28:34 +09:00
Ian Barwick
96255b988a Remove unused function
And standardized nomenclature on "master" rather than "primary".
2015-03-24 14:29:54 +09:00
Ian Barwick
bd19a2c868 Improve handling of event logging in rempgrd
Provide the master connection if available, and if not enable
create_event_record() to skip trying to write to the database,
but execute the notification program if defined.
2015-03-24 13:40:39 +09:00
Ian Barwick
3e2c9ed410 Support --fast-checkpoint 2015-03-23 12:18:17 +09:00
Ian Barwick
c757985640 primary -> master
For consistency
2015-03-19 11:26:47 +09:00
Ian Barwick
874616f149 Add %n/node id format option for 'event_notification_command' 2015-03-16 18:04:50 +09:00
Ian Barwick
61ce18ebbe Add configuration parameter 'event_notifications' 2015-03-16 17:31:26 +09:00
Ian Barwick
922dfd88e5 Add configuration option 'event_notification_command'
Command to be executed each time an event is logged.

Following formatting sequences will be interpolated:

      %e - event type
      %d - description
      %s - success (1 or 0)
      %t - timestamp
2015-03-16 13:41:13 +09:00
Ian Barwick
0307c51d4b Add initial event logging code 2015-03-16 07:44:54 +09:00
Abhijit Menon-Sen
94d0d119f6 Fix typo 2015-03-13 16:15:53 +05:30
Ian Barwick
606d0afabc primary -> master
For consistency.
2015-03-09 15:48:46 +09:00
Ian Barwick
4e6c250830 Remove experimental event logging code
Needs more bikeshedding.
2015-03-09 14:39:04 +09:00
Ian Barwick
abf92883a8 Clean up log output
No need to prefix each line with the program name; this was pretty
inconsistent anyway. The only place where log output needs to identify
the outputting program is when syslog is being used, which is done
anyway.
2015-03-09 12:00:05 +09:00
Ian Barwick
2339adba6c Fix event logging when cloning from another standby
We can only write to the primary, which we'll need to find seperately
when cloning from a standby.
2015-03-06 18:39:36 +09:00
Ian Barwick
491309f4ba Write events of note to a log table
This makes keeping track of events such as failovers
much easier. Note that this is for convenience and is
not a foolproof auditing log.

Sample output:

repmgr_db=# SELECT * from repmgr_test.repl_events ;
 node_id |          event           | successful |        event_timestamp        |                         details
---------+--------------------------+------------+-------------------------------+----------------------------------------------------------
       1 | master_register          | t          | 2015-03-06 14:14:08.196636+09 |
       2 | standby_clone            | t          | 2015-03-06 14:14:17.660768+09 | Backup method: pg_basebackup; --force: N
       2 | standby_register         | t          | 2015-03-06 14:14:18.762222+09 |
       4 | witness_create           | t          | 2015-03-06 14:14:22.072815+09 |
       3 | standby_clone            | t          | 2015-03-06 14:14:23.524673+09 | Backup method: pg_basebackup; --force: N
       3 | standby_register         | t          | 2015-03-06 14:14:24.620161+09 |
       2 | repmgrd_start            | t          | 2015-03-06 14:14:29.639096+09 |
       3 | repmgrd_start            | t          | 2015-03-06 14:14:29.641489+09 |
       4 | repmgrd_start            | t          | 2015-03-06 14:14:29.648002+09 |
       2 | standby_promote          | t          | 2015-03-06 14:15:01.956737+09 | Node 2 was successfully be promoted to master
       2 | repmgrd_failover_promote | t          | 2015-03-06 14:15:01.964771+09 | Node 2 promoted to master; old master 1 marked as failed
       3 | repmgrd_failover_follow  | t          | 2015-03-06 14:15:07.228493+09 | Node 3 now following new upstream node 2
(12 rows)
2015-03-06 14:35:41 +09:00
Ian Barwick
0f8759d316 Consolidate duplicated code 2015-03-04 17:27:51 +09:00
Ian Barwick
defb1e819b Add some annotations 2015-03-04 10:36:19 +09:00