diff --git a/doc/switchover.sgml b/doc/switchover.sgml index ec834102..83a3396f 100644 --- a/doc/switchover.sgml +++ b/doc/switchover.sgml @@ -54,24 +54,46 @@ preparation Preparing for switchover + - As mentioned above, success of the switchover operation depends on &repmgr; - being able to shut down the current primary server quickly and cleanly. + As mentioned in the previous section, success of the switchover operation depends on + &repmgr; being able to shut down the current primary server quickly and cleanly. + Double-check which commands will be used to stop/start/restart the current primary; on the primary execute: repmgr -f /etc/repmgr.conf node service --list --action=stop repmgr -f /etc/repmgr.conf node service --list --action=start - repmgr -f /etc/repmgr.conf node service --list --action=restart - + repmgr -f /etc/repmgr.conf node service --list --action=restart + + + These commands can be defined in repmgr.conf with + , + and . + + + + + If &repmgr; is installed from a package, you should set these commands + to use the appropriate service commands defined by the package/operating + system as these will ensure PostgreSQL is stopped/started properly + taking into account configuration and log file locations etc. + + + If the options aren't defined, &repmgr; will + fall back to using pg_ctl to stop/start/restart + PostgreSQL, which may not work properly. + + + On systemd systems we strongly recommend using the appropriate systemctl commands (typically run via sudo) to ensure - systemd informed about the status of the PostgreSQL service. + systemd is informed about the status of the PostgreSQL service. If using sudo for the systemctl calls, make sure the @@ -79,25 +101,30 @@ this way, repmgr will fail to stop the primary. + Check that access from applications is minimalized or preferably blocked completely, so applications are not unexpectedly interrupted. + Check there is no significant replication lag on standbys attached to the current primary. 
+ If WAL file archiving is set up, check that there is no backlog of files waiting - to be archived, as PostgreSQL will not finally shut down until all these have been + to be archived, as PostgreSQL will not finally shut down until all of these have been archived. If there is a backlog exceeding archive_ready_warning WAL files, &repmgr; will emit a warning before attempting to perform a switchover; you can also check manually with repmgr node check --archive-ready. + Ensure that repmgrd is *not* running anywhere to prevent it unintentionally promoting a node. + Finally, consider executing repmgr standby switchover with the --dry-run option; this will perform any necessary checks and inform you about @@ -115,6 +142,48 @@ "pg_ctl -l /var/log/postgresql/startup.log -D '/var/lib/postgresql/data' -m fast -W stop" + + + + Be aware that checks the prerequisites + for performing the switchover and some basic sanity checks on the + state of the database which might affect the switchover operation + (e.g. replication lag); it cannot however guarantee the switchover + operation will succeed. In particular, if the current primary + does not shut down cleanly, &repmgr; will not be able to reliably + execute the switchover (as there would be a danger of divergence + between the former and new primary nodes). + + + + + Note that the following parameters in repmgr.conf are relevant to the + switchover operation: + + + + reconnect_attempts: number of times to check the original primary + for a clean shutdown after executing the shutdown command, before aborting + + + + + reconnect_interval: interval (in seconds) to check the original + primary for a clean shutdown after executing the shutdown command (up to a maximum + of reconnect_attempts tries) + + + + + replication_lag_critical: + if replication lag (in seconds) on the standby exceeds this value, the + switchover will be aborted (unless the -F/--force option + is provided) + + + + +