Introduce tcp_keepalives to PgCat (#315)

We encountered a case where PgCat pools were stuck following a database incident. Our best understanding at this point is that the PgCat -> Postgres connections died silently and, because Tokio does not enable keepalives by default, connections in the pool were marked as busy forever. We saw recovery only after we deployed PgCat.

This PR introduces tcp_keepalives to PgCat, with the following defaults:

keepalives_idle: 5      # seconds
keepalives_interval: 5  # seconds
keepalives_count: 5     # number of probes

With these settings, the death of an idle connection is detected within 30 seconds: the first probe is sent after 5 seconds of idleness, and the connection is declared dead after 5 unacknowledged probes sent 5 seconds apart (5 + 5 × 5 = 30). Note that a connection can remain idle forever from the application's perspective as long as keepalive packets keep flowing; disconnection occurs only if the other end stops acknowledging them. Keepalive acks are handled by the OS, so the application does not need to do anything.

I plan to add tcp_user_timeout in a follow-up PR.
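
To sanity-check the 30-second figure against other settings, the bound can be sketched in a few lines of Ruby. This is only an illustration: `keepalive_detection_seconds` and `KEEPALIVE_DEFAULTS` are hypothetical names, not part of PgCat, and the formula assumes the usual kernel behaviour of one idle period followed by `count` probes spaced `interval` seconds apart.

```ruby
# Hypothetical helper (not part of PgCat): approximate worst-case time, in
# seconds, before the OS declares a silent idle connection dead.
def keepalive_detection_seconds(idle:, interval:, count:)
  # The first probe goes out after `idle` seconds of silence; the kernel then
  # allows `interval` seconds for each of the `count` probes before giving up.
  idle + interval * count
end

# Defaults introduced by this PR, keyed by the illustrative names used above.
KEEPALIVE_DEFAULTS = { idle: 5, interval: 5, count: 5 }.freeze

puts keepalive_detection_seconds(**KEEPALIVE_DEFAULTS)
```

Raising any of the three values lengthens the detection window, so more tolerant settings leave a dead connection marked as busy for longer.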
Mostafa Abdelraouf
2023-02-08 11:35:38 -06:00
committed by GitHub
parent d81a744154
commit f1265a5570
11 changed files with 114 additions and 13 deletions

@@ -8,7 +8,7 @@ class PgcatProcess
attr_reader :pid
def self.finalize(pid, log_filename, config_filename)
-`kill #{pid}`
+`kill #{pid}` if pid
File.delete(config_filename) if File.exist?(config_filename)
File.delete(log_filename) if File.exist?(log_filename)
end
@@ -75,8 +75,11 @@ class PgcatProcess
end
def stop
+return unless @pid
`kill #{@pid}`
sleep 0.1
+@pid = nil
end
def shutdown