Code coverage logic was missing coverage from rust tests. This is now fixed.
Also, we weren't reaping spawned PgCat processes correctly which left zombie processes.
We have encountered a case where PgCat pools were stuck following a database incident. Our best understanding at this point is that the PgCat -> Postgres connections died silently and because Tokio defaults to disabling keepalives, connections in the pool were marked as busy forever. Only when we deployed PgCat did we see recovery.
This PR introduces tcp_keepalives to PgCat. This sets the defaults to be
keepalives_idle: 5 # seconds
keepalives_interval: 5 # seconds
keepalives_count: 5 # a count
These settings can detect the death of an idle connection within 30 seconds of its death. Please note that the connection can remain idle forever (from an application perspective) as long as the keepalive packets are flowing so disconnection will only occur if the other end is not acknowledging keepalive packets (keepalive packet acks are handled by the OS, the application does not need to do anything). I plan to add tcp_user_timeout in a follow-up PR.