We have encountered a case where PgCat pools were stuck following a database incident. Our best understanding at this point is that the PgCat -> Postgres connections died silently and because Tokio defaults to disabling keepalives, connections in the pool were marked as busy forever. Only when we deployed PgCat did we see recovery.
This PR introduces tcp_keepalives to PgCat. This sets the defaults to be
keepalives_idle: 5 # seconds
keepalives_interval: 5 # seconds
keepalives_count: 5 # a count
These settings can detect the death of an idle connection within 30 seconds of its death. Please note that the connection can remain idle forever (from an application perspective) as long as the keepalive packets are flowing so disconnection will only occur if the other end is not acknowledging keepalive packets (keepalive packet acks are handled by the OS, the application does not need to do anything). I plan to add tcp_user_timeout in a follow-up PR.
* Don't send discard all when state is changed in transaction
* Remove unnecessary clone
* spelling
* Move transaction check to SET command
* Add test for set command in transaction
* type
* Update comments
* Update comments
* use moves instead of clones for initial message
* don't make message mutable
* Update unwrap
* but i'm not a wrapper
* Add set local test
* change continue
* Changes shard struct to use vector of ServerConfig
* Adds to query router
* Change client disconnect with error message to warn instead of debug
* Add warning logs for clean up actions
* Send DISCARD ALL even if client is not in transaction
* fmt
* Added tests + avoided sending extra discard all
* Adds set name logic to beginning of handle client
* fmt
* refactor dead code handling
* Refactor reading command tag
* remove unnecessary trim
* Removing debugging statement
* typo
* typo{
* documentation
* edit text
* un-unwrap
* run ci
* run ci
Co-authored-by: Zain Kabani <zain.kabani@instacart.com>
* initial commit of server check delay implementation
* fmt
* spelling
* Update name to last_healthcheck and some comments
* Moved server tested stat to after require_healthcheck check
* Make health check delay configurable
* Rename to last_activity
* Fix typo
* Add debug log for healthcheck
* Add address to debug log
* Add support for multi-database / multi-user pools
* Nothing
* cargo fmt
* CI
* remove test users
* rename pool
* Update tests to use admin user/pass
* more fixes
* Revert bad change
* Use PGDATABASE env var
* send server info in case of admin
* constants
* server.rs docs
* client.rs comments
* dead code; comments
* comment
* query cancellation comments
* remove unnecessary cast
* move db setup up one step
* query cancellation test
* new line; good night