* Add a new exec_simple_query method
This adds a new `exec_simple_query` method so we can make 'out of band'
queries to servers that don't interfere with pools at all.
In order to reuse startup code for making these simple queries,
we need to set the stats (`Reporter`) optional, so using these
simple queries wont interfere with stats.
* Add auth passthough (auth_query)
Adds a feature that allows setting auth passthrough for md5 auth.
It adds 3 new (general and pool) config parameters:
- `auth_query`: An string containing a query that will be executed on boot
to obtain the hash of a given user. This query have to use a placeholder `$1`,
so pgcat can replace it with the user its trying to fetch the hash from.
- `auth_query_user`: The user to use for connecting to the server and executing the
auth_query.
- `auth_query_password`: The password to use for connecting to the server and executing the
auth_query.
The configuration can be done either on the general config (so pools share them) or in a per-pool basis.
The behavior is, at boot time, when validating server connections, a hash is fetched per server
and stored in the pool. When new server connections are created, and no cleartext password is specified,
the obtained hash is used for creating them, if the hash could not be obtained for whatever reason, it retries
it.
When client authentication is tried, it uses cleartext passwords if specified, it not, it checks whether
we have query_auth set up, if so, it tries to use the obtained hash for making client auth. If there is no
hash (we could not obtain one when validating the connection), a new fetch is tried.
Once we have a hash, we authenticate using it against whathever the client has sent us, if there is a failure
we refetch the hash and retry auth (so password changes can be done).
The idea with this 'retrial' mechanism is to make it fault tolerant, so if for whatever reason hash could not be
obtained during connection validation, or the password has change, we can still connect later.
* Add documentation for Auth passthrough
* Refactor stats to use atomics
When we are dealing with a high number of connections, generated
stats cannot be consumed fast enough by the stats collector loop.
This makes the stats subsystem inconsistent and a log of
warning messages are thrown due to unregistered server/clients.
This change refactors the stats subsystem so it uses atomics:
- Now counters are handled using U64 atomics
- Event system is dropped and averages are calculated using a loop
every 15 seconds.
- Now, instead of snapshots being generated ever second we keep track of servers/clients
that have registered. Each pool/server/client has its own instance of the counter and
makes changes directly, instead of adding an event that gets processed later.
* Manually mplement Hash/Eq in `config::Address` ignoring stats
* Add tests for client connection counters
* Allow connecting to dockerized dev pgcat from the host
* stats: Decrease cl_idle when idle socket disconnects
This is an implementation of Query mirroring in PgCat (outlined here #302)
In configs, we match mirror hosts with the servers handling the traffic. A mirror host will receive the same protocol messages as the main server it was matched with.
This is done by creating an async task for each mirror server, it communicates with the main server through two channels, one for the protocol messages and one for the exit signal. The mirror server sends the protocol packets to the underlying PostgreSQL server. We receive from the underlying PostgreSQL server as soon as the data is available and we immediately discard it. We use bb8 to manage the life cycle of the connection, not for pooling since each mirror server handler is more or less single-threaded.
We don't have any connection pooling in the mirrors. Matching each mirror connection to an actual server connection guarantees that we will not have more connections to any of the mirrors than the parent pool would allow.
We have encountered a case where PgCat pools were stuck following a database incident. Our best understanding at this point is that the PgCat -> Postgres connections died silently and because Tokio defaults to disabling keepalives, connections in the pool were marked as busy forever. Only when we deployed PgCat did we see recovery.
This PR introduces tcp_keepalives to PgCat. This sets the defaults to be
keepalives_idle: 5 # seconds
keepalives_interval: 5 # seconds
keepalives_count: 5 # a count
These settings can detect the death of an idle connection within 30 seconds of its death. Please note that the connection can remain idle forever (from an application perspective) as long as the keepalive packets are flowing so disconnection will only occur if the other end is not acknowledging keepalive packets (keepalive packet acks are handled by the OS, the application does not need to do anything). I plan to add tcp_user_timeout in a follow-up PR.
* Don't send discard all when state is changed in transaction
* Remove unnecessary clone
* spelling
* Move transaction check to SET command
* Add test for set command in transaction
* type
* Update comments
* Update comments
* use moves instead of clones for initial message
* don't make message mutable
* Update unwrap
* but i'm not a wrapper
* Add set local test
* change continue
* Changes shard struct to use vector of ServerConfig
* Adds to query router
* Change client disconnect with error message to warn instead of debug
* Add warning logs for clean up actions
* Send DISCARD ALL even if client is not in transaction
* fmt
* Added tests + avoided sending extra discard all
* Adds set name logic to beginning of handle client
* fmt
* refactor dead code handling
* Refactor reading command tag
* remove unnecessary trim
* Removing debugging statement
* typo
* typo{
* documentation
* edit text
* un-unwrap
* run ci
* run ci
Co-authored-by: Zain Kabani <zain.kabani@instacart.com>
* initial commit of server check delay implementation
* fmt
* spelling
* Update name to last_healthcheck and some comments
* Moved server tested stat to after require_healthcheck check
* Make health check delay configurable
* Rename to last_activity
* Fix typo
* Add debug log for healthcheck
* Add address to debug log
* Add support for multi-database / multi-user pools
* Nothing
* cargo fmt
* CI
* remove test users
* rename pool
* Update tests to use admin user/pass
* more fixes
* Revert bad change
* Use PGDATABASE env var
* send server info in case of admin
* constants
* server.rs docs
* client.rs comments
* dead code; comments
* comment
* query cancellation comments
* remove unnecessary cast
* move db setup up one step
* query cancellation test
* new line; good night