* Revert "Reset wait times when checked out successfully (#656)"
This reverts commit ec3920d60f.
* Revert "Not sure how this sneaked past CI"
This reverts commit 4c5498b915.
* Revert "only report wait times from clients currently waiting to match behavior of pgbouncer (#655)"
This reverts commit 0e8064b049.
* Expose clients maxwait time in SHOW CLIENTS response via PgCat admin
Displays the max wait via maxwait_seconds and maxwait_us columns for each client, which can be used to track down the wait time per client when the overall pool stats show waiting time. As in the pool stats, maxwait_us holds the sub-second remainder displayed alongside maxwait_seconds.
* Use maxwait instead of maxwait_seconds to match pools column name
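As a rough illustration (not the actual PgCat code; names follow the columns above), splitting a client's wait duration into the whole-seconds column and the microsecond remainder could look like:

    use std::time::Duration;

    // Illustrative only: split a wait duration into whole seconds plus the
    // microsecond remainder, matching the maxwait / maxwait_us convention
    // used by the pool stats columns.
    fn maxwait_columns(wait: Duration) -> (u64, u32) {
        (wait.as_secs(), wait.subsec_micros())
    }

    fn main() {
        let (maxwait, maxwait_us) = maxwait_columns(Duration::from_micros(2_500_123));
        assert_eq!((maxwait, maxwait_us), (2, 500_123));
    }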
---------
Co-authored-by: Calvin Hughes <9379992+calvinhughes@users.noreply.github.com>
* Initial commit
* Cleanup and add stats
* Use an arc instead of full clones to store the parse packets
* Use mutex instead
* fmt
* clippy
* fmt
* fix?
* fix?
* fmt
* typo
* Update docs
* Refactor custom protocol
* fmt
* move custom protocol handling to before parsing
* Support describe
* Add LRU for server side statement cache
* rename variable
* Refactoring
* Move docs
* Fix test
* fix
* Update tests
* trigger build
* Add more tests
* Reorder handling sync
* Support the case where a named Describe is sent along with Parse (as go pgx does) and results are expected
* don't talk to client if not needed when client sends Parse
* fmt :(
* refactor tests
* nit
* Reduce hashing
* Reducing work done to decode describe and parse messages
* minor refactor
* Merge branch 'main' into zain/reimplment-prepared-statements-with-global-lru-cache
* Rewrite extended and prepared protocol message handling to better support mocking response packets and close
* An attempt to better handle DDL changes that might break cached plans, with ideas about how to further improve it
* fix
* Minor stats fixes and cleanup
* Cosmetic fixes (#64)
* Cosmetic fixes
* fix test
* Change server drop for statement cache error to a `deallocate all`
* Updated comments and added new idea for handling DDL changes impacting cached plans
* fix test?
* Revert test change
* trigger build, flakey test
* Avoid potential race conditions by changing get_or_insert to promote for pool LRU
* remove ps enabled variable on the server in favor of using an option
* Add close to the Extended Protocol buffer
---------
Co-authored-by: Lev Kokotov <levkk@users.noreply.github.com>
* Use a server parameters struct instead of the server info BytesMut
* Refactor to use hashmap for all params and add server parameters to client
* Sync parameters on client server checkout
* minor refactor
* update client side parameters when changed
* Move the SET statement logic from the C packet to the S packet.
* trigger build
* revert validation changes
* remove comment
* Try fix
* Reset cleanup state after sync
* fix server version test
* Track application name through client life for stats
* Add tests
* minor refactoring
* fmt
* fix
* fmt
This commit adds the clap library and configures the necessary args to
parse from the command line, expanding the current option of a single
file and adding support for environment variables.
Signed-off-by: Sebastian Webber <sebastian@swebber.me>
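As a hedged sketch of what the clap-based parsing could look like (the argument and environment variable names here are illustrative, not necessarily the ones this commit adds; assumes clap's derive and env features):

    use clap::Parser;

    // Illustrative CLI definition: a config file path plus environment
    // variable fallbacks, expanding the previous single-file option.
    #[derive(Parser, Debug)]
    #[command(name = "pgcat")]
    struct Args {
        /// Path to the configuration file.
        #[arg(default_value = "pgcat.toml", env = "PGCAT_CONFIG")]
        config_file: String,

        /// Log level, also settable from the environment.
        #[arg(long, default_value = "info", env = "PGCAT_LOG_LEVEL")]
        log_level: String,
    }

    fn main() {
        let args = Args::parse();
        println!("config: {}, log level: {}", args.config_file, args.log_level);
    }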
This commit adds a new function to handle notice messages and uses it
in the SHOW HELP command, which displays the available options
in the admin console.
Also, adding Fabrízio as a co-author for all the help with the
protocol and with structuring this PR.
Signed-off-by: Sebastian Webber <sebastian@swebber.me>
Co-authored-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
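A minimal sketch of the idea, assuming the standard PostgreSQL NoticeResponse ('N') wire format and the bytes crate (the function name is illustrative, not the one added in this commit):

    use bytes::{BufMut, BytesMut};

    // Illustrative: build a NoticeResponse packet carrying a message, the kind
    // of packet an admin console can send to print SHOW HELP output.
    fn notice_packet(message: &str) -> BytesMut {
        // Each field is a code byte plus a null-terminated string; the field
        // list itself is terminated by a single zero byte.
        let mut fields = BytesMut::new();
        for (code, value) in [(b'S', "NOTICE"), (b'C', "00000"), (b'M', message)] {
            fields.put_u8(code);
            fields.put_slice(value.as_bytes());
            fields.put_u8(0);
        }
        fields.put_u8(0);

        let mut packet = BytesMut::new();
        packet.put_u8(b'N'); // NoticeResponse
        packet.put_i32(4 + fields.len() as i32); // length includes itself
        packet.put_slice(&fields);
        packet
    }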
What is wrong
Stats reported by SHOW POOLS seem to be leaking. We see lingering cl_idle, cl_waiting, and similarly sv_idle, sv_active. We confirmed that these are reporting issues, not actual lingering clients.
This behavior is readily reproducible by running
    while true; do
      psql "postgres://sharding_user:sharding_user@localhost:6432/sharded_db" -c "SELECT 1" > /dev/null 2>&1 &
    done
Why it happens
I wasn't able to figure out the root cause of the bug, but my best guess is that we have race conditions when updating pool-level stats: even though individual update operations are atomic, we perform a check-then-update sequence that is not protected by a guard.
https://github.com/postgresml/pgcat/blob/main/src/stats/pool.rs#L174-L179
I also suspected that Relaxed ordering might allow this behavior, but changing all operations to Ordering::SeqCst still produced lingering clients.
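As a hedged illustration of that failure mode (not the actual pool.rs code; the counter name is made up): each operation below is atomic on its own, but two tasks can both pass the check before either subtracts, so the gauge ends up wrong regardless of the memory ordering used.

    use std::sync::atomic::{AtomicU64, Ordering};

    // Illustrative pool-level gauge updated with a check-then-update sequence.
    static CL_WAITING: AtomicU64 = AtomicU64::new(0);

    fn client_stopped_waiting() {
        // Two tasks can both observe 1 here, then both decrement, wrapping the
        // unsigned counter; no ordering (Relaxed or SeqCst) makes the pair atomic.
        if CL_WAITING.load(Ordering::SeqCst) > 0 {
            CL_WAITING.fetch_sub(1, Ordering::SeqCst);
        }
    }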
How to fix
Since SHOW POOLS/SHOW SERVERS/SHOW CLIENTS all show the current state of the proxy (as opposed to SHOW STATS, which shows aggregate values), this PR refactors SHOW POOLS to construct its results directly from the SHOW SERVERS and SHOW CLIENTS datasets. This reduces the complexity of stat updates and eliminates the need for locks when updating pool stats, as we only care about updating individual client/server states.
This changes the semantics of maxwait: instead of holding the maximum wait time ever encountered by a client (connected or disconnected), it will only consider currently connected clients, which should be okay given that PgCat tends to hold on to client connections longer than PgBouncer does.
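A minimal sketch of the approach, with illustrative state and type names: the SHOW POOLS gauges are folded from the per-client states that SHOW CLIENTS already tracks, so there is no separate counter to drift out of sync.

    #[derive(Clone, Copy)]
    enum ClientState { Idle, Waiting, Active }

    #[derive(Default, Debug)]
    struct PoolCounts { cl_idle: u64, cl_waiting: u64, cl_active: u64 }

    // Illustrative: build one SHOW POOLS row directly from current client states.
    fn pool_counts(clients: &[ClientState]) -> PoolCounts {
        let mut counts = PoolCounts::default();
        for state in clients {
            match state {
                ClientState::Idle => counts.cl_idle += 1,
                ClientState::Waiting => counts.cl_waiting += 1,
                ClientState::Active => counts.cl_active += 1,
            }
        }
        counts
    }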
* Refactor stats to use atomics
When we are dealing with a high number of connections, generated
stats cannot be consumed fast enough by the stats collector loop.
This makes the stats subsystem inconsistent, and a lot of
warning messages are thrown due to unregistered servers/clients.
This change refactors the stats subsystem so it uses atomics:
- Counters are now handled using u64 atomics.
- The event system is dropped and averages are calculated by a loop
every 15 seconds.
- Instead of snapshots being generated every second, we keep track of the servers/clients
that have registered. Each pool/server/client has its own instance of the counters and
makes changes directly, instead of adding an event that gets processed later.
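A rough sketch of that shape (field names and the helper are illustrative, not the actual PgCat types): each client/server owns its counters and updates them in place, while a background task derives per-second averages every 15 seconds.

    use std::sync::atomic::{AtomicU64, Ordering};
    use std::sync::Arc;
    use std::time::Duration;

    // Illustrative counter bundle owned directly by a client/server/pool.
    #[derive(Default)]
    struct Counters {
        query_count: AtomicU64,
        avg_query_count: AtomicU64,
    }

    async fn averages_loop(counters: Arc<Counters>) {
        let mut last_total = 0u64;
        loop {
            tokio::time::sleep(Duration::from_secs(15)).await;
            let total = counters.query_count.load(Ordering::Relaxed);
            // Per-second average over the 15 second window since the last pass.
            counters
                .avg_query_count
                .store((total - last_total) / 15, Ordering::Relaxed);
            last_total = total;
        }
    }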
* Manually implement Hash/Eq in `config::Address`, ignoring stats
* Add tests for client connection counters
* Allow connecting to dockerized dev pgcat from the host
* stats: Decrease cl_idle when idle socket disconnects
This is an implementation of query mirroring in PgCat (outlined in #302).
In configs, we match mirror hosts with the servers handling the traffic. A mirror host will receive the same protocol messages as the main server it was matched with.
This is done by creating an async task for each mirror server; it communicates with the main server through two channels, one for the protocol messages and one for the exit signal. The mirror task sends the protocol packets to the underlying PostgreSQL server, reads from that server as soon as data is available, and immediately discards it. We use bb8 to manage the life cycle of the connection, not for pooling, since each mirror server handler is more or less single-threaded.
We don't have any connection pooling in the mirrors. Matching each mirror connection to an actual server connection guarantees that we will not have more connections to any of the mirrors than the parent pool would allow.
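A condensed sketch of that structure (channel types and names are illustrative, and the bb8-managed connection is replaced here by a raw socket): one task per mirror receives the same protocol bytes as the main server and discards whatever the mirror replies.

    use bytes::BytesMut;
    use tokio::io::{AsyncReadExt, AsyncWriteExt};
    use tokio::sync::{mpsc, oneshot};

    // Illustrative mirror handler: fed the same protocol packets as the main
    // server over one channel, stopped through a second exit-signal channel,
    // and discarding everything the mirror server sends back.
    async fn mirror_task(
        mut packets: mpsc::Receiver<BytesMut>,
        mut exit: oneshot::Receiver<()>,
        mut mirror: tokio::net::TcpStream,
    ) {
        let mut scratch = [0u8; 8192];
        loop {
            tokio::select! {
                Some(packet) = packets.recv() => {
                    // Forward the client's protocol message to the mirror.
                    if mirror.write_all(&packet).await.is_err() { break; }
                }
                result = mirror.read(&mut scratch) => {
                    // Read the mirror's response as soon as it arrives and drop it.
                    if matches!(result, Ok(0) | Err(_)) { break; }
                }
                _ = &mut exit => break, // exit signal from the main server handler
            }
        }
    }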
Sometimes we want an admin to be able to ban a host for some time to route traffic away from that host for reasons like partial outages, replication lag, and scheduled maintenance.
We can achieve this today using a configuration update, but a quicker approach is to send a control command to PgCat that bans the replica for a specified duration.
This command does not change the current banning rules:
- Primaries cannot be banned
- When all replicas are banned, all replicas are unbanned
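A minimal sketch of those two rules (types and names are illustrative, and the ban duration bookkeeping is omitted): the admin ban is refused for primaries, and if it would leave every replica banned, all bans are lifted instead.

    #[derive(Clone, Copy, PartialEq)]
    enum Role { Primary, Replica }

    // Illustrative: apply an admin-issued ban while preserving the existing rules.
    fn admin_ban(roles: &[Role], banned: &mut [bool], target: usize) {
        if roles[target] == Role::Primary {
            return; // primaries cannot be banned
        }
        banned[target] = true;
        let all_replicas_banned = roles
            .iter()
            .zip(banned.iter())
            .filter(|(role, _)| **role == Role::Replica)
            .all(|(_, is_banned)| *is_banned);
        if all_replicas_banned {
            // When all replicas are banned, unban everything so traffic keeps flowing.
            for is_banned in banned.iter_mut() {
                *is_banned = false;
            }
        }
    }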
* Adds SHUTDOWN command to PgCat as an alternate option to sending SIGINT
* Check if we're already in SHUTDOWN sequence
* Send signal directly from shutdown instead of using channel
* Add tests
* trigger build
* Lowercase response and boolean change
* Update tests
* Fix tests
* typo
* Add support for multi-database / multi-user pools
* Nothing
* cargo fmt
* CI
* remove test users
* rename pool
* Update tests to use admin user/pass
* more fixes
* Revert bad change
* Use PGDATABASE env var
* send server info in case of admin
* Support reloading the entire config (including sharding logic) without restart.
* Fix bug that incorrectly handled error reporting when the shard is set incorrectly via the SET SHARD TO command:
selecting a wrong shard left the connection repeatedly reporting a fatal error (#80).
* Fix total_received and avg_recv admin database statistics.
* Enabling the query parser by default.
* More tests.