Commit Graph

5 Commits

Author SHA1 Message Date
Mostafa Abdelraouf
e1e4929d43 Report waiting time only for currently waiting clients (#678)
The pool maxwait metric currently operates differently from Pgbouncer.

The way it operates today is that we keep track of max_wait on each connected client, when SHOW POOLS query is made, we go over the connected clients and we get the max of max_wait times among clients. This means the pool maxwait will never reset, it will always be monotonically increasing until the client with the highest maxwait disconnects.

This PR changes this behavior, by keeping track of the wait_start time on each client, when a client goes into WAITING state, we record the time offset from connect_time. When we either successfully or unsuccessfully checkout a connection from the pool, we reset the wait_start time.

When SHOW POOLS query is made, we go over all connected clients and we only consider clients whose wait_start is non-zero, for clients that have non-zero wait times, we compare them and report the maximum waiting time as maxwait for the pool.
2024-01-18 11:57:28 -06:00
Lev Kokotov
dc4d6edf17 Revert max_wait changes (#658)
* Revert "Reset wait times when checked out successfully (#656)"

This reverts commit ec3920d60f.

* Revert "Not sure how this sneaked past CI"

This reverts commit 4c5498b915.

* Revert "only report wait times from clients currently waiting to match behavior of pgbouncer (#655)"

This reverts commit 0e8064b049.
2023-12-05 01:47:38 -08:00
Daniel Babiak
0e8064b049 only report wait times from clients currently waiting to match behavior of pgbouncer (#655)
* Change maxwait to only report wait times from clients currently waiting to match behavior of pgbouncer

* Fix tests
2023-12-04 18:19:51 -08:00
Calvin Hughes
998cc16a3c Expose clients maxwait time in SHOW CLIENTS response via admin (#639)
* Expose clients maxwait time in SHOW CLIENTS response via PgCat admin
Displays the maxwait via maxwait_seconds and maxwait_us columns for each client that can be used to track down the wait time per client in a case where the overall pool stats shows waiting time. The maxwait_us, similar to the pool stats setup, is configured to display as a remainder alongside the maxwait_seconds.

* Use maxwait instead of maxwait_seconds to match pools column name

---------

Co-authored-by: Calvin Hughes <9379992+calvinhughes@users.noreply.github.com>
2023-11-13 11:24:39 -08:00
Mostafa Abdelraouf
a8a30ad43b Refactor Pool Stats to be based off of Server/Client stats (#445)
What is wrong
Stats reported by SHOW POOLS seem to be leaking. We see lingering cl_idle , cl_waiting, and similarly for sv_idle , sv_active. We confirmed that these are reporting issues not actual lingering clients.

This behavior is readily reproducible by running

while true; do
    psql "postgres://sharding_user:sharding_user@localhost:6432/sharded_db" -c "SELECT 1" > /dev/null 2>&1  &
done

Why it happens
I wasn't able to get to figure our the reason for the bug but my best guess is that we have race conditions when updating pool-level stats. So even though individual update operations are atomic, we perform a check then update sequence which is not protected by a guard.
https://github.com/postgresml/pgcat/blob/main/src/stats/pool.rs#L174-L179

I am also suspecting that using Relaxed ordering might allow this behavior (I changed all operations to use Ordering::SeqCst but still got lingering clients)

How to fix
Since SHOW POOLS/SHOW SERVER/SHOW CLIENTS all show the current state of the proxy (as opposed to SHOW STATS which show aggregate values), this PR refactors SHOW POOLS to have it construct the results directly from SHOW SERVER and SHOW CLIENT datasets. This reduces the complexity of stat updates and eliminates the need for having locks when updating pool stats as we only care about updating individual client/server states.

This will change the semantics of maxwait, so instead of it holding the maxwait time ever encountered by a client (connected or disconnected), it will only consider connected clients which should be okay given PgCat tends to hold on to client connections more than Pgbouncer.
2023-05-23 08:44:49 -05:00