mirror of
https://github.com/postgresml/pgcat.git
synced 2026-03-26 02:16:30 +00:00
Refactor Pool Stats to be based off of Server/Client stats (#445)
What is wrong
Stats reported by SHOW POOLS seem to be leaking. We see lingering cl_idle , cl_waiting, and similarly for sv_idle , sv_active. We confirmed that these are reporting issues not actual lingering clients.
This behavior is readily reproducible by running
while true; do
psql "postgres://sharding_user:sharding_user@localhost:6432/sharded_db" -c "SELECT 1" > /dev/null 2>&1 &
done
Why it happens
I wasn't able to get to figure our the reason for the bug but my best guess is that we have race conditions when updating pool-level stats. So even though individual update operations are atomic, we perform a check then update sequence which is not protected by a guard.
https://github.com/postgresml/pgcat/blob/main/src/stats/pool.rs#L174-L179
I am also suspecting that using Relaxed ordering might allow this behavior (I changed all operations to use Ordering::SeqCst but still got lingering clients)
How to fix
Since SHOW POOLS/SHOW SERVER/SHOW CLIENTS all show the current state of the proxy (as opposed to SHOW STATS which show aggregate values), this PR refactors SHOW POOLS to have it construct the results directly from SHOW SERVER and SHOW CLIENT datasets. This reduces the complexity of stat updates and eliminates the need for having locks when updating pool stats as we only care about updating individual client/server states.
This will change the semantics of maxwait, so instead of it holding the maxwait time ever encountered by a client (connected or disconnected), it will only consider connected clients which should be okay given PgCat tends to hold on to client connections more than Pgbouncer.
This commit is contained in:
committed by
GitHub
parent
d63be9b93a
commit
a8a30ad43b
21
src/stats.rs
21
src/stats.rs
@@ -1,4 +1,3 @@
|
||||
use crate::pool::PoolIdentifier;
|
||||
/// Statistics and reporting.
|
||||
use arc_swap::ArcSwap;
|
||||
|
||||
@@ -16,13 +15,11 @@ pub mod pool;
|
||||
pub mod server;
|
||||
pub use address::AddressStats;
|
||||
pub use client::{ClientState, ClientStats};
|
||||
pub use pool::PoolStats;
|
||||
pub use server::{ServerState, ServerStats};
|
||||
|
||||
/// Convenience types for various stats
|
||||
type ClientStatesLookup = HashMap<i32, Arc<ClientStats>>;
|
||||
type ServerStatesLookup = HashMap<i32, Arc<ServerStats>>;
|
||||
type PoolStatsLookup = HashMap<(String, String), Arc<PoolStats>>;
|
||||
|
||||
/// Stats for individual client connections
|
||||
/// Used in SHOW CLIENTS.
|
||||
@@ -34,11 +31,6 @@ static CLIENT_STATS: Lazy<Arc<RwLock<ClientStatesLookup>>> =
|
||||
static SERVER_STATS: Lazy<Arc<RwLock<ServerStatesLookup>>> =
|
||||
Lazy::new(|| Arc::new(RwLock::new(ServerStatesLookup::default())));
|
||||
|
||||
/// Aggregate stats for each pool (a pool is identified by database name and username)
|
||||
/// Used in SHOW POOLS.
|
||||
static POOL_STATS: Lazy<Arc<RwLock<PoolStatsLookup>>> =
|
||||
Lazy::new(|| Arc::new(RwLock::new(PoolStatsLookup::default())));
|
||||
|
||||
/// The statistics reporter. An instance is given to each possible source of statistics,
|
||||
/// e.g. client stats, server stats, connection pool stats.
|
||||
pub static REPORTER: Lazy<ArcSwap<Reporter>> =
|
||||
@@ -80,13 +72,6 @@ impl Reporter {
|
||||
fn server_disconnecting(&self, server_id: i32) {
|
||||
SERVER_STATS.write().remove(&server_id);
|
||||
}
|
||||
|
||||
/// Register a pool with the stats system.
|
||||
fn pool_register(&self, identifier: PoolIdentifier, stats: Arc<PoolStats>) {
|
||||
POOL_STATS
|
||||
.write()
|
||||
.insert((identifier.db, identifier.user), stats);
|
||||
}
|
||||
}
|
||||
|
||||
/// The statistics collector which used for calculating averages
|
||||
@@ -139,12 +124,6 @@ pub fn get_server_stats() -> ServerStatesLookup {
|
||||
SERVER_STATS.read().clone()
|
||||
}
|
||||
|
||||
/// Get a snapshot of pool statistics.
|
||||
/// by the `Collector`.
|
||||
pub fn get_pool_stats() -> PoolStatsLookup {
|
||||
POOL_STATS.read().clone()
|
||||
}
|
||||
|
||||
/// Get the statistics reporter used to update stats across the pools/clients.
|
||||
pub fn get_reporter() -> Reporter {
|
||||
(*(*REPORTER.load())).clone()
|
||||
|
||||
Reference in New Issue
Block a user