pgcat

mirror of https://github.com/postgresml/pgcat.git synced 2026-03-23 01:16:30 +00:00

Author	SHA1	Message	Date
Zain Kabani	e14b283f0c	Make infer role configurable and fix double parse bug (#533 ) * Make infer role configurable and fix double parse bug * Fix tests * Enable infer_role_from query in toml for tests * Fix test * Add max length config, add logging for which application is failing to parse, and change config name * fmt * Update src/config.rs --------- Co-authored-by: Lev Kokotov <levkk@users.noreply.github.com>	2023-08-08 13:10:03 -07:00
Zain Kabani	aca9738821	Make queue strategy configurable and default to Fifo (#463 ) * Change idle timeout default to 10 minutes * Revert lifo for now while we investigate connection thrashing issues * Make queue strategy configurable * test revert idle time out * Add pgcat start to python test	2023-06-09 11:35:20 -07:00
Zain Kabani	b67c33b6d0	Use latest bb8 and use Lifo as the queue strategy in the pool (#455 ) * Use git bb8 * Use latest bb8 and change pool to use stack	2023-05-28 19:46:13 -07:00
Mostafa Abdelraouf	a8a30ad43b	Refactor Pool Stats to be based off of Server/Client stats (#445 ) What is wrong: Stats reported by SHOW POOLS seem to be leaking. We see lingering cl_idle, cl_waiting, and similarly for sv_idle, sv_active. We confirmed that these are reporting issues, not actual lingering clients. This behavior is readily reproducible by running while true; do psql "postgres://sharding_user:sharding_user@localhost:6432/sharded_db" -c "SELECT 1" > /dev/null 2>&1 & done Why it happens: I wasn't able to figure out the reason for the bug, but my best guess is that we have race conditions when updating pool-level stats. So even though individual update operations are atomic, we perform a check-then-update sequence which is not protected by a guard. https://github.com/postgresml/pgcat/blob/main/src/stats/pool.rs#L174-L179 I also suspect that using Relaxed ordering might allow this behavior (I changed all operations to use Ordering::SeqCst but still got lingering clients). How to fix: Since SHOW POOLS/SHOW SERVER/SHOW CLIENTS all show the current state of the proxy (as opposed to SHOW STATS, which shows aggregate values), this PR refactors SHOW POOLS to have it construct the results directly from SHOW SERVER and SHOW CLIENT datasets. This reduces the complexity of stat updates and eliminates the need for having locks when updating pool stats, as we only care about updating individual client/server states. This will change the semantics of maxwait, so instead of it holding the maxwait time ever encountered by a client (connected or disconnected), it will only consider connected clients, which should be okay given PgCat tends to hold on to client connections more than Pgbouncer.	2023-05-23 08:44:49 -05:00
Lev Kokotov	37e3349c24	Optionally clean up server connections (#444 ) * Optionally clean up server connections * move setting to pool * fix test * Print setting to screen * fmt * Fix pool_settings override in tests	2023-05-18 10:46:55 -07:00
Lev Kokotov	0898461c01	Allow to deploy pools without checking (#438 )	2023-05-12 12:48:37 -07:00
Lev Kokotov	52b1b43850	Prewarmer (#435 ) * Prewarmer * hmm * Tests * default * fix test * Correct configuration * Added minimal config example * remove connect_timeout	2023-05-12 09:50:52 -07:00
Lev Kokotov	389993bf3e	Accurate log messages (#425 )	2023-05-05 08:27:19 -07:00
Lev Kokotov	ba5243b6dd	Optionally validate config on boot (#423 )	2023-05-03 17:07:23 -07:00
Lev Kokotov	811885f464	Actually plugins (#421 ) * more plugins * clean up * fix tests * fix flakey test	2023-05-03 16:13:45 -07:00
Lev Kokotov	09e54e1175	Plugins! (#420 ) * Some queries * Plugins!! * cleanup * actual names * the actual plugins * comment * fix tests * Tests * unused errors * Increase reaper rate to actually enforce settings * ok	2023-05-03 09:13:05 -07:00
Lev Kokotov	0d504032b2	Server TLS (#417 ) * Server TLS * Finish up TLS * thats it * diff * remove dead code * maybe? * dirty shutdown * skip flakey test * remove unused error * fetch config once	2023-04-30 09:41:46 -07:00
Lev Kokotov	4a87b4807d	Add more pool settings (#416 ) * Add some pool settings * fmt	2023-04-26 16:33:26 -07:00
Lev Kokotov	a62f6b0eea	Fix port; add user pool mode (#395 ) * Fix port; add user pool mode * will probably break our session/transaction mode tests	2023-04-05 15:06:19 -07:00
Jose Fernández	6f768a84ce	Auth passthrough (auth_query) (#266 ) * Add a new exec_simple_query method This adds a new `exec_simple_query` method so we can make 'out of band' queries to servers that don't interfere with pools at all. In order to reuse startup code for making these simple queries, we need to make the stats (`Reporter`) optional, so using these simple queries won't interfere with stats. * Add auth passthrough (auth_query) Adds a feature that allows setting auth passthrough for md5 auth. It adds 3 new (general and pool) config parameters: - `auth_query`: A string containing a query that will be executed on boot to obtain the hash of a given user. This query has to use a placeholder `$1`, so pgcat can replace it with the user it's trying to fetch the hash from. - `auth_query_user`: The user to use for connecting to the server and executing the auth_query. - `auth_query_password`: The password to use for connecting to the server and executing the auth_query. The configuration can be done either on the general config (so pools share them) or on a per-pool basis. The behavior is, at boot time, when validating server connections, a hash is fetched per server and stored in the pool. When new server connections are created, and no cleartext password is specified, the obtained hash is used for creating them; if the hash could not be obtained for whatever reason, it retries it. When client authentication is tried, it uses cleartext passwords if specified; if not, it checks whether we have auth_query set up, and if so, it tries to use the obtained hash for making client auth. If there is no hash (we could not obtain one when validating the connection), a new fetch is tried. Once we have a hash, we authenticate using it against whatever the client has sent us; if there is a failure we refetch the hash and retry auth (so password changes can be done). The idea with this 'retrial' mechanism is to make it fault tolerant, so if for whatever reason the hash could not be obtained during connection validation, or the password has changed, we can still connect later. * Add documentation for Auth passthrough	2023-03-30 13:29:23 -07:00
Jose Fernández	58ce76d9b9	Refactor stats to use atomics (#375 ) * Refactor stats to use atomics When we are dealing with a high number of connections, generated stats cannot be consumed fast enough by the stats collector loop. This makes the stats subsystem inconsistent and a lot of warning messages are thrown due to unregistered server/clients. This change refactors the stats subsystem so it uses atomics: - Now counters are handled using U64 atomics - Event system is dropped and averages are calculated using a loop every 15 seconds. - Now, instead of snapshots being generated every second we keep track of servers/clients that have registered. Each pool/server/client has its own instance of the counter and makes changes directly, instead of adding an event that gets processed later. * Manually implement Hash/Eq in `config::Address` ignoring stats * Add tests for client connection counters * Allow connecting to dockerized dev pgcat from the host * stats: Decrease cl_idle when idle socket disconnects	2023-03-28 17:19:37 +02:00
Mostafa Abdelraouf	aa89e357e0	PgCat Query Mirroring (#341 ) This is an implementation of Query mirroring in PgCat (outlined here #302) In configs, we match mirror hosts with the servers handling the traffic. A mirror host will receive the same protocol messages as the main server it was matched with. This is done by creating an async task for each mirror server, it communicates with the main server through two channels, one for the protocol messages and one for the exit signal. The mirror server sends the protocol packets to the underlying PostgreSQL server. We receive from the underlying PostgreSQL server as soon as the data is available and we immediately discard it. We use bb8 to manage the life cycle of the connection, not for pooling since each mirror server handler is more or less single-threaded. We don't have any connection pooling in the mirrors. Matching each mirror connection to an actual server connection guarantees that we will not have more connections to any of the mirrors than the parent pool would allow.	2023-03-10 06:23:51 -06:00
Mostafa Abdelraouf	2cc6a09fba	Add Manual host banning to PgCat (#340 ) Sometimes we want an admin to be able to ban a host for some time to route traffic away from that host for reasons like partial outages, replication lag, and scheduled maintenance. We can achieve this today using a configuration update but a quicker approach is to send a control command to PgCat that bans the replica for some specified duration. This command does not change the current banning rules, like: - Primaries cannot be banned - When all replicas are banned, all replicas are unbanned	2023-03-06 06:10:59 -06:00
Mostafa Abdelraouf	75a7d4409a	Fix Back-and-forth RELOAD Bug (#330 ) We identified a bug where RELOAD fails to update the pools. To reproduce you need to start at some config state, modify that state a bit, reload, revert the configs back to the original state, and reload. The last reload will fail to update the pool because PgCat "thinks" the pool state didn't change. This is because we use a HashSet to keep track of config hashes but we never remove values from it. Say we start with State A, we modify pool configs to State B and reload. Now the POOL_HASHES struct has State A and State B. Attempting to go back to State A will encounter a hashset hit which is interpreted by PgCat as "Configs are the same, no need to reload pools" We fix this by attaching a config_hash value to ConnectionPool object and we calculate that value when we create the pool. This eliminates the need for a global variable. One shortcoming here is that changing any config under one user in the pool will trigger a reload for the entire pool (which is fine I think)	2023-02-21 21:53:10 -06:00
John Meagher	d5f60b1720	Allow shard setting with comments (#293 ) What: Allows shard selection by the client to come in via comments like /* shard_id: 1 */ select * from foo; Why: We're using a setup in Ruby that makes it tough or impossible to inject commands on the connection to set the shard before it gets to the "real" SQL being run. Instead we have an updated PG adapter that allows injection of comments before each executed SQL statement. We need this support in pgcat in order to keep some complex shard picking logic in Ruby code while using pgcat for connection management. Local Testing: Run postgres and pgcat with the default options. Run psql < tests/sharding/query_routing_setup.sql to set up the database for the tests and run ./tests/pgbench/external_shard_test.sh as often as needed to exercise the shard setting comment test.	2023-02-15 15:19:16 -06:00
Lev Kokotov	24e79dcf05	Startup improvements & PAUSE/RESUME (#300 ) * Dont require servers to be online to start pooler * PAUSE/RESUME * fix * Refresh pool * Fixes * lint	2023-01-28 15:36:35 -08:00
zainkabani	a0e740d30f	Refactors is_banned logic and forces health check on unban (#288 ) * Refactors is_banned logic and forces healthcheck on unban * typo * Make is banned log debug * addressing comments * Comment	2023-01-19 17:36:48 -08:00
Mostafa Abdelraouf	7894bba59b	Introduce least-outstanding-connections load balancing (#282 ) Least outstanding connections load balancing can improve the load distribution between instances but for Pgcat it may also improve handling slow replicas that don't go completely down. With LoC, traffic will quickly move away from the slow replica without waiting for the replica to be banned. If all replicas slow down equally (due to a bad query that is hitting all replicas), the algorithm will degenerate to Random Load Balancing (which is what we had in Pgcat until today). This may also allow Pgcat to accommodate pools with differently-sized replicas.	2023-01-17 06:52:18 -06:00
Jose Fernández	99247f7c88	Allow setting `idle_timeout` for server connections. (#257 ) In postgres, you can specify an `idle_session_timeout` which will close sessions idling for that amount of time. If a session is closed because of a timeout, PgCat will erroneously mark the server as unhealthy, as the next health check will return an error because the connection was dropped; if no health check is to be executed, it will simply fail trying to send the query to the server for the same reason, the connection was dropped. Given that bb8 allows configuring an idle_timeout for pools, it would be nice to allow setting this parameter in the config file, so that you can set it to something shorter than the server one. Also, the server pool will be kept smaller in moments of less traffic. Currently this value is left at its bb8 default, which is 10 minutes. This change allows setting the parameter using the config file. It can be set both globally and per pool. When creating the pool, if the pool doesn't have it defined, the global value is used.	2022-12-16 08:01:00 -08:00
zainkabani	0c96156dae	Adds health check setting to pool and avoids get_config in hotpath (#235 ) * Adds healthcheck settings to pool * fmt * Fix test	2022-11-16 18:51:15 -08:00
Cluas	dfa26ec6f8	chore: make clippy lint happy (#225 ) * chore: make clippy happy * chore: cargo fmt * chore: cargo fmt	2022-11-09 10:04:31 -08:00
Lev Kokotov	0524787d31	Automatic sharding: part one of many (#194 ) Starting automatic sharding	2022-10-25 11:47:41 -07:00
Mostafa Abdelraouf	83fd639918	A bit faster get_pool (#187 ) * A bit faster get_pool * fmt	2022-10-08 08:16:04 -07:00
Mostafa Abdelraouf	3d33ccf4b0	Fix maxwait metric (#183 ) Max wait was being reported as 0 after #159 This PR fixes that and adds test	2022-10-05 21:41:09 -05:00
Lev Kokotov	7987c5ffad	Replace a few types with more developer-friendly names (#182 ) * Replace a few types with more developer-friendly names * UserPool -> PoolIdentifier	2022-10-01 10:25:59 -07:00
zainkabani	24f5eec3ea	Change sharding config to enum and move validation of configs into public functions (#178 ) Moves config validation to own functions to enable tools to use them Moves sharding config to enum Makes defaults public Make connect_timeout on pool and option which is overwritten by general connect_timeout	2022-09-28 08:50:14 -05:00
Lev Kokotov	19fd677891	Fix the pool fix (#176 ) * Always listen to the compiler * Its fine	2022-09-23 12:06:07 -07:00
Lev Kokotov	964a5e1708	Don't drop connections if DB hasn't changed (#175 ) * Don't drop connections if DB hasn't changed * Incorporate connect_timeout into the pool config * use the field	2022-09-23 11:32:05 -07:00
zainkabani	f72dac420b	Add defaults for configs (#174 ) * add statement timeout to readme * Add defaults to various configs * primary read enabled default to false	2022-09-22 23:00:46 -07:00
zainkabani	3a729bb75b	Minor refactor for configs (#172 ) * Changes shard struct to use vector of ServerConfig * Adds to query router * Change client disconnect with error message to warn instead of debug * Add warning logs for clean up actions	2022-09-22 10:07:02 -07:00
zainkabani	85cc2f4147	Update to latest library versions (#170 )	2022-09-21 13:48:33 -07:00
Mostafa Abdelraouf	4ae1bc8d32	Add SHOW CLIENTS / SHOW SERVERS + Stats refactor and tests (#159 ) * wip * Main Thread Panic when swarmed with clients * fix * fix * 1024 * fix * remove test * Add SHOW CLIENTS * revert * fmt * Refactor + tests * fmt * add test * Add SHOW SERVERS + Make PR unreviewable * prometheus * add state to clients and servers * fmt * Add application_name to server stats * Add tests for waiting clients * Docs * remove comment * comments * typo * cleanup * CI	2022-09-14 11:20:41 -04:00
Mostafa Abdelraouf	36339bd96f	Log Address information in connection create/drop (#154 ) * Log Address information in connection create/drop * run ci	2022-09-01 11:16:22 -07:00
Lev Kokotov	9d84d6f131	Graceful shutdown and refactor (#144 ) * Graceful shutdown and refactor * ok * _Graceful_ shutdown * Remove hardcoded setting * clean up * end * timeout * hmm * hmm! * bash * bash * hmm * maybe maybe * Adds tests and move non-admin connection rejection to startup (#145) * Move error response * Adds tests and removes unused variable * Adds debug log Co-authored-by: zainkabani <77307340+zainkabani@users.noreply.github.com>	2022-08-25 06:40:56 -07:00
Lev Kokotov	069d76029f	Fix incorrect routing for replicas (#139 ) * Fix incorrect routing for replicas * name	2022-08-21 22:40:49 -07:00
Mostafa Abdelraouf	5f5b5e2543	Random instance selection (#136 ) * wip * revert some' * revert more * poor-man's integration test * remove test * fmt * --workspace * fix build * fix integration test * another stab * log * run after integration * cargo test after integration * revert * revert more * Refactor + clean up * more clean up	2022-08-21 22:15:20 -07:00
zainkabani	5948fef6cf	Minor Refactoring of re-used code and server stat reporting (#129 ) * Minor changes to stats reporting and reduce re-used code * fmt	2022-08-18 05:12:38 -07:00
Mostafa Abdelraouf	790898c20e	Add pool name and username to address object (#128 ) * Add pool name and username to address object * Fix address name * fmt	2022-08-17 08:40:47 -07:00
Lev Kokotov	be254cedd9	Fix debug log (#120 )	2022-08-11 22:47:47 -07:00
zainkabani	f963b12821	Health check delay (#118 ) * initial commit of server check delay implementation * fmt * spelling * Update name to last_healthcheck and some comments * Moved server tested stat to after require_healthcheck check * Make health check delay configurable * Rename to last_activity * Fix typo * Add debug log for healthcheck * Add address to debug log	2022-08-11 14:42:40 -07:00
Mostafa Abdelraouf	48cff1f955	Slightly more light weight health check (#100 )	2022-07-29 11:58:25 -07:00
Mostafa Abdelraouf	2ae4b438e3	Add support for multi-database / multi-user pools (#96 ) * Add support for multi-database / multi-user pools * Nothing * cargo fmt * CI * remove test users * rename pool * Update tests to use admin user/pass * more fixes * Revert bad change * Use PGDATABASE env var * send server info in case of admin	2022-07-27 19:47:55 -07:00
Lev Kokotov	b93303eb83	Live reloading entire config and bug fixes (#84 ) * Support reloading the entire config (including sharding logic) without restart. * Fix bug incorrectly handling error reporting when the shard is set incorrectly via SET SHARD TO command. selected wrong shard and the connection keep reporting fatal #80. * Fix total_received and avg_recv admin database statistics. * Enabling the query parser by default. * More tests.	2022-06-24 14:52:38 -07:00
Lev Kokotov	fe32b5ef17	Reduce traffic on the stats channel (#69 )	2022-05-17 13:05:25 -07:00
Lev Kokotov	54699222f8	Possible fix for clients waiting stat leak (#68 )	2022-05-14 21:35:33 -07:00
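
The back-and-forth RELOAD bug described in 75a7d4409a (#330) can be illustrated with a minimal sketch. This is not pgcat's code: `PoolConfig`, `SeenHashes`, `Pool`, and `hash_of` are hypothetical names chosen for illustration, and the only assumption taken from the commit message is that the old approach kept a never-pruned set of config hashes while the fix stores one hash per pool.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a pool's configuration.
#[derive(Hash, Clone, Debug)]
struct PoolConfig {
    pool_size: u32,
    pool_mode: String,
}

/// Hash a config with the standard library's default hasher.
fn hash_of(config: &PoolConfig) -> u64 {
    let mut hasher = DefaultHasher::new();
    config.hash(&mut hasher);
    hasher.finish()
}

/// Old approach (as described in the commit message): a global set of
/// every config hash ever seen, from which nothing is removed.
struct SeenHashes {
    seen: HashSet<u64>,
}

impl SeenHashes {
    /// Returns true if the pool would be rebuilt.
    fn reload(&mut self, config: &PoolConfig) -> bool {
        // `insert` returns false when the hash is already in the set,
        // which is read as "configs are the same, nothing to do".
        self.seen.insert(hash_of(config))
    }
}

/// Fixed approach: each pool remembers only the hash it was built from.
struct Pool {
    config_hash: u64,
}

impl Pool {
    fn new(config: &PoolConfig) -> Self {
        Pool { config_hash: hash_of(config) }
    }

    /// Returns true if the pool would be rebuilt.
    fn reload(&mut self, config: &PoolConfig) -> bool {
        let new_hash = hash_of(config);
        if new_hash == self.config_hash {
            return false;
        }
        self.config_hash = new_hash;
        true
    }
}

fn main() {
    let state_a = PoolConfig { pool_size: 10, pool_mode: "transaction".to_string() };
    let state_b = PoolConfig { pool_size: 20, ..state_a.clone() };

    let mut buggy = SeenHashes { seen: HashSet::new() };
    buggy.reload(&state_a); // initial load
    buggy.reload(&state_b); // A -> B
    println!("old approach, B -> A rebuilt: {}", buggy.reload(&state_a));

    let mut pool = Pool::new(&state_a);
    pool.reload(&state_b); // A -> B
    println!("per-pool hash, B -> A rebuilt: {}", pool.reload(&state_a));
}
```

With the global set, returning to State A hits a stale entry and the reload is skipped. With a per-pool hash, only the currently running config is compared, so the revert is detected.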

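
The comment-based shard selection described in d5f60b1720 (#293) can be sketched as a small parser. This is only an illustration under stated assumptions: `shard_from_comment` is a hypothetical helper, it only looks at a comment at the very start of the query, and pgcat's actual matching rules may differ.

```rust
/// Hypothetical helper: read the shard number from a leading
/// `/* shard_id: N */` comment. Returns None if the query does not
/// start with such a comment or N is not a non-negative integer.
fn shard_from_comment(query: &str) -> Option<usize> {
    // Only a comment at the start of the query is considered (assumption).
    let rest = query.trim_start().strip_prefix("/*")?;
    let end = rest.find("*/")?;
    let body = rest[..end].trim();
    let value = body.strip_prefix("shard_id:")?;
    value.trim().parse().ok()
}

fn main() {
    let query = "/* shard_id: 1 */ select * from foo;";
    match shard_from_comment(query) {
        Some(shard) => println!("route to shard {}", shard),
        None => println!("no shard comment, use default routing"),
    }
}
```

Queries without the comment return `None`, so a caller could fall back to whatever routing it already uses.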

97 Commits