pgcat

mirror of https://github.com/postgresml/pgcat.git synced 2026-07-16 17:39:06 +00:00

Author	SHA1	Message	Date
Mostafa	3349cecc18	Add checkout_failure_limit config/feature (#911 ) In a high availability deployment of PgCat, it is possible that a client may land on a container of PgCat that is very busy with clients, and as such the new client might be perpetually stuck in a checkout failure loop because all connections are used by other clients. This is especially true in session mode pools with long-lived client connections (e.g. FDW connections). One way to fix this issue is to close client connections after they encounter some number of checkout failures. This will force the client to hit the network load balancer again, land on a different process/container, and try to check out a connection on the new process/container. If it fails, it is disconnected and tries with another one. This mechanism is guaranteed to eventually land on a balanced state where all clients are able to find connections, provided that the overall number of connections across all containers matches the number of clients. I was able to reproduce this issue in a controlled environment and was able to show this PR is able to fix it.	2025-02-27 13:17:00 -06:00
Alex Kesling	f8e2fcd0ed	s/Iniitalize/Initialize/ (#897 )	2025-01-09 11:59:42 -08:00
Nadav Shatz	3202f5685b	Add DB activity based routing (#864 )	2024-12-22 05:23:57 -06:00
Gabriel Simmer	b37d105184	chore(deps): bump sqlparser from 0.41.0 to 0.52.0 (#870 ) * chore(deps): bump sqlparser from 0.41.0 to 0.52.0 Bumps [sqlparser](https://github.com/apache/datafusion-sqlparser-rs) from 0.41.0 to 0.52.0. - [Changelog](https://github.com/apache/datafusion-sqlparser-rs/blob/main/CHANGELOG.md) - [Commits](https://github.com/apache/datafusion-sqlparser-rs/commits) --- updated-dependencies: - dependency-name: sqlparser dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <support@github.com> * bump * Update to latest sqlparser version --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Mostafa <mostafa.mohmmed@gmail.com>	2024-11-23 07:25:37 -06:00
Jose Fernández	c11418c083	Revert "Do not unban replicas if a primary is available" (#850 ) Revert "Do not unban replicas if a primary is available (#843)" This reverts commit `cdcfa99fb9`.	2024-11-07 22:00:43 +01:00
Jose Fernández	c9544bdff2	Fix `default_role` being ignored when `query_parser_enabled` was false (#847 ) Fix default_role being ignored when query_parser_enabled was false	2024-11-07 11:11:49 -06:00
Jose Fernández	cdcfa99fb9	Do not unban replicas if a primary is available (#843 ) Add `unban_replicas_when_all_banned` to control unbanning replicas behavior.	2024-11-07 11:11:11 -06:00
Mostafa	c27d801abf	Rename a couple of variables (#839 )	2024-10-23 06:38:07 -05:00
Javier Goday	186e72298f	#829 : read/write splitting on CTE mutable statements (#835 )	2024-10-23 06:20:04 -05:00
Sebastian Serth	3935366d86	End Prometheus stats with a new line separator (#826 ) End prometheus stats with a new line separator According to the [OpenMetrics specification](https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md#overall-structure), each line MUST end with `\n`. Previously, the last line was not ending with `\n`, so that strict parsers had issues reading the Prometheus stats.	2024-09-22 17:14:04 -05:00
Andrew Jackson	670311daf9	Implement Trust Authentication (#805 ) * Implement Trust Authentication * Remove remaining LDAP stuff * Reverted LDAP changes, Cleaned up tests --------- Co-authored-by: Andrew Jackson <andrewjackson2988@gmail.com> Co-authored-by: CommanderKeynes <andrewjackson947@gmail.coma>	2024-09-10 09:29:45 -05:00
Mostafa Abdelraouf	f0865ca616	Improve Prometheus exporter output (#795 ) * Prometheus metrics updates: * Add username label to deconflict metrics that would otherwise have duplicate labels across different pools. * Group metrics by name and only print HELP and TYPE once per metric name. * Sort labels for a deterministic output. --------- Co-authored-by: Curtis Myzie <curtis.myzie@gmail.com> Co-authored-by: Towhid Khan	2024-09-05 08:58:18 -05:00
Mostafa Abdelraouf	2def40ea6a	Add test case for issue 776 (#786 ) I am adding a tiny test that uses the SQL statement that was reported to break an older version of SQL parser library #776	2024-08-31 10:52:33 -05:00
Mostafa Abdelraouf	c05129018d	Improve Prometheus stats + Add Grafana dashboard (#785 ) We were missing some labels on metrics generated by the Prometheus exporter so I fixed that. There are still some gaps that I want to address with respect to the metrics we track but this seems like a good start. I also created a Grafana Dashboard and exported it to JSON. It is designed with the same metric names the Prometheus exporter uses.	2024-08-31 08:18:57 -05:00
Mostafa Abdelraouf	29a476e190	QueryRouter: route to primary when locks exists (select for update) (#782 ) Authored-by: Javier Goday <jgoday@gmail.com>	2024-08-30 04:26:36 -05:00
Saraj Munjal	7cbc9178d8	Bump the hyper crate to v1.4.1 and rework prometheus server handling (#778 ) Bump hyper to v1.4.1 and rework prometheus server handling	2024-08-29 09:47:58 -05:00
Mostafa Abdelraouf	8f9a2b8e6f	Fix a Panic in admin commands (#779 ) We have a panic when we send SHOW or ;;;;;;;;;;;;;;;;; to admin database. This PR fixes these panics and adds a couple of tests	2024-08-28 21:29:40 -05:00
brandonpike	cbf4d58144	Fix lint warnings for rust-1.79 (#769 ) 2 things that are recommended by rust-lang - implementing `std::fmt::Display` rather than ToString (1) and using clone_from (2). [1] https://rust-lang.github.io/rust-clippy/master/index.html#/to_string_trait_impl [2] https://rust-lang.github.io/rust-clippy/master/index.html#assigning_clones Signed-off-by: Brandon Pike <pikebrandon@att.net>	2024-07-15 20:30:26 -07:00
Andrey Stikheev	0b034a6831	Add TCP_NODELAY option to improve performance for large response queries (#749 ) This commit adds the TCP_NODELAY option to the socket configuration in `configure_socket` function. Without this option, we observed significant performance issues when executing SELECT queries with large responses. Before the fix: postgres=> SELECT repeat('a', 1); SELECT repeat('a', 8153); Time: 1.368 ms Time: 41.364 ms After the fix: postgres=> SELECT repeat('a', 1); SELECT repeat('a', 8153); Time: 1.332 ms Time: 1.528 ms By setting TCP_NODELAY, we eliminate the Nagle's algorithm delay, which results in a substantial improvement in response times for large queries. This problem was discussed in https://github.com/postgresml/pgcat/issues/616.	2024-05-26 14:47:21 -07:00
Mostafa Abdelraouf	966b8e093c	Report checkout error when all servers are down (#736 ) We shouldn't report checkout_success when we are going to return Error.	2024-05-08 12:18:27 -05:00
Toby Hede	0d94d0b90a	Update sqlparser to 0.41 (#666 )	2024-04-12 22:12:37 -07:00
Mostafa Abdelraouf	e1e4929d43	Report waiting time only for currently waiting clients (#678 ) The pool maxwait metric currently operates differently from Pgbouncer. The way it operates today is that we keep track of max_wait on each connected client, when SHOW POOLS query is made, we go over the connected clients and we get the max of max_wait times among clients. This means the pool maxwait will never reset, it will always be monotonically increasing until the client with the highest maxwait disconnects. This PR changes this behavior, by keeping track of the wait_start time on each client, when a client goes into WAITING state, we record the time offset from connect_time. When we either successfully or unsuccessfully checkout a connection from the pool, we reset the wait_start time. When SHOW POOLS query is made, we go over all connected clients and we only consider clients whose wait_start is non-zero, for clients that have non-zero wait times, we compare them and report the maximum waiting time as maxwait for the pool.	2024-01-18 11:57:28 -06:00
Lev Kokotov	dc4d6edf17	Revert max_wait changes (#658 ) * Revert "Reset wait times when checked out successfully (#656)" This reverts commit `ec3920d60f`. * Revert "Not sure how this sneaked past CI" This reverts commit `4c5498b915`. * Revert "only report wait times from clients currently waiting to match behavior of pgbouncer (#655)" This reverts commit `0e8064b049`.	2023-12-05 01:47:38 -08:00
Lev Kokotov	ec3920d60f	Reset wait times when checked out successfully (#656 )	2023-12-04 18:33:08 -08:00
Lev	4c5498b915	Not sure how this sneaked past CI	2023-12-04 18:30:03 -08:00
Daniel Babiak	0e8064b049	only report wait times from clients currently waiting to match behavior of pgbouncer (#655 ) * Change maxwait to only report wait times from clients currently waiting to match behavior of pgbouncer * Fix tests	2023-12-04 18:19:51 -08:00
Alec	4dbef49ec9	Require a reason when marking a server bad (#654 ) When calling mark_bad require a reason so it can be logged rather than the generic message	2023-12-04 16:09:41 -08:00
Lev Kokotov	e76d720ffb	Dont cache prepared statement with errors (#647 ) * Fix prepared statement not found when prepared stmt has error * cleanup debug * remove more debug msgs * sure debugged this.. * version bump * add rust tests	2023-11-28 21:13:30 -08:00
Calvin Hughes	998cc16a3c	Expose clients maxwait time in SHOW CLIENTS response via admin (#639 ) * Expose clients maxwait time in SHOW CLIENTS response via PgCat admin Displays the maxwait via maxwait_seconds and maxwait_us columns for each client that can be used to track down the wait time per client in a case where the overall pool stats shows waiting time. The maxwait_us, similar to the pool stats setup, is configured to display as a remainder alongside the maxwait_seconds. * Use maxwait instead of maxwait_seconds to match pools column name --------- Co-authored-by: Calvin Hughes <9379992+calvinhughes@users.noreply.github.com>	2023-11-13 11:24:39 -08:00
Jakob Schultz-Falk	7c37da2fad	Support unnamed prepared statements (#635 ) * Add golang test suite to reproduce issue with unnamed parameterized prepared statements * Allow caching of unnamed prepared statements * Passthrough describe on portals * Remove unneeded kill * Update Dockerfile.ci with golang * Move out update of Dockerfiles to separate PR	2023-11-08 16:36:45 -08:00
Lev Kokotov	dae240d30c	Add connet_timeout and idle_timeout to the user (#634 ) * Add connect_timeout to the user * Allow user to override connect timeout * version * lock * Add both timeouts to the user	2023-11-06 12:18:52 -08:00
Zain Kabani	7d3003a16a	Reimplement prepared statements with LRU cache and statement deduplication (#618 ) * Initial commit * Cleanup and add stats * Use an arc instead of full clones to store the parse packets * Use mutex instead * fmt * clippy * fmt * fix? * fix? * fmt * typo * Update docs * Refactor custom protocol * fmt * move custom protocol handling to before parsing * Support describe * Add LRU for server side statement cache * rename variable * Refactoring * Move docs * Fix test * fix * Update tests * trigger build * Add more tests * Reorder handling sync * Support when a named describe is sent along with Parse (go pgx) and expecting results * don't talk to client if not needed when client sends Parse * fmt :( * refactor tests * nit * Reduce hashing * Reducing work done to decode describe and parse messages * minor refactor * Merge branch 'main' into zain/reimplment-prepared-statements-with-global-lru-cache * Rewrite extended and prepared protocol message handling to better support mocking response packets and close * An attempt to better handle if there are DDL changes that might break cached plans with ideas about how to further improve it * fix * Minor stats fixed and cleanup * Cosmetic fixes (#64) * Cosmetic fixes * fix test * Change server drop for statement cache error to a `deallocate all` * Updated comments and added new idea for handling DDL changes impacting cached plans * fix test? * Revert test change * trigger build, flakey test * Avoid potential race conditions by changing get_or_insert to promote for pool LRU * remove ps enabled variable on the server in favor of using an option * Add close to the Extended Protocol buffer --------- Co-authored-by: Lev Kokotov <levkk@users.noreply.github.com>	2023-10-25 15:11:57 -07:00
Zain Kabani	d37df43a90	Reduces the amount of time the get_pool operation takes (#625 ) * Reduces the amount of time the get_pool operation takes * trigger build * Fix admin	2023-10-19 23:49:05 -07:00
Mohammad Dashti	2c7bf52c17	Removed unnecessary `clippy` overrides. (#614 ) Removed unnecessary clippy overrides.	2023-10-11 10:13:23 -07:00
Mohammad Dashti	de8df29ca4	Added `clippy` to CI and fixed all `clippy` warnings (#613 ) * Fixed all clippy warnings. * Added `clippy` to CI. * Reverted an unwanted change + Applied `cargo fmt`. * Fixed the idiom version. * Revert "Fixed the idiom version." This reverts commit `6f78be0d42`. * Fixed clippy issues on CI. * Revert "Fixed clippy issues on CI." This reverts commit `a9fa6ba189`. * Revert "Reverted an unwanted change + Applied `cargo fmt`." This reverts commit `6bd37b6479`. * Revert "Fixed all clippy warnings." This reverts commit `d1f3b847e3`. * Removed Clippy * Removed Lint * `admin.rs` clippy fixes. * Applied more clippy changes. * Even more clippy changes. * `client.rs` clippy fixes. * `server.rs` clippy fixes. * Revert "Removed Lint" This reverts commit `cb5042b144`. * Revert "Removed Clippy" This reverts commit `6dec8bffb1`. * Applied lint. * Revert "Revert "Fixed clippy issues on CI."" This reverts commit `49164a733c`.	2023-10-10 09:18:21 -07:00
Mohammad Dashti	3371c01e0e	Added a `Plugin` trait (#536 ) * Improved logging * Improved logging for more `Address` usages * Fixed lint issues. * Reverted the `Address` logging changes. * Applied the PR comment by @levkk. * Applied the PR comment by @levkk. * Applied the PR comment by @levkk. * Applied the PR comment by @levkk.	2023-10-03 13:13:21 -07:00
Mohammad Dashti	c2a483f36a	Automatic sharding for INSERT, UPDATE, and DELETE statements. (#610 ) Added support for INSERT, UPDATE, and DELETE for auto-sharding.	2023-10-03 09:36:13 -07:00
Kevin Elliott	04e9814770	Fix incorrect data output for plugin query_logger (#601 ) Update query_logger.rs Pool and user were incorrectly swapped and needed to be fixed.	2023-09-25 18:45:51 -07:00
Lev Kokotov	037d232fcd	Mark admin clients as disconnected on error (#597 )	2023-09-21 15:55:22 -07:00
Lev Kokotov	b2933762e7	Report maxwait for clients that end up not getting a connection (#596 )	2023-09-21 14:50:18 -07:00
Mohammad Dashti	7f5639c94a	Include `thread_id` in the logs (#592 ) Include `thread_id` in the logs.	2023-09-20 09:11:16 -07:00
Lev Kokotov	c0112f6f12	Revert "User-friendly error messages" (#587 ) Revert "User-friendly error messages (#586)" This reverts commit `b7ceee2ddf`.	2023-09-11 16:39:31 -07:00
Lev Kokotov	b7ceee2ddf	User-friendly error messages (#586 )	2023-09-11 16:39:11 -07:00
Mostafa Abdelraouf	0b01d70b55	Allow configuring routing decision when no shard is selected (#578 ) The TL;DR for the change is that we allow QueryRouter to set the active shard to None. This signals to the Pool::get method that we have no shard selected. The get method follows a no_shard_specified_behavior config to know how to route the query. Original PR description: Ruby-pg library makes a startup query to SET client_encoding to ... if Encoding.default_internal value is set (Code). This query is troublesome because we cannot possibly attach a routing comment to it. PgCat, by default, will route that query to the default shard. Everything is fine until shard 0 has issues: clients will all be attempting to send this query to shard 0, which increases the connection latency significantly for all clients, even those not interested in shard 0. This PR introduces no_shard_specified_behavior, which defines the behavior in case we have routing-by-comment enabled but we get a query without a comment. The allowed behaviors are: random: picks a shard at random; random_healthy: picks a shard at random, favoring shards with the least number of recent connection/checkout errors; shard_<number> (e.g. shard_0, shard_4, etc.): picks a specific shard every time. In order to achieve this, this PR introduces an error_count on the Address object that tracks the number of errors since the last checkout and uses that metric to sort shards by error count before making a routing decision. I didn't want to use address stats, to avoid introducing a routing dependency on internal stats (we might do that in the future, but I prefer to avoid this for the time being). I also made changes to the test environment to replace Ruby's TOML reader library. It appears to be abandoned and does not support mixed arrays (which we use in the config TOML), and it also does not play nicely with single-quoted regular expressions. I opted for using yj, which is a CLI tool that can convert from TOML to JSON and back. So I refactored the tests to use that library.	2023-09-11 13:47:28 -05:00
hellower	33db0dffa8	stream.peer_addr() & auth_query (#575 ) * Don't unwrap stream.peer_addr() https://github.com/postgresml/pgcat/pull/562 (same code) (another lines changed) * auth_query (real sample) # single quote need auth_query="SELECT usename, passwd FROM pg_shadow WHERE usename='$1'"	2023-08-31 14:11:38 -07:00
Tommy Li	9937193332	Allow pause/resuming all pools (#566 ) support pausing all pools	2023-08-29 10:07:36 -07:00
Zain Kabani	ffe820497f	Don't unwrap stream.peer_addr() (#562 )	2023-08-25 10:33:39 -07:00
Zain Kabani	be549f3faa	Fixes try_execute_command message parsing bug (#560 ) * Fixes try_execute_command message parsing bug * Fix initial segment logic * Add test	2023-08-24 11:25:43 -07:00
Zain Kabani	3255323bff	Adds option to log which parameter status is changed by the client (#550 )	2023-08-16 11:01:21 -07:00
Zain Kabani	bb27586758	Reset instead of discard all (#549 ) * Use reset all instead of discard all * Move 'X' handling to before admin handle * fix tests	2023-08-16 10:08:48 -07:00
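
The maxwait rule described in commit e1e4929d43 (#678) can be illustrated with a minimal sketch. This is not PgCat's actual code: the `ClientWait` struct, its field names, and the `pool_maxwait` function are hypothetical, and the sketch assumes each client stores `wait_start` as an offset from its connect time, with zero meaning "not currently waiting".

```rust
use std::time::Duration;

/// Hypothetical, simplified per-client record (not PgCat's real type).
#[derive(Debug, Clone, Copy)]
pub struct ClientWait {
    /// Time elapsed since this client connected, measured at report time.
    pub since_connect: Duration,
    /// Offset from connect time at which the client entered the WAITING
    /// state. Zero means the client is not waiting (the offset is reset
    /// after a successful or failed checkout).
    pub wait_start: Duration,
}

/// Pool-level maxwait: the longest current wait among clients that are
/// waiting right now. Clients with a zero `wait_start` are ignored, so
/// the value drops back to zero once nobody is waiting.
pub fn pool_maxwait(clients: &[ClientWait]) -> Duration {
    clients
        .iter()
        .filter(|c| !c.wait_start.is_zero())
        .map(|c| c.since_connect.saturating_sub(c.wait_start))
        .max()
        .unwrap_or(Duration::ZERO)
}
```

Because only currently waiting clients contribute, the reported value is no longer monotonically increasing for the lifetime of the slowest client, which is the behavior change the commit message describes.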

317 Commits