Commit Graph

306 Commits

Author SHA1 Message Date
Jose Fernández
9241df18e2 Allow sending logs to stdout by using STDOUT_LOG env var (#334)
* Allow sending logs to stdout by using STDOUT_LOG env var

* Increase stats buffer size
2023-02-28 13:10:40 -08:00
zainkabani
eb8cfdb1f1 Adds SHUTDOWN command as alternate option to sending SIGINT (#331)
* Adds SHUTDOWN command to PgCat as alternate option to sending SIGINT

* Check if we're already in SHUTDOWN sequence

* Send signal directly from shutdown instead of using channel

* Add tests

* trigger build

* Lowercase response and boolean change

* Update tests

* Fix tests

* typo
2023-02-26 22:16:30 -08:00
Mostafa Abdelraouf
75a7d4409a Fix Back-and-forth RELOAD Bug (#330)
We identified a bug where RELOAD fails to update the pools.

To reproduce you need to start at some config state, modify that state a bit, reload, revert the configs back to the original state, and reload. The last reload will fail to update the pool because PgCat "thinks" the pool state didn't change.

This is because we use a HashSet to keep track of config hashes but we never remove values from it.
Say we start with State A, we modify pool configs to State B and reload. Now the POOL_HASHES struct has State A and State B. Attempting to go back to State A will encounter a hashset hit which is interpreted by PgCat as "Configs are the same, no need to reload pools"

We fix this by attaching a config_hash value to ConnectionPool object and we calculate that value when we create the pool. This eliminates the need for a global variable. One shortcoming here is that changing any config under one user in the pool will trigger a reload for the entire pool (which is fine I think)
2023-02-21 21:53:10 -06:00
Nicholas Dujay
37e1c5297a implement show users (#329)
* implement show users

* fix compile errors

* add basic ruby test

* gitignore things
2023-02-21 13:08:43 -08:00
zainkabani
2b05ff4ee5 Log worker thread count at startup (#322) 2023-02-16 16:51:38 -06:00
John Meagher
d5f60b1720 Allow shard setting with comments (#293)
What
Allows shard selection by the client to come in via comments like /* shard_id: 1 */ select * from foo;

Why
We're using a setup in Ruby that makes it tough or impossible to inject commands on the connection to set the shard before it gets to the "real" SQL being run. Instead we have an updated PG adapter that allows injection of comments before each executed SQL statement. We need this support in pgcat in order to keep some complex shard picking logic in Ruby code while using pgcat for connection management.

Local Testing
Run postgres and pgcat with the default options. Run psql < tests/sharding/query_routing_setup.sql to setup the database for the tests and run ./tests/pgbench/external_shard_test.sh as often as needed to exercise the shard setting comment test.
2023-02-15 15:19:16 -06:00
Mostafa Abdelraouf
97f5a0564d Fix deprecation warnings (#319)
warning: use of deprecated function `base64::decode`: Use Engine::decode
2023-02-14 16:20:11 -06:00
Tommy Chen
9830c18315 Support EC and PKCS8 private keys (#316)
* Support EC and PKCS8 private keys

* Use iter instead of infinite loop in `load_keys` fn
2023-02-14 08:30:47 -08:00
Mostafa Abdelraouf
f1265a5570 Introduce tcp_keepalives to PgCat (#315)
We have encountered a case where PgCat pools were stuck following a database incident. Our best understanding at this point is that the PgCat -> Postgres connections died silently and because Tokio defaults to disabling keepalives, connections in the pool were marked as busy forever. Only when we deployed PgCat did we see recovery.

This PR introduces tcp_keepalives to PgCat. This sets the defaults to be

keepalives_idle: 5        # seconds
keepalives_interval: 5 # seconds
keepalives_count: 5    # a count
These settings can detect the death of an idle connection within 30 seconds of its death. Please note that the connection can remain idle forever (from an application perspective) as long as the keepalive packets are flowing so disconnection will only occur if the other end is not acknowledging keepalive packets (keepalive packet acks are handled by the OS, the application does not need to do anything). I plan to add tcp_user_timeout in a follow-up PR.
2023-02-08 11:35:38 -06:00
zainkabani
d81a744154 Fix logging mistakes (#313)
Mistakenly logging username as poolname and poolname as username
2023-02-07 14:16:28 -06:00
Kurtsley
1c73889fb9 Add initial Windows support, ref #298 (#301) 2023-01-28 15:51:05 -08:00
Lev Kokotov
24e79dcf05 Startup improvements & PAUSE/RESUME (#300)
* Dont require servers to be online to start pooler

* PAUSE/RESUME

* fix

* Refresh pool

* Fixes

* lint
2023-01-28 15:36:35 -08:00
Lev Kokotov
2e3eb2663e Fix formatting (#299) 2023-01-28 09:17:49 -08:00
zainkabani
a0e740d30f Refactors is_banned logic and forces health check on unban (#288)
* Refactors is_banned logic and forces healthcheck on unban

* typo

* Make is banned log debug

* addressing comments

* Comment
2023-01-19 17:36:48 -08:00
Jose Fernández
c58f9557ae Add more metrics to prometheus endpoint (#263)
This change:
- Adds server metrics to prometheus endpoint.
- Adds database metrics to prometheus endpoint.
- Adds pools metrics to prometheus endpoint.
- Change metrics name to have a prefix of (stats|pools|databases|servers).
2023-01-19 07:48:12 -08:00
zainkabani
ca8901910c Removes message cloning operation required for query router (#285)
* Removes message cloning operation required for query router

* fmt

* flakey?

* ?
2023-01-19 07:19:49 -08:00
Mostafa Abdelraouf
87a771aecc Log error messages for network failures (#289)
We are seeing some Error reading message code from socket error messages, we want to get more context so this PR logs the actual error reported.
2023-01-19 05:18:08 -06:00
zainkabani
85ac3ef9a5 Buffer client CopyData messages (#284)
Buffers CopyData messages and removes buffer clone for the sync message
2023-01-17 17:39:55 -08:00
Mostafa Abdelraouf
7894bba59b Introduce least-outstanding-connections load balancing (#282)
Least outstanding connections load balancing can improve the load distribution between instances but for Pgcat it may also improve handling slow replicas that don't go completely down. With LoC, traffic will quickly move away from the slow replica without waiting for the replica to be banned.

If all replicas slow down equally (due to a bad query that is hitting all replicas), the algorithm will degenerate to Random Load Balancing (which is what we had in Pgcat until today).

This may also allow Pgcat to accommodate pools with differently-sized replicas.
2023-01-17 06:52:18 -06:00
zainkabani
ab0bad6da0 Write messages directly onto message buffer instead of allocating on own buffer (#283)
* initial commit

* comment

* fmt
2023-01-16 20:22:06 -08:00
zainkabani
8720ed3826 Buffer copy data messages (#265)
* Buffer copy data messages

* Update comment
2022-12-21 06:57:53 -08:00
Jose Fernández
9e8ef566c6 Allow setting the number of runtime workers to be used. (#258)
This change adds a new configuration parameter called `worker_threads` that
allows setting the number of workers the Tokio Runtime will use. It defaults to
4 to maintain backward compatibility.

Given that the config file parse is done asynchronously, first, a transient runtime
is created for reading config, and once it has been parsed, the actual runtime
that will be used for PgCat execution is created.
2022-12-16 11:13:13 -08:00
Jose Fernández
99247f7c88 Allow setting idle_timeout for server connections. (#257)
In postgres, you can specify an `idle_session_timeout` which will close
sessions idling for that amount of time. If a session is closed because
of a timeout, PgCat will erroneously mark the server as unhealthy as the next
health check will return an error because the connection was drop, if no
health check is to be executed, it will simply fail trying to send the query
to the server for the same reason, the conn was drop.

Given that bb8 allows configuring an idle_timeout for pools, it would be
nice to allow setting this parameter in the config file, this way you can
set it to something shorter than the server one. Also, server pool will be kept
smaller in moments of less traffic. Actually, currently this value is set as its
default in bb8, which is 10 minutes.

This changes allows setting the parameter using the config file. It can be set both
globally and per pool. When creating the pool, if the pool don't have it defined, global
value is used.
2022-12-16 08:01:00 -08:00
zainkabani
c62b86f4e6 Adds details to errors and fixes error propagation bug (#239) 2022-11-17 09:24:39 -08:00
zainkabani
fcd2cae4e1 Move get_config in startup to admin branch to scope down usage (#238) 2022-11-17 09:22:12 -08:00
zainkabani
5145b20e02 Move ClientBadStartup error log to debug (#237) 2022-11-16 22:16:16 -08:00
zainkabani
fe0b012832 Adds configuration for logging connections and removes get_config from entrypoint (#236)
* Adds configuration for logging connections and removes get_config from entrypoint

* typo

* rename connection config var and add to toml files

* update config log

* fmt
2022-11-16 22:15:47 -08:00
zainkabani
0c96156dae Adds health check setting to pool and avoids get_config in hotpath (#235)
* Adds healthcheck settings to pool

* fmt

* Fix test
2022-11-16 18:51:15 -08:00
zainkabani
b7e70b885c Default to using username when database isn't present on startup (#234) 2022-11-16 18:49:04 -08:00
Cluas
dfa26ec6f8 chore: make clippy lint happy (#225)
* chore: make clippy happy

* chore: cargo fmt

* chore: cargo fmt
2022-11-09 10:04:31 -08:00
Pradeep Chhetri
63d4431046 Fix for warnings about avg_errors not implemented (#220) 2022-11-02 08:11:47 -07:00
Lev Kokotov
9fe8d5e76f Dont change shard unless you know (#195) 2022-10-26 00:14:08 -07:00
Lev Kokotov
0524787d31 Automatic sharding: part one of many (#194)
Starting automatic sharding
2022-10-25 11:47:41 -07:00
Lev Kokotov
dea952e4ca Re-enable query parser and parse multiple statements (#191)
* Re-enable query parser and parse multiple statements

* no diff
2022-10-23 16:59:51 -07:00
zainkabani
19f635881a Don't send discard all when state is changed in transaction (#186)
* Don't send discard all when state is changed in transaction

* Remove unnecessary clone

* spelling

* Move transaction check to SET command

* Add test for set command in transaction

* type

* Update comments

* Update comments

* use moves instead of clones for initial message

* don't make message mutable

* Update unwrap

* but i'm not a wrapper

* Add set local test

* change continue
2022-10-13 19:33:12 -07:00
Mostafa Abdelraouf
eceb7f092e Use Jemalloc (#189)
Jemalloc performs better than the standard allocator in various metrics (http://ithare.com/testing-memory-allocators-ptmalloc2-tcmalloc-hoard-jemalloc-while-trying-to-simulate-real-world-loads/).

This PR makes changes to use Jemalloc as the global allocator for Pgcat. Windows is not officially supported by Pgcat but it should still compile but without Jemalloc as the allocator.
2022-10-13 11:13:45 -05:00
Mostafa Abdelraouf
83fd639918 A bit faster get_pool (#187)
* A bit faster get_pool

* fmt
2022-10-08 08:16:04 -07:00
Mostafa Abdelraouf
3d33ccf4b0 Fix maxwait metric (#183)
Max wait was being reported as 0 after #159

This PR fixes that and adds test
2022-10-05 21:41:09 -05:00
Lev Kokotov
7987c5ffad Replace a few types with more developer-friendly names (#182)
* Replace a few types with more developer-friendly names

* UserPool -> PoolIdentifier
2022-10-01 10:25:59 -07:00
zainkabani
24f5eec3ea Change sharding config to enum and move validation of configs into public functions (#178)
Moves config validation to own functions to enable tools to use them
Moves sharding config to enum
Makes defaults public
Make connect_timeout on pool and option which is overwritten by general connect_timeout
2022-09-28 08:50:14 -05:00
Mostafa Abdelraouf
af064ef447 Set client state to idle after error (#179)
* Set client state to idle after error

* fmt

* spelling

* clean up
2022-09-24 09:09:15 -07:00
Lev Kokotov
19fd677891 Fix the pool fix (#176)
* Always listen to the compiler

* Its fine
2022-09-23 12:06:07 -07:00
Lev Kokotov
964a5e1708 Don't drop connections if DB hasn't changed (#175)
* Don't drop connections if DB hasn't changed

* Incoporate connect_timeout into the pool config

* use the field
2022-09-23 11:32:05 -07:00
Mostafa Abdelraouf
d126c7424d Log failed client logins (#173)
* Log failed client logins

* more logging

* remove clones

* remove
2022-09-23 09:08:38 -07:00
zainkabani
f72dac420b Add defaults for configs (#174)
* add statement timeout to readme

* Add defaults to various configs

* primary read enabled default to false
2022-09-22 23:00:46 -07:00
zainkabani
3a729bb75b Minor refactor for configs (#172)
* Changes shard struct to use vector of ServerConfig

* Adds to query router

* Change client disconnect with error message to warn instead of debug

* Add warning logs for clean up actions
2022-09-22 10:07:02 -07:00
zainkabani
85cc2f4147 Update to latest library versions (#170) 2022-09-21 13:48:33 -07:00
zainkabani
8c09ab6c20 Export pgcat objects in lib (#169)
* Export pgcat objects in lib

* fmt
2022-09-20 18:47:32 -07:00
Mostafa Abdelraouf
f7a951745c Report Query times (#166)
* Report avg and total query timing

* Report query times

* fmt
2022-09-15 02:21:45 -04:00
Mostafa Abdelraouf
4ae1bc8d32 Add SHOW CLIENTS / SHOW SERVERS + Stats refactor and tests (#159)
* wip

* Main Thread Panic when swarmed with clients

* fix

* fix

* 1024

* fix

* remove test

* Add SHOW CLIENTS

* revert

* fmt

* Refactor + tests

* fmt

* add test

* Add SHOW SERVERS + Make PR unreviewable

* prometheus

* add state to clients and servers

* fmt

* Add application_name to server stats

* Add tests for waiting clients

* Docs

* remove comment

* comments

* typo

* cleanup

* CI
2022-09-14 11:20:41 -04:00