Commit Graph

58 Commits

Author SHA1 Message Date
Mostafa Abdelraouf
75a7d4409a Fix Back-and-forth RELOAD Bug (#330)
We identified a bug where RELOAD fails to update the pools.

To reproduce you need to start at some config state, modify that state a bit, reload, revert the configs back to the original state, and reload. The last reload will fail to update the pool because PgCat "thinks" the pool state didn't change.

This is because we use a HashSet to keep track of config hashes but we never remove values from it.
Say we start with State A, we modify pool configs to State B and reload. Now the POOL_HASHES struct has State A and State B. Attempting to go back to State A will encounter a hashset hit which is interpreted by PgCat as "Configs are the same, no need to reload pools"

We fix this by attaching a config_hash value to ConnectionPool object and we calculate that value when we create the pool. This eliminates the need for a global variable. One shortcoming here is that changing any config under one user in the pool will trigger a reload for the entire pool (which is fine I think)
2023-02-21 21:53:10 -06:00
zainkabani
2b05ff4ee5 Log worker thread count at startup (#322) 2023-02-16 16:51:38 -06:00
John Meagher
d5f60b1720 Allow shard setting with comments (#293)
What
Allows shard selection by the client to come in via comments like /* shard_id: 1 */ select * from foo;

Why
We're using a setup in Ruby that makes it tough or impossible to inject commands on the connection to set the shard before it gets to the "real" SQL being run. Instead we have an updated PG adapter that allows injection of comments before each executed SQL statement. We need this support in pgcat in order to keep some complex shard picking logic in Ruby code while using pgcat for connection management.

Local Testing
Run postgres and pgcat with the default options. Run psql < tests/sharding/query_routing_setup.sql to setup the database for the tests and run ./tests/pgbench/external_shard_test.sh as often as needed to exercise the shard setting comment test.
2023-02-15 15:19:16 -06:00
Mostafa Abdelraouf
f1265a5570 Introduce tcp_keepalives to PgCat (#315)
We have encountered a case where PgCat pools were stuck following a database incident. Our best understanding at this point is that the PgCat -> Postgres connections died silently and because Tokio defaults to disabling keepalives, connections in the pool were marked as busy forever. Only when we deployed PgCat did we see recovery.

This PR introduces tcp_keepalives to PgCat. This sets the defaults to be

keepalives_idle: 5        # seconds
keepalives_interval: 5 # seconds
keepalives_count: 5    # a count
These settings can detect the death of an idle connection within 30 seconds of its death. Please note that the connection can remain idle forever (from an application perspective) as long as the keepalive packets are flowing so disconnection will only occur if the other end is not acknowledging keepalive packets (keepalive packet acks are handled by the OS, the application does not need to do anything). I plan to add tcp_user_timeout in a follow-up PR.
2023-02-08 11:35:38 -06:00
Mostafa Abdelraouf
7894bba59b Introduce least-outstanding-connections load balancing (#282)
Least outstanding connections load balancing can improve the load distribution between instances but for Pgcat it may also improve handling slow replicas that don't go completely down. With LoC, traffic will quickly move away from the slow replica without waiting for the replica to be banned.

If all replicas slow down equally (due to a bad query that is hitting all replicas), the algorithm will degenerate to Random Load Balancing (which is what we had in Pgcat until today).

This may also allow Pgcat to accommodate pools with differently-sized replicas.
2023-01-17 06:52:18 -06:00
Jose Fernández
9e8ef566c6 Allow setting the number of runtime workers to be used. (#258)
This change adds a new configuration parameter called `worker_threads` that
allows setting the number of workers the Tokio Runtime will use. It defaults to
4 to maintain backward compatibility.

Given that the config file parse is done asynchronously, first, a transient runtime
is created for reading config, and once it has been parsed, the actual runtime
that will be used for PgCat execution is created.
2022-12-16 11:13:13 -08:00
Jose Fernández
99247f7c88 Allow setting idle_timeout for server connections. (#257)
In postgres, you can specify an `idle_session_timeout` which will close
sessions idling for that amount of time. If a session is closed because
of a timeout, PgCat will erroneously mark the server as unhealthy as the next
health check will return an error because the connection was drop, if no
health check is to be executed, it will simply fail trying to send the query
to the server for the same reason, the conn was drop.

Given that bb8 allows configuring an idle_timeout for pools, it would be
nice to allow setting this parameter in the config file, this way you can
set it to something shorter than the server one. Also, server pool will be kept
smaller in moments of less traffic. Actually, currently this value is set as its
default in bb8, which is 10 minutes.

This changes allows setting the parameter using the config file. It can be set both
globally and per pool. When creating the pool, if the pool don't have it defined, global
value is used.
2022-12-16 08:01:00 -08:00
zainkabani
fe0b012832 Adds configuration for logging connections and removes get_config from entrypoint (#236)
* Adds configuration for logging connections and removes get_config from entrypoint

* typo

* rename connection config var and add to toml files

* update config log

* fmt
2022-11-16 22:15:47 -08:00
Cluas
dfa26ec6f8 chore: make clippy lint happy (#225)
* chore: make clippy happy

* chore: cargo fmt

* chore: cargo fmt
2022-11-09 10:04:31 -08:00
Lev Kokotov
0524787d31 Automatic sharding: part one of many (#194)
Starting automatic sharding
2022-10-25 11:47:41 -07:00
zainkabani
24f5eec3ea Change sharding config to enum and move validation of configs into public functions (#178)
Moves config validation to own functions to enable tools to use them
Moves sharding config to enum
Makes defaults public
Make connect_timeout on pool and option which is overwritten by general connect_timeout
2022-09-28 08:50:14 -05:00
Lev Kokotov
19fd677891 Fix the pool fix (#176)
* Always listen to the compiler

* Its fine
2022-09-23 12:06:07 -07:00
Lev Kokotov
964a5e1708 Don't drop connections if DB hasn't changed (#175)
* Don't drop connections if DB hasn't changed

* Incoporate connect_timeout into the pool config

* use the field
2022-09-23 11:32:05 -07:00
zainkabani
f72dac420b Add defaults for configs (#174)
* add statement timeout to readme

* Add defaults to various configs

* primary read enabled default to false
2022-09-22 23:00:46 -07:00
zainkabani
3a729bb75b Minor refactor for configs (#172)
* Changes shard struct to use vector of ServerConfig

* Adds to query router

* Change client disconnect with error message to warn instead of debug

* Add warning logs for clean up actions
2022-09-22 10:07:02 -07:00
zainkabani
8c09ab6c20 Export pgcat objects in lib (#169)
* Export pgcat objects in lib

* fmt
2022-09-20 18:47:32 -07:00
Lev Kokotov
9d84d6f131 Graceful shutdown and refactor (#144)
* Graceful shutdown and refactor

* ok

* _Graceful_ shutdown

* Remove hardcoded setting

* clean up

* end

* timeout

* hmm

* hmm!

* bash

* bash

* hmm

* maybe maybe

* Adds tests and move non-admin connection rejection to startup (#145)

* Move error response

* Adds tests and removes unused variable

* Adds debug log

Co-authored-by: zainkabani <77307340+zainkabani@users.noreply.github.com>
2022-08-25 06:40:56 -07:00
Lev Kokotov
069d76029f Fix incorrect routing for replicas (#139)
* Fix incorrect routing for replicas

* name
2022-08-21 22:40:49 -07:00
Mostafa Abdelraouf
5f5b5e2543 Random instance selection (#136)
* wip

* revert some'

* revert more

* poor-man's integration test

* remove test

* fmt

* --workspace

* fix build

* fix integration test

* another stab

* log

* run after integration

* cargo test after integration

* revert

* revert more

* Refactor + clean up

* more clean up
2022-08-21 22:15:20 -07:00
Mostafa Abdelraouf
790898c20e Add pool name and username to address object (#128)
* Add pool name and username to address object

* Fix address name

* fmt
2022-08-17 08:40:47 -07:00
Lev Kokotov
3285006440 Statement timeout + replica imbalance fix (#122)
* Statement timeout

* send error message too

* Correct error messages

* Fix replica inbalance

* disable stmt timeout by default

* Redundant mark_bad

* revert healthcheck delay

* tests

* set it to 0

* reload config again
2022-08-13 13:45:58 -07:00
Pradeep Chhetri
52303cc808 Make prometheus port configurable (#121)
* Make prometheus port configurable

* Update circleci config
2022-08-13 10:25:14 -07:00
zainkabani
f963b12821 Health check delay (#118)
* initial commit of server check delay implementation

* fmt

* spelling

* Update name to last_healthcheck and some comments

* Moved server tested stat to after require_healthcheck check

* Make health check delay configurable

* Rename to last_activity

* Fix typo

* Add debug log for healthcheck

* Add address to debug log
2022-08-11 14:42:40 -07:00
Nicholas Dujay
1b166b462d create a prometheus exporter on a standard http port (#107)
* create a hyper server and add option to enable it in config

* move prometheus stuff to its own file; update format

* create metric type and help lookup table

* finish the metric help type map

* switch to a boolean and a standard port

* dont emit unimplemented metrics

* fail if curl returns a non 200

* resolve conflicts

* move log out of config.show and into main

* terminating new line

* upgrade curl

* include unimplemented stats
2022-08-09 12:19:11 -07:00
zainkabani
3719c22322 Implementing graceful shutdown (#105)
* Initial commit for graceful shutdown

* fmt

* Add .vscode to gitignore

* Updates shutdown logic to use channels

* fmt

* fmt

* Adds shutdown timeout

* Fmt and updates tomls

* Updates readme

* fmt and updates log levels

* Update python tests to test shutdown

* merge changes

* Rename listener rx and update bash to be in line with master

* Update python test bash script ordering

* Adds error response message before shutdown

* Add details on shutdown event loop

* Fixes response length for error

* Adds handler for sigterm

* Uses ready for query function and fixes number of bytes

* fmt
2022-08-08 16:01:24 -07:00
Mostafa Abdelraouf
106ebee71c Fix local dev (#112)
* Fix Dev env

* Update tests/sharding/query_routing_setup.sql

* Update tests/sharding/query_routing_setup.sql

* bring pgcat.toml on ci and local dev to parity

* more parity

* pool names

* pool names

* less diff

* fix tests

* fmt

* add other user to setup

Co-authored-by: Lev Kokotov <levkk@users.noreply.github.com>
2022-08-08 13:15:48 -07:00
Mostafa Abdelraouf
35381ba8fd Add test for config Serializer (#102)
* Add test for serializer

* fmt
2022-07-30 16:28:25 -07:00
Mostafa Abdelraouf
e591865d78 Avoid ValueAfterTable when serializing configs (#101) 2022-07-30 16:12:02 -07:00
Mostafa Abdelraouf
8a06fc4047 Add Serialize trait to configs (#99) 2022-07-28 15:42:04 -07:00
Mostafa Abdelraouf
2ae4b438e3 Add support for multi-database / multi-user pools (#96)
* Add support for multi-database / multi-user pools

* Nothing

* cargo fmt

* CI

* remove test users

* rename pool

* Update tests to use admin user/pass

* more fixes

* Revert bad change

* Use PGDATABASE env var

* send server info in case of admin
2022-07-27 19:47:55 -07:00
Lev
186f8be5b3 lint 2022-06-27 17:01:40 -07:00
Lev
7667fefead config check 2022-06-27 17:01:14 -07:00
Lev Kokotov
b974aacd71 check 2022-06-27 09:46:33 -07:00
Lev Kokotov
5bcd3bf9c3 Automatically reload config every seconds (disabled by default) (#86)
* Automatically reload config every seconds (disabld by default)

* add that
2022-06-25 11:46:20 -07:00
Lev Kokotov
b93303eb83 Live reloading entire config and bug fixes (#84)
* Support reloading the entire config (including sharding logic) without restart.

* Fix bug incorrectly handing error reporting when the shard is set incorrectly via SET SHARD TO command.
selected wrong shard and the connection keep reporting fatal #80.

* Fix total_received and avg_recv admin database statistics.

* Enabling the query parser by default.

* More tests.
2022-06-24 14:52:38 -07:00
Lev Kokotov
df85139281 Update README. Comments. Version bump. (#60)
* update readme

* comments

* just a version bump
2022-03-10 01:33:29 -08:00
Lev Kokotov
b309ead58f Handle SIGTERM. Add docker-compose.yml (#59)
* docker-compsoe

* remove statsd config

* readme
2022-03-08 17:18:48 -08:00
Lev Kokotov
35828a0a8c Per-shard statistics (#57)
* per shard stats

* aight

* cleaner

* fix show lists

* comments

* more friendly

* case-insensitive

* test all shards

* ok

* HUH?
2022-03-04 17:04:27 -08:00
Lev Kokotov
d4186b7815 More admin (#53)
* more admin

* more admin

* show lists

* tests
2022-03-01 22:49:43 -08:00
Lev Kokotov
b21e0f4a7e admin SHOW DATABASES (#51)
* admin SHOW DATABASES

* test

* correct replica count
2022-02-28 17:22:28 -08:00
Lev Kokotov
eb1473060e admin: SHOW CONFIG (#50)
* admin: SHOW CONFIG

* test
2022-02-28 08:14:39 -08:00
Lev Kokotov
26f75f8d5d admin RELOAD (#49)
* admin RELOAD

* test
2022-02-27 10:21:24 -08:00
Lev Kokotov
b3c8ca4b8a Another example of a sharding function (#41)
* Another example of a sharding function

* tests
2022-02-23 11:50:34 -08:00
Lev Kokotov
a6fc935040 Can pass config path as argument (#36)
* show better errors for config parsing

* lint
2022-02-21 20:41:32 -08:00
Lev Kokotov
44b5e7eeee use logger lib; minor refactor; sv_* stats (#29) 2022-02-20 22:47:08 -08:00
Lev Kokotov
d4c1fc87ee Reloadable config (#26)
* Reloadable config

* readme

* live config reload

* test matrix
2022-02-19 13:57:35 -08:00
Lev Kokotov
aa796289bf Query parser 3.0 (#23)
* Starting query parsing

* Query parser

* working config

* disable by default

* fix tsets

* introducing log crate; test for query router; comments

* typo

* fixes for banning

* added test for prepared stmt
2022-02-18 07:10:18 -08:00
Lev Kokotov
05b4cccb97 More statistics (#20)
* cleaner stats

* remove shard selection there
2022-02-15 08:18:01 -08:00
Lev Kokotov
526b9eb666 Pass real server info to the client (#10) 2022-02-11 22:19:49 -08:00
Lev Kokotov
0d369ab90a add default server role; bug fix 2022-02-11 11:19:40 -08:00