Commit Graph

231 Commits

Author SHA1 Message Date
Lev Kokotov
389993bf3e Accurate log messages (#425) 2023-05-05 08:27:19 -07:00
Lev Kokotov
ba5243b6dd Optionally validate config on boot (#423) 2023-05-03 17:07:23 -07:00
Lev Kokotov
128ef72911 lowercase config query (#422)
* lowercase config query

* remove debug
2023-05-03 16:47:20 -07:00
Lev Kokotov
811885f464 Actually plugins (#421)
* more plugins

* clean up

* fix tests

* fix flakey test
2023-05-03 16:13:45 -07:00
Lev Kokotov
09e54e1175 Plugins! (#420)
* Some queries

* Plugins!!

* cleanup

* actual names

* the actual plugins

* comment

* fix tests

* Tests

* unused errors

* Increase reaper rate to actually enforce settings

* ok
2023-05-03 09:13:05 -07:00
Jose Fernández
7dfbd993f2 Add dns_cache for server addresses as in pgbouncer (#249)
* Add dns_cache so server addresses are cached and invalidated when DNS changes.

Adds a module to deal with dns_cache feature. It's
main struct is CachedResolver, which is a simple thread safe
hostname <-> Ips cache with the ability to refresh resolutions
every `dns_max_ttl` seconds. This way, a client can check whether its
ip address has changed.

* Allow reloading dns cached

* Add documentation for dns_cached
2023-05-02 10:26:40 +02:00
Lev Kokotov
0d504032b2 Server TLS (#417)
* Server TLS

* Finish up TLS

* thats it

* diff

* remove dead code

* maybe?

* dirty shutdown

* skip flakey test

* remove unused error

* fetch config once
2023-04-30 09:41:46 -07:00
Lev Kokotov
4a87b4807d Add more pool settings (#416)
* Add some pool settings

* fmt
2023-04-26 16:33:26 -07:00
Shawn
cb5ff40a59 fix typo (#415)
chore: typo
2023-04-26 08:28:54 -07:00
Lev Kokotov
3dae3d0777 Separate server and client passwords optionally (#407)
* Separate server and user passwords

* config
2023-04-18 09:57:17 -07:00
Cluas
bae12fca99 feat: set keepalive for pgcat server itself (#402)
* feat: set keepalive for pgcat server self

* docs: note also set for client
2023-04-12 09:29:43 -07:00
Lev Kokotov
421c5d4b64 Load config on client connect (#401) 2023-04-11 10:32:48 -07:00
Kian-Meng Ang
d568739db9 Fix typos (#398)
Found via `typos --format brief`
2023-04-10 18:37:16 -07:00
Lev Kokotov
692353c839 A couple things (#397)
* Format cleanup

* fmt

* finally
2023-04-10 14:51:01 -07:00
Lev Kokotov
a62f6b0eea Fix port; add user pool mode (#395)
* Fix port; add user pool mode

* will probably break our session/transaction mode tests
2023-04-05 15:06:19 -07:00
Jose Fernández
6f768a84ce Auth passthrough (auth_query) (#266)
* Add a new exec_simple_query method

This adds a new `exec_simple_query` method so we can make 'out of band'
queries to servers that don't interfere with pools at all.
In order to reuse startup code for making these simple queries,
we need to set the stats (`Reporter`) optional, so using these
simple queries wont interfere with stats.

* Add auth passthough (auth_query)

Adds a feature that allows setting auth passthrough for md5 auth.

It adds 3 new (general and pool) config parameters:

- `auth_query`: An string containing a query that will be executed on boot
to obtain the hash of a given user. This query have to use a placeholder `$1`,
so pgcat can replace it with the user its trying to fetch the hash from.
- `auth_query_user`: The user to use for connecting to the server and executing the
auth_query.
- `auth_query_password`: The password to use for connecting to the server and executing the
auth_query.

The configuration can be done either on the general config (so pools share them) or in a per-pool basis.

The behavior is, at boot time, when validating server connections, a hash is fetched per server
and stored in the pool. When new server connections are created, and no cleartext password is specified,
the obtained hash is used for creating them, if the hash could not be obtained for whatever reason, it retries
it.

When client authentication is tried, it uses cleartext passwords if specified, it not, it checks whether
we have query_auth set up, if so, it tries to use the obtained hash for making client auth. If there is no
hash (we could not obtain one when validating the connection), a new fetch is tried.

Once we have a hash, we authenticate using it against whathever the client has sent us, if there is a failure
we refetch the hash and retry auth (so password changes can be done).

The idea with this 'retrial' mechanism is to make it fault tolerant, so if for whatever reason hash could not be
obtained during connection validation, or the password has change, we can still connect later.

* Add documentation for Auth passthrough
2023-03-30 13:29:23 -07:00
Jose Fernández
58ce76d9b9 Refactor stats to use atomics (#375)
* Refactor stats to use atomics

When we are dealing with a high number of connections, generated
stats cannot be consumed fast enough by the stats collector loop.
This makes the stats subsystem inconsistent and a log of
warning messages are thrown due to unregistered server/clients.

This change refactors the stats subsystem so it uses atomics:

- Now counters are handled using U64 atomics
- Event system is dropped and averages are calculated using a loop
  every 15 seconds.
- Now, instead of snapshots being generated ever second we keep track of servers/clients
  that have registered. Each pool/server/client has its own instance of the counter and
  makes changes directly, instead of adding an event that gets processed later.

* Manually mplement Hash/Eq in `config::Address` ignoring stats

* Add tests for client connection counters

* Allow connecting to dockerized dev pgcat from the host

* stats: Decrease cl_idle when idle socket disconnects
2023-03-28 17:19:37 +02:00
Zain Kabani
ca4431b67e Add idle client in transaction configuration (#380)
* Add idle client in transaction configuration

* fmt

* Update docs

* trigger build

* Add tests

* Make the config dynamic from reloads

* fmt

* comments

* trigger build

* fix config.md

* remove error
2023-03-24 08:20:30 -07:00
Mostafa Abdelraouf
d66b377a8e Check Slice bounds in read_message to avoid panics (#371)
When recv is called in the mirroring client, we noticed an occasional panic when reading the message.

thread 'tokio-runtime-worker' panicked at 'slice index starts at 5 but ends at 0', src/messages.rs:522:18
We are still debugging the reason why this happens but adding a check for slice bounds seems like a good idea. Instead of panicking, this will return an Err to the caller which will close the connection.
2023-03-17 12:31:43 -05:00
Mostafa Abdelraouf
e5df179ac9 Reduce memory and CPU footprint of mirroring (#369)
The experimental mirroring feature used a lot of memory and CPU when put under production traffic. This change attempts to reduce memory and CPU usage.

Memory footprint is reduced by making the channel smaller. CPU usage is reduced by avoiding allocations if the channel is full or is closed.

We might lose more messages this way if the mirror falls behind but that is more acceptable than crashing the entire process when it goes out-of-memory (OOM)
2023-03-15 17:58:45 -05:00
Lev Kokotov
b4baa86e8a Extended query protocol sharding (#339)
* Prepared stmt sharding

s

tests

* len check

* remove python test

* latest rust

* move that to debug for sure

* Add the actual tests

* latest image

* Update tests/ruby/sharding_spec.rb
2023-03-10 07:55:22 -08:00
Mostafa Abdelraouf
76e195a8a4 Reorder fields in Shard to avoid ValueAfterTable errors (#349) 2023-03-10 07:39:42 -06:00
Mostafa Abdelraouf
aa89e357e0 PgCat Query Mirroring (#341)
This is an implementation of Query mirroring in PgCat (outlined here #302)

In configs, we match mirror hosts with the servers handling the traffic. A mirror host will receive the same protocol messages as the main server it was matched with.

This is done by creating an async task for each mirror server, it communicates with the main server through two channels, one for the protocol messages and one for the exit signal. The mirror server sends the protocol packets to the underlying PostgreSQL server. We receive from the underlying PostgreSQL server as soon as the data is available and we immediately discard it. We use bb8 to manage the life cycle of the connection, not for pooling since each mirror server handler is more or less single-threaded.

We don't have any connection pooling in the mirrors. Matching each mirror connection to an actual server connection guarantees that we will not have more connections to any of the mirrors than the parent pool would allow.
2023-03-10 06:23:51 -06:00
Mostafa Abdelraouf
2cc6a09fba Add Manual host banning to PgCat (#340)
Sometimes we want an admin to be able to ban a host for some time to route traffic away from that host for reasons like partial outages, replication lag, and scheduled maintenance.

We can achieve this today using a configuration update but a quicker approach is to send a control command to PgCat that bans the replica for some specified duration.

This command does not change the current banning rules like

Primaries cannot be banned
When all replicas are banned, all replicas are unbanned
2023-03-06 06:10:59 -06:00
Lev Kokotov
c3eaf023c7 Automatic sharding for SELECT v2 (#337)
* More comprehensive read sharding support

* A few fixes

* fq

* comment

* wildcard
2023-03-02 00:53:31 -05:00
Jose Fernández
9241df18e2 Allow sending logs to stdout by using STDOUT_LOG env var (#334)
* Allow sending logs to stdout by using STDOUT_LOG env var

* Increase stats buffer size
2023-02-28 13:10:40 -08:00
zainkabani
eb8cfdb1f1 Adds SHUTDOWN command as alternate option to sending SIGINT (#331)
* Adds SHUTDOWN command to PgCat as alternate option to sending SIGINT

* Check if we're already in SHUTDOWN sequence

* Send signal directly from shutdown instead of using channel

* Add tests

* trigger build

* Lowercase response and boolean change

* Update tests

* Fix tests

* typo
2023-02-26 22:16:30 -08:00
Mostafa Abdelraouf
75a7d4409a Fix Back-and-forth RELOAD Bug (#330)
We identified a bug where RELOAD fails to update the pools.

To reproduce you need to start at some config state, modify that state a bit, reload, revert the configs back to the original state, and reload. The last reload will fail to update the pool because PgCat "thinks" the pool state didn't change.

This is because we use a HashSet to keep track of config hashes but we never remove values from it.
Say we start with State A, we modify pool configs to State B and reload. Now the POOL_HASHES struct has State A and State B. Attempting to go back to State A will encounter a hashset hit which is interpreted by PgCat as "Configs are the same, no need to reload pools"

We fix this by attaching a config_hash value to ConnectionPool object and we calculate that value when we create the pool. This eliminates the need for a global variable. One shortcoming here is that changing any config under one user in the pool will trigger a reload for the entire pool (which is fine I think)
2023-02-21 21:53:10 -06:00
Nicholas Dujay
37e1c5297a implement show users (#329)
* implement show users

* fix compile errors

* add basic ruby test

* gitignore things
2023-02-21 13:08:43 -08:00
zainkabani
2b05ff4ee5 Log worker thread count at startup (#322) 2023-02-16 16:51:38 -06:00
John Meagher
d5f60b1720 Allow shard setting with comments (#293)
What
Allows shard selection by the client to come in via comments like /* shard_id: 1 */ select * from foo;

Why
We're using a setup in Ruby that makes it tough or impossible to inject commands on the connection to set the shard before it gets to the "real" SQL being run. Instead we have an updated PG adapter that allows injection of comments before each executed SQL statement. We need this support in pgcat in order to keep some complex shard picking logic in Ruby code while using pgcat for connection management.

Local Testing
Run postgres and pgcat with the default options. Run psql < tests/sharding/query_routing_setup.sql to setup the database for the tests and run ./tests/pgbench/external_shard_test.sh as often as needed to exercise the shard setting comment test.
2023-02-15 15:19:16 -06:00
Mostafa Abdelraouf
97f5a0564d Fix deprecation warnings (#319)
warning: use of deprecated function `base64::decode`: Use Engine::decode
2023-02-14 16:20:11 -06:00
Tommy Chen
9830c18315 Support EC and PKCS8 private keys (#316)
* Support EC and PKCS8 private keys

* Use iter instead of infinite loop in `load_keys` fn
2023-02-14 08:30:47 -08:00
Mostafa Abdelraouf
f1265a5570 Introduce tcp_keepalives to PgCat (#315)
We have encountered a case where PgCat pools were stuck following a database incident. Our best understanding at this point is that the PgCat -> Postgres connections died silently and because Tokio defaults to disabling keepalives, connections in the pool were marked as busy forever. Only when we deployed PgCat did we see recovery.

This PR introduces tcp_keepalives to PgCat. This sets the defaults to be

keepalives_idle: 5        # seconds
keepalives_interval: 5 # seconds
keepalives_count: 5    # a count
These settings can detect the death of an idle connection within 30 seconds of its death. Please note that the connection can remain idle forever (from an application perspective) as long as the keepalive packets are flowing so disconnection will only occur if the other end is not acknowledging keepalive packets (keepalive packet acks are handled by the OS, the application does not need to do anything). I plan to add tcp_user_timeout in a follow-up PR.
2023-02-08 11:35:38 -06:00
zainkabani
d81a744154 Fix logging mistakes (#313)
Mistakenly logging username as poolname and poolname as username
2023-02-07 14:16:28 -06:00
Kurtsley
1c73889fb9 Add initial Windows support, ref #298 (#301) 2023-01-28 15:51:05 -08:00
Lev Kokotov
24e79dcf05 Startup improvements & PAUSE/RESUME (#300)
* Dont require servers to be online to start pooler

* PAUSE/RESUME

* fix

* Refresh pool

* Fixes

* lint
2023-01-28 15:36:35 -08:00
Lev Kokotov
2e3eb2663e Fix formatting (#299) 2023-01-28 09:17:49 -08:00
zainkabani
a0e740d30f Refactors is_banned logic and forces health check on unban (#288)
* Refactors is_banned logic and forces healthcheck on unban

* typo

* Make is banned log debug

* addressing comments

* Comment
2023-01-19 17:36:48 -08:00
Jose Fernández
c58f9557ae Add more metrics to prometheus endpoint (#263)
This change:
- Adds server metrics to prometheus endpoint.
- Adds database metrics to prometheus endpoint.
- Adds pools metrics to prometheus endpoint.
- Change metrics name to have a prefix of (stats|pools|databases|servers).
2023-01-19 07:48:12 -08:00
zainkabani
ca8901910c Removes message cloning operation required for query router (#285)
* Removes message cloning operation required for query router

* fmt

* flakey?

* ?
2023-01-19 07:19:49 -08:00
Mostafa Abdelraouf
87a771aecc Log error messages for network failures (#289)
We are seeing some Error reading message code from socket error messages, we want to get more context so this PR logs the actual error reported.
2023-01-19 05:18:08 -06:00
zainkabani
85ac3ef9a5 Buffer client CopyData messages (#284)
Buffers CopyData messages and removes buffer clone for the sync message
2023-01-17 17:39:55 -08:00
Mostafa Abdelraouf
7894bba59b Introduce least-outstanding-connections load balancing (#282)
Least outstanding connections load balancing can improve the load distribution between instances but for Pgcat it may also improve handling slow replicas that don't go completely down. With LoC, traffic will quickly move away from the slow replica without waiting for the replica to be banned.

If all replicas slow down equally (due to a bad query that is hitting all replicas), the algorithm will degenerate to Random Load Balancing (which is what we had in Pgcat until today).

This may also allow Pgcat to accommodate pools with differently-sized replicas.
2023-01-17 06:52:18 -06:00
zainkabani
ab0bad6da0 Write messages directly onto message buffer instead of allocating on own buffer (#283)
* initial commit

* comment

* fmt
2023-01-16 20:22:06 -08:00
zainkabani
8720ed3826 Buffer copy data messages (#265)
* Buffer copy data messages

* Update comment
2022-12-21 06:57:53 -08:00
Jose Fernández
9e8ef566c6 Allow setting the number of runtime workers to be used. (#258)
This change adds a new configuration parameter called `worker_threads` that
allows setting the number of workers the Tokio Runtime will use. It defaults to
4 to maintain backward compatibility.

Given that the config file parse is done asynchronously, first, a transient runtime
is created for reading config, and once it has been parsed, the actual runtime
that will be used for PgCat execution is created.
2022-12-16 11:13:13 -08:00
Jose Fernández
99247f7c88 Allow setting idle_timeout for server connections. (#257)
In postgres, you can specify an `idle_session_timeout` which will close
sessions idling for that amount of time. If a session is closed because
of a timeout, PgCat will erroneously mark the server as unhealthy as the next
health check will return an error because the connection was drop, if no
health check is to be executed, it will simply fail trying to send the query
to the server for the same reason, the conn was drop.

Given that bb8 allows configuring an idle_timeout for pools, it would be
nice to allow setting this parameter in the config file, this way you can
set it to something shorter than the server one. Also, server pool will be kept
smaller in moments of less traffic. Actually, currently this value is set as its
default in bb8, which is 10 minutes.

This changes allows setting the parameter using the config file. It can be set both
globally and per pool. When creating the pool, if the pool don't have it defined, global
value is used.
2022-12-16 08:01:00 -08:00
zainkabani
c62b86f4e6 Adds details to errors and fixes error propagation bug (#239) 2022-11-17 09:24:39 -08:00
zainkabani
fcd2cae4e1 Move get_config in startup to admin branch to scope down usage (#238) 2022-11-17 09:22:12 -08:00