Commit Graph

48 Commits

Author SHA1 Message Date
Zain Kabani
e14b283f0c Make infer role configurable and fix double parse bug (#533)
* Make infer role configurable and fix double parse bug

* Fix tests

* Enable infer_role_from query in toml for tests

* Fix test

* Add max length config, add logging for which application is failing to parse, and change config name

* fmt

* Update src/config.rs

---------

Co-authored-by: Lev Kokotov <levkk@users.noreply.github.com>
2023-08-08 13:10:03 -07:00
Mostafa Abdelraouf
2a8f3653a6 Fix COPY FROM and add tests (#522)
* Fix COPY FROM and add tests

* E

* fmt
2023-07-20 23:06:01 -07:00
Lev Kokotov
4b78af9676 Implement Close for prepared statements (#482)
* Partial support for Close

* Close

* respect config value

* prepared spec

* Hmm

* Print cache size
2023-06-18 23:02:34 -07:00
Mostafa Abdelraouf
a8a30ad43b Refactor Pool Stats to be based off of Server/Client stats (#445)
What is wrong
Stats reported by SHOW POOLS seem to be leaking. We see lingering cl_idle , cl_waiting, and similarly for sv_idle , sv_active. We confirmed that these are reporting issues not actual lingering clients.

This behavior is readily reproducible by running

while true; do
    psql "postgres://sharding_user:sharding_user@localhost:6432/sharded_db" -c "SELECT 1" > /dev/null 2>&1  &
done

Why it happens
I wasn't able to get to figure our the reason for the bug but my best guess is that we have race conditions when updating pool-level stats. So even though individual update operations are atomic, we perform a check then update sequence which is not protected by a guard.
https://github.com/postgresml/pgcat/blob/main/src/stats/pool.rs#L174-L179

I am also suspecting that using Relaxed ordering might allow this behavior (I changed all operations to use Ordering::SeqCst but still got lingering clients)

How to fix
Since SHOW POOLS/SHOW SERVER/SHOW CLIENTS all show the current state of the proxy (as opposed to SHOW STATS which show aggregate values), this PR refactors SHOW POOLS to have it construct the results directly from SHOW SERVER and SHOW CLIENT datasets. This reduces the complexity of stat updates and eliminates the need for having locks when updating pool stats as we only care about updating individual client/server states.

This will change the semantics of maxwait, so instead of it holding the maxwait time ever encountered by a client (connected or disconnected), it will only consider connected clients which should be okay given PgCat tends to hold on to client connections more than Pgbouncer.
2023-05-23 08:44:49 -05:00
Lev Kokotov
37e3349c24 Optionally clean up server connections (#444)
* Optionally clean up server connections

* move setting to pool

* fix test

* Print setting to screen

* fmt

* Fix pool_settings override in tests
2023-05-18 10:46:55 -07:00
Zain Kabani
7f57a89d75 Fix time based average stats (#442)
* keep track of current stats and zero them after updating averages

* Try tests

* typo

* remove commented test stuff

* Avoid dividing by zero

* Fix test

* refactor, get rid of iterator. do it manually

* trigger build

* Fix
2023-05-17 21:38:10 -07:00
Lev Kokotov
52b1b43850 Prewarmer (#435)
* Prewarmer

* hmm

* Tests

* default

* fix test

* Correct configuration

* Added minimal config example

* remove connect_timeout
2023-05-12 09:50:52 -07:00
Zain Kabani
73260690b0 Fixes average stats bug (#436)
* Add test

* Fix test

* Add fix
2023-05-11 17:37:58 -07:00
Andrew Tanner
159eb89bf0 First try with role reset (#427)
* First try with role rest

* update

* extra line

* Update src/server.rs

Co-authored-by: Lev Kokotov <levkk@users.noreply.github.com>

* Update tests/ruby/misc_spec.rb

Co-authored-by: Lev Kokotov <levkk@users.noreply.github.com>

---------

Co-authored-by: Lev Kokotov <levkk@users.noreply.github.com>
2023-05-05 15:31:27 -07:00
Lev Kokotov
811885f464 Actually plugins (#421)
* more plugins

* clean up

* fix tests

* fix flakey test
2023-05-03 16:13:45 -07:00
Lev Kokotov
09e54e1175 Plugins! (#420)
* Some queries

* Plugins!!

* cleanup

* actual names

* the actual plugins

* comment

* fix tests

* Tests

* unused errors

* Increase reaper rate to actually enforce settings

* ok
2023-05-03 09:13:05 -07:00
Lev Kokotov
0d504032b2 Server TLS (#417)
* Server TLS

* Finish up TLS

* thats it

* diff

* remove dead code

* maybe?

* dirty shutdown

* skip flakey test

* remove unused error

* fetch config once
2023-04-30 09:41:46 -07:00
Kian-Meng Ang
d568739db9 Fix typos (#398)
Found via `typos --format brief`
2023-04-10 18:37:16 -07:00
Mostafa Abdelraouf
7ddd23b514 Protocol-level test helpers (#393)
I needed to have granular control over protocol message testing. For example, being able to send protocol messages one-by-one and then be able to inspect the results.

In order to do that, I created this low-level ruby client that can be used to send protocol messages in any order without blocking and also allows inspection of response messages.
2023-04-01 15:27:57 -05:00
Jose Fernández
6f768a84ce Auth passthrough (auth_query) (#266)
* Add a new exec_simple_query method

This adds a new `exec_simple_query` method so we can make 'out of band'
queries to servers that don't interfere with pools at all.
In order to reuse startup code for making these simple queries,
we need to set the stats (`Reporter`) optional, so using these
simple queries wont interfere with stats.

* Add auth passthough (auth_query)

Adds a feature that allows setting auth passthrough for md5 auth.

It adds 3 new (general and pool) config parameters:

- `auth_query`: An string containing a query that will be executed on boot
to obtain the hash of a given user. This query have to use a placeholder `$1`,
so pgcat can replace it with the user its trying to fetch the hash from.
- `auth_query_user`: The user to use for connecting to the server and executing the
auth_query.
- `auth_query_password`: The password to use for connecting to the server and executing the
auth_query.

The configuration can be done either on the general config (so pools share them) or in a per-pool basis.

The behavior is, at boot time, when validating server connections, a hash is fetched per server
and stored in the pool. When new server connections are created, and no cleartext password is specified,
the obtained hash is used for creating them, if the hash could not be obtained for whatever reason, it retries
it.

When client authentication is tried, it uses cleartext passwords if specified, it not, it checks whether
we have query_auth set up, if so, it tries to use the obtained hash for making client auth. If there is no
hash (we could not obtain one when validating the connection), a new fetch is tried.

Once we have a hash, we authenticate using it against whathever the client has sent us, if there is a failure
we refetch the hash and retry auth (so password changes can be done).

The idea with this 'retrial' mechanism is to make it fault tolerant, so if for whatever reason hash could not be
obtained during connection validation, or the password has change, we can still connect later.

* Add documentation for Auth passthrough
2023-03-30 13:29:23 -07:00
Jose Fernández
58ce76d9b9 Refactor stats to use atomics (#375)
* Refactor stats to use atomics

When we are dealing with a high number of connections, generated
stats cannot be consumed fast enough by the stats collector loop.
This makes the stats subsystem inconsistent and a log of
warning messages are thrown due to unregistered server/clients.

This change refactors the stats subsystem so it uses atomics:

- Now counters are handled using U64 atomics
- Event system is dropped and averages are calculated using a loop
  every 15 seconds.
- Now, instead of snapshots being generated ever second we keep track of servers/clients
  that have registered. Each pool/server/client has its own instance of the counter and
  makes changes directly, instead of adding an event that gets processed later.

* Manually mplement Hash/Eq in `config::Address` ignoring stats

* Add tests for client connection counters

* Allow connecting to dockerized dev pgcat from the host

* stats: Decrease cl_idle when idle socket disconnects
2023-03-28 17:19:37 +02:00
Zain Kabani
ca4431b67e Add idle client in transaction configuration (#380)
* Add idle client in transaction configuration

* fmt

* Update docs

* trigger build

* Add tests

* Make the config dynamic from reloads

* fmt

* comments

* trigger build

* fix config.md

* remove error
2023-03-24 08:20:30 -07:00
Lev Kokotov
b4baa86e8a Extended query protocol sharding (#339)
* Prepared stmt sharding

s

tests

* len check

* remove python test

* latest rust

* move that to debug for sure

* Add the actual tests

* latest image

* Update tests/ruby/sharding_spec.rb
2023-03-10 07:55:22 -08:00
Mostafa Abdelraouf
aa89e357e0 PgCat Query Mirroring (#341)
This is an implementation of Query mirroring in PgCat (outlined here #302)

In configs, we match mirror hosts with the servers handling the traffic. A mirror host will receive the same protocol messages as the main server it was matched with.

This is done by creating an async task for each mirror server, it communicates with the main server through two channels, one for the protocol messages and one for the exit signal. The mirror server sends the protocol packets to the underlying PostgreSQL server. We receive from the underlying PostgreSQL server as soon as the data is available and we immediately discard it. We use bb8 to manage the life cycle of the connection, not for pooling since each mirror server handler is more or less single-threaded.

We don't have any connection pooling in the mirrors. Matching each mirror connection to an actual server connection guarantees that we will not have more connections to any of the mirrors than the parent pool would allow.
2023-03-10 06:23:51 -06:00
Mostafa Abdelraouf
2cc6a09fba Add Manual host banning to PgCat (#340)
Sometimes we want an admin to be able to ban a host for some time to route traffic away from that host for reasons like partial outages, replication lag, and scheduled maintenance.

We can achieve this today using a configuration update but a quicker approach is to send a control command to PgCat that bans the replica for some specified duration.

This command does not change the current banning rules like

Primaries cannot be banned
When all replicas are banned, all replicas are unbanned
2023-03-06 06:10:59 -06:00
Jose Fernández
8a0da10a87 Dev environment (#338)
Add dev env
2023-03-02 12:14:10 -05:00
Mostafa Abdelraouf
75a7d4409a Fix Back-and-forth RELOAD Bug (#330)
We identified a bug where RELOAD fails to update the pools.

To reproduce you need to start at some config state, modify that state a bit, reload, revert the configs back to the original state, and reload. The last reload will fail to update the pool because PgCat "thinks" the pool state didn't change.

This is because we use a HashSet to keep track of config hashes but we never remove values from it.
Say we start with State A, we modify pool configs to State B and reload. Now the POOL_HASHES struct has State A and State B. Attempting to go back to State A will encounter a hashset hit which is interpreted by PgCat as "Configs are the same, no need to reload pools"

We fix this by attaching a config_hash value to ConnectionPool object and we calculate that value when we create the pool. This eliminates the need for a global variable. One shortcoming here is that changing any config under one user in the pool will trigger a reload for the entire pool (which is fine I think)
2023-02-21 21:53:10 -06:00
Nicholas Dujay
37e1c5297a implement show users (#329)
* implement show users

* fix compile errors

* add basic ruby test

* gitignore things
2023-02-21 13:08:43 -08:00
Mostafa Abdelraouf
f9134807d7 More Test coverage + fix some code coverage bugs (#321)
Connection to the CI databases is viewed by Postgres as coming from localhost. The pg_hba.conf file generated by the docker image uses trust for these connections, that's why we had no test coverage on SASL and md5 branches.

This PR fixes this issue. There was also an issue with under-reporting code coverage. This should be fixed now
2023-02-16 23:09:22 -06:00
Mostafa Abdelraouf
bf6efde8cc Fix code coverage + less flakiness (#318)
Code coverage logic was missing coverage from rust tests. This is now fixed.
Also, we weren't reaping spawned PgCat processes correctly which left zombie processes.
2023-02-13 15:29:08 -06:00
Mostafa Abdelraouf
f1265a5570 Introduce tcp_keepalives to PgCat (#315)
We have encountered a case where PgCat pools were stuck following a database incident. Our best understanding at this point is that the PgCat -> Postgres connections died silently and because Tokio defaults to disabling keepalives, connections in the pool were marked as busy forever. Only when we deployed PgCat did we see recovery.

This PR introduces tcp_keepalives to PgCat. This sets the defaults to be

keepalives_idle: 5        # seconds
keepalives_interval: 5 # seconds
keepalives_count: 5    # a count
These settings can detect the death of an idle connection within 30 seconds of its death. Please note that the connection can remain idle forever (from an application perspective) as long as the keepalive packets are flowing so disconnection will only occur if the other end is not acknowledging keepalive packets (keepalive packet acks are handled by the OS, the application does not need to do anything). I plan to add tcp_user_timeout in a follow-up PR.
2023-02-08 11:35:38 -06:00
Mostafa Abdelraouf
87a771aecc Log error messages for network failures (#289)
We are seeing some Error reading message code from socket error messages, we want to get more context so this PR logs the actual error reported.
2023-01-19 05:18:08 -06:00
dependabot[bot]
99a3b9896d chore(deps): bump activerecord from 7.0.3.1 to 7.0.4.1 in /tests/ruby (#287)
Bumps [activerecord](https://github.com/rails/rails) from 7.0.3.1 to 7.0.4.1.
- [Release notes](https://github.com/rails/rails/releases)
- [Changelog](https://github.com/rails/rails/blob/v7.0.4.1/activerecord/CHANGELOG.md)
- [Commits](https://github.com/rails/rails/compare/v7.0.3.1...v7.0.4.1)

---
updated-dependencies:
- dependency-name: activerecord
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-01-18 16:56:16 -08:00
Mostafa Abdelraouf
7894bba59b Introduce least-outstanding-connections load balancing (#282)
Least outstanding connections load balancing can improve the load distribution between instances but for Pgcat it may also improve handling slow replicas that don't go completely down. With LoC, traffic will quickly move away from the slow replica without waiting for the replica to be banned.

If all replicas slow down equally (due to a bad query that is hitting all replicas), the algorithm will degenerate to Random Load Balancing (which is what we had in Pgcat until today).

This may also allow Pgcat to accommodate pools with differently-sized replicas.
2023-01-17 06:52:18 -06:00
zainkabani
19f635881a Don't send discard all when state is changed in transaction (#186)
* Don't send discard all when state is changed in transaction

* Remove unnecessary clone

* spelling

* Move transaction check to SET command

* Add test for set command in transaction

* type

* Update comments

* Update comments

* use moves instead of clones for initial message

* don't make message mutable

* Update unwrap

* but i'm not a wrapper

* Add set local test

* change continue
2022-10-13 19:33:12 -07:00
Mostafa Abdelraouf
3d33ccf4b0 Fix maxwait metric (#183)
Max wait was being reported as 0 after #159

This PR fixes that and adds test
2022-10-05 21:41:09 -05:00
Mostafa Abdelraouf
af064ef447 Set client state to idle after error (#179)
* Set client state to idle after error

* fmt

* spelling

* clean up
2022-09-24 09:09:15 -07:00
Mostafa Abdelraouf
f7a951745c Report Query times (#166)
* Report avg and total query timing

* Report query times

* fmt
2022-09-15 02:21:45 -04:00
Mostafa Abdelraouf
4ae1bc8d32 Add SHOW CLIENTS / SHOW SERVERS + Stats refactor and tests (#159)
* wip

* Main Thread Panic when swarmed with clients

* fix

* fix

* 1024

* fix

* remove test

* Add SHOW CLIENTS

* revert

* fmt

* Refactor + tests

* fmt

* add test

* Add SHOW SERVERS + Make PR unreviewable

* prometheus

* add state to clients and servers

* fmt

* Add application_name to server stats

* Add tests for waiting clients

* Docs

* remove comment

* comments

* typo

* cleanup

* CI
2022-09-14 11:20:41 -04:00
Mostafa Abdelraouf
9514b3b2d1 Clean connection state up after protocol named prepared statement (#163)
* Clean connection state up after protocol named prepared statement

* Avoid cloning + add test

* fmt
2022-09-07 20:37:17 -07:00
Mostafa Abdelraouf
23a642f4a4 Send DISCARD ALL even if client is not in transaction (#152)
* Send DISCARD ALL even if client is not in transaction

* fmt

* Added tests + avoided sending extra discard all

* Adds set name logic to beginning of handle client

* fmt

* refactor dead code handling

* Refactor reading command tag

* remove unnecessary trim

* Removing debugging statement

* typo

* typo{

* documentation

* edit text

* un-unwrap

* run ci

* run ci

Co-authored-by: Zain Kabani <zain.kabani@instacart.com>
2022-09-01 20:06:55 -07:00
Mostafa Abdelraouf
d48c04a7fb Ruby integration tests (#147)
* Ruby integration tests

* forgot a file

* refactor

* refactoring

* more refactoring

* remove config helper

* try multiple databases

* fix

* more databases

* Use pg stats

* ports

* speed

* Fix tests

* preload library

* comment
2022-08-30 09:14:53 -07:00
Mostafa Abdelraouf
c054ff068d Avoid sending Z packet in the middle of extended protocol packet sequence if we fail to get connection from pool (#137)
* Failing test

* maybe

* try fail

* try

* add message

* pool size

* correct user

* more

* debug

* try fix

* see stdout

* stick?

* fix configs

* modify

* types

* m

* maybe

* make tests idempotent

* hopefully fails

* Add client fix

* revert pgcat.toml change

* Fix tests
2022-08-23 11:02:23 -07:00
Mostafa Abdelraouf
7592339092 Prevent clients from sticking to old pools after config update (#113)
* Re-acquire pool at the beginning of Protocol loop

* Fix query router + add tests for recycling behavior
2022-08-09 12:18:27 -07:00
Mostafa Abdelraouf
1b648ca00e Send proper server parameters to clients using admin db (#103)
* Send proper server parameters to clients using admin db

* clean up

* fix python test

* build

* Add python

* missing &

* debug ls

* fix tests

* fix tests

* fix

* Fix warning

* Address comments
2022-07-31 19:52:23 -07:00
Mostafa Abdelraouf
2ae4b438e3 Add support for multi-database / multi-user pools (#96)
* Add support for multi-database / multi-user pools

* Nothing

* cargo fmt

* CI

* remove test users

* rename pool

* Update tests to use admin user/pass

* more fixes

* Revert bad change

* Use PGDATABASE env var

* send server info in case of admin
2022-07-27 19:47:55 -07:00
dependabot[bot]
eff8e3e229 Bump activerecord from 7.0.2.2 to 7.0.3.1 in /tests/ruby (#94)
Bumps [activerecord](https://github.com/rails/rails) from 7.0.2.2 to 7.0.3.1.
- [Release notes](https://github.com/rails/rails/releases)
- [Changelog](https://github.com/rails/rails/blob/v7.0.3.1/activerecord/CHANGELOG.md)
- [Commits](https://github.com/rails/rails/compare/v7.0.2.2...v7.0.3.1)

---
updated-dependencies:
- dependency-name: activerecord
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-07-12 13:24:41 -07:00
Lev Kokotov
37e3a86881 Pass application_name to server (#73)
* Pass application_name to server

* fmt
2022-06-03 00:15:50 -07:00
Lev Kokotov
ccbca66e7a Poorly behaved client fix (#65)
* Poorly behaved client fix

* yes officer

* fix tests

* no useless rescue

* Looks ok
2022-05-09 09:09:22 -07:00
Lev Kokotov
303fec063b Ruby (#30)
* cop

* log
2022-02-20 23:33:04 -08:00
Lev Kokotov
a556ec1c43 More query router commands; settings last until changed again; docs (#25)
* readme

* touch up docs

* stuff

* refactor query router

* remove unused

* less verbose

* docs

* no link

* method rename
2022-02-19 08:57:24 -08:00
Lev Kokotov
bbacb9cf01 Explicit shard selection; Rails tests (#24)
* Explicit shard selection; Rails tests

* try running ruby tests

* try without lockfile

* aha

* ok
2022-02-18 09:43:07 -08:00
Lev Kokotov
6e83556867 Ruby tests 2022-02-03 18:08:51 -08:00