* Add a new exec_simple_query method
This adds a new `exec_simple_query` method so we can make 'out of band'
queries to servers that don't interfere with pools at all.
In order to reuse the startup code for these simple queries,
we need to make the stats (`Reporter`) optional, so that running
simple queries won't interfere with stats.
* Add auth passthrough (auth_query)
Adds a feature that allows setting auth passthrough for md5 auth.
It adds 3 new (general and pool) config parameters:
- `auth_query`: A string containing a query that will be executed on boot
to obtain the hash of a given user. This query has to use a placeholder `$1`,
so pgcat can replace it with the user whose hash it's trying to fetch.
- `auth_query_user`: The user to use for connecting to the server and executing the
auth_query.
- `auth_query_password`: The password to use for connecting to the server and executing the
auth_query.
The configuration can be done either in the general config (so pools share it) or on a per-pool basis.
At boot time, while validating server connections, a hash is fetched per server
and stored in the pool. When new server connections are created and no cleartext password is specified,
the obtained hash is used to create them. If the hash could not be obtained for whatever reason, the fetch
is retried.
When a client tries to authenticate, cleartext passwords are used if specified. If not, we check whether
`auth_query` is set up and, if so, try to authenticate the client with the obtained hash. If there is no
hash (we could not obtain one when validating the connection), a new fetch is tried.
Once we have a hash, we authenticate using it against whatever the client has sent us. If there is a failure,
we refetch the hash and retry auth (so password changes can be done).
The idea with this 'retrial' mechanism is to make it fault tolerant, so if for whatever reason the hash could not be
obtained during connection validation, or the password has changed, we can still connect later.
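A minimal sketch of the refetch-and-retry flow described above, not the actual pgcat code. `HashSource`, `PoolAuth` and `authenticate` are hypothetical names, and the salted md5 challenge is left out: the client is assumed to present the stored hash directly so the control flow stays visible.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the server that answers `auth_query`.
pub struct HashSource {
    pub hashes: HashMap<String, String>,
    /// Number of times the query was executed.
    pub fetches: u32,
}

impl HashSource {
    /// Pretends to run `auth_query` with `user` bound to `$1`.
    pub fn fetch(&mut self, user: &str) -> Option<String> {
        self.fetches += 1;
        self.hashes.get(user).cloned()
    }
}

/// Hypothetical per-pool cache of the last hash obtained for the user.
pub struct PoolAuth {
    pub cached_hash: Option<String>,
}

impl PoolAuth {
    /// Returns true if `client_hash` matches, refetching once on a miss or mismatch.
    pub fn authenticate(
        &mut self,
        user: &str,
        client_hash: &str,
        source: &mut HashSource,
    ) -> bool {
        // No hash yet (the fetch failed at validation time): try to obtain one now.
        if self.cached_hash.is_none() {
            self.cached_hash = source.fetch(user);
        }
        if self.cached_hash.as_deref() == Some(client_hash) {
            return true;
        }
        // Mismatch: the password may have changed, so refetch and retry once.
        self.cached_hash = source.fetch(user);
        self.cached_hash.as_deref() == Some(client_hash)
    }
}
```

With this shape, a password change on the server costs one extra fetch on the next login, while a wrong client password costs one extra fetch and still fails.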
* Add documentation for Auth passthrough
* Refactor stats to use atomics
When we are dealing with a high number of connections, generated
stats cannot be consumed fast enough by the stats collector loop.
This makes the stats subsystem inconsistent and a lot of
warning messages are thrown due to unregistered servers/clients.
This change refactors the stats subsystem so it uses atomics:
- Now counters are handled using U64 atomics
- Event system is dropped and averages are calculated using a loop
every 15 seconds.
- Now, instead of snapshots being generated every second, we keep track of servers/clients
that have registered. Each pool/server/client has its own instance of the counter and
makes changes directly, instead of adding an event that gets processed later.
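A minimal sketch of the counter side of this design, assuming hypothetical names (`ServerStats`, `record_query`, `average_qps`, `run_workers`) rather than the real pgcat types. Each owner updates its own atomics in place, and an average is derived from two samples of a counter taken one period apart.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical per-server counters; each owner updates them in place.
#[derive(Default)]
pub struct ServerStats {
    pub query_count: AtomicU64,
    pub bytes_sent: AtomicU64,
}

impl ServerStats {
    /// Records one query directly, with no event queue in between.
    pub fn record_query(&self, bytes: u64) {
        self.query_count.fetch_add(1, Ordering::Relaxed);
        self.bytes_sent.fetch_add(bytes, Ordering::Relaxed);
    }
}

/// Queries per second between two samples of a counter.
/// `period_secs` must be non-zero (15 in the description above).
pub fn average_qps(previous: u64, current: u64, period_secs: u64) -> u64 {
    current.saturating_sub(previous) / period_secs
}

/// Spawns `workers` threads that each record `per_worker` queries of 10 bytes.
pub fn run_workers(stats: Arc<ServerStats>, workers: u64, per_worker: u64) {
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let stats = Arc::clone(&stats);
            std::thread::spawn(move || {
                for _ in 0..per_worker {
                    stats.record_query(10);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
}
```

`Ordering::Relaxed` is enough here because each counter is an independent tally; no other memory is published through it.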
* Manually implement Hash/Eq in `config::Address` ignoring stats
* Add tests for client connection counters
* Allow connecting to dockerized dev pgcat from the host
* stats: Decrease cl_idle when idle socket disconnects
When recv is called in the mirroring client, we noticed an occasional panic when reading the message.
thread 'tokio-runtime-worker' panicked at 'slice index starts at 5 but ends at 0', src/messages.rs:522:18
We are still debugging the reason why this happens but adding a check for slice bounds seems like a good idea. Instead of panicking, this will return an Err to the caller which will close the connection.
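A minimal sketch of such a bounds check, assuming the usual PostgreSQL framing of one tag byte followed by a 4-byte big-endian length that counts itself but not the tag. `payload` and `MessageError` are hypothetical names; this is not the code at src/messages.rs:522.

```rust
/// Hypothetical error returned instead of panicking on a malformed message.
#[derive(Debug, PartialEq)]
pub enum MessageError {
    /// Fewer bytes than the 5-byte header.
    TooShort { len: usize },
    /// The declared length does not describe a valid range of the buffer.
    BadLength { declared: usize, available: usize },
}

/// Returns the payload of a message laid out as:
/// 1 tag byte, 4-byte big-endian length (including itself), payload.
pub fn payload(message: &[u8]) -> Result<&[u8], MessageError> {
    const HEADER_LEN: usize = 5;
    if message.len() < HEADER_LEN {
        return Err(MessageError::TooShort { len: message.len() });
    }
    let declared =
        u32::from_be_bytes([message[1], message[2], message[3], message[4]]) as usize;
    // The message ends one byte (the tag) past the declared length.
    let end = declared.saturating_add(1);
    // `get` returns None both when `end` is before the header and when it
    // runs past the buffer, so neither case can panic.
    message.get(HEADER_LEN..end).ok_or(MessageError::BadLength {
        declared,
        available: message.len(),
    })
}
```

A declared length of 0 produces the same kind of range as in the panic message (start after end), and here it becomes an `Err` the caller can use to close the connection.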
The experimental mirroring feature used a lot of memory and CPU when put under production traffic. This change attempts to reduce memory and CPU usage.
Memory footprint is reduced by making the channel smaller. CPU usage is reduced by avoiding allocations if the channel is full or is closed.
We might lose more messages this way if the mirror falls behind, but that is more acceptable than crashing the entire process when it goes out-of-memory (OOM).
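A minimal sketch of the drop-on-full behavior, using a bounded `std::sync::mpsc::sync_channel` as a stand-in for whatever channel the mirror actually uses. `MirrorHandle` and `mirror_channel` are hypothetical names. Note that this sketch still copies the message before `try_send`; skipping that copy requires a channel that exposes its remaining capacity, which the std channel does not.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical sending half of a mirror channel.
pub struct MirrorHandle {
    tx: SyncSender<Vec<u8>>,
    /// Number of messages dropped because the channel was full or closed.
    pub dropped: u64,
}

/// Creates a bounded channel holding at most `capacity` queued messages.
pub fn mirror_channel(capacity: usize) -> (MirrorHandle, Receiver<Vec<u8>>) {
    let (tx, rx) = sync_channel(capacity);
    (MirrorHandle { tx, dropped: 0 }, rx)
}

impl MirrorHandle {
    /// Forwards a copy of `message` without ever blocking the main path.
    /// Returns false if the message was dropped.
    pub fn send(&mut self, message: &[u8]) -> bool {
        match self.tx.try_send(message.to_vec()) {
            Ok(()) => true,
            // Mirroring is best effort: a slow or dead mirror loses
            // messages instead of growing an unbounded queue.
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => {
                self.dropped += 1;
                false
            }
        }
    }
}
```

The small fixed capacity is what bounds memory: the queue can never hold more than `capacity` messages no matter how far the mirror falls behind.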
This PR adds a utility script that generates config documentation from pgcat.toml. Ideally, we'd want to generate the configs directly from config.rs where the actual defaults are set but this is a good start as we already had several undocumented config flags.
* Prepared stmt sharding
s
tests
* len check
* remove python test
* latest rust
* move that to debug for sure
* Add the actual tests
* latest image
* Update tests/ruby/sharding_spec.rb
This is an implementation of Query mirroring in PgCat (outlined here #302)
In configs, we match mirror hosts with the servers handling the traffic. A mirror host will receive the same protocol messages as the main server it was matched with.
This is done by creating an async task for each mirror server. The task communicates with the main server through two channels, one for the protocol messages and one for the exit signal. The mirror server sends the protocol packets to the underlying PostgreSQL server. We receive from the underlying PostgreSQL server as soon as the data is available and we immediately discard it. We use bb8 to manage the life cycle of the connection, not for pooling, since each mirror server handler is more or less single-threaded.
We don't have any connection pooling in the mirrors. Matching each mirror connection to an actual server connection guarantees that we will not have more connections to any of the mirrors than the parent pool would allow.
Sometimes we want an admin to be able to ban a host for some time to route traffic away from that host for reasons like partial outages, replication lag, and scheduled maintenance.
We can achieve this today using a configuration update but a quicker approach is to send a control command to PgCat that bans the replica for some specified duration.
This command does not change the current banning rules, such as:
- Primaries cannot be banned
- When all replicas are banned, all replicas are unbanned
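A minimal sketch of how a timed manual ban could interact with those two rules. `BanList`, `Role` and the method names are hypothetical, and the "all replicas banned" rule is applied at ban time here for brevity, which may differ from where pgcat applies it.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Role {
    Primary,
    Replica,
}

/// Hypothetical ban list for the hosts of one pool.
pub struct BanList {
    roles: HashMap<String, Role>,
    banned_until: HashMap<String, Instant>,
}

impl BanList {
    pub fn new(hosts: &[(&str, Role)]) -> Self {
        BanList {
            roles: hosts.iter().map(|(h, r)| (h.to_string(), *r)).collect(),
            banned_until: HashMap::new(),
        }
    }

    /// Bans a replica for `duration`. Returns false for primaries and
    /// unknown hosts, which are never banned.
    pub fn ban(&mut self, host: &str, duration: Duration, now: Instant) -> bool {
        match self.roles.get(host) {
            Some(Role::Replica) => {}
            _ => return false,
        }
        self.banned_until.insert(host.to_string(), now + duration);
        // If every replica is now banned, unban all of them.
        let replicas = self.roles.values().filter(|r| **r == Role::Replica).count();
        let banned = self.banned_until.values().filter(|until| **until > now).count();
        if banned >= replicas {
            self.banned_until.clear();
        }
        true
    }

    /// A ban is active until its expiry time passes.
    pub fn is_banned(&self, host: &str, now: Instant) -> bool {
        self.banned_until.get(host).map_or(false, |until| *until > now)
    }
}
```

Passing `now` in explicitly keeps the expiry logic easy to exercise without sleeping.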
* Adds SHUTDOWN command to PgCat as alternate option to sending SIGINT
* Check if we're already in SHUTDOWN sequence
* Send signal directly from shutdown instead of using channel
* Add tests
* trigger build
* Lowercase response and boolean change
* Update tests
* Fix tests
* typo
We identified a bug where RELOAD fails to update the pools.
To reproduce you need to start at some config state, modify that state a bit, reload, revert the configs back to the original state, and reload. The last reload will fail to update the pool because PgCat "thinks" the pool state didn't change.
This is because we use a HashSet to keep track of config hashes but we never remove values from it.
Say we start with State A, then modify pool configs to State B and reload. Now the POOL_HASHES struct has State A and State B. Attempting to go back to State A will encounter a hashset hit, which is interpreted by PgCat as "Configs are the same, no need to reload pools".
We fix this by attaching a config_hash value to ConnectionPool object and we calculate that value when we create the pool. This eliminates the need for a global variable. One shortcoming here is that changing any config under one user in the pool will trigger a reload for the entire pool (which is fine I think)
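A minimal sketch contrasting the two approaches, assuming a hypothetical `PoolConfig` with two fields. `GlobalHashes` stands in for the old global set and `ConnectionPool::needs_reload` for the per-pool comparison; neither is the actual pgcat code.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical, heavily reduced pool configuration.
#[derive(Hash, Clone)]
pub struct PoolConfig {
    pub name: String,
    pub pool_size: u32,
}

pub fn config_hash(config: &PoolConfig) -> u64 {
    let mut hasher = DefaultHasher::new();
    config.hash(&mut hasher);
    hasher.finish()
}

/// Old approach: a global set of every hash ever seen, never pruned.
#[derive(Default)]
pub struct GlobalHashes {
    seen: HashSet<u64>,
}

impl GlobalHashes {
    /// Returns true only the first time a config state is seen, so going
    /// back to an earlier state is wrongly reported as "no change".
    pub fn needs_reload(&mut self, config: &PoolConfig) -> bool {
        self.seen.insert(config_hash(config))
    }
}

/// New approach: each pool remembers the hash it was built from.
pub struct ConnectionPool {
    pub config_hash: u64,
}

impl ConnectionPool {
    pub fn new(config: &PoolConfig) -> Self {
        ConnectionPool { config_hash: config_hash(config) }
    }

    /// Compares against the current state only, so A -> B -> A reloads twice.
    pub fn needs_reload(&self, new_config: &PoolConfig) -> bool {
        self.config_hash != config_hash(new_config)
    }
}
```

Because the hash covers the whole pool config, any change under one user changes the hash and reloads the entire pool, which matches the shortcoming noted above.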
Connection to the CI databases is viewed by Postgres as coming from localhost. The pg_hba.conf file generated by the docker image uses trust for these connections, that's why we had no test coverage on SASL and md5 branches.
This PR fixes this issue. There was also an issue with under-reporting code coverage, which should be fixed now.
I am seeing a "Directory (/home/circleci/project) you are trying to checkout to is not empty and not a git repository" error after I started using the new Dockerfile.ci image. My best guess is that this failure is because we download the toxiproxy.deb file into the home directory, which blocks git checkout.
This PR moves toxiproxy to /tmp/ to avoid this.
We have to build and push the Docker image used in CI manually. This PR builds that image automatically and pushes it to the GitHub Docker repository.
Will start using that image in a follow-up PR.
What
Allows shard selection by the client to come in via comments like `/* shard_id: 1 */ select * from foo;`
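A minimal sketch of extracting the shard from such a comment, assuming the comment leads the statement and uses exactly the `shard_id:` key shown above. `shard_from_comment` is a hypothetical helper and the real matching rules in pgcat may be looser or stricter.

```rust
/// Extracts the shard from a leading `/* shard_id: N */` comment, if present.
pub fn shard_from_comment(query: &str) -> Option<usize> {
    // Only a comment at the very start of the statement is considered.
    let rest = query.trim_start().strip_prefix("/*")?;
    // Everything up to the first `*/` is the comment body.
    let (comment, _sql) = rest.split_once("*/")?;
    let value = comment.trim().strip_prefix("shard_id:")?;
    // Anything that is not an unsigned integer yields None.
    value.trim().parse().ok()
}
```

Returning `None` for a missing or malformed comment lets the caller fall back to whatever shard selection it would have used otherwise.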
Why
We're using a setup in Ruby that makes it tough or impossible to inject commands on the connection to set the shard before it gets to the "real" SQL being run. Instead we have an updated PG adapter that allows injection of comments before each executed SQL statement. We need this support in pgcat in order to keep some complex shard picking logic in Ruby code while using pgcat for connection management.
Local Testing
Run postgres and pgcat with the default options. Run `psql < tests/sharding/query_routing_setup.sql` to set up the database for the tests, then run `./tests/pgbench/external_shard_test.sh` as often as needed to exercise the shard setting comment test.