Currently, `connect_timeout` sounds like it should be for connections to
the Postgres server. It's actually used for obtaining a connection from
the pool.
* Make user min_pool_size configurable
* Set user server_lifetime only if specified
* Increment chart version
* Use default instea of or
* Allow enabling server_tls
* statement_timeout default value
* Allow pulling password from existing secret
---------
Co-authored-by: Mostafa Abdelraouf <mostafa.mohmmed@gmail.com>
Build is failing with this error
Downloading activerecord-3.2.14 revealed dependencies not in the API or the
lockfile (activesupport (= 3.2.14), activemodel (= 3.2.14), arel (~> 3.0.2),
tzinfo (~> 0.3.29)).
Either installing with `--full-index` or running `bundle update activerecord`
should fix the problem.
After ActiveSupport was updated.
This PR fixes that
In #796, I noticed that the deb package was not build since an automation was missing.
With this PR, I add the missing automation.
I tested the workflow in my repo...
when starting the workflow manually: https://github.com/MrSerth/pgcat/actions/runs/10737879151/job/29780286094
when drafting a new release: https://github.com/MrSerth/pgcat/actions/runs/10737835796/job/29780146212
Obviously, both workflows failed since I cannot upload to the APT repo. However, the version substitution for the workflow is working correctly (as shown when collapsing the first line of the "Build and release package" step).
Previously, upgrading the deb package stopped the service but didn't reenable it after a successful upgrade. This made upgrading the package more difficult and required a second step to restart the service. With this commit, the systemd service is automatically started when the default config file is present.
* Prometheus metrics updates:
* Add username label to deconflict metrics that would otherwise
have duplicate labels across different pools.
* Group metrics by name and only print HELP and TYPE once per
metric name.
* Sort labels for a deterministic output.
---------
Co-authored-by: Curtis Myzie <curtis.myzie@gmail.com>
Co-authored-by: Towhid Khan
Currently the python tests act as scripts. A lot of output is generated to stdout which makes it very hard to figure out where problems were. Also if you want to run only a single test you basically need to comment out code in order to accomplish this.
This PR modifies the python tests to us the pytest python testing framework. This framework allows individual tests to be targeted via the command line, without touching the source code. It also suppressed stdout by default making the test output much easier to read. Also after the tests run it will provide a summary of what failed, what succeded, etc.
Co-authored-by: CommanderKeynes <andrewjackson947@gmail.coma>
Co-authored-by: Andrew Jackson <andrewjackson2988@gmail.com>
Writing and iterating on integration tests are cumbersome, having to wait 10 minutes for the test-suite to run just to see if your test works or not is unacceptable.
In this PR, I added a detailed workflow for writing tests that should shorten the feedback cycle of modifying tests to be as low as a few seconds.
It will involve opening a shell into a long-lived container that has all the setup and dependencies necessary and then running your desired tests directly there. I added a convenience script that bootstraps the environment and then opens an interactive shell into the container and you can then run tests immediately in an environment that is more or less identical to what we have running in CircleCI
We were missing some labels on metrics generated by the Prometheus exporter so I fixed that. There are still some gaps that I want to address with respect to the metrics we track but this seems like a good start.
I also created a Grafana Dashboard and exported it to JSON. It is designed with the same metric names the Prometheus exporter uses.
The docker CI build image is failing due to this error
249.5 Finished release [optimized] target(s) in 2m 49s
249.5 Installing /home/circleci/.cargo/bin/rustfilt
249.5 Installed package `rustfilt v0.2.1` (executable `rustfilt`)
249.5 error: failed to compile `cargo-binutils v0.3.6`, intermediate artifacts can be found at `/tmp/cargo-installrWENQG`
249.5
249.5 Caused by:
249.5 package `cargo-platform v0.1.8` cannot be built because it requires rustc 1.73 or newer, while the currently active rustc version is 1.67.1
249.5 Try re-running cargo install with `--locked`
249.5 Summary Successfully installed rustfilt! Failed to install cargo-binutils (see error(s) above).
249.5 error: some crates failed to install
So I am bumping the version up
This commit adds the TCP_NODELAY option to the socket configuration in
`configure_socket` function. Without this option, we observed significant
performance issues when executing SELECT queries with large responses.
Before the fix:
postgres=> SELECT repeat('a', 1); SELECT repeat('a', 8153);
Time: 1.368 ms
Time: 41.364 ms
After the fix:
postgres=> SELECT repeat('a', 1); SELECT repeat('a', 8153);
Time: 1.332 ms
Time: 1.528 ms
By setting TCP_NODELAY, we eliminate the Nagle's algorithm delay, which
results in a substantial improvement in response times for large queries.
This problem was discussed in https://github.com/postgresml/pgcat/issues/616.
Use rust:bullseye base image
With the original rust:1.70-bullseye image, the container cannot be
built:
17.06 Installing /usr/local/cargo/bin/rustfilt
17.06 Installed package `rustfilt v0.2.1` (executable `rustfilt`)
17.06 error: failed to compile `cargo-binutils v0.3.6`, intermediate artifacts can be found at `/tmp/cargo-installrc6mPb`
17.06
17.06 Caused by:
17.06 package `cargo-platform v0.1.8` cannot be built because it requires rustc 1.73 or newer, while the currently active rustc version is 1.70.0
17.06 Try re-running cargo install with `--locked`
17.06 Summary Successfully installed rustfilt! Failed to install cargo-binutils (see error(s) above).
17.06 error: some crates failed to install
This is the same base image used on tests/docker/Dockerfile
* add workflow
* feat: add pgcat helm chart
* fix: set the right include into configmap
Signed-off-by: David ALEXANDRE <david.alexandre@w6d.io>
* update values and config
* prettifying config
---------
Signed-off-by: David ALEXANDRE <david.alexandre@w6d.io>
The pool maxwait metric currently operates differently from Pgbouncer.
The way it operates today is that we keep track of max_wait on each connected client, when SHOW POOLS query is made, we go over the connected clients and we get the max of max_wait times among clients. This means the pool maxwait will never reset, it will always be monotonically increasing until the client with the highest maxwait disconnects.
This PR changes this behavior, by keeping track of the wait_start time on each client, when a client goes into WAITING state, we record the time offset from connect_time. When we either successfully or unsuccessfully checkout a connection from the pool, we reset the wait_start time.
When SHOW POOLS query is made, we go over all connected clients and we only consider clients whose wait_start is non-zero, for clients that have non-zero wait times, we compare them and report the maximum waiting time as maxwait for the pool.
* Revert "Reset wait times when checked out successfully (#656)"
This reverts commit ec3920d60f.
* Revert "Not sure how this sneaked past CI"
This reverts commit 4c5498b915.
* Revert "only report wait times from clients currently waiting to match behavior of pgbouncer (#655)"
This reverts commit 0e8064b049.
* Fix prepared statement not found when prepared stmt has error
* cleanup debug
* remove more debug msgs
* sure debugged this..
* version bump
* add rust tests
* Expose clients maxwait time in SHOW CLIENTS response via PgCat admin
Displays the maxwait via maxwait_seconds and maxwait_us columns for each client that can be used to track down the wait time per client in a case where the overall pool stats shows waiting time. The maxwait_us, similar to the pool stats setup, is configured to display as a remainder alongside the maxwait_seconds.
* Use maxwait instead of maxwait_seconds to match pools column name
---------
Co-authored-by: Calvin Hughes <9379992+calvinhughes@users.noreply.github.com>
* Add golang test suite to reproduce issue with unnamed parameterized prepared statements
* Allow caching of unnamed prepared statements
* Passthrough describe on portals
* Remove unneeded kill
* Update Dockerfile.ci with golang
* Move out update of Dockerfiles to separate PR
* Initial commit
* Cleanup and add stats
* Use an arc instead of full clones to store the parse packets
* Use mutex instead
* fmt
* clippy
* fmt
* fix?
* fix?
* fmt
* typo
* Update docs
* Refactor custom protocol
* fmt
* move custom protocol handling to before parsing
* Support describe
* Add LRU for server side statement cache
* rename variable
* Refactoring
* Move docs
* Fix test
* fix
* Update tests
* trigger build
* Add more tests
* Reorder handling sync
* Support when a named describe is sent along with Parse (go pgx) and expecting results
* don't talk to client if not needed when client sends Parse
* fmt :(
* refactor tests
* nit
* Reduce hashing
* Reducing work done to decode describe and parse messages
* minor refactor
* Merge branch 'main' into zain/reimplment-prepared-statements-with-global-lru-cache
* Rewrite extended and prepared protocol message handling to better support mocking response packets and close
* An attempt to better handle if there are DDL changes that might break cached plans with ideas about how to further improve it
* fix
* Minor stats fixed and cleanup
* Cosmetic fixes (#64)
* Cosmetic fixes
* fix test
* Change server drop for statement cache error to a `deallocate all`
* Updated comments and added new idea for handling DDL changes impacting cached plans
* fix test?
* Revert test change
* trigger build, flakey test
* Avoid potential race conditions by changing get_or_insert to promote for pool LRU
* remove ps enabled variable on the server in favor of using an option
* Add close to the Extended Protocol buffer
---------
Co-authored-by: Lev Kokotov <levkk@users.noreply.github.com>
* Improved logging
* Improved logging for more `Address` usages
* Fixed lint issues.
* Reverted the `Address` logging changes.
* Applied the PR comment by @levkk.
* Applied the PR comment by @levkk.
* Applied the PR comment by @levkk.
* Applied the PR comment by @levkk.
It could be used to implement container health checks.
Example:
PGPASSWORD="<some-password>" psql -U pgcat -p 6432 -h 127.0.0.1 -tA -c "show version;" -d pgcat >/dev/null
The TL;DR for the change is that we allow QueryRouter to set the active shard to None. This signals to the Pool::get method that we have no shard selected. The get method follows a no_shard_specified_behavior config to know how to route the query.
Original PR description
Ruby-pg library makes a startup query to SET client_encoding to ... if Encoding.default_internal value is set (Code). This query is troublesome because we cannot possibly attach a routing comment to it. PgCat, by default, will route that query to the default shard.
Everything is fine until shard 0 has issues, Clients will all be attempting to send this query to shard0 which increases the connection latency significantly for all clients, even those not interested in shard0
This PR introduces no_shard_specified_behavior that defines the behavior in case we have routing-by-comment enabled but we get a query without a comment. The allowed behaviors are
random: Picks a shard at random
random_healthy: Picks a shard at random favoring shards with the least number of recent connection/checkout errors
shard_<number>: e.g. shard_0, shard_4, etc. picks a specific shard, everytime
In order to achieve this, this PR introduces an error_count on the Address Object that tracks the number of errors since the last checkout and uses that metric to sort shards by error count before making a routing decision.
I didn't want to use address stats to avoid introducing a routing dependency on internal stats (We might do that in the future but I prefer to avoid this for the time being.
I also made changes to the test environment to replace Ruby's TOML reader library, It appears to be abandoned and does not support mixed arrays (which we use in the config toml), and it also does not play nicely with single-quoted regular expressions. I opted for using yj which is a CLI tool that can convert from toml to JSON and back. So I refactor the tests to use that library.
* User server parameters struct instead of server info bytesmut
* Refactor to use hashmap for all params and add server parameters to client
* Sync parameters on client server checkout
* minor refactor
* update client side parameters when changed
* Move the SET statement logic from the C packet to the S packet.
* trigger build
* revert validation changes
* remove comment
* Try fix
* Reset cleanup state after sync
* fix server version test
* Track application name through client life for stats
* Add tests
* minor refactoring
* fmt
* fix
* fmt
* Make infer role configurable and fix double parse bug
* Fix tests
* Enable infer_role_from query in toml for tests
* Fix test
* Add max length config, add logging for which application is failing to parse, and change config name
* fmt
* Update src/config.rs
---------
Co-authored-by: Lev Kokotov <levkk@users.noreply.github.com>
* Move connection checkin log messages to their own target
Under heavy load they can happen thousands of times per second, and
should generally be considered a nuisance at best. This marks the state
discard as an info rather than a warning, and moves all the messages
into their own log-target, so they can be filtered separately from the
more relevant warnings.
Signed-off-by: D.S. Ljungmark <spider@skuggor.se>
* Remove left-over env_logger dependencies
When moving to tracing-subscriber for logging, the env_logger
dependencies were left around, this cuts them out as dead code.
Signed-off-by: D.S. Ljungmark <spider@skuggor.se>
* Restore ability to filter log messages at runtime
This restores the RUST_LOG filters from env_logger but now with the
tracing subscriber setup. The filters are chained so commandline options
mark the default in case either option is set, which should be the path
of least confusion for users. ( RUST_LOG setting level to debug, and
commandline to warning is an odd user case, and I don't know what a user
who does that is expecting. )
It also bumps the version number as a fix to see which versions have
which behaviour.
Signed-off-by: D.S. Ljungmark <spider@skuggor.se>
---------
Signed-off-by: D.S. Ljungmark <spider@skuggor.se>
add --no-color option to disable colors
this commit adds a new option to disable colors in the terminal and also
moves the logger configuration to a different crate.
Signed-off-by: Sebastian Webber <sebastian@swebber.me>
This commit adds the clap library and configures the necessary args to
parse from the command line, expanding the current option of a single
file and adding support for environment variables.
Signed-off-by: Sebastian Webber <sebastian@swebber.me>
This commit adds a new function to handle notify and use it
in the SHOW HELP command, which displays the available options
in the admin console.
Also, adding Fabrízio as a co-author for all the help with the
protocol and the help to structure this PR.
Signed-off-by: Sebastian Webber <sebastian@swebber.me>
Co-authored-by: Fabrízio de Royes Mello <fabriziomello@gmail.com>
* Change idle timeout default to 10 minutes
* Revert lifo for now while we investigate connection thrashing issues
* Make queue strategy configurable
* test revert idle time out
* Add pgcat start to python test
What is wrong
Stats reported by SHOW POOLS seem to be leaking. We see lingering cl_idle , cl_waiting, and similarly for sv_idle , sv_active. We confirmed that these are reporting issues not actual lingering clients.
This behavior is readily reproducible by running
while true; do
psql "postgres://sharding_user:sharding_user@localhost:6432/sharded_db" -c "SELECT 1" > /dev/null 2>&1 &
done
Why it happens
I wasn't able to get to figure our the reason for the bug but my best guess is that we have race conditions when updating pool-level stats. So even though individual update operations are atomic, we perform a check then update sequence which is not protected by a guard.
https://github.com/postgresml/pgcat/blob/main/src/stats/pool.rs#L174-L179
I am also suspecting that using Relaxed ordering might allow this behavior (I changed all operations to use Ordering::SeqCst but still got lingering clients)
How to fix
Since SHOW POOLS/SHOW SERVER/SHOW CLIENTS all show the current state of the proxy (as opposed to SHOW STATS which show aggregate values), this PR refactors SHOW POOLS to have it construct the results directly from SHOW SERVER and SHOW CLIENT datasets. This reduces the complexity of stat updates and eliminates the need for having locks when updating pool stats as we only care about updating individual client/server states.
This will change the semantics of maxwait, so instead of it holding the maxwait time ever encountered by a client (connected or disconnected), it will only consider connected clients which should be okay given PgCat tends to hold on to client connections more than Pgbouncer.
* keep track of current stats and zero them after updating averages
* Try tests
* typo
* remove commented test stuff
* Avoid dividing by zero
* Fix test
* refactor, get rid of iterator. do it manually
* trigger build
* Fix
The docker-compose dev setup is broken under Apple silicon, starting the stack fails with the following error. Switching to a different docker image fixes the issue.
* Add dns_cache so server addresses are cached and invalidated when DNS changes.
Adds a module to deal with dns_cache feature. It's
main struct is CachedResolver, which is a simple thread safe
hostname <-> Ips cache with the ability to refresh resolutions
every `dns_max_ttl` seconds. This way, a client can check whether its
ip address has changed.
* Allow reloading dns cached
* Add documentation for dns_cached
I needed to have granular control over protocol message testing. For example, being able to send protocol messages one-by-one and then be able to inspect the results.
In order to do that, I created this low-level ruby client that can be used to send protocol messages in any order without blocking and also allows inspection of response messages.
* Add a new exec_simple_query method
This adds a new `exec_simple_query` method so we can make 'out of band'
queries to servers that don't interfere with pools at all.
In order to reuse startup code for making these simple queries,
we need to set the stats (`Reporter`) optional, so using these
simple queries wont interfere with stats.
* Add auth passthough (auth_query)
Adds a feature that allows setting auth passthrough for md5 auth.
It adds 3 new (general and pool) config parameters:
- `auth_query`: An string containing a query that will be executed on boot
to obtain the hash of a given user. This query have to use a placeholder `$1`,
so pgcat can replace it with the user its trying to fetch the hash from.
- `auth_query_user`: The user to use for connecting to the server and executing the
auth_query.
- `auth_query_password`: The password to use for connecting to the server and executing the
auth_query.
The configuration can be done either on the general config (so pools share them) or in a per-pool basis.
The behavior is, at boot time, when validating server connections, a hash is fetched per server
and stored in the pool. When new server connections are created, and no cleartext password is specified,
the obtained hash is used for creating them, if the hash could not be obtained for whatever reason, it retries
it.
When client authentication is tried, it uses cleartext passwords if specified, it not, it checks whether
we have query_auth set up, if so, it tries to use the obtained hash for making client auth. If there is no
hash (we could not obtain one when validating the connection), a new fetch is tried.
Once we have a hash, we authenticate using it against whathever the client has sent us, if there is a failure
we refetch the hash and retry auth (so password changes can be done).
The idea with this 'retrial' mechanism is to make it fault tolerant, so if for whatever reason hash could not be
obtained during connection validation, or the password has change, we can still connect later.
* Add documentation for Auth passthrough
* Refactor stats to use atomics
When we are dealing with a high number of connections, generated
stats cannot be consumed fast enough by the stats collector loop.
This makes the stats subsystem inconsistent and a log of
warning messages are thrown due to unregistered server/clients.
This change refactors the stats subsystem so it uses atomics:
- Now counters are handled using U64 atomics
- Event system is dropped and averages are calculated using a loop
every 15 seconds.
- Now, instead of snapshots being generated ever second we keep track of servers/clients
that have registered. Each pool/server/client has its own instance of the counter and
makes changes directly, instead of adding an event that gets processed later.
* Manually mplement Hash/Eq in `config::Address` ignoring stats
* Add tests for client connection counters
* Allow connecting to dockerized dev pgcat from the host
* stats: Decrease cl_idle when idle socket disconnects
When recv is called in the mirroring client, we noticed an occasional panic when reading the message.
thread 'tokio-runtime-worker' panicked at 'slice index starts at 5 but ends at 0', src/messages.rs:522:18
We are still debugging the reason why this happens but adding a check for slice bounds seems like a good idea. Instead of panicking, this will return an Err to the caller which will close the connection.
The experimental mirroring feature used a lot of memory and CPU when put under production traffic. This change attempts to reduce memory and CPU usage.
Memory footprint is reduced by making the channel smaller. CPU usage is reduced by avoiding allocations if the channel is full or is closed.
We might lose more messages this way if the mirror falls behind but that is more acceptable than crashing the entire process when it goes out-of-memory (OOM)
This PR adds a utility script that generates config documentation from pgcat.toml. Ideally, we'd want to generate the configs directly from config.rs where the actual defaults are set but this is a good start as we already had several undocumented config flags.
* Prepared stmt sharding
s
tests
* len check
* remove python test
* latest rust
* move that to debug for sure
* Add the actual tests
* latest image
* Update tests/ruby/sharding_spec.rb
This is an implementation of Query mirroring in PgCat (outlined here #302)
In configs, we match mirror hosts with the servers handling the traffic. A mirror host will receive the same protocol messages as the main server it was matched with.
This is done by creating an async task for each mirror server, it communicates with the main server through two channels, one for the protocol messages and one for the exit signal. The mirror server sends the protocol packets to the underlying PostgreSQL server. We receive from the underlying PostgreSQL server as soon as the data is available and we immediately discard it. We use bb8 to manage the life cycle of the connection, not for pooling since each mirror server handler is more or less single-threaded.
We don't have any connection pooling in the mirrors. Matching each mirror connection to an actual server connection guarantees that we will not have more connections to any of the mirrors than the parent pool would allow.
Sometimes we want an admin to be able to ban a host for some time to route traffic away from that host for reasons like partial outages, replication lag, and scheduled maintenance.
We can achieve this today using a configuration update but a quicker approach is to send a control command to PgCat that bans the replica for some specified duration.
This command does not change the current banning rules like
Primaries cannot be banned
When all replicas are banned, all replicas are unbanned
* Adds SHUTDOWN command to PgCat as alternate option to sending SIGINT
* Check if we're already in SHUTDOWN sequence
* Send signal directly from shutdown instead of using channel
* Add tests
* trigger build
* Lowercase response and boolean change
* Update tests
* Fix tests
* typo
We identified a bug where RELOAD fails to update the pools.
To reproduce you need to start at some config state, modify that state a bit, reload, revert the configs back to the original state, and reload. The last reload will fail to update the pool because PgCat "thinks" the pool state didn't change.
This is because we use a HashSet to keep track of config hashes but we never remove values from it.
Say we start with State A, we modify pool configs to State B and reload. Now the POOL_HASHES struct has State A and State B. Attempting to go back to State A will encounter a hashset hit which is interpreted by PgCat as "Configs are the same, no need to reload pools"
We fix this by attaching a config_hash value to ConnectionPool object and we calculate that value when we create the pool. This eliminates the need for a global variable. One shortcoming here is that changing any config under one user in the pool will trigger a reload for the entire pool (which is fine I think)
Connection to the CI databases is viewed by Postgres as coming from localhost. The pg_hba.conf file generated by the docker image uses trust for these connections, that's why we had no test coverage on SASL and md5 branches.
This PR fixes this issue. There was also an issue with under-reporting code coverage. This should be fixed now
I am seeing Directory (/home/circleci/project) you are trying to checkout to is not empty and not a git repository error after I started using the new Dockerfile.ci image. My best guess is that this failure is because we download toxiproxy.deb file into the home directory which blocks git checkout.
This PR moves toxiproxy to /tmp/ to avoid this
We have to build and push the docker image used in CI manually. This PR builds that image automatically and pushes it to Github docker repository.
Will start using that image in a follow PR
What
Allows shard selection by the client to come in via comments like /* shard_id: 1 */ select * from foo;
Why
We're using a setup in Ruby that makes it tough or impossible to inject commands on the connection to set the shard before it gets to the "real" SQL being run. Instead we have an updated PG adapter that allows injection of comments before each executed SQL statement. We need this support in pgcat in order to keep some complex shard picking logic in Ruby code while using pgcat for connection management.
Local Testing
Run postgres and pgcat with the default options. Run psql < tests/sharding/query_routing_setup.sql to setup the database for the tests and run ./tests/pgbench/external_shard_test.sh as often as needed to exercise the shard setting comment test.
Code coverage logic was missing coverage from rust tests. This is now fixed.
Also, we weren't reaping spawned PgCat processes correctly which left zombie processes.
We have encountered a case where PgCat pools were stuck following a database incident. Our best understanding at this point is that the PgCat -> Postgres connections died silently and because Tokio defaults to disabling keepalives, connections in the pool were marked as busy forever. Only when we deployed PgCat did we see recovery.
This PR introduces tcp_keepalives to PgCat. This sets the defaults to be
keepalives_idle: 5 # seconds
keepalives_interval: 5 # seconds
keepalives_count: 5 # a count
These settings can detect the death of an idle connection within 30 seconds of its death. Please note that the connection can remain idle forever (from an application perspective) as long as the keepalive packets are flowing so disconnection will only occur if the other end is not acknowledging keepalive packets (keepalive packet acks are handled by the OS, the application does not need to do anything). I plan to add tcp_user_timeout in a follow-up PR.
This change:
- Adds server metrics to prometheus endpoint.
- Adds database metrics to prometheus endpoint.
- Adds pools metrics to prometheus endpoint.
- Change metrics name to have a prefix of (stats|pools|databases|servers).
Least outstanding connections load balancing can improve the load distribution between instances but for Pgcat it may also improve handling slow replicas that don't go completely down. With LoC, traffic will quickly move away from the slow replica without waiting for the replica to be banned.
If all replicas slow down equally (due to a bad query that is hitting all replicas), the algorithm will degenerate to Random Load Balancing (which is what we had in Pgcat until today).
This may also allow Pgcat to accommodate pools with differently-sized replicas.
This change adds a new configuration parameter called `worker_threads` that
allows setting the number of workers the Tokio Runtime will use. It defaults to
4 to maintain backward compatibility.
Given that the config file parse is done asynchronously, first, a transient runtime
is created for reading config, and once it has been parsed, the actual runtime
that will be used for PgCat execution is created.
In postgres, you can specify an `idle_session_timeout` which will close
sessions idling for that amount of time. If a session is closed because
of a timeout, PgCat will erroneously mark the server as unhealthy as the next
health check will return an error because the connection was drop, if no
health check is to be executed, it will simply fail trying to send the query
to the server for the same reason, the conn was drop.
Given that bb8 allows configuring an idle_timeout for pools, it would be
nice to allow setting this parameter in the config file, this way you can
set it to something shorter than the server one. Also, server pool will be kept
smaller in moments of less traffic. Actually, currently this value is set as its
default in bb8, which is 10 minutes.
This changes allows setting the parameter using the config file. It can be set both
globally and per pool. When creating the pool, if the pool don't have it defined, global
value is used.
* Adds configuration for logging connections and removes get_config from entrypoint
* typo
* rename connection config var and add to toml files
* update config log
* fmt
* Don't send discard all when state is changed in transaction
* Remove unnecessary clone
* spelling
* Move transaction check to SET command
* Add test for set command in transaction
* type
* Update comments
* Update comments
* use moves instead of clones for initial message
* don't make message mutable
* Update unwrap
* but i'm not a wrapper
* Add set local test
* change continue
Moves config validation to own functions to enable tools to use them
Moves sharding config to enum
Makes defaults public
Make connect_timeout on pool and option which is overwritten by general connect_timeout
* Changes shard struct to use vector of ServerConfig
* Adds to query router
* Change client disconnect with error message to warn instead of debug
* Add warning logs for clean up actions
* Send DISCARD ALL even if client is not in transaction
* fmt
* Added tests + avoided sending extra discard all
* Adds set name logic to beginning of handle client
* fmt
* refactor dead code handling
* Refactor reading command tag
* remove unnecessary trim
* Removing debugging statement
* typo
* typo{
* documentation
* edit text
* un-unwrap
* run ci
* run ci
Co-authored-by: Zain Kabani <zain.kabani@instacart.com>
* wip
* revert some'
* revert more
* poor-man's integration test
* remove test
* fmt
* --workspace
* fix build
* fix integration test
* another stab
* log
* run after integration
* cargo test after integration
* revert
* revert more
* Refactor + clean up
* more clean up
* initial commit of server check delay implementation
* fmt
* spelling
* Update name to last_healthcheck and some comments
* Moved server tested stat to after require_healthcheck check
* Make health check delay configurable
* Rename to last_activity
* Fix typo
* Add debug log for healthcheck
* Add address to debug log
* Validates pgcat is closed after shutdown python tests
* Fix pgrep logic
* Moves sigterm step to after cleanup to decouple
* Replace subprocess with os.system for running pgcat
* create a hyper server and add option to enable it in config
* move prometheus stuff to its own file; update format
* create metric type and help lookup table
* finish the metric help type map
* switch to a boolean and a standard port
* dont emit unimplemented metrics
* fail if curl returns a non 200
* resolve conflicts
* move log out of config.show and into main
* terminating new line
* upgrade curl
* include unimplemented stats
* Initial commit for graceful shutdown
* fmt
* Add .vscode to gitignore
* Updates shutdown logic to use channels
* fmt
* fmt
* Adds shutdown timeout
* Fmt and updates tomls
* Updates readme
* fmt and updates log levels
* Update python tests to test shutdown
* merge changes
* Rename listener rx and update bash to be in line with master
* Update python test bash script ordering
* Adds error response message before shutdown
* Add details on shutdown event loop
* Fixes response length for error
* Adds handler for sigterm
* Uses ready for query function and fixes number of bytes
* fmt
* Fix Dev env
* Update tests/sharding/query_routing_setup.sql
* Update tests/sharding/query_routing_setup.sql
* bring pgcat.toml on ci and local dev to parity
* more parity
* pool names
* pool names
* less diff
* fix tests
* fmt
* add other user to setup
Co-authored-by: Lev Kokotov <levkk@users.noreply.github.com>
* Add support for multi-database / multi-user pools
* Nothing
* cargo fmt
* CI
* remove test users
* rename pool
* Update tests to use admin user/pass
* more fixes
* Revert bad change
* Use PGDATABASE env var
* send server info in case of admin
* Support reloading the entire config (including sharding logic) without restart.
* Fix bug incorrectly handing error reporting when the shard is set incorrectly via SET SHARD TO command.
selected wrong shard and the connection keep reporting fatal #80.
* Fix total_received and avg_recv admin database statistics.
* Enabling the query parser by default.
* More tests.
2. Run the test suite (e.g. `pgbench`) to make sure everything still works. The tests are in `.circleci/run_tests.sh`.
3. Performance is important, make sure there are no regressions in your branch vs. `main`.
Happy hacking!
## TODOs
A non-exhaustive list of things that would be useful to implement:
#### Client authentication
MD5 is probably sufficient, but maybe others too.
#### Admin
Admin database for stats collection and pooler administration. PgBouncer gives us a nice example on how to do that, specifically how to implement `RowDescription` and `DataRow` messages, [example here](https://github.com/pgbouncer/pgbouncer/blob/4f9ced8e63d317a6ff45c8b0efa876b32161f6db/src/admin.c#L813).
| Transaction pooling | :white_check_mark: | Identical to PgBouncer. |
| Session pooling | :white_check_mark: | Identical to PgBouncer. |
| `COPY` support | :white_check_mark: | Both `COPY TO` and `COPY FROM` are supported. |
| Query cancellation | :white_check_mark: | Supported both in transaction and session pooling modes. |
| Load balancing of read queries | :white_check_mark: | Using round-robin between replicas. Primary is included when `primary_reads_enabled` is enabled (default). |
| Sharding | :white_check_mark: | Transactions are sharded using `SET SHARD TO` and `SET SHARDING KEY TO` syntax extensions; see examples below. |
| Failover | :white_check_mark: | Replicas are tested with a health check. If a health check fails, remaining replicas are attempted; see below for algorithm description and examples. |
| Statistics reporting | :white_check_mark: | Statistics similar to PgBouncers are reported via StatsD. |
| Live configuration reloading | :construction_worker: | Reload config with a `SIGHUP` to the process, e.g. `kill -s SIGHUP $(pgrep pgcat)`. Not all settings can be reloaded without a restart. |
| Client authentication | :x: :wrench: | On the roadmap; currently all clients are allowed to connect and one user is used to connect to Postgres. |
## Deployment
See `Dockerfile` for example deployment using Docker. The pooler is configured to spawn 4 workers so 4 CPUs are recommended for optimal performance.
That setting can be adjusted to spawn as many (or as little) workers as needed.
| `host` | The pooler will run on this host, 0.0.0.0 means accessible from everywhere. | `0.0.0.0` |
| `port` | The pooler will run on this port. | `6432` |
| `pool_size` | Maximum allowed server connections per pool. Pools are separated for each user/shard/server role. The connections are allocated as needed. | `15` |
| `pool_mode` | The pool mode to use, i.e. `session` or `transaction`. | `transaction` |
| `connect_timeout` | Maximum time to establish a connection to a server (milliseconds). If reached, the server is banned and the next target is attempted. | `5000` |
| `healthcheck_timeout` | Maximum time to pass a health check (`SELECT 1`, milliseconds). If reached, the server is banned and the next target is attempted. | `1000` |
| `ban_time` | Ban time for a server (seconds). It won't be allowed to serve transactions until the ban expires; failover targets will be used instead. | `60` |
| `statsd_address` | StatsD host and port. Statistics will be sent there every 15 seconds. | `127.0.0.1:8125` |
| | | |
| **`user`** | | |
| `name` | The user name. | `sharding_user` |
| `password` | The user password in plaintext. | `hunter2` |
| | | |
| **`shards`** | Shards are numerically numbered starting from 0; the order in the config is preserved by the pooler to route queries accordingly. | `[shards.0]` |
| `servers` | List of servers to connect to and their roles. A server is: `[host, port, role]`, where `role` is either `primary` or `replica`. | `["127.0.0.1", 5432, "primary"]` |
| `database` | The name of the database to connect to. This is the same on all servers that are part of one shard. | |
| **`query_router`** | | |
| `default_role` | Traffic is routed to this role by default (round-robin), unless the client specifies otherwise. Default is `any`, for any role available. | `any`, `primary`, `replica` |
| `query_parser_enabled` | Enable the query parser which will inspect incoming queries and route them to a primary or replicas. | `false` |
| `primary_reads_enabled` | Enable this to allow read queries on the primary; otherwise read queries are routed to the replicas. | `true` |
## Local development
1. Install Rust (latest stable will work great).
2.`cargo build --release` (to get better benchmarks).
3. Change the config in `pgcat.toml` to fit your setup (optional given next step).
4. Install Postgres and run `psql -f tests/sharding/query_routing_setup.sql` (user/password may be required depending on your setup)
5.`RUST_LOG=info cargo run --release` You're ready to go!
### Tests
Quickest way to test your changes is to use pgbench:
| Load balancing | :white_check_mark: | :white_check_mark: | We could test this by emitting statistics for each replica and compare them. |
| Failover | :white_check_mark: | :white_check_mark: | Misconfigure a replica in `pgcat.toml` and watch it forward queries to spares. CI testing is using Toxiproxy. |
| Sharding | :white_check_mark: | :white_check_mark: | See `tests/sharding` and `tests/ruby` for an Rails/ActiveRecord example. |
| Statistics reporting | :x: | :white_check_mark: | Run `nc -l -u 8125` and watch the stats come in every 15 seconds. |
| Live config reloading | :white_check_mark: | :white_check_mark: | Run `kill -s SIGHUP $(pgrep pgcat)` and watch the config reload. |
## Usage
### Session mode
In session mode, a client talks to one server for the duration of the connection. Prepared statements, `SET`, and advisory locks are supported. In terms of supported features, there is very little if any difference between session mode and talking directly to the server.
To use session mode, change `pool_mode = "session"`.
### Transaction mode
In transaction mode, a client talks to one server for the duration of a single transaction; once it's over, the server is returned to the pool. Prepared statements, `SET`, and advisory locks are not supported; alternatives are to use `SET LOCAL` and `pg_advisory_xact_lock` which are scoped to the transaction.
This mode is enabled by default.
### Load balancing of read queries
All queries are load balanced against the configured servers using the round-robin algorithm. The most straight forward configuration example would be to put this pooler in front of several replicas and let it load balance all queries.
If the configuration includes a primary and replicas, the queries can be separated with the built-in query parser. The query parser will interpret the query and route all `SELECT` queries to a replica, while all other queries including explicit transactions will be routed to the primary.
The query parser is disabled by default.
#### Query parser
The query parser will do its best to determine where the query should go, but sometimes that's not possible. In that case, the client can select which server it wants using this custom SQL syntax:
```sql
-- To talk to the primary for the duration of the next transaction:
SETSERVERROLETO'primary';
-- To talk to the replica for the duration of the next transaction:
SETSERVERROLETO'replica';
-- Let the query parser decide
SETSERVERROLETO'auto';
-- Pick any server at random
SETSERVERROLETO'any';
-- Reset to default configured settings
SETSERVERROLETO'default';
```
The setting will persist until it's changed again or the client disconnects.
By default, all queries are routed to the first available server; `default_role` setting controls this behavior.
### Failover
All servers are checked with a `SELECT 1` query before being given to a client. If the server is not reachable, it will be banned and cannot serve any more transactions for the duration of the ban. The queries are routed to the remaining servers. If all servers become banned, the ban list is cleared: this is a safety precaution against false positives. The primary can never be banned.
The ban time can be changed with `ban_time`. The default is 60 seconds.
Failover behavior can get pretty interesting (read complex) when multiple configurations and factors are involved. The table below will try to explain what PgCat does in each scenario:
| **Query** | **`SET SERVER ROLE TO`** | **`query_parser_enabled`** | **`primary_reads_enabled`** | **Target state** | **Outcome** |
| Read query, i.e. `SELECT` | unset (any) | false | false | up | Query is routed to the first instance in the round-robin loop. |
| Read query | unset (any) | true | false | up | Query is routed to the first replica instance in the round-robin loop. |
| Read query | unset (any) | true | true | up | Query is routed to the first instance in the round-robin loop. |
| Read query | replica | false | false | up | Query is routed to the first replica instance in the round-robin loop. |
| Read query | primary | false | false | up | Query is routed to the primary. |
| Read query | unset (any) | false | false | down | First instance is banned for reads. Next target in the round-robin loop is attempted. |
| Read query | unset (any) | true | false | down | First replica instance is banned. Next replica instance is attempted in the round-robin loop. |
| Read query | unset (any) | true | true | down | First instance (even if primary) is banned for reads. Next instance is attempted in the round-robin loop. |
| Read query | replica | false | false | down | First replica instance is banned. Next replica instance is attempted in the round-robin loop. |
| Read query | primary | false | false | down | The query is attempted against the primary and fails. The client receives an error. |
| | | | | | |
| Write query e.g. `INSERT` | unset (any) | false | false | up | The query is attempted against the first available instance in the round-robin loop. If the instance is a replica, the query fails and the client receives an error. |
| Write query | unset (any) | true | false | up | The query is routed to the primary. |
| Write query | unset (any) | true | true | up | The query is routed to the primary. |
| Write query | primary | false | false | up | The query is routed to the primary. |
| Write query | replica | false | false | up | The query is routed to the replica and fails. The client receives an error. |
| Write query | unset (any) | true | false | down | The query is routed to the primary and fails. The client receives an error. |
| Write query | unset (any) | true | true | down | The query is routed to the primary and fails. The client receives an error. |
| Write query | primary | false | false | down | The query is routed to the primary and fails. The client receives an error. |
| | | | | | |
### Sharding
We use the `PARTITION BY HASH` hashing function, the same as used by Postgres for declarative partitioning. This allows to shard the database using Postgres partitions and place the partitions on different servers (shards). Both read and write queries can be routed to the shards using this pooler.
To route queries to a particular shard, we use this custom SQL syntax:
```sql
-- To talk to a shard explicitely
SETSHARDTO'1';
-- To let the pooler choose based on a value
SETSHARDINGKEYTO'1234';
```
The active shard will last until it's changed again or the client disconnects. By default, the queries are routed to shard 0.
For hash function implementation, see `src/sharding.rs` and `tests/sharding/partition_hash_test_setup.sql`.
#### ActiveRecord/Rails
```ruby
classUser<ActiveRecord::Base
end
# Metadata will be fetched from shard 0
ActiveRecord::Base.establish_connection
# Grab a bunch of users from shard 1
User.connection.execute"SET SHARD TO '1'"
User.take(10)
# Using id as the sharding key
User.connection.execute"SET SHARDING KEY TO '1234'"
User.find_by_id(1234)
# Using geographical sharding
User.connection.execute"SET SERVER ROLE TO 'primary'"
SETSERVERROLETO'auto';-- let the query router figure out where the query should go
SELECT*FROMusersWHEREemail='test@example.com';-- shard setting lasts until set again; we are reading from the primary
```
### Statistics reporting
Stats are reported using StatsD every 15 seconds. The address is configurable with `statsd_address`, the default is `127.0.0.1:8125`. The stats are very similar to what Pgbouncer reports and the names are kept to be comparable.
### Live configuration reloading
The config can be reloaded by sending a `kill -s SIGHUP` to the process. Not all settings are currently supported by live reload:
This helps us test the sharding algorithm we implemented.
## Setup
We setup 3 Postgres DBs, `shard0`, `shard1`, and `shard2`. In each database, we create a partitioned table called `data`. The table is partitioned by hash, and each database will only have _one_ partition, `shard0` will satisfy `modulus 3, remainder 0`, `shard1` will satisfy `modulus 3, remainder 1`, etc.
To set this up, you can just run:
```bash
psql -f query_routing_setup.sql
```
## Run the tests
Start up PgCat by running `cargo run --release` in the root of the repo. In a different tab, run this:
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.