Improve Config Documentation (#351)

This PR adds a utility script that generates config documentation from pgcat.toml. Ideally, we would generate the docs directly from config.rs, where the actual defaults are set, but this is a good start because several config flags were previously undocumented.
Mostafa Abdelraouf
2023-03-10 22:00:28 -06:00
committed by GitHub
parent 0704ea089c
commit b09f0a3e6b
4 changed files with 490 additions and 72 deletions

CONFIG.md Normal file

@@ -0,0 +1,340 @@
# PgCat Configurations
## `general` Section
### host
```
path: general.host
default: "0.0.0.0"
```
What IP to run on, 0.0.0.0 means accessible from everywhere.
### port
```
path: general.port
default: 6432
```
Port to run on, same as PgBouncer used in this example.
### enable_prometheus_exporter
```
path: general.enable_prometheus_exporter
default: true
```
Whether to enable prometheus exporter or not.
### prometheus_exporter_port
```
path: general.prometheus_exporter_port
default: 9930
```
Port at which prometheus exporter listens on.
### connect_timeout
```
path: general.connect_timeout
default: 5000 # milliseconds
```
How long to wait before aborting a server connection (ms).
### idle_timeout
```
path: general.idle_timeout
default: 30000 # milliseconds
```
How long an idle connection with a server is left open (ms).
### healthcheck_timeout
```
path: general.healthcheck_timeout
default: 1000 # milliseconds
```
How much time to give the health check query to return with a result (ms).
### healthcheck_delay
```
path: general.healthcheck_delay
default: 30000 # milliseconds
```
How long to keep a connection available for immediate reuse without running a health check query on it (ms).
### shutdown_timeout
```
path: general.shutdown_timeout
default: 60000 # milliseconds
```
How much time to give clients during shutdown before forcibly killing client connections (ms).
### ban_time
```
path: general.ban_time
default: 60 # seconds
```
How long to ban a server if it fails a health check (seconds).
### log_client_connections
```
path: general.log_client_connections
default: false
```
If we should log client connections
### log_client_disconnections
```
path: general.log_client_disconnections
default: false
```
If we should log client disconnections
### autoreload
```
path: general.autoreload
default: false
```
When set to true, PgCat reloads configs if it detects a change in the config file.
### worker_threads
```
path: general.worker_threads
default: 5
```
Number of worker threads the Runtime will use (4 by default).
### tcp_keepalives_idle
```
path: general.tcp_keepalives_idle
default: 5
```
Number of seconds of connection idleness to wait before sending a keepalive packet to the server.
### tcp_keepalives_count
```
path: general.tcp_keepalives_count
default: 5
```
Number of unacknowledged keepalive packets allowed before giving up and closing the connection.
### tcp_keepalives_interval
```
path: general.tcp_keepalives_interval
default: 5
```
Number of seconds between keepalive packets.
### tls_certificate
```
path: general.tls_certificate
default: <UNSET>
example: "server.cert"
```
Path to TLS certificate file to use for TLS connections
### tls_private_key
```
path: general.tls_private_key
default: <UNSET>
example: "server.key"
```
Path to TLS private key file to use for TLS connections
### admin_username
```
path: general.admin_username
default: "admin_user"
```
User name to access the virtual administrative database (pgbouncer or pgcat)
Connecting to that database allows running commands like `SHOW POOLS`, `SHOW DATABASES`, etc.
### admin_password
```
path: general.admin_password
default: "admin_pass"
```
Password to access the virtual administrative database
## `pools.<pool_name>` Section
### pool_mode
```
path: pools.<pool_name>.pool_mode
default: "transaction"
```
Pool mode (see PgBouncer docs for more).
`session` one server connection per connected client
`transaction` one server connection per client transaction
### load_balancing_mode
```
path: pools.<pool_name>.load_balancing_mode
default: "random"
```
Load balancing mode
`random` selects the server at random
`loc` selects the server with the least outstanding busy connections
### default_role
```
path: pools.<pool_name>.default_role
default: "any"
```
If the client doesn't specify, PgCat routes traffic to this role by default.
`any` round-robin between primary and replicas,
`replica` round-robin between replicas only without touching the primary,
`primary` all queries go to the primary unless otherwise specified.
### query_parser_enabled
```
path: pools.<pool_name>.query_parser_enabled
default: true
```
If Query Parser is enabled, we'll attempt to parse
every incoming query to determine if it's a read or a write.
If it's a read query, we'll direct it to a replica. Otherwise, if it's a write,
we'll direct it to the primary.
### primary_reads_enabled
```
path: pools.<pool_name>.primary_reads_enabled
default: true
```
If the query parser is enabled and this setting is enabled, the primary will be part of the pool of databases used for
load balancing of read queries. Otherwise, the primary will only be used for write
queries. The primary can always be explicitly selected with our custom protocol.
### sharding_key_regex
```
path: pools.<pool_name>.sharding_key_regex
default: <UNSET>
example: '/\* sharding_key: (\d+) \*/'
```
Allow sharding commands to be passed as statement comments instead of
separate commands. If these are unset this functionality is disabled.
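As a rough illustration of how the example pattern above behaves, the sketch below applies it to a query with Python's `re` module. PgCat itself is written in Rust, so this only approximates the matching; the query text and the `extract_sharding_key` helper are hypothetical.

```python
import re

# Example pattern from the config above; the capture group holds the key.
SHARDING_KEY_REGEX = r"/\* sharding_key: (\d+) \*/"


def extract_sharding_key(query):
    """Return the sharding key found in a statement comment, or None."""
    match = re.search(SHARDING_KEY_REGEX, query)
    return int(match.group(1)) if match else None


# Hypothetical queries, for illustration only.
with_key = "/* sharding_key: 1234 */ SELECT * FROM data WHERE id = 1234"
without_key = "SELECT 1"

print(extract_sharding_key(with_key))
print(extract_sharding_key(without_key))
```

A query without the comment yields no key, which mirrors the "unset means disabled" behavior only loosely; how PgCat routes such queries is not covered here.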
### sharding_function
```
path: pools.<pool_name>.sharding_function
default: "pg_bigint_hash"
```
So what if you wanted to implement a different hashing function,
or you've already built one and you want this pooler to use it?
Current options:
`pg_bigint_hash`: PARTITION BY HASH (Postgres hashing function)
`sha1`: A hashing function based on SHA1
### automatic_sharding_key
```
path: pools.<pool_name>.automatic_sharding_key
default: <UNSET>
example: "data.id"
```
Automatically parse this from queries and route queries to the right shard!
### idle_timeout
```
path: pools.<pool_name>.idle_timeout
default: 40000
```
Idle timeout can be overridden in the pool
### connect_timeout
```
path: pools.<pool_name>.connect_timeout
default: 3000
```
Connect timeout can be overridden in the pool
## `pools.<pool_name>.users.<user_index>` Section
### username
```
path: pools.<pool_name>.users.<user_index>.username
default: "sharding_user"
```
PostgreSQL username
### password
```
path: pools.<pool_name>.users.<user_index>.password
default: "sharding_user"
```
PostgreSQL password
### pool_size
```
path: pools.<pool_name>.users.<user_index>.pool_size
default: 9
```
Maximum number of server connections that can be established for this user.
The maximum number of connections from a single PgCat process to any database in the cluster
is the sum of pool_size across all users.
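To make the arithmetic concrete, here is a small sketch that sums `pool_size` over the two users in the example `pgcat.toml` (9 and 21). The dictionary layout is an assumption made for illustration, not PgCat's internal representation.

```python
# pool_size values taken from the example pgcat.toml in this change;
# the dict layout itself is hypothetical.
users = {
    "0": {"pool_size": 9},
    "1": {"pool_size": 21},
}

# Upper bound on connections one PgCat process opens to any single database.
max_connections_per_database = sum(u["pool_size"] for u in users.values())
print(max_connections_per_database)  # 30
```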
### statement_timeout
```
path: pools.<pool_name>.users.<user_index>.statement_timeout
default: 0
```
Maximum query duration. Dangerous, but protects against DBs that died in a non-obvious way.
0 means it is disabled.
## `pools.<pool_name>.shards.<shard_index>` Section
### servers
```
path: pools.<pool_name>.shards.<shard_index>.servers
default: [["127.0.0.1", 5432, "primary"], ["localhost", 5432, "replica"]]
```
Array of servers in the shard, each server entry is an array of `[host, port, role]`
### mirrors
```
path: pools.<pool_name>.shards.<shard_index>.mirrors
default: <UNSET>
example: [["1.2.3.4", 5432, 0], ["1.2.3.4", 5432, 1]]
```
Array of mirrors for the shard, each mirror entry is an array of `[host, port, index of server in servers array]`
Traffic hitting the server identified by the index will be sent to the mirror.
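A sketch of how the index in each mirror entry could be resolved against the `servers` array, using the example values above. The `mirrors_by_server` helper is hypothetical and only illustrates the data layout; it says nothing about how PgCat implements mirroring.

```python
servers = [["127.0.0.1", 5432, "primary"], ["localhost", 5432, "replica"]]
mirrors = [["1.2.3.4", 5432, 0], ["1.2.3.4", 5432, 1]]


def mirrors_by_server(servers, mirrors):
    """Group mirror (host, port) pairs under the server they shadow."""
    grouped = {(host, port, role): [] for host, port, role in servers}
    for mirror_host, mirror_port, server_index in mirrors:
        # The third element of a mirror entry indexes into `servers`.
        host, port, role = servers[server_index]
        grouped[(host, port, role)].append((mirror_host, mirror_port))
    return grouped


for server, targets in mirrors_by_server(servers, mirrors).items():
    print(server, "->", targets)
```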
### database
```
path: pools.<pool_name>.shards.<shard_index>.database
default: "shard0"
```
Database name (e.g. "postgres")


@@ -39,35 +39,7 @@ PGPASSWORD=postgres psql -h 127.0.0.1 -p 6432 -U postgres -c 'SELECT 1'
### Config
| **Name** | **Description** | **Examples** |
|------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------|
| **`general`** | | |
| `host` | The pooler will run on this host, 0.0.0.0 means accessible from everywhere. | `0.0.0.0` |
| `port` | The pooler will run on this port. | `6432` |
| `enable_prometheus_exporter` | Enable prometheus exporter which will export metrics in prometheus exposition format. | `true` |
| `prometheus_exporter_port` | Port at which prometheus exporter listens on. | `9930` |
| `pool_size` | Maximum allowed server connections per pool. Pools are separated for each user/shard/server role. The connections are allocated as needed. | `15` |
| `pool_mode` | The pool mode to use, i.e. `session` or `transaction`. | `transaction` |
| `connect_timeout` | Maximum time to establish a connection to a server (milliseconds). If reached, the server is banned and the next target is attempted. | `5000` |
| `healthcheck_timeout` | Maximum time to pass a health check (`SELECT 1`, milliseconds). If reached, the server is banned and the next target is attempted. | `1000` |
| `shutdown_timeout` | Maximum time to give clients during shutdown before forcibly killing client connections (ms). | `60000` |
| `healthcheck_delay` | How long to keep connection available for immediate re-use, without running a healthcheck query on it | `30000` |
| `ban_time` | Ban time for a server (seconds). It won't be allowed to serve transactions until the ban expires; failover targets will be used instead. | `60` |
| `autoreload` | Enable auto-reload of config after fixed time-interval. | `false` |
| | | |
| **`user`** | | |
| `name` | The user name. | `sharding_user` |
| `password` | The user password in plaintext. | `hunter2` |
| `statement_timeout` | Timeout in milliseconds for how long a query takes to execute | `0 (disabled)` |
| | | |
| **`shards`** | Shards are numerically numbered starting from 0; the order in the config is preserved by the pooler to route queries accordingly. | `[shards.0]` |
| `servers` | List of servers to connect to and their roles. A server is: `[host, port, role]`, where `role` is either `primary` or `replica`. | `["127.0.0.1", 5432, "primary"]` |
| `database` | The name of the database to connect to. This is the same on all servers that are part of one shard. | |
| | | |
| **`query_router`** | | |
| `default_role` | Traffic is routed to this role by default (random), unless the client specifies otherwise. Default is `any`, for any role available. | `any`, `primary`, `replica` |
| `query_parser_enabled` | Enable the query parser which will inspect incoming queries and route them to a primary or replicas. | `false` |
| `primary_reads_enabled` | Enable this to allow read queries on the primary; otherwise read queries are routed to the replicas. | `true` |
[See Configurations page](https://github.com/levkk/pgcat/blob/main/CONFIG.md)
## Local development


@@ -18,21 +18,21 @@ enable_prometheus_exporter = true
prometheus_exporter_port = 9930
# How long to wait before aborting a server connection (ms).
connect_timeout = 5000
connect_timeout = 5000 # milliseconds
# How long an idle connection with a server is left open (ms).
idle_timeout = 30000
idle_timeout = 30000 # milliseconds
# How much time to give the health check query to return with a result (ms).
healthcheck_timeout = 1000
healthcheck_timeout = 1000 # milliseconds
# How long to keep connection available for immediate re-use, without running a healthcheck query on it
healthcheck_delay = 30000
healthcheck_delay = 30000 # milliseconds
# How much time to give clients during shutdown before forcibly killing client connections (ms).
shutdown_timeout = 60000
shutdown_timeout = 60000 # milliseconds
# For how long to ban a server if it fails a health check (seconds).
# How long to ban a server if it fails a health check (seconds).
ban_time = 60 # seconds
# If we should log client connections
@@ -41,40 +41,52 @@ log_client_connections = false
# If we should log client disconnections
log_client_disconnections = false
# Reload config automatically if it changes.
# When set to true, PgCat reloads configs if it detects a change in the config file.
autoreload = false
# Number of worker threads the Runtime will use (4 by default).
worker_threads = 5
# TLS
# Number of seconds of connection idleness to wait before sending a keepalive packet to the server.
tcp_keepalives_idle = 5
# Number of unacknowledged keepalive packets allowed before giving up and closing the connection.
tcp_keepalives_count = 5
# Number of seconds between keepalive packets.
tcp_keepalives_interval = 5
# Path to TLS certificate file to use for TLS connections
# tls_certificate = "server.cert"
# Path to TLS private key file to use for TLS connections
# tls_private_key = "server.key"
# Credentials to access the virtual administrative database (pgbouncer or pgcat)
# User name to access the virtual administrative database (pgbouncer or pgcat)
# Connecting to that database allows running commands like `SHOW POOLS`, `SHOW DATABASES`, etc.
admin_username = "admin_user"
# Password to access the virtual administrative database
admin_password = "admin_pass"
# pool
# configs are structured as pool.<pool_name>
# the pool_name is what clients use as database name when connecting
# For the example below a client can connect using "postgres://sharding_user:sharding_user@pgcat_host:pgcat_port/sharded_db"
# pool configs are structured as pools.<pool_name>
# the pool_name is what clients use as database name when connecting.
# For a pool named `sharded_db`, clients access that pool using connection string like
# `postgres://sharding_user:sharding_user@pgcat_host:pgcat_port/sharded_db`
[pools.sharded_db]
# Pool mode (see PgBouncer docs for more).
# session: one server connection per connected client
# transaction: one server connection per client transaction
# `session` one server connection per connected client
# `transaction` one server connection per client transaction
pool_mode = "transaction"
# If the client doesn't specify, route traffic to
# this role by default.
#
# any: round-robin between primary and replicas,
# replica: round-robin between replicas only without touching the primary,
# primary: all queries go to the primary unless otherwise specified.
# Load balancing mode
# `random` selects the server at random
# `loc` selects the server with the least outstanding busy connections
load_balancing_mode = "random"
# If the client doesn't specify, PgCat routes traffic to this role by default.
# `any` round-robin between primary and replicas,
# `replica` round-robin between replicas only without touching the primary,
# `primary` all queries go to the primary unless otherwise specified.
default_role = "any"
# Query parser. If enabled, we'll attempt to parse
# If Query Parser is enabled, we'll attempt to parse
# every incoming query to determine if it's a read or a write.
# If it's a read query, we'll direct it to a replica. Otherwise, if it's a write,
# we'll direct it to the primary.
@@ -93,23 +105,26 @@ primary_reads_enabled = true
# So what if you wanted to implement a different hashing function,
# or you've already built one and you want this pooler to use it?
#
# Current options:
#
# pg_bigint_hash: PARTITION BY HASH (Postgres hashing function)
# sha1: A hashing function based on SHA1
#
# `pg_bigint_hash`: PARTITION BY HASH (Postgres hashing function)
# `sha1`: A hashing function based on SHA1
sharding_function = "pg_bigint_hash"
# Automatically parse this from queries and route queries to the right shard!
automatic_sharding_key = "data.id"
# automatic_sharding_key = "data.id"
# Idle timeout can be overwritten in the pool
idle_timeout = 40000
# Credentials for users that may connect to this cluster
# Connect timeout can be overwritten in the pool
connect_timeout = 3000
# User configs are structured as pools.<pool_name>.users.<user_index>
# This section holds the credentials for users that may connect to this cluster
[pools.sharded_db.users.0]
# Postgresql username
username = "sharding_user"
# Postgresql password
password = "sharding_user"
# Maximum number of server connections that can be established for this user
# The maximum number of connections from a single PgCat process to any database in the cluster
@@ -117,6 +132,7 @@ password = "sharding_user"
pool_size = 9
# Maximum query duration. Dangerous, but protects against DBs that died in a non-obvious way.
# 0 means it is disabled.
statement_timeout = 0
[pools.sharded_db.users.1]
@@ -125,28 +141,26 @@ password = "other_user"
pool_size = 21
statement_timeout = 15000
# Shard 0
# Shard configs are structured as pools.<pool_name>.shards.<shard_id>
# Each shard config contains a list of servers that make up the shard
# and the database name to use.
[pools.sharded_db.shards.0]
# [ host, port, role ]
servers = [
[ "127.0.0.1", 5432, "primary" ],
[ "localhost", 5432, "replica" ]
]
# Array of servers in the shard, each server entry is an array of `[host, port, role]`
servers = [["127.0.0.1", 5432, "primary"], ["localhost", 5432, "replica"]]
# Array of mirrors for the shard, each mirror entry is an array of `[host, port, index of server in servers array]`
# Traffic hitting the server identified by the index will be sent to the mirror.
# mirrors = [["1.2.3.4", 5432, 0], ["1.2.3.4", 5432, 1]]
# Database name (e.g. "postgres")
database = "shard0"
[pools.sharded_db.shards.1]
servers = [
[ "127.0.0.1", 5432, "primary" ],
[ "localhost", 5432, "replica" ],
]
servers = [["127.0.0.1", 5432, "primary"], ["localhost", 5432, "replica"]]
database = "shard1"
[pools.sharded_db.shards.2]
servers = [
[ "127.0.0.1", 5432, "primary" ],
[ "localhost", 5432, "replica" ],
]
servers = [["127.0.0.1", 5432, "primary" ], ["localhost", 5432, "replica" ]]
database = "shard2"


@@ -0,0 +1,92 @@
import re


class DocGenerator:
    """Builds CONFIG.md from the comments and values in pgcat.toml."""

    def __init__(self, filename):
        self.doc = []
        self.current_section = ""
        self.current_comment = []
        self.current_field_name = ""
        self.current_field_value = []
        self.current_field_unset = False
        self.filename = filename

    def write(self):
        with open("../CONFIG.md", "w") as text_file:
            text_file.write("# PgCat Configurations \n")
            for entry in self.doc:
                # Section markers become second-level headings.
                if entry["name"] == "__section__":
                    text_file.write("## `" + entry["section"] + "` Section" + "\n")
                    text_file.write("\n")
                    continue

                text_file.write("### " + entry["name"] + "\n")
                text_file.write("```" + "\n")
                text_file.write("path: " + entry["fqdn"] + "\n")
                text_file.write("default: " + entry["defaults"].strip() + "\n")
                if entry["example"] is not None:
                    text_file.write("example: " + entry["example"].strip() + "\n")
                text_file.write("```" + "\n")
                text_file.write("\n")
                text_file.write(entry["comment"] + "\n")
                text_file.write("\n")

    def save_entry(self):
        # Only documented fields (name plus preceding comment) are recorded.
        if len(self.current_field_name) == 0:
            return
        if len(self.current_comment) == 0:
            return

        # Replace the concrete names from the example config with placeholders.
        self.current_section = self.current_section.replace("sharded_db", "<pool_name>")
        self.current_section = self.current_section.replace("simple_db", "<pool_name>")
        self.current_section = self.current_section.replace("users.0", "users.<user_index>")
        self.current_section = self.current_section.replace("users.1", "users.<user_index>")
        self.current_section = self.current_section.replace("shards.0", "shards.<shard_index>")
        self.current_section = self.current_section.replace("shards.1", "shards.<shard_index>")
        self.doc.append(
            {
                "name": self.current_field_name,
                "fqdn": self.current_section + "." + self.current_field_name,
                "section": self.current_section,
                "comment": "\n".join(self.current_comment),
                "defaults": self.current_field_value if not self.current_field_unset else "<UNSET>",
                "example": self.current_field_value if self.current_field_unset else None,
            }
        )
        self.current_comment = []
        self.current_field_name = ""
        self.current_field_value = []

    def parse(self):
        with open(self.filename, "r") as f:
            for line in f.readlines():
                line = line.strip()
                if len(line) == 0:
                    self.save_entry()

                if line.startswith("["):
                    # Section header such as [pools.sharded_db]
                    self.current_section = line[1:-1]
                    self.current_field_name = "__section__"
                    self.current_field_unset = False
                    self.save_entry()

                elif line.startswith("#"):
                    # A commented-out `key = value` line is an unset field with an example.
                    results = re.search(r"^#\s*([A-Za-z0-9_]+)\s*=(.+)$", line)
                    if results is not None:
                        self.current_field_name = results.group(1)
                        self.current_field_value = results.group(2)
                        self.current_field_unset = True
                        self.save_entry()
                    else:
                        self.current_comment.append(line[1:].strip())
                else:
                    results = re.search(r"^\s*([A-Za-z0-9_]+)\s*=(.+)$", line)
                    if results is None:
                        continue

                    self.current_field_name = results.group(1)
                    self.current_field_value = results.group(2)
                    self.current_field_unset = False
                    self.save_entry()

        self.save_entry()
        return self


DocGenerator("../pgcat.toml").parse().write()
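
The two regular expressions in `parse` decide whether a line is an active setting or a commented-out (unset) one. The following minimal sketch replays that classification on a few lines taken from `pgcat.toml`; the `classify` helper is hypothetical and not part of the script.

```python
import re

# Same patterns as in DocGenerator.parse, written as raw strings.
UNSET_FIELD = re.compile(r"^#\s*([A-Za-z0-9_]+)\s*=(.+)$")
SET_FIELD = re.compile(r"^\s*([A-Za-z0-9_]+)\s*=(.+)$")


def classify(line):
    """Return (kind, name, value) for one config line."""
    line = line.strip()
    if line.startswith("#"):
        match = UNSET_FIELD.search(line)
        if match:
            return ("unset", match.group(1), match.group(2).strip())
        return ("comment", None, line[1:].strip())
    match = SET_FIELD.search(line)
    if match:
        return ("set", match.group(1), match.group(2).strip())
    return ("other", None, line)


print(classify('# tls_certificate = "server.cert"'))
print(classify("ban_time = 60 # seconds"))
print(classify("# Number of seconds between keepalive packets."))
```

Note that a trailing comment such as `# seconds` stays part of the captured value, which is why the generated docs show `default: 60 # seconds`.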